term_vector

Term vectors contain information about the terms produced by the analysis process, including:

  • a list of terms.
  • the position (or order) of each term.
  • the start and end character offsets mapping the term to its origin in the original string.
  • payloads (if they are available) — user-defined binary data associated with each term position.

These term vectors can be stored so that they can be retrieved for a particular document.
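
To make the list above concrete, here is a minimal sketch of the kind of data a term vector holds for one field value. It uses a toy analyzer (lowercasing plus a `\w+` regular expression), which is an assumption for illustration and not the analyzer Elasticsearch applies. The function name `toy_term_vector` is hypothetical, and payloads are left out.

```python
import re


def toy_term_vector(text):
    """Return {term: [{"position", "start_offset", "end_offset"}, ...]}.

    Toy stand-in for an analyzer: lowercases and splits on non-word
    characters. Real analyzers may tokenize differently.
    """
    vector = {}
    for position, match in enumerate(re.finditer(r"\w+", text)):
        term = match.group().lower()
        vector.setdefault(term, []).append({
            "position": position,            # order of the term in the field
            "start_offset": match.start(),   # first character of the term
            "end_offset": match.end(),       # one past the last character
        })
    return vector


for term, occurrences in sorted(toy_term_vector("Quick brown fox").items()):
    print(term, occurrences)
# brown [{'position': 1, 'start_offset': 6, 'end_offset': 11}]
# fox [{'position': 2, 'start_offset': 12, 'end_offset': 15}]
# quick [{'position': 0, 'start_offset': 0, 'end_offset': 5}]
```

Each term maps back to its origin in the original string through the offsets, which is the information a highlighter needs to mark the matching text.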

The term_vector setting accepts:

no

No term vectors are stored. (default)

yes

Just the terms in the field are stored.

with_positions

Terms and positions are stored.

with_offsets

Terms and character offsets are stored.

with_positions_offsets

Terms, positions, and character offsets are stored.

with_positions_payloads

Terms, positions, and payloads are stored.

with_positions_offsets_payloads

Terms, positions, character offsets, and payloads are stored.
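
The options above can be summarized as a lookup table. The following sketch only restates the list in code. The names `STORED_COMPONENTS` and `options_storing` are hypothetical helpers, not part of any Elasticsearch client.

```python
# What each term_vector value stores, as described in the list above.
STORED_COMPONENTS = {
    "no": set(),
    "yes": {"terms"},
    "with_positions": {"terms", "positions"},
    "with_offsets": {"terms", "offsets"},
    "with_positions_offsets": {"terms", "positions", "offsets"},
    "with_positions_payloads": {"terms", "positions", "payloads"},
    "with_positions_offsets_payloads": {"terms", "positions", "offsets", "payloads"},
}


def options_storing(*required):
    """Return the term_vector values that store every required component."""
    needed = set(required)
    return [opt for opt, stored in STORED_COMPONENTS.items() if needed <= stored]


print(options_storing("positions", "offsets"))
# ['with_positions_offsets', 'with_positions_offsets_payloads']
```

A table like this can help when choosing the smallest option that still stores the components a feature needs.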

The fast vector highlighter requires with_positions_offsets. The term vectors API can retrieve whatever is stored.

Setting with_positions_offsets will double the size of a field’s index.

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "text": {
                "type": "text",
                "term_vector": "with_positions_offsets"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="my-index-000001",
    id="1",
    document={
        "text": "Quick brown fox"
    },
)
print(resp1)

resp2 = client.search(
    index="my-index-000001",
    query={
        "match": {
            "text": "brown fox"
        }
    },
    highlight={
        "fields": {
            "text": {}
        }
    },
)
print(resp2)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        text: {
          type: 'text',
          term_vector: 'with_positions_offsets'
        }
      }
    }
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 1,
  body: {
    text: 'Quick brown fox'
  }
)
puts response

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      match: {
        text: 'brown fox'
      }
    },
    highlight: {
      fields: {
        text: {}
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      text: {
        type: "text",
        term_vector: "with_positions_offsets",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "my-index-000001",
  id: 1,
  document: {
    text: "Quick brown fox",
  },
});
console.log(response1);

const response2 = await client.search({
  index: "my-index-000001",
  query: {
    match: {
      text: "brown fox",
    },
  },
  highlight: {
    fields: {
      text: {},
    },
  },
});
console.log(response2);

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "text": {
        "type":        "text",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "text": "Quick brown fox"
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {} 
    }
  }
}

In the search request, the fast vector highlighter is used by default for the text field because term vectors are enabled on it.
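
The stored term vectors can also be read back through the term vectors API mentioned earlier. The sketch below assumes the Python client's `termvectors` method and the usual response layout (`term_vectors`, then the field name, then `terms`, then `tokens`). The helper name `term_offsets` is hypothetical, and the function assumes the document exists.

```python
def term_offsets(client, index, doc_id, field):
    """Return {term: [(position, start_offset, end_offset), ...]} for one field.

    `client` is expected to be an Elasticsearch Python client instance.
    Raises KeyError if the response has no term vectors for `field`.
    """
    resp = client.termvectors(
        index=index,
        id=doc_id,
        fields=[field],
        positions=True,  # include term positions in the response
        offsets=True,    # include start and end character offsets
    )
    terms = resp["term_vectors"][field]["terms"]
    return {
        term: [
            (tok["position"], tok["start_offset"], tok["end_offset"])
            for tok in info["tokens"]
        ]
        for term, info in terms.items()
    }
```

With the index and document from the example above, a call such as `term_offsets(client, "my-index-000001", "1", "text")` should return one entry per analyzed term.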