WARNING: Version 5.x has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Testing analyzers
When building your own analyzers, it's useful to test that they behave as expected. This is where the Analyze API comes in.
Testing built-in analyzers
To get started with the Analyze API, we can test how a built-in analyzer analyzes a piece of text:
```csharp
var analyzeResponse = client.Analyze(a => a
    .Analyzer("standard")
    .Text("F# is THE SUPERIOR language :)")
);
```
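For reference, the fluent call above translates into a request to the Analyze API along these lines; this is a sketch of the request body NEST sends in 5.x, not a capture of actual wire traffic:

```json
{
  "analyzer": "standard",
  "text": ["F# is THE SUPERIOR language :)"]
}
```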
This returns the following response from Elasticsearch
```json
{
  "tokens": [
    { "token": "f", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0 },
    { "token": "is", "start_offset": 3, "end_offset": 5, "type": "<ALPHANUM>", "position": 1 },
    { "token": "the", "start_offset": 6, "end_offset": 9, "type": "<ALPHANUM>", "position": 2 },
    { "token": "superior", "start_offset": 10, "end_offset": 18, "type": "<ALPHANUM>", "position": 3 },
    { "token": "language", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 4 }
  ]
}
```
which NEST deserializes to an instance of IAnalyzeResponse that we can work with:

```csharp
foreach (var analyzeToken in analyzeResponse.Tokens)
{
    Console.WriteLine($"{analyzeToken.Token}");
}
```
In testing the standard analyzer on our text, we've noticed that

- F# is tokenized as "f"
- stop word tokens "is" and "the" are included
- "superior" is included, but we'd also like to tokenize "great" as a synonym for superior
Next, we'll look at how to test a combination of built-in analysis components and build an analyzer that fits our needs.
Testing built-in analysis components
A transient analyzer can be composed from built-in analysis components to test an analysis configuration:
```csharp
var analyzeResponse = client.Analyze(a => a
    .Tokenizer("standard")
    .Filter("lowercase", "stop")
    .Text("F# is THE SUPERIOR language :)")
);
```
```json
{
  "tokens": [
    { "token": "f", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0 },
    { "token": "superior", "start_offset": 10, "end_offset": 18, "type": "<ALPHANUM>", "position": 3 },
    { "token": "language", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 4 }
  ]
}
```
Great! This has removed the stop words, but we still have F# tokenized as "f", and no "great" synonym for "superior".
Character and Token filters are applied in the order in which they are specified.
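To see why ordering matters, consider swapping the two token filters from the example above. The stop filter's default stopword list is all lowercase, so running it before lowercase would likely let uppercase stop words such as "THE" slip through. This is a sketch to illustrate the ordering rule, not output verified against a cluster:

```csharp
var analyzeResponse = client.Analyze(a => a
    .Tokenizer("standard")
    // "stop" runs before "lowercase" here, so the token "THE" is still
    // uppercase when the (lowercase) stopword list is consulted and is
    // likely to survive into the output as "the"
    .Filter("stop", "lowercase")
    .Text("F# is THE SUPERIOR language :)")
);
```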
Let’s build a custom analyzer with additional components to solve this.
Testing a custom analyzer in an index
A custom analyzer can be created within an index, either when creating the index or by updating the settings of an existing index. When adding an analyzer to an existing index, the index must be closed first.
In this example, we'll add a custom analyzer to an existing index. First, we close the index:
```csharp
client.CloseIndex("analysis-index");
```
Now, we can update the settings to add the analyzer:
```csharp
client.UpdateIndexSettings("analysis-index", i => i
    .IndexSettings(s => s
        .Analysis(a => a
            .CharFilters(cf => cf
                .Mapping("my_char_filter", m => m
                    .Mappings("F# => FSharp")
                )
            )
            .TokenFilters(tf => tf
                .Synonym("my_synonym", sf => sf
                    .Synonyms("superior, great")
                )
            )
            .Analyzers(an => an
                .Custom("my_analyzer", ca => ca
                    .Tokenizer("standard")
                    .CharFilters("my_char_filter")
                    .Filters("lowercase", "stop", "my_synonym")
                )
            )
        )
    )
);
```
And open the index again. Here, we also wait up to five seconds for the status of the index to become green:
```csharp
client.OpenIndex("analysis-index");

client.ClusterHealth(h => h
    .WaitForStatus(WaitForStatus.Green)
    .Index("analysis-index")
    .Timeout(TimeSpan.FromSeconds(5))
);
```
With the index open and ready, let’s test the analyzer
```csharp
var analyzeResponse = client.Analyze(a => a
    .Index("analysis-index")
    .Analyzer("my_analyzer")
    .Text("F# is THE SUPERIOR language :)")
);
```

Since we added the custom analyzer to the "analysis-index" index, we need to target that index to test it.
The output now looks like:
```json
{
  "tokens": [
    { "token": "fsharp", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0 },
    { "token": "superior", "start_offset": 10, "end_offset": 18, "type": "<ALPHANUM>", "position": 3 },
    { "token": "great", "start_offset": 10, "end_offset": 18, "type": "SYNONYM", "position": 3 },
    { "token": "language", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 4 }
  ]
}
```
Exactly what we were after!
Testing an analyzer on a field
It's also possible to test the analyzer associated with a given field mapping. Given an index created with the following settings and mappings:
```csharp
client.CreateIndex("project-index", i => i
    .Settings(s => s
        .Analysis(a => a
            .CharFilters(cf => cf
                .Mapping("my_char_filter", m => m
                    .Mappings("F# => FSharp")
                )
            )
            .TokenFilters(tf => tf
                .Synonym("my_synonym", sf => sf
                    .Synonyms("superior, great")
                )
            )
            .Analyzers(an => an
                .Custom("my_analyzer", ca => ca
                    .Tokenizer("standard")
                    .CharFilters("my_char_filter")
                    .Filters("lowercase", "stop", "my_synonym")
                )
            )
        )
    )
    .Mappings(m => m
        .Map<Project>(mm => mm
            .Properties(p => p
                .Text(t => t
                    .Name(n => n.Name)
                    .Analyzer("my_analyzer")
                )
            )
        )
    )
);
```
The analyzer on the name field can be tested with:
```csharp
var analyzeResponse = client.Analyze(a => a
    .Index("project-index")
    .Field<Project>(f => f.Name)
    .Text("F# is THE SUPERIOR language :)")
);
```
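NEST resolves the field expression `f => f.Name` to the field name in the mapping, so the request body sent to the Analyze API should be along these lines; this is a sketch of the 5.x request shape, not captured wire traffic:

```json
{
  "field": "name",
  "text": ["F# is THE SUPERIOR language :)"]
}
```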
Advanced details with Explain
It's possible to get more advanced details about analysis by setting Explain() on the request.
For this example, we’ll use Object Initializer syntax instead of the Fluent API; choose whichever one you’re most comfortable with!
```csharp
var analyzeRequest = new AnalyzeRequest
{
    Analyzer = "standard",
    Text = new[] { "F# is THE SUPERIOR language :)" },
    Explain = true
};

var analyzeResponse = client.Analyze(analyzeRequest);
```
We now get further details back in the response:
```json
{
  "detail": {
    "custom_analyzer": false,
    "analyzer": {
      "name": "standard",
      "tokens": [
        { "token": "f", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0, "bytes": "[66]", "positionLength": 1 },
        { "token": "is", "start_offset": 3, "end_offset": 5, "type": "<ALPHANUM>", "position": 1, "bytes": "[69 73]", "positionLength": 1 },
        { "token": "the", "start_offset": 6, "end_offset": 9, "type": "<ALPHANUM>", "position": 2, "bytes": "[74 68 65]", "positionLength": 1 },
        { "token": "superior", "start_offset": 10, "end_offset": 18, "type": "<ALPHANUM>", "position": 3, "bytes": "[73 75 70 65 72 69 6f 72]", "positionLength": 1 },
        { "token": "language", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 4, "bytes": "[6c 61 6e 67 75 61 67 65]", "positionLength": 1 }
      ]
    }
  }
}
```