Elasticsearch Simple Query String Query

In this tutorial, we’re going to look at the Elasticsearch Simple Query String Query, which lets us specify AND, OR, and NOT conditions and multi-field search within a single query string. Unlike the query_string query (which is recommended for expert users only), the simple_query_string query discards invalid parts of the query and never throws an exception for bad syntax.

I. Simple Query String Query

Here is an example that uses simple_query_string to find documents whose title or tags field matches the query string:
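The request itself is missing from this copy of the tutorial. A minimal sketch in Python, assuming the values from the parameter walkthrough that follows (query, fields, and default_operator = and), might build the body like this. It only prints the JSON; the endpoint in the comment uses a placeholder index name.

```python
import json

# Body for a search request, e.g. GET /<your_index>/_search
# (the index name is not given in the tutorial, so it stays a placeholder).
search_body = {
    "query": {
        "simple_query_string": {
            "query": "flow +(processor | publisher) -submissionpublisher",
            "fields": ["title^3", "tags"],
            "default_operator": "and",
        }
    }
}

print(json.dumps(search_body, indent=2))
```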

The response will be like this:

In the example above, we use some top level parameters:
query: flow +(processor | publisher) -submissionpublisher
With default_operator = and, this query asks for documents containing:
“flow” AND (“processor” OR “publisher”) AND NOT “submissionpublisher”

fields: [ "title^3", "tags" ]
It specifies which fields to search and how each field’s score is weighted (roughly title*3 + tags).

Elasticsearch simple_query_string supports the following top level parameters:
– query
– fields
– default_operator
– flags
– analyze_wildcard
– analyzer
– quote_field_suffix
– lenient
– minimum_should_match

The next part of this tutorial explains these parameters in more detail.

II. Top Level Parameters

1. Query

query is the actual query string to be parsed. Elasticsearch supports the following special characters in the query string:
+ : AND operation
| : OR operation
- : negates a single token
" : wraps a number of tokens to signify a phrase for searching (e.g: “java sample approach”)
* : at the end of a term to signify a prefix query (e.g: java*)
( and ) : signify precedence
~N : after a word to signify edit distance (fuzziness)
~N : after a phrase to signify slop amount

*Note: In order to search for any of these special characters, we must escape them with \.
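As a rough illustration of the escaping rule, here is a small Python helper that puts a backslash in front of each special character from the list above. The function name escape_simple_query is hypothetical, and the helper escapes conservatively (every occurrence, plus the backslash itself).

```python
# Operator characters from the list above, plus the backslash itself
# so that existing backslashes stay literal.
SPECIAL_CHARS = set('+|-"*()~\\')


def escape_simple_query(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_simple_query("c++ (draft)"))  # prints: c\+\+ \(draft\)
```

Whether the escaped character then matches anything still depends on the analyzer of the searched field, since many analyzers drop punctuation.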

Default Operator

The default_operator value (defaults to OR) matters a lot because it can change the behavior of a query significantly.

For example, "default_operator": "or" behaves differently from "default_operator": "and".
Applying the or operator to the query tells Elasticsearch that we want documents that contain “flow” or “publisher”, or documents that don’t contain “submissionpublisher”.

The response may contain many documents that are unrelated to what we intended (for example, title: “How to integrate Angular 4 with SpringBoot RestApi”).
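To make the comparison concrete, the sketch below builds the same request twice and changes only default_operator. The query string "flow publisher -submissionpublisher" and the field list are assumptions inferred from the description above, not the tutorial's original request.

```python
import json


def build_query(default_operator: str) -> dict:
    """Return a simple_query_string body using the given default operator."""
    return {
        "query": {
            "simple_query_string": {
                # Assumed query string, inferred from the text above.
                "query": "flow publisher -submissionpublisher",
                "fields": ["title", "tags"],
                "default_operator": default_operator,
            }
        }
    }


for op in ("or", "and"):
    print(op, "->", json.dumps(build_query(op)))
```

With "and", every clause must hold, so a document has to contain both terms and must not contain “submissionpublisher”.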

Fuzziness

Add ~N after a word to make it a fuzzy query, where N is the maximum edit distance:
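The original request is not available here, so the following Python sketch uses an assumed term: “proccessor”, a deliberate misspelling that is one edit away from “processor”. The helper name fuzzy_term and the title field are illustrative choices.

```python
import json


def fuzzy_term(term: str, max_edits: int) -> str:
    """Append ~N so the term also matches words within max_edits edits."""
    return f"{term}~{max_edits}"


fuzzy_body = {
    "query": {
        "simple_query_string": {
            # "proccessor" is misspelled on purpose (one extra "c").
            "query": fuzzy_term("proccessor", 1),
            "fields": ["title"],
        }
    }
}

print(json.dumps(fuzzy_body, indent=2))
```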

Response:

2. Fields

fields specifies the array of fields to run the parsed query against. It defaults to *, so if we don’t add the fields parameter to the request, Elasticsearch automatically determines which existing fields in the index’s mapping are queryable and searches those fields.

We can also use wildcard patterns in field names, or weight a field’s score with the ^ symbol:

Elasticsearch knows that t* matches the title and tags fields, and the score calculated from the description field is multiplied by 3.
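A sketch of such a request in Python, assuming the fields value is ["t*", "description^3"] as the explanation above implies. The search term and the list of mapped field names are hypothetical, and the pattern check with fnmatchcase is only a local illustration; Elasticsearch resolves the pattern against the index mapping itself.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical field names; in Elasticsearch these come from the index mapping.
mapped_fields = ["title", "tags", "description"]

# Local illustration of which names the pattern t* selects.
matched = [name for name in mapped_fields if fnmatchcase(name, "t*")]
print(matched)

fields_body = {
    "query": {
        "simple_query_string": {
            "query": "java",  # assumed search term
            "fields": ["t*", "description^3"],
        }
    }
}

print(json.dumps(fields_body, indent=2))
```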

3. Flags

flags specifies which parsing features should be enabled (ALL, NONE, AND, OR, NOT, PREFIX, PHRASE, PRECEDENCE, ESCAPE, WHITESPACE, FUZZY, NEAR, and SLOP). Defaults to ALL.

To enable several flags at once, we separate them with |:
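For instance, the Python sketch below enables only the OR, AND, and PREFIX features. The query string and the title field are assumptions made for illustration. With these flags, only |, + and a trailing * are parsed as operators; the other special characters are not.

```python
import json

enabled_flags = ["OR", "AND", "PREFIX"]

flags_body = {
    "query": {
        "simple_query_string": {
            "query": "flow | publisher + process*",  # assumed query string
            "fields": ["title"],
            # Flags are passed as one string, joined with |.
            "flags": "|".join(enabled_flags),
        }
    }
}

print(json.dumps(flags_body, indent=2))
```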

Response:

4. Analyzing

analyze_wildcard: If true, analyze the prefix. Defaults to false.

analyzer: forces the analyzer that is used to analyze each term of the query when creating composite queries.
For example:

Documents with title containing both “reactive” and “and” will match the query.
But if we add an analyzer like this:

Because “and” is a stop word, it is removed from the query, so documents whose title contains “reactive” (“and” is no longer required) will match. The response could include a document with the title “Java 9 Flow API – Reactive Streams”.
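The two requests being compared can be sketched in Python as follows. The query string "reactive and", default_operator = and, and the choice of the built-in stop analyzer are assumptions consistent with the description above; the original requests may have differed.

```python
import json


def title_query(analyzer=None) -> dict:
    """Build the query, optionally forcing a query-time analyzer."""
    params = {
        "query": "reactive and",  # assumed query string
        "fields": ["title"],
        "default_operator": "and",
    }
    if analyzer is not None:
        params["analyzer"] = analyzer
    return {"query": {"simple_query_string": params}}


# Without an explicit analyzer, both "reactive" and "and" are required.
print(json.dumps(title_query()))

# With the stop analyzer, the stop word "and" is dropped from the query.
print(json.dumps(title_query("stop")))
```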

More about Analyzer:
Basic Analyzers
Custom Analyzer

5. Quote Field Suffix

quote_field_suffix: appends a suffix to fields for quoted parts of the query string.

For example, suppose we have a title field with analyzer_A, and a title.exact sub-field inside title with analyzer_B. quote_field_suffix lets us mix exact search with stemming in one query string by wrapping a word in double quotes:

The query matches “integrating” against the title.exact field (analyzer_B) and “jpa” against the title field (analyzer_A).

This is full example:
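The full example did not survive in this copy, so here is a minimal Python sketch of what it could look like. It assumes english for analyzer_A and standard for analyzer_B, uses the typeless mapping format of Elasticsearch 7 and later, and only builds and prints the two JSON bodies (index creation and search).

```python
import json

# Body for creating the index: title is stemmed, title.exact is not.
index_body = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "english",  # stands in for analyzer_A
                "fields": {
                    # stands in for analyzer_B
                    "exact": {"type": "text", "analyzer": "standard"}
                },
            }
        }
    }
}

# The quoted word is searched in title.exact, the unquoted word in title.
quote_search_body = {
    "query": {
        "simple_query_string": {
            "query": '"integrating" jpa',
            "fields": ["title"],
            "quote_field_suffix": ".exact",
        }
    }
}

print(json.dumps(index_body, indent=2))
print(json.dumps(quote_search_body, indent=2))
```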

Check Response:

6. Others

lenient: If true, format based failures (like searching for text in a numeric field) are ignored. Defaults to false.

For example, this query causes a number_format_exception:


If we set lenient to true:
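A sketch of both variants in Python, assuming a hypothetical numeric field named views and an assumed text search term. Only the lenient value differs between the two bodies.

```python
import json


def numeric_field_query(lenient: bool) -> dict:
    """Search a text term in a numeric field, with or without lenient."""
    return {
        "query": {
            "simple_query_string": {
                "query": "java",      # text term (assumed)
                "fields": ["views"],  # hypothetical numeric field
                "lenient": lenient,
            }
        }
    }


# lenient = false: the text term cannot be parsed as a number.
print(json.dumps(numeric_field_query(False)))

# lenient = true: the format failure is ignored instead of raised.
print(json.dumps(numeric_field_query(True)))
```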

There is no exception, and the response:

minimum_should_match: The minimum number of clauses that must match for a document to be returned. See minimum_should_match for options.
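As a small illustration, the Python sketch below asks for at least two of three optional terms to match. The query string, field list, and the value "2" are assumptions chosen for the example.

```python
import json

msm_body = {
    "query": {
        "simple_query_string": {
            # Three optional clauses because default_operator is "or".
            "query": "flow processor publisher",
            "fields": ["title", "tags"],
            "default_operator": "or",
            # At least two of the three clauses must match.
            "minimum_should_match": "2",
        }
    }
}

print(json.dumps(msm_body, indent=2))
```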

By grokonez | November 25, 2017.

