Elasticsearch Compound Queries – Function Score Query

When we run a query, we sometimes want to modify the score of the documents in the result. This tutorial shows how function_score can help us do that.

I. Function Score Query

To use function_score, we have to define a query, then add one or more functions:
– one function:

In the example above, we use just one function: random_score, with boost_mode set to sum so that the boost value is added to the score. Each document in the result therefore has score = random value in [0, 1) + 3.
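The request listing itself is not reproduced here, so the following is only a sketch of a request body that fits the description, written as a Python dictionary that can be serialized to JSON. The match_all query and the use of the boost parameter to carry the value 3 are assumptions.

```python
import json

# Hypothetical request body: a single random_score function whose value is
# added (boost_mode "sum") to the boosted query score, as the text describes.
single_function_query = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},  # assumed base query
            "boost": 3,                   # the value the text adds to the random score
            "random_score": {},
            "boost_mode": "sum",
        }
    }
}

print(json.dumps(single_function_query, indent=2))
```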

– combine several functions:

The example above:
– gets all documents with a match_all query (some of them are then excluded by the min_score parameter)
– boosts them by 2
– uses 2 functions:
+ random_score with a weight, for the first filter
+ weight, for the second filter
(the supported functions are described in section III)

score_mode: sum means the new score is the sum of the scores of the given functions. The other score modes are listed in section II.

boost_mode: sum means the query score (match_all with boost) and the function score (after applying score_mode) are added together. The other boost modes are listed in section II.

max_boost caps the new score (the combined function score) at the specified value. It defaults to FLT_MAX.
In the example, where score_mode is sum, new_score can range from 0 to 5, so max_boost = 4 limits it to 4.

So the maximum score of any document is 6, because with boost_mode set to sum:
max_score = boost + max_boost = 2 + 4 = 6

min_score: excludes documents that do not reach a minimum score.
Modifying the score does not change which documents match, so min_score = 3 is what excludes every document whose score (after all calculations) is lower than 3.

*Note: For min_score to work, all documents returned by the query need to be scored.
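The arithmetic behind max_boost and min_score can be sketched in a few lines of Python. This is a simplified model of the description above, not Elasticsearch code; the function name final_score is made up for illustration.

```python
def final_score(query_score, function_score, max_boost, min_score):
    """Model boost_mode = sum with max_boost and min_score.

    Returns None when the document would be excluded by min_score.
    """
    capped = min(function_score, max_boost)  # max_boost caps the function score
    total = query_score + capped             # boost_mode: sum
    return total if total >= min_score else None


# Values from the example: boosted match_all score 2, max_boost 4, min_score 3.
print(final_score(2.0, 5.0, 4.0, 3.0))  # 6.0
print(final_score(2.0, 0.5, 4.0, 3.0))  # None
```

The first call shows the cap at work (5 is reduced to 4 before the sum), the second shows a document dropped because 2.5 is below the minimum of 3.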

II. Mode

1. Score Mode

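score_mode specifies how the scores computed by the defined functions are combined. The available modes are:
– multiply: scores are multiplied (default)
– sum: scores are summed
– avg: scores are averaged
– first: the first function that has a matching filter is applied
– max: the maximum score is used
– min: the minimum score is used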

When score_mode = avg, the individual scores are combined by a weighted average.
For example:
+ function_1: score = 1, weight = 3
+ function_2: score = 2, weight = 4
=> score = (1*3+2*4)/(3+4) = 11/7
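The score modes, including the weighted average above, can be modelled as follows. This is a minimal sketch that assumes every function matches the document; combine_functions is a hypothetical helper, not part of any Elasticsearch client.

```python
from math import prod


def combine_functions(scores, weights, score_mode="multiply"):
    """Combine per-function scores; weights[i] belongs to scores[i]."""
    weighted = [s * w for s, w in zip(scores, weights)]
    if score_mode == "multiply":
        return prod(weighted)
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        # Weighted average: divide by the sum of the weights, not the count.
        return sum(weighted) / sum(weights)
    if score_mode == "first":
        return weighted[0]
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


# The example above: (1*3 + 2*4) / (3 + 4) = 11/7
print(combine_functions([1, 2], [3, 4], "avg"))
```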

2. Boost Mode

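boost_mode defines how the final score is calculated from the query score and the function score:
– multiply: query score and function score are multiplied (default)
– replace: only the function score is used, the query score is ignored
– sum: query score and function score are added
– avg: the average of query score and function score
– max: the larger of query score and function score
– min: the smaller of query score and function score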
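As a quick illustration, the boost modes map onto simple two-argument operations. The helper name apply_boost_mode is an assumption made for this sketch.

```python
def apply_boost_mode(query_score, function_score, boost_mode="multiply"):
    """Combine the query score with the (already combined) function score."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,       # query score is ignored
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)


for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(mode, apply_boost_mode(2.0, 4.0, mode))
```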

III. Types of score functions

1. Script Score

If we want to compute a custom score from a document's numeric field values, we can use the script_score function:
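The original request is not shown here, so the following is only a sketch of what a script_score request body might look like, built as a Python dictionary. The numeric field age, the script expression, and the use of the "source" key (as in Elasticsearch 6.x and later) are all assumptions.

```python
import json

# Hypothetical request body: score each document by a Painless expression
# over a numeric field. The field name "age" is assumed for illustration.
script_score_query = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "script_score": {
                "script": {
                    "source": "Math.log(2 + doc['age'].value)"
                }
            },
        }
    }
}

print(json.dumps(script_score_query, indent=2))
```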

Response:

2. Weight

The weight function multiplies the score by the weight value.

In many cases, scores are on different scales (0 to 1, or 5 to 10…). If we have more than one function, we may want each function to have a different impact, set by its weight (its level of importance). The weight is multiplied with the score computed by the respective function.

If weight is given without any other function declaration, it simply returns the weight value when the document matches.

The weight value is of type float.
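Both uses of weight described above can be captured in a small model. The function apply_weight is hypothetical and only mirrors the two cases in the text.

```python
from typing import Optional


def apply_weight(weight: float, function_score: Optional[float] = None) -> float:
    """Return the weighted score of one function entry.

    With no accompanying function, the entry contributes the weight itself;
    otherwise the function's score is multiplied by the weight.
    """
    if function_score is None:
        return weight
    return function_score * weight


print(apply_weight(3.0))       # weight only
print(apply_weight(2.0, 0.5))  # weight applied to a function score of 0.5
```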

3. Random Score

The random_score function generates scores that are uniformly distributed from 0 up to, but not including, 1.

Each time we send a request with the random_score function, the scores change. To make them reproducible, we can provide a seed. Elasticsearch then generates the random score from this seed together with:
– a hash of the _uid field (Elasticsearch versions released before 6.0)

– the minimum value of a given field for the considered document, plus a salt computed from the index name and shard id, so that documents that have the same value but are stored in different indexes get different scores (from Elasticsearch 6.0).
A good default choice for that field might be _seq_no, whose only drawback is that scores change when the document is updated, since update operations also update the value of the _seq_no field.
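For Elasticsearch 6.0 and later, a reproducible random_score could be requested along these lines. This is a sketch as a Python dictionary; the seed value 10 and the match_all query are arbitrary choices for illustration.

```python
import json

# Hypothetical request body: same seed + same field value => same score.
reproducible_random_query = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "random_score": {
                "seed": 10,          # arbitrary example seed
                "field": "_seq_no",  # field whose value feeds the random score
            },
        }
    }
}

print(json.dumps(reproducible_random_query, indent=2))
```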

4. Field Value Factor

If we want to use a field from the documents to influence the score, the field_value_factor function can help us, and it avoids the overhead of scripting (as with the script_score function).

For example, we wish to influence the score with field age:

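field: the field to read from the document.

factor: value to multiply the field value with. Defaults to 1.

modifier: function applied to the field value after it is multiplied by factor (defaults to none)
+ none: do not apply any modifier
+ log: take the common logarithm (base 10)
+ log1p: add 1 to the field value and take the common logarithm
+ log2p: add 2 to the field value and take the common logarithm
+ ln: take the natural logarithm
+ ln1p: add 1 to the field value and take the natural logarithm
+ ln2p: add 2 to the field value and take the natural logarithm
+ square: square the field value
+ sqrt: take the square root of the field value
+ reciprocal: take 1/x, where x is the field value

missing: value used if the document doesn't have the specified field.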
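To make the parameters concrete, here is a generic Python model of the computation, independent of the article's own example. The helper field_value_factor and the sample numbers are assumptions; the model applies the modifier to factor * value.

```python
import math

MODIFIERS = {
    "none": lambda v: v,
    "log": math.log10,
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "ln": math.log,
    "ln1p": lambda v: math.log(v + 1),
    "ln2p": lambda v: math.log(v + 2),
    "square": lambda v: v * v,
    "sqrt": math.sqrt,
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Score contribution of one field value; `value` is None if the field is absent."""
    if value is None:
        if missing is None:
            raise ValueError("field is missing and no 'missing' value was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


print(field_value_factor(50, factor=2, modifier="sqrt"))    # sqrt(2 * 50)
print(field_value_factor(None, modifier="ln1p", missing=0))  # ln(0 + 1)
```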

So the function in the age example above is translated into the formula:

Response:

5. Decay Functions

Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user-given origin. This is similar to a range query, but with smooth edges instead of boxes.

DECAY_FUNCTION: one of linear, exp, or gauss.

FIELD must be a numeric, date, or geo-point field.

origin: point of origin used for calculating the distance (must be a number, date or geo-point).
Required for geo and numeric fields.
For date fields the default is now. Date math (for example now-1h) is supported for origin.

scale: distance from origin + offset at which the computed score equals the decay parameter.
For geo fields: can be defined as number+unit (1km, 12m,…). Default unit is meters.
For date fields: can be defined as number+unit (1h, 10d,…). Default unit is milliseconds.
For numeric fields: any number.

offset: (optional) the decay function is only computed for documents with a distance greater than the defined offset; documents within the offset keep the full score. Defaults to 0.

decay: (optional) defines how documents are scored at the distance given by scale. Defaults to 0.5.
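How origin, scale, offset and decay interact can be seen in a small numeric model for plain numeric fields. This is a sketch based on the decay formulas in the Elasticsearch documentation, not the engine's code; decay_score is a made-up name.

```python
import math


def decay_score(kind, value, origin, scale, offset=0.0, decay=0.5):
    """Score in (0, 1] that shrinks as `value` moves away from `origin`."""
    # Distances inside the offset count as zero, so those documents score 1.0.
    distance = max(0.0, abs(value - origin) - offset)
    if kind == "gauss":
        sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
        return math.exp(-(distance ** 2) / (2.0 * sigma_sq))
    if kind == "exp":
        return math.exp(math.log(decay) / scale * distance)
    if kind == "linear":
        s = scale / (1.0 - decay)
        return max((s - distance) / s, 0.0)
    raise ValueError(f"unknown decay function: {kind}")


# At distance offset + scale from the origin, every function yields `decay`.
for kind in ("linear", "exp", "gauss"):
    print(kind, decay_score(kind, value=25, origin=10, scale=10, offset=5))
```

With origin 10, offset 5 and scale 10, the value 25 sits exactly at offset + scale, so each function returns approximately the default decay of 0.5, while the linear function drops to 0 beyond a distance of scale / (1 - decay) past the offset.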

By grokonez | November 21, 2017.


