When we run a query, we sometimes want to modify the scores of the documents in the result. This tutorial shows how `function_score` can help us.

### I. Function Score Query

To use `function_score`, we have to define a query, then add one or more functions:

– one function:

    GET javasampleapproach/tutorial/_search
    {
      "query": {
        "function_score": {
          "query": { "match_all": {} },
          "boost": "3",
          "random_score": {},
          "boost_mode": "sum"
        }
      }
    }

In the example above, we use just one function, `random_score`, with `boost_mode` set to **sum**, which adds the `boost` value to the function score. So each document in the result will have score = random_value[0..1] + 3.

– combine several functions:

    GET javasampleapproach/tutorial/_search
    {
      "query": {
        "function_score": {
          "query": { "match_all": {} },
          "boost": "2",
          "functions": [
            {
              "filter": { "match": { "title": "flow" } },
              "random_score": {},
              "weight": 2
            },
            {
              "filter": { "match": { "title": "processor" } },
              "weight": 3
            }
          ],
          "score_mode": "sum",
          "boost_mode": "sum",
          "max_boost": 4,
          "min_score": 3
        }
      }
    }

The example above:

– gets all documents with a `match_all` query (some of them will then be excluded by the `min_score` parameter).

– applies a `boost` of `2` to them.

– uses 2 functions:

+ `random_score` with a `weight` for the first filter

+ `weight` alone for the second filter

(the supported functions are described in section III)

– `score_mode`: **sum** means the new score is the sum of the given functions' scores. The other score modes are listed in section II.

– `boost_mode`: **sum** means the query score (`match_all` with `boost`) and the function score (after applying `score_mode`) are added. The other boost modes are listed in section II.

– `max_boost` restricts the new score so that it does not exceed the specified value. Defaults to **FLT_MAX**.

In the example (`score_mode` is **sum**):

    new_score = function1_score + function2_score
              = weight1*random_score*ifMatch + weight2*ifMatch
              = 2*[0..1] + 3*[0/1]

=> **new_score** can range from `0` to `5`, so `max_boost = 4` limits it to `4`.

The max score of all documents therefore cannot exceed **6**, because `boost_mode` is **sum**:

max_score = `boost` + `max_boost` = 2 + 4 = 6

– `min_score`: excludes documents that do not meet the minimum score.

Modifying the score does not change which documents match, so it is `min_score = 3` that excludes all documents whose score (after all calculations) is lower than `3`.

*Note: for `min_score` to work, all documents returned by the query need to be scored.*
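The arithmetic of the combined example can be sketched in Python. This is only a model of the formula given above, not Elasticsearch code: the function name and its parameters are made up for illustration, and a function whose filter does not match is assumed to contribute 0, as in the formula. How Elasticsearch scores a document that matches none of the filters is not modelled here.

```python
def function_score_example(random_value, matches_flow, matches_processor,
                           boost=2.0, max_boost=4.0, min_score=3.0):
    """Return the final score, or None if min_score excludes the document."""
    # Function 1: random_score with weight 2, counted only if its filter matches.
    f1 = 2.0 * random_value if matches_flow else 0.0
    # Function 2: plain weight 3, counted only if its filter matches.
    f2 = 3.0 if matches_processor else 0.0
    # score_mode = sum, then the result is capped by max_boost.
    function_score = min(f1 + f2, max_boost)
    # boost_mode = sum: boosted match_all score plus function score.
    final = boost + function_score
    # min_score drops documents whose final score is below the threshold.
    return final if final >= min_score else None


print(function_score_example(1.0, True, True))    # 6.0 (5 capped to 4, plus 2)
print(function_score_example(0.25, True, False))  # None (2.5 is below min_score)
```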

### II. Mode

##### 1. Score Mode

`score_mode` specifies how the scores computed by the defined functions are combined. The available modes are:

– multiply (default)

– sum

– avg

– first

– max

– min

When `score_mode = avg`, the individual scores are combined by a weighted average.

For example:

+ function_1: score = 1, weight = 3

+ function_2: score = 2, weight = 4

=> score = (1*3+2*4)/(3+4)
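The score modes can be sketched as a small Python helper. The name `combine_function_scores` and the `(score, weight)` pair layout are assumptions made for this illustration, and the helper expects at least one matching function.

```python
from functools import reduce
from operator import mul


def combine_function_scores(scored, score_mode="multiply"):
    """Combine (score, weight) pairs of the functions that matched a document."""
    # Each function's score is first multiplied by its weight.
    weighted = [score * weight for score, weight in scored]
    if score_mode == "multiply":
        return reduce(mul, weighted, 1.0)
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        # Weighted average: divide by the sum of the weights.
        return sum(weighted) / sum(weight for _, weight in scored)
    if score_mode == "first":
        return weighted[0]
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


functions = [(1, 3), (2, 4)]  # the two functions from the example above
print(combine_function_scores(functions, "avg"))  # (1*3 + 2*4) / (3 + 4)
```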

##### 2. Boost Mode

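`boost_mode` defines how the final score is calculated from the **query** score and the **function** score:

– multiply (default): query score and function score are multiplied

– replace: only the function score is used, the query score is ignored

– sum: query score and function score are added

– avg: average of the two

– max: maximum of the two

– min: minimum of the two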
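As a rough Python model of these modes (the helper name `apply_boost_mode` is made up for this sketch and is not part of any Elasticsearch client):

```python
def apply_boost_mode(query_score, function_score, boost_mode="multiply"):
    """Combine the query score with the already combined function score."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,        # the query score is ignored
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)


# Query score 2 (match_all with boost 2) and function score 4, as in section I.
print(apply_boost_mode(2.0, 4.0, "sum"))  # 6.0
```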

### III. Types of score functions

##### 1. Script Score

If we want to customize the score with a custom computation over a document's numeric field values, we can use the `script_score` function:

    GET jsa_customer_idx/customer/_search
    {
      "query": {
        "function_score": {
          "query": {
            "range": { "age": { "gte": 35 } }
          },
          "script_score": {
            "script": {
              "params": { "a": 1.2, "b": 5 },
              "source": "params.a*doc['age'].value + params.b"
            }
          }
        }
      }
    }

Response:

    {
      ...
      "hits": {
        "total": 4,
        "max_score": 54.2,
        "hits": [
          {
            "_index": "jsa_customer_idx",
            "_type": "customer",
            "_id": "9",
            "_score": 54.2,
            "_source": { "fullname": "Nasim Strong", "age": 41, ... }
          },
          {
            "_index": "jsa_customer_idx",
            "_type": "customer",
            "_id": "10",
            "_score": 50.6,
            "_source": { "fullname": "Keegan Blair", "age": 38, ... }
          },
          ...
        ]
      }
    }
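The scores in this response can be checked by hand. The short Python sketch below reproduces the script's formula, assuming the `range` query contributes a score of 1 so that the default `boost_mode` (multiply) leaves the script result unchanged. The function name `script_score` here is just a local helper.

```python
def script_score(age, a=1.2, b=5):
    """Same formula as the script: params.a * doc['age'].value + params.b."""
    return a * age + b


for name, age in [("Nasim Strong", 41), ("Keegan Blair", 38)]:
    # Rounded to one decimal to match the scores shown in the response.
    print(name, round(script_score(age), 1))
# Nasim Strong 54.2
# Keegan Blair 50.6
```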

##### 2. Weight

The `weight` function multiplies the score by the given weight value.

Scores are often on different scales (0 to 1, or 5 to 10…). If we have more than one function, we may want each function to have a different impact, set by its weight (its level of importance): `weight` is multiplied with the score computed by the respective function.

If `weight` is given without any other function declaration, it simply returns the weight value when the document matches.

The `number` value is of type **float**.

    "weight" : number

##### 3. Random Score

The `random_score` function generates scores that are uniformly distributed in [0, 1).

Each time we send a request with the `random_score` function, the scores change. We can avoid this by providing a `seed`. Elasticsearch will then generate the random score from this `seed` combined with:

– a hash of the `_uid` field (Elasticsearch versions released before **6.0**)

    "random_score": { "seed": 8 }

– the minimum value of `field` for the considered document and a salt computed from the **index name** and **shard id**, so that documents that have the same value but are stored in different indexes get different scores (from Elasticsearch 6.0).

A good default choice might be the `_seq_no` field, whose only drawback is that scores change if the document is updated, since update operations also update the value of the `_seq_no` field.

    "random_score": { "seed": 8, "field": "_seq_no" }

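To see why a seed makes the scores reproducible, here is a conceptual Python sketch. It is not Elasticsearch's actual hash function: the use of SHA-256, the salt format, and the name `seeded_score` are all assumptions chosen only to show that the same seed, field value, index name and shard id always give the same score.

```python
import hashlib


def seeded_score(seed, field_value, index_name, shard_id):
    """Deterministic pseudo-random score in [0, 1) (illustrative only)."""
    # Salt built from index name and shard id, so equal field values
    # stored in different indexes end up with different scores.
    salt = f"{index_name}:{shard_id}"
    digest = hashlib.sha256(f"{seed}|{field_value}|{salt}".encode()).digest()
    # Keep the top 53 bits of the hash and scale them into [0, 1).
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


first = seeded_score(8, 1234, "jsa_customer_idx", 0)
second = seeded_score(8, 1234, "jsa_customer_idx", 0)
print(first == second)  # True: same inputs, same score
```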
##### 4. Field Value Factor

If we want to use a field from the documents to influence the score, the `field_value_factor` function can help us avoid the overhead of scripting (as with the `script_score` function).

For example, we wish to influence the score with the field `age`:

    GET jsa_customer_idx/customer/_search
    {
      "query": {
        "function_score": {
          "field_value_factor": {
            "field": "age",
            "factor": 1.2,
            "modifier": "sqrt",
            "missing": 5
          }
        }
      }
    }

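– `field`: the field to read from the document.

– `factor`: value to multiply the field value with. Defaults to **1**.

– `modifier`: function applied to the field value (defaults to **none**)

+ none: no modifier is applied

+ log: take the common (base 10) logarithm of the field value

+ log1p: add 1 to the field value and take log

+ log2p: add 2 to the field value and take log

+ ln: take the natural logarithm of the field value

+ ln1p: add 1 to the field value and take ln

+ ln2p: add 2 to the field value and take ln

+ square: square the field value

+ sqrt: take the square root of the field value

+ reciprocal: take 1 divided by the field value

– `missing`: value used if the document doesn't have the specified `field`.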

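So the function in the example above translates into the following formula (the `missing` value stands in for the field value, so `factor` and `modifier` are still applied to it):

    if (document has an 'age' field) score = sqrt(1.2 * doc['age'].value) else score = sqrt(1.2 * 5)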

Response:

    {
      ...
      "hits": {
        "total": 11,
        "max_score": 7.0142713,
        "hits": [
          {
            ...
            "_id": "9",
            "_score": 7.0142713,
            "_source": { "fullname": "Nasim Strong", "age": 41, ... }
          },
          {
            ...
            "_id": "10",
            "_score": 6.7527776,
            "_source": { "fullname": "Keegan Blair", "age": 38, ... }
          },
          ...
        ]
      }
    }
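The scores in this response can be reproduced with a small Python model of `field_value_factor`. The helper and its `MODIFIERS` table are written for this tutorial only; they assume that `log` means base 10 and that the `missing` value is treated like a field value (so `factor` and `modifier` apply to it too), as described in the Elasticsearch documentation.

```python
import math

# Assumed mapping of modifier names to math functions (log = base 10).
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln": math.log,
    "ln1p": math.log1p,
    "ln2p": lambda x: math.log(x + 2),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1 / x,
}


def field_value_factor(doc, field, factor=1.0, modifier="none", missing=None):
    """Score = modifier(factor * field value), falling back to `missing`."""
    value = doc.get(field, missing)
    if value is None:
        raise ValueError(f"document has no '{field}' field and no 'missing' value")
    return MODIFIERS[modifier](factor * value)


print(field_value_factor({"age": 41}, "age", 1.2, "sqrt", 5))  # about 7.01427
print(field_value_factor({"age": 38}, "age", 1.2, "sqrt", 5))  # about 6.75278
```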

##### 5. Decay Functions

Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user-given origin. This is similar to a range query, but with smooth edges instead of boxes.

    "DECAY_FUNCTION": {
      "FIELD": {
        "origin": "8,9",
        "scale": "3km",
        "offset": "0km",
        "decay": 0.36
      }
    }

– `DECAY_FUNCTION`: one of `linear`, `exp`, or `gauss`.

– `FIELD` must be a **numeric**, **date**, or **geo-point** field.

– `origin`: point of origin used for calculating the distance (must be a **number**, **date** or **geo-point**).

Required for geo and numeric fields.

For date fields the default is **now**. Date math (for example `now-1h`) is supported for the origin.

– `scale`: distance from `origin` + `offset` at which the computed score will equal the `decay` parameter.

For **geo** fields: can be defined as number+unit (1km, 12m,…). The default unit is **meters**.

For **date** fields: can be defined as number+unit (1h, 10d,…). The default unit is **milliseconds**.

For **numeric** fields: any number.

– `offset`: (optional) the decay function is only computed for documents with a distance greater than the defined `offset`. Defaults to **0**.

– `decay`: (optional) defines how documents are scored at the distance given by `scale`. Defaults to **0.5**.
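The role of these parameters is easier to see with the decay formulas written out. The Python sketch below follows the formulas given in the Elasticsearch documentation for `gauss`, `exp` and `linear`, restricted to plain numeric fields (geo and date fields would first need a distance in meters or milliseconds). The function name `decay_score` is only a label for this example.

```python
import math


def decay_score(kind, value, origin, scale, offset=0.0, decay=0.5):
    """Decay score for a numeric field value (1.0 within the offset)."""
    # Distances inside the offset count as 0, so the score stays at 1.
    distance = max(0.0, abs(value - origin) - offset)
    if kind == "gauss":
        sigma_sq = -scale ** 2 / (2 * math.log(decay))
        return math.exp(-distance ** 2 / (2 * sigma_sq))
    if kind == "exp":
        lam = math.log(decay) / scale
        return math.exp(lam * distance)
    if kind == "linear":
        s = scale / (1 - decay)
        return max((s - distance) / s, 0.0)
    raise ValueError(f"unknown decay function: {kind}")


# At distance offset + scale from the origin, every function returns `decay`.
for kind in ("linear", "exp", "gauss"):
    print(kind, decay_score(kind, value=13, origin=8, scale=3, offset=2, decay=0.36))
```

Each of the three printed values is approximately 0.36, which matches the description of `scale` above: the score equals `decay` at a distance of `offset` + `scale` from `origin`.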
