ElasticSearch_06_ES data query

Foreword

We will cover ES data queries from five aspects:

1. Basic queries (similar to where field1 = xxx and field2 like "%xxx%" in MySQL)
2. _source filtering, i.e. returning only selected fields (similar to select field1, field2 from table_name in MySQL)
3. Advanced queries: bool combinations, ranges and fuzzy matching (similar to combining conditions with and/or/not, or using between, in a MySQL where clause)
4. Filtering the result rows without affecting relevance scores (similar to adding conditions to a where clause in MySQL)
5. Sorting (similar to order by field desc/asc in MySQL)

In short: basic queries (match, term), then bool, range and fuzzy queries, then filter, then _source field selection, and finally sort with order.

All ES statements used in this article: https://www.syjshare.com/res/5W547A7Z

1. Basic query

Basic syntax

GET /<index name>/_search
{
    "query":{
        "<query type>":{
            "<query condition>":"<condition value>"
        }
    }
}

Here, query is the query object. It can contain different types of queries.

  • Query type:
    • for example match_all, match, term, range, etc.
  • Query condition: how the condition is written depends on the query type. Each type is explained in detail below.
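Because the request body is plain JSON, it can also be built from code. The following is a minimal Python sketch that uses only the standard library. The helper name build_search_body is hypothetical. The resulting string is what you would send as the body of GET /myindex/_search:

```python
import json


def build_search_body(query_type, conditions):
    """Hypothetical helper: wrap one clause as {"query": {query_type: conditions}}."""
    return {"query": {query_type: conditions}}


# The match_all example from section 1.1 has no conditions, so it is an empty object.
body = build_search_body("match_all", {})
print(json.dumps(body))  # {"query": {"match_all": {}}}
```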

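Prepared data (shown as a screenshot in the original post): the index myindex holds two phone documents of type goods.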


1.1 Query all (match_all)

Example:

GET /myindex/_search
{
    "query":{
        "match_all": {}
    }
}
  • query: represents the query object
  • match_all: means query all

Result fields:

  • took: time spent on the query, in milliseconds
  • timed_out: whether the query timed out
  • _shards: shard information
  • hits: overview object of the search results
    • total: total number of matching documents
    • max_score: highest score among all matching documents
    • hits: array of matching documents; each element holds the information of one document
      • _index: index library
      • _type: document type
      • _id: document id
      • _score: document score
      • _source: The source data of the document
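This response structure is easy to consume from code. The following is a minimal Python sketch. It assumes the response has already been parsed from JSON into a dict. The function name extract_sources and the trimmed sample response are illustrative. Elasticsearch 7.x and later report hits.total as an object rather than the plain integer shown in this article, so the sketch accepts both forms:

```python
def extract_sources(response):
    """Return (total, list of _source dicts) from a parsed search response."""
    total = response["hits"]["total"]
    # 7.x and later: {"value": n, "relation": "eq"}; 6.x (this article): plain int.
    if isinstance(total, dict):
        total = total["value"]
    sources = [hit["_source"] for hit in response["hits"]["hits"]]
    return total, sources


# Trimmed-down response in the 6.x shape used in this article.
sample = {
    "took": 2,
    "timed_out": False,
    "hits": {
        "total": 1,
        "max_score": 1.0,
        "hits": [
            {"_index": "myindex", "_type": "goods", "_id": "3",
             "_score": 1.0, "_source": {"title": "Mi TV 4 A", "price": 3899}},
        ],
    },
}

total, sources = extract_sources(sample)
print(total, sources)  # 1 [{'title': 'Mi TV 4 A', 'price': 3899}]
```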

1.2 Match query (match)

1. Insert test data

First, add one more document for testing:

PUT /myindex/goods/3
{
    "title":"Mi TV 4 A",
    "price":3899.00
}

GET /myindex/_search
{
    "query":{
        "match_all": {}
    }
}

The index now contains 2 phones and 1 TV.

2. match uses or by default

A match query first analyzes (tokenizes) the query text into terms and then searches for them. By default, the terms are combined with or:

GET /myindex/_search
{
    "query":{
        "match":{
            "title":"Xiaomi TV"
        }
    }
}

or

GET /myindex/_search
{
    "query":{
        "match":{
            "title":{
                "query": "Xiaomi TV"
            }
        }
    }
}

or

GET /myindex/_search
{
    "query":{
        "match":{
            "title":{
              "query": "Xiaomi TV",
              "operator": "or"
            }
        }
    }
}


In the case above, a document matches as long as it contains any one of the terms produced from "Xiaomi TV". In other words, the terms are combined with or.

3. match with operator set to and

Sometimes we need more precise results and want the terms to be combined with and instead:

GET /myindex/_search
{
    "query":{
        "match": {
          "title": {
            "query": "Xiaomi TV",
            "operator": "and"
          }
        }
    }
}

result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "heima",
        "_type": "goods",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "title": "Mi TV 4 A",
          "price": 3899
        }
      }
    ]
  }
}

In this example, only documents that contain all the terms of "Xiaomi TV" are returned.
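The difference between the two operators can be shown with a deliberately simplified Python model. It assumes an analyzer that only lowercases and splits on whitespace; real analyzers, especially Chinese ones, tokenize differently. It uses hypothetical titles and ignores scoring entirely:

```python
def analyze(text):
    """Toy analyzer (an assumption, not ES's analyzer): lowercase, split on spaces."""
    return text.lower().split()


def match(docs, query, operator="or"):
    """Return titles whose terms satisfy the query terms under the given operator."""
    query_terms = set(analyze(query))
    hits = []
    for title in docs:
        doc_terms = set(analyze(title))
        if operator == "and":
            ok = query_terms <= doc_terms  # every query term must appear
        else:
            ok = bool(query_terms & doc_terms)  # any single query term is enough
        if ok:
            hits.append(title)
    return hits


titles = ["xiaomi phone", "xiaomi tv 4a", "apple phone"]  # hypothetical data
print(match(titles, "Xiaomi TV"))         # ['xiaomi phone', 'xiaomi tv 4a']
print(match(titles, "Xiaomi TV", "and"))  # ['xiaomi tv 4a']
```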

4. Between or and and

Choosing between or and and is a bit too black and white. Suppose the user's query text is analyzed into 5 terms and we want to find documents that contain only 4 of them. What should we do? Setting operator to and would exclude those documents.

Sometimes that is exactly what we want. In most full-text search use cases, however, we want to include documents that are potentially relevant while excluding those that are unlikely to be. In other words, we want something in between.

The match query supports the minimum_should_match parameter, which specifies how many terms must match for a document to be considered relevant. It can be an absolute number, but a percentage is more common, since we cannot control how many words the user enters:

GET /myindex/_search
{
    "query":{
        "match":{
            "title":{
                "query":"xiaomi curved tv",
                "minimum_should_match": "75%"
            }
        }
    }
}

In this example, the search text is analyzed into 3 terms. With the and operator, a document would have to contain all 3 terms to match. Here we set minimum_should_match to 75%: 3 × 75% = 2.25, which Elasticsearch rounds down to 2. A document therefore only needs to contain 2 of the terms to match.
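The arithmetic can be checked with a small Python sketch. It handles only the two simple forms used in this article, a positive integer or a positive percentage. Elasticsearch also accepts negative and combined forms, which are not modeled here. The function name required_matches is hypothetical:

```python
def required_matches(num_terms, minimum_should_match):
    """How many of num_terms query terms a document must contain (simplified).

    Supports only a positive integer (2 or "2") or a positive percentage ("75%").
    """
    value = str(minimum_should_match)
    if value.endswith("%"):
        # Percentages are applied to the term count and rounded down.
        return num_terms * int(value[:-1]) // 100
    return int(value)


print(required_matches(3, "75%"))  # 2  (3 * 75% = 2.25, rounded down)
print(required_matches(5, "75%"))  # 3  (5 * 75% = 3.75, rounded down)
```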

1.3 Multi-field query (multi_match)

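multi_match works like match, except that it searches several fields at once:

GET /myindex/_search
{
    "query":{
        "multi_match": {
            "query":    "Xiaomi",
            "fields":   [ "title", "price" ],
            "lenient":  true
        }
    }
}

In this example, the word Xiaomi is searched for in both the title field and the price field. Because price is numeric and "Xiaomi" cannot be parsed as a number, "lenient": true tells Elasticsearch to ignore that format error instead of failing the whole request.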

1.4 Exact value matching (term)

The term query matches exact values. These may be numbers, dates, booleans, or strings that are not analyzed (for example, keyword fields):

GET /myindex/_search
{
    "query":{
        "term":{
            "price":2699.00
        }
    }
}

result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "heima",
        "_type": "goods",
        "_id": "r9c1KGMBIhaxtY5rlRKv",
        "_score": 1,
        "_source": {
          "title": "Xiaomi phone",
          "price": 2699
        }
      }
    ]
  }
}
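A common surprise with term is that its value is not analyzed, while text fields are analyzed at index time. The toy Python model below illustrates this. It assumes a tokenizer that lowercases and splits on whitespace, a simplification of the standard analyzer. It shows why a term query for "Xiaomi phone" misses a text field that a match query finds:

```python
def analyze(text):
    """Toy stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# Terms stored in the inverted index for a title field of type text.
indexed_terms = set(analyze("Xiaomi phone"))  # {'xiaomi', 'phone'}


def term_query(value):
    # term: the value is looked up as-is, without analysis.
    return value in indexed_terms


def match_query(text):
    # match: the query text goes through the same analyzer first (or semantics).
    return any(t in indexed_terms for t in analyze(text))


print(term_query("Xiaomi phone"))   # False: no single indexed term equals this string
print(term_query("xiaomi"))         # True
print(match_query("Xiaomi phone"))  # True
```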

1.5 Multi-term exact match (terms)

The terms query works like the term query, but it lets you specify multiple values to match. A document satisfies the condition if the field contains any of the specified values:

GET /myindex/_search
{
    "query":{
        "terms":{
            "price":[2699.00,3899.00]
        }
    }
}

result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "myindex",
        "_type" : "goods",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "Xiaomi phone",
          "price" : 2699
        }
      },
      {
        "_index" : "myindex",
        "_type" : "goods",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "Mi TV 4 A",
          "price" : 3899.0
        }
      }
    ]
  }
}

Another example:

GET /myindex/_search
{
    "query":{
        "terms":{
            "price":[2699.00,2899.00,3899.00]
        }
    }
}

result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "heima",
        "_type": "goods",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "rice phone",
          "price": 2899
        }
      },
      {
        "_index": "heima",
        "_type": "goods",
        "_id": "r9c1KGMBIhaxtY5rlRKv",
        "_score": 1,
        "_source": {
          "title": "Xiaomi phone",
          "price": 2699
        }
      },
      {
        "_index": "heima",
        "_type": "goods",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "Mi TV 4 A",
          "price": 3899
        }
      }
    ]
  }
}
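The terms query behaves much like SQL's where price in (...). The following is a minimal Python model over the three documents from the results above. It is a sketch that ignores scoring and field mapping types:

```python
# The three documents from the results above (_source only).
docs = [
    {"title": "Xiaomi phone", "price": 2699},
    {"title": "rice phone", "price": 2899},
    {"title": "Mi TV 4 A", "price": 3899},
]


def terms_query(docs, field, values):
    """Return titles of documents whose field equals any of the values (like SQL IN)."""
    wanted = set(values)
    return [d["title"] for d in docs if d.get(field) in wanted]


print(terms_query(docs, "price", [2699.00, 3899.00]))
# ['Xiaomi phone', 'Mi TV 4 A']
```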

2. Returning only the fields you need

By default, Elasticsearch returns every field stored in each document's _source in the search results.

If we only want some of the fields, we can add _source filtering.

2.1 Implemented through the _source keyword

Example:

GET /myindex/_search
{
  "_source": ["title","price"],
  "query": {
    "term": {
      "price": 2699
    }
  }
}

Returned result:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "heima",
        "_type": "goods",
        "_id": "r9c1KGMBIhaxtY5rlRKv",
        "_score": 1,
        "_source": {
          "price": 2699,
          "title": "Xiaomi phone"
        }
      }
    ]
  }
}

2.2 Implemented through includes and excludes

We can also use:

  • includes: to specify the fields you want to display
  • excludes: to specify fields that you do not want to display

Both are optional.

Example:

GET /myindex/_search
{
  "_source": {
    "includes":["title"]
  },
  "query": {
    "term": {
      "price": 2699
    }
  }
}

This returns the same result as:

GET /myindex/_search
{
  "_source": {
     "excludes": ["price"]
  },
  "query": {
    "term": {
      "price": 2699
    }
  }
}
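The includes/excludes logic can be modeled in a few lines of Python. This sketch assumes flat documents; Elasticsearch also supports nested paths such as obj.field, which are not modeled. It matches field names with shell-style wildcards via fnmatch, since _source patterns may contain *. The helper name filter_source is hypothetical:

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=None, excludes=None):
    """Keep top-level fields matching includes, then drop those matching excludes."""
    includes = includes or ["*"]  # no includes means "all fields"
    excludes = excludes or []
    return {
        key: value
        for key, value in source.items()
        if any(fnmatchcase(key, p) for p in includes)
        and not any(fnmatchcase(key, p) for p in excludes)
    }


src = {"title": "Xiaomi phone", "price": 2699}
print(filter_source(src, includes=["title"]))  # {'title': 'Xiaomi phone'}
print(filter_source(src, excludes=["price"]))  # {'title': 'Xiaomi phone'}
```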

3. Advanced query

3.1 Boolean combination (bool)

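bool combines other queries with must (and), must_not (not) and should (or). When a bool query contains a must or filter clause, its should clauses are optional. They do not exclude documents; they only raise the score of documents that match them.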

GET /myindex/_search
{
    "query":{
        "bool":{
            "must":     { "match": { "title": "rice" }},
            "must_not": { "match": { "title":  "television" }},
            "should":   { "match": { "title": "cell phone" }}
        }
    }
}

result:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "heima",
        "_type": "goods",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "title": "rice phone",
          "price": 2899
        }
      }
    ]
  }
}
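The following is a simplified Python model of the bool rules. It assumes each clause is a match query with the default or operator and uses a toy lowercase/whitespace analyzer. The number of matching should clauses stands in for the relevance score, which is a big simplification of real scoring. The model also encodes the rule that should clauses become required (at least one) only when there is no must clause:

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def matches(title, query_text):
    """A match clause with the default or operator."""
    return bool(analyze(title) & analyze(query_text))


def bool_query(titles, must=(), must_not=(), should=()):
    """Return (title, should_hits) pairs, most matching should clauses first."""
    results = []
    for title in titles:
        if not all(matches(title, q) for q in must):
            continue
        if any(matches(title, q) for q in must_not):
            continue
        should_hits = sum(matches(title, q) for q in should)
        # Without must clauses, at least one should clause has to match.
        if not must and should and should_hits == 0:
            continue
        results.append((title, should_hits))
    # More matching should clauses means a higher score, so it sorts first (stable).
    return sorted(results, key=lambda r: r[1], reverse=True)


titles = ["Xiaomi phone", "rice phone", "Mi TV 4 A"]
print(bool_query(titles, must=["rice"], must_not=["television"], should=["cell phone"]))
# [('rice phone', 1)]
print(bool_query(titles, should=["cell phone"]))
# [('Xiaomi phone', 1), ('rice phone', 1)]
```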

3.2 Range query (range)

The range query finds documents whose numeric or date values fall within the specified range:

GET /myindex/_search
{
    "query":{
        "range": {
            "price": {
                "gte":  1000.0,
                "lt":   2800.00
            }
        }
    }
}

The range query supports the following operators:

Operator   Meaning
gt         greater than
gte        greater than or equal to
lt         less than
lte        less than or equal to
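The operator table maps directly onto comparisons. In SQL terms, the example above is roughly where price >= 1000 and price < 2800. The following is a minimal Python sketch of that mapping; the names RANGE_OPS and in_range are hypothetical. It uses the four prices that appear in this article:

```python
import operator

# Range operators mapped to Python comparison functions.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gte": 1000.0, "lt": 2800.0}."""
    return all(RANGE_OPS[op](value, limit) for op, limit in bounds.items())


prices = [2699, 2899, 3899, 6899]
print([p for p in prices if in_range(p, {"gte": 1000.0, "lt": 2800.00})])  # [2699]
```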

3.3 Fuzzy query (fuzzy)

We add a new product:

POST /myindex/goods/4
{
    "title":"apple cell phone",
    "price":6899.00
}

The fuzzy query is the fuzzy counterpart of the term query. It lets the spelling of the search term differ from the indexed term, as long as the difference is at most 2 edits (an edit distance of 2):

GET /myindex/_search
{
  "query": {
    "fuzzy": {
      "title": "appla"
    }
  }
}

The query above still finds the apple cell phone document, because appla is only one edit away from apple.

We can specify the allowable edit distance by fuzziness:

GET /myindex/_search
{
  "query": {
    "fuzzy": {
        "title": {
            "value":"appla",
            "fuzziness":1
        }
    }
  }
}
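Edit distance is easy to compute yourself. The following Python sketch uses the classic Levenshtein distance, where each insertion, deletion or substitution counts as one edit. By default Elasticsearch also counts swapping two adjacent characters (a transposition) as one edit, which this plain function does not. The second helper mirrors how many edits fuzziness AUTO allows with its default thresholds: 0 for terms shorter than 3 characters, 1 for 3 to 5, and 2 for longer terms:

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed by fuzziness AUTO with the default 3,6 thresholds."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


print(levenshtein("appla", "apple"))  # 1
print(auto_fuzziness("appla"))        # 1, so "apple" is within reach
```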

4. Filtering the result rows (filter)

4.1 Filter in conditional query

Every clause in the query part takes part in scoring and ranking documents. If we want to narrow down the results without the condition affecting the score, we should not write it as a query clause. Instead, we put it in the filter clause of a bool query:

GET /myindex/_search
{
    "query":{
        "bool":{
            "must":{ "match": { "title": "Xiaomi phone" }},
            "filter":{
                "range":{"price":{"gt":2000.00,"lt":3800.00}}
            }
        }
    }
}

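Note: a filter can itself contain another bool query to combine several filter conditions.

Summary: the text "Xiaomi phone" is analyzed into terms and matched against title. The results are then narrowed down to documents with a price between 2000 and 3800 (both bounds exclusive). The price condition does not change the scores.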
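The key property of filter is that it only removes hits and never changes the scores of the hits it keeps. The following is a minimal Python sketch of that property. The (title, price, score) tuples are hypothetical, and the scores are made up for illustration:

```python
def apply_filter(hits, predicate):
    """Drop hits that fail the filter; scores of the remaining hits are untouched."""
    return [hit for hit in hits if predicate(hit)]


# Hypothetical (title, price, score) hits from the must clause; scores are made up.
hits = [
    ("Xiaomi phone", 2699, 2.0),
    ("rice phone", 2899, 1.0),
    ("apple cell phone", 6899, 1.0),
]

# Same bounds as the range filter above: 2000.00 < price < 3800.00.
kept = apply_filter(hits, lambda h: 2000.00 < h[1] < 3800.00)
print(kept)  # [('Xiaomi phone', 2699, 2.0), ('rice phone', 2899, 1.0)]
```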

4.2 No query conditions, direct filtering

If a query has only filter conditions and no scoring query, we can use constant_score instead of a bool query that contains nothing but a filter. Every matching document then gets the same score (1.0 by default). Performance is the same as with the bool version, but the query is shorter and clearer.

GET /myindex/_search
{
    "query":{
        "constant_score": {
            "filter": {
                "range":{"price":{"gt":2000.00,"lt":3000.00}}
            }
        }
    }
}
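The scoring behaviour of constant_score can be sketched in Python as follows. This is a simplified model, not Elasticsearch's implementation. Every document that passes the filter receives the same score, equal to the boost (1.0 unless set otherwise). The sketch runs over the four prices used in this article:

```python
def constant_score(docs, predicate, boost=1.0):
    """Every document passing the filter gets the same score (boost, 1.0 by default)."""
    return [(doc, boost) for doc in docs if predicate(doc)]


prices = [2699, 2899, 3899, 6899]
print(constant_score(prices, lambda p: 2000.00 < p < 3000.00))
# [(2699, 1.0), (2899, 1.0)]
```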

5. Sorting

5.1 Single field sorting

sort lets us sort by one or more fields, with order specifying the direction (asc or desc):

GET /myindex/_search
{
  "query": {
    "match": {
      "title": "Xiaomi phone"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

Summary: run a match query for "Xiaomi phone" and sort the hits by price in descending order.

5.2 Multi-field sorting

Suppose we want to sort the matching results first by price and then by relevance score (_score):

GET /myindex/_search
{
    "query":{
        "bool":{
            "must":{ "match": { "title": "Xiaomi phone" }}
        }
    },
    "sort": [
      { "price": { "order": "desc" }},
      { "_score": { "order": "desc" }}
    ]
}

The same sort can also be combined with a filter:

GET /myindex/_search
{
    "query":{
        "bool":{
            "must":{ "match": { "title": "Xiaomi phone" }},
            "filter":{
                "range":{"price":{"gt":2000.00,"lt":3000.00}}
            }
        }
    },
    "sort": [
      { "price": { "order": "desc" }},
      { "_score": { "order": "desc" }}
    ]
}
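The tie-breaking behaviour of a multi-field sort can be mimicked with a tuple sort key in Python. The data is hypothetical, with made-up scores. One extra hypothetical document shares a price with another, so the secondary _score key has a tie to break:

```python
# Hypothetical hits as (title, price, score); the scores are made up.
hits = [
    ("Xiaomi phone", 2699, 1.5),
    ("rice phone", 2899, 0.5),
    ("Xiaomi phone 2", 2699, 2.0),  # hypothetical document with the same price
]

# price desc first, then _score desc to break ties (negated for descending order).
ordered = sorted(hits, key=lambda h: (-h[1], -h[2]))
print(ordered)
# [('rice phone', 2899, 0.5), ('Xiaomi phone 2', 2699, 2.0), ('Xiaomi phone', 2699, 1.5)]
```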

Summary

We covered ES data queries from five aspects:

1. Basic queries (similar to where field1 = xxx and field2 like "%xxx%" in MySQL)
2. _source filtering, i.e. returning only selected fields (similar to select field1, field2 from table_name in MySQL)
3. Advanced queries: bool combinations, ranges and fuzzy matching (similar to combining conditions with and/or/not, or using between, in a MySQL where clause)
4. Filtering the result rows without affecting relevance scores (similar to adding conditions to a where clause in MySQL)
5. Sorting (similar to order by field desc/asc in MySQL)

In short: basic queries (match, term), then bool, range and fuzzy queries, then filter, then _source field selection, and finally sort with order.

Tags: MySQL Database ElasticSearch

Posted by goosez22 on Tue, 04 Oct 2022 07:18:50 +1030