Elasticsearch aggs: how to set the 'from' param? - group-by

Elasticsearch aggregation: How can I set the 'from' parameter, not just the size, for the result of an aggregation?

Do you mean like this:
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{ "range": { "timestamp": { "from": "now-180d", "to": "now" } } }
]
}
}
}
}
}
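If the goal is instead to page through the buckets of an aggregation, note that the `terms` aggregation accepts `size` but no `from`. One option, sketched here on the assumption that your Elasticsearch version includes the `bucket_sort` pipeline aggregation, is to nest a `bucket_sort` with `from` and `size` inside the `terms` aggregation. The function name, the aggregation names `groups` and `page`, and the field `category` are hypothetical.

```python
import json


def paginated_terms_body(field, offset, page_size):
    """Build a search body that returns one page of terms buckets."""
    return {
        "size": 0,  # no hits, only aggregation results
        "aggs": {
            "groups": {
                # The terms aggregation must collect enough buckets to cover the page.
                "terms": {"field": field, "size": offset + page_size},
                "aggs": {
                    "page": {
                        # bucket_sort truncates the parent's buckets:
                        # skip `offset` buckets, then keep `page_size` of them.
                        "bucket_sort": {"from": offset, "size": page_size}
                    }
                },
            }
        },
    }


body = paginated_terms_body("category", offset=20, page_size=10)
print(json.dumps(body, indent=2))
```

For paging deeply through a large number of buckets, the `composite` aggregation with its `after` key may be a better fit.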

Related

Cannot find # in OpenSearch query

I have an index that includes a field and when a '#' is input, I cannot get the query to find the #.
Field Data: "#3213939"
Query:
GET /invoices/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"referenceNumber": {
"query": "#32"
}
}
},
{
"wildcard": {
"referenceNumber": {
"value": "*#32*"
}
}
}
]
}
}
}
The standard analyzer drops the "#" character, which is why you can't find it. You can check this with the _analyze API:
POST _analyze
{
"text": ["#3213939"]
}
Response:
{
"tokens": [
{
"token": "3213939",
"start_offset": 1,
"end_offset": 8,
"type": "<NUM>",
"position": 0
}
]
}
You can customize the analyzer so that it keeps the character:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
Or you can run the wildcard query against the referenceNumber.keyword field:
GET test_invoices/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"referenceNumber": {
"query": "#32"
}
}
},
{
"wildcard": {
"referenceNumber.keyword": {
"value": "*#32*"
}
}
}
]
}
}
}
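To see why the wildcard clause on the keyword field matches, here is a rough local illustration. It uses Python's `fnmatch` as a stand-in for wildcard matching, with the token `3213939` taken from the `_analyze` response and the untouched value as a keyword field would store it. It assumes the index has a `referenceNumber.keyword` sub-field (default dynamic mappings typically create one), and it is not how the search engine evaluates the query internally.

```python
from fnmatch import fnmatchcase

pattern = "*#32*"

# Term indexed for the analyzed text field: the standard analyzer dropped "#".
analyzed_term = "3213939"
# Term indexed for the keyword sub-field: the original value, unchanged.
keyword_term = "#3213939"

matches_text_field = fnmatchcase(analyzed_term, pattern)
matches_keyword_field = fnmatchcase(keyword_term, pattern)

print(matches_text_field, matches_keyword_field)
```

The pattern can only match a term that still contains the "#", which is why the keyword sub-field works and the analyzed field does not.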

Get inserted document counts in specific date range using date histogram in elasticsearch

I have a list of documents in Elasticsearch that contain various fields.
The documents look like this:
[
{
"role": "api_user",
"apikey": "key1",
"data": {},
"#timestamp": "2021-10-06T16:47:13.555Z"
},
{
"role": "api_user",
"apikey": "key1",
"data": {},
"#timestamp": "2021-10-06T18:00:00.555Z"
},
{
"role": "api_user",
"apikey": "key1",
"data": {},
"#timestamp": "2021-10-07T13:47:13.555Z"
}
]
I want to find the number of documents in a specific date range at a 1-day interval, say from
2021-10-05T00:47:13.555Z to 2021-10-08T00:13:13.555Z.
I am trying the following aggregation:
{
"size": 0,
"query": {
"filter": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "2021-10-05T00:47:13.555Z",
"lte": "2021-10-08T00:13:13.555Z",
"format": "strict_date_optional_time"
}
}
}
]
}
}
},
"aggs": {
"data": {
"date_histogram": {
"field": "#timestamp",
"calendar_interval": "day"
}
}
}
}
Expected output: for 2021-10-06 I should get 2 documents, for 2021-10-07 I should get 1 document, and for days with no documents I should get a count of 0.
The following solution works:
{
"size":0,
"query":{
"bool":{
"must":[
],
"filter":[
{
"match_all":{
}
},
{
"range":{
"#timestamp":{
"gte":"2021-10-05T00:47:13.555Z",
"lte":"2021-10-08T00:13:13.555Z",
"format":"strict_date_optional_time"
}
}
}
],
"should":[
],
"must_not":[
]
}
},
"aggs":{
"data":{
"date_histogram":{
"field":"#timestamp",
"fixed_interval":"12h",
"time_zone":"Asia/Calcutta",
"min_doc_count":1
}
}
}
}
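Note that with `min_doc_count` set to 1, empty buckets are omitted from the response. If zero-count days should appear, as the question asks, one option is `min_doc_count: 0` together with `extended_bounds`, so that the histogram spans the whole requested range. A minimal sketch that builds such a request body follows; the function name is hypothetical, the field name is taken from the question, and it assumes the field's date format accepts ISO-8601 strings.

```python
import json


def daily_counts_body(field, start, end):
    """Build a search body that counts documents per calendar day, empty days included."""
    return {
        "size": 0,  # no hits, only aggregation results
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            field: {
                                "gte": start,
                                "lte": end,
                                "format": "strict_date_optional_time",
                            }
                        }
                    }
                ]
            }
        },
        "aggs": {
            "data": {
                "date_histogram": {
                    "field": field,
                    "calendar_interval": "day",
                    # Keep buckets that contain no documents ...
                    "min_doc_count": 0,
                    # ... and create them across the whole range, even at the edges.
                    "extended_bounds": {"min": start, "max": end},
                }
            }
        },
    }


body = daily_counts_body(
    "#timestamp", "2021-10-05T00:47:13.555Z", "2021-10-08T00:13:13.555Z"
)
print(json.dumps(body, indent=2))
```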

Fetching esJsonRDD from elasticsearch with complex filtering in Spark

I am currently fetching the Elasticsearch RDD in our Spark job, filtering with a one-line Elasticsearch query such as this (example):
val elasticRdds = sparkContext.esJsonRDD(esIndex, s"?default_operator=AND&q=director.name:DAVID + \n movie.name:SEVEN")
Now if our search query becomes complex like:
{
"query": {
"filtered": {
"query": {
"query_string": {
"default_operator": "AND",
"query": "director.name:DAVID + \n movie.name:SEVEN"
}
},
"filter": {
"nested": {
"path": "movieStatus.boxoffice.status",
"query": {
"bool": {
"must": [
{
"match": {
"movieStatus.boxoffice.status.rating": "A"
}
},
{
"match": {
"movieStatus.boxoffice.status.oscar": "false"
}
}
]
}
}
}
}
}
}
}
Can I still convert that query to an inline Elasticsearch query for use with esJsonRDD? Or is there any way to use the above query as is with esJsonRDD?
If not, what is a better way to fetch such RDDs in Spark?
I ask because esJsonRDD seems to accept only inline (one-line) Elasticsearch queries.
Use triple quotes:
val query = """{
"query": {
"filtered": {
"query": {
"query_string": {
"default_operator": "AND",
"query": "director.name:DAVID + \n movie.name:SEVEN"
}
},
"filter": {
"nested": {
"path": "movieStatus.boxoffice.status",
"query": {
"bool": {
"must": [
{
"match": {
"movieStatus.boxoffice.status.rating": "A"
}
},
{
"match": {
"movieStatus.boxoffice.status.oscar": "false"
}
}
]
}
}
}
}
}
}
}"""
val elasticRdds = sparkContext.esJsonRDD(esIndex, query)
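This works because the connector's query setting accepts either a URI query (starting with `?`) or a query DSL body, so a multi-line JSON string is fine. If a single line is still preferred, the body can be serialized compactly. The sketch below is in Python rather than Scala and only builds the string. It also assumes Elasticsearch 5.0 or later, where the `filtered` query no longer exists, and expresses the same intent with a `bool` query. The function name and the simplified query string are hypothetical.

```python
import json


def build_movie_query(query_string, rating, oscar):
    """Return the query DSL as a compact, single-line JSON string."""
    path = "movieStatus.boxoffice.status"
    body = {
        "query": {
            "bool": {
                # Scored full-text part (was the "query" half of "filtered").
                "must": [
                    {"query_string": {"default_operator": "AND", "query": query_string}}
                ],
                # Non-scored part (was the "filter" half of "filtered").
                "filter": [
                    {
                        "nested": {
                            "path": path,
                            "query": {
                                "bool": {
                                    "must": [
                                        {"match": {f"{path}.rating": rating}},
                                        {"match": {f"{path}.oscar": oscar}},
                                    ]
                                }
                            },
                        }
                    }
                ],
            }
        }
    }
    # Compact separators produce one line with no extra whitespace.
    return json.dumps(body, separators=(",", ":"))


one_line_query = build_movie_query("director.name:DAVID movie.name:SEVEN", "A", "false")
print(one_line_query)
```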

filter range date elasticsearch

This is what my data looks like:
{
"name": "thename",
"openingTimes": {
"monday": [
{
"start": "10:00",
"end": "14:00"
},
{
"start": "19:00",
"end": "02:30"
}
]
}
}
I want to query for documents that are open on Monday between 13:00 and 14:00.
I tried this filter but it doesn't return my document:
{
"filter": {
"range": {
"openingTimes.monday.start": {
"lte": "13:00"
},
"openingTimes.monday.end": {
"gte": "14:00"
}
}
}
}
If I simply say opened on monday at 13:00, it works:
{
"filter": {
"range": {
"openingTimes.monday.start": {
"lte": "13:00"
}
}
}
}
Or even closing on monday from 14:00, works too:
{
"filter": {
"range": {
"openingTimes.monday.start": {
"gte": "14:00"
}
}
}
}
but combining both of them doesn't return anything. How can I create a filter that means open on Monday between 13:00 and 14:00?
EDIT
This is how I mapped the openingTimes field:
{
"properties": {
"monday": {
"type": "nested",
"properties": {
"start": {"type": "date","format": "hour_minute"},
"end": {"type": "date","format": "hour_minute"}
}
}
}
}
SOLUTION (@DanTuffery)
Based on @DanTuffery's answer, I changed my filter to his (which works perfectly) and added the type definition of my openingTimes attribute.
For the record I am using elasticsearch as my primary db through Ruby-on-Rails using the following gems:
gem 'elasticsearch-rails', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
gem 'elasticsearch-model', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
gem 'elasticsearch-persistence', git: 'git://github.com/elasticsearch/elasticsearch-rails.git', require: 'elasticsearch/persistence/model'
Here is what my openingTimes attribute's mapping looks like:
attribute :openingTimes, Hash, mapping: {
type: :object,
properties: {
monday: {
type: :nested,
properties: {
start:{type: :date, format: 'hour_minute'},
end: {type: :date, format: 'hour_minute'}
}
},
tuesday: {
type: :nested,
properties: {
start:{type: :date, format: 'hour_minute'},
end: {type: :date, format: 'hour_minute'}
}
},
...
...
}
}
And here is how I implemented his filter:
def self.openedBetween startTime, endTime, day
self.search filter: {
nested: {
path: "openingTimes.#{day}",
filter: {
bool: {
must: [
{range: {"openingTimes.#{day}.start"=> {lte: startTime}}},
{range: {"openingTimes.#{day}.end" => {gte: endTime}}}
]
}
}
}
}
end
First create your mapping with the openingTimes object at the top level.
/PUT http://localhost:9200/demo/test/_mapping
{
"test": {
"properties": {
"openingTimes": {
"type": "object",
"properties": {
"monday": {
"type": "nested",
"properties": {
"start": {
"type": "date",
"format": "hour_minute"
},
"end": {
"type": "date",
"format": "hour_minute"
}
}
}
}
}
}
}
}
Index your document
/POST http://localhost:9200/demo/test/1
{
"name": "thename",
"openingTimes": {
"monday": [
{
"start": "10:00",
"end": "14:00"
},
{
"start": "19:00",
"end": "02:30"
}
]
}
}
With a nested filter query you can search for the document with the start and end fields within boolean range queries:
/POST http://localhost:9200/demo/test/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "openingTimes.monday",
"filter": {
"bool": {
"must": [
{
"range": {
"openingTimes.monday.start": {
"lte": "13:00"
}
}
},
{
"range": {
"openingTimes.monday.end": {
"gte": "14:00"
}
}
}
]
}
}
}
}
}
}
}
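The nested filter matters because each entry in `monday` is matched as its own object, so both range conditions must hold for the same opening interval. The sketch below builds an equivalent request body with a `bool` query and a `filter` clause (an assumption: Elasticsearch 5.0 or later, where `filtered` was removed and `nested` takes `query` instead of `filter`), and adds a small local check that mimics per-interval matching. All function names and the `split_day` data are hypothetical.

```python
def opened_between_body(day, start_time, end_time):
    """Build a nested query: one opening interval must cover [start_time, end_time]."""
    path = f"openingTimes.{day}"
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "nested": {
                            "path": path,
                            "query": {
                                "bool": {
                                    "must": [
                                        {"range": {f"{path}.start": {"lte": start_time}}},
                                        {"range": {f"{path}.end": {"gte": end_time}}},
                                    ]
                                }
                            },
                        }
                    }
                ]
            }
        }
    }


def nested_match(intervals, start_time, end_time):
    """Both conditions must hold within the same interval (nested semantics)."""
    # Zero-padded "HH:MM" strings compare correctly as plain strings.
    return any(i["start"] <= start_time and i["end"] >= end_time for i in intervals)


def flattened_match(intervals, start_time, end_time):
    """Conditions may be satisfied by different intervals (non-nested semantics)."""
    return (any(i["start"] <= start_time for i in intervals)
            and any(i["end"] >= end_time for i in intervals))


# Hypothetical data: closed between 12:00 and 15:00.
split_day = [{"start": "10:00", "end": "12:00"}, {"start": "15:00", "end": "18:00"}]
print(nested_match(split_day, "13:00", "14:00"))     # False
print(flattened_match(split_day, "13:00", "14:00"))  # True
```

The flattened variant wrongly reports the place as open, because the start condition is met by the morning interval and the end condition by the afternoon one.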

Find in subdocuments returning document

I have a collection looking somewhat like this:
{
"colors": ["blue","white"],
"items": {
"old": {
"name": "test"
},
"current": {
"name": "new_test"
}
}
},
{
"colors": ["red","green"],
"items": {
"old": {
"name": "test2"
},
"current": {
"name": "new_test2"
}
}
}
Is it possible to use find like this:
db.collection.find({"items": { "old": { "name": "test" } } })
So the command would return:
{
"colors": ["blue","white"],
"items": {
"old": {
"name": "test"
},
"current": {
"name": "new_test"
}
}
}
Is this possible?
Yes, you can use the 'dot notation' to reach into the object:
db.collection.find({"items.old.name": "test" })
The query syntax you used also works, but it has different semantics: it matches the entire subdocument for equality instead of just a single field. For instance, the following query would also return a result:
db.collection.find({"items.old": {"name" : "test"} })
but db.collection.find({"items": { "old": { "name": "test" } } }) does not, because items also contains a current field.
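The difference between the two forms can be imitated locally. This is only a sketch of the semantics described above, not MongoDB's implementation: the helper names are hypothetical, arrays are ignored, and Python dict equality does not consider key order, whereas MongoDB's embedded-document equality also requires the same field order.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as "items.old.name" through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(doc, query):
    """Every path in the query must equal the given value exactly."""
    return all(get_path(doc, path) == value for path, value in query.items())


doc = {
    "colors": ["blue", "white"],
    "items": {"old": {"name": "test"}, "current": {"name": "new_test"}},
}

# Dot notation reaches a single field.
by_field = matches(doc, {"items.old.name": "test"})
# The whole "items.old" subdocument is exactly {"name": "test"}.
by_subdocument = matches(doc, {"items.old": {"name": "test"}})
# "items" also holds "current", so exact equality fails.
by_partial_parent = matches(doc, {"items": {"old": {"name": "test"}}})

print(by_field, by_subdocument, by_partial_parent)
```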