I have an index that includes a field whose values can contain a '#' character, and when a '#' is part of the query input, I cannot get the query to find it.
Field Data: "#3213939"
Query:
GET /invoices/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"referenceNumber": {
"query": "#32"
}
}
},
{
"wildcard": {
"referenceNumber": {
"value": "*#32*"
}
}
}
]
}
}
}
"#" character drops during standard text analyzer this is why you can't find it.
POST _analyze
{
"text": ["#3213939"]
}
Response:
{
"tokens": [
{
"token": "3213939",
"start_offset": 1,
"end_offset": 8,
"type": "<NUM>",
"position": 0
}
]
}
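For comparison, here is the same text run through the built-in whitespace analyzer, which splits on whitespace only and therefore keeps the '#' (this should return "#3213939" as a single token):
POST _analyze
{
  "analyzer": "whitespace",
  "text": ["#3213939"]
}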
You can customize the analyzer used for this field; see the standard analyzer documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
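For example, a minimal sketch of an index whose referenceNumber field uses a whitespace-based custom analyzer, so the '#' survives analysis (the index and analyzer names here are placeholders, and on Elasticsearch versions before 7.0 the mappings block would also need a type name):
PUT /invoices_custom
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keep_hash": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "referenceNumber": {
        "type": "text",
        "analyzer": "keep_hash"
      }
    }
  }
}
With this mapping, "#3213939" is indexed as the single token "#3213939", so the wildcard clause can see the '#'.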
Or you can query the referenceNumber.keyword sub-field instead:
GET test_invoices/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"referenceNumber": {
"query": "#32"
}
}
},
{
"wildcard": {
"referenceNumber.keyword": {
"value": "*#32*"
}
}
}
]
}
}
}
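Note that a wildcard with a leading * can be expensive on large indices. If matching from the start of the reference number is enough, a prefix query on the keyword sub-field is a cheaper alternative (a sketch using the same field name):
GET test_invoices/_search
{
  "query": {
    "prefix": {
      "referenceNumber.keyword": {
        "value": "#32"
      }
    }
  }
}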
I have a problem with my Jolt transformation and no idea how to fix it.
I get null elements in the array I produce. This is my input:
{
"Verkaufsprodukt": [
{
"Produkt": [
{
"Elementarprodukt": [
{
"ArtID": {
"bezeichnung": "b",
"value": "0302"
},
"VersichertePerson": {
"PartnerID": "1"
}
},
{
"ArtID": {
"bezeichnung": "f"
},
"VersichertePerson": {
"PartnerID": "1"
}
},
{
"ArtID": {
"bezeichnung": "c"
},
"VersichertePerson": {
"PartnerID": "1"
}
},
{
"ArtID": {
"bezeichnung": "a",
"value": "0301"
},
"VersichertePerson": {
"PartnerID": "1"
}
}
]
}
]
}
],
"Partner": [
{
"Name": "Holgerson",
"PartnerID": "1",
"Vorname": "Nils"
}
]
}
My result:
{
"vertragsdetails" : {
"versichertePersonen" : {
"versicherungssummenOderLeistungen" : [ null, {
"kennung" : "0302"
}, null, {
"kennung" : "0301"
} ]
}
}
}
Here is my spec:
[
{
"operation": "shift",
"spec": {
"Verkaufsprodukt": {
"*": {
"Produkt": {
"*": {
"Elementarprodukt": {
"*": {
"VersichertePerson": {
"PartnerID": {
"1": {
"#(3)": {
"ArtID": {
"value": "vertragsdetails.versichertePersonen.versicherungssummenOderLeistungen[&6].kennung"
}
}
}
}
}
}
}
}
}
}
}
}
}
]
I see that the null elements come from the "ArtID" elements that have no "value", but how can I get rid of them?
I tried adding another "operation": "shift", but that also deleted other elements I want to keep.
Can somebody help? Thanks!
I found a solution. I added this to my spec:
{
"operation": "shift",
"spec": {
"vertragsdetails": {
"versichertePersonen": {
"versicherungssummenOderLeistungen": {
"*": {
"kennung": {
"#1": "vertragsdetails.versichertePersonen.versicherungssummenOderLeistungen[]"
}
}
},
"dataToKeep": "vertragsdetails.versichertePersonen.dataToKeep"
}
}
}
}
I want to filter out the Sum_PKTS buckets whose value is lower than 10.
How could I merge the two queries below? Is it possible?
By the way, the "Sum_PKTS" aggregation is a sum over the "Packet" field.
The goal is to filter on the local IPs, aggregate the "Packet" field, and finally filter out the Sum_PKTS values lower than 10.
{
"range":{
"Sum_PKTS":{
"gte": 10
}
}
}
--
GET /_search
{
"size" : 0,
"query": {
"bool": {
"should": [
{
"match":{"IPV4_DST_ADDR":"192.168.0.0/16"}
},
{
"match":{"IPV4_SRC_ADDR":"192.168.0.0/16"}
}
],
"minimum_should_match": 1,
"must":[
{
"range":{
"#timestamp":{
"gte":"now-5m"
}
}
}
]
}
},
"aggs": {
"DST_Local_IP": {
"filter": {
"bool": {
"filter": {
"match":{"IPV4_DST_ADDR":"192.168.0.0/16"}
}
}
},
"aggs": {
"genres":{
"terms" : {
"field" : "IPV4_DST_ADDR" ,
"order" : { "Sum_PKTS" : "desc" }
},
"aggs":{
"Sum_PKTS": {
"sum" : { "field" : "Packet" }
}
}
}
}
},
"SRC_Local_IP": {
"filter": {
"bool": {
"filter": {
"match":{"IPV4_SRC_ADDR":"192.168.0.0/16"}
}
}
},
"aggs": {
"genres":{
"terms" : {
"field" : "IPV4_SRC_ADDR" ,
"order" : { "Sum_PKTS" : "desc" }
},
"aggs":{
"Sum_PKTS": {
"sum" : { "field" : "Packet" }
}
}
}
}
}
}
}
Thank you in advance!
You can achieve what you want using a bucket selector pipeline aggregation (see the two Sum_PKTS_gte_10 aggregations below):
{
"size": 0,
"query": {
"bool": {
"should": [
{
"match": {
"IPV4_DST_ADDR": "192.168.0.0/16"
}
},
{
"match": {
"IPV4_SRC_ADDR": "192.168.0.0/16"
}
}
],
"minimum_should_match": 1,
"must": [
{
"range": {
"#timestamp": {
"gte": "now-5m"
}
}
}
]
}
},
"aggs": {
"DST_Local_IP": {
"filter": {
"bool": {
"filter": {
"match": {
"IPV4_DST_ADDR": "192.168.0.0/16"
}
}
}
},
"aggs": {
"genres": {
"terms": {
"field": "IPV4_DST_ADDR",
"order": {
"Sum_PKTS": "desc"
}
},
"aggs": {
"Sum_PKTS": {
"sum": {
"field": "Packet"
}
},
"Sum_PKTS_gte_10": {
"bucket_selector": {
"buckets_path": {
"sum_packets": "Sum_PKTS"
},
"script": "params.sum_packets >= 10"
}
}
}
}
}
},
"SRC_Local_IP": {
"filter": {
"bool": {
"filter": {
"match": {
"IPV4_SRC_ADDR": "192.168.0.0/16"
}
}
}
},
"aggs": {
"genres": {
"terms": {
"field": "IPV4_SRC_ADDR",
"order": {
"Sum_PKTS": "desc"
}
},
"aggs": {
"Sum_PKTS": {
"sum": {
"field": "Packet"
}
},
"Sum_PKTS_gte_10": {
"bucket_selector": {
"buckets_path": {
"sum_packets": "Sum_PKTS"
},
"script": "params.sum_packets >= 10"
}
}
}
}
}
}
}
}
I want to put a double filter in my aggs, like this:
"aggs": {
"download1" : {
"filter" : [
{ "term": { "IPV4_DST_ADDR":"192.168.0.159"}},
{ "range": { "LAST_SWITCHED": { "gte": "now-5m" } }}
],
"aggs" : {
"downlod_bytes" : { "sum" : { "field" : "IN_BYTES" } }
}
}
}
but it shows me an error:
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "Expected [START_OBJECT] under [filter], but got a [START_ARRAY] in [download1]",
"line": 33,
"col": 24
}
]}
How can I do this? Thank you in advance!
You need to combine both queries with a bool/filter:
{
"aggs": {
"download1": {
"filter": {
"bool": {
"filter": [
{
"term": {
"IPV4_DST_ADDR": "192.168.0.159"
}
},
{
"range": {
"LAST_SWITCHED": {
"gte": "now-5m"
}
}
}
]
}
},
"aggs": {
"downlod_bytes": {
"sum": {
"field": "IN_BYTES"
}
}
}
}
}
}
I am currently fetching the Elasticsearch RDD in our Spark job, filtering based on a one-line query like this (example):
val elasticRdds = sparkContext.esJsonRDD(esIndex, s"?default_operator=AND&q=director.name:DAVID + \n movie.name:SEVEN")
Now, what if our search query becomes more complex, like:
{
"query": {
"filtered": {
"query": {
"query_string": {
"default_operator": "AND",
"query": "director.name:DAVID + \n movie.name:SEVEN"
}
},
"filter": {
"nested": {
"path": "movieStatus.boxoffice.status",
"query": {
"bool": {
"must": [
{
"match": {
"movieStatus.boxoffice.status.rating": "A"
}
},
{
"match": {
"movieStatus.boxoffice.status.oscar": "false"
}
}
]
}
}
}
}
}
}
}
Can I still convert that query to an inline query to use it with esJsonRDD? Or is there any way the above query could be used as-is with esJsonRDD?
If not, what is a better way to fetch such RDDs in Spark? esJsonRDD seems to accept only inline (one-line) queries.
Use triple quotes:
val query = """{
"query": {
"filtered": {
"query": {
"query_string": {
"default_operator": "AND",
"query": "director.name:DAVID + \n movie.name:SEVEN"
}
},
"filter": {
"nested": {
"path": "movieStatus.boxoffice.status",
"query": {
"bool": {
"must": [
{
"match": {
"movieStatus.boxoffice.status.rating": "A"
}
},
{
"match": {
"movieStatus.boxoffice.status.oscar": "false"
}
}
]
}
}
}
}
}
}
}"""
val elasticRdds = sparkContext.esJsonRDD(esIndex, query)
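One caveat: the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, so on newer clusters the same request body has to be expressed with bool/must/filter instead. A sketch of the equivalent body (same fields as above):
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "default_operator": "AND",
          "query": "director.name:DAVID + \n movie.name:SEVEN"
        }
      },
      "filter": {
        "nested": {
          "path": "movieStatus.boxoffice.status",
          "query": {
            "bool": {
              "must": [
                { "match": { "movieStatus.boxoffice.status.rating": "A" } },
                { "match": { "movieStatus.boxoffice.status.oscar": "false" } }
              ]
            }
          }
        }
      }
    }
  }
}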
This is what my data looks like:
{
"name": "thename",
"openingTimes": {
"monday": [
{
"start": "10:00",
"end": "14:00"
},
{
"start": "19:00",
"end": "02:30"
}
]
}
}
I want to query for documents that are open on Monday between 13:00 and 14:00.
I tried this filter, but it doesn't return my document:
{
"filter": {
"range": {
"openingTimes.monday.start": {
"lte": "13:00"
},
"openingTimes.monday.end": {
"gte": "14:00"
}
}
}
}
If I simply query for open on Monday at 13:00, it works:
{
"filter": {
"range": {
"openingTimes.monday.start": {
"lte": "13:00"
}
}
}
}
Or even closing on Monday from 14:00 works too:
{
"filter": {
"range": {
"openingTimes.monday.start": {
"gte": "14:00"
}
}
}
}
But combining both of them doesn't give me anything. How can I create a filter meaning open on Monday between 13:00 and 14:00?
EDIT
This is how I mapped the openingTimes field:
{
"properties": {
"monday": {
"type": "nested",
"properties": {
"start": {"type": "date","format": "hour_minute"},
"end": {"type": "date","format": "hour_minute"}
}
}
}
}
SOLUTION (@DanTuffery)
Based on @DanTuffery's answer, I changed my filter to his (which works perfectly) and added the type definition for my openingTimes attribute.
For the record, I am using Elasticsearch as my primary DB through Ruby on Rails, with the following gems:
gem 'elasticsearch-rails', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
gem 'elasticsearch-model', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
gem 'elasticsearch-persistence', git: 'git://github.com/elasticsearch/elasticsearch-rails.git', require: 'elasticsearch/persistence/model'
Here is what my openingTimes attribute's mapping looks like:
attribute :openingTimes, Hash, mapping: {
type: :object,
properties: {
monday: {
type: :nested,
properties: {
start:{type: :date, format: 'hour_minute'},
end: {type: :date, format: 'hour_minute'}
}
},
tuesday: {
type: :nested,
properties: {
start:{type: :date, format: 'hour_minute'},
end: {type: :date, format: 'hour_minute'}
}
},
...
...
}
}
And here is how I implemented his filter:
def self.openedBetween startTime, endTime, day
self.search filter: {
nested: {
path: "openingTimes.#{day}",
filter: {
bool: {
must: [
{range: {"openingTimes.#{day}.start"=> {lte: startTime}}},
{range: {"openingTimes.#{day}.end" => {gte: endTime}}}
]
}
}
}
}
end
First, create your mapping with the openingTimes object at the top level:
/PUT http://localhost:9200/demo/test/_mapping
{
"test": {
"properties": {
"openingTimes": {
"type": "object",
"properties": {
"monday": {
"type": "nested",
"properties": {
"start": {
"type": "date",
"format": "hour_minute"
},
"end": {
"type": "date",
"format": "hour_minute"
}
}
}
}
}
}
}
}
Index your document:
/POST http://localhost:9200/demo/test/1
{
"name": "thename",
"openingTimes": {
"monday": [
{
"start": "10:00",
"end": "14:00"
},
{
"start": "19:00",
"end": "02:30"
}
]
}
}
With a nested filter you can search for the document using range queries on the start and end fields inside a bool filter:
/POST http://localhost:9200/demo/test/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "openingTimes.monday",
"filter": {
"bool": {
"must": [
{
"range": {
"openingTimes.monday.start": {
"lte": "13:00"
}
}
},
{
"range": {
"openingTimes.monday.end": {
"gte": "14:00"
}
}
}
]
}
}
}
}
}
}
}
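For completeness: on Elasticsearch 2.x and later the filtered query is gone, so the same search would be written with a bool query whose filter clause wraps the nested query, roughly like this (same index and field names as above):
POST /demo/_search
{
  "query": {
    "bool": {
      "filter": {
        "nested": {
          "path": "openingTimes.monday",
          "query": {
            "bool": {
              "must": [
                { "range": { "openingTimes.monday.start": { "lte": "13:00" } } },
                { "range": { "openingTimes.monday.end": { "gte": "14:00" } } }
              ]
            }
          }
        }
      }
    }
  }
}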