I'm fairly new to Elasticsearch. I'm trying to write a query that will group by a field and calculate a sum. In SQL, my query would look like this:
SELECT lane, SUM(routes) FROM lanes GROUP BY lane
My data looks like this in ES:
{
"_index": "kpi",
"_type": "mroutes_by_lane",
"_id": "TUeWFEhnS9q1Ukb2QdZABg",
"_score": 1.0,
"_source": {
"warehouse_id": 107,
"date": "2013-04-08",
"lane": "M05",
"routes": 4047
}
},
{
"_index": "kpi",
"_type": "mroutes_by_lane",
"_id": "owVmGW9GT562_2Alfru2DA",
"_score": 1.0,
"_source": {
"warehouse_id": 107,
"date": "2013-04-08",
"lane": "M03",
"routes": 4065
}
},
{
"_index": "kpi",
"_type": "mroutes_by_lane",
"_id": "JY9xNDxqSsajw76oMC2gxA",
"_score": 1.0,
"_source": {
"warehouse_id": 107,
"date": "2013-04-08",
"lane": "M05",
"routes": 3056
}
},
{
"_index": "kpi",
"_type": "mroutes_by_lane",
"_id": "owVmGW9GT345_2Alfru2DB",
"_score": 1.0,
"_source": {
"warehouse_id": 107,
"date": "2013-04-08",
"lane": "M03",
"routes": 5675
}
},
...
I essentially want to run the same query in ES as in SQL, so that my result would be something like this (in JSON, of course): M05: 7103, M03: 9740
In Elasticsearch, you can achieve this with the terms stats facet:
{
"query" : {
"match_all" : { }
},
"facets" : {
"lane_routes_stats" : {
"terms_stats" : {
"key_field" : "lane",
"value_field" : "routes",
"order": "term"
}
}
}
}
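Facets were deprecated in Elasticsearch 1.x and removed in 2.0, so on newer clusters the same grouping is usually written as a terms aggregation with a sum sub-aggregation. The sketch below builds such a request body in Python and, as a sanity check, computes the same totals locally from the four sample documents in the question. The aggregation names by_lane and total_routes are arbitrary labels chosen for illustration, and whether lane can be aggregated directly depends on how the field is mapped.

```python
import json
from collections import defaultdict


def build_lane_sum_body():
    """Request body: group by "lane" and sum "routes" within each group.

    "by_lane" and "total_routes" are illustrative names, not fixed by Elasticsearch.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "by_lane": {
                "terms": {"field": "lane"},
                "aggs": {"total_routes": {"sum": {"field": "routes"}}},
            }
        },
    }


def sum_routes_by_lane(docs):
    """Local equivalent of SELECT lane, SUM(routes) ... GROUP BY lane."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["lane"]] += doc["routes"]
    return dict(totals)


# The four _source documents shown in the question (lane and routes only).
SAMPLE_DOCS = [
    {"lane": "M05", "routes": 4047},
    {"lane": "M03", "routes": 4065},
    {"lane": "M05", "routes": 3056},
    {"lane": "M03", "routes": 5675},
]

if __name__ == "__main__":
    print(json.dumps(build_lane_sum_body(), indent=2))
    print(sum_routes_by_lane(SAMPLE_DOCS))
```

The body would be sent to the index's _search endpoint; each bucket in the response then carries the lane as its key and the sum under total_routes.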
Hope you're fine.
I cannot seem to find a way to aggregate the following document by 'equity_id'.
{
"_id": {
"$oid": "6001dc246192c700013e8252"
},
"user": "blablabla",
"_type": "User::Individual",
"created_at": {
"$date": "2021-01-15T18:17:11.130Z"
},
"integrations": [{
"_id": {
"$oid": "6001dc62e7a0970001258da8"
},
"status": "completed",
"authentication_failed_msg": null
}],
"portfolios": [{
"_id": {
"$oid": "6001dc62e7a0970001258da9"
},
"_type": "SimplePortfolio",
"transactions": [{
"_id": {
"$oid": "6001dc62e7a0970001258daa"
},
"settlement_period": 2,
"expenses": 0,
"source": "integration",
"_type": "Transaction::Equity::Buy",
"date": {
"$date": "2020-03-02T00:00:00.000Z"
},
"shares": 100,
"price": 13.04,
"equity_id": "abcd"
}, {
"_id": {
"$oid": "6001dc62e7a0970001258dab"
},
"settlement_period": 2,
"expenses": 0,
"source": "integration",
"_type": "Transaction::Equity::Buy",
"date": {
"$date": "2020-03-02T00:00:00.000Z"
},
"shares": 1000,
"price": 1.03,
"equity_id": "efgh"
I tried something like:
db.collection.aggregate([{"$unwind": {'$portfolios.transactions'}},
{"$group" : {"_id": "$equity_id"}}])
I got the error: InvalidDocument: cannot encode object: {'$portfolios.transactions'}, of type: <class 'set'>
Ideally, what I want is a list grouped by user and equity_id with the sum of its shares. Does anyone know whether the error is caused by my aggregation or by the document structure?
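The error comes from the aggregation: {'$portfolios.transactions'} is a Python set literal, while $unwind expects the field path as a string. Since transactions is an array nested inside the portfolios array, you should also $unwind twice.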
db.collection.aggregate([
{
"$unwind": "$portfolios"
},
{
"$unwind": "$portfolios.transactions"
},
{
"$group": {
"_id": "$portfolios.transactions.equity_id"
}
}
])
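To get the sum of shares per user and equity_id that the question asks for, the $group stage can use a compound _id and a $sum accumulator. The following is a minimal sketch written as a Python pipeline; the output field name total_shares is an assumption chosen for illustration, and the local helper only mirrors the grouping on plain dicts so the logic can be checked without a database.

```python
from collections import defaultdict

# Pipeline for db.collection.aggregate(PIPELINE); field paths are plain strings.
PIPELINE = [
    {"$unwind": "$portfolios"},
    {"$unwind": "$portfolios.transactions"},
    {
        "$group": {
            # Compound key: one group per (user, equity_id) pair.
            "_id": {
                "user": "$user",
                "equity_id": "$portfolios.transactions.equity_id",
            },
            # "total_shares" is an illustrative output name.
            "total_shares": {"$sum": "$portfolios.transactions.shares"},
        }
    },
]


def sum_shares_locally(docs):
    """Mirror the pipeline on plain dicts: unwind twice, then group and sum."""
    totals = defaultdict(int)
    for doc in docs:
        for portfolio in doc.get("portfolios", []):
            for transaction in portfolio.get("transactions", []):
                key = (doc["user"], transaction["equity_id"])
                totals[key] += transaction["shares"]
    return dict(totals)


# Reduced version of the document from the question (only the fields used here).
SAMPLE = [
    {
        "user": "blablabla",
        "portfolios": [
            {
                "transactions": [
                    {"equity_id": "abcd", "shares": 100},
                    {"equity_id": "efgh", "shares": 1000},
                ]
            }
        ],
    }
]

if __name__ == "__main__":
    print(sum_shares_locally(SAMPLE))
```

Note that this adds up shares regardless of transaction type; the sample only contains Transaction::Equity::Buy entries, so other types may need separate handling.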
My document looks like:
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "test_db2",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"name": "very cool shoes",
"price": 26
}
},
{
"_index": "test_db2",
"_type": "test",
"_id": "2",
"_score": 1,
"_source": {
"name": "great shampoo",
"price": 15
}
},
{
"_index": "test_db2",
"_type": "test",
"_id": "3",
"_score": 1,
"_source": {
"name": "shirt",
"price": 25
}
}
]
}
How can I create autocomplete in Elasticsearch? For example:
When I type "sh" into the input, I should see the results:
shoes
shampoo
shirt
.....
Take a look at ngrams. Or actually, edge ngrams are probably all you need.
Qbox has a couple of blog posts about setting up autocomplete with ngrams, so for a more in-depth discussion I would refer you to these:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
But just very quickly, this should get you started.
First I set up the index:
PUT /test_index
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"stop",
"kstem",
"edgengram_filter"
]
}
},
"filter": {
"edgengram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 15
}
}
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "standard"
},
"price":{
"type": "integer"
}
}
}
}
}
Then I indexed your documents:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name": "very cool shoes","price": 26}
{"index":{"_id":2}}
{"name": "great shampoo","price": 15}
{"index":{"_id":3}}
{"name": "shirt","price": 25}
Now I can get autocomplete results with a simple match query:
POST /test_index/_search
{
"query": {
"match": {
"name": "sh"
}
}
}
which returns:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.30685282,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 0.30685282,
"_source": {
"name": "shirt",
"price": 25
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.19178301,
"_source": {
"name": "great shampoo",
"price": 15
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.15342641,
"_source": {
"name": "very cool shoes",
"price": 26
}
}
]
}
}
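To see why a plain match on "sh" finds all three documents, it can help to look at what an edge n-gram filter with min_gram 2 and max_gram 15 emits for each word. The sketch below is a simplified Python imitation, not the actual Elasticsearch analyzer: it splits on whitespace and lowercases, and it skips the stop and kstem filters from the index settings. The function names are made up for illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of token whose length is between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    """Approximate the indexed terms: lowercase, split on whitespace, edge n-gram."""
    terms = set()
    for word in text.lower().split():
        terms.update(edge_ngrams(word))
    return terms


NAMES = ["very cool shoes", "great shampoo", "shirt"]


def autocomplete(prefix, names=NAMES):
    """Return the names whose indexed terms contain the typed prefix as a whole term."""
    return [name for name in names if prefix.lower() in index_terms(name)]


if __name__ == "__main__":
    print(edge_ngrams("shirt"))
    print(autocomplete("sh"))
```

Because the search side uses the standard analyzer, the query "sh" stays a single term and is compared against these stored prefixes. With min_gram set to 2, a single typed character matches nothing.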
Here's the code I used to test it:
http://sense.qbox.io/gist/0886488ddfb045c69eed67b15e9734187c8b2491
I created an index like this:
PUT twitter
PUT twitter/_mapping/myType
{
"myType" : {
"properties" : {
"message" : {"type" : "date",
"date_detection": true,
"store" : true }
}
}
}
Then I indexed several documents:
POST twitter/myType
{
"message":123456
}
I have this document and others with the message values "123456", -123456, "2014-01-01", and "-123456" (note the difference between strings and numbers here). Only the document with the value "12#3454" failed to index.
So now I execute:
GET twitter/myType/_search?pretty=true&q=*:*
And the results are:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5JvFsHvhUOO_5MdfCv",
"_score": 1,
"_source": {
"message": -123456
}
},
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5Ju6aOvhUOO_5MdfCs",
"_score": 1,
"_source": {
"message": "123456"
}
},
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5Ju0KOvhUOO_5MdfCq",
"_score": 1,
"_source": {
"message": "2014-01-01"
}
},
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5JvDiGvhUOO_5MdfCu",
"_score": 1,
"_source": {
"message": "-123456"
}
}
]
}
}
Why do I get these values in the date field instead of a string value (ISODateTimeFormat.dateOptionalTimeParser)? Is there a way to get all dates in one format (e.g. string or millis)?
The Elasticsearch version is 1.4.3.
That's the _source you are seeing, meaning the exact JSON you indexed, no formatting, nothing.
If you want to see what ES actually indexed (meaning the date in milliseconds), you can use fielddata_fields:
GET /twitter/myType/_search
{
"query": {
"match_all": {}
},
"fielddata_fields": [
"message"
]
}
As for your question, getting every date back in a single format is not available out of the box. You need to use script_fields:
GET /twitter/myType/_search
{
"query": {
"match_all": {}
},
"fielddata_fields": [
"message"
],
"_source": "*",
"script_fields": {
"my_script": {
"script": "new Date(doc[\"message\"].value)"
}
}
}
Also, your mapping is wrong: date_detection should be set on the type, not on the field:
PUT twitter
{
"mappings": {
"myType": {
"date_detection": true,
"properties": {
"message": {
"type": "date",
"store": true
}
}
}
}
}
The output below shows how ES treats the values you indexed: JSON numbers are read as milliseconds since the epoch, while numeric strings are parsed as years:
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5J93Q-I7tQJ10g6jk5",
"_score": 1,
"_source": {
"message": "123456"
},
"fields": {
"message": [
3833727840000000
],
"my_script": [
"123456-01-01T00:00:00.000Z"
]
}
},
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5J93Q-I7tQJ10g6jk4",
"_score": 1,
"_source": {
"message": 123456
},
"fields": {
"message": [
123456
],
"my_script": [
"1970-01-01T00:02:03.456Z"
]
}
},
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5J93Q-I7tQJ10g6jk8",
"_score": 1,
"_source": {
"message": "-123456"
},
"fields": {
"message": [
-3958062278400000
],
"my_script": [
"-123456-01-01T00:00:00.000Z"
]
}
},
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5J93Q-I7tQJ10g6jk7",
"_score": 1,
"_source": {
"message": "2014-01-01"
},
"fields": {
"message": [
1388534400000
],
"my_script": [
"2014-01-01T00:00:00.000Z"
]
}
},
{
"_index": "twitter",
"_type": "myType",
"_id": "AU5J93Q-I7tQJ10g6jk6",
"_score": 1,
"_source": {
"message": -123456
},
"fields": {
"message": [
-123456
],
"my_script": [
"1969-12-31T23:57:56.544Z"
]
}
}
]
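The numeric cases in that output can be reproduced with a few lines of Python, which may make the millisecond interpretation easier to verify. This is only an illustration of the arithmetic, not of how Elasticsearch parses dates internally; the helper names are made up, and the year-123456 cases are left out because Python's datetime only supports years up to 9999.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_iso(millis):
    """Format milliseconds since the epoch as an ISO 8601 UTC timestamp."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def iso_date_to_millis(date_string):
    """Convert a yyyy-MM-dd date (taken as UTC midnight) to epoch milliseconds."""
    day = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return (day - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(millis_to_iso(123456))
    print(millis_to_iso(-123456))
    print(iso_date_to_millis("2014-01-01"))
```

The three printed values correspond to the my_script results for 123456 and -123456 and to the fielddata value for "2014-01-01" in the output above.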
I'm building my first RESTful API and thought I'd try Elasticsearch as a base. Is there a way to customize the API in Elasticsearch to return only certain fields from the results of a query? For instance, suppose I have data with fname, lname, city, state, zip, and email, and I only want to return a list of fnames and cities for every query matching the city field. So something like this:
curl -XPOST "http://localhost:9200/custom_call/_search" -d'
{
"query": {
"query_string": {
"query": "Toronto",
"fields": ["city"]
}
}
}'
Would ideally return something like:
{"took": 52, "timed_out": false, "_shards": {
"total": 35,
"successful": 35,
"failed": 0
}, "hits": {
"total": 1,
"max_score": 0.375,
"hits": [
{
"_index": "persons",
"_type": "person",
"_id": "6",
"_score": 0.375,
"_source": {
"fname": "Bob",
"city": "Toronto"
}
},
{
"_index": "persons",
"_type": "person",
"_id": "13",
"_score": 0.375,
"_source": {
"fname": "Sue",
"city": "Toronto"
}
},
{
"_index": "persons",
"_type": "person",
"_id": "21",
"_score": 0.375,
"_source": {
"fname": "Jose",
"city": "Toronto"
}
}
]
}}
I'm not sure if Elasticsearch is set up to do this, or even if you would want it to. This is my first foray into building a RESTful API. I figure if NPR and StackOverflow like it, it's worth a shot! Thanks for the help.
Yes, you can. I think you haven't tried to find out on your own.
Here is how to do that:
POST localhost:9200/index/type/_search
{
"query": {
"query_string": {
"query": "Toronto",
"fields": ["city"]
}
},
"_source" :["fields_you_want_to_get"]
}
The term you are looking for is source filtering.
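Applied to the question, the placeholder in _source would be the two wanted fields, fname and city. The sketch below builds that request body in Python and shows, on a plain dict, what the trimmed _source of each hit would contain. The helper names are made up, and the sample person's lname, state, zip, and email values are invented for illustration.

```python
import json


def build_search_body(city, wanted_fields):
    """Query the city field and ask for only wanted_fields in each hit's _source."""
    return {
        "query": {"query_string": {"query": city, "fields": ["city"]}},
        "_source": list(wanted_fields),
    }


def filter_source(source, wanted_fields):
    """Local illustration of source filtering: keep only the requested keys."""
    return {key: source[key] for key in wanted_fields if key in source}


# Hypothetical full document; only fname and city come from the question.
PERSON = {
    "fname": "Bob",
    "lname": "Example",
    "city": "Toronto",
    "state": "ON",
    "zip": "00000",
    "email": "bob@example.com",
}

if __name__ == "__main__":
    print(json.dumps(build_search_body("Toronto", ["fname", "city"]), indent=2))
    print(filter_source(PERSON, ["fname", "city"]))
```

The rest of the response (took, _shards, hit metadata such as _index and _id) is unaffected; only the _source object of each hit is reduced.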
I'm new to Elasticsearch. I managed to set it up and import a recordset from my MongoDB collection using the river plugin. For a start, I want to query against the "desc" field, but I just can't get the query to work. I'm not sure if the problem is caused by the way the index was defined. Can anyone help, please?
A sample recordset in Elasticsearch looks like this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 107209,
"max_score": 1,
"hits": [
{
"_index": "shiv",
"_type": "shiv",
"_id": "iG1eIzN7RGO7hFfxTlnLuA",
"_score": 1,
"_source": {
"_id": {
"$oid": "50901d7f485bf7bd1c000021"
},
"brand": "",
"category": {
"$ref": "categories",
"$id": {
"$oid": "4fbd2221758cb11d14000174"
}
},
"comments": [],
"count_comment": 0,
"count_fav": 2,
"count_hotness": 1.46,
"count_rekick": 0,
"count_share": 0,
"country": {
"$ref": "countries",
"$id": {
"$oid": "4fec98f7758cb18c6e0002c9"
}
},
"currency": "pound",
"desc": "A men's automatic watch, this Seamaster Bond model features a Co-Axial escapement and date function. Its blue dial is teamed with a stainless steel case and bracelet for a look that's sporty and refined.",
"gender": "male",
"ident": "omega-seamaster-diver-bond-men-s-automatic-watch---ernest-jones-1351622015",
"img_url": "http://s7ondemand4.scene7.com/is/image/Signet/5735793?$detail$",
"lifestyles": [
{
"$ref": "lifestyles",
"$id": {
"$oid": "508ff6ca485bf73112000060"
}
}
],
"location": "United Kingdom",
"owner": {
"$ref": "accounts",
"$id": {
"$oid": "50742fd8485bf74b7a00213f"
}
},
"price": 2400,
"store": "ernestjones.co.uk",
"tags": [
"ernest-jones",
"bond"
],
"timestamp_creation": 1351622015,
"timestamp_exp": 1356825600,
"timestamp_update": 1351622015,
"title": "Omega Seamaster Diver Bond men's automatic watch - Ernest Jones",
"url": "http%3A%2F%2Fwww.ernestjones.co.uk%2Fwebstore%2Fd%2F5735793%2Fomega%20seamaster%20diver%20bond%20men%27s%20automatic%20watch%2F%3Futm_source%3Dgooglebase%26utm_medium%3Dfeedmanager%26cm_mmc%3DFroogle-_-CKB-_-nurses_fobs-_-watches%26cm_mmca1%3Domega%26cm_mmca2%3Dmale%26cm_mmca3%3Dadult"
}
}
]
}
}
The mapping of the index "shiv" looks like this:
{
"shiv": {
"properties": {
"$oid": {
"type": "string"
}
}
}
}
Thanks again
There are lots of ways to query. Have you tried a match query?
Using curl or a REST client of your choice:
http://[host]:9200/[index_name]/[doc_type]/_search
{
"query" : {
"match" : {
"desc" : "some value you want to find in desc"
}
}
}
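If you prefer to send it from code, here is a minimal sketch using only Python's standard library. It fills the URL template with the index and type name "shiv" from the question and assumes the cluster is reachable on localhost; the search text is just an example. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_match_request(host, index_name, doc_type, field, text):
    """Build (but do not send) a POST _search request with a match query."""
    url = f"http://{host}:9200/{index_name}/{doc_type}/_search"
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "localhost" is an assumption; "shiv" is the index and type from the question.
request = build_match_request("localhost", "shiv", "shiv", "desc", "automatic watch")

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

With a running cluster, passing the request to urllib.request.urlopen and decoding the response with json.loads would give the usual hits structure.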