Building my first RESTful API, and thought I'd try Elasticsearch as a base. Is there a way to customize the API in Elasticsearch to return only certain fields from the results of a query? For instance, say I have data with fname, lname, city, state, zip, and email, and I only want to return a list of fnames and cities for every query matching the city field. So something like this:
curl -XPOST "http://localhost:9200/custom_call/_search" -d'
{
"query": {
"query_string": {
"query": "Toronto",
"fields": ["city"]
}
}
}'
Would ideally return something like:
{"took": 52, "timed_out": false, "_shards": {
"total": 35,
"successful": 35,
"failed": 0
}, "hits": {
"total": 1,
"max_score": 0.375,
"hits": [
{
"_index": "persons",
"_type": "person",
"_id": "6",
"_score": 0.375,
"_source": {
"fname": "Bob",
"city": "Toronto"
}
},
{
"_index": "persons",
"_type": "person",
"_id": "13",
"_score": 0.375,
"_source": {
"fname": "Sue",
"city": "Toronto"
}
},
{
"_index": "persons",
"_type": "person",
"_id": "21",
"_score": 0.375,
"_source": {
"fname": "Jose",
"city": "Toronto"
}
}
]
}}
Not sure if Elasticsearch is set up to do this, or even if you would want it to. This is my first foray into building a RESTful API. I figure if NPR and StackOverflow like it, it's worth a shot! Thanks for the help.
Yes, you can, though I suspect you haven't tried to find this out on your own.
Here is how to do it:
POST localhost:9200/index/type/_search
{
"query": {
"query_string": {
"query": "Toronto",
"fields": ["city"]
}
},
"_source" :["fields_you_want_to_get"]
}
The term you are looking for is source filtering.
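As a rough sketch of the same idea from Python (not run against a live cluster), the request body can be built as a plain dict. The helper names build_search_body and project_source are made up for illustration; project_source only mimics the effect locally for top-level keys, whereas real source filtering also accepts wildcard patterns and nested paths.

```python
import json


def build_search_body(city, wanted_fields):
    """Build a search body that matches on city and limits _source to wanted_fields."""
    return {
        "query": {"query_string": {"query": city, "fields": ["city"]}},
        "_source": list(wanted_fields),
    }


def project_source(source, wanted_fields):
    """Local illustration only: keep the wanted top-level keys of one _source dict."""
    return {key: value for key, value in source.items() if key in wanted_fields}


# The body you would POST to the _search endpoint of your index.
body = build_search_body("Toronto", ["fname", "city"])
print(json.dumps(body, indent=2))
```

With the fields from the question, wanted_fields would be ["fname", "city"], so each hit's _source would carry only those two keys.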
I have successfully imported some JSON data into Cloudant; the JSON data has three levels. I then created the dashDB warehouse from Cloudant to put the data into relational tables. It appears that dashDB has created three tables, one for each level in the JSON data, but has not provided me with a key to join back to the top level. Is there a customisation somewhere that tells dashDB how to join the tables?
A sample JSON doc is below:
{
"_id": "579b56388aa56fd03a4fd0a9",
"_rev": "1-698183d4326352785f213b823749b9f8",
"v": 0,
"startTime": "2016-07-29T12:48:04.204Z",
"endTime": "2016-07-29T13:11:48.962Z",
"userId": "Ranger1",
"uuid": "497568578283117a",
"modes": [
{
"startTime": "2016-07-29T12:54:22.565Z",
"endTime": "2016-07-29T12:54:49.894Z",
"name": "bicycle",
"_id": "579b56388aa56fd03a4fd0b1",
"locations": []
},
{
"startTime": "2016-07-29T12:48:02.477Z",
"endTime": "2016-07-29T12:53:28.503Z",
"name": "walk",
"_id": "579b56388aa56fd03a4fd0ad",
"locations": [
{
"at": "2016-07-29T12:49:05.716Z",
"_id": "579b56388aa56fd03a4fd0b0",
"location": {
"coords": {
"latitude": -34.0418308,
"longitude": 18.3503616,
"accuracy": 37.5,
"speed": 0,
"heading": 0,
"altitude": 0
},
"battery": {
"is_charging": true,
"level": 0.7799999713897705
}
}
},
{
"at": "2016-07-29T12:49:48.488Z",
"_id": "579b56388aa56fd03a4fd0af",
"location": {
"coords": {
"latitude": -34.0418718,
"longitude": 18.3503895,
"accuracy": 33,
"speed": 0,
"heading": 0,
"altitude": 0
},
"battery": {
"is_charging": true,
"level": 0.7799999713897705
}
}
},
{
"at": "2016-07-29T12:50:20.760Z",
"_id": "579b56388aa56fd03a4fd0ae",
"location": {
"coords": {
"latitude": -34.0418788,
"longitude": 18.3503887,
"accuracy": 33,
"speed": 0,
"heading": 0,
"altitude": 0
},
"battery": {
"is_charging": true,
"level": 0.7799999713897705
}
}
}
]
},
{
"startTime": "2016-07-29T12:53:37.137Z",
"endTime": "2016-07-29T12:54:18.505Z",
"name": "carshare",
"_id": "579b56388aa56fd03a4fd0ac",
"locations": []
},
{
"startTime": "2016-07-29T12:54:54.112Z",
"endTime": "2016-07-29T13:11:47.818Z",
"name": "bus",
"_id": "579b56388aa56fd03a4fd0aa",
"locations": [
{
"at": "2016-07-29T13:00:08.039Z",
"_id": "579b56388aa56fd03a4fd0ab",
"location": {
"coords": {
"latitude": -34.0418319,
"longitude": 18.3503623,
"accuracy": 36,
"speed": 0,
"heading": 0,
"altitude": 0
},
"battery": {
"is_charging": false,
"level": 0.800000011920929
}
}
}
]
}
]
}
The SQL for the three tables created in dashDB, showing all the fields in each table, is below. Note that there is no FK that I can see; the "_ID" fields are unique to each table.
SELECT ENDTIME,STARTTIME,USERID,UUID,V,"_ID","_REV"
FROM <schemaname>.RANGER_DATA
where "_ID" = '579b56388aa56fd03a4fd0a9'
SELECT ARRAY_INDEX,ENDTIME,NAME,STARTTIME,TOTALPAUSEDMS,"_ID"
FROM <schemaname>.RANGER_DATA_MODES
where "_ID" = '579b56388aa56fd03a4fd0b1'
SELECT ARRAY_INDEX,AT,LOCATION_BATTERY_IS_CHARGING,LOCATION_BATTERY_LEVEL,LOCATION_COORDS_ACCURACY,LOCATION_COORDS_ALTITUDE,LOCATION_COORDS_HEADING,LOCATION_COORDS_LATITUDE,LOCATION_COORDS_LONGITUDE,LOCATION_COORDS_SPEED,RANGER_DATA_MODES,"_ID"
FROM <schemaname>.RANGER_DATA_MODES_LOCATIONS
where "_ID" = '579b56388aa56fd03a4fd0b0'
Cloudant uses _id as the unique ID for each document. It seems that the warehousing task iterates over these documents and assumes there is a new document every time it sees a new _id.
Because you're using _id in your modes and locations, this produces an undesired result in the SQL DB.
Renaming _id in modes and locations to something else should fix the problem.
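If you need to do the rename before the documents go into Cloudant, something along these lines might work. It is only a sketch: rename_nested_ids and the replacement key item_id are names I made up, and I have not verified how the warehousing task treats the renamed field.

```python
def rename_nested_ids(doc, new_key="item_id"):
    """Return a copy of doc where every _id below the top level is renamed.

    The top-level _id is left alone because Cloudant needs it.
    new_key is a hypothetical name; pick whatever suits your schema.
    """
    def walk(value, top_level):
        if isinstance(value, dict):
            out = {}
            for key, child in value.items():
                if key == "_id" and not top_level:
                    out[new_key] = child
                else:
                    out[key] = walk(child, False)
            return out
        if isinstance(value, list):
            return [walk(item, False) for item in value]
        return value

    return walk(doc, True)


# Trimmed-down version of the sample document from the question.
sample = {
    "_id": "579b56388aa56fd03a4fd0a9",
    "modes": [
        {
            "name": "walk",
            "_id": "579b56388aa56fd03a4fd0ad",
            "locations": [{"at": "2016-07-29T12:49:05.716Z", "_id": "579b56388aa56fd03a4fd0b0"}],
        }
    ],
}
print(rename_nested_ids(sample))
```

The function builds a new structure instead of editing in place, so the original document stays available if you need to compare.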
My document looks like:
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "test_db2",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"name": "very cool shoes",
"price": 26
}
},
{
"_index": "test_db2",
"_type": "test",
"_id": "2",
"_score": 1,
"_source": {
"name": "great shampoo",
"price": 15
}
},
{
"_index": "test_db2",
"_type": "test",
"_id": "3",
"_score": 1,
"_source": {
"name": "shirt",
"price": 25
}
}
]
}
How can I create autocomplete in Elasticsearch? For example:
when I type "sh" into the input, I should see these results:
shoes
shampoo
shirt
.....
Take a look at ngrams. Or actually, edge ngrams are probably all you need.
Qbox has a couple of blog posts about setting up autocomplete with ngrams, so for a more in-depth discussion I would refer you to these:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
But just very quickly, this should get you started.
First I set up the index:
PUT /test_index
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"stop",
"kstem",
"edgengram_filter"
]
}
},
"filter": {
"edgengram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 15
}
}
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "standard"
},
"price":{
"type": "integer"
}
}
}
}
}
Then I indexed your documents:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name": "very cool shoes","price": 26}
{"index":{"_id":2}}
{"name": "great shampoo","price": 15}
{"index":{"_id":3}}
{"name": "shirt","price": 25}
Now I can get autocomplete results with a simple match query:
POST /test_index/_search
{
"query": {
"match": {
"name": "sh"
}
}
}
which returns:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.30685282,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 0.30685282,
"_source": {
"name": "shirt",
"price": 25
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.19178301,
"_source": {
"name": "great shampoo",
"price": 15
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.15342641,
"_source": {
"name": "very cool shoes",
"price": 26
}
}
]
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/0886488ddfb045c69eed67b15e9734187c8b2491
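To see why a plain match on "sh" works, here is a small Python sketch of what the edge n-gram filter produces at index time, using the same min_gram of 2 and max_gram of 15 as above. It is a simplification: a whitespace split stands in for the standard tokenizer, the stop and kstem filters are ignored, and the function names are mine.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of token whose length is between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text):
    """Approximate the indexed terms for a name: lowercase, split, then edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


names = ["very cool shoes", "great shampoo", "shirt"]
# The search side is not n-grammed, so "sh" is looked up as a single term.
matches = [name for name in names if "sh" in index_terms(name)]
print(matches)
```

Because "sh" is stored as one of the indexed terms for shoes, shampoo and shirt, the query term only has to match it exactly.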
I have not yet used Elasticsearch so please excuse the bad description. I would like to know if it is possible to configure Elasticsearch to do the following - I had some issues in MongoDB with this as the full text search functionalities seem to be a little limiting.
Here's my problem - when I do a search for the term Korea I do not
want it to match North Korea or N. Korea in the document.
The assumption is that a search for Korea is about South Korea. This is obviously different from a synonym as it is kind of the opposite. A phrase search for South Korea is out of the question here as it isn't applicable in my problem. Is this possible?
I will accept answers for either MongoDB or Elasticsearch.
What if you use a query like this one:
{
"query": {
"bool": {
"should": [
{
"match": {
"some_field": "korea"
}
},
{
"query_string": {
"query": "-some_field:(\"north korea\")"
}
},
{
"query_string": {
"query": "-some_field:(\"n. korea\")"
}
}
]
}
}
}
Here is what it does:
if the field content matches "korea", the document receives a score;
if the field doesn't match "north korea", it gets a score boost;
and if it doesn't match "n. korea", it gets an additional boost.
Basically, the score increases if the field matches "korea", if it doesn't match "north korea", and if it doesn't match "n. korea".
For example, for documents like these (here the field is called text, so it takes the place of some_field in the query above):
POST /my_index/test/1
{
"text": "North Korea"
}
POST /my_index/test/2
{
"text": "Korea"
}
POST /my_index/test/3
{
"text": "N. Korea"
}
POST /my_index/test/4
{
"text": "South Korea"
}
The query above will return this:
"hits": [
{
"_index": "korea",
"_type": "test",
"_id": "2",
"_score": 1.4471208,
"_source": {
"text": "Korea"
}
},
{
"_index": "korea",
"_type": "test",
"_id": "4",
"_score": 1.4227209,
"_source": {
"text": "South Korea"
}
},
{
"_index": "korea",
"_type": "test",
"_id": "1",
"_score": 0.48779577,
"_source": {
"text": "North Korea"
}
},
{
"_index": "korea",
"_type": "test",
"_id": "3",
"_score": 0.48779577,
"_source": {
"text": "N. Korea"
}
}
]
The highest scores go to the documents that are not about North Korea.
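The ordering can be mimicked with a toy model in Python. This is not how Elasticsearch computes scores; it just counts how many of the three should clauses a document satisfies, using substring checks instead of analyzed terms. toy_score is a made-up name.

```python
def toy_score(text):
    """Count how many of the three should clauses the text satisfies."""
    lowered = text.lower()
    clauses = [
        "korea" in lowered,            # match on "korea"
        "north korea" not in lowered,  # negated "north korea" clause
        "n. korea" not in lowered,     # negated "n. korea" clause
    ]
    return sum(clauses)


docs = ["North Korea", "Korea", "N. Korea", "South Korea"]
# sorted() is stable, so documents with equal scores keep their input order.
ranked = sorted(docs, key=toy_score, reverse=True)
for text in ranked:
    print(toy_score(text), text)
```

"Korea" and "South Korea" satisfy all three clauses, while the two North Korea variants each miss one, which is the same grouping as in the real results.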
I'm new to Elasticsearch. I managed to set it up and import records from my MongoDB collection using the river plugin. For a start, I want to query against the "desc" field, but I just can't get the query to work. I'm not sure whether the problem is caused by the way the index was defined. Can anyone help, please?
A sample record in Elasticsearch looks like this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 107209,
"max_score": 1,
"hits": [
{
"_index": "shiv",
"_type": "shiv",
"_id": "iG1eIzN7RGO7hFfxTlnLuA",
"_score": 1,
"_source": {
"_id": {
"$oid": "50901d7f485bf7bd1c000021"
},
"brand": "",
"category": {
"$ref": "categories",
"$id": {
"$oid": "4fbd2221758cb11d14000174"
}
},
"comments": [],
"count_comment": 0,
"count_fav": 2,
"count_hotness": 1.46,
"count_rekick": 0,
"count_share": 0,
"country": {
"$ref": "countries",
"$id": {
"$oid": "4fec98f7758cb18c6e0002c9"
}
},
"currency": "pound",
"desc": "A men's automatic watch, this Seamaster Bond model features a Co-Axial escapement and date function. Its blue dial is teamed with a stainless steel case and bracelet for a look that's sporty and refined.",
"gender": "male",
"ident": "omega-seamaster-diver-bond-men-s-automatic-watch---ernest-jones-1351622015",
"img_url": "http://s7ondemand4.scene7.com/is/image/Signet/5735793?$detail$",
"lifestyles": [
{
"$ref": "lifestyles",
"$id": {
"$oid": "508ff6ca485bf73112000060"
}
}
],
"location": "United Kingdom",
"owner": {
"$ref": "accounts",
"$id": {
"$oid": "50742fd8485bf74b7a00213f"
}
},
"price": 2400,
"store": "ernestjones.co.uk",
"tags": [
"ernest-jones",
"bond"
],
"timestamp_creation": 1351622015,
"timestamp_exp": 1356825600,
"timestamp_update": 1351622015,
"title": "Omega Seamaster Diver Bond men's automatic watch - Ernest Jones",
"url": "http%3A%2F%2Fwww.ernestjones.co.uk%2Fwebstore%2Fd%2F5735793%2Fomega%20seamaster%20diver%20bond%20men%27s%20automatic%20watch%2F%3Futm_source%3Dgooglebase%26utm_medium%3Dfeedmanager%26cm_mmc%3DFroogle-_-CKB-_-nurses_fobs-_-watches%26cm_mmca1%3Domega%26cm_mmca2%3Dmale%26cm_mmca3%3Dadult"
}
}
]
}
}
The mapping of the index "shiv" looks like this:
{
"shiv": {
"properties": {
"$oid": {
"type": "string"
}
}
}
}
Thanks again
There are lots of ways to query. Have you tried a match query?
Using curl or a REST client of your choice:
http://[host]:9200/[index_name]/[doc_type]/_search
{
"query" : {
"match" : {
"desc" : "some value you want to find in desc"
}
}
}
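If you would rather build the request in Python, a minimal sketch could look like this. It only assembles the URL and body and does not send anything. The host localhost and the search text are assumptions; the index and type names are taken from your sample hit, and build_match_search is a made-up helper name.

```python
import json


def build_match_search(host, index_name, doc_type, field, text):
    """Return the _search URL and the match query body for one field."""
    url = "http://{}:9200/{}/{}/_search".format(host, index_name, doc_type)
    body = {"query": {"match": {field: text}}}
    return url, body


# "localhost" and the search text are placeholders for illustration.
url, body = build_match_search("localhost", "shiv", "shiv", "desc", "automatic watch")
print(url)
print(json.dumps(body, indent=2))
```

You can then POST the body to the URL with whichever HTTP client you prefer.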
I'm fairly new to Elasticsearch. I'm trying to write a query that will group by a field and calculate a sum. In SQL, my query would look like this:
SELECT lane, SUM(routes) FROM lanes GROUP BY lane
My data looks like this in ES:
{
"_index": "kpi",
"_type": "mroutes_by_lane",
"_id": "TUeWFEhnS9q1Ukb2QdZABg",
"_score": 1.0,
"_source": {
"warehouse_id": 107,
"date": "2013-04-08",
"lane": "M05",
"routes": 4047
}
},
{
"_index": "kpi",
"_type": "mroutes_by_lane",
"_id": "owVmGW9GT562_2Alfru2DA",
"_score": 1.0,
"_source": {
"warehouse_id": 107,
"date": "2013-04-08",
"lane": "M03",
"routes": 4065
}
},
{
"_index": "kpi",
"_type": "mroutes_by_lane",
"_id": "JY9xNDxqSsajw76oMC2gxA",
"_score": 1.0,
"_source": {
"warehouse_id": 107,
"date": "2013-04-08",
"lane": "M05",
"routes": 3056
}
},
{
"_index": "kpi",
"_type": "mroutes_by_lane",
"_id": "owVmGW9GT345_2Alfru2DB",
"_score": 1.0,
"_source": {
"warehouse_id": 107,
"date": "2013-04-08",
"lane": "M03",
"routes": 5675
}
},
...
I essentially want to run the same query in ES as I did in SQL, so that my result would be something like this (in JSON, of course): M05: 7103, M03: 9740
In Elasticsearch, you can achieve this by using a terms_stats facet:
{
"query" : {
"match_all" : { }
},
"facets" : {
"lane_routes_stats" : {
"terms_stats" : {
"key_field" : "lane",
"value_field" : "routes",
"order": "term"
}
}
}
}
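For reference, here is a small Python sketch of what the grouping computes, run locally on the four sample documents from the question. It also builds a terms aggregation with a sum sub-aggregation, which is the usual replacement on newer Elasticsearch versions where facets are no longer available; whether you need that depends on your version, which I am assuming here. The aggregation names by_lane and total_routes and both function names are made up.

```python
from collections import defaultdict


def sum_routes_by_lane(docs):
    """Local equivalent of SELECT lane, SUM(routes) ... GROUP BY lane."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["lane"]] += doc["routes"]
    return dict(totals)


def build_terms_sum_aggregation(key_field="lane", value_field="routes"):
    """Search body grouping by key_field and summing value_field (names are illustrative)."""
    return {
        "size": 0,  # skip the hits, return only the aggregation results
        "aggs": {
            "by_lane": {
                "terms": {"field": key_field},
                "aggs": {"total_routes": {"sum": {"field": value_field}}},
            }
        },
    }


sample_docs = [
    {"lane": "M05", "routes": 4047},
    {"lane": "M03", "routes": 4065},
    {"lane": "M05", "routes": 3056},
    {"lane": "M03", "routes": 5675},
]
print(sum_routes_by_lane(sample_docs))
```

The local totals match the numbers you expect from the SQL query, so they make a handy check against whatever the cluster returns.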