Predefined Term Mapping - mongodb

I have not used Elasticsearch yet, so please excuse the rough description. I would like to know whether Elasticsearch can be configured to do the following. I had some issues with this in MongoDB, as its full-text search functionality seems a little limiting.
Here's my problem: when I search for the term Korea, I do not want it to match North Korea or N. Korea in the document.
The assumption is that a search for Korea is about South Korea. This is obviously different from a synonym as it is kind of the opposite. A phrase search for South Korea is out of the question here as it isn't applicable in my problem. Is this possible?
I will accept answers for either MongoDB or Elasticsearch.

What if you use a query like this one:
{
"query": {
"bool": {
"should": [
{
"match": {
"some_field": "korea"
}
},
{
"query_string": {
"query": "-some_field:(\"north korea\")"
}
},
{
"query_string": {
"query": "-some_field:(\"n. korea\")"
}
}
]
}
}
}
Here is what it does (some_field stands for the field being searched; the sample documents below use text):
if the field content matches "korea", the document receives a score
if the field does not match "north korea", the document gets a score boost
if the field does not match "n. korea", the document gets an additional boost.
In short, the score increases if the field matches "korea", if it doesn't match "north korea", and if it doesn't match "n. korea".
For example, for documents like this
POST /my_index/test/1
{
"text": "North Korea"
}
POST /my_index/test/2
{
"text": "Korea"
}
POST /my_index/test/3
{
"text": "N. Korea"
}
POST /my_index/test/4
{
"text": "South Korea"
}
The query above will return this:
"hits": [
{
"_index": "korea",
"_type": "test",
"_id": "2",
"_score": 1.4471208,
"_source": {
"text": "Korea"
}
},
{
"_index": "korea",
"_type": "test",
"_id": "4",
"_score": 1.4227209,
"_source": {
"text": "South Korea"
}
},
{
"_index": "korea",
"_type": "test",
"_id": "1",
"_score": 0.48779577,
"_source": {
"text": "North Korea"
}
},
{
"_index": "korea",
"_type": "test",
"_id": "3",
"_score": 0.48779577,
"_source": {
"text": "N. Korea"
}
}
]
The highest scores go to documents that are not about North Korea. The North Korea documents are still returned, only with a lower score.
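If the North Korea documents should be dropped rather than ranked lower, one option is to move the negative conditions into a must_not clause. The following is only a sketch, not part of the original answer: it builds the request body as a Python dict, using a bool query with match and match_phrase. The function name korea_query and its parameters are hypothetical, and it assumes the field is analyzed so that the phrases match as they do in the output above.

```python
import json


def korea_query(field, term="korea", excluded_phrases=("north korea", "n. korea")):
    """Build a bool query that requires `term` and rejects the excluded phrases.

    `korea_query` is a hypothetical helper; only the bool / must / must_not /
    match / match_phrase keys belong to the Elasticsearch query DSL.
    """
    return {
        "query": {
            "bool": {
                # The document must mention the term at all.
                "must": [{"match": {field: term}}],
                # Any document containing one of these phrases is excluded.
                "must_not": [
                    {"match_phrase": {field: phrase}} for phrase in excluded_phrases
                ],
            }
        }
    }


body = korea_query("text")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Unlike the should version, documents matching an excluded phrase would not appear in the hits at all.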

Related

MongoDB lookup with multiple nested levels

In my application, I have a section of comments and replies under some documents.
Here's what my database schema looks like:
db.updates.insertOne({
"_id": "62347813d28412ffd82b551d",
"documentID": "17987e64-f848-40f3-817e-98adfd9f4ecd",
"stream": [
{
"id": "623478134c449b218b68f636",
"type": "comment",
"text": "Hey #john, we got a problem",
"authorID": "843df3dbbdfc62ba2d902326",
"taggedUsers": [
"623209d2ab26cfdbbd3fd348"
],
"replies": [
{
"id": "623478284c449b218b68f637",
"type": "reply",
"text": "Not sure, let's involve #jim here",
"authorID": "623209d2ab26cfdbbd3fd348",
"taggedUsers": [
"26cfdbbd3fd349623209d2ab"
]
}
]
}
]
})
db.users.insertMany([
{
"_id": "843df3dbbdfc62ba2d902326",
"name": "Manager"
},
{
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
},
{
"_id": "26cfdbbd3fd349623209d2ab",
"name": "Jim"
},
])
I want to join those two collections and replace the user IDs with complete user information at all levels. The final JSON should look like this:
{
"_id": "62347813d28412ffd82b551d",
"documentID": "17987e64-f848-40f3-817e-98adfd9f4ecd",
"stream": [
{
"id": "623478134c449b218b68f636",
"type": "comment",
"text": "Hey #john, we got a problem",
"author": {
"_id": "843df3dbbdfc62ba2d902326",
"name": "Manager"
},
"taggedUsers": [
{
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
}
],
"replies": [
{
"id": "623478284c449b218b68f637",
"type": "reply",
"text": "Not sure, let's involve #jim here",
"author": {
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
},
"taggedUsers": [
{
"_id": "26cfdbbd3fd349623209d2ab",
"name": "Jim"
}
]
}
]
}
]
}
I know how to do the $lookup on the top-level fields, including pipelines, but how can I do it for the nested ones?
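To make the desired transformation concrete, here is a plain Python sketch of it, applied in memory to the sample data above. It is not a MongoDB aggregation and does not answer the $lookup question itself; it only spells out the expected result. The helper name expand_users is hypothetical.

```python
def expand_users(update, users_by_id):
    """Return a copy of `update` with user ids replaced by full user documents."""

    def expand(entry):
        entry = dict(entry)  # shallow copy so the input is left untouched
        # authorID -> author (full user document)
        entry["author"] = users_by_id[entry.pop("authorID")]
        # taggedUsers: list of ids -> list of user documents
        entry["taggedUsers"] = [users_by_id[uid] for uid in entry.get("taggedUsers", [])]
        # Replies have the same shape as comments, so recurse into them.
        if "replies" in entry:
            entry["replies"] = [expand(reply) for reply in entry["replies"]]
        return entry

    result = dict(update)
    result["stream"] = [expand(entry) for entry in update.get("stream", [])]
    return result


users = [
    {"_id": "843df3dbbdfc62ba2d902326", "name": "Manager"},
    {"_id": "623209d2ab26cfdbbd3fd348", "name": "John"},
    {"_id": "26cfdbbd3fd349623209d2ab", "name": "Jim"},
]
update = {
    "_id": "62347813d28412ffd82b551d",
    "documentID": "17987e64-f848-40f3-817e-98adfd9f4ecd",
    "stream": [{
        "id": "623478134c449b218b68f636",
        "type": "comment",
        "text": "Hey #john, we got a problem",
        "authorID": "843df3dbbdfc62ba2d902326",
        "taggedUsers": ["623209d2ab26cfdbbd3fd348"],
        "replies": [{
            "id": "623478284c449b218b68f637",
            "type": "reply",
            "text": "Not sure, let's involve #jim here",
            "authorID": "623209d2ab26cfdbbd3fd348",
            "taggedUsers": ["26cfdbbd3fd349623209d2ab"],
        }],
    }],
}

joined = expand_users(update, {user["_id"]: user for user in users})
print(joined["stream"][0]["author"])
```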

check if a field of type array contains an array

I'm using Mongoose, and I have the following data in the user collection:
[{
"_id": "1",
"notes": [
{
"value": "A90",
"text": "math"
},
{
"value": "A80",
"text": "english"
},
{
"value": "A70",
"text": "art"
}
]
},
{
"_id": "2",
"notes": [
{
"value": "A90",
"text": "math"
},
{
"value": "A80",
"text": "english"
}
]
},
{
"_id": "3",
"notes": [
{
"value": "A80",
"text": "art"
}
]
}]
and I have the following array as a parameter: [ "A90", "A80" ]
I want a query that uses this array to return only the records whose notes contain all of the array items in their value field.
For the example above it should return:
[{
"_id": "1",
"notes": [
{
"value": "A90",
"text": "math"
},
{
"value": "A80",
"text": "english"
},
{
"value": "A70",
"text": "art"
}
]
},
{
"_id": "2",
"notes": [
{
"value": "A90",
"text": "math"
},
{
"value": "A80",
"text": "english"
}
]
}]
I tried the following find query:
{ "notes": { $elemMatch: { value: { $in: valuesArray } } }}
but it returns a record even if just one element of valuesArray exists in it.
It turned out to be quite easy:
find({ "notes.value": { $all: valuesArray } })
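The difference between the two queries can be illustrated with a small in-memory emulation in Python. This is only a sketch of the matching semantics on the sample data, not MongoDB code; the helper names matches_any and matches_all are hypothetical.

```python
users = [
    {"_id": "1", "notes": [{"value": "A90", "text": "math"},
                           {"value": "A80", "text": "english"},
                           {"value": "A70", "text": "art"}]},
    {"_id": "2", "notes": [{"value": "A90", "text": "math"},
                           {"value": "A80", "text": "english"}]},
    {"_id": "3", "notes": [{"value": "A80", "text": "art"}]},
]


def matches_any(doc, values):
    """Emulates $elemMatch with $in: one note with any listed value is enough."""
    return any(note["value"] in values for note in doc["notes"])


def matches_all(doc, values):
    """Emulates $all on notes.value: every listed value must appear in some note."""
    present = {note["value"] for note in doc["notes"]}
    return all(value in present for value in values)


values_array = ["A90", "A80"]
any_ids = [doc["_id"] for doc in users if matches_any(doc, values_array)]
all_ids = [doc["_id"] for doc in users if matches_all(doc, values_array)]
print(any_ids)  # ['1', '2', '3']
print(all_ids)  # ['1', '2']
```

Record 3 only has A80, which is why the $in version returns it and the $all version does not.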

ElasticSearch autocomplete for keywords from a string

My documents look like this:
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "test_db2",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"name": "very cool shoes",
"price": 26
}
},
{
"_index": "test_db2",
"_type": "test",
"_id": "2",
"_score": 1,
"_source": {
"name": "great shampoo",
"price": 15
}
},
{
"_index": "test_db2",
"_type": "test",
"_id": "3",
"_score": 1,
"_source": {
"name": "shirt",
"price": 25
}
}
]
}
How can I create autocomplete in Elasticsearch? For example, when I type "sh" into the input, I should see these results:
shoes
shampoo
shirt
.....
Take a look at ngrams. Or actually, edge ngrams are probably all you need.
Qbox has a couple of blog posts about setting up autocomplete with ngrams, so for a more in-depth discussion I would refer you to these:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
But just very quickly, this should get you started.
First I set up the index:
PUT /test_index
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"stop",
"kstem",
"edgengram_filter"
]
}
},
"filter": {
"edgengram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 15
}
}
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "standard"
},
"price":{
"type": "integer"
}
}
}
}
}
Then I indexed your documents:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name": "very cool shoes","price": 26}
{"index":{"_id":2}}
{"name": "great shampoo","price": 15}
{"index":{"_id":3}}
{"name": "shirt","price": 25}
Now I can get autocomplete results with a simple match query:
POST /test_index/_search
{
"query": {
"match": {
"name": "sh"
}
}
}
which returns:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.30685282,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 0.30685282,
"_source": {
"name": "shirt",
"price": 25
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.19178301,
"_source": {
"name": "great shampoo",
"price": 15
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.15342641,
"_source": {
"name": "very cool shoes",
"price": 26
}
}
]
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/0886488ddfb045c69eed67b15e9734187c8b2491
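Why the "sh" query matches all three documents can be seen from what an edge n-gram filter emits. The following Python sketch reproduces only that step, with min_gram 2 and max_gram 15 as in the settings above. It is a simplification: tokens are split on whitespace, and the stop and kstem filters are skipped. The function names edge_ngrams and build_index are hypothetical.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(names):
    """Map every edge n-gram to the set of names containing a token with that prefix."""
    index = {}
    for name in names:
        # Simplified analysis: lowercase and split on whitespace only.
        for token in name.lower().split():
            for gram in edge_ngrams(token):
                index.setdefault(gram, set()).add(name)
    return index


names = ["very cool shoes", "great shampoo", "shirt"]
index = build_index(names)

print(edge_ngrams("shirt"))            # ['sh', 'shi', 'shir', 'shirt']
print(sorted(index.get("sh", set())))  # ['great shampoo', 'shirt', 'very cool shoes']
```

Because the prefixes are stored at index time, the search side can use the plain standard analyzer: the query term "sh" is looked up as is and finds every document that has a token starting with it.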

Custom API in Elastic search

I'm building my first RESTful API and thought I'd try Elasticsearch as a base. Is there a way to customize the API in Elasticsearch so that it returns only certain fields from the results of a query? For instance, suppose I have data with fname, lname, city, state, zip, and email, and I only want to return a list of fnames and cities for every query matching the city field. So something like this:
curl -XPOST "http://localhost:9200/custom_call/_search" -d'
{
"query": {
"query_string": {
"query": "Toronto",
"fields": ["city"]
}
}
}'
Would ideally return something like:
{"took": 52, "timed_out": false, "_shards": {
"total": 35,
"successful": 35,
"failed": 0
}, "hits": {
"total": 1,
"max_score": 0.375,
"hits": [
{
"_index": "persons",
"_type": "person",
"_id": "6",
"_score": 0.375,
"_source": {
"fname": "Bob",
"city": "Toronto"
}
},
{
"_index": "persons",
"_type": "person",
"_id": "13",
"_score": 0.375,
"_source": {
"fname": "Sue",
"city": "Toronto"
}
},
{
"_index": "persons",
"_type": "person",
"_id": "21",
"_score": 0.375,
"_source": {
"fname": "Jose",
"city": "Toronto"
}
}
]
}}
I'm not sure whether Elasticsearch is set up to do this, or even whether you would want it to. This is my first foray into building a RESTful API. I figure if NPR and Stack Overflow like it, it's worth a shot! Thanks for the help.
Yes, you can. I think you haven't tried to find out on your own.
Here is how to do that:
POST localhost:9200/index/type/_search
{
"query": {
"query_string": {
"query": "Toronto",
"fields": ["city"]
}
},
"_source": ["fname", "city"]
}
The term you are looking for is source filtering.
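As a rough illustration of source filtering, the Python sketch below builds the request body for the fname and city case from the question and emulates locally what the _source list does to one hit. The helper names and the sample person (including the lname, state, zip, and email values) are hypothetical, and the emulation covers top-level field names only.

```python
def source_filtered_search(query_text, search_fields, returned_fields):
    """Build a _search body that only returns `returned_fields` in each hit."""
    return {
        "query": {
            "query_string": {"query": query_text, "fields": list(search_fields)}
        },
        # Source filtering: list the fields to keep in _source.
        "_source": list(returned_fields),
    }


def filter_source(source, returned_fields):
    """Local emulation of source filtering for top-level fields of one document."""
    return {key: value for key, value in source.items() if key in returned_fields}


body = source_filtered_search("Toronto", ["city"], ["fname", "city"])

# Hypothetical stored document, used only to show the effect of the filter.
person = {
    "fname": "Bob", "lname": "Smith", "city": "Toronto",
    "state": "ON", "zip": "M5V", "email": "bob@example.com",
}
filtered = filter_source(person, ["fname", "city"])
print(body)
print(filtered)  # {'fname': 'Bob', 'city': 'Toronto'}
```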

Elasticsearch nested query

I'm new to Elasticsearch. I managed to set it up and import a record set from my MongoDB collection using the river plugin. To start, I want to query against the "desc" field, but I can't get the query to work. I'm not sure whether the problem is caused by the way the index was defined. Can anyone help, please?
A sample record in Elasticsearch looks like this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 107209,
"max_score": 1,
"hits": [
{
"_index": "shiv",
"_type": "shiv",
"_id": "iG1eIzN7RGO7hFfxTlnLuA",
"_score": 1,
"_source": {
"_id": {
"$oid": "50901d7f485bf7bd1c000021"
},
"brand": "",
"category": {
"$ref": "categories",
"$id": {
"$oid": "4fbd2221758cb11d14000174"
}
},
"comments": [],
"count_comment": 0,
"count_fav": 2,
"count_hotness": 1.46,
"count_rekick": 0,
"count_share": 0,
"country": {
"$ref": "countries",
"$id": {
"$oid": "4fec98f7758cb18c6e0002c9"
}
},
"currency": "pound",
"desc": "A men's automatic watch, this Seamaster Bond model features a Co-Axial escapement and date function. Its blue dial is teamed with a stainless steel case and bracelet for a look that's sporty and refined.",
"gender": "male",
"ident": "omega-seamaster-diver-bond-men-s-automatic-watch---ernest-jones-1351622015",
"img_url": "http://s7ondemand4.scene7.com/is/image/Signet/5735793?$detail$",
"lifestyles": [
{
"$ref": "lifestyles",
"$id": {
"$oid": "508ff6ca485bf73112000060"
}
}
],
"location": "United Kingdom",
"owner": {
"$ref": "accounts",
"$id": {
"$oid": "50742fd8485bf74b7a00213f"
}
},
"price": 2400,
"store": "ernestjones.co.uk",
"tags": [
"ernest-jones",
"bond"
],
"timestamp_creation": 1351622015,
"timestamp_exp": 1356825600,
"timestamp_update": 1351622015,
"title": "Omega Seamaster Diver Bond men's automatic watch - Ernest Jones",
"url": "http%3A%2F%2Fwww.ernestjones.co.uk%2Fwebstore%2Fd%2F5735793%2Fomega%20seamaster%20diver%20bond%20men%27s%20automatic%20watch%2F%3Futm_source%3Dgooglebase%26utm_medium%3Dfeedmanager%26cm_mmc%3DFroogle-_-CKB-_-nurses_fobs-_-watches%26cm_mmca1%3Domega%26cm_mmca2%3Dmale%26cm_mmca3%3Dadult"
}
}
]
}
}
The mapping of the index "shiv" looks like this:
{
"shiv": {
"properties": {
"$oid": {
"type": "string"
}
}
}
}
Thanks again
There are lots of ways to query. Have you tried a match query?
Using curl or a REST client of your choice, send the body below to:
http://[host]:9200/[index_name]/[doc_type]/_search
{
"query" : {
"match" : {
"desc" : "some value you want to find in desc"
}
}
}
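For readers who prefer a script over curl, here is a minimal Python sketch that prepares the same match query with the standard library. It only constructs the request and does not send it. The index and type name shiv come from the question, while the host localhost:9200, the search text, and the helper name build_match_request are assumptions.

```python
import json
from urllib.request import Request


def build_match_request(host, index, doc_type, field, text):
    """Prepare (but do not send) a POST _search request with a match query."""
    url = f"http://{host}/{index}/{doc_type}/_search"
    body = {"query": {"match": {field: text}}}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# localhost:9200 is an assumed host; adjust it to your cluster.
request = build_match_request("localhost:9200", "shiv", "shiv", "desc", "automatic watch")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to urllib.request.urlopen would send it to a running cluster.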