ElasticSearch Multi Index Query - postgresql

Simple question: I have multiple indexes in my Elasticsearch engine, mirrored from PostgreSQL using Logstash. Elasticsearch performs well for fuzzy searches, but now I need to use references between the indexes, and the queries need to handle them.
Index A:
{
name: "alice",
_id: 5
}
...
Index B:
{
name: "bob",
_id: 3,
best_friend: 5
}
...
How do I query:
Get every match of index B with field name starting with "b" and index A referenced by "best_friend" with the name starting with "a"
Is this even possible with elasticsearch?

Yes, that's possible: POST A,B/_search will query multiple indexes.
To match a record from a specific index, you can use the metadata field _index.
The query below returns every document in index B whose name starts with "b" and every document in index A whose name starts with "a". It does not match the best_friend reference the way a join would in a relational SQL database. As far as I know, resolving foreign-key references (joins) in Elasticsearch, as in other NoSQL stores, is the application's responsibility. Refer to Elasticsearch: The Definitive Guide to find the approach that best fits your needs. Lastly, NoSQL is not SQL, so it calls for a different way of thinking.
POST A,B/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              { "prefix": { "name": "a" } },
              { "term": { "_index": "A" } }
            ]
          }
        },
        {
          "bool": {
            "must": [
              { "prefix": { "name": "b" } },
              { "term": { "_index": "B" } }
            ]
          }
        }
      ]
    }
  }
}
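Since the reference has to be resolved by the application, one possible approach is two searches plus a small join in code. The following is a minimal sketch in Python, not a definitive implementation: it assumes the usual search-response shape in which each hit carries _id and _source, the helper names names_query and join_best_friends are hypothetical, and actually sending the two searches (first against B, then against A) is left to whichever client you use.

```python
def names_query(prefix, ids=None):
    """Build a search body: name prefix, optionally restricted to given _ids."""
    filters = [{"prefix": {"name": prefix}}]
    if ids is not None:
        # _id values are strings in Elasticsearch, so normalise them here.
        filters.append({"ids": {"values": [str(i) for i in ids]}})
    return {"query": {"bool": {"filter": filters}}}


def join_best_friends(b_hits, a_hits):
    """Pair each B hit with the A hit that its best_friend field points to."""
    a_by_id = {str(hit["_id"]): hit for hit in a_hits}
    pairs = []
    for b in b_hits:
        friend_id = b["_source"].get("best_friend")
        if friend_id is not None and str(friend_id) in a_by_id:
            friend = a_by_id[str(friend_id)]
            pairs.append((b["_source"]["name"], friend["_source"]["name"]))
    return pairs


# Sample hits shaped like the documents in the question.
b_hits = [{"_id": "3", "_source": {"name": "bob", "best_friend": 5}}]
a_hits = [{"_id": "5", "_source": {"name": "alice"}}]

# Step 1 would search B with names_query("b"); step 2 searches A with:
second_query = names_query("a", ids=[h["_source"]["best_friend"] for h in b_hits])

print(join_best_friends(b_hits, a_hits))  # [('bob', 'alice')]
```

The second search only asks index A for the ids that index B actually referenced, so the join stays small even when A is large.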

Related

How to use nested query using &or with &any in mongodb?

I'm learning mongoDB queries and have a problem given my collection looks like:
"filename": "myfile.png",
"updatedCoordinates": [
{
"xmin": 537.640869140625,
"xmax": 1049.36376953125,
"ymin": 204.90736389160156,
"ymax": 714.813720703125,
"label": "LABEL_0",
"status": "UNCHANGED"
},
{
"xmin": 76.68355560302734,
"xmax": 544.8860473632812,
"ymin": 151.90313720703125,
"ymax": 807.1371459960938,
"label": "LABEL_0",
"status": "UNCHANGED"
}],
"predictedCoordinates": [
{
"xmin": 537.640869140625,
"xmax": 1049.36376953125,
"ymin": 204.90736389160156,
"ymax": 714.813720703125,
"status": "UNCHANGED",
"label": "LABEL_0"
}
]
and the eligible values of status are: UNCHANGED, CHANGED, UNDETECTED
How would I query: get all the instances from the db where status == CHANGED / UNDETECTED for ANY of the values inside either updatedCoordinates or predictedCoordinates?
In other words, if the status of at least one entry inside either updatedCoordinates or predictedCoordinates is set to CHANGED or UNDETECTED, the document is eligible for my query.
I tried:
{"$or":[{"updatedCoordinates.status": "CHANGED"}, {"predictedCoordinates.status": "CHANGED"}]}
With a Python dict, I can filter like this:
def find_eligible(single_instance: dict) -> bool:
    # True if any entry in either array has an eligible status
    for key in ["predictedCoordinates", "updatedCoordinates"]:
        for i in single_instance[key]:
            if i["status"] in ["CHANGED", "UNDETECTED"]:
                return True
    return False
But retrieving 400K instances first, just to filter out a few, is not a good idea.
Try running this query:
db.collection.find({
  "$or": [
    {
      "updatedCoordinates.status": {
        "$in": ["CHANGED", "UNDETECTED"]
      }
    },
    {
      "predictedCoordinates.status": {
        "$in": ["CHANGED", "UNDETECTED"]
      }
    }
  ]
})
Mongodb playground link: https://mongoplayground.net/p/Qda-G5L1mbR
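If the filter is built from Python, the same $or / $in structure can be generated rather than typed out. This is a minimal sketch; the helper name status_filter is hypothetical, and passing the result to something like collection.find(...) through a driver such as PyMongo is an assumption about your setup.

```python
def status_filter(statuses, array_fields=("updatedCoordinates", "predictedCoordinates")):
    """Match documents where ANY element of ANY listed array has one of the statuses."""
    wanted = list(statuses)
    return {
        "$or": [{f"{field}.status": {"$in": wanted}} for field in array_fields]
    }


query = status_filter(["CHANGED", "UNDETECTED"])
print(query)
```

Dot notation on an array field matches a document as soon as one element satisfies the condition, which is the "at least one entry" behaviour asked for.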
Simple use of Mongo's dot notation allows access to nested values in arrays / objects, like so:
db.collection.find({
"updatedCoordinates.status": "CHANGED"
})
Mongo Playground

Some documents do not appear in Atlas Search when querying by a few letters

I have a collection. The document structure is,
{
model: {
name: 'string name'
}
}
I have enabled Atlas Search and created a search index for the model.name field. Search works fine; the only issue is that I can't get results for very short queries.
Example:
I have a document,
{
model: {
name: "space1duplicate"
}
}
If I query space, I don't get the result.
{
index: 'search_index',
compound: {
must: [
{
text: {
query: 'space',
path: 'model.name'
}
}
]
}
}
But if I query space1duplica, it returns the result.
During indexing, a full-text search engine tokenizes the input by splitting the text into searchable chunks. Check out the relevant section in the documentation.
By default, Atlas Search does not split words at digits, but if you need that, try defining a custom analyzer with the regexSplit tokenizer and use it for your field:
{
"mappings": {
"dynamic": false,
"fields": {
"name": [
{
"analyzer": "digitSplitter",
"type": "string"
}
]
}
},
"analyzers": [
{
"charFilters": [],
"name": "digitSplitter",
"tokenFilters": [],
"tokenizer": {
"pattern": "[0-9]+",
"type": "regexSplit"
}
}
]
}
Also note that you can use multiple analyzers for string fields, if needed.
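To see what splitting at digits does to the example value, here is a rough Python imitation of the idea. It is only an illustration of the pattern "[0-9]+" used as a separator, not Atlas Search's actual implementation, and the function name split_on_digits is hypothetical.

```python
import re


def split_on_digits(text):
    """Split text wherever a run of digits occurs, dropping empty pieces."""
    return [token for token in re.split(r"[0-9]+", text) if token]


print(split_on_digits("space1duplicate"))  # ['space', 'duplicate']
```

With tokens like these in the index, a text query for space has a whole token to match, which the unsplit value "space1duplicate" does not offer.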
Atlas Search uses Lucene to do the job. The documentation on the MongoDB site is mostly focused on the Mongo-specific syntax for passing the query to Lucene, and might be a bit confusing if you are not familiar with its query language.
First of all, there are a number of tokenizers and analyzers available, each serving a specific purpose. You really need to include the index definition when you ask questions about Atlas Search.
The default tokenizer uses word separators to build the index, then removes endings to store stems, depending on the language (English by default).
So, in order to find "space1duplicate" by the beginning of the word, you can use the "autocomplete" field type with nGram tokenization. The index should be created as follows:
{
"mappings": {
"dynamic": false,
"fields": {
"name": {
"tokenization": "nGram",
"type": "autocomplete"
}
}
},
"storedSource": {
"include": [
"name"
]
}
}
Once it's indexed (you may need to wait a bit if you have a larger dataset), you can find the document with the following search:
{
index: 'search_index',
compound: {
must: [
{
autocomplete: {
query: 'spa',
path: 'name'
}
}
]
}
}
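The reason a short query such as spa can match is that n-gram tokenization indexes many substrings of the value. The Python sketch below imitates that idea only; the min_grams and max_grams values are assumptions picked for illustration, the function name ngrams is hypothetical, and the real tokens Atlas Search stores may differ.

```python
def ngrams(text, min_grams=2, max_grams=15):
    """Return every substring of text whose length is between min_grams and max_grams."""
    text = text.lower()
    grams = set()
    for size in range(min_grams, max_grams + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


tokens = ngrams("space1duplicate")
print("spa" in tokens)    # True
print("space" in tokens)  # True
```

The trade-off is index size: every extra gram length adds more tokens per value.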

Search and update in array of objects MongoDB

I have a collection in MongoDB containing search history of a user where each document is stored like:
"_id": "user1"
searchHistory: {
"product1": [
{
"timestamp": 1623482432,
"query": {
"query": "chocolate",
"qty": 2
}
},
{
"timestamp": 1623481234,
"query": {
"query": "lindor",
"qty": 4
}
},
],
"product2": [
{
"timestamp": 1623473622,
"query": {
"query": "table",
"qty": 1
}
},
{
"timestamp": 1623438232,
"query": {
"query": "ike",
"qty": 1
}
},
]
}
Here the _id of the document acts as a foreign key to the user document in another collection.
I have a backend running on Node.js, and this function is used to store a new search history entry in the record.
exports.updateUserSearchCount = function (userId, productId, searchDetails) {
  let addToSetData = {}
  // Dynamic field path, e.g. "searchHistory.product1"
  let key = `searchHistory.${productId}`
  addToSetData[key] = { "timestamp": new Date().getTime(), "query": searchDetails }
  return client.db("mydb").collection("userSearchHistory").updateOne(
    { "_id": userId },
    { "$addToSet": addToSetData },
    { upsert: true },
    async (err, res) => {
    })
}
Now, I want to get search history of a user based on query only using the db.find().
I want something like this:
db.find({"_id": "user1", "searchHistory.somewildcard.query": "some query"})
I need a wildcard which will replace ".somewildcard." to search in all products searched.
I saw a suggestion that we should store document like:
"_id": "user1"
searchHistory: [
{
"key": "product1",
"value": [
{
"timestamp": 1623482432,
"query": {
"query": "chocolate",
"qty": 2
}
}
]
}
]
However, if I store the document like this, adding search history to an existing document becomes a tedious and confusing task.
What should I do?
It's always a bad idea to save values as keys, for the exact reason you're facing: it heavily limits querying that field. The trade-off, obviously, is that it makes updates much easier.
I personally recommend that you do not save these searches in nested form at all. This will cause scaling issues quite quickly: assuming these fields are indexed, you will start seeing performance issues when the arrays get too large (a few hundred searches).
So my personal recommendation is for you to save it in a new collection like so:
{
"user_id": "1",
"key": "product1",
"timestamp": 1623482432,
"query": {
"query": "chocolate",
"qty": 2
}
}
Now querying a specific user, a specific product, or even a query substring is all easily supported by creating some basic indexes. An "update" in this case is just an insert of a new document, which is also much faster.
If you still prefer to keep the nested structure, then I recommend switching to the structure you posted. As you mentioned, updates become slightly more tedious, but you can still do them quite easily using arrayFilters to update a specific element, or just $push to add a new search.
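Both options can be sketched as plain Python dictionaries. This is an illustration under assumptions, not a definitive implementation: the helper names are hypothetical, the field names follow the examples above, and the "$[p]" identifier in the nested variant is an arbitrary label tied to the arrayFilters entry.

```python
import time


def search_event(user_id, product_id, search_details, now=None):
    """One document per search, for the flat collection layout."""
    timestamp = int(time.time()) if now is None else int(now)
    return {
        "user_id": user_id,
        "key": product_id,
        "timestamp": timestamp,
        "query": search_details,
    }


def history_filter(user_id, query_text):
    """Filter for all of a user's searches with the given query text."""
    return {"user_id": user_id, "query.query": query_text}


def nested_push(product_id, entry):
    """Update document and arrayFilters for the key/value array layout."""
    update = {"$push": {"searchHistory.$[p].value": entry}}
    array_filters = [{"p.key": product_id}]
    return update, array_filters


event = search_event("user1", "product1", {"query": "chocolate", "qty": 2}, now=1623482432)
print(event)
print(history_filter("user1", "chocolate"))
print(nested_push("product1", {"timestamp": 1623482432, "query": {"query": "chocolate", "qty": 2}}))
```

With a driver such as PyMongo, the first two would go to insert_one and find, and the third to update_one together with its array_filters argument. Note that the nested $push only appends where an element with that key already exists, so a product's first search needs a separate $push onto searchHistory itself.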

How to get count of documents that match a certain condition in Elasticsearch

Suppose, I have a MongoDB query
db.tm_watch.count({trademark: {$in: [ObjectId('1'), ObjectId('2')]}});
that returns the count of documents that have trademark equal to 1 or 2.
I have tried to convert it into an Elasticsearch query:
es_query = {
  "query": {
    "bool": {
      "must": [
        {"terms": {"trademark": ids}},
        {"term": {"team": req.user.team.id}},
      ],
    }
  }
}
esClient.count({
  index: 'tm_watch',
  type: 'docs',
  body: es_query
})
but I don't know whether this is correct, since I'm new to Elasticsearch.
Thanks!
The ES equivalent to mongodb's .count method is the Count API.
Assuming your index name is tm_watch and the field trademark has a .keyword multi-field mapping, you could use a terms query:
POST tm_watch/_count
{
"query": {
"terms": {
"trademark.keyword": [ "1", "2" ]
}
}
}
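The attempt in the question also restricted the count to one team. A body for the Count API that keeps both conditions could be built as in the Python sketch below. The helper name count_body is hypothetical, the team field name comes from the question, and the trademark.keyword field is the same mapping assumption as above.

```python
def count_body(trademark_ids, team_id=None, trademark_field="trademark.keyword"):
    """Build a _count request body: trademark in the given ids, optionally one team."""
    filters = [{"terms": {trademark_field: list(trademark_ids)}}]
    if team_id is not None:
        filters.append({"term": {"team": team_id}})
    # filter clauses do not compute scores, which a count never needs.
    return {"query": {"bool": {"filter": filters}}}


body = count_body(["1", "2"], team_id="team-42")
print(body)
```

The value "team-42" is a placeholder; in the question it would be req.user.team.id.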

Ensuring exactly N items with value X remain in an array with mongodb

Assuming we have a document in my MongoDB collection like the following:
{
"_id": "coffee",
"orders": [ "espresso", "cappuccino", "espresso", ... ],
}
How do I use a single update statement that ensures there are exactly say 2 espressos in this document, without knowing how many there are to begin with?
I know that using 2 consecutive statements I can do
db.test.update(
{ _id: "coffee" },
{ "$pull": { "orders": "espresso" } }
);
followed by
db.test.update(
{ "_id": "coffee" },
{ "$push": { "orders": { "$each": ["espresso", "espresso"] } } }
);
But when combining both into a single statement, MongoDB balks with an error 40, claiming Updating the path 'orders' would create a conflict at 'orders' (understandable enough: how does MongoDB know what to do first?).
So, how can I do the above in a single statement? Please note that since I'll be using the above in the context of a larger unordered bulk operation, combining the above in an ordered bulk operation won't work.
Thanks for your help!