Calculate relevant result on full text search in mongodb - mongodb

I am trying to get the most relevant results from MongoDB. Let's say I have this collection:
{ "text" : "mitsubishi lancer 2011"}
{ "text" : "mitsubishi lancer 2011"}
{ "text" : "mitsubishi lancer 2011 in good conditions"}
{ "text" : "lancer 2011"}
{ "text" : "mitsubishi lancer 2014"}
{ "text" : "lancer 2016"}
and I run this query:
db.post.find({$text: {$search: "mitsubishi lancer 2011"}}, {score: {$meta: "textScore"}}).sort({score:{$meta:"textScore"}})
I get this result:
{ "text" : "mitsubishi lancer 2011", "score" : 2 }
{ "text" : "mitsubishi lancer 2011", "score" : 2 }
{ "text" : "mitsubishi lancer 2011 in good conditions", "score" : 1.7999999999999998 }
{ "text" : "lancer 2011", "score" : 1.5 }
{ "text" : "mitsubishi lancer 2014", "score" : 1.3333333333333333 }
{ "text" : "lancer 2016", "score" : 0.75 }
How do I know that the first two contain all the text that I searched for?
How is the score calculated?

The scoring algorithm is internal to MongoDB and should probably be expected to change over time so the precise values shouldn't matter. You can attempt to understand what's going on by looking at the sources if you want (although I wouldn't recommend that).
The final score depends on the number of occurrences of your searched terms (or rather their word stems), the distances between the matches, the match quality (full match vs. partial), language settings and weights which you can configure. That's all pretty hefty stuff that cannot easily be documented. There is, however, a blog post that explains some aspects quite nicely: https://blog.codecentric.de/en/2013/01/text-search-mongodb-stemming/
Also, things get a bit clearer once you try out various queries using different combinations of search terms and indexed data.
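To make the factors above a bit more concrete, here is a rough sketch in plain JavaScript that reproduces the scores shown in the question to within floating-point rounding. It is only my approximation and not documented behaviour: the helper names and the tiny stop-word list are assumptions, every weight is taken to be 1, and no stemming is done, so the real server results can differ.

```javascript
// Assumption: a tiny stop-word list standing in for MongoDB's language-specific one.
const STOP_WORDS = new Set(["in", "the", "a", "an", "and", "of"]);

// Hypothetical tokenizer: lowercase, split on non-alphanumerics, drop stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0 && !STOP_WORDS.has(token));
}

// Approximate the text score of one document for one search string (weight = 1).
function approximateTextScore(documentText, search) {
  const docTokens = tokenize(documentText);
  const counts = new Map();
  const freqs = new Map();
  for (const token of docTokens) {
    const count = (counts.get(token) || 0) + 1;
    counts.set(token, count);
    // Each repeat of a term contributes half as much as the previous occurrence.
    freqs.set(token, (freqs.get(token) || 0) + 1 / 2 ** (count - 1));
  }
  let score = 0;
  for (const term of new Set(tokenize(search))) {
    if (!counts.has(term)) continue;
    // A term counts for more when it makes up a larger share of the field.
    const coeff = 0.5 * (counts.get(term) / docTokens.length) + 0.5;
    score += freqs.get(term) * coeff;
  }
  return score;
}
```

For example, approximateTextScore("lancer 2011", "mitsubishi lancer 2011") gives 1.5: two matching terms, each worth 0.75 because each makes up half of a two-word field. The longer "in good conditions" document scores lower than the exact one only because its matching terms are a smaller share of the text.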
Lastly, if you want to find out if there is a perfect match, the only way I can think of to make this work is something like this:
db.getCollection('test').aggregate([
    {
        // do the normal filtering query
        $match: {
            $text: { $search: "mitsubishi lancer 2011" }
        }
    }, {
        // select what's relevant in the output and add an indicator "perfectmatch"
        $project: {
            "text": 1,
            "score": { $meta: "textScore" },
            // This checks for a perfect match using the exact full string.
            // For individual token matching you would need to tokenize
            // your query and do a series of other checks here.
            "perfectmatch": {
                $cond: [
                    { $eq: [ "$text", "mitsubishi lancer 2011" ] },
                    true,
                    false
                ]
            }
        }
    }, {
        // if you want to have the results sorted by "best match first"
        $sort: { "score": -1 }
    }
])
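For the individual token matching mentioned in the comment, one option is to do the check on the client after the results come back. This is just a sketch under assumptions: toTokens and containsAllTerms are hypothetical helper names, and the comparison uses whole lowercase words without the stemming MongoDB applies, so "condition" and "conditions" would not match each other here.

```javascript
// Hypothetical helper: split a string into lowercase word tokens.
function toTokens(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// True when every token of the search string occurs somewhere in the text.
function containsAllTerms(documentText, search) {
  const present = new Set(toTokens(documentText));
  return toTokens(search).every((term) => present.has(term));
}

// Sample results taken from the question, flagged one by one.
const results = [
  { text: "mitsubishi lancer 2011" },
  { text: "mitsubishi lancer 2011 in good conditions" },
  { text: "lancer 2011" },
  { text: "mitsubishi lancer 2014" },
];
const flagged = results.map((doc) => ({
  ...doc,
  allTerms: containsAllTerms(doc.text, "mitsubishi lancer 2011"),
}));
```

With this data the first two entries get allTerms set to true and the last two get false, which is the distinction the score alone does not state explicitly.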

Related

Updating matched array by identifier with multiple names [duplicate]

I have a large DB with various inconsistencies. One of the items I would like to clear up is changing the country status based on the population.
A Sample of the data is:
{ "_id" : "D", "name" : "Deutschland", "pop" : 70000000, "country" : "Large Western" }
{ "_id" : "E", "name" : "Eire", "pop" : 4500000, "country" : "Small Western" }
{ "_id" : "G", "name" : "Greenland", "pop" : 30000, "country" : "Dependency" }
{ "_id" : "M", "name" : "Mauritius", "pop" : 1200000, "country" : "Small island"}
{ "_id" : "L", "name" : "Luxembourg", "pop" : 500000, "country" : "Small Principality" }
Obviously I would like to change the country field to something more uniform, based on population size.
I've tried this approach, but I am obviously missing some way of tying it into an update of the country field.
db.country.updateMany( { case : { $lt : ["$pop" : 20000000] }, then : "Small country" }, { case : { $gte : ["$pop" : 20000000] }, then : "Large country" }
Edit: Posted before I was finished writing.
I was thinking of using the $cond functionality to basically say: if true, do X, if false, do Y, while using updateMany.
Is this possible, or is there a workaround?
You really want bulkWrite() with two "updateMany" statements within it instead. Aggregation expressions cannot be used to do "alternate selection" in any form of update statement.
db.country.bulkWrite([
    { "updateMany": {
        "filter": { "pop": { "$lt": 20000000 } },
        "update": { "$set": { "country": "Small Country" } }
    }},
    { "updateMany": {
        // "$gte" rather than "$gt", so a population of exactly 20000000 is not skipped
        "filter": { "pop": { "$gte": 20000000 } },
        "update": { "$set": { "country": "Large Country" } }
    }}
])
There is still an outstanding "feature request" on SERVER-6566 for "conditional syntax", but this is not yet resolved. The "bulk" API was actually introduced after this request was raised, and really can be adapted as shown to do more or less the same thing.
Also using $out in an aggregation statement as was otherwise suggested is not an option to "update" and can only write to a "new collection" at present. The slated change from MongoDB 4.2 onwards would allow $out to actually "update" an existing collection, however this would only be where the collection to be updated is different from any other collection used within the gathering of data from the aggregation pipeline. So it is not possible to use an aggregation pipeline to update the same collection as what you are reading from.
In short, use bulkWrite().
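If you end up with more than two size classes, the operations can be generated rather than written by hand. The following is only a sketch: buildCountryOps and the band format are made up for illustration, and the bands are assumed to be listed in ascending order.

```javascript
// Hypothetical helper: build bulkWrite operations from ascending size bands.
// Each band is { below: <exclusive upper bound, or Infinity>, label: <string> }.
function buildCountryOps(bands) {
  let lower = null; // inclusive lower bound carried over from the previous band
  return bands.map(({ below, label }) => {
    const range = {};
    if (lower !== null) range.$gte = lower;
    if (below !== Infinity) range.$lt = below;
    lower = below;
    return {
      updateMany: {
        // An unbounded single band falls back to an empty filter (match everything).
        filter: Object.keys(range).length > 0 ? { pop: range } : {},
        update: { $set: { country: label } },
      },
    };
  });
}

const ops = buildCountryOps([
  { below: 20000000, label: "Small Country" },
  { below: Infinity, label: "Large Country" },
]);
// In the mongo shell the array would then be passed on: db.country.bulkWrite(ops)
```

Because each band starts with "$gte" at the point where the previous one stopped with "$lt", no population value falls between two filters.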

Perform a search on main collection field and array of objects simultaneously

I have my document structure as below:
{
"codeId" : 8.7628945723895E13, // long numeric value stored in scientific notation by Mongodb
"problemName" : "Hardware Problem",
"problemErrorCode" : "97695686856",
"status" : "active",
"problemDescription" : "ghdsojgnhsdjgh sdojghsdjoghdghd i0dhgjodshgddsgsdsdfghsdfg",
"subProblems" : [
{
"codeId" : 8.76289457238896E14,
"problemName" : "Some problem",
"problemErrorCode" : "57790389503490249640",
"problemDescription" : "This is edited",
"status" : "active",
"_id" : ObjectId("589476eeae39b20b1c15535b")
},
...
]
}
I have a search field which should search by codeId, which basically serves as parentCodeID in the search fields.
Now, along with parentIdCode I want to search for codeId, problemCode, problemName and problemDescription as well.
How do I query the submodules with a regex search and at the same time tag some parent field with an "$or" clause etc. to achieve this?
You can try something like this.
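// someValue (numeric) and searchValue (string) are placeholders for your inputs
var query = {
    "$or": [
        { "codeId": someValue },
        // codeId is numeric and $regex only matches strings, so compare it by value
        { "subProblems.codeId": someValue },
        { "subProblems.problemName": { "$regex": searchValue, "$options": "i" } },
        { "subProblems.problemDescription": { "$regex": searchValue, "$options": "i" } }
        // ...rest of the sub module fields
    ]
};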
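If the search string comes straight from a user, it is probably worth escaping regex metacharacters before it goes into "$regex". Here is a small sketch of building such a query; escapeRegex and buildProblemQuery are hypothetical helper names, and the field list is taken from the sample document in the question.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match the parent codeId OR any string field of a sub problem.
function buildProblemQuery(parentCodeId, searchValue) {
  const pattern = { $regex: escapeRegex(searchValue), $options: "i" };
  const textFields = ["problemName", "problemErrorCode", "problemDescription"];
  return {
    $or: [
      { codeId: parentCodeId },
      ...textFields.map((field) => ({ [`subProblems.${field}`]: pattern })),
    ],
  };
}

const problemQuery = buildProblemQuery(87628945723895, "hardware (v1.2)");
// In the mongo shell this would be used as db.<collection>.find(problemQuery)
```

The dotted keys such as "subProblems.problemName" match when any element of the subProblems array satisfies the condition.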

Mongodb Update/Upsert array exact match

I have a collection :
gStats : {
"_id" : "id1",
"criteria" : ["key1":"value1", "key2":"value2"],
"groups" : [
{"id":"XXXX", "visited":100, "liked":200},
{"id":"YYYY", "visited":30, "liked":400}
]
}
I want to be able to update a document in the stats array for a given array of criteria (exact match).
I am trying to do this in two steps:
Pull the stat document from the array of a given "id" :
db.gStats.update({
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2096955"},{"value1" : "2015610"}]}
},
{
$pull : {groups : {"id" : "XXXX"}}
}
)
Push the new document
db.gStats.findAndModify({
query : {
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2015610"}, {"key2" : "2096955"}]}
},
update : {
$push : {groups : {"id" : "XXXX", "visited" : 29, "liked" : 144}}
},
upsert : true
})
The Pull query works perfectly.
The Push query gives an error:
2014-12-13T15:12:58.571+0100 findAndModifyFailed failed: {
    "value" : null,
    "errmsg" : "exception: Cannot create base during insert of update. Caused by :ConflictingUpdateOperators Cannot update 'criteria' and 'criteria' at the same time",
    "code" : 12,
    "ok" : 0
} at src/mongo/shell/collection.js:614
Neither query is working in reality. You cannot use a key name like "criteria" more than once unless under an operator such as $and. You are also specifying different fields (i.e. groups) and querying elements that do not exist in your sample document.
So it is hard to tell what you really want to do here. But the error is essentially caused by the first issue I mentioned, with a little something extra. In practice your { "$size": 2 } condition is being ignored and only the second condition is applied.
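You can see the overwriting for yourself without touching the database, since the shell is just JavaScript. A minimal demonstration (the variable names are only for illustration):

```javascript
// The same key twice in one object literal: the later value replaces the earlier one.
const duplicateKeyQuery = {
  criteria: { $size: 2 },
  criteria: { $all: [{ key1: "2015610" }, { key2: "2096955" }] },
};

// Only one "criteria" key is left, and it holds the $all condition.
const remainingKeys = Object.keys(duplicateKeyQuery);
const sizeSurvived = "$size" in duplicateKeyQuery.criteria;
const allSurvived = "$all" in duplicateKeyQuery.criteria;
```

Here remainingKeys has a single entry, sizeSurvived is false and allSurvived is true, so the server never even receives the $size condition.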
A valid query form should look like this:
query: {
    "$and": [
        { "criteria" : { "$size" : 2 } },
        { "criteria" : { "$all": [{ "key1": "2015610" }, { "key2": "2096955" }] } }
    ]
}
As each set of conditions is specified within the array provided by $and the document structure of the query is valid and does not have a hash-key name overwriting the other. That's the proper way to write your two conditions, but there is a trick to making this work where the "upsert" is failing due to those conditions not matching a document. We need to overwrite what is happening when it tries to apply the $all arguments on creation:
update: {
    "$setOnInsert": {
        "criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
    },
    "$push": { "stats": { "id": "XXXX", "visited": 29, "liked": 144 } }
}
That uses $setOnInsert so that, when the "upsert" is applied and a new document is created, the values specified here are used instead of the field values set in the query portion of the statement.
Of course, if what you are really looking for is truly an exact match of the content in the array, then just use that for the query instead:
query: {
    "criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
}
Then MongoDB will be happy to apply those values when a new document is created and does not get confused on how to interpret the $all expression.
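Putting the exact-match form together, the arguments for findAndModify could be assembled roughly like this. It is a sketch only: buildUpsertSpec is a hypothetical helper, and the array field name is a parameter because the question's sample uses "groups" while the snippets above use "stats".

```javascript
// Hypothetical helper: assemble findAndModify arguments for an exact-match upsert.
function buildUpsertSpec(criteria, arrayField, entry) {
  return {
    // Exact array match, so an upsert can copy the value into the new document.
    query: { criteria: criteria },
    update: { $push: { [arrayField]: entry } },
    upsert: true,
  };
}

const spec = buildUpsertSpec(
  [{ key1: "2015610" }, { key2: "2096955" }],
  "groups",
  { id: "XXXX", visited: 29, liked: 144 }
);
// In the mongo shell this would be passed on as: db.gStats.findAndModify(spec)
```

Since the query is a plain equality on "criteria", no $setOnInsert is needed in this variant.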

How to query and update nested arrays

I am building a course system. Each course has multiple sections, and each section has multiple steps. My data structure is as follows:
{
"_id" : "Mtz4DMTwMMKWTWbzE",
"slug" : "how-to-be-awesome",
"title" : "How to be awesome",
"description" : "In 4 easy lessons.",
"createdAt" : ISODate("2014-08-25T13:33:24.675Z"),
"sections" : [
{
"title" : "Be cool",
"description" : "Title says it all really",
"steps" : [
{
"title" : "Wear sunglasses",
"description" : "Always works."
},
{
"title" : "Be funny",
"description" : "Make an occasional joke. But no lame ones."
}
]
}
]
}
This worked while adding steps:
Course._collection.update( { _id: course._id, sections: section }, {
    "$push": {
        "sections.$.steps": step
    }
})
But I can't figure out how to update a step. I tried to give the steps an ID and do it like that, but it's not working, apparently because it's two arrays deep, and you can't have two positionals ($) in a query. I tried something like this:
Course._collection.update( { _id: course._id, 'sections.steps._id': step._id }, {
    "$set": {
        "sections.steps.$.title": "test updated title"
    }
})
But this gave the following error:
can't append to array using string field name: steps
Is there a way to do this? Or is my schema design off?
Thanks!

MongoDB: get documents by tags

I've got documents containing a tags array. I want to provide tag-based recommendations on the site, so I need to get documents containing the same tags, then documents that don't match 1 tag, then documents that don't match 2 tags, and so on.
How do I do that?
example collection:
db.tags.insert({"tags":["red", "tall", "cheap"]});
db.tags.insert({"tags":["blue", "tall", "expensive"]});
db.tags.insert({"tags":["blue", "little", "cheap"]});
find all that include the tag "blue"
db.tags.find({tags: { $elemMatch: { $eq: "blue" } }})
find all tagged "blue" and only blue
db.tags.find({tags: ["blue"]})
find all tagged "blue" and "cheap"
db.tags.find({ tags: { $all: ["cheap", "blue"] } } )
find all not "blue"
db.tags.find({tags: { $ne: "blue" } })
find all "blue" and "cheap" but not "red" and not "tall"
Not possible in my MongoDB version. From MongoDB 1.9.1 on, something like this should work, though (not tested):
db.tags.find({ $and: [ {tags: { $all: ["blue", "cheap"] } }, { tags: { $nin: ["red", "tall"] } } ] })
The rephrased question is:
Suppose job postings have search tags attached like
Job Postings
[{_id : ObjectId(1249999493),tags : ['Location1', 'SkillSet1', 'SkillSet2', 'Someother1', 'Someother2']},
{_id : ObjectId(1249999494),tags : ['Location3', 'SkillSet1', 'SkillSet0', 'Someother4', 'Someother3']}]
Now, he wants the records having tags ['Location1','SkillSet1', 'SkillSet0']
And the selected docs having more keywords from the query should come first. Fewer keywords matching should come last. That way, one can get the most suitable job postings for the search query.
Am I making sense, or do I need to rephrase?
Steps:
Find matching products which contain any of the specified keys.
Unwind on keys.
Match again to filter out the unwanted keys after unwinding.
Group by product, counting the key occurrences.
Sort descending to get the most relevant first.
[
    // $in accepts /pattern/ regex objects, not { "$regex": ... } expressions
    { "$match" : { "keys" : { "$in" : [ /text/i ] } } },
    { "$unwind" : "$keys" },
    { "$match" : { "keys" : { "$in" : [ /text/i ] } } },
    { "$group" : { "_id" : { "productId" : "$productId" }, "relatedTags" : { "$sum" : 1 } } },
    { "$sort" : { "relatedTags" : -1 } },
    { "$limit" : 10 }
]
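To show what the pipeline is meant to produce, here is the same idea done in memory in plain JavaScript, using the colour example collection from the earlier answer. It is only a sketch: rankByMatchingTags is a hypothetical name, and it assumes a tag appears at most once per document.

```javascript
// Rank documents by how many of the wanted tags they carry, most matches first.
function rankByMatchingTags(docs, wantedTags) {
  const wanted = new Set(wantedTags);
  return docs
    .map((doc) => ({
      doc,
      // Corresponds to unwind + second match + group with $sum: 1.
      matches: doc.tags.filter((tag) => wanted.has(tag)).length,
    }))
    // Corresponds to the first match: keep documents with at least one wanted tag.
    .filter((entry) => entry.matches > 0)
    .sort((a, b) => b.matches - a.matches);
}

const tagDocs = [
  { tags: ["red", "tall", "cheap"] },
  { tags: ["blue", "tall", "expensive"] },
  { tags: ["blue", "little", "cheap"] },
];
const ranked = rankByMatchingTags(tagDocs, ["blue", "cheap"]);
```

For the tags "blue" and "cheap", the document tagged blue, little, cheap comes first with two matches, followed by the two documents that match only one tag each.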