How to assign weights to searched documents in MongoDb? - mongodb

This might sounds like simple question for you but i have spend over 3 hours to achieve it but i got stuck in mid way.
Inputs:
List of keywords
List of tags
Problem Statement: I need to find all the documents from the database which satisfy following conditions:
List documents that has 1 or many matching keywords. (achieved)
List documents that has 1 or many matching tags. (achieved)
Sort the found documents on the basis of weights: Each keyword matching carry 2 points and each tag matching carry 1 point.
Query: How can i achieve requirement#3.
My Attempt: In my attempt i am able to list only on the basis of keyword match (that too without multiplying weight with 2 ).
tags are array of documents. Structure of each tag is like
{
"id" : "ICC",
"some Other Key" : "some Other value"
}
keywords are array of string:
["women", "cricket"]
Query:
var predicate = [
{
"$match": {
"$or": [
{
"keywords" : {
"$in" : ["cricket", "women"]
}
},
{
"tags.id" : {
"$in" : ["ICC"]
}
}
]
}
},
{
"$project": {
"title":1,
"_id": 0,
"keywords": 1,
"weight" : {
"$size": {
"$setIntersection" : [
"$keywords" , ["cricket","women"]
]
}
},
"tags.id": 1
}
},
{
"$sort": {
"weight": -1
}
}
];

It seems that you were close in your attempt, but of course you need to implement something to "match your logic" in order to get the final "score" value you want.
It's just a matter of changing your projection logic a little, and assuming that both "keywords" and "tags" are arrays in your documents:
db.collection.aggregate([
// Match your required documents
{ "$match": {
"$or": [
{
"keywords" : {
"$in" : ["cricket", "women"]
}
},
{
"tags.id" : {
"$in" : ["ICC"]
}
}
]
}},
// Inspect elements and create a "weight"
{ "$project": {
"title": 1,
"keywords": 1,
"tags": 1,
"weight": {
"$add": [
{ "$multiply": [
{"$size": {
"$setIntersection": [
"$keywords",
[ "cricket", "women" ]
]
}}
,2] },
{ "$size": {
"$setIntersection": [
{ "$map": {
"input": "$tags",
"as": "t",
"in": "$$t.id"
}},
["ICC"]
]
}}
]
}
}},
// Then sort by that "weight"
{ "$sort": { "weight": -1 } }
])
So it is basicallt the $map logic here that "transforms" the other array to just give the id values for comparison against the "set" solution that you want.
The $add operator provides the additional "weight" to the member you want to "weight" your responses by.

Related

Retrieve specific element of a nested document

Just cannot figure this out. This is the document format from a MongoDB of jobs, which is derived from an XML file the layout of which I have no control over:
{
"reference" : [ "93417" ],
"Title" : [ "RN - Pediatric Director of Nursing" ],
"Description" : [ "...a paragraph or two..." ],
"Classifications" : [
{
"Classification" : [
{
"_" : "Nurse / Midwife",
"name" : [ "Category" ]
},
{
"_" : "FL - Jacksonville",
"name" : [ "Location" ],
},
{
"_" : "Permanent / Full Time",
"name" : [ "Work Type" ],
},
{
"_" : "Some Health Care Org",
"name" : [ "Company Name" ],
}
]
}
],
"Apply" : [
{
"EmailTo" : [ "jess#recruiting.co" ]
}
]
}
The intention is to pull a list of jobs from the DB, to include 'Location', which is buried down there as the second document at 'Classifications.Classification._'.
I've tried various 'aggregate' permutations of $project, $unwind, $match, $filter, $group… but I don't seem to be getting anywhere. Experimenting with just retrieving the company name, I was expecting this to work:
db.collection(JOBS_COLLECTION).aggregate([
{ "$project" : { "meta": "$Classifications.Classification" } },
{ "$project" : { "meta": 1, _id: 0 } },
{ "$unwind" : "$meta" },
{ "$match": { "meta.name" : "Company Name" } },
{ "$project" : { "Company" : "$meta._" } },
])
But that pulled everything for every record, thus:
[{
"Company":[
"Nurse / Midwife",
"TX - San Antonio",
"Permanent / Full Time",
"Some Health Care Org"
]
}, { etc etc }]
What am I missing, or misusing?
Ideally with MongoDB 3.4 available you would simply $project, and use the array operators of $map, $filter and $reduce. The latter to "compact" the arrays and the former to to extract the relevant element and detail. Also $arrayElemAt takes just the "element" from the array(s):
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": {
"$reduce": {
"input": "$Classifications.Classification",
"initialValue": [],
"in": {
"$concatArrays": [ "$$value", "$$this" ]
}
}
},
"as": "c",
"cond": { "$eq": [ "$$c.name", ["Location"] ] }
}
},
"as": "c",
"in": "$$c._"
}},
0
]
}
}}
])
Or even skip the $reduce which is merely applying the $concatArrays to "merge" and simply grab the "first" array index ( since there is only one ) using $arrayElemAt:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$arrayElemAt": [ "$Classifications.Classification", 0 ] },
"as": "c",
"cond": { "$eq": [ "$$c.name", ["Location"] ] }
}
},
"as": "c",
"in": "$$c._"
}},
0
]
}
}}
])
That makes the operation compatible with MongoDB 3.2, which you "should" be running at least.
Which in turn allows you to consider alternate syntax for MongoDB 3.4 using $indexOfArray based on the initial input variable of the "first" array index using $let to somewhat shorten the syntax:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$let": {
"vars": {
"meta": {
"$arrayElemAt": [
"$Classifications.Classification",
0
]
}
},
"in": {
"$arrayElemAt": [
"$$meta._",
{ "$indexOfArray": [
"$$meta.name", [ "Location" ]
]}
]
}
}
}
}}
])
If indeed you consider that to be "shorter", that is.
In the other sense though, much like above there is an "array inside and array", so in order to process it, you $unwind twice, which is effectively what the $concatArrays inside $reduce is countering in the ideal case:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$unwind": "$Classifications" },
{ "$unwind": "$Classifications.Classification" },
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": { "_id": 0, "output": "$Classifications.Classification._" } }
])
All statements actually produce:
{
"output" : "FL - Jacksonville"
}
Which is the matching value of "_" in the inner array element for the "Location" as selected by your original intent.
Keeping in mind of course that all statements really should be preceded with the relevant [$match]9 statement as shown:
{ "$match": { "Classifications.Classification.name": "Location" } },
Since without that you would be possibly processing documents unnecessarily, which did not actually contain an array element matching that condition. Of course this may not be the case due to the nature of the documents, but it's generally good practice to make sure the "initial" selection always matches the conditions of details you later intend to "extract".
All of that said, even if this is the result of a direct import from XML, the structure should be changed since it does not efficiently present itself for queries. MongoDB documents do not work how XPATH does in terms of issuing queries. Therefore anything "XML Like" is not going to be a good structure, and if the "import" process cannot be changed to a more accommodating format, then there should at least be a "post process" to manipulate this into a separate storage in a more usable form.

Querying MongoDB array of similar objects

I've to work with old MongoDB where objects in one collection are structured like this.
{
"_id": ObjectId("57fdfcc7a7c81fde38b79a3d"),
"parameters": [
{
"key": "key1",
"value": "value1"
},
{
"key": "key2",
"value": "value2"
}
]
}
The problem is that parameters is an array of objects, which makes efficient querying difficult. There can be about 50 different objects, which all have "key" and "value" properties. Is it possible to make a query, where the query targets "key" and "value" inside one object? I've tried
db.collection.find({$and:[{"parameters.key":"value"}, {"parameters.value":"another value"}]})
but this query hits all the objects in parameters array.
EDIT. Nikhil Jagtiani found solution to my original question, but actually I should be able query to target multiple objects inside parameters array. E.g. check keys and values in two different objects in parameters array.
Please refer below mongo shell aggregate query :
db.collection.aggregate([
{
$unwind:"$parameters"
},
{
$match:
{
"parameters.key":"key1",
"parameters.value":"value1"
}
}
])
1) Stage 1 - Unwind : Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
2) Stage 2 - Match : Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
Without aggregation, queries will return the entire document even if one subdocument matches. This pipeline will only return the required subdocuments.
Edit: If you need to specify multiple key value pairs, what we need is $in for parameters field.
db.collection.aggregate([{$unwind:"$parameters"},{$match:{"parameters":{$in:[{ "key" : "key1", "value" : "value1"},{ "key" : "key2", "value" : "value2" }]}}}])
will match the following two pairs of key-values as subdocuments:
1) { "key" : "key1", "value" : "value1" }
2) { "key" : "key2", "value" : "value2" }
There is a $filter operator in the aggregation framework which is perfect for such queries. A bit verbose but very efficient, you can use it as follows:
db.surveys.aggregate([
{ "$match": {
"$and": [
{
"parameters.key": "key1",
"parameters.value": "val1"
},
{
"parameters.key": "key2",
"parameters.value": "val2"
}
]
}},
{
"$project": {
"parameters": {
"$filter": {
"input": "$parameters",
"as": "item",
"cond": {
"$or": [
{
"$and" : [
{ "$eq": ["$$item.key", "key1"] },
{ "$eq": ["$$item.value", "val1"] }
]
},
{
"$and" : [
{ "$eq": ["$$item.key", "key2"] },
{ "$eq": ["$$item.value", "val2"] }
]
}
]
}
}
}
}
}
])
You can also do this with more set operators in MongoDB 2.6 without using $unwind:
db.surveys.aggregate([
{ "$match": {
"$and": [
{
"parameters.key": "key1",
"parameters.value": "val1"
},
{
"parameters.key": "key2",
"parameters.value": "val2"
}
]
}},
{
"$project": {
"parameters": {
"$setDifference": [
{ "$map": {
"input": "$parameters",
"as": "item",
"in": {
"$cond": [
{ "$or": [
{
"$and" : [
{ "$eq": ["$$item.key", "key1"] },
{ "$eq": ["$$item.value", "val1"] }
]
},
{
"$and" : [
{ "$eq": ["$$item.key", "key2"] },
{ "$eq": ["$$item.value", "val2"] }
]
}
]},
"$$item",
false
]
}
}},
[false]
]
}
}
}
])
For a solution with MongoDB 2.4, you would need to use the $unwind operator unfortunately:
db.surveys.aggregate([
{ "$match": {
"$and": [
{
"parameters.key": "key1",
"parameters.value": "val1"
},
{
"parameters.key": "key2",
"parameters.value": "val2"
}
]
}},
{ "$unwind": "$parameters" },
{ "$match": {
"$and": [
{
"parameters.key": "key1",
"parameters.value": "val1"
},
{
"parameters.key": "key2",
"parameters.value": "val2"
}
]
}},
{
"$group": {
"_id": "$_id",
"parameters": { "$push": "$parameters" }
}
}
]);
Is it possible to make a query, where the query targets "key" and
"value" inside one object?
This is possible if you know which object(id) you are going to query upfront(to be given as input parameter in the find query). If that is not possible then we can try on the below approach for efficient querying.
Build an index on the parameters.key and if needed also on parameters.value. This would considerably improve the query performance.
Please see
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/core/index-multikey/

Check if embed exists - aggregation framework mongodb

This is my test collection:
>db.test.find()
{
"_id": ObjectId("54906479e89cdf95f5fb2351"),
"reports": [
{
"desc": "xxx",
"order": {"$id": ObjectId("53fbede62827b89e4f86c12e")}
}
]
},
{
"_id": ObjectId("54906515e89cdf95f5fb2352"),
"reports": [
{
"desc": "xxx"
}
]
},
{
"_id": ObjectId("549067d3e89cdf95f5fb2353"),
"reports": [
{
"desc": "xxx"
}
]
}
I want to count all documents and documents with order, so:
>db.test.aggregate({
$group: {
_id: null,
all: {
$sum: 1
},
order: {
$sum: {
"$cond": [
{
"$ifNull": ["$reports.order", false]
},
1,
0
]
}
}
}
})
and my results:
{
"result" : [
{
"_id" : null,
"all" : 3,
"order" : 3
}
],
"ok" : 1
}
but expected:
{
"result" : [
{
"_id" : null,
"all" : 3,
"order" : 1
}
],
"ok" : 1
}
It makes no difference what I'll put - "$reports.order", "$reports.xxx", etc, aggregation framework check only if the field reports exists, ignores embed.
$ifNull and $eq dosn't work with embeded documents?
Is any way to do something like this
db.test.find({"reports.order": {$exists: 1}})
in aggregation framework?
Sorry for my english and I hope that you understood what I want to show you :)
I think it doesn't work because the field "reports" contain an array, not an object.
I mean, your aggregation works as you expect in this collection:
>db.test.find()
{
"_id": ObjectId("54906479e89cdf95f5fb2351"),
"reports":
{
"desc": "xxx",
"order": {"$id": ObjectId("53fbede62827b89e4f86c12e")}
}
},
{
"_id": ObjectId("54906515e89cdf95f5fb2352"),
"reports":
{
"desc": "xxx"
}
},
{
"_id": ObjectId("549067d3e89cdf95f5fb2353"),
"reports":
{
"desc": "xxx"
}
}
Note that I removed the "[" and "]", so now it's an object, not an array (one-to-one relation).
Because you have array inside the "report" field, you need to unwind the array to output one document for each element. I suppose that if you have two "order" fields inside the "reports" array, you only wants to count it once. I mean:
"reports": [
{
"desc": "xxx",
"order": {"$id": ObjectId("53fbede62827b89e4f86c12e")},
"order": "yyy",
}
]
Should only count as one for the object final "order" sum.
In this case, you need to unwind, group by _id (because the previous example outputs two documents for the same _id) and then group again to count all documents:
db.test.aggregate([
{$unwind: '$reports'},
{$group:{
_id:"$_id",
order:{$sum:{"$cond": [
{
"$ifNull": ["$reports.order", false]
},
1,
0
]
}
}
}},
{$group:{
_id:null,
all:{$sum:1},
order: {
$sum:{
"$cond": [{$eq: ['$order', 0]}, 0, 1]
}
}
}}])
Maybe there is a shorter solution, but this works.

How to find document and single subdocument matching given criterias in MongoDB collection

I have collection of products. Each product contains array of items.
> db.products.find().pretty()
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "01.02.2100",
"purchasePrice" : 1,
"sellingPrice" : 10,
"count" : 15
},
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
So, can you please give me an advice, how I can query MongoDB to retrieve all products with only single item which date is equals to the date I pass to query as parameter.
The result for "31.08.2014" must be:
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
What you are looking for is the positional $ operator and "projection". For a single field you need to match the required array element using "dot notation", for more than one field use $elemMatch:
db.products.find(
{ "items.date": "31.08.2014" },
{ "shop": 1, "name":1, "items.$": 1 }
)
Or the $elemMatch for more than one matching field:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ "shop": 1, "name":1, "items.$": 1 }
)
These work for a single array element only though and only one will be returned. If you want more than one array element to be returned from your conditions then you need more advanced handling with the aggregation framework.
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$unwind": "$items" },
{ "$match": { "items.date": "31.08.2014" } },
{ "$group": {
"_id": "$_id",
"shop": { "$first": "$shop" },
"name": { "$first": "$name" },
"items": { "$push": "$items" }
}}
])
Or possibly in shorter/faster form since MongoDB 2.6 where your array of items contains unique entries:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$project": {
"shop": 1,
"name": 1,
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.date", "31.08.2014" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
Or possibly with $redact, but a little contrived:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$redact": {
"$cond": [
{ "$eq": [ { "$ifNull": [ "$date", "31.08.2014" ] }, "31.08.2014" ] },
"$$DESCEND",
"$$PRUNE"
]
}}
])
More modern, you would use $filter:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$addFields": {
"items": {
"input": "$items",
"cond": { "$eq": [ "$$this.date", "31.08.2014" ] }
}
}}
])
And with multiple conditions, the $elemMatch and $and within the $filter:
db.products.aggregate([
{ "$match": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ "$addFields": {
"items": {
"input": "$items",
"cond": {
"$and": [
{ "$eq": [ "$$this.date", "31.08.2014" ] },
{ "$eq": [ "$$this.purchasePrice", 1 ] }
]
}
}
}}
])
So it just depends on whether you always expect a single element to match or multiple elements, and then which approach is better. But where possible the .find() method will generally be faster since it lacks the overhead of the other operations, which in those last to forms does not lag that far behind at all.
As a side note, your "dates" are represented as strings which is not a very good idea going forward. Consider changing these to proper Date object types, which will greatly help you in the future.
Based on Neil Lunn's code I work with this solution, it includes automatically all first level keys (but you could also exclude keys if you want):
db.products.find(
{ "items.date": "31.08.2014" },
{ "shop": 1, "name":1, "items.$": 1 }
{ items: { $elemMatch: { date: "31.08.2014" } } },
)
With multiple requirements:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ items: { $elemMatch: { "date": "31.08.2014", "purchasePrice": 1 } } },
)
Mongo supports dot notation for sub-queries.
See: http://docs.mongodb.org/manual/reference/glossary/#term-dot-notation
Depending on your driver, you want something like:
db.products.find({"items.date":"31.08.2014"});
Note that the attribute is in quotes for dot notation, even if usually your driver doesn't require this.

How to address arrays with mongodb Set Operators

Using the example zipcodes collection, I have a query like this:
db.zipcodes.aggregate([
{ "$match": {"state": {"$in": ["PA","NY"]}}},
{ "$group": { "_id": { "city": "$city" }, "ZipsPerCity": {"$addToSet": "$_id"}}},
{ "$match": { "ZipsPerCity" : { "$size": 2 }}},
]).pretty()
This is just an example that looks for cities (in the state of NY and PA) that have 2 zipcodes:
{
"_id" : {
"city" : "BETHLEHEM"
},
"ZipsPerCity" : [
"18018",
"18015"
]
}
{
"_id" : {
"city" : "BEAVER SPRINGS"
},
"ZipsPerCity" : [
"17843",
"17812"
]
}
Now suppose that I want to compare "BEAVER SPRINGS" zip codes to "BETHLEHEM" zip codes, using the "$setDifference" set operator? I tried using the "$setDifference" operator in a $project operator, like this:
db.zipcodes.aggregate([
{ "$match": { "state": {"$in": ["PA","NY"]}}},
{ "$group": { "_id: {city : "$city"},"ZipsPerCity": {$addToSet: "$_id"}}},
{ "$match": { "ZipsPerCity" : { $size: 2 }}},
{ "$project": {
"int": { "$setDifference":[
"$_id.city.BETHLEHEM.ZipsPerCity",
"$_id.city.BEAVER SPRINGS.ZipsPerCity"
]}
}}
]).pretty()
That doesn't even look right, let alone produce results. No errors though.
How would you refer to a couple of arrays built using $addToSet like this, using $setDifference (or any of the set operators)?
The first thing about what you are trying to do here is that the arrays you want to compare are actually in two different documents. All of the aggregation framework operators in fact work on only one document at a time, with the exception of $group which is meant to "aggregate" documents and possibly $unwind which essentially turns one document into many.
In order to compare you would need the data to occur in one document, or at least be "paired" in some way. So there is a technique to do that:
db.zipcodes.aggregate([
{ "$match": {"state": { "$in": [ "PA","NY" ] } }},
{ "$group": {
"_id": "$city",
"ZipsPerCity": { "$addToSet": "$_id"}
}},
{ "$match": { "ZipsPerCity" : { "$size": 2 } }},
{ "$group": {
"_id": null,
"A": { "$min": {
"$cond": [
{ "$eq": [ "$_id", "BETHLEHEM" ] },
{ "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
false
]
}},
"B": { "$min": {
"$cond": [
{ "$eq": [ "$_id", "BEAVER SPRINGS" ] },
{ "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
false
]
}}
}},
{ "$project": {
"A": 1,
"B": 1,
"C": { "$setDifference": [ "$A.ZipsPerCity", "$B.ZipsPerCity" ] }
}}
])
That is a little contrived and I am well aware that the actual result set has more than two cities, but the point it to illustrate that the arrays/sets sent to the "set operators" such as $setDifference need to be in the same document.
The result here compares the "left" array with the "right" array, returning the members from the "left" that are different to the "right". Both sets are unique here with no overlap so the results should be expected:
{
"_id" : null,
"A" : {
"city" : "BETHLEHEM",
"ZipsPerCity" : [
"18018",
"18015"
]
},
"B" : {
"city" : "BEAVER SPRINGS",
"ZipsPerCity" : [
"17843",
"17812"
]
},
"C" : [
"18018",
"18015"
]
}
This is really better illustrated with actual "sets" with common members. So this document:
{ "A" : [ "A", "A", "B", "C", "D" ], "B" : [ "B", "C" ] }
Responds to $setDifference:
{ "C" : [ "A", "D" ] }
And $setEquals:
{ "C" : false }
$setIntersection:
{ "C" : [ "B", "C" ] }
$setUnion:
{ "C" : [ "B", "D", "C", "A" ] }
$setIsSubSet reversing the order to $B, $A:
{ "C" : true }
The other set operators $anyElementTrue and $allElementsTrue are likely most useful when used along with the $map operator which can re-shape arrays and evaluate conditions against each element.
A very good usage of $map is alongside $setDifference, where you can "filter" array contents without using $unwind:
db.arrays.aggregate([
{ "$project": {
"A": {
"$setDifference": [
{
"$map": {
"input": "$A",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
That can be very handy when you have a lot of results in the pipeline and you do not want to "expand" out all of those results by "unwinding" the array. But note that this is a "set" and as such only one element matching "A" is returned:
{ "A" : ["A"] }
So the things to keep in mind here are that you:
Operate only within the "same" document at a time
The results are generally "sets" and that means they are both "unique" and "un-ordered" as a result.
Overall that should be a decent run-down on what the set operators are and how you use them.