Match Document Key Nearest to Search Value - mongodb

I have the following collection:
{
"_id" : "Stats1",
"minutes" : {
"0" : [
{
"0" : {
"f" : 1,
"t" : 0,
"v" : "0"
}
}
],
"22" : [
{
"2" : "1"
}
],
"29" : [
{
"32" : "2"
}
],
"38" : [
{
"40" : "3"
}
]
}
}
When I try:
db.stats.aggregate()
.project({"_id":"1", "minArray": {"$objectToArray": "$minutes"}})
I get the error message:
"$objectToArray requires a document input, found: array"
And when I try:
db.stats.aggregate()
.project({"_id":"1", "minArray": {"$arrayToObject": "$minutes"}})
I get the error message:
"$arrayToObject requires an array input, found: object"
I would like to get the entry whose minute key is closest to 30, either exact or lower:
{ "minute" : "29", "value" : [{ "32" : "2"}] }

The errors occur because, without a $match, your pipeline attempts to process other documents which don't have the expected structure. That's really something separate to sort out though.
To actually answer your question in terms of its end objective, you want a pipeline like this:
var _id = "Stats1";
var target = 30;
db.stats.aggregate([
{ "$match": { "_id" : _id } },
{ "$replaceRoot": {
"newRoot": {
"$let": {
"vars": {
"working": {
"$map": {
"input": { "$objectToArray": "$minutes" },
"in": {
"k": { "$toInt": "$$this.k" },
"v": "$$this.v",
"diff": { "$abs": { "$subtract": [ target, { "$toInt": "$$this.k" }] } }
}
}
}
},
"in": {
"$arrayToObject": {
"$map": {
"input": {
"$filter": {
"input": {
"$objectToArray": {
"$arrayElemAt": [
"$$working",
{ "$indexOfArray": [ "$$working.diff", { "$min": "$$working.diff" } ] }
]
}
},
"cond": { "$ne": [ "$$this.k", "diff" ] }
}
},
"in": {
"k": { "$cond": [{ "$eq": [ "$$this.k", "k"] }, "minute", "value" ] },
"v": { "$cond": [{ "$eq": [ "$$this.k", "k"] }, { "$toString": "$$this.v" }, "$$this.v" ] }
}
}
}
}
}
}
}}
])
Which of course returns the wanted output:
{ "minute" : "29", "value" : [ { "32" : "2" } ] }
In sequence, you run $objectToArray as you initially attempted, but the key or "k" value then needs to be converted to a number for comparison (here with $toInt, which requires MongoDB 4.0). You also need to calculate the difference between that number and the value you are searching for, in this case 30. That gives you a "working" copy of the data in array form, which the following expressions take as input.
The next section is best read from the innermost level of indentation outwards to understand the order of operations.
First you extract the element of the working array whose difference (using $abs so positive and negative differences count the same) equals the minimum found with $min. $indexOfArray gives the position of the first match, and $arrayElemAt uses that position to return the single selected element from the working array. Because $abs measures distance in either direction, a key above the target can be selected when it is closer than any key below it.
We don't want all the fields in that object, so $objectToArray converts the single object into "k" and "v" pairs, and $filter then removes the pair whose key is the "diff" field.
Next you want to rename the fields and change some data formats, so $map iterates the remaining array (just two entries), assigning readable names and converting the "minute" back to a string.
Finally $arrayToObject turns this back into an object as the final output. Since the "working" array is referenced several times, it is declared in $let. And since all of that is an expression that outputs the document you want, it is wrapped in $replaceRoot, whose single expected argument "newRoot" is exactly such an expression.
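To sanity-check the result outside the database, the same selection logic can be mirrored in plain JavaScript. This is only a sketch: nearestMinute and its floorOnly flag are hypothetical names, not part of MongoDB, and the flag is an assumption added to cover the "exact or lower" wording of the question, which the $abs-based pipeline does not enforce.

```javascript
// Plain-JavaScript mirror of the pipeline's selection logic (hypothetical helper).
function nearestMinute(minutes, target, floorOnly = false) {
  // Equivalent of $objectToArray plus $toInt and the "diff" calculation
  const working = Object.entries(minutes)
    .map(([k, v]) => {
      const minute = parseInt(k, 10);
      return { k: minute, v, diff: Math.abs(target - minute) };
    })
    // Assumption: optionally keep only keys at or below the target
    .filter((entry) => !floorOnly || entry.k <= target);

  if (working.length === 0) return null;

  // Equivalent of $min + $indexOfArray + $arrayElemAt: first entry with the smallest diff
  const best = working.reduce((a, b) => (b.diff < a.diff ? b : a));

  // Equivalent of the final $map / $arrayToObject renaming
  return { minute: String(best.k), value: best.v };
}

const minutes = {
  "0": [{ "0": { f: 1, t: 0, v: "0" } }],
  "22": [{ "2": "1" }],
  "29": [{ "32": "2" }],
  "38": [{ "40": "3" }]
};

console.log(nearestMinute(minutes, 30));       // selects minute "29"
console.log(nearestMinute(minutes, 35));       // selects minute "38" (nearest in either direction)
console.log(nearestMinute(minutes, 35, true)); // selects minute "29" (nearest at or below)
```

The second and third calls show where "nearest" and "exact or lower" diverge, which is worth checking against your own requirement before adopting the pipeline as is.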

Related

Retrieve specific element of a nested document

Just cannot figure this out. This is the document format from a MongoDB of jobs, which is derived from an XML file the layout of which I have no control over:
{
"reference" : [ "93417" ],
"Title" : [ "RN - Pediatric Director of Nursing" ],
"Description" : [ "...a paragraph or two..." ],
"Classifications" : [
{
"Classification" : [
{
"_" : "Nurse / Midwife",
"name" : [ "Category" ]
},
{
"_" : "FL - Jacksonville",
"name" : [ "Location" ],
},
{
"_" : "Permanent / Full Time",
"name" : [ "Work Type" ],
},
{
"_" : "Some Health Care Org",
"name" : [ "Company Name" ],
}
]
}
],
"Apply" : [
{
"EmailTo" : [ "jess#recruiting.co" ]
}
]
}
The intention is to pull a list of jobs from the DB, to include 'Location', which is buried down there as the second document at 'Classifications.Classification._'.
I've tried various 'aggregate' permutations of $project, $unwind, $match, $filter, $group… but I don't seem to be getting anywhere. Experimenting with just retrieving the company name, I was expecting this to work:
db.collection(JOBS_COLLECTION).aggregate([
{ "$project" : { "meta": "$Classifications.Classification" } },
{ "$project" : { "meta": 1, _id: 0 } },
{ "$unwind" : "$meta" },
{ "$match": { "meta.name" : "Company Name" } },
{ "$project" : { "Company" : "$meta._" } },
])
But that pulled everything for every record, thus:
[{
"Company":[
"Nurse / Midwife",
"TX - San Antonio",
"Permanent / Full Time",
"Some Health Care Org"
]
}, { etc etc }]
What am I missing, or misusing?
Ideally, with MongoDB 3.4 available, you would simply $project and use the array operators $map, $filter and $reduce: the latter to "compact" the arrays, and the former two to extract the relevant element and detail. $arrayElemAt then takes just the single "element" from the resulting array:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": {
"$reduce": {
"input": "$Classifications.Classification",
"initialValue": [],
"in": {
"$concatArrays": [ "$$value", "$$this" ]
}
}
},
"as": "c",
"cond": { "$eq": [ "$$c.name", ["Location"] ] }
}
},
"as": "c",
"in": "$$c._"
}},
0
]
}
}}
])
Or even skip the $reduce, which merely applies $concatArrays to "merge" the arrays, and simply grab the "first" array index (since there is only one) using $arrayElemAt:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$arrayElemAt": [ "$Classifications.Classification", 0 ] },
"as": "c",
"cond": { "$eq": [ "$$c.name", ["Location"] ] }
}
},
"as": "c",
"in": "$$c._"
}},
0
]
}
}}
])
That makes the operation compatible with MongoDB 3.2, which you "should" be running at least.
That in turn allows an alternate syntax for MongoDB 3.4, using $let to hold the "first" array index as a variable and $indexOfArray to locate the match, which somewhat shortens the syntax:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$let": {
"vars": {
"meta": {
"$arrayElemAt": [
"$Classifications.Classification",
0
]
}
},
"in": {
"$arrayElemAt": [
"$$meta._",
{ "$indexOfArray": [
"$$meta.name", [ "Location" ]
]}
]
}
}
}
}}
])
If indeed you consider that to be "shorter", that is.
Alternatively, since there is an "array inside an array", you can $unwind twice in order to process it, which is effectively what $concatArrays inside $reduce avoids in the ideal case:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$unwind": "$Classifications" },
{ "$unwind": "$Classifications.Classification" },
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": { "_id": 0, "output": "$Classifications.Classification._" } }
])
All statements actually produce:
{
"output" : "FL - Jacksonville"
}
Which is the matching value of "_" in the inner array element for the "Location" as selected by your original intent.
Keep in mind, of course, that all statements really should be preceded by the relevant $match stage as shown:
{ "$match": { "Classifications.Classification.name": "Location" } },
Since without that you would be possibly processing documents unnecessarily, which did not actually contain an array element matching that condition. Of course this may not be the case due to the nature of the documents, but it's generally good practice to make sure the "initial" selection always matches the conditions of details you later intend to "extract".
All of that said, even if this is the result of a direct import from XML, the structure should be changed since it does not efficiently present itself for queries. MongoDB documents do not work how XPATH does in terms of issuing queries. Therefore anything "XML Like" is not going to be a good structure, and if the "import" process cannot be changed to a more accommodating format, then there should at least be a "post process" to manipulate this into a separate storage in a more usable form.
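As a rough illustration of such a "post process", the sketch below reshapes one imported document into a flatter form in plain JavaScript. The function name flattenJob and the output field names (reference, title, description, classifications) are assumptions chosen for the example, and only some of the fields are covered; the actual target shape depends on the queries you need.

```javascript
// Take the first element of the single-element arrays produced by the XML import
const first = (value) => (Array.isArray(value) ? value[0] : value);

// Hypothetical reshaping helper: nested arrays become plain fields
function flattenJob(doc) {
  const flat = {
    reference: first(doc.reference),
    title: first(doc.Title),
    description: first(doc.Description),
    classifications: {}
  };
  // Collapse the "array inside an array" into a name -> value lookup
  for (const group of doc.Classifications || []) {
    for (const c of group.Classification || []) {
      flat.classifications[first(c.name)] = c._;
    }
  }
  return flat;
}

const job = {
  reference: ["93417"],
  Title: ["RN - Pediatric Director of Nursing"],
  Description: ["...a paragraph or two..."],
  Classifications: [
    {
      Classification: [
        { _: "Nurse / Midwife", name: ["Category"] },
        { _: "FL - Jacksonville", name: ["Location"] },
        { _: "Permanent / Full Time", name: ["Work Type"] },
        { _: "Some Health Care Org", name: ["Company Name"] }
      ]
    }
  ]
};

const flatJob = flattenJob(job);
console.log(flatJob.classifications.Location);
```

Stored in that shape, the location can be read or queried through the single path classifications.Location instead of filtering nested arrays.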

Project field defined by another field's value

I have a document structured like so:
mode: "b",
a: [0,1,2],
b: [1,4,5],
c: [2,2]
And I want to project the field that equals mode. The end result should be something like:
data: [1,4,5] // since mode == "b", it returns b's value
I tried $$CURRENT[$mode], but it looks like you can't use brackets like that in mongo. I tried using a local variable like so:
$let: {
vars: {mode: "$mode"},
in: "$$CURRENT.$$mode"
}
but that doesn't work either. I'm considering using $switch and then manually putting in all the possible modes. But I'm wondering if there is a better way to do it.
You are looking in the wrong place, but if you can use $switch then you have MongoDB 3.4 and you can use $objectToArray, which is actually the correct thing to do. Your problem is that you are trying to "dynamically" refer to a property by the "value" of its "key name". You cannot do that, so $objectToArray makes the "key" a "value".
So given your document:
{ "mode": "a", "a": [0,1,2], "b": [1,4,5], "c": [2,2] }
Then you do the aggregate, using $map and $filter to work with the converted elements as an array:
db.sample.aggregate([
{ "$project": {
"_id": 0,
"mode": 1,
"data": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$eq": ["$$this.k","$mode"] }
}
},
"in": "$$this.v"
}},
0
]
}
}}
])
Or using $let and $indexOfArray if that seems more sensible to you:
db.sample.aggregate([
{ "$project": {
"_id": 0,
"mode": 1,
"data": {
"$let": {
"vars": { "doc": { "$objectToArray": "$$ROOT" } },
"in": {
"$arrayElemAt": [
"$$doc.v",
{ "$indexOfArray": [ "$$doc.k", "$mode" ] }
]
}
}
}
}}
])
Which matches the selected field:
{
"mode" : "a",
"data" : [
0.0,
1.0,
2.0
]
}
If you look at "just" what $objectToArray is doing here, then the reasons should be self evident:
{
"data" : [
{
"k" : "_id",
"v" : ObjectId("597915787dcd6a5f6a9b4b98")
},
{
"k" : "mode",
"v" : "a"
},
{
"k" : "a",
"v" : [
0.0,
1.0,
2.0
]
},
{
"k" : "b",
"v" : [
1.0,
4.0,
5.0
]
},
{
"k" : "c",
"v" : [
2.0,
2.0
]
}
]
}
So now instead of there being an "object" with named properties, the "array" consistently contains "k" named as the "key" and "v" containing the "value". This is easy to $filter and obtain the desired results, or basically use any method that works with arrays to obtain the match.
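The same idea can be mirrored in plain JavaScript to make the transformation concrete. This is a sketch only; projectByMode is a hypothetical name, and returning null when nothing matches is a choice made for the example rather than what the pipeline itself does.

```javascript
// Plain-JavaScript mirror of $objectToArray followed by a key lookup
function projectByMode(doc) {
  // Equivalent of $objectToArray: one { k, v } pair per top-level field
  const pairs = Object.entries(doc).map(([k, v]) => ({ k, v }));
  // Equivalent of $filter + $arrayElemAt: first pair whose key equals the value of "mode"
  const match = pairs.find((pair) => pair.k === doc.mode);
  return { mode: doc.mode, data: match ? match.v : null };
}

const sample = { mode: "b", a: [0, 1, 2], b: [1, 4, 5], c: [2, 2] };
console.log(projectByMode(sample)); // data is the array stored under "b"
```

One thing to watch in the $indexOfArray variant: $indexOfArray returns -1 when no key matches, and $arrayElemAt treats -1 as "last element", so a document whose mode names no existing field would get the value of its last field. Filtering such documents out beforehand avoids that.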

Querying MongoDB array of similar objects

I have to work with an old MongoDB database where objects in one collection are structured like this.
{
"_id": ObjectId("57fdfcc7a7c81fde38b79a3d"),
"parameters": [
{
"key": "key1",
"value": "value1"
},
{
"key": "key2",
"value": "value2"
}
]
}
The problem is that parameters is an array of objects, which makes efficient querying difficult. There can be about 50 different objects, which all have "key" and "value" properties. Is it possible to make a query, where the query targets "key" and "value" inside one object? I've tried
db.collection.find({$and:[{"parameters.key":"value"}, {"parameters.value":"another value"}]})
but this query hits all the objects in parameters array.
EDIT: Nikhil Jagtiani found a solution to my original question, but I should actually be able to target multiple objects inside the parameters array, e.g. check keys and values in two different objects in the parameters array.
Please refer to the mongo shell aggregate query below:
db.collection.aggregate([
{
$unwind:"$parameters"
},
{
$match:
{
"parameters.key":"key1",
"parameters.value":"value1"
}
}
])
1) Stage 1 - Unwind : Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
2) Stage 2 - Match : Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
Without aggregation, queries will return the entire document even if one subdocument matches. This pipeline will only return the required subdocuments.
Edit: If you need to specify multiple key-value pairs, use $in on the parameters field:
db.collection.aggregate([
{ $unwind: "$parameters" },
{ $match: {
"parameters": {
$in: [
{ "key" : "key1", "value" : "value1" },
{ "key" : "key2", "value" : "value2" }
]
}
}}
])
This keeps the unwound elements that equal either of the following key-value subdocuments:
1) { "key" : "key1", "value" : "value1" }
2) { "key" : "key2", "value" : "value2" }
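Note that $unwind followed by $in keeps elements matching either pair. If the goal is instead to select documents that contain all of the given pairs, each within a single array element, one option is a plain find using $all with $elemMatch. The helper below is a sketch that only builds the query document; allPairsQuery is a hypothetical name, and the commented find call assumes the collection from the question.

```javascript
// Build a query requiring every { key, value } pair to appear in some
// single element of the "parameters" array (hypothetical helper).
function allPairsQuery(pairs) {
  return {
    parameters: {
      $all: pairs.map(({ key, value }) => ({ $elemMatch: { key, value } }))
    }
  };
}

const query = allPairsQuery([
  { key: "key1", value: "value1" },
  { key: "key2", value: "value2" }
]);

console.log(JSON.stringify(query, null, 2));
// In the mongo shell: db.collection.find(query)
```

Unlike the dotted form "parameters.key" / "parameters.value", $elemMatch requires both conditions to hold in the same array element, which is what the original question was after.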
There is a $filter operator in the aggregation framework (available from MongoDB 3.2) which is perfect for such queries. It is a bit verbose but very efficient, and you can use it as follows:
db.surveys.aggregate([
{ "$match": {
"$and": [
{
"parameters.key": "key1",
"parameters.value": "val1"
},
{
"parameters.key": "key2",
"parameters.value": "val2"
}
]
}},
{
"$project": {
"parameters": {
"$filter": {
"input": "$parameters",
"as": "item",
"cond": {
"$or": [
{
"$and" : [
{ "$eq": ["$$item.key", "key1"] },
{ "$eq": ["$$item.value", "val1"] }
]
},
{
"$and" : [
{ "$eq": ["$$item.key", "key2"] },
{ "$eq": ["$$item.value", "val2"] }
]
}
]
}
}
}
}
}
])
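Since the condition grows with every pair, it can be generated instead of written out by hand. A minimal sketch, assuming the pairs arrive as an array of { key, value } objects; pairCondition is a hypothetical helper name, and the stage it feeds is the same $project shown above.

```javascript
// Build the "$or of $and" condition used inside $filter (hypothetical helper).
function pairCondition(pairs, varName = "item") {
  const ref = "$$" + varName; // the variable named by $filter's "as" option
  return {
    $or: pairs.map(({ key, value }) => ({
      $and: [
        { $eq: [ref + ".key", key] },
        { $eq: [ref + ".value", value] }
      ]
    }))
  };
}

const wantedPairs = [
  { key: "key1", value: "val1" },
  { key: "key2", value: "val2" }
];

// Same $project stage as above, with the condition generated
const projectStage = {
  $project: {
    parameters: {
      $filter: { input: "$parameters", as: "item", cond: pairCondition(wantedPairs) }
    }
  }
};

console.log(JSON.stringify(projectStage, null, 2));
```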
You can also do this with more set operators in MongoDB 2.6 without using $unwind:
db.surveys.aggregate([
{ "$match": {
"$and": [
{
"parameters.key": "key1",
"parameters.value": "val1"
},
{
"parameters.key": "key2",
"parameters.value": "val2"
}
]
}},
{
"$project": {
"parameters": {
"$setDifference": [
{ "$map": {
"input": "$parameters",
"as": "item",
"in": {
"$cond": [
{ "$or": [
{
"$and" : [
{ "$eq": ["$$item.key", "key1"] },
{ "$eq": ["$$item.value", "val1"] }
]
},
{
"$and" : [
{ "$eq": ["$$item.key", "key2"] },
{ "$eq": ["$$item.value", "val2"] }
]
}
]},
"$$item",
false
]
}
}},
[false]
]
}
}
}
])
For a solution with MongoDB 2.4, you would need to use the $unwind operator unfortunately:
db.surveys.aggregate([
{ "$match": {
"$and": [
{
"parameters.key": "key1",
"parameters.value": "val1"
},
{
"parameters.key": "key2",
"parameters.value": "val2"
}
]
}},
{ "$unwind": "$parameters" },
{ "$match": {
// After $unwind each document holds a single element, so match either pair
"$or": [
{
"parameters.key": "key1",
"parameters.value": "val1"
},
{
"parameters.key": "key2",
"parameters.value": "val2"
}
]
}},
{
"$group": {
"_id": "$_id",
"parameters": { "$push": "$parameters" }
}
}
]);
"Is it possible to make a query, where the query targets "key" and "value" inside one object?"
This is possible if you know upfront which object (id) you are going to query, so that it can be given as an input parameter in the find query. If that is not possible, the approach below can make querying more efficient.
Build an index on parameters.key and, if needed, also on parameters.value. This can considerably improve query performance.
Please see:
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/core/index-multikey/

MongoDB insert document "or" increment field if exists in array

What I'm trying to do is fairly simple. I have an array inside a document:
"tags": [
{
"t" : "architecture",
"n" : 12
},
{
"t" : "contemporary",
"n" : 2
},
{
"t" : "creative",
"n" : 1
},
{
"t" : "concrete",
"n" : 3
}
]
I want to push an array of items to this array, like
["architecture","blabladontexist"]
If an item exists, I want to increment that object's n value (in this case it's architecture),
and if it doesn't, add it as a new item (with a value of n=0): { "t": "blabladontexist", "n":0}
I have tried $addToSet, $set, $inc and upsert: true in many combinations and couldn't do it.
How can we do this in MongoDB?
With MongoDB 4.2 and newer, the update method can now take a document or an aggregate pipeline where the following stages can be used:
$addFields and its alias $set
$project and its alias $unset
$replaceRoot and its alias $replaceWith.
Armed with the above, your update operation with the aggregate pipeline will override the tags field by concatenating a filtered tags array and a mapped array of the input list, with some data lookup in the map.
To start with, the aggregate expression that filters the tags array uses $filter as follows:
const myTags = ["architecture", "blabladontexist"];
{
"$filter": {
"input": "$tags",
"cond": {
"$not": [
{ "$in": ["$$this.t", myTags] }
]
}
}
}
which produces the filtered array of documents
[
{ "t" : "contemporary", "n" : 2 },
{ "t" : "creative", "n" : 1 },
{ "t" : "concrete", "n" : 3 }
]
Now the second part will be to derive the other array that will be concatenated to the above. This array requires a $map over the myTags input array as
{
"$map": {
"input": myTags,
"in": {
"$cond": {
"if": { "$in": ["$$this", "$tags.t"] },
"then": {
"t": "$$this",
"n": {
"$sum": [
{
"$arrayElemAt": [
"$tags.n",
{ "$indexOfArray": [ "$tags.t", "$$this" ] }
]
},
1
]
}
},
"else": { "t": "$$this", "n": 0 }
}
}
}
}
The above $map essentially loops over the input array and checks, for each element, whether it is in the tags array by comparing the t property. If it exists, the n field of the subdocument becomes its current n value plus 1 (via $sum), where the current value is
expressed with
{
"$arrayElemAt": [
"$tags.n",
{ "$indexOfArray": [ "$tags.t", "$$this" ] }
]
}
else add the default document with an n value of 0.
Overall, your final update operation becomes:
const myTags = ["architecture", "blabladontexist"];
db.getCollection('coll').update(
{ "_id": "1234" },
[
{ "$set": {
"tags": {
"$concatArrays": [
{ "$filter": {
"input": "$tags",
"cond": { "$not": [ { "$in": ["$$this.t", myTags] } ] }
} },
{ "$map": {
"input": myTags,
"in": {
"$cond": [
{ "$in": ["$$this", "$tags.t"] },
{ "t": "$$this", "n": {
"$sum": [
{ "$arrayElemAt": [
"$tags.n",
{ "$indexOfArray": [ "$tags.t", "$$this" ] }
] },
1
]
} },
{ "t": "$$this", "n": 0 }
]
}
} }
]
}
} }
],
{ "upsert": true }
);
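To see what the pipeline computes, the same expression can be mirrored in plain JavaScript. This is a sketch for checking the logic, not something MongoDB runs; mergeTags is a hypothetical name, and it assumes the document already has a tags array.

```javascript
// Plain-JavaScript mirror of the $concatArrays / $filter / $map update expression
function mergeTags(tags, myTags) {
  // $filter: keep existing tags that are not in the incoming list
  const untouched = tags.filter((tag) => !myTags.includes(tag.t));
  // $map: incoming tags get n + 1 if they already exist, otherwise n = 0
  const incoming = myTags.map((name) => {
    const index = tags.findIndex((tag) => tag.t === name); // $indexOfArray
    return index >= 0
      ? { t: name, n: tags[index].n + 1 }
      : { t: name, n: 0 };
  });
  return untouched.concat(incoming); // $concatArrays
}

const existingTags = [
  { t: "architecture", n: 12 },
  { t: "contemporary", n: 2 },
  { t: "creative", n: 1 },
  { t: "concrete", n: 3 }
];

console.log(mergeTags(existingTags, ["architecture", "blabladontexist"]));
```

Here architecture ends up with n = 13 and blabladontexist is appended with n = 0. Updated tags move to the end of the array, since they are removed by the filter and re-added by the map.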
I don't believe this is possible to do in a single command.
MongoDB doesn't allow a $set (or $setOnInsert) and $inc to affect the same field in a single command.
You'll have to do one update command to attempt to $inc the field and, if that doesn't change any documents (n = 0), do a second update to $set the field to its default value.
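That two-step approach might look like the sketch below for a single tag. It assumes the mongo shell or mongosh, where updateOne returns its result synchronously with a matchedCount field; incrementOrAddTag is a hypothetical helper, it uses $push rather than $set to append the default element, and the two commands are not atomic together.

```javascript
// Two-step "increment or add" for one tag (hypothetical helper).
// coll is a collection object, e.g. db.getCollection('coll') in the shell.
function incrementOrAddTag(coll, id, tag) {
  // Step 1: try to increment n on the array element whose t equals the tag.
  // The positional operator "$" refers to the element matched by "tags.t".
  const result = coll.updateOne(
    { _id: id, "tags.t": tag },
    { $inc: { "tags.$.n": 1 } }
  );

  // Step 2: nothing matched, so append the tag with its default count.
  // The $ne guard avoids a duplicate if another client added it meanwhile.
  if (result.matchedCount === 0) {
    coll.updateOne(
      { _id: id, "tags.t": { $ne: tag } },
      { $push: { tags: { t: tag, n: 0 } } }
    );
  }
}

// In the shell, for each incoming tag:
// ["architecture", "blabladontexist"].forEach(function (tag) {
//   incrementOrAddTag(db.getCollection('coll'), "1234", tag);
// });
```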

How to assign weights to searched documents in MongoDb?

This might sound like a simple question to you, but I have spent over 3 hours trying to achieve it and got stuck midway.
Inputs:
List of keywords
List of tags
Problem Statement: I need to find all the documents in the database which satisfy the following conditions:
List documents that have one or more matching keywords. (achieved)
List documents that have one or more matching tags. (achieved)
Sort the found documents by weight: each matching keyword carries 2 points and each matching tag carries 1 point.
Query: How can I achieve requirement #3?
My Attempt: In my attempt I am only able to sort on the basis of keyword matches (and even that without multiplying the weight by 2).
tags is an array of documents. The structure of each tag is like
{
"id" : "ICC",
"some Other Key" : "some Other value"
}
keywords is an array of strings:
["women", "cricket"]
Query:
var predicate = [
{
"$match": {
"$or": [
{
"keywords" : {
"$in" : ["cricket", "women"]
}
},
{
"tags.id" : {
"$in" : ["ICC"]
}
}
]
}
},
{
"$project": {
"title":1,
"_id": 0,
"keywords": 1,
"weight" : {
"$size": {
"$setIntersection" : [
"$keywords" , ["cricket","women"]
]
}
},
"tags.id": 1
}
},
{
"$sort": {
"weight": -1
}
}
];
It seems that you were close in your attempt, but of course you need to implement something to "match your logic" in order to get the final "score" value you want.
It's just a matter of changing your projection logic a little, and assuming that both "keywords" and "tags" are arrays in your documents:
db.collection.aggregate([
// Match your required documents
{ "$match": {
"$or": [
{
"keywords" : {
"$in" : ["cricket", "women"]
}
},
{
"tags.id" : {
"$in" : ["ICC"]
}
}
]
}},
// Inspect elements and create a "weight"
{ "$project": {
"title": 1,
"keywords": 1,
"tags": 1,
"weight": {
"$add": [
{ "$multiply": [
{"$size": {
"$setIntersection": [
"$keywords",
[ "cricket", "women" ]
]
}}
,2] },
{ "$size": {
"$setIntersection": [
{ "$map": {
"input": "$tags",
"as": "t",
"in": "$$t.id"
}},
["ICC"]
]
}}
]
}
}},
// Then sort by that "weight"
{ "$sort": { "weight": -1 } }
])
So it is basically the $map logic here that "transforms" the tags array to give just the id values, for comparison against the "set" of values you are looking for.
The $multiply operator doubles the count of matching keywords, and the $add operator sums that with the count of matching tags to give the "weight" you want to sort your responses by.
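The scoring rule can be checked in plain JavaScript with a few made-up documents. This is a sketch under assumptions: weight is a hypothetical function name, the sample documents are invented for illustration, and missing keywords or tags are treated as empty arrays, which the pipeline itself does not do.

```javascript
// Plain-JavaScript mirror of the "weight" projection (hypothetical helper).
function weight(doc, keywords, tagIds) {
  const docKeywords = new Set(doc.keywords || []);
  const docTagIds = new Set((doc.tags || []).map((t) => t.id)); // the $map step
  // Set semantics as in $setIntersection: duplicates count once
  const keywordHits = [...new Set(keywords)].filter((k) => docKeywords.has(k)).length;
  const tagHits = [...new Set(tagIds)].filter((id) => docTagIds.has(id)).length;
  // $multiply for the keyword weight, $add for the total
  return keywordHits * 2 + tagHits;
}

// Invented sample documents, for illustration only
const docs = [
  { title: "A", keywords: ["cricket"], tags: [{ id: "ICC" }] },
  { title: "B", keywords: ["cricket", "women"], tags: [] },
  { title: "C", keywords: ["football"], tags: [{ id: "ICC" }] }
];

const ranked = docs
  .map((d) => ({ title: d.title, weight: weight(d, ["cricket", "women"], ["ICC"]) }))
  .sort((a, b) => b.weight - a.weight); // same as { "$sort": { "weight": -1 } }

console.log(ranked);
```

With these inputs B scores 4 (two keywords), A scores 3 (one keyword plus one tag) and C scores 1 (one tag), so the order is B, A, C.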