Query on all nested docs inside a nested doc in MongoDB - mongodb

I have a collection with documents that look like this:
{ keyA1: "stringVal",
  keyA2: "stringVal",
  keyA3: { keyB1: { feild1: intVal,
                    feild2: intVal },
           keyB2: { feild1: intVal,
                    feild2: intVal }
         }
}
Currently the set of keys [keyB1, keyB2, ...] has 7 members and is the same for all documents in the collection. I want to query the intVals of specific fields across all keyBs. For example, I might want to find all documents where field2 has a value greater than 100, regardless of which keyB it falls in.
For any one specific keyB, I simply use dot notation: {"keyA3.keyB2.field2": {$gte: 100}}. Right now I have the option of looping over all keyBs, but that may not work in the future, when more keyB values can be added. I don't want to have to modify the code then, and would like to avoid hardcoding those values in any way. I also need the solution to be fairly fast, as the final deployment is expected to have over 20M documents.
How can I write a query that can "skip" the keyB field in the dot notation and just go through all the embedded docs?
FWIW, I'm implementing this in python using pymongo. Thanks.

First, convert the keyA3 object to an array with $objectToArray and store it in a new field using $addFields.
Then use $filter on the new array to keep the elements whose feild2 value is greater than 100.
Finally, $match the documents whose filtered array has a size greater than 0, and remove the extra fields that were added.
db.collection.aggregate([
  {
    "$addFields": {
      "arr": { "$objectToArray": "$keyA3" }
    }
  },
  {
    "$addFields": {
      "matchArrSize": {
        $size: {
          "$filter": {
            "input": "$arr",
            "as": "z",
            "cond": { $gt: [ "$$z.v.feild2", 100 ] }
          }
        }
      }
    }
  },
  {
    $match: {
      matchArrSize: { $gt: 0 }
    }
  },
  {
    $unset: [ "arr", "matchArrSize" ]
  }
])
https://mongoplayground.net/p/VumwL9y7Km1
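Since the question mentions pymongo, the same stages can be written as plain Python data. This is only a minimal sketch: build_pipeline and matches_locally are hypothetical helper names, and the local helper merely imitates, for numeric values, what the stages test on a single document so the logic can be checked without a server.

```python
def build_pipeline(parent="keyA3", field="feild2", threshold=100):
    """Return the aggregation stages from the answer as Python dicts."""
    return [
        {"$addFields": {"arr": {"$objectToArray": f"${parent}"}}},
        {"$addFields": {"matchArrSize": {"$size": {"$filter": {
            "input": "$arr",
            "as": "z",
            "cond": {"$gt": [f"$$z.v.{field}", threshold]},
        }}}}},
        {"$match": {"matchArrSize": {"$gt": 0}}},
        {"$unset": ["arr", "matchArrSize"]},
    ]


def matches_locally(doc, parent="keyA3", field="feild2", threshold=100):
    """Pure-Python imitation of the pipeline's test (numeric values only)."""
    sub_docs = doc.get(parent, {}).values()
    return any(
        isinstance(sub, dict)
        and isinstance(sub.get(field), (int, float))
        and sub[field] > threshold
        for sub in sub_docs
    )


sample = {
    "keyA1": "stringVal",
    "keyA3": {
        "keyB1": {"feild1": 5, "feild2": 50},
        "keyB2": {"feild1": 7, "feild2": 150},
    },
}
pipeline = build_pipeline()
print(matches_locally(sample))  # True, because keyB2.feild2 is 150
# With pymongo, the stages would be passed as: collection.aggregate(pipeline)
```

Because the test is computed per document, this pipeline has to read every document in the collection; no index on the nested fields can be used for it, which matters at 20M documents.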

Related

Find in nested array with compare on last field

I have a collection with documents like this one:
{
f1: {
firstArray: [
{
secondArray: [{status: "foo1"}, {status: "foo2"}, {status: "foo3"}]
}
]
}
}
My expected result includes documents that have at least one item in firstArray whose last object in secondArray has a status included in an input array of values (e.g. ["foo3"]).
I don't have to use aggregate.
I tried:
{
"f1.firstArray": {
$elemMatch: {
"secondArray.status": {
$in: ["foo3"],
},
otherField: "bar",
},
},
}
You can use an aggregation pipeline with $match and $filter to keep only the documents in which the number of firstArray items whose last status matches is greater than zero:
db.collection.aggregate([
{$match: {
$expr: {
$gt: [
{$size: {
$filter: {
input: "$f1.firstArray",
cond: {$in: [{$last: "$$this.secondArray.status"}, ["foo3"]]}
}
}
},
0
]
}
}
}
])
See how it works on the playground example
If you know that secondArray always has 3 items, you can do:
db.collection.find({
"f1.firstArray": {
$elemMatch: {
"secondArray.2.status": {
$in: ["foo3"]
}
}
}
})
But otherwise I don't think you can check only the last item without an aggregation. The idea is that a regular find lets you write a query that does not depend on values specific to each document. If the size of the array can differ between documents, or even between items in the same document, you need to use an aggregation pipeline.
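For readers working from Python, the aggregation above can be sketched as a list of dicts together with a local check of the same rule. The function names here are hypothetical, and the local helper only imitates the "last status" test on one document. As far as I know, $last as an array expression needs MongoDB 4.4 or later.

```python
def last_status_pipeline(allowed):
    """Stages matching docs where some firstArray item ends in an allowed status."""
    return [{"$match": {"$expr": {"$gt": [
        {"$size": {"$filter": {
            "input": "$f1.firstArray",
            "cond": {"$in": [
                {"$last": "$$this.secondArray.status"},
                list(allowed),
            ]},
        }}},
        0,
    ]}}}]


def matches_locally(doc, allowed):
    """Pure-Python imitation: is the LAST status of any inner array allowed?"""
    for item in doc.get("f1", {}).get("firstArray", []):
        statuses = [s.get("status") for s in item.get("secondArray", [])]
        if statuses and statuses[-1] in allowed:
            return True
    return False


doc = {"f1": {"firstArray": [
    {"secondArray": [{"status": "foo1"}, {"status": "foo2"}, {"status": "foo3"}]},
]}}
print(matches_locally(doc, ["foo3"]))  # True: foo3 is the last status
print(matches_locally(doc, ["foo1"]))  # False: foo1 is present but not last
```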

MongoDB projections and fields subset

I would like to use mongo projections in order to return less data to my application. I would like to know if it's possible.
Example:
user: {
id: 123,
some_list: [{x:1, y:2}, {x:3, y:4}],
other_list: [{x:5, y:2}, {x:3, y:4}]
}
Given a query for user_id = 123 and some 'projection filter' like user.some_list.x = 1 and user.other_list.x = 1 is it possible to achieve the given result?
user: {
id: 123,
some_list: [{x:1, y:2}],
other_list: []
}
The idea is to make mongo work a little more and return less data to the application. In some cases, we are discarding 80% of the elements of the collections on the application side, so it would be better not to return them at all.
Questions:
Is it possible?
How can I achieve this. $elemMatch doesn't seem to help me. I'm trying something with unwind, but not getting there
If it's possible, can this projection filtering benefit from a index on user.some_list.x for example? Or not at all once the user was already found by its id?
Thank you.
What you can do in MongoDB v3.0 is this:
db.collection.aggregate({
$match: {
"user.id": 123
}
}, {
$redact: {
$cond: {
if: {
$or: [ // those are the conditions for when to include a (sub-)document
"$user", // if it contains a "user" field (as is the case when we're on the top level
"$some_list", // if it contains a "some_list" field (would be the case for the "user" sub-document)
"$other_list", // the same here for the "other_list" field
{ $eq: [ "$x", 1 ] } // and lastly, when we're looking at the innermost sub-documents, we only want to include items where "x" is equal to 1
]
},
then: "$$DESCEND", // descend into sub-document
else: "$$PRUNE" // drop sub-document
}
}
})
Depending on your data setup, you could also simplify this query a little by saying: include everything that either has no "x" field or whose "x" is equal to 1, like so:
$redact: {
$cond: {
if: {
$eq: [ { "$ifNull": [ "$x", 1 ] }, 1 ] // we only want to include items where "x" is equal to 1 or where "x" does not exist
},
then: "$$DESCEND", // descend into sub-document
else: "$$PRUNE" // drop sub-document
}
}
The index you suggested won't do anything for the $redact stage. You can benefit from it, however, if you change the $match stage at the start to get rid of all documents which don't match anyway like so:
$match: {
"user.id": 123,
"user.some_list.x": 1 // this will use your index
}
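To make the $$DESCEND / $$PRUNE behaviour easier to follow, here is a small pure-Python imitation of what the $redact stage does to one document. It is a sketch under simplifying assumptions, not MongoDB's implementation: redact and keep are hypothetical names, nested arrays of arrays are not handled, and the condition checks for the presence of a key rather than its truthiness.

```python
def redact(doc, keep):
    """Imitate $redact with $$DESCEND / $$PRUNE on a plain dict.

    Returns None when the document itself is pruned.
    """
    if not keep(doc):
        return None  # $$PRUNE: drop this (sub-)document entirely
    result = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            sub = redact(value, keep)  # $$DESCEND into embedded document
            if sub is not None:
                result[key] = sub
        elif isinstance(value, list):
            kept = []
            for item in value:
                if isinstance(item, dict):
                    sub = redact(item, keep)
                    if sub is not None:
                        kept.append(sub)
                else:
                    kept.append(item)  # scalars inside arrays are untouched
            result[key] = kept
        else:
            result[key] = value
    return result


def keep(d):
    # Same idea as the $or in the answer: descend through the wrapper levels,
    # and keep innermost items only when x == 1.
    return "user" in d or "some_list" in d or "other_list" in d or d.get("x") == 1


doc = {"_id": 1, "user": {
    "id": 123,
    "some_list": [{"x": 1, "y": 2}, {"x": 3, "y": 4}],
    "other_list": [{"x": 5, "y": 2}, {"x": 3, "y": 4}],
}}
print(redact(doc, keep))
# {'_id': 1, 'user': {'id': 123, 'some_list': [{'x': 1, 'y': 2}], 'other_list': []}}
```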
Very possible.
With findOne, the query is the first argument and the projection is the second. In Node/Javascript (similar to the mongo shell):
db.collection('users').findOne( {
id: 123
}, {
other_list: 0
} )
This will return the whole object without the other_list field. Or you could specify { some_list: 1 } as the projection, and ONLY the _id and some_list will be returned.
$filter is your friend here. Below produces the output you seek. Experiment with changing the $eq fields and target values to see more or less items in the array get picked up. Note how we $project the new fields (some_list and other_list) "on top of" the old ones, essentially replacing them with the filtered versions.
db.foo.aggregate([
{$match: {"user.id": 123}}
,{$project: { "user.some_list": { $filter: {
input: "$user.some_list",
as: "z",
cond: {$eq: [ "$$z.x", 1 ]}
}},
"user.other_list": { $filter: {
input: "$user.other_list",
as: "z",
cond: {$eq: [ "$$z.x", 1 ]}
}}
}}
]);
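When several list fields need the same filter, the repeated $filter blocks can be generated. The sketch below builds the same $project stage in Python; filter_projection and its parameters are hypothetical names chosen for illustration.

```python
def filter_projection(parent, list_fields, key, value):
    """Build a $project stage that replaces each list with its filtered version."""
    return {"$project": {
        f"{parent}.{name}": {"$filter": {
            "input": f"${parent}.{name}",
            "as": "z",
            "cond": {"$eq": [f"$$z.{key}", value]},
        }}
        for name in list_fields
    }}


pipeline = [
    {"$match": {"user.id": 123}},
    filter_projection("user", ["some_list", "other_list"], "x", 1),
]
# With pymongo, the stages would be passed as: collection.aggregate(pipeline)
```

Note that $project returns only the listed paths (plus _id), so user.id is not part of the output. Using $addFields with the same expressions would keep the remaining fields.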

Mongo: Return specific fields from an array slice

I would like to return specific fields from an array in Mongo and am having trouble.
Let us say we have a document like such:
{
"student":"Bob",
"report_cards": [
{
"Year":2016,
"English":"B",
"Math":"A"
},
{
"Year":2015,
"English":"B",
"Math":"A"
}
]
}
I would like to return the following:
{"Student": "Bob", {"English":"B"}}
Basically, I only need the first element in the report cards array, and only return the English field.
I know it is something around:
db.collection.find({},{"Student":1, "report_cards":{$slice:1}});
But this, of course, results in the full array (year, english, math) returning. I have tried the following:
db.collection.find({},{"Student":1, "report_cards.$.english":{$slice:1}})
db.collection.find({},{"Student":1, "report_cards":{$slice:1, "$.english"}});
but those aren't correct.
How do I simply return one field from the obj within the array results?
Thank you!
You can use the aggregation framework to get the values as
db.test.aggregate([
  { $project: {
      _id: 0,
      student: '$student',
      English: { $arrayElemAt: ['$report_cards.English', 0] }
  } }
])
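The same projection can be sketched in Python, along with a local imitation that shows one subtlety: "$report_cards.English" collects the English values of the elements that have one, so index 0 is the first card that has the field, not necessarily the first card. The function names are hypothetical.

```python
def first_card_field_pipeline(field="English"):
    """Build the $project stage from the answer for an arbitrary field name."""
    return [{"$project": {
        "_id": 0,
        "student": "$student",
        field: {"$arrayElemAt": [f"$report_cards.{field}", 0]},
    }}]


def first_card_field_locally(doc, field="English"):
    """Pure-Python imitation of the projection on a single document."""
    # Mirrors "$report_cards.<field>": only cards that have the field contribute.
    values = [card[field] for card in doc.get("report_cards", []) if field in card]
    result = {"student": doc.get("student")}
    if values:
        result[field] = values[0]
    return result


bob = {
    "student": "Bob",
    "report_cards": [
        {"Year": 2016, "English": "B", "Math": "A"},
        {"Year": 2015, "English": "B", "Math": "A"},
    ],
}
print(first_card_field_locally(bob))  # {'student': 'Bob', 'English': 'B'}
```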
AFAIK there is no built-in operator for filtering key-value pairs in embedded documents.
You can use a combination of operators to achieve this in version 3.4.4 or later.
The query below uses $arrayElemAt to get the first element in report_cards, followed by $objectToArray to convert it to an array view (an array of k and v pairs), $filter on the English key to keep the matching key-value pair, and $arrayToObject to convert the result back to a document.
Something like
{
  "$project": {
    "output": {
      "$arrayToObject": {
        "$filter": {
          "input": {
            "$objectToArray": {
              "$arrayElemAt": [ "$report_cards", 0 ]
            }
          },
          "as": "result",
          "cond": { "$eq": [ "$$result.k", "English" ] }
        }
      }
    }
  }
}

Finding documents based on the minimum value in an array

my document structure is something like :
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 12,
},
{
source: 'b',
value: 10,
},
...
]
},
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 24,
},
{
source: 'b',
value: 36,
},
...
]
}
The values of the various sources in options will be updated frequently (every few minutes or hours).
Assume the size of the options array doesn't change, i.e. no extra elements are added to the list.
My queries are of the following type:
- Find all documents where the min_value of all the options falls within some limits.
I could first do an unwind on options (and then take the min) and then run comparison queries, but I am new to mongo and not sure how performance is affected by the unwind operation. The number of documents of this type would be a few million.
Or does anyone have any suggestions for changing the document structure that could help me simplify this query? (Apart from creating separate documents per source, which would involve a lot of data duplication.)
Thanks!
Using $unwind is indeed quite expensive, most notably so with larger arrays, but there is a cost in all cases of usage. There are a couple of ways to avoid $unwind here without real structural changes.
Pure Aggregation
In the basic case, as of the MongoDB 3.2.x release series the $min operator can work directly on an array of values in a "projection" sense, in addition to its standard grouping accumulator role. This means that with the help of the related $map operator for processing elements of an array, you can get the minimal value without using $unwind:
db.collection.aggregate([
// Still makes sense to use an index to select only possible documents
{ "$match": {
"options": {
"$elemMatch": {
"value": { "$gte": minValue, "$lt": maxValue }
}
}
}},
// Provides a logical filter to remove non-matching documents
{ "$redact": {
"$cond": {
"if": {
"$let": {
"vars": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
},
"in": { "$and": [
{ "$gte": [ "$$min_value", minValue ] },
{ "$lt": [ "$$min_value", maxValue ] }
]}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
// Optionally return the min_value as a field
{ "$project": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
}}
])
The basic step in getting the "minimum" value from the array is to first extract the "value" data from the "options" array, which is done using $map. (The calculation sits inside $let because we want to use the result "twice" in the logical conditions, and this saves us from repeating ourselves.)
The output of $map is an array with just those values, so this is supplied as the argument to $min, which then returns the minimum value for that array.
Using $redact is sort of like a $match pipeline stage with the difference that rather than needing a field to be "present" in the document being examined, you instead just form a logical condition with calculations.
In this case the condition is $and where "both" the logical forms of $gte and $lt return true against the calculated value ( from $let as "$$min_value" ).
The $redact stage then uses its special system variables: $$KEEP retains the document when the condition is true, and $$PRUNE removes the document from the results when it is false.
It's all very much like doing $project and then $match to actually project the value into the document before filtering in another stage, but all done in one stage. Of course you might actually want to $project the resulting field in what you return, but it generally cuts the workload if you remove non-matched documents "first" using $redact instead.
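As a rough Python counterpart to the pipeline described above, the sketch below builds the $match and $redact stages from two bounds and adds a local check of the same rule. The function names are hypothetical, and for brevity the minimum expression is repeated instead of being bound once with $let.

```python
def min_in_range_pipeline(min_value, max_value):
    """Build the index-friendly $match plus the $redact filter as Python dicts."""
    min_expr = {"$min": {"$map": {
        "input": "$options", "as": "option", "in": "$$option.value"}}}
    return [
        # Cheap pre-filter: at least one option lies inside the range.
        {"$match": {"options": {"$elemMatch": {
            "value": {"$gte": min_value, "$lt": max_value}}}}},
        # Exact filter: the minimum of all options lies inside the range.
        {"$redact": {"$cond": {
            "if": {"$and": [
                {"$gte": [min_expr, min_value]},
                {"$lt": [min_expr, max_value]},
            ]},
            "then": "$$KEEP",
            "else": "$$PRUNE",
        }}},
    ]


def min_in_range_locally(doc, min_value, max_value):
    """Pure-Python imitation of the $redact condition for one document."""
    values = [option["value"] for option in doc.get("options", [])]
    return bool(values) and min_value <= min(values) < max_value


doc = {"options": [{"source": "a", "value": 12}, {"source": "b", "value": 10}]}
print(min_in_range_locally(doc, 10, 11))  # True: the minimum is 10
print(min_in_range_locally(doc, 11, 20))  # False: 12 is in range, but the minimum is 10
```

The second call shows why the $elemMatch stage alone is not enough: it only narrows the candidates, and the $redact stage makes the final decision.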
Updating Documents
Of course I think the best option is to actually keep the "min_value" field in the document rather than work it out at run-time. So this is a very simple thing to do when adding to or altering array items during update.
For this there is the $min "update" operator. Use it when appending with $push:
db.collection.update(
  { "_id": id },
  {
    "$push": { "options": { "source": "a", "value": 9 } },
    "$min": { "min_value": 9 }
  }
)
Or when updating a value of an element:
db.collection.update(
  { "_id": id, "options.source": "a" },
  {
    "$set": { "options.$.value": 9 },
    "$min": { "min_value": 9 }
  }
)
If the current "min_value" in the document is greater than the argument given to $min, or the key does not yet exist, then the given value will be written. If the current value is less than the argument, the existing value stays in place since it is already the smaller value.
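The rule for the $min update operator can be illustrated with a few lines of plain Python. This is only an imitation for numeric values on a dict, and apply_min is a hypothetical name, not part of any driver.

```python
def apply_min(doc, field, candidate):
    """Imitate the $min update operator on a plain dict (numbers only)."""
    if field not in doc or candidate < doc[field]:
        doc[field] = candidate  # written when missing or when candidate is smaller
    return doc


doc = {"min_value": 10}
apply_min(doc, "min_value", 9)    # smaller value is written
apply_min(doc, "min_value", 15)   # larger value is ignored
print(doc["min_value"])           # 9

fresh = apply_min({}, "min_value", 7)  # missing key: the value is written
print(fresh)                           # {'min_value': 7}
```

One consequence worth checking against your own update pattern: because $min never raises the stored value, min_value can go stale if the element that currently holds the minimum is later updated to a larger number.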
You can even set all your existing data with a simple "bulk" operations update:
var ops = [];
db.collection.find({ "min_value": { "$exists": false } }).forEach(function(doc) {
// Queue operations
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$min": {
"min_value": Math.min.apply(
null,
doc.options.map(function(option) {
return option.value
})
)
}
}
}
});
// Write once in 1000 documents
if ( ops.length == 1000 ) {
db.collection.bulkWrite(ops);
ops = [];
}
});
// Clear any remaining operations
if ( ops.length > 0 )
db.collection.bulkWrite(ops);
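The batching logic of the backfill above can be sketched in Python without a server. Here min_value_updates is a hypothetical helper that yields batches of (filter, update) pairs. Skipping documents with an empty options array is my own assumption; the shell version does not treat them specially.

```python
def min_value_updates(docs, batch_size=1000):
    """Yield lists of (filter, update) pairs, at most batch_size per list."""
    batch = []
    for doc in docs:
        values = [option["value"] for option in doc.get("options", [])]
        if not values:
            continue  # assumption: nothing to compute for an empty options array
        batch.append(({"_id": doc["_id"]}, {"$min": {"min_value": min(values)}}))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the remaining operations


docs = [{"_id": i, "options": [{"value": i + 5}, {"value": i}]} for i in range(2500)]
sizes = [len(batch) for batch in min_value_updates(docs)]
print(sizes)  # [1000, 1000, 500]
# With pymongo, each pair could be wrapped as pymongo.UpdateOne(filter, update)
# and each batch sent with collection.bulk_write(...).
```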
Then with a field in place, it is just a simple range selection:
db.collection.find({
"min_value": {
"$gte": minValue, "$lt": maxValue
}
})
So it really should be in your best interests to keep a field ( or fields if you regularly need different conditions ) in the document since that provides the most efficient query.
Of course, the new functions of aggregation $min along with $map also make this viable to use without a field, if you prefer more dynamic conditions.

MongoDB: how to count number of keys in a document?

Let's say a document is:
{
a: 1,
b: 1,
c: 2,
....
z: 2
}
How can I count the number of keys in such document?
Thank you
This is quite possible through the aggregation framework if you are using MongoDB 3.6 or newer.
Use the $objectToArray operator within an aggregation pipeline to convert the document to an array. The returned array contains an element for each field/value pair in the original document. Each element in the returned array is a document that contains two fields, k and v.
The reference to the root document is made possible through the $$ROOT system variable, which references the top-level document currently being processed in the aggregation pipeline stage.
Once you have the array, you can use the $addFields pipeline step to create a field that holds the count, with the actual count derived using the $size operator.
All this can be done in a single pipeline by nesting the expressions as follows:
db.collection.aggregate([
{ "$addFields": {
"count": {
"$size": {
"$objectToArray": "$$ROOT"
}
}
} }
])
Example Output
{
"_id" : ObjectId("5a7cd94520a31e44e0e7e282"),
"a" : 1.0,
"b" : 1.0,
"c" : 2.0,
"z" : 2.0,
"count" : 5
}
To exclude the _id field, you can use the $filter operator as:
db.collection.aggregate([
{
"$addFields": {
"count": {
"$size": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"as": "el",
"cond": { "$ne": [ "$$el.k", "_id" ] }
}
}
}
}
}
])
or as suggested by 0zkr PM simply add a $project pipeline step at the beginning:
db.collection.aggregate([
{ "$project": { "_id": 0 } },
{ "$addFields": {
"count": {
"$size": {
"$objectToArray": "$$ROOT"
}
}
} }
])
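For use from Python, the pipelines above can be sketched as a small builder with a flag for whether _id should be counted, plus a local check of the expected number. Both function names are hypothetical.

```python
def key_count_pipeline(include_id=False):
    """Build an $addFields stage that stores the number of top-level keys."""
    pairs = {"$objectToArray": "$$ROOT"}
    if not include_id:
        # Drop the {k: "_id", v: ...} pair before counting.
        pairs = {"$filter": {
            "input": pairs,
            "as": "el",
            "cond": {"$ne": ["$$el.k", "_id"]},
        }}
    return [{"$addFields": {"count": {"$size": pairs}}}]


def key_count_locally(doc, include_id=False):
    """Pure-Python imitation: count the top-level keys of a dict."""
    return sum(1 for key in doc if include_id or key != "_id")


doc = {"_id": "5a7cd94520a31e44e0e7e282", "a": 1, "b": 1, "c": 2, "z": 2}
print(key_count_locally(doc))                   # 4
print(key_count_locally(doc, include_id=True))  # 5
```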
There's no built-in command for that. Fetch this document and count the keys yourself.
In Mongo (as in most NoSQL solutions) it's usually a good idea to pre-calculate such values if you want to do queries on them later (e.g. "where the number of different keys > 12") so maybe you should consider adding a new field "keyCount" which you increment every time a new key is added.
If you are in Mongo console, you can write javascript to do that.
doc = db.coll.findOne({}); nKeys =0; for( k in doc){nKeys ++;} print(nKeys);
This will still be considered "outside the mongo" though.
Sergio is right, you have to do it outside of mongo.
You can do something like this:
var count = 0;
for(k in obj) {count++;}
print(count);
const { MongoClient } = require("mongodb");
const client = new MongoClient(
  "mongodb://localhost:27017",
  {
    useNewUrlParser: true,
    useUnifiedTopology: true
  }
);
(async () => {
  const connection = await client.connect();
  const dbStuff = await connection
    .db("db")
    .collection("weedmaps.com/brands")
    .find({})
    .toArray(); // <- Load every document into an array
  for (let thing of dbStuff) {
    console.log(Object.keys(thing).length); // number of keys, including _id
  }
  await connection.close();
})();
If you are working with documents such as mongoose.Document, you can count the keys after converting the document to a plain object, because before that the document only exposes its common internal properties:
Object.keys(document); // returns [ '$__', 'isNew', 'errors', '_doc', '$locals' ]
Object.keys(document).length; // ALWAYS returns 5
So convert it to an object first, then pass it to Object.keys:
console.dir(
Object.keys(document.toObject()) // Returns ['prop1', 'prop2', 'prop3']
);
And to show the key count:
console.log(
Object.keys(document.toObject()).length
); // Prints 3
And be happy!!!