MongoDB project the documents with count greater than 2 [duplicate] - mongodb

I have a collection like
{
"_id": "201503110040020021",
"Line": "1", // several documents may have this Line value
"LineStart": ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND": [{
"Secuence": 10,
"Title": 1,
},
{
"Secuence": 183,
"Title": 613,
},
...
],
} {
"_id": "201503110040020022",
"Line": "1", // several documents may have this Line value
"LineStart": ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND": [{
"Secuence": 10,
"Title": 1,
},
],
}
SSCEXPEND is an array. I am trying to count the size of the SSCEXPEND array and project the document only if the count is greater than or equal to 2. My query looks something like this:
db.entity.aggregate(
[
{
$project: {
SSCEXPEND_count: {$size: "$SSCEXPEND"}
}
},
{
$match: {
"SSCEXPEND_count2": {$gte: ["$SSCEXPEND_count",2]}
}
}
]
)
I am expecting the output to be only the first document, whose array size is greater than or equal to 2.
The $project part works fine and I am able to get the counts, but I need to keep only the documents whose count is greater than or equal to two, and my $match part is not working. Can anyone point out where I am going wrong?

You need to project the other fields and your $match pipeline will just need to do a query on the newly-created field to filter the documents based on the array size. Something like the following should work:
db.entity.aggregate([
{
"$project": {
"Line": 1,
"LineStart": 1, "SSCEXPEND": 1,
"SSCEXPEND_count": { "$size": "$SSCEXPEND" }
}
},
{
"$match": {
"SSCEXPEND_count": { "$gte": 2 }
}
}
])
Sample Output:
/* 0 */
{
"result" : [
{
"_id" : "201503110040020021",
"Line" : "1",
"LineStart" : ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND" : [
{
"Secuence" : 10,
"Title" : 1
},
{
"Secuence" : 183,
"Title" : 613
}
],
"SSCEXPEND_count" : 2
}
],
"ok" : 1
}

This is actually a very simple query, where the trick is to use a property of "dot notation" in order to test the array. All you really need to ask for is documents where a given array index $exists. Index 1 exists only when the array contains 2 elements or more, which is the "greater than or equal to 2" condition in the question (use index 2 instead if you need 3 elements or more):
db.entity.find({ "SSCEXPEND.1": { "$exists": true } })
It's the fastest way to do it and can even use indexes. No need for calculations in aggregation operations.
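The index arithmetic is easy to get off by one: index k exists only when the array holds at least k + 1 elements. The sketch below models that rule in plain JavaScript on the two documents from the question (trimmed to the relevant fields). It does not talk to MongoDB, and hasAtLeast is a hypothetical helper name used only for illustration.

```javascript
// Plain-JavaScript model of the dot-notation test; not a MongoDB API.
// hasAtLeast is a hypothetical helper: "does index n - 1 exist in doc[field]?"
function hasAtLeast(doc, field, n) {
  const arr = doc[field];
  // Index n - 1 exists exactly when the array length is greater than n - 1.
  return Array.isArray(arr) && arr.length > n - 1;
}

const docs = [
  {
    _id: "201503110040020021",
    SSCEXPEND: [{ Secuence: 10, Title: 1 }, { Secuence: 183, Title: 613 }]
  },
  {
    _id: "201503110040020022",
    SSCEXPEND: [{ Secuence: 10, Title: 1 }]
  }
];

// Models { "SSCEXPEND.1": { "$exists": true } }: two elements or more.
const atLeastTwo = docs.filter(d => hasAtLeast(d, "SSCEXPEND", 2)).map(d => d._id);

// Models { "SSCEXPEND.2": { "$exists": true } }: three elements or more.
const atLeastThree = docs.filter(d => hasAtLeast(d, "SSCEXPEND", 3)).map(d => d._id);

console.log(atLeastTwo, atLeastThree);
```

With the trimmed sample data, only the first document passes the "two or more" test and none passes the "three or more" test.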

Related

Mongodb: push element to nested array if the condition is met

I have the following collection:
{
"_id": 11,
"outerArray": [
{ "_id" : 21,
"field": {
"innerArray" : [
1,
2,
3
]
}
},
{ "_id" : 22,
"field": {
"innerArray" : [
2,
3
]
}
},
{ "_id" : 23,
"field": {
"innerArray" : [
2
]
}
}
]
}
I need to go through all documents in the collection and push a new element 4 to innerArray if innerArray already contains element 1 or element 3.
I tried it this way, and a few similar ways, but it didn't work as expected: it only pushes to the innerArray of the first element of outerArray.
db.collection.updateMany(
{ "outerArray.field.innerArray": { $in: [ 1, 3 ] } },
{ $push: { "outerArray.$.field.innerArray": 4} }
)
How can I make it push to all the corresponding innerArrays?
The problem here is that you are misunderstanding a couple of things.
When you put "outerArray.field.innerArray": { $in: [ 1, 3 ] } in your query, you are not selecting only the innerArray values that contain 1 or 3. You are selecting the documents in which such arrays exist.
So you are querying the entire document.
You have to use arrayFilters to update only the array elements that match the filter.
So, if I've understood you correctly, the query you want is:
db.collection.updateMany(
{}, // Empty filter: match all documents
{
$push: { "outerArray.$[elem].field.innerArray": 4 }
},
{
"arrayFilters": [ { "elem.field.innerArray": { $in: [ 1, 3 ] } } ]
})
Note how the first object passed to the update call is empty. That is where you put the conditions that select the document (not the array element, the document).
If you want to update only one document, fill the first object (the query object) with the values you want, for example: {"_id": 11}.
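To make the difference between the positional $ operator and a filtered $[elem] identifier concrete, here is a rough plain-JavaScript model of the two behaviours, run on the document from the question. It does not call MongoDB. pushWithPositional and pushWithArrayFilters are hypothetical helper names, and the model assumes the positional operator resolves to the first matching outerArray element, which is what the question reports.

```javascript
// Plain-JavaScript model of the two update styles; it does not call MongoDB.
// Condition from the question: innerArray contains 1 or 3.
const matches = el => el.field.innerArray.some(v => [1, 3].includes(v));

// Deep copy so the original document is left untouched.
const clone = doc => JSON.parse(JSON.stringify(doc));

// Models "outerArray.$.field.innerArray": only the first matching element changes.
function pushWithPositional(doc, value) {
  const copy = clone(doc);
  const first = copy.outerArray.find(matches);
  if (first) first.field.innerArray.push(value);
  return copy;
}

// Models "outerArray.$[elem].field.innerArray" with arrayFilters:
// every element that satisfies the filter changes.
function pushWithArrayFilters(doc, value) {
  const copy = clone(doc);
  copy.outerArray.filter(matches).forEach(el => el.field.innerArray.push(value));
  return copy;
}

const doc = {
  _id: 11,
  outerArray: [
    { _id: 21, field: { innerArray: [1, 2, 3] } },
    { _id: 22, field: { innerArray: [2, 3] } },
    { _id: 23, field: { innerArray: [2] } }
  ]
};

const positional = pushWithPositional(doc, 4).outerArray.map(el => el.field.innerArray);
const filtered = pushWithArrayFilters(doc, 4).outerArray.map(el => el.field.innerArray);

console.log(JSON.stringify(positional)); // [[1,2,3,4],[2,3],[2]]
console.log(JSON.stringify(filtered));   // [[1,2,3,4],[2,3,4],[2]]
```

The element with _id 23 is untouched in both cases because its innerArray contains neither 1 nor 3.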

How do I get a sum of the occurrence of each item in an array across all documents?

I want to get an aggregation/count of the occurrence of all items in an array across all documents. I've tried looking up examples but none of them seem to cover this scenario exactly or go about it in a very obtuse way.
Here's a simple idea of the document model I'm working with. The itemIds array within each object is always unique (no repeated values):
[{
_id:1,
itemIds:[3, 4, 6, 12]
},
{
_id:2,
itemIds:[4, 12]
},
{
_id:3,
itemIds:[3, 4, 8, 9, 12]
}]
I need the counts of each of these summed up (doesn't have to be this exact format but just giving a general idea of what I need):
{
itemsCount:[
{
itemId:3,
count:2
},
{
itemId:4,
count:3
},
{
itemId:6,
count:1
},
{
itemId:8,
count:1
},
{
itemId:9,
count:1
},
{
itemId:12,
count:3
}
]
}
Please try this:
db.yourCollection.aggregate([
{$project : {'itemIds' : 1, _id :0}},
{$unwind : '$itemIds'},
{$group : {'_id': '$itemIds', count :{$sum :1}}}
])
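For intuition, the $unwind and $group stages amount to flattening every itemIds array into one stream and tallying each value. The following plain-JavaScript sketch reproduces that logic on the sample documents from the question. It does not call MongoDB, and the itemsCount output shape is simply the one the question asked for.

```javascript
// Plain-JavaScript model of $unwind + $group with { $sum: 1 }; no MongoDB involved.
const docs = [
  { _id: 1, itemIds: [3, 4, 6, 12] },
  { _id: 2, itemIds: [4, 12] },
  { _id: 3, itemIds: [3, 4, 8, 9, 12] }
];

// "$unwind": one entry per array element across all documents.
const unwound = docs.flatMap(d => d.itemIds);

// "$group": tally how often each itemId occurs.
const counts = new Map();
for (const id of unwound) {
  counts.set(id, (counts.get(id) || 0) + 1);
}

// Shape the result like the output requested in the question, sorted by itemId.
const itemsCount = [...counts]
  .map(([itemId, count]) => ({ itemId, count }))
  .sort((a, b) => a.itemId - b.itemId);

console.log(itemsCount);
```

In the real pipeline the grouped documents come back as { _id: <itemId>, count: <n> }, in no guaranteed order unless a $sort stage is added.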

MongoDB lists - get every Nth item

I have a Mongodb schema that looks roughly like:
[
{
"name" : "name1",
"instances" : [
{
"value" : 1,
"date" : ISODate("2015-03-04T00:00:00.000Z")
},
{
"value" : 2,
"date" : ISODate("2015-04-01T00:00:00.000Z")
},
{
"value" : 2.5,
"date" : ISODate("2015-03-05T00:00:00.000Z")
},
...
]
},
{
"name" : "name2",
"instances" : [
...
]
}
]
where the number of instances for each element can be quite big.
I sometimes want to get only a sample of the data, that is, get every 3rd instance, or every 10th instance... you get the picture.
I can achieve this goal by getting all instances and filtering them in my server code, but I was wondering if there's a way to do it by using some aggregation query.
Any ideas?
Updated
Assuming the data structure was flat as @SylvainLeroux suggested below, that is:
[
{"name": "name1", "value": 1, "date": ISODate("2015-03-04T00:00:00.000Z")},
{"name": "name2", "value": 5, "date": ISODate("2015-04-04T00:00:00.000Z")},
{"name": "name1", "value": 2, "date": ISODate("2015-04-01T00:00:00.000Z")},
{"name": "name1", "value": 2.5, "date": ISODate("2015-03-05T00:00:00.000Z")},
...
]
will the task of getting every Nth item (of specific name) be easier?
It seems that your question clearly asked "get every nth instance" which does seem like a pretty clear question.
Query operations like .find() can really only return the document "as is" with the exception of general field "selection" in projection and operators such as the positional $ match operator or $elemMatch that allow a singular matched array element.
Of course there is $slice, but that just allows a "range selection" on the array, so again does not apply.
The "only" things that can modify a result on the server are .aggregate() and .mapReduce(). The former does not "play very well" with "slicing" arrays in any way, at least not by "n" elements. However since the "function()" arguments of mapReduce are JavaScript based logic, then you have a little more room to play with.
For analytical purposes "only", just filter the array contents via mapReduce using .filter():
db.collection.mapReduce(
function() {
var id = this._id;
delete this._id;
// filter the content of "instances" to every 3rd item only
this.instances = this.instances.filter(function(el,idx) {
return ((idx+1) % 3) == 0;
});
emit(id,this);
},
function() {},
{ "out": { "inline": 1 } } // or output to collection as required
)
It's really just a "JavaScript runner" at this point, but if this is just for analysis/testing then there is nothing generally wrong with the concept. Of course the output is not "exactly" how your document is structured, but it's as near a facsimile as mapReduce can get.
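The filter callback used in the map function is ordinary JavaScript, so it can be tried outside the database. A minimal sketch, with everyNth as a hypothetical helper name and a made-up sample array:

```javascript
// Same modulo test as in the map function above, as a standalone helper.
// everyNth is a hypothetical name; n must be a positive integer.
function everyNth(items, n) {
  if (!Number.isInteger(n) || n <= 0) {
    throw new RangeError("n must be a positive integer");
  }
  // idx is zero-based, so (idx + 1) % n === 0 keeps the 3rd, 6th, 9th... item for n = 3.
  return items.filter((_, idx) => (idx + 1) % n === 0);
}

const sample = everyNth(["a", "b", "c", "d", "e", "f", "g"], 3);
console.log(sample); // [ 'c', 'f' ]
```

Note that this keeps items at positions n, 2n, 3n and so on, so the first element is never included. Use idx % n === 0 instead if the sample should start from the first element.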
The other suggestion I see here requires creating a new collection with all the items "denormalized" and inserting the "index" from the array as part of the unique _id key. That may produce something you can query directly, but for the "every nth item" you would still have to do:
db.resultCollection.find({
"_id.index": { "$in": [2,5,8,11,14] } // and so on ....
})
So you would have to work out and provide the index values of "every nth item" in order to get "every nth item", which doesn't really seem to solve the problem that was asked.
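If you did go down that route, the index list would have to be generated on the client. A small sketch of what that involves, assuming you already know the array length (nthIndexes is a hypothetical helper name, and 15 is just an example length):

```javascript
// Builds the zero-based indexes of every nth item for an array of the given length.
// nthIndexes is a hypothetical helper; n must be a positive integer.
function nthIndexes(length, n) {
  if (!Number.isInteger(n) || n <= 0) {
    throw new RangeError("n must be a positive integer");
  }
  const indexes = [];
  for (let i = n - 1; i < length; i += n) {
    indexes.push(i);
  }
  return indexes;
}

// Query document in the same shape as the find() above, for every 3rd of 15 items.
const query = { "_id.index": { $in: nthIndexes(15, 3) } };
console.log(JSON.stringify(query)); // {"_id.index":{"$in":[2,5,8,11,14]}}
```

This shows the drawback: the client needs to know the array length in advance, and the $in list grows with it.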
If the output form seemed more desirable for your "testing" purposes, then a better subsequent query on those results would use the aggregation pipeline, with $redact:
db.newCollection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$mod": [ { "$add": [ "$_id.index", 1] }, 3 ] },
0 ]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
That at least uses a "logical condition" much the same as what was applied with .filter() before to just select the "nth index" items without listing all possible index values as a query argument.
No $unwind is needed here. You can use $push with $arrayElemAt to project the array value at requested index inside $group aggregation.
Something like
db.colname.aggregate(
[
{"$group":{
"_id":null,
"valuesatNthindex":{"$push":{"$arrayElemAt":["$instances",N]}
}}
},
{"$project":{"valuesatNthindex":1}}
])
You might like this approach using the $lookup aggregation stage. It is probably the most convenient and fastest way, without any aggregation trick.
Create a collection Names with the following schema
[
{ "_id": 1, "name": "name1" },
{ "_id": 2, "name": "name2" }
]
and then Instances collection having the parent id as "nameId"
[
{ "nameId": 1, "value" : 1, "date" : ISODate("2015-03-04T00:00:00.000Z") },
{ "nameId": 1, "value" : 2, "date" : ISODate("2015-04-01T00:00:00.000Z") },
{ "nameId": 1, "value" : 3, "date" : ISODate("2015-03-05T00:00:00.000Z") },
{ "nameId": 2, "value" : 7, "date" : ISODate("2015-03-04T00:00:00.000Z") },
{ "nameId": 2, "value" : 8, "date" : ISODate("2015-04-01T00:00:00.000Z") },
{ "nameId": 2, "value" : 4, "date" : ISODate("2015-03-05T00:00:00.000Z") }
]
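Now, with the $lookup pipeline syntax introduced in MongoDB 3.6, you can use $sample inside the $lookup pipeline to get N randomly chosen instances per name (a random sample, rather than strictly every Nth element).
db.Names.aggregate([
{ "$lookup": {
"from": "Instances", // collection name as a string (with Mongoose: Instances.collection.name)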
"let": { "nameId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": ["$nameId", "$$nameId"] }}},
{ "$sample": { "size": N }}
],
"as": "instances"
}}
])
Unfortunately, at the time of writing this is not possible with the aggregation framework, as it would require an option for $unwind to emit the array index/position, which aggregation currently can't do. There is an open JIRA ticket for this: SERVER-4588. (MongoDB 3.2 and later added an includeArrayIndex option to $unwind for this purpose.)
However, a workaround would be to use MapReduce but this comes at a huge performance cost since the actual calculations of getting the array index are performed using the embedded JavaScript engine (which is slow), and there still is a single global JavaScript lock, which only allows a single JavaScript thread to run at a single time.
With mapReduce, you could try something like this:
Mapping function:
var map = function(){
for(var i=0; i < this.instances.length; i++){
emit(
{ "_id": this._id, "index": i },
{ "index": i, "value": this.instances[i] }
);
}
};
Reduce function:
var reduce = function(){}
You can then run the following mapReduce function on your collection:
db.collection.mapReduce( map, reduce, { out : "resultCollection" } );
And then you can query the result collection to get a list/array of the Nth item of each instances array by using the map() cursor method:
var thirdInstances = db.resultCollection.find({"_id.index": N})
.map(function(doc){return doc.value.value})
You can use below aggregation:
db.col.aggregate([
{
$project: {
instances: {
$map: {
input: { $range: [ 0, { $size: "$instances" }, N ] },
as: "index",
in: { $arrayElemAt: [ "$instances", "$$index" ] }
}
}
}
}
])
$range generates a list of indexes. The third parameter is a non-zero step. For N = 2 it will be [0,2,4,6...], for N = 3 it will be [0,3,6,9...] and so on. Then you can use $map to get the corresponding items from the instances array.
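As a sanity check of the index arithmetic, the same idea can be modelled in plain JavaScript. The range and sampleEvery helpers below are hypothetical names. They only mimic $range with a positive step and $arrayElemAt on an in-memory array, and the sample values are made up.

```javascript
// Plain-JavaScript model of { $range: [ start, end, step ] } for positive steps only.
// ($range itself also accepts negative steps; that case is not modelled here.)
function range(start, end, step) {
  if (!Number.isInteger(step) || step <= 0) {
    throw new RangeError("step must be a positive integer");
  }
  const out = [];
  for (let i = start; i < end; i += step) {
    out.push(i);
  }
  return out;
}

// Models the $map over the generated indexes with $arrayElemAt.
function sampleEvery(instances, n) {
  return range(0, instances.length, n).map(index => instances[index]);
}

const picked = sampleEvery([10, 20, 30, 40, 50, 60, 70], 3);
console.log(picked); // [ 10, 40, 70 ]
```

Because the range starts at 0, the first element is always part of the sample. Starting the range at N - 1 instead would select the Nth, 2Nth, 3Nth... elements.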
Or with just a find block:
db.Collection.find({}).then(function(data) {
var ret = [];
for (var i = 0, len = data.length; i < len; i++) {
if (i % 3 === 0 ) {
ret.push(data[i]);
}
}
return ret;
});
Returns a promise whose then() you can invoke to fetch the Nth modulo'ed data.

MongoDB Nested Array Intersection Query

Thank you in advance for your help.
I have a MongoDB database structured like this:
{
'_id' : objectID(...),
'userID' : id,
'movies' : [{
'movieID' : movieID,
'rating' : rating
}]
}
My question is:
I want to look up a specific user, for example the one with 'userID' : 3, and get all of his movies. Then I want to get all the other users that have at least 15 movies with the same 'movieID'. Finally, from that group I want to select only the users that share those 15 movies and also have one extra 'movieID' that I choose.
I already tried aggregation but failed, and if I do single queries, such as getting all the movies of a user and then cycling through every other user's movies and comparing them, it takes a lot of time.
Any ideas?
Thank you
There are a couple of ways to do this using the aggregation framework
Just a simple set of data for example:
{
"_id" : ObjectId("538181738d6bd23253654690"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 2, "rating": 6 },
{ "_id": 3, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654691"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 4, "rating": 6 },
{ "_id": 2, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654692"),
"movies": [
{ "_id": 2, "rating": 5 },
{ "_id": 5, "rating": 6 },
{ "_id": 6, "rating": 7 }
]
}
Using the first "user" as an example, now you want to find if any of the other two users have at least two of the same movies.
For MongoDB 2.6 and upwards you can simply use the $setIntersection operator along with the $size operator:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document if you want to keep more than `_id`
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
}},
// Unwind the array
{ "$unwind": "$movies" },
// Build the array back with just `_id` values
{ "$group": {
"_id": "$_id",
"movies": { "$push": "$movies._id" }
}},
// Find the "set intersection" of the two arrays
{ "$project": {
"movies": {
"$size": {
"$setIntersection": [
[ 1, 2, 3 ],
"$movies"
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
This is still possible in earlier versions of MongoDB that do not have those operators, just using a few more steps:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document along with the "set" to match
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
"set": { "$cond": [ 1, [ 1, 2, 3 ], 0 ] }
}},
// Unwind both those arrays
{ "$unwind": "$movies" },
{ "$unwind": "$set" },
// Group back the count where both `_id` values are equal
{ "$group": {
"_id": "$_id",
"movies": {
"$sum": {
"$cond":[
{ "$eq": [ "$movies._id", "$set" ] },
1,
0
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
In Detail
That may be a bit to take in, so we can take a look at each stage and break those down to see what they are doing.
$match : You do not want to operate on every document in the collection so this is an opportunity to remove the items that are not possibly matches even if there still is more work to do to find the exact ones. So the obvious things are to exclude the same "user" and then only match the documents that have at least one of the same movies as was found for that "user".
The next thing that makes sense is to consider that when you want to match n entries then only documents that have a "movies" array that is larger than n-1 can possibly actually contain matches. The use of $and here looks funny and is not required specifically, but if the required matches were 4 then that actual part of the statement would look like this:
"$and": [
{ "movies": { "$not": { "$size": 1 } } },
{ "movies": { "$not": { "$size": 2 } } },
{ "movies": { "$not": { "$size": 3 } } }
]
So you basically "rule out" arrays that are not possibly long enough to have n matches. Note that this $size operator in the query form is different from $size in the aggregation framework. There is no way, for example, to use it with an inequality operator such as $gt, as its purpose is to match exactly the requested "size". Hence this query form, which specifies all of the possible sizes that are too small.
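Writing those clauses by hand gets tedious for larger n (the question asks for 15), so they can be generated. A minimal sketch in plain JavaScript, where excludeShortArrays is a hypothetical helper name. It only builds the query fragment and does not contact MongoDB.

```javascript
// Builds the "$and" clauses that rule out arrays of size 1 .. n - 1,
// following the pattern shown above. excludeShortArrays is a hypothetical helper.
function excludeShortArrays(field, n) {
  const clauses = [];
  for (let size = 1; size < n; size++) {
    clauses.push({ [field]: { $not: { $size: size } } });
  }
  return clauses;
}

// For 4 required matches this yields the three clauses listed above.
const andClauses = excludeShortArrays("movies", 4);
console.log(JSON.stringify(andClauses));
```

The resulting array can be placed under the "$and" key of the initial $match stage.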
$project : There are a few purposes in this statement, of which some differ depending on the MongoDB version you have. Firstly, and optionally, a document copy is being kept under the _id value so that these fields are not modified by the rest of the steps. The other part here is keeping the "movies" array at the top of the document as a copy for the next stage.
What is also happening in the version presented for pre 2.6 versions is there is an additional array representing the _id values for the "movies" to match. The usage of the $cond operator here is just a way of creating a "literal" representation of the array. Funny enough, MongoDB 2.6 introduces an operator known as $literal to do exactly this without the funny way we are using $cond right here.
$unwind : To do anything further the movies array needs to be unwound as in either case it is the only way to isolate the existing _id values for the entries that need to be matched against the "set". So for the pre 2.6 version you need to "unwind" both of the arrays that are present.
$group : For MongoDB 2.6 and greater you are just grouping back to an array that only contains the _id values of the movies with the "ratings" removed.
Pre 2.6 since all values are presented "side by side" ( and with lots of duplication ) you are doing a comparison of the two values to see if they are the same. Where that is true, this tells the $cond operator statement to return a value of 1 or 0 where the condition is false. This is directly passed back through $sum to total up the number of matching elements in the array to the required "set".
$project: Where this is the different part for MongoDB 2.6 and greater is that since you have pushed back an array of the "movies" _id values you are then using $setIntersection to directly compare those arrays. As the result of this is an array containing the elements that are the same, this is then wrapped in a $size operator in order to determine how many elements were returned in that matching set.
$match: Is the final stage that has been implemented here which does the clear step of matching only those documents whose count of intersecting elements was greater than or equal to the required number.
Final
That is basically how you do it. Prior to 2.6 is a bit clunkier and will require a bit more memory due to the expansion that is done by duplicating each array member that is found by all of the possible values of the set, but it still is a valid way to do this.
All you need to do is apply this with the greater n matching values to meet your conditions, and of course make sure your original user match has the required n possibilities. Otherwise just generate this on n-1 from the length of the "user's" array of "movies".
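To check what the pipelines should return for the sample data, the core of the computation (intersect the movie ids, count them, keep counts of 2 or more) can be modelled in plain JavaScript. This is only an in-memory sketch. sharedCount is a hypothetical helper name, and the ObjectId values are written as plain strings.

```javascript
// Plain-JavaScript model of $setIntersection + $size followed by the final $match.
const target = [1, 2, 3]; // movie ids of the first "user"

// The other two sample users (ObjectIds shown as plain strings here).
const others = [
  {
    _id: "538181738d6bd23253654691",
    movies: [{ _id: 1, rating: 5 }, { _id: 4, rating: 6 }, { _id: 2, rating: 7 }]
  },
  {
    _id: "538181738d6bd23253654692",
    movies: [{ _id: 2, rating: 5 }, { _id: 5, rating: 6 }, { _id: 6, rating: 7 }]
  }
];

// Counts distinct movie ids that appear in both lists (set semantics).
function sharedCount(movies, wanted) {
  const wantedSet = new Set(wanted);
  const shared = new Set(movies.map(m => m._id).filter(id => wantedSet.has(id)));
  return shared.size;
}

const similar = others
  .map(u => ({ _id: u._id, movies: sharedCount(u.movies, target) }))
  .filter(u => u.movies >= 2);

console.log(similar);
```

Only the user ending in 691 shares two movies (ids 1 and 2) with the first user. The user ending in 692 shares just one (id 2) and is filtered out.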

Mongo - Querying inside array

I have this db structure
{
"_id": 107,
"standard": {"name": "building",
"item": [{"code": 151001,
"quantity": 10,
"delivered": 8,
"um": "kg" },
{"code": 151001,
"quantity": 20,
"delivered": 6,
"um": "kg" }]
}
}
And I would like to find all the objects that have code: 151001 and just show the delivered field.
For example it would show something like this:
{delivered: 8}
{delivered: 6}
So far I've got this query, but it does not show exactly what I want:
db.test.find(
{
"standard.item.code": 151001
}
).pretty()
Since your items are in an array, your best approach will be to use the Aggregation Framework for this.
Example code:
db.test.aggregate([
// Find matching documents (could take advantage of an index)
{ $match: {
"standard.item.code" : 151001
}},
// Unpack the item array into a stream of documents
{ $unwind: "$standard.item" },
// Filter to the items that match the code
{ $match: {
"standard.item.code" : 151001
}},
// Only show the delivered amounts
{ $project: {
_id: 0,
delivered: "$standard.item.delivered"
}}
])
Results:
{
"result" : [
{
"delivered" : 8
},
{
"delivered" : 6
}
],
"ok" : 1
}
You'll notice there are two $match steps in the aggregation. The first is to match the documents including that item code. After using $unwind on the array, the second $match limits to the items with that code.
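The role of each stage can be seen in a plain-JavaScript model of the same pipeline, run on the sample document from the question. This is only an in-memory sketch and does not call MongoDB.

```javascript
// Plain-JavaScript model of $match -> $unwind -> $match -> $project.
const docs = [
  {
    _id: 107,
    standard: {
      name: "building",
      item: [
        { code: 151001, quantity: 10, delivered: 8, um: "kg" },
        { code: 151001, quantity: 20, delivered: 6, um: "kg" }
      ]
    }
  }
];

const delivered = docs
  // First $match: keep documents that contain at least one item with the code.
  .filter(d => d.standard.item.some(i => i.code === 151001))
  // $unwind: one entry per array element.
  .flatMap(d => d.standard.item)
  // Second $match: keep only the items that actually carry the code.
  .filter(i => i.code === 151001)
  // $project: expose just the delivered amount.
  .map(i => ({ delivered: i.delivered }));

console.log(delivered); // [ { delivered: 8 }, { delivered: 6 } ]
```

If the array also held items with other codes, the first filter alone would still let them through after unwinding, which is why the second one is needed.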