MongoDB: Iterate over collection by key? - mongodb

How can I iterate over all documents matching each value of a specified key in a MongoDB collection?
E.g. for a collection containing:
{ _id: ObjectId, keyA: 1 },
{ _id: ObjectId, keyA: 2 },
{ _id: ObjectId, keyA: 2 },
...with an index of { keyA: 1 }, how can I run an operation on all documents where keyA:1, then keyA:2, and so on?
Specifically, I want to run a count() of the documents for each keyA value. So for this collection, the equivalent of find({keyA:1}).count(), find({keyA:2}).count(), etc.
UPDATE: whether or not the keys are indexed is irrelevant in terms of how they're iterated, so edited title and description to make Q/A easier to reference in the future.

A simpler approach to get the grouped count of unique values for keyA would be to use the new Aggregation Framework in MongoDB 2.2:
eg:
db.coll.aggregate(
{ $group : {
_id: "$keyA",
count: { $sum : 1 }
}}
)
... returns a result set where each _id is a unique value for keyA, with the count of how many times that value appears:
{
"result" : [
{
"_id" : 2,
"count" : 2
},
{
"_id" : 1,
"count" : 1
}
],
"ok" : 1
}
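To make the grouping concrete, here is a minimal plain JavaScript sketch of what the $group stage above computes. It runs without MongoDB; the docs array and the countByKey helper are hypothetical stand-ins for the collection and the pipeline, not part of any MongoDB API.

```javascript
// Hypothetical sample data mirroring the collection in the question.
const docs = [
  { keyA: 1 },
  { keyA: 2 },
  { keyA: 2 },
];

// Counts how many documents share each value of `field`,
// which is what { $group: { _id: "$keyA", count: { $sum: 1 } } } does server-side.
function countByKey(documents, field) {
  const counts = new Map();
  for (const doc of documents) {
    const value = doc[field];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  // Shape the output like the aggregation result: one { _id, count } per value.
  return Array.from(counts, ([value, count]) => ({ _id: value, count }));
}

console.log(countByKey(docs, "keyA"));
```

Unlike this sketch, the server does not guarantee any particular order of groups unless a $sort stage is added.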

I am not sure I get you here but is this what you are looking for:
db.mycollection.find({ keyA: 1 }).count()
This counts all documents where keyA is 1.
If that does not answer the question, do you think you could be a little more specific?
Do you mean to do an aggregation for all unique key values for keyA?

It may be implemented with multiple queries:
var i=0;
var f=[];
while(i!=db.col.count()){
var k=db.col.findOne({keyA:{$not:{$in:f}}}).keyA;
i+=db.col.find({keyA:k}).count();
f.push(k);
}
This code collects the unique values of the keyA field in the col collection into the array f, which is the result of the operation. Unfortunately, while it runs you have to block any operation that modifies the col collection, otherwise the running total i may never equal db.col.count().
UPDATE:
All of this can be done much more easily using distinct:
db.col.distinct("keyA")

Thanks to @Aleksey for pointing me to db.collection.distinct.
Looks like this does it:
db.ships.distinct("keyA").forEach(function(v){
db.ships.find({keyA:v}).count();
});
Of course calling count() within a loop doesn't do much; in my case I was looking for key-values with more than one document, so I did this:
db.ships.distinct("keyA").forEach(function(v){
print(db.ships.find({keyA:v}).count() > 1);
});
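The loop above issues one count query per distinct value. As a possible alternative, the same "values with more than one document" check could be expressed as a single aggregation. The sketch below only builds the pipeline; duplicateValuesPipeline is a hypothetical helper name, while $group, $sum, $match and $gt are standard operators.

```javascript
// Hypothetical helper: builds a pipeline that keeps only the values of `field`
// that are shared by more than one document.
function duplicateValuesPipeline(field) {
  return [
    // Count documents per distinct value of the field.
    { $group: { _id: "$" + field, count: { $sum: 1 } } },
    // Keep only the values that occur more than once.
    { $match: { count: { $gt: 1 } } },
  ];
}

// In the mongo shell the pipeline would be passed to aggregate, e.g.:
//   db.ships.aggregate(duplicateValuesPipeline("keyA"))
console.log(JSON.stringify(duplicateValuesPipeline("keyA")));
```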

Related

How to check if multiple documents exist

Is there such a query that gets multiple fields, and returns which of these exists in the collection?
For example, if the collection has only:
{id : 1}
{id : 2}
And I want to know which of [{id : 1} , {id : 3}] exists in it, then the result will be something like [{id : 1}].
You are looking for the $in-operator.
db.collection.find({ id: { $in: [ 1, 3 ] } });
This will get you any documents where the id-field (different from the special _id field) is 1 or 3. When you only want the values of the id field and not the whole documents:
db.collection.find({ id: { $in: [ 1, 3 ] } }, { _id: false, id:true });
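Once the query returns, working out which of the requested values exist is a small client-side step. A minimal sketch, assuming the result has already been read into an array (for example with toArray()); splitByExistence is a hypothetical helper, not a driver function.

```javascript
// `found` stands in for the documents returned by the $in query above,
// projected down to the id field.
function splitByExistence(wanted, found) {
  const present = new Set(found.map((doc) => doc.id));
  return {
    existing: wanted.filter((id) => present.has(id)),
    missing: wanted.filter((id) => !present.has(id)),
  };
}

// With a collection holding {id: 1} and {id: 2}, asking for [1, 3] finds only {id: 1}.
console.log(splitByExistence([1, 3], [{ id: 1 }]));
```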
If you want to check whether a given key/value pair is present in the collection, you can match the values and combine the conditions with the $or operator.
This assumes id is a different field from _id in MongoDB.
With $or, the query that produces the expected output is:
db.collection.find({$or:[{"id":1},{"id":3}]},{"_id":0,"id":1})
If you want to match _id then use following query:
db.collection.find({$or:[{"_id":ObjectId("557fda78d077e6851e5bf0d3")},{"_id":ObjectId("557fda78d077e6851e5bf0d5")}]})

usage for MongoDB sort in array

I would like to sort, in descending order, the documents in the names array by their number value.
Here's the structure part of my collection :
_id: ObjectId("W"),
var1: "X",
var2: "Y",
var3: "Z",
comments: {
  names: [
    {
      number: 1
    },
    {
      number: 3
    },
    {
      number: 2
    }
  ],
  field: "Y"
}
But all my queries using db.collection.find().sort( { "comments.names.number": -1 } ) fail to sort the array.
The desired sorted output is:
{ "_id" : ObjectId("W"), "var1" : "X", "var3" : "Z", "comments" : { "names" : [ { "number" : 3 }, { "number" : 2 }, { "number" : 1 } ], "field": "Y" } }
Can you help me?
You need to aggregate the result, as below:
Unwind the names array.
Sort the records based on comments.names.number in descending order.
Group the records based on the _id field.
Project the required structure.
Code:
db.collection.aggregate([
{$unwind:"$comments.names"},
{$sort:{"comments.names.number":-1}},
{$group:{"_id":"$_id",
"var1":{$first:"$var1"},
"var2":{$first:"$var2"},
"var3":{$first:"$var3"},
"field":{$first:"$comments.field"},
"names":{$push:"$comments.names"}}},
{$project:{"comments":{"names":"$names","field":"$field"},"var1":1,
"var2":1,"var3":1}}
],{"allowDiskUse":true})
If your collection is large, you might want to add a $match stage at the beginning of the aggregation pipeline to filter records, or use allowDiskUse: true to allow sorting a large number of records.
db.collection.aggregate([
{$match:{"_id":someId}},
{$unwind:"$comments.names"},
{$sort:{"comments.names.number":-1}},
{$group:{"_id":"$_id",
"var1":{$first:"$var1"},
"var2":{$first:"$var2"},
"var3":{$first:"$var3"},
"field":{$first:"$comments.field"},
"names":{$push:"$comments.names"}}},
{$project:{"comments":{"names":"$names","field":"$field"},"var1":1,
"var2":1,"var3":1}}
])
The query below:
db.collection.find().sort( { "comments.names.number": -1 } )
finds all the documents and sorts them in descending order by the number field. For each document, it takes the largest comments.names.number value and then sorts the parent documents by that value. It does not reorder the names array inside each parent document.
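The behaviour described here can be mimicked in plain JavaScript. This is only an illustration with hypothetical sample documents, not how the server is implemented: parents are ordered by the largest number in their array, and the arrays themselves stay as they were.

```javascript
// Hypothetical sample documents with unsorted names arrays.
const parents = [
  { _id: "a", comments: { names: [{ number: 1 }, { number: 3 }, { number: 2 }] } },
  { _id: "b", comments: { names: [{ number: 5 }, { number: 4 }] } },
];

// A descending sort on an array field compares documents by their largest element.
function largestNumber(doc) {
  return Math.max(...doc.comments.names.map((n) => n.number));
}

const sortedParents = [...parents].sort(
  (x, y) => largestNumber(y) - largestNumber(x)
);

console.log(sortedParents.map((d) => d._id));
// The inner array of document "a" keeps its original order.
console.log(JSON.stringify(sortedParents[1].comments.names));
```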
You need to update the document to sort the array.
db.collection.update(
  { _id: 1 },
  {
    $push: {
      "comments.names": {
        $each: [ ],
        $sort: { number: -1 }
      }
    }
  }
)
Check the documentation here:
http://docs.mongodb.org/manual/reference/operator/update/sort/#use-sort-with-other-push-modifiers
MongoDB queries sort the result documents based on the collection of fields specified in the sort. They do not sort arrays within a document. If you want the array sorted, you need to sort it yourself after you retrieve the document, or store the array in sorted order. See this old SO answer from Stennie.
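Sorting the array yourself after retrieving the document might look like the following sketch. The fetched object is a hypothetical stand-in for a document returned by findOne, and sortNamesDescending is an illustrative helper name.

```javascript
// Returns a copy of the document whose comments.names array is sorted
// by number in descending order; the input document is not modified.
function sortNamesDescending(doc) {
  const names = [...doc.comments.names].sort((a, b) => b.number - a.number);
  return { ...doc, comments: { ...doc.comments, names } };
}

// Hypothetical document shaped like the one in the question.
const fetched = {
  _id: "W",
  var1: "X",
  comments: {
    names: [{ number: 1 }, { number: 3 }, { number: 2 }],
    field: "Y",
  },
};

console.log(JSON.stringify(sortNamesDescending(fetched).comments.names));
```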

Mongo find query for longest arrays inside object

I currently have objects in mongo set up like this for my application (simplified example, I removed some irrelevant fields for clarity here):
{
"_id" : ObjectId("529159af5b508dd71500000a"),
"c" : "somecontent",
"l" : [
{
"d" : "2013-11-24T01:43:11.367Z",
"u" : "User1"
},
{
"d" : "2013-11-24T01:43:51.206Z",
"u" : "User2"
}
]
}
What I would like to do is run a find query to return the objects which have the highest array length under "l" and sort highest->lowest, limit to 25 results. Some objects may have 1 object in the array, some may have 100. I'd like to find out which ones have the most under "l". I'm new to mongo and got everything else to work up until this point, but I just can't figure out the right parameters to get this specific query. Where I'm getting confused is how to handle counting the length of the array, sorting, etc. I could manually code this by parsing everything in the collection, but I'm sure there has to be a way for mongo to do this far more efficiently. I'm not against learning, if anyone knows any resources for more advanced queries or could help me out I'd really be thankful as this is the last piece! :-)
As a side note, node.js and mongo together is amazing and I wish I started using them in conjunction a long time ago.
Use the aggregation framework. Here's how:
db.collection.aggregate( [
{ $unwind : "$l" },
{ $group : { _id : "$_id", len : { $sum : 1 } } },
{ $sort : { len : -1 } },
{ $limit : 25 }
] )
There is no easy way to do this with your existing schema, because a find query has no operator that returns the length of an array. There is the $size operator, but it only matches arrays of one specific length.
So you cannot sort a find query by the length of the array. The only reasonable way out is to add a field to your schema that holds the length of the array (something like "l_length : 3" alongside the other fields of every document). The good thing is that you can do this easily by looking at this relevant answer; after that you just need to make sure to increment or decrement the value whenever you modify the array.
Once you add this field, you can easily sort by it and, moreover, take advantage of indexes.
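Keeping such a counter in step with the array could be done by pairing $push with $inc in the same update. A minimal sketch, assuming the counter field is called l_length as in the example above; pushEntryUpdate is a hypothetical helper that only builds the update document.

```javascript
// Builds an update that appends one entry to the "l" array and
// bumps the (assumed) l_length counter in the same operation.
function pushEntryUpdate(entry) {
  return {
    $push: { l: entry },
    $inc: { l_length: 1 },
  };
}

// In the shell the update would be applied to one document, e.g.:
//   db.collection.update({ _id: someId }, pushEntryUpdate({ d: "...", u: "User3" }))
// and the longest arrays could then be read with:
//   db.collection.find().sort({ l_length: -1 }).limit(25)
console.log(JSON.stringify(pushEntryUpdate({ u: "User3" })));
```

Because both operators are part of one update, the array and its counter change together for that document.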
There is no direct way to do this with find(), but you can add a size field to your documents using $size in an aggregation:
$addFields to add a new field total holding the number of elements in the l array
$sort by total in descending order
$limit to select the top 25 documents
$project to remove the total field if you don't need it
db.collection.aggregate([
{ $addFields: { total: { $size: "$l" } } },
{ $sort: { total: -1 } },
{ $limit: 25 }
// { $project: { total: 0 } }
])
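For readers who want to see what those stages compute, here is a rough in-memory equivalent in plain JavaScript. The sample documents and the topByArrayLength helper are hypothetical; unlike this sketch, the real pipeline runs on the server.

```javascript
// Mirrors $addFields (total = array length), $sort (descending) and $limit.
function topByArrayLength(documents, limit) {
  return documents
    .map((doc) => ({ ...doc, total: doc.l.length }))
    .sort((a, b) => b.total - a.total)
    .slice(0, limit);
}

// Hypothetical documents, each with an "l" array of a different length.
const docsWithLists = [
  { _id: "a", l: [1] },
  { _id: "b", l: [1, 2, 3] },
  { _id: "c", l: [1, 2] },
];

console.log(topByArrayLength(docsWithLists, 2).map((d) => d._id));
```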

MongoDB sorting documents by nested data

Considering the following design for posts:
{
title: string,
body: string,
comments: [
{name: string, comment: string, ...},
{name: string, comment: string, ...},
...
]
}
...
1) I would like to select all posts in my collection and have them sorted by the posts that have the most comments. I'm assuming since the .length variable is always set via javascript that it is possible to use this to sort by but I don't know how or if it's actually more efficient to store the comment count in a field in the post document?
1.1) Or does it make more sense to store the comment count in a separate document and continuously update it?
2) When selecting posts, is it possible to limit the result to only return back the last 3 comments of a post document as opposed to the whole array?
You need to use the aggregate command.
This should give you a list of post _id values with the number of comments, sorted by the count in descending order.
You can use the $limit operator to return the top x rows, e.g. { $limit : 5 }.
db.posts.aggregate(
{ $unwind : "$comments" },
{ $group : { _id : "$_id" , number : { $sum : 1 } } },
{ $sort : { number : -1 } }
);
Take a look
http://docs.mongodb.org/manual/tutorial/aggregation-examples/

Get documents with tags in list, ordered by total number of matches

Given the following MongoDB collection of documents :
{
title : 'shirt one',
tags : [
'shirt',
'cotton',
't-shirt',
'black'
]
},
{
title : 'shirt two',
tags : [
'shirt',
'white',
'button down collar'
]
},
{
title : 'shirt three',
tags : [
'shirt',
'cotton',
'red'
]
},
...
How do you retrieve a list of items matching a list of tags, ordered by the total number of matched tags? For example, given this list of tags as input:
['shirt', 'cotton', 'black']
I'd want to retrieve the items ranked in desc order by total number of matching tags:
item total matches
-------- --------------
Shirt One 3 (matched shirt + cotton + black)
Shirt Three 2 (matched shirt + cotton)
Shirt Two 1 (matched shirt)
In a relational schema, tags would be a separate table, and you could join against that table, count the matches, and order by the count.
But, in Mongo... ?
Seems this approach could work,
break the input tags into multiple "IN" statements
query for items by "OR"'ing together the tag inputs
i.e. where ( 'shirt' IN items.tags ) OR ( 'cotton' IN items.tags )
this would return, for example, three instances of "Shirt One", 2 instances of "Shirt Three", etc
map/reduce that output
map: emit(this._id, {...});
reduce: count total occurrences of _id
finalize: sort by counted total
But I'm not clear on how to implement this as a Mongo query, or if this is even the most efficient approach.
As I answered in "In MongoDB search in an array and sort by number of matches":
It's possible using Aggregation Framework.
Assumptions
tags attribute is a set (no repeated elements)
Query
This approach forces you to unwind the results and re-evaluate the match predicate against the unwound results, so it is really inefficient.
db.test_col.aggregate(
{$match: {tags: {$in: ["shirt","cotton","black"]}}},
{$unwind: "$tags"},
{$match: {tags: {$in: ["shirt","cotton","black"]}}},
{$group: {
_id:{"_id":"$_id"},
matches:{$sum:1}
}},
{$sort:{matches:-1}}
);
Expected Results
{
"result" : [
{
"_id" : {
"_id" : ObjectId("5051f1786a64bd2c54918b26")
},
"matches" : 3
},
{
"_id" : {
"_id" : ObjectId("5051f1726a64bd2c54918b24")
},
"matches" : 2
},
{
"_id" : {
"_id" : ObjectId("5051f1756a64bd2c54918b25")
},
"matches" : 1
}
],
"ok" : 1
}
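The counting that the pipeline performs can be checked against the question's sample data with a short plain JavaScript sketch. It runs in memory only; the shirts array copies the documents from the question and rankByMatches is a hypothetical helper name.

```javascript
// Sample documents from the question.
const shirts = [
  { title: "shirt one", tags: ["shirt", "cotton", "t-shirt", "black"] },
  { title: "shirt two", tags: ["shirt", "white", "button down collar"] },
  { title: "shirt three", tags: ["shirt", "cotton", "red"] },
];

// Counts how many of the search tags each item carries, drops items with
// no match, and orders the rest by match count in descending order.
// Like the pipeline, it assumes tags contains no repeated elements.
function rankByMatches(items, searchTags) {
  const wanted = new Set(searchTags);
  return items
    .map((item) => ({
      title: item.title,
      matches: item.tags.filter((tag) => wanted.has(tag)).length,
    }))
    .filter((ranked) => ranked.matches > 0)
    .sort((a, b) => b.matches - a.matches);
}

console.log(rankByMatches(shirts, ["shirt", "cotton", "black"]));
```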
Right now, this isn't possible unless you use MapReduce. The only problem with MapReduce is that it is slow (compared to a normal query).
The aggregation framework is slated for 2.2 (so should be available in 2.1 dev release) and should make this sort of thing much easier to do without MapReduce.
Personally, I do not think using M/R is an efficient way to do it. I would rather query for all the documents and do those calculations on the application side. It is easier and cheaper to scale your app servers than it is to scale your database servers, so let the app servers do the number crunching. Of course, this approach may not work for you given your data access patterns and requirements.
An even simpler approach may be to just include a count property in each of your tag objects and whenever you $push a new tag to the array, you also $inc the count property. This is a common pattern in the MongoDB world, at least until the aggregation framework.
I'll second @Bryan in saying that MapReduce is the only possible way at the moment (and it's far from perfect). But, in case you desperately need it, here you go :-)
var m = function() {
var searchTerms = ['shirt', 'cotton', 'black'];
var me = this;
this.tags.forEach(function(t) {
searchTerms.forEach(function(st) {
if(t == st) {
emit(me._id, {matches : 1});
}
})
})
};
var r = function(k, vals) {
var result = {matches : 0};
vals.forEach(function(v) {
result.matches += v.matches;
})
return result;
};
db.shirts.mapReduce(m, r, {out: 'found01'});
db.found01.find();