Mongo find query for longest arrays inside object - mongodb

I currently have objects in mongo set up like this for my application (simplified example, I removed some irrelevant fields for clarity here):
{
"_id" : ObjectId("529159af5b508dd71500000a"),
"c" : "somecontent",
"l" : [
{
"d" : "2013-11-24T01:43:11.367Z",
"u" : "User1"
},
{
"d" : "2013-11-24T01:43:51.206Z",
"u" : "User2"
}
]
}
What I would like to do is run a find query to return the objects which have the highest array length under "l" and sort highest->lowest, limit to 25 results. Some objects may have 1 object in the array, some may have 100. I'd like to find out which ones have the most under "l". I'm new to mongo and got everything else to work up until this point, but I just can't figure out the right parameters to get this specific query. Where I'm getting confused is how to handle counting the length of the array, sorting, etc. I could manually code this by parsing everything in the collection, but I'm sure there has to be a way for mongo to do this far more efficiently. I'm not against learning, if anyone knows any resources for more advanced queries or could help me out I'd really be thankful as this is the last piece! :-)
As a side note, node.js and mongo together is amazing and I wish I started using them in conjunction a long time ago.

Use the aggregation framework. Here's how:
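db.collection.aggregate( [
  { $unwind : "$l" },                                 // one document per element of l
  { $group : { _id : "$_id", len : { $sum : 1 } } },  // count the elements of each original document
  { $sort : { len : -1 } },                           // longest arrays first
  { $limit : 25 }                                     // keep the top 25
] )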
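One caveat with the $unwind approach: $unwind emits nothing for a document whose l array is missing, null or empty (unless preserveNullAndEmptyArrays is set, available from 3.2), so such documents never reach $group and silently drop out of the results. The plain JavaScript below is a hypothetical client-side model of those two stages (the function names are illustrative, not a driver API), meant only to make that behaviour visible.

```javascript
// Hypothetical client-side model of { $unwind: "$l" } followed by
// { $group: { _id: "$_id", len: { $sum: 1 } } }. Not a MongoDB driver API.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing or null: no output documents (no preserveNullAndEmptyArrays).
    if (value === undefined || value === null) continue;
    // From 3.2 a non-array value is treated as a one-element array;
    // an empty array yields no output documents.
    const items = Array.isArray(value) ? value : [value];
    for (const el of items) out.push({ ...doc, [field]: el });
  }
  return out;
}

function groupCountById(docs) {
  const counts = new Map();
  for (const doc of docs) counts.set(doc._id, (counts.get(doc._id) || 0) + 1);
  return [...counts].map(([_id, len]) => ({ _id, len }));
}

const docs = [
  { _id: 1, l: [{ u: "User1" }, { u: "User2" }] },
  { _id: 2, l: [{ u: "User3" }] },
  { _id: 3, l: [] }, // vanishes at the $unwind step
];
console.log(groupCountById(unwind(docs, "l")));
// → [ { _id: 1, len: 2 }, { _id: 2, len: 1 } ]
```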

There is no easy way to do this with a find() query on your existing schema, because the query language has no way to compute the length of an array. Yes, there is a $size query operator, but all it does is match arrays of one specific length (e.g. { l: { $size: 3 } }); it cannot be used for sorting.
So you cannot sort your find query by the length of the array. The only reasonable way out is to add a field to your schema that holds the length of the array (every document would have something like "l_length : 3" in addition to its other fields). The good thing is that you can backfill it easily by looking at this relevant answer; after that, you just need to make sure to increment or decrement this value whenever you modify the array.
Once you have added this field, you can easily sort by it, and you can also take advantage of an index on it.
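A minimal sketch of keeping the counter in step with the array, assuming the counter is named l_length as above. Each update document changes l and l_length together, so both changes land in the same single-document write. The helper names pushEntry and popLastEntry are hypothetical; the objects they return are ordinary MongoDB update documents.

```javascript
// Hypothetical helpers that build update documents changing the l array
// and the l_length counter in the same write.
function pushEntry(entry) {
  return { $push: { l: entry }, $inc: { l_length: 1 } };
}

// { $pop: { l: 1 } } removes the last element. Pair it with a filter such as
// { l_length: { $gt: 0 } } so the counter is not decremented for an empty array.
function popLastEntry() {
  return { $pop: { l: 1 }, $inc: { l_length: -1 } };
}

// The counter can then be indexed and sorted on, e.g. in the mongo shell:
//   db.collection.createIndex({ l_length: -1 })
//   db.collection.find().sort({ l_length: -1 }).limit(25)
const update = pushEntry({ d: "2013-11-24T01:43:11.367Z", u: "User1" });
console.log(JSON.stringify(update));
```

$pull is a poorer match for a fixed decrement, since it can remove zero or several matching elements in one update.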

There is no straightforward approach, but you can compute the size of the array with the $size aggregation operator:
$addFields to add a new field, total, holding the number of elements in the l array
$sort by total in descending order
$limit to keep only the first 25 documents
$project to remove the total field if you don't need it
db.collection.aggregate([
{ $addFields: { total: { $size: "$l" } } },
{ $sort: { total: -1 } },
{ $limit: 25 },
// { $project: { total: 0 } }
])
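Since the question mentions node.js, here is a hedged sketch of the same pipeline wrapped in a small builder. The function name is hypothetical, and the $ifNull guard is an addition to the answer: $size raises an error when its argument is not an array, so documents with a missing or null l would otherwise make the whole aggregation fail. $addFields needs MongoDB 3.4 or later.

```javascript
// Hypothetical builder for the $addFields/$size pipeline above.
function longestArraysPipeline(field, limit) {
  return [
    // Treat a missing/null array as empty so $size does not error out.
    { $addFields: { total: { $size: { $ifNull: ["$" + field, []] } } } },
    { $sort: { total: -1 } },   // longest arrays first
    { $limit: limit },          // keep the top `limit` documents
    { $project: { total: 0 } }, // drop the helper field; remove to keep it
  ];
}

const pipeline = longestArraysPipeline("l", 25);
console.log(JSON.stringify(pipeline));
```

With the Node.js driver, the result could be passed as collection.aggregate(pipeline).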

Related

How to perform sort and limit on whole group by in MongoDB - mongoose?

I am trying to sort the whole grouped result and then limit it.
But the mongoose code below seems to sort only within the limited results.
collection.aggregate([
{ $sort : {NAME: -1}} ,
{ $match : { NAME : {$regex : `.*${query.NAME.toUpperCase()}.*`} } },
{ $group : { _id : "$NAME", NAME:{$first:"$NAME"} }},
{ $skip : 1},
{ $limit : 10}],function(err,data){});
It seems to sort only the first 10 results of the group, instead of sorting everything and then showing the first 10 results.
Thanks in advance.
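See this link to the documentation. I haven't tested this, since I don't have the environment or the database to do so, but I believe you want to move your $sort stage so that it comes after $group, just before $skip and $limit in the pipeline. $group does not guarantee any order for the documents it outputs, so a $sort placed before it has no effect on the order of the grouped results.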
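A sketch of the reordered pipeline as a plain JavaScript builder, assuming the same NAME field and page size as the question. The function names and the regex escaping are hypothetical additions: escaping stops characters such as "." in user input from acting as regex syntax, and an unanchored regex already matches anywhere in the string, so the .* wrappers are not needed.

```javascript
// Hypothetical helper: escape regex metacharacters in user input.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder for the question's pipeline with $sort moved after $group.
function namePipeline(name) {
  return [
    { $match: { NAME: { $regex: escapeRegex(name.toUpperCase()) } } },
    { $group: { _id: "$NAME", NAME: { $first: "$NAME" } } },
    // $group output is unordered, so sort the grouped documents here,
    // before $skip/$limit select the page.
    { $sort: { NAME: -1 } },
    { $skip: 1 },
    { $limit: 10 },
  ];
}

console.log(JSON.stringify(namePipeline("ab")));
```

The returned array would take the place of the inline array passed to collection.aggregate in the question.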

Is there a way to prevent mongo queries "branching" on arrays?

If I have the following documents:
{a: {x:1}} // without array
{a: [{x:1}]} // with array
Is there a way to query for {'a.x':1} that will return the first one but not the second one? I.e., I want the document where a.x is 1, and a is not an array.
Please note that a future version of MongoDB will incorporate the $isArray aggregation expression (it was added in MongoDB 3.2). In the meantime...
...the following code will do the trick as the $elemMatch operator matches only documents having an array field:
> db.test.find({"a.x": 1, "a": {$not: {$elemMatch: {x:1}}}})
Given that dataset:
> db.test.find({},{_id:0})
{ "a" : { "x" : 1 } }
{ "a" : [ { "x" : 1 } ] }
{ "a" : [ { "x" : 0 }, { "x" : 1 } ] }
It will return:
> db.test.find({"a.x": 1, "a": {$not: {$elemMatch: {x:1}}}}, {_id:0})
{ "a" : { "x" : 1 } }
Please note that this should be considered a short-term solution. The MongoDB team took great care to ensure that [{x:1}] and {x:1} behave the same (see dot-notation or $type for arrays), so you should expect that at some point in the future $elemMatch might be updated (see JIRA issue SERVER-6050). In the meantime, it may be worth fixing your data model so that it is no longer necessary to distinguish between an array containing one subdocument and a bare subdocument.
You can do this by adding a second term that ensures a has no element at index 0. That second term will always be true when a is a plain subdocument, and always false when a is an array (an empty array could not have matched the first term anyway).
db.test.find({'a.x': 1, 'a.0': {$exists: false}})
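To see why the 'a.0' term works, here is a simplified, hypothetical model of how a dotted path is resolved: a named step that lands on an array branches into every element, while a numeric step such as "0" indexes into the array. It ignores edge cases (nested arrays, fields literally named "0" inside array elements) and is only an illustration, not MongoDB's actual matcher.

```javascript
// Simplified model of dotted-path resolution with array "branching".
function valuesAt(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) {
    const [head, ...rest] = parts;
    if (/^\d+$/.test(head)) {
      // Numeric step: positional access, e.g. "a.0".
      const i = Number(head);
      return i < value.length ? valuesAt(value[i], rest) : [];
    }
    // Named step on an array: look inside every element.
    return value.flatMap((el) => valuesAt(el, parts));
  }
  if (value !== null && typeof value === "object" && parts[0] in value) {
    return valuesAt(value[parts[0]], parts.slice(1));
  }
  return [];
}

const matchesEq = (doc, path, expected) =>
  valuesAt(doc, path.split(".")).some((v) =>
    Array.isArray(v) ? v.includes(expected) : v === expected);
const exists = (doc, path) => valuesAt(doc, path.split(".")).length > 0;

const docs = [{ a: { x: 1 } }, { a: [{ x: 1 }] }, { a: [{ x: 0 }, { x: 1 }] }];
// Model of db.test.find({ "a.x": 1, "a.0": { $exists: false } })
const result = docs.filter((d) => matchesEq(d, "a.x", 1) && !exists(d, "a.0"));
console.log(JSON.stringify(result)); // [{"a":{"x":1}}]
```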

Get first element in array and return using Aggregate?

How can I get and return the first element in an array using a Mongo aggregation?
I tried using this code:
db.my_collection.aggregate([
{ $project: {
resp : { my_field: { $slice: 1 } }
}}
])
but I get the following error:
uncaught exception: aggregate failed: {
"errmsg" : "exception: invalid operator '$slice'",
"code" : 15999,
"ok" : 0
}
Note that 'my_field' is an array of 4 elements, and I only need to return the first element.
Since 3.2, we can use $arrayElemAt to get the first element in an array
db.my_collection.aggregate([
{ $project: {
resp : { $arrayElemAt: ['$my_field',0] }
}}
])
Currently, the $slice operator is unavailable in the $project stage of the aggregation pipeline.
So what you could do is first $unwind the my_field array, then group the elements back together by _id and take the $first element of each group.
db.my_collection.aggregate([
{$unwind:"$my_field"},
{$group:{"_id":"$_id","resp":{$first:"$my_field"}}},
{$project:{"_id":0,"resp":1}}
])
Or use the find() command, where you can use the $slice operator in the projection:
db.my_collection.find({},{"my_field":{$slice:1}})
Update: based on your comments, say you want only the second item of the array in the record whose _id is id:
var field = 2;
var id = ObjectId("...");
Then, the below aggregation command gives you the 2nd item in the my_field array of the record with the _id, id.
db.my_collection.aggregate([
{$match:{"_id":id}},
{$unwind:"$my_field"},
{$skip:field-1},
{$limit:1}
])
The above logic cannot be applied to more than one record: $skip and $limit count across the unwound documents of every record, and getting back to one result per record would need a $group after $unwind. The $group operator produces a single record for all the records in a group, which makes $skip or $limit stages applied later ineffective for picking an element per record.
A small variation on the find() query above would yield you the expected result as well.
db.my_collection.find({},{"my_field":{$slice:[field-1,1]}})
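The difference between the two approaches above can be modelled in plain JavaScript. This is a hypothetical client-side illustration with made-up sample documents: after $unwind, $skip and $limit count across the whole stream of unwound documents, so with several records only one element from the first record survives, whereas the find() projection $slice is applied to each document separately.

```javascript
// Hypothetical sample data; field is the 1-based position wanted, as above.
const docs = [
  { _id: 1, my_field: ["A", "B", "C"] },
  { _id: 2, my_field: ["D", "E"] },
];
const field = 2;

// Model of $unwind + $skip(field - 1) + $limit(1) without a $match:
// skip/limit operate on ALL unwound documents together.
const unwound = docs.flatMap((d) =>
  d.my_field.map((el) => ({ _id: d._id, my_field: el })));
const viaSkipLimit = unwound.slice(field - 1, field);

// Model of find({}, { my_field: { $slice: [field - 1, 1] } }):
// the slice is taken inside each document.
const viaSlice = docs.map((d) => ({
  _id: d._id,
  my_field: d.my_field.slice(field - 1, field),
}));

console.log(JSON.stringify(viaSkipLimit)); // [{"_id":1,"my_field":"B"}]
console.log(JSON.stringify(viaSlice)); // [{"_id":1,"my_field":["B"]},{"_id":2,"my_field":["E"]}]
```

This is why the aggregation version needs the $match on a single _id first.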
Apart from these, there is always the option of doing it on the client side, though it is a bit costly if the number of records is very large:
var field = 2;
db.my_collection.find().map(function(doc){
return doc.my_field[field-1];
})
Choosing from the above options depends upon your data size and app design.
Starting Mongo 4.4, the aggregation operator $first can be used to access the first element of an array:
// { "my_field": ["A", "B", "C"] }
// { "my_field": ["D"] }
db.my_collection.aggregate([
{ $project: { _id: 0, resp: { $first: "$my_field" } } }
])
// { "resp" : "A" }
// { "resp" : "D" }
The $slice operator is scheduled to be made available in the $project operation in Mongo 3.1.4, according to this ticket: https://jira.mongodb.org/browse/SERVER-6074
This will make the problem go away.
This version is currently only a developer release and is not yet stable (as of July 2015). Expect this around October/November time.
Mongo 3.1.6 onwards,
db.my_collection.aggregate([
{
"$project": {
"newArray" : { "$slice" : [ "$oldarray" , 0, 1 ] }
}
}
])
where 0 is the start index and 1 is the number of elements to slice
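Note that { $slice: [ "$oldarray", 0, 1 ] } returns an array holding the first element (e.g. ["A"]), while $arrayElemAt returns the element itself. Below is a simplified, hypothetical model of the three-argument aggregation form, including the rule that a negative position counts back from the end of the array and is clamped to the start if it is too large; the function name sliceAgg is illustrative only.

```javascript
// Simplified model of { $slice: [ <array>, <position>, <n> ] } (n must be > 0).
function sliceAgg(arr, position, n) {
  // Negative position: count back from the end, but never before index 0.
  const start = position < 0 ? Math.max(arr.length + position, 0) : position;
  // A start at or past the end yields an empty array.
  return arr.slice(start, start + n);
}

const sample = ["A", "B", "C"];
console.log(JSON.stringify(sliceAgg(sample, 0, 1)));  // ["A"]
console.log(JSON.stringify(sliceAgg(sample, 1, 1)));  // ["B"]
console.log(JSON.stringify(sliceAgg(sample, -2, 1))); // ["B"]
console.log(JSON.stringify(sliceAgg(sample, -5, 2))); // ["A","B"]
```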

Query returns more than expected results

Bear with me, this is not really my question. Just trying to get someone to understand.
Author's note:
The solution to the possible duplicate question can use $elemMatch to constrain the match because all of its values are arrays. This case is a little different.
So, the accepted answer brings up the main point. This behavior is well documented, and you should not "compare 'apples' with 'oranges'". The fields are of different types, and while there is a workaround for this, the best real-world solution is not to do this.
Happy reading :)
I have a collection of documents I am trying to search, the collection contains the following:
{ "_id" : ObjectId("52faa8a695fa10cc7d2b7908"), "x" : 1 }
{ "_id" : ObjectId("52faa8ab95fa10cc7d2b7909"), "x" : 5 }
{ "_id" : ObjectId("52faa8ad95fa10cc7d2b790a"), "x" : 15 }
{ "_id" : ObjectId("52faa8b095fa10cc7d2b790b"), "x" : 25 }
{ "_id" : ObjectId("52faa8b795fa10cc7d2b790c"), "x" : [ 5, 25 ] }
So I want to find the results where x falls between the values of 10 and 20. So this is the query that seemed logical to me:
db.collection.find({ x: {$gt: 10, $lt: 20} })
But the problem is this returns two documents in the result:
{ "_id" : ObjectId("52faa8ad95fa10cc7d2b790a"), "x" : 15 }
{ "_id" : ObjectId("52faa8b795fa10cc7d2b790c"), "x" : [ 5, 25 ] }
I am not expecting to see the second result as none of the values are between 10 and 20.
Can someone explain why I do not get the result I expect? I think { "x": 15 } should be the only document returned.
So furthermore, how can I get what I expect?
This behaviour is expected and explained in mongo documentation here.
Query a Field that Contains an Array
If a field contains an array and your query has multiple conditional operators, the field as a whole will match if either a single array element meets the conditions or a combination of array elements meet the conditions.
Mongo plays "smug" here: it returns a document when a combination of array elements satisfies all the conditions, even if each condition is met by a different element.
In our example, 5 matches the $lt:20 condition and 25 matches the $gt:10 condition. So, it's a match.
Both of the following will return the [5,25] result:
db.collection.find({ x: {$gt: 10, $lt: 20} })
db.collection.find({ $and : [{x: {$gt: 10}},{x:{ $lt: 20}} ] })
Whether this is the behaviour users expect is a matter of opinion, but it is certainly documented and should be expected.
Edit, for Neil's sadistic yet highly educational edit to original answer, asking for a solution:
Use of the $elemMatch can make "stricter" element comparisons for arrays only.
db.collection.find({ x: { $elemMatch:{ $gt:10, $lt:20 } } })
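Note: this will match both x:[11,12] and x:[11,25], since 11 on its own satisfies both conditions. It will not match the plain number x:15, because $elemMatch only matches array fields.
I believe that when a query like this is needed, a combination of two queries is required, with the results combined. Below is a query that returns correct results for documents where x is not an array: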
db.collection.find( { $where : "!Array.isArray(this.x)", x: {$gt: 10, $lt: 20} } )
But the best approach in this case is to change the type of x to always be an array, even when it only contains one element. Then, only the $elemMatch query is required to get correct results, with expected behaviour.
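The three behaviours discussed above can be compared side by side in a simplified, hypothetical client-side model. It handles only numeric scalars and flat numeric arrays, and uses small integer _ids in place of the ObjectIds from the question.

```javascript
// Sample data from the question, with simplified _ids.
const docs = [
  { _id: 1, x: 1 }, { _id: 2, x: 5 }, { _id: 3, x: 15 },
  { _id: 4, x: 25 }, { _id: 5, x: [5, 25] },
];
const asList = (x) => (Array.isArray(x) ? x : [x]);

// { x: { $gt: 10, $lt: 20 } }: each condition may be met by a different element.
const implicit = (x) =>
  asList(x).some((v) => v > 10) && asList(x).some((v) => v < 20);

// { x: { $elemMatch: { $gt: 10, $lt: 20 } } }: one element must meet both,
// and only array fields can match.
const elemMatch = (x) => Array.isArray(x) && x.some((v) => v > 10 && v < 20);

// The combination suggested above: plain range check for non-arrays,
// $elemMatch semantics for arrays.
const strict = (x) => (Array.isArray(x) ? elemMatch(x) : x > 10 && x < 20);

const ids = (pred) => docs.filter((d) => pred(d.x)).map((d) => d._id);
console.log(ids(implicit));  // → [ 3, 5 ]
console.log(ids(elemMatch)); // → []
console.log(ids(strict));    // → [ 3 ]
```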
You can first check that x is not an array and then apply a filter for the desired values:
db.collection.find(
{
$and :
[
{ $where : "!Array.isArray(this.x)" },
{ x: { $gt: 10, $lt: 20 } }
]
}
)
which returns:
{ "_id" : ObjectId("52fb4ec1cfe34ac4b9bab163"), "x" : 15 }

MongoDB: Iterate over collection by key?

How can I iterate over all documents matching each value of a specified key in a MongoDB collection?
E.g. for a collection containing:
{ _id: ObjectId, keyA: 1 },
{ _id: ObjectId, keyA: 2 },
{ _id: ObjectId, keyA: 2 },
...with an index of { keyA: 1 }, how can I run an operation on all documents where keyA:1, then keyA:2, and so on?
Specifically, I want to run a count() of the documents for each keyA value. So for this collection, the equivalent of find({keyA:1}).count(), find({keyA:2}).count(), etc.
UPDATE: whether or not the keys are indexed is irrelevant in terms of how they're iterated, so edited title and description to make Q/A easier to reference in the future.
A simpler approach to get the grouped count of unique values for keyA would be to use the new Aggregation Framework in MongoDB 2.2:
eg:
db.coll.aggregate(
{ $group : {
_id: "$keyA",
count: { $sum : 1 }
}}
)
... returns a result set where each _id is a unique value for keyA, with the count of how many times that value appears:
{
"result" : [
{
"_id" : 2,
"count" : 2
},
{
"_id" : 1,
"count" : 1
}
],
"ok" : 1
}
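For comparison, here is a hypothetical client-side equivalent of that $group stage in plain JavaScript. It follows the server's convention of grouping documents that lack the key under _id null. The order of the server's result (2 before 1 in the output above) is not guaranteed by $group; this model simply returns groups in first-seen order.

```javascript
// Client-side model of { $group: { _id: "$keyA", count: { $sum: 1 } } }.
function countByKey(docs, key) {
  const counts = new Map();
  for (const doc of docs) {
    // Documents without the key are grouped under null, as $group does.
    const k = doc[key] === undefined ? null : doc[key];
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const sample = [{ keyA: 1 }, { keyA: 2 }, { keyA: 2 }];
console.log(countByKey(sample, "keyA"));
// → [ { _id: 1, count: 1 }, { _id: 2, count: 2 } ]
```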
I am not sure I understand you, but is this what you are looking for?
db.mycollection.find({ keyA: 1 }).count()
This counts all documents whose keyA is 1.
If that does not answer the question, could you be a little more specific?
Do you mean an aggregation over all unique values of keyA?
It may be implemented with multiple queries:
var i=0;
var f=[];
while(i!=db.col.count()){
var k=db.col.findOne({keyA:{$not:{$in:f}}}).keyA;
i+=db.col.find({keyA:k}).count();
f.push(k);
}
The idea of this code is to collect the unique values of the keyA field across the col collection into the array f, which is the result of the operation. Unfortunately, while it runs you have to block any operations that modify the col collection.
UPDATE:
All can be done much easier using distinct:
db.col.distinct("keyA")
Thanks to @Aleksey for pointing me to db.collection.distinct.
Looks like this does it:
db.ships.distinct("keyA").forEach(function(v){
db.ships.find({keyA:v}).count();
});
Of course, calling count() inside the loop without using its result doesn't do much; in my case I was looking for key values that occur in more than one document, so I did this:
db.ships.distinct("keyA").forEach(function(v){
var n = db.ships.find({keyA:v}).count();
if (n > 1) print(v + ": " + n); // print only values shared by several documents
});
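The loop above sends one count query per distinct value. As a hedged alternative, the same "more than one document" check can be done in a single aggregation by following the $group stage with a $match on the computed count. The builder name below is hypothetical; the stages themselves are standard $group and $match.

```javascript
// Hypothetical builder: values of `key` that appear in more than one document.
function duplicateValuesPipeline(key) {
  return [
    { $group: { _id: "$" + key, count: { $sum: 1 } } }, // one result per value
    { $match: { count: { $gt: 1 } } },                 // keep shared values only
  ];
}

const pipeline = duplicateValuesPipeline("keyA");
console.log(JSON.stringify(pipeline));
```

In the mongo shell this could be run as db.ships.aggregate(pipeline).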