Indexing MongoDB for sort consistency

The MongoDB documentation says that MongoDB doesn't store documents in a collection in a particular order. So if you have this collection:
db.restaurants.insertMany( [
   { "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan" },
   { "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens" },
   { "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn" },
   { "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan" },
   { "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn" }
] )
and sorting like this:
db.restaurants.aggregate( [
   { $sort : { borough : 1 } }
] )
Then the sort order can be inconsistent, since:
the borough field contains duplicate values for both Manhattan and Brooklyn. Documents are returned in alphabetical order by borough, but the order of those documents with duplicate values for borough might not be the same across multiple executions of the same sort.
To return a consistent result it's recommended to modify the query to:
db.restaurants.aggregate( [
   { $sort : { borough : 1, _id : 1 } }
] )
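With _id added as a tiebreaker, the order is fully deterministic for the sample data above:
{ "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn" }
{ "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn" }
{ "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan" }
{ "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan" }
{ "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens" }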
My question relates to the efficiency of such a query. Say you have millions of documents: should you create a compound index, something like { borough: 1, _id: -1 }, to make it efficient? Or is it enough to index { borough: 1 } alone, given the potentially special nature of the _id field?
I'm using MongoDB 4.4.

If you need a stable sort, you will have to sort on both fields, and for a performant query you will need a compound index on both fields. Note that the index key directions must match the sort specification (or be its exact reverse) for the index to support the sort, so for { borough: 1, _id: 1 } that index is:
{ borough: 1, _id: 1 }
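A minimal sketch (collection and field names as in the question; explain() is just one way to confirm there is no blocking sort in the plan):
db.restaurants.createIndex( { borough: 1, _id: 1 } )

// the winning plan should show an IXSCAN with no in-memory SORT stage
db.restaurants.explain().aggregate( [
   { $sort : { borough : 1, _id : 1 } }
] )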

Related

Querying aggregates on subdocuments then grouping by field in parent document

I'm a noob when it comes to Mongo and I've been struggling to wrap my head around how to fetch data in the following fashion. I have a collection of order documents that contain some data, such as an event_id, and a subcollection (if that's the term) of issued_tickets. issued_tickets contains one to many subdocuments with fields such as name, date, etc. What I am trying to do is fetch the number of each type of issued ticket for each event_id in the parent documents: a count over issued_tickets grouped by issued_tickets.name, rolled up to the parent and grouped by the parent's event_id.
Can anyone help me accomplish this? I keep spinning myself out trying groupings and projections.
Here is a sample document:
{
   "_id" : ObjectId("5ce7335c1c666f000414f74a"),
   "event_id" : ObjectId("5cb54f966668a9719ef6a103"),
   "subtotal" : 3000,
   "service_fee" : 760,
   "processing_fee" : 143,
   "total" : 3903,
   "customer_id" : ObjectId("5ce7666c1c335f000414f747"),
   "updated_at" : ISODate("2019-05-23T23:57:17.524Z"),
   "created_at" : ISODate("2019-05-23T23:57:17.524Z"),
   "ref" : "60d5fcf9-86c6-469b-b86b-315a9b55caca",
   "issued_tickets" : [
      {
         "_id" : ObjectId("5ce7335c1c335f000414f666"),
         "name" : "Tier 1",
         "stub_name" : "Tier 1",
         "price" : 1500,
         "base_fee" : 200,
         "perc_fee" : "0.12",
         "access_code" : "163a1b9ee98338a8a4288a1c87446665",
         "redeemed" : false
      },
      {
         "_id" : ObjectId("5ce7335c1c335f0004146669"),
         "name" : "Tier 2",
         "stub_name" : "Tier 2",
         "price" : 1500,
         "base_fee" : 200,
         "perc_fee" : "0.12",
         "access_code" : "f50f262cd0bf1ec4ab36667c2a762446",
         "redeemed" : true
      }
   ]
}
We can aggregate as follows:
$unwind to deconstruct the issued_tickets array
$group to regroup by event_id and issued_tickets.name, counting with $sum
Mongo script:
db.collection.aggregate([
   {
      $unwind: "$issued_tickets"
   },
   {
      $group: {
         _id: {
            _id: "$event_id",
            ticketName: "$issued_tickets.name"
         },
         count: {
            $sum: 1
         }
      }
   },
   {
      $project: {
         event_id: "$_id._id",
         ticketName: "$_id.ticketName",
         count: 1,
         _id: 0
      }
   }
])
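For the sample order above, this emits one document per (event_id, ticket name) pair, e.g.:
{ "event_id" : ObjectId("5cb54f966668a9719ef6a103"), "ticketName" : "Tier 1", "count" : 1 }
{ "event_id" : ObjectId("5cb54f966668a9719ef6a103"), "ticketName" : "Tier 2", "count" : 1 }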

MongoDB accessing documents

I have the following documents:
{ "_id" : 1, "results" : [ { "product" : "abc", "score" : 10 }, { "product" : "xyz", "score" : 5 } ] }
{ "_id" : 2, "results" : [ { "product" : "abc", "score" : 8 }, { "product" : "xyz", "score" : 7 } ] }
{ "_id" : 3, "results" : [ { "product" : "abc", "score" : 7 }, { "product" : "xyz", "score" : 8 } ] }
I want to show the first score of each _id. I tried the following:
db.students.find({},{"results.$":1})
But it doesn't seem to work. Any advice?
You can take advantage of the aggregation pipeline to solve this.
Use $project in conjunction with $arrayElemAt to pick the element at the appropriate index of the array.
So, to extract the document holding the first score, the query below can be used:
db.students.aggregate([ {$project: { scoredoc:{$arrayElemAt:["$results",0]}} } ]);
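For the sample documents this returns the whole first element of each results array:
{ "_id" : 1, "scoredoc" : { "product" : "abc", "score" : 10 } }
{ "_id" : 2, "scoredoc" : { "product" : "abc", "score" : 8 } }
{ "_id" : 3, "scoredoc" : { "product" : "abc", "score" : 7 } }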
If you just wish to have the scores without the product field, use "$results.score" as shown below.
db.students.aggregate([ {$project: { scoredoc:{$arrayElemAt:["$results.score",0]}} } ]);
Here scoredoc will hold just the first score value of each document.
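For the sample data that is:
{ "_id" : 1, "scoredoc" : 10 }
{ "_id" : 2, "scoredoc" : 8 }
{ "_id" : 3, "scoredoc" : 7 }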
Hope this helps!
Per the description above, try executing the following query in the MongoDB shell:
db.students.find(
   { results: { $elemMatch: { score: { $exists: true } } } },
   { "results.$": 1 }
)
According to the MongoDB documentation:
The positional $ operator limits the contents of an array from the
query results to contain only the first element matching the query
document.
Hence, in the query above the positional $ operator is used in the projection to retrieve the first matching element of results, and with it the first score, for each document.
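For the sample documents this returns:
{ "_id" : 1, "results" : [ { "product" : "abc", "score" : 10 } ] }
{ "_id" : 2, "results" : [ { "product" : "abc", "score" : 8 } ] }
{ "_id" : 3, "results" : [ { "product" : "abc", "score" : 7 } ] }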

MongoDB: aggregation group by a field in large collection

I have a large collection (millions of documents) of files, where tags is an array field, like this:
{
   "volume" : "abc",
   "name" : "file1.txt",
   "type" : "txt",
   "tags" : [ "Interesting", "Weird" ],
   ...many other fields
}
Now I want to return the count for each unique tag across the entire collection. I am using aggregate for that. Here's my query:
db.files.aggregate(
   { "$match" : { "volume" : "abc" } },
   { "$project" : { "tags" : 1 } },
   { "$unwind" : "$tags" },
   { "$group" : { "_id" : "$tags", "count" : { "$sum" : 1 } } },
   { "$sort" : { "count" : 1 } }
)
I am seeing that it takes around 3 seconds to return for a collection of 1.2M files, and I do have indexes on the tags and volume fields.
I am using MongoDB 2.4. Since 2.6 is not out yet, I cannot use .explain() on the aggregation here.
Any ideas how I can improve this performance? I do need the summary count. Also, I cannot pre-compute these counts, as my $match varies by volume, type, a particular tag, file date/time, etc.

Ordering fields after applying $setUnion in MongoDB

I have a collection:
{
   "_id" : ObjectId("5338ec2a5b5b71242a1c911c"),
   "people" : [
      { "name" : "Vasya" },
      { "age" : "30" },
      { "weight" : "80" }
   ],
   "animals" : [
      { "dog" : "Sharick" },
      { "cat" : "Barsik" },
      { "bird" : "parrot" }
   ]
},
{
   "_id" : ObjectId("5338ec7f5b5b71242a1c911d"),
   "people" : [
      { "name" : "Max" },
      { "age" : "32" },
      { "weight" : "78" }
   ],
   "animals" : [
      { "dog" : "Borbos" },
      { "cat" : "Murka" },
      { "bird" : "Eagle" }
   ]
}
Then I combine the two arrays "people" and "animals":
db.tmp.aggregate({$project:{"union":{$setUnion:["$people","$animals"]}}})
How can I make the fields of each record in the resulting "union" array display in a consistent order, rather than randomly?
Wish I could find the quote (if I can, I will add it), but it basically comes from the CTO of MongoDB and is essentially (sic) "Set's are not considered to be ordered". And from a math point of view that is very much true.
So you have stumbled upon one of the new features in the current (as of writing) 2.6 release candidate series. But like its $addToSet counterpart, the resulting set from $setUnion will not be sorted in any way.
To do this you need to $unwind and $sort and then $group again using $push, just as you always have with $addToSet. And of course you would need some common key to $sort on, which your data does not have; see the sketch below.
Update: Here is the quote, and here is another.
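A minimal sketch of that approach, assuming the array elements carried a common field to sort on (the hypothetical union.key below; the sample data above has none):
db.tmp.aggregate([
   // build the combined set first
   { $project: { union: { $setUnion: [ "$people", "$animals" ] } } },
   // deconstruct, order, then rebuild the array with $push
   { $unwind: "$union" },
   { $sort: { _id: 1, "union.key": 1 } },
   { $group: { _id: "$_id", union: { $push: "$union" } } }
])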

Find maximum date from multiple embedded documents

One of many documents in my collection is like below:
{ "_id" :123,
"a" :[
{ "_id" : 1,
"dt" : ISODate("2013-06-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-02-10T19:38:42Z")
}
],
"b" :[
{ "_id" : 1,
"dt" : ISODate("2013-02-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-23-10T19:38:42Z")
}
],
"c" :[
{ "_id" : 1,
"dt" : ISODate("2013-03-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-13-10T19:38:42Z")
}
]
}
I want to find the maximum date across the whole document (a, b, c).
The solution I have right now is to loop through all root _ids and run a $match in the aggregation framework for each of a, b, c on every root document. This sounds very inefficient; any better ideas?
Your question is very similar to this question. Check it out.
As with that solution, your issue can be handled with MongoDB's aggregation framework, using the $project and $cond operators to repeatedly flatten your documents while preserving a max value at each step; a sketch of an alternative follows.
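That answer predates expression operators such as $max over arrays. On a modern server (3.2+) the same result is a short $project; a minimal sketch, assuming the collection is named docs and a, b, c are the only embedded arrays:
db.docs.aggregate([
   {
      $project: {
         // "$a.dt" resolves to the array of dt values inside a; concatenate
         // all three arrays and take the expression-level maximum
         maxDate: { $max: { $concatArrays: [ "$a.dt", "$b.dt", "$c.dt" ] } }
      }
   }
])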