MongoDB 4.1 TotalRecords and Data in Aggregate

I'm facing a problem because I'm new to Mongo, but I want to solve it.
I have several collections which I join with $lookup stages, and that part works perfectly.
Now I also want the total number of records in the header of my result.
My first problem is that my actor relation is an array; my second problem is that I don't know how to separate totalRecords and the data from each other in the response.
The result should look like:
{
  "totalRecords": 12,
  "itemsPerPage": 10,
  "docs": {
    "_id": "7429437848adssk",
    "title": "abc",
    "actors": [
      { "name": "Mr.x" },
      { "name": "Mrs.Y" }
    ]
  }
}
I solved the aggregation, without the total count, with the following stages:
$unwind on actors
$lookup on actors
$group with $first on the fields of the first collection and $addToSet on the actors
The result in my response without the count is as expected, but if I add a $count to the group it counts the actors: for one document with two actors it returns 2. What I want is a count of the documents themselves.
Could someone provide a simple working example for my problem?

You need to add these three stages at the end of your aggregation pipeline:
{
  $facet: {
    totalRecords: [
      { $count: "totalRecords" }
    ],
    docs: [
      { $match: {} }
    ]
  }
},
{
  $unwind: "$docs"
},
{
  $addFields: {
    totalRecords: {
      $arrayElemAt: ["$totalRecords.totalRecords", 0]
    }
  }
}
MongoPlayground
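For context, here is a minimal end-to-end sketch of how these stages could slot into the pipeline described in the question. The collection names movies and actors and the field actorIds are assumptions for illustration; adjust them to your schema.

// Assumed schema: "movies" documents hold an "actorIds" array that
// references documents in an "actors" collection.
db.movies.aggregate([
  { $unwind: "$actorIds" },
  { $lookup: {
      from: "actors",
      localField: "actorIds",
      foreignField: "_id",
      as: "actor"
  }},
  { $unwind: "$actor" },
  { $group: {
      _id: "$_id",
      title: { $first: "$title" },
      actors: { $addToSet: { name: "$actor.name" } }
  }},
  // The three stages from the answer: count and data in one pass
  { $facet: {
      totalRecords: [{ $count: "totalRecords" }],
      docs: [{ $match: {} }]
  }},
  { $unwind: "$docs" },
  { $addFields: {
      totalRecords: { $arrayElemAt: ["$totalRecords.totalRecords", 0] }
  }}
])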

Related

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
  { "_id": "42.abc", "ts_utc": "2019-05-27T23:43:16.963Z" },
  { "_id": "42.def", "ts_utc": "2019-05-27T23:43:17.055Z" },
  { "_id": "69.abc", "ts_utc": "2019-05-27T23:43:17.147Z" },
  { "_id": "69.def", "ts_utc": "2019-05-27T23:44:02.427Z" }
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
  { "_id": /^42\..*/ }
).sort(
  { "ts_utc": -1.0 }
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format displayed above, you can split the _id into two parts (on the dot character) and use aggregation to find the most recent timestamp for each first (numeric) part.
That way you can do it in one shot, instead of iterating per group.
db.foo.aggregate([
  { $project: { id_parts: { $split: ["$_id", "."] }, ts_utc: 1 } },
  { $group: { _id: { $arrayElemAt: ["$id_parts", 0] }, max: { $max: "$ts_utc" } } }
])
As #danh mentioned in the comments, the best approach is probably to add an auxiliary field to indicate the grouping. You can then index that field to boost performance.
Here is an ad hoc way to derive the field and get the latest result per group:
db.collection.aggregate([
  {
    "$addFields": {
      "group": {
        "$arrayElemAt": [
          { "$split": ["$_id", "."] },
          0
        ]
      }
    }
  },
  { "$sort": { "ts_utc": -1 } },
  {
    "$group": {
      "_id": "$group",
      "doc": { "$first": "$$ROOT" }
    }
  },
  { "$replaceRoot": { "newRoot": "$doc" } }
])
Here is the Mongo playground for your reference.
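The auxiliary-field suggestion pairs naturally with an index. As a minimal sketch (assuming the group value is persisted on each document at write time, e.g. group = _id.split(".")[0]), a compound index lets the sort run off the index instead of in memory:

// Assumption: "group" is stored on every document when it is written.
// The compound index supports grouping by "group" and sorting by
// "ts_utc" descending without an in-memory sort.
db.collection.createIndex({ group: 1, ts_utc: -1 })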

Merging / aggregating multiple documents in MongoDB

I have a little problem with my aggregations. I have a large collection of flat documents with the following schema:
{
  _id: ObjectId("5dc027d38da295b969eca568"),
  emp_no: 10001,
  salary: 60117,
  from_date: "1986-06-26",
  to_date: "1987-06-26"
}
It's all about annual employee salaries. The data was exported from a relational database, so there are multiple documents with the same "emp_no" value while the rest of their attributes vary. I need to aggregate them by the "emp_no" attribute so that the result looks something like this:
//one document
{
  _id: ObjectId("5dc027d38da295b969eca568"),
  emp_no: 10001,
  salaries: [
    {
      salary: 60117,
      from_date: "1986-06-26",
      to_date: "1987-06-26"
    },
    {
      salary: 62102,
      from_date: "1987-06-26",
      to_date: "1988-06-25"
    },
    ...
  ]
}
//another document
{
  _id: ObjectId("5dc027d38da295b969eca579"),
  emp_no: 10002,
  salaries: [
    {
      salary: 65828,
      from_date: "1996-08-03",
      to_date: "1997-08-03"
    },
    ...
  ]
}
//and so on
Last but not least, there are almost 2.9 million documents, so aggregating by "emp_no" manually would be a problem. Is there a way to aggregate them using just Mongo queries? How do I do this kind of thing? Thank you in advance for any help.
The $group stage of the aggregation pipeline can be used to compute this type of aggregate. Specify the attribute you want to group by as the value of the _id field in the group stage.
How does the query below work for you?
db.collection.aggregate([
  {
    "$group": {
      "_id": "$emp_no",
      "salaries": {
        "$push": {
          "salary": "$salary",
          "from_date": "$from_date",
          "to_date": "$to_date"
        }
      },
      "emp_no": { "$first": "$emp_no" },
      "first_document_id": { "$first": "$_id" }
    }
  },
  {
    "$project": {
      "_id": "$first_document_id",
      "salaries": 1,
      "emp_no": 1
    }
  }
])
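One caveat worth noting for a collection of this size (an addition beyond the original answer): $group holds its groups in memory and is limited to 100 MB per stage, so with ~2.9 million documents the pipeline may need permission to spill to disk:

// Same pipeline, with allowDiskUse so $group can spill to temporary
// files if it exceeds the 100 MB per-stage memory limit.
db.collection.aggregate([
  { "$group": {
      "_id": "$emp_no",
      "salaries": { "$push": { "salary": "$salary", "from_date": "$from_date", "to_date": "$to_date" } },
      "emp_no": { "$first": "$emp_no" },
      "first_document_id": { "$first": "$_id" }
  }},
  { "$project": { "_id": "$first_document_id", "salaries": 1, "emp_no": 1 } }
], { allowDiskUse: true })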

Is it possible to create multiple collections out of a fork (e.g. bucket or facet)

I was wondering if it's possible to create multiple collections (3, to be precise) when using $facet and/or $bucket. The documentation specifies that $facet can't be used with $out. So is there an alternative? I'm trying to do something like the following:
Split the documents into 3 categories ($bucket / $facet?)
Within these categories, get an avg of some value
Output the query results to 3 different collections, each named after its category
I have the following at the moment, which groups and filters everything, but $out can only export to one collection (so I got rid of $out):
db.getCollection('metingen').aggregate([
  {
    $unwind: "$parameters"
  },
  {
    $group: {
      _id: {
        uur: { $hour: "$tijdstip" },
        locatie: {
          $map: {
            input: "$loc.coordinates",
            in: {
              $divide: [
                { $trunc: { $multiply: ["$$this", 1000] } },
                1000
              ]
            }
          }
        }
      },
      gemiddelde: {
        $avg: "$parameters.value"
      }
    }
  },
  {
    $sort: { "_id.uur": 1 }
  }
]);
The filter value should be something like $parameters.soort; I haven't added it here, but I experimented with it before. I'm looking for a way to do this in a single query, if possible.
Thanks in advance!
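One possible approach, sketched here as an assumption rather than a verified solution: since neither $facet nor $bucket can write to collections, run the pipeline once per category and let $out write each result set to its own collection. The field parameters.soort and the category values below are placeholders.

// Sketch only: loop over the (assumed) category values and write each
// aggregated result set to its own collection via $out.
["soortA", "soortB", "soortC"].forEach(function (soort) {
  db.getCollection("metingen").aggregate([
    { $unwind: "$parameters" },
    { $match: { "parameters.soort": soort } },
    { $group: {
        _id: { uur: { $hour: "$tijdstip" } },
        gemiddelde: { $avg: "$parameters.value" }
    }},
    { $out: "metingen_" + soort }  // one collection per category
  ]);
});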

MongoDB sort by array size with large number of documents

I have an article collection which stores a list of tags as follows:
{
  id: 1,
  title: "Sample title",
  tags: ["tag1", "tag2", "tag3", "tag4"]
}
To match articles to a user's interests, I use $match and $setIntersection in an aggregation to count how many tags an article has in common with the user's interests, then sort on that count to get the best matches.
db.article.aggregate([
  {
    "$match": { "tags": { "$in": ["tag1", ..., "tag100"] } }
  },
  {
    "$project": {
      "tags_match": {
        "$setIntersection": ["$tags", ["tag1", ..., "tag100"]]
      }
    }
  },
  {
    "$project": {
      "tags_match_size": { "$size": "$tags_match" }
    }
  },
  { "$sort": { "tags_match_size": -1 } },
  { "$limit": 40 }
]);
It works fine when there are a few hundred documents in the article collection, but now that there are around 1M articles it takes about half an hour to finish.
I can't create an index on "tags_match_size" to speed this up, because it is a new field computed inside the aggregate query.
How can I make the query run faster?
Thank you.
Create an index on the tags field. The index will only help the first $match stage.
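A minimal sketch of that suggestion:

// Multikey index over the "tags" array; the $in predicate in the
// $match stage can then use it instead of scanning the full collection.
db.article.createIndex({ tags: 1 })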

Getting the name of the field with maximum count in mongodb

I am new to MongoDB and want to get the name of the field (spare part type) which has the maximum count! A sample document in my collection (the original collection has 50 documents) is given below:
[
  {
    "Vehicle": {
      "licensePlateNo": "111A",
      "vehicletype": "Car",
      "model": "Nissan Sunny",
      "VehicleCategory": [
        { "name": "Passenger" }
      ],
      "SparePart": [
        {
          "sparePartID": 4,
          "Type": "Wheel",
          "Price": 10000,
          "Supplier": [
            {
              "supplierNo": 10,
              "name": "Saman",
              "contactNo": 112412634
            }
          ]
        }
      ],
      "Employee": [
        {
          "employeeNo": 3,
          "job": "Painter",
          "jobCategory": "",
          "salary": 100000
        }
      ]
    }
  }
]
How can I write a query to obtain the name of the spare part type with the highest count?
Use the aggregation framework for this type of query. In particular, you'd need to run an aggregation operation where the pipeline consists of the following stages (in order):
$unwind - You need this as the first pipeline step in order to flatten the SparePart array so that you can process the documents as denormalised records further down the pipeline. Without it you won't get the desired result, since the data would still be in array form and the accumulator operators in the subsequent $group stage work on one document at a time when aggregating the counts.
$group - This step calculates the counts for you, with the documents grouped by the Type field. The $sum accumulator returns the total number of documents in each group.
$sort - Given the results of the preceding $group stage, you order the documents by the count field so that the document with the highest count comes first.
$limit - This gives you that top document.
Now, assembling the above, you can run the following pipeline to get the desired result:
db.AutoSmart.aggregate([
  { "$unwind": "$Vehicle.SparePart" },
  {
    "$group": {
      "_id": "$Vehicle.SparePart.Type",
      "count": { "$sum": 1 }
    }
  },
  { "$sort": { "count": -1 } },
  { "$limit": 1 }
])
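Run against the single sample document above, this pipeline would return a document of the following shape (the count value is illustrative; with 50 documents it would reflect how often each Type occurs):

// Illustrative output: the most frequent spare-part type and its count.
{ "_id" : "Wheel", "count" : 1 }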
Suppose, instead, that we want to get the maximum age of users from the DB. A plain query can do that:
db.collection.find().sort({ age: -1 }).limit(1) // for MAX
You can then inspect the returned document.