MongoDB find a given field and get average value - mongodb

I'd like to get the average in a collection for a given property value. What am I doing wrong?
[{name:'Bob',city:'Barcelona',trips: 1 },
{name:'Bruce',city:'Barcelona',trips: 5 },
{name:'Bruno',city:'València',trips: 2 },
{name:'Bart',city:'Barcelona',trips: 3 }]
db.x.aggregate([{$group:{city:'Barcelona', $avg:"$trips"}}]);

You need to filter the documents with a $match stage placed before the $group stage, so that only the documents for the given city are passed on.
In the following $group stage, set _id to null to collect all the documents from the previous stage into a single group, and compute the average with the $avg accumulator in a named output field:
db.x.aggregate([
{ "$match": { "city": "Barcelona" } },
{ "$group": { "_id": null, "avgTrips": { "$avg": "$trips" } } }
]);
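To check what the pipeline above should return without a running server, here is a minimal in-memory sketch in plain JavaScript. It mimics the $match stage as a filter and the $group stage with _id: null and $avg as one sum divided by a count. The helper name averageTripsForCity is hypothetical, and this is only an illustration of the result, not how MongoDB executes the pipeline.

```javascript
// Sample documents from the question.
const docs = [
  { name: "Bob", city: "Barcelona", trips: 1 },
  { name: "Bruce", city: "Barcelona", trips: 5 },
  { name: "Bruno", city: "València", trips: 2 },
  { name: "Bart", city: "Barcelona", trips: 3 },
];

// Hypothetical helper mirroring:
//   { $match: { city } }
//   { $group: { _id: null, avgTrips: { $avg: "$trips" } } }
function averageTripsForCity(docs, city) {
  // $match: keep only the documents for the requested city.
  const matched = docs.filter((d) => d.city === city);
  // With no matching documents the real pipeline returns no result
  // document at all; this sketch returns null instead.
  if (matched.length === 0) return null;
  // $group with _id: null puts every matched document into one group.
  const total = matched.reduce((sum, d) => sum + d.trips, 0);
  return total / matched.length;
}

console.log(averageTripsForCity(docs, "Barcelona")); // 3
```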
Another approach (generally less efficient than the one above, because every document in the collection is grouped before anything is filtered out) is to group all the documents by city and then filter the groups:
db.x.aggregate([
{ "$group": { "_id": "$city", "avgTrips": { "$avg": "$trips" } } },
{ "$match": { "_id": "Barcelona" } }
]);
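The same kind of in-memory sketch shows where the extra work goes in this second approach: an average is computed for every city, and the trailing filter then discards all but one of them. The helper name averageTripsByCity is hypothetical; it only mirrors the shape of the pipeline's output.

```javascript
// Sample documents from the question.
const docs = [
  { name: "Bob", city: "Barcelona", trips: 1 },
  { name: "Bruce", city: "Barcelona", trips: 5 },
  { name: "Bruno", city: "València", trips: 2 },
  { name: "Bart", city: "Barcelona", trips: 3 },
];

// Hypothetical helper mirroring { $group: { _id: "$city", avgTrips: { $avg: "$trips" } } }.
// Every city gets an average, including the ones filtered out afterwards.
function averageTripsByCity(docs) {
  const acc = new Map(); // city -> running { sum, count }
  for (const d of docs) {
    const entry = acc.get(d.city) ?? { sum: 0, count: 0 };
    entry.sum += d.trips;
    entry.count += 1;
    acc.set(d.city, entry);
  }
  return [...acc].map(([city, { sum, count }]) => ({ _id: city, avgTrips: sum / count }));
}

// Mirrors the trailing { $match: { _id: "Barcelona" } }.
const result = averageTripsByCity(docs).filter((g) => g._id === "Barcelona");
console.log(result); // [ { _id: 'Barcelona', avgTrips: 3 } ]
```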

Related

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields, and I need to retrieve the most recent document for each group in a queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format shown above, you can split _id into two parts on the dot character and use aggregation to find the maximum ts_utc for each prefix (the first element of the split array).
That way you get every group in a single query instead of one query per group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
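As a sanity check of what this pipeline computes, here is an in-memory JavaScript sketch over the sample data. It assumes ts_utc is stored as ISO 8601 strings with identical layout, as in the question, so plain string comparison gives chronological order. Note that, like the pipeline, it returns only the newest timestamp per prefix, not the document itself. The helper name maxTimestampPerPrefix is hypothetical.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "42.abc", ts_utc: "2019-05-27T23:43:16.963Z" },
  { _id: "42.def", ts_utc: "2019-05-27T23:43:17.055Z" },
  { _id: "69.abc", ts_utc: "2019-05-27T23:43:17.147Z" },
  { _id: "69.def", ts_utc: "2019-05-27T23:44:02.427Z" },
];

// Hypothetical helper mirroring $split on "." + $arrayElemAt 0 as the
// group key, with { $max: "$ts_utc" } as the accumulator.
function maxTimestampPerPrefix(docs) {
  const max = {};
  for (const d of docs) {
    const prefix = d._id.split(".")[0];
    // Same-format ISO 8601 strings compare correctly as strings.
    if (!(prefix in max) || d.ts_utc > max[prefix]) max[prefix] = d.ts_utc;
  }
  return max;
}

console.log(maxTimestampPerPrefix(docs));
```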
As @danh mentioned in the comments, the best approach is probably to add an auxiliary field that stores the grouping. You can also index that field to improve performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
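Here is an in-memory JavaScript sketch of what this pipeline returns on the sample data: the whole newest document per prefix, including the derived group field added by $addFields. The helper name latestPerGroup is hypothetical, and as with $group, the order of the output groups should not be relied on.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "42.abc", ts_utc: "2019-05-27T23:43:16.963Z" },
  { _id: "42.def", ts_utc: "2019-05-27T23:43:17.055Z" },
  { _id: "69.abc", ts_utc: "2019-05-27T23:43:17.147Z" },
  { _id: "69.def", ts_utc: "2019-05-27T23:44:02.427Z" },
];

// Hypothetical helper mirroring $addFields, $sort { ts_utc: -1 },
// $group { _id: "$group", doc: { $first: "$$ROOT" } } and $replaceRoot.
function latestPerGroup(docs) {
  // $addFields: derive the group from the part of _id before the dot.
  const withGroup = docs.map((d) => ({ ...d, group: d._id.split(".")[0] }));
  // $sort: newest first (same-format ISO 8601 strings sort chronologically).
  withGroup.sort((a, b) => (a.ts_utc < b.ts_utc ? 1 : a.ts_utc > b.ts_utc ? -1 : 0));
  // $group with $first: keep the first, i.e. newest, document seen per group.
  const latest = new Map();
  for (const d of withGroup) {
    if (!latest.has(d.group)) latest.set(d.group, d);
  }
  // $replaceRoot: return the stored documents themselves.
  return [...latest.values()];
}

console.log(latestPerGroup(docs).map((d) => d._id));
```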

How to group documents of a collection to a map with unique field values as key and count of documents as mapped value in mongodb?

I need a MongoDB query that returns a list or map with each unique value of a field (f) in the collection as the key, and the number of documents having that value in field f as the mapped value. How can I achieve this?
Example:
Document1: {"id":"1","name":"n1","city":"c1"}
Document2: {"id":"2","name":"n2","city":"c2"}
Document3: {"id":"3","name":"n1","city":"c3"}
Document4: {"id":"4","name":"n1","city":"c5"}
Document5: {"id":"5","name":"n2","city":"c2"}
Document6: {"id":"6","name":"n1","city":"c8"}
Document7: {"id":"7","name":"n3","city":"c9"}
Document8: {"id":"8","name":"n2","city":"c6"}
Query result should be something like this if group by field is "name":
{"n1":"4",
"n2":"3",
"n3":"1"}
It would be nice if the list is also sorted in the descending order.
It's worth noting that using data points as field names (keys) is generally considered an anti-pattern and makes tooling difficult. Nonetheless, if you insist on having data points as field names, you can use this somewhat complicated aggregation to produce the output you want...
Aggregation
db.collection.aggregate([
{
$group: { _id: "$name", "count": { "$sum": 1} }
},
{
$sort: { "count": -1 }
},
{
$group: { _id: null, "values": { "$push": { "name": "$_id", "count": "$count" } } }
},
{
$project:
{
_id: 0,
results:
{
$arrayToObject:
{
$map:
{
input: "$values",
as: "pair",
in: ["$$pair.name", "$$pair.count"]
}
}
}
}
},
{
$replaceRoot: { newRoot: "$results" }
}
])
Aggregation Explanation
This is a five-stage aggregation consisting of the following:
$group - count the documents for each name.
$sort - sort the results by count, descending.
$group - place the results into a single array for the next stage.
$project - use $map and $arrayToObject to pivot the data so that a data point can be a field name.
$replaceRoot - make results the top level fields.
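The five stages can be mirrored in memory with a short JavaScript sketch, which is handy for checking the expected shape of the output. The helper name countsByName is hypothetical. JavaScript objects keep insertion order for non-numeric string keys such as "n1", so the descending sort survives the final Object.fromEntries step, much as $arrayToObject keeps the order of the pairs.

```javascript
// Sample documents from the question.
const docs = [
  { id: "1", name: "n1", city: "c1" },
  { id: "2", name: "n2", city: "c2" },
  { id: "3", name: "n1", city: "c3" },
  { id: "4", name: "n1", city: "c5" },
  { id: "5", name: "n2", city: "c2" },
  { id: "6", name: "n1", city: "c8" },
  { id: "7", name: "n3", city: "c9" },
  { id: "8", name: "n2", city: "c6" },
];

// Hypothetical helper mirroring the pipeline stage by stage.
function countsByName(docs) {
  // $group: { _id: "$name", count: { $sum: 1 } }
  const counts = new Map();
  for (const d of docs) counts.set(d.name, (counts.get(d.name) ?? 0) + 1);
  // $sort: { count: -1 }
  const sorted = [...counts].sort((a, b) => b[1] - a[1]);
  // $group into one array, $arrayToObject over [name, count] pairs, $replaceRoot.
  return Object.fromEntries(sorted);
}

console.log(countsByName(docs)); // { n1: 4, n2: 3, n3: 1 }
```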
Sample Results
{ "n1" : 4, "n2" : 3, "n3" : 1 }
For whatever reason, your desired results show the count as a string, but my results show it as an integer. I assume that is not an issue, and it may actually be preferable.

Is there a way to give order field to the result of MongoDB aggregation?

Is there any way to give order or rankings to MongoDB aggregation results?
My result is:
{
"score": 100,
"name": "John"
},
{
"score": 80,
"name": "Jane"
},
{
"score": 60,
"name": "Lee"
}
My wanted result is:
{
"score": 100,
"name": "John",
"rank": 1
},
{
"score": 80,
"name": "Jane",
"rank": 2
},
{
"score": 60,
"name": "Lee",
"rank": 3
}
I know there is an includeArrayIndex option, but it is only available on the $unwind stage.
Is there any way to add a rank without using $unwind?
Using $unwind would require grouping my collection first, and I'm afraid the grouping pipeline would be too large to process.
Another way is to use $map and add a rank to each document based on its array index, without any $unwind stage. The result is a single document with one array field, which you can access directly by its key name, as in the last line of the code.
$group by null and push every document into a root array. The ranks follow the order in which documents reach this stage, so add a $sort stage (for example on score, descending) before it if the input is not already ordered.
$map to iterate over the root array: $indexOfArray returns the index of the current element in root, $add increments it by 1 because indexes start at 0 (this becomes the rank field), and $mergeObjects merges the rank field into the current element.
let result = await db.collection.aggregate([
{
$group: {
_id: null,
root: {
$push: "$$ROOT"
}
}
},
{
$project: {
_id: 0,
root: {
$map: {
input: "$root",
in: {
$mergeObjects: [
"$$this",
{
rank: { $add: [{ $indexOfArray: ["$root", "$$this"] }, 1] }
}
]
}
}
}
}
}
]).toArray();
// the single result document holds the ranked documents under the root key
let finalResult = result[0]['root'];
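For comparison, here is a plain JavaScript sketch of the same ranking done in memory, with the sort made explicit. It uses the map callback's index rather than searching the array for each element the way $indexOfArray does, so it avoids one array scan per document. The input is the question's three results in a deliberately shuffled order, and the helper name addRank is hypothetical. Equal scores get consecutive ranks here, like a row number, not shared ranks.

```javascript
// Hypothetical input: the question's results in a shuffled order.
const docs = [
  { score: 80, name: "Jane" },
  { score: 60, name: "Lee" },
  { score: 100, name: "John" },
];

// Hypothetical helper: sort by score descending (like a $sort stage),
// then number the documents by their position in the sorted array.
function addRank(docs) {
  return [...docs]
    .sort((a, b) => b.score - a.score)
    .map((doc, i) => ({ ...doc, rank: i + 1 }));
}

console.log(addRank(docs).map((d) => `${d.rank}. ${d.name}`));
```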

Getting the name of the field with maximum count in mongodb

I am new to MongoDB and want to get the name of the spare part type that occurs most often. A sample document from my collection (the original collection has 50 documents) is given below:
[
{
"Vehicle": {
"licensePlateNo": "111A",
"vehicletype": "Car",
"model": "Nissan Sunny",
"VehicleCategory": [
{
"name": "Passenger"
}
],
"SparePart": [
{
"sparePartID": 4,
"Type": "Wheel",
"Price": 10000,
"Supplier": [
{
"supplierNo": 10,
"name": "Saman",
"contactNo": 112412634
}
]
}
],
"Employee": [
{
"employeeNo": 3,
"job": "Painter",
"jobCategory": "",
"salary": 100000
}
]
}
}
]
How can I write a query to obtain the name of the spare part type with the highest count?
Use the aggregation framework for this type of query. In particular you'd need to run an aggregation operation where the pipeline consists of the following stages (in order):
$unwind - You need this as the first pipeline stage to flatten the SparePart array, so that each spare part becomes its own denormalised document further down the pipeline. Without it you won't get the desired result, because the data would still be in array form and the subsequent $group stage would group on each vehicle's whole array of types instead of on individual types.
$group - This stage calculates the counts, grouping the documents by the Type field. The $sum accumulator with a value of 1 returns the number of documents in each group.
$sort - Order the documents from the previous $group stage by the count field, descending, so that the type with the highest count comes first.
$limit - This keeps only that top document.
Now, assembling the above together you should run the following pipeline to get the desired result:
db.AutoSmart.aggregate([
{ "$unwind": "$Vehicle.SparePart" },
{
"$group": {
"_id": "$Vehicle.SparePart.Type",
"count": { "$sum": 1 }
}
},
{ "$sort": { "count": -1 } },
{ "$limit": 1 }
])
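To see the four stages working on more than one document, here is an in-memory JavaScript sketch on a small hypothetical dataset (three made-up vehicles trimmed to the fields the pipeline uses; the question's real collection is not shown). The helper name mostCommonSparePart is hypothetical. If two types tie for the top count, which one $limit keeps is not guaranteed; this sketch keeps the first one encountered.

```javascript
// Hypothetical data: three vehicles, trimmed to the fields the pipeline uses.
const vehicles = [
  { Vehicle: { licensePlateNo: "111A", SparePart: [{ Type: "Wheel" }] } },
  { Vehicle: { licensePlateNo: "222B", SparePart: [{ Type: "Wheel" }, { Type: "Brake" }] } },
  { Vehicle: { licensePlateNo: "333C", SparePart: [{ Type: "Mirror" }] } },
];

// Hypothetical helper mirroring $unwind, $group, $sort and $limit.
function mostCommonSparePart(docs) {
  // $unwind: one entry per SparePart element; like $unwind's default,
  // documents without the array contribute nothing.
  const parts = docs.flatMap((d) => d.Vehicle.SparePart ?? []);
  // $group: { _id: "$Vehicle.SparePart.Type", count: { $sum: 1 } }
  const counts = new Map();
  for (const p of parts) counts.set(p.Type, (counts.get(p.Type) ?? 0) + 1);
  // $sort: { count: -1 }, then $limit: 1
  const [top] = [...counts].sort((a, b) => b[1] - a[1]);
  return top ? { _id: top[0], count: top[1] } : null;
}

console.log(mostCommonSparePart(vehicles)); // { _id: 'Wheel', count: 2 }
```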
Suppose we want to get the maximum age of the users in the database:
db.collection.find().sort({age:-1}).limit(1) // for MAX
You can then inspect that document.

Doing a Count on Array of Objects

If I have the payload:
{
"objs": [
{ "_id": "1234566", "some":"data" },
{ "_id": "1234566", "some":"data" },
{ "_id": "2345666", "some":"otherdata" },
{ "_id": "4566666", "some":"yetotherdata" }
]
}
What would be the best filter to get all objects with _id "1234566"?
To find all the documents that have an element in objs with _id 1234566:
db.collection.find({"objs._id":"1234566"});
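The dot-notation query matches a document when any element of the objs array has that _id, and it returns the whole document, non-matching elements included. A small in-memory JavaScript sketch of that matching rule, on two hypothetical documents (the question shows only one payload), looks like this; the helper name findByElementId is hypothetical.

```javascript
// Hypothetical documents: only "b" has an element with the target _id.
const docs = [
  { _id: "a", objs: [{ _id: "2345666", some: "otherdata" }] },
  {
    _id: "b",
    objs: [
      { _id: "1234566", some: "data" },
      { _id: "4566666", some: "yetotherdata" },
    ],
  },
];

// Hypothetical helper mirroring find({ "objs._id": id }): a document matches
// when at least one array element matches; the document is returned unchanged.
function findByElementId(docs, id) {
  return docs.filter((d) => Array.isArray(d.objs) && d.objs.some((o) => o._id === id));
}

console.log(findByElementId(docs, "1234566").map((d) => d._id)); // [ 'b' ]
```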
To filter the objs items down to those with the specified _id within each document (assuming your documents have an _id field):
db.collection.aggregate([
{$unwind:"$objs"},
{$match:{"objs._id":"1234566"}},
{$group:{"_id":"$_id","objs":{$push:{"id":"$objs._id","some":"$objs.some"}}}},
{$project:{"_id":0,"objs":1}}
])
You can change the _id in the $group stage, if you want to group based on some different field.
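For reference, here is what the corrected $unwind / $match / $group / $project pipeline produces, sketched in memory with plain JavaScript on the question's payload. The document _id "doc1" is hypothetical, since the question does not show one, and so is the helper name filterElements. Documents with no matching element disappear, because $unwind followed by $match leaves nothing for them to group.

```javascript
// The payload from the question, given a hypothetical document _id.
const docs = [
  {
    _id: "doc1",
    objs: [
      { _id: "1234566", some: "data" },
      { _id: "1234566", some: "data" },
      { _id: "2345666", some: "otherdata" },
      { _id: "4566666", some: "yetotherdata" },
    ],
  },
];

// Hypothetical helper mirroring $unwind + $match + $group on "$_id" + $project.
function filterElements(docs, id) {
  return docs
    .map((d) => ({
      // $match keeps matching elements; $push reshapes them as { id, some }.
      objs: d.objs
        .filter((o) => o._id === id)
        .map((o) => ({ id: o._id, some: o.some })),
    }))
    // Documents without any matching element produce no group at all.
    .filter((d) => d.objs.length > 0);
}

console.log(JSON.stringify(filterElements(docs, "1234566")));
// [{"objs":[{"id":"1234566","some":"data"},{"id":"1234566","some":"data"}]}]
```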