How to group documents in a collection into a map with unique field values as keys and document counts as values in MongoDB?

I need a MongoDB query that returns a map (or list) in which each unique value of a field (f) in the collection is a key, and the mapped value is the number of documents that have that value in field f. How can I achieve this?
Example:
Document1: {"id":"1","name":"n1","city":"c1"}
Document2: {"id":"2","name":"n2","city":"c2"}
Document3: {"id":"3","name":"n1","city":"c3"}
Document4: {"id":"4","name":"n1","city":"c5"}
Document5: {"id":"5","name":"n2","city":"c2"}
Document6: {"id":"6","name":"n1","city":"c8"}
Document7: {"id":"7","name":"n3","city":"c9"}
Document8: {"id":"8","name":"n2","city":"c6"}
Query result should be something like this if group by field is "name":
{"n1":"4",
"n2":"3",
"n3":"1"}
It would be nice if the result were also sorted by count in descending order.

It's worth noting that using data points as field names (keys) is generally considered an anti-pattern and makes tooling difficult. Nonetheless, if you insist on having data points as field names, you can use this somewhat complicated aggregation to produce the output you want:
Aggregation
db.collection.aggregate([
{
$group: { _id: "$name", "count": { "$sum": 1} }
},
{
$sort: { "count": -1 }
},
{
$group: { _id: null, "values": { "$push": { "name": "$_id", "count": "$count" } } }
},
{
$project:
{
_id: 0,
results:
{
$arrayToObject:
{
$map:
{
input: "$values",
as: "pair",
in: ["$$pair.name", "$$pair.count"]
}
}
}
}
},
{
$replaceRoot: { newRoot: "$results" }
}
])
Aggregation Explanation
This is a 5 stage aggregation consisting of the following...
$group - get the count of the data as required by name.
$sort - sort the results with count descending.
$group - place results into an array for the next stage.
$project - use $arrayToObject and $map to pivot the data so that a data point can be a field name.
$replaceRoot - make results the top level fields.
Sample Results
{ "n1" : 4, "n2" : 3, "n3" : 1 }
For whatever reason, you show desired results having count as a string, but my results show the count as an integer. I assume that is not an issue, and may actually be preferred.
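To make the five stages easier to follow, here is a minimal plain-JavaScript sketch that mirrors the same logic on an in-memory array. It is only an illustration of what the pipeline computes, not a replacement for it; the helper name countByField is made up for this example, and the sample documents are the ones from the question.

```javascript
// Hypothetical helper: mirrors the pipeline's logic on an in-memory array.
function countByField(docs, field) {
  // Stage 1 ($group): count documents per distinct value of `field`.
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Stage 2 ($sort): order the [value, count] pairs by count, descending.
  const pairs = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  // Stages 3-5 ($group, $project, $replaceRoot): pivot the pairs into one object.
  return Object.fromEntries(pairs);
}

// Sample documents from the question.
const docs = [
  { id: "1", name: "n1", city: "c1" },
  { id: "2", name: "n2", city: "c2" },
  { id: "3", name: "n1", city: "c3" },
  { id: "4", name: "n1", city: "c5" },
  { id: "5", name: "n2", city: "c2" },
  { id: "6", name: "n1", city: "c8" },
  { id: "7", name: "n3", city: "c9" },
  { id: "8", name: "n2", city: "c6" }
];

console.log(countByField(docs, "name")); // prints { n1: 4, n2: 3, n3: 1 }
```

Doing this client-side only makes sense for small result sets; for a large collection the aggregation keeps the counting on the server.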

Related

How do I sort results based on a specific array item in MongoDB?

I have an array of documents that looks like this:
patient: {
conditions: [
{
columnToSortBy: "value",
type: "PRIMARY"
},
{
columnToSortBy: "anotherValue",
type: "SECONDARY"
},
]
}
I need to be able to $sort by columnToSortBy, but using the item in the array where type is equal to PRIMARY. PRIMARY is not guaranteed to be the first item in the array every time.
How do I set my $sort up to accommodate this? Is there something akin to:
// I know this is invalid. It's for illustration purposes
$sort: "columnToSortBy", {$where: {type: "PRIMARY"}}
Is it possible to sort a field, but only when another field matches a query? I do not want the secondary conditions to affect the sort in any way. I am sorting on that one specific element alone.
You need to use the aggregation framework:
db.collection.aggregate([
{
$unwind: "$patient.conditions" //reshape the data
},
{
"$sort": {
"patient.conditions.columnToSortBy": -1 //sort it
}
},
{
$group: {
"_id": "$_id",
"conditions": { //re group it
"$push": "$patient.conditions"
}
}
},
{
"$project": { //project it
"_id": 1,
"patient.conditions": "$conditions"
}
}
])
Playground
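If the goal is to order whole documents by the columnToSortBy value of the PRIMARY element only, one possible approach is to copy that element into a temporary field with $filter and $arrayElemAt, sort on it, and then drop it. The following is a sketch under that assumption and has not been run against a live server; the helper name buildPrimarySortPipeline and the temporary field name primaryCondition are made up for this example.

```javascript
// Hypothetical helper: builds a pipeline that sorts documents by the
// columnToSortBy value of the condition whose type is "PRIMARY".
function buildPrimarySortPipeline(direction = 1) {
  return [
    {
      // Copy the first PRIMARY condition into a temporary field.
      $addFields: {
        primaryCondition: {
          $arrayElemAt: [
            {
              $filter: {
                input: "$patient.conditions",
                as: "c",
                cond: { $eq: ["$$c.type", "PRIMARY"] }
              }
            },
            0
          ]
        }
      }
    },
    // Sort on that one element only; the other conditions play no part.
    { $sort: { "primaryCondition.columnToSortBy": direction } },
    // Drop the temporary field from the output.
    { $project: { primaryCondition: 0 } }
  ];
}

// Usage in the shell (the collection name is a placeholder):
// db.collection.aggregate(buildPrimarySortPipeline(-1))
console.log(JSON.stringify(buildPrimarySortPipeline(-1), null, 2));
```

Documents without a PRIMARY condition would have no primaryCondition field and would be sorted as if the value were null.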

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the _id format shown above, you can split the _id into two parts on the dot character and use aggregation to find the maximum ts_utc for each distinct first (numeric) part.
That way you can do it in one shot instead of running one query per group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
As @danh mentioned in the comments, the best approach is probably to add an auxiliary field that indicates the grouping. You can then index that auxiliary field to improve performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
Here is the Mongo playground for your reference.
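As a rough illustration of what the pipeline above computes, the same "derive the group, keep the newest document" logic can be sketched in plain JavaScript. This assumes the timestamps are ISO 8601 UTC strings in a single format, as in the sample data, so they compare correctly as strings; the helper name latestPerGroup is made up for this example.

```javascript
// Hypothetical helper: keeps the most recent document per _id prefix.
function latestPerGroup(docs) {
  const latest = new Map();
  for (const doc of docs) {
    // Same derivation as $split + $arrayElemAt: the part before the first dot.
    const group = doc._id.split(".")[0];
    const current = latest.get(group);
    // Keep the document with the greatest timestamp seen so far.
    if (!current || doc.ts_utc > current.ts_utc) {
      latest.set(group, doc);
    }
  }
  return [...latest.values()];
}

// Sample data from the question.
const samples = [
  { _id: "42.abc", ts_utc: "2019-05-27T23:43:16.963Z" },
  { _id: "42.def", ts_utc: "2019-05-27T23:43:17.055Z" },
  { _id: "69.abc", ts_utc: "2019-05-27T23:43:17.147Z" },
  { _id: "69.def", ts_utc: "2019-05-27T23:44:02.427Z" }
];

console.log(latestPerGroup(samples).map((d) => d._id)); // prints [ '42.def', '69.def' ]
```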

How to sort a dictionary keys and pick the first in MongoDb?

I'm running the following query as described in the docs.
db.getCollection('things')
.find(
{ _id: UUID("...") },
{ _id: 0, history: 1 }
)
It produces a single element that, when unfolded in the GUI, shows the dictionary history. When I unfold that, I get to see the contents: a bunch of keys and their corresponding values.
Now, I'd like to sort the keys alphabetically and pick n first ones. Please note that it's not an array but a dictionary that is stored. Also, it would be great if I could flatten the structure and pop up my history to be the head (root?) of the document returned.
I understand it's about projection and slicing. However, I'm not getting anywhere, despite many attempts. I get syntax errors or a full list of elements. Being rather nooby, I fear that I require a few pointers on how to diagnose my issue to begin with.
Based on the comments, I tried with aggregate and $sort. Regrettably, I only seem to be sorting the current output (that produces a single document due to the match condition). I want to access the elements inside history.
db.getCollection('things')
.aggregate([
{ $match: { _id: UUID("...") } },
{ $sort: { history: 1 } }
])
I'm sensing that I should use projection to pull out a list of elements residing under history but I'm getting no success using the below.
db.getCollection('things')
.aggregate([
{ $match: { _id: UUID("...") } },
{ $project: { history: 1, _id: 0 } }
])
Sorting object properties alphabetically takes several stages:
$objectToArray converts the history object to an array of key-value pairs
$unwind deconstructs that array into one document per pair
$sort orders the pairs by history key, ascending (1 = ascending, -1 = descending)
$group by _id reconstructs the history key-value array
$slice takes the first n properties of the dictionary; I have used 1
$arrayToObject converts the key-value array back to an object
db.getCollection('things').aggregate([
{ $match: { _id: UUID("...") } },
{ $project: { history: { $objectToArray: "$history" } } },
{ $unwind: "$history" },
{ $sort: { "history.k": 1 } },
{
$group: {
_id: "$_id",
history: { $push: "$history" }
}
},
{
$project: {
history: {
$arrayToObject: { $slice: ["$history", 1] }
}
}
}
])
Playground
There is another option, but MongoDB does not guarantee that it reproduces the same result:
$objectToArray converts the history object to an array of key-value pairs
$setUnion returns the unique elements of an array; in my experience it also returns them sorted by key in ascending order, but MongoDB documents the output order as unspecified, so this is not guaranteed
$slice takes the first n properties of the dictionary; I have used 1
$arrayToObject converts the key-value array back to an object
db.getCollection('things').aggregate([
{ $match: { _id: UUID("...") } },
{
$project: {
history: {
$arrayToObject: {
$slice: [
{ $setUnion: { $objectToArray: "$history" } },
1
]
}
}
}
}
])
Playground
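For comparison, if the history object is already on the client, the same convert, sort, slice, and rebuild steps can be sketched in plain JavaScript. The helper name firstKeys and the sample history object are made up for this example, and the sketch assumes the keys are not integer-like strings, since JavaScript objects reorder such keys.

```javascript
// Hypothetical helper: returns the first n properties of an object, by key order.
function firstKeys(history, n) {
  const entries = Object.entries(history)               // like $objectToArray
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))   // like $sort on the key
    .slice(0, n);                                        // like $slice
  return Object.fromEntries(entries);                    // like $arrayToObject
}

// Made-up sample data for illustration only.
const history = { delta: 4, alpha: 1, charlie: 3, bravo: 2 };

console.log(firstKeys(history, 2)); // prints { alpha: 1, bravo: 2 }
```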

Mongo Query to return common values in array

I need a Mongo query that returns the values common to an array field across documents.
So if 4 documents match, a value is returned only if it is present in all 4 documents.
Suppose I have the below documents in my db
Mongo Documents
{
"id":"0",
"merchants":["1","2"]
}
{
"id":"1",
"merchants":["1","2","4"]
}
{
"id":"2",
"merchants":["4","5"]
}
Input : List of id
(i) Input with id "0" and "1"
Then it should return me merchants:["1","2"] as both are present in documents with id "0" & id "1"
(ii) Input with id "1" and "2"
Then it should return me merchants:["4"] as it is common and present in both documents with id "1" & id "2"
(iii) Input with id "0" and "2"
Should return empty merchants:[] as no common merchants between these 2 documents
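You can try the aggregation below. Note that it intersects only the first and last matched documents, so it fits the case where exactly two ids are given.
db.collection.aggregate([
{$match:{id: {$in: ["1", "2"]}}},
{$group:{_id:null, first:{$first:"$merchants"}, second:{$last:"$merchants"}}},
{$project: {commonToBoth: {$setIntersection: ["$first", "$second"]}, _id: 0 } }
])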
Say you have a function query that does the required DB query for you, and you'll call that function with idsToMatch which is an array containing all the elements you want to match. I have used JS here as the driver language, replace it with whatever you are using.
The following code is dynamic, will work for any number of ids you give as input:
const query = (idsToMatch) => {
// Return the aggregation cursor so the caller can read the results
return db.collectionName.aggregate([
{ $match: { id: {$in: idsToMatch} } },
{ $unwind: "$merchants" },
{ $group: { _id: { id: "$id", data: "$merchants" } } },
{ $group: { _id: "$_id.data", count: {$sum: 1} } },
{ $match: { count: { $gte: idsToMatch.length } } },
{ $group: { _id: 0, result: {$push: "$_id" } } },
{ $project: { _id: 0, result: "$result" } }
]);
};
The first $group stage makes sure there are no repeated values within a single document's merchants array. If you are certain that no individual document contains a repeated merchants value, you can omit it.
The real work is done by the stages up to and including the second $match. The last two stages ($group and $project) only prettify the result; you may choose not to use them and instead transform the result into the form you want in the language of your choice.
Assuming you want to reduce the phases as per the points given above, the actual code will reduce to:
db.collectionName.aggregate([
{ $match: { id: {$in: idsToMatch} } },
{ $unwind: "$merchants" },
{ $group: { _id: "$merchants", count: {$sum: 1} } },
{ $match: { count: { $gte: idsToMatch.length } } }
])
Your required values will be at the _id attribute of each element of the result array.
The answer provided by @jgr0 is correct to some extent. The only mistake is in the intermediate $match operation.
(i) So if input ids are "1" & "0" then the query becomes
aggregate([
{"$match":{"id":{"$in":["1","0"]}}},
{"$unwind":"$merchants"},
{"$group":{"_id":"$merchants","count":{"$sum":1}}},
{"$match":{"count":{"$eq":2}}},
{"$group":{"_id":null,"merchants":{"$push":"$_id"}}},
{"$project":{"_id":0,"merchants":1}}
])
(ii) So if input ids are "1", "0" & "2" then the query becomes
aggregate([
{"$match":{"id":{"$in":["1","0", "2"]}}},
{"$unwind":"$merchants"},
{"$group":{"_id":"$merchants","count":{"$sum":1}}},
{"$match":{"count":{"$eq":3}}},
{"$group":{"_id":null,"merchants":{"$push":"$_id"}}},
{"$project":{"_id":0,"merchants":1}}
])
The intermediate $match should compare the count against the number of ids in the input. So in case (i) it is 2 and in case (ii) it is 3.
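The counting idea behind these pipelines can be checked with a small plain-JavaScript sketch: a merchant is common to all selected documents exactly when its count equals the number of input ids. The helper name commonMerchants is made up for this example, and the sketch assumes the input ids are unique and each id matches exactly one document.

```javascript
// Hypothetical helper: merchants present in every document whose id is listed.
function commonMerchants(docs, idsToMatch) {
  const counts = new Map();
  for (const doc of docs) {
    if (!idsToMatch.includes(doc.id)) continue;      // like $match
    // A Set removes repeats within one document before counting.
    for (const merchant of new Set(doc.merchants)) { // like $unwind
      counts.set(merchant, (counts.get(merchant) || 0) + 1); // like $group + $sum
    }
  }
  // Keep merchants seen once per selected document.
  return [...counts]
    .filter(([, count]) => count === idsToMatch.length)
    .map(([merchant]) => merchant);
}

// Sample documents from the question.
const merchantDocs = [
  { id: "0", merchants: ["1", "2"] },
  { id: "1", merchants: ["1", "2", "4"] },
  { id: "2", merchants: ["4", "5"] }
];

console.log(commonMerchants(merchantDocs, ["0", "1"])); // prints [ '1', '2' ]
console.log(commonMerchants(merchantDocs, ["1", "2"])); // prints [ '4' ]
console.log(commonMerchants(merchantDocs, ["0", "2"])); // prints []
```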

MongoDB find a given field and get average value

I'd like to get the average in a collection for a given property value. What am I doing wrong?
[{name:'Bob',city:'Barcelona',trips: 1 },
{name:'Bruce',city:'Barcelona',trips: 5 },
{name:'Bruno',city:'València',trips: 2 },
{name:'Bart',city:'Barcelona',trips: 3 }]
db.x.aggregate([{$group:{city:'Barcelona', $avg:"$trips"}}]);
You need to filter the documents with the $match operator, i.e. add a pipeline stage before the $group stage that filters the documents in the collection on the given city value.
In the subsequent $group stage, you can then use a null key (the _id field) to group all the documents coming from the previous stage, and give the $avg accumulator an output field name to get the average:
db.x.aggregate([
{ "$match": { "city": "Barcelona" } },
{ "$group": { "_id": null, "average": { "$avg": "$trips" } } }
]);
Another approach (not as optimal as the above) would be to group all the documents in the collection by the city key and then filter afterwards:
db.x.aggregate([
{ "$group": { "_id": "$city", "average": { "$avg": "$trips" } } },
{ "$match": { "_id": "Barcelona" } }
]);
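As a sanity check on the value to expect, the filter-then-average logic can be sketched in plain JavaScript using the sample data from the question. The helper name averageTrips is made up for this example.

```javascript
// Hypothetical helper: average of `trips` over documents in the given city.
function averageTrips(docs, city) {
  const matched = docs.filter((d) => d.city === city);  // like $match
  if (matched.length === 0) return null;                // nothing to average
  const total = matched.reduce((sum, d) => sum + d.trips, 0);
  return total / matched.length;                        // like $avg
}

// Sample data from the question.
const travellers = [
  { name: "Bob", city: "Barcelona", trips: 1 },
  { name: "Bruce", city: "Barcelona", trips: 5 },
  { name: "Bruno", city: "València", trips: 2 },
  { name: "Bart", city: "Barcelona", trips: 3 }
];

console.log(averageTrips(travellers, "Barcelona")); // prints 3
```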