Counting data per user with mongo aggregation framework - mongodb

I have a collection where each document has a user_ids property, which is an array field. Example documents:
[{
_id: 'i3oi1u31o2yi12o3i1',
unique_prop: 33,
prop1: 'some string value',
prop2: 212,
user_ids: [1, 2, 3, 4]
},
{
_id: 'i3oi1u88ffdfi12o3i1',
unique_prop: 34,
prop1: 'some string value',
prop2: 216,
user_ids: [2, 3, 4]
},
{
_id: 'i3oi1u8834432ddsda12o3i1',
unique_prop: 35,
prop1: 'some string value',
prop2: 211,
user_ids: [2]
}]
My goal is to get the number of documents per user, so the sample output would be:
[
{user_id: 1, count: 1},
{user_id: 2, count: 3},
{user_id: 3, count: 2},
{user_id: 4, count: 2}
]
I've tried a couple of things, none of which worked. Most recently I tried:
aggregate([
{ $group: {
_id: { unique_prop: "$unique_prop"},
users: { "$addToSet": "$user_ids" },
count: { "$sum": 1 }
}}
])
But it just returned the users per document. I'm still learning, so any resource or advice would help.

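You need to $unwind the "user_ids" array and then, in the $group stage, count the number of times each id appears in the collection. Each id ends up in the _id field of the output; add a $project stage if you want it named user_id.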
db.collection.aggregate([
{ "$unwind": "$user_ids" },
{ "$group": { "_id": "$user_ids", "count": {"$sum": 1 }}}
])

MongoDB aggregation performs computations on groups of values from the documents in a collection and returns computed results by executing the stages of a pipeline in order.
With that in mind, try executing the following aggregate query in the MongoDB shell:
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: "$user_ids"
},
// Stage 2
{
$group: {
_id:{user_id:'$user_ids'},
total:{$sum:1}
}
},
// Stage 3
{
$project: {
_id:0,
user_id:'$_id.user_id',
count:'$total'
}
},
]
);
In the query above, $unwind first breaks the user_ids array field of each document into multiple documents, one per array element. $group then groups those documents by their user_ids value and sums the number of documents for each value. Finally, $project renames _id.user_id to user_id and total to count, and drops _id.
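To see what the pipeline computes without a server, here is a plain-JavaScript sketch that mirrors $unwind followed by $group (plus the renaming done by $project) on the question's sample documents. It runs in memory only; the helper names unwind and countBy are made up for illustration and are not MongoDB APIs.

```javascript
// In-memory sketch only: mirrors { $unwind: "$user_ids" } + { $group: { _id: "$user_ids", count: { $sum: 1 } } }
const userDocs = [
  { _id: 'i3oi1u31o2yi12o3i1', user_ids: [1, 2, 3, 4] },
  { _id: 'i3oi1u88ffdfi12o3i1', user_ids: [2, 3, 4] },
  { _id: 'i3oi1u8834432ddsda12o3i1', user_ids: [2] },
];

// $unwind: one output document per array element, with the array replaced by that element.
// Like the default $unwind, documents with a missing or empty array produce nothing.
// Assumes the field, when present, is an array.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
  );
}

// $group with $sum: 1 counts documents per distinct value;
// the final map reshapes each group like the $project stage ({ user_id, count }).
function countBy(documents, field) {
  const counts = new Map();
  for (const doc of documents) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return [...counts].map(([userId, count]) => ({ user_id: userId, count }));
}

const result = countBy(unwind(userDocs, 'user_ids'), 'user_ids');
console.log(result);
```

MongoDB does not guarantee the order of $group output, so add a $sort stage (for example { $sort: { user_id: 1 } }) if you need a stable order.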

Related

MongoDB - How to use $bucketAuto aggregation where the buckets are grouped by another property

I need to create an aggregation pipeline that returns price ranges for each product category.
What I need to avoid is loading all available categories and then querying the database again, once per category, with a $match on each. There must be a better way to do it.
Product documents
{
Price: 500,
Category: 'A'
},
{
Price: 7500,
Category: 'A'
},
{
Price: 340,
Category: 'B'
},
{
Price: 60,
Category: 'B'
}
Now I could use a $group stage to group the prices into an array by their category.
{
_id: "$Category",
Prices: {
$addToSet: "$Price"
}
}
Which would result in
{
_id: 'A',
Prices: [500, 7500]
},
{
_id: 'B',
Prices: [340, 60]
}
But if I use a $bucketAuto stage after this, I cannot group by multiple properties, meaning it does not take the categories into account.
I have tried the following
{
groupBy: "$Prices",
buckets: 5,
output: {
Count: { $sum: 1}
}
}
This does not take categories into account, but I need the generated buckets to be organised by category, either with the category field inside _id or as a separate field, with 5 buckets for each distinct category:
{
_id: {min: 500, max: 7500, category: 'A'},
Count: 2
},
{
_id: {min: 60, max: 340, category: 'B'},
Count: 2
}...
Query1
If you want to group by category and find the min and max price for each category, you can do it like this:
aggregate(
[{"$group":
{"_id": "$Category",
"min-price": {"$min": "$Price"},
"max-price": {"$max": "$Price"}}}])
Query2
If you want to group by category and then apply the buckets to each category's array of prices, creating 5 buckets as in your example,
you can use a trick that lets you apply pipeline stages to the contents of an array.
The trick is to have one extra collection containing only one empty document: [{}].
You $lookup into that collection, pass the array in through let, put it on that single document, $unwind it, and then run whatever stages you want on it.
Here we unwind the prices array and run $bucketAuto on it with 5 buckets, as in your example. This way the results are grouped by category, with each category's prices split into 5 ranges (5 buckets).
aggregate(
[{"$group": {"_id": "$Category", "prices": {"$push": "$Price"}}},
{"$lookup":
{"from": "coll_with_1_empty_doc",
"pipeline":
[{"$set": {"prices": "$$prices"}}, {"$unwind": "$prices"},
{"$bucketAuto": {"groupBy": "$prices", "buckets": 5}}],
"as": "bucket-prices",
"let": {"prices": "$prices", "category": "$_id"}}}])
If neither of these is what you need, please add sample documents and the expected output.
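The $group in Query1 can be checked in memory with a plain-JavaScript sketch that mirrors $min and $max per category on the question's sample products. minMaxByCategory is a hypothetical helper, not part of MongoDB.

```javascript
// Sample products from the question
const products = [
  { Price: 500, Category: 'A' },
  { Price: 7500, Category: 'A' },
  { Price: 340, Category: 'B' },
  { Price: 60, Category: 'B' },
];

// Mirrors { $group: { _id: "$Category", "min-price": { $min: "$Price" }, "max-price": { $max: "$Price" } } }
function minMaxByCategory(docs) {
  const groups = new Map();
  for (const { Category, Price } of docs) {
    const group = groups.get(Category);
    if (!group) {
      // First document of a category initialises both accumulators
      groups.set(Category, { _id: Category, 'min-price': Price, 'max-price': Price });
    } else {
      group['min-price'] = Math.min(group['min-price'], Price);
      group['max-price'] = Math.max(group['max-price'], Price);
    }
  }
  return [...groups.values()];
}

const ranges = minMaxByCategory(products);
console.log(ranges);
```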

How to retrieve specific keys when grouping on mongo while using $max on a field?

How can I retrieve keys beyond the grouped ones from MongoDB?
Example documents:
{code: 'x-1', discount_value: 10, type: 1}
{code: 'x-2', discount_value: 8, type: 1}
{code: 'x-3', discount_value: 5, type: 2}
Query:
{
$match: {
type: 1
}
},
{
$group: {
_id: null,
discount_value: {$max: '$discount_value'}
}
}
This query retrieves the max value of the discount_value key (10) and the _id key, but how can I also retrieve the code and type keys when I have no accumulator operation to apply to them?
The current result:
{_id: null, discount_value: 10}
Expected result:
{_id: null, discount_value: 10, type: 1, code: 'x-1'}
You can try the query below:
db.collection.aggregate([
{
$match: { type: 1 }
},
{
$group: {
_id: null,
doc: {
$max: {
discount_value: "$discount_value",
type: "$type",
code: "$code"
}
}
}
}
])
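This should take the $max on the discount_value field and return the corresponding type and code values from the document where discount_value is highest. It works because $max compares embedded documents field by field in order, so discount_value, listed first, decides the result; type and code only matter for ties.
Alternatively, since you're using $match as the first stage, the remaining data should be small enough to $sort efficiently: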
db.collection.aggregate([
{
$match: { type: 1 }
},
{
$sort: { discount_value: -1 } // sort in desc order
},
{
$limit: 1
}
])
Test: mongoplayground
Note:
Test the first query on the database itself rather than in the playground. In the first query, you can add $replaceRoot as the last stage if you want the doc field to become the root of your result document.
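Both answers can be sanity-checked in plain JavaScript. This sketch uses the three example documents from the question; compareInOrder is a hypothetical helper that imitates how $max compares the embedded { discount_value, type, code } documents field by field, and it only handles the numbers and strings in this sample (real BSON comparison also orders by type and field name).

```javascript
const discountDocs = [
  { code: 'x-1', discount_value: 10, type: 1 },
  { code: 'x-2', discount_value: 8, type: 1 },
  { code: 'x-3', discount_value: 5, type: 2 },
];

// $match: { type: 1 }
const matched = discountDocs.filter((d) => d.type === 1);

// First answer: $max over embedded documents with the same field order.
// Fields are compared in order, so discount_value decides unless values tie.
function compareInOrder(a, b) {
  for (const key of ['discount_value', 'type', 'code']) {
    if (a[key] < b[key]) return -1;
    if (a[key] > b[key]) return 1;
  }
  return 0;
}
const viaMax = matched
  .map((d) => ({ discount_value: d.discount_value, type: d.type, code: d.code }))
  .reduce((best, d) => (compareInOrder(d, best) > 0 ? d : best));

// Second answer: $sort: { discount_value: -1 } followed by $limit: 1
const viaSort = [...matched]
  .sort((a, b) => b.discount_value - a.discount_value)
  .slice(0, 1);

console.log(viaMax, viaSort);
```

Both approaches pick the x-1 document. The $sort + $limit version returns the whole original document, while the $max version only returns the fields you list inside it.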

MongoDB Find values passed in that don't match

I'm currently stuck on an issue with MongoDB aggregation. I have an array of _ids that I need to check for existence in a specific collection.
Example:
I have 3 records in 'Collection 1' with _id 1,2,3. I can find the matching values using:
$match: {
_id: {
$in: [1, 2, 3, 4]
}
}
However, what I want to know is which of the values I pass in (1, 2, 3, 4) don't match a record. (In this case, _id 4 has no matching record.)
So instead of returning the records with _id 1, 2, 3, it needs to return the _id that doesn't exist, in this example _id: 4.
The query should also disregard any extra records in the collection. For example, if the collection held records with IDs 1-10 and I queried whether the _ids 1, 7, 15 existed, I'd expect something along the lines of '_id: 15 doesn't exist'.
My first thought was to use $project within an aggregation to hold each _id that was passed in, and then attach each record in the collection to the matching _id. E.g.:
Record 1:
{
_id: 1,
Collection1: [
record details: ...,
...
...
]
},
{
_id: 2,
Collection1: [] // This _id passed in, doesn't have a matching collection
}
However, I can't seem to get a working example. Any help would be appreciated!
If the input documents are:
{ _id: 1 },
{ _id: 2 },
{ _id: 5 },
{ _id: 10 }
And the array to match is:
var INPUT_ARRAY = [ 1, 7, 15 ]
The following aggregation:
db.test.aggregate( [
{
$match: {
_id: {
$in: INPUT_ARRAY
}
}
},
{
$group: {
_id: null,
matches: { $push: "$_id" }
}
},
{
$project: {
ids_not_exist: { $setDifference: [ INPUT_ARRAY, "$matches" ] },
_id: 0
}
}
] )
Returns:
{ "ids_not_exist" : [ 7, 15 ] }
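The pipeline boils down to a set difference. A plain-JavaScript sketch of the same steps on the example data (collectionIds stands in for the _ids stored in the collection and is an assumption for illustration):

```javascript
// Hypothetical stand-in for the _ids stored in the collection
const collectionIds = [1, 2, 5, 10];
const INPUT_ARRAY = [1, 7, 15];

// $match { _id: { $in: INPUT_ARRAY } } followed by $group { matches: { $push: "$_id" } }
const matches = collectionIds.filter((id) => INPUT_ARRAY.includes(id));

// $setDifference [INPUT_ARRAY, "$matches"]: input values with no stored match.
// $setDifference treats its arguments as sets, so duplicates are dropped here too.
const idsNotExist = [...new Set(INPUT_ARRAY)].filter((id) => !matches.includes(id));

console.log({ ids_not_exist: idsNotExist });
```

One caveat with the pipeline itself: if none of the input _ids exist, $match passes nothing to $group, which then outputs no document at all rather than { ids_not_exist: [1, 7, 15] }. An empty result therefore means none of the ids were found.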
Are you looking for $not?

Remove duplicate records from mongodb 4.0

I have a collection which has duplicate records. I am using MongoDB 4.0. How do I remove the duplicate records from the entire collection?
The records are inserted with the following structure:
{ item: "journal", qty: 25, size: 15, status: "A" }
All I need is to keep only one document per unique record.
You can group duplicated records using an aggregation pipeline:
db.theCollection.aggregate([
{$group: {_id: {item: "$item", qty: "$qty", size: "$size", status: "$status"}}},
{$project: {_id: 0, item: "$_id.item", qty: "$_id.qty", size: "$_id.size", status: "$_id.status"}},
{$out: "theCollectionWithoutDuplicates"}
])
After the pipeline runs, the theCollectionWithoutDuplicates collection contains one document for each group of original duplicates, with a new _id. Verify the output, then drop the original collection (db.theCollection.drop()) and rename the new one (db.theCollectionWithoutDuplicates.renameCollection('theCollection')). The drop and rename can be combined with db.theCollectionWithoutDuplicates.renameCollection('theCollection', true).
Explanation of the aggregation pipeline:
db.theCollection.aggregate([]) executes an aggregation pipeline, taking a list of stages to run in order
the $group stage groups documents by the fields specified in its _id field
the $project stage changes field names, flattening nested _id subdocuments produced by $group
the $out stage stores aggregation resulting documents into given collection
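The $group stage here keys each document on its four fields and keeps one document per key. A plain-JavaScript sketch of that idea, using made-up sample records (the _id values and the 'notebook' record are hypothetical):

```javascript
// Hypothetical sample: _id 1 and 2 are duplicates of each other
const records = [
  { _id: 1, item: 'journal', qty: 25, size: 15, status: 'A' },
  { _id: 2, item: 'journal', qty: 25, size: 15, status: 'A' },
  { _id: 3, item: 'notebook', qty: 50, size: 10, status: 'A' },
];

function dedupe(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Same compound key as the $group _id: { item, qty, size, status }
    const key = JSON.stringify([doc.item, doc.qty, doc.size, doc.status]);
    if (!groups.has(key)) {
      // Like the $project stage: keep only the grouped fields; the old _id is dropped
      groups.set(key, { item: doc.item, qty: doc.qty, size: doc.size, status: doc.status });
    }
  }
  return [...groups.values()];
}

const unique = dedupe(records);
console.log(unique);
```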
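You can also remove duplicated records using forEach. For each document, this deletes every matching document with a greater _id, so the one with the smallest _id is kept: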
db.collection.find({}, { item: 1, qty: 1, size: 1, status: 1 }).forEach(function(doc) {
db.collection.remove({_id: { $gt: doc._id }, item: doc.item, qty: doc.qty, size: doc.size, status: doc.status })
})
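I recently wrote code to delete duplicate documents from MongoDB; this should work. It assumes collection is a Mongoose model and "field" is the field whose value defines a duplicate (replace it with your own):
// Group documents by the field that defines a duplicate and keep only groups with more than one member
const query = [
  {
    $group: {
      _id: { field: "$field" },
      dups: { $addToSet: "$_id" }, // _ids of all documents in the group
      count: { $sum: 1 },
    },
  },
  {
    $match: { count: { $gt: 1 } },
  },
];

async function removeDuplicates(collection) {
  // Mongoose 5.x style; newer Mongoose versions return the cursor from .cursor() directly, without .exec()
  const cursor = collection.aggregate(query).cursor({ batchSize: 10 }).exec();
  await cursor.eachAsync(async (doc) => {
    // Keep one document per group; $addToSet order is not guaranteed, so which one is kept is arbitrary
    doc.dups.shift();
    // Await the delete so errors surface instead of being silently dropped
    await collection.deleteMany({ _id: { $in: doc.dups } });
  });
}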

MongoDB aggregation but not including certain items

I'm very new to MongoDB's aggregation framework, so I don't really know how to do this.
I have a data model that is structured like this:
{
name: String,
store: {
item1: Number,
item2: Number,
item3: Number,
item4: Number,
},
createdAt: Date
}
I want to return the average price of each item (item1 to item4). I'm trying this query:
db.commerces.aggregate([
{
$group: {
_id: "",
item1Avg: { $avg: "$store.item1"},
item2Avg: { $avg: "$store.item2"},
item3Avg: { $avg: "$store.item3"},
item4Avg: { $avg: "$store.item4"}
}
}
]);
The problem is that when an item has no price set, it's stored in the database as -1.
I don't want these values to pollute the average. Is there any way to make the aggregation only take prices > 0 into account?
A $match before $group is not a solution, because it filters whole documents and I want to return all the average prices.
Thank you!
EDIT: Here is an example of the input and desired output:
[{
name: 'name',
store: {
item1: 10,
item2: -1,
item3: 12,
item4: 3,
}
},
{
name: 'name2',
store: {
item1: 10,
item2: -1,
item3: -1,
item4: 2,
}
},...]
And the desired output:
{
item1Avg: 10,
item2Avg: 0,
item3Avg: 12,
item4Avg: 2.5
}
You need to $unwind the store, $match the values that meet your condition, then $group the ones that pass. Unfortunately, there is no way to $unwind an object, so you need to $project it into an array first:
db.commerces.aggregate([
{$project: {store:[
{item:{$literal:"item1"}, val:"$store.item1"},
{item:{$literal:"item2"}, val:"$store.item2"},
{item:{$literal:"item3"}, val:"$store.item3"},
{item:{$literal:"item4"}, val:"$store.item4"}
]}},
{$unwind:"$store"},
{$match: {"store.val":{$gt:0}}},
{$group: {_id:"$store.item", avg:{$avg:"$store.val"}}}
])
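A plain-JavaScript sketch of the same steps on the example input, useful for checking the expected numbers (averagePositive is a hypothetical helper name, not a MongoDB API):

```javascript
// Example input from the question
const commerces = [
  { name: 'name', store: { item1: 10, item2: -1, item3: 12, item4: 3 } },
  { name: 'name2', store: { item1: 10, item2: -1, item3: -1, item4: 2 } },
];

function averagePositive(docs) {
  const sums = new Map(); // item name -> { total, n }
  for (const doc of docs) {
    // $project the store into { item, val } pairs, then $unwind them
    for (const [item, val] of Object.entries(doc.store)) {
      if (!(val > 0)) continue; // $match: { "store.val": { $gt: 0 } }
      const s = sums.get(item) || { total: 0, n: 0 };
      s.total += val;
      s.n += 1;
      sums.set(item, s);
    }
  }
  // $group: { _id: "$store.item", avg: { $avg: "$store.val" } }
  return [...sums].map(([item, { total, n }]) => ({ _id: item, avg: total / n }));
}

const averages = averagePositive(commerces);
console.log(averages);
```

Note that item2 does not appear at all, because none of its values pass the $match, and the result is one document per item rather than a single document with item1Avg, item2Avg and so on. If you need item2Avg: 0 as in the desired output, that default has to be filled in afterwards.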
EDIT:
As @blakes-seven pointed out, this may not work on versions before 3.2. An alternative approach using $map may work:
db.commerces.aggregate([
{$project: {
store: {
$map:{
input:[
{item:{$literal:"item1"}, val:"$store.item1"},
{item:{$literal:"item2"}, val:"$store.item2"},
{item:{$literal:"item3"}, val:"$store.item3"},
{item:{$literal:"item4"}, val:"$store.item4"}
],
as: "i",
in: "$$i"
}
}
}},
{$unwind:"$store"},
{$match: {"store.val":{$gt:0}}},
{$group: {_id:"$store.item", avg:{$avg:"$store.val"}}}
])