I'm using the MongoDB shell to fetch some results, sorted. Here's a sample:
{
"_id" : "32022",
"topics" : [
{
"weight" : 281.58551703724993,
"words" : "some words"
},
{
"weight" : 286.6695125796183,
"words" : "some more words"
},
{
"weight" : 289.8354232846977,
"words" : "wowz even more wordz"
},
{
"weight" : 305.70093587160807,
"words" : "WORDZ"
}]
}
What I want to get is the same structure, but with the "topics" array sorted by weight, highest first:
{
"_id" : "32022",
"topics" : [
{
"weight" : 305.70093587160807,
"words" : "WORDZ"
},
{
"weight" : 289.8354232846977,
"words" : "wowz even more wordz"
},
{
"weight" : 286.6695125796183,
"words" : "some more words"
},
{
"weight" : 281.58551703724993,
"words" : "some words"
}
]
}
I managed to get some sorted results, but had no luck grouping them back by the _id field. Is there a way to do this?
MongoDB doesn't provide a way to do this out of the box, but there is a workaround: update your documents, using the $sort modifier of the $push update operator to sort the array in place.
db.collection.updateMany({}, {"$push": {"topics": {"$each": [], "$sort": {"weight": -1}}}})
You can still use the .aggregate() method like this:
db.collection.aggregate([
{"$unwind": "$topics"},
{"$sort": {"_id": 1, "topics.weight": -1}},
{"$group": {"_id": "$_id", "topics": {"$push": "$topics"}}}
])
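To make the three stages concrete, here is a rough in-memory replay of the same pipeline in plain JavaScript, run on the sample document from the question. This is only a sketch: unwindSortRegroup is a hypothetical helper that mimics the stage semantics and does not talk to MongoDB.

```javascript
// In-memory replay of:
//   $unwind "$topics"
//   $sort {_id: 1, "topics.weight": -1}
//   $group {_id: "$_id", topics: {$push: "$topics"}}
// unwindSortRegroup is a hypothetical helper, not a MongoDB API.
function unwindSortRegroup(docs) {
  // $unwind: one flat record per array element, carrying the parent _id
  const unwound = docs.flatMap(doc =>
    doc.topics.map(topic => ({ _id: doc._id, topics: topic }))
  );

  // $sort: _id ascending, then weight descending
  unwound.sort((a, b) => {
    if (a._id < b._id) return -1;
    if (a._id > b._id) return 1;
    return b.topics.weight - a.topics.weight;
  });

  // $group + $push: rebuild one document per _id; $push appends in the
  // order records arrive, which is why the $sort has to come first
  const groups = new Map();
  for (const rec of unwound) {
    if (!groups.has(rec._id)) groups.set(rec._id, { _id: rec._id, topics: [] });
    groups.get(rec._id).topics.push(rec.topics);
  }
  return [...groups.values()];
}

const docs = [{
  _id: "32022",
  topics: [
    { weight: 281.58551703724993, words: "some words" },
    { weight: 286.6695125796183, words: "some more words" },
    { weight: 289.8354232846977, words: "wowz even more wordz" },
    { weight: 305.70093587160807, words: "WORDZ" }
  ]
}];

console.log(unwindSortRegroup(docs)[0].topics.map(t => t.words).join(" > "));
// WORDZ > wowz even more wordz > some more words > some words
```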
But this is less efficient if all you want is to sort the array, so you definitely shouldn't do that in that case.
You could always do this client side using the .sort or sorted function.
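A minimal client-side sketch, assuming the documents have already been fetched into an array (for example with a cursor's toArray()). The function name sortTopicsByWeightDesc is hypothetical; it uses Array.prototype.sort with a numeric comparator, giving the same ordering as {weight: -1} without modifying the stored documents.

```javascript
// Sort every document's topics array in place, highest weight first.
// b.weight - a.weight is negative when a is heavier, which places a first.
function sortTopicsByWeightDesc(docs) {
  for (const doc of docs) {
    doc.topics.sort((a, b) => b.weight - a.weight);
  }
  return docs;
}

// Sample data taken from the question (order shuffled)
const docs = [{
  _id: "32022",
  topics: [
    { weight: 281.58551703724993, words: "some words" },
    { weight: 305.70093587160807, words: "WORDZ" },
    { weight: 286.6695125796183, words: "some more words" }
  ]
}];

sortTopicsByWeightDesc(docs);
console.log(docs[0].topics.map(t => t.words).join(", "));
// WORDZ, some more words, some words
```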
If you don't want to update the documents but only retrieve them, you can use the following query:
db.test.aggregate(
[
{$unwind : "$topics"},
{$sort : {"topics.weight":-1}},
{"$group": {"_id": "$_id", "topics": {"$push": "$topics"}}}
]
)
It works for me:
db.getCollection('mycollection').aggregate([
{$project:{topics:1}},
{$unwind:"$topics"},
{$sort :{"topics.words":1}}
])
Related
I have the collection below and need to find duplicate records in Mongo. How can we find them? Below is one sample document; the collection has more than 10,000 records.
/* 1 */
{
"_id" : 1814099,
"eventId" : "LAS012",
"eventName" : "CustomerTab",
"timeStamp" : ISODate("2018-12-31T20:09:09.820Z"),
"eventMethod" : "click",
"resourceName" : "CustomerTab",
"targetType" : "",
"resourseUrl" : "",
"operationName" : "",
"functionStatus" : "",
"results" : "",
"pageId" : "CustomerPage",
"ban" : "290824901",
"jobId" : "87377713",
"wrid" : "87377713",
"jobType" : "IBJ7FXXS",
"Uid" : "sc343x",
"techRegion" : "W",
"mgmtReportingFunction" : "N",
"recordPublishIndicator" : "Y",
"__v" : 0
}
We can first find the unique ids (one _id per eventId) using:
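// aggregate() returns a cursor, so collect the results with toArray().
// The second $group emits a single document, destructured into `data`.
const [data] = await db.collection.aggregate([
{
$group: {
_id: "$eventId",
id: {
"$first": "$_id"
}
}
},
{
$group: {
_id: null,
uniqueIds: {
$push: "$id"
}
}
}
]).toArray();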
Then we can run another query that finds all the duplicate documents:
db.collection.find({_id: {$nin: data.uniqueIds}})
This will find all the documents that are redundant.
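The same two-step idea can be sketched in plain JavaScript over an in-memory array: keep the first _id seen for each eventId (like $first), then treat every document whose _id was not kept as redundant (like $nin). findRedundant is a hypothetical helper and the sample records are made up. Note that $first without a preceding $sort takes whichever document the server sees first, so which copy counts as the "original" is not guaranteed.

```javascript
// Hypothetical helper mirroring:
//   $group {_id: "$eventId", id: {$first: "$_id"}}
//   then find({_id: {$nin: uniqueIds}})
function findRedundant(docs) {
  const firstIdByEvent = new Map();
  for (const doc of docs) {
    // $first: remember only the first _id encountered for each eventId
    if (!firstIdByEvent.has(doc.eventId)) firstIdByEvent.set(doc.eventId, doc._id);
  }
  const uniqueIds = new Set(firstIdByEvent.values());
  // $nin: every document not chosen as a representative is redundant
  return docs.filter(doc => !uniqueIds.has(doc._id));
}

// Made-up sample records (only the fields the grouping uses)
const events = [
  { _id: 1814099, eventId: "LAS012" },
  { _id: 1814100, eventId: "LAS013" },
  { _id: 1814101, eventId: "LAS012" }
];

console.log(findRedundant(events).map(d => d._id)); // [ 1814101 ]
```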
Another way, to find the event ids that are duplicated:
db.collection.aggregate([
{"$group" : { "_id": "$eventId", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }
])
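In plain JavaScript, the same grouping and filtering amounts to counting occurrences per eventId and keeping the keys seen more than once, skipping the null key just as the $ne: null condition does. countDuplicateEventIds and the sample array are hypothetical.

```javascript
// Mirrors:
//   $group {_id: "$eventId", count: {$sum: 1}}
//   $match {_id: {$ne: null}, count: {$gt: 1}}
function countDuplicateEventIds(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // A missing field groups under null in MongoDB, so normalize undefined to null
    const key = doc.eventId === undefined ? null : doc.eventId;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .filter(([id, count]) => id !== null && count > 1)
    .map(([id, count]) => ({ _id: id, count }));
}

// Made-up sample records
const events = [
  { _id: 1, eventId: "LAS012" },
  { _id: 2, eventId: "LAS012" },
  { _id: 3, eventId: "LAS013" },
  { _id: 4 }, // no eventId: grouped under null and filtered out
  { _id: 5 }
];

console.log(countDuplicateEventIds(events)); // [ { _id: 'LAS012', count: 2 } ]
```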
To get duplicates from the database, you need only the groups that have a count greater than one, so use the $match stage to filter the results. Within $match, test the count field with the $gt ("greater than") operator and the number 1. This looks like the following:
db.collection.aggregate([
{$group: {
_id: {eventId: "$eventId"},
uniqueIds: {$addToSet: "$_id"},
count: {$sum: 1}
}
},
{$match: {
count: {"$gt": 1}
}
}
]);
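$addToSet collects each distinct _id once per group, which is what shows exactly which documents share an eventId. A plain JavaScript sketch of that grouping, using a Set per key; groupDuplicates and the sample records are hypothetical. MongoDB does not guarantee the order of elements produced by $addToSet, whereas a JavaScript Set keeps insertion order.

```javascript
// Mirrors:
//   $group {_id: {eventId: "$eventId"}, uniqueIds: {$addToSet: "$_id"}, count: {$sum: 1}}
//   $match {count: {$gt: 1}}
function groupDuplicates(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.eventId)) {
      groups.set(doc.eventId, { _id: { eventId: doc.eventId }, uniqueIds: new Set(), count: 0 });
    }
    const g = groups.get(doc.eventId);
    g.uniqueIds.add(doc._id); // $addToSet: a value already in the set is ignored
    g.count += 1;             // $sum: 1 counts every document in the group
  }
  return [...groups.values()]
    .filter(g => g.count > 1)
    .map(g => ({ ...g, uniqueIds: [...g.uniqueIds] }));
}

const events = [
  { _id: 10, eventId: "LAS012" },
  { _id: 11, eventId: "LAS012" },
  { _id: 12, eventId: "LAS013" }
];

console.log(JSON.stringify(groupDuplicates(events)));
// [{"_id":{"eventId":"LAS012"},"uniqueIds":[10,11],"count":2}]
```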
I assume that eventId is meant to be a unique id.
I have a collection like this:
{
_id : 123,
username : "xy",
comments : [
{
text : "hi",
postdate : 123456789
},
{
text : "hi1",
postdate : 555555555
},
{
text : "hi2",
postdate : 666666666
},
{
text : "hi3",
postdate : 987654321
}
]}
Now I want only the comments that have postdate 555555555 or higher and 987654321 or lower. I have this query, but it doesn't work:
db.post.aggregate([
{$match : {$and : [
{"_id": ObjectId("123")},
{"comments.posttime" : {$lte : 987654321}},
{"comments.posttime" : {$gte : 555555555}}
]}}
,{$unwind: "$comments"}]).pretty();
But when I try this, it returns all of the array elements. How should this be done?
Thank you!
Use $redact to restrict the contents of the document:
db.an.aggregate([{
$redact: {
"$cond": [{
$and: [{
"$gte": [{
"$ifNull": ["$postdate", 555555555]
},
555555555
]
}, {
"$lte": [{
"$ifNull": ["$postdate", 987654321]
},
987654321
]
}]
}, "$$DESCEND", "$$PRUNE"]
}
}]).pretty()
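The trick here is the $ifNull: the top-level document has no postdate, so it defaults to a value inside the range, passes the test, and $$DESCEND moves on to the embedded comment documents, which are then tested on their own postdate. The following in-memory sketch reproduces that recursion; redact, keepLevel and isPlainObject are hypothetical helpers. One consequence worth noting: any embedded document without a postdate is kept as well.

```javascript
const MIN = 555555555;
const MAX = 987654321;

// Reproduces the $cond test, including the $ifNull defaults
function keepLevel(node) {
  const lo = node.postdate ?? MIN; // $ifNull: ["$postdate", 555555555]
  const hi = node.postdate ?? MAX; // $ifNull: ["$postdate", 987654321]
  return lo >= MIN && hi <= MAX;
}

function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Returns the redacted node, or undefined when the node is pruned.
function redact(node) {
  if (!keepLevel(node)) return undefined; // $$PRUNE
  const out = {};
  for (const [key, value] of Object.entries(node)) { // $$DESCEND
    if (isPlainObject(value)) {
      const r = redact(value);
      if (r !== undefined) out[key] = r;
    } else if (Array.isArray(value)) {
      // embedded documents inside arrays are evaluated one by one
      out[key] = value
        .map(v => (isPlainObject(v) ? redact(v) : v))
        .filter(v => v !== undefined);
    } else {
      out[key] = value;
    }
  }
  return out;
}

const post = {
  _id: 123,
  username: "xy",
  comments: [
    { text: "hi", postdate: 123456789 },
    { text: "hi1", postdate: 555555555 },
    { text: "hi2", postdate: 666666666 },
    { text: "hi3", postdate: 987654321 }
  ]
};

console.log(redact(post).comments.map(c => c.text).join(", ")); // hi1, hi2, hi3
```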
You have to unwind the comments first and then do the match, so that the comments array is flattened and the match condition can filter individual comments.
db.post.aggregate([
{$unwind: "$comments"},
{$match : {$and : [
{"_id": 123},
{"comments.postdate" : {$lte : 987654321}},
{"comments.postdate" : {$gte : 555555555}}
]}}
])
This gives one document per matching comment. If you want the matching comments back inside an array, add a $group stage on _id that uses $push to collect the comments.
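The unwind, match and regroup steps can be sketched in memory as follows. filterComments is a hypothetical helper; as in MongoDB, a post with no comment in range disappears entirely, and other top-level fields such as username are dropped unless you carry them through the $group (for example with $first).

```javascript
// Mirrors:
//   $unwind "$comments"
//   $match on comments.postdate (inclusive range)
//   $group {_id: "$_id", comments: {$push: "$comments"}}
function filterComments(posts, min, max) {
  // $unwind: one record per comment, carrying the parent _id
  const unwound = posts.flatMap(p => p.comments.map(c => ({ _id: p._id, comments: c })));
  // $match: keep only comments inside the range
  const matched = unwound.filter(r => r.comments.postdate >= min && r.comments.postdate <= max);
  // $group + $push: collect the surviving comments back into one array per post
  const grouped = new Map();
  for (const r of matched) {
    if (!grouped.has(r._id)) grouped.set(r._id, { _id: r._id, comments: [] });
    grouped.get(r._id).comments.push(r.comments);
  }
  return [...grouped.values()];
}

const posts = [{
  _id: 123,
  username: "xy",
  comments: [
    { text: "hi", postdate: 123456789 },
    { text: "hi1", postdate: 555555555 },
    { text: "hi2", postdate: 666666666 },
    { text: "hi3", postdate: 987654321 }
  ]
}];

const [filtered] = filterComments(posts, 555555555, 987654321);
console.log(filtered.comments.map(c => c.text).join(", ")); // hi1, hi2, hi3
```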
I'm new to MongoDB, and I have a document with this structure:
{
"game" : "chess",
"name" : "Chess",
"prizes" : [
{
"_id" : ObjectId("562575c61d41c81efce7bc1b"),
"group" : 0,
"name" : "10 coins",
"pos" : "0"
},
{
"_id" : ObjectId("562575c61d41c81efce7bc1c"),
"group" : 0,
"name" : "10 coins",
"pos" : "1"
},
{
"_id" : ObjectId("562575c61d41c81efce7bc1d"),
"group" : 1,
"name" : "20 coins",
"pos" : "2"
}
]
}
I need to get something like this:
{
"game" : "chess",
"name" : "Chess",
"prizes" : [
{
"name" : "10 coins",
"group" : 0
},
{
"name" : "20 coins",
"group" : 1
}
]
}
I tried aggregate with $project and $unwind, but I can't get the structure I need. Thanks.
Basically, I'm unwinding and then reducing the prizes array to just the fields you want, based on your question's desired output. Then I group everything together by game and name, using $addToSet to insert unique entries of group and name into the array. I'm assuming there are no duplicate games, but I think it would work anyway, since they would all just get grouped together.
db.docs.aggregate([
{$unwind : '$prizes'},
{$project: {name: '$name', game: '$game', prizes: {group: '$prizes.group', name: '$prizes.name'}}},
{$group: {_id: {name: '$name', game: '$game'}, prizes: {$addToSet: '$prizes'}}},
])
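An in-memory sketch of this pipeline, assuming the documents are already in an array. distinctPrizesByGame is a hypothetical helper, and the ObjectIds are replaced by plain strings for brevity. $addToSet compares whole embedded documents, so the sketch uses a JSON string of the projected prize as its equality key; that only works because every prize is built with the same field order.

```javascript
// Mirrors:
//   $unwind "$prizes"
//   $project {name, game, prizes: {group, name}}
//   $group {_id: {name, game}, prizes: {$addToSet: "$prizes"}}
function distinctPrizesByGame(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // $unwind drops documents whose array is missing or empty
    if (!Array.isArray(doc.prizes) || doc.prizes.length === 0) continue;
    const groupKey = JSON.stringify({ name: doc.name, game: doc.game });
    if (!groups.has(groupKey)) {
      groups.set(groupKey, { _id: { name: doc.name, game: doc.game }, seen: new Set(), prizes: [] });
    }
    const g = groups.get(groupKey);
    for (const prize of doc.prizes) {                              // $unwind
      const projected = { group: prize.group, name: prize.name }; // $project
      const prizeKey = JSON.stringify(projected);
      if (!g.seen.has(prizeKey)) {                                  // $addToSet
        g.seen.add(prizeKey);
        g.prizes.push(projected);
      }
    }
  }
  return [...groups.values()].map(({ _id, prizes }) => ({ _id, prizes }));
}

const games = [{
  game: "chess",
  name: "Chess",
  prizes: [
    { _id: "bc1b", group: 0, name: "10 coins", pos: "0" },
    { _id: "bc1c", group: 0, name: "10 coins", pos: "1" },
    { _id: "bc1d", group: 1, name: "20 coins", pos: "2" }
  ]
}];

console.log(JSON.stringify(distinctPrizesByGame(games)));
// [{"_id":{"name":"Chess","game":"chess"},"prizes":[{"group":0,"name":"10 coins"},{"group":1,"name":"20 coins"}]}]
```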
You can try the following: use $map to transform the shape, combined with a set operator such as $setUnion or $setDifference, which is needed only to filter out duplicates.
db.collection.aggregate([{
"$project": {
"game": 1,
"name": 1,
"prizes": {
"$setUnion": [{
"$map": {
"input": "$prizes",
"as": "prize",
in: {
"name": "$$prize.name",
"group": "$$prize.group"
}
}
},
[]
]
}
}
}]).pretty();
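Per document, this projection reshapes and deduplicates the array without unwinding. A plain JavaScript sketch of the same idea, with the hypothetical helper reshapePrizes; unlike $setUnion, whose output element order is unspecified, this sketch keeps the first-seen order.

```javascript
// Mirrors: $project {game: 1, name: 1,
//   prizes: {$setUnion: [{$map: {input: "$prizes", as: "prize", in: {name, group}}}, []]}}
function reshapePrizes(doc) {
  // $map: keep only name and group from each prize
  const mapped = doc.prizes.map(p => ({ name: p.name, group: p.group }));
  // $setUnion with an empty array: drop duplicate entries
  const seen = new Set();
  const prizes = mapped.filter(p => {
    const key = JSON.stringify(p); // same fields in the same order give the same key
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  return { game: doc.game, name: doc.name, prizes };
}

const chess = {
  game: "chess",
  name: "Chess",
  prizes: [
    { _id: "bc1b", group: 0, name: "10 coins", pos: "0" },
    { _id: "bc1c", group: 0, name: "10 coins", pos: "1" },
    { _id: "bc1d", group: 1, name: "20 coins", pos: "2" }
  ]
};

console.log(JSON.stringify(reshapePrizes(chess)));
// {"game":"chess","name":"Chess","prizes":[{"name":"10 coins","group":0},{"name":"20 coins","group":1}]}
```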
The question is: calculate the average age of the users who have more than 3 strengths listed.
One of the documents looks like this:
{
"_id" : 1.0,
"user_id" : "jshaw0",
"first_name" : "Judy",
"last_name" : "Shaw",
"email" : "jshaw0#merriam-webster.com",
"age" : 39.0,
"status" : "disabled",
"join_date" : "2016-09-05",
"last_login_date" : "2016-09-30 23:59:36 -0400",
"address" : {
"city" : "Deskle",
"province" : "PEI"
},
"strengths" : [
"star schema",
"dw planning",
"sql",
"mongo queries"
],
"courses" : [
{
"code" : "CSIS2300",
"total_questions" : 118.0,
"correct_answers" : 107.0,
"incorect_answers" : 11.0
},
{
"code" : "CSIS3300",
"total_questions" : 101.0,
"correct_answers" : 34.0,
"incorect_answers" : 67.0
}
]
}
I know I need to count how many strengths each document has, compare that count using $gt, and then calculate the average age.
However, I don't know how to combine the two operations, counting and averaging, in one query. Do I need to use aggregation? If so, how?
Thanks so much
Use $redact to match on the array size and $group to calculate the average:
db.collection.aggregate([{
"$redact": {
"$cond": [
{ "$gt": [{ "$size": "$strengths" }, 3] },
"$$KEEP",
"$$PRUNE"
]
}
}, {
$group: {
_id: 1,
average: { $avg: "$age" }
}
}])
The $redact stage checks whether the size of the strengths array is greater than 3: it will $$KEEP documents that match this condition and $$PRUNE those that don't. See the $redact documentation.
The $group stage then simply computes the average with $avg.
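For a quick sanity check of the logic, here is the same filter-then-average in plain JavaScript. averageAgeWithManyStrengths is a hypothetical helper, and apart from the first record (from the question) the users are made up. The Array.isArray guard is an addition for the sketch; in MongoDB, $size raises an error if strengths is missing or not an array.

```javascript
// Mirrors:
//   $redact: $$KEEP when {$size: "$strengths"} > 3, otherwise $$PRUNE
//   $group {_id: 1, average: {$avg: "$age"}}
function averageAgeWithManyStrengths(users) {
  const kept = users.filter(u => Array.isArray(u.strengths) && u.strengths.length > 3);
  if (kept.length === 0) return null; // $group emits no document when no input reaches it
  return kept.reduce((sum, u) => sum + u.age, 0) / kept.length;
}

const users = [
  { user_id: "jshaw0", age: 39, strengths: ["star schema", "dw planning", "sql", "mongo queries"] },
  { user_id: "user2", age: 25, strengths: ["sql", "excel"] },         // made up: only 2 strengths
  { user_id: "user3", age: 41, strengths: ["a", "b", "c", "d", "e"] } // made up: 5 strengths
];

console.log(averageAgeWithManyStrengths(users)); // 40
```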
I have the following MongoDB data model:
{
"_id" : ObjectId("53725814740fd6d2ee0ca2bb"),
"date" : "2014-01-01",
"establishmentId" : 1,
"products" : [
{
"productId" : 1,
"price" : 7.03,
"someOtherInfo" : 325,
"somethingElse" : 6878
},
{
"productId" : 2,
"price" : 4.6,
"someOtherInfo" : 243,
"somethingElse" : 1757
},
{
"productId" : 3,
"price" : 2.14,
"someOtherInfo" : 610,
"somethingElse" : 5435
},
{
"productId" : 4,
"price" : 1.45,
"someOtherInfo" : 627,
"somethingElse" : 5762
},
{
"productId" : 5,
"price" : 3.9,
"someOtherInfo" : 989,
"somethingElse" : 3752
}
]
}
What is the fastest way to get the average price across all establishments? Is there a better data model to achieve this?
An aggregation operation should handle this well. I'd suggest looking into the $unwind operation.
Something along these lines should work (just as an example):
db.collection.aggregate(
{$match: {<query parameters>}},
{$unwind: "$products"},
{
$group: {
_id: "<blank or field(s) to group by before averaging>",
$avg: "$price"
}
}
);
An aggregation built in this style should produce a JSON object that has the data you want.
Due to the gross syntax errors in the other answers provided, the more direct answer is:
db.collection.aggregate([
{ "$unwind": "$products" },
{ "$group": {
"_id": null,
"avgprice": { "$avg": "$products.price" }
}}
])
The usage of the aggregation framework here is to first $unwind the array, which is a way to "de-normalize" the content in the array into separate documents.
Then, in the $group stage, you pass null as the _id, which means "group everything", and pass "$products.price" (note the dot notation) to the $avg operator to return the overall average across all of the sub-document entries in all of the documents in the collection.
See the full operator reference for more information.
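To see what "average across all of the sub-document entries" means in practice, here is an in-memory sketch using the prices from the question. averagePrice is a hypothetical helper; because every product entry counts once, the result is the mean over all entries, not a mean of per-document means.

```javascript
// Mirrors:
//   $unwind "$products"
//   $group {_id: null, avgprice: {$avg: "$products.price"}}
function averagePrice(docs) {
  const prices = docs.flatMap(d => d.products.map(p => p.price)); // $unwind
  if (prices.length === 0) return null;
  return prices.reduce((sum, price) => sum + price, 0) / prices.length; // $avg
}

const establishments = [{
  date: "2014-01-01",
  establishmentId: 1,
  products: [
    { productId: 1, price: 7.03 },
    { productId: 2, price: 4.6 },
    { productId: 3, price: 2.14 },
    { productId: 4, price: 1.45 },
    { productId: 5, price: 3.9 }
  ]
}];

console.log(averagePrice(establishments).toFixed(3)); // 3.824
```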
The best solution I found was:
db.collection.aggregate([
{$match:{date:{$gte:"2014-01-01",$lte:"2014-01-31"},establishmentId:{$in:[1,2,3,4,5,6]}}},
{ "$unwind": "$products" },
{ "$group": {
"_id": {date:"$date",product:"$products.productId"},
"avgprice": { "$avg": "$products.price" }
}}
])
I also found that it is much better to $match first and then $unwind, so there are fewer documents to unwind. This makes the overall pipeline faster.
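An in-memory sketch of that pipeline, with the filter applied before the array is expanded. avgPriceByDateAndProduct is a hypothetical helper and the sample documents are made up. Plain string comparison works for the date range because "YYYY-MM-DD" strings sort in chronological order.

```javascript
// Mirrors:
//   $match {date: {$gte, $lte}, establishmentId: {$in: [...]}}
//   $unwind "$products"
//   $group {_id: {date, product: productId}, avgprice: {$avg: "$products.price"}}
function avgPriceByDateAndProduct(docs, fromDate, toDate, establishmentIds) {
  // $match first, so fewer documents are expanded by the unwind
  const matched = docs.filter(d =>
    d.date >= fromDate && d.date <= toDate && establishmentIds.includes(d.establishmentId)
  );
  const acc = new Map();
  for (const d of matched) {
    for (const p of d.products) { // $unwind
      const key = JSON.stringify({ date: d.date, product: p.productId }); // compound _id
      const entry = acc.get(key) || { sum: 0, n: 0 };
      entry.sum += p.price;
      entry.n += 1;
      acc.set(key, entry);
    }
  }
  // $avg per group
  return [...acc].map(([key, { sum, n }]) => ({ _id: JSON.parse(key), avgprice: sum / n }));
}

const docs = [
  { date: "2014-01-01", establishmentId: 1, products: [{ productId: 1, price: 2.5 }, { productId: 2, price: 4 }] },
  { date: "2014-01-01", establishmentId: 2, products: [{ productId: 1, price: 3.5 }] },
  { date: "2014-02-01", establishmentId: 1, products: [{ productId: 1, price: 100 }] }, // outside the date range
  { date: "2014-01-01", establishmentId: 9, products: [{ productId: 1, price: 100 }] }  // establishment not listed
];

const result = avgPriceByDateAndProduct(docs, "2014-01-01", "2014-01-31", [1, 2, 3, 4, 5, 6]);
console.log(JSON.stringify(result));
// [{"_id":{"date":"2014-01-01","product":1},"avgprice":3},{"_id":{"date":"2014-01-01","product":2},"avgprice":4}]
```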