I have a simple query that's giving me unexpected results when I run it against a public MongoDB server:
mongodb://steemit:steemit@mongo1.steemdata.com:27017/SteemData
db.getCollection('Posts')
.aggregate([
{ $match: { "created" : { $gte: ISODate("2017-10-01T00:00:00.000Z"), $lt: ISODate("2017-10-30T00:00:00.000Z") } } },
{ $group: { "_id": "author", "itemcount": { "$sum": 1 } } },
{ $sort: { "itemcount": -1 } }
])
I'm trying to get a count of the number of articles by author for the month of October.
I would hope to see a result set something like,
Bob 10
Joe 9
Sam 3
Tim 1
Instead I'm getting,
author 23
Can someone explain what's wrong with my aggregation pipeline?
Thanks
There is a small issue with your group stage; the name of the field in the group stage ('author') should be prefixed with a dollar sign, like in this example in the documentation for $group.
Try updating the group stage to the following:
{ $group: { "_id": "$author", "itemcount": { "$sum": 1 } } },
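To see why the dollar sign matters, here is a plain JavaScript sketch (no MongoDB involved) of what the two group keys do. The groupCount helper and the sample posts are made up for illustration; only the "author" versus "$author" distinction comes from the answer above.

```javascript
// Hypothetical in-memory stand-in for
// { $group: { _id: <key>, itemcount: { $sum: 1 } } } followed by a descending sort.
// A key starting with "$" is read as a field path; anything else is a constant.
function groupCount(docs, key) {
  const counts = new Map();
  for (const doc of docs) {
    const id = typeof key === "string" && key.startsWith("$")
      ? doc[key.slice(1)]
      : key;
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts]
    .map(([_id, itemcount]) => ({ _id, itemcount }))
    .sort((a, b) => b.itemcount - a.itemcount); // like { $sort: { itemcount: -1 } }
}

// Made-up sample data.
const posts = [
  { author: "Bob" }, { author: "Joe" }, { author: "Bob" }, { author: "Sam" },
];

const literal = groupCount(posts, "author");  // one group keyed by the string "author"
const byField = groupCount(posts, "$author"); // one group per distinct author value
console.log(literal);
console.log(byField);
```

With the constant key every document lands in the same group, which matches the single "author 23" row in the question.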
Seems it's nothing to do with the query...
This one was run at 5:32pm (ignore the single quotes, I tried with both)
And then this one a few minutes later at 5:41pm
I have a collection "TokenBalance" holding documents with this structure:
{
_id:"SvVV1qdUcxNwSnSgxw6EG125"
balance:Array
address:"0x6262998ced04146fa42253a5c0af90ca02dfd2a3"
timestamp:1648156174658
_created_at:2022-03-24T21:09:34.737+00:00
_updated_at:2022-03-24T21:09:34.737+00:00
}
Each address has multiple documents with the structure above, one per timestamp.
So address X can have 1000 objects with different timestamps.
What I want is to only get the last created documents per address but also pass all the document fields into the next stage which is where I am stuck. I don't even know if the way I am grouping is correctly done with the $last operator. I would appreciate some guidance on how to achieve this task.
What I have is this
$group stage (1st stage)
{
_id: '$address',
timestamp: {$last: '$timestamp'}
}
This gives me a result of
_id:"0x6262998ced04146fa42253a5c0af90ca02dfd2a3"
timestamp:1648193827320
But I want the other fields of each document as well so I can further process them.
Questions
1) Is it the correct way to get the last created document per "address" field?
2) How can I get the other fields into the result of that group stage?
Use $denseRank
db.collection.aggregate([
{
$setWindowFields: {
partitionBy: "$address",
sortBy: { timestamp: -1 },
output: { rank: { $denseRank: {} } }
}
},
{
$match: { rank: 1 }
}
])
mongoplayground
I guess you mean this:
{ $group: {
_id: '$address',
timestamp: {$last: '$timestamp'},
data: { $push: "$$ROOT" }
} }
If the latest timestamp is also the last document when sorted by _id, you can use something like this:
[{$sort: {_id: 1}}, {$group: {
_id: '$address',
latest: {
$last: '$$ROOT'
}
}}, {$replaceRoot: {
newRoot: '$latest'
}}]
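The idea shared by these answers, keeping the whole newest document for each address, can be sketched in plain JavaScript without a database. The latestPerAddress helper and the sample documents are hypothetical; the field names address and timestamp come from the question.

```javascript
// In-memory illustration only: keep the document with the highest
// timestamp for each address, carrying every field along.
function latestPerAddress(docs) {
  const latest = new Map();
  for (const doc of docs) {
    const seen = latest.get(doc.address);
    if (seen === undefined || doc.timestamp > seen.timestamp) {
      latest.set(doc.address, doc); // the full document is kept, like $$ROOT
    }
  }
  return [...latest.values()];
}

// Made-up sample data in the shape described in the question.
const balances = [
  { _id: "a1", address: "0xaaa", timestamp: 100, balance: [1] },
  { _id: "a2", address: "0xaaa", timestamp: 300, balance: [3] },
  { _id: "b1", address: "0xbbb", timestamp: 200, balance: [2] },
  { _id: "a3", address: "0xaaa", timestamp: 200, balance: [9] },
];

const newest = latestPerAddress(balances);
console.log(newest);
```

Note that the winner here is chosen by comparing timestamps explicitly, so the result does not depend on the order in which documents arrive.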
Let's say I have a collection of blog posts with the fields "_id, userName, age". A user could have made more than one blog post, and I want to find the users that have made 4 posts.
db.blogs.aggregate([{$group: {_id: {"$userName", "age"}, : {$sum: ""eq", 3}}])
To find out how many times a field value occurs, use a $group stage to group by that field and add an extra field for the count using the $count operator (available as a $group accumulator since MongoDB 5.0). Then, if you want to filter by that count, just add a $match stage that filters on the new field:
{
$group: {
"_id": "$userName",
"count": {
$count: {}
}
}
},
{
$match: {
"count": {
$gte: 4
}
}
}
Mongo playground
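As a rough illustration of the group-then-filter idea, here is the same logic in plain JavaScript with no MongoDB involved. The usersWithAtLeast helper and the sample posts are assumptions made for this sketch; userName is the field named in the question.

```javascript
// In-memory illustration only: count posts per userName (the $group step),
// then keep the users whose count reaches the threshold (the $match step).
function usersWithAtLeast(posts, minPosts) {
  const counts = new Map();
  for (const post of posts) {
    counts.set(post.userName, (counts.get(post.userName) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= minPosts)
    .map(([_id, count]) => ({ _id, count }));
}

// Made-up sample data: ann has 4 posts, ben has 2, cat has 5.
const blogPosts = [
  ...Array.from({ length: 4 }, () => ({ userName: "ann", age: 30 })),
  ...Array.from({ length: 2 }, () => ({ userName: "ben", age: 41 })),
  ...Array.from({ length: 5 }, () => ({ userName: "cat", age: 25 })),
];

const frequentUsers = usersWithAtLeast(blogPosts, 4);
console.log(frequentUsers);
```

To match users with exactly 4 posts rather than 4 or more, the comparison would be an equality check instead, which in the pipeline corresponds to matching on count: 4.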
I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format displayed above, you can split the _id into two parts (on the dot character) and use aggregation to find the maximum ts_utc for each distinct first (numeric) part.
That way you can do it in one shot, instead of iterating over each group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
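What the split-and-max pipeline computes can be checked with a small plain JavaScript sketch, shown here without MongoDB. The helper name is hypothetical and the data is the sample from the question. It relies on the timestamps being ISO 8601 strings in one consistent format, so that string comparison agrees with chronological order.

```javascript
// In-memory illustration only: take the part of _id before the first dot
// as the group key and keep the largest ts_utc string seen for each key.
function latestTimestampPerPrefix(docs) {
  const max = new Map();
  for (const doc of docs) {
    const prefix = doc._id.split(".")[0];
    const current = max.get(prefix);
    if (current === undefined || doc.ts_utc > current) {
      max.set(prefix, doc.ts_utc);
    }
  }
  return [...max].map(([_id, ts]) => ({ _id, max: ts }));
}

// Sample data copied from the question.
const records = [
  { _id: "42.abc", ts_utc: "2019-05-27T23:43:16.963Z" },
  { _id: "42.def", ts_utc: "2019-05-27T23:43:17.055Z" },
  { _id: "69.abc", ts_utc: "2019-05-27T23:43:17.147Z" },
  { _id: "69.def", ts_utc: "2019-05-27T23:44:02.427Z" },
];

const latestByGroup = latestTimestampPerPrefix(records);
console.log(latestByGroup);
```

Like the pipeline above, this yields only the newest timestamp per group, not the full document.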
As @danh mentioned in the comment, the best approach is probably to add an auxiliary field that indicates the grouping. You can then index that auxiliary field to improve performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
Here is the Mongo playground for your reference.
I'm a MongoDB beginner and I have the following problem:
I have a document format (sorry for the lack of definition) as follows in MongoDB:
And I want to query the top 10 albums of the worst genre of a decade I choose.
First, I wrote an aggregation whose last stage gives me the worst genre of the chosen decade, to use for comparison later (BDA1 is my database and album is the collection I want to aggregate and query):
BDA1.album.aggregate(
[
{
$addFields: {
release_date: {
$toDate: "$release_date"
}
}
},
{
$addFields: {
sales_amount: {
$convert: {
input: "$sales_amount",
to: "int"
}
}
}
},
{
$match: {
"release_date": {
$gte: new ISODate("2009-01-01"),
$lt: new ISODate("2021-01-01")
}
}
},
{
$unwind: {
path: "$band.band_genre",
}
},
{
$group: {
_id: "$band.band_genre",
total: {
$sum: "$sales_amount"
}
}
},
{
$sort: {
total: 1
}
},
{
$limit: 1
}
])
(Sorry for the lack of good formatting but I took the code from a pipeline I used to do the aggregation in MongoDB Compass.)
That resulted in:
But my question now is: how do I use that aggregate result in what I can only assume is a find command where band.band_genre equals the genre I just calculated in the aggregation?
I have been searching SO and Google for a while with no results.
Any suggestions?
(If I have forgotten to mention anything that you feel is important for understanding the problem, please say so and I will edit it in.)
I'm a beginner in MongoDB and I'm practising the aggregation method. In my example, I would like to get the wines that have been produced in the last 5 years (5 years back from the newest wine), and then count how many wines were produced in that period (the database gives us the year of each wine as an integer).
I believe that first I have to sort the wines by year, then get the year of the newest wine and subtract five years, and use that period to count the wines. But I don't know how to write all of that using the aggregation code.
Thanks!
You need to use various aggregation pipeline stages to transform your data.
MongoDB’s aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.
As you have mentioned,
First you have to get the year of the newest wine.
I have used $group to group the data; $max is used to get newestWineYear, and each entire document ($$ROOT) is pushed to data using $push.
Stage1
{
$group: {
_id: null,
"newestWineYear": {
$max: "$year"
},
data: {
$push: "$$ROOT"
}
}
}
The output of the first stage contains the entire documents in an array, which we named data, along with newestWineYear.
So, in order to flatten the data array, $unwind is used.
Stage2
{
$unwind: "$data"
}
Get the count of the wines that have been produced in the last 5 years.
I have used $group to get the count, and count is obtained using $sum.
Stage3
{
$group: {
_id: null,
count: {
"$sum": {
$cond: [
{
"$gte": [
"$data.year",
{
"$subtract": [
"$newestWineYear",
5
]
}
]
},
1,
0
]
}
}
}
}
A condition is added to $sum so that only the wines produced in the last 5 years are counted.
The condition is expressed with $cond.
It says:
If "data.year" >= [ "$newestWineYear" - 5 ], then add 1 to count, else add 0
data.year is used because we pushed each whole wine document, including its year, into the data array in the first stage of the aggregation pipeline.
Final aggregation query can be found here: Playground
An alternative method can be found here; it has no $cond inside $group, and instead introduces a $match stage to keep only the wines that have been produced in the last 5 years.
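The arithmetic behind the condition can be sketched in plain JavaScript, with no MongoDB involved. The countRecentWines helper and the sample wines are assumptions made for illustration; the comparison year >= newest year - 5 is the one used in the pipeline above.

```javascript
// In-memory illustration only: find the newest year (the $max step),
// then count wines whose year is >= newest - span (the $cond inside $sum).
function countRecentWines(wines, span) {
  const newest = Math.max(...wines.map((w) => w.year));
  return wines.filter((w) => w.year >= newest - span).length;
}

// Made-up sample data: newest year is 2018, so the cutoff is 2013.
const wines = [
  { wine: "Red A", year: 2018 },
  { wine: "Red B", year: 2017 },
  { wine: "White A", year: 2013 },
  { wine: "White B", year: 2009 },
];

const recentCount = countRecentWines(wines, 5);
console.log(recentCount);
```

Because the comparison is inclusive, a wine from exactly 5 years before the newest one (2013 in this sample) is counted.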
As you guessed, you need to use mongo's aggregation framework.
Start with a simple pipeline with 3 simple steps:
group with $group to get the newest year plus all the data (returned as an array)
flatten the array back into single documents (the $unwind operator)
filter the documents with the $match operator
Finally, you can replace the root of your documents to have better formed results.
Imagine having data like this:
[
{
"wine": "Red xxx",
"year": 2018
},
{
"wine": "Red yyy",
"year": 2017
},
{
"wine": "Red zzz",
"year": 2017
},
{
"wine": "White 1",
"year": 2016
},
{
"wine": "White 2",
"year": 2013
},
{
"wine": "White 3",
"year": 2017
},
{
"wine": "White 4",
"year": 2009
}
]
Here's the pipeline and here you can see the results in playground.
db.collection.aggregate({
$group: {
_id: null,
"lastYear": {
$max: "$year"
},
data: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$data"
},
{
$match: {
$expr: {
$gte: [
"$data.year",
{
"$subtract": [
"$lastYear",
5
]
}
]
}
}
},
{
"$replaceWith": "$data"
})