How to find all the ['id'] by month in pymongo? - mongodb

I have a huge dataset consisting of documents with fields like this:
{"id":"f3fd1b6c",
"originalVersion":"v2",
"rotation":[{"0.5"},{"-0.5"},{"-0.5"},{"-0.5"}],
"scale":[{"1.0"},{"1.0"},{"1.0"}],
"translation":[{"-2.8820719718933105"},{"11.548246383666992"},{"0.0"}],
"timestamp":"2020-03-27T13:28:09.883+00:00"
}
I want to get the ids of all documents that were created in the same month.
So far I have tried using "find" with exact timestamp query
db.collection.find({'timestamp':date})
But I want to get all the elements that were created in the same month.

If you are going to search records by a given month, you can do a simple find with $month
db.collection.find({
$expr: {
$eq: [
{
$month: "$timestamp"
},
3
]
}
})
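Since the question is about pymongo, here is a minimal sketch of the same filter written as a Python dict, plus a range-based alternative for one specific month of one year. It assumes `timestamp` is stored as a BSON date rather than a string. The helper names `month_filter` and `month_range_filter` are made up for illustration, and the pymongo call is shown only as a comment so the snippet runs without a server.

```python
from datetime import datetime, timezone

def month_filter(month, field="timestamp"):
    """Match documents whose date field falls in the given month of any year."""
    return {"$expr": {"$eq": [{"$month": f"${field}"}, month]}}

def month_range_filter(year, month, field="timestamp"):
    """Match one specific month of one year with a half-open date range.
    A plain range like this can use an index on the field."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # First day of the following month (rolls over to January after December).
    end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    return {field: {"$gte": start, "$lt": end}}

# With a pymongo collection object (assumed to be called `collection`):
#   ids = [doc["id"] for doc in collection.find(month_filter(3), {"id": 1})]
march_any_year = month_filter(3)
march_2020 = month_range_filter(2020, 3)
```

The first filter matches March of every year; the second only matches March 2020.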
If you want to group by month and collect the ids of each month together, you can do it like this:
db.collection.aggregate([
{
$group: {
_id: {
"$month": "$timestamp"
},
idsToFetch: {
"$push": "$id"
}
}
}
])
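Note that grouping on `$month` alone puts March 2020 and March 2021 into the same bucket. If that is not wanted, one possible variation groups on year and month together. The sketch below writes that pipeline as a Python list for pymongo and mimics the result in memory on sample documents so the output shape is visible. The in-memory function is only an illustration, not what the server runs, and two of the three sample documents are invented.

```python
from collections import defaultdict
from datetime import datetime

# Pipeline as it could be passed to a pymongo collection's aggregate().
pipeline = [
    {"$group": {
        "_id": {"year": {"$year": "$timestamp"}, "month": {"$month": "$timestamp"}},
        "idsToFetch": {"$push": "$id"},
    }}
]

def group_ids_by_month(docs):
    """In-memory equivalent of the $group stage above, for illustration."""
    groups = defaultdict(list)
    for doc in docs:
        ts = doc["timestamp"]
        groups[(ts.year, ts.month)].append(doc["id"])
    return dict(groups)

# Hypothetical sample documents (only "f3fd1b6c" comes from the question).
sample = [
    {"id": "f3fd1b6c", "timestamp": datetime(2020, 3, 27, 13, 28, 9)},
    {"id": "aaaa0001", "timestamp": datetime(2020, 3, 2)},
    {"id": "bbbb0002", "timestamp": datetime(2021, 3, 5)},
]
by_month = group_ids_by_month(sample)
```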

Related

MongoDB get only the last documents per grouping based on field

I have a collection "TokenBalance" like this holding documents of this structure
{
_id:"SvVV1qdUcxNwSnSgxw6EG125"
balance:Array
address:"0x6262998ced04146fa42253a5c0af90ca02dfd2a3"
timestamp:1648156174658
_created_at:2022-03-24T21:09:34.737+00:00
_updated_at:2022-03-24T21:09:34.737+00:00
}
Each address has multiple documents of the structure above, with different timestamps.
So address X can have 1000 objects with different timestamps.
What I want is to only get the last created documents per address but also pass all the document fields into the next stage which is where I am stuck. I don't even know if the way I am grouping is correctly done with the $last operator. I would appreciate some guidance on how to achieve this task.
What I have is this
$group stage (1st stage)
{
_id: '$address',
timestamp: {$last: '$timestamp'}
}
This gives me a result of
_id:"0x6262998ced04146fa42253a5c0af90ca02dfd2a3"
timestamp:1648193827320
But I want the other fields of each document as well so I can further process them.
Questions
1) Is it the correct way to get the last created document per "address" field?
2) How can I get the other fields into the result of that group stage?
Use $denseRank with $setWindowFields (both require MongoDB 5.0 or newer):
db.collection.aggregate([
{
$setWindowFields: {
partitionBy: "$address",
sortBy: { timestamp: -1 },
output: { rank: { $denseRank: {} } }
}
},
{
$match: { rank: 1 }
}
])
I guess you mean this:
{ $group: {
_id: '$address',
timestamp: {$last: '$timestamp'},
data: { $push: "$$ROOT" }
} }
If the document with the latest timestamp is also the last one when sorted by _id, you can sort by _id and use something like this:
[{$sort: {_id: 1}}, {$group: {
_id: '$address',
latest: {
$last: '$$ROOT'
}
}}, {$replaceRoot: {
newRoot: '$latest'
}}]
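One way to answer the second question (keeping the other fields) without relying on insertion order is to sort explicitly on the timestamp and take the first whole document per group. Below is a sketch of that pipeline built in Python for pymongo. `latest_per_group_pipeline` is a hypothetical helper; the field names come from the question.

```python
def latest_per_group_pipeline(group_field, sort_field):
    """Build stages that keep the newest full document per group."""
    return [
        # Newest first, so $first below picks the latest document.
        {"$sort": {sort_field: -1}},
        # $$ROOT carries every field of the document into the group result.
        {"$group": {"_id": f"${group_field}", "doc": {"$first": "$$ROOT"}}},
        # Unwrap so later stages see the original document shape.
        {"$replaceRoot": {"newRoot": "$doc"}},
    ]

pipeline = latest_per_group_pipeline("address", "timestamp")
# With pymongo (collection object assumed): collection.aggregate(pipeline)
```

Further stages can be appended to the returned list, since each group now yields one complete document.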

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format shown above, you can split the _id into two parts (on the dot character) and use aggregation to find the maximum ts_utc for each value of the first (numeric) part.
That way you can do it in one shot instead of iterating over the groups.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
As @danh mentioned in the comments, the best approach is probably to add an auxiliary field that indicates the grouping. You can then index that field to improve performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
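The auxiliary field suggested in the comment could also be derived on the client when documents are written, so that no `$split` is needed at query time. A minimal sketch, assuming the prefix before the first dot identifies the group and that the new field is called `group` (both the field name and the helper `with_group_field` are assumptions):

```python
def with_group_field(doc):
    """Return a copy of doc with a 'group' field taken from the _id prefix."""
    prefix = doc["_id"].split(".", 1)[0]
    return {**doc, "group": prefix}

docs = [
    {"_id": "42.abc", "ts_utc": "2019-05-27T23:43:16.963Z"},
    {"_id": "42.def", "ts_utc": "2019-05-27T23:43:17.055Z"},
    {"_id": "69.abc", "ts_utc": "2019-05-27T23:43:17.147Z"},
    {"_id": "69.def", "ts_utc": "2019-05-27T23:44:02.427Z"},
]
tagged = [with_group_field(d) for d in docs]

# Once stored, the field can be indexed together with the timestamp, e.g. in
# pymongo: collection.create_index([("group", 1), ("ts_utc", -1)])
```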

MongoDB use $last with $cond

Is there any way to get the last non-zero value in an aggregation?
Scenario:
I have an events collection in which I store all the events from users. I want to fetch the list of users whose last purchased item cost $1.99 and who logged in at least once in the last week.
My events collection will have records like
{_id:ObjectId("58af54d5ab7df73d71822708"),uid:1,event:"login"}
{_id:ObjectId("58db7189296fdedde1c04bc1"),uid:2,event:"login"}
{_id:ObjectId("5888419bfa4b69dc4af7c76c"),uid:2,event:"purchase",amount:3}
{_id:ObjectId("5888419bfa4b69dc4af7d45c"),uid:1,event:"purchase",amount:1.9}
{_id:ObjectId("5888819bfa4b69dc4af7c76c"),uid:1,event:"custom",type:3,value:2}
What I am trying to do:
db.events.aggregate([
{
$group: {
_id: '$uid',
last_login: {
$max: {
$cond: [{
$eq: ['$event', 'login']
}, '$_id', 0]
}
},
last_amount: {
$last: {
$cond: [{
$eq: ['$event', 'purchase']
}, '$amount', 0]
}
}
}
}, {
$match: {
last_login: {
$gte: ObjectId("58af54d50000000000000000")
},
last_amount: 1.9
}
}])
This obviously fails, because $last yields 0 whenever the last event is not a purchase.
The output I am expecting is
{_id:1,last_login:ObjectId("58af54d5ab7df73d71822708"),last_amount:1.9}
The query is system generated. Please help.
To answer your question: you cannot change the behaviour of $last with $cond, i.e. you cannot make it mean 'the last entry that satisfies a criterion'.
There are a few alternatives that you could try:
Alter your document schema to suit the access pattern, especially if this is a frequently used query in your application. You should use the schema as leverage to optimise your application queries.
Use one of the MongoDB supported drivers to process the expected outcome.
Depending on the use case, you could execute 2 separate aggregation queries; first query all the users uid that logged in at least once last week, and second to query only those users that have the last purchased item value of $1.99.
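As an illustration of the second alternative (processing in the driver), the sketch below walks the events in `_id` order in plain Python and keeps the last login and the last purchase amount per user. ObjectIds are written as hex strings so that it runs without a database; that substitution and the helper name `last_login_and_purchase` are assumptions, and the data is the sample from the question.

```python
def last_login_and_purchase(events):
    """Return {uid: {"last_login": id, "last_amount": amount}}."""
    summary = {}
    # 24-character hex ids sort in the same order as the ObjectIds they encode.
    for ev in sorted(events, key=lambda e: e["_id"]):
        entry = summary.setdefault(ev["uid"], {"last_login": None, "last_amount": None})
        if ev["event"] == "login":
            entry["last_login"] = ev["_id"]
        elif ev["event"] == "purchase":
            entry["last_amount"] = ev["amount"]
    return summary

events = [
    {"_id": "58af54d5ab7df73d71822708", "uid": 1, "event": "login"},
    {"_id": "58db7189296fdedde1c04bc1", "uid": 2, "event": "login"},
    {"_id": "5888419bfa4b69dc4af7c76c", "uid": 2, "event": "purchase", "amount": 3},
    {"_id": "5888419bfa4b69dc4af7d45c", "uid": 1, "event": "purchase", "amount": 1.9},
    {"_id": "5888819bfa4b69dc4af7c76c", "uid": 1, "event": "custom", "type": 3, "value": 2},
]
summary = last_login_and_purchase(events)

# Keep users who logged in since the cutoff id and whose last purchase was 1.9.
since = "58af54d50000000000000000"
matching = [
    uid for uid, s in summary.items()
    if s["last_login"] is not None and s["last_login"] >= since and s["last_amount"] == 1.9
]
```

Because non-purchase events never overwrite `last_amount`, the value survives later login or custom events, which is the behaviour `$last` with `$cond` cannot provide.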
I found a workaround. Instead of $last, I used $push in the $group stage and added $slice with $setDifference in a $project stage to remove the null values. The query now looks something like this:
db.events.aggregate([
{
$group: {
_id: '$uid',
last_login: {
$max: {
$cond: [{
$eq: ['$event', 'login']
}, '$_id', 0]
}
},
last_amount: {
$push: {
$cond: [{
$eq: ['$event', 'purchase']
}, '$amount', null]
}
}
}
},
// Note: $setDifference returns a set, so duplicates are dropped and element order is not guaranteed.
{$project:{last_login:1,last_amount:{$slice:[{$setDifference:['$last_amount',[null]]},-1,1]}}}
, {
$match: {
last_login: {
$gte: ObjectId("58af54d50000000000000000")
},
last_amount: 1.9
}
}])

Mongo query using aggregation for dates

In the following query I'm trying to find entries in my articles collection made in the last week, sorted by the number of votes on each article. The $match doesn't seem to work (maybe I don't know how to use it). The following query works perfectly, so it's not a date format issue:
db.articles.find({timestamp:{
'$lte':new Date(),
'$gte':new Date(ISODate().getTime()-7*1000*86400)}
})
But this one doesn't fetch any results. Without the $match, it does fetch the expected results (articles sorted by vote count).
db.articles.aggregate([
{
$project:{
_id:1,
numVotes:{$subtract:[{$size:"$votes.up"},{$size:"$votes.down"}]}}
},
{
$sort:{numVotes:-1}
},
{
$match:{
timestamp:{
'$lte':new Date(),
'$gte':new Date(ISODate().getTime()-7*1000*86400)}
}
}
])
You are trying to match at the end of your pipeline, which assumes the timestamp field is still present; but your $project stage does not include it, so there is nothing left to match on.
I believe what you want is to filter data before aggregation, so you should place match at the top of your aggregation array.
Try this:
db.articles.aggregate([{
$match: {
timestamp: {
'$lte': new Date(),
'$gte': new Date(ISODate().getTime() - 7 * 1000 * 86400)
}
}
}, {
$project: {
_id: 1,
numVotes: {
$subtract: [{
$size: "$votes.up"
}, {
$size: "$votes.down"
}]
}
}
}, {
$sort: {
numVotes: -1
}
}])
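For reference, the same stage ordering could be written for pymongo roughly as follows. This is a sketch under the assumption that `timestamp` holds BSON dates; the function name `top_articles_last_week` is invented, and the pipeline is only built here, not executed.

```python
from datetime import datetime, timedelta, timezone

def top_articles_last_week(now=None):
    """Build the pipeline with $match first, so the filter sees `timestamp`."""
    now = now or datetime.now(timezone.utc)
    week_ago = now - timedelta(days=7)
    return [
        # Filter before projecting: the field is still present at this point.
        {"$match": {"timestamp": {"$gte": week_ago, "$lte": now}}},
        {"$project": {
            "numVotes": {"$subtract": [{"$size": "$votes.up"}, {"$size": "$votes.down"}]},
        }},
        {"$sort": {"numVotes": -1}},
    ]

# A fixed "now" keeps the example reproducible.
pipeline = top_articles_last_week(datetime(2024, 1, 8, tzinfo=timezone.utc))
# With pymongo (collection object assumed): collection.aggregate(pipeline)
```

Putting `$match` first also lets the server use an index on `timestamp`, if one exists, before any documents are reshaped.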

Mongodb query specific month|year not date

How can I query a specific month in MongoDB, rather than a date range? I need the month to build a list of customer birthdays for the current month.
In SQL it would be something like this:
SELECT * FROM customer WHERE MONTH(bday)='09'
Now I need to translate that in mongodb.
Note: My dates are already saved as MongoDate. I chose that type thinking it would be easy to work with, but now I can't find an easy way to do this simple thing.
With MongoDB 3.6 and newer, you can use the $expr operator in your find() query. This allows you to build query expressions that compare fields from the same document in a $match stage.
db.customer.find({ "$expr": { "$eq": [{ "$month": "$bday" }, 9] } })
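Since the question asks for birthdays in the current month, the hard-coded 9 could be replaced by the current month computed on the client. A small Python sketch of that idea for a pymongo-style filter; the helper name `birthday_filter` is an assumption, and `bday` is the field from the question.

```python
from datetime import date

def birthday_filter(today=None):
    """Build a filter matching customers whose bday falls in today's month."""
    today = today or date.today()
    # $month is 1-based (January = 1), like Python's date.month.
    return {"$expr": {"$eq": [{"$month": "$bday"}, today.month]}}

# A fixed date keeps the example reproducible.
september = birthday_filter(date(2023, 9, 15))
# With pymongo (collection object assumed): customers.find(september)
```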
For older MongoDB versions, consider running an aggregation pipeline that uses the $redact operator. It lets a single stage do what would otherwise take a $project (to create a field holding the month of the date field) followed by a $match (to keep the documents where that month is September).
In the pipeline below, $redact uses the $cond ternary operator to provide the conditional expression that decides the redaction. The logical expression in $cond checks whether the month of the date field equals the given value; if it does, $redact keeps the document via the $$KEEP system variable, otherwise it discards it via $$PRUNE.
Running the following pipeline should give you the desired result:
db.customer.aggregate([
{ "$match": { "bday": { "$exists": true } } },
{
"$redact": {
"$cond": [
{ "$eq": [{ "$month": "$bday" }, 9] },
"$$KEEP",
"$$PRUNE"
]
}
}
])
This is similar to a $project + $match combination, but with that approach you would need to list all the other fields that should pass through the pipeline:
db.customer.aggregate([
{ "$match": { "bday": { "$exists": true } } },
{
"$project": {
"month": { "$month": "$bday" },
"bday": 1,
"field1": 1,
"field2": 1,
.....
}
},
{ "$match": { "month": 9 } }
])
Another alternative, albeit a slow one, is the find() method with $where (note that JavaScript's getMonth() is zero-based, so 8 means September):
db.customer.find({ "$where": "this.bday.getMonth() === 8" })
You can do that using aggregate with the $month projection operator:
db.customer.aggregate([
{$project: {name: 1, month: {$month: '$bday'}}},
{$match: {month: 9}}
]);
First, check whether the field is stored as an ISODate.
If not, you can convert it as in the following example.
db.collectionName.find().forEach(function(each_object_from_collection) {
each_object_from_collection.your_date_field = new ISODate(each_object_from_collection.your_date_field);
db.collectionName.save(each_object_from_collection);
});
Now you can query it in two ways:
db.collectionName.find({ $expr: {
$eq: [{ $year: "$your_date_field" }, 2017]
}});
Or by aggregation
db.collectionName.aggregate([
{$project: {field1_you_need_in_result: 1, field12_you_need_in_result: 1, your_year_variable: {$year: '$your_date_field'}, your_month_variable: {$month: '$your_date_field'}}},
{$match: {your_year_variable: 2017, your_month_variable: 3}}
]);
Yes, you can match on both the month and the year of the date like this:
db.collection.find({
$expr: {
$and: [
{
"$eq": [
{
"$month": "$date"
},
3
]
},
{
"$eq": [
{
"$year": "$date"
},
2020
]
}
]
}
})
If you're concerned about efficiency, you may want to store the month data in a separate field within each document.
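A minimal sketch of that last suggestion, done on the client before the document is written. The field name `month` and the helper `add_month_field` are assumptions; `date` is the field used in the query above.

```python
from datetime import datetime

def add_month_field(doc, date_field="date", month_field="month"):
    """Return a copy of doc with the month of its date stored separately."""
    return {**doc, month_field: doc[date_field].month}

# Hypothetical document for illustration.
doc = add_month_field({"name": "example", "date": datetime(2020, 3, 27)})

# The month can then be matched (and indexed) like any ordinary field.
month_query = {"month": 3}
```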