mongo: find non-superseded documents - mongodb

I have a collection with documents like:
{
  "_id" : "ThisIsASampleId_rand12345",
  "timestamp" : ISODate("2019-04-30T10:53:34.515Z"),
  "mySpecialId" : "specialId_12345",
  "status" : "error"
}
My goal is to find all documents with {status: 'error'}, so long as no subsequent documents exist with the same mySpecialId and status 'success'.
Clearly I can do db.jobs.find({status: 'error'}), but after that, I get lost.
Do I need to do a $lookup in an aggregation pipeline into the same collection, using "mySpecialId" as both local and foreign fields, with a $match that includes something like {$gt: {timestamp: $PREVIOUS_TIMESTAMP}}? That feels wrong, somehow.
Is there a simpler/better/more elegant way to do this?

You can $sort your collection by the timestamp field and then run $group with the $last operator to get the most recent document for each mySpecialId. Then you simply check whether that last document's status is error. If it isn't, then either every document in that group succeeded, or an error appeared but was later superseded by a success. To get back the original shape of your documents you can use $replaceRoot.
db.col.aggregate([
  { $sort: { timestamp: 1 } },
  {
    $group: {
      _id: "$mySpecialId",
      lastDoc: { $last: "$$ROOT" }
    }
  },
  { $match: { "lastDoc.status": "error" } },
  { $replaceRoot: { newRoot: "$lastDoc" } }
])
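With the sample document above, and assuming no later document with the same mySpecialId and status "success" exists, the pipeline would return it unchanged:
{
  "_id" : "ThisIsASampleId_rand12345",
  "timestamp" : ISODate("2019-04-30T10:53:34.515Z"),
  "mySpecialId" : "specialId_12345",
  "status" : "error"
}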

Related

MongoDB sort by value in embedded document array

I have a MongoDB collection of documents formatted as shown below:
{
  "_id" : ...,
  "username" : "foo",
  "challengeDetails" : [
    {
      "ID" : ...,
      "pb" : 30081
    },
    {
      "ID" : ...,
      "pb" : 23995
    },
    ...
  ]
}
How can I write a find query for records that have a challengeDetails document with a matching ID and sort them by the corresponding pb?
I have tried (this is using the NodeJS driver, which is why the projection syntax is weird)
const result = await collection
  .find(
    { "challengeDetails.ID": challengeObjectID },
    {
      projection: { "challengeDetails.$": 1 },
      sort: { "challengeDetails.0.pb": 1 }
    }
  )
This returns the correct records (documents with challengeDetails for only the matching ID) but they're not sorted.
I think this doesn't work because as the docs say:
When the find() method includes a sort(), the find() method applies the sort() to order the matching documents before it applies the positional $ projection operator.
But they don't explain how to sort after projecting. How would I write a query to do this? (I have a feeling aggregation may be required but am not familiar enough with MongoDB to write that myself)
You need to use aggregation to sort an array:
$unwind to deconstruct the array
$match to match the value
$sort for sorting
$group to reconstruct the array
Here is the code
db.collection.aggregate([
  { "$unwind": "$challengeDetails" },
  { "$match": { "challengeDetails.ID": 2 } },
  { "$sort": { "challengeDetails.pb": 1 } },
  {
    "$group": {
      "_id": "$_id",
      "username": { "$first": "$username" },
      "challengeDetails": { "$push": "$challengeDetails" }
    }
  }
])
Working Mongo playground
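If you would rather avoid the $unwind/$group round trip, here is a hedged alternative sketch. It assumes MongoDB 4.4+ (for the $first array operator), that challengeObjectID is defined as in the question, and it introduces a hypothetical helper field matchedPb purely for sorting:
db.collection.aggregate([
  { $match: { "challengeDetails.ID": challengeObjectID } },
  // keep only the array element whose ID matches
  {
    $addFields: {
      challengeDetails: {
        $filter: {
          input: "$challengeDetails",
          as: "cd",
          cond: { $eq: ["$$cd.ID", challengeObjectID] }
        }
      }
    }
  },
  // pull the matching element's pb into a temporary field and sort the documents on it
  { $addFields: { matchedPb: { $first: "$challengeDetails.pb" } } },
  { $sort: { matchedPb: 1 } },
  { $project: { matchedPb: 0 } }
])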

how to delete many documents from a collection based on a condition with values from another collection

Attached is the aggregate query. I want to delete all the documents it returns from the same collection, "History". How do I do it?
Let's say I have a collection of stock-market company records named "history", such as:
{ _id: "value", symbol: "value", date: "value", open: "value", close: "value", ... }
The file is supposed to have one document per company per day, 42 days of records in total for each company.
But after checking the data, it seems that some companies don't have all 42 daily records ("one document/day"); they have fewer.
So I want to delete the companies that don't have exactly 42 documents.
My group-by is on "symbol" and my count is on "date". I can get the list, but I don't know how to delete it.
You can remove them by running the .remove() method:
db.history.aggregate(...).forEach(function (doc) {
  db.history.remove({ symbol: doc._id });
})
Note: It's very slow.
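For illustration, a sketch of what the elided pipeline could look like, assuming the groups with fewer than 42 documents are the ones to delete:
db.history.aggregate([
  { $group: { _id: "$symbol", nbr_jours: { $sum: 1 } } },
  { $match: { nbr_jours: { $lt: 42 } } }
]).forEach(function (doc) {
  // doc._id is the symbol of a company with incomplete records
  db.history.remove({ symbol: doc._id });
})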
Alternative solution: change the aggregation criteria to return the valid documents and overwrite the history collection with the $out operator:
db.history.aggregate([
  {
    $group: {
      _id: "$symbol",
      nbr_jours: { $sum: 1 },
      data: { $push: "$$ROOT" }
    }
  },
  {
    $match: {
      nbr_jours: { $gte: 42 } // or $eq
    }
  },
  { $unwind: "$data" },
  { $replaceRoot: { newRoot: "$data" } },
  { $out: "history" }
])
Note: It's very fast.
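On newer servers another option (a sketch, not part of the original answer) is to collect the incomplete symbols first and remove their records with a single deleteMany call:
const badSymbols = db.history.aggregate([
  { $group: { _id: "$symbol", nbr_jours: { $sum: 1 } } },
  { $match: { nbr_jours: { $lt: 42 } } }
]).toArray().map(doc => doc._id);

// delete every record belonging to an incomplete company
db.history.deleteMany({ symbol: { $in: badSymbols } });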

Select latest document after grouping them by a field in MongoDB

I got a question that I would expect to be pretty simple, but I cannot figure it out. What I want to do is this:
Find all documents in a collection and:
sort the documents by a certain date field
apply distinct on one of its other fields, but return the whole document
Best shown in an example.
This is a mock input:
[
  {
    "commandName" : "migration_a",
    "executionDate" : ISODate("1998-11-04T18:46:14.000Z")
  },
  {
    "commandName" : "migration_a",
    "executionDate" : ISODate("1970-05-09T20:16:37.000Z")
  },
  {
    "commandName" : "migration_a",
    "executionDate" : ISODate("2005-11-08T11:58:52.000Z")
  },
  {
    "commandName" : "migration_b",
    "executionDate" : ISODate("2016-06-02T19:48:34.000Z")
  }
]
The expected output is:
[
  {
    "commandName" : "migration_a",
    "executionDate" : ISODate("2005-11-08T11:58:52.000Z")
  },
  {
    "commandName" : "migration_b",
    "executionDate" : ISODate("2016-06-02T19:48:34.000Z")
  }
]
Or, in other words:
Group the input data by the commandName field
Inside each group sort the documents
Return the newest document from each group
My attempts to write this query have failed:
The distinct() function will only return the value of the field I am distinct-ing on, not the whole document. That makes it unsuitable for my case.
Tried writing an aggregate query, but ran into the issue of how to sort-and-select a single document from inside each group. The sort aggregation stage sorts the groups among one another, which is not what I want.
I am not too well-versed in Mongo and this is where I hit a wall. Any ideas on how to continue?
For reference, this is the work-in-progress aggregation query I am trying to expand on:
db.getCollection('some_collection').aggregate([
  { $group: { '_id': '$commandName', 'docs': { $addToSet: '$$ROOT' } } },
  { $sort: { '_id.docs.???': 1 } }
])
Post-resolved edit
Thank you for the answers. I got what I needed. For future reference, this is the full query that will do what was requested and also return a list of the filtered documents, not groups.
db.getCollection('some_collection').aggregate([
  { $sort: { 'executionDate': 1 } },
  { $group: { '_id': '$commandName', 'result': { $last: '$$ROOT' } } },
  { $replaceRoot: { newRoot: '$result' } }
])
The query result without the $replaceRoot stage would be:
[
  {
    "_id": "migration_a",
    "result": {
      "commandName" : "migration_a",
      "executionDate" : ISODate("2005-11-08T11:58:52.000Z")
    }
  },
  {
    "_id": "migration_b",
    "result": {
      "commandName" : "migration_b",
      "executionDate" : ISODate("2016-06-02T19:48:34.000Z")
    }
  }
]
The outer _id and result are just "group wrappers" around the actual document I want, which is nested under the result key. Moving the nested document to the root of the result is done with the $replaceRoot stage. The query result when using that stage is:
[
  {
    "commandName" : "migration_a",
    "executionDate" : ISODate("2005-11-08T11:58:52.000Z")
  },
  {
    "commandName" : "migration_b",
    "executionDate" : ISODate("2016-06-02T19:48:34.000Z")
  }
]
Try this:
db.getCollection('some_collection').aggregate([
  { $sort: { 'executionDate': -1 } },
  { $group: { '_id': '$commandName', 'doc': { $first: '$$ROOT' } } }
])
I believe this will result in what you're looking for:
db.collection.aggregate([
  {
    $group: {
      "_id": "$commandName",
      "executionDate": { "$last": "$executionDate" }
    }
  }
])
You can check it out here
Of course, if you want to match your expected output exactly, you can add a sort (this may not be necessary since your goal is to simply return the newest document from each group):
{
  $sort: { "executionDate": 1 }
}
You can check this version out here.
The use case the question presents is nearly covered in the $last aggregation operator documentation, which summarises:
the $group stage should follow a $sort stage to have the input documents in a defined order, since $last simply picks the last document from a group.
db.collection.aggregate([
  { $sort: { executionDate: 1 } },
  {
    $group: {
      _id: "$commandName",
      executionDate: { $last: "$executionDate" }
    }
  }
]);

Using "$count" Within an "addField" Operation in MongoDB Aggregation

I am trying to find the correct combination of aggregation operators to add a field titled "totalCount" to my MongoDB view.
This will get me the count at this particular stage of the aggregation pipeline and output this as the result of a count on each of the documents:
{
  $count: "count"
}
But I then end up with one document containing only this result, rather than what I'm trying to accomplish, which is to have this value appear as an added field on all of the documents, or even better, as a value printed in addition to the returned documents.
I've tried this, but it gives me the error "Unrecognized expression '$count'":
{
  $addFields: {
    "totalCount": { $count: "totalCount" }
  }
}
What would the correct syntactical construction be for this? Is it possible to do it this way, or do I need to use $sum, or some other operator to make this work? I also tried this:
{
  $addFields: {
    "totalCount": { $sum: { _id: 1 } }
  }
},
... but while it doesn't give me any errors, it just prints 0 as the value for that field on every document rather than the total count of all documents.
The total count will always be a one-document result, so you need $facet to run multiple aggregation pipelines and then merge the results. Let's say your regular pipeline contains a simple $project and you want to merge its results with $count. You can run the aggregation below:
db.col.aggregate([
  {
    $facet: {
      totalCount: [
        { $count: "value" }
      ],
      pipelineResults: [
        { $project: { _id: 1 } } // your regular aggregation pipeline here
      ]
    }
  },
  { $unwind: "$pipelineResults" },
  { $unwind: "$totalCount" },
  {
    $replaceRoot: {
      newRoot: {
        $mergeObjects: [ "$pipelineResults", { totalCount: "$totalCount.value" } ]
      }
    }
  }
])
After the $facet stage you'll get a single document like this:
{
  "totalCount" : [
    { "value" : 3 }
  ],
  "pipelineResults" : [
    { "_id" : ObjectId("5b313241120e4bc08ce87e46") },
    // ....
  ]
}
Then you have to use $unwind to transform the arrays into multiple documents, and $replaceRoot with $mergeObjects to promote the regular pipeline results to the root level.
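Using the sample values above, each final document would then look roughly like:
{
  "_id" : ObjectId("5b313241120e4bc08ce87e46"),
  "totalCount" : 3
}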
Since MongoDB version 5.0 there is another option that avoids the disadvantage of $facet, namely grouping all returned documents into one big document. The main concern there is that a document has a size limit of 16 MB. Using $setWindowFields avoids this concern.
This can simply replace #micki's 4 steps:
db.col.aggregate([
  { $setWindowFields: { output: { totalCount: { $count: {} } } } }
])
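Here $count is used as a window operator over the whole collection (the default, unpartitioned window), so every document keeps its original fields and simply gains the shared count, e.g. (illustrative values):
{ "_id" : ObjectId("5b313241120e4bc08ce87e46"), /* ...original fields... */ "totalCount" : 3 }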

Mongodb: nested field in $group's _id

Assume we have documents like this in the collection
{
_id: {
element_id: '12345',
name: 'foobar'
},
value: {
count: 1
}
}
I am using the aggregation framework to do a $group, like so
db.collection.aggregate([
  { $group: { _id: '$_id.element_id', total: { $sum: '$value.count' } } }
])
And got a result of
{ "result" : [ { "_id" : null, "total" : 1 } ], "ok" : 1 }
Notice that the _id field in the result is null. From experimentation it seems that $group does not allow a nested field declaration for its _id (e.g. $_id.element_id).
Why is this? And is there a workaround for it?
Thank you.
I found a workaround using $project.
db.collection.aggregate([
  { $project: { element_id: '$_id.element_id', count: '$value.count' } },
  { $group: { _id: '$element_id', total: { $sum: '$count' } } }
])
$project Reshapes a document stream by renaming, adding, or removing fields.
http://docs.mongodb.org/manual/reference/aggregation/#_S_project
This turns out to have been issue SERVER-7491. It appears to have been fixed in 2.2.2 (released about 3 days ago).
The workaround mentioned above worked well for me in 2.2.1. As a note, when using the $project workaround (pre-2.2.2), excluding _id from the $project with _id: 0 is inadvisable, as it appears to behave quite strangely: within the same aggregation I ended up with some results working properly and others where that portion of the _id field was missing from the end result.
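For completeness, on servers that include the SERVER-7491 fix (2.2.2 and later), the original nested-path grouping should work directly, without the $project step:
db.collection.aggregate([
  { $group: { _id: '$_id.element_id', total: { $sum: '$value.count' } } }
])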