MongoDB: get all entries with most recent revision

I'm trying to query a prebuilt MongoDB that keeps revision information around for every entry. A sample of what the data looks like is this:
{
"_id": ObjectId("5cd48bad15447900012fae00"),
"sku": "abc123",
"_revision": 5,
"_created": ISODate("2019-05-10T15:00:00.000Z"),
"provider": "MySupplier5"
}
In this example there are other entries for the same record with:
different _id
same sku
same provider
_revision set to 1, 2, 3 and 4 (one per document)
varied _created dates
What I want to select is:
All documents with provider set to "MySupplier5", but only the most recent revision of each.
I tried reading the documentation on aggregation but wasn't able to get what I needed. The approach I am using to work around it feels exceptionally inefficient, and I was hoping someone could point me in the right direction.
Right now, I am doing this:
Get all SKUs:
db.products.distinct('sku', { provider: "MySupplier5" })
Loop over all SKUs and get the most recent revision:
db.products.find({ sku: 'abc123' }).sort({ _revision: -1 }).limit(1)
That works, but it feels incredibly inefficient.
I was able to get those two queries running using Mongoid just fine and I'm sure I could translate whatever is given into the Ruby equivalent as well. I just can't figure out how to say "only give me the most recent revision" in Mongo.
Thanks!

For anyone looking for the answer I wound up with, based on the link provided, I am listing it here (apologies for the very late answer).
It took me all of the following links to be able to learn & figure out exactly what I needed:
StackOverflow 55101531
MongoDB Aggregation
MongoDB Aggregation API Doc
Some Samples to drive it home
[
{
"$match": { "provider": "MySupplier5" }
},
{
"$group": {
"_id": "$sku",
"maxRevision": { "$max": "$_revision" },
"originalDocuments": { "$push": "$$ROOT" }
}
},
{
"$project": {
"originalDocument": {
"$filter": {
"input": "$originalDocuments",
"as": "doc",
"cond": { "$eq": ["$maxRevision", "$$doc._revision"] }
}
}
}
},
{
"$unwind": "$originalDocument"
}
]
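Pushing every `$$ROOT` into one array per SKU can use a lot of memory on a large collection. A commonly used alternative is to sort by revision and keep the first document of each group. The sketch below assumes the field is named `_revision` as in the sample document; the helper name `latestRevisionPipeline` is hypothetical.

```javascript
// Hypothetical helper: builds a pipeline that returns, for every sku of one
// provider, only the document with the highest _revision.
// Usage in the mongo shell:
//   db.products.aggregate(latestRevisionPipeline("MySupplier5"))
function latestRevisionPipeline(provider) {
  return [
    // Narrow down to one provider first
    { $match: { provider: provider } },
    // Newest revision first within each sku
    { $sort: { sku: 1, _revision: -1 } },
    // After the sort, $first picks the newest document of each sku
    { $group: { _id: "$sku", latest: { $first: "$$ROOT" } } },
    // Return the original documents instead of { _id, latest } wrappers
    { $replaceRoot: { newRoot: "$latest" } },
  ];
}

console.log(JSON.stringify(latestRevisionPipeline("MySupplier5"), null, 2));
```

A compound index such as `{ provider: 1, sku: 1, _revision: -1 }` should let the `$match` and `$sort` stages use it. From Mongoid, the same array can presumably be passed to `Product.collection.aggregate(...)`, assuming the model is called `Product`.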

Related

Getting aggregate count for nested fields in mongo

I have a Mongo collection mimicking this Java class. A student can be taught a number of subjects across campuses.
class Students {
String studentName;
Map<String,List<String>> subjectsByCampus;
}
So a document looks like this:
{
_id: ObjectId("someId"),
studentName:'student1',
subjectByCampusName:{
campus1:['subject1','subject2'],
campus2: ['subject3']
},
_class: 'fqnOfTheEntity'
}
I want to find the count of subjects offered by each campus, or be able to query the count of subjects offered by a specific campus. Is there a way to get this through a query?
As mentioned in the comments, the schema here does not appear to be particularly well suited to gathering the data requested in the question. For an individual student, this is doable via $size, after converting the object to an array with $objectToArray and processing it with $map. For example, to produce your desired output of campus1: 2, campus2: 1 for the sample document provided, a pipeline that writes it into a countsByCampus field might look as follows:
[
{
"$addFields": {
"countsByCampus": {
"$arrayToObject": {
"$map": {
"input": {
"$objectToArray": "$subjectByCampusName"
},
"in": {
"$mergeObjects": [
"$$this",
{
v: {
$size: "$$this.v"
}
}
]
}
}
}
}
}
}
]
A playground demonstration produces this output document:
{
"_class": "fqnOfTheEntity",
"_id": "someId",
"countsByCampus": {
"campus1": 2,
"campus2": 1
},
"studentName": "student1",
"subjectByCampusName": {
"campus1": [
"subject1",
"subject2"
],
"campus2": [
"subject3"
]
}
}
Doing that across the entire collection would involve grouping the results together with $group. This can be done, but it would be an extremely resource-intensive and likely slow operation.
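If the collection-wide numbers are needed anyway, one possible shape for that $group-based pipeline is sketched below. It counts distinct subjects per campus across all students and can optionally be restricted to one campus. The helper name and the collection name `students` are hypothetical, and it assumes every value under subjectByCampusName is an array.

```javascript
// Hypothetical helper: counts the distinct subjects offered per campus across
// all student documents. Pass a campus name to restrict the result to it.
// Usage in the mongo shell: db.students.aggregate(subjectsPerCampusPipeline())
function subjectsPerCampusPipeline(campus) {
  const pipeline = [
    // { campus1: [...], campus2: [...] } becomes [{ k: "campus1", v: [...] }, ...]
    { $project: { campuses: { $objectToArray: "$subjectByCampusName" } } },
    // One document per (student, campus) pair
    { $unwind: "$campuses" },
  ];
  if (campus !== undefined) {
    // Drop the other campuses early when only one is requested
    pipeline.push({ $match: { "campuses.k": campus } });
  }
  pipeline.push(
    // One document per (student, campus, subject)
    { $unwind: "$campuses.v" },
    // $addToSet removes subjects repeated across students
    { $group: { _id: "$campuses.k", subjects: { $addToSet: "$campuses.v" } } },
    { $project: { _id: 0, campus: "$_id", subjectCount: { $size: "$subjects" } } }
  );
  return pipeline;
}
```

With only the sample document in the collection, this should report 2 subjects for campus1 and 1 for campus2. Replacing the `$addToSet`/`$size` pair with `{ $sum: 1 }` would instead count every (student, subject) enrollment.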

MongoDB aggregate different collections

I've been trying to do this aggregation since yesterday with no luck; I hope you can give me some ideas. A little background: I have two collections, one for results and another for questions. Results is basically where people solve questions, so a result can have anywhere from 1 question up to 99, if I'm not mistaken. This is the simplified schema:
Results Schema:
_id: ObjectId("6010664ac5c4f77f26f5d005")
questions: [
{
_id: ObjectId("5d2627c94bb703bfcc910763"),
correct: false
},
{
_id: ObjectId("5d2627c94bb703bfcc910764"),
correct: true
},
{
_id: ObjectId("5d2627c94bb703bfcc910765"),
correct: true
}
]
So, on this specific object, the user answered 3 questions and got 2 of them correct.
Questions Schema:
_id: ObjectId("5d2627c94bb703bfcc910763")
What I'm struggling to do is this: for each document in the Questions collection, I have to check whether the question was answered, i.e. whether an _id in some Result's questions array equals the _id in the Questions Schema. (There can be multiple answers with the same _id as a question, but the _id in the Questions Schema is unique.) If that question was answered, I need to check whether correct = true; if so, I add it to a correctAnswer variable, and if not, to wrongAnswer.
I've tried many things so far with conditions, group to get the $sum of correct and wrong answers, lookup to join both collections but so far, I can't even get to show just the aggregation result.
One of the things I tried (I was trying to take baby steps first), but as mentioned before, I couldn't even get the result printed:
Result.aggregate([
{
$lookup: {
from: 'question',
localField: 'questions._id',
foreignField: '_id',
as: 'same'
}
}
])
But this gets me both collections combined, and 'same' comes back as an empty array. I also tried $match, with no luck.
I also added a $project to get just the information I wanted:
$project: {
_id: 0,
questions: {
_id: 1,
correct: 1
}
},
I tried using $group:
$group: {
_id: "$_id",
$cond: {if: {"$correct": { $eq: true}}, then: {testField: {$sum: 1}}, else: {testField: 0}}
}
As I said, I was just trying to take baby steps, so the testField was being set manually. I also tried many other things from Stack Overflow.
I would appreciate the help, and sorry for the very long text; I just wanted to include some examples of what I tried.
TL;DR: For each question in the Results Schema, find whether its _id matches an _id from the Questions Schema; if it does, check whether correct is true or false. Then update the Questions Schema with how many answers were correct and how many were wrong for each question.
Example: newField: {correctAnswer: 4, wrongAnswer: 3}. In this case, 7 entries from the Result schema's questions arrays matched an _id from the Question Schema; 4 had correct: true and 3 had correct: false. It then continues like this for the rest of the Question Schema.
For a scenario where "$lookup" can't be used because the Question collection is in a different database, the Result collection may be used to generate output documents to update the Question collection.
Here's one way to do it.
db.Result.aggregate([
{
"$unwind": "$questions"
},
{
"$group": {
"_id": "$questions._id",
"correct": {
"$push": "$questions.correct"
}
}
},
{
"$project": {
"newField": {
"correctAnswers": {
"$size": {
"$filter": {
"input": "$correct",
"as": "bool",
"cond": "$$bool"
}
}
},
"wrongAnswers": {
"$size": {
"$filter": {
"input": "$correct",
"as": "bool",
"cond": { "$not": "$$bool" }
}
}
}
}
}
}
])
Try it on mongoplayground.net.
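A variant of the pipeline above that counts with $sum and $cond instead of pushing an array of booleans per question, and then writes the counts back with the $merge stage (MongoDB 4.2+), might look like the sketch below. $merge can target a collection in another database, which fits the scenario described; the database name `quizdb` and the helper name are assumptions.

```javascript
// Hypothetical helper: counts correct/wrong answers per question straight from
// the Result collection and writes them into the Question collection of
// another database (questionDb is an assumed name).
// Usage in the mongo shell: db.Result.aggregate(answerCountsPipeline("quizdb"))
function answerCountsPipeline(questionDb) {
  return [
    { $unwind: "$questions" },
    {
      // Conditional sums avoid building an array of booleans per question
      $group: {
        _id: "$questions._id",
        correctAnswers: { $sum: { $cond: [{ $eq: ["$questions.correct", true] }, 1, 0] } },
        wrongAnswers: { $sum: { $cond: [{ $eq: ["$questions.correct", false] }, 1, 0] } },
      },
    },
    { $project: { newField: { correctAnswers: "$correctAnswers", wrongAnswers: "$wrongAnswers" } } },
    {
      // Needs MongoDB 4.2+; merges newField into the Question document with the
      // same _id and ignores answers whose question does not exist
      $merge: {
        into: { db: questionDb, coll: "Question" },
        on: "_id",
        whenMatched: "merge",
        whenNotMatched: "discard",
      },
    },
  ];
}
```

Dropping the final `$merge` stage returns the same documents without writing anything, which is an easy way to check the counts first.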
I don't know of a way to "$lookup" and update at the same time. There's probably a better way to do this, but the aggregation pipeline below creates the documents that could be used in a subsequent update. The pipeline correctly counts repeated answers to the same question within a single Result document, in case someone keeps trying a question until they get it right. One possible issue is that if a question has no Result answers, then no "newField": { "correctAnswers": 0, "wrongAnswers": 0 } document is created for it.
db.Question.aggregate([
{
// lookup documents in Result that match _id
"$lookup": {
"from": "Result",
"localField": "_id",
"foreignField": "questions._id",
"as": "results"
}
},
{
// unwind everything
"$unwind": "$results"
},
{
// more of everything
"$unwind": "$results.questions"
},
{
// only keep answers that match question
"$match": {
"$expr": { "$eq": [ "$_id", "$results.questions._id" ] }
}
},
{
// reassemble and count correct/wrong answers
"$group": {
"_id": "$_id",
"correct": {
"$sum": {
"$cond": [ { "$eq": [ "$results.questions.correct", true ] }, 1, 0 ]
}
},
"wrong": {
"$sum": {
"$cond": [ { "$eq": [ "$results.questions.correct", false ] }, 1, 0 ]
}
}
}
},
{
// project what you want as output
"$project": {
newField: {
correctAnswers: "$correct",
wrongAnswers: "$wrong"
}
}
}
])
Try it on mongoplayground.net.
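To cover the case where a question has no Result answers, one option is a $lookup with a sub-pipeline (MongoDB 3.6+) that does the counting, followed by $ifNull so unanswered questions get zeros. Appending $merge then performs the update in the same command, which requires MongoDB 4.4+ because it writes into the collection being aggregated. A sketch, with a hypothetical helper name and the assumption that every Result document has a questions array:

```javascript
// Hypothetical helper: starts from Question so that questions without any
// answers still get { correctAnswers: 0, wrongAnswers: 0 }.
// Assumes every Result document has a questions array.
// Usage in the mongo shell: db.Question.aggregate(questionStatsPipeline())
function questionStatsPipeline() {
  return [
    {
      $lookup: {
        from: "Result",
        let: { qid: "$_id" },
        pipeline: [
          // Results that contain this question at least once
          { $match: { $expr: { $in: ["$$qid", "$questions._id"] } } },
          { $unwind: "$questions" },
          // Keep only the answers to this question (repeats are counted)
          { $match: { $expr: { $eq: ["$questions._id", "$$qid"] } } },
          {
            $group: {
              _id: null,
              correct: { $sum: { $cond: [{ $eq: ["$questions.correct", true] }, 1, 0] } },
              wrong: { $sum: { $cond: [{ $eq: ["$questions.correct", false] }, 1, 0] } },
            },
          },
        ],
        as: "counts",
      },
    },
    {
      // counts is [] when nobody answered; $ifNull turns the missing value into 0
      $project: {
        newField: {
          correctAnswers: { $ifNull: [{ $arrayElemAt: ["$counts.correct", 0] }, 0] },
          wrongAnswers: { $ifNull: [{ $arrayElemAt: ["$counts.wrong", 0] }, 0] },
        },
      },
    },
    // Writing back into the collection being aggregated needs MongoDB 4.4+
    { $merge: { into: "Question", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } },
  ];
}
```

Each Question runs its own sub-pipeline against Result, so this may be slow on large collections; running it without the `$merge` stage first is a cheap way to inspect the output.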

Mongodb get document that has max value for each subdocument

I have some data looking like this:
{'Type':'A',
'Attributes':[
{'Date':'2021-10-02', 'Value':5},
{'Date':'2021-09-30', 'Value':1},
{'Date':'2021-09-25', 'Value':13}
]
},
{'Type':'B',
'Attributes':[
{'Date':'2021-10-01', 'Value':36},
{'Date':'2021-09-15', 'Value':14},
{'Date':'2021-09-10', 'Value':18}
]
}
For each document, I would like to get the subdocument with the newest date. With the data above, the desired result would be:
{'Type':'A', 'Date':'2021-10-02', 'Value':5}
{'Type':'B', 'Date':'2021-10-01', 'Value':36}
I managed to find some queries that return only the global max across all subdocuments, but none that find the max for each document.
Thanks a lot for your help
Storing dates as strings is generally considered bad practice; I suggest changing your date field to a Date type. Fortunately, in your case you are using the ISO date format, so comparing the strings already orders them chronologically and some effort can be saved.
You can do this in aggregation pipeline:
use $max to find out the max date
use $filter to filter the Attributes array so it contains only the latest element
$unwind the array
$project to your expected output
Here is the Mongo playground for your reference.
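The four steps listed above could be written roughly as follows (a sketch, not the linked playground; the helper name is hypothetical). Because the dates are ISO-formatted strings, $max on them picks the chronologically latest one.

```javascript
// Hypothetical helper: keeps, for each document, the Attributes entries whose
// Date equals the newest Date in that document (ties are all kept).
// Usage in the mongo shell: db.collection.aggregate(latestAttributePipeline())
function latestAttributePipeline() {
  return [
    // 1. $max over the array of dates; ISO strings sort chronologically
    { $addFields: { maxDate: { $max: "$Attributes.Date" } } },
    // 2. Keep only the entries carrying that date
    {
      $project: {
        Type: 1,
        Attributes: {
          $filter: { input: "$Attributes", as: "a", cond: { $eq: ["$$a.Date", "$maxDate"] } },
        },
      },
    },
    // 3. One output document per remaining entry
    { $unwind: "$Attributes" },
    // 4. Flatten into { Type, Date, Value }
    { $project: { _id: 0, Type: 1, Date: "$Attributes.Date", Value: "$Attributes.Value" } },
  ];
}
```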
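This answer keeps only one member of Attributes: the one with the max date.
If you want to keep multiple members, use ray's solution, which keeps all members that have the max date.
*Both queries rely on $max comparing embedded documents field by field, in the order the fields appear, so Date must come before Value. mongoplayground can lose the order of fields in a document; if you see a wrong result, test it with your driver, since that is a bug in the mongoplayground tool.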
Query1 (local-way)
aggregate([
{
"$project": {
"maxDateValue": {
"$max": {
"$map": {
"input": "$Attributes",
"in": { "Date": "$$this.Date", "Value": "$$this.Value" },
}
}
},
"Type": 1
}
},
{
"$project": {
"_id": 0,
"Type": 1,
"Date": "$maxDateValue.Date",
"Value": "$maxDateValue.Value"
}
}
])
Query2 (unwind-way)
aggregate([
{
"$unwind": { "path": "$Attributes" }
},
{
"$group": {
"_id": "$Type",
"maxDate": {
"$max": {
"Date": "$Attributes.Date",
"Value": "$Attributes.Value"
}
}
}
},
{
"$project": {
"_id": 0,
"Type": "$_id",
"Date": "$maxDate.Date",
"Value": "$maxDate.Value"
}
}
])
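Both queries depend on the field order inside the compared documents. A sketch that avoids this by walking the array with $reduce and comparing only Date is shown below; the helper name is hypothetical, and on a date tie it keeps the first entry in the array.

```javascript
// Hypothetical helper: picks the newest Attributes entry with $reduce, so the
// result does not depend on the field order inside each entry.
// Usage in the mongo shell: db.collection.aggregate(newestAttributeReducePipeline())
function newestAttributeReducePipeline() {
  return [
    {
      $project: {
        _id: 0,
        Type: 1,
        newest: {
          $reduce: {
            input: "$Attributes",
            initialValue: null,
            // Take the current entry if nothing is kept yet or its Date is later
            in: {
              $cond: [
                { $or: [{ $eq: ["$$value", null] }, { $gt: ["$$this.Date", "$$value.Date"] }] },
                "$$this",
                "$$value",
              ],
            },
          },
        },
      },
    },
    // _id stays excluded from the previous stage
    { $project: { Type: 1, Date: "$newest.Date", Value: "$newest.Value" } },
  ];
}
```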

aggregating metrics data in mongodb

I'm trying to pull report data out of a realtime metrics system inspired by the NYC MUG/SimpleReach schema, and maybe my mind is still stuck in SQL mode.
The data is stored in a document like so...
{
"_id": ObjectId("5209683b915288435894cb8b"),
"account_id": 922,
"project_id": 22492,
"stats": {
"2009": {
"04": {
"17": {
"10": {
"sum": {
"impressions": 11
}
},
"11": {
"sum": {
"impressions": 603
}
},
},
},
},
}}
and I've been trying different variations of the aggregation pipeline with no success.
db.metrics.aggregate({
$match: {
'project_id':22492
}}, {
$group: {
_id: "$project_id",
'impressions': {
//This works, but doesn't sum up the data...
$sum: '$stats.2009.04.17.10.sum.impressions'
/* none of these work.
$sum: ['$stats.2009.04.17.10.sum.impressions',
'$stats.2009.04.17.11.sum.impressions']
$sum: {'$stats.2009.04.17.10.sum.impressions',
'$stats.2009.04.17.11.sum.impressions'}
$sum: '$stats.2009.04.17.10.sum.impressions',
'$stats.2009.04.17.11.sum.impressions'
*/
}
}
})
Any help would be appreciated.
(P.S. Does anyone have any ideas on how to do date range searches using this document schema?)
$group is designed to be applied to many documents, but here we only have one matched document.
Instead, $project could be used to sum up specific fields, like this:
db.metrics.aggregate(
{ $match: {
'project_id':22492
}
},
{ $project: {
'impressions': {
$add: [
'$stats.2009.04.17.10.sum.impressions',
'$stats.2009.04.17.11.sum.impressions'
]
}
}
})
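The $add version has to name each hour. Assuming MongoDB 3.4.4 or later, $objectToArray can turn the hour keys of one day into an array so that all of them are summed; the helper name is hypothetical:

```javascript
// Hypothetical helper: sums impressions over every hour stored under one day,
// without listing the hour keys. Needs MongoDB 3.4.4+ for $objectToArray.
// Usage in the mongo shell:
//   db.metrics.aggregate(dailyImpressionsPipeline(22492, "2009", "04", "17"))
function dailyImpressionsPipeline(projectId, year, month, day) {
  const dayPath = `$stats.${year}.${month}.${day}`;
  return [
    { $match: { project_id: projectId } },
    {
      $project: {
        impressions: {
          // [{ k: "10", v: { sum: { impressions: 11 } } }, ...] -> [11, 603, ...] -> total
          $sum: { $map: { input: { $objectToArray: dayPath }, in: "$$this.v.sum.impressions" } },
        },
      },
    },
  ];
}
```

For the sample document this should give impressions: 614 (11 + 603).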
I don't think there is an elegant way to do date range searches with this schema, because MongoDB operations/predicates are designed to be applied to values rather than to keys in a document. If I understand correctly, the most interesting point in the slides you mentioned is to cache/pre-aggregate metrics when updating. That's a good idea, but it could be implemented with another schema. For example, storing the date and time as indexed values, which MongoDB supports, might be a good choice for range searches. The aggregation framework also supports date operators, giving more flexibility.
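To illustrate the kind of alternative schema mentioned above, here is a sketch that assumes one document per project and hour with a real Date field named ts; the collection name metrics_hourly, the field names and the helper names are all hypothetical. Updates pre-aggregate with an upserted $inc, and reports become ordinary range queries that can use an index on { project_id: 1, ts: 1 }.

```javascript
// Hypothetical schema: one document per project and hour, e.g.
//   { project_id: 22492, ts: ISODate("2009-04-17T10:00:00Z"), impressions: 11 }

// Arguments for an upsert that pre-aggregates impressions by hour.
// Usage: const u = hourlyIncrement(22492, hourStart, 1);
//        db.metrics_hourly.updateOne(u.filter, u.update, u.options)
function hourlyIncrement(projectId, hourStart, count) {
  return {
    filter: { project_id: projectId, ts: hourStart },
    update: { $inc: { impressions: count } },
    options: { upsert: true },
  };
}

// Total impressions per day for from <= ts < to.
// Usage: db.metrics_hourly.aggregate(impressionsPerDayPipeline(22492, from, to))
function impressionsPerDayPipeline(projectId, from, to) {
  return [
    { $match: { project_id: projectId, ts: { $gte: from, $lt: to } } },
    {
      $group: {
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$ts" } },
        impressions: { $sum: "$impressions" },
      },
    },
    // "YYYY-MM-DD" strings sort chronologically
    { $sort: { _id: 1 } },
  ];
}
```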

Combining the results of two queries

My collection looks like:
{ queid:'1', date:'07/24/2013', resolved:'true' }
I want to get counts like:
{date:'07/24/2013',countquestion:10,resolved:5}
{date:'07/23/2013',countquestion:5,resolved:2}
Presently I am getting the count of questions using
que.aggregate({ $group: { _id: "$date", count: { $sum: 1 } } })
and the resolved count using
que.aggregate({ $match: { resolved: true } }, { $group: { _id: "$date", count: { $sum: 1 } } })
and then combining both in a Java program. Is there a better way to do this query and combine the results?
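You could achieve this using the $cond operator. Just pay attention to the resolved field: it should be a boolean. $cond treats any string, including "false", as true, so string values like the one in your sample would all be counted as resolved.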
{
"$group": {
"_id": "$date",
"countquestion": {
"$sum": 1
},
"resolved": {
"$sum": {
"$cond": [ "$resolved", 1, 0 ]
}
}
}
}
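The $group stage above returns the date in _id. A sketch of a complete pipeline that also renames it to date, matching the desired output, could look like this; the helper name and the collection name questions are hypothetical, and $in (MongoDB 3.4+) is used so that the string "true" from the sample is counted as well.

```javascript
// Hypothetical helper: one pass over the collection that returns documents
// shaped like { date, countquestion, resolved }.
// Usage in the mongo shell: db.questions.aggregate(dailyQuestionCountsPipeline())
function dailyQuestionCountsPipeline() {
  return [
    {
      $group: {
        _id: "$date",
        countquestion: { $sum: 1 },
        // Counts both the boolean true and the string "true" seen in the sample;
        // drop "true" from the list once resolved is stored as a real boolean
        resolved: { $sum: { $cond: [{ $in: ["$resolved", [true, "true"]] }, 1, 0] } },
      },
    },
    // Rename _id to date to match the desired output
    { $project: { _id: 0, date: "$_id", countquestion: 1, resolved: 1 } },
  ];
}
```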