mongodb: $substr inside $group - mongodb

I am a bit new to mongodb, and I have a situation.
The entries in my mongodb collection is around 300k, I used mongodb aggregate to migrate all the old entries to new and my code looks like
collection.aggregate([
{ $group: {
_id: "$_id",
"new_id" : {$substr : [{$first: "$old_id"}, 2,-1]}, //error exists here
"new_roles": {$push: "$old_role"},
}
},
{ $out: newCollection}
], {allowDiskUse:true}, function (updateError, updateResult) {
if (!updateError) {
return cb(false, true);
} else {
return cb(true, false);
}
})
What I want to achieve it that I want to take substring of old_id and put it in new_id. The old_id of old collection looks like a-112 a-34311. I want to remove a- from these old ids and put it into new id.
I tried using the above code but it shows error that The $substr accumulator is a unary operator. I tried searching for the error but no luck

You can't use $substr in the $group stage as it is not an accumulator operator. You need to do this in a $project stage like this:
collection.aggregate([
{ $group: {
_id: "$_id",
"new_id" : {$first: "$old_id"},
"new_roles": {$push: "$old_role"},
}
},
{$project: {
"new_id": {$substr: ["$new_id", 2, -1]},
"new_roles": 1
}
},
{ $out: newCollection}
], {allowDiskUse:true}, function (updateError, updateResult) {
if (!updateError) {
return cb(false, true);
} else {
return cb(true, false);
}
})

Related

Select latest document after grouping them by a field in MongoDB

I got a question that I would expect to be pretty simple, but I cannot figure it out. What I want to do is this:
Find all documents in a collection and:
sort the documents by a certain date field
apply distinct on one of its other fields, but return the whole document
Best shown in an example.
This is a mock input:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("1998-11-04T18:46:14.000Z")
},
{
"commandName" : "migration_a",
"executionDate" : ISODate("1970-05-09T20:16:37.000Z")
},
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
The expected output is:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
Or, in other words:
Group the input data by the commandName field
Inside each group sort the documents
Return the newest document from each group
My attempts to write this query have failed:
The distinct() function will only return the value of the field I am distinct-ing on, not the whole document. That makes it unsuitable for my case.
Tried writing an aggregate query, but ran into an issue of how to sort-and-select a single document from inside of each group? The sort aggreation stage will sort the groups among one other, which is not what I want.
I am not too well-versed in Mongo and this is where I hit a wall. Any ideas on how to continue?
For reference, this is the work-in-progress aggregation query I am trying to expand on:
db.getCollection('some_collection').aggregate([
{ $group: { '_id': '$commandName', 'docs': {$addToSet: '$$ROOT'} } },
{ $sort: {'_id.docs.???': 1}}
])
Post-resolved edit
Thank you for the answers. I got what I needed. For future reference, this is the full query that will do what was requested and also return a list of the filtered documents, not groups.
db.getCollection('some_collection').aggregate([
{ $sort: {'executionDate': 1}},
{ $group: { '_id': '$commandName', 'result': { $last: '$$ROOT'} } },
{ $replaceRoot: {newRoot: '$result'} }
])
The query result without the $replaceRoot stage would be:
[
{
"_id": "migration_a",
"result": {
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
}
},
{
"_id": "migration_b",
"result": {
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
}
]
The outer _id and _result are just "group-wrappers" around the actual document I want, which is nested under the result key. Moving the nested document to the root of the result is done using the $replaceRoot stage. The query result when using that stage is:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
Try this:
db.getCollection('some_collection').aggregate([
{ $sort: {'executionDate': -1}},
{ $group: { '_id': '$commandName', 'doc': {$first: '$$ROOT'} } }
])
I believe this will result in what you're looking for:
db.collection.aggregate([
{
$group: {
"_id": "$commandName",
"executionDate": {
"$last": "$executionDate"
}
}
}
])
You can check it out here
Of course, if you want to match your expected output exactly, you can add a sort (this may not be necessary since your goal is to simply return the newest document from each group):
{
$sort: {
"executionDate": 1
}
}
You can check this version out here.
The use-case the question presents is nearly covered in the $last aggregation operator documentation.
Which summarises:
the $group stage should follow a $sort stage to have the input
documents in a defined order. Since $last simply picks the last
document from a group.
Query: Link
db.collection.aggregate([
{
$sort: {
executionDate: 1
}
},
{
$group: {
_id: "$commandName",
executionDate: {
$last: "$executionDate"
}
}
}
]);

MongoDB - Update or Insert object in array

I have the following collection
{
"_id" : ObjectId("57315ba4846dd82425ca2408"),
"myarray" : [
{
userId : ObjectId("570ca5e48dbe673802c2d035"),
point : 5
},
{
userId : ObjectId("613ca5e48dbe673802c2d521"),
point : 2
},
]
}
These are my questions
I want to push into myarray if userId doesn't exist, it should be appended to myarray. If userId exists, it should be updated to point.
I found this
db.collection.update({
_id : ObjectId("57315ba4846dd82425ca2408"),
"myarray.userId" : ObjectId("570ca5e48dbe673802c2d035")
}, {
$set: { "myarray.$.point": 10 }
})
But if userId doesn't exist, nothing happens.
and
db.collection.update({
_id : ObjectId("57315ba4846dd82425ca2408")
}, {
$push: {
"myarray": {
userId: ObjectId("570ca5e48dbe673802c2d035"),
point: 10
}
}
})
But if userId object already exists, it will push again.
What is the best way to do this in MongoDB?
Try this
db.collection.update(
{ _id : ObjectId("57315ba4846dd82425ca2408")},
{ $pull: {"myarray.userId": ObjectId("570ca5e48dbe673802c2d035")}}
)
db.collection.update(
{ _id : ObjectId("57315ba4846dd82425ca2408")},
{ $push: {"myarray": {
userId:ObjectId("570ca5e48dbe673802c2d035"),
point: 10
}}
)
Explination:
in the first statment $pull removes the element with userId= ObjectId("570ca5e48dbe673802c2d035") from the array on the document where _id = ObjectId("57315ba4846dd82425ca2408")
In the second one $push inserts
this object { userId:ObjectId("570ca5e48dbe673802c2d035"), point: 10 } in the same array.
The accepted answer by Flying Fisher is that the existing record will first be deleted, and then it will be pushed again.
A safer approach (common sense) would be to try to update the record first, and if that did not find a match, insert it, like so:
// first try to overwrite existing value
var result = db.collection.update(
{
_id : ObjectId("57315ba4846dd82425ca2408"),
"myarray.userId": ObjectId("570ca5e48dbe673802c2d035")
},
{
$set: {"myarray.$.point": {point: 10}}
}
);
// you probably need to modify the following if-statement to some async callback
// checking depending on your server-side code and mongodb-driver
if(!result.nMatched)
{
// record not found, so create a new entry
// this can be done using $addToSet:
db.collection.update(
{
_id: ObjectId("57315ba4846dd82425ca2408")
},
{
$addToSet: {
myarray: {
userId: ObjectId("570ca5e48dbe673802c2d035"),
point: 10
}
}
}
);
// OR (the equivalent) using $push:
db.collection.update(
{
_id: ObjectId("57315ba4846dd82425ca2408"),
"myarray.userId": {$ne: ObjectId("570ca5e48dbe673802c2d035"}}
},
{
$push: {
myarray: {
userId: ObjectId("570ca5e48dbe673802c2d035"),
point: 10
}
}
}
);
}
This should also give (common sense, untested) an increase in performance, if in most cases the record already exists, only the first query will be executed.
There is a option called update documents with aggregation pipeline starting from MongoDB v4.2,
check condition $cond if userId in myarray.userId or not
if yes then $map to iterate loop of myarray array and check condition if userId match then merge with new document using $mergeObjects
if no then $concatArrays to concat new object and myarray
let _id = ObjectId("57315ba4846dd82425ca2408");
let updateDoc = {
userId: ObjectId("570ca5e48dbe673802c2d035"),
point: 10
};
db.collection.update(
{ _id: _id },
[{
$set: {
myarray: {
$cond: [
{ $in: [updateDoc.userId, "$myarray.userId"] },
{
$map: {
input: "$myarray",
in: {
$mergeObjects: [
"$$this",
{
$cond: [
{ $eq: ["$$this.userId", updateDoc.userId] },
updateDoc,
{}
]
}
]
}
}
},
{ $concatArrays: ["$myarray", [updateDoc]] }
]
}
}
}]
)
Playground
Unfortunately "upsert" operation is not possible on embedded array. Operators simply do not exist so that this is not possible in a single statement.Hence you must perform two update operations in order to do what you want. Also the order of application for these two updates is important to get desired result.
I haven't found any solutions based on a one atomic query. Instead there are 3 ways based on a sequence of two queries:
always $pull (to remove the item from array), then $push (to add the updated item to array)
db.collection.update(
{ _id : ObjectId("57315ba4846dd82425ca2408")},
{ $pull: {"myarray.userId": ObjectId("570ca5e48dbe673802c2d035")}}
)
db.collection.update(
{ _id : ObjectId("57315ba4846dd82425ca2408")},
{
$push: {
"myarray": {
userId:ObjectId("570ca5e48dbe673802c2d035"),
point: 10
}
}
}
)
try to $set (to update the item in array if exists), then get the result and check if the updating operation successed or if a $push needs (to insert the item)
var result = db.collection.update(
{
_id : ObjectId("57315ba4846dd82425ca2408"),
"myarray.userId": ObjectId("570ca5e48dbe673802c2d035")
},
{
$set: {"myarray.$.point": {point: 10}}
}
);
if(!result.nMatched){
db.collection.update({_id: ObjectId("57315ba4846dd82425ca2408")},
{
$addToSet: {
myarray: {
userId: ObjectId("570ca5e48dbe673802c2d035"),
point: 10
}
}
);
always $addToSet (to add the item if not exists), then always $set to update the item in array
db.collection.update({_id: ObjectId("57315ba4846dd82425ca2408")},
myarray: { $not: { $elemMatch: {userId: ObjectId("570ca5e48dbe673802c2d035")} } } },
{
$addToSet : {
myarray: {
userId: ObjectId("570ca5e48dbe673802c2d035"),
point: 10
}
}
},
{ multi: false, upsert: false});
db.collection.update({
_id: ObjectId("57315ba4846dd82425ca2408"),
"myArray.userId": ObjectId("570ca5e48dbe673802c2d035")
},
{ $set : { myArray.$.point: 10 } },
{ multi: false, upsert: false});
1st and 2nd way are unsafe, so transaction must be established to avoid two concurrent requests could push the same item generating a duplicate.
3rd way is safer. the $addToSet adds only if the item doesn't exist, otherwise nothing happens. In case of two concurrent requests, only one of them adds the missing item to the array.
Possible solution with aggregation pipeline:
db.collection.update(
{ _id },
[
{
$set: {
myarray: { $filter: {
input: '$myarray',
as: 'myarray',
cond: { $ne: ['$$myarray.userId', ObjectId('570ca5e48dbe673802c2d035')] },
} },
},
},
{
$set: {
myarray: {
$concatArrays: [
'$myarray',
[{ userId: ObjectId('570ca5e48dbe673802c2d035'), point: 10 },
],
],
},
},
},
],
);
We use 2 stages:
filter myarray (= remove element if userId exist)
concat filtered myarray with new element;
When you want update or insert value in array try it
Object in db
key:name,
key1:name1,
arr:[
{
val:1,
val2:1
}
]
Query
var query = {
$inc:{
"arr.0.val": 2,
"arr.0.val2": 2
}
}
.updateOne( { "key": name }, query, { upsert: true }
key:name,
key1:name1,
arr:[
{
val:3,
val2:3
}
]
In MongoDB 3.6 it is now possible to upsert elements in an array.
array update and create don't mix in under one query, if you care much about atomicity then there's this solution:
normalise your schema to,
{
"_id" : ObjectId("57315ba4846dd82425ca2408"),
userId : ObjectId("570ca5e48dbe673802c2d035"),
point : 5
}
You could use a variation of the .forEach/.updateOne method I currently use in mongosh CLI to do things like that. In the .forEach, you might be able to set all of your if/then conditions that you mentioned.
Example of .forEach/.updateOne:
let medications = db.medications.aggregate([
{$match: {patient_id: {$exists: true}}}
]).toArray();
medications.forEach(med => {
try {
db.patients.updateOne({patient_id: med.patient_id},
{$push: {medications: med}}
)
} catch {
console.log("Didn't find match for patient_id. Could not add this med to a patient.")
}
})
This may not be the most "MongoDB way" to do it, but it definitely works and gives you the freedom of javascript to do things within the .forEach.

How do I query a mongo document containing subset of nested array

Here is a doc I have:
var docIHave = {
_id: "someId",
things: [
{
name: "thing1",
stuff: [1,2,3,4,5,6,7,8,9]
},
{
name: "thing2",
stuff: [4,5,6,7,8,9,10,11,12,13,14]
},
{
name: "thing3",
stuff: [1,4,6,8,11,21,23,30]
}
]
}
This is the doc I want:
var docIWant = {
_id: "someId",
things: [
{
name: "thing1",
stuff: [5,6,7,8,9]
},
{
name: "thing2",
stuff: [5,6,7,8,9,10,11]
},
{
name: "thing3",
stuff: [6,8,11]
}
]
}
stuff´s of docIWant should only contain items greater than min=4
and smaller than max=12.
Background:
I have a meteor app and I subscribe to a collection giving me docIHave. Based on parameters min and max I need the docIWant "on the fly". The original document should not be modified. I need a query or procedure that returns me docIWant with the subset of stuff.
A practical code example would be greatly appreciated.
Use the aggregation framework for this. In the aggregation pipeline, consider the $match operator as your first pipeline stage. This is quite necessary to optimize your aggregation as you would need to filter documents that match the given criteria first before passing them on further down the pipeline.
Next use the $unwind operator. This deconstructs the things array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
Another $unwind operation would be needed on the things.stuff array as well.
The next pipeline stage would then filter dopcuments where the deconstructed things.stuff match the given min and max criteria. Use a $match operator for this.
A $group operator is then required to group the input documents by a specified identifier expression and applies the accumulator expression $push to each group. This creates an array expression to each group.
Typically your aggregation should end up like this (although I haven't actually tested it but this should get you going in the right direction):
db.collection.aggregate([
{
"$match": {
"things.stuff": { "$gt": 4, "$lte": 11 }
}
},
{
"$unwind": "$things"
},
{
"$unwind": "$things.stuff"
},
{
"$match": {
"things.stuff": { "$gt": 4, "$lte": 11 }
}
},
{
"$group": {
"_id": {
"_id": "$_id",
"things": "$things"
},
"stuff": {
"$push": "$things.stuff"
}
}
},
{
"$group": {
"_id": "$_id._id",
"things": {
"$push": {
"name": "$_id.things.name",
"stuff": "$stuff"
}
}
}
}
])
If you need to transform the document on the client for display purposes, you could do something like this:
Template.myTemplate.helpers({
transformedDoc: function() {
// get the bounds - maybe these are stored in session vars
var min = Session.get('min');
var max = Session.get('max');
// fetch the doc somehow that needs to be transformed
var doc = SomeCollection.findOne();
// transform the thing.stuff arrays
_.each(doc.things, function(thing) {
thing.stuff = _.reject(thing.stuff, function(n) {
return (n < min) || (n > max);
});
});
// return the transformed doc
return doc;
}
});
Then in your template: {{#each transformedDoc.things}}...{{/each}}
Use mongo aggregation like following :
First use $unwind this will unwind stuff and then use $match to find elements greater than 4. After that $group data based on things.name and add required fields in $project.
The query will be as following:
db.collection.aggregate([
{
$unwind: "$things"
}, {
$unwind: "$things.stuff"
}, {
$match: {
"things.stuff": {
$gt: 4,
$lt:12
}
}
}, {
$group: {
"_id": "$things.name",
"stuff": {
$push: "$things.stuff"
}
}
}, {
$project: {
"thingName": "$_id",
"stuff": 1
}
}])

Return all fields MongoDB Aggregate

I tried searching on here but couldn't really find what I need. I have documents like this:
{
appletype:Granny,
color:Green,
datePicked:2015-01-26,
dateRipe:2015-01-24,
numPicked:3
},
{
appletype:Granny,
color:Green,
datePicked:2015-01-01,
dateRipe:2014-12-28,
numPicked:6
}
I would like to return only those apples picked latest, will all fields. I want my query to return me the first document only essentially. When I try to do:
db.collection.aggregate([
{ $match : { "appletype" : "Granny" } },
{ $sort : { "datePicked" : 1 } },
{ $group : { "_id" : { "appletype" : "$appletype" },
"datePicked" : { $max : "$datePicked" } },
])
It does return me all the apples picked latest, however with only appletype:Granny and datePicked:2015-01-26. I need the remaining fields. I tries using $project and adding all the fields, but it didn't get me what I needed. Also, when I added the other fields to the group, since datePicked is unique, it returned both records.
How can I go about returning all fields, for only the latest datePicked?
Thanks!
From your description, it sounds like you want one document for each of the types of apple in your collection and showing the document with the most recent datePicked value.
Here is an aggregate query for that:
db.collection.aggregate([
{ $sort: { "datePicked": -1 },
{ $group: { _id: "$appletype", color: { $first: "$color" }, datePicked: { $first: "$datePicked" }, dateRipe: { $first: "$dateRipe" }, numPicked: { $first: "$numPicked" } } },
{ $project: { _id: 0, color: 1, datePicked: 1, dateRipe: 1, numPicked: 1, appletype: "$_id" } }
])
But then based on the aggregate query you've written, it looks like you're trying to get this:
db.collection.find({appletype: "Granny"}).sort({datePicked: -1}).limit(1);

mongodb aggregation framework group + project

I have the following issue:
this query return 1 result which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
"result" : [
{
"_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
"version" : 1.2000000000000002
}
],
"ok" : 1
}
this query ( I just added projection so I can later query for the entire document) return multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
"result" : [
{
"_id" : ObjectId("5139310a3899d457ee000003")
},
{
"_id" : ObjectId("513931053899d457ee000002")
},
{
"_id" : ObjectId("513930fd3899d457ee000001")
}
],
"ok" : 1
}
found the answer
1. first I need to get all the _ids
db.items.aggregate( [
{ '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
{ '$group': { _id: '$item.id', dbid: { $max: "$_id" } } }
]);
2. then i need to query the documents
db.items.find({ _id: { '$in': "IDs returned from aggregate" } });
which will look like this:
db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
( I know its late but still answering it so that other people don't have to go search for the right answer somewhere else )
See to the answer of Deka, this will do your job.
Not all accumulators are available in $project stage. We need to consider what we can do in project with respect to accumulators and what we can do in group. Let's take a look at this:
db.companies.aggregate([{
$match: {
funding_rounds: {
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
funding: {
$push: {
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
}
}
}
}, ]).pretty()
Where we're checking if any of the funding_rounds is not empty. Then it's unwind-ed to $sort and to later stages. We'll see one document for each element of the funding_rounds array for every company. So, the first thing we're going to do here is to $sort based on:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the group stage by company name, the array is getting built using $push. $push is supposed to be part of a document specified as the value for a field we name in a group stage. We can push on any valid expression. In this case, we're pushing on documents to this array and for every document that we push it's being added to the end of the array that we're accumulating. In this case, we're pushing on documents that are built from the raised_amount and funded_year. So, the $group stage is a stream of documents that have an _id where we're specifying the company name.
Notice that $push is available in $group stages but not in $project stage. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream of documents.
$project on the other hand, works with one document at a time. So, we can calculate an average on an array within an individual document inside a project stage. But doing something like this where one at a time, we're seeing documents and for every document, it passes through the group stage pushing on a new value, well that's something that the $project stage is just not designed to do. For that type of operation we want to use $group.
Let's take a look at another example:
db.companies.aggregate([{
$match: {
funding_rounds: {
$exists: true,
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
first_round: {
$first: "$funding_rounds"
},
last_round: {
$last: "$funding_rounds"
},
num_rounds: {
$sum: 1
},
total_raised: {
$sum: "$funding_rounds.raised_amount"
}
}
}, {
$project: {
_id: 0,
company: "$_id.company",
first_round: {
amount: "$first_round.raised_amount",
article: "$first_round.source_url",
year: "$first_round.funded_year"
},
last_round: {
amount: "$last_round.raised_amount",
article: "$last_round.source_url",
year: "$last_round.funded_year"
},
num_rounds: 1,
total_raised: 1,
}
}, {
$sort: {
total_raised: -1
}
}]).pretty()
In the $group stage, we're using $first and $last accumulators. Right, again we can see that as with $push - we can't use $first and $last in project stages. Because again, project stages are not designed to accumulate values based on multiple documents. Rather they're designed to reshape documents one at a time. Total number of rounds is calculated using the $sum operator. The value 1 simply counts the number of documents passed through that group together with each document that matches or is grouped under a given _id value. The project may seem complex, but it's just making the output pretty. It's just that it's including num_rounds and total_raised from the previous document.