How to remove duplicate entries from an array? - mongodb

In the following example, "Algorithms in C++" is present twice.
The $unset modifier can remove a particular field, but how do I remove an entry from an array field?
{
    "_id" : ObjectId("4f6cd3c47156522f4f45b26f"),
    "favorites" : {
        "books" : [
            "Algorithms in C++",
            "The Art of Computer Programming",
            "Graph Theory",
            "Algorithms in C++"
        ]
    },
    "name" : "robert"
}

As of MongoDB 2.2 you can use the aggregation framework with an $unwind, $group and $project stage to achieve this:
db.users.aggregate([
    { $unwind: '$favorites.books' },
    { $group: {
        _id: '$_id',
        books: { $addToSet: '$favorites.books' },
        name: { $first: '$name' }
    }},
    { $project: { 'favorites.books': '$books', name: '$name' } }
])
Note the need for the $project to rename the favorites field, since $group aggregate fields cannot be nested.
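For the sample document above, the pipeline should return something along these lines (the element order inside the $addToSet array is not guaranteed):
{
    "_id" : ObjectId("4f6cd3c47156522f4f45b26f"),
    "favorites" : {
        "books" : [
            "Algorithms in C++",
            "The Art of Computer Programming",
            "Graph Theory"
        ]
    },
    "name" : "robert"
}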

The easiest solution is to use $setUnion ($setUnion requires Mongo 2.6+; the $addFields stage used below requires 3.4+):
db.users.aggregate([
{'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])
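Note that $setUnion treats the array as a set, so the original element order is not guaranteed to be preserved. If you also want to write the deduplicated array back to the collection rather than just return it, the same expression should work in a pipeline-style update (a sketch, assuming MongoDB 4.2+ where updates accept an aggregation pipeline):
db.users.updateMany({}, [
    { '$set': { 'favorites.books': { '$setUnion': ['$favorites.books', []] } } }
])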
Another (more lengthy) version that is based on the idea from #kynan's answer, but preserves all the other fields without explicitly specifying them (Mongo 3.4+):
> db.users.aggregate([
{'$unwind': {
'path': '$favorites.books',
// output the document even if its list of books is empty
'preserveNullAndEmptyArrays': true
}},
{'$group': {
'_id': '$_id',
'books': {'$addToSet': '$favorites.books'},
// arbitrary name that doesn't exist on any document
'_other_fields': {'$first': '$$ROOT'},
}},
{
// the field, in the resulting document, has the value from the last document merged for the field. (c) docs
// so the new deduped array value will be used
'$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
},
// this stage wouldn't be necessary if the field wasn't nested
{'$addFields': {'favorites.books': '$books'}},
{'$project': {'_other_fields': 0, 'books': 0}}
])
{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" :
{ "books" : [ "The Art of Computer Programmning", "Graph Theory", "Algorithms in C++" ] } }

What you have to do is use map-reduce to detect and count the duplicate entries, then use $set to replace the entire books array for the matching document (e.g. the one with { "_id" : ObjectId("4f6cd3c47156522f4f45b26f") }).
This has been discussed several times here; please see:
Removing duplicate records using MapReduce
Fast way to find duplicates on indexed column in mongodb
http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce
http://www.mongodb.org/display/DOCS/MapReduce
How to remove duplicate record in MongoDB by MapReduce?

function unique(arr) {
    var hash = {}, result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        if (!hash.hasOwnProperty(arr[i])) {
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    }
    return result;
}
db.collection.find({}).forEach(function (doc) {
    db.collection.update({ _id: doc._id }, { $set: { "favorites.books": unique(doc.favorites.books) } });
})
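For large collections this issues one update per document; a possible refinement (a sketch, assuming MongoDB 3.2+ where the shell exposes bulkWrite) is to batch the updates:
var ops = [];
db.collection.find({}).forEach(function (doc) {
    // queue one updateOne per document, reusing the unique() helper above
    ops.push({
        updateOne: {
            filter: { _id: doc._id },
            update: { $set: { "favorites.books": unique(doc.favorites.books) } }
        }
    });
});
if (ops.length > 0) {
    db.collection.bulkWrite(ops);
}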

Starting in Mongo 4.4, the $function aggregation operator allows applying a custom javascript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to remove duplicates from an array:
// {
// "favorites" : { "books" : [
// "Algorithms in C++",
// "The Art of Computer Programming",
// "Graph Theory",
// "Algorithms in C++"
// ]},
// "name" : "robert"
// }
db.collection.aggregate(
{ $set:
{ "favorites.books":
{ $function: {
body: function(books) { return books.filter((v, i, a) => a.indexOf(v) === i) },
args: ["$favorites.books"],
lang: "js"
}}
}
}
)
// {
// "favorites" : { "books" : [
// "Algorithms in C++",
// "The Art of Computer Programming",
// "Graph Theory"
// ]},
// "name" : "robert"
// }
This has the advantages of:
keeping the original order of the array (if that's not a requirement, then prefer #Dennis Golomazov's $setUnion answer)
being more efficient than a combination of expensive $unwind and $group stages.
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to modify.
args, which contains the fields from the record that the body function takes as parameter. In our case "$favorites.books".
lang, which is the language in which the body function is written. Only js is currently available.
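If the goal is to persist the deduplicated array rather than just read it, the same $set/$function stage can in principle be used in a pipeline-style update (a sketch; $function itself requires MongoDB 4.4+ and server-side JavaScript must not be disabled):
db.collection.updateMany({}, [
    { $set: {
        "favorites.books": {
            $function: {
                // keep the first occurrence of each value, preserving order
                body: function(books) { return books.filter((v, i, a) => a.indexOf(v) === i) },
                args: ["$favorites.books"],
                lang: "js"
            }
        }
    }}
])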

Related

How do I update a field in a sub-document array with a field from the document in MongoDB?

I have a large amount of data (~160M items) where a date value wasn't populated on the sub-document array fields, but was populated on the parent document. I'm very new to MongoDB and having trouble figuring out how to $set the field to match. Here's a sample of the data:
{
    "_id": "5f11d4c48663f32e940696ed",
    "Widgets": [{
        "WidgetId": 663,
        "Name": "Super Widget 2.0",
        "Created": null,
        "LastUpdated": null
    }],
    "Status": 3,
    "LastUpdated": null,
    "Created": "2018-11-09T18:22:16.000Z"
}
My knowledge of MongoDB is pretty limited but here's the basic aggregation I have created for part of the pipeline and where I'm struggling:
db.sample.aggregate(
[
{
"$match" : {
"Donors.$.Created" : {
"$exists" : true
}
}
},
{
"$match" : {
"Widgets.$.Created" : null
}
},
{
"$set" : {
"Widgets.$.Created" : "Created" // <- This is where I can't figure out how to define the reference to the parent "Created" field
}
}
]
);
The desired output would be:
{
    "_id": "5f11d4c48663f32e940696ed",
    "Widgets": [{
        "WidgetId": 663,
        "Name": "Super Widget 2.0",
        "Created": "2018-11-09T18:22:16.000Z",
        "LastUpdated": null
    }],
    "Status": 3,
    "LastUpdated": null,
    "Created": "2018-11-09T18:22:16.000Z"
}
Thanks for any assistance.
Are you attempting to add the Created field to sub documents on query/aggregation? Or are you attempting to update/save the Created field on the subdocuments?
The $ is an update operator, to be used with updateMany or updateOne. Not aggregate.
https://docs.mongodb.com/manual/reference/operator/query-array/
https://docs.mongodb.com/manual/reference/operator/update-array/
If you just want to add the parent's Created field to all subdocuments on query/aggregation, this is all you have to do: https://mongoplayground.net/p/yHDHULCSTIz
db.collection.aggregate([
{
"$addFields": {
"Widgets.Created": "$Created"
}
}
])
If you're attempting to save the parent's Created field to all subdocuments:
db.sample.updateMany({"Widgets.Created" : null}, [{$set: {"Widgets.Created" : "$Created"}}])
Note: This matches any doc that has a subdocument with a null Created field and updates all the subdocuments.
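If you need to fill in only the subdocuments whose Created is null while leaving any already-populated values untouched, one option is a pipeline update that rebuilds the array with $map (a sketch, assuming MongoDB 4.2+ and the field names above):
db.sample.updateMany(
    { "Widgets.Created": null },
    [{ $set: {
        "Widgets": {
            $map: {
                input: "$Widgets",
                as: "w",
                in: { $mergeObjects: [
                    "$$w",
                    // copy the parent's Created only when the element's is null/missing
                    { Created: { $ifNull: ["$$w.Created", "$Created"] } }
                ]}
            }
        }
    }}]
)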

How can I write a Mongoose find query that uses another field as its conditional?

Consider the following:
I have a Mongoose model called 'Person'. In the schema for the Person model, each Person has two fields: 'children' and 'maximum_children'. Both fields are of type Number.
I would like to write a find query that returns Persons when that Person's 'children' value is less than its 'maximum_children' value.
I have tried:
person_model.find({
children: {
$lt: maximum_children
}
}, function (error, persons) {
// DO SOMETHING ELSE
});
and
person_model.find({
children: {
$lt: 'maximum_children'
}
}, function (error, persons) {
// DO SOMETHING ELSE
});
I'm doing something wrong in trying to specify the field name that I want to compare 'children' against.
OK.
I found a solution, just after I posted this question.
The answer seems to be:
person_model.find({
    $where: "children < maximum_children"
}, function (error, persons) {
    // DO SOMETHING ELSE
});
Seems to work OK, although it seems messy.
$where must execute its JavaScript conditional against every doc, so its performance can be quite poor. Instead, you can use aggregate to include a new field in a $project stage that indicates whether the doc matches or not, and then filter on that:
person_model.aggregate([
{$project: {
isMatch: {$lt: ['$children', '$maximum_children']},
doc: '$$ROOT'
}},
{$match: {isMatch: true}},
{$project: {_id: 0, doc: 1}}
], function(err, results) {...});
This uses $$ROOT to include the original doc as the doc field of the projection, with a final $project used to remove the isMatch field that was added.
results looks like:
{
"doc" : {
"_id" : ObjectId("54d04591257efd80c6965ada"),
"children" : 5,
"maximum_children" : 10
}
},
{
"doc" : {
"_id" : ObjectId("54d04591257efd80c6965add"),
"children" : 5,
"maximum_children" : 6
}
}
If you want to remove the added doc level of the objects you can use Array#map on results like so:
results = results.map(function(item) { return item.doc; });
Which reshapes results to put them back into their original form:
{
"_id" : ObjectId("54d04591257efd80c6965ada"),
"children" : 5,
"maximum_children" : 10
},
{
"_id" : ObjectId("54d04591257efd80c6965add"),
"children" : 5,
"maximum_children" : 6
}
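As a side note (not part of the original answers): on MongoDB 3.6+ the same field-to-field comparison can be expressed directly in the find filter with $expr, avoiding both $where and the aggregation reshaping. A sketch:
person_model.find({
    $expr: { $lt: ["$children", "$maximum_children"] }
}, function (error, persons) {
    // DO SOMETHING ELSE
});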

MongoDB conditionally $addToSet sub-document in array by specific field

Is there a way to conditionally $addToSet based on a specific key field in a subdocument on an array?
Here's an example of what I mean - given the collection produced by the following sample bootstrap;
cls
db.so.remove();
db.so.insert({
"Name": "fruitBowl",
"pfms" : [
{
"n" : "apples"
}
]
});
n defines a unique document key. I only want one entry with the same n value in the array at any one time. So I want to be able to update the pfms array using n so that I end up with just this;
{
"Name": "fruitBowl",
"pfms" : [
{
"n" : "apples",
"mState": 1111234
}
]
}
Here's where I am at the moment;
db.so.update({
"Name": "fruitBowl",
},{
// not allowed to do this of course
// "$pull": {
// "pfms": { n: "apples" },
// },
"$addToSet": {
"pfms": {
"$each": [
{
"n": "apples",
"mState": 1111234
}
]
}
}
}
)
Unfortunately, this adds another array element;
db.so.find().toArray();
[
{
"Name" : "fruitBowl",
"_id" : ObjectId("53ecfef5baca2b1079b0f97c"),
"pfms" : [
{
"n" : "apples"
},
{
"n" : "apples",
"mState" : 1111234
}
]
}
]
I need to effectively upsert the apples document matching on n as the unique identifier and just set mState whether or not an entry already exists. It's a shame I can't do a $pull and $addToSet in the same document (I tried).
What I really need here is dictionary semantics, but that's not an option right now, nor is breaking out the document - can anyone come up with another way?
FWIW - the existing format is a result of language/driver serialization, I didn't choose it exactly.
further
I've gotten a little further: in the case where I know the array element already exists, I can do this;
db.so.update({
"Name": "fruitBowl",
"pfms.n": "apples",
},{
$set: {
"pfms.$.mState": 1111234,
},
}
)
But of course that only works:
for a single array element
as long as I know it exists
The first limitation isn't a disaster, but if I can't effectively upsert or combine $addToSet with the previous $set (which of course I can't), then the only workarounds I can think of for now mean two DB round-trips.
The $addToSet operator of course requires that the "whole" document being "added to the set" is in fact unique, so you cannot change "part" of the document or otherwise consider it to be a "partial match".
You stumbled on to your best approach using $pull to remove any element with the "key" field that would result in "duplicates", but of course you cannot modify the same path in different update operators like that.
So the closest thing you will get is issuing separate operations but also doing that with the "Bulk Operations API" which is introduced with MongoDB 2.6. This allows both to be sent to the server at the same time for the closest thing to a "contiguous" operations list you will get:
var bulk = db.so.initializeOrderedBulkOp();
bulk.find({ "Name": "fruitBowl", "pfms.n": "apples": }).updateOne({
"$pull": { "pfms": { "n": "apples" } }
});
bulk.find({ "Name": "fruitBowl" }).updateOne({
"$push": { "pfms": { "n": "apples", "state": 1111234 } }
})
bulk.execute();
That pretty much is your best approach if it is not possible or practical to move the elements to another collection and rely on "upserts" and $set in order to have the same functionality but on a collection rather than array.
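On newer servers the same ordered pair of operations can also be expressed with bulkWrite (a sketch, assuming MongoDB 3.2+):
db.so.bulkWrite([
    { updateOne: {
        filter: { "Name": "fruitBowl", "pfms.n": "apples" },
        update: { "$pull": { "pfms": { "n": "apples" } } }
    }},
    { updateOne: {
        filter: { "Name": "fruitBowl" },
        update: { "$push": { "pfms": { "n": "apples", "mState": 1111234 } } }
    }}
], { ordered: true });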
I have faced the exact same scenario: I was inserting and removing likes on a post.
What I did was use Mongoose's findOneAndUpdate function (which is similar to the update or findAndModify function in MongoDB).
The key concept is
Insert when the field is not present
Delete when the field is present
The insert is
findOneAndUpdate({ _id: theId, 'likes.userId': { $ne: theUserId }},
{ $push: { likes: { userId: theUserId, createdAt: new Date() }}},
{ 'new': true }, function(err, post) { // do the needful });
The delete is
findOneAndUpdate({ _id: theId, 'likes.userId': theUserId},
{ $pull: { likes: { userId: theUserId }}},
{ 'new': true }, function(err, post) { // do the needful });
This makes the whole operation atomic and there are no duplicates with respect to the userId field.
I hope this helps. If you have any questions, feel free to ask.
As far as I know, MongoDB (from v4.2) now allows the use of aggregation pipelines for updates.
A more or less elegant way to make it work (according to the question) looks like the following:
db.runCommand({
update: "your-collection-name",
updates: [
{
q: {},
u: {
$set: {
"pfms.$[elem]": {
"n":"apples",
"mState": NumberInt(1111234)
}
}
},
arrayFilters: [
{
"elem.n": {
$eq: "apples"
}
}
],
multi: true
}
]
})
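The same arrayFilters update can also be written with updateMany instead of runCommand (a sketch; arrayFilters require MongoDB 3.6+):
db.so.updateMany(
    {},
    { $set: { "pfms.$[elem]": { "n": "apples", "mState": NumberInt(1111234) } } },
    { arrayFilters: [ { "elem.n": "apples" } ] }
)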
In my scenario, the data needs to be initialized when it does not exist, the field updated when it does, and the data is never deleted. If your data has these states, you might want to try the following method.
// Mongoose, but mostly same as mongodb
// Update the tag to user, If there existed one.
const user = await UserModel.findOneAndUpdate(
{
user: userId,
'tags.name': tag_name,
},
{
$set: {
'tags.$.description': tag_description,
},
}
)
.lean()
.exec();
// Add a default tag to user
if (user == null) {
await UserModel.findOneAndUpdate(
{
user: userId,
},
{
$push: {
tags: new Tag({
name: tag_name,
description: tag_description,
}),
},
}
);
}
This is the cleanest and fastest method for this scenario.
As a business analyst, I had the same problem, and after hours of investigation I hopefully have a solution.
// The customer document:
{
"id" : "1212",
"customerCodes" : [
{
"code" : "I"
},
{
"code" : "YK"
}
]
}
// The problem : I want to insert dateField "01.01.2016" to customer documents where customerCodes subdocument has a document with code "YK" but does not have dateField. The final document must be as follows :
{
"id" : "1212",
"customerCodes" : [
{
"code" : "I"
},
{
"code" : "YK" ,
"dateField" : "01.01.2016"
}
]
}
// The solution : the solution code is in three steps :
// PART 1 - Find the customers with customerCodes "YK" but without dateField
// PART 2 - Find the index of the subdocument with "YK" in customerCodes list.
// PART 3 - Insert the value into the document
// Here is the code
// PART 1
var myCursor = db.customers.find({ customerCodes:{$elemMatch:{code:"YK", dateField:{ $exists:false} }}});
// PART 2
myCursor.forEach(function(customer){
if(customer.customerCodes != null )
{
var size = customer.customerCodes.length;
if( size > 0 )
{
var iFoundTheIndexOfSubDocument= -1;
var index = 0;
customer.customerCodes.forEach( function(clazz)
{
if( clazz.code == "YK" && clazz.changeDate == null )
{
iFoundTheIndexOfSubDocument = index;
}
index++;
})
// PART 3
// What happens here is: if I found the index of the
// "YK" subdocument, I create an "updates" document which
// corresponds to the new data to be inserted
//
if( iFoundTheIndexOfSubDocument != -1 )
{
var toSet = "customerCodes."+ iFoundTheIndexOfSubDocument +".dateField";
var updates = {};
updates[toSet] = "01.01.2016";
db.customers.update({ "id" : customer.id } , { $set: updates });
// This statement is actually interpreted like this :
// db.customers.update({ "id" : "1212" } ,{ $set: customerCodes.0.dateField : "01.01.2016" });
}
}
}
});
Have a nice day!
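For what it's worth, on MongoDB 3.6+ the same fix can be done in a single update with arrayFilters instead of iterating a cursor (a sketch using the field names above):
db.customers.updateMany(
    { customerCodes: { $elemMatch: { code: "YK", dateField: { $exists: false } } } },
    { $set: { "customerCodes.$[c].dateField": "01.01.2016" } },
    { arrayFilters: [ { "c.code": "YK", "c.dateField": { $exists: false } } ] }
)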

MongoDB: Too many positional (i.e. '$') elements found in path

I just upgraded to Mongo 2.6.1 and one update statement that was working before is now returning an error. The update statement is:
db.post.update( { 'answers.comments.name': 'jeff' },
{ '$set': {
'answers.$.comments.$.name': 'joe'
}},
{ multi: true }
)
The error I get is:
WriteResult({
"nMatched" : 0,
"nUpserted" : 0,
"nModified" : 0,
"writeError" : {
"code" : 2,
"errmsg" : "Too many positional (i.e. '$') elements found in path 'answers.$.comments.$.createUsername'"
}
})
When I update an element just one level deep instead of two (i.e. answers.$.name instead of answers.$.comments.$.name), it works fine. If I downgrade my mongo instance below 2.6, it also works fine.
You CAN do this, you just need Mongo 3.6! Instead of redesigning your database, you could use the Array Filters feature in Mongo 3.6, which can be found here:
https://thecodebarbarian.com/a-nodejs-perspective-on-mongodb-36-array-filters
The beauty of this is that you can bind all matches in an array to a variable, and then reference that variable later. Here is the prime example from the link above:
Use arrayFilters.
MongoDB 3.5.12 extends all update modifiers to apply to all array
elements or all array elements that match a predicate, specified in a
new update option arrayFilters. This syntax also supports nested array
elements.
Let us assume a scenario-
"access": {
"projects": [{
"projectId": ObjectId(...),
"milestones": [{
"milestoneId": ObjectId(...),
"pulses": [{
"pulseId": ObjectId(...)
}]
}]
}]
}
Now if you want to add a pulse to a milestone which exists inside a project
db.users.updateOne({
"_id": ObjectId(userId)
}, {
"$push": {
"access.projects.$[i].milestones.$[j].pulses": ObjectId(pulseId)
}
}, {
arrayFilters: [{
"i.projectId": ObjectId(projectId)
}, {
"j.milestoneId": ObjectId(milestoneId)
}]
})
For PyMongo, use arrayFilters like this-
db.users.update_one({
"_id": ObjectId(userId)
}, {
"$push": {
"access.projects.$[i].milestones.$[j].pulses": ObjectId(pulseId)
}
}, array_filters = [{
"i.projectId": ObjectId(projectId)
}, {
"j.milestoneId": ObjectId(milestoneId)
}])
Also,
Each array filter must be a predicate over a document with a single
field name. Each array filter must be used in the update expression,
and each array filter identifier $[<identifier>] must have a corresponding
array filter. <identifier> must begin with a lowercase letter and not
contain any special characters. There must not be two array filters
with the same field name.
The positional operator can be used only once in a query. This is a limitation, there is an open ticket for improvement: https://jira.mongodb.org/browse/SERVER-831
As mentioned, more than one positional element is not supported for now. You can update with the MongoDB cursor.forEach() method.
db.post
.find({"answers.comments.name": "jeff"})
.forEach(function(post) {
if (post.answers) {
post.answers.forEach(function(answer) {
if (answer.comments) {
answer.comments.forEach(function(comment) {
if (comment.name === "jeff") {
comment.name = "joe";
}
});
}
});
db.post.save(post);
}
});
db.post.update(
{ 'answers.comments.name': 'jeff' },
{ '$set': {
'answers.$[i].comments.$.name': 'joe'
}},
{arrayFilters: [ { "i.comments.name": { $eq: 'jeff' } } ]}
)
Check the path after answers to get the key path right.
I have faced the same issue. An update of an array nested inside another array with the positional operator would require too much of a performance impact, so MongoDB does not support it. Consider redesigning your database as shown in the link below.
https://pythonolyk.wordpress.com/2016/01/17/mongodb-update-nested-array-using-positional-operator/
For example, given the original statement:
db.post.update( { 'answers.comments.name': 'jeff' },
{ '$set': {
'answers.$.comments.$.name': 'joe'
}},
{ multi: true }
)
a working version with explicit array indices (which only updates those particular positions) is:
db.post.update( { 'answers.comments.name': 'jeff' },
{ '$set': {
'answers.0.comments.1.name': 'joe'
}},
{ multi: true }
)

How to use MongoDB aggregate and retrieve entire documents

I am seriously baffled by mongodb's aggregate function. All I want is to find the newest document in my collection. Let's say each record has a field "created"
db.collection.aggregate({
    $group: {
        _id: 0,
        'id': { $first: "$_id" },
        'max': { $max: "$created" }
    }
})
yields the correct result, but I want the entire document in the result. How would I do that?
This is the structure of the document:
{
"_id" : ObjectId("52310da847cf343c8c000093"),
"created" : 1389073358,
"image" : ObjectId("52cb93dd47cf348786d63af2"),
"images" : [
ObjectId("52cb93dd47cf348786d63af2"),
ObjectId("52f67c8447cf343509d63af2")
],
"organization" : ObjectId("522949d347cf3402c3000001"),
"published" : 1392601521,
"status" : "PUBLISHED",
"tags" : [ ],
"updated" : 1392601521,
"user_id" : ObjectId("52214ce847cf344902000000")
}
In the documentation I found that the $$ROOT expression addresses this problem.
From the DOC:
http://docs.mongodb.org/manual/reference/operator/aggregation/group/#group-documents-by-author
query = [
{
'$sort': {
'created': -1
}
},
{
$group: {
'_id':null,
'max':{'$first':"$$ROOT"}
}
}
]
db.collection.aggregate(query)
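If you would rather get the winning document back in its original shape instead of nested under max, a $replaceRoot stage (MongoDB 3.4+) can unwrap it. A sketch:
db.collection.aggregate([
    { $sort: { created: -1 } },
    { $group: { _id: null, max: { $first: "$$ROOT" } } },
    { $replaceRoot: { newRoot: "$max" } }
])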
db.collection.aggregate([
    {
        $group: {
            '_id': "$_id",
            'otherFields': { $push: { fields: "$$ROOT" } }
        }
    }
])
I think I figured it out. For example, I have a collection containing an array of images (or pointers). Now I want to find the document with the most images
results=[];
db.collection.aggregate([
{$unwind: "$images"},
{$group:{_id:"$_id", 'imagecount':{$sum:1}}},
{$group:{_id:"$_id",'max':{$max: "$imagecount"}}},
{$sort:{max:-1}},
{$group:{_id:0,'id':{$first:'$_id'},'max':{$first:"$max"}}}
]).result.forEach(function(d){
results.push(db.stories.findOne({_id:d.id}));
});
now the final array will contain the document with the most images. Since images is an array, I use $unwind, I then group by document id and $sum:1, pipe that into a $group that finds the max, pipe it into reverse $sort for max and $group out the first result. Finally I findOne the document and push it into the results array.
You should be using db.collection.find() rather than db.collection.aggregate():
db.collection.find().sort({"created":-1}).limit(1)