mongodb delete nested object without knowledge of object nodes

For the below document, I am trying to delete the node which contains id = 123
{
    "_id": "1234567890",
    "image": {
        "unknown-node-1": {
            "id": 123
        },
        "unknown-node-2": {
            "id": 124
        }
    }
}
Result should be as below.
{
    "_id": "1234567890",
    "image": {
        "unknown-node-2": {
            "id": 124
        }
    }
}
The query below achieves the result, but I have to know unknown-node-1 in advance. How can I achieve this without prior knowledge of the node name, when the only info I have is image.*.id = 123 (* means an unknown node)?
Is this possible in Mongo, or should I do this lookup in my app code?
db.test.update({'_id': "1234567890"}, {$unset: {'image.unknown-node-1': ""}})

Faiz,
There is no operator to help match and project a single key value pair without knowing the key. You'll have to write post processing code to scan each one of the documents to find the node with the id and then perform your removal.
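For example, a minimal post-processing sketch in the shell (assuming the collection is named test, as in your query) could look like this:
// Scan each document, find the dynamic key whose sub-document
// carries id 123, then $unset that key.
db.test.find({ "image": { "$exists": true } }).forEach(function(doc) {
    for (var key in doc.image) {
        if (doc.image[key].id === 123) {
            var update = { "$unset": {} };
            update["$unset"]["image." + key] = "";
            db.test.update({ "_id": doc._id }, update);
        }
    }
});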
If you have the liberty of changing your schema, you'll have more flexibility. With a document design like this:
{
    "_id": "1234567890",
    "image": [
        { "id": 123, "name": "unknown-node-1" },
        { "id": 124, "name": "unknown-node-2" },
        { "id": 125, "name": "unknown-node-3" }
    ]
}
You could remove documents from the array like this:
db.collectionName.update(
    { "_id": "1234567890" },
    { $pull: { image: { id: 123 } } }
)
This would result in:
{
    "_id": "1234567890",
    "image": [
        { "id": 124, "name": "unknown-node-2" },
        { "id": 125, "name": "unknown-node-3" }
    ]
}

With your current schema, you will need a mechanism to get a list of the dynamic keys so that you can assemble the query before doing the update. One way of doing this is with MapReduce. Take for instance the following map-reduce operation, which will populate a separate collection with all the keys as the _id values:
mr = db.runCommand({
    "mapreduce": "test",
    "map": function() {
        for (var key in this.image) { emit(key, null); }
    },
    "reduce": function(key, stuff) { return null; },
    "out": "test_keys"
})
To get a list of all the dynamic keys, run distinct on the resulting collection:
> db[mr.result].distinct("_id")
[ "unknown-node-1", "unknown-node-2" ]
Now, given the list above, you can assemble your query by creating an object that has its properties set within a loop. Normally, if you knew the keys beforehand, your query would have this structure:
var query = {
        "image.unknown-node-1.id": 123
    },
    update = {
        "$unset": {
            "image.unknown-node-1": ""
        }
    };
db.test.update(query, update);
But since the nodes are dynamic, you will have to iterate the list returned from the mapReduce operation and, for each element, create the query and update parameters as above. The list could be huge, so for maximum efficiency, and if your MongoDB server is 2.6 or newer, take advantage of the write commands Bulk API, which allows the execution of bulk update operations. These are simply abstractions on top of the server that make it easy to build bulk operations and thus get performance gains for your update over large collections. Bulk operations come mainly in two flavours:
Ordered bulk operations. These execute all the operations in order and error out on the first write error.
Unordered bulk operations. These execute all the operations in parallel and aggregate up all the errors. Unordered bulk operations do not guarantee order of execution.
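In the shell, these correspond to the two initializers (the collection name test is assumed here):
var orderedBulk = db.test.initializeOrderedBulkOp();     // stops at the first write error
var unorderedBulk = db.test.initializeUnorderedBulkOp(); // runs everything, aggregates the errors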
Note: for servers older than 2.6, the API will down-convert the operations. However, it's not possible to down-convert 100%, so there may be some edge cases where it cannot correctly report the right numbers.
In your case, you could implement the Bulk API update operation like this:
mr = db.runCommand({
    "mapreduce": "test",
    "map": function() {
        for (var key in this.image) { emit(key, null); }
    },
    "reduce": function(key, stuff) { return null; },
    "out": "test_keys"
})
// Get the dynamic keys
var dynamic_keys = db[mr.result].distinct("_id");
// Get the collection and bulk api artefacts
var bulk = db.test.initializeUnorderedBulkOp(), // initialize the unordered batch
    counter = 0;
// Iterate the dynamic keys and queue an update for each
dynamic_keys.forEach(function(key) {
    // Create the query and update documents
    var query = {},
        update = { "$unset": {} };
    query["image." + key + ".id"] = 123;
    update["$unset"]["image." + key] = "";
    bulk.find(query).update(update);
    counter++;
    if (counter % 100 === 0) {
        // Execute and re-initialise the batch operation
        bulk.execute();
        bulk = db.test.initializeUnorderedBulkOp();
    }
});
// Flush any remaining queued operations
if (counter % 100 !== 0) { bulk.execute(); }
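As a side note: on much newer servers (MongoDB 4.2+), an update with an aggregation pipeline can drop the matching node in one statement, with no key discovery at all. A sketch, assuming every value under image is a sub-document with an id field:
db.test.update(
    { "_id": "1234567890" },
    [{
        "$set": {
            "image": {
                "$arrayToObject": {   // rebuild the object ...
                    "$filter": {      // ... from the key/value pairs ...
                        "input": { "$objectToArray": "$image" },
                        "cond": { "$ne": ["$$this.v.id", 123] } // ... whose id is not 123
                    }
                }
            }
        }
    }]
)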


Add field if not exist to document in Mongo

Source Doc
{
    "_id": "12345",
    "LastName": "Smith",
    "FirstName": "Fred",
    "ProfileCreated": NumberLong(1447118831860),
    "DropOut": false
}
New Doc
{
    "_id": "12345",
    "LastName": "Smith",
    "FirstName": "Fred",
    "ProfileCreated": NumberLong(1447118831860),
    "DropOut": true,
    "LatestConsultation": false
}
I have two collections which share a lot of the same document IDs and fields, but over time the new documents will have fields added to them and/or completely new documents with new IDs will be created.
I think I know how to handle new documents using $setOnInsert and upsert = true, but I'm not sure how best to handle the addition of new fields. For documents that exist in both collections, matched on _id, the behaviour I require is to add any new field to the document without modifying the values of the other fields, even if those values have changed, as in the example where the DropOut value has changed. The document I require as a result is:
Result document
{
    "_id": "12345",
    "LastName": "Smith",
    "FirstName": "Fred",
    "ProfileCreated": NumberLong(1447118831860),
    "DropOut": false,
    "LatestConsultation": false
}
What is the best and most performant way to achieve this? Also, if this can somehow be combined into a single statement that also handles documents which exist in the new collection but not in the source collection, that would be amazing :-)
PS. I am using Pymongo so a Pymongo example would be even better but I can translate a mongo shell example.
Not sure if this is possible with an atomic update. However, you could string together some mixed operations and tackle this in such a way that you iterate the new collection and, for each document in the new collection:
Use the _id field to query the old collection. Use the findOne() method to return a document from the old collection that matches on the _id from the new collection.
Merge the old doc with the new doc: keep the old doc's values and add any fields from the new doc that do not exist in the old one.
Update the new collection with this merged document.
The following basic mongo shell example demonstrates the algorithm above:
function merge(from, to) {
    var obj = {};
    if (!from) {
        from = {};
    } else {
        obj = from;
    }
    for (var key in to) {
        if (!from.hasOwnProperty(key)) {
            obj[key] = to[key];
        }
    }
    return obj;
}
db.new_collection.find({}).snapshot().forEach(function(doc) {
    var old_doc = db.old_collection.findOne({ "_id": doc._id }),
        merged_doc = merge(old_doc, doc);
    db.new_collection.update(
        { "_id": doc._id },
        { "$set": merged_doc }
    );
});
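As a quick sanity check of merge() with the documents from the question:
var oldDoc = { "_id": "12345", "DropOut": false },
    newDoc = { "_id": "12345", "DropOut": true, "LatestConsultation": false };
merge(oldDoc, newDoc);
// => { "_id": "12345", "DropOut": false, "LatestConsultation": false }
// old values win; only genuinely new fields are added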
For dealing with large collections, better to leverage the Bulk API, which offers better performance and more efficient updates by sending the update requests in bulk rather than one update operation per request (which is slow). The method to use is the bulkWrite() function, which can be applied in the above example as:
function merge(from, to) {
    var obj = {};
    if (!from) {
        from = {};
    } else {
        obj = from;
    }
    for (var key in to) {
        if (!from.hasOwnProperty(key)) {
            obj[key] = to[key];
        }
    }
    return obj;
}
var ops = [];
db.new_collection.find({}).snapshot().forEach(function(doc) {
    var old_doc = db.old_collection.findOne({ "_id": doc._id }),
        merged_doc = merge(old_doc, doc);
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": merged_doc }
        }
    });
    if (ops.length === 1000) {
        db.new_collection.bulkWrite(ops);
        ops = [];
    }
});
if (ops.length > 0) db.new_collection.bulkWrite(ops);
Or for MongoDB 2.6.x and 3.0.x releases use this version of Bulk operations:
var bulk = db.new_collection.initializeUnorderedBulkOp(),
    counter = 0;
db.new_collection.find({}).snapshot().forEach(function(doc) {
    var old_doc = db.old_collection.findOne({ "_id": doc._id }),
        merged_doc = merge(old_doc, doc);
    bulk.find({ "_id": doc._id }).updateOne({ "$set": merged_doc });
    counter++; // count queued operations so a batch flushes every 1000 documents
    if (counter % 1000 === 0) {
        bulk.execute();
        bulk = db.new_collection.initializeUnorderedBulkOp();
    }
});
if (counter % 1000 !== 0) bulk.execute();
In both cases the Bulk operations API helps reduce the IO load on the server by sending requests only once for every 1000 queued operations, rather than once per document.

How can I remove empty strings from a MongoDB collection?

I have a "mongodb colllenctions" and I'd like to remove the "empty strings"with keys from it.
From this:
{
    "_id": ObjectId("56323d975134a77adac312c5"),
    "year": "15",
    "year_comment": ""
}
{
    "_id": ObjectId("56323d975134a77adac312c5"),
    "year": "",
    "year_comment": "asd"
}
I'd like to gain this result:
{
    "_id": ObjectId("56323d975134a77adac312c5"),
    "year": "15"
}
{
    "_id": ObjectId("56323d975134a77adac312c5"),
    "year_comment": "asd"
}
How could I solve it?
Please try executing the following code snippet in the Mongo shell, which strips out fields with empty or null values:
var result = new Array();
db.getCollection('test').find({}).forEach(function(data) {
    for (var i in data) {
        if (data[i] == null || data[i] == '') {
            delete data[i];
        }
    }
    result.push(data);
});
print(tojson(result));
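Note that the snippet above only collects and prints the stripped documents; it does not write anything back. If you also want to persist the change, one variant (a sketch; it rewrites whole documents, so avoid it if other writers are active) is:
db.getCollection('test').find({}).forEach(function(data) {
    for (var i in data) {
        if (data[i] == null || data[i] === '') {
            delete data[i]; // drop the empty/null field from the in-memory copy
        }
    }
    db.getCollection('test').save(data); // overwrite the stored document
});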
I would start by getting a distinct list of all the keys in the collection, use those keys as the query basis, and do an ordered bulk update using the Bulk API operations. The update statement uses the $unset operator to remove the fields.
The mechanism for getting the distinct keys list needed to assemble the query is Map-Reduce. The following map-reduce operation will populate a separate collection with all the keys as the _id values:
mr = db.runCommand({
    "mapreduce": "my_collection",
    "map": function() {
        for (var key in this) { emit(key, null); }
    },
    "reduce": function(key, stuff) { return null; },
    "out": "my_collection" + "_keys"
})
To get a list of all the dynamic keys, run distinct on the resulting collection:
db[mr.result].distinct("_id")
// prints ["_id", "year", "year_comment", ...]
Now given the list above, you can assemble your query by creating an object that will have its properties set within a loop. Normally your query will have this structure:
var keysList = ["_id", "year", "year_comment"];
var query = keysList.reduce(function(obj, k) {
    var q = {};
    q[k] = "";
    obj["$or"].push(q);
    return obj;
}, { "$or": [] });
printjson(query); // prints {"$or":[{"_id":""},{"year":""},{"year_comment":""}]}
You can then use the Bulk API (available with MongoDB 2.6 and above) as a way of streamlining your updates for better performance with the query above. Overall, you should be able to have something working as:
var bulk = db.collection.initializeOrderedBulkOp(),
    counter = 0,
    query = { "$or": [{ "_id": "" }, { "year": "" }, { "year_comment": "" }] },
    keysList = ["_id", "year", "year_comment"];

db.collection.find(query).forEach(function(doc) {
    var emptyKeys = keysList.filter(function(k) { // keys which hold empty strings
            return doc[k] === "";
        }),
        update = emptyKeys.reduce(function(obj, k) { // build the update object
            obj[k] = "";
            return obj;
        }, {});
    bulk.find({ "_id": doc._id }).updateOne({
        "$unset": update // use the $unset operator to remove the fields
    });
    counter++;
    if (counter % 1000 === 0) {
        // Execute per 1000 operations and re-initialize every 1000 update statements
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});
// Flush the remaining operations in the queue
if (counter % 1000 !== 0) bulk.execute();
If you need to clean up a single blank field, or you prefer to go field by field, you can use updateMany:
db.comments.updateMany({year: ""}, { $unset : { year : 1 }})
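And if you have a known list of fields to clean, a minimal sketch looping updateMany() per field (the field names here are assumptions) would be:
var keysList = ["year", "year_comment"]; // assumed field names
keysList.forEach(function(k) {
    var query = {}, update = { "$unset": {} };
    query[k] = "";            // match documents where this field is an empty string
    update["$unset"][k] = ""; // and remove just that field
    db.comments.updateMany(query, update);
});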

Rename a sub-document field within an Array

Considering the document below how can I rename 'techId1' to 'techId'. I've tried different ways and can't get it to work.
{
    "_id": ObjectId("55840f49e0b"),
    "__v": 0,
    "accessCard": "123456789",
    "checkouts": [
        {
            "user": ObjectId("5571e7619f"),
            "_id": ObjectId("55840f49e0bf"),
            "date": ISODate("2015-06-19T12:45:52.339Z"),
            "techId1": ObjectId("553d9cbcaf")
        },
        {
            "user": ObjectId("5571e7619f15"),
            "_id": ObjectId("55880e8ee0bf"),
            "date": ISODate("2015-06-22T13:01:51.672Z"),
            "techId1": ObjectId("55b7db39989")
        }
    ],
    "created": ISODate("2015-06-19T12:47:05.422Z"),
    "date": ISODate("2015-06-19T12:45:52.339Z"),
    "location": ObjectId("55743c8ddbda"),
    "model": "model1",
    "order": ObjectId("55840f49e0bf"),
    "rid": "987654321",
    "serialNumber": "AHSJSHSKSK",
    "user": ObjectId("5571e7619f1"),
    "techId": ObjectId("55b7db399")
}
In the mongo console I tried the following, which returns ok, but nothing is actually updated.
collection.update({"checkouts._id":ObjectId("55840f49e0b")},{ $rename: { "techId1": "techId" } });
I also tried this, which gives me the error "cannot use the part (checkouts of checkouts.techId1) to traverse the element":
collection.update({"checkouts._id":ObjectId("55856609e0b")},{ $rename: { "checkouts.techId1": "checkouts.techId" } })
In mongoose I have tried the following.
collection.findByIdAndUpdate(id, { $rename: { "checkouts.techId1": "checkouts.techId" } }, function (err, data) {});
and
collection.update({'checkouts._id': n1._id}, { $rename: { "checkouts.$.techId1": "checkouts.$.techId" } }, function (err, data) {});
Thanks in advance.
You were close at the end, but there are a few things missing. You cannot $rename when using the positional operator; instead you need to $set the new name and $unset the old one. But there is another restriction here: since both fields share "checkouts" as a parent path, you cannot do both in the same update.
The other core line in your question is "traverse the element" and that is the one thing you cannot do in updating "all" of the array elements at once. Well, not safely and without possibly overwriting new data coming in anyway.
What you need to do is "iterate" each document and similarly iterate each array member in order to "safely" update. You cannot really iterate just the document and "save" the whole array back with alterations. Certainly not in the case where anything else is actively using the data.
I personally would run this sort of operation in the MongoDB shell if you can, as it is a "one off" ( hopefully ) thing and this saves the overhead of writing other API code. Also we're using the Bulk Operations API here to make this as efficient as possible. With mongoose it takes a bit more digging to implement, but still can be done. But here is the shell listing:
var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

db.collection.find({ "checkouts.techId1": { "$exists": true } }).forEach(function(doc) {
    doc.checkouts.forEach(function(checkout) {
        if (checkout.hasOwnProperty("techId1")) {
            bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
                "$set": { "checkouts.$.techId": checkout.techId1 }
            });
            bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
                "$unset": { "checkouts.$.techId1": 1 }
            });
            count += 2;
            if (count % 500 === 0) {
                bulk.execute();
                bulk = db.collection.initializeOrderedBulkOp();
            }
        }
    });
});
if (count % 500 !== 0)
    bulk.execute();
Since the $set and $unset operations happen in pairs, the code above keeps each batch to 500 operations per execution, just to keep memory usage on the client down.
The loop simply looks for documents where the field to be renamed "exists" and then iterates each array element of each document and commits the two changes. As Bulk Operations, these are not sent to the server until the .execute() is called, where also a single response is returned for each call. This saves a lot of traffic.
If you insist on coding with mongoose, be aware that a .collection accessor is required to get to the Bulk API methods from the core driver, like this:
var bulk = Model.collection.initializeOrderedBulkOp();
And the only thing that sends anything to the server is the .execute() method, so this is your only execution callback:
bulk.execute(function(err, response) {
    // code body and async iterator callback here
});
And use async flow control instead of .forEach(), such as async.each.
Also, if you do that, be aware that, as a raw driver method not governed by mongoose, you do not get the same database connection awareness that you do with mongoose methods. Unless you know for sure the database connection is already established, it is safer to put this code within an event callback for the server connection:
mongoose.connection.on("open", function() {
    // body of code
});
But otherwise those are the only real ( apart from call syntax ) alterations you really need.
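Putting those pieces together, a rough mongoose sketch (the Model name Equipment and the async dependency are assumptions here, not part of your code):
var async = require("async");

mongoose.connection.on("open", function() {
    var bulk = Equipment.collection.initializeOrderedBulkOp(),
        count = 0;
    Equipment.find({ "checkouts.techId1": { "$exists": true } }).lean().exec(function(err, docs) {
        async.each(docs, function(doc, callback) {
            doc.checkouts.forEach(function(checkout) {
                bulk.find({ "_id": doc._id, "checkouts._id": checkout._id })
                    .updateOne({ "$set": { "checkouts.$.techId": checkout.techId1 } });
                bulk.find({ "_id": doc._id, "checkouts._id": checkout._id })
                    .updateOne({ "$unset": { "checkouts.$.techId1": 1 } });
                count += 2;
            });
            callback();
        }, function(err) {
            // only execute if something was queued; an empty bulk throws
            if (count > 0) bulk.execute(function(err, response) {
                // all updates sent
            });
        });
    });
});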
This worked for me. I created this query to perform the procedure and I share it (although I know it is not the most optimized way):
First, run an aggregation that (1) $matches the documents that have the checkouts array field with techId1 as one of the keys of each sub-document, (2) $unwinds the checkouts field (deconstructing the array so that a document is output for each element), (3) adds the techId field (with $addFields), (4) removes the old techId1 field (with $project), (5) $groups the documents by _id to reassemble the checkouts sub-documents, and (6) writes the result of the aggregation to a temporary collection (with $out).
const collection = 'yourCollection'

db[collection].aggregate([
    {
        $match: {
            'checkouts.techId1': { '$exists': true }
        }
    },
    {
        $unwind: {
            path: '$checkouts'
        }
    },
    {
        $addFields: {
            'checkouts.techId': '$checkouts.techId1'
        }
    },
    {
        $project: {
            'checkouts.techId1': 0
        }
    },
    {
        $group: {
            '_id': '$_id',
            // push the whole sub-document back so its other fields
            // (user, _id, date) are not lost
            'checkouts': { $push: '$checkouts' }
        }
    },
    {
        $out: 'temporal'
    }
])
Then, you can run another aggregation from this temporary collection to $merge the documents with the modified checkouts field back into your original collection (note that the $merge stage requires MongoDB 4.2 or newer).
db.temporal.aggregate([
    {
        $merge: {
            into: collection,
            on: "_id",
            whenMatched: "merge",
            whenNotMatched: "insert"
        }
    }
])

How to get total record fields from mongodb by using $group?

I want to get records with all fields using $group in MongoDB, i.e. the equivalent of SELECT * FROM users GROUP BY state.
Can anyone help me?
AFAIK there is no way to return whole objects from a group query. You can use the $addToSet operator to collect fields into arrays to return, as in the example below. Add each field you need to an array with $addToSet; the response will contain those arrays, and you will have to read your data out of them.
db.users.aggregate({$group : {_id : "$state", id : {$addToSet : "$_id"}, field1 : {$addToSet : "$field1"}}});
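On MongoDB 2.6 or newer you can also push whole documents with the $$ROOT system variable instead of listing every field, e.g.:
// Group by state and keep each full document (the result document can
// hit the 16MB limit on large groups, so use with care)
db.users.aggregate([
    { $group: { _id: "$state", docs: { $push: "$$ROOT" } } }
]);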
You can do it with MapReduce instead:
db.runCommand({
    mapreduce: 'tests',
    map: function() {
        return emit(this.state, { docs: [this] });
    },
    reduce: function(key, vals) {
        var res = vals.pop();
        vals.forEach(function(val) {
            [].push.apply(res.docs, val.docs);
        });
        return res;
    },
    finalize: function(key, reducedValue) {
        return reducedValue.docs;
    },
    out: { inline: 1 }
})
I'm using a finalize function in my example because MapReduce does not support arrays as the reduced value.
But regardless of the method, you should try to avoid such queries in production. They are fine for rare requests, like analytics, db migrations or daily scripting, but not for frequent ones.

"Map Reduce" reduce function finds value undefined

I have the following collection:
{
    "_id": ObjectId("51f1fcc08188d3117c6da351"),
    "cust_id": "abc123",
    "ord_date": ISODate("2012-10-03T18:30:00Z"),
    "status": "A",
    "price": 25,
    "items": [{
        "sku": "ggg",
        "qty": 7,
        "price": 2.5
    }, {
        "sku": "ppp",
        "qty": 5,
        "price": 2.5
    }]
}
My map function is:
var map=function(){emit(this._id,this);}
For debugging purposes I override the emit method as follows:
var emit = function(key, value) {
    print("emit");
    print("key: " + key + " value: " + tojson(value));
    reduceFunc2(key, toJson(value));
}
and the reduce function as follows:
var reduceFunc2 = function reduce(key, values) {
    var val = values;
    print("val", val);
    var items = [];
    val.items.some(function(entry) {
        print("entry is:::" + entry);
        if (entry.qty > 5 && entry.sku == 'ggg') {
            items.push(entry);
        }
    });
    val.items = items;
    return val;
}
But when I apply map as:
var myDoc = db.orders.findOne({
    _id: ObjectId("51f1fcc08188d3117c6da351")
});
map.apply(myDoc);
I get the following error:
emit
key: 51f1fcc08188d3117c6da351 value:
{
    "_id": " ObjectId(\"51f1fcc08188d3117c6da351\")",
    "cust_id": "abc123",
    "ord_date": " ISODate(\"2012-10-03T18:30:00Z\")",
    "status": "A",
    "price": 25,
    "items": [
        {
            "sku": "ggg",
            "qty": 7,
            "price": 2.5
        },
        {
            "sku": "ppp",
            "qty": 5,
            "price": 2.5
        }
    ]
}
value:: undefined
Tue Jul 30 12:49:22.920 JavaScript execution failed: TypeError: Cannot call method 'some' of undefined
You can see that there is an items field in the printed value, and it is an array. Even so, it throws "cannot call method 'some' of undefined". Can someone tell me where I am going wrong?
You have an error in your reduceFunc2 function:
var reduceFunc2 = function reduce(key, values) {
    var val = values[0]; // values is an array!!!
    // ...
}
A reduce function is meant to reduce an array of elements, all emitted with the same key, to a single document. So it accepts an array. You're emitting each key only once, so it's an array with a single element in it.
Now you'll be able to call your MapReduce normally:
db.orders.mapReduce(map, reduceFunc2, {out: {inline: 1}});
The way you've overridden the emit function is broken, so you shouldn't use it.
Update. Mongo may skip the reduce operation if there is only one document associated with the given key, because there is no point in reducing a single document.
The idea of MapReduce is that you map each document into an array of key-value pairs to be reduced in the next step. If there is more than one value associated with a given key, Mongo runs a reduce operation to reduce them to a single document. Mongo expects the reduce function to return the reduced document in the same format as the elements that were emitted. That's why Mongo may run the reduce operation any number of times for each key (up to the number of emits). There is also no guarantee that the reduce operation will be called at all if there is nothing to reduce (e.g. if there is only one element).
So it's best to move the map logic to the proper place.
Update 2. Anyway, why are you using MapReduce here? You can just query for the documents you need:
db.orders.find({}, {
    items: {
        $elemMatch: {
            qty: { $gt: 5 },
            sku: 'ggg'
        }
    }
})
Update 3. If you really want to do it with MapReduce, try this:
db.runCommand({
    mapreduce: 'orders',
    query: {
        items: {
            $elemMatch: {
                qty: { $gt: 5 },
                sku: 'ggg'
            }
        }
    },
    map: function map() {
        this.items = this.items.filter(function(entry) {
            return (entry.qty > 5 && entry.sku == 'ggg');
        });
        emit(this._id, this);
    },
    reduce: function reduce(key, values) {
        return values[0];
    },
    verbose: true,
    out: {
        merge: 'map_reduce'
    }
})