Remove the max value from an array in a document - mongodb

Using mongodb. I have a collection of vehicles, each one has an array of accidents, and each accident has a date.
Vehicle {
_id: ...,
GasAMount: ...,
Type: ...,
Accidents: [
{
Date: ISODate(...),
Type: ..,
Cost: ..
},
{
Date: ISODate(..),
Type: ..,
Cost:...,
}
]
}
How can I remove the oldest accident of each vehicle without using aggregate?
It is important not to use the aggregate method.

Unfortunately, you may have to use aggregation in this case as it's near impossible to find a non-aggregation based solution that can be as efficient.
Aggregation is useful here to get the embedded documents with the oldest date. Once you get them it's easier to do an update. The following demonstrates this concept, using MongoDB's bulk API to update your collection:
var bulk = db.vehicles.initializeUnorderedBulkOp(),
counter = 0,
pipeline = [
{ "$unwind": "$Accidents" },
{
"$group": {
"_id": "$_id",
"oldestDate": { "$min": "$Accidents.Date" }
}
}
];
var cur = db.vehicles.aggregate(pipeline);
cur.forEach(function (doc){
bulk.find({ "_id": doc._id }).updateOne({
"$pull": { "Accidents": { "Date": doc.oldestDate } }
});
counter++;
if (counter % 100 == 0) {
bulk.execute();
bulk = db.vehicles.initializeUnorderedBulkOp();
}
});
if (counter % 100 != 0) bulk.execute();
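To make the effect of the $min / $pull pair concrete, here is a minimal plain-JavaScript sketch of the same logic applied to a single in-memory document. The function name removeOldestAccident and the sample data are made up for illustration. Note that, like a $pull with a date condition, it drops every accident that shares the oldest date, not just one of them.

```javascript
// Hypothetical in-memory equivalent of the $min + $pull steps above.
function removeOldestAccident(vehicle) {
  const accidents = vehicle.Accidents || [];
  if (accidents.length === 0) return vehicle;
  // Same role as { "$min": "$Accidents.Date" } in the $group stage.
  const oldest = Math.min(...accidents.map(a => a.Date.getTime()));
  // Same role as { "$pull": { "Accidents": { "Date": oldestDate } } }:
  // every element whose Date equals the minimum is dropped.
  return Object.assign({}, vehicle, {
    Accidents: accidents.filter(a => a.Date.getTime() !== oldest)
  });
}

// Made-up sample data.
const sampleVehicle = {
  _id: 1,
  Accidents: [
    { Date: new Date("2015-03-01T00:00:00Z"), Cost: 100 },
    { Date: new Date("2014-01-15T00:00:00Z"), Cost: 250 },
    { Date: new Date("2016-07-09T00:00:00Z"), Cost: 80 }
  ]
};
const trimmedVehicle = removeOldestAccident(sampleVehicle);
console.log(trimmedVehicle.Accidents.map(a => a.Cost));
```

If two accidents can share the same date and only one should go, the condition would need a second distinguishing field.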

Related

MongoDB: update all document on one field

{
"_id" : 1,
"users" : 2329255
},
{
"_id" :2,
"users" : 2638831
}
how to update all documents users field divided by 100.
result will be
{
"_id" : 1,
"users" : 23292.55
},
{
"_id" : 2,
"users" : 26388.31
}
db.coll.update({}, {$set: {'users': {'$divide': ['$users', 100]}}})
This is not working.
Try below query:
db.coll.find().snapshot().forEach(
function (e) {
e.users = e.users/100;
// save the updated document
db.coll.save(e);
}
)
The query above updates the data in the database. If you only want to fetch the records with the divided value, use $project:
db.coll.aggregate(
[
{ $project: { users: { $divide: [ "$users", 100 ] } } }
]
)
This does not update the data; it only returns the desired values.
Use whichever fits your requirement.
The $divide operator is only valid for the aggregate() function, not the update() function. What you want to do is use the aggregate() method to create a computed field, then iterate the results from the aggregate() cursor to create bulk update operations that you can send to the server in batches, rather than sending a separate update request for each item in the result.
The following example demonstrates this:
var bulkUpdateOps = [];
db.coll.aggregate([
{ "$match": { "users": { "$exists": true } } },
{
"$project": {
"computed_field": {
"$divide": ["$users", 100]
}
}
}
]).forEach(function(doc){
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "users": doc.computed_field } }
}
});
if (bulkUpdateOps.length === 500) {
db.coll.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) db.coll.bulkWrite(bulkUpdateOps);
Or for MongoDB 2.6.x and 3.0.x releases, use this version of Bulk operations:
var bulk = db.coll.initializeUnorderedBulkOp(),
counter = 0;
db.coll.aggregate([
{ "$match": { "users": { "$exists": true } } },
{
"$project": {
"computed_field": {
"$divide": ["$users", 100]
}
}
}
]).forEach(function(doc) {
bulk.find({ "_id": doc._id })
.updateOne({ "$set": { "users": doc.computed_field } });
counter++; // count queued operations so the batch is flushed every 500
if (counter % 500 === 0) {
bulk.execute();
bulk = db.coll.initializeUnorderedBulkOp();
}
});
if (counter % 500 !== 0 ) bulk.execute();
In both cases the Bulk operations API helps reduce the I/O load on the server by sending the requests in batches of 500 operations rather than one request per document.
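On MongoDB 4.2 or later, where update commands accept an aggregation pipeline, the computed value can be written back in a single command without iterating on the client. The sketch below only builds the filter and the pipeline as plain objects so they can be inspected outside the shell; the variable names are illustrative, and the actual shell call is shown in a comment.

```javascript
// Only touch documents whose "users" field holds a number.
const divideFilter = { users: { $type: "number" } };

// The square brackets make this an aggregation pipeline rather than a
// plain update document, which is what lets $divide reference "$users".
const dividePipeline = [
  { $set: { users: { $divide: ["$users", 100] } } }
];

// In the mongo shell (MongoDB 4.2+) this would be run as:
//   db.coll.updateMany(divideFilter, dividePipeline);
console.log(JSON.stringify(dividePipeline));
```

This is essentially the update the question attempted, with the update document wrapped in a pipeline.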

Remove all Duplicates except the most recent document

I would like to clear all duplicates of a specific field in a collection, leaving only the most recent entry of each set of duplicates.
Here is my aggregate query which works great for finding the duplicates:
db.History.aggregate([
{ $group: {
_id: { name: "$sessionId" },
uniqueIds: { $addToSet: "$_id" },
count: { $sum: 1 }
} },
{ $match: {
count: { $gte: 2 }
} },
{ $sort : { count : -1} }
],{ allowDiskUse:true,
cursor:{}});
The only problem is that I also need to execute a remove query and keep, for each set of duplicates, the youngest entry (determined by the field 'timeCreated'):
"timeCreated" : ISODate("2016-03-07T10:48:43.251+02:00")
How exactly do I do that?
Personally I would take advantage of the fact that the ObjectId values themselves are "monotonic" or therefore "ever increasing in value" which means that the "youngest" or "most recent" would come at the end of a naturally sorted list.
So rather than force the aggregation pipeline to do the sorting, the most logical and efficient thing to do is simply sort the list of unique _id values returned per document as you process each response.
So basically working with the listing that you must have found:
Remove Duplicates from MongoDB
That is actually my answer (and you're the second person to reference it this week, yet it has received no votes for being useful! Hmm!), where the only change is a simple .sort() applied within the cursor iteration to the returned array:
Using the _id Value
var bulk = db.History.initializeOrderedBulkOp(),
count = 0;
// List "all" fields that make a document "unique" in the `_id`
// I am only listing some for example purposes to follow
db.History.aggregate([
{ "$group": {
"_id": "$sessionId",
"ids": { "$push": "$_id" }, // _id values are already unique, so $addToSet adds nothing
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } } }
],{ "allowDiskUse": true}).forEach(function(doc) {
doc.ids.sort().reverse(); // <-- this is the only real change
doc.ids.shift(); // remove first match, which is now youngest
bulk.find({ "_id": { "$in": doc.ids } }).remove(); // removes all $in list
count++;
// Execute 1 in 1000 and re-init
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.History.initializeOrderedBulkOp();
}
});
if ( count % 1000 != 0 )
bulk.execute();
Using a specific field
If you "really" are set on adding another date value on which to determine which is youngest then just add to the array in $push first, then apply the client side sort function. Again just a really simple change:
var bulk = db.History.initializeOrderedBulkOp(),
count = 0;
// List "all" fields that make a document "unique" in the `_id`
// I am only listing some for example purposes to follow
db.History.aggregate([
{ "$group": {
"_id": "$sessionId",
"ids": { "$push": {
"_id": "$_id",
"created": "$timeCreated"
}},
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } } }
],{ "allowDiskUse": true}).forEach(function(doc) {
doc.ids = doc.ids.sort(function(a,b) { // sort dates descending and just return _id
return b.created.valueOf() - a.created.valueOf(); // youngest first
}).map(function(el) { return el._id });
doc.ids.shift(); // remove first match, which is now youngest
bulk.find({ "_id": { "$in": doc.ids } }).remove(); // removes all $in list
count++;
// Execute 1 in 1000 and re-init
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.History.initializeOrderedBulkOp();
}
});
if ( count % 1000 != 0 )
bulk.execute();
So it's a really simple process with no "real" alteration to the original process used to identify the duplicates and then remove all but one of them.
The best approach here is always to let the server do the job of finding the duplicates; then, on the client side, as you iterate the cursor, you can work out from the returned array which document to keep and which ones to remove.
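The client-side step can be checked in isolation. Below is a small plain-JavaScript sketch of picking which _id values to remove from one group, using made-up entries and a hypothetical helper name idsToRemove. The point to watch is that the sort comparator returns a number (negative, zero or positive) rather than a boolean, and that it compares a against b.

```javascript
// Hypothetical helper: given the grouped entries for one sessionId,
// return the _id values to remove, keeping the most recent entry.
function idsToRemove(entries) {
  const sorted = entries.slice().sort(function (a, b) {
    // Descending by creation time: a negative result puts `a` first.
    return b.created.valueOf() - a.created.valueOf();
  });
  // Everything after the first (youngest) element is a duplicate to drop.
  return sorted.slice(1).map(function (el) { return el._id; });
}

// Made-up group of duplicates.
const group = [
  { _id: "a", created: new Date("2016-03-07T08:48:43.251Z") },
  { _id: "b", created: new Date("2016-03-09T10:00:00.000Z") },
  { _id: "c", created: new Date("2016-03-08T12:30:00.000Z") }
];
const doomed = idsToRemove(group);
console.log(doomed);
```

Here "b" is the youngest entry, so it is the one left out of the removal list.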

Insert field with array size in mongo

I have documents in MongoDB, each containing an array. Now I need a field containing the number of items in this array, so I need to update the documents to add this field.
I thought this would simply work:
db.myDocument.update({
"itemsTotal": {
$exists: false
},
"items": {
$exists: true
}
}, {
$set: {
itemsTotal: {
$size: "$items"
}
}
}, {
multi: true
})
But it completes with "not okForStorage".
Also I tried to make an aggregation, but it throws exception:
"errmsg" : "exception: invalid operator '$size'",
"code" : 15999,
"ok" : 0
What is the best solution, and what am I doing wrong? I'm starting to think about writing a Java tool to calculate the totals and update the documents with them.
You can use the .aggregate() method to $project your documents and return the $size of the items array. After that, loop through the aggregation result with .forEach and $set the itemsTotal field for each document, using "Bulk" operations for maximum efficiency.
var bulkOp = db.myDocument.initializeUnorderedBulkOp();
var count = 0;
db.myDocument.aggregate([
{ "$match": {
"itemsTotal": { "$exists": false } ,
"items": { "$exists": true }
}},
{ "$project": { "itemsTotal": { "$size": "$items" } } }
]).forEach(function(doc) {
bulkOp.find({ "_id": doc._id }).updateOne({
"$set": { "itemsTotal": doc.itemsTotal }
});
count++;
if (count % 200 === 0) {
// Execute per 200 operations and re-init
bulkOp.execute();
bulkOp = db.myDocument.initializeUnorderedBulkOp();
}
})
// Flush any operations still queued (an empty bulk cannot be executed)
if (count % 200 !== 0) {
bulkOp.execute();
}
You could initialise a Bulk() operations builder to update the document in a loop as follows:
var bulk = db.collection.initializeUnorderedBulkOp(),
count = 0;
db.collection.find({
"itemsTotal": { "$exists": false },
"items": { "$exists": true }
}).forEach(function(doc) {
var items_size = doc.items.length;
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "itemsTotal": items_size }
});
count++;
if (count % 100 == 0) {
bulk.execute();
bulk = db.collection.initializeUnorderedBulkOp();
}
});
if (count % 100 != 0) { bulk.execute(); }
This is much easier starting with MongoDB v3.4, which introduced the $addFields aggregation pipeline operator. We'll also use the $out operator to output the result of the aggregation to the same collection (replacing the existing collection is atomic).
db.myDocuments.aggregate( [
{
$addFields: {
itemsTotal: { $size: "$items" } ,
},
},
{
$out: "myDocuments"
}
] )
WARNING: this solution requires all documents to have the items field. If some documents don't have it, the aggregation will fail with
"The argument to $size must be an array, but was of type: missing"
You might think you could add a $match to the aggregation to filter only documents containing items, but that means all documents not containing items will not be output back to the myDocuments collection, so you'll lose those permanently.
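One possible way around that failure, offered here as an assumption rather than as part of the answer above, is to guard $size with $cond and $isArray so that documents without an items array get a total of 0 instead of aborting the aggregation. The pipeline is built as a plain object, and the small function itemsTotalOf mirrors the expression in JavaScript; both names are illustrative.

```javascript
// Guarded variant: fall back to an empty array when "items" is missing
// or is not an array, so $size always receives an array.
const sizePipeline = [
  {
    $addFields: {
      itemsTotal: {
        $size: { $cond: [{ $isArray: ["$items"] }, "$items", []] }
      }
    }
  },
  { $out: "myDocuments" }
];

// Plain-JavaScript mirror of the guarded expression, for checking the logic.
function itemsTotalOf(doc) {
  return Array.isArray(doc.items) ? doc.items.length : 0;
}

console.log(itemsTotalOf({ items: [1, 2, 3] }), itemsTotalOf({}));
```

Whether a total of 0 is acceptable for documents without items depends on your data; if not, those documents would need separate handling.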

Convert a field to an array using update operation

In Mongo, how do you convert a field to an array containing only the original value using only the update operation?
Given a document:
{
"field": "x"
}
Then one or more update operation(s):
db.items.update(...)
Should result in:
{
"field": ["x"]
}
MongoDB does not currently allow you to refer to the existing value of a field within an update type of operation. In order to make changes that refer to the existing field values you would need to loop the results. But the array conversion part is simple:
db.collection.find().forEach(function(doc) {
db.collection.update(
{ _id: doc._id },
{ "$set": { "field": [doc.field] } }
);
})
Or even better with bulk update functionality in MongoDB 2.6:
var batch = [];
db.collection.find().forEach(function(doc) {
batch.push({
"q": { _id: doc._id },
"u": { "$set": { "field": [doc.field] } }
});
if ( batch.length % 500 == 0 ) {
db.runCommand({ "update": "collection", "updates": batch });
batch = [];
}
});
if ( batch.length > 0 )
db.runCommand({ "update": "collection", "updates": batch });
Or even using the new Bulk API helpers:
var counter = 0;
var bulk = db.collection.initializeUnorderedBulkOp();
db.collection.find().forEach(function(doc) {
bulk.find({ _id: doc._id }).update({
"$set": { "field": [doc.field] }
});
counter++;
if ( counter % 500 == 0 ) {
bulk.execute();
bulk = db.collection.initializeUnorderedBulkOp();
counter = 0;
}
});
if ( counter > 0 )
bulk.execute();
Both of the last two approaches send the updates to the server only once per 500 items, or whatever batch size you tune it to, as long as it stays under the 16MB BSON limit. All updates are still performed individually, but this removes a lot of write/confirmation traffic from the overall operation and is much faster.
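The batching logic itself can be sketched without a database. The helper below (buildWrapBatches is a made-up name, and the sample documents are invented) turns documents into bulkWrite-style updateOne operations in batches of a given size, and skips values that are already arrays so that running the conversion twice does not produce nested arrays.

```javascript
// Hypothetical helper: build updateOne operations that wrap "field" in an
// array, grouped into batches of at most `batchSize` operations.
function buildWrapBatches(docs, batchSize) {
  const batches = [];
  let current = [];
  docs.forEach(function (doc) {
    // Skip documents already converted, so a second run is harmless.
    if (Array.isArray(doc.field)) return;
    current.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { field: [doc.field] } }
      }
    });
    if (current.length === batchSize) {
      batches.push(current);
      current = [];
    }
  });
  if (current.length > 0) batches.push(current); // flush the remainder
  return batches;
}

// Made-up documents; _id 2 is already an array and is skipped.
const wrapBatches = buildWrapBatches(
  [
    { _id: 1, field: "x" },
    { _id: 2, field: ["y"] },
    { _id: 3, field: "z" },
    { _id: 4, field: "w" }
  ],
  2
);
console.log(wrapBatches.map(function (b) { return b.length; }));
// In the shell, each batch could then be sent with db.collection.bulkWrite(batch).
```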

Removing white spaces (leading and trailing) from string value

I have imported a csv file in mongo using mongoimport and I want to remove leading and trailing white spaces from my string value.
Is it possible directly in mongo to use a trim function for all collection or do I need to write a script for that?
My collection contains elements such as:
{
"_id" : ObjectId("53857680f7b2eb611e843a32"),
"category" : "Financial & Legal Services "
}
I want to apply trim function for all the collection so that "category" should not contain any leading and trailing spaces.
It is not currently possible for an update in MongoDB to refer to the existing value of a current field when applying the update. So you are going to have to loop:
db.collection.find({},{ "category": 1 }).forEach(function(doc) {
doc.category = doc.category.trim();
db.collection.update(
{ "_id": doc._id },
{ "$set": { "category": doc.category } }
);
})
Note the use of the $set operator there, and the projection of only the "category" field in order to reduce network traffic.
You might limit what that processes with a $regex match:
db.collection.find({
"$or": [
{ "category": /^\s+/ },
{ "category": /\s+$/ }
]
})
Or even as a single $regex with alternation, which avoids the explicit $or altogether:
db.collection.find({ "category": /^\s+|\s+$/ })
Either form restricts the documents to process to only those with leading or trailing white-space.
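A quick check in plain JavaScript, with made-up sample strings, of which values the single alternation regex flags, compared with a condition that requires both a leading and a trailing match. Only the "either side" form catches a value with trailing white-space alone, such as the one in the question.

```javascript
// Matches white-space at the start OR at the end of the string.
const eitherSide = /^\s+|\s+$/;

function needsTrim(value) {
  return eitherSide.test(value);
}

const samples = ["Financial & Legal Services ", " IT ", "Clean", " leading"];

// Values that have something to trim on at least one side.
const flagged = samples.filter(needsTrim);

// Values that have white-space on both sides at once (a stricter condition).
const bothSides = samples.filter(function (s) {
  return /^\s+/.test(s) && /\s+$/.test(s);
});

console.log(flagged);
console.log(bothSides);
```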
If you are worried about the number of documents to process, bulk updating should help if you have MongoDB 2.6 or greater available:
var batch = [];
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1 }).forEach(
function(doc) {
batch.push({
"q": { "_id": doc._id },
"u": { "$set": { "category": doc.category.trim() } }
});
if ( batch.length % 1000 == 0 ) {
db.runCommand({ "update": "collection", "updates": batch });
batch = [];
}
}
);
if ( batch.length > 0 )
db.runCommand({ "update": "collection", "updates": batch });
Or even with the bulk operations API for MongoDB 2.6 and above:
var counter = 0;
var bulk = db.collection.initializeOrderedBulkOp();
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1}).forEach(
function(doc) {
bulk.find({ "_id": doc._id }).update({
"$set": { "category": doc.category.trim() }
});
counter = counter + 1;
if ( counter % 1000 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
}
);
if ( counter % 1000 != 0 )
bulk.execute();
This is best done with bulkWrite() in modern APIs, which uses the Bulk Operations API (technically everything does now) but in a way that safely falls back with older versions of MongoDB. In all honesty, though, that would mean versions prior to MongoDB 2.6, and you would be well out of coverage for official support options using such a version. The code is somewhat cleaner for this:
var batch = [];
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1}).forEach(
function(doc) {
batch.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "category": doc.category.trim() } }
}
});
if ( batch.length % 1000 == 0 ) {
db.collection.bulkWrite(batch);
batch = [];
}
}
);
if ( batch.length > 0 ) {
db.collection.bulkWrite(batch);
batch = [];
}
All of these send operations to the server only once per 1000 documents, or as many modifications as you can fit under the 16MB BSON limit.
These are just a few ways to approach the problem. Alternatively, clean up your CSV file before importing.
Starting Mongo 4.2, db.collection.update() can accept an aggregation pipeline, finally allowing the update of a field based on its own value.
Starting Mongo 4.0, the $trim operator can be applied on a string to remove its leading/trailing white spaces:
// { category: "Financial & Legal Services " }
// { category: " IT " }
db.collection.updateMany(
{},
[{ $set: { category: { $trim: { input: "$category" } } } }]
)
// { category: "Financial & Legal Services" }
// { category: "IT" }
Note that:
The first part {} is the match query, filtering which documents to update (in this case all documents).
The second part [{ $set: { category: { $trim: { input: "$category" } } } }] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
$set is a new aggregation operator which in this case replaces the value for "category".
With $trim we modify and trim the value for "category".
Note that $trim can take an optional parameter chars which allows specifying which characters to trim.
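To avoid rewriting documents that are already clean, the pipeline update can presumably be combined with the regex filter used earlier in this thread. The sketch below just builds the two arguments as plain objects under that assumption; the variable names are illustrative and the shell call is shown in a comment.

```javascript
// Match only documents whose category has leading or trailing white-space.
const trimFilter = { category: /^\s+|\s+$/ };

// Pipeline-style update (MongoDB 4.2+): $trim reads the current value.
const trimPipeline = [
  { $set: { category: { $trim: { input: "$category" } } } }
];

// In the mongo shell this would be run as:
//   db.collection.updateMany(trimFilter, trimPipeline);
console.log(trimFilter.category.test("Financial & Legal Services "));
```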
A small correction to Neil's answer regarding the Bulk operations API: the method is
initializeOrderedBulkOp
not
initializeBulkOrderedOp
The answer also missed a
counter++;
inside the forEach. So, in summary:
var counter = 1;
var bulk = db.collection.initializeOrderedBulkOp();
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1}).forEach(
function(doc) {
bulk.find({ "_id": doc._id }).update({
"$set": { "category": doc.category.trim() }
});
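if ( counter % 1000 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp(); // an executed bulk cannot be reused
counter = 0; // becomes 1 again below, meaning nothing is pending
}
counter++;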
}
);
if ( counter > 1 )
bulk.execute();
Note: I don't have enough reputation to comment, hence adding an answer
You can run JavaScript around a MongoDB update command when it is called inside a cursor method such as forEach:
db.collection.find({},{ "category": 1 }).forEach(function(doc) {
db.collection.update(
{ "_id": doc._id },
{ "$set": { "category": doc.category.trim() } }
);
})
If you have a ton of records and need to batch process, you might want to look at the other answers here.