How to parseFloat in a bulk update Mongo operation - mongodb

So I have a few fields that are currently strings that I am converting over to Double values in my mongo database. I was originally doing it with a find.forEach(function()).... style but that was taking way too long so I decided to try out the bulk operations Mongo has. So here is my attempt:
var bulkOp = db.transactions.initializeOrderedBulkOp();
bulkOp.find({"tran.val": {$exists: true}}).update(
{
$set: {"tran.val":parseFloat("$tran.val")}
}
);
bulkOp.execute();
When I do this, my tran.val, which used to be "1.35" or something along those lines, now becomes NaN. I also tried without the $ on tran.val, but no luck. Am I missing something?
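A likely explanation for the NaN: the shell evaluates parseFloat("$tran.val") in JavaScript before the update document is sent, and the literal string "$tran.val" is not a number. A classic $set update does not dereference field paths, so each document's own value has to be read and converted on the client. A minimal sketch in plain JavaScript, assuming an in-memory sample standing in for the query result; buildConversionOps is a hypothetical helper, and the operations it returns follow the updateOne shape that db.collection.bulkWrite accepts in MongoDB 3.2 and later:

```javascript
// The update document is built by the shell's JavaScript engine first,
// so the server only ever sees the already-computed value.
var update = { $set: { "tran.val": parseFloat("$tran.val") } };
// update.$set["tran.val"] is NaN here, and that is what gets written.

// Hypothetical helper: one updateOne operation per document whose
// tran.val is still a string, carrying that document's parsed value.
function buildConversionOps(docs) {
  return docs
    .filter(function (doc) {
      return doc.tran && typeof doc.tran.val === "string";
    })
    .map(function (doc) {
      return {
        updateOne: {
          filter: { _id: doc._id },
          update: { $set: { "tran.val": parseFloat(doc.tran.val) } }
        }
      };
    });
}

// Sample data standing in for db.transactions.find(...)
var sampleDocs = [
  { _id: 1, tran: { val: "1.35" } },
  { _id: 2, tran: { val: 2.5 } },
  { _id: 3, tran: { val: "10" } }
];
var ops = buildConversionOps(sampleDocs);
```

In the shell, the resulting array could be passed in batches to db.transactions.bulkWrite(ops), or each entry could be translated to bulkOp.find(filter).updateOne(update) on the bulk builder from the question.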

Related

Put all documents in an array

Is there a way to push all the documents of a given collection in a array?
I did this but is there any quicker way?
var ops = [];
db.getCollection('stock').find({}).forEach(function (stock) {
ops.push(stock);
})
PS: I use Mongo 3.4
You can just use the toArray function on the cursor that's returned from find, like this:
var ops = db.getCollection('stock').find({}).toArray();
Note: As with your original solution, this might suffer with performance if the stock collection contains millions of documents.
As an aside, you can use db.stock directly to shorten the query a little bit:
var ops = db.stock.find({}).toArray();
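Try using the lean query option. Note that lean() is a Mongoose query method and is not available on mongo shell cursors, so this only applies if you query through a Mongoose model (here a hypothetical Stock model):
Stock.find({}).lean()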
You could also use $facet, which lets you create the array on the server side, provided the resulting document is no bigger than 16 MB; otherwise you'll get an exception:
db.stock.aggregate({
$facet: {
ops: [ { $match: {} } ]
}
})
In order to reduce the amount of data returned you could limit the number of returned fields in the above pipeline (instead of an empty $match stage - which is a hack anyway - you would then use $project).
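The $project variant mentioned above might look like the following sketch. The field names symbol and price are hypothetical, and the pipeline is built as a plain array so it can be inspected before being passed to db.stock.aggregate(pipeline):

```javascript
// Hypothetical field names; replace with the fields actually needed.
var pipeline = [
  {
    $facet: {
      // Keep only the listed fields; _id is dropped explicitly
      ops: [{ $project: { _id: 0, symbol: 1, price: 1 } }]
    }
  }
];
```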

Iterating over MongoDB collection to duplicate all documents is painfully slow

I have a collection of 7,000,000 documents (each of perhaps 1-2 KB BSON) in a MongoDB collection that I would like to duplicate, modifying one field. The field is a string with a numeric value, and I would like to increment the field by 1.
Following this approach, I ran the following from the Mongo shell:
> var all = db.my_collection.find()
> all.forEach(function(it) {
... it._id = 0; // to force mongo to create a new objectId
... it.field = (parseInt(it.field) + 1).toString();
... db.my_collection.insert(it);
... })
Executing the above code is taking an extremely long time; at first I thought the code was broken somehow, but when I checked the status of the collection from a separate terminal about an hour later, the process was still running and there were now 7,000,001 documents! Sure enough, there was exactly 1 new document that matched the incremented field.
For context, I'm running a 2015 MBP with 4 cores and 16 GB ram. I see mongo near the top of my CPU overhead averaging about 85%.
1) Am I missing a bulk modify/update capability in Mongodb?
2) Any reason why the above operation would be working, yet working so slowly that it is updating a document at a rate of 1/hr?
Try the db.collection.mapReduce() way:
NB: A single emit can only hold half of MongoDB’s maximum BSON document size.
var mapFunction1 = function() {
emit(ObjectId(), (parseInt(this.field) + 1).toString());
};
MongoDB will not call the reduce function for a key that has only a single value.
var reduceFunction1 = function(id, field) {
return field;
};
Finally,
db.my_collection.mapReduce(
mapFunction1,
reduceFunction1,
{"out":"my_collection"} //Replaces the entire content; consider merge
)
I'm embarrassed to say that I was mistaken in thinking that this line:
... it._id = 0; // to force mongo to create a new objectId
does indeed force mongo to create a new ObjectId. It does not; I needed to be explicit:
... it._id = ObjectId();
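Beyond fixing the _id, the one-insert-per-document loop can probably be sped up by building the copies in batches. A minimal sketch in plain JavaScript, assuming the documents are available as an array; makeCopy and chunk are hypothetical helpers. Removing _id instead of assigning ObjectId() relies on a fresh _id being generated for inserted documents that lack one:

```javascript
// Shallow-copy a document without its _id and increment the numeric string.
function makeCopy(doc) {
  var copy = {};
  Object.keys(doc).forEach(function (key) {
    if (key !== "_id") copy[key] = doc[key];
  });
  copy.field = (parseInt(doc.field, 10) + 1).toString();
  return copy;
}

// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Sample data standing in for documents read from my_collection
var sample = [
  { _id: "a", field: "41", note: "x" },
  { _id: "b", field: "7", note: "y" },
  { _id: "c", field: "99", note: "z" }
];
var batches = chunk(sample.map(makeCopy), 2);
```

Each batch could then be handed to db.my_collection.insertMany(batch) (MongoDB 3.2 and later) or to insert(batch), which also accepts an array, so the server receives far fewer round trips than with one insert per document.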

How to set a field value only when document modified?

I'm updating many documents using a bulk operation, but I only want to bump the timestamp of documents that are changed by the new values.
Currently my bulk operation looks something like this:
var updates = db.collection.initializeUnorderedBulkOp();
// ...
updates.find( someQuery ).update( {
$set: someValues,
$currentDate: { modified:true }
} );
updates.execute();
Even if someValues doesn't modify any fields, the modified field gets set. Is there a way to provide a list of additional updates to be performed only when the original update results in a change?
I'm running MongoDB 2.6
I want to avoid negating all the object values in the query one by one.
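One possible approach, sketched under assumptions, is to generate the negated conditions programmatically rather than writing them by hand: extend the query so that it only matches documents where at least one target field differs from its new value. buildChangedFilter is a hypothetical helper, the query and values below are made-up samples, and the sketch assumes someValues maps field paths to scalar values:

```javascript
// Hypothetical helper: match baseQuery AND "at least one field differs".
function buildChangedFilter(baseQuery, newValues) {
  var differs = Object.keys(newValues).map(function (path) {
    var cond = {};
    cond[path] = { $ne: newValues[path] };
    return cond;
  });
  // $or requires a non-empty array; with nothing to set, nothing can change
  if (differs.length === 0) return null;
  return { $and: [baseQuery, { $or: differs }] };
}

// Made-up sample inputs
var someQuery = { type: "widget" };
var someValues = { color: "red", "size.width": 10 };
var filter = buildChangedFilter(someQuery, someValues);
```

The bulk operation would then use updates.find(filter) in place of updates.find(someQuery), so $currentDate only runs on documents the $set actually changes. Note that $ne also matches documents where the field is missing, which counts as a change here.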

optimizing query for $exists in sub property

I need to search for the existence of a property that is within another object.
the collection contains documents that look like:
"properties": {
"source": {
"a/name": 12837,
"a/different/name": 76129
}
}
As you can see below, part of the query string is from a variable.
With some help from JohnnyHK (see mongo query - does property exist? for more info), I've got a query that works by doing the following:
var name = 'a/name';
var query = {};
query['properties.source.' + name] = {$exists: true};
collection.find(query).toArray(function...
Now I need to see if I can index the collection to improve the performance of this query.
I don't have a clue how to do this or if it is even possible to index for this.
Suggestions?
Two things are happening here.
First, you are probably looking for sparse indexes.
http://docs.mongodb.org/manual/core/index-sparse/
In your case it could be a sparse index on the "properties.source.a/name" field. Indexing the field should dramatically improve your query lookup time.
db.yourCollectionName.createIndex( { "properties.source.a/name": 1 }, { sparse: true } )
Second, whenever you want to know whether your query is fast or slow, use the mongo console: run your query and call the explain method on its result.
db.yourCollectionName.find(query).explain();
It will tell you whether your query uses an index, how many documents it had to examine to complete the query, and other useful information.
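Since the property name comes from a variable, the index key can be built the same way as the query. A small sketch, assuming a hypothetical helper sourceIndexKey; its result would be passed to createIndex together with { sparse: true }:

```javascript
// Hypothetical helper: build the index key for one property name.
function sourceIndexKey(name) {
  var key = {};
  key["properties.source." + name] = 1;
  return key;
}

var key = sourceIndexKey("a/name");
```

This needs one index per distinct property name, so it may not scale if the set of names is open-ended.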

How to add a field to a document which contains the result of the comparison of two other fields

I would like to speed up a query on my MongoDB which uses $where to compare two fields in the document, which seems to be really slow.
My query looks like this:
db.mycollection.find({ $where : "this.lastCheckDate < this.modificationDate"})
What I would like to do is add a field to my document, i.e. isCheckDateLowerThenModDate, on which I could execute a probably much faster query:
db.mycollection.find({"isCheckDateLowerThenModDate":true})
I am quite new to MongoDB and have no idea how to do this. I would appreciate it if someone could give me some hints or examples on
How to initialize such a field on an existing collection
How to maintain this field. Which means how to update this field when lastCheckDate or modificationDate changes.
Thanks in advance for your help!
You are thinking along the right lines!
1. How to initialize such a field on an existing collection.
The simplest way is to load each document (from your language), calculate this field, update and save.
Or you could perform an update via mongo shell:
db.mycollection.find().forEach(function(doc) {
if(doc.lastCheckDate < doc.modificationDate)
{
doc.isCheckDateLowerThenModDate = true;
}
else
{
doc.isCheckDateLowerThenModDate = false;
}
db.mycollection.save(doc);
});
2. How to maintain this field, i.e. how to update it when lastCheckDate or modificationDate changes.
You have to do it yourself from your client code. Make some wrapper for update, save operations and recalculate this value each time there. To be absolutely sure that this update works -- write unit tests.
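Such a wrapper might look like the following sketch. withCheckFlag is a hypothetical helper that recomputes the flag from the two dates; client code would call it on every document just before saving or updating it:

```javascript
// Hypothetical helper: return a shallow copy with the derived flag recomputed.
function withCheckFlag(doc) {
  var copy = {};
  Object.keys(doc).forEach(function (key) {
    copy[key] = doc[key];
  });
  // Date objects compare by their millisecond timestamps under <
  copy.isCheckDateLowerThenModDate = doc.lastCheckDate < doc.modificationDate;
  return copy;
}

// Made-up sample documents
var stale = withCheckFlag({
  lastCheckDate: new Date("2013-01-01T00:00:00Z"),
  modificationDate: new Date("2013-02-01T00:00:00Z")
});
var fresh = withCheckFlag({
  lastCheckDate: new Date("2013-03-01T00:00:00Z"),
  modificationDate: new Date("2013-02-01T00:00:00Z")
});
```

Keeping the computation in one function also gives the unit tests a single place to target.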
The $where clause is slow because it is evaluating each document using the JavaScript interpreter.
There are a few alternatives:
1) Assuming your use case is "look for records that need updating", take advantage of a sparse index:
add a boolean field like needsChecking and $set this whenever the modificationDate is updated
in your "check" procedure, find the documents that have this field set (should be fast due to the sparse index)
db.mycollection.find({'needsChecking':true});
after you've done whatever check is needed, $unset the needsChecking field.
2) A new (and faster) feature in MongoDB 2.2 is the Aggregation Framework.
Here is an example of adding a "isUpdated" field based on the date comparison, and then filtering the matching documents:
db.mycollection.aggregate(
{ $project: {
_id: 1,
name: 1,
type: 1,
modificationDate: 1,
lastCheckDate: 1,
isUpdated: { $gt:["$modificationDate","$lastCheckDate"] }
}},
{ $match : {
isUpdated : true,
}}
)
Some current caveats of using the Aggregation Framework are:
you have to specify fields to include aside from _id
the result is limited to the current maximum BSON document size (16 MB in MongoDB 2.2)
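For the first alternative (the needsChecking flag), the two update documents might be built as follows. This is only a sketch; markModified and markChecked are hypothetical helpers, and their results would be passed as the update argument of db.mycollection.update:

```javascript
// Hypothetical helper: applied whenever a document is modified.
function markModified(now) {
  return { $set: { modificationDate: now, needsChecking: true } };
}

// Hypothetical helper: applied once the check has run. Removing the field
// (rather than setting it to false) keeps the document out of a sparse index.
function markChecked(now) {
  return { $set: { lastCheckDate: now }, $unset: { needsChecking: "" } };
}

var now = new Date("2013-02-01T00:00:00Z");
var modifiedUpdate = markModified(now);
var checkedUpdate = markChecked(now);
```

The sparse index itself would be created once on the needsChecking field, so the find on needsChecking only has to scan the documents that are waiting for a check.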