How do I maintain data types when copying a document? - mongodb

I need to make a change to use a generated ObjectId instead of the String I was using, but in the process the field data type changes from Int to Double.
For example say we have a document
{_id: "Product Name", count: 415 }
Now I want to create a document
{_id: "some object id", name: "Product Name", count: 415 }
I am using code similar to the below, but it makes the count a Double.
var cursor = db.products.find();
cursor.forEach(function(item) {
    var old_id = item._id;
    item.name = old_id;
    delete item._id;
    db.products.insert(item);
    db.products.remove({_id: old_id});
});
I can add item.count = NumberInt(item.count) in the loop to make sure it's an Int, but I really don't want to do this for each field that I have.
Is there any way to do this without manually having to cast them? I don't understand why it takes an Int and turns it into a Double. I know Double is the default, but the fields that I am working with are already Integers.

Well, if I understand you correctly, your documents look like this:
{ "_id" : "Apple", "count" : 187 }
{ "_id" : "Google", "count" : 123 }
{ "_id" : "Amazon", "count" : 325 }
{ "_id" : "Oracle", "count" : 566 }
You can use the Bulk Api to update your collection.
var bulk = db.collection.initializeUnorderedBulkOp();
var count = 0;
db.collection.aggregate([{ $project: { '_id': 0, 'name': '$_id', 'count': 1 } }]).forEach(function(doc) {
    bulk.find({ '_id': doc.name }).remove();
    bulk.insert(doc);
    count++;
    if (count % 1000 === 0) {
        // Execute per 1000 operations and re-initialize the bulk builder.
        bulk.execute();
        bulk = db.collection.initializeUnorderedBulkOp();
    }
});
// Flush any remaining queued operations.
if (count % 1000 !== 0) {
    bulk.execute();
}
Then:
db.collection.find()
Yields the following documents:
{ "_id" : ObjectId("55a7e2c7eb68594275546c7c"), "count" : 187, "name" : "Apple" }
{ "_id" : ObjectId("55a7e2c7eb68594275546c7d"), "count" : 123, "name" : "Google" }
{ "_id" : ObjectId("55a7e2c7eb68594275546c7e"), "count" : 325, "name" : "Amazon" }
{ "_id" : ObjectId("55a7e2c7eb68594275546c7f"), "count" : 566, "name" : "Oracle" }
Is there any way to do this without manually having to cast them? I don't understand why it takes an Int and turns it into a Double. I know Double is the default, but the fields that I am working with are already Integers.
You really don't need to worry about that if you are using the shell, but as pointed out in the comments you can always use a language with native support for integers to preserve the data type.
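If you do want to stay in the shell, here is a minimal sketch of the casting loop, assuming every whole-number field in these documents really should be a 32-bit integer (adjust if you have genuine doubles or counts beyond 32 bits):
// Sketch: re-cast any whole-number value back to NumberInt.
db.products.find().forEach(function(item) {
    Object.keys(item).forEach(function(key) {
        var v = item[key];
        if (typeof v === 'number' && v % 1 === 0) {
            item[key] = NumberInt(v);
        }
    });
    db.products.save(item); // rewrite the document in place
});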

Related

mongodb findOneAndUpdate only if a certain condition is met

Following are my mongodb entries:
my-mongo-set:PRIMARY> db.stat_collection.find({name : /s/})
{ "_id" : ObjectId("5aabf231a167b3808302b138"), "name" : "shankarmr", "email" : "abc#xyz", "rating" : 9901 }
{ "_id" : ObjectId("5aabf23da167b3808302b139"), "name" : "shankar", "email" : "abc1#xyz1", "rating" : 10011 }
{ "_id" : ObjectId("5aabf2b5a167b3808302b13a"), "name" : "shankar1", "email" : "abc2#xyz2", "rating" : 10 }
{ "_id" : ObjectId("5aabf2c2a167b3808302b13b"), "name" : "shankar2", "email" : "abc3#xyz3", "rating" : 100 }
Now I want to find an entry based on name, but update a field only if a certain condition holds.
I tried the following statement, but it gives me error at the second reference to $rating.
db.stat_collection.findOneAndUpdate({name: "shankar"}, {$set : {rating : {$cond : [ {$lt : [ "$rating", 100]}, 100, $rating]}}, $setOnInsert: fullObject}, {upsert : true} )
So in my case it should not update the rating for the 2nd document, as the rating is not less than 100. But for the third document, the rating should be updated to 100.
How do I get it to work?
$max is the operator you're looking for, try:
db.stat_collection.findOneAndUpdate( { name: "shankar1"}, { $max: { rating: 100 } }, { returnNewDocument: true } )
You'll either get the document back with its old rating (if it is greater than 100), or the document will be modified and the rating set to 100.
According to the documentation:
The $max operator updates the value of the field to a specified value if the specified value is greater than the current value of the field. The $max operator can compare values of different types, using the BSON comparison order.
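Applied to the sample data above, the same call covers both cases (a quick shell sketch; returnNewDocument just makes the modified document visible):
// rating is 10 (< 100), so $max raises it to 100:
db.stat_collection.findOneAndUpdate(
    { name: "shankar1" },
    { $max: { rating: 100 } },
    { returnNewDocument: true }
);
// rating is 10011 (>= 100), so $max leaves it untouched:
db.stat_collection.findOneAndUpdate(
    { name: "shankar" },
    { $max: { rating: 100 } },
    { returnNewDocument: true }
);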
You should put all the conditions in the query part of the update:
db.stat_collection.findOneAndUpdate(
    { name: "shankar", rating: { $lt: 100 } },
    { $set: { rating: 100 } }
);
The above reads: "if the name is shankar and the rating is less than 100, then set the rating to 100."

Querying with array of parameters in mongodb

I have the collection below in the DB, and I want to retrieve documents whose birth month equals one of two given months, say [1,2] or [4,5].
{
    "_id" : ObjectId("55aa1e526fea82e9a4188f38"),
    "name" : "Nilmini",
    "birthDate" : 6,
    "birthMonth" : 1
},
{
    "_id" : ObjectId("55aa1e526fea82e9a4188f39"),
    "name" : "Ruwan",
    "birthDate" : 6,
    "birthMonth" : 1
},
{
    "_id" : ObjectId("55aa1e526fea82e9a4188f40"),
    "name" : "Malith",
    "birthDate" : 6,
    "birthMonth" : 1
},
{
    "_id" : ObjectId("55aa1e526fea82e9a4188f7569"),
    "name" : "Pradeep",
    "birthDate" : 6,
    "birthMonth" : 7
}
I use the query below to get the result set. I could get the result for one given month; now I want to get results for multiple months.
var currentDay = moment().date();
var currentMonths = [];
var currentMonth = moment().month();
if (currentDay > 20) {
    currentMonths.push(moment().month());
    currentMonths.push(moment().month() + 1);
} else {
    currentMonths.push(currentMonth);
}
In the query below I am trying to pass the array to 'birthMonth', but I'm getting nothing back when I pass the array, so I think there must be another way to do this:
Employee.find(
    {
        "birthDate": { $gte: currentDay }, "birthMonth": currentMonths
    }, function(err, birthDays) {
        res.json(birthDays);
    });
I would really appreciate it if you could help me figure this out.
You can use the $in operator to match against multiple values in an array like currentMonths.
So your query would be:
Employee.find(
    {
        "birthDate": { $gte: currentDay }, "birthMonth": { $in: currentMonths }
    }, function(err, birthDays) {
        res.json(birthDays);
    });
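As a quick sanity check against the sample documents, a shell sketch (employees is a hypothetical collection name standing in for whatever backs your Employee model):
// Months [1, 7] match all four sample documents;
// [1, 2] would match only the first three.
db.employees.find({ "birthMonth": { $in: [1, 7] } });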

Upsert with pymongo and a custom _id field

I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
  'metadata': { 'adaptor_id': 'CHA-2RU',
                'array_serial': 12345,
                'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
                'processor_id': 0 }
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that in this case the 'minute' subdocument only ever holds the last { hour: { minute: metric } } entry; it does not gain new entries for other hours, it always overwrites the one entry.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?
In both of the examples listed you are simply setting a field ('minute') to a particular value; the only reason it is an addition the first time you update is that the field itself does not exist and so must be created.
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different to yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
    "_id" : "u12345CHA-2RU020130304",
    "metadata" : {
        "adaptor_id" : "CHA-2RU",
        "array_serial" : 12345,
        "date" : ISODate("2013-03-18T23:28:50.660Z"),
        "processor_id" : 0
    }
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
    "_id" : "u12345CHA-2RU020130304",
    "metadata" : {
        "adaptor_id" : "CHA-2RU",
        "array_serial" : 12345,
        "date" : ISODate("2013-03-18T23:28:50.660Z"),
        "processor_id" : 0
    },
    "minute" : [
        {
            "16" : {
                "45" : 1.6693091
            }
        }
    ]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet, the value must differ for a new element to be added):
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
    "_id" : "u12345CHA-2RU020130304",
    "metadata" : {
        "adaptor_id" : "CHA-2RU",
        "array_serial" : 12345,
        "date" : ISODate("2013-03-18T23:28:50.660Z"),
        "processor_id" : 0
    },
    "minute" : [
        {
            "16" : {
                "45" : 1.6693091
            }
        },
        {
            "17" : {
                "48" : 1.6693391
            }
        }
    ]
}
I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
  'metadata': { 'adaptor_id': 'CHA-2RU',
                'array_serial': 12345,
                'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
                'processor_id': 0 }
}
I'm setting the metrics like this:
data = {"$set": {}}
for metric in csv:
date_utc = metric['date'].astimezone(pytz.utc)
data["$set"]["minute.%d.%d" % (date_utc.hour,
date_utc.minute)] = float(metric['metric'])
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
So that when self.collection.update(query, data, upsert=True) is run it updates those fields.
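The key difference from the original attempt is the dot notation: each update only touches the named hour.minute leaves instead of replacing the whole 'minute' subdocument. A shell sketch of the equivalent upsert (reusing the _id from the examples above):
db.foo.update(
    { '_id': 'u12345CHA-2RU020130304' },
    { $set: { 'minute.16.45': 1.6693091,
              'minute.16.46': 1.566343 } },
    { upsert: true }
);
// Running it again for a different hour adds 'minute.17.*'
// without disturbing the existing 'minute.16.*' entries.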

MongoDB: Removing duplicate document based on ObjectId?

This is really an open question. I am sorry if this is a little vague, but I am trying to collect thoughts from other people, since I am very new to Mongo.
Situation
I realized that my collection has multiple duplicate documents (based on name key)
These documents may be the same or may have changed during subsequent dumps from a file (we want to keep the later changes)
Since there is no insert date, it is hard to tell by looking at a document which one is the latest (bad schema design)
Wanted
To remove the documents which were inserted earlier
I read that each document in a collection is assigned an ObjectId (here) that makes the document unique
Question
Is it possible to know which document was inserted earlier based on its ObjectId, and remove it using map-reduce?
Any other thoughts and advice?
I'm bored this evening, so here we go.
Step 1. Let's prepare our test data.
> db.users.insert({name: 'John', other_field: Math.random()})
> db.users.insert({name: 'Bob', other_field: Math.random()})
> db.users.insert({name: 'Mary', other_field: Math.random()})
> db.users.insert({name: 'John', other_field: Math.random()})
> db.users.insert({name: 'Jeff', other_field: Math.random()})
> db.users.insert({name: 'Ivan', other_field: Math.random()})
> db.users.insert({name: 'Mary', other_field: Math.random()})
> db.users.find()
{
    "_id" : ObjectId("501976e9bee9b253265bba8b"),
    "name" : "John",
    "other_field" : 0.9884713875252772
}
{
    "_id" : ObjectId("501976e9bee9b253265bba8c"),
    "name" : "Bob",
    "other_field" : 0.048004131996396415
}
{
    "_id" : ObjectId("501976e9bee9b253265bba8d"),
    "name" : "Mary",
    "other_field" : 0.20415803582615222
}
{
    "_id" : ObjectId("501976e9bee9b253265bba8e"),
    "name" : "John",
    "other_field" : 0.5514446987265585
}
{
    "_id" : ObjectId("501976e9bee9b253265bba8f"),
    "name" : "Jeff",
    "other_field" : 0.8685077449753242
}
{
    "_id" : ObjectId("501976e9bee9b253265bba90"),
    "name" : "Ivan",
    "other_field" : 0.2842514340422925
}
{
    "_id" : ObjectId("501976eabee9b253265bba91"),
    "name" : "Mary",
    "other_field" : 0.984048520281136
}
Step 2. The map-reduce
var map = function() {
    emit(this.name, this);
};

var reduce = function(name, vals) {
    var last_obj = null;
    vals.forEach(function(v) {
        if (!last_obj || v._id > last_obj._id) {
            last_obj = v;
        }
    });
    return last_obj;
};

db.users.mapReduce(map, reduce, {out: 'temp_coll'})
It basically groups all documents by name and then selects the one with the largest _id.
Step 3. Do something with unique data.
> db.temp_coll.find()
{
    "_id" : "Bob",
    "value" : {
        "_id" : ObjectId("501976e9bee9b253265bba8c"),
        "name" : "Bob",
        "other_field" : 0.048004131996396415
    }
}
{
    "_id" : "Ivan",
    "value" : {
        "_id" : ObjectId("501976e9bee9b253265bba90"),
        "name" : "Ivan",
        "other_field" : 0.2842514340422925
    }
}
{
    "_id" : "Jeff",
    "value" : {
        "_id" : ObjectId("501976e9bee9b253265bba8f"),
        "name" : "Jeff",
        "other_field" : 0.8685077449753242
    }
}
{
    "_id" : "John",
    "value" : {
        "_id" : ObjectId("501976e9bee9b253265bba8e"),
        "name" : "John",
        "other_field" : 0.5514446987265585
    }
}
{
    "_id" : "Mary",
    "value" : {
        "_id" : ObjectId("501976eabee9b253265bba91"),
        "name" : "Mary",
        "other_field" : 0.984048520281136
    }
}
For example: drop the original collection, iterate over this one, and insert the values into a new collection. Don't forget to drop the temp collection when you're done.
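A minimal sketch of that final step, assuming the collection names used above:
// Rebuild users from the deduplicated map-reduce output.
db.users.drop();
db.temp_coll.find().forEach(function(doc) {
    db.users.insert(doc.value); // unwrap the map-reduce {_id, value} envelope
});
db.temp_coll.drop();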
Important
I didn't bother extracting a timestamp from the ObjectId, because I assumed that you don't run your import jobs twice a second (maybe not even every second).
OK: since an ObjectId uses a timestamp as its leading four bytes, you can do this with a bit of math.
Thankfully the mongo shell has a way to get the timestamp from an ObjectId, but you will need to do some more JavaScript: first query your documents with the same name, then store them in a temp variable (if using the command line) or a temp collection (if using drivers), and parse each individual _id using the timestamp getter shown in the link below.
http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Extractinsertiontimesfromidratherthanhavingaseparatetimestampfield.
Remember that ObjectId timestamps are only accurate to the second, so this still doesn't help with rapid insertions.
Either way, what you are asking for is doable, either in a map-reduce function or through the command line as shown above.
Give that a shot, and if you get stuck let me know. If I know your collection structure I can probably whip something up real quick, but only after you bang your head on it a couple of times :)
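For reference, a small shell sketch of the timestamp route, using the getTimestamp() helper the shell provides on ObjectId:
// Print the insertion time of every document sharing a name.
db.users.find({ name: 'John' }).forEach(function(doc) {
    print(doc._id.getTimestamp());
});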

Trouble with mongo map reduce and aggregating key names

I have a collection in my database representing IP addresses pulled from various sources, a sample of which looks like this:
{ "_id" : ObjectId("4e71060444dce16174378b79"), "ip" : "xxx.xxx.xxx.xxx", "sources" : { "Source1" : NumberLong(52), "Source2" : NumberLong(7) } }
Each object will have one or more sources.
My goal is to show the number of entries reported by each source without necessarily knowing the names of every possible source (because new ones can potentially be added at any time). I have attempted to address this with map-reduce by simply emitting a 1 for each key in the sources hash of each object, but it seems something is wrong with my syntax. If I do the following:
var map_s = function() {
    for (var source in this.sources) {
        emit(source, 1);
    }
}

var red_s = function(key, values) {
    var total = 0;
    values.forEach(function() {
        total++;
    });
    return total;
}

var op = db.addresses.mapReduce(map_s, red_s, {out: 'results'});
db.results.find().forEach(printjson);
I get
{ "_id" : "Source1", "value" : 12 }
{ "_id" : "Source2", "value" : 230 }
{ "_id" : "Source3", "value" : 358 }
{ "_id" : "Source4", "value" : 398 }
{ "_id" : "Source5", "value" : 39 }
{ "_id" : "Source6", "value" : 420 }
{ "_id" : "Source7", "value" : 156 }
Which is far too small for the database size. For instance, I get the following in the shell if I count off of a specific source:
> db.addresses.count({"sources.Source4": {$exists: true}});
1260538
Where is my error?
Yes, there is a problem in your reduce method: it must be idempotent.
Remember that reduce() may be called many times on intermediary results, so the values array can contain partial totals returned by earlier reduce calls, not just the 1s emitted by map.
Instead of
values.forEach(function() {
    total++;
});
You need:
values.forEach(function(x) {
    total += x;
});
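Putting it together, the corrected reduce (the map function is unchanged):
var red_s = function(key, values) {
    var total = 0;
    // values may contain partial sums from earlier reduce passes,
    // so add them up instead of counting array elements.
    values.forEach(function(x) {
        total += x;
    });
    return total;
};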