I have the collection below in the DB. I want to retrieve documents whose birth month equals one of two given months, say [1,2] or [4,5].
{
"_id" : ObjectId("55aa1e526fea82e9a4188f38"),
"name" : "Nilmini",
"birthDate" : 6,
"birthMonth" : 1
},
{
"_id" : ObjectId("55aa1e526fea82e9a4188f39"),
"name" : "Ruwan",
"birthDate" : 6,
"birthMonth" : 1
},{
"_id" : ObjectId("55aa1e526fea82e9a4188f40"),
"name" : "Malith",
"birthDate" : 6,
"birthMonth" : 1
},
{
"_id" : ObjectId("55aa1e526fea82e9a4188f7569"),
"name" : "Pradeep",
"birthDate" : 6,
"birthMonth" : 7
}
I use the query below to get the result set. I can get the results for one given month; now I want to get results for multiple months.
var currentDay = moment().date();
var currentMonths = [];
var currentMonth = moment().month();
if(currentDay > 20){
currentMonths.push(moment().month());
currentMonths.push(moment().month()+1);
}else{
currentMonths.push(currentMonth);
}
// In the query below I am trying to pass the array to 'birthMonth'.
I get nothing back when I pass an array to the query, so I think there must be another way to do this.
Employee.find(
{
"birthDate": {$gte:currentDay}, "birthMonth": currentMonths
}, function(err, birthDays) {
res.json(birthDays);
});
I would really appreciate it if you could help me figure this out.
You can use the $in operator to match against multiple values in an array like currentMonths.
So your query would be:
Employee.find(
{
"birthDate": {$gte:currentDay}, "birthMonth": {$in: currentMonths}
}, function(err, birthDays) {
res.json(birthDays);
});
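One thing to double-check alongside $in: moment().month() is zero-based (0 = January), and month() + 1 reaches 12 in December. Below is a minimal sketch of building the month list, assuming birthMonth is stored as 1 to 12 (the sample data does not say which convention is used). The helper names upcomingBirthMonths and buildBirthdayFilter are hypothetical.

```javascript
// Assumption: birthMonth is stored as 1..12 in the collection.
// monthIndex is zero-based, as returned by moment().month() or Date#getMonth().
function upcomingBirthMonths(dayOfMonth, monthIndex) {
  var months = [monthIndex + 1];              // convert 0..11 to 1..12
  if (dayOfMonth > 20) {
    months.push(((monthIndex + 1) % 12) + 1); // next month, wrapping Dec -> Jan
  }
  return months;
}

// Builds the same filter shape as the query above, using $in for the months.
function buildBirthdayFilter(dayOfMonth, monthIndex) {
  return {
    birthDate: { $gte: dayOfMonth },
    birthMonth: { $in: upcomingBirthMonths(dayOfMonth, monthIndex) }
  };
}
```

Note that birthDate >= currentDay is applied to both months here, so a birthday early in the next month would still be filtered out. If that matters, the two months probably need separate conditions combined with $or.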
I need to switch to a generated ObjectId instead of the String I was using, but when I do, the field data type changes from Int to Double.
For example say we have a document
{_id: "Product Name", count: 415 }
Now I want to create a document
{_id: "some object id", name: "Product Name", count: 415 }
I am using code similar to the one below, but it makes count a Double.
var cursor = db.products.find()
cursor.forEach(function(item)
{
var old_id= item._id;
item.name = old_id;
delete item._id;
db.products.insert(item);
db.products.remove({_id:old_id});
});
I can add this in the loop: item.count = NumberInt( item.count) to make sure it's an Int but
I really don't want to do this for each field that I have.
Is there any way to do this without manually having to cast them? I don't understand why it takes an Int and turns it into a Double. I know Double is the default but the fields that I am working with are already Integers.
Well if I understand you, your documents look like this:
{ "_id" : "Apple", "count" : 187 }
{ "_id" : "Google", "count" : 123 }
{ "_id" : "Amazon", "count" : 325 }
{ "_id" : "Oracle", "count" : 566 }
You can use the Bulk API to update your collection.
var bulk = db.collection.initializeUnorderedBulkOp();
var count = 0;
db.collection.aggregate([{ $project: { '_id': 0, 'name': '$_id', 'count': 1 }}]).forEach(function(doc){
bulk.find({'_id': doc.name}).remove();
bulk.insert(doc);
count++;
if (count % 1000 == 0){
// Execute per 1000 operations and re-init.
bulk.execute();
bulk = db.collection.initializeUnorderedBulkOp();
}})
// Clean up queues
if (count % 1000 != 0){
bulk.execute();
}
Then:
db.collection.find()
Yields the following documents:
{ "_id" : ObjectId("55a7e2c7eb68594275546c7c"), "count" : 187, "name" : "Apple" }
{ "_id" : ObjectId("55a7e2c7eb68594275546c7d"), "count" : 123, "name" : "Google" }
{ "_id" : ObjectId("55a7e2c7eb68594275546c7e"), "count" : 325, "name" : "Amazon" }
{ "_id" : ObjectId("55a7e2c7eb68594275546c7f"), "count" : 566, "name" : "Oracle" }
Is there any way to do this without manually having to cast them? I don't understand why it takes an Int and turns it into a Double. I know Double is the default but the fields that I am working with are already Integers.
You really don't need to worry about that if you are using the shell, but as pointed out in the comment, you can always use a language with native support for integers to preserve the data type.
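If you stay in the shell and still want 32-bit integers without listing each field by hand, one option is a small helper that wraps every whole-number field. This is only a sketch: castIntegerFields is a hypothetical name, it handles top-level fields only, and it will also convert a Double field that happens to hold a whole value, so it assumes every whole-number field really is meant to be an Int that fits in 32 bits.

```javascript
// Sketch: wrap every whole-number field of a document with a caster.
// In the mongo shell, pass NumberInt as toInt. Outside the shell any
// function can stand in for it, which keeps this snippet runnable.
function castIntegerFields(doc, toInt) {
  Object.keys(doc).forEach(function (key) {
    var value = doc[key];
    if (typeof value === "number" && isFinite(value) && Math.floor(value) === value) {
      doc[key] = toInt(value);
    }
  });
  return doc;
}
```

In the question's loop this would be called as castIntegerFields(item, NumberInt) just before db.products.insert(item).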
I have a collection of documents in MongoDB, each of which has a "group" field that refers to a group that owns the document. The documents look like this:
{
group: <objectID>
name: <string>
contents: <string>
date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which has 20 documents. I want to write a query which will return the top 3 for each group, that is, 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in MongoDB, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet: you can get the $max or top date value for each group, but the aggregation framework does not yet have a way to accumulate the top N, and there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all require somehow sorting an array of objects based on a specific attribute; I borrowed my solution from one of the answers in this question).
Map function - outputs group name as a key and the entire rest of the document as the value - but it outputs it as a document containing an array because we will try to accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function will accumulate all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking date, then you won't need the finalize function, and you will use less memory while running mapReduce (it will also be faster).
reduce = function (key, values) {
result={a:[]};
values.forEach( function(v) {
result.a = v.a.concat(result.a);
} );
return result;
}
Since I'm keeping all values for each key, I need a finalize function to pull out only the latest five elements per key.
final = function (key, value) {
Array.prototype.sortByProp = function(p){
return this.sort(function(a,b){
return (a[p] < b[p]) ? 1 : (a[p] > b[p]) ? -1 : 0;
});
}
value.a.sortByProp('date');
return value.a.slice(0,5);
}
Using a template document similar to the one you provided, you run this by calling the mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
"results" : [
{
"_id" : "group1",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe13"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.498Z"),
"contents" : 0.23778377776034176
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.467Z"),
"contents" : 0.4434165076818317
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe09"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.436Z"),
"contents" : 0.5935856597498059
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe04"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.405Z"),
"contents" : 0.3912118375301361
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfdff"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.372Z"),
"contents" : 0.221651989268139
}
]
},
{
"_id" : "group2",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe14"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.504Z"),
"contents" : 0.019611883210018277
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.473Z"),
"contents" : 0.5670706110540777
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.442Z"),
"contents" : 0.893193120136857
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe05"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.411Z"),
"contents" : 0.9496864483226091
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe00"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.378Z"),
"contents" : 0.013748752186074853
}
]
},
{
"_id" : "group3",
...
}
]
}
],
"timeMillis" : 15,
"counts" : {
"input" : 80,
"emit" : 80,
"reduce" : 5,
"output" : 5
},
"ok" : 1,
}
Each result has _id as the group name and value as an array of the five most recent documents from the collection for that group name.
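The optimization mentioned for the reduce function (keeping only the newest entries so that finalize is not needed) could look roughly like this. It is a sketch rather than part of the tested answer: reduceTopN is a hypothetical name, and it assumes every document has a date field holding a Date.

```javascript
// Sketch of a reduce that keeps only the newest N entries per key.
// N is defined inside the function because mapReduce functions cannot
// see shell variables unless they are passed in through the scope option.
var reduceTopN = function (key, values) {
  var N = 5;
  var merged = [];
  values.forEach(function (v) {
    merged = merged.concat(v.a);
  });
  // Newest first; subtracting Date objects compares their time values.
  merged.sort(function (x, y) {
    return y.date - x.date;
  });
  // Same shape as the emitted value, so the result can be reduced again.
  return { a: merged.slice(0, N) };
};
```

With this variant the output value for each key would be of the form {a: [...]} rather than a bare array, since no finalize step unwraps it.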
You need an aggregation framework $group stage piped into a $limit stage.
You also want to $sort the records in some way, or else the limit will have undefined behaviour: the returned documents will be pseudo-random (the order used internally by mongo).
Something like this:
db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])
See the documentation if you want to know more.
This is really an open question. I am sorry if this is a little vague, but I am trying to collect thoughts from other people since I am very new to Mongo.
Situation
I realized that my collection has multiple duplicate documents (based on the name key).
These documents may be identical or may have changed during subsequent dumps from file (we want to keep the later changes).
Since there is no insert date, it is hard to tell by looking at a document which one is the latest (bad schema design).
Wanted
To remove the documents which were inserted earlier.
I read that each document in a collection is assigned an ObjectId that makes the document unique.
Question
Is it possible to know which document was inserted earlier based on ObjectId and remove it using Map Reduce?
Any other thoughts and advice?
I'm bored this evening, so here we go.
Step 1. Let's prepare our test data.
> db.users.insert({name: 'John', other_field: Math.random()})
> db.users.insert({name: 'Bob', other_field: Math.random()})
> db.users.insert({name: 'Mary', other_field: Math.random()})
> db.users.insert({name: 'John', other_field: Math.random()})
> db.users.insert({name: 'Jeff', other_field: Math.random()})
> db.users.insert({name: 'Ivan', other_field: Math.random()})
> db.users.insert({name: 'Mary', other_field: Math.random()})
> db.users.find()
{
"_id" : ObjectId("501976e9bee9b253265bba8b"),
"name" : "John",
"other_field" : 0.9884713875252772
}
{
"_id" : ObjectId("501976e9bee9b253265bba8c"),
"name" : "Bob",
"other_field" : 0.048004131996396415
}
{
"_id" : ObjectId("501976e9bee9b253265bba8d"),
"name" : "Mary",
"other_field" : 0.20415803582615222
}
{
"_id" : ObjectId("501976e9bee9b253265bba8e"),
"name" : "John",
"other_field" : 0.5514446987265585
}
{
"_id" : ObjectId("501976e9bee9b253265bba8f"),
"name" : "Jeff",
"other_field" : 0.8685077449753242
}
{
"_id" : ObjectId("501976e9bee9b253265bba90"),
"name" : "Ivan",
"other_field" : 0.2842514340422925
}
{
"_id" : ObjectId("501976eabee9b253265bba91"),
"name" : "Mary",
"other_field" : 0.984048520281136
}
Step 2. The map-reduce
var map = function() {
emit(this.name, this);
};
var reduce = function(name, vals) {
var last_obj = null;
vals.forEach(function(v) {
if(!last_obj || v._id > last_obj._id) {
last_obj = v;
}
});
return last_obj;
};
db.users.mapReduce(map, reduce, {out: 'temp_coll'})
It basically groups all documents by name and then selects the one with the largest _id.
Step 3. Do something with unique data.
> db.temp_coll.find()
{
"_id" : "Bob",
"value" : {
"_id" : ObjectId("501976e9bee9b253265bba8c"),
"name" : "Bob",
"other_field" : 0.048004131996396415
}
}
{
"_id" : "Ivan",
"value" : {
"_id" : ObjectId("501976e9bee9b253265bba90"),
"name" : "Ivan",
"other_field" : 0.2842514340422925
}
}
{
"_id" : "Jeff",
"value" : {
"_id" : ObjectId("501976e9bee9b253265bba8f"),
"name" : "Jeff",
"other_field" : 0.8685077449753242
}
}
{
"_id" : "John",
"value" : {
"_id" : ObjectId("501976e9bee9b253265bba8e"),
"name" : "John",
"other_field" : 0.5514446987265585
}
}
{
"_id" : "Mary",
"value" : {
"_id" : ObjectId("501976eabee9b253265bba91"),
"name" : "Mary",
"other_field" : 0.984048520281136
}
}
For example, drop the original collection, iterate this one and insert the values into a new collection. Don't forget to drop the temp collection when you're done.
Important
I didn't bother extracting a timestamp from the ObjectId, because I assumed that you don't run your import jobs twice a second (maybe not even every second).
OK, since an ObjectId uses a timestamp as its leading four bytes, you can do this with a bit of math.
Thankfully the mongo shell has a way to get the timestamp from an ObjectId, but you will need to write some more JavaScript: first query your documents with the same name, then store them in a temp variable (if using the command line) or in a temp collection (if using drivers), and parse each individual id using the timestamp getter shown in the link below.
http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Extractinsertiontimesfromidratherthanhavingaseparatetimestampfield.
Remember that ObjectIds are only accurate to the second, so this still doesn't help in rapid insertion mode.
But either way, what you are asking for is doable either in a map reduce function or in the way shown above, which does it through the command line.
Give that a shot and if you get stuck let me know. If I know your collection structure I can probably whip up something real quick, but only after you bang your head on it a couple of times :)
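To make the "bit of math" concrete: the first 8 hex characters of an ObjectId encode the creation time in seconds since the Unix epoch. A minimal sketch that works on the hex string alone follows (objectIdToDate is a hypothetical helper name; in the mongo shell, the built-in getTimestamp() method on an ObjectId gives the same information).

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId are the creation
// time in seconds since the Unix epoch.
function objectIdToDate(hexString) {
  var seconds = parseInt(hexString.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// Using the first _id from the test data in the answer above:
var created = objectIdToDate("501976e9bee9b253265bba8b");
// created.toISOString() is "2012-08-01T18:35:21.000Z"
```

Two documents inserted within the same second get the same timestamp, so for those the comparison has to fall back on the full _id, as the map-reduce answer does.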
I'm new to MongoDB's mapReduce and I certainly have not completely understood it yet. I have a problem which I have been trying to solve for a few days without success.
I have a collection of, let's say, posts with a tags field. Now I want to mapReduce a new collection of tags, where every tag has an array of the ids of all posts that have this particular tag assigned.
One of my attempts to do this (which doesn't work right):
m = function() {
for (var i in this.tags) {
emit(this.tags[i], {"ids" : [this._id]});
};
}
r = function(key, emits) {
var total = {ids : []}
for (var i in emits) {
emits[i].ids.forEach(function(id) {
total.ids.push(id);
}
}
return total;
};
I know that I have to pivot the data around somehow, but I just can't get my head wrapped around it.
I think you're missing a ")" in your reduce function to close the emits[i].ids.forEach(). Is this what you're trying to do?
r = function (key, values) {
var total = {ids:[]};
for (var i in values) {
values[i].ids.forEach(
function (id){
total.ids.push(id);
}
);
}
return total;
}
input
{_id:2, tags: ["dog", "Jenna"]}
{_id:1, tags: ["cat", "Jenna"]}
result:
{"results" : [
{"_id" : "Jenna",
"value" : {"ids" : [2,1]}
},
{"_id" : "cat",
"value" : {"ids" : [1]}
},
{"_id" : "dog",
"value" : {"ids" : [2]}
}
],
"timeMillis" : 1,
"counts" : {
"input" : 2,
"emit" : 4,
"reduce" : 1,
"output" : 3
},
"ok" : 1,
}
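To see the pivot outside MongoDB, here is a small in-memory stand-in for mapReduce. It is an illustration only: simulateMapReduce is a hypothetical helper, and unlike in MongoDB, emit is passed to map as an argument here because there is no server to provide it. The point it shows is that map emits a value of the same shape that reduce returns, so a tag with a single post needs no reduce call at all.

```javascript
var map = function (emit) {
  this.tags.forEach(function (tag) {
    emit(tag, { ids: [this._id] });
  }, this);
};

var reduce = function (key, values) {
  var total = { ids: [] };
  values.forEach(function (v) {
    total.ids = total.ids.concat(v.ids);
  });
  return total;
};

// In-memory stand-in for mapReduce, for illustration only.
function simulateMapReduce(docs, map, reduce) {
  var groups = {};
  var emit = function (key, value) {
    if (!groups[key]) groups[key] = [];
    groups[key].push(value);
  };
  docs.forEach(function (doc) { map.call(doc, emit); });
  var results = {};
  Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    // Like MongoDB, skip reduce when a key has a single emitted value.
    results[key] = values.length > 1 ? reduce(key, values) : values[0];
  });
  return results;
}

var out = simulateMapReduce(
  [{ _id: 2, tags: ["dog", "Jenna"] }, { _id: 1, tags: ["cat", "Jenna"] }],
  map,
  reduce
);
// out.Jenna.ids is [2, 1]; out.cat.ids is [1]; out.dog.ids is [2]
```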
I have some data that looks like this:
[
{
"_id" : ObjectId("4e2f2af16f1e7e4c2000000a"),
"advertisers" : [
{
"created_at" : ISODate("2011-07-26T21:02:19Z"),
"category" : "Infinity Pro Spin Air Brush",
"updated_at" : ISODate("2011-07-26T21:02:19Z"),
"lowered_name" : "conair",
"twitter_name" : "",
"facebook_page_url" : "",
"website_url" : "",
"user_ids" : [ ],
"blog_url" : "",
},
and I was thinking that a query like this would give the id of the advertiser:
var start = new Date(2011, 1, 1);
var end = new Date(2011, 12, 12);
db.agencies.find( { "created_at" : {$gte : start , $lt : end} } , { _id : 1 , program_ids : 1 , advertisers { name : 1 } } ).limit(1).toArray();
But my query didn't work. Any idea how I can add the fields inside the nested elements to my list of fields I want to get?
Thanks!
Use dot notation (e.g. advertisers.name) to query and retrieve fields from nested objects:
db.agencies.find({
"advertisers.created_at": {
$gte: start,
$lt: end
}
},
{
_id: 1,
program_ids: 1,
"advertisers.name": 1
}
).limit(1).toArray();
Reference: Retrieving a Subset of Fields and Dot Notation
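Separately from dot notation, the start and end values in the question may not cover the range they appear to, because JavaScript Date months are zero-based. A short sketch follows; the assumption that the intended range is calendar year 2011 is mine, not the question's.

```javascript
// JavaScript Date months are zero-based: 0 = January, 11 = December.
var start = new Date(2011, 1, 1);   // 1 February 2011, not 1 January
var end = new Date(2011, 12, 12);   // month 12 overflows to 12 January 2012

// Assumed intent: all of 2011 in local time, as a half-open interval.
var yearStart = new Date(2011, 0, 1);
var yearEnd = new Date(2012, 0, 1);
var range = { $gte: yearStart, $lt: yearEnd };
```

The range object can then be used as the value for "advertisers.created_at" in the query.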
db.agencies.find(
{ "advertisers.created_at" : {$gte : start , $lt : end} } ,
{ program_ids : 1 , "advertisers.name" : 1 }
).limit(1).pretty();
MongoDB provides dot notation, which allows you to look inside arrays of elements. Using it is as simple as adding a dot for each array you want to enter.
In your case
"_id" : ObjectId("4e2f2af16f1e7e4c2000000a"),
"advertisers" : [
{
"created_at" : ISODate("2011-07-26T21:02:19Z"),
"category" : "Infinity Pro Spin Air Brush",
"updated_at" : ISODate("2011-07-26T21:02:19Z"),
"lowered_name" : "conair",
"twitter_name" : "",
"facebook_page_url" : "",
"website_url" : "",
"user_ids" : [ ],
"blog_url" : "",
},
{ ... }
If you want to go inside the array of advertisers to look for the property created_at inside each one of them, you can simply write the query with the property {'advertisers.created_at': query}, as follows:
db.agencies.find( { 'advertisers.created_at' : { $gte : start , $lt : end } }, ... )