mongodb get distinct records - mongodb

I am using MongoDB and have a collection with documents in the following format:
{"id" : 1, "name" : "x", "ttm" : 23, "val" : 5 }
{"id" : 1, "name" : "x", "ttm" : 34, "val" : 1 }
{"id" : 1, "name" : "x", "ttm" : 24, "val" : 2 }
{"id" : 2, "name" : "x", "ttm" : 56, "val" : 3 }
{"id" : 2, "name" : "x", "ttm" : 76, "val" : 3 }
{"id" : 3, "name" : "x", "ttm" : 54, "val" : 7 }
On that collection I have queried to get records in descending order like this:
db.foo.find({"id" : {"$in" : [1,2,3]}}).sort({"ttm" : -1}).limit(3)
But it returns several records with the same id, and I want exactly one record per id.
Is it possible in MongoDB?

There is a distinct command in mongodb, that can be used in conjunction with a query. However, I believe this just returns a distinct list of values for a specific key you name (i.e. in your case, you'd only get the id values returned) so I'm not sure this will give you exactly what you want if you need the whole documents - you may require MapReduce instead.
Documentation on distinct:
http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct

You want to use aggregation. You could do that like this:
db.test.aggregate([
// Each object in this array is one pipeline stage.
{
$group: {
originalId: {$first: '$_id'}, // Hold onto original ID.
_id: '$id', // Set the unique identifier
val: {$first: '$val'},
name: {$first: '$name'},
ttm: {$first: '$ttm'}
}
}, {
// This stage receives the output of the $group stage.
// The (originally) non-unique 'id' field is now
// present as the _id field. We want to rename it.
$project:{
_id : '$originalId', // Restore original ID.
id : '$_id', // Expose the grouping key as 'id' again.
val : '$val',
name: '$name',
ttm : '$ttm'
}
}
])
This will be very fast... ~90ms for my test DB of 100,000 documents.
Example:
db.test.find()
// { "_id" : ObjectId("55fb595b241fee91ac4cd881"), "id" : 1, "name" : "x", "ttm" : 23, "val" : 5 }
// { "_id" : ObjectId("55fb596d241fee91ac4cd882"), "id" : 1, "name" : "x", "ttm" : 34, "val" : 1 }
// { "_id" : ObjectId("55fb59c8241fee91ac4cd883"), "id" : 1, "name" : "x", "ttm" : 24, "val" : 2 }
// { "_id" : ObjectId("55fb59d9241fee91ac4cd884"), "id" : 2, "name" : "x", "ttm" : 56, "val" : 3 }
// { "_id" : ObjectId("55fb59e7241fee91ac4cd885"), "id" : 2, "name" : "x", "ttm" : 76, "val" : 3 }
// { "_id" : ObjectId("55fb59f9241fee91ac4cd886"), "id" : 3, "name" : "x", "ttm" : 54, "val" : 7 }
db.test.aggregate(/* from first code snippet */)
// output
{
"result" : [
{
"_id" : ObjectId("55fb59f9241fee91ac4cd886"),
"val" : 7,
"name" : "x",
"ttm" : 54,
"id" : 3
},
{
"_id" : ObjectId("55fb59d9241fee91ac4cd884"),
"val" : 3,
"name" : "x",
"ttm" : 56,
"id" : 2
},
{
"_id" : ObjectId("55fb595b241fee91ac4cd881"),
"val" : 5,
"name" : "x",
"ttm" : 23,
"id" : 1
}
],
"ok" : 1
}
PROS: Almost certainly the fastest method.
CONS: Involves use of the complicated Aggregation API. Also, it is tightly coupled to the original schema of the document. Though, it may be possible to generalize this.
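One caveat: $first without a preceding $sort picks whichever document the server happens to read first for each id. Since the question sorts by ttm descending, a possible variation (a sketch, not the pipeline that was timed above) is to sort before grouping and keep the whole document with $$ROOT, which also loosens the coupling to the schema. The latestPerId helper below is a hypothetical in-memory illustration of what that pipeline computes, so the result can be checked without a server.

```javascript
// Pipeline sketch for the mongo shell: db.foo.aggregate(pipeline)
const pipeline = [
  { $match: { id: { $in: [1, 2, 3] } } },
  { $sort: { ttm: -1 } },                                // highest ttm first
  { $group: { _id: "$id", doc: { $first: "$$ROOT" } } }  // keep the first document per id
];

// Hypothetical in-memory equivalent, for illustration only.
function latestPerId(docs) {
  const sorted = [...docs].sort((a, b) => b.ttm - a.ttm);
  const firstSeen = new Map();
  for (const doc of sorted) {
    if (!firstSeen.has(doc.id)) {
      firstSeen.set(doc.id, doc);
    }
  }
  return [...firstSeen.values()];
}

// Sample documents from the question.
const sample = [
  { id: 1, name: "x", ttm: 23, val: 5 },
  { id: 1, name: "x", ttm: 34, val: 1 },
  { id: 1, name: "x", ttm: 24, val: 2 },
  { id: 2, name: "x", ttm: 56, val: 3 },
  { id: 2, name: "x", ttm: 76, val: 3 },
  { id: 3, name: "x", ttm: 54, val: 7 }
];

const latest = latestPerId(sample);
console.log(latest.map((d) => [d.id, d.ttm])); // [ [ 2, 76 ], [ 3, 54 ], [ 1, 34 ] ]
```

With the pipeline version, each result carries the full original document under the doc field (a name chosen here for illustration).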

I believe you can use aggregate like this
collection.aggregate({
$group : {
"_id" : "$id",
"docs" : {
$first : {
"name" : "$name",
"ttm" : "$ttm",
"val" : "$val",
}
}
}
});

The issue is that you want to distill 3 matching records down to one without providing any logic in the query for how to choose between the matching results.
Your options are basically to specify aggregation logic of some kind (select the max or min value for each column, for example), or to run a select distinct query and only select the fields that you wish to be distinct.
querymongo.com does a good job of translating these distinct queries for you (from SQL to MongoDB).
For example, this SQL:
SELECT DISTINCT columnA FROM collection WHERE columnA > 5
Is returned as this MongoDB:
db.runCommand({
"distinct": "collection",
"query": {
"columnA": {
"$gt": 5
}
},
"key": "columnA"
});

If you want to print the distinct values from JavaScript in the mongo shell (for example, so that the shell output can be redirected to a file), this is one way to do it:
cursor = db.myColl.find({'fieldName':'fieldValue'})
var Arr = new Array();
var count = 0;
cursor.forEach(
function(x) {
var temp = x.id;
var index = Arr.indexOf(temp);
if(index==-1)
{
printjson(x.id);
Arr[count] = temp;
count++;
}
})

Specify Query with distinct.
The following example returns the distinct values for the field sku, embedded in the item field, from the documents whose dept is equal to "A":
db.inventory.distinct( "item.sku", { dept: "A" } )
Reference: https://docs.mongodb.com/manual/reference/method/db.collection.distinct/
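As a rough illustration of what distinct with a query computes, here is an in-memory sketch in plain JavaScript. The helper names (getPath, distinctValues) and the sample inventory documents are made up for this example; the real command runs on the server and also handles array values, which this sketch does not.

```javascript
// Hypothetical helper: reads a dotted path such as "item.sku" from a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// In-memory sketch of distinct(field, query) for scalar field values.
function distinctValues(docs, field, predicate) {
  const seen = new Set();
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    const value = getPath(doc, field);
    if (value !== undefined) seen.add(value);
  }
  return [...seen];
}

// Made-up sample data in the shape the example above implies.
const inventory = [
  { dept: "A", item: { sku: "111" } },
  { dept: "A", item: { sku: "222" } },
  { dept: "B", item: { sku: "333" } },
  { dept: "A", item: { sku: "111" } }
];

const skus = distinctValues(inventory, "item.sku", (doc) => doc.dept === "A");
console.log(skus); // [ '111', '222' ]
```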

Related

Getting array of object with limit and offset doesn't work using mongodb

First let me say that I am new to mongodb. I am trying to get the data from the collection
Here is the document in my collection student:
{
"_id" : ObjectId("5979e0473f00003717a9bd62"),
"id" : "l_7c0e37b9-132e-4054-adbf-649dbc29f43d",
"name" : "Raj",
"class" : "10th",
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc571",
"name" : "1"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc572",
"name" : "2"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
the output which i require is
{
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc571",
"name" : "1"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc572",
"name" : "2"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
for this response i used the following query
db.getCollection('student').find({},{"assignments":1})
What I am actually trying to do is apply a limit and an offset to the assignments list. I tried $slice: [0,3], but it gives me the whole document with the sliced array rather than the assignments alone. How can I combine the two so that I get only the assignments, with limit and offset applied?
You'll need to aggregate rather than find because aggregate allows you to project+slice.
Given the document from your question, the following command ...
db.getCollection('student').aggregate([
// project on assignments and apply a slice to the projection
{$project: {assignments: {$slice: ['$assignments', 2, 5]}}}
])
... returns:
{
"_id" : ObjectId("5979e0473f00003717a9bd62"),
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
This returns the assignments array (alongside _id, and nothing else) sliced to start at index 2 and contain at most 5 elements; since the array only has 6 elements, 4 are returned. You can change the slice arguments (2, 5 in the above example) to apply your own offset and limit: the first number is the offset (the zero-based start position) and the second is the limit (the maximum number of elements to return).
If you want to add a match condition (to address specific documents) to the above then you'd do something like this:
db.getCollection('student').aggregate([
// match a specific document
{$match: {"_id": ObjectId("5979e0473f00003717a9bd62")}},
// project on assignments and apply a slice to the projection
{$project: {assignments: {$slice: ['$assignments', 2, 5]}}}
])
More details on the $match stage can be found in the MongoDB aggregation documentation.
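To make the offset/limit mapping concrete, here is a small sketch. The function names assignmentsPage and pageOf are hypothetical; assignmentsPage only builds the $project stage (it also excludes _id, which is an addition for this example), and pageOf mimics the three-argument $slice on a plain array for a non-negative offset.

```javascript
// Builds a $project stage that pages through the assignments array.
// offset = index of the first element to return, limit = maximum number of elements.
function assignmentsPage(offset, limit) {
  return {
    $project: {
      _id: 0,
      assignments: { $slice: ["$assignments", offset, limit] }
    }
  };
}

// In-memory equivalent for a non-negative offset, for illustration only.
function pageOf(array, offset, limit) {
  return array.slice(offset, offset + limit);
}

const names = ["1", "2", "3", "4", "5", "6"];
console.log(JSON.stringify(assignmentsPage(2, 5)));
console.log(pageOf(names, 2, 5)); // [ '3', '4', '5', '6' ]
console.log(pageOf(names, 0, 3)); // [ '1', '2', '3' ]
```

In the shell, the stage would be used as db.getCollection('student').aggregate([assignmentsPage(0, 3)]). Note that the limit has to be a positive number when an offset is given.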

Union Set using MapReduce MongoDB

I'm trying to unite two collections using MapReduce. They have identical structure, for example:
db.tableR.insert({product:"A", quantity:150});
db.tableR.insert({product:"B", quantity:100});
db.tableR.insert({product:"C", quantity:60});
db.tableR.insert({product:"D", quantity:200});
db.tableS.insert({product:"A", quantity:150});
db.tableS.insert({product:"B", quantity:100});
db.tableS.insert({product:"F", quantity:220});
db.tableS.insert({product:"G", quantity:130});
I want MapReduce to remove the duplicates.
I'm creating a map function that divides the collection according to quantity:
map = function(){
if (this.quantity<150){
var key=0;
}else{
var key=1;
}
var value = {"product":this.product, "quantity":this.quantity};
emit(key,value);
};
Now I want the reduce function to remove duplicates, but I can't find a way to add the new values to the reduced variable.
This is what I tried:
reduce = function(keys,values){
var reduced = {
product:"",
quantity:""
};
for (var i=0; i < values.length;i++)
{
if(values[i].product !== null) {reduced.insert({product: values[i].product, quantity: values[i].quantity})}
}
return reduced;};
db.tableR.mapReduce(map,reduce,{out:'map_reduce_result'});
db.tableS.mapReduce(map,reduce,{out:'map_reduce_result'});
db.map_reduce_result.find();
What function can I use?
My expected output:
{"_id" : 0, "value" : {"product" : "B","quantity" : 100}}
{"_id" : 0, "value" : {"product" : "C","quantity" : 60}}
{"_id" : 0, "value" : {"product" : "G","quantity" : 130}}
{"_id" : 1, "value" : {"product" : "A","quantity" : 150}}
{"_id" : 1, "value" : {"product" : "D","quantity" : 200}}
{"_id" : 1, "value" : {"product" : "F","quantity" : 220}}
The reduce function can only return a single value, so you want it to execute for every single row. The reduce function gets called for each unique key returned in your map function. Your keys were 0 and 1, so it would only get called twice for each collection - once for key 0 and once for key 1. Hence, the max number of results would only be 2 for each collection.
What you need to do is set the key to the product in the map function:
map = function(){
emit(this.product,{product:this.product,quantity:this.quantity});
};
Now the reduce function will get called for every unique product value. Our new reduce function just returns the first value in the array (if there were duplicates in the same collection it would just take the first). You could be smarter here and take the highest or lowest quantity, or the sum of the quantities, etc.
reduce = function(keys,values){
return values[0];
};
Run your first map reduce job:
db.tableR.mapReduce(map,reduce,{out:'map_reduce_result'});
Run your second, but this time merge the result:
db.tableS.mapReduce(map,reduce,{out: {merge: 'map_reduce_result'}});
Now db.map_reduce_result.find() returns:
{ "_id" : "A", "value" : { "product" : "A", "quantity" : 150 } }
{ "_id" : "B", "value" : { "product" : "B", "quantity" : 100 } }
{ "_id" : "C", "value" : { "product" : "C", "quantity" : 60 } }
{ "_id" : "D", "value" : { "product" : "D", "quantity" : 200 } }
{ "_id" : "F", "value" : { "product" : "F", "quantity" : 220 } }
{ "_id" : "G", "value" : { "product" : "G", "quantity" : 130 } }
Obviously the _id doesn't match what you are looking for. If you absolutely need that you can use the aggregation framework like so:
db.map_reduce_result.aggregate([{$project:{
_id:{$cond: { if: { $gte: [ "$value.quantity", 150 ] }, then: 1, else: 0 }},
value:1
}}]);
This results in:
{ "_id" : 1, "value" : { "product" : "A", "quantity" : 150 } }
{ "_id" : 0, "value" : { "product" : "B", "quantity" : 100 } }
{ "_id" : 0, "value" : { "product" : "C", "quantity" : 60 } }
{ "_id" : 1, "value" : { "product" : "D", "quantity" : 200 } }
{ "_id" : 1, "value" : { "product" : "F", "quantity" : 220 } }
{ "_id" : 0, "value" : { "product" : "G", "quantity" : 130 } }
Note: If two rows from different collections have the same product ID, but different quantities I am unsure which one will be returned.
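If the quantities can differ, one option mentioned above is to keep the highest quantity instead of the first value. A minimal sketch of such a reduce function follows; the name reduceMaxQuantity and the sample values are made up for illustration. It returns one of its inputs unchanged, so its output has the same shape as the emitted values.

```javascript
// Reduce function sketch: keeps the value with the highest quantity.
function reduceMaxQuantity(key, values) {
  var best = values[0];
  for (var i = 1; i < values.length; i++) {
    if (values[i].quantity > best.quantity) {
      best = values[i];
    }
  }
  return best;
}

// Made-up values with differing quantities for the same product.
var picked = reduceMaxQuantity("A", [
  { product: "A", quantity: 150 },
  { product: "A", quantity: 180 },
  { product: "A", quantity: 120 }
]);
console.log(picked); // { product: 'A', quantity: 180 }
```

For the second job, passing out: {reduce: 'map_reduce_result'} instead of merge should apply the reduce function to the existing and the new value for a key, rather than overwriting the existing one.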

How to create an index on the "name" field of a document in mongodb

I want to create an index on the name field of a document in MongoDB so that when I do a find, all the names are displayed in alphabetical order. How can I achieve this? Can anyone please help me out?
My documents in mongodb:
db.col.find();
{ "_id" : ObjectId("5696256b0c50bf42dcdfeae1"), "name" : "Daniel", "age" : 24 }
{ "_id" : ObjectId("569625850c50bf42dcdfeae2"), "name" : "Asha", "age" : 21 }
{ "_id" : ObjectId("569625a40c50bf42dcdfeae3"), "name" : "Hampi", "age" : 34 }
{ "_id" : ObjectId("5696260f0c50bf42dcdfeae5"), "name" : "Bhavana", "age" : 14 }
Actually you don't need an index in order to display your result alphabetically. What you need is the .sort() method.
db.collection.find().sort({'name': 1})
Which returns
{ "_id" : ObjectId("569625850c50bf42dcdfeae2"), "name" : "Asha", "age" : 21 }
{ "_id" : ObjectId("5696260f0c50bf42dcdfeae5"), "name" : "Bhavana", "age" : 14 }
{ "_id" : ObjectId("5696256b0c50bf42dcdfeae1"), "name" : "Daniel", "age" : 24 }
{ "_id" : ObjectId("569625a40c50bf42dcdfeae3"), "name" : "Hampi", "age" : 34 }
Creating an index on a field will not automatically sort your results on that field; you still need to use the .sort() method. An index on "name" can, however, let MongoDB return the sorted results without sorting them in memory. See "Use Indexes to Sort Query Results" in the MongoDB documentation.
If you want to return an array of all names in your documents in ascending order then you will need to use the .aggregate() method.
The first stage in the pipeline is the $sort stage, where you sort your documents by "name" in ascending order. The last stage is the $group stage, where you group your documents and use the $push accumulator operator to return an array of "names".
db.collection.aggregate([
{ "$sort": { "name": 1 } },
{ "$group": { "_id": null, "names": { "$push": "$name" } } }
])
Which yields:
{ "_id" : null, "names" : [ "Asha", "Bhavana", "Daniel", "Hampi" ] }

MongoDB Query group and distinct together

Consider the following set of documents:
[
{
"name" : "nameA",
"class" : "classA",
"age" : 24,
"marks" : 45
},
{
"name" : "nameB",
"class" : "classB",
"age" : 22,
"marks" : 65
},
{
"name" : "nameC",
"class" : "classA",
"age" : 14,
"marks" : 55
}
]
I need to fetch the min and max values for age and marks as well as the distinct values for name and class.
I know that I can use aggregate and group to get the max and min values of age and marks with one query, and I can get distinct values of name and class using distinct query.
But I don't want to do separate queries to fetch that information. Is there a way in which I can get the result with one query? Let's say if I can merge the aggregate and distinct somehow.
Sure, you can do it with one aggregation command. You need to use $group with $addToSet operator:
db.collection.aggregate([{
$group : {
_id : null,
name : { $addToSet : "$name" },
class : { $addToSet : "$class" },
ageMin : { $min : "$age" },
ageMax : { $max : "$age" },
marksMin : { $min : "$marks" },
marksMax : { $max : "$marks" }
}
}]);
$addToSet will create an array with unique values for the selected field.
This aggregation will return the following response for your example docs:
{
"_id" : null,
"name" : [
"nameC",
"nameB",
"nameA"
],
"class" : [
"classB",
"classA"
],
"ageMin" : 14,
"ageMax" : 24,
"marksMin" : 45,
"marksMax" : 65
}
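For readers who want to see what this $group stage computes, here is a rough in-memory equivalent in plain JavaScript. The function name summarize is hypothetical and the sketch is for illustration only. One thing to keep in mind is that the order of the arrays produced by $addToSet is not guaranteed, so compare them as sets rather than as ordered lists.

```javascript
// In-memory sketch of the $group stage above (single group, _id: null).
function summarize(docs) {
  const ages = docs.map((d) => d.age);
  const marks = docs.map((d) => d.marks);
  return {
    _id: null,
    name: [...new Set(docs.map((d) => d.name))],   // like $addToSet
    class: [...new Set(docs.map((d) => d.class))], // like $addToSet
    ageMin: Math.min(...ages),
    ageMax: Math.max(...ages),
    marksMin: Math.min(...marks),
    marksMax: Math.max(...marks)
  };
}

// Sample documents from the question.
const students = [
  { name: "nameA", class: "classA", age: 24, marks: 45 },
  { name: "nameB", class: "classB", age: 22, marks: 65 },
  { name: "nameC", class: "classA", age: 14, marks: 55 }
];

const summary = summarize(students);
console.log(summary);
```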

Find count of maximum consecutive records based on one field in Mongodb Query

I want to find the count of maximum consecutive records based on one particular field.
My db.people collection, sorted by the updated_at field, looks like this:
> db.people.find().sort({ updated_at: 1})
{ "_id" : 1, "name" : "aaa", "flag" : true, "updated_at" : ISODate("2014-02-07T08:42:48.688Z") }
{ "_id" : 2, "name" : "bbb", "flag" : false, "updated_at" : ISODate("2014-02-07T08:43:10Z") }
{ "_id" : 3, "name" : "ccc", "flag" : true, "updated_at" : ISODate("2014-02-07T08:43:40.660Z") }
{ "_id" : 4, "name" : "ddd", "flag" : true, "updated_at" : ISODate("2014-02-07T08:43:51.567Z") }
{ "_id" : 6, "name" : "fff", "flag" : false, "updated_at" : ISODate("2014-02-07T08:44:23.713Z") }
{ "_id" : 7, "name" : "ggg", "flag" : true, "updated_at" : ISODate("2014-02-07T08:44:44.639Z") }
{ "_id" : 8, "name" : "hhh", "flag" : true, "updated_at" : ISODate("2014-02-07T08:44:51.415Z") }
{ "_id" : 5, "name" : "eee", "flag" : true, "updated_at" : ISODate("2014-02-07T08:55:24.917Z") }
In above records, there are two places where flag attribute value comes true in consecutive ways. i.e
record with _id 3 - record with _id 4 (2 consecutive records)
and
record with _id 7 - record with _id 8 - record with _id 5 (3 consecutive records)
However, I want the maximum consecutive number from mongo query search. i.e 3.
Is it possible to get such result?
I googled it and found a little similar solution of using Map-Reduce here https://stackoverflow.com/a/7408639/1120530.
I am new to MongoDB and wasn't able to understand the map-reduce documentation, especially how to apply it to the above scenario.
You can do this mapReduce operation.
First the mapper:
var mapper = function () {
if ( this.flag == true ) {
totalCount++;
} else {
totalCount = 0;
}
if ( totalCount != 0 ) {
emit (
counter,
{ _id: this._id, totalCount: totalCount }
);
} else {
counter++;
}
};
This keeps a running count of consecutive documents in which flag is true. Whenever that count is non-zero we emit the value, which also contains the document _id. A second counter, used as the key, is incremented whenever flag is false, so that each run of matches gets its own grouping "key".
Then the reducer:
var reducer = function ( key, values ) {
var result = { docs: [] };
values.forEach(function(value) {
result.docs.push(value._id);
result.totalCount = value.totalCount;
});
return result;
};
Simply pushes the _id values onto a result array along with the totalCount.
Then run:
db.people.mapReduce(
mapper,
reducer,
{
"out": { "inline": 1 },
"scope": {
"totalCount": 0,
"counter": 0
},
"sort": { "updated_at": 1 }
}
)
So with the mapper and reducer functions, we then define the global variables used in "scope" and pass in the "sort" that was required on updated_at dates. Which gives the result:
{
"results" : [
{
"_id" : 1,
"value" : {
"docs" : [
3,
4
],
"totalCount" : 2
}
},
{
"_id" : 2,
"value" : {
"docs" : [
7,
8,
5
],
"totalCount" : 3
}
}
],
"timeMillis" : 2,
"counts" : {
"input" : 7,
"emit" : 5,
"reduce" : 2,
"output" : 2
},
"ok" : 1,
}
Of course you could just skip the totalCount variable and just use the array length, which would be the same. But since you want to use that counter anyway it's just added in. But that's the principle.
So yes, this was a problem suited to mapReduce, and now you have an example.
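To get the single number asked for in the question (3), the inline results still need one more step on the client. A minimal sketch, assuming the result shape shown above; the function names maxConsecutive and longestTrueRun are made up for this example. The second function is an alternative that skips mapReduce and scans the already sorted documents directly.

```javascript
// Takes the "results" array of the inline mapReduce output and
// returns the largest totalCount found in it.
function maxConsecutive(results) {
  return results.reduce((max, r) => Math.max(max, r.value.totalCount), 0);
}

// Alternative sketch: scan documents (already sorted by updated_at)
// and track the longest run of flag === true.
function longestTrueRun(docs) {
  let current = 0;
  let longest = 0;
  for (const doc of docs) {
    current = doc.flag ? current + 1 : 0;
    longest = Math.max(longest, current);
  }
  return longest;
}

// Result shape copied from the output above.
const results = [
  { _id: 1, value: { docs: [3, 4], totalCount: 2 } },
  { _id: 2, value: { docs: [7, 8, 5], totalCount: 3 } }
];

// Flag values from the question, in updated_at order.
const people = [true, false, true, true, false, true, true, true]
  .map((flag) => ({ flag }));

console.log(maxConsecutive(results)); // 3
console.log(longestTrueRun(people));  // 3
```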