Mongo Aggregation / Grouping Query - mongodb

I have records of this form:
{
"_id" : ObjectId("57993e64498e9bebb535154f"),
"fooKey" : "123|a|b|c|||d",
"locationId" : 1,
"type" : "FOO"
}
{
"_id" : ObjectId("579e0a3d498e9bebb545ff96"),
"fooKey" : "123|x|y|z|||v",
"locationId" : 1,
"type" : "FOO"
}
{
"_id" : ObjectId("57a5443b498e381a40a26afb"),
"fooKey" : "123|a|b|c|||d",
"locationId" : 2,
"type" : "FOO"
}
{
"_id" : ObjectId("57a63fef498e381a40a60347"),
"fooKey" : "123|x|y|z|||v",
"locationId" : 2,
"type" : "FOO"
}
{
"_id" : ObjectId("579ab3ce498e9538125052ca"),
"fooKey" : "456|h|j|j|||k",
"locationId" : 2,
"type" : "BAR"
}
I went through the documentation, and this looks complex given that I need it today (and I am not a Mongo expert). What I need is an aggregation query, over only the records with "type" : "FOO", that returns groups keyed by:
The first field of the pipe-delimited string in the "fooKey" field (for example "123")
The locationId
and then keeps only the groups whose count of "FOO" records is greater than 1.
That is, given the records above, I need records 1 and 2 returned as one group and records 3 and 4 as another, each with a count of the group size.
Expected Output
Something like this:
{
"foo": "123",
"locationId": 1,
"type": "FOO",
"total": 2
},
{
"foo": "123",
"locationId": 2,
"type": "FOO",
"total": 2
}
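One possible approach, sketched under assumptions rather than tested against the real data: filter on type, take the first pipe-delimited field with $split and $arrayElemAt (both need MongoDB 3.4 or later), group on that value plus locationId, and keep only groups with more than one member. The collection name records and the helper groupInPlainJs are hypothetical; the helper only mirrors the pipeline's logic in plain JavaScript so the grouping can be checked without a server.

```javascript
// Pipeline sketch; "records" is an assumed collection name.
var fooPipeline = [
  { $match: { type: "FOO" } },
  { $group: {
      _id: {
        // first field of the pipe-delimited fooKey
        foo: { $arrayElemAt: [{ $split: ["$fooKey", "|"] }, 0] },
        locationId: "$locationId"
      },
      total: { $sum: 1 }
  } },
  { $match: { total: { $gt: 1 } } },
  { $project: { _id: 0, foo: "$_id.foo", locationId: "$_id.locationId",
                type: { $literal: "FOO" }, total: 1 } }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.records.aggregate(fooPipeline).toArray());
}

// Plain-JavaScript mirror of the same logic (hypothetical helper).
function groupInPlainJs(docs) {
  var groups = {};
  docs.filter(function (d) { return d.type === "FOO"; }).forEach(function (d) {
    var foo = d.fooKey.split("|")[0];
    var key = foo + "/" + d.locationId;
    groups[key] = groups[key] ||
      { foo: foo, locationId: d.locationId, type: "FOO", total: 0 };
    groups[key].total += 1;
  });
  return Object.keys(groups)
    .map(function (k) { return groups[k]; })
    .filter(function (g) { return g.total > 1; });
}

var sample = [
  { fooKey: "123|a|b|c|||d", locationId: 1, type: "FOO" },
  { fooKey: "123|x|y|z|||v", locationId: 1, type: "FOO" },
  { fooKey: "123|a|b|c|||d", locationId: 2, type: "FOO" },
  { fooKey: "123|x|y|z|||v", locationId: 2, type: "FOO" },
  { fooKey: "456|h|j|j|||k", locationId: 2, type: "BAR" }
];
var grouped = groupInPlainJs(sample);
```

For the five sample records, the mirror yields the two groups shown in the expected output; the BAR record is filtered out before grouping.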

Related

How to query hierarchical data that are not based on document IDs but on another property

Let's say I have a hierarchical structure of comments like this:
Comment 1
    Comment 2
    Comment 3
        Comment 4
It is represented using the following sequence of events, stored as documents in MongoDB:
Id EventType ContentId ParentContentId
1 CommentAdded 1
2 CommentEdited 1
3 CommentAdded 2 1
4 CommentAdded 3 1
5 CommentUpvoted 3
6 CommentAdded 4 3
Is it possible to query the hierarchy based on the root ContentId, as follows:
getCommentsTree(rootContentId) { … }
var comments = getCommentsTree(1);
References: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-parent-references/index.html and https://docs.mongodb.com/manual/reference/operator/aggregation/graphLookup/#pipe._S_graphLookup.
The idea is to select the requested document with a $match stage and then use the $graphLookup aggregation stage (check its limitations, such as the maximum memory it may use).
db.getCollection('comments').aggregate([
{
$match : { "ContentId" : 1.0 }
},
{
$graphLookup: {
from: 'comments',
startWith: "$ContentId",
connectFromField: "ContentId",
connectToField: "ParentContentId",
as: "hierarchy"
}
}])
This will return the object that matched with an added hierarchy array, showing all descendants, something like this:
...
"hierarchy" : [
{
"id" : 3,
"type" : "added",
"text" : "c3",
"parentId" : 1
},
{
"id" : 4,
"type" : "added",
"text" : "c4",
"parentId" : 3
},
{
"id" : 2,
"type" : "added",
"text" : "c2",
"parentId" : 1
}
]
...
"hierarchy" : [
{
"id" : 3,
"type" : "added",
"text" : "c3",
"parentId" : 1
},
{
"id" : 4,
"type" : "added",
"text" : "c4",
"parentId" : 3
},
{
"id" : 2,
"type" : "added",
"text" : "c2",
"parentId" : 1
}
]
You should have a look at https://docs.mongodb.com/manual/applications/data-models-tree-structures/ to see if other modeling techniques would be better for your scenario.
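Since $graphLookup returns the descendants as a flat array, a getCommentsTree-style result still needs a nesting step on the client. A minimal sketch of that step, assuming the field names id, parentId and text from the sample output above and an acyclic hierarchy; buildTree is a hypothetical helper, not part of MongoDB:

```javascript
// Hypothetical helper: nest the flat "hierarchy" array under its root.
function buildTree(rootId, flat) {
  // index the descendants by their parent id
  var byParent = {};
  flat.forEach(function (doc) {
    (byParent[doc.parentId] = byParent[doc.parentId] || []).push(doc);
  });
  // recursively attach children (assumes there are no cycles)
  function attach(id) {
    return (byParent[id] || []).map(function (doc) {
      return { id: doc.id, text: doc.text, children: attach(doc.id) };
    });
  }
  return { id: rootId, children: attach(rootId) };
}

var hierarchy = [
  { id: 3, type: "added", text: "c3", parentId: 1 },
  { id: 4, type: "added", text: "c4", parentId: 3 },
  { id: 2, type: "added", text: "c2", parentId: 1 }
];
var tree = buildTree(1, hierarchy);
```

Here tree has two children (comments 3 and 2), and comment 4 is nested under comment 3.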

Getting array of object with limit and offset doesn't work using mongodb

First, let me say that I am new to MongoDB. I am trying to get data from a collection.
Here is the document in my collection student:
{
"_id" : ObjectId("5979e0473f00003717a9bd62"),
"id" : "l_7c0e37b9-132e-4054-adbf-649dbc29f43d",
"name" : "Raj",
"class" : "10th",
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc571",
"name" : "1"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc572",
"name" : "2"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
The output I require is:
{
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc571",
"name" : "1"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc572",
"name" : "2"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
To get this response I used the following query:
db.getCollection('student').find({},{"assignments":1})
What I am trying to do now is apply a limit and offset to the assignments list. I tried $slice: [0, 3], but it gives me the whole document with the sliced array rather than the assignments alone. How can I combine the two to get only the assignments, with limit and offset?
You'll need to aggregate rather than find because aggregate allows you to project+slice.
Given the document from your question, the following command ...
db.getCollection('student').aggregate([
// project on assignments and apply a slice to the projection
{$project: {assignments: {$slice: ['$assignments', 2, 5]}}}
])
... returns:
{
"_id" : ObjectId("5979e0473f00003717a9bd62"),
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
This returns the assignments array (plus _id, which $project includes by default), skipping the first 2 elements and returning up to 5 of the elements that follow; only 4 come back here because the array has 6 elements. You can change the slice arguments (2, 5 in the above example) to apply your own offset and limit: the first number is the offset (the position to start from) and the second is the limit (the maximum number of elements to return).
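To make the offset/limit reading of the $slice arguments concrete, here is a small sketch. The helper assignmentsPage is hypothetical, and the plain Array.prototype.slice call is only an analogy for what the stage does to a six-element array:

```javascript
// Hypothetical helper: build a pipeline that returns one "page" of assignments.
function assignmentsPage(offset, limit) {
  return [
    // $slice: [array, position, n] -> start at `position`, return up to `n` elements
    { $project: { _id: 0, assignments: { $slice: ["$assignments", offset, limit] } } }
  ];
}

var page = assignmentsPage(2, 3);

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.getCollection("student").aggregate(page).toArray());
}

// Plain-JavaScript analogy on the assignment names from the question.
var names = ["1", "2", "3", "4", "5", "6"];
var expectedNames = names.slice(2, 2 + 3);
```

With offset 2 and limit 3, the analogy selects the assignments named "3", "4" and "5".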
If you want to add a match condition (to address specific documents) to the above then you'd do something like this:
db.getCollection('student').aggregate([
// match a specific document
{$match: {"_id": ObjectId("5979e0473f00003717a9bd62")}},
// project on assignments and apply a slice to the projection
{$project: {assignments: {$slice: ['$assignments', 2, 5]}}}
])
More details on the $match stage are in the MongoDB aggregation documentation.

mongodb count and remove duplicate values

I have a large MongoDB collection with a lot of duplicate inserts, like this:
{ "_id" : 1, "val" : "222222", "val2" : "37"}
{ "_id" : 2, "val" : "222222", "val2" : "37" }
{ "_id" : 3, "val" : "222222", "val2" : "37" }
{ "_id" : 4, "val" : "333333", "val2" : "66" }
{ "_id" : 5, "val" : "111111", "val2" : "22" }
{ "_id" : 6, "val" : "111111", "val2" : "22" }
{ "_id" : 7, "val" : "111111", "val2" : "22" }
{ "_id" : 8, "val" : "111111", "val2" : "22" }
I want to count the duplicates of each entry and leave only one unique entry, with its count, in the DB, like this:
{ "_id" : 1, "val" : "222222", "val2" : "37", "count" : "3"}
{ "_id" : 2, "val" : "333333", "val2" : "66", "count" : "1"}
{ "_id" : 3, "val" : "111111", "val2" : "22", "count" : "4" }
I already checked out MapReduce and the aggregation framework, but they never output the full document back and only do one calculation for the full collection.
It would be good to save the new data to a new collection.
If you use MongoDB 2.6 or later, here is an example with the aggregation framework:
db.duplicate.aggregate([
    {$group: {_id: "$val", count: {$sum: 1}}},
    {$project: {_id: 0, val: "$_id", count: 1}},
    {$out: "deduplicate"}
])
$group groups by val and counts the documents in each group
$project renames the _id field to val and hides _id
$out writes the result to a new collection (here named deduplicate)
Hope it fits your case.
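The pipeline above groups on val only, so val2 does not appear in the output. If both fields should identify a duplicate, a variant along these lines may be closer to what the question asks for. This is a sketch that assumes the same collection names (duplicate, deduplicate) as the answer:

```javascript
// Variant sketch: treat documents as duplicates only when val AND val2 match.
var dedupPipeline = [
  { $group: { _id: { val: "$val", val2: "$val2" }, count: { $sum: 1 } } },
  // flatten the compound key back into top-level fields
  { $project: { _id: 0, val: "$_id.val", val2: "$_id.val2", count: 1 } },
  { $out: "deduplicate" }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.duplicate.aggregate(dedupPipeline);
  printjson(db.deduplicate.find().toArray());
}
```

Because _id is suppressed before $out, the new collection gets freshly generated _id values, and count is stored as a number rather than a string.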
Might be easier with an incremental map reduce
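var mapper = function () {
    // key on both fields so only documents matching on val and val2 are counted together
    emit({'val1': this.val, 'val2': this.val2}, {'count': 1});
};
var reducer = function (k, v) {
    // sum the partial counts; this also works when re-reducing earlier output
    var counter = 0;
    for (var i = 0; i < v.length; i++) {
        counter += v[i].count;
    }
    return {'count': counter};
};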
Then, in the shell, run:
db.bigcollection.mapReduce(mapper, reducer, {out: {reduce: 'reducedcollection'}})
This should result in a new collection called reducedcollection. Each distinct (val1, val2) pair becomes the _id of one document, and its count is stored under value.count. Note the use of two values as the key in the new collection. If you want to find a specific instance you can do:
db.reducedcollection.findOne({'_id.val1': '333333', '_id.val2': '66'})
The interesting part is that you can now drop the old collection and, as new data comes in, map-reduce it on top of reducedcollection to increment the counts.
Might be handy?
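To see why the reduce output mode increments counts, the mapper and reducer can be exercised without a server. The sketch below uses a hypothetical stand-in, simulateMapReduce, that only imitates the relevant behaviour (collect emitted values per key, seed each key with the previous result, skip the reducer for single-value keys); it is not how MongoDB is implemented.

```javascript
var mapper = function () {
  emit({ val1: this.val, val2: this.val2 }, { count: 1 });
};
var reducer = function (k, v) {
  var counter = 0;
  for (var i = 0; i < v.length; i++) { counter += v[i].count; }
  return { count: counter };
};

// Hypothetical stand-in for the server. `previous` plays the role of the
// existing output collection when using out: {reduce: ...}.
function simulateMapReduce(docs, previous) {
  var buckets = {};
  Object.keys(previous || {}).forEach(function (k) { buckets[k] = [previous[k]]; });
  // The real server provides emit(); here it is defined globally for the mapper.
  globalThis.emit = function (key, value) {
    var k = JSON.stringify(key);
    (buckets[k] = buckets[k] || []).push(value);
  };
  docs.forEach(function (doc) { mapper.call(doc); });
  var out = {};
  Object.keys(buckets).forEach(function (k) {
    // As on the server, reduce is skipped for keys with a single value.
    out[k] = buckets[k].length > 1 ? reducer(JSON.parse(k), buckets[k]) : buckets[k][0];
  });
  return out;
}

var firstBatch = [
  { val: "222222", val2: "37" }, { val: "222222", val2: "37" }, { val: "222222", val2: "37" },
  { val: "333333", val2: "66" },
  { val: "111111", val2: "22" }, { val: "111111", val2: "22" },
  { val: "111111", val2: "22" }, { val: "111111", val2: "22" }
];
var counts = simulateMapReduce(firstBatch);
// A later batch is reduced on top of the earlier result.
var updated = simulateMapReduce([{ val: "333333", val2: "66" }], counts);
```

After the first batch the counts are 3, 1 and 4; after the second batch the count for 333333/66 becomes 2 while the others are unchanged. This relies on the reducer returning the same shape it receives.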

Query for Partial Object in Array - MongoDB

I have a simple structured document like this:
"people" : [
{
"id" : "6241863",
"amount" : 5
}
],
People can contain more than one element. I've managed to get this to work:
db.village.findOne({"people": {"$in": [{"id": "6241863", "amount": 5}]}})
But I want to ignore amount and search for any document containing people with id 6241863 and any amount.
According to the advanced query documentation, you can mix array value queries with dot notation for reaching into objects. Here's an example using a schema similar to yours:
$ mongo
MongoDB shell version: 2.1.0
connecting to: test
> db.users.save({_id: 1, friends: [{id: 2, name: 'bob'}]})
> db.users.find({friends: {id: 2, name: 'bob'}})
{ "_id" : 1, "friends" : [ { "id" : 2, "name" : "bob" } ] }
> db.users.find({'friends.id': 2})
{ "_id" : 1, "friends" : [ { "id" : 2, "name" : "bob" } ] }
> db.users.find({'friends.name': 'bob'})
{ "_id" : 1, "friends" : [ { "id" : 2, "name" : "bob" } ] }
> db.users.find({'friends.name': 'ted'})
>
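Applied to the village collection from the question, the same idea might look like the sketch below. The two query objects use standard operators; hasPerson is a hypothetical plain-JavaScript mirror of the match condition, included only so the logic can be checked without a server.

```javascript
// Match any document whose people array contains an element with this id,
// regardless of amount.
var byDotNotation = { "people.id": "6241863" };
// Equivalent here; $elemMatch becomes necessary once several conditions
// must hold on the same array element.
var byElemMatch = { people: { $elemMatch: { id: "6241863" } } };

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.village.findOne(byDotNotation));
  printjson(db.village.findOne(byElemMatch));
}

// Plain-JavaScript mirror of the condition (hypothetical helper).
function hasPerson(doc, id) {
  return (doc.people || []).some(function (p) { return p.id === id; });
}

var village = { people: [{ id: "6241863", amount: 5 }] };
var found = hasPerson(village, "6241863");
var missing = hasPerson(village, "0000000");
```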
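Try this, which uses $elemMatch because query operators such as $ne cannot be nested inside a document passed to $in:
db.village.findOne({"people" : { "$elemMatch" : { "id" : "6241863" , "amount" : { "$ne" : null } } }})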

mongodb get distinct records

I am using mongoDB in which I have collection of following format.
{"id" : 1, "name" : "x", "ttm" : 23, "val" : 5}
{"id" : 1, "name" : "x", "ttm" : 34, "val" : 1}
{"id" : 1, "name" : "x", "ttm" : 24, "val" : 2}
{"id" : 2, "name" : "x", "ttm" : 56, "val" : 3}
{"id" : 2, "name" : "x", "ttm" : 76, "val" : 3}
{"id" : 3, "name" : "x", "ttm" : 54, "val" : 7}
On that collection I ran this query to get records in descending order:
db.foo.find({"id" : {"$in" : [1,2,3]}}).sort({ttm : -1}).limit(3)
But it returns two records with id = 1, and I want one record per id.
Is that possible in MongoDB?
There is a distinct command in mongodb, that can be used in conjunction with a query. However, I believe this just returns a distinct list of values for a specific key you name (i.e. in your case, you'd only get the id values returned) so I'm not sure this will give you exactly what you want if you need the whole documents - you may require MapReduce instead.
Documentation on distinct:
http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct
You want to use aggregation. You could do that like this:
db.test.aggregate([
// Each object in the array is a pipeline stage.
{
$group: {
originalId: {$first: '$_id'}, // Hold onto original ID.
_id: '$id', // Set the unique identifier
val: {$first: '$val'},
name: {$first: '$name'},
ttm: {$first: '$ttm'}
}
}, {
// This stage receives the output of the $group stage.
// So the (originally) non-unique 'id' field is now
// present as the _id field. We want to rename it.
$project:{
_id : '$originalId', // Restore original ID.
id : '$_id', // Expose the grouping key as 'id' again.
val : '$val',
name: '$name',
ttm : '$ttm'
}
}
])
This will be very fast... ~90ms for my test DB of 100,000 documents.
Example:
db.test.find()
// { "_id" : ObjectId("55fb595b241fee91ac4cd881"), "id" : 1, "name" : "x", "ttm" : 23, "val" : 5 }
// { "_id" : ObjectId("55fb596d241fee91ac4cd882"), "id" : 1, "name" : "x", "ttm" : 34, "val" : 1 }
// { "_id" : ObjectId("55fb59c8241fee91ac4cd883"), "id" : 1, "name" : "x", "ttm" : 24, "val" : 2 }
// { "_id" : ObjectId("55fb59d9241fee91ac4cd884"), "id" : 2, "name" : "x", "ttm" : 56, "val" : 3 }
// { "_id" : ObjectId("55fb59e7241fee91ac4cd885"), "id" : 2, "name" : "x", "ttm" : 76, "val" : 3 }
// { "_id" : ObjectId("55fb59f9241fee91ac4cd886"), "id" : 3, "name" : "x", "ttm" : 54, "val" : 7 }
db.test.aggregate(/* from first code snippet */)
// output
{
"result" : [
{
"_id" : ObjectId("55fb59f9241fee91ac4cd886"),
"val" : 7,
"name" : "x",
"ttm" : 54,
"id" : 3
},
{
"_id" : ObjectId("55fb59d9241fee91ac4cd884"),
"val" : 3,
"name" : "x",
"ttm" : 56,
"id" : 2
},
{
"_id" : ObjectId("55fb595b241fee91ac4cd881"),
"val" : 5,
"name" : "x",
"ttm" : 23,
"id" : 1
}
],
"ok" : 1
}
PROS: Almost certainly the fastest method.
CONS: Involves use of the complicated Aggregation API. Also, it is tightly coupled to the original schema of the document. Though, it may be possible to generalize this.
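Note that $first without a preceding $sort simply keeps whichever document reaches the group first. If the goal is the highest ttm per id, as the sort in the question suggests, a $sort stage can be placed before $group. The sketch below assumes MongoDB 3.4 or later for $replaceRoot and reuses the test collection name; latestPerIdInPlainJs is a hypothetical mirror of the logic for checking without a server.

```javascript
var latestPerId = [
  { $match: { id: { $in: [1, 2, 3] } } },
  { $sort: { ttm: -1 } },                                  // highest ttm first
  { $group: { _id: "$id", doc: { $first: "$$ROOT" } } },   // keep the whole document
  { $replaceRoot: { newRoot: "$doc" } },                   // restore the original shape
  { $sort: { ttm: -1 } }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.test.aggregate(latestPerId).toArray());
}

// Plain-JavaScript mirror of the same logic (hypothetical helper).
function latestPerIdInPlainJs(docs) {
  var best = {};
  docs.forEach(function (d) {
    if (!(d.id in best) || d.ttm > best[d.id].ttm) { best[d.id] = d; }
  });
  return Object.keys(best)
    .map(function (k) { return best[k]; })
    .sort(function (a, b) { return b.ttm - a.ttm; });
}

var sampleDocs = [
  { id: 1, name: "x", ttm: 23, val: 5 },
  { id: 1, name: "x", ttm: 34, val: 1 },
  { id: 1, name: "x", ttm: 24, val: 2 },
  { id: 2, name: "x", ttm: 56, val: 3 },
  { id: 2, name: "x", ttm: 76, val: 3 },
  { id: 3, name: "x", ttm: 54, val: 7 }
];
var latest = latestPerIdInPlainJs(sampleDocs);
```

For the sample data this keeps ttm 76 for id 2, ttm 54 for id 3 and ttm 34 for id 1. Using $$ROOT also avoids listing every field by hand, which loosens the coupling to the schema mentioned in the cons above.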
I believe you can use aggregate like this
db.collection.aggregate([{
    $group : {
        "_id" : "$id",
        // $first keeps the fields of the first document seen for each id
        "docs" : {
            $first : {
                "name" : "$name",
                "ttm" : "$ttm",
                "val" : "$val"
            }
        }
    }
}]);
The issue is that you want to distill 3 matching records down to one without providing any logic in the query for how to choose between the matching results.
Your options are basically to specify aggregation logic of some kind (select the max or min value for each column, for example), or to run a select distinct query and only select the fields that you wish to be distinct.
querymongo.com does a good job of translating these distinct queries for you (from SQL to MongoDB).
For example, this SQL:
SELECT DISTINCT columnA FROM collection WHERE columnA > 5
Is returned as this MongoDB:
db.runCommand({
"distinct": "collection",
"query": {
"columnA": {
"$gt": 5
}
},
"key": "columnA"
});
If you want to print the distinct values using JavaScript in the shell (the output can then be redirected to a file), this is one way to do it:
var cursor = db.myColl.find({'fieldName': 'fieldValue'});
var seen = [];
cursor.forEach(function (x) {
    // print each id only the first time it is encountered
    if (seen.indexOf(x.id) === -1) {
        printjson(x.id);
        seen.push(x.id);
    }
});
Specify Query with distinct.
The following example returns the distinct values for the field sku, embedded in the item field, from the documents whose dept is equal to "A":
db.inventory.distinct( "item.sku", { dept: "A" } )
Reference: https://docs.mongodb.com/manual/reference/method/db.collection.distinct/