MongoDB Count and groupby subdocument - mongodb

I have the following document:
{
"id":1,
"url":"mysite.com",
"views":
[
{"ip":"1.1.1.1","date":"01-01-2015"},
{"ip":"2.2.2.2","date":"01-01-2015"},
{"ip":"1.1.1.1","date":"01-01-2015"},
{"ip":"1.1.1.1","date":"01-01-2015"}
]
}
If I want to count how many unique IPs there are (a group-by), how can I do that in MongoDB?

Use the aggregation framework. The pipeline starts with a $unwind stage, which deconstructs the views array so that each input document yields one output document per array element; in each output document the array is replaced by a single element. The next stage, $group, groups those documents by the "views.ip" field, computes a count for each group, and outputs one document per unique IP address.
Each per-IP document has two fields: _id and count. The _id field holds the IP address, i.e. the group-by key. The count field is computed with the $sum operator, which adds 1 for every document in the group, so it holds the number of views from that IP. The final aggregation pipeline looks like this:
db.collection.aggregate([
{
"$unwind": "$views"
},
{
"$group": {
"_id": "$views.ip",
"count": {
"$sum": 1
}
}
}
])
Output:
/* 1 */
{
"result" : [
{
"_id" : "2.2.2.2",
"count" : 1
},
{
"_id" : "1.1.1.1",
"count" : 3
}
],
"ok" : 1
}
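To make the two stages concrete, here is a minimal plain-JavaScript sketch that mimics them in memory on the sample document. It is only an illustration of the logic, not how MongoDB executes the pipeline; the helper names unwindViews and countByIp are made up for this example.

```javascript
// Sample document from the question.
const doc = {
  id: 1,
  url: "mysite.com",
  views: [
    { ip: "1.1.1.1", date: "01-01-2015" },
    { ip: "2.2.2.2", date: "01-01-2015" },
    { ip: "1.1.1.1", date: "01-01-2015" },
    { ip: "1.1.1.1", date: "01-01-2015" }
  ]
};

// Rough equivalent of { $unwind: "$views" }:
// one output document per array element, with the array replaced by that element.
function unwindViews(d) {
  return d.views.map(view => ({ ...d, views: view }));
}

// Rough equivalent of { $group: { _id: "$views.ip", count: { $sum: 1 } } }.
function countByIp(unwound) {
  const counts = new Map();
  for (const u of unwound) {
    counts.set(u.views.ip, (counts.get(u.views.ip) || 0) + 1);
  }
  return [...counts].map(([ip, count]) => ({ _id: ip, count }));
}

const perIp = countByIp(unwindViews(doc));
console.log(perIp);
```

The sketch yields one entry per IP with the same counts as the aggregation output above (the order of the groups may differ).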
-- UPDATE --
To get a single total, add another $group stage with an _id of null, which puts all documents from the previous stage into one group, and apply $sum to that group. Summing the "$count" field, as below, adds up the per-IP counts and returns the total number of views (4 for the sample document); to get the number of unique IPs instead (2 here), use "$sum": 1 in this last stage. The aggregation pipeline would look like this in the end:
db.collection.aggregate([
{
"$unwind": "$views"
},
{
"$group": {
"_id": "$views.ip",
"count": {
"$sum": 1
}
}
},
{
"$group": {
"_id": null,
"total": {
"$sum": "$count"
}
}
}
])
Output:
/* 1 */
{
"result" : [
{
"_id" : null,
"total" : 4
}
],
"ok" : 1
}
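The difference between summing the count field and summing 1 in that last $group stage can be checked with a small plain-JavaScript sketch. It starts from the per-IP groups shown earlier and is only an in-memory illustration; the variable names are chosen for this example.

```javascript
// Per-IP groups as produced by the first $group stage on the sample document.
const perIpCounts = [
  { _id: "2.2.2.2", count: 1 },
  { _id: "1.1.1.1", count: 3 }
];

// { $sum: "$count" } adds up the per-IP counts: the total number of views.
const totalViews = perIpCounts.reduce((sum, group) => sum + group.count, 0);

// { $sum: 1 } adds 1 per group: the number of distinct IPs.
const distinctIps = perIpCounts.reduce(sum => sum + 1, 0);

console.log({ totalViews, distinctIps }); // { totalViews: 4, distinctIps: 2 }
```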

Related

Mongo find duplicates for entries for two or more fields

I have documents like this:
{
"_id" : ObjectId("557eaf444ba222d545c3dffc"),
"foreing" : ObjectId("538726124ba2222c0c0248ae"),
"value" : "test",
}
I want to find all documents which have duplicated values for pair foreing & value.
You can easily identify the duplicates by running the following aggregation pipeline operation:
db.collection.aggregate([
{
"$group": {
"_id": { "foreing": "$foreing", "value": "$value" },
"uniqueIds": { "$addToSet": "$_id" },
"count": { "$sum": 1 }
}
},
{ "$match": { "count": { "$gt": 1 } } }
])
The $group operator in the first step groups the documents by the foreing and value key values and, using the $addToSet operator, collects the _id values of the documents in each group into an array stored in the uniqueIds field. This gives you an array of unique _id values for each group. The $sum operator counts the documents in each group so that later pipeline stages can use the count.
In the second pipeline stage, use the $match operator to filter out all documents with a count of 1. The filtered-out documents represent unique index keys.
The remaining documents will be those in the collection that have duplicate key values for pair foreing & value.
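As a rough illustration of the same group-then-filter idea outside the database, here is a plain-JavaScript sketch. The sample data is hypothetical (short strings stand in for ObjectId values), and findDuplicates is a made-up helper, not a MongoDB API.

```javascript
// Hypothetical sample data; strings stand in for ObjectId values.
const docs = [
  { _id: "a1", foreing: "f1", value: "test" },
  { _id: "a2", foreing: "f1", value: "test" },
  { _id: "a3", foreing: "f2", value: "test" }
];

function findDuplicates(items) {
  const groups = new Map();
  for (const item of items) {
    // The composite key plays the role of the $group _id.
    const key = JSON.stringify([item.foreing, item.value]);
    if (!groups.has(key)) {
      groups.set(key, {
        _id: { foreing: item.foreing, value: item.value },
        uniqueIds: [],
        count: 0
      });
    }
    const group = groups.get(key);
    group.uniqueIds.push(item._id); // _id is unique, so this matches $addToSet here
    group.count += 1;               // { $sum: 1 }
  }
  // Equivalent of { $match: { count: { $gt: 1 } } }.
  return [...groups.values()].filter(group => group.count > 1);
}

const duplicates = findDuplicates(docs);
console.log(JSON.stringify(duplicates));
```

Only the (f1, test) pair appears twice in the sample data, so it is the only group that survives the filter.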
We only have to group on the basis of the 2 keys and select the groups with a count greater than 1 to find the duplicates.
The query will look like this:
db.mycollection.aggregate([
{ $group: {
_id: { foreing: "$foreing", value: "$value" },
count: { $sum: 1 },
docs: { $push: "$_id" }
}},
{ $match: {
count: { $gt : 1 }
}}
])
The output will look like this:
{
"result" : [
{
"_id" : {
"foreing" : 1,
"value" : 2
},
"count" : 2,
"docs" : [
ObjectId("34567887654345678987"),
ObjectId("34567887654345678987")
]
}
],
"ok" : 1
}
Reference Link :- How to find mongo documents with a same field

Mongo $group with $project

I am trying to get the keyword count along with parentId, categoryId and llcId.
My documents look like this:
{
"_id" : ObjectId("5673f5b1e4b0822f6f0a5b89"),
"keyword" : "electronic content management system",
"llcId" : "CL1K9B",
"categoryId" : "CL1K8V",
"parentId" : "CL1K8V",
}
I tried $project with $group
db.keyword.aggregate([
{
$group: {
_id: "$llcId",
total: {$sum: 1},
}
},
{
$project: {
categoryId: 1, total: 1
}
}
])
And it gives me a result like
{ "_id" : "CL1KJQ", "total" : 17 }
{ "_id" : "CL1KKW", "total" : 30 }
But I also need the actual data in the result, e.g. llcId, categoryId, keyword, total. I tried to display categoryId and keyword using $project, but it displays only _id and total. What am I missing?
To get the keyword count you'd need to group the documents by the keyword field, then use the accumulator operator $sum to get the documents count. As for the other field values, since you are grouping all the documents by the keyword value, the best you can do to get the other fields is use the $first operator which returns a value from the first document for each group. Otherwise you may have to use the $push operator to return an array of the field values for each group:
var pipeline = [
{
"$group": {
"_id": "$keyword",
"total": { "$sum": 1 },
"llcId": { "$first": "$llcId"},
"categoryId": { "$first": "$categoryId"},
"parentId": { "$first": "$parentId"}
}
}
];
db.keyword.aggregate(pipeline)
You are grouping by llcId, so there can be more than one categoryId per llcId.
If you want categoryId in your result, you have to include it in your $group stage with an accumulator. For example:
db.keyword.aggregate([
{
$group: {
_id: "$llcId",
total: {$sum: 1},
categoryId:{$max:"$categoryId"}
}
},
{
$project: {
categoryId: 1, total: 1
}
}])
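The reason $project cannot bring the other fields back is that $group only outputs the fields it is told to compute. A small plain-JavaScript sketch of grouping by keyword and keeping the first value seen for the other fields (the $first approach above) may help; the sample data is hypothetical and groupByKeyword is a made-up helper.

```javascript
// Hypothetical sample data shaped like the documents in the question.
const keywords = [
  { keyword: "ecm", llcId: "CL1K9B", categoryId: "CL1K8V", parentId: "CL1K8V" },
  { keyword: "ecm", llcId: "CL1KJQ", categoryId: "CL1K8V", parentId: "CL1K8V" },
  { keyword: "crm", llcId: "CL1KKW", categoryId: "CL1K9A", parentId: "CL1K8V" }
];

function groupByKeyword(docs) {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d.keyword)) {
      // Like $first: remember the values from the first document of each group.
      groups.set(d.keyword, {
        _id: d.keyword,
        total: 0,
        llcId: d.llcId,
        categoryId: d.categoryId,
        parentId: d.parentId
      });
    }
    groups.get(d.keyword).total += 1; // { $sum: 1 }
  }
  return [...groups.values()];
}

const grouped = groupByKeyword(keywords);
console.log(grouped);
```

Note that the second "ecm" document has a different llcId, which is dropped: with $first you only see one value per group, which is why $push is the alternative when all values are needed.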

Limiting Query result in MongoDB

I have 20,000+ documents in my mongodb. I just learnt that you cannot query them all in one go.
So my question is this:
I want to get my documents using find(query), limit the results to 3 documents, and choose where those documents start from.
For example, if my find() query resulted in 8 documents:
[{doc1}, {doc2}, {doc3}, {doc4}, {doc5}, {doc6}, {doc7}, {doc 8}]
then a command like limit(2, 3) would give [doc3, doc4, doc5].
I also need the total count of the results without the limit; for example, length() would give 8 (the total number of documents returned by find()).
Any suggestions? Thanks
Add .skip(2).limit(3) to the end of your query.
Suppose you have the following documents in your collection:
{ "_id" : ObjectId("56801243fb940e32f3221bc2"), "a" : 0 }
{ "_id" : ObjectId("56801243fb940e32f3221bc3"), "a" : 1 }
{ "_id" : ObjectId("56801243fb940e32f3221bc4"), "a" : 2 }
{ "_id" : ObjectId("56801243fb940e32f3221bc5"), "a" : 3 }
{ "_id" : ObjectId("56801243fb940e32f3221bc6"), "a" : 4 }
{ "_id" : ObjectId("56801243fb940e32f3221bc7"), "a" : 5 }
{ "_id" : ObjectId("56801243fb940e32f3221bc8"), "a" : 6 }
{ "_id" : ObjectId("56801243fb940e32f3221bc9"), "a" : 7 }
From MongoDB 3.2 you can use the .aggregate() method and the $slice operator.
db.collection.aggregate([
{ "$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" }
}},
{ "$project": {
"count": 1,
"_id": 0,
"docs": { "$slice": [ "$docs", 2, 3 ] }
}}
])
Which returns:
{
"count" : 8,
"docs" : [
{
"_id" : ObjectId("56801243fb940e32f3221bc4"),
"a" : 2
},
{
"_id" : ObjectId("56801243fb940e32f3221bc5"),
"a" : 3
},
{
"_id" : ObjectId("56801243fb940e32f3221bc6"),
"a" : 4
}
]
}
You may want to sort your documents before grouping, using the $sort operator.
In MongoDB 3.0 and earlier, you first need to $group your documents and use the $sum accumulator operator to return the "count" of documents; in that same group stage, use $push with the $$ROOT variable to return an array of all your documents. The next stage is $unwind, which denormalizes that array. From there, the $skip and $limit stages skip the first 2 documents and pass the next 3 to the final stage, which is another $group.
db.collection.aggregate([
{ "$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" }
}},
{ "$unwind": "$docs" },
{ "$skip": 2 },
{ "$limit": 3 },
{ "$group": {
"_id": "$_id",
"count": { "$first": "$count" },
"docs": { "$push": "$docs" }
}}
])
As @JohnnyHK pointed out in a comment:
$group is going to read all documents and build a 20k element array with them just to get three docs.
You should instead run two queries:
db.collection.find().skip(2).limit(3)
and
db.collection.count()
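The result the asker wants (one page plus the total) is just a slice and a length. A plain-JavaScript sketch on an in-memory array, with a made-up helper named page, shows what the two queries return together; it does not talk to MongoDB.

```javascript
// Stand-ins for the 8 documents returned by find() in the question.
const results = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"];

// skip/limit pick a window of the results; the total ignores the window.
function page(all, skip, limit) {
  return {
    total: all.length,                    // what the count query reports
    docs: all.slice(skip, skip + limit)   // what find().skip(skip).limit(limit) returns
  };
}

const firstPage = page(results, 2, 3);
console.log(firstPage); // total is 8; docs are doc3, doc4, doc5
```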

How to calculate a sum of specific documents using aggregation?

I have a schema representing a message thread. So each document in the mongo database looks something like:
{
id: "thread_id",
participants: ["user1", "user2"],
unReadMessageCounts: [
{
participant: "user1",
count: 5
},
{
participant: "user2",
count: 3
}
]
}
What I want to do is get a sum of all unread message counts for a given user, say "user2". I know I could do this by just doing a find() on the collection and then writing a function to sum up the counts for a given user. But I'd like to use mongo's aggregate functionality if possible. I know I can do a match to first select all threads in which "user2" is a participant, but then how do I construct the group and/or sum expressions to pull out the right field from the document?
Use the following aggregation pipeline to get the desired result. The initial $match stage filters the incoming documents so that only threads with "user2" as a participant pass through.
The next pipeline stage "denormalizes" the unReadMessageCounts array with the $unwind operator, which outputs one document per array element (2 documents for each incoming document in your sample data).
Further filtering is needed so that only the entries for the correct participant are aggregated; this is done with another $match stage.
The final $group stage specifies a group _id of null and calculates the total across all remaining documents by applying the $sum accumulator operator to the "unReadMessageCounts.count" field.
So, running this aggregation pipeline on the sample data given:
db.collection.aggregate([
{
"$match": { "unReadMessageCounts.participant": "user2" }
},
{ "$unwind" : "$unReadMessageCounts" },
{
"$match": { "unReadMessageCounts.participant": "user2" }
},
{
"$group": {
"_id": null,
"total": { "$sum": "$unReadMessageCounts.count" }
}
}
])
will yield the result:
/* 0 */
{
"result" : [
{
"_id" : null,
"total" : 3
}
],
"ok" : 1
}
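The four stages map fairly directly onto array operations, which can be a useful way to check the logic. The following plain-JavaScript sketch uses hypothetical threads (the first one is the sample from the question) and a made-up helper named totalUnread; it only mirrors the pipeline in memory.

```javascript
// Hypothetical threads; the first one matches the sample in the question.
const threads = [
  { id: "t1", participants: ["user1", "user2"],
    unReadMessageCounts: [{ participant: "user1", count: 5 }, { participant: "user2", count: 3 }] },
  { id: "t2", participants: ["user2", "user3"],
    unReadMessageCounts: [{ participant: "user2", count: 4 }, { participant: "user3", count: 1 }] },
  { id: "t3", participants: ["user1", "user3"],
    unReadMessageCounts: [{ participant: "user1", count: 2 }, { participant: "user3", count: 6 }] }
];

function totalUnread(allThreads, user) {
  return allThreads
    // First $match: keep threads that have an entry for the user.
    .filter(t => t.unReadMessageCounts.some(c => c.participant === user))
    // $unwind: one item per array element.
    .flatMap(t => t.unReadMessageCounts)
    // Second $match: keep only the user's own entries.
    .filter(c => c.participant === user)
    // $group with _id null and $sum on count.
    .reduce((sum, c) => sum + c.count, 0);
}

console.log(totalUnread(threads, "user2")); // 7
```

One difference from the real pipeline: when nothing matches, this helper returns 0, whereas the aggregation returns no result document at all.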
Alternatively, you can use the $redact operator to limit the size of the documents processed in the pipeline, then $unwind your documents and, in the $group stage, use the $sum accumulator operator to return the total number of unread messages for "user2".
db.collection.aggregate([
{ "$match": {
"unReadMessageCounts": {
"$elemMatch": { "participant": "user2" }
}
}},
{ "$redact": {
"$cond": [
{ "$or": [
{ "$eq": [ "$participant", "user2" ] },
{ "$not" : "$participant" }
]},
"$$DESCEND",
"$$PRUNE"
]
}},
{ "$unwind": "$unReadMessageCounts" },
{ "$group": {
"_id": null,
"total": { "$sum": "$unReadMessageCounts.count" }
}}
])

mongodb count num of distinct values per field/key

Is there a query for calculating how many distinct values a field contains in the DB?
For example, I have a field for country and there are 8 distinct country values (spain, england, france, etc.).
If someone adds more documents with a new country, I would like the query to return 9.
Is there an easier way than group and count?
MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.
There is a shell db.collection.distinct() helper as well:
> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]
> db.countries.distinct('country').length
4
As noted in the MongoDB documentation:
Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.
Here is an example using the aggregation API. To complicate the case, we group by case-insensitive words from an array property of the document.
db.articles.aggregate([
{
$match: {
keywords: { $not: {$size: 0} }
}
},
{ $unwind: "$keywords" },
{
$group: {
_id: {$toLower: '$keywords'},
count: { $sum: 1 }
}
},
{
$match: {
count: { $gte: 2 }
}
},
{ $sort : { count : -1} },
{ $limit : 100 }
]);
which gives results such as:
{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }
With MongoDB 3.4.4 and newer, you can use the $arrayToObject operator and a $replaceRoot pipeline stage to get the counts.
For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:
db.users.aggregate([
{ "$group": {
"_id": { "$toLower": "$role" },
"count": { "$sum": 1 }
} },
{ "$group": {
"_id": null,
"counts": {
"$push": { "k": "$_id", "v": "$count" }
}
} },
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$counts" }
} }
])
Example Output
{
"user" : 67,
"superuser" : 5,
"admin" : 4,
"moderator" : 12
}
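The k/v pairs pushed in the second $group stage are what $arrayToObject turns into a single object. In plain JavaScript the same reshaping is close to Object.fromEntries, as this sketch shows; the users array is hypothetical and the code only mirrors the pipeline in memory.

```javascript
// Hypothetical users with mixed-case roles.
const users = [
  { role: "Admin" }, { role: "user" }, { role: "admin" },
  { role: "User" }, { role: "moderator" }
];

// First $group: count users per lower-cased role.
const counts = new Map();
for (const u of users) {
  const key = u.role.toLowerCase();
  counts.set(key, (counts.get(key) || 0) + 1);
}

// Second $group: push { k, v } pairs into one array.
const pairs = [...counts].map(([k, v]) => ({ k, v }));

// $arrayToObject + $replaceRoot: turn the pairs into a single object.
const roleCounts = Object.fromEntries(pairs.map(p => [p.k, p.v]));
console.log(roleCounts);
```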
I wanted a more concise answer and came up with the following, using the documentation on aggregation and $group:
db.countries.aggregate([{"$group": {"_id": "$country", "count":{"$sum": 1}}}])
You can use Mongo Shell Extensions. It's a single .js import that you can append to your $HOME/.mongorc.js, or load programmatically if you're coding in Node.js/io.js.
Sample
For each distinct value of a field, it counts the occurrences in documents, optionally filtered by a query:
> db.users.distinctAndCount('name', {name: /^a/i})
{
"Abagail": 1,
"Abbey": 3,
"Abbie": 1,
...
}
The field parameter can also be an array of fields:
> db.users.distinctAndCount(['name','job'], {name: /^a/i})
{
"Austin,Educator" : 1,
"Aurelia,Educator" : 1,
"Augustine,Carpenter" : 1,
...
}
To find the distinct values of field_1 in a collection with a WHERE-style condition as well, pass a query document as the second argument:
db.your_collection_name.distinct('field_1', { /* query document with your condition goes here */ })
So, finding the distinct names in a collection where age > 25 looks like this (append .length to get the number of distinct names):
db.your_collection_name.distinct('names', {'age': {"$gt": 25}})
Hope it helps!
I use this query:
var collection = "countries"; var field = "country";
db[collection].distinct(field).forEach(function(value){print(field + ", " + value + ": " + db[collection].count({[field]: value}))})
Output:
country, England: 3536
country, France: 238
country, Australia: 1044
country, Spain: 16
This query first gets all the distinct values and then, for each of them, counts the number of occurrences.
If you're on MongoDB 3.4+, you can use $count in an aggregation pipeline:
db.users.aggregate([
{ $group: { _id: '$country' } },
{ $count: 'countOfUniqueCountries' }
]);