I'm currently experimenting with MongoDB. Using Twitter's Streaming API I collected a bunch of tweets (it seemed a good way to learn MongoDB's aggregation options).
I have the following query:
db.twitter.aggregate([
{ $group : { _id : '$status.user.screen_name', count: { $sum : 1 } } },
{ $sort : { count : -1, _id : 1 } },
{ $skip : 0 },
{ $limit : 5 },
]);
As expected, this is the result:
{
"result" : [
{
"_id" : "VacaturesBreda",
"count" : 5
},
{
"_id" : "breda_nws",
"count" : 3
},
{
"_id" : "BredaDichtbij",
"count" : 2
},
{
"_id" : "JobbirdUTITBaan",
"count" : 2
},
{
"_id" : "vacatures_nr1",
"count" : 2
}
],
"ok" : 1
}
The question is: how can I match on the user's id_str and return the screen_name and, for example, the followers_count of the user? I tried to do this with { $project .... } but I kept ending up with an empty result set.
For those not familiar with the user object in Twitter's JSON response, here is part of it (taken from the first user in the db):
"user" : {
"id" : 2678963916,
"id_str" : "2678963916",
"name" : "JobbirdUT IT Banen",
"screen_name" : "JobbirdUTITBaan",
"location" : "Utrecht",
"url" : "http://www.jobbird.com",
"description" : "Blijf op de hoogte van de nieuwste IT/Automatisering vacatures in Utrecht, via http://Jobbird.com",
"protected" : false,
"verified" : false,
"followers_count" : 1,
"friends_count" : 1,
"listed_count" : 0,
"favourites_count" : 0,
"statuses_count" : 311,
"created_at" : "Fri Jul 25 07:35:48 +0000 2014",
...
},
Update: as requested, a clear example of the desired response (sorry for not adding it).
So instead of grouping on the screen_name, I group on the id_str. Why, you might ask? It is possible to edit your screen_name, but you are still the same user for Twitter (so the last screen_name should be returned):
db.twitter.aggregate([
{ $group : { _id : '$status.user.id_str', count: { $sum : 1 } } },
{ $sort : { count : -1, _id : 1 } },
{ $skip : 0 },
{ $limit : 5 },
]);
And as the response something like this:
{
"result" : [
{
"_id" : "123456789",
"screen_name": "awsome_screen_name",
"followers_count": 523,
"count" : 5
},
....
],
"ok" : 1
}
You are looking for an accumulator that carries a field value through the $group stage without combining values, which is what the $first and $last operators do:
db.twitter.aggregate([
{ "$group": {
"_id": "$status.user.id_str",
"screen_name": { "$first": "$status.user.screen_name" },
// $first, not $sum: summing would add the user's follower count once per tweet
"followers_count": { "$first": "$status.user.followers_count" },
"count": { "$sum": 1 }
}},
{ "$sort": { "followers_count": -1, "count": -1 } },
{ "$limit": 5 }
])
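This picks the first occurrence of the field within each group, that is, for each distinct grouping key. It is generally useful where the documents repeat the same related data for a given grouping key. Use $last instead if you want the most recent value; the order is only meaningful when the input has been sorted beforehand.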
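To make the per-key behaviour concrete, here is a minimal plain-JavaScript sketch that mimics the $group stage in memory. It does not talk to MongoDB; the `tweets` sample data and the `groupFirst` helper are hypothetical and exist only for illustration.

```javascript
// Hypothetical sample data shaped like the documents in the question.
const tweets = [
  { status: { user: { id_str: "1", screen_name: "old_name", followers_count: 10 } } },
  { status: { user: { id_str: "2", screen_name: "other", followers_count: 5 } } },
  { status: { user: { id_str: "1", screen_name: "new_name", followers_count: 12 } } },
];

// Mimics a $group on id_str with $first for screen_name and $sum: 1 for count.
function groupFirst(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const user = doc.status.user;
    if (!groups.has(user.id_str)) {
      // $first: keep the value from the first document seen for this key.
      groups.set(user.id_str, {
        _id: user.id_str,
        screen_name: user.screen_name,
        count: 0,
      });
    }
    groups.get(user.id_str).count += 1; // $sum: 1
  }
  return [...groups.values()];
}

const grouped = groupFirst(tweets);
console.log(grouped);
```

In this sample, user "1" ends up with screen_name "old_name" and a count of 2, because the first document seen for that key wins.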
An alternate approach is to include the fields in the grouping key. You can later restructure with $project:
db.twitter.aggregate([
{ "$group": {
"_id": {
"_id": "$status.user.id_str",
"screen_name": "$status.user.screen_name"
},
// $first, not $sum: summing would add the user's follower count once per tweet
"followers_count": { "$first": "$status.user.followers_count" },
"count": { "$sum": 1 }
}},
{ "$project": {
"_id": "$_id._id",
"screen_name": "$_id.screen_name",
"followers_count": 1,
"count": 1
}},
{ "$sort": { "followers_count": -1, "count": -1 } },
{ "$limit": 5 }
])
This is useful when you are unsure whether the related fields are unique per grouping key: each distinct combination of values gets its own group.
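The effect of a compound grouping key can be sketched in plain JavaScript. This is an in-memory illustration only, not MongoDB code; `sampleTweets` and `groupByCompoundKey` are hypothetical names, and the data assumes a user who changed screen_name between tweets.

```javascript
// Hypothetical tweets: user "1" changed screen_name between tweets.
const sampleTweets = [
  { id_str: "1", screen_name: "old_name" },
  { id_str: "1", screen_name: "new_name" },
  { id_str: "1", screen_name: "new_name" },
  { id_str: "2", screen_name: "other" },
];

// Mimics grouping on the compound key { _id: id_str, screen_name: screen_name }.
function groupByCompoundKey(docs) {
  const groups = new Map();
  for (const { id_str, screen_name } of docs) {
    // One bucket per distinct (id_str, screen_name) pair.
    const key = JSON.stringify([id_str, screen_name]);
    const group = groups.get(key) ?? { _id: id_str, screen_name, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.values()];
}

const byPair = groupByCompoundKey(sampleTweets);
console.log(byPair);
```

Here user "1" yields two groups, one per screen_name, which is the trade-off of this approach: nothing is silently dropped, but a single user can appear more than once.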
Related
I have a simple table with ranked users...
User:
{
"_id" : "aaa",
"rank" : 10
},
{
"_id" : "bbb",
"rank" : 30
},
{
"_id" : "ccc",
"rank" : 20
},
{
"_id" : "ddd",
"rank" : 30
},
{
"_id" : "eee",
"rank" : 30
},
{
"_id" : "fff",
"rank" : 10
}
I would like to count how many users have each rank, and then sort the ranks from highest to lowest count.
So I can get this result:
Result:
{
"rank" : 30,
"count": 3
},
{
"rank" : 10,
"count": 2
},
{
"rank" : 20,
"count": 1
}
I tried different things but can't seem to get the correct output:
db.getCollection("user").aggregate([
{
"$group": {
"_id": {
"rank": "$rank"
},
"count": { "$sum": 1 }
},
"$sort": {
"count" : -1
}
])
I hope this is possible to do.
You can count and then sort with the aggregation framework in MongoDB:
db.getCollection('users').aggregate(
[
{
$group:
{
_id: "$rank",
count: { $sum: 1 }
}
},
{ $sort : { count : -1} }
]
)
Working example
https://mongoplayground.net/p/aM3Ci3GACjp
You don't need separate group and sort stages when you can do it in one go with $sortByCount:
db.getCollection("user").aggregate([
{
$sortByCount: "$rank"
}
])
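As a rough picture of what $sortByCount computes, here is a plain-JavaScript sketch over the sample users from the question. It runs in memory without MongoDB; the `sortByCount` helper is a hypothetical stand-in, not a driver function.

```javascript
// The sample documents from the question.
const users = [
  { _id: "aaa", rank: 10 }, { _id: "bbb", rank: 30 }, { _id: "ccc", rank: 20 },
  { _id: "ddd", rank: 30 }, { _id: "eee", rank: 30 }, { _id: "fff", rank: 10 },
];

// Counts documents per distinct value of `field`, then sorts by count descending.
function sortByCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([value, count]) => ({ _id: value, count }))
    .sort((a, b) => b.count - a.count);
}

const rankCounts = sortByCount(users, "rank");
console.log(rankCounts);
// [ { _id: 30, count: 3 }, { _id: 10, count: 2 }, { _id: 20, count: 1 } ]
```

Note that the grouped value comes back under _id rather than rank, as it does with $group and $sortByCount; a $project stage can rename it if needed.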
I am trying to find the list of users who are new on day-1 (yesterday). I have written one query to find the users who arrived up to the day before yesterday and another for the users who arrived yesterday. Now I want to subtract the first list from the second. How can I do that in a single aggregate function?
Query to get the list up to the day before yesterday:
db.chat_question_logs.aggregate([
{
$match : {"createdDate":{$lte: ISODate("2020-04-29T00:00:00Z")}}
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
Similarly, the query for day-1 is as below:
db.chat_question_logs.aggregate([
{
$match : {"createdDate":{$gte: ISODate("2020-04-30T00:00:00Z"),$lte: ISODate("2020-05-01T00:00:00Z")}}
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
The resulting JSON is as below:
/* 1 */
{
"_id" : {
"userId" : "2350202241750776"
},
"count" : 1
},
/* 2 */
{
"_id" : {
"userId" : "26291570771793121"
},
"count" : 1
},
/* 3 */
{
"_id" : {
"userId" : "2742872209107866"
},
"count" : 5
},
/* 4 */
{
"_id" : {
"userId" : "23502022417507761212"
},
"count" : 1
},
/* 5 */
{
"_id" : {
"userId" : "2629157077179312"
},
"count" : 43
}
How can I find the difference?
It sounds like what you want is to get all users created yesterday (which is the 28th in this example).
db.chat_question_logs.aggregate([
{
$match : { $and: [
{ "createdDate":{$lt: ISODate("2020-04-29T00:00:00Z")} },
{ "createdDate": {$gte: ISODate("2020-04-28T00:00:00Z") }}
] }
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
Is this what you want?
Hi, I found the solution, which is below.
I grouped by user ID, took the first appearance of each ID, and then filtered the records on the date range I wanted. The query is as below:
db.chat_question_logs.aggregate([
{
$group:
{
_id: "$userInfo.userId",
// $min gives the earliest createdDate regardless of document order
firstAppearance: { $min: "$createdDate" }
}
},
{
$match : { "firstAppearance": { $gte: new ISODate("2020-05-03"), $lt: new ISODate("2020-05-05") } }
}
])
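The idea behind this query (find each user's earliest createdDate, then keep only users whose earliest date falls inside the window) can be sketched in plain JavaScript. This is an in-memory illustration under assumed data: the `logs` entries and the `newUsersBetween` helper are hypothetical, and the sketch takes the earliest date per user explicitly so it does not depend on document order.

```javascript
// Hypothetical log entries; field names mirror the ones in the question.
const logs = [
  { userInfo: { userId: "u1" }, createdDate: new Date("2020-04-28T10:00:00Z") },
  { userInfo: { userId: "u1" }, createdDate: new Date("2020-05-03T09:00:00Z") },
  { userInfo: { userId: "u2" }, createdDate: new Date("2020-05-03T12:00:00Z") },
  { userInfo: { userId: "u3" }, createdDate: new Date("2020-05-04T08:00:00Z") },
  { userInfo: { userId: "u4" }, createdDate: new Date("2020-05-06T08:00:00Z") },
];

// Returns the ids of users whose first appearance lies in [from, to).
function newUsersBetween(entries, from, to) {
  const firstSeen = new Map();
  for (const { userInfo, createdDate } of entries) {
    const prev = firstSeen.get(userInfo.userId);
    // Keep the earliest date seen so far for this user.
    if (prev === undefined || createdDate < prev) {
      firstSeen.set(userInfo.userId, createdDate);
    }
  }
  return [...firstSeen.entries()]
    .filter(([, date]) => date >= from && date < to)
    .map(([userId]) => userId);
}

const newUsers = newUsersBetween(
  logs,
  new Date("2020-05-03T00:00:00Z"),
  new Date("2020-05-05T00:00:00Z")
);
console.log(newUsers); // [ 'u2', 'u3' ]
```

User "u1" is excluded even though it has an entry inside the window, because its first appearance is earlier; that is the "difference" the question asks for.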
I've searched but could not find an answer to my problem. I need to count the occurrences of the field "nationalCode". I've got a collection with this sample structure in MongoDB:
{
"_id" : ObjectId("5d7519cc6c17d65d4983f048"),
"origin" : "Base1",
"topic" : [
{
"nationalTopic" : {
"nationalCode" : 26
},
"dateTime" : NumberLong(20120927000000)
},
{
"nationalTopic" : {
"nationalCode" : 132
},
"dateTime" : NumberLong(20120927000000)
},
{
"nationalTopic" : {
"nationalCode" : 26
},
"dateTime" : NumberLong(20120927000000)
},
{
"nationalTopic" : {
"nationalCode" : 26
},
"dateTime" : NumberLong(20121005000000)
}
]
}
I've used the following code (I tried many variations of it, but none of them got me the right results):
db.processos.aggregate(
[
{ "$unwind": "$topic" },
{"$match": {"origin": "Base1"}},
{"$group": { "_id": { nationalCode: "$topic.nationalTopic.nationalCode", "count": { "$sum": 1 }} } }
]
)
I'm expecting something like this:
{
"_id" : {
"nationalCode" : 26,
"count" : 3.0
}
}
/* 2 */
{
"_id" : {
"nationalCode" : 132,
"count" : 1.0
}
}
The count accumulator must be a sibling of _id in the $group stage, not nested inside it.
The following query worked for me:
db.data.aggregate(
[
{ "$unwind": "$topic" },
{"$match": {"origin": "Base1"}},
{"$group": { _id: { "nationalCode": "$topic.nationalTopic.nationalCode" },
"count": {$sum: 1} }
}
]
)
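For readers who want to check the expected numbers, the $match, $unwind and $group steps can be mimicked in plain JavaScript on the sample document. This is only an in-memory sketch, not MongoDB code; the `processos` array is trimmed to the fields that matter and `countNationalCodes` is a hypothetical helper.

```javascript
// The sample document from the question, reduced to the relevant fields.
const processos = [
  {
    origin: "Base1",
    topic: [
      { nationalTopic: { nationalCode: 26 } },
      { nationalTopic: { nationalCode: 132 } },
      { nationalTopic: { nationalCode: 26 } },
      { nationalTopic: { nationalCode: 26 } },
    ],
  },
];

function countNationalCodes(docs, origin) {
  const counts = new Map();
  docs
    .filter((doc) => doc.origin === origin) // like $match
    .flatMap((doc) => doc.topic)            // like $unwind
    .forEach((entry) => {                   // like $group with $sum: 1
      const code = entry.nationalTopic.nationalCode;
      counts.set(code, (counts.get(code) ?? 0) + 1);
    });
  // count sits next to _id, not inside it
  return [...counts].map(([nationalCode, count]) => ({ _id: { nationalCode }, count }));
}

const codeCounts = countNationalCodes(processos, "Base1");
console.log(JSON.stringify(codeCounts));
// [{"_id":{"nationalCode":26},"count":3},{"_id":{"nationalCode":132},"count":1}]
```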
Use $project to reshape the output into your expected format, like this:
db.ggg.aggregate(
[
{$unwind:"$topic"},
{"$match": {"origin": "Base1"}},
{"$group": { "_id": { nationalCode: "$topic.nationalTopic.nationalCode"},
"count": { "$sum": 1 } }},
{$project :{"_id.nationalCode":1,"_id.count":"$count"}}
]
)
Here is the result:
{ "_id" : { "nationalCode" : 26, "count" : 3 } }
{ "_id" : { "nationalCode" : 132, "count" : 1 } }
I have an object that looks like this:
{
"_id" : {
"import_type" : "MANUAL_UPLOAD",
"supplier" : "jabino.de",
"unit_price" : "0"
},
"statuses" : [
{
"status" : "DUPLICATED",
"count" : 14
},
{
"status" : "BLACKLISTED",
"count" : 2
},
{
"status" : "USABLE",
"count" : 2239
},
{
"status" : "INVALID_EMAIL_ADDRESS",
"count" : 1
},
{
"status" : "DUPLICATED",
"count" : 14
},
{
"status" : "BLACKLISTED",
"count" : 2
},
{
"status" : "USABLE",
"count" : 2239
},
{
"status" : "INVALID_EMAIL_ADDRESS",
"count" : 1
}
]
}
How can I sum the counts in the statuses array for entries that share the same status, without losing the key-value pairs in _id? E.g. in this case:
Duplicated: 28
Blacklisted: 4
Usable: 4478
Invalid email address: 2
You can use the aggregation below:
db.collection.aggregate([
{ "$unwind": "$statuses" },
{ "$group": {
"_id": {
"_id": "$_id",
"statuses": "$statuses.status"
},
"count": { "$sum": "$statuses.count" }
}},
{ "$group": {
"_id": "$_id._id",
"statuses": {
"$push": {
"status": "$_id.statuses",
"count": "$count"
}
}
}}
])
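The two $group stages first sum per (_id, status) pair and then rebuild the statuses array per _id. For a single document, the same result can be sketched in plain JavaScript; this is an in-memory illustration rather than MongoDB code, and `mergeStatuses` is a hypothetical helper.

```javascript
// The sample document from the question.
const doc = {
  _id: { import_type: "MANUAL_UPLOAD", supplier: "jabino.de", unit_price: "0" },
  statuses: [
    { status: "DUPLICATED", count: 14 },
    { status: "BLACKLISTED", count: 2 },
    { status: "USABLE", count: 2239 },
    { status: "INVALID_EMAIL_ADDRESS", count: 1 },
    { status: "DUPLICATED", count: 14 },
    { status: "BLACKLISTED", count: 2 },
    { status: "USABLE", count: 2239 },
    { status: "INVALID_EMAIL_ADDRESS", count: 1 },
  ],
};

// Sums the counts per status and keeps _id untouched.
function mergeStatuses(document) {
  const totals = new Map();
  for (const { status, count } of document.statuses) {
    totals.set(status, (totals.get(status) ?? 0) + count);
  }
  return {
    _id: document._id,
    statuses: [...totals].map(([status, count]) => ({ status, count })),
  };
}

const merged = mergeStatuses(doc);
console.log(merged.statuses);
// DUPLICATED 28, BLACKLISTED 4, USABLE 4478, INVALID_EMAIL_ADDRESS 2
```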
My collection looks like this:
{
"_id":ObjectId("5744b6cd9c408cea15964d18"),
"uuid":"bbde4bba-062b-4024-9bb0-8b12656afa7e",
"version":1,
"categories":["sport"]
},
{
"_id":ObjectId("5745d2bab047379469e10e27"),
"uuid":"bbde4bba-062b-4024-9bb0-8b12656afa7e",
"version":2,
"categories":["sport", "shopping"]
},
{
"_id":ObjectId("5744b6359c408cea15964d15"),
"uuid":"561c3705-ba6d-432b-98fb-254483fcbefa",
"version":1,
"categories":["politics"]
}
I want to count the number of documents for every category. To do this, I unwind the categories array:
db.collection.aggregate(
{$unwind: '$categories'},
{$group: {_id: '$categories', count: {$sum: 1}} }
)
Result:
{ "_id" : "sport", "count" : 2 }
{ "_id" : "shopping", "count" : 1 }
{ "_id" : "politics", "count" : 1 }
Now I want to count the number of documents for every category, but only for documents that are at their latest version.
This is where I am stuck.
It's ugly but I think this gives you what you're after:
db.collection.aggregate(
{ $unwind : "$categories" },
{ $group :
{ "_id" : { "uuid" : "$uuid" },
"doc" : { $push : { "version" : "$version", "category" : "$categories" } },
"maxVersion" : { $max : "$version" }
}
},
{ $unwind : "$doc" },
{ $project : { "_id" : 0, "uuid" : "$_id.uuid", "category" : "$doc.category", "isCurrentVersion" : { $eq : [ "$doc.version", "$maxVersion" ] } } },
{ $match : { "isCurrentVersion" : true }},
{ $group : { "_id" : "$category", "count" : { $sum : 1 } } }
)
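The pipeline above keeps, for each uuid, only the entries at the maximum version and then counts categories. A plain-JavaScript sketch of that logic on the sample collection may make the intent easier to follow; it runs in memory without MongoDB, and `countLatestCategories` is a hypothetical helper.

```javascript
// The sample documents from the question (without the ObjectId fields).
const docs = [
  { uuid: "bbde4bba-062b-4024-9bb0-8b12656afa7e", version: 1, categories: ["sport"] },
  { uuid: "bbde4bba-062b-4024-9bb0-8b12656afa7e", version: 2, categories: ["sport", "shopping"] },
  { uuid: "561c3705-ba6d-432b-98fb-254483fcbefa", version: 1, categories: ["politics"] },
];

function countLatestCategories(documents) {
  // Step 1: keep only the highest-version document per uuid.
  const latest = new Map();
  for (const d of documents) {
    const current = latest.get(d.uuid);
    if (!current || d.version > current.version) latest.set(d.uuid, d);
  }
  // Step 2: count categories across those latest documents.
  const counts = {};
  for (const d of latest.values()) {
    for (const category of d.categories) {
      counts[category] = (counts[category] ?? 0) + 1;
    }
  }
  return counts;
}

const latestCounts = countLatestCategories(docs);
console.log(latestCounts); // { sport: 1, shopping: 1, politics: 1 }
```

Compared with the unfiltered count in the question, "sport" drops from 2 to 1 because version 1 of the first uuid is no longer counted.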
You can do this by first grouping the denormalized documents (from the $unwind step) by two keys, the categories and version fields. This is necessary for the next pipeline step, which uses the $sort operator to order the grouped documents and their accumulated counts by version (descending) and categories (ascending).
Another $group stage then uses the $first operator to pick the top document in each categories group after ordering. The following shows this:
db.collection.aggregate(
{ "$unwind": "$categories" },
{
"$group": {
"_id": {
'categories': '$categories',
'version': '$version'
},
"count": { "$sum": 1 }
}
},
{ "$sort": { "_id.version": -1, "_id.categories": 1 } },
{
"$group": {
"_id": "$_id.categories",
"count": { "$first": "$count" },
"version": { "$first": "$_id.version" }
}
}
)
Sample Output
{ "_id" : "shopping", "count" : 1, "version" : 2 }
{ "_id" : "sport", "count" : 1, "version" : 2 }
{ "_id" : "politics", "count" : 1, "version" : 1 }