MongoDB $group aggregation - mongodb

I have a collection like this:
OrgName EmpId Domain Date
Google 12345 ABC 2017/01/01
Google 12345 XYZ 2017/02/01
Google 67890 ABC 2017/03/01
Google 45678 ABC 2017/03/02
Yahoo 69875 HGF 2017/03/02
Google 45678 XYZ 2017/03/03
Google 45678 XYZ 2017/03/03
Google 12345 XYZ 2017/03/03
Google 12345 ABC 2017/03/04
Google 12345 ABC 2017/04/05
I need to find, for each OrgName, the employee with the highest "Domain" count who appears in both the "ABC" and "XYZ" domains.
I am using the query below:
db.Collection1.aggregate([
{ "$match" : { "$or" : [ { "Domain" : "ABC" }, { "Domain" : "XYZ" } ] } },
{ "$group" : {
"_id" : { "OrgName" : "$OrgName", "EmpId" : "$EmpId", "Domain" : "$Domain" },
"count" : { "$sum" : 1 },
"participantData" : { "$push" : { "EmpId" : "$EmpId", "Domain" : "$Domain" } }
}},
{ "$sort" : { "count" : -1 } },
{ "$limit" : 10 }
], { allowDiskUse: true })
In the example above, I am expecting this result: EmpId 12345 is present in both the "ABC" and "XYZ" domains with a count of 5 (i.e., 12345.ABC = 3 and 12345.XYZ = 2).

You can try the query below.
It uses $group by OrgName and EmpId, followed by a $match that keeps only the groups whose participantData array contains both 'ABC' and 'XYZ'.
It then uses $sort on count in descending order and outputs the first 10 results.
db.collection.aggregate([
{"$match":{"$or":[{"Domain":"ABC"},{"Domain":"XYZ"}]}},
{"$group":{
"_id":{"OrgName":"$OrgName","EmpId":"$EmpId"},
"count":{"$sum":1},
"participantData":{"$push":{"EmpId":"$EmpId","Domain":"$Domain"}}
}},
{"$match":{"participantData.Domain":{"$all":["ABC","XYZ"]}}},
{"$sort":{"count":-1}},
{"$limit":10}
])
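To sanity-check what that pipeline computes, here is a minimal sketch in plain JavaScript (no MongoDB server needed) that mirrors the same stages over the question's sample rows. The `rows` array and the helper name `topEmployees` are illustrative assumptions, not part of the original answer, and a Set stands in for the pushed participantData array.

```javascript
// Sample rows from the question (Date omitted because the pipeline ignores it).
const rows = [
  ["Google", 12345, "ABC"], ["Google", 12345, "XYZ"], ["Google", 67890, "ABC"],
  ["Google", 45678, "ABC"], ["Yahoo", 69875, "HGF"], ["Google", 45678, "XYZ"],
  ["Google", 45678, "XYZ"], ["Google", 12345, "XYZ"], ["Google", 12345, "ABC"],
  ["Google", 12345, "ABC"],
].map(([OrgName, EmpId, Domain]) => ({ OrgName, EmpId, Domain }));

// Hypothetical helper mirroring $match -> $group -> $match ($all) -> $sort -> $limit.
function topEmployees(docs, domains, limit) {
  const groups = new Map();
  for (const d of docs) {
    if (!domains.includes(d.Domain)) continue;   // first $match
    const key = `${d.OrgName}|${d.EmpId}`;       // $group _id
    if (!groups.has(key)) {
      groups.set(key, { OrgName: d.OrgName, EmpId: d.EmpId, count: 0, seen: new Set() });
    }
    const g = groups.get(key);
    g.count += 1;                                // { $sum: 1 }
    g.seen.add(d.Domain);                        // stands in for participantData
  }
  return [...groups.values()]
    .filter((g) => domains.every((x) => g.seen.has(x))) // second $match with $all
    .sort((a, b) => b.count - a.count)                  // $sort by count, descending
    .slice(0, limit)                                    // $limit
    .map(({ OrgName, EmpId, count }) => ({ OrgName, EmpId, count }));
}

const result = topEmployees(rows, ["ABC", "XYZ"], 10);
console.log(result);
```

On the sample data this keeps EmpId 12345 (count 5) and EmpId 45678 (count 3); EmpId 67890 is dropped because it only appears in "ABC".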

Related

express mongoDB aggregation sum

I have many records in my mongodb collection and I need to recount some information.
records format:
{
"_id" : someId,
"targetFrom" : ObjectId("603e0355e805140334e79438"),<-- this is ID for search
"targetTo" : null,
"operationPaid" : true,
"type" : "coming", <--- type
"moneyAccount" : someId,
"agent" : null,
"sum" : 5000, <--- sum
}
{
"_id" : someId,
"targetFrom" : null,
"targetTo" : null,
"operationPaid" : true,
"type" : "out", <--- type
"moneyAccount" : someId,
"agent" : ObjectId("603e0355e805140334e79438"),<-- this is ID for search
"sum" : 3000, <--- sum
}
So I need to group the records by type and compute the sum for the ID ObjectId("603e0355e805140334e79438"); the ID to search for can be in the targetFrom, targetTo, or agent field.
For this example the result should be 2000: the sum 5000 is "coming" and the sum 3000 is "out".
Query:
Match the ID in one of the three possible fields.
Group by null (the whole collection becomes one group); if type is "out", subtract the sum field, otherwise add it.
aggregate(
[{"$match":
{"$expr":
{"$or":
[{"$eq":["$targetFrom", ObjectId("603e0355e805140334e79438")]},
{"$eq":["$targetTo", ObjectId("603e0355e805140334e79438")]},
{"$eq":["$agent", ObjectId("603e0355e805140334e79438")]}]}}},
{"$group":
{"_id":null,
"sum":
{"$sum":
{"$cond":
[{"$eq":["$type", "out"]}, {"$subtract":[0, "$sum"]}, "$sum"]}}}}])
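The same arithmetic can be checked outside the database. This is a minimal sketch in plain JavaScript, assuming the IDs are represented as plain strings instead of ObjectId values; the `records` array, the third unrelated record, and the helper name `balanceFor` are made up for illustration.

```javascript
// Assumption: IDs are plain strings here instead of ObjectId values.
const SEARCH_ID = "603e0355e805140334e79438";

const records = [
  { targetFrom: SEARCH_ID, targetTo: null, agent: null, type: "coming", sum: 5000 },
  { targetFrom: null, targetTo: null, agent: SEARCH_ID, type: "out", sum: 3000 },
  // Hypothetical unrelated record that the match step must ignore.
  { targetFrom: "someOtherId", targetTo: null, agent: null, type: "coming", sum: 999 },
];

// Hypothetical helper mirroring $match ($or over three fields) and
// $group with a conditional $sum.
function balanceFor(id, docs) {
  return docs
    .filter((d) => [d.targetFrom, d.targetTo, d.agent].includes(id)) // $match
    .reduce(
      (total, d) => total + (d.type === "out" ? -d.sum : d.sum),     // $cond inside $sum
      0
    );
}

const balance = balanceFor(SEARCH_ID, records);
console.log(balance); // 2000
```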

MongoDB : Sort documents based on an object within an array

I have a collection which has a sample object as follows :
{
name : 'Sachin Tendulkar',
followers : {
location : {
countries : [{
name : 'India',
count : 12345
}, {
name : 'Pakistan',
count : 12345
},{
name : 'Australia',
count : 12345
}],
cities : [{
name : 'Mumbai',
count : 12345
},{
name : 'Karachi',
count : 12345
},{
name : 'Melborne',
count : 12345
}],
states : [{
name : 'Maharastra',
count : 12345
},{
name : 'Balochistan',
count : 12345
},{
name : 'Sydney',
count : 12345
}]
}
}
}
I wish to sort all the documents based on a city's count. For example, sort all documents according to the count of a specific city, such as Mumbai.
Can you help me build a query that sorts the documents this way?
I think this will be the query:
db.collection_name.aggregate([
{$unwind:"$followers.location.cities"},
{$match:{"followers.location.cities.name":"Mumbai"}},
{$sort:{"followers.location.cities.count":1}}
])
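As a rough illustration of what the $unwind, $match, $sort sequence does, here is a plain JavaScript sketch over in-memory documents. The `people` array and the helper name `sortByCityCount` are assumptions for the example. Note that, like the pipeline, it drops any document that has no entry for the chosen city.

```javascript
// Hypothetical sample documents with the same nesting as in the question.
const people = [
  { name: "A", followers: { location: { cities: [
    { name: "Mumbai", count: 300 }, { name: "Karachi", count: 5 } ] } } },
  { name: "B", followers: { location: { cities: [
    { name: "Mumbai", count: 100 } ] } } },
  { name: "C", followers: { location: { cities: [
    { name: "Karachi", count: 900 } ] } } },
];

// Hypothetical helper mirroring $unwind -> $match -> $sort (ascending).
function sortByCityCount(docs, city) {
  return docs
    .flatMap((doc) =>
      doc.followers.location.cities            // $unwind: one entry per city
        .filter((c) => c.name === city)        // $match on the city name
        .map((c) => ({ name: doc.name, city: c }))
    )
    .sort((a, b) => a.city.count - b.city.count); // $sort by count, ascending
}

const sorted = sortByCityCount(people, "Mumbai");
console.log(sorted.map((d) => d.name)); // [ 'B', 'A' ]
```

To sort from the highest count to the lowest instead, the pipeline would use -1 in the $sort stage.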

Find oldest/youngest post in mongodb collection

I have a mongodb collection with many fields. One field is 'date_time', which is in an ISO datetime format, Ex: ISODate("2014-06-11T19:16:46Z"), and another field is 'name'.
Given a name, how do I find out the oldest/youngest post in the collection?
Ex: If there are two posts in the collection 'data' :
[{'name' : 'John', 'date_time' : ISODate("2014-06-11T19:16:46Z")},
{'name' : 'John', 'date_time' : ISODate("2015-06-11T19:16:46Z")}]
Given the name 'John' how do I find out the oldest post in the collection i.e., the one with ISODate("2014-06-11T19:16:46Z")? Similarly for the youngest post.
Oldest:
db.posts.find({ "name" : "John" }).sort({ "date_time" : 1 }).limit(1)
Newest:
db.posts.find({ "name" : "John" }).sort({ "date_time" : -1 }).limit(1)
Index on { "name" : 1, "date_time" : 1 } to make the queries efficient.
You could aggregate it as below:
Create an index on the name and date_time fields, so that the $match and $sort stages can use it:
db.t.createIndex({"name":1,"date_time":1})
$match all the records for the desired name(s).
$sort by date_time in ascending order.
$group by the name field. Use the $first operator to get the first record of the group, which is the oldest, and the $last operator to get the last record, which is the newest.
To get the entire record, use the $$ROOT system variable.
Code:
db.t.aggregate([
{$match:{"name":"John"}},
{$sort:{"date_time":1}},
{$group:{"_id":"$name","oldest":{$first:"$$ROOT"},
"youngest":{$last:"$$ROOT"}}}
])
Output:
{
"_id" : "John",
"oldest" : {
"_id" : ObjectId("54da62dc7f9ac597d99c182d"),
"name" : "John",
"date_time" : ISODate("2014-06-11T19:16:46Z")
},
"youngest" : {
"_id" : ObjectId("54da62dc7f9ac597d99c182e"),
"name" : "John",
"date_time" : ISODate("2015-06-11T19:16:46Z")
}
}
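The $match, $sort, $group ($first/$last) steps can be sketched in plain JavaScript to see why the first and last records of the sorted group are the oldest and newest. The `posts` array and the helper name `oldestAndYoungest` are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical in-memory posts; JavaScript Date objects stand in for ISODate.
const posts = [
  { name: "John", date_time: new Date("2015-06-11T19:16:46Z") },
  { name: "John", date_time: new Date("2014-06-11T19:16:46Z") },
  { name: "Jane", date_time: new Date("2013-01-01T00:00:00Z") },
];

// Hypothetical helper mirroring $match -> $sort -> $group with $first/$last.
function oldestAndYoungest(docs, name) {
  const sorted = docs
    .filter((d) => d.name === name)               // $match
    .sort((a, b) => a.date_time - b.date_time);   // $sort by date_time, ascending
  if (sorted.length === 0) return null;           // no posts for this name
  return {
    _id: name,
    oldest: sorted[0],                            // $first: "$$ROOT"
    youngest: sorted[sorted.length - 1],          // $last: "$$ROOT"
  };
}

const john = oldestAndYoungest(posts, "John");
console.log(john.oldest.date_time.toISOString());
console.log(john.youngest.date_time.toISOString());
```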
db.t.find().sort({ "date_time" : 1 }).limit(1).pretty()

How would this query in mongodb

example document
{
"_id" : ObjectId("5338796453370917f05bb064"),
"Sigla" : "CE",
"Regiao" : "Nordeste",
"Codigo" : 2306009,
"Municipio" : "Iracema",
"1991" : 52.40499877929688,
"2000" : 108.7089996337891,
"IDHEducacao" : {
"1991" : 0.516,
"2000" : 0.735
}
}
{
"_id" : ObjectId("5338796453370917f05bb065"),
"Sigla" : "CE",
"Regiao" : "Nordeste",
"Codigo" : 2306108,
"Municipio" : "Irauçuba",
"1991" : 47.72299957275391,
"2000" : 62.65800094604492,
"IDHEducacao" : {
"1991" : 0.491,
"2000" : 0.692
}
}
I made the following query in MongoDB:
{"$group":
{
"_id":{"Regiao":"$Regiao"},
"IDHEducao_max_2000" : {"$max" : "$2000"},
}
}
I want to show each region, the largest value of the 2000 field, and the municipality that owns that value, but I'm not getting it.
Looks like 2000 is the name of one of the fields in your document, which I find strange.
The SQL below:
SELECT Regiao, MAX( 2000 ) AS Indice FROM table1 GROUP BY Regiao
can be written in MongoDB as
db.table1.aggregate([
{"$group": {
"_id":{"Regiao":"$Regiao"},
"IDHEducao_max_2000" : {"$max" : "$2000"}}
},
{"$project": {"_id":0, "Regiao":"$_id.Regiao", "Indice":"$IDHEducao_max_2000"}}
])
But this SQL:
SELECT Sigla, Regiao, Municipio, MAX( 2000 ) AS Indice FROM table1 GROUP BY Regiao
is NOT valid. When you use GROUP BY, you can only select fields used in the GROUP BY or aggregated values of other fields (i.e., SUM(), COUNT(), etc..). However, if all you need is some value for the other fields, you could use the $first or $last operators. Note that these operators are usually used only after a sort phase to get min/max:
db.table1.aggregate([
{"$group": {
"_id":{"Regiao":"$Regiao"},
"Sigla" : {"$first" : "$Sigla"},
"Municipio" : {"$first" : "$Municipio"},
"IDHEducao_max_2000" : {"$max" : "$2000"}}
},
{"$project": {"_id":0, "Sigla":1, "Regiao":"$_id.Regiao", "Municipio":1,
"Indice":"$IDHEducao_max_2000"}}
])
EDIT: OP was updated with the question below:
I want to show the region, the largest index of the field in 2000, and what is the municipality that owns this index.
If you use the $sort phase of the aggregation pipeline followed by $group phase and make use of the $first operator, you can get the results you want:
db.table1.aggregate([
// Sort by (Region ASC, Index DESC)
{$sort:{"Regiao":1, "2000":-1}},
// Group by Region and get the max Index and corresponding Municipality
{$group:{
_id:"$Regiao",
Index:{$first:"$2000"},
Municipio:{$first:"$Municipio"}}
}
])
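The sort-then-take-first idea can be illustrated in plain JavaScript. This is only a sketch: the two "Nordeste" rows reuse values from the question, the "Sul" row is invented for the example, and the helper name `maxPerRegion` is an assumption.

```javascript
// Two rows from the question plus one hypothetical row for a second region.
const municipalities = [
  { Regiao: "Nordeste", Municipio: "Irauçuba", "2000": 62.65800094604492 },
  { Regiao: "Nordeste", Municipio: "Iracema", "2000": 108.7089996337891 },
  { Regiao: "Sul", Municipio: "Exemplo", "2000": 75.5 }, // hypothetical
];

// Hypothetical helper mirroring $sort (Regiao ASC, "2000" DESC) followed by
// $group with $first: after sorting, the first row seen per region holds the max.
function maxPerRegion(docs) {
  const sorted = [...docs].sort(
    (a, b) => a.Regiao.localeCompare(b.Regiao) || b["2000"] - a["2000"]
  );
  const best = new Map();
  for (const d of sorted) {
    if (!best.has(d.Regiao)) {
      best.set(d.Regiao, { _id: d.Regiao, Index: d["2000"], Municipio: d.Municipio });
    }
  }
  return [...best.values()];
}

const perRegion = maxPerRegion(municipalities);
console.log(perRegion);
```

Because both values come from the same sorted row, the municipality always belongs to the maximum, which is what a separate $max plus an unsorted $first cannot guarantee.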

Pivot rows to columns in MongoDB

The relevant question is Efficiently convert rows to columns in sql server. But the answer is specific to SQL.
I want the same result i.e. pivot row to column without aggregating anything (as of now) in MongoDB.
The collection looks something as below. These are statistics of facebook page properties:
timestamp | propName | propValue
--------------------------------
1371798000000 | page_fans | 100
--------------------------------
1371798000000 | page_posts | 50
--------------------------------
1371798000000 | page_stories | 25
--------------------------------
I need answer like:
timestamp | page_fans | page_posts | page_stories
--------------------------------
1371798000000 | 100 | 50 | 25
--------------------------------
The column names are pre-determined. They don't have to be generated dynamically. But question is how to achieve this in MongoDB.
I believe aggregation is of no use for this purpose. Do I need to use MapReduce? But in that case I have nothing to reduce I guess? Well another option could be fetching these values in code and do the manipulation in programming language e.g. Java
Any insights would be helpful. Thanks in advance :)!!!
EDIT (Based on input from Schaliasos):
Input JSON:
{
"_id" : ObjectId("51cd366644aeac654ecf8f75"),
"name" : "page_storytellers",
"pageId" : "512f993a44ae78b14a9adb85",
"timestamp" : NumberLong("1371798000000"),
"value" : NumberLong(30871),
"provider" : "Facebook"
}
{
"_id" : ObjectId("51cd366644aeac654ecf8f76"),
"name" : "page_fans",
"pageId" : "512f993a44ae78b14a9adb85",
"timestamp" : NumberLong("1371798000000"),
"value" : NumberLong(1291509),
"provider" : "Facebook"
}
{
"_id" : ObjectId("51cd366644aeac654ecf8f77"),
"name" : "page_fan_adds",
"pageId" : "512f993a44ae78b14a9adb85",
"timestamp" : NumberLong("1371798000000"),
"value" : NumberLong(2829),
"provider" : "Facebook"
}
Expected Output JSON:
{
"timestamp" : NumberLong("1371798000000"),
"provider" : "Facebook",
"page_storytellers" : NumberLong(30871),
"page_fans" : NumberLong(1291509),
"page_fan_adds" : NumberLong(2829)
}
You can now use the aggregation operator $arrayToObject to pivot MongoDB keys. This operator is available in MongoDB v3.4.4+.
For example, given the following sample data:
db.foo.insert({ provider: "Facebook", timestamp: '1371798000000', name: 'page_storytellers', value: 20871})
db.foo.insert({ provider: "Facebook", timestamp: '1371798000000', name: 'page_fans', value: 1291509})
db.foo.insert({ provider: "Facebook", timestamp: '1371798000000', name: 'page_fan_adds', value: 2829})
db.foo.insert({ provider: "Google", timestamp: '1371798000000', name: 'page_fan_adds', value: 1000})
You can use the aggregation pipeline below:
db.foo.aggregate([
{$group:
{_id:{provider:"$provider", timestamp:"$timestamp"},
items:{$addToSet:{name:"$name",value:"$value"}}}
},
{$project:
{tmp:{$arrayToObject:
{$zip:{inputs:["$items.name", "$items.value"]}}}}
},
{$addFields:
{"tmp.provider":"$_id.provider",
"tmp.timestamp":"$_id.timestamp"}
},
{$replaceRoot:{newRoot:"$tmp"}
}
]);
The output would be:
{
"page_fan_adds": 1000,
"provider": "Google",
"timestamp": "1371798000000"
},
{
"page_fan_adds": 2829,
"page_fans": 1291509,
"page_storytellers": 20871,
"provider": "Facebook",
"timestamp": "1371798000000"
}
See also $group, $project, $addFields, $zip, and $replaceRoot.
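For intuition, the pivot performed by $group, $zip and $arrayToObject amounts to turning each (name, value) pair into a key on one object per (provider, timestamp). A minimal plain JavaScript sketch, reusing the sample data from this answer; the `stats` array and the helper name `pivot` are illustrative assumptions:

```javascript
// Same sample data as the inserts in this answer.
const stats = [
  { provider: "Facebook", timestamp: "1371798000000", name: "page_storytellers", value: 20871 },
  { provider: "Facebook", timestamp: "1371798000000", name: "page_fans", value: 1291509 },
  { provider: "Facebook", timestamp: "1371798000000", name: "page_fan_adds", value: 2829 },
  { provider: "Google", timestamp: "1371798000000", name: "page_fan_adds", value: 1000 },
];

// Hypothetical helper: one output object per (provider, timestamp), with each
// row's name turned into a key, which is what $arrayToObject achieves.
function pivot(docs) {
  const groups = new Map();
  for (const { provider, timestamp, name, value } of docs) {
    const key = `${provider}|${timestamp}`;        // $group _id
    if (!groups.has(key)) groups.set(key, { provider, timestamp });
    groups.get(key)[name] = value;                 // name becomes a column
  }
  return [...groups.values()];
}

const pivoted = pivot(stats);
console.log(pivoted);
```

Unlike the pipeline, which returns groups in no guaranteed order, this sketch returns them in first-seen order.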
I have done something like this using aggregation. Could this help?
db.foo.insert({ timestamp: '1371798000000', propName: 'page_fans', propValue: 100})
db.foo.insert({ timestamp: '1371798000000', propName: 'page_posts', propValue: 50})
db.foo.insert({ timestamp: '1371798000000', propName: 'page_stories', propValue: 25})
db.foo.aggregate({ $group: { _id: '$timestamp', result: { $push: { 'propName': '$propName', 'propValue': '$propValue' } }}})
{
"result" : [
{
"_id" : "1371798000000",
"result" : [
{
"propName" : "page_fans",
"propValue" : 100
},
{
"propName" : "page_posts",
"propValue" : 50
},
{
"propName" : "page_stories",
"propValue" : 25
}
]
}
],
"ok" : 1
}
You may want to use the $sum operator along the way.