example document
{
"_id" : ObjectId("5338796453370917f05bb064"),
"Sigla" : "CE",
"Regiao" : "Nordeste",
"Codigo" : 2306009,
"Municipio" : "Iracema",
"1991" : 52.40499877929688,
"2000" : 108.7089996337891,
"IDHEducacao" : {
"1991" : 0.516,
"2000" : 0.735
}
}
{
"_id" : ObjectId("5338796453370917f05bb065"),
"Sigla" : "CE",
"Regiao" : "Nordeste",
"Codigo" : 2306108,
"Municipio" : "Irauçuba",
"1991" : 47.72299957275391,
"2000" : 62.65800094604492,
"IDHEducacao" : {
"1991" : 0.491,
"2000" : 0.692
}
}
---> Mongodb
I made the following query
{"$group":
{
"_id":{"Regiao":"$Regiao"},
"IDHEducao_max_2000" : {"$max" : "$2000"},
}
}
I want to show the region, the largest index of the field in 2000, and the municipality that owns this index, but I'm not getting that result.
Looks like 2000 is the name of one of the fields in your document, which I find strange.
The SQL below:
SELECT Regiao, MAX( 2000 ) AS Indice FROM table1 GROUP BY Regiao
can be written in MongoDB as
db.table1.aggregate([
{"$group": {
"_id":{"Regiao":"$Regiao"},
"IDHEducao_max_2000" : {"$max" : "$2000"}}
},
{"$project": {"_id":0, "Regiao":"$_id.Regiao", "Indice":"$IDHEducao_max_2000"}}
])
But this SQL:
SELECT Sigla, Regiao, Municipio, MAX( 2000 ) AS Indice FROM table1 GROUP BY Regiao
is NOT valid in standard SQL. When you use GROUP BY, you can only select fields that appear in the GROUP BY or aggregated values of other fields (e.g., SUM(), COUNT()). However, if all you need is some value for the other fields, you can use the $first or $last operators. Note that these operators only give you the min/max row when they follow a sort phase; without one, the document order is unspecified:
db.table1.aggregate([
{"$group": {
"_id":{"Regiao":"$Regiao"},
"Sigla" : {"$first" : "$Sigla"},
"Municipio" : {"$first" : "$Municipio"},
"IDHEducao_max_2000" : {"$max" : "$2000"}}
},
{"$project": {"_id":0, "Sigla":1, "Regiao":"$_id.Regiao", "Municipio":1,
"Indice":"$IDHEducao_max_2000"}}
])
EDIT: OP was updated with the question below:
I want to show the region, the largest index of the field in 2000, and what is the municipality that owns this index.
If you use the $sort phase of the aggregation pipeline followed by $group phase and make use of the $first operator, you can get the results you want:
db.table1.aggregate([
// Sort by (Region ASC, Index DESC)
{$sort:{"Regiao":1, "2000":-1}},
// Group by Region and get the max Index and corresponding Municipality
{$group:{
_id:"$Regiao",
Index:{$first:"$2000"},
Municipio:{$first:"$Municipio"}}
}
])
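To see why sorting first makes $first return the row that holds the maximum, here is a minimal plain JavaScript sketch that mimics the two stages in memory. It does not touch MongoDB; the function name maxPerRegion is hypothetical, and the documents are trimmed copies of the two samples from the question.

```javascript
// Trimmed copies of the two sample documents (both are in region "Nordeste").
const docs = [
  { Regiao: "Nordeste", Municipio: "Irauçuba", "2000": 62.65800094604492 },
  { Regiao: "Nordeste", Municipio: "Iracema", "2000": 108.7089996337891 },
];

// Hypothetical helper that imitates {$sort} followed by {$group} with $first.
function maxPerRegion(rows) {
  // $sort: Regiao ascending, then the "2000" field descending
  const sorted = [...rows].sort(
    (a, b) => a.Regiao.localeCompare(b.Regiao) || b["2000"] - a["2000"]
  );
  // $group with $first: keep only the first row seen for each region
  const firstByRegion = new Map();
  for (const row of sorted) {
    if (!firstByRegion.has(row.Regiao)) {
      firstByRegion.set(row.Regiao, {
        _id: row.Regiao,
        Index: row["2000"],
        Municipio: row.Municipio,
      });
    }
  }
  return [...firstByRegion.values()];
}

const result = maxPerRegion(docs);
console.log(result);
```

Because the rows are already ordered by index within each region, the first row of each group carries both the maximum and the municipality it belongs to.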
I have many records in my mongodb collection and I need to recount some information.
records format:
{
"_id" : someId,
"targetFrom" : ObjectId("603e0355e805140334e79438"),<-- this is ID for search
"targetTo" : null,
"operationPaid" : true,
"type" : "coming", <--- type
"moneyAccount" : someId,
"agent" : null,
"sum" : 5000, <--- sum
}
{
"_id" : someId,
"targetFrom" : null,
"targetTo" : null,
"operationPaid" : true,
"type" : "out", <--- type
"moneyAccount" : someId,
"agent" : ObjectId("603e0355e805140334e79438"),<-- this is ID for search
"sum" : 3000, <--- sum
}
So I need to group records by type and get the sum for the id ObjectId("603e0355e805140334e79438"), but the id to search for can be in the targetFrom, targetTo, or agent field.
For this example I need to get the result 2000:
the sum 5000 is "coming" and the sum 3000 is "out".
Query
match the id in any of the 3 possible fields
group by null (the whole collection as one group); if type="out", subtract the sum field, otherwise add it
aggregate(
[{"$match":
{"$expr":
{"$or":
[{"$eq":["$targetFrom", ObjectId("603e0355e805140334e79438")]},
{"$eq":["$targetTo", ObjectId("603e0355e805140334e79438")]},
{"$eq":["$agent", ObjectId("603e0355e805140334e79438")]}]}}},
{"$group":
{"_id":null,
"sum":
{"$sum":
{"$cond":
[{"$eq":["$type", "out"]}, {"$subtract":[0, "$sum"]}, "$sum"]}}}}])
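The arithmetic behind the $cond can be checked with a small plain JavaScript sketch. It runs without MongoDB; the ObjectId is represented as a plain string, balanceFor is a hypothetical name, and the third record is an invented non-matching entry added only to show the filter at work.

```javascript
const SEARCH_ID = "603e0355e805140334e79438"; // the ObjectId, shown as a string here

const records = [
  { targetFrom: SEARCH_ID, targetTo: null, agent: null, type: "coming", sum: 5000 },
  { targetFrom: null, targetTo: null, agent: SEARCH_ID, type: "out", sum: 3000 },
  // Hypothetical record that belongs to another id and must be ignored
  { targetFrom: "someOtherId", targetTo: null, agent: null, type: "coming", sum: 700 },
];

// Hypothetical helper mirroring the $match ($or over three fields)
// and the $group ($sum of a $cond that negates "out" records).
function balanceFor(id, rows) {
  return rows
    .filter((r) => r.targetFrom === id || r.targetTo === id || r.agent === id)
    .reduce((total, r) => total + (r.type === "out" ? -r.sum : r.sum), 0);
}

const balance = balanceFor(SEARCH_ID, records);
console.log(balance); // 2000
```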
Here is my mongo document:
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e"
}
]
}
I want to update (upsert) invoice_id into the last element of the sub-array.
I have tried:
sort: {$natural: -1},
subscription.$.invoice
What I want it to be is:
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8f"
}
]
}
While there are ways to get the last array element, like Saravana shows in her answer, I don't recommend doing it that way because it introduces race conditions. For example, if two subs are added simultaneously, you can't depend on which one is 'last' in the array.
If an invoice_id has to be tied to a specific sub_id, then it's far better to query and find that specific element in the array, then add the invoice_id to it.
In the comments, the OP indicated that the current order of operations is 1) add sub_id, 2) insert the invoice record into the INVOICE collection and get the invoice_id, 3) add the invoice_id into the new subscription.
However, if you already have the sub_id, then it's better to re-order your operations this way: 1) insert the invoice record and get the invoice_id 2) add both sub_id and invoice_id with a single operation.
Doing this improves performance (eliminates the second update operation), but more importantly, eliminates race conditions because you're adding both sub_id and invoice_id at the same time.
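As a rough sketch of both suggestions, the two helpers below only build the filter and update documents; they do not talk to a database. The helper names are hypothetical, ids are shown as plain strings (in the shell the _id would be wrapped in ObjectId), and the $elemMatch condition on a missing invoice_id is an assumption about how to pick the right element when the same sub_id appears more than once, as it does in the sample document.

```javascript
// Re-ordered flow: add sub_id and invoice_id together in one $push.
function buildAddSubscription(docId, subId, invoiceId) {
  return {
    filter: { _id: docId },
    update: { $push: { subscriptions: { sub_id: subId, invoice_id: invoiceId } } },
  };
}

// Existing flow: target one specific subscription that has no invoice yet.
// The positional $ refers to the array element matched by $elemMatch.
function buildAttachInvoice(docId, subId, invoiceId) {
  return {
    filter: {
      _id: docId,
      subscriptions: {
        $elemMatch: { sub_id: subId, invoice_id: { $exists: false } },
      },
    },
    update: { $set: { "subscriptions.$.invoice_id": invoiceId } },
  };
}

const op = buildAttachInvoice(
  "5a69d0acb76d1c2e08e4ccd8",
  "5a56fd399dd78e33948c9b8e",
  "5a56fd399dd78e33948c9b8f"
);
console.log(JSON.stringify(op, null, 2));
```

In the shell, the result could then be passed along as db.sub.updateOne(op.filter, op.update), assuming the collection is named sub as in the other answer.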
Alternatively, we can fetch the document and update the last element by index:
> var doc = db.sub.findOne({"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8")})
> if ( doc.subscriptions.length - 1 >= 0 )
doc.subscriptions[doc.subscriptions.length-1].invoice_id="5a56fd399dd78e33948c9b8f"
> db.sub.update({_id:doc._id},doc)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
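or write an aggregation pipeline to form the document and use it for the update. Note that, as written, the $cond sets invoice_id only on the last element and removes it ($$REMOVE) from every other element, as the result doc shows; to keep existing values, use "$subscriptions.invoice_id" in the else branch instead: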
db.sub.aggregate(
[
{$match : { "_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8") }},
{$addFields : { last : { $subtract : [{$size : "$subscriptions"},1]}}},
{$unwind : { path :"$subscriptions" , includeArrayIndex : "idx"}},
{$project : { "subscriptions.sub_id" : 1,
"subscriptions.invoice_id" : {
$cond : {
if: { $eq: [ "$idx", "$last" ] },
then: "5a56fd399dd78e33948c9b8f",
else: "$$REMOVE"
}
}
}
},
{$group : {_id : "$_id", subscriptions : {$push : "$subscriptions"}}}
]
).pretty()
result doc
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8f"
}
]
}
I am caching data from an online resource for future use in machine learning. This data is canonical and has no missing entries.
In the event that the real-time connection is dropped or the machine rebooted, I have a safeguard in place that does a historical search for a range of ids that are missing from the cache.
What I have yet to implement, however, is a mechanism for searching through the collection and identifying ranges where id values have been skipped.
For instance:
{"entry_id": 27497713, ...}
{"entry_id": 27497761, ...}
This data has a clear gap where entries are missing between 27497713 and 27497761.
Is there a way I can find such a gap using queries? Perhaps at least narrowing it down by selecting values between two ranges and checking the count of returned entries? Given how many entries the collection contains, I am trying to avoid lots of queries for efficiency.
You can try this aggregation:
$group - get $min and $max
$addFields - generate $range by $min and $max entry_id
$lookup - self lookup with generated range ids and entry ids
$project - get only non matching range ids using setDifference
pipeline
db.entries.aggregate(
[
{$group : {_id : null, min : {$min : "$entry_id"}, max : {$max : "$entry_id"}}},
{$addFields : {rangeIds : {$range : ["$min", "$max"]}}},
{$lookup : {from : "entries", localField : "rangeIds", foreignField : "entry_id", as : "entries"}},
{$project : {_id :0, missingIds : {$setDifference : ["$rangeIds", "$entries.entry_id"]}}}
]
)
collection
> db.entries.find()
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad22"), "entry_id" : 27497713 }
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad23"), "entry_id" : 27497761 }
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad24"), "entry_id" : 27497750 }
>
aggregate result
> db.entries.aggregate( [ {$group : {_id : null, min : {$min : "$entry_id"}, max : {$max : "$entry_id"}}}, {$addFields : {rangeIds : {$range : ["$min", "$max"]}}}, {$lookup : {from : "entries", localField : "rangeIds", foreignField : "entry_id", as : "entries"}}, {$project : {_id :0, missingIds : {$setDifference : ["$rangeIds", "$entries.entry_id"]}}} ] )
{ "missingIds" : [ 27497714, 27497715, 27497716, 27497717, 27497718, 27497719, 27497720, 27497721, 27497722, 27497723, 27497724, 27497725, 27497726, 27497727, 27497728, 27497729, 27497730, 27497731, 27497732, 27497733, 27497734, 27497735, 27497736, 27497737, 27497738, 27497739, 27497740, 27497741, 27497742, 27497743, 27497744, 27497745, 27497746, 27497747, 27497748, 27497749, 27497751, 27497752, 27497753, 27497754, 27497755, 27497756, 27497757, 27497758, 27497759, 27497760 ] }
>
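The pipeline above lists every missing id individually. If a summary of the skipped ranges is enough, one alternative is to compare neighbouring ids after sorting. The sketch below is plain JavaScript working on an in-memory array; findGaps is a hypothetical name, and it assumes the entry_id values have already been read from the collection (for example with a projection on entry_id only).

```javascript
// Hypothetical helper: returns the inclusive ranges of ids that are absent.
function findGaps(ids) {
  const sorted = [...ids].sort((a, b) => a - b);
  const gaps = [];
  for (let i = 1; i < sorted.length; i++) {
    // A jump of more than 1 between neighbours means ids were skipped
    if (sorted[i] - sorted[i - 1] > 1) {
      gaps.push({ from: sorted[i - 1] + 1, to: sorted[i] - 1 });
    }
  }
  return gaps;
}

// Same three entry_id values as in the collection shown above
const gaps = findGaps([27497713, 27497761, 27497750]);
console.log(gaps);
// [ { from: 27497714, to: 27497749 }, { from: 27497751, to: 27497760 } ]
```

Each range can then be handed to the historical search directly, which avoids building a very large array of individual ids.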
I'm new to the MongoDB world. I'm trying to figure out how to count the number of children organizations assigned to a parent organization. I have documents that have this general structure:
{
"_id" : "001",
"parentOrganization" : {
"organizationId" : "pOrg1"
},
"childOrganization" : {
"organizationId" : "cOrg1"
}
},
{
"_id" : "002",
"parentOrganization" : {
"organizationId" : "pOrg1"
},
"childOrganization" : {
"organizationId" : "cOrg2"
}
},
{
"_id" : "003",
"parentOrganization" : {
"organizationId" : "pOrg2"
},
"childOrganization" : {
"organizationId" : "cOrg3"
}
}
Each document has a parentOrganization with an associated childOrganization. There may be multiple documents with the same parentOrganization, but different childOrganizations. There may also be multiple documents with the same parent/child relationship. Additionally, there may even be a case where a child org may associate with multiple parent orgs.
I'm trying to group by parentOrganization and then count the number of unique childOrganization's associated with each parentOrganization, as well as display the unique id's.
I have tried using an aggregation framework with $match and $group, but I'm still not getting into the child organization parts to count them. Here is what I'm currently attempting:
var s1 = {$match: {"parentOrganization.organizationId": {$exists: true}}};
var s2 = {$group: {_id: "$parentOrganization.organizationId", count: {$sum: "$childOrganization.organizationId"}}};
db.collection.aggregate(s1, s2);
My results are returning the parentOrganization, but my $sum is not returning the number of associated childOrganizations:
/* 1 */
{
"_id" : "pOrg1",
"count" : 0
}
/* 2 */
{
"_id" : "pOrg2",
"count" : 0
}
I get the feeling it is a bit more complicated than my limited knowledge has access to at this time. What details am I missing in this query?
Your $sum is referencing the childOrganization.organizationId value, which is a string. $sum ignores non-numeric values, so when every value is a string it returns 0.
I was unsure of exactly what you were asking for, but I believe that these aggregations can help you on your way.
This will return a count of documents grouped by the parentOrganization.organizationId:
db.collection.aggregate({$group: {"_id":"$parentOrganization.organizationId", "count": {"$sum": 1}}})
Output:
{ "_id" : "pOrg2", "count" : 1 }
{ "_id" : "pOrg1", "count" : 2 }
This will return a count of unique parent/child organizations:
db.collection.aggregate(
{$group: {"_id": {"parentOrganization": "$parentOrganization.organizationId", "childOrganization": "$childOrganization.organizationId"}, "count":{$sum:1}}})
Output:
{ "_id" : { "parentOrganization" : "pOrg2", "childOrganization" : "cOrg3" }, "count" : 1 }
{ "_id" : { "parentOrganization" : "pOrg1", "childOrganization" : "cOrg2" }, "count" : 1 }
{ "_id" : { "parentOrganization" : "pOrg1", "childOrganization" : "cOrg1" }, "count" : 1 }
This will return a count of unique child organizations and get the set of unique child organizations as well using $addToSet. One caveat of using $addToSet is that the MongoDB 16MB limit on document size still holds. This means that if your collection is large enough such that the size of the set will make one document greater than 16MB, the command will fail. The first $group will create a set of child organizations grouped by parent organization. The $project is used simply to add the total size of the set to the result.
db.collection.aggregate([
{$group: {"_id" : "$parentOrganization.organizationId", "childOrgs" : { "$addToSet" : "$childOrganization.organizationId"}}},
{$project: {"_id" : "$_id", "uniqueChildOrgsCount": {"$size" : "$childOrgs"}, "uniqueChildOrgs": "$childOrgs"}}])
Output:
{ "_id" : "pOrg2", "uniqueChildOrgsCount" : 1, "uniqueChildOrgs" : [ "cOrg3" ]}
{ "_id" : "pOrg1", "uniqueChildOrgsCount" : 2, "uniqueChildOrgs" : [ "cOrg2", "cOrg1" ]}
During these aggregations, I left out the $match statement you included for simplicity, but you could add that back as well.
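For intuition about what $addToSet followed by $size computes, here is a plain JavaScript sketch using a Set per parent. It runs without MongoDB; uniqueChildren is a hypothetical name, the nested organizationId fields are flattened to parent and child for brevity, and the duplicate pOrg1/cOrg2 row is an invented example of a repeated relationship.

```javascript
const relations = [
  { parent: "pOrg1", child: "cOrg1" },
  { parent: "pOrg1", child: "cOrg2" },
  { parent: "pOrg1", child: "cOrg2" }, // hypothetical duplicate relationship
  { parent: "pOrg2", child: "cOrg3" },
];

// Hypothetical helper: a Set per parent plays the role of $addToSet,
// and Set.size plays the role of $size.
function uniqueChildren(rows) {
  const byParent = new Map();
  for (const { parent, child } of rows) {
    if (!byParent.has(parent)) byParent.set(parent, new Set());
    byParent.get(parent).add(child);
  }
  return [...byParent].map(([parent, children]) => ({
    _id: parent,
    uniqueChildOrgsCount: children.size,
    uniqueChildOrgs: [...children],
  }));
}

const summary = uniqueChildren(relations);
console.log(summary);
```

The duplicate row does not change the count for pOrg1, which is the behaviour the $addToSet pipeline relies on.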
I have a mongo document which has structure like
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-26",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-26T08:08:38.716Z"),
"value" : 98.5
},
{
"dateTime" : ISODate("2014-11-26T08:18:38.716Z"),
"value" : 95.5
},
{
"dateTime" : ISODate("2014-11-26T08:28:38.663Z"),
"value" : 90.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-26T08:08:38.716Z"),
"from" : ISODate("2014-11-26T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.776Z")
}
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-25",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-25T08:08:38.716Z"),
"value" : 198.5
},
{
"dateTime" : ISODate("2014-11-25T08:18:38.716Z"),
"value" : 195.5
},
{
"dateTime" : ISODate("2014-11-25T08:28:38.716Z"),
"value" : 190.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-25T08:08:38.716Z"),
"from" : ISODate("2014-11-25T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.893Z")
}
The query that I want to run on this document structure:
find documents for a particular user id
unwind the embedded array
group the documents by _id, while
summing the items.value of the embedded array
getting the minimum of the items.dateTime of the embedded array
Note: I want the sum and min as an object, i.e. { value : sum, dateTime : min of the items.dateTime }, inside an array of items.
Can this be achieved in a single aggregation call using $push or some other technique?
When you group over a particular _id, and apply aggregation operators such as $min and $sum, there exists only one record per group(_id), that holds the sum and the minimum date for that group. So there is no way to obtain a different sum and a different minimum date for the same _id, which also logically makes no sense.
What you would want to do is:
db.collection.aggregate([
{$match:{"userId":"THIS_IS_A_DHP_USER_ID"}},
{$unwind:"$items"},
{$group:{"_id":"$_id",
"values":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}}
])
But if you do not query for a particular userId, you will have multiple groups, each with its own sum and min date. Then it makes sense to accumulate all these results together in an array using the $push operator.
db.collection.aggregate([
{$unwind:"$items"},
{$group:{"_id":"$_id",
"result":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}},
{$group:{"_id":null,"result":{$push:{"value":"$result",
"dateTime":"$dateTime",
"id":"$_id"}}}},
{$project:{"_id":0,"result":1}}
])
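The shape of that final result can be previewed with a plain JavaScript sketch over the two sample documents. It runs without MongoDB; summarize is a hypothetical name, and because each _id is unique, unwinding and regrouping by _id is imitated here by reducing each document's items array directly.

```javascript
const weightDocs = [
  {
    _id: "THIS_IS_A_DHP_USER_ID+2014-11-26",
    items: [
      { dateTime: new Date("2014-11-26T08:08:38.716Z"), value: 98.5 },
      { dateTime: new Date("2014-11-26T08:18:38.716Z"), value: 95.5 },
      { dateTime: new Date("2014-11-26T08:28:38.663Z"), value: 90.5 },
    ],
  },
  {
    _id: "THIS_IS_A_DHP_USER_ID+2014-11-25",
    items: [
      { dateTime: new Date("2014-11-25T08:08:38.716Z"), value: 198.5 },
      { dateTime: new Date("2014-11-25T08:18:38.716Z"), value: 195.5 },
      { dateTime: new Date("2014-11-25T08:28:38.716Z"), value: 190.5 },
    ],
  },
];

// Hypothetical helper: one { id, value, dateTime } entry per document,
// collected into a single "result" array like the second $group does.
function summarize(docs) {
  const result = docs.map((doc) => ({
    id: doc._id,
    value: doc.items.reduce((sum, item) => sum + item.value, 0), // $sum
    dateTime: new Date(Math.min(...doc.items.map((i) => i.dateTime.getTime()))), // $min
  }));
  return { result };
}

const summarized = summarize(weightDocs);
console.log(JSON.stringify(summarized, null, 2));
```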
The following aggregation should also work. It matches before unwinding so that only the requested user's documents are unwound:
db.collectionName.aggregate(
{"$match":{"userId":"THIS_IS_A_DHP_USER_ID"}},
{"$unwind":"$items"},
{"$group":{"_id":"$_id","sum":{"$sum":"$items.value"},
"minDate":{"$min":"$items.dateTime"}}}
)