Need help with a MongoDB aggregation query

Below is the sample data:
[{
"outdoor": {
"location": {
"type": "Point",
"coordinates": [
-92.41151,
35.11683
]
}
},
"geoHash": "dr72jwgnbbst", "status":"Active","customerId":"8047380094","locationId":"A0"
},
{
"outdoor": {
"location": {
"type": "Point",
"coordinates": [
-89.58342,
36.859161
]
}
},
"geoHash": "dn6qtkr5xk8m", "status":"Pending","customerId":"8047380094","locationId":"A1"
},
{
"outdoor": {
"location": {
"type": "Point",
"coordinates": [
-86.038762,
36.519016
]
}
},
"geoHash": "dn6zf0h6xtcp", "status":"Active","customerId":"8047380094","locationId":"A2"
},
{
"outdoor": {
"location": {
"type": "Point",
"coordinates": [
-98.3081936,
26.2143207
]
}
},
"geoHash": "9udj4unjmp9f", "status":"Pending","customerId":"8047380094","locationId":"A3"
},
{
"outdoor": {
"location": {
"type": "Point",
"coordinates": [
-98.5377275,
29.4878928
]
}
},
"geoHash": "9v1zv8p52t8u", "status":"Pending","customerId":"8047380094","locationId":"A4"
},
{
"outdoor": {
"location": {
"type": "Point",
"coordinates": [
-73.7018126,
42.641387
]
}
},
"geoHash": "dreddfeup69m", "status":"Pending","customerId":"8047380094","locationId":"A5"
},
{
"outdoor": {
"location": {
"type": "Point",
"coordinates": [
-111.865295,
33.431942
]
}
},
"geoHash": "9tbqnqn5jtwq", "status":"Active","customerId":"8047380094","locationId":"A6"
},
{
"outdoor": {
"location": {
"type": "Point",
"coordinates": [
-79.810763,
34.174603
]
}
},
"geoHash": "dnp4rv796rtz", "status":"Active","customerId":"8047380094","locationId":"A7"
}
]
Currently we are running 2 queries:
Query 1 - This query will give the counts by status grouped by geoHash substring.
db.locations.aggregate([{"$match": {"customerId": "8047380094"}}, {"$project": {"status": 1, "geoHash": {"$substr": ["$geoHash", 0, 2]}}}, {"$group": {"_id": {"geoHash": "$geoHash", "status": "$status"}, "statusCount": {"$sum": 1}}}],
{
"allowDiskUse": true
});
Output 1:
{ "_id" : { "geoHash" : "dr", "status" : "Active" }, "statusCount" : 1 }
{ "_id" : { "geoHash" : "dr", "status" : "Pending" }, "statusCount" : 1 }
{ "_id" : { "geoHash" : "dn", "status" : "Active" }, "statusCount" : 2 }
{ "_id" : { "geoHash" : "dn", "status" : "Pending" }, "statusCount" : 1 }
{ "_id" : { "geoHash" : "9u", "status" : "Pending" }, "statusCount" : 1 }
{ "_id" : { "geoHash" : "9v", "status" : "Pending" }, "statusCount" : 1 }
{ "_id" : { "geoHash" : "9t", "status" : "Active" }, "statusCount" : 1 }
Query 2 - We have a query to get the first location coordinates for the geohash group.
db.locations.aggregate([{"$match": {"customerId": "8047380094"}}, {"$project": {"geoHash": {"$substr": ["$geoHash", 0, 2]}, "locations": "$outdoor.location.coordinates"}}, {"$group": {"_id": "$geoHash", "locations": {"$push": "$locations"}}}, {"$project": {"_id": 1, "locations": {"$arrayElemAt": ["$locations", 0]}}}],
{
"allowDiskUse": true
});
Output 2:
{ "_id" : "dr", "locations" : [ -92.41151, 35.11683 ] }
{ "_id" : "dn", "locations" : [ -89.58342, 36.859161 ] }
{ "_id" : "9u", "locations" : [ -98.3081936, 26.2143207 ] }
{ "_id" : "9v", "locations" : [ -98.5377275, 29.4878928 ] }
{ "_id" : "9t", "locations" : [ -111.865295, 33.431942 ] }
Question 1:
Is there any way we can combine both queries into one and get both outputs from a single query?
Question 2:
If the total count for a geoHash group is 1 (across all statuses, not per status), we also need to return the locationId. How can we achieve this in the same query?
In the above case,
for "9u", we need to return A3
for "9v", we need to return A4
for "9t", we need to return A6
Note: We are using a Spring Boot application with Spring Mongo.

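I didn't fully understand what you are trying to achieve, because the question doesn't say, but I can answer both of your questions.
You can use $facet to run multiple aggregation pipelines within a single stage, so both queries can go into one aggregate call.
After that first $facet stage, you can add a further stage to pick up your locationId.
Here is the code for that additional stage (firstQuery and secondQuery are the names of the two $facet outputs):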
{
$project: {
firstQuery: 1,
secondQuery: {
$map: {
input: "$secondQuery",
as: "s",
in: {
$mergeObjects: [
"$$s",
{
$arrayElemAt: [
{
$map: {
input: {
$filter: {
input: "$firstQuery",
as: "f",
cond: { $eq: ["$$f._id.geoHash", "$$s._id" ] }
}
},
in: { locationId: "$$this.locationId" }
}
},
0
]
}
]
}
}
}
}
}
Working Mongo playground
Note: Make sure the condition { $eq: ["$$f._id.geoHash", "$$s._id" ] } matches exactly one object; otherwise a different approach is needed. Your two queries also look very similar to me, so there may be a simpler solution, but I answered based on the question as posted.
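For reference, here is a minimal sketch of how the $facet stage itself might look. It is not the playground code from this answer (which isn't reproduced here); it only reuses the facet names firstQuery and secondQuery from the $project above. The prefix field and the total counter are names I made up for illustration. In this variant secondQuery also counts the documents per geoHash prefix, so locationId can be kept only when that count is 1. It assumes MongoDB 3.6 or later for $$REMOVE.

```javascript
// Sketch only: "prefix" and "total" are hypothetical helper names.
const pipeline = [
  { $match: { customerId: "8047380094" } },
  // Compute the 2-character geoHash prefix once, before branching
  { $addFields: { prefix: { $substr: ["$geoHash", 0, 2] } } },
  { $facet: {
      // Query 1: counts per (prefix, status)
      firstQuery: [
        { $group: {
            _id: { geoHash: "$prefix", status: "$status" },
            statusCount: { $sum: 1 }
        } }
      ],
      // Query 2: first coordinates per prefix, plus locationId when total is 1
      secondQuery: [
        { $group: {
            _id: "$prefix",
            // $first depends on document order unless a $sort precedes it
            locations: { $first: "$outdoor.location.coordinates" },
            total: { $sum: 1 },
            locationId: { $first: "$locationId" }
        } },
        { $project: {
            locations: 1,
            // Drop locationId unless the group holds exactly one document
            locationId: { $cond: [{ $eq: ["$total", 1] }, "$locationId", "$$REMOVE"] }
        } }
      ]
  } }
];

// Runs only inside the mongo shell, where `db` and `printjson` exist
if (typeof db !== "undefined") {
  db.locations.aggregate(pipeline, { allowDiskUse: true }).forEach(printjson);
}
```

Keep in mind that $facet returns everything in a single document, so the combined result has to fit within the 16 MB BSON document limit.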

Related

sum array of elements without unwind in group by?

Below is the projected result. I want the sum of the Expenses Amount values where ExpenseType equals "1", grouped by Type and Quarter. How can I achieve this without unwinding the Expenses array?
{
"Type" : "CreditCard",
"Quarter": "20201",
"Expenses" : [
{
"ExpenseType" : "1",
"Amount" : 123
},
{
"ExpenseType" : "2",
"Amount" : 183
}
]
}
{
"Type" : "Cash",
"Quarter": "20202",
"Expenses" : [
{
"ExpenseType" : "1",
"Amount" : 345
},
{
"ExpenseType" : "2",
"Amount" : 200
}
]
}
Expected Output:
{
"Type" : "CreditCard",
"Quarter": "20201",
"Total":"123"
}
{
"Type" : "Cash",
"Quarter": "20202",
"Total":"345"
}
Mechanism
Group by Quarter and Type
Sum values
Pipeline
db.collection.aggregate([
  {
    $group: {
      _id: { Quarter: "$Quarter", Type: "$Type" },
      // $sum (rather than $push) adds up the per-document totals, so Total is a number
      Total: {
        $sum: {
          // Walk the Expenses array without $unwind, adding only ExpenseType "1" amounts
          $reduce: {
            input: "$Expenses",
            initialValue: 0,
            in: {
              $cond: [
                { $eq: ["$$this.ExpenseType", "1"] },
                { $add: ["$$value", "$$this.Amount"] },
                "$$value"
              ]
            }
          }
        }
      }
    }
  }
])
Playground
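If you want to sanity-check the logic outside MongoDB, here is a rough plain JavaScript sketch of the same idea: a conditional reduce over each Expenses array, then a sum per (Type, Quarter) group. The function name totalsByTypeAndQuarter is mine, and the documents are the two samples from the question.

```javascript
// Sample documents copied from the question
const docs = [
  { Type: "CreditCard", Quarter: "20201",
    Expenses: [{ ExpenseType: "1", Amount: 123 }, { ExpenseType: "2", Amount: 183 }] },
  { Type: "Cash", Quarter: "20202",
    Expenses: [{ ExpenseType: "1", Amount: 345 }, { ExpenseType: "2", Amount: 200 }] }
];

// Hypothetical helper: mirrors a $reduce inside a $group accumulator
function totalsByTypeAndQuarter(documents, expenseType) {
  const totals = new Map();
  for (const doc of documents) {
    // Per-document conditional sum (the $reduce + $cond part)
    const docTotal = doc.Expenses.reduce(
      (value, item) => (item.ExpenseType === expenseType ? value + item.Amount : value),
      0
    );
    // Accumulate per group key (the $group part)
    const key = doc.Type + "|" + doc.Quarter;
    totals.set(key, (totals.get(key) || 0) + docTotal);
  }
  return Array.from(totals, ([key, Total]) => {
    const [Type, Quarter] = key.split("|");
    return { Type, Quarter, Total };
  });
}

const totals = totalsByTypeAndQuarter(docs, "1");
console.log(totals);
```

Note that Total comes out as a number here, whereas the expected output in the question shows it as a string.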

MongoDB Aggregation function

I have the following JSON documents in a Mongo collection named "Movies":
{
"_id": "5ed0c9700b9e8b0e2c542054",
"movie_name": "Jake 123",
"score": 20,
"director": "Jake"
},
{
"_id": "5ed0a9840b9e8b0e2c542053",
"movie_name": "Avatar",
"director": "James Cameroon",
"score": 50,
"boxoffice": [
{
"territory": "US",
"gross": 2000
},
{
"territory": "UK",
"gross": 1000
}
]
},
{
"_id": "5ed0a9630b9e8b0e2c542052",
"movie_name": "Titanic",
"score": 100,
"director": "James Cameroon",
"boxoffice": [
{
"territory": "US",
"gross": 1000
},
{
"territory": "UK",
"gross": 500
}
],
"actors": [
"Kate Winselet",
"Leonardo De Caprio",
"Rajinikanth",
"Kamalhaasan"
]
}
I run the query below, which finds the maximum box-office gross across territories for each movie. My intention is to get that maximum gross together with the territory it belongs to.
db.movies.aggregate([
{$match: {"boxoffice" : { $exists: true, $ne : []}}},
{$project: {
"title":"$movie_name", "max_boxoffice": {$max : "$boxoffice.gross"},
"territory" : "$boxoffice.territory" } }
])
I get the result as follows. How do I get the correct territory that corresponds to the collection?
{
"_id" : ObjectId("5ed0a9630b9e8b0e2c542052"),
"title" : "Titanic",
"max_boxoffice" : 1000,
"territory" : [
"US",
"UK"
]
},
{
"_id" : ObjectId("5ed0a9840b9e8b0e2c542053"),
"title" : "Avatar",
"max_boxoffice" : 2000,
"territory" : [
"US",
"UK"
]
}
Expected output:
Avatar and Titanic have both grossed the most in the US, so I want territory to show only that value:
{
"_id" : ObjectId("5ed0a9630b9e8b0e2c542052"),
"title" : "Titanic",
"max_boxoffice" : 1000,
"territory" : "US"
},
{
"_id" : ObjectId("5ed0a9840b9e8b0e2c542053"),
"title" : "Avatar",
"max_boxoffice" : 2000,
"territory" : "US"
}
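For this specific requirement, you can use the $set aggregation stage (added in MongoDB 4.2). $set adds new fields to documents, or overwrites existing ones, and an aggregation can include one or more $set stages. Here the first $set keeps only the boxoffice entries whose gross equals the maximum, and the second takes the first of those entries: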
db.movies.aggregate([
{
$match: { "boxoffice": { $exists: true, $ne: [] } }
},
{
$set: {
boxoffice: {
$filter: {
input: "$boxoffice",
cond: { $eq: ["$$this.gross", { $max: "$boxoffice.gross" }]}
}
}
}
},
{
$set: {
boxoffice: { $arrayElemAt: ["$boxoffice", 0] }
}
},
{
$project: {
"title": "$movie_name",
"max_boxoffice": "$boxoffice.gross",
"territory": "$boxoffice.territory"
}
}
])
Mongo Playground
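To see what the two $set stages do, here is a small plain JavaScript sketch of the same steps: filter the entries down to those with the maximum gross, then take the first one. The helper name topTerritory is made up, and the data is a trimmed copy of the sample documents.

```javascript
// Trimmed copies of the sample documents from the question
const movies = [
  { movie_name: "Jake 123", score: 20 },
  { movie_name: "Avatar",
    boxoffice: [{ territory: "US", gross: 2000 }, { territory: "UK", gross: 1000 }] },
  { movie_name: "Titanic",
    boxoffice: [{ territory: "US", gross: 1000 }, { territory: "UK", gross: 500 }] }
];

// Hypothetical helper mirroring $match -> $filter on $max -> $arrayElemAt 0
function topTerritory(movie) {
  const entries = movie.boxoffice || [];
  if (entries.length === 0) return null; // same effect as the $match stage
  const maxGross = Math.max(...entries.map((e) => e.gross));
  // If several territories tie for the maximum, the first one wins
  const top = entries.filter((e) => e.gross === maxGross)[0];
  return { title: movie.movie_name, max_boxoffice: top.gross, territory: top.territory };
}

const results = movies.map(topTerritory).filter((r) => r !== null);
console.log(results);
```

The tie-breaking comment applies to the pipeline as well: $arrayElemAt with index 0 keeps whichever maximum entry comes first in the array.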

MongoDB iteration on aggregate

I have a collection :
{
"value" : "20",
"type" : "square",
"name" : "form1"
},
{
"value" : "24",
"type" : "circle",
"name" : "form2"
},
{
"value" : "12",
"type" : "square",
"name" : "form3"
}
This aggregation :
let searchTerm = "form2"
db.myCollec.aggregate([
{ "$facet": {
"data": [
{ "$match": { "name": searchTerm }},
{ "$project": { "name": 1, "type": 1, "_id": 0 }}
]
}},
{ "$project": {
"name": {
"$ifNull": [{ "$arrayElemAt": ["$data.name", 0] }, searchTerm ]
},
"type": {
"$ifNull": [{ "$arrayElemAt": ["$data.type", 0] }, null]
}
}}
])
gives this result:
{ "name" : "form2", "type" : "circle" }
and, if I look for a non-existent "form4":
{ "name" : "form4", "type" : null }
Now I want to do this for many values, so I tried putting them in an array and looping over it. Since JavaScript is asynchronous, I wrapped the loop body in a function:
tab = [ "form2", "form4" ]
for( var i =0; i<(tab.length);i++) { (function (i) {
searchTerm = tab[i]
db.myCollec.aggregate([
{ "$facet": {
"data": [
{ "$match": { "name": searchTerm }},
{ "$project": { "name": 1, "type": 1, "_id": 0 }}
]
}},
{ "$project": {
"name": {
"$ifNull": [{ "$arrayElemAt": ["$data.name", 0] }, searchTerm ]
},
"type": {
"$ifNull": [{ "$arrayElemAt": ["$data.type", 0] }, null]
}
}}
])
}) (i) }
There is no result.
If I add a print(searchTerm), the values are printed correctly, but the aggregation still outputs nothing.
Thanks for your help.
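One likely cause, assuming this runs in the mongo shell: the shell only prints a cursor automatically when it is the result of a statement typed at the prompt. Inside a loop or a function, the cursor returned by aggregate() is discarded unless you iterate it yourself. A minimal sketch of that fix follows; buildPipeline is a helper name I made up, and the stages are the ones from the question.

```javascript
// Hypothetical helper: builds the question's pipeline for one search term
function buildPipeline(searchTerm) {
  return [
    { $facet: {
        data: [
          { $match: { name: searchTerm } },
          { $project: { name: 1, type: 1, _id: 0 } }
        ]
    } },
    { $project: {
        // Fall back to the search term / null when nothing matched
        name: { $ifNull: [{ $arrayElemAt: ["$data.name", 0] }, searchTerm] },
        type: { $ifNull: [{ $arrayElemAt: ["$data.type", 0] }, null] }
    } }
  ];
}

const tab = ["form2", "form4"];

// Runs only inside the mongo shell, where `db` and `printjson` exist
if (typeof db !== "undefined") {
  tab.forEach(function (term) {
    // Iterate the cursor explicitly so every result document is printed
    db.myCollec.aggregate(buildPipeline(term)).forEach(printjson);
  });
}
```

Passing the term as a function argument also avoids the shared searchTerm variable, so the extra wrapper function around the loop body is no longer needed.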

How to group longitude and latitude reducing the decimal places of those points?

I have the following aggregate:
db.locations.aggregate(
// Pipeline
[
// Stage 1
{
$geoNear: {
near: { type: "Point", coordinates: [-47.121314, -18.151515 ] },
distanceField: "dist.calculated",
maxDistance: 500,
includeLocs: "dist.location",
num: 50000,
spherical: true
}
},
// Stage 2
{
$group: {
"_id" : {
'loc' : '$loc'
},
qtd: { $sum:1 }
}
},
], );
And the following collection:
{
"_id" : ObjectId(),
"loc" : {
"type" : "Point",
"coordinates" : [
-47.121311,
-18.151512
]
}
},
{
"_id" : ObjectId(),
"loc" : {
"type" : "Point",
"coordinates" : [
-47.121311,
-18.151512
]
}
},
{
"_id" : ObjectId(),
"loc" : {
"type" : "Point",
"coordinates" : [
-47.121312,
-18.151523
]
}
},
{
"_id" : ObjectId(),
"loc" : {
"type" : "Point",
"coordinates" : [
-47.121322,
-18.151533
]
}
}
When I run the aggregate, I have the following result:
{
"_id" : {
"loc" : {
"type" : "Point",
"coordinates" : [
-47.121311,
-18.151512
]
}
},
"qtd" : 2.0
},
{
"_id" : {
"loc" : {
"type" : "Point",
"coordinates" : [
-47.121312,
-18.151523
]
}
},
"qtd" : 1.0
},
{
"_id" : {
"loc" : {
"type" : "Point",
"coordinates" : [
-47.121322,
-18.151533
]
}
},
"qtd" : 1.0
}
I would like to group these locations into a single document, since they are very close to each other.
I thought of reducing the precision of each point, so that -47.121314 becomes something like -47.1213.
The result would look something like this:
{
"_id" : {
"loc" : {
"type" : "Point",
"coordinates" : [
-47.1213,
-18.1515
]
}
},
"qtd" : 4.0
}
But I have no idea how to group these documents.
Is it possible?
The way to reduce the floating point precision is to $multiply the number by the required precision factor, truncate it to an integer, and then $divide it back down to the desired precision.
For recent MongoDB releases (3.2 and later) you can use $trunc:
db.locations.aggregate([
{ "$geoNear": {
"near": {
"type": "Point",
"coordinates": [ -47.121314, -18.151515 ]
},
"distanceField": "qtd",
"maxDistance": 500,
"num": 50000,
"spherical": true
}},
{ "$group": {
"_id": {
"type": '$loc.type',
"coordinates": {
"$map": {
"input": '$loc.coordinates',
"in": {
"$divide": [
{ "$trunc": { "$multiply": [ '$$this', 10000 ] } },
10000
]
}
}
}
},
"qtd": { "$sum": '$qtd' }
}}
]);
For releases prior to that, you can use $mod and $subtract to remove the "remainder" instead:
db.locations.aggregate([
{ "$geoNear": {
"near": {
"type": "Point",
"coordinates": [ -47.121314, -18.151515 ]
},
"distanceField": "qtd",
"maxDistance": 500,
"num": 50000,
"spherical": true
}},
{ "$group": {
"_id": {
"type": '$loc.type',
"coordinates": {
"$map": {
"input": '$loc.coordinates',
"as": "coord",
"in": {
"$divide": [
{ "$subtract": [
{ "$multiply": [ '$$coord', 10000 ] },
{ "$mod": [
{ "$multiply": [ '$$coord', 10000 ] },
1
]}
]},
10000
]
}
}
}
},
"qtd": { "$sum": '$qtd' }
}}
]);
Both return the same result:
/* 1 */
{
"_id" : {
"type" : "Point",
"coordinates" : [
-47.1213,
-18.1515
]
},
"qtd" : 4.01180839007879
}
We use $map here to "reshape" the array contents of "coordinates", applying the "rounding" to each value in the array. Note the slightly different usage with "as" in the second example: the ability to use $$this as a default reference was only added in MongoDB 3.2, and the second listing presumes you do not have that release, since otherwise you would use $trunc instead.
Note that $geoNear, which is essentially a "nearest" search, returns only 100 documents by default, or otherwise up to the number specified in the "num" or "limit" options. That cap always governs the number of results returned when more documents than that would satisfy the other constraints, such as "maxDistance".
There is also no need to follow the documentation so literally: "distanceField" is the only other mandatory parameter, aside from "spherical", which is required when a "2dsphere" index is used. The value of "distanceField" can be whatever you want it to be, and in this case we simply supply the name of the property you want in the output.
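The multiply, truncate, divide arithmetic can also be checked outside MongoDB. The following plain JavaScript sketch applies both variants to the four sample points from the question; the function names are mine, and JavaScript's % operator stands in for $mod here.

```javascript
const FACTOR = 10000; // keeps four decimal places

// Mirrors the $trunc variant: multiply, truncate toward zero, divide
function truncateWithTrunc(x) {
  return Math.trunc(x * FACTOR) / FACTOR;
}

// Mirrors the $mod/$subtract variant: subtract the fractional remainder
// (the JavaScript % operator keeps the sign of the dividend)
function truncateWithMod(x) {
  const scaled = x * FACTOR;
  return (scaled - (scaled % 1)) / FACTOR;
}

// The four coordinate pairs from the question's collection
const points = [
  [-47.121311, -18.151512],
  [-47.121311, -18.151512],
  [-47.121312, -18.151523],
  [-47.121322, -18.151533]
];

// Count points per truncated coordinate pair, like a $group with $sum: 1
const counts = new Map();
for (const point of points) {
  const key = JSON.stringify(point.map(truncateWithTrunc));
  counts.set(key, (counts.get(key) || 0) + 1);
}
console.log(Array.from(counts));
```

All four points end up under the same truncated key, which is the grouping the question asks for. Truncation goes toward zero, so this is not the same as rounding to the nearest value.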

Query an ArrayPosition in MongoDB

I have a collection like this:
> db.nodes.find()
{ "_id" : ObjectId("534d44e182bee8420ace927f"), "id" : "59598841", "created_by" : "JOSM", "geo" : { "type" : "Point", "coordinates" : [ 9.7346094, 52.371738 ] } }
{ "_id" : ObjectId("534d44e182bee8420ace9280"), "id" : "59598842", "created_by" : "JOSM", "geo" : { "type" : "Point", "coordinates" : [ 9.7343616, 52.3718121 ] } }
{ "_id" : ObjectId("534d44e182bee8420ace9281"), "id" : "59598845", "created_by" : "JOSM", "geo" : { "type" : "Point", "coordinates" : [ 9.7331504, 52.372057 ] } }
{ "_id" : ObjectId("534d44e182bee8420ace9282"), "id" : "59835778", "created_by" : "JOSM", "geo" : { "type" : "Point", "coordinates" : [ 9.7354137, 52.3711697 ] } }
{ "_id" : ObjectId("534d44e182bee8420ace9283"), "id" : "60409270", "created_by" : "JOSM", "geo" : { "type" : "Point", "coordinates" : [ 9.7354388, 52.3735999 ] } }
Now I want to query the coordinates array to find the document with the greatest lon value.
How can I do that? I have no idea :(
Bye, Andre
Getting the "lon", which is the first value of the array, may not seem immediately obvious, but it is quite simple with aggregate:
db.nodes.aggregate([
{ "$project": {
"_id": {
"_id": "$_id",
"id": "$id",
"created_by": "$created_by",
"geo": "$geo",
},
"coordinates": "$geo.coordinates"
}},
{ "$unwind": "$coordinates" },
{ "$group": {
"_id": "$_id",
"lon": { "$first": "$coordinates" }
}},
{ "$sort": { "lon": -1 } }, // descending, so $limit keeps the greatest lon
{ "$limit": 1 },
{ "$project": {
"_id": "$_id._id",
"id": "$_id.id",
"created_by": "$_id.created_by",
"geo": "$_id.geo",
}}
])
This gives the whole document with the highest value. Or, if you just want the value:
db.nodes.aggregate([
{ "$unwind": "$geo.coordinates" },
{ "$group": {
"_id": "$_id",
"lon": { "$first": "$geo.coordinates" }
}},
{ "$group": {
"_id": null,
"lon": { "$max": "$lon" }
}}
])
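Try using the aggregation framework:
db.nodes.aggregate([
  // Field paths need a leading "$", and the field is named "coordinates"
  { $unwind: "$geo.coordinates" },
  // After $unwind, the first element per document is the longitude
  { $group: { _id: { id: "$id" }, lon: { $first: "$geo.coordinates" } } },
  { $group: { _id: null, maxval: { $max: "$lon" } } }
])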
For more info on aggregation look here: http://docs.mongodb.org/manual/reference/operator/aggregation/
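On newer servers the $unwind/$group round trip can probably be skipped. The sketch below assumes MongoDB 3.4 or later, which has both $addFields and $arrayElemAt, and reads the longitude straight from index 0 of the coordinates array. The temporary field name lon is my choice.

```javascript
const pipeline = [
  // GeoJSON points store [longitude, latitude], so index 0 is the longitude
  { $addFields: { lon: { $arrayElemAt: ["$geo.coordinates", 0] } } },
  // Descending sort puts the greatest longitude first
  { $sort: { lon: -1 } },
  { $limit: 1 },
  // Drop the temporary field so the original document shape is returned
  { $project: { lon: 0 } }
];

// Runs only inside the mongo shell, where `db` and `printjson` exist
if (typeof db !== "undefined") {
  db.nodes.aggregate(pipeline).forEach(printjson);
}
```

For the sample collection in the question, the greatest first coordinate is 9.7354388, which belongs to the document with "id" : "60409270".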