Object to group id in MongoDB pipelines

I have a MongoDB aggregation pipeline that returns output like this:
[
{
"id": {
"isPaid": false,
"state": "approved",
"updatedAt": "2018-06-27"
},
"state": "approved",
"isPaid": false,
"updatedAt": "2018-06-27",
"totalCount": 1,
"totalValue": 658.4332
},
{
"id": {
"isPaid": false,
"state": "canceled",
"updatedAt": "2018-05-30"
},
"state": "canceled",
"isPaid": false,
"updatedAt": "2018-05-30",
"totalCount": 1,
"totalValue": 1735.7175
}
]
The system that consumes this output needs id to be available as a string.
So I'm wondering whether there is an elegant and generic way, using an aggregation pipeline stage, to concatenate/serialize the object's values into a string, i.e. to convert:
"id": {"isPaid": false, "state": "approved", "updatedAt": "2018-06-27"}
to something like:
"id": "0.approved.2018-06-27"

You can use $concat:
db.t55.aggregate([
{$addFields : {
id : {$concat : [{$cond :["$isPaid", "1","0"]}, ".", "$state", "." ,"$updatedAt"]}
}}
])
Alternatively, use $toString (MongoDB 4.0+) with $indexOfArray, which maps false to "0" and true to "1" (and any other value to "-1"):
db.t55.aggregate([
{$addFields : {
id : {$concat : [{$toString :{$indexOfArray : [[false,true], "$isPaid"]}}, ".", "$state", "." ,"$updatedAt"]}
}}
])
Output:
{ "_id" : ObjectId("5c3e794b21526e3ff4bf4ca2"), "id" : "0.approved.2018-06-27", "state" : "approved", "isPaid" : false, "updatedAt" : "2018-06-27", "totalCount" : 1, "totalValue" : 658.4332 }
{ "_id" : ObjectId("5c3e794b21526e3ff4bf4ca3"), "id" : "0.canceled.2018-05-30", "state" : "canceled", "isPaid" : false, "updatedAt" : "2018-05-30", "totalCount" : 1, "totalValue" : 1735.7175 }
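The answers above name each field explicitly. For the generic version the question asks about, a stage along these lines might work; it is a sketch, assuming MongoDB 4.0+ (for $toString) and that the grouped object lives in _id. The name serializeIdStage is hypothetical. $objectToArray turns the object into { k, v } pairs in field order, and $reduce joins the values with ".".

```javascript
// Hypothetical generic stage: serializes the values of the _id object
// (in field order) into one "."-separated string, e.g. "0.approved.2018-06-27".
// Assumes MongoDB 4.0+ for $toString; booleans map to "1"/"0" like the $cond answer.
const serializeIdStage = {
  $addFields: {
    id: {
      $reduce: {
        input: { $objectToArray: "$_id" }, // [{ k: "isPaid", v: false }, ...]
        initialValue: "",
        in: {
          $concat: [
            "$$value",
            // Separator before every value except the first
            // (assumes no value serializes to an empty string).
            { $cond: [{ $eq: ["$$value", ""] }, "", "."] },
            {
              $cond: [
                { $eq: [{ $type: "$$this.v" }, "bool"] },
                { $cond: ["$$this.v", "1", "0"] },
                { $toString: "$$this.v" }
              ]
            }
          ]
        }
      }
    }
  }
};

// Usage in the mongo shell, appended after your existing $group stage:
// db.t55.aggregate([ /* ...your $group... */, serializeIdStage ])
```

Because the values are read in field order, the resulting string follows the order of the keys in the $group _id expression.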

Related

Mongodb how to reduce the array within the matching key and calculate avg

{
"_id" : {
"state" : "NY",
"st" : "value"
},
"List" : [
{
"id" : "21",
"score" : 18.75,
"name" : "PU"
},
{
"id" : "21",
"score" : 25.0,
"name" : "PU"
},
{
"id" : "23",
"score" : 25.0,
"name" : "CL"
},
{
"id" : "23",
"score" : 56.25,
"name" : "CL"
}
]
}
Desired result:
Group the array elements by id and calculate the average score for each id.
{
"_id" : {
"state" : "New York",
"st" : "value"
},
"List" : [
{
"id" : "21",
"score" : 21.875,
"name" : "PU"
},
{
"id" : "23",
"score" : 40.625,
"name" : "CL"
}
]
}
Thank you in advance.
Query (returns the expected result):
1. Unwind List.
2. Group by the original _id fields plus id and name, and compute the average score.
3. Reshape each document to match the structure you want.
4. Group back to restore the document structure (reversing the unwind).
If two entries with the same id can have different names (if that can happen), the query keeps them as separate members of the array.
(Alternatively, it could merge them into one member and pack the names into an array, but that would produce a different schema from the one you expect.)
db.collection.aggregate([
{
"$unwind": {
"path": "$List"
}
},
{
"$group": {
"_id": {
"state": "$_id.state",
"st": "$_id.st",
"id": "$List.id",
"name": "$List.name"
},
"avg": {
"$avg": "$List.score"
}
}
},
{
"$project": {
"_id": {
"state": "$_id.state",
"st": "$_id.st"
},
"List": {
"id": "$_id.id",
"score": "$avg",
"name": "$_id.name"
}
}
},
{
"$group": {
"_id": "$_id",
"List": {
"$push": "$List"
}
}
}
])
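If the document is already in application code, the same per-id average can be computed client-side. A minimal sketch that mirrors the $unwind, $group, $project, $group steps above; the function name averageScores is hypothetical. Unlike $group, which does not guarantee output order, the Map keeps entries in first-seen order.

```javascript
// Hypothetical client-side helper: averages List scores per (id, name),
// mirroring the $unwind -> $group -> $project -> $group pipeline above.
function averageScores(doc) {
  const groups = new Map(); // key [id, name] -> running sum and count
  for (const { id, name, score } of doc.List) {
    const key = JSON.stringify([id, name]);
    const g = groups.get(key) || { id, name, sum: 0, count: 0 };
    g.sum += score;
    g.count += 1;
    groups.set(key, g);
  }
  return {
    _id: doc._id,
    List: [...groups.values()].map(g => ({ id: g.id, score: g.sum / g.count, name: g.name }))
  };
}

const doc = {
  _id: { state: "NY", st: "value" },
  List: [
    { id: "21", score: 18.75, name: "PU" },
    { id: "21", score: 25.0, name: "PU" },
    { id: "23", score: 25.0, name: "CL" },
    { id: "23", score: 56.25, name: "CL" }
  ]
};
console.log(JSON.stringify(averageScores(doc).List));
// [{"id":"21","score":21.875,"name":"PU"},{"id":"23","score":40.625,"name":"CL"}]
```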

Mongodb aggregate with cond and query value

I'm new to MongoDB. I need to know how to query items and set the result as a field value within an aggregation.
Data
[
{
"_id" : "11111",
"parent_id" : "99",
"name" : "AAAA"
},
{
"_id" : "11112",
"parent_id" : "99",
"name" : "BBBB"
},
{
"_id" : "11113",
"parent_id" : "100",
"name" : "CCCC"
},
{
"_id" : "11114",
"parent_id" : "99",
"name" : "DDDD"
}
]
In the mongo shell:
Assume $check is false
db.getCollection('test').aggregate(
[
{
"$group": {
"_id": "$id",
//...,
"item": {
"$last": {
"$cond": [
{"$eq": ["$check", true]},
"YES",
/* ** ANSWER ** */
]
}
}
}
}
]
)
So I need item to be an array of all the names that share the same parent_id.
Expected result:
[
{
"_id" : "11111",
"parent_id" : "99",
"name" : "AAAA",
"item" : ["AAAA","BBBB","DDDD"]
},
{
"_id" : "11112",
"parent_id" : "99",
"name" : "BBBB",
"item" : ["AAAA","BBBB","DDDD"]
},
{
"_id" : "11113",
"parent_id" : "100",
"name" : "CCCC",
"item" : ["CCCC"]
},
{
"_id" : "11114",
"parent_id" : "99",
"name" : "DDDD",
"item" : ["AAAA","BBBB","DDDD"]
}
]
Try this:
db.collection.aggregate([
{
"$group": {
"_id": "$parent_id",
"item": {
"$push": "$name"
},
"data": {
"$push": {
"_id": "$_id",
"name": "$name"
}
}
}
},
{
"$unwind": "$data"
},
{
"$project": {
"_id": "$data._id",
"parent_id": "$_id",
"name": "$data.name",
"item": 1
}
}
])
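A different approach, offered as an untested sketch rather than a drop-in answer: a self-join with $lookup (MongoDB 3.2+) keeps every original field, so nothing has to be regrouped or unwound. The collection name test comes from the question; the temporary field name siblings is hypothetical. The order of the names inside item is not guaranteed.

```javascript
// Hypothetical alternative: self-join each document to all documents
// that share its parent_id, then keep only their names.
const pipeline = [
  {
    $lookup: {
      from: "test",              // same collection as in the question
      localField: "parent_id",
      foreignField: "parent_id",
      as: "siblings"             // temporary array of matching documents
    }
  },
  // "$siblings.name" resolves to an array of the name values
  { $addFields: { item: "$siblings.name" } },
  // Drop the temporary array from the output
  { $project: { siblings: 0 } }
];

// Usage in the mongo shell:
// db.getCollection("test").aggregate(pipeline)
```

Each document also matches itself, so item includes its own name, as in the expected result.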

MongoDB Return Inner Document From Array

I am trying to fetch an element from an array in a document, and only that element; I don't want the entire document.
I tried different methods, but they all return the entire document:
db.dept.find({"section.classes.CRN":"1901"}).limit(100)
db.dept.where("section.classes.CRN").eq("1901").limit(100)
{
"_id" : ObjectId("5d70ab0c280d6b8ebb850cc1"),
"name" : "Art Studio",
"abbr" : "ARS",
"section" : [
{
"type" : "Undergraduate Courses",
"classes" : [
{
"CRN" : "193",
"Course" : "ARS100",
"Sec" : "01",
"Title" : "Drawing I",
"Cr" : "3",
"Dates" : "8/26-12/19",
"Days" : "MR",
"Time" : "1230P-0320P",
"Loc" : "SAB 226",
"Instructor" : "Schuck",
"Attributes" : "",
"Avail" : "F"
},
{
"CRN" : "293",
"Course" : "ARS100",
"Sec" : "02",
"Title" : "Drawing I",
"Cr" : "3",
"Dates" : "8/26-12/19",
"Days" : "MR",
"Time" : "0330P-0620P",
"Loc" : "SAB 226",
"Instructor" : "Itty",
"Attributes" : "",
"Avail" : "F"
},
{...
When searching for a set of CRN values, I am trying to get this or something similar:
[ {
"CRN" : "193",
"Course" : "ARS100",
"Sec" : "01",
"Title" : "Drawing I",
"Cr" : "3",
"Dates" : "8/26-12/19",
...
"Instructor" : "Schuck",
"Attributes" : "",
"Avail" : "F"
}
]
Try using an aggregation pipeline to project the doubly nested array, as follows:
Input:
[
{
"_id": ObjectId("5d70ab0c280d6b8ebb850cc1"),
"name": "Art Studio",
"abbr": "ARS",
"section": [
{
"type": "Undergraduate Courses",
"classes": [
{
"CRN": "193",
"Course": "ARS100",
"Sec": "01",
"Title": "Drawing I",
"Cr": "3",
"Dates": "8/26-12/19",
"Days": "MR",
"Time": "1230P-0320P",
"Loc": "SAB 226",
"Instructor": "Schuck",
"Attributes": "",
"Avail": "F"
},
{
"CRN": "293",
"Course": "ARS100",
"Sec": "02",
"Title": "Drawing I",
"Cr": "3",
"Dates": "8/26-12/19",
"Days": "MR",
"Time": "0330P-0620P",
"Loc": "SAB 226",
"Instructor": "Itty",
"Attributes": "",
"Avail": "F"
}
]
}
]
}
]
Query:
After unwinding section, you can filter classes by CRN:
db.collection.aggregate([
{
$unwind: "$section"
},
{
$project: {
name: 1,
abbr: 1,
"section.type": 1,
"section.classes": {
$filter: {
input: "$section.classes",
as: "item",
cond: {
$eq: [
"$$item.CRN",
"193"
]
}
}
}
}
},
{
$group: {
_id: "$_id",
section: {
$push: "$section"
}
}
}
])
Output:
You can adjust the keys in $project as needed, adding new keys or replacing existing ones.
[
{
"_id": ObjectId("5d70ab0c280d6b8ebb850cc1"),
"section": [
{
"classes": [
{
"Attributes": "",
"Avail": "F",
"CRN": "193",
"Course": "ARS100",
"Cr": "3",
"Dates": "8/26-12/19",
"Days": "MR",
"Instructor": "Schuck",
"Loc": "SAB 226",
"Sec": "01",
"Time": "1230P-0320P",
"Title": "Drawing I"
}
],
"type": "Undergraduate Courses"
}
]
}
]
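The question asks for only the class subdocuments that match a set of CRN values. A sketch of that, assuming MongoDB 3.4+ for $replaceRoot (the crns list is a hypothetical example), unwinds both array levels and promotes each matching class to the top level:

```javascript
// Hypothetical list of CRN values to look up
const crns = ["193", "293"];

const pipeline = [
  // Narrow down to departments containing at least one requested CRN
  { $match: { "section.classes.CRN": { $in: crns } } },
  // Flatten both array levels: one document per class
  { $unwind: "$section" },
  { $unwind: "$section.classes" },
  // Keep only the requested classes
  { $match: { "section.classes.CRN": { $in: crns } } },
  // Return the class subdocument itself instead of the department
  { $replaceRoot: { newRoot: "$section.classes" } }
];

// Usage in the mongo shell:
// db.dept.aggregate(pipeline)
```

The first $match is optional for correctness, but it lets the pipeline skip departments that cannot match before doing the unwinds.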
db.dept.find({"section.classes.CRN":"1901"},{"section.classes":1}).limit(100)
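This is called projection in MongoDB: you pass a second object to find() to specify which fields you want in the result.
So in your case, if you want name and section in the result, pass something like this:
db.dept.find({"section.classes.CRN":"1901"},{"name":1, "section":1}).limit(100)
Note that this still returns the whole section array of each matching document, not only the matching class; the positional $ projection operator cannot select elements from an array nested inside another array.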

MongoDB Use $group for the subset after $group

I just started learning MongoDB, and I am trying to find repeat-customer info in my customer database.
A sample of the collection:
{
"_id" : ObjectId("5b7617e48146d8bae"),
"amazon_id" : "112",
"date" : "2018-01-25T18:40:55-08:00",
"email" : "xxxxx#marketplace.amazon.com",
"buy_name" : "xxxxx",
"sku" : "NPC-50",
"qty" : 8,
"price" : 215.92,
"reci_name" : "XXXXX",
"street1" : "XXXXX",
"street2" : "",
"street3" : "",
"city" : "XXXXX",
"state" : "XXXXX",
"zip_code" : "XXXXXX"
}
{
"_id" : ObjectId("5b761712e48146d8bae"),
"amazon_id" : "114",
"date" : "2018-01-27T18:40:55-08:00",
"email" : "xxxxx#marketplace.amazon.com",
"buy_name" : "xxxxx",
"sku" : "ABC",
"qty" : 1,
"price" : 19.99,
"reci_name" : "XXXXX",
"street1" : "XXXXX",
"street2" : "",
"street3" : "",
"city" : "XXXXX",
"state" : "XXXXX",
"zip_code" : "XXXXXX"
}
I group all customer info by email, and here is my code:
db.getCollection('order').aggregate([
{ $group: { _id: "$email",
OrderInfo: {$push: {orderId: "$amazon_id", sku: "$sku", qty: "$qty", price:"$price"
}},
CustomerInfo: {$addToSet: {buyName: "$buy_name",reName: "$reci_name", email: "$email", street1: "$street1",
street2: "$street2", city: "$city", state: "$state", zipCode: "$zip_code"} }
}},
{ $project: {_id: 1, OrderInfo: 1, CustomerInfo:1, total_price:{$sum: "$OrderInfo.price"} }},
{ $match: {total_price: {$gt:100} } },
{ $sort: {total_price:-1}},
], { allowDiskUse: true } );
It returns this result:
{
"_id" : "xxxxxxx#marketplace.amazon.com",
"OrderInfo" : [
{
"orderId" : "112",
"sku" : "NPC-50",
"qty" : 8,
"price" : 215.92
},
{
"orderId" : "112",
"sku" : "NPC-50",
"qty" : 1,
"price" : 26.99
},
{
"orderId" : "114",
"sku" : "NPC-50",
"qty" : 1,
"price" : 26.99
},
{
"orderId" : "114",
"sku" : "ABC",
"qty" : 1,
"price" : 19.99
},
{
"orderId" : "116",
"sku" : "ABC",
"qty" : 1,
"price" : 19.99
}
],
"CustomerInfo" : [
{
"buyName" : "xxxxxxxxx",
"reName" : "xxxxxxxxxxxx",
"email" : "xxxxxxxxxxxx#marketplace.amazon.com",
"street1" : "xxxxxxxxxxx",
"street2" : "",
"city" : "xxxxxxxxxx",
"state" : "xxxxxxxxxxxx",
"zipCode" : "xxxxxxxxxx"
},
{
"buyName" : "xxxxxxxxxx",
"reName" : "xxxxxx",
"email" : "xxxxxxxx#marketplace.amazon.com",
"street1" : "xxxxxxxxxxx",
"street2" : "",
"city" : "xxxxx",
"state" : "xxxx",
"zipCode" : "xxxxxxxx"
}
],
"total_price" : 309.88
}
However, I want to group OrderInfo by sku and sum up the qty and price for each sku. My expected output is something like:
{
"OrderInfo" : [
{
"sku": "NPC-50",
"qty": 10,
"price": 269.9
},
{
"sku": "ABC",
"qty": 2,
"price": 39.98
}
],
"CustomerInfo" : [
{
"buyName" : "xxxxxxxxx",
"reName" : "xxxxxxxxxxxx",
"email" : "xxxxxxxxxxxx#marketplace.amazon.com",
"street1" : "xxxxxxxxxxx",
"street2" : "",
"city" : "xxxxxxxxxx",
"state" : "xxxxxxxxxxxx",
"zipCode" : "xxxxxxxxxx"
},
{
"buyName" : "xxxxxxxxxx",
"reName" : "xxxxxx",
"email" : "xxxxxxxx#marketplace.amazon.com",
"street1" : "xxxxxxxxxxx",
"street2" : "",
"city" : "xxxxx",
"state" : "xxxx",
"zipCode" : "xxxxxxxx"
}
],
"total_price" : 309.88
}
Any help will be appreciated.
You can use the aggregation below:
db.order.aggregate([
{"$group":{
"_id":{"email":"$email","sku":"$sku"},
"qty":{"$sum":"$qty"},
"price":{"$sum":"$price"},
"CustomerInfo":{
"$addToSet":{
"buyName":"$buy_name",
"reName":"$reci_name",
"email":"$email",
"street1":"$street1",
"street2":"$street2",
"city":"$city",
"state":"$state",
"zipCode":"$zip_code"
}
}
}},
{"$group":{
"_id":"$_id.email",
"OrderInfo":{"$push":{"sku":"$_id.sku","qty":"$qty","price":"$price"}},
"total_price":{"$sum":"$price"},
"CustomerInfo":{"$first":"$CustomerInfo"}
}},
{"$match":{"total_price":{"$gt":100}}},
{"$sort":{"total_price":-1}}
])
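If you already have the per-email OrderInfo array produced by the original query, the per-sku totals can also be computed in application code. A minimal sketch; the function name sumBySku is hypothetical, and the sample data is the OrderInfo array from the question's output.

```javascript
// Hypothetical helper: collapses OrderInfo entries with the same sku,
// summing qty and price (the same totals the first $group above computes).
function sumBySku(orderInfo) {
  const bySku = new Map();
  for (const { sku, qty, price } of orderInfo) {
    const acc = bySku.get(sku) || { sku, qty: 0, price: 0 };
    acc.qty += qty;
    acc.price += price;
    bySku.set(sku, acc);
  }
  return [...bySku.values()];
}

const orderInfo = [
  { orderId: "112", sku: "NPC-50", qty: 8, price: 215.92 },
  { orderId: "112", sku: "NPC-50", qty: 1, price: 26.99 },
  { orderId: "114", sku: "NPC-50", qty: 1, price: 26.99 },
  { orderId: "114", sku: "ABC", qty: 1, price: 19.99 },
  { orderId: "116", sku: "ABC", qty: 1, price: 19.99 }
];
console.log(sumBySku(orderInfo));
```

The summed prices are binary floating-point values and may carry tiny rounding noise, so round them before display.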
You can try the aggregation below:
db.collection.aggregate([
{ "$group": {
"_id": {
"email": "$email",
"sku": "$sku"
},
"CustomerInfo": {
"$addToSet": {
"buyName": "$buy_name",
"otherFields": "$otherFields",
}
},
"price": { "$sum": "$price" },
"qty": { "$sum": "$qty" }
}},
{ "$group": {
"_id": "$_id.email",
"CustomerInfo": { "$first": "$CustomerInfo" },
"OrderInfo": {
"$push": {
"sku": "$_id.sku",
"qty": "$qty",
"price": "$price"
}
}
}}
])

Aggregate group multiple fields

Given the following dataset:
{ "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 }
{ "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 }
{ "_id" : 3, "city" : "Yuma", "cat": "parks", "Q1" : 0, "Q2" : 0, "Q3" : 45, "Q4" : 0 }
{ "_id" : 4, "city" : "Reno", "cat": "parks", "Q1" : 35, "Q2" : 0, "Q3" : 0, "Q4" : 0 }
{ "_id" : 5, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 15, "Q3" : 0, "Q4" : 20 }
I'm trying to achieve the following result. Ideally it would return only totals greater than zero, and collapse each city, cat and Qx total into a single record.
{
"city" : "Yuma",
"cat" : "roads",
"Q2total" : 40
},
{
"city" : "Reno",
"cat" : "roads",
"Q1total" : 30
},
{
"city" : "Reno",
"cat" : "roads",
"Q4total" : 60
},
{
"city" : "Yuma",
"cat" : "parks",
"Q3total" : 45
},
{
"city" : "Reno",
"cat" : "parks",
"Q1total" : 35
},
{
"city" : "Yuma",
"cat" : "roads",
"Q4total" : 20
}
Possible?
We could ask, to what end? Your documents already have a nice consistent Object structure which is recommended. Having objects with varying keys is not a great idea. Data is "data" and should not really be the name of the keys.
With that in mind, the aggregation framework actually follows this sense and does not allow for the generation of arbitrary key names from data contained in the document. But you could get a similar result with the output as data points:
db.junk.aggregate([
// Aggregate first to reduce the pipeline documents somewhat
{ "$group": {
"_id": {
"city": "$city",
"cat": "$cat"
},
"Q1": { "$sum": "$Q1" },
"Q2": { "$sum": "$Q2" },
"Q3": { "$sum": "$Q3" },
"Q4": { "$sum": "$Q4" }
}},
// Convert the "quarter" elements to array entries with the same keys
{ "$project": {
"totals": {
"$map": {
"input": { "$literal": [ "Q1", "Q2", "Q3", "Q4" ] },
"as": "el",
"in": { "$cond": [
{ "$eq": [ "$$el", "Q1" ] },
{ "quarter": "$$el", "total": "$Q1" },
{ "$cond": [
{ "$eq": [ "$$el", "Q2" ] },
{ "quarter": "$$el", "total": "$Q2" },
{ "$cond": [
{ "$eq": [ "$$el", "Q3" ] },
{ "quarter": "$$el", "total": "$Q3" },
{ "quarter": "$$el", "total": "$Q4" }
]}
]}
]}
}
}
}},
// Unwind the array produced
{ "$unwind": "$totals" },
// Filter out any "0" results
{ "$match": { "totals.total": { "$ne": 0 } } },
// Maybe project a prettier "flatter" output
{ "$project": {
"_id": 0,
"city": "$_id.city",
"cat": "$_id.cat",
"quarter": "$totals.quarter",
"total": "$totals.total"
}}
])
Which gives you results like this:
{ "city" : "Reno", "cat" : "parks", "quarter" : "Q1", "total" : 35 }
{ "city" : "Yuma", "cat" : "parks", "quarter" : "Q3", "total" : 45 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q1", "total" : 30 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q4", "total" : 60 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q2", "total" : 40 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q4", "total" : 20 }
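Later MongoDB releases (3.4.4+) added $arrayToObject, which can build key names from data, so the Q2total-style keys from the question become possible inside the pipeline. A sketch, assuming it replaces the final $project stage of the pipeline above (after the $unwind and $match stages); the name quarterKeyStage is hypothetical. The design concerns about varying key names still apply.

```javascript
// Hypothetical replacement for the last $project stage: builds keys such as
// "Q2total" from the quarter value (requires MongoDB 3.4.4+ for $arrayToObject).
const quarterKeyStage = {
  $replaceRoot: {
    newRoot: {
      // The extra brackets pass one literal array argument to $arrayToObject
      $arrayToObject: [[
        { k: "city", v: "$_id.city" },
        { k: "cat", v: "$_id.cat" },
        { k: { $concat: ["$totals.quarter", "total"] }, v: "$totals.total" }
      ]]
    }
  }
};
```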
You could alternatively use mapReduce, which allows "some" flexibility with key names. The catch, though, is that your aggregation is still by "quarter", so you need that as part of the primary key, which cannot be changed once emitted.
Additionally, you cannot "filter" out aggregated results of "0" without a second pass after outputting to a collection, so it's not really of much use for what you want to do, unless you can live with a second mapReduce operation or a "transform" query on the output collection.
It is worth noting that the second pipeline stage here, with $project and $map, essentially alters the document structure into something like what you could have used to structure your documents in the first place:
{
"city" : "Reno",
"cat" : "parks",
"totals" : [
{ "quarter" : "Q1", "total" : 35 },
{ "quarter" : "Q2", "total" : 0 },
{ "quarter" : "Q3", "total" : 0 },
{ "quarter" : "Q4", "total" : 0 }
]
},
{
"city" : "Yuma",
"cat" : "parks",
"totals" : [
{ "quarter" : "Q1", "total" : 0 },
{ "quarter" : "Q2", "total" : 0 },
{ "quarter" : "Q3", "total" : 45 },
{ "quarter" : "Q4", "total" : 0 }
]
}
Then the aggregation for your documents becomes simple and gives the same results as shown above:
db.collection.aggregate([
{ "$unwind": "$totals" },
{ "$group": {
"_id": {
"city": "$city",
"cat": "$cat",
"quarter": "$totals.quarter"
},
"ttotal": { "$sum": "$totals.total" }
}},
{ "$match": { "ttotal": { "$ne": 0 } } },
{ "$project": {
"_id": 0,
"city": "$_id.city",
"cat": "$_id.cat",
"quarter": "$_id.quarter",
"total": "$ttotal"
}}
])
So it might make more sense to consider structuring your documents in that way to begin with and avoid any overhead required by the document transformation.
I think you'll find that consistent key names make a far better object model to program to, where you read the data point from the key's value and not from the key name. If you really need to, it's a simple matter of reading the data from the object and transforming the keys of each already aggregated result in post-processing.
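Post-processing of that kind could look like the following minimal sketch (the function name toQuarterKeys is hypothetical). It turns each { city, cat, quarter, total } result into the { city, cat, Q2total: 40 } shape from the question, using two of the results shown earlier as input.

```javascript
// Hypothetical post-processing step: folds each "quarter"/"total" pair
// into a single "<quarter>total" key on the aggregated result.
function toQuarterKeys(results) {
  return results.map(({ city, cat, quarter, total }) => ({
    city,
    cat,
    [`${quarter}total`]: total
  }));
}

const results = [
  { city: "Yuma", cat: "roads", quarter: "Q2", total: 40 },
  { city: "Reno", cat: "roads", quarter: "Q1", total: 30 }
];
console.log(JSON.stringify(toQuarterKeys(results)));
// [{"city":"Yuma","cat":"roads","Q2total":40},{"city":"Reno","cat":"roads","Q1total":30}]
```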