{
"no" : "2020921008981",
"date" : ISODate("2020-04-01T05:19:02.263+0000"),
"sale" : {
"soldItems" : [
{
"itemId" : "5b55ac7f0550de00210a3b24",
"qty" : NumberInt(1),
},
{
"itemId" : "5b55ac7f0550de00210a3b25",
"qty" : NumberInt(2),
}
],
"items" : [
{
"_id" : ObjectId("5b55ac7f0550de00210a3b24"),
unit :"KG"
},
{
"_id" : ObjectId("5b55ac7f0550de00210a3b25"),
unit :"ML"
}
]
}
}
Desired output :
{
"no" : "2020921008981",
"sale" : {}
"qtyList" : "1 KG \n 2 ML"
}
In order to build the itemQtyList output field, two fields from different arrays (a string and an int) need to be combined. I couldn't find any reference for doing that. Any idea would be appreciated.
You can use the aggregation below:
db.collection.aggregate([
{ "$project": {
"itemQtyList": {
"$reduce": {
"input": { "$range": [0, { "$size": "$sale.soldItems" }] },
"initialValue": "",
"in": {
"$concat": [
"$$value",
{ "$cond": [{ "$eq": ["$$this", 0] }, "", " \n "] },
{ "$toString": {
"$arrayElemAt": [
"$sale.soldItems.qty",
"$$this"
]
}},
" ",
{ "$arrayElemAt": ["$sale.items.unit", "$$this"] }
]
}
}
}
}}
])
MongoPlayground
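Note that the $reduce above pairs sale.soldItems and sale.items purely by array position, which works for the sample because both arrays are in the same order. If that ordering is not guaranteed, a variant could look the unit up by id instead, using $indexOfArray. This is an untested sketch that assumes itemId is the string form of the corresponding items._id (as in the sample document) and MongoDB 4.0+ for $toString/$toObjectId:
db.collection.aggregate([
  { "$project": {
    "no": 1,
    "itemQtyList": {
      "$reduce": {
        "input": "$sale.soldItems",
        "initialValue": "",
        "in": {
          "$concat": [
            "$$value",
            { "$cond": [{ "$eq": ["$$value", ""] }, "", " \n "] },
            { "$toString": "$$this.qty" },
            " ",
            { "$arrayElemAt": [
              "$sale.items.unit",
              // find the position of the matching item by its _id
              { "$indexOfArray": [
                "$sale.items._id",
                { "$toObjectId": "$$this.itemId" }
              ] }
            ] }
          ]
        }
      }
    }
  }}
])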
"$redact": {
"$cond": {
"if": {
"$eq": [ { "$arrayElemAt": [ "$messages.type", -1 ] }, "A" ]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}
In the above code I need to add two more conditions to the if statement, but I am not able to add them,
i.e. (messages.type = 'A' or messages.type = 'D') && (messages.sender = 61)
Please help
You can use $cond with $and like this:
{$cond : [{$and : [{$eq : [1,1]},{$eq : [2,2]},{$eq : [3,3]}]},"yes","no"]}
returns "yes"
{$cond : [{$and : [{$eq : [2,1]},{$eq : [2,2]},{$eq : [3,3]}]},"yes","no"]}
returns "no"
Edit:
As per the query in the OP (note this is untested):
"$redact":{
$cond : [
{
$and : [
{
$or : [
{
$eq : [
"$messages.type",
"A"
]
},
{
$eq : [
"$messages.type",
"D"
]
}
]
},
{
$eq : [
"$messages.sender",
61
]
}
]
},
"$$KEEP",
"$$PRUNE"
]
}
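One caveat (an assumption, since the full document shape is not shown in the question): if messages is an array, then "$messages.type" resolves to an array of values inside $redact, so the $eq comparisons above would not match a single letter. In that case you would keep the $arrayElemAt wrapping from the original snippet and check the last message, along these lines (untested):
"$redact": {
  $cond : [
    {
      $and : [
        {
          $or : [
            { $eq : [ { $arrayElemAt : [ "$messages.type", -1 ] }, "A" ] },
            { $eq : [ { $arrayElemAt : [ "$messages.type", -1 ] }, "D" ] }
          ]
        },
        { $eq : [ { $arrayElemAt : [ "$messages.sender", -1 ] }, 61 ] }
      ]
    },
    "$$KEEP",
    "$$PRUNE"
  ]
}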
The following query is not valid. Can anyone please point me to the error?
I am trying to filter on all documents where the last item in the Progress array has a given state - for example "Done".
db.logdata.aggregate(
[
{ "$match": { "ProcessingInfo.Progress" : { "$exists": true } } },
{ "$redact":
{
"$cond": {
"if": { "$eq": [ { "$arrayElemAt": [ "$ProcessingInfo.Progress.State", -1 ], "Done" } ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}
}
]
)
Sample document (which should be matched - because the last State in Progress array is "InProgress"):
{
"_id" : ObjectId("578fa85bb29339a1fa6d6109"),
"ProcessingInfo" : {
"DateStarted" : ISODate("2016-08-06T16:55:58.294+0000"),
"Attempt" : NumberInt(1),
"LastState" : "Failed",
"Progress" : [
{
"State" : "Failed",
"StateDescription" : ""
},
{
"State" : "Success",
"StateDescription" : ""
},
{
"State" : "Done",
"StateDescription" : ""
},
{
"State" : "InProgress",
"StateDescription" : ""
}
]
}
}
To kind of "circumvent" this problem I have an extra field in the document root "LastState" - is that maybe the way to go (complexity-wise)?
Your query has a small syntax error: your "Done" should be inside the $eq array, not in the object containing the $arrayElemAt definition. The following query (optimised, thanks to what Styvane suggested, to filter out documents without the expected State anywhere in the ProcessingInfo.Progress array) should return your example document:
[
{ "$match": { "ProcessingInfo.Progress.State" : "InProgress" } },
{ "$redact":
{
"$cond": {
"if": { "$eq": [ { "$arrayElemAt": [ "$ProcessingInfo.Progress.State", -1 ] }, "InProgress" ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}
}
]
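As a side note (not part of the original answer): on MongoDB 3.6 or newer the same last-element check can be expressed directly in $match with $expr, without $redact. An untested sketch:
db.logdata.aggregate([
  { "$match": {
    "ProcessingInfo.Progress.State": "InProgress",
    "$expr": {
      "$eq": [
        { "$arrayElemAt": [ "$ProcessingInfo.Progress.State", -1 ] },
        "InProgress"
      ]
    }
  }}
])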
I have an entity collection like this:
{
"_id" : ObjectId("55f93cedc4fd0e1f309aea64"),
"entityType" : "1",
"identifierIdentity" : [
{
"identifierTypeCode" : "NPI",
"identifierValue" : "111"
},
{
"identifierTypeCode" : "NPI",
"identifierValue" : "123"
},
{
"identifierTypeCode" : "NPI",
"identifierValue" : "141"
},
{
"identifierTypeCode" : "SSN",
"identifierValue" : "155"
}
]
}
{
"_id" : ObjectId("55f93cedc4fd0e1f309aea65"),
"entityType" : "2",
"identifierIdentity" : [
{
"identifierTypeCode" : "NPI",
"identifierValue" : "111"
},
{
"identifierTypeCode" : "NPI",
"identifierValue" : "123"
},
{
"identifierTypeCode" : "SSN",
"identifierValue" : "155"
}
]
}
{
"_id" : ObjectId("55f93cedc4fd0e1f309aea66"),
"entityType" : "3",
"identifierIdentity" : [
{
"identifierTypeCode" : "SSN",
"identifierValue" : "111"
},
{
"identifierTypeCode" : "SSN",
"identifierValue" : "123"
}
]
}
In the above, identifierIdentity is an array of documents.
I am trying to get the entity count where "identifierTypeCode" : "NPI" is greater than or equal to two.
I am able to get this using Java code, but I have millions of records, so it takes a lot of time. I want to know whether there is a way I can achieve this in a single query.
The $redact aggregation operator will make short work of this, coupled with a match for arrays that have at least two elements to trim things down:
db.collection.aggregate([
{ "$match": {
"identifierIdentity.identifierTypeCode": "NPI",
"identifierIdentity.1": { "$exists": true }
}},
{ "$redact": {
"$cond": {
"if": {
"$gte": [
{ "$size": { "$setDifference": [
{ "$map": {
"input": "$identifierIdentity",
"as": "el",
"in": {
"$cond": {
"if": { "$eq": ["$$el.identifierTypeCode", "NPI"] },
"then": "$$el",
"else": false
}
}
}},
[false]
] } },
2
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Basically, match first to cut down the possible documents to process, then the $redact does another logical match by filtering out the matching elements from the array and counting the size of the result. Where that count is greater than or equal to 2, you keep the document; otherwise it is discarded.
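For what it's worth, on MongoDB 3.2 or newer the same array filtering can be written with $filter instead of the $map/$setDifference pair, and a final $group produces the count the question actually asks for. A hedged, untested sketch:
db.collection.aggregate([
  { "$match": {
    "identifierIdentity.identifierTypeCode": "NPI",
    "identifierIdentity.1": { "$exists": true }
  }},
  { "$redact": {
    "$cond": {
      "if": {
        "$gte": [
          { "$size": {
            "$filter": {
              "input": "$identifierIdentity",
              "as": "el",
              "cond": { "$eq": [ "$$el.identifierTypeCode", "NPI" ] }
            }
          }},
          2
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  // count the entities that remain after the $redact
  { "$group": { "_id": null, "count": { "$sum": 1 } } }
])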
I have some documents
{name: 'apple', type: 'fruit', color: 'red'}
{name: 'banana', type: 'fruit', color: 'yellow'}
{name: 'orange', type: 'fruit', color: 'orange'}
{name: 'eggplant', type: 'vege', color: 'purple'}
{name: 'brocoli', type: 'vege', color: 'green'}
{name: 'rose', type: 'flower', color: 'red'}
{name: 'cauli', type: 'vege', color: 'white'}
{name: 'potato', type: 'vege', color: 'brown'}
{name: 'onion', type: 'vege', color: 'white'}
{name: 'strawberry', type: 'fruit', color: 'red'}
{name: 'cashew', type: 'nut', color: ''}
{name: 'almond', type: 'nut', color: ''}
{name: 'lemon', type: 'vege', color: 'yellow'}
{name: 'tomato', type: 'vege', color: 'red'}
{name: 'tomato', type: 'fruit', color: 'red'}
{name: 'fig', type: 'fruit', color: 'pink'}
{name: 'nectarin', type: 'fruit', color: 'pink'}
I want to group them by first letter, like below:
{
_id:'a',
name:['apple','almond'],
type:[],
color:[]
}
{
_id:'b',
name:['banana','brocoli'],
type:[],
color:['brown']
}
...
{
_id:'f',
name:['fig'],
type:['fruit','flower'],
color:['']
}
...
{
_id:'n',
name:['nectarin'],
type:['nut'],
color:['']
}
...
{
_id:'p',
name:['potato'],
type:[''],
color:['pink','purple']
}
...
The result can be saved into another collection, so I can issue a query against the newly created collection, find({_id:'a'}), to return the name, type and color values that begin with the letter 'a'.
I have thought about using $group
$group: {
_id: {$substr: ['$name', 0, 1]},
name: {$addToSet: '$name'}
}
Then another command
$group: {
_id: {$substr: ['$type', 0, 1]},
type: {$addToSet: '$type'}
}
And
$group: {
_id: {$substr: ['$color', 0, 1]},
color: {$addToSet: '$color'}
}
But I am stuck at how to unify all three together and save the result into a new collection. Or is the aggregation framework not suitable for this kind of data summary?
In a real-world example, e.g. an e-commerce site, the front page displays something like: "currently we have 135636 products under 231 categories from 111 brands". Surely these numbers should be cached somewhere (in memory or in another collection), because running $group each time is resource intensive? What would be the optimal schema/design for these situations?
Sorry, my questions are a bit 'confusing'.
Since you have multiple arrays here, the key thing is to "merge" them all into one for the simplest processing.
The $map operator of the aggregation framework works well here for transforming the elements, so that you get the "first letter" from each word within the data:
db.alpha.aggregate([
{ "$project": {
"list": {
"$map": {
"input": [ "A", "B", "C" ],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
{
"type": { "$literal": "name" },
"value": "$name",
"alpha": { "$substr": [ "$name",0,1 ] }
},
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
{
"type": { "$literal": "type" },
"value": "$type",
"alpha": { "$substr": [ "$type",0,1 ] }
},
{
"type": { "$literal": "color" },
"value": "$color",
"alpha": { "$substr": [ "$color",0,1 ] }
}
]}
]
}
}
}
}},
{ "$unwind": "$list" },
{ "$match": { "list.alpha": { "$ne": "" } } },
{ "$group": {
"_id": "$list.alpha",
"list": {
"$addToSet": "$list"
}
}},
{ "$project": {
"name": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "name" ] },
"$$el.value",
false
]
}
}},
[false]
]
},
"type": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "type" ] },
"$$el.value",
false
]
}
}},
[false]
]
},
"color": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "color" ] },
"$$el.value",
false
]
}
}},
[false]
]
}
}},
{ "$sort": { "_id": 1 } }
])
If you look at the data in "stages", it makes a lot of sense what is happening here in the transformation.
The first stage "maps" all of the fields into a single array per document, so all documents now look like this:
{
"_id" : ObjectId("55df0652c9064ef625d7f36e"),
"list" : [
{
"type" : "name",
"value" : "nectarin",
"alpha" : "n"
},
{
"type" : "type",
"value" : "fruit",
"alpha" : "f"
},
{
"type" : "color",
"value" : "pink",
"alpha" : "p"
}
]
}
The $unwind is of little consequence, as it does the standard thing and creates new documents from each array member. It is the $group that does most of the work here, with this result per "alpha" value in the grouping:
{
"_id" : "o",
"list" : [
{
"type" : "name",
"value" : "orange",
"alpha" : "o"
},
{
"type" : "color",
"value" : "orange",
"alpha" : "o"
},
{
"type" : "name",
"value" : "onion",
"alpha" : "o"
}
]
}
That has a nice grouping, and is arguably a decent output format. But in order to get to the end result, the $map operator is employed again alongside $setDifference, which removes the false values wherever an element's "type" does not match the required output field.
The full result is:
{ "_id" : "a", "name" : [ "almond", "apple" ], "type" : [ ], "color" : [ ] }
{ "_id" : "b", "name" : [ "brocoli", "banana" ], "type" : [ ], "color" : [ "brown" ] }
{ "_id" : "c", "name" : [ "cashew", "cauli" ], "type" : [ ], "color" : [ ] }
{ "_id" : "e", "name" : [ "eggplant" ], "type" : [ ], "color" : [ ] }
{ "_id" : "f", "name" : [ "fig" ], "type" : [ "flower", "fruit" ], "color" : [ ] }
{ "_id" : "g", "name" : [ ], "type" : [ ], "color" : [ "green" ] }
{ "_id" : "l", "name" : [ "lemon" ], "type" : [ ], "color" : [ ] }
{ "_id" : "n", "name" : [ "nectarin" ], "type" : [ "nut" ], "color" : [ ] }
{ "_id" : "o", "name" : [ "onion", "orange" ], "type" : [ ], "color" : [ "orange" ] }
{ "_id" : "p", "name" : [ "potato" ], "type" : [ ], "color" : [ "pink", "purple" ] }
{ "_id" : "r", "name" : [ "rose" ], "type" : [ ], "color" : [ "red" ] }
{ "_id" : "s", "name" : [ "strawberry" ], "type" : [ ], "color" : [ ] }
{ "_id" : "t", "name" : [ "tomato" ], "type" : [ ], "color" : [ ] }
{ "_id" : "v", "name" : [ ], "type" : [ "vege" ], "color" : [ ] }
{ "_id" : "w", "name" : [ ], "type" : [ ], "color" : [ "white" ] }
{ "_id" : "y", "name" : [ ], "type" : [ ], "color" : [ "yellow" ] }
Where everything is grouped alphabetically and with their own arrays for each field.
Upcoming releases of MongoDB will have a $filter operator that makes the $map and $setDifference combination a bit nicer. But that does not produce "sets", not that it matters much to this process as long as $addToSet is employed where it is.
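For example (an untested sketch for MongoDB 3.2+, not part of the original pipeline), the final "name" projection could be rewritten as a $filter feeding a $map that extracts the values:
"name": {
  "$map": {
    "input": {
      "$filter": {
        "input": "$list",
        "as": "el",
        "cond": { "$eq": [ "$$el.type", "name" ] }
      }
    },
    "as": "el",
    "in": "$$el.value"
  }
}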
Thinking about this, I would like to "advise" that, considering the amount of data you want to process here, the resulting "arrays" for each letter might possibly exceed the BSON document size limit, depending on how many distinct "words" there actually are.
In that case the "advice" here would be to follow the process right up to and including the $match, but then only $group afterwards, like this:
{ "$group": {
"_id": {
"alpha": "$list.alpha",
"type": "$list.type",
"value": "$list.value",
}
}},
{ "$sort": { "_id": 1 } }
It's a longer output of course, but it will not exceed the BSON document size limit at any stage.
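If you want to persist either form of output into another collection, as the question mentions, you can append an $out stage as the last stage of the pipeline (the collection name below is just a placeholder):
{ "$out": "alphaSummary" }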
Using the aggregation framework, you need a somewhat complex query. First find all of the first letters of name using $substr, then build combined name, type and color arrays using $group, and use $map to check whether a given name starts with that letter or not.
$setDifference is used to remove the duplicate empty values, and finally $out is used to write the documents to a new collection.
Check this aggregation query:
db.collection.aggregate({
"$project": {
"firstName": {
"$substr": ["$name", 0, 1]
},
"name": 1,
"type": 1,
"color": 1
}
}, {
"$group": {
"_id": null,
"allName": {
"$push": "$name"
},
"allType": {
"$push": "$type"
},
"allColor": {
"$push": "$color"
},
"allfirstName": {
"$push": "$firstName"
}
}
}, {
"$unwind": "$allfirstName"
}, {
"$group": {
"_id": "$allfirstName",
"allType": {
"$first": "$allType"
},
"allName": {
"$first": "$allName"
},
"allColor": {
"$first": "$allColor"
}
}
}, {
"$project": {
"type": {
"$setDifference": [{
"$map": {
"input": "$allType",
"as": "type",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$type", 0, 1]
}, "$_id"]
},
"then": "$$type",
"else": ""
}
}
}
},
[""]
]
},
"color": {
"$setDifference": [{
"$map": {
"input": "$allColor",
"as": "color",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$color", 0, 1]
}, "$_id"]
},
"then": "$$color",
"else": ""
}
}
}
},
[""]
]
},
"name": {
"$setDifference": [{
"$map": {
"input": "$allName",
"as": "name",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$name", 0, 1]
}, "$_id"]
},
"then": "$$name",
"else": ""
}
}
}
},
[""]
]
}
}
}, {
"$sort": {
"_id": 1
}
}, {
"$out": "newCollection"
})
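Once the pipeline has written its result via $out, you can query the new collection exactly as described in the question, for example:
db.newCollection.find({ "_id": "a" })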