I'm stuck on an issue:
I need to transform:
[ {a:1 , b:2 , c:3} , {a:5, b:6, c:7} ]
Into:
[{a:[1,5], b:[2,6] , c: [3,7]}]
Basically, find the common keys and group their values.
I'm not sure if I should use $project + $reduce or $group. Does anyone have a tip?
To do this, we first have to convert each object to an array so that we are able to group by key:
{
"$project": {
"_id": 0 // First we have to eliminate the _id and all the other fields that we don't want to group
}
},
{
"$project": {
"arr": {
"$objectToArray": "$$ROOT"
}
}
},
Then we should unwind this array and group by key:
{
"$unwind": "$arr"
},
{
"$group": {
"_id": "$arr.k",
"field": {
"$push": "$arr.v"
}
}
}
Finally, we merge all of the key groups into a single document and reshape it into the desired output (push the k-v pairs into one array, then apply $arrayToObject):
{
$group: {
_id: null,
pairs: {
$push: {
k: "$_id",
v: "$field"
}
}
}
},
{
$replaceRoot: {
newRoot: {
$arrayToObject: "$pairs"
}
}
}
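To sanity-check the shape of the transformation, here is a minimal plain-Python sketch (not MongoDB) of the same data flow: object-to-array, unwind, group with push, and back to a single object:

```python
# Plain-Python sketch of the pipeline's data flow, for intuition only.
def pivot(docs):
    out = {}
    for doc in docs:                         # one iteration per input document
        for k, v in doc.items():             # $objectToArray + $unwind
            out.setdefault(k, []).append(v)  # $group with $push per key
    return [out]                             # $arrayToObject into one document

result = pivot([{"a": 1, "b": 2, "c": 3}, {"a": 5, "b": 6, "c": 7}])
# result: [{"a": [1, 5], "b": [2, 6], "c": [3, 7]}]
```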
I would like to merge several documents. Most of the fields have the same values, but there might be one or two fields with different values, and these fields are unknown beforehand. Ideally I would like to merge all the documents, keeping the fields that are the same as-is but creating an array of values only for those fields that have some variation.
For my first approach I grouped by a common field in my documents and kept the first document; however, this discards some information that varies in other fields.
group_documents = {
"$group": {
"_id": "$0020000E.Value",
"doc": {
"$first": "$$ROOT"
}
}
}
merge_documents = {
"$replaceRoot": {
"newRoot": "$doc"
}
}
write_collection = { "$out": { "db": "database", "coll": "records_nd" } }
objects = coll.aggregate(pipeline)
If the fields that have different values were known, I would have done something like merge_sol1, merge_sol2, or merge_sol3. The third solution is actually very close to my desired output and I could tweak it a bit, but these answers assume a priori knowledge of the fields to be merged.
You can first convert $$ROOT to an array of k-v tuples with $objectToArray. Then, $group all fields using $addToSet to collect the distinct values of each field into an array. Next, check the size of the resulting array and conditionally pick the first item if the array size is 1 (i.e. the value is the same for every document in that field); otherwise, keep the whole array. Finally, convert back to the original document form with $arrayToObject.
db.collection.aggregate([
{
$project: {
_id: "$key",
arr: {
"$objectToArray": "$$ROOT"
}
}
},
{
"$unwind": "$arr"
},
{
$match: {
"arr.k": {
$nin: [
"key",
"_id"
]
}
}
},
{
$group: {
_id: {
id: "$_id",
k: "$arr.k"
},
v: {
"$addToSet": "$arr.v"
}
}
},
{
$project: {
_id: "$_id.id",
arr: [
{
k: "$_id.k",
v: {
"$cond": {
"if": {
$gt: [
{
$size: "$v"
},
1
]
},
"then": "$v",
"else": {
$first: "$v"
}
}
}
}
]
}
},
{
"$project": {
doc: {
"$arrayToObject": "$arr"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{
_id: "$_id"
},
"$doc"
]
}
}
}
])
Mongo Playground
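For intuition, the same merge logic in a plain-Python sketch (not MongoDB; the field names key, host, and port are made up for illustration): collect the distinct values per field, then keep a scalar where there is one distinct value and an array where there are several:

```python
def merge_docs(docs):
    distinct = {}
    for doc in docs:                              # $unwind over the k-v pairs
        for k, v in doc.items():
            distinct.setdefault(k, set()).add(v)  # $addToSet
    # $cond on $size: scalar if one distinct value, array otherwise
    return {k: (next(iter(vs)) if len(vs) == 1 else sorted(vs))
            for k, vs in distinct.items()}

merged = merge_docs([
    {"key": "x", "host": "a", "port": 80},  # hypothetical documents
    {"key": "x", "host": "b", "port": 80},
])
# merged: {"key": "x", "host": ["a", "b"], "port": 80}
```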
First of all, I know we can check whether a key exists using the dot operator, but in my case it is not working and I don't know why.
So far in the aggregation pipeline I have the following records.
{
"my_key": "1234",
"data": {
1234: "abc",
4567: "xyz"
}
}
{
"my_key": "6666",
"data": {
1234: "abc",
4567: "xyz"
}
}
I want to return the documents where the my_key value does not exist as a key in the data object. So according to the above example it should return the 2nd document.
I was trying using the $match operator as following but it does not seem to work.
$match :
{
"data.$my_key":{$exists:false}
}
This does not work and I don't get why :(
Is it because the my_key value is a string and the keys in the data object are not strings?
The reason it does not work: you can't use a field's value inside a query path, so data.$my_key is taken as a literal field name and $exists never looks up the value of my_key. Instead, convert data to an array of key-value pairs with $objectToArray and compare the keys under $expr:
db.collection.aggregate([
{
"$project": {//Reshape the data
"data": {
"$objectToArray": "$data"
},
"my_key": 1
}
},
{
"$match": {//keep documents whose my_key is not among the keys of data
"$expr": {
"$not": {
"$in": [
"$my_key",
"$data.k"
]
}
}
}
}
])
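In plain-Python terms (not MongoDB), the goal is just this membership test on the keys of data, which keeps only the second document:

```python
docs = [
    {"my_key": "1234", "data": {"1234": "abc", "4567": "xyz"}},
    {"my_key": "6666", "data": {"1234": "abc", "4567": "xyz"}},
]

# Keep documents whose my_key is not among the keys of data
# (note: in MongoDB, field names are always strings).
missing = [d for d in docs if d["my_key"] not in d["data"]]
# missing contains only the document with my_key "6666"
```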
Another way, using $map:
db.collection.aggregate([
{
"$project": {
"data": {
"$objectToArray": "$data"
},
"my_key": 1
}
},
{
$project: {
"output": {
"$map": {
"input": "$data",
"as": "data",
"in": {
"$eq": [
"$$data.k",
"$my_key"
]
}
}
},
"data": 1,
"my_key": 1
}
},
{
$match: {
output: { $ne: true } // keep docs where no element is true, i.e. my_key is not a key of data
}
}
])
If you need the original format of data, you can add the below as the last stage:
{
$project: {
"data": {
"$arrayToObject": "$data"
},
"my_key": 1
}
}
db.getCollection('rien').aggregate([
{
$match: {
$and: [
{
"id": "10356"
},
{
$or: [
{
"sys_date": {
"$gte": new Date(ISODate().getTime() - 90*24*60*60*1000)
}
},
{
"war_date": {
"$gte": new Date(ISODate().getTime() - 90*24*60*60*1000)
}
}
]
}
]
}
},
{
$group: {
"_id": "$b_id",
count: {
$sum: 1
},
ads: {
$addToSet: {
"s": "$s",
"ca": "$ca"
}
},
files: {
$addToSet: {
"system": "$system",
"hostname": "$hostname"
}
}
}
},
{
$sort: {
"ads.s": -1
}
},
{
$group: {
"_id": "$b_id",
total_count: {
$sum: 1
},
"data": {
"$push": "$$ROOT"
}
}
},
{
$project: {
"_id": 0,
"total_count": 1,
results: {
$slice: [
"$data",
0,
50
]
}
}
}
])
When I execute this pipeline 5 times, it returns different sets of documents. It is a 3-node cluster. No sharding enabled. There are 10 million documents. The data is static.
Any ideas about the inconsistent results? I feel I am missing some fundamentals here.
I can see 2 problems:
"ads.s": -1 will not work because ads is an array field, and $sort does not apply to a field inside an array.
$addToSet will not maintain sort order even if the input is ordered by the previous stage,
as mentioned in the $addToSet documentation => Order of the elements in the output array is unspecified,
and also mentioned in accumulators-group-addToSet => Order of the array elements is undefined,
and also in the JIRA tickets SERVER-8512 and DOCS-1114.
You can use the $setUnion operator for an ascending-order result, and $reduce on the result of $setUnion for a descending-order result.
As a workaround I am adding a solution below; I am not sure whether it is a good option, but you can use it if it does not affect the performance of your query.
I am adding only the updated stages here.
These stages remain the same:
{ $match: {} }, // skipped
{ $group: {} }, // skipped
$sort is optional; it is up to your requirement whether you want to order by the main document:
{ $sort: { _id: -1 } },
$setUnion treats arrays as sets: if an array contains duplicate entries, $setUnion ignores them. It also happens to return the array in ascending order based on the first field we specified in the $group stage, which is s, but make sure every element in the array has s as its first field.
$reduce iterates over the array and concatenates the current element $$this in front of the accumulated value $$value, which produces the array in descending order:
{
$addFields: {
ads: {
$reduce: {
input: { $setUnion: "$ads" },
initialValue: [],
in: { $concatArrays: [["$$this"], "$$value"] }
}
},
files: {
$reduce: {
input: { $setUnion: "$files" },
initialValue: [],
in: { $concatArrays: [["$$this"], "$$value"] }
}
}
}
},
These stages remain the same:
{ $group: {} }, // skipped
{ $project: {} } // skipped
Playground
The $setUnion documentation says: The order of the elements in the output array is unspecified. However, every way I have tested it, it returns the elements in ascending order; why, I don't know.
I asked a question in the MongoDB Developer Forum, does-setunion-expression-operator-order-array-elements-in-ascending-order?, and they replied that it will not guarantee the order!
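The $setUnion + $reduce trick can be sketched in plain Python (not MongoDB; plain integers stand in for the subdocuments in ads and files):

```python
def set_union(arr):
    # stand-in for $setUnion: distinct elements; the observed (but
    # not guaranteed) behaviour is an ascending-order result
    return sorted(set(arr))

def reduce_desc(arr):
    value = []                  # initialValue: []
    for this in arr:
        value = [this] + value  # $concatArrays: [["$$this"], "$$value"]
    return value

ads = reduce_desc(set_union([3, 1, 2, 3, 1]))
# ads: [3, 2, 1]  (de-duplicated, descending)
```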
I have a highly nested set of MongoDB objects and I want to count the number of subdocuments that match a given condition (Edit: in each document). For example:
{"_id":{"chr":"20","pos":"14371","ref":"A","alt":"G"},
"studies":[
{
"study_id":"Study1",
"samples":[
{
"sample_id":"NA00001",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"NA00002",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
{"_id":{"chr":"20","pos":"14372","ref":"T","alt":"AA"},
"studies":[
{
"study_id":"Study3",
"samples":[
{
"sample_id":"SAMPLE1",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"SAMPLE2",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
{"_id":{"chr":"20","pos":"14373","ref":"C","alt":"A"},
"studies":[
{
"study_id":"Study3",
"samples":[
{
"sample_id":"SAMPLE3",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"SAMPLE7",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
I want to know how many subdocuments contain GT:"1|0", which in this case would be 1 in the first document, 2 in the second, and 0 in the third. I've tried the unwind and aggregate functions but I'm obviously not doing something correctly. When I try to count the subdocuments by the "GT" field, mongo complains:
db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}])
since my group's names cannot contain ".", yet if I leave them out:
db.collection.aggregate([{$group: {"$GT":1,_id:0}}])
it complains because "$GT cannot be an operator name"
Any ideas?
You need to process $unwind when working with arrays, and you need to do this three times:
db.collection.aggregate([
// Unwind the arrays to access the nested elements
{ "$unwind": "$studies" },
{ "$unwind": "$studies.samples" },
{ "$unwind": "$studies.samples.formatdata" },
// Group results to obtain the matched count per key
{ "$group": {
"_id": "$studies.samples.formatdata.GT",
"count": { "$sum": 1 }
}}
])
Ideally you want to filter your input. You can do this with a $match both before and after the $unwind stages, using a $regex to match documents where the GT value begins with a "1".
db.collection.aggregate([
// Match first to exclude documents where this is not present in any array member
{ "$match": { "studies.samples.formatdata.GT": /^1/ } },
// Unwind the arrays to access the nested elements
{ "$unwind": "$studies" },
{ "$unwind": "$studies.samples" },
{ "$unwind": "$studies.samples.formatdata" },
// Match again to filter the unwound elements
{ "$match": { "studies.samples.formatdata.GT": /^1/ } },
// Group results to obtain the matched count per key
{ "$group": {
"_id": {
"_id": "$_id",
"key": "$studies.samples.formatdata.GT"
},
"count": { "$sum": 1 }
}}
])
Note that in all cases the "dollar $" prefixed entries are "variables" referring to properties of the document; they are "values" to use as input on the right-hand side. The left-hand-side "keys" must be specified as plain string keys; no variable can be used to name a key.
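The per-document count the question asks for amounts to the following plain-Python walk over the nested arrays (field names from the question); the triple $unwind plus $match plus $group expresses the same thing server-side:

```python
def count_gt(doc):
    # count formatdata entries whose GT begins with "1"
    return sum(
        1
        for study in doc["studies"]     # $unwind: $studies
        for sample in study["samples"]  # $unwind: $studies.samples
        for fd in sample["formatdata"]  # $unwind: $studies.samples.formatdata
        if fd["GT"].startswith("1")     # $match: /^1/
    )

docs = [
    {"studies": [{"samples": [{"formatdata": [{"GT": "1|0"}]},
                              {"formatdata": [{"GT": "0|0"}]}]}]},
    {"studies": [{"samples": [{"formatdata": [{"GT": "1|0"}]},
                              {"formatdata": [{"GT": "1|0"}]}]}]},
]
counts = [count_gt(d) for d in docs]
# counts: [1, 2]
```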
https://mongoplayground.net/p/DpX6cFhR_mm
db.collection.aggregate([
{
"$unwind": "$tags"
},
{
"$match": {
"$or": [
{
"tags.name": "Canada"
},
{
"tags.name": "ABC"
}
]
}
},
{
"$group": {
"_id": null,
"count": {
"$sum": 1
}
}
}
])
I am having an issue that I thought would be common, but I wasn't able to find much information about it during my research.
My problem is that I expect the result of a query to have a given JSON format, but when the $match filters out all documents, I get no JSON at all.
A simplified example: I would like to have the count of documents that match a given criterion, so I have the following query:
db.collection.aggregate( [{
$match: {
type: /^1[.]2[.]3[.].*$/
}
}, {
$group: {
_id: {$ifNull : ["$type", 0]},
count: { $sum: 1 }
}
}]);
If I have at least one document that matches, then the query works:
{ "_id" : "1.2.3", "count" : 44 }
If no documents match, I would like to receive a JSON document like this:
{ "_id" : "1.5.3", "count" : 0 }
Is this possible?
PS: this is a simplified case; it would not be so easy to handle this on the application side, so I would rather adjust the query.
If you know beforehand the value of the key that you are searching for (i.e. 1.2.3 or 1.5.3 in your case), here is a workaround using $facet. It first gets the matching documents via $match and stores them in an array named results. Depending on the $size of the results array, we either keep the $group result (when we have matched records) or replace it with a default count: 0 record carrying the key you specified.
db.collection.aggregate([
{
"$facet": {
"results": [
{
$match: {
"type": <key you want to search>
}
},
{
$group: {
_id: {
$ifNull: [
"$type",
0
]
},
count: {
$sum: 1
}
}
}
]
}
},
{
"$replaceRoot": {
"newRoot": {
"$cond": {
"if": {
$gt: [
{
"$size": "$results"
},
0
]
},
"then": "$$ROOT",
"else": {
"results": [
{
"_id": <key you want to search>,
"count": 0
}
]
}
}
}
}
},
{
"$unwind": "$results"
},
{
"$replaceRoot": {
"newRoot": "$results"
}
}
])
Mongo Playground
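The $facet fallback behaves like this plain-Python sketch (not MongoDB; the real stage would use your match criteria in place of the simple equality test): run the match, and when nothing matches, emit a default zero-count record for the key:

```python
def count_with_default(docs, key):
    results = [d for d in docs if d["type"] == key]  # the $match facet
    if results:                                      # $size > 0
        return {"_id": key, "count": len(results)}   # the $group result
    return {"_id": key, "count": 0}                  # the default branch

docs = [{"type": "1.2.3"}, {"type": "1.2.3"}]
hit = count_with_default(docs, "1.2.3")   # {"_id": "1.2.3", "count": 2}
miss = count_with_default(docs, "1.5.3")  # {"_id": "1.5.3", "count": 0}
```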