Different documents returned for the same query - MongoDB

db.getCollection('rien').aggregate([
  {
    $match: {
      $and: [
        { "id": "10356" },
        {
          $or: [
            { "sys_date": { "$gte": new Date(ISODate().getTime() - 90 * 24 * 60 * 60 * 1000) } },
            { "war_date": { "$gte": new Date(ISODate().getTime() - 90 * 24 * 60 * 60 * 1000) } }
          ]
        }
      ]
    }
  },
  {
    $group: {
      "_id": "$b_id",
      count: { $sum: 1 },
      ads: { $addToSet: { "s": "$s", "ca": "$ca" } },
      files: { $addToSet: { "system": "$system", "hostname": "$hostname" } }
    }
  },
  {
    $sort: { "ads.s": -1 }
  },
  {
    $group: {
      "_id": "$b_id",
      total_count: { $sum: 1 },
      "data": { "$push": "$$ROOT" }
    }
  },
  {
    $project: {
      "_id": 0,
      "total_count": 1,
      results: { $slice: [ "$data", 0, 50 ] }
    }
  }
])
When I execute this pipeline 5 times, it returns a different set of documents each time. It is a 3-node cluster with no sharding enabled and about 10 million documents. The data is static.
Any ideas about the inconsistent results? I feel I am missing some fundamentals here.

I can see 2 problems:
"ads.s": -1 will not work because ads is an array field; $sort does not apply inside an array field.
$addToSet will not maintain sort order, even if the input is ordered by a previous stage. The $addToSet documentation says "Order of the elements in the output array is unspecified", the $group accumulator docs for $addToSet say "Order of the array elements is undefined", and there are JIRA tickets SERVER-8512 and DOCS-1114 about this.
You can use the $setUnion operator to get an ascending-order result, and $reduce to reverse the $setUnion result into descending order.
As a workaround I am adding a solution below. I am not sure whether it is a good option, but you can use it if it does not affect the performance of your query. I am only showing the updated stages here.
These stages remain the same:
{ $match: {} }, // skipped
{ $group: {} }, // skipped
$sort is optional; it's up to your requirements whether you want to order by the main document:
{ $sort: { _id: -1 } },
$setUnion treats arrays as sets: it ignores duplicate entries and returns the array in ascending order of the first field of each element, which here is the s field specified in the $group stage (make sure every element has s as its first field). $reduce then iterates over the array and concatenates the current element $$this in front of the accumulated value $$value, which flips the array into descending order:
{
  $addFields: {
    ads: {
      $reduce: {
        input: { $setUnion: "$ads" },
        initialValue: [],
        in: { $concatArrays: [["$$this"], "$$value"] }
      }
    },
    files: {
      $reduce: {
        input: { $setUnion: "$files" },
        initialValue: [],
        in: { $concatArrays: [["$$this"], "$$value"] }
      }
    }
  }
},
The remaining stages are unchanged:
{ $group: {} }, // skipped
{ $project: {} } // skipped
Playground
The $setUnion documentation says "The order of the elements in the output array is unspecified.", but in every test I ran it returned ascending order perfectly; why, I don't know.
I asked in the MongoDB Developer Forum (does-setunion-expression-operator-order-array-elements-in-ascending-order?) and they replied that it does not guarantee order!
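If you are on MongoDB 5.2 or newer, a way to avoid relying on that unspecified order is to sort the deduplicated array explicitly with $sortArray. A minimal sketch of the relevant stage (my own variation, not from the original answer):
{
  $addFields: {
    ads: {
      $sortArray: {
        input: { $setUnion: "$ads" },  // $setUnion still deduplicates
        sortBy: { s: -1 }              // explicit, guaranteed descending order on s
      }
    }
  }
}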

Related

MongoDB document merge without a-priori knowledge of fields

I would like to merge several documents. Most of the fields have the same values but there might be one or two fields that have different values. These fields are unknown beforehand. Ideally I would like to merge all the documents keeping the fields that are the same as is but creating an array of values only for those fields that have some variation.
For my first approach I grouped by a field common to my documents and kept the first document; this, however, discards information that varies in the other fields.
group_documents = {
    "$group": {
        "_id": "$0020000E.Value",
        "doc": { "$first": "$$ROOT" }
    }
}
merge_documents = {
    "$replaceRoot": { "newRoot": "$doc" }
}
write_collection = { "$out": { "db": "database", "coll": "records_nd" } }

pipeline = [group_documents, merge_documents, write_collection]
objects = coll.aggregate(pipeline)
If the fields that have different values were known, I would have done something like this: merge_sol1, merge_sol2, or merge_sol3.
The third solution is actually very close to my desired output and I could tweak it a bit. But these answers assume a-priori knowledge of the fields to be merged.
You can first convert $$ROOT to an array of k-v tuples with $objectToArray. Then $group all fields with $addToSet to collect all distinct values of each field into an array. Then check the size of that array and conditionally pick the first item if the array size is 1 (i.e. the value is the same in every document for that field); otherwise keep the whole array. Finally, convert back to the original document form with $arrayToObject.
db.collection.aggregate([
  {
    $project: {
      _id: "$key",
      arr: { "$objectToArray": "$$ROOT" }
    }
  },
  { "$unwind": "$arr" },
  {
    $match: {
      "arr.k": { $nin: [ "key", "_id" ] }
    }
  },
  {
    $group: {
      _id: { id: "$_id", k: "$arr.k" },
      v: { "$addToSet": "$arr.v" }
    }
  },
  {
    $project: {
      _id: "$_id.id",
      arr: [
        {
          k: "$_id.k",
          v: {
            "$cond": {
              "if": { $gt: [ { $size: "$v" }, 1 ] },
              "then": "$v",
              "else": { $first: "$v" }
            }
          }
        }
      ]
    }
  },
  {
    "$project": {
      doc: { "$arrayToObject": "$arr" }
    }
  },
  {
    "$replaceRoot": {
      "newRoot": {
        "$mergeObjects": [ { _id: "$_id" }, "$doc" ]
      }
    }
  }
])
Mongo Playground

Find the distinct values in an array field, count them, and write them to another collection as an array of strings

Is it possible to get the distinct values of a field that is an array of strings and write the output of distinct to another collection, as shown below from src_coll to dst_coll?
src_coll
{"_id": ObjectId("61968a26c05149a23ad391f4"),"letters": ["aa", "ab", "ac", "ad", "aa", "af"] , "numbers":[11,12,13,14] }
{"_id": ObjectId("61968a26c05149a23ad391f5"),"letters": ["ab", "af", "ag", "ah", "ai", "aj"] , "numbers":[15,16,17,18] }
{"_id": ObjectId("61968a26c05149a23ad391f6"),"letters": ["ac", "ad", "ae", "af", "ag", "ah"] , "numbers":[16,17,18,19] }
{"_id": ObjectId("61968a26c05149a23ad391f7"),"letters": ["ae", "af", "ag", "ah", "ai", "aj"] , "numbers":[17,18,19,20] }
dst_coll
{"_id": ObjectId("61968a26c05149a23ad391f8"),"all_letters": ["aa", "ab", "ac", "ad", "ae", "af", "ag", "ah", "ai", "aj"] }
I have seen the answer using distinct:
db.src_coll.distinct('letters'), and using aggregate instead when the collection is huge (because I was getting the error Executor error during distinct command :: caused by :: distinct too big, 16mb cap). I used:
db.src_coll.aggregate([ { $group: { _id: "$letters" } }, { $count: "letters_count" } ], { allowDiskUse: true })
I do not know how to write the output of distinct or aggregate as shown in dst_coll.
My collection contains 522 documents with a total size of 314 MB, but the letters field contains thousands of strings per document.
I appreciate your time to reply.
Thanks
Method I
I am assuming you are trying to create a single document containing all the distinct values of the letters field across all documents in src_coll. You can create a collection from aggregation output using either $out or $merge, but $out replaces the target collection if it already exists.
Unwinding the array here may run out of memory, in which case you will have to use the { allowDiskUse: true } option.
db.collection.aggregate([
  { $unwind: "$letters" },
  {
    $group: {
      _id: null,
      all_letters: { "$addToSet": "$letters" }
    }
  },
  { $merge: { into: "dst_coll" } }
])
Demo
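As a usage note, the allowDiskUse option mentioned above goes in the second argument of aggregate(); a sketch in shell syntax, assuming the same pipeline as above:
db.src_coll.aggregate(
  [
    { $unwind: "$letters" },
    { $group: { _id: null, all_letters: { $addToSet: "$letters" } } },
    { $merge: { into: "dst_coll" } }
  ],
  { allowDiskUse: true } // lets the $group spill to disk instead of failing at the 100 MB stage limit
)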
Method II
Another way to do this without $unwind is to use the $reduce operator, which is more efficient.
db.collection.aggregate([
  {
    $group: {
      _id: null,
      all_letters: { "$addToSet": "$letters" }
    }
  },
  {
    $project: {
      "all_letters": {
        $reduce: {
          input: "$all_letters",
          initialValue: [],
          in: { $setUnion: [ "$$value", "$$this" ] }
        }
      }
    }
  },
  { $merge: { into: "dst_coll" } }
])
Demo
Method III
Since we are creating a single document from the whole collection with a group, large collections are likely to run into memory issues. A way to avoid this is to break the grouping into multiple stages, so that no single stage has to keep a large number of documents in memory.
db.collection.aggregate([
  { $unwind: "$letters" },
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 10000, // adjust the bucket count so that it outputs multiple documents, each covering a range of input documents
      output: {
        "all_letters": { "$addToSet": "$letters" }
      }
    }
  },
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 1000,
      output: {
        "all_letters": { "$addToSet": "$all_letters" }
      }
    }
  },
  {
    $project: {
      "all_letters": {
        $reduce: {
          input: "$all_letters",
          initialValue: [],
          in: { $setUnion: [ "$$value", "$$this" ] }
        }
      }
    }
  },
  {
    $group: {
      _id: null,
      all_letters: { "$addToSet": "$all_letters" }
    }
  },
  {
    $project: {
      "all_letters": {
        $reduce: {
          input: "$all_letters",
          initialValue: [],
          in: { $setUnion: [ "$$value", "$$this" ] }
        }
      }
    }
  },
  { $merge: { into: "dst_coll" } }
])
Refer to $bucketAuto and Aggregation Pipeline Limits.
Demo
Here's my solution, but I'm not sure if it's the optimal way.
Algorithm:
Unwind the letters array
Group by letters, which yields only unique values
Group again to get a single result
Use the $out stage to write the result to another collection
Aggregation pipeline:
db.collection.aggregate([
  {
    $project: {
      letters: 1,
      _id: 0
    }
  },
  { $unwind: "$letters" },
  { $group: { _id: "$letters" } },
  {
    $group: {
      _id: null,
      allLetters: { "$addToSet": "$_id" }
    }
  },
  { $out: "your-collection-name" }
])
Kindly see the docs for the $out stage yourself.
See the solution on mongodb playground: Query

How to aggregate two collections and match a field with an array

I need to group the results of two collections candidatos and ofertas, and then "merge" those groups to return an array with matched values.
I've created this example with the aggregate and similar data to make this easier to test:
https://mongoplayground.net/p/m0PUfdjEye4
This is the explanation of the problem that I'm facing.
I can get both groups with the desired results independently:
ofertas collection:
db.getCollection('ofertas').aggregate([
  { "$group": { _id: "$ubicacion_puesto.provincia", countProvinciaOferta: { $sum: 1 } } }
]);
This is the result...
candidatos collection:
db.getCollection('candidatos').aggregate([
  { "$group": { _id: "$que_busco.ubicacion_puesto_trabajo.provincia", countProvinciaCandidato: { $sum: 1 } } }
]);
This is the result...
What I need to do is combine those groups, merging their results where their _id values coincide. I think I'm going the right way with the next aggregate, but the field countOfertas always returns 0.0. I think there is something wrong in my $project $cond, but I don't know what it is. This is the aggregate:
db.getCollection('candidatos').aggregate([
  { "$group": { _id: "$que_busco.ubicacion_puesto_trabajo.provincia", countProvinciaCandidato: { $sum: 1 } } },
  {
    $lookup: {
      from: 'ofertas',
      let: {},
      pipeline: [
        { "$group": { _id: "$ubicacion_puesto.provincia", countProvinciaOferta: { $sum: 1 } } }
      ],
      as: 'ofertas'
    }
  },
  {
    $project: {
      _id: 1,
      countProvinciaCandidato: 1,
      countOfertas: {
        $cond: {
          if: { $eq: ['$ofertas._id', "$_id"] },
          then: '$ofertas.countProvinciaOferta',
          else: 0
        }
      }
    }
  },
  { $sort: { "countProvinciaCandidato": -1 } },
  { $limit: 20 }
]);
And this is the result; as you can see, the field countOfertas is always 0.
Any kind of help will be welcome.
What you have tried is much appreciated, but in $project you need to use $reduce, which loops through the array and applies the condition.
Here is the code:
db.candidatos.aggregate([
  {
    "$group": {
      _id: "$que_busco.ubicacion_puesto_trabajo.provincia",
      countProvinciaCandidato: { $sum: 1 }
    }
  },
  {
    $lookup: {
      from: "ofertas",
      let: {},
      pipeline: [
        {
          "$group": {
            _id: "$ubicacion_puesto.provincia",
            countProvinciaOferta: { $sum: 1 }
          }
        }
      ],
      as: "ofertas"
    }
  },
  {
    $project: {
      _id: 1,
      countProvinciaCandidato: 1,
      countOfertas: {
        "$reduce": {
          "input": "$ofertas",
          initialValue: 0,
          "in": {
            $cond: [
              { $eq: [ "$$this._id", "$_id" ] },
              { $add: [ "$$value", "$$this.countProvinciaOferta" ] }, // add the matching group's count
              "$$value"
            ]
          }
        }
      }
    }
  },
  { $sort: { "countProvinciaCandidato": -1 } },
  { $limit: 20 }
])
Working Mongo playground
Note: if you need to do this with aggregation only, this is fine, but I personally feel this approach is not good. My suggestion is to run the two group aggregations concurrently from your service and combine the results programmatically, because $lookup is expensive and performance will degrade once you have massive data.
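A rough sketch of that suggestion using the Node.js driver (collection names from the question; the in-memory join is only illustrative):
const [candidatos, ofertas] = await Promise.all([
  db.collection("candidatos").aggregate([
    { $group: { _id: "$que_busco.ubicacion_puesto_trabajo.provincia",
                countProvinciaCandidato: { $sum: 1 } } }
  ]).toArray(),
  db.collection("ofertas").aggregate([
    { $group: { _id: "$ubicacion_puesto.provincia",
                countProvinciaOferta: { $sum: 1 } } }
  ]).toArray()
]);

// Join the two result sets in memory on _id (the provincia)
const ofertaCounts = new Map(ofertas.map(o => [o._id, o.countProvinciaOferta]));
const merged = candidatos.map(c => ({
  ...c,
  countOfertas: ofertaCounts.get(c._id) ?? 0
}));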
The $eq in the $cond compares an array to a single value, so it never matches.
The $lookup stage puts its results in the ofertas field as an array of documents, so '$ofertas._id' is an array of all the _id values.
You will probably need to use $unwind or $reduce after the $lookup.
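A sketch of that $unwind-based alternative (same stages as the question up to the $lookup; the $filter/$ifNull finishing touches are one possible way to do it, not tested against the original data):
db.candidatos.aggregate([
  { $group: { _id: "$que_busco.ubicacion_puesto_trabajo.provincia",
              countProvinciaCandidato: { $sum: 1 } } },
  {
    $lookup: {
      from: "ofertas",
      pipeline: [
        { $group: { _id: "$ubicacion_puesto.provincia",
                    countProvinciaOferta: { $sum: 1 } } }
      ],
      as: "ofertas"
    }
  },
  // keep only the oferta group whose _id matches this candidato group
  {
    $addFields: {
      ofertas: {
        $filter: { input: "$ofertas", cond: { $eq: [ "$$this._id", "$_id" ] } }
      }
    }
  },
  { $unwind: { path: "$ofertas", preserveNullAndEmptyArrays: true } },
  {
    $project: {
      countProvinciaCandidato: 1,
      countOfertas: { $ifNull: [ "$ofertas.countProvinciaOferta", 0 ] }
    }
  }
])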

aggregate with unwind, how to limit per document and not globally? (mongodb)

I have a collection with 300 documents; each document has an array field called items (each element of the array is an object), something like this:
DOCUMENT 1:
  _id: **********,
  title: "test",
  desc: "test desc",
  items: (array)
    0: (object)
      title: (string)
      tags: (array of strings)
    1: (object)
    etc.
and I need to retrieve items by tags. What I'm using is the query below; I have to $limit the results to something like 200 or the result set is too big. The problem is that if the first document has more than 200 items, only items from that document are returned. What I'd need is to limit results PER document: for instance, retrieve 5 items for each distinct document whose tags match ($all) the tags provided.
const foundItems = await db.collection('store').aggregate([
  { $unwind: '$items' },
  {
    $match: {
      'items.tags': { $all: tagsArray }
    }
  },
  {
    $project: {
      myitem: '$items',
      desc: 1,
      title: 1
    }
  },
  { $limit: 200 }
]).toArray()
To make it clearer and simpler, in an ideal world what I'd need would be something like:
{
  $limit: 5,
  $per: _id,
  $totalLimit: 200
}
instead of $limit: 200. Is this achievable somehow? I didn't find any explanation about it in the official documentation.
What I tried is adding $sort right before the $limit, which would make sense for the behaviour I'm looking for (and maybe not if placed AFTER the limit), but unfortunately it doesn't work that way; placed before or after the limit, it makes no difference.
And I can't really use $sample, since the results are more than 5% of the collection.
Updated demo - https://mongoplayground.net/p/nM6T9XVa-XK
db.collection.aggregate([
  { $unwind: "$items" },
  {
    $match: {
      "items.tags": { $all: [ "a", "b" ] }
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "myitem": { "$push": "$items" },
      desc: { "$first": "$desc" },
      title: { "$first": "$title" }
    }
  },
  {
    "$project": {
      "_id": 1,
      desc: 1,
      title: 1,
      "myitem": { $slice: [ "$myitem", 2 ] }
    }
  },
  { $unwind: "$myitem" }
])
Demo - https://mongoplayground.net/p/BESptnyUfSS
After matching the records you can $group them by _id, $project them, and limit the items per group using $slice:
db.collection.aggregate([
  { $unwind: "$items" },
  {
    $match: {
      "items.tags": { $all: [ "a", "b" ] }
    }
  },
  {
    $project: {
      _id: 1,
      myitem: "$items",
      desc: 1,
      title: 1
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "myitem": { "$push": "$myitem" }
    }
  },
  {
    "$project": {
      "_id": 1,
      "myitem": {
        $slice: [ "$myitem", 1 ] // limit records here per group / id
      }
    }
  }
])
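On MongoDB 5.2+ there is also a more direct option: the $topN group accumulator limits the pushed items per group in a single stage. A sketch assuming you want 5 items per document ordered by item title (adjust n and sortBy to your needs):
db.collection.aggregate([
  { $unwind: "$items" },
  { $match: { "items.tags": { $all: [ "a", "b" ] } } },
  {
    $group: {
      _id: "$_id",
      desc: { $first: "$desc" },
      title: { $first: "$title" },
      myitem: { $topN: { n: 5, sortBy: { "items.title": 1 }, output: "$items" } }
    }
  },
  { $unwind: "$myitem" },
  { $limit: 200 } // overall cap, applied after the per-document limit
])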

MongoDB: problem adding two values inside a nested document with a dynamic key

I wish to add currentAsset.total and longTermAsset.total into a new field, for each of my child documents with dynamic keys ("0", "1", ...). My current MongoDB version is 4.0.12.
My source document is as below:
{
  "_id": "5f44bc4c36ac3e2c8c6db4bd",
  "counter": "Apple",
  "balancesheet": {
    "0": {
      "currentAsset": { "total": 123.12 },
      "longTermAsset": { "total": 10.16 }
    },
    "1": {
      "currentAsset": { "total": 10.23 },
      "longTermAsset": { "total": 36.28 }
    }
  }
}
The result document I wanted to get is:
{
  "_id": "5f44bc4c36ac3e2c8c6db4bd",
  "counter": "Apple",
  "balancesheet": {
    "0": {
      "currentAsset": { "total": 123.12 },
      "longTermAsset": { "total": 10.16 },
      "totalAsset": 133.28
    },
    "1": {
      "currentAsset": { "total": 10.23 },
      "longTermAsset": { "total": 36.28 },
      "totalAsset": 46.51
    }
  }
}
I have tried a few aggregations but failed, as it gives me "errmsg" : "$add only supports numeric or date types, not array":
db.balancesheets.aggregate([
  { $match: { counter: "Apple" } },
  {
    $project: {
      bs: { $objectToArray: "$balancesheet" }
    }
  },
  {
    $addFields: {
      totalAsset: {
        $add: ["$bs.k.currentAsset.total", "$bs.k.longTermAsset.total"]
      }
    }
  }
])
As I refer to this, it seems the version needs to be 4.2 or above. Is there any way to do it on my existing 4.0.12?
MongoDB Aggregation: add field from an embedded document via a dynamic field path
There is no version issue; it just needs a few fixes.
The first 2 pipeline stages look good. Then:
$unwind deconstructs the bs array
$addFields corrected: you used k instead of v when accessing the total fields
$group reconstructs the documents, pushing each array entry converted back to an object via $arrayToObject
$addFields merges the bs array of single-key objects into one object using $reduce
db.collection.aggregate([
  // $match ... pipeline
  // $project ... pipeline
  // unwind bs array
  { $unwind: "$bs" },
  {
    $addFields: {
      "bs.v.totalAsset": { $add: ["$bs.v.currentAsset.total", "$bs.v.longTermAsset.total"] }
    }
  },
  {
    $group: {
      _id: "$_id",
      bs: { $push: { $arrayToObject: [["$bs"]] } },
      counter: { $first: "$counter" }
    }
  },
  {
    $addFields: {
      bs: {
        $reduce: {
          input: "$bs",
          initialValue: {},
          in: { $mergeObjects: ["$$value", "$$this"] }
        }
      }
    }
  }
])
Playground
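For what it's worth, the same result can also be computed without $unwind, and every operator involved ($objectToArray, $map, $mergeObjects, $arrayToObject) is available on 4.0.12. A sketch, not tested against the original data:
db.balancesheets.aggregate([
  { $match: { counter: "Apple" } },
  {
    $addFields: {
      balancesheet: {
        $arrayToObject: {
          $map: {
            input: { $objectToArray: "$balancesheet" },
            in: {
              k: "$$this.k",
              v: {
                $mergeObjects: [
                  "$$this.v",
                  { totalAsset: { $add: [ "$$this.v.currentAsset.total", "$$this.v.longTermAsset.total" ] } }
                ]
              }
            }
          }
        }
      }
    }
  }
])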