Match and Average in mongo keep producing null - mongodb

I'm using the console to run an aggregation: a $match stage checks that a nested field exists, and a $group stage then applies the $avg operator to it. The $match works fine on that field, and a count over the same field works too, but the average comes back null every time.
The path goes through an array, using .0 to pick the first element and then a field of that element. As far as I can tell from distinct, the values at that path are all numeric. Are there any suggestions for how to debug this?
db.b.aggregate([ {$match: {"x.x.x.0.x": {$exists: true} } }, {$group: {_id: null, myAvg: { $avg: "$x.x.x.0.x"}}}])
Results in:
{ "_id" : null, "myAvg" : null }

This appears to be a limitation of the aggregation framework with respect to where you can actually use the "array.n" notation to access the nth element of an array.
More precisely, given the following sample document:
db.test.insertOne({
"a" : [
{
"x" : 1.0
}
]
})
...you can do the following to retrieve all documents where the "x" field of the first element of the "a" array equals 1:
db.test.aggregate({
$match: {
"a.0.x": 1
}
})
However, you cannot run the following:
db.test.aggregate({
$project: {
"a0x": "$a.0.x"
}
})
Well, you can, but somewhat surprisingly it returns an empty array:
{
"_id" : ...,
"a0x" : []
}
However, there is a special operator $arrayElemAt to access the nth element in this case like so:
db.test.aggregate({
$project: {
"a0x": { $arrayElemAt: [ "$a.x", 0 ] },
}
})
Kindly note that this will return the nth element only - so not nested inside an array anymore:
{
"a0x" : 1.0
}
So what you probably want to do is this:
db.b.aggregate({
$group: {
_id: null,
myAvg: {
$avg: {
$arrayElemAt: [ "$x.x.x.x", 0 ]
}
}
}
})
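To see why the original pipeline produced null rather than an error, it may help to emulate the two behaviours in plain JavaScript. The sketch below runs in Node.js against in-memory objects, not against MongoDB. The names avgLikeGroup and docs are hypothetical, and it assumes, as described above, that "$a.0.x" resolves to an empty array inside an expression and that $avg in $group skips non-numeric values, returning null when nothing numeric is left.

```javascript
// In-memory emulation only; none of this is MongoDB API.

// Assumed $avg behaviour in $group: skip non-numeric values,
// and return null when no numeric value remains.
function avgLikeGroup(values) {
  const nums = values.filter((v) => typeof v === "number");
  if (nums.length === 0) return null;
  return nums.reduce((sum, v) => sum + v, 0) / nums.length;
}

// Hypothetical sample documents shaped like the test collection above.
const docs = [{ a: [{ x: 1 }, { x: 5 }] }, { a: [{ x: 3 }] }];

// "$a.0.x" in an expression: every document contributes [] (non-numeric).
const viaDotIndex = docs.map(() => []);

// { $arrayElemAt: ["$a.x", 0] }: "$a.x" collects every x, then index 0 is taken.
const viaArrayElemAt = docs.map((doc) => doc.a.map((elem) => elem.x)[0]);

console.log(avgLikeGroup(viaDotIndex));    // null
console.log(avgLikeGroup(viaArrayElemAt)); // 2
```

If the real pipeline still returns null after switching to $arrayElemAt, a $project of the same expression is one way to inspect what each document actually contributes.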

Related

How do I find the total number of subjects that have no prerequisites using aggregation?

I have tried several queries, but none of them worked.
Here is an example from the database.
One subject has a prerequisite and one does not, and I would like to find the total number of subjects with no prerequisites:
db.Subject.insert(
{
"_id":ObjectId(),
"subject":{
"subCode":"CSCI321",
"subTitle":"Final Year Project",
"credit":6,
"type":"Core",
"assessments": [
{ "assessNum": 1,
"weight":30,
"assessType":"Presentation",
"description":"Prototype demonstration" },
{ "assignNum": 2,
"weight":70,
"assessType":"Implementation and Presentation",
"description":"Final product Presentation and assessment of product implementation by panel of project supervisors" }
]
}
}
)
db.Subject.insert(
{
"_id":ObjectId(),
"subject":{
"subCode":"CSCI203",
"subTitle":"Algorithm and Data Structures",
"credit":3,
"type":"Core",
"prerequisite": ["csci103"]
}})
One of the queries that I tried:
db.Subject.aggregate({$group:{"prerequisite":{"$exists": null}, count:{$sum:1}}});
Results :
_getErrorWithCode#src/mongo/shell/utils.js:25:13
doassert#src/mongo/shell/assert.js:18:14
_assertCommandWorked#src/mongo/shell/assert.js:534:17
assert.commandWorked#src/mongo/shell/assert.js:618:16
DB.prototype._runAggregate#src/mongo/shell/db.js:260:9
DBCollection.prototype.aggregate#src/mongo/shell/collection.js:1062:12
#(shell):1:1
You can use $match to eliminate unwanted documents and $group to calculate the sum:
db.collection.aggregate([
{
$match: {
"subject.prerequisite": {
"$exists": false
}
}
},
{
$group: {
_id: null,
total: {
$sum: 1
}
}
}
])
This can be achieved within a single aggregation pipeline stage, the $group step, by using the BSON type comparison order to separate the documents where the 'subject.prerequisite' field is missing or empty from those where it exists and has at least one element. The condition can be used as the group-by key, i.e. the _id field in $group.
Consider running the following aggregation pipeline to get the desired results:
db.Subject.aggregate([
{ $group: {
_id: {
$cond: [
{
$or: [
{ $lte: ['$subject.prerequisite', null] },
{
$eq: [
{ $size: { $ifNull: ['$subject.prerequisite', [] ] } },
0
]
}
]
},
'noPrerequisite',
'havePrerequisite'
]
},
count: { $sum: 1 }
} }
])
The first condition in the $or returns true if a document does not have the embedded prerequisite field (or has it set to null); the second returns true when the array is empty:
if length of ( prerequisite || [] ) is zero
In the above, $cond takes a logical condition as its first argument (if), and returns the second argument when the condition is true (then) or the third argument when it is false (else). Used as the _id expression in $group, it puts every document into one of two groups, keyed "noPrerequisite" (condition true) or "havePrerequisite" (condition false).
The results contain one count for documents where the prerequisite field is missing or is an empty array, and one for documents that have prerequisites.
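The grouping logic can be checked outside the database with a small Node.js emulation. This is only a sketch over in-memory objects: prerequisiteKey, countByKey and the trimmed subjects array are hypothetical names introduced here, and non-array prerequisite values are simply treated as "havePrerequisite".

```javascript
// In-memory emulation only; none of this is MongoDB API.

// Mirrors the $cond expression: a missing, null, or empty
// subject.prerequisite maps to "noPrerequisite".
function prerequisiteKey(doc) {
  const prereq = doc.subject ? doc.subject.prerequisite : undefined;
  const missing = prereq === undefined || prereq === null;
  const empty = Array.isArray(prereq) && prereq.length === 0;
  return missing || empty ? "noPrerequisite" : "havePrerequisite";
}

// Emulates { $group: { _id: <key>, count: { $sum: 1 } } }.
function countByKey(docs) {
  const counts = {};
  for (const doc of docs) {
    const key = prerequisiteKey(doc);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Trimmed copies of the two subjects from the question.
const subjects = [
  { subject: { subCode: "CSCI321", credit: 6 } },
  { subject: { subCode: "CSCI203", credit: 3, prerequisite: ["csci103"] } },
];

console.log(countByKey(subjects));
// { noPrerequisite: 1, havePrerequisite: 1 }
```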

Filter only documents that have ALL FIELDS non null (with aggregation framework)

I have many documents, and I want to get only the documents in which ALL FIELDS are non-null.
Suppose I have these documents:
[
{
'a': 1,
'b': 2,
'c': 3
},
{
'a': 9,
'b': 12
},
{
'a': 5
}
]
Of these, only the first document has ALL FIELDS non-null, so the filter should return only that one. How can I do this?
You could list every field in the filter, like this: { a: {$exists : true}, b : {$exists : true}, c : {$exists : true}}, but that becomes unwieldy once a document has dozens of fields. If you don't want to list them all, you can try the following hack, provided it performs well enough. It assumes a fixed schema: every document may contain only the fields a, b and c (plus the default _id) and nothing else.
If you know the total number of fields, you can check the field count of each document, which tells you whether all fields exist:
db.collection.aggregate([
/** add a new field which counts no.of fields in the document */
{
$addFields: { count: { $size: { $objectToArray: "$$ROOT" } } }
},
{
$match: { count: { $eq: 4 } } // we've 4 as 3 fields + _id
},
{
$project: { count: 0 }
}
])
Note: this only checks that the fields exist; it does not check for empty values such as null, [] or '' in those fields. It also does not look inside nested fields.
If you instead want to check that all fields exist by name, and you can pass the field names as input, try the query below:
db.collection.aggregate([
/** create a field with all keys/field names in the document */
{
$addFields: {
data: {
$let: {
vars: { data: { $objectToArray: "$$ROOT" } },
in: "$$data.k"
}
}
}
},
{
$match: { data: { $all: [ "b", "c", "a" ] } } /** List down all the field names from schema */
},
{
$project: { data: 0 }
}
])
You can use explain to check your query's performance.
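Since both pipelines above check only that fields exist, here is a rough Node.js sketch of the stricter "present and non-null" rule from the question. It works on in-memory objects rather than a collection; hasAllFields, sampleDocs and the hard-coded field list are assumptions made for illustration.

```javascript
// In-memory emulation only; none of this is MongoDB API.

// True when every required field is present and neither null nor undefined.
function hasAllFields(doc, requiredFields) {
  return requiredFields.every(
    (name) => doc[name] !== undefined && doc[name] !== null
  );
}

// The three documents from the question, plus one with an explicit null.
const sampleDocs = [
  { a: 1, b: 2, c: 3 },
  { a: 9, b: 12 },
  { a: 5 },
  { a: 1, b: null, c: 3 },
];

const complete = sampleDocs.filter((doc) => hasAllFields(doc, ["a", "b", "c"]));
console.log(complete); // [ { a: 1, b: 2, c: 3 } ]
```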

How to get all subdocuments _id into variable

I'm trying to get the _ids of the families subdocuments into a variable.
Here is my schema:
families: [
{
_id: {
type: mongoose.Types.ObjectId
},
name: {
type: String
},
relation: {
type: String
}
}
]
The problem is that I can get the parent's _id into a variable, but when I try to get the families _ids, the console log shows undefined.
What is the proper query to get the families subdocuments' _ids into a variable?
Please try this:
db.yourCollection.aggregate([
{ $unwind: '$families' },
{ $project: { Ids: '$families._id' } },
{ $group: { '_id': '$_id', subDocumentsIDs: { $push: '$Ids' } } }
])
Output:
/* 1 */
{
"_id" : ObjectId("5d58d3205a0d22d3c85d16f1"),
"subDocumentsIDs" : [
ObjectId("5d570b350e2fb4f72533d512"),
ObjectId("5d570b350e2fb4f71533d510"),
ObjectId("5d570b350e2fb4172533d511")
]
}
/* 2 */
{
"_id" : ObjectId("5d58d3105a0d22d3c85d1591"),
"subDocumentsIDs" : [
ObjectId("5d570b350e2fb4f72533d312"),
ObjectId("5d570b350e2fb4f71533d310"),
ObjectId("5d570b350e2fb4172533d311")
]
}
Please treat this as a basic example and extend it as needed. Using $unwind as an early stage can hurt performance on a large collection, but you can avoid that by adding $match as the first stage: since you said you can already get the parent _id, use it in $match to filter the documents first.
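If the parent document is already loaded in application code, one possible cause of the undefined is reading _id directly off the families array instead of off its elements. The sketch below uses plain objects in Node.js to show the difference; the parent object, its placeholder string ids and the names are all hypothetical, and this is only a guess at what the original code did.

```javascript
// Plain objects standing in for a loaded parent document; the string ids are
// placeholders, not real ObjectId values.
const parent = {
  _id: "parent-1",
  families: [
    { _id: "fam-1", name: "Ann", relation: "mother" },
    { _id: "fam-2", name: "Bob", relation: "father" },
  ],
};

// families is an array, so parent.families._id is undefined;
// map over the elements to collect each subdocument's _id.
const familyIds = parent.families.map((member) => member._id);

console.log(parent.families._id); // undefined
console.log(familyIds);           // [ 'fam-1', 'fam-2' ]
```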

Average array of arrays MongoDB

I have this collection in MongoDB
{
{"values" : [1,2,3,4,5,6]},
{"values" : [7,8,9,10,11,12]},
{"values" : [13,14,15,16,17,18]}
}
How can I aggregate these and get an array with the average at each index?
Like this:
{ "average" : [7,8,9,10,11,12] }
Note: average[0] = (1 + 7 + 13) / 3
Regards,
You can use Aggregation Framework and $avg.
$avg can be used in $project or $group.
https://docs.mongodb.com/manual/reference/operator/aggregation/avg/
With a single expression as its operand, if the expression resolves to
an array, $avg traverses into the array to operate on the numerical
elements of the array to return a single value. With a list of
expressions as its operand, if any of the expressions resolves to an
array, $avg does not traverse into the array but instead treats the
array as a non-numerical value.
UPDATE #2:
Since the problem is now clearer, I have updated my answer.
db.stackoverflow027.aggregate([
{
$match: {
"message.testnr":"1111"
}
},
{
$unwind: {
path: "$message.content.deflection",
includeArrayIndex: "position"
}
},
{
$group: {
_id: "$position",
averageForIndex: {$avg: "$message.content.deflection"}/*,
debug_totalIndexInvolvedInTheAverage: {$sum: 1},
debug_valueInvolvedInTheAverage: {$push: "$message.content.deflection"},
debug_documentInvolvedInTheAverage: {$push: "$$ROOT"}*/
}
},
{
$sort: {_id:1}
},
{
$group: {
_id: null,
average: {$push: "$averageForIndex"}
}
}
], { allowDiskUse: true });
That will give you this output:
{
"_id" : null,
"average" : [
6.0,
7.0,
8.0,
9.0,
10.0
]
}
I also added { allowDiskUse: true } to avoid memory limitations.
I hope this solves your problem.
The commented-out "debug_" properties are there to help you see what really happens at each $group step. You can remove them in a production environment.
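As a cross-check on what the $unwind with includeArrayIndex followed by $group is meant to compute, here is a small Node.js emulation over the "values" arrays from the question. It is a sketch on in-memory data only; averageByIndex and collection are hypothetical names.

```javascript
// In-memory emulation only; none of this is MongoDB API.

// Sums and counts are tracked per array position, which is what grouping
// on the unwound array index does in the pipeline.
function averageByIndex(docs) {
  const sums = [];
  const counts = [];
  for (const doc of docs) {
    doc.values.forEach((value, position) => {
      sums[position] = (sums[position] || 0) + value;
      counts[position] = (counts[position] || 0) + 1;
    });
  }
  return sums.map((sum, position) => sum / counts[position]);
}

// The three documents from the question.
const collection = [
  { values: [1, 2, 3, 4, 5, 6] },
  { values: [7, 8, 9, 10, 11, 12] },
  { values: [13, 14, 15, 16, 17, 18] },
];

console.log(averageByIndex(collection)); // [ 7, 8, 9, 10, 11, 12 ]
```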

Remove duplicate in MongoDB

I have a collection with the field called "contact_id".
In my collection I have duplicate records with this key.
How can I remove the duplicates, so that just one record remains?
I already tried:
db.PersonDuplicate.ensureIndex({"contact_id": 1}, {unique: true, dropDups: true})
But it did not work, because the dropDups option is no longer available in MongoDB 3.x.
I'm using 3.2.
Yes, dropDups is gone for good. But you can definitely achieve your goal with a little bit of effort.
You first need to find all duplicate documents and then remove all but the first of each.
db.dups.aggregate([{$group:{_id:"$contact_id", dups:{$push:"$_id"}, count: {$sum: 1}}},
{$match:{count: {$gt: 1}}}
]).forEach(function(doc){
doc.dups.shift();
db.dups.remove({_id : {$in: doc.dups}});
});
As you can see, doc.dups.shift() removes the first _id from the array, and the remove call then deletes all documents with the remaining _ids in the dups array.
The script above removes all duplicate documents.
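Before running a destructive script like this, it can be worth checking the selection logic on a small in-memory sample. The Node.js sketch below emulates the group, shift and collect steps without touching a database; findDuplicateIds and the people array are hypothetical, and it keeps the first document seen per key, whereas the order of ids pushed by $group may differ in practice.

```javascript
// In-memory emulation only; none of this is MongoDB API.

// Returns the _ids that would be removed: for each key that occurs more
// than once, every _id except the first one seen.
function findDuplicateIds(docs, keyField) {
  const idsByKey = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!idsByKey.has(key)) idsByKey.set(key, []);
    idsByKey.get(key).push(doc._id);
  }
  const toRemove = [];
  for (const ids of idsByKey.values()) {
    if (ids.length > 1) {
      ids.shift(); // keep the first _id of the group
      toRemove.push(...ids);
    }
  }
  return toRemove;
}

// Hypothetical sample data.
const people = [
  { _id: 1, contact_id: "A" },
  { _id: 2, contact_id: "B" },
  { _id: 3, contact_id: "A" },
  { _id: 4, contact_id: "A" },
];

console.log(findDuplicateIds(people, "contact_id")); // [ 3, 4 ]
```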
This is a good pattern for MongoDB 3+ that also ensures you will not run out of memory, which can happen with really big collections. You can save it to a dedup.js file, customize it, and run it against your desired database with: mongo localhost:27017/YOURDB dedup.js
var duplicates = [];
db.runCommand(
{aggregate: "YOURCOLLECTION",
pipeline: [
{ $group: { _id: { DUPEFIELD: "$DUPEFIELD"}, dups: { "$addToSet": "$_id" }, count: { "$sum": 1 } }},
{ $match: { count: { "$gt": 1 }}}
],
allowDiskUse: true }
)
.result
.forEach(function(doc) {
doc.dups.shift();
doc.dups.forEach(function(dupId){ duplicates.push(dupId); })
})
printjson(duplicates); //optional print the list of duplicates to be removed
db.YOURCOLLECTION.remove({_id:{$in:duplicates}});
We can also use an $out stage to remove duplicates from a collection by replacing the content of the collection with only one occurrence per duplicate.
For instance, to only keep one element per value of x:
// > db.collection.find()
// { "x" : "a", "y" : 27 }
// { "x" : "a", "y" : 4 }
// { "x" : "b", "y" : 12 }
db.collection.aggregate(
{ $group: { _id: "$x", onlyOne: { $first: "$$ROOT" } } },
{ $replaceWith: "$onlyOne" }, // prior to 4.2: { $replaceRoot: { newRoot: "$onlyOne" } }
{ $out: "collection" }
)
// > db.collection.find()
// { "x" : "a", "y" : 27 }
// { "x" : "b", "y" : 12 }
This:
$group groups documents by the field that defines what a duplicate is (here x) and keeps only one document per group (the $first one found), storing the whole document, $$ROOT, in onlyOne. At the end of this stage, we have something like:
{ "_id" : "a", "onlyOne" : { "x" : "a", "y" : 27 } }
{ "_id" : "b", "onlyOne" : { "x" : "b", "y" : 12 } }
$replaceWith replaces all existing fields in the input document with the content of the onlyOne field created in the $group stage, restoring the original format. At the end of this stage, we have something like:
{ "x" : "a", "y" : 27 }
{ "x" : "b", "y" : 12 }
$replaceWith is only available starting in Mongo 4.2. With prior versions, we can use $replaceRoot instead:
{ $replaceRoot: { newRoot: "$onlyOne" } }
$out writes the result of the aggregation pipeline to the same collection. Note that $out conveniently replaces the content of the specified collection, which is what makes this solution possible.
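The effect of the three stages can be previewed on in-memory data with a short Node.js sketch. It is only an emulation: keepFirstPerKey, before and after are hypothetical names, and "first" here means first in array order, which a real collection does not guarantee without a sort.

```javascript
// In-memory emulation only; none of this is MongoDB API.

// Keeps the first document seen for each value of keyField, roughly what
// $group with $first: "$$ROOT" followed by $replaceWith produces.
function keepFirstPerKey(docs, keyField) {
  const firstByKey = new Map();
  for (const doc of docs) {
    if (!firstByKey.has(doc[keyField])) firstByKey.set(doc[keyField], doc);
  }
  return [...firstByKey.values()];
}

// The sample collection from the comments above.
const before = [
  { x: "a", y: 27 },
  { x: "a", y: 4 },
  { x: "b", y: 12 },
];

const after = keepFirstPerKey(before, "x");
console.log(after); // [ { x: 'a', y: 27 }, { x: 'b', y: 12 } ]
```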
Maybe it would be worth trying to create a temporary collection, create a unique index on it, copy the data from the source, and as a last step swap the names?
Another idea I had is to collect the duplicated key values into an array (using aggregation) and then loop through it, calling the remove() method with the justOne parameter set to true or 1.
var itemsToDelete = db.PersonDuplicate.aggregate([
{$group: { _id:"$contact_id", count:{$sum:1}}}, // group by the duplicated key; _id itself is always unique
{$match: {count: {$gt:1}}},
{$group: { _id:1, ids:{$addToSet:"$_id"}}}
])
and then loop through the ids array.
Does this make sense to you?
I have used this approach:
Take a mongodump of the particular collection.
Clear that collection.
Add a unique index on the key.
Restore the dump using mongorestore.