I have a collection where a sample document has the following shape:
{
"document_1": {
"field_1": {},
"array_1": {
"subobject_1": {
"subobject_field_1": "true",
"subobject_field_2": {},
"subobject_field_3": {}
},
"subobject_2": {
"subobject_field_1": "false",
"subobject_field_2": {},
"subobject_field_3": {}
}
}
}
}
The number of subobjects (subobject_) under array_1 varies and is not the same for all documents. I am trying to make a query that, for each document, counts the number of subobjects where subobject_field_1 is true. I also want to be able to specify exactly which fields to return, and set additional conditions (in this case the additional condition would be that field_1 = "A"). In this case, the output would look like:
{
"document_1": {
"field_1": "A",
"array_1": 1
}
}
I have tried the code below, but that only gives me the number of subobjects regardless of whether subobject_field_1 is true or false.
db.getCollection('myCollection').aggregate([
{
$match: {field_1: 'A'}
},
{
$project: {field_1: 1, array_1: {$size: "$array_1"}}
}
])
Thanks in advance!
You can use the aggregation below:
db.collection.aggregate([
  { "$match": { "field_1": "A" }},
  { "$addFields": {
    // Overwrite array_1 with the number of subobjects whose flag is "true"
    "array_1": {
      "$size": {
        "$filter": {
          "input": { "$objectToArray": "$array_1" },
          "cond": { "$eq": ["$$this.v.subobject_field_1", "true"] }
        }
      }
    }
  }}
])
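As a rough illustration of what the $objectToArray / $filter / $size combination computes, here is a plain JavaScript sketch that runs outside MongoDB. The sample document (flattened, with only the relevant fields) and the helper name countTrueSubobjects are assumptions made for this example only.

```javascript
// Hypothetical sample document, flattened from the shape in the question.
const doc = {
  field_1: "A",
  array_1: {
    subobject_1: { subobject_field_1: "true" },
    subobject_2: { subobject_field_1: "false" }
  }
};

// Mirrors $objectToArray (object -> list of { k, v } pairs),
// then $filter on v.subobject_field_1, then $size.
function countTrueSubobjects(d) {
  const pairs = Object.entries(d.array_1 || {}).map(([k, v]) => ({ k, v }));
  return pairs.filter(p => p.v.subobject_field_1 === "true").length;
}

const result = { field_1: doc.field_1, array_1: countTrueSubobjects(doc) };
console.log(result); // { field_1: 'A', array_1: 1 }
```

Note that the flag is compared against the string "true", because that is how it is stored in the sample document.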
Consider I have the following document structure:
{
"_id": ObjectID(),
"foo": "FOO",
"bar": "BAR",
"items": [
{
"foo": "FOO",
"bar": "BAR",
"name": "hello",
"value": "50"
},
{
"foo": "FOO",
"bar": "BAR",
"name": "bye",
"value": "300"
},
{
"foo": "FOO",
"bar": "BAR",
"name": "welcome",
"value": "500"
}
],
}
I would like to find all items that match both the following conditions:
name = "hello"
value != 0
And for each matched item I would like to return only the value field. I don't need all the other fields (foo/bar in this example).
So the ideal result should look like this:
[
{ value: "50" },
{ value: "100" },
{ value: "30" },
…
]
How do I do this with MongoDB?
I've tried this query:
// filter
{
items: {
$elemMatch: {
name: "hello",
value: { $ne: "0" },
},
}
}
// projection
{
"_id": 0,
"items.$": 1
}
It matches the items correctly, but it returns the whole items and I want only a single field from it.
Sadly, I can't use projection like this: "items.$.value": 1.
I've also tried the following aggregation:
{
$unwind: {
path: "$items"
}
}
{
$match: {
"items.name": "hello",
"items.value": { $ne: "0" },
}
}
{
$replaceRoot: {
newRoot: "$items"
}
}
{
$project: {
"value": 1
}
}
It works perfectly and returns the expected result, but I have a feeling that it will have poorer performance.
Is there a way to achieve what I want with optimal performance?
Maybe something like this:
db.collection.aggregate([
{
$match: {
items: {
$elemMatch: {
"name": "hello",
"value": {
$ne: "0"
}
}
}
}
},
{
$project: {
items: {
"$map": {
input: {
"$filter": {
"input": "$items",
"as": "i",
"cond": {
$and: [
{
$ne: [
"$$i.value",
"0"
]
},
{
$eq: [
"$$i.name",
"hello"
]
}
]
}
}
},
as: "m",
"in": {
"value": "$$m.value"
}
}
}
}
},
{
$unwind: "$items"
},
{
"$replaceRoot": {
"newRoot": "$items"
}
}
])
Explained:
1. Match only documents that have at least one items element with name "hello" and value != "0". An index on items.name and items.value is good to have here; this $match stage is expected to reduce the data passed to the next stages (less data = better performance).
2. Keep only the value of each matching item in the $project stage. This removes the unnecessary items sub-documents (again, less data = better performance).
3. Unwind only the already filtered array. Keeping $unwind for the later stages saves a lot of resources if the collection is big.
4. Replace the root with the necessary values only, so that the output has the expected format.
Indeed, as identified, a simple match stage will not provide correct results, and $elemMatch must be used in the first $match stage.
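The stages described above can be mirrored in plain JavaScript to see what each one contributes. This is only a sketch that runs outside MongoDB; the helper name wanted is hypothetical and the sample document is the one from the question with the _id left out.

```javascript
// Sample document taken from the question (the _id is omitted).
const docs = [{
  foo: "FOO", bar: "BAR",
  items: [
    { foo: "FOO", bar: "BAR", name: "hello", value: "50" },
    { foo: "FOO", bar: "BAR", name: "bye", value: "300" },
    { foo: "FOO", bar: "BAR", name: "welcome", value: "500" }
  ]
}];

// Condition shared by the $match and $filter steps; values are strings here.
const wanted = item => item.name === "hello" && item.value !== "0";

const values = docs
  .filter(d => d.items.some(wanted))                  // $match with $elemMatch
  .map(d => d.items.filter(wanted)                    // $filter
                   .map(i => ({ value: i.value })))   // $map
  .flat();                                            // $unwind + $replaceRoot

console.log(values); // [ { value: '50' } ]
```

Both conditions are checked against the same array element, which is what $elemMatch guarantees in the first stage.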
First of all, I know we can check whether a key exists using the dot operator, but in my case it is not working and I don't know why.
So far in the aggregation pipeline I have the following records.
{
"my_key":"1234",
"data":{
1234:"abc",
4567:"xyz"
}
}
{
"my_key":"6666",
"data":{
1234:"abc",
4567:"xyz"
}
}
I want to return the documents where the my_key value does not exist as a key in the data object. So according to the above example it should return the 2nd document.
I was trying using the $match operator as following but it does not seem to work.
$match :
{
"data.$my_key":{$exists:false}
}
This does not work and I don't get why.
Is it because the my_key value is a string and the keys in the data object are not strings?
db.collection.aggregate([
{
"$project": {//Reshape the data
"data": {
"$objectToArray": "$data"
},
"my_key": 1
}
},
{
"$unwind": "$data"
},
{
"$match": {//matching
"$expr": {
"$eq": [
"$data.k",
"$my_key"
]
}
}
}
])
Another way, without $unwind:
db.collection.aggregate([
{
"$project": {
"data": {
"$objectToArray": "$data"
},
"my_key": 1
}
},
{
$project: {
"output": {
"$map": {
"input": "$data",
"as": "data",
"in": {
"$eq": [
"$$data.k",
"$my_key"
]
}
}
},
"data": 1,
"my_key": 1
}
},
{
$match: {
output: true
}
}
])
If you need the original format of data, you can add the stage below as the last stage:
{
$project: {
"data": {
"$arrayToObject": "$data"
},
"my_key": 1
}
}
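Note that the pipelines above keep the documents in which some key of data equals my_key, while the question asks for the opposite. In the second pipeline, the inverse would presumably be a final match on output: { $ne: true }. The plain JavaScript sketch below illustrates that inverse check outside MongoDB; the function name missingOwnKey is hypothetical. It also shows why the key type is not the problem: field names are always strings, so the numeric-looking keys compare as "1234" and "4567".

```javascript
// The two records from the question.
const docs = [
  { my_key: "1234", data: { 1234: "abc", 4567: "xyz" } },
  { my_key: "6666", data: { 1234: "abc", 4567: "xyz" } }
];

// Keep documents where no key of `data` equals `my_key`.
// Object.keys plays the role of $objectToArray followed by reading `k`.
function missingOwnKey(d) {
  return !Object.keys(d.data || {}).includes(String(d.my_key));
}

const unmatched = docs.filter(missingOwnKey);
console.log(unmatched.map(d => d.my_key)); // [ '6666' ]
```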
I'm a complete beginner in MongoDB. I'm trying to find all the documents that contain null or nothing, for example documents like
{
"_id" : "abc"
}
so that I can delete them from the collection.
Even after searching through a lot of SO questions I couldn't find a solution. How can I do this? Sorry if I'm overlooking anything.
I don't know how to do it in a single operation, but you can try something like this:
db["collectionName"].find({_id: {$exists: true}}).forEach(function(doc) {
    // A document whose only key is _id has no other fields
    if (Object.keys(doc).length === 1) {
        // delete this document
        db["collectionName"].remove({_id: doc._id});
    }
})
One possible solution is to get a list of the _id values of those empty documents and then remove them. This can be significantly more efficient, considering that you only execute two queries instead of looping through the whole collection on the client (which can affect your db performance, especially with large collections).
Consider running the following aggregate pipeline to get those ids:
var ids = db.collection.aggregate([
{ "$project": {
"hashmaps": { "$objectToArray": "$$ROOT" }
} },
{ "$project": {
"keys": "$hashmaps.k"
} },
{ "$redact": {
"$cond": [
{
"$eq":[
{
"$ifNull": [
{ "$arrayElemAt": ["$keys", 1] },
0
]
},
0
]
},
"$$KEEP",
"$$PRUNE"
]
} },
{ "$group": {
"_id": null,
"ids": { "$push": "$_id" }
} }
]).toArray()[0]["ids"];
Then remove the documents:
db.collection.remove({ "_id": { "$in": ids } });
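To make the $redact condition easier to follow, here is a plain JavaScript sketch of the same test, run outside MongoDB on a made-up list of documents. The sample data and the helper name isEmpty are assumptions for illustration.

```javascript
// Hypothetical collection contents.
const docs = [
  { _id: "abc" },
  { _id: "def", a: 1 },
  { _id: "ghi" },
  { _id: "jkl", b: null }
];

// Same test as the $redact stage: a document counts as "empty" when it
// has no second top-level key (the first one being _id).
const isEmpty = d => Object.keys(d)[1] === undefined;

const ids = docs.filter(isEmpty).map(d => d._id);
console.log(ids); // [ 'abc', 'ghi' ]
```

A document such as { _id: "jkl", b: null } still has a second key, so this check does not treat it as empty even though the field holds null.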
The other approach is similar to the above in that it also needs two queries: the first returns a list of all the top-level fields in the collection, and the second removes the documents that have none of those fields.
Consider running the following queries:
/*
Run an aggregate pipeline operation to get a list
of all the top-level fields in the collection
*/
var fields = db.collection.aggregate([
{ "$project": {
"hashmaps": { "$objectToArray": "$$ROOT" }
} },
{ "$project": {
"keys": "$hashmaps.k"
} },
{ "$group": {
"_id": null,
"fields": { "$addToSet": "$keys" }
} },
{ "$project": {
"fields": {
"$setDifference": [
{
"$reduce": {
"input": "$fields",
"initialValue": [],
"in": { "$setUnion" : ["$$value", "$$this"] }
}
},
["_id"]
]
}
}
}
]).toArray()[0]["fields"];
The second query looks for the existence of all the fields except the _id one. For example, suppose your collection has documents with the keys _id, a, b and c, the query
db.collection.find({
"a" : { "$exists": false },
"b" : { "$exists": false },
"c" : { "$exists": false }
});
matches documents that contain none of the three fields a, b and c.
So if you have a list of the top-level fields in your collection, all you need is to construct the above query document. Use the reduce method on the array for this:
// Construct the above query
var query = fields.reduce(function(acc, curr) {
acc[curr] = { "$exists": false };
return acc;
}, {});
Then use the query to remove the documents as
db.collection.remove(query);
I have an article collection:
{
_id: 9999,
authorId: 12345,
coAuthors: [23456,34567],
title: 'My Article'
},
{
_id: 10000,
authorId: 78910,
title: 'My Second Article'
}
I'm trying to figure out how to get a list of distinct author and co-author ids out of the database. I have tried push, concat, and addToSet, but can't seem to find the right combination. I'm on 2.4.6 so I don't have access to setUnion.
Whilst $setUnion would be the "ideal" way to do this, there is another way that basically involves "switching" between a "type" to alternate which field is picked:
db.collection.aggregate([
{ "$project": {
"authorId": 1,
"coAuthors": { "$ifNull": [ "$coAuthors", [null] ] },
"type": { "$const": [ true,false ] }
}},
{ "$unwind": "$coAuthors" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"$cond": [
"$type",
"$authorId",
"$coAuthors"
]
}
}},
{ "$match": { "_id": { "$ne": null } } }
])
And that is it. You may know the $const operation as the $literal operator from MongoDB 2.6. It has always been there, but was only documented and given an "alias" at the 2.6 release.
Of course the $unwind operations in both cases produce more "copies" of the data, but this is grouping for "distinct" values so it does not matter. Just depending on the true/false alternating value for the projected "type" field ( once unwound ) you just pick the field alternately.
Also this little mapReduce does much the same thing:
db.collection.mapReduce(
function() {
emit(this.authorId,null);
if ( this.hasOwnProperty("coAuthors"))
this.coAuthors.forEach(function(id) {
emit(id,null);
});
},
function(key,values) {
return null;
},
{ "out": { "inline": 1 } }
)
For the record, $setUnion is of course a lot cleaner and more performant:
db.collection.aggregate([
{ "$project": {
"combined": {
"$setUnion": [
{ "$map": {
"input": ["A"],
"as": "el",
"in": "$authorId"
}},
{ "$ifNull": [ "$coAuthors", [] ] }
]
}
}},
{ "$unwind": "$combined" },
{ "$group": {
"_id": "$combined"
}}
])
So there the only real concerns are converting the singular "authorId" to an array via $map and feeding an empty array where the "coAuthors" field is not present in the document.
Both output the same distinct values from the sample documents:
{ "_id" : 78910 }
{ "_id" : 23456 }
{ "_id" : 34567 }
{ "_id" : 12345 }
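For comparison, the $setUnion idea (wrap authorId in an array, substitute an empty array when coAuthors is missing, then keep distinct values) can be mirrored in plain JavaScript. This sketch runs outside MongoDB and uses the two articles from the question; the variable names are chosen for this example only.

```javascript
// The two articles from the question.
const articles = [
  { _id: 9999, authorId: 12345, coAuthors: [23456, 34567], title: "My Article" },
  { _id: 10000, authorId: 78910, title: "My Second Article" }
];

// Wrap authorId in an array and fall back to [] when coAuthors is missing,
// as the $map and $ifNull expressions do; the Set keeps distinct ids.
const distinctIds = new Set();
for (const a of articles) {
  for (const id of [a.authorId, ...(a.coAuthors || [])]) {
    distinctIds.add(id);
  }
}

console.log([...distinctIds].sort((x, y) => x - y));
// [ 12345, 23456, 34567, 78910 ]
```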
I have the following JSON structure in a MongoDB collection:
{
"students":[
{
"name":"ABC",
"fee":1233
},
{
"name":"PQR",
"fee":345
}
],
"studentDept":[
{
"name":"ABC",
"dept":"A"
},
{
"name":"XYZ",
"dept":"X"
}
]
},
{
"students":[
{
"name":"XYZ",
"fee":133
},
{
"name":"LMN",
"fee":56
}
],
"studentDept":[
{
"name":"XYZ",
"dept":"X"
},
{
"name":"LMN",
"dept":"Y"
},
{
"name":"ABC",
"dept":"P"
}
]
}
Now I want to compute the following output.
If students.name = studentDept.name,
my result should be as below:
{
"name":"ABC",
"fee":1233,
"dept":"A"
},
{
"name":"XYZ",
"fee":133,
"dept":"X"
},
{
"name":"LMN",
"fee":56,
"dept":"Y"
}
Do I need to use MongoDB aggregation, or is it possible to get the output above without using aggregation?
What you are really asking here is how to make MongoDB return something that is actually quite different from the form in which you store it in your collection. The standard query operations do allow a "limited" form of "projection", but even as the title on the page shared in that link suggests, this is really only about "limiting" the fields to display in results based on what is present in your document already.
So any form of "alteration" requires some form of aggregation; both the aggregate and mapReduce operations allow you to "re-shape" the document results into a form that is different from the input. Perhaps the main thing people miss with the aggregation framework in particular is that it is not just about "aggregating"; in fact the "re-shaping" concept is core to its implementation.
So in order to get results how you want, you can take an approach like this, which should be suitable for most cases:
db.collection.aggregate([
{ "$unwind": "$students" },
{ "$unwind": "$studentDept" },
{ "$group": {
"_id": "$students.name",
"tfee": { "$first": "$students.fee" },
"tdept": {
"$min": {
"$cond": [
{ "$eq": [
"$students.name",
"$studentDept.name"
]},
"$studentDept.dept",
false
]
}
}
}},
{ "$match": { "tdept": { "$ne": false } } },
{ "$sort": { "_id": 1 } },
{ "$project": {
"_id": 0,
"name": "$_id",
"fee": "$tfee",
"dept": "$tdept"
}}
])
Or alternately just "filter out" the cases where the two "name" fields do not match and then just project the content with the fields you want, if crossing content between documents is not important to you:
db.collection.aggregate([
{ "$unwind": "$students" },
{ "$unwind": "$studentDept" },
{ "$project": {
"_id": 0,
"name": "$students.name",
"fee": "$students.fee",
"dept": "$studentDept.dept",
"same": { "$eq": [ "$students.name", "$studentDept.name" ] }
}},
{ "$match": { "same": true } },
{ "$project": {
"name": 1,
"fee": 1,
"dept": 1
}}
])
From MongoDB 2.6 and upwards you can even do the same thing "inline" in the document, between the two arrays. You still want to reshape that array content in your final output though, but it is possibly done a little faster:
db.collection.aggregate([
// Compares entries in each array within the document
{ "$project": {
"students": {
"$map": {
"input": "$students",
"as": "stu",
"in": {
"$setDifference": [
{ "$map": {
"input": "$studentDept",
"as": "dept",
"in": {
"$cond": [
{ "$eq": [ "$$stu.name", "$$dept.name" ] },
{
"name": "$$stu.name",
"fee": "$$stu.fee",
"dept": "$$dept.dept"
},
false
]
}
}},
[false]
]
}
}
}
}},
// Students is now an array of arrays. So unwind it twice
{ "$unwind": "$students" },
{ "$unwind": "$students" },
// Rename the fields and exclude
{ "$project": {
"_id": 0,
"name": "$students.name",
"fee": "$students.fee",
"dept": "$students.dept"
}},
])
So where you want to essentially "alter" the structure of the output, you need to use one of the aggregation tools to do so. And you can, even if you are not really aggregating anything.
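The per-document matching that the pipelines above perform can be sketched in plain JavaScript to check the expected output against the sample data. This runs outside MongoDB and is only an illustration; the variable name joined is hypothetical.

```javascript
// The two documents from the question.
const docs = [
  {
    students: [{ name: "ABC", fee: 1233 }, { name: "PQR", fee: 345 }],
    studentDept: [{ name: "ABC", dept: "A" }, { name: "XYZ", dept: "X" }]
  },
  {
    students: [{ name: "XYZ", fee: 133 }, { name: "LMN", fee: 56 }],
    studentDept: [
      { name: "XYZ", dept: "X" },
      { name: "LMN", dept: "Y" },
      { name: "ABC", dept: "P" }
    ]
  }
];

// For each document, pair every student with the department entries
// in the same document that carry the same name.
const joined = docs.flatMap(d =>
  d.students.flatMap(stu =>
    d.studentDept
      .filter(dept => dept.name === stu.name)
      .map(dept => ({ name: stu.name, fee: stu.fee, dept: dept.dept }))
  )
);

console.log(joined);
```

Students without a department entry in the same document (PQR here) drop out, as do department entries with no matching student in that document.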