Encapsulate/Transform an Array into an Array of Objects - mongodb

I am currently in the process of modifying a schema and I need to do a relatively trivial transform using the aggregation framework and a bulkWrite.
I want to be able to take this array:
{
...,
"images" : [
"http://example.com/...",
"http://example.com/...",
"http://example.com/..."
]
}
and aggregate to a similar array where the original value is encapsulated:
{
...,
"images" : [
{url: "http://example.com/..."},
{url: "http://example.com/..."},
{url: "http://example.com/..."}
]
}
This slow query works, but it is ridiculously expensive to unwind an entire collection.
[
{
$match: {}
},
{
$unwind: {
path : "$images",
}
},
{
$group: {
_id: "$_id",
images_2: {$addToSet: {url: "$images"}}
}
},
]
How can this be achieved with project or some other cheaper aggregation?

$map expression should do the job, try this:
db.col.aggregate([
{
$project: {
images: {
$map: {
input: '$images',
as: 'url',
in: {
url: '$$url'
}
}
}
}
}
]);
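To make the effect of the $map expression concrete, here is a plain JavaScript sketch that applies the same per-document transformation in memory. The helper name wrapImages and the sample document are illustrative assumptions, not part of MongoDB; on the server, the pipeline above does this work.

```javascript
// Hypothetical helper: mirrors
//   { $map: { input: "$images", as: "url", in: { url: "$$url" } } }
// for a single document held in memory.
function wrapImages(doc) {
  // On the server, $map yields null when the input field is missing or null.
  // (A non-array, non-null input is an error there; this sketch just returns null.)
  const images = Array.isArray(doc.images)
    ? doc.images.map((url) => ({ url }))
    : null;
  // Return a copy so the input document is left untouched.
  // Note: this keeps the other fields, like $addFields would; $project drops them.
  return { ...doc, images };
}

// Made-up sample document for illustration only.
const sample = {
  _id: 1,
  images: ["http://example.com/a", "http://example.com/b"],
};
const wrapped = wrapImages(sample);
console.log(JSON.stringify(wrapped));
```

Running the sketch on a few sample documents is a cheap way to check the expected shape before pointing the real pipeline at a collection.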

You don't need to use the bulkWrite() method for this.
You can use the $map aggregation array operator to apply an expression to each element in your array.
Here, the expression simply creates a new object whose value is the item in the array.
let mapExpr = {
"$map": {
"input": "$images",
"as": "imageUrl",
"in": { "url": "$$imageUrl" }
}
};
Finally, you can use the $out aggregation pipeline stage to overwrite your collection or write the result into a different collection.
Of course, $map is not an aggregation pipeline stage, which means that the $map expression must be used inside a pipeline stage.
The way you do this depends on your MongoDB version.
In MongoDB 3.4 and later, the best way is to use $addFields to change the value of the "images" field in your documents.
db.collection.aggregate([
{ "$addFields": { "images": mapExpr }},
{ "$out": "collection" }
])
In MongoDB 3.2 and earlier, you need to use the $project pipeline stage, and you also need to include all the other fields of your document manually:
db.collection.aggregate([
{ "$project": { "images": mapExpr } },
{ "$out": "collection" }
])

Related

Removing item out of nested document array and while also accounting for null/empty document array

I'm new to MongoDB and I've been working on this query for quite some time. I've found solutions using "$project", "$group" and "$match". The overall goal is: if a document within the nested array has an "internal" attribute that is false, remove it from the array.
$project and $group DO work, BUT they then throw off the projection. I don't even see a current projection in this query, but once I add in $project or $group it ONLY returns the specific nested document array I'm messing with.
$match won't work because I have cases where the parameter I'm using to remove items from the nested document array is true or false, or the array is empty, and in some of those cases $match just doesn't return the main document.
Here's an example $group
{ '$unwind': '$notes' },
{
$group: {
_id: "$_id",
notes: {
$push: {
$cond: {
if: { $eq: [ "$notes.internal", false ] },
then: "$$REMOVE",
else: "$notes.internal"
}
}
}
}
}
You may be able to use $addFields with $filter:
{$addFields: {
notes: {$filter: {
input: "$notes",
as: "item",
cond: {$ne: [ "$$item.internal", false ]}
}}
}}
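As a rough illustration of how the $filter condition above treats the true, false, missing and empty-array cases, here is an in-memory JavaScript sketch. The helper name dropInternalFalse and the sample notes are assumptions made up for this example.

```javascript
// Hypothetical helper: mirrors
//   { $filter: { input: "$notes", as: "item",
//                cond: { $ne: ["$$item.internal", false] } } }
// for a single document held in memory.
function dropInternalFalse(doc) {
  // On the server, $filter yields null when the input field is missing or null.
  const notes = Array.isArray(doc.notes)
    // Only an explicit `internal: false` is removed; true and absent are kept.
    ? doc.notes.filter((item) => item.internal !== false)
    : null;
  // All other fields of the document are preserved, as with $addFields.
  return { ...doc, notes };
}

// Made-up sample documents for illustration only.
const withNotes = {
  _id: 1,
  title: "kept",
  notes: [
    { text: "a", internal: true },
    { text: "b", internal: false },
    { text: "c" },
  ],
};
const emptyNotes = { _id: 2, notes: [] };

console.log(JSON.stringify(dropInternalFalse(withNotes)));
console.log(JSON.stringify(dropInternalFalse(emptyNotes)));
```

Because the main document is always returned, this avoids the problem with $match, which drops documents whose array has no matching element.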

Filter array using the $in operator in the $project stage

Right now, it's not possible to use the $in operator in the $filter array aggregation operator.
Let's say this is the document schema:
{
_id: 1,
users: [
{
_id: 'a',
accounts: ['x', 'y', 'z']
},
{
_id: 'b',
accounts: ['j','k','l']
}
]
}
Using aggregate, I want to get the documents with the users array filtered based on the contents of each user's accounts array.
If $in worked with the $filter operator, I would expect it to look like this:
db.test.aggregate([
{
$project: {
'filtered_users': {
$filter: {
input: '$users',
as: 'user',
cond: {
$in: ['$$user.accounts', ['x']]
}
}
}
}
}
])
and return only the first user in filtered_users, since x is in that user's accounts.
But, as I said, this doesn't work and I get the error:
"invalid operator '$in'"
because it isn't supported in the $filter operator.
Now, I know I can do it with $unwind and regular $match operators, but then it will be a much longer (and uglier) aggregation that needs $group to put the results back into an array. I don't want this.
My question is whether there is some other way to use the $filter operator to get my desired results.
Since $in is not supported as an aggregation expression on arrays here, an alternative is to use $setIsSubset (see the $setIsSubset documentation for more information). The aggregate query would now look like this:
db.test.aggregate([
{
$project: {
'filtered_users': {
$filter: {
input: '$users',
as: 'user',
cond: {
$setIsSubset: [['x'], '$$user.accounts']
}
}
}
}
}])
This query will return only the elements for which ['x'] is a subset of the user's accounts array.
Starting from MongoDB 3.4, you can use the $in aggregation operator in the $project stage:
db.collection.aggregate([
{
"$project": {
"filtered_users": {
"$filter": {
"input": "$users",
"as": "user",
"cond": { "$in": [ "x", "$$user.accounts" ] }
}
}
}
}
])
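The two conditions used in these answers can be compared with a small in-memory JavaScript sketch. It only illustrates the semantics; the variable names are assumptions, and the sample document is the one from the question. Note the argument order: the aggregation $in takes the element first and the array second, which is the reverse of the attempt in the question.

```javascript
// Sample document taken from the question.
const doc = {
  _id: 1,
  users: [
    { _id: "a", accounts: ["x", "y", "z"] },
    { _id: "b", accounts: ["j", "k", "l"] },
  ],
};

// Mirrors cond: { $in: ["x", "$$user.accounts"] }
// (element first, array second).
const byIn = doc.users.filter((user) => user.accounts.includes("x"));

// Mirrors cond: { $setIsSubset: [["x"], "$$user.accounts"] }
// (every wanted value must appear in the user's accounts).
const wanted = ["x"];
const bySubset = doc.users.filter((user) =>
  wanted.every((account) => user.accounts.includes(account))
);

console.log(byIn.map((user) => user._id));
console.log(bySubset.map((user) => user._id));
```

With a single wanted value the two conditions select the same users; with several wanted values, the subset form requires all of them to be present.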

How do I query a mongo document containing subset of nested array

Here is a doc I have:
var docIHave = {
_id: "someId",
things: [
{
name: "thing1",
stuff: [1,2,3,4,5,6,7,8,9]
},
{
name: "thing2",
stuff: [4,5,6,7,8,9,10,11,12,13,14]
},
{
name: "thing3",
stuff: [1,4,6,8,11,21,23,30]
}
]
}
This is the doc I want:
var docIWant = {
_id: "someId",
things: [
{
name: "thing1",
stuff: [5,6,7,8,9]
},
{
name: "thing2",
stuff: [5,6,7,8,9,10,11]
},
{
name: "thing3",
stuff: [6,8,11]
}
]
}
The stuff arrays of docIWant should only contain items greater than min=4 and smaller than max=12.
Background:
I have a meteor app and I subscribe to a collection giving me docIHave. Based on parameters min and max I need the docIWant "on the fly". The original document should not be modified. I need a query or procedure that returns me docIWant with the subset of stuff.
A practical code example would be greatly appreciated.
Use the aggregation framework for this. In the aggregation pipeline, use the $match operator as your first pipeline stage. This helps optimize the aggregation, because it filters out documents that cannot match the given criteria before they are passed further down the pipeline.
Next, use the $unwind operator. This deconstructs the things array field from the input documents and outputs one document for each element. Each output document is the input document with the value of the array field replaced by the element.
Another $unwind operation is needed on the things.stuff array as well.
The next pipeline stage then filters the documents whose deconstructed things.stuff value matches the given min and max criteria. Use a $match operator for this.
A $group operator is then required to group the input documents by a specified identifier expression and apply the accumulator expression $push to each group. This rebuilds an array for each group.
Your aggregation should end up looking something like this (I haven't actually tested it, but it should get you going in the right direction):
db.collection.aggregate([
{
"$match": {
"things.stuff": { "$gt": 4, "$lte": 11 }
}
},
{
"$unwind": "$things"
},
{
"$unwind": "$things.stuff"
},
{
"$match": {
"things.stuff": { "$gt": 4, "$lte": 11 }
}
},
{
"$group": {
"_id": {
"_id": "$_id",
"things": "$things"
},
"stuff": {
"$push": "$things.stuff"
}
}
},
{
"$group": {
"_id": "$_id._id",
"things": {
"$push": {
"name": "$_id.things.name",
"stuff": "$stuff"
}
}
}
}
])
If you need to transform the document on the client for display purposes, you could do something like this:
Template.myTemplate.helpers({
transformedDoc: function() {
// get the bounds - maybe these are stored in session vars
var min = Session.get('min');
var max = Session.get('max');
// fetch the doc somehow that needs to be transformed
var doc = SomeCollection.findOne();
// transform the thing.stuff arrays
_.each(doc.things, function(thing) {
thing.stuff = _.reject(thing.stuff, function(n) {
// drop values outside the open interval (min, max)
return (n <= min) || (n >= max);
});
});
// return the transformed doc
return doc;
}
});
Then in your template: {{#each transformedDoc.things}}...{{/each}}
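Since the question asks that the original document not be modified, a non-mutating variant of the client-side transform may be useful. The following is a minimal sketch in plain JavaScript without underscore; the function name filterStuff is an assumption, and the bounds are treated as exclusive to match docIWant.

```javascript
// Hypothetical helper: returns a new document whose things[].stuff arrays
// only contain values strictly between min and max.
function filterStuff(doc, min, max) {
  return {
    ...doc,
    // Build new "thing" objects instead of editing the fetched ones in place.
    things: doc.things.map((thing) => ({
      ...thing,
      stuff: thing.stuff.filter((n) => n > min && n < max),
    })),
  };
}

// Sample document taken from the question.
const docIHave = {
  _id: "someId",
  things: [
    { name: "thing1", stuff: [1, 2, 3, 4, 5, 6, 7, 8, 9] },
    { name: "thing2", stuff: [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] },
    { name: "thing3", stuff: [1, 4, 6, 8, 11, 21, 23, 30] },
  ],
};

const docIWant = filterStuff(docIHave, 4, 12);
console.log(JSON.stringify(docIWant, null, 2));
```

In a helper like the one above, the result of filterStuff could be returned directly, leaving the document held by the collection unchanged.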
Use MongoDB aggregation as follows:
First use $unwind to unwind things and things.stuff, and then use $match to keep only the elements greater than 4 and less than 12. After that, $group the data by things.name and add the required fields in $project.
The query is as follows:
db.collection.aggregate([
{
$unwind: "$things"
}, {
$unwind: "$things.stuff"
}, {
$match: {
"things.stuff": {
$gt: 4,
$lt:12
}
}
}, {
$group: {
"_id": "$things.name",
"stuff": {
$push: "$things.stuff"
}
}
}, {
$project: {
"thingName": "$_id",
"stuff": 1
}
}])

How to filter array in a mongodb query

In mongodb, I have a collection that contains a single document that looks like the following:
{
"_id" : ObjectId("5552b7fd9e8c7572e36e39df"),
"StackSummaries" : [
{
"StackId" : "arn:aws:cloudformation:ap-southeast-2:406119630047:stack/XXXX-30fb22a-285-439ee279-c7c8d36/4ebd8770-f8f4-11e4-bf36-503f2370240f",
"TemplateDescription" : "XXXX",
"StackStatusReason" : "",
"CreationTime" : "2015-05-12T22:14:50.535Z",
"StackName" : "XXXX",
"StackStatus" : "CREATE_COMPLETE"
},
{
"TemplateDescription" : "XXXX",
"StackStatusReason" : "",
"CreationTime" : "2015-05-11T04:02:05.543Z",
"StackName" : "XXXX",
"StackStatus" : "DELETE_COMPLETE",
"StackId" : "arn:aws:cloudformation:ap-southeast-2:406119630047:stack/XXXXX/7c8d04e0-f792-11e4-bb12-506726f15f9a"
},
{ ... },
{ many others }
]
}
i.e. the imported results of the AWS CLI command aws cloudformation list-stacks
I'm trying to find the items of the StackSummaries array that have a StackStatus of CREATE_COMPLETE or UPDATE_COMPLETE. After much experimenting and reading other SO posts I arrived at the following:
db.cf_list_stacks.aggregate( {$match: {"StackSummaries.StackStatus": "CREATE_COMPLETE"}})
However this still returns the whole document (and I haven't even worried about UPDATE_COMPLETE).
I'm coming from an SQL background and struggling with simple queries like this. Any ideas on how to get the information I'm looking for?
SO posts I've looked at:
MongoDB query with elemMatch for nested array data
MongoDB: multiple $elemMatch
$projection vs $elemMatch
Make $elemMatch (projection) return all objects that match criteria
Update
Notes on things I learned while understanding this topic:
aggregate() is just a pipeline (like a Unix shell pipeline) where each $ operator is just another step. And like shell pipelines they can look complex, but you just build them up step by step until you get the results you want
Mongo has a great webinar: Exploring the Aggregation Framework
RoboMongo is a good tool (GPL3) for working with Mongo data and queries
If you only want the object inside the StackSummaries array, you should use the $unwind clause to expand the array, filter the documents you want and then project only the parts of the document that you actually want.
The query would look something like this:
db.cf_list_stacks.aggregate([
{ '$unwind' : '$StackSummaries' },
{ '$match' : { 'StackSummaries.StackStatus' : 'CREATE_COMPLETE' } },
{ '$project' : {
'TemplateDescription' : '$StackSummaries.TemplateDescription',
'StackStatusReason' : '$StackSummaries.StackStatusReason',
...
} }
])
Useful links:
Aggregation pipeline documentation
$unwind Documentation
$project Documentation
With MongoDB 3.4 and newer, you can leverage the $addFields and $filter operators with the aggregation framework to get the desired result.
Consider running the following pipeline:
db.cf_list_stacks.aggregate([
{
"$addFields": {
"StackSummaries": {
"$filter": {
"input": "$StackSummaries",
"as": "el",
"cond": {
"$in": [
"$$el.StackStatus",
["CREATE_COMPLETE", "UPDATE_COMPLETE"]
]
}
}
}
}
}
]);
For MongoDB 3.2
db.cf_list_stacks.aggregate([
{
"$project": {
"StackSummaries": {
"$filter": {
"input": "$StackSummaries",
"as": "el",
"cond": {
"$or": [
{ "$eq": ["$$el.StackStatus", "CREATE_COMPLETE"] },
{ "$eq": ["$$el.StackStatus", "UPDATE_COMPLETE"] }
]
}
}
}
}
}
]);
For MongoDB 3.0 and below
db.cf_list_stacks.aggregate([
{ "$unwind": "$StackSummaries" },
{
"$match": {
"StackSummaries.StackStatus": {
"$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"]
}
}
},
{
"$group": {
"_id": "$_id",
"StackSummaries": {
"$addToSet": "$StackSummaries"
}
}
}
])
The above pipeline uses the $unwind operator, which deconstructs the StackSummaries array field from the input documents to output one document for each element. Each output document replaces the array with a single element value.
Further filtering is required after the $unwind to keep only the documents that pass the given criteria, so a $match pipeline stage follows.
To get the original array field back after the $unwind, you need to group the documents using the $group operator; within the group you can use the $addToSet accumulator to collect the elements back into an array.
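The three stages described above can be mimicked in memory to see what each one produces. This is only a JavaScript sketch over a made-up sample collection (the stack names are assumptions); it pushes elements in order, whereas $addToSet on the server also removes duplicates and does not guarantee order.

```javascript
// Made-up sample data shaped like the collection in the question.
const docs = [
  {
    _id: 1,
    StackSummaries: [
      { StackName: "a", StackStatus: "CREATE_COMPLETE" },
      { StackName: "b", StackStatus: "DELETE_COMPLETE" },
      { StackName: "c", StackStatus: "UPDATE_COMPLETE" },
    ],
  },
];
const wantedStatuses = ["CREATE_COMPLETE", "UPDATE_COMPLETE"];

// $unwind: one output document per array element.
const unwound = docs.flatMap((doc) =>
  doc.StackSummaries.map((summary) => ({ _id: doc._id, StackSummaries: summary }))
);

// $match: keep only the unwound documents with a wanted status.
const matched = unwound.filter((doc) =>
  wantedStatuses.includes(doc.StackSummaries.StackStatus)
);

// $group by _id: collect the surviving elements back into an array.
// Documents with no surviving element do not appear in the output at all.
const groups = new Map();
for (const doc of matched) {
  if (!groups.has(doc._id)) {
    groups.set(doc._id, { _id: doc._id, StackSummaries: [] });
  }
  groups.get(doc._id).StackSummaries.push(doc.StackSummaries);
}
const result = [...groups.values()];

console.log(JSON.stringify(result, null, 2));
```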
Since you are trying to find the items of the StackSummaries array that have a StackStatus of CREATE_COMPLETE or UPDATE_COMPLETE, you could use an $elemMatch projection, but at this time it won't work with the $in operator as required to get the elements with either status. There is a JIRA issue for this:
db.cf_list_stacks.find(
{
"StackSummaries.StackStatus": {
"$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"]
}
},
{
"StackSummaries": {
"$elemMatch": {
"StackStatus": {
"$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"]
}
}
}
})
This will only give you documents where the StackStatus has the "CREATE_COMPLETE" value.

How to search embedded array

I want to get all matching values, using $elemMatch.
// create test data
db.foo.insert({values:[0,1,2,3,4,5,6,7,8,9]})
db.foo.find({},{
'values':{
'$elemMatch':{
'$gt':3
}
}
}) ;
My expected result is {values:[4,5,6,7,8,9]}, but the actual result is {values:[4]}.
I read the MongoDB documentation, and I understand this is the specified behavior.
How do I search for multiple values?
In addition, I use 'skip' and 'limit'.
Any ideas?
Using Aggregation:
db.foo.aggregate([
{$unwind:"$values"},
{$match:{"values":{$gt:3}}},
{$group:{"_id":"$_id","values":{$push:"$values"}}}
])
You can add further filter condition in the $match, if you would like to.
You can't achieve this using the $elemMatch operator since, as the MongoDB documentation says:
The $elemMatch projection operator limits the contents of an array
field that is included in the query results to contain only the array
element that matches the $elemMatch condition.
Note
The elements of the array are documents.
If you look carefully at the documentation on $elemMatch, or on its query counterpart, the positional $ operator, you will see that only the "first" matched element is returned by this type of "projection".
What you are looking for is actually "manipulation" of the document contents where you want to "filter" the content of the array in the document rather than return the original or "matched" element, as there can be only one match.
For true "filtering" you need the aggregation framework, as there is more support there for document manipulation:
db.foo.aggregate([
// No point selecting documents that do not match your condition
{ "$match": { "values": { "$gt": 3 } } },
// Unwind the array to de-normalize as documents
{ "$unwind": "$values" },
// Match to "filter" the array
{ "$match": { "values": { "$gt": 3 } } },
// Group by to the array form
{ "$group": {
"_id": "$_id",
"values": { "$push": "$values" }
}}
])
Or, with MongoDB 2.6 and later, where the array values are "unique", you could do this:
db.foo.aggregate([
{ "$project": {
"values": {
"$setDifference": [
{ "$map": {
"input": "$values",
"as": "el",
"in": {
"$cond": [
{ "$gt": [ "$$el", 3 ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
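To see why the $map plus $setDifference approach depends on the values being unique, here is an in-memory JavaScript sketch of the same idea. The function name filterViaSetDifference is an assumption; on the server, $setDifference treats its inputs as sets and does not guarantee the order of the result.

```javascript
// Hypothetical helper mirroring the $map + $setDifference pipeline above.
function filterViaSetDifference(values, threshold) {
  // $map with $cond: keep the element if it passes, otherwise emit false.
  const mapped = values.map((el) => (el > threshold ? el : false));
  // $setDifference against [false]: set semantics, so duplicates collapse
  // and every false placeholder is removed.
  return [...new Set(mapped)].filter((el) => el !== false);
}

// Sample array from the question: unique values survive intact.
console.log(filterViaSetDifference([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3));

// Made-up array with duplicates: the repeated 4 is collapsed to one entry.
console.log(filterViaSetDifference([4, 4, 5, 1], 3));
```

If duplicates must be preserved, the $unwind, $match and $group pipeline shown earlier is the safer choice.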