Ordering fields after applying $setUnion in MongoDB

I have a collection:
{
"_id" : ObjectId("5338ec2a5b5b71242a1c911c"),
"people" : [
{
"name" : "Vasya"
},
{
"age" : "30"
},
{
"weight" : "80"
}
],
"animals" : [
{
"dog" : "Sharick"
},
{
"cat" : "Barsik"
},
{
"bird" : "parrot"
}
]},{
"_id" : ObjectId("5338ec7f5b5b71242a1c911d"),
"people" : [
{
"name" : "Max"
},
{
"age" : "32"
},
{
"weight" : "78"
}
],
"animals" : [
{
"dog" : "Borbos"
},
{
"cat" : "Murka"
},
{
"bird" : "Eagle"
}
]}
Then I combine the two arrays "people" and "animals":
db.tmp.aggregate({$project:{"union":{$setUnion:["$people","$animals"]}}})
The elements of the resulting array come back in an arbitrary order.
How can I make the fields of each record in the "result" array appear in one consistent order, rather than randomly?
that is:

I wish I could find the quote (if I can, I will add it), but it basically comes from the CTO of MongoDB and is essentially "Sets are not considered to be ordered". From a mathematical point of view, that is very much true.
So you have stumbled upon one of the new features in the current (as of writing) 2.6 release candidate series. But as with its $addToSet counterpart, the resulting set from $setUnion will not be sorted in any way.
To get a fixed order you need to $unwind, then $sort, and then $group again using $push, just as you always have with $addToSet. And of course you would need some common key to $sort on, which your data does not have.
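The $unwind / $sort / $group sequence described above could look roughly like the sketch below. It assumes, hypothetically, that every array element has first been reshaped to { k: <field name>, v: <value> } so that "k" can act as the common sort key the original documents lack; the field names "k" and "v" are illustrative only.

```javascript
// Assumption: elements look like { k: "name", v: "Vasya" }, giving a shared key "k".
const pipeline = [
  // Merge the two arrays into one unordered set
  { $project: { union: { $setUnion: ["$people", "$animals"] } } },
  // One document per set member
  { $unwind: "$union" },
  // Order by document, then by the shared key
  { $sort: { _id: 1, "union.k": 1 } },
  // Rebuild the array in sorted order
  { $group: { _id: "$_id", union: { $push: "$union" } } }
];

// In the mongo shell this would be run as: db.tmp.aggregate(pipeline)
console.log(pipeline.map(stage => Object.keys(stage)[0]).join(" -> "));
```

Without a shared key such as "k", there is nothing meaningful for $sort to order the elements by.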
Update: Here is the quote, and here is another.

Related

Reference multiple fields with aggregation function

Let's say I have a MongoDB query which returns the following two documents. (I am using aggregation and projection, which return this result set.)
{
"name" : {
"value" : "ANDERSON"
},
"ID" : {
"value" : "2356"
},
}
{
"employeename" : {
"value" : "DAVID"
},
"ID" : {
"value" : "2356"
},
}
My DB is schemaless, and I am storing attributes and their values. Multiple attributes can represent the same information; for example, here "name" and "employeename" represent the same thing. I want the final output in one common attribute (say "Employee Name"), whose value can come from either "name" or "employeename".
I think this problem can be solved by adding one more stage to the aggregation pipeline. I tried $or, but it returns true/false rather than the value:
db.getCollection('mycollection').aggregate([
{ "$project" : {
"name" : 1,
"ID" : 1, "employeename" : 1
}},
{ "$project":{
"Employee Name": {$or : ["$name", "$employeename"]}
}}
])
Final Output should be
{
" Employee Name" : {
"value" : "ANDERSON"
},
"ID" : {
"value" : "2356"
},
}
{
" Employee Name" : {
"value" : "DAVID"
},
"ID" : {
"value" : "2356"
},
}
Can somebody tell me how to write this MongoDB command?
What you want is the $ifNull operator. You can also shorten your pipeline to a single $project stage:
db.getCollection('mycollection').aggregate([
{ "$project" : {
"EmployeeName" : { "$ifNull": [ "$name", "$employeename" ] },
"ID" : 1,
}}
])
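To see why $ifNull fits here and $or does not, the following plain-JavaScript sketch approximates the two operators on the sample documents. The helper names orExpr and ifNullExpr are made up for illustration, and the truthiness rules are JavaScript's, which only approximate MongoDB's.

```javascript
// Hypothetical stand-ins: $or yields a boolean, $ifNull yields a value.
function orExpr(a, b) {
  return Boolean(a) || Boolean(b);
}
function ifNullExpr(value, fallback) {
  // Use the fallback only when the first operand is null or missing
  return value ?? fallback;
}

const docs = [
  { name: { value: "ANDERSON" }, ID: { value: "2356" } },
  { employeename: { value: "DAVID" }, ID: { value: "2356" } }
];

const projected = docs.map(d => ({
  EmployeeName: ifNullExpr(d.name, d.employeename),
  ID: d.ID
}));

console.log(JSON.stringify(projected));
console.log(orExpr(docs[0].name, docs[0].employeename)); // a boolean, not the name
```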

MongoDB select documents where field1 equals nested.field2 in aggregate pipeline

I have joined two collections on one field using '$lookup', while actually I needed two fields to have a unique match. My next step would be to unwind the array containing different values of the second field I need for a unique match and then compare these to the value of the second field it needs to match higher up. However, the second line in the snippet below returns no results.
// Request only the page that has been viewed
{ '$unwind' : '$DSpub.PublicationPages'},
{ '$match' : {'pageId' : '$DSpub.PublicationPages.PublicationPageId' } }
Is there a more appropriate way to do this? Or can I avoid doing this altogether by unwinding the "from" collection before performing the '$lookup', and then match both fields?
This is not as easy as it looks.
$match does not operate on dynamic data: it compares fields against static values, not against other fields of the same document. To overcome that, we can use a $project stage to add a boolean flag that $match can then filter on.
Please see the example below.
Given an input collection like this:
[{
"_id" : ObjectId("56be1b51a0f4c8591f37f62b"),
"name" : "Alice",
"sub_users" : [{
"_id" : ObjectId("56be1b51a0f4c8591f37f62a")
}
]
}, {
"_id" : ObjectId("56be1b51a0f4c8591f37f62a"),
"name" : "Bob",
"sub_users" : [{
"_id" : ObjectId("56be1b51a0f4c8591f37f62a")
}
]
}
]
We want to get only the documents where _id and docs.sub_users._id are the same, where docs is the $lookup output.
db.collection.aggregate([{
$lookup : {
from : "collection",
localField : "_id",
foreignField : "_id",
as : "docs"
}
}, {
$unwind : "$docs"
}, {
$unwind : "$docs.sub_users"
}, {
$project : {
_id : 0,
fields : "$$ROOT",
matched : {
$eq : ["$_id", "$docs.sub_users._id"]
}
}
}, {
$match : {
matched : true
}
}
])
that gives output:
{
"fields" : {
"_id" : ObjectId("56be1b51a0f4c8591f37f62a"),
"name" : "Bob",
"sub_users" : [
{
"_id" : ObjectId("56be1b51a0f4c8591f37f62a")
}
],
"docs" : {
"_id" : ObjectId("56be1b51a0f4c8591f37f62a"),
"name" : "Bob",
"sub_users" : {
"_id" : ObjectId("56be1b51a0f4c8591f37f62a")
}
}
},
"matched" : true
}
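On MongoDB 3.6 or later, the $expr operator allows aggregation expressions inside $match, so the field-to-field comparison can be written without the intermediate $project flag. The sketch below is one way to express that, assuming such a server version; "collection" is a placeholder collection name.

```javascript
// Assumption: MongoDB 3.6+, where $expr is accepted inside $match.
const pipeline = [
  { $lookup: {
      from: "collection",       // placeholder collection name
      localField: "_id",
      foreignField: "_id",
      as: "docs"
  } },
  { $unwind: "$docs" },
  { $unwind: "$docs.sub_users" },
  // Compare two fields of the same document directly
  { $match: { $expr: { $eq: ["$_id", "$docs.sub_users._id"] } } }
];

// In the shell: db.collection.aggregate(pipeline)
console.log(JSON.stringify(pipeline[pipeline.length - 1]));
```

Unlike the $project approach, the output documents keep their original shape instead of being wrapped in a "fields" sub-document.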

Mongo: how to retrieve ONLY subdocs that match certain properties

Having, for example, a collection named test and the following document is inside:
{
"_id" : ObjectId("5692ac4562c824cc5167379f"),
"list" : [
{
"name" : "elem1",
"type" : 1
},
{
"name" : "elem2",
"type" : 2
},
{
"name" : "elem3",
"type" : 1
},
{
"name" : "elem4",
"type" : 3
},
{
"name" : "elem4",
"type" : 2
}
]
}
Let's say I would like to retrieve only those subdocuments inside list that match:
type = 1.
I've tried the following query:
db.getCollection('test').find({
'_id': ObjectId("5692ac4562c824cc5167379f"),
'list.type': 1
})
But the result I get contains every subdocument inside list, and I guess this is because list contains at least one subdocument whose type equals 1.
Instead, the result I am interested in would contain only the subdocuments inside list that match 'list.type': 1:
{
"_id" : ObjectId("5692ac4562c824cc5167379f"),
"list" : [
{
"name" : "elem1",
"type" : 1
},
{
"name" : "elem3",
"type" : 1
}
]
}
...so $ and $elemMatch are not what I am really looking for, as they return just the first matching element.
Does anyone know how to achieve what I am looking for?
db.myCol.aggregate([
{ $unwind: "$list" },
{ $match: { "list.type":1 } },
{ $group: { "_id":"$_id", list: {$push:"$list"}} }
])
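The pipeline above unwinds the array, keeps the matching elements, and pushes them back into a list. On MongoDB 3.2 or later, the $filter expression can do the same in a single $project stage. The sketch below shows that alternative together with a plain-JavaScript equivalent run on the sample document; it is an illustration, not the original answer's method.

```javascript
// Assumption: MongoDB 3.2+, where $filter is available in $project.
const pipeline = [
  // Skip documents that have no matching element at all
  { $match: { "list.type": 1 } },
  { $project: {
      list: {
        $filter: {
          input: "$list",
          as: "item",
          cond: { $eq: ["$$item.type", 1] }
        }
      }
  } }
];

// Plain-JavaScript equivalent of the $filter step on the sample document
const doc = {
  list: [
    { name: "elem1", type: 1 },
    { name: "elem2", type: 2 },
    { name: "elem3", type: 1 },
    { name: "elem4", type: 3 },
    { name: "elem4", type: 2 }
  ]
};
const filtered = doc.list.filter(item => item.type === 1);
console.log(filtered.map(item => item.name).join(", ")); // prints "elem1, elem3"
```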

paging subdocument in mongodb subdocument

I want to page my data in MongoDB. I use the $slice operator but cannot get paging to work: I can retrieve the row I want, but I cannot page within it.
I want to return only 2 rows of DataSource.
How can I resolve this?
My query:
db.getCollection('forms').find({
"_id": ObjectId("557e8c93a6df1a22041e0879"),
"Questions._id": ObjectId("557e8c9fa6df1a22041e087b")
}, {
"Questions.$.DataSource": {
"$slice": [0, 2]
},
"_id": 0,
"Questions.DataSourceItemCount": 1
})
My collection data :
/* 1 */
{
"_id" : ObjectId("557e8c93a6df1a22041e0879"),
"QuestionCount" : 2.0000000000000000,
"Questions" : [
{
"_id" : ObjectId("557e8c9ba6df1a22041e087a"),
"DataSource" : [],
"DataSourceItemCount" : NumberLong(0)
},
{
"_id" : ObjectId("557e8c9fa6df1a22041e087b"),
"DataSource" : [
{
"_id" : ObjectId("557e9428a6df1a198011fa55"),
"CreationDate" : ISODate("2015-06-15T09:00:24.485Z"),
"IsActive" : true,
"Text" : "sdf",
"Value" : "sdf"
},
{
"_id" : ObjectId("557e98e9a6df1a1a88da8b1d"),
"CreationDate" : ISODate("2015-06-15T09:20:41.027Z"),
"IsActive" : true,
"Text" : "das",
"Value" : "asdf"
},
{
"_id" : ObjectId("557e98eea6df1a1a88da8b1e"),
"CreationDate" : ISODate("2015-06-15T09:20:46.889Z"),
"IsActive" : true,
"Text" : "asdf",
"Value" : "asdf"
},
{
"_id" : ObjectId("557e98f2a6df1a1a88da8b1f"),
"CreationDate" : ISODate("2015-06-15T09:20:50.401Z"),
"IsActive" : true,
"Text" : "asd",
"Value" : "asd"
},
{
"_id" : ObjectId("557e98f5a6df1a1a88da8b20"),
"CreationDate" : ISODate("2015-06-15T09:20:53.639Z"),
"IsActive" : true,
"Text" : "asd",
"Value" : "asd"
}
],
"DataSourceItemCount" : NumberLong(5)
}
],
"Name" : "er"
}
Though this is possible to do with some real wrangling you would be best off changing the document structure to "flatten" the array entries into a single array. The main reason for this is "updates" which are not atomically supported by MongoDB with respect to updating the "inner" array due to the current limitations of the positional $ operator.
At any rate, it's not easy to deal with for the reasons that will become apparent.
For the present structure you approach it like this:
db.collection.aggregate([
// Match the required document and `_id` is unique
{ "$match": {
"_id": ObjectId("557e8c93a6df1a22041e0879")
}},
// Unwind the outer array
{ "$unwind": "$Questions" },
// Match the inner entry
{ "$match": {
"Questions._id": ObjectId("557e8c9fa6df1a22041e087b"),
}},
// Unwind the inner array
{ "$unwind": "$Questions.DataSource" },
// Find the first element
{ "$group": {
"_id": {
"_id": "$_id",
"questionId": "$Questions._id"
},
"firstSource": { "$first": "$Questions.DataSource" },
"sources": { "$push": "$Questions.DataSource" }
}},
// Unwind the sources again
{ "$unwind": "$sources" },
// Compare the elements to keep
{ "$project": {
"firstSource": 1,
"sources": 1,
"seen": { "$eq": [ "$firstSource._id", "$sources._id" ] }
}},
// Filter out anything "seen"
{ "$match": { "seen": false } },
// Group back the elements you want
{ "$group": {
"_id": "$_id",
"firstSource": { "$first": "$firstSource" },
"secondSource": { "$first": "$sources" }
}}
])
So that is going to give you the "first two elements" of that inner array. It's the basic process for implementing $slice in the aggregation framework, which is required since you cannot use standard projection with a "nested array" in the way you are trying.
Since $slice is not supported otherwise with the aggregation framework, you can see that doing "paging" would be a pretty horrible and "iterative" operation in order to "pluck" the array elements.
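That limitation applies to the server versions current when this was written. MongoDB 3.2 added $slice as an aggregation expression, so on such versions a page of the inner array can be projected directly. The sketch below assumes 3.2 or later; buildPagePipeline is a hypothetical helper name, and the ids are passed in as parameters so the snippet runs outside the shell.

```javascript
// Assumption: MongoDB 3.2+, where { $slice: [array, position, n] } works in $project.
function buildPagePipeline(formId, questionId, skip, limit) {
  return [
    // Match the required document
    { $match: { _id: formId } },
    // One document per question
    { $unwind: "$Questions" },
    // Keep only the requested question
    { $match: { "Questions._id": questionId } },
    // Return one page of the inner array plus its total count
    { $project: {
        _id: 0,
        DataSourceItemCount: "$Questions.DataSourceItemCount",
        DataSource: { $slice: ["$Questions.DataSource", skip, limit] }
    } }
  ];
}

// In the shell, pass ObjectId("...") values for the two ids:
// db.forms.aggregate(buildPagePipeline(formId, questionId, 0, 2))
const example = buildPagePipeline("form-id", "question-id", 0, 2);
console.log(JSON.stringify(example[3].$project.DataSource));
```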
I could at this point suggest "flattening" to a single array, but the same "slicing" problem applies because even if you made "QuestionId" a property of the "inner" data, it has the same projection and selection problems, for which you need the same aggregation approach.
Then there is this "seemingly" not great structure for your data ( for some query operations ) but it all depends on your usage patterns. This structure suits this type of operation:
{
"_id" : ObjectId("557e8c93a6df1a22041e0879"),
"QuestionCount" : 2.0000000000000000,
"Questions" : {
"557e8c9ba6df1a22041e087a": {
"DataSource" : [],
"DataSourceItemCount" : NumberLong(0)
},
"557e8c9fa6df1a22041e087b": {
"DataSource" : [
{
"_id" : ObjectId("557e9428a6df1a198011fa55"),
"CreationDate" : ISODate("2015-06-15T09:00:24.485Z"),
"IsActive" : true,
"Text" : "sdf",
"Value" : "sdf"
},
{
"_id" : ObjectId("557e98e9a6df1a1a88da8b1d"),
"CreationDate" : ISODate("2015-06-15T09:20:41.027Z"),
"IsActive" : true,
"Text" : "das",
"Value" : "asdf"
}
],
"DataSourceItemCount" : NumberLong(5)
}
}
}
Where this works:
db.collection.find(
{
"_id": ObjectId("557e8c93a6df1a22041e0879"),
"Questions.557e8c9fa6df1a22041e087b": { "$exists": true }
},
{
"_id": 0,
"Questions.557e8c9fa6df1a22041e087b.DataSource": { "$slice": [0, 2] },
"Questions.557e8c9fa6df1a22041e087b.DataSourceItemCount": 1
}
)
Nested arrays are not great for many operations, particularly update operations since it is not possible to get the "inner" array index for update operations. The positional $ operator will only get the "first" or "outer" array index and cannot "also" match the inner array index.
Updates with a structure like you have involve "reading" the document as a whole and then manipulating in code and writing back. There is no "guarantee" that the document has not changed in the collection between those operations and it can lead to inconsistencies unless handled properly.
On the other hand, the revised structure as shown, works well for the type of query given, but may be "bad" if you need to dynamically search or "aggregate" across what you have represented as the "outer" "Questions".
Data structure with MongoDB is very subjective to "how you use it". So it is best to consider all of your usage patterns before "nailing down" a final data structure design for your application.
So you can either take note of the problems and solutions as noted, or simply live with retrieving the "outer" element via the standard "positional" match and then just "slice" in your client code.
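The client-side option mentioned above can be as small as the sketch below. It assumes the document (or just the positionally matched question) has already been fetched; pageDataSource is a hypothetical helper, and ids are compared as strings to keep the example free of driver types.

```javascript
// Hypothetical helper: page through one question's DataSource in client code.
function pageDataSource(doc, questionId, skip, limit) {
  const question = (doc.Questions || []).find(
    q => String(q._id) === String(questionId)
  );
  if (!question) return [];
  return question.DataSource.slice(skip, skip + limit);
}

// Sample data with string ids standing in for ObjectId values
const form = {
  Questions: [
    { _id: "q1", DataSource: [] },
    { _id: "q2", DataSource: [
      { Text: "sdf" }, { Text: "das" }, { Text: "asdf" }, { Text: "asd" }, { Text: "asd" }
    ] }
  ]
};

console.log(pageDataSource(form, "q2", 0, 2).map(d => d.Text).join(", ")); // prints "sdf, das"
```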
It's all a matter of "what suits your application best".

What's the $unwind operator in MongoDB?

This is my first day with MongoDB so please go easy with me :)
I can't understand the $unwind operator, maybe because English is not my native language.
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
The project operator is something I can understand, I suppose (it's like SELECT, isn't it?). But then, $unwind (citing) returns one document for every member of the unwound array within every source document.
Is this like a JOIN? If yes, how the result of $project (with _id, author, title and tags fields) can be compared with the tags array?
NOTE: I've taken the example from MongoDB website, I don't know the structure of tags array. I think it's a simple array of tag names.
The thing to remember is that MongoDB employs a "NoSQL" approach to data storage, so banish thoughts of selects, joins, etc. from your mind. It stores your data in the form of documents and collections, which allows for a dynamic means of adding and retrieving data.
That being said, to understand the concept behind the $unwind operator, you must first understand what the example you are quoting is doing. The example document from mongodb.org is as follows:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date () ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
Notice how tags is actually an array of 3 items, in this case being "fun", "good" and "fun".
What $unwind does is peel off one document for each element of the array and return those resulting documents.
To think of this in a classical approach, it would be the equivalent of "for each item in the tags array, return a document with only that item".
Thus, the result of running the following:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
would return the following documents:
{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"OK" : 1
}
Notice that the only thing changing in the result array is what is being returned in the tags value. If you need an additional reference on how this works, I've included a link here.
$unwind duplicates each document in the pipeline, once per array element.
So if your input pipeline contains one article doc with two elements in tags, {$unwind: '$tags'} would transform the pipeline to be two article docs that are the same except for the tags field. In the first doc, tags would contain the first element from the original doc's array, and in the second doc, tags would contain the second element.
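The "one copy per array element" behaviour can be mimicked in a few lines of plain JavaScript. This is only an illustration of the idea, not how the server implements it; unwind is a hypothetical helper and it assumes the field always holds an array.

```javascript
// Hypothetical helper: emit one copy of each document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const element of doc[field]) {
      // Same document, but the array is replaced by a single element
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

const article = {
  title: "this is my title",
  author: "bob",
  tags: ["fun", "good", "fun"]
};

const result = unwind([article], "tags");
console.log(result.map(d => d.tags).join(", ")); // prints "fun, good, fun"
```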
Consider the example below to understand this.
Data in a collection:
{
"_id" : 1,
"shirt" : "Half Sleeve",
"sizes" : [
"medium",
"XL",
"free"
]
}
Query -- db.test1.aggregate( [ { $unwind : "$sizes" } ] );
output
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "medium" }
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "XL" }
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "free" }
As per the official MongoDB documentation:
$unwind Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
Explanation through basic example :
A collection inventory has the following documents:
{ "_id" : 1, "item" : "ABC", "sizes": [ "S", "M", "L"] }
{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }
The following $unwind operations are equivalent and return a document for each element in the sizes field. If the sizes field does not resolve to an array but is not missing, null, or an empty array, $unwind treats the non-array operand as a single element array.
db.inventory.aggregate( [ { $unwind: "$sizes" } ] )
or
db.inventory.aggregate( [ { $unwind: { path: "$sizes" } } ] )
Above query output :
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
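The reason documents 2, 4 and 5 are absent from that output can be traced with a small plain-JavaScript model of the default rules quoted above. The helper elementsOf is hypothetical and only mirrors the documented behaviour for missing, null, empty and non-array values.

```javascript
// Hypothetical helper: which elements does a default $unwind see for a value?
function elementsOf(value) {
  if (value === undefined || value === null) return []; // missing or null: no output
  if (Array.isArray(value)) return value;               // an empty array yields nothing
  return [value];                                       // non-array: treated as one element
}

const inventory = [
  { _id: 1, item: "ABC", sizes: ["S", "M", "L"] },
  { _id: 2, item: "EFG", sizes: [] },
  { _id: 3, item: "IJK", sizes: "M" },
  { _id: 4, item: "LMN" },
  { _id: 5, item: "XYZ", sizes: null }
];

const unwound = inventory.flatMap(doc =>
  elementsOf(doc.sizes).map(size => ({ ...doc, sizes: size }))
);

console.log(unwound.map(d => `${d._id}:${d.sizes}`).join(" ")); // prints "1:S 1:M 1:L 3:M"
```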
Why is it needed?
$unwind is very useful when performing aggregation. It breaks a complex/nested document into simple documents before performing operations such as sorting, searching, etc.
To know more about $unwind :
https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/
To know more about aggregation :
https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
Let me explain it in a way that relates to the RDBMS world. This is the statement:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
to apply to the document / record:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date () ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
The $project / SELECT simply returns these fields/columns, as in:
SELECT author, title, tags FROM article
Next is the fun part of Mongo: consider the array tags : [ "fun" , "good" , "fun" ] as another related table named "tags" (it can't be a lookup/reference table because its values contain duplicates). Remember that SELECT generally produces rows vertically, so unwinding "tags" means splitting the array vertically into rows of the table "tags".
The end result of $project + $unwind:
Translate the output to JSON:
{ "author": "bob", "title": "this is my title", "tags": "fun"},
{ "author": "bob", "title": "this is my title", "tags": "good"},
{ "author": "bob", "title": "this is my title", "tags": "fun"}
Because we didn't tell Mongo to omit the "_id" field, it is added automatically.
The key is to make it table-like to perform aggregation.
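As an illustration of what the table-like shape enables, the sketch below counts how often each tag occurs, roughly the equivalent of a SQL GROUP BY on the unwound rows. The pipeline is what one might pass to the shell; the loop underneath is a plain-JavaScript equivalent for the sample article.

```javascript
// Pipeline form: one row per tag, then group the rows by tag value
const pipeline = [
  { $unwind: "$tags" },
  { $group: { _id: "$tags", count: { $sum: 1 } } }
];
// In the shell: db.article.aggregate(pipeline)

// Plain-JavaScript equivalent on the sample article's tags
const tags = ["fun", "good", "fun"];
const counts = {};
for (const tag of tags) {
  counts[tag] = (counts[tag] || 0) + 1;
}

console.log(JSON.stringify(counts)); // prints {"fun":2,"good":1}
```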