What's the $unwind operator in MongoDB?

This is my first day with MongoDB, so please go easy on me :)
I can't understand the $unwind operator, maybe because English is not my native language.
db.article.aggregate( [
  { $project : {
      author : 1 ,
      title : 1 ,
      tags : 1
  }},
  { $unwind : "$tags" }
] );
The $project operator is something I think I understand (it's like SELECT, isn't it?). But then $unwind, to quote the docs, "returns one document for every member of the unwound array within every source document."
Is this like a JOIN? If so, how can the result of $project (with the _id, author, title and tags fields) be compared with the tags array?
NOTE: I've taken the example from the MongoDB website, so I don't know the structure of the tags array. I think it's a simple array of tag names.

The thing to remember is that MongoDB employs a "NoSQL" approach to data storage, so banish thoughts of SELECTs, JOINs, etc. from your mind. It stores your data as documents in collections, which allows for a dynamic means of adding and retrieving data.
That being said, to understand the concept behind the $unwind stage, you first need to understand what the example you are quoting is saying. The example document from mongodb.org is as follows:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date () ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
Notice how tags is actually an array of 3 items, in this case being "fun", "good" and "fun".
What $unwind does is peel off one document for each element of the array and return each resulting document.
To think of this in a classical approach, it would be the equivalent of "for each item in the tags array, return a copy of the document with the tags field replaced by that single item".
Thus, the result of running the following:
db.article.aggregate( [
  { $project : {
      author : 1 ,
      title : 1 ,
      tags : 1
  }},
  { $unwind : "$tags" }
] );
would return the following documents:
{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"ok" : 1
}
Notice that the only thing that changes between the result documents is the value of the tags field. For an additional reference on how this works, see the $unwind page in the MongoDB manual.
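To make the "for each item" idea concrete, here is a plain-JavaScript sketch (not MongoDB's implementation) of what the $project + $unwind pipeline does to the sample article. The project and unwind functions are hypothetical helpers, _id is a plain string standing in for the ObjectId, and posted/comments/other are left out because $project drops them anyway.

```javascript
// Plain-JavaScript model of $project followed by $unwind (illustration only).
const article = {
  _id: "4e6e4ef557b77501a49233f6", // stand-in for the ObjectId
  title: "this is my title",
  author: "bob",
  pageViews: 5,
  tags: ["fun", "good", "fun"],
};

// $project: keep _id plus the listed fields, drop everything else.
function project(doc, fields) {
  const out = { _id: doc._id };
  for (const f of fields) {
    if (f in doc) out[f] = doc[f];
  }
  return out;
}

// $unwind: one copy of the document per array element,
// with the array field replaced by that single element.
function unwind(doc, field) {
  return doc[field].map((item) => ({ ...doc, [field]: item }));
}

const results = unwind(project(article, ["author", "title", "tags"]), "tags");
console.log(results);
```

Running it with Node.js prints three objects that differ only in tags, matching the shell output above.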

$unwind duplicates each document in the pipeline, once per array element.
So if your input pipeline contains one article doc with two elements in tags, {$unwind: '$tags'} would transform the pipeline to be two article docs that are the same except for the tags field. In the first doc, tags would contain the first element from the original doc's array, and in the second doc, tags would contain the second element.

Consider the example below to understand this.
Data in a collection:
{
"_id" : 1,
"shirt" : "Half Sleeve",
"sizes" : [
"medium",
"XL",
"free"
]
}
Query: db.test1.aggregate( [ { $unwind : "$sizes" } ] );
Output:
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "medium" }
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "XL" }
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "free" }

As per the official MongoDB documentation:
$unwind "deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element."
Explanation through a basic example:
A collection inventory has the following documents:
{ "_id" : 1, "item" : "ABC", "sizes": [ "S", "M", "L"] }
{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }
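The following $unwind operations are equivalent and return a document for each element in the sizes field. Starting in MongoDB 3.2, if the sizes field does not resolve to an array but is not missing, null, or an empty array, $unwind treats the non-array operand as a single-element array. Documents whose sizes field is missing, null, or an empty array produce no output, which is why _id 2, 4 and 5 do not appear in the output.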
db.inventory.aggregate( [ { $unwind: "$sizes" } ] )
or
db.inventory.aggregate( [ { $unwind: { path: "$sizes" } } ] )
Output of either query:
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
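The rules for non-array, missing, null and empty values can be modelled in a few lines of plain JavaScript. This is a sketch under the assumption that it mirrors the documented default behaviour; unwindDocs is a hypothetical helper, not a MongoDB API.

```javascript
// The inventory documents from the example.
const inventory = [
  { _id: 1, item: "ABC", sizes: ["S", "M", "L"] },
  { _id: 2, item: "EFG", sizes: [] },
  { _id: 3, item: "IJK", sizes: "M" },
  { _id: 4, item: "LMN" },
  { _id: 5, item: "XYZ", sizes: null },
];

// Sketch of default $unwind semantics (MongoDB 3.2+ as documented).
function unwindDocs(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing or null: the document produces no output.
    if (value === undefined || value === null) continue;
    // A non-array value is treated as a one-element array;
    // an empty array yields zero iterations, so no output either.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) out.push({ ...doc, [field]: item });
  }
  return out;
}

const output = unwindDocs(inventory, "sizes");
console.log(output);
```

If you want documents 2, 4 and 5 kept instead of dropped, the real $unwind stage accepts { path: "$sizes", preserveNullAndEmptyArrays: true }; the sketch does not model that option.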
Why is it needed?
$unwind is very useful when performing aggregation. It breaks complex/nested documents into simple documents before you perform operations such as sorting, searching, etc.
To learn more about $unwind:
https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/
To learn more about aggregation:
https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/

Let me explain it in a way that relates to the RDBMS way of thinking. This is the statement:
db.article.aggregate( [
  { $project : {
      author : 1 ,
      title : 1 ,
      tags : 1
  }},
  { $unwind : "$tags" }
] );
to apply to the document / record:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date () ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
The $project stage, like a SELECT, simply returns these fields/columns:
SELECT author, title, tags FROM article
Next is the fun part of Mongo: consider the array tags : [ "fun" , "good" , "fun" ] as another related table named "tags" (it can't be a lookup/reference table, because the values contain duplicates). Remember that SELECT generally produces rows vertically, so unwinding "tags" is like splitting the array vertically into rows of the "tags" table.
The end result of $project + $unwind:
Translate the output to JSON:
{ "author": "bob", "title": "this is my title", "tags": "fun"},
{ "author": "bob", "title": "this is my title", "tags": "good"},
{ "author": "bob", "title": "this is my title", "tags": "fun"}
Because we didn't tell Mongo to omit the "_id" field, each real output document also contains _id.
The key point is that $unwind makes the data table-like so that further aggregation stages can operate on it.
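To show why the table-like shape matters, here is a plain-JavaScript sketch of the step you would typically add next, a { $group: { _id: "$tags", count: { $sum: 1 } } } stage, which behaves like SQL's GROUP BY tags with COUNT(*). The countBy helper is hypothetical and the rows are the three unwound documents without _id.

```javascript
// The three documents produced by $project + $unwind (_id left out).
const rows = [
  { author: "bob", title: "this is my title", tags: "fun" },
  { author: "bob", title: "this is my title", tags: "good" },
  { author: "bob", title: "this is my title", tags: "fun" },
];

// Models { $group: { _id: "$<field>", count: { $sum: 1 } } }.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  // Shape each entry like a $group output document.
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const tagCounts = countBy(rows, "tags");
console.log(tagCounts); // fun: 2, good: 1
```

In MongoDB itself the order of $group output is not guaranteed, so add a $sort stage if order matters.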

Related

Mongo Query for 2 values in array list in a document field

I have a document in a collection with a field called "myList" that holds a list of items. I need to query for collection documents where an item in "myList" has a "status" of "good" and a "doneBy" value of "System":
[collection].myList
[
{
"location" : "3826487.pdf",
"status" : "good",
"time" : ISODate("2017-06-27T20:03:46.512Z"),
"reportIdx" : 0,
"doneBy" : "System"
},
{
"location" : "rt-0.pdf",
"status" : "bad",
"time" : ISODate("2017-06-28T16:24:16.559Z"),
"reportIdx" : 0,
"doneBy" : "System"
}
]
It should return every document that has a list item matching both conditions, like the first item in this list. Even though the second list item has "bad", it should return this collection document with "myList" containing both list items.
I figured out how to search on one of the fields, as below, but I'm not sure of the syntax for searching on both.
db.getCollection('[collection]').find({myList : { $elemMatch : { "status" : "good" }}})
I believe I found it:
db.getCollection('[collection]').find({ myList:
{ $all: [
{$elemMatch : { "status" : "good" }},
{$elemMatch : {"doneBy" : "System"}}
]
} })
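One caveat on the $all version: each $elemMatch may be satisfied by a different element of myList, so a document where one item is "good" and another item was done by "System" also matches. If both conditions must hold on the same item, a single $elemMatch with both fields is the usual form: db.getCollection('[collection]').find({ myList: { $elemMatch: { status: "good", doneBy: "System" } } }). The plain-JavaScript sketch below models the two semantics locally; otherDoc and its "User" value are hypothetical test data.

```javascript
// True if every key/value pair in cond equals the element's value.
const satisfies = (el, cond) =>
  Object.entries(cond).every(([key, value]) => el[key] === value);

// Models { myList: { $all: [ { $elemMatch: { status: "good" } },
//                            { $elemMatch: { doneBy: "System" } } ] } }:
// each condition may be met by a different element.
const allOfElemMatches = (doc) =>
  [{ status: "good" }, { doneBy: "System" }].every((cond) =>
    doc.myList.some((el) => satisfies(el, cond))
  );

// Models { myList: { $elemMatch: { status: "good", doneBy: "System" } } }:
// one element must meet both conditions.
const singleElemMatch = (doc) =>
  doc.myList.some((el) => satisfies(el, { status: "good", doneBy: "System" }));

// The asker's list (location/time/reportIdx omitted).
const askerDoc = {
  myList: [
    { status: "good", doneBy: "System" },
    { status: "bad", doneBy: "System" },
  ],
};
// Hypothetical document: the "good" item was not done by "System".
const otherDoc = {
  myList: [
    { status: "good", doneBy: "User" },
    { status: "bad", doneBy: "System" },
  ],
};

console.log(allOfElemMatches(askerDoc), singleElemMatch(askerDoc)); // true true
console.log(allOfElemMatches(otherDoc), singleElemMatch(otherDoc)); // true false
```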

How to search character by character in mongodb array text field?

I have documents in MongoDB like these:
[{
"_id" : 1,
"name" : "Himanshu",
"tags" : ["member", "active"]
},{
"_id" : 2,
"name" : "Teotia",
"tags" : ["employer", "withdrawal"]
},{
"_id" : 3,
"name" : "John",
"tags" : ["member", "deactive"]
},{
"_id" : 4,
"name" : "Haris",
"tags" : ["employer", "action"]
}]
What I want: given an array filter like {"tags" : ["member", "act"]}, it should return ids 1 and 2, because "member" is a full match and "act" is a partial match in two documents.
If I have a filter like {"tags" : ["mem"]}, it should return ids 1 and 3.
One more case: if I have a filter like {"tags" : ["member", "active"]}, it should return only 1.
You basically need two concepts here.
Convert each input string of the array into a regular expression anchored to the "start" of the string.
Apply the list with the $all operator to ensure "all" of them match:
var filter = { "tags": [ "mem", "active" ] };
// Map with regular expressions
filter.tags = { "$all": filter.tags.map(t => new RegExp("^" + t)) };
// makes filter to { "tags": { "$all": [ /^mem/, /^active/ ] } }
// search for results
db.collection.find(filter);
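The built-in constructor is RegExp. Also, because the filter strings are inserted into a regular expression, characters such as "." or "+" would be treated as regex syntax. Below is a sketch that escapes them (escapeRegExp, toPrefixFilter and run are hypothetical helpers) and checks the $all semantics locally against the sample documents.

```javascript
// Escape regex metacharacters so each filter string is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Build { tags: { $all: [ /^prefix1/, /^prefix2/, ... ] } }.
const toPrefixFilter = (prefixes) => ({
  tags: { $all: prefixes.map((p) => new RegExp("^" + escapeRegExp(p))) },
});

// The sample documents from the question.
const docs = [
  { _id: 1, name: "Himanshu", tags: ["member", "active"] },
  { _id: 2, name: "Teotia", tags: ["employer", "withdrawal"] },
  { _id: 3, name: "John", tags: ["member", "deactive"] },
  { _id: 4, name: "Haris", tags: ["employer", "action"] },
];

// Local stand-in for find(): with $all, every regex must match some tag.
const run = (prefixes) => {
  const regexes = toPrefixFilter(prefixes).tags.$all;
  return docs
    .filter((d) => regexes.every((re) => d.tags.some((t) => re.test(t))))
    .map((d) => d._id);
};

console.log(run(["mem"])); // [ 1, 3 ]
console.log(run(["member", "active"])); // [ 1 ]
```

In the shell you would pass the object returned by toPrefixFilter(...) straight to db.collection.find(); the local run function only imitates that match.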

Mongodb accessing documents

I've the following db:
{ "_id" : 1, "results" : [ { "product" : "abc", "score" : 10 }, { "product" : "xyz", "score" : 5 } ] }
{ "_id" : 2, "results" : [ { "product" : "abc", "score" : 8 }, { "product" : "xyz", "score" : 7 } ] }
{ "_id" : 3, "results" : [ { "product" : "abc", "score" : 7 }, { "product" : "xyz", "score" : 8 } ] }
I want to show the first score of each _id. I tried the following:
db.students.find({},{"results.$":1})
But it doesn't seem to work, any advice?
You can take advantage of aggregation pipeline to solve this.
Use $project in conjunction with $arrayElemAt to point to appropriate node index in the array.
So, to extract the first element of the results array, I have written the query below.
db.students.aggregate([ {$project: { scoredoc:{$arrayElemAt:["$results",0]}} } ]);
If you only want the scores without the product, use "$results.score" as shown below.
db.students.aggregate([ {$project: { scoredoc:{$arrayElemAt:["$results.score",0]}} } ]);
In the first query, scoredoc holds the whole first element of results; in the second, it holds only that element's score.
Hope this helps!
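Here is a plain-JavaScript sketch of what the two $arrayElemAt projections compute for the sample students documents; it only mimics the operator locally and is not how MongoDB evaluates it.

```javascript
// The sample students documents from the question.
const students = [
  { _id: 1, results: [{ product: "abc", score: 10 }, { product: "xyz", score: 5 }] },
  { _id: 2, results: [{ product: "abc", score: 8 }, { product: "xyz", score: 7 }] },
  { _id: 3, results: [{ product: "abc", score: 7 }, { product: "xyz", score: 8 }] },
];

// Like { scoredoc: { $arrayElemAt: ["$results", 0] } }: the whole first element.
const firstDocs = students.map((d) => ({ _id: d._id, scoredoc: d.results[0] }));

// Like { scoredoc: { $arrayElemAt: ["$results.score", 0] } }:
// "$results.score" is the array of scores, then index 0 is taken.
const firstScores = students.map((d) => ({
  _id: d._id,
  scoredoc: d.results.map((r) => r.score)[0],
}));

console.log(firstDocs);
console.log(firstScores); // scoredoc values 10, 8, 7
```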
Based on the description above, please try executing the following query in the MongoDB shell:
db.students.find(
{results:
{$elemMatch:{score:{$exists:true}}}}, {'results.$':1} // MongoDB 4.4+ only allows $ at the end of a projection path
)
According to the MongoDB documentation:
The positional $ operator limits the contents of an <array> from the query results to contain only the first element matching the query document.
Hence, in the query above, the positional $ operator is used in the projection to return only the first matching element of results, and thus the first score, of each document.

MongoDB: Retrieve most referenced document

I have a MongoDB collection (called 'links') with documents like this one:
{
"_id" : ObjectId("544bc8abd4c66b0e3cf12665"),
"name" : "Pet 4056 AgR",
"file" : "P0001J01",
"quotes" : [
{
"_id" : ObjectId("544bc8afd4c66b0e3cf15173"),
"name" : "Pet 4837 ED",
"file" : "P1103J03"
},
{
"_id" : ObjectId("544bc8b6d4c66b0e3cf19425"),
"name" : "ACO 845 AgR",
"file" : "P2810J07"
},
{
"_id" : ObjectId("544bc8afd4c66b0e3cf14a77"),
"name" : "ACO 1574 AgR",
"file" : "P0924J05"
}
]
}
In my db, this means that this document references 3 other documents.
For each document, in its quotes array there are no two documents with the same id/name/file. The name field is unique in the collection.
Now, I need to get the document that is the most referenced. It's the document that appears in most quotes arrays. How can I do that?
I believe this is achieved through an aggregation, but I can't figure out how to do it, especially because the names are inside an array.
Thanks! :)
You can do this using the aggregation framework, but a key feature to working with arrays is that you use the $unwind pipeline operation to first "de-normalize" the array content as separate documents:
db.links.aggregate([
// Unwind the array
{ "$unwind": "$quotes" },
// Group by the inner "name" value and count the occurrences
{ "$group": {
"_id": "$quotes.name",
"count": { "$sum": 1 }
}},
// Sort to the highest count on top
{ "$sort": { "count": -1 } },
// Just return the largest value
{ "$limit": 1 }
])
So what $unwind does here is, for each array element, take a copy of the "outer" document that owns the array and produce a new document containing the outer fields and just that single array element. Basically like this (the first two of the three output documents are shown):
{
"_id" : ObjectId("544bc8abd4c66b0e3cf12665"),
"name" : "Pet 4056 AgR",
"file" : "P0001J01",
"quotes" :
{
"_id" : ObjectId("544bc8afd4c66b0e3cf15173"),
"name" : "Pet 4837 ED",
"file" : "P1103J03"
}
},
{
"_id" : ObjectId("544bc8abd4c66b0e3cf12665"),
"name" : "Pet 4056 AgR",
"file" : "P0001J01",
"quotes" :
{
"_id" : ObjectId("544bc8b6d4c66b0e3cf19425"),
"name" : "ACO 845 AgR",
"file" : "P2810J07"
}
}
This allows other aggregation pipeline stages to access content just as any normal document, so you can $group the occurrences on "quotes.name" without a problem.
Take a good look at all of the aggregation pipeline operators, it is worth understanding what they all do.
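The four stages can be traced in plain JavaScript. This sketch uses a small hypothetical links collection (the names A, B and C are made up) and sorts by count descending, which is what "highest count on top" needs before $limit: 1.

```javascript
// Hypothetical "links" documents: A quotes B and C, B quotes C, C quotes A.
const links = [
  { name: "A", quotes: [{ name: "B" }, { name: "C" }] },
  { name: "B", quotes: [{ name: "C" }] },
  { name: "C", quotes: [{ name: "A" }] },
];

// $unwind: "$quotes" -> one document per quotes element.
const unwound = links.flatMap((doc) => doc.quotes.map((q) => ({ ...doc, quotes: q })));

// $group: { _id: "$quotes.name", count: { $sum: 1 } }
const counts = new Map();
for (const d of unwound) {
  counts.set(d.quotes.name, (counts.get(d.quotes.name) || 0) + 1);
}
const grouped = [...counts].map(([_id, count]) => ({ _id, count }));

// $sort: { count: -1 } (highest first), then $limit: 1
const top = grouped.sort((a, b) => b.count - a.count).slice(0, 1);
console.log(top); // C with count 2
```

If two names tie for the highest count, which one $limit: 1 returns is not defined unless you add a tie-breaking sort key.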

ordering fields after applying $setUnion mongoDB

I have a collection:
{
"_id" : ObjectId("5338ec2a5b5b71242a1c911c"),
"people" : [
{
"name" : "Vasya"
},
{
"age" : "30"
},
{
"weight" : "80"
}
],
"animals" : [
{
"dog" : "Sharick"
},
{
"cat" : "Barsik"
},
{
"bird" : "parrot"
}
]},{
"_id" : ObjectId("5338ec7f5b5b71242a1c911d"),
"people" : [
{
"name" : "Max"
},
{
"age" : "32"
},
{
"weight" : "78"
}
],
"animals" : [
{
"dog" : "Borbos"
},
{
"cat" : "Murka"
},
{
"bird" : "Eagle"
}
]}
Then I combine the two arrays "people" and "animals":
db.tmp.aggregate({$project:{"union":{$setUnion:["$people","$animals"]}}})
In the output:
How can I make the fields of each record's array in "result" display in a consistent order rather than randomly?
that is:
Wish I could find the quote (if I can, I will add it), but it basically comes from the CTO of MongoDB and is essentially (sic) "Set's are not considered to be ordered". And from a mathematical point of view that is very much true.
So you have stumbled upon one of the new features in the current (as of writing) 2.6 release candidate series. But as with its $addToSet counterpart, the resulting set from $setUnion is not sorted in any way.
To do this you need to $unwind and $sort and then $group again using $push, just as you always have with $addToSet. And of course you would need some common key to $sort on, which your data does not have.
Update: Here is the quote, and here is another.
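Here is a plain-JavaScript sketch of the $unwind / $sort / $group-with-$push round trip on the first sample document. Because the data has no common key, the sketch sorts by each element's single field name (keyOf); that key is purely an assumption for illustration, and in a real pipeline you would $sort on an actual shared field.

```javascript
// First sample document from the question (_id shown as a string).
const doc = {
  _id: "5338ec2a5b5b71242a1c911c",
  people: [{ name: "Vasya" }, { age: "30" }, { weight: "80" }],
  animals: [{ dog: "Sharick" }, { cat: "Barsik" }, { bird: "parrot" }],
};

// $setUnion: distinct elements of both arrays, in no guaranteed order.
const seen = new Set();
const union = [...doc.people, ...doc.animals].filter((el) => {
  const key = JSON.stringify(el);
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});

// Hypothetical sort key: the element's only field name.
const keyOf = (el) => Object.keys(el)[0];

const ordered = union
  .map((el) => ({ _id: doc._id, union: el })) // $unwind: "$union"
  .sort((a, b) => keyOf(a.union).localeCompare(keyOf(b.union))) // $sort
  .reduce(
    (acc, row) => {
      acc.union.push(row.union); // $group with $push
      return acc;
    },
    { _id: doc._id, union: [] }
  );

console.log(ordered.union.map(keyOf)); // age, bird, cat, dog, name, weight
```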