How to get a deeply embedded document _id - mongodb

I have the following Mongo document, which is part of a bigger collection called attributes, which also has Colour and Size:
> db.attributes.find({'name': {'en-UK': 'Fabric'}}).pretty()
{
    "_id" : ObjectId("543261cda14c971132fa2b91"),
    "values" : [
        {
            "source" : [
                {
                    "_id" : ObjectId("543261cda14c971132fa2b79"),
                    "name" : {
                        "en-UK" : "Combed Cotton"
                    }
                }
            ],
            "name" : [
                {
                    "_id" : ObjectId("543261cda14c971132fa2b85"),
                    "name" : {
                        "en-UK" : "Brushed 3-ply"
                    }
                },
                {
                    "_id" : ObjectId("543261cda14c971132fa2b8f"),
                    "name" : {
                        "en-UK" : "Plain Weave"
                    }
                },
                {
                    "_id" : ObjectId("543261cda14c971132fa2b90"),
                    "name" : {
                        "en-UK" : "1x1 Rib"
                    }
                }
            ]
        }
    ],
    "name" : {
        "en-UK" : "Fabric"
    }
}
I am trying to return the _id for a subdocument and have the following:
db.attributes.aggregate([
    { '$match': {'name.en-UK': 'Fabric'} },
    { '$unwind' : '$values' },
    { '$project': { 'name' : '$values.name'} },
    { '$match': { '$and': [{"name.name.en-UK" : "1x1 Rib"} ] }}
])
What is the correct way to do this?
Also, the values field of Fabric is an array with two items, source and name, but if I populate it like:
> db.attributes.find({'name': {'en-UK': 'Fabric'}}).pretty()
{
    "_id" : ObjectId("543261cda14c971132fa2b91"),
    "values" : {
        "source" : [{ ... }],
        "name": [{ ... }]
    }
}
I get the following error
"errmsg" : "exception: $unwind: value at end of field path must be an array"
But if I wrap it inside square brackets this then works, so that:
> db.attributes.find({'name': {'en-UK': 'Fabric'}}).pretty()
{
    "_id" : ObjectId("543261cda14c971132fa2b91"),
    "values" : [{
        "source" : [{ ... }],
        "name": [{ ... }]
    }]
}
What am I missing, as values is an array of two objects, source and name, each containing a list of subdocuments?
Any advice is much appreciated.

What you seem to be "missing" here is that "some" of your documents either do not contain a "values" property at all, or at the very least it is "not an array". This is the basic context of the error you have been given.
Fortunately there are a couple of ways to get around this. Namely, either "testing" for the presence of an array when submitting your original query, or actually "substituting" the missing element with some kind of array when processing the pipeline.
Here are both approaches in what is effectively a redundant form, since the first $match condition really sorts this out:
db.attributes.aggregate([
    { "$match": {
        "name.en-UK": "Fabric",
        "values.0": { "$exists": true }
    }},
    { "$project": {
        "name": 1,
        "values": { "$ifNull": [ "$values", [] ] }
    }},
    { "$unwind": "$values" },
    { "$unwind": "$values.name" },
    { "$match": { "values.name.name.en-UK" : "1x1 Rib" }}
])
So as I said, this is really redundant in that the initial $match actually asks if an "initial array element" exists, which kind of means that there is an array there.
The second $project phase actually uses the $ifNull operator to "fill in" a value (basically an empty array) where the tested element does not exist. We tested for that anyway before, but this demonstrates the different approaches.
But the basic idea is either "avoiding" or "filling in" where your document does not have the expected data that you want to process, which is the cause of your error.
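If all you actually need back is the _id of the matching inner element, the same pipeline can finish with a $project that strips the output down to that field alone. A minimal sketch along the lines of the pipeline above (the subId output name is just an illustrative choice):
db.attributes.aggregate([
    { "$match": { "name.en-UK": "Fabric", "values.0": { "$exists": true } } },
    { "$unwind": "$values" },
    { "$unwind": "$values.name" },
    { "$match": { "values.name.name.en-UK": "1x1 Rib" } },
    // Keep only the _id of the matched inner "name" entry
    { "$project": { "_id": 0, "subId": "$values.name._id" } }
])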

Related

How to convert multiple documents from a single collection to a single document containing one array

I have an aggregation pipeline that nearly does what I want. I've used match / unwind / project / sort to get 99% of the way. It is returning multiple documents:
[
{
"_id" : 254.8
},
{
"_id" : 93.7
},
{
"_id" : 89.9
},
{
"_id" : 94.15
},
{
"_id" : 102.1
},
{
"_id" : 93.9
},
{
"_id" : 102.7
}
]
Note: I've added the array brackets and commas to make it more readable, but you can also read it as:
{
"_id" : 254.8
}
{
"_id" : 93.7
}
{
"_id" : 89.9
}
{
"_id" : 94.15
}
{
"_id" : 102.1
}
I need the contents of the ID fields from all 7 documents in an array of values in one document:
{values: [254.8, 93.7, 89.9, 94.15, 102.1, 93.9, 102.7]}
It would be easy to sort this with JS once I have the results but I'd rather do it in the pipeline if possible so my JS stays 100% generic and only returns pure pipeline data.
Here is what you need to complete the job:
db.collection.aggregate([
    {
        "$group": {
            "_id": null,
            "values": {
                $push: "$_id"
            }
        }
    },
    {
        "$project": {
            _id: false
        }
    }
])
The result will be:
[
{
"values": [
254.8,
93.7,
89.9,
94.15,
102.1,
93.9,
102.7
]
}
]
https://mongoplayground.net/p/pTmR_rni0J1
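A usage note: $push collects values in the order documents reach the $group stage, so keeping your existing $sort before the $group should preserve the ordering in the resulting array (at least in an unsharded pipeline). A sketch of the tail of the pipeline, assuming your earlier stages already emit the { _id: <number> } documents shown above:
db.collection.aggregate([
    // ...your existing $match / $unwind / $project / $sort stages...
    { "$group": { "_id": null, "values": { "$push": "$_id" } } },
    { "$project": { "_id": false } }
])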

Omit empty fields from MongoDB query result

Is there a way to omit empty fields (e.g. an empty string, or an empty array) from the documents in MongoDB query results (find or aggregate)?
Document in DB:
{
"_id" : ObjectId("5dc3fcb388c1c7c5620ed496"),
"name": "Bill",
"emptyString" : "",
"emptyArray" : []
}
Output:
{
"_id" : ObjectId("5dc3fcb388c1c7c5620ed496"),
"name": "Bill"
}
Similar question for Elasticsearch: Omit null fields from elasticsearch results
Please use the aggregate function.
If you want to remove a key, you can use $cond within a $project stage.
db.Speed.aggregate( [
    {
        $project: {
            name: 1,
            "_id": 1,
            "emptyString": {
                $cond: {
                    if: { $eq: [ "", "$emptyString" ] },
                    then: "$$REMOVE",
                    else: "$emptyString"
                }
            },
            "emptyArray": {
                $cond: {
                    if: { $eq: [ [], "$emptyArray" ] },
                    then: "$$REMOVE",
                    else: "$emptyArray"
                }
            }
        }
    }
] )
One way this could be done is using cursor.map(), which is available on both find() and aggregate().
The idea is to have a list of the fields that are (or could be) present in the documents, and to use the delete operator to remove from the returned document the fields which are empty strings or empty arrays (both have a length property).
Mongo Shell:
var fieldsList = ["name", "emptyString", "emptyArray"];
db.collection.find().map(function(d) {
    fieldsList.forEach(function(k) {
        if (
            k in d &&
            (Array.isArray(d[k]) ||
                (typeof d[k] === "string" || d[k] instanceof String)) &&
            d[k].length === 0
        ) {
            delete d[k];
        }
    });
    return d;
});
Test documents:
{
"_id" : ObjectId("5dc426d1f667120607ac5006"),
"name" : "Bill",
"emptyString" : "",
"emptyArray" : [ ]
}
{
"_id" : ObjectId("5dc426d1f667120607ac5007"),
"name" : "Foo",
"emptyString" : "foo",
"emptyArray" : [ ]
}
{
"_id" : ObjectId("5dc426d1f667120607ac5008"),
"name" : "Bar",
"emptyString" : "",
"emptyArray" : [
"foo",
"bar"
]
}
{
"_id" : ObjectId("5dc426d1f667120607ac5009"),
"name" : "May",
"emptyString" : "foobar",
"emptyArray" : [
"foo",
"bar"
]
}
Output:
[
{
"_id" : ObjectId("5dc426d1f667120607ac5006"),
"name" : "Bill"
},
{
"_id" : ObjectId("5dc426d1f667120607ac5007"),
"name" : "Foo",
"emptyString" : "foo"
},
{
"_id" : ObjectId("5dc426d1f667120607ac5008"),
"name" : "Bar",
"emptyArray" : [
"foo",
"bar"
]
},
{
"_id" : ObjectId("5dc426d1f667120607ac5009"),
"name" : "May",
"emptyString" : "foobar",
"emptyArray" : [
"foo",
"bar"
]
}
]
Note: if the number of fields in the documents is very large, this may not be a very optimal solution since the comparison happens against every field in the document. You might want to limit fieldsList to the properties that are likely to be an empty array or string.
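If you would rather not maintain a hard-coded field list, a variation on the same idea is to derive the keys from each document itself. A rough sketch, using the same delete-based approach in the shell:
db.collection.find().map(function(d) {
    Object.keys(d).forEach(function(k) {
        var v = d[k];
        // Drop fields that are empty strings or empty arrays
        if ((Array.isArray(v) || typeof v === "string" || v instanceof String) && v.length === 0) {
            delete d[k];
        }
    });
    return d;
});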
I think the easiest way to remove all empty-string and empty-array fields from the output is to add the aggregation stage below. (And yes, "easy" is relative when you have to create these levels of logic to accomplish such a trivial task...)
$replaceRoot: {
    newRoot: {
        $arrayToObject: {
            $filter: {
                input: {
                    $objectToArray: '$$ROOT'
                },
                as: 'item',
                cond: {
                    $and: [
                        { $ne: [ '$$item.v', [] ] },
                        { $ne: [ '$$item.v', '' ] }
                    ]
                }
            }
        }
    }
}
Just modify the cond-clause to filter out other types of fields (e.g. null).
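For example, to also drop fields that are explicitly null, the cond-clause could be extended like this (an untested sketch of just that clause, in the same shape as the stage above):
cond: {
    $and: [
        { $ne: [ '$$item.v', [] ] },
        { $ne: [ '$$item.v', '' ] },
        { $ne: [ '$$item.v', null ] }   // also remove explicit nulls
    ]
}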
btw: I haven't tested the performance of this, but at least it's generic and somewhat readable.
Edit: IMPORTANT! The $replaceRoot stage prevents MongoDB from optimizing the pipeline, so if you use it in a View that you run .find() on, it will append a $match stage to the end of the View's pipeline instead of prepending an indexed search at the start of the pipeline. This will have a significant impact on performance. You can safely use it in a custom pipeline though, as long as you have the $match stage before it (at least as far as my limited MongoDB knowledge tells me). And if anyone knows how to prepend a $match stage to a View when querying, then please leave a comment :-)
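For reference, the "custom pipeline with the $match first" usage described above might look like the following (the collection name and filter are illustrative):
db.collection.aggregate([
    // Indexed selection first, so an index can still be used
    { $match: { name: "Bill" } },
    // Then strip the empty-string / empty-array fields from each document
    { $replaceRoot: {
        newRoot: {
            $arrayToObject: {
                $filter: {
                    input: { $objectToArray: '$$ROOT' },
                    as: 'item',
                    cond: {
                        $and: [
                            { $ne: [ '$$item.v', [] ] },
                            { $ne: [ '$$item.v', '' ] }
                        ]
                    }
                }
            }
        }
    }}
])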

Paging a subdocument in a MongoDB subdocument

I want to page my data in MongoDB. I am using the $slice operator but cannot page my data. I can fetch my document, but I cannot page within it.
I want to return only 2 rows of the DataSource array.
How can I resolve this?
My query:
db.getCollection('forms').find({
    "_id": ObjectId("557e8c93a6df1a22041e0879"),
    "Questions._id": ObjectId("557e8c9fa6df1a22041e087b")
}, {
    "Questions.$.DataSource": {
        "$slice": [0, 2]
    },
    "_id": 0,
    "Questions.DataSourceItemCount": 1
})
My collection data :
/* 1 */
{
    "_id" : ObjectId("557e8c93a6df1a22041e0879"),
    "QuestionCount" : 2.0000000000000000,
    "Questions" : [
        {
            "_id" : ObjectId("557e8c9ba6df1a22041e087a"),
            "DataSource" : [],
            "DataSourceItemCount" : NumberLong(0)
        },
        {
            "_id" : ObjectId("557e8c9fa6df1a22041e087b"),
            "DataSource" : [
                {
                    "_id" : ObjectId("557e9428a6df1a198011fa55"),
                    "CreationDate" : ISODate("2015-06-15T09:00:24.485Z"),
                    "IsActive" : true,
                    "Text" : "sdf",
                    "Value" : "sdf"
                },
                {
                    "_id" : ObjectId("557e98e9a6df1a1a88da8b1d"),
                    "CreationDate" : ISODate("2015-06-15T09:20:41.027Z"),
                    "IsActive" : true,
                    "Text" : "das",
                    "Value" : "asdf"
                },
                {
                    "_id" : ObjectId("557e98eea6df1a1a88da8b1e"),
                    "CreationDate" : ISODate("2015-06-15T09:20:46.889Z"),
                    "IsActive" : true,
                    "Text" : "asdf",
                    "Value" : "asdf"
                },
                {
                    "_id" : ObjectId("557e98f2a6df1a1a88da8b1f"),
                    "CreationDate" : ISODate("2015-06-15T09:20:50.401Z"),
                    "IsActive" : true,
                    "Text" : "asd",
                    "Value" : "asd"
                },
                {
                    "_id" : ObjectId("557e98f5a6df1a1a88da8b20"),
                    "CreationDate" : ISODate("2015-06-15T09:20:53.639Z"),
                    "IsActive" : true,
                    "Text" : "asd",
                    "Value" : "asd"
                }
            ],
            "DataSourceItemCount" : NumberLong(5)
        }
    ],
    "Name" : "er"
}
Though this is possible to do with some real wrangling, you would be best off changing the document structure to "flatten" the array entries into a single array. The main reason for this is "updates", which are not atomically supported by MongoDB with respect to updating the "inner" array, due to the current limitations of the positional $ operator.
At any rate, it's not easy to deal with for the reasons that will become apparent.
For the present structure you approach it like this:
db.collection.aggregate([
    // Match the required document and `_id` is unique
    { "$match": {
        "_id": ObjectId("557e8c93a6df1a22041e0879")
    }},
    // Unwind the outer array
    { "$unwind": "$Questions" },
    // Match the inner entry
    { "$match": {
        "Questions._id": ObjectId("557e8c9fa6df1a22041e087b")
    }},
    // Unwind the inner array
    { "$unwind": "$Questions.DataSource" },
    // Keep the first element alongside all elements
    { "$group": {
        "_id": {
            "_id": "$_id",
            "questionId": "$Questions._id"
        },
        "firstSource": { "$first": "$Questions.DataSource" },
        "sources": { "$push": "$Questions.DataSource" }
    }},
    // Unwind the sources again
    { "$unwind": "$sources" },
    // Compare the elements to keep
    { "$project": {
        "firstSource": 1,
        "sources": 1,
        "seen": { "$eq": [ "$firstSource._id", "$sources._id" ] }
    }},
    // Filter out anything already "seen" (i.e. the first element)
    { "$match": { "seen": false } },
    // Group back the elements you want
    { "$group": {
        "_id": "$_id",
        "firstSource": { "$first": "$firstSource" },
        "secondSource": { "$first": "$sources" }
    }}
])
So that is going to give you the "first two elements" of that inner array. It's the basic process for implementing $slice in the aggregation framework, which is required since you cannot use standard projection with a "nested array" in the way you are trying.
Since $slice is not supported otherwise with the aggregation framework, you can see that doing "paging" would be a pretty horrible and "iterative" operation in order to "pluck" the array elements.
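As a side note, MongoDB 3.2 and later do add a $slice aggregation expression, so if that is available to you the "first two DataSource entries" can be produced much more directly. A hedged sketch, assuming a server where that expression exists:
// MongoDB 3.2+ only: $slice is available as an aggregation expression
db.collection.aggregate([
    { "$match": { "_id": ObjectId("557e8c93a6df1a22041e0879") } },
    { "$unwind": "$Questions" },
    { "$match": { "Questions._id": ObjectId("557e8c9fa6df1a22041e087b") } },
    { "$project": {
        "_id": 0,
        "DataSourceItemCount": "$Questions.DataSourceItemCount",
        "DataSource": { "$slice": [ "$Questions.DataSource", 0, 2 ] }
    }}
])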
I could at this point suggest "flattening" to a single array, but the same "slicing" problem applies, because even if you made "QuestionId" a property of the "inner" data, it has the same projection and selection problems, for which you need the same aggregation approach.
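For illustration only, that "flattened" single-array shape might look something like this (field names such as QuestionId and SourceId are assumed, not taken from your data):
{
    "_id" : ObjectId("557e8c93a6df1a22041e0879"),
    "Name" : "er",
    "Questions" : [
        {
            "QuestionId" : ObjectId("557e8c9fa6df1a22041e087b"),
            "SourceId" : ObjectId("557e9428a6df1a198011fa55"),
            "CreationDate" : ISODate("2015-06-15T09:00:24.485Z"),
            "IsActive" : true,
            "Text" : "sdf",
            "Value" : "sdf"
        }
    ]
}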
Then there is this "seemingly" not great structure for your data (for some query operations), but it all depends on your usage patterns. This structure suits this type of operation:
{
    "_id" : ObjectId("557e8c93a6df1a22041e0879"),
    "QuestionCount" : 2.0000000000000000,
    "Questions" : {
        "557e8c9ba6df1a22041e087a": {
            "DataSource" : [],
            "DataSourceItemCount" : NumberLong(0)
        },
        "557e8c9fa6df1a22041e087b": {
            "DataSource" : [
                {
                    "_id" : ObjectId("557e9428a6df1a198011fa55"),
                    "CreationDate" : ISODate("2015-06-15T09:00:24.485Z"),
                    "IsActive" : true,
                    "Text" : "sdf",
                    "Value" : "sdf"
                },
                {
                    "_id" : ObjectId("557e98e9a6df1a1a88da8b1d"),
                    "CreationDate" : ISODate("2015-06-15T09:20:41.027Z"),
                    "IsActive" : true,
                    "Text" : "das",
                    "Value" : "asdf"
                }
            ],
            "DataSourceItemCount" : NumberLong(5)
        }
    }
}
Where this works:
db.collection.find(
    {
        "_id": ObjectId("557e8c93a6df1a22041e0879"),
        "Questions.557e8c9fa6df1a22041e087b": { "$exists": true }
    },
    {
        "_id": 0,
        "Questions.557e8c9fa6df1a22041e087b.DataSource": { "$slice": [0, 2] },
        "Questions.557e8c9fa6df1a22041e087b.DataSourceItemCount": 1
    }
)
Nested arrays are not great for many operations, particularly update operations since it is not possible to get the "inner" array index for update operations. The positional $ operator will only get the "first" or "outer" array index and cannot "also" match the inner array index.
Updates with a structure like you have involve "reading" the document as a whole and then manipulating in code and writing back. There is no "guarantee" that the document has not changed in the collection between those operations and it can lead to inconsistencies unless handled properly.
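One common way to guard that read-modify-write cycle is an optimistic "version" check, so the write only succeeds if the document has not changed since it was read. A rough shell sketch, assuming a hypothetical numeric version field stored on the document:
// Assumes a (hypothetical) numeric "version" field exists on the document
var doc = db.getCollection('forms').findOne({ "_id": ObjectId("557e8c93a6df1a22041e0879") });
// ...modify doc.Questions in application code...
var res = db.getCollection('forms').update(
    { "_id": doc._id, "version": doc.version },                         // only write if unchanged
    { "$set": { "Questions": doc.Questions }, "$inc": { "version": 1 } }
);
// res.nModified === 0 means the document changed in between: re-read and retry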
On the other hand, the revised structure as shown, works well for the type of query given, but may be "bad" if you need to dynamically search or "aggregate" across what you have represented as the "outer" "Questions".
Data structure with MongoDB is very subjective to "how you use it". So it is best to consider all of your usage patterns before "nailing down" a final data structure design for your application.
So you can either take note of the problems and solutions as noted, or simply live with retrieving the "outer" element via the standard "positional" match and then just "slice" in your client code.
It's all a matter of "what suits your application best".

Aggregation framework flatten subdocument data with parent document

I am building a dashboard that rotates between different webpages. I want to pull all slides that are part of the "Test" deck and order them appropriately. After the query my result would ideally look like:
[
    { "url" : "http://10.0.1.187", "position": 1, "duration": 10 },
    { "url" : "http://10.0.1.189", "position": 2, "duration": 3 }
]
I currently have a dataset that looks like the following
{
    "_id" : ObjectId("53a612043c24d08167b26f82"),
    "url" : "http://10.0.1.189",
    "decks" : [
        {
            "title" : "Test",
            "position" : 2,
            "duration" : 3
        }
    ]
}
{
    "_id" : ObjectId("53a6103e3c24d08167b26f81"),
    "decks" : [
        {
            "title" : "Test",
            "position" : 1,
            "duration" : 2
        },
        {
            "title" : "Other Deck",
            "position" : 1,
            "duration" : 10
        }
    ],
    "url" : "http://10.0.1.187"
}
My attempted query looks like:
db.slides.aggregate([
    {
        "$match": {
            "decks.title": "Test"
        }
    },
    {
        "$sort": {
            "decks.position": 1
        }
    },
    {
        "$project": {
            "_id": 0,
            "position": "$decks.position",
            "duration": "$decks.duration",
            "url": 1
        }
    }
]);
But it does not yield my desired results. How can I query my dataset and get my expected results in an optimal way?
Well, to truly "flatten" the document as your title suggests, $unwind is always going to be employed, as there really is no other way to do that. There are however some different approaches if you can live with the array being filtered down to the matching element.
Basically speaking, if you really only have one thing to match in the array then your fastest approach is to simply use .find() matching the required element and projecting:
db.slides.find(
{ "decks.title": "Test" },
{ "decks.$": 1 }
).sort({ "decks.position": 1 }).pretty()
That is still an array but as long as you have only one element that matches then this does work. Also the items are sorted as expected, though of course the "title" field is not dropped from the matched documents, as that is beyond the possibilities for simple projection.
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"title" : "Test",
"position" : 1,
"duration" : 2
}
]
}
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"decks" : [
{
"title" : "Test",
"position" : 2,
"duration" : 3
}
]
}
Another approach, as long as you have MongoDB 2.6 or greater available, is using the $map operator and some others in order to both "filter" and re-shape the array "in-place" without actually applying $unwind:
db.slides.aggregate([
    { "$project": {
        "url": 1,
        "decks": {
            "$setDifference": [
                {
                    "$map": {
                        "input": "$decks",
                        "as": "el",
                        "in": {
                            "$cond": [
                                { "$eq": [ "$$el.title", "Test" ] },
                                {
                                    "position": "$$el.position",
                                    "duration": "$$el.duration"
                                },
                                false
                            ]
                        }
                    }
                },
                [false]
            ]
        }
    }},
    { "$sort": { "decks.position": 1 }}
])
The advantage there is that you can make the changes without "unwinding", which can reduce processing time with large arrays as you are not essentially creating new documents for every array member and then running a separate $match stage to "filter" or another $project to reshape.
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"position" : 1,
"duration" : 2
}
],
"url" : "http://10.0.1.187"
}
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"url" : "http://10.0.1.189",
"decks" : [
{
"position" : 2,
"duration" : 3
}
]
}
You can again either live with the "filtered" array, or if you want you can again "flatten" this truly by adding an additional $unwind, where you do not need to filter with $match as the result already contains only the matched items.
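As a sketch of that extra flattening step, the stages appended to the $setDifference pipeline above might look like this (producing output in the shape originally asked for):
db.slides.aggregate([
    // ...the $project / $setDifference stage shown above goes here...
    { "$unwind": "$decks" },
    { "$project": {
        "_id": 0,
        "url": 1,
        "position": "$decks.position",
        "duration": "$decks.duration"
    }},
    { "$sort": { "position": 1 } }
])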
But generally speaking if you can live with it then just use .find() as it will be the fastest way. Otherwise what you are doing is fine for small data, or there is the other option for consideration.
Well as soon as I posted I realized I should be using an $unwind. Is this query the optimal way to do it, or can it be done differently?
db.slides.aggregate([
    {
        "$unwind": "$decks"
    },
    {
        "$match": {
            "decks.title": "Test"
        }
    },
    {
        "$sort": {
            "decks.position": 1
        }
    },
    {
        "$project": {
            "_id": 0,
            "position": "$decks.position",
            "duration": "$decks.duration",
            "url": 1
        }
    }
]);

MongoDB query only the inner document

My mongodb collection looks like this:
{
    "_id" : ObjectId("5333bf6b2988dc2230c9c924"),
    "name" : "Mongo2",
    "notes" : [
        {
            "title" : "mongodb1",
            "content" : "mongo content1"
        },
        {
            "title" : "replicaset1",
            "content" : "replca content1"
        }
    ]
}
{
    "_id" : ObjectId("5333fd402988dc2230c9c925"),
    "name" : "Mongo2",
    "notes" : [
        {
            "title" : "mongodb2",
            "content" : "mongo content2"
        },
        {
            "title" : "replicaset1",
            "content" : "replca content1"
        },
        {
            "title" : "mongodb2",
            "content" : "mongo content3"
        }
    ]
}
I want to query only notes that have the title "mongodb2" but do not want the complete document.
I am using the following query:
> db.test.find({ 'notes.title': 'mongodb2' }, {'notes.$': 1}).pretty()
{
    "_id" : ObjectId("5333fd402988dc2230c9c925"),
    "notes" : [
        {
            "title" : "mongodb2",
            "content" : "mongo content2"
        }
    ]
}
I was expecting it to return both notes that have the title "mongodb2".
Does Mongo return only the first matching subdocument when we query for a document within a document?
The positional $ operator can only point at the first matching element that it finds.
Using aggregate:
db.test.aggregate([
    // Match only the valid documents to narrow down
    { "$match": { "notes.title": "mongodb2" } },
    // Unwind the array
    { "$unwind": "$notes" },
    // Filter just the array
    { "$match": { "notes.title": "mongodb2" } },
    // Reform via group
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "notes": { "$push": "$notes" }
    }}
])
So you can use this to "filter" specific documents from the array.
$ always refers to the first match, as does the $elemMatch projection operator.
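To see that behaviour in the shell with the same collection, an $elemMatch projection also comes back with just one element per document:
// Still returns at most one matching note per document, just like 'notes.$'
db.test.find(
    { "notes.title": "mongodb2" },
    { "notes": { "$elemMatch": { "title": "mongodb2" } } }
).pretty()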
I think you have three options:
separate the notes so each is a document of its own
accept sending more data over the network and filter client-side
use the aggregation pipeline ($match and $project); see the sketch further down
I'd probably choose option 1, but you probably have a reason for your data model.
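For option 3, on MongoDB 3.2 or later where the $filter expression is available, a sketch that keeps every matching note without unwinding might look like:
// MongoDB 3.2+ sketch: keep all notes whose title matches, drop the rest
db.test.aggregate([
    { "$match": { "notes.title": "mongodb2" } },
    { "$project": {
        "notes": {
            "$filter": {
                "input": "$notes",
                "as": "note",
                "cond": { "$eq": [ "$$note.title", "mongodb2" ] }
            }
        }
    }}
])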