How to remove array elements when array is nested in multiple levels of embedded docs? - mongodb

Given the following MongoDB example collection ("schools"), how do you remove student "111" from all clubs?
[
{
"name": "P.S. 321",
"structure": {
"principal": "Fibber McGee",
"vicePrincipal": "Molly McGee",
"clubs": [
{
"name": "Chess",
"students": [
ObjectId("111"),
ObjectId("222"),
ObjectId("333")
]
},
{
"name": "Cricket",
"students": [
ObjectId("111"),
ObjectId("444")
]
}
]
}
},
...
]
I'm hoping there's some way other than using cursors to loop over every school, then every club, then every student ID in the club...

MongoDB doesn't have great support for arrays within arrays (within arrays ...). The simplest solution I see is to read the whole document into your app, modify it there, and then save it. Of course, the operation is then not atomic, but for your app that might be OK.
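A minimal sketch of the "modify it in the app" step described above. The helper name removeStudentFromAllClubs is made up for illustration, and plain strings stand in for the ObjectId values; loading and saving the documents is left to whatever shell or driver you use.

```javascript
// Hypothetical helper: strips one student id from every club of a school
// document that has already been read into the application.
function removeStudentFromAllClubs(school, studentId) {
  const clubs = (school.structure && school.structure.clubs) || [];
  for (const club of clubs) {
    // Compare string forms so the same code handles ObjectId-like values
    // and the plain strings used in this example.
    club.students = (club.students || []).filter(
      (id) => String(id) !== String(studentId)
    );
  }
  return school;
}

// Example data shaped like the question, with strings instead of ObjectIds.
const school = {
  name: "P.S. 321",
  structure: {
    principal: "Fibber McGee",
    clubs: [
      { name: "Chess", students: ["111", "222", "333"] },
      { name: "Cricket", students: ["111", "444"] }
    ]
  }
};

removeStudentFromAllClubs(school, "111");
// In the shell you would then write each modified document back,
// for example inside a find().forEach(...) loop over the collection.
```

As the answer notes, this read-modify-write cycle is not atomic: a concurrent write between the read and the save could be lost.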

Related

MongoDB data model for fast reads using array data

I have a dataset which returns an array named "data_arr" that contains anywhere from 5 to 200 subitems which have a labelspace & key-value pair as follows.
{
"other_fields": "other_values",
...
"data_arr":[
{
"labelspace": "A",
"color": "red"
},
{
"labelspace": "A",
"size": 500
},
{
"labelspace": "B",
"shape": "round"
},
]
}
The question is, within MongoDB, how to store this data optimized for fast reads. Specifically, there would be queries:
Comparing key-values (e.g. the average size of objects which are both red and round).
Returning all documents which meet a criterion (e.g. red objects larger than 300).
Label space is important because some key names are reused.
I've contemplated indexing with the existing structure by indexing labelspace.
I've considered grouping all labelspace key/values into a single sub-document as follows:
{
"other_fields": "other_values",
...
"data_a":
{
"color": "red",
"size": 500
},
"data_b":
{
"shape": "round"
}
}
Or modeling it as follows with a multi-value index
{
"other_fields": "other_values",
...
"data_arr":[
{
"labelspace": "A",
"key": "color",
"value": "red"
},
{
"labelspace": "A",
"key": "size",
"value": 500
},
{
"labelspace": "B",
"key": "shape",
"value": "round"
},
]
}
This is a new data set that needs to be collected. So it's difficult for me to build up enough of a sample only to discover I've ventured down the wrong path.
I think the last one is best suited for indexing, so possibly the best approach?
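To make the last layout concrete, here is a rough sketch of converting the original "data_arr" shape into the labelspace/key/value shape, plus a query document for the "red objects larger than 300" case. The function name toKeyValueForm is an assumption for illustration, and the query assumes both "color" and "size" live in labelspace "A" as in the sample.

```javascript
// Hypothetical converter: turns { labelspace, <key>: <value> } entries into
// { labelspace, key, value } entries, one per key found on the item.
function toKeyValueForm(doc) {
  const dataArr = (doc.data_arr || []).flatMap((item) => {
    const { labelspace, ...rest } = item;
    return Object.entries(rest).map(([key, value]) => ({ labelspace, key, value }));
  });
  return { ...doc, data_arr: dataArr };
}

const original = {
  other_fields: "other_values",
  data_arr: [
    { labelspace: "A", color: "red" },
    { labelspace: "A", size: 500 },
    { labelspace: "B", shape: "round" }
  ]
};

const converted = toKeyValueForm(original);

// Query document for "red objects larger than 300" against the converted
// shape. Each $elemMatch must be satisfied by a single array element.
const redAndLargerThan300 = {
  $and: [
    { data_arr: { $elemMatch: { labelspace: "A", key: "color", value: "red" } } },
    { data_arr: { $elemMatch: { labelspace: "A", key: "size", value: { $gt: 300 } } } }
  ]
};
```

With this shape a single compound index over the labelspace, key and value fields of "data_arr" could serve such lookups, whereas the grouped sub-document layout would need an index per key; whether that pays off depends on the real data.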

Use $redact to replace sub-documents with "access denied"

I've written some $redact operation to filter my documents:
db.test.aggregate([
{ $redact: {
$cond: {
if: { "$ifNull" : ["$_acl.READ", false] },
then: { $cond: {
if: { $anyElementTrue: {
$map: {
input: "$_acl.READ",
as: "myfield",
in: { $setIsSubset: [ "$$myfield", ["user1"] ] }
}
}},
then: "$$DESCEND",
else: "$$PRUNE"
}},
else: "$$DESCEND",
}
}}
])
This will remove all (sub)documents where _acl.READ doesn't contain user1, but it will keep all (sub)documents where _acl.READ is not set.
After the aggregation I can't tell whether some information was removed or whether it simply wasn't part of the document.
I'd like to remove sensitive information but keep some hint that access was denied, i.e.:
{
id: ...,
subDoc1: {
foo: "bar",
_acl: {
READ: [ ["user1"] ]
}
},
subDoc2: {
_error: "ACCESS DENIED"
}
}
I just can't figure out how to modify the document while using $redact.
Thank you!
The $redact pipeline stage is quite unique in the aggregation framework as it is not only capable of recursively descending into the nested structure of a document but also in that it can traverse across all of the keys at any level. It does however still require a concept of "depth" in that a key must either contain a sub-document object or an array which itself is composed of sub-documents.
But what it cannot do is "replace" or "swap-out" content. The only actions allowed here are fairly set, or more specifically from the documentation:
The argument can be any valid expression as long as it resolves to $$DESCEND, $$PRUNE, or $$KEEP system variables. For more information on expressions, see Expressions.
The possibly misleading statement there is "The argument can be any valid expression". That is in fact true, but the expression must still resolve to exactly what one of those system variables would resolve to.
So in order to give some sort of "Access Denied" response in place of the "redacted" content, you have to process the document differently. You also need to consider the limitations of other operators, which simply do not work recursively or in a manner that requires "traversal" as mentioned earlier.
Keeping with the example from the documentation:
{
"_id": 1,
"title": "123 Department Report",
"tags": [ "G", "STLW" ],
"year": 2014,
"subsections": [
{
"subtitle": "Section 1: Overview",
"tags": [ "SI", "G" ],
"content": "Section 1: This is the content of section 1."
},
{
"subtitle": "Section 2: Analysis",
"tags": [ "STLW" ],
"content": "Section 2: This is the content of section 2."
},
{
"subtitle": "Section 3: Budgeting",
"tags": [ "TK" ],
"content": {
"text": "Section 3: This is the content of section3.",
"tags": [ "HCS" ]
}
}
]
}
If we want to process this to "replace" when matching the "roles tags" of [ "G", "STLW" ], then you would do something like this instead:
var userAccess = [ "STLW", "G" ];
db.sample.aggregate([
{ "$project": {
"title": 1,
"tags": 1,
"year": 1,
"subsections": { "$map": {
"input": "$subsections",
"as": "el",
"in": { "$cond": [
{ "$gt": [
{ "$size": { "$setIntersection": [ "$$el.tags", userAccess ] }},
0
]},
"$$el",
{
"subtitle": "$$el.subtitle",
"label": { "$literal": "Access Denied" }
}
]}
}}
}}
])
That's going to produce a result like this:
{
"_id": 1,
"title": "123 Department Report",
"tags": [ "G", "STLW" ],
"year": 2014,
"subsections": [
{
"subtitle": "Section 1: Overview",
"tags": [ "SI", "G" ],
"content": "Section 1: This is the content of section 1."
},
{
"subtitle": "Section 2: Analysis",
"tags": [ "STLW" ],
"content": "Section 2: This is the content of section 2."
},
{
"subtitle" : "Section 3: Budgeting",
"label" : "Access Denied"
}
]
}
Basically, we are using the $map operator to process the array of items and apply a condition to each element. In this case the $cond operator first checks whether the "tags" field has any $setIntersection result with the userAccess variable we defined earlier.
Where that condition is true, the element is returned unaltered. Otherwise, rather than removing the element (not impossible, but it takes another step, since $map returns the same number of elements as it received in "input"), you just replace the returned content with something else: in this case an object that keeps the "subtitle" and adds a "label" key with the $literal value "Access Denied".
So keep in mind what you cannot do, being:
You cannot actually traverse document keys. Any processing needs to be explicit to the keys specifically mentioned.
The content therefore cannot be in any form other than an array, as MongoDB cannot traverse across keys. You would otherwise need to evaluate conditionally at each key path.
Filtering the "top-level" document is right out. Unless you really want to add an additional stage at the end that does this:
{ "$project": {
"doc": { "$cond": [
{ "$gt": [
{ "$size": { "$setIntersection": [ "$tags", userAccess ] }},
0
]},
"$$ROOT",
{
"title": "$title",
"label": { "$literal": "Access Denied" }
}
]}
}}
With all said and done, there really is not a lot of purpose in any of this unless you are indeed intending to actually "aggregate" something at the end of the day. Just making the server do exactly the same filtering of document content that you can do in client code is usually not the best use of expensive CPU cycles.
Even in the basic examples as given, it makes a lot more sense to just do this in client code unless you are really getting a major benefit out of removing entries that do not meet your conditions from being transferred over the network. In your case there is no such benefit, so it is better to do this in client code instead.
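As a rough idea of what the client-side version could look like, here is a sketch that applies the same rule as the $map/$cond pipeline to a document after it has been fetched. The function name maskSubsections is invented for this example, and the report object is a shortened copy of the documentation sample.

```javascript
// Hypothetical client-side equivalent of the $map/$cond projection:
// keep a subsection if it shares at least one tag with userAccess,
// otherwise replace it with a stub carrying an "Access Denied" label.
function maskSubsections(doc, userAccess) {
  const allowed = new Set(userAccess);
  const subsections = (doc.subsections || []).map((el) => {
    const canRead = (el.tags || []).some((tag) => allowed.has(tag));
    return canRead ? el : { subtitle: el.subtitle, label: "Access Denied" };
  });
  // Return a copy so the fetched document itself is left untouched.
  return { ...doc, subsections };
}

const report = {
  _id: 1,
  title: "123 Department Report",
  tags: ["G", "STLW"],
  year: 2014,
  subsections: [
    { subtitle: "Section 1: Overview", tags: ["SI", "G"], content: "Section 1: This is the content of section 1." },
    { subtitle: "Section 2: Analysis", tags: ["STLW"], content: "Section 2: This is the content of section 2." },
    { subtitle: "Section 3: Budgeting", tags: ["TK"], content: { text: "Section 3: This is the content of section3.", tags: ["HCS"] } }
  ]
};

const masked = maskSubsections(report, ["STLW", "G"]);
```

Note the trade-off already mentioned: the sensitive content still travels over the network to the client, so this only suits cases where the client process itself is trusted.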

MongoDB: Tree Node Structure with Object-Maps instead of Collections

Using MongoDB for storage, if I wanted to represent a tree structure of nodes, where child nodes under a single parent always have unique node-names, I believe the standard approach would be to use collections and to manage the node name uniqueness on the app level:
Approach 1: Collection Based Approach for Tree Data
{ "node_name": "home", "title": "Home", "children": [
{ "node_name": "products", "title": "Products", "children": [
{ "node_name": "electronics", "title": "Electronics", "children": [ ] },
{ "node_name": "toys", "title": "Toys", "children": [ ] } ] },
{ "node_name": "services", "title": "Services", "children": [
{ "node_name": "repair", "title": "Repair", "children": [ ] },
{ "node_name": "training", "title": "Training", "children": [ ] } ] } ] }
I have however thought of the following alternate approach, where node-names become "Object Map" field names, and we do without collections altogether:
Approach 2: Object-Map Based Approach (without Collections)
// NOTE: We don't have the equivalent of "node_name":"home" at the root, but that's not an issue in this case
{ "title": "Home", "children": {
"products": { "title": "Products", "children": {
"electronics": { "title": "Electronics", "children": { } },
"toys": { "title": "Toys", "children": { } } } },
"services": { "title": "Services", "children": {
"repair": { "title": "Repair", "children": { } },
"training": { "title": "Training", "children": { } } } } } }
The question is:
Strictly from a MongoDB perspective (considering querying, performance, data maintainability and data-size and server scaling), are there any major issues with Approach #2 (over #1)?
EDIT: After getting to know MongoDB a bit better (and thanks to Neil's comments below), I realized that both options of this question are generally the wrong way to go, because they assume that it makes sense to store multiple nodes in a single MongoDB document. Ultimately, each "node" should be a separate document and (as Neil Lunn stated in the comments) there are various ways to implement a hierarchy tree, as seen here: Model Tree Structures in MongoDB
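A small sketch of what "each node is a separate document" could look like, converting the nested tree of Approach 1 into flat documents with a parent reference. The function name toNodeDocuments and the choice of a slash-separated path as _id are assumptions for illustration, not something prescribed by the MongoDB documentation.

```javascript
// Hypothetical converter: walks a nested { node_name, title, children: [] }
// tree and emits one flat document per node, in depth-first order.
function toNodeDocuments(node, parentPath = null) {
  const path = parentPath === null ? node.node_name : parentPath + "/" + node.node_name;
  const doc = { _id: path, node_name: node.node_name, title: node.title, parent: parentPath };
  const children = node.children || [];
  return [doc].concat(children.flatMap((child) => toNodeDocuments(child, path)));
}

const tree = {
  node_name: "home", title: "Home", children: [
    { node_name: "products", title: "Products", children: [
      { node_name: "electronics", title: "Electronics", children: [] },
      { node_name: "toys", title: "Toys", children: [] } ] },
    { node_name: "services", title: "Services", children: [
      { node_name: "repair", title: "Repair", children: [] },
      { node_name: "training", title: "Training", children: [] } ] } ] };

const docs = toNodeDocuments(tree);
// Each entry could then be inserted as its own document.
```

Because the path is used as _id in this sketch, two siblings with the same node name would produce the same _id, so the uniqueness rule from the question would be enforced by the _id index rather than by application code.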
I think this use case is not a good fit for MongoDB, because:
there is no compression algorithm (as of MongoDB 2.6), so your documents will be too large
MongoDB uses database-level locks (when you write to one large document, all other operations on that database will be blocked)
it will be hard to index
I think a relational DB would be a better solution for this use case.

MongoDB and Nested Arrays

I have a collection with data like this:
{
"Name": "Steven",
"Children": [
{
"Name": "Liv",
"Children": [
{
"Name": "Milo"
}
]
},
{
"Name": "Mia"
},
{
"Name": "Chelsea"
}
]
},
{
"Name": "Ozzy",
"Children": [
{
"Name": "Jack",
"Children": [
{
"Name": "Pearl"
}
]
},
{
"Name": "Kelly"
}
]
}
Two questions
Can MongoDB flatten the arrays to a structure like this: [Steven, Liv, Milo, Mia, Chelsea, Ozzy, Jack, Pearl, Kelly]?
How can I find a document where the name is jack, no matter where in the structure it is placed?
In general, MongoDB does not perform recursive or arbitrary-depth operations on nested fields. To accomplish objectives 1 and 2 I would reconsider the structure of the data, as an arbitrarily nested document is not a good way to model a tree in MongoDB. The MongoDB docs have a good section on modeling tree structures that presents several options with examples. Pick the one that best suits your entire use case; they will all make 1 and 2 very easy.
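For comparison, this is roughly what the application would have to do if the nested structure were kept as it is: since the server will not recurse for you, both tasks become recursive walks in client code. The function names collectNames and findByName are made up for this sketch, and the match on the name is exact and case-sensitive.

```javascript
// Hypothetical helper: depth-first list of every Name in one nested document.
function collectNames(person) {
  const children = person.Children || [];
  return [person.Name].concat(children.flatMap((child) => collectNames(child)));
}

// Hypothetical helper: returns the first nested node with the given Name,
// or null when no node matches.
function findByName(person, name) {
  if (person.Name === name) return person;
  for (const child of person.Children || []) {
    const hit = findByName(child, name);
    if (hit) return hit;
  }
  return null;
}

const families = [
  { Name: "Steven", Children: [
    { Name: "Liv", Children: [{ Name: "Milo" }] },
    { Name: "Mia" },
    { Name: "Chelsea" } ] },
  { Name: "Ozzy", Children: [
    { Name: "Jack", Children: [{ Name: "Pearl" }] },
    { Name: "Kelly" } ] }
];

const allNames = families.flatMap((root) => collectNames(root));
const rootWithJack = families.find((root) => findByName(root, "Jack") !== null);
```

This requires pulling every document to the client, which is the cost the tree-modeling patterns are meant to avoid.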

Mongo - What is best design: store menu documents or menu item documents?

I want to store website menus in Mongo for the navigation of my CMS, but since I'm new to Mongo and the concept of documents, I'm trying to figure out what would be best:
a) Should I store menu documents, containing children and those having more children, or
b) Should I store menu item documents with parent_id and child_ids ?
Both would appear to have benefits, since in case A it's normal to load an entire menu at once as you'll need everything to display, but B might be easier to update single items?
I'm using Spring data mongo.
PS: If I asked this question in a wrong way, please let me know. I'm sure this question can be expanded to any general parent-child relationship, but I was having trouble finding the right words.
Since menus are typically going to be very small (under 16MB I hope) then the embedded form should give you the best performance:
{
"topItem1": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" }
],
"topItem2": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" },
{
"name": "sub-menu",
"type": "sub",
"items": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" }
]
}
]
}
The only possible issue there is with updating the content inside nested arrays, as MongoDB can only "match" the first found array index. See the positional $ operator documentation for this.
But as long as you know the positions then this should not be a problem, using "dot notation" concepts:
db.menu.update({}, {
"$set": {
"topItem2.2.items.1": { "name": "item3", "link": "linkValue" }
}
})
But general adding should be simple:
db.menu.update(
{ "topItem2.name": "sub-menu" },
{
"$push": {
"topItem2.2.items": { "name": "item4", "link": "linkValue" }
}
}
)
So that is a perspective on how to use the inherent embedded structure rather than associating "parent" and "child" items.
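If the position of the sub-menu is not known in advance, one option is to look it up in the already-loaded menu document and build the dot-notation path from it. The sketch below assumes the document shape shown above; the function name buildSubMenuPush is hypothetical.

```javascript
// Hypothetical helper: finds the index of a named sub-menu inside one
// top-level array and returns a $push update using dot notation.
// Returns null when no matching sub-menu exists.
function buildSubMenuPush(menuDoc, topKey, subMenuName, newItem) {
  const index = (menuDoc[topKey] || []).findIndex(
    (entry) => entry.type === "sub" && entry.name === subMenuName
  );
  if (index === -1) return null;
  return { $push: { [`${topKey}.${index}.items`]: newItem } };
}

const menu = {
  topItem1: [
    { name: "item1", link: "linkValue" },
    { name: "item2", link: "linkValue" }
  ],
  topItem2: [
    { name: "item1", link: "linkValue" },
    { name: "item2", link: "linkValue" },
    { name: "sub-menu", type: "sub", items: [
      { name: "item1", link: "linkValue" },
      { name: "item2", link: "linkValue" } ] }
  ]
};

const update = buildSubMenuPush(menu, "topItem2", "sub-menu", { name: "item4", link: "linkValue" });
// update could then be passed as the second argument to db.menu.update(...).
```

The index is only as fresh as the copy of the document it was computed from, so if other writers reorder the array between the read and the update, the path may point at the wrong element.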
After long hard thinking I believe I would use:
{
_id: {},
submenu1: [
{label: "Whatever", url: "http://localhost/whatever"}
]
}
I thought about using related documents with IDs all sitting in a collection but then you would have to shoot off multiple queries to get the parent and its range, possibly even sub-sub ranges too. With this structure you have only one query for all.
This structure is not infallible, however: if you change your menu items regularly you might start to notice fragmentation. You can remedy this a little with usePowerOf2Sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes
But yes, with careful planning you should be able to use one single document for every parent menu item.