MongoDB: compute a value before using it as connectFromField in an aggregate

I have a MongoDB tree structure that looks like this:
{"_id":uid1,"parent": null, "path": "#uid1", "name": "a"}
{"_id":uid2,"parent": "uid1", "path": "#uid1#uid2", "name": "b"}
{"_id":uid3,"parent": "uid1", "path": "#uid1#uid3", "name": "c"}
{"_id":uid4,"parent": "uid2", "path": "#uid1#uid2#uid4", "name": "1"}
{"_id":uid5,"parent": "uid2", "path": "#uid1#uid2#uid5", "name": "2"}
{"_id":uid6,"parent": "uid1", "path": "#uid1#uid6", "name": "1"}
{"_id":uid7,"parent": "uid6", "path": "#uid1#uid6#uid7", "name": "x"}
where every node is identified by its unique id uidx and located through its parent's uid. Every time the parent of a node is modified, its path and the paths of its children are automatically updated (inside a mongoose pre-save hook).
The example above can be represented as follows:
a
|_b
| |_1
| |_2
|_c
|_1
  |_x
My goal is to build a request that will get only the leaves under a specified node.
Had I stored the parent path inside the field parent instead of only the parent identifier I would have been able to do it using the following request:
db.tree.aggregate([
{$match:{"parent": {$regex:"^#uid1#uid2"}}},
{$graphLookup:{
from:"tree",
startWith:"$path",
connectFromField:"path",
connectToField:"parent",
as:"dep"}},
{$match:{dep:[]}},
{$project:{"_id":0, path:1}}
])
as already answered in my previous question here: Mongodb: get only leaves of tree
The problem is I did not.
So I have to somehow transform the 'connectToField' in my request so that it represents the path of my parent instead of the id of my parent. Does anybody have an idea on how to do this?
This question is a rewritten version of my previous question here: previous version

You don't need to calculate anything, nor rely on the path.
It is the exact use case from https://docs.mongodb.com/manual/reference/operator/aggregation/graphLookup/. The { $match: { dep: [] } } stage returns nodes with no children, i.e. leaves.
db.tree.aggregate([
{ $graphLookup: {
from:"tree",
startWith:"$_id",
connectFromField:"_id",
connectToField:"parent",
as:"dep",
maxDepth: 1
} },
{ $match: { dep: [] } },
])
maxDepth: 1 is added to speed things up a bit. As soon as a node has a direct child you don't care about the rest of the branch, so a shallow lookup is sufficient. (Strictly speaking, maxDepth: 0 already limits the lookup to direct children.)
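Note that the pipeline above returns every leaf in the collection. To check the "no children means leaf" idea against the sample data, and to restrict it to one subtree, here is a minimal in-memory sketch in plain JavaScript. It is not a MongoDB query: leavesUnder is a hypothetical helper, and the ids are written as strings. In the pipeline, the subtree restriction would presumably be a $match on path placed before the $graphLookup stage.

```javascript
// In-memory copy of the sample tree from the question (ids written as strings).
const tree = [
  { _id: "uid1", parent: null,   path: "#uid1",           name: "a" },
  { _id: "uid2", parent: "uid1", path: "#uid1#uid2",      name: "b" },
  { _id: "uid3", parent: "uid1", path: "#uid1#uid3",      name: "c" },
  { _id: "uid4", parent: "uid2", path: "#uid1#uid2#uid4", name: "1" },
  { _id: "uid5", parent: "uid2", path: "#uid1#uid2#uid5", name: "2" },
  { _id: "uid6", parent: "uid1", path: "#uid1#uid6",      name: "1" },
  { _id: "uid7", parent: "uid6", path: "#uid1#uid6#uid7", name: "x" },
];

// Hypothetical helper: paths of the leaves strictly below the node at rootPath.
function leavesUnder(nodes, rootPath) {
  // Ids that appear as somebody's parent, i.e. nodes with at least one child.
  const hasChild = new Set(
    nodes.map((n) => n.parent).filter((p) => p !== null)
  );
  return nodes
    .filter((n) => n.path.startsWith(rootPath + "#")) // descendants only
    .filter((n) => !hasChild.has(n._id))              // same test as dep: []
    .map((n) => n.path);
}

console.log(leavesUnder(tree, "#uid1#uid2"));
// [ '#uid1#uid2#uid4', '#uid1#uid2#uid5' ]
```

Appending "#" to the prefix keeps a node such as #uid10 from being mistaken for a descendant of #uid1.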

Related

restricting term query on field with path tokenizer based hierarchy analyzer to the first child level

In an ElasticSearch index, I have a hierarchy analyzer on a field that is holding a path, split with a path tokenizer.
This I intend to be used to query into a flat document structure (all top level documents in the index) to select a sub-tree. The path is used to encode the tree here.
But how can the query restrict the resultset to the first child level of the term, when needed?
Example:
{
"id": "root",
"path": "root",
"type": "list"
}
{
"id": "one",
"path": "root/one",
"type": "list"
}
{
"id": "two",
"path": "root/two",
"type": "list"
}
{
"id": "three",
"path": "root/two/three",
"type": "list"
}
{
"id": "four",
"path": "root/two/four",
"type": "item"
}
The path field has the hierarchy analyzer.
If I do a term query on path for root and type: list, I then get root, root/one, root/two and root/two/three.
Querying for root/two with type: item will return root/two/four.
The hierarchy analyzer makes it possible to select all sub-documents, whatever the nesting. This is good for some use-cases.
But other use-cases should only return documents from the first child level.
For example, when querying for root, the resultset should contain root/one and root/two but not root and also not root/two/three or root/two/four.
GET /myindex/_search
{
"query": {
"bool": {
"must": [
{"term": {"path": "root"}},
{"term": {"type":"list"}}
]
}
}
}
The above query should exclude root itself, as well as root/two/three and its descendants or any other documents deeper than level two. How can it be extended to do so?
In other words: the above query should only return direct descendants of root when querying for root on the path field, which has the hierarchy analyzer with path tokenizer.
But other queries should be able to retrieve a whole "tree", for which the hierarchy analyzer is helpful.
Can it be done on the query and how?
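To make the wanted result set concrete, here is a small sketch in plain JavaScript that filters the example documents on the client side. It is not an Elasticsearch query, and directChildren is a hypothetical helper. The depth comparison is the part a query would have to express, for example through an additionally indexed depth field, which is an assumption and not something the mapping described above has.

```javascript
// The example documents from the question.
const docs = [
  { id: "root",  path: "root",           type: "list" },
  { id: "one",   path: "root/one",       type: "list" },
  { id: "two",   path: "root/two",       type: "list" },
  { id: "three", path: "root/two/three", type: "list" },
  { id: "four",  path: "root/two/four",  type: "item" },
];

// Hypothetical helper: documents exactly one level below parentPath.
function directChildren(documents, parentPath) {
  const parentDepth = parentPath.split("/").length;
  return documents.filter(
    (d) =>
      d.path.startsWith(parentPath + "/") &&           // inside the sub-tree
      d.path.split("/").length === parentDepth + 1     // but only one level down
  );
}

console.log(directChildren(docs, "root").map((d) => d.path));
// [ 'root/one', 'root/two' ]
```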

Should JSON patch fail when an array or object isn't defined?

Given the JSON object
{
"property": "value"
}
If you perform a JSON patch on this object like the following
[{
"op": "add",
"path": "/otherProperty/property",
"value": "childvalue"
}]
Should the JSON patch operation fail because the otherProperty isn't defined or should the operation add the whole path?
I couldn't find any information on this.
As mentioned in the comments, the JSON Patch Internet Draft states that the operation should result in an error:
However, the object itself or an array containing it does need to
exist, and it remains an error for that not to be the case. For
example, an "add" with a target location of "/a/b" starting with this
document:
{ "a": { "foo": 1 } }
is not an error, because "a" exists, and "b" will be added to its
value. It is an error in this document:
{ "q": { "bar": 2 } }
because "a" does not exist.
That said, you can still do what you want, but you have to change the syntax by adding an object that contains the property you want. So according to Appendix A.10 of that draft you can do
[{
"op": "add",
"path": "/otherProperty",
"value": { "property" : "childvalue" }
}]
In this case you are creating a field at the root level that has a json object as body:
{
"property": "value",
"otherProperty" : {
"property" : "childvalue"
}
}
I tested this here by pasting the before and after JSON of the target resource, and it generated the same add statement I presented above.
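The rule quoted above can be sketched in a few lines of JavaScript. This is only an illustration of the "parent must exist" behaviour for object members, not a complete JSON Patch implementation: applyAdd is a hypothetical helper, and arrays, the "-" index and "~0"/"~1" escaping are deliberately left out.

```javascript
// Hypothetical helper: a minimal "add" for object members only.
function applyAdd(doc, path, value) {
  const keys = path.split("/").slice(1); // "/a/b" -> ["a", "b"]
  const last = keys.pop();
  let target = doc;
  for (const key of keys) {
    // Every segment before the last one must already exist.
    if (target === null || typeof target !== "object" || !(key in target)) {
      throw new Error(`parent "${key}" does not exist`);
    }
    target = target[key];
  }
  if (target === null || typeof target !== "object") {
    throw new Error("target location is not an object");
  }
  target[last] = value;
  return doc;
}

// Adding below a missing parent is an error...
let failed = false;
try {
  applyAdd({ property: "value" }, "/otherProperty/property", "childvalue");
} catch (err) {
  failed = true;
}

// ...while adding the whole nested object at the root works.
const patched = applyAdd({ property: "value" }, "/otherProperty", {
  property: "childvalue",
});
console.log(failed, JSON.stringify(patched));
```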

Sorting by nested objects attributes in mongoose when using populate

I'm trying to sort parent documents by an attribute from a populated child document.
So if a person schema has an attribute business and the business schema has an attribute name, I want to sort the list of person documents alphabetically by the name of their business.
This seems very possible, since the relationship between person and business is always 1 to 1. But mongoose doesn't seem to allow such a sorting mechanism: whenever I pass business.name as the sort argument, it falls back to the default sort (the same as passing an unknown argument).
I'm trying to use aggregate, but the docs on that are very bad and not all arguments are clear.
I would like to know if there is a way of doing this.
This is my aggregate code:
let populatedArray = [
{
"path": "business",
"schema": require("../models/business").collection.name
},
{
"path": "createdBy",
"schema": require("../models/User").collection.name
},
{
"path": "schema2",
"schema": require("../models/schema2").collection.name
},
{
"path": "schema3",
"schema": require("../models/schema3").collection.name
},
{
"path": "schema4",
"schema": require("../models/schema4").collection.name
},
{
"path": "schema5",
"schema": require("../models/schema5").collection.name
},
{
"path": "schema6",
"schema": require("../models/schema6").collection.name,
"populate": [{
"path": "schema7",
"schema": require("../models/schema7").collection.name
}]
}];
populatedArray.forEach((elem)=>{
docsPromise.lookup({from:elem.schema,localField: elem.path, foreignField: '_id', as: elem.path})
docsPromise.unwind("$"+elem.path)
})
With the unwind command I get no documents. Without the unwind command I get 500 documents, while I only have 140 in the database. I know that aggregate behaves much like a LEFT JOIN on a SQL DB, which can give a similar result, but I don't know how to stop it from doing so.
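One behaviour that may be relevant here: by default $unwind emits one document per array element and emits nothing for an empty array, unless preserveNullAndEmptyArrays is set. The sketch below mimics that in plain JavaScript on made-up data. The lookup and unwind helpers and the sample documents are hypothetical stand-ins, not the mongoose API, and this does not claim to explain every symptom described above.

```javascript
// Hypothetical in-memory stand-ins for the person and business collections.
const people = [
  { _id: 1, name: "Ann", business: "b2" },
  { _id: 2, name: "Bob", business: "b1" },
  { _id: 3, name: "Cid", business: "missing" }, // no matching business
];
const businesses = [
  { _id: "b1", name: "Acme" },
  { _id: "b2", name: "Zenith" },
];

// Mimics $lookup: every input document is kept, matches land in an array.
function lookup(docs, foreign, localField, as) {
  return docs.map((d) => ({
    ...d,
    [as]: foreign.filter((f) => f._id === d[localField]),
  }));
}

// Mimics $unwind: one output per array element; an empty array yields no
// output unless preserveEmpty is set (preserveNullAndEmptyArrays in MongoDB).
function unwind(docs, field, preserveEmpty = false) {
  return docs.flatMap((d) => {
    if (d[field].length > 0) {
      return d[field].map((item) => ({ ...d, [field]: item }));
    }
    if (!preserveEmpty) return [];
    const { [field]: _dropped, ...rest } = d; // the empty array field is omitted
    return [rest];
  });
}

const joined = lookup(people, businesses, "business", "business");
const strict = unwind(joined, "business");        // drops the unmatched person
const lenient = unwind(joined, "business", true); // keeps the unmatched person

// Sort by the joined business name, as the question intends.
const sorted = [...strict].sort((a, b) =>
  a.business.name.localeCompare(b.business.name)
);
console.log(strict.length, lenient.length, sorted.map((p) => p.name));
```

If a lookup matches nothing for any document (for instance because the collection name or the id type does not line up), the default unwind removes every document, which would look like an empty result.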

How to implement a RESTful API for order changes on large collection entries?

I have an endpoint that may contain a large number of resources. They are returned in a paginated list. Each resource has a unique id, a rank field and some other data.
Semantically the resources are ordered with respect to their rank. Users should be able to change that ordering. I am looking for a RESTful interface to change the rank field in many resources in a large collection.
Reordering one resource may result in a change of the rank fields of many resources. For example consider moving the least significant resource to the most significant position. Many resources may need to be "shifted down in their rank".
The collection being paginated makes the problem a little tougher. There has been a similar question before about a small collection.
The rank field is an integer type. I could change its type if it results in a reasonable interface.
For example:
GET /my-resources?limit=3&marker=234 returns :
{
"pagination": {
"prevMarker": 123,
"nextMarker": 345
},
"data": [
{
"id": 12,
"rank": 2,
"otherData": {}
},
{
"id": 35,
"rank": 0,
"otherData": {}
},
{
"id": 67,
"rank": 1,
"otherData": {}
}
]
}
Considered approaches.
1) A PATCH request for the list.
We could modify the rank fields with the standard json-patch request. For example the following:
[
{
"op": "replace",
"path": "/data/0/rank",
"value": 0
},
{
"op": "replace",
"path": "/data/1/rank",
"value": 1
},
{
"op": "replace",
"path": "/data/2/rank",
"value": 2
}
]
The problems I see with this approach:
a) Using the array indexes in path in patch operations. Each resource has already a unique ID. I would rather use that.
b) I am not sure what the array index should refer to in a paginated collection. I guess it should refer to the global index once all pages are received and merged back to back.
c) The index of a resource in a collection may be changed by other clients. What the current client thinks is at index 1 may not be at that index anymore. I guess one could add test operations to the patch request first. So the full patch request would look like:
[
{
"op": "test",
"path": "/data/0/id",
"value": 12
},
{
"op": "test",
"path": "/data/1/id",
"value": 35
},
{
"op": "test",
"path": "/data/2/id",
"value": 67
},
{
"op": "replace",
"path": "/data/0/rank",
"value": 0
},
{
"op": "replace",
"path": "/data/1/rank",
"value": 1
},
{
"op": "replace",
"path": "/data/2/rank",
"value": 2
}
]
2) Make the collection a "dictionary"/ json object and use a patch request for a dictionary.
The advantage of this approach over 1) is that we could use the unique IDs in path in patch operations.
The "data" in the returned resources would not be a list anymore:
{
"pagination": {
"prevMarker": 123,
"nextMarker": 345
},
"data": {
"12": {
"id": 12,
"rank": 2,
"otherData": {}
},
"35": {
"id": 35,
"rank": 0,
"otherData": {}
},
"67": {
"id": 67,
"rank": 1,
"otherData": {}
}
}
}
Then I could use the unique ID in the patch operations. For example:
{
"op": "replace",
"path": "/data/12/rank",
"value": 0
}
The problems I see with this approach:
a) The my-resources collection can be large, and I have difficulty with the meaning of a paginated json object, or a paginated dictionary. I am not sure whether an iteration order can be defined on this large object.
3) Have a separate endpoint for modifying the ranks with PUT
We could add a new endpoint like this PUT /my-resource-ranks.
And expect the complete list of the ordered id's to be passed in a PUT request. For example
[
{
"id": 12
},
{
"id": 35
},
{
"id": 67
}
]
We would make the MyResource.rank a readOnly field so it cannot be modified through other endpoints.
The problems I see with this approach:
a) The need to send the complete ordered list. In the PUT request for /my-resource-ranks we will not include any other data, but only the unique id's of resources. It is less severe than sending the full resources but still the complete ordered list can be large.
4) Avoid the MyResource.rank field and let the "rank" be the order in the /my-collections response.
The returned resources would not have the "rank" field in them and they will be already sorted with respect to their rank in the response.
{
"pagination": {
"prevMarker": 123,
"nextMarker": 345
},
"data": [
{
"id": 35,
"otherData": {}
},
{
"id": 67,
"otherData": {}
},
{
"id": 12,
"otherData": {}
}
]
}
The user could change the ordering with the move operation in json-patch.
[
{
"op": "test",
"path": "/data/2/id",
"value": 12
},
{
"op": "move",
"from": "/data/2",
"path": "/data/0"
}
]
The problems I see with this approach:
a) I would prefer the server to be free to return the /my-collections response in an "arbitrary" order from the client's point of view. As long as the order is consistent, the optimal order for a "simpler" server implementation may be different from the rank defined by the application.
b) Same concern as 1)b). Does index in the patch operation refer to the global index once all pages are received and merged back to back? Or does it refer to the index in the current page ?
Update:
Does anyone know further examples from an existing public API ? Looking for further inspiration. So far I have:
Spotify's Reorder a Playlist's Tracks
Google Tasks: change order, move
I would:
1) Use PATCH.
2) Define a specialized content type specifically for updating the order.
The application/json-patch+json type is pretty great for doing straight-up modifications, but I think your use case is unique enough to warrant a useful, minimal specialized content type.
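As a rough illustration of what such a minimal format could carry, here is a sketch in JavaScript. The payload shape { id, before } and the applyMove helper are purely hypothetical: the idea is that the client names resources by their unique id instead of an array index, and the server recomputes the rank fields itself.

```javascript
// Hypothetical helper: move resource `id` directly before resource `before`.
// `before: null` means "move to the end". Input is the id list sorted by rank.
function applyMove(orderedIds, { id, before }) {
  if (!orderedIds.includes(id)) throw new Error(`unknown id ${id}`);
  const rest = orderedIds.filter((x) => x !== id);
  if (before === null) return [...rest, id];
  const index = rest.indexOf(before);
  if (index === -1) throw new Error(`unknown id ${before}`);
  return [...rest.slice(0, index), id, ...rest.slice(index)];
}

// Ids from the example response, sorted by their rank (35 -> 0, 67 -> 1, 12 -> 2).
const order = [35, 67, 12];

// Body of a hypothetical PATCH request: put resource 12 in front of resource 35.
const moved = applyMove(order, { id: 12, before: 35 });

// The server derives the new rank of every resource from the new order.
const ranks = Object.fromEntries(moved.map((id, rank) => [id, rank]));
console.log(moved, ranks);
```

A single small request body then covers the "move the last resource to the top" case, and it does not depend on page boundaries or on what other clients did to the array indexes.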

Mongo - What is best design: store menu documents or menu item documents?

I want to store website menus in Mongo for the navigation of my CMS, but since I'm new to Mongo and the concept of documents, I'm trying to figure out what would be best:
a) Should I store menu documents, containing children and those having more children, or
b) Should I store menu item documents with parent_id and child_ids ?
Both would appear to have benefits, since in case A it's normal to load an entire menu at once as you'll need everything to display, but B might be easier to update single items?
I'm using Spring data mongo.
PS: If I asked this question in a wrong way, please let me know. I'm sure this question can be expanded to any general parent-child relationship, but I was having trouble finding the right words.
Since menus are typically going to be very small (well under the 16MB document limit, I hope), the embedded form should give you the best performance:
{
"topItem1": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" }
],
"topItem2": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" },
{
"name": "sub-menu",
"type": "sub",
"items": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" }
]
}
]
}
The only possible issue there is with updating the content inside nested arrays, as MongoDB can only "match" the first found array index. See the positional $ operator documentation for this.
But as long as you know the positions then this should not be a problem, using "dot notation" concepts:
db.menu.update({}, {
"$set": {
"topItem2.2.items.1": { "name": "item3", "link": "linkValue" }
}
})
But general adding should be simple:
db.menu.update(
{ "topItem2.name": "sub-menu" },
{
"$push": {
"topItem2.2.items": { "name": "item4", "link": "linkValue" }
}
}
)
So that is a perspective on how to use the inherent embedded structure rather than associate "parent" and "child" items.
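To see which element a path such as "topItem2.2.items.1" addresses, here is a small sketch in plain JavaScript that walks the same document in memory. setByDotPath is a hypothetical helper that only mimics how the dot-notation path is resolved. It is not MongoDB's $set, which among other things can create missing fields.

```javascript
// Hypothetical helper: resolve a dot-notation path and overwrite the target.
function setByDotPath(doc, dotPath, value) {
  const keys = dotPath.split(".");
  const last = keys.pop();
  let target = doc;
  for (const key of keys) {
    target = target[key]; // numeric segments index into arrays
  }
  target[last] = value;
  return doc;
}

// The menu document from above.
const menu = {
  topItem1: [
    { name: "item1", link: "linkValue" },
    { name: "item2", link: "linkValue" },
  ],
  topItem2: [
    { name: "item1", link: "linkValue" },
    { name: "item2", link: "linkValue" },
    {
      name: "sub-menu",
      type: "sub",
      items: [
        { name: "item1", link: "linkValue" },
        { name: "item2", link: "linkValue" },
      ],
    },
  ],
};

// Same path as in the update above: third entry of topItem2, second sub-item.
setByDotPath(menu, "topItem2.2.items.1", { name: "item3", link: "linkValue" });
console.log(menu.topItem2[2].items.map((item) => item.name));
// [ 'item1', 'item3' ]
```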
After long hard thinking I believe I would use:
{
_id: {},
submenu1: [
{label: "Whatever", url: "http://localhost/whatever"}
]
}
I thought about using related documents with IDs all sitting in a collection but then you would have to shoot off multiple queries to get the parent and its range, possibly even sub-sub ranges too. With this structure you have only one query for all.
This structure is not infallible, however: if you change your menu items regularly you might start to notice fragmentation. You can remedy this a little with usePowerOf2Sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes
But yes, with careful planning you should be able to use one single document for every parent menu item.