Restricting a term query on a field with a path-tokenizer-based hierarchy analyzer to the first child level - elasticsearch-dsl

In an Elasticsearch index, I have a hierarchy analyzer on a field that holds a path, which is split with a path tokenizer.
I intend to use this to query a flat document structure (all documents are top-level documents in the index) and select a sub-tree. The path encodes the tree.
But how can a query restrict the result set to the first child level of the term when needed?
Example:
{
"id": "root",
"path": "root",
"type": "list"
}
{
"id": "one",
"path": "root/one",
"type": "list"
}
{
"id": "two",
"path": "root/two",
"type": "list"
}
{
"id": "three",
"path": "root/two/three",
"type": "list"
}
{
"id": "four",
"path": "root/two/four",
"type": "item"
}
The path field has the hierarchy analyzer.
If I do a term query on path for root and type: list, I then get root, root/one, root/two and root/two/three.
Querying for root/two with type: item will return root/two/four.
The hierarchy analyzer makes it possible to select all sub-documents, whatever the nesting depth. This is good for some use cases.
But other use cases should only return documents from the first child level.
For example, when querying for root, the result set should contain root/one and root/two, but not root itself and not root/two/three or root/two/four.
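To see why a term query for root matches every descendant, it can help to look at the tokens a path tokenizer would emit for each document. The sketch below is a plain JavaScript emulation, not Elasticsearch code. It assumes / as the delimiter, and the helper names pathTokens and matchesTerm are made up for illustration.

```javascript
// Emulates the tokens a path_hierarchy-style tokenizer would emit for a path,
// assuming "/" as the delimiter (hypothetical helper, not an Elasticsearch API).
function pathTokens(path, delimiter = "/") {
  const parts = path.split(delimiter);
  // Each token is the prefix of the path up to and including one more segment.
  return parts.map((_, i) => parts.slice(0, i + 1).join(delimiter));
}

// A term query matches a document when the term equals one of its tokens.
function matchesTerm(docPath, term) {
  return pathTokens(docPath).includes(term);
}

const paths = ["root", "root/one", "root/two", "root/two/three", "root/two/four"];
const matched = paths.filter((p) => matchesTerm(p, "root"));

console.log(pathTokens("root/two/three")); // [ 'root', 'root/two', 'root/two/three' ]
console.log(matched.length); // 5
```

Every document carries the token root, so the term clause alone cannot tell a direct child from a deeper descendant.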
GET /myindex/_search
{
"query": {
"bool": {
"must": [
{"term": {"path": "root"}},
{"term": {"type":"list"}}
]
}
}
}
The above query should exclude root itself, as well as root/two/three and its descendants or any other documents deeper than level two. How can it be extended to do so?
In other words: the above query should only return direct descendants of root when querying for root on the path field, which has the hierarchy analyzer with a path tokenizer.
But other queries should still be able to retrieve a whole "tree", for which the hierarchy analyzer is helpful.
Can this be done in the query, and how?
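One possible direction, which the post itself does not confirm, is to store the nesting depth in an extra field at index time and add a filter on it. The sketch below only builds the request body in JavaScript. The depth field, and the helpers depthOf and directChildrenQuery, are assumptions for illustration and would require a mapping change and reindexing.

```javascript
// Hypothetical helper: number of path segments, to be stored in an assumed
// numeric "depth" field when a document is indexed.
function depthOf(path, delimiter = "/") {
  return path.split(delimiter).length;
}

// Builds a search body that keeps the original term clauses and adds a filter
// on the assumed "depth" field, so only direct children of parentPath match.
function directChildrenQuery(parentPath, type) {
  return {
    query: {
      bool: {
        must: [
          { term: { path: parentPath } },
          { term: { type: type } },
        ],
        filter: [{ term: { depth: depthOf(parentPath) + 1 } }],
      },
    },
  };
}

const body = directChildrenQuery("root", "list");
console.log(JSON.stringify(body, null, 2));
```

Queries that need the whole sub-tree would simply omit the depth filter, so both use cases could share the same path field.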

Related

MongoDB - Rename keys in different collections with different structures

I am trying to refactor all the keys over a bunch of collections that match a certain regex or have the same name. The issue is that in different documents or collections, the keys in question may appear at different nesting levels in different locations.
For example, let's say we need to rename the key "sound" to "noise" in the following:
Animals collection:
{
"_id": "4ebb8fd68f7aaffc5d287383",
"animal": "cat",
"name": "fluffy",
"type": "long-haired",
"sound": "meow"
}
Events collection:
{
"_id": "4ebb8fd68f7abac75d287341",
"event": "thunder",
"description": {
"type": "natural",
"sound": "boom"
}
}
How would you go about doing it? Ideally via raw mongo queries, or pymongo if necessary.
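Since the key can sit at any nesting level, one way to think about the task is as a recursive walk over each document. The sketch below is an in-memory JavaScript illustration only, not a raw mongo query. The function name renameKeyDeep is made up, and only plain objects and arrays are walked, so other values (for example ObjectId or Date instances) are passed through untouched.

```javascript
// Returns a copy of value in which every key named oldKey, at any nesting
// level, is renamed to newKey. Arrays are walked element by element.
function renameKeyDeep(value, oldKey, newKey) {
  if (Array.isArray(value)) {
    return value.map((item) => renameKeyDeep(item, oldKey, newKey));
  }
  const isPlainObject =
    value !== null &&
    typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype;
  if (isPlainObject) {
    const result = {};
    for (const [key, child] of Object.entries(value)) {
      result[key === oldKey ? newKey : key] = renameKeyDeep(child, oldKey, newKey);
    }
    return result;
  }
  return value; // primitives and non-plain objects are returned unchanged
}

const eventDoc = {
  _id: "4ebb8fd68f7abac75d287341",
  event: "thunder",
  description: { type: "natural", sound: "boom" },
};
const renamed = renameKeyDeep(eventDoc, "sound", "noise");
console.log(renamed.description); // { type: 'natural', noise: 'boom' }
```

In practice each transformed document would still have to be written back to its collection, which this sketch leaves out.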

Sorting by nested objects attributes in mongoose when using populate

I'm trying to sort parent documents by an attribute from a populated child document.
So if a person schema has an attribute business and the business schema has an attribute name, I want to sort the list of person documents alphabetically by the name of their business.
This seems feasible, since the relationship between person and business is always 1 to 1. But mongoose doesn't seem to allow such a sorting mechanism: whenever I pass business.name as the sort argument, it falls back to the default sort (the same result as passing an unknown argument).
I'm trying to use aggregate, but the docs on that are very bad and not all arguments are clear.
I would like to know if there is a way of doing this.
This is my aggregate code:
let populatedArray = [
{
"path": "business",
"schema": require("../models/business").collection.name
},
{
"path": "createdBy",
"schema": require("../models/User").collection.name
},
{
"path": "schema2",
"schema": require("../models/schema2").collection.name
},
{
"path": "schema3",
"schema": require("../models/schema3").collection.name
},
{
"path": "schema4",
"schema": require("../models/schema4").collection.name
},
{
"path": "schema5",
"schema": require("../models/schema5").collection.name
},
{
"path": "schema6",
"schema": require("../models/schema6").collection.name,
"populate": [{
"path": "schema7",
"schema": require("../models/schema7").collection.name
}]
}];
populatedArray.forEach((elem)=>{
docsPromise.lookup({from:elem.schema,localField: elem.path, foreignField: '_id', as: elem.path})
docsPromise.unwind("$"+elem.path)
})
With the unwind command I get no documents. Without the unwind command I get 500 documents, while I only have 140 in the database. I know that aggregate behaves much like a LEFT JOIN in a SQL DB, which can give a similar result, but I don't know how to stop it from doing so.
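A plain $unwind drops every input document whose array is empty, so one lookup that finds nothing for any parent would explain an empty result. As a hedged sketch, the stages could be built with the preserveNullAndEmptyArrays option of $unwind and followed by a $sort on the joined field. Whether this resolves the counts described above is not known. The collection names below are placeholders, and buildPipeline is a made-up helper.

```javascript
// Builds aggregation stages for a list of { path, from } joins, then sorts by
// a field of one joined document. Collection names here are placeholders.
function buildPipeline(joins, sortField) {
  const stages = [];
  for (const { path, from } of joins) {
    stages.push({
      $lookup: { from, localField: path, foreignField: "_id", as: path },
    });
    // preserveNullAndEmptyArrays keeps parents whose lookup found nothing;
    // a plain $unwind would drop them.
    stages.push({
      $unwind: { path: "$" + path, preserveNullAndEmptyArrays: true },
    });
  }
  stages.push({ $sort: { [sortField]: 1 } });
  return stages;
}

const pipeline = buildPipeline(
  [
    { path: "business", from: "businesses" },
    { path: "createdBy", from: "users" },
  ],
  "business.name"
);
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array could be passed to the model's aggregate call in place of the chained lookup and unwind calls.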

Mongodb: compute a value before using it as connectFromField in an aggregate

I have mongo tree structure that looks like this:
{"_id":uid1,"parent": null, "path": "#uid1", "name": "a"}
{"_id":uid2,"parent": "uid1", "path": "#uid1#uid2", "name": "b"}
{"_id":uid3,"parent": "uid1", "path": "#uid1#uid3", "name": "c"}
{"_id":uid4,"parent": "uid2", "path": "#uid1#uid2#uid4", "name": "1"}
{"_id":uid5,"parent": "uid2", "path": "#uid1#uid2#uid5", "name": "2"}
{"_id":uid6,"parent": "uid1", "path": "#uid1#uid6", "name": "1"}
{"_id":uid7,"parent": "uid6", "path": "#uid1#uid6#uid7", "name": "x"}
where every node is represented by its unique id uidx and located by means of its parent's uid. Every time the parent of a node is modified, its path and the paths of its children are automatically modified (inside a mongoose pre-save).
The example above can be represented as follows:
a
|_b
| |_1
| |_2
|_c
|_1
  |_x
My goal is to build a request that will get only the leaves under a specified node.
Had I stored the parent path inside the field parent instead of only the parent identifier I would have been able to do it using the following request:
db.tree.aggregate([
{$match:{"parent": {$regex:"^#uid1#uid2"}}},
{$graphLookup:{
from:"tree",
startWith:"$path",
connectFromField:"path",
connectToField:"parent",
as:"dep"}},
{$match:{dep:[]}},
{$project:{"_id":0, path:1}}
])
as already answered in my previous question here: Mongodb: get only leaves of tree
The problem is I did not.
So I have to somehow transform the connectToField value in my request so that it represents the path of my parent instead of the id of my parent. Does anybody have an idea of how to do this?
This question is a rewritten version of my previous question here: previous version
You don't need to calculate anything, nor rely on the path.
It is the exact use case from https://docs.mongodb.com/manual/reference/operator/aggregation/graphLookup/. The { $match: { dep: [] } } stage returns nodes with no children, i.e. leaves.
db.tree.aggregate([
{ $graphLookup: {
from:"tree",
startWith:"$_id",
connectFromField:"_id",
connectToField:"parent",
as:"dep",
maxDepth: 1
} },
{ $match: { dep: [] } },
])
maxDepth: 1 is added to speed things up a bit. As soon as a node has a direct child, you don't care about the rest of the branch, so a shallow lookup is sufficient. (maxDepth counts recursion steps, so maxDepth: 0 would already cover the direct children.)
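The idea behind the { $match: { dep: [] } } stage, that a leaf is a node which no other node names as its parent, can be illustrated in memory. This is a plain JavaScript sketch over the sample data from the question, not a replacement for the aggregation. The function name leaves is made up, and the ids are written as strings for simplicity.

```javascript
// Sample tree from the question, with ids written as plain strings.
const tree = [
  { _id: "uid1", parent: null, name: "a" },
  { _id: "uid2", parent: "uid1", name: "b" },
  { _id: "uid3", parent: "uid1", name: "c" },
  { _id: "uid4", parent: "uid2", name: "1" },
  { _id: "uid5", parent: "uid2", name: "2" },
  { _id: "uid6", parent: "uid1", name: "1" },
  { _id: "uid7", parent: "uid6", name: "x" },
];

// A node is a leaf when its _id never appears as another node's parent.
function leaves(nodes) {
  const parentIds = new Set(nodes.map((node) => node.parent));
  return nodes.filter((node) => !parentIds.has(node._id));
}

console.log(leaves(tree).map((node) => node._id));
// [ 'uid3', 'uid4', 'uid5', 'uid7' ]
```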

MongoDB - how to properly model relations

Let's assume we have the following collections:
Users
{
"id": MongoId,
"username": "jsloth",
"first_name": "John",
"last_name": "Sloth",
"display_name": "John Sloth"
}
Places
{
"id": MongoId,
"name": "Conference Room",
"description": "Some longer description of this place"
}
Meetings
{
"id": MongoId,
"name": "Very important meeting",
"place": <?>,
"timestamp": "1506493396",
"created_by": <?>
}
Later on, we want to return (e.g. from REST webservice) list of upcoming events like this:
[
{
"id": MongoId(Meetings),
"name": "Very important meeting",
"created_by": {
"id": MongoId(Users),
"display_name": "John Sloth"
},
"place": {
"id": MongoId(Places),
"name": "Conference Room"
}
},
...
]
It's important to return the basic information that needs to be displayed on the main page in the web UI (so no additional calls are needed to render the table). That's why each entry contains the display_name of the user who created it and the name of the place. I think that's a pretty common scenario.
Now my question is: how should I store this information in the db (the question mark values in the Meeting document)? I see 2 options:
1) Store references to other collections:
place: MongoId(Places)
(+) data is always consistent
(-) additional calls to db have to be made in order to construct the response
2) Denormalize data:
"place": {
"id": MongoId(Places),
"name": "Conference room"
}
(+) no need for additional calls (response can be constructed based on one document)
(-) data must be updated each time related documents are modified
What is the proper way of dealing with such a scenario?
If I use option 1), how should I query the other documents? Asking about each related document separately seems like overkill. How about getting the last 20 meetings, collecting the list of related document ids and then performing a query like db.users.find({_id: { $in: <id list> }})?
If I go for option 2), how should I keep the data in sync?
Thanks in advance for any advice!
You can keep the DB model you already have and still do only a single query, as MongoDB introduced the $lookup aggregation stage in version 3.2. It is similar to a join in an RDBMS.
$lookup
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.
So instead of storing a reference to other collections, just store the document ID.
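As a rough sketch of what such a single query could look like, the pipeline below joins each meeting with its user and place and keeps only the fields needed for the list. It assumes that place and created_by hold the _id values of the related documents, that the collections are named users and places, and that "last 20" means sorting by timestamp in descending order. None of these details are confirmed by the question.

```javascript
// Aggregation stages for the meetings collection. Collection and field names
// are assumptions based on the sample documents in the question.
const upcomingMeetingsPipeline = [
  { $sort: { timestamp: -1 } }, // assumed ordering for "last 20 meetings"
  { $limit: 20 },
  // Replace the stored user id with the matching user document.
  { $lookup: { from: "users", localField: "created_by", foreignField: "_id", as: "created_by" } },
  { $unwind: "$created_by" },
  // Replace the stored place id with the matching place document.
  { $lookup: { from: "places", localField: "place", foreignField: "_id", as: "place" } },
  { $unwind: "$place" },
  // Keep only what the list view needs.
  {
    $project: {
      name: 1,
      "created_by._id": 1,
      "created_by.display_name": 1,
      "place._id": 1,
      "place.name": 1,
    },
  },
];

console.log(JSON.stringify(upcomingMeetingsPipeline, null, 2));
```

It would be run as db.meetings.aggregate(upcomingMeetingsPipeline), again assuming that collection name.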

Mongo - What is best design: store menu documents or menu item documents?

I want to store website menus in Mongo for the navigation of my CMS, but since I'm new to Mongo and the concept of documents, I'm trying to figure out what would be best:
a) Should I store menu documents, containing children and those having more children, or
b) Should I store menu item documents with parent_id and child_ids ?
Both would appear to have benefits, since in case A it's normal to load an entire menu at once as you'll need everything to display, but B might be easier to update single items?
I'm using Spring data mongo.
PS: If I asked this question in a wrong way, please let me know. I'm sure this question can be expanded to any general parent-child relationship, but I was having trouble finding the right words.
Since menus are typically going to be very small (under 16MB, I hope), the embedded form should give you the best performance:
{
  "topItem1": [
    { "name": "item1", "link": "linkValue" },
    { "name": "item2", "link": "linkValue" }
  ],
  "topItem2": [
    { "name": "item1", "link": "linkValue" },
    { "name": "item2", "link": "linkValue" },
    {
      "name": "sub-menu",
      "type": "sub",
      "items": [
        { "name": "item1", "link": "linkValue" },
        { "name": "item2", "link": "linkValue" }
      ]
    }
  ]
}
The only possible issue there is with updating the content inside nested arrays, as MongoDB can only "match" the first found array index. See the positional $ operator documentation for this.
But as long as you know the positions then this should not be a problem, using "dot notation" concepts:
db.menu.update({}, {
"$set": {
"topItem2.2.items.1": { "name": "item3", "link": "linkValue" }
}
})
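To make the dot notation path concrete, the following in-memory JavaScript sketch walks the same path through a copy of the topItem2 part of the menu document and replaces the addressed element. It only illustrates which element "topItem2.2.items.1" points at. The helper setByPath is made up and is not how MongoDB applies $set internally.

```javascript
// Walks a dot-separated path and replaces the value at its end. Numeric
// segments act as array indexes. A plain in-memory illustration only.
function setByPath(target, dottedPath, value) {
  const keys = dottedPath.split(".");
  let node = target;
  for (const key of keys.slice(0, -1)) {
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return target;
}

const menu = {
  topItem2: [
    { name: "item1", link: "linkValue" },
    { name: "item2", link: "linkValue" },
    {
      name: "sub-menu",
      type: "sub",
      items: [
        { name: "item1", link: "linkValue" },
        { name: "item2", link: "linkValue" },
      ],
    },
  ],
};

setByPath(menu, "topItem2.2.items.1", { name: "item3", link: "linkValue" });
console.log(menu.topItem2[2].items[1].name); // item3
```

The path reads as: third element of topItem2, then the second element of its items array.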
But general adding should be simple:
db.menu.update(
{ "topItem2.name": "sub-menu" },
{
"$push": {
"topItem2.2.items": { "name": "item4", "link": "linkValue" }
}
}
)
So that is a perspective on how to use the inherent embedded structure rather than associating "parent" and "child" items.
After long hard thinking I believe I would use:
{
_id: {},
submenu1: [
{label: "Whatever", url: "http://localhost/whatever"}
]
}
I thought about using related documents with IDs all sitting in a collection, but then you would have to fire off multiple queries to get the parent and its range, and possibly sub-sub ranges too. With this structure you have only one query for everything.
This structure is not infallible, however: if you change your menu items regularly, you might start to notice fragmentation. You can remedy this a little with usePowerOf2Sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes
But yes, with careful planning you should be able to use one single document for every parent menu item.