Trees - saving parent path - mongodb

I have an embedded (updatable) tree structure in an array - considering the upcoming $slice abilities to select only parts of an array, I'm thinking of implementing a way to display only one (sub-)branch of the tree.
If I understand correctly, to do this efficiently, I'd have to save the path (grandparent.parent.child) in each item in the tree.
However, I can't see a nice way of managing these paths when updating; it looks like these are my options:
Trusting client-side parameters blindly and just inserting into the array without verifying the path.
Fetching the entire document, computing the path, and only then storing the new item.
What do you think? Branches cannot move around in the tree, they are only inserted/updated.

I just didn't think about it deeply enough. At the time of insertion I currently have only a parent-id, but once I have paths I can use them to find the parent, and then verify that the parent exists in exactly the same way as I currently do with the parent-id.
The only problem with this approach is that I'd need to save the path even for branches that don't have a parent (children of root).
Then I can just search for them in the array with a mongo query, and if the parent doesn't exist the path is bogus and I stop the update.
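A minimal sketch of what I mean, in the mongo shell (collection and field names are made up, and I'm assuming a modern shell's updateOne; each item stores its own materialized path):

// Push the child only if the claimed parent path actually exists in the array.
var result = db.docs.updateOne(
    { _id: docId, "tree.path": parentPath },
    { $push: { tree: { id: childId, path: parentPath + "." + childId } } }
);
// A bogus path matches nothing, so nothing gets written.
if (result.matchedCount === 0) {
    print("bogus path - update stopped");
}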

Does the Javascript Firestore client cache document references?

Just in case I'm trying to solve the XY problem here, here's some context (domain is a role-playing game companion app). I have a document (campaign), which has a collection (characters), and I'm working with angular.io / angularfire.
The core problem here is that if I query the collection of characters on a campaign, I get back Observable<Character[]>. I can use that in an *ngFor="let character of characters | async" just fine, but this ends up being a little messy downstream - I really want to do something like have the attributes block as a standalone component (<character-attributes [character]="character">) and so on.
This ends up meaning down in the actual display components, I have a mixture of items that change via ngOnChanges (stuff that comes from the character) and items that are observable (things injected by global services like the User playing a particular Character).
I have a couple options for making this cleaner (the zeroth being: just ignore it).
One: I could flatten all the possible dependencies into scalars instead of observables (probably by treating things like the attributes as a real view-only component and injecting more data as direct inputs - <character-attributes [character]="" [player]="" [gm]=""> etc.). Display changes then kind of take care of themselves.
Two: I could find some magical way to convert an Observable<Character[]> into an Observable<Observable<Character>[]>, which is kind of what I want, and then pass the Character observable down into the various character display blocks (there are a few different display options, depending on whether you're a player (you want much more detail on your own character and small info on everything else) or a GM (you want intermediate detail on everything, expandable into full detail anywhere)).
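Something like this sketch is what I have in mind, assuming AngularFire's snapshotChanges() to recover document ids (the paths and the Character type are placeholders):

import { AngularFirestore } from '@angular/fire/firestore';
import { Observable } from 'rxjs';
import { map } from 'rxjs/operators';

interface Character { name: string; }

// Map the collection stream to one per-document observable per character.
function charactersAsObservables(
  firestore: AngularFirestore
): Observable<Observable<Character>[]> {
  return firestore
    .collection<Character>('campaigns/1/characters')
    .snapshotChanges()
    .pipe(map(actions => actions.map(a =>
      firestore.doc<Character>(`campaigns/1/characters/${a.payload.doc.id}`).valueChanges()
    )));
}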
Three: Instead of passing a whole Character into my component, I could pass character.id and have child components construct an observable for it in ngOnInit (or maybe switchMap in ngOnChanges; it's unclear whether the Angular runtime will reuse actual components for different items by changing out the arguments, but that's a different Stack Overflow question). In this case, I'd be doing multiple reads of the same document - once in a query to get all characters, and once in each view component that is given the characterId and needs to fetch an observable of the character in question.
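Roughly, again as a placeholder-laden sketch:

import { Component, Input, OnChanges } from '@angular/core';
import { AngularFirestore } from '@angular/fire/firestore';
import { Observable } from 'rxjs';

interface Character { name: string; }

@Component({
  selector: 'character-attributes',
  template: `<ng-container *ngIf="character$ | async as character">{{ character.name }}</ng-container>`,
})
export class CharacterAttributesComponent implements OnChanges {
  @Input() characterId!: string;
  character$!: Observable<Character>;

  constructor(private firestore: AngularFirestore) {}

  // Rebuild the observable whenever Angular hands this component a new id.
  ngOnChanges(): void {
    this.character$ = this.firestore
      .doc<Character>(`campaigns/1/characters/${this.characterId}`)
      .valueChanges();
  }
}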
So the question is: if I do firestore.collection('/foo/1/bars').valueChanges() and later do firestore.doc('/foo/1/bars/1').valueChanges() in three different locations in the code, does that call four firestore reads (for billing purposes), one read, or two (one for the query and one for the doc)?
I dug into the Firebase JavaScript SDK, and it looks like it's possible that the event manager handles multiple queries for the same item by just maintaining an array of listeners, but quite frankly I'm not confident in my code archaeology here yet.
There's probably an option four here somewhere too. I might be over-engineering this, but this particular toy project is primarily so I can wrestle with best-practices in firestore, so I want to figure out what the right approach is.
I looked at the code linked from the SDK, and it might be that the library is smart enough to optimize multiple observers of the same document down to a single read. However, this is an implementation detail that is dangerous to rely on: it could change without notice because it's not part of the public API.
On one hand, if you have the danger above in mind and are still willing to investigate, you could create a test program to discover how things work as of today, either by checking the read usage in the Console UI or by temporarily modifying the SDK source to add some logging that helps you understand what's happening under the hood.
On the other hand, I believe part of the question arises from an application-state-management perspective. In fact, listening to the collection and listening to each individual document will both notify the app of the same changes; IMO what differs is how the data flows across the components and how those changes are managed. In that respect I would choose whichever approach feels better codewise.
Hope this helps somewhat.

Optimistic locking & aggregate root's internal entity

Let's assume I have the Aggregate root Picture with the internal entity Shape. A Picture contains a list of Shapes.
Shape will remain an internal entity of the Picture aggregate root because the Picture defines some rules among multiple Shape instances. Let's say you can't assign a new Shape when the Picture is read-only, and a Picture may not contain two Shapes of the same color. Having defined these rules, the aggregate root - knowing about all of its Shapes - can consistently verify them.
To not break the Law of Demeter, I always access a Shape through the Picture.
My question relates to optimistic locking with aggregate versioning. If I update the color of a Shape through the Picture aggregate root, am I increasing the version of the aggregate root (the Picture) or only of the Shape?
My assumption is: only of the Shape, because the opposite would prevent parallel updating of multiple Shapes of one Picture.
But what if during update of the Shape, Picture was set to readonly mode?
Thanks for the advice.
Every time an Aggregate mutates, it should increase its version number when using an optimistic locking mechanism. An Aggregate mutates when its Aggregate root or any of its nested Entities mutate. When a conflict occurs, it means that a previous, faster state mutation has already been committed and cannot be rolled back. It also means that the later state mutation was based on old data and must be re-executed.
However, this conflict should be transparently retried by the framework by re-executing the command (load, execute, persist). The Aggregate should not care about this situation, the domain logic should be the same. In other words, in case of conflict, the client should not even notice, the HTTP response (or whatever) should be the same, maybe a little slower.
My question relates to optimistic locking with aggregate versioning. If I update the color of a Shape through the Picture aggregate root, am I increasing the version of the aggregate root (the Picture) or only of the Shape?
You are increasing the version of the root. Specifically, you are changing the aggregate root from one that "points to" version:4 of Shape to one that points to version:5.
It's somewhat similar to how git handles file changes. You edited the file, which means that the file name that used to point to blob:1 now points to blob:2. But "file" is just a name in the tree, so we need to change the tree from one that says { file -> blob:1 } to a tree that says { file -> blob:2 }, and so on all the way up to the root.
Repeating the same idea another way, any fixed version of the aggregate is "immutable" -- I should be able to look at version:4 all day, and not be affected by the changes that you are making to the Shape, which means your changes need to happen in a new version.
As a clarification: it's weird.
The aggregate is, as a data pattern, a single graph of relations that changes atomically, to ensure that the invariant is maintained. But "objects" want to encapsulate their own state. So we take something that is a single tree, and break it into pieces that are individually managed by an object, and then stitch them all back together again to create a single new tree.
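To make the "one version for the whole graph" idea concrete, here's a small sketch (a sketch only, with the invariant checks reduced to the two rules you listed):

interface Shape { id: string; color: string; }

class Picture {
    constructor(
        public readonly id: string,
        public version: number,        // one version guards the whole aggregate
        public readOnly: boolean,
        public shapes: Shape[],
    ) {}

    changeShapeColor(shapeId: string, color: string): void {
        // Invariants live at the root, where every Shape is visible at once.
        if (this.readOnly) throw new Error('Picture is read-only');
        if (this.shapes.some(s => s.color === color)) throw new Error('Duplicate color');
        const shape = this.shapes.find(s => s.id === shapeId);
        if (!shape) throw new Error('Unknown shape');
        shape.color = color;
        this.version += 1;             // the root's version moves, not the Shape's
    }
}

Persisting is then a compare-and-swap on version: if another writer committed first, the save fails and the framework retries the command, as described above.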
The version number relates to the aggregate, as it's the aggregate whose state is changed when a shape changes colour. I'm not sure why this would prevent parallel updating as long as the updates don't actually conflict.
What I mean by that is: let's say our AG is at version 3. It contains a red, a yellow and a blue triangle. Two commands are issued in parallel, one to change the red triangle to green and another to change the blue one to purple. Both commands are issued at version 3, so a concurrency error will be detected. But assuming you are using events, you can look back at the events, see that they don't conflict, and therefore allow the process to go through.
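A sketch of that conflict check, assuming each event records which shape it touched (the names here are made up):

interface ShapeEvent { shapeId: string; type: string; }

// On a version conflict, allow the commit anyway if none of the events
// committed since our expected version touched the same shapes.
function conflictsWith(newEvents: ShapeEvent[], committedSince: ShapeEvent[]): boolean {
    const touched = new Set(committedSince.map(e => e.shapeId));
    return newEvents.some(e => touched.has(e.shapeId));
}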
I have a blog post which goes into this in a lot more detail. You can find it here: Handling Concurrency Conflicts in a CQRS and Event Sourced System
I hope that helps.

Returning parent-child chain in a single mongodb query?

I have a filesystem like structure, where there are a bunch of folders that each contain the Object Id of their parent folder. Given a specific folder, I want to return the path to this folder; the only way I can think of to do this is to traverse up the tree of linked Object Ids until I get to the root.
Doing this using just db.find, I would have to send out a number of queries equal to the depth of the folder. This doesn't scale, however, so I am wondering if there is a way for mongo to chain the connected IDs for me within a single request. Is this possible? Or is there at least a better way?
There is not really a way to do this in a single MongoDB query like you asked for at first.
You could denormalize the path into each object so it doesn't have to walk all the parents. But any changes would require an update to each and every child.
Basically each folder has...
{
    ...,
    name: "my_folder",
    path: "\path\to\folder",
    children: ...,
}
And when you update a folder, you need to walk its children and update them by taking parent.path, adding parent.name, and slapping that into each child's path property. Update away down the chain. Updates are more expensive, but reads are a whole lot cheaper.
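A mongo shell sketch of that walk (modern shell syntax), assuming each folder also stores its parent's _id - field names are illustrative:

function updateChildPaths(parent) {
    db.folders.find({ parent_id: parent._id }).forEach(function (child) {
        // Child path = parent's path plus the parent's own name.
        var newPath = parent.path + "\\" + parent.name;
        db.folders.updateOne({ _id: child._id }, { $set: { path: newPath } });
        child.path = newPath;
        updateChildPaths(child);    // keep walking down the chain
    });
}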
The MongoDB documentation has some examples of patterns for modeling tree structures.
Check if one of them fits your requirements. With the information you have provided I would say that an Array of Ancestors could be a good option.
I believe what you'd like to do is pretty similar to using a tree model.
I understand your data currently doesn't have the ancestors field, but that would be one of the best approaches to the problem:
folder
{
    _id: 100,
    folder_name: "foo",
    ancestors: [80, 23, 1]
}
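With that structure the whole chain comes back with one extra query (collection name is made up):

var folder = db.folders.findOne({ _id: 100 });
// One $in query fetches every ancestor on the path to the root:
db.folders.find({ _id: { $in: folder.ancestors } });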
Please check this video from MongoDB University (by 10gen) for further explanation:
https://www.youtube.com/watch?feature=player_embedded&v=lIjXyQklGWY

Recommendations on structure for Mongoid/MongoDB Tree of Tags

I'm looking for some recommendations on how to structure the tags part of this data model:
Here's a simplified version of it:
a Site has many Posts (relational association [references_many in Mongoid speak]). A Site has a tree of tags.
a Post has an array of tags (subset of the Site's tags, order doesn't matter)
The use cases I care about are:
Quickly saving & retrieving the Site's tags in tree form (ie to be able to display them as a tree in the UI)
Quickly querying which of a Site's posts have a certain tag.
Without the tree structure, http://github.com/wilkerlucio/mongoid_taggable solves my use cases. I've seen some of the acts_as_tree ports for Mongoid, like:
http://github.com/benedikt/mongoid-tree
http://github.com/saks/mongoid_acts_as_tree
http://github.com/ticktricktrack/mongoid_tree
They all seem to take a relational approach, as opposed to embedded, to storing the hierarchy, which would mean that both of the use cases above would be slow (likely requiring a map/reduce).
Has anyone done anything similar, or have any advice? Ideally I'd love a Mongoid solution, but I'm happy to drop down to the Ruby driver as well.
Do you need to update the structure of the tree (i.e. move a tag to another parent)? If that is possible, the embedded approach would become difficult, and the relational/normalized approach makes more sense.
I would probably store the tags themselves in the document (embedded), but if there is any chance that I'll need to move tree nodes around online, then I'd store the hierarchy in another document. Queries need not be slow if you first flatten the search query (according to the current tree) and then search for those tags. This approach probably does not scale too well if the flattened search query ends up having hundreds of tags in it (how tall is your tree?).
If tags cannot be moved to new parents (or only by you, during scheduled maintenance), go ahead and embed the whole hierarchy.
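Roughly what I mean, with made-up field names - the Site embeds the tree for the UI, each Post keeps a flat tags array, and a subtree search is flattened client-side first:

// Site document: the tree lives here, ready to render as-is.
{ _id: 1, tag_tree: [ { tag: "ruby", children: [ { tag: "mongoid", children: [] } ] } ] }

// Posts keep a flat array, so tag queries stay on an index.
// To search a whole subtree, flatten it and use $in:
db.posts.find({ site_id: 1, tags: { $in: ["ruby", "mongoid"] } })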
There are two implemented patterns for MongoDB tree structures.

How can I index a bunch of files in Perl?

I'm trying to clean up a database by first finding unreferenced objects. I have extracted all the database objects into a list and all the DDL code into files; I also have all the Java source code for the project.
Basically what I want to do (preferably in Perl, as it's the scripting language I'm most familiar with) is to somehow index the contents of all the extracted DDL and Java files (to speed up the search), step through the database object list, and then search through all the files (using the index) to see whether those objects are referenced anywhere, and create a report.
If you could point me in the right direction to find something that indexes all those files in a way that lets me search them (preferably in Perl), I would greatly appreciate it.
The key here is to be able to do this programmatically, not manually (using something like Google Desktop Search).
Break the task down into its steps and start at the beginning. First, what does a record look like, and what information in it connects it to another record? Parse that record, store its unique identifier and a list of the things it references.
Once you have that list, invert it. For each reference, create a list of the objects referenced. Count them by their identifier. You should be able to get the ones whose count is zero.
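For illustration only (TypeScript rather than Perl, but the translation is mechanical, and the names are made up):

// Forward map produced by the parsing step: object id -> ids it references.
const references = new Map<string, string[]>();

// Invert it into a reference count per object.
const count = new Map<string, number>();
for (const id of references.keys()) count.set(id, 0);
for (const targets of references.values()) {
    for (const t of targets) {
        count.set(t, (count.get(t) ?? 0) + 1);
    }
}

// Anything still at zero is never referenced.
const unreferenced = [...count].filter(([, n]) => n === 0).map(([id]) => id);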
That's a very general answer, but you asked a very general question. If you are having trouble, break it down into just one of those steps and ask a more specific question, supplying sample data and the code you've tried so far.
Good luck!
An interesting module you might use to do what you want is KinoSearch; it provides the kind of indexing you said you're looking for. Then you can go through the object identifiers and check whether there are references to them.