MongoDB: Populate parent using array of children vs search children by parent ID

I'm having an argument with my manager over database structure.
We need to create a parent-type object with multiple children, and query a list of all the children that belong to that parent.
I want to use an array of child ObjectIds in the parent object, which are added when each child is created. The parent can be found using findById(), and the child list can then be filled using populate().
My manager insists on not storing an array of children in the parent. Instead, he wants to store only the parent's id as a field in each child object, and then get the list by searching for all of the child objects with that parent's id. He claims that this will be just as fast, since "populate just searches for the object by id anyway".
However, it seems inconceivable to me that it would be just as fast. Isn't the whole point of an _id field to index the document's location for fast retrieval? Shouldn't finding a list of objects whose _id values you already have always be much faster and more scalable than searching the entire database for objects where a given field matches a given value?
Is there any justification for not using populate in this kind of circumstance? (Storing a reference to the parent in the child object is, of course, also an option, but he is insistent on not storing an array of children in the parent at all.)

There is no simple choice of best method; it depends on the nature of the data, what you expect the data set to look like in the future, and how you are going to query the data, both now and in the future.
Storing each child ID in an array on the parent is definitely a viable choice. It makes it easy to answer questions like "How many children does this parent have?" or "Does this parent include both of these children?". It also simplifies paging through the children, because the client receives all of the child ID values when retrieving the parent and can then retrieve as many child records as necessary for display. Storing additional data in the array on the parent, such as each child's name and added date, would give the client enough information to show a link to each child without needing to retrieve all of the children first.
This method also has some drawbacks. If a parent can have more than a couple hundred children, or the number is not bounded at all, there will be serious performance implications as the array grows, and the parent document can never grow past MongoDB's 16 MB document size limit. MongoDB specifically recommends that you avoid unbounded arrays.
Storing the parent ID in each child maintains the linkage with a single field that does not need to be an array. This means that obtaining the list of children for a given parent requires a separate query, or a $lookup, but it simplifies finding a child first and then following the link to its parent.
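To make the three benefits above concrete, here is a minimal sketch that uses plain Python dicts as stand-ins for MongoDB documents; it is not Mongoose code. The field names (children, child_id, name, added) and the helper functions are hypothetical. Counting children, checking membership and paging all work from the parent document alone, before any child document is fetched.

```python
# In-memory stand-in for a parent document that embeds an array of child
# references. Field names are hypothetical, chosen for illustration only.
parents = {
    "p1": {
        "_id": "p1",
        "name": "Parent 1",
        # Each entry carries a little extra data (name, added date) so a client
        # could render a link to the child without loading the child itself.
        "children": [
            {"child_id": "c1", "name": "Alice", "added": "2020-01-01"},
            {"child_id": "c2", "name": "Bob", "added": "2020-02-01"},
            {"child_id": "c3", "name": "Carol", "added": "2020-03-01"},
        ],
    },
}


def child_count(parent):
    # "How many children does this parent have?" needs only the parent.
    return len(parent["children"])


def has_children(parent, *child_ids):
    # "Does this parent include both of these children?"
    ids = {c["child_id"] for c in parent["children"]}
    return all(cid in ids for cid in child_ids)


def child_page(parent, page, page_size):
    # The client already holds every child id, so it can slice out one page
    # and fetch only those child documents.
    start = page * page_size
    return [c["child_id"] for c in parent["children"][start:start + page_size]]


p = parents["p1"]
print(child_count(p), has_children(p, "c1", "c3"), child_page(p, 1, 2))
```

The trade-off is the one the answer describes next: every one of these helpers reads the whole array, so they stay cheap only while the array stays reasonably small.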
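Below is a rough in-memory analogue of the parent-ID-in-child design, again with plain dicts rather than a real MongoDB connection. find_children plays the role of a query such as db.children.find({ parent_id: ... }). lookup_children loosely mimics what an aggregation $lookup stage produces. The field name parent_id and all helper names are assumptions made for illustration.

```python
# Parents hold no array; each child points at its parent instead.
parents = {
    "p1": {"_id": "p1", "name": "Parent 1"},
    "p2": {"_id": "p2", "name": "Parent 2"},
}
children = [
    {"_id": "c1", "name": "Alice", "parent_id": "p1"},
    {"_id": "c2", "name": "Bob", "parent_id": "p2"},
    {"_id": "c3", "name": "Carol", "parent_id": "p1"},
]


def find_children(parent_id):
    # The separate query: all children whose parent_id matches.
    return [c for c in children if c["parent_id"] == parent_id]


def parent_of(child):
    # Starting from a child, the parent is a single keyed lookup away.
    return parents[child["parent_id"]]


def lookup_children(parent_list):
    # Loose analogue of a $lookup stage: attach the matching children
    # to each parent under a new "children" field.
    return [{**p, "children": find_children(p["_id"])} for p in parent_list]


for p in lookup_children(list(parents.values())):
    print(p["name"], [c["name"] for c in p["children"]])
```

No document here ever grows with the number of children, which is why this layout sidesteps the large-array problem.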
This method completely avoids the large array issues, even if the data set grows exponentially in the future.

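The Mongoose populate() function, while convenient, is not free: on top of the query that loads the parent, it runs another query against the child collection to look up the ids stored in the array (an $in query on _id).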
The most efficient way of creating a database where you will need to find the children of a given parent is to store the parent's id as an indexed field in the child, then search for all child objects with a given parent. This is similar to a relational database. It is not usually necessary to store children as an array in the parent.
There is nothing particularly special about the _id field. MongoDB automatically creates an index on it (and Mongoose adds an _id to every document by default), but you can add other indexed fields, and searching on them will be just as fast.
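The point about indexes can be illustrated with a toy store that keeps two dict-based indexes. One mirrors the automatic index on _id. The other mirrors a secondary index on parent_id, which in MongoDB you might create with db.children.createIndex({ parent_id: 1 }), or in a Mongoose schema with index: true on the field. This is only a model of the idea, not how MongoDB's B-tree indexes are implemented, and the class and method names are hypothetical. Either way, finding a parent's children becomes a keyed lookup rather than a scan of the whole collection.

```python
from collections import defaultdict


class ChildStore:
    """Toy collection with a primary index on _id and a secondary index on parent_id."""

    def __init__(self):
        self.by_id = {}                     # stands in for the automatic _id index
        self.by_parent = defaultdict(list)  # stands in for an index on parent_id

    def insert(self, doc):
        # Maintain both indexes on every insert, as the database would.
        self.by_id[doc["_id"]] = doc
        self.by_parent[doc["parent_id"]].append(doc["_id"])

    def find_by_ids(self, ids):
        # "Array of children" approach: one index probe per known id.
        return [self.by_id[i] for i in ids]

    def find_by_parent(self, parent_id):
        # "Parent id on child" approach: one probe on the secondary index,
        # then the same per-id fetches; no full collection scan.
        return self.find_by_ids(self.by_parent.get(parent_id, []))


store = ChildStore()
for cid, pid in [("c1", "p1"), ("c2", "p2"), ("c3", "p1")]:
    store.insert({"_id": cid, "parent_id": pid})
print([d["_id"] for d in store.find_by_parent("p1")])
```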

Related

Dealing with subdocument arrays (Mongoose) in a RESTful setting

Suppose my schema has a field containing arrays of objects, and I'd like to have endpoints for CRUD operations on these objects. From what I've seen, the idiomatic way to do things at the database level is through findByIdAndUpdate, with $push/$pull operations for adding/removing stuff.
Thing is, those calls will return the updated parent object. Generally, a POST endpoint for creating an object will return the saved object, but here there's an extra step for finding the new object in the parent object array. Is there a neat way I could achieve this or am I missing something here?
(I know having the array objects as separate entities would solve the problem, but let's just say I can't do that.)

Event Sourcing. Aggregates with Children lists or with ParentID?

Should I store lists of children in the aggregate, or only a ParentID?
Lists of children can be huge (thousands of entries). Adding or removing one child would mean saving thousands and thousands of children back to storage when only one was added or removed.
A ParentID is very easy to manage. It keeps all aggregates simple (no lists!). It allows lists to be created quickly by simply querying aggregates by their ParentID. It also allows arbitrarily deep hierarchies that can be rebuilt quickly and recursively without creating huge aggregates.
If I go with ParentID, should I add a ParentIdGuid field to the events table to refer to the parent of the aggregate, so I can quickly generate lists?
EDIT:
Brad's answer made me realize that the parent ID will not be included in the resulting JSON of the child objects that were updated, since I would only include the fields that had changed. Therefore I cannot rely on Postgres JSON indexing, which means I would have to create a ParentID column on the events table itself.
A parent ID on the child is definitely the way to go. Think of a database foreign key. The relationship is held as a parent ID pointer in the child record.
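Following the foreign-key analogy, both flat lists of children and arbitrarily deep hierarchies can be rebuilt from the ParentID pointers alone. Below is a minimal sketch with in-memory rows; in practice the rows would come from a query on a ParentID column. It groups ids by ParentID once, then recursively assembles a subtree. The row layout and function names are assumptions for illustration.

```python
from collections import defaultdict

# Each aggregate only knows its own ParentID (None for a root).
aggregates = [
    {"id": "a", "parent_id": None},
    {"id": "b", "parent_id": "a"},
    {"id": "c", "parent_id": "a"},
    {"id": "d", "parent_id": "b"},
]


def children_index(rows):
    # Group child ids by ParentID once, much like an index on that column.
    index = defaultdict(list)
    for row in rows:
        index[row["parent_id"]].append(row["id"])
    return index


def build_tree(node_id, index):
    # Recursively rebuild the subtree rooted at node_id; no aggregate
    # ever stores a list of its own children.
    return {
        "id": node_id,
        "children": [build_tree(child, index) for child in index.get(node_id, [])],
    }


print(build_tree("a", children_index(aggregates)))
```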
When you instantiate your AR (aggregate root) in memory to use it, you should populate the list of children using the ParentID pointer. If this is expensive, which it sounds like it is, you could implement a populate-on-demand function so you don't pay the price unless it's necessary.
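A populate-on-demand children list can be as simple as a lazily evaluated property. In the sketch below, the loader callable is a hypothetical placeholder for whatever query fetches children by ParentID. The cost is paid on first access and the result is cached for the lifetime of the in-memory aggregate.

```python
class ParentAggregate:
    """Aggregate root whose children list is loaded only when first accessed."""

    def __init__(self, agg_id, load_children):
        self.id = agg_id
        # Hypothetical loader, e.g. a query for all aggregates with this ParentID.
        self._load_children = load_children
        self._children = None

    @property
    def children(self):
        # First access pays the query cost; later accesses reuse the cached list.
        if self._children is None:
            self._children = self._load_children(self.id)
        return self._children


parent = ParentAggregate("p1", lambda pid: [f"{pid}-child-1", f"{pid}-child-2"])
print(parent.children)
```

Code paths that never touch parent.children never trigger the query, which is the point of loading on demand.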
There are 2 approaches to the Events table. It either contains all properties of the entity after the change is made, or it can contain just the modified fields. If you’re in the camp of saving all properties, then the ParentID should be in each event. If you prefer to save just the changes, then somewhere in the event list for that entity should be the ParentID; possibly just in the creation event. With Event Sourcing you add up all the change events to arrive at the current state, so this would give you the ParentID in the entity.
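For the "save just the changes" style, the following sketch shows how folding the events yields a current state that still contains the ParentID, even though only the creation event carries it. The event shapes and type names are hypothetical, and a real event store would also need versioning and ordering guarantees that are omitted here.

```python
from functools import reduce

# Delta-style events: only the creation event carries parent_id.
events = [
    {"type": "Created", "data": {"id": "c1", "parent_id": "p1", "name": "Alice"}},
    {"type": "Renamed", "data": {"name": "Alicia"}},
]


def apply_event(state, event):
    # Each event contributes only the fields it changed; later values win.
    return {**state, **event["data"]}


def current_state(stream):
    # Add up all of the change events, starting from an empty entity.
    return reduce(apply_event, stream, {})


print(current_state(events))
```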

Core Data Fetch

I have an entity, and I want to fetch a certain attribute.
For example,
Let's say I have an entity called Food, with multiple attributes. I want to select all categories, which is an attribute on each food item. What's the best way to accomplish this in Core Data?
Just run your fetch request and then use valueForKey: to extract all of the attribute values. If your model contains lots of objects, you can set the fetch limit and offset (and sort descriptor) to page through the items. When doing this you should also set the fetch request to not return objects as faults.
Just remembered there is an alternative. You can set the properties to fetch and then set the result type to NSDictionaryResultType. You still need to do the iteration but this will return the minimum data possible.
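Core Data itself is used from Objective-C or Swift, so the following is only a language-neutral analogue of the first approach, not Core Data API. It pages through sorted results with a limit and an offset (the roles of fetchLimit, fetchOffset and a sort descriptor) and collects a single attribute from each object, the way valueForKey: maps an array of objects to an array of values. The Food objects, the category and name keys, and the helper names are assumptions.

```python
def fetch_page(store, offset, limit, sort_key):
    # Analogue of a fetch request with a sort descriptor, offset and limit.
    ordered = sorted(store, key=lambda obj: obj[sort_key])
    return ordered[offset:offset + limit]


def all_categories(store, page_size=2):
    # Page through the objects, extracting one attribute from each.
    values = []
    offset = 0
    while True:
        page = fetch_page(store, offset, page_size, "name")
        if not page:
            break
        values.extend(obj["category"] for obj in page)
        offset += page_size
    return values


foods = [
    {"name": "banana", "category": "fruit"},
    {"name": "apple", "category": "fruit"},
    {"name": "carrot", "category": "vegetable"},
]
print(all_categories(foods))
```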
EDIT: I think I misunderstood your question. It seems that you only want to fetch a property of an object, not the object itself (i.e. the attribute but not the entity)? I don't believe Core Data is going to work that way; it's an object graph rather than a database, as the person above mentioned. Research how Core Data "faults", automatically retrieving dependent objects as they are needed. I left the advice below in case it still applies, though I'm not sure it will.
You can add a predicate to your search in order to fetch only objects which meet certain criteria; it works like an "if" statement for fetching. Here's Apple's documentation:
https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSPredicate_Class/Reference/NSPredicate.html
And a tutorial:
http://www.peterfriese.de/using-nspredicate-to-filter-data/
All that said, the necessity really depends on how many objects you're fetching. If it's not causing any performance hit, there's not necessarily anything wrong with fetching a few unneeded objects. In other words, solve real problems; don't "optimize" things that were working fine. If your model contains a ton of objects, though, it could be expensive to fetch them all, and you would want to use a predicate.
You can only fetch whole objects, but you can restrict the fetch to objects that have a particular attribute value by using an NSPredicate in your fetch request. See the code snippets in Xcode's snippet library: type "fetch request" in the filter field and drag out the snippet that includes the NSPredicate code. You can set the predicate so that only the objects that satisfy it are returned. Hope this helps!

Entity Framework 4: How do I get the original values of a child object when saving a parent?

I've seen lots of posts on SE relating to this, but none have answered the question satisfactorily. If there is a post that does answer this (with an actual code example) then please point me in that direction.
I need to write information to a log when saving an object. I need to know the original values and the new values. This is very easy for the parent object, and it is even fairly easy to get the new values on any changed child objects. The challenge is getting the original values of the child object.
For instance, a user changes a child object via a drop-down list. This changes the value of the foreign key on the parent. When saving, I need to write the textual description (the ToString() value or some other value) of the changed entity in the log, not the value of the foreign key.
The ObjectStateEntry contains the current values and original values of the parent, but how do I get the current and original values of the changed child object?
It seems like this is something that should be possible, but is either much too difficult to accomplish, or has been overlooked by the Microsoft design team.
Thanks in advance for any help.
The same way you always get it: You look it up. Remember, in your case the "child object" might not even be loaded from the DB. There is no requirement to do so before changing the FK value on the "parent."
It doesn't sound like you actually changed the "child object" itself. Rather, you just changed the "parent" to point at a different child object.
In this case, I'd use Context.GetObjectByKey() to pull the object based on the original FK value. This grabs it from memory if it happens to be loaded, and from the DB if it isn't.
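The lookup-by-original-key idea can be modelled without Entity Framework. The toy context below keeps an identity map of loaded entities and falls back to a backing store, mirroring the memory-first, then-DB behaviour described for GetObjectByKey(). It is a Python analogue, not EF code. The Context class, the name attribute and the log message format are all hypothetical. The logger resolves both the original and the current FK values to readable text.

```python
class Context:
    """Toy unit of work: an identity map of loaded entities plus a backing store."""

    def __init__(self, database):
        self.database = database  # key -> entity, standing in for the DB
        self.loaded = {}          # entities already materialised in memory
        self.db_hits = 0

    def get_object_by_key(self, key):
        # Memory first; only go to the "database" if the entity isn't loaded.
        if key not in self.loaded:
            self.db_hits += 1
            self.loaded[key] = self.database[key]
        return self.loaded[key]


def describe_fk_change(ctx, original_fk, current_fk):
    # Log the textual description of the old and new child, not the raw FKs.
    old = ctx.get_object_by_key(original_fk)
    new = ctx.get_object_by_key(current_fk)
    return f"Category changed from '{old['name']}' to '{new['name']}'"


db = {1: {"name": "Books"}, 2: {"name": "Music"}}
ctx = Context(db)
print(describe_fk_change(ctx, 1, 2))
```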

Check if attribute exists

Is there a better way to check whether an attribute value already exists when adding an object to the managed context, rather than fetching with a predicate and looking at the number of results? I'm trying to make an attribute unique for a given entity...
I think you may have scrambled your nomenclature. You don't add attributes to a context; you add managed objects, which are defined by entities, which have attributes. You could be asking about two different types of test.
If you're asking whether there is a means of testing if a managed object already exists with exactly the same attributes as the one you are planning to insert, the answer is no. Since entities can be arbitrarily complex, and since it takes literally only one bit of difference to make them logically distinct, there is no way of testing whether two objects are logically identical (i.e. have the same attributes and relationships) without fetching them and testing them.
If you're asking whether you can test for a unique value of an attribute of a particular entity, then you can. First you restrict the fetch to that property using -[NSFetchRequest setPropertiesToFetch:], and then you set your predicate for the sought value. When walking relationships, you can use the Set and Array Operators to find managed objects with unique values.
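As a language-neutral analogue of that check (plain Python, not Core Data API), the sketch below inspects only the one attribute of interest on each existing object, much as a fetch limited by propertiesToFetch plus an equality predicate would. It inserts the new object only if no match is found. The object layout and function names are hypothetical.

```python
def value_exists(objects, attribute, value):
    # Analogue of fetching only `attribute` with a predicate `attribute == value`.
    return any(obj.get(attribute) == value for obj in objects)


def insert_if_unique(objects, new_obj, attribute):
    # Insert only when no existing object already has this attribute value.
    if value_exists(objects, attribute, new_obj[attribute]):
        return False
    objects.append(new_obj)
    return True


items = [{"code": "A1"}]
print(insert_if_unique(items, {"code": "B2"}, "code"), insert_if_unique(items, {"code": "A1"}, "code"))
```

Note that a check-then-insert like this is not atomic. If several writers can insert concurrently, the uniqueness still has to be enforced at save time.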