Event Sourcing. Aggregates with Children lists or with ParentID? - cqrs

Should I store in Aggregate lists of children or only ParentID?
Lists of children can be huge (thousands of items). Adding or removing one child would mean saving thousands of children back to storage when only one was added or removed.
ParentID is very easy to manage. It keeps all aggregates simple (no lists!). It allows lists to be created quickly by simply querying aggregates by their ParentID. It supports arbitrarily deep hierarchies that can be quickly rebuilt recursively without creating huge aggregates.
If I go with ParentID, should I add a ParentIdGuid field to the events table to refer to the aggregate's parent so I can quickly generate lists?
EDIT:
Brad's answer made me realize that the ParentID will not be included in the resulting JSON of the child objects that were updated, since I would only include the fields that had changed. Therefore I cannot rely on Postgres's JSON indexing. That means I would have to create a ParentID column on the events table itself.

A parent ID on the child is definitely the way to go. Think of a database foreign key. The relationship is held as a parent ID pointer in the child record.
When you instantiate your aggregate root (AR) in memory to use it, you should populate the list of children using the ParentID pointer. If this is expensive, which it sounds like it is, you could implement a populate-on-demand function so you don't pay the price unless it's necessary.
There are 2 approaches to the Events table. It either contains all properties of the entity after the change is made, or it can contain just the modified fields. If you’re in the camp of saving all properties, then the ParentID should be in each event. If you prefer to save just the changes, then somewhere in the event list for that entity should be the ParentID; possibly just in the creation event. With Event Sourcing you add up all the change events to arrive at the current state, so this would give you the ParentID in the entity.
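The "add up all the change events" idea from the answer above can be sketched as a fold over the event list. This is a minimal illustration with made-up field and type names (not from the original post): each event stores only the changed fields, and replaying the events yields the current state, including the ParentID that only the creation event carried.

```typescript
// Each event stores only the fields that changed for one entity.
type ChildEvent = { entityId: string; changes: Record<string, unknown> };

// Fold the events left-to-right; later changes overwrite earlier ones.
function currentState(events: ChildEvent[]): Record<string, unknown> {
  return events.reduce(
    (state, e) => ({ ...state, ...e.changes }),
    {} as Record<string, unknown>,
  );
}

const events: ChildEvent[] = [
  { entityId: "c1", changes: { parentId: "p42", name: "first" } }, // creation event
  { entityId: "c1", changes: { name: "renamed" } },                // rename only
];

const state = currentState(events);
// state.parentId is "p42" even though only the creation event carried it
```

Note that with this "changes only" shape, the ParentID is reachable only by replaying events, which is exactly why the question's edit concludes a dedicated ParentID column on the events table is needed for fast list queries.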

Related

Walk tree of objects for change detection

TL;DR: Can EFCore do some kind of DetectChanges(someEntity) and automatically walk the object tree from that entity?
My domain model is created in Domain Driven Design style, where object references are only within an aggregate (e.g. PurchaseOrder.Lines is a collection of PurchaseOrderLine) and associations outside of the aggregate are by Id only (e.g. PurchaseOrder.CustomerId is a Guid, instead of a property to a Customer object).
I am retrieving many objects from the DB and altering them. At the end of the process I want to decide which objects to save and which ones not to. So, I want to only save modified objects that I specifically state have changed via PurchaseOrderRepository.Update(purchaseOrder), but I also want to ensure all related objects are checked for changes too (so I'd like EFCore to see that 2 objects were removed from purchaseOrder.Lines so should be deleted, and 1 new one was added).
I don't want to automatically save everything that is retrieved + modified, only what I explicitly state should be. Is this something that is possible?
For example:
If I load a lot of objects, modify them, and then they fail my domain validation, I want to abandon saving changes to those objects and instead save a new Event of some kind in the DB saying a file import failed.

MongoDB: Populate parent using array of children vs search children by parent ID

I'm having an argument with my manager over database structure.
We need to create a parent-type object with multiple children, and query a list of all the children that belong to that parent.
I want to use an array of child ObjectIds in the parent object, which are added when the child is created. The parent can be found using FindById, and the child list can then be filled using populate().
My manager insists on not storing an array of children in the parent, but only storing the parent's id as a field in each child object, and then get the list by searching for all of the child objects with the parent's id. He claims that this will be just as fast, since "populate just searches for the object by id anyway".
However, it seems inconceivable to me that it would be just as fast. Isn't the whole point of an _id field to index the document's location for fast retrieval? Shouldn't finding a list of objects where you have their _id always be much faster and more scalable than searching the entire database for objects where a given field matches a given value?
Is there any justification for not using populate in this kind of circumstance? (Storing a reference to the parent in the child object is, of course, also an option - but he is insistent on not storing an array of children in the parent at all.)
The best method is not a simple choice, it depends on the nature of the data, what you expect for the data set in the future, and how you are going to query the data, both now and in the future.
Storing each child ID in an array on the parent is definitely a viable choice. This makes it easy to retrieve information like "How many children does this parent have?" or "Does this parent include both of these children?". It also simplifies paging through the children because the client will receive all of the child ID values when retrieving the parent and can retrieve as many child records as necessary for display. Storing additional data in the array on the parent, such as a child name and added date, would mean that the client could have enough information to show a link to each child without needing to retrieve all of the children first.
This method also has some drawbacks. If the number of possible children for a parent is more than a couple hundred, or not limited at all, there will be serious performance implications when the array becomes large. MongoDB specifically recommends that you Avoid Unbounded Arrays.
Storing the parent ID in each child maintains the linkage with a single field that does not need to be an array. This means that obtaining a list of children for a given parent requires a separate query, or a $lookup, but it simplifies finding the child first and then linking to the parent.
This method completely avoids the large array issues, even if the data set grows exponentially in the future.
The Mongoose populate function, while convenient, is slow: it performs a search for each child in the array.
The most efficient way of creating a database where you will need to find the children of a given parent is to store the parent's id as an indexed field in the child, then search for all child objects with a given parent. This is similar to a relational database. It is not usually necessary to store children as an array in the parent.
There is nothing particularly special about the _id field; it is an indexed field created automatically by MongoDB, and you can add other indexed fields whose lookups will be just as fast.
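The two shapes being argued over can be sketched with plain objects standing in for MongoDB documents (all names here are illustrative; in a real collection, an index on parentId is what makes the child-side query fast):

```typescript
type Child = { _id: string; parentId: string };

const children: Child[] = [
  { _id: "c1", parentId: "p1" },
  { _id: "c2", parentId: "p1" },
  { _id: "c3", parentId: "p2" },
];

// Shape A: the parent carries an array of child ids (populate-style):
// one id lookup per entry in the array.
const parentWithArray = { _id: "p1", childIds: ["c1", "c2"] };
const byId = new Map(children.map((c) => [c._id, c]));
const populated = parentWithArray.childIds.map((id) => byId.get(id)!);

// Shape B: no array on the parent; children are found through an
// index keyed by parentId (a single indexed query in MongoDB terms).
const byParent = new Map<string, Child[]>();
for (const c of children) {
  const bucket = byParent.get(c.parentId) ?? [];
  bucket.push(c);
  byParent.set(c.parentId, bucket);
}
const found = byParent.get("p1") ?? [];
```

Both shapes return the same two children for "p1"; the difference is that Shape B has no per-parent array to grow without bound.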

iphone SDK: Arbitrary tableview row reordering with core data

What is the best way to implement arbitrary row reordering in a tableview that uses core data? The approach that seems obvious to me is to add a rowOrder attribute of type Int16 to the entity that is being reordered in the tableview and manually iterate through the entity updating the rowOrder attributes of all the rows whenever the user finishes reordering.
That is an incredibly inelegant solution though. I'm hoping there is a better approach that doesn't require possibly hundreds of updates whenever the user reorders things.
If the ordering is something that the data model should model and store, then the ordering should be part of the entity graph anyway.
A good, lightweight solution is to create an Order entity that has a one-to-one relationship to the actual entity being ordered. To make updating easy, create a linked-list like structure of the objects. Something like this:
Order {
    order: int;
    orderedObject <--(required, nullify)--> OrderObject.order;
    previous <--(optional, nullify)--> Order.next;
    next <--(optional, nullify)--> Order.previous;
}
If you create a custom subclass, you can provide an insert method that inserts a new object into the chain and then sends a message down the next relationship, telling each object to increment its order by one and pass the message on to its next. A delete method does the opposite. That makes the ordering integral to the model and nicely encapsulated. It's easy to make a base class for this so you can reuse it as needed.
The big advantage is that it only requires the small Order objects to be alive in memory.
Edit:
Of course, you can extend this with another linked object to provide section information. Just relate that entity to the Order entity then provide the order number as the one in the section.
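The "increment and forward" insert described above can be sketched in TypeScript (the original answer describes a Core Data Order entity; the class and method names here are made up for illustration):

```typescript
class OrderNode {
  next: OrderNode | null = null;
  constructor(public order: number) {}

  // Insert a new node right after this one, then walk the chain of
  // next pointers and push every later node's order up by one,
  // mirroring the "increment and pass the message on" idea.
  insertAfter(): OrderNode {
    const node = new OrderNode(this.order + 1);
    node.next = this.next;
    this.next = node;
    let cursor = node.next;
    while (cursor) {
      cursor.order += 1;
      cursor = cursor.next;
    }
    return node;
  }
}

const head = new OrderNode(0);
const second = head.insertAfter(); // orders are now 0, 1
head.insertAfter();                // orders are now 0, 1, 2 (old second pushed to 2)
```

Only the nodes after the insertion point are touched, which is why only the small Order objects need to be alive in memory.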
There is no better way and that is the accepted solution. Core Data does not have row ordering internally so you need to do it yourself. However it is really not a lot of code.

Set a relationship with Core Data

I have two entities that are connected through a one-to-many relationship, let's say CategoryEntity and ItemEntity. The relationship is optional for CategoryEntity (there can be categories without items), but required for every ItemEntity. At the app's loading, the Categories already exist in the store, and I want to import ItemEntities and connect them to the appropriate CategoryEntity.
Obviously executing a FetchRequest for each ItemEntity to find the matching category wouldn't be a good solution because there will be about 4000-6000 Items each time..
So, is there something more efficient I could do?
If you have correctly set up your Core Data model, then you have a to-many relationship from the Category entity to the Item entity, and an inverse to-one relationship from Item to Category. Also, you have a "cascade" delete rule for the to-many relationship and a "nullify" delete rule for the to-one relationship.
Assuming this, each time you insert an Item object, setting its Category relationship automatically inserts the Item into the corresponding Category. Deleting an Item automatically removes it from the corresponding Category.
On the Category side, removing a Category automatically removes all of the corresponding Item objects.
Therefore, when you fetch Items, you have already fetched the corresponding Category object for each Item. You do not need to do anything else. Note that, by default, you are not actually retrieving the Category object during the fetch: instead a fault is fired the first time you try to access the object, and the object is retrieved at that time. This provides better performance if you do not plan to use the Category object stored within the Item object immediately after fetching. If you plan to use the Category object almost every time you fetch an Item, then you should use the NSFetchRequest methods
- (void)setReturnsObjectsAsFaults:(BOOL)yesNo
- (void)setRelationshipKeyPathsForPrefetching:(NSArray *)keys
to tell Core Data that you do not want faults and that you want your Category relationship prefetched.
When you say 'import' item entities, what do you mean? Are these in another Core Data store, defined in another format in a file somewhere, retrieved over the network?
One approach would be to fetch all the categories in one go and add them to an NSDictionary acting as a cache and keyed by some identifying value that allows you to perform a quick lookup. For each item entity that you instantiate during import (whatever that means), retrieve its category ID and then retrieve the Category MO from the cache. Set the relationship and then save. Even better, batch up a number of insertions and save every 10, 100 or 1000 to reduce IO overhead.
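The cache-and-batch approach from the answer above can be sketched as follows. This is an illustrative stand-in, not Core Data code: the save callback and batch size represent whatever persistence call and IO granularity your store uses.

```typescript
type Category = { id: string; name: string };
type Item = { name: string; categoryId: string; category?: Category };

function importItems(
  categories: Category[],
  rawItems: { name: string; categoryId: string }[],
  save: (batch: Item[]) => void,
  batchSize = 100,
): Item[] {
  // One up-front fetch, cached in a Map keyed by id, replaces a
  // per-item fetch request (the 4000-6000 queries the question fears).
  const cache = new Map(categories.map((c) => [c.id, c]));
  const linked: Item[] = [];
  for (const raw of rawItems) {
    linked.push({ ...raw, category: cache.get(raw.categoryId) });
    // Save in batches to reduce IO overhead.
    if (linked.length % batchSize === 0) save(linked.slice(-batchSize));
  }
  const remainder = linked.length % batchSize;
  if (remainder) save(linked.slice(-remainder));
  return linked;
}

const saves: number[] = [];
const items = importItems(
  [{ id: "cat1", name: "Food" }],
  Array.from({ length: 250 }, (_, i) => ({ name: `item${i}`, categoryId: "cat1" })),
  (batch) => saves.push(batch.length),
);
// 250 items with a batch size of 100 produce saves of 100, 100, and 50
```

The cache key here is the category id; in practice it would be whatever identifying value lets you match an imported item to its Category.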

Row insertion order entity framework

I'm using a transaction to insert multiple rows in multiple tables, and I would like the rows to be inserted in order. Upon calling SaveChanges all the rows are inserted out of order.
When not using a transaction and saving changes after each insertion, the order is kept, but I really need a transaction for all entries.
The order in which inserts, updates, and deletes are made by the Entity Framework depends on many things inside the Entity Framework.
For example, if you insert a new Product in a new Category, we have to add the Category before the Product.
This means that if you have a large set of changes there are local ordering constraints that we must satisfy first, and indeed this is what we do.
The order that you do things on the context can be in conflict with these rules. For example if you do this:
ctx.AddToProducts(
    new Product {
        Name = "Bovril",
        Category = new Category { Name = "Food" }
    }
);
the effect is that the Product is added (to the context) first and then when we walk the graph we add the Category too.
i.e. the insert order into the context is:
Product
Category
but because of referential integrity constraints we must re-order like this before attempting to insert into the database:
Category
Product
So this kind of local re-ordering is non-negotiable.
However if there are no local dependencies like this, you could in theory preserve ordering. Unfortunately we don't currently track 'when' something was added to the context, and for efficiency reasons we don't track entities in order-preserving structures like lists. As a result we can't currently preserve the order of unrelated inserts.
However we were debating this just recently, so I am keen to hear just how vital this is to you.
Hope this helps
Alex
Program Manager Entity Framework Team
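The referential-integrity re-ordering Alex describes can be illustrated with a toy dependency sort (this is not EF's implementation, just a sketch with made-up names): rows are emitted so that whatever a row depends on, such as a Product's Category, is inserted first, regardless of the order they were added to the context.

```typescript
type Insert = { name: string; dependsOn?: string };

// Emit each pending insert, recursively emitting its dependency first.
function orderForDatabase(pending: Insert[]): string[] {
  const ordered: string[] = [];
  const emitted = new Set<string>();
  const emit = (row: Insert): void => {
    if (emitted.has(row.name)) return;
    const dep = pending.find((p) => p.name === row.dependsOn);
    if (dep) emit(dep); // the referenced row must be inserted first
    emitted.add(row.name);
    ordered.push(row.name);
  };
  for (const row of pending) emit(row);
  return ordered;
}

// The Product was added to the "context" first, but the Category
// must reach the database before it.
const order = orderForDatabase([
  { name: "Product", dependsOn: "Category" },
  { name: "Category" },
]);
// order is ["Category", "Product"]
```

Unrelated inserts with no dependency between them keep their relative order in this sketch, which is exactly the guarantee EF could not make at the time of the answer.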
I'm in the process of crossing this bridge. I'm replacing NHibernate with EF, and the issue that I'm running across is how lists are inserted into the DB. If I add items to a list like so (in pseudocode):
list.Add(testObject);
list.Add(testObject1);
I'm not currently guaranteed to get the same order when 'SaveChanges' is run. That's a shame because my list object (i.e. a linked list) knows the order it was created in. Objects that are chained together using references MUST be saved to the DB in the same order. Not sure why you mentioned that you're "debating" this. =)