Walk tree of objects for change detection

Walk tree of objects for change detection - entity-framework-core

TL;DR: Can EFCore do some kind of DetectChanges(someEntity) and automatically walk the object tree from that entity?
My domain model is created in Domain Driven Design style, where object references are only within an aggregate (e.g. PurchaseOrder.Lines is a collection of PurchaseOrderLine) and associations outside of the aggregate are by Id only (e.g. PurchaseOrder.CustomerId is a Guid, instead of a property to a Customer object).
I am retrieving many objects from the DB and altering them. At the end of the process I want to decide which objects to save and which ones not to. So, I want to only save modified objects that I specifically state have changed via PurchaseOrderRepository.Update(purchaseOrder), but I also want to ensure all related objects are checked for changes too (so I'd like EFCore to see that 2 objects were removed from purchaseOrder.Lines so should be deleted, and 1 new one was added).
I don't want to automatically save everything that is retrieved + modified, only what I explicitly state should be. Is this something that is possible?
For example:
If I load a lot of objects, modify them, and then they fail my domain validation, I want to abandon saving changes to those objects and instead save a new Event of some kind in the DB saying a file import failed.

Related

What is the point of the Update function in the Repository EF pattern?

I am using the repository pattern within EF using an Update function I found online
public class Repository<T> : IRepository<T> where T : class
{
public virtual void Update(T entity)
{
var entry = this.context.Entry(entity);
this.dbset.Attach(entity);
entry.State = System.Data.Entity.EntityState.Modified;
}
}
I then use it within a DeviceService like so:
public void UpdateDevice(Device device)
{
this.serviceCollection.Update(device);
this.uow.Save();
}
I have realise that what this actually does it update ALL of the device's information rather than just update the property that changed. This means in a multi threaded environment changes can be lost.
After testing I realised I could just change the Device then call uow.Save() which both saved the data and didnt overwrite any existing changes.
So my question really is - What is the point in the Update() function? It appears in almost every Repository pattern I find online yet it seems destructive.

I wouldn't call this generic Update method generally "destructive" but I agree that it has limited use cases that are rarely discussed in those repository implementations. If the method is useful or not depends on the scenario where you want to apply it.
In an "attached scenario" (Windows Forms application for instance) where you load entities from the database, change some properties while they are still attached to the EF context and then save the changes the method is useless because the context will track all changes anyway and know at the end which columns have to be updated or not. You don't need an Update method at all in this scenario (hint: DbSet<T> (which is a generic repository) does not have an Update method for this reason). And in a concurrency situation it is destructive, yes.
However, it is not clear that a "change tracked update" isn't sometimes destructive either. If two users change the same property to different values the change tracked update for both users would save the new column value and the last one wins. If this is OK or not depends on the application and how secure it wants changes to be done. If the application disallows to ever edit an object that is not the last version in the database before the change is saved it cannot allow that the last save wins. It would have to stop, force the user to reload the latest version and take a look at the last values before he enters his changes. To handle this situation concurrency tokens are necessary that would detect that someone else changed the record in the meantime. But those concurrency checks work the same way with change tracked updates or when setting the entity state to Modified. The destructive potential of both methods is stopped by concurrency exceptions. However, setting the state to Modified still produces unnecessary overhead in that it writes unchanged column values to the database.
In a "detached scenario" (Web application for example) the change tracked update is not available. If you don't want to set the whole entity to Modified you have to load the latest version from the database (in a new context), copy the properties that came from the UI and save the changes again. However, this doesn't prevent that changes another user has done in the meantime get overwritten, even if they are changes on different properties. Imagine two users load the same customer entity into a web form at the same time. User 1 edits the customer name and saves. User 2 edits the customer's bank account number and saves a few seconds later. If the entity gets loaded into the new context to perform the update for User 2 EF would just see that the customer name in the database (that already includes the change of User 1) is different from the customer name that User 2 sent back (which is still the old customer name). If you copy the customer name value the property will be marked as Modified and the old name will be written to the database and overwrite the change of User 1. This update would be just as destructive as setting the whole entity state to Modified. In order to avoid this problem you would have to either implement some custom change tracking on client side that recognizes if User 2 changed the customer name and if not it just doesn't copy the value to the loaded entity. Or you would have to work with concurrency tokens again.
You didn't mention the biggest limitation of this Update method in your question - namely that it doesn't update any related entities. For example, if your Device entity had a related Parts collection and you would edit this collection in a detached UI (add/remove/modify items) setting the state of the parent Device to Modified won't save any of those changes to the database. It will only affect the scalar (and complex) properties of the parent Device itself. At the time when I used repos of this kind I named the update method FlatUpdate to indicate that limitation better in the method name. I've never seen a generic "DeepUpdate". Dealing with complex object graphs is always a non-generic thing that has to be written individually per entity type and depending on the situation. (Fortunately a library like GraphDiff can limit the amount of code that has to be written for such graph updates.)
To cut a long story short:
For attached scenarios the Update method is redundant as EFs automatic change tracking does all the necessary work to write correct UPDATE statements to the database - including changes in related object graphs.
For detached scenarios it is a comfortable way to perform updates of simple entities without relationships.
Updating object graphs with parent and child entities in a detached scenario can't be done with such a simplified Update method and requires significantly more (non-generic) work.
Safe concurrency control needs more sophisticated tools, like enabling the optimistic concurrency checks that EF provides and handling the resulting concurrency exceptions in a user-friendly way.

After Slauma's very profound and practical answer I'd like to zoom in on some basic principles.
In this MSDN article there is one important sentence
A repository separates the business logic from the interactions with the underlying data source or Web service.
Simple question. What has the business logic to do with Update?
Fowler defines a repository pattern as
Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.
So as far as the business logic is concerned a repository is just a collection. Collection semantics are about adding and removing objects, or checking whether an object exists. The main operations are Add, Remove, and Contains. Check out the ICollection<T> interface: no Update method there.
It's not the business logic's concern whether objects should be marked as 'modified'. It just modifies objects and relies on other layers to detect and persist changes. Exposing an Update method
makes the business layer responsible for tracking and reporting its changes. Soon all kinds of if constructs will creep in to check whether values have changes or not.
breaks persistence ignorance, because the mere fact that storing updates is something else than storing new objects is a data layer detail.
prevents the data access layer from doing its job properly. Indeed, the implementation you show is destructive. While the Data Access Layer may be perfectly capable of perceiving and persisting granular changes, this method marks a whole object as modified and forces a swiping UPDATE.

Is it possible to tell if an entity is tracked?

I'm using Entity Framework 4.1. I've implemented a base repository using lots of the examples online. My repository get methods take a bool parameter to decide whether to track the entities. Sometimes, I want to load an entity and track it, other times, for some entities, I simply want to read them and display them (i.e. in a graph). In this situation there is never a need to edit, so I don't want the overhead of tracking them. Also, graph entities are sent to a silverlight client, so the entities are disconnected from the context. Hence my Get methods can return a list of entities that are either tracked or not. This is achieved dynamically creating the query as follows:
DbQuery<E> query = Context.Set<E>();
// Track the entities in the context?
if (!trackEntities)
{
query = query.AsNoTracking();
}
However, I now want to enable the user to interact with the graph and edit it. This will not happen very often, so I still want to get some entities without tracking them but to have the ability to save them. To do this I simply attach them to the context and set the state as modified. Everything is working so far.
I am auditing any changes by overriding the SaveChanges method. As explained above I may, in some low cases, need to save modified entities that were disconnected. So to audit, I have to retrieve the current values from the database and then compare to work out what was changed while disconnected. If the entity has been tracked, there is no need to get the old values, as I've got access to them via the state manager. I'm not using self tracking entities, as this is overkill for my requirements.
QUESTION: In my auditing method I simply want to know if the modified entity is tracked or not, i.e. do I need to go to the db and get the original values?
Cheers

DbContext.ChangeTracker.Entries (http://msdn.microsoft.com/en-us/library/gg679172(v=vs.103).aspx) returns DbEntityEntry objects for all tracked entities. DbEntityEntry has Entity property that you could use to find out whether the entity is tracked. Something like
var isTracked = ctx.ChangeTracker.Entries().Any(e => Object.ReferenceEquals(e.Entity, myEntity));

Save One, Save All in Entity Framework

I'm still learning about Unit of Work patterns, repository patterns, etc.
My app:
I have a list of entities, say customers in a listview
When I select a customer a detail form shows, where their details can be edited
I'm trying to understand the standard MVVM/Entity Framework way of accomplishing the following:
When the user edits a customer it shows as "changed" (but not saved)
The user can chose to either save the current customer, or save all the changed customers
The Save or Save All commands/buttons are disabled if that option is not available (the current customer is unchanged, or all customers are unchanged)
Seems simple enough? But I have no idea how to approach this using MVVM/EF. Do I use UoW, do I detach objects and re-attach to the context so I can save them one at a time? How do I detect if an object is changed or unchanged?
Help! Thanks!

I throw in a few remarks:
The critical point in your requirements is in my opinion the option to save either one single customer or all changed customers. You need to take into account that Entity Framework doesn't have a method to save changes of a single or a few selected objects in the context. You can only save the changes of the whole Unit of Work (which is the ObjectContext or DbContext in EF) by calling myContext.SaveChanges().
This leads to the conclusion that you cannot use the list of all customers and the customer detail form in one single Unit of Work (= EF context) which holds all customers as attached entities. If you would do this you could provide a function/button to save all changes but not an option to save only the current customer in the form.
So, I would either think about if you really need those functions or I would work with the entities in a detached state. This would mean that you have to load the customer list from the database and dispose the context after that. When you save the changes - and now it doesn't matter if all changes or only changes of a single customer - you can create a new context, pull the original entity/entities from the database and update with the changed properties.
But working with either attached or detached entities - or either having one living EF context per view/form or creating only one short-living context per CRUD operation - is an important design decision in my opinion. Generally the possibility to have your entities attached to a context during the lifetime of a view/form exists to make your life as programmer easier because it offers you features like lazy loading and change tracking out of the box. So you might think twice if you want to give this up.
To recognize if a customer object has been changed or not the EF context could be helpful because it tracks the state of an object. You could for instance query the ObjectStateManager for a customer and check if it is in a "Changed" state. But to have this option you would need to work with attached entities as explained above. Since you cannot save (or also cancel) single object changes it is questionable if it would make sense at all to show the user that customer 1 and customer 3 has changed. (I would probably only show "some customers have changed".)
If you are working with detached entities you have to manage by hand which customers have changed or not by implementing some kind of "dirty flag" logic. Here is a thread about this:
Different ways to implement 'dirty'-flag functionality

iPhone and Core Data: how to retain user-entered data between updates?

Consider an iPhone application that is a catalogue of animals. The application should allow the user to add custom information for each animal -- let's say a rating (on a scale of 1 to 5), as well as some notes they can enter in about the animal. However, the user won't be able to modify the animal data itself. Assume that when the application gets updated, it should be easy for the (static) catalogue part to change, but we'd like the (dynamic) custom user information part to be retained between updates, so the user doesn't lose any of their custom information.
We'd probably want to use Core Data to build this app. Let's also say that we have a previous process already in place to read in animal data to pre-populate the backing (SQLite) store that Core Data uses. We can embed this database file into the application bundle itself, since it doesn't get modified. When a user downloads an update to the application, the new version will include the latest (static) animal catalogue database, so we don't ever have to worry about it being out of date.
But, now the tricky part: how do we store the (dynamic) user custom data in a sound manner?
My first thought is that the (dynamic) database should be stored in the Documents directory for the app, so application updates don't clobber the existing data. Am I correct?
My second thought is that since the (dynamic) user custom data database is not in the same store as the (static) animal catalogue, we can't naively make a relationship between the Rating and the Notes entities (in one database) and the Animal entity (in the other database). In this case, I would imagine one solution would be to have an "animalName" string property in the Rating/Notes entity, and match it up at runtime. Is this the best way to do it, or is there a way to "sync" two different databases in Core Data?

Here's basically how I ended up solving this.
While Amorya's and MHarrison's answers were valid, they had one assumption: that once created, not only the tables but each row in each table would always be the same.
The problem is that my process to pre-populate the "Animals" database, using existing data (that is updated periodically), creates a new database file each time. In other words, I can't rely on creating a relationship between the (static) Animal entity and a (dynamic) Rating entity in Core Data, since that entity may not exist the next time I regenerate the application. Why not? Because I have no control how Core Data is storing that relationship behind the scenes. Since it's an SQLite backing store, it's likely that it's using a table with foreign key relations. But when you regenerate the database, you can't assume anything about what values each row gets for a key. The primary key for Lion may be different the second time around, if I've added a Lemur to the list.
The only way to avoid this problem would require pre-populating the database only once, and then manually updating rows each time there's an update. However, that kind of process isn't really possible in my case.
So, what's the solution? Well, since I can't rely on the foreign key relations that Core Data makes, I have to make up my own. What I do is introduce an intermediate step in my database generation process: instead of taking my raw data (which happens to be UTF-8 text but is actually MS Word files) and creating the SQLite database with Core Data directly, I introduce an intermediary step: I convert the .txt to .xml. Why XML? Well, not because it's a silver bullet, but simply because it's a data format I can parse very easily. So what does this XML file have different? A hash value that I generate for each Animal, using MD5, that I'll assume is unique. What is the hash value for? Well, now I can create two databases: one for the "static" Animal data (for which I have a process already), and one for the "dynamic" Ratings database, which the iPhone app creates and which lives in the application's Documents directory. For each Rating, I create a pseudo-relationship with the Animal by saving the Animal entity's hash value. So every time the user brings up an Animal detail view on the iPhone, I query the "dynamic" database to find if a Rating entity exists that matches the Animal.md5Hash value.
Since I'm saving this intermediate XML data file, the next time there's an update, I can diff it against the last XML file I used to see what's changed. Now, if the name of an animal was changed -- let's say a typo was corrected -- I revert the hash value for that Animal in situ. This means that even if an Animal name is changed, I'll still be able to find a matching Rating, if it exists, in the "dynamic" database.
This solution has another nice side effect: I don't need to handle any migration issues. The "static" Animal database that ships with the app can stay embedded as an app resource. It can change all it wants. The "dynamic" Ratings database may need migration at some point, if I modify its data model to add more entities, but in effect the two data models stay totally independent.

The way I'm doing this is: ship a database of the static stuff as part of your app bundle. On app launch, check if there is a database file in Documents. If not, copy the one from the app bundle to Documents. Then open the database from Documents: this is the only one you read from and edit.
When an upgrade has happened, the new static content will need to be merged with the user's editable database. Each static item (Animal, in your case) has a field called factoryID, which is a unique identifier. On the first launch after an update, load the database from the app bundle, and iterate through each Animal. For each one, find the appropriate record in the working database, and update any fields as necessary.
There may be a quicker solution, but since the upgrade process doesn't happen too often then the time taken shouldn't be too problematic.

Storing your SQLite database in the Documents directory (NSDocumentDirectory) is certainly the way to go.
In general, you should avoid application changes that modify or delete SQL tables as much as possible (adding is ok). However, when you absolutely have to make a change in an update, something like what Amorya said would work - open up the old DB, import whatever you need into the new DB, and delete the old one.
Since it sounds like you want a static database with an "Animal" table that can't be modified, then simply replacing this table with upgrades shouldn't be an issue - as long as the ID of the entries doesn't change. The way you should store user data about animals is to create a relation with a foreign key to an animal ID for each entry the user creates. This is what you would need to migrate when an upgrade changes it.

Core-Data: Want to persist new web xml stuff to my data store rather than replace existing

I have an application that loads core data, then calls the XML Web Service to get updated data. I basically want my app, rather than to erase all the data and load everything (including new) to persist, I want it to add only the new stuff onto the existing stack (without duplication).
What's the general consensus strategy for something like this?

I fetch an NSSet* of all persisted objects and then perform an intersection operation on that set of NSManagedObject instances with a new managed object, which is populated with the data from an individual XML element and its contents.
If there is something left from that intersection, that means I have that element already in my data store. So I update the existing, already-persisted managed object's properties with data from the XML element and save, discarding the new managed object.
If I have an empty set, the newly created managed object gets saved into the data store directly.
I don't have a hash value available to compare between the persisted data and the XML data, so this works reasonably well.

If each item in your xml data set has a unique key, you can just use that key to find the existing record and update it with the new info

Like Corey says, find a unique key to use in your data. Search the original data for that key and if it's not found, the object is a new one and can be inserted. If the object is already in the original data, discard it.
If there's a chance that the data will be updated over time (objects getting updated on the web side) then you'll have to compare the old (iphone) object and the new object with the same key. If they're the exact same, discard the new one. If not, replace the old one.
If there's no unique key you can use, you're going to have to do this comparison on all objects. I'm not sure of the quickest way to compare two managed objects, but you could try creating a managed object with the new (web) data and comparing it with each of the objects in the original.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse