EF Core 5: TrackGraph update many-to-many relationship

Is it possible to update an EF Core 5.0 many-to-many collection from a detached object?
Let's say we have Blog and Tag models, with a many-to-many relationship.
I'm trying to save a detached blog with the following code:
int UpdateBlog(Blog blog)
{
    db.ChangeTracker.TrackGraph(blog, node =>
    {
        if (node.Entry.Entity == blog)
            node.Entry.State = node.Entry.IsKeySet ? EntityState.Modified : EntityState.Added;
        else
        {
            // I don't want to save tags attached to the blog, just update the EF skip navigation.
            node.Entry.State = EntityState.Unchanged;
        }
    });
    db.SaveChanges();
    return blog.Id;
}
This mostly works: Blog is inserted or updated. When inserting a new blog, Tags collection is properly inserted as well.
But I can't find a way to update the Tags collection when updating an existing blog!
I've checked that entry.Navigation("Tags").IsModified is indeed true (and according to the docs, that means it will be saved). IsLoaded is false but changing it to true doesn't make a difference either.
Given EF doesn't have the current collection state, I'm expecting it to do a complete DELETE and then INSERT the current values. Is that impossible to achieve without properly loading the collection? That would seriously limit the usefulness of TrackGraph!

After more debugging and reading the EF Core source code, I think the answer is: "no, it's not possible to update a many-to-many collection (or any collection for that matter?) without loading it properly first".
https://github.com/dotnet/efcore/blob/ac2bb48b10ecf1289b568a94b7a35e8075c6d787/src/EFCore/ChangeTracking/Internal/ChangeDetector.cs#L312-L366
It seems that, unlike NHibernate, EF can't update a collection that hasn't been loaded first, because it needs the loaded state to compute a diff (NHibernate handles this case by deleting and re-creating everything).
This in turn makes it rather a pain to update such a detached graph. If someone knows a simple, efficient and generic way to do it I would gladly take the answer!
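One workaround consistent with the conclusion above is to load the existing collection and apply the difference yourself. The following is only a sketch under assumptions: `CollectionSync` and its `Sync` method are hypothetical helper names, not part of EF Core, and the commented usage assumes the `Blog`/`Tag` model and the `db` context from the question, with `Tag.Id` as an `int` key and every referenced tag already present in the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionSync
{
    // Makes `current` contain exactly the items whose keys are in `desiredKeys`.
    // `keyOf` extracts the key of an item; `resolve` maps a key to the instance
    // that should be added (with EF, an entity tracked by the same context).
    public static void Sync<T, TKey>(
        ICollection<T> current,
        IEnumerable<TKey> desiredKeys,
        Func<T, TKey> keyOf,
        Func<TKey, T> resolve)
    {
        var wanted = new HashSet<TKey>(desiredKeys);

        // Remove items that are no longer wanted (snapshot first, then mutate).
        foreach (var item in current.Where(i => !wanted.Contains(keyOf(i))).ToList())
            current.Remove(item);

        // Add items that are wanted but not present yet.
        var present = new HashSet<TKey>(current.Select(keyOf));
        foreach (var key in wanted.Where(k => !present.Contains(k)))
            current.Add(resolve(key));
    }
}

// Hypothetical usage with EF Core (names taken from the question):
//
//   var existing = db.Blogs.Include(b => b.Tags).Single(b => b.Id == blog.Id);
//   db.Entry(existing).CurrentValues.SetValues(blog);   // copy scalar properties
//   CollectionSync.Sync(existing.Tags, blog.Tags.Select(t => t.Id),
//                       t => t.Id, id => db.Tags.Find(id));
//   db.SaveChanges();   // the change tracker now sees the added/removed links
```

This costs one read of the blog and its tags before the write, which is exactly the round-trip the question hoped to avoid, but it lets the change tracker compute the join-table inserts and deletes.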

Related

How does EntityFramework Core manage data internally?

I'm trying to understand how EntityFramework Core manages data internally because it influences how I call DbSets. Particularly, does it refer to in-memory data or re-query the database every time?
Example 1)
If I call _context.ToDo.Where(x => x.id == 123).First() and then in a different procedure call the same command again, will EF give me the in-memory value or re-query the DB?
Example 2)
If I call _context.ToDo.Where(x => x.id == 123).First() and then a few lines later call _context.ToDo.Find(123).Where(x => x.id == 123).Include(x => x.Children).First(), will it use the in-memory value and then only query the DB for "Children" or does it recall the entire dataset?
I guess I'm wondering if it matters if I duplicate a call or not?
Is this affected by the AsNoTracking() switch?
What you really ask is how caching works in EF Core, not how DbContext manages data.
EF has always offered first-level caching: it keeps the entities it loads in memory for as long as the context remains alive. That's how it can track changes and save all of them when SaveChanges is called.
It doesn't cache the query itself, so it doesn't know that Where(....).First() is meant to return those specific entities. The query is sent to the database again, although with tracking enabled EF returns the instance it already tracks instead of creating a new one. To avoid the round-trip you'd have to use Find() instead. If tracking is disabled, no entities are kept around.
This is explained in Querying and Finding Entities, especially Finding entities using primary keys:
The Find method on DbSet uses the primary key value to attempt to find an entity tracked by the context. If the entity is not found in the context then a query will be sent to the database to find the entity there. Null is returned if the entity is not found in the context or in the database.
Find is different from using a query in two significant ways:
A round-trip to the database will only be made if the entity with the given key is not found in the context.
Find will return entities that are in the Added state. That is, Find will return entities that have been added to the context but have not yet been saved to the database.
In Example #2 the queries are different though. Include forces eager loading, so the results and entities returned are different. There's no need to run that query a second time though, if the first entity and context are still around. If lazy loading is enabled (in EF Core it is opt-in, for example through the Microsoft.EntityFrameworkCore.Proxies package), you could just access the Children property and EF would load the related entities on first access.
Lazy loading executes one extra query for each parent entity whose collection is accessed. If you do this for many parents in a loop, it is slow. Slow enough to have its own name, the N+1 selects problem. To avoid relying on lazy loading you can load a related collection explicitly using explicit loading, e.g.:
_context.Entry(todo).Collection(t=>t.Children).Load();
When you know you're going to use all children though, it's better to eagerly load all entities with Include().
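To make the difference between re-running a query and calling Find concrete, here is a minimal sketch. It assumes EF Core with some configured database provider; the `ToDo`, `Child` and `TodoContext` types are hypothetical stand-ins for the model in the question, and the comments describe the behaviour explained above rather than captured output.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical model, only here to make the sketch self-contained.
public class ToDo
{
    public int Id { get; set; }
    public List<Child> Children { get; set; } = new List<Child>();
}

public class Child
{
    public int Id { get; set; }
    public int ToDoId { get; set; }
}

public class TodoContext : DbContext
{
    public TodoContext(DbContextOptions<TodoContext> options) : base(options) { }
    public DbSet<ToDo> ToDo { get; set; }
}

public static class TrackingDemo
{
    public static void Run(TodoContext context)
    {
        // Sends a query; the returned entity is now tracked by the context.
        var first = context.ToDo.First(x => x.Id == 123);

        // Sends the query again, but returns the same tracked instance.
        var again = context.ToDo.First(x => x.Id == 123);

        // Checks the change tracker first; no query, because 123 is tracked.
        var found = context.ToDo.Find(123);

        // Sends a query and returns a new instance the context does not track.
        var untracked = context.ToDo.AsNoTracking().First(x => x.Id == 123);

        // Explicit loading: one query that fills first.Children.
        context.Entry(first).Collection(t => t.Children).Load();
    }
}
```

So duplicating a Where(...).First() call costs a database round-trip each time, while Find on an already-tracked key does not.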

Does Entity Framework Core have a simple way of preventing the update of child or parent entities?

I'm trying to write an UpdateStatus method which will only update the Status field of an entity when I save changes to the database. If any other fields in the entity have changed I don't want to save those changes to the database. That is simple enough for the entity's own fields, using:
using (var context = new DataAccessContext())
{
    context.Attach(entity);
    context.Entry(entity).Property(e => e.StatusCode).IsModified = true;
    context.SaveChanges();
}
However, I've discovered that any related entity reachable via a navigation property of the entity I'm setting the status of will be inserted if that related entity does not have a key value set. So if a new Child entity is added to entity.Children by some calling code, and the Child entity ChildId property is 0, that Child will be inserted into the database.
Is there any easy way in EF Core to avoid inserting related entities?
I've found an old StackOverflow post that shows how to do it in the pre-Core Entity Framework: How do I stop Entity Framework from trying to save/insert child objects? However, that answer involves looping over every related entity. Is there an easier way in EF Core?
The reason I'm looking for an easier way is that my hierarchy of entities is 5 layers deep. And I've found that it's not enough to detach just the immediate children of an entity. You have to use nested loops to detach the grandchildren, the great-grandchildren, etc. If you only detach the immediate children they won't be inserted but EF Core will attempt to insert new grandchildren and will crash and burn because it hasn't inserted their parents. It gets pretty messy.
I could just read a fresh copy of an entity from the database before updating its Status but I'm trying to avoid having to do a read before I write.
What you are asking is quite simple in EF Core. If you don't want the EF Core change tracker to process the related data, set EntityEntry.State rather than calling DbContext / DbSet methods like Attach, Add, Update, Remove, etc.
This behavior is different from EF6, where the methods and setting the state do one and the same thing, and it is partially mentioned in the Saving Related Data - Adding a graph of new entities documentation topic:
Tip
Use the EntityEntry.State property to set the state of just a single entity. For example, context.Entry(blog).State = EntityState.Modified.
So in your sample, simply replace
context.Attach(entity);
with
context.Entry(entity).State = EntityState.Unchanged;
Entity Framework Core ignores relationships unless you explicitly include them in queries.
When you attach an entity that has related data (child properties) to the context, those related entities are attached along with it.
So to fix this issue, all you need to do is set those child properties to null, and EF Core will then ignore the child objects when you update the parent object.

Entity Framework - Why explicitly set entity state to modified?

The official documentation says that, to modify an entity, I should retrieve a DbEntityEntry object and either work with the property functions or set its state to Modified. It uses the following example:
Department dpt = context.Departments.FirstOrDefault();
DbEntityEntry entry = context.Entry(dpt);
entry.State = EntityState.Modified;
I don't understand the purpose of the 2nd and 3rd statement. If I ask the framework for an entity like the 1st statement does and then modify the POCO as in
dpt.Name = "Blah";
If I then ask EF to SaveChanges(), the entity has a status of MODIFIED (I'm guessing via snapshot tracking, this isn't a proxy) and the changes are persisted without the need to manually set the state. Am I missing something here?
In your scenario you indeed don't have to set the state. It is the purpose of change tracking to detect that you have changed a value on an attached entity and to put it into the Modified state. Setting the state manually is important in the case of detached entities (entities loaded without change tracking or created outside of the current context).
As said, in a scenario with disconnected entities it can be useful to set an entity's state to Modified. It saves a roundtrip to the database if you just attach the disconnected entity, as opposed to fetching the entity from the database and modifying and saving it.
But there can be very good reasons not to set the state to Modified (and I'm sure Ladislav was aware of this, but still I'd like to point them out here).
All fields in the record will be updated, not only the changes. There are many systems in which updates are audited. Updating all fields will either cause large amounts of clutter or require the auditing mechanism to filter out false changes.
Optimistic concurrency. Since all fields are updated, this may cause more conflicts than necessary. If two users update the same records concurrently but not the same fields, there need not be a conflict. But if they always update all fields, the last user will always try to write stale data. This will at best cause an optimistic concurrency exception or in the worst case data loss.
Useless updates. The entity is marked as modified, no matter what. Unchanged entities will also fire an update. This may easily occur if edit windows can be opened to see details and closed by OK.
So it's a fine balance. Reduce roundtrips or reduce redundancy.
Anyway, an alternative to setting the state to Modified is (using DbContext API):
void UpdateDepartment(Department department)
{
    var dpt = context.Departments.Find(department.Id);
    context.Entry(dpt).CurrentValues.SetValues(department);
    context.SaveChanges();
}
CurrentValues.SetValues marks individual properties as Modified.
Or attach a disconnected entity and mark individual properties as Modified manually:
context.Entry(dpt).State = System.Data.Entity.EntityState.Unchanged;
context.Entry(dpt).Property(d => d.Name).IsModified = true;

Copy records between two databases using EF

I need to copy data from one database to another with EF. E.g. I have the following table relations: Forms->FormVersions->FormLayouts... We have different forms in both databases and we want to collect them into one DB. Basically I want to load a Form object recursively from one DB and save it to another DB with all its references. I also need to change the IDs of the object and its related objects if objects with the same ID already exist in the second database.
Until now I have following code:
Form form = null;
using (var context = new FormEntities())
{
    form = (from f in context.Forms
            join fv in context.FormVersions on f.ID equals fv.FormID
            where f.ID == 56
            select f).First();
}
var context1 = new FormEntities("name=FormEntities1");
context1.AddObject("Forms", form);
context1.SaveChanges();
I'm receiving the error: "The EntityKey property can only be set when the current value of the property is null."
Can you help with implementation?
The simplest solution would be to create a copy of your Form (a new object) and add that new object. Otherwise you can try:
Call context.Detach(form)
Set form's EntityKey to null
Call context1.AddObject("Forms", form)
I would first second E.J.'s answer. Assuming though that you are going to use Entity Framework, one of the main problem areas that you will face is relationship management. Your code should use the Include method to ensure that related objects are included in the results of a select operation. The join that you have will not have this effect.
http://msdn.microsoft.com/en-us/library/bb738708.aspx
Further, detaching an object will not automatically detach the related objects. You can detach them in the same way; however, the problem here is that as each object is detached, the relationships that it held to other objects within the context are broken.
Manually restoring the relationships may be an option for you; however, it may be worthwhile looking at EntityGraph. This framework allows you to define object graphs and then perform operations such as detach upon them. The entire graph is detached in a single operation with its relationships intact.
My experience with this framework has been in relation to RIA Services and Silverlight however I believe that these operations are also supported in .Net.
http://riaservicescontrib.codeplex.com/wikipage?title=EntityGraphs
Edit1: I just checked the EntityGraph docs and see that DetachEntityGraph is in the RIA specific layer which unfortunately rules it out as an option for you.
Edit2: Alex James's answer to the following question is a solution to your problem. Don't load the objects into the context to begin with - use the NoTracking option. That way you don't need to detach them, which is what causes the problem.
Entity Framework - Detach and keep related object graph
If you are only doing a few records, Ladislav's suggestion will probably work, but if you are moving lots of data, you should/could consider doing this move in a stored procedure. The entire operation can be done at the server, with no need to move objects from the db server, to your front end and then back again. A single SP call would do it all.
The performance will be a lot better, which may or may not matter in your case.

Finding Entity Framework contexts

Through various questions I have asked here and other forums, I have come to the conclusion that I have no idea what I'm doing when it comes to the generated entity context objects in Entity Framework.
As background, I have a ton of experience using LLBLGen Pro, and Entity Framework is about three weeks old to me.
Let's say I have a context called "myContext". There is a table/entity called Employee in my model, so I now have a myContext.Employees. I assume this to mean that this property represents the set of Employee entities in my context. However, I assume wrong, as I can add a new entity to the context with:
myContext.Employees.AddObject(new Employee());
and this new Employee entity appears nowhere in myContext.Employees. From what I gather, the only way to find this newly added entity is to track it down hiding in the myContext.ObjectStateManager. This sounds to me like the myContext.Employees set is in fact not the set of Employee entities in the context, but rather some kind of representation of the Employee entities that exist in the database.
To add further to this confusion, let's say I am looking at a single Employee entity. There is a Project entity that has a M:1 relationship with Employee (an employee can have multiple projects). If I want to add a new project to a particular employee, I just do:
myEmployee.Projects.Add(new Project());
Great, this actually adds the Project to the collection as I would expect. But this flies right in the face of how the ObjectSet properties off of the context work. If I add a new Project to the context with:
myContext.Projects.AddObject(new Project());
this does not alter the Projects set.
I would appreciate it very much if someone were to explain this to me. Also, I really want a collection of all the Employees (or Projects) in the context, and I want it available as a property of the context. Is this possible with EF?
An ObjectSet is a query. Like everything in LINQ, it's lazy. It does nothing until you either enumerate it or call a method like .Count(), at which point a database query is run, and any returned entities are merged with those already in the context.
So you can do something like:
var activeEmployees = Context.Employees.Where(e => e.IsActive);
...without running a query.
You can further compose this:
var orderedEmployees = activeEmployees.OrderBy(e => e.Name);
...again, without running a query.
But if you look into the set:
var first = orderedEmployees.First();
...then a DB query is run. This is common to all LINQ.
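Because deferred execution is a property of LINQ in general, it can be observed without a database. The sketch below uses LINQ to Objects with a hypothetical `Employee` class and a counter (`Pulled`) standing in for the database round-trip; it illustrates the principle and is not how ObjectSet is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

public static class DeferredDemo
{
    // Counts how many elements have actually been pulled from the source.
    public static int Pulled;

    // Stands in for the database: the body runs only when someone enumerates.
    public static IEnumerable<Employee> Employees()
    {
        var rows = new[]
        {
            new Employee { Name = "Bo", IsActive = true },
            new Employee { Name = "Al", IsActive = true },
            new Employee { Name = "Cy", IsActive = false },
        };
        foreach (var row in rows)
        {
            Pulled++;
            yield return row;
        }
    }

    public static void Run()
    {
        var active = Employees().Where(e => e.IsActive);   // nothing enumerated yet
        var ordered = active.OrderBy(e => e.Name);         // still nothing
        Console.WriteLine(Pulled);                         // prints 0

        var first = ordered.First();                       // enumeration happens here
        Console.WriteLine(Pulled);                         // prints 3
        Console.WriteLine(first.Name);                     // prints Al
    }
}
```

Composing Where and OrderBy only builds up the query; the source is touched when First() asks for a result, which is the point at which an ObjectSet query would hit the database.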
If you want to enumerate entities already in the context, you need to look towards the ObjectStateManager, instead. So for Employees, you can do:
var states = EntityState.Added | EntityState.Deleted; // combine whichever states you need
var emps = Context.ObjectStateManager.GetObjectStateEntries(states)
    .Select(e => e.Entity)
    .OfType<Employee>();
Note that although this works, it is not a way that I would recommend working. Typically, you do not want your ObjectContexts to be long-lived. For this, and other reasons, they are not really suitable to be a general-purpose container of objects. Use the usual List types for that. It is more accurate to think of an ObjectContext as a unit of work. Typically, in a unit of work you already know which instances you are working with.