Best approach to load or reload entity from database in EF Core - entity-framework-core

I have a DbSet which will have some of the entities loaded, not all. I want to retrieve the latest entity (including reloading it if required). However, I want to avoid hitting the database twice.
For a single entity - failed approach:
// Load entity
// If the entity is already in memory, this will not hit the database
// If the entity is not already loaded into memory, this will hit the database
var myCar = Context.Cars.FirstOrDefault(c => c.CarId == carId);
// Reload, in case the entity has changed in the database
// This will hit the database a second time, if the entity was NOT already loaded into memory
await Context.Entry(myCar).ReloadAsync().ConfigureAwait(false);
Convoluted approach:
// To retrieve from in-memory, use .Local:
var myCar = Context.Cars.Local.FirstOrDefault(c => c.CarId == carId);
if (myCar != null)
{
    // Entity already loaded into memory
    // It requires a reload
    await Context.Entry(myCar).ReloadAsync().ConfigureAwait(false);
}
else
{
    // Entity is not loaded into memory
    // Load entity from database (don't use .Local)
    myCar = Context.Cars.FirstOrDefault(c => c.CarId == carId);
}
This will only hit the db once, but is messy. I am looking for a single call that does this: "get the entity, ensuring it's the latest".
Edit for clarification:
The question is:
How do I retrieve the current contents of the database cleanly and succinctly, while ensuring that the database is hit only once (i.e. no redundant .Reload() to make sure I have the latest)?
I know:
(a) If I have loaded the entity, there are definitely not any local changes. There may be changes from another user or process in the database. The question is "How do I hit the database once and only once" in order to get the changes from the other user. Calling FirstOrDefault is NOT loading the changes, hence I am calling Reload as well.
(b) I may not have the entity in memory. I simply need to load the entity using FirstOrDefault. In which case, the Reload will hit the database again.
From the first answer:
The correct solution should be to ONLY call FirstOrDefault(), it should load the changes in the case of (a), or load the entity in case of (b).
My problem, however, now remains as follows:
I don't know why FirstOrDefault still isn't loading changes. Since I know that there are definitely no local changes (State == Unchanged), I would expect FirstOrDefault to load changes from the database, but it's not doing so. This is why I'm calling .Reload(), but instead I need to find out why FirstOrDefault is not loading the changes from the database, which will negate the need to call .Reload(). This will solve the issue.

// If the entity is already in memory, this will not hit the database
var myCar = Context.Cars.FirstOrDefault(c => c.CarId == carId);
This is not correct. The statement does hit the database, but when the returned row belongs to an entity that is already tracked, EF Core hands back the tracked instance and does not overwrite its property values with the database values (whether or not the entity has local changes). It only looks as if the database wasn't hit.
That's because EF doesn't want to overwrite local changes unasked. The same Car instance could, after having been changed, be included in the results of later LINQ queries that load it from the database (sometimes not obviously, e.g. through lazy loading), and those changes should still be saved.
That said, what I think you're after is a method that only reloads an entity object (hitting the database) if it's modified and does nothing if it's not. For example as a method in a DbContext class:
async Task ReloadIfModified<TEntity>(TEntity entityObject) where TEntity : class
{
    var entry = Entry(entityObject);
    if (entry.State != EntityState.Modified) return;
    await entry.ReloadAsync();
}
This assumes that deleted objects shouldn't be reloaded. If you want Detached (or even Added) objects to be reloaded you can change that in the method.
Note that this is not the same as Context.Cars.Find(1). The Find method doesn't hit the database if the entity object is already tracked, which is good, but it doesn't reload a modified object (which is not what you want here).
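If what you need is exactly what the question asks for (the current database values, with one query, whether or not the car is already tracked), one option is to read the row with AsNoTracking(), which bypasses identity resolution, and copy the values onto the tracked instance if there is one. The sketch below assumes a Car entity with an int CarId key, a hypothetical Model property, and a context named CarsContext; GetLatestCarAsync is a made-up helper name, not an EF Core API. It discards any unsaved changes to that car and refreshes scalar properties only, not navigations.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class Car
{
    public int CarId { get; set; }
    public string Model { get; set; } = "";   // hypothetical example property
}

public class CarsContext : DbContext
{
    public CarsContext(DbContextOptions<CarsContext> options) : base(options) { }

    public DbSet<Car> Cars => Set<Car>();

    // Hypothetical helper: returns the car with its current database values,
    // issuing exactly one query whether or not it is already tracked.
    public async Task<Car?> GetLatestCarAsync(int carId)
    {
        // The single round-trip. AsNoTracking skips identity resolution,
        // so these are the values currently stored in the database.
        var fresh = await Cars.AsNoTracking()
            .FirstOrDefaultAsync(c => c.CarId == carId);

        // Local only inspects the change tracker; it never queries the database.
        var tracked = Cars.Local.FirstOrDefault(c => c.CarId == carId);

        if (fresh == null)
        {
            // The row is gone: stop tracking any stale copy.
            if (tracked != null) Entry(tracked).State = EntityState.Detached;
            return null;
        }

        if (tracked == null)
        {
            Attach(fresh);   // key is set, so it is tracked as Unchanged
            return fresh;
        }

        // Overwrite the tracked instance in place so existing references stay valid.
        var entry = Entry(tracked);
        entry.CurrentValues.SetValues(fresh);
        entry.OriginalValues.SetValues(fresh);
        entry.State = EntityState.Unchanged;
        return tracked;
    }
}
```

Because the tracked instance is updated rather than replaced, other objects that already reference it see the new values too.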

Related

How does EntityFramework Core manage data internally?

I'm trying to understand how EntityFramework Core manages data internally because it influences how I call DbSets. Particularly, does it refer to in-memory data or re-query the database every time?
Example 1)
If I call _context.ToDo.Where(x => x.id == 123).First() and then in a different procedure call the same command again, will EF give me the in-memory value or re-query the DB?
Example 2)
If I call _context.ToDo.Where(x => x.id == 123).First() and then a few lines later call _context.ToDo.Find(123).Where(x => x.id == 123).Include(x => x.Children).First(), will it use the in-memory data and then only query the DB for "Children", or does it re-query the entire dataset?
I guess I'm wondering if it matters if I duplicate a call or not?
Is this affected by the AsNoTracking() switch?
What you really ask is how caching works in EF Core, not how DbContext manages data.
EF always offered 1st level caching - it kept the entities it loaded in memory, as long as the context remains alive. That's how it can track changes and save all of them when SaveChanges is called.
It doesn't cache the query itself, so it doesn't know that Where(....).First() is meant to return those specific entities. You'd have to use Find() instead. If tracking is disabled, no entities are kept around.
This is explained in Querying and Finding Entities, especially Finding entities using primary keys:
The Find method on DbSet uses the primary key value to attempt to find an entity tracked by the context. If the entity is not found in the context then a query will be sent to the database to find the entity there. Null is returned if the entity is not found in the context or in the database.
Find is different from using a query in two significant ways:
A round-trip to the database will only be made if the entity with the given key is not found in the context.
Find will return entities that are in the Added state. That is, Find will return entities that have been added to the context but have not yet been saved to the database.
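The behaviour described above can be observed with a few lines of EF Core. This is a sketch assuming a hypothetical TodoItem entity exposed as the ToDo set from the question, with an existing row whose Id is 123; ReferenceEquals shows whether EF handed back the instance it already tracks. Both LINQ queries go to the database; Find does not, because the entity is already tracked.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class TodoItem
{
    public int Id { get; set; }
    public string Title { get; set; } = "";
}

public class TodoContext : DbContext
{
    public TodoContext(DbContextOptions<TodoContext> options) : base(options) { }
    public DbSet<TodoItem> ToDo => Set<TodoItem>();
}

public static class TrackingDemo
{
    public static (bool SecondQuerySameInstance, bool NoTrackingSameInstance, bool FindSameInstance,
                   bool FindSeesAdded, bool QuerySeesAdded) Run(TodoContext context)
    {
        // Tracking query: hits the database; the result starts being tracked.
        var first = context.ToDo.Where(x => x.Id == 123).First();

        // Same query again: hits the database again, but the row is resolved
        // to the instance that is already tracked.
        var second = context.ToDo.Where(x => x.Id == 123).First();

        // No-tracking query: a new, untracked instance every time.
        var untracked = context.ToDo.AsNoTracking().Where(x => x.Id == 123).First();

        // Find: served from the change tracker, since Id 123 is tracked.
        var found = context.ToDo.Find(123);

        // Added but not saved: Find sees it, a query (sent to the database) does not.
        context.ToDo.Add(new TodoItem { Id = 456, Title = "not saved yet" });
        var addedViaFind = context.ToDo.Find(456);
        var addedViaQuery = context.ToDo.FirstOrDefault(x => x.Id == 456);

        return (ReferenceEquals(first, second),
                ReferenceEquals(first, untracked),
                ReferenceEquals(first, found),
                addedViaFind != null,
                addedViaQuery != null);
    }
}
```

Against a database that contains row 123 (and not 456), Run should report the same instance for the second query and for Find, a different instance for the no-tracking query, and the added entity only through Find.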
In Example #2 the queries are different, though. Include forces eager loading, so the query returns different results and entities. There's no need to issue it a second time if the first entity and context are still around: you could just iterate over the Children property and EF would load the related entities on first access, using lazy loading (in EF Core, lazy loading is off by default and has to be enabled, e.g. with lazy-loading proxies).
Lazy loading sends a separate query every time an unloaded navigation is accessed. Touching the navigations of many entities this way is slow, slow enough to have its own name: the N+1 selects problem. To avoid this, you can load a related collection using explicit loading, e.g.:
_context.Entry(todo).Collection(t=>t.Children).Load();
When you know you're going to use all children though, it's better to eagerly load all entities with Include().
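Both loading styles from this answer, side by side. A sketch assuming hypothetical TodoItem and TodoChild entities in a one-to-many relationship, with TodoChild.TodoItemId picked up as the foreign key by convention. Checking IsLoaded before calling Load() avoids repeating the explicit query for a collection this context has already loaded.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class TodoItem
{
    public int Id { get; set; }
    public string Title { get; set; } = "";
    public List<TodoChild> Children { get; set; } = new List<TodoChild>();
}

public class TodoChild
{
    public int Id { get; set; }
    public int TodoItemId { get; set; }   // foreign key to TodoItem, found by convention
    public string Name { get; set; } = "";
}

public class TodoContext : DbContext
{
    public TodoContext(DbContextOptions<TodoContext> options) : base(options) { }
    public DbSet<TodoItem> ToDo => Set<TodoItem>();
}

public static class LoadingDemo
{
    // Eager loading: the children are loaded together with the parent.
    public static TodoItem LoadEagerly(TodoContext context, int id) =>
        context.ToDo.Include(t => t.Children).First(t => t.Id == id);

    // Explicit loading: parent first, then one query for all of its children.
    public static TodoItem LoadExplicitly(TodoContext context, int id)
    {
        var todo = context.ToDo.First(t => t.Id == id);
        var children = context.Entry(todo).Collection(t => t.Children);
        if (!children.IsLoaded)
        {
            children.Load();
        }
        return todo;
    }
}
```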

Entity Framework update with attach or single

I'm new to entity framework and seeing different approaches for updating.
public void Update(Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);
    modelInDb.Name = "New Name";
    context.SaveChanges();
}
public void Update(Model model)
{
    context.Customer.Attach(model);
    model.Name = "New Name";
    context.SaveChanges();
}
Why I should use attach over single? Could you explain difference.
Passing entities between client and server should be considered an antipattern because it can make your system vulnerable to man-in-the-browser and similar attacks.
Your 2 examples don't really outline much because you are setting the updated value solely in your method, rather than based on input from the view. A more common example for an update would be:
public void Update(Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);
    modelInDb.Name = model.Name;
    context.SaveChanges();
}
and
public void Update(Model model)
{
    context.Models.Attach(model);
    context.Entry(model).State = EntityState.Modified;
    context.SaveChanges();
}
In your example, where the method itself sets the modifications, the UPDATE SQL statement should be OK, modifying just the customer Name. However, if you attach the model and set its state to Modified to save the new model fields to the DB, it will update all columns.
Of these two examples, the first is better than the second for a number of reasons. The first example is loading the data from the context and copying across only the data you expect to be able to change from the view. The second is taking the model from the view as-is, attaching it to the context, and will overwrite any existing fields. Attackers can discover this and use the behaviour to alter data your view did not allow to change. A customer Order for instance might contain a lot of data about an order including relationships for products, discounts, etc. A user may not see any of these details in their view, but by passing an Entity graph, all of it is visible in the web request data. Not only is this going to be sending far more information to the client than the client needs (slower) but it can be altered in debug tools and the like prior to reaching your service as well. Attaching and updating the returned entity exposes your system to tampering.
Additionally you risk overwriting stale data in your objects. With option 1 you are loading the "right now" copy of the entity. A simple check to a Row Version Number or Last Modified Date between your passed in data and the current DB copy can signal whether that row had changed since the copy passed to the client a while ago. With the 2nd method, you can inadvertently erase modifications to data without a trace.
The better approach is to pass ViewModels to and from your view. By using Select or AutoMapper to fill a view model, you avoid exposing any more about your domain than the client needs to see. You also only accept back the data needed for the operation the client can perform. This reduces the payload size and reduces the vulnerability to tampering. I've seen an alarming number of examples, even from Microsoft, passing Entities around between client and server. It looks practical since the objects are already there, but this is wasteful for resources/performance, troublesome for dealing with cyclic references and serialization, and prone to data tampering and stale data overwrites.
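A minimal sketch of the first approach combined with the stale-data check described above. The Customer entity, the RenameCustomerViewModel class and the integer Version column are all hypothetical; in practice a database-generated row version configured as a concurrency token is the more common choice, because SaveChanges then detects the conflict itself. Here the comparison is done by hand, so a small window remains between the read and the write.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal CreditLimit { get; set; }   // not editable from this view
    public int Version { get; set; }           // hypothetical manual version number
}

// Hypothetical view model: only what the rename screen shows and may change.
public class RenameCustomerViewModel
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int Version { get; set; }
}

public class ShopContext : DbContext
{
    public ShopContext(DbContextOptions<ShopContext> options) : base(options) { }
    public DbSet<Customer> Customers => Set<Customer>();
}

public static class CustomerService
{
    // Returns false if the row changed after the client read it.
    public static bool Rename(ShopContext context, RenameCustomerViewModel input)
    {
        // Load the "right now" copy of the entity.
        var customer = context.Customers.Single(c => c.Id == input.Id);

        if (customer.Version != input.Version)
            return false;                   // stale copy: don't overwrite

        customer.Name = input.Name;         // copy only the field this view may change
        customer.Version++;
        context.SaveChanges();              // change tracking updates Name and Version only
        return true;
    }
}
```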

How to update a dbSet if the record has changed in the database?

If I detect that a record in the database has been changed by another process and the record is a member of my current dbSet, how do I update that record to reflect the data of the changed database record? As I understand it Attach will fail if the item is already present.
I use a large subset of the data in-memory to update status on the screen and allow users to change the data. Other processes do the same.
I am using EF7 (Core). I am new to EF, so I may be missing something obvious.
You can use Load() method to rehydrate the affected entity from database:
First detach the stale entity; otherwise the query below resolves to the already-tracked instance and its old values are kept:
DbContext.Entry(changed).State = EntityState.Detached;
DbContext.MyDbSet.Where(e => e.Id == changed.Id).Load();
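Load() then queries the database and starts tracking a fresh instance of the entity with the current values.
Before you can use it, you will need to import the Microsoft.Data.Entity namespace (the name used by the EF7 pre-releases; in released EF Core, Load() is an extension method in Microsoft.EntityFrameworkCore).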
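For released EF Core, the same steps could be wrapped in a small helper. This sketch assumes a hypothetical StatusItem entity with an int Id, stored in the MyDbSet set from the answer; Refresh is a made-up name. After the call the old instance is no longer tracked, so code still holding a reference to it sees the stale values; use the returned instance instead.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class StatusItem
{
    public int Id { get; set; }
    public string Status { get; set; } = "";
}

public class StatusContext : DbContext
{
    public StatusContext(DbContextOptions<StatusContext> options) : base(options) { }
    public DbSet<StatusItem> MyDbSet => Set<StatusItem>();

    // Hypothetical helper: swaps a tracked, possibly stale instance for a fresh one.
    public StatusItem? Refresh(StatusItem changed)
    {
        var id = changed.Id;

        // Without this, the query would resolve to the tracked instance
        // and keep its old values.
        Entry(changed).State = EntityState.Detached;

        // One query: start tracking the current row, if it still exists.
        MyDbSet.Where(e => e.Id == id).Load();

        // Local only looks at tracked entities; no extra query.
        return MyDbSet.Local.FirstOrDefault(e => e.Id == id);
    }
}
```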

Deleting an object with SQL and updating the Context

I am using EF and have a context with which I have deleted a row in a table using a simple SQL call through the ExecuteStoreCommand function on the context (I have to do it this way for other reasons). It works fine, but the context doesn't know what happened.
I get a problem later when using the same context when I'm trying to commit some changes because the commit does not give the expected number of affected rows.
My question is, what is the best way to update the context with the changes (deleted rows) that I've made.
I've already tried getting the deleted object and using it in the context refresh function, but it doesn't really work, probably because it (correctly) gets a null reference when trying to get the deleted object.
_ctx.Refresh(RefreshMode.StoreWins, _ctx.Employees.FirstOrDefault(s => s.EmployeeId == employeeId));
Also, using ObjectStateManager.GetObjectStateEntries doesn't work for me either, because it doesn't know which objects have been deleted.
I do not want to:
Recreate the context.
Refresh more than I need to.
Mess with lazy loading.
I just want to get the context up to date again after deleting.
Try this (you must somehow identify the deleted entity; if you can't, you're stuck and the only solution is a new context):
var employee = ctx.ObjectStateManager.GetObjectStateEntries(~EntityState.Detached)
.Where(e => !e.IsRelationship)
.Select(e => e.Entity)
.OfType<Employee>()
.FirstOrDefault(e => e...);
if (employee != null) ctx.Detach(employee);
By the way, don't use direct SQL modifications for attached entities. It is the worst thing you can do: EF doesn't expect it and is not able to handle it. The best solution in such a case is to recreate the context.
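In EF Core there is no ObjectStateManager or ObjectContext.Detach; the equivalent clean-up goes through the ChangeTracker by setting the entry's state to Detached. A sketch assuming a hypothetical Employee entity keyed by EmployeeId and a made-up ForgetEmployee helper; the raw SQL delete itself (for example through Database.ExecuteSqlRaw on a relational provider) is not shown, and the advice above about preferring a fresh context still applies.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Employee
{
    public int EmployeeId { get; set; }
    public string Name { get; set; } = "";
}

public class HrContext : DbContext
{
    public HrContext(DbContextOptions<HrContext> options) : base(options) { }
    public DbSet<Employee> Employees => Set<Employee>();

    // Hypothetical helper: call after deleting the row outside EF so that
    // SaveChanges no longer expects it to exist. Returns true if an entry was detached.
    public bool ForgetEmployee(int employeeId)
    {
        var entry = ChangeTracker.Entries<Employee>()
            .FirstOrDefault(e => e.Entity.EmployeeId == employeeId);
        if (entry == null) return false;

        entry.State = EntityState.Detached;
        return true;
    }
}
```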

Entity Framework - Why explicitly set entity state to modified?

The official documentation says that to modify an entity, I retrieve a DbEntityEntry object and either work with its property functions or set its state to Modified. It uses the following example:
Department dpt = context.Departments.FirstOrDefault();
DbEntityEntry entry = context.Entry(dpt);
entry.State = EntityState.Modified;
I don't understand the purpose of the 2nd and 3rd statement. If I ask the framework for an entity like the 1st statement does and then modify the POCO as in
dpt.Name = "Blah";
If I then ask EF to SaveChanges(), the entity has a status of MODIFIED (I'm guessing via snapshot tracking, this isn't a proxy) and the changes are persisted without the need to manually set the state. Am I missing something here?
In your scenario you indeed don't have to set the state. It is the purpose of change tracking to detect that you have changed a value on an attached entity and put it into the Modified state. Setting the state manually is important for detached entities (entities loaded without change tracking or created outside of the current context).
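The snapshot change tracking described here can be checked directly. This sketch uses EF Core (where Entry() returns an EntityEntry instead of EF6's DbEntityEntry) and assumes a hypothetical Department entity with Name and Budget properties. In EF Core, Entry() runs change detection for that one entity, so the state reflects the plain property assignment without any manual state change.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Department
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal Budget { get; set; }
}

public class SchoolContext : DbContext
{
    public SchoolContext(DbContextOptions<SchoolContext> options) : base(options) { }
    public DbSet<Department> Departments => Set<Department>();
}

public static class SnapshotDemo
{
    public static (EntityState State, bool NameModified, bool BudgetModified) Edit(SchoolContext context)
    {
        // Tracking query: EF keeps a snapshot of the original values.
        var dpt = context.Departments.OrderBy(d => d.Id).First();

        dpt.Name = "Blah";   // plain POCO edit: no proxies, no manual state change

        // Entry() compares this entity's current values with its snapshot.
        var entry = context.Entry(dpt);
        return (entry.State,
                entry.Property(d => d.Name).IsModified,
                entry.Property(d => d.Budget).IsModified);
    }
}
```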
As said, in a scenario with disconnected entities it can be useful to set an entity's state to Modified. It saves a roundtrip to the database if you just attach the disconnected entity, as opposed to fetching the entity from the database and modifying and saving it.
But there can be very good reasons not to set the state to Modified (and I'm sure Ladislav was aware of this, but still I'd like to point them out here).
All fields in the record will be updated, not only the changes. There are many systems in which updates are audited. Updating all fields will either cause large amounts of clutter or require the auditing mechanism to filter out false changes.
Optimistic concurrency. Since all fields are updated, this may cause more conflicts than necessary. If two users update the same records concurrently but not the same fields, there need not be a conflict. But if they always update all fields, the last user will always try to write stale data. This will at best cause an optimistic concurrency exception or in the worst case data loss.
Useless updates. The entity is marked as modified, no matter what. Unchanged entities will also fire an update. This may easily occur if edit windows can be opened to see details and closed by OK.
So it's a fine balance. Reduce roundtrips or reduce redundancy.
Anyway, an alternative to setting the state to Modified is (using DbContext API):
void UpdateDepartment(Department department)
{
    var dpt = context.Departments.Find(department.Id);
    context.Entry(dpt).CurrentValues.SetValues(department);
    context.SaveChanges();
}
CurrentValues.SetValues marks individual properties as Modified.
Or attach a disconnected entity and mark individual properties as Modified manually:
context.Entry(dpt).State = System.Data.Entity.EntityState.Unchanged;
context.Entry(dpt).Property(d => d.Name).IsModified = true;
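An EF Core version of this last option, for an update that arrives disconnected (for example from a form post), without loading the row first. The Department entity and the RenameWithoutLoading helper are hypothetical. Only Name is flagged as modified, so the generated UPDATE leaves Budget alone even though the stub carries its default value; on a relational provider, SaveChanges throws DbUpdateConcurrencyException if no row with that key exists, because the UPDATE affects zero rows.

```csharp
using Microsoft.EntityFrameworkCore;

public class Department
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal Budget { get; set; }
}

public class SchoolContext : DbContext
{
    public SchoolContext(DbContextOptions<SchoolContext> options) : base(options) { }
    public DbSet<Department> Departments => Set<Department>();
}

public static class DepartmentUpdates
{
    // Hypothetical helper: saves only the Name of a department, with no prior SELECT.
    public static void RenameWithoutLoading(SchoolContext context, int id, string newName)
    {
        var stub = new Department { Id = id, Name = newName };

        context.Attach(stub);   // key is set, so the stub is tracked as Unchanged
        context.Entry(stub).Property(d => d.Name).IsModified = true;

        // Roughly: UPDATE Departments SET Name = @p0 WHERE Id = @p1
        context.SaveChanges();
    }
}
```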