I'm new to Entity Framework and I'm seeing different approaches to updating.
public void Update(Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);
    modelInDb.Name = "New Name";
    context.SaveChanges();
}
public void Update(Model model)
{
    context.Models.Attach(model);
    model.Name = "New Name";
    context.SaveChanges();
}
Why should I use Attach over Single? Could you explain the difference?
Passing entities between client and server should be considered an anti-pattern, because it can make your system vulnerable to man-in-the-browser and similar attacks.
Your 2 examples don't really illustrate much, because you are setting the updated value solely inside your method rather than based on input from the view. A more common example for an update would be:
public void Update(Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);
    modelInDb.Name = model.Name;
    context.SaveChanges();
}
and
public void Update(Model model)
{
    context.Models.Attach(model);
    context.Entry(model).State = EntityState.Modified;
    context.SaveChanges();
}
In your example, if your method sets the modifications then the UPDATE SQL statement should be OK, modifying just the customer's Name. However, if you attach the model and set its state to Modified in order to save the new model's fields to the DB, it will update all columns.
Of these two examples, the first is better than the second for a number of reasons. The first example is loading the data from the context and copying across only the data you expect to be able to change from the view. The second is taking the model from the view as-is, attaching it to the context, and will overwrite any existing fields. Attackers can discover this and use the behaviour to alter data your view did not allow to change. A customer Order for instance might contain a lot of data about an order including relationships for products, discounts, etc. A user may not see any of these details in their view, but by passing an Entity graph, all of it is visible in the web request data. Not only is this going to be sending far more information to the client than the client needs (slower) but it can be altered in debug tools and the like prior to reaching your service as well. Attaching and updating the returned entity exposes your system to tampering.
Additionally you risk overwriting stale data in your objects. With option 1 you are loading the "right now" copy of the entity. A simple check to a Row Version Number or Last Modified Date between your passed in data and the current DB copy can signal whether that row had changed since the copy passed to the client a while ago. With the 2nd method, you can inadvertently erase modifications to data without a trace.
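To make that stale-data check concrete, here is a minimal sketch. It assumes a hypothetical RowVersion byte[] column on the entity acting as a concurrency token; the Models/Model names follow the examples above:

```csharp
public void Update(Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);

    // Compare the version the client was originally given with the current DB copy.
    if (!modelInDb.RowVersion.SequenceEqual(model.RowVersion))
        throw new DbUpdateConcurrencyException(
            "This record was modified by someone else since it was loaded.");

    modelInDb.Name = model.Name;
    context.SaveChanges();
}
```

If RowVersion is instead mapped with the [Timestamp] attribute, EF performs this check for you inside SaveChanges and throws DbUpdateConcurrencyException automatically.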
The better approach is to pass ViewModels to and from your view. By using Select or AutoMapper to fill a view model, you avoid exposing any more about your domain than the client needs to see. You also only accept back the data needed for the operation the client can perform. This reduces the payload size and reduces the vulnerability to tampering. I've seen an alarming number of examples, even from Microsoft, passing entities around between client and server. It looks practical since the objects are already there, but this is wasteful for resources/performance, troublesome for dealing with cyclic references and serialization, and prone to data tampering and stale-data overwrites.
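As a small sketch of that view-model approach (CustomerViewModel and its properties are illustrative names, not taken from the examples above):

```csharp
public class CustomerViewModel
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
}

public IList<CustomerViewModel> GetCustomers()
{
    // Project directly in the query: only these two columns are selected,
    // and no entity ever leaves the data layer.
    return context.Customers
        .Select(c => new CustomerViewModel
        {
            CustomerId = c.CustomerId,
            Name = c.Name
        })
        .ToList();
}
```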
Related
In the following, the author advises against partially initializing domain entities.
As we stated earlier, each customer must have no more than 5 contacts. By not returning the contacts along with the customers themselves, we leave a hole in our domain model which allows us to add a 6th contact and thus break this invariant.
Because of that, the practice of partial initialization should be avoided. If your repository returns a list of domain entities (or just a single domain entity), make sure the entities are fully initialized meaning that all their properties are filled out.
https://enterprisecraftsmanship.com/posts/partially-initialized-entities-anti-pattern/
So, should we have to load the whole object graph? A customer with all contacts and all related things or entity framework lazy loading would help?
It probably has less to do with the object graph and more to do with the invariants involved.
As someone posted in the comments of that post, a performance issue may very well arise when there are thousands of permitted contacts. An example of something to this effect may be that a Customer may only have, say, 5 active Order instances. Should all Order instances linked to the customer be loaded? Most certainly not. In fact, an Order is another aggregate, and an instance of one aggregate should not be contained in another aggregate. You could use a value object containing the id of the other aggregate, but for a great many of these the same performance issue may manifest itself.
An alternative may be to simply keep a ContactCount or, in my example, an ActiveOrderCount which is kept consistent. If the actual relationships are to be stored/removed then these may be attached to the relevant aggregate when adding/removing in order to persist the change but that is a transient representation.
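A sketch of what keeping such a count consistent inside the aggregate might look like (the names and the limit of 5 follow the example; this is not the author's code):

```csharp
public class Customer
{
    private const int MaxActiveOrders = 5;

    public int ActiveOrderCount { get; private set; }

    // The invariant is enforced here, without loading any Order aggregates.
    public void RegisterNewOrder()
    {
        if (ActiveOrderCount >= MaxActiveOrders)
            throw new InvalidOperationException(
                "A customer may have no more than 5 active orders.");
        ActiveOrderCount++;
    }

    public void CompleteOrder()
    {
        if (ActiveOrderCount > 0)
            ActiveOrderCount--;
    }
}
```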
So, should we have to load the whole object graph? A customer with all contacts and all related things or entity framework lazy loading would help?
The answer is, actually, a resounding "yes". However, your object model should not be deep. You should make every attempt to create small aggregates. I try to model my aggregates with a single root entity containing value objects. The entire aggregate is loaded. Lazy loading is probably an indication that you are querying your domain, which is something that I suggest one not do. Rather, create a simple query mechanism that uses some read model to return the relevant data for your front-end.
The anti-pattern of partially loaded entities has to do with both graphs (children and relatives) as well as the data within an entity. The reason it is an anti-pattern is that any code written to accept an entity should be given a complete and valid entity.
This is not to say that you must always load a complete entity; it is that if you ever return an entity, it should be a complete, or completable, entity (proxies associated with a live DbContext).
An example of a partially loaded entity, and why it goes bad:
Someone goes to write the following method that an MVC controller will call to get a customer and return it to a view...
public IEnumerable<Customer> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers
            .Where(x => x.IsActive && x.CustomerName.StartsWith(criteria))
            .ToList();
    }
}
Code like this may have worked earlier with simpler entities, but Customer had related data like Orders and when MVC went to serialize it, they got an error because the Orders proxies could not lazy load due to the DbContext being disposed. The options were to somehow eager-load all related details with this call to return the complete customer, completely disable lazy loading proxies, or return an incomplete customer. Since this method would be used to display a summary list of just customer details, the author could choose to do something like:
public IEnumerable<Customer> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers
            .Where(x => x.IsActive && x.CustomerName.StartsWith(criteria))
            .Select(x => new Customer
            {
                CustomerId = x.CustomerId,
                CustomerName = x.CustomerName,
                // ... Any other fields that we want to display...
            }).ToList();
    }
}
The problem seems solved. The trouble with this approach, or turning off lazy load proxies, is that you are returning a class that implies "I am a Customer Entity". That object may be serialized to a view, and de-serialized back from a view and passed to another method that is expecting a Customer Entity. Modifications to your code down the road will need to somehow determine which "Customer" objects are actually associated with a DbContext (or a complete, disconnected entity) vs. one of these partial, and incomplete Customer objects.
Eager-loading all of the related data would avoid the issue of the partial entity; however, it is both wasteful in terms of performance and memory usage and prone to bugs as entities evolve: when new relationships are added they need to be eager-fetched in the repository as well, or they can result in lazy-load hits, errors, or incomplete entity views getting introduced down the road.
Now in the early days of EF & NHibernate you would be advised to always return complete entities, or write your repositories to never return entities, instead, return DTOs. For example:
public IEnumerable<CustomerDTO> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers
            .Where(x => x.IsActive && x.CustomerName.StartsWith(criteria))
            .Select(x => new CustomerDTO
            {
                CustomerId = x.CustomerId,
                CustomerName = x.CustomerName,
                // ... Any other fields that we want to display...
            }).ToList();
    }
}
This is a better approach than the above one because by returning and using the CustomerDTO, there is absolutely no confusion between this partial object and a Customer entity. However, this solution has its drawbacks. One is that you may have several similar, but different views that need customer data, and some may need a bit extra or some of the related data. Other methods will have different search requirements. Some will want pagination or sorting. Using this approach will be similar to the article's example where you end up with a repository returning several similar, but different DTOs with a large number of variant methods for different criteria, inclusions, etc. (CustomerDTO, CustomerWithAddressDTO, etc. etc.)
With modern EF there is a better solution available for repositories, and that is to return IQueryable<TEntity> rather than IEnumerable<TEntity> or even TEntity. For example, to search for customers leveraging IQueryable:
public IQueryable<Customer> GetCustomers()
{
    return Context.Customers.Where(x => x.IsActive);
}
Then, when your MVC controller goes to get a list of customers with its criteria:
using (var contextScope = ContextScopeFactory.Create())
{
    return CustomerRepository.GetCustomers()
        .Where(x => x.CustomerName.Contains(criteria))
        .Select(x => new CustomerViewModel
        {
            CustomerId = x.CustomerId,
            CustomerName = x.CustomerName,
            // ... Details from customer and related entities as needed.
        }).ToList();
}
By returning IQueryable the repository does not need to worry about complete vs. incomplete representations of entities. It can enforce core rules such as active state checking, but leave it up to the consumers to filter, sort, paginate, or otherwise consume the data as they see fit. This keeps the repositories very lightweight and simple to work with while allowing controllers and services that consume them to be unit tested with mocks in place of the repositories. The controllers should consume the entities returned by the repository, but take care not to return these entities themselves. Instead they can populate view models (or DTOs) to hand over to the web client or API consumer to avoid partial entities being passed around and confused for real entities.
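For instance, a consumer of the repository above could compose paging and sorting onto the same query (the pageIndex/pageSize variables here are illustrative):

```csharp
var page = CustomerRepository.GetCustomers()
    .Where(x => x.CustomerName.StartsWith(criteria))
    .OrderBy(x => x.CustomerName)   // a deterministic order is required before paging
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .Select(x => new CustomerViewModel
    {
        CustomerId = x.CustomerId,
        CustomerName = x.CustomerName
    })
    .ToList();
```

Nothing executes until ToList(), so the filter, sort, paging, and projection are all composed into a single SQL statement.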
This applies even to cases where a repository is expected to return just one entity; returning IQueryable has its advantages.
For instance, comparing:
public Customer GetCustomerById(int customerId)
{
    return Context.Customers.SingleOrDefault(x => x.CustomerId == customerId);
}
vs.
public IQueryable<Customer> QGetCustomerById(int customerId)
{
    return Context.Customers.Where(x => x.CustomerId == customerId);
}
These look very similar, but to the consumer (controller/service) it would be a bit different.
var customer = CustomerRepository.GetCustomerById(customerId);
vs.
var customer = CustomerRepository.QGetCustomerById(customerId).Single();
Slightly different, but the 2nd is far more flexible. What if we just wanted to check whether a customer exists?
var customerExists = CustomerRepository.GetCustomerById(customerId) != null;
vs.
var customerExists = CustomerRepository.QGetCustomerById(customerId).Any();
The first would execute a query that loads the entire customer entity. The second merely executes an exists-check query. When it comes to loading related data, the first method would need to rely on lazy loading or simply not have related details available, whereas the IQueryable method could:
var customer = CustomerRepository.QGetCustomerById(customerId).Include(x => x.Related).Single();
or better, if loading a view model with or without related data:
var customerViewModel = CustomerRepository.QGetCustomerById(customerId)
    .Select(x => new CustomerViewModel
    {
        CustomerId = x.CustomerId,
        CustomerName = x.CustomerName,
        RelatedName = x.Related.Name,
        // ... etc.
    }).Single();
Disclaimer: Actual mileage may vary depending on your EF version. EF Core has had a number of changes compared to EF6 around lazy loading and query building.
A requirement for this pattern is that the DbContext either has to be injected (DI) or provided via a unit of work pattern as the consumer of the repository will need to interact with the entities and their DbContext when materializing the query created by the repository.
A case where using a partially initialized entity is perfectly valid would be when performing a Delete without pre-fetching the entity. For instance in cases where you're certain a particular ID or range of IDs needs to be deleted, rather than loading those entities to delete you can instantiate a new class with just that entity's PK populated and tell the DbContext to delete it. The key point when considering the use of incomplete entities would be that it is only cases where the entity only lives within the scope of the operation and is not returned to callers.
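A sketch of that delete-by-stub technique, assuming a Customers DbSet with an int CustomerId key:

```csharp
public void DeleteCustomer(int customerId)
{
    using (var context = new MyDbContext())
    {
        // Stub entity: only the PK is populated, and it never leaves this scope.
        var stub = new Customer { CustomerId = customerId };
        context.Customers.Attach(stub);
        context.Customers.Remove(stub);
        context.SaveChanges(); // issues DELETE ... WHERE CustomerId = @id
    }
}
```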
If there are two Customer domain models representing different bounded contexts (with different/overlapping fields), how are you supposed to Update this certain X bounded context Customer in the database that might be holding both those Customer domains in one POCO (or maybe Y bounded context Customer additionally uses a list of Orders of this same context)?
Also I could put it this way. How do you solve cases when domain models maps many to one with the database POCO?
Does it mean that repository would have to query db once more, but this time whole POCO object from DB, update its values accordingly and finally make the update?
It would help if you provided the 2 contexts and the overlapping attributes of Customer. For the purpose of this answer I'll use the contexts 'Sales' and 'Marketing', and the shared attribute 'Preferred Name'.
My initial thought based on the phrase 'overlapping fields' is that you need to revisit your model as you should not have 2 models responsible for a specific value otherwise you have concurrency/race conditions.
Try and think how your clients would resolve the situation in the old days of pen & paper. Who would own the 'customer' file? Would sales and marketing each have their own version, or would marketing rely on sales copy (or visa versa)?
Also, one of the most powerful aspects of DDD is that it forces your persistence concerns way out into your infrastructure layers where they belong. You do not have to use EF for all your repository calls; if it is easier to hand-craft some SQL for a specific persistence call, then do it.
--Scenario 1: Overlapping field is not overlapping--
In this case, the domain experts came to realise that Sales.Customer.PreferredName and Marketing.Customer.PreferredName are independent attributes and can differ between contexts. Marketing often used the field for their cute "we are your best pals" campaign correspondence, whilst Sales preferred to keep the most unambiguous form of the name on file.
The CUSTOMER database table has 2 fields: PreferredNameSales and PreferredNameMarketing.
The 2 Concrete Repositories will end up looking something like:
class Sales.Repositories.ClientRepository : Domain.Sales.IClientRepository {
    public void Update(Domain.Sales.Client salesClient) {
        using (var db = new MyEfContext()) {
            var dbClient = db.Clients.Find(salesClient.Id);
            dbClient.PreferredNameSales = salesClient.PreferredName;
            db.SaveChanges();
        }
    }
}
class Marketing.Repositories.ClientRepository : Domain.Marketing.IClientRepository {
    public void Update(Domain.Marketing.Client marketingClient) {
        using (var db = new MyEfContext()) {
            var dbClient = db.Clients.Find(marketingClient.Id);
            dbClient.PreferredNameMarketing = marketingClient.PreferredName;
            db.SaveChanges();
        }
    }
}
Entity Framework should notice that only one field was changed and send the appropriate UPDATE Client SET PreferredNameSales = @newValue WHERE Id = @id to the database.
There should be no concurrency issues when Sales and Marketing update their versions of a single client's preferred name at the same time.
Also note that EF is providing a lot of overhead and very little value here. The same work could be completed with a simple parameterised SqlCommand and ExecuteNonQuery().
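As a sketch, that hand-crafted alternative might look like this (connection handling simplified; the Client table and column names are taken from the scenario above):

```csharp
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "UPDATE Client SET PreferredNameSales = @name WHERE Id = @id", conn))
{
    // Parameterised to avoid SQL injection and plan-cache pollution.
    cmd.Parameters.AddWithValue("@name", salesClient.PreferredName);
    cmd.Parameters.AddWithValue("@id", salesClient.Id);
    conn.Open();
    cmd.ExecuteNonQuery();
}
```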
--Scenario 2: Overlapping field is overlapping--
Your model is broken but it is too late to fix it properly. You tell yourself that the chances of Sales and Marketing trying to change the preferred name at the same time are tiny, and even if it happens it should be rare enough that hopefully the user will blame themselves for not using the system correctly.
In this case, there is only one database field: client.PreferredName and as with scenario 1, the functions work on the same table/field:
class Sales.Repositories.ClientRepository : Domain.Sales.IClientRepository {
    public void Update(Domain.Sales.Client salesClient) {
        using (var db = new MyEfContext()) {
            var dbClient = db.Clients.Find(salesClient.Id);
            dbClient.PreferredName = salesClient.PreferredName;
            db.SaveChanges();
        }
    }
}
class Marketing.Repositories.ClientRepository : Domain.Marketing.IClientRepository {
    public void Update(Domain.Marketing.Client marketingClient) {
        using (var db = new MyEfContext()) {
            var dbClient = db.Clients.Find(marketingClient.Id);
            dbClient.PreferredName = marketingClient.PreferredName;
            db.SaveChanges();
        }
    }
}
The obvious problem is that a save at the same time by both Sales and Marketing will end up as last-one-wins in terms of persisted data. You can try to mitigate this with LastUpdated timestamps and so on, but it will just get more messy and broken. Review your model and remember: DB MODEL != DOMAIN MODEL != UI VIEW MODEL.
Each bounded context is required to have its own database. That is it, there should be no discussion here. Violating this rule leads to severe consequences, which have been discussed many times.
Overlapping fields are a smell: different bounded contexts have different concerns and therefore do not need to share much data. The best case is when the only thing you share is the aggregate identity. If in your world you have one Customer whose different concerns are handled by two different bounded contexts, you can use one CustomerId value for both bounded contexts.
If you really need to sync some data, you need to have it in both models, therefore in both persistent stores (I intentionally avoid the word database here) and you can sync the data using domain events. This is very common.
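A minimal sketch of that domain-event sync (the event and handler types here are entirely illustrative; a real system would usually dispatch through a message bus):

```csharp
// Published by the Sales context when its copy of the name changes.
public class CustomerPreferredNameChanged
{
    public Guid CustomerId { get; set; }
    public string PreferredName { get; set; }
}

// Handled by the Marketing context, which updates its own persistent store.
public class CustomerPreferredNameChangedHandler
{
    private readonly MarketingDbContext _db;

    public CustomerPreferredNameChangedHandler(MarketingDbContext db)
    {
        _db = db;
    }

    public void Handle(CustomerPreferredNameChanged e)
    {
        var customer = _db.Customers.Find(e.CustomerId);
        if (customer == null) return; // this context may not track every customer
        customer.PreferredName = e.PreferredName;
        _db.SaveChanges();
    }
}
```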
I'm using Entity Framework 4.1. I've implemented a base repository using lots of the examples online. My repository get methods take a bool parameter to decide whether to track the entities. Sometimes, I want to load an entity and track it, other times, for some entities, I simply want to read them and display them (i.e. in a graph). In this situation there is never a need to edit, so I don't want the overhead of tracking them. Also, graph entities are sent to a silverlight client, so the entities are disconnected from the context. Hence my Get methods can return a list of entities that are either tracked or not. This is achieved dynamically creating the query as follows:
DbQuery<E> query = Context.Set<E>();
// Track the entities in the context?
if (!trackEntities)
{
    query = query.AsNoTracking();
}
However, I now want to enable the user to interact with the graph and edit it. This will not happen very often, so I still want to get some entities without tracking them but to have the ability to save them. To do this I simply attach them to the context and set the state as modified. Everything is working so far.
I am auditing any changes by overriding the SaveChanges method. As explained above I may, in some low cases, need to save modified entities that were disconnected. So to audit, I have to retrieve the current values from the database and then compare to work out what was changed while disconnected. If the entity has been tracked, there is no need to get the old values, as I've got access to them via the state manager. I'm not using self tracking entities, as this is overkill for my requirements.
QUESTION: In my auditing method I simply want to know if the modified entity is tracked or not, i.e. do I need to go to the db and get the original values?
Cheers
DbContext.ChangeTracker.Entries (http://msdn.microsoft.com/en-us/library/gg679172(v=vs.103).aspx) returns DbEntityEntry objects for all tracked entities. DbEntityEntry has Entity property that you could use to find out whether the entity is tracked. Something like
var isTracked = ctx.ChangeTracker.Entries().Any(e => Object.ReferenceEquals(e.Entity, myEntity));
The official documentation says to modify an entity I retrieve a DbEntityEntry object and either work with the property functions or I set its state to modified. It uses the following example
Department dpt = context.Departments.FirstOrDefault();
DbEntityEntry entry = context.Entry(dpt);
entry.State = EntityState.Modified;
I don't understand the purpose of the 2nd and 3rd statements. If I ask the framework for an entity as the 1st statement does, and then modify the POCO as in
dpt.Name = "Blah"
If I then ask EF to SaveChanges(), the entity has a status of MODIFIED (I'm guessing via snapshot tracking, this isn't a proxy) and the changes are persisted without the need to manually set the state. Am I missing something here?
In your scenario you indeed don't have to set the state. It is the purpose of change tracking to detect that you have changed a value on an attached entity and to put it into the Modified state. Setting the state manually is important in the case of detached entities (entities loaded without change tracking or created outside of the current context).
As said, in a scenario with disconnected entities it can be useful to set an entity's state to Modified. It saves a roundtrip to the database if you just attach the disconnected entity, as opposed to fetching the entity from the database and modifying and saving it.
But there can be very good reasons not to set the state to Modified (and I'm sure Ladislav was aware of this, but still I'd like to point them out here).
All fields in the record will be updated, not only the changed ones. Many systems audit updates; updating all fields will either cause large amounts of clutter or require the auditing mechanism to filter out false changes.
Optimistic concurrency. Since all fields are updated, this may cause more conflicts than necessary. If two users update the same records concurrently but not the same fields, there need not be a conflict. But if they always update all fields, the last user will always try to write stale data. This will at best cause an optimistic concurrency exception or in the worst case data loss.
Useless updates. The entity is marked as modified, no matter what. Unchanged entities will also fire an update. This may easily occur if edit windows can be opened to see details and closed by OK.
So it's a fine balance. Reduce roundtrips or reduce redundancy.
Anyway, an alternative to setting the state to Modified is (using DbContext API):
void UpdateDepartment(Department department)
{
    var dpt = context.Departments.Find(department.Id);
    context.Entry(dpt).CurrentValues.SetValues(department);
    context.SaveChanges();
}
CurrentValues.SetValues marks individual properties as Modified.
Or attach a disconnected entity and mark individual properties as Modified manually:
context.Entry(dpt).State = System.Data.Entity.EntityState.Unchanged;
context.Entry(dpt).Property(d => d.Name).IsModified = true;
Suppose I have a couple of tables:
person(personid, name, email)
employee(personid, cardno, departmentID) // personid and departmentID are foreign keys
department(departmentID, departmentName)
employeePhone(personID, phoneID) // this is a relationship table
phone(phoneID, phonenumber)
When Entity Framework generates the entity class for employee, the class has members like:
public partial class employee
{
    int _personid;
    string _cardno;
    string _departmentName;
    person _person;
    department _department;
    // ......
}
By default, when this class is loaded, only the data for the employee table's columns is available; the data for the associated entities is not loaded.
If I use LINQ to get the data at the client side, Include should be used in the LINQ query.
My question is: I want the associated entities' data to be loaded at the server side when the employee is instantiated there, so that when I get the entity at the client side all the data is already available and I can easily bind it to the UI.
How can I implement this?
Don't bind entity types to your UI. This couples the UI to the entity layer. Loading will be the least of your problems. With a coupled UI, you violate the single responsibility principle, require blacklists/whitelists to be maintained to have any form of security, break types which can't deal with circular references, you have poor performance because you load all fields from all related types, etc., etc., etc.
Instead, create a dedicated view model and project onto it:
var pm = (from e in Context.Employees
          where e.Id == id
          select new EmployeePresentation
          {
              EmployeeNumber = e.Number,
              Name = e.Person.Name,
              // etc.
          }).First();
Because this is LINQ to Entities, the fields you reference in Person, etc., are automatically loaded, without requiring eager loading, lazy loading, or explicit Load(). But only those fields, not the entirety of Person, as with every other method.
Update, per comments
Using presentation models is also important for updates. It is not the case that I want a user to be able to update every field that they can see. Different presentation models for the same entity might have different validation/scaffolding rules since they're used at different points in the data flow within the app. Also, the update may implicitly touch fields the user cannot see (e.g., a timestamp).
Generically, my updates look like this (ASP.MVC Web app):
public ActionResult Update(EmployeePresentation model)
{
    if (!ModelState.IsValid)
    {
        // User violated a validation rule on the presentation model.
        return View(model);
    }
    Repository.Update(model.Id, delegate(Employee employee)
    {
        model.UpdateEmployee(employee);
    });
    return RedirectToAction("Index");
}
Note that there is no possibility of the user ever updating something they're not allowed to, in a typechecked, type-safe way, and that the model binding, the presentation model and the repository can all be extended for custom behavior.
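The shape of that Repository.Update is inferred from the call above (the author's actual implementation is not shown); one plausible sketch:

```csharp
public void Update(int id, Action<Employee> applyChanges)
{
    var employee = context.Employees.Find(id);
    if (employee == null)
        throw new InvalidOperationException("Employee not found.");

    applyChanges(employee); // e.g. model.UpdateEmployee copies only permitted fields
    context.SaveChanges();  // change tracking updates only the columns that changed
}
```

The delegate keeps the repository generic while the presentation model decides exactly which fields may change.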
What you are hoping for is lazy loading of the dependent entities. There is an article at the link below that talks about this, and at the bottom of the article it also gives an explanation of how to perform a lazy load should you still feel the need.
See "Configuring Lazy Loading in Entity Framework" in this article:
Entity Framework and Lazy Loading
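For reference, the eager-loading alternative mentioned in the question might be sketched like this (the DbSet and navigation property names are assumed from the tables above, not generated code):

```csharp
// Related rows are joined in and materialized before the context is disposed,
// so the graph can be serialized to the client without lazy-load errors.
var employees = context.Employees
    .Include("person")
    .Include("person.employeePhones.phone")
    .Include("department")
    .ToList();
```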