Mapping DDD domain models to EF POCO - entity-framework

If there are two Customer domain models representing different bounded contexts (with different/overlapping fields), how are you supposed to update the Customer of one particular bounded context X in a database that might be holding both of those Customer domain models in one POCO (where perhaps the bounded context Y Customer additionally holds a list of Orders from that same context)?
To put it another way: how do you solve cases where domain models map many-to-one onto a database POCO?
Does it mean that the repository would have to query the database once more, this time for the whole POCO object, update its values accordingly and finally save the update?

It would help if you provided the 2 contexts and the overlapping attributes of Customer. For the purpose of this answer I'll use the contexts 'Sales' and 'Marketing', and the shared attribute 'Preferred Name'.
My initial thought, based on the phrase 'overlapping fields', is that you need to revisit your model: you should not have 2 models responsible for a specific value, otherwise you have concurrency/race conditions.
Try to think how your clients would have resolved the situation in the old days of pen and paper. Who would own the 'customer' file? Would sales and marketing each have their own version, or would marketing rely on the sales copy (or vice versa)?
Also, one of the most powerful aspects of DDD is that it forces your persistence concerns way out into your infrastructure layers where they belong. You do not have to use EF for all your repository calls; if it is easier to hand-craft some SQL for a specific persistence call then do it.
--Scenario 1: Overlapping field is not overlapping--
In this case, the domain experts came to realise that Sales.Customer.PreferredName and Marketing.Customer.PreferredName are independent attributes and can differ between contexts. Marketing often used the field for their cute 'we are your best pals' campaign correspondence, whilst Sales preferred to keep the most unambiguous name on file.
The CUSTOMER database table has 2 fields: PreferredNameSales and PreferredNameMarketing.
The 2 Concrete Repositories will end up looking something like:
namespace Sales.Repositories {
    class ClientRepository : Domain.Sales.IClientRepository {
        public void Update(Domain.Sales.Client salesClient) {
            using (var db = new MyEfContext()) {
                var dbClient = db.Clients.Find(salesClient.Id);
                dbClient.PreferredNameSales = salesClient.PreferredName;
                db.SaveChanges();
            }
        }
    }
}
namespace Marketing.Repositories {
    class ClientRepository : Domain.Marketing.IClientRepository {
        public void Update(Domain.Marketing.Client marketingClient) {
            using (var db = new MyEfContext()) {
                var dbClient = db.Clients.Find(marketingClient.Id);
                dbClient.PreferredNameMarketing = marketingClient.PreferredName;
                db.SaveChanges();
            }
        }
    }
}
Entity Framework should notice that only 1 field was changed and send the appropriate UPDATE Client SET PreferredNameSales = @newValue WHERE Id = @id to the database.
There should be no concurrency issues when sales and marketing update their version of a single client's preferred name at the same time.
Also note that EF is providing a lot of overhead and very little value here. The same work could be completed with a simple parameterised SqlCommand.ExecuteNonQuery(), as sketched below.
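To illustrate, a hand-rolled version of the Sales update might look something like this sketch (the connectionString field and the Client table/column names are assumptions made to match the scenario above):
// Sketch only: assumes a connectionString field and requires System.Data.SqlClient.
public void Update(Domain.Sales.Client salesClient) {
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "UPDATE Client SET PreferredNameSales = @preferredName WHERE Id = @id", connection)) {
        command.Parameters.AddWithValue("@preferredName", salesClient.PreferredName);
        command.Parameters.AddWithValue("@id", salesClient.Id);
        connection.Open();
        command.ExecuteNonQuery();
    }
}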
--Scenario 2: Overlapping field is overlapping--
Your model is broken but it is too late to fix it properly. You tell yourself that the chances of sales and marketing trying to change the preferred name at the same time are tiny, and that even if it happens it will be rare enough that hopefully the user will blame themselves for not using the system correctly.
In this case there is only one database field, Client.PreferredName, and as with scenario 1 the functions work on the same table/field:
namespace Sales.Repositories {
    class ClientRepository : Domain.Sales.IClientRepository {
        public void Update(Domain.Sales.Client salesClient) {
            using (var db = new MyEfContext()) {
                var dbClient = db.Clients.Find(salesClient.Id);
                dbClient.PreferredName = salesClient.PreferredName;
                db.SaveChanges();
            }
        }
    }
}
namespace Marketing.Repositories {
    class ClientRepository : Domain.Marketing.IClientRepository {
        public void Update(Domain.Marketing.Client marketingClient) {
            using (var db = new MyEfContext()) {
                var dbClient = db.Clients.Find(marketingClient.Id);
                dbClient.PreferredName = marketingClient.PreferredName;
                db.SaveChanges();
            }
        }
    }
}
The obvious problem is that a save at the same time by both sales and marketing will end up as 'last one wins' in terms of persisted data. You can try to mitigate this with last-updated timestamps and so on, but it will just get more messy and broken. Review your model and remember: DB MODEL != DOMAIN MODEL != UI View Model

Each bounded context is required to have its own database. That is it; there should be no discussion here. Violating this rule leads to severe consequences, which have been discussed many times.
Overlapping fields are a smell; different bounded contexts have different concerns and therefore do not need to share much data. The best case is when the only thing you share is the aggregate identity. If in your world you have one Customer whose different concerns are handled by two different bounded contexts, you can use one CustomerId value for both bounded contexts.
If you really need to sync some data, you need to have it in both models, and therefore in both persistent stores (I intentionally avoid the word database here), and you can sync the data using domain events. This is very common.
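A rough sketch of that synchronisation (the event, handler and method names here are illustrative, not from any particular framework): when Sales changes the preferred name it raises a domain event, and a handler in the Marketing context updates Marketing's own store using the shared CustomerId.
// Illustrative sketch only; not tied to a specific messaging library.
public class CustomerPreferredNameChanged {
    public Guid CustomerId { get; private set; }      // identity shared by both contexts
    public string PreferredName { get; private set; }

    public CustomerPreferredNameChanged(Guid customerId, string preferredName) {
        CustomerId = customerId;
        PreferredName = preferredName;
    }
}

// Lives in the Marketing bounded context and keeps its own store in sync.
public class CustomerPreferredNameChangedHandler {
    private readonly Domain.Marketing.IClientRepository _clients;

    public CustomerPreferredNameChangedHandler(Domain.Marketing.IClientRepository clients) {
        _clients = clients;
    }

    public void Handle(CustomerPreferredNameChanged e) {
        var client = _clients.GetById(e.CustomerId);     // GetById is assumed to exist on the repository
        client.ChangePreferredName(e.PreferredName);     // ChangePreferredName is an assumed domain method
        _clients.Update(client);
    }
}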

Related

Entity Framework update with attach or single

I'm new to entity framework and seeing different approaches for updating.
public void Update (Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);
    modelInDb.Name = "New Name";
    context.SaveChanges();
}
public void Update (Model model)
{
    context.Models.Attach(model);
    model.Name = "New Name";
    context.SaveChanges();
}
Why should I use Attach over Single? Could you explain the difference?
Passing entities between client and server should be considered an anti-pattern because it can make your system vulnerable to man-in-the-browser and similar attacks.
Your 2 examples don't really outline much because you are setting the updated value solely in your method, rather than based on input from the view. A more common example for an update would be:
public void Update (Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);
    modelInDb.Name = model.Name;
    context.SaveChanges();
}
and
public void Update (Model model)
{
    context.Models.Attach(model);
    context.Entry(model).State = EntityState.Modified;
    context.SaveChanges();
}
In your example, if your method sets the modifications then the UPDATE SQL statement should be OK, modifying just the customer Name. However, if you attach the model and set its state to Modified in order to save the new model fields to the DB, it will update all columns.
Of these two examples, the first is better than the second for a number of reasons. The first example loads the data from the context and copies across only the data you expect to be able to change from the view. The second takes the model from the view as-is, attaches it to the context, and will overwrite any existing fields. Attackers can discover this and use the behaviour to alter data your view did not allow them to change. A customer Order, for instance, might contain a lot of data about an order, including relationships for products, discounts, etc. A user may not see any of these details in their view, but by passing an entity graph, all of it is visible in the web request data. Not only does this send far more information to the client than the client needs (slower), but it can be altered in debug tools and the like before it reaches your service. Attaching and updating the returned entity exposes your system to tampering.
Additionally, you risk overwriting stale data in your objects. With option 1 you are loading the "right now" copy of the entity. A simple check of a row version number or last-modified date between your passed-in data and the current DB copy can signal whether that row has changed since the copy was passed to the client a while ago. With the second method, you can inadvertently erase modifications to data without a trace.
The better approach is to pass view models to and from your view. By using Select or AutoMapper to fill a view model, you avoid exposing any more about your domain than the client needs to see. You also only accept back the data needed for the operation the client can perform. This reduces the payload size and the vulnerability to tampering. I've seen an alarming number of examples, even from Microsoft, passing entities around between client and server. It looks practical since the objects are already there, but it is wasteful for resources/performance, troublesome for dealing with cyclic references and serialization, and prone to data tampering and stale-data overwrites.
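As a minimal sketch of that approach for the update in question (the view model shape and the RowVersion concurrency token are assumptions for illustration):
public class CustomerNameViewModel
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public byte[] RowVersion { get; set; }   // assumed rowversion/timestamp column used as a concurrency token
}

public void Update(CustomerNameViewModel viewModel)
{
    var customer = context.Customers.Single(c => c.CustomerId == viewModel.CustomerId);

    // Stale-data check: reject the update if the row changed since the view model was loaded.
    if (!customer.RowVersion.SequenceEqual(viewModel.RowVersion))
        throw new InvalidOperationException("The customer was modified by someone else. Reload and try again.");

    // Copy across only the fields this view is allowed to change.
    customer.Name = viewModel.Name;
    context.SaveChanges();
}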

Partially initializing domain entities

In the following, the author advises not to partially initialize domain entities.
As we stated earlier, each customer must have no more than 5 contacts. By not returning the contacts along with the customers themselves, we leave a hole in our domain model which allows us to add a 6th contact and thus break this invariant.
Because of that, the practice of partial initialization should be avoided. If your repository returns a list of domain entities (or just a single domain entity), make sure the entities are fully initialized meaning that all their properties are filled out.
https://enterprisecraftsmanship.com/posts/partially-initialized-entities-anti-pattern/
So, do we have to load the whole object graph? A customer with all contacts and all related things, or would Entity Framework lazy loading help?
It probably has less to do with the object graph and more to do with the invariants involved.
As someone posted in the comments of that post, a performance issue may very well arise when there are 1000's of permitted contacts. An example of something to this effect may be that a Customer may only have, say, 5 active Order instances. Should all order instances linked to the customer be loaded? Most certainly not. In fact, an Order is another aggregate and an instance of one aggregate should not be contained in another aggregate. You could use a value object containing the id of the other aggregate but for a great many of these the same performance issue may manifest itself.
An alternative may be to simply keep a ContactCount or, in my example, an ActiveOrderCount which is kept consistent. If the actual relationships are to be stored/removed then these may be attached to the relevant aggregate when adding/removing in order to persist the change but that is a transient representation.
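A small sketch of what enforcing that count-based invariant inside the aggregate might look like (the names are made up for the example):
public class Customer
{
    private const int MaxActiveOrders = 5;

    public int ActiveOrderCount { get; private set; }

    // Records a new order by identity only; the Order aggregate itself is not held inside Customer.
    public OrderReference PlaceOrder(Guid orderId)
    {
        if (ActiveOrderCount >= MaxActiveOrders)
            throw new InvalidOperationException("A customer may have no more than 5 active orders.");

        ActiveOrderCount++;
        return new OrderReference(orderId);
    }

    public void CompleteOrder()
    {
        if (ActiveOrderCount > 0)
            ActiveOrderCount--;
    }
}

// Value object referencing the other aggregate by id only.
public class OrderReference
{
    public Guid OrderId { get; private set; }
    public OrderReference(Guid orderId) { OrderId = orderId; }
}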
So, do we have to load the whole object graph? A customer with all contacts and all related things, or would Entity Framework lazy loading help?
The answer is, actually, a resounding "yes". However, your object model should not be deep. You should make every attempt to create small aggregates. I try to model my aggregates with a single root entity containing value objects. The entire aggregate is loaded. Lazy loading is probably an indication that you are querying your domain, which is something I suggest one not do. Rather, create a simple query mechanism that uses some read model to return the relevant data for your front end.
The anti-pattern of partially loaded entities has to do with both graphs (children and relatives) as well as data within an entity. The reason it is an anti-pattern is because any code that is written to accept, and expect an entity should be given a complete and valid entity.
This is not to say that you always must load a complete entity, it is that if you ever return an entity, it should be a complete, or complete-able entity. (proxies associated to a live DbContext)
An example of a partially loaded entity and why it goes bad:
Someone goes to write the following method that an MVC controller will call to get a customer and return it to a view...
public IEnumerable<Customer> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers.Where(x => x.IsActive && x.CustomerName.StartsWith(criteria)).ToList();
    }
}
Code like this may have worked earlier with simpler entities, but Customer has related data like Orders, and when MVC went to serialize it, it got an error because the Orders proxies could not lazy load due to the DbContext being disposed. The options were to somehow eager-load all related details with this call to return the complete customer, completely disable lazy-loading proxies, or return an incomplete customer. Since this method would be used to display a summary list of just customer details, the author could choose to do something like:
public IEnumerable<Customer> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers.Where(x => x.IsActive && x.CustomerName.StartsWith(criteria))
            .Select(x => new Customer
            {
                CustomerId = x.CustomerId,
                CustomerName = x.CustomerName,
                // ... Any other fields that we want to display...
            }).ToList();
    }
}
The problem seems solved. The trouble with this approach, or with turning off lazy-load proxies, is that you are returning a class that implies "I am a Customer entity". That object may be serialized to a view, de-serialized back from a view, and passed to another method that is expecting a Customer entity. Modifications to your code down the road will need to somehow determine which "Customer" objects are actually associated with a DbContext (or are complete, disconnected entities) vs. one of these partial, incomplete Customer objects.
Eager-loading all of the related data would avoid the issue of the partial entity, however it is both wasteful in terms of performance and memory usage, and prone to bugs as entities evolve: when relatives are added they need to be eager-fetched in the repository, or you risk lazy-load hits, errors, or incomplete entity views being introduced down the road.
Now, in the early days of EF & NHibernate you would be advised to always return complete entities, or to write your repositories to never return entities and instead return DTOs. For example:
public IEnumerable<CustomerDTO> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers.Where(x => x.IsActive && x.CustomerName.StartsWith(criteria))
            .Select(x => new CustomerDTO
            {
                CustomerId = x.CustomerId,
                CustomerName = x.CustomerName,
                // ... Any other fields that we want to display...
            }).ToList();
    }
}
This is a better approach than the above one because by returning and using the CustomerDTO, there is absolutely no confusion between this partial object and a Customer entity. However, this solution has its drawbacks. One is that you may have several similar, but different views that need customer data, and some may need a bit extra or some of the related data. Other methods will have different search requirements. Some will want pagination or sorting. Using this approach will be similar to the article's example where you end up with a repository returning several similar, but different DTOs with a large number of variant methods for different criteria, inclusions, etc. (CustomerDTO, CustomerWithAddressDTO, etc. etc.)
With modern EF there is a better solution available for repositories, and that is to return IQueryable<TEntity> rather than IEnumerable<TEntity> or even TEntity. For example, to search for customers leveraging IQueryable:
public IQueryable<Customer> GetCustomers()
{
    return Context.Customers.Where(x => x.IsActive);
}
Then, when your MVC controller goes to get a list of customers with its criteria:
using (var contextScope = ContextScopeFactory.Create())
{
    return CustomerRepository.GetCustomers()
        .Where(x => x.CustomerName.Contains(criteria))
        .Select(x => new CustomerViewModel
        {
            CustomerId = x.CustomerId,
            CustomerName = x.CustomerName,
            // ... Details from customer and related entities as needed.
        }).ToList();
}
By returning IQueryable the repository does not need to worry about complete vs. incomplete representations of entities. It can enforce core rules such as active state checking, but leave it up to the consumers to filter, sort, paginate, or otherwise consume the data as they see fit. This keeps the repositories very lightweight and simple to work with while allowing controllers and services that consume them to be unit tested with mocks in place of the repositories. The controllers should consume the entities returned by the repository, but take care not to return these entities themselves. Instead they can populate view models (or DTOs) to hand over to the web client or API consumer to avoid partial entities being passed around and confused for real entities.
This applies even to cases where a repository is expected to return just 1 entity; returning IQueryable has its advantages.
For instance, comparing:
public Customer GetCustomerById(int customerId)
{
    return Context.Customers.SingleOrDefault(x => x.CustomerId == customerId);
}
vs.
public IQueryable<Customer> QGetCustomerById(int customerId)
{
    return Context.Customers.Where(x => x.CustomerId == customerId);
}
These look very similar, but to the consumer (controller/service) it would be a bit different.
var customer = CustomerRepository.GetCustomerById(customerId);
vs.
var customer = CustomerRepository.QGetCustomerById(customerId).Single();
Slightly different, but the second is far more flexible. What if we just wanted to check whether a customer exists?
var customerExists = CustomerRepository.GetCustomerById(customerId) != null;
vs.
var customerExists = CustomerRepository.QGetCustomerById(customerId).Any();
The first would execute a query that loads the entire customer entity. The second merely executes an exists-check query. What about when it comes to loading related data? The first method would need to rely on lazy loading or simply not have related details available, whereas the IQueryable method could:
var customer = CustomerRepository.QGetCustomerById(customerId).Include(x => x.Related).Single();
or better, if loading a view model with or without related data:
var customerViewModel = CustomerRepository.QGetCustomerById(customerId)
    .Select(x => new CustomerViewModel
    {
        CustomerId = x.CustomerId,
        CustomerName = x.CustomerName,
        RelatedName = x.Related.Name,
        // ... etc.
    }).Single();
Disclaimer: Actual mileage may vary depending on your EF version. EF Core has had a number of changes compared to EF6 around lazy loading and query building.
A requirement for this pattern is that the DbContext either has to be injected (DI) or provided via a unit of work pattern as the consumer of the repository will need to interact with the entities and their DbContext when materializing the query created by the repository.
A case where using a partially initialized entity is perfectly valid would be when performing a Delete without pre-fetching the entity. For instance in cases where you're certain a particular ID or range of IDs needs to be deleted, rather than loading those entities to delete you can instantiate a new class with just that entity's PK populated and tell the DbContext to delete it. The key point when considering the use of incomplete entities would be that it is only cases where the entity only lives within the scope of the operation and is not returned to callers.
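For example, a delete-without-loading sketch (EF6-style, reusing the Customer and MyDbContext names from the earlier examples) might look like:
public void DeleteCustomer(int customerId)
{
    using (var context = new MyDbContext())
    {
        // Stub entity: only the primary key is populated, and it never leaves this method.
        var stub = new Customer { CustomerId = customerId };
        context.Customers.Attach(stub);
        context.Customers.Remove(stub);
        context.SaveChanges();   // issues a DELETE without a prior SELECT
    }
}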

What is the best approach to create a Model class from a DB table

I am trying to use a table, say an Account table, from a database containing around 20 columns, and different views require different columns: for basic entries I need 10 columns for data insertion from one department, and 5 and 5 from the other departments. So there will be 3 views that require data from, or communicate with, that table via a Model. What will be the best approach to use:
1) Create 3 models, one containing only 10 columns and the others 5 each?
2) Use only a single model containing all columns? Isn't it going to be carrying unnecessary data?
I know we can break that table up and use relations to normalize the data, but I just want to understand more about the best approach, for example a Login model and a User model. We could manage both with a single model because we need the username and password fields for both, but is it right to use a single model instead of 2?
The Model and ViewModel represent separate concerns. Keep them separate. The entity should reflect the data state, while you define view models to support the different view concerns. When you go to load the view models from the EF context via the entities, you utilize .Select(), which will compose efficient SQL queries for just the columns your view model needs.
For example, say I have an Account entity with 20-odd properties defined, but I want to present a list of accounts showing just their user names, last login time, and list of roles (the "Name" property of a Role referenced by an AccountRoles table linking a many-to-many between accounts and roles):
[Serializable]
public class AccountSummaryViewModel
{
    public string AccountName { get; set; }
    public DateTime LastLoginDateTime { get; set; }
    public ICollection<string> Roles { get; set; } = new List<string>();
}

var accounts = MyContext.Accounts
    .Where(x => x.IsActive)
    .OrderBy(x => x.AccountName)
    .Select(x => new AccountSummaryViewModel
    {
        AccountName = x.AccountName,
        LastLoginDateTime = x.LastLoginDateTime,
        Roles = x.Roles.Select(r => r.Role.Name).ToList()
    }).ToList();
The entity structure reflects your data structure, but when the server goes to query those entities to serve a view, define the view model for the data structure you want to display and leverage EF to compose a query to fill that view model.
You can also leverage AutoMapper for this with its .ProjectTo<T>() method, which integrates with EF's IQueryable implementation.
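For reference, a ProjectTo version of the query above might look roughly like this (the mapping configuration is an assumption, and exact API details vary between AutoMapper versions):
// Requires the AutoMapper and AutoMapper.QueryableExtensions namespaces.
var config = new MapperConfiguration(cfg =>
    cfg.CreateMap<Account, AccountSummaryViewModel>()
       .ForMember(d => d.Roles, o => o.MapFrom(s => s.Roles.Select(r => r.Role.Name))));

var accounts = MyContext.Accounts
    .Where(x => x.IsActive)
    .OrderBy(x => x.AccountName)
    .ProjectTo<AccountSummaryViewModel>(config)   // translated to SQL, selecting only the mapped columns
    .ToList();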
An EF DbContext can also only have one entity type registered against a given table. To have multiple flavours of entity pointing at an Account table, you would need multiple DbContext definitions. Bounded contexts are useful for large systems, but can lead to miserably intertwined context references if used improperly.
It is advisable to avoid ever passing entities to a view, because this can lead to all kinds of performance issues, exceptions and security vulnerabilities, especially if controller actions accept entities back from the client. By passing entities to a client, the service is sending more information than the client needs, and you run into potential issues around triggering lazy-loading calls or tripping up the serialization with circular references. The system also tells hackers/competitors more about your data structure and data than it really should. The UI may not display most of the information in the entity, but it is still sending all of that data to the client. This requires more memory on the server/client, and larger payloads over the wire as well.

How to solve initial very slow EF Entity call which uses TPH and Complex Types?

I am using EF6
I have a generic table which holds data for different types of class objects using the "Table Per Hierarchy" Approach. In addition these class objects use complex types for defining types for their properties.
So using a made up example,
Table = Person
"Mike the Teacher" is a "Teacher" instance of Person with a personType of "Teacher"
The "Teacher" instance has 2 properties, complextypePersonalDetails and complextypeAddress.
complextypePersonalDetails contains
First Name, Surname and Age.
complextypeAddress contains
HouseName, Street, Town, City, County.
I admit that this design may be over the top, and the problem may be of my making, but that aside I wanted to check whether I could do any more with EF6 before I rewrite it.
I am performance profiling the code with JetBrains DotTrace.
On first call, say on
personTeacher = db.Person.OfType<Teacher>().First()
I get a massive delay of around 150,000ms
around:
SerializedGeneratedViewOfType (150,000ms)
TryGenerateQueryViewOfType
GenerateTypeSpecificQueryView
GenerateQueryViewForSingleExtent
GenerateQueryViewForExtentAndType
GenerateViewsForExtentAndType
GenerateViewComponents
EnsureExtentIsFullyMapped (90,000ms)
GenerateCaseStatements (60,000ms)
I have created a pregenerated view using the "InteractivePreGeneratedViews" NuGet package, which creates the SQL. However, even with this I still incur my first hit. Also, this hit seems to happen every time the web server/website/app pool is restarted.
I am not totally sure of the EF process, but I guess there is some further form of runtime compilation or caching which happens when the web app starts. Where could this be happening, and is there a proactive method that I could use to pregenerate/precompile/precache this problem away?
In the medium term, we will rewrite this code in Dapper or EF.Core. So for now, any thoughts on what can be done?
Thanks.
I had commented on this before but retracted it, just agreeing with "this design may be over the top, and the problem may be of my making", but I thought I'd see if anyone else jumped in.
The initial spin-up cost is due to EF needing to resolve the mapping for your schema. This happens once, the first time a DbSet on the context is accessed. You can mitigate this by executing a query on your application start, i.e.:
void Application_Start(object sender, EventArgs e)
{
    // Initialization stuff...
    using (var context = new MyContext())
    {
        var result = context.MyTable.Any(); // Spin-up will happen here, not when the first user attempts to access a query.
    }
}
You actually need to run a query for the DbContext to resolve the mapping; just new-ing one up won't do it.
For larger or more complex schemas you can also look to utilize bounded contexts, where each context maps a particular set of relationships for a specific area of the application. The less complex/comprehensive a context is, the faster it initializes, as sketched below.
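A rough sketch of the idea (the entity and context names are invented for the example): each bounded context registers only the DbSets its area of the application needs, so EF has less mapping to resolve on first use.
// Bounded context for the ordering area of the application.
public class OrderingContext : DbContext
{
    public DbSet<Person> People { get; set; }
    public DbSet<Order> Orders { get; set; }
    public DbSet<OrderLine> OrderLines { get; set; }
}

// Bounded context for reporting; maps only what the reports read.
public class ReportingContext : DbContext
{
    public DbSet<SalesSummary> SalesSummaries { get; set; }
}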
As far as the design goes, TPH is for representing inheritance, which is where you need to establish an "is-a" relation between like entities. Relational models, and ORMs by extension, can support this, but they are geared more towards "has-a" relationships. Rather than having a model where you say "is a person with an address", the relation is best mapped so that a person "has an" address. I've worked on a system designed by a team of engineers where an entire reporting system with dynamic rules was represented by 6 tables. Honestly, those designs are a nightmare to maintain.
I don't know why OfType() is so slow, but I found a fast and easy workaround by replacing it with a cast; EntityFramework seems to support that just fine and without the performance penalty.
var stopwatch = Stopwatch.StartNew();
using (var db = new MyDbContext())
{
    // warm up
    Console.WriteLine(db.People.Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");

    // fast
    Console.WriteLine(db.People.Select(p => p as Teacher).Where(p => p != null).Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");

    // slow
    Console.WriteLine(db.People.OfType<Teacher>().Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
}
20
3308 ms
2
3796 ms
2
10026 ms

Is it possible to use a single transaction (in EF) with two different contexts pointing to different schemas?

I'm currently designing an application where I need to use two different database schemas (on the same instance): one as the application base, the other to customize the application and the fields for every customer.
Since I have read something about the Repository pattern, and as I understand it is possible to use two different contexts without losing efficiency, I'm now asking whether I can use a single database transaction across two schemas with Entity Framework, as I'm currently doing directly on the database (SQL Server 2008-2012).
Sorry for my English, and thanks in advance!
If your connection strings are the same (which in your case they will be, as you only have different schemas for the different contexts) then you are OK with this approach.
Basically you will have two different contexts that connect to the database via the same connection string and which represent the two different schemas.
using (var scope = new TransactionScope()) {
    using (var contextSO = new ContextSchemaOne()) {
        // Add, remove, change entities from context schema one
        contextSO.SaveChanges();
    }
    using (var contextST = new ContextSchemaTwo()) {
        // Add, remove, change entities from context schema two
        contextST.SaveChanges();
    }
    scope.Complete();
}
I wasn't very successful in the past with this approach, and we switched to one context per database.
Further reading: Entity Framework: One Database, Multiple DbContexts. Is this a bad idea?
Maybe it's better to read something about unit of work before taking a decision about this.
You will have to do something like this: Preparing for multiple EF contexts on a unit of work - TransactionScope