Why is Entity Framework executing queries when I expect the objects to be served from the EF cache?
With these simple model classes:
public class Blog
{
public int Id { get; set; }
public string Name { get; set; }
public virtual ICollection<Post> Posts { get; set; }
}
public class Post
{
public int Id { get; set; }
public string Content { get; set; }
public virtual Blog Blog { get; set; }
}
public class BlogDbContext : DbContext
{
public BlogDbContext() : base("BlogDbContext") {}
public DbSet<Blog> Blogs { get; set; }
public DbSet<Post> Posts { get; set; }
}
I profiled the queries of the following action:
public class HomeController : Controller
{
public ActionResult Index()
{
var ctx = new BlogDbContext();
// expecting posts are retrieved and cached by EF
var posts = ctx.Posts.ToList();
var blogs = ctx.Blogs.ToList();
var wholeContent = "";
foreach (var blog in blogs)
foreach (var post in blog.Posts) // <- query is executed
wholeContent += post.Content;
return Content(wholeContent);
}
}
Why doesn't EF re-use the Post entities which I had already grabbed with the var posts = ctx.Posts.ToList(); statement?
Further explanation:
An existing application has an Excel export report. The data is fetched via one main Linq2Sql query with a tree of ~20 includes, then mapped via Automapper, and enriched with additional data from manual caches (which previously slowed down execution when included in the main query).
Now the data has grown, and SQL Server fails to execute the query with the error:
The query processor ran out of internal resources and could not produce a query plan.
Lazy loading would result in >100,000 queries. So I thought I could preload all the required data with a few simple queries and let EF serve the objects from its cache during lazy loading.
Along the way I initially had additional problems with the limits of the T-SQL IN() clause, which I solved with MoreLinq's Batch extension.
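Roughly, the batching looked like this (a sketch; postIds and the chunk size of 2000 are placeholders, and MoreLinq's Batch splits the id list into chunks):
// Load posts in chunks so each query stays within the IN() limits.
var allPosts = new List<Post>();
foreach (var batch in postIds.Batch(2000))
{
    var chunk = batch.ToList();
    allPosts.AddRange(ctx.Posts.Where(p => chunk.Contains(p.Id)).ToList());
}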
When you have lazy loading enabled, EF will still reload collection navigation properties, probably because EF doesn't know whether you have really loaded all the Posts. E.g. code like
var post = db.Posts.First();
var relatedPosts = post.Blog.Posts.ToList();
would be tricky, as the Blog would already have one Post loaded, but obviously the others would need to be fetched.
In any case, when relying on the change tracker to fix up your navigation properties, you should disable lazy loading anyway, e.g.:
using (var db = new BlogDbContext())
{
    db.Configuration.LazyLoadingEnabled = false;
    // ...
}
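With lazy loading disabled, the change tracker's relationship fix-up still links the entities you have already loaded, so the preloading approach from the question works without extra queries. A minimal sketch, assuming the BlogDbContext from the question:
using (var db = new BlogDbContext())
{
    db.Configuration.LazyLoadingEnabled = false;
    var posts = db.Posts.ToList(); // loads and tracks every post
    var blogs = db.Blogs.ToList(); // fix-up fills blog.Posts from the tracked posts
    foreach (var blog in blogs)
        foreach (var post in blog.Posts) // no query here; served from the change tracker
        {
            // consume post.Content
        }
}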
Given you have the navigation properties, look at leveraging them in your query to feed Automapper a dynamic object to map to your ViewModel/DTO, rather than a top-level entity for which you'd be relying on eager loading or waiting on lazy loading.
This is done by issuing a .Select() on your query. To use a simple example of extracting order details including the customer name, list of product names and quantities from order lines, and the delivery address where an Order has a reference to customer, and that customer has a delivery address, a collection of order lines, each with a product...
var orderDetails = dbContext.Orders
    .Where(o => /* Insert criteria */)
    .Select(o => new
    {
        o.OrderId,
        o.OrderNumber,
        o.Customer.CustomerId,
        CustomerName = o.Customer.FullName,
        o.Customer.DeliveryAddress, // Address entity if no further dependencies, or extract fields/relations from the Address.
        OrderLines = o.OrderLines.Select(ol => new
        {
            ol.OrderLineId,
            ProductName = ol.Product.Name,
            ol.Quantity
        })
    }).ToList(); // Ready to feed into Automapper.
With ~20 includes your Select will undoubtedly be a bit more involved, but the idea is to give SQL Server a query that retrieves just the data you want; you can then feed the result into Automapper, where child relationships can either be flattened by EF or returned in simplified form for your mapper to flesh out into the resulting models.
With growing systems you will also want to consider leveraging paging with Skip and Take rather than ToList, or at least leveraging Take to ensure there is a cap on the amount of data you return. ToList is a primary performance troll that I look for in EF code, because its misuse can kill applications.
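For example, a minimal paging sketch (page and pageSize are hypothetical inputs):
var pagedOrders = dbContext.Orders
    .OrderBy(o => o.OrderId)     // Skip/Take require a deterministic ordering
    .Skip((page - 1) * pageSize) // skip rows from earlier pages
    .Take(pageSize)              // cap the number of rows returned
    .ToList();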
Related
Following through Julie Lerman's Pluralsight course EF Core 6 Fundamentals I've created two classes in my own project (my own design, but identical to the course in terms of class structure/data hierarchy):
Class 1: Events - To hold information about an event being held (e.g. a training course), with a title and description (some fields removed for brevity):
public class EventItem
{
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int EventItemId { get; set; }
[Required(AllowEmptyStrings = false)]
public string EventTitle { get; set; }
public string? EventDescription { get; set; }
[Required]
public List<EventCategory> EventCategories { get; set; } = new();
}
Class 2: Event categories - Each event can be linked to one or more pre-existing (seeded) categories (e.g. kids, adult).
public class EventCategory
{
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int EventCategoryId { get; set; }
[Required]
public string EventCategoryName { get; set; }
public List<EventItem>? EventItems { get; set; }
}
In my Razor form to create the event, the user can select from multiple categories. Using EF Core, I take the posted data (via a VM/DTO object) and construct the relevant parent/child entities. However, upon saving to the database I get an exception, as EF Core tries to re-create the categories when they already exist:
Cannot insert explicit value for identity column in table
'EventCategories' when IDENTITY_INSERT is set to OFF.
My code explicitly looks up the existing categories selected by the user, but the context tracker appears to still believe they need inserting, in addition to creating the many-to-many relationship.
I'd appreciate any input as to why this is happening please:
using (var dbcontext = DbFactory.CreateDbContext())
{
// Get selected categories from user's check box list
var selectedCategoryIds = _eventCategories.Where(c => c.isSelected).Select(c => c.EventCategoryId).ToList();
// Create new Event
var newEventItem = new EventFinderDomain.Models.EventItem() {
EventTitle = _eventItemDTO.EventTitle,
EventDescription = _eventItemDTO.EventDescription,
EventUrl = _eventItemDTO.EventUrl,
TicketUrl = _eventItemDTO.TicketUrl
};
// Find categories from the database based on their ID value
var selectedEventCategories = dbcontext.EventCategories.Where(c => selectedCategoryIds.Contains(c.EventCategoryId)).ToList();
// Add the categories to the event
newEventItem.EventCategories!.AddRange(selectedEventCategories);
// Add the event to the change tracker
await dbcontext.EventItems.AddAsync(newEventItem); // <-- Created correctly with child list objects added
// Detect changes for debugging
dbcontext.ChangeTracker.DetectChanges();
var debugView = dbcontext.ChangeTracker.DebugView; // <-- Incorrectly shows newEventItem.Categories being added
// Save to database
await dbcontext.SaveChangesAsync(); // <-- Cannot insert explicit value for identity column
}
The Event entity appears to be correctly created in the debugger, with its related child categories included. The change tracker, however, incorrectly shows the selected categories being added again when they already exist.
After commenting out every line of code in the app and adding it back in until things broke, it emerged that the problem was elsewhere, in Program.cs:
builder.Services.AddDbContextFactory<EventFinderContext>(
opt => opt.UseSqlServer(new SqlConnectionStringBuilder() {/*...*/}.ConnectionString)
.EnableSensitiveDataLogging()
.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking) // <-- THE CULPRIT
);
In the training video this method was described as a way of reducing overhead for disconnected apps. I had assumed that because of the disconnected nature of HTTP, this would be beneficial and that context would be re-established when creating the model's child data. This was incorrect on my part.
I should have used .AsNoTracking() only when retrieving read-only data from my database, for example when loading the child data for a new model that wouldn't be modified directly but would be used to create the many-to-many data (explicitly, for the category option items only and not for the event data).
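For illustration, a sketch of the per-query opt-out, leaving the factory configuration at its tracking default:
// Read-only data for the check box list: opt out of tracking per query.
var categoryOptions = await dbcontext.EventCategories
    .AsNoTracking()
    .ToListAsync();
// Categories joining the many-to-many relationship must stay tracked, so EF
// knows they already exist and won't try to insert them again.
var selectedEventCategories = await dbcontext.EventCategories
    .Where(c => selectedCategoryIds.Contains(c.EventCategoryId))
    .ToListAsync();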
My entity/model has a child object. During ADD (POST) operations, where I just want the parent object added to the database, I simply set the child object to null; the parent is added to the database just fine and the child object doesn't touch the database.
However, when I do an UPDATE (PUT) and set the same child object to null, the parent object is actually deleted from the database while the child object is untouched. Why?
Model code:
namespace PROJ.API.Models
{
public partial class Todo
{
public Todo()
{
}
public long TdoId { get; set; }
public string TdoDescription { get; set; } = null!;
public long PtyId { get; set; }
public virtual Priority? Priority { get; set; }
}
public partial class Priority
{
public Priority()
{
}
public long PtyId { get; set; }
public byte PtyLevel { get; set; }
public string PtyDescription { get; set; } = null!;
}
}
Entities code:
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
namespace PROJ.API.Entities
{
public class Todo
{
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public long TdoId { get; set; }
public string TdoDescription { get; set; } = null!;
public long PtyId { get; set; }
[ForeignKey("PtyId")]
public virtual Priority? Priority { get; set; }
}
public class Priority
{
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public long PtyId { get; set; }
public byte PtyLevel { get; set; }
public string PtyDescription { get; set; } = null!;
}
}
Repository code:
public async Task<Todo?> GetTodoAsync(long tdoId)
{
var todo = await _context.Todo.Where(c => c.TdoId == tdoId)
.Include(x => x.Priority)
.FirstOrDefaultAsync();
return todo;
}
Controller code:
[HttpPut()] // UPDATE
public async Task<ActionResult> UpdateTodoAsync(Todo todo)
{
var eTodo = await _myRepository.GetTodoAsync(todo.TdoId);
if (todo.Priority == null || todo.Priority.PtyId == 0)
{
var priority = await _myRepository.GetPriorityAsync(todo.PtyId);
if (priority != null)
{
_mapper.Map(priority, todo.Priority);
}
}
_mapper.Map(todo, eTodo);
await _myRepository.SaveChangesAsync();
return NoContent();
}
My understanding is that setting the child object to null tells EF to NOT perform any operation on it in the database. TODO.PtyId is setup with a FK to PRIORITY.PtyId in the SQL database but I have NOT defined this in context (OnModelCreating) as I don't "think" I need the Fluent API approach here.
Any thoughts on what I'm doing wrong and/or why an UPDATE is actually deleting a record when I set a child object to NULL? As I noted before an ADD using the same null approach works just fine.
A couple of things.
Given your example and naming convention, you should be explicitly nominating your FK, either by attribute or by fluent declaration. EF's convention is to base FK names on the "type" of the relationship, not the property name. So, for instance, if you have:
public virtual Priority? Pty { get; set; }
EF will be looking for a FK named Priority_ID or PriorityID, not PtyID. This behaviour may have changed in EF Core; I honestly haven't delved back into whether EF conventions can be trusted to work this out.
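If you don't want to rely on conventions, the relationship can be nominated explicitly. A sketch using the EF Core fluent API (assuming an OnModelCreating override on your DbContext):
// Explicitly bind Todo.Priority to the existing PtyId FK column.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Todo>()
        .HasOne(t => t.Priority)
        .WithMany()
        .HasForeignKey(t => t.PtyId);
}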
Lastly, this is overall a typical example of the issues that can come up when you mix concerns with entities and use detached entities as view models. It also outlines an issue with your repository implementation. In your case you are detaching an entity, then, when it is passed back to the server to update, loading the entity from data state and using Automapper to copy the values across.
The first problem is that your repository is automatically and unconditionally eager-loading the related entity when, in at least this example, you don't need or want it. When EF eager loads a relationship and you then set that related entity to null, the change tracker records this action and EF will interpret it as "remove this relationship". If the related entity is not loaded/associated and is left as null when saving the top-level entity, nothing is removed. Either way, you will want to avoid setting a related entity to null if you don't want to save changes to it. The solution is either to not load related entities in the first place, to ignore mapping across related entities, or to mark those entities as Unchanged so changes to them are not persisted.
Not loading Related entities:
This could either be solved by adding arguments to indicate what should be eager loaded, or considering adopting IQueryable for the repository method:
Argument:
public async Task<Todo?> GetTodoAsync(long tdoId, bool includeRelated = true)
{
    var query = _context.Todo.Where(c => c.TdoId == tdoId);
    if (includeRelated)
    {
        query = query.Include(c => c.Pty);
    }
    return await query.FirstOrDefaultAsync();
}
In simple cases this isn't too bad, but with more complex entities it can be a pain, especially if you want to selectively include relatives. This way, when you load eTodo from data state, you can tell it not to eager load the Priority. This isn't foolproof, as a Priority could still be associated if that DbContext instance had previously loaded the Priority associated with that Todo; tracked entities will be associated even if you don't explicitly eager load them. To be safe, this should be combined with the Automapper changes further below (see "Excluding mapping changes"). This is still a worthwhile change, as you avoid the resource/performance costs of unconditionally eager loading on every read.
IQueryable:
public IQueryable<Todo> GetTodoById(long tdoId)
{
var query = _context.Todo.Where(c => c.TdoId == tdoId);
return query;
}
IQueryable gives your consumer a lot more flexibility in what it does with the data coming back, but it does require a shift in thinking around the unit of work pattern: the scope of the DbContext moves out into a Unit of Work, so that consumers are responsible for that scope rather than the individual repositories. The advantages of this approach are that the unit of work (DbContext scope) can be shared across repositories if needed, and with this pattern your consumer has control over things like:
Projection using Select or Count, Any, etc.
async vs. synchronous operations.
Assessing whether or not to eager load related entities.
So as an example with this pattern, the controller or service code would function more like:
[HttpPut()] // UPDATE
public async Task<ActionResult> UpdateTodoAsync(Todo todo)
{
using (var contextScope = _contextScopeFactory.Create())
{
var eTodo = await _myRepository.GetTodoById(todo.TdoId)
.SingleAsync();
_mapper.Map(todo, eTodo);
await contextScope.SaveChangesAsync();
return NoContent();
}
}
contextScope / _contextScopeFactory come from a UoW pattern called DbContextScope, by Mehdi El Gueddari, for EF6, which has a number of forks covering EF Core. I like this pattern because it gives the repository a dependency on a locator that resolves the DbContext from the scope, rather than passing DbContext instances around, and it gives that scope full control over whether SaveChanges() gets committed. Leveraging IQueryable also enables projection: when reading data for a view, you can project straight to a ViewModel using Automapper's ProjectTo rather than sending entities to the view, which would come back to the controller as deserialized and typically incomplete shells of what they once were.
Excluding mapping changes:
This involves adjusting the mapping you use to exclude copying changes to the related entity when mapping one Todo across to another. If the mapping ignores Todo.Pty, then you can map just the Todo fields from the one instance to the DB instance and save the DB instance without change tracking tripping over anything changing in Pty or the relationship.
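A minimal sketch of such a mapping, assuming AutoMapper and the Pty naming used above; Ignore() leaves the related entity and the relationship untouched:
// Copy Todo scalar fields across, but never the related Priority.
var config = new MapperConfiguration(cfg =>
    cfg.CreateMap<Todo, Todo>()
       .ForMember(d => d.Pty, opt => opt.Ignore()));
var mapper = config.CreateMapper();
mapper.Map(todo, eTodo); // Pty is not copied, so nothing trips change tracking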
Marking as Unchanged:
Given your repository is managing the scope of the DbContext, what you will likely need to do is add a method to isolate changes to just that top-level entity. Since the repository is scoping the DbContext, this means a somewhat clunky method, because we need to pass the entity in to tweak its tracking state.
// eTodo.Pty = null; <-- don't do this
_myRepository.IgnoreRelatedChanges(eTodo);
await _myRepository.SaveChangesAsync();
then...
public void IgnoreRelatedChanges(Todo todo)
{
_context.Entry(todo.Pty).State = EntityState.Unchanged;
}
The trouble with this approach is that it is clumsy and prone to bugs/exceptions.
In any case that should provide you with some options to consider to solve your issue, and possibly consider for updating your repository pattern.
I am new to EF Core 6.0.1, using it with Blazor (WebAssembly), .NET 6.0, and Visual Studio 2022. I am creating a database of internal software projects, including their author(s) and maintainer(s).
I am having trouble getting EF Core to take in a List of Authors / List of Maintainers as part of creating a new SoftwareItem from a webform submission.
SoftwareItem is defined (in part) as follows:
public class SoftwareItem
{
[Key]
public int SoftwareId { get; set; }
public string Name { get; set; }
public string CurrentVersion { get; set; }
public string Status { get; set; }
public List<Author> Authors { get; set; }
public List<Maintainer> Maintainers { get; set;}
[other properties omitted]
}
An Author is defined as follows:
public class Author
{
[Key]
public int AuthorId { get; set; }
public int SoftwareItemId { get; set; }
public int ProgrammerId { get; set; }
public Programmer Programmer { get; set; }
}
Maintainer is identical, except for having a MaintainerId instead of an AuthorId.
Programmer is defined as:
public class Programmer
{
[Key]
public int ProgrammerId { get; set; }
public string ProgrammerName { get; set; }
}
EF Core created the tables for me based on a migration, and I have manually populated the Programmer table with the nine people who might be an Author and/or a Maintainer.
I have a webform where the user can create a new SoftwareItem, with pre-populated drop-downs for Authors and Maintainers that, after querying the database, contain the potential ProgrammerNames. The user can assign up to three Authors and up to three Maintainers before submitting the webform (via an Author1 dropdown, an Author2 dropdown, etc.). Submitting the webform calls the InsertSoftware method, included below.
Note that I'm not a fan of the repetition between the Author logic and Maintainer logic, and the List should probably be a HashSet (in case the same author is set in Author1 and Author2) but those are issues for another day. The Author1 and similar variables are the int IDs set by the webform. I've previously verified they are being set to the appropriate values via a JavaScript alert. An ID of 0 means the value was never set (e.g. there is no second author).
The SoftwareItem here is instantiated as a new object in OnInitializedAsync and bound as the webform's model.
public async Task InsertSoftware()
{
List<int> authorIdsToAdd = new List<int>();
authorIdsToAdd.Add(Author1);
authorIdsToAdd.Add(Author2);
authorIdsToAdd.Add(Author3);
SoftwareItem.Authors = new List<Author>();
foreach (int author in authorIdsToAdd)
{
if (author != 0)
{
foreach (Programmer programmer in ProgrammerList)
{
if (programmer.ProgrammerId == author)
{
Author addedAuthor = new Author();
addedAuthor.Programmer = new Programmer();
addedAuthor.Programmer.ProgrammerId = author;
SoftwareItem.Authors.Add(addedAuthor);
}
}
}
}
[repeat code for the Maintainers]
await Http.PostAsJsonAsync("api/softwareitem", SoftwareItem);
Navigation.NavigateTo("software/fetchsoftware");
}
The SoftwareItem API is (in part) as follows:
[HttpPost]
public async Task<IActionResult> Create([FromBody] SoftwareItem softwareItem)
{
_context.Software.Add(softwareItem);
await _context.SaveChangesAsync();
return Ok(softwareItem);
}
My understanding from this Stack Overflow question is that if objects have been instantiated for a navigation property when the parent entity is added and saved to the database context, then EF Core will also add the new navigation property values to their appropriate tables. However, that isn't happening, and all I'm getting is a 500 error in the console.
What I'm expecting is that...
A new entry will be inserted into the SoftwareItem table
New entries will be inserted into the Author table, containing an auto-incremented AuthorId, the SoftwareItem's SoftwareItemId, and the ProgrammerId from the webform
New entries will be inserted into the Maintainer table, containing an auto-incremented MaintainerId, the SoftwareItem's SoftwareItemId, and the ProgrammerId from the webform.
OK, it's a bit difficult to make out precisely what your code is doing, but there are a few issues I can see.
First, with entities you should avoid reinitializing navigation property lists. During inserts it's "ok", but anywhere else it can lead to bugs/errors, so it's better to simply not see it in the code at all. Pre-initialize your properties in the entity itself:
public class SoftwareItem
{
// ...
public virtual ICollection<Author> Authors { get; set; } = new List<Author>();
public virtual ICollection<Maintainer> Maintainers { get; set;} = new List<Maintainer>();
}
This ensures the collections are ready to go when you need them for a new entity.
Next, it can be helpful to structure your code to avoid things like module-level variables. Your InsertSoftware() method references an instance of SoftwareItem, and it isn't clear where or what this reference would be pointing at. If you have a method chain that loaded a particular software item instance to be updated, pass the reference through the chain of methods as a parameter. This helps encapsulate the logic. You should also look to define a scope for wherever you reference a DbContext. With Blazor this needs to be done a bit more explicitly, to avoid DbContext instances being too long-lived. Long-lived DbContext instances are a problem because they lead to performance degradation as they track increasing numbers of entities, and they can easily become "poisoned" with invalid entities that prevent things like SaveChanges() calls from succeeding. Keep instances alive only as long as absolutely necessary. I would strongly recommend looking at unit of work patterns to help encapsulate the lifetime scope of a DbContext. Ideally, entities loaded by a DbContext should not be passed outside of that scope, to avoid issues and complexity with detached or orphaned entities.
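In Blazor a common shape for this is a short-lived context per operation via IDbContextFactory (a sketch; AppDbContext, _contextFactory, and the method name are stand-ins):
// Inject IDbContextFactory<AppDbContext> rather than a long-lived AppDbContext.
public async Task SaveSoftwareItemAsync(SoftwareItem item)
{
    using var context = await _contextFactory.CreateDbContextAsync();
    context.Software.Add(item);
    await context.SaveChangesAsync();
} // the context is disposed here, so nothing stays tracked between operations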
Next, it is important to know when you are looking to create new entities vs. reference existing data. Code like this is a big red flag:
Author addedAuthor = new Author();
addedAuthor.Programmer = new Programmer();
addedAuthor.Programmer.ProgrammerId = author;
From what I can make out, the Author (and Maintainer) are linking entities so we will want to create one for each "link" between a software item and a programmer. However, Programmer is a reference to what should be an existing row in the database.
If you do something like:
var programmer = new Programmer { ProgrammerId = author };
and then associate that programmer as a reference to another entity, you might guess this would tell EF to find and associate an existing programmer. Except it doesn't: you are telling EF to associate a new programmer with a particular ID. Depending on how EF has been configured for that entity (whether or not the PK uses an identity column), one of three things will happen if that programmer ID already exists:
A new programmer is created with an entirely new ID (identity gives it a new id and ProgrammerId is ignored)
EF throws an exception when it tries to insert a new programmer with the same ID. (Duplicate PK)
EF throws an exception if you tell it to add a new programmer and it happens to already be tracking an instance with the same ID.
So, to fix this, load your references:
List<int> authorIdsToAdd = new List<int>();
// likely need logic to only add authors if they are selected, and unique.
authorIdsToAdd.Add(Author1);
authorIdsToAdd.Add(Author2);
authorIdsToAdd.Add(Author3);
// Define your own suitable scope mechanism for this method or method chain
using (var context = new AppDbContext())
{
var softwareItem = new SoftwareItem { /* populate values from DTO or Map from DTO */ };
// Retrieve references
var authors = await context.Programmers.Where(x => authorIdsToAdd.Contains(x.ProgrammerId)).ToListAsync();
foreach(var author in authors)
{
softwareItem.Authors.Add(new Author { Programmer = author });
}
// Continue for Maintainers...
await context.SaveChangesAsync();
}
I have been reading that Repositories should return domain objects only. I am having difficulty with implementing this. I currently have API with Service Layer, Repository and I am using EF Core to access sql database.
Consider User (Id, Name, Address, PhoneNumber, Email, Username) and Orders (Id, OrderDetails, UserId) as two domain objects, where one customer can have multiple orders. I have created a navigation property
public virtual User User{ get; set; }
and a foreign key.
The service layer needs to return a DTO with OrderId, OrderDetails, CustomerId, and CustomerName. What should the repository return in this case? This is what I was trying:
public IEnumerable<Orders> GetOrders(int orderId)
{
    var result = _context.Orders.Where(or => or.Id == orderId)
        .Include(u => u.User)
        .ToList();
    return result;
}
I am having trouble with eager loading; I have tried to use Include. I am using database first. In the case above, the navigation properties are always returned as null. The only way I was able to get data into the navigation properties was to enable lazy loading with proxies for the context, which I think will be a performance issue.
Can anyone help with what I should return, and why .Include is not working?
Repositories can return other types of objects, even primitive types like integers, if you want to count objects matching some criteria.
This is from the Domain Driven Design book:
They (Repositories) can also return summary information, such as a count of how many instances (of a Domain Object) meet some criteria. They can even return summary calculations, such as the total across all matching objects of some numerical attribute.
If you return something that isn't a Domain Object, it's because you need some information about the Domain Objects, so you should only return immutable objects and primitive data types like integers.
If you make a query to get an object with the intention of changing it after you get it, it should be a Domain Object.
If you need to do that, place boundaries around your Domain Objects and organize them into Aggregates.
Here's a good article that explains how to decompose your model into aggregates: https://dddcommunity.org/library/vernon_2011/
In your case you can either compose the User and the Order entities into a single Aggregate or keep them in separate Aggregates.
EDIT:
Example:
Here we will use reference-by-Id: entities will reference entities from other Aggregates by Id only.
We will have three Aggregates (User, Product, and Order) and one ValueObject (OrderLineItem).
public class User {
public Guid Id{ get; private set; }
public string FirstName { get; private set; }
public string LastName { get; private set; }
}
public class Product {
public Guid Id { get; private set; }
public string Name { get; private set; }
public Money Price { get; private set; }
}
public class OrderLineItem {
public Guid ProductId { get; private set; }
public Quantity Quantity { get; private set; }
// Copy the current price of the product here so future changes don't affect old orders
public Money Price { get; private set; }
}
public class Order {
public Guid Id { get; private set; }
public IEnumerable<OrderLineItem> LineItems { get; private set; }
}
Now, if you do have to do heavy querying in your app, you can create a ReadModel built from the model above:
public class OrderLineItemWithProductDetails {
public Guid ProductId { get; private set; }
public string ProductName { get; private set; }
// other stuff quantity, price etc.
}
public class OrderWithUserDetails {
public Guid Id { get; private set; }
public string UserFirstName { get; private set; }
public string UserLastName { get; private set; }
public IEnumerable<OrderLineItemWithProductDetails> LineItems { get; private set; }
// other stuff you will need
}
How you fill the ReadModel is a whole topic, so I can't cover all of it, but here are some pointers.
You said you will do a Join, so you're probably using an RDBMS of some kind, like PostgreSQL or MySQL. You can do the Join in a special ReadModel Repository. If your data is in a single database, you can just use a ReadModel Repository.
// SQL repository, no ORM here
public class OrderReadModelRepository {
    public OrderWithUserDetails FindForUser(Guid userId) {
        // this is supposed to be an example, my SQL is a bit rusty so...
        string sql = @"SELECT * FROM orders AS o
                       JOIN orderlineitems AS l ON l.OrderId = o.Id
                       JOIN users AS u ON o.UserId = u.Id
                       JOIN products AS p ON p.Id = l.ProductId
                       WHERE u.Id = @userId";
        var resultSet = DB.Execute(sql, userId);
        return CreateOrderWithDetailsFromResultSet(resultSet);
    }
}
// ORM-based repository
public class OrderReadModelRepository {
    public IEnumerable<OrderWithUserDetails> FindForUser(Guid userId) {
        return ctx.Orders.Where(o => o.UserId == userId)
            .Include("OrderLineItems")
            .Include("OrderLineItems.Product")
            .ToList(); // map the loaded Orders to OrderWithUserDetails as needed
    }
}
If it's not, you will have to build it and keep it in a separate database. You can use DomainEvents to do that, but I won't go that far if you have a single SQL database.
The advice I give around the repository pattern is that repositories should return IQueryable<TEntity>, not IEnumerable<TEntity>.
The purpose of a repository is to:
Make code easier to test.
Centralize common business rules.
The purpose of a repository should not be to:
Abstract EF away from your project.
Hide knowledge of your domain. (Entities)
If you're introducing a repository to hide the fact that the solution depends on EF, or to hide the domain, then you are either sacrificing much of what EF can bring to the table for managing the interaction with your data, or introducing a lot of unnecessary complexity into your solution to try to keep those capabilities (filtering, sorting, paginating, selective eager loading, etc.).
Instead, by returning IQueryable and treating EF as a first-class citizen in your domain, you can leverage EF to produce flexible and fast queries to get the data you need.
Given a service where you want to "return a DTO with OrderId, OrderDetails, CustomerId, CustomerName":
Step 1: Raw example, no repository...
Service code:
public OrderDto GetOrderById(int orderId)
{
using (var context = new AppDbContext())
{
var order = context.Orders
.Select(x => new OrderDto
{
OrderId = x.OrderId,
OrderDetails = x.OrderDetails,
CustomerId = x.Customer.CustomerId,
CustomerName = x.Customer.Name
}).Single(x => x.OrderId == orderId);
return order;
}
}
This code can work perfectly fine, but it is coupled to the DbContext, so it is hard to unit test. We may also have additional business logic that needs to apply to pretty much all queries, such as Orders having an "IsActive" state (soft delete) or the database serving multiple clients (multi-tenant). With a lot of queries in our controllers, that would lead to clauses like .Where(x => x.IsActive) repeated everywhere.
With the Repository pattern (IQueryable), unit of work:
public OrderDto GetOrderById(int orderId)
{
using (var context = ContextScopeFactory.CreateReadOnly())
{
var order = OrderRepository.GetOrders()
.Select(x => new OrderDto
{
OrderId = x.OrderId,
OrderDetails = x.OrderDetails,
CustomerId = x.Customer.CustomerId,
CustomerName = x.Customer.Name
}).Single(x => x.OrderId == orderId);
return order;
}
}
Now, at face value, the service code above doesn't really look much different from the first raw example, but there are a few bits that make this testable and can help manage things like common criteria.
The repository code:
public class OrderRepository : IOrderRepository
{
private readonly IAmbientContextScopeLocator _contextScopeLocator = null;
public OrderRepository(IAmbientContextScopeLocator contextScopeLocator)
{
_contextScopeLocator = contextScopeLocator ?? throw new ArgumentNullException("contextScopeLocator");
}
private AppDbContext Context => _contextScopeLocator.Get<AppDbContext>();
IQueryable<Order> IOrderRepository.GetOrders()
{
return Context.Orders.Where(x => x.IsActive);
}
}
This example uses Mehdime's DbContextScope for the unit of work, but can be adapted to others or an injected DbContext as long as it is lifetime scoped to the request. It also demonstrates a case with a very common filter criteria ("IsActive") that we might want to centralize across all queries.
In the above example we use a repository to return the orders as an IQueryable. The repository method is fully mockable: the DbContextScopeFactory.CreateReadOnly call can be stubbed out, and the repository call can be mocked to return whatever data you want, using a List<Order>().AsQueryable() for example. By returning IQueryable, the calling code has full control over how the data will be consumed. Note that there is no need to worry about eager loading the customer/user data: the query will not be executed until you perform the Single (or ToList, etc.) call, which results in very efficient queries. The repository class itself is kept very simple, as there is no complexity around telling it what records and related data to include. We can adjust our query to add sorting or pagination (Skip/Take), get a Count, or simply check whether any data exists (Any), without adding functions to the repository or incurring the overhead of loading the data just to do a simple check.
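As a sketch of that testability claim (assuming Moq, plus a hypothetical Customer navigation on Order), the repository call can be stubbed with an in-memory queryable:
// No database needed: the service's query runs against a List instead.
var orders = new List<Order>
{
    new Order
    {
        OrderId = 1, OrderDetails = "Test", IsActive = true,
        Customer = new Customer { CustomerId = 5, Name = "Ann" }
    }
}.AsQueryable();
var repo = new Mock<IOrderRepository>();
repo.Setup(r => r.GetOrders()).Returns(orders);
// Inject repo.Object into the service under test, then assert on GetOrderById(1).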
The most common objections I hear to having repositories return IQueryable are:
"It leaks. The callers need to know about EF and the entity structure." Yes, the callers need to know about EF limitations and the entity structure. However, many alternative approaches such as injecting expression trees for managing filtering, sorting, and eager loading require the same knowledge of the limitations of EF and the entity structure. For instance, injecting an expression to perform filtering still cannot include details that EF cannot execute. Completely abstracting away EF will result in a lot of similar but crippled methods in the repository and/or giving up a lot of the performance and capability that EF brings. If you adopt EF into your project it works a lot better when it is trusted as a first-class citizen within the project.
"As a maintainer of the domain layer, I can optimize code when the repositories are responsible for the criteria." I put this down to premature optimization. The repositories can enforce core-level filtering such as active state or tenancy, leaving the desired querying and retrieval up to the implementing code. It's true that you cannot predict or control how these resulting queries will look against your data source, but query optimization is something that is best done when considering real-world data use. The queries that EF generates reflect the data that is needed which can be further refined and the basis for what indexes will be most effective. The alternative is trying to predict what queries will be used and giving those limited selections to the services to consume with the intention of them requesting further refined "flavours". This often reverts to services running less efficient queries more often to get their data when it's more trouble to introduce new queries into the repositories.
Considering the blogging data model:
Blog:
int Id
ICollection<Post> Posts
Post:
int Id
int BlogId
DateTime Date
I then want to load Blogs with the date of their latest post (LatestPostDate) and bind them to the UI, while they are tracked by the context.
There are some solutions, such as using a DTO, but then the resulting objects are not tracked by the context.
I could also make LatestPostDate [NotMapped], define a table-valued function, and use SqlQuery on the DbSet; however, [NotMapped] fields are not loaded that way.
What are the best practices?
I'd rather not add a column to the table, and I also want to avoid calculating the values after loading.
Best practice would be to handle display concerns in a ViewModel.
But as you do not want to map the Entity to another class, let's first take a look at the [NotMapped] variant, using LINQ to calculate the latest post date instead of plain SQL.
using System.Linq;
public class Blog {
public int Id { get; set; }
public virtual ICollection<Post> Posts { get; set; }
[NotMapped]
public DateTime? LatestPostDate {
get {
return Posts.OrderBy(p => p.Date).LastOrDefault()?.Date;
}
}
}
This way, the value is calculated only when you access the property LatestPostDate (probably during UI rendering). You can reduce the number of DB accesses by eager loading the Posts, although this will increase the size of the data set you are working with.
var blogs = _dbContext.Blogs.Include(b => b.Posts).ToArray();
But if you use a ViewModel, you can fill the LatestPostDate in one go:
public class BlogViewModel {
public int Id { get; set; }
public DateTime? LatestPostDate { get; set; }
}
var viewModels = _dbContext.Blogs.Select(b => new BlogViewModel {
    Id = b.Id,
    // ?. and LastOrDefault are not usable in an expression tree, so project
    // to a nullable DateTime and take the newest value instead.
    LatestPostDate = b.Posts.OrderByDescending(p => p.Date)
        .Select(p => (DateTime?)p.Date).FirstOrDefault()
}).ToArray();
Regarding your concerns that the ViewModel is not tracked by the context: in the edit use case, load the Entity again using the Id provided by the ViewModel and map the updated properties across. This gives you full control over which properties should be editable. As a bonus, the ViewModel is a good place to do input validation, formatting, etc.
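A sketch of that edit flow, assuming the BlogViewModel above and a hypothetical editable Name property on Blog:
// Re-load the tracked entity by the ViewModel's Id, copy the editable fields,
// and let the change tracker persist only what actually changed.
var blog = _dbContext.Blogs.Single(b => b.Id == viewModel.Id);
blog.Name = viewModel.Name; // hypothetical editable property
_dbContext.SaveChanges();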