What specific issue does the repository pattern solve? - entity-framework

(Note: My question has very similar concerns as the person who asked this question three months ago, but it was never answered.)
I recently started working with MVC3 + Entity Framework and I keep reading that the best practice is to use the repository pattern to centralize access to the DAL. This is also accompanied with explanations that you want to keep the DAL separate from the domain and especially the view layer. But in the examples I've seen the repository is (or appears to be) simply returning DAL entities, i.e. in my case the repository would return EF entities.
So my question is, what good is the repository if it only returns DAL entities? Doesn't this add a layer of complexity that doesn't eliminate the problem of passing DAL entities around between layers? If the repository pattern creates a "single point of entry into the DAL", how is that different from the context object? If the repository provides a mechanism to retrieve and persist DAL objects, how is that different from the context object?
Also, I read in at least one place that the Unit of Work pattern centralizes repository access in order to manage the data context object(s), but I don't grok why this is important either.
I'm 98.8% sure I'm missing something here, but from my readings I didn't see it. Of course I may just not be reading the right sources... :\

I think the term "repository" is commonly thought of in the way the "repository pattern" is described by the book Patterns of Enterprise Application Architecture by Martin Fowler.
A Repository mediates between the domain and data mapping layers,
acting like an in-memory domain object collection. Client objects
construct query specifications declaratively and submit them to
Repository for satisfaction. Objects can be added to and removed from
the Repository, as they can from a simple collection of objects, and
the mapping code encapsulated by the Repository will carry out the
appropriate operations behind the scenes.
On the surface, Entity Framework accomplishes all of this, and can be used as a simple form of a repository. However, there can be more to a repository than simply a data layer abstraction.
According to the book Domain Driven Design by Eric Evans, a repository has these advantages:
They present clients with a simple model for obtaining persistence objects and managing their life cycle
They decouple application and domain design from persistence technology, multiple database strategies, or even multiple data sources
They communicate design decisions about object access
They allow easy substitution of a dummy implementation, for unit testing (typically using an in-memory collection).
The first point roughly equates to the paragraph above, and it's easy to see that Entity Framework itself easily accomplishes it.
Some would argue that EF accomplishes the second point as well. But commonly EF is used simply to turn each database table into an EF entity, and pass it through to UI. It may be abstracting the mechanism of data access, but it's hardly abstracting away the relational data structure behind the scenes.
In simpler applications that mostly data oriented, this might not seem to be an important point. But as the applications' domain rules / business logic become more complex, you may want to be more object oriented. It's not uncommon that the relational structure of the data contains idiosyncrasies that aren't important to the business domain, but are side-effects of the data storage. In such cases, it's not enough to abstract the persistence mechanism but also the nature of the data structure itself. EF alone generally won't help you do that, but a repository layer will.
As for the third advantage, EF will do nothing (from a DDD perspective) to help. Typically DDD uses the repository not just to abstract the mechanism of data persistence, but also to provide constraints around how certain data can be accessed:
We also need no query access for persistent objects that are more
convenient to find by traversal. For example, the address of a person
could be requested from the Person object. And most important, any
object internal to an AGGREGATE is prohibited from access except by
traversal from the root.
In other words, you would not have an 'AddressRepository' just because you have an Address table in your database. If your design chooses to manage how the Address objects are accessed in this way, the PersonRepository is where you would define and enforce the design choice.
Also, a DDD repository would typically be where certain business concepts relating to sets of domain data are encapsulated. An OrderRepository may have a method called OutstandingOrdersForAccount which returns a specific subset of Orders. Or a Customer repository may contain a PreferredCustomerByPostalCode method.
Entity Framework's DataContext classes don't lend themselves well to such functionality without the added repository abstraction layer. They do work well for what DDD calls Specifications, which can be simple boolean expressions sent in to a simple method that will evaluate the data against the expression and return a match.
As for the fourth advantage, while I'm sure there are certain strategies that might let one substitute for the datacontext, wrapping it in a repository makes it dead simple.
Regarding 'Unit of Work', here's what the DDD book has to say:
Leave transaction control to the client. Although the REPOSITORY will insert into and delete from the database, it will ordinarily not
commit anything. It is tempting to commit after saving, for example,
but the client presumably has the context to correctly initiate and
commit units of work. Transaction management will be simpler if the
REPOSITORY keeps its hands off.

Entity Framework's DbContext basically resembles a Repository (and a Unit of Work as well). You don't necessarily have to abstract it away in simple scenarios.
The main advantage of the repository is that your domain can be ignorant and independent of the persistence mechanism. In a layer based architecture, the dependencies point from the UI layer down through the domain (or usually called business logic layer) to the data access layer. This means the UI depends on the BLL, which itself depends on the DAL.
In a more modern architecture (as propagated by domain-driven design and other object-oriented approaches) the domain should have no outward-pointing dependencies. This means the UI, the persistence mechanism and everything else should depend on the domain, and not the other way around.
A repository will then be represented through its interface inside the domain but have its concrete implementation outside the domain, in the persistence module. This way the domain depends only on the abstract interface, not the concrete implementation.
That basically is object-orientation versus procedural programming on an architectural level.
See also the Ports and Adapters a.k.a. Hexagonal Architecture.
Another advantage of the repository is that you can create similar access mechanisms to various data sources. Not only to databases but to cloud-based stores, external APIs, third-party applications, etc.

You're right,in those simple cases the repository is just another name for a DAO and it brings only one value: the fact that you can switch EF to another data access technique. Today you're using MSSQL, tomorrow you'll want a cloud storage. OR using a micro orm instead of EF or switching from MSSQL to MySql.
In all those cases it's good that you use a repository, as the rest of the app won't care about what storage you're using now.
There's also the limited case where you get information from multiple sources (db + file system), a repo will act as the facade, but it's still a another name for a DAO.
A 'real' repository is valid only when you're dealing with domain/business objects, for data centric apps which won't change storage, the ORM alone is enough.

It would be useful in situations where you have multiple data sources, and want to access them using a consistent coding strategy.
For example, you may have multiple EF data models, and some data accessed using traditional ADO.NET with stored procs, and some data accessed using a 3rd party API, and some accessed from an Access database living on a Windows NT4 server sitting under a blanket of dust in your broom closet.
You may not want your business or front-end layers to care about where the data is coming from, so you build a generic repository pattern to access "data", rather than to access "Entity Framework data".
In this scenario, your actual repository implementations will be different from each other, but the code that calls them wouldn't know the difference.

Given your scenario, I would simply opt for a set of interfaces that represent what data structures (your Domain Models) need to be returned from your data layer. Your implementation can then be a mixture of EF, Raw ADO.Net or any other type of Data Store/Provider. The key strategy here is that the implementation is abstracted away from the immediate consumer - your Domain layer. This is useful when you want to unit test your domain objects and, in less common situations - change your data provider / database platform altogether.
You should, if you havent already, consider using an IOC container as they make loose coupling of your solution very easy by way of Dependency Injection. There are many available, personally i prefer Ninject.
The domain layer should encapsulate all of your business logic - the rules and requirements of the problem domain, and can be consumed directly by your MVC3 web application. In certain situations it makes sense to introduce a services layer that sits above the domain layer, but this is not always necessary, and can be overkill for straightforward web applications.

Another thing to consider is that even when you know that you will be working with a single data store it still might make sense to create a repository abstraction. The reason is that there might be a function that your application needs that your ORM du jour either does badly (performance), not at all, or you just don't know how to make the ORM bend to your needs.
If you are wrapping your ORM behind a well thought out repository interface, you can easily switch between different technologies as you see fit. It's not uncommon in my repositories to see some methods use EF for their work and others to use something like PetaPoco, or (gasp) ADO.net code. The repository abstraction enables you to use exactly the right tool for the job at hand without leaking these complexities into the client code.

I think there is a big misunderstanding of what many articles call "repository." And that's why there are doubts about what real value those abstractions bring.
In my opinion the repository in it's pure form is IEnumerable, while you and many articles are talking about "data access service."
I've blogged about it here.

Related

Entity Framework 6 Database-First and Onion Architecture

I am using Entity Framework 6 database-first. I am converting the project to implement the onion architecture to move towards better separation of concerns. I have read many articles and watched many videos but having some issues deciding on my solution structure.
I have 4 projects: Core, Infrastructure, Web & Tests.
From what I've learned, the .edmx file should be placed under my "Infrastructure" folder. However, I have also read about using the Repository and Unit of Work patterns to assist with EF decoupling and using Dependency Injection.
With this being said:
Will I have to create Repository Interfaces under CORE for ALL entities in my model? If so, how would one maintain this on a huge database? I have looked into automapper but found issues with it presenting IEnumererables vs. IQueryables but there is an extension available it has to hlep with this. I can try this route deeper but want to hear back first.
As an alternative, should I leave my edmx in Infrastructure and move the .tt T4 files for my entities to CORE? Does this present any tight coupling or a good solution?
Would a generic Repository interface work well with the suggestion you provide? Or maybe EF6 already resolves the Repository and UoW patterns issue?
Thank you for looking at my question and please present any alternative responses as well.
I found a similar post here that was not answered:
EF6 and Onion architecture - database first and without Repository pattern
Database first doesn't completely rule out Onion architecture (aka Ports and Adapters or Hexagonal Architecture, so you if you see references to those they're the same thing), but it's certainly more difficult. Onion Architecture and the separation of concerns it allows fit very nicely with a domain-driven design (I think you mentioned on twitter you'd already seen some of my videos on this subject on Pluralsight).
You should definitely avoid putting the EDMX in the Core or Web projects - Infrastructure is the right location for that. At that point, with database-first, you're going to have EF entities in Infrastructure. You want your business objects/domain entities to live in Core, though. At that point you basically have two options if you want to continue down this path:
1) Switch from database first to code first (perhaps using a tool) so that you can have POCO entities in Core.
2) Map back and forth between your Infrastructure entities and your Core objects, perhaps using something like AutoMapper. Before EF supported POCO entities this was the approach I followed when using it, and I would write repositories that only dealt with Core objects but internally would map to EF-specific entities.
As to your questions about Repositories and Units of Work, there's been a lot written about this already, on SO and elsewhere. You can certainly use a generic repository implementation to allow for easy CRUD access to a large set of entities, and it sounds like that may be a quick way for you to move forward in your scenario. However, my general recommendation is to avoid generic repositories as your go-to means of accessing your business objects, and instead use Aggregates (see DDD or my DDD course w/Julie Lerman on Pluralsight) with one concrete repository per Aggregate Root. You can separate out complex business entities from CRUD operations, too, and only follow the Aggregate approach where it is warranted. The benefit you get from this approach is that you're constraining how the objects are accessed, and getting similar benefits to a Facade over your (large) set of database entities.
Don't feel like you can only have one dbcontext per application. It sounds like you are evolving this design over time, not starting with a green field application. To that end, you could keep your .edmx file and perhaps a generic repository for CRUD purposes, but then create a new code first dbcontext for a specific set of operations that warrant POCO entities, separation of concerns, increased testability, etc. Over time, you can shift the bulk of the essential code to use this, while still keeping the existing dbcontext so you don't lose and current functionality.
I am using entity framework 6.1 in my DDD project. Code first works out very well if you want to do Onion Architecture.
In my project we have completely isolated Repository from the Domain Model. Application Service is what uses repository to load aggregates from and persist aggregates to the database. Hence, there is no repository interfaces in the domain (core).
Second option of using T4 to generate POCO in a separate assembly is a good idea. Please remember that your domain model (core) should be persistence-ignorant.
While generic repository are good for enforcing aggregate-level operations, I prefer using specific repository more, simply because not every Aggregate is going to need all of those generic repository operations.
http://codingcraft.wordpress.com/

Repository and IoC Patterns

Previously I asked this question and on a answer I got this comment:
This works, however injecting the container to a part, as far as I know, is not a "normal" use-case of MEF.
In my web app I have a few repositories that, of course, retrieve entities from the DB. To make them as loosely coupled as possible I'm making them return interfaces (eg IUser, IBill, IBlaBlaBla...) and the repository project only references the library project (that contains the interfaces). I use MEF composition capabilities to tie it all up...
Since the repository must have a concrete object to fill with info it got from the DB and the only the Container is aware of which concrete class maps to a specific interface I think that the repository MUST have reference to the container so it can call the "Resolve()", get a new instance and do his job, but that apparently is a mistake.
Can anyone tell me why and what approach would be better?
PS: I don't know if it's relevant but I'm using DDD...
I think the flaw in your design that lead to this problem is the use of interfaces to hide entities behind. Since entities are your core concept in your domain, there should be no use in hiding them behind an abstraction. Let's put it differently: do you ever have a different implementation of IUser?
In other words, ditch the IUser, IBill, etc. interface and let your repositories and business commands depend directly on your aggregate entities.

Non business logic database access in DDD

just starting with DDD. I understand that all the domain entities and their logic should be kept in Domain layer along with repository interfaces. In my application I store some data in the database that is used to construct parts of the user interface (presentation layer) at run time. Where should I keep the poco classes and repository interfaces for such types, in Application layer or Domain layer. I can not make out because these objects will not have any domain related business logic but they need to be hydrated from the database and I am using EF so for data access I will need entities and repositories so to me obvious choice is to keep them with all other entities and repository interfaces in domain layer but that breaks DDD rules
I think you've answered your own question:
In my application I store some data in the database that is used to
construct parts of the user interface (presentation layer) at run
time.
Access to data that is used to serve the UI should be encapsulated in the presentation layer. This is different from the application layer in that a DDD application layer implements use cases by orchestrating repositories and delegating to appropriate entities, domain services - it forms an API around your domain layer. Different presentation implementations can use the same application service.
However, different presentation layer implementations may require different data. Therefore, place presentation data access directly in the presentation layer. This can be implemented with EF which is not exclusive for DDD scenarios.
A good way to think of it is this - if you had to replace EF with stored procedures for performance reasons, you should not have to modify your domain code. So anything that blocks that needs to be hidden behind an interface and made replaceable.
Repositories CAN be a part of domain logic. However what should not be is the mechanism for persistence. This can easily be encapsulated into a DAL service or other object, which of course is programmed to an IDALService interface. So when you need to switch persistance (for example, moving from EF to a NoSQL solution) you simply write an alternative version that implements the IDALService interface, and you're good to go. The repositories can still do the logic within them, but are now using a new way to store them (this is an Inversion of Control idea).
As for the POCO objects, in DDD the real question is what do they represent? Are they entities, that need to be persisted? Are they value objects? Don't let the technology (EF) determine what the structure of your object model is, instead provide means to translate to an application scope.
In the Application layer. Keep the domain intact and isolated.

Considering the following architectural changes and need some advice (Domain Entities, DTO, Aggregates)

about a year ago I set set up a solution consisting of an ASP.Net MVC 3 (now) presentation layer, application layer, domain layer and infrastructure layer (crosscutting stuff and data). I decided to keep the domain model in a separate project from the domain logic and use a relaxed approach to the presentation layer by passing the domain entities instead of DTO's since we really only have 1 front end right now.
We are going to be servicing a distributed layer soon, in addition to our main website and I will use DTO's there, but I am considering using DTO's in the main website also. I am also wondering if I should bother to break out the framework code in the domain layer (IRepository, IUnitOfWork, Entity/Value object supertypes etc). Well here, let me list out the questions I need feedback on:
1) I was pretty diligent about not having an anemic domain model and also watched out for behavior that was specific to the presentation concerns. Most of the business calculations that are needed are on the domain entities, is it ok for the presentation layer to call this behavior directly or should it instead call an application service that then calls the domain entities? This would suggest to me that there is no reason to have the presentation layer know about the domain entities and instead could use DTO's. Alternatively, I could have the DTO's expose these behaviors, but then I feel like I am robbing the domain entities. So I guess that is 3 options (Rich domain objects called directly, service layer or dto with behavior) which is best?
2) Right now I have a domain project, which has domain services, specifications and logic and is orchestrated by the application layer and separate project for the domain model (used by presentation layer and application layer). I also have framework interfaces for generic repository and unit of work pattern here. Should I break the framework stuff out into a separate project and combine the rest into one project?
3) I want to reorganize my domain layer into aggregates, right now all of the domain model is organized by modules, basically all the types for each module are in one namespace. Would it be better to organize the entities, value objects, services and other stuff by the aggregates?
4) Should I use the Separated Interface pattern for infrastructure services that are basically .net framework helper library types? For example configuration objects or validation runners? What is the benefit there in doing so?
5) Lastly, not many examples I have seen have used interfaces for domain entities. Almost every object I have I prefer to pass around interfaces for dependency reasons and it makes testing much easier. Is it valid to use interfaces instead of concretes? I should mention that we use EF 4.3.1 (soon to upgrade to latest version) and I seem to remember that EF had a problem with using interfaces or something. Should I be exposing interfaces instead of the domain entities?
Thank you very much in advance.
Project Structure:
Presentation.Web
| |
| Application
| | |
Domain.Model - Domain
(Infrastructure.Data, Infrastructure.Core, Infrastructure.Security)
Explanation:
Presentation.Web (MVC3 Web Project)
Application
-- Service Layer that orchestrates the domain layer and responds to requests from the presentation layer (get this update that). This is organized by module, for example if I had a customer module I would have Application.Customer and in that would be all of the application services
Domain
-- Contains domain services, specifications, calculations and other domain logic that is not exposed as behavior on domain entities. For example a calculation that involves several domain entities exposed as a domain service for the application layer to call.
-- Also contains framework code for a specification framework and the main interfaces for a generic repository and unit of work pattern.
Domain.Model
-- Contains the domain entities and enumerations. Organized by module. For example, if I might have a customer module which has a customer entity, customerorder entity etc. This is broken out away from the domain project so that the objects can be used by the application and presenation layer.
Infrastructure.Security
-- Security infrastructure for authentication and authorization
Infrastructure.Core
-- Cross-cutting stuff used by multiple layers (validators, logging, configuration, extensions, IoC, email etc..). Most of the projects depend on interfaces in this project (except domain.model) for infrastructure services.
Infrastructure.Data
-- Repository Implementations via LINQ and EF 4.3.1, mapping layer, Unit of Work implementation. Interfaces are in Domain project (separated interfaces pattern)
1) First, determine whether your main website really needs to use the application layer. IMHO, if your application services and your main website are on the same web server, then you should evaluate whether the potential performance loss is worth having your main website call app server methods when it could call the domain objects directly. However, if your application server is definitely on another server, then yes, you should have the application server call your domain objects and pass only DTOs back and forth between it and any presentation layers you may have, including your main website.
2) This is really a question on preference of organization. Both are valid. You choose.
3) Anoter question on preference of organization. I, personally, organize my code by bounded context first. Then, I have entities and aggregate roots directly under them. Then, I have folders for Enumerations, Repositories (interfaces), Services (interfaces), Specifications, and Values. The namespaces do not reflect this organizational structure past the last bounded context folder. But, again, you should do this in the way that best suits the way you look at the code.
4) This is an implementation concern. I, personally, only break out implementation concerns into interfaces if I think there is a good possibility that I will need to swap out the implementations in the future. That being said, I usually organize my helper libraries into specific infrastructure contexts (eg. MainContext.Web.MVC.Helpers or MainContext.Web.WebForms.Helpers.) These rarely change and I have yet to come across an instance where I needed to swap out implementations entirely.
5) From my understanding, it is perfectly valid to use interfaces instead of concretes for your domain entities. That being said, I have yet to run into a case where I needed different implementations for my domain entities. The only reason I can even think of would be if you needed to change your business logic for one application, but leave an older application using the original business logic. If your business objects are good models for the domain, I can't fathom you actually running into this problem, but I have seen examples where people do this just for the sake of the abstraction. IMHO, that is not worth the extra coding effort, but if it makes you feel good inside or you get some actual benefit (eg. making testing easier), there isn't any reason why you can't abstract out your domain entities. That being said, domain services and repositories should definitely have contracts that allows you to swap out their implementations.
Answer 5 is derived from the idea that the application is the one who chooses the implementations. If you are trying to achieve onion architecture, then your application is going to be choosing the concrete implementations for everything (repositories, domain services, and other abstracted implementation concerns). I see no reason why it can't just use domain aggregates directly since they are the concrete representation of your domain model. (Note: All entities should be encapsulated into aggregates. The application should never be able to hold a reference to an entity that is not an aggregate under the context)

EF + WCF in three-layered application with complex object graphs. Which pattern to use?

I have an architectural question about EF and WCF.
We are developing a three-tier application using Entity Framework (with an Oracle database), and a GUI based on WPF. The GUI communicates with the server through WCF.
Our data model is quite complex (more than a hundred tables), with lots of relations. We are currently using the default EF code generation template, and we are having a lot of trouble with tracking the state of our entities.
The user interfaces on the client are also fairly complex, sometimes an object graph with more than 50 objects are sent down to a single user interface, with several layers of aggregation between the entities. It is an important goal to be able to easily decide in the BLL layer, which of the objects have been modified on the client, and which objects have been newly created.
What would be the clearest approach to manage entities and entity states between the two layers? Self tracking entities? What are the most common pitfalls in this scenario?
Could those who have used STEs in a real production environment tell their experiences?
STEs are supposed to solve this scenario but they are not silver bullet. I have never used them in real project (I don't like them) but I spent some time playing with them. The main pitfalls I found are:
Coupling your data layer with your client application - you must share entity assembly between projects (it also means it is .NET only solution but it should not be a problem in your case)
Large data transfers - you pass 50 entities to clients, client change single entity and you will pass 50 entities back. It will require some fighting with STEs to avoid passing unnecessary data
Unnecessary updates to database - normally when EF works with attached entities it track changes on property level but with STEs it track changes on entity level. So if user modify single property in entity with 100 properties it will generate update with setting all of them. It will require modifying template and adding property level change tracking to avoid this.
Client application should use STEs directly (binding STEs to UI) to get most of its self tracking ability. Otherwise you will have to implement code which will move data from UI back to self tracking entity and modify its state.
They are not proxied = they don't support lazy loading (in case of WCF service it is good behavior)
I described today the way to solve this without STEs. There is also related question about tracking over web services (check #Richard's answer and provided links).
We have developed a layered application with STE's. A user interface layer with ASP.NET and ModelViewPresenter, a business layer, a WCF service layer and the data layer with Entity Framework.
When I first read about STE's the documentation said that they are easier then using custom DTO's. They should be the 'quick and easy way' and that only on really big projects you should use hand written DTO's.
But we've run in a lot of problems using STE's. One of the main problems is that if your entities come from multiple service calls (for example in a master detail view) and so from different contexts you will run into problems when composing the graphs on the server and trying to save them. So our server function still have to check manually which data has changed and then recompose the object graph on the server. A lot has been written about this topic but it's still not easy to fix.
Another problem we ran into was that the STE's wouldn't work without WCF. The change tracking is activated when the entities are serialized. We've originally designed an architecture where WCF could be disabled and the service calls would just be in process (this was a requirement for our unit tests, which would run a lot faster without wcf and be easier to setup). It turned out that STE's are not the right choice for this.
I've also noticed that developers sometimes included a lot of data in their query and just send it to the client instead of really thinking about which data they needed.
After this project we've decided to use custom DTO's with automapper from server to client and use the POCO template in our data layer in a new project.
So since you already state that your project is big I would opt for custom DTO's and service functions that are a specifically created for one goal instead of 'Update(Person person)' functions that send a lot of data
Hope this helps :)