Entity Framework, Projections, and Relations (oh my?)

So, I'm working on new software, but I have no choice but to brownfield the database. I would like to use Entity Framework where it makes sense.
Here's my dilemma:
Since the tables are very wide, and I can't change this, I will probably make heavy use of projection to limit the width of the datasets that I query.
I do want to make use of navigation properties where it makes sense.
From what I've seen, a lot of people use a model where there is a single DbContext class for the whole project.
So, I'm weighing these pros and cons, and I'm wondering what the established best practices might be:
Use 1 DbContext.
There could be A LOT of "pollution" here, with bunches of projections of the data inside of the 1 context class. This sounds like it could become a maintenance nightmare.
Don't make my projections DbSets at all -- just make them plain old objects and select new MyProject {..} into them (see the sketch after this list).
This offers the benefit of keeping my projections in module-specific assemblies and namespaces, but now I get NO navigation properties, lazy loading, etc.
Be evil?? and use multiple DbContexts?
I'm not really sure what the maintenance story looks like here, but I'm kind of starting to lean in this direction. My biggest problem with it is that it feels like I'm swimming against the current -- not many people seem to do this, but for a large system, it seems like it could be the best option.
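For what it's worth, option 2 usually ends up looking something like the sketch below. This is only an illustration: the Customer entity, its IsActive property, and the CustomerSummary class are assumed names, not anything from the question.

// Narrow, module-specific projection type; it is not a DbSet and EF never tracks it.
public class CustomerSummary
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Query the wide table, but only pull back the columns this module needs.
var summaries = context.Customers
    .Where(c => c.IsActive)
    .Select(c => new CustomerSummary { Id = c.Id, Name = c.Name })
    .ToList();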
Thoughts?

I think you should use POCOs or DTOs for data transfer between the different layers of the application, and use a ViewModel to send data to the View.
Consider using the Repository pattern and Unit of Work to get a better, more efficient architecture in this scenario. Limit the use of navigation properties to the repository layer; otherwise they make the entities heavy when you transfer them across layers (use POCOs or DTOs instead).
If you are doing the above, then I do not think using multiple DbContexts would give you any benefit. Thanks.
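A rough sketch of that layering, with assumed names (ProductDto, ProductRepository, and MyDbContext are illustrative, not from the answer):

// Plain DTO used to move data between layers; no EF types or navigation
// properties leak out of the repository.
public class ProductDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class ProductRepository
{
    private readonly MyDbContext _context;

    public ProductRepository(MyDbContext context)
    {
        _context = context;
    }

    public List<ProductDto> GetActiveProducts()
    {
        // Project straight into the DTO so only the needed columns are queried
        // and nothing EF-specific crosses the layer boundary.
        return _context.Products
            .Where(p => p.IsActive)
            .Select(p => new ProductDto { Id = p.Id, Name = p.Name, Price = p.Price })
            .ToList();
    }
}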

Related

Entity Framework: multiple dbcontext or not? And a few other performance-related questions

I'm building a calendar/entry/statistics application using quite complex models with a large number of relationships between them.
In general I'm concerned about performance, and I'm considering different strategies and looking for input before implementing the application.
I'm completely new to DbContext pooling, so please excuse possible stupid questions. But does DbContext pooling have anything to do with the use of multiple DbContext classes, or is it about improved performance regardless of whether there is a single DbContext or several?
I might end up implementing a large number of DbSets -- or should I avoid that? I'm considering creating multiple DbContext classes for simplicity, but will this reduce memory use and improve performance? Would it be better/smarter to split the application into smaller projects?
Is there any performance difference in using IEnumerable vs ICollection? I’m avoiding the use of lists as much as possible. Or would it be even better to use IAsyncEnumerable?
Most of your performance pain points will come from a complex architecture that you have not simplified. Managing a monolithic application results in lots of unnecessary 'compensating' logic when one use case treads on the toes of another and your relationships are so intertwined.
Optimisations such as whether to use Context Pooling or IEnumerable vs ICollection can come later. They should not affect the architecture of your solution.
If your project is as complex as you suggest, then I'd recommend you read up on Domain Driven Design and Microservices and break your application up into several projects (or groups of projects).
https://learn.microsoft.com/en-us/dotnet/architecture/microservices/microservice-ddd-cqrs-patterns/
Each project (or group of projects) will have its own DbContext to administer the entities within that project.
Further, each DbContext should start off by only exposing aggregate roots through its DbSets. This can mean more database activity than is strictly necessary for a particular use case, but it's best to start with a clean architecture and squeeze out that last ounce of performance (sometimes at the cost of architectural clarity) if and when needed.
For example if you want to add an attendee to an appointment, it can be appealing to attack the Attendee table directly. But, to keep things clean, and considering an attendee cannot exist without an appointment, then you should make appointment the aggregate root and only expose appointment as an entry point for the outside world to attack. That appointment can be retrieved from the database with its attendees. Then ask the appointment to add the attendees. Then save the appointment graph by calling SaveChanges on the DbContext.
In summary, your Appointment is responsible for the functionality within its graph. You should ask Appointment to add an Attendee to the list instead of adding an Attendee to the Appointment's list of attendees. A subtle shift in thinking that can reduce complexity of your solution an awful lot.
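A minimal sketch of that shape; the class and property names here are assumed for illustration, and only the aggregate root gets a DbSet:

public class Appointment
{
    public int Id { get; set; }
    public DateTime Start { get; set; }
    public List<Attendee> Attendees { get; set; } = new List<Attendee>();

    // The aggregate root owns the behaviour for its whole graph.
    public void AddAttendee(string name)
    {
        Attendees.Add(new Attendee { Name = name });
    }
}

public class Attendee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CalendarContext : DbContext
{
    // Only the aggregate root is exposed; attendees are reached through it.
    public DbSet<Appointment> Appointments { get; set; }
}

// Usage (requires Microsoft.EntityFrameworkCore for the lambda Include):
// load the appointment with its attendees, ask it to change, then save the graph.
// var appointment = context.Appointments.Include(a => a.Attendees).First(a => a.Id == id);
// appointment.AddAttendee("Alice");
// context.SaveChanges();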
The art is deciding where those boundaries between microservices/contexts should lie. There can be pros and cons to two different architectures, with no clear winner.
To your other questions:
DbContext Pooling is about maintaining a pool of ready-to-go instantiated DbContexts. It saves the overhead of repeated DbContext instantiation. Probably not worth it, unless you have an awful lot of separate requests coming in and your profiling shows that this is a pain point.
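If you do decide to use it, enabling pooling in EF Core is a one-line change at registration time. A sketch, where MyDbContext and connectionString are placeholders:

// AddDbContextPool keeps a pool of DbContext instances and resets/reuses them,
// instead of constructing a fresh context for every request.
services.AddDbContextPool<MyDbContext>(options =>
    options.UseSqlServer(connectionString));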
The number of DbSets required is alluded to above.
As for IEnumerable or ICollection or IList, it depends on what functionality you require. Here's a nice simple summary ... https://medium.com/developers-arena/ienumerable-vs-icollection-vs-ilist-vs-iqueryable-in-c-2101351453db
Would it be better/smarter to split the application into smaller projects?
Yes, absolutely! Start with architectural clarity and then tweak for performance benefits where and when required. Don't start with performance as the goal (unless you're building a millisecond sensitive solution).

Moving logic from Template Toolkit to Catalyst

I think that I am using too many conditionals and calculations in the TT templates.
I am displaying a result set of items from DBIC. For each item I need to calculate things using the retrieved values, and the template doesn't seem to be the right place for that.
But what I have in Catalyst is a thick object that comes from DBIC.
So how can I move the logic into the model? Must I run a whole loop over all items and change the objects somehow?
Regards,
Migue
First, you're on the right track by wanting to properly separate concerns. You'll thank yourself if you're the maintainer 6-12 months down the road.
IMHO, your Catalyst controllers should be as thin as possible with the business logic in the various models. This makes it easier to test because you don't have the overhead of Catalyst to deal with. I've been thinking about model separation quite a bit myself. There are two schools of thought I've come across:
1) Make your DBIx::Class Result classes have the business logic. This approach is convenient and simple.
2) Make a standalone Model which is instantiated by the Controller, and which has a DBIx::Class schema object. The model would use the DBIC schema to query the database, and then use the resulting data in its own business logic methods. This approach might be better if you have a lot of business logic since you separate DB access from business logic.
Personally, I've historically used approach #1 but I'm leaning towards #2 for larger apps.
Two possibilities:
Create a method in the corresponding schema class.
(If 1 is not possible) Pass a callback to the template that takes this object as an argument.
You could
create a resultset that retrieves the data from the database and then calculates the needed values, or
if possible, calculate the needed values within the database and then only retrieve the data needed for output.
I personally would prefer the second one.
I hope that helps.

Rules of thumb for writing "queries" using ADO.NET Entity Framework

I'm currently working on a prototype of a medium-sized web application, and I thought it would be good to also experiment with Entity Framework. The problem is that the major part of the application is not the data layer and logic, so I don't have much time to play with Entity Framework. On the other hand, the database schema is quite simple.
One of the problems I’m facing is that I cannot find a consistent way to "write queries". As far as I can tell, there are four "interfaces" for the job:
LINQ to Entities
LINQ to Entities using LINQ extension methods
Entity SQL
Query builder
OK, the first two are essentially the same, but it’s good to use just one for maintenance and consistency.
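To make the comparison concrete, here is roughly the same query in each style. This is only a sketch against an assumed Customer entity set; "MyEntities" stands in for the model's container name, and ObjectParameter lives in System.Data.Objects.

// 1) LINQ to Entities, query syntax
var q1 = from c in context.Customers
         where c.City == "London"
         select c;

// 2) LINQ to Entities, extension-method (lambda) syntax
var q2 = context.Customers.Where(c => c.City == "London");

// 3) Entity SQL via ObjectContext.CreateQuery
var q3 = context.CreateQuery<Customer>(
    "SELECT VALUE c FROM MyEntities.Customers AS c WHERE c.City = @city",
    new ObjectParameter("city", "London"));

// 4) Query builder methods on ObjectQuery ("it" is the default alias)
var q4 = context.Customers.Where("it.City = @city",
    new ObjectParameter("city", "London"));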
I'm mostly puzzled by the fact that none of them seems to be both complete and fully general. I often find myself cornered, using some ugly-looking combination of several of them. My guess is that Entity SQL is the most general one, but writing queries using strings feels like a step back. The main reason I'm experimenting with something like Entity Framework is that I like the compile-time checking.
Some other random thoughts / issues:
I often also use the ObjectQuery.Include() method, but again it takes a string. Is this the only way?
When to use ObjectQuery.Execute() (vs. ToList())? Does it actually execute the query?
Should I execute queries as soon as possible (e.g. using ToList()), or should I not care and just leave the execution to whichever first enumeration comes along?
Are ObjectQuery.Skip() and ObjectQuery.Take() available only as extension methods? Is there a better way to do paging? It’s 2009 and almost every web application deals with paging.
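For reference, Skip and Take used this way are the standard LINQ extension methods, and a typical page query looks something like the sketch below (Customer and Name are assumed; note that you must order before Skip/Take so the paging is deterministic):

int pageIndex = 2, pageSize = 20;

// Order, skip past the earlier pages, then take one page's worth of rows.
var page = context.Customers
    .OrderBy(c => c.Name)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();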
Overall, I understand there are many difficulties in implementing an ORM, and often one has to compromise. On the other hand, direct database access (e.g. ADO.NET) is plain and simple and has a well-defined interface (tabular results, data readers), so all code -- no matter who writes it and when -- is consistent. I don't want to be faced with too many choices whenever I write a database query. It's too tedious, and more than likely different developers will come up with different ways.
What are your rules of thumb?
I use LINQ to Entities as much as possible. I also try to standardise on the lambda form, as opposed to the extended SQL-style syntax. I have to admit to having had problems enforcing relationships and making compromises on efficiency just to expedite my coding of our application (e.g. Master->Child tables may need to be manually loaded), but all in all, EF is a good product.
I do use EF's .Include() method for eager loading, which, as you say, requires a string input. I find no problem with this, other than having to identify the string to use, which is relatively simple. I guess if you're keen on compile-time checking of such relations, a model similar to Parent.GetChildren() might be more appropriate.
My application does require some "dynamic" queries to be performed, though. I have two ways of meeting this:
a) I create a mediator object, eg. ClientSearchMediator, which "knows" how to search for clients by name, etc. I can then put this through a SearchHandler.Search(ISearchMediator[] mediators) call (for example). This can be used to target specific data structures and sort results accordingly using LINQ-to-Entities.
b) For a looser experience, possibly as a result of a user designing their own query (using high level tools our application provides), eSQL is ideal for this purpose. It can be made to be injection-safe.
I don't have enough knowledge to address all of this, but I'll at least take a few stabs.
I don't know why you think ADO.NET is more consistent than Entity Framework. There are many different ways to use ADO.NET and I've definitely seen inconsistency within a single code base.
Entity Framework is currently a 1.0 release and it suffers from many 1.0 type problems (incomplete & inconsistent API, missing features, etc.).
In regards to Include, I assume you are referring to eager loading. Multiple people (outside of Microsoft) have developed solutions for getting "type safe" includes (try googling something like: Entity Framework ObjectQueryExtension Include). That said, Include is more of a hint than anything. You can't force eager loading, and you always have to check IsLoaded to see whether your request was fulfilled. As far as I know, the way Include works is not changing at all in the next version of Entity Framework (4.0, to ship with VS 2010).
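Those "type safe" Include wrappers are usually just a thin extension method that turns a property expression into the string that Include expects. A minimal sketch of the idea (not the actual ObjectQueryExtension code; it only handles single-level paths):

using System;
using System.Data.Objects;
using System.Linq.Expressions;

public static class ObjectQueryExtensions
{
    // Turns query.Include(c => c.Orders) into query.Include("Orders").
    public static ObjectQuery<T> Include<T>(this ObjectQuery<T> query,
        Expression<Func<T, object>> path)
    {
        // Value-type members arrive wrapped in a Convert node; unwrap if needed.
        var member = path.Body as MemberExpression
                     ?? (MemberExpression)((UnaryExpression)path.Body).Operand;
        return query.Include(member.Member.Name);
    }
}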
As far as executing the Linq query as soon as it's built vs. the last possible moment, that decision is situational. Personally, I would probably execute it as soon as it's built for the most part unless there was a compelling reason not to, but I can see other people going the opposite direction.
There are more mature ORMs on the market and Entity Framework isn't necessarily your best option. For the most part, you can bend Entity Framework to your will, but you may end up rolling your own implementation of features that come out of the box with other ORMs.

Linq to SQL, Entity Framework, Repository Pattern, and Dependency Injection

Stephan Walters' video on MVC and Models is a very good and light discussion of the various topics listed in this question's title. The one question listed in the notes that went unanswered was:
If you create an Interface / Repository pattern for Linq2SQL, do Linq2SQL's classes still cause a dependency on LINQ, even though you pass the classes out via ToList?
It is probably an easy answer (YES); however, what standard mechanism would you use to represent the data?
Let's say you have a Product entity that is made up of three tables (Prices, Text, and Photos) (you could have sets of prices for different regions, different text for localization, and different photos). (Sounds like a builder pattern.) Would you create a slice of these tables, grabbing the right prices, text, and photos, into a single List? Since Lists may be proprietary, would you use a Dictionary object?
I thank you for your answers. I am very interested in the "standard and proper" way to do it rather than 101 possibilities.
Another quick question: is Entity Framework ready for a complicated database yet? There are a lot of constructs that Linq2SQL likes that EF does not. EF seems to require identity fields as primary keys (HAHA), but it seems like every demo does this. I want to use EF, but I constantly fail to make it work, falling back to Linq2SQL.
If you keep the L2S classes on the other side of the Repository facade (remember, that's all a Repository is - a facade), then you decouple the rest of your application from L2S. This means that the job of the code behind your repository is to turn the L2S classes into "domain" objects - custom classes - and then the Repository returns those.
In this sense, the Repository is returning fully formed "Product" objects with all their related Price, Text, and Photo data. This is called an Aggregate Root.
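In code, that facade might look roughly like the sketch below; MyDataContext, the table and column names, and the flattened Product shape are all assumptions made for illustration:

// Domain object assembled from the three underlying tables.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
    public string PhotoUrl { get; set; }
}

public class ProductRepository
{
    private readonly MyDataContext _db;

    public ProductRepository(MyDataContext db)
    {
        _db = db;
    }

    // Join the Price, Text and Photo tables behind the facade and hand back
    // a fully formed domain object; no L2S types escape the repository.
    public Product GetProduct(int id, string region, string language)
    {
        return (from price in _db.Prices
                join text in _db.Texts on price.ProductId equals text.ProductId
                join photo in _db.Photos on price.ProductId equals photo.ProductId
                where price.ProductId == id
                      && price.Region == region
                      && text.Language == language
                select new Product
                {
                    Id = price.ProductId,
                    Name = text.Name,
                    Price = price.Amount,
                    PhotoUrl = photo.Url
                }).FirstOrDefault();
    }
}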
There shouldn't be a problem with Lists, since they are CLR objects.
As far as EF for advanced scenarios, my advice would be not yet, for the reasons you note.
The standard mechanism I'd use to represent the data is a Data Transfer Object. I would never return a LINQ to SQL or Entity Framework object across a service boundary, and I would hesitate to return it across a layer boundary of any kind. This is because these objects will serialize implementation-dependent data.

What are the pros/cons of returning POCO objects from a Repository rather than EF Entities?

Following the way Rob does it, I have the classes that are generated by the Linq to SQL wizard, and then a copy of those classes that are POCOs. In my repositories I return these POCOs rather than the Linq to SQL models:
return from c in DataContext.Customer
       where c.ID == id
       select new MyPocoModels.Customer { ID = c.ID, Name = c.Name };
I understand that the benefit of this is that the POCO models can be instantiated more easily, so this will make my code more testable.
I'm now moving from Linq to SQL over to Entity Framework and I'm about half way through an EF book. It seems there's a lot of goodness I'm going to lose out on by returning POCOs from my repositories rather than the EF entities.
I still haven't really embraced unit testing, so I feel like I'm wasting a lot of time creating these extra POCOs and writing the code to populate them, when all I appear to be gaining is testable code, yet I'm also gonna lose out on a lot of the benefits of the EF by not being able to track my objects.
Does anyone have any advice for a relative newb to all this ORM/Repository stuff?
Anthony
Another reason people don't like the auto-generated objects (in LINQ to SQL for example) is because of their built-in "magic".
Usually the magic is invisible and you never notice it, but when you try to do things like serialize one of those objects and then deserialize it (for example when using web services) its internal connection to the data source is broken and special hacks need to be employed to "put the magic back in".
With POCOs, you don't have to worry about those sorts of things and can get a better separation between your data and service layers. The downside of course is that you have to write lots of boring POCO -> magic object and magic object -> POCO conversion code. But in the end I think it's usually worth it, especially for large or complex projects.
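That conversion code is usually just a pair of mapping helpers. A sketch, where CustomerEntity and CustomerPoco are placeholder names for the generated class and the hand-written POCO:

// Placeholder shapes: the generated LINQ to SQL entity and the plain POCO.
public class CustomerEntity { public int Id { get; set; } public string Name { get; set; } }
public class CustomerPoco   { public int Id { get; set; } public string Name { get; set; } }

public static class CustomerMappings
{
    // Generated entity -> plain POCO (safe to serialize across a service boundary).
    public static CustomerPoco ToPoco(this CustomerEntity entity)
    {
        return new CustomerPoco { Id = entity.Id, Name = entity.Name };
    }

    // POCO -> entity, e.g. before attaching it to a DataContext and saving.
    public static CustomerEntity ToEntity(this CustomerPoco poco)
    {
        return new CustomerEntity { Id = poco.Id, Name = poco.Name };
    }
}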
The main reason is that a lot of people like to develop their model with a specific mindset: like DDD for instance. They might want to use a specific pattern (like Spec or State) for things like statuses (instead of enums) - or you might want to use a Factory for instantiation.
OO breaks when you try to use Tables as Objects when things get more complex. Simple sites work OK - but when you get to big big things, it gets ugly.
So - as always - it depends what you think your project will turn into.
My experience is that when you start writing some complex queries, the .Include method is worthless and you will find yourself either:
a) writing a lot of queries to get the data you want, or
b) abusing anonymous types to load the data in a single query and then writing a lot of code just to pass that data to your entities.
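In practice, the second approach tends to look something like this sketch (the Order/Customer shapes and the OrderPoco/OrderLinePoco classes are assumed for illustration):

// One round trip: project everything needed into an anonymous type...
var rows = (from o in context.Orders
            where o.CustomerId == customerId
            select new
            {
                o.Id,
                o.OrderDate,
                CustomerName = o.Customer.Name,
                Lines = o.Lines.Select(l => new { l.ProductName, l.Quantity })
            }).ToList();

// ...then write the plumbing that copies the anonymous data into your own POCOs.
var orders = rows.Select(r => new OrderPoco
{
    Id = r.Id,
    OrderDate = r.OrderDate,
    CustomerName = r.CustomerName,
    Lines = r.Lines.Select(l => new OrderLinePoco
    {
        ProductName = l.ProductName,
        Quantity = l.Quantity
    }).ToList()
}).ToList();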
POCOs are the way to go, IMHO.