Understanding EF Code-First eager-loading - entity-framework

I've been working with EF Code First since EF 4.1 went live, more than a year ago, and I feel pretty comfortable working with it now. I'm used to custom entity validators, overriding .SaveChanges() to modify some of its behaviour, and to some non-trivial concepts like mapping to non-table db objects. But there's this part of EF that remains cloudy to me: context.Configuration.LazyLoadingEnabled = false;.
I understand the basics: LINQ queries will be sent to the database as soon as they are enumerated, dependent collections won't get loaded unless I explicitly request it, yadda yadda yadda.
What I would love to understand is:
In what situations should I disable lazy loading? And why?
What are the practical benefits and/or drawbacks of disabling it?
Any additional clarification is welcome.

EF v1 did not support lazy loading. When lazy loading was added in the second release (EF4), it could cause problems when porting apps written with EF v1 to EF4, because suddenly a lot more queries would be sent to the database where that had not happened before. Therefore the easiest way to make an EF v1 app work on EF4 was to disable lazy loading.
Another interesting thing is to look at how lazy loading is implemented - EF will dynamically create a type derived from your entity type and add some code to handle lazy loading. This means that EF is not actually using your type but a type derived from it. This is usually OK but sometimes may cause issues.
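You can actually observe the proxy at runtime. A minimal sketch (BlogContext and its Blogs set are made-up names):

using System;
using System.Data.Objects;
using System.Linq;

using (var ctx = new BlogContext())
{
    var blog = ctx.Blogs.First();
    // With proxy creation enabled, the runtime type is a dynamically
    // generated subclass with a mangled name, not Blog itself:
    Console.WriteLine(blog.GetType().Name);
    // ObjectContext.GetObjectType unwraps the proxy back to the real entity type:
    Console.WriteLine(ObjectContext.GetObjectType(blog.GetType()).Name);
}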
Finally, sometimes you may want to control when queries are actually sent to the database (e.g. due to latency, when using SQL Azure it is usually better/faster to send one query returning a bigger result instead of a lot of queries returning smaller results). With lazy loading it often happens that people don't realize they are hitting the database hard when it is not necessary or not efficient. One thing to note here is that you can mix both worlds - .Include() will force loading of related entities regardless of the lazy loading setting.
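For example, a single eager-loading query might look like this (again assuming a made-up BlogContext with Blogs and their Posts):

using System.Data.Entity; // brings in the lambda overload of Include
using System.Linq;

using (var ctx = new BlogContext())
{
    ctx.Configuration.LazyLoadingEnabled = false;

    // One round trip that loads blogs together with their posts,
    // instead of 1 + N lazy queries:
    var blogs = ctx.Blogs
                   .Include(b => b.Posts)
                   .ToList();
}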
You may read more about this here: http://thedatafarm.com/blog/data-access/a-look-at-lazy-loading-in-ef4/ and here: http://msdn.microsoft.com/en-us/magazine/hh205756.aspx

Related

Correct way to persist an existing JPA entity in database

In one application I am working on, I found that the JPA objects that are modified are not previously loaded; I mean:
Instead of doing like this:
getEntityManager().find(Enterprise.class, idEnterprise);
//Some modifying operations
They do it like this (and it works!):
Enterprise obj = new Enterprise(idEnterprise);
//Some modifying operations
getEntityManager().persist(obj);
This last solution doesn't load the object from the database, yet it modifies the object correctly.
How is this possible?
Is a good practice? At least you avoid the query that loads from database, right?
Thanks
It depends. If you are writing code in a controller class (or any other application code) you shouldn't be worried about JPA details, so the second approach is bad and redundant.
If, instead, you are working on infrastructure code, maybe you can manually persist your entities to enable some performance optimization, or simply because you want the data to persist even if the transaction fails.
I strongly suspect the second bit of code is someone creating an entity from scratch, but mixing application, domain and infrastructure code in the same method: an extremely evil practice. (Very evil, Darth Vader evil; never do that.)

Entity Framework code first - development strategies

Working on a brand new project from the ground up. That means the data model is in constant flux, doubly so because things are, inevitably, not as well planned as they should be. Model classes are being created and changed fairly regularly.
The plan was to use the latest version of EF with all the neat code-first stuff in it. But we're constantly tripping over the limitations the framework has in terms of adding or updating tables. The initialization options seem to allow only the complete deletion and re-creation of the database, which isn't really ideal.
I've had a look at the migrations. But this seems a sledgehammer to crack a nut: we don't need to detail every single small change and update with a new migration scaffold.
Are there better strategies to deal with this? For instance, I started writing some unit tests to pre-populate one of the contexts with some test data, but because this causes the whole DB to drop and re-create, it causes problems with all the other contexts. Or perhaps we could make use of a custom initialiser to seed the data for us, as sketched below? How can we easily exclude these in production code?
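Something like this is what I had in mind - a minimal sketch, assuming a hypothetical MyContext with a Widgets set:

using System.Data.Entity;

public class TestDataInitializer : DropCreateDatabaseIfModelChanges<MyContext>
{
    protected override void Seed(MyContext context)
    {
        // Seed runs after the database has been (re)created:
        context.Widgets.Add(new Widget { Name = "Sample" });
        context.SaveChanges();
    }
}

// Wired up only from test setup code, so production never sees it:
// Database.SetInitializer(new TestDataInitializer());
// Production could even disable initialization entirely:
// Database.SetInitializer<MyContext>(null);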
We're also wondering about perhaps abandoning code-first and going back to EDMX diagrams. At least that way changes result in updated SQL commands which can be run directly against the database.
Any suggestions gratefully received.
I think, imho, that:
as the database schema must at least match your model, you should/must detail every single change, and code first migrations allow that and trace the changes over time
code first migrations also allow EF to migrate the database schema for you
code first migrations also allow you to produce a SQL script that migrates the schema
For these reasons, code first is as good as (if not better than) the EDMX approach.
Please take a few minutes to work through http://msdn.microsoft.com/en-us/data/jj591621.aspx
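The workflow in that walkthrough boils down to Enable-Migrations, Add-Migration and Update-Database in the Package Manager Console; each Add-Migration scaffolds a migration class you can adjust before applying. A minimal sketch of such a scaffolded class (the table and column names are made up):

using System.Data.Entity.Migrations;

public partial class AddBlogRating : DbMigration
{
    public override void Up()
    {
        // Applied by Update-Database; Update-Database -Script produces
        // the equivalent SQL instead of touching the database:
        AddColumn("dbo.Blogs", "Rating", c => c.Int(nullable: false, defaultValue: 0));
    }

    public override void Down()
    {
        DropColumn("dbo.Blogs", "Rating");
    }
}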
One other point, always imho and in a perfect world: if you unit test the business logic of your model, you should not need the DAL; use generic collections. Be aware of the different behaviour of LINQ to Objects vs. LINQ to Entities, for example concerning case sensitivity.
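To illustrate that last point, a minimal sketch (the ctx.Customers set is a made-up example; the database side assumes the default, case-insensitive SQL Server collation):

using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Alice" };

// LINQ to Objects: ordinal, case-sensitive string comparison -> false
bool inMemory = names.Any(n => n == "alice");

// LINQ to Entities: translated to SQL, where a case-insensitive
// collation makes the same-looking query return true
bool inDatabase = ctx.Customers.Any(c => c.Name == "alice");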

Assert.AreEqual unit testing for DbContext entities

I wish to unit test that my business logic is loading the correct data, by loading an entity via the business logic and comparing it to an entity loaded directly from the DbContext.
Assert.AreEqual fails, I'm guessing because the entities are loaded as tracked.
I thought that I could possibly use AsNoTracking(), but it didn't work.
Is there a way of "unwrapping" the entity from Entity Framework to a POCO?
I've read about disabling proxy creation, but is this the only option?
I'm hoping there is something similar (although I realise a completely different concept), to ko.utils.unwrapObservable() in the knockout javascript library.
It is a strange integration test (it is not a unit test at all because it uses the database) - it should be enough to simply define a static expectation instead of loading it again from the database. Dynamic tests are more error prone and can hide issues.
To make it work you must override Equals to compare data, not references. Disabling proxy creation will not work because you will still have one reference from your business logic and a different reference from the tested context (unless you share the context, but in that case the test will be even stranger).
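A minimal sketch of such an override, assuming a hypothetical Customer entity:

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Value-based equality: instances loaded from different contexts
    // compare equal when their data matches, regardless of reference:
    public override bool Equals(object obj)
    {
        var other = obj as Customer;
        if (other == null) return false;
        return Id == other.Id && Name == other.Name;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}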

Entity Framework 4.1 for large number of tables (715)

I'm developing a data access layer for a database with over 700 tables. I created the model including all the tables, which generated a huge model. I then changed the model to use DbContext from 4.1, which seemed to improve how it compiled and worked. The designer didn't seem to work at all.
I then created a test app which just added two records to a table, but the processor went to 100% in the db.SaveChanges method. Being a black box, it was difficult to ascertain what went wrong.
So my questions are
Is Entity Framework the best approach for a large database?
If so, should the model be broken down into logical areas? I did note that you can't have the same SQL table in multiple models.
I have read that the code-only approach is best in these large cases. What is that?
Any guidance would be truly appreciated
Thanks
A large database is always something special. Any technology has pros and cons when working with a large database.
The problem you have encountered is most probably related to building the model. When you start the application and use EF-related functionality for the first time, EF must build the model description and compile it - this is the most time-consuming operation you can find in EF. The complexity of this operation grows with the number of entities in the model. Once the model is compiled it is reused for the whole lifetime of the application (if you restart the application or unload the application domain, the model must be compiled again). You can avoid this by pre-compiling the model. That is done at design time, where you use a tool to generate code from the model and include that code in your project (it must be redone after each change to the model). For EDMX-based models you can use EdmGen.exe to generate views, and for code-first models you can use EF Power Tools CTP1.
EDMX (the designer) was improved in VS 2010 SP1 to be able to work with large models, but I still think "large" in this case means around 100 entities / tables. At the same time, you rarely need 715 tables in the same model. I believe those 715 tables in fact model several domains, so you can divide them into multiple models.
The same is true when you are using DbContext and code first. If you modelled a class, would you consider it correct design for that class to expose 715 properties? I don't think so, but that is exactly what your derived DbContext would look like - it has a public property for each exposed entity set (in the simplest mapping that means one property per table).
The same entity can be used in multiple models, but you should try to avoid that as much as possible, because it can introduce complexity when loading an entity in one context type and using it in another.
Code only = code first = Entity Framework where you define the mapping in code without using an EDMX file.
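A minimal sketch of what that looks like (Order and its mapping are made-up examples):

using System.Data.Entity;

public class OrdersContext : DbContext
{
    public DbSet<Order> Orders { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // All mapping lives in code - no EDMX file involved:
        modelBuilder.Entity<Order>().ToTable("ORDERS");
        modelBuilder.Entity<Order>().HasKey(o => o.Id);
    }
}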
Take a look at this post:
http://blogs.msdn.com/b/adonet/archive/2008/11/24/working-with-large-models-in-entity-framework-part-1.aspx

Rules of thumb for writing "queries" using ADO.NET Entity Framework

I'm currently working on a prototype of a medium-sized web application, and I thought it would be good to also experiment with Entity Framework. The problem is that the major part of the application is not the data layer and logic, so I don't have much time to play with Entity Framework. On the other hand, the database schema is quite simple.
One of the problems I’m facing is that I cannot find a consistent way to "write queries". As far as I can tell, there are four "interfaces" for the job:
LINQ to Entities
LINQ to Entities using LINQ extension methods
Entity SQL
Query builder
OK, the first two are essentially the same, but it’s good to use just one for maintenance and consistency.
I'm mostly puzzled by the fact that none of them seems to be complete and the most general. I often find myself cornered into using some ugly-looking combination of several of them. My guess is that Entity SQL is the most general one, but writing queries using strings feels like a step back. The main reason I'm experimenting with something like Entity Framework is that I like the compile-time checking.
Some other random thought / issues:
I often also use the ObjectQuery.Include() method, but again it takes a string. Is this the only way?
When to use ObjectQuery.Execute() (vs. ToList())? Does it actually execute the query?
Should I execute queries as soon as possible (e.g. using ToList()), or should I not care and just leave execution to the first enumeration that comes along?
Are ObjectQuery.Skip() and ObjectQuery.Take() available only as extension methods? Is there a better way to do paging? (A paging sketch follows this list.) It's 2009 and almost every web application deals with paging.
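For what it's worth, a minimal paging sketch (ctx.Products is a made-up set; LINQ to Entities requires an OrderBy before Skip):

using System.Linq;

int pageIndex = 2, pageSize = 20;

// Skip/Take compose on the query and are translated into paged SQL,
// so only the requested page is materialized:
var page = ctx.Products
              .OrderBy(p => p.Name)
              .Skip(pageIndex * pageSize)
              .Take(pageSize)
              .ToList();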
Overall, I understand there are many difficulties in implementing an ORM, and often one has to compromise. On the other hand, direct database access (e.g. ADO.NET) is plain and simple and has a well-defined interface (tabular results, data readers), so all code - no matter who writes it and when - is consistent. I don't want to be faced with too many choices whenever I write a database query. It's too tedious, and more than likely different developers will come up with different ways.
What are your rules of thumbs?
I use LINQ to Entities as much as possible. I also try to standardise on the lambda form, as opposed to the extended SQL-style syntax. I have to admit to having had problems enforcing relationships and making compromises on efficiency just to expedite my coding of our application (e.g. Master->Child tables may need to be manually loaded), but all in all, EF is a good product.
I do use EF's .Include() method for eager loading, which as you say does require a string input. I find no problem with this, other than that of identifying the string to use, which is relatively simple. I guess if you're keen on compile-time checking of such relations, a model similar to Parent.GetChildren() might be more appropriate.
My application does require some "dynamic" queries to be performed, though. I have two ways of meeting this need:
a) I create a mediator object, e.g. ClientSearchMediator, which "knows" how to search for clients by name, etc. I can then put this through a SearchHandler.Search(ISearchMediator[] mediators) call (for example); there's a sketch after option (b). This can be used to target specific data structures and sort results accordingly using LINQ to Entities.
b) For a looser experience, possibly as a result of a user designing their own query (using the high-level tools our application provides), eSQL is ideal. It can be made injection-safe.
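Option (a) might look roughly like this - only a guess at the shape, since ClientSearchMediator and SearchHandler are just named above; Client and its Name property are assumptions:

using System.Linq;

public interface ISearchMediator<T>
{
    // Each mediator "knows" one way of narrowing a query:
    IQueryable<T> Apply(IQueryable<T> source);
}

public class ClientSearchMediator : ISearchMediator<Client>
{
    private readonly string _name;
    public ClientSearchMediator(string name) { _name = name; }

    public IQueryable<Client> Apply(IQueryable<Client> source)
    {
        return source.Where(c => c.Name.Contains(_name));
    }
}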
I don't have enough knowledge to address all of this, but I'll at least take a few stabs.
I don't know why you think ADO.NET is more consistent than Entity Framework. There are many different ways to use ADO.NET and I've definitely seen inconsistency within a single code base.
Entity Framework is currently a 1.0 release and it suffers from many 1.0 type problems (incomplete & inconsistent API, missing features, etc.).
In regards to Include, I assume you are referring to eager loading. Multiple people (outside of Microsoft) have developed solutions for getting "type safe" includes (try googling something like: Entity Framework ObjectQueryExtension Include). That said, Include is more of a hint than anything: you can't force eager loading, and you always have to check IsLoaded to see whether your request was fulfilled. As far as I know, the way "Include" works is not changing at all in the next version of Entity Framework (4.0 - to ship with VS 2010).
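Those community solutions typically boil down to an extension method along these lines - a minimal single-level sketch (a full implementation walks the expression tree to build nested paths):

using System;
using System.Data.Objects;
using System.Linq.Expressions;

public static class ObjectQueryExtensions
{
    // Turns query.Include(x => x.Orders) into query.Include("Orders"),
    // trading the magic string for compile-time checking:
    public static ObjectQuery<T> Include<T>(
        this ObjectQuery<T> query,
        Expression<Func<T, object>> selector)
    {
        var member = (MemberExpression)selector.Body;
        return query.Include(member.Member.Name);
    }
}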
As far as executing the LINQ query as soon as it's built vs. at the last possible moment, that decision is situational. Personally, I would probably execute it as soon as it's built, unless there were a compelling reason not to, but I can see other people going the opposite direction.
There are more mature ORMs on the market and Entity Framework isn't necessarily your best option. For the most part, you can bend Entity Framework to your will, but you may end up rolling your own implementation of features that come out of the box with other ORMs.