Entity Framework Table Per Type Performance - entity-framework

So it turns out that I am the last person to discover the fundamental floor that exists in Microsoft's Entity Framework when implementing TPT (Table Per Type) inheritance.
Having built a prototype with 3 sub classes, the base table/class consisting of 20+ columns and the child tables consisting of ~10 columns, everything worked beautifully and I continued to work on the rest of the application having proved the concept. Now the time has come to add the other 20 sub types and OMG, I've just started looking the SQL being generated on a simple select, even though I'm only interested in accessing the fields on the base class.
This page has a wonderful description of the problem.
Has anyone gone into production using TPT and EF, are there any workarounds that will mean that I won't have to:
a) Convert the schema to TPH (which goes against everything I try to achieve with my DB design - urrrgghh!)?
b) rewrite with another ORM?
The way I see it, I should be able to add a reference to a Stored Procedure from within EF (probably using EFExtensions) that has the the TSQL that selects only the fields I need, even using the code generated by EF for the monster UNION/JOIN inside the SP would prevent the SQL being generated every time a call is made - not something I would intend to do, but you get the idea.
The killer I've found, is that when I'm selecting a list of entities linked to the base table (but the entity I'm selecting is not a subclass table), and I want to filter by the pk of the Base table, and I do .Include("BaseClassTableName") to allow me to filter using x=>x.BaseClass.PK == 1 and access other properties, it performs the mother SQL generation here too.
I can't use EF4 as I'm limited to the .net 2.0 runtimes with 3.5 SP1 installed.
Has anyone got any experience of getting out of this mess?

This seems a bit confused. You're talking about TPH, but when you say:
The way I see it, I should be able to add a reference to a Stored Procedure from within EF (probably using EFExtensions) that has the the TSQL that selects only the fields I need, even using the code generated by EF for the monster UNION/JOIN inside the SP would prevent the SQL being generated every time a call is made - not something I would intend to do, but you get the idea.
Well, that's Table per Concrete Class mapping (using a proc rather than a table, but still, the mapping is TPC...). The EF supports TPC, but the designer doesn't. You can do it in code-first if you get the CTP.
Your preferred solution of using a proc will cause performance problems if you restrict queries, like this:
var q = from c in Context.SomeChild
where c.SomeAssociation.Foo == foo
select c;
The DB optimizer can't see through the proc implementation, so you get a full scan of the results.
So before you tell yourself that this will fix your results, double-check that assumption.
Note that you can always specify custom SQL for any mapping strategy with ObjectContext.ExecuteStoreQuery.
However, before you do any of this, consider that, as RPM1984 points out, your design seems to overuse inheritance. I like this quote from NHibernate in Action
[A]sk yourself whether it might be better to remodel inheritance as delegation in the object model. Complex inheritance is often best avoided for all sorts of reasons unrelated to persistence or ORM. [Your ORM] acts as a buffer between the object and relational models, but that doesn't mean you can completely ignore persistence concerns when designing your object model.

We've hit this same problem and are considering porting our DAL from EF4 to LLBLGen because of this.
In the meantime, we've used compiled queries to alleviate some of the pain:
Compiled Queries (LINQ to Entities)
This strategy doesn't prevent the mammoth queries, but the time it takes to generate the query (which can be huge) is only done once.
You'll can use compiled queries with Includes() as such:
static readonly Func<AdventureWorksEntities, int, Subcomponent> subcomponentWithDetailsCompiledQuery = CompiledQuery.Compile<AdventureWorksEntities, int, Subcomponent>(
(ctx, id) => ctx.Subcomponents
.Include("SubcomponentType")
.Include("A.B.C.D")
.FirstOrDefault(s => s.Id == id));
public Subcomponent GetSubcomponentWithDetails(int id)
{
return subcomponentWithDetailsCompiledQuery.Invoke(ObjectContext, id);
}

Related

Any decent resources on how to map complex POCO objects in EF 4.1?

So I heard L2S is going the way of the dodo bird. I am also finding out that if I use L2S, I will have to write multiple versions of the same code to target different schemas even if they vary slightly. I originally chose L2S because it was reliable and easy to learn, while EF 3 wasn't ready for public consumption at the time.
After reading lots of praises for EF 4.1, I thought I would do a feasibility test. I discovered that EF 4.1 is a beast to get your head around. It is mindnumblingly complex with hundreds of ways of doing the same thing. It seems to work fine if you're planning on using simple table-to-object mapped entities, but complex POCO object mapping has been a real PITA. There are no good tutorials and the few that exist are very rudimentary.
There are tons of blogs about learning the fundamentals about EF 4.1, but I have a feeling that they deliberately avoid advanced topics. Are there any good tutorials on more complex mapping scenarios? For instance, taking an existing POCO object and mapping it across several tables, or persisting a POCO object that is composed of other POCO objects? I keep hearing this is possible, but haven't found any examples.
Disclaimer: IMO EF 4.1 is best known for its Code-First approach. Most of the following links point to articles about doing stuff in code-first style. I'm not very familiar with DB-First or Model-First approaches.
I have learned many things from Mr. Manavi's blog. Especially, the Inheritance with code-first series was full of new stuff for me. This MSDN link has some valuable links/infos about different mapping scenarios too. Also, I have learned manu stuff by following or answering questions with entity-framework tags here on SO.
Whenever I want to try some new complex object mapping, I do my best (based on my knowledge about EF) to create the correct mappings; However sometimes, you face a dead end. That's why god created StackOverflow. :)
What do you mean by EFv4.1? Do you mean overhyped code-first / fluent-API? In such case live with a fact that it is mostly for simple mapping scenarios. It offers more then L2S but still very little in terms of advanced mappings.
The basic mapping available in EF follows basic rule: one table = one entity. Entity can be single class or composition of the main class representing the entity itself and helper classes for set of mapped fields (complex types).
The most advanced features you will get with EF fluent-API or designer are:
TPH inheritance - multiple tables in inheritance hierarchy mapped to the same table. Types are differed by special column called discriminator. Shared fields must be in parent class.
TPT inheritance - each type mapped to the separate table = basic type has one table and each derived type has one table as well. Shared fields must be defined in base type and thus in base table. Relation between base and derived table is one-to-one. Derived entities span multiple tables.
TPC inheritance - each class has separate table = shared fields must be defined in base type but each derived type has them in its own table.
Entity splitting - entity is split into two or more tables which are related by one-to-one relation. All parts of entity must exist.
Table splitting - table is split into two or more entities related with one-to-one relation.
Designer also offers
Conditional mapping - this is not real mapping. It is only hardcoded filter on mapping level where you select one or more fields to restrict records which are allowed for loading.
When using basic or more advanced features table can participate only in one mapping.
All these mapping techniques follow very strict rules. Your classes and tables must follow these rules to make them work. That means you cannot take arbitrary POCO and map it to multiple tables without satisfying those rules.
These rules can be avoided only when using EDMX and advanced approach with advanced skills = no fluent API and no designer but manual modifications of XML defining EDMX. Once you go this way you can use
Defining query - custom SQL query used to specify loading of new "entity". This is also approach natively used by EDMX and designer when mapping database view
Query view - custom ESQL query used to specify new "entity" from already mapped entities. It is more usable for predefined projections because in contrast to defining query it has some limitations (for example aggregations are not allowed).
Both these features allow you defining classes combined from multiple tables. The disadvantage of both these mapping techniques is that mapped result is read only. You must use stored procedures for persisting changes when using these techniques.

Why does EF 4.1 not support complex queries as well as linq-to-sql?

I am in the process of converting our internal web application from Linq-To-Sql to EF CodeFirst from an existing database. I have been getting annoyed with Linq-To-Sql's limitations more and more lately, and having to update the edmx after updating a very intertwined database table finally frustrated me enough to switch to EF.
However, I am encountering several situations where using linq with Linq-To-Sql is more powerful than the latest Entity Framework, and I am wondering if anyone knows the reasoning for it? Most of this seems to deal with transformations. For example, the following query works in L2S but not in EF:
var client = (from c in _context.Clients
where c.id == id
select ClientViewModel.ConvertFromEntity(c)).First();
In L2S, this correctly retrieves a client from the database and converts it into a ClientViewModel type, but in EF this exceptions saying that Linq to Entities does not recognize the method (which makes sense as I wrote it.
In order to get this working in EF I have to move the select to after the First() call.
Another example is my query to retrieve a list of clients. In my query I transform it into an anonymous structure to be converted into JSON:
var clients = (from c in _context.Clients
orderby c.name ascending
select new
{
id = c.id,
name = c.name,
versionString = Utils.GetVersionString(c.ProdVersion),
versionName = c.ProdVersion.name,
date = c.prod_deploy_date.ToString()
})
.ToList();
Not only does my Utils.GetVersionString() method cause an unsupported method exception in EF, the c.prod_deploy_date.ToString() causes one too and it's a simple DateTime. Like previously, in order to fix it I had to do my select transformation after ToList().
Edit: Another case I just came across is that EF cannot handle where clauses that compare entities where as L2S has no issues for with it. For example the query
context.TfsWorkItemTags.Where(x => x.TfsWorkItem == TfsWorkItemEntity).ToList()
throws an exception and instead I have to do
context.TfsWorkItemTags.Where(x => x.TfsWorkItem.id == tfsWorkItemEntity.id).ToList()
Edit 2: I wanted to add another issue that I found. Apparently you can't use arrays in EF Linq queries, and this probably annoys me more than anything. So for example, right now I convert an entity that denotes a version into an int[4] and try to query on it. In Linq-to-Sql I used the following query:
return context.ReleaseVersions.Where(x => x.major_version == ver[0] && x.minor_version == ver[1]
&& x.build_version == ver[2] && x.revision_version == ver[3])
.Count() > 0;
This fails with the following exception:
The LINQ expression node type 'ArrayIndex' is not supported in LINQ to Entities.
Edit 3: I found another instance of EF's bad Linq implementation. The following is a query that works in L2S but doesn't in EF 4.1:
DateTime curDate = DateTime.Now.Date;
var reqs = _context.TestRequests.Where(x => DateTime.Now > (curDate + x.scheduled_time.Value)).ToList();
This throws an ArgumentException with the message DbArithmeticExpression arguments must have a numeric common type.
Why does it seem like they downgraded the ability for Linq queries in EF than in L2S?
Edit (9/2/2012): Updated to reflect .NET 4.5 and added few more missing features
This is not the answer - it cannot be because the only qualified person who can answer your question is probably a product manager from ADO.NET team.
If you check feature set of old datasets then linq-to-sql and then EF you will find that critical features are removed in newer APIs because newer APIs are developed in much shorter times with big effort to deliver new fancy features.
Just list of some critical features available in DataSets but not available in later APIs:
Batch processing
Unique keys
Features available in Linq-to-Sql but not supported in EF (perhaps the list is not fully correct, I haven't used L2S for a long time):
Logging database activity
Lazy loaded properties
Left outer join (DefaultIfEmpty) since the first version (EF has it since EFv4)
Global eager loading definitions
AssociateWith - for example conditions for eager loaded data
Code first since the first version
IMultipleResults supporting stored procedures returning multiple result sets (EF has it in .NET 4.5 but there is no designer support for this feature)
Support for table valued functions (EF has this in .NET 4.5)
And some others
Now we can list features available in EF ObjectContext API (EFv4) and missing in DbContext API (EFv4.1):
Mapping stored procedures
Conditional mapping
Mapping database functions
Defining queries, QueryViews, Model defined functions
ESQL is not available unless you convert DbContext back to ObjectContext
Manipulating state of independent relationships is not possible unless you convert DbContext back to ObjectContext
Using MergeOption.OverwriteChanges and MergeOption.PreserveChanges is not possible unless you convert DbContext back to ObjectContext
And some others
My personal feeling about this is only big sadness. Core features are missing and features existing in previous APIs are removed because ADO.NET team obviously doesn't have enough resources to reimplement them - this makes migration path in many cases almost impossible. The whole situation is even worse because missing features or migration obstacles are not directly listed (I'm afraid even ADO.NET team doesn't know about them until somebody reports them).
Because of that I think that whole idea of DbContext API was management failure. At the moment ADO.NET team must maintain two APIs - DbContext is not mature to replace ObjectContext and it actually can't because it is just a wrapper and because of that ObjectContext cannot die. Resources available for EF development was most probably halved.
There are more problems related. Once we leave ADO.NET team and look on the problem from the perspective of MS product suite we will see so many discrepancies that I sometimes even wonder if there is any global strategy.
Simply live with the fact that EF's provider works in different way and queries which worked in Linq-to-sql don't have to work with EF.
A little late to the game, but I found this post while searching for something else, and figured that I'd post an answer to the fundamental questions in the original post, which mostly boil down to "LINQ to SQL allows Expression [x], but EF doesn't".
The answer is that the query provider (the code that translates your LINQ expression tree into something that actually executes and returns an enumerable set of stuff) is fundamentally different between L2S and EF. To understand why, you have to realize that another fundamental difference between L2S and EF is that L2S is table-based and EF is entity-model-based. In other words, EF works with conceptual entity models, and knows that the underlying physical model (the DB tables) don't necessarily reflect conceptual entities. This is because tables are normalized/denormalized, and have weird ways of dealing with entity type generalization (inheritance). So in EF, you have a picture of the conceptual model (which is the objects you code against in VB/C#, etc.) and a mapping to the physical underlying table(s) that comprise your conceptual entities. L2S does not do this. Every "model entity" in L2S is strictly a single table, with exactly the table fields as-is.
So far, that in and of itself doesn't really explain the problems in the original post, but hopefully, you can begin to appreciate that fundamentally, EF is not L2S+ or L2S v4.0. It's a very different kind of product (a real ORM) even though there is some coincidental overlap in the fact that both use LINQ to get at database data.
One other interesting difference is that EF was built from the ground up to be DB-agnostic, whereas L2S only works against MS SQL Server (although anyone who's sniffed around the L2S code deep enough will see that there are some underpinnings to allow different DBs, but in the end, it was tied to MS SQL Server only). This difference also plays a big role in why some expressions work in L2S LINQ, but not in EF LINQ. EF's query provider deals with canonical DB expressions, which in plain english means LINQ expressions that have SQL query equivalents in nearly all relational databases out there. The underlying EF engine (query provider) translates the LINQ expressions to these canonical DB expressions, then hands the canonical expression tree off to a specific DB provider (say Oracle's or MySQL's EF provider) where it is translated to product-specific SQL. You can see here how these canonical expressions are supposed to be translated by the individual product-specific providers: http://msdn.microsoft.com/en-us/library/ee789836.aspx
Additionally, EF allows some product-specific DB functions (store functions) as expressions through extensions. The underlying product-specific providers are responsible for both providing and translating these.
That being the case, EF only allows expressions that are DB canonical expressions, or store-specific functions, becuase all expressions in the tree are converted to SQL for execution against the DB.
The difference with L2S is that L2S passes off any expressions that it can to the DB from its limited SQL generator, and then executes any expressions it can't translate to SQL on the materialized object set that is returned. While this makes it look simpler to use L2S, what you don't see is that half your expressions don't actually make it to the DB as SQL and this can cause some really inefficient queries bringing back larger sets of data which are then iterated again in CLR memory with regular object LINQ acting against them for the other expressions which L2S can't turn into SQL.
You get the exact same effects in EF by using EF to return the materialized data to object sets in memory, and then using additional LINQ statements on that set in memory - just as L2S does, but in this case, you just have to do it explicitly, just like when you say you have to call .First() before using a non-DB-canonical expression. Similarly, you can call .ToArray() or .ToList() before using additional expressions that can't be turned into SQL.
One other big difference is that in EF, entities must be used whole. Real model entities represent conceptual objects that are transacted on in whole. You never have half a User, for example. The User is an object whose state depends on all fields. If you want to return partial entities, or a flattened join of multiple entities, you have to define a projection (what EF calls a Complex Type), or you can use some of the new 4.1/4.2/4.3 POCO features.
Now that Entity Framework is open source, it's easy enough to see, especially from the comments on Issues from the team, that one of the goals is to provide EF as an open layer on top of multiple databases. To be fair, Microsoft has unsurprisingly only implemented that layer on top of their SQL Server, but there are other implementations, like DevArt's MySql EF Connector.
As part of that goal, it's wise to keep the public interface somewhat limited, and attempting to add an additional layer that asks - well, some of this might be done in memory, some of this might be done in SQL, who knows - definitely complicates the job for other implementers trying to tie EF into this or that database.
So, I agree with the other answer here - you'd have to ask the team - but you can also get a lot of info about that team's direction on the public bug tracker and their other publications, and this seems like a clear motivation.
That said the main difference between LINQ to SQL and EF is the way EF throws an exception on code that has to be run in memory, and if you're an Expressions ninja there's nothing stopping you from going the next step to wrap the DbContext class and make that work just like LINQ to SQL. On the other hand, what you gain there is a mixed bag - you make it implicit rather than explicit when the SQL is generated, and when it fires, and that can be viewed as a loss of performance and control in exchange for flexibility/ease of authoring.

Entity Framework 4.1 for large number of tables (715)

I'm developing a data access layer for a database with over 700 tables. I created the model including all the tables, which generated a huge model. I then changed the model to use DBContext from 4.1 which seemed to improve how it compiled and worked. The designer didnt seem to work at all.
I then created a test app which just added two records to the table, but the processor went 100% in the db.SaveChanges method. Being a black box it was difficult to accertain what went wrong.
So my questions are
Is the entity framework the best approach to a large database
If so, should the model be broken down into logical areas. I did note that you cant have the same sql table in multiple models
I have read that the code only approach is best in these large cases. What is that.
Any guidance would be truly appreciated
Thanks
Large database is always something special. Any technology has some pros and cons when working with a large database.
The problem you have encountered is the most probably related to building the model. When you start the application and use EF related stuff for the first time EF must build the model description and compile it - this is the most time consuming operation you can find in EF. Complexity of this operation grows with number of entities in the model. Once the model is compiled it is reused for the whole lifetime of the application (if you restart the application or unload application domain the model must be compiled again). You can avoid this by precompiling the model. It is done at design time where you use some tool to generate code from the model and you include that code into your project (it must be done again after each change in the model). For EDMX based models you can use EdmGen.exe to generate views and for code first based models you can use EF Power Tools CTP1.
EDMX (the designer) was improved in VS 2010 SP1 to be able to work with large models but I still think the large in this case is around 100 entities / tables. In the same time you rarely need 715 tables in the same model. I believe that these 715 tables indeed model several domains so you can divide them into multiple models.
The same is true when you are using DbContext and code first. If you model a class do you think that it is correct design when the class exposes 715 properties? I don't think so but that is exactly what your derived DbContext looks like - it has a public property for each exposed entity set (in the simplest mapping it means one property per table).
Same entity can be used in multiple models but you should try to avoid it as much as possible because it can introduce some complexities when loading entity in one context type and using it in other context type.
Code only = code first = Entity framework when you define mapping in the code without using EDMX.
take a look this post.
http://blogs.msdn.com/b/adonet/archive/2008/11/24/working-with-large-models-in-entity-framework-part-1.aspx

design advice : Entity Framework useful with stored procedures?

I am new to .NET and early in the design process of a front-end application for a database, and looking for some advice.
I am not sure I get it...
The DB is very strongly normalized, but provides lots of stored procedures to abstract the logical model (ex. Select sproc returns one data set from multiple tables closely reflecting the business object, Insert/Update sprocs to multiple tables, etc..)
How should I design the DAL ?
I'm not sure what the benefit of the Entity Framework is in this context.
When generated, it reflects the normalized DB schema rather than an abstraction of it.
Or if I map the sprocs to generate it (which requires some work since the T-SQL in the sprocs is dynamic and with joins), I get the business objects alright but can't see the benefit of it : the entities represent to a single 'abstract' table, and not a set of entities with Datarelations, the sprocs handles the calls to multiple tables. It seems more work to map the generated change events to the sprocs than to call the sprocs directly.
What am I missing ?
Thanks,
Michael.
You can do this with the EF, but the designer won't help you; it doesn't know how to derive an entity type from a stored proc signature. You would need to write the EDMX by hand. CTP 4 code-first may be easier, but I've never tried it.
After some research, I think I'm starting to be on the right track.
The following articles were very helpful :
Entity Framework Modeling: Table Per Hierarchy Inheritance
see also : Entity Framework Modeling : Entity Splitting
To be clear, what I wanted was to abstract the DB schema in the EF model.
(I didn't want any entity representing 'nothing', like a many-to-many relationship).
Besides these modeling techniques, I used views to generate the EF model, and added sprocs for CRUD.
As I said, I am a beginner. Let me know if I am on a wrong track...

What are the pros/cons of returning POCO objects from a Repository rathen than EF Entities?

Following the way Rob does it, I have the classes that are generated by the Linq to SQL wizard, and then a copy of those classes that are POCOs. In my repositories I return these POCOs rather than the Linq to SQL models:
return from c in DataContext.Customer
where c.ID == id
select new MyPocoModels.Customer { ID = c.ID, Name = c.Name }
I understand that the benefit of this is that the POCO models can be instantiated easier so this will make my code more testable.
I'm now moving from Linq to SQL over to Entity Framework and I'm about half way through an EF book. It seems there's a lot of goodness I'm going to lose out on by returning POCOs from my repositories rather than the EF entities.
I still haven't really embraced unit testing, so I feel like I'm wasting a lot of time creating these extra POCOs and writing the code to populate them, when all I appear to be gaining is testable code, yet I'm also gonna lose out on a lot of the benefits of the EF by not being able to track my objects.
Does anyone have any advice for a relative newb to all this ORM/Repository stuff?
Anthony
Another reason people don't like the auto-generated objects (in LINQ to SQL for example) is because of their built-in "magic".
Usually the magic is invisible and you never notice it, but when you try to do things like serialize one of those objects and then deserialize it (for example when using web services) its internal connection to the data source is broken and special hacks need to be employed to "put the magic back in".
With POCOs, you don't have to worry about those sorts of things and can get a better separation between your data and service layers. The downside of course is that you have to write lots of boring POCO -> magic object and magic object -> POCO conversion code. But in the end I think it's usually worth it, especially for large or complex projects.
The main reason is that a lot of people like to develop their model with a specific mindset: like DDD for instance. They might want to use a specific pattern (like Spec or State) for things like statuses (instead of enums) - or you might want to use a Factory for instantiation.
OO breaks when you try to use Tables as Objects when things get more complex. Simple sites work OK - but when you get to big big things, it gets ugly.
So - as always - it depends what you think your project will turn into.
My experience is that when you start writting some complex queries .Include method is worthless and you will find yourself either:
a) Writting a lot of queries to get the data you want or
b) abusing of annonymous types to load the data in a single query and then writting a lot of code just to pass that data to your entities.
POCOs are the way to go, IMHO.