When is IEnumerable preferred over IQueryable in Entity Framework? - entity-framework

I understand how IEnumerable and IQueryable works. I just cannot imagine a situation where IEnumerable would be needed in entity framework when working with SQL database. I wonder if I can just discard IEnumerable in EF. Can you provide any useful example that shows IEnumerable could be more useful than IQueryable?

There are three situations that come to mind.
When EF cannot convert your query into a correct SQL statement - so you need to bring the results into memory to complete the computation.
When you need to perform operations that involve operators that do not convert to SQL.
When SQL server is slower at producing the computation than an in-memory calculation. Many times I have found that pulling all the data into memory is quicker than letting SQL to do it.

Provided a data source (as IQueryable) can be queried, then yes, use IQueryable - though you shouldn't be creating IQueryable instances or implementing it yourself, that's what EF is for. You will still need to use IEnumerable as method parameters or return types if you're using EF with external data-sources or other data that isn't queryable itself, such as JOINing an EF table with non-EF data.
For example, you'd have a return type as IEnumerable<T> if the data you're returning isn't queryable because you called AsEnumerable or ToList but you don't want to reveal implementation details - but I'd then prefer IReadOnlyList<T> in that case.

Related

Entity Framework 6 performance impact with IList

Is there any performance impact of IList instead ICollection (or IEnumerable) in EF entities?
It really depends on how you retrieve the data from the database context. Entity framework uses DbSet in order to expose an abstraction from linq to the database tables. This exposure comes in the form of an IQueryable. IQueryable implements the IEnumerable interface and allows for queried results to the database to be enumerated.
Enumeration forces the expression tree associated with an IQueryable object to be executed.
-MSDN: http://msdn.microsoft.com/en-us/library/vstudio/bb351562(v=vs.100).aspx
So depending on if you simply use a ToList() and grab the entire set, use a .Take(), or a .Skip(), or if you use a projection such as .Select() -- all of these factors weigh into how performant the entity framework will be when getting data from the database.
What you choose as far as an interface to represent the data you have fetched will not affect performance at all. This is because the interfaces only expose methods to use, but will not change the underlying set of data.
Basically the methods will range from a base of IEnumerable, all the way up to IList which has a larger selection available. In general, you can use IEnumerable for a set of data which needs to be iterated and used, ICollection for a set of data which is going to be iterated and modified, and IList for a set of data which is going to be iterated, modified, and sorted.

Efficient querying when using DTOs in Breeze

We are using DTOs server side, and have configured a dbcontext using fluent api in order to give breeze the metadata it needs. We map 1:1 to our real database entities, and each DTO contains a simple subset of the real database entity properties.
This works great, but now I need to figure out a way to make queries efficient - i.e. if the Breeze client queries for a single item I don't want to have to create a whole set of DTO objects before I can filter. i.e. I want to figure out a way to execute the filter/sort on the actual entities, but still return DTO objects.
I guess I need to figure out a way to intercept the query execution in order to query my real database entities and return a DTO instead of the real database entity.
Any ideas for how to best approach this?
Turns out that if you use projection in a link statement, e.g.
From PossibleCustomer As Customer
In Customers
Select New CustomerDto With {.Id = PossibleCustomer.Id,
.Name = PossibleCustomer.Name,
.Email = PossibleCustomer.Email}
.. then linq is smart enough to still optimize any queries to the database - i.e. if I query on the linq statement above to filter for a single item by Id, the database is hit with that query for just a single item and a single DTO is created. Pretty clever stuff. This only works if you do a direct projection in the linq statement - if you call off to a function to create your DTO then this won't work.
Just in case others are facing the same scenario, you might want to look at AutoMapper - it can create these projections for you using a model you create just once - avoids all those huge linq statements that are hard to read and validate. The automapper projections (assuming you stick to the simple stuff) still allow the linq to entities magic that ensures you don't have to do table scans when you create your DTOs.

Why does EF 4.1 not support complex queries as well as linq-to-sql?

I am in the process of converting our internal web application from Linq-To-Sql to EF CodeFirst from an existing database. I have been getting annoyed with Linq-To-Sql's limitations more and more lately, and having to update the edmx after updating a very intertwined database table finally frustrated me enough to switch to EF.
However, I am encountering several situations where using linq with Linq-To-Sql is more powerful than the latest Entity Framework, and I am wondering if anyone knows the reasoning for it? Most of this seems to deal with transformations. For example, the following query works in L2S but not in EF:
var client = (from c in _context.Clients
where c.id == id
select ClientViewModel.ConvertFromEntity(c)).First();
In L2S, this correctly retrieves a client from the database and converts it into a ClientViewModel type, but in EF this exceptions saying that Linq to Entities does not recognize the method (which makes sense as I wrote it.
In order to get this working in EF I have to move the select to after the First() call.
Another example is my query to retrieve a list of clients. In my query I transform it into an anonymous structure to be converted into JSON:
var clients = (from c in _context.Clients
orderby c.name ascending
select new
{
id = c.id,
name = c.name,
versionString = Utils.GetVersionString(c.ProdVersion),
versionName = c.ProdVersion.name,
date = c.prod_deploy_date.ToString()
})
.ToList();
Not only does my Utils.GetVersionString() method cause an unsupported method exception in EF, the c.prod_deploy_date.ToString() causes one too and it's a simple DateTime. Like previously, in order to fix it I had to do my select transformation after ToList().
Edit: Another case I just came across is that EF cannot handle where clauses that compare entities where as L2S has no issues for with it. For example the query
context.TfsWorkItemTags.Where(x => x.TfsWorkItem == TfsWorkItemEntity).ToList()
throws an exception and instead I have to do
context.TfsWorkItemTags.Where(x => x.TfsWorkItem.id == tfsWorkItemEntity.id).ToList()
Edit 2: I wanted to add another issue that I found. Apparently you can't use arrays in EF Linq queries, and this probably annoys me more than anything. So for example, right now I convert an entity that denotes a version into an int[4] and try to query on it. In Linq-to-Sql I used the following query:
return context.ReleaseVersions.Where(x => x.major_version == ver[0] && x.minor_version == ver[1]
&& x.build_version == ver[2] && x.revision_version == ver[3])
.Count() > 0;
This fails with the following exception:
The LINQ expression node type 'ArrayIndex' is not supported in LINQ to Entities.
Edit 3: I found another instance of EF's bad Linq implementation. The following is a query that works in L2S but doesn't in EF 4.1:
DateTime curDate = DateTime.Now.Date;
var reqs = _context.TestRequests.Where(x => DateTime.Now > (curDate + x.scheduled_time.Value)).ToList();
This throws an ArgumentException with the message DbArithmeticExpression arguments must have a numeric common type.
Why does it seem like they downgraded the ability for Linq queries in EF than in L2S?
Edit (9/2/2012): Updated to reflect .NET 4.5 and added few more missing features
This is not the answer - it cannot be because the only qualified person who can answer your question is probably a product manager from ADO.NET team.
If you check feature set of old datasets then linq-to-sql and then EF you will find that critical features are removed in newer APIs because newer APIs are developed in much shorter times with big effort to deliver new fancy features.
Just list of some critical features available in DataSets but not available in later APIs:
Batch processing
Unique keys
Features available in Linq-to-Sql but not supported in EF (perhaps the list is not fully correct, I haven't used L2S for a long time):
Logging database activity
Lazy loaded properties
Left outer join (DefaultIfEmpty) since the first version (EF has it since EFv4)
Global eager loading definitions
AssociateWith - for example conditions for eager loaded data
Code first since the first version
IMultipleResults supporting stored procedures returning multiple result sets (EF has it in .NET 4.5 but there is no designer support for this feature)
Support for table valued functions (EF has this in .NET 4.5)
And some others
Now we can list features available in EF ObjectContext API (EFv4) and missing in DbContext API (EFv4.1):
Mapping stored procedures
Conditional mapping
Mapping database functions
Defining queries, QueryViews, Model defined functions
ESQL is not available unless you convert DbContext back to ObjectContext
Manipulating state of independent relationships is not possible unless you convert DbContext back to ObjectContext
Using MergeOption.OverwriteChanges and MergeOption.PreserveChanges is not possible unless you convert DbContext back to ObjectContext
And some others
My personal feeling about this is only big sadness. Core features are missing and features existing in previous APIs are removed because ADO.NET team obviously doesn't have enough resources to reimplement them - this makes migration path in many cases almost impossible. The whole situation is even worse because missing features or migration obstacles are not directly listed (I'm afraid even ADO.NET team doesn't know about them until somebody reports them).
Because of that I think that whole idea of DbContext API was management failure. At the moment ADO.NET team must maintain two APIs - DbContext is not mature to replace ObjectContext and it actually can't because it is just a wrapper and because of that ObjectContext cannot die. Resources available for EF development was most probably halved.
There are more problems related. Once we leave ADO.NET team and look on the problem from the perspective of MS product suite we will see so many discrepancies that I sometimes even wonder if there is any global strategy.
Simply live with the fact that EF's provider works in different way and queries which worked in Linq-to-sql don't have to work with EF.
A little late to the game, but I found this post while searching for something else, and figured that I'd post an answer to the fundamental questions in the original post, which mostly boil down to "LINQ to SQL allows Expression [x], but EF doesn't".
The answer is that the query provider (the code that translates your LINQ expression tree into something that actually executes and returns an enumerable set of stuff) is fundamentally different between L2S and EF. To understand why, you have to realize that another fundamental difference between L2S and EF is that L2S is table-based and EF is entity-model-based. In other words, EF works with conceptual entity models, and knows that the underlying physical model (the DB tables) don't necessarily reflect conceptual entities. This is because tables are normalized/denormalized, and have weird ways of dealing with entity type generalization (inheritance). So in EF, you have a picture of the conceptual model (which is the objects you code against in VB/C#, etc.) and a mapping to the physical underlying table(s) that comprise your conceptual entities. L2S does not do this. Every "model entity" in L2S is strictly a single table, with exactly the table fields as-is.
So far, that in and of itself doesn't really explain the problems in the original post, but hopefully, you can begin to appreciate that fundamentally, EF is not L2S+ or L2S v4.0. It's a very different kind of product (a real ORM) even though there is some coincidental overlap in the fact that both use LINQ to get at database data.
One other interesting difference is that EF was built from the ground up to be DB-agnostic, whereas L2S only works against MS SQL Server (although anyone who's sniffed around the L2S code deep enough will see that there are some underpinnings to allow different DBs, but in the end, it was tied to MS SQL Server only). This difference also plays a big role in why some expressions work in L2S LINQ, but not in EF LINQ. EF's query provider deals with canonical DB expressions, which in plain english means LINQ expressions that have SQL query equivalents in nearly all relational databases out there. The underlying EF engine (query provider) translates the LINQ expressions to these canonical DB expressions, then hands the canonical expression tree off to a specific DB provider (say Oracle's or MySQL's EF provider) where it is translated to product-specific SQL. You can see here how these canonical expressions are supposed to be translated by the individual product-specific providers: http://msdn.microsoft.com/en-us/library/ee789836.aspx
Additionally, EF allows some product-specific DB functions (store functions) as expressions through extensions. The underlying product-specific providers are responsible for both providing and translating these.
That being the case, EF only allows expressions that are DB canonical expressions, or store-specific functions, becuase all expressions in the tree are converted to SQL for execution against the DB.
The difference with L2S is that L2S passes off any expressions that it can to the DB from its limited SQL generator, and then executes any expressions it can't translate to SQL on the materialized object set that is returned. While this makes it look simpler to use L2S, what you don't see is that half your expressions don't actually make it to the DB as SQL and this can cause some really inefficient queries bringing back larger sets of data which are then iterated again in CLR memory with regular object LINQ acting against them for the other expressions which L2S can't turn into SQL.
You get the exact same effects in EF by using EF to return the materialized data to object sets in memory, and then using additional LINQ statements on that set in memory - just as L2S does, but in this case, you just have to do it explicitly, just like when you say you have to call .First() before using a non-DB-canonical expression. Similarly, you can call .ToArray() or .ToList() before using additional expressions that can't be turned into SQL.
One other big difference is that in EF, entities must be used whole. Real model entities represent conceptual objects that are transacted on in whole. You never have half a User, for example. The User is an object whose state depends on all fields. If you want to return partial entities, or a flattened join of multiple entities, you have to define a projection (what EF calls a Complex Type), or you can use some of the new 4.1/4.2/4.3 POCO features.
Now that Entity Framework is open source, it's easy enough to see, especially from the comments on Issues from the team, that one of the goals is to provide EF as an open layer on top of multiple databases. To be fair, Microsoft has unsurprisingly only implemented that layer on top of their SQL Server, but there are other implementations, like DevArt's MySql EF Connector.
As part of that goal, it's wise to keep the public interface somewhat limited, and attempting to add an additional layer that asks - well, some of this might be done in memory, some of this might be done in SQL, who knows - definitely complicates the job for other implementers trying to tie EF into this or that database.
So, I agree with the other answer here - you'd have to ask the team - but you can also get a lot of info about that team's direction on the public bug tracker and their other publications, and this seems like a clear motivation.
That said the main difference between LINQ to SQL and EF is the way EF throws an exception on code that has to be run in memory, and if you're an Expressions ninja there's nothing stopping you from going the next step to wrap the DbContext class and make that work just like LINQ to SQL. On the other hand, what you gain there is a mixed bag - you make it implicit rather than explicit when the SQL is generated, and when it fires, and that can be viewed as a loss of performance and control in exchange for flexibility/ease of authoring.

Returning IQueryable vs. ObjectQuery when using LINQ to Entities

I have been reading when using LINQ to entites a query is of type IQueryable before it is processed but when the query has been processed, it's no longer an IQueryable, but an ObjectQuery.
In that case, is it correct to write methods from my layer (repository layer) to return IQueryable?
Do I need to cast?
Why would I want to return an ObjectQuery?
I'm coming from a LINQ to SQL background where things were always IQueryable but EF seems to have changed this.
Any help really appreciated.
My repositories always returns IQueryable. The reason for this is that IQueryable is not dependent on EF whereas ObjectQuery is. So if I want my other layers to be persistance ignorant I don't want to introduce dependency on ObjectQuery.
ObjectQuery is implementation of IQueryable with several additional features. First feature you will quickly discover is Include function which is need for eager loading of navigation properties (related entities). Second feature is EQL - entity query language. EQL is second way how you can query your conceptual model. It has similar syntax as SQL. You can write simple query as string, pass it to ObjectQuery constructor and execute query or use it in Linq-To-Entities.

IQueryable<T> vs IEnumerable<T> with Lambda, which to choose?

I do more and more exercise with Lambda but I do not figure out why sometime example use .AsQueryable(); that use the IQueryable and sometime it omit the .AsQueryable(); and use the IEnumerable.
I have read the MSDN but I do no see how "executing an expression tree" is an advantage over not.
Anyone can explain it to me?
IQueryable implements IEnumerable, so right off the bat, with IQueryable, you can do everything that you can do with IEnumerable. IQueryables deal with converting some lambda expression into query on the underlying data source - this could be a SQL database, or an object set.
Basically, you should usually not have to care one way or the other if it is an IQueryable, or an IEnumerable.
As a user, you generally shouldn't have to care, it's really about the implementor of the data source. If the data providers just implements IEnumerable, you are basically doing LINQ to objects execution, i.e. it's in memory operation on collections. IQueryable provides the data source the capability to translate the expression tree into an alternate representation for execution once the expression is executed (usually by enumeration), which is what Linq2Sql, Linq2Xml and Linq2Entities do.
The only time i can see the end user caring, is if they wish to inspect the Expression tree that would be executed, since IQueryable exposes Expression. Another use might be inspecting the Provider. But both of those scenarios should really be reserved for implementors of other query providers and want to transform the expression. The point of Linq is that you shouldn't have to care about the implementation of expression execution to be used.
I agree with Arne Claassen, there are cases when you need to think about the underlying implmentatoin provided by the data sources. For example check this blog post which shows how the SQL generated by IEnumerable and IQueryable are different in certain scenarios.