LINQ Where clause constant expression optimization - entity-framework

I have rewritten some of my database access code in order to save some cycles. My main goal was to have as much of my LINQ queries evaluated server-side as possible.
To do so, I replaced this:
data = ...some LINQ...
if(condition){
data = data.Where(element => filter-condition)
}
with this:
data = ...some LINQ...
.Where(element => !condition || filter-condition)
condition in this case is an expression that does not depend on the current element. So you could say it is practically a constant during the whole query, as it always evaluates to true for all elements in data or it evaluates to false for all elements.
On the other hand, filter-condition is an expression that depends on the current element, as you would expect from your usual Where clause condition.
This optimization works like a charm, because it enables server-side evaluation in SQL on the database, and the LINQ to SQL compiler is intelligent enough to even short-circuit the generated SQL if my condition evaluates to false.
My question is: what happens if this code is not evaluated in SQL on the server side? Let's say I did the following:
data = ...some LINQ...
.AsEnumerable() //Enforces client-side query evaluation
.Where(element => !condition || filter-condition)
Now my Where clause gets evaluated client-side, which is not a problem functionally. Of course, performance is worse with client-side execution. But what about the custom optimization I did beforehand? Is there a performance penalty for evaluating condition for every element in my data sequence? Or is LINQ on the client side also intelligent enough to short-circuit the "constant" expression condition?

Or is LINQ on client-side also intelligent enough, to short-circuit the "constant" expression condition?
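Sort of. || always gets short-circuit evaluation in C#, so filter-condition is skipped for an element whenever !condition is true. But LINQ on the client does not have any sort of query optimizer, so the !condition || filter-condition predicate, including condition itself, will still be evaluated for every entity.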
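To make the per-element evaluation visible, here is a minimal LINQ to Objects sketch. Condition() and FilterCondition() are hypothetical stand-ins for the question's condition and filter-condition, and the counter exists only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClientSideFilterDemo
{
    // Counts how often the element-independent condition is evaluated.
    public static int ConditionCalls;

    // Hypothetical stand-in for "condition"; it never looks at the element.
    public static bool Condition()
    {
        ConditionCalls++;
        return false;
    }

    // Hypothetical stand-in for "filter-condition"; it depends on the element.
    public static bool FilterCondition(int element) => element % 2 == 0;

    // Combined predicate: Condition() is called once per element.
    public static List<int> Combined(IEnumerable<int> data) =>
        data.Where(e => !Condition() || FilterCondition(e)).ToList();

    // Hoisted variant: Condition() is called once per query.
    public static List<int> Hoisted(IEnumerable<int> data)
    {
        bool condition = Condition();
        return (condition ? data.Where(e => FilterCondition(e)) : data).ToList();
    }
}
```

With five input elements, Combined calls Condition() five times and Hoisted calls it once; both return the same elements. If condition is just a captured bool variable rather than a method call, the per-element cost is a single field read, so the difference is likely negligible in practice.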

Related

ANDALSO options, stop evaluating when one fails

I have a SQL select statement that reads items. There are conditions for which items to display, but when one condition fails, I don't need to check the other.
For example:
where item like 'M%'
and item_class='B'
and fGetOnHand(item)>0
If either of the first two fails, I do NOT want to evaluate the last one (a call to a user-defined function).
From what I have read on this site, SQL Server's AND and OR operators do not follow short circuiting behavior. This means that the call to the UDF could happen first, or maybe not at all, should one of the other conditions happen first and fail.
We might be able to try rewriting your logic using a CASE expression, where the execution order is fixed:
WHERE
CASE WHEN item NOT LIKE 'M%' OR item_class <> 'B'
THEN 0
WHEN fGetOnHand(item) <= 0
THEN 0
ELSE 1 END = 1
The above logic forces the check on item and item_class to happen first. Should either fail, then the first branch of the CASE expression evaluates to 0, and the condition fails. Only if both these two checks pass would the UDF be evaluated.
This is very verbose, but if the UDF call is a serious penalty, then perhaps phrasing your WHERE clause as above would be worth the trade off of verbose code for better performance.

Entity Framework: Why people use .AsEnumerable() along with EF query

I have never yet used .AsEnumerable() in an EntityFramework query.
See the example below and tell me why they use .AsEnumerable() before Select.
Could they not just use Select directly?
Please tell me the reason for using .AsEnumerable() in the query below.
Why did they use .ToArray() instead of .ToList()?
private IEnumerable<AutoCompleteData> GetAutoCompleteData(string searchTerm)
{
    using (var context = new AdventureWorksEntities())
    {
        var results = context.Products
            .Include("ProductSubcategory")
            .Where(p => p.Name.Contains(searchTerm)
                        && p.DiscontinuedDate == null)
            .AsEnumerable()
            .Select(p => new AutoCompleteData
            {
                Id = p.ProductID,
                Text = BuildAutoCompleteText(p)
            })
            .ToArray();
        return results;
    }
}
The difference between AsEnumerable and AsQueryable is that the enumerable contains all information to create an enumerator. Once you've got the enumerator you can ask for the first element, and if there is one, you can get the next one.
The Queryable does not hold the information to create the enumerator. It holds an Expression and a Provider. The Provider knows which process must execute the Expression and which language this process uses. Quite often the other process is a database management system, and the language is SQL.
The result of a Queryable.Select(...) is still an IQueryable, meaning that the query is not performed yet. The Select function only changed the Expression.
Only if you ask for the Enumerator, either explicitly by calling GetEnumerator(), or implicitly by calling foreach, or one of the non-deferred execution functions like ToList(), ToDictionary(), FirstOrDefault(), Sum(), the Provider will translate the expression into the format that the execution process understands and execute the query. Once the data is transported to the local process the enumerator is created.
Alas, sometimes you want to call your own functions in your query. SQL does not know these functions, and thus the Provider can't translate such Expressions into SQL. In fact, the provider of DbContext does not even know all LINQ functions; see the list of supported and unsupported LINQ methods.
That is the moment when you use AsEnumerable(). If you ask for the Enumerator (in your foreach, for example), the Provider will translate the Expression up to AsEnumerable, send it to the executing process, and transport the data to the local process. From that point on the sequence is a plain IEnumerable: the rest of the LINQ will be performed in local memory, and thus your local functions can be called.
You could of course use ToList() to fetch all data to local memory and continue your linq after that. But that would be a waste if you'd only want the first element, or every other one.
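The difference can be sketched in memory. Rows() below is a hypothetical iterator standing in for rows streamed from a database, and Describe() is a made-up local function that no provider could translate to SQL; the sketch only models how many rows the client pulls, not what the database server does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Number of rows the hypothetical data source has handed out so far.
    public static int RowsFetched;

    // Stand-in for rows streaming from a database (an assumption of this sketch).
    public static IEnumerable<int> Rows()
    {
        for (int i = 1; i <= 1000; i++)
        {
            RowsFetched++;
            yield return i;
        }
    }

    // Hypothetical local function that cannot be translated to SQL.
    public static string Describe(int id) => "row " + id;

    // AsEnumerable() stays lazy, so First() pulls a single row.
    public static string FirstViaAsEnumerable() =>
        Rows().AsEnumerable().Select(id => Describe(id)).First();

    // ToList() pulls every row before Select and First get to run.
    public static string FirstViaToList() =>
        Rows().ToList().Select(id => Describe(id)).First();
}
```

Both methods return the same string, but the first pulls one row from the source while the second pulls all 1000.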
This brings me to the final remark: the transport of the data from the DBMS to your local memory is one of the slower parts. Try to limit this transport to only the data you'll actually use.
For example: if you have a one-to-many relation between a Teacher and his Students, don't fetch the Teacher and his Students, because you'll transport Student.TeacherId many times, and they will all have the same value as Teacher.Id. Instead, only select the data you really want to use.
Not all Select projections, Where predicates, and aggregations can be translated from C# expressions into native database queries. In your case, the full LINQ expression attempts to construct an AutoCompleteData object using a custom function, BuildAutoCompleteText, to set one of its properties, and this cannot be trivially converted into native database code like SQL.
In your case, AsEnumerable marks the end of the work that will be done in SQL; everything after it is executed in memory.
i.e.
.Include("ProductSubcategory")
.Where(p => p.Name.Contains(searchTerm)
&& p.DiscontinuedDate == null)
will be executed in SQL, roughly as a JOIN to ProductSubcategory and a WHERE predicate on Product such as:
Product.Name LIKE '%' + @SearchTerm + '%' AND Product.DiscontinuedDate IS NULL
All work subsequent to the AsEnumerable (i.e. the projection of the results to AutoCompleteData objects) will be done in-memory with LINQ to objects.
ToArray and ToList will both execute (materialize) the result, just into different data structures. Since the return type is IEnumerable<AutoCompleteData>, either works for the caller. Note that AsEnumerable() does not itself execute the query; it only switches the rest of the chain to LINQ to Objects. Because the using block disposes the context before the caller gets to enumerate the result, one of these materializing calls is needed before returning; without it, the caller would be enumerating a query whose context has already been disposed.
tell me the intention of usage of .AsEnumerable() here in below query?
In this particular example AsEnumerable() was used to bring the data back to the client, because EF has no idea how to map BuildAutoCompleteText() to SQL query.
they could use select directly.....is not it?
No, unless you define custom function BuildAutoCompleteText on SQL Server and make EF aware of that function.
why they use .ToArray(); instead of Tolist() ?
In this case it does not matter; both implement IEnumerable<T>.

Linq To Entities - Any VS First VS Exists

I am using Entity Framework and I need to check if a product with name = "xyz" exists ...
I think I can use Any(), Exists() or First().
Which one is the best option for this kind of situation? Which one has the best performance?
Thank You,
Miguel
Okay, I wasn't going to weigh in on this, but Diego's answer complicates things enough that I think some additional explanation is in order.
In most cases, .Any() will be faster. Here are some examples.
Workflows.Where(w => w.Activities.Any())
Workflows.Where(w => w.Activities.Any(a => a.Title == "xyz"))
In the above two examples, Entity Framework produces an optimal query. The .Any() call is part of a predicate, and Entity Framework handles this well. However, if we make the result of .Any() part of the result set like this:
Workflows.Select(w => w.Activities.Any(a => a.Title == "xyz"))
... suddenly Entity Framework decides to create two versions of the condition, so the query does as much as twice the work it really needed to. However, the following query isn't any better:
Workflows.Select(w => w.Activities.Count(a => a.Title == "xyz") > 0)
Given the above query, Entity Framework will still create two versions of the condition, plus it will also require SQL Server to do an actual count, which means it doesn't get to short-circuit as soon as it finds an item.
But if you're just comparing these two queries:
Activities.Any(a => a.Title == "xyz")
Activities.Count(a => a.Title == "xyz") > 0
... which will be faster? It depends.
The first query produces an inefficient double-condition query, which means it will take up to twice as long as it has to.
The second query forces the database to check every item in the table without short-circuiting, which means it could take up to N times longer than it has to, depending on how many items need to be evaluated before finding a match. Let's assume the table has 10,000 items:
If no item in the table matches the condition, this query will take roughly half the time as the first query.
If the first item in the table matches the condition, this query will take roughly 5,000 times longer than the first query.
If one item in the table is a match, this query will take an average of 2,500 times longer than the first query.
If the query is able to leverage an index on the Title and key columns, this query will take roughly half the time as the first query.
So in summary, IF you are:
Using Entity Framework 6.1 or earlier (since 6.1.1 has a fix to improve the query), AND
Querying directly against the table (as opposed to doing a sub-query), AND
Using the result directly (as opposed to it being part of a predicate), AND
Either:
You have good indexes set up on the table you are querying, OR
You expect the item not to be found the majority of the time
THEN you can expect .Any() to take as much as twice as long as .Count(). For example, a query might take 100 milliseconds instead of 50. Or 10 instead of 5.
IN ANY OTHER CIRCUMSTANCE .Any() should be at least as fast, and possibly orders of magnitude faster than .Count().
Regardless, until you have determined that this is actually the source of poor performance in your product, you should care more about what's easy to understand. .Any() more clearly and concisely states what you are really trying to figure out, so stick with that.
Any translates into "Exists" at the database level. First translates into Select Top 1 ... Between these, Exists will outperform First because the actual object doesn't need to be fetched, only a Boolean result value.
At least you didn't ask about .Where(x => x.Count() > 0) which requires the entire match set to be evaluated and iterated over before you can determine that you have one record. Any short-circuits the request and can be significantly faster.
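The short-circuiting can be illustrated with LINQ to Objects. This is only an in-memory analogy, not a measurement of the SQL that Entity Framework generates; Titles() is a hypothetical source that counts how many items were inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCountDemo
{
    // Number of items the source has handed out so far.
    public static int Inspected;

    // Hypothetical source of titles; the match "xyz" is the second item.
    public static IEnumerable<string> Titles()
    {
        string[] titles = { "abc", "xyz", "def", "ghi" };
        foreach (var title in titles)
        {
            Inspected++;
            yield return title;
        }
    }

    // Any() stops as soon as the first match is found.
    public static bool ExistsViaAny() => Titles().Any(t => t == "xyz");

    // Count() has to walk the whole sequence before comparing with zero.
    public static bool ExistsViaCount() => Titles().Count(t => t == "xyz") > 0;
}
```

Here ExistsViaAny inspects two items and ExistsViaCount inspects all four, although both return true.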
One would think Any() gives better results, because it translates to an EXISTS query... but EF is awfully broken, generating this (edited):
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent1]
WHERE Condition
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent2]
WHERE Condition
)) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
Instead of:
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent1]
WHERE Condition
)) THEN cast(1 as bit)
ELSE cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
...basically doubling the query cost (for simple queries; it's even worse for complex ones)
I've found using .Count(condition) > 0 is faster pretty much always (the cost is exactly the same as a properly-written EXISTS query)
Ok, I decided to try this out myself. Mind you, I'm using the OracleManagedDataAccess provider with the OracleEntityFramework, but I'm guessing it produces compliant SQL.
I found that First() was faster than Any() for a simple predicate. I'll show the two queries in EF and the SQL that was generated. Mind you, this is a simplified example, but the question was asking whether any, exists or first was faster for a simple predicate.
var any = db.Employees.Any(x => x.LAST_NAME.Equals("Davenski"));
So what does this resolve to in the database?
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS "C1"
FROM "MYSCHEMA"."EMPLOYEES" "Extent1"
WHERE ('Davenski' = "Extent1"."LAST_NAME")
)) THEN 1 ELSE 0 END AS "C1"
FROM ( SELECT 1 FROM DUAL ) "SingleRowTable1"
It's creating a CASE statement. As we know, ANY is merely syntactic sugar. It resolves to an EXISTS query at the database level. This happens if you use ANY at the database level as well. But this doesn't seem to be the most optimized SQL for this query.
In the above example, the EF construct Any() isn't needed here and merely complicates the query.
var first = db.Employees.Where(x => x.LAST_NAME.Equals("Davenski")).Select(x=>x.ID).First();
In the database, this resolves to:
SELECT
"Extent1"."ID" AS "ID"
FROM "MYSCHEMA"."EMPLOYEES" "Extent1"
WHERE ('Davenski' = "Extent1"."LAST_NAME") AND (ROWNUM <= (1) )
Now this looks like a more optimized query than the initial query. Why? It's not using a CASE ... THEN statement.
I ran these trivial examples several times, and in ALMOST every case, (no pun intended), the First() was faster.
In addition, I ran a raw SQL query, thinking this would be faster:
var sql = db.Database.SqlQuery<int>("SELECT ID FROM MYSCHEMA.EMPLOYEES WHERE LAST_NAME = 'Davenski' AND ROWNUM <= (1)").First();
The performance was actually the slowest, but similar to the Any EF construct.
Reflections:
EF Any doesn't exactly map to how you might use Any in the database. I could have written a more optimized query in Oracle with ANY than what EF produced without the CASE THEN statement.
ALWAYS check your generated SQL in a log file or in the debug output window.
If you're going to use ANY, remember it's syntactic sugar for EXISTS. Oracle also uses SOME, which is the same as ANY. You're generally going to use it in the predicate as a replacement for IN. In this case it generates a series of ORs in your WHERE clause. The real power of ANY or EXISTS is when you're using Subqueries and are merely testing for the EXISTENCE of related data.
Here's an example where ANY really makes sense. I'm testing for the EXISTENCE of related data. I don't want to get all of the records from the related table. Here I want to know if there are Surveys that have Comments.
var b = db.Survey.Where(x => x.Comments.Any()).ToList();
This is the generated SQL:
SELECT
"Extent1"."SURVEY_ID" AS "SURVEY_ID",
"Extent1"."SURVEY_DATE" AS "SURVEY_DATE"
FROM "MYSCHEMA"."SURVEY" "Extent1"
WHERE ( EXISTS (SELECT
1 AS "C1"
FROM "MYSCHEMA"."COMMENTS" "Extent2"
WHERE ("Extent1"."SURVEY_ID" = "Extent2"."SURVEY_ID")
))
This is optimized SQL!
I believe the EF does a wonderful job generating SQL. But you have to understand how the EF constructs map to DB constructs else you can create some nasty queries.
And probably the best way to get a count of related data is to do an explicit Load with a Collection Query count. This is far better than the examples provided in prior posts. In this case you're not loading related entities, you're just obtaining a count. Here I'm just trying to find out how many Comments I have for a particular Survey.
var d = db.Survey.Find(1);
var e = db.Entry(d).Collection(f => f.Comments)
.Query()
.Count();
Any() and First() work on any IEnumerable<T>, which gives you the flexibility of evaluating things lazily. Exists(), however, is a method of List<T> and so requires a list.
I hope this clears things up for you and helps you decide which one to use.
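A small in-memory sketch of that distinction, with hypothetical helper names: Any() and FirstOrDefault() accept any IEnumerable<T>, while Exists() is only available on List<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExistsDemo
{
    // Works on any sequence, including lazily evaluated ones.
    public static bool ViaAny(IEnumerable<string> names) =>
        names.Any(n => n == "xyz");

    // FirstOrDefault returns null when no element matches.
    public static bool ViaFirstOrDefault(IEnumerable<string> names) =>
        names.FirstOrDefault(n => n == "xyz") != null;

    // List<T>.Exists needs a materialized list.
    public static bool ViaExists(List<string> names) =>
        names.Exists(n => n == "xyz");
}
```

First() is avoided in the sketch because it throws an InvalidOperationException when nothing matches, which makes it awkward for a pure existence check.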

How to convert Lambda Expression to SQL Server Query syntax?

Here's the lambda expression, I want to convert this to SQL Server query syntax.
{x => ((True AndAlso x.Name.ToLower().Contains("_")) AndAlso Not(x.IsDeleted))}
Note: The lambda expression is equivalent to the WHERE clause in SQL Server.
I want to convert it to sql syntax and then pass it to sql server stored procedure.
Is there any way I can achieve this?
Generally, you can call ToString() on the IQueryable object returned by the LINQ statement to find the exact query that would be executed on the database. But, in this case, I would guess something like this might be generated for the WHERE:
WHERE CONTAINS(Name, '_') AND IsDeleted = 0
But, you haven't provided any detail that would allow me to verify that.
If you don't have full-text on, then the following might be more applicable (note that _ is a single-character wildcard in LIKE, so it has to be escaped, and that a bit column has to be compared explicitly in T-SQL):
WHERE LOWER(Name) LIKE '%[_]%' AND IsDeleted = 0

Is a LINQ to Entities expression with an inner object context instance translated into a sequence of SQL client-server requests?

I have an ADO.NET EF expression like:
db.Table1.Select(
x => new { ..., count = db.Table2.Count(y => y.ForeignKey.ID == x.ID) })
Do I understand correctly that it's translated into several SQL client-server requests and may be refactored for better performance?
Thank you in advance!
Yes - the expression will get translated (in the best way it can) to a SQL query.
And just like any T-SQL query, an EF (or L2SQL) query expression can be refactored for performance.
Why not run SQL Profiler in the background to see what is getting executed, and try to optimize the raw T-SQL first, which will help optimize the expression?
Or if you have LinqPad, just optimize the T-SQL query and get LinqPad to write your query for you.
Also, I'm not really sure why you have specified the delegate for the Count() expression.
You can simply do this:
var query= from c in db.Table1
select new { c.CustomerID, OrderCount = c.Table2s.Count() };
The answer is NO - this query will be translated into one client-to-RDBMS request.
RPM1984 advised using LinqPad. LinqPad showed that the query is translated into a very straightforward SQL expression. The approach with grouping is translated into a different SQL expression but is still executed in one request.
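The answer does not show what the grouping approach looks like. One possible reading, sketched here with LINQ to Objects and plain integer keys (an assumption, not the poster's actual query), is a GroupJoin that counts the matching child rows per parent:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupCountDemo
{
    // Counts, for every parent ID, how many foreign keys point at it.
    // Parents without children get a count of zero.
    public static Dictionary<int, int> CountByParent(
        IEnumerable<int> parentIds, IEnumerable<int> childForeignKeys) =>
        parentIds
            .GroupJoin(childForeignKeys,
                       id => id,
                       fk => fk,
                       (id, matches) => new { Id = id, Count = matches.Count() })
            .ToDictionary(x => x.Id, x => x.Count);
}
```

Whether a provider turns such a query into one SQL statement depends on the provider, so it is worth checking the generated SQL, as the answers above suggest.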