I'm getting errors when I try and do something like this:
from s in db.SomeDbSet where IsValid(s) select s
It errors telling me that it can't process IsValid.
Basically what I'm trying to do is filter based on another dbSet inside the Where that is linked and does an any, but it won't let me.
I've tried a million different ways of doing a Expression but I can't find the right way and building my own Extension method like Where doesn't seem to work either.
Thanks!
Can you paste your IsValid function?
In this case it's EF job to take LINQ syntax and turn it into SQL syntax.
EF can't turn your function into SQL. it only supports a set number of functions that have a clear SQL equivalent commend.
you have two options:
1) Rewrite the function as a series of supported commends. This will be turned into a SQL sub-query, Meaning a single trip to the database, For example:
// will only return records that have at least one related entity marked as full.
query.Where(m => m.ReletedEntities.Any(re => re.IsFull == true));
2) Get all the data from the database and then using Linq and your function work with the data. this will be done in memory using your actual function that will be called once for every item in the collection. You will also have to load the related entity collection. or it will still be an "entity framework translated to SQL query", And will fail if you use your function.
I am using Entity Framework and I need to check if a product with name = "xyz" exists ...
I think I can use Any(), Exists() or First().
Which one is the best option for this kind of situation? Which one has the best performance?
Thank You,
Miguel
Okay, I wasn't going to weigh in on this, but Diego's answer complicates things enough that I think some additional explanation is in order.
In most cases, .Any() will be faster. Here are some examples.
Workflows.Where(w => w.Activities.Any())
Workflows.Where(w => w.Activities.Any(a => a.Title == "xyz"))
In the above two examples, Entity Framework produces an optimal query. The .Any() call is part of a predicate, and Entity Framework handles this well. However, if we make the result of .Any() part of the result set like this:
Workflows.Select(w => w.Activities.Any(a => a.Title == "xyz"))
... suddenly Entity Framework decides to create two versions of the condition, so the query does as much as twice the work it really needed to. However, the following query isn't any better:
Workflows.Select(w => w.Activities.Count(a => a.Title == "xyz") > 0)
Given the above query, Entity Framework will still create two versions of the condition, plus it will also require SQL Server to do an actual count, which means it doesn't get to short-circuit as soon as it finds an item.
But if you're just comparing these two queries:
Activities.Any(a => a.Title == "xyz")
Activities.Count(a => a.Title == "xyz") > 0
... which will be faster? It depends.
The first query produces an inefficient double-condition query, which means it will take up to twice as long as it has to.
The second query forces the database to check every item in the table without short-circuiting, which means it could take up to N times longer than it has to, depending on how many items need to be evaluated before finding a match. Let's assume the table has 10,000 items:
If no item in the table matches the condition, this query will take roughly half the time as the first query.
If the first item in the table matches the condition, this query will take roughly 5,000 times longer than the first query.
If one item in the table is a match, this query will take an average of 2,500 times longer than the first query.
If the query is able to leverage an index on the Title and key columns, this query will take roughly half the time as the first query.
So in summary, IF you are:
Using Entity Framework 4 (since newer versions might improve the query structure) Entity Framework 6.1 or earlier (since 6.1.1 has a fix to improve the query), AND
Querying directly against the table (as opposed to doing a sub-query), AND
Using the result directly (as opposed to it being part of a predicate), AND
Either:
You have good indexes set up on the table you are querying, OR
You expect the item not to be found the majority of the time
THEN you can expect .Any() to take as much as twice as long as .Count(). For example, a query might take 100 milliseconds instead of 50. Or 10 instead of 5.
IN ANY OTHER CIRCUMSTANCE .Any() should be at least as fast, and possibly orders of magnitude faster than .Count().
Regardless, until you have determined that this is actually the source of poor performance in your product, you should care more about what's easy to understand. .Any() more clearly and concisely states what you are really trying to figure out, so stick with that.
Any translates into "Exists" at the database level. First translates into Select Top 1 ... Between these, Exists will out perform First because the actual object doesn't need to be fetched, only a Boolean result value.
At least you didn't ask about .Where(x => x.Count() > 0) which requires the entire match set to be evaluated and iterated over before you can determine that you have one record. Any short-circuits the request and can be significantly faster.
One would think Any() gives better results, because it translates to an EXISTS query... but EF is awfully broken, generating this (edited):
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent1]
WHERE Condition
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent2]
WHERE Condition
)) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
Instead of:
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent1]
WHERE Condition
)) THEN cast(1 as bit)
ELSE cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
...basically doubling the query cost (for simple queries; it's even worse for complex ones)
I've found using .Count(condition) > 0 is faster pretty much always (the cost is exactly the same as a properly-written EXISTS query)
Ok, I decided to try this out myself. Mind you, I'm using the OracleManagedDataAccess provider with the OracleEntityFramework, but I'm guessing it produces compliant SQL.
I found that First() was faster than Any() for a simple predicate. I'll show the two queries in EF and the SQL that was generated. Mind you, this is a simplified example, but the question was asking whether any, exists or first was faster for a simple predicate.
var any = db.Employees.Any(x => x.LAST_NAME.Equals("Davenski"));
So what does this resolve to in the database?
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS "C1"
FROM "MYSCHEMA"."EMPLOYEES" "Extent1"
WHERE ('Davenski' = "Extent1"."LAST_NAME")
)) THEN 1 ELSE 0 END AS "C1"
FROM ( SELECT 1 FROM DUAL ) "SingleRowTable1"
It's creating a CASE statement. As we know, ANY is merely syntatic sugar. It resolves to an EXISTS query at the database level. This happens if you use ANY at the database level as well. But this doesn't seem to be the most optimized SQL for this query.
In the above example, the EF construct Any() isn't needed here and merely complicates the query.
var first = db.Employees.Where(x => x.LAST_NAME.Equals("Davenski")).Select(x=>x.ID).First();
This resolves to in the database as:
SELECT
"Extent1"."ID" AS "ID"
FROM "MYSCHEMA"."EMPLOYEES" "Extent1"
WHERE ('Davenski' = "Extent1"."LAST_NAME") AND (ROWNUM <= (1) )
Now this looks like a more optimized query than the initial query. Why? It's not using a CASE ... THEN statement.
I ran these trivial examples several times, and in ALMOST every case, (no pun intended), the First() was faster.
In addition, I ran a raw SQL query, thinking this would be faster:
var sql = db.Database.SqlQuery<int>("SELECT ID FROM MYSCHEMA.EMPLOYEES WHERE LAST_NAME = 'Davenski' AND ROWNUM <= (1)").First();
The performance was actually the slowest, but similar to the Any EF construct.
Reflections:
EF Any doesn't exactly map to how you might use Any in the database. I could have written a more optimized query in Oracle with ANY than what EF produced without the CASE THEN statement.
ALWAYS check your generated SQL in a log file or in the debug output window.
If you're going to use ANY, remember it's syntactic sugar for EXISTS. Oracle also uses SOME, which is the same as ANY. You're generally going to use it in the predicate as a replacement for IN. In this case it generates a series of ORs in your WHERE clause. The real power of ANY or EXISTS is when you're using Subqueries and are merely testing for the EXISTENCE of related data.
Here's an example where ANY really makes sense. I'm testing for the EXISTENCE of related data. I don't want to get all of the records from the related table. Here I want to know if there are Surveys that have Comments.
var b = db.Survey.Where(x => x.Comments.Any()).ToList();
This is the generated SQL:
SELECT
"Extent1"."SURVEY_ID" AS "SURVEY_ID",
"Extent1"."SURVEY_DATE" AS "SURVEY_DATE"
FROM "MYSCHEMA"."SURVEY" "Extent1"
WHERE ( EXISTS (SELECT
1 AS "C1"
FROM "MYSCHEMA"."COMMENTS" "Extent2"
WHERE ("Extent1"."SURVEY_ID" = "Extent2"."SURVEY_ID")
))
This is optimized SQL!
I believe the EF does a wonderful job generating SQL. But you have to understand how the EF constructs map to DB constructs else you can create some nasty queries.
And probably the best way to get a count of related data is to do an explicit Load with a Collection Query count. This is far better than the examples provided in prior posts. In this case you're not loading related entities, you're just obtaining a count. Here I'm just trying to find out how many Comments I have for a particular Survey.
var d = db.Survey.Find(1);
var e = db.Entry(d).Collection(f => f.Comments)
.Query()
.Count();
Any() and First() is used with IEnumerable which gives you the flexibility for evaluating things lazily. However Exists() requires List.
I hope this clears things out for you and help you in deciding which one to use.
I have been doing queries in EF and everything working great but now i have in the db 2 fields that are actually CHAR.. They hold a date but in the form of a number, in SQL Management Studio i can do date1 >= date2 for example and i can also check to see if a number i have falls in between these 2 dates.
Its nothing unusual, but basically a field that represents a date (the number grows as the date does)...
Now in EF when i try to do >= it states you can't do this on a string, ok understand its c# so i tried doing Convert.ToDecimal(date1) but it gives me an error saying that its not supported.
I have no option of changing the db fields, they are set in stone :-(
the way i got it to work was request of details and do a .ToList and then use the .ToDecimal and it works but of course this is doing it in memory! and this defeats the object of EF i.e. for example adding to the query using iqueryable.
Another way i got it to work was to pass the SQL query to SqlQuery of the dbcontext but again i lose a lot of ef functionality.
Can anyone help?
I am really stuck
As you say that you tried >= I assume that it would work for you if you could do that in plain SQL. And that is possible by doing
String.Compare(date1, date2) >= 0
EF is smart enough to translate that into a >= operator.
The advantage is that you do not need to compare converted values, so indexes can be used in execution plans.
First of all, you can at least enable deferred execution of the query by using AsEnumerable() instead of ToList(). This won't change the fact that the database would need to return all the records when you do in fact execute the query, however.
To let the database perform the filtering, you need your query to be compatible with SQL. Since you can't do ToDecimal() in SQL, you need to work with strings directly by converting your myvar to a string that is in the same format as dateStart and dateEnd, then form your query.
I have a relatively simple thing that I can do easily in SQL, but I'm trying to get used to using Lambda expressions, and having a hard time.
Here is a simple example. Basically I have 2 tables.
tblAction (ActionID, ActionName)
tblAudit (AuditID, ActionID, Deleted)
tblAudit may have an entry regarding tblAction with the Deleted flag set to 1.
All I want to do is select actions where we don't have a Deleted entry in tblAudit. So the SQL statement is:
Select tblAction.*
From tblAction LEFT JOIN tblAudit on tblAction.ActionID=tblAudit.ActionID
where tblAudit.Deleted <> 1
What would be the equivalent of the above in VB.Net's LINQ? I tried:
Context.Actions.Where(Function(act) Context.Audit
.Where(Function(aud) aud.Deleted=False AndAlso aud.ActionID=act.ActionID)).ToList
But that is really an inner join type scenario where it requires that each entry in tblAction also has an Entry in tblAudit. I am using Entity Framework Code First to do the database mapping. Is there a way to define the mapping in a way where you can do this?
You should add
Public Property Audits As DbSet<Audit>
into your action entity class (to register the association between those tables).
Now you can just write what you mean:
(From act in Context.Actions Where Not act.Audits.Any(Function(audit) audit.Deleted)).ToArray
which is equivalent to
Context.Actions.Where(Function(act) Not act.Audits.Any(Function(audit) audit.Deleted)).ToArray
and let the LINQ parser do the hard SQL work.
I have a generic code that is used to retrieve DDL information from a Firebird database (FB2.1). It generates SQL code like
SELECT * FROM MyTable where 'c' <> 'c'
I cannot change this code. Actually, if that matters, it is inside Report Builder 10.
The fact is that some tables from my database are becoming a litle too populated (>1M records) and that query is starting to take too long to execute.
If I try to execute
SELECT * FROM MyTable where SomeIndexedField = SomeImpossibleValue
it will obviously use that index and run very quickly.
Well, it wouldn´t be that hard to the database find out that that is an impossible matcher and make some sort of optimization and avoid testing it against each row.
Is there any way to make my firebird database to optimize that search?
As the filter condition is a negative proposition (and also doesn't refer a column to search, but only a value to compare to another value), Firebird need to do a full table scan (without use any index) to confirm that aren't any record that meet your criteria.
If you can't change you need to wait for the upcoming 3.0 version, that will implement the Boolean data type, and therefore should start to evaluate "constant" fake comparisons in advance (maybe the client library will do this evaluation before send the statement to the server?).