Entity Framework 4 Any() - entity-framework

Entity Framework 4 Any() - entity-framework

Basically I have a query where I need to check to see if the data exists on another table.
bool isInUse;
IsInUse = entity.H83SAF_HEALTH_PLAN.H83SAF_CONSENT.Any();
When I used clutch to trap the query to see what was going on, I could see the query running basically like this:
SELECT
"Extent1"."ACTIVE" AS "ACTIVE",
"Extent1"."IS_COMPLETE" AS "IS_COMPLETE",
"Extent1"."DATE_CREATED" AS "DATE_CREATED",
"Extent1"."DATE_MODIFIED" AS "DATE_MODIFIED",
"Extent1"."CREATED_BY" AS "CREATED_BY",
"Extent1"."MODIFIED_BY" AS "MODIFIED_BY"
...more columns..natter, natter..
FROM "H83FTF"."H83SAF_CONSENT" "Extent1"
WHERE ("Extent1"."HEALTH_PLAN_ID" = 1)
The query itself is fine. I have a problem with the .Any() statement. What I thought should happen is that the query should quit abruptly when the .Any() condition is met.
Yet when the query I run, it looks like like the query is bringing back over 18,000 records (which I don't use) I only want to see if the data exists on the other table if the condition is met - as it is, the query hangs up the website while 18,000 rows are executed with the .Any() statement.
The first row has the condition met but my understanding is that .Any() should quit or stop the moment the condition is met.
I tried firstordefault() yet it still fetches 18000 rows in the memory...

Finally came to a conclusion is that using .Any() on EF 5.0 does not translate to exist in Oracle 11g - solution was to do a direct call and bring back yes or no.

Related

Understanding SQL query complexity

I'm currently having trouble understanding why a seemingly simple query is taking much longer to return results than a much more complicated (looking) query.
I have a view, performance_summary (which in turn selects from another view). Currently, within psql, when I run a query like
SELECT section
FROM performance_summary
LIMIT 1;
it takes a minute or so to return a result, whereas a query like
SELECT section, version, weighted_approval_rate
FROM performance_summary
WHERE version in ('1.3.10', '1.3.11') AND section ~~ '%WEST'
ORDER BY 1,2;
gets results almost instantly. Without knowing how the view is defined, is there any obvious or common reason why this is?

Not really, without knowing how the view is defined. It could be that the "more complex" query uses an index to select just two rows and then perform some trivial grouping sorting on the two. The query without the where clause might see postgres operating on millions of rows, trillions of operations and producing a single row out after discarding 999999999 rows, we just don't know unless you post the view definition and the explain plan output for both queries
You might be falling into the trap of thinking that a View is somehow a cache of info - it isn't. It's a stored query, that is inserted into the larger query when you select from it/include it in another query- this means that the whole thing must be planned and executed from scratch. There isn't a notion that creating a View does any pre planning etc, onto which other further improvement is done. It's more like the definition of the View is pasted into any query that uses it, then the query is run as if it were just written there and then

SQL Query with a little bit more complicated WHERE clause performance issue

I have some performance issue in this case:
Very simplified query:
SELECT COUNT(*) FROM Items WHERE ConditionA OR ConditionB OR ConditionC OR ...
Simply I have to determine how many Items the user has access through some complicated conditions.
When there is a large number of records (100,000+) in the Items table and say ~10 complicated conditions concatenated in WHERE clause, I get the result about 2 seconds in my case. The problem is when a very few conditions are met, f.e. when I get only 10 Items from 100,000.
How can I improve the performace in this "Get my items" case?
Additional information:
the query is generated by EF 6.1
MS SQL 2012 Express
SQL Execution Plan

Add an additional table to your schema. Instead of building a long query, insert the value for each condition into this new table, along with a key for that user/session. Then, JOIN the two tables together.
This should perform much better, because it will allow the database engine to make better use of indexes on your Items table.
Additionally, this will position you to eventually pre-define sets of permissions for your users, such that you don't need to insert them right at the moment you do the check. The permission sets will already be in the table, and the new table can also be indexed, which will improve performance further.

Linq To Entities - Any VS First VS Exists

I am using Entity Framework and I need to check if a product with name = "xyz" exists ...
I think I can use Any(), Exists() or First().
Which one is the best option for this kind of situation? Which one has the best performance?
Thank You,
Miguel

Okay, I wasn't going to weigh in on this, but Diego's answer complicates things enough that I think some additional explanation is in order.
In most cases, .Any() will be faster. Here are some examples.
Workflows.Where(w => w.Activities.Any())
Workflows.Where(w => w.Activities.Any(a => a.Title == "xyz"))
In the above two examples, Entity Framework produces an optimal query. The .Any() call is part of a predicate, and Entity Framework handles this well. However, if we make the result of .Any() part of the result set like this:
Workflows.Select(w => w.Activities.Any(a => a.Title == "xyz"))
... suddenly Entity Framework decides to create two versions of the condition, so the query does as much as twice the work it really needed to. However, the following query isn't any better:
Workflows.Select(w => w.Activities.Count(a => a.Title == "xyz") > 0)
Given the above query, Entity Framework will still create two versions of the condition, plus it will also require SQL Server to do an actual count, which means it doesn't get to short-circuit as soon as it finds an item.
But if you're just comparing these two queries:
Activities.Any(a => a.Title == "xyz")
Activities.Count(a => a.Title == "xyz") > 0
... which will be faster? It depends.
The first query produces an inefficient double-condition query, which means it will take up to twice as long as it has to.
The second query forces the database to check every item in the table without short-circuiting, which means it could take up to N times longer than it has to, depending on how many items need to be evaluated before finding a match. Let's assume the table has 10,000 items:
If no item in the table matches the condition, this query will take roughly half the time as the first query.
If the first item in the table matches the condition, this query will take roughly 5,000 times longer than the first query.
If one item in the table is a match, this query will take an average of 2,500 times longer than the first query.
If the query is able to leverage an index on the Title and key columns, this query will take roughly half the time as the first query.
So in summary, IF you are:
Using Entity Framework 4 (since newer versions might improve the query structure) Entity Framework 6.1 or earlier (since 6.1.1 has a fix to improve the query), AND
Querying directly against the table (as opposed to doing a sub-query), AND
Using the result directly (as opposed to it being part of a predicate), AND
Either:
You have good indexes set up on the table you are querying, OR
You expect the item not to be found the majority of the time
THEN you can expect .Any() to take as much as twice as long as .Count(). For example, a query might take 100 milliseconds instead of 50. Or 10 instead of 5.
IN ANY OTHER CIRCUMSTANCE .Any() should be at least as fast, and possibly orders of magnitude faster than .Count().
Regardless, until you have determined that this is actually the source of poor performance in your product, you should care more about what's easy to understand. .Any() more clearly and concisely states what you are really trying to figure out, so stick with that.

Any translates into "Exists" at the database level. First translates into Select Top 1 ... Between these, Exists will out perform First because the actual object doesn't need to be fetched, only a Boolean result value.
At least you didn't ask about .Where(x => x.Count() > 0) which requires the entire match set to be evaluated and iterated over before you can determine that you have one record. Any short-circuits the request and can be significantly faster.

One would think Any() gives better results, because it translates to an EXISTS query... but EF is awfully broken, generating this (edited):
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent1]
WHERE Condition
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent2]
WHERE Condition
)) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
Instead of:
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent1]
WHERE Condition
)) THEN cast(1 as bit)
ELSE cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
...basically doubling the query cost (for simple queries; it's even worse for complex ones)
I've found using .Count(condition) > 0 is faster pretty much always (the cost is exactly the same as a properly-written EXISTS query)

Ok, I decided to try this out myself. Mind you, I'm using the OracleManagedDataAccess provider with the OracleEntityFramework, but I'm guessing it produces compliant SQL.
I found that First() was faster than Any() for a simple predicate. I'll show the two queries in EF and the SQL that was generated. Mind you, this is a simplified example, but the question was asking whether any, exists or first was faster for a simple predicate.
var any = db.Employees.Any(x => x.LAST_NAME.Equals("Davenski"));
So what does this resolve to in the database?
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS "C1"
FROM "MYSCHEMA"."EMPLOYEES" "Extent1"
WHERE ('Davenski' = "Extent1"."LAST_NAME")
)) THEN 1 ELSE 0 END AS "C1"
FROM ( SELECT 1 FROM DUAL ) "SingleRowTable1"
It's creating a CASE statement. As we know, ANY is merely syntatic sugar. It resolves to an EXISTS query at the database level. This happens if you use ANY at the database level as well. But this doesn't seem to be the most optimized SQL for this query.
In the above example, the EF construct Any() isn't needed here and merely complicates the query.
var first = db.Employees.Where(x => x.LAST_NAME.Equals("Davenski")).Select(x=>x.ID).First();
This resolves to in the database as:
SELECT
"Extent1"."ID" AS "ID"
FROM "MYSCHEMA"."EMPLOYEES" "Extent1"
WHERE ('Davenski' = "Extent1"."LAST_NAME") AND (ROWNUM <= (1) )
Now this looks like a more optimized query than the initial query. Why? It's not using a CASE ... THEN statement.
I ran these trivial examples several times, and in ALMOST every case, (no pun intended), the First() was faster.
In addition, I ran a raw SQL query, thinking this would be faster:
var sql = db.Database.SqlQuery<int>("SELECT ID FROM MYSCHEMA.EMPLOYEES WHERE LAST_NAME = 'Davenski' AND ROWNUM <= (1)").First();
The performance was actually the slowest, but similar to the Any EF construct.
Reflections:
EF Any doesn't exactly map to how you might use Any in the database. I could have written a more optimized query in Oracle with ANY than what EF produced without the CASE THEN statement.
ALWAYS check your generated SQL in a log file or in the debug output window.
If you're going to use ANY, remember it's syntactic sugar for EXISTS. Oracle also uses SOME, which is the same as ANY. You're generally going to use it in the predicate as a replacement for IN. In this case it generates a series of ORs in your WHERE clause. The real power of ANY or EXISTS is when you're using Subqueries and are merely testing for the EXISTENCE of related data.
Here's an example where ANY really makes sense. I'm testing for the EXISTENCE of related data. I don't want to get all of the records from the related table. Here I want to know if there are Surveys that have Comments.
var b = db.Survey.Where(x => x.Comments.Any()).ToList();
This is the generated SQL:
SELECT
"Extent1"."SURVEY_ID" AS "SURVEY_ID",
"Extent1"."SURVEY_DATE" AS "SURVEY_DATE"
FROM "MYSCHEMA"."SURVEY" "Extent1"
WHERE ( EXISTS (SELECT
1 AS "C1"
FROM "MYSCHEMA"."COMMENTS" "Extent2"
WHERE ("Extent1"."SURVEY_ID" = "Extent2"."SURVEY_ID")
))
This is optimized SQL!
I believe the EF does a wonderful job generating SQL. But you have to understand how the EF constructs map to DB constructs else you can create some nasty queries.
And probably the best way to get a count of related data is to do an explicit Load with a Collection Query count. This is far better than the examples provided in prior posts. In this case you're not loading related entities, you're just obtaining a count. Here I'm just trying to find out how many Comments I have for a particular Survey.
var d = db.Survey.Find(1);
var e = db.Entry(d).Collection(f => f.Comments)
.Query()
.Count();

Any() and First() is used with IEnumerable which gives you the flexibility for evaluating things lazily. However Exists() requires List.
I hope this clears things out for you and help you in deciding which one to use.

SQL0100W Error on DB2

I get the following error when running an sqr report on DB2:
SQL0100W - No row was found for FETCH, UPDATE or DELETE; or the result of a query is an empty table. SQLSTATE=02000
The sql in question runs correctly when I paste it into RapidSQL, replacing the parameters. The sql in question is an insert-select. No rows are returned by the select, and this is fine... I expect the report to be blank for my parameters.
Any idea how I can get around this?

DB2 returns always an SQL0100 warning (this is a warning, not an error - errors would have negative values) when no rows are returned. That's the way it is.
I don't know peoplesoft at all - so I can't give you any pointers with that. Back when I was programming for DB2 we ignored those SQL0100 warnings.

If SQR can't gracefully handle a NOT_FOUND SQL0100 return, then code a preliminary query to return a count of the number of rows that satisfy the conditions of the actual query. Check the result of the count in an if-then block in SQR to run the actual query if and only if the row count returned by the preceding query was not zero.

Turns out to be an environment setup issue. Got resolved with no change from me after a couple of builds....
Strange :-/

if you delete delete more than one record using logic operation like delete from tabname where columnnmae=deleterecord and columnnmae=deleterecord then they show this type error.machine an

Does DataReader.NextResult retrieves the result is always the same order

I have a SELECT query that yields multiple results and do not have any ORDER BY clause.
If I execute this query multiple times and then iterate through results using DataReader.NextResult(), would I be guaranteed to get the results in the same order?
For e.g. if I execute the following query that return 199 rows:
SELECT * FROM products WHERE productid < 200
would I always get the first result with productid = 1 and so on?
As far as I have observed it always return the results in same order, but I cannot find any documentation for this behavior.
======================================
As per my research:
Check out this blog Conor vs. SQL. I actually wanted to ask if the query-result changes even if the data in table remains the same (i.e no update or delete). But it seems like in case of large table, when SQL server employees parallelism, the order can be different

First of all, to iterate the rows in a DataReader, you should call Read, not NextResult.
Calling NextResult will move to the next result set if your query has multiple SELECT statements.
To answer your question, you must not rely on this.
A query without an ORDER BY clause will return rows in SQL Server's default iteration order.
For small tables, this will usually be the order in which the rows were added, but this is not guaranteed and is liable to change at any time. For example, if the table is indexed or partitioned, the order will be different.

No, DataReader will return the results in the order they come back from SQL. If you don't specify an ORDER BY clause, that will be the order that they exist in the table.

It is possible, perhaps even likely that they will always return in the same order, but this isn't guaranteed. The order is determined by the queryplan (at least in SQL Server) on the database server. If something changes that queryplan, the order could change. You should always use ORDER BY if the order of results is in anyway important to your processing of the data.