I want to compare 2 results.
I want to check if there is a difference between predicate = 'description' and predicate = '#shortDescription' (to avoid duplicates especially).
here is my ddl (I did not manage to declare the variable). https://dbfiddle.uk/o7cHpbBl
I don't understand why the except doesn't work when I applied the same logic as here
thank you
Related
This is a very weird problem
In short
var q = (some query).Count();
Gives my a number and
var q = (some query).ToList().Count();
Gives me entirely different number...
with mentioning that (some query) has two includes (joins)
is there a sane explanation for that???
EDIT: here is my query
var q = db.membership_renewals.Include(i => i.member).Include(i => i.sport).Where(w => w.isDeleted == false).Count();
this gives me a wrong number
and this:
var q = db.membership_renewals.Include(i => i.member).Include(i => i.sport).Where(w => w.isDeleted == false).ToList().Count();
Gives me accurate number..
EDIT 2
Wher I wrote my query as linq query it worked perfectly...
var q1 = (from d in db.membership_renewals where d.isDeleted == false join m in db.members on d.mr_memberId equals m.m_id join s in db.sports on d.mr_sportId equals s.s_id select d.mr_id).Count();
I think the problem that entity framework doesn't execute the joins in the original query but forced to execute them in (ToList())...
I Finally figured out what's going on...
The database tables are not linked together in the database (there are no relationship or constraints defined in the database itself) so the code doesn't execute the (inner join) part.
However my classes on the other hand are well written so when I perform (ToList()) it automatically ignores the unbound rows...
And when I wrote the linq query defining the relation ship keys (primary and foreign) it worked alright because now the database understands my relation between tables...
Thanks everyone you've been great....
My guess is IQueryable gives a smaller number cause not all the objects are loaded, kind of like a stream in Java, but IQueryable.toList().count() forces the Iqueryable to load all the data and it is traversed by the list constructor and stored in the list so IQueryable.toList().Count() is the accurate answer. This is based on 5 minutes of search on MSDN.
The idea is the underlying datastore of the IQueryable is a database iterator so it executes differently every time because it executes the query again on the database, so if you call it twice against the same table, and the data has changed you get different results. This is called delayed execution. But when you say IQueryable.ToList() you force the iterator to do the whole iteration once and dump the results in a list which is constant
I want to build queries in my Extbase repository that can filter out objects with a certain number of items. items is of type ObjectStorage in the model.
I tried to get objects with at least 1 item in this query, but it obviously doesn't work because the method "greater than" won't count the items in the objectStorage.
$x = 0; //any number
$query = $this->createQuery();
$constraints[] = $query->equals('deleted', 0);
$constraints[] = $query->equals('hidden', 0);
$constraints[] = $query->greaterThan('items', $x);
return $query->matching($query->logicalAnd($constraints))->execute();
So how can I do this?
I was thinking about an SQL Statement, but how could I add it to constraints?
I don't want to do everything with SQL.
I don't see how you could do it without either using $query->statement or iterating the whole result without the "count" constraint and then throwing away the records with items lower than $x. (Apart from pgampe's approach that was posted at the exact same time.)
Which approach you take depends on performance considerations. If you have a small result set, doing a foreach on the QueryResult wouldn't hurt so much. But since every result is converted into an object in the process, you'll have a big overhead by doing so when you have a lot of records.
Keep in mind that using $query->statement won't make you lose the ability to work with the records as domain objects. So it's not a big deal to go this way.
You can create yourself a custom constraint that does this. Have a look at how it is done in the \TYPO3\CMS\Extbase\Persistence\Generic\Qom\QueryObjectModelFactory.
I am calculating average length of identifiers with CQLinq in NDepend, and I want to get the length of the names of classes, fields and methods. I walked through this page of CQlinq: http://www.ndepend.com/docs/cqlinq-syntax, and I have code like:
let id_m = Methods.Select(m => new { m.SimpleName, m.SimpleName.Length })
let id_f = Fields.Select(f => new { f.Name, f.Name.Length })
select id_m.Union(id_f)
It doesn't work, one error says:
'System.Collections.Generic.IEnumerable' does not
contain a definition for 'Union'...
The other one is:
cannot convert from
'System.Collections.Generic.IEnumerable' to
'System.Collections.Generic.HashSet'
However, according to MSDN, IEnumerable Interface defines Union() and Concat() methods.
It seems to me that I cannot use CQLinq exactly the same way as Linq. Anyway, is there a way to get the information from Types, Methods and Fields domains within a singe query?
Thanks a lot.
is there a way to get the information from Types, Methods and Fields domains within a singe query?
Not for now, because a CQLinq query can only match a sequence of types, or a sequence of methods or a sequence of field, so you need 3 distinct code queries.
For next version CQLinq, will be improved a lot and indeed you'll be able to write things like:
from codeElement in Application.TypesAndMembers
select new { codeElement, codeElement.Name.Length }
Next version will be available before the end of the year 2016.
I am using Entity Framework and I need to check if a product with name = "xyz" exists ...
I think I can use Any(), Exists() or First().
Which one is the best option for this kind of situation? Which one has the best performance?
Thank You,
Miguel
Okay, I wasn't going to weigh in on this, but Diego's answer complicates things enough that I think some additional explanation is in order.
In most cases, .Any() will be faster. Here are some examples.
Workflows.Where(w => w.Activities.Any())
Workflows.Where(w => w.Activities.Any(a => a.Title == "xyz"))
In the above two examples, Entity Framework produces an optimal query. The .Any() call is part of a predicate, and Entity Framework handles this well. However, if we make the result of .Any() part of the result set like this:
Workflows.Select(w => w.Activities.Any(a => a.Title == "xyz"))
... suddenly Entity Framework decides to create two versions of the condition, so the query does as much as twice the work it really needed to. However, the following query isn't any better:
Workflows.Select(w => w.Activities.Count(a => a.Title == "xyz") > 0)
Given the above query, Entity Framework will still create two versions of the condition, plus it will also require SQL Server to do an actual count, which means it doesn't get to short-circuit as soon as it finds an item.
But if you're just comparing these two queries:
Activities.Any(a => a.Title == "xyz")
Activities.Count(a => a.Title == "xyz") > 0
... which will be faster? It depends.
The first query produces an inefficient double-condition query, which means it will take up to twice as long as it has to.
The second query forces the database to check every item in the table without short-circuiting, which means it could take up to N times longer than it has to, depending on how many items need to be evaluated before finding a match. Let's assume the table has 10,000 items:
If no item in the table matches the condition, this query will take roughly half the time as the first query.
If the first item in the table matches the condition, this query will take roughly 5,000 times longer than the first query.
If one item in the table is a match, this query will take an average of 2,500 times longer than the first query.
If the query is able to leverage an index on the Title and key columns, this query will take roughly half the time as the first query.
So in summary, IF you are:
Using Entity Framework 4 (since newer versions might improve the query structure) Entity Framework 6.1 or earlier (since 6.1.1 has a fix to improve the query), AND
Querying directly against the table (as opposed to doing a sub-query), AND
Using the result directly (as opposed to it being part of a predicate), AND
Either:
You have good indexes set up on the table you are querying, OR
You expect the item not to be found the majority of the time
THEN you can expect .Any() to take as much as twice as long as .Count(). For example, a query might take 100 milliseconds instead of 50. Or 10 instead of 5.
IN ANY OTHER CIRCUMSTANCE .Any() should be at least as fast, and possibly orders of magnitude faster than .Count().
Regardless, until you have determined that this is actually the source of poor performance in your product, you should care more about what's easy to understand. .Any() more clearly and concisely states what you are really trying to figure out, so stick with that.
Any translates into "Exists" at the database level. First translates into Select Top 1 ... Between these, Exists will out perform First because the actual object doesn't need to be fetched, only a Boolean result value.
At least you didn't ask about .Where(x => x.Count() > 0) which requires the entire match set to be evaluated and iterated over before you can determine that you have one record. Any short-circuits the request and can be significantly faster.
One would think Any() gives better results, because it translates to an EXISTS query... but EF is awfully broken, generating this (edited):
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent1]
WHERE Condition
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent2]
WHERE Condition
)) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
Instead of:
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [MyTable] AS [Extent1]
WHERE Condition
)) THEN cast(1 as bit)
ELSE cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
...basically doubling the query cost (for simple queries; it's even worse for complex ones)
I've found using .Count(condition) > 0 is faster pretty much always (the cost is exactly the same as a properly-written EXISTS query)
Ok, I decided to try this out myself. Mind you, I'm using the OracleManagedDataAccess provider with the OracleEntityFramework, but I'm guessing it produces compliant SQL.
I found that First() was faster than Any() for a simple predicate. I'll show the two queries in EF and the SQL that was generated. Mind you, this is a simplified example, but the question was asking whether any, exists or first was faster for a simple predicate.
var any = db.Employees.Any(x => x.LAST_NAME.Equals("Davenski"));
So what does this resolve to in the database?
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS "C1"
FROM "MYSCHEMA"."EMPLOYEES" "Extent1"
WHERE ('Davenski' = "Extent1"."LAST_NAME")
)) THEN 1 ELSE 0 END AS "C1"
FROM ( SELECT 1 FROM DUAL ) "SingleRowTable1"
It's creating a CASE statement. As we know, ANY is merely syntatic sugar. It resolves to an EXISTS query at the database level. This happens if you use ANY at the database level as well. But this doesn't seem to be the most optimized SQL for this query.
In the above example, the EF construct Any() isn't needed here and merely complicates the query.
var first = db.Employees.Where(x => x.LAST_NAME.Equals("Davenski")).Select(x=>x.ID).First();
This resolves to in the database as:
SELECT
"Extent1"."ID" AS "ID"
FROM "MYSCHEMA"."EMPLOYEES" "Extent1"
WHERE ('Davenski' = "Extent1"."LAST_NAME") AND (ROWNUM <= (1) )
Now this looks like a more optimized query than the initial query. Why? It's not using a CASE ... THEN statement.
I ran these trivial examples several times, and in ALMOST every case, (no pun intended), the First() was faster.
In addition, I ran a raw SQL query, thinking this would be faster:
var sql = db.Database.SqlQuery<int>("SELECT ID FROM MYSCHEMA.EMPLOYEES WHERE LAST_NAME = 'Davenski' AND ROWNUM <= (1)").First();
The performance was actually the slowest, but similar to the Any EF construct.
Reflections:
EF Any doesn't exactly map to how you might use Any in the database. I could have written a more optimized query in Oracle with ANY than what EF produced without the CASE THEN statement.
ALWAYS check your generated SQL in a log file or in the debug output window.
If you're going to use ANY, remember it's syntactic sugar for EXISTS. Oracle also uses SOME, which is the same as ANY. You're generally going to use it in the predicate as a replacement for IN. In this case it generates a series of ORs in your WHERE clause. The real power of ANY or EXISTS is when you're using Subqueries and are merely testing for the EXISTENCE of related data.
Here's an example where ANY really makes sense. I'm testing for the EXISTENCE of related data. I don't want to get all of the records from the related table. Here I want to know if there are Surveys that have Comments.
var b = db.Survey.Where(x => x.Comments.Any()).ToList();
This is the generated SQL:
SELECT
"Extent1"."SURVEY_ID" AS "SURVEY_ID",
"Extent1"."SURVEY_DATE" AS "SURVEY_DATE"
FROM "MYSCHEMA"."SURVEY" "Extent1"
WHERE ( EXISTS (SELECT
1 AS "C1"
FROM "MYSCHEMA"."COMMENTS" "Extent2"
WHERE ("Extent1"."SURVEY_ID" = "Extent2"."SURVEY_ID")
))
This is optimized SQL!
I believe the EF does a wonderful job generating SQL. But you have to understand how the EF constructs map to DB constructs else you can create some nasty queries.
And probably the best way to get a count of related data is to do an explicit Load with a Collection Query count. This is far better than the examples provided in prior posts. In this case you're not loading related entities, you're just obtaining a count. Here I'm just trying to find out how many Comments I have for a particular Survey.
var d = db.Survey.Find(1);
var e = db.Entry(d).Collection(f => f.Comments)
.Query()
.Count();
Any() and First() is used with IEnumerable which gives you the flexibility for evaluating things lazily. However Exists() requires List.
I hope this clears things out for you and help you in deciding which one to use.
I have a Postgres table with more than 8 million rows. Given the following two ways of doing the same query via DBD::Pg, I get wildly different results.
$q .= '%';
## query 1
my $sql = qq{
SELECT a, b, c
FROM t
WHERE Lower( a ) LIKE '$q'
};
my $sth1 = $dbh->prepare($sql);
$sth1->execute();
## query 2
my $sth2 = $dbh->prepare(qq{
SELECT a, b, c
FROM t
WHERE Lower( a ) LIKE ?
});
$sth2->execute($q);
query 2 is at least an order of magnitude slower than query 1... seems like it is not using the indexes, while query 1 is using the index.
Would love hear why.
With LIKE expressions, b-tree indexes can only be used if the search pattern is left-anchored, i.e. terminated with %. More details in the manual.
Thanks to #evil otto for the link. This link to the current version.
Your first query provides this essential information at prepare time, so the query planner can use a matching index.
Your second query does not provide any information about the pattern at prepare time, so the query planner cannot use any indexes.
I suspect that in the first case the query compiler/optimizer detects that the clause is a constant, and can build an optimal query plan. In the second it has to compile a more generic query because the bound variable can be anything at run-time.
Are you running both test cases from same file using same $dbh object?
I think reason of increasing speed in second case is that you using prepared statement which is already parsed(but maybe I wrong:)).
Ahh, I see - I will drop out after this comment since I don't know Perl. But I would trust that the editor is correct in highlighting the $q as a constant. I'm guessing that you need to concatenate the value into the string, rather than just directly referencing the variable. So, my guess is that if + is used for string concatenation in perl, then use something like:
my $sql = qq{
SELECT a, b, c
FROM t
WHERE Lower( a ) LIKE '
} + $q + qq{'};
(Note: unless the language is tightly integrated with the database, such as Oracle/PLSQL, you usually have to create a completely valid SQL string before submitting to the database, instead of expecting the compiler to 'interpolate'/'Substitute' the value of the variable.)
I would again suggest that you get the COUNT() of the statements, to make sure that you are comparing apple to apples.
I don't know Postgres at all, but I think in Line 7 (WHERE Lower( a ) LIKE '$q'
), $q is actually a constant. It looks like your editor thinks so too, since it is highlighted in red. You probably still need to use the ? for the variable.
To test, do a COUNT(*), and make sure they match - I could be way offbase.