Does OrderBy followed by Skip and Take compile into query or run in memory? - entity-framework

Say I have a query like this one:
var result = collection.OrderBy(orderingFunction).Skip(start).Take(length);
Will the whole query run on SQL Server and return the result, or will it return the whole ordered table and then run Skip and Take in memory? I am concerned because I noticed that OrderBy returns IOrderedEnumerable.
How about something like this:
if (orderAscending)
    orderedCollection = collection.OrderBy(orderingFunction);
else
    orderedCollection = collection.OrderByDescending(orderingFunction);
var result = orderedCollection.Skip(start).Take(length);
Will the Skip and Take part run on Server or in memory in this case?

This query is translated into SQL. An Entity Framework query such as
myTable.OrderBy(row => row.Id).Skip(10).Take(20);
will produce SQL resembling the following:
SELECT TOP (20) [Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [my_table] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 10
ORDER BY [Extent1].[Id] ASC
I recommend downloading LINQPad, a utility that lets you execute EF (and other) queries and inspect both the results and the generated SQL. It is an invaluable tool for developing high-quality EF queries.
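The windowed SQL above can be exercised on any engine with window functions. Here is a minimal sketch using Python's built-in sqlite3 (SQLite syntax, so LIMIT stands in for SQL Server's TOP; the table and data are made up) confirming that Skip(10).Take(20) yields rows 11 through 30:

```python
import sqlite3

# Reproduce the paging SQL from the answer -- Skip(10).Take(20) --
# against a 100-row table. Requires SQLite >= 3.25 for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO my_table (Id) VALUES (?)",
                 [(i,) for i in range(1, 101)])

rows = conn.execute("""
    SELECT Id FROM (
        SELECT Id, row_number() OVER (ORDER BY Id ASC) AS rn
        FROM my_table
    )
    WHERE rn > 10          -- Skip(10)
    ORDER BY Id ASC
    LIMIT 20               -- Take(20)
""").fetchall()

print([r[0] for r in rows])  # rows 11 through 30
```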

Yes, it does translate to SQL. This is essential for paging.
You can verify this using SQL Profiler.

Related

select count(*) in WHERE Clause needing a comparison operator

Kinda new to SQL so I was reading up on some queries and chanced upon this (https://iggyfernandez.wordpress.com/2011/12/04/day-4-the-twelve-days-of-sql-there-way-you-write-your-query-matters/)
The part that got me curious is the aggregate query in the WHERE clause. This is probably my misunderstanding, but how does the author's code (shown below) run? I presumed that COUNT(*), or rather aggregate functions in general, cannot be used in the WHERE clause and that you need HAVING for that?
SELECT per.empid, per.lname
FROM personnel per
WHERE (SELECT count(*) FROM payroll pay WHERE pay.empid = per.empid AND pay.salary = 199170) > 0;
My second question would be why the comparison operator (> 0) is needed? I was playing around and noticed that it would not run in PostgreSQL without the > 0; also, reformatting it to use a HAVING clause massively improves the query execution time:
SELECT per.empid, per.lname
FROM personnel per
WHERE EXISTS (SELECT per.empid FROM payroll pay WHERE pay.empid = per.empid AND pay.salary = 199170)
GROUP BY per.empid, per.lname
HAVING COUNT(*) > 0;
Omit the GROUP BY and HAVING clauses from your version, and you get a more efficient query that is equivalent to the original.
In the original query, count(*) appears in the SELECT list of a subquery, not directly in the outer WHERE clause, so the rule against aggregates in WHERE does not apply. You can use a parenthesized subquery almost anywhere in an SQL statement; here it produces a scalar value, and the > 0 comparison is what turns that number into the boolean condition that WHERE requires.
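Both forms can be verified with Python's built-in sqlite3 (table and column names follow the question; the data is made up). The scalar-subquery version runs fine, and the simpler EXISTS version, which can stop at the first matching row, returns the same result:

```python
import sqlite3

# Minimal schema and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personnel (empid INTEGER, lname TEXT);
    CREATE TABLE payroll  (empid INTEGER, salary INTEGER);
    INSERT INTO personnel VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO payroll  VALUES (1, 199170), (2, 50000);
""")

# count(*) is legal here because it sits in the SELECT list of a
# subquery; the subquery yields a scalar that WHERE compares with > 0.
with_count = conn.execute("""
    SELECT per.empid, per.lname FROM personnel per
    WHERE (SELECT count(*) FROM payroll pay
           WHERE pay.empid = per.empid AND pay.salary = 199170) > 0
""").fetchall()

# Equivalent, without counting every match.
with_exists = conn.execute("""
    SELECT per.empid, per.lname FROM personnel per
    WHERE EXISTS (SELECT 1 FROM payroll pay
                  WHERE pay.empid = per.empid AND pay.salary = 199170)
""").fetchall()

print(with_count, with_exists)  # both: [(1, 'Smith')]
```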

Inefficient Azure mobile apps query consuming lots of Azure SQL DTUs

I have an application that uses the Azure Mobile Apps .NET backend and Windows Client. One of the tables being synced between the backend and client has 30,000 rows. When syncing this table, the DTU spikes to around 60% of my 100 DTU (S3) tier, which is very bad. The monitoring graph looks like this:
The table controller for this table is pretty basic:
public async Task<IQueryable<MyBigTable>> GetMyBigTable()
{
    return Query();
}
The following is the SQL generated by Azure Mobile Apps:
exec sp_executesql N'SELECT TOP (51)
[Project2].[Field1] AS [Field1],
[Project2].[C1] AS [C1],
[Project2].[C2] AS [C2],
[Project2].[Deleted] AS [Deleted],
[Project2].[C3] AS [C3],
...
FROM ( SELECT
[Project1].[Id] AS [Id],
...
FROM ( SELECT
[Extent1].[Id] AS [Id],
...
FROM [dbo].[MyBigTable] AS [Extent1]
WHERE ([Extent1].[UpdatedAt] >= @p__linq__0)
) AS [Project1]
ORDER BY [Project1].[UpdatedAt] ASC, [Project1].[Id] ASC
OFFSET @p__linq__1 ROWS FETCH NEXT @p__linq__2 ROWS ONLY
) AS [Project2]
ORDER BY [Project2].[UpdatedAt] ASC, [Project2].[Id] ASC',N'@p__linq__0 datetimeoffset(7),@p__linq__1 int,@p__linq__2 int',@p__linq__0='2017-02-28 03:48:49.4840000 +00:00',@p__linq__1=0,@p__linq__2=50
... not very pretty, and far too complicated for its ultimate purpose, but that's another matter. The problem, I think, which explains the shape of the graph, is that the inner SQL aliased as [Project1] always returns all rows from [UpdatedAt] till the end of the table. This inner SQL would have been the ideal place to put a TOP 51 clause, but instead it's found in the outer SQLs.
So, although each paging call from the client returns only 50 rows to the client, the first call causes the inner SQL to return all rows; the next call, 50 rows less than the previous, and so on. This, I think, explains the shape of the graph.
Is there any way to influence how the SQL is generated, or even override the SQL with my own? Does this mean that I need to extract the OData query? What is the best way to do this?
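The OFFSET/FETCH pattern in that generated SQL can be reproduced in miniature with Python's built-in sqlite3 (LIMIT ... OFFSET is SQLite's spelling; table name and values are made up). Every page is ordered by (UpdatedAt, Id), so later pages make the engine step over all earlier rows before producing the requested 50, which matches the progressively shrinking cost described above:

```python
import sqlite3

# Invented 300-row table standing in for MyBigTable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyBigTable (Id INTEGER PRIMARY KEY, UpdatedAt TEXT)")
conn.executemany("INSERT INTO MyBigTable VALUES (?, ?)",
                 [(i, f"2017-02-28T{i:05d}") for i in range(1, 301)])

def page(offset, limit=50):
    # Same shape as the generated SQL: filter on UpdatedAt, order by
    # (UpdatedAt, Id), then skip `offset` rows and fetch `limit`.
    return conn.execute("""
        SELECT Id FROM MyBigTable
        WHERE UpdatedAt >= ?
        ORDER BY UpdatedAt ASC, Id ASC
        LIMIT ? OFFSET ?
    """, ("2017-02-28T00000", limit, offset)).fetchall()

print(page(0)[0], page(250)[0])  # (1,) (251,)
```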

Dynamic FROM clause in Postgres

Using PostgreSQL 9.1.13 I've written the following query to calculate some data:
WITH windowed AS (
SELECT a.person_id, a.category_id,
CAST(dense_rank() OVER w AS float) / COUNT(*) OVER (ORDER BY category_id) * 100.0 AS percentile
FROM (
SELECT DISTINCT ON (person_id, category_id) *
FROM performances s
-- Want to insert a FROM clause here
INNER JOIN person p ON s.person_id = p.ident
ORDER BY person_id, category_id, created DESC
) a
WINDOW w AS (PARTITION BY category_id ORDER BY score)
)
SELECT category_id,percentile FROM windowed
WHERE person_id = 1;
I now want to turn this into a stored procedure but my issue is that in the middle there, where I showed the comment, I need to place a dynamic WHERE clause. For example, I'd like to add something like:
WHERE p.weight > 110 OR p.weight IS NULL
The calling application lets people pick filters, and so I want to be able to pass the appropriate filters into the query. There could be 0 or many filters, depending on the caller, but I could pass it all in as a properly formatted WHERE clause as a string parameter, for example.
The calling application just sends values to a webservice, which then builds the string and calls the stored procedure, so SQL injection attacks won't really be an issue.
The calling application just sends values to a webservice, which then
builds the string and calls the stored procedure, so SQL injection
attacks won't really be an issue.
Too many cooks spoil the broth.
Either let your webservice build the SQL statement or let Postgres do it. Don't use both on the same query. That leaves two possible weak spots for SQL injection attacks and makes debugging and maintenance a lot harder.
Here is a full code example of a plpgsql function that builds and executes an SQL statement dynamically while making SQL injection impossible (posted just two days ago):
Robust approach for building SQL queries programmatically
Details heavily depend on exact requirements.
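To illustrate the single-cook principle in application code, here is a hypothetical sketch in Python's built-in sqlite3 (the plpgsql equivalent would use format() for identifiers and EXECUTE ... USING for values). One layer builds the whole statement: column/operator pairs come from a whitelist, and user-supplied values travel as bind parameters, never concatenated into the SQL string:

```python
import sqlite3

def build_filter(filters):
    # filters: list of (column, op, value). Column and operator must be
    # on the whitelist; only the value is taken from user input.
    allowed = {("weight", ">"), ("weight", "IS NULL"), ("category_id", "=")}
    clauses, params = [], []
    for col, op, val in filters:
        if (col, op) not in allowed:
            raise ValueError(f"filter not allowed: {col} {op}")
        if op == "IS NULL":
            clauses.append(f"{col} IS NULL")
        else:
            clauses.append(f"{col} {op} ?")   # value becomes a bind parameter
            params.append(val)
    where = (" WHERE " + " OR ".join(clauses)) if clauses else ""
    return where, params

# Invented table matching the question's example filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (ident INTEGER, weight INTEGER);
    INSERT INTO person VALUES (1, 120), (2, 90), (3, NULL);
""")
where, params = build_filter([("weight", ">", 110), ("weight", "IS NULL", None)])
rows = conn.execute("SELECT ident FROM person" + where, params).fetchall()
print(rows)  # [(1,), (3,)]
```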

JPA - MAX of COUNT or SELECT FROM SELECT

I wrote the following query for MySQL:
SELECT subquery.t1_column1,
subquery.t2_id,
MAX(subquery.val)
FROM (
SELECT t1.column1 as t1_column1,
t1.id_t2 AS t2_id,
count(1) AS val
FROM table1 t1
INNER JOIN table2 t2
ON t2.id = t1.id_t2
GROUP BY t1.id_t2
) subquery
GROUP BY t1_column1
And I'd like to translate it into JPA (JPQL or criteria query).
I don't know how to make this max(count) thing, and JPA doesn't seem to like the SELECT FROM SELECT...
If anyone has an idea other than native queries (I'll do it for now), it would be great.
I haven't checked the JPA specification, but given that the Hibernate documentation says
Note that HQL subqueries can occur only in the select or where
clauses.
I very much doubt that your query can be transformed into a valid JPQL query.
You'll have to keep using this native SQL query.
JPA 2.0 JPQL does not support sub-selects in the from clause. You may want to try to rewrite your query, or use a native SQL query.
EclipseLink 2.4 will support sub-selects in the FROM clause,
see:
http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Querying/JPQL#Sub-selects_in_FROM_clause
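For reference, the query itself is ordinary SQL with a derived table, so the native-query route will run it verbatim on any engine that supports sub-selects in FROM. A sketch with Python's built-in sqlite3 (schema and data invented for illustration) that can be adapted when testing the native query:

```python
import sqlite3

# Invented tables matching the question's shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (id INTEGER PRIMARY KEY);
    CREATE TABLE table1 (column1 TEXT, id_t2 INTEGER);
    INSERT INTO table2 VALUES (1), (2), (3);
    INSERT INTO table1 VALUES ('A', 1), ('A', 1), ('A', 1),
                              ('A', 2), ('B', 3), ('B', 3);
""")

# The question's MAX-of-COUNT query, unchanged: count per id_t2 in the
# derived table, then take the maximum count per column1 outside.
rows = conn.execute("""
    SELECT subquery.t1_column1, subquery.t2_id, MAX(subquery.val)
    FROM (SELECT t1.column1 AS t1_column1,
                 t1.id_t2   AS t2_id,
                 count(1)   AS val
          FROM table1 t1
          INNER JOIN table2 t2 ON t2.id = t1.id_t2
          GROUP BY t1.id_t2) subquery
    GROUP BY t1_column1
""").fetchall()

print(sorted(rows))  # [('A', 1, 3), ('B', 3, 2)]
```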

Entity framework: query executing 'select from' for no reason

I'm having some issues with Entity Framework. I'm executing a simple select from a view in the database. However, when I view the SQL that EF generates, it is executing the query twice using a select from. Is this the way it is supposed to operate? It seems very inefficient.
var reads = (from rt in ctx.C2kReadsToTransfer
where rt.ReadDt > fromDate
&& rt.ReadDt < toDate
select rt);
This gets translated into the following SQL
SELECT
[Extent1].[AMRID] AS [AMRID]
, [Extent1].[Comments] AS [Comments]
, [Extent1].[ExternalSystemType] AS [ExternalSystemType]
, [Extent1].[LastReadDt] AS [LastReadDt]
, [Extent1].[ReadDt] AS [ReadDt]
, [Extent1].[Reading] AS [Reading]
, [Extent1].[Units] AS [Units]
, [Extent1].[Transferred] AS [Transferred]
FROM
(SELECT
[ReadsToTransfer].[AMRID] AS [AMRID]
, [ReadsToTransfer].[Comments] AS [Comments]
, [ReadsToTransfer].[ExternalSystemType] AS [ExternalSystemType]
, [ReadsToTransfer].[LastReadDt] AS [LastReadDt]
, [ReadsToTransfer].[ReadDt] AS [ReadDt]
, [ReadsToTransfer].[Reading] AS [Reading]
, [ReadsToTransfer].[Transferred] AS [Transferred]
, [ReadsToTransfer].[Units] AS [Units]
FROM [dbo].[ReadsToTransfer] AS [ReadsToTransfer])
AS [Extent1]
That seems to be very inefficient, especially when the table contains close to 250 million rows, as ours does. Also, if I tack a .Take(2000) onto the end of the code, it simply puts a 'select top 2000' on only the outer select, thus selecting the top 2000 rows of the inner select, which is the entire table.
Any thoughts on this?
That seems to be very inefficient
I don't think so... the outer SELECT is just a projection (actually an identity projection) of the inner SELECT, and a projection has a negligible performance impact...
Regarding the TOP 2000 clause, the fact that it is on the outer SELECT doesn't mean that the DB will read all rows from the inner SELECT; it will read them only as long as they are requested by the outer SELECT, then stop.
Just try to run the query manually, with and without the outer SELECT: I bet you won't find any significant difference in performance.
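That experiment can be run in miniature with Python's built-in sqlite3 (LIMIT playing the role of TOP; table name and data are made up). The identity-projection wrapper with the limit on the outer SELECT returns exactly what the direct query returns, because the engine stops reading the inner SELECT once the outer one is satisfied:

```python
import sqlite3

# Invented table standing in for the view in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ReadsToTransfer (AMRID INTEGER, Reading REAL)")
conn.executemany("INSERT INTO ReadsToTransfer VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(10_000)])

# Direct query with a row limit.
direct = conn.execute(
    "SELECT AMRID, Reading FROM ReadsToTransfer LIMIT 2000").fetchall()

# Same query wrapped in an identity projection, limit on the OUTER select,
# mirroring the EF-generated shape.
wrapped = conn.execute("""
    SELECT Extent1.AMRID, Extent1.Reading
    FROM (SELECT AMRID, Reading FROM ReadsToTransfer) AS Extent1
    LIMIT 2000
""").fetchall()

print(direct == wrapped, len(wrapped))  # True 2000
```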