I'm having some issues with Entity Framework. I'm executing a simple select from a view in the database. However, when I view the SQL that EF generates, it wraps the query in a nested SELECT (a select from a select). Is this the way it is supposed to operate? It seems very inefficient.
var reads = (from rt in ctx.C2kReadsToTransfer
where rt.ReadDt > fromDate
&& rt.ReadDt < toDate
select rt);
This gets translated into the following SQL
SELECT
[Extent1].[AMRID] AS [AMRID]
, [Extent1].[Comments] AS [Comments]
, [Extent1].[ExternalSystemType] AS [ExternalSystemType]
, [Extent1].[LastReadDt] AS [LastReadDt]
, [Extent1].[ReadDt] AS [ReadDt]
, [Extent1].[Reading] AS [Reading]
, [Extent1].[Units] AS [Units]
, [Extent1].[Transferred] AS [Transferred]
FROM
(SELECT
[ReadsToTransfer].[AMRID] AS [AMRID]
, [ReadsToTransfer].[Comments] AS [Comments]
, [ReadsToTransfer].[ExternalSystemType] AS [ExternalSystemType]
, [ReadsToTransfer].[LastReadDt] AS [LastReadDt]
, [ReadsToTransfer].[ReadDt] AS [ReadDt]
, [ReadsToTransfer].[Reading] AS [Reading]
, [ReadsToTransfer].[Transferred] AS [Transferred]
, [ReadsToTransfer].[Units] AS [Units]
FROM [dbo].[ReadsToTransfer] AS [ReadsToTransfer])
AS [Extent1]
That seems to be very inefficient, especially when the table contains close to 250 million rows as ours does. Also, if I tack a .Take(2000) onto the end of the code, it simply puts a SELECT TOP 2000 on only the outer select, so it takes the top 2000 of the inner select, which is the entire table.
Any thoughts on this?
That seems to be very inefficient
I don't think so... the outer SELECT is just a projection (actually an identity projection) of the inner SELECT, and a projection has a negligible performance impact.
Regarding the TOP 2000 clause, the fact that it is on the outer SELECT doesn't mean that the DB will read all rows from the inner SELECT; it will read only as many rows as the outer SELECT requests, then stop.
Just try to run the query manually, with and without the outer SELECT: I bet you won't find any significant difference in performance.
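An easy way to make that comparison is to run both shapes by hand and look at the I/O statistics. A minimal sketch, assuming the view from the question (column list abbreviated; the date range is a hypothetical stand-in for the LINQ parameters):

SET STATISTICS IO ON

DECLARE @fromDate datetime, @toDate datetime
SELECT @fromDate = '20240101', @toDate = '20240201'  -- hypothetical range

-- Nested shape, as EF generates it
SELECT TOP 2000 [Extent1].[AMRID], [Extent1].[ReadDt], [Extent1].[Reading]
FROM (SELECT [AMRID], [ReadDt], [Reading]
      FROM [dbo].[ReadsToTransfer]) AS [Extent1]
WHERE [Extent1].[ReadDt] > @fromDate AND [Extent1].[ReadDt] < @toDate

-- Flattened shape
SELECT TOP 2000 [AMRID], [ReadDt], [Reading]
FROM [dbo].[ReadsToTransfer]
WHERE [ReadDt] > @fromDate AND [ReadDt] < @toDate

The optimizer should collapse the identity projection, so the plans and read counts for the two statements should come out the same.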
Related
I am using an ASP.NET GridView control and setting its DataSourceID to an EntityDataSource as below. In the page load I set GridDataSource.EntityTypeFilter to a view name and also add a where clause with GridDataSource.Where = sWhereClause.
The view has a million records, but the WHERE condition filters them down. The EntityDataSource first gets all million records in a subquery and then applies the WHERE, which times out the command. It generates the query below. I want the WHERE clause to go into the ViewName SELECT statement itself, not against the subquery table Extent1.
SELECT TOP (20)
[Filter1].[COL1],
[Filter1].[COL2],
[Filter1].[COL3]
FROM (
SELECT [Extent1].[COL1] , [Extent1].[COL2], [Extent1].[COL3]
, row_number() OVER (ORDER BY [Extent1].[COL1] ASC
) AS [row_number]
FROM
(
SELECT
ViewName.V1,
ViewName.V2,
ViewName.V3,
ViewName.V4
FROM [dbo].ViewName
)
AS [Extent1]
WHERE (([Extent1].[COL1] LIKE '%FilterValue%')
OR ([Extent1].[COL1] LIKE '%FilterValue%') OR ([Extent1].[COL2] LIKE '%FilterValue%') OR ([Extent1].[COL3] LIKE '%FilterValue%'))
) AS [Filter1]
WHERE [Filter1].[row_number] > 0
ORDER BY [Filter1].[COL1] ASC
Thanks, and I appreciate any help in advance.
The EntityDataSource first getting all million record in Sub-Query then applying the Where which timing out command
Just because the WHERE clause is not in the subquery does not mean that the subquery isn't filtered. SQL Server (assuming that's what you're using) can "push down" the WHERE clause predicates into the subquery, and into the base tables in the view definition.
But that won't make any difference here, as your multiple LIKE predicates have to be evaluated for every single row in the output of the view, and then all the rows have to be sorted to find the top 20.
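To see what pushdown means, compare the shape the tool generates with what the optimizer is free to rewrite it to. A rough sketch using the placeholder names from the question:

-- Shape as generated: filter outside the derived table
SELECT COL1, COL2, COL3
FROM (SELECT V1 AS COL1, V2 AS COL2, V3 AS COL3
      FROM dbo.ViewName) AS Extent1
WHERE Extent1.COL1 LIKE '%FilterValue%'

-- Equivalent shape after predicate pushdown
SELECT V1, V2, V3
FROM dbo.ViewName
WHERE V1 LIKE '%FilterValue%'

Either way, the leading-wildcard LIKE rules out an index seek, so every row the view produces still has to be tested.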
Using this SQL, I can map a boolean column to text:
SELECT *, (CASE WHEN bars.some_cond THEN 'Yes' ELSE 'No' END) AS some_cond_alpha
FROM "foos"
INNER JOIN "bars" ON "bars"."id" = "foos"."bar_id";
So why do I get a PG::UndefinedColumn: ERROR: column "some_cond_alpha" does not exist when I try to use it in a WHERE clause?
SELECT *, (CASE WHEN bars.some_cond THEN 'Yes' ELSE 'No' END) AS some_cond_alpha
FROM "foos"
INNER JOIN "bars" ON "bars"."id" = "foos"."bar_id"
WHERE (some_cond_alpha ILIKE '%y%');
This is because the column is created on the fly and does not exist yet at that point. Possibly in later versions of PG it will, but right now you cannot refer to an aliased column in the WHERE clause, although for some reason you can refer to an aliased column in the GROUP BY clause (don't ask me why they are more lenient in GROUP BY).
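For example, using the tables from the question, the alias is accepted in GROUP BY but rejected in WHERE:

-- Works: GROUP BY may reference a SELECT-list alias
SELECT (CASE WHEN bars.some_cond THEN 'Yes' ELSE 'No' END) AS some_cond_alpha,
       count(*)
FROM "foos"
INNER JOIN "bars" ON "bars"."id" = "foos"."bar_id"
GROUP BY some_cond_alpha;

-- Fails with "column does not exist": WHERE may not
-- WHERE some_cond_alpha ILIKE '%y%'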
To get around this, I would turn the query into a subquery and then filter on the column OUTSIDE the subquery, as follows:
select *
from (
SELECT *, (CASE WHEN bars.some_cond THEN 'Yes' ELSE 'No' END) AS some_cond_alpha
FROM "foos"
INNER JOIN "bars" ON "bars"."id" = "foos"."bar_id"
) x
WHERE (x.some_cond_alpha ILIKE '%y%')
NOTE: it is possible that at some point in the future you will be able to refer to an aliased column in the WHERE clause. In prior versions you could not refer to the alias in the GROUP BY clause either, but since 9.4+ it is possible...
SQL evaluates queries in a rather counterintuitive way. It starts with the FROM and WHERE clauses, and only hits the SELECT towards the end. So aliases defined in the SELECT don't exist yet when we're in the WHERE. You need to do a subquery if you want to have access to an alias, as shown in Walker Farrow's answer.
When I read an SQL query, I try to do so in roughly this order:
Start at the FROM. You can generally read one table/view/subquery at a time from left to right (or top to bottom, depending on how the code is laid out); it's normally not permissible for one item to refer to something that hasn't been mentioned yet.
Go down, clause by clause, in the order they're written. Again, read from left to right, top to bottom; nothing should reference anything that hasn't been defined yet. Stop right before you hit ORDER BY or something which can only go after ORDER BY (if there is no ORDER BY/etc., stop at the end).
Jump up to the SELECT and read it.
Go back down to where you were and resume reading.
If at any point you see a subquery, apply this algorithm recursively.
If the query begins with WITH RECURSIVE, go read the Postgres docs for 20 minutes and figure it out.
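Annotating the query from the question with that reading order makes the problem visible (a sketch; the numbers show logical evaluation order, not physical execution order):

SELECT *, (CASE WHEN bars.some_cond THEN 'Yes' ELSE 'No' END)
       AS some_cond_alpha                                  -- 3. alias is born here
FROM "foos"                                                -- 1. FROM/JOIN builds the row source
INNER JOIN "bars" ON "bars"."id" = "foos"."bar_id"
WHERE bars.some_cond                                       -- 2. too early to see the alias
ORDER BY some_cond_alpha;                                  -- 4. late enough to use it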
I'm having an issue with limiting a SQL query. I'm using SQL Server 2000, so I can't use features like ROW_NUMBER(), CTEs, or OFFSET ... FETCH.
I have tried the SELECT TOP limit * FROM approach, excluding the already-shown results, but that way the query is very slow, because sometimes my result query fetches more than 10,000 records.
Also I have tried the following approach:
SELECT * FROM (
SELECT DISTINCT TOP 100 PERCENT im.name, im.location, im.image,
( SELECT count(DISTINCT i.id) FROM images AS i WHERE i.id <= im.id ) AS recordnum
FROM images AS im
ORDER BY im.location ASC, im.name ASC) AS tmp
WHERE recordnum BETWEEN 5 AND 15
I have the same problem here, plus another issue because I couldn't add an ORDER BY option in the subquery for recordnum. I have put both solutions in a stored procedure, but query execution is still very slow.
So my question is:
Is there an efficient way to limit the query to pull 20 records per page in SQL Server 2000 for large amounts of data, i.e. more than 10,000 rows?
Thanks.
With the query below, the subquery is only run once, and the WHERE im2.id IS NULL filter skips the first 40 rows:
SELECT top 25 im1.*
FROM images im1
left join ( select top 40 id from images order by id ) im2
on im1.id = im2.id
where im2.id is null
order by im1.id
Query-wise, there is no great-performing way. If performance is critical and the data will always be grouped/ordered the same way, you could add an int column and set its value by trigger based on the grouping/ordering. Index it and it should be extremely fast for reads; writes will be a bit slower.
Also, make sure you have an index on the id column of the images table.
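The same skip/take idea generalizes to any page. A sketch, assuming id is unique and indexed; since TOP in SQL Server 2000 can't take a variable, SET ROWCOUNT does the limiting (page 1 needs special-casing, because SET ROWCOUNT 0 means "no limit"):

DECLARE @PageSize int, @PageNumber int, @Skip int, @LastId int
SELECT @PageSize = 20, @PageNumber = 3
SET @Skip = (@PageNumber - 1) * @PageSize

SET ROWCOUNT @Skip
SELECT @LastId = id FROM images ORDER BY id   -- ends holding the last skipped id

SET ROWCOUNT @PageSize
SELECT * FROM images WHERE id > @LastId ORDER BY id

SET ROWCOUNT 0                                -- reset the session limit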
I can't paste in the entire SQL for various reasons, so consider this example:
select *
from
(select nvl(get_quantity(1), 10) available_qty
from dual)
where available_qty > 30;
get_quantity is a function that makes a calculation based on the ID of a record that's passed through it. If it returns null, I use nvl() to force it to 10.
The query runs very slow when I use the WHERE clause in the parent query. When I comment out the WHERE clause, however, it runs very fast. What I don't get is why it can display the data very fast, but it can't query it just as fast. I am querying the results of a subquery, too. I was under the impression that subqueries return a "rendered" dataset. It's almost as if querying the available_qty identifier is causing it to reference something within the subquery.
This is why I don't think the contents of the get_quantity function are relevant here, so I didn't bother posting it. Instead, I think it's a misunderstanding on my part of how Oracle handles subqueries and whatnot.
Do any of you Oracle gurus have any idea what I am doing wrong?
Afterthought: as I was entering tags for this question, the tag "correlated subquery" came up. In doing some quick research, it seems that this type of subquery somewhat depends on the outer query. Could this be related to my problem?
Let's try an experiment. First we'll run the following query:
select lvl, rnd
from (select level as lvl from dual connect by level <= 5) a,
(select dbms_random.value() rnd from dual) b;
The "a" subquery will return 5 rows with values from 1 to 5. The "b" subquery will return one row with a random value. If the function is run before the two tables are join (by Cartesian), the same random value will be returned for each row. The actual results:
LVL RND
---------- ----------
1 .417932089
2 .963531718
3 .617016889
4 .128395638
5 .069405568
5 rows selected.
Clearly the function was run for each of the joined rows, not once for the subquery before the join. This is a result of Oracle's optimizer deciding that the best path for the query is to do things in that order. To prevent this, we have to add something to the second subquery that will make Oracle run the subquery in its entirety before performing the join. We'll add rownum to the subquery, since Oracle knows rownum would change if it were evaluated after the join. The following query demonstrates this:
select lvl, rnd from (
select level as lvl from dual connect by level <= 5) a,
(select dbms_random.value() rnd, rownum from dual) b;
As you can see from the results, the function was only run once in this case:
LVL RND
---------- ----------
1 .028513902
2 .028513902
3 .028513902
4 .028513902
5 .028513902
5 rows selected.
In your case, it seems likely that the filter provided by the where clause is making the optimizer take a different path, where it's running the function repeatedly, rather than once. By making Oracle run the subquery as written, you should get more consistent run-times.
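Applied to the query from the question, the same trick would look something like this (a sketch, untested against your schema):

select *
from
  (select nvl(get_quantity(1), 10) available_qty,
          rownum rn   -- forces Oracle to evaluate the subquery as written
   from dual)
where available_qty > 30;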
SELECT DISTINCT tblJobReq.JobReqId
, tblJobReq.JobStatusId
, tblJobClass.JobClassId
, tblJobClass.Title
, tblJobReq.JobClassSubTitle
, tblJobAnnouncement.JobClassDesc
, tblJobAnnouncement.EndDate
, tblJobAnnouncement.AgencyMktgVerbage
, tblJobAnnouncement.SpecInfo
, tblJobAnnouncement.Benefits
, tblSalary.MinRateSal
, tblSalary.MaxRateSal
, tblSalary.MinRateHour
, tblSalary.MaxRateHour
, tblJobClass.StatementEval
, tblJobReq.ApprovalDate
, tblJobReq.RecruiterId
, tblJobReq.AgencyId
FROM ((tblJobReq
LEFT JOIN tblJobAnnouncement ON tblJobReq.JobReqId = tblJobAnnouncement.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary ON tblJobClass.SalaryCode = tblSalary.SalaryCode
WHERE (tblJobReq.JobClassId in (SELECT JobClassId
from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
When I try to execute the query, it results in the following error:
Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094
I checked and didn't find any solution other than truncating (with substring()) the tblJobAnnouncement.JobClassDesc column in the query, which has a column size of around 8,000.
Is there any workaround so that I don't need to truncate the values? Or can this query be optimised? Is there some setting in SQL Server 2000?
The [non obvious] reason why SQL needs to SORT is the DISTINCT keyword.
Depending on the data and underlying table structures, you may be able to do away with this DISTINCT, and hence not trigger this error.
You readily found the alternative solution which is to truncate some of the fields in the SELECT list.
Edit: Answering "Can you please explain how DISTINCT would be the reason here?"
Generally, the fashion in which the DISTINCT requirement is satisfied varies with
the data context (expected number of rows, presence/absence of index, size of row...)
the version/make of the SQL implementation (the query optimizer in particular receives new or modified heuristics with each new version, sometimes resulting in alternate query plans for various constructs in various contexts)
Yet, all the possible plans associated with a "DISTINCT query" involve *some form* of sorting of the qualifying records. In its simplest form, the plan first produces the list of qualifying rows (records) (the list of records which satisfy the WHERE/JOINs/etc. parts of the query) and then sorts this list (which possibly includes some duplicates), only retaining the very first occurrence of each distinct row. In other cases, for example when only a few columns are selected and when some index(es) covering these columns is(are) available, no explicit sorting step is used in the query plan, but the reliance on an index implicitly implies the "sortability" of the underlying columns. In yet other cases, steps involving various forms of merging or hashing are selected by the query optimizer, and these too, eventually, imply the ability of comparing two rows.
Bottom line: DISTINCT implies some sorting.
In the specific case of the question, the error reported by SQL Server and preventing the completion of the query is that "Sorting is not possible on rows bigger than..." AND, the DISTINCT keyword is the only apparent reason for the query to require any sorting (BTW many other SQL constructs imply sorting: for example UNION) hence the idea of removing the DISTINCT (if it is logically possible).
In fact you should remove it, for test purposes, to assert that, without DISTINCT, the query completes OK (if only including some duplicates). Once this fact is confirmed, and if effectively the query could produce duplicate rows, look into ways of producing a duplicate-free query without the DISTINCT keyword; constructs involving subqueries can sometimes be used for this purpose.
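One such construct, as a rough sketch: do the DISTINCT over narrow columns only in a derived table, then join the wide JobClassDesc column back in afterwards, so the sort never has to handle rows near the 8,000-byte column. This assumes the duplicates come from the joins and that there is at most one announcement per JobReqId:

SELECT K.JobReqId, K.JobStatusId, JA.JobClassDesc
FROM (SELECT DISTINCT JR.JobReqId, JR.JobStatusId    -- sort only narrow columns
      FROM tblJobReq AS JR
      INNER JOIN tblJobClass AS JC ON JR.JobClassId = JC.JobClassId
      WHERE JC.Title LIKE '%Family Therapist%') AS K
LEFT JOIN tblJobAnnouncement AS JA ON K.JobReqId = JA.JobReqId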
An unrelated hint is to use table aliases, with a short string, to avoid repeating the long table names. For example (I only did a few tables, but you get the idea...):
SELECT DISTINCT JR.JobReqId, JR.JobStatusId,
tblJobClass.JobClassId, tblJobClass.Title,
JR.JobClassSubTitle, JA.JobClassDesc, JA.EndDate, JA.AgencyMktgVerbage,
JA.SpecInfo, JA.Benefits,
S.MinRateSal, S.MaxRateSal, S.MinRateHour, S.MaxRateHour,
tblJobClass.StatementEval,
JR.ApprovalDate, JR.RecruiterId, JR.AgencyId
FROM (
(tblJobReq AS JR
LEFT JOIN tblJobAnnouncement AS JA ON JR.JobReqId = JA.JobReqId)
INNER JOIN tblJobClass ON JR.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary AS S ON tblJobClass.SalaryCode = S.SalaryCode
WHERE (JR.JobClassId in
(SELECT JobClassId from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
FYI, running this SQL command on your DB can fix the problem if it is caused by space that needs to be reclaimed after dropping variable length columns:
DBCC CLEANTABLE (0, 'dbo.TableName')
See: http://msdn.microsoft.com/en-us/library/ms174418.aspx
This is a limitation of SQL Server 2000. You can:
Split it into two queries and combine elsewhere
SELECT TableA.ID, ColumnA, ColumnB FROM TableA JOIN TableB ON TableA.ID = TableB.ID
SELECT TableA.ID, ColumnC, ColumnD FROM TableA JOIN TableB ON TableA.ID = TableB.ID
Truncate the columns appropriately
SELECT LEFT(LongColumn,2000)...
Remove any redundant columns from the SELECT
SELECT ColumnA, ColumnB --, IDColumnNotUsedInOutput
FROM TableA
Migrate off of SQL Server 2000