SQL query problem when upgrading from SQL Server 2000 to SQL Server 2008 R2 - tsql

I am currently upgrading a database server from SQL Server 2000 to SQL Server 2008 R2. One of my queries used to take under a second to run and now takes in excess of 3 minutes (running on a faster machine).
I think I have located where it is going wrong but not why it is going wrong. Could somebody explain what the problem is and how I might resolve it?
The abridged code is as follows:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
...
FROM
Registrar reg
JOIN EnabledType et ON et.enabledTypeCode = reg.enabled
LEFT JOIN [Transaction] txn ON txn.registrarId = reg.registrarId
WHERE
txn.transactionid IS NULL OR
txn.transactionid IN
(
SELECT MAX(transactionid)
FROM [Transaction]
GROUP BY registrarid
)
I believe the issue is located in the "txn.transactionid IS NULL OR" line. If I remove that condition, the query runs as fast as it used to (less than a second) and returns all the records except the 3 rows that the condition would have included. If I instead remove the second part of the OR, it returns the 3 rows I would expect, again in less than a second.
Could anybody point me in the right direction as to why this is happening and when this change occurred?
Many thanks in advance
Jonathan
I have accepted Alex's solution and included the new version of the code below. It seems we have found one of the 0.1% of queries that the new query optimiser runs more slowly.
WITH txn AS (
SELECT registrarId, balance, ROW_NUMBER() OVER (PARTITION BY registrarid ORDER BY transactionid DESC) AS RowNum
FROM [Transaction]
)
SELECT
reg.registrarId,
reg.ianaId,
reg.registrarName,
reg.clientId,
reg.enabled,
ISNULL(txn.balance, 0.00) AS [balance],
reg.alertBalance,
reg.disableBalance,
et.enabledTypeName
FROM
Registrar reg
JOIN EnabledType et
ON et.enabledTypeCode = reg.enabled
LEFT JOIN txn
ON txn.registrarId = reg.registrarId
WHERE
ISNULL(txn.RowNum,1)=1
ORDER BY
registrarName ASC

Try restructuring the query using a CTE and ROW_NUMBER...
WITH txn AS (
SELECT registrarId, transactionid, ...
, ROW_NUMBER() OVER (PARTITION BY registrarid ORDER BY transactionid DESC) AS RowNum
FROM [Transaction]
)
SELECT
...
FROM
Registrar reg
JOIN EnabledType et ON et.enabledTypeCode = reg.enabled
LEFT JOIN txn ON txn.registrarId = reg.registrarId
AND txn.RowNum=1
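Another rewrite that is sometimes suggested when an OR like this defeats the optimiser is to run the two fast halves separately and combine them with UNION ALL. The sketch below reuses the elided column list from the question and is not part of the accepted answer:
SELECT ... -- same column list as the original query
FROM
Registrar reg
JOIN EnabledType et ON et.enabledTypeCode = reg.enabled
LEFT JOIN [Transaction] txn ON txn.registrarId = reg.registrarId
WHERE txn.transactionid IS NULL -- registrars with no transactions at all
UNION ALL
SELECT ...
FROM
Registrar reg
JOIN EnabledType et ON et.enabledTypeCode = reg.enabled
JOIN [Transaction] txn ON txn.registrarId = reg.registrarId
WHERE txn.transactionid IN
(
SELECT MAX(transactionid)
FROM [Transaction]
GROUP BY registrarid
)
Because the two branches are mutually exclusive (a registrar either has no transactions or has a latest one), UNION ALL does not introduce duplicates.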

Related

Proc is running slow with NOT EXISTS

I'm working on creating a stored procedure, but I'm running into an issue where it runs for over 5 minutes due to close to 50k records.
The process seems pretty straightforward; I'm just not sure why it is taking so long.
Essentially I have two tables:
Table_1
ApptDate     ApptName    ApptDoc     ApptReason   ApptType
-----------------------------------------------------------
03/15/2021   Physical    Dr Smith    Yearly       Day
03/15/2021   Check In    Dr Doe      Check In     Day
03/15/2021   Appt oth    Dr Dee      Check In     Monthly
Table_2 - this table has exactly the same structure as Table_1; what I am trying to achieve is simply to archive the data from Table_1:
DECLARE @Date_1 AS DATETIME
SET @Date_1 = GETDATE() - 1

INSERT INTO Table_2 (ApptDate, ApptName, ApptDoc, ApptReason)
SELECT ApptDate, ApptName, ApptDoc, ApptReason
FROM Table_1
WHERE ApptType = 'Day' AND ApptDate = @Date_1
AND NOT EXISTS (SELECT 1 FROM Table_2
                WHERE ApptType = 'Day' AND ApptDate = @Date_1)
So this stored procedure seems pretty straightforward; however, the NOT EXISTS is causing it to be really slow.
The reason for the NOT EXISTS is that this stored procedure is part of a bigger process that runs multiple times a day (morning, afternoon, night). I'm trying to make sure that I only have one copy of the '03/15/2021' data; I'm basically running an archive process on the previous day's data (@Date_1).
Any thoughts on how this can be sped up?
For this query:
INSERT INTO Table_2 (ApptDate, ApptName, ApptDoc, ApptReason)
SELECT ApptDate, ApptName, ApptDoc, ApptReason
FROM Table_1 t1
WHERE ApptType = 'Day' AND
      ApptDate = @Date_1 AND
      NOT EXISTS (SELECT 1
                  FROM Table_2 t2
                  WHERE t2.ApptType = t1.ApptType AND
                        t2.ApptDate = t1.ApptDate
                 );
You want indexes on: Table_1(ApptType) and, more importantly, Table_2(ApptType, ApptDate) or Table_2(ApptDate, ApptType).
Note: I changed the correlation clause to just refer to the values in the outer query. This seems more general than your version, but should have the same performance (in this case).
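A sketch of those index definitions (the index names are illustrative, not from the answer):
CREATE INDEX ix_Table_1_ApptType ON Table_1 (ApptType);
CREATE INDEX ix_Table_2_ApptType_ApptDate ON Table_2 (ApptType, ApptDate);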

Why SSRS reports work in development but not in Production

I have a SELECT statement (NOT a Stored Procedure) that I am using to create a report in SSRS (Visual Studio 2010).
Parameter @ClassCode is the one causing the trouble. In Development it works fine, but when I deploy it to Production it renders forever.
I am assuming it is parameter sniffing, and I have read about how to fix it inside a stored procedure. But I don't have an SP; I am using a SELECT statement.
What would be the workaround for SELECT statement?
And what is the difference between the environments? Production is much, much more powerful.
My query below:
;WITH cte1
AS
(
SELECT QuoteID,
AccidentDate,
PolicyNumber,
SUM(PaidLosses) as PaidLosses
FROM tblLossesPlazaCommercialAuto
WHERE InsuredState IN (@State) AND AccidentDate BETWEEN @StartDate AND @EndDate AND TransactionDate <= @EndDate AND Coverage = 'VehicleComprehensive'
GROUP BY QuoteID,
AccidentDate,
PolicyNumber
),
cte3
AS
(
SELECT
cte1.Quoteid,
cte1.PolicyNumber,
cte1.AccidentDate,
cc.TransactionEffectiveDate,
cc.ClassCode,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY cte1.QuoteID, cte1.PolicyNumber,cc.AccidentDate ORDER BY (SELECT 0))=1 THEN cte1.PaidLosses
ELSE 0
END as PaidLosses--,
FROM cte1 inner join tblClassCodesPlazaCommercial cc
on cte1.PolicyNumber=cc.PolicyNumber
AND cte1.AccidentDate=cc.AccidentDate
AND cc.AccidentDate IS NOT NULL
/* This is the one that gives me problem */
WHERE cc.ClassCode IN (@ClassCode)
)
SELECT SUM(PaidLosses) as PaidLosses, c.YearNum, c.MonthNum
FROM cte3 RIGHT JOIN tblCalendar c ON c.YearNum = YEAR(cte3.AccidentDate) AND c.MonthNum = MONTH(cte3.AccidentDate)
WHERE c.YearNum <>2017
GROUP BY c.YearNum, c.MonthNum
ORDER BY c.YearNum, c.MonthNum
Used the Tuning Advisor to see what indexes and statistics the workload needed. After creating those, everything works fine.
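For completeness, a workaround commonly suggested for parameter sniffing on an ad-hoc SELECT (not what was ultimately done here) is to request a fresh plan for the actual parameter values with OPTION (RECOMPILE); only the tail of the statement is shown:
...
SELECT SUM(PaidLosses) as PaidLosses, c.YearNum, c.MonthNum
FROM cte3 RIGHT JOIN tblCalendar c ON c.YearNum = YEAR(cte3.AccidentDate) AND c.MonthNum = MONTH(cte3.AccidentDate)
WHERE c.YearNum <> 2017
GROUP BY c.YearNum, c.MonthNum
ORDER BY c.YearNum, c.MonthNum
OPTION (RECOMPILE) -- compile the plan for the supplied parameter values on every run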

Left outer join using 2 of 3 tables in Postgresql

I need to show all clients entered into the system for a date range.
All clients are assigned to a group, but not necessarily to a staff member.
When I run the query as such:
SELECT
clients.name_lastfirst_cs,
to_char(clients.date_intake, 'MM/DD/YY') AS Date_Created,
clients.client_id,
clients.display_intake,
staff.staff_name_cs,
groups.name
FROM
public.clients,
public.groups,
public.staff,
public.link_group
WHERE
clients.zrud_staff = staff.zzud_staff AND
clients.zzud_client = link_group.zrud_client AND
groups.zzud_group = link_group.zrud_group AND
clients.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
ORDER BY
groups.name ASC,
clients.client_id ASC,
staff.staff_name_cs ASC
I get 121 entries
if I comment out:
SELECT
clients.name_lastfirst_cs,
to_char(clients.date_intake, 'MM/DD/YY') AS Date_Created,
clients.client_id,
clients.display_intake,
-- staff.staff_name_cs, -- Line Commented out
groups.name
FROM
public.clients,
public.groups,
public.staff,
public.link_group
WHERE
-- clients.zrud_staff = staff.zzud_staff AND --Line commented out
clients.zzud_client = link_group.zrud_client AND
groups.zzud_group = link_group.zrud_group AND
clients.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
ORDER BY
groups.name ASC,
clients.client_id ASC,
staff.staff_name_cs ASC
I get 173 entries
I know I need to do an outer join to capture all clients regardless of whether a staff member is assigned, but each attempt has failed. I have done outer joins with two tables, but adding a third has twisted my brain.
Thanks for any suggestions.
I have no way of testing this (or of knowing that it is right) but what I read in your query is that you want something similar to this:
SELECT --I just used short aliases. I choose something other than the table name so I know it is an alias "c" for client etc...
c.name_lastfirst_cs,
to_char(c.date_intake, 'MM/DD/YY') AS Date_Created,
c.client_id,
c.display_intake,
s.staff_name_cs,
g.name,
l.zrud_client AS "link_client",--I'm selecting some data here so that I can debug later, you can just filter this out with another select if you need to
l.zrud_group AS "link_group" --Again, so I can see these relationships
FROM
public.clients c
LEFT OUTER JOIN staff s ON --is staff required? If it isn't then outer join (optional)
s.zzud_staff = c.zrud_staff --so we linked staff to clients here
LEFT OUTER JOIN public.link_group l ON --this looks like a lookup table to me so we select the lookup record
l.zrud_client = c.zzud_client -- this is how I define the lookup, a client id
LEFT OUTER JOIN public.groups g ON --then we use that to lookup a group
g.zzud_group = l.zrud_group --which is defined by this data here
WHERE -- the following must be true
c.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
Now for the why: I've basically moved your WHERE clause into JOIN x ON y=z syntax. In my experience this is a better way to write and maintain queries, as it allows you to specify relationships between tables rather than doing a big ol' join and trying to filter that data with the WHERE clause. Keep in mind that in your original query each condition is REQUIRED, not optional, so you only get records that satisfy all of them; (if I read this right -- I probably don't, as I don't have a schema in front of me) a record that is missing a link-table record OR a staff member gets filtered out.
Alternatively (possibly significantly slower), you can SELECT from anything, so you can chain queries like:
SELECT
*
FROM
(
SELECT
*
FROM
public.clients
WHERE
x condition
) AS sub -- PostgreSQL requires an alias on a derived table
WHERE
y condition
OR
SELECT * FROM x WHERE x.some_column IN (SELECT some_column FROM y)
In your case this tactic probably won't be easier than standard join syntax.
^And some serious opinion here: I recommend you use the join syntax I outlined above. It is functionally the same as joining and filtering in a WHERE clause, but as you noted, if you don't understand the relationships it can cause a Cartesian join: http://www.tutorialspoint.com/sql/sql-cartesian-joins.htm. Lastly, I tend to spell out what type of join I want. I write INNER JOIN and OUTER JOIN a lot in my queries because it helps the next person (usually me) figure out what the heck I meant. If the relationship is optional use an outer join; if it is required use an inner join (the default).
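As a tiny illustration of the Cartesian-join risk mentioned above (purely for demonstration, using the poster's table names):
-- No join condition: every client is paired with every staff row.
SELECT c.client_id, s.staff_name_cs
FROM public.clients c, public.staff s;
-- The explicit form makes the intended (and optional) relationship visible:
SELECT c.client_id, s.staff_name_cs
FROM public.clients c
LEFT OUTER JOIN public.staff s ON s.zzud_staff = c.zrud_staff;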
Good luck! There are much better SQL developers out there and there's probably another way to do it.

Nested select statement in FROM clause? Inner Join statements? or just table name?

I'm building a query that needs data from 5 tables.
I've been told by a DBA in the past that specifying a list of columns, rather than selecting all columns (*), is preferred for performance/memory reasons.
I've also been told that the database performs a JOIN operation behind the scenes when there's a list of tables in the FROM clause, to create one table (or view).
The existing database has very little data at the moment, as we're at a very initial point. So not sure I can measure the performance hit in practice.
I am not a database pro. I can get the data I need. The dilemma is at what price.
Added: At the moment I'm working with MS SQL Server 2008 R2.
My questions are:
Is there a performance difference and why, between the following:
a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit)
b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)?
c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anything? or is it just extra select statements that create more work for the server and for us)?
Is there yet a better way to do this?
Below are two queries that both get the slice of data that I need. One includes more nested select statements.
I apologize if they are not written in a standard form or are hopelessly overcomplicated - hopefully you can decipher them. I try to keep them as organized as possible.
Insights would be most appreciated as well.
Thanks for checking this out.
5 tables: devicepool, users, trips, TripTracker, and order
Query 1 (more select statements):
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
(
SELECT
tripid,
[status] tripstatus,
stops,
devid,
groupid
FROM
trips
)
AS [base2]
ON base.devid = base2.devid AND base2.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
(
SELECT
[status] [orderstatus],
[address] [destaddress],
[tripid],
stopnumber orderstopnumber
FROM [order]
)
AS [orders]
ON orders.orderstopnumber = tracker.stopnumber)
Query 2:
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
trips
ON base.devid = trips.devid AND trips.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
[order]
ON [order].stopnumber = tracker.stopnumber)
Is there a performance difference and why, between the following:
a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit)
b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)?
c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anything? or is it just extra select statements that create more work for the server and for us)?
a) and b) should result in the same query plan (although this is db-specific). b) is much preferred over a) for portability and readability. c) is a horrible idea that hurts readability and, if anything, will result in worse performance. Let us never speak of it again.
Is there yet a better way to do this?
b) is the standard approach. In general, writing the plainest ANSI SQL will result in the best performance, as it allows the query parser to easily understand what you are trying to do. Trying to outsmart the compiler with tricks may work in a given situation, but does not mean that it will still work when the cardinality or amount of data changes, or the database engine is upgraded. So, avoid doing that unless you are absolutely forced to.
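To make (a) and (b) concrete with the tables from the question (a hedged illustration, not from the original answer):
-- a) implicit comma join; the relationship lives in the WHERE clause
SELECT u.username, d.devid
FROM devicepool d, users u
WHERE d.userid = u.userid AND d.groupid = 1;
-- b) explicit ANSI join; same result and, typically, the same plan
SELECT u.username, d.devid
FROM devicepool d
INNER JOIN users u ON d.userid = u.userid
WHERE d.groupid = 1;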

Is there a way to find TOP X records with grouped data?

I'm working with a Sybase 12.5 server and I have a table defined as such:
CREATE TABLE SomeTable(
[GroupID] [int] NOT NULL,
[DateStamp] [datetime] NOT NULL,
[SomeName] varchar(100),
PRIMARY KEY CLUSTERED (GroupID,DateStamp)
)
I want to be able to list, per [GroupID], only the latest X records by [DateStamp]. The kicker is X > 1, so plain old MAX() won't cut it. I'm assuming there's a wonderfully nasty way to do this with cursors and what-not, but I'm wondering if there is a simpler way without that stuff.
I know I'm missing something blatantly obvious and I'm gonna kick myself for not getting it, but .... I'm not getting it. Please help.
Is there a way to find TOP X records, but with grouped data?
According to the online manual, Sybase 12.5 supports WINDOW functions and ROW_NUMBER(), though their syntax differs from standard SQL slightly.
Try something like this:
SELECT SP.*
FROM (
SELECT *, ROW_NUMBER() OVER (windowA ORDER BY [DateStamp] DESC) AS RowNum
FROM SomeTable
WINDOW windowA AS (PARTITION BY [GroupID])
) AS SP
WHERE SP.RowNum <= 3
ORDER BY RowNum DESC;
I don't have an instance of Sybase, so I haven't tested this. I'm just synthesizing this example from the doc.
I made a mistake. The doc I was looking at was Sybase SQL Anywhere 11. It seems that Sybase ASA does not support the WINDOW clause at all, even in the most recent version.
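For comparison, on platforms that do support window functions with an inline OVER clause (for example SQL Server 2005 and later), the same idea is usually written as below; this is a sketch only and may not run on the asker's Sybase 12.5:
SELECT SP.*
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY [GroupID] ORDER BY [DateStamp] DESC) AS RowNum
FROM SomeTable
) AS SP
WHERE SP.RowNum <= 3
ORDER BY SP.[GroupID], SP.RowNum;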
Here's another query that could accomplish the same thing. You can use a self-join to match each row of SomeTable to all rows with the same GroupID and a later DateStamp. If there are fewer than three later rows, then we've got one of the top three.
SELECT s1.[GroupID], s1.[Foo], s1.[Bar], s1.[Baz]
FROM SomeTable s1
LEFT OUTER JOIN SomeTable s2
ON s1.[GroupID] = s2.[GroupID] AND s1.[DateStamp] < s2.[DateStamp]
GROUP BY s1.[GroupID], s1.[Foo], s1.[Bar], s1.[Baz]
HAVING COUNT(*) < 3
ORDER BY s1.[DateStamp] DESC;
Note that you must list the same columns in the SELECT list as you list in the GROUP BY clause. Basically, all columns from s1 that you want this query to return.
Here's quite an unscalable way!
SELECT GroupID, DateStamp, SomeName
FROM SomeTable ST1
WHERE X >
(SELECT COUNT(*)
FROM SomeTable ST2
WHERE ST1.GroupID=ST2.GroupID AND ST2.DateStamp > ST1.DateStamp)
Edit: Bill's solution is vastly preferable though.