Speeding up TSQL - tsql

Hi all, I'm wondering if there's a more efficient way of executing this T-SQL script. It gets the very latest activity per account name and then joins this to the accounts table, so you get the very latest activity for each account. The problem is that there are currently about 22,000 latest activities, so it has to go through a lot of data. Is there a more efficient way of doing what I'm doing?
DECLARE @pastAppointments TABLE (objectid NVARCHAR(100), account NVARCHAR(500), startdate DATETIME, tasktype NVARCHAR(100), ownerid UNIQUEIDENTIFIER, owneridname NVARCHAR(100), RN NVARCHAR(100))
INSERT INTO @pastAppointments (objectid, account, startdate, tasktype, ownerid, owneridname, RN)
SELECT * FROM (
SELECT fap.regardingobjectid, fap.regardingobjectidname, fap.actualend, fap.activitytypecodename, fap.ownerid, fap.owneridname,
ROW_NUMBER() OVER (PARTITION BY fap.regardingobjectidname ORDER BY fap.actualend DESC) AS RN
FROM FilteredActivityPointer fap
WHERE fap.actualend < getdate()
AND fap.activitytypecode NOT LIKE 4201
) tmp WHERE RN = 1
ORDER BY regardingobjectidname
SELECT fa.name, fa.owneridname, fa.new_technicalaccountmanagername, fa.new_customerid, fa.new_riskstatusname, fa.new_numberofopencases,
fa.new_numberofurgentopencases, app.startdate, app.tasktype, app.ownerid, app.owneridname
FROM FilteredAccount fa LEFT JOIN @pastAppointments app on fa.accountid = app.objectid and fa.ownerid = app.ownerid
WHERE fa.statecodename = 'Active'
AND fa.ownerid LIKE @owner_search
ORDER BY fa.name

You can remove ORDER BY regardingobjectidname from the first INSERT query - the only (narrow) purpose such a sort would have on an INSERT query is if there was an identity column on the table being inserted into. And there isn't in this case, so if the optimizer isn't smart enough, it'll perform a pointless sort.

Related

Query to select by number of associated objects

I have two tables that look like the following:
Orders
------
id
tracking_number
ShippingLogs
------
tracking_number
created_at
stage
I would like to select the IDs of Orders that have ONLY ONE ShippingLog associated with them, and the stage of that ShippingLog must be error. If an order has two ShippingLog entries, I don't want it. If it has one ShippingLog but its stage is shipped, I don't want it.
This is what I have, and it doesn't work, and I know why (it finds the log with the error, but has no way of knowing if there are others). I just don't really know how to get it the way I need it.
SELECT DISTINCT
orders.id, shipping_logs.created_at, COUNT(shipping_logs.*)
FROM
orders
JOIN
shipping_logs ON orders.tracking_number = shipping_logs.tracking_number
WHERE
shipping_logs.created_at BETWEEN '2021-01-01 23:40:00'::timestamp AND '2021-01-26 23:40:00'::timestamp AND shipping_logs.stage = 'error'
GROUP BY
orders.id, shipping_logs.created_at
HAVING
COUNT(shipping_logs.*) = 1
ORDER BY
orders.id, shipping_logs.created_at DESC;
If you want to retain every column from the join of the two tables given your requirements, then I would suggest using COUNT here as an analytic function:
WITH cte AS (
SELECT o.id, sl.created_at,
COUNT(*) OVER (PARTITION BY o.id) num_logs,
COUNT(*) FILTER (WHERE sl.stage <> 'error')
OVER (PARTITION BY o.id) non_error_cnt
FROM orders o
INNER JOIN shipping_logs sl ON sl.tracking_number = o.tracking_number
WHERE sl.created_at BETWEEN '2021-01-01 23:40:00'::timestamp AND
'2021-01-26 23:40:00'::timestamp
)
SELECT id AS order_id, created_at
FROM cte
WHERE num_logs = 1 AND non_error_cnt = 0
ORDER BY id, created_at DESC;
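If only the order IDs are needed, a plain GROUP BY with a HAVING clause may be enough. The sketch below is not from the answer above: it runs the idea against an in-memory SQLite database from Python, with made-up sample rows, and the function name orders_with_single_error_log is hypothetical. As in the CTE version, only logs inside the date window are counted.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, tracking_number TEXT);
    CREATE TABLE shipping_logs (tracking_number TEXT, created_at TEXT, stage TEXT);
    INSERT INTO orders VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO shipping_logs VALUES
        ('A', '2021-01-05 10:00:00', 'error'),
        ('B', '2021-01-06 10:00:00', 'error'),
        ('B', '2021-01-07 10:00:00', 'shipped'),
        ('C', '2021-01-08 10:00:00', 'shipped'),
        ('D', '2021-01-09 10:00:00', 'error'),
        ('D', '2021-01-10 10:00:00', 'error');
""")


def orders_with_single_error_log(conn, start, end):
    """Return ids of orders whose only log between start and end is an error."""
    rows = conn.execute(
        """
        SELECT o.id
        FROM orders o
        JOIN shipping_logs sl ON sl.tracking_number = o.tracking_number
        WHERE sl.created_at BETWEEN ? AND ?
        GROUP BY o.id
        -- exactly one log, and that single log's stage is 'error'
        HAVING COUNT(*) = 1 AND MIN(sl.stage) = 'error'
        ORDER BY o.id
        """,
        (start, end),
    ).fetchall()
    return [row[0] for row in rows]


single_error_orders = orders_with_single_error_log(
    conn, "2021-01-01 23:40:00", "2021-01-26 23:40:00"
)
print(single_error_orders)
```

Order 1 is the only one kept: order 2 has a second log, order 3's single log is not an error, and order 4 has two error logs.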

Unable to get Percentile_Cont() to work in Postgresql

I am trying to calculate a percentile using the percentile_cont() function in PostgreSQL with common table expressions. The goal is to find the top 1% of accounts with regard to their balances (called amount here). My logic is to find the 99th percentile, which will return those whose account balances are greater than those of 99% of their peers (and thus find the one-percenters).
Here is my query
--ranking subquery works fine
with ranking as(
select a.lname,sum(c.amount) as networth from customer a
inner join
account b on a.customerid=b.customerid
inner join
transaction c on b.accountid=c.accountid
group by a.lname order by sum(c.amount)
)
select lname, networth, percentile_cont(0.99) within group
order by networth over (partition by lname) from ranking ;
I keep getting the following error.
ERROR: syntax error at or near "order"
LINE 2: ...ame, networth, percentile_cont(0.99) within group order by n..
I am thinking that perhaps I forgot a closing brace etc. but I can't seem to figure out where. I know it could be something with the order keyword but I am not sure what to do. Can you please help me to fix this error?
This tripped me up, too.
It turns out percentile_cont is not supported in postgres 9.3, only in 9.4+.
https://www.postgresql.org/docs/9.4/static/release-9-4.html
So you have to use something like this:
with ordered_purchases as (
select
price,
row_number() over (order by price) as row_id,
(select count(1) from purchases) as ct
from purchases
)
select avg(price) as median
from ordered_purchases
where row_id between ct/2.0 and ct/2.0 + 1
That query is courtesy of https://www.periscopedata.com/blog/medians-in-sql (section: "Median on Postgres")
You are missing the brackets in the within group (order by x) part.
Try this:
with ranking
as (
select a.lname,
sum(c.amount) as networth
from customer a
inner join account b on a.customerid = b.customerid
inner join transaction c on b.accountid = c.accountid
group by a.lname
order by networth
)
select lname,
networth,
percentile_cont(0.99) within group (
order by networth
) over (partition by lname)
from ranking;
I want to point out that you don't need a subquery for this:
select c.lname, sum(t.amount) as networth,
percentile_cont(0.99) within group (order by sum(t.amount)) over (partition by lname)
from customer c inner join
account a
on c.customerid = a.customerid inner join
transaction t
on a.accountid = t.accountid
group by c.lname
order by networth;
Also, when using table aliases (which should be always), table abbreviations are much easier to follow than arbitrary letters.
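Two things are worth checking before relying on the queries above. First, once the rows are grouped by lname, a partition by lname holds a single row, so the percentile of that partition is just the row's own net worth; the 99th-percentile threshold needs to be computed across all rows. Second, as far as I know PostgreSQL does not accept OVER on ordered-set aggregates such as percentile_cont, so one option is to compute the threshold once as a plain aggregate and compare against it. The Python sketch below only illustrates the linear interpolation that percentile_cont performs; the function is a hand-written stand-in, not PostgreSQL's implementation, and the names and balances are invented.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbours."""
    if not values:
        return None
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)  # 0-based fractional index
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(ordered[lower])
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Hypothetical net worth per last name (one row per lname after grouping).
networth = {
    "Adams": 100.0,
    "Baker": 200.0,
    "Clark": 300.0,
    "Davis": 400.0,
    "Evans": 1000.0,
}

# One threshold for the whole population, then filter against it.
threshold = percentile_cont(list(networth.values()), 0.99)
top_accounts = [name for name, worth in networth.items() if worth >= threshold]
print(threshold, top_accounts)
```

With five rows the 0.99 position falls between the fourth and fifth values, so the threshold is interpolated between 400 and 1000 and only the largest balance clears it.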

How do I find the sum of all transactions since an event?

So, let's say that I have a group of donors, and they make donations on an irregular basis. I can put the donor name, the donation amount, and the donation date into a table, but then I want to do a report that shows all of that information PLUS the value of all donations made after that donation.
I know that I can parse through this using a loop, but is there a better way?
I'm cheating here by not bothering with the code that would go through and assign a transaction number by donor and ensure that everything is the right order. That's easy enough.
DECLARE @Donors TABLE (
ID INT IDENTITY
, Name NVARCHAR(30)
, NID INT
, Amount DECIMAL(7,2)
, DonationDate DATE
, AmountAfter DECIMAL(7,2)
)
INSERT INTO @Donors VALUES
('Adam Zephyr',1,100.00,'2017-01-14',NULL)
, ('Adam Zephyr',2,200.00,'2017-01-17',NULL)
, ('Adam Zephyr',3,150.00,'2017-01-20',NULL)
, ('Braden Yu',1,50.00,'2017-01-11',NULL)
, ('Braden Yu',2,75.00,'2017-01-19',NULL)
DECLARE @Counter1 INT = 0
, @Name NVARCHAR(30)
WHILE @Counter1 < (SELECT MAX(ID) FROM @Donors)
BEGIN
SET @Counter1 += 1
SET @Name = (SELECT Name FROM @Donors WHERE ID = @Counter1)
UPDATE d1
SET AmountAfter = (SELECT ISNULL(SUM(Amount),0) FROM @Donors d2 WHERE ID > @Counter1 AND Name = @Name)
FROM @Donors d1
WHERE d1.ID = @Counter1
END
SELECT * FROM @Donors
It seems like there ought to be a way to do this recursively, but I just can't wrap my head around it.
This would show the latest donation per Name which I presume is the donor and the total of all amounts donated by that person. Perhaps it's more appropriate to use NID for the partitions.
;with MostRecentDonations as (
select *,
row_number() over (partition by Name order by DonationDate desc) as rn,
sum(Amount) over (partition by Name) as TotalDonations
from @Donors
)
select * from MostRecentDonations
where rn = 1;
There's certainly no need to store a running total anywhere unless you have some kind of performance issue.
EDIT:
I've thought about your question and now I'm thinking that you just want a running total with all the transactions included. That's easy too:
select *,
sum(Amount) over (partition by Name order by DonationDate) as DonationsToDate
from @Donors
order by Name, DonationDate;
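If what is wanted is the AmountAfter column from the question (the sum of a donor's later donations), the same window can be restricted to the rows that follow the current one. This is a sketch rather than part of the answer above: it runs against an in-memory SQLite database from Python using the question's sample rows, and the lowercase table and column names are my own. SQL Server 2012 and later should accept the same ROWS BETWEEN frame, but I have not run it there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donors (name TEXT, nid INTEGER, amount REAL, donation_date TEXT);
    INSERT INTO donors VALUES
        ('Adam Zephyr', 1, 100.00, '2017-01-14'),
        ('Adam Zephyr', 2, 200.00, '2017-01-17'),
        ('Adam Zephyr', 3, 150.00, '2017-01-20'),
        ('Braden Yu',   1,  50.00, '2017-01-11'),
        ('Braden Yu',   2,  75.00, '2017-01-19');
""")

# The frame starts one row after the current donation and runs to the end of
# the donor's partition; an empty frame sums to NULL, hence the COALESCE.
amount_after_rows = conn.execute("""
    SELECT name, donation_date, amount,
           COALESCE(SUM(amount) OVER (
               PARTITION BY name
               ORDER BY donation_date
               ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
           ), 0) AS amount_after
    FROM donors
    ORDER BY name, donation_date
""").fetchall()

for row in amount_after_rows:
    print(row)
```

For Adam Zephyr this gives 350, 150 and 0, which matches what the WHILE loop in the question writes into AmountAfter.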

Using top in a subquery

I have the following data structure and I want to write a query that returns for a given order number, all the orderlineid's with the most recent statusId for that orderline.
If I was just interested in a particular order line I could use
select top 1 StatusId from task where OrderLineId = @OrderLineId order by TaskId desc
but I can't figure out how to get all the results for a given OrderId in one SQL Statement.
If I'm understanding your question correctly, you could use row_number in a subquery:
select orderid, orderlineid, statusid
from (
select o.orderid,
ol.orderlineid,
t.statusid,
-- number the tasks within each order line, newest first
row_number() over (partition by ol.orderlineid order by t.taskid desc) rn
from order o
join orderline ol on o.orderid = ol.orderid
join task t on ol.orderlineid = t.orderlineid
) t
where orderid = ? and rn = 1
Please note, order is a reserved word in sql server so if that's your real table name, you'll need to use brackets around it. But I'd recommend renaming it to make your life easier.
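The question asks for the latest status of each order line, so the row numbering has to restart per order line rather than per order. Here is a small runnable check of that idea, using SQLite from Python. The sample rows are invented, the order table is left out because orderline already carries orderid, and latest_status_per_line is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderline (orderlineid INTEGER PRIMARY KEY, orderid INTEGER);
    CREATE TABLE task (taskid INTEGER PRIMARY KEY, orderlineid INTEGER, statusid INTEGER);
    INSERT INTO orderline VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO task VALUES
        (1, 10, 100), (2, 10, 200),
        (3, 11, 100),
        (4, 20, 300), (5, 20, 400);
""")


def latest_status_per_line(conn, order_id):
    """Return (orderlineid, statusid) of the newest task for each line of an order."""
    return conn.execute(
        """
        SELECT orderlineid, statusid
        FROM (
            SELECT ol.orderlineid,
                   t.statusid,
                   ROW_NUMBER() OVER (
                       PARTITION BY ol.orderlineid
                       ORDER BY t.taskid DESC
                   ) AS rn
            FROM orderline ol
            JOIN task t ON t.orderlineid = ol.orderlineid
            WHERE ol.orderid = ?
        ) ranked
        WHERE rn = 1
        ORDER BY orderlineid
        """,
        (order_id,),
    ).fetchall()


latest = latest_status_per_line(conn, 1)
print(latest)
```

Order 1 has two lines, and each comes back once with the status of its highest TaskId.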

Best way to get rid of unwanted sql subselects?

I have a table called Registrations with the following fields:
Id
DateStarted (not null)
DateCompleted (nullable)
I have a bar chart which shows the number of registrations started and completed by date.
My query looks like:
;
WITH Initial(DateStarted, StartCount)
as (
select Datestarted, COUNT(*)
FROM Registrations
GROUP BY DateStarted
)
select I.DateStarted, I.StartCount, COUNT(DISTINCT R.RegistrationId) as CompleteCount
from Initial I
inner join Registrations R
ON (I.DateStarted = R.DateCompleted)
GROUP BY I.DateStarted, I.StartCount
which returns a table that looks like:
DateStarted StartCount CompleteCount
2009-08-01 1033 903
2009-08-02 540 498
The query just has one of those code smell problems. What is a better way of doing this?
EDIT: So why won't the query below work? You could wrap the counts in the final select in coalesce() if you wanted them to be zero instead of null. It will also include dates that have completed (or ended, in the example below) registrations even when no registrations started on that date.
I am assuming the following table structure (roughly).
create table temp
(
id int,
start_date datetime,
end_date datetime
)
insert into temp values (1, '8/1/2009', '8/1/2009')
insert into temp values (2, '8/1/2009', '8/2/2009')
insert into temp values (3, '8/1/2009', null)
insert into temp values (4, '8/2/2009', '8/2/2009')
insert into temp values (5, '8/2/2009', '8/3/2009')
insert into temp values (6, '8/2/2009', '8/4/2009')
insert into temp values (7, '8/4/2009', null)
Then you could do the following to get what you want.
with start_helper as
(
select start_date, count(*) as count from temp group by start_date
),
end_helper as
(
select end_date, count(*) as count from temp group by end_date
)
select coalesce(a.start_date, b.end_date) as date, a.count as start_count, b.count as end_count
from start_helper a full outer join end_helper b on a.start_date = b.end_date
where coalesce(a.start_date, b.end_date) is not null
I would think the full outer join is necessary since a record can be completed today that started yesterday but we may have not started a new record today so you would lose a day from your results.
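An alternative that avoids the full outer join is to stack start events and end events with UNION ALL and sum them per date. The sketch below is my own variation, not the answer's query: it loads the sample rows above into an in-memory SQLite database from Python, with the table renamed temp_reg and the dates written in ISO format so they sort as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_reg (id INTEGER, start_date TEXT, end_date TEXT);
    INSERT INTO temp_reg VALUES
        (1, '2009-08-01', '2009-08-01'),
        (2, '2009-08-01', '2009-08-02'),
        (3, '2009-08-01', NULL),
        (4, '2009-08-02', '2009-08-02'),
        (5, '2009-08-02', '2009-08-03'),
        (6, '2009-08-02', '2009-08-04'),
        (7, '2009-08-04', NULL);
""")

# Each registration contributes one start event and, if completed, one end
# event; grouping the stacked events by date yields both counts with zeros
# where a date has only starts or only completions.
daily_counts = conn.execute("""
    SELECT event_date,
           SUM(is_start) AS start_count,
           SUM(is_end)   AS end_count
    FROM (
        SELECT start_date AS event_date, 1 AS is_start, 0 AS is_end
        FROM temp_reg
        UNION ALL
        SELECT end_date, 0, 1
        FROM temp_reg
        WHERE end_date IS NOT NULL
    ) events
    GROUP BY event_date
    ORDER BY event_date
""").fetchall()

for row in daily_counts:
    print(row)
```

2009-08-03 appears with zero starts and one completion, which is the case an inner join would drop.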
Off-hand, I think this does it:
SELECT
DateStarted
, COUNT(*) as StartCount
, SUM(CASE
WHEN DateCompleted = DateStarted THEN 1
ELSE 0 END
) as CompleteCount
FROM Registrations
GROUP BY DateStarted
OK, apparently I had the requirements wrong before. Given that the CompleteCounts are independent of the StartDate, then this is how I would do it:
;WITH StartDays AS
(
SELECT DateStarted
, Count(*) AS StartCount
FROM Registrations
GROUP BY DateStarted
)
, CompleteDays AS
(
SELECT DateCompleted
, Count(*) AS CompleteCount
FROM Registrations
WHERE DateCompleted IS NOT NULL -- skip registrations not yet completed
GROUP BY DateCompleted
)
SELECT
COALESCE(DateStarted, DateCompleted) AS [Date]
, COALESCE(StartCount, 0) AS StartCount
, COALESCE(CompleteCount, 0) AS CompleteCount
FROM StartDays
FULL OUTER JOIN CompleteDays ON DateStarted = DateCompleted
Which actually is pretty close to what you had.
I don't see a problem. I see a common table expression being used.
You didn't provide DDL for the tables, so I'm not going to try to reproduce this. However, I think you may be able to directly substitute the SELECT for the use of Initial.
I believe the following is identical in function to what you have:
select DS.DateStarted
, count(distinct DS.RegistrationId) as StartCount
, count(distinct DC.RegistrationId) as CompleteCount
from Registrations DS
inner join Registrations DC on DS.DateStarted = DC.DateCompleted
group by Ds.DateStarted
I'm a bit confused by the name of the column DateStarted in the results. It looks to be just a date on which some things started and some things ended, and the counts are the number of registrations started and completed that day.
The inner join is throwing away any date where there is either 0 starts or 0 completes. To get all:
select coalesce(DS.DateStarted, DC.DateCompleted) as "Date"
, count(distinct DS.RegistrationId) as StartCount
, count(distinct DC.RegistrationId) as CompleteCount
from Registrations DS
full outer join Registrations DC on DS.DateStarted = DC.DateCompleted
group by Ds.DateStarted, DC.DateCompleted
If you wanted to include dates that are neither DateStarted nor DateCompleted, with counts of 0 and 0, then you will need a source of dates and I think it would be clearer to use two correlated sub-queries in select clause instead of joins and count distinct:
select DateSource."Date"
, (select count(*)
from Registrations
where DateStarted = DateSource."Date") as StartCount
, (select count(*)
from Registrations
where DateCompleted = DateSource."Date") as CompleteCount
from DateSource -- implementation of date source left as exercise
where DateSource."Date" between @LowDate and @HighDate