Performance tune tSQL Query count(*) & subqueries - tsql

I know that there's a better way to do what I'm trying to accomplish here. Though the query works, I fear its performance will suffer as the datasets it is applied to grow.
I don't even necessarily need someone to rewrite what I have; if they would just be willing to point me in the direction of the topic I should study, I would greatly appreciate it.
What I'm trying to return with this query is a count of the number of records at or above a certain status.
Thanks in advance for your help!
SELECT
( SELECT count(*)
FROM TABLE1 c1
WHERE ( c1.U_KEY3 NOT LIKE 'z%' AND (c1.U_KEY1 = '' or c1.U_KEY1 IS NULL) )
) AS 'STATUS is EMPTY'
,
( SELECT count(*)
FROM TABLE1 c1
WHERE ( c1.U_KEY3 NOT LIKE 'z%' AND LEFT(c1.U_KEY1,2) >= '70' )
) AS 'STATUS > 70'
,
( SELECT count(*)
FROM TABLE1 c1
WHERE ( c1.U_KEY3 NOT LIKE 'z%' AND LEFT(c1.U_KEY1,2) >= '50' )
) AS 'STATUS > 50'
,
( SELECT count(*)
FROM TABLE1 c1
WHERE ( c1.U_KEY3 NOT LIKE 'z%' AND LEFT(c1.U_KEY1,2) >= '30' )
) AS 'STATUS > 30'
,
( SELECT count(*)
FROM TABLE1 c1
WHERE ( c1.U_KEY3 NOT LIKE 'z%' AND LEFT(c1.U_KEY1,2) >= '10' )
) AS 'STATUS > 10'

You could roll all the subqueries into a single query using CASE expressions (conditional aggregation):
SELECT
SUM(CASE WHEN c1.U_KEY1 = '' OR c1.U_KEY1 IS NULL THEN 1 ELSE 0 END) AS 'STATUS IS EMPTY',
SUM(CASE WHEN LEFT(c1.U_KEY1,2) >= '70' THEN 1 ELSE 0 END) AS 'STATUS > 70',
SUM(CASE WHEN LEFT(c1.U_KEY1,2) >= '50' THEN 1 ELSE 0 END) AS 'STATUS > 50',
SUM(CASE WHEN LEFT(c1.U_KEY1,2) >= '30' THEN 1 ELSE 0 END) AS 'STATUS > 30',
SUM(CASE WHEN LEFT(c1.U_KEY1,2) >= '10' THEN 1 ELSE 0 END) AS 'STATUS > 10'
FROM TABLE1 c1
WHERE c1.U_KEY3 NOT LIKE 'z%'
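This reads TABLE1 once instead of five times, but it might not run faster than the individual subqueries in every case (for example, if an index lets each subquery touch only a few rows), so compare the execution plans on your own data.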
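To check that the single-pass form returns the same counts as the separate subqueries, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are made up for illustration, and substr(x, 1, 2) replaces LEFT(x, 2) because SQLite has no LEFT function; treat it as a way to compare the two shapes, not as a performance test.

```python
import sqlite3

# Hypothetical sample rows: (U_KEY1, U_KEY3). The 'zed' row is filtered out.
rows = [
    ("", "a"), (None, "b"), ("902", "a"), ("452", "a"),
    ("103", "a"), ("772", "zed"), ("664", "a"), ("012", "a"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (U_KEY1 TEXT, U_KEY3 TEXT)")
conn.executemany("INSERT INTO TABLE1 VALUES (?, ?)", rows)

# One condition per output column; substr(x, 1, 2) stands in for LEFT(x, 2).
conditions = [
    "U_KEY1 = '' OR U_KEY1 IS NULL",
    "substr(U_KEY1, 1, 2) >= '70'",
    "substr(U_KEY1, 1, 2) >= '50'",
    "substr(U_KEY1, 1, 2) >= '30'",
    "substr(U_KEY1, 1, 2) >= '10'",
]

# Single pass: every condition becomes a SUM(CASE ...) over the same scan.
columns = ", ".join(f"SUM(CASE WHEN {c} THEN 1 ELSE 0 END)" for c in conditions)
single_pass = conn.execute(
    f"SELECT {columns} FROM TABLE1 WHERE U_KEY3 NOT LIKE 'z%'"
).fetchone()

# Original shape: one count(*) query per condition.
separate = tuple(
    conn.execute(
        f"SELECT count(*) FROM TABLE1 WHERE U_KEY3 NOT LIKE 'z%' AND ({c})"
    ).fetchone()[0]
    for c in conditions
)

print(single_pass, separate)
```

Both tuples should match; which form is faster on a real table still depends on the indexes and the optimizer.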

I would turn the question around like this:
CREATE TABLE #t (Id INT, U_Key1 VARCHAR(4) NULL); -- temp table; DECLARE only works with @table variables
INSERT INTO #t (id,U_Key1)
VALUES
(1,null),
(2,'902'),
(3,'452'),
(4,'401'),
(5,'103'),
(6,'359'),
(7,'335'),
(8,'772'),
(9,'143'),
(10,'222'),
(11,'664'),
(12,'992'),
(13,'122'),
(14,'332'),
(15,'421'),
(16,'622'),
(17,'982'),
(18,'1234'),
(19,null),
(20,'012');
WITH A AS (
SELECT CAST(LEFT(U_Key1,2) AS INT) val FROM #t
), limits AS (
SELECT 10 limitval, 'Status >= 10' limittext
UNION ALL
SELECT 30 , 'Status >= 30'
UNION ALL
SELECT 50 , 'Status >= 50'
UNION ALL
SELECT 70 , 'Status >= 70'
), Counts AS (
SELECT 'Status is empty' Limittext, COUNT(id) Count FROM #t
WHERE U_Key1 IS null
UNION ALL
SELECT l.limittext, COUNT( A.val) Count FROM A
CROSS JOIN limits l
WHERE A.val >= l.limitval
GROUP BY l.limittext
)
SELECT * FROM Counts
That produces the result:
Status is empty 2
Status >= 10 17
Status >= 30 12
Status >= 50 6
Status >= 70 4
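The threshold-table idea can be reproduced outside SQL Server as well. The sketch below runs an equivalent query in SQLite through Python's sqlite3 module; the table name t, the VALUES-based limits CTE, and substr in place of LEFT are adaptations for SQLite and not part of the original answer.

```python
import sqlite3

# U_Key1 values copied from the sample data above (ids 1..20).
values = [None, "902", "452", "401", "103", "359", "335", "772", "143", "222",
          "664", "992", "122", "332", "421", "622", "982", "1234", None, "012"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, U_Key1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(values, start=1)))

query = """
WITH A AS (
    SELECT CAST(substr(U_Key1, 1, 2) AS INTEGER) AS val FROM t
), limits(limitval, limittext) AS (
    VALUES (10, 'Status >= 10'), (30, 'Status >= 30'),
           (50, 'Status >= 50'), (70, 'Status >= 70')
)
SELECT 'Status is empty' AS limittext, COUNT(Id) AS cnt FROM t WHERE U_Key1 IS NULL
UNION ALL
SELECT l.limittext, COUNT(A.val) FROM A
CROSS JOIN limits l
WHERE A.val >= l.limitval
GROUP BY l.limittext
"""

# Map each label to its count; row order is not guaranteed without ORDER BY.
counts = dict(conn.execute(query).fetchall())
print(counts)
```

Adding a new threshold only needs one more row in limits, which is the main attraction of this shape over a fixed list of CASE columns.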

Related

Count the number of instances the time is above average time

Here is my code:
arrival_cluster_raw as (
SELECT
routes.uc_id ,
cg.cluster_id ,
cg.cluster_centroid ,
routes.imei ,
routes.time_created::date as campaign_date,
min(routes.time_created) as m_per_imei_cluster
FROM cluster_groups as cg
group by 1,2,3,4,5
)
,
arrival_cluster_final as
(
select uc_id, campaign_date, cluster_id, cluster_centroid , date_trunc('second', AVG(m_per_imei_cluster::TIME)) as avg_arrival_time,
count(case when m_per_imei_cluster::TIME < (select AVG(m_per_imei_cluster::TIME) from arrival_cluster_raw) then 1 else null END) as "num_of_arrival_teams_before_avg_time"
,count(case when m_per_imei_cluster::TIME > (select AVG(m_per_imei_cluster::TIME) from arrival_cluster_raw) then 1 else null END) as "num_of_arrival_teams_after_avg_time"
FROM arrival_cluster_raw
group by uc_id,cluster_id, cluster_centroid ,campaign_date
)
The problem is that in arrival_cluster_final each row is compared against the average over the whole of arrival_cluster_raw, whereas I want to compare it against the average for its own combination of uc_id, cluster_id, cluster_centroid, and campaign_date.
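Can you try this one? It computes the average per combination of uc_id, cluster_id, cluster_centroid, and campaign_date with a window function: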
WITH arrival_cluster_raw AS (
SELECT
routes.uc_id,
cg.cluster_id,
cg.cluster_centroid,
routes.imei,
routes.time_created::date AS campaign_date,
min(routes.time_created) AS m_per_imei_cluster
FROM
cluster_groups AS cg
JOIN routes ON routes.uc_id = cg.id -- assumed join condition; adjust to match your schema
GROUP BY
1,2,3,4,5
),
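arrival_with_avg AS (
-- Attach each group's average to every row first: PostgreSQL does not
-- allow a window function call inside an aggregate such as count().
SELECT
uc_id,
cluster_id,
cluster_centroid,
campaign_date,
m_per_imei_cluster::time AS arrival_time,
avg(m_per_imei_cluster::time) OVER w AS group_avg_time
FROM
arrival_cluster_raw
WINDOW w AS (PARTITION BY uc_id, cluster_id, cluster_centroid, campaign_date)
),
arrival_cluster_final AS (
SELECT
uc_id,
cluster_id,
cluster_centroid,
campaign_date,
date_trunc('second', group_avg_time) AS avg_arrival_time,
count(CASE WHEN arrival_time < group_avg_time THEN 1 END) AS num_of_arrival_teams_before_avg_time,
count(CASE WHEN arrival_time > group_avg_time THEN 1 END) AS num_of_arrival_teams_after_avg_time
FROM
arrival_with_avg
GROUP BY
uc_id, cluster_id, cluster_centroid, campaign_date, group_avg_time
)
SELECT * FROM arrival_cluster_final ORDER BY 1;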

How to show the maximum number for each combination of customer and product in a specific state in Postgresql?

I just began learning PostgreSQL recently.
I have a table named 'sales':
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
)
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
......
It looks like this:
And there are 500 rows in total.
Now I want to use the query to implement this:
For each combination of customer and product, output the maximum sales quantities for
NY and minimum sales quantities for NJ and CT in 3 separate columns. Like the first
report, display the corresponding dates (i.e., dates of those maximum and minimum sales
quantities). Furthermore, for CT and NJ, include only the sales that occurred after 2000;
for NY, include all sales.
It should be like this:
I have tried the following query:
SELECT
cust customer,
prod product,
MAX(CASE WHEN rn3 = 1 THEN quant END) NY_MAX,
MAX(CASE WHEN rn3 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date,
MIN(CASE WHEN rn2 = 1 THEN quant END) NJ_MIN,
MIN(CASE WHEN rn2 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date,
MIN(CASE WHEN rn1 = 1 THEN quant END) CT_MIN,
MIN(CASE WHEN rn1 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant) rn1,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant) rn2,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant DESC) rn3
FROM sales
) x
WHERE rn1 = 1 OR rn2 = 1 or rn3 = 1
GROUP BY cust, prod;
This is the result:
This is wrong because it shows me the maximum and minimum across all states, not for the specific state I want. I also have no idea how to handle the year condition the way the question asks me to.
We can handle this using separate CTEs, one per state, along with a CTE listing every distinct customer/product pair:
WITH custprod AS (
SELECT DISTINCT cust, prod
FROM sales
),
ny_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant DESC) rn
FROM sales
WHERE state = 'NY'
),
nj_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant) rn
FROM sales
WHERE state = 'NJ' AND year > 2000
),
ct_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant) rn
FROM sales
WHERE state = 'CT' AND year > 2000
)
SELECT
cp.cust,
cp.prod,
nys.quant AS ny_max,
nys.year::text || '-' || nys.month::text || '-' || nys.day::text AS ny_date,
njs.quant AS nj_min,
njs.year::text || '-' || njs.month::text || '-' || njs.day::text AS nj_date,
cts.quant AS ct_min,
cts.year::text || '-' || cts.month::text || '-' || cts.day::text AS ct_date
FROM custprod cp
LEFT JOIN ny_sales nys
ON cp.cust = nys.cust AND cp.prod = nys.prod AND nys.rn = 1
LEFT JOIN nj_sales njs
ON cp.cust = njs.cust AND cp.prod = njs.prod AND njs.rn = 1
LEFT JOIN ct_sales cts
ON cp.cust = cts.cust AND cp.prod = cts.prod AND cts.rn = 1
ORDER BY
cp.cust,
cp.prod;
Note: You didn't provide comprehensive sample data, but the above seems to be working in the demo link below.
Demo

Dividing sums with different WHERE conditions

I need help with my query. I am trying to divide one SUM of a column by another, where each SUM has different WHERE conditions. For example:
SELECT
TO_CHAR(PS.REPORT_PERIOD_END_DATE, 'YYYY') AS YR,
SUM(T1.PRICE) AS COLUMN_1
FROM TABLE_ONE T1
INNER JOIN SUB_STATUSES status ON status.SUB_ID = T1.ID
WHERE status.R_SUB_STATUS_CODE = 'COMPLETED'
AND T1.TYPE = 'COMPANY' OR T1.TYPE = 'SMALL_BUSINESS'
GROUP BY TO_CHAR(PS.REPORT_PERIOD_END_DATE, 'YYYY')
ORDER BY TO_CHAR(PS.REPORT_PERIOD_END_DATE, 'YYYY') DESC
DIVIDE BY
SELECT
TO_CHAR(PS.REPORT_PERIOD_END_DATE, 'YYYY') AS YR,
SUM(T1.PRICE) AS COLUMN_1
from TABLE_ONE T1
INNER JOIN SUB_STATUSES status ON status.SUB_ID = T1.ID
WHERE status.R_SUB_STATUS_CODE = 'COMPLETED'
AND T1.TYPE = 'LOT' OR T1.TYPE = 'LAND'
GROUP BY TO_CHAR(PS.REPORT_PERIOD_END_DATE, 'YYYY')
ORDER BY TO_CHAR(PS.REPORT_PERIOD_END_DATE, 'YYYY') DESC
The first Column returns this :
2017 1094
2016 89
2015 95
2014 101
2013 113
2012 173
2011 191
2010 165
Use a CASE expression inside each SUM instead of a WHERE clause. The outer query guards against division by zero.
SELECT yr, case when column_2 <> 0 then column_1 / column_2 else 0 end divcol
FROM (
SELECT
TO_CHAR(PS.REPORT_PERIOD_END_DATE, 'YYYY') AS YR,
SUM(case when t1.type in ('COMPANY', 'SMALL_BUSINESS') THEN T1.PRICE ELSE 0 END) as COLUMN_1,
SUM(case when t1.type in ('LOT', 'LAND') THEN T1.PRICE ELSE 0 end) AS COLUMN_2
FROM TABLE_ONE T1
INNER JOIN SUB_STATUSES status ON status.SUB_ID = T1.ID
WHERE status.R_SUB_STATUS_CODE = 'COMPLETED'
GROUP BY TO_CHAR(PS.REPORT_PERIOD_END_DATE, 'YYYY')
)
ORDER BY yr DESC
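As a small runnable illustration of the same pattern, the sketch below uses Python's sqlite3 module with a made-up table priced_items (yr, type, price); the table, its rows, and the REAL price column are assumptions for the example, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, only for illustration.
conn.execute("CREATE TABLE priced_items (yr INTEGER, type TEXT, price REAL)")
conn.executemany("INSERT INTO priced_items VALUES (?, ?, ?)", [
    (2017, "COMPANY", 100), (2017, "SMALL_BUSINESS", 50),
    (2017, "LOT", 30), (2017, "LAND", 20),
    (2016, "COMPANY", 80), (2016, "OTHER", 10),
])

# Inner query: one pass, two conditional sums per year.
# Outer query: divide only when the denominator is non-zero.
ratios = conn.execute("""
SELECT yr,
       CASE WHEN column_2 <> 0 THEN column_1 / column_2 ELSE 0 END AS divcol
FROM (
    SELECT yr,
           SUM(CASE WHEN type IN ('COMPANY', 'SMALL_BUSINESS') THEN price ELSE 0 END) AS column_1,
           SUM(CASE WHEN type IN ('LOT', 'LAND') THEN price ELSE 0 END) AS column_2
    FROM priced_items
    GROUP BY yr
)
ORDER BY yr DESC
""").fetchall()

print(ratios)
```

The 2016 row has no LOT or LAND prices, so its denominator is zero and the CASE guard returns 0 instead of raising a division error.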

Count Divided by a Count

How do you divide a count by another count? I have seen a few different methods, but I am unable to get them to work for my purposes. The code that I am working on currently is:
Select(select COUNT(lUsers)
FROM Tlocation
WHERE dLastUpdated IS NOT NULL
AND dRemovalDate BETWEEN DATEADD(day,-7,GETDATE()) and GETDATE())
/
(SELECT COUNT(lUsers)
FROM Tlocation
WHERE dLastUpdated < GETDATE()
AND dRemovalDate IS NULL OR dRemovalDate > GETDATE())
But this just returns a 0 every time.
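That's because COUNT() returns an INT, and dividing one INT by another performs integer division, so any ratio below 1 is truncated to 0. Cast one side to a decimal type first. Here's how you could do it: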
SELECT
(
SELECT 1562
)
/
CAST(
(
SELECT 92825
)
AS DECIMAL(20, 10)
)
;
Or for your query:
Select(select COUNT(lUsers)
FROM Tlocation
WHERE dLastUpdated IS NOT NULL
AND dRemovalDate BETWEEN DATEADD(day,-7,GETDATE()) and GETDATE())
/
CAST(
(SELECT COUNT(lUsers)
FROM Tlocation
WHERE dLastUpdated < GETDATE()
AND dRemovalDate IS NULL OR dRemovalDate > GETDATE())
AS DECIMAL(20, 10)
)
;
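The truncation is easy to see in isolation. This sketch uses Python's sqlite3 module, which also performs integer division on two integers; SQLite's REAL stands in for SQL Server's DECIMAL(20, 10) here, so the exact number of digits will differ from the T-SQL version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer: the fractional part is discarded.
truncated = conn.execute("SELECT 1562 / 92825").fetchone()[0]

# Casting one operand to a non-integer type keeps the fraction.
fractional = conn.execute("SELECT 1562 / CAST(92825 AS REAL)").fetchone()[0]

print(truncated, fractional)
```

Only one operand needs the cast; the other is promoted automatically before the division.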

TSQL get overlapping periods from datetime ranges

I have a table with date ranges and I need the sum of the overlapping periods (in hours) between its rows.
This is a schema example:
create table period (
id int,
starttime datetime,
endtime datetime,
type varchar(64)
);
insert into period values (1,'2013-04-07 8:00','2013-04-07 13:00','Work');
insert into period values (2,'2013-04-07 14:00','2013-04-07 17:00','Work');
insert into period values (3,'2013-04-08 8:00','2013-04-08 13:00','Work');
insert into period values (4,'2013-04-08 14:00','2013-04-08 17:00','Work');
insert into period values (5,'2013-04-07 10:00','2013-04-07 11:00','Holyday'); /* 1h overlapping with 1*/
insert into period values (6,'2013-04-08 10:00','2013-04-08 20:00','Transfer'); /* 6h overlapping with 3 and 4*/
insert into period values (7,'2013-04-08 11:00','2013-04-08 12:00','Test'); /* 1h overlapping with 3 and 6*/
And its fiddle: http://sqlfiddle.com/#!6/9ca31/10
I expect a sum of 8h overlapping hours:
1h (id 5 over id 1)
6h (id 6 over id 3 and 4)
1h (id 7 over id 3 and 6)
I checked this: select overlapping datetime events with SQL, but it does not seem to do what I need.
Thank you.
select sum(datediff(hh, case when t2.starttime > t1.starttime then t2.starttime else t1.starttime end,
case when t2.endtime > t1.endtime then t1.endtime else t2.endtime end))
from period t1
join period t2 on t1.id < t2.id
where t2.endtime > t1.starttime and t2.starttime < t1.endtime;
Updated to handle several overlaps:
select sum(datediff(hh, start, fin))
from (select distinct
case when t2.starttime > t1.starttime then t2.starttime else t1.starttime end as start,
case when t2.endtime > t1.endtime then t1.endtime else t2.endtime end as fin
from period t1
join period t2 on t1.id < t2.id
where t2.endtime > t1.starttime and t2.starttime < t1.endtime
) as overlaps;
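To see why the DISTINCT matters, the pairwise logic of the two queries above can be traced in plain Python on the sample rows. This is only a sketch of the same idea, not a replacement for the SQL: it measures exact durations, whereas DATEDIFF(hh, ...) counts hour boundaries crossed, which gives the same numbers here because every period starts and ends on the hour.

```python
from datetime import datetime
from itertools import combinations

def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

# (id, starttime, endtime) from the sample data, ordered by id.
periods = [
    (1, ts("2013-04-07 08:00"), ts("2013-04-07 13:00")),
    (2, ts("2013-04-07 14:00"), ts("2013-04-07 17:00")),
    (3, ts("2013-04-08 08:00"), ts("2013-04-08 13:00")),
    (4, ts("2013-04-08 14:00"), ts("2013-04-08 17:00")),
    (5, ts("2013-04-07 10:00"), ts("2013-04-07 11:00")),
    (6, ts("2013-04-08 10:00"), ts("2013-04-08 20:00")),
    (7, ts("2013-04-08 11:00"), ts("2013-04-08 12:00")),
]

def pairwise_overlaps(rows):
    """Overlap window for every pair with t1.id < t2.id that intersects."""
    windows = []
    for (_, s1, e1), (_, s2, e2) in combinations(rows, 2):
        if e2 > s1 and s2 < e1:
            windows.append((max(s1, s2), min(e1, e2)))
    return windows

def hours(windows):
    """Total length of the given windows in hours."""
    return sum((end - start).total_seconds() / 3600 for start, end in windows)

all_windows = pairwise_overlaps(periods)
total_with_duplicates = hours(all_windows)   # first query: plain sum
total_distinct = hours(set(all_windows))     # second query: select distinct

print(total_with_duplicates, total_distinct)
```

Rows 3 and 7 and rows 6 and 7 produce the same 11:00 to 12:00 window, so the plain sum counts that hour twice; removing identical windows brings the total down to the expected 8 hours.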
I have a somewhat "dirty" solution. Hope this helps :)
with src as (
select
convert(varchar, starttime, 112) [start_date]
, cast(left(convert(varchar, starttime, 108), 2) as int) [start_time]
, convert(varchar, endtime, 112) [end_date]
, cast(left(convert(varchar, endtime, 108), 2) as int) [end_time]
, id
from [period]),
[gr] as (
select
row_number() over(order by s1.[start_date], s1.[start_time], s1.[end_time], s2.[start_time], s2.[end_time]) [no]
, s1.[start_date] [date]
, s1.[start_time] [t1]
, s1.[end_time] [t2]
, s2.[start_time] [t3]
, s2.[end_time] [t4]
from src s1
join src s2 on s1.[start_date] = s2.[start_date]
and s1.[end_date] = s2.[end_date]
and (s1.[start_time] between s2.[start_time] and s2.[end_time] or s1.[end_time] between s2.[start_time] and s2.[end_time])
and s1.id != s2.id),
[raw] as (
select [no], [date], [t1] [h] from [gr] union all
select [no], [date], [t2] from [gr] union all
select [no], [date], [t3] from [gr] union all
select [no], [date], [t4] from [gr]),
[max_min] as (
select [no], [date], max(h) [max_h], min(h) [min_h]
from [raw]
group by [no], [date]
),
[result] as (
select [raw].*
from [raw]
left join [max_min] on [raw].[no] = [max_min].[no]
and ([raw].h = [max_min].[max_h] or [raw].h = [max_min].[min_h])
where [max_min].[no] is null),
[final] as (
select distinct r1.[date], r1.h [start_h], r2.h [end_h], abs(r1.h - r2.h) [dif]
from [result] r1
join [result] r2 on r1.[no] = r2.[no]
where abs(r1.h - r2.h) > 0
and r1.h > r2.h)
select sum(dif) [overlapping hours] from [final]
SQLFiddle