How to pivot and merge to avoid the NULL cells?

Sample table data:
SELECT [id],[date],[agent],[sales]
FROM [AgentSales]
id date agent sales
1 2021-01-02 00:00:00.000 Agent A 10
2 2021-01-03 00:00:00.000 Agent A 2
3 2021-01-04 00:00:00.000 Agent B 22
4 2021-01-06 00:00:00.000 Agent B 5
5 2021-02-05 00:00:00.000 Agent A 1
6 2021-02-06 00:00:00.000 Agent B 33
7 2021-03-06 00:00:00.000 Agent A 11
8 2021-03-06 00:00:00.000 Agent B 3
Group by YearMonth, Agent:
SELECT CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR) YRMON, AGENT, SUM(SALES) SALES
FROM AgentSales
GROUP BY CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR), AGENT
ORDER BY 1
YRMON AGENT SALES
2021 1 Agent A 12
2021 1 Agent B 27
2021 2 Agent A 1
2021 2 Agent B 33
2021 3 Agent A 11
2021 3 Agent B 3
PIVOT:
SELECT CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR) YRMON, [AGENT A], [AGENT B]
FROM AgentSales
PIVOT (SUM(SALES) FOR AGENT IN ([AGENT A], [AGENT B])) AS PIVOTTABLE
YRMON AGENT A AGENT B
2021 1 10 NULL
2021 1 2 NULL
2021 1 NULL 22
2021 1 NULL 5
2021 2 1 NULL
2021 2 NULL 33
2021 3 11 NULL
2021 3 NULL 3
I want to have the actual value in place of the NULLs. So effectively there should be only 2 rows for each YearMon. How to do this?
Expected result:
YRMON AGENT A AGENT B
2021 1 12 27
2021 2 1 33
2021 3 11 3

Looks like simple conditional aggregation should do the trick.
SELECT
FORMAT(EOMONTH([DATE]), 'yyyy MM') YRMON,
SUM(CASE WHEN AGENT = 'AGENT B' THEN SALES END) [AGENT B],
SUM(CASE WHEN AGENT = 'AGENT A' THEN SALES END) [AGENT A]
FROM AgentSales
GROUP BY EOMONTH([DATE])
ORDER BY EOMONTH([DATE]);
Or, if you want each sale on a separate row, you can add a row number and group on that:
SELECT
FORMAT(EOMONTH([DATE]), 'yyyy MM') YRMON,
SUM(CASE WHEN AGENT = 'AGENT B' THEN SALES END) [AGENT B],
SUM(CASE WHEN AGENT = 'AGENT A' THEN SALES END) [AGENT A]
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY EOMONTH([DATE]), AGENT ORDER BY [DATE])
FROM AgentSales
) sales
GROUP BY EOMONTH([DATE]), rn
ORDER BY EOMONTH([DATE]), rn;
Grouping by EOMONTH([DATE]), which is a date value, is generally cheaper than grouping by a concatenated string, and it also sorts chronologically.
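If you would rather keep PIVOT, the extra rows in the question come from PIVOT grouping by every input column that is not named in the aggregate or the FOR clause, here id and date. A minimal sketch, assuming the AgentSales table from the question; the aliases src and p are illustrative names, not part of the original. It projects only the three needed columns in a derived table and pivots that:

```sql
SELECT YRMON, [Agent A], [Agent B]
FROM (
    -- Keep only the columns PIVOT should see; id and [date] would
    -- otherwise become implicit grouping columns.
    SELECT CAST(YEAR([date]) AS VARCHAR(4)) + ' ' + CAST(MONTH([date]) AS VARCHAR(2)) AS YRMON,
           agent,
           sales
    FROM AgentSales
) AS src
PIVOT (SUM(sales) FOR agent IN ([Agent A], [Agent B])) AS p
ORDER BY YRMON;
```

With the sample data this should return one row per month, matching the expected result in the question. YRMON is a string here, so '2021 10' would sort before '2021 2'; grouping on a date value avoids that.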

Related

Using the SUM OVER clause, how to check sum over period only when output is not greater than a certain value, otherwise use current month value?

Sample data:
select date, agent, sales
from agentsales
date agent sales
2021-01-03 00:00:00.000 Agent A 10
2021-02-05 00:00:00.000 Agent A 15
2021-03-10 00:00:00.000 Agent A 10
2021-01-05 00:00:00.000 Agent B 5
2021-02-06 00:00:00.000 Agent B 28
2021-03-10 00:00:00.000 Agent B 5
2021-01-02 00:00:00.000 Agent C 35
2021-02-04 00:00:00.000 Agent C 25
2021-03-08 00:00:00.000 Agent C 15
2021-01-01 00:00:00.000 Agent D 5
2021-02-02 00:00:00.000 Agent D 35
2021-03-10 00:00:00.000 Agent D 31
I want to get the counts of agents who have crossed 30 sales, such that if they have never crossed a total of 30 sales then consider sum over current and previous months, otherwise only current month.
Expected output:
YrMon Count_Agent_more_than_30_sales
Jan21 1
Feb21 2
Mar21 2
Logic:
Jan21 - 1 since only C has crossed 30 sales
Feb21 - 2 since B and D have crossed 30 sales. Agent D has crossed the 30 mark in the month, and B has crossed over period for first time. C is not considered as it previously crossed the 30 mark.
Mar21 - 2 since A and D have crossed 30 sales. Agent A has crossed mark over period for 1st time. D has crossed for the month. B is not considered as periodic case was already considered in last month. C is not considered as it already crossed 30 mark last month.
As mentioned above, I want to get the counts of agents who have crossed 30 sales, such that if they have never crossed a total of 30 sales then consider sum over current and previous months, otherwise only current month.
My query to calculate sum over period:
;WITH CTE AS (SELECT CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR) YRMON, AGENT, SUM(SALES) SALES
FROM AgentSales
GROUP BY CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR), AGENT
)
SELECT *, SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON) SUMOVERPERIOD FROM CTE
ORDER BY 2,1
YRMON AGENT SALES SUMOVERPERIOD
2021 1 Agent A 10 10
2021 2 Agent A 15 25
2021 3 Agent A 10 35
2021 1 Agent B 5 5
2021 2 Agent B 28 33
2021 3 Agent B 5 38
2021 1 Agent C 35 35
2021 2 Agent C 25 60
2021 3 Agent C 15 75
2021 1 Agent D 5 5
2021 2 Agent D 35 40
2021 3 Agent D 31 71
Now I am trying to apply the logic on the calculated sum:
;WITH CTE AS (SELECT CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR) YRMON, AGENT, SUM(SALES) SALES
FROM AgentSales
GROUP BY CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR), AGENT
)
SELECT *, SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON) SUMOVERPERIOD,
CASE WHEN SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON)>30 THEN 1 ELSE 0 END AS CALC
FROM CTE
ORDER BY 2,1
YRMON AGENT SALES SUMOVERPERIOD CALC
2021 1 Agent A 10 10 0
2021 2 Agent A 15 25 0
2021 3 Agent A 10 35 1
2021 1 Agent B 5 5 0
2021 2 Agent B 28 33 1
2021 3 Agent B 5 38 1
2021 1 Agent C 35 35 1
2021 2 Agent C 25 60 1
2021 3 Agent C 15 75 1
2021 1 Agent D 5 5 0
2021 2 Agent D 35 40 1
2021 3 Agent D 31 71 1
This query is always considering sum over current and previous period.
How to check whether the sales has previously crossed the 30 sales mark and for such cases to exclude doing the sum over period? For example can we apply LAG on the result of the SUM OVER column?

Please check whether one of these fits your needs (I think the description is confusing).
Option 1
-- If you want to count only the first time [agent] crossed 30 sales
;With MyCTE01 as (
SELECT
[date] = DATEADD(DAY, 1, EOMONTH([date], -1)), -- first day of the row's own month, as in Options 2 and 3
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
),
MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
-- The idea of using "and S - [sales] < 30" instead of ROW_NUMBER came from @Charlieface, but it is better to do the work on the DATE data type and not on a string
WHERE S > 30 and S - [sales] < 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
Option 2
-- If you want to count every [agent] whose running total has crossed 30 sales so far
;With MyCTE01 as (
SELECT
[date] = DATEADD(DAY, 1, EOMONTH([date], -1)),
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
)
,MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
WHERE S > 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
Option 3
-- If you want to count only the first time [agent] crossed 30 sales, or any month in which the sales are over 30
;With MyCTE01 as (
SELECT
[date] = DATEADD(DAY,1,EOMONTH([date], -1)),
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
)
,MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
-- The idea of using "and S - [sales] < 30" instead of ROW_NUMBER came from @Charlieface, but it is better to do the work on the DATE data type and not on a string
WHERE (S > 30 and S - [sales] < 30) or sales > 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
DDL+DML
USE tempdb
GO
DROP TABLE IF EXISTS [AgentSales]
GO
CREATE TABLE [AgentSales](id INT IDENTITY(1,1), [date] DATE, agent VARCHAR(100), sales INT)
GO
INSERT [AgentSales]([date],[agent],[sales]) VALUES
('2021-01-03 00:00:00.000','Agent A', 10),
('2021-02-05 00:00:00.000','Agent A', 15),
('2021-03-10 00:00:00.000','Agent A',10),
('2021-01-05 00:00:00.000','Agent B',5 ),
('2021-02-06 00:00:00.000','Agent B',28),
('2021-03-10 00:00:00.000','Agent B',5 ),
('2021-01-02 00:00:00.000','Agent C',35),
('2021-02-04 00:00:00.000','Agent C',25),
('2021-03-08 00:00:00.000','Agent C',15),
('2021-01-01 00:00:00.000','Agent D',5 ),
('2021-02-02 00:00:00.000','Agent D',35),
('2021-03-10 00:00:00.000','Agent D',31)
GO
SELECT [id],[date],[agent],[sales]
FROM [AgentSales]
GO

It looks like this should work for you.
You need to pre-aggregate the sales per agent and month, then take a running sum of that aggregate.
Then check whether each agent crossed the threshold in a given month by comparing that month's sales with the running sum.
SELECT
YrMon = FORMAT(Month, 'yyyy MM'),
Count_Agent_more_than_30_sales =
COUNT(CASE WHEN SumOverPeriod >= 30 AND SumOverPeriod - sales < 30 OR sales >= 30 THEN 1 END)
FROM (
SELECT
Month = EOMONTH(date),
agent,
sales = SUM(sales),
SumOverPeriod = SUM(SUM(sales)) OVER (PARTITION BY agent ORDER BY EOMONTH(date)
ROWS UNBOUNDED PRECEDING)
FROM AgentSales
GROUP BY EOMONTH(date), agent
) sales
GROUP BY Month;
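On the question of applying LAG to the running sum: a window function cannot be nested directly inside another one, but it can be applied in a later CTE. The following is a sketch only, assuming the AgentSales table from the question and the "strictly more than 30" reading used in the question's own CASE expression; the CTE names monthly, running and flagged and the column PrevSum are illustrative.

```sql
WITH monthly AS (
    -- One row per agent and month.
    SELECT EOMONTH([date]) AS MonthEnd, agent, SUM(sales) AS sales
    FROM AgentSales
    GROUP BY EOMONTH([date]), agent
),
running AS (
    -- Running total per agent, month by month.
    SELECT MonthEnd, agent, sales,
           SUM(sales) OVER (PARTITION BY agent ORDER BY MonthEnd
                            ROWS UNBOUNDED PRECEDING) AS SumOverPeriod
    FROM monthly
),
flagged AS (
    -- Previous month's running total; 0 when there is no earlier month.
    SELECT MonthEnd, agent, sales, SumOverPeriod,
           LAG(SumOverPeriod, 1, 0) OVER (PARTITION BY agent ORDER BY MonthEnd) AS PrevSum
    FROM running
)
SELECT FORMAT(MonthEnd, 'yyyy MM') AS YrMon,
       -- Count an agent when the running total crosses 30 for the first time,
       -- or when the month on its own is over 30.
       COUNT(CASE WHEN (SumOverPeriod > 30 AND PrevSum <= 30) OR sales > 30 THEN 1 END)
           AS Count_Agent_more_than_30_sales
FROM flagged
GROUP BY MonthEnd
ORDER BY MonthEnd;
```

Traced by hand against the sample data, this gives counts of 1, 2 and 2 for January, February and March, in line with the expected output in the question.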

How to get last value with condition in postgreSQL?

I have a table in postgres with three columns, one with a group, one with a date and the last with a value.
grp mydate value
A 2021-01-27 5
A 2021-01-23 10
A 2021-01-15 15
B 2021-01-26 7
B 2021-01-24 12
B 2021-01-15 17
I would like to create a view with a sequence of dates and, for each date, the most recent value in the table for each group.
grp mydate value
A 2021-01-27 5
A 2021-01-26 10
A 2021-01-25 10
A 2021-01-24 10
A 2021-01-23 10
A 2021-01-22 15
A 2021-01-21 15
A 2021-01-20 15
A 2021-01-19 15
A 2021-01-18 15
A 2021-01-17 15
A 2021-01-16 15
A 2021-01-15 15
B 2021-01-27 7
B 2021-01-26 7
B 2021-01-25 12
B 2021-01-24 12
B 2021-01-23 17
B 2021-01-22 17
B 2021-01-21 17
B 2021-01-20 17
B 2021-01-19 17
B 2021-01-18 17
B 2021-01-17 17
B 2021-01-16 17
B 2021-01-15 17
SQL code to generate the table:
CREATE TABLE foo (
grp char(1),
mydate date,
value integer);
INSERT INTO foo VALUES
('A', '2021-01-27', 5),
('A', '2021-01-23', 10),
('A', '2021-01-15', 15),
('B', '2021-01-26', 7),
('B', '2021-01-24', 12),
('B', '2021-01-15', 17)
I have so far managed to generate a visualization with the sequence of dates joined with the distinct groups, but I am failing to get the most recent value.
SELECT DISTINCT(foo.grp), (date_trunc('day'::text, dd.dd))::date AS mydate
FROM foo, generate_series((( SELECT min(foo.mydate) AS min
FROM foo))::timestamp without time zone, (now())::timestamp without time zone, '1 day'::interval) dd(dd)

SELECT
grp,
gs::date as mydate,
value
FROM (
SELECT
*,
COALESCE( -- 2
lead(mydate) OVER (PARTITION BY grp ORDER BY mydate) - 1, -- 1
mydate
) as prev_date
FROM foo
) s,
-- prev_date is never earlier than mydate, so the step must be positive;
-- a negative step would return no rows for such a range
generate_series(mydate, prev_date, interval '1 day') as gs -- 3
ORDER BY grp, mydate DESC -- 4
1. The lead() window function shifts the next value of an ordered group (= partition) into the current row. The group is already defined, and the order is the date. This can be used to create the required date range. Since you don't want the same date twice (as the end of one range and the beginning of the next), the end date is the next record's date minus one day.
2. This handles the last record of each group: it has no following record, so lead() yields NULL. To avoid this, COALESCE() falls back to the record's own date.
3. Now you can create a date range between the current date and the end date using generate_series().
4. Finally, you can generate the required order.
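The expected output in the question also carries group B forward to 2021-01-27, one day past its last record. A possible alternative, sketched here under the assumption that the calendar should run from the earliest to the latest date in foo (the question's own attempt uses now() as the upper bound), is to build the calendar first and look up the latest value per group and day with a LATERAL subquery. It needs PostgreSQL 9.3 or later, and the aliases g, d, f and v are illustrative.

```sql
SELECT g.grp,
       d.day::date AS mydate,
       v.value
FROM (SELECT DISTINCT grp FROM foo) AS g
CROSS JOIN generate_series(
        (SELECT min(mydate) FROM foo)::timestamp,
        (SELECT max(mydate) FROM foo)::timestamp,   -- swap in now() to run up to today
        interval '1 day') AS d(day)
CROSS JOIN LATERAL (
    -- Most recent record of this group on or before the calendar day.
    SELECT f.value
    FROM foo AS f
    WHERE f.grp = g.grp
      AND f.mydate <= d.day::date
    ORDER BY f.mydate DESC
    LIMIT 1
) AS v
ORDER BY g.grp, mydate DESC;
```

Days before a group's first record produce no row, because the LATERAL subquery returns nothing for them.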

One SQL Stored Procedure to get cut off date of two different cut off date format

I have one system that reads from two client databases. The two clients use different cut-off date formats:
1) Client A: Every month on the 15th. Example: 15-12-2016.
2) Client B: Every first day of the month. Example: 1-1-2017.
The cut-off dates are stored in the table as below:
Now I need a single query to retrieve the current month's cut off date of the client. For instance, today is 15-2-2017, so the expected cut off date for both clients should be as below:
1) Client A: 15-1-2017
2) Client B: 1-2-2017
How can I accomplish this in a single Stored Procedure? For client B, I can always get the first day of the month. But this can't apply to client A since their cut off is last month's date.

Something like this might be what you are looking for:
DECLARE @DummyClient TABLE(ID INT IDENTITY,ClientName VARCHAR(100));
DECLARE @DummyDates TABLE(ClientID INT,YourDate DATE);
INSERT INTO @DummyClient VALUES
('A'),('B');
INSERT INTO @DummyDates VALUES
(1,{d'2016-12-15'}),(2,{d'2017-01-01'});
-- Month offsets 0 to 24
WITH Numbers AS
( SELECT Nr
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),
(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24)) AS v(Nr)
)
,ClientExt AS
(
-- Earliest known cut-off date per client
SELECT c.*
,MIN(d.YourDate) AS MinDate
FROM @DummyClient AS c
INNER JOIN @DummyDates AS d ON c.ID=d.ClientID
GROUP BY c.ID,c.ClientName
)
-- One row per client and month offset
SELECT ID,ClientName,D
FROM ClientExt
CROSS APPLY(SELECT DATEADD(MONTH,Numbers.Nr,MinDate)
FROM Numbers) AS RunningDate(D)
ORDER BY ID,D;
The result
ID Cl Date
1 A 2016-12-15
1 A 2017-01-15
1 A 2017-02-15
1 A 2017-03-15
1 A 2017-04-15
1 A 2017-05-15
1 A 2017-06-15
1 A 2017-07-15
1 A 2017-08-15
1 A 2017-09-15
1 A 2017-10-15
1 A 2017-11-15
1 A 2017-12-15
1 A 2018-01-15
1 A 2018-02-15
1 A 2018-03-15
1 A 2018-04-15
1 A 2018-05-15
1 A 2018-06-15
1 A 2018-07-15
1 A 2018-08-15
1 A 2018-09-15
1 A 2018-10-15
1 A 2018-11-15
1 A 2018-12-15
2 B 2017-01-01
2 B 2017-02-01
2 B 2017-03-01
2 B 2017-04-01
2 B 2017-05-01
2 B 2017-06-01
2 B 2017-07-01
2 B 2017-08-01
2 B 2017-09-01
2 B 2017-10-01
2 B 2017-11-01
2 B 2017-12-01
2 B 2018-01-01
2 B 2018-02-01
2 B 2018-03-01
2 B 2018-04-01
2 B 2018-05-01
2 B 2018-06-01
2 B 2018-07-01
2 B 2018-08-01
2 B 2018-09-01
2 B 2018-10-01
2 B 2018-11-01
2 B 2018-12-01
2 B 2019-01-01
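To get from such a series to a single "current" cut-off date per client, one option is to compute it directly from any known cut-off date. This is a sketch under a stated assumption: the current cut-off is taken to be the latest monthly cut-off strictly before today, which reproduces both examples in the question (15-1-2017 for client A and 1-2-2017 for client B when today is 15-2-2017). The table variable @CutOffs and its columns are hypothetical stand-ins for the real cut-off table, which the question does not show.

```sql
-- "Today" is pinned to the question's example; in practice use CAST(GETDATE() AS DATE).
DECLARE @Today DATE = '20170215';

-- Hypothetical stand-in for the real cut-off table: one known cut-off date per client.
DECLARE @CutOffs TABLE (ClientName VARCHAR(100), KnownCutOff DATE);
INSERT INTO @CutOffs VALUES ('A', '20161215'), ('B', '20170101');

SELECT c.ClientName,
       CASE WHEN DATEADD(MONTH, m.n, c.KnownCutOff) < @Today
            THEN DATEADD(MONTH, m.n, c.KnownCutOff)       -- this month's cut-off has already passed
            ELSE DATEADD(MONTH, m.n - 1, c.KnownCutOff)   -- otherwise fall back to last month's
       END AS CurrentCutOff
FROM @CutOffs AS c
-- Number of month boundaries between the known cut-off and today.
CROSS APPLY (SELECT DATEDIFF(MONTH, c.KnownCutOff, @Today) AS n) AS m;
```

The query could be wrapped in a stored procedure that takes the client and the reference date as parameters. If a cut-off day can be 29, 30 or 31, keep in mind that DATEADD clamps to the last day of shorter months.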

PostgreSQL - GROUP subsequent rows

I have a table which contains some records ordered by date.
I want to get the start and end dates for each consecutive group (grouped by some criterion, e.g. position).
Example:
create table tbl (id int, date timestamp without time zone,
position int);
insert into tbl values
( 1 , '2013-12-01', 1),
( 2 , '2013-12-02', 2),
( 3 , '2013-12-03', 2),
( 4 , '2013-12-04', 2),
( 5 , '2013-12-05', 3),
( 6 , '2013-12-06', 3),
( 7 , '2013-12-07', 2),
( 8 , '2013-12-08', 2)
Of course if I simply group by position I will get wrong result as positions could be the same for different groups:
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tbl GROUP BY POSITION
I will get:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
But I want:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 04 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 07 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
I found a solution for MySQL which uses variables, and I could port it, but I believe PostgreSQL can do it in a smarter way using its advanced features such as window functions.
I'm using PostgreSQL 9.2

There is probably a more elegant solution, but try this:
WITH tmp_tbl AS (
SELECT *,
-- 1 when the position differs from the previous row (start of a new run), else 0
CASE WHEN lag(position,1) OVER(ORDER BY id)=position
THEN 0
ELSE 1
END AS is_start
FROM tbl
)
, tmp_tbl2 AS(
SELECT position,date,
-- running count of run starts, so every row of the same run gets the same number
-- (this also holds for runs longer than two rows)
sum(is_start) OVER(ORDER BY id) AS grouping_col
FROM tmp_tbl
)
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tmp_tbl2 GROUP BY grouping_col,position
ORDER BY grouping_col

There are some complete answers on Stack Overflow for that, so I'll not repeat them in detail, but the principle is to group the records according to the difference between:
The row number when ordered by the date (via a window function)
The difference between the dates and a static date of reference.
So you have a series such as:
rownum datediff diff
1 1 0 ^
2 2 0 | first group
3 3 0 v
4 5 1 ^
5 6 1 | second group
6 7 1 v
7 9 2 ^
8 10 2 v third group
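For the table in the question, where the rows are contiguous but the same position can come back later, a common variant of this idea uses the difference between two row numbers instead of a date difference. A minimal sketch, assuming the tbl table from the question; the alias s and the column name grp are illustrative. Window functions are available in PostgreSQL 9.2.

```sql
SELECT position,
       min(date) AS min,
       max(date) AS max
FROM (
    SELECT position,
           date,
           -- Overall row number minus the row number within the same position:
           -- the difference stays constant while a run of equal positions lasts.
           row_number() OVER (ORDER BY date)
         - row_number() OVER (PARTITION BY position ORDER BY date) AS grp
    FROM tbl
) AS s
GROUP BY position, grp
ORDER BY min(date);
```

On the sample data the differences are 0 for the first row, 1 for ids 2 to 4, 4 for ids 5 and 6, and 3 for ids 7 and 8, which yields the four groups the question asks for.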

How do I add totals/subtotals to a set of results without grouping the row data?

I'm constructing a SQL query for a business report. I need to have both subtotals (grouped by file number) and grand totals on the report.
I'm entering unknown SQL territory, so this is a bit of a first attempt. The query I made is almost working. The only problem is that the entries are being grouped -- I need them separated in the report.
Here is my sample data:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3B Mar 28/10 1 3
3B Mar 28/10 5 10
When I run this query
SELECT
CASE
WHEN (GROUPING(FileNumber) = 1) THEN NULL
ELSE FileNumber
END AS FileNumber,
CASE
WHEN (GROUPING(Date) = 1) THEN NULL
ELSE Date
END AS Date,
SUM(Cost) AS Cost,
SUM(Charge) AS Charge
FROM SubtotalTesting
GROUP BY FileNumber, Date WITH ROLLUP
ORDER BY
(CASE WHEN FileNumber IS NULL THEN 1 ELSE 0 END), -- Put NULLs after data
FileNumber,
(CASE WHEN Date IS NULL THEN 1 ELSE 0 END), -- Put NULLs after data
Date
I get the following:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3 NULL 11 25
3B Mar 28/10 6 13 <--
3B NULL 6 13
NULL NULL 17 38
What I want is:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3 NULL 11 25
3B Mar 28/10 1 3 <--
3B Mar 28/10 5 10 <--
3B NULL 6 13
NULL NULL 17 38
I can clearly see why the entries are being grouped, but I have no idea how to separate them while still returning the subtotals and grand total.
I'm a bit green when it comes to doing advanced SQL queries like this, so if I'm taking the wrong approach to the problem by using WITH ROLLUP, please suggest some preferred alternatives -- you don't have to write the whole query for me, I just need some direction. Thanks!

WITH SubtotalTesting (FileNumber, Date, Cost, Charge) AS
(
-- 'yyyymmdd' literals are read the same way under any DATEFORMAT or language setting
SELECT '3', CAST('20091222' AS DATETIME), 5, 10
UNION ALL
SELECT '3', '20100113', 6, 15
UNION ALL
SELECT '3B', '20100328', 1, 3
UNION ALL
SELECT '3B', '20100328', 5, 10
),
q AS (
SELECT *,
-- rn makes every detail row unique, so ROLLUP cannot merge the two 3B rows
ROW_NUMBER() OVER (ORDER BY FileNumber) AS rn
FROM SubtotalTesting
)
SELECT rn,
CASE
WHEN (GROUPING(FileNumber) = 1) THEN NULL
ELSE FileNumber
END AS FileNumber,
CASE
WHEN (GROUPING(Date) = 1) THEN NULL
ELSE Date
END AS Date,
SUM(Cost) AS Cost,
SUM(Charge) AS Charge
FROM q
GROUP BY
FileNumber, Date, rn WITH ROLLUP
HAVING GROUPING(rn) <= GROUPING(Date) -- drops the intermediate (FileNumber, Date) subtotal rows
ORDER BY
(CASE WHEN FileNumber IS NULL THEN 1 ELSE 0 END),
FileNumber,
(CASE WHEN Date IS NULL THEN 1 ELSE 0 END),
Date
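The same idea can also be written with GROUPING SETS, which lists only the grouping levels that are wanted, so no HAVING filter is needed. This is a sketch rather than a drop-in replacement: it assumes SQL Server 2008 or later and reuses the sample rows from the question inline, with rn as the same illustrative tie-breaker column.

```sql
WITH SubtotalTesting (FileNumber, [Date], Cost, Charge) AS (
    SELECT '3',  CAST('20091222' AS DATE), 5, 10 UNION ALL
    SELECT '3',  '20100113', 6, 15 UNION ALL
    SELECT '3B', '20100328', 1, 3  UNION ALL
    SELECT '3B', '20100328', 5, 10
),
q AS (
    -- rn keeps otherwise identical detail rows apart.
    SELECT *, ROW_NUMBER() OVER (ORDER BY FileNumber, [Date]) AS rn
    FROM SubtotalTesting
)
SELECT FileNumber, [Date], SUM(Cost) AS Cost, SUM(Charge) AS Charge
FROM q
GROUP BY GROUPING SETS (
    (FileNumber, [Date], rn),   -- detail rows
    (FileNumber),               -- subtotal per file number
    ()                          -- grand total
)
ORDER BY GROUPING(FileNumber), FileNumber, GROUPING([Date]), [Date], rn;
```

Ordering by GROUPING() rather than by IS NULL checks keeps the total rows last even if the data itself contains NULLs.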