Pivot Table By One Column And Multiple Aggregates - tsql

I prepared the following report. It displays three columns (AccruedInterest, Tip & CardRevenue) by month for a given year. Now I want to "rotate" the result so that the StartDate values turn into 12 columns.
I have this:
I need this:
I have tried pivoting the table, but I need multiple columns to be aggregated, as you can see.

You have to unpivot your data and then pivot.
Note: I put my own values into your table to make each row unique, so you can verify the data is correct.
--Create YourTable
SELECT * INTO YourTable
FROM
(
SELECT CAST('2015-01-01' AS DATE) StartDate,
607.834 AS AccruedInterest,
1 AS Tip,
3 AS CardRevenue
UNION ALL
SELECT CAST('2015-02-01' AS DATE) StartDate,
643.298 AS AccruedInterest,
16.8325 AS Tip,
5 AS CardRevenue
) A;
GO
--This pivots your data
SELECT *
FROM
(
--This unpivots your data using cross apply
SELECT col,val,StartDate
FROM YourTable
CROSS APPLY
(
SELECT 'AccruedInterest', CAST(AccruedInterest AS VARCHAR(100))
UNION ALL
SELECT 'Tip', CAST(Tip AS VARCHAR(100))
UNION ALL
SELECT 'CardRevenue', CAST(CardRevenue AS VARCHAR(100))
) A(col,val)
) B
PIVOT
(
MAX(val) FOR startdate IN([2015-01-01],[2015-02-01])
) pvt
Results:
col              2015-01-01  2015-02-01
AccruedInterest  607.834     643.298
CardRevenue      3           5
Tip              1.0000      16.8325
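Since the goal is one column per month of a given year, the IN list simply grows to all 12 month-start dates. An untested sketch of the same pivot widened to 2015, assuming StartDate always holds month-start values:
--Same unpivot/pivot, extended to 12 month columns
SELECT *
FROM
(
SELECT col,val,StartDate
FROM YourTable
CROSS APPLY
(
SELECT 'AccruedInterest', CAST(AccruedInterest AS VARCHAR(100))
UNION ALL
SELECT 'Tip', CAST(Tip AS VARCHAR(100))
UNION ALL
SELECT 'CardRevenue', CAST(CardRevenue AS VARCHAR(100))
) A(col,val)
) B
PIVOT
(
MAX(val) FOR StartDate IN([2015-01-01],[2015-02-01],[2015-03-01],[2015-04-01],
[2015-05-01],[2015-06-01],[2015-07-01],[2015-08-01],[2015-09-01],
[2015-10-01],[2015-11-01],[2015-12-01])
) pvt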


Find the next oldest row in Redshift

I have a table called user_activity in Redshift that has department, user_id, activity_type, activity_id, activity_date.
I'd like to produce a daily report of how many days it has been since each user's last event (of any type). Using CROSS APPLY (SQL Server) or LATERAL JOIN (Postgres 9+), I'd do something like...
SELECT d.date, a.last_activity_date
FROM date_table d
CROSS JOIN (
SELECT DISTINCT user_id FROM activity_table
) u
CROSS APPLY (
SELECT TOP 1 activity_date as last_activity_date
FROM activity_table
WHERE user_id = u.user_id AND activity_date <= d.date
ORDER BY activity_date DESC
) a
For now, I've written it like the query below, but it is a bit slow and I'm afraid it will only get slower.
with user_activity as (
select distinct activity_date, user_id from activity_table
)
select
d.date, u.user_id,
max(u.activity_date) as last_activity_date
from date_table d
inner join user_activity u on u.activity_date <= d.date
where d.date between '2020-01-01' and current_date
group by 1, 2
Can someone suggest a good alternative for my needs, or a Redshift substitute for CROSS APPLY / LATERAL JOIN?
As you are seeing, cross joins and inequality joins slow down as your data grows, and they are generally not the approach you want in Redshift. This is due to the blow-up in intermediate row counts that this type of join produces when applied to the large tables that are typical in Redshift.
You want to use window functions to perform this type of analysis, but you will need to step back and rethink how you structure the SQL. A MAX(activity_date) window function, partitioned by user_id, ordered by date, and with a frame clause of all preceding rows, will find the most recent activity as of any given date.
Now this will produce rows only for the user_ids and dates that exist in the data table, and it looks like you want one row for each date for each user_id, right? To do this you need to UNION in a framework of rows that has one row per user_id per date, ahead of the window function. Use NULLs for the other columns so that the column lists match, and keep the calendar date in a column separate from activity_date. Now all dates for all user_ids will be in the source, and the window function will give you the result you want.
You may also ask 'how is this better than the joins?' Well, in the joins you are replicating all the data records by the number of dates, which can get really big. In this approach you just have the original data records plus one row per user_id per date (which is the size of your output), and it doesn't blow up as the number of records per user_id grows.
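To make that concrete, here is a minimal untested sketch of the MAX-window approach just described, using the asker's names (date_table, activity_table); the LAG-based rewrite further down is a refinement of the same idea that looks strictly before the current date.
-- framework rows (one per user per date, null activity_date) unioned
-- with the real activity rows; the MAX window over all preceding rows
-- then carries the latest activity_date forward to every date
select cal_date, user_id,
max(max_activity_date) over (partition by user_id order by cal_date rows unbounded preceding) as last_activity_date
from (
-- collapse to one row per user_id per date first
select cal_date, user_id, max(activity_date) as max_activity_date
from (
select d.date as cal_date, u.user_id, null::date as activity_date
from date_table d
cross join (select distinct user_id from activity_table) u
union all
select activity_date, user_id, activity_date
from activity_table
) combined
group by cal_date, user_id
) deduped
order by user_id, cal_date;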
--- Request to modify the asker's code, per comments made on their approach ---
Your code is definitely on the right track as you have removed the massive inequality join of your original. I made 2 comments about it. The first is that I believe you need GROUP BY user_id, date to prevent multiple rows per user_id per date that would result if there are records for the same user_id on a single date with differing activity_types. This is a simple oversight.
The second is to state that I intended for you to use UNION ALL, not LEFT JOIN, in combining the actual data and the user_id/date framework. Your approach works fine but I have found that unioning with very large amounts of data is generally faster than joining but you do need to make sure the columns match up. Either way we end up with a data segment with 3 columns - 2 date columns, one with NULLs for framework rows, and 1 user_id. Your approach is fine and the difference in performance is likely very small unless you have huge tables.
Since you asked for a rewrite, here it is with both changes. (NOTE: my laptop is in the shop so I don’t have ready access to Redshift at the moment and this SQL is untested. If the intent is not clear from this and you need me to debug it will be delayed by a few days. I’m keeping your setup methods and SQL structure.)
with date_table as (
select '2000-01-01'::date as date
union all
select '2000-01-02'::date
union all
select '2000-01-03'::date
union all
select '2000-01-04'::date
union all
select '2000-01-05'::date
union all
select '2000-01-06'::date
),
users as (
select 1 as user_id
union all
select 2
union all
select 3
),
user_activity as (
select 1 as user_id, '2000-01-01'::date as activity_date
union all
select 1 as user_id, '2000-01-04'::date as activity_date
union all
select 3 as user_id, '2000-01-03'::date as activity_date
union all
select 1 as user_id, '2000-01-05'::date as activity_date
union all
select 1 as user_id, '2000-01-06'::date as activity_date
),
user_dates as (
select d.date, u.user_id
from date_table d
cross join users u
),
user_date_activity as (
select cal_date, user_id,
lag(max(activity_date), 1) ignore nulls over (partition by user_id order by cal_date) as last_activity_date
from (
-- framework rows (null activity_date) unioned with the real activity rows
select user_id, date as cal_date, null::date as activity_date from user_dates
union all
select user_id, activity_date as cal_date, activity_date from user_activity
) combined
group by user_id, cal_date
)
select * from user_date_activity
order by user_id, cal_date
This was my query based on Bill's answer.
with date_table as (
select '2000-01-01'::date as date
union all
select '2000-01-02'::date
union all
select '2000-01-03'::date
union all
select '2000-01-04'::date
union all
select '2000-01-05'::date
union all
select '2000-01-06'::date
),
users as (
select 1 as user_id
union all
select 2
union all
select 3
),
user_activity as (
select 1 as user_id, '2000-01-01'::date as activity_date
union all
select 1 as user_id, '2000-01-04'::date as activity_date
union all
select 3 as user_id, '2000-01-03'::date as activity_date
union all
select 1 as user_id, '2000-01-05'::date as activity_date
union all
select 1 as user_id, '2000-01-06'::date as activity_date
),
user_dates as (
select d.date, u.user_id
from date_table d
cross join users u
),
user_date_activity as (
select ud.date, ud.user_id,
lag(ua.activity_date, 1) ignore nulls over (partition by ud.user_id order by ud.date) as last_activity_date
from user_dates ud
left join user_activity ua on ud.date = ua.activity_date and ud.user_id = ua.user_id
)
select * from user_date_activity
order by user_id, date

Select Date and Count, Group By Date -- How to show Dates with NULL Counts?

SELECT
CAST(c.DT AS DATE) AS 'Date'
, COUNT(p.PatternID) AS 'Count'
FROM CalendarMain c
LEFT OUTER JOIN Pattern p
ON c.DT = p.PatternDate
INNER JOIN Result r
ON p.PatternID = r.PatternID
INNER JOIN Detail d
ON p.PatternID = d.PatternID
WHERE r.Type = 7
AND d.Panel = 501
AND CAST(c.DT AS DATE)
BETWEEN '20190101' AND '20190201'
GROUP BY CAST(c.DT AS DATE)
ORDER BY CAST(c.DT AS DATE)
The query above isn't working for me. It still skips days where the count is zero for its c.DT.
c.DT and p.PatternDate are both DateTime, although c.DT can't be NULL; it is actually the PK for the table. It is populated with a DateTime for every single day from 2015 to 2049, so the records for those days exist.
Another weird thing I noticed is that nothing returns at all when I join on c.DT = p.PatternDate without a CAST or CONVERT to a Date style. Not sure why, when they are both DateTimes.
There are a few things to talk about here. First, the reason days get skipped: the INNER JOINs to Result and Detail (together with the WHERE conditions on their columns) discard the NULL rows your LEFT JOIN produces, effectively turning it back into an inner join. And the reason your join on c.DT = p.PatternDate returns nothing without a CAST is that PatternDate carries a time-of-day component, so it never equals the midnight values in c.DT. Beyond that, it's not clear what you're actually trying to count. If it's the number of "patterns" per day for the month of Jan 2019, then:
Your BETWEEN will also count any activity occurring on 1 Feb,
It looks like one pattern could have multiple results, potentially causing a miscount
It looks like one pattern could have multiple details, potentially causing a miscount
If one pattern has say 3 eligible results, and also 4 details, you'll get the cross product of them. Your count will be 12.
I'm going to assume:
you only want the distinct number of patterns, regardless of the number of details and results.
You only want January's activity
--Set up some dummy data
DROP TABLE IF EXISTS #CalendarMain
SELECT cast('20190101' as datetime) as DT
INTO #CalendarMain
UNION ALL SELECT '20190102' as DT
UNION ALL SELECT '20190103' as DT
UNION ALL SELECT '20190104' as DT
UNION ALL SELECT '20190105' as DT --etc etc
;
DROP TABLE IF EXISTS #Pattern
SELECT cast('1'as int) as PatternID
,cast('20190101 13:00' as datetime) as PatternDate
INTO #Pattern
UNION ALL SELECT 2,'20190101 14:00'
UNION ALL SELECT 3,'20190101 15:00'
UNION ALL SELECT 4,'20190104 11:00'
UNION ALL SELECT 5,'20190104 14:00'
;
DROP TABLE IF EXISTS #Result
SELECT cast(100 as int) as ResultID
,cast(1 as int) as PatternID
,cast(7 as int) as [Type]
INTO #Result
UNION ALL SELECT 101,1,7
UNION ALL SELECT 102,1,8
UNION ALL SELECT 103,1,9
UNION ALL SELECT 104,2,8
UNION ALL SELECT 105,2,7
UNION ALL SELECT 106,3,7
UNION ALL SELECT 107,3,8
UNION ALL SELECT 108,4,7
UNION ALL SELECT 109,5,7
UNION ALL SELECT 110,5,8
;
DROP TABLE IF EXISTS #Detail
SELECT cast(201 as int) as DetailID
,cast(1 as int) as PatternID
,cast(501 as int) as Panel
INTO #Detail
UNION ALL SELECT 202,1,502
UNION ALL SELECT 203,1,503
UNION ALL SELECT 204,1,502
UNION ALL SELECT 205,1,502
UNION ALL SELECT 206,1,502
UNION ALL SELECT 207,2,502
UNION ALL SELECT 208,2,503
UNION ALL SELECT 209,2,502
UNION ALL SELECT 210,4,502
UNION ALL SELECT 211,4,501
;
-- create some variables
DECLARE @start_date as date = '20190101';
DECLARE @end_date as date = '20190201'; --I assume this is an exclusive end date
SELECT cal.DT
,isnull(patterns.[count],0) as [Count]
FROM #CalendarMain cal
LEFT JOIN ( SELECT cast(p.PatternDate as date) as PatternDate
,COUNT(DISTINCT p.PatternID) as [Count]
FROM #Pattern p
JOIN #Result r ON p.PatternID = r.PatternID
JOIN #Detail d ON p.PatternID = d.PatternID
WHERE r.[Type] = 7
and d.Panel = 501
GROUP BY cast(p.PatternDate as date)
) patterns ON cal.DT = patterns.patternDate
WHERE cal.DT >= @start_date
and cal.DT < @end_date --Your code would have included 1 Feb, which I assume was unintentional.
ORDER BY cal.DT
;

TSQL Get Item Price History from Item Price Changes

I have a table of item price changes, and I want to use it to create a table of item prices for each date (between the item's launch and end dates).
Here's some code to create the data:
declare @Item table (item_id int, item_launch_date date, item_end_date date);
insert into @Item Values (1,'2001-01-01','2016-01-01'), (2,'2001-01-01','2016-01-01')
declare @ItemPriceChanges table (item_id int, item_price money, my_date date);
INSERT INTO @ItemPriceChanges VALUES (1, 123.45, '2001-01-01'), (1, 345.34, '2001-01-03'), (2, 34.34, '2001-01-01'), (2, 23.56, '2005-01-01'), (2, 56.45, '2016-05-01'), (2, 45.45, '2017-05-01');
What I'd like to see is something like this:-
item_id  date        price
-------  ----------  ------
1        2001-01-01  123.45
1        2001-01-02  123.45
1        2001-01-03  345.34
1        2001-01-04  345.34
etc.
2        2001-01-01  34.34
2        2001-01-02  34.34
etc.
Any suggestions on how to write the query?
I'm using SQL Server 2016.
Added:
I also have a calendar table called "dim_calendar" with one row per day. I had hoped to use a windowing function, but the nearest I can find is lead() and it doesn't do what I thought it would do:-
select
i.item_id,
c.day_date,
ipc.item_price as item_price_change,
lead(item_price,1,NULL) over (partition by i.item_id ORDER BY c.day_date) as item_price
from dim_calendar c
inner join #Item i
on c.day_date between i.item_launch_date and i.item_end_date
left join #ItemPriceChanges ipc
on i.item_id=ipc.item_id
and ipc.my_date=c.day_date
order by
i.item_id,
c.day_date;
Thanks
I wrote this prior to your edit. Note that your sample output suggests that an item can have two prices on the day of a price change. The following assumes that an item can only have one price on a price-change day, and that it is the new price.
declare @Item table (item_id int, item_launch_date date, item_end_date date);
insert into @Item Values (1,'2001-01-01','2016-01-01'), (2,'2001-01-01','2016-01-01')
declare @ItemPriceChange table (item_id int, item_price money, my_date date);
INSERT INTO @ItemPriceChange VALUES (1, 123.45, '2001-01-01'), (1, 345.34, '2001-01-03'), (2, 34.34, '2001-01-01'), (2, 23.56, '2005-01-01'), (2, 56.45, '2016-05-01'), (2, 45.45, '2017-05-01');
SELECT * FROM @ItemPriceChange
-- We need a table variable holding all possible date points for the output
DECLARE @DatePointList table (DatePoint date);
DECLARE @StartDatePoint date = '01-Jan-2001';
DECLARE @MaxDatePoint date = GETDATE();
DECLARE @DatePoint date = @StartDatePoint;
WHILE @DatePoint <= @MaxDatePoint BEGIN
INSERT INTO @DatePointList (DatePoint)
SELECT @DatePoint;
SET @DatePoint = DATEADD(DAY,1,@DatePoint);
END;
END;
-- We can use a CTE to sequence the price changes
WITH ItemPriceChange AS (
SELECT item_id, item_price, my_date, ROW_NUMBER () OVER (PARTITION BY Item_id ORDER BY my_date ASC) AS SeqNo
FROM @ItemPriceChange
)
-- With the price changes sequenced, we can derive from and to dates for each price and use a join to the table of date points to produce the output. Also, use an inner join back to #item to only return rows for dates that are within the start/end date of the item
SELECT ItemPriceDate.item_id, DatePointList.DatePoint, ItemPriceDate.item_price
FROM @DatePointList AS DatePointList
INNER JOIN (
SELECT ItemPriceChange.item_id, ItemPriceChange.item_price, ItemPriceChange.my_date AS from_date, ISNULL(ItemPriceChange_Next.my_date,@MaxDatePoint) AS to_date
FROM ItemPriceChange
LEFT OUTER JOIN ItemPriceChange AS ItemPriceChange_Next ON ItemPriceChange_Next.item_id = ItemPriceChange.item_id AND ItemPriceChange.SeqNo = ItemPriceChange_Next.SeqNo - 1
) AS ItemPriceDate ON DatePointList.DatePoint >= ItemPriceDate.from_date AND DatePointList.DatePoint < ItemPriceDate.to_date
INNER JOIN @Item AS item ON item.item_id = ItemPriceDate.item_id AND DatePointList.DatePoint BETWEEN item.item_launch_date AND item.item_end_date
ORDER BY ItemPriceDate.item_id, DatePointList.DatePoint;
@AlphaStarOne Perfect! I've modified it to use a windowing function rather than a self-join, but what you've suggested works. Here's my implementation of that in case anyone else needs it:
SELECT
ipd.item_id,
dc.day_date,
ipd.item_price
FROM dim_calendar dc
INNER JOIN (
SELECT
item_id,
item_price,
my_date AS from_date,
isnull(lead(my_date,1,NULL) over (partition by item_id ORDER BY my_date),getdate()) as to_date
FROM @ItemPriceChange ipc1
) AS ipd
ON dc.day_date >= ipd.from_date
AND dc.day_date < ipd.to_date
INNER JOIN @Item AS i
ON i.item_id = ipd.item_id
AND dc.day_date BETWEEN i.item_launch_date AND i.item_end_date
ORDER BY
ipd.item_id,
dc.day_date;

Grouping consecutive dates in PostgreSQL

I have two tables which I need to combine, as sometimes some dates are found in table A and not in table B, and vice versa. My desired result is that overlapping or consecutive date ranges be merged into one.
I'm using PostgreSQL.
Table A
id startdate enddate
--------------------------
101 12/28/2013 12/31/2013
Table B
id startdate enddate
--------------------------
101 12/15/2013 12/15/2013
101 12/16/2013 12/16/2013
101 12/28/2013 12/28/2013
101 12/29/2013 12/31/2013
Desired Result
id startdate enddate
-------------------------
101 12/15/2013 12/16/2013
101 12/28/2013 12/31/2013
Right. I have a query that I think works. It certainly works on the sample records you provided. It uses a recursive CTE.
First, you need to merge the two tables. Next, use a recursive CTE to get the sequences of overlapping dates. Finally, get the start and end dates, and join back to the "merged" table to get the id.
with recursive allrecords as -- this merges the input tables. Add a unique row identifier
(
select *, row_number() over (ORDER BY startdate) as rowid from
(select * from table1
UNION
select * from table2) a
),
path as ( -- the recursive CTE. This gets the sequences
select rowid as parent,rowid,startdate,enddate from allrecords a
union
select p.parent,b.rowid,b.startdate,b.enddate from allrecords b join path p on (p.enddate + interval '1 day')>=b.startdate and p.startdate <= b.startdate
)
SELECT id,g.startdate,g.enddate FROM -- outer query to get the id
-- inner query to get the start and end of each sequence
(select parent,min(startdate) as startdate, max(enddate) as enddate from
(
select *, row_number() OVER (partition by rowid order by parent,startdate) as row_number from path
) a
where row_number = 1 -- We only want the first occurrence of each record
group by parent)g
INNER JOIN allrecords a on a.rowid = parent
The fragment below does what you intend (but it will probably be very slow). The problem is that detecting (non-)overlapping date ranges is impossible with standard range operators, since a range could be split into two parts.
So, my code does the following:
split the dateranges from table_A into atomic records, with one date per record
[the same for table_b]
full outer join these two tables (we are only interested in A_not_in_B and B_not_in_A), remembering which wing of the outer join each row came from.
re-aggregate the resulting records into date ranges.
-- EXPLAIN ANALYZE
--
WITH RECURSIVE ranges AS (
-- Chop up the a-table into atomic date units
WITH ar AS (
SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date AS thedate
, 'A'::text AS which
, a.id
FROM a
)
-- Same for the b-table
, br AS (
SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date AS thedate
, 'B'::text AS which
, b.id
FROM b
)
-- combine the two sets, retaining a_not_in_b plus b_not_in_a
, moments AS (
SELECT COALESCE(ar.id,br.id) AS id
, COALESCE(ar.which, br.which) AS which
, COALESCE(ar.thedate, br.thedate) AS thedate
FROM ar
FULL JOIN br ON br.id = ar.id AND br.thedate = ar.thedate
WHERE ar.id IS NULL OR br.id IS NULL
)
-- use a recursive CTE to re-aggregate the atomic moments into ranges
SELECT m0.id, m0.which
, m0.thedate AS startdate
, m0.thedate AS enddate
FROM moments m0
WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id AND nx.which = m0.which
AND nx.thedate = m0.thedate -1
)
UNION ALL
SELECT rr.id, rr.which
, rr.startdate AS startdate
, m1.thedate AS enddate
FROM ranges rr
JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
)
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
-- suppress partial subassemblies
WHERE nx.id = ra.id AND nx.which = ra.which
AND nx.startdate = ra.startdate
AND nx.enddate > ra.enddate
)
;
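A side note: the re-aggregation performed by the recursive part can also be done without recursion, via the classic gaps-and-islands trick. An untested sketch, assuming the same ar/br/moments CTEs as above: consecutive dates per (id, which) share the same value of thedate minus their row number, which labels each island of days.
SELECT id, which
     , MIN(thedate) AS startdate
     , MAX(thedate) AS enddate
FROM (
    SELECT m.id, m.which, m.thedate
         -- same date-minus-rownumber value == same island of consecutive days
         , m.thedate - (ROW_NUMBER() OVER (PARTITION BY m.id, m.which ORDER BY m.thedate))::int AS grp
    FROM moments m
) s
GROUP BY id, which, grp
ORDER BY id, startdate;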

How to transpose all rows into columns with particular column as header in sqlserver

I want to transpose each row of the table into columns, making one particular column the header for the transposed table. Please help me out with this; any simple example would be fine.
Use PIVOT:
select
--userID, Stage1, Stage2, Stage3, Stage4
*
from (
select '1' as UserID,'1' as Stage,'1-1-2013' as [Date] union all
select '1','2','2-1-2013' union all
select '2','1','1-1-2013' union all
select '1','3','5-1-2013' union all
select '2','2','3-1-2013' union all
select '3','1','6-1-2013' union all
select '3','2','8-1-2013' union all
select '1','4','10-1-2013' union all
select '3','3','12-1-2013'
) x
pivot (
max([Date]) for Stage in ([1],[2],[3],[4],[5],[6])
) p
More about this topic:
http://blog.sqlauthority.com/2008/06/07/sql-server-pivot-and-unpivot-table-examples/
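If the Stage values aren't known ahead of time (the usual case when one column supplies the headers), the IN list can be built dynamically. An untested sketch, assuming the sample rows above live in a real table named dbo.UserStages (UserID, Stage, [Date]):
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);
-- build a bracketed, comma-separated list of the distinct Stage values, e.g. [1],[2],[3]
SELECT @cols = STUFF((
    SELECT DISTINCT ',' + QUOTENAME(Stage)
    FROM dbo.UserStages
    FOR XML PATH('')
), 1, 1, '');
SET @sql = N'
SELECT *
FROM dbo.UserStages
PIVOT (MAX([Date]) FOR Stage IN (' + @cols + ')) p;';
EXEC sp_executesql @sql;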