Redshift - Many Columns to Rows (Unpivot) - amazon-redshift

I have a Redshift table with 30 dimension fields and more than 150 measure fields.
To make good use of this data in a visualization tool (Tableau), I need to unpivot the measure columns into a single measure-value column plus one dimension column that names the measure.
Short example:
Date       Country  Order   Banana  Apple  Orange  Kiwi  Lemon
1-10-2018  Belgium  XYZ789  14      0      10      16    7
1-10-2018  Germany  ABC123  10      15     3       15    3
2-10-2018  Belgium  KLM456  9       9      7       1     7
Result:
Date       Country  Order   Measure_Name  Measure_Value
1-10-2018  Belgium  XYZ789  Banana        14
1-10-2018  Belgium  XYZ789  Apple         0
1-10-2018  Belgium  XYZ789  Orange        10
1-10-2018  Belgium  XYZ789  Kiwi          16
1-10-2018  Belgium  XYZ789  Lemon         7
1-10-2018  Germany  ABC123  Banana        10
1-10-2018  Germany  ABC123  Apple         15
1-10-2018  Germany  ABC123  Orange        3
1-10-2018  Germany  ABC123  Kiwi          15
1-10-2018  Germany  ABC123  Lemon         3
2-10-2018  Belgium  KLM456  Banana        9
2-10-2018  Belgium  KLM456  Apple         9
2-10-2018  Belgium  KLM456  Orange        7
2-10-2018  Belgium  KLM456  Kiwi          1
2-10-2018  Belgium  KLM456  Lemon         7
I know the UNION ALL solution and have tried it, but my table has millions of rows, and with more than 150 columns to unpivot the query is too heavy for this approach (the SQL alone is more than 8k lines long).
Do you have any ideas that could help?
Thanks a lot.

If you wrote this in an imperative style, you would generate several rows from each input row, possibly with something like flatMap (or its equivalent in your programming language). To generate rows in SQL, you use a JOIN.
You can solve this by CROSS JOINing your table with another table that has one row per column to unpivot, then using a CASE expression to pick the matching column's value for each generated row:
CREATE TABLE t (
    "Date" date,
    "Country" varchar,
    "Order" varchar,
    "Banana" varchar,
    "Apple" varchar,
    "Orange" varchar,
    "Kiwi" varchar,
    "Lemon" varchar
);
INSERT INTO t VALUES ('1-10-2018', 'Belgium', 'XYZ789', '14', '0', '10', '16', '7');
INSERT INTO t VALUES ('1-10-2018', 'Germany', 'ABC123', '10', '15', '3', '15', '3');
INSERT INTO t VALUES ('2-10-2018', 'Belgium', 'KLM456', '9', '9', '7', '1', '7');
WITH
-- one row per column to unpivot
cols as (
    select 'Banana' as c
    union all
    select 'Apple' as c
    union all
    select 'Orange' as c
    union all
    select 'Kiwi' as c
    union all
    select 'Lemon' as c
)
select
    "Date",
    "Country",
    "Order",
    c "Fruit Type",
    -- pick the value of the column named by c
    CASE c
        WHEN 'Banana' THEN "Banana"
        WHEN 'Apple' THEN "Apple"
        WHEN 'Orange' THEN "Orange"
        WHEN 'Kiwi' THEN "Kiwi"
        WHEN 'Lemon' THEN "Lemon"
        ELSE NULL
    END as "Amount Ordered"
from t cross join cols;
https://www.db-fiddle.com/f/kojuPAjpS5twCKXSPVqYyP/3
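With 150 measure columns, writing the cols list and the CASE branches by hand is tedious, so the query text itself can be generated. The following is a minimal sketch, not a definitive implementation: the helper names (quote_ident, quote_literal, build_unpivot_sql) and the output column names measure_name and measure_value are hypothetical, and an in-memory SQLite database stands in for Redshift only to check that the generated statement runs.

```python
import sqlite3


def quote_ident(name):
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text):
    """Single-quote a string literal, escaping embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_unpivot_sql(table, id_cols, measure_cols):
    """Build the CROSS JOIN + CASE unpivot query for the given columns."""
    cols_cte = "\n    union all\n    ".join(
        f"select {quote_literal(c)} as c" for c in measure_cols
    )
    whens = "\n        ".join(
        f"when {quote_literal(c)} then {quote_ident(c)}" for c in measure_cols
    )
    ids = ", ".join(quote_ident(c) for c in id_cols)
    return (
        f"with cols as (\n    {cols_cte}\n)\n"
        f"select {ids}, c as measure_name,\n"
        f"    case c\n        {whens}\n    end as measure_value\n"
        f"from {quote_ident(table)} cross join cols"
    )


# Stand-in data: the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    'create table t ("Date" text, "Country" text, "Order" text, '
    '"Banana" int, "Apple" int, "Orange" int, "Kiwi" int, "Lemon" int)'
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("1-10-2018", "Belgium", "XYZ789", 14, 0, 10, 16, 7),
        ("1-10-2018", "Germany", "ABC123", 10, 15, 3, 15, 3),
        ("2-10-2018", "Belgium", "KLM456", 9, 9, 7, 1, 7),
    ],
)

sql = build_unpivot_sql(
    "t",
    ["Date", "Country", "Order"],
    ["Banana", "Apple", "Orange", "Kiwi", "Lemon"],
)
rows = conn.execute(sql).fetchall()  # one output row per (input row, measure)
for row in rows[:5]:
    print(row)
```

On a real table, the list of measure columns could come from the database catalog rather than being typed in; that lookup is left out here because it depends on your setup.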

Given that you have 150 columns to transpose, I do not think it's feasible to do it in SQL alone. I had almost exactly the same scenario and used Python to solve it. The pseudo-code and explanation are in this question:
Redshift. How can we transpose (dynamically) a table from columns to rows?

Related

PostgreSQL: Fill values for null rows based on rows for which we do have values

I have the following table:
   country  year  rank
1  Austria  2019  1
2  Austria  2018  NA
3  Austria  2017  NA
4  Austria  2016  NA
5  Spain    2019  2
6  Spain    2018  NA
7  Spain    2017  NA
8  Spain    2016  NA
9  Belgium  2019  3
10 Belgium  2018  NA
11 Belgium  2017  NA
12 Belgium  2016  NA
I want to fill in the NA values for 2018, 2017 and 2016 for each country with the value for 2019 (which we have).
I want the output table to look like this:
   country  year  rank
1  Austria  2019  1
2  Austria  2018  1
3  Austria  2017  1
4  Austria  2016  1
5  Spain    2019  2
6  Spain    2018  2
7  Spain    2017  2
8  Spain    2016  2
9  Belgium  2019  3
10 Belgium  2018  3
11 Belgium  2017  3
12 Belgium  2016  3
I do not know where to start with this. I typically work with R but am now working on a platform that uses PostgreSQL. I could do this in R, but thought it would be worthwhile to figure out how it is done in Postgres.
Any help with this would be greatly appreciated. Thank you.
Use an UPDATE with a join to find the non-NULL rank value for each country:
UPDATE yourTable AS t1
SET "rank" = t2.max_rank
FROM
(
    SELECT country, MAX("rank") AS max_rank
    FROM yourTable
    GROUP BY country
) t2
WHERE t2.country = t1.country
    -- AND t1.year IN (2016, 2017, 2018)
;
Add the commented-out portion of the WHERE clause if you really only want to target certain years (your example seems to imply that you want to backfill all missing data).
If you just want to view your data in the format of the output, then use MAX as an analytic function:
SELECT country, year, MAX("rank") OVER (PARTITION BY country) AS "rank"
FROM yourTable
ORDER BY country, year DESC;
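Both queries above rely on MAX skipping NULLs, so the one known rank per country is also that country's maximum. Here is a small sketch of that behaviour, using an in-memory SQLite database as a stand-in for PostgreSQL and a subset of the question's rows. It uses the grouped subquery from the UPDATE in a plain SELECT, which is an assumption made so that it also runs on older SQLite builds without window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table yourTable (country text, year int, "rank" int)')
conn.executemany(
    "insert into yourTable values (?, ?, ?)",
    [
        ("Austria", 2019, 1), ("Austria", 2018, None),
        ("Spain", 2019, 2), ("Spain", 2018, None),
        ("Belgium", 2019, 3), ("Belgium", 2018, None),
    ],
)

# MAX ignores NULLs, so max_rank is the single known rank of each country.
filled = conn.execute(
    """
    select t.country, t.year, m.max_rank
    from yourTable t
    join (
        select country, max("rank") as max_rank
        from yourTable
        group by country
    ) m on m.country = t.country
    order by t.country, t.year desc
    """
).fetchall()

for row in filled:
    print(row)
```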
If you just want the output, try this:
with cte as (
    -- latest row per country, which carries the known rank
    select distinct on (country) * from test
    order by country, year desc
)
select
    t1.id, t1.country, t1.year, t2.rank
from test t1 left join cte t2 on t1.country = t2.country
If you want to update your table, try this:
with cte as (
    select distinct on (country) * from test
    order by country, year desc
)
update test set rank = cte.rank from cte
where test.country = cte.country

Pivoting while grouping in postgres

I've been using crosstab in postgres to pivot a table, but am now needing to add in a grouping and I'm not sure if that's possible.
I'm starting with results like this:
Date      Account#  Type    Count
---------------------------------
2020/1/1  100       Red     5
2020/1/1  100       Blue    3
2020/1/1  100       Yellow  7
2020/1/2  100       Red     2
2020/1/2  100       Yellow  9
2020/1/1  101       Red     4
2020/1/1  101       Blue    7
2020/1/1  101       Yellow  3
2020/1/2  101       Red     8
2020/1/2  101       Blue    6
2020/1/2  101       Yellow  4
And I'd like to pivot it like this, where there's a row for each combination of date and account #:
Date      Account#  Red  Blue  Yellow
-------------------------------------
2020/1/1  100       5    3     7
2020/1/2  100       2    0     9
2020/1/1  101       4    7     3
2020/1/2  101       8    6     4
The code I've written returns the error "The provided SQL must return 3 columns: rowid, category, and values", which makes sense per my understanding of crosstab.
SELECT *
FROM crosstab(
    'SELECT date, account_number, type, count
     FROM table
     ORDER BY 2,1,3'
) AS ct (date timestamp, account_number varchar, Red bigint, Blue bigint, Yellow bigint);
(I wrote the dates in a simplified format in the example tables but they are timestamps)
Is there a different way I can manipulate the first table to look like the second? Thank you!
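You can do conditional aggregation. The coalesce turns a missing combination (such as Blue for account 100 on 2020/1/2) into 0 instead of NULL, as in your expected output:
select
    date,
    account_number,
    coalesce(sum(cnt) filter (where type = 'Red'), 0) red,
    coalesce(sum(cnt) filter (where type = 'Blue'), 0) blue,
    coalesce(sum(cnt) filter (where type = 'Yellow'), 0) yellow
from mytable
group by date, account_number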
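To see the conditional aggregation at work on the question's data for account 100, here is a small sketch run against an in-memory SQLite database. The table and column names (mytable, account_number, cnt) are placeholders, and SUM(CASE ... END) is swapped in for the FILTER clause because older SQLite builds lack FILTER; for a sum the two forms give the same totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (date text, account_number text, type text, cnt int)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        ("2020/1/1", "100", "Red", 5),
        ("2020/1/1", "100", "Blue", 3),
        ("2020/1/1", "100", "Yellow", 7),
        ("2020/1/2", "100", "Red", 2),
        ("2020/1/2", "100", "Yellow", 9),  # no Blue row on this date
    ],
)

# Each output column sums only the rows of its own type; the ELSE 0 branch
# makes a missing type show up as 0 rather than NULL.
pivoted = conn.execute(
    """
    select date, account_number,
           sum(case when type = 'Red' then cnt else 0 end) as red,
           sum(case when type = 'Blue' then cnt else 0 end) as blue,
           sum(case when type = 'Yellow' then cnt else 0 end) as yellow
    from mytable
    group by date, account_number
    order by account_number, date
    """
).fetchall()

for row in pivoted:
    print(row)
```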

Full Outer Joins In PostgreSql [duplicate]

I've created a table students with the columns student_id (primary key), student_name and gender.
I have another table, gender, which consists of gender_id and gender.
The gender column in students refers to gender_id in the gender table.
The table data looks like this:
Student table
STUDENT_ID  STUDENT_NAME  GENDER
1           Ajith         1
2           Alan          1
3           Ann           2
4           Alexa         2
5           Amith         1
6           Nisha         2
7           Rathan        1
8           Rebecca       2
9           asdf          null
10          asd           null
11          dbss          null
Gender Table
GENDER_ID  GENDER
1          Male
2          Female
3          Others
My query and its result
SELECT S.STUDENT_NAME,
G.GENDER
FROM STUDENTS S
FULL OUTER JOIN GENDER G ON G.GENDER_ID = S.GENDER
The result has 12 rows, including the Others value from the gender table:
STUDENT_ID  STUDENT_NAME  GENDER
1           Ajith         Male
2           Alan          Male
3           Ann           Female
4           Alexa         Female
5           Amith         Male
6           Nisha         Female
7           Rathan        Male
8           Rebecca       Female
                          Others
9           asdf
10          asd
11          dbss
I'm trying to restrict a particular student_id:
SELECT S.STUDENT_ID,
S.STUDENT_NAME,
G.GENDER
FROM STUDENTS S
FULL OUTER JOIN GENDER G ON G.GENDER_ID = S.GENDER
WHERE S.STUDENT_ID <> 11;
Now the total number of rows is reduced to 10:
STUDENT_ID  STUDENT_NAME  GENDER
1           Ajith         Male
2           Alan          Male
3           Ann           Female
4           Alexa         Female
5           Amith         Male
6           Nisha         Female
7           Rathan        Male
8           Rebecca       Female
9           asdf
10          asd
Why has the row with the Others value disappeared from the second query's result?
I'm trying to find the cause of this.
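That's because NULL <> 11 is not TRUE, but NULL, and only rows where the condition is TRUE are included in the result. The Others row has no matching student, so its s.student_id is NULL.
You'd have to write something like
WHERE s.student_id IS DISTINCT FROM 11
Your second select query returns only the rows where student_id is known to be different (<>) from 11; the Others row has no student_id at all, so it is filtered out.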
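The three-valued logic behind this can be checked directly. The sketch below uses an in-memory SQLite database as a stand-in; SQLite spells the null-safe comparison IS NOT, whereas PostgreSQL writes it IS DISTINCT FROM. The table joined is hypothetical and holds three rows shaped like the output of the full outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A comparison with NULL yields NULL (None in Python), not TRUE or FALSE.
plain = conn.execute("select null <> 11").fetchone()[0]

# The null-safe form treats NULL as an ordinary, distinct value.
null_safe = conn.execute("select null is not 11").fetchone()[0]

# Hypothetical rows shaped like the join output: (student_id, gender).
conn.execute("create table joined (student_id int, gender text)")
conn.executemany(
    "insert into joined values (?, ?)",
    [(10, None), (11, None), (None, "Others")],
)

# WHERE keeps only rows whose condition is TRUE, so the NULL id is dropped.
kept_plain = conn.execute(
    "select student_id, gender from joined where student_id <> 11"
).fetchall()

# With the null-safe comparison the Others row survives.
kept_safe = conn.execute(
    "select student_id, gender from joined where student_id is not 11"
).fetchall()

print(plain, null_safe)
print(kept_plain)
print(kept_safe)
```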

T-SQL pulling multiple columns of data from a single column field

I am trying to pull three columns of data from one field. For argument's sake, say I have a table where a single field holds the following data about a car:
Color,
Model,
Year.
The data is itemized: ID4 is Color, ID5 is Model and ID6 is Year. I can pull one data set with no problem using a filter, e.g. Filter = 4, 5 or 6. But when I try to pull several at once, I get only the headers and no data at all.
Assuming you are using SQL Server 2005+, and your question really is "how do you break one column in a table into multiple named columns based on another field in the same table", here is a simple example patterned after your question.
Given this dataset:
-- multi-row VALUES requires SQL Server 2008+; on 2005 use one INSERT per row
declare @tbl table (id int, tag char(3), data varchar(255))
insert into @tbl values
    (1, 'ID4', 'Red'), (1, 'ID5', 'Toyota'), (1, 'ID6', '1999'),
    (2, 'ID4', 'Blue'), (2, 'ID5', 'Honda'), (2, 'ID6', '2000'),
    (3, 'ID4', 'Green'), (3, 'ID5', 'Nissan'), (3, 'ID6', '2004'),
    (4, 'ID4', 'Red'), (4, 'ID5', 'Nissan'), (4, 'ID6', '1990'),
    (5, 'ID4', 'Black'), (5, 'ID5', 'Toyota'), (5, 'ID6', '2002')
A simple select statement returns this data:
select * from @tbl
id  tag  data
1   ID4  Red
1   ID5  Toyota
1   ID6  1999
2   ID4  Blue
2   ID5  Honda
2   ID6  2000
3   ID4  Green
3   ID5  Nissan
3   ID6  2004
4   ID4  Red
4   ID5  Nissan
4   ID6  1990
5   ID4  Black
5   ID5  Toyota
5   ID6  2002
This pivot query returns the data, one row per car, with Color, Model and Year as their own columns:
select id, [ID4] as 'Color', [ID5] as 'Model', [ID6] as 'Year'
from (select id, tag, data from @tbl) as p
pivot (max(data) for tag in ([ID4], [ID5], [ID6])) as pvt
order by pvt.id
This is how the output looks:
id  Color  Model   Year
1   Red    Toyota  1999
2   Blue   Honda   2000
3   Green  Nissan  2004
4   Red    Nissan  1990
5   Black  Toyota  2002
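What the pivot does can be sketched in plain Python: group the rows by id, and within each group keep the maximum data value per tag. This only illustrates the logic and is not how SQL Server executes PIVOT; the names tag_to_column, cars and pivoted are hypothetical, and the input is the first two cars from the dataset above.

```python
from collections import defaultdict

rows = [
    (1, "ID4", "Red"), (1, "ID5", "Toyota"), (1, "ID6", "1999"),
    (2, "ID4", "Blue"), (2, "ID5", "Honda"), (2, "ID6", "2000"),
]

# Which tag feeds which output column (the IN list of the pivot).
tag_to_column = {"ID4": "Color", "ID5": "Model", "ID6": "Year"}

cars = defaultdict(dict)
for car_id, tag, data in rows:
    car = cars[car_id]  # every id gets an output row, even with no known tags
    column = tag_to_column.get(tag)
    if column is None:
        continue  # tags outside the IN list contribute no column
    current = car.get(column)
    # max(data) per (id, tag): keep the larger value if a tag repeats.
    car[column] = data if current is None else max(current, data)

pivoted = [
    (car_id, car.get("Color"), car.get("Model"), car.get("Year"))
    for car_id, car in sorted(cars.items())
]

for row in pivoted:
    print(row)
```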

DateDiff Missing few records

I am using the datediff function
SELECT stName
,stId
,stDob --(varchar(15))
,stJoinDt --(datetime)
FROM student stu
WHERE
DATEDIFF(yy,stu.stDob,stu.stJoinDt) between 18 and 75
Since the between operator is not effective I have also changed the code to
SELECT stName
,stId
,stDob
,stJoinDt
FROM student stu
WHERE
DATEDIFF(yy,stu.stDob,stu.stJoinDt) >= 18
AND DATEDIFF(yy,stu.stDob,stu.stJoinDt) < 75
Is there any other effective way to write datediff to capture all the missing records?
The missing records are
stDob       stJoinDt
10/08/1925  2011-01-03
04/18/1935  2011-01-19
12/11/1928  2011-06-06
1/24/1927   2011-04-18
04/18/1918  2011-04-20
Those records are excluded, and correctly so: the year difference between stDob and stJoinDt is greater than 75 for each of them, and your condition keeps only rows where it is between 18 and 75:
with student as (
select 'Bob' as stName, 1 as stId, '10/08/1925' as stDob, '2011-01-03' as stJoinDt
union select 'Bob' as stName, 2 as stId, '04/18/1935', '2011-01-19'
union select 'Bob' as stName, 3 as stId, '12/11/1928', '2011-06-06'
union select 'Bob' as stName, 4 as stId, '1/24/1927 ', '2011-04-18'
union select 'Bob' as stName, 5 as stId, '04/18/1918', '2011-04-20'
)
SELECT stName
,stId
,stDob --(varchar(15))
,stJoinDt --(datetime)
,datediff(yy, stu.stDob, stu.stJoinDt) as DiffYears
FROM student stu
Output:
stName  stId  stDob       stJoinDt    DiffYears
Bob     1     10/08/1925  2011-01-03  86 (>75)
Bob     2     04/18/1935  2011-01-19  76 (>75)
Bob     3     12/11/1928  2011-06-06  83 (>75)
Bob     4     1/24/1927   2011-04-18  84 (>75)
Bob     5     04/18/1918  2011-04-20  93 (>75)
My guess is that you wanted to capture all records where the person is at least 18 years old. In that case, remove the upper bound of 75 from the filter:
WHERE
DATEDIFF(yy,stu.stDob,stu.stJoinDt) >= 18
-- STOP HERE
Technically, though, this does not compute age correctly, because DATEDIFF(yy, ...) only takes the difference of the year values and ignores day and month. For instance, a date of birth of 12/31/1990 and a join date of 1/1/2008 would register as 18 years even though the person is only 17 years, 1 day old. I would recommend instead using the solution provided in this question, which subtracts one year when the birthday has not yet occurred in the join year:
where
    (DATEDIFF(YY, stu.stDob, stu.stJoinDt) -
        CASE WHEN (
            (MONTH(stDob)*100 + DAY(stDob)) > (MONTH(stJoinDt)*100 + DAY(stJoinDt))
        ) THEN 1 ELSE 0 END
    ) >= 18
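The same correction can be written out in Python to check the arithmetic on the example dates. This is a sketch with hypothetical function names: year_boundaries_crossed mimics what DATEDIFF(yy, ...) returns, and full_years applies the month/day adjustment from the WHERE clause above.

```python
from datetime import date


def year_boundaries_crossed(dob, joined):
    """Mimic DATEDIFF(yy, dob, joined): compare only the year parts."""
    return joined.year - dob.year


def full_years(dob, joined):
    """Completed years: subtract one if the birthday falls later in the
    join year than the join date itself."""
    birthday_not_reached = (dob.month, dob.day) > (joined.month, joined.day)
    return joined.year - dob.year - (1 if birthday_not_reached else 0)


dob, joined = date(1990, 12, 31), date(2008, 1, 1)
naive_age = year_boundaries_crossed(dob, joined)
true_age = full_years(dob, joined)
print(naive_age, true_age)
```

Comparing (month, day) tuples is equivalent to the MONTH*100 + DAY comparison in the SQL, since both order dates within a year the same way.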