Make row data into column headers while grouping - amazon-redshift

I'm trying to group multiple rows of similar data and turn the differentiating row values into columns on Amazon Redshift. It's easier to explain with an example ->
Starting Table
+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------+------+
| x    | y    | A    | 123  |
| x    | y    | B    | 456  |
+------+------+------+------+
End result desired
+------+------+------+------+
| Col1 | Col2 | A    | B    |
+------+------+------+------+
| x    | y    | 123  | 456  |
+------+------+------+------+
Essentially, I'm grouping by Columns 1 and 2; the entries in Column 3 become the new column headers, and the entries in Column 4 become the values in those new columns.
Any help super appreciated!

There is no native functionality, but you could do something like:
SELECT
    COL1,
    COL2,
    MAX(CASE WHEN COL3 = 'A' THEN COL4 END) AS A,
    MAX(CASE WHEN COL3 = 'B' THEN COL4 END) AS B
FROM table
GROUP BY COL1, COL2
You effectively need to hard-code the column names. It's not possible to automatically define columns based on the data.
This is standard SQL - nothing specific to Amazon Redshift.
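If the set of values in COL3 changes often, one workaround is to generate the pivot statement from the data itself and run the generated text as a second step (from your client or a script). A minimal sketch using Redshift's LISTAGG, assuming the table is called my_table:

SELECT 'SELECT COL1, COL2, '
    || LISTAGG('MAX(CASE WHEN COL3 = ''' || COL3
               || ''' THEN COL4 END) AS "' || COL3 || '"', ', ')
       WITHIN GROUP (ORDER BY COL3)
    || ' FROM my_table GROUP BY COL1, COL2'
FROM (SELECT DISTINCT COL3 FROM my_table) AS d;

The columns are still fixed at the moment the generated statement runs; this only spares you from editing the query by hand when new COL3 values appear.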

Related

Parse text data in PostgreSQL

I've got a PostgreSQL database with one table of 2 text columns, storing data like this:
id | col1                      | col2
---+---------------------------+------------------------------------------------
 1 | value_1, value_2, value_3 | name_1(date_1), name_2(date_2), name_3(date_3)
 2 | value_4, value_5, value_6 | name_4(date_4), name_5(date_5), name_6(date_6)
I need to parse the rows into a new table like this:
id | col1    | col2   | col3
---+---------+--------+--------
 1 | value_1 | name_1 | date_1
 1 | value_2 | name_2 | date_2
...| ...     | ...    | ...
 2 | value_6 | name_6 | date_6
How might I do this?
step-by-step demo: db<>fiddle
SELECT
    id,
    u_col1 as col1,
    col2_matches[1] as col2,               -- 5
    col2_matches[2] as col3
FROM
    mytable,
    unnest(                                -- 3
        regexp_split_to_array(col1, ', '), -- 1
        regexp_split_to_array(col2, ', ')  -- 2
    ) as u (u_col1, u_col2),
    regexp_matches(u_col2, '(.+)\((.+)\)') as col2_matches  -- 4
1. Split the data of your first column into an array.
2. Split the data of your second column into an array of the form {a(a), b(b), c(c)}.
3. Transpose all array elements into their own records.
4. Split the elements of the form a(b) into an array of the form {a, b}.
5. Show the required columns. For col2 and col3, show the first or the second array element from step 4.
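The question asks for the parsed rows in a new table; the SELECT above can be wrapped in CREATE TABLE ... AS for that. A minimal sketch (the name parsed_table is only illustrative):

CREATE TABLE parsed_table AS
SELECT
    id,
    u_col1 as col1,
    col2_matches[1] as col2,
    col2_matches[2] as col3
FROM
    mytable,
    unnest(
        regexp_split_to_array(col1, ', '),
        regexp_split_to_array(col2, ', ')
    ) as u (u_col1, u_col2),
    regexp_matches(u_col2, '(.+)\((.+)\)') as col2_matches;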

How to get the minimum of three column values in PostgreSQL

The common function to get the minimum value of a column is min(column), but what I want is the minimum value within a row, based on the values of 3 columns. For example, using the following base table:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|    2 |    1 |    3 |
|   10 |    0 |    1 |
|   13 |   12 |    2 |
+------+------+------+
I want to query it as:
+-----------+
| min_value |
+-----------+
|         1 |
|         0 |
|         2 |
+-----------+
I found the following solution, but it is for another SQL dialect and I am not getting it to work in PostgreSQL:
select
    (
        select min(minCol)
        from (values (t.col1), (t.col2), (t.col3)) as minCol(minCol)
    ) as minCol
from t
I could write something using a CASE statement, but I would like to write a query like the one above for PostgreSQL. Is this possible?
You can use least() (and greatest() for the maximum):
select least(col1, col2, col3) as min_value
from the_table
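Applied to the sample table, this returns 1, 0 and 2 as desired. Note that least() in PostgreSQL ignores NULL arguments: it returns the smallest non-null value, and yields NULL only if all arguments are NULL.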

Selecting value for the latest two distinct columns

I am trying to write an SQL query which will return the latest data value for each of the distinct tags in my table.
Currently, I select the distinct values of the tag column and then iterate through them programmatically, ordering by timestamp and limiting to 1 to get the latest value for each tag. There can be any number of tags, and they are not always posted together (sometimes only tag 1 is posted; other times tags 1, 2, and 3 are).
Although it gives the expected outcome, this seems inefficient in a lot of ways, and because I don't have much SQL experience, it was so far the only way I found of performing the task...
+------+-----+-----------+------+
| name | tag | timestamp | data |
+------+-----+-----------+------+
| aa   |   1 |       566 | 4659 |
| ab   |   2 |       567 | 4879 |
| ac   |   3 |       568 | 1346 |
| ad   |   1 |       789 | 3164 |
| ae   |   2 |       789 | 1024 |
| af   |   3 |       790 | 3346 |
+------+-----+-----------+------+
Therefore the expected outcome is {3164, 1024, 3346}
Currently what I'm doing is:
"select distinct tag from table"
Then I store all the distinct tag values programmatically and iterate programmatically through these values using
"select data from table where '"+ tags[i] +"' in (tag) order by timestamp desc limit 1"
Thanks,
This comes close, but beware: if two rows with the same tag share the maximum timestamp, you will get duplicates in the result set.
select data
from table
join (
    select tag, max(timestamp) as maxtimestamp
    from table t1
    group by tag
) as latesttags
    on table.tag = latesttags.tag
    and table.timestamp = latesttags.maxtimestamp
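If your database supports window functions, a variant that sidesteps the duplicate problem is to rank the rows within each tag and keep only the newest one. A sketch, with my_table standing in for the real table name:

select data
from (
    select data,
           row_number() over (partition by tag order by timestamp desc) as rn
    from my_table
) ranked
where rn = 1;

When two rows of a tag share the maximum timestamp, exactly one of them is kept; which one is arbitrary unless the order by gets a tie-breaker.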

PostgreSQL: Split single row into multiple rows

I'm new to PostgreSQL. I'm getting the results below from a query, and now I need to split each single row to obtain multiple rows.
I have gone through the links below, but still couldn't manage it. Please help.
unpivot and PostgreSQL
How to split a row into multiple rows with a single query?
Current result
id,name,sub1code,sub1level,sub1hrs,sub2code,sub2level,sub2hrs,sub3code,sub3level,sub3hrs --continue till sub15
1,Silva,CHIN,L1,12,MATH,L2,20,AGRW,L2,35
2,Perera,MATH,L3,30,ENGL,L1,10,CHIN,L2,50
What we want
id,name,subcode,sublevel,subhrs
1,Silva,CHIN,L1,12
1,Silva,MATH,L2,20
1,Silva,AGRW,L2,35
2,Perera,MATH,L3,30
2,Perera,ENGL,L1,10
2,Perera,CHIN,L2,50
Use UNION ALL:
select id, 1 as "#", name, sub1code, sub1level, sub1hrs
from a_table
union all
select id, 2 as "#", name, sub2code, sub2level, sub2hrs
from a_table
union all
select id, 3 as "#", name, sub3code, sub3level, sub3hrs
from a_table
order by 1, 2;
 id | # | name   | sub1code | sub1level | sub1hrs
----+---+--------+----------+-----------+---------
  1 | 1 | Silva  | CHIN     | L1        | 12
  1 | 2 | Silva  | MATH     | L2        | 20
  1 | 3 | Silva  | AGRW     | L2        | 35
  2 | 1 | Perera | MATH     | L3        | 30
  2 | 2 | Perera | ENGL     | L1        | 10
  2 | 3 | Perera | CHIN     | L2        | 50
(6 rows)
The # column is not necessary if you want to get the result sorted by subcode or sublevel.
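Since the real table continues till sub15, the union version grows to 15 branches and scans the table 15 times. On PostgreSQL 9.3 or later, a sketch using a VALUES list in a LATERAL cross join produces the same rows in a single scan:

select t.id, t.name, v.subcode, v.sublevel, v.subhrs
from a_table t
cross join lateral (values
    (t.sub1code, t.sub1level, t.sub1hrs),
    (t.sub2code, t.sub2level, t.sub2hrs),
    (t.sub3code, t.sub3level, t.sub3hrs)  -- continue till sub15
) as v(subcode, sublevel, subhrs)
order by t.id;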
You should consider normalization of the model by splitting the data into two tables, e.g.:
create table students (
    id int primary key,
    name text
);

create table hours (
    id int primary key,
    student_id int references students(id),
    code text,
    level text,
    hrs int
);
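With that model in place, the desired output becomes a plain join:

select s.id, s.name, h.code as subcode, h.level as sublevel, h.hrs as subhrs
from students s
join hours h on h.student_id = s.id;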

Sum of the most recent non-null columns (window function with "ignore nulls")

I am using PostgreSQL 9.1.9.
In the project I am working on, some of the most recent records have null columns because that information was not available when those rows were created. I have a view that lists the sums over the rows belonging to the members of a group. As of right now, the view sums the most recent columns, which includes null values when those are the most recent values. For example,
table1
group_name | member
-----------+--------
group1     | Andy
group1     | Bob

table2
name | stat_date | col1 | col2 | col3
-----+-----------+------+------+------
Andy | 6/19/13   | null |    1 |    2
Andy | 6/18/13   |  100 |    3 |    5
Bob  | 6/19/13   |   50 |    9 |   12
Bob  | 6/18/13   |  111 |   31 |   51
-- creating view would be something like this...
create view v_grouped as
select table1.group_name, stat_date,
       sum(col1) as col1_sum, sum(col2) as col2_sum, sum(col3) as col3_sum
from table1
join table2 on table1.member = table2.name
group by table1.group_name, table2.stat_date;
The current view looks like this:
group_name | stat_date | col1_sum | col2_sum | col3_sum
-----------+-----------+----------+----------+----------
group1     | 6/19/13   |       50 |       10 |       14
group1     | 6/18/13   |      211 |       34 |       56
Instead of 50, 150 would be a closer representation of what the actual group total is, despite the lack of data for 6/19. So, I want an output of:
group_name | stat_date | col1_sum | col2_sum | col3_sum
-----------+-----------+----------+----------+----------
group1     | 6/19/13   |      150 |       10 |       14
group1     | 6/18/13   |      211 |       34 |       56
I've been looking at first_value() from the window functions as a possible solution. I found that Oracle's first_value() supports an IGNORE NULLS option which I believe will do what I want (http://psoug.org/definition/FIRST_VALUE.htm). According to the page I linked, about Oracle's first_value() function:
If the first value in the result set is NULL then the function returns NULL unless you specify IGNORE NULLS.
If you use the IGNORE NULLS parameter then FIRST_VALUE will return the first non-null value found in the result set. (If all values are null then it will return NULL.)
Example Syntax: FIRST_VALUE(expression [IGNORE NULLS]) OVER (analytic_clause)
But PostgreSQL's first_value() does not support such an option. Is there a way to do this in PostgreSQL? Thank you in advance!
You can use a custom aggregate as a Postgres variant of FIRST_VALUE(expression IGNORE NULLS), or build your own aggregate with the desired behavior.
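For illustration, a minimal sketch of such an aggregate (the names first_non_null and first_non_null_sfunc are made up here; nothing below is a built-in): the transition function keeps the first non-null value it sees, so ordering the input by stat_date descending yields the most recent non-null value per member.

-- transition function: keep the first non-null value encountered
CREATE FUNCTION first_non_null_sfunc(state anyelement, value anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE AS
$$ SELECT COALESCE($1, $2) $$;

CREATE AGGREGATE first_non_null(anyelement) (
    SFUNC = first_non_null_sfunc,
    STYPE = anyelement
);

-- usage: latest non-null col1 per member (ORDER BY inside an
-- aggregate call works on PostgreSQL 9.0+, so also on 9.1.9)
SELECT name, first_non_null(col1 ORDER BY stat_date DESC) AS latest_col1
FROM table2
GROUP BY name;

Summing latest_col1 over the members of a group then gives 100 + 50 = 150, the value shown in the desired output.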
Is this what you are trying to describe?
SELECT sum(col1), sum(col2), sum(col3) FROM table2 WHERE col1 IS NOT NULL
(although I omitted the join on table1; that is an exercise for the reader)