Parse text data in PostgreSQL - postgresql

I've got a PostgreSQL database, one table with 2 text columns, stored data like this:
id| col1 | col2 |
------------------------------------------------------------------------------|
1 | value_1, value_2, value_3 | name_1(date_1), name_2(date_2), name_3(date_3)|
2 | value_4, value_5, value_6 | name_4(date_4), name_5(date_5), name_6(date_6)|
I need to parse rows in a new table like this:
id | col1 | col2 | col3 |
1 | value_1 | name_1 | date_1 |
1 | value_2 | name_2 | date_2 |
...| ... | ... | ... |
2 | value_6 | name_6 | date_6 |
How might I do this?

step-by-step demo:db<>fiddle
SELECT
id,
u_col1 as col1,
col2_matches[1] as col2, -- 5
col2_matches[2] as col3
FROM
mytable,
unnest( -- 3
regexp_split_to_array(col1, ', '), -- 1
regexp_split_to_array(col2, ', ') -- 2
) as u (u_col1, u_col2),
regexp_matches(u_col2, '(.+)\((.+)\)') as col2_matches -- 4
Split the data of your first column into an array
Split the data of your second column into an array of form {a(a), b(b), c(c)}
Transpose all array elements into own records
Split the elements of form a(b) into an array of form {a,b}
Show required columns. For the col2 and col3 show the first or the second array element from step 4

Related

how to get the minor of three column values in postgresql

The common function to get the minor value of a column is min(column), but what I want to do is to get the minor value of a row, based on the values of 3 columns. For example, using the following base table:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 2 | 1 | 3 |
| 10 | 0 | 1 |
| 13 | 12 | 2 |
+------+------+------+
I want to query it as:
+-----------+
| min_value |
+-----------+
| 1 |
| 0 |
| 2 |
+-----------+
I found a solution as follows, but for SQL, not Postgresql. So I am not getting it to work in postgresql:
select
(
select min(minCol)
from (values (t.col1), (t.col2), (t.col3)) as minCol(minCol)
) as minCol
from t
I could write something using case statement but I would like to write a query like the above for postgresql. Is this possible?
You can use least() (and greatest() for the maximum)
select least(col1, col2, col3) as min_value
from the_table

Make row data into column headers while grouping

I'm trying to group up on a multiple rows of similar data and convert differentiated row data into columns on Amazon Redshift. Easier to explain with an example ->
Starting Table
+-------------------------------------------+
|**Col1** | **Col2** | **Col3** | **Col 4** |
| x | y | A | 123 |
| x | y | B | 456 |
+-------------------------------------------+
End result desired
+-------------------------------------------+
|**Col1** | **Col2** | **A** | **B** |
| x | y | 123 | 456 |
+-------------------------------------------+
Essentially grouping by Column 1 and 2, and the entries in Column 3 become the new column headers and the entries in Column 4 become the entries for the new columns.
Any help super appreciated!
There is no native functionality, but you could do something like:
SELECT
COL1,
COL2,
MAX(CASE WHEN COL3='A' THEN COL4 END) AS A,
MAX(CASE WHEN COL3='B' THEN COL4 END) AS B
FROM table
GROUP BY COL1, COL2
You effectively need to hard-code the column names. It's not possible to automatically define columns based on the data.
This is standard SQL - nothing specific to Amazon Redshift.

Postgresql Split single row to multiple rows

I'm new to postgresql. I'm getting below results from a query and now I need to split single row to obtain multiple rows.
I have gone through below links, but still couldn't manage it. Please help.
unpivot and PostgreSQL
How to split a row into multiple rows with a single query?
Current result
id,name,sub1code,sub1level,sub1hrs,sub2code,sub2level,sub2hrs,sub3code,sub3level,sub3hrs --continue till sub15
1,Silva,CHIN,L1,12,MATH,L2,20,AGRW,L2,35
2,Perera,MATH,L3,30,ENGL,L1,10,CHIN,L2,50
What we want
id,name,subcode,sublevel,subhrs
1,Silva,CHIN,L1,12
1,Silva,MATH,L2,20
1,Silva,AGRW,L2,35
2,Perera,MATH,L3,30
2,Perera,ENGL,L1,10
2,Perera,CHIN,L2,50
Use union:
select id, 1 as "#", name, sub1code, sub1level, sub1hrs
from a_table
union all
select id, 2 as "#", name, sub2code, sub2level, sub2hrs
from a_table
union all
select id, 3 as "#", name, sub3code, sub3level, sub3hrs
from a_table
order by 1, 2;
id | # | name | sub1code | sub1level | sub1hrs
----+---+--------+----------+-----------+---------
1 | 1 | Silva | CHIN | L1 | 12
1 | 2 | Silva | MATH | L2 | 20
1 | 3 | Silva | AGRW | L2 | 35
2 | 1 | Perera | MATH | L3 | 30
2 | 2 | Perera | ENGL | L1 | 10
2 | 3 | Perera | CHIN | L2 | 50
(6 rows)
The # column is not necessary if you want to get the result sorted by subcode or sublevel.
You should consider normalization of the model by splitting the data into two tables, e.g.:
create table students (
id int primary key,
name text);
create table hours (
id int primary key,
student_id int references students(id),
code text,
level text,
hrs int);

Postgres DISTINCT ON eqivalent in Hibernate Query Language

I need Postgres DISTINCT ON equivalent in HQL. For example consider the following.
SELECT DISTINCT ON (Col2) Col1, Col4 FROM tablename;
on table
Col1 | Col2 | Col3 | Col4
---------------------------------
AA1 | A | 2 | 1
AA2 | A | 4 | 2
BB1 | B | 2 | 3
BB2 | B | 5 | 4
Col2 will not be shown in the result as below
Col1 | Col4
------------
AA1 | 1
BB1 | 3
Can anyone give a solution in HQL. I need to use DISTINCT as it is part of a bigger query.
Sorry but I misread your question:
No, Hibernate does not support a DISTINCT ON query.
Here is possible duplicate of your question: Postgresql 'select distinct on' in hibernate

SQL group by distinct array

I have table1:
col1 (integer) | col2 (varchar[]) | col3 (integer)
----------------------------------------------------
1 | {A,B,C} | 2
1 | {A} | 5
1 | {A,B} | 1
2 | {A,B} | 2
2 | {A} | 3
2 | {B} | 1
I want summarize 'col3 ' with a GROUP BY 'col1 ' by keeping only DISTINCT values ​​from 'col3 '
Expected result below :
col1 (integer) | col2 (varchar[]) | col3 (integer)
----------------------------------------------------
1 | {A,B,C} | 8
2 | {A,B} | 6
I tried this :
SELECT col1, array_to_string(array_accum(col2), ','::text),sum(col3) FROM table1 GROUP BY col1
but the result is not the one expected :
col1 (integer) | col2 (varchar[]) | col3 (integer)
---------------------------------------------------------------
1 | {A,B,C,A,A,B} | 8
2 | {A,B,A,B} | 6
do you have any suggestion?
If the logic of which col2 you want is by the largest (like in your expected output is {A,B,C} & {A,B}.
SELECT col1, (SELECT sub.col2
FROM table1 sub
INNER JOIN table1 sub ON MAX(char_length(sub.col2)) = col2
WHERE sub.col1 = col1)
SUM(col3)
FROM table1
GROUP BY col1
SELECT
col1,
array_to_string(array_accum(col2), ','::text),
sum(col3)
FROM table1
GROUP BY col1;
but array_to_string concatenates array elements using supplied delimiter and optional null string.
You have to devise a different strategy like using array_dims(anyarray) to select the array with max elements or create a new aggregation function.
For this you could be interested in this answer:
eliminate duplicate array values in postgres