I have table1:
col1 (integer) | col2 (varchar[]) | col3 (integer)
----------------------------------------------------
1 | {A,B,C} | 2
1 | {A} | 5
1 | {A,B} | 1
2 | {A,B} | 2
2 | {A} | 3
2 | {B} | 1
I want to sum col3 with a GROUP BY on col1, keeping only the DISTINCT values from col2.
Expected result:
col1 (integer) | col2 (varchar[]) | col3 (integer)
----------------------------------------------------
1 | {A,B,C} | 8
2 | {A,B} | 6
I tried this:
SELECT col1, array_to_string(array_accum(col2), ','::text),sum(col3) FROM table1 GROUP BY col1
but the result is not the one expected:
col1 (integer) | col2 (varchar[]) | col3 (integer)
---------------------------------------------------------------
1 | {A,B,C,A,A,B} | 8
2 | {A,B,A,B} | 6
do you have any suggestion?
If the logic for which col2 to keep is "the largest array" (as in your expected output, {A,B,C} and {A,B}), you can pick it with a correlated subquery:
SELECT col1,
       (SELECT sub.col2
        FROM table1 sub
        WHERE sub.col1 = t.col1
        ORDER BY array_length(sub.col2, 1) DESC
        LIMIT 1) AS col2,
       SUM(col3)
FROM table1 t
GROUP BY col1
SELECT
col1,
array_to_string(array_accum(col2), ','::text),
sum(col3)
FROM table1
GROUP BY col1;
but array_to_string just concatenates the array elements using the supplied delimiter (and an optional null string); it does not remove duplicates.
You have to devise a different strategy, for example using array_dims(anyarray) (or array_length(anyarray, 1)) to pick the array with the most elements, or creating a new aggregate function.
For this you could be interested in this answer:
eliminate duplicate array values in postgres
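For instance, here is a minimal sketch of one such strategy, assuming stock PostgreSQL 8.4+ (where array_agg and unnest are built in) and the table1 from the question. It deduplicates col2 across the group with a correlated ARRAY(...) subquery and sums col3 as before:

```sql
-- Collect the distinct elements of every col2 array in the group,
-- then join them back into a single comma-separated string.
SELECT t1.col1,
       array_to_string(
           ARRAY(SELECT DISTINCT unnest(t2.col2)
                 FROM table1 t2
                 WHERE t2.col1 = t1.col1),
           ',') AS col2,
       SUM(t1.col3) AS col3
FROM table1 t1
GROUP BY t1.col1;
```

Note that DISTINCT here does not guarantee element order; add a sort in an outer query if the order of the elements matters.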
I've got a PostgreSQL database, one table with 2 text columns, stored data like this:
id| col1 | col2 |
------------------------------------------------------------------------------|
1 | value_1, value_2, value_3 | name_1(date_1), name_2(date_2), name_3(date_3)|
2 | value_4, value_5, value_6 | name_4(date_4), name_5(date_5), name_6(date_6)|
I need to parse rows in a new table like this:
id | col1 | col2 | col3 |
1 | value_1 | name_1 | date_1 |
1 | value_2 | name_2 | date_2 |
...| ... | ... | ... |
2 | value_6 | name_6 | date_6 |
How might I do this?
step-by-step demo:db<>fiddle
SELECT
id,
u_col1 as col1,
col2_matches[1] as col2, -- 5
col2_matches[2] as col3
FROM
mytable,
unnest( -- 3
regexp_split_to_array(col1, ', '), -- 1
regexp_split_to_array(col2, ', ') -- 2
) as u (u_col1, u_col2),
regexp_matches(u_col2, '(.+)\((.+)\)') as col2_matches -- 4
Split the data of your first column into an array
Split the data of your second column into an array of form {a(a), b(b), c(c)}
Transpose all array elements into their own records
Split the elements of form a(b) into an array of form {a,b}
Show the required columns. For col2 and col3, take the first and second array element from step 4
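If the data is as regular as the sample (no nested parentheses inside the name(date) elements), a lighter sketch with string_to_array and split_part should give the same result; mytable and the column names are taken from the question, and the two-argument unnest in the FROM clause requires PostgreSQL 9.4+:

```sql
-- Zip both comma-separated lists in parallel, then split each
-- "name(date)" element at the opening parenthesis.
SELECT id,
       u.c1                                 AS col1,
       split_part(u.c2, '(', 1)             AS col2,
       rtrim(split_part(u.c2, '(', 2), ')') AS col3
FROM mytable,
     unnest(string_to_array(col1, ', '),
            string_to_array(col2, ', ')) AS u(c1, c2);
```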
Suppose I have users stored as
select * from users_t where user_name like 'ABC%';
id user_name
1 ABC1
2 ABC2
.. ..
Now I need to loop through all the user_name values and make that number of INSERTs into a different table, RECALLS_T. All the other columns are hard-coded constants that I define.
Assume the following table, with a Sequence called RECALLS_T_ID_SEQ on the ID:
id created_by_user_name field1 field2
1 ABC1 Const1 Const2
2 ABC2 Const1 Const2
.. .. .. ..
How do I insert these in a Postgres loop?
ADDITIONAL QUESTION: Also, what if I need to insert X (say 5) recalls for each user entry? Suppose it is not a 1:1 mapping but 5:1, where 5 is a hard-coded loop count.
You can use the select in the insert statement:
insert into recalls_t (created_by_user_name, field1, field2)
select user_name, 'Const1', 'Const2'
from users_t
where user_name like 'ABC%';
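The set-based INSERT ... SELECT above is usually the better choice, but since the question literally asks for a loop: a DO block sketch (PostgreSQL 9.0+, PL/pgSQL) doing the same thing row by row would look like this:

```sql
DO $$
DECLARE
    u record;
BEGIN
    -- One INSERT per matching user; the sequence on id fires as usual.
    FOR u IN SELECT user_name FROM users_t WHERE user_name LIKE 'ABC%'
    LOOP
        INSERT INTO recalls_t (created_by_user_name, field1, field2)
        VALUES (u.user_name, 'Const1', 'Const2');
    END LOOP;
END $$;
```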
Use the function generate_series() to insert more than one row for each entry from users_t. I have added the column step to illustrate this:
insert into recalls_t (created_by_user_name, field1, field2, step)
select user_name, 'Const1', 'Const2', step
from users_t
cross join generate_series(1, 3) as step
where user_name like 'ABC%'
returning *
id | created_by_user_name | field1 | field2 | step
----+----------------------+--------+--------+------
1 | ABC1 | Const1 | Const2 | 1
2 | ABC2 | Const1 | Const2 | 1
3 | ABC1 | Const1 | Const2 | 2
4 | ABC2 | Const1 | Const2 | 2
5 | ABC1 | Const1 | Const2 | 3
6 | ABC2 | Const1 | Const2 | 3
(6 rows)
Live demo in Db<>fiddle.
I need Postgres DISTINCT ON equivalent in HQL. For example consider the following.
SELECT DISTINCT ON (Col2) Col1, Col4 FROM tablename;
on table
Col1 | Col2 | Col3 | Col4
---------------------------------
AA1 | A | 2 | 1
AA2 | A | 4 | 2
BB1 | B | 2 | 3
BB2 | B | 5 | 4
Col2 will not be shown in the result as below
Col1 | Col4
------------
AA1 | 1
BB1 | 3
Can anyone give a solution in HQL? I need DISTINCT because it is part of a bigger query.
Sorry but I misread your question:
No, Hibernate does not support a DISTINCT ON query.
Here is possible duplicate of your question: Postgresql 'select distinct on' in hibernate
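As a workaround you can emulate DISTINCT ON with a correlated subquery, which HQL does support. This sketch assumes "first row per Col2" means the smallest Col1 (adjust the MIN to whatever ordering you actually need):

```sql
-- Keep exactly one row per Col2: the one with the smallest Col1.
SELECT t.Col1, t.Col4
FROM tablename t
WHERE t.Col1 = (SELECT MIN(t2.Col1)
                FROM tablename t2
                WHERE t2.Col2 = t.Col2)
```

For the sample table this picks AA1 and BB1, matching the expected output.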
Actually, I got a good answer about a similar issue in the thread below, but I need another solution for a different data set.
How to get the latest 2 rows ( PostgreSQL )
The data set has historical data, and I just want to get sum(value) for each group at its latest gather_time.
The final result should be as following:
name | col1 | gather_time | sum
-------+------+---------------------+-----
first | 100 | 2016-01-01 23:12:49 | 6
first | 200 | 2016-01-01 23:11:13 | 4
However, with the query below I can only see the data for one group (first-100), meaning there is no row for the second group (first-200).
The thing is that I need one row per group, and the number of groups can vary.
select name,col1,gather_time,sum(value)
from testtable
group by name,col1,gather_time
order by gather_time desc
limit 2;
name | col1 | gather_time | sum
-------+------+---------------------+-----
first | 100 | 2016-01-01 23:12:49 | 6
first | 100 | 2016-01-01 23:11:19 | 6
(2 rows)
Can you advise me on how to accomplish this?
Data set
create table testtable
(
name varchar(30),
col1 varchar(30),
col2 varchar(30),
gather_time timestamp,
value integer
);
insert into testtable values('first','100','q1','2016-01-01 23:11:19',2);
insert into testtable values('first','100','q2','2016-01-01 23:11:19',2);
insert into testtable values('first','100','q3','2016-01-01 23:11:19',2);
insert into testtable values('first','200','t1','2016-01-01 23:11:13',2);
insert into testtable values('first','200','t2','2016-01-01 23:11:13',2);
insert into testtable values('first','100','q1','2016-01-01 23:11:11',2);
insert into testtable values('first','100','q1','2016-01-01 23:12:49',2);
insert into testtable values('first','100','q2','2016-01-01 23:12:49',2);
insert into testtable values('first','100','q3','2016-01-01 23:12:49',2);
select *
from testtable
order by name,col1,gather_time;
name | col1 | col2 | gather_time | value
-------+------+------+---------------------+-------
first | 100 | q1 | 2016-01-01 23:11:11 | 2
first | 100 | q2 | 2016-01-01 23:11:19 | 2
first | 100 | q3 | 2016-01-01 23:11:19 | 2
first | 100 | q1 | 2016-01-01 23:11:19 | 2
first | 100 | q3 | 2016-01-01 23:12:49 | 2
first | 100 | q1 | 2016-01-01 23:12:49 | 2
first | 100 | q2 | 2016-01-01 23:12:49 | 2
first | 200 | t2 | 2016-01-01 23:11:13 | 2
first | 200 | t1 | 2016-01-01 23:11:13 | 2
One option is to join your original table to a table containing only the records with the latest gather_time for each name, col1 group. Then you can take the sum of the value column for each group to get the result set you want.
SELECT t1.name, t1.col1, MAX(t1.gather_time) AS gather_time, SUM(t1.value) AS sum
FROM testtable t1 INNER JOIN
(
SELECT name, col1, col2, MAX(gather_time) AS maxTime
FROM testtable
GROUP BY name, col1, col2
) t2
ON t1.name = t2.name AND t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND
t1.gather_time = t2.maxTime
GROUP BY t1.name, t1.col1
If you wanted to use a subquery in the WHERE clause, as you attempted in your OP, to restrict to only records with the latest gather_time then you could try the following:
SELECT name, col1, gather_time, SUM(value) AS sum
FROM testtable t1
WHERE gather_time =
(
SELECT MAX(gather_time)
FROM testtable t2
WHERE t1.name = t2.name AND t1.col1 = t2.col1
)
GROUP BY name, col1, gather_time
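On PostgreSQL 8.4+ you could also rank the rows with a window function and keep only the latest gather_time per (name, col1) group; this is a sketch equivalent to the queries above:

```sql
-- Rank each row within its group by gather_time, newest first,
-- then sum only the top-ranked (latest) rows.
SELECT name, col1, gather_time, SUM(value) AS sum
FROM (
    SELECT *,
           RANK() OVER (PARTITION BY name, col1
                        ORDER BY gather_time DESC) AS rnk
    FROM testtable
) ranked
WHERE rnk = 1
GROUP BY name, col1, gather_time
ORDER BY gather_time DESC;
```

RANK() (rather than ROW_NUMBER()) keeps all rows that share the latest timestamp, which is what the sum needs here.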
I want to show all the rows of the table where col1 has duplicates.
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 3 | 1 | 1 |
| 4 | 0 | 0 |
+------+------+------+
The results should be:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 0 | 0 |
| 1 | 1 | 1 |
| 3 | 0 | 0 |
| 3 | 1 | 1 |
+------+------+------+
I've tried some queries with no luck, so here I am asking for your help.
Depending on your database version, you can use a window function:
select col1, col2, col3
from
(
select col1, col2, col3,
count(col1) over(partition by col1) cnt
from yourtable
) src
where cnt > 1
See SQL Fiddle with Demo
select t.col1, t.col2, t.col3
from mytable t join (select col1
from mytable
group by col1
having count(*) > 1) t2
on t.col1 = t2.col1
Let me add one more variant solution. If you have a pk column that has a UNIQUE or PRIMARY KEY constraint, you can use:
select col1, col2, col3
from <yourTable> t1
where exists
(select *
from <yourTable> t2
where t2.col1 = t1.col1
and t2.pk <> t1.pk
) ;
If the name of the table is T5 then use this:
SELECT COL1, COL2, COL3
FROM T5
WHERE COL1 IN
(
SELECT COL1
FROM T5
GROUP BY COL1
HAVING COUNT(COL1)>=2
)
I checked and the above should not use any nonstandard SQL. I am assuming that is the case for the others.
select col1, col2, col3
from <yourTable> t1
where exists
(select null
from <yourTable> t2
where t2.col1 = t1.col1
group by t2.col1
having count(*) > 1)
sqlFiddle
Guess I am too late... but how about a LEFT JOIN?
SQLFIDDLE DEMO
Query:
SELECT DISTINCT x.col1, x.col2, x.col3
FROM ab y
LEFT JOIN
ab x
ON y.col1=x.col1 and ( y.col2<> x.col2
OR x.col3<>y.col3 )
where not (x.col3 is null)
and not (x.col2 is null)
;
Results:
COL1 COL2 COL3
1 0 0
1 1 1
3 0 0
3 1 1