Postgres: Get multiple columns with GROUP BY

Table
select * from hello;
id | name
----+------
1 | abc
2 | xyz
3 | abc
4 | dfg
5 | abc
(5 rows)
Query
select name,count(*) from hello where name in ('abc', 'dfg') group by name;
name | count
------+-------
dfg | 1
abc | 3
(2 rows)
In the above query, I am trying to get the count of the rows whose name is in the tuple. However, I also want to get the ids along with the count for each name. Is there a way to achieve this? Thanks

If you want to return the "id" values, then you can use a window function:
select id, name, count(*) over(PARTITION BY name)
from hello
where name in ('abc', 'dfg');
This will return the id values along with the count of rows per name.
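To see the window-function answer run end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. It assumes SQLite 3.25 or newer, the version that added window functions. The count(*) OVER (PARTITION BY name) expression behaves the same way in both databases. The alias name_count is made up for this example.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumes SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hello (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO hello (id, name) VALUES (?, ?)",
    [(1, "abc"), (2, "xyz"), (3, "abc"), (4, "dfg"), (5, "abc")],
)

# count(*) OVER (PARTITION BY name) keeps every row and attaches its group size.
rows = conn.execute(
    """
    SELECT id, name, count(*) OVER (PARTITION BY name) AS name_count
    FROM hello
    WHERE name IN ('abc', 'dfg')
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```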

If you want to see all IDs for each name, you need to aggregate them:
select name, count(*), array_agg(id ORDER BY id) as ids
from hello
where name in ('abc', 'dfg')
group by name;
This returns something like this:
name | count | ids
-----+-------+--------
abc | 3 | {1,3,5}
dfg | 1 | {4}
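SQLite has no array_agg, so this sketch (again using Python's sqlite3 module as a stand-in, not Postgres itself) substitutes group_concat. That function returns a comma-separated string instead of an array. The ids are parsed and sorted in Python because group_concat does not guarantee element order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hello (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO hello (id, name) VALUES (?, ?)",
    [(1, "abc"), (2, "xyz"), (3, "abc"), (4, "dfg"), (5, "abc")],
)

# group_concat is SQLite's rough analogue of array_agg: it yields e.g. "1,3,5".
raw = conn.execute(
    """
    SELECT name, count(*), group_concat(id)
    FROM hello
    WHERE name IN ('abc', 'dfg')
    GROUP BY name
    ORDER BY name
    """
).fetchall()

# Turn each comma-separated string back into a sorted list of integer ids.
result = {name: (cnt, sorted(int(i) for i in ids.split(","))) for name, cnt, ids in raw}
print(result)
```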

Related

tsql - How to convert multiple rows and columns into one row

id | acct_num | name    | orderdt
1  | 1006A    | Joe Doe | 1/1/2021
2  | 1006A    | Joe Doe | 1/5/2021
Expected output:
id | acct_num | name    | orderdt  | id1 | acct_num1 | name1   | orderdt1
1  | 1006A    | Joe Doe | 1/1/2021 | 2   | 1006A     | Joe Doe | 1/5/2021
My query is the following:
Select id,
acct_num,
name,
orderdt
from order_tbl
where acct_num = '1006A'
and orderdt >= '1/1/2021'
If you always have one or two rows you could do it like this (I'm assuming the latest version of SQL Server because you said TSQL):
NOTE: If you have a known maximum (e.g. 4 rows per account), this solution can be extended by changing the modulus, adding more columns, and adding another join.
WITH order_table_numbered AS
(
SELECT ID, ACCT_NUM, NAME, ORDERDT,
ROW_NUMBER() OVER (PARTITION BY ACCT_NUM ORDER BY ORDERDT) AS RN
FROM order_tbl
)
SELECT f.id AS id, f.acct_num AS acct_num, f.name AS name, f.orderdt AS orderdt,
s.id AS id1, s.acct_num AS acct_num1, s.name AS name1, s.orderdt AS orderdt1
FROM order_table_numbered f
-- pair each odd-numbered row with the even-numbered row of the same account
LEFT JOIN order_table_numbered s ON f.ACCT_NUM = s.ACCT_NUM AND (s.RN % 2 = 0)
WHERE f.RN % 2 = 1
If you have an unknown number of rows I think you should solve this on the client OR convert the groups to XML -- the XML support in SQL Server is not bad.
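Here is a minimal sketch of the ROW_NUMBER pairing above. It runs through Python's sqlite3 module instead of SQL Server, which is an assumption made so the example executes anywhere. Dates are stored as ISO strings so that text ordering matches chronological ordering. A real date column in SQL Server would not need this. Like the answer, the sketch assumes at most two rows per account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_tbl (id INTEGER, acct_num TEXT, name TEXT, orderdt TEXT)")
# ISO date strings (an assumption) so ORDER BY orderdt is chronological.
conn.executemany(
    "INSERT INTO order_tbl VALUES (?, ?, ?, ?)",
    [(1, "1006A", "Joe Doe", "2021-01-01"), (2, "1006A", "Joe Doe", "2021-01-05")],
)

query = """
WITH numbered AS (
    SELECT id, acct_num, name, orderdt,
           ROW_NUMBER() OVER (PARTITION BY acct_num ORDER BY orderdt) AS rn
    FROM order_tbl
)
SELECT f.id, f.acct_num, f.name, f.orderdt,
       s.id, s.acct_num, s.name, s.orderdt
FROM numbered f
-- odd row number = first order of the pair, even row number = second order
LEFT JOIN numbered s ON f.acct_num = s.acct_num AND s.rn % 2 = 0
WHERE f.rn % 2 = 1
"""
rows = conn.execute(query).fetchall()
print(rows)
```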

Difference of top two values while GROUP BY

Suppose I have the following SQL Table:
id | score
------------
1 | 4433
1 | 678
1 | 1230
1 | 414
5 | 8899
5 | 123
6 | 2345
6 | 567
6 | 2323
Now I wanted to do a GROUP BY id operation wherein the score column would be modified as follows: take the absolute difference between the top two highest scores for each id.
For example, the response for the above query should be:
id | score
------------
1 | 3203
5 | 8776
6 | 22
How can I perform this query in PostgreSQL?
Using ROW_NUMBER along with pivoting logic we can try:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY score DESC) rn
FROM yourTable
)
SELECT id,
ABS(MAX(score) FILTER (WHERE rn = 1) -
MAX(score) FILTER (WHERE rn = 2)) AS score
FROM cte
GROUP BY id;
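The same query can be checked against the sample data with Python's sqlite3 module. This is a sketch, not the original Postgres demo. The table name scores is a placeholder for yourTable. MAX(CASE WHEN ... END) replaces the FILTER clause so the query does not depend on FILTER support in the bundled SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, score INTEGER)")  # placeholder name
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [
        (1, 4433), (1, 678), (1, 1230), (1, 414),
        (5, 8899), (5, 123),
        (6, 2345), (6, 567), (6, 2323),
    ],
)

# Rank scores per id (highest first), then pivot ranks 1 and 2 into one row.
# MAX(CASE WHEN rn = 1 THEN score END) plays the role of MAX(score) FILTER (WHERE rn = 1).
rows = conn.execute(
    """
    WITH cte AS (
        SELECT id, score,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY score DESC) AS rn
        FROM scores
    )
    SELECT id,
           ABS(MAX(CASE WHEN rn = 1 THEN score END) -
               MAX(CASE WHEN rn = 2 THEN score END)) AS score
    FROM cte
    GROUP BY id
    ORDER BY id
    """
).fetchall()
print(rows)
```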

getting the ratio across multiple rows by a given ID

I have a (test) table:
zip | category | value
-------+----------+-------
17268 | 1 | 23
17268 | 2 | 10
17268 | 3 | 33
10011 | 1 | 22
10011 | 2 | 78
10011 | 3 | 45
I want to output another table that shows, by zipcode, the percentage of the total values that the category 3 values comprise.
For example, the values for zipcode 17268 total 66, and that zip's category 3 value is 33. So I want to assign 17268 the ratio 0.5 (33/66).
I can run this command:
select zip, sum(distinct value) from ziptest group by zip;
To get this transformation:
zip | sum
-------+-----
10011 | 145
17268 | 66
But now I want to divide each zipcode's category 3 value by that zipcode's sum.
Can anyone advise?
I suspect I'm looking for something like this:
select zip, (select value from ziptest where category = 3)/sum(distinct value) from ziptest group by zip;
or this:
select zip, sum(distinct value), (value where category = 3) from ziptest group by zip;
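A correlated subquery is a good route here. It's similar to your first attempt, but with a correlation (zip = zt.zip) between the main query and the subquery. Use a plain sum(value), because sum(distinct value) silently drops duplicate values. If value is an integer column, also cast to numeric so the division does not truncate to 0:
select zip, (select value from ziptest where category = 3 and zip = zt.zip)::numeric / sum(value)
from ziptest zt
group by zip;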
Alternatively using a join:
select zt.zip, zt2.cat3value::numeric / sum(zt.value)
from ziptest zt
INNER JOIN (SELECT DISTINCT zip, value AS cat3value FROM ziptest WHERE category = 3) zt2
ON zt.zip = zt2.zip
group by zt.zip, zt2.cat3value;
Alternatively (and probably fastest), use a CASE expression:
SELECT zip, sum(CASE WHEN category = 3 THEN value ELSE 0 END)::numeric / sum(value)
FROM ziptest
GROUP BY zip;
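If value is an integer column, dividing two integer sums truncates toward zero, so 33/66 gives 0 instead of 0.5. The sketch below demonstrates this with Python's sqlite3 module, which handles integer division the same way. It is a stand-in, not Postgres. In Postgres the equivalent fix is a ::numeric cast on one operand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ziptest (zip INTEGER, category INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO ziptest VALUES (?, ?, ?)",
    [
        (17268, 1, 23), (17268, 2, 10), (17268, 3, 33),
        (10011, 1, 22), (10011, 2, 78), (10011, 3, 45),
    ],
)

# integer / integer truncates: 33 / 66 -> 0 and 45 / 145 -> 0.
truncated = conn.execute(
    """
    SELECT zip, SUM(CASE WHEN category = 3 THEN value ELSE 0 END) / SUM(value)
    FROM ziptest GROUP BY zip ORDER BY zip
    """
).fetchall()

# Casting one operand to a non-integer type keeps the fractional part.
ratios = conn.execute(
    """
    SELECT zip, CAST(SUM(CASE WHEN category = 3 THEN value ELSE 0 END) AS REAL) / SUM(value)
    FROM ziptest GROUP BY zip ORDER BY zip
    """
).fetchall()
print(truncated)
print(ratios)
```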

Hierarchy trees in database and web app

I want to create a web app that will use tree data structures. Users will be able to create, update and delete trees. I have the following table, called nodes, in my PostgreSQL database:
id INTEGER PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE,
parent_id INTEGER NULL REFERENCES nodes(id)
Getting data
I want to get data in the following form:
id | name | children
---|------|--------------
1 | a | [2,3]
2 | b | []
3 | c | [4]
4 | d | []
I created a query that returns data in this form:
id | name | parent_id
---|------|--------------
1 | a |
2 | b | 1
3 | c | 1
4 | d | 3
And here is code:
WITH RECURSIVE nodes_cte(id, name, parent_id, level) AS (
SELECT nodes.id, nodes.name, nodes.parent_id, 0 AS level
FROM nodes
WHERE name = 'a'
UNION ALL
SELECT nodes.id, nodes.name, nodes.parent_id, level+1
FROM nodes
JOIN nodes_cte
ON nodes_cte.id = nodes.parent_id
)
SELECT * FROM nodes_cte;
Can I change the SQL code to get what I want, or should I do that in the app?
Inserting data
I want to know what ways there are to insert data into the table. I think the following approach will work for me:
create sequence in database
increase sequence for number of elements in tree
manually compute ids in app and insert elements in the table
Are there better ways?
CREATE TABLE nodes
( id INTEGER PRIMARY KEY
, name VARCHAR(50) NOT NULL UNIQUE
, parent_id INTEGER NULL REFERENCES nodes(id)
);
-- sample data matching the form shown in the question
INSERT INTO nodes(id,name,parent_id)VALUES
( 1 , 'a' , NULL)
,( 2 , 'b' , 1)
,( 3 , 'c' , 1)
,( 4 , 'd' , 3)
;
SELECT p.id, p.name
, array_agg(c.id) AS children
FROM nodes p
LEFT JOIN nodes c ON c.parent_id = p.id
GROUP BY p.id, p.name
;
Result:
id | name | children
----+------+----------
1 | a | {2,3}
2 | b | {NULL}
3 | c | {4}
4 | d | {NULL}
(4 rows)
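On the question of doing it in the app: if the query returns (id, name, parent_id) rows, as the recursive CTE in the question does, the children lists can be built in application code. The Python sketch below does this. children_lists is a hypothetical helper, and unlike the query above it returns an empty list for leaves instead of {NULL}. If you stay in SQL, wrapping the aggregate as coalesce(array_agg(c.id) FILTER (WHERE c.id IS NOT NULL), '{}') also gives empty arrays in Postgres.

```python
from collections import defaultdict

# Rows in the (id, name, parent_id) shape returned by the question's recursive CTE.
rows = [(1, "a", None), (2, "b", 1), (3, "c", 1), (4, "d", 3)]


def children_lists(rows):
    """Hypothetical helper: return (id, name, [child ids]) per node; leaves get []."""
    children = defaultdict(list)
    for node_id, _name, parent_id in rows:
        if parent_id is not None:
            children[parent_id].append(node_id)
    # Sorting gives a stable order; the database does not promise one.
    return [(node_id, name, sorted(children[node_id])) for node_id, name, _ in rows]


for row in children_lists(rows):
    print(row)
```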
Extra: using generate_series() to insert a bunch of records, each with id/3 as its parent (or NULL when id/3 is zero).
INSERT INTO nodes(id,name,parent_id)
SELECT gs, 'zzz_'|| gs::text, NULLIF(gs/3 , 0)
FROM generate_series ( 5,25) gs
;
INSERTING/UPDATING DATA
Normally, your front-end should not mess with sequences; leave that to the DBMS. You already have a UNIQUE constraint on name, because it is a natural key. So your front-end should use that key to address rows in the nodes table, as in:
CREATE TABLE nodes2
( id SERIAL NOT NULL PRIMARY KEY
, name VARCHAR(50) NOT NULL UNIQUE
, parent_id INTEGER NULL REFERENCES nodes2(id)
);
INSERT INTO nodes2(name,parent_id)
SELECT 'Omg_'|| gs::text, NULLIF(gs/3 , 0)
FROM generate_series ( 1,15) gs
;
PREPARE upd (text, text) AS
-- child, parent
UPDATE nodes2 c
SET parent_id = p.id
FROM nodes2 p
WHERE p.name = $2 -- parent
AND c.name = $1 -- child
;
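EXECUTE upd( 'Omg_12', 'Omg_11');
EXECUTE upd( 'Omg_15', 'Omg_11');
SELECT p.id, p.name
, array_agg(c.id) AS children
FROM nodes2 p
LEFT JOIN nodes2 c ON c.parent_id = p.id
GROUP BY p.id, p.name
ORDER BY p.id
;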
Result:
CREATE TABLE
INSERT 0 15
PREPARE
UPDATE 1
UPDATE 1
id | name | children
----+--------+-----------
1 | Omg_1 | {3,4,5}
2 | Omg_2 | {6,7,8}
3 | Omg_3 | {9,10,11}
4 | Omg_4 | {13,14}
5 | Omg_5 | {NULL}
6 | Omg_6 | {NULL}
7 | Omg_7 | {NULL}
8 | Omg_8 | {NULL}
9 | Omg_9 | {NULL}
10 | Omg_10 | {NULL}
11 | Omg_11 | {15,12}
12 | Omg_12 | {NULL}
13 | Omg_13 | {NULL}
14 | Omg_14 | {NULL}
15 | Omg_15 | {NULL}
(15 rows)
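The same idea, addressing rows by the natural key name and letting the database assign ids, can be sketched with Python's sqlite3 module instead of Postgres. set_parent is a hypothetical helper. It uses a scalar subquery instead of UPDATE ... FROM. The EXISTS guard makes it update nothing when the parent name is unknown, which is how the prepared statement above behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE nodes2 (
        id INTEGER PRIMARY KEY,          -- SQLite assigns ids, much like SERIAL
        name TEXT NOT NULL UNIQUE,
        parent_id INTEGER REFERENCES nodes2(id)
    )
    """
)
# Same generated data as the answer: parent is gs/3, or NULL when that is zero.
conn.executemany(
    "INSERT INTO nodes2 (name, parent_id) VALUES (?, ?)",
    [(f"Omg_{gs}", gs // 3 or None) for gs in range(1, 16)],
)


def set_parent(conn, child, parent):
    """Hypothetical helper: re-parent a node using names only; returns rows updated."""
    cur = conn.execute(
        """
        UPDATE nodes2
        SET parent_id = (SELECT id FROM nodes2 WHERE name = ?)
        WHERE name = ?
          AND EXISTS (SELECT 1 FROM nodes2 WHERE name = ?)
        """,
        (parent, child, parent),
    )
    return cur.rowcount


updated = [set_parent(conn, "Omg_12", "Omg_11"), set_parent(conn, "Omg_15", "Omg_11")]

# Map each node name to its parent's name (None for roots).
parent_of = dict(
    conn.execute(
        "SELECT c.name, p.name FROM nodes2 c LEFT JOIN nodes2 p ON p.id = c.parent_id"
    ).fetchall()
)
print(updated, parent_of["Omg_12"], parent_of["Omg_15"])
```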

Selecting more info from the same row after using GROUP BY

I have a table containing, for example, this data:
id | value | name | date
1 | 1 | 'one' | 2015-01-02
2 | 1 | 'two' | 2015-02-03
3 | 2 | 'three'| 2014-01-03
4 | 2 | 'four' | 2014-01-02
I want for each distinct value, the name of the row with the latest date. So:
value | name | date
1 | 'two' | 2015-02-03
2 | 'three'| 2014-01-03
I currently have this query: SELECT value, MAX(date) FROM table GROUP BY value, which gives me the value and date columns I'm looking for. How do I modify the query to add the name field? Simply adding it to the SELECT clause won't work, as Postgres will (understandably) complain I have to add it to the GROUP BY clause. But doing so will add it to the uniqueness check, and my query will return all 4 rows. All I need is the name of the row where it found the latest date.
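DISTINCT ON is usually the simplest and most efficient way to do this in Postgres. Sort by date descending so that the first row kept for each value is the one with the latest date:
select distinct on (value) id, value, name, date
from the_table
order by value, date desc;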
SQLFiddle example: http://sqlfiddle.com/#!15/dff68/1
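SQLite has no DISTINCT ON. As a portable cross-check, this sketch (Python's sqlite3 module, not Postgres) ranks the rows with ROW_NUMBER() ordered by date descending and keeps the first row for each value. The result matches the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, value INTEGER, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        (1, 1, "one", "2015-01-02"),
        (2, 1, "two", "2015-02-03"),
        (3, 2, "three", "2014-01-03"),
        (4, 2, "four", "2014-01-02"),
    ],
)

# rn = 1 marks the latest-dated row within each value group.
rows = conn.execute(
    """
    WITH ranked AS (
        SELECT id, value, name, date,
               ROW_NUMBER() OVER (PARTITION BY value ORDER BY date DESC) AS rn
        FROM the_table
    )
    SELECT value, name, date FROM ranked WHERE rn = 1 ORDER BY value
    """
).fetchall()
print(rows)
```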
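This will give you all required fields. Join on both value and date, so that a row from another value group that happens to share the same date is not matched:
select t1.* from the_table t1
inner join (
SELECT value, MAX(date) as date FROM the_table GROUP BY value
) t2 on t1.value = t2.value and t1.date = t2.date;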
SQL Fiddle: http://sqlfiddle.com/#!15/9491f/2