getting the ratio across multiple rows by a given ID - postgresql

I have a (test) table:
  zip  | category | value
-------+----------+-------
 17268 |        1 |    23
 17268 |        2 |    10
 17268 |        3 |    33
 10011 |        1 |    22
 10011 |        2 |    78
 10011 |        3 |    45
I want to output another table that shows, by zipcode, the percentage of the total values that the category 3 values comprise.
For example, the total of the values for zipcode 17268 is 66, and that zip's category 3 value is 33. So I want to assign to 17268 the output ratio .5 (for 33/66).
I can run this command:
select zip, sum(distinct value) from ziptest group by zip;
To get this transformation:
  zip  | sum
-------+-----
 10011 | 145
 17268 |  66
But now I want to divide each zipcode's category 3 value by that sum.
Can anyone advise?
I suspect I'm looking for something like this:
select zip, (select value from ziptest where category = 3)/sum(distinct value) from ziptest group by zip;
or this:
select zip, sum(distinct value), (value where category = 3) from ziptest group by zip;

A correlated subquery is a good route here. It's similar to your first attempt, but with the correlation between the main query and the subquery:
select zip,
       (select value from ziptest where category = 3 and zip = zt.zip)::numeric
         / sum(distinct value) as ratio
from ziptest zt
group by zip;
The ::numeric cast avoids integer division, which would otherwise truncate 33/66 to 0. Also note that sum(distinct value) drops duplicate values; plain sum(value) is safer if two categories in the same zip can share a value.
Alternatively using a join:
select zt.zip, zt2.cat3value::numeric / sum(zt.value) as ratio
from ziptest zt
inner join (select distinct zip, value as cat3value
            from ziptest
            where category = 3) zt2
  on zt.zip = zt2.zip
group by zt.zip, zt2.cat3value;
Alternatively (and probably fastest), use a CASE expression:
SELECT zip,
       SUM(CASE WHEN category = 3 THEN value ELSE 0 END)::numeric / SUM(value) AS ratio
FROM ziptest
GROUP BY zip;
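Since PostgreSQL 9.4 you can also write the conditional sum with a FILTER clause; a minimal sketch, assuming the same ziptest table (unlike the ELSE 0 above, FILTER yields NULL for a zip with no category 3 rows):
SELECT zip,
       (SUM(value) FILTER (WHERE category = 3))::numeric / SUM(value) AS ratio
FROM ziptest
GROUP BY zip;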

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows where I don't, then unioning them together and somehow using COALESCE to get the desired dynamically generated sequence value; or maybe there's some way I can use a recursive CTE.
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
step-by-step demo: db<>fiddle
SELECT
  *,
  COALESCE(                                  -- 3
    first_value(override_as_number) OVER w,  -- 2
    1
  )
  + row_number() OVER w - 1                  -- 4, 5
FROM (
  SELECT
    *,
    SUM(                                     -- 1
      CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
    ) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
      AS grouped
  FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
1. Create a new subpartition within your partitions: this cumulative sum creates a unique group id for every run of records that starts with a non-NULL override_as_number and is followed by NULL records. So, for instance, your rows (AAA, d) to (AAA, f) belong to the same subpartition/group.
2. first_value() gives the first value of such a subpartition.
3. The COALESCE ensures a non-NULL result from first_value() if your partition starts with a NULL record.
4. row_number() - 1 creates a row count within a subpartition, starting with 0.
5. Adding the row count to the first_value() of a subpartition creates your result: starting from the one non-NULL record of a subpartition (plus the row count of 0), the first following NULL record gets +1, and so forth.
The query below gives the exact result, but you should verify it with all combinations:
select c.*, COALESCE(c.override_as_number, c.act) as final
from (
  select b.*,
         dense_rank() over (partition by grouped_by order by grouped_by, actual) as act
  from (
    select a.*,
           COALESCE(override_as_number, row_num) as actual
    from (
      select grouped_by,
             secondary_order_by,
             dense_rank() over (partition by grouped_by order by grouped_by, secondary_order_by) as row_num,
             override_as_number,
             desired_dynamic_number
      from fiddle
    ) a
  ) b
) c;
The column "final" is the result:
grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
AAA | a | 1 | 1 | 1 | 1 | 1 | 1
AAA | b | 2 | | 2 | 2 | 2 | 2
AAA | c | 3 | 3 | 3 | 3 | 3 | 3
AAA | d | 4 | 3 | 3 | 3 | 3 | 3
AAA | e | 5 | | 4 | 5 | 4 | 4
AAA | f | 6 | | 5 | 6 | 5 | 5
AAA | g | 7 | 999 | 999 | 999 | 6 | 999
XYZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | b | 2 | | 2 | 2 | 2 | 2
(10 rows)
Hope this helps!
The real-world problem I was trying to solve did not have a nicely ordered secondary_order_by column; instead it had something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved it using a cartesian join, and I'm posting their solution below. The solution is Snowflake SQL, which should be possible to adapt to Postgres. It does fall down on higher override_as_number values, though, unless the 1000 in from table(generator(rowcount => 1000)) is increased to something suitably high.
The SQL:
with tally_table as (
select row_number() over (order by seq4()) as gen_list
from table(generator(rowcount => 1000))
),
base as (
select *,
IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
) --select * from base order by grouped_by,random;
,
cart_product as (
select *
from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
) --select * from cart_product;
,
filter_product as (
select *,
row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
from cart_product
where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
) --select * from try2 order by 2,3 ;
select base.grouped_by,
base.random,
base.override_as_number,
base.answer, -- This is hard coded as test data
IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
random;
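For a Postgres adaptation, the Snowflake-specific pieces have standard equivalents; a hedged sketch of just the tally CTE (generate_series is core Postgres, and the 1000 cap is the same assumption as above):
with tally_table as (
  select g as gen_list
  from generate_series(1, 1000) as g  -- replaces table(generator(rowcount => 1000)) + seq4()
)
select * from tally_table;
The IFF(cond, a, b) calls would become CASE WHEN cond THEN a ELSE b END; the rest is standard SQL.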
In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.

tsql - How to convert multiple rows and columns into one row

id | acct_num | name    | orderdt
---+----------+---------+---------
 1 | 1006A    | Joe Doe | 1/1/2021
 2 | 1006A    | Joe Doe | 1/5/2021
EXPECTED OUTPUT
id | acct_num | name    | orderdt  | id1 | acct_num1 | name1   | orderdt1
---+----------+---------+----------+-----+-----------+---------+---------
 1 | 1006A    | Joe Doe | 1/1/2021 |   2 | 1006A     | Joe Doe | 1/5/2021
My query is the following:
Select id,
acct_num,
name,
orderdt
from order_tbl
where acct_num = '1006A'
and orderdt >= '1/1/2021'
If you always have one or two rows you could do it like this (I'm assuming the latest version of SQL Server because you said TSQL):
NOTE: If you have a known max (eg 4) this solution can be converted to support any number by changing the modulus and adding more columns and another join.
WITH order_table_numbered as
(
    SELECT ID, ACCT_NUM, NAME, ORDERDT,
           ROW_NUMBER() OVER (PARTITION BY ACCT_NUM ORDER BY ORDERDT) as RN
    FROM order_tbl
)
SELECT first.id as id, first.acct_num as acct_num, first.name as name, first.orderdt as orderdt,
       second.id as id1, second.acct_num as acct_num1, second.name as name1, second.orderdt as orderdt1
FROM order_table_numbered first
LEFT JOIN order_table_numbered second ON first.ACCT_NUM = second.ACCT_NUM and (second.RN % 2 = 0)
WHERE first.RN % 2 = 1
If you have an unknown number of rows I think you should solve this on the client OR convert the groups to XML -- the XML support in SQL Server is not bad.
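If a single delimited column per account is acceptable, the aggregation route can also stay in SQL; a minimal sketch, assuming SQL Server 2017+ for STRING_AGG (on older versions the FOR XML PATH trick plays the same role):
SELECT acct_num,
       STRING_AGG(CONCAT(id, ' ', name, ' ', orderdt), ' | ')
         WITHIN GROUP (ORDER BY orderdt) AS orders  -- one row per acct_num, orders in date order
FROM order_tbl
GROUP BY acct_num;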

Difference of top two values while GROUP BY

Suppose I have the following SQL Table:
id | score
---+------
 1 |  4433
 1 |   678
 1 |  1230
 1 |   414
 5 |  8899
 5 |   123
 6 |  2345
 6 |   567
 6 |  2323
Now I wanted to do a GROUP BY id operation wherein the score column would be modified as follows: take the absolute difference between the top two highest scores for each id.
For example, the response for the above query should be:
id | score
---+------
 1 |  3203
 5 |  8776
 6 |    22
How can I perform this query in PostgreSQL?
Using ROW_NUMBER along with pivoting logic we can try:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY score DESC) rn
FROM yourTable
)
SELECT id,
ABS(MAX(score) FILTER (WHERE rn = 1) -
MAX(score) FILTER (WHERE rn = 2)) AS score
FROM cte
GROUP BY id;
Demo
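A compact alternative is LEAD, which reads the runner-up score directly from the top-ranked row; a sketch assuming the same yourTable (an id with a single row yields NULL, exactly as the FILTER version does):
SELECT id, ABS(score - next_score) AS score
FROM (
  SELECT id, score,
         LEAD(score) OVER (PARTITION BY id ORDER BY score DESC) AS next_score,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY score DESC) AS rn
  FROM yourTable
) t
WHERE rn = 1;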

Postgres : Get multiple columns with group by

Table
select * from hello;
id | name
----+------
1 | abc
2 | xyz
3 | abc
4 | dfg
5 | abc
(5 rows)
Query
select name,count(*) from hello where name in ('abc', 'dfg') group by name;
name | count
------+-------
dfg | 1
abc | 3
(2 rows)
In the above query, I am trying to get the count of the rows whose name is in the tuple. However, I want to get the id as well as the count of the names. Is there a way to achieve this? Thanks
If you want to return the "id" values, then you can use a window function:
select id, name, count(*) over(PARTITION BY name)
from hello
where name in ('abc', 'dfg');
This will return the id values along with the count of rows per name.
If you want to see all IDs for each name, you need to aggregate them:
select name, count(*), array_agg(id) as ids
from hello
where name in ('abc', 'dfg')
group by name;
This returns something like this:
name | count |   ids
-----+-------+---------
abc  |     3 | {1,3,5}
dfg  |     1 | {4}

Filter rows based on two fields, where one of them contains a selection criterion

Given the following table
group | weight | category_id | category_name_plus
------+--------+-------------+-------------------
    1 |     10 |         100 | Ab
    1 |     20 |         101 | Bcd
    1 |     30 |         100 | Efghij
    2 |     10 |         101 | Bcd
    2 |     20 |         101 | Cdef
    2 |     30 |         100 | Defgh
    2 |     40 |         100 | Ab
    3 |     10 |         102 | Fghijkl
    3 |     20 |         101 | Ab
The "weight" is unique for each group and is also an indicator for the order of records inside the group.
What I want is to retrieve one record per group filtered by category_id, but only the record having the highest "weight" inside its "group".
Example for filtering by category_id = 100:
group | weight | category_id | category_name_plus
------+--------+-------------+-------------------
    1 |     30 |         100 | Efghij
    2 |     40 |         100 | Ab
Example for filtering by category_id = 101:
group | weight | category_id | category_name_plus
------+--------+-------------+-------------------
    1 |     20 |         101 | Bcd
    2 |     20 |         101 | Cdef
    3 |     20 |         101 | Ab
How can I select just these rows?
I tried fiddling with UNIQUE, MAX(category_id) etc., but I'm still unable to get the correct results. The main problem for me is getting the category_name_plus value here.
I am working with PostgreSQL 9.4 (beta 3), because I also need various other niceties like WITH ORDINALITY etc.
The rank window function should do the trick. Note that the category filter has to be applied before ranking, so that you rank only the rows of the requested category:
SELECT "group", weight, category_id, category_name_plus
FROM (SELECT "group", weight, category_id, category_name_plus,
             RANK() OVER (PARTITION BY "group"
                          ORDER BY weight DESC) AS rk
      FROM my_table
      WHERE category_id = 101) t
WHERE rk = 1
Note:
"group" is a reserved word in SQL, so it has to be surrounded by quotes in order to be used as a column name. It would probably be better, though, to replace it with a non-reserved word, such as "group_id".
Try something like:
SELECT DISTINCT ON ("group") *
FROM your_table
WHERE category_id = 101
ORDER BY "group", weight DESC;
DISTINCT ON keeps the first row per "group" in the given sort order, i.e. the highest-weight row for the requested category.