Rank rows per group based on numeric column values in PostgreSQL - postgresql

I have the following table in PostgreSQL 11.0:
drug_id     | synonym                     | score
96165807064 | chembl490421                | 0.667
96165807064 | querciformolide a           | 1.0
96165807064 | querciformolide b           | 1.0
96165807066 | chembl196832                | 1.0
96165807066 | cpiylcsbeicghy-uhfffaoysa-n | 0.875
96165807066 | schembl1752046              | 0.938
96165807066 | stk694847                   | 0.75
96165807066 | molport-006-827-808         | 0.812
96165807066 | akos016348681               | 0.625
96165807066 | akos004112738               | 0.688
96165807066 | mcule-5237395512            | 0.562
I would like to add a rank column, computed within each drug_id from the score column, with the highest score ranked first.
The expected output is:
drug_id     | synonym                     | score | rank
96165807064 | querciformolide a           | 1.0   | 1
96165807064 | querciformolide b           | 1.0   | 1
96165807064 | chembl490421                | 0.667 | 2
96165807066 | chembl196832                | 1.0   | 1
96165807066 | schembl1752046              | 0.938 | 2
96165807066 | cpiylcsbeicghy-uhfffaoysa-n | 0.875 | 3
96165807066 | molport-006-827-808         | 0.812 | 4
96165807066 | stk694847                   | 0.75  | 5
96165807066 | akos004112738               | 0.688 | 6
96165807066 | akos016348681               | 0.625 | 7
96165807066 | mcule-5237395512            | 0.562 | 8
I am using the following query:
SELECT DISTINCT
    drug_id,
    synonym,
    score,
    DENSE_RANK() OVER (
        PARTITION BY drug_id
        ORDER BY score
    ) rank_number
FROM
    tbl
ORDER BY drug_id, score DESC;
This query does not give the expected output. Instead I get:
drug_id     | synonym                     | score | rank_number
96165807064 | querciformolide a           | 1.0   | 2
96165807064 | querciformolide b           | 1.0   | 2
96165807064 | chembl490421                | 0.667 | 1
96165807066 | chembl196832                | 1.0   | 15
96165807066 | schembl1752046              | 0.938 | 14
96165807066 | cpiylcsbeicghy-uhfffaoysa-n | 0.875 | 13
96165807066 | molport-006-827-808         | 0.812 | 12
96165807066 | stk694847                   | 0.75  | 11
96165807066 | akos004112738               | 0.688 | 10
96165807066 | akos016348681               | 0.625 | 9
96165807066 | mcule-5237395512            | 0.562 | 8

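Your query ranks by score in ascending order, so the lowest score in each group gets rank 1. Sort the window by score in descending order instead:
SELECT
    t.drug_id,
    t.synonym,
    t.score,
    DENSE_RANK() OVER (
        PARTITION BY t.drug_id
        ORDER BY t.score DESC
    ) AS rank
FROM
    test t
ORDER BY t.drug_id, t.score DESC;
Here test stands in for your table (tbl in the question). Repeating drug_id in the window's ORDER BY is unnecessary, because the value is constant within each partition.
I created a SQL fiddle to show the query working:
https://www.db-fiddle.com/f/p9ANUghi8TxLgXrhUHsUaY/3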
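If you want to try the descending DENSE_RANK locally without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or newer), uses only a subset of the question's rows, and SQLite is just a stand-in here, not the PostgreSQL setup from the question.

```python
import sqlite3

# Subset of the question's rows: (drug_id, synonym, score).
ROWS = [
    (96165807064, "chembl490421", 0.667),
    (96165807064, "querciformolide a", 1.0),
    (96165807064, "querciformolide b", 1.0),
    (96165807066, "chembl196832", 1.0),
    (96165807066, "schembl1752046", 0.938),
    (96165807066, "cpiylcsbeicghy-uhfffaoysa-n", 0.875),
]

# Rank within each drug_id, highest score first; ties share a rank.
QUERY = """
    SELECT drug_id, synonym, score,
           DENSE_RANK() OVER (
               PARTITION BY drug_id
               ORDER BY score DESC
           ) AS rank_number
    FROM tbl
    ORDER BY drug_id, score DESC, synonym
"""


def ranked_rows():
    """Load the sample rows into an in-memory table and run the ranking query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (drug_id INTEGER, synonym TEXT, score REAL)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in ranked_rows():
        print(row)
```

The two synonyms with score 1.0 share rank 1, and the next distinct score gets rank 2 with no gap, which is the difference between DENSE_RANK and RANK.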

Related

SUM postgres not like expected

With the query below, I would like to obtain the SUM of account_move_line.balance (as AMOUNTEUR)
over the rows whose account_id, partner_id, invoice_id and account_account.code are equal:
SELECT
    account_move_line.name, account_move_line.account_id,
    account_move_line.partner_id, account_move_line.invoice_id,
    account_move_line.journal_id,
    CASE
        WHEN account_account.code LIKE '40%%'
            THEN '400000'
        WHEN account_account.code LIKE '44%%'
            THEN '440000'
        ELSE account_account.code
    END AS ACCOUNTGL,
    CASE
        WHEN account_account.code = '702000'
            THEN SUM(account_move_line.balance)
        ELSE (round(account_move_line.balance, 2))
    END AS AMOUNTEUR
FROM
    public.account_move_line
JOIN
    account_account ON (account_account.id = account_move_line.account_id)
WHERE
    (account_move_line.date BETWEEN '2020-03-01' AND '2020-03-31')
GROUP BY
    account_move_line.account_id, account_move_line.partner_id,
    account_move_line.invoice_id, account_move_line.journal_id,
    account_account.code, account_move_line.balance, account_move_line.name
ORDER BY
    account_move_line.account_id, account_move_line.invoice_id;
The result I get:
NAME account_id Partner_id Invoice_id J_id accountgl amounteur
"Taxe led" 186 2476 1883 1 "702000" -0.83
"Taxe eclairage" 186 2476 1883 1 "702000" -0.11
"Taxe gros et petit blanc" 186 3090 1884 1 "702000" -0.83
"Taxe eclairage" 186 2077 1885 1 "702000" 0.25
"Taxe eclairage" 186 2077 1887 1 "702000" -0.25
"Taxe eclairage" 186 2077 1888 1 "702000" -0.02
"Taxe led" 186 2481 1916 1 "702000" -0.83
"Taxe eclairage" 186 2481 1916 1 "702000" -0.52
I expected:
NAME account_id Partner_id Invoice_id J_id accountgl amounteur
186 2476 1883 1 "702000" -0.94
"Taxe gros et petit blanc" 186 3090 1884 1 "702000" -0.83
"Taxe eclairage" 186 2077 1885 1 "702000" 0.25
"Taxe eclairage" 186 2077 1887 1 "702000" -0.25
"Taxe eclairage" 186 2077 1888 1 "702000" -0.02
186 2481 1916 1 "702000" -1.35
Thanks
I'm guessing, but it seems you expect the results to be grouped by account_id, partner_id, invoice_id, and perhaps journal_id. However, your GROUP BY lists many more columns:
account_move_line.account_id,
account_move_line.partner_id,
account_move_line.invoice_id,
account_move_line.journal_id,
account_account.code,
account_move_line.balance,
account_move_line.name
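For two rows to fall into the same group, they must share the same account, partner, invoice, and journal IDs, and also the same code, balance, and name. Rows with different names or balances never merge, so SUM usually has only a single row to add up.
Cut your GROUP BY back to just the four IDs (plus account_account.code, if you still select it).
This means you can no longer select columns that have several values within a group, such as name or the raw balance. Each group may contain several names, so no single name can be selected, and balance has to go through SUM in every branch of the CASE.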
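To see the effect of the shorter GROUP BY, here is a small sketch using Python's built-in sqlite3 module with a few rows from the question. It is a simplification under stated assumptions: the join to account_account and the date filter are left out, and SQLite stands in for PostgreSQL.

```python
import sqlite3

# A few rows from the question:
# (name, account_id, partner_id, invoice_id, journal_id, balance).
LINES = [
    ("Taxe led", 186, 2476, 1883, 1, -0.83),
    ("Taxe eclairage", 186, 2476, 1883, 1, -0.11),
    ("Taxe gros et petit blanc", 186, 3090, 1884, 1, -0.83),
    ("Taxe led", 186, 2481, 1916, 1, -0.83),
    ("Taxe eclairage", 186, 2481, 1916, 1, -0.52),
]

# Group only by the four IDs; name and balance are no longer grouping keys,
# so balance must be aggregated and name cannot be selected.
QUERY = """
    SELECT account_id, partner_id, invoice_id, journal_id,
           ROUND(SUM(balance), 2) AS amounteur
    FROM account_move_line
    GROUP BY account_id, partner_id, invoice_id, journal_id
    ORDER BY account_id, invoice_id
"""


def totals_per_invoice():
    """Return one summed row per (account, partner, invoice, journal)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account_move_line ("
        "name TEXT, account_id INTEGER, partner_id INTEGER, "
        "invoice_id INTEGER, journal_id INTEGER, balance REAL)"
    )
    conn.executemany("INSERT INTO account_move_line VALUES (?, ?, ?, ?, ?, ?)", LINES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in totals_per_invoice():
        print(row)
```

The two lines of invoice 1883 collapse into one row, and so do the two lines of invoice 1916, which matches the totals in the expected output.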

Assigning row number without any order

How can I use the row_number() function without any ordering?
Example table:
COL A COL B
42123345990000 0
42123345990000 0.33333334
42123345990000 0.6666667
42123345990000 1
42123345990000 0.86340976
42123345980000 0
42123345980000 0.1
42123345980000 0.2
42123345980000 0.3432426
42123345980000 0.5
42123345980000 0.53144264
Desired Output:
ROW COL A COL B
1 42123345990000 0
2 42123345990000 0.33333334
3 42123345990000 0.6666667
4 42123345990000 1
5 42123345990000 0.86340976
1 42123345980000 0
2 42123345980000 0.1
3 42123345980000 0.2
4 42123345980000 0.3432426
5 42123345980000 0.5
6 42123345980000 0.53144264
I would like to partition by COL A, but without any ordering.
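The general answer to the question of a row_number without an ordering is to order by a constant: row_number() over (order by 1). In PostgreSQL you can also leave the ORDER BY out entirely, as in row_number() over (partition by COLA). Either way, the numbering within a partition is arbitrary and is not guaranteed to be the same from one run to the next.
In your case the expected output restarts the numbering for each COL A value and, apart from rows 4 and 5 of the first group, follows the COL B values. If ordering by COL B is what you actually want, use dense_rank() over (partition by COLA order by COLB).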
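As a rough illustration of partitioned numbering with and without an ordering, here is a sketch using Python's built-in sqlite3 module and the rows from the question. The table name t and the column names col_a and col_b are placeholders I made up, and it assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3
from collections import defaultdict

# Rows from the question: (COL A, COL B).
ROWS = [
    (42123345990000, 0.0),
    (42123345990000, 0.33333334),
    (42123345990000, 0.6666667),
    (42123345990000, 1.0),
    (42123345990000, 0.86340976),
    (42123345980000, 0.0),
    (42123345980000, 0.1),
    (42123345980000, 0.2),
    (42123345980000, 0.3432426),
    (42123345980000, 0.5),
    (42123345980000, 0.53144264),
]

# No ORDER BY in the window: numbers restart per col_a, in arbitrary order.
UNORDERED = "SELECT ROW_NUMBER() OVER (PARTITION BY col_a) AS rn, col_a, col_b FROM t"
# With ORDER BY: the numbering is deterministic and follows col_b.
ORDERED = (
    "SELECT ROW_NUMBER() OVER (PARTITION BY col_a ORDER BY col_b) AS rn, "
    "col_a, col_b FROM t"
)


def numbers_per_group(query):
    """Run the query and return {col_a: sorted list of (row_number, col_b)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col_a INTEGER, col_b REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    groups = defaultdict(list)
    for rn, col_a, col_b in conn.execute(query):
        groups[col_a].append((rn, col_b))
    conn.close()
    return {key: sorted(pairs) for key, pairs in groups.items()}


if __name__ == "__main__":
    print(numbers_per_group(UNORDERED))
    print(numbers_per_group(ORDERED))
```

Both queries produce the numbers 1..n in every partition; only the ordered variant promises which row gets which number.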

Postgres: for each row evaluate all successive rows under conditions

I have this table:
id | datetime | row_number
1 2018-04-09 06:27:00 1
1 2018-04-09 14:15:00 2
1 2018-04-09 15:25:00 3
1 2018-04-09 15:35:00 4
1 2018-04-09 15:51:00 5
1 2018-04-09 17:05:00 6
1 2018-04-10 06:42:00 7
1 2018-04-10 16:39:00 8
1 2018-04-10 18:58:00 9
1 2018-04-10 19:41:00 10
1 2018-04-14 17:05:00 11
1 2018-04-14 17:48:00 12
1 2018-04-14 18:57:00 13
For each row, I would like to count the successive rows that fall within '01:30:00' of it, and then restart the evaluation from the first row that does not meet the condition.
Let me try to explain the question better.
Using the window function lag():
SELECT id, datetime,
CASE WHEN datetime - lag (datetime,1) OVER(PARTITION BY id ORDER BY datetime)
< '01:30:00' THEN 1 ELSE 0 END AS count
FROM table
result is:
id | datetime | count
1 2018-04-09 06:27:00 0
1 2018-04-09 14:15:00 0
1 2018-04-09 15:25:00 1
1 2018-04-09 15:35:00 1
1 2018-04-09 15:51:00 1
1 2018-04-09 17:05:00 1
1 2018-04-10 06:42:00 0
1 2018-04-10 16:39:00 0
1 2018-04-10 18:58:00 0
1 2018-04-10 19:41:00 1
1 2018-04-14 17:05:00 0
1 2018-04-14 17:48:00 1
1 2018-04-14 18:57:00 1
But this is not what I want. Row_number 5 should be excluded, because the interval between row_number 5 and row_number 2 is > '01:30:00', and a new evaluation should start from row_number 5.
The same applies to row_number 13.
The correct output would be:
id | datetime | count
1 2018-04-09 06:27:00 0
1 2018-04-09 14:15:00 0
1 2018-04-09 15:25:00 1
1 2018-04-09 15:35:00 1
1 2018-04-09 15:51:00 0
1 2018-04-09 17:05:00 1
1 2018-04-10 06:42:00 0
1 2018-04-10 16:39:00 0
1 2018-04-10 18:58:00 0
1 2018-04-10 19:41:00 1
1 2018-04-14 17:05:00 0
1 2018-04-14 17:48:00 1
1 2018-04-14 18:57:00 0
So the correct count is 5.
I'd use a recursive query for this, carrying the start of the current window (last_start) from one row to the next:
WITH RECURSIVE tmp AS (
    SELECT
        id,
        datetime,
        row_number,
        0 AS counting,
        datetime AS last_start
    FROM mytable
    WHERE row_number = 1
    UNION ALL
    SELECT
        t1.id,
        t1.datetime,
        t1.row_number,
        CASE
            WHEN lateral_1.counting THEN 1
            ELSE 0
        END AS counting,
        CASE
            WHEN lateral_1.counting THEN tmp.last_start
            ELSE t1.datetime
        END AS last_start
    FROM
        mytable AS t1
        INNER JOIN tmp
            ON (t1.id = tmp.id AND t1.row_number - 1 = tmp.row_number),
        LATERAL (
            SELECT (t1.datetime - tmp.last_start) < '1h 30m'::interval AS counting
        ) AS lateral_1
)
SELECT id, datetime, counting
FROM tmp
ORDER BY id, datetime;
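The row-by-row logic of the recursive query can be checked outside the database. Below is a minimal Python sketch of the same idea for a single id: it keeps the start of the current window and flags a row with 1 when it falls within 1h 30m of that start, otherwise it flags 0 and makes that row the new start. The function name flag_within_window is my own, and this mirrors the query's logic rather than replacing it.

```python
from datetime import datetime, timedelta

# Timestamps of id 1 from the question, already sorted.
TIMESTAMPS = [
    "2018-04-09 06:27:00", "2018-04-09 14:15:00", "2018-04-09 15:25:00",
    "2018-04-09 15:35:00", "2018-04-09 15:51:00", "2018-04-09 17:05:00",
    "2018-04-10 06:42:00", "2018-04-10 16:39:00", "2018-04-10 18:58:00",
    "2018-04-10 19:41:00", "2018-04-14 17:05:00", "2018-04-14 17:48:00",
    "2018-04-14 18:57:00",
]


def flag_within_window(timestamps, limit=timedelta(hours=1, minutes=30)):
    """Return a 0/1 flag per timestamp.

    A row gets 1 if it lies less than `limit` after the current window start.
    Otherwise it gets 0 and becomes the new window start.
    """
    flags = []
    last_start = None
    for ts in timestamps:
        if last_start is not None and ts - last_start < limit:
            flags.append(1)
        else:
            flags.append(0)
            last_start = ts
    return flags


TIMES = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in TIMESTAMPS]
FLAGS = flag_within_window(TIMES)

if __name__ == "__main__":
    for stamp, flag in zip(TIMESTAMPS, FLAGS):
        print(stamp, flag)
    print("count:", sum(FLAGS))
```

For several ids you would run the function once per id, which corresponds to the join on t1.id = tmp.id in the query.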

How to query with lead() values not in current range

I'm having problems with a query where the lead() values fall outside the range of rows selected: the rows on the edge of the range return null lead() values.
Let's say I have a simple table to keep track of continuous counters
create table anytable (
    wseller  integer NOT NULL,
    wday     date NOT NULL,
    wshift   smallint NOT NULL,
    wcounter numeric(9,1)
);
with the following values
wseller wday wshift wcounter
1 2016-11-30 1 100.5
1 2017-01-03 1 102.5
1 2017-01-25 2 103.2
1 2017-02-05 2 106.1
2 2015-05-05 2 81.1
2 2017-01-01 1 92.1
2 2017-01-01 2 93.1
3 2016-12-01 1 45.2
3 2017-01-05 1 50.1
and I want the net units for the current year:
wseller wday wshift units
1 2017-01-03 1 2
1 2017-01-25 2 0.7
1 2017-02-05 2 2.9
2 2017-01-01 1 11
2 2017-01-01 2 1
3 2017-01-05 1 4.9
If I use
select wseller, wday, wshift, wcounter-lead(wcounter) over (partition by wseller order by wseller, wday desc, wshift desc)
from anytable
where wday>='2017-01-01'
it gives me nulls for the earliest row of each wseller partition. I'm using this query within a large CTE.
What am I doing wrong?
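Window functions are evaluated after the WHERE clause, so lead() only sees the rows that pass the filter. The rows before 2017 that hold the previous counter values are removed before the window is built, which is why the earliest remaining row of each partition gets NULL. Compute the window over all rows and move the condition to an outer query: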
select *
from (
    select
        wseller, wday, wshift,
        wcounter - lead(wcounter) over (partition by wseller order by wday desc, wshift desc)
    from anytable
) s
where wday >= '2017-01-01'
order by wseller, wday, wshift
wseller | wday | wshift | ?column?
---------+------------+--------+----------
1 | 2017-01-03 | 1 | 2.0
1 | 2017-01-25 | 2 | 0.7
1 | 2017-02-05 | 2 | 2.9
2 | 2017-01-01 | 1 | 11.0
2 | 2017-01-01 | 2 | 1.0
3 | 2017-01-05 | 1 | 4.9
(6 rows)
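The difference between filtering inside and outside the windowed query can be reproduced with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite (3.25 or newer) stands in for PostgreSQL, dates are stored as ISO text, the counter is a REAL instead of numeric(9,1), and the differences are rounded to one decimal in Python.

```python
import sqlite3

# Rows from the question: (wseller, wday, wshift, wcounter).
ROWS = [
    (1, "2016-11-30", 1, 100.5),
    (1, "2017-01-03", 1, 102.5),
    (1, "2017-01-25", 2, 103.2),
    (1, "2017-02-05", 2, 106.1),
    (2, "2015-05-05", 2, 81.1),
    (2, "2017-01-01", 1, 92.1),
    (2, "2017-01-01", 2, 93.1),
    (3, "2016-12-01", 1, 45.2),
    (3, "2017-01-05", 1, 50.1),
]

WINDOWED = """
    SELECT wseller, wday, wshift,
           wcounter - LEAD(wcounter) OVER (
               PARTITION BY wseller ORDER BY wday DESC, wshift DESC
           ) AS units
    FROM anytable
"""

# Filter applied before the window is built: older rows are invisible to LEAD.
FILTER_INSIDE = WINDOWED + " WHERE wday >= '2017-01-01' ORDER BY wseller, wday, wshift"
# Window computed over all rows first, filter applied afterwards.
FILTER_OUTSIDE = (
    "SELECT * FROM (" + WINDOWED + ") AS s "
    "WHERE wday >= '2017-01-01' ORDER BY wseller, wday, wshift"
)


def run(query):
    """Run a query on the sample data; round units to one decimal, keep NULL as None."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE anytable (wseller INTEGER, wday TEXT, wshift INTEGER, wcounter REAL)"
    )
    conn.executemany("INSERT INTO anytable VALUES (?, ?, ?, ?)", ROWS)
    rows = [
        (seller, day, shift, None if units is None else round(units, 1))
        for seller, day, shift, units in conn.execute(query)
    ]
    conn.close()
    return rows


if __name__ == "__main__":
    print("filter inside: ", run(FILTER_INSIDE))
    print("filter outside:", run(FILTER_OUTSIDE))
```

With the filter inside, the earliest 2017 row of each seller has no following row in the window and comes back as None. With the filter outside, every 2017 row still sees its predecessor.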

Adding a number to a field according to some specific condition

I have the following data:
CompId PersonelNo EduId RecordsDay DateEs
1 1000 1 2 1370
1 1000 2 10 1370
1 1002 2 5 1380
1 1003 1 4 1391
1 1003 2 7 1391
For each PersonelNo, I want to change RecordsDay only on the record with the maximum EduId: add (1392-1390=2) when DateEs is less than or equal to 1390, and add (DateEs - 1390) when DateEs is greater than 1390.
So the data would look like this:
CompId PersonelNo EduId RecordsDay DateEs
1 1000 1 2 1370 // unchanged because EduId is not the maximum for this PersonelNo
1 1000 2 12 1370 // maximum EduId for this PersonelNo and DateEs is less than 1390, so (1392-1390) + 10 = 12
1 1002 2 7 1380 // the only record for this PersonelNo and DateEs is less than 1390, so (1392-1390) + 5 = 7
1 1003 1 4 1391 // unchanged because EduId is not the maximum for this PersonelNo
1 1003 2 8 1391 // maximum EduId for this PersonelNo and DateEs is greater than 1390, so (1391-1390) + 7 = 8
I need T-SQL for this. I have been working on it, but have not been able to write it so far.
You can try:
SELECT CASE
    WHEN [EduId] = MAX(EduId) OVER (PARTITION BY PersonelNo) AND DateEs <= 1390 THEN RecordsDay + 2
    WHEN [EduId] = MAX(EduId) OVER (PARTITION BY PersonelNo) AND DateEs > 1390 THEN RecordsDay + (DateEs - 1390)
    ELSE RecordsDay -- rows that are not the maximum EduId keep their value
END
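To check the CASE logic against the sample data, here is a sketch using Python's built-in sqlite3 module. Several things are assumptions on my side: SQLite (3.25 or newer) stands in for SQL Server, the table name personnel_records and the alias NewRecordsDay are made up because the question names neither, and the query includes an ELSE branch so that rows other than the maximum EduId keep their RecordsDay.

```python
import sqlite3

# Rows from the question: (CompId, PersonelNo, EduId, RecordsDay, DateEs).
ROWS = [
    (1, 1000, 1, 2, 1370),
    (1, 1000, 2, 10, 1370),
    (1, 1002, 2, 5, 1380),
    (1, 1003, 1, 4, 1391),
    (1, 1003, 2, 7, 1391),
]

# personnel_records is a hypothetical table name; the question does not give one.
QUERY = """
    SELECT CompId, PersonelNo, EduId,
           CASE
               WHEN EduId = MAX(EduId) OVER (PARTITION BY PersonelNo)
                    AND DateEs <= 1390 THEN RecordsDay + 2
               WHEN EduId = MAX(EduId) OVER (PARTITION BY PersonelNo)
                    AND DateEs > 1390 THEN RecordsDay + (DateEs - 1390)
               ELSE RecordsDay
           END AS NewRecordsDay,
           DateEs
    FROM personnel_records
    ORDER BY PersonelNo, EduId
"""


def adjusted_records():
    """Return the sample rows with RecordsDay adjusted on each person's max EduId row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE personnel_records ("
        "CompId INTEGER, PersonelNo INTEGER, EduId INTEGER, "
        "RecordsDay INTEGER, DateEs INTEGER)"
    )
    conn.executemany("INSERT INTO personnel_records VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in adjusted_records():
        print(row)
```

If PersonelNo values can repeat across companies, partitioning by CompId and PersonelNo together would be the safer choice.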