Filter rows based on two fields, where one of them contains a selection criterion - postgresql

Given the following table
group | weight | category_id | category_name_plus
1 10 100 Ab
1 20 101 Bcd
1 30 100 Efghij
2 10 101 Bcd
2 20 101 Cdef
2 30 100 Defgh
2 40 100 Ab
3 10 102 Fghijkl
3 20 101 Ab
The "weight" is unique for each group and is also an indicator for the order of records inside the group.
What I want is to retrieve one record per group filtered by category_id, but only the record having the highest "weight" inside its "group".
Example for filtering by category_id = 100:
group | weight | category_id | category_name_plus
1 30 100 Efghij
2 40 100 Ab
Example for filtering by category_id = 101:
group | weight | category_id | category_name_plus
1 20 101 Bcd
2 20 101 Cdef
3 20 101 Ab
How can I select just these rows?
I tried fiddling with UNIQUE, MAX(category_id) etc. but I'm still unable to get the correct results. The main problem for me is to get the category_name_plus value here.
I am working with PostgreSQL 9.4(beta 3), because I also need various other niceties like "WITH ORDINALITY" etc.

The rank window function should do the trick:
SELECT "group", weight, category_id, category_name_plus
FROM (SELECT "group", weight, category_id, category_name_plus,
RANK() OVER (PARTITION BY "group"
ORDER BY weight DESC) AS rk
FROM my_table) t
WHERE rk = 1 AND category_id = 101
Note:
"group" is a reserved word in SQL, so it has to be surrounded by quotes in order to be used as a column name. It would probably be better, though, to replace it with a non-reserved word, such as "group_id".

Try something like:
SELECT DISTINCT ON (category_id) *
from your_table
order by category_id, weight desc

Related

Taking N-samples from each group in PostgreSQL

I have a table containing data that has a column named id that looks like below:
id
value 1
value 2
value 3
1
244
550
1000
1
251
551
700
1
540
60
1200
...
...
...
...
2
19
744
2000
2
10
903
100
2
44
231
600
2
120
910
1100
...
...
...
...
I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points.
For example I would like a maximum 50 data points randomly selected from id = 1, id = 2 etc...
I cannot find any previous questions similar to this but have tried taking a stab at at least logically working through the solution where I could iterate and union all queries by id and limit to 50:
SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;
But it's obvious that you cannot use this type of solution because UNION ALL requires aggregating outputs from one id to the next and I do not have a list of id values to use in place of X in tbl.id = X.
Is there a way to accomplish this by gathering that list of unique id values and union all results or is there a more optimal way this could be done?
If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:
select * from (
select *, row_number() over (partition by id order by random()) as u
from schema.table
) as a
where u <= 50;
Example (limiting to 3, and some row number for each id so you can see the selection randomness):
setup
DROP TABLE IF EXISTS foo;
CREATE TABLE foo
(
id int,
value1 int,
idrow int
);
INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;
Selection
select * from (
select *, row_number() over (partition by id order by random()) as u
from foo
) as a
where u <= 3;
Output:
id
value1
idrow
u
1
542
6
1
1
24
86
2
1
155
74
3
2
505
95
1
2
100
46
2
2
422
33
3
3
966
88
1
3
747
89
2
3
664
19
3
In case you are looking to get 50 (or less) from each group of IDs then you can use windowing -
From question - "I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points."
Query -
with data as (
select row_number() over (partition by id order by random()) rn,
* from table_name)
select * from data where rn<=50 order by id;
Fiddle.
Your description of trying to get the UNION ALL without specifying all the branches ahead of time is aiming for a LATERAL join. And that is one way to solve the problem. But unless you have a table of all distinct ids, you would have to compute one on the fly. For example (using the same fiddle as Pankaj used):
with uniq as (select distinct id from test)
select foo.* from uniq cross join lateral
(select * from test where test.id=uniq.id order by random() limit 3) foo
This could be either slower or faster than the Window Function method, depending on your system and your data and your indexes. In my hands, it was quite a bit faster even with the need to dynamically compute the list of distinct ids.

Typeorm order by after distinct on with postgresql

I have a table below:
id
product_id
priceĀ 
1
1
100
2
1
150
3
2
120
4
2
190
5
3
100
6
3
80
I want to select cheapest price for product and sort them by price
Expected output:
id
product_id
price
6
3
80
1
1
100
3
2
120
What I try so far:
`
repository.createQueryBuilder('products')
.orderBy('products.id')
.distinctOn(['products.id'])
.addOrderBy('price')
`
This query returns, cheapest products but not sort them. So, addOrderBy doesn't effect to products. Is there a way to sort products after distinctOn ?
SELECT id,
product_id,
price
FROM (SELECT id,
product_id,
price,
Dense_rank()
OVER (
partition BY product_id
ORDER BY price ASC) dr
FROM product) inline_view
WHERE dr = 1
ORDER BY price ASC;
Setup:
postgres=# create table product(id int, product_id int, price int);
CREATE TABLE
postgres=# insert into product values (1,1,100),(2,1,150),(3,2,120),(4,2,190),(5,3,100),(6,3,80);
INSERT 0 6
Output
id | product_id | price
----+------------+-------
6 | 3 | 80
1 | 1 | 100
3 | 2 | 120
(3 rows)

Postgresql difference between rows

My data:
id value
1 10
1 20
1 60
2 10
3 10
3 30
How to compute column 'change'?
id value change | my comment, how to compute
1 10 10 | 20-10
1 20 40 | 60-20
1 60 40 | default_value-60. In this example default_value=100
2 10 90 | default_value-10
3 10 20 | 30-10
3 30 70 | default_value-30
In other words: if row of id is last, then compute 100-value,
else compute next_value-value_now
You can access the value of the "next" (or "previous") row using a window function. The concept of a "next" row only makes sense if you have a column to define an order on the rows. You said you have a date column on which you can order the result. I used the column name your_date_column for this. You need to replace that with the actual column name of course.
select id,
value,
lead(value, 1, 100) over (partition by id order by your_date_column) - value as change
from the_table
order by id, your_date_column
lead(value, 1, 100) says: take the column value of the "next" row (that's the 1). If there is no such row, use the default value 100 instead.
Join on a subquery and use ROW_NUMBER to find the last value per group
WITH CTE AS(
SELECT id,value,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) rn,
(LEAD(value) OVER (PARTITION BY id ORDER BY date)-value) change FROM t)
SELECT cte.id,cte.value,
(CASE WHEN cte.change IS NULL THEN 100-cte.value ELSE cte.change END)as change FROM cte LEFT JOIN
(SELECT id,MAX(rn) mrn FROM cte
GROUP BY id) as x
ON x.mrn=cte.rn AND cte.id=x.id
FIDDLE

select two maximum values per person based on a column partition

Hi if I have the following table:
Person------Score-------Score_type
1 30 A
1 35 A
1 15 B
1 16 B
2 74 A
2 68 A
2 40 B
2 39 B
Where for each person and score type I want to pick out the maximum score to obtain a table like:
Person------Score-------Score_type
1 35 A
1 16 B
2 74 A
2 40 B
I can do this using multiple select statements, but this will be cumbersome, especially later on. so I was wondering if there is a function which can help me do this. I have used the parititon function before but only to label sequences in a table....
select person,
score_type,
max(score) as score
from scores
group by person, score_type
order by person, score_type;
With "partition function" I guess you mean window functions. They can indeed be used for this as well:
select person
score_type,
score
from (
select person,
score_type,
score,
row_number() over (partition by person, score_type order by score desc) as rn
from scores
) t
where rn = 1
order by person, score_type;
Using the max() aggregate function along with the grouping by person and score_type should do the trick.

Sql join and remove distinct in two separate column

I have table ordered
form_id | procedure_id
----------+-------------
101 | 24
101 | 23
101 | 22
102 | 7
102 | 6
102 | 3
102 | 2
And another table have table performed
form_id | procedure_id
----------+-------------
101 | 42
101 | 45
102 | 5
102 | 3
102 | 7
102 | 12
102 | 13
Expected output
form_id o_procedure_id p_procedure_id
101 24 42
101 23 45
101 22 NULL
102 7 7
102 6 5
102 3 3
102 2 12
102 NULL 13
I tried the below query:
with ranked as
(select
dense_rank() over (partition by po.form_id order by po.procedure_id) rn1,
dense_rank() over (partition by po.form_id order by pp.procedure_id) rn2,
po.form_id,
po.procedure_id,
pp.procedure_id
from ordered po,
performed pp where po.form_id = pp.form_id)
select ranked.* from ranked
--where rn1=1 or rn2=1
The above query return the value with repeat value ordered and procedure ID.
How to get Excepted output?
I wasn't quite sure how you would want to handle multiple null values and/or null values on both sides of your tables. My example therefor assumes the first table to be leading and include all entries while the second table might include holes. Query ain't pretty but i suppose it does what you expect it to:
select test1_sub.form_id, test1_sub.process_id as pid_1, test2_sub.process_id as pid_2 from (
select form_id,
process_id,
rank() over (partition by form_id order by process_id asc nulls last)
from test1) as test1_sub
left join (
select * from (
select form_id,
process_id,
rank() over (partition by form_id order by process_id asc nulls last)
from test2
) as test2_nonexposed
) as test2_sub on test1_sub.form_id = test2_sub.form_id and test1_sub.rank = test2_sub.rank;