Select the row with the second highest value for each repeated ID - PostgreSQL

Id values
1 10
1 20
1 30
1 40
2 3
2 9
2 0
3 14
3 5
3 7
Answer should be
Id values
1 30
2 3
3 7
I tried as below:
SELECT DISTINCT
  id,
  (SELECT max(values)
   FROM table
   WHERE values NOT IN (SELECT max(values) FROM table)
  )

You need the row_number window function. It adds a column numbering the rows within each group (in your case, the ids). In a subquery you can then filter for the second row of each group.
demo:db<>fiddle
SELECT
  id, values
FROM (
  SELECT
    *,
    row_number() OVER (PARTITION BY id ORDER BY values DESC) AS rn
  FROM
    table
) s
WHERE rn = 2
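Note: with row_number, duplicate values within an id consume ranks, so a tie on the top value would push the second-highest distinct value to row 3 or later. If that matters, dense_rank() is the alternative (a sketch under that assumption; it returns every row carrying the second-highest distinct value):
SELECT
  id, values
FROM (
  SELECT
    *,
    dense_rank() OVER (PARTITION BY id ORDER BY values DESC) AS rnk
  FROM
    table
) s
WHERE rnk = 2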

Related

PySpark subtract last row from first row in a group

I want to use a window function to partition by ID, subtract the first row of each group from the last row, and create a separate column with the output. What is the cleanest way to achieve that result?
ID col1
1 1
1 2
1 4
2 1
2 1
2 6
3 5
3 5
3 7
Desired output:
ID col1 col2
1 1 3
1 2 3
1 4 3
2 1 5
2 1 5
2 6 5
3 5 2
3 5 2
3 7 2
Code below:
from pyspark.sql import Window
from pyspark.sql.functions import first, last

w = Window.partitionBy('ID').orderBy('col1').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
df.withColumn('out', last('col1').over(w) - first('col1').over(w)).show()
Sounds like you’re defining the “first” row as the row with the minimum value of col1 in the group, and the “last” row as the row with the maximum value of col1 in the group. To compute them, you can use the MIN and MAX window functions:
SELECT
ID,
col1,
(MAX(col1) OVER (PARTITION BY ID)) - (MIN(col1) OVER (PARTITION BY ID)) AS col2
FROM
...
If you’re defining “first” and “last” row somehow differently (e.g., in terms of some timestamp), you can use the more general FIRST_VALUE and LAST_VALUE window functions:
SELECT
ID,
col1,
(LAST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING))
-
(FIRST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING))
AS col2
FROM
...
The two snippets above are equivalent, but the latter is more general: you can specify ordering by a different column and/or you can modify the window specification.
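As a quick sanity check, here is the first snippet run against the sample data from the question, with the rows inlined in a VALUES list (this runs as-is in PostgreSQL):
SELECT ID, col1,
       MAX(col1) OVER (PARTITION BY ID) - MIN(col1) OVER (PARTITION BY ID) AS col2
FROM (VALUES
        (1, 1), (1, 2), (1, 4),
        (2, 1), (2, 1), (2, 6),
        (3, 5), (3, 5), (3, 7)
     ) AS t(ID, col1);
It returns exactly the desired output above.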

SQL - add sequential counter column starting at condition

I have a table:
id market
1 mkt1
2 mkt2
3 mkt1
4 special
5 mkt2
6 mkt2
7 special
How can I select all columns from the table while also adding a sequential counter column that starts counting once a condition is triggered? In this example, when market = 'special':
id market count
1 mkt1 0
2 mkt2 0
3 mkt1 0
4 special 1
5 mkt2 2
6 mkt2 3
7 special 4
Here's one option using row_number with union all:
with cte as (
select min(id) as id from t where market = 'special'
)
select t.id, t.market, 0 rn
from t join cte on t.id < cte.id
union all
select t.id, t.market, row_number() over (order by t.id) rn
from t join cte on t.id >= cte.id
Online Demo
Edited to use min after your edits...
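If you are on PostgreSQL, a single pass over the table also works: compute a running "started" flag and count rows only once the flag is set (a sketch, assuming id defines the row order):
select id, market,
       -- running count of rows, counting only from the first 'special' row on
       count(*) filter (where started) over (order by id) as count
from (
  select *,
         -- becomes true on the first 'special' row and stays true afterwards
         bool_or(market = 'special') over (order by id) as started
  from t
) s;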

PostgreSQL materialized view with global and partitioned ranks

I have a table with 10 million user ratings.
I need to create a materialized view that holds a global rank and a rank by country, refreshed once a day.
I came up with the following select query:
SELECT row_number() OVER(ORDER BY value DESC, id) AS rank_global,
row_number() OVER(PARTITION BY country ORDER BY value DESC, id) AS rank_country,
*
FROM rate
ORDER BY value DESC, id
LIMIT 100000
Is there a way to speed up this query, or is there another way to achieve the same result? I created btree (value DESC, id) and (country, value DESC, id) indexes, but it still takes a long time to complete.
Example:
Creating a table and populating it with users with random value column and random country:
CREATE TABLE rate
(
id serial NOT NULL,
name text,
value integer NOT NULL DEFAULT 0,
country character varying,
CONSTRAINT rate_pkey PRIMARY KEY (id)
);
INSERT INTO rate
(SELECT n, ('user_'||n), (random()*30)::int, ('country_'||(random()*3)::int)
FROM generate_series(0,10) AS n);
CREATE INDEX rate_country_value_id_index
ON rate
USING btree(country, value DESC, id);
CREATE INDEX rate_value_id_index
ON rate
USING btree(value DESC, id);
Table contents:
id name value country
0 user_0 28 country_2
1 user_1 24 country_2
2 user_2 29 country_1
3 user_3 11 country_1
4 user_4 16 country_1
5 user_5 28 country_0
6 user_6 3 country_1
7 user_7 7 country_1
8 user_8 28 country_1
9 user_9 4 country_0
10 user_10 29 country_1
Then I create materialized view:
CREATE MATERIALIZED VIEW rate_view AS
SELECT row_number() OVER (ORDER BY value DESC, id) AS rgl,
row_number() OVER (PARTITION BY country ORDER BY value DESC, id) AS rc,
*
FROM rate
ORDER BY value DESC, id;
View contents (rgl - global rank, rc - rank by country):
rgl rc id name value country
1 1 2 user_2 29 country_1
2 2 10 user_10 29 country_1
3 1 5 user_5 28 country_0
4 1 0 user_0 28 country_2
5 3 8 user_8 28 country_1
6 2 1 user_1 24 country_2
7 4 4 user_4 16 country_1
8 5 3 user_3 11 country_1
9 6 7 user_7 7 country_1
10 2 9 user_9 4 country_0
11 7 6 user_6 3 country_1
Now I can create complex queries to select users with the closest rank and their neighbours by rank, both globally and by country.
For example (after creating (value, id) and (rgl) indexes on the view), here are the global top 50 and the 5 users closest by rank to value 9999942:
(
WITH closest_rank AS
(
SELECT rgl FROM rate_view
WHERE value <= 9999942
ORDER BY value DESC, id ASC
LIMIT 1
)
SELECT rgl, name, value
FROM rate_view
WHERE rgl > (SELECT rgl-3 FROM closest_rank )
ORDER BY rgl ASC
LIMIT 5
)
UNION
SELECT rgl, name, value
FROM rate_view
WHERE rgl <=50
ORDER BY rgl;
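For the daily refresh itself, REFRESH MATERIALIZED VIEW can be run from cron; CONCURRENTLY keeps the view readable during the refresh, but it requires a unique index on the view:
-- unique index required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX rate_view_id_index ON rate_view (id);
-- run once a day
REFRESH MATERIALIZED VIEW CONCURRENTLY rate_view;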

Postgresql difference between rows

My data:
id value
1 10
1 20
1 60
2 10
3 10
3 30
How to compute column 'change'?
id value change | my comment, how to compute
1 10 10 | 20-10
1 20 40 | 60-20
1 60 40 | default_value-60. In this example default_value=100
2 10 90 | default_value-10
3 10 20 | 30-10
3 30 70 | default_value-30
In other words: if row of id is last, then compute 100-value,
else compute next_value-value_now
You can access the value of the "next" (or "previous") row using a window function. The concept of a "next" row only makes sense if you have a column to define an order on the rows. You said you have a date column on which you can order the result. I used the column name your_date_column for this. You need to replace that with the actual column name of course.
select id,
value,
lead(value, 1, 100) over (partition by id order by your_date_column) - value as change
from the_table
order by id, your_date_column
lead(value, 1, 100) says: take the column value of the "next" row (that's the 1). If there is no such row, use the default value 100 instead.
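As a quick check against the sample data, with a hypothetical ts column standing in for the date column that defines the order:
select id,
       value,
       lead(value, 1, 100) over (partition by id order by ts) - value as change
from (values
        (1, 10, 1), (1, 20, 2), (1, 60, 3),
        (2, 10, 1),
        (3, 10, 1), (3, 30, 2)
     ) as t(id, value, ts)
order by id, ts;
This produces exactly the change column from the question.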
Join on a subquery and use ROW_NUMBER to find the last value per group
WITH cte AS (
  SELECT id, value,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn,
         LEAD(value) OVER (PARTITION BY id ORDER BY date) - value AS change
  FROM t
)
SELECT cte.id, cte.value,
       CASE WHEN cte.change IS NULL THEN 100 - cte.value ELSE cte.change END AS change
FROM cte
LEFT JOIN (SELECT id, MAX(rn) AS mrn
           FROM cte
           GROUP BY id) AS x
       ON x.mrn = cte.rn AND cte.id = x.id
FIDDLE

How to optimize query

I have the same problem as mentioned in "In SQL, how to select the top 2 rows for each group". The answer there works fine, but it takes too much time. How can I optimize this query?
Example:
sample_table
act_id act_cnt
1 1
2 1
3 1
4 1
5 1
6 3
7 3
8 3
9 4
a 4
b 4
c 4
d 4
e 4
Now I want to group it (or use some other way) and select 2 rows from each group. Sample output:
act_id act_cnt
1 1
2 1
6 3
7 3
9 4
a 4
I am new to SQL. How can I do this?
The answer you linked to uses an inefficient workaround for MySQL's lack of window functions.
Using a window function is most probably much faster as you only need to read the table once:
select name,
score
from (
select name,
score,
dense_rank() over (partition by name order by score desc) as rnk
from the_table
) t
where rnk <= 2;
SQLFiddle: http://sqlfiddle.com/#!15/b0198/1
Having an index on (name, score) should speed up this query.
Edit after the question (and the problem) has been changed
select act_id,
act_cnt
from (
select act_id,
act_cnt,
row_number() over (partition by act_cnt order by act_id) as rn
from sample_table
) t
where rn <= 2;
New SQLFiddle: http://sqlfiddle.com/#!15/fc44b/1
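Analogous to the first query, an index covering the partition and ordering columns should help here too (a sketch, assuming PostgreSQL as in the fiddles above):
create index on sample_table (act_cnt, act_id);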