How to optimize query - PostgreSQL

I have the same problem as described in "In SQL, how to select the top 2 rows for each group". The answer there works, but it takes too much time. How can I optimize this query?
Example:
sample_table
act_id  act_cnt
1 1
2 1
3 1
4 1
5 1
6 3
7 3
8 3
9 4
a 4
b 4
c 4
d 4
e 4
Now I want to group it (or use some other way) and select 2 rows from each group. Sample output:
act_id  act_cnt
1 1
2 1
6 3
7 3
9 4
a 4
I am new to SQL. How can I do this?

The answer you linked to uses an inefficient workaround for MySQL's lack of window functions.
Using a window function will most probably be much faster, as you only need to read the table once:
select name,
       score
from (
  select name,
         score,
         dense_rank() over (partition by name order by score desc) as rnk
  from the_table
) t
where rnk <= 2;
SQLFiddle: http://sqlfiddle.com/#!15/b0198/1
Having an index on (name, score) should speed up this query.
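For example (a sketch; the index name is made up):
create index the_table_name_score_idx on the_table (name, score);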
Edit after the question (and the problem) was changed:
select act_id,
       act_cnt
from (
  select act_id,
         act_cnt,
         row_number() over (partition by act_cnt order by act_id) as rn
  from sample_table
) t
where rn <= 2;
New SQLFiddle: http://sqlfiddle.com/#!15/fc44b/1
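By the same token, an index matching the new partitioning and ordering should help the rewritten query, e.g. (a sketch; the index name is made up):
create index sample_table_cnt_id_idx on sample_table (act_cnt, act_id);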

Related

SQL - add sequential counter column starting at condition

I have a table:
id market
1 mkt1
2 mkt2
3 mkt1
4 special
5 mkt2
6 mkt2
7 special
How can I select all columns from the table while also adding a sequential counter column, which starts counting once a condition has been triggered? In this example, once market = 'special' is first encountered:
id market count
1 mkt1 0
2 mkt2 0
3 mkt1 0
4 special 1
5 mkt2 2
6 mkt2 3
7 special 4
Here's one option using row_number with union all:
with cte as (
  -- id of the first 'special' row
  select min(id) as id from t where market = 'special'
)
-- rows before the first 'special' keep a counter of 0
select t.id, t.market, 0 rn
from t join cte on t.id < cte.id
union all
-- rows from the first 'special' onwards are numbered sequentially
select t.id, t.market, row_number() over (order by t.id) rn
from t join cte on t.id >= cte.id
Online Demo
Edited to use min after your edits...

Select rows with second highest value for each ID repeated multiple times

Id values
1 10
1 20
1 30
1 40
2 3
2 9
2 0
3 14
3 5
3 7
Answer should be
Id values
1 30
2 3
3 7
I tried the following:
Select distinct
  id,
  (select max(values)
   from table
   where values not in (select max(values) from table)
  )
You need the row_number window function. This adds a column with a row count for each group (in your case the ids). In a subquery you are able to ask for the second row of each group.
demo:db<>fiddle
SELECT
  id, values
FROM (
  SELECT
    *,
    row_number() OVER (PARTITION BY id ORDER BY values DESC)
  FROM
    table
) s
WHERE row_number = 2

combining results of CTEs

I have several CTEs. CTE1A counts the number of type A shops in area 1, CTE1B counts the number of type B shops in area 1, and so on up to CTE1D. Similarly, CTE2B counts the number of type B shops in area 2, and so on. The shop_types CTE selects all types of shops: A, B, C, D. How can I display a table that shows, for each area (column), how many shops there are of each type (rows)?
For example:
1 2 3 4 5
A 0 7 4 0 0
B 2 3 8 2 9
C 8 5 8 1 6
D 7 1 5 4 3
Database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (
  SELECT DISTINCT shops.shop_type AS type
  FROM shops
  WHERE shops.shop_type != '-9999'
    AND shops.shop_type != 'Other'
),
cte1A AS (
  SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
  FROM regions
    RIGHT JOIN shops ON shops.shop_id = regions.shop_id
  WHERE regions.region_id = 1
    AND shops.shop_type = 'A'
  GROUP BY shops.shop_type, regions.region_id
)
SELECT * FROM cte1A
I'm not entirely sure I understand what you are after, but it seems you are looking for something like this:
select sh.shop_type,
       count(case when r.region_id = 1 then 1 end) as region_1_count,
       count(case when r.region_id = 2 then 1 end) as region_2_count,
       count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
  left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one case statement for each region you want to have in the output.
If you are using Postgres 9.4 or later, you can replace the case expressions with a FILTER clause, which makes the intention a bit easier to understand (I think):
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
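Put together, the FILTER version of the whole query would look something like this (a sketch, assuming Postgres 9.4+ and the same shops and regions tables):
select sh.shop_type,
       count(*) filter (where r.region_id = 1) as region_1_count,
       count(*) filter (where r.region_id = 2) as region_2_count,
       count(*) filter (where r.region_id = 3) as region_3_count
from shops sh
  left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;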
And before you ask: no you can't make the number of columns "dynamic" based on a select statement. The column list for a query must be defined before the statement is actually executed.

How to number rows with a repeating 1,2,3,4, 1,2,3,4,... series

How can I add a repeating series of length 4 to a table like this:
Source table:
id
1
2
3
4
5
6
7
8
Results table:
id series
1 1
2 2
3 3
4 4
5 1
6 2
7 3
8 4
I'm using PostgreSQL 9.1.
If your IDs are really consecutive and gapless, you can just use (id - 1) % 4 + 1. But I imagine that in reality your IDs aren't so orderly, and if they're generated from a SEQUENCE you shouldn't rely on them being gapless.
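As a quick illustration of that shortcut (a sketch; it only produces the desired 1,2,3,4 pattern if the IDs really start at 1 with no gaps):
SELECT id, (id - 1) % 4 + 1 AS series
FROM Table1
ORDER BY id;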
You can do it properly with row_number(), as shown here: http://sqlfiddle.com/#!12/22767/5
SELECT
  id,
  (row_number() OVER (ORDER BY id) - 1) % 4 + 1
FROM Table1
ORDER BY 1;
It's also possible to do it using generate_series as a set-returning function in the SELECT list, but that's a PostgreSQL extension, whereas the above is standard SQL that'll work in any modern database except MySQL, which doesn't support window functions.
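If you're curious, that generate_series variant might look something like this (a sketch; it relies on PostgreSQL expanding set-returning functions in the SELECT list in lockstep when they return the same number of rows, so treat it as a curiosity rather than a recommendation):
SELECT unnest(array_agg(id ORDER BY id)) AS id,
       (generate_series(1, count(*)::int) - 1) % 4 + 1 AS series
FROM Table1;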
If you want to actually add a column to the table it gets a bit more complicated. I'm not really sure why you'd want to do that, but it's possible using UPDATE ... FROM:
BEGIN;

ALTER TABLE table1 ADD COLUMN col2 INTEGER;

WITH gen_num(id, n) AS (
  SELECT
    id,
    (row_number() OVER (ORDER BY id) - 1) % 4 + 1
  FROM Table1
  ORDER BY 1
)
UPDATE table1 SET col2 = n
FROM gen_num
WHERE gen_num.id = table1.id;

COMMIT;

SQL Server Multiple Running Totals

I have a table like this
UserID Score Date
5 6 2010-1-1
7 8 2010-1-2
5 4 2010-1-3
6 3 2010-1-4
7 4 2010-1-5
6 1 2010-1-6
I would like to get a table like this
UserID Score RunningTotal Date
5 6 6 2010-1-1
5 4 10 2010-1-3
6 3 3 2010-1-4
6 1 4 2010-1-6
7 8 8 2010-1-2
7 4 12 2010-1-5
Thanks!
Unlike Oracle, PostgreSQL and even MySQL, SQL Server has no efficient way to calculate running totals.
If you have few scores per UserID, you can use this:
WITH ranked AS (
  SELECT UserID,
         Score,
         Date,
         ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Date) AS rn
  FROM scores
)
SELECT so.UserID,
       so.Score,
       (
         SELECT SUM(si.Score)
         FROM ranked si
         WHERE si.UserID = so.UserID
           AND si.rn <= so.rn
       ) AS RunningTotal,
       so.Date
FROM ranked so
However, this will be very inefficient for larger tables.
For larger tables, you could benefit from using (God help me) a cursor.
Would something like this work for you...?
SELECT UserID, Score,
       (SELECT SUM(Score)
        FROM TableName innerTable
        WHERE innerTable.UserID = outerTable.UserID
          AND innerTable.Date <= outerTable.Date) AS RunningTotal
FROM TableName outerTable
This assumes, though, that a user cannot have more than one score per day. (What is your PK?)
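As an aside, on SQL Server 2012 and later a windowed SUM can express the running total directly (a sketch, assuming the table and column names used above):
SELECT UserID, Score,
       SUM(Score) OVER (PARTITION BY UserID
                        ORDER BY Date
                        ROWS UNBOUNDED PRECEDING) AS RunningTotal,
       Date
FROM TableName
ORDER BY UserID, Date;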