How to retrieve the top 3 results for each column in PostgreSQL?

I have been given a question. The table looks like this:
STATE | year1 | ... | year 10
AP | 100 | ... | 120
assam | 13 | .. | 42
madhya pradesh | 214 | ... | 421
Now I need to get the top 3 states for each year.
I have tried everything I can think of, but I am not able to filter the results per column.

You have a design problem. Enumerated columns are almost always a sign of bad design.
For now, you could unpivot using unnest() and then use the window function row_number() to get the top 3 states per year (adjust the column names year_1 to year_10 to match your table):
with unpivoted as (
    select state,
           unnest(array[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) as year,
           unnest(array[
               year_1, year_2, year_3,
               year_4, year_5, year_6,
               year_7, year_8, year_9,
               year_10
           ]) as value
    from your_table
)
select *
from (
    select t.*,
           row_number() over (  -- rank() instead of row_number() would keep ties
               partition by year
               order by value desc
           ) as seqnum
    from unpivoted t
) t
where seqnum <= 3;
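To sanity-check which states the query should return, here is a plain-Python sketch of the same two steps: unpivot, then rank per year and keep the top 3. It is only an illustration outside the database; the function name top_n_per_year and the input shape (state, list of yearly values) are assumptions, not part of the answer.

```python
# Illustrative sketch; top_n_per_year is a hypothetical helper, not part of the answer.
from collections import defaultdict


def top_n_per_year(rows, n=3):
    """Return {year: [state, ...]} with the n highest-valued states per year.

    rows is assumed to be a list of (state, [value_year_1, ..., value_year_k]).
    """
    # Step 1: unpivot, like the two parallel unnest() calls:
    # one (state, value) pair per year instead of one wide row per state.
    by_year = defaultdict(list)
    for state, values in rows:
        for year, value in enumerate(values, start=1):
            by_year[year].append((state, value))

    # Step 2: "partition by year order by value desc", then keep seqnum <= n.
    # sorted() is stable, so ties keep input order; row_number() in SQL
    # makes no such promise.
    return {
        year: [state for state, _ in sorted(pairs, key=lambda p: p[1], reverse=True)[:n]]
        for year, pairs in by_year.items()
    }
```

With made-up figures such as [("AP", [100, 120]), ("assam", [13, 42]), ("madhya pradesh", [214, 421]), ("kerala", [50, 10])], year 1 yields madhya pradesh, AP and kerala.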

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows where I don't, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value. Or maybe there's some way I can use a recursive query (WITH RECURSIVE).
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
Step-by-step demo: db<>fiddle
SELECT
    *,
    COALESCE(                                   -- 3
        first_value(override_as_number) OVER w  -- 2
        , 1
    )
    + row_number() OVER w - 1                   -- 4, 5
FROM (
    SELECT
        *,
        SUM(                                    -- 1
            CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
        ) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
        AS grouped
    FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by);
1. Create a new subpartition within your partitions: this cumulative sum creates a unique group id for every group of records that starts with a non-NULL override_as_number followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) rows belong to the same subpartition/group.
2. first_value() gives the first value of each such subpartition.
3. The COALESCE ensures a non-NULL result from first_value() if your partition starts with a NULL record.
4. row_number() - 1 creates a row count within a subpartition, starting with 0.
5. Adding the row count to the subpartition's first_value() produces your result: the non-NULL record that begins a subpartition keeps its own value (plus a row count of 0), the first following NULL record gets that value + 1, and so forth.
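The five steps can be traced in plain Python. This is a sketch for checking the logic, not a replacement for the query; the function name dynamic_numbers and the tuple layout (grouped_by, secondary_order_by, override_as_number) are assumptions, with None standing in for NULL.

```python
# Illustrative sketch; dynamic_numbers is a hypothetical helper, not part of the answer.
from itertools import groupby


def dynamic_numbers(rows):
    """rows: iterable of (grouped_by, secondary_order_by, override_as_number or None).

    Returns (grouped_by, secondary_order_by, number) tuples sorted like the query.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    # PARTITION BY grouped_by ORDER BY secondary_order_by
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        first_value = None  # first_value(override_as_number) of the current subpartition
        offset = -1         # row_number() - 1 within the current subpartition
        for grouped_by, order_key, override in partition:
            if override is not None:
                # Step 1: a non-NULL override bumps the running SUM, i.e. starts a new subpartition.
                first_value, offset = override, 0
            else:
                offset += 1
            # Steps 2-5: COALESCE(first_value, 1) + row_number() - 1
            start = first_value if first_value is not None else 1
            result.append((grouped_by, order_key, start + offset))
    return result
```

Fed the sample rows from the question's fiddle (as listed in the output table further down), it reproduces the desired_dynamic_number column.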
The query below gives the exact result, but you should verify it against all combinations:
select c.*, COALESCE(c.override_as_number, c.act) as final
from (
    select b.*,
           dense_rank() over (partition by grouped_by order by grouped_by, actual) as act
    from (
        select a.*, COALESCE(override_as_number, row_num) as actual
        from (
            select grouped_by, secondary_order_by,
                   dense_rank() over (partition by grouped_by order by grouped_by, secondary_order_by) as row_num,
                   override_as_number, desired_dynamic_number
            from fiddle
        ) a
    ) b
) c;
Column "final" is the result:
grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
AAA | a | 1 | 1 | 1 | 1 | 1 | 1
AAA | b | 2 | | 2 | 2 | 2 | 2
AAA | c | 3 | 3 | 3 | 3 | 3 | 3
AAA | d | 4 | 3 | 3 | 3 | 3 | 3
AAA | e | 5 | | 4 | 5 | 4 | 4
AAA | f | 6 | | 5 | 6 | 5 | 5
AAA | g | 7 | 999 | 999 | 999 | 6 | 999
XYZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | b | 2 | | 2 | 2 | 2 | 2
(10 rows)
Hope this helps!
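For comparison, here are the same layers (row_num, actual, act, final) in plain Python. This is a sketch under the same assumptions as the query, with None for NULL; the function name dense_rank_numbers is hypothetical.

```python
# Illustrative sketch; dense_rank_numbers is a hypothetical helper, not part of the answer.
from itertools import groupby


def dense_rank_numbers(rows):
    """rows: iterable of (grouped_by, secondary_order_by, override_as_number or None)."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        partition = list(partition)
        # row_num: dense_rank() over secondary_order_by within the partition
        key_rank = {k: i for i, k in enumerate(sorted({key for _, key, _ in partition}), start=1)}
        # actual: COALESCE(override_as_number, row_num)
        actual = [ov if ov is not None else key_rank[key] for _, key, ov in partition]
        # act: dense_rank() over actual (equal values share a rank, no gaps)
        act_rank = {v: i for i, v in enumerate(sorted(set(actual)), start=1)}
        for (grouped_by, key, ov), a in zip(partition, actual):
            # final: COALESCE(override_as_number, act)
            result.append((grouped_by, key, ov if ov is not None else act_rank[a]))
    return result
```

On the sample it matches desired_dynamic_number, but other combinations can differ from the first_value() approach. For example, rows a (NULL), b (override 5), c (NULL) give 1, 5, 2 here, whereas the first_value() query gives 1, 5, 6.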
The real-world problem I was trying to solve did not have a nicely ordered secondary_order_by column; instead it would be something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem, a colleague solved it using a Cartesian join (cross join), and I'm posting their solution below. It is written in Snowflake SQL, which should be possible to adapt to Postgres. It does fall down on higher override_as_number values, though, unless the 1000 in from table(generator(rowcount => 1000)) is increased to something suitably high.
The SQL:
with tally_table as (
    select row_number() over (order by seq4()) as gen_list
    from table(generator(rowcount => 1000))
),
base as (
    select *,
           IFF(override_as_number IS NULL,
               row_number() OVER (PARTITION BY grouped_by, override_as_number ORDER BY random),
               override_as_number) as rownum
    from "SANDPIT"."TEST"."SAMPLEDATA"
    order by grouped_by, override_as_number, random
), -- select * from base order by grouped_by, random;
cart_product as (
    select *
    from tally_table
    cross join (select distinct grouped_by from base) as distinct_grouped_by
), -- select * from cart_product;
filter_product as (
    select *,
           row_number() OVER (PARTITION BY cart_product.grouped_by
                              ORDER BY cart_product.grouped_by, gen_list) as seq_order
    from cart_product
    where CONCAT(grouped_by, '~', gen_list) NOT IN (
        select CONCAT(grouped_by, '~', override_as_number)
        from base
        where override_as_number is not null
    )
) -- select * from filter_product order by 2, 3;
select base.grouped_by,
       base.random,
       base.override_as_number,
       base.answer, -- This is hard coded as test data
       IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base
inner join filter_product
    on base.rownum = filter_product.seq_order
   and base.grouped_by = filter_product.grouped_by
order by base.grouped_by, random;
In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.
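As I read the Snowflake query, rows with an override keep it, and the remaining rows of each group take the smallest numbers not claimed by any override, in the group's (random or created-timestamp) order. Below is a plain-Python sketch of that reading, without the 1000-row tally limit. The function name fill_unused_numbers and the input shape are assumptions, and the rows are assumed to already be in the desired order.

```python
# Illustrative sketch of my reading of the Snowflake query; fill_unused_numbers is hypothetical.
from itertools import count


def fill_unused_numbers(rows):
    """rows: list of (grouped_by, override_as_number or None), already in processing order.

    Returns one number per row, in the same order.
    """
    # Numbers claimed by overrides, per group (the NOT IN filter on the tally).
    taken = {}
    for group, override in rows:
        if override is not None:
            taken.setdefault(group, set()).add(override)

    counters = {}  # one lazily advancing 1, 2, 3, ... sequence per group
    result = []
    for group, override in rows:
        if override is not None:
            result.append(override)
            continue
        counter = counters.setdefault(group, count(1))
        number = next(counter)
        while number in taken.get(group, ()):
            number = next(counter)  # skip numbers reserved by overrides
        result.append(number)
    return result
```

Because the counter is unbounded, large override values need no generator size tuning.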

How to write a select query for displaying data on a table in another way using Postgresql?

I want to write a select query that picks data from the table shown in the first image (PICTURE_1, "Table Containing Data") and displays it as in the second image (PICTURE_2, "Result of the query").
About the data: the first picture shows data logged into a table over 2 seconds from 3 IDs (1, 2 and 3), each having 2 sub-IDs (aa and bb). Values and timestamps are also shown in the picture. The table contains only 3 columns, as shown in PICTURE_1. Could you help me write a PostgreSQL query that displays the data as shown in the second image? You can extract the ID name using the substring function. The language I'm using is PL/pgSQL. Any ideas or logic would also be good. Thank you for your time.
Please try this. It shows the row values as columns, using a CTE.
-- PostgreSQL(v11)
WITH cte_t AS (
SELECT LEFT(name, 1) id
, RIGHT(name, POSITION('.' IN REVERSE(name)) - 1) t_name
, value
, time_stamp
FROM test
)
SELECT id
, time_stamp :: DATE "date"
, time_stamp :: TIME "time"
, MAX(CASE WHEN t_name = 'aa' THEN value END) "aa"
, MAX(CASE WHEN t_name = 'bb' THEN value END) "bb"
FROM cte_t
GROUP BY id, time_stamp
ORDER BY date, time, id;
You can check it at https://dbfiddle.uk/?rdbms=postgres_11&fiddle=6d35047560b3f83e6c906584b23034e9
Check this query on dbfiddle:
with cte (name, value, timeStamp) as (values
    ('1.aa', 1, '2021-08-20 10:10:01'),
    ('2.aa', 2, '2021-08-20 10:10:01'),
    ('3.aa', 3, '2021-08-20 10:10:01'),
    ('1.bb', 4, '2021-08-20 10:10:01'),
    ('2.bb', 5, '2021-08-20 10:10:01'),
    ('3.bb', 6, '2021-08-20 10:10:01'),
    ('1.aa', 7, '2021-08-20 10:10:02'),
    ('2.aa', 8, '2021-08-20 10:10:02'),
    ('3.aa', 9, '2021-08-20 10:10:02'),
    ('1.bb', 0, '2021-08-20 10:10:02'),
    ('2.bb', 1, '2021-08-20 10:10:02'),
    ('3.bb', 2, '2021-08-20 10:10:02')
), sub_cte as (
    select split_name[1] as id, split_name[2] as name, value, tt::date as date, tt::time as time
    from (
        select
            regexp_split_to_array(name, '\.') as split_name,
            value,
            -- HH24 (24-hour clock): with HH, hours above 12 would be rejected
            to_timestamp(timestamp, 'YYYY-MM-DD HH24:MI:SS') as tt
        from cte
    ) foo
)
select id, date, time, a.value as aa, b.value as bb
from sub_cte a
left join (
    select * from sub_cte where name = 'bb'
) as b using (id, date, time)
where a.name = 'aa'
order by date, time, id;
Result
id | date | time | aa | bb
----+------------+----------+----+----
1 | 2021-08-20 | 10:10:01 | 1 | 4
2 | 2021-08-20 | 10:10:01 | 2 | 5
3 | 2021-08-20 | 10:10:01 | 3 | 6
1 | 2021-08-20 | 10:10:02 | 7 | 0
2 | 2021-08-20 | 10:10:02 | 8 | 1
3 | 2021-08-20 | 10:10:02 | 9 | 2
(6 rows)
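Both answers do the same thing: split name into an ID and a sub-ID, then spread the aa and bb values for each (ID, timestamp) into their own columns. Here is a plain-Python sketch of that pivot, for illustration only. The function name pivot_readings is hypothetical and, like the queries, it assumes names of the form '<id>.<sub_id>' and only the sub-IDs aa and bb.

```python
# Illustrative sketch; pivot_readings is a hypothetical helper, not part of either answer.
def pivot_readings(readings):
    """readings: iterable of (name, value, 'YYYY-MM-DD HH:MM:SS') tuples.

    Returns (id, date, time, aa, bb) rows ordered by date, time, id.
    Assumes at most one reading per (id, sub_id, timestamp).
    """
    pivoted = {}
    for name, value, stamp in readings:
        row_id, sub_id = name.split(".", 1)  # '1.aa' -> ('1', 'aa')
        date, time = stamp.split(" ")        # split the timestamp text into date and time
        cells = pivoted.setdefault((date, time, row_id), {"aa": None, "bb": None})
        cells[sub_id] = value                # plays the role of MAX(CASE WHEN t_name = 'aa' ...)
    # IDs are compared as text here, which is fine for single-digit IDs.
    return [
        (row_id, date, time, cells["aa"], cells["bb"])
        for (date, time, row_id), cells in sorted(pivoted.items())
    ]
```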

Making a dynamic custom series in postgresql (avoiding loop if possible)

I'm new to PostgreSQL and I'm trying to do something that would require a loop in T-SQL.
I did some research on how to loop in PostgreSQL and found out that I would have to create some sort of function first, which I'm trying to avoid.
I have a main table of
SELECT 17 as employeecount, 'Aug-2020' as month
UNION
SELECT 22, 'Sep-2020'
UNION
SELECT 27, 'Oct-2020'
I would need an output that increments 1 to x(employeecount) per month like below:
SELECT 1 as employeecount, 'Aug-2020' as month
UNION
SELECT 2, 'Aug-2020'
UNION
SELECT 3, 'Aug-2020'
........... up to 17, 'Aug-2020'
UNION
SELECT 1, 'Sep-2020'
UNION
... up to 22, 'Sep-2020' and so on
or
---------------------
increment | Month |
1 | Aug-2020|
2 | Aug-2020|
3 | Aug-2020|
4 | Aug-2020|
. | Aug-2020|
. | Aug-2020|
17 | Aug-2020|
1 | Sep-2020|
2 | Sep-2020|
. | Sep-2020|
. | Sep-2020|
22 | Sep-2020|
I'm trying to avoid looping but if there's no other way, then it'd be fine.
Thanks in advance!
Use a lateral join to generate_series():
with main (employeecount, month) as (
values (17, 'Aug-2020'), (22, 'Sep-2020'), (27, 'Oct-2020')
)
select increment, month
from main
cross join lateral generate_series(1, employeecount) as gs(increment);
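The lateral join emits one row per number from 1 to employeecount for every row of main. Here is the same expansion as a plain-Python sketch; the function name expand_counts is made up for illustration.

```python
# Illustrative sketch; expand_counts is a hypothetical helper, not part of the answer.
def expand_counts(main):
    """main: iterable of (employeecount, month) pairs.

    Returns (increment, month) rows, like
    main CROSS JOIN LATERAL generate_series(1, employeecount).
    """
    return [
        (increment, month)
        for employeecount, month in main
        for increment in range(1, employeecount + 1)  # inclusive upper bound, like generate_series
    ]
```

A count of 0 yields no rows, which matches generate_series(1, 0) returning an empty set.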

how to efficiently locate a value from one table among values from another table, with SQL

I have a problem in PostgreSQL that I find difficult even to describe in the title: I have two tables, each containing a range of values that are very similar but not identical. Suppose I have values like 0, 10, 20, 30, ... in one, and 1, 5, 6, 9, 10, 12, 19, 25, 26, ... in the second one (these are milliseconds). For each value of the second one I want to find the values immediately lower and higher in the first one. So, for the value 12 it would give me 10 and 20. I'm doing it like this:
SELECT s.*, MAX(v1."millisec") AS low_v, MIN(v2."millisec") AS high_v
FROM "signals" AS s, "tracks" AS v1, "tracks" AS v2
WHERE v1."millisec" <= s."d_time"
AND v2."millisec" > s."d_time"
GROUP BY s."d_time", s."field2"; -- this is just an example
And it works... but it is very slow once I process several thousand rows, even with indexes on s."d_time" and v.millisec. So I think there must be a much better way to do it, but I fail to find one. Could anyone help me?
Try:
select s.*,
(select millisec
from tracks t
where t.millisec <= s.d_time
order by t.millisec desc
limit 1
) as low_v,
(select millisec
from tracks t
where t.millisec > s.d_time
order by t.millisec asc
limit 1
) as high_v
from signals s;
Be sure you have an index on tracks.millisec.
If you have just created the index, you may also need to ANALYZE the table so the planner has up-to-date statistics to take advantage of it.
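Each correlated subquery is essentially a predecessor/successor lookup in the sorted index on tracks.millisec. The plain-Python sketch below performs the same lookup with bisect on a sorted list, where the list stands in for the B-tree index. The function name bracket is hypothetical.

```python
# Illustrative sketch; bracket is a hypothetical helper, not part of the answer.
from bisect import bisect_right


def bracket(tracks, d_time):
    """Return (low_v, high_v) for one signal.

    tracks must be sorted ascending. low_v is the largest value <= d_time and
    high_v the smallest value > d_time; None plays the role of a NULL result
    when no such row exists.
    """
    i = bisect_right(tracks, d_time)  # first position whose value is > d_time
    low_v = tracks[i - 1] if i > 0 else None
    high_v = tracks[i] if i < len(tracks) else None
    return low_v, high_v
```

As in the query, a signal equal to a track value (10) gets that value as its low_v, because the conditions are <= and >.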
Naive (trivial) way to find the preceding and next value.
-- the data (this could have been part of the original question)
CREATE TABLE table_one (id SERIAL NOT NULL PRIMARY KEY
    , msec INTEGER NOT NULL -- an index might help
    );
CREATE TABLE table_two (id SERIAL NOT NULL PRIMARY KEY
    , msec INTEGER NOT NULL -- an index might help
    );
INSERT INTO table_one(msec) VALUES (0), (10), (20), (30);
INSERT INTO table_two(msec) VALUES (1), (5), (6), (9), (10), (12), (19), (25), (26);
-- The query: find the lower/higher values in table one,
-- but with no values between "us" and "them".
--
SELECT this.msec AS this
    , prev.msec AS prev
    , next.msec AS next
FROM table_two this
LEFT JOIN table_one prev ON prev.msec < this.msec
    AND NOT EXISTS (SELECT 1 FROM table_one nx WHERE nx.msec < this.msec AND nx.msec > prev.msec)
LEFT JOIN table_one next ON next.msec > this.msec
    AND NOT EXISTS (SELECT 1 FROM table_one nx WHERE nx.msec > this.msec AND nx.msec < next.msec)
;
Result:
CREATE TABLE
CREATE TABLE
INSERT 0 4
INSERT 0 9
this | prev | next
------+------+------
1 | 0 | 10
5 | 0 | 10
6 | 0 | 10
9 | 0 | 10
10 | 0 | 20
12 | 10 | 20
19 | 10 | 20
25 | 20 | 30
26 | 20 | 30
(9 rows)
Try this:
select *
from signals s,
     (select millisec as low_value,
             lead(millisec) over (order by millisec) as high_value
      from tracks) intervals
where s.d_time between low_value and high_value - 1;
For this type of problem, window functions are ideal; see: http://www.postgresql.org/docs/9.1/static/tutorial-window.html
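The lead() subquery turns the sorted track values into consecutive [low_value, high_value) intervals, and the join then picks the interval that contains each signal. The plain-Python sketch below follows that join literally, using nested loops, so it shows the logic rather than the performance. The function names are hypothetical.

```python
# Illustrative sketch; lead_intervals and match_signals are hypothetical helpers.
def lead_intervals(tracks):
    """Pair each value with the next one in order, like
    lead(millisec) OVER (ORDER BY millisec); the last value gets None (NULL)."""
    ordered = sorted(tracks)
    return list(zip(ordered, ordered[1:] + [None]))


def match_signals(signals, tracks):
    """Return (d_time, low_value, high_value) for each signal inside an interval."""
    intervals = lead_intervals(tracks)
    matches = []
    for d_time in signals:
        for low_value, high_value in intervals:
            # BETWEEN low_value AND high_value - 1 on integers means low <= d < high;
            # a NULL high_value makes the condition NULL, so that pairing is dropped.
            if high_value is not None and low_value <= d_time <= high_value - 1:
                matches.append((d_time, low_value, high_value))
    return matches
```

Signals at or beyond the largest track value (35 in the test) are dropped, just as with the inner join in the SQL.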

sum every 3 rows of a table

I have the following query to count all rows per minute.
$sql= "SELECT COUNT(*) AS count, date_trunc('minute', date) AS momento
FROM p WHERE fk_id_b=$id_b GROUP BY date_trunc('minute', date)
ORDER BY momento ASC";
What I need is, for each row, the sum of its count and the counts of the 2 previous minutes.
For example, with the result of the $sql query above:
|-------date---------|----count----|
|2012-06-21 05:20:00 | 12 |
|2012-06-21 05:21:00 | 14 |
|2012-06-21 05:22:00 | 10 |
|2012-06-21 05:23:00 | 20 |
|2012-06-21 05:24:00 | 25 |
|2012-06-21 05:25:00 | 30 |
|2012-06-21 05:26:00 | 10 |
I want this result:
|-------date---------|----count----|
|2012-06-21 05:20:00 | 12 |
|2012-06-21 05:21:00 | 26 | 12+14
|2012-06-21 05:22:00 | 36 | 12+14+10
|2012-06-21 05:23:00 | 44 | 14+10+20
|2012-06-21 05:24:00 | 55 | 10+20+25
|2012-06-21 05:25:00 | 75 | 20+25+30
|2012-06-21 05:26:00 | 65 | 25+30+10
Here's a more general solution for the sum of values from the current row and the N previous rows (N=2 in your case).
SELECT "date",
sum("count") OVER (order by "date" ROWS BETWEEN 2 preceding AND current row)
FROM t
ORDER BY "date";
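You can set N to anything from 0 up to UNBOUNDED (i.e. UNBOUNDED PRECEDING). This approach lets you expose a parameter in your app for "count of the N past minutes" (note that ROWS counts rows, so this equals N past minutes only if every minute has a row). Also, there is no need to handle default values at the boundaries.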
You can find more on this in PostgreSQL docs (4.2.8. Window Function Calls)
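The ROWS BETWEEN 2 PRECEDING AND CURRENT ROW frame is a sliding window of at most N+1 rows. Here is a plain-Python sketch of the same idea, using a bounded deque as the frame; the function name rolling_sum is hypothetical.

```python
# Illustrative sketch; rolling_sum is a hypothetical helper, not part of the answer.
from collections import deque


def rolling_sum(values, n_preceding=2):
    """Sum of each value and up to n_preceding earlier ones, like
    sum(x) OVER (ORDER BY ts ROWS BETWEEN n_preceding PRECEDING AND CURRENT ROW)."""
    frame = deque(maxlen=n_preceding + 1)  # the window frame: at most N+1 rows
    totals = []
    for value in values:
        frame.append(value)  # once full, the oldest row falls out automatically
        totals.append(sum(frame))
    return totals
```

Near the start the frame simply holds fewer rows, which is why no default value is needed.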
This is not so tricky with the lag() window function (also on SQL Fiddle):
CREATE TABLE t ("date" timestamptz, "count" int4);
INSERT INTO t VALUES
('2012-06-21 05:20:00',12),
('2012-06-21 05:21:00',14),
('2012-06-21 05:22:00',10),
('2012-06-21 05:23:00',20),
('2012-06-21 05:24:00',25),
('2012-06-21 05:25:00',30),
('2012-06-21 05:26:00',10);
SELECT *,
"count"
+ coalesce(lag("count", 1) OVER (ORDER BY "date"), 0)
+ coalesce(lag("count", 2) OVER (ORDER BY "date"), 0) AS "total"
FROM t;
I've double-quoted date and count columns, as these are reserved words;
lag(field, distance) gives me the value of the field column distance rows away from the current one, so the first call gives the previous row's value and the second call gives the value from the row before that;
coalesce() is required to avoid NULL result from lag() function (for the first row in your query there's no “previous” one, thus it's NULL), otherwise the total will also be NULL.
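The lag()-based total can be mirrored in plain Python. In this sketch, which uses the hypothetical helper names lag and total_with_lag, lag returns a default when it runs off the start of the list. Passing 0 as that default does the job that coalesce(..., 0) does in the query.

```python
# Illustrative sketch; lag and total_with_lag are hypothetical helpers, not PostgreSQL APIs.
def lag(values, i, distance, default=None):
    """Value `distance` rows before position i, or `default` when there is none."""
    j = i - distance
    return values[j] if j >= 0 else default


def total_with_lag(counts):
    # count + coalesce(lag(count, 1), 0) + coalesce(lag(count, 2), 0)
    return [c + lag(counts, i, 1, 0) + lag(counts, i, 2, 0) for i, c in enumerate(counts)]
```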
@vyegorov's answer covers most of it. But I have more gripes than fit into a comment.
Don't use reserved words like date and count as identifiers at all. PostgreSQL allows those two particular key words as identifiers, unlike the SQL standard, but it's still bad practice. The fact that you can use anything inside double quotes as an identifier, even "; DELETE FROM tbl;", does not make it a good idea. On top of that, the name "date" for a timestamp is misleading.
Wrong data type. The example displays timestamp, not timestamptz. It does not make a difference here, but it is still misleading.
You don't need COALESCE(). With the window functions lag() and lead() you can provide a default value as the 3rd parameter.
Building on this setup:
CREATE TABLE tbl (ts timestamp, ct int4);
INSERT INTO tbl VALUES
('2012-06-21 05:20:00', 12)
, ('2012-06-21 05:21:00', 14)
, ('2012-06-21 05:22:00', 10)
, ('2012-06-21 05:23:00', 20)
, ('2012-06-21 05:24:00', 25)
, ('2012-06-21 05:25:00', 30)
, ('2012-06-21 05:26:00', 10);
Query:
SELECT ts, ct + lag(ct, 1, 0) OVER (ORDER BY ts)
+ lag(ct, 2, 0) OVER (ORDER BY ts) AS total
FROM tbl;
Or better yet: use a single sum() as a window aggregate function with a custom window frame:
SELECT ts, sum(ct) OVER (ORDER BY ts ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM tbl;
Same result.
Related:
Group by end of period instead of start date