PostgreSQL - increment a counter in rows where a column has duplicate values

I have added a column (seq) to a table used for scheduling, so the front end can manage the order in which each item is displayed. Is it possible to craft a SQL query that populates this column with an incremental counter based on duplicate values in the date_time column?
Before
------------------------------------
| name | date_time           | seq |
------------------------------------
| ABC1 | 15-01-2017 11:00:00 |     |
| ABC2 | 16-01-2017 11:30:00 |     |
| ABC1 | 16-01-2017 11:30:00 |     |
| ABC3 | 17-01-2017 10:00:00 |     |
| ABC3 | 18-01-2017 12:30:00 |     |
| ABC4 | 18-01-2017 12:30:00 |     |
| ABC1 | 18-01-2017 12:30:00 |     |
------------------------------------
After
------------------------------------
| name | date_time           | seq |
------------------------------------
| ABC1 | 15-01-2017 11:00:00 | 0   |
| ABC2 | 16-01-2017 11:30:00 | 0   |
| ABC1 | 16-01-2017 11:30:00 | 1   |
| ABC3 | 17-01-2017 10:00:00 | 0   |
| ABC3 | 18-01-2017 12:30:00 | 0   |
| ABC4 | 18-01-2017 12:30:00 | 1   |
| ABC1 | 18-01-2017 12:30:00 | 2   |
------------------------------------
Solved, thanks to both answers.
To make it easier for anybody who finds this, the working code is:
UPDATE my_table f
SET    seq = s.seq2
FROM (
    SELECT ctid,
           ROW_NUMBER() OVER (PARTITION BY date_time ORDER BY ctid) - 1 AS seq2
    FROM   my_table
) s
WHERE f.ctid = s.ctid;
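Note that ctid is only the row's current physical position, so ORDER BY ctid is an arbitrary tie-break (stable within one statement, but not across updates or VACUUM FULL). If the numbering within a date_time should be deterministic, order by a real column instead; a minimal sketch using the name column:

UPDATE my_table f
SET    seq = s.seq2
FROM (
    SELECT ctid,
           ROW_NUMBER() OVER (PARTITION BY date_time ORDER BY name) - 1 AS seq2
    FROM   my_table
) s
WHERE f.ctid = s.ctid;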

Use the window function row_number() (note that this answer partitions by name, numbering each name's rows over time; the accepted update above partitions by date_time instead):
with my_table (name, date_time) as (
    values
        ('ABC1', '15-01-2017 11:00:00'),
        ('ABC2', '16-01-2017 11:30:00'),
        ('ABC1', '16-01-2017 11:30:00'),
        ('ABC3', '17-01-2017 10:00:00'),
        ('ABC3', '18-01-2017 12:30:00'),
        ('ABC4', '18-01-2017 12:30:00'),
        ('ABC1', '18-01-2017 12:30:00')
)
select *,
       row_number() over (partition by name order by date_time) - 1 as seq
from my_table
order by date_time;
 name |      date_time      | seq
------+---------------------+-----
 ABC1 | 15-01-2017 11:00:00 |   0
 ABC1 | 16-01-2017 11:30:00 |   1
 ABC2 | 16-01-2017 11:30:00 |   0
 ABC3 | 17-01-2017 10:00:00 |   0
 ABC1 | 18-01-2017 12:30:00 |   2
 ABC3 | 18-01-2017 12:30:00 |   1
 ABC4 | 18-01-2017 12:30:00 |   0
(7 rows)
Read this answer for a similar question about updating existing records with a unique integer.

Check out ROW_NUMBER(). It starts numbering at 1, so subtract 1 for a zero-based sequence:
SELECT name, date_time,
       ROW_NUMBER() OVER (PARTITION BY date_time ORDER BY name) - 1 AS seq
FROM my_table;

Related

PostgreSQL - find discontinuous rec_no per id and get the missing read_time

I have a table like this, with three cases:
## case a
| rec_no | read_time           | id
+--------+---------------------+----
|  45139 | 2023-02-07 17:00:00 | a
|  45140 | 2023-02-07 17:15:00 | a
|  45141 | 2023-02-07 17:30:00 | a
|  45142 | 2023-02-07 18:15:00 | a
|  45143 | 2023-02-07 18:30:00 | a
|  45144 | 2023-02-07 18:45:00 | a
## case b
| rec_no | read_time           | id
+--------+---------------------+----
|  21735 | 2023-02-01 19:15:00 | b
|  21736 | 2023-02-01 19:30:00 | b
|  21742 | 2023-02-01 21:00:00 | b
|  21743 | 2023-02-01 21:15:00 | b
|  21744 | 2023-02-01 21:30:00 | b
|  21745 | 2023-02-01 21:45:00 | b
## case c
| rec_no | read_time           | id
+--------+---------------------+----
|  12345 | 2023-02-02 12:15:00 | c
|  12346 | 2023-02-02 12:30:00 | c
|  12347 | 2023-02-02 12:45:00 | c
|  12348 | 2023-02-02 13:15:00 | c
|  12352 | 2023-02-02 14:00:00 | c
|  12353 | 2023-02-02 14:15:00 | c
I'd like to find the missing read_time values where rec_no is not continuous.
read_time comes at a '15 min' interval
rec_no sequences are independent between different ids
I'd like something like this:
## case a
## nothing, because rec_no is continuous
| read_time           | id
+---------------------+----
## case b
## get six rows
| read_time           | id
+---------------------+----
| 2023-02-01 19:45:00 | b
| 2023-02-01 20:00:00 | b
| 2023-02-01 20:15:00 | b
| 2023-02-01 20:30:00 | b
| 2023-02-01 20:45:00 | b
| 2023-02-01 21:00:00 | b
## case c
## get two rows (13:00:00 is missing, but rec_no is continuous there)
| read_time           | id
+---------------------+----
| 2023-02-02 13:30:00 | c
| 2023-02-02 13:45:00 | c
Is there a way to do this? The output format is not too important, as long as I can get the result correctly.
step-by-step demo: db<>fiddle
SELECT rec_no,
       id,
       gs
FROM (
    SELECT *,
           lead(rec_no)    OVER (PARTITION BY id ORDER BY rec_no) - rec_no > 1 AS is_gap,  -- 1
           lead(read_time) OVER (PARTITION BY id ORDER BY rec_no) AS next_read_time
    FROM mytable
) s,
generate_series(                           -- 3
    read_time + interval '15 minutes',     -- 4
    next_read_time - interval '15 minutes',
    interval '15 minutes'
) AS gs
WHERE is_gap                               -- 2
1. Use the lead() window function to pull the next rec_no and the next read_time into the current row. With this you can check whether the difference between the current and next rec_no is greater than 1.
2. Filter to the records with such a gap.
3. Generate a time series at 15-minute intervals.
4. Because the series includes both its start and end, start one slot after the current read_time (+ interval) and end one "slot" before the next recorded value (- interval).
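To try the query without the fiddle, here is a minimal setup for case b (a sketch; the column types are assumptions based on the sample data):

-- minimal table matching the samples above (types are assumptions)
CREATE TABLE mytable (
    rec_no    integer,
    read_time timestamp,
    id        text
);

-- the gap between rec_no 21736 and 21742 yields the missing 15-minute slots
INSERT INTO mytable (rec_no, read_time, id) VALUES
    (21735, '2023-02-01 19:15:00', 'b'),
    (21736, '2023-02-01 19:30:00', 'b'),
    (21742, '2023-02-01 21:00:00', 'b'),
    (21743, '2023-02-01 21:15:00', 'b');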

How to select last timestamp by distinct columns?

Suppose there is a table like this:
| user_id | location_id | datetime            | other_field |
| ------- | ----------- | ------------------- | ----------- |
| 12      | 1           | 2020-02-01 10:00:00 | asdqwe      |
| 12      | 1           | 2020-02-01 10:30:00 | asdqwe      |
| 12      | 2           | 2020-02-01 10:40:00 | asdqwe      |
| 12      | 2           | 2020-02-01 10:50:00 | asdqwe      |
| 13      | 1           | 2020-02-01 10:10:00 | asdqwe      |
| 13      | 1           | 2020-02-01 10:20:00 | asdqwe      |
| 14      | 3           | 2020-02-01 09:00:00 | asdqwe      |
I want to select the last datetime for each distinct (user_id, location_id) pair. This is the result I am looking for:
| user_id | location_id | datetime            | other_field |
| ------- | ----------- | ------------------- | ----------- |
| 12      | 1           | 2020-02-01 10:30:00 | asdqwe      |
| 12      | 2           | 2020-02-01 10:50:00 | asdqwe      |
| 13      | 1           | 2020-02-01 10:20:00 | asdqwe      |
| 14      | 3           | 2020-02-01 09:00:00 | asdqwe      |
Here is the table description:
CREATE TABLE mykeyspace.mytable (
    user_id     int,
    location_id int,
    datetime    timestamp,
    other_field text,
    PRIMARY KEY ((user_id, location_id, other_field), datetime)
) WITH CLUSTERING ORDER BY (datetime ASC)
    AND read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.1
    AND gc_grace_seconds = 864000
    AND bloom_filter_fp_chance = 0.01
    AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
    AND comment = ''
    AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
    AND compression = { 'chunk_length_in_kb' : 64, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
    AND default_time_to_live = 0
    AND speculative_retry = '99PERCENTILE'
    AND min_index_interval = 128
    AND max_index_interval = 2048
    AND crc_check_chance = 1.0
    AND cdc = false;
For such things, CQL has the PER PARTITION LIMIT clause (available in Cassandra 3.6+, IIRC). But to use it on your table, you need to change the table definition to CLUSTERING ORDER BY (datetime DESC), and then you could write:
select * from mykeyspace.mytable per partition limit 1;
and get the row with the latest timestamp for every partition key you have.
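For reference, a sketch of the re-created table (Cassandra cannot alter the clustering order of an existing table, so it has to be dropped and recreated; the non-essential options are omitted here):

CREATE TABLE mykeyspace.mytable (
    user_id     int,
    location_id int,
    datetime    timestamp,
    other_field text,
    PRIMARY KEY ((user_id, location_id, other_field), datetime)
) WITH CLUSTERING ORDER BY (datetime DESC);

One caveat: other_field is part of the partition key in this schema, so PER PARTITION LIMIT 1 returns the latest row per (user_id, location_id, other_field), not per (user_id, location_id) as the question asks.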

Select with group by and count two columns

I have this table
+----+----------------------------+---------------+----------+
| ID | CREATED_AT                 | CATEGORY      | TYPE     |
+----+----------------------------+---------------+----------+
|  1 | 2017-11-23 23:00:40.221958 | SEM COBERTURA | callback |
|  2 | 2017-11-23 22:58:36.970052 | VENDA         | ativo    |
|  3 | 2017-11-23 22:47:03.956185 | SEM COBERTURA | ativo    |
|  4 | 2017-11-23 22:42:24.309915 | VENDA         | ativo    |
|  5 | 2017-11-23 22:32:48.780418 | SEM COBERTURA | callback |
|  6 | 2017-11-23 22:12:21.631433 | VENDA         | callback |
|  7 | 2017-11-23 22:09:38.52699  | SEM COBERTURA | ativo    |
|  8 | 2017-11-23 22:08:09.836343 | LIGACAO MUDA  | callback |
|  9 | 2017-11-23 22:08:07.058063 | SEM COBERTURA | callback |
| 10 | 2017-11-23 22:07:02.067439 | LIGACAO MUDA  | other    |
+----+----------------------------+---------------+----------+
With the table above, I want to group by TYPE, counting each TYPE and also counting the rows whose CATEGORY is "VENDA", e.g.:
This is what I want:
+----------+------------+----------------------+
| TYPE     | COUNT_TYPE | COUNT_CATEGORY_VENDA |
+----------+------------+----------------------+
| callback | 5          | 1                    |
| ativo    | 4          | 2                    |
| other    | 1          | 0                    |
+----------+------------+----------------------+
The type "callback" appear 5 times and has 1 category "VENDA", "ativo" appear 4 times and has 2 "VENDA"...
To get TYPE and COUNT_TYPE i'm using this query:
SELECT TYPE, count(TYPE) AS COUNT_TYPE FROM table WHERE created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00' GROUP BY TYPE ORDER BY COUNT_TYPE DESC
Can anyone help me, please?
You can use CASE WHEN in PostgreSQL:
SELECT TYPE, count(TYPE) AS COUNT_TYPE,
       SUM(CASE CATEGORY WHEN 'VENDA' THEN 1 ELSE 0 END) AS COUNT_CATEGORY_VENDA
FROM table
WHERE created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00'
GROUP BY TYPE
ORDER BY COUNT_TYPE DESC
There are several ways to do this. IMHO the simplest is to use CASE WHEN to filter what to count in the query.
SELECT
    TYPE,
    count(TYPE) AS COUNT_TYPE,
    SUM(
        CASE CATEGORY WHEN 'VENDA' THEN 1 ELSE 0 END
    ) AS COUNT_CATEGORY_VENDA
FROM table
WHERE
    created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00'
GROUP BY TYPE
ORDER BY COUNT_TYPE DESC;
Besides CASE col WHEN d1 THEN v1 WHEN d2 THEN v2 ELSE v3 END, you can also try CASE WHEN col = d1 THEN v1 WHEN col = d2 THEN v2 ELSE v3 END.
Another way is to use sub-queries.
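Yet another option, since PostgreSQL 9.4, is the aggregate FILTER clause, which reads more directly than CASE. A sketch against the same query (mytable stands in for the real table name, since "table" itself is a reserved word):

SELECT TYPE,
       count(*)                                   AS COUNT_TYPE,
       count(*) FILTER (WHERE CATEGORY = 'VENDA') AS COUNT_CATEGORY_VENDA
FROM mytable
WHERE created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00'
GROUP BY TYPE
ORDER BY COUNT_TYPE DESC;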

Crosstab function and Dates PostgreSQL

I have to create a crosstab from a query, where the dates are turned into column names. These order dates can increase or decrease depending on the dates passed to the query. The order date is stored in Unix epoch format and is converted to a normal date.
The query is as follows:
Select cd.cust_id
     , od.order_id
     , od.order_size
     , (TIMESTAMP 'epoch' + od.order_date * INTERVAL '1 second')::Date As order_date
From consumer_details cd
   , consumer_order od
Where cd.cust_id = od.cust_id
And od.order_date Between 1469212200 And 1469212600
Order By od.order_id, od.order_date
Table as follows:
 cust_id   | order_id       | order_size    | order_date
-----------|----------------|---------------|--------------
 210721008 | 0437756        | 4323          | 2016-07-22
 210721008 | 0437756        | 4586          | 2016-09-24
 210721019 | 10749881       | 0             | 2016-07-28
 210721019 | 10749881       | 0             | 2016-07-28
 210721033 | 13639          | 2286145       | 2016-09-06
 210721033 | 13639          | 2300040       | 2016-10-03
Result will be:
 cust_id   | order_id       | 2016-07-22    | 2016-09-24    | 2016-07-28    | 2016-09-06    | 2016-10-03
-----------|----------------|---------------|---------------|---------------|---------------|---------------
 210721008 | 0437756        | 4323          | 4586          |               |               |
 210721019 | 10749881       |               |               | 0             |               |
 210721033 | 13639          |               |               |               | 2286145       | 2300040
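One way to produce this pivot is the crosstab() function from the tablefunc extension. A sketch, not a definitive implementation: crosstab() requires the output columns to be declared up front, so this only works when the dates in the range are known in advance, and the column types for consumer_order are assumptions:

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    -- source: row key (cust_id), extra column (order_id), category (date), value
    $$ Select cust_id, order_id,
              (TIMESTAMP 'epoch' + order_date * INTERVAL '1 second')::Date,
              order_size
       From consumer_order
       Order By 1, 2 $$,
    -- category list: the distinct dates, in the same order as the columns below
    $$ Select Distinct (TIMESTAMP 'epoch' + order_date * INTERVAL '1 second')::Date
       From consumer_order
       Order By 1 $$
) As ct (
    cust_id      bigint,
    order_id     text,
    "2016-07-22" bigint,
    "2016-07-28" bigint,
    "2016-09-06" bigint,
    "2016-09-24" bigint,
    "2016-10-03" bigint
);

Note the date columns come back in ascending date order, not the first-seen order shown above. A fully dynamic column list needs two round trips: first query the distinct dates, then build and execute the crosstab SQL from them.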

Postgresql Use count on multiple columns

I have two tables. The first generates the condition for counting records in the second. The two tables are linked in a 1:1 relation by timestamp.
The problem is that the second table has many columns, and we need a count for each column that matches the condition from the first table.
Example:
Tables met and pot
CREATE TABLE met (
    tstamp timestamp without time zone NOT NULL,
    h1_rad double precision,
    CONSTRAINT met_pkey PRIMARY KEY (tstamp)
);

CREATE TABLE pot (
    tstamp timestamp without time zone NOT NULL,
    c1 double precision,
    c2 double precision,
    c3 double precision,
    CONSTRAINT pot_pkey PRIMARY KEY (tstamp)
);
In reality, pot has 108 columns, from c1 to c108.
Table values:
+ Table met +                 + Table pot +
+----------------+--------+  +----------------+------+------+------+
| tstamp         | h1_rad |  | tstamp         | c1   | c2   | c3   |
+----------------+--------+  +----------------+------+------+------+
| 20150101 00:00 | 0      |  | 20150101 00:00 | 5,5  | 3,3  | 15,6 |
| 20150101 00:05 | 1,8    |  | 20150101 00:05 | 12,8 | 15,8 | 1,5  |
| 20150101 00:10 | 15,4   |  | 20150101 00:10 | 25,4 | 4,5  | 1,4  |
| 20150101 00:15 | 28,4   |  | 20150101 00:15 | 18,3 | 63,5 | 12,5 |
| 20150101 00:20 | 29,4   |  | 20150101 00:20 | 24,5 | 78   | 17,5 |
| 20150101 00:25 | 13,5   |  | 20150101 00:25 | 12,8 | 5,4  | 18,4 |
| 20150102 00:00 | 19,5   |  | 20150102 00:00 | 11,1 | 25,6 | 6,5  |
| 20150102 00:05 | 2,5    |  | 20150102 00:05 | 36,5 | 21,4 | 45,2 |
| 20150102 00:10 | 18,4   |  | 20150102 00:10 | 1,4  | 35,5 | 63,5 |
| 20150102 00:15 | 20,4   |  | 20150102 00:15 | 18,4 | 23,4 | 8,4  |
| 20150102 00:20 | 6,8    |  | 20150102 00:20 | 16,8 | 12,5 | 18,4 |
| 20150102 00:25 | 17,4   |  | 20150102 00:25 | 25,8 | 23,5 | 9,5  |
+----------------+--------+  +----------------+------+------+------+
What I need, grouped by day, is the number of rows in pot whose value is higher than 15 where the met value for the same timestamp is also higher than 15.
With the data supplied we need something like:
+----------+----+----+----+
| day      | c1 | c2 | c3 |
+----------+----+----+----+
| 20150101 | 3  | 2  | 1  |
| 20150102 | 2  | 4  | 1  |
+----------+----+----+----+
How can I get this?
Is this possible with a single query, even with subqueries?
Actually, the raw data is stored every minute in other tables. The tables met and pot are summarized and filtered tables, kept for performance.
If necessary, I can create tables with data summarized by day if that simplifies the solution.
Thanks
P.S. Sorry for my English
You can solve this with some CASE statements. Test for both conditions, and if true return a 1. Then SUM() the results using a GROUP BY on the timestamp converted to a date to get your total:
SELECT
    date(met.tstamp),
    SUM(CASE WHEN met.h1_rad > 15 AND pot.c1 > 15 THEN 1 END) AS c1,
    SUM(CASE WHEN met.h1_rad > 15 AND pot.c2 > 15 THEN 1 END) AS c2,
    SUM(CASE WHEN met.h1_rad > 15 AND pot.c3 > 15 THEN 1 END) AS c3
FROM
    met INNER JOIN pot ON met.tstamp = pot.tstamp
GROUP BY date(met.tstamp)
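One caveat: with no ELSE branch, SUM(CASE ...) returns NULL rather than 0 for a day where nothing matches. If you want explicit zeros, add ELSE 0 (shown for c1 only):

-- ELSE 0 makes days with no matching rows show 0 instead of NULL
SELECT date(met.tstamp) AS day,
       SUM(CASE WHEN met.h1_rad > 15 AND pot.c1 > 15 THEN 1 ELSE 0 END) AS c1
FROM met
INNER JOIN pot ON met.tstamp = pot.tstamp
GROUP BY date(met.tstamp);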