I have a view storing water pipe references together with details of operations on the water system. I need to extract from that view the water pipes on which more than one operation was registered over the past 12 months. Here is how I proceed:
Here is the view structure and a sample of data:
CREATE TABLE schema.pipe (
id INTEGER,
code VARCHAR,
ope_date DATE,
ope_type VARCHAR(2),
system VARCHAR(2));
INSERT INTO schema.pipe (code, ope_date, ope_type, system) VALUES
('0001', '2014-11-11', '01', 'EU'),
('0001', '2014-11-11', '03', 'EU'),
('0002', '2014-12-03', '03', 'EP'),
('0002', '2014-01-03', '03', 'EP'),
('0003', '2014-08-11', '01', 'EP'),
('0003', '2014-03-03', '03', 'EP'),
('0003', '2012-02-27', '03', 'EP'),
('0004', '2014-08-11', '01', 'UN'),
('0004', '2013-12-30', '03', 'UN'),
('0004', '2013-06-01', '03', 'UN'),
('0004', '2012-07-31', '03', 'UN'),
('0005', '2013-10-01', '03', 'EU'),
('0005', '2012-11-01', '03', 'EU'),
('0006', '2014-04-01', '01', 'UN'),
('0006', '2014-05-15', '01', 'UN');
code is the pipe reference
ope_date is the operation date
ope_type is the operation type
system is the system type
Here is the query I'm using:
SELECT code, ope_date FROM schema.pipe
WHERE (NOW()::DATE - ope_date) < 365
GROUP BY code, ope_date
HAVING count(*) = 1 ;
I get this:
code | ope_date
---------+--------------
0002 | 2014-12-03
0002 | 2014-01-03
0003 | 2014-08-11
0003 | 2014-03-03
0004 | 2013-12-30
0004 | 2014-08-11
0006 | 2014-04-01
0006 | 2014-05-15
Now, I need to bring back the other columns with this selection. So I use:
WITH temptable AS (
SELECT code, ope_date FROM schema.pipe WHERE (NOW()::DATE - ope_date) < 365 GROUP BY code, ope_date HAVING count(*) = 1)
SELECT DISTINCT a.code, a.ope_date, b.ope_type, b.system FROM temptable a LEFT JOIN schema.pipe b on a.code = b.code ;
I get this, which is too many rows (I need 8 and get 12):
code | ope_date | ope_type | system
-------+--------------+------------+---------
0002 | 2014-01-03 | 03 | EP
0002 | 2014-12-03 | 03 | EP
0003 | 2014-03-03 | 01 | EP
0003 | 2014-03-03 | 03 | EP
0003 | 2014-08-11 | 01 | EP
0003 | 2014-08-11 | 03 | EP
0004 | 2013-12-30 | 01 | UN
0004 | 2013-12-30 | 03 | UN
0004 | 2014-08-11 | 01 | UN
0004 | 2014-08-11 | 03 | UN
0006 | 2014-04-01 | 01 | UN
0006 | 2014-05-15 | 01 | UN
So here comes my question: how can I get just the lines matching my selection?
Many thanks in advance.
EDIT:
What I need is:
code | ope_date | ope_type | system
---------+-------------+-----------+---------
0002 | 2014-12-03 | 03 | EP
0002 | 2014-01-03 | 03 | EP
0003 | 2014-08-11 | 01 | EP
0003 | 2014-03-03 | 03 | EP
0004 | 2013-12-30 | 03 | UN
0004 | 2014-08-11 | 01 | UN
0006 | 2014-04-01 | 01 | UN
0006 | 2014-05-15 | 01 | UN
I've found a solution myself: use both the code and ope_date columns to join the tables (not just code):
WITH temptable AS (
SELECT code, ope_date FROM schema.pipe WHERE (NOW()::DATE - ope_date) < 365 GROUP BY code, ope_date HAVING count(*) = 1)
SELECT DISTINCT a.code, a.ope_date, b.ope_type, b.system FROM temptable a, schema.pipe b WHERE a.code = b.code AND a.ope_date = b.ope_date;
Any comment on this solution?
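One comment on it (a sketch, assuming PostgreSQL): joining on both columns is correct, but the implicit comma join can be written as an explicit JOIN ... ON, and the DISTINCT becomes redundant, because the CTE only keeps (code, ope_date) pairs that occur exactly once in schema.pipe, so each CTE row matches exactly one pipe row:

```sql
-- Same result with explicit join syntax; DISTINCT is no longer needed
-- because each (code, ope_date) pair in the CTE matches exactly one
-- row of schema.pipe (the CTE kept only groups with count(*) = 1).
WITH temptable AS (
    SELECT code, ope_date
    FROM schema.pipe
    WHERE (NOW()::DATE - ope_date) < 365
    GROUP BY code, ope_date
    HAVING count(*) = 1
)
SELECT b.code, b.ope_date, b.ope_type, b.system
FROM temptable a
JOIN schema.pipe b ON a.code = b.code AND a.ope_date = b.ope_date;
```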
I have a table like this, and there are three cases:
## case a
| rec_no | read_time | id
+--------+---------------------+----
| 45139 | 2023-02-07 17:00:00 | a
| 45140 | 2023-02-07 17:15:00 | a
| 45141 | 2023-02-07 17:30:00 | a
| 45142 | 2023-02-07 18:15:00 | a
| 45143 | 2023-02-07 18:30:00 | a
| 45144 | 2023-02-07 18:45:00 | a
## case b
| rec_no | read_time | id
+--------+---------------------+----
| 21735 | 2023-02-01 19:15:00 | b
| 21736 | 2023-02-01 19:30:00 | b
| 21742 | 2023-02-01 21:00:00 | b
| 21743 | 2023-02-01 21:15:00 | b
| 21744 | 2023-02-01 21:30:00 | b
| 21745 | 2023-02-01 21:45:00 | b
## case c
| rec_no | read_time | id
+--------+---------------------+----
| 12345 | 2023-02-02 12:15:00 | c
| 12346 | 2023-02-02 12:30:00 | c
| 12347 | 2023-02-02 12:45:00 | c
| 12348 | 2023-02-02 13:15:00 | c
| 12352 | 2023-02-02 14:00:00 | c
| 12353 | 2023-02-02 14:15:00 | c
I'd like to find the missing read_time values wherever rec_no is not continuous.
read_time comes at a 15-minute interval.
Within different ids, the rec_no sequences are independent.
I'd like something like this,
## case a
## nothing because rec_no is continuous
| read_time | id
+---------------------+----
## case b
## get six rows
| read_time | id
+--------+-----------------
| 2023-02-01 19:45:00 | b
| 2023-02-01 20:00:00 | b
| 2023-02-01 20:15:00 | b
| 2023-02-01 20:30:00 | b
| 2023-02-01 20:45:00 | b
| 2023-02-01 21:00:00 | b
## case c
## get two rows (13:00:00 is missing but rec_no is continuous)
| read_time | id
+--------+-----------------
| 2023-02-02 13:30:00 | c
| 2023-02-02 13:45:00 | c
Is there a way to do this? The output format is not too important as long as I can get the result correctly.
step-by-step demo: db<>fiddle
SELECT
rec_no,
id,
gs
FROM (
SELECT
*,
lead(rec_no) OVER (PARTITION BY id ORDER BY rec_no) - rec_no > 1 AS is_gap, -- 1
lead(read_time) OVER (PARTITION BY id ORDER BY rec_no) as next_read_time
FROM mytable
)s, generate_series( -- 3
read_time + interval '15 minutes', -- 4
next_read_time - interval '15 minutes',
interval '15 minutes'
) as gs
WHERE is_gap -- 2
1. Use the lead() window function to pull the next rec_no value and the next read_time value into the current row. With this you can check whether the difference between the current and the next rec_no value is greater than 1.
2. Filter down to the records with such a gap.
3. Generate a time series at a 15-minute interval.
4. Because the series includes its start and end, it needs to start at the next 15-minute point (+ interval) and end one "slot" before the next recorded value (- interval).
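To see what steps 3 and 4 produce for a single gap, here is the series for case b's gap between 19:30 and 21:00 in isolation (a minimal sketch with the bounds hard-coded):

```sql
-- The series starts one 15-minute slot after the last recorded time
-- and ends one slot before the next recorded time.
SELECT gs AS missing_read_time
FROM generate_series(
    timestamp '2023-02-01 19:30:00' + interval '15 minutes',
    timestamp '2023-02-01 21:00:00' - interval '15 minutes',
    interval '15 minutes'
) AS gs;
-- yields 19:45, 20:00, 20:15, 20:30, 20:45
```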
I need help finding and counting orders in the 'sales_track' table over a certain consecutive period of time, from users that have a minimum of two or more transactions (rephrased: how many users have 2 or more transactions in a period of n days without skipping even a day?).
sales_track
sales_tx_id | u_id | create_date | item_id | price
------------|------|-------------|---------|---------
ffff-0291 | 0001 | 2019-08-01 | 0300 | 5.00
ffff-0292 | 0001 | 2019-08-01 | 0301 | 2.50
ffff-0293 | 0002 | 2019-08-01 | 0209 | 3.50
ffff-0294 | 0003 | 2019-08-01 | 0020 | 1.00
ffff-0295 | 0001 | 2019-08-02 | 0301 | 2.50
ffff-0296 | 0001 | 2019-08-02 | 0300 | 5.00
ffff-0297 | 0001 | 2019-08-02 | 0209 | 3.50
ffff-0298 | 0002 | 2019-08-02 | 0300 | 5.00
For simplicity's sake the sample covers two consecutive days only (the period between 2019-08-01 and 2019-08-02); in real operation I would have to search e.g. 10 consecutive days of transactions.
So far I'm able to find the users with a minimum of two or more transactions:
SELECT u_id, COUNT (u_id) FROM sales_track WHERE create_date BETWEEN
('2019-08-01') AND ('2019-08-02')
GROUP BY u_id HAVING COUNT (sales_tx_id) >= 2;
The output I'm looking for is like:
u_id | tx_count | tx_amount
------|----------|------------
0001 | 5 | 18.50
Thank you in advance for your help.
step-by-step demo: db<>fiddle
First: My extended data set:
sales_tx_id | user_id | created_at | item_id | price
:---------- | :------ | :--------- | :------ | ----:
ffff-0291 | 0001 | 2019-08-01 | 0300 | 5.00
ffff-0292 | 0001 | 2019-08-01 | 0301 | 2.50
ffff-0293 | 0002 | 2019-08-01 | 0209 | 3.50
ffff-0294 | 0003 | 2019-08-01 | 0020 | 1.00
ffff-0295 | 0001 | 2019-08-02 | 0301 | 2.50
ffff-0296 | 0001 | 2019-08-02 | 0300 | 5.00
ffff-0297 | 0001 | 2019-08-02 | 0209 | 3.50
ffff-0298 | 0002 | 2019-08-02 | 0300 | 5.00
ffff-0299 | 0001 | 2019-08-05 | 0209 | 3.50
ffff-0300 | 0001 | 2019-08-05 | 0020 | 1.00
ffff-0301 | 0001 | 2019-08-06 | 0209 | 3.50
ffff-0302 | 0001 | 2019-08-06 | 0020 | 1.00
ffff-0303 | 0001 | 2019-08-07 | 0209 | 3.50
ffff-0304 | 0001 | 2019-08-07 | 0020 | 1.00
ffff-0305 | 0002 | 2019-08-08 | 0300 | 5.00
ffff-0306 | 0002 | 2019-08-08 | 0301 | 2.50
ffff-0307 | 0001 | 2019-08-09 | 0209 | 3.50
ffff-0308 | 0001 | 2019-08-09 | 0020 | 1.00
ffff-0309 | 0002 | 2019-08-09 | 0300 | 5.00
ffff-0310 | 0002 | 2019-08-09 | 0301 | 2.50
ffff-0311 | 0001 | 2019-08-10 | 0209 | 3.50
ffff-0312 | 0001 | 2019-08-10 | 0020 | 1.00
ffff-0313 | 0002 | 2019-08-10 | 0300 | 5.00
User 1 has 3 streaks:
2019-08-01, 2019-08-02
2019-08-05, 2019-08-06, 2019-08-07
2019-08-09, 2019-08-10
User 2:
Has transactions on 2019-08-01 and 2019-08-02, but only one on each date, so they do not count
Has a streak on 2019-08-08, 2019-08-09 (2019-08-10 has only one transaction, so it does not extend the streak)
So we are expecting 4 rows: 3 for user 1 (one per streak) and 1 for user 2.
SELECT -- 4
user_id,
SUM(count),
SUM(price),
MIN(created_at) AS consecutive_start
FROM (
SELECT *, -- 3
SUM(is_in_same_group) OVER (PARTITION BY user_id ORDER BY created_at) AS group_id
FROM (
SELECT -- 2
*,
(lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) + 1 <> created_at)::int as is_in_same_group
FROM (
SELECT -- 1
created_at,
user_id,
COUNT(*),
SUM(price) AS price
FROM
sales_track
WHERE created_at BETWEEN '2018-02-01' AND '2019-08-11'
GROUP BY created_at, user_id
HAVING COUNT(*) >= 2
) s
) s
) s
GROUP BY user_id, group_id
1. Group all (created_at, user_id) pairs and remove those with COUNT(*) < 2.
2. The lag() window function fetches a value from the previous record within one ordered group (here the group is the user_id). The check is: if the current created_at value immediately follows the previous one (previous + 1), the result is 0, otherwise 1.
3. Now the cumulative SUM() window function sums these values: the sum increases when the gap is too big (when the value is 1), otherwise it stays the same as for the previous date. This yields a group_id for each run of dates that differ by exactly +1.
4. Finally these groups can be aggregated with SUM() and COUNT().
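The core of steps 2 and 3 is a standard gaps-and-islands trick; here it is in isolation on a toy list of dates (a sketch, not the full query):

```sql
-- The flag is 1 whenever the current date does not immediately follow
-- the previous one; the running SUM() of the flag then assigns one
-- group_id per streak of consecutive dates.
SELECT d,
       SUM(flag) OVER (ORDER BY d) AS group_id
FROM (
    SELECT d,
           (lag(d, 1, d) OVER (ORDER BY d) + 1 <> d)::int AS flag
    FROM (VALUES (date '2019-08-01'), (date '2019-08-02'),
                 (date '2019-08-05'), (date '2019-08-06')) v(d)
) s;
-- group_id: 1, 1, 2, 2 → two streaks
```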
How can I take a datetime column start_at, convert it to a day of week, and find the next future occurrence relative to the current date?
Here I'm trying to add the DOW to the current week but it doesn't appear to be correct.
SELECT date_trunc('week', current_date) + CAST(extract(dow from start_at) || ' days' AS interval)
Full example:
SELECT id event_id,
GENERATE_SERIES(date_trunc('week', current_date) + CAST(extract(dow from start_at) + 1 || ' days' AS interval) + start_at::time, current_date + interval '3 weeks', '1 week'::INTERVAL) AS start_at
FROM events
Events
+-----+---------------------------+---------------------+
| id | start_at | recurring_schedule |
+-----+---------------------------+---------------------+
| 358 | 2015-01-23 20:00:00 +0000 | Weekly |
| 359 | 2016-01-22 19:30:00 +1100 | Monthly |
| 360 | 2016-02-01 19:00:00 +1100 | Weekly |
| 361 | 2016-02-01 20:00:00 +0000 | Weekly |
| 362 | 2014-02-13 20:00:00 +0000 | Bi-Weekly |
+-----+---------------------------+---------------------+
Output
+----------+---------------------------+
| event_id | start_at |
+----------+---------------------------+
| 35 | 2018-04-11 19:30:00 +0000 |
| 94 | 2018-04-12 20:00:00 +0100 |
| 269 | 2018-04-13 18:30:00 +0100 |
| 45 | 2018-04-13 20:00:00 +0100 |
| 242 | 2018-04-13 19:30:00 +1100 |
| 35 | 2018-04-18 19:30:00 +0000 |
| 94 | 2018-04-19 20:00:00 +0100 |
| 269 | 2018-04-20 18:30:00 +0100 |
| 45 | 2018-04-20 20:00:00 +0100 |
| 242 | 2018-04-20 19:30:00 +1100 |
+----------+---------------------------+
Give this a try:
SELECT id event_id,
GENERATE_SERIES(date_trunc('week', current_date)::date
+ (extract(isodow from start_at)::int - 1) + start_at::time, current_date
+ interval '3 weeks', '1 week'::INTERVAL) AS start_at
FROM events
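Note that this anchors every event on the current week, so the result can still lie in the past (e.g. on a Friday, a Monday event maps to four days ago). A sketch of one way to push it into the future, reusing the same isodow arithmetic (table and column names taken from the question):

```sql
-- Compute this week's occurrence first, then add a week when it has
-- already passed, so the result is always the next future occurrence.
SELECT id AS event_id,
       CASE WHEN this_week >= now() THEN this_week
            ELSE this_week + interval '1 week'
       END AS next_occurrence
FROM (
    SELECT id,
           date_trunc('week', current_date)::date
             + (extract(isodow FROM start_at)::int - 1)
             + start_at::time AS this_week
    FROM events
) s;
```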
I have a table [PostgreSQL] like this:
- agent who answered the call
- caller number
- datetime of the end of the call
Agent | Caller | DateTime
------------------------------------
101 | 555-1234 | 12-01-16 00:00
101 | 555-1234 | 12-01-16 01:00
102 | 555-1234 | 12-01-16 02:00
102 | 555-1234 | 13-01-16 06:00
1. I need to check each number (each line) and look at the 24 hours before the call to see whether it is a recall. E.g. the call on the 13th was not a recall because the last call from this number was not within 24 hours.
2. I need to get the agent the recall came from.
It needs to be displayed like this:
Agent | Caller | DateTime | Recalling | Recalling-From
----------------------------------------------------------------------
101 | 555-1234 | 12-01-16 00:00 | NO |
101 | 555-1234 | 12-01-16 01:00 | YES | 101
102 | 555-1234 | 12-01-16 02:00 | YES | 101
102 | 555-1234 | 13-01-16 06:00 | NO |
My query is:
WITH maiordata AS (
SELECT
to_char(datahora_entrada_fila, 'DD/MM/YYYY') as dia,
calleridnum As numero,
MAX(datahora_inicio) AS data_ini
FROM callcenter.chamada_fila_in cfin
WHERE datahora_inicio BETWEEN '2016-12-01 00:00:00' AND '2016-12-01 23:59:59'
AND status_chamada = 'Finalizada'
GROUP BY dia,numero
)
SELECT
to_char(datahora_entrada_fila, 'DD/MM/YYYY') as dia,
(SELECT count(calleridnum) FROM callcenter.chamada_fila_in f INNER JOIN maiordata md ON f.calleridnum = md.numero WHERE calleridnum = cfin.calleridnum AND datahora_inicio BETWEEN md.data_ini - interval '24 hours' AND md.data_ini ) AS QtdRechamada,
calleridnum As numero
FROM callcenter.chamada_fila_in cfin
WHERE datahora_inicio BETWEEN '2016-12-01 00:00:00' AND '2016-12-01 23:59:59'
AND status_chamada = 'Finalizada'
GROUP BY dia,numero
ORDER BY
dia,numero DESC
I need a better function or method; this query is very heavy on my database and needs to be optimized.
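One possible direction (a sketch, not tested against your schema): the correlated subquery can be replaced by a single pass with the lag() window function, which fetches the previous call per number. Table and column names below follow the simplified example table above, not callcenter.chamada_fila_in:

```sql
-- For each call, look at the previous call from the same number:
-- it is a recall when that call happened less than 24 hours earlier.
SELECT agent,
       caller,
       datetime,
       CASE WHEN prev_dt >= datetime - interval '24 hours'
            THEN 'YES' ELSE 'NO' END AS recalling,
       CASE WHEN prev_dt >= datetime - interval '24 hours'
            THEN prev_agent END AS recalling_from
FROM (
    SELECT agent, caller, datetime,
           lag(datetime) OVER (PARTITION BY caller ORDER BY datetime) AS prev_dt,
           lag(agent)    OVER (PARTITION BY caller ORDER BY datetime) AS prev_agent
    FROM calls
) s
ORDER BY caller, datetime;
```

On the sample data this marks the 01:00 and 02:00 calls as recalls from agent 101, and the call on the 13th as not a recall, matching the desired output.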
I am trying to transform data from a table of recorded events into a consistent 'daily half-hour view', i.e. 48 half-hour periods, padding out half hours with zero when there are no matching events. I have completed this with partial success.
SELECT t1.generate_series,
v1.begin_time,
v1.end_time,
v1.volume
FROM tbl_my_values v1
RIGHT JOIN ( SELECT generate_series.generate_series
FROM generate_series((to_char(now(), 'YYYY-MM-dd'::text) || ' 22:00'::text)::timestamp without time zone,
(to_char(now() + '1 day'::interval, 'YYYY-MM-dd'::text) || ' 22:00'::text)::timestamp without time zone, '00:30:00'::interval)
generate_series(generate_series)) t1 ON t1.generate_series = v1.begin_time
order by 1 ;
This provides the following results:
2015-12-19 22:00:00 | 2015-12-19 22:00:00+00 | 2015-12-19 23:00:00+00 | 172.10
2015-12-19 22:30:00 | | |
2015-12-19 23:00:00 | 2015-12-19 23:00:00+00 | 2015-12-20 00:00:00+00 | 243.60
2015-12-20 00:30:00 | | |
2015-12-20 01:00:00 | | |
However based on the 'start' and 'end' columns the view should be:
2015-12-19 22:00:00 | 2015-12-19 22:00:00+00 | 2015-12-19 23:00:00+00 | 172.10
2015-12-19 22:30:00 | | | 172.10
2015-12-19 23:00:00 | 2015-12-19 23:00:00+00 | 2015-12-20 00:00:00+00 | 243.60
2015-12-20 00:30:00 | | | 243.60
2015-12-20 01:00:00 | | |
because the values in this example span 2 half hours, i.e. are valid for one hour.
All help is very welcome. Thanks.
Your ON clause is only comparing against begin_time. I think you want a range condition:
on t1.generate_series between v1.begin_time and v1.end_time
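Applied to the full query, the range join could look like this (a sketch; it uses >= and < instead of BETWEEN so a slot falling exactly on end_time belongs to the next value rather than matching both):

```sql
SELECT t1.generate_series,
       v1.begin_time,
       v1.end_time,
       v1.volume
FROM tbl_my_values v1
RIGHT JOIN (
    SELECT generate_series
    FROM generate_series(
        (to_char(now(), 'YYYY-MM-dd') || ' 22:00')::timestamp,
        (to_char(now() + interval '1 day', 'YYYY-MM-dd') || ' 22:00')::timestamp,
        interval '30 minutes')
) t1 ON t1.generate_series >= v1.begin_time
    AND t1.generate_series <  v1.end_time
ORDER BY 1;
```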