Postgres: for each row evaluate all successive rows under conditions

I have this table:
id | datetime | row_number
1 2018-04-09 06:27:00 1
1 2018-04-09 14:15:00 2
1 2018-04-09 15:25:00 3
1 2018-04-09 15:35:00 4
1 2018-04-09 15:51:00 5
1 2018-04-09 17:05:00 6
1 2018-04-10 06:42:00 7
1 2018-04-10 16:39:00 8
1 2018-04-10 18:58:00 9
1 2018-04-10 19:41:00 10
1 2018-04-14 17:05:00 11
1 2018-04-14 17:48:00 12
1 2018-04-14 18:57:00 13
For each row, I'd like to count the successive rows whose time difference from the start of the current run is <= '01:30:00', and restart the evaluation from the first row that doesn't meet the condition.
Let me try to explain the question better.
Using the window function lag():
SELECT id, datetime,
       CASE WHEN datetime - lag(datetime, 1) OVER (PARTITION BY id ORDER BY datetime)
                 < interval '01:30:00'
            THEN 1 ELSE 0
       END AS count
FROM mytable
The result is:
id | datetime | count
1 2018-04-09 06:27:00 0
1 2018-04-09 14:15:00 0
1 2018-04-09 15:25:00 1
1 2018-04-09 15:35:00 1
1 2018-04-09 15:51:00 1
1 2018-04-09 17:05:00 1
1 2018-04-10 06:42:00 0
1 2018-04-10 16:39:00 0
1 2018-04-10 18:58:00 0
1 2018-04-10 19:41:00 1
1 2018-04-14 17:05:00 0
1 2018-04-14 17:48:00 1
1 2018-04-14 18:57:00 1
But this is not what I want: row_number 5 should be excluded, because the interval between row_number 5 and row_number 2 is > '01:30:00', and a new evaluation should start from row_number 5.
The same goes for row_number 13.
The right output would be:
id | datetime | count
1 2018-04-09 06:27:00 0
1 2018-04-09 14:15:00 0
1 2018-04-09 15:25:00 1
1 2018-04-09 15:35:00 1
1 2018-04-09 15:51:00 0
1 2018-04-09 17:05:00 1
1 2018-04-10 06:42:00 0
1 2018-04-10 16:39:00 0
1 2018-04-10 18:58:00 0
1 2018-04-10 19:41:00 1
1 2018-04-14 17:05:00 0
1 2018-04-14 17:48:00 1
1 2018-04-14 18:57:00 0
So the right count is 5.

I'd use a recursive query for this:
WITH RECURSIVE tmp AS (
    -- anchor: the first row of each id opens a run, flag = 0
    SELECT
        id,
        datetime,
        row_number,
        0 AS counting,
        datetime AS last_start
    FROM mytable
    WHERE row_number = 1

    UNION ALL

    -- walk the rows in row_number order; keep the run's start while the
    -- 1h30 condition holds, otherwise restart the run at the current row
    SELECT
        t1.id,
        t1.datetime,
        t1.row_number,
        CASE WHEN lateral_1.counting THEN 1 ELSE 0 END AS counting,
        CASE WHEN lateral_1.counting THEN tmp.last_start ELSE t1.datetime END AS last_start
    FROM mytable AS t1
    INNER JOIN tmp
        ON (t1.id = tmp.id AND t1.row_number - 1 = tmp.row_number),
    LATERAL (
        SELECT (t1.datetime - tmp.last_start) < '1h 30m'::interval AS counting
    ) AS lateral_1
)
SELECT id, datetime, counting
FROM tmp
ORDER BY id, datetime;
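Since the question ultimately wants the total per id (5 here), the final SELECT of the same query can sum the flag instead:
SELECT id, SUM(counting) AS total
FROM tmp
GROUP BY id;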

Related

MySQL query to fill out the time gaps between the records

I want to write an optimized query to fill out the time gaps between the records with the stock value that is most recent to date.
The requirement is to have the latest stock value for every group of id_warehouse, id_stock, and date. The table is quite large (2 million records) and keeps growing, so I would like to optimize the query I have added below.
daily_stock_levels:
date       | id_stock | id_warehouse | new_stock | is_stock_available
2022-01-01 | 1        | 1            | 24        | 1
2022-01-01 | 1        | 1            | 25        | 1
2022-01-01 | 1        | 1            | 29        | 1
2022-01-02 | 1        | 1            | 30        | 1
2022-01-06 | 1        | 1            | 27        | 1
2022-01-09 | 1        | 1            | 26        | 1
Result:
date       | id_stock | id_warehouse | closest_date_with_stock_value | most_recent_stock_value
2022-01-01 | 1        | 1            | 29                            | 1
2022-01-02 | 1        | 1            | 30                            | 1
2022-01-03 | 1        | 1            | 30                            | 1
2022-01-04 | 1        | 1            | 30                            | 1
2022-01-05 | 1        | 1            | 30                            | 1
2022-01-06 | 1        | 1            | 27                            | 1
2022-01-07 | 1        | 1            | 27                            | 1
2022-01-08 | 1        | 1            | 27                            | 1
...
2022-08-08 | 1        | 1            | 26                            | 1
SELECT
    sl.date,
    sl.id_warehouse,
    sl.id_stock,
    (SELECT s.date
     FROM daily_stock_levels s
     WHERE s.is_stock_available = 1
       AND sl.id_warehouse = s.id_warehouse
       AND sl.id_stock = s.id_stock
       AND sl.date >= s.date
     ORDER BY s.date DESC
     LIMIT 1) AS closest_date_with_stock_value,
    (SELECT s.new_stock
     FROM daily_stock_levels s
     WHERE s.is_stock_available = 1
       AND sl.id_warehouse = s.id_warehouse
       AND sl.id_stock = s.id_stock
       AND sl.date >= s.date
     ORDER BY s.date DESC
     LIMIT 1) AS most_recent_stock_value
FROM daily_stock_levels sl
GROUP BY
    sl.id_warehouse,
    sl.id_stock,
    sl.date
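One way to speed this up, and also produce the gap dates (2022-01-03, 2022-01-04, ...) that the desired result contains but the query above never generates, is to build a calendar of all dates once and fetch both columns with a single lateral lookup instead of two correlated subqueries. A sketch, assuming MySQL 8.0.14+ (for recursive CTEs and LATERAL) and a date span under MySQL's default cte_max_recursion_depth of 1000 days; the calendar CTE and the alias names are mine:
WITH RECURSIVE calendar (date) AS (
    -- one row per day between the first and last reading
    SELECT MIN(date) FROM daily_stock_levels
    UNION ALL
    SELECT date + INTERVAL 1 DAY
    FROM calendar
    WHERE date < (SELECT MAX(date) FROM daily_stock_levels)
)
SELECT
    c.date,
    k.id_stock,
    k.id_warehouse,
    last_read.date      AS closest_date_with_stock_value,
    last_read.new_stock AS most_recent_stock_value
FROM calendar c
CROSS JOIN (SELECT DISTINCT id_warehouse, id_stock FROM daily_stock_levels) k
LEFT JOIN LATERAL (
    -- latest available reading on or before this calendar day
    SELECT s.date, s.new_stock
    FROM daily_stock_levels s
    WHERE s.is_stock_available = 1
      AND s.id_warehouse = k.id_warehouse
      AND s.id_stock = k.id_stock
      AND s.date <= c.date
    ORDER BY s.date DESC
    LIMIT 1
) AS last_read ON TRUE;
An index on (id_warehouse, id_stock, is_stock_available, date) turns each lateral lookup into a single index seek, which matters at 2 million rows.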

How can I get a new order sequence column when the type is only service

I have this transaction table. How can I get a new order sequence column that counts only the service and product rows?
Question
id | Type    | Date       | Sequence
1  | Member  | 2021-02-24 | 4
1  | product | 2021-01-03 | 2
2  | service | 2022-04-21 | 5
1  | product | 2021-02-01 | 3
2  | service | 2022-02-16 | 3
1  | Member  | 2022-02-03 | 6
1  | Service | 2021-10-23 | 5
2  | product | 2022-01-03 | 2
1  | service | 2020-12-16 | 1
2  | product | 2022-03-30 | 4
2  | service | 2021-12-01 | 1
1  | Member  | 2022-04-03 | 7
Result
id | Type    | Date       | Sequence | Expected Result
1  | Member  | 2021-02-24 | 4        | Null
1  | product | 2021-01-03 | 2        | 2
2  | service | 2022-04-21 | 5        | 5
1  | product | 2021-02-01 | 3        | 3
2  | service | 2022-02-16 | 3        | 3
1  | Member  | 2022-02-03 | 6        | Null
1  | Service | 2021-10-23 | 5        | 4
2  | product | 2022-01-03 | 2        | 2
1  | service | 2020-12-16 | 1        | 1
2  | product | 2022-03-30 | 4        | 4
2  | service | 2021-12-01 | 1        | 1
1  | Member  | 2022-04-03 | 7        | Null
You can try combining a CASE WHEN expression with ROW_NUMBER(), repeating the CASE inside PARTITION BY so the service/product rows are renumbered by date separately from the Member rows:
SELECT *,
       CASE WHEN Type IN ('service', 'product')
            THEN ROW_NUMBER() OVER (
                     PARTITION BY id,
                                  CASE WHEN Type IN ('service', 'product') THEN 1 ELSE 0 END
                     ORDER BY Date)
       END AS Newcolumn
FROM T
For id = 1 this numbers the service/product rows 1 (2020-12-16), 2 (2021-01-03), 3 (2021-02-01), 4 (2021-10-23) and leaves the Member rows Null, matching the expected result.

How to query with lead() values not in current range

I'm having problems when the lead() values fall outside the queried range: rows on the range's edge return null lead() values.
Let’s say I have a simple table to keep track of continuous counters
create table anytable
( wseller  integer NOT NULL,
  wday     date NOT NULL,
  wshift   smallint NOT NULL,
  wcounter numeric(9,1) )
with the following values
wseller wday wshift wcounter
1 2016-11-30 1 100.5
1 2017-01-03 1 102.5
1 2017-01-25 2 103.2
1 2017-02-05 2 106.1
2 2015-05-05 2 81.1
2 2017-01-01 1 92.1
2 2017-01-01 2 93.1
3 2016-12-01 1 45.2
3 2017-01-05 1 50.1
and want net units for the current year:
wseller wday wshift units
1 2017-01-03 1 2
1 2017-01-25 2 0.7
1 2017-02-05 2 2.9
2 2017-01-01 1 11
2 2017-01-01 2 1
3 2017-01-05 1 4.9
If I use
select wseller, wday, wshift,
       wcounter - lead(wcounter) over (partition by wseller order by wseller, wday desc, wshift desc)
from anytable
where wday >= '2017-01-01'
it gives me nulls for the first row of each wseller partition. I'm using this query within a large CTE.
What am I doing wrong?
A window function only sees the rows that remain after the WHERE clause has been applied, so the rows before 2017-01-01 that lead() needs are already filtered away. Move the condition to an outer query:
select *
from (
    select wseller, wday, wshift,
           wcounter - lead(wcounter) over (partition by wseller order by wday desc, wshift desc)
    from anytable
) s
where wday >= '2017-01-01'
order by wseller, wday, wshift
wseller | wday | wshift | ?column?
---------+------------+--------+----------
1 | 2017-01-03 | 1 | 2.0
1 | 2017-01-25 | 2 | 0.7
1 | 2017-02-05 | 2 | 2.9
2 | 2017-01-01 | 1 | 11.0
2 | 2017-01-01 | 2 | 1.0
3 | 2017-01-05 | 1 | 4.9
(6 rows)

Pandas edit dataframe

I am querying a MongoDB collection with two queries and appending the results to get a single data frame (keys are: status, date, uniqueid).
import pandas as pd

for record in results:
    query1 = record["sensordata"]["user"]
    df1 = pd.DataFrame(query1.items())
    query2 = record["created_date"]
    df2 = pd.DataFrame(query2.items())
    index = "status"
    result = df1.append(df2, index)
    b = result.transpose()
    print b
    b.to_csv(q)
The output is:
0 1 2
0 status uniqueid date
1 0 191b117fcf5c 2017-03-01 17:51:08.263000
0 1 2
0 status uniqueid date
1 1 191b117fcf5c 2017-03-01 17:51:17.216000
0 1 2
0 status uniqueid date
1 1 191b117fcf5c 2017-03-01 17:51:23.269000
0 1 2
0 status uniqueid date
1 1 191b117fcf5c 2017-03-01 18:26:17.216000
0 1 2
0 status uniqueid date
1 1 191b117fcf5c 2017-03-01 18:26:21.130000
0 1 2
0 status uniqueid date
1 0 191b117fcf5c 2017-03-01 18:26:28.217000
How can I remove these extra 0, 1, 2 and 0, 1 labels from the rows and columns?
Also, I don't want status, uniqueid, and date repeated every time.
My desired output should be like this:
status uniqueid date
0 191b117fcf5c 2017-03-01 18:26:28.217000
1 191b117fcf5c 2017-03-01 19:26:28.192000
1 191b117fcf5c 2017-04-01 11:16:28.222000
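One way to get that shape (a sketch; it assumes record["sensordata"]["user"] is a dict carrying the status and uniqueid values and record["created_date"] holds the date, as the output above suggests): collect one plain dict per record and build a single DataFrame after the loop, so pandas writes the header once, then drop the numeric labels when printing and exporting:
import pandas as pd

rows = []
for record in results:
    user = record["sensordata"]["user"]
    rows.append({
        "status": user["status"],        # assumed key, per the output above
        "uniqueid": user["uniqueid"],    # assumed key, per the output above
        "date": record["created_date"],
    })

df = pd.DataFrame(rows, columns=["status", "uniqueid", "date"])
print df.to_string(index=False)  # omit the 0, 1, 2 row labels when printing
df.to_csv(q, index=False)        # index=False drops the 0, 1 row labels in the CSV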

Replace DataFrame rows with most recent data based on key

I have a dataframe that looks like this:
user_id val date
1 10 2015-02-01
1 11 2015-01-01
2 12 2015-03-01
2 13 2015-02-01
3 14 2015-03-01
3 15 2015-04-01
I need to run a function that calculates (let's say) the sum of vals chronologically by date. If a user has a more recent row on or before the given date, use that row; otherwise keep the older one.
For example, if I run the function with the date 2015-03-15, then the table will be:
user_id val date
1 10 2015-02-01
2 12 2015-03-01
3 14 2015-03-01
Giving me a sum of 36.
If I run the function with the date 2015-04-15, then the table will be:
user_id val date
1 10 2015-02-01
2 12 2015-03-01
3 15 2015-04-01
(User 3's row was replaced with a more recent date).
I know this is fairly esoteric, but I thought I could bounce this off all of you, as I have been trying to think of a simple way of doing this.
Try this:
In [36]: df.loc[df.date <= '2015-03-15']
Out[36]:
user_id val date
0 1 10 2015-02-01
1 1 11 2015-01-01
2 2 12 2015-03-01
3 2 13 2015-02-01
4 3 14 2015-03-01
In [39]: df.loc[df.date <= '2015-03-15'].sort_values('date').groupby('user_id').agg({'date':'last', 'val':'last'}).reset_index()
Out[39]:
user_id date val
0 1 2015-02-01 10
1 2 2015-03-01 12
2 3 2015-03-01 14
or:
In [40]: df.loc[df.date <= '2015-03-15'].sort_values('date').groupby('user_id').last().reset_index()
Out[40]:
user_id val date
0 1 10 2015-02-01
1 2 12 2015-03-01
2 3 14 2015-03-01
In [41]: df.loc[df.date <= '2015-04-15'].sort_values('date').groupby('user_id').last().reset_index()
Out[41]:
user_id val date
0 1 10 2015-02-01
1 2 12 2015-03-01
2 3 15 2015-04-01
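If you want this packaged as the function described in the question, a minimal sketch (the name total_as_of and the cutoff parameter are mine):
def total_as_of(df, cutoff):
    # keep only rows on or before the cutoff, then take each user's
    # chronologically last row and sum the values
    latest = (df.loc[df.date <= cutoff]
                .sort_values('date')
                .groupby('user_id')
                .last()
                .reset_index())
    return latest['val'].sum()

total_as_of(df, '2015-03-15')  # 36
total_as_of(df, '2015-04-15')  # 37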