how can I generate 6 month dates from a specific date - pyspark

I have a table pqdf.
It has an Effective_Date column; first I take the distinct values of Effective_Date.
From that date I want to generate six months of dates:
if my start date is 2022-01-01, then the last row should be 2022-06-30, for a total of about 181 rows.
+----------------+
| Effective_Date |
+----------------+
| 2022-01-01 |
| 2022-01-01 |
| 2022-01-01 |
+----------------+
Please help.
I tried the query below, but it's not working.
select explode (sequence( first_value(to_date('Effective_Date'))), to_date(DATEADD(month, 6, Effective_Date)), interval 1 day) as date from pqdf

See if this works. If it doesn't, can you please also provide the error message that you are seeing?
WITH pqdf AS (
  SELECT "2022-01-01" AS Effective_Date
)
SELECT
  EXPLODE(SEQUENCE(
    DATE(Effective_Date),
    TO_DATE(DATEADD(MONTH, 6, DATE(Effective_Date))),
    INTERVAL 1 DAY
  )) AS date
FROM pqdf
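
If the DataFrame API is easier to work with, here is a minimal sketch of the same idea (assuming Spark 2.4+, where sequence is available). Subtracting one day from the six-month end date keeps the range at 2022-01-01 through 2022-06-30, i.e. 181 rows:

from pyspark.sql import functions as F

# take the distinct effective dates, then explode a per-day sequence
# covering six months (the end date is pulled back by one day so the
# range stops at 2022-06-30 rather than 2022-07-01)
result = (
    pqdf
    .select("Effective_Date").distinct()
    .select(
        F.explode(
            F.sequence(
                F.to_date("Effective_Date"),
                F.date_sub(F.add_months(F.to_date("Effective_Date"), 6), 1),
                F.expr("interval 1 day"),
            )
        ).alias("date")
    )
)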

Related

How to calculate current month / six months ago and result as a percent change in Postgresql?

create table your_table(type text,compdate date,amount numeric);
insert into your_table values
('A','2022-01-01',50),
('A','2022-02-01',76),
('A','2022-03-01',300),
('A','2022-04-01',234),
('A','2022-05-01',14),
('A','2022-06-01',9),
('B','2022-01-01',201),
('B','2022-02-01',33),
('B','2022-03-01',90),
('B','2022-04-01',41),
('B','2022-05-01',11),
('B','2022-06-01',5),
('C','2022-01-01',573),
('C','2022-02-01',77),
('C','2022-03-01',109),
('C','2022-04-01',137),
('C','2022-05-01',405),
('C','2022-06-01',621);
I am trying to calculate and show the percentage change in $ from 6 months prior to today's date for each type. For example:
Type A decreased -82% over six months.
Type B decreased -97.5%.
Type C increased +8.4%.
How do I write this in PostgreSQL, mixed in with other statements?
It looks like you are comparing against 5 months prior, not 6, and 2022-06-01 isn't today's date.
Join the table with itself based on the matching type and desired time difference. Demo
select
  b.type,
  b.compdate,
  a.compdate "6 months earlier",
  b.amount "amount 6 months back",
  round(-(100 - b.amount / a.amount * 100), 2) "change"
from your_table a
inner join your_table b
  on a.type = b.type
  and a.compdate = b.compdate - '5 months'::interval;

-- type | compdate   | 6 months earlier | amount 6 months back | change
-- -----+------------+------------------+----------------------+--------
-- A    | 2022-06-01 | 2022-01-01       |                    9 | -82.00
-- B    | 2022-06-01 | 2022-01-01       |                    5 | -97.51
-- C    | 2022-06-01 | 2022-01-01       |                  621 |   8.38
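
For comparison, here is a sketch of the same calculation using window functions over your_table instead of a self-join; it compares each type's last amount to its first (ordered by compdate), so it does not depend on a fixed 5-month offset:

-- compare the last amount per type to the first, ordered by compdate
select distinct
    type,
    round((last_value(amount)  over w
         / first_value(amount) over w - 1) * 100, 2) as change
from your_table
window w as (
    partition by type
    order by compdate
    rows between unbounded preceding and unbounded following
)
order by type;

With the sample data above this returns -82.00, -97.51 and 8.38 for A, B and C, matching the self-join output.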

Using Timescale to find the latest value per interval

I have time-series data with up to millisecond accuracy. Some of these timestamps coincide exactly, in which case a database id column can be used to decide which row is the latest.
I am trying to use Timescale to get the latest values per second.
Here is an example of the data I'm looking at
time                     | db_id | value
2020-01-01 08:39:23.293 | 4460 | 136.01 |
2020-01-01 08:39:23.393 | 4461 | 197.95 |
2020-01-01 08:40:38.973 | 4462 | 57.95 |
2020-01-01 08:43:01.223 | 4463 | 156 |
2020-01-01 08:43:26.577 | 4464 | 253.43 |
2020-01-01 08:43:26.577 | 4465 | 53.68 |
2020-01-01 08:43:26.577 | 4466 | 160.00 |
When obtaining latest price per second, my results should look like this
time                | value
2020-01-01 08:39:23 | 197.95 |
2020-01-01 08:39:24 | 197.95 |
.
.
.
2020-01-01 08:40:37 | 197.95 |
2020-01-01 08:40:38 | 57.95 |
2020-01-01 08:40:39 | 57.95 |
.
.
.
2020-01-01 08:43:25 | 57.95 |
2020-01-01 08:43:26 | 160.00 |
2020-01-01 08:43:27 | 160.00 |
.
.
.
I've successfully obtained the latest results per second using the Timescale time_bucket
SELECT last(value, db_id), time_bucket('1 seconds', time) AS per_second FROM timeseries GROUP BY per_second ORDER BY per_second DESC;
but it leaves holes in the time column.
time                | value
2020-01-01 08:39:23 | 197.95 |
2020-01-01 08:40:38 | 57.95 |
2020-01-01 08:43:26 | 160.00 |
The solution I thought of was to create a table with per-second timestamps and null values, migrate the data from the previous result into it, and then replace the null values with the last occurring value, but that seems like a lot of intermediate steps.
I'd like to know if there is a better approach to finding the "latest value" per second, minute, hour, etc. I originally tried solving this in Python, as it seemed like a simple problem, but it took a lot of computing time.
Found a nice working solution to my problem.
It involves four main steps:
getting latest values
select
  time_bucket('1 second', time + '1 second') as interval,
  last(val, db_id) as last_value
from table
where time > <date_start> and time < <date_end>
group by interval
order by interval;
This produces a table with the latest value per second. last also takes a second argument (db_id here) in case another level of ordering is required.
e.g.
time                | last_value
2020-01-01 08:39:23 | 197.95 |
2020-01-01 08:40:38 | 57.95 |
2020-01-01 08:43:26 | 160.00 |
Note that I shift the time by one second with + '1 second' since I only want data before a particular second - without this it will consider on-the-second data as part of the last price.
creating a table with timestamps per second
select
  time_bucket_gapfill('1 second', time) as per_second
from table
where time > <date_start> and time < <date_end>
group by per_second
order by per_second;
Here I produce a table where each row has per second timestamps.
e.g.
per_second
2020-01-01 00:00:00.000
2020-01-01 00:00:01.000
2020-01-01 00:00:02.000
2020-01-01 00:00:03.000
2020-01-01 00:00:04.000
2020-01-01 00:00:05.000
join them together and add a value_partition column
select
  per_second,
  last_value,
  sum(case when last_value is null then 0 else 1 end)
    over (order by per_second) as value_partition
from
(
  select
    time_bucket('1 second', time + '1 second') as interval,
    last(val, db_id) as last_value
  from table
  where time > <date_start> and time < <date_end>
  group by interval, time
) a
right join
(
  select
    time_bucket_gapfill('1 second', time) as per_second
  from table
  where time > <date_start> and time < <date_end>
  group by per_second
) b
on a.interval = b.per_second
Inspired by this answer, the goal is to have a counter (value_partition) that increments only if the value is not null.
e.g.
per_second latest_value value_partition
2020-01-01 00:00:00.000 NULL 0
2020-01-01 00:00:01.000 15.82 1
2020-01-01 00:00:02.000 NULL 1
2020-01-01 00:00:03.000 NULL 1
2020-01-01 00:00:04.000 NULL 1
2020-01-01 00:00:05.000 NULL 1
2020-01-01 00:00:06.000 NULL 1
2020-01-01 00:00:07.000 NULL 1
2020-01-01 00:00:08.000 NULL 1
2020-01-01 00:00:09.000 NULL 1
2020-01-01 00:00:10.000 15.72 2
2020-01-01 00:00:10.000 14.67 3
filling in the null values
select
  per_second,
  first_value(last_value) over (partition by value_partition order by per_second) as latest_value
from
(
  select
    per_second,
    last_value,
    sum(case when last_value is null then 0 else 1 end)
      over (order by per_second) as value_partition
  from
  (
    select
      time_bucket('1 second', time + '1 second') as interval,
      last(val, db_id) as last_value
    from table
    where time > <date_start> and time < <date_end>
    group by interval
  ) a
  right join
  (
    select
      time_bucket_gapfill('1 second', time) as per_second
    from table
    where time > <date_start> and time < <date_end>
    group by per_second
  ) b
  on a.interval = b.per_second
) as q
This final step brings everything together.
This takes advantage of the value_partition column and overwrites the null values accordingly.
e.g.
per_second latest_value
2020-01-01 00:00:00.000 NULL
2020-01-01 00:00:01.000 15.82
2020-01-01 00:00:02.000 15.82
2020-01-01 00:00:03.000 15.82
2020-01-01 00:00:04.000 15.82
2020-01-01 00:00:05.000 15.82
2020-01-01 00:00:06.000 15.82
2020-01-01 00:00:07.000 15.82
2020-01-01 00:00:08.000 15.82
2020-01-01 00:00:09.000 15.82
2020-01-01 00:00:10.000 15.72
2020-01-01 00:00:10.000 14.67
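
As a side note, depending on the TimescaleDB version, the gap-filling and carry-forward can also be done in one query with locf() inside a time_bucket_gapfill() query. This is only a sketch against the timeseries table from the question, and it does not include the one-second shift used in the steps above:

-- gap-fill to one row per second and carry the last observed value forward
select
  time_bucket_gapfill('1 second', time) as per_second,
  locf(last(value, db_id)) as latest_value
from timeseries
where time > <date_start> and time < <date_end>
group by per_second
order by per_second;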

Calculate Average of Price per Items per Month in a Few Years Postgresql

I have this table inside my postgresql database,
item_code | date | price
==============================
aaaaaa.1 |2019/12/08 | 3.04
bbbbbb.b |2019/12/08 | 19.48
261893.c |2019/12/08 | 7.15
aaaaaa.1 |2019/12/17 | 4.15
bbbbbb.2 |2019/12/17 | 20
xxxxxx.5 |2019/03/12 | 3
xxxxxx.5 |2019/03/18 | 4.5
How can I calculate the average per item, per month, over the year, so I get a result something like:
item_code | month | price
==============================
aaaaaa.1 | 2019/12 | 3.59
bbbbbb.2 | 2019/12 | 19.74
261893.c | 2019/12 | 7.15
xxxxxx.5 | 2019/03 | 3.75
I have tried many alternatives, but I still don't get it. I would really appreciate your help because I am new to PostgreSQL.
I don't see how the question relates to a moving average. It seems you just want group by:
select item_code, date_trunc('month', date) as date_month, avg(price) as price
from mytable
group by item_code, date_month
This gives date_month as a date, truncated to the first day of the month, which I find more useful than the format you suggested. But if you do want that format:
to_char(date, 'YYYY/MM') as date_month
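
Putting it together, a variant of the same query with that format might look roughly like this (round() is only added here to match the two-decimal output shown in the question):

select
  item_code,
  to_char(date, 'YYYY/MM') as date_month,
  round(avg(price)::numeric, 2) as price
from mytable
group by item_code, date_month
order by item_code, date_month;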

Stored procedure (or better way) to add a new row to existing table every day at 22:00

I will be very grateful for your advice regarding the following issue.
Given:
PostgreSQL database
Initial (basic) query
select day, Value_1, Value_2, Value_3
from table
where day=current_date
which returns a row with the following columns:
Day | Value_1 (int) | Value_2 (int) | Value_3 (int)
2019-11-14 | 10 | 10 | 14
The goal is to create a view with this starting information and add a new row every day, based on the outcome of the initial query executed at 22:00.
The expected outcome tomorrow at 22:01 would be:
Day | Value_1 | Value_2 | Value_3
2019-11-14 | 10 | 10 | 14
2019-11-15 | N | M | P
Many thanks in advance for your time and support.
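
PostgreSQL has no built-in scheduler, so one common approach is the pg_cron extension (an external cron job calling psql works as well). Below is only a rough sketch, assuming pg_cron is installed and using the hypothetical names daily_snapshot for the accumulating table and source_table for the table behind the initial query:

-- history table that accumulates one row per day
create table if not exists daily_snapshot (
    day     date primary key,
    value_1 int,
    value_2 int,
    value_3 int
);

-- run the insert every day at 22:00 (server time)
select cron.schedule(
    'daily-snapshot',
    '0 22 * * *',
    $$insert into daily_snapshot (day, value_1, value_2, value_3)
      select day, value_1, value_2, value_3
      from source_table
      where day = current_date
      on conflict (day) do nothing$$
);

A plain view over daily_snapshot can then expose the accumulated rows.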

Postgresql Time Series for each Record

I'm having issues trying to wrap my head around how to extract some time series stats from my Postgres DB.
For example, I have several stores. I record how many sales each store made each day in a table that looks like:
+------------+----------+-------+
| Date | Store ID | Count |
+------------+----------+-------+
| 2017-02-01 | 1 | 10 |
| 2017-02-01 | 2 | 20 |
| 2017-02-03 | 1 | 11 |
| 2017-02-03 | 2 | 21 |
| 2017-02-04 | 3 | 30 |
+------------+----------+-------+
I'm trying to display this data on a bar/line graph with different lines per Store and the blank dates filled in with 0.
I have been successful getting it to show the sum per day (combining all the stores into one sum) using generate_series, but I can't figure out how to separate it out so each store has a value for each day... the result being something like:
["Store ID 1", 10, 0, 11, 0]
["Store ID 2", 20, 0, 21, 0]
["Store ID 3", 0, 0, 0, 30]
You need to build a cross join of dates X stores:
select store_id, array_agg(total order by date) as total
from (
  select store_id, date, coalesce(sum(total), 0) as total
  from
    t
    right join (
      generate_series(
        (select min(date) from t),
        (select max(date) from t),
        '1 day'
      ) gs (date)
      cross join
      (select distinct store_id from t) s
    ) using (date, store_id)
  group by 1, 2
) s
group by 1
order by 1;
store_id | total
----------+-------------
1 | {10,0,11,0}
2 | {20,0,21,0}
3 | {0,0,0,30}
Sample data:
create table t (date date, store_id int, total int);
insert into t (date, store_id, total) values
('2017-02-01',1,10),
('2017-02-01',2,20),
('2017-02-03',1,11),
('2017-02-03',2,21),
('2017-02-04',3,30);