How to calculate the AVG timestamp for the last week - snowflake-task

I am trying to calculate the AVG timestamp for the last 7 days in a Snowflake database.
The data type is VARCHAR and below is the sample data.
LOAD_TIME VARCHAR(10)
Sample Data:
LOAD_TIME (HHMM)
1017
0927
0713
0645
1753
2104
1253

If you convert these values to epoch_seconds, it's possible to calculate the average:
select to_varchar(to_timestamp(avg(date_part(epoch_second,to_timestamp(load_time,'HH24MI')))), 'HH24MI') as average
from values
('1017'),('0927'),('0713'),('0645'),('1753'),('2104'),('1253') tmp (load_time);
+---------+
| AVERAGE |
+---------+
| 1213    |
+---------+
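The question asks about the last 7 days, so against a real table you would also filter on a date column, since LOAD_TIME only carries the HHMM part. A minimal sketch, assuming a hypothetical table MY_TABLE with a LOAD_DATE date column next to LOAD_TIME:

select to_varchar(
         to_timestamp(avg(date_part(epoch_second, to_timestamp(load_time, 'HH24MI')))),
         'HH24MI') as average
from my_table                                        -- hypothetical table name
where load_date >= dateadd(day, -7, current_date);   -- restrict to the last 7 days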

Related

Is there a way to extract start and end from a TimeWeightSummary - Timescaledb

I'm using the time_weight function from timescaledb in a SQL query. From the doc, time_weight is defined as:
An aggregate that produces a TimeWeightSummary from timestamps and associated values.
The doc also says:
Internally, the first and last points seen as well as the calculated weighted sum are stored in each TimeWeightSummary
Is there a way to extract the first and the last point's value and timestamp from this TimeWeightSummary?
Here is my SQL query:
WITH t as (
    SELECT
        time_bucket('1 hour'::interval, time) as dt,
        time_weight('Linear', time, p) AS tw
    FROM tsdb.asset_sensor_sample
    WHERE asset = 16
    GROUP BY time_bucket('1 hour'::interval, time)
)
SELECT
    dt,
    tw,
    average(tw) -- extract the average from the time weight summary
FROM t LIMIT 5;
Here is the result:
dt | tw | average
------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------
2022-06-20 10:00:00+00 | (version:1,first:(ts:"2022-06-20 10:15:43.274976+00",val:127334.29194694175),last:(ts:"2022-06-20 10:55:43.274976+00",val:54155.74946698491),weighted_sum:239731880571173.1,method:Linear) | 99888.28357132213
2022-06-20 11:00:00+00 | (version:1,first:(ts:"2022-06-20 11:05:43.274976+00",val:72054.79620663091),last:(ts:"2022-06-20 11:55:43.274976+00",val:117667.04302813657),weighted_sum:247386550516233.63,method:Linear) | 82462.1835054112
2022-06-20 12:00:00+00 | (version:1,first:(ts:"2022-06-20 12:05:43.274976+00",val:95982.76112628987),last:(ts:"2022-06-20 12:55:43.274976+00",val:83790.58995691259),weighted_sum:259849005747598.72,method:Linear) | 86616.33524919958
2022-06-20 13:00:00+00 | (version:1,first:(ts:"2022-06-20 13:05:43.274976+00",val:108062.03874288549),last:(ts:"2022-06-20 13:55:43.274976+00",val:117030.17726887773),weighted_sum:329446374963518.94,method:Linear) | 109815.45832117298
2022-06-20 14:00:00+00 | (version:1,first:(ts:"2022-06-20 14:05:43.274976+00",val:64564.42379745973),last:(ts:"2022-06-20 14:55:43.274976+00",val:95290.48787317045),weighted_sum:303986848836652.75,method:Linear) | 101328.94961221758
(5 rows)
We can clearly see that this information is available in the tw column, but I don't know how to extract it.
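If your timescaledb_toolkit version provides the first_time/first_val/last_time/last_val accessors for TimeWeightSummary (an assumption to verify against the toolkit docs for your installed version), a sketch would look like this:

-- sketch, assuming a timescaledb_toolkit version that exposes these accessors
WITH t as (
    SELECT
        time_bucket('1 hour'::interval, time) as dt,
        time_weight('Linear', time, p) AS tw
    FROM tsdb.asset_sensor_sample
    WHERE asset = 16
    GROUP BY time_bucket('1 hour'::interval, time)
)
SELECT
    dt,
    first_time(tw) AS first_ts,   -- timestamp of the first point in the bucket
    first_val(tw)  AS first_value,
    last_time(tw)  AS last_ts,    -- timestamp of the last point in the bucket
    last_val(tw)   AS last_value,
    average(tw)    AS avg_value
FROM t LIMIT 5;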

Calculate Average of Price per Item per Month over a Few Years in Postgresql

I have this table inside my postgresql database,
item_code | date | price
==============================
aaaaaa.1 |2019/12/08 | 3.04
bbbbbb.b |2019/12/08 | 19.48
261893.c |2019/12/08 | 7.15
aaaaaa.1 |2019/12/17 | 4.15
bbbbbb.2 |2019/12/17 | 20
xxxxxx.5 |2019/03/12 | 3
xxxxxx.5 |2019/03/18 | 4.5
How can I calculate the average per item, per month, over the year, so I get a result something like this:
item_code | month | price
==============================
aaaaaa.1 | 2019/12 | 3.59
bbbbbb.2 | 2019/12 | 19.74
261893.c | 2019/12 | 7.15
xxxxxx.5 | 2019/03 | 3.75
I have tried to look at and apply many alternatives but I still don't get it; I would really appreciate your help because I am new to PostgreSQL.
I don't see how the question relates to a moving average. It seems you just want group by:
select item_code, date_trunc('month', date) as date_month, avg(price) as price
from mytable
group by item_code, date_month
This gives date_month as a date, truncated to the first day of the month, which I find more useful than the format you suggested. But if you do want that format:
to_char(date, 'YYYY/MM') as date_month
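Putting the two together, a full query producing the YYYY/MM format from the question could look like this (mytable stands in for your table name, as above):

select item_code,
       to_char(date, 'YYYY/MM') as date_month,
       avg(price) as price
from mytable
group by item_code, date_month
order by item_code, date_month;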

Pyspark calculated field based off time difference

I have a table that looks like this:
trip_distance | tpep_pickup_datetime | tpep_dropoff_datetime|
+-------------+----------------------+----------------------+
1.5 | 2019-01-01 00:46:40 | 2019-01-01 00:53:20 |
In the end, I need to create a speed column for each row, so something like this:
trip_distance | tpep_pickup_datetime | tpep_dropoff_datetime| speed |
+-------------+----------------------+----------------------+-------+
1.5 | 2019-01-01 00:46:40 | 2019-01-01 00:53:20 | 13.5 |
So this is what I'm trying to do to get there. I figure I should add an interim column to help out, called trip_time, which is a calculation of tpep_dropoff_datetime - tpep_pickup_datetime. Here is the code I'm using to get that:
df4 = df.withColumn('trip_time', df.tpep_dropoff_datetime - df.tpep_pickup_datetime)
which is producing a nice trip_time column:
trip_distance | tpep_pickup_datetime | tpep_dropoff_datetime| trip_time|
+-------------+----------------------+----------------------+-----------------------+
1.5 | 2019-01-01 00:46:40 | 2019-01-01 00:53:20 | 6 minutes 40 seconds|
But now I want to do the speed column, and this how I'm trying to do that:
df4 = df4.withColumn('speed', (F.col('trip_distance') / F.col('trip_time')))
But that is giving me this error:
AnalysisException: cannot resolve '(trip_distance/trip_time)' due to data type mismatch: differing types in '(trip_distance/trip_time)' (float and interval).;;
Is there a better way?
One option is to convert your timestamps to unix_timestamp, which is in seconds; the subtraction then gives you the interval as an integer that can be used to calculate the speed:
import pyspark.sql.functions as f

df.withColumn('speed', f.col('trip_distance') * 3600 / (
    f.unix_timestamp('tpep_dropoff_datetime') - f.unix_timestamp('tpep_pickup_datetime'))
).show()
+-------------+--------------------+---------------------+-----+
|trip_distance|tpep_pickup_datetime|tpep_dropoff_datetime|speed|
+-------------+--------------------+---------------------+-----+
| 1.5| 2019-01-01 00:46:40| 2019-01-01 00:53:20| 13.5|
+-------------+--------------------+---------------------+-----+

Stored procedure (or better way) to add a new row to existing table every day at 22:00

I will be very grateful for your advice regarding the following issue.
Given:
PostgreSQL database
Initial (basic) query
select day, Value_1, Value_2, Value_3
from table
where day=current_date
which returns a row with the following columns
Day | Value_1(int) | Value_2(int) | Value 3 (int)
2019-11-14 | 10 | 10 | 14
It is needed to create a view with this starting information and to add a new row every day, based on the outcome of the initial query executed at 22:00.
The expected outcome tomorrow at 22:01 will be
Day | Value_1 | Value_2 | Value_3
2019-11-14 | 10 | 10 | 14
2019-11-15 | N | M | P
Many thanks in advance for your time and support.
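One common approach, sketched here under the assumption that the pg_cron extension is available on your server: keep a snapshot table, schedule the insert for 22:00, and define the view over that table. The table, view, and source_table names below are hypothetical stand-ins.

-- hypothetical snapshot table matching the columns of the base query
CREATE TABLE IF NOT EXISTS daily_snapshot (
    day     date,
    value_1 int,
    value_2 int,
    value_3 int
);

-- schedule the insert every day at 22:00 (server time); source_table stands in
-- for the table used in the base query
SELECT cron.schedule('0 22 * * *', $$
    INSERT INTO daily_snapshot
    SELECT day, Value_1, Value_2, Value_3
    FROM source_table
    WHERE day = current_date
$$);

-- the requested view is then just a projection over the snapshot table
CREATE VIEW daily_values AS
SELECT day, value_1, value_2, value_3
FROM daily_snapshot;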

Spark time series data generation

I am trying to generate time series data in Spark and Scala. I have the following hourly data in a DataFrame:
sid | date                         | count
200 | 2016-04-30T18:00:00:00+00:00 | 10
200 | 2016-04-30T21:00:00:00+00:00 | 5
I want to generate time series data for the last 2 days hourly, taking the max time from the input. In my case the series data should start from 2016-04-30T21:00:00:00+00:00 and generate hourly data. Any hour without data should have its count as null. Sample output as follows:
id|sid|date |count
1 |200|2016-04-28T22:00:00:00+00:00 |
2 |200|2016-04-28T23:00:00:00+00:00 |
3 |200|2016-04-29T00:00:00:00+00:00 |
--------------------------------------
45|200|2016-04-30T18:00:00:00+00:00 |10
--------------------------------------
--------------------------------------
48|200|2016-04-30T21:00:00:00+00:00 |5
Thanks,
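Since the ask boils down to generating an hourly grid and left-joining the existing data onto it, here is a sketch in Spark SQL, assuming Spark 2.4+ (for sequence/explode), that the DataFrame is registered as a temp view named events(sid, date, count), and that the date column is already a timestamp (cast it first if it is a string):

-- hourly grid over the last 2 days per sid, left-joined to the input rows;
-- "events" is an assumed temp view name
WITH bounds AS (
    SELECT sid, max(date) AS max_ts
    FROM events
    GROUP BY sid
),
grid AS (
    SELECT sid,
           explode(sequence(max_ts - INTERVAL 2 DAYS, max_ts, INTERVAL 1 HOUR)) AS date
    FROM bounds
)
SELECT row_number() OVER (PARTITION BY g.sid ORDER BY g.date) AS id,
       g.sid,
       g.date,
       e.count
FROM grid g
LEFT JOIN events e ON e.sid = g.sid AND e.date = g.date
ORDER BY g.sid, g.date;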