Hourly data from a table containing per minute data - postgresql

I have a table with three columns in a PostgreSQL database: timestamp, tag and value. Data is inserted into this table automatically from a log file generated by a SCADA server. I need the hourly data from this table, i.e. the rows recorded exactly on the hour (20:00:00, 21:00:00, ...).
timestamp tag value
2019-06-06 06:00:00 x 123
2019-06-06 06:00:00 y 456
2019-06-06 06:01:00 x 123
2019-06-06 06:01:00 y 656
2019-06-06 06:02:00 x 123
2019-06-06 06:02:00 y 333
.......
.......
2019-06-06 06:59:00 x 2232
2019-06-06 06:59:00 y 654
2019-06-06 07:00:00 x 5645
2019-06-06 07:00:00 y 54654
I want only the rows recorded exactly on the hour (2019-06-06 06:00:00, 2019-06-06 07:00:00, ...). The table is updated every minute, so I can't hard-code the timestamps in the WHERE clause.
The desired output looks like this:
timestamp tag value
2019-06-06 06:00:00 x 123
2019-06-06 06:00:00 y 456
2019-06-06 07:00:00 x 5645
2019-06-06 07:00:00 y 54654
...
.....
......
2019-06-09 07:00:00 x 5645
2019-06-09 07:00:00 y 54654

It seems you only want the rows that were recorded exactly on the full hour.
You can do that with a simple WHERE clause:
select *
from the_table
where date_trunc('hour', "timestamp") = "timestamp";
date_trunc truncates the timestamp value to the given granularity, so with 'hour' the minutes, seconds and fractional seconds are set to zero. A row therefore matches only when its timestamp already lies exactly on the hour.
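The same on-the-hour test can be sketched outside the database. This is only a minimal illustration in Python, assuming the rows are available as (timestamp, tag, value) tuples; truncate_to_hour and the sample rows are hypothetical stand-ins for date_trunc('hour', ...) and the_table.

```python
from datetime import datetime

def truncate_to_hour(ts: datetime) -> datetime:
    """Mimic date_trunc('hour', ts): zero out everything below the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

# Hypothetical sample rows shaped like the table in the question.
rows = [
    (datetime(2019, 6, 6, 6, 0), "x", 123),
    (datetime(2019, 6, 6, 6, 1), "x", 123),
    (datetime(2019, 6, 6, 6, 59), "y", 654),
    (datetime(2019, 6, 6, 7, 0), "y", 54654),
]

# Keep a row only when truncating its timestamp changes nothing.
on_the_hour = [row for row in rows if truncate_to_hour(row[0]) == row[0]]
```

Only the 06:00 and 07:00 rows survive the filter, which is the behaviour the WHERE clause relies on.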

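You can extract the hour part of the timestamp first, then group your result by the hour.
SELECT date_part('hour', "timestamp") AS hour, STRING_AGG(tag, ','), STRING_AGG(value::text, ',')
FROM your_table
GROUP BY hour;
date_part is one of various functions that extract different parts of a timestamp. Note that date_part('hour', ...) returns only the hour of day (0 to 23), so rows from different days with the same hour end up in the same group.
STRING_AGG combines values from different rows into one string. It needs text input, hence the value::text cast (assuming value is a numeric column).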
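To make the grouping behaviour concrete, here is a rough Python sketch with hypothetical sample rows. It contrasts grouping by the hour of day (what date_part('hour', ...) yields) with grouping by the full truncated hour, and joins each group the way STRING_AGG would.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: two on 2019-06-06 and one on the next day, all in hour 6.
rows = [
    (datetime(2019, 6, 6, 6, 0), "x", 123),
    (datetime(2019, 6, 6, 6, 1), "x", 124),
    (datetime(2019, 6, 7, 6, 0), "x", 999),
]

by_hour_of_day = defaultdict(list)  # key like date_part('hour', ...)
by_full_hour = defaultdict(list)    # key like date_trunc('hour', ...)
for ts, tag, value in rows:
    by_hour_of_day[ts.hour].append(str(value))
    by_full_hour[ts.replace(minute=0, second=0, microsecond=0)].append(str(value))

# Join each group into one comma-separated string, as STRING_AGG does.
agg_hour_of_day = {k: ",".join(v) for k, v in by_hour_of_day.items()}
agg_full_hour = {k: ",".join(v) for k, v in by_full_hour.items()}
```

Grouping by hour of day merges both days into a single group, while grouping by the truncated hour keeps them apart.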

How to calculate rest of the amount after comparing current date in pyspark dataframe?

I need to calculate how much is left in my account as of today, that is, how much of my original Total_salary remains on the current day.
Below is my sample data set.
start_date end_date duration(months) Total_salary left_amount
2021-05-03 2022-05-03 12 1200 400
2019-01-01 2023-01-01 48 4800 2300
2018-01-01 2020-01-01 24 2400 0
2020-01-01 2023-01-01 36 3600 1200
2024-01-01 2027-01-01 36 3600 3600
I need to get how much is left as of the current date; if end_date < current date, nothing is left.
Take the first row as an example: I agreed with a client to work for 12 months for a total salary of 1200, so each month I receive 100. I need to know how much of my original total_salary is left today (100*8 = 800, 1200-800 = 400).
I don't know how to get the sum up to the current date.
I need to implement this in pyspark. Can anyone help me sort this out?
Thank you
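import datetime
from pyspark.sql import functions as F
current_date = datetime.date.today()
# df is the DataFrame from the question
result = (
    df
    # months remaining until end_date, never negative
    .withColumn('left_months', F.greatest(F.lit(0), F.months_between('end_date', F.lit(current_date))))
    # monthly rate multiplied by the remaining months
    .withColumn('left_amount', F.col('total_salary')/F.col('duration(months)') * F.col('left_months'))
    # cap at the total for contracts that have not started yet
    .withColumn('left_amount', F.least('total_salary', 'left_amount'))
)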
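The arithmetic behind those three columns can be checked in plain Python. The sketch below is a simplification, not the Spark implementation: whole_months_between ignores the day of month (Spark's months_between can return fractions), and the "current date" is an assumed value chosen so that the first sample row gives the 400 from the question.

```python
from datetime import date

def whole_months_between(later: date, earlier: date) -> int:
    """Whole calendar months from earlier to later, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def left_amount(end_date: date, duration_months: int, total_salary: float, today: date) -> float:
    # Remaining months, clamped at zero for contracts that already ended.
    left_months = max(0, whole_months_between(end_date, today))
    monthly = total_salary / duration_months
    # Cap at the total for contracts that have not started yet.
    return min(total_salary, monthly * left_months)

today = date(2022, 1, 3)  # assumed "current date", not taken from the question
first_row = left_amount(date(2022, 5, 3), 12, 1200, today)
expired_row = left_amount(date(2020, 1, 1), 24, 2400, today)
future_row = left_amount(date(2027, 1, 1), 36, 3600, today)
```

With that assumed date, the first row has 4 months left at 100 per month, the expired contract has nothing left, and the contract that has not started keeps its full total.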

Convert military time integer to standard time Postgres without time zone

I have a column in Postgres called military_time that is an integer, so in some cases it needs to be padded, e.g. 1400, 1300, 25, 0900. I need to convert these to 2:00 pm, 1:00 pm, 12:25 am, 9:00 am. I have read that I need to cast the integer to time and then use the Postgres function to_char to get the format I need, but I am a little lost. I have found plenty of syntax for other languages but nothing for Postgres SQL.
This is going to be more complicated than that. You will need a way to distinguish between hours only (1400), hours and minutes (1425), and minutes only (25). The hours-only and hours-and-minutes cases are simple enough:
select to_char(1400::text::time, 'HH:MI:SS AM'); -- 02:00:00 PM
select to_char(1425::text::time, 'HH:MI:SS AM'); -- 02:25:00 PM
Minutes only could be done as:
select to_char(('00:'|| 25::text)::time, 'HH:MI:SS AM'); -- 12:25:00 AM
To pull it together:
create table mil_time (time_fld integer);
insert into mil_time values (1400), (1425), (25), (700);
SELECT
time_fld,
CASE
WHEN time_fld >= 1000 THEN
to_char(time_fld::text::time, 'HH:MI:SS AM')
WHEN time_fld >= 100 THEN
to_char(('0'|| time_fld::text)::time, 'HH:MI:SS AM')
WHEN time_fld < 60 THEN
to_char(('00:'|| time_fld::text)::time, 'HH:MI:SS AM')
ELSE
'00:00:00'
END
FROM
mil_time;
time_fld | case
----------+-------------
1400 | 02:00:00 PM
1425 | 02:25:00 PM
25 | 12:25:00 AM
700 | 07:00:00 AM
UPDATE
Explanation of time_fld::text::time: it is Postgres shorthand for casting to text and then to time, so:
select pg_typeof(1400::text); -- text
select pg_typeof(1400::text::time); -- time without time zone
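For comparison, the same conversion can be sketched in Python with integer arithmetic instead of text casts. This is a different technique from the CASE expression above, shown only as an illustration: military_to_standard is a hypothetical helper that assumes the integer is HHMM with leading zeros dropped, and it produces the "2:00 pm" style asked for in the question.

```python
def military_to_standard(value: int) -> str:
    """Convert an integer such as 1400, 700 or 25 to a string like '2:00 pm'."""
    # Split HHMM into hours and minutes; 25 becomes (0, 25), 700 becomes (7, 0).
    hours, minutes = divmod(value, 100)
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError(f"not a valid military time: {value}")
    suffix = "am" if hours < 12 else "pm"
    hour12 = hours % 12 or 12  # hours 0 and 12 both display as 12
    return f"{hour12}:{minutes:02d} {suffix}"
```

Because the split is arithmetic, no padding is needed, and out-of-range inputs such as 60 or 2460 are rejected rather than silently mapped to a default.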

Postgres tsrange, filter by date and time

I have an events table with a field called duration of type tsrange, which captures the beginning and end time of an event as timestamps. I want to filter all events across a certain date range and then filter those events by time. For instance, a user should be able to find all events happening between 12-15-2019 and 12-17-2019 (inclusive) that are playing at 9PM. To do this, the user submits a date range, which filters all events in that date range:
WHERE lower(duration)::date <@ '[start, finish]'::daterange
In the above, start and finish are user-submitted parameters.
Then I want to filter those events down to the ones playing at a specific time, e.g. 9PM; essentially, only show events that have 9PM between their start and end time.
So if I have the following table:
id duration
--- ------------------------------------
A 2019-12-21 19:00...2019-12-22 01:00
B 2019-12-17 16:00...2019-12-17 18:00
C 2019-12-23 19:00...2019-12-23 21:00
D 2019-12-23 19:00...2019-12-24 01:00
E 2019-12-27 14:00...2019-12-27 16:00
And the user submits a date range of 2019-12-21 to 2019-12-27 then event B will be filtered out. Then the user submits a time of 9:00PM (21:00), in which case A, C, and D will be returned.
EDIT
I was able to get it to work using the following:
WHERE duration @> (lower(duration)::date || ' 21:00:00')::timestamp
where the 21:00 above is the user's input, but it seems a bit hackish.
A tsrange contains a timestamp at 9 p.m. if and only if 9 p.m. on the starting day or 9 p.m. on the following day is part of the range.
You can use that to write your condition.
An example:
lower(r)::date + TIME '21:00' <@ r OR
(lower(r)::date + 1) + TIME '21:00' <@ r
tests whether r contains some timestamp at 9 p.m.
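The two-candidate reasoning can be sketched in Python as well. This is only an illustration, assuming half-open ranges [start, end) like the default tsrange bounds; contains_time_of_day is a hypothetical helper, and the sample values are taken from rows A and B of the question plus one made-up range.

```python
from datetime import datetime, time, timedelta

def contains_time_of_day(start: datetime, end: datetime, tod: time) -> bool:
    """True if the half-open range [start, end) contains tod on some day."""
    # Only two candidates matter: tod on the starting day and on the next day.
    first = datetime.combine(start.date(), tod)
    second = first + timedelta(days=1)
    return any(start <= candidate < end for candidate in (first, second))

nine_pm = time(21, 0)
a = contains_time_of_day(datetime(2019, 12, 21, 19), datetime(2019, 12, 22, 1), nine_pm)
b = contains_time_of_day(datetime(2019, 12, 17, 16), datetime(2019, 12, 17, 18), nine_pm)
# Made-up range that starts after 9 p.m. and only reaches 9 p.m. on the next day.
late = contains_time_of_day(datetime(2019, 12, 21, 22), datetime(2019, 12, 22, 21, 30), nine_pm)
```

The last case is why the second candidate is needed: the range starts after 9 p.m. on its first day but still covers 9 p.m. on the following day.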
The user input from 2019-12-21 to 2019-12-27 at 21:00 means that they are interested in the following timestamps:
select generate_series(timestamp '2019-12-21 21:00', '2019-12-27 21:00', '1 day') as t
t
---------------------
2019-12-21 21:00:00
2019-12-22 21:00:00
2019-12-23 21:00:00
2019-12-24 21:00:00
2019-12-25 21:00:00
2019-12-26 21:00:00
2019-12-27 21:00:00
(7 rows)
Hence you should check whether the duration column contains one of these timestamps:
select distinct e.*
from events e
cross join generate_series(timestamp '2019-12-21 21:00', '2019-12-27 21:00', '1 day') as t
where duration @> t
id | duration
----+-----------------------------------------------
A | ["2019-12-21 19:00:00","2019-12-22 01:10:00")
C | ["2019-12-23 19:00:00","2019-12-23 21:10:00")
D | ["2019-12-23 19:00:00","2019-12-24 01:10:00")
(3 rows)
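As a rough cross-check of this approach, the snippet below reproduces the idea in Python: build the daily 21:00 series, then keep every event whose range contains at least one of those timestamps. daily_series is a hypothetical stand-in for generate_series; ranges are treated as half-open, the end times of A, C and D follow the output above, and B and E use the values from the question.

```python
from datetime import datetime, timedelta

def daily_series(start: datetime, stop: datetime):
    """Yield start, start + 1 day, ... up to and including stop."""
    current = start
    while current <= stop:
        yield current
        current += timedelta(days=1)

# Events as half-open [start, end) ranges.
events = {
    "A": (datetime(2019, 12, 21, 19), datetime(2019, 12, 22, 1, 10)),
    "B": (datetime(2019, 12, 17, 16), datetime(2019, 12, 17, 18)),
    "C": (datetime(2019, 12, 23, 19), datetime(2019, 12, 23, 21, 10)),
    "D": (datetime(2019, 12, 23, 19), datetime(2019, 12, 24, 1, 10)),
    "E": (datetime(2019, 12, 27, 14), datetime(2019, 12, 27, 16)),
}

series = list(daily_series(datetime(2019, 12, 21, 21), datetime(2019, 12, 27, 21)))
# An event matches if its range contains any timestamp in the series.
matching = sorted(
    name for name, (start, end) in events.items()
    if any(start <= t < end for t in series)
)
```

Each event name appears at most once in the result, which is what select distinct achieves in the query.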

difference on date condition in postgresql

name total date
a 100 1/2/2015
b 30 1/2/2015
c 40 1/2/2015
d 45 1/2/2015
a 20 2/2/2015
b 13 2/2/2015
a 30 3/2/2015
b 23 3/2/2015
c 20 3/2/2015
and the table goes on with different dates.
I want to find the difference (a-b) for each date on which they occur, i.e.
diff total date
a-b 70 1/2/2015
a-b 7 2/2/2015....
How do I do this in PostgreSQL?
Use the nth_value() window function for that:
WITH t(name,total,date) AS ( VALUES
('a',100,'2016-01-01'::DATE),
('b',30,'2016-01-01'::DATE),
('c',40,'2016-01-01'::DATE),
('d',45,'2016-01-01'::DATE),
('a',20,'2016-01-02'::DATE),
('b',13,'2016-01-02'::DATE)
)
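SELECT
DISTINCT ON (date)
'a-b' AS diff,
(nth_value(total,1) OVER w -
nth_value(total,2) OVER w) total_diff,
date
FROM t
WHERE name IN ('a','b')
-- order by name so a is 1st and b is 2nd; the full frame makes both visible from every row
WINDOW w AS (PARTITION BY date ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);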
Result:
diff | total_diff | date
------+------------+------------
a-b | 70 | 2016-01-01
a-b | 7 | 2016-01-02
(2 rows)
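The per-date difference can also be sketched in Python, which makes the pairing by name explicit instead of relying on row position. This is only an illustration using the sample rows from the query above; the dictionary layout is an assumption, not part of the answer.

```python
from collections import defaultdict
from datetime import date

rows = [
    ("a", 100, date(2016, 1, 1)),
    ("b", 30, date(2016, 1, 1)),
    ("c", 40, date(2016, 1, 1)),
    ("a", 20, date(2016, 1, 2)),
    ("b", 13, date(2016, 1, 2)),
]

# Collect totals per date, keyed by name.
totals = defaultdict(dict)
for name, total, day in rows:
    totals[day][name] = total

# Compute a-b only for dates where both names are present.
diffs = {
    day: by_name["a"] - by_name["b"]
    for day, by_name in totals.items()
    if "a" in by_name and "b" in by_name
}
```

Dates that lack either a or b are skipped rather than producing a partial result.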

Difference between timestamp variable and time variable

I have the following table in postgresql (table1):
timestamp1 timestamp without time zone NOT NULL,
variable1 integer,
timestamp2 timestamp without time zone NOT NULL
I want to calculate timestamp2.
Note that variable1 is of type integer in table1, but in practice it is a duration in hours.
timestamp2 is defined as the difference of timestamp1 and variable1
(timestamp2 = timestamp1 - variable1)
For example,
2013-02-06 07:00:00 - 5 = 2013-02-06 02:00:00
2013-02-06 09:00:00 - 12 = 2013-02-05 21:00:00
2013-02-06 12:00:00 - 4.5 = 2013-02-06 07:30:00
How can I do this calculation (of timestamp2) in PostgreSQL?
select timestamp1 - interval '1 hour' * variable1
from table1;
Multiplying the one-hour interval by the number gives the interval to subtract:
5 -> 05:00:00
12 -> 12:00:00
4.5 -> 04:30:00
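The same scaling idea exists in Python's datetime module, where a timedelta can be multiplied by a number. The sketch below is just an analogy to the SQL expression; subtract_hours is a hypothetical helper, and the checks reuse the three examples from the question.

```python
from datetime import datetime, timedelta

def subtract_hours(ts: datetime, hours: float) -> datetime:
    """Equivalent of ts - interval '1 hour' * hours."""
    # Scaling a one-hour timedelta by a float handles fractions such as 4.5.
    return ts - timedelta(hours=1) * hours

first = subtract_hours(datetime(2013, 2, 6, 7, 0), 5)
second = subtract_hours(datetime(2013, 2, 6, 9, 0), 12)
third = subtract_hours(datetime(2013, 2, 6, 12, 0), 4.5)
```

The second example crosses midnight, so the date part moves back by one day along with the time.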