Timestamp issue while loading data to PostgreSQL

I have a CSV file with a timestamp column in the format below:
starttime
--------
2018-01-01 12:00
2018-01-01 12:30
2018-01-01 12:45
2018-01-01 12:15
When I load the data into a table using psql, it is loaded with the following format:
starttime
----------
2018-01-01 12:00:00
2018-01-01 12:00:30
2018-01-01 12:00:45
2018-01-01 12:00:15
Ideally the values should be parsed as yyyy-MM-dd hh:mm, but instead the minutes end up in the seconds position: 12:30 becomes 12:00:30 instead of 12:30:00.
psql command: \copy tablename from 'D:/filename.csv' CSV HEADER DELIMITER ',';
File sample data:
EKEY DKEY FKEY SEKEY TZKEY DATEKEY STIME
0 1032 4265 72 9863 23 2017-01-01 09:30
0 1032 4265 72 9863 23 2017-01-01 09:30
0 1032 4265 72 9863 26 2017-01-01 11:00
0 1032 4265 72 9863 27 2017-01-01 11:30
0 570 3785 73 9863 2 2017-01-01 06:00
0 336 3785 73 9863 2 2017-01-01 06:00
0 570 3785 73 9863 2 2017-01-01 06:00
Thanks in advance
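One way to rule out any format ambiguity while loading is to copy the STIME column into a text staging column and convert it with an explicit to_timestamp() format mask, so the minutes can never be misread as seconds. This is a sketch; the staging table definition and the target tablename columns are assumptions based on the sample data, and the actual cause of the misparse is not confirmed here:

```sql
-- Staging table mirroring the CSV, with STIME kept as raw text
-- (column names are assumptions from the sample header).
CREATE TABLE staging (
    ekey int, dkey int, fkey int, sekey int,
    tzkey int, datekey int, stime text
);

-- \copy staging from 'D:/filename.csv' CSV HEADER DELIMITER ',';

-- Convert with an explicit mask so '2018-01-01 12:30'
-- is always interpreted as hours and minutes.
INSERT INTO tablename
SELECT ekey, dkey, fkey, sekey, tzkey, datekey,
       to_timestamp(stime, 'YYYY-MM-DD HH24:MI')
FROM staging;
```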

Related

How do I choose the minimum value from rows within certain hour range?

I have one row per hour for each day. For hours 06:00 to 21:00 I need to use the lowest value (the minimum) found in column Price for hours 22:00 (the day before) through 05:00, and store it in the column Lowest on each of the rows for hours 06:00 to 21:00. Otherwise, column Lowest should show the same value as Price.
What should the Excel formula in column Lowest look like to solve this?
Here is how I need it to look (minimum value shown in bold):
Date & hour      | DatePart   | Hour  | Price  | Lowest
-----------------+------------+-------+--------+-------
2018-01-01 00:00 | 2018-01-01 | 00:00 | 258,86 | 258,86
2018-01-01 01:00 | 2018-01-01 | 01:00 | 259,85 | 259,85
2018-01-01 02:00 | 2018-01-01 | 02:00 | 256,6  | 256,6
2018-01-01 03:00 | 2018-01-01 | 03:00 | 242,84 | 242,84
2018-01-01 04:00 | 2018-01-01 | 04:00 | 243,23 | 243,23
2018-01-01 05:00 | 2018-01-01 | 05:00 | 177,07 | 177,07
2018-01-01 06:00 | 2018-01-01 | 06:00 | 174,8  | 177,07
2018-01-01 07:00 | 2018-01-01 | 07:00 | 175    | 177,07
2018-01-01 08:00 | 2018-01-01 | 08:00 | 194,27 | 177,07
2018-01-01 09:00 | 2018-01-01 | 09:00 | 203,81 | 177,07
2018-01-01 10:00 | 2018-01-01 | 10:00 | 243,43 | 177,07
2018-01-01 11:00 | 2018-01-01 | 11:00 | 252,47 | 177,07
2018-01-01 12:00 | 2018-01-01 | 12:00 | 236,84 | 177,07
2018-01-01 13:00 | 2018-01-01 | 13:00 | 245,89 | 177,07
2018-01-01 14:00 | 2018-01-01 | 14:00 | 253,75 | 177,07
2018-01-01 15:00 | 2018-01-01 | 15:00 | 260,14 | 177,07
2018-01-01 16:00 | 2018-01-01 | 16:00 | 265,75 | 177,07
2018-01-01 17:00 | 2018-01-01 | 17:00 | 269,68 | 177,07
2018-01-01 18:00 | 2018-01-01 | 18:00 | 268,3  | 177,07
2018-01-01 19:00 | 2018-01-01 | 19:00 | 265,06 | 177,07
2018-01-01 20:00 | 2018-01-01 | 20:00 | 262,5  | 177,07
2018-01-01 21:00 | 2018-01-01 | 21:00 | 260,24 | 177,07
2018-01-01 22:00 | 2018-01-01 | 22:00 | 256,5  | 256,5
2018-01-01 23:00 | 2018-01-01 | 23:00 | 244,61 | 244,61
2018-01-02 00:00 | 2018-01-02 | 00:00 | 248,54 | 248,54
2018-01-02 01:00 | 2018-01-02 | 01:00 | 227,7  | 227,7
2018-01-02 02:00 | 2018-01-02 | 02:00 | 243,62 | 243,62
2018-01-02 03:00 | 2018-01-02 | 03:00 | 246,08 | 246,08
2018-01-02 04:00 | 2018-01-02 | 04:00 | 252,96 | 252,96
2018-01-02 05:00 | 2018-01-02 | 05:00 | 263,88 | 263,88
2018-01-02 06:00 | 2018-01-02 | 06:00 | 273,32 | 227,7
2018-01-02 07:00 | 2018-01-02 | 07:00 | 299,86 | 227,7
2018-01-02 08:00 | 2018-01-02 | 08:00 | 313,92 | 227,7
2018-01-02 09:00 | 2018-01-02 | 09:00 | 329,65 | 227,7
2018-01-02 10:00 | 2018-01-02 | 10:00 | 344,5  | 227,7
2018-01-02 11:00 | 2018-01-02 | 11:00 | 346,27 | 227,7
2018-01-02 12:00 | 2018-01-02 | 12:00 | 339,78 | 227,7
2018-01-02 13:00 | 2018-01-02 | 13:00 | 335,25 | 227,7
2018-01-02 14:00 | 2018-01-02 | 14:00 | 353,74 | 227,7
2018-01-02 15:00 | 2018-01-02 | 15:00 | 374,09 | 227,7
2018-01-02 16:00 | 2018-01-02 | 16:00 | 409,68 | 227,7
2018-01-02 17:00 | 2018-01-02 | 17:00 | 416,76 | 227,7
2018-01-02 18:00 | 2018-01-02 | 18:00 | 371,53 | 227,7
2018-01-02 19:00 | 2018-01-02 | 19:00 | 331,32 | 227,7
2018-01-02 20:00 | 2018-01-02 | 20:00 | 303,6  | 227,7
2018-01-02 21:00 | 2018-01-02 | 21:00 | 283,64 | 227,7
2018-01-02 22:00 | 2018-01-02 | 22:00 | 275,18 | 275,18
2018-01-02 23:00 | 2018-01-02 | 23:00 | 271,35 | 271,35
First Method - volatile and lazy
You could go with this formula in row 2 of your Lowest column (where Lowest is in column E) and copied down:
=IF(C2=TIME(6,0,0), MIN(OFFSET(D2,-MIN(8,ROW()-1),0,MIN(8,ROW()-1),1)),
IF( (C2 > TIME(6,0,0))*(C2 < TIME(22,0,0)), E1,
D2) )
I put mine beside yours in a column called Low to test that the correct answer was reached, with your table starting in A1.
In an Excel table (Ctrl+T) it is more readable like this:
=IF([#Hour]=TIME(6,0,0), MIN(OFFSET([#Price],-MIN(8,ROW()-1),0,MIN(8,ROW()-1),1)),
IF( ([#Hour] > TIME(6,0,0))*([#Hour] < TIME(22,0,0)), E1,
[#Price]) )
Craner Method - non-volatile and non-lazy
This uses INDEX instead of OFFSET, as proposed by Scott Craner - it should make the worksheet more responsive.
=IF(C2=TIME(6,0,0), MIN(INDEX(D:D,MAX(1,ROW()-8)):INDEX(D:D,ROW()-1)),
IF( (C2 > TIME(6,0,0))*(C2 < TIME(22,0,0)), F1,
D2) )
or in Excel Table:
=IF([#Hour]=TIME(6,0,0), MIN(INDEX(D:D,MAX(1,ROW()-8)):INDEX(D:D,ROW()-1)),
IF( ([#Hour] > TIME(6,0,0))*([#Hour] < TIME(22,0,0)), H1,
[#Price]) )
If the data is not sorted, use one of these formulas. If the data is sorted as the example shows, the INDEX version of Mark's formula will be quicker on large datasets.
Nest MINIFS in an IF:
=IF(AND(C2>=TIME(6,0,0),C2<=TIME(21,0,0)),MINIFS(D:D,A:A,">="&B2-1+TIME(22,0,0),A:A,"<="&B2+TIME(5,0,0)),D2)
If MINIFS is not available, we can use AGGREGATE:
=IF(AND(C2>=TIME(6,0,0),C2<=TIME(21,0,0)),AGGREGATE(15,7,$D$2:$D$49/(($A$2:$A$49>=B2-1+TIME(22,0,0))*($A$2:$A$49<=B2+TIME(5,0,0))),1),D2)
Note that we need to shift from full-column references to just the data set.

Unable to Calculate 7 Day Moving Average due to inconsistent dates

I just noticed that my code below is not actually a 7 day moving average; it is a 7 row moving average. The dates in my table span several months, and because my data flow is inconsistent I can't expect the last 7 rows of the window function to actually represent a 7 day average. Thanks.
select date, sales,
avg(sales) over(order by date rows between 6 preceding and current row)
from sales_info
order by date
You can get a true 7 day moving average by using RANGE instead of ROWS in your frame specification. Note that RANGE with an offset such as '6 days' preceding requires PostgreSQL 11 or later.
Read more about window function frames in the PostgreSQL documentation.
I believe this should work for you:
select date, sales,
avg(sales) over(order by date range between '6 days' preceding and current row)
from sales_info
order by date;
Here's a demonstration with made up data:
SELECT i,
       t,
       avg(i) OVER (ORDER BY t RANGE BETWEEN '6 days' PRECEDING AND CURRENT ROW)
FROM (
  SELECT i, t
  FROM generate_series('2021-01-01'::timestamp,
                       '2021-02-01'::timestamp,
                       '1 day') WITH ORDINALITY AS g(t, i)
) sub;
i | t | avg
----+---------------------+------------------------
1 | 2021-01-01 00:00:00 | 1.00000000000000000000
2 | 2021-01-02 00:00:00 | 1.5000000000000000
3 | 2021-01-03 00:00:00 | 2.0000000000000000
4 | 2021-01-04 00:00:00 | 2.5000000000000000
5 | 2021-01-05 00:00:00 | 3.0000000000000000
6 | 2021-01-06 00:00:00 | 3.5000000000000000
7 | 2021-01-07 00:00:00 | 4.0000000000000000
8 | 2021-01-08 00:00:00 | 5.0000000000000000
9 | 2021-01-09 00:00:00 | 6.0000000000000000
10 | 2021-01-10 00:00:00 | 7.0000000000000000
11 | 2021-01-11 00:00:00 | 8.0000000000000000
12 | 2021-01-12 00:00:00 | 9.0000000000000000
13 | 2021-01-13 00:00:00 | 10.0000000000000000
14 | 2021-01-14 00:00:00 | 11.0000000000000000
15 | 2021-01-15 00:00:00 | 12.0000000000000000
16 | 2021-01-16 00:00:00 | 13.0000000000000000
17 | 2021-01-17 00:00:00 | 14.0000000000000000
18 | 2021-01-18 00:00:00 | 15.0000000000000000
19 | 2021-01-19 00:00:00 | 16.0000000000000000
20 | 2021-01-20 00:00:00 | 17.0000000000000000
21 | 2021-01-21 00:00:00 | 18.0000000000000000
22 | 2021-01-22 00:00:00 | 19.0000000000000000
23 | 2021-01-23 00:00:00 | 20.0000000000000000
24 | 2021-01-24 00:00:00 | 21.0000000000000000
25 | 2021-01-25 00:00:00 | 22.0000000000000000
26 | 2021-01-26 00:00:00 | 23.0000000000000000
27 | 2021-01-27 00:00:00 | 24.0000000000000000
28 | 2021-01-28 00:00:00 | 25.0000000000000000
29 | 2021-01-29 00:00:00 | 26.0000000000000000
30 | 2021-01-30 00:00:00 | 27.0000000000000000
31 | 2021-01-31 00:00:00 | 28.0000000000000000
32 | 2021-02-01 00:00:00 | 29.0000000000000000

Take data for each day in tsrange

Assume I have a table like
+-----------------------------+-----+-----------+
| tsrange                     | id  | anyvalues |
+-----------------------------+-----+-----------+
| ["2019-09-20","2019-09-25") | 1   | ...       |
| ["2019-09-01","2019-09-23") | 2   | ...       |
| ["2019-09-15","2019-09-22") | 3   | ...       |
| ...                         | ... | ...       |
+-----------------------------+-----+-----------+
Is it possible to get data state for each day from 2019-09-01 till 2019-09-25?
I just have no idea what the query could be, or whether a function exists for this purpose.
So in the output I'd like to get 25 rows with the values for each id (if it exists for that day).
I think you are looking for something like this? The expected output would help determine if this is correct:
select d, count(id)
from YOUR_TABLE
right join generate_series('2019-09-01'::timestamp,
                           '2019-09-25'::timestamp,
                           interval '1 day') as g(d)
       on tsrange @> d
group by d
order by 1;
d | count
---------------------+-------
2019-09-01 00:00:00 | 1
2019-09-02 00:00:00 | 1
2019-09-03 00:00:00 | 1
2019-09-04 00:00:00 | 1
2019-09-05 00:00:00 | 1
2019-09-06 00:00:00 | 1
2019-09-07 00:00:00 | 1
2019-09-08 00:00:00 | 1
2019-09-09 00:00:00 | 1
2019-09-10 00:00:00 | 1
2019-09-11 00:00:00 | 1
2019-09-12 00:00:00 | 1
2019-09-13 00:00:00 | 1
2019-09-14 00:00:00 | 1
2019-09-15 00:00:00 | 2
2019-09-16 00:00:00 | 2
2019-09-17 00:00:00 | 2
2019-09-18 00:00:00 | 2
2019-09-19 00:00:00 | 2
2019-09-20 00:00:00 | 3
2019-09-21 00:00:00 | 3
2019-09-22 00:00:00 | 2
2019-09-23 00:00:00 | 1
2019-09-24 00:00:00 | 1
2019-09-25 00:00:00 | 0
(25 rows)
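If you need the matching rows per id rather than a count, a variant of the same join returns them. This is a sketch; YOUR_TABLE and the anyvalues column are placeholders from the question:

```sql
-- One output row per day and per row whose tsrange contains that day;
-- days with no match still appear, with NULL id and anyvalues.
SELECT g.d::date AS day, t.id, t.anyvalues
FROM generate_series('2019-09-01'::timestamp,
                     '2019-09-25'::timestamp,
                     interval '1 day') AS g(d)
LEFT JOIN YOUR_TABLE t ON t.tsrange @> g.d
ORDER BY g.d, t.id;
```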

In PostgreSQL, how to replicate rows based on the number value in a column

Here is the problem:
Name Start Time End Time Number
A1 5:13 PM 5:43 PM 0
A2 7:06 PM 8:51 PM 2
A3 6:36 PM 8:06 PM 3
A4 4:51 PM 7:51 PM 4
I would like to replicate rows based on the Number values and include three new columns (New_Start_Time, New_End_Time, and Minutes). I'm new to SQL; how can I do this in PostgreSQL?
I expected the result below:
Name Start Time End Time Number New_Start_Time New_End_Time
A1 5:13 PM 5:43 PM 0 5:13 PM 5:43 PM
A2 7:06 PM 8:51 PM 2 7:06 PM 8:00 PM
A2 7:06 PM 8:51 PM 2 8:00 PM 8:51 PM
A3 6:36 PM 8:06 PM 3 6:36 PM 7:00 PM
A3 6:36 PM 8:06 PM 3 7:00 PM 8:00 PM
A3 6:36 PM 8:06 PM 3 8:00 PM 8:06 PM
A4 4:51 PM 7:51 PM 4 4:51 PM 5:00 PM
A4 4:51 PM 7:51 PM 4 5:00 PM 6:00 PM
A4 4:51 PM 7:51 PM 4 6:00 PM 7:00 PM
A4 4:51 PM 7:51 PM 4 7:00 PM 7:51 PM
This can be done using generate_series() and calculating the number of hours between the start and end time.
First we calculate the "base start time" by truncating start_time to the full hour. This is also used to add the hours when duplicating the rows:
with rounded as (
select name,
start_time,
end_time,
date_trunc('hour', start_time)::time as base_start_time,
extract(hour from (date_trunc('hour', end_time) + interval '1 hour') - date_trunc('hour', start_time))::int as num_hours
from times
)
select name,
start_time,
end_time,
case
when h = 1 then start_time
else base_start_time + interval '1 hour' * (h - 1)
end as new_start_time,
case
when h = num_hours then end_time
else base_start_time + interval '1 hour' * h
end as new_end_time
from rounded
cross join generate_series(1, num_hours, 1) as t(h)
order by name, new_start_time;
The CTE is used to calculate the base offset and the number of hours that need to be generated. If you are sure you can trust your number column, you can replace the extract(hour ...) as num_hours expression with just number as num_hours.
The new start and new end is then calculated based on which "hour" the row reflects. For the first hour we use the existing start time, for all others we just add the number of hours needed. For the new end time we need to check if it's the last hour.
The above returns:
name | start_time | end_time | new_start_time | new_end_time
-----+------------+----------+----------------+-------------
A1 | 17:13 | 17:43 | 17:13 | 17:43
A2 | 19:06 | 20:51 | 19:06 | 20:00
A2 | 19:06 | 20:51 | 20:00 | 20:51
A3 | 18:36 | 20:06 | 18:36 | 19:00
A3 | 18:36 | 20:06 | 19:00 | 20:00
A3 | 18:36 | 20:06 | 20:00 | 20:06
A4 | 16:51 | 19:51 | 16:51 | 17:00
A4 | 16:51 | 19:51 | 17:00 | 18:00
A4 | 16:51 | 19:51 | 18:00 | 19:00
A4 | 16:51 | 19:51 | 19:00 | 19:51
Online example: https://rextester.com/GAZP30312

Finding the next occurrence of particular day of the week

How can I take a datetime column start_at, convert it to a day of the week, and find the next future occurrence relative to the current date?
Here I'm trying to add the DOW to the current week, but it doesn't appear to be correct.
SELECT date_trunc('week', current_date) + CAST(extract(dow from start_at) || ' days' AS interval)
Full example:
SELECT id AS event_id,
       generate_series(
         date_trunc('week', current_date)
           + CAST(extract(dow from start_at) + 1 || ' days' AS interval)
           + start_at::time,
         current_date + interval '3 weeks',
         '1 week'::interval) AS start_at
FROM events
Events
+-----+---------------------------+---------------------+
| id | start_at | recurring_schedule |
+-----+---------------------------+---------------------+
| 358 | 2015-01-23 20:00:00 +0000 | Weekly |
| 359 | 2016-01-22 19:30:00 +1100 | Monthly |
| 360 | 2016-02-01 19:00:00 +1100 | Weekly |
| 361 | 2016-02-01 20:00:00 +0000 | Weekly |
| 362 | 2014-02-13 20:00:00 +0000 | Bi-Weekly |
+-----+---------------------------+---------------------+
Output
+----------+---------------------------+
| event_id | start_at |
+----------+---------------------------+
| 35 | 2018-04-11 19:30:00 +0000 |
| 94 | 2018-04-12 20:00:00 +0100 |
| 269 | 2018-04-13 18:30:00 +0100 |
| 45 | 2018-04-13 20:00:00 +0100 |
| 242 | 2018-04-13 19:30:00 +1100 |
| 35 | 2018-04-18 19:30:00 +0000 |
| 94 | 2018-04-19 20:00:00 +0100 |
| 269 | 2018-04-20 18:30:00 +0100 |
| 45 | 2018-04-20 20:00:00 +0100 |
| 242 | 2018-04-20 19:30:00 +1100 |
+----------+---------------------------+
Give this a try:
SELECT id AS event_id,
       generate_series(
         date_trunc('week', current_date)::date
           + (extract(isodow from start_at)::int - 1)
           + start_at::time,
         current_date + interval '3 weeks',
         '1 week'::interval) AS start_at
FROM events
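For the narrower question in the title (just the next future occurrence of start_at's weekday, without generating a series), a sketch using isodow arithmetic on the events table from the question:

```sql
-- (target_dow - today_dow + 7) % 7 gives 0..6 days ahead;
-- 0 means today already is that weekday. Add an extra day shift
-- if "next" should be strictly in the future.
SELECT id,
       current_date
         + ((extract(isodow from start_at)::int
             - extract(isodow from current_date)::int + 7) % 7)
         + start_at::time AS next_occurrence
FROM events;
```

Since date + integer yields a date and date + time yields a timestamp, the expression returns the next occurrence at the event's original time of day.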