PostgreSQL: calculate average values for parts of the day

I have a postgres table with measured temperatures and timestamp of measurement. Measuring interval is 30 minutes, but sometimes it skips, so I don't get the same number of measurements each day.
The table looks like this:
I need to create a view that shows average temperature for each day divided into four 6 hour intervals: 00-06, 06-12, 12-18 and 18-24. It should look something like this:
avg_temp | time
---------+---------------------
    24.5 | 2018-05-13 00:00:00
    22.1 | 2018-05-13 06:00:00
    25.6 | 2018-05-13 12:00:00
    20.6 | 2018-05-13 18:00:00
    21.8 | 2018-05-14 00:00:00
etc.

You can round timestamps down to quarters of a day with the following expression (shown on sample data):
with my_table(temp, time) as (
values
(20, '2018-05-20 4:00'::timestamp),
(21, '2018-05-20 5:00'),
(22, '2018-05-20 6:00'),
(23, '2018-05-20 7:00'),
(24, '2018-05-20 12:00'),
(25, '2018-05-20 19:00')
)
select avg(temp), time::date + (extract(hour from time)::int/ 6* 6)* '1h'::interval as time
from my_table
group by 2
order by 2
avg | time
---------------------+---------------------
20.5000000000000000 | 2018-05-20 00:00:00
22.5000000000000000 | 2018-05-20 06:00:00
24.0000000000000000 | 2018-05-20 12:00:00
25.0000000000000000 | 2018-05-20 18:00:00
(4 rows)
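To check the bucketing arithmetic outside the database, here is a small sketch of the same idea in Python rather than SQL: integer-divide the hour by 6 and multiply back by 6, which floors each timestamp to the start of its quarter-day. The function name quarter_of_day and the in-memory rows are illustrative assumptions that mirror the sample data above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def quarter_of_day(ts: datetime) -> datetime:
    """Floor a timestamp to 00:00, 06:00, 12:00 or 18:00 of the same day."""
    return ts.replace(hour=ts.hour // 6 * 6, minute=0, second=0, microsecond=0)


# Same sample rows as in the query above: (temp, time).
rows = [
    (20, datetime(2018, 5, 20, 4, 0)),
    (21, datetime(2018, 5, 20, 5, 0)),
    (22, datetime(2018, 5, 20, 6, 0)),
    (23, datetime(2018, 5, 20, 7, 0)),
    (24, datetime(2018, 5, 20, 12, 0)),
    (25, datetime(2018, 5, 20, 19, 0)),
]

# Group temperatures by the start of their 6-hour bucket.
buckets = defaultdict(list)
for temp, ts in rows:
    buckets[quarter_of_day(ts)].append(temp)

# Average per bucket, ordered by bucket start.
averages = {start: mean(temps) for start, temps in sorted(buckets.items())}
for start, avg in averages.items():
    print(avg, start)
```

The 06:00 reading falls into the 06-12 bucket, which matches the half-open intervals the question asks for.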

If you also need rows for the intervals without any measurements (their average will be NULL), you'll need a calendar table:
-- \i tmp.sql
CREATE TABLE the_temp(
ztime timestamp primary key
, ztemp double precision
) ;
INSERT INTO the_temp( ztemp, ztime )
VALUES (20, '2018-05-20 4:00')
, (21, '2018-05-20 5:00')
, (22, '2018-05-20 6:00')
, (23, '2018-05-20 7:00')
, (24, '2018-05-20 12:00')
, (25, '2018-05-20 19:00')
;
-- Generate calendar table
WITH cal AS(
SELECT ts AS t_begin, ts+ '6hours'::interval AS t_end
FROM generate_series('2018-05-20 0:00'::timestamp
, '2018-05-21 0:00', '6hours'::interval) ts
)
SELECT cal.t_begin, cal.t_end
, AVG( tt.ztemp)AS zmean
FROM cal
LEFT JOIN the_temp tt
ON tt.ztime >= cal.t_begin
AND tt.ztime < cal.t_end
GROUP BY cal.t_begin, cal.t_end
ORDER BY cal.t_begin
;
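The calendar-table idea can be sketched in Python as well: generate every 6-hour bucket first, then look up the measurements that fall into each one, so that empty buckets still appear. This is only an illustration under assumptions: the helper six_hour_buckets is hypothetical, and None stands in for SQL NULL. Because generate_series includes its upper bound, the query above also emits a bucket starting at 2018-05-21 00:00; the sketch reproduces that by ending its half-open range at 06:00 on the 21st.

```python
from datetime import datetime, timedelta


def six_hour_buckets(start: datetime, end: datetime):
    """Yield (t_begin, t_end) pairs covering [start, end) in 6-hour steps."""
    step = timedelta(hours=6)
    t = start
    while t < end:
        yield t, t + step
        t += step


# Same sample rows as in the_temp above: (ztime, ztemp).
measurements = [
    (datetime(2018, 5, 20, 4, 0), 20.0),
    (datetime(2018, 5, 20, 5, 0), 21.0),
    (datetime(2018, 5, 20, 6, 0), 22.0),
    (datetime(2018, 5, 20, 7, 0), 23.0),
    (datetime(2018, 5, 20, 12, 0), 24.0),
    (datetime(2018, 5, 20, 19, 0), 25.0),
]

result = []
for t_begin, t_end in six_hour_buckets(datetime(2018, 5, 20), datetime(2018, 5, 21, 6)):
    # "Left join": keep the bucket even when no measurement matches.
    temps = [temp for ts, temp in measurements if t_begin <= ts < t_end]
    zmean = sum(temps) / len(temps) if temps else None
    result.append((t_begin, t_end, zmean))

for row in result:
    print(row)
```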

Related

How to show sum per day AND year postgresql

I want to get the sum of row values per day and per year, shown on the same row.
The database that the first and second queries get results from includes a table like this (ltg_data):
time lon lat geom
2018-01-30 11:20:21 -105.4333 32.3444 01010....
And then some geometries that I'm joining to.
One query:
SELECT to_char(time, 'MM/DD/YYYY') as day, count(*) as strikes FROM counties JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom) WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours' group by 1;
Results are like:
day strikes
01/28/2018 22
03/23/2018 15
12/19/2017 20
12/20/2017 12
Second query:
SELECT to_char(time, 'YYYY') as year, count(*) as strikes FROM counties JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom) WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours' group by 1;
Results are like:
year strikes
2017 32
2018 37
What I'd like is:
day daily_strikes year yearly_strikes
01/28/2018 22 2018 37
03/23/2018 15 2018 37
12/19/2017 20 2017 32
12/20/2017 12 2017 32
I found that union all shows the year totals at the very bottom, but I'd like to have the results horizontally, even if there are repeat yearly totals. Thanks for any help!
You can try this kind of approach. It's not optimal, but at least it works:
I have a test table like this:
postgres=# select * from test;
d | v
------------+---
2001-02-16 | a
2002-02-16 | a
2002-02-17 | a
2002-02-17 | a
(4 rows)
And query:
select
q.year,
sum(q.countPerDay) over (partition by extract(year from q.day)),
q.day,
q.countPerDay
from (
select extract('year' from d) as year, date_trunc('day', d) as day, count(*) as countPerDay from test group by day, year
) as q
So the result looks like this:
2001 | 1 | 2001-02-16 00:00:001 | 1
2002 | 3 | 2002-02-16 00:00:001 | 1
2002 | 3 | 2002-02-17 00:00:001 | 2
create table strikes (game_date date,
strikes int
) ;
insert into strikes (game_date, strikes)
values ('01/28/2018', 22),
('03/23/2018', 15),
('12/19/2017', 20),
('12/20/2017', 12)
;
select * from strikes ;
select game_date, strikes, sum(strikes) over(partition by extract(year from game_date) ) as sum_strikes_by_year
from strikes ;
"2017-12-19" 20 "32"
"2017-12-20" 12 "32"
"2018-01-28" 22 "37"
"2018-03-23" 15 "37"
This use of aggregates is known as window functions (also called analytic functions):
PostgreSQL Docs
---- EDIT --- based on comments...
create table strikes_tally (strike_time timestamp,
lat varchar(10),
long varchar(10),
geom varchar(10)
) ;
insert into strikes_tally (strike_time, lat, long, geom)
values ('2018-01-01 12:43:00', '100.1', '50.8', '1234'),
('2018-01-01 12:44:00', '100.1', '50.8', '1234'),
('2018-01-01 12:45:00', '100.1', '50.8', '1234'),
('2018-01-02 20:01:00', '100.1', '50.8', '1234'),
('2018-01-02 20:02:00', '100.1', '50.8', '1234'),
('2018-01-02 22:03:00', '100.1', '50.8', '1234') ;
-- distinct collapses the per-strike rows into one row per day
select distinct to_char(strike_time, 'dd/mm/yyyy') as strike_date,
count(strike_time) over(partition by to_char(strike_time, 'dd/mm/yyyy')) as daily_strikes,
to_char(strike_time, 'yyyy') as year,
count(strike_time) over(partition by to_char(strike_time, 'yyyy') ) as yearly_strikes
from strikes_tally
;
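The target shape, one row per day carrying that day's count plus the total for its year, can be sketched in Python to make the two levels of counting explicit. This is an illustration only, not a replacement for the window-function query; the variable names are assumptions and the timestamps are the sample rows from strikes_tally above.

```python
from collections import Counter
from datetime import datetime

# Sample strike timestamps, as inserted into strikes_tally above.
strike_times = [
    datetime(2018, 1, 1, 12, 43),
    datetime(2018, 1, 1, 12, 44),
    datetime(2018, 1, 1, 12, 45),
    datetime(2018, 1, 2, 20, 1),
    datetime(2018, 1, 2, 20, 2),
    datetime(2018, 1, 2, 22, 3),
]

# Two independent tallies: one keyed by calendar day, one keyed by year.
daily = Counter(t.date() for t in strike_times)
yearly = Counter(t.year for t in strike_times)

# One output row per day, with the yearly total repeated on each row.
rows = [(day, n, day.year, yearly[day.year]) for day, n in sorted(daily.items())]
for row in rows:
    print(row)
```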

How to retrieve top 3 results for each column in postgresql?

I have been given a question. The table looks like this:
STATE | year1 | ... | year 10
AP | 100 | ... | 120
assam | 13 | .. | 42
madhya pradesh | 214 | ... | 421
Now, I need to get the top 3 states for each year.
I tried everything I could think of, but I am not able to filter results per column.
You have a design problem. Enumerated columns are almost always a sign of bad design.
For now, you could unpivot using unnest and then use the window function row_number to get the top 3 states per year:
with unpivoted as (
select state,
unnest(array[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) as year,
unnest(array[
year_1, year_2, year_3,
year_4, year_5, year_6,
year_7, year_8, year_9,
year_10
]) as value
from your_table
)
select *
from (
select t.*,
row_number() over (
partition by year
order by value desc
) as seqnum
from unpivoted t
) t
where seqnum <= 3;
Demo
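The unpivot-then-rank idea can also be sketched in Python, which may help when checking results by hand. Everything here is hypothetical: the values, the fourth state, and the reduction to three year columns are made up for illustration, since the question elides the real data.

```python
from heapq import nlargest

# Hypothetical wide table: state -> values for year_1..year_3.
wide = {
    "AP": [100, 80, 120],
    "assam": [13, 95, 42],
    "madhya pradesh": [214, 60, 421],
    "kerala": [50, 70, 10],
}

# Unpivot: one (year, state, value) row per cell of the wide table.
long_rows = [
    (year, state, value)
    for state, values in wide.items()
    for year, value in enumerate(values, start=1)
]

# Rank within each year and keep the three largest values.
top3 = {}
for year in sorted({y for y, _, _ in long_rows}):
    candidates = [(state, value) for y, state, value in long_rows if y == year]
    top3[year] = nlargest(3, candidates, key=lambda sv: sv[1])

for year, states in top3.items():
    print(year, states)
```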

Grouping Events in Postgres

I've got an events table that is generated by user activity on a site:
timestamp | name
7:00 AM | ...
7:01 AM | ...
7:02 AM | ...
7:30 AM | ...
7:31 AM | ...
7:32 AM | ...
8:01 AM | ...
8:03 AM | ...
8:05 AM | ...
8:08 AM | ...
8:09 AM | ...
I'd like to aggregate over the events to provide a view of when a user is active. I'm defining active to mean the period in which an event is within +/- 2 minutes. For the above that'd mean:
from | till
7:00 AM | 7:02 AM
7:30 AM | 7:32 AM
8:01 AM | 8:05 AM
8:08 AM | 8:09 AM
What's the best way to write a query that'll aggregate in that method? Is it possible via a WINDOW function or self join or is PL/SQL required?
Use two window functions: one to calculate the intervals between consecutive events (gaps), and another to find series of events whose gaps are less than or equal to 2 minutes:
select arr[1] as "from", arr[cardinality(arr)] as "till"
from (
select array_agg(timestamp order by timestamp) arr
from (
select timestamp, sum((gap > '2m' )::int) over w
from (
select timestamp, coalesce(timestamp - lag(timestamp) over w, '3m') gap
from events
window w as (order by timestamp)
) s
window w as (order by timestamp)
) s
group by sum
) s
from | till
----------+----------
07:00:00 | 07:02:00
07:30:00 | 07:32:00
08:01:00 | 08:05:00
(3 rows)
Test it here.
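The same gap-based grouping can be sketched procedurally in Python: walk the events in order and start a new period whenever the gap to the previous event exceeds 2 minutes. The function name active_periods and the date 2017-01-01 attached to the times are assumptions made for illustration; the times themselves are the ones from the question.

```python
from datetime import datetime, timedelta


def active_periods(timestamps, max_gap=timedelta(minutes=2)):
    """Merge sorted timestamps into (from, till) periods split at gaps > max_gap."""
    periods = []
    for ts in sorted(timestamps):
        if periods and ts - periods[-1][1] <= max_gap:
            periods[-1][1] = ts          # still within the current period
        else:
            periods.append([ts, ts])     # gap too large: start a new period
    return [tuple(p) for p in periods]


def at(hour, minute):
    # Hypothetical date; only the time of day comes from the question.
    return datetime(2017, 1, 1, hour, minute)


events = [
    at(7, 0), at(7, 1), at(7, 2),
    at(7, 30), at(7, 31), at(7, 32),
    at(8, 1), at(8, 3), at(8, 5),
    at(8, 8), at(8, 9),
]

periods = active_periods(events)
for start, end in periods:
    print(start.time(), end.time())
```

With all eleven events from the question, the 3-minute gap between 8:05 and 8:08 starts a fourth period, which matches the result the question asks for.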
By grouping them around half-hour flooring and getting min & max values:
WITH x(t) AS ( VALUES
('7:02 AM'::TIME),('7:01 AM'::TIME),('7:00 AM'::TIME),
('7:30 AM'::TIME),('7:31 AM'::TIME),('7:32 AM'::TIME),
('8:01 AM'::TIME),('8:03 AM'::TIME),('8:05 AM'::TIME)
)
SELECT MIN(t) "from", MAX(t) "till"
FROM (select t, date_trunc('hour', t) +
CASE WHEN (t-date_trunc('hour', t)) >= '30 minutes'::interval
THEN '30 minutes'::interval ELSE '0'::interval END t1 FROM x ) y
GROUP BY t1 ORDER BY t1;
You can apply the same recipe to timestamp values, like this:
WITH x(t) AS (
SELECT '2017-01-01'::TIMESTAMP + (RANDOM()*1440*'1 minute'::INTERVAL) t
FROM GENERATE_SERIES(0,1000))
SELECT MIN...

Getting attendance of an employee with a date series in a particular range in Postgres

I have an attendance table with employee_id, date and punch-in time.
Emp_Id PunchTime
101 10/10/2016 07:15
101 10/10/2016 12:20
101 10/10/2016 12:50
101 10/10/2016 16:31
102 10/10/2016 07:15
Here I have dates only for the working days. I want to get the attendance list of an employee for every date in a given period, including the day of the week. The result should look as follows:
date | day |employee_id | Intime | outtime |
2016-10-09 | sunday | 101 | | |
2016-10-10 | monday | 101 | 2016-10-10 7:15AM |2016-10-10 4:31 PM |
You can generate a list of dates and then do an outer join on them:
The following displays all days in October:
select d.date, a.emp_id,
min(punchtime) as intime,
max(punchtime) as outtime
from generate_series(date '2016-10-01', date '2016-11-01' - 1, interval '1' day) as d (date)
left join attendance a on d.date = a.punchtime::date
group by d.date, a.emp_id
order by d.date, a.emp_id;
As you want the first and last timestamp from each day, this can be done using a simple group by query.
This will, however, not repeat the emp_id for days without any punches.
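To show the shape of the desired result for a single employee, including the weekday and the employee id on days without punches, here is a sketch in Python. It is an illustration under assumptions: the function name attendance and the DAY_NAMES tuple are hypothetical, None stands in for the empty cells, and the punches are the sample rows from the question.

```python
from datetime import date, datetime, timedelta

# Index matches date.weekday(): Monday is 0, Sunday is 6.
DAY_NAMES = ("monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday")

# Sample rows from the question: (emp_id, punchtime).
punches = [
    (101, datetime(2016, 10, 10, 7, 15)),
    (101, datetime(2016, 10, 10, 12, 20)),
    (101, datetime(2016, 10, 10, 12, 50)),
    (101, datetime(2016, 10, 10, 16, 31)),
    (102, datetime(2016, 10, 10, 7, 15)),
]


def attendance(punches, emp_id, first_day, last_day):
    """One row per calendar day: (date, day name, emp_id, intime, outtime)."""
    rows = []
    day = first_day
    while day <= last_day:
        times = [t for e, t in punches if e == emp_id and t.date() == day]
        rows.append((day, DAY_NAMES[day.weekday()], emp_id,
                     min(times, default=None), max(times, default=None)))
        day += timedelta(days=1)
    return rows


rows = attendance(punches, 101, date(2016, 10, 9), date(2016, 10, 10))
for row in rows:
    print(row)
```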
Something like the following will generate a list of the range of dates (starting and ending with whatever range is found in your punchtime table), with employees and intime, outtime for each. Check the SQL fiddle here:
http://sqlfiddle.com/#!15/d93bd/1
WITH RECURSIVE minmax AS
(
SELECT MIN(CAST(time AS DATE)) AS min, MAX(CAST(time as DATE)) AS max
FROM emp_time
),
dates AS
(
SELECT m.min as datepart
FROM minmax m
RIGHT JOIN emp_time e ON m.min = CAST(e.time as DATE)
UNION ALL
SELECT d.datepart + 1 FROM dates d, minmax mm
WHERE d.datepart + 1 <= mm.max
)
SELECT d.datepart as date, e.emp, MIN(e.time) as intime, MAX(e.time) as outtime FROM dates d
LEFT JOIN emp_time e ON d.datepart = CAST(e.time as DATE)
GROUP BY d.datepart, e.emp
ORDER BY d.datepart;

Getting data from postgres weekly (according to date)

user timespent(in sec) date(in timestamp)
u1 10 t1(2015-08-15)
u1 20 t2(2015-08-19)
u1 15 t3(2015-08-28)
u1 16 t4(2015-09-06)
Above is the format of my table, which represents timespent by user on a course and it is ordered by timestamp. I want to get sum of timespent by a particular user, say u1 weekly in the format :
start_date end_date sum
2015-08-15 2015-08-21 30
2015-08-22 2015-08-28 15
2015-08-29 2015-09-04 0
2015-09-05 2015-09-11 16
The difficulty lies in the fact that the seven-day periods you want are not regular weeks starting on Monday.
You therefore cannot use the standard functions to get the week number from the date, and have to build your own week generator using generate_series().
Example data:
create table sessions (user_name text, time_spent int, session_date timestamp);
insert into sessions values
('u1', 10, '2015-08-15'),
('u1', 20, '2015-08-19'),
('u1', 15, '2015-08-28'),
('u1', 16, '2015-09-06');
The query for an arbitrarily chosen period from 2015-08-15 to 2015-09-06:
with weeks as (
select d::date start_date, d::date+ 6 end_date
from generate_series('2015-08-15', '2015-09-06', '7d'::interval) d
)
select w.start_date, w.end_date, coalesce(sum(time_spent), 0) total
from weeks w
left join (
select start_date, end_date, coalesce(time_spent, 0) time_spent
from weeks
join sessions
on session_date between start_date and end_date
where user_name = 'u1'
) s
on w.start_date = s.start_date and w.end_date = s.end_date
group by 1, 2
order by 1;
start_date | end_date | total
------------+------------+-------
2015-08-15 | 2015-08-21 | 30
2015-08-22 | 2015-08-28 | 15
2015-08-29 | 2015-09-04 | 0
2015-09-05 | 2015-09-11 | 16
(4 rows)
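The week generator can be sketched in Python to make the bucket boundaries explicit: step from the chosen start date in 7-day increments and sum whatever falls inside each inclusive start..end range. The function name weekly_totals is an assumption for illustration; the rows are the sample sessions above.

```python
from datetime import date, timedelta

# Sample rows as in the sessions table above: (user_name, time_spent, session_date).
sessions = [
    ("u1", 10, date(2015, 8, 15)),
    ("u1", 20, date(2015, 8, 19)),
    ("u1", 15, date(2015, 8, 28)),
    ("u1", 16, date(2015, 9, 6)),
]


def weekly_totals(sessions, user, first_day, last_day):
    """Sum time_spent per 7-day period starting at first_day (not Monday-based)."""
    totals = []
    start = first_day
    while start <= last_day:
        end = start + timedelta(days=6)
        total = sum(spent for name, spent, day in sessions
                    if name == user and start <= day <= end)
        totals.append((start, end, total))   # empty periods get a total of 0
        start += timedelta(days=7)
    return totals


totals = weekly_totals(sessions, "u1", date(2015, 8, 15), date(2015, 9, 6))
for row in totals:
    print(row)
```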
select
ui,
date_trunc('week', the_date)::date as start_date,
date_trunc('week', the_date)::date + 6 as end_date,
sum(timespent) as "sum"
from t
group by 1, 2, 3
order by 1,2
Something like this (assuming that by timestamp you mean the data type timestamp).
To make Sunday the first day of the week, I added an extra day to "date" in the group by.
select (start_date - date_part('dow', start_date) * interval '1 day')::date start_date,
(start_date + (6 - date_part('dow', start_date)) * interval '1 day')::date end_date,
total_time_spent
from (
select min("date") start_date, sum(timespent) total_time_spent
from mytable
where "user" = 'u1'
group by date_part('year', "date"), date_part('week', "date" + interval '1 day')) "tmp"
order by start_date
This is a more generic approach, for any date interval.