postgresql average of numrange - postgresql

I have a PostgreSQL table with the following data:
Power_range;unit;Date
[0.055,0.065];un_MW_el;14.01.1985
[0.02,0.02];un_MW_el;22.08.1985
[0.075,0.085];un_MW_el;09.04.1986
[0.055,0.055];un_MW_el;01.08.1986
[0.065,0.065];un_MW_el;19.01.1987
[0.075,0.075];un_MW_el;16.04.1987
[0.055,0.055];un_MW_el;15.05.1987
How can I write a query that lists the average of the numrange for each row, together with its year?
The end result should be something like:
0.060;1985
0.020;1985
0.080;1986
0.055;1986
0.065;1987
0.075;1987
0.055;1987

I'm not sure if there is a built-in avg function for ranges, but you can easily mock one up with arithmetic:
t=# select ((lower(power_range) + upper(power_range))/2)::float(2) as avg, extract(year from d) as year from rt;
  avg  | year
-------+------
0.06 | 1985
0.02 | 1985
0.08 | 1986
0.055 | 1986
0.065 | 1987
0.075 | 1987
0.055 | 1987
(7 rows)
where rt is created from your sample:
t=# create table rt(Power_range numrange, unit text, d date);
CREATE TABLE
t=# copy rt from stdin delimiter ';';
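(One caveat, my addition: with dates written like 14.01.1985, the COPY only parses cleanly if the session uses day-first dates, e.g. run set datestyle = 'DMY'; first, since the default MDY style rejects a month of 14.)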
https://www.postgresql.org/docs/current/static/functions-range.html
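If what you actually want is one average per year rather than per row, here is a minimal sketch building on the same arithmetic; the aggregated reading is my assumption, since the question's expected output is per-row:
-- one averaged midpoint per year instead of per row
select avg((lower(power_range) + upper(power_range)) / 2)::float(2) as avg,
       extract(year from d) as year
from rt
group by extract(year from d)
order by year;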

Related

Run a SQL query against ten-minute time intervals

I have a PostgreSQL table with this schema:
id SERIAL PRIMARY KEY,
traveltime INT,
departuredate TIMESTAMPTZ,
departurehour TIMETZ
Here is a bit of data (edited):
id | traveltime | departuredate | departurehour
----+------------+------------------------+---------------
1 | 73 | 2019-12-24 00:00:03+01 | 00:00:03+01
2 | 73 | 2019-12-24 00:12:16+01 | 00:12:16+01
53 | 115 | 2019-12-24 07:53:44+01 | 07:53:44+01
54 | 116 | 2019-12-24 07:58:45+01 | 07:58:45+01
55 | 119 | 2019-12-24 08:03:46+01 | 08:03:46+01
56 | 120 | 2019-12-24 08:08:47+01 | 08:08:47+01
57 | 121 | 2019-12-24 08:13:48+01 | 08:13:48+01
58 | 121 | 2019-12-24 08:18:48+01 | 08:18:48+01
542 | 112 | 2019-12-26 07:52:41+01 | 07:52:41+01
543 | 114 | 2019-12-26 07:57:42+01 | 07:57:42+01
544 | 116 | 2019-12-26 08:02:43+01 | 08:02:43+01
545 | 116 | 2019-12-26 08:07:44+01 | 08:07:44+01
546 | 117 | 2019-12-26 08:12:45+01 | 08:12:45+01
547 | 118 | 2019-12-26 08:17:46+01 | 08:17:46+01
548 | 118 | 2019-12-26 08:22:48+01 | 08:22:48+01
1031 | 80 | 2019-12-28 07:50:33+01 | 07:50:33+01
1032 | 81 | 2019-12-28 07:55:34+01 | 07:55:34+01
1033 | 81 | 2019-12-28 08:00:35+01 | 08:00:35+01
1034 | 82 | 2019-12-28 08:05:36+01 | 08:05:36+01
1035 | 82 | 2019-12-28 08:10:37+01 | 08:10:37+01
1036 | 83 | 2019-12-28 08:15:38+01 | 08:15:38+01
1037 | 83 | 2019-12-28 08:20:39+01 | 08:20:39+01
I'd like to get the average of all the values collected for traveltime for each 10-minute interval over several weeks.
Expected result for the data sample: for the 10-minute interval between 08:00 and 08:10, the rows included in the avg are those with id 55, 56, 544, 545, 1033 and 1034, and so on.
I can get the average for a specific interval:
select avg(traveltime) from belt where departurehour >= '10:40:00+01' and departurehour < '10:50:00+01';
To avoid creating a query for each interval, I used this query to get all the 10-minute intervals for the complete period encoded:
select i from generate_series('2019-11-23', '2020-01-18', '10 minutes'::interval) i;
What I miss is a way to apply my AVG query to each of these generated intervals. Any direction would be helpful!
It turns out that generate_series does not really apply here, regardless of the date range: the critical part is the 144 10-minute time-of-day intervals per day, not the calendar span. Unfortunately Postgres does not provide a built-in range type over time-of-day values (perhaps creating one would be a useful exercise). But all is not lost: you can simulate the same thing with BETWEEN; you just need to play with the ending of each range.
The following generates this simulation using a recursive CTE, then, as before, joins it to your table.
set timezone to '+1'; -- necessary to keep my local offset from affecting results.
-- create table and insert data here
-- (additional data was added outside the date range, so it should not be included)
with recursive min_intervals as
(select '00:00:00'::timetz start_10Min -- start of 1st 10Min interval
, '00:09:59.999999'::timetz end_10Min -- last microsecond in 10Min interval
, 1 interval_no
union all
select start_10Min + interval '10 min'
, end_10Min + interval '10 min'
, interval_no + 1
from Min_intervals
where interval_no < 144 -- 6 10Min intervals/hr * 24 Hr/day = No of 10Min intervals in any day
) -- select * from min_intervals;
select start_10Min, end_10Min, avg(traveltime) average_travel_time
from min_intervals
join belt
on departuredate::time between start_10Min and end_10Min
where departuredate::date between date '2019-11-23' and date '2020-01-18'
group by start_10Min, end_10Min
order by start_10Min;
-- test result for the specified interval. Note: the added rows fall within the time frame 08:00 to 08:10
-- but outside the date range, so they should be excluded and the avg for that period should be the same for both queries.
select avg(traveltime) from belt where id in (55, 56, 544, 545, 1033, 1034);
My issue with the above is that the date range is essentially hard-coded (yes, substitution parameters are available, but they have to be edited manually), which is OK for psql or an IDE but not good for a production environment. If this is to be used in that environment, I'd use the following function to return a virtual table of the same results.
create or replace function travel_average_per_10Min_interval(
start_date_in date
, end_date_in date
)
returns table (Start_10Min timetz
,end_10Min timetz
,avg_travel_time numeric
)
language sql
as $$
with recursive min_intervals as
(select '00:00:00'::timetz start_10Min -- start of 1st 10Min interval
, '00:09:59.999999'::timetz end_10Min -- last microsecond in 10Min interval
, 1 interval_no
union all
select start_10Min + interval '10 min'
, end_10Min + interval '10 min'
, interval_no + 1
from Min_intervals
where interval_no < 144 -- 6 10Min intervals/hr * 24 Hr/day = No of 10Min intervals in any day
) -- select * from min_intervals;
select start_10Min, end_10Min, avg(traveltime) average_travel_time
from min_intervals
join belt
on departuredate::time between start_10Min and end_10Min
where departuredate::date between start_date_in and end_date_in
group by start_10Min, end_10Min
order by start_10Min;
$$;
-- test
select * from travel_average_per_10Min_interval(date '2019-11-23', date '2020-01-18');
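For completeness, a shorter alternative sketch (my addition, not part of the original answer): instead of generating the 144 intervals and joining, derive each row's 10-minute bucket directly and group by it. It assumes the same belt table and session time zone as above.
select
  -- integer division floors the minute to its 10-minute bucket start
  make_time(extract(hour from departuredate)::int,
            (extract(minute from departuredate)::int / 10) * 10,
            0) as bucket_start,
  avg(traveltime) as average_travel_time
from belt
where departuredate::date between date '2019-11-23' and date '2020-01-18'
group by bucket_start
order by bucket_start;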

Check condition in date interval between now and next month

I have a table in PostgreSQL 10. The table has the following structure
| date | entity | col1 | col2 |
|------+--------+------+------|
Every row represents an event that happens to an entity in a given date. The event has attributes represented by col1 and col2.
I want to add a new column that indicates whether, relative to the current row, there are events in which col2 fulfills a given condition (in the following example, col2 > 20) within a given interval (say 1 month).
| date | entity | col1 | col2 | fulfill |
|------+--------+------+------+---------|
| t1 | A | a1 | 10 | F |
| t1 | B | b | 9 | F |
| t2 | A | a2 | 10 | T |
| t3 | A | a3 | 25 | F |
| t3 | B | b2 | 8 | F |
t3 is a date inside t2 + interval 1 month.
What is the most efficient way to accomplish this?
I am not sure if I got your problem correctly. My case is 'T if there is a value >= 10 between now and the next month'.
I have the following data:
val event_date
--- ----------
22 2016-12-31 -- should be T because val >= 10
8 2017-03-20 -- should be F because in [event_date, event_date + 1 month] no val >= 10
6 2017-03-22 -- F
42 2017-12-31 -- T because there are 2 values >= 10 in next month
25 2018-01-24 -- T val >= 10
9 2018-02-11 -- F
1 2018-03-01 -- T because in month there is 1 val >= 10
2 2018-03-10 -- T same
20 2018-04-01 -- T
7 2018-04-01 -- T because on the same day a val >= 10
1 2018-07-24 -- F
22 2019-01-01 -- T
4 2020-10-22 -- T
123 2020-11-04 -- T
The query:
SELECT DISTINCT
e1.val,
e1.event_date,
CASE
WHEN MAX(e2.val) over (partition BY e1.event_date) >= 10
THEN 'T'
ELSE 'F'
END AS fulfilled
FROM
testdata.events e1
JOIN
testdata.events e2
ON
e1.event_date <= e2.event_date
AND e2.event_date <= (e1.event_date + interval '1 month')::DATE
ORDER BY
e1.event_date
The result:
val event_date fulfilled
--- ---------- ---------
22 2016-12-31 T
8 2017-03-20 F
6 2017-03-22 F
42 2017-12-31 T
25 2018-01-24 T
9 2018-02-11 F
1 2018-03-01 T
2 2018-03-10 T
20 2018-04-01 T
7 2018-04-01 T
1 2018-07-24 F
22 2019-01-01 T
4 2020-10-22 T
123 2020-11-04 T
Currently I am not finding a solution that avoids joining the table to itself, which does not seem very elegant to me.
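An alternative sketch (my addition): the same T/F flag via a correlated EXISTS. It still reads the table twice under the hood, but it drops the DISTINCT and the window function.
select e1.val,
       e1.event_date,
       case when exists (
              -- any event (including this row) in [event_date, event_date + 1 month] with val >= 10?
              select 1
              from testdata.events e2
              where e2.event_date >= e1.event_date
                and e2.event_date <= (e1.event_date + interval '1 month')::date
                and e2.val >= 10)
            then 'T' else 'F'
       end as fulfilled
from testdata.events e1
order by e1.event_date;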

Estimated time minus spent time wrong result - postgresql

I have tasks that are estimated to some hours. And time spent minus estimated should result in time left to spend.
Employee table
CREATE TABLE sign
(signid varchar(3), signname varchar(30));
INSERT INTO sign
(signid, signname)
VALUES
('AA', 'Adam'),
('BB', 'Bert'),
('CC', 'Cecil'),
('DD', 'David')
Task table
CREATE TABLE task
(taskid int4, taskdate date, tasksign varchar(3), taskhr numeric(10,2));
INSERT INTO task
(taskid, taskdate, tasksign, taskhr)
VALUES
(1,'2016-01-01','AA',10),
(2,'2016-02-01','BB',10),
(3,'2016-01-15','BB',10),
(4,'2016-03-01','BB',10),
(5,'2016-01-03','CC',10)
Time sheet table
CREATE TABLE hr
(hrid int4, hrsign varchar(3), hrtask int4, hrqty numeric(10,2));
INSERT INTO hr
(hrid, hrsign, hrtask, hrqty)
VALUES
(1,'AA',1,1.1),
(2,'BB',2,1.2),
(3,'CC',5,2.3),
(4,'CC',5,5)
My attempt at a simple query that subtracts spent time from estimated time gives the wrong answer:
SELECT signid,signname,to_char(taskdate, 'iyyy-iw'),sum(taskhr),sum(hrqty)
FROM sign
LEFT JOIN task ON tasksign=signid
LEFT JOIN hr ON taskid=hrtask
GROUP BY 1,2,3
ORDER BY 2,3
The answer is:
id name week task hr
AA Adam 2015-53 10 1,1000
BB Bert 2016-02 10 NULL
BB Bert 2016-05 10 1,2000
BB Bert 2016-09 10 NULL
CC Cecil 2015-53 20 7,3000
DD David NULL NULL NULL
The task hours seem to be duplicated. It should look like this:
id name week task hr
AA Adam 2015-53 10 1,1000
BB Bert 2016-02 10 NULL
BB Bert 2016-05 10 1,2000
BB Bert 2016-09 10 NULL
CC Cecil 2015-53 10 7,3000
DD David NULL NULL NULL
Any tips on how to make a query that calculates correctly?
"fiddle"
http://rextester.com/UOO16020
Joining the hr table multiplies the task table rows. Aggregate hr before joining:
select signid, signname, to_char(taskdate, 'iyyy-iw'), sum(taskhr), sum(hrqty)
from
sign
left join
task on tasksign = signid
left join (
select hrtask, sum(hrqty) as hrqty
from hr
group by 1
)
hr on taskid = hrtask
group by 1,2,3
order by 2,3
;
signid | signname | to_char | sum | sum
--------+----------+---------+-------+------
AA | Adam | 2015-53 | 10.00 | 1.10
BB | Bert | 2016-02 | 10.00 |
BB | Bert | 2016-05 | 10.00 | 1.20
BB | Bert | 2016-09 | 10.00 |
CC | Cecil | 2015-53 | 10.00 | 7.30
DD | David | | |
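Since the question's actual goal is estimated minus spent, here is a minimal extension of the query above; the hr_left column is my addition, assuming time left = estimated hours minus booked hours:
select signid, signname, to_char(taskdate, 'iyyy-iw') as week,
       sum(taskhr) as estimated,
       sum(hrqty) as spent,
       sum(taskhr) - coalesce(sum(hrqty), 0) as hr_left  -- treat unbooked tasks as 0 hours spent
from sign
left join task on tasksign = signid
left join (
    select hrtask, sum(hrqty) as hrqty
    from hr
    group by 1
) h on taskid = hrtask
group by 1, 2, 3
order by 2, 3;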

Postgres placeholders for 0 data

I have some Postgres data like this:
date | count
2015-01-01 | 20
2015-01-02 | 15
2015-01-05 | 30
I want to run a query that pulls this data with 0s in place for the dates that are missing, like this:
date | count
2015-01-01 | 20
2015-01-02 | 15
2015-01-03 | 0
2015-01-04 | 0
2015-01-05 | 30
This is for a very large range of dates, and I need it to fill in all the gaps. How can I accomplish this with just SQL?
Given a table junk of:
d | c
------------+----
2015-01-01 | 20
2015-01-02 | 15
2015-01-05 | 30
Running
select fake.d, coalesce(j.c, 0) as c
from (select min(d) + generate_series(0,7,1) as d from junk) fake
left outer join junk j on fake.d=j.d;
gets us:
d | c
------------+----------
2015-01-01 | 20
2015-01-02 | 15
2015-01-03 | 0
2015-01-04 | 0
2015-01-05 | 30
2015-01-06 | 0
2015-01-07 | 0
2015-01-08 | 0
You could of course adjust the start date for the series, the length it runs for, etc.
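A variant sketch (my addition): derive the series bounds from the data itself, so the gap filling covers min(d) through max(d) without hard-coding the length:
select fake.d::date as d, coalesce(j.c, 0) as c
from (
    -- generate_series over timestamps; the dates are cast implicitly
    select generate_series(min(d), max(d), interval '1 day') as d
    from junk
) fake
left outer join junk j on fake.d = j.d;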
Where is this data going? To an outside source or another table or view?
There's probably a better solution, but you could create a new table (or, in Excel or wherever the data is going, a new sheet) that has the entire date range you want, with another integer column of null values. Then update that table with your current dataset and replace all remaining nulls with zeros.
It's a really roundabout way to do things, but it'll work.
I don't have enough rep to comment :(
This is also a good reference
Using COALESCE to handle NULL values in PostgreSQL

A query to find the average value for each range?

Here is my table
GID | Distance (KM) | Subdistance (KM) | Iri_avg
-------------------------------------------------
1 | 13.952 | 0 | 0.34
2 | 13.957 | 0.005 | 0.22
3 | 13.962 | 0.010 | 0.33
4 | 13.967 | 0.015 | 0.12
5 | 13.972 | 0.020 | 0.35
...
I would like to find the AVG of Iri_avg for each range, for example:
every 5 metres (by default)
every 10 metres
every 100 metres
every 500 metres
What is the PostgreSQL query to solve this problem?
Your question is unclear. Your data has two distance columns, which one do you mean?
Here is an example of how to get averages based on the subdistance.
select floor(subdistance*1000/5.0)*5.0 as lower_bound, avg(iri_avg) as avg_iri_avg
from t
group by floor(subdistance*1000/5.0)*5.0
order by 1;
The expression "floor(subdistance*1000/5.0)*5.0" gets the closest 5-metre increment at or below the value. You can replace the "5.0" with "10" or "100" for other bin widths.
This is meant as an illustration. It is unclear which column you want to bin, what you want to do about empty bins, and whether you are looking for all bin widths in a single query versus a query that handles just one bin width.
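To illustrate the all-bin-widths-in-one-query reading (my addition, assuming the binning is on subdistance in metres, as above):
select w.width,
       floor(subdistance*1000/w.width)*w.width as lower_bound,
       avg(iri_avg) as avg_iri_avg
from t
cross join (values (5.0), (10.0), (100.0), (500.0)) as w(width)  -- bin widths in metres
group by w.width, lower_bound
order by w.width, lower_bound;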