This is a bit of a SQL code golf post...
I have a table that contains a timestamp, an identifier, and a price:
-- Time-series price samples: one row per (timestamp, metric) reading.
CREATE TABLE IF NOT EXISTS ts_data (
ts TIMESTAMP WITHOUT TIME ZONE, --timestamp of the reading
metric INTEGER, --identifier of the series being measured
usd NUMERIC(12, 2) -- price in USD; NUMERIC avoids binary-float rounding on money
);
Given the sample data:
-- Sample data: two metrics, four 5-minute readings in each of the
-- 01:00 and 02:00 hours.
INSERT INTO ts_data (ts, metric, usd)
VALUES ('2018-08-21 01:00:00', 1, 5.00)
     , ('2018-08-21 01:05:00', 1, 10.00)
     , ('2018-08-21 01:10:00', 1, 15.00)
     , ('2018-08-21 01:15:00', 1, 20.00)
     , ('2018-08-21 02:00:00', 1, 25.00)
     , ('2018-08-21 02:05:00', 1, 30.00)
     , ('2018-08-21 02:10:00', 1, 35.00)
     , ('2018-08-21 02:15:00', 1, 40.00)
     , ('2018-08-21 01:00:00', 2, 1.00)
     , ('2018-08-21 01:05:00', 2, 2.00)
     , ('2018-08-21 01:10:00', 2, 3.00)
     , ('2018-08-21 01:15:00', 2, 4.00)
     , ('2018-08-21 02:00:00', 2, 5.00)
     , ('2018-08-21 02:05:00', 2, 6.00)
     , ('2018-08-21 02:10:00', 2, 7.00)
     , ('2018-08-21 02:15:00', 2, 8.00);
What I'm trying to do
Resample the time frequency using DATE_TRUNC and average the prices per metric within each truncated date
Calculate the SUM of all metrics per truncated date
This query accomplishes what I want
-- Hourly average per metric (inner query), then the total of those
-- averages across metrics for each hour (outer query).
SELECT ts, SUM(usd) AS usd
FROM (
    SELECT metric, date_trunc('HOUR', ts) AS ts, AVG(usd) AS usd
    FROM ts_data
    -- Half-open range: BETWEEN is closed on both ends and would also
    -- include a row stamped exactly '2018-08-22 00:00:00'.
    WHERE ts >= '2018-08-20 00:00:00'
      AND ts <  '2018-08-22 00:00:00'
      AND metric IN (1, 2)
    GROUP BY date_trunc('HOUR', ts), metric
) samples
GROUP BY ts
ORDER BY ts; -- deterministic output order
We get:
+------------------------------+
| ts | usd |
+------------------------------+
| "2018-08-21 01:00:00" | 15 |
| "2018-08-21 02:00:00" | 39 |
Are there any more efficient or compact ways to accomplish the same thing?
Related
I want to write a select query to pick data from a table which is shown in this image below,PICTURE_1
1.Table Containing Data
and display it like this image in this link below, PICTURE_2
2.Result of the query
About the data: The first picture shows data logged into a table for 2 seconds from 3 IDs (1, 2 & 3), each having 2 sub-IDs (aa & bb). Values and timestamps are also displayed in the picture. The table contains only 3 columns, as shown in PICTURE_1. Could you help me write a query to display the data in the table as shown in the second image, using PostgreSQL? You can extract the ID name using a substring function. The language I'm using is plpgsql. Any ideas or logic would also be welcome. Thank you for your time.
Please try this. It pivots the row values into columns, using a CTE.
-- PostgreSQL(v11)
-- Pivot rows named '<id>.<sub>' into one row per (id, timestamp) with
-- one column per sub-ID. SPLIT_PART on the dot is simpler than the
-- LEFT/RIGHT + REVERSE combination and also works for ids longer than
-- one character.
WITH cte_t AS (
SELECT SPLIT_PART(name, '.', 1) id      -- text before the dot ('1', '2', ...)
     , SPLIT_PART(name, '.', 2) t_name  -- text after the dot ('aa' / 'bb')
     , value
     , time_stamp
FROM test
)
SELECT id
     , time_stamp :: DATE "date"
     , time_stamp :: TIME "time"
     -- MAX over a single matching row per group just picks that value.
     , MAX(CASE WHEN t_name = 'aa' THEN value END) "aa"
     , MAX(CASE WHEN t_name = 'bb' THEN value END) "bb"
FROM cte_t
GROUP BY id, time_stamp
ORDER BY date, time, id;
Please check from url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=6d35047560b3f83e6c906584b23034e9
Check this query dbfiddle
-- Self-contained demo data: name is '<id>.<sub>', timestamps as text.
with cte (name, value, timeStamp) as (values
('1.aa', 1, '2021-08-20 10:10:01'),
('2.aa', 2, '2021-08-20 10:10:01'),
('3.aa', 3, '2021-08-20 10:10:01'),
('1.bb', 4, '2021-08-20 10:10:01'),
('2.bb', 5, '2021-08-20 10:10:01'),
('3.bb', 6, '2021-08-20 10:10:01'),
('1.aa', 7, '2021-08-20 10:10:02'),
('2.aa', 8, '2021-08-20 10:10:02'),
('3.aa', 9, '2021-08-20 10:10:02'),
('1.bb', 0, '2021-08-20 10:10:02'),
('2.bb', 1, '2021-08-20 10:10:02'),
('3.bb', 2, '2021-08-20 10:10:02')
), sub_cte as (
-- Split 'id.sub' and parse the timestamp text into date and time parts.
select split_name[1] as id, split_name[2] as name, value, tt::date as date, tt::time as time from (
select
regexp_split_to_array(name, '\.') split_name,
value,
-- HH24 is the 24-hour field; plain 'HH' is the 12-hour clock and
-- would reject hour values greater than 12.
to_timestamp(timeStamp, 'YYYY-MM-DD HH24:MI:SS') as tt
from cte
) foo
)
-- Self-join the 'aa' rows to the 'bb' rows on (id, date, time).
select id, date, time, a.value as aa, b.value as bb from sub_cte a
left join (
select * from sub_cte where name = 'bb'
) as b using (id, date, time)
where a.name = 'aa'
order by date, time, id -- make the displayed order deterministic
Result
id | date | time | aa | bb
----+------------+----------+----+----
1 | 2021-08-20 | 10:10:01 | 1 | 4
2 | 2021-08-20 | 10:10:01 | 2 | 5
3 | 2021-08-20 | 10:10:01 | 3 | 6
1 | 2021-08-20 | 10:10:02 | 7 | 0
2 | 2021-08-20 | 10:10:02 | 8 | 1
3 | 2021-08-20 | 10:10:02 | 9 | 2
(6 rows)
I have a postgres table with measured temperatures and timestamp of measurement. Measuring interval is 30 minutes, but sometimes it skips, so I don't get the same number of measurements each day.
The table looks like this:
I need to create a view that shows average temperature for each day divided into four 6 hour intervals: 00-06, 06-12, 12-18 and 18-24. It should look something like this:
avg_temp, time
|24.5 | 2018-05-13 00:00:00 |
|22.1 | 2018-05-13 06:00:00 |
|25.6 | 2018-05-13 12:00:00 |
|20.6 | 2018-05-13 18:00:00 |
|21.8 | 2018-05-14 00:00:00 |
etc. etc.
You can round timestamps to quarters of a day with the following expression (on an exemplary data):
with my_table(temp, time) as (
values
(20, '2018-05-20 4:00'::timestamp),
(21, '2018-05-20 5:00'),
(22, '2018-05-20 6:00'),
(23, '2018-05-20 7:00'),
(24, '2018-05-20 12:00'),
(25, '2018-05-20 19:00')
)
-- Bucket start = midnight + (hour / 6, integer division) * 6 hours.
-- Naming the bucket in a subquery lets us group/order by name instead
-- of by positional reference.
select avg(temp), bucket as time
from (
    select temp,
           time::date + (extract(hour from time)::int / 6 * 6) * '1h'::interval as bucket
    from my_table
) t
group by bucket
order by bucket;
avg | time
---------------------+---------------------
20.5000000000000000 | 2018-05-20 00:00:00
22.5000000000000000 | 2018-05-20 06:00:00
24.0000000000000000 | 2018-05-20 12:00:00
25.0000000000000000 | 2018-05-20 18:00:00
(4 rows)
If you also need the averages for intervals without any measurements, you'll need a calendar-table:
-- \i tmp.sql
-- Temperature readings keyed by measurement instant.
CREATE TABLE the_temp(
ztime timestamp primary key -- when the reading was taken
, ztemp double precision    -- measured temperature
) ;
-- Six sample readings on 2018-05-20 (note: not every 6-hour interval
-- has a reading).
INSERT INTO the_temp( ztemp, ztime )
VALUES
  (20, '2018-05-20 4:00'),
  (21, '2018-05-20 5:00'),
  (22, '2018-05-20 6:00'),
  (23, '2018-05-20 7:00'),
  (24, '2018-05-20 12:00'),
  (25, '2018-05-20 19:00');
-- Generate calendar table: one row per 6-hour bucket of 2018-05-20.
-- The series must stop at 18:00 -- ending it at '2018-05-21 0:00' would
-- emit a fifth, always-empty bucket covering the next day's 00:00-06:00.
WITH cal AS(
SELECT ts AS t_begin, ts+ '6hours'::interval AS t_end
FROM generate_series('2018-05-20 0:00'::timestamp
, '2018-05-20 18:00', '6hours'::interval) ts
)
-- LEFT JOIN keeps buckets with no readings (zmean comes back NULL).
SELECT cal.t_begin, cal.t_end
, AVG( tt.ztemp)AS zmean
FROM cal
LEFT JOIN the_temp tt
ON tt.ztime >= cal.t_begin     -- half-open interval: [t_begin, t_end)
AND tt.ztime < cal.t_end
GROUP BY cal.t_begin, cal.t_end
ORDER BY cal.t_begin
;
I've got some periodic counter data (like once a second) from different objects that I wish to combine into an hourly total.
If I do it with separate column names, it's pretty straightforward:
-- One row per (object, second); each object reports three counters
-- in separate columns.
CREATE TABLE ts1 (
    id INTEGER,     -- object identifier
    ts TIMESTAMP,   -- sample time
    count0 integer,
    count1 integer,
    count2 integer
);
-- Four one-second samples for each of two objects. The explicit column
-- list keeps the statement valid if columns are ever added or reordered.
INSERT INTO ts1 (id, ts, count0, count1, count2) VALUES
(1, '2017-12-07 10:37:48', 10, 20, 50),
(2, '2017-12-07 10:37:48', 13, 7, 88),
(1, '2017-12-07 10:37:49', 12, 23, 34),
(2, '2017-12-07 10:37:49', 11, 13, 46),
(1, '2017-12-07 10:37:50', 8, 33, 80),
(2, '2017-12-07 10:37:50', 9, 3, 47),
(1, '2017-12-07 10:37:51', 17, 99, 7),
(2, '2017-12-07 10:37:51', 9, 23, 96);
-- Hourly totals per object. Adding one hour before truncating labels each
-- bucket with the END of its hour (10:37 data reports under 11:00).
-- interval '1 hour' makes the addend's type explicit instead of relying
-- on an implicit cast of the bare string literal.
SELECT id, date_trunc('hour', ts + interval '1 hour') nts,
sum(count0), sum(count1), sum(count2)
FROM ts1 GROUP BY id, nts;
id | nts | sum | sum | sum
----+---------------------+-----+-----+-----
1 | 2017-12-07 11:00:00 | 47 | 175 | 171
2 | 2017-12-07 11:00:00 | 42 | 46 | 277
(2 rows)
The problem is that different objects have different numbers of counts (though each particular object's rows -- ones sharing the same ID -- all have the same number of counts). Hence I want to use an array.
The corresponding table looks like this:
-- Same data as ts1, but the per-object counters live in a single array
-- column so different objects may carry different numbers of counters.
CREATE TABLE ts2 (
    id INTEGER,       -- object identifier
    ts TIMESTAMP,     -- sample time
    counts INTEGER[]  -- one element per counter
);
-- Array-column version of the ts1 sample data. The explicit column list
-- keeps the statement valid if columns are ever added or reordered.
INSERT INTO ts2 (id, ts, counts) VALUES
(1, '2017-12-07 10:37:48', ARRAY[10, 20, 50]),
(2, '2017-12-07 10:37:48', ARRAY[13, 7, 88]),
(1, '2017-12-07 10:37:49', ARRAY[12, 23, 34]),
(2, '2017-12-07 10:37:49', ARRAY[11, 13, 46]),
(1, '2017-12-07 10:37:50', ARRAY[8, 33, 80]),
(2, '2017-12-07 10:37:50', ARRAY[9, 3, 47]),
(1, '2017-12-07 10:37:51', ARRAY[17, 99, 7]),
(2, '2017-12-07 10:37:51', ARRAY[9, 23, 96]);
I have looked at this answer https://stackoverflow.com/a/24997565/1076479 and I get the general gist of it, but I cannot figure out how to get the correct rows summed together when I try to combine it with the grouping by id and timestamp.
For example, with this I get all the rows, not just the ones with matching id and timestamp:
-- BROKEN (the question's attempt, kept as-is to illustrate the problem):
-- the ARRAY(...) subquery is NOT correlated with the outer row -- its own
-- "FROM ts2 t" re-scans the whole table, so every output row gets the
-- grand total {89,221,448} instead of its group's element-wise sums.
SELECT id, date_trunc('hour', ts + '1 hour') nts, ARRAY(
SELECT sum(elem) FROM ts2 t, unnest(t.counts)
WITH ORDINALITY x(elem, rn) GROUP BY rn ORDER BY rn
) FROM ts2 GROUP BY id, nts;
id | nts | array
----+---------------------+--------------
1 | 2017-12-07 11:00:00 | {89,221,448}
2 | 2017-12-07 11:00:00 | {89,221,448}
(2 rows)
FWIW, I'm using postgresql 9.6
The problem with your original query is that you're summing all elements, because GROUP BY id, nts is executed in the outer query. Combining a CTE with a LATERAL JOIN does the trick:
WITH tmp AS (
    -- One row per (id, hour-bucket, array position): sum each counter
    -- position element-wise across the group's samples.
    SELECT
        id,
        date_trunc('hour', ts + '1 hour') nts,
        rn,              -- element position, carried out so the outer
                         -- query can rebuild the array in order
        sum(elem) AS counts
    FROM
        ts2
    LEFT JOIN LATERAL unnest(counts) WITH ORDINALITY x(elem, rn) ON TRUE
    GROUP BY
        id, nts, rn
)
-- Reassemble the per-position sums into an array. ORDER BY inside the
-- aggregate is required: without it array_agg's element order is
-- unspecified and the result could come back scrambled.
SELECT id, nts, array_agg(counts ORDER BY rn) FROM tmp GROUP BY id, nts
I have a data-set that contains date {yyyy/mm/dd}, time {h,m,s}, and temperature {float} as individual columns.
I want to aggregate temperature values for each day by average function.
The problem is that, I don't know how I can query the time attribute to say for example aggregate {h,m, (0-5)s} and {h,m, (5-10)s} and {h,m, (10-15)s} and ..., automatically.
-- Average temperature per day, minute, and 5-second range within the
-- minute. Grouping by names instead of positional references (1, 2, 3)
-- keeps the query readable and safe against column reordering.
select
day,
to_char(date_trunc('minute', "time"), 'HH24:MI') as "minute",
-- Integer division of the seconds by 5 yields buckets 0..11,
-- i.e. (0-5)s, (5-10)s, (10-15)s, ...
extract(second from "time")::integer / 5 as "range",
avg(temperature) as average
from (
-- Demo data: one random reading per second over two days.
select d::date as day, d::time as "time", random() * 100 as temperature
from generate_series('2012-01-01', '2012-01-03', '1 second'::interval) s(d)
) d
group by day, "minute", "range"
order by day, "minute", "range"
;
If you want the average for all days:
-- Same bucketing, but averaged across ALL days (the day column is
-- dropped from both the select list and the grouping). Named grouping
-- columns replace the positional 1, 2 references.
select
to_char(date_trunc('minute', "time"), 'HH24:MI') as "minute",
extract(second from "time")::integer / 5 as "range",
avg(temperature) as average
from (
select d::time as "time", random() * 100 as temperature
from generate_series('2012-01-01', '2012-01-03', '1 second'::interval) s(d)
) d
group by "minute", "range"
order by "minute", "range"
;
I think the important part for your question is to group by the integer result of the division of the seconds by the range size.
I have data that looks like this.
SoldToRetailer
OrderDate | Customer | Product | price | units
-------------------------------------------------
1-jun-2011 | customer1 | Product1 | $10 | 5
2-jun-2011 | customer1 | Product1 | $5 | 3
3-jun-2011 | customer1 | Product2 | $10 | 4
4-jun-2011 | customer1 | Product1 | $4 | 4
5-jun-2011 | customer2 | Product3 | $10 | 1
SalesByRetailers
Customer | Product | units
-----------------------------
customer1 | Product1 | 5
customer2 | Product3 | 1
Here's what I need.
Sales(average price)
Customer | Product | units | Average Price
--------------------------------------------
customer1 | Product1 | 5 | $3.44
customer2 | Product3 | 1 | $10
Average Price is defined as the average price of the most recent SoldToRetailer Prices that add up to the units.
So in the first case, I grab the orders from June 4th and June 2nd. I don't want the orders from June 1st to be included.
EDIT: Hopefully a better explanation.
I'm attempting to determine the correct (most recent) price where an item was sold to a retailer. It's LIFO order for the prices. The price is determined by averaging the price sold over the last n orders. Where n = total retail sales for a particular product and customer.
In SQL pseudocode it would look like this.
Select s1.Customer, s1.product, average(s2.price)
from SalesByRetailers s1
join SoldToRetailer s2
on s1.customer=s2.customer
and s1.product=s2.product
and ( select top (count of records where s2.units = s1.units) from s2 order by OrderDate desc)
What I need to return is the number of records from SoldToRetailer where the sum of units is >= SalesByRetailer Units.
It looks like it could be solved by a RANK or rowover partition, but I'm at a loss.
The SoldToRetailer table is ginormous so performance is at a premium.
Running on SQL 2008R2
Thanks for helping
So I used 3 techniques. First I created a table with an OVER BY clause to give me a sorted list of products and prices, then I edited the table to add in the running average. An OUTER APPLY sub-select fixed my final problem. Hopefully the code will help someone else with a similar problem.
A shout out to Jeff Moden of SQLServerCentral.com fame for the running average help.
-- Step 1: snapshot this customer's orders into a temp table, most recent
-- first within each product, with placeholder running-total columns.
SELECT d.ProductKey,
d.ActualDollars,
d.Units,
ROW_NUMBER() OVER(PARTITION BY ProductKey ORDER BY d.OrderDateKey DESC) AS RowNumber,
NULL AS RunningTotal,
CONVERT(DECIMAL(10, 4), 0) AS RunningDollarsSum,
CONVERT(DECIMAL(10, 4), 0) AS RunningAverage
INTO #CustomerOrdersDetails
FROM dbo.orders d
-- Scalar variables use the '@' prefix; '#' is reserved for temp tables
-- and '#CustomerToSelect' here was a syntax error.
WHERE customerKey = @CustomerToSelect
--DB EDIT... Google "Quirky update SQL Server central. Jeff Moden's version of a
--Running total. Holy crap it's faster. tried trangular joins before.
-- The quirky update depends on the clustered index matching the scan order.
CREATE CLUSTERED INDEX [Index1]
ON #CustomerOrdersDetails ( ProductKey ASC, RowNumber ASC )
DECLARE @RunningTotal INT
DECLARE @PrevProductKey INT
DECLARE @RunningDollarsSum DECIMAL(10, 4)
-- Step 2: "quirky update" running totals -- state is carried across rows
-- via the variables; TABLOCKX + MAXDOP 1 keep the scan single-threaded
-- and in index order. Totals reset whenever the ProductKey changes.
UPDATE #CustomerOrdersDetails
SET @RunningTotal = RunningTotal = CASE
WHEN ProductKey = @PrevProductKey THEN c.Units + ISNULL(@RunningTotal, 0)
ELSE c.Units
END,
@RunningDollarsSum = RunningDollarsSum = CASE
WHEN ProductKey = @PrevProductKey THEN c.ActualDollars + ISNULL(@RunningDollarsSum, 0)
ELSE c.ActualDollars
END,
@PrevProductKey = ProductKey,
RunningAverage = @RunningDollarsSum / NULLIF(@RunningTotal, 0)
FROM #CustomerOrdersDetails c WITH (TABLOCKX)
OPTION (MAXDOP 1)
-- =============================================
-- Update Cost fields with average price calculation
-- =============================================
-- Step 3: for each product, take the first running-total row that covers
-- the retailer's unit count and use its running average as the price.
UPDATE d
-- (The stray comma that followed the COALESCE(...) here was a syntax
-- error -- SET assignments must not end with a trailing comma.)
SET DolSoldCostUSD = COALESCE(d.DolSoldCostUSD,
d.UnitsSold * a.RunningAverage)
FROM dbo.inbound d
OUTER APPLY (SELECT TOP 1 *
FROM #CustomerOrdersDetails ap
WHERE ap.ProductKey = d.ProductKey
AND d.UnitsSold + d.UnitsOnHand + d.UnitsOnOrder + d.UnitsReceived + d.UnitsReturned >= RunningTotal
ORDER BY RunningTotal) AS a
declare #table table (customer varchar(15), product varchar(15), qty int, price decimal(6,2))
insert into #table (customer, product, qty, price)
values
('customer1', 'product1', 5, 3),
('customer1', 'product1', 4, 4),
('customer1', 'product1', 3, 2),
('customer1', 'product1', 2, 13),
('customer1', 'product1', 3, 3),
('customer1', 'product2', 5, 1),
('customer1', 'product2', 4, 7),
('customer1', 'product2', 2, 5),
('customer1', 'product2', 6, 23),
('customer1', 'product2', 2, 1),
('customer2', 'product1', 2, 1),
('customer2', 'product1', 4, 4),
('customer2', 'product1', 7, 3),
('customer2', 'product1', 1, 12),
('customer2', 'product1', 2, 3),
('customer2', 'product2', 3, 2),
('customer2', 'product2', 6, 5),
('customer2', 'product2', 8, 4),
('customer2', 'product2', 2, 11),
('customer2', 'product2', 1, 2)
select customer, product, sum(qty) as units, (sum(qty * price))/SUM(qty) as 'Average Price' from #table
group by customer, product