To compute the average angle in a table (angles in degrees [0, 360]) I use the following statement:
SELECT
(CASE WHEN (a < 0.0)
THEN a + 360.0
ELSE a END) as angle
FROM (
SELECT
degrees(atan2(avg(sin(radians(x))), avg(cos(radians(x))))) as a
FROM
angle_t
) as t
UNION
SELECT
x
FROM
angle_t
When it came to testing, I tried it on my table containing Yahoo weather data:
WITH angle_t(x) AS (
SELECT
cast(wind_direction as double precision)
FROM
weather_yahoo
WHERE
time >= current_date - interval '1 days' - interval '1 hours'
AND
time <= current_date - interval '1 days')
The output was:
246.670436944698
250.0
240.0
I wondered why the average angle wasn't 245 but 246.67..., so I ran another test with apparently equal input data:
WITH angle_t(x) AS (VALUES
(240 :: double precision),
(250))
The output showed the (un-)expected result:
245.0
250.0
240.0
Can anyone explain this to me? (this is PostgreSQL 8.4)
The UNION operator eliminates duplicate entries.
From the documentation [emphasis mine]:
UNION effectively appends the result of query2 to the result of query1 (although there is no guarantee that this is the order in which the rows are actually returned). Furthermore, it eliminates duplicate rows from its result, in the same way as DISTINCT, unless UNION ALL is used.
Therefore, if UNION ALL is used instead, the explanation for the unexpected outcome becomes obvious: the table actually contains the value 250.0 twice.
246.670436944698
250.0
240.0
250.0
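For completeness, here is the original query with UNION ALL substituted, so that all source rows are kept alongside the computed average:
SELECT (CASE WHEN a < 0.0 THEN a + 360.0 ELSE a END) AS angle
FROM (
  SELECT degrees(atan2(avg(sin(radians(x))), avg(cos(radians(x))))) AS a
  FROM angle_t
) AS t
UNION ALL
SELECT x FROM angle_t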
Related
I have daily changes in a table like below.
Table: performance

date       | percent_change
-----------+---------------
2022/12/01 |  2
2022/12/02 | -1
2022/12/03 |  3
I want to assume an initial value of 100 and show the cumulative value up to each date, like below.
Expected output:

date       | percent_change | cumulative value
-----------+----------------+-----------------
2022/12/01 |  2             | 102
2022/12/02 | -1             | 100.98
2022/12/03 |  3             | 104.0094
A product of values, like the one you want to make, is nothing more than EXP(SUM(LN(...))). It results in a slightly verbose query but does not require new functions to be coded and can be ported as is to other DBMS.
In your case, as long as none of your percentages is below -100%:
SELECT date,
percent_change,
100 * EXP(SUM(LN(1 + percent_change / 100.0)) OVER (ORDER BY Date)) AS cumulative_value
FROM T
The SUM(...) OVER (ORDER BY ...) is what makes it a cumulative sum.
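As a quick check of the EXP(SUM(LN(...))) identity, here is a self-contained sketch using the sample rows from the question (the CTE merely stands in for your performance table):

WITH T(date, percent_change) AS (VALUES
  ('2022-12-01'::date, 2.0),
  ('2022-12-02', -1.0),
  ('2022-12-03', 3.0))
SELECT date,
       percent_change,
       100 * EXP(SUM(LN(1 + percent_change / 100.0)) OVER (ORDER BY date)) AS cumulative_value
FROM T;
-- expected (up to rounding): 102, 100.98, 104.0094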
If you need to account for percentages lower than -100%, you need a bit more complexity.
SELECT date,
percent_change,
100 * -1 ^ SUM(CASE WHEN percent_change < -100 THEN 1 ELSE 0 END) OVER (ORDER BY Date)
* EXP(SUM(LN(ABS(1 + percent_change / 100.0))) OVER (ORDER BY Date))
AS cumulative_value
FROM T
WHERE NOT EXISTS (SELECT FROM T T2 WHERE T2.percent_change = -100 AND T2.date <= T.date)
UNION ALL
SELECT Date, percent_change, 0
FROM T
WHERE EXISTS (SELECT FROM T T2 WHERE T2.percent_change = -100 AND T2.date <= T.date)
Explanation:
An ABS(...) has been added to account for the values not supported by the previous query. It effectively strips the sign of 1 + percent_change / 100.
Before the EXP(SUM(LN(ABS(...)))), the -1 ^ SUM(...) is where the sign is put back into the calculation. Read it as: -1 to the power of how many times we encountered a negative value.
The WHERE EXISTS(...) / WHERE NOT EXISTS(...) part handles the special case of percent_change = -100%. When we encounter -100, we cannot calculate the logarithm even with a call to ABS(...). However, this does not matter much, as the products you want to calculate are going to be 0 from that point onward.
Side note:
You can save yourself some of the complexity of the above queries by changing how you store the changes.
Storing 0.02 to represent 2% removes the multiplications/divisions by 100.
Storing 0.0198026272961797 (LN(1 + 0.02)) removes the need to call for a logarithm in your query.
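For instance, with a hypothetical column ln_change holding LN(1 + percent_change / 100), the cumulative value reduces to:

SELECT date,
       100 * EXP(SUM(ln_change) OVER (ORDER BY date)) AS cumulative_value
FROM T;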
I assume that the date in the 3rd row is 2022/12/03. Otherwise you need to add an id or some other column to order percent changes that occurred on the same day.
Solution
To calculate the value after percent_change, you need to multiply your current value by (100 + percent_change) / 100.
For day n, the cumulative value is 100 multiplied by the product of the coefficients (100 + percent_change) / 100 up to day n.
In PostgreSQL, "up to day n" can be implemented with window functions.
Since there is no aggregate function for multiplication, let's create one.
CREATE AGGREGATE PRODUCT(DOUBLE PRECISION) (
SFUNC = float8mul,
STYPE = FLOAT8
);
The final query will look like this:
SELECT
date,
percent_change,
100 * product((100 + percent_change)::float / 100) OVER (ORDER BY date) cumulative_value
FROM performance;
I need to get the difference in minutes, excluding weekends (Saturday, Sunday), between 2 timestamps in Postgres, but I'm not getting the expected result.
Examples:
Get diff in minutes; however, weekends are included:
SELECT EXTRACT(EPOCH FROM (NOW() - '2021-08-01 08:00:00') / 60)::BIGINT as diff_in_minutes;
$ diff_in_minutes = 17566
Get diff in weekdays, excluding Saturday and Sunday:
SELECT COUNT(*) as diff_in_days
FROM generate_series('2021-08-01 08:00:00', NOW(), interval '1d') d
WHERE extract(isodow FROM d) < 6;
$ diff_in_days = 10
Expected:
From '2021-08-12 08:00:00' to '2021-08-13 08:00:00' = 1440
From '2021-08-13 08:00:00' to '2021-08-16 08:00:00' = 1440
From '2021-08-13 08:00:00' to '2021-08-17 08:00:00' = 2880
and so on ...
The solution is:
SELECT GREATEST(COUNT(*) - 1, 0)
FROM generate_series(from_ts, to_ts, interval'1 minute') AS x
WHERE extract(isodow FROM x) <= 5
so
SELECT GREATEST(COUNT(*) - 1, 0)
FROM generate_series('2021-08-13 08:00:00'::timestamp, '2021-08-17 08:00:00', '1 minute') AS x
WHERE extract(isodow FROM x) <= 5
returns 2880
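If this is needed in several places, the same logic can be wrapped in a function; this is just a sketch, and weekday_minutes is an illustrative name:

CREATE OR REPLACE FUNCTION weekday_minutes(from_ts timestamp, to_ts timestamp)
RETURNS bigint AS
$$
  SELECT GREATEST(COUNT(*) - 1, 0)
  FROM generate_series(from_ts, to_ts, interval '1 minute') AS x
  WHERE extract(isodow FROM x) <= 5;
$$ LANGUAGE sql IMMUTABLE;

SELECT weekday_minutes('2021-08-13 08:00:00', '2021-08-17 08:00:00');  -- 2880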
This is not an optimal solution, but I will leave finding the optimal one as homework for you.
First, create an SQL function
CREATE OR REPLACE FUNCTION public.time_overlap (
b_1 timestamptz,
e_1 timestamptz,
b_2 timestamptz,
e_2 timestamptz
)
RETURNS interval AS
$body$
SELECT GREATEST(interval '0 second',e_1 - b_1 - GREATEST(interval '0 second',e_1 - e_2) - GREATEST(interval '0 second',b_2 - b_1));
$body$
LANGUAGE 'sql'
IMMUTABLE
RETURNS NULL ON NULL INPUT
SECURITY INVOKER
PARALLEL SAFE
COST 100;
Then, call it like this:
WITH frame AS (SELECT generate_series('2021-08-13 00:00:00', '2021-08-17 23:59:59', interval '1d') AS d)
SELECT SUM(EXTRACT(epoch FROM time_overlap('2021-08-13 08:00:00', '2021-08-17 08:00:00',d,d + interval '1 day'))/60) AS total
FROM frame
WHERE extract(isodow FROM d) < 6
In the CTE you should round down the left/earlier of the 2 timestamps and round up the right/later of the 2 timestamps. The idea is that you should generate the series over whole days - not in the middle of the day.
When calling the time_overlap function you should use the exact values of your 2 timestamps so that it properly calculates the overlapping in minutes between each day of the generated series and the given timeframe between your 2 timestamps.
In the end, when you sum over all the overlaps, you will get the total number of minutes excluding the weekends.
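Putting that advice together, here is a sketch in which the frame bounds are derived from the two timestamps themselves (b and e stand in for your actual from/to values):

WITH params AS (
  SELECT '2021-08-13 08:00:00'::timestamptz AS b,
         '2021-08-17 08:00:00'::timestamptz AS e
), frame AS (
  SELECT generate_series(date_trunc('day', b),
                         date_trunc('day', e) + interval '1 day',
                         interval '1 day') AS d
  FROM params
)
SELECT SUM(EXTRACT(epoch FROM time_overlap(b, e, d, d + interval '1 day')) / 60) AS total
FROM frame, params
WHERE extract(isodow FROM d) < 6;
-- total = 2880 for this pair of timestamps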
I'd like to calculate an average amount of money spent per day for the past 7, 30, 90 and 180 days. I know how to do it using PL/pgSQL, but I'd prefer to do it in one query if possible. Something like this:
SELECT SUM(amount)/days
FROM transactions
WHERE created_at > CURRENT_DATE - ((days || ' day')::INTERVAL)
AND days = ANY(ARRAY[7,30,90,180]);
ERROR: column "days" does not exist
You can use unnest to convert the array into a table and then use a correlated subquery to calculate the average:
SELECT
days,
(SELECT SUM(amount)/days
FROM transactions
WHERE created_at > CURRENT_DATE - ((days || ' day')::INTERVAL)
) AS average
FROM unnest(ARRAY[7,30,90,180]) t(days)
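If you prefer an explicit join, the same thing can be written with LATERAL; this is only a sketch and assumes the same transactions(amount, created_at) columns:

SELECT d.days, t.average
FROM unnest(ARRAY[7,30,90,180]) AS d(days)
LEFT JOIN LATERAL (
  SELECT SUM(amount) / d.days AS average
  FROM transactions
  WHERE created_at > CURRENT_DATE - (d.days || ' day')::interval
) AS t ON true;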
Is there a way to calculate a weighted moving average with a fixed window size in Amazon Redshift? In more detail, given a table with a date column and a value column, for each date compute the weighted average value over a window of a specified size, with weights specified in an auxiliary table.
My search attempts so far yielded plenty of examples for doing this with window functions for simple average (without weights), for example here. There are also some related suggestions for postgres, e.g., this SO question, however Redshift's feature set is quite sparse compared with postgres and it doesn't support many of the advanced features that are suggested.
Assuming we have the following tables:
create temporary table _data (ref_date date, value int);
insert into _data values
('2016-01-01', 34)
, ('2016-01-02', 12)
, ('2016-01-03', 25)
, ('2016-01-04', 17)
, ('2016-01-05', 22)
;
create temporary table _weight (days_in_past int, weight int);
insert into _weight values
(0, 4)
, (1, 2)
, (2, 1)
;
Then, if we want to calculate a moving average over a window of three days (including the current date), where values closer to the current date are assigned a higher weight than those further in the past, we'd expect the weighted average for 2016-01-05 (based on the values from 2016-01-05, 2016-01-04 and 2016-01-03) to be:
(22*4 + 17*2 + 25*1) / (4+2+1) = 147 / 7 = 21
And the query could look as follows:
with _prepare_window as (
select
t1.ref_date
, datediff(day, t2.ref_date, t1.ref_date) as days_in_past
, t2.value * weight as weighted_value
, weight
, count(t2.ref_date) over(partition by t1.ref_date rows between unbounded preceding and unbounded following) as num_values_in_window
from
_data t1
left join
_data t2 on datediff(day, t2.ref_date, t1.ref_date) between 0 and 2
left join
_weight on datediff(day, t2.ref_date, t1.ref_date) = days_in_past
order by
t1.ref_date
, datediff(day, t2.ref_date, t1.ref_date)
)
select
ref_date
, round(sum(weighted_value)::float/sum(weight), 0) as weighted_average
from
_prepare_window
where
num_values_in_window = 3
group by
ref_date
order by
ref_date
;
Giving the result:
ref_date | weighted_average
------------+------------------
2016-01-03 | 23
2016-01-04 | 19
2016-01-05 | 21
(3 rows)
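As a cross-check against the expected-value calculation above: for 2016-01-03 the window covers 2016-01-03, 2016-01-02 and 2016-01-01, giving (25*4 + 12*2 + 34*1) / (4+2+1) = 158 / 7 ≈ 22.6, which rounds to 23; for 2016-01-04 it is (17*4 + 25*2 + 12*1) / 7 = 130 / 7 ≈ 18.6, which rounds to 19.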
We have some tables, which have a structure like:
start, -- datetime
end, -- datetime
cost -- decimal
So, for example, there might be rows like:
01/01/2010 10:08am, 01/01/2010 1:56pm, 135.00
01/01/2010 11:01am, 01/01/2010 3:22pm, 118.00
01/01/2010 06:19pm, 01/02/2010 1:43am, 167.00
Etc...
I'd like to get this into a format (with a function?) that returns data in a format like:
10:00am, 10:15am, X, Y, Z
10:15am, 10:30am, X, Y, Z
10:30am, 10:45am, X, Y, Z
10:45am, 11:00am, X, Y, Z
11:00am, 11:15am, X, Y, Z
....
Where:
X = the number of rows that match
Y = the cost / expense for that chunk of time
Z = the total amount of time during this duration
IE, for the above data, we might have:
10:00am, 10:15am, 1, (135/228 minutes*7), 7
The first row starts at 10:08am, so only 7 minutes are used from 10:00-10:15.
There are 228 minutes in the start->end time.
....
11:00am, 11:15am, 2, ((135+118)/(228+261) minutes*(15+14)), 29
The second row starts right after 11:00am, so we need 15 minutes from the first row, plus 14 minutes from the second row
There are 261 minutes in the second start->end time
....
I believe I've done the math right here, but need to figure out how to make this into a PG function, so that it can be used within a report.
Ideally, I'd like to be able to call the function with some arbitrary duration, ie 15minute, or 30minute, or 60minute, and have it split up based on that.
Any ideas?
Here is my try. Given this table definition:
CREATE TABLE interval_test
(
"start" timestamp without time zone,
"end" timestamp without time zone,
"cost" integer
)
This query seems to do what you want. Not sure if it is the best solution, though.
Also note that it needs Postgres 8.4 to work, because it uses WINDOW functions and WITH queries.
WITH RECURSIVE intervals(period_start) AS (
SELECT
date_trunc('hour', MIN(start)) AS period_start
FROM interval_test
UNION ALL
SELECT intervals.period_start + INTERVAL '15 MINUTES'
FROM intervals
WHERE (intervals.period_start + INTERVAL '15 MINUTES') < (SELECT MAX("end") FROM interval_test)
)
SELECT DISTINCT period_start, intervals.period_start + INTERVAL '15 MINUTES' AS period_end,
COUNT(*) OVER (PARTITION BY period_start ) AS record_count,
SUM (LEAST(period_start + INTERVAL '15 MINUTES', "end")::timestamp - GREATEST(period_start, "start")::timestamp)
OVER (PARTITION BY period_start ) AS total_time,
(SUM(cost) OVER (PARTITION BY period_start ) /
(EXTRACT(EPOCH FROM SUM("end" - "start") OVER (PARTITION BY period_start )) / 60)) *
((EXTRACT (EPOCH FROM SUM (LEAST(period_start + INTERVAL '15 MINUTES', "end")::timestamp - GREATEST(period_start, "start")::timestamp)
OVER (PARTITION BY period_start )))/60)
AS expense
FROM interval_test
INNER JOIN intervals ON (intervals.period_start, intervals.period_start + INTERVAL '15 MINUTES') OVERLAPS (interval_test.start, interval_test.end)
ORDER BY period_start ASC
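To try the query out, the sample rows from the question can be loaded first (timestamps rewritten in ISO form):

INSERT INTO interval_test ("start", "end", "cost") VALUES
  ('2010-01-01 10:08', '2010-01-01 13:56', 135),
  ('2010-01-01 11:01', '2010-01-01 15:22', 118),
  ('2010-01-01 18:19', '2010-01-02 01:43', 167);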