I have the following table:
ganado_id | created    | weight
----------+------------+-------
1         | 2018-12-24 | 285
2         | 2018-12-24 | 288
2         | 2018-10-13 | 241
1         | 2018-10-13 | 244
1         | 2018-08-11 | 202
I need to calculate the average weight gain for each ganado_id. Desired output:
ganado_id | avg_weight_gain
----------+----------------
1         | 0.618
2         | 0.652
The average weight gain for ganado_id = 1 is calculated this way:
SELECT ((285 - 244)::NUMERIC / ('2018-12-24'::DATE - '2018-10-13'::DATE)::NUMERIC + (244 - 202)::NUMERIC / ('2018-10-13'::DATE - '2018-08-11'::DATE)::NUMERIC) / 2
The average weight gain for ganado_id = 2 is calculated this way:
SELECT (288 - 241)::NUMERIC / ('2018-12-24'::DATE - '2018-10-13'::DATE)::NUMERIC
In production, there can be 1 to 15 weight records (as in the first table) for each ganado_id.
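For reference, a minimal sketch to reproduce the sample data locally (the table name table1 matches the queries below):

create temp table table1 (ganado_id int, created date, weight int);
insert into table1 values
    (1, '2018-12-24', 285),
    (2, '2018-12-24', 288),
    (2, '2018-10-13', 241),
    (1, '2018-10-13', 244),
    (1, '2018-08-11', 202);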
Try using the lag window function to get both the weight and the date from the previous record. You can then sum the two deltas (gain since the previous record, number of days since the previous record) and divide to get the average:
with gains as (
select
ganado_id, weight, created,
weight - lag (weight) over (partition by ganado_id order by created) as gain,
created - lag (created) over (partition by ganado_id order by created) as days
from table1
)
select
ganado_id, sum (gain) * 1.0 / sum (days) as avg_gain
from gains
group by
ganado_id
-- EDIT --
Per your feedback, this would be the average of the averages:
with gains as (
select
ganado_id, weight, created,
1.0 * (weight - lag (weight) over (partition by ganado_id order by created)) /
(created - lag (created) over (partition by ganado_id order by created)) as gain_per_day
from table1
)
select
ganado_id, avg (gain_per_day)
from gains
group by
ganado_id
Results:
1 0.61805555555555555556
2 0.65277777777777777778
I have a table table1 which contains the details of each depositor:
Depositor | Deposit_Amount | Deposit_Date | Maturity_Date | Tenure | Rate
----------+----------------+--------------+---------------+--------+-----
A         | 25000          | 2021-08-10   | 2022-08-10    | 12     | 10%
I have another table table2 which contains the interest due dates:
Interest_Due_Date
2021-09-30
2021-12-31
2022-03-31
2022-06-30
2022-08-10
My Code is:
with recursive recur (n, start_bal, days, principle, interest, end_bal) as
(
    select sno, deposit_amount, rate, days,
           deposit_amount * (((rate::decimal(18,2))/100)/365) * days as interest,
           deposit_amount + (deposit_amount * (((rate::decimal(18,2))/100)/365) * days) as end_bal
    from (
        select sno,
               coalesce(date_part('day', deposit_date::timestamp - lag(deposit_date::timestamp) over
                   (order by sno asc rows between unbounded preceding and current row)), 0) as days,
               deposit_date, deposit_amount, rate
        from (
            select row_number() over (order by deposit_date) as sno,
                   deposit_date,
                   deposit_amount,
                   rate
            from (
                select t1.deposit_date, t1.deposit_amount, t1.rate
                from table1 t1
                union all
                select t2.Interest_Due_Date as idate, 0 as depo_amount, 0 as rate
                from table2 t2
                order by deposit_date
            ) dep
        ) calc
    ) b
    where sno = 1
    union all
    select b.sno, b.end_bal, b.days, b.prin_bal,
           (coalesce(a.end_bal, 0)) * (((b.rate)/100)/365) * b.days as interest_NEW,
           coalesce(a.end_bal, 0) + ((a.end_bal) * (((calc.rate)/100)/365) * calc.days) as end_bal_NEW
    from b, recur as a
    where calc.sno = a.n + 1
)
select * from recur
"Every time when i try to execute the query its showing an error 'relation 'b' does not exist"
...
The result table should be:
Deposit Amount | Date       | Days | Interest | Total Amount
---------------+------------+------+----------+-------------
25000          | 2021-08-10 | 0    | 0        | 25000
0              | 2021-09-30 | 51   | 349.32   | 25349.32
0              | 2021-12-31 | 92   | 638.94   | 25988.26
0              | 2022-03-31 | 90   | 640.81   | 26629.06
0              | 2022-06-30 | 91   | 663.90   | 27292.97
0              | 2022-08-10 | 41   | 306.58   | 27599.54
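As a sanity check on those expected figures (this is not the recursive query itself, just a verification sketch assuming simple day-count interest on the running balance):

-- first period: 51 days from 2021-08-10 to 2021-09-30 at 10%/365 per day
-- second period: 92 days from 2021-09-30 to 2021-12-31, on the new balance
select 25000 * (10.0/100/365) * 51 as period1_interest,              -- ≈ 349.32
       (25000 + 349.32) * (10.0/100/365) * 92 as period2_interest;   -- ≈ 638.94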
I have an orders table with the datetime when an order was placed and when it was completed:
orderid | userid | price | status    | createdat        | doneat
--------+--------+-------+-----------+------------------+-----------------
1       | 128    | 100   | completed | 2/16/21 18:40:45 | 2/21/21 07:59:46
2       | 128    | 150   | completed | 2/21/21 05:27:29 | 2/23/21 11:58:23
3       | 128    | 100   | completed | 9/3/21 08:38:14  | 9/10/21 14:24:35
4       | 5      | 100   | completed | 5/28/22 23:28:07 | 6/26/22 06:10:35
5       | 5      | 100   | canceled  | 7/8/22 22:28:57  | 8/10/22 06:55:17
6       | 5      | 100   | completed | 7/25/22 13:46:38 | 8/10/22 06:57:20
7       | 5      | 5     | completed | 8/7/22 18:07:07  | 8/12/22 06:56:23
I would like to have a new column that is the cumulative total (sum price) per user when the order was created:
orderid | userid | price | status    | createdat        | doneat           | cumulative total when placed (per user)
--------+--------+-------+-----------+------------------+------------------+-----------------------------------------
1       | 128    | 100   | completed | 2/16/21 18:40:45 | 2/21/21 07:59:46 | 0
2       | 128    | 150   | completed | 2/21/21 05:27:29 | 2/23/21 11:58:23 | 0
3       | 128    | 100   | completed | 9/3/21 08:38:14  | 9/10/21 14:24:35 | 250
4       | 5      | 100   | completed | 5/28/22 23:28:07 | 6/26/22 06:10:35 | 0
5       | 5      | 100   | canceled  | 7/8/22 22:28:57  | 8/10/22 06:55:17 | 100
6       | 5      | 100   | completed | 7/25/22 13:46:38 | 8/10/22 06:57:20 | 100
7       | 5      | 5     | completed | 8/7/22 18:07:07  | 8/12/22 06:56:23 | 100
The logic is sum the price for each user for all orders that were completed before the current row's created at date. For orderid=2, although it's the user's 2nd order, there are no orders that were completed before its createdat datetime of 2/21/21 05:27:29, so the cumulative total when placed is 0.
The same for orderid in [5,6,7]. For those orders and that userid, the only order that was completed before their createdat dates is order 4, so their cumulative total when placed is 100.
In PowerBI the logic is like this:
SUMX (
    FILTER (
        orders,
        EARLIER (orders.userid) = orders.userid
            && orders.doneat < orders.createdat
            && orders.status = 'completed'
    ),
    orders.price
)
Would anyone have any hints on how to achieve this in PostgreSQL?
I tried something like this and it didn't work.
select (case when o.doneat < o.createdat over (partition by o.userid, o.status order by o.createdat)
then sum(o.price) over (partition by o.userid, o.status ORDER BY o.doneat asc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
end) as cumulativetotal_whenplaced
from order o
Thank you
You can duplicate each row into:
- an "original" (which we'll decorate with a flag keep = true), that has an accounting value val = 0 (so far) and a time t = createdat;
- a "duplicate" (keep = false), that has the price to account for (if status is 'completed') as val and a time t = doneat.
Then it's just a matter of accounting for the right bits:
select orderid, userid, price, status, createdat, doneat, cumtot
from (
select *, sum(val) over (partition by userid order by t, keep desc) as cumtot
from (
select *, createdat as t, 0 as val, true as keep from foo
union all
select *, doneat as t,
case when status = 'completed' then price else 0 end as val,
false as keep
from foo
) as a
) as a
where keep
order by orderid;
Example: DB Fiddle.
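To try it locally without the fiddle, a minimal setup sketch (the table name foo matches the query above; data taken from the question):

create temp table foo (orderid int, userid int, price int, status text, createdat timestamp, doneat timestamp);
insert into foo values
    (1, 128, 100, 'completed', '2021-02-16 18:40:45', '2021-02-21 07:59:46'),
    (2, 128, 150, 'completed', '2021-02-21 05:27:29', '2021-02-23 11:58:23'),
    (3, 128, 100, 'completed', '2021-09-03 08:38:14', '2021-09-10 14:24:35'),
    (4, 5,   100, 'completed', '2022-05-28 23:28:07', '2022-06-26 06:10:35'),
    (5, 5,   100, 'canceled',  '2022-07-08 22:28:57', '2022-08-10 06:55:17'),
    (6, 5,   100, 'completed', '2022-07-25 13:46:38', '2022-08-10 06:57:20'),
    (7, 5,   5,   'completed', '2022-08-07 18:07:07', '2022-08-12 06:56:23');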
Note for Redshift: the window expression above needs to be replaced by:
...
select *, sum(val) over (
partition by userid order by t, keep desc
rows unbounded preceding) as cumtot
...
Result for your data:
orderid | userid | price | status    | createdat                | doneat                   | cumtot
--------+--------+-------+-----------+--------------------------+--------------------------+-------
1       | 128    | 100   | completed | 2021-02-16T18:40:45.000Z | 2021-02-21T07:59:46.000Z | 0
2       | 128    | 150   | completed | 2021-02-21T05:27:29.000Z | 2021-02-23T11:58:23.000Z | 0
3       | 128    | 100   | completed | 2021-09-03T08:38:14.000Z | 2021-09-10T14:24:35.000Z | 250
4       | 5      | 100   | completed | 2022-05-28T23:28:07.000Z | 2022-06-26T06:10:35.000Z | 0
5       | 5      | 100   | canceled  | 2022-07-08T22:28:57.000Z | 2022-08-10T06:55:17.000Z | 100
6       | 5      | 100   | completed | 2022-07-25T13:46:38.000Z | 2022-08-10T06:57:20.000Z | 100
7       | 5      | 5     | completed | 2022-08-07T18:07:07.000Z | 2022-08-12T06:56:23.000Z | 100
Note: this type of accounting across time is actually robust to many corner cases (various orders overlapping, some starting and finishing while others are still in process, etc.). It is the basis for a fast interval compaction algorithm that I should describe someday on SO.
Bonus: try to figure out why the partitioning window is ordered by t (fairly obvious) and also by keep desc (less obvious).
I'm trying to get this query to work properly...
select salary from agent
where salary > 75000
ORDER BY salary ASC
LIMIT (select ROUND(count(salary) * .75) as TwentyFifthTile from agent)
Some additional information about the rows:
166 rows – 25%
331 rows – 50%
497 rows – 75%
662 rows – 100%
About 235 of the 662 rows have a salary of 75,000 or more (235 / 662 ≈ .35, i.e. roughly 35% of the table).
I'm trying to get the above query to return back all the rows that have salary greater than 75,000 but are still in the first 497 rows. When I run the above query it returns all the rows starting at 75,000 and limited by a 497 row return constraint.
I'm not sure how to return just the salaries greater than 75,000 that fall within the first 497 rows of the limit constraint.
You can divide the current row number by the total number of rows to get this:
select salary
from (
select salary,
count(*) over () as total_count,
row_number() over (order by salary) as rn
from agent
where salary > 75000
) t
where (rn / total_count::numeric) <= 0.75
order by salary asc
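An alternative sketch uses ntile to bucket all rows into quartiles before the salary filter is applied, which matches "the first 497 rows" of the whole table (same assumed agent table):

select salary
from (
    select salary, ntile(4) over (order by salary) as quartile
    from agent
) t
where quartile <= 3     -- keep the first three quartiles, i.e. the first ~75% of rows
  and salary > 75000
order by salary;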
Use row_number. Note that a window function can't be referenced in the WHERE clause of the same query level, so it needs a subquery:
select salary
from (
    select salary, row_number() over (order by salary) as row_num
    from agent
) t
where row_num < (select round(count(salary) * .75) from agent)
  and salary > 75000
Hi, I want to show the result set in ascending order. I have created a SQL Fiddle for the same.
select amount_range as amount_range, count(*) as number_of_items,
sum(amount) as total_amount
from (
select *,case
when amount between 0.00 and 2500.00 then '<=$2,500.00'
when amount between 2500.01 and 5000.00 then '$2,500.01 - $5,000.00'
when amount between 5000.01 and 7500.00 then '$5,000.01 - $7,500.00'
when amount between 7500.01 and 10000.00 then '$7,500.01 - $10,000.00'
else '>$10,000.01' end as amount_range
from Sales ) a
group by amount_range order by amount_range;
My Results should be like
<=$2,500.00 4 5000
$2,500.01 - $5,000.00 3 12000
$5,000.01 - $7,500.00 2 13000
$7,500.01 - $10,000.00 1 10000
>$10,000.01 1 15000
The easiest method is to sort by a value derived from each group, for example the minimum amount:
select amount_range as amount_range,
count(*) as number_of_items,
sum(amount) as total_amount
from (
select *,case
when amount between 0.00 and 2500.00 then '<=$2,500.00'
when amount between 2500.01 and 5000.00 then '$2,500.01 - $5,000.00'
when amount between 5000.01 and 7500.00 then '$5,000.01 - $7,500.00'
when amount between 7500.01 and 10000.00 then '$7,500.01 - $10,000.00'
else '>$10,000.01' end as amount_range
from Sales ) a
group by amount_range
order by min(amount);
In Postgres, your subquery could also return an array where the first element is the desired position and the second is the string describing the bucket. Then, the outer query can ORDER BY your positioning value.
select amount_range[2] as amount_range,
count(*) as number_of_items,
sum(amount) as total_amount
from (
select *,case
when amount between 0.00 and 2500.00 then ARRAY['1','<=$2,500.00']
when amount between 2500.01 and 5000.00 then ARRAY['2','$2,500.01 - $5,000.00']
when amount between 5000.01 and 7500.00 then ARRAY['3', '$5,000.01 - $7,500.00']
when amount between 7500.01 and 10000.00 then ARRAY['4', '$7,500.01 - $10,000.00']
else ARRAY['5','>$10,000.01'] end as amount_range
from Sales ) a
group by amount_range
order by amount_range[1];
The first method happens to be simpler for your example. The second method would be useful if you were bucketing by something more complicated than ranges.
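A third sketch along the same lines: emit the sort position as its own integer column instead of packing it into an array, and group by both (the bucket_order name here is illustrative, not from the original answers):

select amount_range,
       count(*) as number_of_items,
       sum(amount) as total_amount
from (
    select *,
        -- integer sort key, one per bucket
        case when amount between 0.00 and 2500.00 then 1
             when amount between 2500.01 and 5000.00 then 2
             when amount between 5000.01 and 7500.00 then 3
             when amount between 7500.01 and 10000.00 then 4
             else 5 end as bucket_order,
        -- human-readable label
        case when amount between 0.00 and 2500.00 then '<=$2,500.00'
             when amount between 2500.01 and 5000.00 then '$2,500.01 - $5,000.00'
             when amount between 5000.01 and 7500.00 then '$5,000.01 - $7,500.00'
             when amount between 7500.01 and 10000.00 then '$7,500.01 - $10,000.00'
             else '>$10,000.01' end as amount_range
    from Sales ) a
group by amount_range, bucket_order
order by bucket_order;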
Is there a way to calculate a weighted moving average with a fixed window size in Amazon Redshift? In more detail, given a table with a date column and a value column, for each date compute the weighted average value over a window of a specified size, with weights specified in an auxiliary table.
My search attempts so far yielded plenty of examples for doing this with window functions for a simple average (without weights), for example here. There are also some related suggestions for Postgres, e.g., this SO question; however, Redshift's feature set is quite sparse compared with Postgres, and it doesn't support many of the advanced features that are suggested there.
Assuming we have the following tables:
create temporary table _data (ref_date date, value int);
insert into _data values
('2016-01-01', 34)
, ('2016-01-02', 12)
, ('2016-01-03', 25)
, ('2016-01-04', 17)
, ('2016-01-05', 22)
;
create temporary table _weight (days_in_past int, weight int);
insert into _weight values
(0, 4)
, (1, 2)
, (2, 1)
;
Then, if we want to calculate a moving average over a window of three days (including the current date), where values closer to the current date are assigned a higher weight than those further in the past, we'd expect the weighted average for 2016-01-05 (based on the values from 2016-01-05, 2016-01-04 and 2016-01-03) to be:
(22*4 + 17*2 + 25*1) / (4+2+1) = 147 / 7 = 21
And the query could look as follows:
with _prepare_window as (
select
t1.ref_date
, datediff(day, t2.ref_date, t1.ref_date) as days_in_past
, t2.value * weight as weighted_value
, weight
, count(t2.ref_date) over(partition by t1.ref_date rows between unbounded preceding and unbounded following) as num_values_in_window
from
_data t1
left join
_data t2 on datediff(day, t2.ref_date, t1.ref_date) between 0 and 2
left join
_weight on datediff(day, t2.ref_date, t1.ref_date) = days_in_past
order by
t1.ref_date
, datediff(day, t2.ref_date, t1.ref_date)
)
select
ref_date
, round(sum(weighted_value)::float/sum(weight), 0) as weighted_average
from
_prepare_window
where
num_values_in_window = 3
group by
ref_date
order by
ref_date
;
Giving the result:
ref_date | weighted_average
------------+------------------
2016-01-03 | 23
2016-01-04 | 19
2016-01-05 | 21
(3 rows)
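As a quick sanity check, the 2016-01-03 row can be verified the same way as the hand calculation above (using the same rounding expression as the query):

select round((25*4 + 12*2 + 34*1)::float / (4+2+1), 0) as check_2016_01_03;  -- 158 / 7 ≈ 23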