Cumulative sum with TimescaleDB continuous aggregate view - postgresql

Let's say I have a continuous aggregate view that tracks the warehouse inventory change daily. The example below is not real, but I tried to simplify it for the purpose of the question.
CREATE MATERIALIZED VIEW inventory_daily
WITH (timescaledb.continuous) AS
SELECT item,
       time_bucket(INTERVAL '1 day', time) AS bucket,
       SUM(item_delta) AS daily_change
FROM conditions
GROUP BY item, bucket;
This gives you daily inventory changes nicely. But what would be the best, or most CPU-efficient, way to get the cumulative sum of all inventory changes over the whole lifespan of an item? Summing all changes together should give you, for each day, the count of how many items are left in the inventory, for double-entry accounting.
Can the cumulative sum be done in a continuous aggregate view, or is there a better way to do a breakdown of inventory totals, assuming you have just the change as an input? What I hope to accomplish is:
-- Don't know how to do CUMULATIVE SUM
CREATE MATERIALIZED VIEW inventory_daily
WITH (timescaledb.continuous) AS
SELECT item,
       time_bucket(INTERVAL '1 day', time) AS bucket,
       SUM(item_delta) AS daily_change,
       CUMULATIVE_SUM(item_total) AS total_at_end_of_day
FROM conditions
GROUP BY item, bucket;
If this cannot be done in a continuous aggregate view (as it looks, based on the comments), what would be the next best option? Manually calculating values for each day?
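One common workaround (a sketch, not a TimescaleDB-specific feature) is to keep the continuous aggregate for the daily deltas and compute the running total at query time with an ordinary window function over the much smaller aggregated data:

```sql
-- Running total per item, computed over the daily continuous aggregate.
-- inventory_daily is the view defined above.
SELECT item,
       bucket,
       daily_change,
       SUM(daily_change) OVER (
           PARTITION BY item
           ORDER BY bucket
       ) AS total_at_end_of_day
FROM inventory_daily
ORDER BY item, bucket;
```

Because the window function runs over one row per item per day rather than over the raw hypertable, this is usually cheap; if it still proves too expensive, the running totals could be materialized into a regular table by a scheduled job.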


two different selects in grafana from influxdb

I would like to calculate the savings from my solar power roof system.
So I have a select of my values:
SELECT (sum("Verbrauch")/60 - sum("Bezug")/60) * $Strompreis_Maingau /1000 FROM "Meter" WHERE $timeFilter GROUP BY time(1d) fill(null)
I multiply the values with the variable "Strompreis_Maingau", where the cost/kWh is configured.
Now I have changed my power company and have a different price per kWh.
I would like to display the daily values up to 15.12.2021 with the "old" variable and from 16.12.2021 on with a new one.
When I change the select to:
SELECT (sum("Verbrauch")/60 - sum("Bezug")/60) * $Strompreis_Maingau /1000 FROM "Meter" WHERE time < '2021-12-16' GROUP BY time(1d) fill(null)
then I see the values only up to this date, but I would like to combine it with an additional select using the new variable from 16.12.2021 on.
Is this possible in some way?
Thank you!
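The split described above can be sketched as two panel queries in Grafana, each restricted to its own date range (the second variable name, $Strompreis_neu, is hypothetical; the original dashboard only defines $Strompreis_Maingau):

```sql
-- Query A: old price, days before the switch
SELECT (sum("Verbrauch")/60 - sum("Bezug")/60) * $Strompreis_Maingau /1000
FROM "Meter"
WHERE $timeFilter AND time < '2021-12-16'
GROUP BY time(1d) fill(null)

-- Query B: new price (hypothetical variable), days from the switch onward
SELECT (sum("Verbrauch")/60 - sum("Bezug")/60) * $Strompreis_neu /1000
FROM "Meter"
WHERE $timeFilter AND time >= '2021-12-16'
GROUP BY time(1d) fill(null)
```

Grafana draws both series in the same panel; keeping $timeFilter in both WHERE clauses preserves dashboard time-range zooming.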
I have the same problem; the only way to calculate with the right costs is to put the cost/kWh into the database. Every time it changes, you add another data point at the right time, and the other values and the sum are then calculated correctly. If you want, you can also display a graph with the development of your electricity price.

PostgreSQL: How to do a SELECT clause with a condition iterating through a range of values?

Hi everyone. This is my first post on Stack Overflow, so sorry if it is clumsy in any way.
I work in Python and make PostgreSQL-style requests to a Google BigQuery database. The data structure looks like this:
[image: sample of data]
where time is represented in nanoseconds, and is not regularly spaced (it is captured real-time).
What I want to do is to select, say, the mean price over a minute, for each minute in a time range that I would like to give as a parameter.
This time range is currently a list of timestamps that I build externally, and I make sure they are separated by one minute each :
[1606170420000000000, 1606170360000000000, 1606170300000000000, 1606170240000000000, 1606170180000000000, ...]
My question is : how can I extract this list of mean prices given that list of time intervals ?
Ideally I'd expect something like
SELECT AVG(price) OVER( PARTITION BY (time BETWEEN time_intervals[i] AND time_intervals[i+1] for i in range(len(time_intervals))) )
FROM table_name
but I know that doesn't make sense...
My temporary solution is to aggregate many SELECT ... UNION DISTINCT clauses, one for each minute interval. But as you can imagine, this is not very efficient... (I need up to 60*24 = 1440 samples)
Now there very well may already be an answer to that question, but since I'm not even sure about how to formulate it, I found nothing yet. Every link and/or tip would be of great help.
Many thanks in advance.
First of all, your sample data appears to be at nanosecond resolution, and you are looking for averages at minute (sixty-second) resolution.
Please try this:
select div(time, 60000000000) as minute,
       pair,
       avg(price) as avg_price
from your_table
group by minute, pair;
If you want to control the intervals as you said in your comment, then please try something like this (I do not have access to BigQuery):
with time_ivals as (
select tick,
lead(tick) over (order by tick) as next_tick
from unnest(
[1606170420000000000, 1606170360000000000,
1606170300000000000, 1606170240000000000,
1606170180000000000, ...]) as tick
)
select t.tick, y.pair, avg(y.price) as avg_price
from time_ivals t
join your_table y
on y.time >= t.tick
and y.time < t.next_tick
group by t.tick, y.pair;
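If the minute boundaries are regular, the externally built list can be generated inside the query instead; in BigQuery Standard SQL something like this should work (the start/end ticks are the ones from the question, taken as assumptions):

```sql
with time_ivals as (
  select tick,
         tick + 60000000000 as next_tick  -- each interval is exactly one minute
  from unnest(generate_array(1606170180000000000,
                             1606170420000000000,
                             60000000000)) as tick  -- one tick per minute, in nanoseconds
)
select t.tick, y.pair, avg(y.price) as avg_price
from time_ivals t
join your_table y
  on y.time >= t.tick
 and y.time < t.next_tick
group by t.tick, y.pair;
```

This avoids shipping up to 1440 timestamps from the client per day, and the fixed step also makes the lead() window unnecessary.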

Aggregate on day, month and week level

I am making a dashboard with clicks at a daily level and a monthly level for a certain campaign.
If I have, say, 1 customer clicking on 2 days, then at the daily level that customer is counted twice. However, when I look at the aggregate monthly level, this person should be counted once.
My SQL code that I am pulling into Tableau is at a daily level. How do I get a monthly-level view in the dashboard? When I create a parameter with month and day, selecting month just adds up the day-level numbers to give me the month.
Any advice?
Sounds like a count distinct thing. To get around this, the COUNTD would need to happen in Tableau. That means you need the Contact ID (or whatever it is you want to count) within the data source. Obviously the data source becomes much bigger, but it is the only way to get an accurate unique count over a custom time period.
Another alternative is to restrict the available time periods for the user and pre-aggregate for those time periods.
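The pre-aggregation alternative can be sketched in SQL (the table and column names clicks, contact_id, and click_date are hypothetical). The key point is that each period's distinct count is computed independently, so the daily numbers are never summed into the monthly one:

```sql
-- Daily unique clickers
SELECT click_date                  AS period,
       'day'                       AS grain,
       COUNT(DISTINCT contact_id)  AS unique_clickers
FROM clicks
GROUP BY click_date

UNION ALL

-- Monthly unique clickers (NOT a sum of the daily rows above)
SELECT DATE_TRUNC('month', click_date) AS period,
       'month'                         AS grain,
       COUNT(DISTINCT contact_id)      AS unique_clickers
FROM clicks
GROUP BY DATE_TRUNC('month', click_date);
```

The dashboard parameter then filters on grain instead of re-aggregating day-level numbers.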

How to create continuous views for leaderboards?

I have a set of events coming in with the structure player_id, score, timestamp. I want to create cycle-based leaderboards on this so that I can see daily, weekly, monthly and yearly player leaderboards. What kind of aggregations should I use? Could I use ordered-set aggregates with rank? And is it possible to also see/store past/historical leaderboards, so that I could also see last month's leaderboard?
You can use fss_agg_weighted to build filtered-space-saving top-ks, and then extract the top-k player scores by calling fss_topk on the column built by fss_agg_weighted. For example, to continuously compute the daily top 10 player scores (reading from an input stream, here called event_stream for illustration):
CREATE CONTINUOUS VIEW daily_top_scores AS
SELECT day(timestamp), fss_agg_weighted(player_id, 10, score)
FROM event_stream
GROUP BY day;
And to extract the top-10 at a given point in time,
SELECT day, fss_topk(fss_agg_weighted) FROM daily_top_scores;
You can also combine the top-k results over wider date ranges without losing any information. To compute the top-10 scores over the entire history of the continuous view:
SELECT fss_topk(combine(fss_agg_weighted)) FROM daily_top_scores;
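Since combine can merge the per-day summaries over any subset of rows, the weekly, monthly and yearly leaderboards the question asks about can all be read from the same daily view by restricting the date range. A sketch following the same pattern:

```sql
-- Top-10 over the last 7 days, merged from the daily summaries
SELECT fss_topk(combine(fss_agg_weighted))
FROM daily_top_scores
WHERE day >= now() - interval '7 days';

-- Last calendar month's leaderboard; historical boards stay queryable
-- because each day's summary is stored as its own row
SELECT fss_topk(combine(fss_agg_weighted))
FROM daily_top_scores
WHERE day >= date_trunc('month', now() - interval '1 month')
  AND day <  date_trunc('month', now());
```

This also answers the historical-leaderboard question: nothing is overwritten, so last month's top-k can be recomputed at any time from the stored daily rows.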

MS Access 03 Query Criteria

If I have a report that tracks data for several accounts for each month with rows labeled:
UNITS,
REVENUE,
AVG REV/UNIT
How would I create a query that filters the report to just show accounts where the UNITS row has increased/decreased 25% and the AVG REV/UNIT has increased/decreased 10% from the previous month to the current month?
An example would be for the month of June I have the numbers....
JUN
UNITS 3,271
Revenue $3,598.10
Avg R/U $1.08
So when I run the report at the end of July I only want accounts that have a 25% difference in UNITS and/or a 10% difference in AVG REV/UNIT to show on a report.
qryPharmacy
SELECT PHAR_REPORT.*,
       (IIf(u1 Is Null,0,u1)+IIf(u2 Is Null,0,u2)+IIf(u3 Is Null,0,u3)
        +IIf(u4 Is Null,0,u4)+IIf(u5 Is Null,0,u5)+IIf(u6 Is Null,0,u6)
        +IIf(u7 Is Null,0,u7)+IIf(u8 Is Null,0,u8)+IIf(u9 Is Null,0,u9)
        +IIf(u10 Is Null,0,u10)+IIf(u11 Is Null,0,u11)+IIf(u12 Is Null,0,u12)) AS USUM,
       (IIf(r1 Is Null,0,r1)+IIf(r2 Is Null,0,r2)+IIf(r3 Is Null,0,r3)
        +IIf(r4 Is Null,0,r4)+IIf(r5 Is Null,0,r5)+IIf(r6 Is Null,0,r6)
        +IIf(r7 Is Null,0,r7)+IIf(r8 Is Null,0,r8)+IIf(r9 Is Null,0,r9)
        +IIf(r10 Is Null,0,r10)+IIf(r11 Is Null,0,r11)+IIf(r12 Is Null,0,r12)) AS RSUM,
       RMonth.*, PG2.*, PG.pGroup
FROM PHAR_REPORT, RMonth, PG2, PG
WHERE ((PHAR_REPORT.PR) Like ([PCODE] & '*'))
  AND ((PG.pID)=PG2.PID)
ORDER BY PG2.pID, PHAR_REPORT.PR;
You should do it with more than one query. In the first query, select the data for the first month. In a second, the desired month to compare against. Create a third query that links the first two (be careful about the correct relationship). Do the grouping/calculations in these queries.
In the third query, create two fields that calculate the increase/decrease for units and rev/unit. Now you can add a criterion on each of those fields in the query columns.
The challenge here is to be sure about how you would work with the primary keys across months. E.g., if a row in the first query isn't in the second (because it had no event in the second month, for example), it will not be shown. In this case, the solution would be to create the queries linking a table or query which has the entire set of records, forcing it to show all the desired records whether or not they have occurrences.
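The third query described above might look something like this (Access SQL; qryPrevMonth, qryCurrMonth and their Account/UNITS/AvgRU fields are hypothetical names for the first two queries):

```sql
SELECT cur.Account,
       (cur.UNITS - prev.UNITS) / prev.UNITS AS UnitsChange,
       (cur.AvgRU - prev.AvgRU) / prev.AvgRU AS AvgRUChange
FROM qryCurrMonth AS cur
INNER JOIN qryPrevMonth AS prev
        ON cur.Account = prev.Account
WHERE Abs((cur.UNITS - prev.UNITS) / prev.UNITS) >= 0.25
   OR Abs((cur.AvgRU - prev.AvgRU) / prev.AvgRU) >= 0.10;
```

Swapping the INNER JOIN for a LEFT JOIN against a full account list would address the caveat in the last paragraph about accounts with no activity in one of the months.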