I've got time series data from smart meters in TimescaleDB; the energy value is stored as a counter.
I have 2 questions:
1) How do I calculate the difference between each row of energy values so I can just see the increase minute by minute for each row?
2) I've got this in 1 minute intervals and I'd like to aggregate as 30m, 60m etc. What's the best way of achieving this?
Thanks in advance for any help you can give me.
There are a couple of challenges here. First, you have to make sure that the intervals between your counter index readings are constant (think communication outages, ...). If they are not, you'll have to deal with the resulting energy peaks.
Second, your index will probably look like a discrete sawtooth ("jigsaw") signal, restarting at zero once in a while.
Here's how we did it.
For 2), we create one continuous aggregate on the indexes per required resolution (15 min, 60 min, ...). Use locf where required.
For 1), we do the delta computation on the fly: we query the database for the indexes and then loop through the array to compute the deltas. This way we can easily handle the jigsaw and peaks.
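A minimal sketch of that on-the-fly loop, in Python. The function name, the one-minute expected step and the rule "after a reset the counter restarted from zero" are assumptions for illustration, not something the answer above specifies.

```python
from datetime import datetime, timedelta


def counter_deltas(samples, expected_step=timedelta(minutes=1)):
    """Return (timestamp, delta, regular) for consecutive counter readings.

    samples: list of (timestamp, counter_value), sorted by timestamp.
    regular is False when the gap to the previous reading is not exactly
    expected_step, so the caller can decide how to treat the energy peak.
    """
    result = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(samples, samples[1:]):
        if v_cur >= v_prev:
            delta = v_cur - v_prev
        else:
            # Counter went backwards: assume (hypothetically) that it
            # restarted from zero, so the new reading is the increase.
            delta = v_cur
        regular = (t_cur - t_prev) == expected_step
        result.append((t_cur, delta, regular))
    return result


t0 = datetime(2024, 1, 1, 0, 0)
readings = [
    (t0, 100.0),
    (t0 + timedelta(minutes=1), 101.5),
    (t0 + timedelta(minutes=2), 103.0),
    (t0 + timedelta(minutes=5), 109.0),  # three-minute gap (outage)
    (t0 + timedelta(minutes=6), 2.0),    # counter restarted
]
deltas = counter_deltas(readings)
```

Rows flagged as irregular can then be dropped, spread over the missing minutes, or kept as-is, depending on how you want to treat outages.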
I've just got an answer to my question, which is very similar to your Part 1, here
In short, the answer I got was to use a before_insert trigger and calculate the difference values on insertion, storing them in a new column. This avoids needing to re-calculate deltas on every query.
I extended the function suggested in the Answer by also calculating the delta_time with
NEW.delta_time = EXTRACT (EPOCH FROM NEW.time - previous_time);
This returns the number of seconds which have passed, allowing you to calculate meter power reliably.
For Part 2, consider a continuous aggregate with time buckets, as suggested above.
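To illustrate why storing delta_time helps: average power is just the energy delta divided by the elapsed time. The sketch below assumes the counter is in watt-hours and delta_time is in seconds; both units and the function name are assumptions, so adjust the factor for your meters.

```python
def average_power_watts(delta_energy_wh, delta_time_s):
    """Average power over one interval, from an energy delta and its duration.

    Assumes energy in watt-hours and time in seconds (hypothetical units).
    """
    if delta_time_s <= 0:
        raise ValueError("delta_time_s must be positive")
    # 1 Wh = 3600 Ws, so convert before dividing by seconds.
    return delta_energy_wh * 3600.0 / delta_time_s


one_minute = average_power_watts(1.0, 60.0)   # 1 Wh used in 60 s
two_minutes = average_power_watts(0.5, 120.0)  # 0.5 Wh used in 120 s
```

Because the real elapsed time is used rather than a nominal one minute, a reading that arrives late does not show up as a false power spike.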
I'd like to get the 0.95 percentile of memory usage of my pods over the last x amount of time. However, this query starts to take too long if I use a 'big' range (7 or 10 days).
The query that I'm using right now is:
quantile_over_time(0.95, container_memory_usage_bytes[10d])
It takes around 100 s to complete.
I removed extra namespace filters for brevity.
What steps could I take to make this query more performant (other than making the machine bigger)?
I thought about calculating the 0.95 percentile every x amount of time (let's say 30 min) and label it p95_memory_usage, then use p95_memory_usage in the query instead of container_memory_usage_bytes, so that I can reduce the number of points the query has to go through.
However, would this not distort the values?
As you already observed, aggregating quantiles (over time or otherwise) doesn't really work.
You could try to build a histogram of memory usage over time using recording rules, looking like a "real" Prometheus histogram (consisting of _bucket, _count and _sum metrics) although doing it may be tedious. Something like:
- record: container_memory_usage_bytes_bucket
  labels:
    le: "100000.0"
  expr: |
    # A bucket with upper bound le counts samples less than or equal to le.
    (container_memory_usage_bytes <= bool 100000.0)
      + ignoring(le)
    (
      container_memory_usage_bytes_bucket{le="100000.0"}
        or ignoring(le)
      container_memory_usage_bytes * 0
    )
Repeat for all bucket sizes you're interested in, add _count and _sum metrics.
Histograms can be aggregated (over time or otherwise) without problems, so you can use a second set of recording rules that computes an increase of the histogram metrics, at much lower resolution (e.g. hourly or daily increase, at hourly or daily resolution). And finally, you can use histogram_quantile over your low resolution histogram (which has a lot fewer samples than the original time series) to compute your quantile.
It's a lot of work, though, and there will be a couple of downsides: you'll only get hourly/daily updates to your quantile and the accuracy may be lower, depending on how many histogram buckets you define.
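To make the accuracy point concrete, here is a simplified Python sketch of estimating a quantile from cumulative buckets by linear interpolation inside the bucket that contains the target rank. It is an illustration of the idea only, not Prometheus' actual histogram_quantile implementation: the function name is made up, there is no +Inf bucket, and the first bucket is assumed to start at 0.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    similar to `le` buckets. Values are assumed to be non-negative, so the
    first bucket is taken to start at 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][1] == 0:
        raise ValueError("need at least one bucket with observations")
    total = buckets[-1][1]
    rank = q * total
    lower_bound, count_below = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank and cumulative > count_below:
            # Assume observations are spread evenly inside this bucket.
            fraction = (rank - count_below) / (cumulative - count_below)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, count_below = upper_bound, cumulative
    return buckets[-1][0]


fine = [(100.0, 10), (200.0, 30), (400.0, 40)]
coarse = [(400.0, 40)]  # same data, but only one bucket
p95_fine = estimate_quantile(0.95, fine)
p95_coarse = estimate_quantile(0.95, coarse)
```

With these made-up counts the three-bucket histogram gives about 360 and the single-bucket one about 380 for the same data, which is the kind of drift you trade for the cheaper query.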
Else (and this only came to me after writing all of the above) you could define a recording rule that runs at lower resolution (e.g. once an hour) and records the current value of container_memory_usage_bytes metrics. Then you could continue to use quantile_over_time over this lower resolution metric. You'll obviously lose precision (as you're throwing away a lot of samples) and your quantile will only update once an hour, but it's much simpler. And you only need to wait for 10 days to see if the result is close enough. (o:
The quantile_over_time(0.95, container_memory_usage_bytes[10d]) query can be slow because it needs to take into account all the raw samples for all the container_memory_usage_bytes time series over the last 10 days. The number of samples to process can be quite large. It can be estimated with the following query:
sum(count_over_time(container_memory_usage_bytes[10d]))
Note that if the quantile_over_time(...) query is used for building a graph in Grafana (i.e. a range query instead of an instant query), then the number of raw samples returned from sum(count_over_time(...)) must be multiplied by the number of points on the Grafana graph, since Prometheus executes quantile_over_time(...) individually for each point on the displayed graph. Grafana usually requests around 1000 points for building a smooth graph, so the number returned from sum(count_over_time(...)) must be multiplied by 1000 in order to estimate the number of raw samples Prometheus needs to process for building the quantile_over_time(...) graph. See more details in this article.
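As a back-of-the-envelope version of that estimate, the arithmetic can be sketched in Python. The 500 series and the 30-second scrape interval are made-up assumptions; plug in the numbers from your own setup.

```python
def samples_to_scan(series, window_seconds, scrape_interval_seconds, graph_points=1):
    """Rough number of raw samples a quantile_over_time query has to read.

    graph_points=1 models an instant query; a range query repeats the
    work for every point on the graph.
    """
    samples_per_series = window_seconds // scrape_interval_seconds
    return series * samples_per_series * graph_points


TEN_DAYS = 10 * 24 * 3600

# Hypothetical setup: 500 series scraped every 30 seconds.
instant_query = samples_to_scan(500, TEN_DAYS, 30)
range_query = samples_to_scan(500, TEN_DAYS, 30, graph_points=1000)
```

Under these assumptions the instant query reads about 14.4 million samples and the 1000-point graph about 14.4 billion.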
The following options can reduce query duration:
Add more specific label filters in order to reduce the number of selected time series and, consequently, the number of raw samples to process.
Reduce the lookbehind window in square brackets. For example, changing [10d] to [1d] reduces the number of raw samples to process by 10x.
Use recording rules for calculating coarser-grained results.
Try other Prometheus-compatible systems, which may process heavy queries faster. Try, for example, VictoriaMetrics.
I am trying to calculate a 30-day rolling average. However, in Tableau, I have to use window_avg(avg(variable), -30, 0). This means that it is actually calculating an average of daily averages: it first calculates the average value per day, then averages those values over the past 30 days. I am wondering whether there is a function in Tableau that can calculate a rolling average directly, like pandas.rolling?
In this specific case, you can use the following:
window_sum(sum(variable), -30, 0) / window_sum(sum(1), -30, 0)
A few concepts about table calcs to keep in mind:
Table calcs operate on aggregate query results.
This gives you flexibility - you can partition the table of query results in many ways, access multiple values in the result set, order the query results to impact your calculations, nest table calcs in different ways.
This approach can also give you efficiency if you can calculate what you need simply from the aggregate results that you've already fetched.
It also gives you complexity. You have to be aware of how each calculation specifies the addressing and partitioning of the query results. You also have to think about how double aggregation will impact your results.
In most cases, applying back-to-back aggregation functions requires some careful thought about what the results will mean. As you've noted, averages of averages may not mean what people think they mean. Others may be quite reasonable, say averages of daily sales totals.
In some cases, double aggregation can be used without extra thought, as the results are the same regardless. Sums of sums, mins of mins and maxes of maxes yield the same result as calling SUM, MIN or MAX on the underlying data rows. These functions are called additive aggregation functions, and they obey the associative rule you learned in grade school. Hence the formula at the start of this answer.
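A tiny Python example (with made-up numbers) shows why the sum-over-count form differs from the average of daily averages when days have different numbers of rows:

```python
from statistics import mean

# Hypothetical data: four rows on the first day, one row on the second.
daily_values = {
    "2024-01-01": [10, 20, 30, 40],
    "2024-01-02": [100],
}

# What window_avg(avg(variable), ...) computes: each day counts equally.
average_of_daily_averages = mean(mean(v) for v in daily_values.values())

# What window_sum(sum(variable), ...) / window_sum(sum(1), ...) computes:
# each underlying row counts equally.
total_sum = sum(sum(v) for v in daily_values.values())
total_count = sum(len(v) for v in daily_values.values())
row_level_average = total_sum / total_count
```

Here the average of daily averages is 62.5, while the row-level average is 40.0. The two only agree when every day has the same number of rows.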
You can also read about the Total() function.
I have two data sets; let us name them "actual speed" and "desired speed". My main objective is to match the actual speed to the desired speed.
To do that in my case, I need to tune the FF (1x10), Integral (10x8) and Proportional (10x8) gain tables.
My approach so far has been as follows:
First, start the iteration with 0.1 as the initial value in the first cell (FF[0]) of the FF table.
Then find the R-squared or correlation between the two data sets (i.e. actual speed and desired speed).
Increment the value of the first cell (FF[0]) by 0.25 and compute the R-squared or correlation of the two data sets again.
Once the FF[0] value reaches 2 (the maximum gain value, already defined by the lab), evaluate R-squared and write back to FF[0] the gain value that gives the minimum error between the two curves.
Then tune the Integral and Proportional tables in the same way for the same RPM range.
Once it is tuned, move on to the next higher RPM range and repeat steps 2-5 (RPM ranges: 800-1000; 1000-1200; ...; 3000-3200).
Now the problem is that this process takes far too long to complete. For example, it takes around 1 hour to tune one cell of FF, which is very slow.
If possible, please suggest any other approach I could try for tuning the tables. I am using MATLAB R2010a and I can't switch to any other version of MATLAB, because my controller can communicate only with this version. I also can't use any app for tuning, since my GUI is already communicating with the controller and the two data sets are being produced in real time.
In the given figure, let us take the (X1,Y1) curve as the desired speed and the (X2,Y2) curve as the actual speed.
Good morning,
I have a question about the execution time of a script in MATLAB. Is it possible to know in advance how long a script will take to execute (an estimated time, for example) before running it? I know that with the tic and toc commands, among others, it is possible to know the time at the end, but I don't know if it's possible to know it beforehand.
Thanks in advance,
It is not too hard to make an estimate of how long your calculation will take.
You already know how to record calculation times with tic and toc, so now you can do this:
Start with a small-scale test (for example, n=1) and record the calculation time.
Multiply n by a constant k (I usually choose 2 or 10 for easy calculations) and record the calculation time.
Keep multiplying n by k until you find a consistent relation: 'If I multiply my input size by k, my calculation time changes like so ...'
Now you can extrapolate your estimated calculation time by:
Calculating how many times you need to multiply the input size of the biggest small-scale example by k to get your real data size.
Applying the consistent relation that you found exactly that many times to the calculation time of your biggest small-scale example.
Of course this combines well with some common sense, like if you do certain things t times they will take about t times as long. This can easily be used when you have to perform a certain calculation a million times. Just interrupt the loop after a minute or so, if it is still in the first ten calculations you may want to give up!
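The extrapolation steps above can be sketched as follows. This is written in Python rather than MATLAB, the function name is made up, and the timings are invented numbers standing in for tic/toc measurements. It also allows a fractional number of "multiply by k" steps, which goes slightly beyond the recipe.

```python
import math


def extrapolate_runtime(n_prev, t_prev, n_last, t_last, n_target):
    """Estimate the runtime at n_target from the two largest small-scale runs.

    Assumes the relation 'input size times k means runtime times ratio'
    keeps holding beyond the measured sizes.
    """
    if min(n_prev, n_last, n_target, t_prev, t_last) <= 0 or n_last == n_prev:
        raise ValueError("sizes and times must be positive and sizes distinct")
    k = n_last / n_prev        # growth factor of the input size
    ratio = t_last / t_prev    # observed growth factor of the runtime
    # How many times n_last must be multiplied by k to reach n_target.
    steps = math.log(n_target / n_last) / math.log(k)
    return t_last * ratio ** steps


# Hypothetical measurements: doubling n from 2000 to 4000 took the
# runtime from 0.4 s to 1.6 s, i.e. four times longer per doubling.
estimate_s = extrapolate_runtime(2000, 0.4, 4000, 1.6, 32000)
```

With these numbers, three more doublings are needed, so the estimate is 1.6 s times 4 three times over, roughly 102 s.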
Context: I'm trying to improve the values returned by the iPhone CLLocationManager, although this is a more generally applicable problem. The key is that CLLocationManager returns data on current velocity as and when it feels like it, rather than at a fixed sample rate.
I'd like to use a feedback equation to improve accuracy
v=(k*v)+(1-k)*currentVelocity
where currentVelocity is the speed returned by didUpdateToLocation:fromLocation: and v is the output velocity (and also used for the feedback element).
Because of the "as and when" nature of didUpdateToLocation:fromLocation: I could calculate the time interval since it was last called, and do something like
for (i = 0; i < timeintervalsincelastcalled; i++) v = (k*v) + (1-k)*currentVelocity;
which would work, but is wasteful of cycles, especially as I probably want timeintervalsincelastcalled to be measured in tenths of a second.
Is there a way to solve this without the loop? That is, can I rework (integrate?) the formula so that I put an interval into the equation and get the same answer as I would have by iteration?
If you write your original equation as
v = k*vCurrent + (1-k)*v
you can apply the answer from another SO question.
Instead of iterating, you could just choose the value of k based on the size of the interval. For example, if the interval length is an hour - you'd probably want k to be 0.
It would be easy to precompute k for a variety of interval sizes to give the same answer as the iteration would give. Just compute the change by iterating (you already have code for that), and then compute the value of k that would give you that change algebraically.
It's a common programmer jedi trick to have a table of lookup values in place of expensive calculations. (there, now my answer has something to do with code!)
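For the update rule in the question, v = (k*v) + (1-k)*currentVelocity, the loop has a closed form. Each pass multiplies the distance between v and currentVelocity by k, so after n passes with the same currentVelocity the result is k^n * v + (1 - k^n) * currentVelocity. In other words, the "k for the whole interval" is simply k raised to the number of steps. A short Python sketch (function names are made up) comparing the two:

```python
def smooth_iterative(v, current_velocity, k, n):
    """Apply v = k*v + (1-k)*current_velocity n times, as in the question."""
    for _ in range(n):
        v = k * v + (1 - k) * current_velocity
    return v


def smooth_closed_form(v, current_velocity, k, n):
    """Same result without the loop; n may also be fractional."""
    decay = k ** n  # effective k for the whole interval
    return decay * v + (1 - decay) * current_velocity


looped = smooth_iterative(10.0, 20.0, 0.9, 25)
direct = smooth_closed_form(10.0, 20.0, 0.9, 25)
```

Since n only appears as an exponent, the interval no longer has to be a whole number of tenths of a second, and no lookup table is needed.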