I am using a MATLAB toolbox, specifically, https://uk.mathworks.com/matlabcentral/fileexchange/32882-armax-garch-k-sk-toolbox-estimation-forecasting-simulation-and-value-at-risk-applications
To pass data into the toolbox functions, the author defines a data matrix and then uses data(:,3) to select the third column, which represents a series.
I would like to do the same, but pass data(:,3) lagged by one period.
My question: is there a way to write something in MATLAB that lags the series by one period, so the result can be passed into the function?
If I understand correctly, you would like to lag a series by one time period, where the period is whatever frequency you collect the data at; for daily data, for example, that means lagging the series by one day.
If so, you can use the lagmatrix function.
To provide an example,
LAGGEDX = lagmatrix(data(:,3),1)
This lags your data(:,3) series by one period (one day for daily data); you can then pass LAGGEDX in place of data(:,3).
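Note that lagmatrix pads the start of the lagged series with NaN, and (as far as I recall) it requires the Econometrics Toolbox. If you don't have that toolbox, you can build the same one-period lag in base MATLAB:

% Shift the series down by one row, padding the first element with NaN;
% this matches what lagmatrix(data(:,3), 1) returns.
LAGGEDX = [NaN; data(1:end-1, 3)];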
How can I plot time-grouped increment data in a bar graph in Grafana, but with a sparse data source that needs interpolation BEFORE calculating the increment?
My data source is an InfluxDB with a sparse time series of accumulated values (think: gas meter readings). The data points are usually a few days apart.
My goal is to create a bar graph with value increase per day. For the missing values, linear interpolation will do just fine.
I've come up with
SELECT spread("value") FROM "gas" WHERE $timeFilter GROUP BY time(1d) fill(linear)
but this won't work as the fill(linear) command is executed AFTER the spread(value) command. If I use time periods much greater than my granularity of input data (e.g. time(14d)), it shows proper bars, but once I use smaller periods, the bars collapse to 0.
How can I apply the interpolation BEFORE the difference operation?
The situation you describe is caused by the fact that fill() only fills a GROUP BY time() period that contains no data at all. If you get spread=0, you probably have exactly one value in that period, so fill() is never applied.
What I suggest is to use a subquery with a shorter GROUP BY time() period to interpolate your original signal first. Here is an example:
SELECT spread("interpolated_value") FROM (
SELECT first("value") as "interpolated_value" from "gas"
WHERE $timeFilter
GROUP BY time(10s) fill(linear)
)
GROUP BY time(1d) fill(none)
The subquery prepares a value for every 10s period (I recommend setting this interval as high as you can accept). If a 10s period contains values, it picks the first one; if the period is empty, it interpolates.
The main query then calculates spread over this prepared, interpolated set of values.
All of the above only describes how to get interpolated data within shorter periods. I strongly recommend thinking about whether this data is actually usable: a spread calculated from linearly interpolated data may have questionable reliability.
Given a sequence of numbers that trend over time, I would like to use Reactive Extensions to raise an alert when there is a sudden absolute spike or drop, e.g. 101.2, 102.4, 101.4, 100.9, 95, 93, 85... and then increasing slowly back to 100.
The alert would be triggered on the drop from 100.9 to 95. Each sample has a timestamp, and I am looking for an alert of the form:
LargeChange
TimeStamp
Distance
Percentage
I believe I need to start with Buffer(60, 1) for a 60-sample moving average (samples arrive at one-minute intervals).
Whilst that would give the average value, I can't assign an arbitrary % to trigger the alert, since this could vary from signal to signal - one may have more volatility than the other.
To get volatility I would then take a longer historical time frame Buffer(14, 1) (these would be 14 days of daily averages of the same signal).
I would then calculate the difference between each value in the buffer and the 14-day average, square and sum all these deviations, and divide by the number of samples (i.e. compute the variance).
My questions are please:
How would I perform the above volatility calculation, or is it better to do this outside of Rx and update the volatility value once daily, external to the observable stream calculation (this may make more sense, to avoid running 14 days' worth of 1-minute samples through it)?
How would we combine the fast moving average and the volatility level (updated once per day) to give alerts? I have seen Scan and DistinctUntilChanged in posts on SO, but can't work out how to put them together.
I would start by breaking this down into steps. (For simplicity I'll assume the original data source is an observable called values.)
1. Convert values into a moving-averages observable (we'll call this averages here).
2. Combine values and averages into an observable that can watch for "extremes".
For step 1, you may be able to use the built-in Window method that Slugart mentioned in a comment or the similar Buffer method. A Select call after the Window or Buffer can be used to process the array into a single average value object. Something like:
averages = values.Buffer(60, 1)
.Select((buffer) => { /* do average and std dev calculation here */ });
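Filled in, that projection might look like the following sketch, where AverageInfo is a hypothetical holder type (not part of Rx) and values is assumed to be an IObservable<double>:

// Hypothetical holder for the per-window statistics.
class AverageInfo
{
    public double Mean { get; set; }
    public double StdDev { get; set; }
}

averages = values.Buffer(60, 1)
    .Select(buffer =>
    {
        var mean = buffer.Average();    // System.Linq
        // population standard deviation over the window
        var variance = buffer.Sum(v => (v - mean) * (v - mean)) / buffer.Count;
        return new AverageInfo { Mean = mean, StdDev = Math.Sqrt(variance) };
    });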
If you need sliding windows, you may have to implement your own operator, but I could easily be unaware of one that does exist. Scan along with a queue seem like a good basis for such an operator if you need to write it.
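If you do end up writing it, a minimal sketch built on Scan and an immutable queue could look like this (SlidingWindow is a name invented here, not a built-in Rx operator; it needs System.Collections.Immutable, System.Linq and System.Reactive.Linq):

static IObservable<IList<double>> SlidingWindow(IObservable<double> source, int size)
{
    return source
        .Scan(ImmutableQueue<double>.Empty, (window, v) =>
        {
            var next = window.Enqueue(v);
            // drop the oldest sample once the window is over capacity
            return next.Count() > size ? next.Dequeue() : next;
        })
        .Select(window => (IList<double>)window.ToList());
}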
For step 2, you will probably want to start with CombineLatest followed by a Where clause. Something like:
extremes = values.CombineLatest(averages, (v, a) => new { Current = v, Average = a })
.Where((value) => { /* check if value.Current is out of deviation from value.Average */ });
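Filled in against the AverageInfo sketch from step 1 (the two-standard-deviation threshold is an arbitrary choice for illustration):

extremes = values.CombineLatest(averages, (v, a) => new { Current = v, Average = a })
    .Where(pair => Math.Abs(pair.Current - pair.Average.Mean)
                   > 2 * pair.Average.StdDev);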
The nice part of this approach is that averages can either be computed inline from values, as we did here, or come from some other source of volatility information, with minimal effect on the rest of the code.
Note that the CombineLatest call may cause two subscriptions to values, one directly and one indirectly via a subscription to averages. If the underlying implementation of values makes this undesirable, use Publish and RefCount to get around this.
Also note that CombineLatest will output a value each time either values or averages outputs a value. This means that you will get two events every time averages updates, one for the values update and one for the averages update triggered by the value.
If you are using sliding windows, that would mean a double update on every value, and it would probably be better to simply include the current value on the Scan output and skip the CombineLatest altogether. You would have something like this instead:
averages = values.Scan(initialWindow, (window, v) => { /* build sliding window and attach current value */ });
extremes = averages.Where((a) => { /* check if current value is out of deviation for the window */ });
Once you have extremes, you can subscribe to it and trigger your alerts.
I am new to PostgreSQL and database systems, and I am currently trying to create a database to store observed values as well as all predictions made in the past for some time series.
I have already built a table (actually a view) for observed values, with rows looking basically like:
(time, object, value)
Now I want to store predictions, meaning, for each time, what some software predicted for the next N time steps, where N varies because the software has different prediction types.
I have thought about multiple solutions, which are the following:
Store each prediction as a row, using max(N) = 240 columns, i.e. (time, object, value1, value2, ..., value240).
Store each prediction as a row, with the prediction values as binary JSON, i.e. (time, object, JSONB prediction).
Store each prediction value as its own row, with a column specifying the delay of the prediction in hours, i.e. (time, object, delay, value).
I don't know how each of these choices would affect performance when I retrieve and compute summary values on the predictions. A typical thing I would like to do is retrieve the performance of the prediction for some delay, i.e. how big the prediction error is when we predict x days ahead, and I need this query to execute fast enough to display in a dashboard.
Which choice do you think is the best? Or do you have any other idea?
Thanks a lot!
Without further information about the access patterns for the collected data, I would strongly recommend using jsonb.
Using one column per timestep will result in bloat of the system catalog and statistics.
If you need to filter on the values of the predictions, you also don't want to maintain 240 indexes.
If you don't need to use these values within a WHERE condition, you may use json instead of jsonb.
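As a minimal sketch of the jsonb option (all table and column names here are illustrative; the observed values are assumed to sit in a table or view with the (time, object, value) shape from the question):

CREATE TABLE prediction (
    made_at   timestamptz NOT NULL,  -- when the prediction was issued
    object    integer     NOT NULL,
    predicted jsonb       NOT NULL,  -- e.g. '{"1": 10.2, "2": 10.5, ...}' keyed by delay
    PRIMARY KEY (made_at, object)
);

-- prediction error for one fixed delay (here 24 hours ahead),
-- joined back against the observed values
SELECT p.made_at,
       (p.predicted ->> '24')::numeric - o.value AS error
FROM   prediction p
JOIN   observed o
  ON   o.object = p.object
 AND   o.time   = p.made_at + interval '24 hours';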
I made a structure in COMSOL, and now I want to subject this structure to a temperature variation (T(begin) = 25 C, then a temperature ramp (100 C/min) until T = 250 C, held for 30 min, then another temperature ramp (-100 C/min) back down to T = 25 C). How can I set up this temperature sweep?
You can define a function (e.g. foo) that follows exactly your desired temperature-versus-time profile. Then, wherever you specify your temperature (whether it is a boundary condition or a domain condition), you insert foo(t), t being COMSOL's reserved variable name for time.
You can do that for other variables too, space for instance. The easiest way to define foo is through the 1D interpolation option. Unfortunately, I do not currently have a COMSOL license to check, but I think you can simply enter the time and temperature values in the 1D interpolation table, choose a name and the interpolant style, and then use it in the later part of the program.
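For this particular profile, such a table would only need four breakpoints with linear interpolation between them (135 s because ramping 225 C at 100 C/min takes 2.25 min):

t (s):  0    135   1935   2070
T (C):  25   250   250    25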
I'm simulating magnetic fields in the time domain with moving coils. A time-dependent solver is needed for the movement, and for temperature ramps as well. I think you can use something like T = T_start + rate_of_change*t. The t variable is available with the time-dependent solver, and you can simply write the equation above. However, I think you need to run the time-dependent solver three times: once for the ramp up, once for the constant temperature, and once for the ramp down. Set the time ranges of the solvers so that you hit the desired temperatures:
First: t = 0 s -> 135 s (ramping 225 C at 100 C/min takes 225/100 * 60 = 135 s)
Second: t = 135 s -> 1935 s (holding for 30 min adds 30 * 60 = 1800 s)
Last: t = 1935 s -> 2070 s (the ramp down takes another 135 s)
You might also need to use compile solutions steps to add these three solutions together. I can try to do this tomorrow and check it.
Hope that this helped a bit
I have some smart meter data which shows gas and electricity meter readings at 30 min intervals for about two years, for 16000 households.
The data is stored in separate .mat files, with a datetime variable for the timestamp and a double variable for the actual data. Some of the data has gaps, ranging from a few hours to several days or weeks. I want to create a timeseries object containing all of the data and a continuous timestamp for the two-year period, so that I can then interpolate the gaps.
Another option would be to use synchronize, but for this it seems the 16000 data series need to be in individual timeseries objects, which seems cumbersome.
I have tried this with timeseries objects and financial time series but cannot get all of the 16000 data series and corresponding timestamps into one time series object. When I try to add more than one series to an existing timeseries object, it is added "in series" rather than "in parallel" (i.e. data in the Data:1 column).
When I tried with a financial time series I had difficulties preparing the datetime data in a cell array.
Any ideas what the most efficient way to do this is?
Thanks
Russell
Depending on the version of MATLAB you have, the best idea would seem to be to use the table data type.
Tables can store disparate data types, so you can keep the date/time stamps as well as the meter readings in the same variable.
You can horizontally concatenate the tables (or otherwise join them as you read them in) so that you end up with a time series with a single date variable and the readings for each household.
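As a rough sketch of that idea (the file layout and the variable names ts and readings are assumptions, since I don't know how your .mat files are organised):

% Assumes each .mat file holds a datetime vector "ts" and a double
% vector "readings"; adjust the names to match your files.
files = dir('household_*.mat');    % hypothetical naming scheme
combined = table();
for k = 1:numel(files)
    s = load(files(k).name);
    t = table(s.ts, s.readings, ...
        'VariableNames', {'Time', sprintf('household%d', k)});
    if isempty(combined)
        combined = t;
    else
        % outerjoin keeps every timestamp from either table and pads
        % missing readings with NaN, ready for interpolation later
        combined = outerjoin(combined, t, 'Keys', 'Time', 'MergeKeys', true);
    end
end

On R2016b or later, a timetable combined with synchronize does much the same job and can handle the interpolation step for you.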