Why can future data affect output of NARX time series prediction? - matlab

I am using A NARX network with one target vector and several input vectors. I am trying to predict the target value each day based on the inputs provided that day. If I feed all of the input and target values into the network up to and including the current day, I get one output result. But if I also feed in additional future input and target values, I get a different output result.
I have used " rng 'default' " to keep the same seed for the random number generator. I suspect there is some normalization or something going on that is using the future values, but I can't locate it. I am not doing that myself.
Obviously, any prediction routine that uses future data is invalid.
Thoughts?
I expected that supplying all of the inputs and target up to and including the current day would give an output estimate of the current day target, and that this would not depend on future data. But it does.
I checked this as follows: I substituted a linear regression model instead of the NARX, and got completely reproducible and identical results whether the future data was present or not. The inputs and targets fed to this routine were identical to those fed to the NARX.

Related

Parameter Variation in AnyLogic: Data for a specific variation

I am using parameter variation in AnyLogic (in a system dynamics model). I am interested in how one parameter changes with the various iterations. The parameter is binary: 0 when supply of water is greater than demand and 1 when supply is lower than demand. The parameters being varied are a given percentage of decrease in outdoor irrigation, a given percentage of decrease in indoor water-use, and a given percentage of households that have rainwater harvesting systems. Visually, I need a time plot where on the x-axis is time (10,950 days; i.e. 30 years) and the binary on the y-axis. This should essentially show which iteration pushes a 1 further into the future.
I have watched videos and seen how histograms and 2D data are used to visualize the results of the iterations, but this does not show which iteration produced which output specifically. Is there a way to first, visually show the output as I have described above and second, return the data for a specific iteration?
Many thanks!
Parameter variation experiments have After Iteration and After Simulation run actions that are executed after each iteration and simulation respectively. Here, it is possible to access the values inside the simulation object after it finished but before it is destroyed. There is also a getCurrentIteration() method which can be used to control the parameter variation experiment and retrieve the data.
For more detail please consult here and see "SIR Agent Based Calibration" example model in AnyLogic example models library (Help -> Example Models).

AnyLogic Sensitivity analysis visualization

I am new to this and lost about how to create visualization for the AnyLogic sensitivity analysis. Here is the summary:
I have dataset that captures dependent and independent variables at the end of simulation run (just captures one pair at the end). Trying to vary independent variable to see the impact on dependent variable. The resulting dataset is correct (when I copy and paste in Excel) but the chart looks blank.
Also, output states that it completed 5 iterations but I specified 10 and the data shows that there were in fact 10 iterations.
There are no parameters for the chart data (see the screenshot) but I am guessing it is automatically populated at the end simulation based on the code (also copied below)? Otherwise, I cannot figure out what goes into the chart data (tried to manually enter variables/datasets to no avail).
This is the code after each simulation runs:
Color color = lerpColor( (getCurrentIteration() - 1) / (double) (getMaximumIterations() - 1), blue, red );
chart0.addDataSet( root.died_friend, format( root.SocFriendBrave ), color, true, `Chart.INTERPOLATION_LINEAR, 1, Chart.POINT_NONE );`
I realize this is a very basic question but I am lost and cannot get on the right path based on the help I found. thank you.
Let's say you have 2 experiments: simulation (normal one) and sensitivity analysis (new one)
The only way for the chart to look blank is if your dataset died_friend is empty. This can happen for many reasons, but the reason I suspect it happens here, is because you fill this dataset with information at the end of the simulation run, which means that you are probably using the java actions of the simulation experiment.
The sensitivity experiment DOES NOT read what you write on the java actions of the simulation experiment, so that might be the problem.
If this is not the case, you need to check other possible reasons on why your dataset is empty when you run the sensitivity analysis.
Remember: the fact that your dataset has data in your simulation experiment, does not necessarily mean the dataset will not be empty on the sensitivity experiment.
Since the Sensitivity Analysis experiment is set up via the wizard, it would be better if you showed us how you'd set that up, rather than the resultant AnyLogic-generated code.
From what you've said here, it looks like you may be incorrectly trying to use datasets instead of scalar values. If there's one 'dependent variable' (model output of interest) then the 'independent variable' needs to be a model parameter (parameter of your top-level agent, normally Main, which the experiment will be varying).
So you should specify:
Varying the relevant parameter (looks like you wanted it to be 0, 0.25, 0.5, 0.75, 1 so min 0, max 1 with step size of 0.25)
Producing a chart for the scalar output you want. (If this was, say, a variable called outputValue, you'd use expression root.outputValue when setting up the chart in the wizard.)
Also you should only be getting more than 5 iterations if you set up your parameter variation criteria incorrectly.
(Dataset outputs in the experiment's charts are typically for where your model produces a time series as an output --- i.e., a dataset of sim time vs. value --- and you want the Sensitivity Analysis experiment to show charts with each run's time series as separate lines (i.e., time on the X-axis). Scalar charts are for where the X-axis is the varying parameter and the Y-axis the output of interest.)

Some questions about split_train_test() function

I am currently trying to use Python's linearregression() model to describe the relationship between two variables X and Y. Given a dataset with 8 columns and 1000 rows, I want to split this dataset into training and test sets using split_train_test.
My question: I wonder what is the difference between train_test_split(dataset, test_size, random_test = int) vs train_test_split(dataset, test_size).Also, does the 2nd one (without setting random_test=int) give me a different test set and training set each time I re-run my program? Also, does the 1st one give me the same test set and training set every time I re-run my program? What is the difference between setting random_test=42 vs random_test=43, for example?
In python scikit-learn train_test_split will split your input data into two sets i) train and ii) test. It has argument random_state which allows you to split data randomly.
If the argument is not mentioned it will classify the data in a stratified manner which will give you the same split for the same dataset.
Assume you want a random split the data so that you could measure the performance of your regression on the same data with different splits. you can use random_state to achieve it. Each random state will give you pseudo-random split of your initial data. In order to keep track of performance and reproduce it later on the same data you will use the random_state argument with value used before.
It is useful for cross validation technique in machine learning.

Which input block is used in Matlab simulink that provide random values with time?

I am a new user of Matlab Simulink and currently i am working with Fuzzy Logic Controller with Simulation. In the Simulink, i want to input the values(i highlighted in the picture below), that changes with time.
Here, i am using Uniform Random Number Block which sends four values(which i feed) at same time but i want one value must input first, then another and so on. My target is to use Scope block(the one which is highlighted in the right side) and check the output that varies with time. But with current configuration, i just input ONLY one value. Please help?

Simulink From Workspace: can't use timestamps from matrix

I'm using the simulink block From Workspace to read in some audio data provided by a script. I have formatted the data in a matrix with 2 columns, the first is the timestamp and the second is the data.
In the configuration paramaters, I have specified Fixed-Step and Discrete solver. The Start time and Stop also need to be configured manually and don't seem to come from the data.
Also, in the From Workspace block configuration, I need to specify the sample time (1/44100) or I get a warning if I specify -1, to inherit from the data and then get strange sample times.
So, how can I get simulink to use only the sample times in the matrix and use the first and last timestamps as the start and stop time of the simulation?
You should be able to do what you want by doing the following:
Firstly note that your problem is by definition not fixed step, hence you cannot use a fixed-step solver, which by definition is ... fixed-step.
You must use a variable step solver.
Assuming your (2 column) input data is called simin then set the start and stop times to be simin(1,1) and simin(end,1) respectively.
In your From Workspace block set the sample time to be 0 (which should have been the default).
Also de-select the Interpolate data option; and set "Form the output after final data value by:" to zero (you won't be using anything past the end of your data set so this should be OK.
Then you need to tell the solver to take additional steps to those that it would naturally want to take.
Do this on the Data Import/Export pane of the Model Configuration Parameters.
Near the bottom of the pane there is a selection box and an edit box for doing this.
Note however that this does not prevent the solver from taking steps at other time points, it just forces it to take additional steps at the times you specify.
But because you have your From WOrkspace block to not interpolate this shouldn't be a problem either. You should put simin(:,1) in here so that the solver is guaranteed to take steps at the time points in your input data.
Note that if you want an input block that only samples at the time points in the simin time vector then the only way to do this is to write an S-function that uses the mdlGetTimeOfNextVarHit method to tell the solver what the next sample time (for this block) should be.