Randomness in fitclinear from Matlab

I am running a logistic regression using the matlab function fitclinear with the following parameters:
rng('default')
[Mdl,FitInfo] = fitclinear(X',y', 'Lambda','auto',...
'Learner','logistic',...
'ObservationsIn','columns',...
'Regularization','ridge',...
'Solver','sgd',...
'Verbose',1,...
'BatchSize',100,...
'LearnRate',0.1,...
'OptimizeLearnRate',true,...
'PassLimit',100,...
'ClassNames',[-1,1]);
Because I am working with both recent and long historical data, I came to realize that training this logistic regression with the exact same X and y, after resetting the random number generator to its default state to make results reproducible, can still produce 2 different results, i.e. 2 different sets of betas and a different bias.
Could anyone tell me what the reason behind this could be? Where could the randomness come from?

The system starts at a random starting point. From there, given the size of your system, there are many local minima that could still be good. The idea is that with larger systems we don't really care about reaching the same global minimum, we care about getting decent results. Therefore, we can start at any random point and accept that the system is unlikely to end up with the best result, but rather in some location that gives us good results. This means that, for a large system, it is unlikely that any two training runs will be the same.
https://stats.stackexchange.com/questions/203288/understanding-almost-all-local-minimum-have-very-similar-function-value-to-the

Related

Strange behavior of linear regression in PyTorch

I am facing a peculiar problem and I was wondering if there is an explanation. I am trying to run a linear regression problem and test different optimization methods, and two of them give a strange outcome when compared with each other. I build a data set that satisfies y=2x+5 and add random noise to it.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

# Noisy samples of y = 2x + 5, converted to float32 tensors
x = np.arange(0, 50, 1).reshape(50, 1)
y = 2 * x + 5 + np.random.normal(0, 2, (50, 1))
xtrain = torch.from_numpy(x).float()
ytrain = torch.from_numpy(y).float()

model1 = nn.Linear(1, 1)
opt1 = torch.optim.SGD(model1.parameters(), lr=1e-5, momentum=0.8)
opt2 = torch.optim.Rprop(model1.parameters(), lr=1e-5)
F_loss = F.mse_loss

train_d = TensorDataset(xtrain, ytrain)
train = DataLoader(train_d, 50, shuffle=True)
loss = F_loss(model1(xtrain), ytrain)

def fit(nepoch, model1, F_loss, opt):
    for epoch in range(nepoch):
        for i, j in train:
            predict = model1(i)
            loss = F_loss(predict, j)
            loss.backward()
            opt.step()
            opt.zero_grad()
When I compare the results of the following commands:
fit(500000, model1, F_loss, opt1)
fit(500000, model1, F_loss, opt2)
In the last epoch for opt1:loss=2.86,weight=2.02,bias=4.46
In the last epoch for opt2:loss=3.47,weight=2.02,bias=4.68
These results do not make sense to me. Shouldn't opt2 have a smaller loss than opt1, since the weight and bias it finds are closer to the real values (2 and 5, respectively)? Am I doing something wrong?
This has to do with the fact that you are drawing the training samples themselves from a random distribution.
By doing so, you inherently randomized the ground truth to some extent. Sure, you will get values that are inherently distributed around 2x+5, but you do not guarantee that 2x+5 will also be the best fit to this data distribution.
It could thus happen that you accidentally end up with values that deviate quite significantly from the original function, and, since you use a mean squared error, these values get weighted quite significantly.
In expectation (i.e., for the number of samples going towards infinity), you will likely get closer and closer to the expected parameters.
A way to verify this would be to plot your training samples against the parameter set, as well as the (ideal) underlying function.
Also note that Linear Regression does have a direct solution - something that is very uncommon in Machine Learning - meaning you can directly calculate an optimal solution, e.g., with sklearn's function
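To make the point about the direct solution concrete, here is a minimal sketch in plain Python (deliberately not sklearn) of the closed-form least-squares fit for a single feature. The seed, the sample layout and the helper names ols_fit and mse are my own choices for illustration, not something from the question.

```python
import random

def ols_fit(xs, ys):
    """Closed-form least-squares fit of y = w*x + b for one feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = my - w * mx
    return w, b

def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Noisy samples of y = 2x + 5 (seed chosen arbitrarily for this sketch)
rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [2 * x + 5 + rng.gauss(0, 2) for x in xs]

w_hat, b_hat = ols_fit(xs, ys)
loss_fit = mse(xs, ys, w_hat, b_hat)   # loss at the least-squares optimum
loss_true = mse(xs, ys, 2.0, 5.0)      # loss at the generating parameters
print(w_hat, b_hat, loss_fit, loss_true)
```

On any noisy sample, loss_fit cannot exceed loss_true, because the least-squares solution minimizes the MSE on that sample. So parameters closer to (2, 5) do not have to give a smaller training loss.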

Predictive curve fitting matlab

I have a question about curve fitting, I have many curves like the one in the picture.
X axis : time
Y axis : temperature
Each sample comes out every 30s.
GOAL : predict the value at the end of the transient
What would you do in this situation?
What I am doing is this :
For every new sample I start a new fit (so each fit is independent of the previous one) and check the value of the fitted curve 2 hours after the start of the measurement (all the curves I have settle before 2 h). If, for a number (let's say 5) of subsequent fits, the predicted future value stays more or less the same (±0.2°C), I assume that the estimate is the right one.
This approach seems far too simple to me, and I think I am not exploiting all the information. For example, I do not use the error I observe at each step (e.g. at minute 4:00 I predict, and at 4:30 I see that I made an error).
In the picture the red part of the curve is excluded from the fit (but the real future data passes through it). The estimate is the blue one. You can see that in this case I don't have a good prediction... In general I also have flatter curves.
Based on the comments above, I tried to formulate an answer as no one else is giving some input.
I think you are using a good basic procedure. Better results may be obtained by using a more appropriate fitting curve, which includes all the dominant dynamics but avoids overfitting the data. Based on your figure, the simplest form I could think of is:
s + a(1-e^(-t/tau))
with parameters s (the initial temperature), a (the amplitude, so the steady-state value is s + a) and tau (the dominant time constant). As you mentioned yourself, limiting the allowed range for the parameters may avoid overfitting and increase the quality of your estimation.
Using an arbitrary high-order function, like you are doing now, may give good interpolation results, but is dangerous for extrapolation, because strange effects may occur outside the fitting region.
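As a rough illustration of fitting the first-order form above with a restricted parameter range, here is a sketch in plain Python. It scans a bounded list of candidate time constants and, for each one, solves the remaining linear least-squares problem for s and a. The function name fit_first_order, the synthetic data and the 1 to 60 minute range for tau are assumptions I made for the example; they are not taken from the question.

```python
import math

def fit_first_order(ts, ys, taus):
    """Fit y = s + a*(1 - exp(-t/tau)) by scanning tau and solving
    the linear least-squares problem for s and a at each candidate."""
    best = None
    n = len(ts)
    for tau in taus:
        g = [1.0 - math.exp(-t / tau) for t in ts]
        mg = sum(g) / n
        my = sum(ys) / n
        sgg = sum((gi - mg) ** 2 for gi in g)
        if sgg == 0.0:
            continue  # degenerate candidate, cannot solve for a
        a = sum((gi - mg) * (yi - my) for gi, yi in zip(g, ys)) / sgg
        s = my - a * mg
        sse = sum((s + a * gi - yi) ** 2 for gi, yi in zip(g, ys))
        if best is None or sse < best[0]:
            best = (sse, s, a, tau)
    _, s, a, tau = best
    return s, a, tau

# Synthetic, noise-free transient: one sample every 30 s, first 20 minutes only
true_s, true_a, true_tau = 20.0, 15.0, 900.0
ts = [30.0 * k for k in range(41)]
ys = [true_s + true_a * (1.0 - math.exp(-t / true_tau)) for t in ts]

taus = [60.0 * m for m in range(1, 61)]  # allowed time constants: 1 to 60 min
s, a, tau = fit_first_order(ts, ys, taus)
final_value = s + a  # value the fitted curve approaches for large t
print(s, a, tau, final_value)
```

With real, noisy samples the estimate of s + a will move around as new samples arrive, so the stability check described in the question still applies. The bounded list of tau values is what keeps the extrapolation from running away.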
Alternatives
Using the error (e.g. correcting for the extrapolated error) may be possible, but is tricky and may not always give good results.
Training a neural network to perform the estimation is probably overkill, but may give better results if applied correctly. Note that you need a lot of training data, which should be representative of the data on which you will use the neural network later on.

NARX Neural network prediction?

I am trying to solve a time series problem using the NARX Neural Network solution that Matlab provides. I am trying to understand how to predict actual values, but the results I get are almost perfect! The errors are so small that I am not sure if I am actually predicting. I just want to make sure I am doing everything right!
Basically I train the network with some samples using the GUI solution. Then I use the following script to test the neural network with new samples:
X = num2cell(open2(1:end))'; % input
T = num2cell(close2(1:end))'; % this is the output I should get
net = removedelay(net);
[Xs,Xi,Ai,Ts] = preparets(net,X,{},T);
Y = net(Xs,Xi,Ai);
plotresponse(Ts,Y)
view(net)
Y = cell2mat(Y);
T = cell2mat(T);
sizey = length(Y);
sizet = length(T);
T = T(1:sizey);
figure
plot(1:sizey,T,1:sizey,Y)
The graph I am getting is almost identical to the original target time series function. The errors are really small and the only difference is that the graph (Y) is shifted 2 samples to the left. But, am I really predicting?
Here's part of the graph:
Thanks in advance!
Update: The actual prediction graph is shifted to the right and not to the left. The targets provided by the preparets function (blue) occur earlier! So it doesn't show that it's actually predicting.
Right Shift
Your graph shows a timeshift of 1 (not 2!) timestep(s). This is not ideal, but can happen when the delays are badly chosen which leads to this kind of delay pattern. (For further explanation have a look at this question on MATLAB CENTRAL. In fact, Greg Heath posted a lot of material on ANNs, very worth the read even though it's sometimes a bit short to be understood immediately, especially for beginners.) So, to avoid this you have to look into the correlation patterns of your data.
Removedelay()
Now, I'm assuming that you wanted to correct for this behaviour by removing the delay of the network instead. Unfortunately, this is not what removedelay() is meant for:
This example uses a timedelaynet, but can be adapted for NAR and NARX networks as well, and I found the description very helpful. In combination with a quote from removedelay's documentation
The result is a network which behaves identically, except that outputs are produced n timesteps later.
it becomes clear that you're not changing the network, instead you only change the time dependence of your y-values, so your network will try to predict one time step ahead. You can see this behaviour at the very end of your T and Y vectors where Y will have an additional value while T fills this space with NaN (because you obviously cannot generate more targets out of the blue).
removedelay() is supposed to be used in combination with a closed loop design, so that you can obtain predicted values early to use them as direct input for the next step. In this case, it also makes sense to increase the output delay by more than just one which is why you can pass an additional argument n:
net = removedelay(net,n);
To prove that the additional time step is not used you can simulate the desired data set with your trained net and then simulate the same set with removedelay(). They are going to be identical except for the last value of the Y curve (see Figure 1).
Fig. 1: Both plots are based on the same net trained with the first 3500 data points of MATLAB's heat exchanger example. Shown are the simulation results for the last 500 values in the set that have not been used in the training process. The results are identical except for an additional value for the one on the left using removedelay().
Errors
Your errors have to be very small if you're using a representative training set. Therefore, the prediction for similar, new data will be good because your net is not overfitted.
Conclusions
So, are you predicting? No, you are simulating. Simulating your network's behaviour is based on inputs of your previously unknown data set, not the targets (they only have to be passed to allow for performance evaluation). Therefore, passing new data to your net with or without removedelay() is simulation in both cases because it is based on provided inputs. Removing the delay doesn't make a difference for these results.
Prediction, on the other hand, requires no input data because it really just continues the pattern the network has learned so far without taking new input into account.
Suggestions
If all you want is to have an unknown data set with valid input values passed to your net for simulation, you could just as well pass it as part of the testing set by using the divideblock or divideint options.
If you want to make use of early prediction by removedelay() or need prediction in general because your inputs have holes or are unreliable for other reasons, you should consider simulating your unknown set with a closed loop. Should its performance be all too awful you can also train a closed loop network from the very beginning.
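To illustrate the open-loop versus closed-loop distinction outside of MATLAB, here is a small sketch in plain Python. The function one_step is a hypothetical stand-in for a trained one-step model (it is not the NARX toolbox and its coefficients are made up). In the open-loop run each output is computed from the measured previous target. In the closed-loop run the model feeds back its own previous output.

```python
def one_step(y_prev, x_prev):
    """Hypothetical trained one-step model: y[t] from y[t-1] and x[t-1]."""
    return 0.8 * y_prev + 0.5 * x_prev

def open_loop(y_measured, x):
    """Each output uses the measured previous target as feedback."""
    return [one_step(y_measured[t - 1], x[t - 1]) for t in range(1, len(x))]

def closed_loop(y0, x):
    """Each output feeds back the model's own previous output."""
    out = []
    y_prev = y0
    for t in range(1, len(x)):
        y_prev = one_step(y_prev, x[t - 1])
        out.append(y_prev)
    return out

x = [1.0, 0.0, 2.0, 1.0, 0.0, 1.0]
# Measured targets come from a slightly different process (0.85 instead
# of 0.8), so the model has some error to accumulate.
y = [0.0]
for t in range(1, len(x)):
    y.append(0.85 * y[-1] + 0.5 * x[t - 1])

y_open = open_loop(y, x)
y_closed = closed_loop(y[0], x)
open_err = max(abs(p - q) for p, q in zip(y_open, y[1:]))
closed_err = max(abs(p - q) for p, q in zip(y_closed, y[1:]))
print(open_err, closed_err)
```

In this toy setup the open-loop error stays small because every step is corrected by a measured value, while the closed-loop error grows as the model's own mistakes are fed back. That is the reason an almost perfect open-loop plot says little about multi-step prediction quality.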

Matlab Confidence Interval for Degrees of Freedom

I would like to calculate a Confidence Interval along with my Degrees of Freedom (DOF) estimation in Matlab. I am trying to run the following line of code:
[R, DoF, ciDOF] = copulafit('t', U); % fit the copula
The code line without the "ciDOF" arguments takes between 1-3 hours to run with my data. I tried to run the code with the "ciDOF" argument several times, but the calculations seem to take very long (I stopped the calculation after 8 hours). No error message is generated.
Does anyone have experience with this argument and could kindly tell me how long I should expect the calculation to take (the size of my data is 167*19) and if I have specified the "ciDOF" argument correctly?
Many thanks for the help!
Carolin
If your data matrix U is of size 167 x 19, then what you are asking for is a copula-fit distribution dependent on 19-dimensions, making your copula a distribution in a 20-dimensional space with 19 dependent variables.
This is almost definitely why it is taking so long, because whether it is your intention or not, you are asking MATLAB to solve a minimization problem: taking 19 marginal distributions and coming up with the 19-variate joint distribution (the copula), where each marginal distribution (represented by a 167 x 1 column vector) is uniform.
Most likely this is a limit of the MATLAB implementation, which iterates through many independent computations and then tries to combine them to fit the joint distribution's ideal conditions.
First and foremost -- and not to be insulting or insinuating -- you should definitely check that you really are trying to find a 19-variate copula. Also, just in case, make sure that your matrix U is oriented in the proper way, because if you have it transposed, you could be trying to ask for the solution to a 167-variate distribution.
But, if this is what you are actually trying to do, there is not really an easy way to predict how long it will take or how long it should take. Even with multiple dimensions, if your marginals are simple or uniform already, that would greatly reduce the copula computation. But, really, there is no way to tell.
Although this may seem like a cop-out, you may actually have better luck switching from MATLAB to R, especially if you have a lot of multivariate data, and you will probably find a lot more functionality in R than in MATLAB. R is freely available and comes with a Graphical User Interface (GUI), in case you aren't comfortable with command-line programming.
There are many more sources, but here is one PDF lecture on computing copula-fits in R:
http://faculty.washington.edu/ezivot/econ589/copulasPowerpoint.pdf

Different results for Fundamental Matrix in Matlab

I am implementing stereo matching and as preprocessing I am trying to rectify images without camera calibration.
I am using the SURF detector to detect and match features in the images and try to align them. After I find all matches, I remove all those that don't lie on the epipolar lines, using this function:
[fMatrix, epipolarInliers, status] = estimateFundamentalMatrix(...
matchedPoints1, matchedPoints2, 'Method', 'RANSAC', ...
'NumTrials', 10000, 'DistanceThreshold', 0.1, 'Confidence', 99.99);
inlierPoints1 = matchedPoints1(epipolarInliers, :);
inlierPoints2 = matchedPoints2(epipolarInliers, :);
figure; showMatchedFeatures(I1, I2, inlierPoints1, inlierPoints2);
legend('Inlier points in I1', 'Inlier points in I2');
The problem is that if I run this function with the same data, I still get different results, which causes differences in the resulting disparity map in each run on the same data.
The putatively matched points are still the same, but the inlier points differ in each run.
Here you can see that some matches are different in result:
UPDATE: I thought that the differences were caused by the RANSAC method, but using LMedS or MSAC I am still getting different results on the same data.
EDIT: Admittedly, this is only a partial answer, since I am only explaining why this is even possible with these fitting methods and not how to improve the input keypoints to avoid this problem from the start. There are problems with the distribution of your keypoint matches, as noted in the other answers, and there are ways to address that at the stage of keypoint detection. But, the reason the same input can yield different results for repeated executions of estimateFundamentalMatrix with the same pairs of keypoints is because of the following. (Again, this does not provide sound advice for improving keypoints so as to solve this problem).
The reason for different results on repeated executions is related to the RANSAC method (and LMedS and MSAC). They all use stochastic (random) sampling and are thus non-deterministic. All methods except Norm8Point operate by randomly sampling 8 pairs of points at a time for (up to) NumTrials.
But first, note that the different results you get for the same inputs are not equally suitable (they will not have the same residuals) but the search space can easily lead to any such minimum because the optimization algorithms are not deterministic. As the other answers rightly suggest, improve your keypoints and this won't be a problem, but here is why the robust fitting methods can do this and some ways to modify their behavior.
Notice the documentation for the 'NumTrials' option (ADDED NOTE: changing this is not the solution, but this does explain the behavior):
'NumTrials' — Number of random trials for finding the outliers
500 (default) | integer
Number of random trials for finding the outliers, specified as the comma-separated pair consisting of 'NumTrials' and an integer value. This parameter applies when you set the Method parameter to LMedS, RANSAC, MSAC, or LTS.
MSAC (M-estimator SAmple Consensus) is a modified RANSAC (RANdom SAmple Consensus). Deterministic algorithms for LMedS have exponential complexity and thus stochastic sampling is practically required.
Before you decide to use Norm8Point (again, not the solution), keep in mind that this method assumes NO outliers, and is thus not robust to erroneous matches. Try using more trials to stabilize the other methods (EDIT: I mean, rather than switching to Norm8Point, but if you are able to back up in your algorithms then address the inputs, i.e. the keypoints, as a first line of attack). Also, to reset the random number generator, you could call rng('default') before each call to estimateFundamentalMatrix. But again, note that while this will force the same answer on each run, improving your keypoint distribution is the better solution in general.
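The effect of the random sampling, and of resetting the generator, can be reproduced with a toy example. Below is a minimal sketch of RANSAC for a 2D line in plain Python. It is not the algorithm inside estimateFundamentalMatrix, and the function ransac_line, the data and the seeds are all assumptions made for illustration.

```python
import random

def ransac_line(points, n_trials, threshold, rng):
    """Return the indices of the largest inlier set found for y = m*x + c."""
    best = []
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical sample, skipped in this simple sketch
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(y - (m * x + c)) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    return best

# 30 noisy points on y = 2x + 1 plus 15 uniformly scattered outliers
data_rng = random.Random(1)
points = [(float(x), 2.0 * x + 1.0 + data_rng.gauss(0, 0.3)) for x in range(30)]
points += [(data_rng.uniform(0, 30), data_rng.uniform(0, 60)) for _ in range(15)]

run_a = ransac_line(points, 20, 0.5, random.Random(42))  # seeded
run_b = ransac_line(points, 20, 0.5, random.Random(42))  # same seed
run_c = ransac_line(points, 20, 0.5, random.Random(7))   # different seed
print(len(run_a), len(run_b), len(run_c))
```

Here run_a and run_b are identical because they draw the same sequence of random samples, which is what rng('default') achieves in MATLAB. run_c may or may not match them: with few trials and a tight threshold, different seeds can settle on different inlier sets, which is the behavior described in the question.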
I know it's too late for your answer, but I guess it would be useful for someone in the future. Actually, the problem in your case is twofold:
Degenerate location of features, i.e., the features are mostly localized (on you :P) and not well spread throughout the image.
These matches are sort of on the same plane. I know you would argue that your body is not planar, but compared to the depth of the room, it sort of is.
Mathematically, this means you are kind of extracting E (or F) from a planar surface, which always has infinitely many solutions. To sort this out, I would suggest using some constraint on the distance between any two extracted SURF features, i.e., any two SURF features used for matching should be at least 40 or 100 pixels apart (depending on the resolution of your image).
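One simple way to enforce such a distance constraint is a greedy filter, sketched here in plain Python. The function spread_out and the sample coordinates are hypothetical. In practice you would order the features by strength first and keep the corresponding match pairs together.

```python
import math

def spread_out(points, min_dist):
    """Greedily keep points that are at least min_dist from all kept points."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept

# Hypothetical feature locations in pixels, assumed sorted strongest first
pts = [(10, 10), (12, 11), (60, 10), (61, 50), (10, 55), (100, 100)]
kept = spread_out(pts, 40)
print(kept)
```

With a 40 pixel minimum distance, only (12, 11) is dropped here, because it sits right next to the stronger feature at (10, 10).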
Another way to get better SURF features is to set 'NumOctaves' in detectSURFFeatures(rgb2gray(I1),'NumOctaves',5); to larger values.
I am facing the same problem and this has helped (a little bit).