I am using MATLAB GPU computing to run a simulation. I suspect I may run into a "random number seed" overlap issue. My code is the following:
N = 10000;
v = rand(N,1);
p = [0:0.1:1];
pA = [0:0.1:2];
[v,p,pA] = ndgrid(v,p,pA);
v = gpuArray(v);
p = gpuArray(p);
pA = gpuArray(pA);
t = 1;
bH = 0.9;
bL = 0.6;
a = 0.5;
Y = MyFunction(v,p,pA,t,bH,bL,a);
function [RA] = MyFunction(v,p,pA,t,bH,bL,a)
    % SSP1 is a nested function: it is applied element-wise on the GPU by
    % arrayfun and reads t, bH, bL and a from the parent's workspace.
    function [RA] = SSP1(v,p,pA)
        RA = 0;
        S1 = rand;
        S2 = rand;
        S3 = rand;
        vA1 = (S1<a)*bH + (S1>=a)*bL;
        vA2 = (S2<a)*bH + (S2>=a)*bL;
        vA3 = (S3<a)*bH + (S3>=a)*bL;
        if p<=t && pA>3*bL && pA<=3*bH
            if pA>vA1+vA2+vA3
                if v>=p
                    RA = p;
                end
            else
                if v+vA1+vA2+vA3>=p+pA
                    RA = p+pA;
                end
            end
        end
    end
    [RA] = gather(arrayfun(@SSP1,v,p,pA));   % note: @SSP1, not #SSP1
end
The idea of the code is the following:
I generate N random agents, each characterized by a value v. For each agent, I then have to compute a quantity for every combination of (p,pA). Since I have N agents and many combinations of (p,pA), I want to use the GPU to speed up the process. But here comes the tricky part:
For each agent, in order to finish the computation, I have to generate 3 extra random variables, vA1, vA2 and vA3. Based on my understanding of the GPU (I could be wrong), it does these computations simultaneously: for each agent v it generates the 3 random variables vA1, vA2, vA3, and the GPU runs these N procedures at the same time. However, I am not sure whether the vA1, vA2, vA3 generated for agent 1 and agent 2 may overlap, because N could be 1 million. I want to make sure that the random number streams used to generate vA1, vA2, vA3 do not overlap across agents; otherwise I am in big trouble.
There is one way to prevent this from happening: first generate all 3N of these random variables and then move them onto the GPU. However, that would require a lot of GPU memory, which I don't have. The current method, I guess, does not need much GPU memory, since vA1, vA2, vA3 are generated on the fly?
What you describe does not happen. The proof is that the following code snippet generates different random values in hB.
A=ones(100,1);
dA=gpuArray(A);
[hB] = gather(arrayfun(@applyrand,dA));   % note: @applyrand, not #applyrand
function dB = applyrand(dA)
    r = rand;      % an independent draw for every element of dA
    dB = dA*r;
end
That said, your random variables can only take a handful of values, because with S1, S2 and S3 you are basically flipping a coin (a = 0.5):
vA1 = (S1<a)*bH+(S1>=a)*bL;
Since the two conditions are mutually exclusive, vA1 is either bH or bL, so the sum vA1+vA2+vA3 can only take four distinct values (3*bL, 2*bL+bH, bL+2*bH, 3*bH).
Maybe this lack of variability is what makes you think you don't have much randomness; that is not very clear from the question.
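If you want to be explicit about the GPU random number generation, you can set the seed and the generator yourself. Below is a minimal sketch, assuming the Parallel Computing Toolbox gpurng function; 'Philox4x32-10' is one of the counter-based generators MATLAB documents for use inside arrayfun on the GPU, and the seed value is arbitrary:
% Seed the GPU random stream and pick a counter-based generator, which is
% designed to give each element processed by arrayfun its own substream.
gpurng(12345,'Philox4x32-10');
% Every rand call inside SSP1 now draws from an independent substream, so the
% vA1, vA2, vA3 generated for different agents do not overlap.
Y = MyFunction(v,p,pA,t,bH,bL,a);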
I have used two types of models to model a SISO system from time series data: the first is ARMAX and the second is Output-Error (OE). Now I need to know which of the two performs best at forecasting the output, given the input, over a certain horizon (15 days in my case), using only the observed outputs needed to initialize the model properly.
MATLAB provides two functions that seem to be meant for validating models: forecast() and predict(). I have been reading about the difference between predicting and forecasting, and apparently people confuse the two terms a lot. I would like to know which of the two I should use to validate a model and choose the best one.
The main point is that I have to test each model's performance for many horizons. In other words, how does the model perform when forecasting one day ahead, two days ahead, and so on up to the 15th day ahead? I wrote the following code as an example:
close all
clear all
tic;
uhe = {'furnas'};
% Set the structures to be evaluated in ARMAx model
na = 10;
nb = 2;
nc = 1;
nk = 2;
% Set the structures to be evaluated in OE model
nbb = 10;
nf = 6;
nkk = 0;
u = 1;
% Read training dataset file and set iddata definitions
data_train = importdata(strcat('train_',uhe{u},'.dat'));
data_test = importdata(strcat('test_',uhe{u},'.txt'));
data_valid = importdata(strcat('valid_',uhe{u},'.txt'));
data_complet = vertcat(data_train, data_valid, data_test);
data_complet = iddata(data_complet(:,2),data_complet(:,1));
data_complet.TimeUnit = 'days';
data_complet.InputName = 'Chuva';
data_complet.OutputName = 'Vazão';
data_complet.InputUnit = 'm³/s';
data_complet.OutputUnit = 'm³/s';
data_complet.Name = 'Sistema Chuva-Vazão';
data_train = iddata(data_train(:,2),data_train(:,1));
data_train.TimeUnit = 'days';
data_train.InputName = 'Chuva';
data_train.OutputName = 'Vazão';
data_train.InputUnit = 'm³/s';
data_train.OutputUnit = 'm³/s';
data_train.Name = 'Sistema Chuva-Vazão';
data_valid = iddata(data_valid(:,2),data_valid(:,1));
data_valid.TimeUnit = 'days';
data_valid.InputName = 'Chuva';
data_valid.OutputName = 'Vazão';
data_valid.InputUnit = 'm³/s';
data_valid.OutputUnit = 'm³/s';
data_valid.Name = 'Sistema Chuva-Vazão';
data_test = iddata(data_test(:,2),data_test(:,1));
data_test.TimeUnit = 'days';
data_test.InputName = 'Chuva';
data_test.OutputName = 'Vazão';
data_test.InputUnit = 'm³/s';
data_test.OutputUnit = 'm³/s';
data_test.Name = 'Sistema Chuva-Vazão';
% Modeling training dataset with ARMAx
models_train_armax = armax(data_train,[na nb nc nk]);
% Modeling training dataset with OE
models_train_oe = oe(data_train,[nbb nf nkk]);
% Evaluating the validation dataset ARMAX
x0 = findstates(models_train_armax,data_valid);
OPT = simOptions('InitialCondition',x0);
ssmodel_armax=idss(models_train_armax);
models_valid_armax = sim(ssmodel_armax,data_valid,OPT);
% Evaluating the validation dataset OE
x0 = findstates(models_train_oe,data_valid);
OPT = simOptions('InitialCondition',x0);
ssmodel_oe=idss(models_train_oe);
models_valid_oe = sim(ssmodel_oe,data_valid,OPT);
% Predicting Horizon
hz = 20;
% Applying predict function
opt = predictOptions('InitialCondition','e');
[y_armax_pred] = predict(ssmodel_armax,data_valid(1:end),hz,opt);
[y_oe_pred] = predict(ssmodel_oe,data_valid(1:end),hz,opt);
% Applying forecast function
opt = forecastOptions('InitialCondition','e');
[y_armax_fc] = forecast(ssmodel_armax,data_train((end-max([na nb nc nk])):end),hz,data_test.u(1:hz),opt);
[y_oe_fc] = forecast(ssmodel_oe,data_train((end-max([nbb nf nkk])):end),hz,data_test(1:hz),opt);
It depends on how you are trying to validate the model. Generally you would use the predict command, since you would want to backtest against previous data.
Alternatively, you could use forecast if you have a cross-validation/holdout sample that you would like to test against.
MATLAB's help has an interesting passage on the difference between forecast and predict:
forecast performs prediction into the future, in a time range beyond the last instant of measured data. In contrast, the predict command predicts the response of an identified model over the time span of measured data. Use predict to determine if the predicted result matches the observed response of an estimated model. If sys is a good prediction model, consider using it with forecast.
Note that MATLAB's help for predict also says that careful model validation should not use the default value of the prediction horizon:
For careful model validation, a one-step-ahead prediction (K = 1) is usually not a good test for validating the model sys over the time span of measured data. Even the trivial one step-ahead predictor, y(hat)(t)=y(t−1), can give good predictions. So a poor model may look fine for one-step-ahead prediction of data that has a small sample time. Prediction with K = Inf, which is the same as performing simulation with sim command, can lead to diverging outputs because low-frequency disturbances in the data are emphasized, especially for models with integration. Use a K value between 1 and Inf to capture the mid-frequency behavior of the measured data.
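To compare the two models at every horizon from 1 to 15 days, one option is to loop over the horizon and call predict with K = k on the validation data, then compute an error measure per horizon. A minimal sketch under that assumption, reusing the variable names from the code above (the RMSE metric is only an example):
% k-step-ahead prediction error on the validation set, k = 1..15
horizons = 1:15;
rmse_armax = zeros(size(horizons));
rmse_oe = zeros(size(horizons));
for k = horizons
    y_armax_k = predict(models_train_armax,data_valid,k);   % k-step-ahead prediction
    y_oe_k = predict(models_train_oe,data_valid,k);
    rmse_armax(k) = sqrt(mean((data_valid.y - y_armax_k.y).^2));
    rmse_oe(k) = sqrt(mean((data_valid.y - y_oe_k.y).^2));
end
% The model with the lower error at the horizons you care about is the better forecaster.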
I am using the linprog function in Matlab to solve a set of large linear programming problems. I have 2601 decision variables, 51 inequality constraints, 71 equality constraints, and lower bounds of 0 for all variables.
The coefficients in the objective function and constraints vary between problems. I am using the simplex method (when I try active-set or interior-point, the program never stops running for as long as I wait, which has been several hours).
The simplex method converges for some of the problems very quickly, and for some of them (also very quickly) shows this message:
Exiting: The constraints are overly stringent; no feasible starting point found.
However, even for the ones with that message, it still provides a solution which satisfies the constraints. Can I just ignore that message and use the solutions, or is the message important and the solution probably not optimal?
Update: It turned out that the interior-point method solves some of them, but not the others. So in the code below, I used the interior-point method for the problems that work with it, and the simplex method for the rest.
These are my files and this is my code:
clc; clear;
%distances
t1 = readtable('t.xlsx', 'ReadVariableNames',false);
ti = table2array(t1);
sz = size(ti);
tiv = reshape(ti, [1,sz(1)*sz(2)]);
%crude oil production and attraction
A = readtable('A.xlsx', 'ReadVariableNames',false);
Ai = table2array(A);
P = readtable('P.xlsx', 'ReadVariableNames',false);
Pi = table2array(P);
%others
one1 = readtable('A Matrix.xlsx', 'ReadVariableNames',false);
one = table2array(one1);
two1 = readtable('Aeq Matrix.xlsx', 'ReadVariableNames',false);
two = table2array(two1);
zero = zeros(sz(1), sz(1));
infin = inf(sz(1), sz(1));
zerov = reshape(zero, [1,sz(1)*sz(2)]);
infinv = reshape(infin, [1,sz(1)*sz(2)]);
%OF
f = (tiv).^1;
%linear program
%x = linprog(f,A,b,Aeq,beq,lb,ub)
options1 = optimoptions('linprog','Algorithm','interior-point');
options2 = optimoptions('linprog','Algorithm','simplex');
x1999 = vec2mat(linprog(f,one,Pi(1,1:end),two,Ai(1,1:end),zerov,infinv,zerov,options2),sz(1));
x2000 = vec2mat(linprog(f,one,Pi(2,1:end),two,Ai(2,1:end),zerov,infinv,zerov,options1),sz(1));
x2001 = vec2mat(linprog(f,one,Pi(3,1:end),two,Ai(3,1:end),zerov,infinv,zerov,options1),sz(1));
x2002 = vec2mat(linprog(f,one,Pi(4,1:end),two,Ai(4,1:end),zerov,infinv,zerov,options1),sz(1));
x2003 = vec2mat(linprog(f,one,Pi(5,1:end),two,Ai(5,1:end),zerov,infinv,zerov,options1),sz(1));
x2004 = vec2mat(linprog(f,one,Pi(6,1:end),two,Ai(6,1:end),zerov,infinv,zerov,options1),sz(1));
x2005 = vec2mat(linprog(f,one,Pi(7,1:end),two,Ai(7,1:end),zerov,infinv,zerov,options1),sz(1));
x2006 = vec2mat(linprog(f,one,Pi(8,1:end),two,Ai(8,1:end),zerov,infinv,zerov,options1),sz(1));
x2007 = vec2mat(linprog(f,one,Pi(9,1:end),two,Ai(9,1:end),zerov,infinv,zerov,options2),sz(1));
x2008 = vec2mat(linprog(f,one,Pi(10,1:end),two,Ai(10,1:end),zerov,infinv,zerov,options2),sz(1));
x2009 = vec2mat(linprog(f,one,Pi(11,1:end),two,Ai(11,1:end),zerov,infinv,zerov,options2),sz(1));
x2010 = vec2mat(linprog(f,one,Pi(12,1:end),two,Ai(12,1:end),zerov,infinv,zerov,options2),sz(1));
x2011 = vec2mat(linprog(f,one,Pi(13,1:end),two,Ai(13,1:end),zerov,infinv,zerov,options2),sz(1));
x2012 = vec2mat(linprog(f,one,Pi(14,1:end),two,Ai(14,1:end),zerov,infinv,zerov,options1),sz(1));
x2013 = vec2mat(linprog(f,one,Pi(15,1:end),two,Ai(15,1:end),zerov,infinv,zerov,options2),sz(1));
x2014 = vec2mat(linprog(f,one,Pi(16,1:end),two,Ai(16,1:end),zerov,infinv,zerov,options2),sz(1));
x2015 = vec2mat(linprog(f,one,Pi(17,1:end),two,Ai(17,1:end),zerov,infinv,zerov,options2),sz(1));
x2016 = vec2mat(linprog(f,one,Pi(18,1:end),two,Ai(18,1:end),zerov,infinv,zerov,options1),sz(1));
In case somebody wants to know what the problem was: for the problems that produced the message, there really was no feasible point, and what the message said was correct. I found this out by running the same linear programs with a vector of zeros as the objective function coefficients and getting the same message (a method recommended in MATLAB's documentation).
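For reference, here is a minimal sketch of that feasibility check, reusing the matrices from the code above (the year index i and the exitflag handling are only illustrative): with a zero objective, linprog only has to find a feasible point, and exitflag reports whether one exists.
% Feasibility check for problem i: zero objective, same constraints
i = 1;                                   % pick any of the yearly problems
fzero = zeros(size(f));
[~,~,exitflag] = linprog(fzero,one,Pi(i,1:end),two,Ai(i,1:end),zerov,infinv,zerov,options1);
if exitflag == 1
    disp('A feasible point exists; the earlier message deserves a closer look.');
elseif exitflag == -2
    disp('No feasible point exists; the "overly stringent" message was correct.');
end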
I am using PCA to reduce the number of features before training a Random Forest. I first used around 70 principal components out of 125, which captured around 99% of the energy (according to the eigenvalues). I got much worse results after training the Random Forest on the transformed features. After that I used all the principal components and got the same results as with 70. This made no sense to me, since it is the same feature space, only in a different basis (the space has only been rotated, so that should not affect the decision boundary).
Does anyone have an idea what the problem may be here?
Here is my code:
clc;
clear all;
close all;
load patches_training_256.txt
load patches_testing_256.txt
Xtr = patches_training_256(:,2:end);
Xtr = Xtr';
Ytr = patches_training_256(:,1);
Ytr = Ytr';
Xtest = patches_testing_256(:,2:end);
Xtest = Xtest';
Ytest = patches_testing_256(:,1);
Ytest = Ytest';
data_size = size(Xtr, 2);
feature_size = size(Xtr, 1);
% Standardize the training data (zero mean, unit variance per feature)
mu = mean(Xtr,2);
sigma = std(Xtr,0,2);
mu_mat = repmat(mu,1,data_size);
sigma_mat = repmat(sigma,1,data_size);
% Covariance of the standardized data (note: this shadows the built-in cov function)
cov = ((Xtr - mu_mat)./sigma_mat) * ((Xtr - mu_mat)./sigma_mat)' / data_size;
% Eigendecomposition; eig returns eigenvalues in ascending order, so the last
% columns of v correspond to the largest eigenvalues
[v d] = eig(cov);
%[U S V] = svd(((Xtr - mu_mat)./sigma_mat)');
k = 124;
%Ureduce = U(:,1:k);
%XtrReduce = ((Xtr - mu_mat)./sigma_mat) * Ureduce;
% Project the standardized data onto all principal directions (a full rotation)
XtrReduce = v'*((Xtr - mu_mat)./sigma_mat);
B = TreeBagger(300, XtrReduce', Ytr', 'Prior', 'Empirical', 'NPrint', 1);
data_size_test = size(Xtest, 2);
mu_test = repmat(mu,1,data_size_test);
sigma_test = repmat(sigma,1,data_size_test);
XtestReduce = v' * ((Xtest - mu_test) ./ sigma_test);
Ypredict = predict(B,XtestReduce');
% cell2mat gives character labels; subtracting 48 converts '0'..'9' to numbers
% (note: this variable shadows MATLAB's built-in error function)
error = sum(Ytest' ~= (double(cell2mat(Ypredict)) - 48))
A random forest depends heavily on the choice of basis. It is not a linear model, which would be (up to normalization) rotation invariant; an RF completely changes its behaviour once you "rotate the space". The reason is that it uses decision trees as base classifiers, which analyze each feature completely independently, so as a result it cannot exploit linear combinations of features. Once you rotate your space, you change the "meaning" of the features. There is nothing wrong with that; tree-based classifiers are simply a rather poor choice to apply after such transformations. Use feature selection methods instead (methods which select valuable features without creating linear combinations). In fact, RFs themselves can be used for this task thanks to their internal "feature importance" computation, as sketched below.
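Here is a minimal sketch of that idea with TreeBagger, applied to the original (unrotated) standardized features from the question. It assumes the 'OOBPredictorImportance' option and the OOBPermutedPredictorDeltaError property (in older releases these are named 'OOBVarImp' and OOBPermutedVarDeltaError); the top-70 cut-off is only an example:
% Train on the original standardized features and request out-of-bag importance
Xstd = ((Xtr - mu_mat)./sigma_mat)';                % observations in rows
Bimp = TreeBagger(300, Xstd, Ytr', 'OOBPredictorImportance', 'on');
% Rank features by permutation importance and keep, e.g., the top 70
[~, order] = sort(Bimp.OOBPermutedPredictorDeltaError, 'descend');
topFeatures = order(1:70);
Bsel = TreeBagger(300, Xstd(:, topFeatures), Ytr', 'Prior', 'Empirical');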
There is already a MATLAB function, princomp (superseded by pca in newer releases), which does PCA for you. I would suggest not reimplementing it and falling into numerical-error traps; they have already done it for us. :)
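A minimal sketch of that suggestion, assuming the Statistics Toolbox pca function and mirroring the 70-component cut-off from the question:
% pca expects observations in rows; standardize with zscore first
[Ztr, mu_z, sigma_z] = zscore(Xtr');
[coeff, score, ~, ~, explained] = pca(Ztr);
k = 70;                                 % or: k = find(cumsum(explained) >= 99, 1);
XtrReduce = score(:, 1:k);              % training data in the top-k principal directions
% Apply the same centering, scaling and rotation to the test data
Ztest = bsxfun(@rdivide, bsxfun(@minus, Xtest', mu_z), sigma_z);
XtestReduce = Ztest * coeff(:, 1:k);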