seed that controls the order of a random function in Matlab

I used Matlab kmeans function to do clustering for two datasets: data1 and data2.
I have three main files, containing the following code respectively:
% File 1
result1 = kmeans(data1, 4);
result2 = kmeans(data2, 4);
% File 2
r1 = kmeans(data1, 4);
% File 3
r2 = kmeans(data2, 4);
I noticed that result1 and r1 are the same, but result2 and r2 are slightly different. I believe this is caused by the randomness in the kmeans algorithm. In the 1st and 2nd files, data1 is clustered first, so those kmeans calls start from the same random generator state ("seed"). In the 1st and 3rd files, however, data2 is clustered at different points: in the 1st file, the kmeans call that produces result1 advances the generator state before data2 is processed.
My question is: can we set the seed in such a way that r2 and result2 are the same?

You can control random number generation in MATLAB using the rng function. With it, you can capture the state of the random number generator before running your code, then set the random number generator back to that state before you run it again, ensuring you get the same results. For example:
rngState1 = rng; % Capture state before processing data1
result1 = kmeans(data1, 4);
rngState2 = rng; % Capture state before processing data2
result2 = kmeans(data2, 4);
...
rng(rngState1); % Restore state previously used for processing data1
r1 = kmeans(data1,4);
...
rng(rngState2); % Restore state previously used for processing data2
r2 = kmeans(data2,4);
Since you're processing data in separate files, this might mean saving and loading the state variables to and from a MAT-file to do what I've outlined above. Another option is simply to set the seed to a given value before processing each data set:
rng(1); % Set seed to 1 for data1
result1 = kmeans(data1, 4);
rng(2); % Set seed to 2 for data2
result2 = kmeans(data2, 4);
...
rng(1);
r1 = kmeans(data1,4);
...
rng(2);
r2 = kmeans(data2,4);
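If you prefer the state-capture approach across separate files, a minimal sketch of saving the captured states to a MAT-file and loading them back might look like this (the file name rngStates.mat is just an example):
% In the file that runs first: capture the generator state before each call
rngState1 = rng;
result1 = kmeans(data1, 4);
rngState2 = rng;
result2 = kmeans(data2, 4);
save('rngStates.mat', 'rngState1', 'rngState2'); % example file name

% In the file that only processes data2: restore the matching state first
load('rngStates.mat', 'rngState2');
rng(rngState2);
r2 = kmeans(data2, 4);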

Another alternative is to use non-random initialization:
start = data1(1:4,:); % This is not necessarily a good initialization!
result1 = kmeans(data1, 4, 'Start',start);
Don't copy-paste the code above; it is just for illustrative purposes. You might have a good strategy to initialize your means non-randomly, but how to do this depends on your data. For example, for 2-D data within a rectangular domain you could select the four corners of the domain.
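As a rough sketch of that idea, assuming data1 is an n-by-2 matrix (the corner choice is purely illustrative):
% Use the four corners of the data's bounding box as deterministic starting centroids
mn = min(data1); % [min_x, min_y]
mx = max(data1); % [max_x, max_y]
start = [mn; mn(1), mx(2); mx(1), mn(2); mx]; % the four corners
result1 = kmeans(data1, 4, 'Start', start);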

Related

What is the purpose of a sequence folding layer in matlab?

When designing a CNN for 1-D time series signal classification in MATLAB, I get an error that the 2-D convolutional layer does not take sequences as input. From my understanding it is perfectly possible to convolve an "array" with a 3x1 filter. To resolve this issue, MATLAB suggests using a "sequence folding layer". What would be the function of such a sequence folding layer and how would the architecture need to be changed?
What would be the function of such a sequence folding layer and how would the architecture need to be changed?
Simply speaking, you're passing in a sequence (or, as you called it, an "array") of images, whereas you need to convert it into a batch of images before performing any convolutional operations.
Documentation about sequenceFoldingLayer:
A sequence folding layer converts a batch of image sequences to a batch of images. Use a sequence folding layer to perform convolution operations on time steps of image sequences independently.
Regarding the usage of a sequenceFoldingLayer (once again suggested to check out the documentation):
To use a sequence folding layer, you must connect the miniBatchSize output to the miniBatchSize input of the corresponding sequence unfolding layer. For an example, see Create Network for Video Classification.
That page also contains examples of how to create a sequence folding layer, e.g. one named "fold1":
layer = sequenceFoldingLayer('Name','fold1')
... as well as examples on how to properly implement it within your project.
What would be the function of such a sequence folding layer and how would the architecture need to be changed?
Experimentally, using sequenceFoldingLayer/sequenceUnfoldingLayer works as follows; the feature outputs of the two networks below (with and without the sequence layers) are consistent.
%% Testing the sequenceFoldingLayer/sequenceUnfoldingLayer internal flow
% Compares feature extraction with and without the sequenceFolding/sequenceUnfolding layers
% Common parameters used below
inputSize = [28 28 1];
filterSize = 5;
numFilters = 20;
numHiddenUnits = 200;
numClasses = 10;
inputData = rand(28,28,1,30);% h*w*c*n
convL = convolution2dLayer(filterSize,numFilters,'Name','conv',...
'Weights',ones(filterSize,filterSize,1,numFilters),...
'Bias',ones(1,1,numFilters));
bnL= batchNormalizationLayer('Name','bn',...
'TrainedVariance',ones(1,1,numFilters),...
'TrainedMean',zeros(1,1,numFilters),...
'Offset',zeros(1,1,numFilters),...
'Scale',ones(1,1,numFilters));
%% first stage, use sequence layer
layers = [ ...
sequenceInputLayer(inputSize,'Name','input')
sequenceFoldingLayer('Name','fold')
convL
bnL
reluLayer('Name','relu')
sequenceUnfoldingLayer('Name','unfold')
flattenLayer('Name','flatten')
lstmLayer(numHiddenUnits,'OutputMode','last','Name','lstm',...
'InputWeights',ones(800,11520),...
'RecurrentWeights',ones(800,200),...
'Bias',ones(800,1));
fullyConnectedLayer(numClasses, 'Name','fc',...
'Weights',ones(numClasses,200),...
'Bias',ones(numClasses,1));
softmaxLayer('Name','softmax')
classificationLayer('Name','classification')];
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,'fold/miniBatchSize','unfold/miniBatchSize');
assembleNet = assembleNetwork(lgraph);
%% second stage: without the sequence layers
% The sequenceFoldingLayer essentially extracts features from each image (frame) independently, as below
layers2 = [ ...
imageInputLayer(inputSize,'Name','input','Normalization','none')
convL
bnL
reluLayer('Name','relu')
fullyConnectedLayer(numClasses, 'Name','fc',...
'Weights',ones(numClasses,11520),...
'Bias',ones(numClasses,1));
softmaxLayer('Name','softmax')
classificationLayer('Name','classification')];
lgraph2 = layerGraph(layers2);
assembleNet2 = assembleNetwork(lgraph2);
%% visualize
analyzeNetwork(lgraph)
analyzeNetwork(lgraph2)
%% comparison
out1 = activations(assembleNet,inputData,'unfold','OutputAs','channels');
out2 = activations(assembleNet2,inputData,'relu','OutputAs','channels');
t1 = out1{1};
t2 = out2;
%% the results are the same
t1(1:10)
t2(1:10)

Training LSTM in keras for classification, with data structure with 60 time steps

I have a multidimensional dataset (3500, 10) containing one binary variable that I want to predict, y (3500, 1). I used the following code to separate X and y and to create a data structure with 60 time steps to use as input for the LSTM network:
import numpy as np

data_set = data_set.as_matrix()  # Using multiple predictors.
X_total = []
y_total = []
n_future = 1   # Number of days you want to predict into the future
n_past = 60    # Number of past days you want to use to predict the future
for i in range(60, len(data_set)):
    X_total.append(data_set[i - n_past:i, :9])
    y_total.append(data_set[i + n_future - 1:i + n_future, 9])
X_total, y_total = np.array(X_total), np.array(y_total)
Then I get X_total (3460, 60, 9) and y_total (3460, 1).
How can I be sure that the network uses the matching y_total for each observation of X_total?
It is kind of confusing: when I look into the X_total data, it seems to start at the first observation of the original data_set, while y_total starts at the 60th.
How can I check this?

how to determine the time constant from the plot

In the following minimal working example,
M=2;
num=1/M;
den=[1 6/M];
G=tf(num,den);
step(G)
The step response of the system is shown (figure omitted). The characteristics displayed on the plot are missing the time constant, which in this case is Tc = M/6 (sec). Is there an option to activate this important characteristic? The time constant is the time it takes the step response to reach 63% of its final value. In this example, the plot's steady-state option shows that the final output is 0.167. To compute the time constant, we basically find the time at which the output reaches 0.167*0.63 = 0.10521; reading this time off the plot matches Tc = M/6 with M = 2. This is a tedious workaround; hopefully there is an option for this.
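(For reference, Tc = M/6 follows from writing G(s) in standard first-order form: G(s) = (1/M)/(s + 6/M) = (1/6)/((M/6)s + 1), so the DC gain is 1/6 ≈ 0.167 and the time constant is M/6, i.e. 1/3 s for M = 2.)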
You don't really need to plot the output of the step function to find Tc:
M=2;
num=1/M;
den=[1 6/M];
G=tf(num,den);
tvect = 0:0.0001:3; % provide any time limits you want;
% the smaller time increment the higher "accuracy"
[val, t] = step(G, tvect);
idx_Tc = find(val>=0.63*max(val), 1, 'first');
Tc = t(idx_Tc); % Tc which you are looking for
val_Tc = val(idx_Tc); % value at Tc
If you need a nice plot, you can easily create it afterwards, since you already have all required values (t, val, Tc, val_Tc).
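For example, a minimal plotting sketch using only base graphics calls (nothing specific to the Control System Toolbox):
% Plot the step response and mark the time constant on it
plot(t, val, 'b-', Tc, val_Tc, 'ro');
xlabel('Time (s)');
ylabel('Amplitude');
title(['Tc = ' num2str(Tc) ' s']);
grid on;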
EDIT:
You can extend the context menu following the example below:
M=2;
num=1/M;
den=[1 6/M];
G=tf(num,den);
step(G);
f = gcf;
c = f.CurrentAxes.UIContextMenu;
uimenu(c,'Label','Find Tc','Callback',@findTc);
f.CurrentAxes.UIContextMenu = c;
where function findTc is defined as:
function findTc(~,callbackdata)
t = callbackdata.Source.Parent.Parent.CurrentAxes.Children(1).Children(2).XData;
val = callbackdata.Source.Parent.Parent.CurrentAxes.Children(1).Children(2).YData;
idx_Tc = find(val>=0.63*max(val),1, 'first');
Tc = t(idx_Tc);
val_Tc = val(idx_Tc);
disp(['Tc: ' num2str(Tc) ' val_Tc: ' num2str(val_Tc)])
end
Instead of printing the values in the command window you can annotate the plot, but I would say that's a rather cosmetic change. The code above shows how you can have this feature as an option in the context menu.

basic help using hmm to classify a sequence

I am very new to MATLAB, hidden Markov models and machine learning, and am trying to classify a given sequence of signals. Please let me know if the approach I have followed is correct:
1. Create an N-by-N transition matrix and fill it with random values which sum to 1 for each row (N will be the number of states).
2. Create an N-by-M emission/observation matrix and fill it with random values which sum to 1 for each row.
3. Convert different instances of the sequence (i.e. each instance will be saying the word 'hello') into one long stream and feed each stream to the hmmtrain function, such that:
[new_transition_matrix, new_emission_matrix] = hmmtrain(sequence, old_transition_matrix, old_emission_matrix)
4. Give the final transition and emission matrices to hmmdecode with an unknown sequence to get the probability, i.e.:
[posterior_states, logarithmic_probability] = hmmdecode(sequence, final_transition_matrix, final_emission_matrix)
1. and 2. are correct. You have to be careful that your initial transition and emission matrices are not completely uniform; they should be slightly randomized for the training to work.
3. I would just feed in the 'Hello' sequences separately rather than concatenating them to form a single long sequence.
Let's say this is the sequence for Hello: [1,0,1,1,0,0]. If you form one long sequence from 3 'Hello' sequences, you would get:
data = [1,0,1,1,0,0,1,0,1,1,0,0,1,0,1,1,0,0]
This is not ideal, instead you should feed the sequences in separately like:
data = [1,0,1,1,0,0; 1,0,1,1,0,0; 1,0,1,1,0,0].
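As a side note (separate from the toolbox recommended below), if I recall the documentation correctly, the built-in hmmtrain can also take several training sequences at once, either as a matrix with one sequence per row or as a cell array of row vectors. A rough sketch with made-up symbol values (symbols must be integers from 1 to the number of emission symbols):
% Sketch: train on three 'Hello' observation sequences passed as a cell array
seqs = {[1 2 1 1 2 2], [1 2 1 1 2 2], [1 2 1 1 2 2]};
[new_transition_matrix, new_emission_matrix] = hmmtrain(seqs, old_transition_matrix, old_emission_matrix);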
Since you are using MatLab, I would recommend using the HMM toolbox by Murphy. It has a demo on how you can train an HMM with multiple observation sequences:
M = 3;
N = 2;
% "true" parameters
prior0 = normalise(rand(N,1));
transmat0 = mk_stochastic(rand(N,N));
obsmat0 = mk_stochastic(rand(N,M));
% training data: a 5*6 matrix, e.g. 5 different 'Hello' sequences of length 6
number_of_seq = 5;
seq_len= 6;
data = dhmm_sample(prior0, transmat0, obsmat0, number_of_seq, seq_len);
% initial guess of parameters
prior1 = normalise(rand(N,1));
transmat1 = mk_stochastic(rand(N,N));
obsmat1 = mk_stochastic(rand(N,M));
% improve guess of parameters using EM
[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, obsmat1, 'max_iter', 5);
LL
4. What you say is correct; below is how you calculate the log probability in the HMM toolbox:
% use model to compute log[P(Obs|model)]
loglik = dhmm_logprob(data, prior2, transmat2, obsmat2)
Finally: Have a look at this paper by Rabiner on how the mathematics work if anything is unclear.
Hope this helps.

MatLab BayesNetToolbox parameter learning

My question is specific to the "learn_params()" function of the BayesNetToolbox in MatLab. In the user manual, "learn_params()" is stated to be suitable for use only if the input data is fully observed. I have tried it with a partially observed dataset where I represented unobserved values as NaN's.
It seems like "learn_params()" can deal with NaNs and with node state combinations that do not occur in the dataset. When I apply Dirichlet priors to smooth the zero values, I get 'sensible' MLE distributions for all nodes. I have copied the script where I do this below.
Can someone clarify whether what I am doing makes sense, or whether I am missing something, i.e. the reason why "learn_params()" cannot be used with partially observed data?
The MatLab Script where I test this is here:
% Incomplete dataset (where NaN's are unobserved)
Age = [1,2,2,NaN,3,3,2,1,NaN,2,1,1,3,NaN,2,2,1,NaN,3,1];
TNMStage = [2,4,2,3,NaN,1,NaN,3,1,4,3,NaN,2,4,3,4,1,NaN,2,4];
Treatment = [2,3,3,NaN,2,NaN,4,4,3,3,NaN,2,NaN,NaN,4,2,NaN,3,NaN,4];
Survival = [1,2,1,2,2,1,1,1,1,2,2,1,2,2,1,2,1,2,2,1];
matrixdata = [Age;TNMStage;Treatment;Survival];
node_sizes =[3,4,4,2];
% Enter the variablesmap
keys = {'Age', 'TNM','Treatment', 'Survival'};
v= 1:1:length(keys);
VariablesMap = containers.Map(keys,v);
% create the dag and the bnet
N = length(node_sizes); % Instead of entering it manually
dag2 = zeros(N,N);
dag2(VariablesMap('Treatment'),VariablesMap('Survival')) = 1;
bnet23 = mk_bnet(dag2, node_sizes);
draw_graph(bnet23.dag);
dirichletweight=1;
% define the CPD priors you want to use
bnet23.CPD{VariablesMap('Age')} = tabular_CPD(bnet23, VariablesMap('Age'), 'prior_type', 'dirichlet','dirichlet_type', 'unif', 'dirichlet_weight', dirichletweight);
bnet23.CPD{VariablesMap('TNM')} = tabular_CPD(bnet23, VariablesMap('TNM'), 'prior_type', 'dirichlet','dirichlet_type', 'unif', 'dirichlet_weight', dirichletweight);
bnet23.CPD{VariablesMap('Treatment')} = tabular_CPD(bnet23, VariablesMap('Treatment'), 'prior_type', 'dirichlet','dirichlet_type', 'unif','dirichlet_weight', dirichletweight);
bnet23.CPD{VariablesMap('Survival')} = tabular_CPD(bnet23, VariablesMap('Survival'), 'prior_type', 'dirichlet','dirichlet_type', 'unif','dirichlet_weight', dirichletweight);
% Find MLEs from incomplete data with Dirichlet prior CPDs
bnet24 = learn_params(bnet23, matrixdata);
% Look at the new CPT values after parameter estimation has been carried out
CPT24 = cell(1,N);
for i=1:N
s=struct(bnet24.CPD{i}); % violate object privacy
CPT24{i}=s.CPT;
end
According to my understanding of the BNT documentation, you need to make a couple of changes:
Missing values should be represented as empty cells instead of NaN values.
The learn_params_em function is the only one that supports missing values.
My previous response was incorrect, as I mis-recalled which of the BNT learning functions had support for missing values.
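A rough sketch of what that could look like for the script above, assuming the usual BNT pattern of an inference engine plus a nodes-by-cases cell array (the iteration count is an arbitrary choice here):
% Convert the numeric data to a nodes-by-cases cell array, with missing
% values represented as empty cells rather than NaNs
cases = num2cell(matrixdata);
cases(isnan(matrixdata)) = {[]};
% EM-based parameter learning needs an inference engine for the E-step
engine = jtree_inf_engine(bnet23);
[bnet24, LLtrace] = learn_params_em(engine, cases, 10); % 10 EM iterations (arbitrary)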