I'm using MATLAB R2014a.
I have ten clusters of X and y data.
I want to fit a model to each of these 10 clusters using the neural network tool in MATLAB, and I want to save the 10 different models somewhere.
For each cluster, I need to determine the correct number of hidden layers, save the model into an array or something like that, and then continue with the next cluster.
For this purpose, I have written this code:
for q = 1:z % number of clusters
mdl = fitnet( 10 );
mdl = train( mdl, X( classes == q ), y( classes == q ) );
view( mdl );
yy = net( X( classes == q ) );
perf = perform( net, yy, y( classes == q ) );
model( q ).mdl = mdl;
clear mdl;
end
When I run this code, I get this error:
Error using view (line 67)
Invalid input arguments
Error in Main (line 97)
view(mdl);
How can I fix the problem?
Thanks,
Contrary to what was mentioned in the comments, view() is the right function to use here, because it has been overloaded to also show a sketch of a neural network (see here: http://www.mathworks.com/help/nnet/ref/view.html).
So the problem is obviously not view() itself but your mdl network, which means you should:
step through with the debugger and check whether it really is a neural network and whether it contains values,
check those values, because X and y might not be the vectors you want (which you should also check),
and/or post more information about what's going on in your code.
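If mdl itself turns out to be fine, note that the posted loop also calls net(...) and perform(net, ...) with an undefined variable net where mdl was presumably meant. A minimal sketch of a corrected loop, under the assumption (mine, not from the question) that X holds one sample per column, y one target per column, and classes assigns a cluster index to each column:
for q = 1:z % number of clusters
    mdl = fitnet( 10 );
    mdl = train( mdl, X( :, classes == q ), y( :, classes == q ) );
    view( mdl );                                   % draw the network diagram
    yy = mdl( X( :, classes == q ) );              % use mdl, not an undefined net
    perf = perform( mdl, yy, y( :, classes == q ) );
    model( q ).mdl = mdl;                          % store each trained network
end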
I need to plot figures with subplots inside a parfor-loop, similar to this question (which deals more with the quality of the plots).
My code looks something like this:
parfor idx=1:numel(A)
N = A(idx);
fig = figure();
ax = subplot(3,1,1);
plot(ax, ...);
...
saveas(fig,"...",'fig');
saveas(fig,"...",'png');
end
This gives a weird error:
Data must be numeric, datetime, duration or an array convertible to double.
I am sure that the problem does not lie in non-numeric data as the same code without parallelization works.
At this point I expected an error, because threads will concurrently create and access figure and axes objects, and I do not think it is ensured that the handles always correspond to the right object (threads are "cross-plotting", so to speak).
If I pre-initialize the objects and then access them like this,
ax = cell(1,numel(A)); % or ax = zeros(1,numel(A));
ax(idx) = subplot(3,1,1);
I get even weirder errors somewhere in the fit-calls I use:
Error using curvefit.ensureLogical>iConvertSubscriptIndexToLogical (line 26)
Excluded indices must be nonnegative integers that reference the fit's input data points
Error in curvefit.ensureLogical (line 18)
exclude = iConvertSubscriptIndexToLogical(exclude, nPoints);
Error in cfit/plot (line 46)
outliers = curvefit.ensureLogical( outliers, numel( ydata ) );
I have the feeling it has something to do with the variable slicing described in the documentation, I just can't quite figure out how.
I was able to narrow the issue down to a fit routine I was using.
TLDR: Do not use fitobjects (cfit or sfit) for plots in a parfor-loop!
Solutions:
Use functions like nlinfit() or lsqcurvefit() instead of fit(). They give you the fit parameters directly, so you can call your fit function with them when plotting (a short sketch follows after the example below).
If you have to use fit() (for some reason it was the only one able to fit my data more or less consistently), extract the fit parameters and then call your fit function using cell expansion:
fitfunc = @(a,b,c,d,e,x) ( ... );
[fitobject,gof,fitinfo] = fit(x,y,fitfunc,fitoptions(..));
vFitparam = coeffvalues(fitobject);
vFitparam_cell = num2cell(vFitparam);
plot(ax,x,fitfunc(vFitparam_cell{:},x), ... );
As far as I know, fit() requires the function handle to take its coefficients as separate arguments (not as a single vector), so by expanding a cell array you can avoid bloated code like this:
plot(ax,x,fitfunc(vFitparam(1),vFitparam(2),vFitparam(3),vFitparam(4),vFitparam(5),x), ... );
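For the first option, a minimal sketch of the nlinfit() route (Statistics Toolbox); the model, parameter names and start values here are placeholders, not the original fit function:
modelfun = @(b,x) b(1)*exp(-b(2)*x) + b(3);   % illustrative model
beta0 = [1, 0.1, 0];                          % illustrative start values
beta = nlinfit(x, y, modelfun, beta0);        % returns plain numeric parameters
plot(ax, x, modelfun(beta, x));               % no cfit/sfit object enters the parfor loop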
I implemented a multilayer perceptron with 1 hidden layer on the MNIST dataset. The activation function in the hidden layer is leaky (0.01) ReLU and the output layer has a softmax activation function. The learning method is mini-batch SGD. The network structure is 784*30*10. The problem is that the predictions the network makes are quite similar for every input sample: the model always tends to claim the image is some particular number. Thanks @Lemm Ras for pointing out the label-data mismatch in the previous data_shuffle function, which is now fixed. But after some batch training I found the predictions are still quite similar, which is confusing.
Another issue is that the update values are very small compared to the original weights. In the MLP code I added the variables 'cc' and 'dd' to record the ratio between each weight update and the corresponding weight:
cc=W_OUTPUT_Update./W_OUTPUT;
dd=W_MLP_Update./W_MLP;
During debugging, the magnitude of cc is 10^-4 (0.0001) and dd is also 10^-4. This might be the reason the accuracy doesn't seem to improve much.
After several days of debugging I have no idea why this happens or how to solve it; it has kept me stuck for a week. Can someone help me, please?
The screenshot shows the values of A2 after the softmax function.
[dimension, images, labels, labels_matrix, train_amount, test_labels_matrix, test_images, test_labels, test_amount] = load_mnist_data(); %initialize str
images=images(:,1:10000); % for debugging, get part of whole data set
labels=labels(1:10000,1);
labels_matrix=labels_matrix(:,1:10000);
test_images=test_images(:,1:500);
test_labels=test_labels(1:500,1);
train_amount=10000;
test_amount=500;
% initialize the structure
[ W_MAD, W_MLP, W_OUTPUT] = initialize_structure(dimension, train_amount, test_amount);
epoch=100;
correct_rate=zeros(1,epoch); %record testing accuracy
corr=zeros(1,epoch); %record training accuracy
lr=0.2;
lamda=0;
batch_size=50;
for i=1:epoch
sprintf('MLP in iteration %d over %d', i, epoch)
%shuffle data
[labels_shuffled labels_matrix_shuffled images_shuffled]=shuffle_data(labels, labels_matrix,images);
[ cor, W_MLP, W_OUTPUT ] = train_mlp_relu(lr, leaky, lamda, momentum_gamma, batch_size,W_MLP, W_OUTPUT, W_MAD, power, images_shuffled, train_amount, labels_shuffled, labels_matrix_shuffled);
corr(i)=cor/train_amount;
% test
correct_rate(i) = structure_test( W_MAD, W_MLP, W_OUTPUT, test_images, test_labels, test_amount );
end
% plot results
plot(1:epoch,correct_rate);
Here's the training MLP function; please ignore the L2 regularization parameter lamda, which is currently set to 0.
%MLP with batch size batch_size
cor=0;
%leaky=(1/batch_size);
leaky=0.001;
for i=1:train_amount/batch_size
batch_images=images(:,batch_size*(i-1)+1:batch_size*i);
batch_labels=labels_matrix(:,batch_size*(i-1)+1:batch_size*i);
%from MAD to MLP
V1=W_MLP'*batch_images;
V1(1,:)=1; %set bias unit as 1
V1_dirivative=ones(size(V1));
V1_dirivative(find(V1<0))=leaky;
A1=relu(V1,leaky); % A stands for activation
V2=W_OUTPUT'* A1;
A2=softmax(V2);
%write these scope control codes into functions.
%train error
[val idx]=max(A2);
idx=idx-1; %because the matrix index (idx) varies from 1 to 10 while the label varies from 0 to 9.
res=labels(batch_size*(i-1)+1:batch_size*i)-idx';
cor=cor+sum(res(:)==0);
%softmax loss; because of the ReLU, nodes that contributed to
%activating a neuron have gradient 1, while nodes < 0 only
%contribute the leaky slope
delta_softmax=-(1/batch_size)*(batch_labels-A2);
delta_output=W_OUTPUT*delta_softmax.*V1_dirivative;
%update
W_OUTPUT_Update=lr*(1/batch_size)*A1*delta_softmax'+lamda*W_OUTPUT;
cc=W_OUTPUT_Update./W_OUTPUT;
W_MLP_Update=lr*(1/batch_size)*batch_images*delta_output'+lamda*W_MLP;
dd=W_MLP_Update./W_MLP;
k=mean(A2,2);
W_OUTPUT=W_OUTPUT-W_OUTPUT_Update;
W_MLP=W_MLP-W_MLP_Update;
end
end
Here is the softmax function:
function [ val ] = softmax( val )
val=exp(val);
val=val./repmat(sum(val),10,1);
end
labels_matrix is the target output matrix for A2 and is created as:
labels_matrix=full(sparse(labels+1,1:train_amount,1));
test_labels_matrix=full(sparse(test_labels+1,1:test_amount,1));
And ReLU:
function [ val ] = relu( val,leaky )
val(find(val<0))=leaky*val(find(val<0));
end
Data shuffle
%this version is wrong: it only shuffles the labels and the data without applying the same shuffling to 'labels_matrix', which is used to calculate the MLP's delta in the output layer. That destroyed the link between data and labels.
% function [ label, data ] = shuffle_data( label, data )
% [row column]=size(data);
% array=randperm(column);
% data=data(:,array);
% label=label(array);
% %if shuffle respect to row then use the code below
% %data=data(randperm(row),:);
% end
function [ label, label_matrix, data ] = shuffle_data( label, label_matrix, data )
[row column]=size(data);
array=randperm(column);
data=data(:,array);
label=label(array);
label_matrix=label_matrix(:, array);
%if shuffle respect to row then use the code below
%data=data(randperm(row),:);
end
Data loading:
function [ dimension, images, labels, labels_matrix, train_amount, test_labels_matrix, test_images, test_labels, test_amount] = load_mnist_data()
%%load training and testing data, labels
data_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/train-images.idx3-ubyte';
label_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/train-labels.idx1-ubyte';
test_data_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/t10k-images.idx3-ubyte';
test_label_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/t10k-labels.idx1-ubyte';
images = loadMNISTImages(data_location);
labels = loadMNISTLabels(label_location);
test_images=loadMNISTImages(test_data_location);
test_labels=loadMNISTLabels(test_label_location);
%%data centralization
[dimension train_amount]=size(images);
[dimension test_amount]=size(test_images);
%%complete normalization
%%transform labels from index to matrix in order to apply square loss function in output layer
labels_matrix=full(sparse(labels+1,1:train_amount,1));
test_labels_matrix=full(sparse(test_labels+1,1:test_amount,1));
end
When you shuffle the images, the association between data and labels is lost. Since this association must survive, you need to enforce the same shuffling for both the data and the labels.
To do so you could, for instance, create an external shuffled index list, shuffled=randperm(N), with N the number of images, and then pass to the train method either the list you created or the elements of images and labels addressed by the shuffled list.
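A minimal sketch of that idea (the variable names are illustrative, not taken from the original code):
shuffled = randperm(size(images, 2));                  % one shared permutation of the columns
images_shuffled        = images(:, shuffled);
labels_shuffled        = labels(shuffled);
labels_matrix_shuffled = labels_matrix(:, shuffled);   % same order, so the data-label link survives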
I was trying to do some PCA reconstructions of MNIST in Python and compare them to my (old) reconstructions in MATLAB, and I happened to discover that my reconstructions don't agree. After some debugging I decided to print a unique characteristic of the principal components of each one to reveal whether they were the same, and I discovered to my surprise that they were not. I printed the sum of all components and got different numbers. I did the following in MATLAB:
[coeff, ~, ~, ~, ~, mu] = pca(X_train);
U = coeff(:,1:K)
U_fingerprint = sum(U(:))
%print 31.0244
and in python/scipy:
pca = pca.fit(X_train)
U = pca.components_
print 'U_fingerprint', np.sum(U)
# prints 12.814
Why are the two PCAs not computing the same value?
All my attempts at solving this issue:
The way I discovered this was that when I was reconstructing my MNIST images, the Python reconstructions were much, much closer to the original images. I got an error of 0.0221556788645 in Python, while in MATLAB I got errors of size 29.07578. To figure out where the difference was coming from, I decided to fingerprint the data sets (maybe they were normalized differently). So I got two independent copies of the MNIST data set (both normalized by dividing by 255) and computed their fingerprints (the sum of all numbers in the data set):
print np.sum(x_train) # from keras
print np.sum(X_train)+np.sum(X_cv) # from TensorFlow
6.14628e+06
6146269.1585420668
which are (essentially) the same (one copy comes from the TensorFlow MNIST and the other from the Keras MNIST; note that one train set has about 1000 fewer samples than the other, so you need to append the missing ones). To my surprise, my MATLAB data had the same fingerprint:
data_fingerprint = sum(X_train(:))
% prints data_fingerprint = 6.1463e+06
meaning the data sets are exactly the same. Good, so the normalization of the data is not the issue.
In my MATLAB script I am actually computing the reconstruction manually as follows:
U = coeff(:,1:K)
X_tilde_train = (U * U' * X_train);
train_error_PCA = (1/N_train)*norm( X_tilde_train - X_train ,'fro')^2
%train_error_PCA = 29.0759
so I thought that might be the problem, because I was using the interface Python provides for computing the reconstructions, as in:
pca = PCA(n_components=k)
pca = pca.fit(X_train)
X_pca = pca.transform(X_train) # M_train x K
#print 'X_pca' , X_pca.shape
X_reconstruct = pca.inverse_transform(X_pca)
print 'tensorflow error: ',(1.0/X_train.shape[0])*LA.norm(X_reconstruct_tf - X_train)
print 'keras error: ',(1.0/x_train.shape[0])*LA.norm(X_reconstruct_keras - x_train)
#tensorflow error: 0.0221556788645
#keras error: 0.0212030354818
which results in very different error values, 0.022 vs 29.07 - a shocking difference!
Thus, I decided to code that exact reconstruction formula in my Python script:
pca = PCA(n_components=k)
pca = pca.fit(X_train)
U = pca.components_
print 'U_fingerprint', np.sum(U)
X_my_reconstruct = np.dot( U.T , np.dot(U, X_train.T) )
print 'U error: ',(1.0/X_train.shape[0])*LA.norm(X_reconstruct_tf - X_train)
# U error: 0.0221556788645
To my surprise, it has the same error as the error computed using the interface. Thus, I concluded that I don't have the misconception about PCA that I thought I had.
All that led me to check what the principal components actually were, and to my surprise scipy and MATLAB have different fingerprints for their PCA values.
Does anyone know why, or what's going on?
As Warren suggested, the PCA components (eigenvectors) might have different signs. After computing a fingerprint that sums the magnitudes of all components, I discovered they have the same fingerprint:
[coeff, ~, ~, ~, ~, mu] = pca(X_train);
K=12;
U = coeff(:,1:K)
U_fingerprint = sumabs(U(:))
% U_fingerprint = 190.8430
and for python:
k=12
pca = PCA(n_components=k)
pca = pca.fit(X_train)
print 'U_fingerprint', np.sum(np.absolute(U))
# U_fingerprint 190.843
which means the difference must be due to the different signs of the (PCA) U vectors. I find this very surprising; I didn't think the sign would make such a big difference in the sums. I guess I was wrong?
I don't know if this is the problem, but it certainly could be. Principal component vectors are like eigenvectors: if you multiply a vector by -1, it is still a valid PCA vector. Some of the vectors computed by MATLAB might have a different sign than those computed in Python. That will result in very different sums.
For example, the MATLAB documentation has this example:
coeff = pca(ingredients)
coeff =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844
I have my own python PCA code, and with the same input as in matlab, it produces this coefficient array:
[[ 0.0678 0.646 -0.5673 0.5062]
[ 0.6785 0.02 0.544 0.4933]
[-0.029 -0.7553 -0.4036 0.5156]
[-0.7309 0.1085 0.4684 0.4844]]
So, instead of simply summing the coefficient array, try summing the absolute values of the coefficients. Alternatively, ensure that all the vectors have the same sign convention before summing. You could do that by, say, multiplying each column by the sign of the first element in that column (assuming none of them are zero).
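A minimal sketch of the second option in MATLAB (assuming, as noted above, that no column starts with a zero):
s = sign(U(1, :));                         % sign of the first entry of each column
U_signed = U .* repmat(s, size(U, 1), 1);  % flip columns so every first entry is positive
U_fingerprint = sum(U_signed(:))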
After finishing a cluster analysis, when I input some new data, how do I know which cluster the data belongs to?
data(freeny)
library(RSNNS)
options(digits=2)
year<-as.integer(rownames(freeny))
freeny<-cbind(freeny,year)
freeny = freeny[sample(1:nrow(freeny),length(1:nrow(freeny))),1:ncol(freeny)]
freenyValues= freeny[,1:5]
freenyTargets=decodeClassLabels(freeny[,6])
freeny = splitForTrainingAndTest(freenyValues,freenyTargets,ratio=0.15)
km<-kmeans(freeny$inputsTrain,10,iter.max = 100)
kclust=km$cluster
kmeans returns an object containing the coordinates of the cluster centers in $centers. You want to find the cluster whose center the new object is closest to (in terms of the sum of squared coordinate differences, i.e. squared Euclidean distance):
v <- freeny$inputsTrain[1,] # just an example
which.min( sapply( 1:10, function( x ) sum( ( v - km$centers[x,])^2 ) ) )
The above returns 8 - same as the cluster to which the first row of freeny$inputsTrain was assigned.
In an alternative approach, you can first create a clustering and then use supervised machine learning to train a model which you then use for prediction. However, the quality of that model will depend on how well the clustering really represents the data structure and on how much data you have. I have inspected your data with PCA (my favorite tool):
pca <- prcomp( freeny$inputsTrain, scale.= TRUE )
library( pca3d )
pca3d( pca )
My impression is that you have at most 6-7 clear classes to work with.
However, one should run more kmeans diagnostics (elbow plots etc.) to determine the optimal number of clusters:
wss <- sapply( 1:10, function( x ) { km <- kmeans(freeny$inputsTrain,x,iter.max = 100 ) ; km$tot.withinss } )
plot( 1:10, wss )
This plot suggests 3-4 classes as the optimum. For a more complex and informative approach, consult clustergrams: http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/
I have a 1000x2 data file that I'm using for this problem.
I am supposed to fit the data with A*cos(wt + phi). t is time, which is the first column in the data file, i.e. the independent variable. I need to find the fit parameters (A, w, and phi) and their uncertainties.
My code is as follows:
%load initial data file
data = load('hw_fit_cos_problem.dat');
t = data(:,1); %1st column is t (time)
x = t;
y = data(:,2); %2nd column is y (signal strength)
%define fitting function
f = fittype('A*cos(w*x + p)','coefficients','A','problem',{'w','p'});
% check fit parameters
coeffs = coeffnames(f);
%fit data
[A] = fit(x,y,f)
disp('confidence interval/errorbars');
ci = confint(A)
which yields 4 different error messages that I don't understand.
Error Messages:
Error using fit>iAssertNumProblemParameters (line 1113)
Missing problem parameters. Specify the values as a cell array with one element for each problem parameter
in the fittype.
Error in fit>iFit (line 198)
iAssertNumProblemParameters( probparams, probnames( model ) );
Error in fit (line 109)
[fitobj, goodness, output, convmsg] = iFit( xdatain, ydatain, fittypeobj, ...
Error in problem2 (line 14)
[A] = fit(x,y,f)
The line of code
f = fittype('A*cos(w*x + p)','coefficients','A','problem',{'w','p'});
specifies A as a "coefficient" in the model, and the values w and p as "problem" parameters.
Thus, the fitting toolbox expects you to provide values for w and p, and then it will vary only A. Because no further information about w and p was provided, the fit call results in an error.
I am not sure of the goal of this project, or why w and p were designated as problem parameters. However, one simple solution is to let the toolbox treat A, w, and p all as "coefficients", as follows:
f = fittype('A*cos(w*x + p)','coefficients', {'A', 'w', 'p'});
In this case, the code will not throw an error, and will return 95% confidence intervals on A, w, and p.
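A minimal sketch of that version (the start values below are illustrative guesses, not taken from your data):
f = fittype('A*cos(w*x + p)', 'coefficients', {'A', 'w', 'p'});
[fitobj, gof] = fit(x, y, f, 'StartPoint', [1, 2*pi, 0]);
ci = confint(fitobj)   % 95% confidence intervals for A, w and p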
I hope that helps.
The straightforward answer to your question is that the error "Missing problem parameters" is generated because you have identified w and p as problem-specific fixed parameters,
but you have not told the fit function what these fixed values are.
You can do this by changing the line
[A] = fit(x,y,f)
to
[A]=fit(x,y,f,'problem',{100,0.1})
which supplies the values w=100 and p=0.1 to the fit. This should resolve the errors you listed (all four error messages stem from this single issue).
In general, specifying some of the quantities in your fit equation as problem-specific fixed parameters can be a valid thing to do - for example, if you have determined them independently and have good reason to believe the values you obtained are reliable. In this case, you might know the frequency w in that way, but you most probably won't know the phase p, so that should remain a fit coefficient.
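A minimal sketch of that mixed case (w fixed as a problem parameter with the example value 100 from above, p left free as a coefficient; the start point is an illustrative guess):
f = fittype('A*cos(w*x + p)', 'coefficients', {'A', 'p'}, 'problem', 'w');
[A] = fit(x, y, f, 'problem', {100}, 'StartPoint', [1, 0])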
Hope that helps.