Calling function and getting - not enough input arguments, even though syntax is correct - matlab

I'm teaching myself classification, I read and understood the MatLab online help of the simple LDA classifier which uses the fisher iris dataset.
I have now moved to SVM. But even though I use the exact syntax from the help page I get an error of either not enough or too many input arguments.
I have made trained my SVMClassifier using svmtrain via the command:
SVMStruct = svmtrain(training,labels);
Where training is a 207 by 900 training matrix. There are 207 samples and 900 HoG descriptors or features. Similarly labels is a 207 by 1 column vector consisting of either +1 or -1 for their respective samples.
I then wanted to test it and see if this works by calling:
Group = svmclassify(SVMStruct,sample,'Showplot',true)
Where sample is a 2 by 900 matrix containing 2 test samples. I was expecting to get +1 and -1 as these are what the test samples should be labelled. But I get the error:
Too many input arguments.
And when I use the command
Group = svmclassify(SVMStruct,sample)
I get the error
Not enough input arguments.

You might have overloaded svmclassify function.
try
>> which svmclassify
to verify that you are actually calling the right function.
In case that you overloaded the function (that is, created a different function with the same name svmclassify) and it is located higher in your path then you'll need to rename the overloaded function and run svmclassify again.

Related

No Model Summary For GLMs in Pyspark / SparkML

I'm familiarizing myself with Pyspark and SparkML at the moment. To do so I use the titanic dataset to train a GLM for predicting the 'Fare' in that dataset.
I'm following closely the Spark documentation. I do get a working model (which I call glm_fare) but when I try to assess the trained model using summary I get the following error message:
RuntimeError: No training summary available for this GeneralizedLinearRegressionModel
Why is this?
The code for training was as such:
glm_fare = GeneralizedLinearRegression(
labelCol="Fare",
featuresCol="features",
predictionCol='prediction',
family='gamma',
link='log',
weightCol='wght',
maxIter=20
)
glm_fit = glm_fare.fit(training_df)
glm_fit.summary
Just in case someone comes across this question, I ran into this problem as well and it seems that this error occurs when the Hessian matrix is not invertible. This matrix is used in the maximization of the likelihood for estimating the coefficients.
The matrix is not invertible if one of the eigenvalues is 0, which occurs when there is multicollinearity in your variables. This means that one of the variables can be predicted with a linear combination of the other variables. Consequently, the effect of each of the variables cannot be identified with any significance.
A possible solution would be to find the variables that are (multi)collinear and remove one of them from the regression. Note however that multicollinearity is only a problem if you want to interpret the coefficients and not when the model is used for prediction.
It is documented possibly there could be no summary available for a model in GeneralizedLinearRegressionModel docs.
However you can do an initial check to avoid the error:
glm_fit.hasSummary() which is a public boolean method.
Using it as
if glm_fit.hasSummary():
print(glm_fit.summary)
Here is a direct like to the Pyspark source code
and the GeneralizedLinearRegressionTrainingSummary class source code and where the error is thrown
Make sure your input variables for one hot encoder starts from 0.
One error I made that caused summary not created is, I put quarter(1,2,3,4) directly to one hot encoder, and get a vector of length 4, and one column is 0. I converted quarter to 0,1,2,3 and problem solved.

how to compare the expected values with actual values using matlab's neural network toolbox?

I have been using matlab's neural network toolbox lately for my research. I created some neural networks using fitting tool of this toolbox. Now I encountered a problem while testing the network with new data. Basically, I want to test the network with some new data, which do not have the same sample number as the network. For example, I have created the network with the input of 12x36 matrix (12 variables, 36 samples) and output of 1x36 vector (1 variable, 36 samples). Now, I want to get the results for a new data (12x11520 (12 variables and 11520 samples)). However, when I put this data into the network and get the results, I only get an output vector of 1x36 (1 variable 36 samples) while I was expecting to get the results as 1x11520 (1 variable and 11520 samples). I am using the below line to get the results from the neural network named as network.
output_estimate = network(input');
I also tried the line below to get the results but the result did not change.
output_estimate = sim(network,input');
Could you please help me understand this and get the output as the same sample length so that I can compare the expected and actual results for the new data.
Thank you very much in advance.
Irem

MATLAB NN toolbox: Error using trainlm

I have a 90×8 dataset that I feature-extracted (by summing 1's in every 10×10 cell) from 90 character images i.e. digits 1-9. Every row represents an image.
I am trying to use following code to train a neural network and recognize new input images(that are digits between 1 and 9 inclusive):
net.trainFcn='traingdx';
net.performFcn='sse';
net.trainParam.goal=0.1;
net.trainParam.show=20;
net.trainParam.epochs=5000;
net.trainParam.mc=0.95;
net =newff(minmax(datasetNormalized'),[20 9],{'logsig' 'logsig'});
T=reshape(repmat([1:9],10,1),1,90);
[net,tr]=train(net,datasetNormalized,T);
Afterwards I want to use the following to recognize new images using the trained network. m is an image character that has also been feature extracted.
[a,m]=max(sim(net,m));
disp(b);
I am getting the following errors and I don't have any idea how to solve it:
Error using trainlm (line 109)
Inputs and targets have different numbers of samples.
Error in network/train (line 106) [net,tr] =
feval(net.trainFcn,net,X,T,Xi,Ai,EW,net.trainParam);
Error in Neural (line 55) [net,tr]=train(net,datasetNormalized,T);
Note: datasetNormalized is my dataset normalized in [0,1].
Which part causes the problem?
Inputs and targets have different numbers of samples. it seems to be the problem
T=reshape(repmat([1:9],10,1),1,90) --> T=reshape(repmat([1:9],10,1),90,1)
[net,tr]=train(net,datasetNormalized,T); --> [net,tr]=train(net,datasetNormalized',T);
T is to be used as target for the network; Therefore, following a friend's advice, I defined T as a 9*90 array in such a way that the first 10 columns have 1 in their first row-other rows being zero, the second 10 columns have 1 in their second row, and so on
T=zeros(9,90);
for j=1:90
i=ceil(j/10);
T(i,j)=1;
end
[net,tr]=train(net,datasetNormalized',T);
This solved the error I was getting upon training network, though I'm not still sure how it's going to be mapped to input characters and determine them.

keving murphy's hmm matlab toolbox assertion error

I am working on a project that needs to use hidden markov models. I downloaded Kevin Murphy's toolbox. I have some problems about the usage. In the toolbox webpage, he says that first input of dhmm_em and dhmm_logprob are symbol sequence data. On their examples, they give row vectors as data. So, when I give my symbol sequence as row vector, I get error;
??? Error using ==> assert at 9
assertion violated:
Error in ==> fwdback at 105
assert(approxeq(sum(alpha(:,t)),1))
Error in ==> dhmm_logprob at 17
[alpha, beta, gamma, ll] = fwdback(prior,
transmat, obslik, 'fwd_only', 1);
Error in ==> mainCourseProject at 110
loglik(train_act) =
dhmm_logprob(orderedSymbols,
hmm{train_act}.prior,
hmm{train_act}.trans,
hmm{act}.emiss);
However, before giving this error, code works for some symbol vectors. When I give my data as column vector, functions work fine, no errors. So why exactly am I getting this error?
You might say that I should be giving not single vectors, but vector sets, I also tried to collect my feature vectors in a struct and give row vectors as such, but nothing changed, I still get assertion error.
By the way, my symbol sequence does not have any zeros, I am doing everything almost the same as they showed in their examples, so I would be greatful if anyone could help me please.
Im not sure, but from the function call stack shown above, shouldn't the last line be hmm{train_act}.emiss instead of hmm{act}.emiss.
In other words when you computing the log-probability of a sequence, you should pass components that belong to the same HMM model (transition matrix, emission matrix, and prior probabilities).
By the way, the ASSERT in the code is a sanity check that a vector of probabilities should sum to 1. Oftentimes, when working with very small values (log-probabilities), numerical stability issues can creep in... You could edit the APPROXEQ function to relax the comparison a bit, by giving it a bigger margin of error
This error message and the code it refers to are human-readable. An assertion is a guard put in by the programmer, to ensure that certain conditions are met. In this case, what is the condition? approxeq(sum(alpha(:,t)),1) I'd venture to say that approxeq wants the values to be approximately equal, so this boils down to: sum(alpha(:,t)) ~= 1
Without knowing anything about the code, I'd also guess that these refer to probabilities. The probabilities of a node's edges must sum to one. Hopefully this starts you down a productive debugging path. If you can't figure out what's wrong with your input that produces this condition, start wading into the code a bit to see where this alpha vector comes from, and how it ended up invalid.

Matlab Weka Interface AdaBoost Issues: Out of Bounds Exception

I'm doing some cross-validation using a Matlab Weka Interface that I got from file exchange. My loop structure seems to work fine for Weka's Logistic classifier. However, when I try to do the exact same thing for AdaBoostM1, it throws the following error:
??? Java exception occurred: java.lang.ArrayIndexOutOfBoundsException
Error in ==> wekaClassify at 24 classProbs(t+1,:) = (classifier.distributionForInstance(testData.instance(t)))';
Error in ==> classifier_search at 225 [pred ~] = wekaClassify(matlab2weka('instance', featurelabels, tester), classifier);
I have determined through some testing that this only occurs when the number of instances in the training set is greater than the number of instances in the test set. I am sure you can see why that is a problem for me, since in most situations the training set is greater than the test set in size.
Is there something different about how I should format my inputs when using Adaboost rather than Logistic? Any information you can give regarding this problem would be so helpful.
I downloaded this code from this page: http://www.mathworks.com/matlabcentral/fileexchange/21204-matlab-weka-interface
Emails bounce from the account of the guy who made it, and he doesn't seem to respond to comments on the page - I'm hoping that maybe someone here has used this.
EDIT: Here is the code that I use to train and test the classifier:
classifier = trainWekaClassifier(matlab2weka('training', featurelabels, train), 'meta.AdaBoostM1', { strcat('-P 100 -S 1 -I ', num2str(r), '-W weka.classifiers.trees.DecisionStump')});
[pred ~] = wekaClassify(matlab2weka('instance', featurelabels, tester), classifier);
I haven't used this combination of software, so I can only take a guess at what could cause this.
Are your training/testing data matrices the right way round? They should be N-by-D (N instances, D features).
If you were passing in a D-by-N training matrix and a D-by-M testing matrix, then I would expect it to work only when M < N - which is what you describe - and even then, it wouldn't give a meaningful result.