I am using this block of code to get all combinations of the rows of a matrix, because I have to apply a brute-force search.
function combinations(training)
    S = dec2bin(1:2^size(training,1)-1) == '1';
    allSubsets = cell(size(S,1), 1);
    for i = 1:size(S,1)
        allSubsets{i} = training(S(i,:), :);
        display(allSubsets{i})
    end
end
If I run this function on a small matrix of, let's say, 7 or even 20 rows, it works perfectly fine. But once I increase the matrix to 25 rows, it gives me this error:
Out of memory. Type HELP MEMORY for your options.
Error in dec2bin (line 55)
s=char(rem(floor(d*pow2(1-max(n,e):0)),2)+'0');
Error in combination (line 11) S=dec2bin(1:2^size(training,1)-1)=='1';
Moreover, when I increase the number of rows to 120, it gives me the following error:
Maximum variable size allowed by the program is exceeded.
Error in combination (line 11) S=dec2bin(1:2^size(training,1)-1)=='1';
I have to run this on a dataset of 2000 rows with 69 columns. Please help!
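For scale: dec2bin(1:2^25-1) alone allocates a char matrix of (2^25 - 1) rows by 25 columns (well over a gigabyte), the logical S needs almost as much again, and allSubsets would need far more, which is why 25 rows already runs out of memory; for 120 rows, 2^120 exceeds MATLAB's maximum array size, and for 2000 rows an exhaustive enumeration of 2^2000 subsets is infeasible no matter how the code is written. Purely as an illustration, a minimal sketch of how the precomputed S matrix could at least be avoided, assuming each subset can be processed immediately instead of being stored:

% Sketch only (my illustration, not a full fix): build each subset mask on the
% fly with bitget instead of precomputing the whole dec2bin/S matrix.
n = size(training, 1);
for k = 1:2^n - 1                    % still 2^n - 1 iterations
    mask = bitget(k, 1:n) == 1;      % logical row selector for subset k
    subset = training(mask, :);
    % ... evaluate this subset here rather than storing it in a cell array ...
end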
I'm using Spark to run LinearRegression.
Since my data cannot be fitted well by a linear model, I added some higher-order polynomial features to get a better result.
This works fine!
Instead of modifying the data myself, I wanted to use the PolynomialExpansion function from the Spark library.
To find the best solution I used a loop over different degrees.
After 10 iterations (degree 10) I ran into the following error:
Caused by: java.lang.IllegalArgumentException: requirement failed: You provided 77 indices and values, which exceeds the specified vector size -30.
I used trainingData with 2 features. This sounds like I have too many features after the polynomial expansion when using a degree of 10, but the vector size -30 confuses me.
In order to fix this I started experimenting with different example data and degrees. For testing I used the following lines of code with different testData (with only one entry line) in libsvm format:
val data = spark.read.format("libsvm").load("data/testData2.txt")
val polynomialExpansion = new PolynomialExpansion()
.setInputCol("features")
.setOutputCol("polyFeatures")
.setDegree(10)
val polyDF2 = polynomialExpansion.transform(data)
polyDF2.select("polyFeatures").take(3).foreach(println)
ExampleData: 0 1:1 2:2 3:3
polynomialExpansion.setDegree(11)
Caused by: java.lang.IllegalArgumentException: requirement failed: You provided 333 indices and values, which exceeds the specified vector size 40.
ExampleData: 0 1:1 2:2 3:3 4:4
polynomialExpansion.setDegree(10)
Caused by: java.lang.IllegalArgumentException: requirement failed: You provided 1000 indices and values, which exceeds the specified vector size -183.
ExampleData: 0 1:1 2:2 3:3 4:4 5:5
polynomialExpansion.setDegree(10)
Caused by: java.lang.IllegalArgumentException: requirement failed: You provided 2819 indices and values, which exceeds the specified vector size -548.
It looks like the number of features in the data has an effect on the highest possible degree, but the number of features after the polynomial expansion does not seem to be the cause of the error, since it differs a lot between the cases.
It also doesn't crash in the expansion function itself, but when I try to print the new features in the last line of code.
I was thinking that maybe my memory was full at that time, but I checked the system monitor and there was still free memory available.
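For what it's worth, here is a small Scala sketch of one possible explanation (an assumption on my part, not taken from the Spark source): the expanded dimension should be C(numFeatures + degree, degree) - 1, and computing that binomial coefficient with a factorial-style product in 32-bit Int arithmetic overflows as the degree grows, which would produce exactly the kind of bogus small or negative "vector size" values shown above. chooseInt and chooseLong below are my own helpers, not Spark API:

// Int version of n-choose-k with a factorial-style intermediate product;
// the intermediate product silently overflows for larger inputs.
def chooseInt(n: Int, k: Int): Int =
  Range(n, n - k, -1).product / Range(k, 1, -1).product

// Overflow-safe version using Longs and incremental division.
def chooseLong(n: Int, k: Int): Long =
  (1 to k).foldLeft(1L)((acc, i) => acc * (n - k + i) / i)

for ((numFeatures, degree) <- Seq((3, 10), (3, 11), (4, 10), (5, 10))) {
  val brokenSize = chooseInt(numFeatures + degree, degree) - 1
  val realSize   = chooseLong(numFeatures + degree, degree) - 1
  println(s"features=$numFeatures degree=$degree  Int: $brokenSize  Long: $realSize")
}

With the example data above, the Int version prints 40, -183 and -548 for the failing cases (the same numbers that appear in the error messages), while the Long version gives the actual expanded sizes (363, 1000 and 3002). If something like this is happening inside PolynomialExpansion, a Spark release newer than 2.0.0 may already behave differently.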
I'm using:
Eclipse IDE
Maven project
Scala 2.11.7
Spark 2.0.0
Spark-mllib 2.0.0
Ubuntu 16.04
I'd be glad for any ideas regarding this problem.
Hi, I have been using Murphy's HMM toolbox with Gaussian mixture outputs. In brief, I have 2 datasets for training. Each dataset comprises 2000 observations with 11 dimensions per observation. I implemented the following steps to observe the path sequence output.
N_states=2
N_Gaussian_Mixture=1
For each of the datasets, an HMM model was generated. The steps are:
Step 1: mixgauss_init() was used to generate a GMM signature for my training data.
Step 2: After declaring the matrices for Prior and Transmat, mhmm_em() was used to generate the HMM model for the training dataset.
Testing: 2 test sequences from each of the datasets were used for testing with mhmm_logprob(). The outputs were correctly predicted by the log-likelihood scores in every run.
However, when I tried to observe the state sequence of the HMM model (Dataset_123 with testdata_123) via mixgauss_prob() followed by viterbi_path(), the output sequences were inconsistent. For example, on the first run the output sequence might be 2221111111111, but when I rerun the program the sequence can change to 1111111111111 or 1111111111222. Initially I thought it could be due to my Prior matrix; I fixed the Prior values, but it is not helping.
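For reference, a minimal sketch of how the run-to-run randomness can be pinned down, assuming the usual call pattern of Murphy's toolbox and data stored as 11 x T matrices (mixgauss_init() and mhmm_em() start from random initial values, so the state numbering can swap between runs unless the seed is fixed):

rng(0);   % fix the random seed (on very old releases: rand('state',0); randn('state',0);)

O = 11;   % dimensions per observation
Q = 2;    % N_states
M = 1;    % N_Gaussian_Mixture

% Initial guesses, following the structure of Murphy's mhmm_em demo;
% trainData is assumed to be an O x T matrix of training observations.
prior0    = normalise(rand(Q, 1));
transmat0 = mk_stochastic(rand(Q, Q));
[mu0, Sigma0] = mixgauss_init(Q*M, trainData, 'full');
mu0     = reshape(mu0,    [O Q M]);
Sigma0  = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q, M));

[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(trainData, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 20);

% Decoding a test sequence: observation likelihoods, then the Viterbi path.
B    = mixgauss_prob(testData, mu1, Sigma1, mixmat1);
path = viterbi_path(prior1, transmat1, B);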
Secondly, is there a possibility to assign labels to the states and the sequence? Like the MATLAB function:
hmmgenerate(...,'Symbols',SYMBOLS) specifies the symbols that are emitted. SYMBOLS can be a numeric array or a cell array of the names of the symbols. The default symbols are integers 1 through N, where N is the number of possible emissions.
hmmgenerate(...,'Statenames',STATENAMES) specifies the names of the states. STATENAMES can be a numeric array or a cell array of the names of the states. The default state names are 1 through M, where M is the number of states.
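Murphy's viterbi_path() only returns integer state indices, so one simple option is to map them to names yourself; a tiny sketch with made-up state names:

stateNames   = {'StateA', 'StateB'};   % hypothetical names for states 1 and 2
labelledPath = stateNames(path);       % cell array of names, same length as path
disp(labelledPath)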
Thank you for your time; I hope to hear from the experts.
I'm trying to train a cascade classifier with the built-in MATLAB function "trainCascadeObjectDetector", but it always shows the following error message when I call this function:
trainCascadeObjectDetector('MCsDetector.xml',positiveInstances(1:5000,:),'./negativeSubFolder/',...
'FalseAlarmRate',0.01,'NumCascadeStages',5, 'FeatureType', 'LBP');
Automatically setting ObjectTrainingSize to [ 32, 32 ]
Using at most 980 of 1000 positive samples per stage
Using at most 1960 negative samples per stage
Training stage 1 of 5
[....................................................Time to train stage 1: 12 seconds
Error using ocvTrainCascade
Error in generating samples for training. No samples could be generated for training the first cascade stage.
Error in trainCascadeObjectDetector (line 265)
ocvTrainCascade(filenameParams, trainerParams, cascadeParams, boostParams, ...
The numbers of samples are 5000 positive images and 11000 negative images. The MATLAB version is R2014a, running on Ubuntu 12.04.
I am not sure whether I need to add more training data, because the error message is:
Error in generating samples for training. No samples could be generated for training the first cascade stage.
Could you please have a look at this? Thanks!
First of all, what is the data type of positiveInstances? It should be a 1D array of structs with two fields: imageFileName and objectBoundingBoxes. positiveInstances(1:5000,:) looks a bit suspicious, because you are treating it as a 2D matrix.
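For illustration, a rough sketch of what that struct array might look like (the file names and boxes are made up, and the field names simply follow the description above; check them against the documentation for your MATLAB release):

% Hypothetical positive instances: one struct per image, with the object
% bounding boxes given as [x y width height] rows.
positiveInstances(1).imageFileName       = 'images/object_001.png';
positiveInstances(1).objectBoundingBoxes = [35 47 32 32];
positiveInstances(2).imageFileName       = 'images/object_002.png';
positiveInstances(2).objectBoundingBoxes = [12 80 32 32; 90 15 32 32];

trainCascadeObjectDetector('MCsDetector.xml', positiveInstances, './negativeSubFolder/', ...
    'FalseAlarmRate', 0.01, 'NumCascadeStages', 5, 'FeatureType', 'LBP');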
The second thing to check is the negativeSubFolder. It should contain a lot of images without the objects of interest to be able to generate 1960 negative samples per stage.
For future reference, there is a tutorial in the MATLAB documentation.
I have a dataset in .csv format as shown:
NRC_CLASS,L1_MARKS_FINAL,L2_MARKS_FINAL,L3_MARKS_FINAL,S1_MARKS_FINAL,S2_MARKS_FINAL,S3_MARKS_FINAL,
FAIL,7,12,12,24,4,30,
PASS,49,36,46,51,31,56,
FAIL,59,35,42,18,18,45,
PASS,61,30,51,33,30,52,
PASS,68,30,35,53,45,54,
2,82,77,75,32,36,56,
FAIL,18,35,35,32,21,35,
2,86,56,46,44,37,60,
1,94,45,62,70,50,59,
where the first column gives the overall grade:
FAIL - Fail
PASS - Pass class
1 - First class
2 - Second class
D - Distinction
This is followed by each student's marks in 6 subjects.
Is there any way I can find out which subject's performance makes the biggest difference to the overall outcome?
I am using Weka and used J48 to build a tree.
The summary of J48 classifier is:
=== Summary ===
Correctly Classified Instances 30503 92.5371 %
Incorrectly Classified Instances 2460 7.4629 %
Kappa statistic 0.902
Mean absolute error 0.0332
Root mean squared error 0.1667
Relative absolute error 10.8867 %
Root relative squared error 42.7055 %
Total Number of Instances 32963
Also, I discretized the marks data into 10 bins with useEqualFrequency set to true. The summary of J48 is now:
=== Summary ===
Correctly Classified Instances 28457 86.3301 %
Incorrectly Classified Instances 4506 13.6699 %
Kappa statistic 0.8205
Mean absolute error 0.0742
Root mean squared error 0.2085
Relative absolute error 24.3328 %
Root relative squared error 53.4264 %
Total Number of Instances 32963
First of all, you may need to assign a numeric value to each of the NRC_CLASS values (or, even better, use the actual grade out of 100) to improve the quality of attribute testing.
From there, you could potentially use Attribute Selection (found in the Select attributes tab of the Weka Explorer) to find the attributes that have the greatest influence on the overall grade. Perhaps CorrelationAttributeEval as the Attribute Evaluator, coupled with the Ranker search method, could assist in ranking the attributes from greatest to least importance.
Hope this helps!
It seems you want to determine the relative relevance of each attribute. In this case, you need to use a weight-learning algorithm. Weka has a few; I just used Relief. Go to the Select attributes tab and, under Attribute Evaluator, select ReliefF-AttributeEval; it will select the Search Method for you. Select the attribute that holds the outcome class, then click Start.
The results will include the ranked attributes; the highest ranked is the most relevant.
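The same ranking can also be scripted through Weka's Java API; a rough sketch, assuming the CSV has been converted to a hypothetical grades.arff with NRC_CLASS as the first attribute (CorrelationAttributeEval from the previous answer could be swapped in for ReliefFAttributeEval):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.Ranker;
import weka.attributeSelection.ReliefFAttributeEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RankAttributes {
    public static void main(String[] args) throws Exception {
        // "grades.arff" is a hypothetical export of the CSV shown above.
        Instances data = DataSource.read("grades.arff");
        data.setClassIndex(0);                              // NRC_CLASS is the first column

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new ReliefFAttributeEval());  // or a CorrelationAttributeEval
        selector.setSearch(new Ranker());                   // Ranker produces a ranked list
        selector.SelectAttributes(data);

        // Each row of rankedAttributes() is [attribute index, merit];
        // a higher merit means a more relevant attribute.
        for (double[] ranked : selector.rankedAttributes()) {
            System.out.printf("%-20s %.4f%n", data.attribute((int) ranked[0]).name(), ranked[1]);
        }
    }
}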
In a test data set T with 25 attributes, run i = 1..25 rounds in which you replace the values of the i-th attribute with random values (i.e., noise). Compare the test performance of each of the 25 rounds with the case where no attribute was replaced, and identify the round in which the performance dropped the most.
If the worst performance decrease occurred in, say, round 13, this indicates that attribute 13 is the most important one.
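A rough sketch of that idea with Weka's Java API (again assuming a hypothetical grades.arff; the attribute count is taken from the data rather than hard-coded to 25, and the "noise" is drawn by resampling the attribute's own values in the test set):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PermutationImportance {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("grades.arff");   // hypothetical file name
        data.setClassIndex(0);                             // NRC_CLASS
        data.randomize(new Random(1));

        // Simple holdout split: 70% train, 30% test.
        int trainSize = (int) Math.round(data.numInstances() * 0.7);
        Instances train = new Instances(data, 0, trainSize);
        Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);

        J48 tree = new J48();
        tree.buildClassifier(train);
        double baseline = accuracy(tree, train, test);

        Random rnd = new Random(42);
        for (int a = 0; a < data.numAttributes(); a++) {
            if (a == data.classIndex()) continue;
            Instances noisy = new Instances(test);          // copy, then resample one column
            for (int i = 0; i < noisy.numInstances(); i++) {
                int j = rnd.nextInt(noisy.numInstances());
                noisy.instance(i).setValue(a, test.instance(j).value(a));
            }
            double drop = baseline - accuracy(tree, train, noisy);
            System.out.printf("%-20s accuracy drop: %.2f%%%n", data.attribute(a).name(), drop);
        }
    }

    // Percentage of correctly classified instances on the given test set.
    private static double accuracy(J48 tree, Instances train, Instances test) throws Exception {
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        return eval.pctCorrect();
    }
}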