Append for MATLAB - matlab

I am training an ANN, and I want to have different instances of training. In each instance, I want to find the maximum difference between the actual and predicted output. Then I want to take the average of all these maximums.
My code so far is:
maximum = [];
k=1;
for k = 1:5
%Train network
layers = [ ...
imageInputLayer([250 1 1])
reluLayer
fullyConnectedLayer(100)
fullyConnectedLayer(100)
fullyConnectedLayer(1)
regressionLayer];
options = trainingOptions('sgdm','InitialLearnRate',0.1, ...
'MaxEpochs',1000);
net = trainNetwork(nnntrain,nnnfluidtrain,layers,options);
net.Layers
%Test network
predictedn = predict(net,nnntest);
maximum = append(maximum, max(abs(predictedn-nnnfluidtest)));
k=k+1
end
My intent is to produce a list named 'maximum' with five elements (the max of each ANN training instance) that I would then like to take the average of.
However, it keeps giving me the error:
wrong number of input arguments for obsolete matrix-based syntax
when it tries to append. The first input is a list while the second is a 1x1 single.

Appending in MATLAB is a native operation. You append elements by building a new vector with square brackets, where the original vector is one of the pieces being concatenated.
Therefore:
maximum = [maximum max(abs(predictedn-nnnfluidtest))];
If for some reason you would like to do it in function form, the function you are looking for is cat, which is short for concatenate. An append function exists in several toolboxes, but none of them does what you want. cat is what you want, but you still need to provide the original input vector as one of the arguments:
maximum = cat(2, maximum, max(abs(predictedn-nnnfluidtest)));
The first argument is the dimension you want to append along. To match the code above, you want the number of columns to grow as you extend your vector, so that is the second dimension, i.e. 2.
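Since the number of training runs is known in advance, another option is to preallocate the vector and fill one slot per run, then average at the end. The following is only a sketch: random numbers stand in for the network's predictions and targets (nnntest, nnnfluidtest and the trained net from the question are not used here), and numRuns and averageMax are names introduced for illustration.

```matlab
% Sketch only: random data stands in for the trained network's output.
rng(0);                                    % reproducible stand-in data
numRuns = 5;                               % hypothetical name for the number of training runs
maximum = zeros(1, numRuns);               % preallocate one slot per run

for k = 1:numRuns
    actual    = rand(20, 1);                   % stand-in for nnnfluidtest
    predicted = actual + 0.1 * randn(20, 1);   % stand-in for predict(net, nnntest)
    maximum(k) = max(abs(predicted - actual)); % store this run's maximum error
end

averageMax = mean(maximum);                % average of the per-run maxima
```

Indexing with maximum(k) also removes the need to set or increment k by hand, because the for loop already does that.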

Related

Encode each training image as a histogram of the number of times each vocabulary element shows up for Bag of Visual Words

I want to implement bag of visual words in MATLAB. I used SURF features to extract features from the images and k-means to cluster those features into k clusters. I now have k centroids and I want to know how many times each cluster is used by assigning each image feature to its closest neighbor. Finally, I'd like to create a histogram of this for each image.
I tried to use knnsearch function but it doesn't work in this case.
Here is my MATLAB code:
clc;
clear;
close all;
folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f=dir(filePattern);
files={f.name};
for k=1:numel(files)
fullFileName = fullfile(folder, files{k});
H = fspecial('log');
image=imfilter(imread(fullFileName),H);
temp = detectSURFFeatures(image);
[im_features, temp] = extractFeatures(image, temp);
features{k}= im_features;
end
features = vertcat(features{:});
image_feats = [];
[assignments,centers] = kmeans(double(features),500);
vocab = centers';
I have all the image features in the features array and the cluster centers in the centers array.
You're almost there. You don't need knnsearch at all. The assignments variable tells you which input feature mapped to which cluster. It is an N x 1 vector, where N is the total number of examples you have, i.e. the number of rows in the input matrix features. Each value assignments(i) tells you which cluster example i (row i of features) maps to. The centroid for that example is therefore centers(assignments(i), :).
Therefore given how you've called kmeans, it will be a N x 1 vector where each element is from 1 to 500 with 500 being the total number of clusters desired.
Let's do the simple case where your codebook is built from only one image. If this is the case, all you have to do is create a histogram of the assignments variable. The output histogram h will be a 500-element vector with each element h(i) being the number of times an example used centroid i as its representation in your codebook.
Just use the histcounts function and specify the bin edges so that they coincide with the cluster IDs. Each bin includes its left edge and excludes its right edge, except for the last bin, which includes both. With edges 1 : 500 the values 499 and 500 would therefore land in the same final bin, so add one extra edge at the end.
Something like this will work:
h = histcounts(assignments, 1 : 501);
If you want something simpler and you don't want to worry about specifying the end bin, you can use accumarray to achieve the same result:
h = accumarray(assignments, 1);
With accumarray we assign key-value pairs, where the key is the centroid that the example mapped to and the value is simply 1 for every key. accumarray bins all values that share the same key and then applies a function to each bin. The default behaviour is to sum the values, which effectively computes the histogram.
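As a quick check that the two approaches agree, here is a small sketch on a toy assignments vector with 5 clusters instead of 500. The names numClusters, hCounts and hAccum are introduced for illustration. The accumarray call passes an explicit output size so that a cluster with no members still gets a zero entry.

```matlab
assignments = [1; 3; 3; 2; 5; 3; 1];   % toy cluster IDs, one per feature
numClusters = 5;                       % hypothetical small vocabulary

% Edges 1..6 give the bins [1,2), [2,3), [3,4), [4,5), [5,6]
hCounts = histcounts(assignments, 1 : numClusters + 1);    % 1 x 5 row vector

% Sum a 1 for every occurrence of each ID; force the output to have 5 entries
hAccum = accumarray(assignments, 1, [numClusters 1]).';    % transposed to 1 x 5

disp(hCounts);   % 2 1 3 0 1
disp(hAccum);    % 2 1 3 0 1
```

Note that histcounts returns a row vector while accumarray returns a column, hence the transpose.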
However, you want to do this for multiple images, not just a single image.
For Bag of Visual Words problems, we will certainly have more than one training image in our database, so you want a histogram of the features for each image. We can still use the above concept. I suggest you maintain a separate variable that records how many features were detected per image. Because the features are stacked image by image, you can use those counts to slice the assignments variable, extract the centroid IDs that belong to each image, and build a histogram from each slice. We can store the result in a 2D matrix where each row is the histogram of one image. Remember that in kmeans, each row of assignments tells you which cluster an example was assigned to, independently of the other examples in your data. So you run kmeans on the entire training dataset once, then slice assignments to recover the assigned clusters for each input image.
Therefore, modify your code so that it looks something like this:
clc;
clear;
close all;
folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f=dir(filePattern);
files={f.name};
num_features = zeros(numel(files), 1); % New - for keeping track of # of features per image
for k=1:numel(files)
fullFileName = fullfile(folder, files{k});
H = fspecial('log');
image=imfilter(imread(fullFileName),H);
temp = detectSURFFeatures(image);
[im_features, temp] = extractFeatures(image, temp);
num_features(k) = size(im_features, 1); % New - # of features per image
features{k}= im_features;
end
features = vertcat(features{:});
num_clusters = 500; % Added to make the code adaptive
[assignments,centers] = kmeans(double(features), num_clusters);
counter = 1; % Keeps track of where we need to slice in assignments
% Go through each image and find their histograms
features_hist = zeros(numel(files), num_clusters); % Records the per image histograms
for k = 1 : numel(files)
a = assignments(counter : counter + num_features(k) - 1); % Get the assignments
h = histcounts(a, 1 : num_clusters + 1);
% Or, with an explicit output size so every image gets all num_clusters bins:
% h = accumarray(a, 1, [num_clusters 1]).'; % Transpose to make it a row
% Place in final output
features_hist(k, :) = h;
% Increment counter
counter = counter + num_features(k);
end
features_hist will now be an N x 500 matrix, where N is the number of images and each row is the histogram of one image. The final job would be to train a supervised machine learning algorithm (SVM, neural network, etc.) where the input features are the per-image histograms and the expected labels are the classes you have assigned to each image. The result is a learned model. When you have a new image, calculate its SURF features, represent them as a histogram of vocabulary words like we did above, then feed that histogram into the classification model to get the class or label that the image represents.
P.S. Deep Learning / CNNs do a much better job at this, but require much more time to train. If accuracy is your priority, don't use Bag of Visual Words. It is, however, very quick to implement and is known to perform moderately well, though that of course depends on the kinds of images you want to classify.
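For a new image, the descriptors were not part of the kmeans call, so there is no assignments entry for them and each one has to be matched to its closest centroid explicitly. The following is a minimal sketch of that step using toy 2-D data in place of real SURF descriptors. It assumes R2016b or later for implicit expansion, and the names newFeats, D and nearest are introduced for illustration.

```matlab
centers  = [0 0; 10 0; 0 10];               % toy vocabulary: 3 centroids in 2-D
newFeats = [1 1; 9 1; 0.5 -1; 1 9; 8 -2];   % toy descriptors from one new image
numClusters = size(centers, 1);

% Squared Euclidean distance between every descriptor (rows) and centroid (columns),
% using ||f - c||^2 = ||f||^2 - 2*f*c' + ||c||^2
D = sum(newFeats.^2, 2) - 2 * (newFeats * centers.') + sum(centers.^2, 2).';

% Closest centroid for each descriptor (reduction along the columns)
[~, nearest] = min(D, [], 2);

% Histogram of vocabulary words for this image, one bin per centroid
h = accumarray(nearest, 1, [numClusters 1]).';

disp(h);   % 2 2 1
```

With real data, centers would be the second output of kmeans and newFeats the double-converted descriptors of the new image. The resulting row h has the same layout as a row of features_hist.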

min of row on Matlab GPU with arrayfun

I would like to find the index of the smallest value resulting from some computation, like the nearest value, using Matlab gpuArrays.
However, in the arrayfun scenario the min function doesn't seem to offer the functionality.
With the following code:
function grid_gpu_test
gridSize = 8;
grid = gpuArray(rand(gridSize));
all_c=1:gridSize; % because : is not supported
function X = min_diff(row)
X = min(abs(grid(row,all_c)-grid(row,1)))
end
rows = gpuArray.colon(2, gridSize)';
arrayfun(@min_diff, rows)
end
I get the following error:
Too few input arguments supplied to: 'min'. Error in 'grid_gpu_test' (line: 9)
Is there a way to achieve this? I know that using min(gpuArray) works normally when it's not in arrayfun, but I want to achieve this with an operation that doesn't simplify into matrix operations.
I'm a little confused by your question, because your code errors out when you try to run it on the CPU. By making rows go 2:(gridSize+1), you index past the size of grid.
In any case, I think here rather than arrayfun, you want to use bsxfun (or implicit expansion if you have R2016b or later). Here's the bsxfun version.
grid = gpuArray.rand(8);
% I think what you're trying to compute is the difference
% between each column of "grid" compared to the first column
difference = bsxfun(@minus, grid(:,1), grid);
% To find the minimum difference, and its column, use
% the following form of MIN
[val, col] = min(difference, [], 2)
Here I'm using the "reduction" form of min, and I want to reduce across columns, so I need to pass in the 2 as the third argument. The second argument is [] to tell MATLAB that you want the "reduction" form of min, rather than the element-wise form of min. (Note that gpuArray/arrayfun supports only the element-wise form of min, which explains the error you're seeing).
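To make the two forms of min concrete, here is a small sketch on an ordinary CPU matrix. The same calls are expected to accept a gpuArray when Parallel Computing Toolbox is available, but that is not exercised here. It also shows one possible way to get the "nearest value to the first column" from the question, by taking the absolute difference and skipping column 1. That variant is an assumption about what the asker wants, and it relies on implicit expansion (R2016b or later).

```matlab
A = [4 1 6;
     2 9 3];

% Reduction form: smallest value in each row and the column where it occurs
[val, col] = min(A, [], 2);          % val = [1; 2], col = [2; 1]

% Element-wise form: compares every element against 5
capped = min(A, 5);                  % [4 1 5; 2 5 3]

% Nearest value to column 1 in each row, ignoring column 1 itself
d = abs(A(:, 2:end) - A(:, 1));      % [3 2; 7 1]
[nearestDiff, idx] = min(d, [], 2);  % nearestDiff = [2; 1]
nearestCol = idx + 1;                % shift back to column numbers of A: [3; 3]
```

The empty second argument is what selects the reduction form. With two array arguments, min compares them element by element instead.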
Based on the extra information in the comments, perhaps xcorr2 is what you're after (this works on the GPU).

How can I apply Huffman coding correctly?

I applied the zigzag function after quantization to an image block, and I want to compute the Huffman coding of this block. I understand that the input argument must be a vector, and that the histogram should be calculated.
I wrote the following code, but it doesn't seem to work:
[M N]=size(yce);
fun1=zigzag(yce);
count1 = imhist(fun1);
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(count1,p1);
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
I get the following error with the huffmandict function:
Error in project at 65
[dict1,avglen1]=huffmandict(count1,p1);
Source symbols repeat.
zigzag.m is a function written in a MATLAB file. It converts a matrix into a vector, thus eliminating long sequences of zeros.
The Huffman dictionary function (huffmandict) in MATLAB requires that the symbols vector (the first argument of the function) contains only unique values. This symbols vector is a list of all possible symbols that appear in the data you want to encode / compress. As such, it wouldn't make sense for that list to contain duplicates. This is much like a dictionary of words, where it wouldn't make sense to see the same word twice. The second parameter of the function is the associated probability of occurrence for each symbol in your sequence.
With huffmandict, you are creating a dictionary for Huffman encoding that consists of all possible unique symbols to be encountered when encoding/decoding, together with their associated probabilities. Therefore, looking at your code, you need to extract both the bin locations and the probabilities of occurrence when using imhist. Essentially, you need to call the two-output version of imhist. The second output of imhist gives you a list of all possible intensities / symbols, while the first output gives you the frequency of each of these intensities / symbols in your data. You then normalize the first output by the total number of symbols / intensities in your data to get the probabilities (assuming equiprobable encounters of course). Once this is complete, you use both of these as input into huffmandict.
In other words, you need to change only two lines of code, thus:
[M N]=size(yce);
fun1=zigzag(yce);
[count1,x] = imhist(fun1); %// Change
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(x,p1); %// Change
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
Edit
Knowing how fun1 is structured now, do not use imhist. imhist assumes that you are putting in image data, but it doesn't look like that's the case. Instead, try using histc to compute the frequency of occurrence. As such, simply modify your code to this:
[M N]=size(yce);
fun1=zigzag(yce);
bins = unique(fun1); %// Change
count1 = histc(fun1, bins); %// Change
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(bins,p1); %// Change
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
unique finds the distinct values in your vector so that we can use them as bins to calculate our frequencies. This also gives us the list of all possible symbols seen in the data.
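The same idea can be tried end to end on a small vector. The sketch below uses a toy sequence in place of the real zigzag output, and counts occurrences with the third output of unique plus accumarray instead of histc. That substitution is mine, not part of the answer above. huffmandict, huffmanenco and huffmandeco require the Communications Toolbox.

```matlab
% Toy stand-in for the zigzag output (hypothetical data)
sig = [0 0 0 5 0 -3 5 0 0 0];

% Distinct symbols, plus for each sample the index of its symbol
[symbols, ~, idx] = unique(sig);        % symbols = [-3 0 5]

% Count how often each symbol occurs, then normalize to probabilities
counts = accumarray(idx(:), 1).';       % [1 7 2]
prob   = counts / numel(sig);           % [0.1 0.7 0.2]

% Build the dictionary from unique symbols, then encode and decode
dict    = huffmandict(symbols, prob);
comp    = huffmanenco(sig, dict);
decoded = huffmandeco(comp, dict);
```

Because the dictionary is built from the symbols that actually occur in sig, every sample can be encoded, and decoded should reproduce sig.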

Multiple output of bootstrap in MATLAB

I have a function M file defined as follows:
function [v,m ] = myfun(y)
m=mean(y);
v=var(y);
end
For a given vector which consists of integers from 1 to 100 for simplicity, I want to do bootstrap for 10 times and obtain both mean and variance for each bootstrapped sample. The following wouldn't work:
y=[1:100]';
[m,v]=bootstrp(10,@(x) myfun(x),y);
Could any one help me out of here? Thanks in advance!
Why don't you think it works? This does exactly what you're specifying. However, I would do away with the separate function and put the mean and variance directly in the anonymous function itself. Specifically:
stats = bootstrp(10, @(x) [mean(x) var(x)], y);
In this case, you will get a 10 x 2 matrix. The first column will give you the mean of each bootstrapped sample while the next column will give you the variance of each bootstrapped sample. Specifically, the first row gives you the mean (first column) and variance (second column) of the first sample. The second row gives you the mean and variance of the second sample, and so on. Each column of your output in stats will give you whatever measure you are calculating in the corresponding position in the output vector of your function.
Check the documentation for bootstrp here: http://www.mathworks.com/help/stats/bootstrp.html
As for why you're getting the too many outputs error: your function needs to return only one variable, but it returns two. As such, group your variables into a single vector like so:
function [out] = myfun(y)
m=mean(y);
v=var(y);
out = [m,v];
end
If you now run your bootstrp code with this function, it should now work.
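To see where the 10 x 2 layout comes from, here is a hand-rolled sketch of the resampling in base MATLAB. It is meant only as an illustration of the idea (draw n indices with replacement, evaluate the statistics, store one row per draw), not as a description of how bootstrp is implemented internally. The names nboot, idx and sample are introduced for illustration.

```matlab
y = (1:100)';            % data from the question
nboot = 10;              % number of bootstrap samples
n = numel(y);
stats = zeros(nboot, 2); % one row per sample: [mean variance]

rng(0);                  % fixed seed so the run is repeatable
for b = 1:nboot
    idx = randi(n, n, 1);                     % n indices drawn with replacement
    sample = y(idx);                          % the bootstrapped sample
    stats(b, :) = [mean(sample) var(sample)]; % same statistics as myfun
end

bootMeans = stats(:, 1);  % column 1: mean of each sample
bootVars  = stats(:, 2);  % column 2: variance of each sample
```

Splitting the columns afterwards gives the separate m and v variables the question was trying to obtain from two outputs.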

Matlab vectorization of multiple embedded for loops

Suppose you have 5 vectors: v_1, v_2, v_3, v_4 and v_5. These vectors each contain a range of values from a minimum to a maximum. So for example:
v_1 = minimum_value:step:maximum_value;
Each of these vectors uses the same step size but has a different minimum and maximum value. Thus they are each of a different length.
A function F(v_1, v_2, v_3, v_4, v_5) is dependent on these vectors and can use any combination of the elements within them. (Apologies for the poor explanation.) I am trying to find the maximum value of F and record the values which resulted in it. My current approach has been to use multiple embedded for loops, as shown, to evaluate the function for every combination of the vectors' elements:
% Set the temp value to a small value
temp = 0;
% For every combination of the five vectors use the equation. If the result
% is greater than the one calculated previously, store it along with the values
% (positions) of elements within the vectors
for a=1:length(v_1)
    for b=1:length(v_2)
        for c=1:length(v_3)
            for d=1:length(v_4)
                for e=1:length(v_5)
                    % The function is a combination of trigonometrics, summations,
                    % multiplications etc..
                    Result = F(v_1(a), v_2(b), v_3(c), v_4(d), v_5(e))
                    % If the value of Result is greater than the previous value,
                    % store it and record the values of 'a','b','c','d' and 'e'
                    if Result > temp;
                        temp = Result;
                        f = a;
                        g = b;
                        h = c;
                        i = d;
                        j = e;
                    end
                end
            end
        end
    end
end
This gets incredibly slow, for small step sizes. If there are around 100 elements in each vector the number of combinations is around 100*100*100*100*100. This is a problem as I need small step values to get a suitably converged answer.
I was wondering if it was possible to speed this up using Vectorization, or any other method. I was also looking at generating the combinations prior to the calculation but this seemed even slower than my current method. I haven't used Matlab for a long time but just looking at the number of embedded for loops makes me think that this can definitely be sped up. Thank you for the suggestions.
No matter how you generate your parameter combination, you will end up calling your function F 100^5 times. The easiest solution would be to use parfor instead in order to exploit multi-core calculation. If you do that, you should store the calculation results and find the maximum after the loop, because your current approach would not be thread-safe.
Having said that, and not knowing anything about your actual problem, I would advise you to implement a more structured approach: first find a coarse solution with a bigger step size, then narrow it down successively by shrinking the min/max values of your parameter intervals around it. What you have currently is the absolute brute-force method, which will never be very efficient.
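A minimal sketch of that coarse-to-fine idea follows, reduced to two parameters to keep it short. The objective F here is a hypothetical smooth function with a single peak at (1.3, -0.7), and it is written with element-wise operators so a whole grid can be evaluated in one call. For a function with several local maxima, narrowing around the best coarse point can miss the global maximum, so treat this as an illustration rather than a general solver.

```matlab
% Hypothetical objective with a single maximum at (1.3, -0.7)
F = @(x, y) -(x - 1.3).^2 - (y + 0.7).^2;

lo = [-5 -5];        % current lower bounds of the two parameters
hi = [ 5  5];        % current upper bounds
nPts = 11;           % grid points per parameter on every pass

for pass = 1:8
    x = linspace(lo(1), hi(1), nPts);
    y = linspace(lo(2), hi(2), nPts);
    [X, Y] = ndgrid(x, y);            % all combinations of x and y
    vals = F(X, Y);                   % evaluate the whole grid at once

    [bestVal, k] = max(vals(:));      % best grid point on this pass
    best = [X(k) Y(k)];

    step = (hi - lo) / (nPts - 1);    % current grid spacing
    lo = best - step;                 % shrink the interval to one step
    hi = best + step;                 % either side of the best point
end
```

Each pass costs 11^2 evaluations and shrinks the interval by a factor of 5, so eight passes need 968 evaluations, whereas a single uniform grid with the same final spacing would need tens of thousands of points per parameter. With five parameters the same loop would use a five-output ndgrid call.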