Vectorizing by splitting matrix by rows unequally - matlab

I have X_test, a 967874 x 3 matrix whose columns are doc#, wordID and wordCount. There are 7505 unique doc#s (length(unique(X_test(:,1))) == length(Y_test) == 7505), and the rows are already sorted by the doc# column.
I also have a 61188 x 20 likelihoods matrix, where the rows are all possible wordIDs and the columns are the different classes (length(unique(Y_test)) == 20).
The result I'm trying to obtain is a 7505 x 20 matrix in which each row corresponds to a document and each column to a class: entry (d, c) is the sum, over the rows of document d, of wordCount times log(likelihoods(wordID, c)).
My first thought was to rearrange this 2-D matrix into a 3-D matrix by doc#, but the number of rows per doc# varies. I also suspect that making a cell array of 7505 matrices isn't a great idea, but I may be wrong about that.
It's probably clearer if I just show the code I have that works, but is slow because it iterates over all 7505 documents:
probabilities = zeros(length(Y_test),nClasses); % 7505 x 20
for n=1:length(Y_test) % 7505 iterations
doc = X_test(X_test(:,1)==n,:);
result = bsxfun(@times, doc(:,3), log(likelihoods(doc(:,2),:)));
% result ends up size(doc,1) x 20
probabilities(n,:) = sum(result,1); % sum down the rows, even if doc has only one row
end
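One loop-free sketch, assuming (as the loop does) that the doc#s are the integers 1..7505 and that wordIDs index rows of likelihoods directly: build a sparse document-by-word count matrix and multiply it by the log-likelihoods. Entry (d, c) of the product is the sum over words w of count(d, w) * log(likelihoods(w, c)), which is what the loop accumulates. The tiny X_test and likelihoods below are made-up example data.

```matlab
% Made-up example data: columns are doc#, wordID, wordCount
X_test = [1 1 2; 1 3 1; 2 2 4; 3 1 1; 3 2 1];
% Made-up likelihoods: rows are wordIDs, columns are classes
likelihoods = [0.5 0.25; 0.25 0.5; 0.25 0.25];

nDocs  = max(X_test(:,1));        % assumes doc#s are 1..nDocs
nWords = size(likelihoods, 1);    % assumes wordIDs are 1..nWords

% Sparse nDocs x nWords matrix of word counts (repeated doc/word pairs are summed)
counts = sparse(X_test(:,1), X_test(:,2), X_test(:,3), nDocs, nWords);

% Entry (d,c): sum over words of count(d,w) * log(likelihoods(w,c))
probabilities = full(counts * log(likelihoods));
```

With the real data, counts would be 7505 x 61188 with at most 967874 nonzeros, so the whole computation becomes one sparse-times-dense multiplication instead of 7505 masked searches through X_test.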
For context, this is what I use the probabilities matrix for:
% MAP decision rule
probabilities = bsxfun(@plus, probabilities, logpriors'); % add priors
[~,predictions] = max(probabilities,[],2);
CCR = sum(predictions==Y_test)/length(Y_test); % correct classification rate
fprintf('Correct classification percentage: %0.2f%%\n\n', CCR*100);
Edit: I separated the matrix into a cell array by doc#, but I don't know how to apply bsxfun to all the arrays in a cell at once.
counts = histc(X_test(:,1),unique(X_test(:,1)));
testdocs = mat2cell(X_test,counts);
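For the cell-array route, cellfun with 'UniformOutput', false applies the same per-document computation to every cell, and cell2mat stacks the 1 x nClasses rows. This is a sketch on made-up data; cellfun still loops internally, so it mainly tidies the code and is not guaranteed to be faster than the explicit loop.

```matlab
% Made-up example data in the same layout as X_test (doc#, wordID, wordCount)
X_test = [1 1 2; 1 3 1; 2 2 4; 3 1 1; 3 2 1];
likelihoods = [0.5 0.25; 0.25 0.5; 0.25 0.25];

% Split by doc#, as in the question
counts = histc(X_test(:,1), unique(X_test(:,1)));
testdocs = mat2cell(X_test, counts);

logL = log(likelihoods);   % take the log once, not once per document
% Per document: weight each word's log-likelihood row by its count, then sum
% along dim 1 so a one-row document still yields a 1 x nClasses row
perDoc = cellfun(@(d) sum(bsxfun(@times, d(:,3), logL(d(:,2),:)), 1), ...
                 testdocs, 'UniformOutput', false);
probabilities = cell2mat(perDoc);   % nDocs x nClasses
```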

Related

How to sum 3d Matrix row by interval in Matlab?

I have a 36x256x2232 3-D matrix in MATLAB, created by M = ones(36,256,2232), and I want to reduce its size by summing rows in groups of 3. The result should be a 12x256x2232 matrix in which every element equals 3.
I tried using reshape and sum, but I get a 1x256x2232 matrix.
How can I do this without a for-loop?
This should do it:
M = ones(36,256,2232);
reduced = reshape(sum(reshape(M, 3,[], 256,2232), 1),[], 256, 2232);
reshape makes a 4-D array whose first dimension holds each group of 3 rows
sum reduces along that first dimension
the second reshape turns the result back into a 3-D array
You can also use squeeze, which removes singleton dimensions:
reduced = squeeze(sum(reshape(M, 3,[], 256,2232), 1));
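A small 2-D sketch on made-up data shows which rows the reshape trick combines: reshape(A, n, [], ...) puts each run of n consecutive rows of a column along the first dimension, so summing along dimension 1 collapses each group.

```matlab
A = reshape(1:12, 6, 2);   % columns [1..6]' and [7..12]'
n = 3;                     % group size; size(A,1) must be a multiple of n
% 3 x 2 x 2: dim 1 = position within a group, dim 2 = group, dim 3 = original column
grouped = reshape(A, n, [], size(A, 2));
reduced = reshape(sum(grouped, 1), [], size(A, 2));
```

Here reduced is [6 24; 15 33], i.e. 1+2+3, 4+5+6, 7+8+9 and 10+11+12. A 1x256x2232 result, as in the question, is what sum(M,1) gives when the sum runs over all 36 rows without the intermediate reshape.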
You can use the newer splitapply function (which is similar to accumarray but can handle data with multiple dimensions). This approach works even if the number of rows is not a multiple of the group size:
M = ones(4,5,2); % example data
n = 3; % group size
result = splitapply(@(x)sum(x,1), M, floor((0:size(M,1)-1).'/n)+1);

Matrix Multiplication Issue - Matlab

To write my own covariance function in MATLAB, I need to multiply each row of a matrix by itself to create a matrix.
Given a matrix D where
D = [-2.2769 0.8746
0.6690 -0.4720
-1.0030 -0.9188
2.6111 0.5162]
Now, for each row, I need to produce a matrix. For example, for the first row R = [-2.2769, 0.8746] I want M = [5.1843, -1.9914; -1.9914, 0.7649] to be returned.
Below is what I have written so far. How can I use matrix multiplication on each row to produce these matrices?
% Find matrices using matrix multiplication
for i=1:size(D, 1)
P1 = (D(i,:))
P2 = transpose(P1)
M = P1*P2
end
You are trying to compute the outer product of each row with itself, with each result stored as a separate slice of a 3D matrix.
Your code almost works, but it computes the inner (dot) product of each row with itself instead, which gives a single number rather than a matrix. Swap the transpose so the column vector comes first: with P = D(i,:), compute P.'*P rather than P*P.'. You are also overwriting M at each iteration. I'm assuming you'd like to store the results as individual slices of a 3D matrix. To do this, allocate a 3D matrix whose 2D slices are square, with as many rows and columns as D has columns, and with as many slices as D has rows. Then index into each slice and store the result:
M = zeros(size(D,2), size(D,2), size(D,1));
% Find matrices using matrix multiplication
for ii=1:size(D, 1)
P = D(ii,:);
M(:,:,ii) = P.'*P;
end
We get:
>> M
M(:,:,1) =
5.18427361 -1.99137674
-1.99137674 0.76492516
M(:,:,2) =
0.447561 -0.315768
-0.315768 0.222784
M(:,:,3) =
1.006009 0.9215564
0.9215564 0.84419344
M(:,:,4) =
6.81784321 1.34784982
1.34784982 0.26646244
Depending on your taste, you could also use bsxfun to perform the same operation, possibly faster:
M = bsxfun(@times, permute(D, [2 3 1]), permute(D, [3 2 1]));
In fact, this solution is related to a similar question I asked in the past: Efficiently compute a 3D matrix of outer products - MATLAB. The only difference is that that question wanted the outer products of columns instead of rows.
The code works by using permute to shift the dimensions of D, giving two arrays of sizes 2 x 1 x 4 and 1 x 2 x 4. Calling bsxfun with the times function expands the singleton dimensions, so the outer product for every slice is computed simultaneously.
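In MATLAB R2016b and later (and in Octave), element-wise operators expand singleton dimensions automatically, so the same computation can be sketched without bsxfun. The example reuses D from the question and compares the result against the loop version above.

```matlab
D = [-2.2769  0.8746
      0.6690 -0.4720
     -1.0030 -0.9188
      2.6111  0.5162];

% 2 x 1 x 4 .* 1 x 2 x 4 expands to 2 x 2 x 4: slice k is D(k,:).' * D(k,:)
M = permute(D, [2 3 1]) .* permute(D, [3 2 1]);

% Loop version from the answer, for comparison
Mloop = zeros(size(D,2), size(D,2), size(D,1));
for ii = 1:size(D,1)
    P = D(ii,:);
    Mloop(:,:,ii) = P.' * P;
end
maxDiff = max(abs(M(:) - Mloop(:)));
```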

Vector and matrix comparison in MATLAB

I have a vector with 5 numbers and a matrix of size 6000x20, so every row has 20 numbers. I want to count how many of the 6000 rows contain all the values from the vector.
As the vector is part of a matrix with 80,000,000 rows, each containing a unique combination, I need a fast solution (one that doesn't take more than 2 days).
With the sizes you have, a bsxfun-based approach that builds an intermediate 6000x20x5 3D-array is affordable:
v = randi(9,1,5); %// example vector
M = randi(9,6000,20); %// example matrix
t = bsxfun(@eq, M, reshape(v,1,1,[]));
result = sum(all(any(t,2),3));
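If the 6000 x 20 x 5 intermediate becomes a memory concern (for example when processing much larger chunks of the 80,000,000 combinations), a variant of the same test loops over the elements of v and keeps only a logical row mask. Small made-up data is used so the expected count can be checked by hand.

```matlab
v = [1 2 3];                 % made-up vector
M = [1 2 3 4
     3 2 1 1
     1 1 2 5
     9 8 7 3];               % made-up matrix; only rows 1 and 2 contain 1, 2 and 3

rowHasAll = true(size(M,1), 1);
for k = 1:numel(v)
    % keep only the rows that also contain v(k) somewhere
    rowHasAll = rowHasAll & any(M == v(k), 2);
end
result = nnz(rowHasAll);     % number of rows containing every value of v
```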

How to select the values with the highest occurrence from several matrices of the same size in MATLAB?

I would like a program that does the following:
Read several matrices of the same size (1126x1440 double)
Select the most frequently occurring value in each cell (same i,j across the matrices)
Write this value to the corresponding (i,j) position of a 1126x1440 output matrix, so that each cell of the output holds the most frequent value at that position across all the input matrices.
Building on @angainor's answer, I think there is a simpler method using the mode function.
nmatrices - number of matrices
n, m - dimensions of a single matrix
maxval - maximum value of an entry (99)
First, organize the data into a 3-D matrix with dimensions [n x m x nmatrices]. As an example, we can generate random data in this 3-D form:
CC = round(rand(n, m, nmatrices)*maxval);
The most frequent values can then be computed in one line:
B = mode(CC,3); %compute the mode along the 3rd dimension
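When the matrices exist as separate variables, cat(3, ...) stacks them into the 3-D array that mode expects. A small made-up example; when several values tie for most frequent, mode returns the smallest of them.

```matlab
A1 = [1 2; 3 4];
A2 = [2 5; 3 4];
A3 = [7 5; 6 4];

CC = cat(3, A1, A2, A3);   % 2 x 2 x 3: the third dimension indexes the matrices
B = mode(CC, 3);           % most frequent value at each (i,j)
% At (1,1) the values 1, 2 and 7 tie, so mode returns the smallest, 1
```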
Here is the code you need. I have introduced a number of constants:
nmatrices - number of matrices
n, m - dimensions of a single matrix
maxval - maximum value of an entry (99)
Matrices are flattened to vectors and concatenated as the columns of CC, so the dimensions of CC are [m*n, nmatrices]. Every row of CC holds the (i,j) values of all matrices, which are the values you want to analyze. The loop reads your matrices neurone1, neurone2, ... with eval; to test with random example matrices instead, use the commented-out rand line.
CC = [];
% concatenate all matrices into CC
for i=1:nmatrices
% generate some example matrices
% A = round(rand(m, n)*maxval);
A = eval(['neurone' num2str(i)]);
% flatten matrix to a vector, concatenate vectors
CC = [CC A(:)];
end
Now we do the real work. I transpose CC because MATLAB functions such as histc and sort operate on columns, so I want to analyze the columns of CC, not its rows. Then, using histc, I find the most frequently occurring values in every column of CC, i.e. at each (i,j) entry across all matrices. histc counts the values that fall into the given bins (in your case 1:maxval) in every column of CC.
% CC is of dimension [m*n, nmatrices];
% transpose it to [nmatrices, m*n] for better histc and sort performance
CC = CC';
% count values from 1 to maxval in every column of CC
counts = histc(CC, 1:maxval);
counts has dimensions [maxval, m*n]: for every (i,j) of your original matrices, it tells you how many times each value from 1:maxval occurs (values outside 1:maxval, such as 0, are not counted). The last thing to do is to sort the counts and find the most frequent value. I do not need the sorted counts themselves; I need the permutation, whose last row tells me which entry of counts has the highest value. That is exactly the value you want.
% sort the counts. Last row of the permutation will tell us,
% which entry is most frequently found in columns of CC
[~,perm] = sort(counts);
% the result is a reshaped last row of the permutation
B = reshape(perm(end,:)', m, n);
B is what you want.
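Since only the position of the largest count is needed, max can replace the full sort. A sketch with made-up data: on ties, max returns the first index, i.e. the smallest tied value, which matches what mode does.

```matlab
% Made-up example: three 2 x 2 matrices with values in 1..maxval
A1 = [1 2; 3 4];
A2 = [2 5; 3 4];
A3 = [7 5; 6 4];
maxval = 9;
[m, n] = size(A1);

CC = [A1(:) A2(:) A3(:)]';          % nmatrices x (m*n): one column per (i,j)
counts = histc(CC, 1:maxval);       % maxval x (m*n): counts per column
[~, idx] = max(counts, [], 1);      % row of the largest count = most frequent value
B = reshape(idx, m, n);
```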

2d matrix histogram in matlab that interprets each column as a separate element

I have a 128 x 100 matrix in MATLAB, where each column should be treated as a separate element. Let's call this matrix M.
I have another 128 x 2000 matrix (called V) composed of columns from M.
How would I make a histogram of how often each column of M is used in V?
hist(double(V),double(M)) gives the error:
Error using histc
Edge vector must be monotonically non-decreasing.
What should I be doing?
Here is an example. We start with data that resembles what you described:
%# a matrix of 100 columns
M = rand(128,100);
sz = size(M);
%# a matrix composed of randomly selected columns of M (with replacement)
V = M(:,randi([1 sz(2)],[1 2000]));
Then:
%# map the columns to indices starting at 1
[~,~,idx] = unique([M,V]', 'rows', 'stable');
idx = idx(sz(2)+1:end);
%# count how many times each column occurs
count = histc(idx, 1:sz(2));
%# plot histogram
bar(1:sz(2), count, 'histc')
xlabel('column index'), ylabel('frequency')
set(gca, 'XLim',[1 sz(2)])
[Lia,Locb] = ismember(A,B,'rows') also returns a vector, Locb, containing the highest index in B for each row in A that is also a row in B. The output vector, Locb, contains 0 wherever A is not a row of B.
ismember with the 'rows' argument can identify which row of one matrix each row of another matrix comes from. Since it works on rows and you are looking for columns, just transpose both matrices:
[~,Locb] = ismember(V',M','rows');
count = histc(Locb, 1:size(M,2));
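accumarray is another way to do the counting step; passing the output size keeps a zero count for columns of M that never appear in V. Small made-up M and V make the result easy to check.

```matlab
M = [1 2 3
     4 5 6];                 % made-up: 3 distinct columns
V = M(:, [1 3 3 1 1]);       % columns 1 and 3 are used, column 2 never

[~, Locb] = ismember(V', M', 'rows');        % Locb(k) = column of M matching V(:,k)
count = accumarray(Locb, 1, [size(M,2) 1]);  % zero for columns that never occur
```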