Linear fit of data in a cell array - matlab

I have a cell array of data. The array is a single column with between 500 and 3000 elements. Each element is a matrix with four columns. The first column holds the x-values and the second the y-values; the third and fourth columns are irrelevant.
What I would like to do now is fit these x- and y-values to a linear function (y=a*x+b), but using only a fraction of the values, e.g. the first 10, 20, or 50%; the rest should not be considered. I have trouble accessing the relevant data in the cell array properly and finding a way to fit only a fraction of the data. What makes the task even more challenging for me is that the number of x- and y-values is different for every element of the array.
Although the elements of the array all have the size 500x4, some of them contain only a few x- and y-pairs followed almost entirely by NaN, while others contain almost 500 pairs and only a few NaN.

% cell array of data...
% single column, 500 to 3000 elements..
mycell = cell(1000,1);
for k = 1 : 1000
% each element is a matrix with 4 columns.. 500x4
mycell{k} = rand(500,4);
end
% first 10% means 500/10 or 50 elements
% first 50% means 500/2 or 250 elements
% lets fit the first 10% (50 elements)
for k = 1 : 1000
% get elements 1:50, or 10%
% x is col 1
x = mycell{k}(1:50, 1);
% y is col 2
y = mycell{k}(1:50, 2);
% polyfit with degree 1 returns p = [a b] for y = a*x + b. see also:
% http://www.mathworks.com/help/matlab/data_analysis/linear-regression.html
p = polyfit(x, y, 1);
end
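The loop above always takes rows 1:50, so it does not account for the NaN padding described in the question. A minimal sketch of one way to handle that, assuming the valid pairs come first and the NaN rows trail them: count the valid rows per element, take the requested fraction of those, and store the coefficients. The synthetic data, `frac`, and `coeffs` are illustrative names, not part of the original post.

```matlab
% Synthetic stand-in for the real data: 500x4 matrices with trailing NaN rows
rng(0);
nCells = 5;
mycell = cell(nCells, 1);
for k = 1:nCells
    nValid = randi([50 500]);               % hypothetical number of valid pairs
    M = NaN(500, 4);
    xk = (1:nValid).';
    M(1:nValid, 1) = xk;                    % x values
    M(1:nValid, 2) = 2*xk + 1 + 0.01*randn(nValid, 1);  % y close to 2*x + 1
    mycell{k} = M;
end

frac = 0.10;                                % fraction of the valid rows to fit
coeffs = NaN(nCells, 2);                    % each row: [a b] of y = a*x + b
for k = 1:nCells
    M = mycell{k};
    valid = ~any(isnan(M(:, 1:2)), 2);      % rows where both x and y are numbers
    xy = M(valid, 1:2);
    n = min(size(xy, 1), max(2, ceil(frac * size(xy, 1))));
    if n >= 2                               % a line needs at least two points
        coeffs(k, :) = polyfit(xy(1:n, 1), xy(1:n, 2), 1);
    end
end
```

Elements with fewer than two valid pairs keep NaN coefficients, which makes them easy to spot afterwards.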

Related

Divide a matrix and its corresponding vector into submatrices and subvectors in MATLAB

I have two matrices X (122 x 125973) and Y (1 x 125973). I want to split X and Y in the same way into smaller matrices and vectors of 122 x 1024 (column division) in MATLAB.
I have tried several methods (mat2cell, loops, etc.), but I think I am missing the syntax. Any help?
Note: 125973 is not divisible by 1024, so the last matrix (and vector) will have the size (122 x 21) (and (1 x 21) respectively). Thank you for your help!
Since your sub-matrices do not have equal size, you cannot put them in a 3D array (without NaN or zero padding). So you can use a cell. To do this with mat2cell you have to specify how many rows and how many columns of the original matrix should be put in each individual entry of the cell:
X = rand(122,125973);
Y = rand(1,125973);
% your number of columns per 'block'
n = 1024;
% the number of cols per cell entry:
colDist = [repelem(n, floor(size(X,2)/n)) rem(size(X,2),n)];
Xcell = mat2cell(X, size(X,1), colDist);
Ycell = mat2cell(Y, size(Y,1), colDist);
Here repelem(n, floor(size(X,2)/n)) repeats n for the number of times n fits in the number of columns of X. Then I append the remainder for the number of columns at the end (rem(size(X,2),n)) of this division to this row vector colDist.
When calling mat2cell (mat2cell(X, rowDist, colDist)) the second argument rowDist should then contain the number of rows per cell entry, which for each cell entry will be equal to the number of rows in X or Y.
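To see the column distribution on a small case, here is a hedged sketch with a 3 x 10 stand-in matrix and blocks of 4 columns. It uses repmat in place of repelem (so it also runs on releases older than R2015a) and appends the remainder only when it is nonzero; the names `nFull` and `r` are illustrative.

```matlab
X = reshape(1:30, 3, 10);                    % 3 x 10 stand-in for the 122 x 125973 matrix
n = 4;                                       % columns per block
nFull = floor(size(X, 2) / n);               % number of complete blocks
r = rem(size(X, 2), n);                      % leftover columns
colDist = [repmat(n, 1, nFull), r(r > 0)];   % here [4 4 2]
Xcell = mat2cell(X, size(X, 1), colDist);    % 1 x 3 cell of 3x4, 3x4 and 3x2 blocks

% Sanity check: the blocks concatenate back to the original matrix
roundTripOk = isequal(cell2mat(Xcell), X);
```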
Alternatively, you can use a loop to divide the matrix and vector in sub-matrices and put them in the appropriate cell.
Xcell = cell(ceil(size(X,2)/n),1);
Ycell = cell(ceil(size(X,2)/n),1);
% put in blocks of at most n columns; the last block holds the remainder
for k = 1:numel(Xcell)
indices = n*(k-1)+1 : min(n*k, size(X,2));
Xcell{k} = X(:,indices);
Ycell{k} = Y(:,indices);
end

Vectorizing by splitting matrix by rows unequally

I have X_test which is a matrix of size 967874 x 3 where the columns are: doc#, wordID, wordCount, and there's 7505 unique doc#'s (length(unique(X_test(:,1))) == length(Y_test) == 7505). The matrix rows are also already sorted according to the doc#'s column.
I also have a likelihoods matrix of size 61188 x 20 where the rows are all possible wordIDs, and the columns are different classes (length(unique(Y_test))==20)
The result I'm trying to obtain is a matrix of size 7505 x 20 where each row signifies a different document and contains, for each class (column), the sum of the wordCounts of the values in the likelihood matrix rows which correspond to the wordIDs for that document (trying to think of better phrasing...)
My first thought was to rearrange this 2D matrix into a 3D matrix according to doc#s, but the number of rows for each unique doc# are unequal. I also think making a cell array of 7505 matrices isn't a great idea, but may be wrong about that.
It's probably more explanatory if I just show the code I have that works, but is slow because it iterates through each of the 7505 documents:
probabilities = zeros(length(Y_test),nClasses); % 7505 x 20
for n=1:length(Y_test) % 7505 iterations
doc = X_test(X_test(:,1)==n,:);
result = bsxfun(@times, doc(:,3), log(likelihoods(doc(:,2),:)));
% result ends up size size(doc,1) x 20
probabilities(n,:) = sum(result, 1); % sum over rows, also when doc has a single row
end
For context, this is what I use the probabilities matrix for:
% MAP decision rule
probabilities = bsxfun(@plus, probabilities, logpriors'); % add priors
[~,predictions] = max(probabilities,[],2);
CCR = sum(predictions==Y_test)/length(Y_test); % correct classification rate
fprintf('Correct classification percentage: %0.2f%%\n\n', CCR*100);
edit: so I separated the matrix into a cell array according to doc#'s, but don't know how to apply bsxfun to all arrays in a cell at the same time.
counts = histc(X_test(:,1),unique(X_test(:,1)));
testdocs = mat2cell(X_test,counts);
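One possible way to avoid both the loop and the cell array, offered as a sketch rather than a confirmed answer: build a sparse document-by-word count matrix and multiply it with the log-likelihoods. This assumes the doc#'s are the integers 1..nDocs (as the loop above already does) and that all likelihoods are strictly positive. The small random data set below is hypothetical.

```matlab
rng(3);
nDocs = 6; nWords = 10; nClasses = 4;
% Hypothetical small data set: rows are [doc#, wordID, wordCount], sorted by doc#
docIds = sort(randi(nDocs, 40, 1));
X_test = [docIds, randi(nWords, 40, 1), randi(5, 40, 1)];
likelihoods = rand(nWords, nClasses) + 0.1;   % strictly positive, so log is finite

% Document-by-word count matrix; sparse() sums duplicate (doc, word) pairs
C = sparse(X_test(:,1), X_test(:,2), X_test(:,3), nDocs, nWords);

% Row d is sum over words of count(d, w) * log(likelihoods(w, :))
probabilities = full(C * log(likelihoods));   % nDocs x nClasses
```

Each row of the product is the count-weighted sum of the log-likelihood rows for that document, which is the quantity the loop accumulates one document at a time.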

Get Matrix of minimum coordinate distance to point set

I have a set of points or coordinates like {(3,3), (3,4), (4,5), ...} and want to build a matrix with the minimum distance to this point set. Let me illustrate using a runnable example:
width = 10;
height = 10;
% Get min distance to those points
pts = [3 3; 3 4; 3 5; 2 4];
sumSPts = length(pts);
% Helper to determine element coordinates
[cols, rows] = meshgrid(1:width, 1:height);
PtCoords = cat(3, rows, cols);
AllDistances = zeros(height, width,sumSPts);
% To get Roh_I of every pt
for k = 1:sumSPts
% Get coordinates of current Scribble Point
currPt = pts(k,:);
% Get Row and Col diffs
RowDiff = PtCoords(:,:,1) - currPt(1);
ColDiff = PtCoords(:,:,2) - currPt(2);
AllDistances(:,:,k) = sqrt(RowDiff.^2 + ColDiff.^2);
end
MinDistances = min(AllDistances, [], 3);
This code runs perfectly fine, but I have to deal with matrix sizes of about 700 million entries (height = 700, width = 500, sumSPts = 2k), and this slows down the calculation. Is there a better algorithm to speed things up?
As stated in the comments, you don't necessarily have to put everything into one huge matrix. You can:
1. Slice the pts matrix into reasonably small slices (say of length 100)
2. Loop on the slices and calculate the Mindistances slice over these points
3. Take the global min
tic
Mindistances=[];
width = 500;
height = 700;
Np=2000;
pts = [randi(width,Np,1) randi(height,Np,1)];
SliceSize=100;
[Xcoords,Ycoords]=meshgrid(1:width,1:height);
% Compute the minima for the slices from 1 to floor(Np/SliceSize)
for i=1:floor(Np/SliceSize)
% Calculate indexes of the next slice
SliceIndexes=((i-1)*SliceSize+1):i*SliceSize;
% Get the corresponding points and reshape them to a vector along the 3rd dim.
Xpts=reshape(pts(SliceIndexes,1),1,1,[]);
Ypts=reshape(pts(SliceIndexes,2),1,1,[]);
% Do all the diffs between your coordinates and your points using bsxfun singleton expansion
Xdiffs=bsxfun(@minus,Xcoords,Xpts);
Ydiffs=bsxfun(@minus,Ycoords,Ypts);
% Calculate all the distances of the slice in one call
Alldistances=bsxfun(@hypot,Xdiffs,Ydiffs);
% Concatenate the mindistances
Mindistances=cat(3,Mindistances,min(Alldistances,[],3));
end
% Check if last slice needed
if mod(Np,SliceSize)~=0
% Get the corresponding points and reshape them to a vector along the 3rd dim.
Xpts=reshape(pts(floor(Np/SliceSize)*SliceSize+1:end,1),1,1,[]);
Ypts=reshape(pts(floor(Np/SliceSize)*SliceSize+1:end,2),1,1,[]);
% Do all the diffs between your coordinates and your points using bsxfun singleton expansion
Xdiffs=bsxfun(@minus,Xcoords,Xpts);
Ydiffs=bsxfun(@minus,Ycoords,Ypts);
% Calculate all the distances of the slice in one call
Alldistances=bsxfun(@hypot,Xdiffs,Ydiffs);
% Concatenate the mindistances
Mindistances=cat(3,Mindistances,min(Alldistances,[],3));
end
% Get global minimum
Mindistances=min(Mindistances,[],3);
toc
Elapsed time is 9.830051 seconds.
Note:
You will not end up doing fewer calculations, but this is a lot less demanding on memory (700M doubles take about 5.6 GB at 8 bytes each), which speeds up the process (with the help of vectorization as well).
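A variation on the same idea, sketched here under the assumption that memory rather than arithmetic is the bottleneck: keep a single running minimum of size height x width and update it per point, so no 3-D array is ever built. The sizes are reduced and the names `Xc`, `Yc`, and `MinD` are illustrative; this has not been timed against the sliced version above.

```matlab
width = 50; height = 70; Np = 200;           % reduced sizes for a quick run
rng(1);
pts = [randi(width, Np, 1) randi(height, Np, 1)];   % columns are [x y]
[Xc, Yc] = meshgrid(1:width, 1:height);      % pixel coordinates, height x width

MinD = inf(height, width);                   % running minimum distance
for k = 1:Np
    % Distance from every pixel to point k, folded into the running minimum
    MinD = min(MinD, hypot(Xc - pts(k, 1), Yc - pts(k, 2)));
end
```

The working set stays at a few height x width matrices regardless of the number of points.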
About bsxfun singleton expansion
One of the great strengths of bsxfun is that you don't have to feed it matrices whose values are along the same dimensions.
For example:
Say I have two vectors X and Y defined as :
X=[1 2]; % row vector X
Y=[1;2]; % Column vector Y
And that I want a 2x2 matrix Z built as Z(i,j)=X(j)+Y(i) for 1<=i<=2 and 1<=j<=2.
Suppose you don't know about the existence of meshgrid (the example is a bit too simple); then you'll have to do:
Xs=repmat(X,2,1);
Ys=repmat(Y,1,2);
Z=Xs+Ys;
While with bsxfun you can just do :
Z=bsxfun(@plus,X,Y);
To calculate the value of Z(2,2) for example, bsxfun will automatically fetch the second value of X and Y and compute. This has the advantage of saving a lot of memory space (No need to define Xs and Ys in this example) and being faster with big matrices.
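As a quick check of the equivalence, the following sketch uses a 1x3 row and a 2x1 column so the result is not symmetric. It also shows implicit expansion, which is available from MATLAB R2016b on and lets the plain + operator expand singleton dimensions the same way. The variable names are illustrative.

```matlab
X = [1 2 3];                                 % 1x3 row vector
Y = [10; 20];                                % 2x1 column vector

Zrep = repmat(X, 2, 1) + repmat(Y, 1, 3);    % explicit replication
Zbsx = bsxfun(@plus, X, Y);                  % singleton expansion via bsxfun
Zimp = X + Y;                                % implicit expansion (R2016b or later)

% All three give [11 12 13; 21 22 23], i.e. Z(i,j) = X(j) + Y(i)
```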
Bsxfun Vs Repmat
If you're interested in comparing the computational time of bsxfun and repmat, here are two excellent (the word is not even strong enough) SO posts by Divakar:
Comparing BSXFUN and REPMAT
BSXFUN on memory efficiency with relational operations

Matlab bar3 plot

I have a problem with MATLAB bar3 plots: Here is what I have:
m x n Array Values containing values of a measurement.
Another m x n array Angles holds the angle at which each value was measured (e.g. the 3rd value was measured at an angle of 90°).
I need a range for my x-axis from -180° to +180°. This alone is no problem. But how do I hand over my measurement values? I have to link them to the angular values somehow, so that each value in Values is linked to its angular value in Angles. For my y-axis, I can simply count from 0 to the number of rows of my Values array.
EXAMPLE:
Values looks like:
3 5 6
2 1 7
5 8 2
Angles looks like:
37° 38° 39°
36° 37° 38°
34° 35° 36°
Values(1,1) = 3 was measured at Angles(1,1) = 37° for example.
At each angle, the number of bars varies depending on how many measurements exist for that angle. bar3 needs a matrix input. In order to build a matrix, missing values are filled with NaN.
Warning: NaNs are usually ignored by plotting commands, but bar3 apparently breaks this convention. It seems to replace NaNs by zeros! So at missing values you'll get a zero-height bar (instead of no bar at all).
[uAngles, ~, uAngleLabels] = unique(Angles); %// get unique values and
%// corresponding labels
valuesPerAngle = accumarray(uAngleLabels(:), Values(:), [], @(v) {v});
%// cell array where each cell contains all values corresponding to an angle
N = max(cellfun(@numel, valuesPerAngle));
valuesPerAngle = cellfun(@(c) {[c; NaN(N-numel(c),1)]}, valuesPerAngle);
%// fill with NaNs to make all cells of equal length, so that they can be
%// concatenated into a matrix
valuesPerAngle = cat(2, valuesPerAngle{:}); %// matrix of values for each angle,
%// filled with NaNs where needed
bar3(uAngles, valuesPerAngle.'); %'// finally, the matrix can be plotted
ylabel('Angles')
xlabel('Measurement')
With your example Values and Angles this gives a bar3 plot of the padded matrix (figure not reproduced here).
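To check the grouping and padding steps without plotting, here is a sketch that runs them on the example Values and Angles from the question. It passes 'UniformOutput', false to cellfun instead of wrapping the result in braces; that is a stylistic swap, not the answer's exact code. The name `padded` is illustrative.

```matlab
Values = [3 5 6; 2 1 7; 5 8 2];
Angles = [37 38 39; 36 37 38; 34 35 36];

[uAngles, ~, labels] = unique(Angles);       % uAngles = 34:39 as a column
% One cell per unique angle, holding every value measured at that angle
valuesPerAngle = accumarray(labels(:), Values(:), [], @(v) {v});
N = max(cellfun(@numel, valuesPerAngle));    % largest group size (2 here)
% Pad each group with trailing NaNs so all columns have N rows
padded = cellfun(@(c) [c; NaN(N - numel(c), 1)], valuesPerAngle, ...
                 'UniformOutput', false);
padded = cat(2, padded{:});                  % N x numel(uAngles) matrix
```

The transposed matrix `padded.'` together with `uAngles` is what the answer hands to bar3.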

How to select values with the higher occurences from several matrices having the same size in matlab?

I would like to have a program that makes the following actions:
Read several matrices having the same size (1126x1440 double)
Select the most frequently occurring value in each cell (same i,j of the matrices)
Write this value to an output matrix of the same size (1126x1440) at the corresponding i,j position, so that each cell of the output matrix holds the most frequent value found at that position across all the input matrices.
Building on @angainor's answer, I think there is a simpler method using the mode function.
nmatrices - number of matrices
n, m - dimensions of a single matrix
maxval - maximum value of an entry (99)
First organize data into a 3-D matrix with dimensions [n X m X nmatrices]. As an example, we can just generate the following random data in a 3-D form:
CC = round(rand(n, m, nmatrices)*maxval);
and then the computation of the most frequent values is one line:
B = mode(CC,3); %compute the mode along the 3rd dimension
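A tiny worked case makes the behavior easy to verify by hand. The three 2x2 matrices below are made up for illustration. Note that when several values tie for most frequent, mode returns the smallest of them.

```matlab
A1 = [1 2; 3 4];
A2 = [1 5; 3 6];
A3 = [7 2; 8 6];

CC = cat(3, A1, A2, A3);     % stack the matrices along the 3rd dimension
B = mode(CC, 3);             % most frequent value at each (i,j)
% B is [1 2; 3 6]: e.g. position (2,2) holds 4, 6, 6, so the mode is 6
```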
Here is the code you need. I have introduced a number of constants:
nmatrices - number of matrices
n, m - dimensions of a single matrix
maxval - maximum value of an entry (99)
I first generate example matrices with rand. Matrices are changed to vectors and concatenated in the CC matrix. Hence, the dimensions of CC are [m*n, nmatrices]. Every row of CC holds individual (i,j) values for all matrices - those you want to analyze.
CC = [];
% concatenate all matrices into CC
for i=1:nmatrices
% generate some example matrices
% A = round(rand(m, n)*maxval);
A = eval(['neurone' num2str(i)]);
% flatten matrix to a vector, concatenate vectors
CC = [CC A(:)];
end
Now we do the real work. I have to transpose CC, because MATLAB functions such as histc operate on columns, so I want to analyze individual columns of CC, not rows. Next, using histc I find the most frequently occurring values in every column of CC, i.e. in the (i,j) entries of all matrices. histc counts the values that fall into given bins (in your case 1:maxval) in every column of CC.
% CC is of dimension [nmatrices, m*n]
% transpose it for better histc and sort performance
CC = CC';
% count values from 1 to maxval in every column of CC
counts = histc(CC, 1:maxval);
counts has dimensions [maxval, m*n]: for every (i,j) of your original matrices you know the number of times a given value from 1:maxval is represented. The last thing to do now is to sort the counts and find out which value is the most frequently occurring one. I do not need the sorted counts; I need the permutation that tells me which entry of counts has the highest value. That is exactly what you want to find out.
% sort the counts. Last row of the permutation will tell us,
% which entry is most frequently found in columns of CC
[~,perm] = sort(counts);
% the result is a reshaped last row of the permutation
B = reshape(perm(end,:)', m, n);
B is what you want.
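Since only the position of the largest count is needed, the sort can be swapped for max. The following is a sketch of that variation, not the answer's own code. It assumes all entries are integers in 1:maxval (a value of 0 would not be counted by these bins) and uses random stand-in data. On ties, max returns the first index, so the smallest value wins, which matches what mode does.

```matlab
rng(2);
m = 4; n = 5; nmatrices = 7; maxval = 9;
CC3 = randi(maxval, m, n, nmatrices);        % stand-in data, values in 1..maxval

CC = reshape(CC3, m*n, nmatrices).';         % [nmatrices, m*n]: one column per (i,j)
counts = histc(CC, 1:maxval);                % [maxval, m*n] occurrence counts
[~, best] = max(counts, [], 1);              % row index of the largest count = value
B = reshape(best, m, n);                     % most frequent value per (i,j)
```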