Matrix dimensions must agree min max normalization? - matlab

Hi, I get the error below while trying to normalize my data to the range 0 to 1:
columns =
6
??? Error using ==> minus
Matrix dimensions must agree.
Error in ==> Kmeans at 54
data = ((data-minData)./(maxData));
I'm not sure what I did wrong. Full code below:
%% dimensionality reduction
columns = 6
[U,S,V]=svds(fulldata,columns);
%% randomly select dataset
rows = 1000;
columns = 6;
%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows);
%# pick random columns
indY = randperm( size(fulldata,2) );
indY = indY(1:columns);
%# filter data
data = U(indX,indY);
%% apply normalization method to every cell
maxData = max(data);
minData = min(data);
data = ((data-minData)./(maxData));
The dataset is 1000x6.

From the Matlab documentation on min:
If A is a matrix, min(A) treats the columns of A as vectors, returning a row vector containing the minimum element from each column.
If you want to find the global minimum of a matrix, use either of the following forms:
min(min(A))
min(A(:))
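Applied to the question, max(data) and min(data) on a 1000x6 matrix each return a 1x6 row vector, which older MATLAB releases cannot subtract from the 1000x6 matrix directly. Below is a minimal sketch of two ways around this, assuming either one global scale or a per-column scale is wanted. The sample data and the variable names gMin, gMax, colMin, colMax are illustrative. It also divides by (max - min) rather than by max alone, which is what mapping onto the range 0 to 1 requires.

```matlab
data = rand(1000, 6);                 % illustrative stand-in for the 1000x6 dataset

% Option 1: one global minimum and maximum for the whole matrix
gMin = min(data(:));
gMax = max(data(:));
normGlobal = (data - gMin) ./ (gMax - gMin);

% Option 2: per-column scaling; bsxfun expands the 1x6 rows over all 1000 rows
colMin = min(data, [], 1);
colMax = max(data, [], 1);
normCols = bsxfun(@rdivide, bsxfun(@minus, data, colMin), colMax - colMin);
```

If a column is constant, its range is zero and the division produces NaN, so such columns would need separate handling.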

Related

PCA Sound Compression using Matlab

I'm trying to build a sound compressor using PCA (Principal Component Analysis).
The input sound is always a mono-channel one, so the resulting matrix is a column matrix called sampledData in the code.
Compressing the sound requires the matrix to be transformed into a 2D one, so the samples need to be grouped in the following way:
Here the grouping size is 3 with a sound composed of 7 samples
[sample1 sample2 sample3
sample4 sample5 sample6
sample7 0 0 ]
What difference does it make to use a low grouping size (= number of columns wanted for the matrix), e.g. 10, or a high one, e.g. 100 or more?
[sampledData, sampleRate] = audioread("soundFile.wav");
data = sampledData;
groupingSize = 10; %Number of columns wanted from the data
%Make data size a multiple of groupingSize and fill with 0
data(end + 1:groupingSize*ceil(numel(data)/groupingSize))=0;
%Make data a matrix of x rows and groupingSize columns
data = transpose(reshape(data,groupingSize,[]));
Here is the remaining code I'm using for the sound compression. Unfortunately, I get an error when percentagePC is less than 100, saying that the number of columns of the compressed matrix must equal the number of rows of the inverse of eigenVec.
To illustrate, I use groupingSize = 100, which gives me a compressed matrix of 100 columns, and eigenVec is a matrix of 100 rows and 100 columns. If I set percentagePC = 90, I get a compressed matrix of 90 columns (I keep the most useful 90% of the data), so to be able to multiply the two matrices I theoretically need to reduce the size of the eigenVec matrix by 10 columns. Is this correct reasoning according to PCA?
percentagePC = 90; %Percentage of principal component to keep
[rows, cols] = size(data);
%get eigenvalues and eigenvector by centering data matrix and getting its covariance
dataCR =(data-ones(size(data,1),1)*mean(data))./(ones(size(data,1),1)*std(data,1));
dataCov = cov(dataCR);
[eigenVec, eigenVal] = eig(dataCov);
%Sort eigenvectors (desc)
[~, index] = sort(diag(eigenVal), 'descend');
eigenVec = eigenVec(:, index);
%principal components calculation
P = zeros(rows,cols);
for i = 1:cols
P(:,i) = dataCR * eigenVec(:,i);
end
%Number of principal components wanted according to percentagePC
iterations = ceil((groupingSize*percentagePC)/100);
compressed = zeros(rows,iterations);
for i = 1: iterations
compressed (:,i) = P(:,i);
end
%Fuse principal components between each other to get final compressed signal
final_compressed = zeros(rows*iterations,1);
z = 1;
for i = 1:iterations:rows*iterations
for j = 1:iterations
final_compressed(i+j-1,1) = P(z,j);
end
z = z + 1;
end
% Decompression of compressed signal
decompressed = compressed * (eigenVec^(-1)); % /!\ ERROR IS HERE /!\
[rowsD,colsD] = size(decompressed);
final_decompressed = zeros(rowsD*colsD,1);
z = 1;
for i = 1:colsD:rowsD*colsD
for j = 1:colsD
final_decompressed(i+j-1,1) = decompressed(z,j);
end
z = z + 1;
end
filenameC = fullfile('compress.wav');
filenameD = fullfile('decompress.wav');
audiowrite(filenameC, final_compressed/max(abs(final_compressed)), round(sampleRate/(groupingSize/iterations)));
audiowrite(filenameD, decompressed, sampleRate);
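On the dimension error: the eigenvectors of a symmetric covariance matrix returned by eig are orthonormal, so the inverse of eigenVec equals its transpose, and a reduced reconstruction only needs the transpose of the columns that were kept. Below is a minimal sketch under that assumption. It uses random data in place of the audio samples, and the names W, k, mu, sigma and approxCR are illustrative, not from the original code.

```matlab
rng(0);                                % reproducible illustrative data
data = randn(200, 10);                 % stand-in for the grouped samples
mu = mean(data, 1);
sigma = std(data, 1, 1);               % same normalization as std(data,1)
dataCR = bsxfun(@rdivide, bsxfun(@minus, data, mu), sigma);

[eigenVec, eigenVal] = eig(cov(dataCR));
[~, index] = sort(diag(eigenVal), 'descend');
eigenVec = eigenVec(:, index);

k = 9;                                 % number of principal components kept
W = eigenVec(:, 1:k);                  % 10 x k
compressed = dataCR * W;               % 200 x k
approxCR = compressed * W';            % 200 x 10, so the dimensions agree
% Undo the centering and scaling to return to the original units
decompressed = bsxfun(@plus, bsxfun(@times, approxCR, sigma), mu);
```

With k equal to the number of columns the reconstruction is exact up to rounding. With fewer components it is an approximation whose error grows as more components are dropped.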

Create a submatrix using random columns and loop

I have a 102-by-102 matrix. I want to select square sub-matrices of orders from 2 up to 8 using random column numbers. Here is what I have done so far.
matt is the original matrix of size 102-by-102.
ittr = 30
cols = 3;
for i = 1:ittr
rr = randi([2,102], cols,1);
mattsub = matt([rr(1) rr(2) rr(3)], [rr(1) rr(2) rr(3)]);
end
I have to extract matrices of different orders from 2 to 8. Using the above code I would have to change the mattsub line every time I change cols. I believe it is possible to do with another loop inside but cannot figure out how. How can I do this?
There is no need to extract elements of a vector and concatenate them, just use the vector to index a matrix.
Instead of :
mattsub = matt([rr(1) rr(2) rr(3)], [rr(1) rr(2) rr(3)]);
Use this:
mattsub = matt(rr, rr);
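Combined with an outer loop over the orders, this indexing covers all sizes from 2 to 8 without editing the mattsub line. A minimal sketch follows, assuming the results should be kept in a cell array and that the chosen indices should be distinct. It swaps randi for randperm for that reason, and the names orders and k are illustrative.

```matlab
matt = rand(102);                      % illustrative stand-in for the 102-by-102 matrix
ittr = 30;
orders = 2:8;                          % submatrix orders to extract
mattsub = cell(ittr, numel(orders));   % one cell per (iteration, order) pair

for k = 1:numel(orders)
    cols = orders(k);
    for i = 1:ittr
        rr = randperm(size(matt, 1), cols);   % cols distinct indices
        mattsub{i, k} = matt(rr, rr);         % square submatrix of order cols
    end
end
```

Here mattsub{i, k} holds the i-th random submatrix of order orders(k).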
Defining a set of random sizes is easy with the randi function. Once this is done, arrayfun can map over the N sizes, one per iteration. Within each iteration, randperm and sort build the random row and column indices into the original matrix M.
Here is the full code:
% Define the starting parameters...
M = rand(102);
N = 30;
% Retrieve the matrix rows and columns...
M_rows = size(M,1);
M_cols = size(M,2);
% Create a vector of random sizes between 2 and 8...
sizes = randi(7,N,1) + 1;
% Generate the random submatrices and insert them into a vector of cells...
subs = arrayfun(@(x)M(sort(randperm(M_rows,x)),sort(randperm(M_cols,x))),sizes,'UniformOutput',false);
This works on any type of matrix, even non-square ones.
You don't need another loop; one is enough. If you use randi to get a random integer as the size of your submatrix, and then use it to get random column and row indices, you can easily get a random submatrix. Note that the output is a cell array, as the submatrices won't all be the same size.
N=102; % Or substitute with some size function
matt = rand(N); % Initial matrix, use your own
itr = 30; % Number of iterations
mattsub = cell(itr,1); % Cell for non-uniform output
for ii = 1:itr
X = randi(7)+1; % Get random integer between 2 and 8
colr = randi(N-X+1); % Random starting column
rowr = randi(N-X+1); % Random starting row
mattsub{ii} = matt(rowr:(rowr+X-1),colr:(colr+X-1));
end

Generate a random sparse matrix with N non-zero-elements

I've written a function that generates a sparse matrix of size nxd and puts in each column 2 non-zero values.
function [M] = generateSparse(n,d)
M = sparse(d,n);
sz = size(M);
nnzs = 2;
val = ceil(rand(nnzs,n));
inds = zeros(nnzs,d);
for i=1:n
ind = randperm(d,nnzs);
inds(:,i) = ind;
end
points = (1:n);
nnzInds = zeros(nnzs,d);
for i=1:nnzs
nnzInd = sub2ind(sz, inds(i,:), points);
nnzInds(i,:) = nnzInd;
end
M(nnzInds) = val;
end
However, I'd like to be able to give the function another parameter, num-nnz, which makes it randomly choose num-nnz cells and put 1 in each.
I can't use sprand as it requires a density, and I need the number of non-zero entries to be independent of the matrix size; a density inherently depends on the matrix size.
I am a bit confused about how to pick the indices and fill them. I did it with a loop, which is extremely costly, and would appreciate help.
EDIT:
Everything has to be sparse. A big enough matrix will crash in memory if I don't do it in a sparse way.
You seem close!
You could pick num_nnz random (unique) integers between 1 and the number of elements in the matrix, then assign the value 1 to the elements at those indices.
To pick the random unique integers, use randperm. To get the number of elements in the matrix use numel.
M = sparse(d, n); % create dxn sparse matrix
num_nnz = 10; % number of non-zero elements
idx = randperm(numel(M), num_nnz); % get unique random indices
M(idx) = 1; % Assign 1 to those indices
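An alternative is to convert the random linear indices to row and column subscripts and let the sparse constructor build the matrix in one call, rather than assigning into an existing sparse matrix. A minimal sketch, assuming a d-by-n matrix as in the snippet above; the sizes are illustrative.

```matlab
d = 1000;                              % illustrative number of rows
n = 2000;                              % illustrative number of columns
num_nnz = 10;                          % number of non-zero elements wanted

idx = randperm(d*n, num_nnz);          % unique random linear indices
[r, c] = ind2sub([d n], idx);          % convert to row/column subscripts
M = sparse(r, c, ones(size(r)), d, n); % build the d-by-n sparse matrix directly
```

Because the indices from randperm are unique, no two entries land on the same cell, so the result has exactly num_nnz ones.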

Sum every n rows of matrix

Is there any way that I can sum up columns values for each group of three rows in a matrix?
I can sum three rows up in a manual way.
For example
% matrix is the one I wanna store the new data.
% data is the original dataset.
matrix(1,1:end) = sum(data(1:3, 1:end))
matrix(2,1:end) = sum(data(4:6, 1:end))
...
But if the dataset is huge, this wouldn't work.
Is there any way to do this automatically without loops?
Here are four other ways:
The obligatory for-loop:
% for-loop over each three rows
matrix = zeros(size(data,1)/3, size(data,2));
counter = 1;
for i=1:3:size(data,1)
matrix(counter,:) = sum(data(i:i+3-1,:));
counter = counter + 1;
end
Using mat2cell for tiling:
% divide each three rows into a cell
matrix = mat2cell(data, ones(1,size(data,1)/3)*3);
% compute the sum of rows in each cell
matrix = cell2mat(cellfun(@sum, matrix, 'UniformOutput',false));
Using third dimension (based on this):
% put each three row into a separate 3rd dimension slice
matrix = permute(reshape(data', [], 3, size(data,1)/3), [2 1 3]);
% sum rows, and put back together
matrix = permute(sum(matrix), [3 2 1]);
Using accumarray:
% build array of group indices [1,1,1,2,2,2,3,3,3,...]
idx = floor(((1:size(data,1))' - 1)/3) + 1;
% use it to accumulate rows (applied to each column separately)
matrix = cell2mat(arrayfun(@(i)accumarray(idx,data(:,i)), 1:size(data,2), ...
'UniformOutput',false));
Of course, all the solutions so far assume that the number of rows is evenly divisible by 3.
This one-liner reshapes so that all the values needed for a particular cell are in a column, does the sum, and then reshapes the result back to the expected shape.
reshape(sum(reshape(data, 3, [])), [], size(data, 2))
The naked 3 could be changed if you want to sum a different number of rows together. It's up to you to make sure the number of rows is evenly divisible by the group size.
Slice the matrix into three pieces and add them together:
matrix = data(1:3:end, :) + data(2:3:end, :) + data(3:3:end, :);
This will give an error if size(data,1) is not a multiple of three, since the three pieces wouldn't be the same size. If appropriate to your data, you might work around that by truncating data, or appending some zeros to the end.
You could also do something fancy with reshape and 3D arrays. But I would prefer the above (unless you need to replace 3 with a variable...)
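For a variable group size, the reshape approach with a 3D array might look like the following. This is a minimal sketch, assuming that padding with zero rows is acceptable when the row count is not a multiple of the group size; the names n, padded and grouped are illustrative.

```matlab
data = [1 2; 3 4; 5 6; 7 8; 9 10; 11 12; 13 14];   % 7 rows, not a multiple of 3
n = 3;                                 % number of rows to sum per group

[m, c] = size(data);
padRows = mod(-m, n);                  % zero rows needed to reach a multiple of n
padded = [data; zeros(padRows, c)];
g = size(padded, 1) / n;               % number of groups

% Dimension 1 = row within group, dimension 2 = group, dimension 3 = column
grouped = reshape(padded, n, g, c);
matrix = reshape(sum(grouped, 1), g, c);
```

For this input the result is [9 12; 27 30; 13 14], where the last row is the sum of the single leftover row plus the zero padding.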
Prashant answered nicely before but I would have a simple amendment:
fl = filterLength;
A = yourVector; % must satisfy mod(numel(A),fl)==0
sum(reshape(A,fl,[]),1).'/fl;
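The explicit dimension argument ",1" makes the line work even when fl==1 (returning the original values); without it, sum would collapse the resulting row vector to a single scalar.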
I discovered this while running it in a for loop like so:
... read A ...
% Plot data
hold on;
averageFactors = [1 3 10 30 100 300 1000];
colors = hsv(length(averageFactors));
clear legendTxt;
for i=1:length(averageFactors)
% ------ FILTERING ----------
clear Atrunc;
clear ttrunc;
clear B;
fl = averageFactors(i); % filter length
Atrunc = A(1:L-mod(L,fl),:);
ttrunc = t(1:L-mod(L,fl),:);
B = sum(reshape(Atrunc,fl,[]),1).'/fl;
tB = sum(reshape(ttrunc,fl,[]),1).'/fl;
length(B)
plot(tB,B,'color',colors(i,:) )
%kbhit ()
endfor

Table of correlation values

If you run the following code you will end up with a cell array composed of a correlation value in CovMatrix(:,3) and the name of the data used in calculating the correlation in CovMatrix(:,1) and CovMatrix(:,2):
clear all
FieldName = {'Name1','Name2','Name3','Name4','Name5'};
Data={rand(12,1),rand(12,1),rand(12,1),rand(12,1),rand(12,1)};
DataCell = [FieldName;Data];%place in a structure - this is the same
%structure that the data for the lakes will be placed in.
DataStructure = struct(DataCell{:});
FieldName = fieldnames(DataStructure);
Combinations = nchoosek (1:numel(FieldName),2);
d1 = cell2mat(struct2cell(DataStructure)');%this will be the surface temperatures
%use the combinations found in 'Combinations' to define which elements to
%use in calculating the coherence.
R = cell(1,size(Combinations,1));%pre-allocate the cell array
Names1 = cell(1,size(Combinations,1));
for j = 1:size(Combinations,1);
[R{j},P{j}] = corrcoef([d1(:,[Combinations(j,1)]),d1(:,[Combinations(j,2)])]);
Names1{j} = ([FieldName([Combinations(j,1)],1),FieldName([Combinations(j,2)],1)]);
end
%only obtain a single value for the correlation and p-value
for i = 1:size(Combinations,1);
R{1,i} = R{1,i}(1,2);
P{1,i} = P{1,i}(1,2);
end
R = R';P = P';
%COVARIANCE MATRIX
CovMatrix=cell(size(Combinations,1),3);%pre-allocate memory
for i=1:size(Combinations,1);
CovMatrix{i,3}=R{i,1};
CovMatrix{i,1}=Names1{1,i}{1,1};
CovMatrix{i,2}=Names1{1,i}{1,2};
end
From this I need to produce a table of the values, preferably in the form of a correlation matrix, similar to jeremytheadventurer.blogspot.com. Would this be possible in MATLAB?
You can compute the correlation matrix of your entire data set in one shot using the corrcoef command:
% d1 can be simply computed as
d1_new = cell2mat(Data);
% Make sure that d1_new is the same matrix as d1
max(abs(d1(:)-d1_new(:)))
% Compute correlation matrix of columns of data in d1_new in one shot
CovMat = corrcoef(d1_new)
% Make sure that entries in CovMat are equivalent to the third column of
% CovMatrix, e.g.
CovMat(1,2)-CovMatrix{1,3}
CovMat(1,4)-CovMatrix{3,3}
CovMat(3,4)-CovMatrix{8,3}
CovMat(4,5)-CovMatrix{10,3}
Because the correlation matrix CovMat is symmetric, this contains the required result if you ignore the upper triangular part.
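To show the result as a labeled table, one option is to wrap the correlation matrix in a table with the field names as row and column labels. A minimal sketch, assuming a MATLAB release with table support (array2table was introduced in R2013b); random data stands in for d1_new and the name CorrTable is illustrative.

```matlab
FieldName = {'Name1','Name2','Name3','Name4','Name5'};
d1_new = rand(12, 5);                  % illustrative stand-in for the data matrix

CovMat = corrcoef(d1_new);             % 5x5 correlation matrix of the columns
LowerOnly = tril(CovMat);              % zero out the redundant upper triangle

% Label rows and columns with the field names
CorrTable = array2table(LowerOnly, ...
    'VariableNames', FieldName, 'RowNames', FieldName);
disp(CorrTable)
```

Individual values can then be looked up by name, for example CorrTable{'Name2','Name1'}.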