I'm trying to build a sound compressor using PCA (Principal Component Analysis).
The input sound is always a mono-channel one, so the resulting matrix is a column matrix called sampledData in the code.
Compressing the sound requires the matrix to be transformed into a 2D one. So, one would need to group the samples the following way:
Here the grouping size is 3 with a sound composed of 7 samples
[sample1 sample2 sample3
sample4 sample5 sample6
sample7 0 0 ]
What difference does it make to use a low grouping size (= number of columns wanted for the matrix), e.g. 10, versus a high one, e.g. 100 or more?
[sampledData, sampleRate] = audioread("soundFile.wav");
data = sampledData;
groupingSize = 10; %Number of columns wanted from the data
%Make data size a multiple of groupingSize and fill with 0
data(end + 1:groupingSize*ceil(numel(data)/groupingSize))=0;
%Make data a matrix of x rows and groupingSize columns
data = transpose(reshape(data,groupingSize,[]));
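As a quick sanity check, the padding and reshape steps can be tried on the 7-sample example from the question. This is only a sketch; the names `samples` and `grouped` are made up here and stand in for `sampledData` and `data`.

```matlab
% Toy signal standing in for sampledData (7 samples, column vector)
samples = (1:7)';
groupingSize = 3;

% Pad with zeros up to the next multiple of groupingSize
paddedLength = groupingSize * ceil(numel(samples) / groupingSize);
samples(end + 1:paddedLength) = 0;

% reshape fills column by column, so reshape to groupingSize rows first
% and then transpose, which puts consecutive samples along each row
grouped = reshape(samples, groupingSize, []).';
disp(grouped)
```

Each row of `grouped` then holds `groupingSize` consecutive samples, with the last row zero-padded.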
Here is the remaining code I'm using for the sound compression. Unfortunately I get an error when percentagePC is less than 100, saying that the number of columns of the compressed matrix must equal the number of rows of the inverse of eigenVec.
To illustrate, I use groupingSize = 100, which gives me a compressed matrix of 100 columns, and eigenVec is a matrix of 100 rows and 100 columns. If I set percentagePC = 90, I then get a compressed matrix of 90 columns (I keep the 90% of the data that is most useful), so to be able to multiply both matrices I theoretically need to reduce the size of the eigenVec matrix by 10 columns. Is this correct reasoning according to PCA?
percentagePC = 90; %Percentage of principal component to keep
[rows, cols] = size(data);
%get eigenvalues and eigenvector by centering data matrix and getting its covariance
dataCR =(data-ones(size(data,1),1)*mean(data))./(ones(size(data,1),1)*std(data,1));
dataCov = cov(dataCR);
[eigenVec, eigenVal] = eig(dataCov);
%Sort eigenvectors (desc)
[~, index] = sort(diag(eigenVal), 'descend');
eigenVec = eigenVec(:, index);
%principal components calculation
P = zeros(rows,cols);
for i = 1:cols
P(:,i) = dataCR * eigenVec(:,i);
end
%Number of principal components wanted according to percentagePC
iterations = ceil((groupingSize*percentagePC)/100);
compressed = zeros(rows,iterations);
for i = 1: iterations
compressed (:,i) = P(:,i);
end
%Fuse principal components between each other to get final compressed signal
final_compressed = zeros(rows*iterations,1);
z = 1;
for i = 1:iterations:rows*iterations
for j = 1:iterations
final_compressed(i+j-1,1) = P(z,j);
end
z = z + 1;
end
% Decompression of compressed signal
decompressed = compressed * (eigenVec^(-1)); % /!\ ERROR IS HERE /!\
[rowsD,colsD] = size(decompressed);
final_decompressed = zeros(rowsD*colsD,1);
z = 1;
for i = 1:colsD:rowsD*colsD
for j = 1:colsD
final_decompressed(i+j-1,1) = decompressed(z,j);
end
z = z + 1;
end
filenameC = fullfile('compress.wav');
filenameD = fullfile('decompress.wav');
audiowrite(filenameC, final_compressed/max(abs(final_compressed)), round(sampleRate/(groupingSize/iterations)));
audiowrite(filenameD, decompressed, sampleRate);
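One possible reading of the dimension mismatch, sketched under assumptions: if only the first k principal components are kept, the reconstruction would use the matching k eigenvectors rather than the full inverse. Because the covariance matrix is symmetric, its eigenvectors can be taken as orthonormal, so the transpose of the truncated eigenvector matrix plays the role of the inverse. The synthetic `data`, and the names `mu`, `sigma`, `k`, `V` and `approxCR`, are made up for this sketch and are not part of the code above.

```matlab
% Synthetic stand-in for the grouped sound matrix (rows x groupingSize)
data = randn(200, 10);
[rows, groupingSize] = size(data);
percentagePC = 90;

% Standardize each column, keeping mean and std so the step can be undone
% (assumes no column is constant, otherwise sigma would contain zeros)
mu = mean(data);
sigma = std(data, 1);
dataCR = (data - ones(rows,1)*mu) ./ (ones(rows,1)*sigma);

% Eigen-decomposition of the (symmetrized) covariance, sorted descending
C = cov(dataCR);
C = (C + C.') / 2;
[eigenVec, eigenVal] = eig(C);
[~, index] = sort(diag(eigenVal), 'descend');
eigenVec = eigenVec(:, index);

% Keep the first k eigenvectors and project onto them
k = ceil(groupingSize * percentagePC / 100);
V = eigenVec(:, 1:k);          % groupingSize x k
compressed = dataCR * V;       % rows x k

% Approximate reconstruction: (rows x k) * (k x groupingSize)
approxCR = compressed * V.';
decompressed = approxCR .* (ones(rows,1)*sigma) + ones(rows,1)*mu;
disp(size(decompressed))
```

With `k` equal to `groupingSize` the reconstruction is exact up to rounding; with fewer components it is an approximation whose error grows as more components are dropped.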
Here's my example data.
Ycoordinate = 10;
Xcoordinate = 12;
Zdata = 4;
my3Darray = zeros(Ycoordinate, Xcoordinate, Zdata);
for i = 1:Ycoordinate
for j = 1:Xcoordinate
my3Darray(i,j,:) = uint8(rand(Zdata,1)*64);
end
end
my3Darray = uint8(my3Darray);
As you can see, there are 120 locations (Y:10 * X:12) and each location holds 4 uint8 values.
And here are my questions.
I want to find out whether any two or more locations have the same Zdata vector (4 uint8 values). How can I do this?
My actual data will be Ycoordinate=7000, Xcoordinate=7000, Zdata = 500.
So it will be an array of around 24.5 GB (7000*7000*500 = 24,500,000,000 bytes).
Is it possible to find identical Zdata vectors in an array this large?
Additionally, my data is actually boolean, so it is just 0 or 1, but I don't know how to allocate only 1 bit (not 1 byte) per value.
The code below will tell you how many locations have duplicate z-data vectors. The idea is to reshape your data into a 2D matrix where each row holds the z-data vector of one (y, x) location from the original array. The reshaped matrix will have Xcoordinate*Ycoordinate rows and Zdata columns. Then you can use the unique function to get the unique rows of this reshaped matrix, which essentially removes any duplicate z-data vectors.
You can also replace the nested loop in your code with the following line to directly generate a 3D random matrix:
my3Darray = uint8(rand(Ycoordinate, Xcoordinate, Zdata)*64);
If you want to store boolean data, use logical arrays in MATLAB.
Edit: Follow beaker's comment above to reduce the memory footprint.
Here's the code:
clear
clc
Ycoordinate = 4000;
Xcoordinate = 4000;
Zdata = 63;
my3Darray = uint8(rand(Ycoordinate,Xcoordinate,Zdata)*64);
%reshape data so that each z-column becomes a row
A = reshape(my3Darray,Ycoordinate*Xcoordinate,Zdata);
[A_unique, I, J] = unique(A,'rows'); %get the unique rows of A
duplicate_count = size(A,1) - size(A_unique,1)
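To see which locations share a z-data vector, and not only how many duplicates there are, the third output of unique can be combined with accumarray and ind2sub. The following is a small sketch on a hand-built 3 x 4 x 2 array; the names `counts`, `isDuplicated`, `dupY` and `dupX` are made up for illustration.

```matlab
Ycoordinate = 3; Xcoordinate = 4; Zdata = 2;

% Build a small array where every location is distinct ...
my3Darray = zeros(Ycoordinate, Xcoordinate, Zdata, 'uint8');
my3Darray(:,:,1) = reshape(1:12, Ycoordinate, Xcoordinate);
% ... then copy location (1,1) into location (3,2) to create one duplicate
my3Darray(3,2,:) = my3Darray(1,1,:);

% One row per (y, x) location, one column per z value
A = reshape(my3Darray, Ycoordinate*Xcoordinate, Zdata);

% J maps every row of A to the index of its unique row
[~, ~, J] = unique(A, 'rows');

% Count how often each unique row occurs, then flag rows seen more than once
counts = accumarray(J(:), 1);
isDuplicated = counts(J) > 1;

% Convert the flagged row numbers back to (y, x) coordinates
[dupY, dupX] = ind2sub([Ycoordinate, Xcoordinate], find(isDuplicated));
disp([dupY, dupX])
```

Rows of `A` follow the linear index over (y, x), which is why ind2sub with the size `[Ycoordinate, Xcoordinate]` recovers the original coordinates.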
In MATLAB, I have a matrix SimC which has dimension 22 x 4. I re-generate this matrix 10 times using a for loop.
I want to end up with a matrix U that contains SimC(1) in rows 1 to 22, SimC(2) in rows 23 to 44 and so on. Hence U should have dimension 220 x 4 in the end.
Thank you!!
Edit:
nTrials = 10;
n = 22;
U = zeros(nTrials * n , 4) %Dimension of the final output matrix
for i = 1 : nTrials
SimC = SomeSimulation() %This generates an nx4 matrix
U = vertcat(SimC)
end
Unfortunately the above doesn't work as U = vertcat(SimC) only gives back SimC instead of concatenating.
vertcat is a good choice, but it will result in a growing matrix. This is not good practice on larger programs because it can really slow down. In your problem, though, you aren't looping through too many times, so vertcat is fine.
To use vertcat, you would NOT pre-allocate the full final size of the U matrix...just create an empty U. Then, when invoking vertcat, you need to give it both matrices that you want to concatenate:
nTrials = 10;
n = 22;
U = [] %create an empty output matrix
for i = 1 : nTrials
SimC = SomeSimulation(); %This generates an nx4 matrix
U = vertcat(U,SimC); %concatenate the two matrices
end
The better way to do this, since you already know the final size, is to pre-allocate your full U (as you did) and then put your values into U via computing the correct indices. Something like this:
nTrials = 10;
n = 22;
U = zeros(nTrials * n , 4); %create a full output matrix
for i = 1 : nTrials
SimC = SomeSimulation(); %This generates an nx4 matrix
indices = (i-1)*n+[1:n]; %here are the rows where you want to put the latest output
U(indices,:)=SimC; %copies SimC into the correct rows of U
end
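A third option, not mentioned above, is to collect each result in a cell array and concatenate once after the loop. This avoids both the growing matrix and the index arithmetic. In this sketch `SomeSimulation` is replaced by a hypothetical stand-in that just returns random numbers, and `blocks` is a made-up name.

```matlab
nTrials = 10;
n = 22;

% Hypothetical stand-in for the real simulation: returns an n x 4 matrix
SomeSimulation = @() rand(n, 4);

% Store each trial's result in its own cell
blocks = cell(nTrials, 1);
for i = 1:nTrials
    blocks{i} = SomeSimulation();
end

% Stack all trials vertically in one call
U = vertcat(blocks{:});
disp(size(U))
```

Rows 1 to 22 of `U` then hold the first trial, rows 23 to 44 the second, and so on.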
Hello, I have a matrix whose numbers of rows and columns are multiples of 8, let's say 256x160, and I need as output all the possible 8x8 submatrices. For a few elements I can write
bloc = 8;
imm = imread('cameraman.tif');
[rows, columns, dimension] = size(imm); % dimension if the image is RGB
nr = rows/bloc; % number of blocks of rows
nc = columns/bloc; % number of blocks of columns
cell_row = repmat(bloc,1,nr);
cell_columns = repmat(bloc,1,nc);
N = mat2cell(imm, cell_row, cell_columns); % nr-by-nc cell array of bloc-by-bloc blocks
I think it works quite well now, but if there is a better way, just tell me. Thanks.
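The block splitting can be checked without an image file. The sketch below uses a synthetic 256x160 matrix `M` in place of the image (an assumption made only to keep the example self-contained) and compares one cell against the same block extracted by direct indexing.

```matlab
bloc = 8;

% Synthetic stand-in for the image: every entry is unique
M = reshape(1:256*160, 256, 160);
[rows, columns] = size(M);

% Split into non-overlapping bloc x bloc tiles
N = mat2cell(M, repmat(bloc, 1, rows/bloc), repmat(bloc, 1, columns/bloc));

% N{i,j} is the tile in block-row i and block-column j
blockRow = 2;
blockCol = 3;
expected = M((blockRow-1)*bloc + (1:bloc), (blockCol-1)*bloc + (1:bloc));
disp(isequal(N{blockRow, blockCol}, expected))
```

Here `N` is a 32x20 cell array, one cell per non-overlapping 8x8 tile.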
I'm working on spectrum analysis of a wav file. I have plotted the spectrum over the whole frequency range, but how can I plot just the high frequencies of my file?
This is the code:
[a,fs] = wavread('ori1.wav');
ydft = fft(a);
ydft = ydft(1:length(a)/2+1);
freq = 0:fs/length(a):fs/2;
plot(freq,abs(ydft));
You can use logical indexing:
a = randn(1,1000);
fs=10;
ydft = fft(a);
ydft = ydft(1:length(a)/2+1);
freq = 0:fs/length(a):fs/2;
lowestFrequencyToPlot = 2;
idxHigherFrequencies = freq >= lowestFrequencyToPlot;
plot(freq(idxHigherFrequencies),abs(ydft(idxHigherFrequencies)));
If you only want the single highest frequency, you can index it with end, e.g. freq(end) and ydft(end).
Edit: The array freq will consist of the frequencies like: [1,2,3,4,5].
If you compare such an array with a value, say 3, (freq > 3), a vector is returned with 0 where the condition is false and 1 where the condition is true. This will be [0,0,0,1,1] (4 and 5 are bigger than 3, the others are not).
This vector can then be used for logical indexing. freq(freq>3) will return the frequencies bigger than 3: [4,5].
ydft(freq>3) will return the values where the corresponding frequencies are bigger than 3.
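The explanation above can be tried directly on the small example. The values in `ydft` are made up here just so there is something to index.

```matlab
freq = [1, 2, 3, 4, 5];
ydft = [10, 20, 30, 40, 50];   % made-up spectrum values, one per frequency

% Comparison gives a logical vector: true where the condition holds
mask = freq > 3;               % [0 0 0 1 1]

% Indexing with the mask keeps only the entries where mask is true
highFreq = freq(mask);         % [4 5]
highVals = ydft(mask);         % [40 50]

disp(mask)
disp(highFreq)
disp(highVals)
```

Because `freq` and `ydft` have the same length, the same mask selects matching entries from both.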
If you run the following code you will end up with a cell array composed of a correlation value in CovMatrix(:,3) and the name of the data used in calculating the correlation in CovMatrix(:,1) and CovMatrix(:,2):
clear all
FieldName = {'Name1','Name2','Name3','Name4','Name5'};
Data={rand(12,1),rand(12,1),rand(12,1),rand(12,1),rand(12,1)};
DataCell = [FieldName;Data];%place in a structure - this is the same
%structure that the data for the lakes will be placed in.
DataStructure = struct(DataCell{:});
FieldName = fieldnames(DataStructure);
Combinations = nchoosek (1:numel(FieldName),2);
d1 = cell2mat(struct2cell(DataStructure)');%this will be the surface temperatures
%use the combinations found in 'Combinations' to define which elements to
%use in calculating the coherence.
R = cell(1,size(Combinations,1));%pre-allocate the cell array
Names1 = cell(1,size(Combinations,1));
for j = 1:size(Combinations,1);
[R{j},P{j}] = corrcoef([d1(:,[Combinations(j,1)]),d1(:,[Combinations(j,2)])]);
Names1{j} = ([FieldName([Combinations(j,1)],1),FieldName([Combinations(j,2)],1)]);
end
%only obtain a single value for the correlation and p-value
for i = 1:size(Combinations,1);
R{1,i} = R{1,i}(1,2);
P{1,i} = P{1,i}(1,2);
end
R = R';P = P';
%COVARIANCE MATRIX
CovMatrix=cell(size(Combinations,1),3);%pre-allocate memory
for i=1:size(Combinations,1);
CovMatrix{i,3}=R{i,1};
CovMatrix{i,1}=Names1{1,i}{1,1};
CovMatrix{i,2}=Names1{1,i}{1,2};
end
From this I need to produce a table of the values, preferably in the form of a correlation matrix, similar to jeremytheadventurer.blogspot.com. Would this be possible in MATLAB?
You can compute the correlation matrix of your entire data set in one shot using the corrcoef command:
% d1 can be simply computed as
d1_new = cell2mat(Data);
% Make sure that d1_new is the same matrix as d1
max(abs(d1(:)-d1_new(:)))
% Compute correlation matrix of columns of data in d1_new in one shot
CovMat = corrcoef(d1_new)
% Make sure that entries in CovMat are equivalent to the third column of
% CovMatrix, e.g.
CovMat(1,2)-CovMatrix{1,3}
CovMat(1,4)-CovMatrix{3,3}
CovMat(3,4)-CovMatrix{8,3}
CovMat(4,5)-CovMatrix{10,3}
Because the correlation matrix CovMat is symmetric, this contains the required result if you ignore the upper triangular part.
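As a rough sketch of how the lower triangular part could be shown as a labelled table, the following prints one row per field using only tril and fprintf. The random `d1_new` stands in for the real data, and `lowerPart` is a name made up for this example.

```matlab
FieldName = {'Name1','Name2','Name3','Name4','Name5'};

% Random stand-in data: 12 observations for each of the 5 fields
d1_new = rand(12, numel(FieldName));

% Full correlation matrix, then keep only the diagonal and below
CovMat = corrcoef(d1_new);
lowerPart = tril(CovMat);

% Header row with the field names
fprintf('%8s', '');
fprintf('%8s', FieldName{:});
fprintf('\n');

% One labelled row per field, printing entries up to the diagonal only
for r = 1:numel(FieldName)
    fprintf('%8s', FieldName{r});
    fprintf('%8.3f', lowerPart(r, 1:r));
    fprintf('\n');
end
```

The printed layout has the names along the top and left edges, with each correlation appearing once below the diagonal.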