Im working on spectrum analysis of wav file. I have plotting the spectrum of the whole frequency,but how can i plot just the high frequency of my file ?
this is the code :
[a,fs] = wavread('ori1.wav');
ydft = fft(a);
ydft = ydft(1:length(a)/2+1);
freq = 0:fs/length(a):fs/2;
plot(freq,abs(ydft));
You can use logical indexing:
a = randn(1,1000);
fs=10;
ydft = fft(a);
ydft = ydft(1:length(a)/2+1);
freq = 0:fs/length(a):fs/2;
lowestFrequencyToPlot = 2;
idxHigherFrequencies = freq >= lowestFrequencyToPlot;
plot(freq(idxHigherFrequencies),abs(ydft(idxHigherFrequencies)));
only the highest frequency can be plotted with end.
Edit: The array freq will consist of the frequencies like: [1,2,3,4,5].
If you compare such an array with a value -- say 3 -- (freq > 3), a vector is returned with 0 where the condition is false and 1 where the condition is true. This will be [0,0,0,1,1] (4 and 5 are bigger 3, others smaller).
This vector can then be used for logical indexing. freq(freq>3)will return the frequencies bigger than 3: [4,5].
ydft(freq>3) will return the values where the corresponding frequencies are bigger than 3.
Related
I'm trying to build a sound compressor using PCA (Principal Component Analysis).
The input sound is always a mono-channel one, so the resulting matrix is a column matrix called sampledData in the code.
Compressing the sound requires the matrix to be transformed in a 2D one. So, one would need to group the samples the following way :
Here the grouping size is 3 with a sound composed of 7 samples
[sample1 sample2 sample3
sample4 sample5 sample6
sample7 0 0 ]
What does it change to have low grouping size (= number of columns wanted for the matrix) e.g. 10 or a high one e.g. 100 or more?
[sampledData, sampleRate] = audioread("soundFile.wav");
data = sampledData;
groupingSize = 10; %Number of columns wanted from the data
%Make data size a multiple of groupingSize and fill with 0
data(end + 1:groupingSize*ceil(numel(data)/groupingSize))=0;
%Make data a matrix of x rows and groupingSize columns
data = transpose(reshape(data,groupingSize,[]));
Here is the remaining code I'm using for the sound compression. Unfortunately I have an error when percentagePC is less than 100, saying that the number of columns of the compressed matrix need to have the same number of columns as the number of rows of the inverse of eigenVec.
To illustrate, I use groupingSize = 100 which gives me a compressed matrix of 100 columns and eigenVec is a matrix of 100 row and 100 columns. If I set percentagePC = 90 I then get a compressed matrix of 90 columns (I keep 90% of the data that is most useful), so to be able to multiply both matrixes I theoretically need to reduce the size of the eigenVec matrix by 10 columns. Is this correct reasoning according to PCA?
percentagePC = 90; %Percentage of principal component to keep
[rows, cols] = size(data);
%get eigenvalues and eigenvector by centering data matrix and getting its covariance
dataCR =(data-ones(size(data,1),1)*mean(data))./(ones(size(data,1),1)*std(data,1));
dataCov = cov(dataCR);
[eigenVec, eigenVal] = eig(dataCov);
%Sort eigenvectors (desc)
[~, index] = sort(diag(eigenVal), 'descend');
eigenVec = eigenVec(:, index);
%principal components calculation
P = zeros(rows,cols);
for i = 1:cols
P(:,i) = dataCR * eigenVec(:,i);
end
%Number of principal components wanted according to percentagePC
iterations = ceil((groupingSize*percentagePC)/100);
compressed = zeros(rows,iterations);
for i = 1: iterations
compressed (:,i) = P(:,i);
end
%Fuse principal components between each other to get final compressed signal
final_compressed = zeros(rows*iterations,1);
z=1;
z = 1;
for i = 1:iterations:rows*iterations
for j = 1:iterations
final_compressed(i+j-1,1) = P(z,j);
end
z = z + 1;
end
% Decompression of compressed signal
decompressed = compressed * (eigenVec^(-1)); % /!\ ERROR IS HERE /!\
[rowsD,colsD] = size(decompressed);
final_decompressed = zeros(rowsD*colsD,1);
z = 1;
for i = 1:colsD:rowsD*colsD
for j = 1:colsD
final_decompressed(i+j-1,1) = decompressed(z,j);
end
z = z + 1;
end
filenameC = fullfile('compress.wav');
filenameD = fullfile('decompress.wav');
audiowrite(filenameC, final_compressed/max(abs(final_compressed)), round(sampleRate/(groupingSize/iterations)));
audiowrite(filenameD, decompressed, sampleRate);
Assuming that I have a dataset of the following size:
train = 500,000 * 960 %number of training samples (vector) each of 960 length
B_base = 1000000*960 %number of base samples (vector) each of 960 length
Query = 1000*960 %number of query samples (vector) each of 960 length
truth_nn = 1000*100
truth_nn contains the ground truth neighbors in the form of the
pre-computed k nearest neighbors and their square Euclidean distance. So, the columns of truth_nn represent the k = 100 nearest neighbors. I am finding difficult to apply nearest neighbor search in the code snippet. Can somebody please show how to apply the ground truth neighbors truth_nn in finding the mean average precision-recall?
It will be of immense help if somebody can show with any small example by creating any data matrix, query matrix, and the ground truth neighbors in the form of the pre-computed k nearest neighbors and their square Euclidean distance. I tried creating a sample database.
Assume, the base data is
B_base = [1 1; 2 2; 3 2; 4 4; 5 6];
Query data is
Query = [1 1; 2 1; 6 2];
[neighbors distances] = knnsearch(a,b,'k',2);
would find 2 nearest neighbors.
Question 1: how do I create the truth data containing the ground truth neighbors and pre-computed k nearest neighbor distances?
This is called as the mean average precision recall. I tried implementing the knearest neighbor search and the average precision recall as follows but cannot understand (unsure) how to apply the ground truth table
Question 2:
I am trying to apply k nearest neighbor search by converting first the real-valued features into binary.
I am unable to apply the concept of k-nearest neighbor search for different values of k = 10,20,50 and to check how much data has been correctly recalled using the GIST database. In the GIST truth_nn() file, when I specify truth_nn(i,1:k) for a query vector i, the function AveragePrecision throws error. So, if somebody can show using any sample ground truth that is of similar structure to that in GIST, how to properly specify k and calculate the Average precision recall, then I shall be able to apply the solution to the GIST database. As of now, this is my approach and shall be of immense help if the correct way is provided using any example that will be easier for me to relate to the GIST database. The problem is on how I can find neighbors from the ground truth and compare it with the neighbors obtained after sorting the distances?
I am also interested on how I can apply pdist2() instead of the present distance calcualtion as it takes a long time.
numQueryVectors = size(Query,1);
%Calculate distances
for i=1:numQueryVectors,
queryMatrix(i,:)
dist = sum((repmat(queryMatrix(i,:),numDataVectors,1)-B_base ).^2,2);
[sortval sortpos] = sort(dist,'ascend');
neighborIds(i,:) = sortpos(1:k);
neighborDistances(i,:) = sqrt(sortval(1:k));
end
%Sorting calculated nearest neighbor distances for k = 50
%HOW DO I SPECIFY k = 50 in the ground truth, truth_nn
for i=1:numQueryVectors
AP(i) = AveragePrecision(neighborIds(i,:),truth_nn(i,:));
end
mAP = mean(AP);
function ap = AveragePrecision(rank_id, truth_id)
truth_num = length(truth_id);
truth_pos = zeros(truth_num,1);
for j=1:50 %% for k = 50 nearest neighbors
truth_pos(j) = find(rank_id == truth_id(j));
end
truth_pos = sort(truth_pos, 'ascend');
% compute average precision as the area below the recall-precision curve
ap = 0;
delta_recall = 1/truth_num;
for j=1:truth_num
p = j/truth_pos(j);
ap = ap + p*delta_recall;
end
end
end
UPDATE : Based on solution, I tried to calculate the mean average precision using the formula given formula hereand a reference code . But, not sure if my approach is correct because the theory says that I need to rank the returned queries based on the indices. I do not understand this fully. Mean average precision is required to judge the quality of the retrieval algortihm.
precision = positives/total_data;
recal = positives /(positives+negatives);
precision = positives/total_data;
recall = positives /(positives+negatives);
truth_pos = sort(positives, 'ascend');
truth_num = length(truth_pos);
ap = 0;
delta_recall = 1/truth_num;
for j=1:truth_num
p = j/truth_pos(j);
ap = ap + p*delta_recall;
end
ap
The value of ap = infinity , value of positive = 0 and negatives = 150. This means that knnsearch() does not work at all.
I think you are doing extra work. This process is very simple in matlab, you can also operate on entire arrays. This should be faster than for loops, and is a bit easier to read.
Your truth_nn and neighbors should have the same data, if there are no errors. There is one entry per row. Matlab already sorts the kmeans result in ascending order, so the column 1 is the closest neighbor, the second closest is column 2, 3rd closest is 3,.... There is no need to sort the data again.
Just compare truth_nn to neighbors to get your statistics. This is a simple example to show you how the program should go. It will not work on your data without some modification
%in your example this is provided, I created my own
truth_nn = [1,2;
1,3;
4,3];
B_base = [1 1; 2 2; 3 2; 4 4; 5 6];
Query = [1 1; 2 1; 6 2];
%performs k means
num_clusters = 2;
[neighbors distances] = knnsearch(B_base,Query,'k',num_clusters);
%--- output---
% neighbors = [1,2;
% 1,2; notice this doesn't match truth_nn 1,3
% 4,3]
% distances = [ 0 1.4142;
% 1.0000 1.0000;
% 2.8284 3.0000];
%computes statistics, nnz counts number of nonzero elements, in the first
%case every piece of data that matches
%NOTE1: the indexing on truth_nn (:,1:num_clusters ) it says all rows
% but only use the first num_clusters columns. This should
% prevent the dimension mistmatch error you were getting
positives = nnz(neighbors == truth_nn(:,1:num_clusters )); %result = 5
negatives = nnz(neighbors ~= truth_nn(:,1:num_clusters )); %result = 1
%NOTE1: I've switched this from truth_nn to neighbors, this helps
% when you cahnge num_neghbors
total_data = numel(neighbors); %result = 6
percent_incorrect = 100*(negatives / total_data); % 16.6666
percent_correct = 100*(positives / total_data); % 93.3333
I'm currently working on creating a histogram of Altitudes at which a type of atmospheric instability happens. To be specific, it is when the values of what we call, N^2 is less than zero. This is where the problem comes in. I am trying to plot the occurrence frequency against the altitudes.
load /data/matlabst/DavidBloom/N_square_Ri_number_2005.mat
N_square(N_square > 0) = 0;
N_square = abs(N_square);
k = (1:87);
H = 7.5;
p0 = 101325;
nbins = (500);
N_square(N_square==0)=[];
Alt = zeros(1,578594);
PresNew = squeeze(N_square(:,:,k,:));
for lati = 1:32
for long = 1:64
for t = 1:1460
for k = 1:87
Alt(1,:) = -log((PresNew)/p0)*H;
end
end
end
end
So, let me explain what I am doing. I'm loading a file with all these different variables. Link To Image This shows the different variables it displays. Next, I take the 4-D matrix N_square and I filter all values greater than zero to equal 0. Then I take the absolute value of the leftover negative values. I then define several variables and move on to the next filtering.
(N_square(N_square==0)=[];
The goal of this one was give just discard all values of N_square that were 0. I think this is where the problem begins. Jumping down to the for loop, I am then taking the 3rd dimension of N_square and converting pressure to altitude.
My concern is that when I run this, PresNew = squeeze(N_square(:,:,k,:)); is giving me the error.
Error in PlottingN_2 (line 10)
PresNew = squeeze(N_square(:,:,k,:));
And I have no idea why.
Any thoughts or suggestions on how I could avoid this catastrophe and make my code simpler? Thanks.
When you remove random elements from a multi-dimensional array, they are removed but it can no longer be a valid multi-dimensional array because it has holes in it. Because of this, MATLAB will collapse the result into a vector, and you can't index into the third dimension of a vector like you're trying.
data = magic(3);
% 8 1 6
% 3 5 7
% 4 9 2
% Remove all values < 2
data(data < 2) = []
% 8 3 4 5 9 6 7 2
data(2,3)
% Index exceeds matrix dimensions.
The solution is to remove the 0 values after your indexing (i.e. within your loop).
Alt = zeros(1,578594);
for lati = 1:32
for long = 1:64
for t = 1:1460
for k = 1:87
% Index into 4D matrix
PresNew = N_square(:,:,k,:);
% NOW remove the 0 values
PresNew(PresNew == 0) = [];
Alt(1,:) = -log((PresNew)/p0)*H;
end
end
end
end
I have a matrix with constant consecutive values randomly distributed throughout the matrix. I want the indices of the consecutive values, and further, I want a matrix of the same size as the original matrix, where the number of consecutive values are stored in the indices of the consecutive values. For Example
original_matrix = [1 1 1;2 2 3; 1 2 3];
output_matrix = [3 3 3;2 2 0;0 0 0];
I have struggled mightily to find a solution to this problem. It has relevance for meteorological data quality control. For example, if I have a matrix of temperature data from a number of sensors, and I want to know what days had constant consecutive values, and how many days were constant, so I can then flag the data as possibly faulty.
temperature matrix is number of days x number of stations and I want an output matrix that is also number of days x number of stations, where the consecutive values are flagged as described above.
If you have a solution to that, please provide! Thank you.
For this kind of problems, I made my own utility function runlength:
function RL = runlength(M)
% calculates length of runs of consecutive equal items along columns of M
% work along columns, so that you can use linear indexing
% find locations where items change along column
jumps = diff(M) ~= 0;
% add implicit jumps at start and end
ncol = size(jumps, 2);
jumps = [true(1, ncol); jumps; true(1, ncol)];
% find linear indices of starts and stops of runs
ijump = find(jumps);
nrow = size(jumps, 1);
istart = ijump(rem(ijump, nrow) ~= 0); % remove fake starts in last row
istop = ijump(rem(ijump, nrow) ~= 1); % remove fake stops in first row
rl = istop - istart;
assert(sum(rl) == numel(M))
% make matrix of 'derivative' of runlength
% don't need last row, but needs same size as jumps for indices to be valid
dRL = zeros(size(jumps));
dRL(istart) = rl;
dRL(istop) = dRL(istop) - rl;
% remove last row and 'integrate' to get runlength
RL = cumsum(dRL(1:end-1,:));
It only works along columns since it uses linear indexing. Since you want do something similar along rows, you need to transpose back and forth, so you could use it for your case like so:
>> original = [1 1 1;2 2 3; 1 2 3];
>> original = original.'; % transpose, since runlength works along columns
>> output = runlength(original);
>> output = output.'; % transpose back
>> output(output == 1) = 0; % see hitzg's comment
>> output
output =
3 3 3
2 2 0
0 0 0
I have a bunch of times-series each described by two components, a timestamp vector (in seconds), and a vector of values measured. The time vector is non-uniform (i.e. sampled at non-regular intervals)
I am trying to compute the mean/SD of each 1-minutes interval of values (take X minute interval, compute its mean, take the next interval, ...).
My current implementation uses loops. This is a sample of what I have so far:
t = (100:999)' + rand(900,1); %' non-uniform time
x = 5*rand(900,1) + 10; % x(i) is the value at time t(i)
interval = 1; % 1-min interval
tt = ( floor(t(1)):interval*60:ceil(t(end)) )'; %' stopping points of each interval
N = length(tt)-1;
mu = zeros(N,1);
sd = zeros(N,1);
for i=1:N
indices = ( tt(i) <= t & t < tt(i+1) ); % find t between tt(i) and tt(i+1)
mu(i) = mean( x(indices) );
sd(i) = std( x(indices) );
end
I am wondering if there a faster vectorized solution. This is important because I have a large number of time-series to process each much longer than the sample shown above..
Any help is welcome.
Thank you all for the feedback.
I corrected the way t is generated to be always monotonically increasing (sorted), this was not really an issue..
Also, I may not have stated this clearly but my intention was to have a solution for any interval length in minutes (1-min was just an example)
The only logical solution seems to be...
Ok. I find it funny that to me there is only one logical solution, but many others find other solutions. Regardless, the solution does seem simple. Given the vectors x and t, and a set of equally spaced break points tt,
t = sort((100:999)' + 3*rand(900,1)); % non-uniform time
x = 5*rand(900,1) + 10; % x(i) is the value at time t(i)
tt = ( floor(t(1)):1*60:ceil(t(end)) )';
(Note that I sorted t above.)
I would do this in three fully vectorized lines of code. First, if the breaks were arbitrary and potentially unequal in spacing, I would use histc to determine which intervals the data series falls in. Given they are uniform, just do this:
int = 1 + floor((t - t(1))/60);
Again, if the elements of t were not known to be sorted, I would have used min(t) instead of t(1). Having done that, use accumarray to reduce the results into a mean and standard deviation.
mu = accumarray(int,x,[],#mean);
sd = accumarray(int,x,[],#std);
You could try and create a cell array and apply mean and std via cellfun. It's ~10% slower than your solution for 900 entries, but ~10x faster for 90000 entries.
[t,sortIdx]=sort(t); %# we only need to sort in case t is not monotonously increasing
x = x(sortIdx);
tIdx = floor(t/60); %# convert seconds to minutes - can also convert to 5 mins by dividing by 300
tIdx = tIdx - min(tIdx) + 1; %# tIdx now is a vector of indices - i.e. it starts at 1, and should go like your iteration variable.
%# the next few commands are to count how many 1's 2's 3's etc are in tIdx
dt = [tIdx(2:end)-tIdx(1:end-1);1];
stepIdx = [0;find(dt>0)];
nIdx = stepIdx(2:end) - stepIdx(1:end-1); %# number of times each index appears
%# convert to cell array
xCell = mat2cell(x,nIdx,1);
%# use cellfun to calculate the mean and sd
mu(tIdx(stepIdx+1)) = cellfun(#mean,xCell); %# the indexing is like that since there may be missing steps
sd(tIdx(stepIdx+1)) = cellfun(#mean,xCell);
Note: my solution does not give the exact same results as yours, since you skip a few time values at the end (1:60:90 is [1,61]), and since the start of the interval is not exactly the same.
Here's a way that uses binary search. It is 6-10x faster for 9900 elements and about 64x times faster for 99900 elements. It was hard to get reliable times using only 900 elements so I'm not sure which is faster at that size. It uses almost no extra memory if you consider making tx directly from the generated data. Other than that it just has four extra float variables (prevind, first, mid, and last).
% Sort the data so that we can use binary search (takes O(N logN) time complexity).
tx = sortrows([t x]);
prevind = 1;
for i=1:N
% First do a binary search to find the end of this section
first = prevind;
last = length(tx);
while first ~= last
mid = floor((first+last)/2);
if tt(i+1) > tx(mid,1)
first = mid+1;
else
last = mid;
end;
end;
mu(i) = mean( tx(prevind:last-1,2) );
sd(i) = std( tx(prevind:last-1,2) );
prevind = last;
end;
It uses all of the variables that you had originally. I hope that it suits your needs. It is faster because it takes O(log N) to find the indices with binary search, but O(N) to find them the way you were doing it.
You can compute indices all at once using bsxfun:
indices = ( bsxfun(#ge, t, tt(1:end-1)') & bsxfun(#lt, t, tt(2:end)') );
This is faster than looping but requires storing them all at once (time vs space tradeoff)..
Disclaimer: I worked this out on paper, but haven't yet had the opportunity to check it "in silico"...
You may be able to avoid loops or using cell arrays by doing some tricky cumulative sums, indexing, and calculating the means and standard deviations yourself. Here's some code that I believe will work, although I am unsure how it stacks up speed-wise to the other solutions:
[t,sortIndex] = sort(t); %# Sort the time points
x = x(sortIndex); %# Sort the data values
interval = 60; %# Interval size, in seconds
intervalIndex = floor((t-t(1))./interval)+1; %# Collect t into intervals
nIntervals = max(intervalIndex); %# The number of intervals
mu = zeros(nIntervals,1); %# Preallocate mu
sd = zeros(nIntervals,1); %# Preallocate sd
sumIndex = [find(diff(intervalIndex)) ...
numel(intervalIndex)]; %# Find indices of the interval ends
n = diff([0 sumIndex]); %# Number of samples per interval
xSum = cumsum(x); %# Cumulative sum of x
xSum = diff([0 xSum(sumIndex)]); %# Sum per interval
xxSum = cumsum(x.^2); %# Cumulative sum of x^2
xxSum = diff([0 xxSum(sumIndex)]); %# Squared sum per interval
intervalIndex = intervalIndex(sumIndex); %# Find index into mu and sd
mu(intervalIndex) = xSum./n; %# Compute mean
sd(intervalIndex) = sqrt((xxSum-xSum.*xSum./n)./(n-1)); %# Compute std dev
The above computes the standard deviation using the simplification of the formula found on this Wikipedia page.
The same answer as above but with the parametric interval (window_size).
Issue with the vector lengths solved as well.
window_size = 60; % but it can be any value 60 5 0.1, which wasn't described above
t = sort((100:999)' + 3*rand(900,1)); % non-uniform time
x = 5*rand(900,1) + 10; % x(i) is the value at time t(i)
int = 1 + floor((t - t(1))/window_size);
tt = ( floor(t(1)):window_size:ceil(t(end)) )';
% mean val and std dev of the accelerations at speed
mu = accumarray(int,x,[],#mean);
sd = accumarray(int,x,[],#std);
%resolving some issue with sizes (for i.e. window_size = 1 in stead of 60)
while ( sum(size(tt) > size(mu)) > 0 )
tt(end)=[];
end
errorbar(tt,mu,sd);