sound classfication using mfcc and dynamic time warping (dtw) - mfcc

My objective is to classify non-speech signal for which I am using mfcc and dtw in java. However I am stuck in middle. I would appreciate any help.
I have evaluated 13 mfcc values for each frame however some values are negative, I am confused whether the process I am following is right or wrong. Currently I am using the code provided by JAudio. I have also tried other code, they give me negative values as well.
Secondly, I get 13 coefficients for each frame, considering 157 frames for a certain length of sample, I get 157 sets of 13 mfccs. I am having hard time how to use all the coefficients in DTW because dtw only gives closest distance between two time signals. I do have code of DTW to compare two time signals. I am not sure how to use all the mfccs values of the signal as features.
Is there some crucial step of classification I am missing? Please help me.

The use of DTW suppose to verify 2 audio sequences in your case. Thus, for the sequence to be verify you will have a matrix M1xN and for the query M2xN. This implies that your cost matrix will have M1xM2.
To construct the cost matrix you have to apply a distance/cost measure between the sequences, as cost(i,j) = your_chosen_multidimension_metric(M1[i,:],M2[j,:])
The resulted cost matrix will be 2D, and you could apply easily DTW.
I made a similar code for DTW based on MFCC. Below is the Python implementation which returs DTW score; x and y are the MFCC matrix of voice sequences, with M1xN and M2xN dimensions:
def my_dtw (x, y):
cost_matrix = cdist(x, y,metric='seuclidean')
m,n = np.shape(cost_matrix)
for i in range(m):
for j in range(n):
if ((i==0) & (j==0)):
cost_matrix[i,j] = cost_matrix[i,j]
elif (i==0):
cost_matrix[i,j] = cost_matrix[i,j] + cost_matrix[i,j-1]
elif (j==0):
cost_matrix[i,j] = cost_matrix[i,j] + cost_matrix[i-1,j]
else:
min_local_dist = cost_matrix[i-1,j]
if min_local_dist > cost_matrix[i,j-1]:
min_local_dist = cost_matrix[i,j-1]
if min_local_dist > cost_matrix[i-1,j-1]:
min_local_dist = cost_matrix[i-1,j-1]
cost_matrix[i,j] = cost_matrix[i,j] + min_local_dist
return cost_matrix[m-1,n-1]

Say you have N1 sets of 13 MFCCs each for the first signal and N2 sets of MFCCs for the second.
You should compute the distance between each set in from the first signal and each set from the second (you can use the Euclidian Distance for the distance between two 13-sized arrays)
This would leave you with an N1xN2 bidimensional array on which you should now apply the DTW.

Check out: http://code.google.com/p/aquila/
Specifically: http://code.google.com/p/aquila/source/browse/trunk/examples/dtw_distance/main.cpp which has an example codeof dtw distace calculation.

Related

Small bug in MATLAB R2017B LogLikelihood after fitnlm?

Background: I am working on a problem similar to the nonlinear logistic regression described in the link [1] (my problem is more complicated, but link [1] is enough for the next sections of this post). Comparing my results with those obtained in parallel with a R package, I got similar results for the coefficients, but (very approximately) an opposite logLikelihood.
Hypothesis: The logLikelihood given by fitnlm in Matlab is in fact the negative LogLikelihood. (Note that this impairs consequently the BIC and AIC computation by Matlab)
Reasonning: in [1], the same problem is solved through two different approaches. ML-approach/ By defining the negative LogLikelihood and making an optimization with fminsearch. GLS-approach/ By using fitnlm.
The negative LogLikelihood after the ML-approach is:380
The negative LogLikelihood after the GLS-approach is:-406
I imagine the second one should be at least multiplied by (-1)?
Questions: Did I miss something? Is the (-1) coefficient enough, or would this simple correction not be enough?
Self-contained code:
%copy-pasting code from [1]
myf = #(beta,x) beta(1)*x./(beta(2) + x);
mymodelfun = #(beta,x) 1./(1 + exp(-myf(beta,x)));
rng(300,'twister');
x = linspace(-1,1,200)';
beta = [10;2];
beta0=[3;3];
mu = mymodelfun(beta,x);
n = 50;
z = binornd(n,mu);
y = z./n;
%ML Approach
mynegloglik = #(beta) -sum(log(binopdf(z,n,mymodelfun(beta,x))));
opts = optimset('fminsearch');
opts.MaxFunEvals = Inf;
opts.MaxIter = 10000;
betaHatML = fminsearch(mynegloglik,beta0,opts)
neglogLH_MLApproach = mynegloglik(betaHatML);
%GLS Approach
wfun = #(xx) n./(xx.*(1-xx));
nlm = fitnlm(x,y,mymodelfun,beta0,'Weights',wfun)
neglogLH_GLSApproach = - nlm.LogLikelihood;
Source:
[1] https://uk.mathworks.com/help/stats/examples/nonlinear-logistic-regression.html
This answer (now) only details which code is used. Please see Tom Lane's answer below for a substantive answer.
Basically, fitnlm.m is a call to NonLinearModel.fit.
When opening NonLinearModel.m, one gets in line 1209:
model.LogLikelihood = getlogLikelihood(model);
getlogLikelihood is itself described between lines 1234-1251.
For instance:
function L = getlogLikelihood(model)
(...)
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
(...)
Please also not that this notably impacts ModelCriterion.AIC and ModelCriterion.BIC, as they are computed using model.LogLikelihood ("thinking" it is the logLikelihood).
To get the corresponding formula for BIC/AIC/..., type:
edit classreg.regr.modelutils.modelcriterion
this is Tom from MathWorks. Take another look at the formula quoted:
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
Remember the normal distribution has a factor (1/sqrt(2*pi)), so taking logs of that gives us -log(2*pi)/2. So the minus sign comes from that and it is part of the log likelihood. The property value is not the negative log likelihood.
One reason for the difference in the two log likelihood values is that the "ML approach" value is computing something based on the discrete probabilities from the binomial distribution. Those are all between 0 and 1, and they add up to 1. The "GLS approach" is computing something based on the probability density of the continuous normal distribution. In this example, the standard deviation of the residuals is about 0.0462. That leads to density values that are much higher than 1 at the peak. So the two things are not really comparable. You would need to convert the normal values to probabilities on the same discrete intervals that correspond to individual outcomes from the binomial distribution.

FFT of a real symmetric vector is not real and symmetric

I am having a hard time understanding what should be a simple concept. I have constructed a vector in MATLAB that is real and symmetric. When I take the FFT in MATLAB, the result has a significant imaginary component, even though the symmetry rules of the Fourier transform say that the FT of a real symmetric function should also be real and symmetric. My example code:
N = 1 + 2^8;
k = linspace(-1,1,N);
V = exp(-abs(k));
Vf1 = fft(fftshift(V));
Vf2 = fft(ifftshift(V));
Vf3 = ifft(fftshift(V));
Vf4 = ifft(ifftshift(V));
Vf5 = fft(V);
Vf6 = ifft(V);
disp([isreal(Vf1) isreal(Vf2) isreal(Vf3) isreal(Vf4) isreal(Vf5) isreal(Vf6)])
Result:
0 0 0 0 0 0
No combinations of (i)fft or (i)fftshift result in a real symmetric vector. I've tried with both even and odd N (N = 2^8 vs. N = 1+2^8).
I did try looking at k+flip(k) and there are some residuals on the order of eps(1), but the residuals are also symmetric and the imaginary part of the FFT is not coming out as fuzz on the order of eps(1), but rather with magnitude comparable to the real part.
What blindingly obvious thing am I missing?
Blindingly obvious thing I was missing:
The FFT is not an integral over all space, so it assumes a periodic signal. Above, I am duplicating the last point in the period when I choose an even N, and so there is no way to shift it around to put the zero frequency at the beginning without fractional indexing, which does not exist.
A word about my choice of k. It is not arbitrary. The actual problem I am trying to solve is to generate a model FTIR interferogram which I will then FFT to get a spectrum. k is the distance that the interferometer travels which gets transformed to frequency in wavenumbers. In the real problem there will be various scaling factors so that the generating function V will yield physically meaningful numbers.
It's
Vf = fftshift(fft(ifftshift(V)));
That is, you need ifftshift in time-domain so that samples are interpreted as those of a symmetric function, and then fftshift in frequency-domain to again make symmetry apparent.
This only works for N odd. For N even, the concept of a symmetric function does not make sense: there is no way to shift the signal so that it is symmetric with respect to the origin (the origin would need to be "between two samples", which is impossible).
For your example V, the above code gives Vf real and symmetric. The following figure has been generated with semilogy(Vf), so that small as well as large values can be seen. (Of course, you could modify the horizontal axis so that the graph is centered at 0 frequency as it should; but anyway the graph is seen to be symmetric.)
#Yvon is absolutely right with his comment about symmetry. Your input signal looks symmetrical, but it isn't because symmetry is related to origin 0.
Using linspace in Matlab for constructing signals is mostly a bad choice.
Trying to repair the results with fftshift is a bad idea too.
Use instead:
k = 2*(0:N-1)/N - 1;
and you will get the result you expect.
However the imaginary part of the transformed values will not be perfectly zero.
There is some numerical noise.
>> max(abs(imag(Vf5)))
ans =
2.5535e-15
Answer to Yvon's question:
Why? >> N = 1+2^4 N = 17 >> x=linspace(-1,1,N) x = -1.0000 -0.8750 -0.7500 -0.6250 -0.5000 -0.3750 -0.2500 -0.1250 0 0.1250 0.2500 0.3750 0.5000 0.6250 0.7500 0.8750 1.0000 >> y=2*(0:N-1)/N-1 y = -1.0000 -0.8824 -0.7647 -0.6471 -0.5294 -0.4118 -0.2941 -0.1765 -0.0588 0.0588 0.1765 0.2941 0.4118 0.5294 0.6471 0.7647 0.8824 – Yvon 1
Your example is not a symmetric (even) function, but an antisymmetric (odd) function. However, this makes no difference.
For a antisymmetric function of length N the following statement is true:
f[i] == -f[-i] == -f[N-i]
The index i runs from 0 to N-1.
Let us see was happens with i=2. Remember, count starts with 0 and ends with 16.
x[2] = -0.75
-x[N-2] == -x[17-2] == -x[15] = (-1) 0.875 = -0.875
x[2] != -x[N-2]
y[2] = -0.7647
-y[N-2] == -y[15] = (-1) 0.7647
y[2] == y[N-2]
The problem is, that the origin of Matlab vectors start at 1.
Modulo (periodic) vectors start with origin 0.
This difference leads to many misunderstandings.
Another way of explanation why linspace(-1,+1,N) is not correct:
Imagine you have a vector which holds a single period of a periodic function,
for instance a Cosinus function. This single period is one of a infinite number of periods.
The first value of your Cosinus vector must not be same as the last value of your vector.
However,that is exactly what linspace(-1,+1,N) does.
Doing so, results in a sequence where the last value of period 1 is the same value as the first sample of the following period 2. That is not what you want.
To avoid this mistake use t = 2*(0:N-1)/N - 1. The distance t[i+1]-t[i] is 2/N and the last value has to be t[N-1] = 1 - 2/N and not 1.
Answer to Yvon's second comment
Whatever you put in an input vector of a DFT/FFT, by theory it is interpreted as a periodic function.
But that is not the point.
DFT performs an integration.
fft(m) = Sum_(k=0)^(N-1) (x(k) exp(-i 2 pi m k/N )
The first value x(k=0) describes the amplitude of the first integration interval of length 1/N. The second value x(k=1) describes the amplitude of the second integration interval of length 1/N. And so on.
The very last integration interval of the symmetric function ends with same value as the first sample. This means, the starting point of the last integration interval is k=N-1 = 1-1/N. Your input vector holds the starting points of the integration intervals.
Therefore, the last point of symmetry k=N is a point of the function, but it is not a starting point of an integration interval and so it is not a member of the input vector.
You have a problem when implementing the concept "symmetry". A purely real, even (or "symmetric") function has a Fourier transform function that is also real and even. "Even" is the symmetry with respect to the y-axis, or the t=0 line.
When implementing a signal in Matlab, however, you always start from t=0. That is, there is no way to "define" the signal starting from before the origin of time.
Searching the Internet for a while lead me to this -
Correct use of fftshift and ifftshift at input to fft and ifft.
As Luis has pointed out, you need to perform ifftshift before feeding the signal into fft. The reason has never been documented in Matlab, but only in that thread. For historical reasons, outputs AND inputs of fft and ifft are swapped. That is, instead of ordered from -N/2 to N/2-1 (the natural order), the signal in time or frequency domain is ordered from 0 to N/2-1 and then -N/2 to -1. That means, the correct way to code is fft( ifftshift(V) ), but most people ignore this at most times. Why it's got silently ignored rather than raising huge problems is that most concerns have been put on the amplitude of signal, not phase. Since circular shifting does not affect amplitude spectrum, this is not a problem (even for the Matlab guys who have written the documentations).
To check the amplitude equality -
Vf2 = fft(ifftshift(V));
Vf5 = fft(V);
Va2 = abs(fftshift(Vf2));
Va5 = abs(fftshift(Vf5));
>> min(abs(Va2-Va5)<1e-10)
ans =
1
To see how badly wrong in phase -
Vp2 = angle(fftshift(Vf2));
Vp5 = angle(fftshift(Vf5));
Anyway, as I wrote in the comment, after copy&pasting your code into a fresh and clean Matlab, it gives 0 1 0 1 0 0.
To your question about N=even and N=odd, my opinion is when N=even, the signal is not symmetric, since there are unequal number of points on either side of the time origin.
Just add the following line after "k = linspace(-1,1,N);"
k(end)=[];
it will remove the last element of the array. This is defined to be symmetric array.
also consider that isreal(complex(1,0)) is false!!!
The isreal function just checks for the memory storage format. so 1+0i is not real in the above example.
You have define your function in order to check for real numbers (like this)
myisreal=#(x) all((abs(imag(x))<1e-6*abs(real(x)))|(abs(x)<1e-8));
Finally your source code should become something like this:
N = 1 + 2^8;
k = linspace(-1,1,N);
k(end)=[];
V = exp(-abs(k));
Vf1 = fft(fftshift(V));
Vf2 = fft(ifftshift(V));
Vf3 = ifft(fftshift(V));
Vf4 = ifft(ifftshift(V));
Vf5 = fft(V);
Vf6 = ifft(V);
myisreal=#(x) all((abs(imag(x))<1e-6*abs(real(x)))|(abs(x)<1e-8));
disp([myisreal(Vf1) myisreal(Vf2) myisreal(Vf3) myisreal(Vf4) myisreal(Vf5) myisreal(Vf6)]);

Sampling and DTFT in Matlab

I need to produce a signal x=-2*cos(100*pi*n)+2*cos(140*pi*n)+cos(200*pi*n)
So I put it like this :
N=1024;
for n=1:N
x=-2*cos(100*pi*n)+2*cos(140*pi*n)+cos(200*pi*n);
end
But What I get is that the result keeps giving out 1
I tried to test each values according to each n, and I get the same results for any n
For example -2*cos(100*pi*n) with n=1 has to be -1.393310473. Instead of that, Matlab gave the result -2 for it and it always gave -2 for any n
I don't know how to fix it, so I hope someone could help me out! Thank you!
Not sure where you get the idea that -2*cos(100*pi) should be anything other than -2. Maybe you are not aware that Matlab works in radians?
Look at your expression. Each term can be factored to contain 2*pi*(an integer). And you should know that cos(2*pi*(an integer)) = 1.
So the results are exactly as expected.
What you are seeing is basically what happens when you under-sample a waveform. You may know that the Nyquist criterion says that you need to have a sampling rate that is at least two times greater than the highest frequency component present; but in your case, you are sampling one point every 50, 70, 100 complete cycles. So you are "far beyond Nyquist". And that can only be solved by sampling more closely.
For example, you could do:
t = linspace(0, 1, 1024); % sample the waveform 1024 times between 0 and 1
f1 = 50;
f2 = 70;
f3 = 100;
signal = -2*cos(2*pi*f1*t) + 2*cos(2*pi*f2*t) + cos(2*pi*f3*t);
figure; plot(t, signal)
I think you are using degrees when you are doing your calculations, so do this:
n = 1:1024
x=-2*cosd(100*pi*n)+2*cosd(140*pi*n)+cosd(200*pi*n);
cosd uses degrees instead of radians. Radians is the default for cos so matlab has a separate function when degree input is used. For me this gave:
-2*cosd(100*pi*1) = -1.3933
The first term that I got using:
x=-2*cosd(100*pi*1)+2*cosd(140*pi*1)+cosd(200*pi*1)
x = -1.0693
Also notice that I defined n as n = 1:1024; this will give all integers from 1,2,...,1024,
there is no need to use a for loop since many of Matlab's built in functions are vectorized. Meaning you can just input a vector and it will calculate the function for every element in the vector.

Frequency array feeds FFT

The final goal I am trying to achieve is the generation of a ten minutes time series: to achieve this I have to perform an FFT operation, and it's the point I have been stumbling upon.
Generally the aimed time series will be assigned as the sum of two terms: a steady component U(t) and a fluctuating component u'(t). That is
u(t) = U(t) + u'(t);
So generally, my code follows this procedure:
1) Given data
time = 600 [s];
Nfft = 4096;
L = 340.2 [m];
U = 10 [m/s];
df = 1/600 = 0.00167 Hz;
fn = Nfft/(2*time) = 3.4133 Hz;
This means that my frequency array should be laid out as follows:
f = (-fn+df):df:fn;
But, instead of using the whole f array, I am only making use of the positive half:
fpos = df:fn = 0.00167:3.4133 Hz;
2) Spectrum Definition
I define a certain spectrum shape, applying the following relationship
Su = (6*L*U)./((1 + 6.*fpos.*(L/U)).^(5/3));
3) Random phase generation
I, then, have to generate a set of complex samples with a determined distribution: in my case, the random phase will approach a standard Gaussian distribution (mu = 0, sigma = 1).
In MATLAB I call
nn = complex(normrnd(0,1,Nfft/2),normrnd(0,1,Nfft/2));
4) Apply random phase
To apply the random phase, I just do this
Hu = Su*nn;
At this point start my pains!
So far, I only generated Nfft/2 = 2048 complex samples accounting for the fpos content. Therefore, the content accounting for the negative half of f is still missing. To overcome this issue, I was thinking to merge the real and imaginary part of Hu, in order to get a signal Huu with Nfft = 4096 samples and with all real values.
But, by using this merging process, the 0-th frequency order would not be represented, since the imaginary part of Hu is defined for fpos.
Thus, how to account for the 0-th order by keeping a procedure as the one I have been proposing so far?

why is the vector coming out of 'trapz' function as NAN?

i am trying to calculate the inverse fourier transform of the vector XRECW. for some reason i get a vector of NANs.
please help!!
t = -2:1/100:2;
x = ((2/5)*sin(5*pi*t))./((1/25)-t.^2);
w = -20*pi:0.01*pi:20*pi;
Hw = (exp(j*pi.*(w./(10*pi)))./(sinc(w./(10*pi)))).*(heaviside(w+5*pi)-heaviside(w-5*pi));%low pass filter
xzohw = 0;
for q=1:20:400
xzohw = xzohw + x(q).*(2./w).*sin(0.1.*w).*exp(-j.*w*0.2*((q-1)/20)+0.5);%calculating fourier transform of xzoh
end
xzohw = abs(xzohw);
xrecw = abs(xzohw.*Hw);%filtering the fourier transform high frequencies
xrect=0;
for q=1:401
xrect(q) = (1/(2*pi)).*trapz(xrecw.*exp(j*w*t(q))); %inverse fourier transform
end
xrect = abs(xrect);
plot(t,xrect)
Here's a direct answer to your question of "why" there is a nan. If you run your code, the Nan comes from dividing by zero in line 7 for computing xzohw. Notice that w contains zero:
>> find(w==0)
ans =
2001
and you can see in line 7 that you divide by the elements of w with the (2./w) factor.
A quick fix (although it is not a guarantee that your code will do what you want) is to avoid including 0 in w by using a step which avoids zero. Since pi is certainly not divisible by 100, you can try taking steps in .01 increments:
w = -20*pi:0.01:20*pi;
Using this, your code produces a plot which might resemble what you're looking for. In order to do better, we might need more details on exactly what you're trying to do, or what these variables represent.
Hope this helps!