Delta coefficients from MFCC

Can somebody explain to me how to calculate delta coefficients from MFCCs for a frame? I didn't understand the explanation in the Practical Cryptography tutorial.

The delta coefficients are the approximate derivatives, so a simple way is to calculate:
delta: v(t) = ( c(t+1) - c(t-1) ) / 2
delta-delta: a(t) = c(t-1) - 2 * c(t) + c(t+1)
But I have read that in practice, "it is more common to make more sophisticated approximations to the slope, using a wider context of frames" (Jurafsky & Martin, Speech and Language Processing) to determine the delta and delta-delta. For example, we might consult a finite-differences table (the two expressions above are the lowest-order central-difference estimates from those tables; higher-order estimates use more points in the calculation).
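A common concrete choice, used for example in HTK and in the Practical Cryptography tutorial, is the regression formula d(t) = sum_{n=1..N} n * ( c(t+n) - c(t-n) ) / ( 2 * sum_{n=1..N} n^2 ), typically with N = 2, i.e. a 5-frame window. A minimal MATLAB sketch, assuming C is a numFrames-by-numCoeffs matrix of MFCCs (variable names are mine):
N = 2; % half-window; 2*N+1 frames contribute to each delta
[numFrames, numCoeffs] = size(C);
Cpad = [repmat(C(1,:), N, 1); C; repmat(C(end,:), N, 1)]; % replicate edge frames
denom = 2 * sum((1:N).^2);
D = zeros(numFrames, numCoeffs);
for t = 1:numFrames
    acc = zeros(1, numCoeffs);
    for n = 1:N
        acc = acc + n * (Cpad(t+N+n,:) - Cpad(t+N-n,:));
    end
    D(t,:) = acc / denom; % delta coefficients for frame t
end
% Delta-deltas: run the same procedure again with D in place of C.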

How many iterations should you make for the simulation to be an accurate 'Monte Carlo simulation' for bit error rate (BER) calculations?

What is the minimum value? If I want to repeat the simulation with an exponentially growing number of iterations, five times, should I start from 1e2, i.e. iterations = [1e2 1e3 1e4 1e5 1e6], or from 1e3, i.e. [1e3 1e4 1e5 1e6 1e7], or something else? What is the common practice?
Additional info:
I used [8e3 1e4 3e4 5e4 8e4 1e5] before, but according to my professor that is not enough, because the result is not satisfactory.
Simulations take a very long time on my computer, so I cannot keep changing the number of iterations based on the result. If there is a common practice for this, please let me know.
Thanks @BillBokeey for helping me edit the question.
What your professor proposes strikes me as a qualitative, not quantitative, way to estimate the convergence of your simulation.
Frankly, I don't know how BER is computed, but I deal a lot with integral calculations by MC.
In that case you sample x_i over some interval and compute the estimate
f_MC = (1/N) * sum_i f_i.
We know that f_MC converges to the true value with variance sigma^2 / N (or standard deviation sigma / sqrt(N)). What we do then is estimate sigma within the same simulation and assume that, for large enough N, it is a good approximation of the true sigma, so the simulation error can be plotted. In practical terms, alongside f_MC we also accumulate the second-moment average f2_MC = (1/N) * sum_i f_i^2, and at the end take s = sqrt(f2_MC - f_MC^2) / sqrt(N) as the estimated error of the MC simulation (it will be slightly biased, though).
Thus you could plot on the same graph the value of BER and the statistical error of the simulation. You could even do better: ask the user to input the required statistical error (say, in %, meaning the user enters s/f_MC * 100), and continue the simulation in batches until you reach the required precision.
Then you could judge whether 10^9 points are enough or not...
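As a rough sketch of that "continue in batches until you reach the target precision" idea (simulate_batch is a hypothetical function returning a vector of per-trial outcomes, e.g. 0/1 bit-error indicators):
targetRelErr = 0.05;          % stop at 5% relative statistical error
batchSize = 1e5;
sum1 = 0; sum2 = 0; N = 0; relErr = Inf;
while sum1 < 10 || relErr > targetRelErr   % require a few "hits" before trusting the error estimate
    f = simulate_batch(batchSize);         % hypothetical: per-trial results for this batch
    sum1 = sum1 + sum(f);
    sum2 = sum2 + sum(f.^2);
    N = N + batchSize;
    fMC = sum1 / N;                        % running Monte Carlo estimate
    s = sqrt(max(sum2/N - fMC^2, 0)) / sqrt(N);  % estimated statistical error
    relErr = s / fMC;
end
fprintf('Estimate %g +/- %g after %d trials\n', fMC, s, N);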
Assuming that we denote our simulated BER by Pb_hat and require that Pb_hat lies in [(1 - alpha)Pb, (1 + alpha)Pb], where Pb is the true BER and alpha is the percent deviation tolerance (e.g., 0.1), then from [van Trees 2013, pg. 83] we know that the number of Monte Carlo trials required to obtain Pb_hat with a confidence probability pc is K = (c / alpha)^2 x (1 - Pb) / Pb,
with c given in Table I.
Table I: confidence interval probabilities from the Gaussian distribution
pc:  0.900   0.950   0.954   0.990   0.997
c:   1.645   1.960   2.000   2.576   3.000
Example: Suppose we want to simulate a BER of 10^-4 with a percent deviation tolerance of 0.01 and a confidence probability of 0.950. From Table I we know that c = 1.960, and applying the formula gives K = (1.96/0.01)^2 x (1 - 10^-4)/10^-4 = 384121584 Monte Carlo trials. This is a surprisingly large value, though.
As a rule of thumb, K should be on the order of 10/BER [Jeruchim 1984].
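For convenience, the same calculation in MATLAB (norminv, from the Statistics Toolbox, reproduces the c values in Table I; variable names are mine):
Pb = 1e-4;       % BER to be resolved
alpha = 0.01;    % percent deviation tolerance
pc = 0.950;      % confidence probability
c = norminv(1 - (1 - pc)/2)          % 1.960, matching Table I
K = (c / alpha)^2 * (1 - Pb) / Pb    % ~3.84e8 trials
K_rule_of_thumb = 10 / Pb            % Jeruchim's rule of thumb: 1e5 trials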
[van Trees 2013] H. L. van Trees, K. L. Bell, and Z. Tian, Detection, estimation, and filtering theory, 2nd ed., Hoboken, NJ: Wiley, 2013.
[Jeruchim 1984] M. Jeruchim, "Techniques for Estimating the Bit Error Rate in the Simulation of Digital Communication Systems," in IEEE Journal on Selected Areas in Communications, vol. 2, no. 1, pp. 153-170, January 1984, doi: 10.1109/JSAC.1984.1146031.

some questions on cosine similarity

Yesterday I learnt that the cosine similarity, defined as
sim(A,B) = (A * B) / (||A||2 * ||B||2),
can effectively measure how similar two vectors are.
I find that the definition here uses the L2-norm to normalize the dot product of A and B, what I am interested in is that why not use the L1-norm of A and B in the denominator?
My teacher told me that if I use the L1-norm in the denominator, then the cosine similarity would no longer be 1 when A = B. I then asked him: if I modify the cosine similarity definition as follows, what are the advantages and disadvantages of the modified model compared with the original one?
sim(A,B) = (A * B) / (||A||1 * ||B||1) if A!=B
sim(A,B) = 1 if A==B
I would appreciate if someone could give me some more explanations.
If you use the L1-norm, you are not computing the cosine anymore.
Cosine is a geometrical concept, not an arbitrary definition. There is a whole body of mathematics attached to it. If you use the L1 norm, you are not measuring angles anymore.
See also: Wikipedia: Trigonometric functions - Cosine
Note that cosine is monotone to Euclidean distance on L2 normalized vectors.
Euclidean(x,y)^2 = sum( (x-y)^2 ) = sum(x^2) + sum(y^2) - 2 sum(x*y)
if x and y are L2 normalized, then sum(x^2)=sum(y^2)=1, and then
Euclidean(x_norm,y_norm)^2 = 2 * (1 - sum(x_norm*y_norm)) = 2 * (1 - cossim(x,y))
So using cosine similarity essentially means standardizing your data to unit length. But there are also computational benefits associated with this, as sum(x*y) is cheaper to compute for sparse data.
If you L2 normalized your data, then
Euclidean(x_norm, y_norm) = sqrt(2) * sqrt(1-cossim(x,y))
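A quick numerical check of that identity (nothing here beyond base MATLAB; the vectors are arbitrary examples):
x = randn(5,1); y = randn(5,1);
cossim = dot(x, y) / (norm(x) * norm(y));
xn = x / norm(x); yn = y / norm(y);   % L2 normalization
norm(xn - yn)                         % Euclidean distance of the normalized vectors
sqrt(2 * (1 - cossim))                % same value, up to rounding error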
For the second part of your question: patching the L1-normalized version isn't that easy. Consider the vectors (1,1) and (2,2). Obviously, these two vectors have the same angle, and thus should have cosine similarity 1.
Using your equation, they would have similarity (2+2)/(2*4) = 0.5.
Now look at the vectors (0,1) and (0,2), which again have the same angle as each other (and where cosine indeed gives similarity 1): your equation yields (0+2)/(1*2) = 1. So two pairs of vectors with exactly the same geometry get similarities of 0.5 and 1; your similarity does not match any intuition, does it?

How does the number of points change an FFT in MATLAB?

When taking fft(signal, nfft) of a signal, how does nfft change the outcome, and why? Can I use a fixed value for nfft, say 2^18, or do I need to use 2^nextpow2(2*length(signal)-1)?
I am computing the power spectral density (PSD) of two signals by taking the FFT of the autocorrelation, and I want to compare the results. Since the signals are of different lengths, I am worried that if I don't fix nfft, the comparison will be really hard!
There is no inherent reason to use a power-of-two length (it just might make the processing more efficient in some circumstances).
However, to make the FFTs of two different signals "commensurate", you will indeed need to zero-pad one or the other (or both) signals to the same length before taking their FFTs.
That said, I feel obliged to say: if you need to ask this, then you're probably not yet at a point on the DSP learning curve where you'll be able to do anything useful with the results. Get yourself a decent book on DSP theory.
Most modern FFT implementations (including MATLAB's, which is based on FFTW) rarely require padding a signal's time series to a power-of-two length. However, nearly all implementations offer better, and sometimes much better, performance for data vectors whose length is a power of 2. For MATLAB specifically, padding to a power of 2 or to a length with many small prime factors will give you the best performance (N = 1000 = 2^3 * 5^3 would be excellent; N = 997, a prime, would be a terrible choice).
Zero-padding will not increase the frequency resolution of your PSD, but it does reduce the bin size in the frequency domain. If you append NZeros zeros to a signal vector of length N, the one-sided spectrum will have (N + NZeros)/2 + 1 points. Each frequency bin will then have a width of:
Bin width (Hz) = F_s / ( N + NZeros )
Where F_s is the signal sample frequency.
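For example, here is a sketch of putting two different-length signals on the same frequency grid by fixing nfft (Fs, x1 and x2 are placeholders; fft zero-pads to nfft automatically, and the scaling is only a crude periodogram-style one):
Fs = 1e3;                                  % sample rate (placeholder)
x1 = randn(1, 900); x2 = randn(1, 1300);   % two signals of different lengths (placeholders)
nfft = 2^nextpow2(max(numel(x1), numel(x2)));
X1 = fft(x1, nfft); X2 = fft(x2, nfft);    % both implicitly zero-padded to nfft
f = (0:nfft/2) * Fs / nfft;                % shared one-sided grid, bin width Fs/nfft
P1 = abs(X1(1:nfft/2+1)).^2 / (Fs * numel(x1));
P2 = abs(X2(1:nfft/2+1)).^2 / (Fs * numel(x2));
plot(f, 10*log10(P1), f, 10*log10(P2));    % now directly comparable bin for bin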
If you find that you need to separate or identify two closely spaced peaks in the frequency domain, you need to increase your sample time. You'll quickly discover that zero-padding buys you nothing to that end, and intuitively that's what we'd expect: how can we expect more information in our power spectrum without adding more information (a longer time series) to our input?
Best,
Paul

Calculating confidence intervals for a non-normal distribution

First, I should specify that my knowledge of statistics is fairly limited, so please forgive me if my question seems trivial or perhaps doesn't even make sense.
I have data that doesn't appear to be normally distributed. Typically, when I plot confidence intervals, I would use the mean +- 2 standard deviations, but I don't think that is acceptable for a non-normal distribution. My sample size is currently 1000 samples, which seems like enough to determine whether the distribution is normal or not.
I use Matlab for all my processing, so are there any functions in Matlab that would make it easy to calculate the confidence intervals (say 95%)?
I know there are the 'quantile' and 'prctile' functions, but I'm not sure if that's what I need to use. The function 'mle' also returns confidence intervals for normally distributed data, although you can also supply your own pdf.
Could I use ksdensity to create a pdf for my data, then feed that pdf into the mle function to give me confidence intervals?
Also, how would I go about determining whether my data is normally distributed? I can currently tell just by looking at the histogram or the pdf from ksdensity, but is there a way to measure it quantitatively?
Thanks!
So there are a couple of questions there. Here are some suggestions.
You are right that the mean of 1000 samples should be approximately normally distributed (unless your data is "heavy tailed", which I'm assuming is not the case). To get a 1-alpha confidence interval for the mean (in your case alpha = 0.05) you can use the 'norminv' function. For example, say we want a 95% CI for the mean of a sample of data X; then we can type
N = 1000; % sample size
X = exprnd(3,N,1); % sample from a non-normal distribution
mu = mean(X); % sample mean (normally distributed)
sig = std(X)/sqrt(N); % standard error of the mean
alphao2 = .05/2; % alpha over 2
CI = [mu + norminv(alphao2)*sig ,...
mu - norminv(alphao2)*sig ]
CI =
2.9369 3.3126
Testing whether a data sample is normally distributed can be done in a lot of ways. One simple method is a QQ plot. To do this, use 'qqplot(X)' where X is your data sample. If the result is approximately a straight line, the sample is normal. If the result is not a straight line, the sample is not normal.
For example if X = exprnd(3,1000,1) as above, the sample is non-normal and the qqplot is very non-linear:
X = exprnd(3,1000,1);
qqplot(X);
On the other hand if the data is normal the qqplot will give a straight line:
qqplot(randn(1000,1))
You might also consider bootstrapping, using the bootci function.
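For instance (bootci is in the Statistics Toolbox; the exponential sample just reuses the example above):
X = exprnd(3, 1000, 1);        % same non-normal example as above
ci = bootci(2000, @mean, X)    % bootstrap 95% CI for the mean, 2000 resamples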
You may use the method proposed in [1]:
MEDIAN +/- 1.7 * (1.25 * R) / (1.35 * sqrt(N))
where R is the interquartile range and N is the sample size.
This is often used in notched box plots, a useful data visualization for non-normal data. If the notches of two medians do not overlap, the medians are, approximately, significantly different at about a 95% confidence level.
[1] McGill, R., J. W. Tukey, and W. A. Larsen. "Variations of Boxplots." The American Statistician. Vol. 32, No. 1, 1978, pp. 12–16.
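In MATLAB that interval is a one-liner (iqr requires the Statistics Toolbox; the data vector X here is just an illustrative non-normal sample):
X = exprnd(3, 1000, 1);                                 % example non-normal data
N = numel(X);
ci = median(X) + [-1 1] * 1.7 * (1.25 * iqr(X)) / (1.35 * sqrt(N))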
Are you sure you need confidence intervals or just the 90% range of the random data?
If you need the latter, I suggest you use prctile(). For example, if you have a vector holding independent identically distributed samples of random variables, you can get some useful information by running
y = prctile(x, [5 50 95])
This will return in [y(1), y(3)] the range where 90% of your samples occur. And in y(2) you get the median of the sample.
Try the following example (using a normally distributed variable):
t = 0:99;
tt = repmat(t, 1000, 1);
x = randn(1000, 100) .* tt + tt; % simple gaussian model with varying mean and variance
y = prctile(x, [5 50 95]);
plot(t, y);
legend('5%','50%','95%')
I have not used MATLAB, but from my understanding of statistics, if your distribution cannot be assumed to be normal, then you can treat it as a Student t distribution and calculate the confidence interval and accuracy that way.
http://www.stat.yale.edu/Courses/1997-98/101/confint.htm

Least squares optimal scaling

I have two waveforms which are linked by a numerical factor. I need to use optimal scaling (least squares) between the two waveforms to calculate this factor in Matlab. Unfortunately, I have no idea how to do this. The two waveforms are seismic signals related by the velocity of the seismic waves, which I'm trying to calculate. Any ideas? Need more info?
Call W1 and W2 the two vectors. For this to work, they must be column vectors. Transpose them if they are rows instead of columns. Then if we wish to find the value of k such that W1 = k*W2, just use backslash.
k = W2\W1;
Backslash here gives you a linear regression (least squares) estimator, as requested. This does not handle the unknown phase shift case of course.
One cheesy way to estimate the linear factor without having to deal with a phase shift is to compute the ratio of the estimated scales of the two waves. The cheesiest way is to use the standard deviation:
k = std(W1) / std(W2);
If you care about robustness, I would substitute in the MAD or the IQR; the MAD is the median absolute deviation, which you can (somewhat inefficiently) write inline as:
MAD = @(x)(median(abs(bsxfun(@minus, x, median(x)))));
k = MAD(W1) / MAD(W2);
The IQR is the interquartile range, which requires a proper quantile computation; you can implement it inefficiently using sort. I leave that as an exercise to the reader.
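If you do have the Statistics Toolbox, prctile makes the IQR version just as short (a sketch, reusing W1 and W2 from the question):
IQR = @(x) diff(prctile(x, [25 75]));   % interquartile range via prctile
k = IQR(W1) / IQR(W2);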