How to find the mean/average of a sound in Nyquist - lisp

I'm trying to write a simple measurement plug-in for Audacity and it's about as much fun as pounding rocks against my skull. All I want to do is take a chunk of audio and find the average of all the samples (the chunk's DC offset) so that I can present it as a number to the user, and so that I can subtract the DC offset from the samples for further processing. I know and understand the math I want to do, but I don't understand how to do it in Lisp/XLisp/Nyquist/whatever.
Background information in this thread
As far as I know, there is no function to do this. For some reason, the snd-avg function does not actually compute the average of the sound, as you might expect. It takes the absolute value first and then computes the average, even though there's a separate snd-abs function that could do that. >:(
So I guess I have to write my own? This means converting a sound into an array and then computing the average of that?
(snd-fetch-array sound len step)
Reads sequential arrays of samples from sound, returning either an array of FLONUMs or NIL when the sound terminates.
(snd-samples sound limit)
Converts the samples into a Lisp array.
Nyquist Functions
And there's not even an average function, so I'll have to do a sum myself, too? But the math functions only work on lists? So I need to convert the array into a list?
And this will also use up a huge amount of memory for longer waveforms (18 bytes per sample), so it would be best to process it in chunks and do a cumulative average. But I don't even know how to do the unoptimized version.
No, (hp s 0.1) won't work, since:
I want to remove DC only, and keep arbitrarily low frequencies. 0.01 Hz should pass through unchanged, DC should be removed.
The high-pass filter is causal and the first samples of the waveform remain unchanged, no matter what knee frequency you use, making it useless for measuring peak samples, etc.

NEVERMIND
snd-maxsamp is computing the absolute value, not snd-avg. snd-avg works just fine. Here's how to squeeze a number ("FLONUM") out of it:
(snd-fetch (snd-avg s (round len) (round len) OP-AVERAGE))
This produces a negative number for negative samples and a positive number for positive samples, as it should.
Should this question be deleted or left as an example to others?

Related

Compression algorithm for contiguous numbers

I'm looking for an efficient encoding for storing simulated coefficients.
The data consists of thousands of curves, each holding 512 contiguous single-precision numbers. The data may be stored as fixed point, as long as it preserves about 23-bit precision (relative to unity level).
The curves could look like these:
My best approach so far was to convert the numbers to 24-bit fixed point and then repeatedly take adjacent differences for as long as the sum of squares kept decreasing. Compressing the resulting data with LZMA (xz, lzip) gives about 7.5x compression (compared to float32).
Adjacent differences are good at the beginning, but they emphasize the quantization noise at each turn.
I've also tried the cosine transform after subtracting the slope/curve at the boundaries. The resulting compression was much weaker.
I tried AEC, but LZMA compressed much more strongly. The highest compression came from bzip3 (after adjacent differences).
I found no function to fit the data with high precision and a limited parameter count.
Is there a way to reduce the penalty of quantization noise when using adjacent differences?
Are there encodings which are better suited for this type of data?
You could try a higher-order predictor. Your "adjacent difference" is a zeroth-order predictor, where the next sample is predicted to be equal to the last sample. You take the differences between the actuals and the predictions, and then compress those differences.
You can try first, second, etc. order predictors. A first-order predictor would look at the last two samples, draw a line between those, and predict that the next sample will fall on the line. A second-order predictor would look at the last three samples, fit those to a parabola, and predict that the next sample will fall on the parabola. And so on.
Assuming that your samples are equally spaced on your x-axis, then the predictors for x[0] up through cubics are:
x[-1] (what you're using now)
2*x[-1] - x[-2]
3*x[-1] - 3*x[-2] + x[-3]
4*x[-1] - 6*x[-2] + 4*x[-3] - x[-4]
(Note that the coefficients are alternating-sign binomial coefficients.)
I doubt that the cubic polynomial predictor will be useful for you, but experiment with all of them to see if any help.
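Not tied to any particular language, here is a small Python sketch of those predictors (the function name and the order-selection heuristic in the comment are just illustrative):
import numpy as np

def predictor_residuals(x, order):
    """Residuals of the order-0..3 predictors listed above.

    Order 0 is the plain adjacent difference; higher orders use the
    alternating-sign binomial coefficients."""
    coeffs = {0: [1], 1: [2, -1], 2: [3, -3, 1], 3: [4, -6, 4, -1]}[order]
    x = np.asarray(x, dtype=np.int64)       # 24-bit fixed-point values fit easily
    res = x.copy()                           # the first order+1 samples stay raw
    for k in range(order + 1, len(x)):
        pred = sum(c * x[k - 1 - i] for i, c in enumerate(coeffs))
        res[k] = x[k] - pred
    return res

# e.g. pick the order whose residuals have the smallest sum of squares:
# best = min(range(4), key=lambda o: float(np.sum(predictor_residuals(curve, o).astype(float) ** 2)))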
Assuming that the differences are small, you should use a variable-length integer to represent them. The idea would be to use one byte for each difference most of the time. For example, you could code seven bits of difference, say -64 to 63, in one byte with the high bit clear. If the difference doesn't fit in that, then make the high bit set, and have a second byte with another seven bits for a total of 14 with that second high bit clear. And so on for larger differences.
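A minimal sketch of that byte layout (it is essentially a signed LEB128-style varint; the function name is just for illustration):
def encode_signed_varint(v):
    """7 payload bits per byte; a set high bit means another byte follows."""
    out = bytearray()
    while True:
        b = v & 0x7F
        v >>= 7                              # arithmetic shift keeps the sign
        # stop once the remaining bits are only sign extension of this payload
        done = (v == 0 and not (b & 0x40)) or (v == -1 and bool(b & 0x40))
        out.append(b if done else b | 0x80)
        if done:
            return bytes(out)

# -64..63 fit in one byte, -8192..8191 in two bytes, and so on
assert len(encode_signed_varint(63)) == 1 and len(encode_signed_varint(-64)) == 1
assert len(encode_signed_varint(64)) == 2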

Rolling calculation of line of best fit - one sample at a time

In an application I'm working on, I receive (x,y) samples at a fixed rate (100 Hz) one at a time and need to detect a 4-second sequence (400 samples) with a constant gradient (with a certain tolerance), and store this gradient for later use. Each sample pair is 8 bytes long, so I would need 3200 bytes of memory if I use the standard moving window least squares regression algorithm for the purpose. My question is, is there a formula for a continuous (recursive) calculation of the gradient of the line of best fit - one incoming sample at a time, without the need to keep an array of the last 400 samples? Something in the vein of exponential moving average, where at any point in time only the latest averaged value needs to be known in order to update it with the new incoming sample.
Would appreciate any pointers to existing solutions.
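Not a pointer to an existing library, but a minimal Python sketch of the exponential-moving-average style idea mentioned in the question (the class name and the effective-window parameter are illustrative; it weights samples exponentially rather than uniformly over exactly 400 samples, so it only approximates the true windowed least-squares gradient):
class EwSlope:
    """Exponentially weighted running estimate of the regression slope dy/dx."""

    def __init__(self, n_eff=400):
        self.a = 1.0 - 1.0 / n_eff           # decay chosen so roughly n_eff samples dominate
        self.mx = self.my = self.mxx = self.mxy = 0.0
        self.started = False

    def update(self, x, y):
        if not self.started:                 # seed the moments with the first sample
            self.mx, self.my = x, y
            self.mxx, self.mxy = x * x, x * y
            self.started = True
        else:
            a, b = self.a, 1.0 - self.a
            self.mx = a * self.mx + b * x
            self.my = a * self.my + b * y
            self.mxx = a * self.mxx + b * x * x
            self.mxy = a * self.mxy + b * x * y
        var = self.mxx - self.mx * self.mx   # weighted variance of x
        cov = self.mxy - self.mx * self.my   # weighted covariance of x and y
        return cov / var if var > 0 else 0.0

# Only four running sums are stored, e.g.:
# est = EwSlope(); for each new pair: gradient = est.update(t_seconds, value)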

Wrong Amplitude after FFT [Matlab] [duplicate]

I am trying to use FFT to decode morse code, but I'm finding that when I examine the resulting frequency bin/bucket I'm interested in, the absolute value is varying quite significantly even when a constant tone is presented. This makes it impossible for me to use the rise and fall around a threshold and therefore decode audio morse.
I've even tried the simple example that seems to be copied everywhere, but it also varies...
I can't work out what I'm doing wrong, and my maths is not clever enough to understand all the formulas associated with FFT.
I know it must be possible, but I can't find out how... can anyone help please?
Make sure you are using the magnitude of the FFT result, not just the real or imaginary component of a complex result.
In general, when a longer constant-amplitude sinusoid is fed to a series of shorter FFTs (a windowed STFT), the magnitude result will only be constant if the sinusoid is exactly integer-periodic in the FFT length, e.g.
f_tone modulo (f_sampling_rate / FFT_length) == 0
If you are only interested in the magnitude of one selected tone frequency, the Goertzel algorithm would serve as a more efficient filter than a full FFT. And, depending on the setup and length restrictions required by your chosen FFT library, it may be easier to vary the length of a Goertzel to match the requirements for your target tone frequency, as well as the time/frequency resolution trade-off needed.
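For reference, a small Python sketch of the Goertzel idea (the function name and parameters are illustrative; the same structure translates directly to MATLAB):
import math

def goertzel_magnitude(block, sample_rate, tone_freq):
    """Magnitude of one frequency bin of `block`, computed without a full FFT."""
    n = len(block)
    k = round(n * tone_freq / sample_rate)   # nearest bin to the target tone
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return math.sqrt(max(power, 0.0))

# Choosing n so that tone_freq is (close to) an integer multiple of
# sample_rate / n keeps the reported magnitude steady for a constant tone.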

Peak Detection Matlab

I'm trying to get all the large peak values of this signal:
As you can see, there is one large peak followed by one smaller peak, and I want to get the value of each large peak. I already tried [pks1,locs1] = findpeaks(y1,'MinPeakHeight',??); but I can't figure out what to write instead of the ??, knowing that the signal will not be the same every time (of course there will always be a large+smaller peak pattern, but the time intervals and amplitudes can change). I tried a lot of combinations using std(), mean() and max(), but none of them works properly.
Any ideas on how I can solve this problem?
You could try using the 'MinPeakDistance' keyword and enter a minimum distance between the two peaks slightly higher than the distance between the large peak and the following small peak. So for example:
[pks1,locs1] = findpeaks(y1,'MinPeakDistance',0.3);
Edit:
If the time between peaks (and the following smaller one) varies a lot you'll probably have to do some post-processing. First find all the peaks including the smaller second ones. Then in your array of peaks remove every peak which is significantly lower than its two neighbours.
You could also try fiddling with 'MinPeakProminence'.
Generally these problems require a lot of calibration for the final few percent of the algorithm's accuracy, and there's no universal cure.
I also recommend having a look at all the other options in the documentation.
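To illustrate the post-processing idea above, here is a rough sketch in Python/SciPy rather than MATLAB (the ratio threshold is purely illustrative and would need tuning):
import numpy as np
from scipy.signal import find_peaks

def keep_large_peaks(y, ratio=0.6):
    """Find every peak, then drop peaks significantly lower than both neighbours."""
    locs, _ = find_peaks(y)                      # all local maxima, big and small
    pks = y[locs]
    keep = []
    for i, p in enumerate(pks):
        left = pks[i - 1] if i > 0 else -np.inf
        right = pks[i + 1] if i < len(pks) - 1 else -np.inf
        if not (p < ratio * left and p < ratio * right):
            keep.append(i)
    return locs[keep], pks[keep]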

How can I extract interval time between words in mp3 (audio) file by MATLAB?

I am trying to find the interval time between words in several mp3 (audio) files.
Before going further, let me explain my audio files:
I asked my subjects to generate as many animal names as possible in 60 seconds, and I recorded their speech.
Therefore, all of the mp3 files are sequences of animal names (isolated words, not continuous sentences).
First, I read one of the files and make a graph by just typing:
plot(data);
The graph is as below:
I think some of the local maxima are candidates for animal names. However, I cannot figure out how to get the (reasonably exact) interval time between animal names.
Is just measuring peak to peak okay, or are there alternatives for calculating the intervals between words?
Thanks :)
Peak to peak is a very coarse estimation of the silence segments between the vocal segments. The threshold method that #vsoch suggested is also not always suitable, because the vocal segments also contain low and zero values.
The conventional method to extract the silence segments is to compare energies over different segments. You need to divide the signal into segments of around 30 ms (the exact number of samples depends on the sample rate). It is also better to use an overlap of about 10 ms between segments.
For each segment, evaluate the energy. This can be done with sum(segment.^2) (pseudocode...). Then plot the energies you've got, to see and choose a threshold that separates the vocal segments from the silence segments.
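In code, the energy computation sketched above could look like this (a Python/NumPy sketch rather than MATLAB; the frame and hop sizes are the ~30 ms and ~10 ms values from the answer):
import numpy as np

def frame_energies(signal, fs, frame_ms=30, hop_ms=10):
    """Short-time energy of overlapping frames, i.e. sum(segment.^2) per frame."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    starts = range(0, len(signal) - frame + 1, hop)
    return np.array([np.sum(np.square(signal[s:s + frame].astype(float)))
                     for s in starts])

# Plot the energies and pick a threshold by eye, as suggested above:
# silent_frames = frame_energies(data, fs) < threshold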
You want to use findpeaks
[pks,locs] = findpeaks(data)
You could find the local maxima that way, go left and right from each one until the signal drops below a certain threshold or by a certain percentage (since the peaks have some girth), then order them and calculate the distances between them by subtracting successive peak locations.
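A rough Python/SciPy sketch of the findpeaks-and-distances part of that route (the distance and height values are illustrative, not calibrated):
import numpy as np
from scipy.signal import find_peaks

def word_gaps(data, fs, min_gap_s=0.3):
    """Approximate intervals between word peaks in a recording."""
    env = np.abs(np.asarray(data, dtype=float))
    locs, _ = find_peaks(env,
                         distance=int(min_gap_s * fs),   # at most one peak per word
                         height=0.2 * env.max())         # ignore quiet wiggles
    return np.diff(locs) / fs                            # gaps in seconds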