Aligning two wav files precisely - matlab

I have a tool that compares two audio WAV files frame by frame and returns a grade indicating how similar the two files are.
I have an original WAV file and a recording of it. Since the two files are almost identical, I should get a high similarity score, yet I get a poor one, mainly because of a very slight delay in the recorded file, which leads to a frame mismatch.
My question is: how do I align the two audio files exactly using MATLAB, so that a valid frame-to-frame comparison can be made?

You should run a series of comparisons, shifting one of the signals in time and calculating the correlation between the two at each shift. The shift with the highest correlation gives you the time offset between the two waveforms.
I think you can use xcorr to achieve this.
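The shift-and-correlate idea can be sketched in a few lines. The version below is in Python with NumPy rather than MATLAB, and the function names estimate_delay and align are made up for illustration; np.correlate in "full" mode plays the role that xcorr would play in MATLAB. It is a minimal sketch that assumes both signals are mono and share the same sample rate.

```python
import numpy as np

def estimate_delay(reference, recording):
    """Return the lag (in samples) at which `recording` best matches `reference`.

    A positive result means `recording` starts later than `reference`.
    """
    reference = np.asarray(reference, dtype=float)
    recording = np.asarray(recording, dtype=float)
    # Full cross-correlation: one value for every possible shift.
    corr = np.correlate(recording, reference, mode="full")
    # Index 0 of `corr` corresponds to a lag of -(len(reference) - 1).
    return int(np.argmax(corr)) - (len(reference) - 1)

def align(reference, recording):
    """Trim whichever signal leads so that both start at the same instant."""
    lag = estimate_delay(reference, recording)
    if lag > 0:
        recording = recording[lag:]    # recording is late: drop its leading samples
    elif lag < 0:
        reference = reference[-lag:]   # recording is early: drop from the reference
    n = min(len(reference), len(recording))
    return reference[:n], recording[:n]
```

For long files the direct correlation gets slow, so an FFT-based correlation is usually the better choice; the principle is the same.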

Having had the same problem, and having failed to find a simple tool that syncs the start of video/audio recordings automatically,
I decided to make syncstart (github).
It is a Python-based command-line tool that calculates the cut needed to bring the recordings into sync.
It uses an FFT-based correlation of the start of the recordings.
The basic code should be easy to convert to MATLAB:
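import numpy as np
from numpy import fft  # scipy.fft offers the same fft/ifft calls

# s1pad, s2pad: the two signals, assumed zero-padded to a common length padsize
# fs: sample rate in Hz
corr = fft.ifft(fft.fft(s1pad) * np.conj(fft.fft(s2pad)))
ca = np.absolute(corr)
xmax = np.argmax(ca)
if xmax > padsize // 2:
    # peak in the upper half is a wrapped-around negative lag: cut the second signal (s2)
    offset = (padsize - xmax) / fs
else:
    # cut the first signal (s1)
    offset = xmax / fs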

Related

Matlab noise source discontinuity

Using MATLAB, I generated some random noise, filtered it, and saved it as a file that GNU Radio can read through a File Source. In GNU Radio, I set the File Source to repeat and viewed the output with a QT GUI Frequency Sink. I can see the filtered noise fine, but every now and then (every 10 seconds or so) the spectrum drops in power and jumps around for about a tenth of a second, then returns to normal power. The sample rate of my MATLAB filter is 320k, and my GNU Radio sample rate is the same, if that matters.
I think it may be because the noise generated in MATLAB is a finite sequence that GNU Radio repeats, and the discontinuity happens right where the sequence repeats. Any idea how I can stop this discontinuity so I can transmit without having to worry about it? If I'm missing any info, please let me know and I'll edit the question. Thanks in advance.
NOTE: I needed to create a MATLAB binary file to be able to read it in GNU Radio. GNU Radio reads the binary file from my desktop, then uses it as the file source.

Remove Spikes from Periodic Data with MATLAB

I have some data that is time-stamped by an NMEA GPS string, which I decode to obtain the individual fields: year, month, day, et cetera.
The problem is that on a few occasions the GPS (probably due to some signal loss) goes haywire and outputs badly wrong values. This generates spikes in the time-stamp data, as you can see in the attached picture, which plots the vector of days as output by the GPS.
As you can see, the GPS data are generally well behaved, and the days run from 1 to 30/31 each month before falling back to 1 at the start of the next month. At certain moments, though, the GPS outputs a random day.
I tried the standard MATLAB functions for despiking (such as medfilt1 and findpeaks), but either they are not suited to the task or I do not know how to set them up properly.
My other idea was to loop over differences between adjacent elements, but the vector is so big that the computer cannot really handle it.
Is there any vectorized way to go down such a road and detect those spikes?
Thanks so much!
You need to filter your data with a simple low-pass filter to get rid of the outliers:
windowSize = 5;
b = (1/windowSize)*ones(1,windowSize);
a = 1;
FILTERED_DATA = filter(b,a,YOUR_DATA);
Just play a bit with windowSize until you get the smoothness you want.
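One caveat about the moving average: it attenuates a spike and spreads it over windowSize samples rather than removing it, and it also rounds off the legitimate drop from 30/31 back to 1 at each month boundary. A vectorized alternative, sketched here in Python with NumPy (1.20 or later for sliding_window_view), compares every sample with the median of its neighbours and flags the ones that differ too much. The function name despike and the default window and threshold are assumptions chosen for illustration, not values taken from the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def despike(values, window=5, threshold=1.0):
    """Flag samples that differ from their local median by more than `threshold`.

    `window` must be odd. Returns (is_spike, cleaned), where `cleaned` has each
    flagged sample replaced by the local median.
    """
    values = np.asarray(values, dtype=float)
    half = window // 2
    # Repeat the edge values so that every sample has a full window around it.
    padded = np.pad(values, half, mode="edge")
    # One row per sample, holding that sample and its neighbours (no Python loop).
    local_median = np.median(sliding_window_view(padded, window), axis=1)
    is_spike = np.abs(values - local_median) > threshold
    cleaned = np.where(is_spike, local_median, values)
    return is_spike, cleaned
```

Because a median follows step changes, the monthly rollover is left alone as long as each day is represented by at least a few consecutive samples. The median call builds a temporary array of about window times the input size, so a very large vector may need to be processed in chunks.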

How can I get into a wav file to change the sample rate?

I have a WAV file loaded in MATLAB, and I can see its sample rate. All I need to do is change this one number. Everything else in the file will remain unchanged. (The resulting sound would play at a different speed but would have an identical array of sample data.)
The reason I need to do this is that MATLAB seems to freak out when I tell it to open something sampled at anything other than 8k. All I need MATLAB for is to edit the file, so the sample rate really doesn't matter at all, since I'll be putting the data back into a WAV file when I'm done. So I either need to be able to change the value in the WAV file that stores the sample rate, or to get MATLAB to change the sample rate it prefers from 8k to the sample rate that my files were recorded at.
If you just want to change the sampling frequency, here is the code. Note that it changes how the original WAV file plays back: if you decrease the sampling frequency, the music plays more slowly (and at a lower pitch).
Code:
[y, fs, nbits]=wavread('stego_lab');
fs2=11025;
wavwrite(y,fs2,nbits,'stego2_lab.wav');
sound(y,fs2,nbits)
You will hear the difference, but the samples themselves remain the same.
Hope it helps.
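The same header-only change can also be made outside MATLAB. The following is a minimal sketch using Python's standard wave module; the function name rewrite_sample_rate is invented for this example, and it assumes an ordinary uncompressed PCM file, which is the only kind the wave module handles.

```python
import wave

def rewrite_sample_rate(src, dst, new_rate):
    """Copy a PCM WAV file, changing only the sample rate stored in the header.

    `src` and `dst` may be file names or open binary file objects.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)        # channels, sample width, and so on, unchanged
        writer.setframerate(new_rate)   # override just the sample rate
        writer.writeframes(frames)      # sample data is copied byte for byte
```

A call such as rewrite_sample_rate('stego_lab.wav', 'stego2_lab.wav', 11025) would mirror the MATLAB snippet above, with the file names taken from that snippet.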
There is the SoX tool, which should help you with that, and it is available on almost every platform: http://sox.sourceforge.net
There are also libsndrate, libsamplerate, libsndfile and others, which might come with executables too.
Try this solution:
[x,fs] = wavread('infile.wav');
[p,q] = rat(16000/fs); % rational ratio for converting to a 16 kHz sample rate
y = resample(x,p,q); % requires the signal package
wavwrite(y,16000,'outfile.wav'); % write the resampled signal y, not the original x

How is it possible to encode black/white picture into ".wav"-file?

How is it possible to encode a black/white picture into a ".wav" file? I know that it is certainly possible with the help of "steganography", but I don't know its algorithms. What algorithms exist? And what books/sources are best for understanding their principles?
Edited:
Actually, I have a stereo WAV file. My task is to decode a picture from it. The task says that the frequencies of the left channel give the X-coordinate and the frequencies of the right channel give the Y-coordinate in a Cartesian coordinate system. These points make up a picture containing a text message. So I must write a program for this. I have no idea what I should do.
Probably the simplest version of steganography using a wav file would be to use 16-bit samples in the wave file, but only dedicate the 15 most significant bits to sound. In the least significant bit of each sample, you'd encode one pixel of your black and white picture.
Regenerating the picture would require software to open the wave file, take the least significant bit from each sample, and put those bits back together with each other into (for example) a JPEG file.
To put things into perspective, a CD has two channels containing 16-bit samples at a rate of 44.1 kHz, so you'd only need the LSBs from around 10 seconds of sound to encode a fairly typical full-color JPEG (e.g., 100KB or so). A wave file of a typical ~3 minute pop song could hide around 15-20 full-color pictures pretty easily.
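The least-significant-bit scheme can be sketched as follows, in Python with NumPy. The function names embed_bits and extract_bits are made up for illustration, the samples are assumed to be 16-bit signed integers, and the picture is assumed to be already flattened into a sequence of 0/1 pixel values.

```python
import numpy as np

def embed_bits(samples, bits):
    """Hide one bit in the least significant bit of each leading sample."""
    samples = np.asarray(samples, dtype=np.int16)
    bits = np.asarray(bits, dtype=np.int16) & 1
    if bits.size > samples.size:
        raise ValueError("not enough samples to hold the payload")
    stego = samples.copy()
    # Clear the lowest bit, then write the payload bit into it.
    stego[:bits.size] = (stego[:bits.size] & ~1) | bits
    return stego

def extract_bits(stego, count):
    """Read back the first `count` hidden bits."""
    return (np.asarray(stego, dtype=np.int16)[:count] & 1).astype(np.uint8)
```

The receiver has to know how many bits were hidden and the picture's width and height in order to reshape them; this sketch leaves that out, and a real tool would typically store those values in a small header hidden the same way. Each sample changes by at most one quantization step, which is why the result sounds the same.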
Edit: (to reply to the edited question). This is a little tougher to deal with. An individual sample can't represent any frequency; it just represents the amplitude at a given point in time. To get frequency, you need a number of samples over a period of time -- and you need to know the exact period to convert.
Once you know that, you basically do an FFT on the samples. That will tell you the relative strengths of signal at all possible frequencies. Presumably, you'd pick the strongest one and scale appropriately. Do the same for the other channel and draw a pixel at that point.
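As a rough sketch of that procedure in Python with NumPy: split each channel into frames, take an FFT of every frame, and use the strongest frequency as the coordinate. The frame length is exactly the "period" that has to be known in advance, so frame_len is an assumption the caller must supply, as are the function names; scaling the frequencies to pixel coordinates is left out because it depends on how the file was encoded.

```python
import numpy as np

def dominant_frequency(frame, sample_rate):
    """Return the frequency (Hz) of the strongest spectral peak in `frame`."""
    frame = np.asarray(frame, dtype=float)
    # A Hann window reduces leakage when the tone does not fit the frame exactly.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    spectrum[0] = 0.0  # ignore any DC offset
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

def decode_points(left, right, sample_rate, frame_len):
    """Turn a stereo signal into (x, y) pairs, one per frame of `frame_len` samples."""
    n_frames = min(len(left), len(right)) // frame_len
    points = []
    for i in range(n_frames):
        sl = slice(i * frame_len, (i + 1) * frame_len)
        points.append((dominant_frequency(left[sl], sample_rate),
                       dominant_frequency(right[sl], sample_rate)))
    return points
```

The frequency resolution is sample_rate / frame_len, so short frames give coarse coordinates. Plotting the returned points as a scatter plot should reveal the picture if the frame length was guessed correctly.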
Your ears are not sensitive to small changes in a sound file.
WAV files typically hold UNCOMPRESSED data, so the audio is just a sequence of 16- to 24-bit samples. Your ears cannot notice slight differences in the low-order bits. All you need to do is periodically inject bit values that represent an image into the data.
So if you insert one pixel for every 1000 data points, you can hide an image (without even encrypting it) in a WAV file. If a user plays the file, they CANNOT hear it.
Once the file is saved on your computer or on a remote one, a decoding tool that is aware of the hiding technique can recover the image.

Mixing sound files of different size

I want to mix audio files of different sizes into a single .wav file without cutting any of them short, i.e. the resulting file should be as long as the largest of the input files.
There is a sample that shows how to mix files of the same size: Example 4 (http://www.modejong.com/iOS/#ex4).
I modified the code to get the mixed output as a .wav file.
But I cannot work out how to modify this code for files of unequal size.
If someone can help me out with a code snippet, I'll be really thankful.
It should be as easy as sending all the files to the mixer simultaneously. When any single file gets to the end, just treat it as if the remainder is filled with zeroes. When all files get to the end, you are done.
Note that the example code says it returns an error if there would be clipping (the sum of the waves is greater than the max representable value). This condition is more likely if you are mixing multiple inputs. The best way around it is to create some "headroom" in the input waves. You can either do this in preprocessing, by ensuring that each wave's volume is no more than X% of maximum (~80-90%, depending on the number of inputs), or do it dynamically in the mixer code by multiplying each sample by some value <1.0 as you add it to the mix.
If you are selecting the waves to mix at runtime and failure due to clipping is unacceptable, you will need to modify the sample code to pin the values at max/min instead of returning an error. Don't just let them overflow or you will get noisy artifacts.
Clipping creates artifacts as well, but when you haven't created enough headroom before mixing, it is definitely preferable to overflow. It is a more familiar-sounding type of distortion, similar to what you get when you overdrive your speakers. The Wikipedia article on clipping puts it this way:
Clipping is preferable to the alternative in digital systems—wrapping—which occurs if the digital hardware is allowed to "overflow", ignoring the most significant bits of the magnitude, and sometimes even the sign of the sample value, resulting in gross distortion of the signal.
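The zero-fill and pin-at-max/min ideas from this answer can be sketched together in Python with NumPy. This is not the code from the linked example; the function name mix_saturating is invented here, and the tracks are assumed to be mono arrays of 16-bit samples at the same sample rate.

```python
import numpy as np

def mix_saturating(tracks):
    """Sum int16 tracks of unequal length, pinning the result to the int16 range."""
    longest = max(len(t) for t in tracks)
    # Accumulate in 32 bits so that the sum itself cannot wrap around.
    total = np.zeros(longest, dtype=np.int32)
    for t in tracks:
        t = np.asarray(t, dtype=np.int32)
        total[:len(t)] += t  # samples past the end of a short track count as zero
    # Pin out-of-range values at the limits instead of letting them overflow.
    return np.clip(total, -32768, 32767).astype(np.int16)
```

To add headroom dynamically, each track could be multiplied by a gain below 1.0 before it is added to the running total.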
How I'd do it:
Much like the mix_buffers function that you linked to, but pass in 2 parameters for mixbufferNumSamples. Iterate over the whole of the longer of the two buffers. When the index has gone beyond the end of the shorter buffer, simply set the sample from that buffer to 0 for the rest of the function.
If you must avoid clipping and do it in real-time and you know nothing else about the two sounds, you must provide enough headroom. The simplest method is by halving each of the samples before mixing:
mixed = s1/2 + s2/2;
This ensures that the resultant mixed sample won't overflow an int16_t. It will have the side effect of making everything quieter though.
If you can run it offline, you can calculate a scale factor to apply to both waveforms which will keep the peaks when summed below the maximum allowed value.
Or you could mix them all at full volume to an int32_t buffer, keeping track of the largest (magnitude) mixed sample and then go back through the buffer multiplying each sample by a scale factor which will make that extreme sample just reach the +32767/-32768 limits.
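The offline variant described last can be sketched like this, again in Python with NumPy and with an invented function name. One deliberate difference, stated as an assumption: the sketch only scales down when the mix would otherwise exceed the limit, rather than always stretching the loudest sample to full scale.

```python
import numpy as np

def mix_normalized(tracks, limit=32767):
    """Mix int16 tracks of unequal length at full volume, then rescale to fit."""
    longest = max(len(t) for t in tracks)
    total = np.zeros(longest, dtype=np.int32)
    for t in tracks:
        total[:len(t)] += np.asarray(t, dtype=np.int32)  # shorter tracks end in silence
    peak = int(np.max(np.abs(total))) if longest else 0
    if peak > limit:
        # One scale factor for the whole buffer keeps the relative levels intact.
        total = np.round(total * (limit / peak)).astype(np.int32)
    return total.astype(np.int16)
```

Because the scale factor depends on the loudest sample of the entire mix, this only works when the whole mix is available before playback starts.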