I am writing a chord-recognizer for a school project. I have to extract features from an mp3 file and use SVM with chord labels.
How can I extract frequencies from an audio file.
Is there any scipy package which could get me beat synchronous chroma.
The decoding becomes much easier even using a home-grown tool if to get some raw stream (including WAV file which is simply a raw stream and an envelope around it). Under a usual Unix-like, you can do it e.g. with mpg123 -s, mplayer -ao pcm:fast:file=$outfile and so on. But I doubt you can find a library which eventually supports all compressed audio formats.
(Also, SoX is good to convert between all uncompressed formats.)
You can read wave files with python's wave package. Probably the easiest way to get out frequencies is by taking the FFT (numpy.fft) and finding peaks in the output. You'll want to time-box your FFT calls to something that makes sense (windows where the pitches are consistent), or else you'll be looking at a bunch of frequency patterns on top of each other.
Have fun!
You may consider computing a chromagram, which is just like a spectrogram but with the musical notes in the Y axis instead of frequencies. The Librosa python library has a built-in function to compute it. https://librosa.github.io/librosa/generated/librosa.feature.chroma_stft.html
What you're looking for my friend, is Librosa. It's perfect for Audio feature extraction and manipulation. It has a separate submodule for features. You can extract features at the lowest levels and their documentation has some very easy to understand tutorials.
Here's the link to their website. Along with a sample code
https://librosa.github.io/librosa/index.html
import librosa
audio_file = 'your_audio_file.wav'
signal , sampling_rate = librosa.load(audio_file, sr=16000)
print(type(signal), type(sampling_rate)
len(signal), sampling_rate
Related
Using Matlab, I've made some random noise, filtered it and then successfully saved it as a gnuradio readable file for a file source. Once used in gnuradio, I set the file source to repeat and then viewed it using QT Gui Frequency Sink. I can see the filtered noise fine, but every now and then (every 10 seconds or so), the spectrum will drop in power and jump around for around a tenth of a second, then return back to normal power. My sample rate for the matlab filter is 320k and same with my gnuradio sample rate if that matters.
I think it may have to do with the fact that the noise generated on matlab is going to be a sequence that is repeated on gnuradio. I think the discontinuity happens right when the sequence repeats. Any idea how I can stop this discontinuity so I can transmit without having to worry about it? If I'm missing any info, please let me know and I'll edit the question. Thanks in advance.
NOTE: I needed to create a matlab binary file to be able to read it on GNU Radio. GNU Radio reads the binary file from my desktop, then uses the information as the file source.
I've been using this library (http://kenschutte.com/midi) to work with midi files and the functions on here have been very helpful. However, the midi2audio() method only produces garbled .wav files no matter what midi I put in (although the notes are recognizable and the correct midi is being played). Has anyone else used this function library and run into this same problem and if so, how could I fix this? Or is there another function I can use online somewhere that does the same thing?
Below is the code used to generate the .wav file (copied and pasted from the link above)
[y,Fs] = midi2audio(midi);
% save to file:
% (normalize so as not clipped in writing to wav)
y = .95.*y./max(abs(y));
wavwrite(y, Fs, 'out.wav');
It appears that midi2audio only include very rudimentary sound synthesis, with frequency modulated synthesis as default. If you change to simple sine wave synthesis maybe it will sound better?
[y,Fs] = midi2audio(midi, 'sine')
If that still doesn't cut it you'd probably want to use more sophisticated software instruments.
The simplest cross-platform method for this is probably FluidSynth (also available through various repositories like MacPorts, Homebrew, apt-get, GitHub…)
FluidSynth uses sample based sound synthesis to translate the MIDI instructions into audio, and a sample bank in the SoundFont2 format is required for it to work. One such can be found here.
Having sorted that out, all you have to do to make a WAVE file out of your MIDI file is to type this into your terminal/console:
fluidsynth -F out.wav path-to-fm2-file in.mid
I am trying to train a neural network using audio files that are originally in .SPH format. I need to get integers that represent the amplitude of the sound waves for neural net, so I used sox to convert the files to .wav format by calling sox infile.SPH outfile.wav remix 1-2 (remix for converting 2 channels into 1), and then tried to use
[y, Fs, nbits, opts] = wavread('outfile.wav') in matlab to get the integer representation.
However, matlab threw Data compression format (CCITT mu-law) is not supported.
So I used sox infile.SPH -b 16 -e signed-integer -c 1 outfile.wav
which I think puts the wave file in a linear format instead of mu-law. But now matlab threw another error: Invalid Wave File. Reason: Cannot open file.
My audio files are in 8000 Hz u-law single or dual channels, and all in 8-bit, I think (8-bit for single for sure).
Is there a way to get the integer representation out of the audio files using matlab or any other programs? Either u-law or linear is fine, unless one would be better for neural net training. Preferably 8 bit, since the source files are in 8-bit.
I don't really understand .SPH. For the uncompressed ones (and ignore headers), are the files storing amplitudes (guess it has to somehow)? Can I extract numbers out of those files directly without bothering with waves? Are the signals stored in a sequential fashion such that it would make sense to split the audio files?
I am new to audio processing in general, so any pointers would be appreciated!
You need to clearly identify the main task: feeding the neural net with vectors or matrix. So the first step is to work on the audio file (without matlab!) in order to have wav files. The second step is the neural net setting/training with matlab.
I would try to decompress 'sph' files, then convert them into 'wav' (for example see the instructions here and here).
Finally, using sox in a command/terminal window is better than using it in the matlab console.
I'm trying to write a program in iPhone than can take two audio files (e.g. WAV) as inputs, compare them, and spit out a number that tells you how similar the audio files are.
If someone has done something like this, know how to go about doing it, or just have some ideas, please let me know. Anything will be greatly appreciated.
Specific questions: What language is suitable? How hard is it to do (how many
hours, roughly)? Where can I find a good source of audio library/tools?
Thanks!
I'd say it's pretty hard, not so much the implementation, but coming up with a reasonable definition of 'similar'.
That said, you're probably looking at techniques like autocorrelation and FFT, both of which are CPU-intensive tasks, so I'd say a fully-compiled language (C, C++, don't know about Objective-C) would be most suitable at least for the actual calculations. Also, you're facing a somewhat underpowered platform for such tasks (if only because uncompressed audio files are pretty large), so you're in for quite some optimization.
This book: http://www.dspguide.com/ is quite concise reading for all things DSP-related.
Sounds similar to what 'Shazam' does - awesome iPhone app by the way, check it out if you haven't already (it's free too).
A while ago there was an article on how Shazam works, read it here. It takes an acoustic fingerprint and compares it to other songs' fingerprints, returning the closest match.
I would say there is a lot of math, probably some matrices and maybe Fourier transforms involved in fingerprinting and then trying to compare the audio.
-
Probably would take a good while to program. If your math skills are up to it though, sounds like a good challenge :-)
-
EDIT: turns out there was some source code on the site I linked. It's in Java but would be well worth a look through before you start writing your own. Source code here
I am working on something similar in Java on a speech recognition app.
I would recommend using MFCC (requires calculating FFT) for feature extraction and Neural Networks or some other sort of machine learning technique for training and recognition. You train the NN with the features extracted from the reference wav file, more precisely from consecutive equal lenght slices/windows of that audio file. Then you use the NN to detect if another file, also split into slices, has the same features.
This is the basic idea upon which you can elaborate to further your own specifications, or exactly what you want your app to do.
In terms of libraries in Objective C I think you can find a few for the signal processing part (FFT and such) as for the machine learning part I have no idea about what you could find.
As for programming time it's hard to estimate because it depends on a lot of details. I would say somewhere about a week, but that's just a fair estimation.
ps: MFCC stands for Mel-Frequency Coeficients: http://en.wikipedia.org/wiki/Mel-frequency_cepstrum
I need to know if it is possible to create a 30 second sample MP3 from a WAV file. The generated MP3 file must feature a fade at the start and end.
Currently using ffmpeg, but can not find any documentation that would support being able to do such a thing.
Could someone please provide me the name of software (CLI, *nix only) that could achieve this?
This will
trim out from Position 45 sec. the next 30 seconds (0:45.0 30) and
fade the first 5 seconds (0:5) and the last 5 seconds (0 0:5) and
convert from wav to mp3
sox infile.wav outfile.mp3 trim 0:45.0 30 fade h 0:5 0 0:5
Check out SoX - Sound eXchange
I have not used it myself but one of my friends speaks highly of it.
From web page (highlighted my me):
SoX is a cross-platform (Windows,
Linux, MacOS X, etc.) command line
utility that can convert various
formats of computer audio files in to
other formats. It can also apply
various effects to these sound files,
and, as an added bonus, SoX can play
and record audio files on most
platforms.
The best way to do this is to apply the 30-second truncation, fade in and fade out to the WAV audio data before converting it to an MP3. If your conversion library has a method that takes an array of samples, this is very easy to do. If the method only accepts a WAV file (either in-memory or on disk), then this is slightly less easy as you have to learn the WAV file format (which is easy to write but somewhat more difficult to read). Either way, applying gain and/or attenuation to time-domain sample data (as in a WAV file) is much easier than trying to apply these effects to frequency-domain data (as in an MP3 file).
Of course, if your conversion library already does all this, it's best to just use that and not worry about it yourself.