Separation of singing voice from music [closed] - matlab

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I want to know how to perform "spectral change detection" for the classification of vocal & non-vocal segments of a song. We need to find the spectral changes from a spectrogram. Any elaborate information about this, particularly involving MATLAB?

Separating out distinct signals from audio is a very active area of research, and it is a very hard problem. This is often called Blind Signal Separation in the literature. (There is some MATLAB demo code in the previous link.
Of course, if you know that there is vocal in the music, you can use one of the many vocal separation algorithms.

As others have noted, solving this problem using only raw spectrum analysis is a dauntingly hard problem, and you're unlikely to find a good solution to it. At best, you might be able to extract some of the vocals and a few extra crossover frequencies from the mix.
However, if you can be more specific about the nature of the audio material you are working with here, you might be able to get a little bit further.
In the worst case, your material would be normal mp3's of regular songs -- ie, a full band + vocalist. I have a feeling that this is the case you are probably looking at given the nature of your question.
In the best case, you have access to the multitrack studio recordings and have at least a full mixdown and an instrumental track, in which case you could extract the vocal frequencies from the mix. You would do this by generating an impulse response from one of the tracks and applying it to the other.
In the middle case, you are dealing with simple music which you could apply some sort of algorithm tuned to the parameters of the music to. For instance, if you are dealing with electronic music, you can use to your advantage the stereo width of the track to eliminate all mono elements (ie, basslines + kicks) to extract the vocals + other panned instruments, and then apply some type of filtering and spectrum analysis from there.
In short, if you are planning on making an all-purpose algorithm to generate clean acapella cuts from arbitrary source material, you're probably biting off more than you can chew here. If you can specifically limit your source material, then you have a number of algorithms at your disposal depending on the nature of those sources.

This is hard. If you can do this reliably you will be an accomplished computer scientist. The most promising method I read about used the lyrics to generate a voice only track for comparison. Again, if you can do this and write a paper about it you will be famous (amongst computer scientists). Plus you could make a lot of money by automatically generating timings for karaoke.

If you just want to decide wether a block of music is clean a-capella or with instrumental background, you could probably do that by comparing the bandwidth of the signal to a normal human singer bandwidth. Also, you could check for the base frequency, which can only be in a pretty limited frequency range for human voices.
Still, it probably won't be easy. However, hearing aids do this all the time, so it is clearly doable. (Though they typically search for speech, not singing)

first sync the instrumental with the original, make sure they are the same length and bitrate and start and end at the exact time and convert them to .wav
then do something like
I = wavread(instrumental.wav);
N = wavread(normal.wav);
i = inv(I);
A = (N - i); // it could be A = (N * i) or A = (N + i) you'll have to play around
wavwrite(A, acapella.wav)
that should do it.. a little linear algebra goes a long way.

Related

How can I Compare 2 Audio Files Programmatically?

I want to compare 2 audio files programmatically.
For example: I have a sound file in my iPhone app, and then I record another one. I want to check if the existing sound matches the recorded sound or not ( - similar to voice recognition).
How can I accomplish this?
Have a server doing audio fingerprinting computation that is not suitable for mobile device anyway. And then your mobile app uploads your files to the server and gets the analysis result for display. So I don't think programming language implementing it matters much. Following are a few AF implementations.
Java: http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
VC++: http://code.google.com/p/musicip-libofa/
C#: https://web.archive.org/web/20190128062416/https://www.codeproject.com/Articles/206507/Duplicates-detector-via-audio-fingerprinting
I know the question has been asked a long time ago, but a clear answer could help someone else.
The libraries from Echoprint ( website: echoprint.me/start ) will help you solve the following problems :
De-duplicate a big collection
Identify (Track, Artist ...) a song on a hard drive or on a server
Run an Echoprint server with your data
Identify a song on an iOS device
PS: For more music-oriented features, you can check the list of APIs here.
If you want to implement Fingerprinting by yourself, you should read the docs listed as references here, and probably have a look at musicip-libofa on Google Code
Hope this will help ;)
Apply bandpass filter to reduce noise
Normalize for amplitude
Calculate the cross-correlation
It can be fairly Mhz intensive.
The DSP details are in the well known text:
Digital Signal Processing by
Alan V. Oppenheim and Ronald W. Schafer
I think as well you may try to select a few second sample from both audio track, mnormalise them in amplitude and reduce noise with a band pass filter and after try to use a correlator.
for instance you may take a 5 second sample of one of the thwo and made it slide over the second one computing a cross corelation for any time you shift. (be carefull that if you take a too small pachet you may have high correlation when not expeced and you will soffer the side effect due to the croping of the signal and the crosscorrelation).
After yo can collect an array with al the results of the cross correlation and get the index of the maximun.
You should then set experimentally up threshould o decide when yo assume the pachet to b the same. this will change depending on the quality of the audio track you are comparing.
I implemented a correator to receive and distinguish preamble in wireless communication. My script is actually done in matlab. if you are interested i can try to find the common part and send it to you.
It would be a too long code to be pasted hene in the forum. if you want just let me know and i will send it to ya asap.
cheers

Which is the best clustering algorithm to find outliers? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Basically I have some hourly and daily data like
Day 1
Hours,Measure
(1,21)
(2,22)
(3,27)
(4,24)
Day 2
hours,measure
(1,23)
(2,26)
(3,29)
(4,20)
Now I want to find outliers in the data by considering hourly variations and as well as the daily variations using bivariate analysis...which includes hourly and measure...
So which is the best clustering algorithm is more suited to find outlier considering this scenario?
.
one 'good' advice (:P) I can give you is that (based on my experience) it is NOT a good idea to treat time similar to spatial features. So beware of solutions that do this. You probably can start with searching the literature in outlier detection for time-series data.
You really should use a different repesentation for your data.
Why don't you use an actual outlier detection method, if you want to detect outliers?
Other than that, just read through some literature. k-means for example is known to have problems with outliers. DBSCAN on the other hand is designed to be used on data with "Noise" (the N in DBSCAN), which essentially are outliers.
Still, the way you are representing your data will make none of these work very well.
You should use time series based outlier detection method because of the nature of your data (it has its own seasonality, trend, autocorrelation etc.). Time series based outliers are of different kinds (AO, IO etc.) and it's kind of complicated but there are applications which make it easy to implement.
Download the latest build of R from http://cran.r-project.org/. Install the packages "forecast" & "TSA".
Use the auto.arima function of forecast package to derive the best model fit for your data amd pass on those variables along with your data to detectAO & detectIO of TSA functions. These functions will pop up any outlier which is present in the data with their time indexes.
R is also easy to integrate with other applications or just simply run a batch job ....Hope that helps...

Audio File Matching Program

I'm trying to write a program in iPhone than can take two audio files (e.g. WAV) as inputs, compare them, and spit out a number that tells you how similar the audio files are.
If someone has done something like this, know how to go about doing it, or just have some ideas, please let me know. Anything will be greatly appreciated.
Specific questions: What language is suitable? How hard is it to do (how many
hours, roughly)? Where can I find a good source of audio library/tools?
Thanks!
I'd say it's pretty hard, not so much the implementation, but coming up with a reasonable definition of 'similar'.
That said, you're probably looking at techniques like autocorrelation and FFT, both of which are CPU-intensive tasks, so I'd say a fully-compiled language (C, C++, don't know about Objective-C) would be most suitable at least for the actual calculations. Also, you're facing a somewhat underpowered platform for such tasks (if only because uncompressed audio files are pretty large), so you're in for quite some optimization.
This book: http://www.dspguide.com/ is quite concise reading for all things DSP-related.
Sounds similar to what 'Shazam' does - awesome iPhone app by the way, check it out if you haven't already (it's free too).
A while ago there was an article on how Shazam works, read it here. It takes an acoustic fingerprint and compares it to other songs' fingerprints, returning the closest match.
I would say there is a lot of math, probably some matrices and maybe Fourier transforms involved in fingerprinting and then trying to compare the audio.
-
Probably would take a good while to program. If your math skills are up to it though, sounds like a good challenge :-)
-
EDIT: turns out there was some source code on the site I linked. It's in Java but would be well worth a look through before you start writing your own. Source code here
I am working on something similar in Java on a speech recognition app.
I would recommend using MFCC (requires calculating FFT) for feature extraction and Neural Networks or some other sort of machine learning technique for training and recognition. You train the NN with the features extracted from the reference wav file, more precisely from consecutive equal lenght slices/windows of that audio file. Then you use the NN to detect if another file, also split into slices, has the same features.
This is the basic idea upon which you can elaborate to further your own specifications, or exactly what you want your app to do.
In terms of libraries in Objective C I think you can find a few for the signal processing part (FFT and such) as for the machine learning part I have no idea about what you could find.
As for programming time it's hard to estimate because it depends on a lot of details. I would say somewhere about a week, but that's just a fair estimation.
ps: MFCC stands for Mel-Frequency Coeficients: http://en.wikipedia.org/wiki/Mel-frequency_cepstrum

Can an artificial neural network predict the outcome of sports games? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I was trying to find something original and fun to do with artificial neural networks (ANNs) as a personal/learning project and I though it would be cool if I could predict the results of sports games (especially NHL games).
I'm pretty sure it would be easy to evolve an ANN that can predict which team is most likely to win (usually the team with the better record). However, what I would like to do is create an ANN that would tell how likely the outcome is, similar to bookmaker odds.
Is this something an ANN can do? In the affirmative, what kind of success can I expect? I know I can't beat the bookmaker (at least not with a software solution). I want do this as a recreational project/challenge to myself. I don't expect to bet money on sports games with this project.
Way back in the days of the IBM XT I played with a shareware ANN program to try and improve my chances on the British football (soccer) pools. This is a form of betting where you try and predict which football matches will result in draws. I assigned each team a number then looked back thorough past results and from them generated a single digit for the result. From memory it was 0 from a home win , 1 for an away win and 2 for a draw. Each result went on a single line in a training file. I would then run the training file through the program and generate the ANN settings. I would then look up the following Saturdays matches and feed them into the ANN then look for matches predicted as draws.
As the weeks went on my predictions of draws did definetly become more and more accurate. However ...
1) The XT was so slow that by Christmas it was taking 24 hours to generate the ANN settings from the training data. I really had better things to do with my precious (and expensive) PC.
2) Although it was better at predicting draws it wasn't predicting enough to actually win any money. Looking back I suppose the program had just worked out that Manchester United would always beat Sheffield United. This was more football knowledge than I had but not enough to win any money.
3) Entering the results into the training data and then generating the forthcoming matches data was taking me ages and to be honest sport bores me rigid.
So I gave up and didn't become a millionaire.
These days however PC's are much faster and much of the training data could be scraped from the web. But I still doubt it is a route to a fortune but its certainly an interesting project.
Ian
A reply above stated:
I know that if the bookmakers odds could be beaten by an ANN,
bookmakers would already be using one to fix their odds.
Bookmakers don't set the line based on their analysis of the teams - they set it based on their analysis of the betting public's opinion of the teams. An ideal line for the bookie is where he has exactly the same amount bet on each side of the line - then he is guaranteed a profit = the 'juice' on the losers' bets. They move the line as game approaches to try to keep that 50/50 split. Bookie may think Home team -5 is accurate line based on game analysis, but if he expects that will draw 2x $$ on the Home team he will not set the line at -5 - he will set at -7 or -8 - to where he expects to draw equal $$ for both -5 and +5 bets.
ANNs are really good at pattern matching and prediction, so yes, odds are you could build an ANN that does what you want.
You'll need more than just team win/loss ratio to make it really effective however. Feed it stats for the players, too. For real effectiveness, try to include game-flow information... like which players are on the line for each play (for football, for example).
Ultimately, the biggest problem you'll run into (aside from the whole "writing the ANN" issue) is getting the data you need to feed it.
I've done some stock market predictions with an AI and my conclusion is that it is not very hard to make an AI that gets good results with the historical data.
Making winning transactions in the future is a different ballgame.
I have just worked on this very problem (predicting English Premier League games) for the past 10 days, and ended up with very similar results using 3 different methods: SVM, Logistic Regression, and NN.
LR and NN will give probabilities. SVM outputs 0/1 (but it can be tweaked for probas too (I haven't tried yet).
I needed a "massive" (by my standards at least) feature set though (almost 300) and a good chunk of data (13 years worth).
Re. data, I got it from the web, simply.
Conclusion: I can just about match the bookies in terms of accuracy (predicting victories in my case). If I add the pre-match odds to the feature set, I get the exact same accuracy as the bookies (as expected), but no better (surely meaning my feature set is summarized in the bookies odds, and they have a little extra knowledge on top).
I'm sure there is a way to get better accuracy, either by improving the algos, or more likely by having extremely granular data (as in which players play which games, for how many minutes, and a lot of player-level historical stats, so as to build bottom-up models of team performance).
But bottom line is I can testify NNs work quite well for that purpose. SVM is slightly better though, in my limited experience.
I think it's indeed all about data, but there's no end to what you could feed it with in order to be more accurate : winning/loosing streaks, players biorhythms, player's girlfriends mood before the game, minor/major injuries they suffered in the recent past, extra-sportive events that are bothering the players, etc, etc, etc.
But I don't think you can accurately predict which team is more likely to win, it would be just a more-or-less educated guess.
In my opinion and experience, because of the excessively large number of factors in play, designing and training the ANN will be unreasonably complex and time-consuming. ANNs are good at pattern matching, and game prediction takes much deductive reasoning rather than mere pattern matching.
But if you want to enjoy learning neural networks, it will be a good adventure. If you are successful, you might want to host your code somewhere for others to see and learn!
For game prediction, it would be much easier and faster with decision trees or a rules engine and so on. This will be no easy task either, but it will be another interesting activity.
My belief is that the unpredictability of an event is due to lack of information and understanding...If you have all the knowledge, then yes it could be done. Or, the more knowledge you have, the better it can be done.
So in theory, the answer is yes.
However, in practice, you can get a PhD and have a whole career working on this question and you still may not succeed.

Creating music visualizer [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
So how does someone create a music visualizer? I've looked on Google but I haven't really found anything that talks about the actual programming; mostly just links to plug-ins or visualizing applications.
I use iTunes but I realize that I need Xcode to program for that (I'm currently deployed in Iraq and can't download that large of a file). So right now I'm just interested in learning "the theory" behind it, like processing the frequencies and whatever else is required.
As a visualizer plays a song file, it reads the audio data in very short time slices (usually less than 20 milliseconds). The visualizer does a Fourier transform on each slice, extracting the frequency components, and updates the visual display using the frequency information.
How the visual display is updated in response to the frequency info is up to the programmer. Generally, the graphics methods have to be extremely fast and lightweight in order to update the visuals in time with the music (and not bog down the PC). In the early days (and still), visualizers often modified the color palette in Windows directly to achieve some pretty cool effects.
One characteristic of frequency-component-based visualizers is that they don't often seem to respond to the "beats" of music (like percussion hits, for example) very well. More interesting and responsive visualizers can be written that combine the frequency-domain information with an awareness of "spikes" in the audio that often correspond to percussion hits.
For creating BeatHarness ( http://www.beatharness.com ) I've 'simply' used an FFT to get the audiospectrum, then use some filtering and edge / onset-detectors.
About the Fast Fourier Transform :
http://en.wikipedia.org/wiki/Fast_Fourier_transform
If you're accustomed to math you might want to read Paul Bourke's page :
http://local.wasp.uwa.edu.au/~pbourke/miscellaneous/dft/
(Paul Bourke is a name you want to google anyway, he has a lot of information about topics you either want to know right now or probably in the next 2 years ;))
If you want to read about beat/tempo-detection google for Masataka Goto, he's written some interesting papers about it.
Edit:
His homepage : http://staff.aist.go.jp/m.goto/
Interesting read : http://staff.aist.go.jp/m.goto/PROJ/bts.html
Once you have some values for for example bass, midtones, treble and volume(left and right),
it's all up to your imagination what to do with them.
Display a picture, multiply the size by the bass for example - you'll get a picture that'll zoom in on the beat, etc.
Typically, you take a certain amount of the audio data, run a frequency analysis over it, and use that data to modify some graphic that's being displayed over and over. The obvious way to do the frequency analysis is with an FFT, but simple tone detection can work just as well, with a lower lower computational overhead.
So, for example, you write a routine that continually draws a series of shapes arranged in a circle. You then use the dominant frequencies to determine the color of the circles, and use the volume to set the size.
There are a variety of ways of processing the audio data, the simplest of which is just to display it as a rapidly changing waveform, and then apply some graphical effect to that. Similarly, things like the volume can be calculated (and passed as a parameter to some graphics routine) without doing a Fast Fourier Transform to get frequencies: just calculate the average amplitude of the signal.
Converting the data to the frequency domain using an FFT or otherwise allows more sophisticated effects, including things like spectrograms. It's deceptively tricky though to detect even quite 'obvious' things like the timing of drum beats or the pitch of notes directly from the FFT output
Reliable beat-detection and tone-detection are hard problems, especially in real time. I'm no expert, but this page runs through some simple example algorithms and their results.
Devise an algorithm to draw something interesting on the screen given a set of variables
Devise a way to convert an audio stream into a set of variables analysing things such as beats/minute frequency different frequency ranges, tone etc.
Plug the variables into your algorithm and watch it draw.
A simple visualization would be one that changed the colour of the screen every time the music went over a certain freq threshhold. or to just write the bpm onto the screen. or just displaying an ociliscope.
check out this wikipedia article
Like suggested by #Pragmaticyankee processing is indeed an interesting way to visualize your music. You could load your music in Ableton Live, and use an EQ to filter out the high, middle and low frequencies from your music. You could then use a VST follwoing plugin to convert audio enveloppes into MIDI CC messages, such as Gatefish by Mokafix Audio (works on windows) or PizMidi’s midiAudioToCC plugin (works on mac). You can then send these MIDI CC messages to a light-emitting hardware tool that supports MIDI, for instance percussa audiocubes. You could use a cube for every frequency you want to display, and assign a color to the cube. Have a look at this post:
http://www.percussa.com/2012/08/18/how-do-i-generate-rgb-light-effects-using-audio-signals-featured-question/
We have lately added DirectSound-based audio data input routines in LightningChart data visualization library. LightningChart SDK is set of components for Visual Studio .NET (WPF and WinForms), you may find it useful.
With AudioInput component, you can get real-time waveform data samples from sound device. You can play the sound from any source, like Spotify, WinAmp, CD/DVD player, or use mic-in connector.
With SpectrumCalculator component, you can get power spectrum (FFT conversion) that is handy in many visualizations.
With LightningChartUltimate component you can visualize data in many different forms, like waveform graphs, bar graphs, heatmaps, spectrograms, 3D spectrograms, 3D lines etc. and they can be combined. All rendering takes place through Direct3D acceleration.
Our own examples in the SDK have a scientific approach, not really having much entertainment aspect, but it definitely can be used for awesome entertainment visualizations too.
We have also configurable SignalGenerator (sweeps, multi-channel configurations, sines, squares, triangles, and noise waveforms, WAV real-time streaming, and DirectX audio output components for sending wave data out from speakers or line-output.
[I'm CTO of LightningChart components, doing this stuff just because I like it :-) ]