Programmatically 'Listening' to Sound (Signal Processing?) - visualization

I'm familiar with Computer Vision (Well, know OF it), of which one application can be image recognition, such as Optical Character Recognition, I believe. However, something that I am more interested in is 'computer listening', which I have just learned is considered Digital Signal Processing.
The thing that interests me the most about signal processing is the potential application in music. I remember a while ago I saw a preview of an application (Sorry, forgot the name) which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played. Using the program, the user was able to move these around and even edit them. Now, obviously this is a lot more complicated, but does it involve the same thing? Signal Processing? I am also interested in possible applications in music visualizers and intelligent lighting systems.
My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI, which contains separate tracks (maybe I misunderstood). Would an uncompressed format such as PCM do better than MP3? I don't know anything about sound processing; that's just what I'm inferring from what I've read so far.
I have already seen this question, which has great answers and links that cover a lot of my questions. However, most of the links I've found are theoretical, which I'm sure is all interesting and definitely worth a read given my interest in the subject, but I wanted to know if there are any existing libraries which can facilitate this, or articles on this subject geared towards Computer Science/Programming, with perhaps example code. Even open source sound/music visualizers or any other open source sound processing code would be great.
Sorry if I didn't make any sense. Like I said, I don't know what I'm talking about.

The thing that interests me the most about signal processing is the potential application in music. I remember a while ago I saw a preview of an application (Sorry, forgot the name)
Maybe Cubase?
which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played
Deeply simplified: when you play a note you produce a periodic wave with a given frequency. There's a mathematical trick (the discrete Fourier transform, or DFT) that converts the wave into a spectrum, which, instead of presenting intensity against time, presents it against frequency. For example, a perfect A note from a tuning fork would produce an oscillating wave at 440 Hz. In the time domain this appears as a sinusoidal wave; in the frequency domain, it appears as a single, narrow spike centered at 440 Hz.
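As a rough, minimal sketch of that idea (Python with NumPy, which is just one possible choice; every name below is mine, not from the answer above): a synthetic 440 Hz sine run through an FFT shows a single dominant peak at 440 Hz.

import numpy as np

fs = 44100                                  # sample rate in Hz
t = np.arange(fs) / fs                      # one second worth of sample times
wave = np.sin(2 * np.pi * 440 * t)          # a "perfect" A: a 440 Hz sine

spectrum = np.abs(np.fft.rfft(wave))        # magnitude spectrum
freqs = np.fft.rfftfreq(len(wave), 1 / fs)  # frequency of each FFT bin

print(freqs[np.argmax(spectrum)])           # prints 440.0: the narrow spike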
Now, when you play a guitar you don't produce perfect sinusoidal waves. Hitting an A will produce the fundamental frequency, 440 Hz, but also a lot of additional frequencies (e.g. 880 Hz, one octave higher, and many others), due to the physics of the vibrating string, the material and shape of the guitar, etc. These additional frequencies are called harmonics, and they mix with the fundamental to produce "the sound of the guitar" (what in musical jargon is called timbre). A different instrument (say, a piano) will have a different mix of harmonics with the fundamental, producing a different timbre.
What DSP programs do is perform a DFT on the incoming signal. With additional tricks, they find the fundamental and the harmonics, and from what they find they infer the note you played. This must happen fast, because you may want to detect the note while playing live and trigger effects in response. For example, you could hit an A on the guitar, the DSP recognizes it as an A and replaces it with the A from a piano, so from the speakers you get the sound of a piano.
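Continuing the sketch above (again only an illustration, not how a production pitch detector works), mapping a detected peak frequency to the nearest note name is a small amount of arithmetic:

import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq):
    # MIDI note 69 is A4 = 440 Hz; there are 12 semitones per octave.
    midi = int(round(69 + 12 * np.log2(freq / 440.0)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(freq_to_note(440.0))    # "A4"
print(freq_to_note(261.63))   # "C4"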
Using the program, the user was able to move these around and even edit them. Now, obviously this is a lot more complicated, but does it involve the same thing? Signal Processing? I am also interested in possible applications in music visualizers and intelligent lighting systems.
Yes. Once you are in the frequency domain, things get much easier. For example, you could drive one light according to the energy in the vocal frequencies, and another light with the bass drum.
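As a rough sketch of that lighting idea (the band boundaries below are arbitrary, purely for illustration):

import numpy as np

def band_energy(samples, fs, lo, hi):
    # Energy of the signal between lo and hi Hz: a crude band meter.
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), 1 / fs)
    return spectrum[(freqs >= lo) & (freqs < hi)].sum()

# For each short chunk of incoming audio, values like these could drive two lights:
# bass_light  = band_energy(chunk, fs, 40, 200)     # kick-drum region
# voice_light = band_energy(chunk, fs, 300, 3000)   # rough vocal region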
My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI which contains separate tracks (Maybe I misunderstood).
They are two different things. MP3 is a compressed format for a sound wave: basically it takes the waveform that drives the speakers and compresses it. The idea is similar to the above: transform to the frequency domain (MP3 actually uses a modified discrete cosine transform rather than a plain DFT), then remove the parts that are unlikely to be heard (for example, a high pitch that comes right after a high-intensity sound is less likely to be heard, so it gets removed).
MIDI, on the other hand, is a scroll of events (you know, like those player pianos in old westerns, with the rolling paper scroll). The file contains no audio; it contains directions for a MIDI player to perform specific notes at specific times with specific instruments. The quality of the "instrument bank" is (among other things) what distinguishes a bad MIDI player (which sounds like a child's toy) from a good one (which sounds realistic, in particular for pianos and violins; for wind instruments I have yet to hear a realistic one).
Note that going from MIDI to MP3 is easy: you just play it through a MIDI player and record the result. Going the other way around is a different story altogether, and much more complex, and this is where DSP comes into play, as you said.
It's like boiling a fish tank: you get fish soup. But getting from the fish soup back to the fish tank is much harder.
Would an uncompressed format such as PCM do better than MP3?
PCM is a technique to convert an analog signal into a digital one, so strictly speaking there is no "PCM format" (a raw/headerless file is the closest thing, containing nothing but the bare samples). If you ask whether an uncompressed WAV (which contains PCM data) is better than MP3, then yes, but the question is often how much that difference really matters to the human ear, and how much post-processing you have to perform on that data.
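For example, here is a minimal sketch of getting at the PCM samples inside a WAV using Python's standard wave module (the file name is hypothetical, and a 16-bit mono file is assumed):

import wave
import numpy as np

with wave.open("recording.wav", "rb") as wav:        # hypothetical file
    fs = wav.getframerate()
    raw = wav.readframes(wav.getnframes())

# 16-bit PCM: every two bytes form one signed sample.
samples = np.frombuffer(raw, dtype=np.int16).astype(np.float64) / 32768.0
print(fs, len(samples))   # sample rate and sample count, ready for an FFT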
know if there are any existing libraries which can facilitate this, or articles pertaining to this subject that are geared towards Computer Science/Programming, with perhaps example code. Even open source sound/music visualizers or any other open source sound processing code would be great.
If you like python, take a look at this page
Sorry if I didn't make any sense. Like I said, I don't know what I'm talking about.
Neither do I, but I toyed a bit with it.

My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI which contains separate tracks (Maybe I misunderstood).
MIDI essentially stores instrument information and musical notes, plus other parameters (volume, pitch bend, vibrato, attack rate, etc.). It's not really digital signal processing.
Would an uncompressed format such as PCM do better than MP3?
Maybe somewhat; it depends on the application. MP3 reduces the precision of frequencies that humans are not sensitive to. If you want to do visualisations then MP3 is probably fine.
But if you want to, say, determine what sort of instrument is playing in a recording, then there could be useful information hidden in the frequencies that humans are not sensitive to.
I think The Scientist and Engineer's Guide to Digital Signal Processing is a great reference for programmers. Chapter 8 explains the discrete Fourier transform (used in MP3 processing and a lot of other places to separate out the component frequencies of a wave).
I used it to help make a graphical program that let you draw a wave with the mouse, then applied the DFT, and let you select how many frequencies to include. It was a great exercise.
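In the same spirit, here is a small sketch of that exercise's core step (NumPy, with the wave assumed to already be an array of samples): keep only the first n frequency components and rebuild the wave from them.

import numpy as np

def reconstruct_from_n_freqs(samples, n):
    # Keep only the n lowest frequency components and rebuild the wave.
    spectrum = np.fft.rfft(samples)
    spectrum[n:] = 0                      # discard everything above component n
    return np.fft.irfft(spectrum, len(samples))

# With a small n the result is a smooth approximation of the drawn wave;
# as n grows it converges back to the original samples.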

I remember a while ago I saw a preview of an application (Sorry, forgot the name) which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played.
You might also be thinking of Melodyne: http://www.celemony.com/cms/
Though VariAudio in newer versions of Cubase is pretty similar. :)

I think you need to define exactly what you are looking for and what you are trying to do.
If you want to learn about DSP, MIDI or PCM then there is plenty of information on Wikipedia and references.
There are myriad applications for audio manipulation available. What you've described in your question is what takes place in every digital recording studio (which these days means almost all studios) every single day.
If you are intending to perform some DSP against, say, a guitar sound, then you would ideally have a recording of the guitar itself (rather than a mixed-down track that also contains drums or vocals). It should be quite obvious that you will get better results analysing a discrete signal without additional noise than analysing a signal containing significant levels of 'noise'. So yes, a multitrack recording would be preferable to 'an MP3'.
A typical MP3 contains left and right channels (tracks), so it technically is multitrack. When music is recorded (professionally, at least), different signals are recorded onto different tracks, precisely so that they can be edited and processed discretely at a later time.
What, then, do you want to do with the sounds?
As other answers have pointed out, this does not relate to MIDI at all.

Related

SiteCatalyst streaming video tracking and additional clarifications

We're attempting to track a streaming video with SiteCatalyst. The issue is that this video obviously has no end, so the s.Media module can't know how to set the seconds or milestone segment views. This results in no tracking calls except for the starting one. Could a possible solution be the use of s.Media.monitor custom functions? Here it is explained how to use them together with the basic Media module settings. Maybe a timed deployment of the "sendRequest()" method could help...? I'll take this occasion to ask for a brief how-to example of the Media.monitor methods, because I've only been using the basic settings till now, as below:
s.loadModule("Media");
s.Media.autoTrack = false;
s.Media.trackMilestones = "25,50";
s.Media.segmentByMilestones = true;
...
Thanks a lot
Yeah.. I really, really dislike the Media module. Video tracking is getting more and more popular with clients, so it has become the biggest thorn in my side, because the nature of video over the internet is a big mess with all kinds of internal moving parts that make it extremely difficult to get truly accurate tracking beyond a basic "start" and "stop". (Actually, I take that back.. I think mobile/SDK tracking is quickly becoming the thing I shake my angry fist at the most, but that's a different post!)
I think Adobe has made some heroic efforts to automate video tracking, and it more or less works okay if you just have a regular (not Flash) object or HTML5 tag embedded on the page. But in practice, MOST of the time, sites implement their videos through 3rd-party scripts (e.g. JWPlayer, Vimeo, the YouTube API), and the Media module automation basically goes down the drain on that count.
I understand that it needs to know how long a video is to know when to auto-pop the events, but I swear, 99% of the time in practice, the way the Media module expects things to pop in certain orders just doesn't align with how videos work in the real world. Even if you attempt to do it the "manual" way, more often than not it's still buggy, e.g. autoplay and buffering ALWAYS seem to screw up the open+play sequence that MUST happen in that order.
Basically, the Media module desperately needs to be rewritten to better handle streaming videos, and also just "manually" using it in general. Anyways..
Two things I have done in your situation. Overall, neither of these options is a perfect 1:1 replacement for normal videos with a duration, but then, streaming videos aren't really the same thing, so it doesn't really make sense to treat them the same.
Option #1: Use an estimated duration for your streaming video. So you said it yourself: your streaming videos have no end. Well as I mentioned, you can't calculate percent viewed unless you have a duration, pretty basic math. So, estimate a duration.
I have clients that have streaming webinars or whatever and it's true that there's technically no duration according to the player, but in reality they don't really conduct that webinar 24/7 forever. In reality it's for a set amount of time like 30 minutes or an hour or something. So, just specify the duration as that.
Yes, this will require extra custom work on your end to store/associate an estimated duration. And yes, this does have the potential for being misleading (e.g. if a webinar ends early or runs late). This option is generally good for sites that have set windows for the stream to actually be active.
Option #2: Ditch the notion of % viewed; record time consumed instead. The overall point of the milestones is to know how much of a video was actually watched, yes? Well, who said it has to be measured by % viewed?
How about instead, you just record n seconds consumed every n seconds. You can do this with an incrementor eVar, and/or counter event. (Part of the normal video tracking actually does include a counter event "Video Time", or a.media.timePlayed).
So you'd basically just pop the events/props/eVars yourself and ignore the milestone/segment reports.
Note: This option only really works if you are using the older style video tracking that has events/props/eVars assigned for it. If you are using the newer style video tracking that does not use events/props/eVars.. well, AA does not currently offer an official way to manually pop that stuff directly. It is surely possible to unofficially do so, but I have not yet reverse engineered the latest Media module to figure out how to do that. So, in this case your only option is #1.

Recording multi-channel audio input in real-time

I am trying to perform Time Difference of Arrival in real-time using the PS3 Eye. Since it has a built-in 4 microphone array, I've successfully rearranged the array into a square array and cross-correlated the signals using MATLAB to obtain a relatively accurate TDOA algorithm. However, so far I've been recording the signal, saving the files (4 individual files for each microphone in the array), and then feeding those files into MATLAB to read after-the-fact.
My problem is: MATLAB doesn't recognize the PS3 Eye's microphones separately; it only recognizes it as a whole. So far, Audacity is one of the few programs that actually works well in doing so, but I am inexperienced in using the program and don't know its real-time capabilities. Anyone have suggestions as to how I can perform real-time signal analysis in this manner? If using something else besides the PS3 Eye would work better, then I am open to suggestions. Thanks.
I know very little about MATLAB or PS3 eye, but various hardware microphones allow you to capture a single audio stream containing multiple (typically 2) channels. The audio data will come to you in frames, each frame containing a single sample for each channel.
I'm not really sure what you mean by "recognizes as a whole", but I assume you mean MATLAB is mixing the channels so that the device only produces one usable channel. If you can capture the channels to file, and they all originate from the same device (i.e. hardware clock), you should be fine except that this solution is not "realtime".
There is a similar discussion on Sound Exchange which ends up suggesting the Microcone. There are a variety of other products, from microphone arrays to digital mixers for analog mic sources, also, but your question seems to be mainly about how to get the data with software.
In short, make sure you are seeing a single device with multiple channels. This will ensure each channel uses the same hardware clock and will prevent drift issues.
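To make the "single device, multiple channels" point concrete, here is a minimal capture sketch using the python-sounddevice library (my choice, not something from the thread; it assumes the Eye shows up as one 4-channel input device):

import sounddevice as sd

fs = 48000            # sample rate
seconds = 2

# Record all four channels from one device so they share a hardware clock.
# device=None uses the default input; pass an index from sd.query_devices() instead.
recording = sd.rec(int(seconds * fs), samplerate=fs, channels=4, device=None)
sd.wait()             # block until the recording is finished

print(recording.shape)  # (samples, 4): one column per microphone, ready for cross-correlation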
This is just a wild guess, as I don't know about MATLAB's real-time input options.
Maybe try Reaper ( http://www.reaper.fm/ ).. it has great multitrack capabilities and you can extend it (I think the scripting language is Python). Nice documentation and third-party contributions, OSC and ReWire support. So maybe you could think of routing the audio to Reaper, doing some data normalization there in Python, and then routing the data to MATLAB.
Or you could use Pure Data, which is open source and very open, with lots of patches (basic processing units) that you could probably put together.
HTH
BTW I am in no way affiliated with Reaper or PD.
EDIT: you might also want to consider supercollider (http://supercollider.github.io/) or Chuck (http://chuck.cs.princeton.edu/)
Here's a lead, but I haven't been able to test it, yet.
On Windows, you can record a single 4 track ogg audio file from the Eye with Audacity (using the WASAPI driver selection).
As of 23 Jul 2014, pa-wavplay for 32-bit and 64-bit MEX supports WASAPI. You will have to rebuild the PortAudio library to select the WASAPI interface as described here, and then you can get all four tracks in MATLAB (on Windows).
Sadly, if you're not on Windows, I don't have any suggestions. Adjusting the PortAudio build might help, but I only know that WASAPI works with the Eye.

Audio feedback issue in an iphone application

There is a real-time audio app for the iPhone that adds some effects (reverb, delay, etc.) to the input sound and plays it back.
So I'm having a classic amplified audio loop issue. You probably are familiar with this. It happens often when you put the mic close to the loudspeaker (sound from input gets amplified, goes out, gets back in and so on).
It would be great to hear any ideas how to fix this.
(I have already tried to:
Limit the max sound volume to prevent the feedback from growing.
Use filters to limit some frequencies.
Subtract the previously output signal from the new input signal, which I think is the best way, but it isn't perfect. Even if the timing is good (I think it is), this method spoils the sound too much.)
Thanks.
Your number 3 and number 2 combined are probably the best. Look up adaptive acoustic echo cancellation.
AEC using NLMS is quite easy to implement but takes a bit of CPU. It may work if you use a lower sample rate, depending on how long your echo is in milliseconds.
There is a fast version that uses an FFT for adaption. It doesn't adapt as quickly but will probably be fine on a mobile app where there isn't a long echo tail.
The way AEC works is that it converges on an acoustic model for the echo path between speaker and microphone and then uses that model to subtract the output echo from the microphone input. It knows what is going out, it puts that through the model and obtains a guess as to what the echo will be, then removes that echo from the input. As time goes on, the model gets better and the echo smaller.
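For readers curious what that looks like in code, here is a heavily simplified NLMS sketch (Python/NumPy rather than anything iPhone-specific; all names and parameter values are mine):

import numpy as np

def nlms_echo_cancel(far_end, mic, filter_len=256, mu=0.5, eps=1e-6):
    # far_end: samples sent to the speaker
    # mic:     samples captured by the microphone (speech plus echo)
    w = np.zeros(filter_len)              # adaptive model of the echo path
    buf = np.zeros(filter_len)            # most recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_estimate = np.dot(w, buf)    # pass the output through the model
        e = mic[n] - echo_estimate        # error: mic minus the predicted echo
        out[n] = e
        w += (mu / (np.dot(buf, buf) + eps)) * e * buf   # NLMS update
    return out                            # as the model improves, the echo shrinks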
You might already know this, but just to be on the safe side - make sure you're routing the output to the right speaker. As it says in the docs when you set the "play and record" audio session category, the default output is the top speaker (the one you put your ear to during a call). There's another speaker at the bottom, and since it's a lot nearer to the microphone, it'll produce a lot more feedback. If you set the "play and record" category it would normally take a manual override to route to the wrong (bottom) speaker, but I thought I'd mention it to be sure.
To help other people trying to solve this issue: AEC plus a combination of high-pass, low-pass filters.
http://speex.org: its AEC part does the job. High-pass and low-pass filters are quite easy to implement (see Apple's AccelerometerGraph example for LP and HP filter implementations).
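For what it's worth, here is a sketch of the kind of simple first-order filters being referred to (plain Python; names and coefficients are illustrative):

def low_pass(samples, alpha=0.1):
    out = [samples[0]]
    for x in samples[1:]:
        out.append(out[-1] + alpha * (x - out[-1]))    # y[n] = y[n-1] + a*(x[n] - y[n-1])
    return out

def high_pass(samples, alpha=0.9):
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))   # y[n] = a*(y[n-1] + x[n] - x[n-1])
    return out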

We want a robot (a Roomba in our case) to know its location in a given room

OK,
So we want our robot, a Roomba (the nice vacuum cleaner), to know its location in a given room.
That means we have the map of the room and the robot is put somewhere and needs to know in a short time where it is located.
We saw a lot of algorithms; the most relevant one was MCL (Monte Carlo Localization) for localizing robots in space.
We are afraid that it is too big for us and don't know where to start from.
We would like to write the code in MATLAB.
So if anyone has any idea where we can find code, we would appreciate it a lot.
We are open-minded about the algorithm, so if you have a better one or something else that might work, that would be great. The same goes for the language we write it in.
Thanks.
Liron.
Interesting.
I've read a lot about trying to keep track of where the Roomba is, but it seems like every system that has used only "internal" feedback from the Roomba has ended disastrously, meaning they try to keep track of the wheel positions and so on. The main problem is that you can't account for wheel slip, which changes drastically based on the surface and other factors.
I would recommend using either a stationary sensor that the Roomba can locate itself from, on-board sensors (such as a camera, whiskers, or ultrasonics), or a combination of the two.
Parallax (the BASIC Stamp maker) makes a great ultrasonic sensor package called the PING))) that can sense up to 6 ft. I've used it up to 15 feet, but it works great in close proximity for mapping.
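Since the question asks specifically about MCL, here is a heavily simplified 1D particle-filter step (Python rather than the requested MATLAB, and every name is mine), just to show the predict/weight/resample loop; a real Roomba would track (x, y, heading) and use its actual sensor model:

import numpy as np

def mcl_step(particles, motion, measurement, measure_fn,
             motion_noise=0.05, sensor_noise=0.1):
    # 1. Predict: move every particle, adding noise for wheel slip etc.
    particles = particles + motion + np.random.normal(0, motion_noise, len(particles))
    # 2. Weight: particles whose predicted measurement matches the real one score higher.
    expected = measure_fn(particles)
    weights = np.exp(-0.5 * ((expected - measurement) / sensor_noise) ** 2)
    weights /= weights.sum()
    # 3. Resample: keep likely particles, drop unlikely ones.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]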
hope this helps!

How is time-based programming done?

I'm not really sure what the correct term is, but how are time-based programs like games and simulations made? I've just realized that I've only ever written programs that wait for input, then do something, and am amazed that I have no idea how I would write something like Pong :)
How would something like a flight simulator be coded? It obviously wouldn't run as fast as the computer could run it. I'm guessing everything is executed on some kind of cycle. But how do you handle it when a computation takes longer than the cycle?
Also, what is the correct term for this? Searching "time-based programming" doesn't really give me helpful results.
Games are split into simulation (decide what appears, disappears or moves) and rendering (show it on the screen, play sounds). Simulation is designed to be time-dependent: you can tell the simulator "50ms have elapsed" and it will compute 50ms worth of simulation. A typical game loop will render (which takes an arbitrary amount of time), then run the simulator for the duration since the last time the simulator was run.
If the code runs fast, then the simulator steps will be short (only a few ms) and the game will render the scene more often.
If the code runs slowly, the simulator steps will have longer steps and there will be proportionally fewer renders.
If the simulator runs slower than the simulation itself (it takes 100ms to compute 50ms worth of simulation) then the game cannot run. But this is an exceedingly rare situation, and games sometimes have emergency systems that drop the quality of the simulation to improve performance when this happens.
Note that time-dependent does not necessarily mean millisecond-level precision. Some systems implement simulations using time-based functions (traveled distance equals speed times elapsed time), while others run fixed-duration simulation steps.
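Here is a minimal sketch of that loop (plain Python with made-up render/simulate functions, only to show the structure):

import time

def simulate(state, dt):
    # Advance the world by dt seconds (placeholder physics).
    state["x"] += state["vx"] * dt        # distance = speed * elapsed time
    return state

def render(state):
    # Placeholder for drawing the scene.
    print(f"x = {state['x']:.2f}")

state = {"x": 0.0, "vx": 1.5}
last = time.monotonic()
for _ in range(10):                       # a real game would loop until quit
    render(state)                         # takes an arbitrary amount of time
    now = time.monotonic()
    state = simulate(state, now - last)   # simulate however much time has passed
    last = now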
I think the correct term is "Real-time application".
For the first question, I'm with spender's answer.
If you know the elapsed time between two frames, you can calculate (with physics, for example) the new position of the elements based on the previous ones.
There are two approaches to this, each with advantages and disadvantages.
You can either go frame based, whereby a timer signals n new frames every second. You calculate movement simply by counting elapsed frames. In the case that computation exceeds the available time, the game slows down.
...or, keeping the frame concept, you also keep an absolute measure of time: when the next frame is signalled, you calculate world movement from the amount of elapsed time. This means that stuff happens in real time, but in the case of severe CPU starvation, gameplay will become choppy.
There's an old saying that "the clock is an actor". Time-based programs are event-driven programs, but the clock is a constant source of events. At least, that's a fairly common and reasonably easy way of doing things. It falls down if you're doing hard realtime or very high performance things.
This is where you can learn the basics:
http://www.gamedev.net/reference/start_here/
Nearly all games are programmed with a real-time architecture, and the computer's capabilities (and the coding, of course :)) determine the frame rate.
Game programming is a really complex job, involving object modeling, scripting, math, fast and good-looking rendering algorithms, and other things like pixel shaders.
So I would recommend checking out the available engines first (just google "free game engine").
The basic logic is to create an infinite loop (while(true){}) in which you:
Listen for callbacks - you get the keyboard, mouse and system messages here.
Do the physics based on the time elapsed since the previous frame and the user input.
Render the new frame (GDI, DirectX or OpenGL).
Have fun!
Basically, there are 2 different approaches that allow you to add animation to a game:
Frame-based Animation: easier to understand and implement, but it has some serious disadvantages. Think about it this way: imagine your game runs at 60 FPS and it takes 2 seconds to draw a ball that goes from one side of the screen to the other. In other words, the game needs 120 frames to move the ball across the screen. If you run this game on a slow computer that's only able to render 30 FPS, then after 2 seconds the ball will only be at the middle of the screen. The problem with this approach is that rendering (drawing the objects) and simulation (updating the positions of the objects) are done by the same function.
Time-based Animation: a sophisticated approach that separates the simulation code from the rendering code. The amount of FPS the computer can render will not influence the amount of movement (animation) that has to be done in 2 seconds.
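A tiny sketch of the difference between the two update rules (names and numbers are illustrative):

PIXELS_PER_SECOND = 200
PIXELS_PER_FRAME = 200 / 60               # tuned assuming 60 FPS

def update_frame_based(x):
    # Moves a fixed amount per rendered frame: a slower machine means a slower ball.
    return x + PIXELS_PER_FRAME

def update_time_based(x, dt):
    # Moves according to elapsed time: the ball crosses the screen in the same
    # 2 seconds regardless of how many frames were actually rendered.
    return x + PIXELS_PER_SECOND * dt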
Steven Lambert wrote a fantastic article about these techniques, as well as a 3rd approach that solves a few problems with time-based animation.
Some time ago I wrote a C++/Qt application to demonstrate all these approaches and you can find a video of the prototype running here:
Source code is available on Github.
Searching for time-based movement will give you better results.
Basically, you either have a timer loop or an event triggered on a regular clock, depending on your language. If it's a loop, you check the time and only react every 1/60th of a second or so.
Some sites
http://www.cppgameprogramming.com/
Ruby game programming
PyGame
Flight Simulation is one of the more complex examples of real-time simulations. The understanding of fluid dynamics, control systems, and numerical methods can be overwhelming.
As an introduction to the subject of flight simulation, I recommend Build Your Own Flight Sim in C++. It is out of print, but seems to be available used. This book is from 1996, and is horribly dated. It assumes a DOS environment. However, it provides a good overview of the topics, covers numerical integration, basic flight mechanics and control systems. The code examples are simplistic, reasonably complete, and do not assume the more common toolsets used for graphics today. As with most things, I think it is easier to learn the subject with a more basic reference.
A more advanced text (college senior, first-year graduate school) is Principles of Flight Simulation, which provides excellent coverage of the breadth of topics involved in making a flight simulation. This book would make an excellent reference for anyone seriously interested in flight simulation as an engineering task, or for more realistic game development.