Can we interlace videos without losing too much compression potential?

I know that in some cases it's harder to compress interlaced video efficiently. I've run some tests - though I can't remember how the empty lines were filled - where I found that JPEGs were much larger when interlaced.
In a C# project I'm working on, it's possible from the C++ runtime side to interlace video by simply skipping odd or even lines when capturing. That means I can almost double the speed of the image-processing algorithm, opening up the possibility of a decent frame rate on smaller devices.
But is it possible to do anything with this that would also benefit the compression step? Currently, with the H.263 encoder I'm using, a 20-second, 640x480, 15 fps video takes 3-4 minutes to encode on the average device I'm using.
So, the question is: is there an approach that modifies frames, either by interlacing them or by some other linear lossy transformation, so that they are no harder to compress than before the modification?
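Purely as an illustration of the line-skipping described above (the real project does this on the C++ capture side, so the array layout and function name here are assumptions), a minimal NumPy sketch:

```python
# Hypothetical illustration: extract a single field from a full frame by
# keeping only the even (or odd) rows, halving the rows to process.
import numpy as np

def extract_field(frame: np.ndarray, top_field: bool = True) -> np.ndarray:
    """Return every other row of a (height, width[, channels]) frame."""
    start = 0 if top_field else 1
    return frame[start::2]

# Example: a fake 480x640 grayscale frame becomes 240x640.
frame = np.zeros((480, 640), dtype=np.uint8)
field = extract_field(frame)
print(field.shape)  # (240, 640)
```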

Related

Using libvlc to stream a video from memory?

I am generating a video stream in real time, and I've got it as a series of bitmaps in memory.
I'd like to stream these bitmaps over the network using libvlc, but I wasn't able to find the right functions in the API (all streaming functions expect a file or other source).
I even thought of emulating a capture device, but that seemed too convoluted to be true, so I'd rather ask.
My question is, what do I now have to do with these bitmaps to be able to use libvlc to stream them?
I found a question that appears to address the same issue.
Another suggestion, with significantly less overhead, is "emulating" a file with named pipes, i.e. FIFOs.
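As a rough sketch of that named-pipe idea, assuming a POSIX system and that the generated frames are already wrapped in a container VLC can demux (the FIFO path, the cvlc invocation, and the chunk generator are placeholders, not a verified recipe):

```python
# Sketch of the FIFO approach on POSIX: create a named pipe, point VLC at it
# as if it were a file, and keep writing encoded video data into it.
import os
import subprocess

FIFO_PATH = "/tmp/video_fifo"  # hypothetical path

def generate_encoded_chunks():
    # Placeholder: yield encoded video data (e.g. MPEG-TS packets) produced
    # by your real-time pipeline. Yields nothing here.
    return iter(())

if not os.path.exists(FIFO_PATH):
    os.mkfifo(FIFO_PATH)

# Launch VLC reading from the pipe; the stream written into the FIFO must be
# in a format VLC can demux, not raw bitmaps.
player = subprocess.Popen(["cvlc", FIFO_PATH])

with open(FIFO_PATH, "wb") as pipe:
    for chunk in generate_encoded_chunks():
        pipe.write(chunk)

player.wait()
```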

Waveform representation of any audio in iPhone

I have to draw a waveform for an audio file (CMK.mp3) in my application.
For this I have tried this solution.
That solution uses AVAssetReader, which takes too much time to display the waveform.
Can anyone please help? Is there any other way to display the waveform quickly?
Thanks
AVAssetReader is the only way to read an AVAsset, so there is no way around that. You will want to tune the code to process it without incurring unwanted overhead. I have not tried that code yet, but I intend to use it to build a sample project to share on GitHub once I have the time, hopefully soon.
My approach to tune it will be to do the following:
Eliminate all Objective-C method calls and use C only instead
Move all work to a secondary queue off the main queue and use a block to call back once finished
One obstacle with rendering a waveform is that you cannot have more than one AVAssetReader running at a time, at least as of the last time I tried (it may have changed with iOS 6). A new reader cancels the other, which interrupts playback, so you need to do your work in sequence. I do that with queues.
In an audio app that I built, I read the CMSampleBufferRef into a CMBufferQueueRef, which can hold multiple sample buffers (see copyNextSampleBuffer on AVAssetReader). You can configure the queue to give you enough time to process a waveform after an AVAssetReader finishes reading an asset, so that the current playback does not exhaust the contents of the CMBufferQueueRef before you start reading more buffers into it for the next track. That will be my approach when I attempt it. I just have to be careful not to use too much memory by making the buffer too big, or so big that it causes issues with playback. I just do not know how long it will take to process the waveform, and I will test it on my older iPods and iPhone 4 before I try it on my iPhone 5 to see if they all perform well.
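The AVAssetReader/CMBufferQueueRef plumbing above is iOS-specific, but the waveform drawing step itself is just downsampling: split the decoded PCM samples into bins and keep one peak per bin. A minimal, platform-agnostic sketch in Python/NumPy, assuming the samples have already been decoded (this is not the poster's code):

```python
import numpy as np

def waveform_peaks(samples: np.ndarray, num_bins: int) -> np.ndarray:
    """Reduce a 1-D array of PCM samples to per-bin peak amplitudes."""
    usable = len(samples) - (len(samples) % num_bins)  # trim the remainder
    bins = samples[:usable].reshape(num_bins, -1)
    return np.abs(bins).max(axis=1)  # one peak value per drawn column

# Example: 5 seconds of a 440 Hz tone at 44.1 kHz reduced to 300 columns.
t = np.linspace(0, 5, 5 * 44100, endpoint=False)
peaks = waveform_peaks(np.sin(2 * np.pi * 440 * t), 300)
print(peaks.shape)  # (300,)
```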
Be sure to stay as close to C as possible. Calls to Objective-C resources during this processing will incur potential thread switching and other run-time overhead costs which are significant enough to be noticeable. You will want to avoid that. What I may do is set up Key-Value Observing (KVO) to trigger the AVAssetReader to start the next task quickly so that I can maintain gapless playback between tracks.
Once I start my audio experiments I will put them on GitHub. I've created a repository where I will do this work. If you are interested you can "watch" that repo so you will know when I start committing updates to it.
https://github.com/brennanMKE/Audio

How can I monitor an mp3 live stream to detect corruption?

Once a month the MP3 stream messes up, and the only way to tell it has messed up is by listening to it as it streams. Is there a script, program, or tool I can use to monitor the live stream at a given URL and send some kind of flag when it corrupts?
What happens is that normally it plays a song or some music, but once a month, every month, at random, the stream corrupts and starts playing random chipmunk-like garbage audio. Any ideas on this? I am just getting started at this with no idea at all.
Typically, this will happen when you play a track of the wrong sample rate.
Most (all that I've seen) SHOUTcast/Icecast encoders (going straight from files) will compress to MP3 just fine, but assume a fixed sample rate of whatever they are configured for, typically 44.1kHz. If you drop in a 48kHz track, or a 22.05kHz track, it will play at the wrong speed and cause all sorts of random issues with the stream.
The problem is easy enough to verify. Simply create a file of a different sample rate and test it. I suspect you will reproduce the problem. If that is the case, to my knowledge there is no way to detect it, since your stream isn't actually corrupt... it just sounds incorrect. You will have to scan all of your files for sample rate. FFMPEG in a script should be able to help you with that.
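As a hedged sketch of that scan, here is one way to do it with ffprobe (part of the FFmpeg suite); the music directory and expected rate are assumptions:

```python
# Scan a directory of MP3s and report any track whose sample rate differs
# from the rate the encoder expects (44.1 kHz here).
import pathlib
import subprocess

EXPECTED_RATE = "44100"
MUSIC_DIR = pathlib.Path("/srv/station/music")  # hypothetical path

for track in sorted(MUSIC_DIR.glob("*.mp3")):
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=sample_rate",
         "-of", "default=noprint_wrappers=1:nokey=1",
         str(track)],
        capture_output=True, text=True,
    )
    rate = result.stdout.strip()
    if rate and rate != EXPECTED_RATE:
        print(f"WRONG RATE {rate} Hz: {track.name}")
```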
Now, if the problem actually is a corrupt MP3 stream, then you have problems on your encoding side. I suspect simply swapping out whatever DLL or module you're using with a recent stable version of LAME will help.
To detect a corrupt MP3 stream, your encoder must be using CRC. If you enable it, you should be able to read through the headers of each frame to find the CRC, and then run it on the audio data. In the event you get an error (or several frames with errors), you can then trigger a warning.
You can find information on the MP3 stream header here:
http://www.mp3-tech.org/programmer/frame_header.html
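As a starting point only, here is a sketch that pulls one chunk from an Icecast-style HTTP stream and decodes the header fields mentioned above (frame sync, sample-rate bits, CRC protection bit); the URL is a placeholder and the actual CRC-16 verification is left where noted in the comments:

```python
# Hedged sketch: peek at an MP3 frame header in a live stream and report the
# sample-rate field and whether the CRC protection bit is set.
import urllib.request

SAMPLE_RATES = {  # keyed by (MPEG version bits) -> (sample-rate bits -> Hz)
    0b11: {0: 44100, 1: 48000, 2: 32000},   # MPEG-1
    0b10: {0: 22050, 1: 24000, 2: 16000},   # MPEG-2
    0b00: {0: 11025, 1: 12000, 2: 8000},    # MPEG-2.5
}

STREAM_URL = "http://example.com:8000/stream"  # hypothetical URL

with urllib.request.urlopen(STREAM_URL) as stream:
    data = stream.read(64 * 1024)  # grab a chunk of the live stream

for i in range(len(data) - 4):
    b0, b1, b2 = data[i], data[i + 1], data[i + 2]
    if b0 == 0xFF and (b1 & 0xE0) == 0xE0:      # 11-bit frame sync found
        version = (b1 >> 3) & 0x03
        crc_protected = (b1 & 0x01) == 0        # 0 means a CRC follows the header
        rate_bits = (b2 >> 2) & 0x03
        rate = SAMPLE_RATES.get(version, {}).get(rate_bits)
        print(f"offset {i}: sample_rate={rate}, crc_protected={crc_protected}")
        # A real monitor would verify the 16-bit CRC over the protected bytes
        # here (per the header spec above) and alert after several bad frames.
        break
```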

Programmatically 'Listening' to Sound (Signal Processing?)

I'm familiar with Computer Vision (well, I know OF it), one application of which is image recognition, such as Optical Character Recognition, I believe. However, something that I am more interested in is 'computer listening', which I have just learned is considered Digital Signal Processing.
The thing that interests me the most about signal processing is the potential application in music. I remember a while ago I saw a preview of an application (Sorry, forgot the name) which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played. Using the program, the user was able to move these around and even edit them. Now, obviously this is a lot more complicated, but does it involve the same thing? Signal Processing? I am also interested in possible applications in music visualizers and intelligent lighting systems.
My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI, which contains separate tracks (maybe I misunderstood). Would an uncompressed format such as PCM do better than MP3? I don't know anything about sound processing; that's just what I'm inferring from what I've read so far.
I have already seen this question, which has great answers and links that cover a lot of my questions. However, most of the links I've found are theoretical, which I'm sure is all interesting and is definitely worth a read given my interest in the subject, but I wanted to know if there are any existing libraries which can facilitate this, or articles pertaining to this subject that are geared towards Computer Science/Programming, with perhaps example code. Even open source sound/music visualizers or any other open source sound processing code would be great.
Sorry if I didn't make any sense. Like I said, I don't know what I'm talking about.
The thing that interests me the most about signal processing is the potential application in music. I remember a while ago I saw a preview of an application (Sorry, forgot the name)
Maybe Cubase?
which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played
Deeply simplified: when you play a note you produce a periodic wave with a given frequency. There's a mathematical trick (the discrete Fourier transform, DFT) that converts the wave into a spectrum, which, instead of presenting intensity against time, shows it against the frequency of the wave. For example, a perfect A note from a tuning fork would produce an oscillating wave at 440 Hz. In the time domain this appears as a sinusoidal wave. In the frequency domain, it appears as a single, narrow spike centered at 440 Hz.
Now, when you play a guitar you don't produce perfect sinusoidal waves. Hitting an A will produce the fundamental frequency, 440 Hz, but also a lot of additional frequencies (e.g. 880 Hz, one octave higher, plus many other higher and lower frequencies), due to the physics of the vibrating string, the material and shape of the guitar, etc. These additional frequencies are called harmonics, and they mix with the fundamental to produce "the sound of the guitar" (what in musical jargon is called timbre). A different instrument (say, a piano) will have a different mix of harmonics with the fundamental, producing a different timbre.
What DSP programs do is perform a DFT on the incoming signal. With additional tricks, they find the fundamental and the harmonics, and from what they find they infer the note you played. This must happen fast, so the note can be detected while playing live and used to trigger special effects. For example, you could hit an A note on the guitar, the DSP understands it's an A and replaces it with the A from a piano, so from the speakers you get the sound of a piano.
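A tiny NumPy sketch of that idea: generate one second of a 440 Hz sine, take the DFT, and confirm the spectrum peaks at 440 Hz (the numbers are just the example values from the text):

```python
import numpy as np

SAMPLE_RATE = 44100                       # samples per second
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of time stamps
signal = np.sin(2 * np.pi * 440 * t)      # "perfect A" tuning-fork tone

spectrum = np.abs(np.fft.rfft(signal))            # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), 1 / SAMPLE_RATE)
print(freqs[np.argmax(spectrum)])                 # 440.0 Hz
```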
Using the program, the user was able to move these around and even edit them. Now, obviously this is a lot more complicated, but does it involve the same thing? Signal Processing? I am also interested in possible applications in music visualizers and intelligent lighting systems.
Yes. Once you are in the frequency domain, things get very easy. For example, you could light up a specific light according to the voice frequencies, and another light with the bass drum.
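Continuing the sketch above, one way to drive lights is to sum spectral energy in rough frequency bands; the band edges below are arbitrary illustrative choices, not fixed rules:

```python
import numpy as np

def band_energy(samples, sample_rate, lo_hz, hi_hz):
    """Total spectral magnitude between lo_hz and hi_hz."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), 1 / sample_rate)
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return spectrum[mask].sum()

# Per audio block: drive a bass light from 20-150 Hz and a "voice" light
# from 300-3000 Hz, e.g.:
#   bass  = band_energy(block, 44100, 20, 150)
#   voice = band_energy(block, 44100, 300, 3000)
```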
My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI, which contains separate tracks (maybe I misunderstood).
They are two different things. MP3 is a compressed format for a sound wave. Basically, it takes the waveform that drives the speakers and compresses it. The idea is the same: DFT, then removal of stuff that is unlikely to be heard (for example, a high pitch that comes right after a high-intensity sound is less likely to be heard, so it gets removed).
MIDI, on the other hand, is a scroll of events (you know, like those player pianos of the old West, with the rolling paper scroll). The file contains no audio. It contains instead directions for a MIDI player to perform specific notes at specific times with specific instruments. The quality of the "instrument bank" is (among other things) what distinguishes a bad MIDI player (which sounds like a child's toy) from a good MIDI player (which sounds realistic, in particular for pianos and violins; for wind instruments I have yet to hear a realistic one).
It follows that to go from MIDI to MP3, you just play it through a MIDI player and encode the output. Going the other way around is a different story altogether, and much more complex, and that is where DSP comes into play, as you said.
It's like boiling a fish tank: you get fish soup. But getting from the fish soup back to the fish tank is much harder.
Would an uncompressed format such as PCM do better than MP3?
PCM is a technique for converting an analog signal to a digital signal, so your question rests on a slight misunderstanding: there is no "PCM format" as such (a RAW file is the closest thing, containing basically nothing but the crude sample data). If you are asking whether an uncompressed WAV (which contains PCM data) is better than MP3, then yes, but the real question is how much that difference matters to the human ear, and how much post-processing you have to perform on that data.
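For completeness, a small sketch of pulling the PCM samples out of an uncompressed WAV with Python's standard wave module (16-bit mono and the filename are assumptions):

```python
import wave
import numpy as np

with wave.open("clip.wav", "rb") as wav:      # hypothetical file
    sample_rate = wav.getframerate()
    raw = wav.readframes(wav.getnframes())    # raw PCM bytes

# Interpret the bytes as 16-bit signed samples (mono assumed here).
samples = np.frombuffer(raw, dtype=np.int16)
print(sample_rate, len(samples))
```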
know if there are any existing libraries which can facilitate this, or articles pertaining to this subject that are geared towards Computer Science/Programming, with perhaps example code. Even open source sound/music visualizers or any other open source sound processing code would be great.
If you like python, take a look at this page
Sorry if I didn't make any sense. Like I said, I don't know what I'm talking about.
Neither do I, but I toyed a bit with it.
My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI, which contains separate tracks (maybe I misunderstood).
MIDI essentially stores instrument information and musical notes, along with other effects (volume, pitch bend, vibrato, attack rate, etc.).
It does not really involve digital signal processing.
Would an uncompressed format such as PCM do better than MP3?
Maybe somewhat; it depends on the application. MP3 reduces the precision of frequencies that humans are not sensitive to. If you want to do visualisations then MP3 is probably fine.
But if you want to, say, determine what sort of instrument is playing in a recording, then there could be useful information hidden in the frequencies that humans are not sensitive to.
I think The Scientist and Engineer's Guide to Digital Signal Processing is a great reference for programmers. Chapter 8 explains the discrete Fourier transform (used in MP3 processing and a lot of other places to separate out the component frequencies of a wave).
I used it to help make a graphical program that let you draw a wave with the mouse, then applied the DFT, and let you select how many frequencies to include. It was a great exercise.
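A rough sketch of that exercise, assuming the hand-drawn wave has already been digitized into an array: take the DFT, keep only the k strongest frequencies, and reconstruct with the inverse DFT:

```python
import numpy as np

def reconstruct_with_k_frequencies(signal: np.ndarray, k: int) -> np.ndarray:
    """Rebuild a signal from only its k strongest frequency components."""
    spectrum = np.fft.rfft(signal)
    keep = np.argsort(np.abs(spectrum))[-k:]   # indices of the k largest bins
    pruned = np.zeros_like(spectrum)
    pruned[keep] = spectrum[keep]
    return np.fft.irfft(pruned, n=len(signal))

# Example: a square-ish wave rebuilt from its 3 strongest components.
x = np.linspace(0, 1, 1000, endpoint=False)
square = np.sign(np.sin(2 * np.pi * 5 * x))
approx = reconstruct_with_k_frequencies(square, 3)
```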
I remember a while ago I saw a preview of an application (Sorry, forgot the name) which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played.
You might also be thinking of Melodyne: http://www.celemony.com/cms/
Though VariAudio in newer versions of Cubase is pretty similar. :)
I think you need to define exactly what you are looking for and what you are trying to do.
If you want to learn about DSP, MIDI or PCM then there is plenty of information on Wikipedia and references.
There are a myriad of applications for audio manipulation available. What you've described in your question is what takes place in every digital recording studio (which these days would account for almost all studios) every single day.
If you are intending to perform some DSP against, say, a guitar sound, then you would ideally have a recording of the guitar itself (rather than a mixed-down track containing drums or vocals). It should be quite obvious that you will get better results analysing an isolated signal without additional noise than you will analysing a signal containing significant levels of 'noise'. So yes, a multitrack recording would be preferable to 'an MP3'.
A typical MP3 contains left and right channels (tracks), so it technically is multitrack. When music is recorded (professionally, at least), different signals are recorded onto different tracks, precisely so that they can be edited and processed discretely at a later time.
What, then, do you want to do with the sounds?
As other answers have pointed out, this does not relate to MIDI at all.

Advice on using sandbox vs. caching for UITableView async image download

Apple just released some sample code on lazy loading images in a UITableView a week ago. I checked it out and implemented it into my own UITableView (which is a drawRect one for fast scrolling), to see if there was a difference from what I was already doing.
After implementing I am not sure what is best; the new code or what I already had. I am not seeing much of a speed improvement on my 3GS.
"Sandbox" method: Load images lazily, then save to local tmp folder in the sandbox. Each time the cell is displayed it looks for whether an image with that filename is already located in the sandbox folder. If it is, it retrieves the image and displays it, if not it continues with the download, saves it locally and then displays it. The benefit with this is that the images won't be blank the second time you open the app. They will already be downloaded and ready for displaying.
Caching method: This also loads the images lazily; however, now I include a UIImage on each object in the array that's displayed in the table view. Instead of saving the image locally, I now download the image and put it into the array for the object. Now, instead of checking for the filename every single time, it just checks whether the UIImage != nil and uses the cached image (or downloads if nil).
A small difference is also that the caching code resizes the image to the exact size displayed in the cell before caching it, whereas the image used in the sandbox code example is actually a bit larger than it needs to be, which means it also has to be resized on the fly while scrolling. I read months ago that this can be a bit expensive, and I am not sure whether the on-the-fly resizing makes the sandbox approach much more CPU-intensive than using a cached, pre-sized image.
I guess my question would be whether I should even bother with the caching code? Again, the new code won't immediately load images on a new launch, whereas the old code actually does because it's already in the sandbox. Since I am not reusing images, I have a lot of images to load (from the sandbox or cache) so I am not noticing a huge difference in speed. In fact, on my 3GS it's almost impossible to tell, in my opinion. The scrolling is not silky smooth, and I assume this is due to the large amount of images that I cannot reuse (different image for each cell). I am also wondering whether the sandbox method would get slower once there's 1000+ images in the folder, for example, eventually having it look through many more images than just 100 or so.
I hope I am making sense. I wanted to be pretty thorough with the details, and I am happy to give more details if needed.
Thanks!
If you have code that already works, and there's not a pressing problem, then don't change it.
If your scrolling actually is too slow, then perhaps you could use a mixture of ideas, and try to get the UIImage, and if it's not there, load it from the sandbox, and if it's not there, then download it.
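That lookup order (memory cache first, then the on-disk sandbox copy, then the network) is not iOS-specific; a language-agnostic sketch of it in Python, with hypothetical names throughout:

```python
import os

class ImageCache:
    """Two-tier lookup: in-memory dict, then disk, then download."""

    def __init__(self, cache_dir, downloader):
        self.cache_dir = cache_dir
        self.downloader = downloader    # callable: url -> image bytes
        self.memory = {}                # filename -> image bytes
        os.makedirs(cache_dir, exist_ok=True)

    def image_for(self, filename, url):
        if filename in self.memory:                 # 1. memory cache
            return self.memory[filename]
        path = os.path.join(self.cache_dir, filename)
        if os.path.exists(path):                    # 2. sandbox/disk copy
            with open(path, "rb") as f:
                data = f.read()
        else:                                       # 3. download and store
            data = self.downloader(url)
            with open(path, "wb") as f:
                f.write(data)
        self.memory[filename] = data
        return data
```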
The only good way to tell if there is any discernible difference in performance is to use profiling tools like Instruments (for measuring things like display framerate for the two techniques) or Shark (to determine hotspots in your code). There could be small differences in your exact implementation that could potentially cause significant differences between any general answer we could give and the actual performance you see in your application.
The thing that primarily concerns me with the "sandbox" method is not performance but disk space usage. Users won't appreciate you filling up their iPhone or iPod Touch with unnecessary files, especially if all the images aren't consistently used or if the set of used images changes often. Without knowing more about your application, it's impossible to guess how often these cached images would be loaded.
If you're testing locally on your own device, you might be on a Wifi network. My recommendation would be to turn Wifi off for part of your testing to see how the two approaches perform when you have to fetch all the images over the cellular network. I would also recommend trying to find an older device (iPhone 3G or worse), because the 3GS does in fact hide potential performance issues that could be annoying for users on older devices.
I have personally used the LazyTableImages technique in my apps many times (provided it hasn't changed drastically between WWDC09 and the recent 'release') and find it to be just what I need. Caching images on disk wouldn't be an option in my case, however, and you shouldn't take my anecdote too strongly into account - profile your own code and use the results it shows.
Edit: The obvious answer is that accessing an in-memory cache is going to be faster than accessing the filesystem, but of course the final word on that is left up to profiling. If the images are already in memory, they don't need to be read from flash and parsed by UIImage. The traditional tradeoff comes into play here though - in-memory caching vs. disk space.
While it may be faster for you to store your images in memory, you need to be very sure that you correctly handle memory warnings in your application (as you should be doing anyway!). Otherwise, long periods of use will lead to many, many images in your in-memory cache and trigger memory warnings; if your application is not built to handle these, it will be killed by the OS due to lack of memory resources.
There are pros and cons in both approaches that you present - I suggest using elements of both in your app.
It's better to keep your images in memory and save them later (perhaps when your app quits). If you have a lot of images, it might be faster to save them with Core Data than as regular files.
It's also better to avoid doing any resizing on the fly, i.e. in your tableView:cellForRowAtIndexPath: or tableView:willDisplayCell:forRowAtIndexPath: methods or in any method that has to do with drawing your cells' content view. If you can, ask the image provider (content management?) to supply images at the size that your table view displays.