I am looking for a way to change the BPM/tempo of an MP3 file (and convert the result back to MP3). It could also be done with a C library that I will add to my project.
There are two methods for changing the perceived tempo of audio. One method is to simply alter the playback speed. This is easy to do with PCM audio, but will require you to decode the MP3. It might be possible to alter the sample rate (effectively the same thing) in the compressed domain (i.e. in the MP3 file itself), but I don't know how to do that. The big drawback with this approach is all the pitches in the audio change, too. This can make vocals sound unnatural, for instance.
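To make that first approach concrete, here's a minimal sketch in C of a naive speed change on decoded PCM (assuming 16-bit mono samples and a caller-allocated output buffer of at least in_len / rate samples); note how the same operation that shortens the audio also shifts every pitch:

/* Crude speed change on decoded PCM via linear-interpolation resampling.
   rate = 1.2 plays 20% faster and shifts all pitches up by the same factor. */
#include <stddef.h>
#include <stdint.h>

size_t change_speed(const int16_t *in, size_t in_len, int16_t *out, double rate)
{
    size_t n = 0;
    for (double pos = 0.0; pos + 1.0 < (double)in_len; pos += rate) {
        size_t i = (size_t)pos;
        double frac = pos - (double)i;
        out[n++] = (int16_t)((1.0 - frac) * in[i] + frac * in[i + 1]);
    }
    return n; /* number of output samples, roughly in_len / rate */
}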
Another approach is to apply a pitch-invariant speed change. This is a far more complex operation, and there are many proprietary algorithms and research papers addressing it. The pitch-synchronous overlap add (PSOLA) technique is one that works well. You can also look at what Audacity (an open-source audio editor) does. Since you are on iOS, Apple's audio framework might also provide some support for this. Look for AUPitch in the iOS documentation.
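To give a flavor of the second approach, here is a deliberately crude overlap-add (OLA) sketch on float PCM. This is not PSOLA (there is no pitch-synchronous grain alignment, so transients and voiced speech will smear), just the core idea: copy windowed grains from the input at one hop size and lay them down at another, changing duration without resampling. The output buffer is assumed to be zeroed by the caller:

/* Crude OLA time stretch -- NOT PSOLA. stretch > 1.0 makes the audio longer
   (slower) at the same pitch. Caller must zero the output buffer first. */
#include <math.h>
#include <stddef.h>

#define GRAIN   1024          /* grain size in samples     */
#define HOP_OUT (GRAIN / 2)   /* 50% overlap on the output */

size_t ola_stretch(const float *in, size_t in_len,
                   float *out, size_t out_cap, double stretch)
{
    double hop_in = HOP_OUT / stretch;  /* analysis hop in the input */
    size_t next = 0, written = 0;
    for (double src = 0.0; src + GRAIN < (double)in_len; src += hop_in) {
        if (next + GRAIN > out_cap) break;
        for (int k = 0; k < GRAIN; k++) {
            /* Hann window; at 50% overlap the windows sum to ~unity gain */
            float w = 0.5f * (1.0f - cosf(2.0f * (float)M_PI * k / (GRAIN - 1)));
            out[next + k] += w * in[(size_t)src + k];
        }
        written = next + GRAIN;
        next += HOP_OUT;
    }
    return written;
}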
I have an MP3 file and need to constantly detect and show the Hz value of this playing MP3 file. A bit of googling shows that I have two options: use FFT or use Apple's Accelerate framework. Unfortunately I haven't found an easy-to-use sample of either. All the samples, like aurioTouch etc., need tons of code to get a simple number for a sample buffer. Is there an easy example of pitch detection for iOS?
For example, I've found https://github.com/clindsey/pkmFFT, but it's missing some files that its author has removed. Is there anything like that which works?
I'm afraid not. Working with sound is generally hard, and Core Audio is no exception. Now to the matter at hand.
FFT is an algorithm for transforming input from the time domain to the frequency domain. It is not inherently tied to sound processing; you can use it for many things other than sound as well.
Accelerate is an Apple-provided framework which, among many other things, offers an FFT implementation. So you actually don't have two options there, just one and its implementation.
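For what it's worth, the core Accelerate call sequence is short. A hedged sketch (it just reports the loudest FFT bin, which is a naive spectral peak, not real pitch detection, and it assumes a power-of-two count of mono float samples):

/* Magnitude spectrum with vDSP, then the loudest bin as a frequency.
   A naive spectral peak -- fine for a whistle, useless for rich material. */
#include <Accelerate/Accelerate.h>
#include <math.h>

float peak_frequency(const float *samples, int n /* power of two */, float sampleRate)
{
    vDSP_Length log2n = (vDSP_Length)log2f((float)n);
    FFTSetup setup = vDSP_create_fftsetup(log2n, kFFTRadix2);

    float real[n / 2], imag[n / 2], mags[n / 2];
    DSPSplitComplex split = { real, imag };
    vDSP_ctoz((const DSPComplex *)samples, 2, &split, 1, n / 2); /* pack even/odd */
    vDSP_fft_zrip(setup, &split, 1, log2n, kFFTDirection_Forward);
    vDSP_zvmags(&split, 1, mags, 1, n / 2);                      /* squared magnitudes */

    float maxMag; vDSP_Length maxBin;
    vDSP_maxvi(mags + 1, 1, &maxMag, &maxBin, n / 2 - 1);        /* skip the DC bin */

    vDSP_destroy_fftsetup(setup);
    return (float)(maxBin + 1) * sampleRate / (float)n;
}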
Now, depending on what you want to do (e.g. whether you favour speed over accuracy, or robustness over simplicity) and the type of waveform you have (simple, complex, human speech, music), FFT may not be enough on its own, or may not even be the right choice for your task. There are other options: autocorrelation, zero-crossing, cepstral analysis, maximum likelihood, to mention a few. But none of them are trivial, except for zero-crossing, which also gives you the poorest results and will fail on complex waveforms.
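Since zero-crossing is the one trivial method, here's what it amounts to (counting positive-going crossings per second; this falls apart as soon as the waveform has harmonics stronger than the fundamental):

/* Zero-crossing pitch estimate: one positive-going crossing per period. */
#include <stddef.h>

float zero_crossing_pitch(const float *x, size_t n, float sampleRate)
{
    size_t crossings = 0;
    for (size_t i = 1; i < n; i++)
        if (x[i - 1] < 0.0f && x[i] >= 0.0f)
            crossings++;
    return (float)crossings * sampleRate / (float)n; /* crossings/sec ~= Hz */
}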
Here is a good place to start:
http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
There are also other questions on SO.
However, as indicated by other answers, this is not something that can just be "magically" done. Even if you license code from someone (e.g., iZotope and z-plane both make excellent code for doing what you want to do), you still need to understand what's going on to get data in and out of their libraries.
If you need fast pitch detection, go with http://www.schmittmachine.com/dywapitchtrack.html. You'll find iOS sample code inside.
If you need FFT, you should use Apple's Accelerate framework.
Hope this helps.
Is there any API for iOS that allows me to create a WAV file made up of several other WAV samples? I am making a beat-making app; the beats themselves are made of several small WAV sounds like kick.wav, snare.wav etc. I want to assemble all these separate .wav sounds based on the user's pattern to make a final output.wav. What is the best way to do this on the iOS platform?
There is no iOS API to do anything like that directly, file to file.
The sequence of steps to do it manually might include reading the WAV files into buffers of raw PCM samples (AVAssetReader), resampling if the sample rates aren't all appropriate, compositing the samples at appropriate time offsets using a DSP mixer (including gain control and limiting), then writing the resulting composite vector as a new WAV file.
Each of those steps might be a separate question, and for more than one Stack Exchange site, as there is more than one way to resample and mix down samples, and doing it well might be non-trivial, depending on your exact requirements.
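Just to illustrate the compositing step (a minimal sketch, assuming every clip is 16-bit mono PCM already resampled to a common rate): accumulate into a wider buffer at each clip's sample offset, then clamp when converting back to 16 bits. A real mixer would do proper gain staging and smoother limiting, and you'd still need to write a WAV header in front of the final buffer:

#include <stddef.h>
#include <stdint.h>

/* Add one clip into the 32-bit mix buffer at a sample offset, with gain. */
void mix_clip(int32_t *mix, size_t mix_len,
              const int16_t *clip, size_t clip_len,
              size_t offset, float gain)
{
    for (size_t i = 0; i < clip_len && offset + i < mix_len; i++)
        mix[offset + i] += (int32_t)(clip[i] * gain);
}

/* Clamp the accumulated mix back down to 16-bit samples (hard limiting). */
void finalize_mix(const int32_t *mix, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t s = mix[i];
        if (s >  32767) s =  32767;
        if (s < -32768) s = -32768;
        out[i] = (int16_t)s;
    }
}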
Is there any way with Basic4Android to make it emit a sound of arbitrary frequency (meaning, I don't want to have pre-recorded sound files) and duration?
In some "traditional" Basic languages this would be done via e.g. a BEEP command followed by the desired frequency and duration.
Basic4Android doesn't seem to support any equivalent command.
I am looking for this feature in order to program a Morse code generating app, and for this purpose I need to stay flexible regarding the tone frequency (it must be user-selectable, between e.g. 500 Hz and let's say 1000 Hz) as well as the duration in milliseconds (in order to generate user-selectable speeds for the Morse code dashes, dots and the silent breaks in between)...
It's simply not practical, verging on impossible, to do this with prerecorded WAVs; you would end up with a huge WAV collection for all the frequency/speed combinations.
It seems to be possible to do this in Android; see the example here:
http://marblemice.blogspot.com/2010/...n-android.html
As far as I can interpret this code, it calculates a sine wave tone "on the fly" at the desired frequency into a buffer array and uses that buffer data to generate and play a PCM stream.
Since the above code seems to be quite simple, I wonder if a clever Java programmer could come up with a simple Basic4Android "Tone Generator" library which others could use for this purpose?
Unfortunately, I am only an old-fashioned Visual Basic guy making my first steps with Basic4Android... for creating my own library my skills are simply too lousy.
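For reference, the tone-generation math in the example linked above boils down to something like this (a rough C sketch of the same idea; a B4A library would do the equivalent in Java and hand the buffer to Android's AudioTrack):

#include <math.h>
#include <stdint.h>

/* Fill a caller-allocated buffer (sampleRate * ms / 1000 samples) with a
   16-bit PCM sine tone at freq Hz. */
void make_tone(int16_t *buf, int sampleRate, int freq, int ms)
{
    int n = sampleRate * ms / 1000;
    for (int i = 0; i < n; i++)
        buf[i] = (int16_t)(32767.0 * sin(2.0 * M_PI * freq * i / sampleRate));
}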
The Audio library was updated and you can now use the Beeper object to play "beep" sounds.
Dim b As Beeper
b.Initialize(300, 500) '300 milliseconds, 500hz
b.Beep
Updated library link
This is definitely possible to do on Android, in a java-based application. I don't know if Basic4Android can do this "natively" (I'd never heard of Basic4Android before this), but it appears that you can create libraries in java that can then be accessed by Basic4Android, so it would be theoretically possible to create a java library that does this and then call it from your B4A app.
However, since this would entail learning some java and the Android plugin for Eclipse anyway, maybe you should just take the plunge and learn java for Android? I'm a long-term Visual Basic guy myself (started in 1995), and it wasn't really that difficult to transition to C# and thence to java.
I have developed a Windows application that captures video from an external device using DirectShow. The image resolution is 640x480, and the videos saved without compression have huge sizes (approx. 27 MB per second).
My goal is to reduce this size as much as possible, so I am looking for an encoder which will allow me to compress the video in real-time. It could be H.264, MPEG-2 or anything else. It must allow me to save the video to disk, and it would be best if I also could stream it in real-time over the network (Wi-Fi, so the size should be around 1 MB per second, or less). Significant quality loss would be unacceptable.
I have found out that getting an appropriate DirectShow filter for this task is very difficult. It can be assumed that the client machine is reasonably modern (a fast 2-core CPU) and can utilize CUDA/OpenCL. There are a few apps that allow encoding video using CUDA and offer good performance; however, I have not found an appropriate DirectShow filter or an API which could be used to develop one. NVIDIA's nvcuvenc.dll seems to have a private API, so I am unable to use it directly. Any CPU-based encoders I have found are too slow for my requirements, but maybe I have missed some.
Could anybody recommend a solution, i.e. an encoder (paid or free, that can be used in a closed-source app) that can achieve good performance, regardless of whether it uses the CPU, CUDA, OpenCL or DirectCompute? Or maybe I should use an external hardware video encoder?
Since you're using Directshow, by far the easiest thing to do would be to use WMV9 inside an ASF container. This is easier because it's available on almost all Windows machines (very few install time dependencies), decently fast (you should have no issues using it on a reasonably modern machine) and the quality is reasonable. But considering your limit is 8 mbit/sec (1 MB/sec), quality isn't an issue for you. 2 mbit/sec, VGA-resolution WMV9 should look quite good.
It's not nearly as good as a decent implementation of H264, but from an implementation standpoint, you're going to save yourself a lot of time by going this route.
See this:
http://msdn.microsoft.com/en-us/library/dd375008%28v=VS.85%29.aspx
Which filters have you tried?
If you're only dealing with 640x480, then any reasonable-quality commercial software-based encoder should be fine as long as you choose a realistic bitrate. Hardware acceleration using CUDA or OpenCL shouldn't be required. H264 takes a bit more horsepower and would benefit from more CPU cores, but MPEG-2 or any of the H263-era codecs (Xvid, WMV9, DivX, etc.) should have no problems even on a modest CPU. Streaming it over the network at the same time takes a little more effort, but should still be possible.
It's not DirectShow-based, but VLC Media Player can do all this. It's based on the FFmpeg open-source project. Some versions of it are LGPL-licensed, so the library could be incorporated into your project without many restrictions.
If you just want a set of DirectShow filters that will handle all this for you, I've had good results with MainConcept's products before. They're at the expensive end of the spectrum though.
You don't specify what filters you've tried, or what 'significant' quality loss means, so about the best we can do is suggest some encoders to try, to see if they meet your requirements.
Two good ones are the Theora and WebM video encoder filters (you can get them from a single installer at xiph.org). They're both high-quality encoders which can be tweaked to balance performance vs. quality. WebM can use multiple processors when encoding, which might help in your situation. Both are also used with HTML5 video, so that might be an extra plus for you.
Forget about WMV encoding for realtime streaming. WMV works well for realtime low quality streams, but it doesn't do high quality encoding in realtime.
I suggest that you take a look at MainConcept's SDK. They do a series of DirectShow filters for encoding H.264. I've implemented realtime streaming and muxing of streams encoded in H.264 using MainConcept's codec and DirectShow filters, and it's great.
Hope this helps
I am using Windows Media Encoder for real-time encoding, and it works well even at 720x576 resolution. One example of its usage is VideoPhill Recorder.
It is written in pure .NET with DirectShow.NET for capturing and WindowsMedia.NET for encoding.
Using those two I am able to achieve real-time encoding with 24/7 stability.
And both libraries are free to use on Windows, so you won't have to pay for any licenses except for the OS.
ffdshow tryouts leverage FFmpeg's x264 integration, which is said to be pretty fast (I think so, anyway). libjpeg-turbo might also help, or you could choose another codec made for high throughput, like CamStudio's or whatnot.
Update: FFmpeg can take DirectShow input now: http://ffmpeg.zeranoe.com/forum/viewtopic.php?f=3&t=27
Have you seen this yet?
http://www.streamcoders.com/products/rtpavsource.html
http://www.streamcoders.com/products/rtpavrender.html
If you can stay at or below 1280x1024, Microsoft's MPEG-2 encoder (included in Vista and up) is quite good.
I haven't gotten it to work for 1080p content at all though. I suspect the encoder just aborts on that. Shame.
Here is one option : http://www.codeproject.com/Articles/421869/H-264-CUDA-Encoder-DirectShow-Filter-in-Csharp
It uses about 10% of my CPU (P4 3 GHz) to encode SD video to H264 in GraphEdit.
See the CaptureDS C# sample that comes with AVBlocks. It shows how to build a video recorder with AVBlocks and DirectShow. DirectShow is used for video capture and AVBlocks is used for video encoding.
I'm building an app that has a requirement for really accurate positional audio, down to the level of modelling inter-aural time difference (ITD), the slight delay difference between stereo channels that varies with a sound's position relative to a listener. Unfortunately, the iPhone's implementation of OpenAL doesn't have this feature, nor is a delay Audio Unit supplied in the SDK.
After a bit of reading around, I've decided that the best way to approach this problem is to implement my own delay by manipulating an AudioQueue (I can also see some projects in my future which may require learning this stuff, so this is as good an excuse to learn as any). However, I don't have any experience in low-level audio programming at all, and certainly none with AudioQueue. Trying to learn both:
a) the general theory of audio processing
and
b) the specifics of how AudioQueue implements that theory
is proving far too much to take in all at once :(
So, my questions are:
1) where's a good place to start learning about DSP and how audio generation and processing works in general (down to the level of how audio data is structured in memory, how mixing works, that kinda thing)?
2) what's a good way to get a feel for how AudioQueue does this? Are there any good examples of how to get it reading from a generated ring buffer, rather than just fetching bits of a file on demand with AudioFileReadPackets, like Apple's SpeakHere example does?
and, most importantly
3) is there a simpler way of doing this that I've overlooked?
I think Richard Lyons' "Understanding Digital Signal Processing" is widely revered as a good starter DSP book, though it's all math and no code.
If timing is so important, you'll likely want to use the Remote I/O audio unit, rather than the higher-latency audio queue. Some of the audio unit examples may be helpful to you here, like the "aurioTouch" example that uses the Remote I/O unit for capture and performs an FFT on it to get the frequencies.
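As for the ITD itself, once you're inside the Remote I/O render callback it can be as little as running the far-ear channel through a short delay line. A minimal sketch (whole-sample delays only; at 44.1 kHz one sample is about 23 µs and real ITDs top out near 700 µs, so a small buffer covers the range; sub-sample accuracy would need a fractional, interpolated delay):

#include <stddef.h>

#define MAX_DELAY 64  /* samples; ~1.45 ms at 44.1 kHz, enough for any ITD */

typedef struct { float line[MAX_DELAY]; size_t pos; } DelayLine; /* zero-init */

static float delay_tick(DelayLine *d, float in, size_t delay /* < MAX_DELAY */)
{
    d->line[d->pos] = in;
    float out = d->line[(d->pos + MAX_DELAY - delay) % MAX_DELAY];
    d->pos = (d->pos + 1) % MAX_DELAY;
    return out;
}

/* Leave the near ear dry; delay only the far ear of interleaved stereo. */
void apply_itd(float *stereo, size_t frames, DelayLine *far, size_t delaySamples)
{
    for (size_t i = 0; i < frames; i++)
        stereo[2 * i + 1] = delay_tick(far, stereo[2 * i + 1], delaySamples);
}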
If the built-in AL isn't going to do it for you, I think you've opted into the "crazy hard" level of difficulty.
Sounds like you should probably be on the coreaudio-api list (lists.apple.com), where Apple's Core Audio engineers hang out.
Another great resource for learning the fundamental basics of DSP and their applications is The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith. It is available online for free at http://www.dspguide.com/ but you can also order a printed copy.
I really like how the author builds up the fundamental theory in a way that is very palatable.
Furthermore, you should check out the Core Audio Public Utility which you'll find at /Developer/Extras/CoreAudio/PublicUtility. It covers a lot of the basic structures you'll need to get in place in order to work with CoreAudio.