I’ve been looking for a solution for this for the past week and still haven’t found it.
My goal is to crossfade between two audio files that are each loaded into a collection view using DidSelectItem. The problem is getting one to stop and the other one to play seamlessly without clicks or pops.
Things I’ve tried:
Github cephalopod library
Multiple AVAudioPlayers
If anyone can point me in the right direction I would appreciate it!
I would look into AVMutableAudioMixInputParameters. Specifically setVolumeRamp(fromStartVolume: toEndVolume: timeRange: CMTimeRange).
You can fade one track's volume down while you fade the other track's up, and that will result in a "seamless" transition between the two.
The clicking or popping I've found can also happen when you're using a compressed audio format like this post explains.. https://forums.macrumors.com/threads/avaudioplayer-avoiding-glitches-when-playing-looped-sounds.640862/
The reason for this is in the way compressed audio is stored compared
to uncompressed audio. Lets say, for example, you have an uncompressed
sound file which is 23600 samples long and you save the file as a .CAF
file which is compressed using AAC which, for simplicity, is
compressed at a ratio of 4:1. Ignoring the file header and (again) for
the sake of simplicity, the data in the file is stored in blocks of
1024 samples. 23600 samples compressed # 4:1 equals 5900 samples, so
you might expect your file would look like this;
block 0 : 1024 samples block 1 : 1024 samples block 2 : 1024 samples
block 3 : 1024 samples block 4 : 1024 samples block 5 : 780 samples
As you can see though, because the actual length of the sound file is
not an exact multiple of 1024 samples, the last block only contains
780 samples. Because variable block lengths are not allowed (not in
any compressed format I know but I could be wrong - it's certainly
true of MP3/AAC) the encoder has to deal with that last block by
padding out the end with silence. Therefore in the actual AAC file
you'll have;
block 0 : 1024 samples block 1 : 1024 samples block 2 : 1024 samples
block 3 : 1024 samples block 4 : 1024 samples block 5 : 780 samples +
244 samples of silence/0
This is fine for non looping sounds but the problem is obvious if you
attempt to play this expecting a seamless loop. Your sound file should
loop at the 5900th sample but because this would be in the middle of a
block of 1024 samples, looping doesn't actually occur until 244
samples later, hence you get a small pause or glitch at the loop
point.
Switching from .mp3 to .wav solved the clicking and popping I was getting between clips.
Related
I have two stereo wav files that I would like to take the left channel of the first audio file and take the right channel of the second audio file and join them into one new wave file.
Here's an image of what I'm trying to do.
I know I can read files into matlab / octave and get the separate left right channels with the code below:
[imported_sig_1, fs_rate, nbitsraw] = wavread(strcat('/tmp/01a.wav'));
imported_sig_L=imported_sig_1(:,1)';
[imported_sig_2, fs_rate, nbitsraw] = wavread(strcat('/tmp/02a.wav'));
imported_sig_R=imported_sig_2(:,2)';
I can then write the new channels that I want out using the code
wavwrite([(imported_sig_L)' (imported_sig_R)'] ,fs_rate,16,'newfile.wav'); %
The problem I'm running into is the time it takes to import the file and size of the array the wave files take up. The files I'm importing are about 1-4 hours long and it takes a while to import and it takes a lot of memory in the array is there away around importing the full file and then exporting them?
I'm using octave 3.8.1 on Ubuntu 14.04 which is like matlab but I also have access to sox
I assume the bottleneck is your hard drive and your system has sufficient memory to keep all thee files in memory at the same time. If so you won't gain an speed reading only one channel. With a 16 bit wav your HDD would have to skip 2 bytes, read 2 bytes, skip 2 bytes, read 2 bytes... For such a read operation it is much faster to copy the full file into the memory and remove the unwanted channels afterwards.
I would like to write a program that will take a video as input, create an output video file, and will (starting after a certain number of frames), begin writing modified frames to the output file frame by frame.
The modification will need to work on individual columns of pixels, one at a time.
Viewing this as a problem to be solved in Matlab, with each frame as a matrix... I cannot think of a way to make this computationally tractable.
I am hoping that someone might be able to offer suggestions on how I might begin to approach this problem.
Here are some details, in case it helps:
I'm interested in transforming a video in the following way:
Viewing a video as a sequence of (MxN) matrices, where each matrix is called a frame:
Take an input video and create new file for output video
For each column V in frame(i) of output video, replace this column by
column V in frame(i + V - N) of the input video.
For example: the new right-most column (column N) of frame(i) will contain column N of frame(i + N - N) = frame(i)... so that there is no replacement. The new 2nd to right-most column (column N-1) of frame(i) will contain column N-1 of [frame(i+N-1-N) = frame(i-1)].
In order to make this work (i.e. in order to not run out of previous frames), this column replacement will start on frame N of the video.
So... This is basically a variable delay running from left to right?
As you say, you do have two ways of going about this:
a) Use lots of memory
b) Use lots of file access
Your memory requirements increase as a cube power of the size of the video - the size of each frame increases, AND the number of previous frames you need to have open or reference increases. I.e. doubling frame size will require 4x memory per frame, and 2x number of frames open.
I think that Matlab's memory management will probably make this hard to do for e.g. a 1080p video, unless you have a pretty high-end workstation. Do you? A quick test-read of a 720p video gives 1.2MB per frame. 1080p would then be approx 5MB per frame, and you would need to have 1920 frames open: approx 10GB needed.
It will be more efficient to load frames individually, if you don't have enough memory - otherwise you will be using pagefiles and that'll be slower than loading frame-by-frame.
Your basic code reading each frame individually could be something like this:
VR=VideoReader('My_Input_Video_Filename.avi');
VW=VideoWriter('My_Output_Video_Filename.avi','MPEG-4');
NumInFrames=get(VR,'NumberOfFrames');
InWidth=get(VR,'Width');
InHeight=get(VR,'Height');
OutFrame=zeros(InHeight,InWidth,3,'uint8');
for (frame=InWidth+1:NumInFrames)
for (subindex=1:InWidth)
CData=read(VR,frame-subindex);
OutFrame(:,subindex,:)=CData(:,subindex,:);
end
writeVideo(VW,OutFrame);
end
This will probably be slow, and I haven't fully checked it works, but it does use a minimum amount of memory.
The best case for minimum file acess is probably using a ring buffer arrangement and the maximum amount of memory, which would look something like this:
VR=VideoReader('My_Input_Video_Filename.avi');
VW=VideoWriter('My_Output_Video_Filename.avi','MPEG-4');
NumInFrames=get(VR,'NumberOfFrames');
InWidth=get(VR,'Width');
InHeight=get(VR,'Height');
CDatas=read(VR,InWidth);
BufferIndex=1;
OutFrame=zeros(InHeight,InWidth,3,'uint8');
for (frame=InWidth+1:NumInFrames)
CDatas(:,:,:,BufferIndex)=read(VR,frame);
tempindices=circshift(1:InWidth,[1,-1*BufferIndex]);
for (subindex=1:InWidth)
OutFrame(:,subindex,:)=CDatas(:,subindex,:,tempindices(subindex));
end
writeVideo(VW,OutFrame);
BufferIndex=mod(BufferIndex+1,InWidth);
end
The buffer indexing code may need some tweaking there, but something along those lines would be a minimum file access, maximum memory use solution.
For a given PC with more or less memory, you can implement somewhere in between these two as a solution (i.e. reading somewhere between 1 and all frames per iteration) as a best-case.
Matlab will be quite slow for this kind of task, but it will be a good way of getting your algorithm right and working out indexing bugs and that kind of thing. Converting to a compiled language would give a good increase in speed - I converted a Matlab script to a C# program in a couple of hours, and gave a 10x increase in speed over an optimised script where the time taken was in the number of file reads.
Hope this helps, good luck!
I'm not sure if it's possible to achieve what I want, but basically I have a NSDictionary which represents a recording. It's a timeline of what sound id was played at what point in time.
I have it so that you can play back this timeline/recording, and it works perfectly.
I'm wondering if there is anyway to take this timeline, and export it as a single sound that could be saved to a computer if the device was synced with iTunes.
So basically I'm asking if I can take a timeline of sounds, play it back and have these sounds stitched together as a single sound, that can then be exported.
I'm using OpenAL as my sound framework and the sound files are all CAFs.
Any help or guidance is appreciated.
Thanks!
You will need:
A good understanding of linear PCM audio format (See Wikipedia's Linear PCM page).
A good understanding of audio sample-rates and some basic maths to convert your timings into sample-offsets.
An awareness of how two's-complement binary numbers (signed/unsigned, 16-bit, 32-bit, etc.) are stored in computers, and how the endian-ness of a processor affects this.
Patience, interest in learning, and a strong desire to get this working.
Here's what to do:
Enable file sharing in your app (UIFileSharingEnabled=YES in info.plist and write files to /Documents directory).
Render the used sounds into memory buffers containing linear PCM audio data (if they are not already, i.e. if they are compressed). You can do this using the offline rendering functionality of Audio Queues (see Apple audio queue docs). It will make things a lot easier if you render them all to the same PCM format and sample rate (For example 16-bit signed samples #44,100Hz, I'll use this format for all examples), and use the same format for your output. I recommend starting off with a Mono format then adding stereo once you get it working.
Choose an uncompressed output format and mix your sounds into a single stream:
3.1. Allocate a buffer large enough, or open a file stream to write to.
3.2. Write out any headers (for example if using WAV format output instead of raw PCM) and write zeros (or the mid-point of your sample range if not using a signed sample format) for any initial silence before your first sound starts. For example if you want 0.1 seconds silence before your first sound, write 4410 (0.1 * 44100) zero-samples i.e. write 4410 shorts (16-bit) all with zero.
3.3. Now keep track of all 'currently playing' sounds and mix them together. Start with an empty list of 'currently playing sounds and keep track of the 'current time' of the sample you are mixing, for each sample you write out increment the 'current time' by 1.0/sample_rate. When it gets time for another sound to start, add it to the 'currently playing' list with a sample offset of 0. Now to do the mixing, you iterate through all of the 'currently playing' sounds and add together their current sample, then increment the sample offset for each of them. Write the summed value into the output buffer. For example if soundA starts at 0.1 seconds (after the silence) and soundB starts at 0.2 seconds, you will be doing the equivalent of output[8820] = soundA[4410] + soundB[0]; for sample 8820 and then output[8821] = soundA[4411] + soundB[1]; for sample 8821, etc. As a sound ends (you get to the end of its samples) simply remove it from the 'currently playing' list and keep going until the end of your audio data.
3.4. The simple mixing (sum of samples) described above does have some problems. For example if two samples have values that add up to a number larger than 32767, this cannot be stored in a signed-16-bit number, this is called clipping. For now, just clamp the value to 32767, and get it working... later on come back and implement a simple limiter (see description at end).
Now that you have a mixed version of your track in an uncompressed linear PCM format, that might be enough, so write it to /Documents. If you want to write it in a compressed format, you will need to get the source for an audio encoder and run your linear PCM output through that.
Simple limiter:
Let's choose to limit the top 10% of the sample range, so if the absolute value is greater than 29490 (int limitBegin = (int)(32767 * 0.9f);) we will scale down the value. The maximum possible peak would be int maxSampleValue = 32767 * numPlayingSounds; and we want to scale values above limitBegin to peak at 32767. So do the summation into sampleValue as per the very simple mixer described above, then:
if(sampleValue > limitBegin)
{
float overLimit = (sampleValue - limitBegin) / (float)(maxSampleValue - limitBegin);
sampleValue = limitBegin + (int)(overLimit * (32767 - limitBegin));
}
If you're paying attention, you will have noticed that when numPlayingSounds changes (for example when a new sound starts), the limiter becomes more (or less) harsh and this may result in abrupt volume changes (within the limited range) to accommodate the extra sound. You can use the maximum number of playing sounds instead, or devise some clever way to ramp up the limiter over a few milliseconds.
Remember that this is operating on the absolute value of sampleValue (which may be negative in signed formats), so the code here is just to demonstrate the idea. You'll need to write it properly to handle limiting at both ends (peak and trough) of your sample range. Also, there are some tricks you can do to optimize all of the above during the mixing - you will probably spot these while you're writing the mixer, be careful and get it working first, then go back and refactor/optimize if needed.
Also remember to consider the endian-ness of the platform you are using and the file-format you are writing to, as you may need to do some byte-swapping.
One approach which isn't too hard if your files are stored in a simple format is just to combine them together manually. That is, create a new file with the caf format and manually put together the pieces you want.
This will be really easy if the sounds are uncompressed (linear PCM). But, read the documents on the caf file format here:
http://developer.apple.com/library/mac/#documentation/MusicAudio/Reference/CAFSpec/CAF_spec/CAF_spec.html#//apple_ref/doc/uid/TP40001862-CH210-SW1
I'm making an Iphone game, we need to use a compressed format for sound, and we want to be able to loop SEAMLESSLY back to a specific sample in the audio file (so there is an intro, then it loops back to an offset)
currently THE ONLY export process I have found that will allow seamless looping (reports the right priming and padding frame numbers, no clicking when looping ect) is using apple's afconvert to a aac format in a caf file.
but when we try and encode to lower bitrates, it automatically re samples the sound! we do NOT want to have the sound re sampled, every other encoder I have encountered has an option to set the output sample rate, but I can't find it for this one.
on another note, if anyone has had any luck with seamless looping of a compressed file format using audio queues, let me know.
currently I'm working off the information found at:
http://developer.apple.com/mac/library/qa/qa2009/qa1636.html
note that this DID work PERFECTLY when I left the bitrate for the encode at default (~128kbs) but when I set it to 32kbps - with the -b option - it resampled, and looping clicks now.
It needs to be at least 48kbps. 32kbps will downsample to a lower sample rate.
I think you are confusing sample rate (typical values: 32kHz, 44.1kHz, 48kHz) and bit rate (typical values: 128kbps, 160kbps, 192kbps).
For a bit rate, 32kbps is extremely low. Sound will have bad quality at this bit rate. You probably intended to set the sample rate to 32kHz instead, which is also not outright typical, but makes more sense.
When compressing to AAC and uncompressing back to WAV, you will not get the same audio file back, because in AAC, the audio data is represented in a completely different format than in raw wave. E.g. you can have shifts by few microseconds, which are necessary to convert to the compressed format. You can not completely get around this with any highly compressed format.
The clicking sound originates from the sudden change between two samples which are played in direct succession. This is likely taking place because the offset to which you jump back in your loop does not end up to be at exactly the same position in the AAC file as it was in the WAV file (as explained above, there can shifts by microseconds).
You will not get around these slight changes when compressing. Instead, you have to compensate for them after compression by adjusting the offset. That means you have to open the compressed sound file in an audio editor, e.g. Audacity, and manually find another offset close to the original one, which is suitable for looping.
How to find an offset which is suitable for looping?
Zoom in to the waveform's end. Look at how the waveform looks there. Then zoom in to the waveform at the original offset and search in its neighbourhood for an offset at which the waveform connects seamlessly to the end of the waveform.
For an example how this shoud look like, open the uncompressed audio file in the audio editor and examine the end of the waveform and the offset there.
I need to know if it is possible to create a 30 second sample MP3 from a WAV file. The generated MP3 file must feature a fade at the start and end.
Currently using ffmpeg, but can not find any documentation that would support being able to do such a thing.
Could someone please provide me the name of software (CLI, *nix only) that could achieve this?
This will
trim out from Position 45 sec. the next 30 seconds (0:45.0 30) and
fade the first 5 seconds (0:5) and the last 5 seconds (0 0:5) and
convert from wav to mp3
sox infile.wav outfile.mp3 trim 0:45.0 30 fade h 0:5 0 0:5
Check out SoX - Sound eXchange
I have not used it myself but one of my friends speaks highly of it.
From web page (highlighted my me):
SoX is a cross-platform (Windows,
Linux, MacOS X, etc.) command line
utility that can convert various
formats of computer audio files in to
other formats. It can also apply
various effects to these sound files,
and, as an added bonus, SoX can play
and record audio files on most
platforms.
The best way to do this is to apply the 30-second truncation, fade in and fade out to the WAV audio data before converting it to an MP3. If your conversion library has a method that takes an array of samples, this is very easy to do. If the method only accepts a WAV file (either in-memory or on disk), then this is slightly less easy as you have to learn the WAV file format (which is easy to write but somewhat more difficult to read). Either way, applying gain and/or attenuation to time-domain sample data (as in a WAV file) is much easier than trying to apply these effects to frequency-domain data (as in an MP3 file).
Of course, if your conversion library already does all this, it's best to just use that and not worry about it yourself.