I'm having a hair-pulling moment here, trying to adapt the MixerHost sample code to be more memory efficient using a circular buffer. But the problem seems to occur simply when I change the code from reading the entire Audio File to reading just a chunk of 4K bytes. The Audio File is indeed compressed, so the clientFormat is LPCM and there is an implicit conversion that takes place, just as in the sample code. But when I read smaller chunks, the data that is collected into the bufferList (AudioBufferList *) seems to be different. And it depends on the number_of_frames parameter in the ExtAudioFileRead() call:
ExtAudioFileRead (
audioFileObject,
&numberOfFramesToRead, // <- this set to chunk size in bytes
bufferList // <- contains 2 buffers, 1 chan each for L, R
);
Q: Is the number of frames to read supposed to mean that number in the OUTPUT Format? So, if I specify 1024 frames, I'll get 1024 L,R samples read to simple LPCM format?
Q: Why would I get different results if the number of frames per read is different?
SDK: iOS 5.1
If for example I have two video files, both of similar characteristics, file type, encoding, resolution, etc and starting at the same point but A goes on for 10 seconds while B goes on for 20. If A's file size is 10MB and B's is 20MB, if I read in e.g. the first 5MB from both will the major video encoding formats' binary sequences match for that 5MB?
E.G. MP4, AVI, MOV, WMV?
No, different containers work differently, the first X bytes will not contain the same number of frames. In some cases like mp4, you may get audio, or metadata, and no video at all, or you may get bytes that can not be interpreted without information that comes later in the file.
I am making a simple PNG image from scratch. I already have the scanline data for it. Now I want to turn it into a zlib stream without compressing it. How can I do that? I have read the "ZLIB Compressed Data Format Specification version 3.3" at "https://www.ietf.org/rfc/rfc1950.txt" but I still don't understand it. Could someone give me a hint about how to set the bytes in a zlib stream?
Thanks in advance!
As mentioned in RFC1950, the details of the compression algorithm are described in another RFC: DEFLATE Compressed Data Format Specification version 1.3 (RFC1951).
There we find
3.2.3. Details of block format
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
Note that the header bits do not necessarily begin on a byte
boundary, since a block does not necessarily occupy an integral
number of bytes.
BFINAL is set if and only if this is the last block of the data
set.
BTYPE specifies how the data are compressed, as follows:
00 - no compression
[... a few other types]
which is the one you wanted. These 2 bits BTYPE, in combination with the last-block marker BFINAL, are all you need to write "uncompressed" zlib-compatible data:
3.2.4. Non-compressed blocks (BTYPE=00)
Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information:
  0   1   2   3   4...
+---+---+---+---+================================+
|  LEN  | NLEN  |... LEN bytes of literal data...|
+---+---+---+---+================================+
LEN is the number of data bytes in the block. NLEN is the
one's complement of LEN.
So, the pseudo-algorithm is:
set the initial 2 bytes to 78 9c ("default compression").
for every block of 32768 or less bytesᵃ
if it's the last block, write 01, else write 00
... write [block length] [COMP(block length)]ᵇ
... write the immediate data
repeat until all data is written.
Don't forget to add the Adler-32 checksum at the end of the compressed data, in big-endian order, after 'compressing' it this way. The Adler-32 checksum is computed over the uncompressed, original data. In the case of PNG images, that data has already been processed by its PNG filters and has a filter-type byte prepended to each row; that is "the" data that gets compressed by this FLATE-compatible algorithm.
ᵃ This is a value that happened to be convenient for me at the time; it ought to be safe to write blocks as large as 65535 bytes (just don't try to cross that line).
ᵇ Both as words with the low byte first, then high byte. It is briefly mentioned in the introduction.
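The pseudo-algorithm above can be sketched in Python roughly as follows. This is only an illustration, not a reference implementation: the function name zlib_store and the block_size parameter are made up for this example, and Python's own zlib module is used solely for the Adler-32 checksum and for checking the result.

```python
import struct
import zlib


def zlib_store(data: bytes, block_size: int = 65535) -> bytes:
    """Wrap data in a zlib stream made only of stored (BTYPE=00) blocks.

    zlib_store and block_size are hypothetical names for this sketch.
    """
    if not 1 <= block_size <= 65535:
        raise ValueError("stored blocks hold between 1 and 65535 bytes")

    out = bytearray(b"\x78\x9c")  # the 2-byte zlib header from the answer

    if not data:
        # An empty input still needs one final, zero-length stored block.
        out += b"\x01" + struct.pack("<HH", 0, 0xFFFF)

    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        is_last = start + block_size >= len(data)
        # Header byte: BFINAL in the lowest bit, BTYPE=00, padding bits zero.
        out.append(0x01 if is_last else 0x00)
        # LEN and its one's complement NLEN, both low byte first.
        out += struct.pack("<HH", len(chunk), len(chunk) ^ 0xFFFF)
        out += chunk

    # Adler-32 of the original data, big-endian.
    out += struct.pack(">I", zlib.adler32(data) & 0xFFFFFFFF)
    return bytes(out)


if __name__ == "__main__":
    payload = b"filtered scanline bytes would go here" * 5000
    stream = zlib_store(payload)
    print(len(payload), len(stream), zlib.decompress(stream) == payload)
```

For a PNG, the bytes passed in would be the filtered scanlines, and the returned stream would become the contents of the IDAT chunk(s).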
I’ve been looking for a solution for this for the past week and still haven’t found it.
My goal is to crossfade between two audio files that are each loaded into a collection view using DidSelectItem. The problem is getting one to stop and the other one to play seamlessly without clicks or pops.
Things I’ve tried:
Github cephalopod library
Multiple AVAudioPlayers
If anyone can point me in the right direction I would appreciate it!
I would look into AVMutableAudioMixInputParameters. Specifically setVolumeRamp(fromStartVolume:toEndVolume:timeRange:).
You can fade one track's volume down while you fade the other track's up, and that will result in a "seamless" transition between the two.
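To illustrate the idea of the two opposite ramps (this is not AVFoundation code, just the arithmetic on raw sample lists), here is a small Python sketch. The function name crossfade and the choice of a linear ramp are assumptions made for the example.

```python
def crossfade(outgoing, incoming, fade_len):
    """Overlap the tail of `outgoing` with the head of `incoming`.

    Hypothetical helper: the outgoing gain ramps 1 -> 0 while the incoming
    gain ramps 0 -> 1 over `fade_len` samples (linear ramp assumed).
    """
    if fade_len < 1 or fade_len > min(len(outgoing), len(incoming)):
        raise ValueError("fade_len must fit inside both clips")

    mixed = []
    tail = outgoing[-fade_len:]
    for k in range(fade_len):
        # t runs from 0.0 to 1.0 across the overlap region.
        t = k / (fade_len - 1) if fade_len > 1 else 1.0
        mixed.append((1.0 - t) * tail[k] + t * incoming[k])

    # Untouched start of the first clip, the overlap, then the rest of the second.
    return outgoing[:-fade_len] + mixed + incoming[fade_len:]


if __name__ == "__main__":
    a = [1.0] * 6
    b = [0.0] * 6
    print(crossfade(a, b, 4))
```

Because the two gains always sum to 1, there is no sudden jump in level at either end of the overlap, which is what avoids the click.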
The clicking or popping I've found can also happen when you're using a compressed audio format, as this post explains: https://forums.macrumors.com/threads/avaudioplayer-avoiding-glitches-when-playing-looped-sounds.640862/
The reason for this is in the way compressed audio is stored compared
to uncompressed audio. Let's say, for example, you have an uncompressed
sound file which is 23600 samples long and you save the file as a .CAF
file which is compressed using AAC which, for simplicity, is
compressed at a ratio of 4:1. Ignoring the file header and (again) for
the sake of simplicity, the data in the file is stored in blocks of
1024 samples. 23600 samples compressed at 4:1 equals 5900 samples, so
you might expect your file would look like this;
block 0 : 1024 samples
block 1 : 1024 samples
block 2 : 1024 samples
block 3 : 1024 samples
block 4 : 1024 samples
block 5 : 780 samples
As you can see though, because the actual length of the sound file is
not an exact multiple of 1024 samples, the last block only contains
780 samples. Because variable block lengths are not allowed (not in
any compressed format I know but I could be wrong - it's certainly
true of MP3/AAC) the encoder has to deal with that last block by
padding out the end with silence. Therefore in the actual AAC file
you'll have;
block 0 : 1024 samples
block 1 : 1024 samples
block 2 : 1024 samples
block 3 : 1024 samples
block 4 : 1024 samples
block 5 : 780 samples + 244 samples of silence/0
This is fine for non looping sounds but the problem is obvious if you
attempt to play this expecting a seamless loop. Your sound file should
loop at the 5900th sample but because this would be in the middle of a
block of 1024 samples, looping doesn't actually occur until 244
samples later, hence you get a small pause or glitch at the loop
point.
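The block arithmetic in the quoted post can be checked with a few lines of Python. The function name padded_length is made up for this sketch, and the fixed block size of 1024 is the simplification used in the quote.

```python
import math


def padded_length(num_samples, block=1024):
    """Return (blocks, padding) for fixed-size blocks (hypothetical helper)."""
    blocks = math.ceil(num_samples / block)
    # Samples of silence needed to fill the last block.
    padding = blocks * block - num_samples
    return blocks, padding


if __name__ == "__main__":
    blocks, padding = padded_length(5900)
    print(blocks, padding)
```

With the numbers from the quote, 5900 samples need 6 blocks, and the last one is filled out with 244 samples of silence.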
Switching from .mp3 to .wav solved the clicking and popping I was getting between clips.
I would like to write a program that will take a video as input, create an output video file, and will (starting after a certain number of frames), begin writing modified frames to the output file frame by frame.
The modification will need to work on individual columns of pixels, one at a time.
Viewing this as a problem to be solved in Matlab, with each frame as a matrix... I cannot think of a way to make this computationally tractable.
I am hoping that someone might be able to offer suggestions on how I might begin to approach this problem.
Here are some details, in case it helps:
I'm interested in transforming a video in the following way:
Viewing a video as a sequence of (MxN) matrices, where each matrix is called a frame:
Take an input video and create new file for output video
For each column V in frame(i) of output video, replace this column by
column V in frame(i + V - N) of the input video.
For example: the new right-most column (column N) of frame(i) will contain column N of frame(i + N - N) = frame(i)... so that there is no replacement. The new 2nd to right-most column (column N-1) of frame(i) will contain column N-1 of [frame(i+N-1-N) = frame(i-1)].
In order to make this work (i.e. in order to not run out of previous frames), this column replacement will start on frame N of the video.
So... This is basically a variable delay running from left to right?
As you say, you do have two ways of going about this:
a) Use lots of memory
b) Use lots of file access
Your memory requirements scale with the cube of the video's linear resolution: the size of each frame increases, AND the number of previous frames you need to have open or reference increases. I.e. doubling the frame width and height requires 4x the memory per frame and 2x the number of frames open, so 8x overall.
I think that Matlab's memory management will probably make this hard to do for e.g. a 1080p video, unless you have a pretty high-end workstation. Do you? A quick test-read of a 720p video gives 1.2MB per frame. 1080p would then be approx 5MB per frame, and you would need to have 1920 frames open: approx 10GB needed.
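As a rough cross-check of those estimates, here is the arithmetic in Python, assuming uncompressed 8-bit RGB frames (3 bytes per pixel) and one buffered frame per pixel column. The function name buffer_bytes is made up for this sketch. Under that assumption the totals come out somewhat higher than the figures from the test read above, but in the same ballpark.

```python
def buffer_bytes(width, height, channels=3, bytes_per_sample=1):
    """Return (bytes per frame, bytes for `width` buffered frames).

    Hypothetical helper; assumes uncompressed uint8 RGB frames.
    """
    frame = width * height * channels * bytes_per_sample
    # One previous frame is needed per pixel column.
    return frame, frame * width


if __name__ == "__main__":
    for w, h in [(1280, 720), (1920, 1080)]:
        frame, total = buffer_bytes(w, h)
        print(f"{w}x{h}: {frame / 1e6:.2f} MB per frame, {total / 1e9:.2f} GB buffered")
```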
It will be more efficient to load frames individually, if you don't have enough memory - otherwise you will be using pagefiles and that'll be slower than loading frame-by-frame.
Your basic code reading each frame individually could be something like this:
VR=VideoReader('My_Input_Video_Filename.avi');
VW=VideoWriter('My_Output_Video_Filename.avi','MPEG-4');
open(VW);  % the writer must be opened before writeVideo is called
NumInFrames=get(VR,'NumberOfFrames');
InWidth=get(VR,'Width');
InHeight=get(VR,'Height');
OutFrame=zeros(InHeight,InWidth,3,'uint8');
for frame=InWidth+1:NumInFrames
    for subindex=1:InWidth
        % column V of output frame i comes from input frame i+V-N
        CData=read(VR,frame+subindex-InWidth);
        OutFrame(:,subindex,:)=CData(:,subindex,:);
    end
    writeVideo(VW,OutFrame);
end
close(VW);  % finalize the output file
This will probably be slow, and I haven't fully checked it works, but it does use a minimum amount of memory.
The best case for minimum file access is probably using a ring buffer arrangement and the maximum amount of memory, which would look something like this:
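VR=VideoReader('My_Input_Video_Filename.avi');
VW=VideoWriter('My_Output_Video_Filename.avi','MPEG-4');
open(VW);  % the writer must be opened before writeVideo is called
NumInFrames=get(VR,'NumberOfFrames');
InWidth=get(VR,'Width');
InHeight=get(VR,'Height');
CDatas=read(VR,[1 InWidth]);  % preload frames 1..InWidth as a 4-D ring buffer
BufferIndex=1;  % slot holding the oldest frame, overwritten next
OutFrame=zeros(InHeight,InWidth,3,'uint8');
for frame=InWidth+1:NumInFrames
    CDatas(:,:,:,BufferIndex)=read(VR,frame);  % newest frame replaces the oldest
    tempindices=circshift(1:InWidth,[0,-BufferIndex]);  % slots ordered oldest to newest
    for subindex=1:InWidth
        OutFrame(:,subindex,:)=CDatas(:,subindex,:,tempindices(subindex));
    end
    writeVideo(VW,OutFrame);
    BufferIndex=mod(BufferIndex,InWidth)+1;  % advance, staying within 1..InWidth
end
close(VW);  % finalize the output file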
The buffer indexing code may need some tweaking there, but something along those lines would be a minimum file access, maximum memory use solution.
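One way to work out the ring-buffer indexing before touching real video is to simulate it with frame numbers only. The Python sketch below does that. The function name delayed_sources and the 0-based slot layout are assumptions of this example, and it checks the rule from the question (column V of output frame i comes from input frame i + V - N).

```python
def delayed_sources(width, num_frames):
    """Simulate the ring buffer using frame numbers instead of pixel data.

    Hypothetical helper. Returns {output frame: [source frame per column]},
    with frames numbered from 1 and columns listed left to right.
    """
    ring = list(range(1, width + 1))  # slots preloaded with frames 1..width
    newest = width - 1                # 0-based slot of the most recent frame

    def columns():
        # The slot after `newest` is the oldest; walk from oldest to newest.
        return [ring[(newest + 1 + v) % width] for v in range(width)]

    result = {width: columns()}
    for frame in range(width + 1, num_frames + 1):
        newest = (newest + 1) % width  # overwrite the oldest slot
        ring[newest] = frame
        result[frame] = columns()
    return result


if __name__ == "__main__":
    for frame, sources in delayed_sources(4, 7).items():
        print(frame, sources)
```

Each printed row lists, left to right, which input frame feeds each column, so the right-most entry should always equal the output frame number itself.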
For a given PC with more or less memory, you can implement somewhere in between these two as a solution (i.e. reading somewhere between 1 and all frames per iteration) as a best-case.
Matlab will be quite slow for this kind of task, but it will be a good way of getting your algorithm right and working out indexing bugs and that kind of thing. Converting to a compiled language would give a good increase in speed - I converted a Matlab script to a C# program in a couple of hours, which gave a 10x increase in speed over an optimised script whose run time was dominated by the number of file reads.
Hope this helps, good luck!
As far as I know when I load wav files to matlab with command:
song = wavread('file.wav');
array song have elements with values from -1 to 1. This file (and hardware) is prepared to be played with 80dB. I need to add +30dB to achieve 110dB.
I do +10dB by multiplying by sqrt(10), so to get +30dB I do:
song = song*10*sqrt(10); which is the same as
song = song*sqrt(10)*sqrt(10)*sqrt(10);
Now the values in the array song are far outside the range -1 to 1, and I hear distorted sound.
Is that because of these values outside <-1,1>, or because of the quality of my speakers/headphones?
The distortion is because your values are exceeding +/-1. The float values are converted to DAC counts, whose full-scale range is +/-32768 (for a 16-bit DAC), +/-8388608 (for a right-justified 24-bit DAC), or +/-2147483648 (for a left-justified 24-bit DAC). For a 16-bit DAC, this is usually accomplished by an operation like dacSample = (short int)(32768.0*floatSample); in C. If floatSample is > +1 or < -1, the result no longer fits in the short int and typically wraps around, which is the distortion you hear. The cast is necessary because the DAC expects 16-bit digital samples.
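The numbers can be tried out in a short Python sketch. The helper names are made up, and the wraparound is modelled explicitly as 16-bit two's-complement arithmetic; what a real playback path does with out-of-range samples (wrap or clip) depends on the software and hardware involved, so treat this only as an illustration of the effect described above.

```python
import math


def db_to_amplitude(db):
    """Amplitude factor for a gain in dB (20*log10 convention)."""
    return 10 ** (db / 20)


def to_int16_wrapped(x):
    """Scale a float sample to 16 bits, wrapping around on overflow."""
    raw = int(32768.0 * x)  # truncates toward zero, like a C cast
    return (raw + 32768) % 65536 - 32768


def to_int16_clipped(x):
    """Same scaling, but saturating at the 16-bit limits instead."""
    return max(-32768, min(32767, int(32768.0 * x)))


if __name__ == "__main__":
    gain = db_to_amplitude(30)  # same factor as 10*sqrt(10)
    print(gain, 10 * math.sqrt(10))
    for sample in (0.01, 0.02, 0.5):
        boosted = sample * gain
        print(sample, boosted, to_int16_wrapped(boosted), to_int16_clipped(boosted))
```

Only samples whose boosted value stays within +/-1 survive intact. With a 30 dB boost that means original samples below roughly 0.03 in magnitude.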
You will need to adjust your amplifier/speaker settings to get the sound level you desire.
Conversely, you could create a copy of your file, lower it by 30 dB, adjust your amplifier/speakers to play the new file at 80 dB, then play the original file at the same amp/speaker settings. This will cause the original file to be played at 110 dB.
As Paul R noted in his comment, I am guessing here that you are using dB as shorthand for dB SPL when referring to the actual analog sound level produced by the full signal chain.