Streaming x264 with packet loss - encoding

I am writing a program that uses x264 as the encoder.
I use the following parameters:
av_opt_set(codecContextH264[numberCoder]->priv_data, "profile", "baseline", 0);
av_opt_set(codecContextH264[numberCoder]->priv_data, "preset", "ultrafast", 0);
av_opt_set(codecContextH264[numberCoder]->priv_data, "tune", "zerolatency", 0);
codecContextH264[numberCoder]->bit_rate = bitrate;
codecContextH264[numberCoder]->bit_rate_tolerance = bitrate - 5000;
codecContextH264[numberCoder]->width = w;
codecContextH264[numberCoder]->height = h;
codecContextH264[numberCoder]->time_base.den = fps;
codecContextH264[numberCoder]->time_base.num = 1;
codecContextH264[numberCoder]->pix_fmt = PIX_FMT_YUV420P;
codecContextH264[numberCoder]->gop_size = fps * 3;
codecContextH264[numberCoder]->keyint_min = fps * 3;
codecContextH264[numberCoder]->max_b_frames = 0;
codecContextH264[numberCoder]->slices = (int)(w * h) / 1500 + 1;
I use only I and P frames.
Which x264 settings should I use so that the stream survives the loss of P frames?
Or does x264 have no such option?
I read that with the "baseline" profile it is possible to lose P frames...
Please help.

You can try setting gop_size and keyint_min to 0. That should result in a stream with only I frames, but it largely defeats the purpose of compression.
The rest of this answer assumes that you are using RTP over UDP. If you are streaming in an environment where packet loss is high, why not use TCP, or implement some kind of feedback mechanism in which the receiver, on seeing that RTP sequence numbers are missing, makes the source issue a new keyframe?
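The receiver side of that feedback idea can be sketched in plain C. This is only a minimal sketch: RtpLossDetector, rtp_packets_lost and rtp_should_request_keyframe are hypothetical names made up for illustration, and how the request actually travels back to the sender is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical receiver-side state; the names are illustrative only. */
typedef struct {
    bool     have_last;   /* false until the first packet has been seen */
    uint16_t last_seq;    /* sequence number of the newest packet seen  */
} RtpLossDetector;

/* Returns how many packets are missing between the newest packet seen so
 * far and `seq`. RTP sequence numbers are 16 bits wide and wrap around,
 * so the difference is taken modulo 65536. Duplicated or late (older)
 * packets return 0 and do not move the state. */
static unsigned rtp_packets_lost(RtpLossDetector *d, uint16_t seq)
{
    assert(d != NULL);
    if (!d->have_last) {
        d->have_last = true;
        d->last_seq = seq;
        return 0;
    }
    uint16_t delta = (uint16_t)(seq - d->last_seq);
    if (delta == 0 || delta >= 0x8000) {
        return 0;                /* duplicate or reordered packet */
    }
    d->last_seq = seq;
    return (unsigned)delta - 1;  /* delta == 1 means nothing was lost */
}

/* The receiver would ask the sender for a keyframe whenever a gap shows up. */
static bool rtp_should_request_keyframe(RtpLossDetector *d, uint16_t seq)
{
    return rtp_packets_lost(d, seq) > 0;
}
```

On the sender side, one option with libavcodec is to set pict_type to AV_PICTURE_TYPE_I on the next frame handed to the encoder, which should make it emit a keyframe there; treat that as something to verify against your FFmpeg version.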


iPhone 4S, OpenGL ES seems too slow. What's wrong?

I draw 2560 very slim polygons for each frame on an iPhone 4S using OpenGL ES. The problem is that I'm getting framerates around 30, which is not smooth enough for my taste. I think it should be faster than that.
Is that right?
Please help me find out what can be improved.
UPDATE: I do the rendering on the main thread. Are there any recommendations on which thread to perform the rendering operations?
A bit of background:
I'm trying to make a smoothly scrolling (target is 60 FPS) waveform of size 320x200 in iPhone view coordinates, so 640x400 pixels on a retina display.
My test device is an iPhone 4S. With iOS 6 and 6.1, I could achieve this easily with normal UIKit drawing operations. However, since I updated the device to iOS 7, it got much slower, so I decided to use OpenGL ES, because I read lots of times that it allows faster 2D drawing.
I implemented drawing the waveform with OpenGL ES 2.0, but on the device it's only slightly faster than with UIKit. And like with UIKit, the speed depends greatly on the number of pixels being drawn, which makes me wonder what's going on.
The waveform is composed of bars/rectangles, each exactly 1 pixel wide. I draw two bars per pixel column, and each bar consists of two polygons, which means I draw 1280 bars, or 2560 polygons, for each frame. The polygons are extremely slim: each of them is at most 1 pixel wide. I think this should be no problem to draw at 60 FPS with OpenGL ES.
I draw one bar like this:
- (void) glFillRect: (Float32)x0 : (Float32)y0 : (Float32)x1 : (Float32)y1 {
    GLfloat vertices[8];
    // Client-side array: the data is read when glDrawArrays is called
    glVertexAttribPointer(GLKVertexAttribPosition, 2, GL_FLOAT, GL_FALSE, 0, vertices);
    GLfloat* vp = vertices;
    *vp++ = x0; *vp++ = y0;
    *vp++ = x1; *vp++ = y0;
    *vp++ = x0; *vp++ = y1;
    *vp++ = x1; *vp++ = y1;
    // One draw call per bar
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
The code calling the above method is below. _maxDrawing and _avgDrawing are my effects, which are composed like this at app startup time:
_maxDrawing = [[GLKBaseEffect alloc] init];
_maxDrawing.useConstantColor = GL_TRUE;
_maxDrawing.constantColor = GLKVector4Make(0.075f, 0.1f, 0.25f, 1.0f);
I later adjust the projection matrix so that my drawing coordinates for OpenGL ES line up with the view coordinates of my view, which, afaik, is the standard way to go for 2D drawing.
[_maxDrawing prepareToDraw];
x_Cu = [self transformViewXToWaveformX:rect.origin.x];
for (Float32 x_Vu = rect.origin.x; x_Vu < viewEndX_Vu; x_Vu += onePixelInViewUnits) {
    x_Cu += onePixelInContentUnits;
    if (x_Cu < 0 || x_Cu >= waveformEndX_Cu) {
        // Assumed: columns outside the waveform are skipped (this body is missing from the post)
        continue;
    }
    SInt64 frameIdx = (SInt64) x_Cu;
    CBWaveformElement element;
    element = [self.dataSource getElementContainingFrame:frameIdx];
    prevMax = curMax;
    curMax = futureMax;
    futureMax = element.max;
    smoothMax = prevMax * 0.25 + curMax * 0.5 + futureMax * 0.25;
    if (smoothMax < curMax)
        smoothMax = curMax;
    Float32 barHeightHalf = smoothMax * heightScaleHalf;
    Float32 barY0 = viewHeightHalf - barHeightHalf;
    Float32 barY1 = viewHeightHalf + barHeightHalf;
    [self glFillRect: x_Vu : barY0 : x_Vu + onePixelInViewUnits : barY1];
}
[_avgDrawing prepareToDraw];
x_Cu = [self transformViewXToWaveformX:rect.origin.x];
for (Float32 x_Vu = rect.origin.x; x_Vu < viewEndX_Vu; x_Vu += onePixelInViewUnits) {
    x_Cu += onePixelInContentUnits;
    if (x_Cu < 0 || x_Cu >= waveformEndX_Cu) {
        // Assumed: columns outside the waveform are skipped (this body is missing from the post)
        continue;
    }
    SInt64 frameIdx = (SInt64) x_Cu;
    CBWaveformElement element;
    element = [self.dataSource getElementContainingFrame:frameIdx];
    Float32 barHeightHalf = element.avg * heightScaleHalf;
    Float32 barY0 = viewHeightHalf - barHeightHalf;
    Float32 barY1 = viewHeightHalf + barHeightHalf;
    [self glFillRect: x_Vu : barY0 : x_Vu + onePixelInViewUnits : barY1];
}
When I take out all the OpenGL calls, the execution duration for one frame is around 1ms, which means it could theoretically go up to 1000 FPS. All other time (around 33ms) is spent drawing.
Per Daniel's request, I'm posting this as an answer to close the question out.
In the above code, it appears that you're issuing one glDrawArrays() call for each box. With a lot of boxes, this incurs a significant amount of overhead.
A more efficient way to approach this would be to use a VBO (probably a dynamically updated one) containing all the vertices of your scene, or at least a larger group of the boxes, and to draw all of those with a single call.
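To make the batching idea concrete, here is a minimal sketch of the CPU side in plain C. The helper name append_rect and the constant FLOATS_PER_RECT are made up for illustration. Each bar is written as two independent triangles rather than a strip, because a single strip would connect neighboring bars to each other.

```c
#include <assert.h>
#include <stddef.h>

enum { FLOATS_PER_RECT = 12 };   /* 2 triangles * 3 vertices * (x, y) */

/* Appends one axis-aligned rectangle as two triangles to `verts`, which
 * already holds `rect_count` rectangles, and returns the new count.
 * The caller guarantees room for FLOATS_PER_RECT more floats. */
static size_t append_rect(float *verts, size_t rect_count,
                          float x0, float y0, float x1, float y1)
{
    assert(verts != NULL);
    float *vp = verts + rect_count * FLOATS_PER_RECT;
    /* first triangle */
    *vp++ = x0; *vp++ = y0;
    *vp++ = x1; *vp++ = y0;
    *vp++ = x0; *vp++ = y1;
    /* second triangle */
    *vp++ = x1; *vp++ = y0;
    *vp++ = x1; *vp++ = y1;
    *vp++ = x0; *vp++ = y1;
    return rect_count + 1;
}

/* After the loop has appended every bar, the whole batch would be drawn
 * with one call instead of one call per bar, for example:
 *
 *   glVertexAttribPointer(GLKVertexAttribPosition, 2, GL_FLOAT, GL_FALSE, 0, verts);
 *   glDrawArrays(GL_TRIANGLES, 0, (GLsizei)(rect_count * 6));
 */
```

For the 1280 bars in the question, the array would need 1280 * 12 floats. Uploading it into a VBO with glBufferData each frame instead of passing a client-side pointer is the next step the answer describes.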
As rickster points out, iOS 7 adds some nice support for instancing, which could also be a help here.
Regarding whether or not to render on a background thread, in my experience I've usually seen significant performance boosts (10-40%, particularly on the multicore devices) when rendering my OpenGL ES scene on a background thread. Using a serial GCD queue, it's also pretty easy to do that in a safe manner.

How do I use vDSP functions for Short Time Fourier Transform?

I'm trying to understand how to use vDSP functions for an STFT. I use the FFT code from Apple's examples and can get the FFT of the first 1024 frames, but how can I get the FFT of the next frames (1024 to 2047) and so on, until the end of the file? (In this case, assume the size of the file is int f = 10000.)
//vDSP variables
FFTSetupD setupReal;
uint32_t log2n;
uint32_t n, nOver2;
int32_t stride;
double *obtainedReal;
double scale;
log2n = N;
n = 1 << log2n;
stride = 1;
nOver2 = n/2;
int f = 10000;
buffer = malloc(f *sizeof(double));
obtainedReal = malloc(f *sizeof(double));
A.realp = malloc(f *sizeof(double));
A.imagp = malloc(f *sizeof(double));
vDSP_ctozD((DOUBLE_COMPLEX*) buffer, 2, &A, 1, nOver2);
setupReal = vDSP_create_fftsetupD(log2n, FFT_RADIX2);
if (setupReal == NULL) {
    NSLog(@"fft_setup failed to allocate enough memory for real FFT\n");
    return 0;
}
vDSP_fft_zripD(setupReal, &A, stride, log2n, FFT_FORWARD);
scale = (double) 1.0 / (2 * n);
vDSP_vsmulD(A.realp, 1, &scale, A.realp, 1, nOver2);
vDSP_vsmulD(A.imagp, 1, &scale, A.imagp, 1, nOver2);
vDSP_ztocD(&A, 1, (DOUBLE_COMPLEX *) obtainedReal, 2, nOver2);
If you simply want the FFT of the next 1024 elements, add nOver2 to A.realp and to A.imagp, then perform another vDSP_fft_zripD and another vDSP_ztocD. You will probably want to advance obtainedReal too, or the new results will overwrite the old results.
Note that changing A.realp and A.imagp loses the starting addresses, so you will not be able to free this memory unless you recalculate the starting addresses or save them elsewhere before changing A.realp and A.imagp.
Also, 10,000 is not an integer multiple of 1024, so your last portion will not have 1024 elements, so you need to figure out an alternative, such as getting more data or padding the data with zeroes.
You are allocating too much memory for A.realp and A.imagp. Each of them receives half of the elements in buffer, so each of them only needs half as much memory.
Even that much memory is not needed. You can use vDSP_ctozD to move just 1024 elements into A.realp and A.imagp (512 each), then perform an FFT, then move the data to obtainedReal using vDSP_ztocD, then move on to the next group by using vDSP_ctozD to move 1024 new elements into the same space in A.realp and A.imagp that was used before.
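The block-by-block loop with zero padding can be sketched in portable C as follows. frame_count and load_frame are hypothetical helpers, not vDSP functions; the vDSP calls from the question would run once per loaded frame, as noted in the trailing comment.

```c
#include <assert.h>
#include <stddef.h>

/* Number of non-overlapping frames of `frame_len` samples needed to
 * cover `total` samples (the last frame may be only partly filled). */
static size_t frame_count(size_t total, size_t frame_len)
{
    assert(frame_len > 0);
    return (total + frame_len - 1) / frame_len;
}

/* Copies frame number `index` of `signal` into `frame` and pads with
 * zeros past the end of the signal. Returns how many real samples
 * were copied. */
static size_t load_frame(const double *signal, size_t total, size_t index,
                         double *frame, size_t frame_len)
{
    assert(signal != NULL && frame != NULL);
    size_t start = index * frame_len;
    size_t copied = 0;
    for (size_t i = 0; i < frame_len; i++) {
        if (start + i < total) {
            frame[i] = signal[start + i];
            copied++;
        } else {
            frame[i] = 0.0;   /* zero padding for the short last frame */
        }
    }
    return copied;
}

/* Per frame k, the calls from the question would then be, roughly:
 *
 *   load_frame(buffer, f, k, frame, n);
 *   vDSP_ctozD((DOUBLE_COMPLEX *) frame, 2, &A, 1, nOver2);
 *   vDSP_fft_zripD(setupReal, &A, stride, log2n, FFT_FORWARD);
 *   vDSP_ztocD(&A, 1, (DOUBLE_COMPLEX *) (obtainedReal + k * n), 2, nOver2);
 *
 * A.realp and A.imagp only need nOver2 doubles each and are reused.
 * obtainedReal must hold frame_count(f, n) * n doubles. */
```

With f = 10000 and n = 1024 this gives 10 frames, the last of which holds 784 real samples followed by 240 zeros.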

Using iOS 3d Mixer

I have an AUGraph set up fairly simply, with a multichannel mixer connected to an I/O unit. The playback is accessed through a callback function and everything works nicely.
I am trying to switch over to the 3D Mixer instead of the Multichannel mixer. So I switched the parameter from kAudioUnitSubType_MultiChannelMixer to kAudioUnitSubType_AU3DMixerEmbedded and left all the other setup the same.
The result was sort of a high pitched whine that seemed to start sounding like something then became just whine-ish. I have gone through each of the 3D Mixer unit's parameters and set them to their defaults but there was no change. Flipping on and off the k3DMixerParam_Enable parameter did work at muting and unmuting the playback though.
What setup might I have missed? Or does anyone know where to find an example of a working 3D Mixer?
As already pointed out, the 3D mixer needs mono inputs. You also have to use 16-bit integer samples as the input data type (sized below with sizeof (UInt16)). This is a working AudioStreamBasicDescription:
AudioStreamBasicDescription streamFormat = {0};
size_t bytesPerSample = sizeof (UInt16);
streamFormat.mFormatID = kAudioFormatLinearPCM;
streamFormat.mFormatFlags = kAudioFormatFlagsCanonical;
streamFormat.mBytesPerPacket = bytesPerSample;
streamFormat.mFramesPerPacket = 1;
streamFormat.mBytesPerFrame = bytesPerSample;
streamFormat.mChannelsPerFrame = 1;
streamFormat.mBitsPerChannel = 8 * bytesPerSample;
streamFormat.mSampleRate = graphSampleRate;
// Set the input stream format of the desired 3D mixer unit audio bus
AudioUnitSetProperty (
sizeof (streamFormat)
As all answers already mention: the 3D Mixer on iOS needs mono inputs.
On iOS 8 / Xcode 6, the concept of canonical formats is deprecated and I found this (and only this) mono stream format description working as 3D Mixer input bus stream format description:
AudioStreamBasicDescription monoStreamFormat = {0};
monoStreamFormat.mSampleRate = sampleRate;
monoStreamFormat.mFormatID = kAudioFormatLinearPCM;
monoStreamFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
monoStreamFormat.mBitsPerChannel = 16;
monoStreamFormat.mChannelsPerFrame = 1;
monoStreamFormat.mFramesPerPacket = 1;
monoStreamFormat.mBytesPerPacket = 2;
monoStreamFormat.mBytesPerFrame = 2;
The sample rate should be set and then obtained from the AVAudioSession.
Set this format on the output of the Audio Unit connected to the 3D Mixer input, which is probably an AUConverter unit.
Note however, this hasn't been tested for < iOS 8.
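The size fields in both descriptions follow from the bit depth and the channel count, which is an easy thing to get wrong when switching from stereo to mono. A minimal sketch of that arithmetic for packed, interleaved linear PCM: PcmLayout is a stand-in struct invented for illustration, not the real AudioStreamBasicDescription.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the size-related fields of AudioStreamBasicDescription.
 * The struct and field names are hypothetical. */
typedef struct {
    uint32_t bits_per_channel;
    uint32_t channels_per_frame;
    uint32_t frames_per_packet;
    uint32_t bytes_per_frame;
    uint32_t bytes_per_packet;
} PcmLayout;

/* Derives the byte counts for packed, interleaved, uncompressed PCM. */
static PcmLayout packed_pcm_layout(uint32_t bits_per_channel, uint32_t channels)
{
    assert(bits_per_channel % 8 == 0 && channels > 0);
    PcmLayout l;
    l.bits_per_channel   = bits_per_channel;
    l.channels_per_frame = channels;
    l.frames_per_packet  = 1;   /* uncompressed PCM: one frame per packet */
    l.bytes_per_frame    = channels * (bits_per_channel / 8);
    l.bytes_per_packet   = l.bytes_per_frame * l.frames_per_packet;
    return l;
}
```

For 16-bit mono this yields 2 bytes per frame and 2 bytes per packet, matching the values hard-coded above. For 16-bit interleaved stereo it would be 4 and 4.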
The 3d Mixer needed mono inputs.

Help with IIR Comb Filter

#define D 1000
OSStatus MusicPlayerCallback(
    void *inRefCon,
    AudioUnitRenderActionFlags *ioActionFlags,
    const AudioTimeStamp *inTimeStamp,
    UInt32 inBusNumber,
    UInt32 inNumberFrames,
    AudioBufferList *ioData) {
    MusicPlaybackState *musicPlaybackState = (MusicPlaybackState *) inRefCon;
    // Sample rate: 44.1 kHz
    float a0, a1;
    double y0, sampleinp;
    // Delay gain
    a0 = 1;
    a1 = 0.5;
    for (int i = 0; i < ioData->mNumberBuffers; i++) {
        AudioBuffer buffer = ioData->mBuffers[i];
        SInt16 *outSampleBuffer = buffer.mData;
        for (int j = 0; j < inNumberFrames * 2; j++) {
            // Delay left channel
            sampleinp = *musicPlaybackState->samplePtr++;
            /* IIR equation of comb filter:
               y[n] = (a*x[n]) + (b*x[n-D]) */
            y0 = (a0*sampleinp) + (a1*sampleinp-D);
            outSampleBuffer[j] = fmax(fmin(y0, 32767.0), -32768.0);
            // Delay right channel
            sampleinp = *musicPlaybackState->samplePtr++;
            y0 = (a0*sampleinp) + (a1*sampleinp-D);
            outSampleBuffer[j] = fmax(fmin(y0, 32767.0), -32768.0);
        }
    }
    return noErr;
}
OK, I got a lot of info but I'm having trouble implementing it. Can someone help? It's probably something really easy I'm forgetting. It's just playing back as normal with a little boost but no delays.
Your treatment of the x0[] variables doesn't look right: the way you have it, the left and right channels will be intermingled. You assign to x0[j] for the left channel, then overwrite x0[j] with the right channel data. So the delayed signal x0[j-D] will always correspond to the right channel, with the delayed left channel data being lost.
You didn't say what your sample rate is, but for a typical audio application, a three-sample delay might not have much of an audible effect. At 44.1 ksamp/sec, with a 3-sample delay the peaks and troughs of the filter response will be at multiples of 14,700 Hz. All you'll get is a single peak in the audio frequency range, in a part of the spectrum where there's hardly any power (assuming the signal is speech or music).
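One way to keep the channels apart is to give each channel its own delay line. Below is a minimal sketch in portable C of the feedforward equation quoted in the question, y[n] = a0*x[n] + a1*x[n-D], using a circular buffer per channel. CombChannel, comb_init, comb_process and comb_process_stereo are hypothetical names, and the 4096-frame maximum delay is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { COMB_MAX_DELAY = 4096 };      /* arbitrary upper bound for D */

typedef struct {
    float  history[COMB_MAX_DELAY];  /* past inputs of ONE channel   */
    size_t delay;                    /* D in frames                  */
    size_t pos;                      /* slot that holds x[n-D]       */
} CombChannel;

static void comb_init(CombChannel *c, size_t delay)
{
    assert(c != NULL && delay >= 1 && delay <= COMB_MAX_DELAY);
    for (size_t i = 0; i < COMB_MAX_DELAY; i++) c->history[i] = 0.0f;
    c->delay = delay;
    c->pos = 0;
}

/* y[n] = a0*x[n] + a1*x[n-D], clipped to the 16-bit sample range. */
static int16_t comb_process(CombChannel *c, int16_t x, float a0, float a1)
{
    float delayed = c->history[c->pos];   /* stored D calls ago */
    c->history[c->pos] = (float)x;        /* remember x[n] for later */
    c->pos = (c->pos + 1) % c->delay;
    float y = a0 * (float)x + a1 * delayed;
    if (y > 32767.0f)  y = 32767.0f;
    if (y < -32768.0f) y = -32768.0f;
    return (int16_t)y;
}

/* Filters interleaved stereo in place, one delay line per channel. */
static void comb_process_stereo(CombChannel *left, CombChannel *right,
                                int16_t *samples, size_t frames,
                                float a0, float a1)
{
    for (size_t i = 0; i < frames; i++) {
        samples[2 * i]     = comb_process(left,  samples[2 * i],     a0, a1);
        samples[2 * i + 1] = comb_process(right, samples[2 * i + 1], a0, a1);
    }
}
```

Because the state lives in the structs rather than in the callback's local variables, the delay line also survives from one render callback to the next, which a delay of D = 1000 frames needs when the buffers are shorter than that.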

How do I set up a buffer when doing an FFT using the Accelerate framework?

I'm using the Accelerate framework to perform a Fast Fourier Transform (FFT), and am trying to find a way to create a buffer for use with it that has a length of 1024. I have access to the average peak and peak of a signal on which I want to do the FFT.
Can somebody help me or give me some hints to do this?
Apple has some examples of how to set up FFTs in their vDSP Programming Guide. You should also check out the vDSP Examples sample application. While for the Mac, this code should translate directly across to iOS as well.
I recently needed to do a simple FFT of a 64-element integer input waveform, for which I used the following code:
static FFTSetupD fft_weights;
static DSPDoubleSplitComplex input;
static double *magnitudes;
+ (void)initialize
{
    /* Setup weights (twiddle factors) */
    fft_weights = vDSP_create_fftsetupD(6, kFFTRadix2);
    /* Allocate memory to store split-complex input and output data */
    input.realp = (double *)malloc(64 * sizeof(double));
    input.imagp = (double *)malloc(64 * sizeof(double));
    magnitudes = (double *)malloc(64 * sizeof(double));
}
- (CGFloat)performAcceleratedFastFourierTransformAndReturnMaximumAmplitudeForArray:(NSUInteger *)waveformArray
{
    for (NSUInteger currentInputSampleIndex = 0; currentInputSampleIndex < 64; currentInputSampleIndex++)
    {
        input.realp[currentInputSampleIndex] = (double)waveformArray[currentInputSampleIndex];
        input.imagp[currentInputSampleIndex] = 0.0;
    }
    /* 1D in-place complex FFT */
    vDSP_fft_zipD(fft_weights, &input, 1, 6, FFT_FORWARD);
    // Zero the DC term so it does not dominate the maximum
    input.realp[0] = 0.0;
    input.imagp[0] = 0.0;
    // Get squared magnitudes
    vDSP_zvmagsD(&input, 1, magnitudes, 1, 64);
    // Extract the maximum squared magnitude
    double fftMax = 0.0;
    vDSP_maxmgvD(magnitudes, 1, &fftMax, 64);
    return sqrt(fftMax);
}
As you can see, I only used the real values in this FFT to set up the input buffers, performed the FFT, and then read out the magnitudes.