sws_scale screws up last pixel row in smaller x264 mp4 encoding

I am muxing pictures in the PIX_FMT_ARGB format into an mp4 video.
All of it works well, except that the last pixel row of the outgoing image is screwed up. In most cases the last row is completely black; sometimes there are other colors. It seems somehow dependent on the machine it runs on.
I am absolutely sure that the error must be in sws_scale, as I am saving the images before and after the scaling: the input images do not have the error, but after sws_scale() I save the YUV image and the error is apparent.
Here is an example:
Original
Yuvfile (after sws_scale)
At the bottom of the Yuvfile, you will see the black row.
This is how I do the scaling (it follows the official ffmpeg examples, more or less):
static int sws_flags = SWS_FAST_BILINEAR | SWS_ACCURATE_RND;

if (img_convert_ctx == NULL)
{
    img_convert_ctx = sws_getContext(srcWidth, srcHeight,
                                     PIX_FMT_ARGB,
                                     codecContext->width, codecContext->height,
                                     codecContext->pix_fmt,
                                     sws_flags, NULL, NULL, NULL);
    if (img_convert_ctx == NULL)
    {
        av_log(c, AV_LOG_ERROR, "%s", "Cannot initialize the conversion context\n");
        exit(1);
    }
}

fill_image(tmp_picture, pic, pic_size, frame_count, ptr->srcWidth, ptr->srcHeight);

sws_scale(img_convert_ctx, tmp_picture->data, tmp_picture->linesize,
          0, srcHeight, picture->data, picture->linesize);
I also tried a number of different SWS_ flags, but all yield the same result.
Could this be a bug in sws_scale or am I doing something wrong? I am using the latest version of the ffmpeg libraries.

The problem was this function:
fill_image(tmp_picture, pic, pic_size, frame_count, ptr->srcWidth, ptr->srcHeight );
It did not copy the input image into tmp_picture correctly; it indeed skipped the last line.
Moral: do not trust years-old functions :D
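For reference, here is a minimal sketch of what a correct fill could look like (the real fill_image is not shown above, so the signature and the assumption of tightly packed ARGB input are mine); copying row by row against the destination linesize guarantees the last row is not dropped:

static void fill_image_argb(AVFrame *dst, const uint8_t *src,
                            int srcWidth, int srcHeight)
{
    const int src_stride = srcWidth * 4;        // packed ARGB input, no row padding assumed
    const int dst_stride = dst->linesize[0];    // destination stride may include padding
    int y;

    for (y = 0; y < srcHeight; y++)             // copies every row, including the last one
        memcpy(dst->data[0] + y * dst_stride,
               src + y * src_stride,
               src_stride);
}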

180 is not a multiple of 8; this could be the reason for the black row. Can you try scaling it to the nearest multiple of 8, say 184, or 192 (a multiple of 16)? Non-H.264 codecs need the height to be a multiple of 8.
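If you go that route, the padding arithmetic is just rounding up (a minimal sketch; whether you pad the encoder dimensions or rescale the source is up to you):

// Round the height up to the next multiple of 16, e.g. 180 -> 192.
int paddedHeight = (srcHeight + 15) & ~15;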

Related

AVAudioPCMBuffer built programmatically, not playing back in stereo

I'm trying to fill an AVAudioPCMBuffer programmatically in Swift to build a metronome. This is the first real app I'm trying to build, so it's also my first audio app. Right now I'm experimenting with different frameworks and methods of getting the metronome looping accurately.
I'm trying to build an AVAudioPCMBuffer with the length of a measure/bar so that I can use the .Loops option of the AVAudioPlayerNode's scheduleBuffer method. I start by loading my file (2 ch, 44100 Hz, Float32, non-interleaved; *.wav and *.m4a both have the same issue) into a buffer, then copying that buffer frame by frame, separated by empty frames, into the barBuffer. The loop below is how I'm accomplishing this.
If I schedule the original buffer to play, it will play back in stereo, but when I schedule the barBuffer, I only get the left channel. As I said, I'm a beginner at programming and have no experience with audio programming, so this might be my lack of knowledge of 32-bit float channels or of the data type UnsafePointer<UnsafeMutablePointer<Float>>. When I look at the floatChannelData property in Swift, the description makes it sound like this should be copying two channels.
var j = 0
for i in 0..<Int(capacity) {
    barBuffer.floatChannelData.memory[j] = buffer.floatChannelData.memory[i]
    j += 1
}
j += Int(silenceLengthInSamples)
// loop runs 4 times for 4 beats per bar.
Edit: I removed the glaring mistake i += 1, thanks to hotpaw2. The right channel is still missing when barBuffer is played back, though.
Unsafe pointers in Swift are pretty weird to get used to.
floatChannelData.memory[j] only accesses the first channel of data. To access the other channel(s), you have a couple of choices:
Using advancedBy
// Where current channel is at 0
// Get a channel pointer aka UnsafePointer<UnsafeMutablePointer<Float>>
let channelN = floatChannelData.advancedBy( channelNumber )
// Get channel data aka UnsafeMutablePointer<Float>
let channelNData = channelN.memory
// Get first two floats of channel channelNumber
let floatOne = channelNData.memory
let floatTwo = channelNData.advancedBy(1).memory
Using Subscript
// Get channel data aka UnsafeMutablePointer<Float>
let channelNData = floatChannelData[ channelNumber ]
// Get first two floats of channel channelNumber
let floatOne = channelNData[0]
let floatTwo = channelNData[1]
Using the subscript is much clearer, and the step of advancing and then manually accessing memory is implicit.
For your loop, try accessing all channels of the buffer by doing something like this:
for i in 0..<Int(capacity) {
    for n in 0..<Int(buffer.format.channelCount) {
        barBuffer.floatChannelData[n][j] = buffer.floatChannelData[n][i]
    }
}
Hope this helps!
This looks like a misunderstanding of Swift "for" loops. The Swift "for" loop automatically increments the "i" array index. But you are incrementing it again in the loop body, which means that you end up skipping every other sample (the Right channel) in your initial buffer.

Putting an H.264 I frame to AVSampleBufferDisplayLayer but no video image is displayed

After a detailed review of WWDC 2014 Session 513, I am trying to write an app on iOS 8.0 to decode and display a live H.264 stream. First of all, I construct an H.264 parameter set successfully. When I get one I frame with a 4-byte start code, just like "0x00 0x00 0x00 0x01 0x65 ...", I put it into a CMBlockBuffer. Then I construct a CMSampleBuffer using the previous CMBlockBuffer. After that, I put the CMSampleBuffer into an AVSampleBufferDisplayLayer. Everything is OK (I checked the values returned) except the AVSampleBufferDisplayLayer does not show any video image. Since these APIs are fairly new to everyone, I couldn't find anybody who could resolve this problem.
I'll give the key code below, and I would really appreciate it if you could help figure out why the video image can't be displayed. Thanks a lot.
(1) AVSampleBufferDisplayLayer initialisation.
dspLayer is a property of my main view controller.
@property (nonatomic, strong) AVSampleBufferDisplayLayer *dspLayer;
if (!_dspLayer)
{
    _dspLayer = [[AVSampleBufferDisplayLayer alloc] init];
    [_dspLayer setFrame:CGRectMake(90, 551, 557, 389)];
    _dspLayer.videoGravity = AVLayerVideoGravityResizeAspect;
    _dspLayer.backgroundColor = [UIColor grayColor].CGColor;

    CMTimebaseRef tmBase = nil;
    CMTimebaseCreateWithMasterClock(NULL, CMClockGetHostTimeClock(), &tmBase);
    _dspLayer.controlTimebase = tmBase;
    CMTimebaseSetTime(_dspLayer.controlTimebase, kCMTimeZero);
    CMTimebaseSetRate(_dspLayer.controlTimebase, 1.0);

    [self.view.layer addSublayer:_dspLayer];
}
(2) In another thread, I get one H.264 I frame.
//construct H.264 parameter set: ok
CMVideoFormatDescriptionRef formatDesc;
OSStatus formatCreateResult =
    CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL, ppsNum+1, props, sizes, 4, &formatDesc);
NSLog(@"construct h264 param set:%ld", (long)formatCreateResult);

//construct CMBlockBuffer
//dataBuf points to H.264 data; starts with "0x00 0x00 0x00 0x01 0x65 ........"
CMBlockBufferRef blockBufferOut = nil;
CMBlockBufferCreateEmpty(0, 0, kCMBlockBufferAlwaysCopyDataFlag, &blockBufferOut);
CMBlockBufferAppendMemoryBlock(blockBufferOut,
                               dataBuf,
                               dataLen,
                               NULL,
                               NULL,
                               0,
                               dataLen,
                               kCMBlockBufferAlwaysCopyDataFlag);

//construct CMSampleBuffer: ok
size_t sampleSizeArray[1] = {0};
sampleSizeArray[0] = CMBlockBufferGetDataLength(blockBufferOut);
CMSampleTimingInfo tmInfos[1] = {
    {CMTimeMake(5, 1), CMTimeMake(5, 1), CMTimeMake(5, 1)}
};
CMSampleBufferRef sampBuf = nil;
formatCreateResult = CMSampleBufferCreate(kCFAllocatorDefault,
                                          blockBufferOut,
                                          YES,
                                          NULL,
                                          NULL,
                                          formatDesc,
                                          1,
                                          1,
                                          tmInfos,
                                          1,
                                          sampleSizeArray,
                                          &sampBuf);

//put to AVSampleBufferDisplayLayer, just one frame. But I can't see any video frame in my view
if ([self.dspLayer isReadyForMoreMediaData])
{
    [self.dspLayer enqueueSampleBuffer:sampBuf];
}
[self.dspLayer setNeedsDisplay];
Your NAL unit start codes 0x00 0x00 0x01 or 0x00 0x00 0x00 0x01 need to be replaced by a length header.
This was clearly stated in the WWDC session you are referring to: the Annex B start code needs to be replaced by an AVCC-conformant length header. You are basically remuxing from Annex B stream format to MP4 file format on the fly here (a simplified description, of course).
Your call when creating the parameter set passes "4" for this, so you need to prefix your VCL NAL units with a 4-byte length prefix. That's why you have to specify it: in AVCC format the length header can be shorter.
Whatever you put inside the CMSampleBuffer will be accepted; there is no sanity check whether the contents can be decoded, only that you met the required parameters for arbitrary data combined with timing information and a parameter set.
Basically, with the data you put in, you told the decoder that the VCL NAL unit is 1 byte long. The decoder doesn't get the full NAL unit and bails out with an error.
Also make sure that when you create the parameter set, the SPS/PPS do not have a length byte added and that the Annex B start code is also stripped.
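For what it's worth, a minimal sketch of that replacement, assuming dataBuf/dataLen from the question hold a single, writable NAL unit with a 4-byte Annex B start code (do this before appending to the CMBlockBuffer):

if (dataLen > 4 &&
    dataBuf[0] == 0x00 && dataBuf[1] == 0x00 &&
    dataBuf[2] == 0x00 && dataBuf[3] == 0x01)
{
    uint32_t nalLength = (uint32_t)(dataLen - 4);          // payload size after the start code
    uint32_t bigEndianLength = CFSwapInt32HostToBig(nalLength);
    memcpy(dataBuf, &bigEndianLength, 4);                  // overwrite the start code in place
}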
I also recommend not using AVSampleBufferDisplayLayer directly but going through a VTDecompressionSession, so you can do things like color correction or other processing that is needed inside a pixel shader.
It might be an idea to use VTDecompressionSessionDecodeFrame initially, as this will give you some feedback on the success of the decoding. If there is an issue with the decoding, the AVSampleBufferDisplayLayer doesn't tell you; it just doesn't display anything. I can give you some code to help with this if required; let me know how you get on, as I am attempting the same thing :)
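For orientation only, a minimal sketch of such a decode round trip might look like the following (the function names are hypothetical; it reuses formatDesc and sampBuf from the question, and the callback simply surfaces the status that AVSampleBufferDisplayLayer would swallow):

static void decodeOutputCallback(void *refCon, void *sourceFrameRefCon,
                                 OSStatus status, VTDecodeInfoFlags infoFlags,
                                 CVImageBufferRef imageBuffer,
                                 CMTime pts, CMTime duration)
{
    // A non-zero status here means the bitstream itself could not be decoded.
    printf("decoded frame, status = %d\n", (int)status);
}

static void decodeOneFrame(CMVideoFormatDescriptionRef formatDesc, CMSampleBufferRef sampBuf)
{
    VTDecompressionSessionRef session = NULL;
    VTDecompressionOutputCallbackRecord callback = { decodeOutputCallback, NULL };

    OSStatus err = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                                formatDesc,   // from CMVideoFormatDescriptionCreateFromH264ParameterSets
                                                NULL,         // decoder specification
                                                NULL,         // destination pixel buffer attributes
                                                &callback,
                                                &session);
    if (err != noErr)
        return;

    VTDecodeInfoFlags infoFlags = 0;
    err = VTDecompressionSessionDecodeFrame(session, sampBuf, 0, NULL, &infoFlags);
    printf("decode frame returned %d\n", (int)err);

    VTDecompressionSessionInvalidate(session);
    CFRelease(session);
}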

Repeated Scene items in iOS YUV video capturing output

I capture a video and handle the resulting YUV frames.
The output looks like the following:
Although it appears normal on my phone's screen, my peer receives it like the image above. Every item is repeated and shifted by some amount horizontally and vertically.
My captured video is 352x288, with YPixelCount = 101376 and UVPixelCount = YPixelCount / 4.
Any clue to solve this, or a starting point for understanding how to handle YUV video frames on iOS?
NSNumber* recorderValue = [NSNumber numberWithUnsignedInt:kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange];
[videoRecorderSession setSessionPreset:AVCaptureSessionPreset352x288];
And this is the captureOutput function
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    if (CMSampleBufferIsValid(sampleBuffer) && CMSampleBufferDataIsReady(sampleBuffer) && ([self isQueueStopped] == FALSE))
    {
        CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
        CVPixelBufferLockBaseAddress(imageBuffer, 0);

        UInt8 *baseAddress[3] = {NULL, NULL, NULL};
        uint8_t *yPlaneAddress = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
        UInt32 yPixelCount = CVPixelBufferGetWidthOfPlane(imageBuffer, 0) * CVPixelBufferGetHeightOfPlane(imageBuffer, 0);
        uint8_t *uvPlaneAddress = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 1);
        UInt32 uvPixelCount = CVPixelBufferGetWidthOfPlane(imageBuffer, 1) * CVPixelBufferGetHeightOfPlane(imageBuffer, 1);

        UInt32 p, q, r;
        p = q = r = 0;

        memcpy(uPointer, uvPlaneAddress, uvPixelCount);
        memcpy(vPointer, uvPlaneAddress + uvPixelCount, uvPixelCount);
        memcpy(yPointer, yPlaneAddress, yPixelCount);

        baseAddress[0] = (UInt8 *)yPointer;
        baseAddress[1] = (UInt8 *)uPointer;
        baseAddress[2] = (UInt8 *)vPointer;

        CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
    }
}
Is there anything wrong with the above code?
Your code doesn't look too bad. I can see two mistakes and one potential problem:
The uvPixelCount is incorrect. The YUV 420 format means that there is color information for each 2 by 2 pixel block. So the correct count is:
uvPixelCount = (width / 2) * (height / 2);
You write something about yPixelCount / 4, but I cannot see that in your code.
The UV information is interleaved, i.e. the second plane alternately contains a U and a V value. Or, put differently: there's a U value at every even byte address and a V value at every odd byte address. If you really need to separate the U and V information, memcpy won't do.
There can be some extra bytes after each pixel row. You should use CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0) to get the number of bytes between two rows. As a consequence, a single memcpy won't do. Instead you need to copy each pixel row separately to get rid of the extra bytes between the rows.
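Putting the last two points together, a stride-aware copy that also splits the interleaved UV plane could look roughly like this (a sketch only; it assumes the variables from your captureOutput above, that yPointer/uPointer/vPointer are large enough, and that the peer wants planar 4:2:0):

size_t yWidth   = CVPixelBufferGetWidthOfPlane(imageBuffer, 0);
size_t yHeight  = CVPixelBufferGetHeightOfPlane(imageBuffer, 0);
size_t yStride  = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);
size_t uvWidth  = CVPixelBufferGetWidthOfPlane(imageBuffer, 1);   // number of U/V pairs per row
size_t uvHeight = CVPixelBufferGetHeightOfPlane(imageBuffer, 1);
size_t uvStride = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 1);
size_t row, col;

for (row = 0; row < yHeight; row++)          // copy Y row by row, dropping the row padding
    memcpy(yPointer + row * yWidth, yPlaneAddress + row * yStride, yWidth);

for (row = 0; row < uvHeight; row++) {
    const uint8_t *src = uvPlaneAddress + row * uvStride;
    for (col = 0; col < uvWidth; col++) {
        uPointer[row * uvWidth + col] = src[2 * col];       // even bytes: U
        vPointer[row * uvWidth + col] = src[2 * col + 1];   // odd bytes: V
    }
}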
All these things only explain part of the resulting image. The remaining parts are probably due to differences between your code and what the receiving peer expects. You didn't write anything about that. Does the peer really need separated U and V values? Does it use 4:2:0 subsampling as well? Does it use video range instead of full range as well?
If you provide more information, I can give you more hints.

Length of data returned from CGImageGetDataProvider is larger than expected

I'm loading a grayscale PNG image and I want to access the underlying pixel data. However, after I get the pixel data via CGImageGetDataProvider, the length of the data returned is longer than expected.
CGDataProviderRef provider = CGDataProviderCreateWithFilename(cStr);
CGImageRef image = CGImageCreateWithPNGDataProvider(provider, NULL, FALSE, kCGRenderingIntentDefault);
mapWidth = CGImageGetWidth(image);
mapHeight = CGImageGetHeight(image);
lookupMap = CGDataProviderCopyData(CGImageGetDataProvider(image));
mapWidth comes out to 1804 and mapHeight comes out to 1005, the product of which is 1813020.
When I call
CFDataGetLength(lookupMap)
the response is 1833120.
Where are these extra 20100 bytes coming from?
Any help here is much appreciated. Am I missing something about the underlying format of the image?
Upon further examination of the CFDataRef, I found that if I loop through the buffer, for each row bytes 0 to 1803 are correct from my image, and then the next 20 bytes are all zero. So my returned data is actually laid out as a 1824-by-1005 image instead of 1804-by-1005. Still no explanation as to why.
There's padding being added to the end of each one of my rows. I started using CGImageGetBytesPerRow and solved the mystery.
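In case it helps anyone else, a minimal sketch of indexing the copied data with the row stride instead of the width (assuming an 8-bit grayscale image as above, and that x and y are valid pixel coordinates):

const UInt8 *bytes = CFDataGetBytePtr(lookupMap);
size_t bytesPerRow = CGImageGetBytesPerRow(image);   // 1824 here, not 1804
UInt8 pixel = bytes[y * bytesPerRow + x];            // the 20 padding bytes per row are skipped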

EXC_BAD_ACCESS when calling avcodec_encode_video

I have an Objective-C class (although I don't believe this is anything Obj-C specific) that I am using to write a video out to disk from a series of CGImages. (The code I am using at the top to get the pixel data comes right from Apple: http://developer.apple.com/mac/library/qa/qa2007/qa1509.html). I successfully create the codec and context - everything is going fine until it gets to avcodec_encode_video, when I get EXC_BAD_ACCESS. I think this should be a simple fix, but I just can't figure out where I am going wrong.
I took out some error checking for succinctness. 'c' is an AVCodecContext*, which is created successfully.
-(void)addFrame:(CGImageRef)img
{
    CFDataRef bitmapData = CGDataProviderCopyData(CGImageGetDataProvider(img));
    long dataLength = CFDataGetLength(bitmapData);
    uint8_t *picture_buff = (uint8_t *)malloc(dataLength);
    CFDataGetBytes(bitmapData, CFRangeMake(0, dataLength), picture_buff);

    AVFrame *picture = avcodec_alloc_frame();
    avpicture_fill((AVPicture *)picture, picture_buff, c->pix_fmt, c->width, c->height);

    int outbuf_size = avpicture_get_size(c->pix_fmt, c->width, c->height);
    uint8_t *outbuf = (uint8_t *)av_malloc(outbuf_size);

    out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture); // ERROR occurs here
    printf("encoding frame %3d (size=%5d)\n", i, out_size);
    fwrite(outbuf, 1, out_size, f);

    CFRelease(bitmapData);
    free(picture_buff);
    free(outbuf);
    av_free(picture);
    i++;
}
I have stepped through it dozens of times. Here are some numbers...
dataLength = 408960
picture_buff = 0x5c85000
picture->data[0] = 0x5c85000 -- which I take to mean that avpicture_fill worked...
outbuf_size = 408960
and then I get EXC_BAD_ACCESS at avcodec_encode_video. Not sure if it's relevant, but most of this code comes from api-example.c. I am using Xcode, compiling for armv6/armv7 on Snow Leopard.
Thanks so much in advance for help!
I don't have enough information here to point to the exact error, but I think the problem is that the input picture contains less data than avcodec_encode_video() expects:
avpicture_fill() only sets some pointers and numeric values in the AVFrame structure. It does not copy anything, and does not check whether the buffer is large enough (and it cannot, since the buffer size is not passed to it). It does something like this (copied from ffmpeg source):
size = picture->linesize[0] * height;
picture->data[0] = ptr;
picture->data[1] = picture->data[0] + size;
picture->data[2] = picture->data[1] + size2;
picture->data[3] = picture->data[1] + size2 + size2;
Note that the width and height are passed from the variable "c" (the AVCodecContext, I assume), so they may be larger than the actual size of the input frame.
It is also possible that the width/height is good, but the pixel format of the input frame is different from what is passed to avpicture_fill(). (Note that the pixel format also comes from the AVCodecContext, which may differ from the input.) For example, if c->pix_fmt is RGBA and the input buffer is in YUV420 format (or, more likely for iPhone, a biplanar YCbCr), then the size of the input buffer is width*height*1.5, but avpicture_fill() expects width*height*4.
So checking the input/output geometry and pixel formats should lead you to the cause of the error. If that does not help, I suggest you try to compile for i386 first. It is tricky to compile FFmpeg for the iPhone properly.
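A minimal sketch of such a check, using the names from the question's addFrame: method (this only guards against the size mismatch; it does not fix a wrong pixel format by itself):

int expected = avpicture_get_size(c->pix_fmt, c->width, c->height);
if (dataLength < expected) {
    printf("frame too small: have %ld bytes, codec expects %d -- "
           "check c->width / c->height / c->pix_fmt against the CGImage\n",
           dataLength, expected);
    return;   // avcodec_encode_video() would read past the end of picture_buff
}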
Does the codec you are encoding support the RGB color space? You may need to use libswscale to convert to I420 before encoding. What codec are you using? Can you post the code where you initialize your codec context?
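If a conversion does turn out to be needed, a minimal sketch with libswscale might look like this (assumptions: the codec wants YUV420P, the CGImage data is 32-bit RGBA with no row padding, and src_frame/dst_frame/dst_buff are hypothetical names):

struct SwsContext *sws = sws_getContext(c->width, c->height, PIX_FMT_RGBA,
                                        c->width, c->height, PIX_FMT_YUV420P,
                                        SWS_FAST_BILINEAR, NULL, NULL, NULL);

AVFrame *src_frame = avcodec_alloc_frame();
avpicture_fill((AVPicture *)src_frame, picture_buff, PIX_FMT_RGBA, c->width, c->height);

AVFrame *dst_frame = avcodec_alloc_frame();
uint8_t *dst_buff = (uint8_t *)av_malloc(avpicture_get_size(PIX_FMT_YUV420P, c->width, c->height));
avpicture_fill((AVPicture *)dst_frame, dst_buff, PIX_FMT_YUV420P, c->width, c->height);

sws_scale(sws, src_frame->data, src_frame->linesize,
          0, c->height, dst_frame->data, dst_frame->linesize);

// dst_frame (and dst_buff) now hold the frame that avcodec_encode_video() should be given.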
The function RGBtoYUV420P may help you.
http://www.mail-archive.com/libav-user#mplayerhq.hu/msg03956.html