I'm writing a program to extract images from a video stream. So far I have figured out how to seek to the correct frames, decode the video stream, and gather the relevant data into an AVFrame struct. I'm now trying to write the data out as a JPEG image, but my code isn't working. The code I'm using is from here: https://gist.github.com/RLovelett/67856c5bfdf5739944ed
int save_frame_as_jpeg(AVCodecContext *pCodecCtx, AVFrame *pFrame, int FrameNo) {
    AVCodec *jpegCodec = avcodec_find_encoder(AV_CODEC_ID_JPEG2000);
    if (!jpegCodec) {
        return -1;
    }
    AVCodecContext *jpegContext = avcodec_alloc_context3(jpegCodec);
    if (!jpegContext) {
        return -1;
    }

    jpegContext->pix_fmt = pCodecCtx->pix_fmt;
    jpegContext->height = pFrame->height;
    jpegContext->width = pFrame->width;

    if (avcodec_open2(jpegContext, jpegCodec, NULL) < 0) {
        return -1;
    }

    FILE *JPEGFile;
    char JPEGFName[256];

    AVPacket packet = {.data = NULL, .size = 0};
    av_init_packet(&packet);
    int gotFrame;

    if (avcodec_encode_video2(jpegContext, &packet, pFrame, &gotFrame) < 0) {
        return -1;
    }

    sprintf(JPEGFName, "dvr-%06d.jpg", FrameNo);
    JPEGFile = fopen(JPEGFName, "wb");
    fwrite(packet.data, 1, packet.size, JPEGFile);
    fclose(JPEGFile);

    av_free_packet(&packet);
    avcodec_close(jpegContext);
    return 0;
}
When I use that code, the first error I got was about the time_base on the AVCodecContext not being set, so I set it to the time_base of my video-decoding AVCodecContext. Now I'm getting another error:
[jpeg2000 @ 0x7fd6a4015200] dimensions not set
[jpeg2000 @ 0x7fd6a307c400] dimensions not set
[jpeg2000 @ 0x7fd6a5800000] dimensions not set
[jpeg2000 @ 0x7fd6a307ca00] dimensions not set
[jpeg2000 @ 0x7fd6a3092400] dimensions not set
and the images still aren't being written. On that GitHub Gist, one commenter claimed that the metadata isn't being written to the JPEG image, but how should I write that metadata? I did set the width and height on the encoding context, so I'm not sure why it claims the dimensions are not set.
JPEG2000 isn't JPEG. To encode JPEG images, use AV_CODEC_ID_MJPEG. MJPEG stands for "motion JPEG", the usual name for a video stream made up of a sequence of JPEG pictures.
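For reference, here is a minimal sketch of the same function with the codec swapped to AV_CODEC_ID_MJPEG. It is untested and assumes the same older avcodec_encode_video2() API that the question's code uses; note that ffmpeg's MJPEG encoder generally expects a full-range pixel format such as AV_PIX_FMT_YUVJ420P, so the frame may need converting with sws_scale first if it is in a different format.

int save_frame_as_jpeg(AVCodecContext *pCodecCtx, AVFrame *pFrame, int FrameNo) {
    // MJPEG, not JPEG2000, produces ordinary .jpg files.
    AVCodec *jpegCodec = avcodec_find_encoder(AV_CODEC_ID_MJPEG);
    if (!jpegCodec) {
        return -1;
    }
    AVCodecContext *jpegContext = avcodec_alloc_context3(jpegCodec);
    if (!jpegContext) {
        return -1;
    }

    // Assumption: the decoded frame is (or has been converted to) full-range 4:2:0.
    jpegContext->pix_fmt = AV_PIX_FMT_YUVJ420P;
    jpegContext->height = pFrame->height;
    jpegContext->width = pFrame->width;
    jpegContext->time_base = pCodecCtx->time_base; // required, as noted in the question

    if (avcodec_open2(jpegContext, jpegCodec, NULL) < 0) {
        return -1;
    }

    AVPacket packet = {.data = NULL, .size = 0};
    av_init_packet(&packet);
    int gotFrame = 0;

    if (avcodec_encode_video2(jpegContext, &packet, pFrame, &gotFrame) < 0 || !gotFrame) {
        avcodec_close(jpegContext);
        return -1;
    }

    char JPEGFName[256];
    snprintf(JPEGFName, sizeof(JPEGFName), "dvr-%06d.jpg", FrameNo);
    FILE *JPEGFile = fopen(JPEGFName, "wb");
    if (JPEGFile) {
        fwrite(packet.data, 1, packet.size, JPEGFile);
        fclose(JPEGFile);
    }

    av_free_packet(&packet);
    avcodec_close(jpegContext);
    return 0;
}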
I'm trying to make a video out of a set of RenderTextures.
I've written the code below based on the Unity documentation,
but I want to append further RenderTextures after the video has been made.
Making the encoder and audioBuf member variables leads to an error where the .mp4 file cannot be created, or the Editor crashes.
Is there any way to keep the current .mp4 file handle open so that I can append more RenderTextures after this function ends?
void EncodeVideoFromPredistortedImages(RenderTexture[] predistortedImages) {
    // Compose the video again to encode from the Images list.
    Texture2D convertedToTex2d = new Texture2D(predistortedImages[0].width, predistortedImages[0].height);
    videoAttr.width = (uint)convertedToTex2d.width;
    videoAttr.height = (uint)convertedToTex2d.height;

    using (var encoder = new MediaEncoder(encodedVideoFilePath, videoAttr/*, audioAttr*/))
    using (var audioBuf = new Unity.Collections.NativeArray<float>(sampleFramesPerVideoFrame, Unity.Collections.Allocator.Temp)) {
        for (int i = 0; i < predistortedImages.Length; ++i) {
            Debug.Log($"Current encoding idx {i} of {ExtractedTexturesArr.Length}");

            RenderTexture prevRT = RenderTexture.active;
            RenderTexture.active = predistortedImages[i];
            convertedToTex2d.ReadPixels(new Rect(0, 0, predistortedImages[i].width, predistortedImages[i].height), 0, 0);
            convertedToTex2d.Apply();
            RenderTexture.active = prevRT;

            encoder.AddFrame(convertedToTex2d);
            encoder.AddSamples(audioBuf);
        }
        encoder.Dispose();
        DestroyImmediate(convertedToTex2d);
    }
}
I use OpenCV.Videos to deal with this problem instead of a Unity third-party VideoWriter, due to performance.
I have searched a lot for an answer to this question but have never seen a satisfactory one, so now this is my last hope.
I have an onPreviewFrame callback set up, which gives me a byte[] of raw frames in a supported preview format (NV21, to be encoded as H.264).
Now, the problem is that the callback always delivers the byte[] frames in a fixed orientation; when the device rotates, the change is not reflected in the captured frames. I have tried setDisplayOrientation and setRotation, but these APIs only affect the preview being displayed, not the captured byte[] frames.
The Android docs even say that Camera.setDisplayOrientation only affects the displayed preview, not the frame bytes:
This does not affect the order of byte array passed in onPreviewFrame(byte[], Camera), JPEG pictures, or recorded videos.
Finally: is there a way, at any API level, to change the orientation of the byte[] frames?
One possible way, if you don't care about the format, is to use the YuvImage class to get a JPEG buffer, use this buffer to create a Bitmap, and rotate it to the corresponding angle. Something like this:
@Override
public void onPreviewFrame(byte[] data, Camera camera) {
    Size previewSize = camera.getParameters().getPreviewSize();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] rawImage = null;

    // Decode image from the retrieved buffer to JPEG
    YuvImage yuv = new YuvImage(data, ImageFormat.NV21, previewSize.width, previewSize.height, null);
    yuv.compressToJpeg(new Rect(0, 0, previewSize.width, previewSize.height), YOUR_JPEG_COMPRESSION, baos);
    rawImage = baos.toByteArray();

    // This is the same image as the preview but in JPEG and not rotated
    Bitmap bitmap = BitmapFactory.decodeByteArray(rawImage, 0, rawImage.length);
    ByteArrayOutputStream rotatedStream = new ByteArrayOutputStream();

    // Rotate the Bitmap
    Matrix matrix = new Matrix();
    matrix.postRotate(YOUR_DEFAULT_ROTATION);

    // We rotate the same Bitmap
    bitmap = Bitmap.createBitmap(bitmap, 0, 0, previewSize.width, previewSize.height, matrix, false);

    // We dump the rotated Bitmap to the stream
    bitmap.compress(CompressFormat.JPEG, YOUR_JPEG_COMPRESSION, rotatedStream);
    rawImage = rotatedStream.toByteArray();

    // Do something with this byte array
}
I have modified the onPreviewFrame method of this open-source Android Touch-To-Record library to transpose and resize a captured frame.
I defined "yuvIplImage" as follows in my setCameraParams() method.
IplImage yuvIplImage = IplImage.create(mPreviewSize.height, mPreviewSize.width, opencv_core.IPL_DEPTH_8U, 2);
This is my onPreviewFrame() method:
@Override
public void onPreviewFrame(byte[] data, Camera camera)
{
    long frameTimeStamp = 0L;

    if (FragmentCamera.mAudioTimestamp == 0L && FragmentCamera.firstTime > 0L)
    {
        frameTimeStamp = 1000L * (System.currentTimeMillis() - FragmentCamera.firstTime);
    }
    else if (FragmentCamera.mLastAudioTimestamp == FragmentCamera.mAudioTimestamp)
    {
        frameTimeStamp = FragmentCamera.mAudioTimestamp + FragmentCamera.frameTime;
    }
    else
    {
        long l2 = (System.nanoTime() - FragmentCamera.mAudioTimeRecorded) / 1000L;
        frameTimeStamp = l2 + FragmentCamera.mAudioTimestamp;
        FragmentCamera.mLastAudioTimestamp = FragmentCamera.mAudioTimestamp;
    }

    synchronized (FragmentCamera.mVideoRecordLock)
    {
        if (FragmentCamera.recording && FragmentCamera.rec && lastSavedframe != null && lastSavedframe.getFrameBytesData() != null && yuvIplImage != null)
        {
            FragmentCamera.mVideoTimestamp += FragmentCamera.frameTime;

            if (lastSavedframe.getTimeStamp() > FragmentCamera.mVideoTimestamp)
            {
                FragmentCamera.mVideoTimestamp = lastSavedframe.getTimeStamp();
            }

            try
            {
                yuvIplImage.getByteBuffer().put(lastSavedframe.getFrameBytesData());

                IplImage bgrImage = IplImage.create(mPreviewSize.width, mPreviewSize.height, opencv_core.IPL_DEPTH_8U, 4); // In my case, mPreviewSize.width = 1280 and mPreviewSize.height = 720
                IplImage transposed = IplImage.create(mPreviewSize.height, mPreviewSize.width, yuvIplImage.depth(), 4);
                IplImage squared = IplImage.create(mPreviewSize.height, mPreviewSize.height, yuvIplImage.depth(), 4);

                int[] _temp = new int[mPreviewSize.width * mPreviewSize.height];
                Util.YUV_NV21_TO_BGR(_temp, data, mPreviewSize.width, mPreviewSize.height);
                bgrImage.getIntBuffer().put(_temp);

                opencv_core.cvTranspose(bgrImage, transposed);
                opencv_core.cvFlip(transposed, transposed, 1);

                opencv_core.cvSetImageROI(transposed, opencv_core.cvRect(0, 0, mPreviewSize.height, mPreviewSize.height));
                opencv_core.cvCopy(transposed, squared, null);
                opencv_core.cvResetImageROI(transposed);

                videoRecorder.setTimestamp(lastSavedframe.getTimeStamp());
                videoRecorder.record(squared);
            }
            catch (com.googlecode.javacv.FrameRecorder.Exception e)
            {
                e.printStackTrace();
            }
        }

        lastSavedframe = new SavedFrames(data, frameTimeStamp);
    }
}
This code uses a method "YUV_NV21_TO_BGR", which I found at this link.
Basically, this method resolves what I call "the Green Devil problem on Android". You can see other Android devs facing the same problem in other SO threads. Before adding the "YUV_NV21_TO_BGR" method, when I just took the transpose of the YuvIplImage, and more so with a combination of transpose and flip (with or without resizing), the resulting video had a greenish output. This "YUV_NV21_TO_BGR" method saved the day. Thanks to @David Han from the Google Groups thread above.
Also, be aware that all this processing (transpose, flip and resize) inside onPreviewFrame takes a lot of time, which causes a very serious hit to your frames per second (FPS) rate. When I used this code inside onPreviewFrame, the frame rate of the recorded video dropped from 30 fps to 3 fps.
I would advise against this approach. Instead, do the post-recording processing (transpose, flip and resize) of your video file using JavaCV in an AsyncTask. Hope this helps.
I'm capturing images from a camera, and I have two functions for saving a 16-bit(!) image: one in PNG and one in TIFF format.
Could you please explain why the PNG comes out as a very noisy image? Like this:
PNG function:
bool save_image_png(const char *file_name, const mono16bit& img)
{
    [...]

    /* write header */
    if (setjmp(png_jmpbuf(png_ptr)))
        abort_("[write_png_file] Error during writing header");

    png_set_IHDR(png_ptr, info_ptr, width, height,
                 bit_depth, PNG_COLOR_TYPE_GRAY, PNG_INTERLACE_NONE,
                 PNG_COMPRESSION_TYPE_BASE, PNG_FILTER_TYPE_BASE);

    png_write_info(png_ptr, info_ptr);

    /* write bytes */
    if (setjmp(png_jmpbuf(png_ptr)))
        abort_("[write_png_file] Error during writing bytes");

    row_pointers = (png_bytep*) malloc(sizeof(png_bytep) * height);
    for (y = 0; y < height; y++)
        row_pointers[y] = (png_byte*) malloc(png_get_rowbytes(png_ptr, info_ptr));

    for (y = 0; y < height; y++)
    {
        row_pointers[y] = (png_bytep)img.getBuffer() + y * width * 2;
    }

    png_write_image(png_ptr, row_pointers);

    /* end write */
    [...]
}
and TIFF function:
bool save_image(const char *fname, const mono16bit& img)
{
    [...]

    for (y = 0; y < height; y++) {
        if ((err = TIFFWriteScanline(tif, (tdata_t)(img.getBuffer() + width * y), y, 0)) == -1)
            break;
    }
    TIFFClose(tif);
    if (err == -1) {
        fprintf(stderr, "Error writing to %s file\n", fname);
        return false;
    }
    return true;
    //#endif //USE_LIBTIFF
}
Thank you!
png_set_swap does nothing. You have to actually flip bytes in each pixel of the image.
If you're on a PC and have SSSE3 or newer, a good way is the _mm_shuffle_epi8 instruction; build the permute vector with _mm_setr_epi8.
If you're on ARM and have NEON, use the vrev16q_u8 instruction instead.
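To illustrate the SSSE3 suggestion, here is a minimal sketch (my own example, not part of the original answer). It assumes 16-bit samples, a buffer length that is a multiple of 16 bytes, and a CPU with SSSE3:

#include <tmmintrin.h>  // SSSE3 intrinsics
#include <stdint.h>
#include <stddef.h>

// Swap the two bytes of every 16-bit sample in place.
static void swap16_ssse3(uint8_t *buf, size_t bytes) {
    // Permute vector: each pair of bytes within a 16-byte lane is exchanged.
    const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6,
                                       9, 8, 11, 10, 13, 12, 15, 14);
    for (size_t i = 0; i + 16 <= bytes; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        _mm_storeu_si128((__m128i *)(buf + i), _mm_shuffle_epi8(v, swap));
    }
}

Call this on the 16-bit grayscale buffer before handing the rows to png_write_image(), so the samples end up big-endian as the PNG format expects.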
Perhaps you have a byte-order problem.
Try adding:
png_set_swap(png_ptr);
before saving the image
I have a project using libavcodec (ffmpeg). I'm using it to encode MPEG-2 video at 4:2:2 Profile, Main Level. I have the pixel format PIX_FMT_YUV422P selected in the AVCodecContext, however the video output I'm getting has all the colours wrong, and looks to me like the encoder is incorrectly reading the buffers as though it thinks it is 4:2:0 chroma rather than 4:2:2. Here's my codec setup:
//
// AVFormatContext* _avFormatContext previously defined as mpeg2video
//

//
// Set up the video stream for output
//
AVStream* _avVideoStream = av_new_stream(_avFormatContext, 0);
if (!_avVideoStream)
{
    err = ccErrWFFFmpegUnableToAllocateStream;
    goto bail;
}

_avCodecContext = _avVideoStream->codec;
_avCodecContext->codec_id = CODEC_ID_MPEG2VIDEO;
_avCodecContext->codec_type = CODEC_TYPE_VIDEO;

//
// Set up required parameters
//
_avCodecContext->rc_max_rate = _avCodecContext->rc_min_rate = _avCodecContext->bit_rate = src->_avCodecContext->bit_rate;
_avCodecContext->flags = CODEC_FLAG_INTERLACED_DCT;
_avCodecContext->flags2 = CODEC_FLAG2_INTRA_VLC | CODEC_FLAG2_NON_LINEAR_QUANT;
_avCodecContext->qmin = 1;
_avCodecContext->qmax = 1;
_avCodecContext->rc_buffer_size = _avCodecContext->rc_initial_buffer_occupancy = 2000000;
_avCodecContext->rc_buffer_aggressivity = 0.25;
_avCodecContext->profile = 0;
_avCodecContext->level = 5;
_avCodecContext->width = f->GetWidth();   // f is a private Frame class with width, height properties etc.
_avCodecContext->height = f->GetHeight();
_avCodecContext->time_base.den = 25;
_avCodecContext->time_base.num = 1;
_avCodecContext->gop_size = 12;
_avCodecContext->max_b_frames = 2;
_avCodecContext->pix_fmt = PIX_FMT_YUV422P;

if (_avFormatContext->oformat->flags & AVFMT_GLOBALHEADER)
{
    _avCodecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
}

if (av_set_parameters(_avFormatContext, NULL) < 0)
{
    err = ccErrWFFFmpegUnableToSetParameters;
    goto bail;
}

//
// Set up video codec for encoding
//
AVCodec* _avCodec = avcodec_find_encoder(_avCodecContext->codec_id);
if (!_avCodec)
{
    err = ccErrWFFFmpegUnableToFindCodecForOutput;
    goto bail;
}

if (avcodec_open(_avCodecContext, _avCodec) < 0)
{
    err = ccErrWFFFmpegUnableToOpenCodecForOutput;
    goto bail;
}
A screengrab of the resulting video frame can be seen at http://ftp.limeboy.com/images/screen_grab.png (the input was standard colour bars).
I've checked by outputting debug frames to TGA format at various points in the process, and I can confirm that it is all fine and dandy up until the point that libavcodec encodes the frame.
Any assistance most appreciated!
Cheers,
Mike.
OK, this is embarrassing.
Actually, the way I had it set up is correct. Looking through the source code for ffmpeg, it appears that all you have to do to get it to encode 4:2:2 profile and 4:2:2 chroma is to set the incoming pixel format to PIX_FMT_YUV422P.
The cause of the problem? I was watching the video file back in VLC on a virtual machine which, at some stage, had changed its display colour depth from 32-bit to 16-bit.
That's right! IT changed it. I didn't change it - IT did it! BY ITSELF, YOU HEAR ME!!
Apologies if anyone wasted their time chasing down this non-issue.
I have several .png images (ETA: but the format could also be JPEG or something else) that I am going to display in UITableViewCells. Right now, in order to get the row heights, I load in the images, get their size properties, and use those to figure out how high to make the rows (calculating any necessary adjustments along the way, since most of the images get resized before being displayed). In order to speed things up and reduce memory usage, I'd like to be able to get the size without loading the images. Is there a way to do this?
Note: I know that there are a number of shortcuts I could implement to eliminate this issue, but for several reasons I can't resize images in advance or collect the image sizes in advance, forcing me to get this info at run time.
It should be pretty simple. The PNG spec explains the layout of a PNG datastream (which is effectively the file), and the IHDR chunk contains the image dimensions.
So what you have to do is read in the PNG "magic value" (signature), skip the first chunk's length and type fields, and then read two four-byte integers, which will be the width and height, respectively. You may also need to reorder the bytes in these values (PNG stores integers in big-endian order), but once you figure that out, it is very simple.
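As a rough illustration of that approach, here is a small C helper (a hypothetical example, assuming a well-formed PNG; the spec guarantees the IHDR chunk comes immediately after the 8-byte signature, so width and height sit at byte offsets 16 and 20, stored big-endian):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static int png_dimensions(const char *path, uint32_t *width, uint32_t *height) {
    static const unsigned char sig[8] = {137, 'P', 'N', 'G', 13, 10, 26, 10};
    unsigned char buf[24];
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    // Need the 8-byte signature plus the IHDR length/type fields plus 8 bytes of data.
    if (n != sizeof buf || memcmp(buf, sig, 8) != 0) return -1;
    // Bytes 8-15 are the IHDR chunk length and type; 16-19 are width, 20-23 are height.
    *width  = (uint32_t)buf[16] << 24 | buf[17] << 16 | buf[18] << 8 | buf[19];
    *height = (uint32_t)buf[20] << 24 | buf[21] << 16 | buf[22] << 8 | buf[23];
    return 0;
}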
As of iOS SDK 4.0, this task can be accomplished with the ImageIO framework (CGImageSource...). I have answered a similar question here.
imageUrl is an NSURL; also import ImageIO/ImageIO.h with <> around it (#import <ImageIO/ImageIO.h>).
CGImageSourceRef imageSourceRef = CGImageSourceCreateWithURL((__bridge CFURLRef)imageUrl, NULL);
if (!imageSourceRef)
    return;

CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(imageSourceRef, 0, NULL);
NSDictionary *properties = (NSDictionary *)CFBridgingRelease(props);
CFRelease(imageSourceRef);
if (!properties) {
    return;
}

NSNumber *height = [properties objectForKey:@"PixelHeight"];
NSNumber *width = [properties objectForKey:@"PixelWidth"];
int pixelHeight = 0;
int pixelWidth = 0;
if (height) {
    pixelHeight = [height intValue];
}
if (width) {
    pixelWidth = [width intValue];
}
Note: this function doesn't work with iPhone-optimized PNGs. This optimization is performed automatically by Xcode and changes the image header; see this thread for more details and for how to disable the feature: http://discussions.apple.com/thread.jspa?threadID=1751896
Future versions of PSFramework will interpret these headers too; stay tuned.
See the function below; it does just that. It reads only 30 bytes of the PNG file and returns the size (CGSize). This function is part of an image-processing framework called PSFramework (http://sourceforge.net/projects/photoshopframew/). It is not yet implemented for other image formats; developers are welcome to contribute. The project is open source under the GNU license.
CGSize PSPNGSizeFromMetaData( NSString* anFileName ) {
    // File Name from Bundle Path.
    NSString *fullFileName = [NSString stringWithFormat:@"%@/%@", [[NSBundle mainBundle] bundlePath], anFileName];
    // File Name to C String.
    const char* fileName = [fullFileName UTF8String];

    /* source file */
    FILE * infile;

    // Check if we can open the file.
    if ((infile = fopen(fileName, "rb")) == NULL)
    {
        NSLog(@"PSFramework Warning >> (PSPNGSizeFromMetaData) can't open the file: %@", anFileName);
        return CGSizeZero;
    }

    ////// ////// ////// ////// ////// ////// ////// ////// ////// ////// //////
    // Length of Buffer.
    #define bytesLength 30

    // Bytes Buffer.
    unsigned char buffer[bytesLength];

    // Grab Only First Bytes.
    fread(buffer, 1, bytesLength, infile);

    // Close File.
    fclose(infile);

    ////// ////// ////// ////// //////
    // PNG Signature.
    unsigned char png_signature[8] = {137, 80, 78, 71, 13, 10, 26, 10};

    // Compare File signature.
    if ((int)(memcmp(&buffer[0], &png_signature[0], 8))) {
        NSLog(@"PSFramework Warning >> (PSPNGSizeFromMetaData) : The file (%@) is not a PNG file.", anFileName);
        return CGSizeZero;
    }

    ////// ////// ////// ////// ////// ////// ////// ////// ////// //////
    // Calc Sizes. Isolate only the four bytes of each size (width, height).
    int width[4];
    int height[4];
    for ( int d = 16; d < ( 16 + 4 ); d++ ) {
        width[ d-16] = buffer[ d ];
        height[d-16] = buffer[ d + 4];
    }

    // Convert bytes to Long (Integer).
    long resultWidth  = (width[0]  << 24) | (width[1]  << 16) | (width[2]  << 8) | width[3];
    long resultHeight = (height[0] << 24) | (height[1] << 16) | (height[2] << 8) | height[3];

    // Return Size.
    return CGSizeMake( resultWidth, resultHeight );
}
//Here's a quick & dirty port to C#
public static Size PNGSize(string fileName)
{
    // PNG Signature.
    byte[] png_signature = {137, 80, 78, 71, 13, 10, 26, 10};

    try
    {
        using (FileStream stream = File.OpenRead(fileName))
        {
            byte[] buf = new byte[30];
            if (stream.Read(buf, 0, 30) == 30)
            {
                int i = 0;
                int imax = png_signature.Length;
                for (i = 0; i < imax; i++)
                {
                    if (buf[i] != png_signature[i])
                        break;
                }

                // passes sig test
                if (i == imax)
                {
                    // Calc Sizes. Isolate only four bytes of each size (width, height).
                    // Convert bytes to integer
                    int resultWidth = buf[16] << 24 | buf[17] << 16 | buf[18] << 8 | buf[19];
                    int resultHeight = buf[20] << 24 | buf[21] << 16 | buf[22] << 8 | buf[23];

                    // Return Size.
                    return new Size(resultWidth, resultHeight);
                }
            }
        }
    }
    catch
    {
    }

    return new Size(0, 0);
}
This is nicely implemented in Perl's Image::Size module for about a dozen formats -- including PNG and JPEG. In order to re-implement it in Objective-C, just take the Perl code and read it as pseudocode ;-)
For instance, pngsize() is defined as
# pngsize : gets the width & height (in pixels) of a png file
# cor this program is on the cutting edge of technology! (pity it's blunt!)
#
# Re-written and tested by tmetro@vl.com
sub pngsize
{
    my $stream = shift;
    my ($x, $y, $id) = (undef, undef, "could not determine PNG size");
    my ($offset, $length);

    # Offset to first Chunk Type code = 8-byte ident + 4-byte chunk length + 1
    $offset = 12; $length = 4;
    if (&$read_in($stream, $length, $offset) eq 'IHDR')
    {
        # IHDR = Image Header
        $length = 8;
        ($x, $y) = unpack("NN", &$read_in($stream, $length));
        $id = 'PNG';
    }

    ($x, $y, $id);
}
jpegsize is only a few lines longer.
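For comparison, here is a rough C sketch of the same idea for JPEG (my own illustration, not the module's code; it assumes an ordinary JFIF/EXIF file and keeps error handling minimal). It walks the marker segments until it finds a SOF marker, which carries the frame height and width as big-endian 16-bit values:

#include <stdio.h>
#include <stdint.h>

static int jpeg_dimensions(const char *path, uint32_t *width, uint32_t *height) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int ok = -1;
    if (fgetc(f) == 0xFF && fgetc(f) == 0xD8) {                // SOI marker
        for (;;) {
            if (fgetc(f) != 0xFF) break;                       // lost sync
            int marker;
            do { marker = fgetc(f); } while (marker == 0xFF);  // skip fill bytes
            if (marker == EOF) break;
            // SOF0..SOF15, except DHT (C4), JPG (C8) and DAC (CC), carry the frame size.
            if (marker >= 0xC0 && marker <= 0xCF &&
                marker != 0xC4 && marker != 0xC8 && marker != 0xCC) {
                unsigned char seg[7];                          // length(2) precision(1) height(2) width(2)
                if (fread(seg, 1, 7, f) == 7) {
                    *height = (uint32_t)seg[3] << 8 | seg[4];
                    *width  = (uint32_t)seg[5] << 8 | seg[6];
                    ok = 0;
                }
                break;
            }
            // Every other segment starts with a 2-byte big-endian length that
            // includes the length field itself; skip over the rest of it.
            int hi = fgetc(f), lo = fgetc(f);
            if (hi == EOF || lo == EOF) break;
            long len = (hi << 8 | lo) - 2;
            if (len < 0 || fseek(f, len, SEEK_CUR) != 0) break;
        }
    }
    fclose(f);
    return ok;
}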
Try using the CGImageCreateWithPNGDataProvider and CGImageCreateWithJPEGDataProvider functions. I don't know whether they're lazy enough or not, or whether that's even possible for JPEG, but it's worth trying.
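If you want to try that suggestion, a hypothetical sketch might look like this (CoreGraphics is a C API, so this is plain C on Apple platforms; whether the full decode is actually deferred until the pixels are needed is exactly the open question above):

#include <CoreGraphics/CoreGraphics.h>
#include <stdio.h>

static void print_png_size(const char *path) {
    CGDataProviderRef provider = CGDataProviderCreateWithFilename(path);
    if (!provider) return;
    CGImageRef image = CGImageCreateWithPNGDataProvider(provider, NULL, false,
                                                        kCGRenderingIntentDefault);
    if (image) {
        printf("%zu x %zu\n", CGImageGetWidth(image), CGImageGetHeight(image));
        CGImageRelease(image);
    }
    CGDataProviderRelease(provider);
}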
Low-tech solutions:
If you know what the images are beforehand, store the image sizes along with their filenames in an XML file or plist (or whichever way you prefer) and just read those properties in.
If you don't know what the images are (i.e. they will be determined at runtime), then you must have loaded the images at one time or another. The first time you do load them, save their height and width to a file so you can access them later.