Alternative to sws_scale - libavcodec

I am encoding a captured Windows screen with x264 through libavcodec. Since the input is RGB, I convert it to YUV to make it compatible with x264, using the sws_scale function for the conversion.
My question is whether there is an alternative to this function, since I don't need any scaling in my case. It would also be useful if someone could shed light on how this function works.
P.S: I am assuming x264 operates only in YUV color space. If this assumption is incorrect, please inform me on the same.
Thanks in advance.

I could not find an alternative to swscale. Apart from the fast bilinear algorithm (for scaling), all the other algorithms in the library appear to produce only a negligible color shift.
Also, an RGB to YUV conversion on 8-bit samples cannot be made completely free of color shift, because the results of the conversion equations have to be rounded back to integers.
P.S: I could not use the RGB version of libx264 / libavcodec. If you have details on how to implement this and how to build a corresponding version on Windows, please post links/info.
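To illustrate where the shift comes from, here is a minimal sketch of a per-pixel conversion without any scaling. It assumes the full-range BT.601 (JPEG-style) coefficients and 8-bit samples; it is not what sws_scale does internally, and the function names are made up for the example.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def rgb_to_yuv(r, g, b):
    """Full-range BT.601 RGB -> YCbCr (assumed coefficients), rounded to 8 bits."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def yuv_to_rgb(y, cb, cr):
    """Inverse of rgb_to_yuv, again rounded to 8 bits."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def round_trip(rgb):
    """Convert an (r, g, b) tuple to YCbCr and back."""
    return yuv_to_rgb(*rgb_to_yuv(*rgb))


if __name__ == "__main__":
    for pixel in [(128, 128, 128), (255, 0, 0), (12, 200, 77)]:
        print(pixel, "->", rgb_to_yuv(*pixel), "->", round_trip(pixel))
```

Neutral grays survive the round trip, but saturated colors such as pure red typically come back off by one because of the rounding and clamping. Subsampling the chroma to 4:2:0 would lose more on top of that.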

P.S: I am assuming x264 operates only in YUV color space. If this assumption is incorrect, please inform me on the same.
libx264 supports I420/YV12/NV12/I422/YV16/NV16/I444/YV24/BGR24/BGR32/RGB24 input colorspaces, which are encoded as YUV 4:2:0/YUV 4:2:2/YUV 4:4:4/RGB (the output format should be specified in the params). But anything other than YUV 4:2:0 needs support from the decoder, because those formats are not part of the High profile but of newer profiles (High 4:2:2 and High 4:4:4 profiles).

Related

What type of entropy encoder does the MATLAB save() function use? I.e. how does that function work?

I am working on a compression project, and I used the default save() function in Matlab for the purpose of lossless (entropy) encoding. The transform module is all figured out.
I used the save() function to encode a 3d array that includes a bunch of zeros. I am sure that Matlab is using some kind of lossless compression with the save() function since, when I save that array, it ends up taking far less space than an array, say, containing no zeros at all. I had no success finding out what type of entropy encoding schemes are behind the function. Because it is a core part of the algorithm, I think I must at least know what is behind the function.
Plus, if you know any other type of entropy encoder that would do a better job in compressing a 3d array that contains zeros, I would really appreciate you sharing. Or, if you think I could easily write the code for that myself, then please let me know.
The v7 format uses deflate.
The v7.3 format uses the HDF5 format, which supports gzip (deflate) and szip compression. It also has an option to not compress.
The MATLAB save function supports compression for some of the formats that are available. Specifically, -v7 (default format) and -v7.3 support compression. The details of the compression are not documented.
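As a rough illustration of why an array full of zeros saves so small, here is a sketch using deflate through Python's zlib module. It only shows how deflate behaves on zero-heavy versus incompressible data; the buffer sizes and the compression level are arbitrary choices, and MATLAB's actual writer settings are not documented.

```python
import os
import zlib

SIZE = 100_000

zeros = bytes(SIZE)       # stand-in for an array that is all zeros
noise = os.urandom(SIZE)  # stand-in for data with no redundancy

packed_zeros = zlib.compress(zeros, 9)
packed_noise = zlib.compress(noise, 9)

if __name__ == "__main__":
    print("zeros:", SIZE, "->", len(packed_zeros), "bytes")
    print("noise:", SIZE, "->", len(packed_noise), "bytes")
```

The long runs of zeros collapse to a tiny fraction of the original size, while the random buffer does not shrink at all. Both decompress back to exactly the original bytes.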

What is the purpose of `kCGImageSourceShouldAllowFloat` for Image I/O?

When you use Image I/O on macOS, there's an option kCGImageSourceShouldAllowFloat which is documented as follows:
Whether the image should be returned as a CGImage object that uses floating-point values, if supported by the file format. CGImage objects that use extended-range floating-point values may require additional processing to render in a pleasing manner.
But it doesn’t say what file formats support it or what the benefits are, just that it might be slower.
Does anyone know what file formats support this and what the benefits would be?
TIFF files support floating point values. For example, the 128 bits per pixel format accepts 32-bit float components. See About Bitmap Images and Image Masks. Also see Supported Pixel Formats for table of supported pixel formats for graphics contexts.
In terms of the benefits of floating point, 32 bits per channel, it just means that you have more possible gradations of colors per channel. In general you can’t see this with the naked eye (over 16 bits per channel), but if you start applying adjustments (traditionally, multiple curves or levels adjustments) it means that you’re less likely to experience posterization of the images. So, if (a) the image already has this level of information; and (b) you might need to perform these sorts of adjustments to images, then the added data of 32 bits per component might have benefits. Otherwise the benefits of this amount of information are somewhat limited.
Bottom line, use floating point if you are possibly editing assets that might already have floating point components. But often we don’t need or use this level of information. Most of the JPG and PNG assets we deal with are 8 bits per component, anyway.

Best practice to compress bitmap with LZ4

I'm packing some image resources for my game, and since this is a typical "compress once, decompress many times" scenario, LZ4 High Compression fits me well (LZ4HC takes longer to compress, but decompresses very fast).
I compressed a bitmap from 7.7MB to 3.0MB, which looked good to me, until I found that the PNG version is only 1.9MB.
I know that LZ4HC does not reach the ratio that deflate (which is used by PNG) does, but a ratio of 2.55 vs 4.05 does not look right.
I searched and found that the PNG format performs a filtering operation before compressing. I don't know the details, but it looks like the filtering step transforms the data so that it suits the compression algorithm better.
So my question is:
Do I need to perform a filtering step before compressing with LZ4?
If yes, where can I get a library (or code snippet) to perform the filtering?
If no, is there any way to make PNG (or another lossless image format) slow to compress but fast to decompress?
The simplest filtering in PNG is just taking the difference of subsequent pixels. The first pixel is sent as is, the next pixel is sent as the difference of that pixel and the previous pixel, and so on. That would be quite fast, and provide a good bit of the compression gain of filtering.
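A minimal sketch of that difference filter and its inverse is below. The function names and the stride parameter (bytes per pixel, so each channel is differenced against the same channel of the previous pixel) are assumptions for the example, and zlib stands in for LZ4 only because it ships with Python; the filtering step itself is independent of the compressor.

```python
import zlib


def delta_encode(data: bytes, stride: int = 1) -> bytes:
    """Replace each byte by its difference (mod 256) from the byte `stride` back."""
    out = bytearray(len(data))
    for i, value in enumerate(data):
        prev = data[i - stride] if i >= stride else 0
        out[i] = (value - prev) & 0xFF
    return bytes(out)


def delta_decode(data: bytes, stride: int = 1) -> bytes:
    """Undo delta_encode by accumulating the differences (mod 256)."""
    out = bytearray(len(data))
    for i, value in enumerate(data):
        prev = out[i - stride] if i >= stride else 0
        out[i] = (value + prev) & 0xFF
    return bytes(out)


# A synthetic one-channel gradient standing in for smooth image data.
row = bytes((i * 3) % 256 for i in range(4096))
filtered = delta_encode(row)

if __name__ == "__main__":
    print("raw     :", len(zlib.compress(row, 9)), "bytes")
    print("filtered:", len(zlib.compress(filtered, 9)), "bytes")
```

Decoding is a single pass of additions, so it should cost very little next to the decompression itself. Real images are less regular than this gradient, so the gain will be smaller and worth measuring on actual assets.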

Understanding webp encoder options

I'm currently experimenting with the webp encoder (no WIC) in a 64-bit Windows environment. My samples are 10 JPG stock photos depicting landscapes and houses, and the photos have already been optimized with jpegtran. I do this because my goal is to optimize the images of a whole website, where the images have already been compressed with Photoshop using the Save for Web command at various quality values and then optimized with jpegtran.
I found that using values smaller than -q 85 has a visible impact on the quality of the webp images, so I'm playing with values above 90, where the difference is smaller. I also concluded that I have to use -jpeg_like, because without it the output is sometimes bigger than the original, which is not acceptable. I also use -m 6 -f 100 -strong, because I really don't mind how long the encoder takes to produce the output and I'm trying to achieve the smoothest results. I tried several values for these and concluded that -m 6 -f 100 -strong gives the best output in terms of quality and size.
I also tried the -preset photo avoiding any other parameter except -q but the size of the output gets bigger.
What I don't understand from https://developers.google.com/speed/webp/docs/cwebp#options are the options -sns , -segments which seem to have a great impact on the output size. Sometimes the output is bigger and sometimes smaller in size for the same options but I haven't concluded yet what is the reason for that and how to properly use them.
I also don't understand the -sharpness option, which doesn't have an impact on the output size, at least for me.
My approach is far from scientific and more like trial and error. If anybody knows how to use those options for this specific input and can explain them for optimum results, I would appreciate the feedback.
-strong and -sharpness only change the strength of the filtering in the header of the compressed bitstream. They will be used at decoding time. That's why you don't see a change in file size for these.
-sns controls the choice of filtering strength and quantization values within each segment. A segment is just a group of macroblocks in the picture that are believed to share similar properties regarding complexity and compressibility. A complex photo should likely use the maximum allowed 4 segments (which is the default).

Efficient way to fingerprint an image (jpg, png, etc)?

Is there an efficient way to get a fingerprint of an image for duplicate detection?
That is, given an image file, say a jpg or png, I'd like to be able to quickly calculate a value that identifies the image content and is fairly resilient to other aspects of the image (eg. the image metadata) changing. If it deals with resizing that's even better.
[Update] Regarding the meta-data in jpg files, does anyone know if it's stored in a specific part of the file? I'm looking for an easy way to ignore it - eg. can I skip the first x bytes of the file or take x bytes from the end of the file to ensure I'm not getting meta-data?
Stab in the dark, if you are looking to circumvent meta-data and size related things:
Edge Detection and scale-independent comparison
Sampling and statistical analysis of grayscale/RGB values (average lum, averaged color map)
FFT and other transforms (Good article Classification of Fingerprints using FFT)
And numerous others.
Basically:
Convert JPG/PNG/GIF whatever into an RGB byte array which is independent of encoding
Use a fuzzy pattern classification method to generate a 'hash of the pattern' in the image ... not a hash of the RGB array as some suggest
Then you want a distributed method of fast hash comparison based on matching threshold on the encapsulated hash or encoding of the pattern. Erlang would be good for this :)
Advantages are:
Will, if you use any AI/Training, spot duplicates regardless of encoding, size, aspect, hue and lum modification, dynamic range/subsampling differences and in some cases perspective
Disadvantages:
Can be hard to code .. something like OpenCV might help
Probabilistic ... false positives are likely but can be reduced with neural networks and other AI
Slow unless you can encapsulate pattern qualities and distribute the search (MapReduce style)
Check out image analysis books such as:
Pattern Classification 2ed
Image Processing Fundamentals
Image Processing - Principles and Applications
And others
If you are scaling the image, then things are simpler. If not, then you have to contend with the fact that scaling is lossy in more ways than sample reduction.
Using the byte size of the image for comparison would be suitable for many applications. Another way would be to:
Strip out the metadata.
Calculate the MD5 (or other suitable hashing algorithm) for the image.
Compare that to the MD5 (or whatever) of the potential dupe image (provided you've stripped out the metadata for that one too)
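One way to sidestep the metadata entirely is to hash the decoded pixels instead of the file bytes. The sketch below assumes the image has already been decoded into a raw pixel buffer by some image library (not shown); pixel_fingerprint is a made-up helper name.

```python
import hashlib


def pixel_fingerprint(width: int, height: int, pixels: bytes) -> str:
    """MD5 over the image dimensions plus the decoded pixel bytes.

    `pixels` is assumed to be the raw decoded buffer (e.g. RGB, row by row),
    so file-level metadata never enters the hash.
    """
    digest = hashlib.md5()
    digest.update(f"{width}x{height}:".encode("ascii"))
    digest.update(pixels)
    return digest.hexdigest()


if __name__ == "__main__":
    image = bytes(range(48))                         # a fake 4x4 RGB image
    tweaked = bytes([image[0] ^ 1]) + image[1:]      # one bit changed
    print(pixel_fingerprint(4, 4, image))
    print(pixel_fingerprint(4, 4, tweaked))
```

This only matches images whose pixels are bit-for-bit identical. A single changed bit, a resize, or a lossy re-encode produces a completely different digest.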
You could use an algorithm like SIFT (Scale Invariant Feature Transform) to determine key points in the pictures and match these.
See http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
It is used e.g. when stitching images in a panorama to detect matching points in different images.
You want to perform an image hash. Since you didn't specify a particular language I'm guessing you don't have a preference. At the very least there's a Matlab toolbox (beta) that can do it: http://users.ece.utexas.edu/~bevans/projects/hashing/toolbox/index.html. Most of the google results on this are research results rather than actual libraries or tools.
The problem with MD5ing it is that MD5 is very sensitive to small changes in the input, and it sounds like you want to do something a bit "smarter."
Pretty interesting question. The fastest and easiest approach would be to calculate the crc32 of the content byte array, but that would work only on 100% identical images. For a more intelligent comparison you would probably need some kind of fuzzy logic analysis...
I've implemented at least a trivial version of this. I transform and resize all images to a very small (fixed size) black and white thumbnail. I then compare those. It detects exact, resized, and duplicates transformed to black and white. It gets a lot of duplicates without a lot of cost.
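A thumbnail comparison along these lines can be sketched as an "average hash". This is an assumption about one way to do it, not necessarily what was implemented above: the image is taken as a 2D list of grayscale values (decoding is not shown), shrunk to 8x8 by box averaging, and each cell becomes one bit depending on whether it is brighter than the mean.

```python
def shrink(gray, size=8):
    """Downscale a 2D list of grayscale values to size x size by box averaging."""
    height, width = len(gray), len(gray[0])
    small = []
    for ty in range(size):
        y0 = ty * height // size
        y1 = max((ty + 1) * height // size, y0 + 1)
        row = []
        for tx in range(size):
            x0 = tx * width // size
            x1 = max((tx + 1) * width // size, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray, size=8):
    """Pack one bit per thumbnail cell: 1 if the cell is brighter than the mean."""
    cells = [v for row in shrink(gray, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = [[x * 8 for x in range(32)] for _ in range(32)]
    doubled = [[(x // 2) * 8 for x in range(64)] for _ in range(64)]
    print(hamming(average_hash(original), average_hash(doubled)))
```

Two images would be treated as duplicates when the Hamming distance between their hashes is below some small threshold, which has to be tuned on real data.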
The easiest thing to do is to do a hash (like MD5) of the image data, ignoring all other metadata. You can find many open source libraries that can decode common image formats so it's quite easy to strip metadata.
But that doesn't work when the image itself is manipulated in any way, including scaling or rotating.
To do exactly what you want, you have to use Image Watermarking but it's patented and can be expensive.
This is just an idea: Possibly low frequency components present in the DCT of the jpeg could be used as a size invariant identifier.
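A rough sketch of that idea, under several assumptions: the image has already been decoded to grayscale and shrunk to a small fixed-size square block (that step is what would give size invariance and is not shown), the DCT is computed directly rather than read from the JPEG stream, and the signature is simply whether each low-frequency coefficient lies above the median. The function names are made up for the example.

```python
import math


def dct_2d(block):
    """Unnormalized 2D DCT-II of a square block given as a list of rows."""
    n = len(block)
    # basis[k][i] = cos((2i + 1) * k * pi / (2n))
    basis = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
             for k in range(n)]
    return [[sum(block[y][x] * basis[u][y] * basis[v][x]
                 for y in range(n) for x in range(n))
             for v in range(n)]
            for u in range(n)]


def low_frequency_bits(block, k=4):
    """Bit signature from the k x k lowest-frequency coefficients, DC excluded."""
    coeffs = dct_2d(block)
    low = [coeffs[u][v] for u in range(k) for v in range(k) if (u, v) != (0, 0)]
    median = sorted(low)[len(low) // 2]
    return [1 if c > median else 0 for c in low]


if __name__ == "__main__":
    block = [[(x * x + 3 * y) % 17 for x in range(8)] for y in range(8)]
    print(low_frequency_bits(block))
```

Dropping the DC term makes the signature independent of overall brightness offsets, and comparing against the median makes it independent of contrast scaling. How well it holds up under real resizing and recompression would need to be tested.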