Apple Compression Library: identify the algorithm used to compress data - swift

Apple has four compression algorithms in its Compression library (LZ4, LZFSE, LZMA, and zlib).
How can I tell which algorithm was used to compress a given NSData instance? The compressed data should have a header of some kind, no?
edit: by extension, how do you know the size of compressed/decompressed data and thus the required buffer size?
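As far as I can tell, the raw output of compression_encode_buffer doesn't carry a portable header that names the algorithm or records the decoded size (the zlib variant, for instance, is a raw deflate stream), so in practice you either record both yourself when you compress, or fall back to trial decoding. A rough sketch of the trial-decoding fallback (the candidate order, the fixed output capacity, and treating any decoded bytes as success are all simplifying assumptions):

```swift
import Compression
import Foundation

// A minimal sketch, not a guaranteed detector: try decoding the buffer with each
// of the four algorithms and report the first one that produces output. The
// decoded size is not stored in the stream either, so the destination capacity
// here is just a guess; a too-small buffer may truncate the output.
func guessAlgorithm(of data: Data, decodedCapacity: Int = 8 * 1024 * 1024) -> compression_algorithm? {
    let candidates: [compression_algorithm] = [COMPRESSION_LZFSE, COMPRESSION_LZ4,
                                               COMPRESSION_ZLIB, COMPRESSION_LZMA]
    let dst = UnsafeMutablePointer<UInt8>.allocate(capacity: decodedCapacity)
    defer { dst.deallocate() }

    return data.withUnsafeBytes { (src: UnsafeRawBufferPointer) -> compression_algorithm? in
        guard let srcPtr = src.bindMemory(to: UInt8.self).baseAddress else { return nil }
        for algorithm in candidates {
            let decoded = compression_decode_buffer(dst, decodedCapacity,
                                                    srcPtr, data.count,
                                                    nil, algorithm)
            if decoded > 0 { return algorithm }   // first algorithm that yields any output
        }
        return nil
    }
}
```

For the buffer-size question specifically: since the decoded length isn't stored in the stream, the usual approach is to prepend your own small header with the algorithm and the original byte count when you compress.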

Related

Best practice to compress bitmap with LZ4

I'm packing some image resources for my game, and since this is a typical "compress once, decompress many times" scenario, LZ4 High Compression suits me well (LZ4HC takes longer to compress, but decompresses very fast).
I compressed a bitmap from 7.7MB down to 3.0MB, which looked good to me until I found that the PNG version is only 1.9MB.
I know that LZ4HC doesn't reach the ratio that deflate (which is used by PNG) does, but 2.55 vs. 4.05 doesn't look right.
I searched and found that before compressing, the PNG format performs a filtering step. I don't know the details, but it seems that filtering rearranges the data so that it suits the compression algorithm better.
So my question is:
Do I need to perform a filtering step before compressing with LZ4?
If yes, where can I get a library (or code snippet) to perform the filtering?
If no, is there any way to make a PNG (or another lossless image format) compress slowly but decompress fast?
The simplest filtering in PNG is just taking the difference of subsequent pixels. The first pixel is sent as is, the next pixel is sent as the difference of that pixel and the previous pixel, and so on. That would be quite fast, and provide a good bit of the compression gain of filtering.
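For what it's worth, that byte-difference ("Sub") filter is easy to implement yourself before handing the buffer to LZ4. The sketch below assumes tightly packed 8-bit channels and ignores scanline boundaries for brevity; the bytesPerPixel value is illustrative (e.g. 4 for RGBA):

```swift
// A rough sketch of PNG-style "Sub" filtering: each byte is replaced by its
// wrapping difference from the corresponding byte of the previous pixel.
// Real PNG filtering works per row and can pick a different filter per row.
func subFilter(_ pixels: [UInt8], bytesPerPixel: Int = 4) -> [UInt8] {
    guard pixels.count > bytesPerPixel else { return pixels }
    var filtered = pixels
    for i in bytesPerPixel..<pixels.count {
        filtered[i] = pixels[i] &- pixels[i - bytesPerPixel]   // &- is wrapping subtraction
    }
    return filtered
}

// The inverse: a running wrapping sum restores the original bytes, so the
// transform costs almost nothing at decompression time.
func unSubFilter(_ filtered: [UInt8], bytesPerPixel: Int = 4) -> [UInt8] {
    guard filtered.count > bytesPerPixel else { return filtered }
    var pixels = filtered
    for i in bytesPerPixel..<pixels.count {
        pixels[i] = pixels[i] &+ pixels[i - bytesPerPixel]
    }
    return pixels
}
```

You would run subFilter on the raw bitmap before handing it to LZ4HC, and apply unSubFilter right after decompression.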

Is it safe to compute a hash on an image compressed in a lossless format such as PNG, GIF, etc.?

I was wondering if any lossless image compression format such as PNG comes with some kind of uniqueness guarantee, i.e. that two different compressed binaries always decode to different images.
I want to compute the hash of images that are stored in a lossless compression format and am wondering if computing the hash of the compressed version would be sufficient.
(There are some good reasons to compute the hash on the uncompressed image, but those are out of the scope of my question here.)
No, that's not true for PNG. The compression procedure has many parameters (the filter type used for each row, the zlib compression level and settings), so a single raw image can result in many different PNG files. Even worse, PNG allows ancillary data (chunks) with miscellaneous information to be included (for example, textual comments).
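In other words, if you want a content hash you generally have to decode first and hash the pixels. A rough Swift sketch (assuming CoreGraphics, ImageIO, and CryptoKit are available; the forced RGBA8888 rendering is just one way to normalise the pixel layout, and colour management or alpha premultiplication can still introduce differences):

```swift
import CoreGraphics
import CryptoKit
import Foundation
import ImageIO

// Decode the file, render it into a fixed RGBA8888 buffer so the pixel layout
// no longer depends on how the PNG was written, and hash that buffer. This is
// an illustration of the idea, not a bulletproof canonical form.
func pixelHash(ofImageAt url: URL) -> String? {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          let image = CGImageSourceCreateImageAtIndex(source, 0, nil) else { return nil }

    let width = image.width, height = image.height
    var pixels = [UInt8](repeating: 0, count: width * height * 4)

    let ok = pixels.withUnsafeMutableBytes { buffer -> Bool in
        guard let context = CGContext(data: buffer.baseAddress,
                                      width: width, height: height,
                                      bitsPerComponent: 8, bytesPerRow: width * 4,
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
        else { return false }
        context.draw(image, in: CGRect(x: 0, y: 0, width: width, height: height))
        return true
    }
    guard ok else { return nil }

    return SHA256.hash(data: Data(pixels)).map { String(format: "%02x", $0) }.joined()
}
```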

Storing lots of images on server compression

We have a project which will generate lots (hundreds of thousands) of .PNG images that are around 1MB each. Rapid serving is not a priority, as we use the images internally, not on the front end.
We know to use the filesystem rather than a DB for storage.
We'd like to know how best to compress these images on the server to minimise long-term storage costs.
Linux server.
They already are compressed, so you would need to recode the images into another lossless format, while preserving all of the information present in the PNG files. I don't know of a format that will do that, but you can roll your own by recoding the image data using a better lossless compressor (you can see benchmarks here), and have a separate metadata file that retains the other information from the original .png files, so that you can reconstruct the original.
The best you could get losslessly, based on the benchmarks, would be about 2/3 of their current size. You would need to test the compressors on your actual data. Your mileage may vary.
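If you do go the roll-your-own route, the moving parts are small. A hedged sketch of the idea (the RecodedImage container, the minimal metadata set, and the choice of LZMA via NSData.compressed(using:) are my own illustrative assumptions, not something prescribed by the answer above):

```swift
import CoreGraphics
import Foundation
import ImageIO

// "Recode the image data with a stronger lossless compressor and keep a
// separate metadata record": decode the PNG, compress the decoded pixel buffer
// with LZMA, and remember enough about the image to rebuild it. A real version
// would also store the pixel format, colour space, and any ancillary PNG
// chunks you care about.
struct RecodedImage {
    let width: Int
    let height: Int
    let bitsPerPixel: Int
    let lzmaPixels: Data
}

func recode(pngAt url: URL) throws -> RecodedImage? {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          let image = CGImageSourceCreateImageAtIndex(source, 0, nil),
          let rawPixels = image.dataProvider?.data as Data? else { return nil }

    // NSData.compressed(using:) (macOS 10.15 / iOS 13+) wraps Apple's Compression library.
    let compressed = try (rawPixels as NSData).compressed(using: .lzma) as Data
    return RecodedImage(width: image.width, height: image.height,
                        bitsPerPixel: image.bitsPerPixel, lzmaPixels: compressed)
}
```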

image and video compression

What are similar compressors to the RAR algorithm?
I'm interested in compressing videos (for example, avi) and images (for example, jpg)
WinRAR reduced an AVI video (1 frame/sec) to 0.88% of its original size (i.e. it was 49.8MB, and it went down to 442KB).
It finished the compression in less than 4 seconds.
So I'm looking for a similar (open) algorithm. I don't care about decompression time.
Compressing "already compressed" formats are meaningless. Because, you can't get anything further. Even some archivers refuse to compress such files and stores as it is. If you really need to compress image and video files you need to "recompress" them. It's not meant to simply convert file format. I mean decode image or video file to some extent (not require to fully decoding), and apply your specific models instead of formats' model with a stronger entropy coder. There are several good attempts for such usages. Here is a few list:
PackJPG: Open source and fast performer JPEG recompressor.
Dell's Experimental MPEG1 and MPEG2 Compressor: Closed source and proprietry. But, you can at least test that experimental compressor strength.
Precomp: Closed source free software (but, it'll be open in near future). It recompress GIF, BZIP2, JPEG (with PackJPG) and Deflate (only generated with ZLIB library) streams.
Note that recompression is usually very time consuming process. Because, you have to ensure bit-identical restoration. Some programs even check every possible parameter to ensure stability (like Precomp). Also, their models have to be more and more complex to gain something negligible.
Compressed formats like JPEG can't really be compressed any further, since they are already close to their entropy limit; however, uncompressed formats like BMP, WAV, and AVI can.
Take a look at LZMA
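If you're on an Apple platform, LZMA is one of the four algorithms in the Compression library mentioned at the top of this page, so trying it on an uncompressed payload takes only a few lines. A sketch (the 64KB of destination headroom is an arbitrary assumption; a return value of 0 means the buffer was too small or the encode failed):

```swift
import Compression
import Foundation

// A minimal sketch: LZMA-compress an uncompressed payload (e.g. the bytes of a
// BMP or WAV file) with Apple's Compression library.
func lzmaCompress(_ input: Data) -> Data? {
    let dstCapacity = input.count + 64 * 1024       // assumed headroom
    let dst = UnsafeMutablePointer<UInt8>.allocate(capacity: dstCapacity)
    defer { dst.deallocate() }

    let written = input.withUnsafeBytes { (src: UnsafeRawBufferPointer) -> Int in
        guard let srcPtr = src.bindMemory(to: UInt8.self).baseAddress else { return 0 }
        return compression_encode_buffer(dst, dstCapacity,
                                         srcPtr, input.count,
                                         nil, COMPRESSION_LZMA)
    }
    return written > 0 ? Data(bytes: dst, count: written) : nil
}
```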

What is the maximum size of JPEG metadata?

Is there a theoretical maximum to the amount of metadata (EXIF, etc) that can be incorporated in a JPEG file? I'd like to allocate a buffer that is assured to be sufficient to hold the metadata for any JPEG image without having to parse it myself.
There is no theoretical maximum, since certain APP markers can be used multiple times (e.g. APP1 is used for both the EXIF header and the XMP block). Also, there is nothing to prevent multiple comment blocks.
In practice, the marker that most commonly results in a large header is APP2, which is used to store the image's ICC color profile. Since some complicated color profiles can be several megabytes, the profile actually gets split across many APP2 blocks (each APP block has a 16-bit length limit).
Each APPn data area has a length field that is 2 bytes, so a 65536-byte buffer would hold the biggest one. If you are only worried about the EXIF data, it would be a bit less.
http://www.fileformat.info/format/jpeg/egff.htm
There are at most 16 different APPN markers in a single file. I don't think they can be repeated, so 16*65K should be the theoretical max.
Wikipedia states:
Exif metadata are restricted in size to 64 kB in JPEG images because according to the specification this information must be contained within a single JPEG APP1 segment.
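Given those 2-byte length fields, it is also cheap to measure the metadata in a specific file rather than allocate for the worst case. A simplified Swift sketch of my own (it stops at the start-of-scan marker, has no error handling, and the choice to count only APPn and COM segments is an assumption):

```swift
import Foundation

// Walk the marker segments at the start of a JPEG file and add up the APPn
// (0xE0-0xEF) and COM (0xFE) segment lengths, using the 2-byte length field
// discussed above. Not a full JPEG parser.
func metadataBytes(inJPEG data: Data) -> Int {
    let bytes = [UInt8](data)
    var total = 0
    var i = 2                                    // skip the SOI marker (0xFF 0xD8)
    while i + 3 < bytes.count, bytes[i] == 0xFF {
        let marker = bytes[i + 1]
        if marker == 0xDA || marker == 0xD9 { break }             // SOS or EOI: header section is over
        let length = Int(bytes[i + 2]) << 8 | Int(bytes[i + 3])   // includes the two length bytes
        guard length >= 2 else { break }                          // malformed segment, give up
        if (0xE0...0xEF).contains(marker) || marker == 0xFE {     // APP0..APP15 or a comment block
            total += length
        }
        i += 2 + length
    }
    return total
}
```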