PNG: What is the benefit of using multiple IDAT-Chunks?

I would like to know what the benefit of using multiple IDAT-Chunks inside a PNG Image is.
The PNG documentation says
There may be multiple IDAT chunks; if so, they shall appear consecutively with no other intervening chunks. The compressed datastream is then the concatenation of the contents of the data fields of all the IDAT chunks.
I can't imagine it's because of the maximum size (2^31 - 1 bytes) of the data block inside a chunk.

Recall that all PNG chunks (including IDAT chunks) are prefixed with the chunk length. Putting the whole compressed stream into a single huge IDAT chunk would cause two inconveniences:
On the encoder side: the compressor doesn't know the total compressed size until it has finished compressing, so it would have to buffer the full compressed data in memory before it could write the chunk's length prefix.
On the decoder side: it depends on how chunk decoding is implemented; if the decoder buffers each chunk in memory (allocating the space given by the chunk's length prefix) and only passes the content to the decompressor after filling it and checking the CRC, then a single huge IDAT chunk would again be a memory hog.
Considering this, I believe that using rather small IDAT chunks (say, 16 KB or 64 KB) should be recommended practice. The overhead (12 bytes per chunk, less than 1/5000 of the data if len = 64 KB) is negligible.
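As a rough illustration of that practice, here is a minimal C sketch (helper names are mine, not production code) of how an encoder that already has the complete zlib datastream in memory might emit it as fixed-size IDAT chunks, using zlib's crc32() for the chunk CRC:

```c
#include <stdint.h>
#include <stdio.h>
#include <zlib.h>   /* crc32() */

/* Write one PNG chunk: 4-byte big-endian length, 4-byte type, data, CRC-32.
   The CRC covers the type code and the data, but not the length field. */
static void write_chunk(FILE *f, const char type[4],
                        const unsigned char *data, uint32_t len)
{
    unsigned char hdr[8] = { len >> 24, len >> 16, len >> 8, len,
                             type[0], type[1], type[2], type[3] };
    fwrite(hdr, 1, 8, f);
    if (len) fwrite(data, 1, len, f);

    uLong crc = crc32(0L, Z_NULL, 0);
    crc = crc32(crc, (const Bytef *)type, 4);
    if (len) crc = crc32(crc, data, len);

    unsigned char tail[4] = { crc >> 24, crc >> 16, crc >> 8, crc };
    fwrite(tail, 1, 4, f);
}

/* Split an already-compressed zlib datastream into fixed-size IDAT chunks. */
static void write_idat_chunks(FILE *f, const unsigned char *zdata, size_t zlen)
{
    const size_t CHUNK = 65536;               /* assumed encoder buffer size */
    while (zlen > 0) {
        size_t n = zlen < CHUNK ? zlen : CHUNK;
        write_chunk(f, "IDAT", zdata, (uint32_t)n);
        zdata += n;
        zlen  -= n;
    }
}
```

The zlib stream itself would be produced separately (for example with deflate() or compress2()); picking CHUNK to match the encoder's buffer size is the usual choice.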

It appears that when reading a PNG file, libpng limits the chunks of data it buffers to 8192 bytes even if the IDAT chunk size in the file is larger. This puts an upper limit on the allocation size needed for libpng to read and decompress IDAT chunks. However, a checksum error still cannot be detected until the entire IDAT chunk has been read, and this could take much longer with large IDAT chunks.
Assuming you're not concerned with early detection of CRC errors (if they do occur they'll still be detected but later on) then small IDAT chunks don't offer any benefit to the reader. Indeed, small IDAT chunks imply more separate calls to zlib and more preamble/postamble costs within zlib, so it's generally less efficient in processing time as well as space on disk.
For the writer, it's convenient to write finite-length IDAT chunks because you can determine before the write how long the chunk will be. If you want to write a single IDAT chunk then you must either complete the compression before beginning to write anything (requiring a lot of temporary storage), or you must seek within your output to update the IDAT chunk length once you know how long it is.
If you're compressing the image and streaming the result concurrently this might be impossible. If you're writing the image to disk then this is probably not a big deal.
In short, small chunks are for the compressing-on-the-fly, streaming-output use case. In most other situations you're better off with just a single chunk.
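To make the writer-side trade-off concrete, here is a hedged sketch of the "seek back and patch the length" approach for emitting a single IDAT chunk; the begin_idat/end_idat names are made up, and this only works on a seekable output, which is precisely what the streaming case lacks:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: write one single IDAT chunk by leaving a placeholder
   length, streaming the compressed data, then seeking back to patch it.
   Works only on a seekable file, not on a pipe or socket. */

static long begin_idat(FILE *f)
{
    long len_pos = ftell(f);                  /* where the length will go */
    unsigned char hdr[8] = { 0, 0, 0, 0, 'I', 'D', 'A', 'T' };
    fwrite(hdr, 1, 8, f);                     /* placeholder length + type */
    return len_pos;
}

/* The caller streams deflate output after begin_idat(), keeping a running
   byte count and a running CRC-32 over "IDAT" plus that output. */
static void end_idat(FILE *f, long len_pos, uint32_t data_len, uint32_t crc)
{
    unsigned char tail[4] = { crc >> 24, crc >> 16, crc >> 8, crc };
    fwrite(tail, 1, 4, f);                    /* chunk CRC */

    unsigned char len[4] = { data_len >> 24, data_len >> 16,
                             data_len >> 8,  data_len };
    long end_pos = ftell(f);
    fseek(f, len_pos, SEEK_SET);              /* go back and patch the length */
    fwrite(len, 1, 4, f);
    fseek(f, end_pos, SEEK_SET);              /* resume after the chunk */
}
```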

Related

Why doesn't lz4 keep small/uncompressible values uncompressed?

When compressing small values (< 500 bytes or so), and for uncompressible random values,
lz4 returns data that is much larger than the original value (e.g. 27 bytes from 4).
When a large number of such values are compressed separately (e.g. in a key-value store), this adds up.
The question is: why doesn't lz4 use e.g. a separate magic number for values that didn't get smaller after compression, leaving the original data as-is and adding only 4 bytes of overhead?
The same applies to many other compression formats.
Code with demonstration: https://jsfiddle.net/gczy7f3k/2/
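For what it's worth, an application can layer exactly that fallback on top of the raw lz4 block API itself. Here is a minimal C sketch, with a made-up one-byte flag as the framing (the flag and function name are not part of lz4):

```c
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

/* Sketch of the wrapper the question proposes: a 1-byte flag says whether
   the payload is compressed (1) or stored verbatim (0). */
static int pack(const char *src, int src_len, char **out)
{
    int bound = LZ4_compressBound(src_len);   /* always >= src_len */
    char *buf = malloc(1 + (size_t)bound);
    if (!buf) return -1;

    int clen = LZ4_compress_default(src, buf + 1, src_len, bound);
    if (clen > 0 && clen < src_len) {
        buf[0] = 1;                           /* compressed payload */
        *out = buf;
        return 1 + clen;
    }

    /* Compression did not help: store the original bytes as-is. */
    buf[0] = 0;
    memcpy(buf + 1, src, (size_t)src_len);
    *out = buf;
    return 1 + src_len;
}
```

The reader would check the flag byte and either copy the payload verbatim or hand it to LZ4_decompress_safe(); as with plain lz4 blocks, the original size still has to be carried somewhere.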

How to compute the Adler32 in zlib [duplicate]

I am currently writing a C program that builds a PNG image from a data file generated by another. The image is a palette type.
Is the Adler-32 checksum calculated on the uncompressed data for...
a) each compressed block within an IDAT data chunk?
b) all compressed blocks within an IDAT data chunk?
c) all compressed blocks spanning all IDAT data chunks?
From the documents at http://www.w3.org/TR/PNG/, https://www.rfc-editor.org/rfc/rfc1950 and rfc1951 (at the same address as the previous), I am of the opinion that it is case 'c' above, allowing one's deflate implementation to chop and change how the data are compressed for each block and to disregard how the compressed blocks are split between consecutive IDAT chunks.
Is this correct?
There can be only one compressed image data stream in a PNG file, and that is a single zlib stream with a single Adler-32 check at the end that is the Adler-32 of all of the uncompressed data (as pre-processed by the filters and interlacing). That zlib stream may or may not be broken up into multiple IDAT chunks. Each IDAT chunk has its own CRC-32, which is the CRC-32 of the chunk type code "IDAT" and the compressed data within.
I'm not sure what you mean by "allowing one's deflate implementation to chop and change how the data are compressed for each block". The deflate implementation for a valid PNG file must compress all of the filtered image data as a single zlib stream.
After you've compressed it as a single zlib stream, you can break up that stream however you like into a series of IDAT chunks, or as a single IDAT chunk.
PNG IDAT chunks are independent of the compressed deflate blocks. The Adler-32 checksum is part of the zlib compression layer only and has nothing to do with the PNG's overall chunk structure.
From the PNG Specification:
There can be multiple IDAT chunks; if so, they must appear consecutively with no other intervening chunks. The compressed datastream is then the concatenation of the contents of all the IDAT chunks. The encoder can divide the compressed datastream into IDAT chunks however it wishes. (Multiple IDAT chunks are allowed so that encoders can work in a fixed amount of memory; typically the chunk size will correspond to the encoder's buffer size.) It is important to emphasize that IDAT chunk boundaries have no semantic significance and can occur at any point in the compressed datastream.
(emphasis mine)
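To make the distinction concrete, here is a small sketch using zlib's helpers (the function names are mine): one Adler-32 over all of the filtered, uncompressed image data, written once at the end of the single zlib stream, versus one CRC-32 per IDAT chunk covering the chunk type code plus that chunk's slice of the compressed stream:

```c
#include <stdint.h>
#include <zlib.h>

/* Adler-32 of the whole filtered image data: case (c) in the question. */
uint32_t adler_of_image(const unsigned char *filtered, size_t len)
{
    uLong a = adler32(0L, Z_NULL, 0);
    return (uint32_t)adler32(a, filtered, (uInt)len);
}

/* CRC-32 of one IDAT chunk: independent of the zlib/Adler-32 layer. */
uint32_t crc_of_idat(const unsigned char *chunk_data, size_t len)
{
    uLong c = crc32(0L, Z_NULL, 0);
    c = crc32(c, (const Bytef *)"IDAT", 4);
    return (uint32_t)crc32(c, chunk_data, (uInt)len);
}
```

In practice you rarely compute the Adler-32 yourself: using zlib's deflate() with the default zlib wrapper appends it to the end of the stream automatically.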

zlib, how to decompress a stream of compressed data chunks?

I have a problem about transfer compressed data across network.
The data size is around a few hundred MB. My plan is to divide the data into 1 MB chunks, compress each chunk with zlib, and then stream the compressed data over the network. On the other end of the network, the data will be decompressed by zlib.
My question is that since I stream the compressed data, there won't be information about where each compressed chunk starts and ends in the stream. I am not sure if zlib can decompress such compressed data stream.
If zlib can, please kindly let me know what flush mode I should use in deflate/inflate methods?
Thanks!
It is not clear why you are dividing the data into chunks, or why you would need to do any special flushing. If you just mean feeding the data to zlib in chunks, that is how zlib is normally used. zlib doesn't care how you feed it the data -- big chunks, little chunks, one byte at a time, one giant chunk, etc. will not change the compressed result.
Flushing does change the compressed result, degrading it slightly or significantly depending on how often you flush and how you flush.
Flushing is used when you want to assure that some portion of the data is fully received at a known boundary in the compressed data, or if you want to be able to recover part of the data if not all of it is received.
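For reference, here is a minimal sketch of that normal usage, modelled on zlib's standard inflate loop; the buffer sizes and the file-based I/O are stand-ins for whatever your network layer provides:

```c
#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* Decompress a zlib stream that arrives in arbitrary-sized pieces.
   inflate() keeps its own state across calls, so the sizes of the
   pieces you feed it are irrelevant to the result. */
int inflate_stream(FILE *in, FILE *out)
{
    unsigned char ibuf[16384], obuf[16384];   /* assumed buffer sizes */
    z_stream zs;
    memset(&zs, 0, sizeof zs);
    int ret = inflateInit(&zs);
    if (ret != Z_OK) return ret;

    do {
        zs.avail_in = (uInt)fread(ibuf, 1, sizeof ibuf, in);
        if (zs.avail_in == 0) break;          /* input ended early */
        zs.next_in = ibuf;

        do {                                  /* drain all output for this input */
            zs.avail_out = sizeof obuf;
            zs.next_out  = obuf;
            ret = inflate(&zs, Z_NO_FLUSH);
            if (ret == Z_NEED_DICT) ret = Z_DATA_ERROR;
            if (ret == Z_STREAM_ERROR || ret == Z_DATA_ERROR ||
                ret == Z_MEM_ERROR) {
                inflateEnd(&zs);
                return ret;
            }
            fwrite(obuf, 1, sizeof obuf - zs.avail_out, out);
        } while (zs.avail_out == 0);
    } while (ret != Z_STREAM_END);

    inflateEnd(&zs);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}
```

Note that Z_STREAM_END is reported when the single zlib stream ends, regardless of how the input was chunked on the way in.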
If your chunk-by-chunk strategy is a requirement, you could define a simple protocol between your host and the remote end, such as:
02 123456 the-compressed-data 654321 the-compressed-data
The three numbers are, respectively:
1. the number of chunks of data (2 here)
2. the size in bytes of the first compressed chunk
3. the size in bytes of the second compressed chunk
A sketch of this framing is shown below.
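Here is one way the sending side might implement it, with made-up names and fixed-width big-endian length fields instead of the decimal digits shown above:

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value in big-endian order. */
static void put_u32(FILE *f, uint32_t v)
{
    unsigned char b[4] = { v >> 24, v >> 16, v >> 8, v };
    fwrite(b, 1, 4, f);
}

/* Frame already-compressed chunks as: count, then (length, bytes) pairs. */
static void send_chunks(FILE *f, const unsigned char *const *chunks,
                        const uint32_t *lens, uint32_t count)
{
    put_u32(f, count);                        /* number of chunks */
    for (uint32_t i = 0; i < count; i++) {
        put_u32(f, lens[i]);                  /* compressed size of chunk i */
        fwrite(chunks[i], 1, lens[i], f);     /* the compressed bytes */
    }
}
```

The receiver reads the count, then for each chunk reads the length, reads that many bytes, and decompresses them.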

array data compression for an array holding 13268 bits (1.66 kBytes)

i.e. the array holds 100*125 bits of data for each aircraft, plus 8 ASCII messages of 12 characters each.
What compression technique should I apply to such data?
Depends mostly on what those 12500 bits look like, since that's the biggest part of your data. If there aren't any real patterns in it, or if they aren't byte-sized or word-sized patterns, "compressing" it may actually make it bigger, since almost every compression algorithm will add a small amount of extra data just to make decompression possible.