How best to archive many JPEG files containing significant redundancy across their scenes? - image-compression

Archives of JPEG files don't compress well, ostensibly because each JPEG is already highly compressed. However, when there is a lot of redundancy between images (e.g., archived stills from a stationary camera) and the number of files is large (think thousands or more), there comes a point where failing to exploit that redundancy makes JPEG look dramatically inefficient as a format for archival storage.
What approach and archive format would give the best compression of JPEG files?

Related

Best practice to compress bitmap with LZ4

I'm packing some image resources for my game, and since this is a typical "compress once, decompress many times" scenario, LZ4 High Compression suits me well (LZ4HC takes longer to compress, but decompresses very fast).
I compressed a bitmap from 7.7MB to 3.0MB, which looks good to me, until I found that the PNG version is only 1.9MB.
I know that LZ4HC does not achieve the ratio that deflate (which is used by PNG) does, but 2.55 vs. 4.05 doesn't look right.
I searched and found that PNG performs a filtering operation before compressing. I don't know the details, but it looks like the filtering step rearranges the data so that it suits the compression algorithm better.
So my question is:
Do I need to perform a filtering step before compressing with LZ4?
If yes, where can I get a library (or code snippet) to perform the filtering?
If no, is there any solution that makes a PNG (or another lossless image format) compress slowly but decompress fast?
The simplest filtering in PNG is just taking the difference of subsequent pixels. The first pixel is sent as is, the next pixel is sent as the difference of that pixel and the previous pixel, and so on. That would be quite fast, and provide a good bit of the compression gain of filtering.
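To make that concrete, here is a minimal Python sketch of the idea, assuming the numpy and python-lz4 packages. It applies a byte-wise per-row delta (a simplification of PNG's Sub filter, which subtracts the corresponding byte of the previous pixel rather than the previous byte) and compares LZ4HC output sizes with and without it; the function names and the synthetic test image are just for illustration.

```python
import numpy as np
import lz4.frame  # python-lz4, assumed available

def sub_filter(img: np.ndarray) -> np.ndarray:
    """Byte-wise delta per image row; reversible via a cumulative sum mod 256."""
    flat = img.reshape(img.shape[0], -1)   # one row of bytes per image row
    filtered = flat.copy()
    filtered[:, 1:] -= flat[:, :-1]        # uint8 arithmetic wraps modulo 256
    return filtered

def unfilter(filtered: np.ndarray, shape) -> np.ndarray:
    """Inverse of sub_filter: cumulative sum in uint8 restores the original bytes."""
    return np.cumsum(filtered, axis=1, dtype=np.uint8).reshape(shape)

# Synthetic smooth test image, standing in for a real bitmap.
h, w = 512, 512
y, x = np.mgrid[0:h, 0:w]
img = np.stack([x % 256, y % 256, (x + y) % 256], axis=-1).astype(np.uint8)

assert np.array_equal(unfilter(sub_filter(img), img.shape), img)

level = lz4.frame.COMPRESSIONLEVEL_MAX
print("plain   :", len(lz4.frame.compress(img.tobytes(), compression_level=level)))
print("filtered:", len(lz4.frame.compress(sub_filter(img).tobytes(), compression_level=level)))
```

On smooth, natural images the filtered stream usually compresses noticeably better; on noisy images the gain can be small or even negative, so it is worth measuring on your own assets.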

Is it safe to compute a hash on an image compressed in a lossless format such as PNG, GIF, etc.?

I was wondering if any lossless image compression format such as PNG comes with some kind of uniqueness guarantee, i.e. that two different compressed binaries always decode to different images.
I want to compute the hash of images that are stored in a lossless compression format and am wondering if computing the hash of the compressed version would be sufficient.
(There are some good reasons to compute the hash on the uncompressed image, but those are out of the scope of my question here.)
No, that's not true for PNG. The compression procedure has many parameters (the filter type used for each row, the zlib compression level and settings), so a single raw image can result in many different PNG files. Even worse, PNG allows ancillary data (chunks) with miscellaneous info (for example, textual comments) to be included.
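To illustrate, a quick Python sketch (assuming Pillow and numpy; compress_level is simply the zlib level exposed by Pillow's PNG writer) shows the same pixels producing different PNG bytes, while a hash of the decoded pixels stays stable:

```python
import hashlib
import io

import numpy as np
from PIL import Image  # Pillow, assumed available

rng = np.random.default_rng(42)
img = Image.fromarray(rng.integers(0, 256, (64, 64, 3), dtype=np.uint8))

def png_bytes(im: Image.Image, level: int) -> bytes:
    buf = io.BytesIO()
    im.save(buf, format="PNG", compress_level=level)  # zlib compression level 0-9
    return buf.getvalue()

file_a, file_b = png_bytes(img, 1), png_bytes(img, 9)

# Hashes of the compressed files will (almost certainly) differ...
print(hashlib.sha256(file_a).hexdigest() == hashlib.sha256(file_b).hexdigest())
# ...while hashes of the decoded pixels agree.
pix_a = Image.open(io.BytesIO(file_a)).tobytes()
pix_b = Image.open(io.BytesIO(file_b)).tobytes()
print(hashlib.sha256(pix_a).hexdigest() == hashlib.sha256(pix_b).hexdigest())
```

In practice you would probably also fold the image size and mode into the pixel hash so that reinterpretations of the same byte stream don't collide.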

MATLAB: Are there any problems with many (millions) small files compared to few (thousands) large files?

I'm working on real-time test software in MATLAB. On user input I want to extract the value of one pixel (or a few neighbouring pixels) from 50-200 high-resolution images (~25 MB each).
My problem is that the total image set is too big (~2000 images) to store in RAM, so I need to read each of the 50-200 images from disk after every user input, which of course is way too slow!
So I was thinking about splitting the images into sub-images (~100x100 pixels) and saving these separately. This would make the image-read process quick enough.
Are there any problems I should be aware of with this approach? For instance, I've read about people having trouble copying many small files; will this affect me, e.g. by making the image reads slower?
rahnema1 is right - imread(...,'PixelRegion') will speed up the read operation. If that is not enough for you, even if your files are not fragmented, maybe it is time to think about some kind of database?
Disk operations are always the bottleneck. First we switch to disk caches, then to distributed storage, then to RAID, and after some more time we end up with in-memory databases. You should decide what access speed is reasonable for you.
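The question is MATLAB-specific, but the tiling idea it proposes is language-agnostic. Here is a rough Python/NumPy sketch (with made-up file names and a synthetic image) of pre-splitting each large frame into 100x100 tiles so that a later pixel lookup only reads one small file; a MATLAB version would follow the same pattern with imwrite/imread or MAT-files.

```python
from pathlib import Path
import numpy as np

TILE = 100  # tile edge in pixels, matching the question

def split_into_tiles(image: np.ndarray, out_dir: Path) -> None:
    """Write each TILE x TILE block of `image` to its own .npy file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    h, w = image.shape[:2]
    for ty in range(0, h, TILE):
        for tx in range(0, w, TILE):
            np.save(out_dir / f"tile_{ty // TILE}_{tx // TILE}.npy",
                    image[ty:ty + TILE, tx:tx + TILE])

def read_pixel(tile_dir: Path, y: int, x: int):
    """Load only the one tile that contains pixel (y, x)."""
    tile = np.load(tile_dir / f"tile_{y // TILE}_{x // TILE}.npy")
    return tile[y % TILE, x % TILE]

# Hypothetical usage with a synthetic high-resolution frame:
frame = np.zeros((3000, 4000, 3), dtype=np.uint8)
split_into_tiles(frame, Path("frame_0001_tiles"))
print(read_pixel(Path("frame_0001_tiles"), y=1234, x=2345))
```

The trade-off the answers mention still applies: millions of tiny files put pressure on the filesystem's metadata handling, so a single memory-mapped file or a lightweight database may scale better than separate tile files.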

Storing lots of images on server compression

We have a project that will generate lots (hundreds of thousands) of .PNG images of around 1 MB each. Rapid serving is not a priority, as we use the images internally rather than on the front end.
We know to use the filesystem, not a DB, for storage.
We'd like to know how best to compress these images on the server to minimise long term storage costs.
linux server
They are already compressed, so you would need to recode the images into another lossless format while preserving all of the information present in the PNG files. I don't know of a format that will do that, but you can roll your own by recoding the image data with a better lossless compressor (you can see benchmarks here) and keeping a separate metadata file that retains the other information from the original .png files, so that you can reconstruct the original.
The best you could get losslessly, based on the benchmarks, would be about 2/3 of their current size. You would need to test the compressors on your actual data. Your mileage may vary.
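As one possible shape for that roll-your-own scheme, here is a hedged Python sketch using Pillow to extract the raw pixels, the standard-library lzma module as a stand-in for whichever lossless compressor the benchmarks favour, and a JSON sidecar for the size, mode, and text chunks. It ignores less common chunk types, so treat it as an outline rather than a faithful PNG round-tripper.

```python
import json
import lzma
from pathlib import Path

from PIL import Image  # Pillow, assumed available

def archive_png(src: Path, dst_stem: str) -> None:
    """Store pixels as lzma-compressed raw bytes plus a JSON metadata sidecar."""
    img = Image.open(src)
    meta = {
        "size": img.size,
        "mode": img.mode,
        "text": dict(getattr(img, "text", {})),  # tEXt/iTXt chunks, if present
    }
    Path(f"{dst_stem}.pix.xz").write_bytes(lzma.compress(img.tobytes(), preset=9))
    Path(f"{dst_stem}.meta.json").write_text(json.dumps(meta))

def restore_png(dst_stem: str, out: Path) -> None:
    """Rebuild a PNG with the same pixels and the basic metadata."""
    meta = json.loads(Path(f"{dst_stem}.meta.json").read_text())
    pixels = lzma.decompress(Path(f"{dst_stem}.pix.xz").read_bytes())
    Image.frombytes(meta["mode"], tuple(meta["size"]), pixels).save(out, format="PNG")

# Hypothetical usage:
# archive_png(Path("shot_0001.png"), "archive/shot_0001")
# restore_png("archive/shot_0001", Path("restored/shot_0001.png"))
```

Note that lzma on unfiltered raw pixels will not automatically beat PNG's filtered deflate; the gain depends entirely on the compressor you substitute here, which is why testing on your actual data, as suggested above, matters.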

image and video compression

What are similar compressors to the RAR algorithm?
I'm interested in compressing videos (for example, avi) and images (for example, jpg)
WinRAR reduced an AVI video (1 frame/sec) to 0.88% of its original size (i.e. it was 49.8 MB, and it went down to 442 KB).
It finished the compression in less than 4 seconds.
So I'm looking for a similar (open) algorithm. I don't care about decompression time.
Compressing "already compressed" formats are meaningless. Because, you can't get anything further. Even some archivers refuse to compress such files and stores as it is. If you really need to compress image and video files you need to "recompress" them. It's not meant to simply convert file format. I mean decode image or video file to some extent (not require to fully decoding), and apply your specific models instead of formats' model with a stronger entropy coder. There are several good attempts for such usages. Here is a few list:
PackJPG: an open-source, fast-performing JPEG recompressor.
Dell's Experimental MPEG1 and MPEG2 Compressor: closed source and proprietary, but you can at least test that experimental compressor's strength.
Precomp: closed-source free software (though it is expected to be opened in the near future). It recompresses GIF, BZIP2, JPEG (via PackJPG) and Deflate (only when generated with the ZLIB library) streams.
Note that recompression is usually a very time-consuming process, because you have to ensure bit-identical restoration. Some programs even check every possible parameter to ensure stability (like Precomp). Also, their models have to get more and more complex to gain something that is ultimately negligible.
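As a quick sanity check of the "already at entropy" point, the sketch below (plain Python standard library, with a hypothetical file name) compresses a JPEG's raw bytes with zlib and lzma; on typical photos the savings are a few percent at best, which is why dedicated recompressors such as PackJPG partially decode the stream instead.

```python
import lzma
import zlib
from pathlib import Path

def general_purpose_gain(path: str) -> None:
    """Show how little a generic compressor achieves on an already-compressed JPEG."""
    data = Path(path).read_bytes()
    for name, packed in (("zlib -9", zlib.compress(data, 9)),
                         ("lzma -9", lzma.compress(data, preset=9))):
        print(f"{name}: {len(data)} -> {len(packed)} bytes "
              f"({100 * len(packed) / len(data):.1f}% of original)")

# Hypothetical usage with one of your own files:
# general_purpose_gain("frame_0001.jpg")
```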
Compressed formats like JPEG can't really be compressed much further, since they have essentially reached their entropy; however, uncompressed formats like BMP, WAV, and AVI can.
Take a look at LZMA