What is the best way I can compress a number that has 540,000 digits? - numbers

I have a number that has about 540,000 digits and I want to compress this number to a reasonable length, since 540,000 digits is kind of absurd. What would be the best compression algorithm for this, and how small can I compress it?
A little background: basically, I have a picture that is 200 pixels wide and 300 pixels tall. I'm taking the red, green, and blue values of each pixel. So each pixel is represented with 9 digits (because each red, green, or blue value is a number between 0 and 255, zero-padded to 3 digits). The picture has 60,000 pixels in total. So representing this picture as a number gives a number with 9 × 60,000 = 540,000 digits.

That's not a number. That's an image. There are many ways to compress an image. For lossless compression, look at PNG, lossless JPEG 2000, and BCIF.

You don't gain anything by converting an image into a number. The entropy is the same, and any good lossless compression algorithm will perform about the same regardless of the encoding you use.

If you want a textual version of the image as opposed to a binary format, consider encoding it in Base64 rather than base 10. That way each pixel is represented with 4 characters rather than 9, since 3 bytes encode to exactly 4 Base64 characters.
https://en.wikipedia.org/wiki/Base64
Further compression can be achieved by encoding the image as a PNG and then taking the Base64 representation of that.
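To make the base-10 vs. Base64 comparison concrete, here is a minimal Python sketch (the pixel values are made up for illustration): three raw bytes per pixel encode to exactly 4 Base64 characters, versus up to 9 decimal digits.

```python
import base64

# Hypothetical example: one pixel's RGB values (0-255 each).
pixel = bytes([200, 17, 5])

# Base-10 digits: 9 characters per pixel (3 per channel, zero-padded).
decimal_form = "".join(f"{c:03d}" for c in pixel)

# Base64: 3 bytes always encode to exactly 4 characters.
b64_form = base64.b64encode(pixel).decode("ascii")

print(len(decimal_form))  # 9
print(len(b64_form))      # 4
```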

Related

Perceptual hashing accuracy/precision

I want to find identical and very similar images within a truckload of photos. To do this, I want to compare the Levenshtein (or Hamming, not decided yet) distances of their perceptual hashes. To calculate these, I want to use imghash (also not a final decision). For output, imghash lets you select the output format and number of bits. I assume that changing the number of bits changes accuracy/precision, but does it really? By default, the output is a 16-character hex string (64 bits, about 18.4 quintillion combinations). Seems like overkill. But is it? And if so, what is a reasonable length?
When using imghash and Hamming distance to calculate the similarity of images, it goes like this:
imghash accepts [,bits] as an optional argument, which is 8 by default. A longer hash does mean greater accuracy: for the 'very similar' images I tested this with, the 4-bit hashes were the same, but the 8-bit hashes differed.
The maximum Hamming distance (when the images are completely different, e.g. a black vs. a white canvas) equals the square of the bits argument (64 for the default of 8). You need to adjust your chosen similarity threshold accordingly.
Also:
The selected bit length must be divisible by 4.
When comparing the perceptual hashes, these need to be the same length.
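For comparing hashes of equal length, Hamming distance over hex strings can be sketched in Python like this (a stdlib-only illustration, not tied to imghash's own API):

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count differing bits between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    # XOR the integer values; each set bit marks one differing bit.
    diff = int(hash_a, 16) ^ int(hash_b, 16)
    return bin(diff).count("1")

print(hamming_distance("ffff0000ffff0000", "ffff0000ffff0001"))  # 1
print(hamming_distance("0" * 16, "f" * 16))  # 64 (completely different)
```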

LibPNG: smallest way to store 9-bit grayscale images

Which one produces the smallest 9-bit depth grayscale images using LibPNG?
16 bit Grayscale
8 bit GrayScale with alpha, having the 9th bit stored as alpha
Any other suggestion?
Also, from the documentation it looks like in 8-bit GRAY_ALPHA, alpha is 8 bits as well. Is it possible to have 8 bits of gray with only one bit of alpha?
If all 256 possible gray levels are present (or are potentially present), you'll have to use 16-bit G8A8 pixels. But if at least one gray level is not present, you can use that spare level for transparency, and use 8-bit indexed pixels or grayscale plus a tRNS chunk to identify the transparent value.
Libpng doesn't provide a way of checking whether a spare level is available, so you have to do it in your application. ImageMagick, for example, does that for you:
$ pngcheck -v rgba32.png im_opt.png
File: rgba32.png (178 bytes)
chunk IHDR at offset 0x0000c, length 13
64 x 64 image, 32-bit RGB+alpha, non-interlaced
chunk IDAT at offset 0x00025, length 121
zlib: deflated, 32K window, maximum compression
chunk IEND at offset 0x000aa, length 0
$ magick rgba32.png im_optimized.png
$ pngcheck -v im_optimized.png
File: im_optimized.png (260 bytes)
chunk IHDR at offset 0x0000c, length 13
64 x 64 image, 8-bit grayscale, non-interlaced
chunk tRNS at offset 0x00025, length 2
gray = 0x00ff
chunk IDAT at offset 0x00033, length 189
zlib: deflated, 8K window, maximum compression
chunk IEND at offset 0x000fc, length 0
There is no G8A1 format defined in the PNG specification. But the alpha channel, being all 0's or 255's, compresses very well, so it's nothing to worry about. Note that in this test case (a simple white-to-black gradient), the 32-bit RGBA file is actually smaller than the "optimized" 8-bit grayscale+tRNS one.
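Since libpng won't check for a spare gray level for you, the check in your own application is simple. A Python sketch of the idea (the image is just a flat list of 8-bit gray values here; function name is mine):

```python
def find_spare_gray_level(pixels):
    """Return a gray level (0-255) not used by any pixel, or None."""
    used = set(pixels)
    for level in range(256):
        if level not in used:
            return level
    return None  # all 256 levels present: must fall back to G8A8

# Hypothetical gradient image that never reaches pure white.
gradient = [v % 255 for v in range(10000)]
print(find_spare_gray_level(gradient))  # 255
```

If a spare level is found, that value can be declared transparent via a tRNS chunk; if None, every level is in use and a full alpha channel is needed.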
Which one produces the smallest 9-bit depth grayscale images using LibPNG?
16 bit Grayscale
8 bit GrayScale with alpha, having the 9th bit stored as alpha
The raw byte layout of the two formats is very similar: MSB-LSB MSB-LSB ... in one case (PNG stores the most significant byte of each 16-bit value first), G-A G-A ... in the other. Because the filtering/prediction is done at the byte level, little or no difference is to be expected between the two alternatives. Because 16-bit grayscale is the more natural fit for your scenario, I'd opt for it.
If you go the other route, I'd suggest experimenting with putting the most significant bit or the least significant bit in the alpha channel.
Also, from the documentation it looks like in 8-bit GRAY_ALPHA, alpha is 8 bits as well. Is it possible to have 8 bits of gray with only one bit of alpha?
No. But 1 bit of alpha would mean totally opaque/totally transparent, so you could opt to add a tRNS chunk to declare a special color as totally transparent (as pointed out in the other answer, this disallows the use of that color).

Enhancing 8 bit images to 16 bit

My objective is to enhance 8-bit images into 16-bit ones. In other words, I want to increase the dynamic range of an 8-bit image. To do that, I can sequentially take multiple 8-bit images of a fixed scene with a fixed camera. To simplify the issue, let's assume they are grayscale images.
Intuitively, I think I can achieve the goal by
Multiplying two 8-bit images
resImage = double(img1) .* double(img2)
Averaging a specified number of 8-bit images
resImage = mean(images,3)
assuming images(:,:,i) contains the ith 8-bit image.
After that, I can convert the resulting image to a 16-bit one.
resImage = uint16(resImage)
But before testing these methods, I wonder whether there is another way to do this (short of buying a 16-bit camera), or whether there is literature on this subject that would be a better starting point.
UPDATE: As the comments below show, I got great information on the drawbacks of the simple averaging above and on image stacks for the enhancement. So it may be a good topic to study after all. Thanks to all for your great comments.
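A minimal, stdlib-only Python sketch of the stacking idea (simulated data; in practice you would use real frames, and the usable gain depends entirely on the camera's noise): summing many noisy 8-bit captures and rescaling lets sub-integer detail survive into a 16-bit result.

```python
import random

def stack_to_16bit(frames):
    """Sum aligned 8-bit frames pixel-wise, then rescale the sums to 0-65535."""
    n = len(frames)
    sums = [sum(px) for px in zip(*frames)]
    # The maximum possible sum is 255 * n; map it onto the 16-bit range.
    return [s * 65535 // (255 * n) for s in sums]

random.seed(0)
true_level = 100.4  # sub-integer detail an 8-bit sensor cannot record directly
# Simulate 64 noisy 8-bit captures of a one-pixel "image".
frames = [[max(0, min(255, round(true_level + random.gauss(0, 2))))]
          for _ in range(64)]
result = stack_to_16bit(frames)
print(result[0])  # roughly (100.4 / 255) * 65535, thanks to the noise dithering
```

Note that this only works because the sensor noise dithers the signal across frames; averaging identical noiseless frames would add no information.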
This question appears to relate to increasing the Dynamic Range of an image by integrating information from multiple 8 bit exposures into a 16 bit image. This is related to the practice of capturing and combining "image stacks" in astronomical imaging among other fields. An explanation of this practice and how it can both reduce image noise, and enhance dynamic range is available here:
http://keithwiley.com/astroPhotography/imageStacking.shtml
The idea is that successive captures of the same scene are subject to image noise, and this noise leads to stochastic variation of the captured pixel values. In the simplest case these variations can be leveraged by summing and dividing (i.e. mean averaging) the stack to improve its dynamic range, but the practicality depends very much on the noise characteristics of the camera.
You want to sum many images together, assuming there is no jitter and the camera is steady. Accumulate a large sum and then divide by some amount.
Note that to get a reasonable 16-bit image from an 8-bit source, you'd need to take hundreds of images to get any kind of reasonable result. Note that jitter will distort edge information, and there is some inherent noise level of the camera that might mean you are essentially 'grinding metal'. In a practical sense, you might get 2 or 3 more bits of data from image summing, but not 8 more. To get 3 more bits would require summing at least 64 images (6 extra bits of sum), then dividing by 8 (dropping 3 bits), as the lower bits are garbage.
The rule of thumb is that to gain n new bits of data, you need the square of 2^n images, i.e. 4^n: 3 bits (a factor of 8) means 64 images, 4 bits would be 256 images, etc.
Here's a link that talks about sampling:
http://electronicdesign.com/analog/understand-tradeoffs-increasing-resolution-averaging
"In fact, it can be shown that the improvement is proportional to the square root of the number of samples in the average."
Note that SNR is a log scale so equating it to bits is reasonable.
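The rule of thumb above is just the sqrt(N) relation rearranged; a tiny Python check (the function name is mine, for illustration):

```python
def frames_needed(extra_bits: int) -> int:
    """Frames required to gain `extra_bits` of depth under the sqrt(N) SNR rule.

    SNR grows with sqrt(N), and each extra bit doubles the amplitude
    resolution, so one extra bit costs a 4x increase in frame count.
    """
    return 4 ** extra_bits

for bits in (1, 2, 3, 4):
    print(bits, "->", frames_needed(bits), "images")
# 3 extra bits -> 64 images; 4 extra bits -> 256 images, matching the rule above
```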

After encoding data size is increasing

I have text data in XML format, and its length is around 816,814 bytes. It contains some image data as well as some text data.
We are using the ZLIB algorithm for compression; after compressing, the compressed data length is 487,239 bytes.
After compressing, we encode the data using BASE64Encoder. But after encoding the compressed data, the size increases: the length of the encoded data is 666,748 bytes.
Why does the data size increase after encoding? Are there any better encoding techniques?
Regards,
Siddesh
As noted, when you are encoding binary 8-bit bytes with 256 possible values into a smaller set of characters, in this case 64 values, you will necessarily increase the size. For a set of n allowed characters, the expansion factor for random binary input will be log(256)/log(n), at a minimum.
If you would like to reduce this impact, then use more characters. Chances are that whatever medium you are using, it can handle more than 64 characters transparently. Find out how many by simply sending all 256 possible bytes, and see which ones make it through. Test the candidate set thoroughly, and then ideally find documentation of the medium that backs up that set of n < 256.
Once you have the set, then you can use a simple hard-wired arithmetic code to convert from the set of 256 to the set of n and back.
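The expansion factor formula is easy to tabulate; a quick Python sketch (alphabet sizes chosen arbitrarily for illustration):

```python
import math

def expansion_factor(n: int) -> float:
    """Minimum size growth when encoding random bytes into n allowed characters."""
    return math.log(256) / math.log(n)

for n in (64, 85, 128, 255):
    print(f"{n} characters -> x{expansion_factor(n):.3f}")
# Base64 (n=64) gives the familiar 4/3 expansion; larger alphabets shrink it.
```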
That is perfectly normal.
Base64 is required if your transmission medium is not designed to carry binary data but only textual data (e.g. XML).
So your compressed data gets Base64 encoded.
Plainly speaking, the transcoder changes "non-ASCII" bytes into an ASCII form while still remembering the way back.
As a rule of thumb, it's around a 33% size increase ( http://en.wikipedia.org/wiki/Base64#Examples )
This is the downside of Base64. You are better off using a protocol that supports file transfer... but for files embedded within XML, you are pretty much out of options.
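To see both effects together (zlib shrinks the data, then Base64 grows the result by a factor of 4/3), here is a small Python sketch with stand-in data:

```python
import base64
import zlib

data = b"<item>some repetitive XML-ish text</item>" * 5000  # stand-in payload
compressed = zlib.compress(data, 9)
encoded = base64.b64encode(compressed)

print(len(data), len(compressed), len(encoded))
# Base64 emits 4 output bytes for every 3 input bytes (plus padding),
# so len(encoded) is about len(compressed) * 4 / 3.
```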

how to apply RLE in binary image?

I have a binary image, and I need to compress it using run-length encoding (RLE). I used the regular RLE algorithm with a maximum run count of 16.
Instead of reducing the file size, it is increasing it. For example, in a 5x5 matrix where 10 values have a repeat count of one, RLE makes the file bigger.
How to avoid this glitch? Is there any better way I can apply RLE partially to the matrix?
If it's for your own use only, you can create your custom image file format, and in the header you can mark whether RLE is used or not, the range of X and Y coordinates, and possibly the bit planes for which it is used. But if you want to produce an image file that follows some defined image file format that uses RLE (.pcx comes to mind), you must follow the file format specifications. If I remember correctly, in .pcx there wasn't any option to disable RLE partially.
If you are not required to use RLE and you are only looking for an easy-to-implement compression method, then before using any compression, I suggest that you first check how many bytes your 5x5 binary matrix file takes. If the file size is 25 bytes or more, then you are storing each element using at least one byte (8 bits) (or, alternatively, you have a lot of data that is not matrix content). If you don't need to store the size, a 5x5 binary matrix takes 25 bits, which is 3 bytes and 1 bit, so practically 4 bytes. I'm quite sure that there's no compression method that is generally useful for files of that size. If you have matrices of different sizes, you can use e.g. unsigned 16-bit integer fields (2 bytes each) for a maximum matrix width/height of 65,535, or unsigned 32-bit integer fields (4 bytes each) for a maximum of 4,294,967,295.
For example, a 100x100 binary matrix takes 10,000 bits, which is 1,250 bytes. Add 2 x 2 = 4 bytes for 16-bit size fields or 2 x 4 = 8 bytes for 32-bit size fields. After this, you can plan what the best compression method would be.
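To see the glitch concretely, here is a minimal Python sketch of bit-level RLE with a maximum run of 16 (my own toy encoder, not the .pcx scheme): a 5x5 alternating pattern produces 25 one-bit runs, i.e. more output tokens than input bits, while a uniform block collapses to 2 runs.

```python
def rle_encode_bits(bits, max_run=16):
    """Run-length encode a flat bit sequence; each run becomes a (bit, count) pair."""
    runs = []
    i = 0
    while i < len(bits):
        bit, count = bits[i], 1
        while i + count < len(bits) and bits[i + count] == bit and count < max_run:
            count += 1
        runs.append((bit, count))
        i += count
    return runs

# 25 alternating bits (a worst case): every run has length 1,
# so RLE needs 25 (bit, count) pairs -- more than the 25 bits it started with.
noisy = [i % 2 for i in range(25)]
print(len(rle_encode_bits(noisy)))   # 25 runs

# 25 identical bits collapse to 2 runs (16 + 9) with max_run=16.
flat = [1] * 25
print(rle_encode_bits(flat))         # [(1, 16), (1, 9)]
```

This is why RLE only pays off when runs are long; for small or noisy matrices, storing the raw bits (as the answer above suggests) is usually smaller.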