JPEG Huffman coding procedure - encoding

A Huffman table, in the JPEG standard, is generated from a collection of statistics in two steps. One of the steps is implementing the function/method given by this picture (from Annex K of the JPEG standard):
Here is the problem. Earlier, in Annex C, the standard says:
Huffman tables are specified in terms of a 16-byte list (BITS) giving the number of codes for each code length from
1 to 16. This is followed by a list of the 8-bit symbol values (HUFFVAL), each of which is assigned a Huffman code.
Obviously BITS is a list of 16 elements. But in the picture above, i is first set to 32 (i=32) and then BITS[i] is accessed. I have probably misunderstood something, so could someone please explain?
Here is the JPEG standard's description of the picture:
Figure K.3 gives the procedure for adjusting the BITS list so that no code is longer than 16 bits. Since symbols are paired
for the longest Huffman code, the symbols are removed from this length category two at a time. The prefix for the pair
(which is one bit shorter) is allocated to one of the pair; then (skipping the BITS entry for that prefix length) a code word
from the next shortest non-zero BITS entry is converted into a prefix for two code words one bit longer. After the BITS
list is reduced to a maximum code length of 16 bits, the last step removes the reserved code point from the code length
count.
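Here is my code for the picture above:
#include <vector>
using std::vector;

void adjustBitLengthTo16Bits(vector<char>&BITS){
    int i=32,j=0;
    while(1){
        if(BITS[i]>0){
            // find the next shortest non-zero entry, skipping length i-1
            j=i-1;
            j--;
            while(BITS[j]<=0)
                j--;
            BITS[i]=BITS[i]-2;       // remove a pair from the longest length
            BITS[i-1]=BITS[i-1]+1;   // one of the pair takes the shorter prefix
            BITS[j+1]=BITS[j+1]+2;   // a code of length j becomes a prefix for two codes
            BITS[j]=BITS[j]-1;
            continue;
        }
        else{
            i--;
            if(i!=16)
                continue;
            // remove the reserved code point from the longest remaining length
            while(BITS[i]==0)
                i--;
            BITS[i]--;
            return;
        }
    }
}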

This code is only for encoders that want to generate their own custom Huffman tables. The majority of JPEG encoders just use fixed tables that are reasonable approximations of the statistics of most images. In this particular case, the first step in generating a Huffman table for the AC coefficients produces code lengths of up to 32 bits, so at this stage BITS has an entry for every length from 1 to 32; the 16-entry list described in Annex C is what remains after the adjustment. Since there are only 256 unique symbols to encode (skip/length pairs), there should never be more than 32 bits needed to specify all of the Huffman codes. After the first pass has produced a set of codes (up to 32 bits in length), the second pass takes the least frequent (longest) codes and "moves" them into shorter length slots so that the maximum code length is 16 bits. In an ideal Huffman table, the frequency distributions correspond to the code lengths. In this case, the table is being made to fit by squeezing the longest codes into slots reserved for shorter codes. This can be done because the 14/15/16-bit Huffman codes have "room" for more permutations of bits and can "fit" the longer codes in them.
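To see the adjustment on concrete numbers, here is a small Python sketch of the same loop. It assumes BITS is held as a list of 33 entries (index 0 unused, indices 1 to 32 are code lengths); the name adjust_bits and the sample input are made up for illustration and do not come from the standard.

```python
def adjust_bits(bits):
    """Return a copy of bits with no code longer than 16 bits.

    bits[k] is the number of codes of length k; indices 0..32 are expected,
    with index 0 unused. This mirrors the loop in the question's code.
    """
    bits = list(bits)
    i = 32
    while i > 16:
        while bits[i] > 0:
            # next shortest non-zero entry, skipping length i - 1
            j = i - 2
            while bits[j] == 0:
                j -= 1
            bits[i] -= 2       # remove a pair of the longest codes
            bits[i - 1] += 1   # one of the pair takes their shared prefix
            bits[j + 1] += 2   # a code of length j becomes a prefix for two codes
            bits[j] -= 1
        i -= 1
    # remove the reserved code point from the longest remaining length
    while bits[i] == 0:
        i -= 1
    bits[i] -= 1
    return bits


# Hypothetical input: one code of each length 1..16 and two codes of length 17.
example = [0] + [1] * 16 + [2] + [0] * 15
adjusted = adjust_bits(example)
print(adjusted[1:17])
```

In this example the two 17-bit codes disappear, the single 15-bit code is turned into a prefix, and length 16 ends up with three codes once the reserved code point has been removed.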
Update:
There is limited benefit to "optimizing" the Huffman tables in JPEG. Most of the compression occurs because of the quantization and DCT transform of the pixels. Switching to arithmetic coding has a measurable benefit (~10% size reduction), but then it limits the audience since most JPEG decoders don't support arithmetic coding due to past patent issues.

Related

How can I find the average length of a codeword encoded in Huffman if there are N(10 or more) symbols?

I'm practicing for an exam and I found a problem which asks to find the average length of codewords which are encoded in Huffman.
This usually wouldn't be hard, but in this problem we have to encode 100 symbols which all have the same probability (1/100).
Since there is obviously no point in trying to encode 100 symbols by hand I was wondering if there is a method to find out the average length without actually going through the process of encoding.
I'm guessing this is possible since all the probabilities are equal, however I couldn't find anything online.
Any help is appreciated!
For 100 symbols with equal probability, some will be encoded with six bits, some with seven bits. A Huffman code is a complete prefix code. "Complete" means that all possible bit patterns are used.
Let's say that i codes are six bits long and j codes are seven bits long. We know that i + j = 100. There are 64 possible six-bit codes, so after i get used up, there are 64 - i left. Adding one bit to each of those to make them seven bits long doubles the number of possible codes. So now we can have up to 2(64 - i) seven-bit codes.
For the code to be complete, all of those codes must be used, so j = 2(64 - i). We now have two equations in two unknowns. We get i = 28 and j = 72.
Since all symbols are equally probable, the average number of bits used per symbol is (28x6 + 72x7) / 100, which is 6.72. Not too bad, considering the entropy of each symbol is 6.64 bits.
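The result can be cross-checked with a short Python sketch. It builds an actual Huffman tree with a heap and compares the code lengths to the closed form derived above; the function names are made up for this illustration, and the closed form assumes equal probabilities.

```python
import heapq
from collections import Counter


def huffman_code_lengths(weights):
    """Return the Huffman code length of each symbol, given its weight."""
    lengths = [0] * len(weights)
    # heap entries: (weight, unique tie-breaker, symbols below this node)
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    tie = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every symbol under the merged node gets one bit deeper
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


def equal_probability_split(n):
    """Closed form for n equally likely symbols: (short codes, long codes, short length)."""
    k = n.bit_length() - 1      # floor(log2(n))
    short = 2 ** (k + 1) - n    # solves short + long = n and long = 2 * (2**k - short)
    return short, n - short, k


lengths = huffman_code_lengths([1] * 100)
print(Counter(lengths))               # how many codes of each length
print(sum(lengths) / len(lengths))    # average bits per symbol
print(equal_probability_split(100))
```

Integer weights of 1 are used instead of 1/100 so that the heap comparisons are exact; the code lengths are the same either way.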

32-1024 bit fixed point vector arithmetic with AVX-2

For a Mandelbrot generator I want to use fixed point arithmetic going from 32 up to maybe 1024 bits as you zoom in.
Now normally SSE or AVX is no help there due to the lack of add with carry, and doing normal integer arithmetic is faster. But in my case I have literally millions of pixels that all need to be computed. So I have a huge vector of values that all need to go through the same iterative formula over and over, a million times too.
So I'm not looking at doing a fixed point add/sub/mul on single values but doing it on huge vectors. My hope is that for such vector operations AVX/AVX2 can still be utilized to improve the performance despite the lack of native add with carry.
Does anyone know of a library for fixed point arithmetic on vectors, or some example code showing how to emulate add with carry on AVX/AVX2?
FP extended precision gives more bits per clock cycle (because double FMA throughput is 2/clock vs. 32x32=>64-bit at 1 or 2/clock on Intel CPUs); consider using the same tricks that Prime95 uses with FMA for integer math. With care it's possible to use FPU hardware for bit-exact integer work.
For your actual question: since you want to do the same thing to multiple pixels in parallel, probably you want to do carries between corresponding elements in separate vectors, so one __m256i holds 64-bit chunks of 4 separate bigintegers, not 4 chunks of the same integer.
Register pressure is a problem for very wide integers with this strategy. Perhaps you can usefully branch on there being no carry propagation past the 4th or 6th vector of chunks, or something, by using vpmovmskb on the compare result to generate the carry-out after each add. An unsigned add has carry out of a+b < a (unsigned compare)
But AVX2 only has signed integer compares (for greater-than), not unsigned. And with carry-in, (a+b+c_in) == a is possible with b=carry_in=0 or with b=0xFFF... and carry_in=1 so generating carry-out is not simple.
To solve both those problems, consider using chunks with manual wrapping to 60-bit or 62-bit or something, so they're guaranteed to be signed-positive and so carry-out from addition appears in the high bits of the full 64-bit element. (Where you can vpsrlq ymm, 62 to extract it for addition into the vector of next higher chunks.)
Maybe even 63-bit chunks would work here so carry appears in the very top bit, and vmovmskpd can check if any element produced a carry. Otherwise vptest can do that with the right mask.
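As a scalar model of the reduced-width chunk idea (not actual AVX2 code), the following Python sketch stores a big integer as 62-bit limbs, so the carry out of each limb addition shows up in the bits above bit 61 and can be extracted with a plain right shift, which is what vpsrlq would do per 64-bit element. The helper names are hypothetical.

```python
LIMB_BITS = 62
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value, count):
    """Split a non-negative integer into `count` little-endian 62-bit limbs."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble the integer from little-endian 62-bit limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two limb lists of equal length; return (result limbs, carry out)."""
    result = []
    carry = 0
    for x, y in zip(a, b):
        s = x + y + carry            # below 2**63, so it fits a signed 64-bit lane
        result.append(s & LIMB_MASK)
        carry = s >> LIMB_BITS       # the carry is whatever lies above bit 61
    return result, carry


x = (1 << 124) - 1                   # two limbs that are all ones
y = 1
total, carry_out = add_limbs(to_limbs(x, 3), to_limbs(y, 3))
print(from_limbs(total) == x + y, carry_out)
```

In a vectorized version each list element would be one vector holding the corresponding limb of several independent big integers, as described above.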
This is a hand-wavy kind of brainstorm answer; I don't have any plans to expand it into a detailed answer. If anyone wants to write actual code based on this, please post your own answer so we can upvote that (if it turns out to be a useful idea at all).
Just for kicks, without claiming that this will be actually useful, you can extract the carry bit of an addition by just looking at the upper bits of the input and output values.
unsigned result = a + b + last_carry;  // add a, b and (optionally) the last carry
unsigned carry = (a & b)               // carry if both a AND b have the upper bit set
               |                       // OR
               ((a ^ b)                // upper bits of a and b are different
                & ~result);            // AND upper bit of the result is not set
carry >>= sizeof(unsigned)*8 - 1;      // shift the upper bit to the lower bit
With SSE2/AVX2 this could be implemented with two additions, 4 logic operations and one shift, but works for arbitrary (supported) integer sizes (uint8, uint16, uint32, uint64). With AVX2 you'd need 7uops to get 4 64bit additions with carry-in and carry-out.
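The formula is easy to sanity-check outside of SIMD code. This Python sketch emulates 64-bit unsigned lanes with a mask and compares the bit-trick carry against the carry obtained from an exact wide addition; the function name and test values are chosen only for illustration.

```python
MASK64 = (1 << 64) - 1


def add_with_carry(a, b, carry_in):
    """Emulate a 64-bit unsigned add; return (wrapped result, carry out)."""
    result = (a + b + carry_in) & MASK64
    # carry if both top bits are set, or if they differ and the result's top bit is clear
    carry = ((a & b) | ((a ^ b) & ~result)) >> 63
    return result, carry & 1


cases = [
    (0, 0, 0),
    (MASK64, 0, 1),       # the b = 0, carry_in = 1 corner case
    (MASK64, MASK64, 1),
    (1 << 63, 1 << 63, 0),
    (123, 456, 1),
]
for a, b, c in cases:
    result, carry = add_with_carry(a, b, c)
    exact = a + b + c
    print(carry == exact >> 64, result == exact & MASK64)
```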
Especially since multiplying 64x64-->128 is not possible either (but would require 4 32x32-->64 products and some additions, or 3 32x32-->64 products and even more additions, as well as special case handling), you will likely not be more efficient than with mul and adc (maybe unless register pressure is your bottleneck).
As Peter and Mystical suggested, working with smaller limbs (still stored in 64 bits) can be beneficial. On the one hand, with some trickery, you can use FMA for 52x52-->104 products. And also, you can actually add up to 2^k-1 numbers of 64-k bits before you need to carry the upper bits of the previous limbs.

How can I calculate the impact on collision probability when truncating a hash?

I'd like to reduce an MD5 digest from 32 characters down to, ideally closer to 16. I'll be using this as a database key to retrieve a set of (public) user-defined parameters. I'm expecting the number of unique "IDs" to eventually exceed 10,000. Collisions are undesirable but not the end of the world.
I'd like to understand the viability of a naive truncation of the MD5 digest to achieve a shorter key. But I'm having trouble digging up a formula that I can understand (given I have a limited Math background), let alone use to determine the impact on collision probability that truncating the hash would have.
The shorter the better, within reason. I feel there must be a simple formula, but I'd rather have a definitive answer than do my own guesswork cobbled together from bits and pieces I have read around the web.
You can calculate the chance of collisions with this formula:
chance of collision = 1 - e^(-n^2 / (2 * d))
Where n is the number of messages, d is the number of possibilities, and e is the constant e (2.718281828...).
@mypetition's answer is great.
I found a few other equations that are more or less accurate and/or simplified, along with a great explanation and a handy comparison of real-world probabilities:
1 - e^(-k(k-1) / (2N))
k(k-1) / (2N)
k^2 / (2N)
...where k is the number of IDs you'll be generating (the "messages") and N is the number of distinct values the hash digest can produce, that is, the largest number your truncated hexadecimal value could represent, plus 1 to account for 0.
A bit more about "N"
If your original hash is, for example, "38BF05A71DDFB28A504AFB083C29D037" (32 hex chars), and you truncate it down to, say, 12 hex chars (e.g.: "38BF05A71DDF"), the largest number you could produce in hexadecimal is "0xFFFFFFFFFFFF" (281474976710655, which is 16^12 - 1, or 256^6 - 1 if you prefer to think in terms of bytes). But since "0" itself counts as one of the numbers you could theoretically produce, you add back that 1, which leaves you simply with 16^12.
So you can think of N as 16 ^ (numberOfHexDigits).
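To put numbers on this for the 10,000 IDs mentioned in the question, here is a minimal Python sketch of the first formula with N = 16 ^ (numberOfHexDigits). The function name is made up, and the formula is the usual birthday approximation, which assumes the truncated digests are uniformly distributed.

```python
import math


def collision_probability(k, hex_digits):
    """Approximate chance of at least one collision among k truncated digests."""
    n = 16 ** hex_digits
    # 1 - e^(-k(k-1)/(2N)); expm1 keeps precision when the probability is tiny
    return -math.expm1(-k * (k - 1) / (2 * n))


for digits in (8, 12, 16):
    print(digits, collision_probability(10_000, digits))
```

With 10,000 IDs, 8 hex digits gives a collision chance of a bit over 1%, while 16 hex digits brings it down to the order of 10^-12.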

MD5 hashing Algorithm step 2 and 3

I created a repository on GitHub to implement some computer security algorithms, and now it's time to write the MD5 algorithm. I searched for papers/videos that explain the algorithm with worked examples alongside the steps, but I didn't find any.
I wrote this for step 1, and I don't know if it is correct or not:
//step1
var textP = ToBinaryString(Encoding.UTF8, text);
textP = textP.Length < 448 ? textP + '1' : textP;
while (textP.Length < 448)
{
    textP += '0';
}
Console.WriteLine(textP);
Second, step 2, append length:
A 64 bit representation of b is appended to the result of the previous step
The resulting message has a length that is an exact multiple of 512 bits
Does this mean appending the original bits of the string to the 448 bits?
No, the first step is not correct, as it doesn't pad correctly when fewer than 65 bits (the mandatory 1 bit plus the 64-bit length) are left in the block. In that case the padding has to span two blocks: first put in a 1 and fill the rest of that block with zeros, then add a further 448 zero bits so that the length field completes the next block.
The second sentence is unclear to me. The 64 bits encoding of the input size in bits needs to be added after the padding has taken place.
Note that you're trying to recreate the algorithm description literally. That's not a good idea. You need to process blocks of plaintext, keeping count of the number of bits or bytes and then perform the padding and length encoding when the end of the stream is indicated. You need a 512 bit buffer, an update and final method.
Creating the hash by using a string representing binary is not a good idea. You should process bytes and possibly words on the inside. You only need to encode anything for debugging purposes.
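For comparison, the padding and length steps can be written on bytes in a few lines. This is only a sketch of the padding rule for a message that is already fully in memory, not the buffered update/final design recommended above; md5_pad is a hypothetical helper name.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Return message + MD5 padding + 64-bit length, a multiple of 64 bytes."""
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF  # length in bits, mod 2**64
    padded = message + b"\x80"                            # a single 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)    # zero-fill up to 448 bits mod 512
    padded += struct.pack("<Q", bit_length)               # length, little-endian
    return padded


for size in (0, 55, 56, 64):
    print(size, len(md5_pad(b"a" * size)))
```

A 55-byte message still fits in one 64-byte block, while a 56-byte message needs a second block, which is the case the question's code does not handle.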

Huffman coding for Lossless Compression

I really need help with Huffman coding for lossless compression. I have an exam coming up and need to understand this. Does anyone know of easy tutorials on the topic, or could someone explain it?
The questions in the exam are likely to be:
Suppose the alphabet is [A, B, C], and the known probability distribution is P(A)=0.6,
P(B)=0.2 and P(C)=0.2. For simplicity, let’s also assume that both encoder and decoder know
that the length of the messages is always 3, so there is no need for a terminator.
How many bits are needed to encode the message ACB by Huffman Coding? You need to
provide the Huffman tree and Huffman code for each symbol. (3 marks)
How many bits are needed to encode the message ACB by Arithmetic Coding? You need to
provide details of the encoding process. (3 marks)
Using the above results, discuss the advantage of Arithmetic Coding over Huffman coding.
(1 mark)
Answers:
Huffman Code: A - 1, B - 01, C - 00.
The encoding result is 10001, so 5 bits are needed. (3 marks)
The encoding process of Arithmetic Coding:
Symbol   Low      High     Range
         0.0      1.0      1.0
A        0.0      0.6      0.6
C        0.48     0.6      0.12
B        0.552    0.576    0.024
The final binary codeword is 0.1001, which is 0.5625. Therefore 4 bits are needed. (3 marks)
In Huffman Coding, the length of the codeword for each symbol has to be an integer. But it
can be fractional in Arithmetic Coding. Therefore Arithmetic Coding is often more efficient
than Huffman Coding, as the results above show. (1 mark)
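The interval table in the model answer can be reproduced with a few lines of Python. This sketch assumes the symbols are laid out on [0, 1) in the order A, B, C, which is the ordering that matches the table; exact fractions are used so the bounds come out without rounding noise, and the function name is made up.

```python
from fractions import Fraction


def narrow_interval(message, probabilities):
    """Return the (symbol, low, high) interval after each symbol of the message."""
    low, width = Fraction(0), Fraction(1)
    steps = []
    for symbol in message:
        # cumulative probability of all symbols placed before this one
        cumulative = Fraction(0)
        for other, p in probabilities.items():
            if other == symbol:
                break
            cumulative += p
        low += width * cumulative
        width *= probabilities[symbol]
        steps.append((symbol, low, low + width))
    return steps


probabilities = {"A": Fraction(6, 10), "B": Fraction(2, 10), "C": Fraction(2, 10)}
steps = narrow_interval("ACB", probabilities)
for symbol, low, high in steps:
    print(symbol, float(low), float(high), float(high - low))

codeword = Fraction(9, 16)  # binary 0.1001
print(steps[-1][1] <= codeword < steps[-1][2])
```

The final check confirms that the 4-bit codeword 0.1001 (0.5625) lies inside the last interval.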
http://en.wikipedia.org/wiki/Huffman_coding
If you look at the tree (top right) you'll see that each parent node is the sum of the two below it. The values at the nodes are the frequencies of the letters. Each bit in the binary sequence is a right/left branch in the tree.
Does that help?
I don't really have a clue about Arithmetic coding, but it looks quite clever.
A Huffman tree is a binary tree with the nodes representing the values with the highest distribution in the stream being compressed near the root and the values with decreasing distribution further and further away from the root, thus allowing more common values to be encoded in shorter bit strings while less common values are encoded in longer strings.
A Huffman tree is constructed as follows:
1. Build a table of entities in the source stream, with their distribution.
2. Pick the two entries in the table that have the lowest distribution.
3. Make a tree node out of these two entries.
4. Remove the entries just used from the table.
5. Add a new entry to the table with the combined distribution of the nodes just removed, as well as the tree node.
6. If there is more than one entry left in the table, go to step 2.
7. The entry left in the table is your root.
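The steps above can be sketched in Python with a min-heap as the table, applied to the exam example (P(A)=0.6, P(B)=0.2, P(C)=0.2). The function name is hypothetical, and which child gets 0 or 1 is an arbitrary choice, so the exact bit patterns may differ from the model answer while the code lengths stay the same.

```python
import heapq
from itertools import count


def build_huffman_codes(probabilities):
    """Return a dict mapping each symbol to its Huffman code string."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(p, next(tiebreak), symbol) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # take the two entries with the lowest weight and merge them into one node
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):   # internal node: (left child, right child)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: a symbol
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


codes = build_huffman_codes({"A": 0.6, "B": 0.2, "C": 0.2})
encoded = "".join(codes[symbol] for symbol in "ACB")
print(codes, encoded, len(encoded))
```

For the message ACB this gives 5 bits in total, matching the exam answer.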
A basic Huffman implementation can be quite simple. But if you are building it from scratch, you may need more than one other data structure in your toolbox to make things easier, such as a min-heap and a bit vector. The basic algorithms for encoding and decoding are pretty simple. I have no information on how it compares with arithmetic coding.