Encoding of string "a" using the Huffman algorithm

I tried calculating the Shannon entropy for "a", which is 0. I am confused about how Huffman coding would encode and store this, since it is impossible to store the string "a" in 0 bits on a computer system.

Indeed, the code for a single symbol would be zero bits. However, you have to answer some other questions:
How is the code itself transmitted to the receiver?
How does the receiver know that it is done decoding?
To answer those questions, you'd need more bits in the stream for the receiver.
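As a rough illustration of that point, here is a minimal Python sketch. The container layout (a one-byte symbol length, a four-byte repeat count, then the symbol itself) and the function names are assumptions made up for this example, not part of any standard Huffman format. It shows that the entropy is 0 and the payload is empty, yet the receiver still needs a few header bytes to know which symbol to emit and when to stop.

```python
import math
import struct
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Entropy in bits per symbol, from the symbol frequencies in the message."""
    counts = Counter(message)
    total = len(message)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def encode_single_symbol(message: str) -> bytes:
    """Hypothetical container for a message made of one repeated symbol.

    The Huffman payload itself is zero bits long; everything stored here is
    header: which symbol it is and how many times it repeats.
    """
    if len(set(message)) != 1:
        raise ValueError("this sketch only handles single-symbol messages")
    symbol = message[0].encode("utf-8")
    return struct.pack(">BI", len(symbol), len(message)) + symbol


def decode_single_symbol(blob: bytes) -> str:
    """Rebuild the message from the header alone."""
    symbol_len, count = struct.unpack(">BI", blob[:5])
    symbol = blob[5:5 + symbol_len].decode("utf-8")
    return symbol * count


encoded = encode_single_symbol("a")
print(shannon_entropy("a"), len(encoded), decode_single_symbol(encoded))
```

So the string "a" costs 0 payload bits but 6 header bytes in this particular made-up layout.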

Related

When is fixed-length encoding better than Huffman?

For the word "sleeplessness", Huffman encoding takes 27 bits while fixed-length encoding takes 39.
Is there a word or a general condition in which Huffman will need more bits than Fixed length encoding?
A Huffman coding using the probability of the symbols in the message will never need more bits than a fixed-length coding, though only if we ignore the bits required to transmit a description of the code itself. The Huffman code description plus the Huffman-coded message for short messages will often be larger than a fixed-length code that requires no description.
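The two figures from the question can be checked with a short Python sketch. The function names are made up for this example, and both counts deliberately ignore the bits needed to describe the code to the receiver, which is exactly the caveat in the answer above.

```python
import heapq
from collections import Counter


def huffman_total_bits(message: str) -> int:
    """Length in bits of the Huffman-coded message, excluding the code table.

    Each merge of the two lightest subtrees adds one bit to every symbol
    below it, so the total cost is the sum of all merged weights.
    """
    heap = list(Counter(message).values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def fixed_length_total_bits(message: str) -> int:
    """Length in bits when every distinct symbol gets the same code width."""
    distinct = len(set(message))
    bits_per_symbol = (distinct - 1).bit_length()  # ceil(log2(distinct))
    return bits_per_symbol * len(message)


word = "sleeplessness"
print(huffman_total_bits(word), fixed_length_total_bits(word))  # 27 39
```

When all symbols are equally frequent and their number is a power of two (for example "abcd"), the two totals are equal, which is the best a fixed-length code can do against Huffman before table overhead is counted.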

Matlab - JPEG Compression. Huffman Encoding

I've been trying to implement a JPEG compression algorithm on Matlab.
The only part I'm having trouble implementing is the Huffman encoding. I do understand the DCT, the quantization, and the zig-zag scan of the 8x8 matrix. I also understand how Huffman encoding works in general.
What I do not understand is, after I have an output bitstream and a dictionary that translates consecutive bits to their original form, what do I do with the output? How can I tell a computer to translate that output bitstream using the dictionary I created for it?
In addition, each 8x8 matrix will have its own output and dictionary. How can all these outputs be combined into one? Because at the end of the day, the result is supposed to be an image.
I might have misunderstood some of the steps, in which case my apologies for any confusion caused by this.
Any help would be extremely appreciated!
EDIT: I'm sorry, my question apparently wasn't clear enough. Say I use Matlab's built-in Huffman functions (huffmanenco and huffmandict): what am I supposed to do with the value that huffmanenco returns?
What to do with the output string of bits hasn't been clear to me for Huffman encoding in other environments and programming languages either.
You have two choices with the Huffman coding:
Use a pre-canned Huffman table.
Make two passes over the data, where the first pass generates the Huffman tables and the second pass encodes.
You cannot have a different dictionary for each MCU.
You say you have the run-length-encoded values. You Huffman-encode those and write them to the output stream.
EDIT:
You need to be sure that the Matlab Huffman encoder is JPEG-compatible. There are different ways to Huffman-encode.
You need to write the bits from the encoder to the JPEG stream. This means you need a bit-level I/O routine. PLUS you need to convert FF values in the compressed data into FF00 values in the JPEG stream.
I suggest getting a copy of
http://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=pd_sim_14_1?ie=UTF8&dpID=41XJBED6RCL&dpSrc=sims&preST=_AC_UL160_SR127%2C160_&refRID=1DYN5VCQQP0N88E64P5Q
to show how the encoding is done.
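The bit-level output step described above can be sketched as follows. This is a Python illustration rather than Matlab, the function names are made up for this example, and it assumes the encoder hands back its output as a string of '0' and '1' characters. It covers only packing and FF byte stuffing, not the markers, headers, or table segments a real JPEG file also needs.

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, most significant bit first.

    The last byte is padded with 1-bits, which is how JPEG fills an
    incomplete final byte of entropy-coded data.
    """
    padding = (-len(bits)) % 8
    bits += "1" * padding
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def stuff_ff_bytes(entropy_coded: bytes) -> bytes:
    """Replace every FF byte with FF 00 so it cannot be mistaken for a marker."""
    return entropy_coded.replace(b"\xff", b"\xff\x00")


# Hypothetical encoder output: sixteen bits that happen to contain an FF byte.
packed = pack_bits("1111111100000001")
print(packed.hex(), stuff_ff_bytes(packed).hex())  # ff01 ff0001
```

The decoder does the reverse: it drops the 00 that follows each FF before feeding the bits to the Huffman decoder.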

How to detect block cipher mode

How can I detect whether a message was encrypted in CBC or ECB mode?
I have written a function that encrypts with AES-128 in CBC or ECB mode at random, and I compute the Hamming distance between the clear text and the cipher text, but it does not seem to be correlated with the cipher mode.
How can I detect the block cipher mode?
Thank you in advance
The answer is pretty much given in the problem statement:
Remember that the problem with ECB is that it is stateless and
deterministic; the same 16 byte plaintext block will always produce
the same 16 byte cipher text.
Thus, with the assumption that some repeated plaintext blocks occur at the same ciphertext block offsets, we can simply go ahead and look for repeated ciphertext blocks of various lengths.
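A minimal Python sketch of that check, assuming 16-byte blocks (the AES block size); the function name is made up for this example. It only counts how many blocks of a ciphertext duplicate an earlier block, and any count above zero is a strong hint of ECB.

```python
def count_repeated_blocks(ciphertext: bytes, block_size: int = 16) -> int:
    """Number of blocks that duplicate some other block in the ciphertext."""
    blocks = [
        ciphertext[i:i + block_size]
        for i in range(0, len(ciphertext), block_size)
    ]
    return len(blocks) - len(set(blocks))


# Hand-made byte strings standing in for ciphertexts.
with_repeat = bytes(range(16)) * 2 + bytes(range(16, 32))
without_repeat = bytes(range(64))
print(count_repeated_blocks(with_repeat), count_repeated_blocks(without_repeat))  # 1 0
```

A count of zero does not prove CBC: an ECB ciphertext of a plaintext with no repeated blocks looks just as random.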
I am doing the same problem set and just finished this problem (using clojure).
My first hint is, it will be more clear what you need to do if you are using a language which supports first class functions/lambdas.
Anyways, let's break down the problem a bit:
First, just write a function which validates that a blackbox is encrypting data with ecb. How would you do this?
It might look something like (pseudocode below)
function boolean isEcbBlackbox(func f)
{
    // what input can I use to determine this?
    result = f("chosen input")
    if (result ...) { // what property of result should I look for?
        true
    } else {
        false
    }
}
Remember, the key weakness of ECB is identical blocks of plaintext will be encrypted to identical blocks of ciphertext.
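One possible way to fill in the pseudocode, sketched in Python. Everything here is an assumption made for illustration: toy_ecb and toy_cbc are stand-ins built from HMAC-SHA256 so that the example runs with the standard library alone. They are not AES and cannot even be decrypted, but they mimic the one property that matters here: ECB maps equal input blocks to equal output blocks, while CBC chains each block with the previous one.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes, as for AES


def _toy_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher: deterministic keyed function of one block."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def _pad(data: bytes) -> bytes:
    """PKCS#7-style padding up to a multiple of the block size."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def toy_ecb(key: bytes, plaintext: bytes) -> bytes:
    data = _pad(plaintext)
    return b"".join(_toy_block(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def toy_cbc(key: bytes, plaintext: bytes) -> bytes:
    data = _pad(plaintext)
    prev = os.urandom(BLOCK)  # random IV
    out = []
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = _toy_block(key, mixed)
        out.append(prev)
    return b"".join(out)


def is_ecb_blackbox(f) -> bool:
    """Feed the black box three blocks of one repeated byte and look for
    repeated ciphertext blocks. Three blocks guarantee at least two aligned
    identical plaintext blocks even if f adds an unknown prefix or suffix."""
    result = f(b"A" * (3 * BLOCK))
    blocks = [result[i:i + BLOCK] for i in range(0, len(result), BLOCK)]
    return len(set(blocks)) < len(blocks)


key = os.urandom(16)
print(is_ecb_blackbox(lambda p: toy_ecb(key, os.urandom(7) + p + os.urandom(7))))
print(is_ecb_blackbox(lambda p: toy_cbc(key, os.urandom(7) + p + os.urandom(7))))
```

Passing the black box in as a function is what the hint about first-class functions is getting at: the detector does not need to know anything about the key or the mode inside.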
EDIT: The challenges are now public, so I will link to my solution(s):
https://github.com/dustinconrad/crypto-tutorial/blob/master/src/crypto_tutorial/lib/block.clj#L118
Compute the block size from the ciphertext length: whichever of 16, 24, or 32 divides it evenly (note that for AES the block size is always 16 bytes; 24 and 32 are key sizes).
Compute the Hamming distance between cipher block 1 and each of the remaining cipher blocks.
Average the distance per byte using floating-point arithmetic; if the value is below a certain threshold, then it is ECB.
I know the exact exercise you're doing, I'm currently doing it right now myself. I would recommend doing Frequency Analysis on the encrypted strings (don't forget the string might be base64'd or hex). If you get back a frequency distribution that matches the language of the string you encoded then it's safe to assume it's in ECB, otherwise it's probably CBC.
I don't know if this will actually work as I'm just doing the exercise now, but it's a start.
EDIT:
I rushed this answer a bit and feel I should explain more. If it's been encrypted in ECB mode, then the frequency analysis should show a normal-distribution-style shape regardless of any padding at the start/end of the string and the key used, whereas encryption in CBC mode should give a very random and probably flat distribution.

Is there any classic 3 byte fingerprint function?

I need a checksum/fingerprint function for short strings (say, 16 to 256 bytes) which fits in a 24 bits word. Is there any well known algorithm for that?
I propose to use a 24-bit CRC as an easy solution. CRCs are available in all lengths and always simple to compute. Wikipedia has a matching entry. The quality is far better than a modulo-reduced sum, because swapping characters will most likely produce a different CRC.
The next step (if a wrong string with the same checksum is a real threat) would be a cryptographic MAC such as CMAC. Although its standard output is too long, it can be reduced by taking the first 24 bits.
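As one concrete instance of a 24-bit CRC, here is a Python sketch of the CRC-24 variant used by OpenPGP (polynomial 0x864CFB, initial value 0xB704CE). Choosing this particular variant is an assumption made for the example; the answer above does not name one.

```python
CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB  # 0x864CFB with the implicit top bit set


def crc24(data: bytes) -> int:
    """Bitwise CRC-24 as specified for OpenPGP ASCII armor."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF


print(hex(crc24(b"123456789")))        # 0x21cf02, the standard check value
print(crc24(b"ab") != crc24(b"ba"))    # swapped characters give different CRCs
```

The result always fits in three bytes, so it can be stored with `crc24(data).to_bytes(3, "big")`.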
Simplest thing to do is a basic checksum - add up the bytes in the string, mod (2^24).
You have to watch out for character set issues when converting to bytes though, so everyone agrees on the same encoding of characters to bytes.
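A minimal Python sketch of the additive checksum, with the encoding made an explicit parameter; the function name and the UTF-8 default are assumptions for this example. It also shows the two weaknesses mentioned in this thread: reordered characters collide, and the same text gives different checksums under different encodings.

```python
def additive_checksum24(text: str, encoding: str = "utf-8") -> int:
    """Sum of the encoded bytes, reduced modulo 2**24."""
    return sum(text.encode(encoding)) % (1 << 24)


print(additive_checksum24("ab"), additive_checksum24("ba"))   # 195 195
print(additive_checksum24("é", "utf-8"), additive_checksum24("é", "latin-1"))  # 364 233
```

For strings of at most 256 bytes the sum never exceeds 256 * 255 = 65280, so the modulo never takes effect and only about 16 of the 24 available bits are ever used.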

Simulink: Bit extraction from 1-Byte Hex

I'm relatively new to Simulink and I am looking for a possibility to extract 1-3 specific bits from one byte.
As far as I know, the input format (bin, dec, hex) of the constant is irrelevant for what follows. But how can I specify that the constant "1234" is hex and not dec?
In my model I use the "Constant Block" as my source (it will be parametrised by a MATLAB variable which comes from an m-file).
A further processing with the "Extract Bits Block" causes an error on incompatible data types.
Can someone help me to deal with this issue?
Greets, poeschlorn
You should probably do the conversion hex->dec in your .m initialization file and use this value in Simulink.
Maybe this is not the most elegant solution, but I converted my input to decimal and then created a BCD representation of it via OR and AND logic blocks for further use.
If you have the Communications Toolbox/Blockset then you can use the Integer to Bit Converter block to do a conversion to a vector of binary digits then just extract the "bits" that you want. The Bit to Integer Converter block will do the reverse transformation.
If you don't have the Communications Blockset, then it wouldn't be hard to do something similar using a plain MATLAB Function block.
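The arithmetic behind these suggestions is small enough to sketch. The following is Python rather than MATLAB or Simulink, and the function names are made up for this example; in an m-file the equivalent steps would use hex2dec for the conversion and bitshift/bitand for the extraction. It assumes bit 0 is the least significant bit.

```python
def extract_bits(value: int, start: int, count: int) -> int:
    """Return `count` bits of `value`, beginning at bit `start` (0 = LSB)."""
    mask = (1 << count) - 1
    return (value >> start) & mask


def to_bit_vector(value: int, width: int = 8) -> list:
    """Integer to a list of bits, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(width))]


# The text "1234" only becomes hex when it is parsed as base 16.
print(int("1234", 16), int("1234", 10))   # 4660 1234

byte_value = int("A5", 16)                # 0b10100101
print(extract_bits(byte_value, 0, 3))     # lowest three bits: 0b101 = 5
print(extract_bits(byte_value, 4, 2))     # bits 4..5: 0b10 = 2
print(to_bit_vector(byte_value))          # [1, 0, 1, 0, 0, 1, 0, 1]
```

Once the value is an integer, it no longer matters whether it was originally written in binary, decimal, or hex; only the parsing step depends on the notation.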