I am working on compressing an arbitrary vector with MATLAB, which provides built-in functions for Huffman coding: huffmandict, huffmanenco, huffmandeco.
The huffmandict function produces a lookup table that maps each symbol in the signal to its codeword. This table is needed both to encode and to decode the signal.
Generating the dictionary is trivial when you know the input vector. But say I'm compressing data to send from Alice to Bob. I can't assume Bob knows the dictionary too, so Alice needs to send the dictionary along with the Huffman code!
Is there a way in MATLAB to generate a bitstream representation of the dictionary that can be prepended to the Huffman code, so that it can be decoded at the other end?
What I have in mind is a bitstream like this, where N is the length of the encoded dictionary in bits:
(N encoded as 8 bits)(Huffman dictionary encoded in N bits)(Huffman code)
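The framing described above can be sketched as follows. This is Python rather than MATLAB, bitstreams are plain lists of 0/1 integers, and the helper names (to_bits, from_bits, frame, unframe) are hypothetical. Note that an 8-bit length field limits the encoded dictionary to at most 255 bits.

```python
def to_bits(value, width):
    """Return value as a list of `width` bits, most significant bit first."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

def from_bits(bits):
    """Interpret a list of bits (most significant bit first) as an unsigned integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def frame(dict_bits, code_bits):
    """Build (N as 8 bits)(dictionary, N bits)(Huffman code)."""
    return to_bits(len(dict_bits), 8) + list(dict_bits) + list(code_bits)

def unframe(stream):
    """Split a framed bitstream back into (dictionary bits, code bits)."""
    n = from_bits(stream[:8])
    return stream[8:8 + n], stream[8 + n:]

stream = frame([1, 0, 1], [0, 0, 1, 1])
print(unframe(stream))
```

The open question is then only how to turn the dictionary itself into `dict_bits`.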
It seems odd that MATLAB provides quite powerful functions for the encoding but does not make the result usable for digital transmission without a lot of extra work.
I understand that, in theory, a Huffman tree is often built. Is there a way to generate this tree in MATLAB, and then convert such a tree back to a dictionary?
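I know of two efficient ways of expressing the code, used in JPEG and gzip, but as I understand it they require the dictionary to be canonical: codewords are assigned in order of length, so the longer branches always sit on the right side of the tree (the side starting with 1).
So you have to convert your code to canonical form, because many different codes share the same codeword lengths (swapping 0 and 1 at each of the n-1 internal nodes of the tree alone gives 2^(n-1) variants, n being the number of codewords).
The canonical code has the same expected length, since every symbol keeps its codeword length.
Then you can describe each symbol by the length of its branch alone, limiting the lengths to a reasonable maximum such as 2^4 = 16 so that each length fits in 4 bits (stored as length minus 1).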
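OK, on to the code. On the sending side, the vector to be sent is just the codeword lengths:
L = zeros(1, size(dict,1));       % one length per symbol
for i = 1:size(dict,1)
    L(i) = numel(dict{i,2});      % length of the codeword of symbol i
end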
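On the receiving side you have to do a little more (I assume your symbols are numbered 1..n in a fixed order, the same order as in L):
d = cell(numel(L), 2);            % rebuilt dictionary {symbol, codeword}
k = 0;                            % next codeword, as an integer
for l = 1:max(L)                  % go through the lengths from short to long
    k = k * 2;                    % one level deeper: append a 0 bit
    for j = find(L == l)          % L must be a row vector here
        d{j,1} = j;
        d{j,2} = de2bi(k, l, 'left-msb');   % k as an l-bit row vector, MSB first
        k = k + 1;
    end
end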
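To convert your own dictionary to canonical form, you only need to run it through both steps: extract the length vector L, then rebuild the dictionary from L as above. Alice must then encode with this canonical dictionary rather than the original one from huffmandict, so that her codewords match the ones Bob reconstructs.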
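Here is a Python sketch (not MATLAB) of the same length-only scheme, useful for checking the logic outside MATLAB. `canonical_codes` follows the receiving-side loop step by step, and `pack_lengths`/`unpack_lengths` are hypothetical helpers that store each length in 4 bits as length minus 1, assuming no codeword is longer than 16 bits.

```python
def canonical_codes(lengths):
    """Rebuild canonical Huffman codewords from a list of codeword lengths.

    Follows the same steps as the MATLAB receiving-side loop: symbols are
    assumed to be numbered 1..n, and lengths[j] belongs to symbol j + 1.
    Returns {symbol: list of bits, most significant bit first}.
    """
    codes = {}
    k = 0                                   # next codeword, as an integer
    for l in range(1, max(lengths) + 1):
        k *= 2                              # one level deeper: append a 0 bit
        for j, length in enumerate(lengths):
            if length == l:
                codes[j + 1] = [(k >> (l - 1 - i)) & 1 for i in range(l)]
                k += 1                      # next codeword of the same length
    return codes

def pack_lengths(lengths):
    """Serialize each length (1..16) in 4 bits, stored as length - 1."""
    bits = []
    for length in lengths:
        if not 1 <= length <= 16:
            raise ValueError("codeword lengths must be between 1 and 16")
        bits += [((length - 1) >> (3 - i)) & 1 for i in range(4)]
    return bits

def unpack_lengths(bits):
    """Inverse of pack_lengths: read 4 bits per symbol and add 1."""
    lengths = []
    for i in range(0, len(bits), 4):
        value = 0
        for b in bits[i:i + 4]:
            value = (value << 1) | b
        lengths.append(value + 1)
    return lengths

# Sender and receiver both derive the same codebook from the lengths alone.
lengths = [2, 1, 3, 3]
print(canonical_codes(unpack_lengths(pack_lengths(lengths))))
```

Because both sides run `canonical_codes` on the same lengths, only the 4-bit lengths (plus the agreed symbol order) need to be transmitted.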
Here is the code:
x = rand(5)*100;
save('pqfile.txt','x','-ascii','-tabs')
The above works, but:
x = rand(5)*100;
x = uint8(x);
save('pqfile.txt','x','-ascii','-tabs')
says:
Warning: Attempt to write an unsupported data type to an ASCII file.
Variable 'x' not written to file.
Does anyone know why this happens? Why can't I save the data when it is uint8? I have to read data into a VHDL testbench, so I was experimenting. I guess the only option is to write my 8-bit unsigned integer values from the 2-D array using fprintf and then read them into the testbench.
ASCII option
The save function is somewhat restrictive in the data types it supports with -ascii (uint8 is not one of them, as the warning says), and it writes numbers in floating-point notation, which bloats your file when you are dealing with a limited range of integers like yours (uint8, 0 to 255).
Check out dlmwrite as an alternative.
It takes the file name to write to, the variable to store, and some additional parameters, such as the delimiter you want to separate your values with.
For your example, it looks like this:
x = rand(5)*100;
x = uint8(x);
dlmwrite('pqfile.txt',x,'\t');
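To see the size difference described above, here is a rough Python comparison (not MATLAB) of writing the same small matrix in 8-significant-digit exponential notation, which I assume is close to what save -ascii produces, versus plain tab-delimited integers, which is what dlmwrite produces for values like these. The matrix values are made up.

```python
def as_float_text(rows):
    """Each value in 8-significant-digit exponential notation (assumed to be
    close to the save -ascii output)."""
    return "".join("\t".join(f"{v:.7e}" for v in row) + "\n" for row in rows)

def as_int_text(rows):
    """Each value as a plain decimal integer, tab-delimited."""
    return "".join("\t".join(str(v) for v in row) + "\n" for row in rows)

x = [[81, 3, 100], [0, 255, 42]]   # hypothetical uint8 data (0..255)
print(len(as_float_text(x)), "characters as floats")
print(len(as_int_text(x)), "characters as integers")
```

Every value costs 13 characters in the exponential form regardless of its size, versus at most 3 digits as an integer.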
Binary option
If you are looking to store your uint8 data as single bytes, then you probably want to go with a custom binary file instead of ASCII. (Yes, you could write each uint8 value as a single ASCII character, but then some values are indistinguishable from your delimiters, such as newlines or tabs.)
fid = fopen('pqfile.dat', 'w');          % files are opened in binary mode by default
if fid ~= -1
    fwrite(fid, size(x), 'uint8');       % header [rows cols]; use a wider type for more than 255 rows or columns
    fwrite(fid, x', 'uint8');            % transpose x (with x') so it is stored in row order
    fclose(fid);
else
    fprintf(1, 'Could not open the file for writing.\n');
end
I'm not sure what type of parser you are using for your VHDL, but this will pack your data into a file with a short header of the expected dimensions followed by one long row of your serialized data.
To read it back in with MATLAB, you can do this:
fid = fopen('pqfile.dat', 'r');
szX = fread(fid, [1 2], 'uint8');                 % header: [rows cols]
x = fread(fid, [szX(2) szX(1)], '*uint8')';       % read as cols-by-rows, then transpose back to rows-by-cols
fclose(fid);
The transpose operations are necessary because MATLAB stores and reads arrays in column-major order, whereas most other languages (in my experience) use row-major order.
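For the other side of the pipeline (for example, a script that prepares testbench input), here is a Python sketch of the same layout: one byte for rows, one byte for columns, then the data row by row. It is not MATLAB, and the function names are hypothetical; it works on binary file objects so it can be tried in memory.

```python
import io
import struct

def write_matrix(f, rows):
    """Write a uint8 matrix to binary file object f: rows, cols, then data in row order."""
    n_rows, n_cols = len(rows), len(rows[0])
    f.write(struct.pack("BB", n_rows, n_cols))  # 'B' = unsigned byte; struct.error above 255
    for row in rows:
        f.write(bytes(row))                     # bytes() raises ValueError outside 0..255

def read_matrix(f):
    """Read the same layout back into a list of rows."""
    n_rows, n_cols = struct.unpack("BB", f.read(2))
    data = f.read(n_rows * n_cols)
    return [list(data[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]

# Round trip through an in-memory file; with a real file use open(path, "wb") / open(path, "rb").
buf = io.BytesIO()
write_matrix(buf, [[1, 2, 3], [4, 5, 6]])
buf.seek(0)
print(read_matrix(buf))
```

A non-square matrix is used on purpose: a row/column mix-up shows up immediately, whereas a 5x5 example can hide it.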
The context and the problem below are only examples that can help to visualize the question.
Context: Let's say that I'm continuously generating random binary vectors G of size 1x64 (whose values are either 0 or 1).
Problem: I don't want to check vectors that I've already checked, so I want to create a kind of table that can identify which vectors have already been generated.
So, how can I identify each vector in an optimized way?
My first idea was to convert the binary vectors into decimal numbers. Due to the maximum length of the vectors, I would need 2^64 = 1.8447e+19 numbers to encode them. That's huge, so I need an alternative.
I thought about using hexadecimal coding. In that case, if I'm not wrong, I would need nchoosek(16+16-1,16) = 300540195 elements, which is also huge.
So, are there better alternatives? For example, a kind of hash function that can identify these vectors without collisions?
So you have 64 bit values (or vectors) and you need a data structure in order to efficiently check if a new value is already existing?
Hash sets or binary trees come to mind, depending on whether ordering is important or not.
Matlab has a hash table in containers.Map.
Here is an example:
tic;
n = 1e5; % number of random elements
keys = uint64(rand(n, 1) * 2^64); % random uint64
% check and add key if not already existing (using a containers.Map)
map = containers.Map('KeyType', 'uint64', 'ValueType', 'logical');
for i = 1 : n
key = keys(i);
if ~isKey(map, key)
map(key) = true;
end
end
toc;
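The example above uses random uint64 keys directly. To connect it to the original question, here is a Python sketch (not MATLAB; function names hypothetical) that packs each 1x64 binary vector into one 64-bit integer. The packing is one-to-one, so distinct vectors never share a key, and the set only holds the vectors actually generated, not all 2^64 possibilities.

```python
def bits_to_key(bits):
    """Pack a 64-element 0/1 vector (first element = most significant bit) into an int."""
    if len(bits) != 64:
        raise ValueError("expected 64 bits")
    key = 0
    for b in bits:
        key = (key << 1) | b
    return key

seen = set()   # Python's built-in hash set plays the role of containers.Map here

def is_new(bits):
    """Return True the first time a vector is seen, False afterwards."""
    key = bits_to_key(bits)
    if key in seen:
        return False
    seen.add(key)
    return True
```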
However, depending on why you need this and when you really need to check, the Matlab function unique might also work for you.
Just throw out the duplicates once at the end, like this:
tic;
unique_keys = unique(keys);
toc;
In this example, that is about 300 times faster than checking every time.
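The two strategies give the same set of distinct keys; only the timing of the work differs. A small Python sketch (not MATLAB; the key data is randomly generated for illustration) comparing an incremental check with a single deduplication at the end, which is similar in spirit to unique:

```python
import random

def dedupe_incremental(keys):
    """Check each key as it arrives, keeping only unseen ones."""
    seen, out = set(), []
    for k in keys:
        if k not in seen:
            seen.add(k)
            out.append(k)
    return sorted(out)

def dedupe_at_end(keys):
    """One pass at the end: distinct values in ascending order, like unique()."""
    return sorted(set(keys))

keys = [random.getrandbits(64) for _ in range(1000)]
keys += keys[:100]             # add some duplicates on purpose
```

Deduplicating at the end is only an option if you do not need to know, at generation time, whether a vector is new.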
I guess this is quite an easy one for most people, but I still have no idea how to do it...
For example, I have the number
a=vpa('123456789123456789')
or also
a=sym('123456789123456789')
This number is stored correctly as a symbolic number; however, if I convert it to double by typing
b=double(a);
I lose precision, because a double cannot represent this integer exactly. For this reason, if I then use
dec2bin(b)
I don't get the exact result. So does anyone have an idea how to get the correct binary representation of a?
You'd help me a lot - thank you very much! :)
Until a better solution is found, you can brute-force a binary representation, making use of the fact that log2() works for symbolic arrays.
You can repeatedly find the highest set bit of a and subtract it, like so:
a = vpa('123456789123456789');
onebits = [];                      % exponents of the set bits, highest first
onebit = floor(log2(a));
while onebit >= 0
    onebits = [onebits onebit];
    a = a - 2^onebit;              % remove the highest set bit
    onebit = floor(log2(a));       % log2(0) = -Inf ends the loop
end
% construct binary representation
idx = double(onebits) + 1;         % take care: coefficient of 2^0 goes to index 1
a_bin = zeros(1, max(idx));
a_bin(idx) = 1;
a_bin = fliplr(a_bin);             % put the highest bit first
The result will be the binary representation of your integer a. You can convert to a string with num2str() or perhaps sprintf('%d',a_bin) if you wish.
If you use a sufficiently small test number (for which bin2dec is applicable, i.e. at most 52 bits large), you'll see that bin2dec(sprintf('%d',a_bin)) will indeed restore your original integer.
You can do the same procedure with floats, and stop once a sufficiently small bit is reached. You just have to be careful when storing the binary pattern, to interpret the indices correctly.
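The same highest-bit loop can be cross-checked in Python (not MATLAB), whose integers have arbitrary precision. In this sketch `int.bit_length() - 1` stands in for floor(log2(a)), which avoids any floating-point rounding; the function name is hypothetical.

```python
def to_binary_by_highest_bit(n):
    """Mirror of the MATLAB loop: repeatedly find and subtract the highest set bit."""
    if n < 0:
        raise ValueError("non-negative integers only")
    if n == 0:
        return "0"
    onebits = []
    while n > 0:
        onebit = n.bit_length() - 1      # exact floor(log2(n)) for Python ints
        onebits.append(onebit)
        n -= 1 << onebit                 # remove the highest set bit
    bits = ["0"] * (onebits[0] + 1)
    for b in onebits:
        bits[b] = "1"                    # index b holds the coefficient of 2**b
    return "".join(reversed(bits))       # put the highest bit first

a = 123456789123456789
print(to_binary_by_highest_bit(a))
```

Python's built-in `bin(a)` gives the same digits (after the `0b` prefix), which makes it easy to verify the loop on numbers far beyond the range of bin2dec.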
This is a Fourier descriptor for a set of points:
a =
-3.4173 - 7.1634i
7.4589 + 0.1321i
3.1190 - 2.1870i
-7.1979 + 0.2863i
5.9594 + 0.8209i
-5.4295 -15.7931i
-1.0957 + 3.7485i
0.2657 - 4.1459i
7.4644 - 0.9546i
I need to sum each pair, but when I use a(1) or a(1,1), I get -3.4173 - 7.1634i.
When I use abs(a(1)) or abs(a(1,1)), I get 7.9367, which does not make sense to me!
What I need is to access each part of a pair individually, so that I get -3.4173 alone and -7.1634i alone, so I can normalize it!
You have an array of complex numbers, and what you want to do is access the real and imaginary parts of each number.
r = real(a);
im = imag(a);   % avoid naming this i, which would shadow the imaginary unit
will result in r and im containing the real and imaginary parts of your descriptor, respectively.
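The reason abs(a(1)) gives an answer that "doesn't make any sense" is that abs of a complex number returns its modulus, sqrt(real^2 + imag^2). For a(1) that is sqrt(3.4173^2 + 7.1634^2), which is about 7.937, the value you saw.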
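The same distinction between the parts and the modulus, shown in Python (not MATLAB) for the first element of the descriptor. The comments name the MATLAB functions each line corresponds to.

```python
import math

a1 = complex(-3.4173, -7.1634)   # first element of the descriptor from the question

real_part = a1.real              # like real(a(1)) in MATLAB
imag_part = a1.imag              # like imag(a(1)) in MATLAB
modulus = abs(a1)                # like abs(a(1)): the length of the vector (real, imag)

print(real_part, imag_part, modulus, math.hypot(real_part, imag_part))
```

Normalization then operates on `real_part` and `imag_part` separately, while `abs` is only useful when you want the magnitude.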
Your data type looks confusing because a(1,1) shouldn't give you back the imaginary part of a number... your array should just be 1-dimensional if the values are just complex numbers. But try using the real() and imag() functions on the elements of your array, which will return the real and imaginary parts respectively. You might want to consider using a different data structure though, because Matlab can handle regular complex values just fine, and in that case simply using abs() should give the modulus of the number.
I am trying to use MATLAB to simulate a communications encoding and decoding mechanism, so all of the data will be 0s and 1s.
Initially I created a vector of a specific length and populated it with 0s and 1s using
source_data = rand(1,8192)<.7;
For encoding I need to perform XOR operations multiple times which I was able to do without any issue.
For decoding, I need to implement Gaussian elimination to solve the set of equations, and there I realized this vector representation is not very helpful. I tried to use strcat to append 0s and 1s to a variable a in a for loop:
for i=1:8192
if(mod(i,2)==0)
a = strcat(a,'0');
else
a = strcat(a,'1');
end
i = i+1;
disp(i);
end
When I checked length(a) after this, I found that the length was 16384, which is twice 8192. I am not sure where I am going wrong or how best to tackle this.
Did you reinitialize a before running the example code? It sounds like you ran it twice without clearing a in between, or started with an a that was already 8192 long.
Growing an array in a loop like this in Matlab is inefficient. You can usually find a vectorized way to do things like this. In your case, to get an 8192-long array of alternating ones and zeros (starting with 1, as in your loop), you can just do this:
len = 8192;
a = double(mod(1:len,2) == 1);   % 1 at odd positions, 0 at even ones, like the loop
And logicals might be more suited to your code, so you could skip the double() call.
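For comparison, the same vectorized idea in Python (not MATLAB; function names hypothetical): the value at 1-based position i is simply i mod 2, which matches the question's loop (odd index gives 1, even index gives 0), and a string version can be built from it in one step instead of growing it character by character.

```python
def alternating_bits(length):
    """1, 0, 1, 0, ... using 1-based positions, like mod(1:len, 2) == 1."""
    return [i % 2 for i in range(1, length + 1)]

def alternating_bit_string(length):
    """The same pattern as a character string, built in one join."""
    return "".join(str(b) for b in alternating_bits(length))

print(alternating_bit_string(8))
```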
There are probably a few questions here. First, how can one go from an arbitrary vector of {0,1} elements to a string? One way is to use cellfun with the converter num2str:
dataDbl = rand(1,8192)<.7; %see the original question
dataStr = cellfun(@num2str, num2cell(dataDbl));
Note that cellfun concatenates uniform outputs.
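For the Gaussian-elimination part of the question, a string is not really needed: over GF(2), adding two equations is an XOR, so each equation can be packed into a single integer (in MATLAB, the equivalent row operation on logical rows is xor). The sketch below is Python, not MATLAB, and its function names are hypothetical. It solves A x = b over GF(2), sets free variables to 0, and returns None if the system is inconsistent.

```python
def bits_to_int(bits):
    """Pack a 0/1 list into an int; bits[i] becomes bit i (unknown x_i)."""
    value = 0
    for i, b in enumerate(bits):
        value |= (b & 1) << i
    return value

def solve_gf2(rows, rhs, n):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination.

    rows: list of 0/1 lists of length n; rhs: list of 0/1 values.
    Returns a list of n bits (free variables set to 0), or None if inconsistent.
    """
    # Augment each packed row with its right-hand side stored in bit n.
    eq = [bits_to_int(r) | ((b & 1) << n) for r, b in zip(rows, rhs)]
    pivots = []                          # (column, row index) pairs
    r = 0
    for col in range(n):
        # Find a row at or below r with a 1 in this column.
        p = next((i for i in range(r, len(eq)) if (eq[i] >> col) & 1), None)
        if p is None:
            continue                     # no pivot: x_col is a free variable
        eq[r], eq[p] = eq[p], eq[r]
        for i in range(len(eq)):
            if i != r and (eq[i] >> col) & 1:
                eq[i] ^= eq[r]           # row addition over GF(2) is XOR
        pivots.append((col, r))
        r += 1
    # A row reduced to "0 = 1" means the system has no solution.
    mask = (1 << n) - 1
    if any((e & mask) == 0 and (e >> n) & 1 for e in eq):
        return None
    x = [0] * n
    for col, row in pivots:
        x[col] = (eq[row] >> n) & 1      # pivot row now reads x_col (+ free vars) = rhs
    return x

print(solve_gf2([[1, 1, 0], [0, 1, 1], [1, 0, 1]], [1, 1, 0], 3))
```

Packing rows into integers keeps each XOR a single machine-level operation, which is the main reason this representation is convenient for elimination.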