How can I apply Huffman coding correctly?

I applied the zigzag function after quantization to an image block, and I want to compute the Huffman coding of this block. I understand that the input argument must be a vector, and that the histogram should be calculated.
I wrote the following code, but it doesn't seem to work:
[M N]=size(yce);
fun1=zigzag(yce);
count1 = imhist(fun1);
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(count1,p1);
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
I get the following error with the huffmandict function:
Error in project at 65
[dict1,avglen1]=huffmandict(count1,p1);
Source symbols repeat.
zigzag.m is a function I wrote in a separate MATLAB file. It converts a matrix into a vector, thus eliminating long sequences of zeros.

The Huffman dictionary function (huffmandict) in MATLAB requires that the symbols vector (the first argument of the function) contain only unique values. This symbols vector is a list of all possible symbols that appear in the data you want to encode / compress. As such, it wouldn't make sense for that list to contain duplicates. This is much like a dictionary of words, where it wouldn't make sense to see the same word twice. The second parameter of the function is the probability of occurrence associated with each symbol in your sequence.
With huffmandict, you are creating a dictionary for Huffman encoding that consists of all possible unique symbols to be encountered when encoding/decoding, together with their associated probabilities. In your code you passed count1 (the bin counts, which contain repeated values) as the symbols, which is why you get "Source symbols repeat". You need to extract both the bin locations and the frequencies when using imhist, so you need to call the two-output version of imhist. The second output of imhist gives you the list of all possible intensities / symbols, while the first output gives you the frequency of each of these intensities / symbols in your data. You then normalize the first output by the total number of symbols / intensities in your data to get the probabilities (the relative frequencies serve as the probability estimates). Once this is complete, you use both of these as input into huffmandict.
In other words, you need to change only two lines of code:
[M N]=size(yce);
fun1=zigzag(yce);
[count1,x] = imhist(fun1); %// Change
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(x,p1); %// Change
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
Edit
Knowing how fun1 is structured now, do not use imhist. imhist assumes that you are putting in image data, but it doesn't look like that's the case. Instead, use histc to compute the frequency of occurrence. As such, simply modify your code to this:
[M N]=size(yce);
fun1=zigzag(yce);
bins = unique(fun1); %// Change
count1 = histc(fun1, bins); %// Change
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(bins,p1); %// Change
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
unique finds the distinct values in your vector so that we can use them as bins to calculate our frequencies. This also gives us the list of all possible symbols seen in the data.
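The same idea can be tried on a small vector without zigzag.m. This is only a sketch: the contents of fun1 below are a made-up stand-in for the zigzag-scanned block, and accumarray is swapped in for histc to count occurrences. huffmandict, huffmanenco and huffmandeco require the Communications Toolbox.

```matlab
% Hypothetical stand-in for the zigzag-scanned, quantized block
fun1 = [12 -3 0 0 5 0 0 0 -3 0 0 0 0 1 0 0];

% Distinct symbols (sorted) and, for each sample, the index of its symbol
[symbols, ~, idx] = unique(fun1);

% Count how often each symbol occurs, then normalize to probabilities
counts = accumarray(idx(:), 1).';
p = counts / numel(fun1);

% Build the dictionary from unique symbols and their probabilities
dict = huffmandict(symbols, p);

% Encode and decode; the round trip should reproduce the input
comp = huffmanenco(fun1, dict);
deco = huffmandeco(comp, dict);
roundTripOk = isequal(deco(:), fun1(:));
```

Because symbols comes from unique, it cannot contain repeats, so the "Source symbols repeat" error should not occur with this construction.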

Related

Retrieve Gradient of Reference Line Generated by probplot

I am generating probability plots for a number of data sets in matlab.
I am plotting them using probplot with a weibull distribution reference line
data = [1,1,1,1,2,2,2,3,4,5,3,3,2,2,1,3,5,7,2,4,2] ;
h = probplot('weibull',data) ;
According to the MATLAB documentation, this function returns a graphics array object. This appears to contain only the original data and not the reference line.
Is there any way of retrieving information about this reference line without plotting it and individually extracting it using the figure tools? (That is very much not a route I'd like to go down, as there are potentially hundreds of plots to go through.)
I can see there is wblplot, which returns a line array of 3 lines, one of which is the original data and one of the others is likely the reference line. However, I will have to try fitting different distributions further down the road and would prefer to keep a generic approach.

That is not correct: the returned array does contain the reference line.
data = [1,1,1,1,2,2,2,3,4,5,3,3,2,2,1,3,5,7,2,4,2] ;
h = probplot('weibull',data) ;
b=h(2);
figure
plot(b.XData,b.YData)
h is a graphics array object, so it's an array. The first element, h(1), contains the original data, while the second element, h(2), contains the reference line.
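Building on that, a gradient could be estimated from the endpoints of h(2). This is a rough sketch and rests on assumptions worth checking on your own plots: that the stored YData is in the plot's transformed probability units, and that the x-axis of a Weibull probability plot is logarithmic, so the slope against log(x) may be the more meaningful number. probplot requires the Statistics and Machine Learning Toolbox.

```matlab
data = [1,1,1,1,2,2,2,3,4,5,3,3,2,2,1,3,5,7,2,4,2];
h = probplot('weibull', data);

ref = h(2);                  % reference line, as described above
x = ref.XData([1 end]);      % endpoints of the stored line
y = ref.YData([1 end]);

% Slope in the coordinates stored on the line object
slopeRaw = diff(y) / diff(x);

% Slope against log(x); ASSUMES a logarithmic x-axis for the Weibull plot
slopeLog = diff(y) / diff(log(x));
```

For hundreds of data sets, the same lines can sit inside a loop that stores slopeRaw or slopeLog per data set.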

Append for MATLAB

I am training an ANN, and I want to have different instances of training. In each instance, I want to find the maximum difference between the actual and predicted output. Then I want to take the average of all these maximums.
My code so far is:
maximum = [];
k=1;
for k = 1:5
%Train network
layers = [ ...
imageInputLayer([250 1 1])
reluLayer
fullyConnectedLayer(100)
fullyConnectedLayer(100)
fullyConnectedLayer(1)
regressionLayer];
options = trainingOptions('sgdm','InitialLearnRate',0.1, ...
'MaxEpochs',1000);
net = trainNetwork(nnntrain,nnnfluidtrain,layers,options);
net.Layers
%Test network
predictedn = predict(net,nnntest);
maximum = append(maximum, max(abs(predictedn-nnnfluidtest)));
k=k+1
end
My intent is to produce a list named 'maximum' with five elements (the max of each ANN training instance) that I would then like to take the average of.
However, it keeps giving me the error:
wrong number of input arguments for obsolete matrix-based syntax
when it tries to append. The first input is a list while the second is a 1x1 single.

Appending in MATLAB is a native operation. You append elements by actually building a new vector where the original vector is part of the input.
Therefore:
maximum = [maximum max(abs(predictedn-nnnfluidtest))];
If for some reason you would like to do it in function form, the function you are looking for is cat, which is short for concatenate. An append function exists in multiple toolboxes, but none of them does what you want. cat is what you want, but you still need to provide the original input vector as one of the arguments:
maximum = cat(2, maximum, max(abs(predictedn-nnnfluidtest)));
The first argument is the dimension you want to append along. To match your code above, you want the number of columns to grow as you extend your vector, so that is the second dimension, i.e. 2.
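Since the number of training runs is known in advance, another option is to preallocate the vector and index into it, then average at the end. The sketch below replaces the network training and prediction with random placeholder data (predicted and actual are hypothetical stand-ins for predictedn and nnnfluidtest), so only the bookkeeping is shown.

```matlab
numRuns = 5;
maximum = zeros(1, numRuns);         % preallocate one slot per training run

for k = 1:numRuns
    % Placeholders for predict(net, nnntest) and nnnfluidtest
    predicted = rand(10, 1);
    actual    = rand(10, 1);

    % Largest absolute error of this run
    maximum(k) = max(abs(predicted - actual));
end

avgMax = mean(maximum);              % average of the per-run maxima
```

Note that the for loop advances k by itself, so no manual k = k + 1 is needed inside the loop.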

Encoding a binary vector in a suitable way in Matlab

The context and the problem below are only examples that can help to visualize the question.
Context: Let's say that I'm continuously generating random binary vectors G of size 1x64 (whose values are either 0 or 1).
Problem: I don't want to check vectors that I've already checked, so I want to create a kind of table that can identify which vectors have already been generated.
So, how can I identify each vector in an efficient way?
My first idea was to convert the binary vectors into decimal numbers. Due to the maximum length of the vectors, I would need 2^64 = 1.8447e+19 numbers to encode them. That's huge, so I need an alternative.
I thought about using hexadecimal coding. In that case, if I'm not wrong, I would need nchoosek(16+16-1,16) = 300540195 elements, which is also huge.
So, are there better alternatives? For example, a kind of hash function that can identify those vectors without repeating values?

So you have 64-bit values (or vectors) and you need a data structure to efficiently check whether a new value already exists?
Hash sets or binary trees come to mind, depending on whether ordering is important or not.
MATLAB has a hash table in containers.Map.
Here is an example:
tic;
n = 1e5; % number of random elements
keys = uint64(rand(n, 1) * 2^64); % random uint64
% check and add key if not already existing (using a containers.Map)
map = containers.Map('KeyType', 'uint64', 'ValueType', 'logical');
for i = 1 : n
key = keys(i);
if ~isKey(map, key)
map(key) = true;
end
end
toc;
However, depending on why you really need that and when you really need to check, the Matlab function unique might also be something for you.
Just throwing out duplicates once at the end like:
tic;
unique_keys = unique(keys);
toc;
is in this example 300 times faster than checking every time.
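To connect this back to the 1x64 binary vectors in the question, one possible approach is to turn each vector into a 64-character string of '0' and '1' and use that string as the map key. This is a sketch of a different keying choice than the uint64 keys above, chosen here so that no numeric conversion of the 64 bits is involved; G and seen are illustrative names.

```matlab
seen = containers.Map('KeyType', 'char', 'ValueType', 'logical');

G = randi([0 1], 1, 64);        % one random binary vector
key = char(G + '0');            % e.g. '0110...' with 64 characters

isNew = ~isKey(seen, key);      % true the first time this vector appears
if isNew
    seen(key) = true;           % remember it for later checks
end

% Presenting the same vector again is now detected as a repeat
isRepeat = isKey(seen, char(G + '0'));
```

Only the vectors actually generated are stored, so the size of the table grows with the number of distinct vectors seen, not with 2^64.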

Difference between hist and imhist in matlab

What is the difference between the hist and imhist functions in MATLAB? I have a matrix of color level values loaded from an image with imread, and I need to compute the entropy of the image from its histogram.
When using imhist, the resulting matrix contains zeros everywhere except the last element (lower-right), which contains a large number (a few thousand or so).
Because that output seems to be wrong, I tried hist instead of imhist, and the resulting values look much better: the matrix is filled with plausible values instead of zeros.
However, according to the docs, imhist should be the better choice in this case and hist should give weird results.
Unfortunately I am not good at MATLAB, so I cannot provide a better problem description. I can add some other information in the future, though.
So I will try to explain my problem better. I have an image for which I should compute the entropy and a few other values (how many bytes it will take to save that image, etc.). I wrote this function and it works pretty well:
function [entropy, bytes_image, bytes_coding] = entropy_single_pixels(im)
im = double(im);
histg = hist(im);
histg(histg==0) = [];
nzhist = histg ./ numel(im);
entropy = -sum(nzhist.*log2(nzhist));
bytes_image = (entropy*(numel(im))/8);
bytes_coding = 2*numel(unique(im));
fprintf('ENTROPY_VALUE:%s\n',num2str(entropy));
fprintf('BYTES_IMAGE:%s\n',num2str(bytes_image));
fprintf('BYTES_CODING:%s\n',num2str(bytes_coding));
end
Then I have to compute the same values, but for "pairs" of pixels that lie directly below each other. So I have only half the rows and the same number of columns. I need to express every unique pixel pair as a different number, so I multiplied the first pixel by 1000 and added the second one to it. Then I need to apply the same function as in the first example, and that is when I get weird numbers from the imhist function. When using hist, it seems to be OK, but I really don't think that behavior is correct, so there must be an error of mine somewhere. I think I understand pretty well what I want to do, or at least I hope so, but unfortunately MATLAB makes all of this rather hard for me :)

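hist: computes a histogram of arbitrary numeric data. For a matrix it works column by column and uses 10 equally spaced bins by default.
imhist: computes the histogram of a grayscale or indexed image, with the bins determined by the image's data type.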

Use im2double instead of double if you want to use imhist. The imhist function expects double or single-precision data to be in the [0,1] data range, which is why you see everything in the last bin of the histogram.
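If the goal is only the entropy, the histogram can also be built directly from the distinct values, which sidesteps both the [0,1] range expectation of imhist and the default binning of hist. The following is a sketch under that assumption; the random test image is made up, and the same lines would apply to the pair-coded values (first*1000 + second) from the question.

```matlab
im = uint8(randi([0 255], 64, 64));   % hypothetical test image

% Map every pixel to the index of its distinct value
[~, ~, idx] = unique(im(:));

% Count occurrences of each distinct value; no empty bins can appear
counts = accumarray(idx, 1);

% Relative frequencies and Shannon entropy in bits per symbol
p = counts / numel(im);
H = -sum(p .* log2(p));
```

Because only values that actually occur are counted, every entry of p is positive and log2(p) is always finite.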

matlab matrices and fold list

I have two pieces of Mathematica code that I want to reproduce in MATLAB:
measure := RandomReal[] - 0.5
m = 10000;
data = Table[measure, {m}];
fig1 = ListPlot[data, PlotStyle -> {PointSize[0.015]}]
Histogram[data]
MATLAB:
measure = @(m) rand(1,m)-0.5
m=10000;
for i=1:m
data(:,i)=measure(:,i);
end
figure(1)
plot(data,'b.','MarkerSize',0.015)
figure(2)
hist(data)
And it gives me :
??? The following error occurred
converting from function_handle to
double: Error using ==> double
If I do:
measure =rand()-0.5
m=10000;
data=rand(1,m)-0.5
then I get the right results in plot 1, but in plot 2 the y-axis is wrong.
Also, if I have this in Mathematica:
steps[m_] := Table[2 RandomInteger[] - 1, {m}]
steps[20]
Walk1D[n_] := FoldList[Plus, 0, steps[n]]
LastPoint1D[n_] := Fold[Plus, 0, steps[n]]
ListPlot[Walk1D[10^4]]
I did this:
steps = @(m) 2*randint(1,m,2)-1;
steps(20)
Walk1D = @(n) cumsum(0:steps(n)) % this is ok, I think
LastPointold1D = @(n) cumsum(0:steps(n))
LastPoint1D = @(n) LastPointold1D(end) % but here I know I must take the last "folding"
Walk1D(10)
LastPoint1D(10000)
plot(Walk1D(10000),'b')
and I get an empty matrix and no plot.

Since @Itamar essentially answered your first question, here is a comment on the second one. You did it almost right. You need to define
Walk1D = @(n) cumsum(steps(n));
since cumsum is the analog of FoldList[Plus,0,your-list] (FoldList also keeps the initial 0 as its first element; use [0, cumsum(steps(n))] if you need an exact match). Then, the plot in your code works fine. Also, notice that in both your Mathematica and your MATLAB code it is not necessary to define LastPoint1D separately: in both cases, it is just the last element of the generated walk.
EDIT:
Expanding a bit on LastPoint1D: my guess is that you want it to be a last point of the walk computed by Walk1D. Therefore, it would IMO make sense to just make it a function of a generated walk (vector), that returns its last point. For example:
lastPoint1D = @(walk) (walk(end));
Then, you use it as:
walk = Walk1D(10000);
lastPoint1D(walk)
HTH

You have a few mistakes in translating your code to MATLAB:
If I am not wrong, the line data = Table[measure, {m}]; evaluates measure m times, which in your case creates a random vector of size (1,m). If that is true, in MATLAB it would simply be data = measure(m);
The function you define takes a single argument m, therefore it makes no sense to use matrix indexing notation (the :) when calling it.
Just as a side-note, if you insert data into a matrix inside a for loop, it will run much faster if you allocate the matrix in advance, otherwise Matlab will re-allocate memory to resize the matrix in each iteration. You do this by data = zeros(1,m);.
What do you mean by "in plot 2 the y=axis is wrong"? What do you expect it to be?
EDIT
Regarding your 2nd question, it would be easier to help you if you describe in words what you want to achieve, rather than trying to read your (error producing) code. One thing which is clearly wrong is using expression like 0:steps(n), since you use m:n with two scalars m and n to produce a vector, but steps(n) produces a vector, not a scalar. You probably get an empty matrix since the first value in the vector returned by steps(n) might be -1, and 0:-1 produces an empty vector.
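Putting both answers together, the random walk could be sketched as below. Two things here are assumptions on my part rather than part of the original answers: randi is used in place of randint (which is not available in current MATLAB releases), and a leading 0 is prepended so the result has the same length as FoldList[Plus, 0, ...].

```matlab
% m random steps, each either -1 or +1
steps = @(m) 2*randi([0 1], 1, m) - 1;

% Walk starting at 0: running sum of the steps, with the start point kept
walk1D = @(n) [0, cumsum(steps(n))];

% Last point of a given walk
lastPoint1D = @(walk) walk(end);

walk = walk1D(1e4);
lastPoint = lastPoint1D(walk);

plot(walk, 'b');
```

Computing lastPoint from the stored walk, rather than calling steps again, keeps it consistent with the walk that was plotted.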
