Validity of a Huffman encoding

The encoding is
A : 0
B : 11
C : 100
D : 101
The one sure thing I can say from this is that A has the highest probability, followed by B, and then either of C or D (since prob(C) + prob(D) < prob(B)).
However, assuming A > B > C > D in terms of probability, or A > B > D > C, I don't arrive at the above encoding:
root
├─ 0 ─ A
└─ 1 ─┬─ 1 ─ B   <- should actually be 0, as the left branch should always be 0
      └─ 0 ─┬─ 0 ─ C   <- the first 0 should be 1, but it's not
            └─ 1 ─ D
It does satisfy the prefix constraint, but is there a counter-example to my reasoning that shows the above encoding does in fact work?
Thanks!

There is nothing that says which branch should be zero and which should be one. You can come up with many encodings of the same set of symbol lengths. (Look up "canonical Huffman code" for a convention to resolve this.) The encoding "works" because it is a prefix code, and it will compress the symbols well if their probabilities are close to 1/2, 1/4, 1/8, and 1/8.
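Since only the codeword lengths matter, it's easy to check mechanically that the code is prefix-free and what it costs per symbol. Here's a small Python sketch (the helper names and the probability values 1/2, 1/4, 1/8, 1/8 are illustrative, not from the question):

code = {"A": "0", "B": "11", "C": "100", "D": "101"}

def is_prefix_free(codewords):
    # True if no codeword is a proper prefix of another codeword.
    words = list(codewords)
    return not any(u != v and v.startswith(u) for u in words for v in words)

def expected_length(code, probs):
    # Average bits per symbol under the given symbol probabilities.
    return sum(probs[s] * len(code[s]) for s in code)

probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
print(is_prefix_free(code.values()))    # True: it is a valid prefix code
print(expected_length(code, probs))     # 1.75 bits/symbol, matching the entropy here

# Relabeling branches (here, flipping every bit) gives a different,
# equally valid prefix code with the same codeword lengths.
flipped = {s: w.translate(str.maketrans("01", "10")) for s, w in code.items()}
print(flipped)                          # {'A': '1', 'B': '00', 'C': '011', 'D': '010'}
print(is_prefix_free(flipped.values())) # True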


NFA to accept the following language

I need to build an NFA (or DFA) to recognize the following language:
L = {w | w mod 3 = 1}.
So the way I tried it was to make an NFA to recognize numbers divisible by 3 and then just add 1 to them, but this approach is a lot harder than it seems (if not impossible?).
I only managed to do an NFA to recognize numbers divisible by 3.
I will assume that w is to be interpreted as the decimal representation (without leading zeroes) of a nonnegative integer.
Given this, we can use Myhill-Nerode to iteratively determine the states we need:
the empty string can be followed by any string in L to get a string in L. We'll call the equivalence class for this [e]. Note that this equivalence class corresponds to the initial state of a minimal DFA for L (if one exists). Note also that the initial state is not accepting, since the empty string is not a valid decimal representation of a nonnegative integer.
the string 0 cannot be followed by anything to get a string in L; it leads to a dead state corresponding to equivalence class [0].
strings 1, 4 and 7 are in L so they must correspond to a new state. We'll call the equivalence class for these [1].
strings 2, 5 and 8 are not in L; however, following them with a string in L does not always give a string in L (e.g., 21 is not in L even though 1 is). They must correspond to a new equivalence class we'll call [2].
strings 3, 6 and 9 are not in L, but they can be followed by anything in L to get a string in L. This is the same behavior as the empty string, so we don't need a new equivalence class or state: their equivalence class is [e].
it can be verified that every two-digit decimal string is indistinguishable from some one-digit decimal string above, so no new equivalence classes or states are needed.
To determine the transitions, simply append the transition symbol to the equivalence class's representative element and see what equivalence class the resulting string belongs to: that will be where the transition terminates. For instance, there is a transition from [e] to [0] on 0, from [e] to [1] on 1, etc.
Because 10 ≡ 1 (mod 3), appending a digit to the end of a decimal string adds that digit's value, modulo 3, to the number's value modulo 3:
x ≡ a (mod 3)
y ≡ b (mod 3)
x * 10 ≡ x * 1 ≡ x (mod 3)   since 10 ≡ 1 (mod 3)
x . y = x * 10 + y ≡ x + y ≡ a + b (mod 3)
Here x . y denotes the number obtained by appending the digit y to the decimal string for x. For example, appending 5 to 7 gives 75 = 7 * 10 + 5 ≡ 7 + 5 ≡ 0 (mod 3).
Filling in the transitions is left as an exercise.
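If you want to sanity-check the arithmetic (without spoiling the exercise of drawing the transition table), here is a minimal Python sketch of the remainder-tracking idea. One simplifying assumption is mine, not the answer's: it allows leading zeroes, so the dead state [0] is unnecessary and the state is just the value modulo 3.

def accepts(w: str) -> bool:
    # Simulate the remainder-tracking DFA for L = {w : value(w) mod 3 == 1}.
    state = 0                        # value of the digits read so far, mod 3
    for d in w:
        if not d.isdigit():
            return False             # not a decimal string at all
        # Appending digit d: 10*state + d == state + d (mod 3), since 10 == 1 (mod 3)
        state = (state + int(d)) % 3
    return w != "" and state == 1    # the empty string is not a number

assert accepts("1") and accepts("4") and accepts("22") and accepts("100")
assert not accepts("") and not accepts("3") and not accepts("9")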

Calculating the number e using Raku

I'm trying to calculate the e constant (AKA Euler's Number) by calculating the formula
e = Σ_{k=0}^{∞} 1/k! = 1/0! + 1/1! + 1/2! + ...
In order to calculate the factorial and division in one shot, I wrote this:
my @e = 1, { state $a=1; 1 / ($_ * $a++) } ... *;
say reduce * + *, @e[^10];
But it didn't work out. How to do it correctly?
I analyze your code in the section Analyzing your code. Before that I present a couple fun sections of bonus material.
One liner, one letter1
say e; # 2.718281828459045
"A treatise on multiple ways"2
Click the above link to see Damian Conway's extraordinary article on computing e in Raku.
The article is a lot of fun (after all, it's Damian). It's a very understandable discussion of computing e. And it's a homage to Raku's bicarbonate reincarnation of the TIMTOWTDI philosophy espoused by Larry Wall.3
As an appetizer, here's a quote from about halfway through the article:
Given that these efficient methods all work the same way—by summing (an initial subset of) an infinite series of terms—maybe it would be better if we had a function to do that for us. And it would certainly be better if the function could work out by itself exactly how much of that initial subset of the series it actually needs to include in order to produce an accurate answer...rather than requiring us to manually comb through the results of multiple trials to discover that.
And, as so often in Raku, it’s surprisingly easy to build just what we need:
sub Σ (Unary $block --> Numeric) {
(0..∞).map($block).produce(&[+]).&converge
}
Analyzing your code
Here's the first line, generating the series:
my @e = 1, { state $a=1; 1 / ($_ * $a++) } ... *;
The closure ({ code goes here }) computes a term. A closure has a signature, either implicit or explicit, that determines how many arguments it will accept. In this case there's no explicit signature. The use of $_ (the "topic" variable) results in an implicit signature that requires one argument that's bound to $_.
The sequence operator (...) repeatedly calls the closure on its left, passing the previous term as the closure's argument, to lazily build a series of terms until the endpoint on its right, which in this case is *, shorthand for Inf aka infinity.
The topic in the first call to the closure is 1. So the closure computes and returns 1 / (1 * 1) yielding the first two terms in the series as 1, 1/1.
The topic in the second call is the value of the previous one, 1/1, i.e. 1 again. So the closure computes and returns 1 / (1 * 2), extending the series to 1, 1/1, 1/2. It all looks good.
The next call computes 1 / (1/2 * 3), which is 0.666667. That term should be 1 / (1 * 2 * 3). Oops.
Making your code match the formula
Your code is supposed to match the formula:
e = Σ_{k=0}^{∞} 1/k!
In this formula, each term is computed from its position in the series: the kth term (where k=0 for the first 1) is just the reciprocal of k!.
(So it's got nothing to do with the value of the prior term. Thus $_, which receives the value of the prior term, shouldn't be used in the closure.)
Let's create a factorial postfix operator:
sub postfix:<!> (\k) { [×] 1 .. k }
(× is an infix multiplication operator, a nicer looking Unicode alias of the usual ASCII infix *.)
That's shorthand for:
sub postfix:<!> (\k) { 1 × 2 × 3 × .... × k }
(I've used pseudo metasyntactic notation inside the braces to denote the idea of adding or subtracting as many terms as required.)
More generally, putting an infix operator op in square brackets at the start of an expression forms a composite prefix operator that is the equivalent of calling reduce with &[op] as its first argument. See Reduction metaoperator for more info.
Now we can rewrite the closure to use the new factorial postfix operator:
my @e = 1, { state $a=1; 1 / $a++! } ... *;
Bingo. This produces the right series.
... until it doesn't, for a different reason. The next problem is numeric accuracy. But let's deal with that in the next section.
A one liner derived from your code
Maybe compress the three lines down to one:
say [+] .[^10] given 1, { 1 / [×] 1 .. ++$ } ... Inf
.[^10] applies to the topic, which is set by the given. (^10 is shorthand for 0..9, so the above code computes the sum of the first ten terms in the series.)
I've eliminated the $a from the closure computing the next term. A lone $ is the same as (state $), an anonymous state scalar. I made it a pre-increment instead of post-increment to achieve the same effect as you did by initializing $a to 1.
We're now left with the final (big!) problem, pointed out by you in a comment below.
Provided neither of its operands is a Num (a float, and thus approximate), the / operator normally returns a 100% accurate Rat (a limited precision rational). But if the denominator of the result exceeds 64 bits then that result is converted to a Num -- which trades accuracy for performance, a tradeoff we don't want to make. We need to take that into account.
To get unlimited precision as well as 100% accuracy, simply coerce the operation to use FatRats. To do this correctly, just make (at least) one of the operands a FatRat (and ensure none of the others is a Num):
say [+] .[^500] given 1, { 1.FatRat / [×] 1 .. ++$ } ... Inf
I've verified this to 500 decimal digits. I expect it to remain accurate until the program crashes due to exceeding some limit of the Raku language or Rakudo compiler. (See my answer to Cannot unbox 65536 bit wide bigint into native integer for some discussion of that.)
Footnotes
1 Raku has a few important mathematical constants built in, including e, i, and pi (and its alias π). Thus one can write Euler's Identity in Raku somewhat like it looks in math books. With credit to RosettaCode's Raku entry for Euler's Identity:
# There's an invisible character between <> and i⁢π character pairs!
sub infix:<⁢> (\left, \right) is tighter(&infix:<**>) { left * right };
# Raku doesn't have built in symbolic math so use approximate equal
say e**i⁢π + 1 ≅ 0; # True
2 Damian's article is a must read. But it's just one of several admirable treatments that are among the 100+ matches for a google for 'raku "euler's number"'.
3 See TIMTOWTDI vs TSBO-APOO-OWTDI for one of the more balanced views of TIMTOWTDI written by a fan of python. But there are downsides to taking TIMTOWTDI too far. To reflect this latter "danger", the Perl community coined the humorously long, unreadable, and understated TIMTOWTDIBSCINABTE -- There Is More Than One Way To Do It But Sometimes Consistency Is Not A Bad Thing Either, pronounced "Tim Toady Bicarbonate". Strangely enough, Larry applied bicarbonate to Raku's design and Damian applies it to computing e in Raku.
There are fractions in $_. Thus you need 1 / (1/$_ * $a++), or rather $_ / $a++.
In Raku you could do this calculation step by step:
1.FatRat,1,2,3 ... * #1 1 2 3 4 5 6 7 8 9 ...
andthen .produce: &[*] #1 1 2 6 24 120 720 5040 40320 362880
andthen .map: 1/* #1 1 1/2 1/6 1/24 1/120 1/720 1/5040 1/40320 1/362880 ...
andthen .produce: &[+] #1 2 2.5 2.666667 2.708333 2.716667 2.718056 2.718254 2.718279 2.718282 ...
andthen .[50].say #2.71828182845904523536028747135266249775724709369995957496696762772

Pumping Lemma For A CFL

I'm trying to prove that the following language is not context-free:
{a^n b^m a^n b^m : n,m >= 0}
I know that I need to use the pumping lemma. So I have to write w = uvxyz where |vy| ≥ 1 and |vxy| ≤ p (the pumping length). I know that I need to show that, in each case, some pumped string uv^i x y^i z is outside the language.
I know the cases where one of v and y contains something and the other is empty, but I don't know what to pick for my string.

Is this the simplified version of this Boolean expression, or is this reviewer wrong?

I ask because I've tried doing the truth table, but unfortunately one expression has 3 variables and the other has 4, so I got confused.
F = (A+B+C)(A+B+D')+B'C;
and this is the simplified version
F = A + B + C
http://www.belley.org/etc141/Boolean%20Sinplification%20Exercises/Boolean%20Simplification%20Exercise%20Questions.pdf
I think there's something wrong with this reviewer... or is it accurate?
By the way, is simplification different from minimizing from Sum of Minterms to Sum of Products?
Yes, it is the same.
Draw the truth table for both expressions, assuming that there are four input variables in both. The value of D will not play into the second truth table: values in cells with D=1 will match values in cells with D=0. In other words, you can think of the second expression as
F = A + B + C + (0)(D)
You will see that both tables match: the (A+B+C)(A+B+D') subexpression has zeros at ABCD = {0000, 0001, 0011}; (A+B+C) has zeros only at {0000, 0001}. Adding B'C patches the zero at 0011 in the first expression, so the results are equivalent.
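If you'd rather not fill in the 16-row table by hand, a brute-force check in Python (function and variable names are mine) confirms the equivalence on all assignments:

from itertools import product

def f1(a, b, c, d):
    # F1 = (A+B+C)(A+B+D') + B'C
    return ((a or b or c) and (a or b or not d)) or (not b and c)

def f2(a, b, c, d):
    # F2 = A + B + C, the claimed simplification
    return a or b or c

# Compare the two expressions on all 16 assignments of (A, B, C, D).
same = all(bool(f1(*v)) == bool(f2(*v)) for v in product([False, True], repeat=4))
print(same)  # True: the reviewer's simplification is correct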

Simplify Boolean expression with De Morgan's laws

I need to simplify this Boolean expression with De Morgan's laws.
¬c xor (¬b ∨ c)
Could someone help me?
I've found that the best way to visualize a logic formula you don't understand is to make a truth table for it.
In the case of XOR, it represents one variable or the other, but not both. So, let's make a table for A XOR B:
A | B | Result
T | T | F *1
T | F | T *2
F | T | T *3
F | F | F *4
To derive the smallest possible result from the above table, we can first write down the most explicit expression, one that accounts for every row. Converting each line into a logical statement is fairly easy.
First, throw out every row that results in False. Then take the rows that result in True and convert each into a logical statement, joining them with 'OR's. In this case, rows 1 and 4 are false and rows 2 and 3 are true, so we only need to create logical statements for rows 2 and 3. How to do so is best explained by example.
Let's say X, Y, and Z are our variables, and the table gave us the following rows as true:
T | T | F - X & Y & ¬Z
F | T | F - ¬X & Y & ¬Z
F | F | F - ¬X & ¬Y & ¬Z
then to complete, we simply 'OR' them together
(X & Y & ¬Z) V (¬X & Y & ¬Z) V (¬X & ¬Y & ¬Z)
As you can see, where a variable is true you put the variable in directly, and where it is false you put a '¬' before it. The statement above basically says...
(True when X=T,Y=T,Z=F: False otherwise) OR (True when X=F,Y=T,Z=F: False otherwise) OR (True when X=F,Y=F,Z=F: False otherwise)
So, finally, bringing it back to our XOR, the true table rows are...
*2 A & ¬B
*3 ¬A & B
and are combined to be...
(A & ¬B) V (¬A & B)
So, now that you have an explanation of what to do with XOR, you can apply this example to your problem and come up with a logical statement that you can simplify using De Morgan's laws.
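The row-by-row recipe above also mechanizes nicely. Here is a short Python sketch (the function name is mine) that builds the "OR together the true rows" form for any small formula, shown here on XOR:

from itertools import product

def dnf_from_truth_table(f, names):
    # Build a DNF string from the rows of f's truth table that are true.
    terms = []
    for values in product([False, True], repeat=len(names)):
        if f(*values):
            lits = [n if v else "¬" + n for n, v in zip(names, values)]
            terms.append("(" + " & ".join(lits) + ")")
    return " V ".join(terms)

print(dnf_from_truth_table(lambda a, b: a != b, ["A", "B"]))
# Prints: (¬A & B) V (A & ¬B), the expansion of A XOR B derived above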
First you have to split up XOR into its basic form: XOR represents A or B where A ≠ B. If you can do that, you should have more luck using De Morgan's laws on the whole equation.