Why should the length of the string be greater than or equal to the number of states in the pumping lemma? - pumping-lemma

If L is a regular language, then there exists a constant n (which depends on L) such that for every string w in the language L, such that the length of w is greater than or equal to n, we can divide w into three strings, w = xyz.
|w| = length of the string. n = number of states.
Why should we pick w with length greater than or equal to n?
And what is the pumping length?

If you look at the complete statement of the lemma (http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages), you can see that it is actually stating that every string is formed by a prefix x, a part that can be repeated any number of times y and a suffix z. Now it is obvious that, in the shortest case (when the repeating part is taken only once), the length of w equals the number of states needed for the language. This Wikipedia image is very useful:
http://en.wikipedia.org/wiki/File:Pumping-Lemma_xyz_svg.svg

You seem to be misunderstanding the lemma (which you also have not stated completely), and mixing aspects of a proof with what you did state. The lemma says that for every regular language L, there is a constant p such that every string of at least p symbols that belongs to L has a non-empty substring of length no greater than p that can be "pumped", always yielding another element of L. The constant p is the (a) "pumping length".
This can be proved by observing that if a language is regular then there is a finite state automaton that accepts it, and taking p to be the number of states in that automaton (details omitted).
That does not imply, however, that the number of states in the smallest FSA that recognizes a given regular language is the smallest possible pumping length for that language. For instance, consider the language consisting of the union of { a^n } and { b^n } for all n >= 0. You need a four-state FSA to recognize this language, but its minimum pumping length is 1.
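The last claim can be checked by brute force on short strings. The following is only a sketch: the helper names are made up for illustration, the alphabet is assumed to be {a, b}, and only strings up to a bounded length and exponents up to a bounded i are tested, so it gives evidence rather than a proof.

```python
import re
from itertools import product

def in_lang(s: str) -> bool:
    """Membership test for the union of { a^n } and { b^n }, i.e. a* or b*."""
    return re.fullmatch(r"a*|b*", s) is not None

def pumpable(s: str, p: int, max_i: int = 4) -> bool:
    """True if some split s = xyz with |y| > 0 and |xy| <= p keeps
    x + y*i + z in the language for every i in 0..max_i."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                return True
    return False

def works_as_pumping_length(p: int, max_len: int = 8) -> bool:
    """Check every member of the language with length p..max_len."""
    for n in range(p, max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if in_lang(s) and not pumpable(s, p):
                return False
    return True

print(works_as_pumping_length(1))  # candidate pumping length 1
print(works_as_pumping_length(0))  # 0 cannot work: the empty string has no non-empty y
```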

Proof for pumping Lemma linear context free language

Where can I find the proof for the linear context free languages pumping lemma?
I am looking for the proof that is specific for the linear context free language
I also looked for a formal proof and could not find one. I am not sure whether the argument below counts as a formal proof, but it may give you some idea.
The lemma: For every linear context free language L there is an n > 0 such that every w in L with |w| > n can be written as uvxyz such that |vy| > 0, |uvyz| <= n, and uv^ixy^iz is in L for every i >= 0.
"Proof":
Imagine a parse tree for some long string w in L with start symbol S. Let's also assume that the tree contains no useless nodes. If w is long enough, at least one non-terminal occurs more than once in the tree. Call the first repeating non-terminal going down the tree X, its first occurrence (from the top) X[1] and its second occurrence X[2]. Let x be the substring of w generated by X[2], vxy the substring generated by X[1], and uvxyz the full string w generated by S.
Since the move from X[1] to X[2] generates v and y, we can build a new tree in which this move is repeated any number of times (or skipped) before continuing down the tree. This proves that uv^ixy^iz is in L for every i >= 0. Since our tree contains no useless nodes, moving from X[1] to X[2] must generate some terminals, which proves that |vy| > 0.
L is linear, which means that every level of the tree has a single non-terminal symbol. Each node in the tree covers a substring of w whose length is bounded by a linear function of the node's height. Moving from S to X[2] covers uv and yz from w, and the number of tree levels traveled is bounded by (2 * the number of non-terminal symbols + 1). Since the number of levels traveled is bounded and the tree is linear, the yield of the move from S to X[2] is bounded as well, which means |uvyz| <= n for some n >= 0.
Note: Keep in mind that we construct X[1] and X[2] top down, in contrast to how the "regular" pumping lemma for context free grammars in general is proved. In the "regular" pumping lemma there is a bound on the height of X[1] and therefore a bound on |vxy|. In our case there is no bound on the height of X[1]; it can be as high as required by the length of w. There is a bound, however, on the number of tree levels from S to X[2]. This does not mean much if the grammar is not linear, as the output going from S to X[2] is still bounded only by the height of S (which is unbounded). But in the linear case this output is bounded, and therefore |uvyz| <= n.

Hashing using division method

For the hash function : h(k) = k mod m;
I understand that m=2^n will always give the n least significant bits of k. I also understand that m=2^p-1, when k is a string converted to an integer using radix 2^p, will give the same hash value for every permutation of the characters in k. But why exactly is "a prime not too close to an exact power of 2" a good choice? What if I choose 2^p-2 or 2^p-3? Why are these choices considered bad?
Following is the text from CLRS:
"A prime not too close to an exact power of 2 is often a good choice for m. For
example, suppose we wish to allocate a hash table, with collisions resolved by
chaining, to hold roughly n = 2000 character strings, where a character has 8 bits.
We don’t mind examining an average of 3 elements in an unsuccessful search, and
so we allocate a hash table of size m = 701. We could choose m = 701 because
it is a prime near 2000/3 but not near any power of 2."
Suppose we work with radix 2^p.
2^p-1 case:
Why is it a bad idea to use 2^p-1? Let us see,
k = ∑ a_i * 2^(ip)
and since 2^p = 1 mod (2^p-1), reducing modulo 2^p-1 just gives
k = ∑ a_i * 2^(ip) = ∑ a_i mod (2^p-1)
so, as addition is commutative, we can permute digits and get the same result.
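A quick way to see this is to hash a few anagrams. This is only a sketch: the helper names are invented here, p = 8 is assumed, and every character code is assumed to fit in one radix-2^p digit.

```python
def radix_value(text: str, p: int = 8) -> int:
    """Interpret text as a number in radix 2^p, one character per digit.
    Assumes every character code is below 2^p."""
    k = 0
    for ch in text:
        k = (k << p) + ord(ch)
    return k

def h(k: int, m: int) -> int:
    """Division-method hash."""
    return k % m

m = 2**8 - 1  # the problematic choice m = 2^p - 1
words = ["stop", "pots", "tops", "spot"]
hashes = [h(radix_value(w), m) for w in words]
print(hashes)  # all four values coincide
```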
2^p-b case:
Quote from CLRS:
A prime not too close to an exact power of 2 is often a good choice for m.
Since 2^p = b mod (2^p-b), we get
k = ∑ a_i * 2^(ip) = ∑ a_i * b^i mod (2^p-b)
So changing the least significant digit by one will change the hash by one. Changing the second least significant digit by one will change the hash by b (by two when b = 2). To really change the hash we would need to change digits of higher significance. So, for small b, we face a problem similar to the case when m is a power of 2, namely we depend on the distribution of the least significant digits.
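The congruence can be checked numerically. In this sketch the values p = 8 and b = 3, the digit lists, and the function names are all assumptions chosen for illustration.

```python
P = 8          # radix is 2^P
b = 3          # a small offset from the power of two
m = 2**P - b   # modulus close to a power of two

def radix_value(digits):
    """Value of a digit list (most significant digit first) in radix 2^P."""
    k = 0
    for d in digits:
        k = k * 2**P + d
    return k

def digit_sum_form(digits, b, m):
    """sum of a_i * b^i mod m, with i counted from the least significant digit."""
    return sum(d * b**i for i, d in enumerate(reversed(digits))) % m

digits = [17, 200, 5, 42]
bumped = [17, 200, 6, 42]   # second least significant digit increased by one

print(radix_value(digits) % m, digit_sum_form(digits, b, m))
print((radix_value(bumped) - radix_value(digits)) % m)  # hash moves by b
```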

Is it possible to implement universal hashing for the complete range of integers?

I am reading about Universal hashing on integers. The prerequisite and mandatory precondition seems to be that we choose a prime number p greater than the set of all possible keys.
I am not clear on this point.
If our set of keys are of type int then this means that the prime number needs to be of the next bigger data type e.g. long.
But eventually whatever we get as the hash would need to be down-casted to an int to index the hash table. Doesn't this down-casting affect the quality of the Universal Hashing (I am referring to the distribution of the keys over the buckets) somehow?
"If our set of keys are integers then this means that the prime number
needs to be of the next bigger data type e.g. long."
That is not a problem. Sometimes it is necessary otherwise the hash family cannot be universal. See below for more information.
"But eventually whatever we get as the hash would need to be
down-casted to an int to index the hash table.
Doesn't this down-casting affect the quality of the Universal Hashing
(I am referring to the distribution of the keys over the buckets)
somehow?"
The answer is no. I will try to explain.
Whether p has another data type or not is not important for the hash family to be universal. What matters is that p is big enough, i.e. equal to or larger than u (the size of the universe of integers, so that p exceeds every key).
A hash family is universal when the collision probability is equal or
smaller than 1/m.
So the idea is to hold that constraint.
The value of p, in theory, can be as big as a long or more. It just needs to be an integer and prime.
u is the size of the domain/universe (or the number of keys). Given the universe U = {0, ..., u-1}, u denotes the size |U|.
m is the number of bins or buckets
p is a prime which must be equal to or greater than u
the hash family is defined as H = {h(a,b)(x)} with h(a,b)(x) = ((a * x + b) mod p) mod m. Note that a and b are randomly chosen integers (from all possible integers, so theoretically they can be larger than p) reduced modulo the prime p (which can make them either smaller or larger than m, the number of bins/buckets); here too the data type (domain of values) does not matter. See Hashing integers on Wikipedia for notation.
Follow the proof on Wikipedia and you conclude that the collision probability is _p/m_ * 1/(p-1) (the underscores mean rounding down to an integer). For p >> m (p considerably bigger than m) the probability tends to 1/m (but this does not mean that the probability gets better the larger p is).
In other terms answering your question: p being a bigger data type is not a problem here and can be even required. p has to be equal or greater than u and a and b have to be randomly chosen integers modulo p, no matter the number of buckets m. With these constraints you can construct a universal hash family.
Maybe a mathematical example could help
Let U be the universe of integers that correspond to unsigned char (in C for example). Then U = {0, ..., 255}
Let p be (next possible) prime equal or greater than 256. Note that p can be any of these types (short, int, long be it signed or unsigned). The point is that the data type does not play a role (In programming the type mainly denotes a domain of values.). Whether 257 is short, int or long doesn't really matter here for the sake of correctness of the mathematical proof. Also we could have chosen a larger p (i.e. a bigger data type); this does not change the proof's correctness.
The next possible prime number would be 257.
We say we have 25 buckets, i.e. m = 25. This means a hash family would be universal if the collision probability is equal or less than 1/25, i.e. approximately 0.04.
Put in the values for _p/m_ * 1/(p-1): _257/25_ * 1/256 = 10/256 = 0.0390625 which is smaller than 0.04. It is a universal hash family with the chosen parameters.
We could have chosen m = u = 256 buckets. Then the formula gives _257/256_ * 1/256 = 1/256 = 0.00390625, which is equal to 1/m. The hash family is still universal.
Let's try with m being bigger than p, e.g. m = 300. Collision probability is 0, which is smaller than 1/300 ~= 0.003333333333. Trivial, we had more buckets than keys. Still universal, no collisions.
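The numbers above can be cross-checked by counting. The sketch below enumerates every (a, b) for one fixed pair of keys and reports the exact fraction that collide. It assumes a is drawn from 1..p-1 and b from 0..p-1; the function name and the sample keys 3 and 200 are arbitrary choices made for illustration.

```python
from fractions import Fraction

def collision_probability(x: int, y: int, p: int, m: int) -> Fraction:
    """Exact fraction of (a, b) pairs, with a in 1..p-1 and b in 0..p-1,
    for which keys x and y fall into the same bucket."""
    collisions = 0
    for a in range(1, p):
        for b in range(p):
            if ((a * x + b) % p) % m == ((a * y + b) % p) % m:
                collisions += 1
    return Fraction(collisions, (p - 1) * p)

p = 257
results = {}
for m in (25, 256, 300):
    results[m] = collision_probability(3, 200, p, m)
    # Print the exact probability and whether it respects the 1/m bound.
    print(m, float(results[m]), results[m] <= Fraction(1, m))
```

For m = 300 the second modulo never changes the value, and x -> (a*x+b) mod p is one-to-one on 0..p-1, which is why no pair of distinct keys can collide in that case.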
Implementation detail example
We have the following:
x of type int (an element of U)
a, b, p of type long
m we'll see later in the example
Choose p so that it is bigger than the largest key in U; p is of type long.
Choose a and b (modulo p) randomly. They are of type long, but always < p.
For an x (of type int from U) calculate ((a*x+b) mod p). a*x is of type long, (a*x+b) is also of type long, and so ((a*x+b) mod p) is also of type long (assuming p is small enough that a*x+b does not overflow a long). Note that the result of ((a*x+b) mod p) is < p. Let's denote that result h_a_b(x).
h_a_b(x) is now taken modulo m, which means that at this step it depends on the data type of m whether there will be downcasting or not. However, it does not really matter. h_a_b(x) mod m is < m, because we take it modulo m. Hence the value of h_a_b(x) modulo m fits into m's data type. If it has to be downcast, no value is lost. And so you have mapped a key to a bin/bucket.
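The steps above can be sketched as follows. Python integers do not overflow, so the int/long distinction disappears here. The sketch assumes keys in the range 0..p-1 with p = 2^31 - 1 (a Mersenne prime), so that a*x+b would also stay below 2^63 in a language with 64-bit longs. The function name make_hash and the bucket count are made up for illustration.

```python
import random

def make_hash(p: int, m: int, rng: random.Random):
    """Draw one member h(a,b) of the family ((a*x + b) mod p) mod m."""
    a = rng.randrange(1, p)   # a in 1..p-1
    b = rng.randrange(0, p)   # b in 0..p-1
    def h(x: int) -> int:
        return ((a * x + b) % p) % m
    return h

p = 2**31 - 1          # prime; keys are assumed to lie in 0..p-1
m = 1000               # number of buckets (arbitrary)
h = make_hash(p, m, random.Random(0))

keys = [0, 1, 42, 123456789, p - 1]
buckets = [h(x) for x in keys]
print(buckets)         # every value is a valid bucket index below m
```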

Basic Pumping Lemma proof doesn't make sense

Proving that a^n b^n, n >= 0, is non-regular.
Using the string a^p b^p.
Every example I've seen claims that y can either contain a's, b's, or both. But I don't see how y can contain anything other than a's, because if y contains any b's, then the length of xy must be greater than p, which makes it invalid.
Conversely, for examples such as:
www, w in {a, b}*, the string used is a^p b a^p b a^p b. In the proofs I've seen, it is claimed that y cannot contain anything other than a's, for the reason I stated above. Why is this different?
Also throwing in another question:
Describe the error in the following "proof" that 0* 1* is not a regular language. (An
error must exist because 0* 1* is regular.) The proof is by contradiction. Assume
that 0* 1* is regular. Let p be the pumping length for 0* 1* given by the pumping
lemma. Choose s to be the string 0^p 1^p. You know that s is a member of 0* 1*, but
a^p b^p cannot be pumped. Thus you have a contradiction. So 0* 1* is not regular.
I can't find any problem with this proof. I only know that 0*1* is a regular language because I can construct a DFA.
The pumping lemma states that for a regular language L there is a pumping length p such that
for all strings s in L of length at least p there exists a subdivision s=xyz such that:
1. For all i>=0, xy^iz is in L;
2. |y|>0; and
3. |xy|<=p.
Now the claim that y can only contain a's or only b's originates from the first item: if y contained both a's and b's, then with i=2 this would result in a string of the form aa...abb...baa...abb...b, which is not in L. That's what the statement wants to say.
The third item, indeed, makes it obvious that y can only contain a's. In other words, what the textbooks say is a conclusion derived from the first item alone.
Finally, if you combine 1., 2. and 3., you reach a contradiction: we know y must contain at least one character (2.), and y can only contain a's (3.). Say y contains k a's. If we "pump" this with i=2, we generate the string:
s' = xy^2z = a^(p+k) b^p
We know however that s' is not part of L, which it should be by 1., so we reach inconsistency.
You can thus only make the proof work by combining the three items. It's not enough to know that y consists only of a's: that alone doesn't result in a contradiction. The contradiction arises because no subdivision satisfies all three constraints simultaneously.
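For small p this can be confirmed by trying every subdivision. The sketch below uses invented helper names; it enumerates all splits of a^p b^p with |y| > 0 and |xy| <= p and checks whether pumping with i = 2 ever stays in the language.

```python
def in_lang(s: str) -> bool:
    """Membership in { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def splits(s: str, p: int):
    """All (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]

def some_split_survives(p: int) -> bool:
    """True if at least one allowed split of a^p b^p is still in L for i = 2."""
    s = "a" * p + "b" * p
    return any(in_lang(x + y * 2 + z) for x, y, z in splits(s, p))

# Every entry is False: no subdivision satisfies all three constraints.
print([some_split_survives(p) for p in range(1, 8)])
```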
About your second question. In that case, L looks different. You can't reuse the proof for a^nb^n because L is perfectly happy if the string contains more a's. In other words, you can't find a contradiction: the last step of the proof fails. As long as y contains only one type of character - regardless of its length - it can satisfy all three constraints.
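For the 0* 1* exercise this is easy to see concretely. In the sketch below, p = 5 and the particular split are arbitrary choices made for illustration; any y made only of 0s behaves the same way.

```python
import re

def in_lang(s: str) -> bool:
    """Membership in 0*1*."""
    return re.fullmatch(r"0*1*", s) is not None

p = 5                      # stand-in for the pumping length
s = "0" * p + "1" * p      # the string used in the flawed "proof"
x, y, z = "", "0", s[1:]   # |y| > 0 and |xy| = 1 <= p

# Pumping y any number of times (including zero) stays inside 0*1*.
stays_in_language = all(in_lang(x + y * i + z) for i in range(10))
print(stays_in_language)
```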

Prove language irregular with pumping Lemma

I am trying to prove that the following language is not regular using the pumping lemma
L= { a^i b^j | i^2 > j}
Any tips on this? I am completely stuck.
Thanks.
The pumping lemma says:
If a language A is regular => there is a number p (pumping length) where, if s is any string in L such that |s| >= p, then s may be divided into three pieces s=xyz, satisfying the following condition:
xy^iz is in L for each i>=0
|y|>=0
p>=|xy|
The right way to show that a certain language L is not regular is to suppose L regular and try to reach a contradiction.
Let's try to demonstrate that L = {0^n 1^n | n>=0} is not regular.
We start assuming to the contrary that L is regular.
You can think about this kind of demonstration as a game:
Challenger: He chooses the pumping length p. You cannot make any assumption about it.
You: Now it is your turn: choose a string in L of length at least p that exposes the irregularity of the language.
Let's say that the string is of the form 0^p 1^p.
A good tip in this step is to try to limit the adversary's next move.
Challenger: He presents to you a decomposition s = xyz of the string 0^p 1^p, with |y|>0 and |xy|<=p.
You: It's time to pump! If you chose the form of the string well in your previous move, you can draw some conclusions. In our case, for example, we know that the substring y consists only of 0s (at least one 0 because |y|>0), because |xy|<=p and the first p symbols are 0s.
Now we show that there exists i>=0 such that xy^iz is not in L. For example, for i=2 the string xyyz has more 0s than 1s and so is not a member of L. This is a contradiction. => L is not regular.
Never forget to demonstrate why the pumped string cannot be a member of L.
If you have any doubt, feel free to ask :)
Cheers.
To the above answer, "The pumping lemma says: If a language A is regular => there is a number p (pumping length) where, if s is any string in L such that |s| >= p, then s may be divided into three pieces s=xyz, satisfying the following condition:"
You mean "If a language L is regular"
Also, the three conditions
1. xy^iz is in L for each i>=0
2. |y|>=0
3. p>=|xy|
The second should be just |y| > 0 not >=
Say you choose the string:
a^2b^3
aabbb, which is in the language (2^2 = 4 > 3).
Now your opponent can choose XYZ.
Their options:
1.) X(empty)Y(some a's)
2.) X(some a's)Y(some a's and some b's)
3.) X(some a's)Y(some a's)
Based on their possible choices, you pump Y using Y^i, where i is a number of your choice.
Say they choose 1.)
X(-)Y(a)Z(abbb)
If you "pump" Y^i choosing i = 0, the new string becomes abbb, which is not in the language (1^2 = 1 is not greater than 3).
Repeat this for each possible choice of the opponent. If you can always pump Y in a way that produces a string that is not in the language L, then you've succeeded in proving that the language is not regular.
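In a full proof the chosen string has to be at least as long as the pumping length p. One possible choice (an assumption of this sketch, not taken from the answers above) is a^p b^(p^2-1). Every allowed Y then consists of a's only, and pumping down with i = 0 leaves too few a's. The helper names are invented, and only small p are checked.

```python
import re

def in_lang(s: str) -> bool:
    """Membership in { a^i b^j : i^2 > j }."""
    match = re.fullmatch(r"(a*)(b*)", s)
    return match is not None and len(match.group(1)) ** 2 > len(match.group(2))

def every_split_fails(p: int) -> bool:
    """True if pumping down (i = 0) leaves the language for every
    split of a^p b^(p^2-1) with |y| > 0 and |xy| <= p."""
    s = "a" * p + "b" * (p * p - 1)
    assert in_lang(s) and len(s) >= p
    for xy_len in range(1, p + 1):
        for x_len in range(xy_len):
            x, z = s[:x_len], s[xy_len:]
            if in_lang(x + z):   # x + z is the string with y removed
                return False
    return True

print(all(every_split_fails(p) for p in range(1, 10)))
```

For p = 2 the chosen string is aabbb, the same example as above.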