How to convert this non-deterministic FSA to a deterministic one? - discrete-mathematics

wondering how to convert this non-deterministic FSA
to a deterministic one, specifically the tables.
I have got this so far but am stuck:
How do i assign BHMQ? For state one, won't there be four digits? (C, I, N, R)?

The concept you are looking for is the so-called power set construction. From the initial automaton, the state set is converted by taking its power set, which is the set of all of its subsets, to be the the state set of the resulting automaton; the final states are the ones which contain at least one of the initial final states.
The resulting automaton is equivalent to the initial one in the sense that the accepted languages are equal. However, the number of states may grow exponentially.

Related

Matlab random number rng: choosing a seed

I would like to know more precisely what happends when you choose a custom seed in Matlab, e.g.:
rng(101)
From my (limited, nut nevertheless existing) understanding of how pseudo-random number generators work, one can see the seed conceptually as choosing a position in a "very long list of pseudo-random numbers".
Question: lets say, (in my Matlab script), I choose rng(100) for my first computation (a sequence of instructions) and then rng(1e6) for my second. Please, note that each time I do some computations it involves generating up to about 300k random numbers (each time).
-> Does that imply that I make sure there is no overlap between the sequence in the "list" starting at 100 and ending around 300k and the one starting at 1e6 and ending at 1'300'000 ? (the idead of "no overlap" comes from the fact since the rng(100) and rng(1e6) are separated by much more than 300k)
i.e. that these are 2 "independent" sequences, (as far as I remember this 'long list' would be generated by a special PRNG algorithm, most likely involing modular arithmetic..?)
No that is not the case. The mapping between the seed and the "position" in our list of generated numbers is not linear, you could actually interpret it as a hash/one way function. It could actually happen that we get the same sequence of numbers shifted by one position (but it is very unlikely).
By default, MATLAB uses the Mersenne Twister (source).
Not quite. The seed you give to rng is the initiation point for the Mersenne Twister algorithm (by default) that is used to generate the pseudorandom numbers. If you choose two different seeds (no matter their relative non-negative integer values, except for maybe a special case or two), you will have effectively independent pseudorandom number streams.
For "99%" of people, the major uses of seeding the rng are using the 'shuffle' argument (to use a non-default seed based on the time to help ensure independence of numbers generated across multiple sessions), or to give it one particular seed (to be able to reproduce the same pseudorandom stream at a later date). If you try to finesse the seeds further without being extremely careful, you are more likely to cause issues than do anything helpful.
RandStream can be used to break off separate streams of pseudorandom numbers if that really matters for your application (it likely doesn't).

Finding Conditional Moments in a Markov Process

This question combines math and programming. I will first describe the general problem and then give an example that is (hopefully) simpler to understand.
General Question: Consider a Markov-chain process of N-states with transition matrix Π. Each state is associated with a value x_n (n in {1,…,n}). Our goal is to find the unconditional average of the first two moments (mean and var) along T-period paths conditional on (i) the path starts in a subset of states, N_0, (ii) it ends in a subset of states, N_T, and (iii) it is not going through a subset of states, N_not, in any of the periods between 1 to T-1. By saying we are interested in the unconditional average of these two moments, I basically mean what would be the average of these two moments in the stationary distribution. To be more concrete, let me illustrate the goal of the exercise in a simple case.
Simple Example: Consider a 3-state Markov-chain process with transition matrix Π, and let the three state be denoted by A, B, and C. Each of these states are associated with some value (x_A, x_B, and x_C), respectively. We are interested in what happens along paths that satisfy the following condition. The path starts at point A, after 3 periods are in either points B or C, and between periods 1 to 3 never go again through point A. Denote this condition by (#). So, for example, a path which we are interested in would be {A,B,B,C} with the associated values {x_A, x_B, x_B, x_C}. We are interested in the average and standard deviation along such paths. In particular, we would like to find the unconditional average of these first two moments in paths that satisfy (#).
Let me now propose a solution based on simulating the process. Since both T and N are quite large, this solution is too slow for my purpose.
Simulation Solution: Starting from some initial point simulate the process for a very long time period, and drop the first τ periods. Extract all paths along the simulation that satisfy condition (#) and compute the mean and std along each of these paths. Finally, simply take the average across these paths.
I’m hoping there is a better and more efficient way to achieve the goal. Since I want the solution to be accurate and the size of T and N the simulation takes a long time.
I would love to hear your thoughts and if you know of efficient methods to achieve this goal. Please let me know if something is not clear and I'll try to clarify it.
Thank you!!!
I think I know how to do this if N_0 consists of one state, let's call that state A.
The long run probability of being in A is pi(A) and can be obtained by solving pi = pi*P, with P the transition matrix.
The other thing you need to calculate is the probability of those transient paths. You probably need to introduce a modified P, where all states i in the set N_not are absorbing (i.e. P[i,i]=1 and P[i,j]=0 for j is not i). Then starting from a vector p(0) which has a 1 in the element corresponding to state A and 0 otherwise, you can keep calculating p(n) = p(n-1)*P to get the probabilities of your transient paths.
Multiply the result of that by pi(A) to get the unconditional probability.
You can probably do something like this as well when N_0 is a set, but I don't know how you should select p(0) in that case.

finding good hash function for languages accepted by finite state automata

I'm working on project in Java (but I think it doesn't depend on the language) where I'm generating small (4 states max) nondeterministic finite state automata on binary alphabet and I have to check fast the generated automaton for equivalence with the previous ones. Therefore, I have to use some good hash function, to avoid compairing with too many automatas.
My first thought was doing a DFS on the transitions and finding all the accepted words until length max. 5 and then I map the set of accepted words to a 64-bit long (the amount of binary words of length max. 5). But it seems to produce too many collisions on NFAs with 4 states. Increasing the length results in making the computing of the hash code too slow for practical use.
Another approach was having a set of words and testing which of them the automaton accepts but finding the right ones, I think, isn't that trivial.
Do you have any idea how to improve the hash function to avoid too many collisions without a significant loss of speed?
Thanks in advance
I was thinking further (thanks #justhalf and #templatetypedef) and I have an idea - an injective function of any NFA (or more precisely, language accepted by it) to integers - Let's have an NFA A. Let's construct minimal DFA A_min accepting the same language with complete delta-function. As a consequence of Myhill-Nerode theorem, this automaton should be unambiguous except isomorphism. Do a BFS from the initial state giving priority to the edges(transitions) based on some fixed order of characters in the alphabet (for example first 0, second 1). And renumber the states based on the order of visiting. Now we have a canonical minimal DFA and we can map the incidence matrix of states to an integer and append enumeration of final states (or better make a tuple, to avoid collision). This integer could be then used for deciding equivalence of two NFAs. Do you think, it is ok or have any other idea?

Efficient Function to Map (or Hash) Integers and Integer Ranges into Index

We are looking for the computationally simplest function that will enable an indexed look-up of a function to be determined by a high frequency input stream of widely distributed integers and ranges of integers.
It is OK if the hash/map function selection itself varies based on the specific integer and range requirements, and the performance associated with the part of the code that selects this algorithm is not critical. The number of integers/ranges of interest in most cases will be small (zero to a few thousand). The performance critical portion is in processing the incoming stream and selecting the appropriate function.
As a simple example, please consider the following pseudo-code:
switch (highFrequencyIntegerStream)
case(2) : func1();
case(3) : func2();
case(8) : func3();
case(33-122) : func4();
...
case(10,000) : func40();
In a typical example, there would be only a few thousand of the "cases" shown above, which could include a full range of 32-bit integer values and ranges. (In the pseudo code above 33-122 represents all integers from 33 to 122.) There will be a large number of objects containing these "switch statements."
(Note that the actual implementation will not include switch statements. It will instead be a jump table (which is an array of function pointers) or maybe a combination of the Command and Observer patterns, etc. The implementation details are tangential to the request, but provided to help with visualization.)
Many of the objects will contain "switch statements" with only a few entries. The values of interest are subject to real time change, but performance associated with managing these changes is not critical. Hash/map algorithms can be re-generated slowly with each update based on the specific integers and ranges of interest (for a given object at a given time).
We have searched around the internet, looking at Bloom filters, various hash functions listed on Wikipedia's "hash function" page and elsewhere, quite a few Stack Overflow questions, abstract algebra (mostly Galois theory which is attractive for its computationally simple operands), various ciphers, etc., but have not found a solution that appears to be targeted to this problem. (We could not even find a hash or map function that considered these types of ranges as inputs, much less a highly efficient one. Perhaps we are not looking in the right places or using the correct vernacular.)
The current plan is to create a custom algorithm that preprocesses the list of interesting integers and ranges (for a given object at a given time) looking for shifts and masks that can be applied to input stream to help delineate the ranges. Note that most of the incoming integers will be uninteresting, and it is of critical importance to make a very quick decision for as large a percentage of that portion of the stream as possible (which is why Bloom filters looked interesting at first (before we starting thinking that their implementation required more computational complexity than other solutions)).
Because the first decision is so important, we are also considering having multiple tables, the first of which would be inverse masks (masks to select uninteresting numbers) for the easy to find large ranges of data not included in a given "switch statement", to be followed by subsequent tables that would expand the smaller ranges. We are thinking this will, for most cases of input streams, yield something quite a bit faster than a binary search on the bounds of the ranges.
Note that the input stream can be considered to be randomly distributed.
There is a pretty extensive theory of minimal perfect hash functions that I think will meet your requirement. The idea of a minimal perfect hash is that a set of distinct inputs is mapped to a dense set of integers in 1-1 fashion. In your case a set of N 32-bit integers and ranges would each be mapped to a unique integer in a range of size a small multiple of N. Gnu has a perfect hash function generator called gperf that is meant for strings but might possibly work on your data. I'd definitely give it a try. Just add a length byte so that integers are 5 byte strings and ranges are 9 bytes. There are some formal references on the Wikipedia page. A literature search in ACM and IEEE literature will certainly turn up more.
I just ran across this library I had not seen before.
Addition
I see now that you are trying to map all integers in the ranges to the same function value. As I said in the comment, this is not very compatible with hashing because hash functions deliberately try to "erase" the magnitude information in a bit's position so that values with similar magnitude are unlikely to map to the same hash value.
Consequently, I think that you will not do better than an optimal binary search tree, or equivalently a code generator that produces an optimal "tree" of "if else" statements.
If we wanted to construct a function of the type you are asking for, we could try using real numbers where individual domain values map to consecutive integers in the co-domain and ranges map to unit intervals in the co-domain. So a simple floor operation will give you the jump table indices you're looking for.
In the example you provided you'd have the following mapping:
2 -> 0.0
3 -> 1.0
8 -> 2.0
33 -> 3.0
122 -> 3.99999
...
10000 -> 42.0 (for example)
The trick is to find a monotonically increasing polynomial that interpolates these points. This is certainly possible, but with thousands of points I'm certain you'ed end up with something much slower to evaluate than the optimal search would be.
Perhaps our thoughts on hashing integers can help a little bit. You will also find there a hashing library (hashlib.zip) based on Bob Jenkins' work which deals with integer numbers in a smart way.
I would propose to deal with larger ranges after the single cases have been rejected by the hashing mechanism.

Algorithm generation

I have a rather large(not too large but possibly 50+) set of conditions that must be placed on a set of data(or rather the data should be manipulated to fit the conditions).
For example, Suppose I have the a sequence of binary numbers of length n,
if n = 5 then a element in the data might be {0,1,1,0,0} or {0,0,0,1,1}, etc...
BUT there might be a set of conditions such as
x_3 + x_4 = 2
sum(x_even) <= 2
x_2*x_3 = x_4 mod 2
etc...
Because the conditions are quite complex in that they come from experiment(although they can be written down in logic form) and are hard to diagnose I would like instead to use a large sample set of valid data. i.e., Data I know satisfies the conditions and is a pretty large set. i.e., it is easier to collect the data then it is to deduce the conditions that the data must abide by.
Having said that, basically what I'm doing is very similar to neural networks. The difference is, I would like an actual algorithm, in some sense optimal, in some form of code that I can run instead of the network.
It might not be clear what I'm actually trying to do. What I have is a set of data in some raw format that is unique and unambiguous but not appropriate for my needs(in a sense the amount of data is too large).
I need to map the data into another set that actually is ambiguous to some degree but also has certain specific set of constraints that all the data follows(certain things just cannot happen while others are preferred).
The unique constraints and preferences are hard to figure out. That is, the mapping from the non-ambiguous set to the ambiguous set is hard to describe(which is why it is ambiguous). The goal, actually, is to have an unambiguous map by supplying the right constraints if at all possible.
So, on the vein of my initial example, I'm given(or supply) a set of elements and need some way to derive a list of constraints similar to what I've listed.
In a sense, I simply have a set of valid data and train it very similar to neural networks.
Then, after this "Training" I'm given the mapping function I can then use on any element in my dataset and it will produce a new element satisfying the constraint's, or if it can't, will give as close as possible an unambiguous result.
The main difference between neural networks and what I'm trying to achieve is I'd like to be able to use have an algorithm to code to be used instead of having to run a neural network. The difference here is the algorithm would probably be a lot less complex, not need potential retraining, and a lot faster.
Here is a simple example.
Suppose my "training set" are the binary sequences and mappings
01000 => 10000
00001 => 00010
01010 => 10100
00111 => 01110
then from the "Magical Algorithm Finder"(tm) I would get a mapping out like
f(x) = x rol 1 (rol = rotate left)
or whatever way one would want to express it.
Then I could simply apply f(x) to any other element, such as x = 011100 and could apply f to generate a hopefully unambiguous output.
Of course there are many such functions that will work on this example but the goal is to supply enough of the dataset to narrow it down to hopefully a few functions that make the most sense(at the very least will always map the training set correctly).
In my specific case I could easily convert my problem into mapping the set of binary digits of length m to the set of base B digits of length n. The constraints prevents some numbers from having an inverse. e.g., the mapping is injective but not surjective.
My algorithm could be a simple collection if statements acting on the digits if need be.
I think what you are looking for here is an application of Learning Classifier Systems, LCS -wiki. There are actually quite a few LCS open-source applications available, but you may need to experiment with the parameters in order to get a good result.
LCS/XCS/ZCS have the features that you are looking for including individual rules that could be heavily optimized, pressure to reduce the rule-set, and of course a human-readable/understandable set of rules. (Unlike a neural-net)