NFA to accept the following language - numbers

I need to build an NFA (or DFA) to recognize the following language:
L = {w | w mod 3 = 1}.
So the way I tried it was to make an NFA to recognize numbers divisible by 3 and then just add 1 to them, but this approach is a lot harder than it seems (if not impossible?).
I only managed to do an NFA to recognize numbers divisible by 3.

I will assume that w is to be interpreted as the decimal representation (without leading zeroes) of a nonnegative integer.
Given this, we can use Myhill-Nerode to iteratively determine the states we need:
The empty string can be followed by any string in L to get a string in L. We'll call the equivalence class for this [e]. Note that this equivalence class corresponds to the initial state of a minimal DFA for L (if one exists). Note also that the initial state is not accepting, since the empty string is not a valid decimal representation of a nonnegative integer.
The string 0 cannot be followed by anything to get a string in L (0 itself is not in L, and any continuation would have a leading zero); it leads to a dead state corresponding to equivalence class [0].
Strings 1, 4 and 7 are in L, so they must correspond to a new (accepting) state. We'll call the equivalence class for these [1].
Strings 2, 5 and 8 are not in L; moreover, unlike the empty string, they cannot be followed by every string in L to get a string in L (for example, 21 is not in L). These must correspond to a new equivalence class we'll call [2].
Strings 3, 6 and 9 are not in L, and, like the empty string, they can be followed by any string in L to get a string in L. They are nevertheless distinguishable from the empty string: 3 followed by 01 gives 301, which is in L (301 mod 3 = 1), whereas 01 alone has a leading zero and is not in L. So they need a new equivalence class and state, which we'll call [3].
It can be verified that every two-digit decimal string is indistinguishable from some one-digit string above (for example, 10 is in [1], 30 is in [3] and 05 is in [0]), so no new equivalence classes or states are needed.
To determine the transitions, simply append the transition symbol to the equivalence class's representative element and see what equivalence class the resulting string belongs to: that will be where the transition terminates. For instance, there is a transition from [e] to [0] on 0, from [e] to [1] on 1, etc.
Because 10 = 1 (mod 3), appending a new digit to the end of a decimal string makes the new value modulo 3 equal to the original number's value modulo 3 plus the new digit's value modulo 3 (reduced modulo 3 again):
x = a (mod 3)
y = b (mod 3)
x * 10 = x * 1 (mod 3) since 10 = 1 (mod 3)
x . y = x * 10 + y = x * 1 + y = x + y = a + b (mod 3), where x . y denotes x with the digit y appended
Filling in the transitions is left as an exercise.
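For readers who want to check their answer to the exercise, here is a minimal Python sketch of a DFA that fills in the transitions, assuming the decimal interpretation above (no leading zeros). It keeps the start state separate from the "value = 0 (mod 3)" state, because the suffix 01 distinguishes them (301 is in L, 01 is not). The state names are hypothetical, and the brute-force comparison against the definition of L is only a sanity check for short strings, not a proof.

```python
from itertools import product

# Hypothetical state names for a minimal DFA:
# START      - nothing read yet (class of the empty string)
# DEAD       - input began with 0, so no extension can be in L
# R0, R1, R2 - a numeral without leading zeros whose value is 0, 1 or 2 (mod 3)
START, DEAD, R0, R1, R2 = "start", "dead", "r0", "r1", "r2"
RESIDUE_TO_STATE = {0: R0, 1: R1, 2: R2}
STATE_TO_RESIDUE = {R0: 0, R1: 1, R2: 2}
DIGITS = "0123456789"

def step(state: str, ch: str) -> str:
    """Transition function: consume one character."""
    if state == DEAD or ch not in DIGITS:
        return DEAD
    d = int(ch)
    if state == START:
        # A leading 0 can never be extended into a member of L ("0" itself has value 0).
        return DEAD if d == 0 else RESIDUE_TO_STATE[d % 3]
    # Since 10 = 1 (mod 3), appending digit d simply adds d to the residue.
    return RESIDUE_TO_STATE[(STATE_TO_RESIDUE[state] + d) % 3]

def accepts(w: str) -> bool:
    state = START
    for ch in w:
        state = step(state, ch)
    return state == R1

def in_L(w: str) -> bool:
    """Direct definition of L, used as a brute-force reference."""
    valid = w != "" and all(c in DIGITS for c in w) and (w == "0" or w[0] != "0")
    return valid and int(w) % 3 == 1

def agrees_up_to(max_len: int) -> bool:
    """Compare the DFA with the definition on every digit string of length <= max_len."""
    return all(accepts("".join(t)) == in_L("".join(t))
               for n in range(max_len + 1)
               for t in product(DIGITS, repeat=n))

# The full transition table: (state, digit) -> next state.
TRANSITIONS = {(s, d): step(s, d) for s in (START, DEAD, R0, R1, R2) for d in DIGITS}
```

The table has 5 x 10 entries; only R1 is accepting.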

Related

good hash function for a string of numbers and letters of the form "9X9XX99X9XX999999"

What would be a good hash code for a vehicle identification number, that is, a string
of numbers and letters of the form "9X9XX99X9XX999999", where a "9" represents
a digit and an "X" represents a letter?
One reasonable approach is to hash the entire string with a general-purpose string hash function; for example, GCC's C++ Standard Library uses a MurmurHash variant.
If you want to be more hands-on, you could group all the digits to form one 11-digit number. Since each of the 6 letters can take 26 different values, which is less than 2^5 = 32, you could cheaply create a number from those letters (let's call them ABCDEF, each mapped to 0-25) by evaluating: A + B * 2^5 + C * 2^10 + D * 2^15 + E * 2^20 + F * 2^25
Then hash the 11-digit number and the number created from the letters separately with a decent hash function, and XOR or add the results; that should give you quite a good hash value for your VIN. I haven't personally evaluated it, but Thomas Mueller recommends and explains something ostensibly suitable here:
#include <stdint.h>
/* 64-bit mixing function (the splitmix64 finalizer) */
uint64_t hash(uint64_t x) {
    x = (x ^ (x >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)) * UINT64_C(0x94d049bb133111eb);
    x = x ^ (x >> 31);
    return x;
}
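A minimal Python sketch of the hands-on approach, assuming letters are uppercase A-Z mapped to 0-25 (the question does not say how letters become numbers). It ports the C mixing function above to Python by masking to 64 bits and combines the two halves with XOR; the function names are hypothetical.

```python
import re

MASK64 = (1 << 64) - 1
# The question's pattern 9X9XX99X9XX999999, assuming letters are uppercase A-Z.
VIN_PATTERN = re.compile(r"[0-9][A-Z][0-9][A-Z]{2}[0-9]{2}[A-Z][0-9][A-Z]{2}[0-9]{6}")

def mix64(x: int) -> int:
    """Port of the C mixing function; 64-bit wraparound emulated by masking."""
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

def pack_letters(letters: str) -> int:
    """A + B*2^5 + C*2^10 + ..., with each letter mapped to 0..25."""
    return sum((ord(c) - ord("A")) << (5 * i) for i, c in enumerate(letters))

def vin_hash(vin: str) -> int:
    if not VIN_PATTERN.fullmatch(vin):
        raise ValueError(f"not of the form 9X9XX99X9XX999999: {vin!r}")
    digits = int("".join(c for c in vin if c.isdigit()))             # the 11-digit number
    letters = pack_letters("".join(c for c in vin if c.isalpha()))   # six 5-bit fields
    return mix64(digits) ^ mix64(letters)
```

Because xor-shift steps and multiplication by an odd constant are both invertible modulo 2^64, mix64 is a bijection, so two VINs that differ only in their digits always get different hashes.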

convert number string into float with specific precision (without getting rounding errors)

I have a vector of cells (say, of size 50x1, called tokens), each of which is a struct with fields x, f1, f2, which are strings representing numbers. For example, tokens{15} gives:
x: "-1.4343429"
f1: "15.7947111"
f2: "-5.8196158"
and I am trying to put those numbers into 3 vectors (each is also 50x1) whose type is float. So I create 3 vectors:
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
and that works fine (why wouldn't it?). But then when I try to populate those vectors: (L is a for loop index)
x(L)=tokens{L}.x;
.. also for the other 2
I get:
The following error occurred converting from string to single:
Conversion to single from string is not possible.
Which I can understand; implicit conversion doesn't work for single. It does work if x, f1 and f2 are of type 50x1 double.
The reason I am using floats is that the data come from a C program that writes some floats into a file to be read by MATLAB. If I try to convert the values into doubles in the C program, I get rounding errors...
So, (after what I hope is a good question,) how might I be able to get the numbers in those strings, at the right precision? (all the strings have the same number of decimal places: 7).
The MCVE:
filedata = fopen('fname1.txt','rt');
%fname1.txt is created by a C program. I am quite sure that the problem isn't there.
scanned = textscan(filedata,'%s','Delimiter','\n');
raw = scanned{1};
stringValues = strings(50,1);
for K=1:length(raw)
stringValues(K)=raw{K};
end
clear K %purely for convenience
regex = 'x=(?<x>[\-\.0-9]*),f1=(?<f1>[\-\.0-9]*),f2=(?<f2>[\-\.0-9]*)';
tokens = regexp(stringValues,regex,'names');
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
for L=1:length(tokens)
x(L)=tokens{L}.x;
f1(L)=tokens{L}.f1;
f2(L)=tokens{L}.f2;
end
Use the str2double function before assigning into your arrays (and then cast the result to single if you want), e.g. x(L) = single(str2double(tokens{L}.x)); Strings (char arrays) must be explicitly converted to numbers before they can be used as numbers.
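For comparison, a minimal Python analog of the same fix (not MATLAB): parse each field explicitly, then round to single precision. Here struct's "f" format rounds a double to the nearest IEEE 754 single, analogous to single(str2double(s)); the dict standing in for the struct is a hypothetical representation. Keep in mind that a single has a 24-bit significand (roughly 7 significant decimal digits), so a value like 15.7947111 cannot in general be stored exactly: between 8 and 16, adjacent singles are 2^-20 (about 9.5e-7) apart.

```python
import struct

def to_single(text: str) -> float:
    """Explicitly parse a decimal string, then round it to the nearest IEEE 754 single."""
    value = float(text)  # string -> double, the step str2double performs
    # Pack as a 4-byte float (rounds to nearest single), then unpack back to a Python float.
    return struct.unpack("<f", struct.pack("<f", value))[0]

# The struct from the question, modeled as a dict (hypothetical representation).
tokens = [{"x": "-1.4343429", "f1": "15.7947111", "f2": "-5.8196158"}]
x = [to_single(t["x"]) for t in tokens]
f1 = [to_single(t["f1"]) for t in tokens]
f2 = [to_single(t["f2"]) for t in tokens]
```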

Horner's method of hashing

I know how to compute the hash of a string by Horner's method, which takes three parameters, String str, int p (a prime) and int M, like this:
hashVal = sumOf(str(0) + str(1)*M + ... + str(n)*M^n) % p
The problem is how to recover the string str given only hashVal, p and M.
For example, if I give you hashVal = 7, p = 11 and M = 2, you have to give me a string such as "hello" (not the actual answer, just an illustration).
In other words, I don't know how to compute the inverse.
Thanks for your help.
You can't get a unique input from the hash you describe, since the modulo operation throws information away. If you knew the hashVal was 7, M = 2 and p = 11, as in your example, you wouldn't know whether the sumOf(...) was 7, or 18, or 29, and so on.
Even if you did know the sum, say 5, for a 2-character string you couldn't tell whether str(0) was 1 and str(1) was 2 (1 + 2*2 = 5) or str(0) was 5 and str(1) was 0 (5 + 0*2 = 5).
Hashes are usually hard or impossible to reverse, especially uniquely. The simplest way to find a matching input is brute force: hash candidate inputs and check their outputs. You'll end up with many inputs with the same hashVal (if p in your example were 3, there would be only 3 different hashes).
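A minimal Python sketch of the brute-force search, assuming characters are mapped to their code points with ord() and candidates are restricted to short lowercase strings (both are assumptions; the question doesn't say how characters become numbers). The hash is evaluated with Horner's rule from the last character, which gives the same value as the sum in the question.

```python
from itertools import product
from string import ascii_lowercase

def poly_hash(s: str, M: int, p: int) -> int:
    """(s[0] + s[1]*M + ... + s[n]*M^n) % p, with characters mapped by ord()."""
    h = 0
    for ch in reversed(s):  # Horner's rule, starting from the highest power of M
        h = (h * M + ord(ch)) % p
    return h

def preimages(target: int, M: int, p: int, alphabet: str = ascii_lowercase, max_len: int = 2):
    """Brute force: yield every string over `alphabet` of length 1..max_len hashing to `target`."""
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if poly_hash(s, M, p) == target:
                yield s

matches = list(preimages(7, M=2, p=11))
print(len(matches), matches[:5])
```

Even with this tiny search space there are many matches (for instance "j", "u" and "ab" all hash to 7), which is exactly why no unique inverse exists.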

Where in the sequence of a Probabilistic Suffix Tree does "e" occur?

In my data, missing values (*) occur only at the right end of the sequences: no sequence starts with *, and no sequence has any other state after *. Despite this, the PST (Probabilistic Suffix Tree) seems to predict a 90% chance of starting with *. Here's my code:
# Load libraries
library(RCurl)
library(TraMineR)
library(PST)
# Get data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/c2539d06771317c5f4c8d3a2052a73fc485a09c6/challenge_level.csv")
data <- read.csv(text = x)
# Load and transform data
data <- read.table("thread_level.csv", sep = ",", header = F, stringsAsFactors = F)
# Create sequence object
data.seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = NA, right= NA, nr = "*")
# Make a tree
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = TRUE)
# Look at first state
cmine(S1, pmin = 0, state = "N3", l = 1)
This generates:
[>] context: e
EX FA I1 I2 I3 N1 N2 N3 NR
S1 0.006821066 0.01107234 0.01218274 0.01208756 0.006821066 0.002569797 0.003299492 0.001554569 0.0161802
QU TR *
S1 0.01126269 0.006440355 0.9097081
How can the probability for * be 0.9097081 at the very beginning of the sequence, meaning after context e?
Does it mean that the context can appear anywhere inside a sequence, and that e denotes an arbitrary starting point somewhere inside a sequence?
A PST is a representation of a variable-length Markov chain (VLMC). Like a classical Markov model, a VLMC is assumed to be homogeneous (or stationary), meaning that the conditional probabilities of the outcome given the context are the same at each position in the sequence. In other words, the context can appear anywhere in the sequence. In fact, contexts are searched for by exploring the tree, which is assumed to apply at any position in the sequences.
In your example, with l=1 (l is 1 + the length of the context), you look only for 0-length contexts, i.e., the only possible context is the empty sequence e. Your condition pmin=0, state=N3 (have a probability greater than 0 for N3) is equivalent to no condition at all. So you get the overall probability of observing each state. Because your sequences (with the missing states) are all of the same length, you would get the same results using TraMineR with
seqmeant(data.seq, with.missing=TRUE)/max(seqlength(data.seq))
To get the distribution at the first position, you can use TraMineR and look at the first column of the table of cross-sectional distributions at the successive positions returned by
seqstatd(data.seq, with.missing=TRUE)
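To see why the two numbers differ, here is a minimal Python sketch on made-up toy sequences (hypothetical data, with * as right-padding for missing states). It contrasts the share of each state pooled over all positions, which is what the empty context e reflects here, with the cross-sectional distribution at the first position, analogous to the first column of seqstatd. It ignores the smoothing pstree applies (ymin), so it illustrates the idea rather than reproducing cmine's numbers.

```python
from collections import Counter
from fractions import Fraction

def overall_distribution(seqs):
    """Share of each state pooled over all positions of all sequences."""
    counts = Counter(state for seq in seqs for state in seq)
    total = sum(counts.values())
    return {state: Fraction(n, total) for state, n in counts.items()}

def position_distribution(seqs, pos=0):
    """Cross-sectional distribution of states at one position."""
    counts = Counter(seq[pos] for seq in seqs if len(seq) > pos)
    total = sum(counts.values())
    return {state: Fraction(n, total) for state, n in counts.items()}

# Toy sequences of equal length, right-padded with missing states (*).
seqs = ["AB***", "A****", "BA***"]
overall = overall_distribution(seqs)   # * dominates because of the padding
first = position_distribution(seqs)    # but * never occurs at the first position
```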
Hope this helps.

Is this the simplified version of this boolean expression? Or is this reviewer wrong?

I tried building the truth tables, but unfortunately one expression has 3 literals and the other has 4, so I got confused.
F = (A+B+C)(A+B+D')+B'C;
and this is the simplified version
F = A + B + C
http://www.belley.org/etc141/Boolean%20Sinplification%20Exercises/Boolean%20Simplification%20Exercise%20Questions.pdf
I think there's something wrong with this reviewer... or is it accurate?
By the way, is simplification different from minimizing a sum of minterms to a sum of products?
Yes, the two expressions are the same function.
Draw the truth table for both expressions, assuming that there are four input variables in both. The value of D does not affect the second truth table: values in cells with D=1 match values in cells with D=0. In other words, you can think of the second expression as
F = A + B + C + (0)(D)
You will see that both tables match: the (A+B+C)(A+B+D') subexpression has zeros at ABCD = {0000, 0001, 0011}, while (A+B+C) has zeros only at {0000, 0001}. Adding B'C patches the zero at 0011 (where B = 0 and C = 1) in the first subexpression and is 0 at 0000 and 0001 (where C = 0), so the results are equivalent.
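The truth-table comparison can also be done mechanically by enumerating all 16 ABCD rows. A minimal Python sketch (the function names are hypothetical):

```python
from itertools import product

def original(a, b, c, d):
    # F = (A+B+C)(A+B+D') + B'C
    return ((a or b or c) and (a or b or not d)) or (not b and c)

def product_part(a, b, c, d):
    # The (A+B+C)(A+B+D') subexpression on its own
    return (a or b or c) and (a or b or not d)

def simplified(a, b, c, d):
    # F = A + B + C; D is ignored, i.e. A + B + C + (0)(D)
    return a or b or c

def truth_table(f):
    # Rows in ABCD order 0000, 0001, ..., 1111 (D is the least significant bit)
    return [bool(f(*row)) for row in product((False, True), repeat=4)]

def zero_rows(f):
    # The ABCD rows where f is 0, written as 4-bit strings
    return [format(i, "04b") for i, v in enumerate(truth_table(f)) if not v]
```

Comparing truth_table(original) with truth_table(simplified) confirms the equivalence, and zero_rows reproduces the zero sets listed above.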