Levenshtein Distance Formula in CoffeeScript? - coffeescript

I am trying to create or find a CoffeeScript implementation of the Levenshtein Distance formula, aka Edit Distance. Here is what I have so far; any help at all would be much appreciated.
levenshtein = (s1,s2) ->
  n = s1.length
  m = s2.length
  if n < m
    return levenshtein(s2, s1)
  if not s1
    return s2.length
  previous_row = [s2.length + 1]
  for c1, i in s1
    current_row = [i + 1]
    for c2, j in s2
      insertions = previous_row[j + 1] + 1
      deletions = current_row[j] + 1
      substitutions = previous_row[j] # is this unnecessary? -> (c1 != c2)
      current_row.push(Math.min(insertions,deletions,substitutions))
    previous_row = current_row
  return previous_row[previous_row.length-1]
#End Levenshtein Function
Btw: I know this code is wrong on many levels; I am happy to receive any and all constructive criticism. I'm just looking to improve and figure out this formula!
CodeEdit1: Patched up the errors Trevor pointed out; the current code above includes those changes.
Update: The question I am asking is - how do we do Levenshtein in CoffeeScript?
Here are the steps of the Levenshtein Distance Algorithm, to help you see what I am trying to accomplish.
Steps
1. Set n to be the length of s.
   Set m to be the length of t.
   If n = 0, return m and exit.
   If m = 0, return n and exit.
   Construct a matrix containing 0..m rows and 0..n columns.
2. Initialize the first row to 0..n.
   Initialize the first column to 0..m.
3. Examine each character of s (i from 1 to n).
4. Examine each character of t (j from 1 to m).
5. If s[i] equals t[j], the cost is 0.
   If s[i] doesn't equal t[j], the cost is 1.
6. Set cell d[i,j] of the matrix equal to the minimum of:
   a. The cell immediately above plus 1: d[i-1,j] + 1.
   b. The cell immediately to the left plus 1: d[i,j-1] + 1.
   c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.
7. After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].
Source: http://www.merriampark.com/ld.htm
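As a rough illustration, here is a Python sketch of the seven steps above (Python rather than CoffeeScript so it runs as-is; the name `levenshtein` is illustrative). It assumes the matrix is indexed as d[i][j] with i running over s and j over t, so it has n + 1 rows and m + 1 columns; that matches the indexing in steps 3 to 7, though it is the transpose of the row/column wording in steps 1 and 2.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance between s and t using the full (n + 1) x (m + 1) matrix."""
    n, m = len(s), len(t)
    # Step 1: trivial cases
    if n == 0:
        return m
    if m == 0:
        return n
    # Step 2: column 0 holds 0..n and row 0 holds 0..m
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # Steps 3 and 4: visit every pair of characters (1-based i and j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Step 5: substitution cost
            cost = 0 if s[i - 1] == t[j - 1] else 1
            # Step 6: minimum of the three neighbouring cells
            d[i][j] = min(d[i - 1][j] + 1,         # a. cell above
                          d[i][j - 1] + 1,         # b. cell to the left
                          d[i - 1][j - 1] + cost)  # c. diagonal plus cost
    # Step 7: the distance is in the bottom-right cell
    return d[n][m]


print(levenshtein("kitten", "sitting"))  # 3
```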

This page (linked to from the resource you mentioned) offers a JavaScript implementation of the Levenshtein distance algorithm. Based on both that and the code you posted, here's my CoffeeScript version:
LD = (s, t) ->
  n = s.length
  m = t.length
  return m if n is 0
  return n if m is 0
  d = []
  d[i] = [] for i in [0..n]
  d[i][0] = i for i in [0..n]
  d[0][j] = j for j in [0..m]
  for c1, i in s
    for c2, j in t
      cost = if c1 is c2 then 0 else 1
      d[i+1][j+1] = Math.min d[i][j+1]+1, d[i+1][j]+1, d[i][j] + cost
  d[n][m]
It seems to hold up to light testing, but let me know if there are any problems.
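For comparison, here is a hedged Python sketch of the two-row variant the question was aiming for, which keeps only the previous and current rows of the matrix. Two changes relative to the question's code seem to be needed: the first row has to be 0..len(s2) rather than the single value len(s2) + 1, and the substitution cost is not optional, since the diagonal term must add 1 whenever the characters differ. In CoffeeScript those lines would read roughly `previous_row = [0..s2.length]` and `substitutions = previous_row[j] + (if c1 is c2 then 0 else 1)`. The function name is illustrative.

```python
def levenshtein_two_rows(s1: str, s2: str) -> int:
    """Edit distance keeping only two rows of the matrix at a time."""
    if len(s1) < len(s2):
        # Make s2 the shorter string so the rows stay short
        return levenshtein_two_rows(s2, s1)
    # Row 0: turning "" into s2[:j] takes j insertions
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        # Column 0: turning s1[:i + 1] into "" takes i + 1 deletions
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            # The substitution cost is required: 0 if the characters match, else 1
            substitutions = previous_row[j] + (0 if c1 == c2 else 1)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]
```

Memory use drops from O(len(s1) * len(s2)) to O(len(s2)), at the cost of no longer having the full matrix available for recovering the actual edit sequence.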

Related

Explanation for a function within xcorr in MATLAB

Looking at the xcorr function, most of it is pretty straightforward, except for one function within it called "findTransformLength".
function m = findTransformLength(m)
m = 2*m;
while true
    r = m;
    for p = [2 3 5 7]
        while (r > 1) && (mod(r, p) == 0)
            r = r / p;
        end
    end
    if r == 1
        break;
    end
    m = m + 1;
end
With no comments, I fail to understand what this function is meant to achieve and what the significance of p = [2 3 5 7] is. Why those numbers specifically? Why not take a fixed FFT size instead? Is there a disadvantage (does it cause errors) to taking a fixed FFT size?
This function returns the smallest integer greater than or equal to 2*m that can be written in the form $2^a \cdot 3^b \cdot 5^c \cdot 7^d$, where a, b, c and d are nonnegative integers.
Either m is already of this form, in which case the loop
for p = [2 3 5 7]
    while (r > 1) && (mod(r, p) == 0)
        r = r / p;
    end
end
will decrease r down to 1 and the break will be reached;
or m has at least one other prime factor, so r will not reach 1, and you go back to the loop with m+1, and so on, until you reach a number of the right form.
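A direct Python transcription of the loop may make its behavior easier to check (the name `find_transform_length` is just a Python-style rename of the MATLAB helper). It returns the smallest integer at or above 2*m whose only prime factors are 2, 3, 5 and 7.

```python
def find_transform_length(m: int) -> int:
    """Smallest integer >= 2*m whose only prime factors are 2, 3, 5 and 7."""
    m = 2 * m
    while True:
        r = m
        # Divide out every factor of 2, 3, 5 and 7
        for p in (2, 3, 5, 7):
            while r > 1 and r % p == 0:
                r //= p
        if r == 1:
            # Nothing else was left, so m = 2^a * 3^b * 5^c * 7^d
            return m
        # Some larger prime factor remains; try the next integer
        m += 1


print(find_transform_length(11))  # 24
```

For example, with m = 11 the search starts at 22 = 2*11, rejects 22 and 23 (both have a prime factor above 7), and stops at 24 = 2^3*3.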
As for why they do this, see the fft documentation, in the Input Arguments section:
n — Transform length [] (default) | nonnegative integer scalar
Transform length, specified as [] or a nonnegative integer scalar.
Specifying a positive integer scalar for the transform length can
increase the performance of fft. The length is typically specified as
a power of 2 or a value that can be factored into a product of small
prime numbers. If n is less than the length of the signal, then fft
ignores the remaining signal values past the nth entry and returns the
truncated result. If n is 0, then fft returns an empty matrix.
Example: n = 2^nextpow2(size(X,1))

Perceptron - MatLab Serious Confusion

This is my first stab at machine learning, and I can implement the code any way I want. I have MATLAB access, which I think will be simpler than Python, and I have pseudocode for implementing a PLA. The last part of the code, however, absolutely baffles me, even though it is simpler than the code I have seen on here so far. It seems to use variables that are never declared. Here's what I have; I'll point out the line number at which I get stuck.
1) w <- (n + 1) X m (matrix of small random nums)
2) I <- I augmented with col. of 1s
3) for 1 = 1 to 1000
4) delta_W = (N + 1) X m (matrix of zeros) // weight changes
5) for each pattern 1 <= j <= p
6) Oj = (Ij * w) > 0 // j's are subscript/vector matrix product w/ threshold
7) Dj = Tj - Oj // diff. between target and actual
8) w = w + Ij(transpose)*Dj // the learning rule
Lines 1 through 4 are coded.
My questions are about line 5: What does "for each pattern" mean (i.e., how does one say it in code)? Also, which j are they interested in? I have a j in the observation matrix and a j in the target matrix. Also, where did "p" come from (I have i's, j's, m's and n's, but no p's)? Any thoughts would be appreciated.
"For each pattern" refers to the inputs: run that loop once per input pattern, where Ij is the jth input to the perceptron (one row of the input matrix) and Tj is its target. p is simply the number of patterns, so 1 <= j <= p covers every input.
To write this in MATLAB, it really depends on how your data is oriented. I would store your inputs as an m-by-n matrix, where m is the number of inputs and n is the size of each input.
Say our inputs look like :
input = [1 5 -1;
2 3 2;
4 5 6;
... ]
First 'augment' this with a column of ones for the bias input:
[r c] = size(input);
input = [input ones(r,1)];
Then, your for loop will simply be:
for inputNumber = 1:r
    pattern = input(inputNumber,:);
and you can continue from there.
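To show how the whole pseudocode fits together, here is a minimal pure-Python sketch (not MATLAB, and no NumPy). The names `train_perceptron` and `predict`, the -0.05 to 0.05 range for the initial weights, and the AND/OR data in the usage example are illustrative assumptions. Patterns are stored one per row, as suggested above, so "for each pattern 1 <= j <= p" is simply a loop over the p rows. Because the weights are updated after every pattern (line 8), the delta_W accumulator from line 4 is not used in this sketch.

```python
import random


def train_perceptron(inputs, targets, epochs=1000, seed=0):
    """Perceptron learning following the question's pseudocode (lines 1-8).

    inputs:  p patterns, each a list of n numbers (one pattern per row)
    targets: p target rows, each a list of m values in {0, 1}
    Returns the (n + 1) x m weight matrix as a list of lists.
    """
    rng = random.Random(seed)
    n, m = len(inputs[0]), len(targets[0])
    # Line 1: (n + 1) x m matrix of small random numbers
    w = [[rng.uniform(-0.05, 0.05) for _ in range(m)] for _ in range(n + 1)]
    # Line 2: augment each pattern with a bias input of 1
    patterns = [list(x) + [1.0] for x in inputs]
    for _ in range(epochs):                        # line 3
        for i_j, t_j in zip(patterns, targets):    # line 5: j = 1..p
            # Line 6: O_j = (I_j * w) > 0, one thresholded output per column of w
            o_j = [1 if sum(i_j[k] * w[k][c] for k in range(n + 1)) > 0 else 0
                   for c in range(m)]
            # Line 7: D_j = T_j - O_j
            d_j = [t - o for t, o in zip(t_j, o_j)]
            # Line 8: w = w + I_j' * D_j (outer product of input and error)
            for k in range(n + 1):
                for c in range(m):
                    w[k][c] += i_j[k] * d_j[c]
    return w


def predict(w, x):
    """Thresholded outputs for one un-augmented input pattern x."""
    i = list(x) + [1.0]
    return [1 if sum(i[k] * w[k][c] for k in range(len(i))) > 0 else 0
            for c in range(len(w[0]))]
```

For instance, train_perceptron([[0, 0], [0, 1], [1, 0], [1, 1]], [[0], [0], [0], [1]]) learns a logical AND; the perceptron rule is only guaranteed to converge when the data is linearly separable, as it is here.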

Shuffle a vector of repeated numbers so the numbers do not repeat in MATLAB

Okay, so I have a script that produces a shuffled vector of repeated integers from a certain interval, but now there's a particular case where I need to make sure that, once it is shuffled, no number appears twice in a row. For example, I produced a vector of 1-5 repeated 36 times, shuffled. How do I ensure that there are no consecutive repeated numbers after shuffling? And to make things even more complex, I need to produce two such vectors that never have the same value at the same index. For example, let's say 1:5 was repeated twice for these vectors; then this would be what I'm looking for:
v1 v2
4 2
2 4
3 2
5 3
4 5
1 4
5 1
1 5
3 1
2 3
I made that example by taking one vector and shifting it by one position to create another vector that satisfies the requirements, but in my situation that won't actually work, because I can't have the two vectors be systematically dependent like that.
So I tried a recursive technique that makes the script start over if the vectors do not make the cut, and, as expected, that did not go over so well: I hit the maximum recursion limit, and I've realized this is clearly not the way to go. Is there some other alternative?
EDIT:
So I found a way to satisfy some of the conditions I needed above in the following code:
a = nchoosek(1:5,2);
b = horzcat(a(:,2),a(:,1));
c = vertcat(a,b);
cols = repmat(c,9,1);
cols = cols(randperm(180),:);
I just need to find a way to shuffle cols that will also enforce no repeating numbers in columns, such that cols(i,1) ~= cols(i+1,1) and cols(i,2) ~= cols(i+1,2)
This works, but it probably is not very efficient for a large array:
a = nchoosek(1:5, 2);
while (any(a(1: end - 1, 1) == a(2: end, 1)) ...
        || any(a(1: end - 1, 2) == a(2: end, 2)))
    random_indices = randperm(size(a, 1));
    a = a(random_indices, :);
end
a
If you want something faster, the trick is to logically insert each row in a place where your conditions are satisfied, rather than randomly re-shuffling. For example:
n1 = 5;
n2 = 9;
a = nchoosek(1:n1, 2);
b = horzcat(a(:,2), a(:,1));
c = vertcat(a, b);
d = repmat(c, n2, 1);
N = size(d, 1);   % total number of rows (180 here, not n1 * n2)
d = d(randperm(N), :);
% Perform an "insertion shuffle"
for k = 2: N
    % Grab row k from array d. Walk down the rows until a position is
    % found where row k does not repeat with its upstairs or downstairs
    % neighbors (stopping before m + 1 runs past the last row).
    m = 1;
    while (m < N - 1) && (any(d(k,:) == d(m,:)) || any(d(k,:) == d(m+1,:)))
        m = m + 1;
    end
    % Insert row k in the proper position.
    if (m < k)
        ind = [ 1: m k m+1: k-1 k+1: N ];
    else
        ind = [ 1: k-1 k+1: m k m+1: N ];
    end
    d = d(ind,:);
end
d
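Another option, not taken from either answer, is to build the ordering one row at a time: pick a random remaining pair that differs from the previous row in both columns, and start over from scratch if no such pair is left. Below is a Python sketch of that randomized greedy construction with restarts; the function name, the restart limit and the seed are illustrative assumptions. Unlike a single insertion pass, it checks the full adjacency condition every time a row is placed, so any ordering it returns is valid.

```python
import random


def shuffle_no_adjacent(pairs, max_restarts=1000, seed=None):
    """Order `pairs` so consecutive pairs differ in both positions.

    Randomized greedy construction: repeatedly pick a random remaining pair
    that is compatible with the previous one; restart if none is left.
    """
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(pairs)
        out = []
        while remaining:
            prev = out[-1] if out else None
            # Indices of pairs that differ from prev in column 1 and column 2
            candidates = [i for i, (a, b) in enumerate(remaining)
                          if prev is None or (a != prev[0] and b != prev[1])]
            if not candidates:
                break  # dead end: start a fresh attempt
            out.append(remaining.pop(rng.choice(candidates)))
        if not remaining:
            return out
    raise ValueError("no valid ordering found within max_restarts attempts")


# The question's data: every ordered pair (a, b) with a != b from 1..5, 9 times
pairs = [(a, b) for a in range(1, 6) for b in range(1, 6) if a != b] * 9
ordering = shuffle_no_adjacent(pairs, seed=1)
v1 = [a for a, _ in ordering]
v2 = [b for _, b in ordering]
```

Because every pair has a != b, the requirement v1(i) ~= v2(i) holds automatically; each attempt costs O(N^2) comparisons for N pairs.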
One way to solve this problem is to think of both vectors as being created as follows:
For every row of arrays v1 and v2:
  Shuffle the array [1 2 3 4 5].
  Set the values of v1 and v2 at the current row to the first and second values of the shuffle. Both values will always be different.
Code:
s = [1 2 3 4 5];
Nrows = 36;
solution = zeros(Nrows,2);
for k=1:Nrows
    % obtain indexes j for shuffling array s
    [x,j] = sort(rand(1,5));
    %row k takes the first two values of shuffled array s
    solution(k,1:2) = s(j(1:2));
end
v1 = solution(:,1);
v2 = solution(:,2);
Main edit: random => rand.
With this method, no time is wasted re-rolling repeated numbers, because the first and second values of a shuffled [1 2 3 4 5] are always different.
Should you need more than two arrays with different numbers, the changes are simple.
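The same per-row idea can be sketched in Python with `random.sample`, which draws two distinct values without replacement and so plays the role of taking the first two entries of a shuffled [1 2 3 4 5]. The function name and defaults are illustrative. Like the MATLAB version, it guarantees v1(k) ~= v2(k) but does not prevent the same value from appearing in consecutive rows of one column.

```python
import random


def distinct_pair_rows(n_rows=36, values=(1, 2, 3, 4, 5), seed=None):
    """Build v1, v2 so that v1[k] != v2[k] for every row k."""
    rng = random.Random(seed)
    v1, v2 = [], []
    for _ in range(n_rows):
        # Two draws without replacement, like the first two entries of a shuffle
        first, second = rng.sample(values, 2)
        v1.append(first)
        v2.append(second)
    return v1, v2
```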

How to get fibonacci sequence using recursion?

I am trying to generate the Fibonacci sequence using recursion, but my function keeps giving me an error.
function y = r_nFib(seq, n)
y = zeros(1,n);
for m = 1:n
    y(m) = r_nFib(m);
end
if seq == 0
    y = [0 y];
else
    y = [seq, seq, y];
    function y = r_nFib(seq, n)
        if n<3
            y(1:n) = 1;
        else
            y(n) = r_nFib(n-2) + r_nFib(n-1);
        end
        y = y(n);
    end
end
n is the length of the Fibonacci sequence and seq is the starting number. If seq is 0, this is how the sequence should start:
y = [0 1 1 2 3 5 8] % the first two numbers are 0 and 1
If seq is anything other than 0, for example
seq = 2;
then
y = [2 2 4 6 10] % the first two numbers are both seq
How do I correct my function to give me the right answer? I have never used recursion and I am new to it. Any help would be really appreciated.
y = r_nFib(4,10)
y = [4 4 8 12 20 32 52 84 136 220];
Thank you.
function y = r_nFib(seq, n)
if length(seq) == 1
    if seq == 0
        seq = [0, 1];
    else
        seq = [seq, seq];
    end
end
if length(seq) >= n
    y = seq(1:n);   % trim in case n < 2
else
    y = r_nFib([seq (seq(end - 1) + seq(end))], n);
end
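The same accumulator-style recursion, transcribed to Python as a sketch (the name `r_nfib` simply mirrors the MATLAB function): each call appends the sum of the last two elements and recurses until the list has n elements.

```python
def r_nfib(seq, n):
    """Generalized Fibonacci sequence of length n, built by recursion."""
    if isinstance(seq, int):
        # Starting number: 0 -> [0, 1]; anything else -> [seq, seq]
        seq = [0, 1] if seq == 0 else [seq, seq]
    if len(seq) >= n:
        return seq[:n]  # base case: long enough (also trims when n < 2)
    # General case: append the sum of the last two elements and recurse
    return r_nfib(seq + [seq[-2] + seq[-1]], n)


print(r_nfib(4, 10))  # [4, 4, 8, 12, 20, 32, 52, 84, 136, 220]
```

This matches the expected output given in the question for r_nFib(4,10).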
Here is a solution that I typed up for MATLAB, explaining recursion:
A recursive method works by breaking a larger problem into smaller problems each time the method is called. This allows you to break what would be a difficult problem, such as summing a series of integers, into a series of smaller problems.
Each recursive function has 2 parts:
1) The base case: The lowest value that we care about evaluating. Usually this goes to zero or one.
if (num == 1)
    out = 1;
end
2) The general case: The general case is what we keep calling until we reach the base case. We call the function again, but this time with 1 less than the current call started with. This allows us to work our way towards the base case.
out = num + factorial(num-1);
This statement means that we first call the function with 1 less than what this call started with: we started with three, the next call starts with two, and the call after that starts with one (which triggers our base case!).
Once our base case is reached, the calls "recurse out". This means they bounce backwards, each one returning into the function that called it and bringing along the results from the calls below it. It is at this point that our summation actually occurs.
Once the original function is reached, we have our final summation.
For example, let's say you want the summation of the first 3 integers.
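The first recursive call is passed the number 3. (Despite its name, the function below adds the integers from 1 to num rather than multiplying them, and a file named factorial.m could shadow MATLAB's built-in factorial function.)
function [out] = factorial(num)
%//Base case
if (num == 1)
    out = 1;
else
    %//General case
    out = num + factorial(num-1);
end
Without the else, the general case would also run after the base case, and the recursion would never stop.
Walking through the function calls:
factorial(3) //Initial function call
//Becomes..
3 + factorial(2)
//Becomes..
3 + (2 + factorial(1))
//The base case returns 1, so..
3 + 2 + 1 = 6
This gives us a result of 6.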
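For readers who want to run the walk-through, here is the same base case / general case structure in Python; the name `sum_to` is an illustrative choice that avoids implying a factorial, and the sketch assumes num >= 1.

```python
def sum_to(num: int) -> int:
    """Sum 1 + 2 + ... + num recursively (assumes num >= 1)."""
    if num == 1:
        # Base case: stop recursing
        return 1
    # General case: solve the smaller problem, then add num on the way back out
    return num + sum_to(num - 1)


print(sum_to(3))  # 6
```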
matlab - Clearer explanation of recursion

Ordering a list of lists subject to constraints

I have encountered a surprisingly challenging problem: arranging a matrix-like list of lists of values subject to the following constraints (or deciding that it is not possible):
Given a matrix of m randomly generated rows, each with up to n distinct values (no repeats within a row), arrange the matrix so that the following holds (if possible):
1) The matrix must be "lower triangular"; the rows must be ordered in ascending lengths so the only "gaps" are in the top right corner
2) If a value appears in more than one row it must be in the same column (i.e. rearranging the order of values in a row is allowed).
Expression of the problem/solution in a functional language (e.g. Scala) is desirable.
Example 1 - which has a solution
A B
C E D
C A B
becomes (as one solution)
A B
E D C
A B C
since A, B and C all appear in columns 1, 2 and 3, respectively.
Example 2 - which has no solution
A B C
A B D
B C D
has no solution, since the constraints would require the third row to have both C and D in the third column, which is not possible.
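To make the two constraints concrete, here is a small Python checker (the name `is_valid_arrangement` is hypothetical, not from the question) that tests whether an already-arranged list of lists satisfies them: row lengths must be non-decreasing, no row may repeat a value, and every value must sit in a single column across all rows.

```python
def is_valid_arrangement(rows):
    """Check an arranged list of lists against both constraints."""
    # Constraint 1 ("lower triangular"): row lengths never decrease
    if any(len(a) > len(b) for a, b in zip(rows, rows[1:])):
        return False
    column_of = {}  # value -> column where it was first seen
    for row in rows:
        if len(set(row)) != len(row):
            return False  # repeats within a row are not allowed
        for col, value in enumerate(row):
            # Constraint 2: a repeated value must stay in the same column
            if column_of.setdefault(value, col) != col:
                return False
    return True
```

It accepts the solution shown for Example 1 and rejects the unarranged input, where A sits in column 1 of the first row but column 2 of the third.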
I thought this was an interesting problem and have modeled a proof-of-concept version in MiniZinc (a very high-level Constraint Programming system), which seems to be correct. I'm not sure if it's of any use, and to be honest I'm not sure it is powerful enough for the very largest problem instances.
The first problem instance has, according to this model, 4 solutions:
B A _
E D C
B A C
----------
B A _
D E C
B A C
----------
A B _
E D C
A B C
----------
A B _
D E C
A B C
The second example is considered unsatisfiable (as it should be).
The complete model is here: http://www.hakank.org/minizinc/ordering_a_list_of_lists.mzn
The basic approach is to use matrices, where shorter rows are filled with a null value (here 0, zero). The problem instance is the matrix "matrix"; the resulting solution is in the matrix "x" (the decision variables, as integers which are then translated to strings in the output). Then there is a helper matrix, "perms", which is used to ensure that each row in "x" is a permutation of the corresponding row in "matrix"; this is done with the predicate "permutation3". There are some other helper arrays/sets which simplify the constraints.
The main MiniZinc model (sans output) is shown below.
Here are some comments/assumptions which might make the model useless:
- This is just a proof-of-concept model, since I thought it was an interesting problem.
- I assume that the rows in the matrix (the problem data) are already ordered by size (lower triangular). This should be easy to do as a preprocessing step where Constraint Programming is not needed.
- The shorter lists are filled with 0 (zero) so we can work with matrices.
- Since MiniZinc is a strongly typed language and doesn't support symbols, we just define the integers 1..5 to represent the letters A..E. Working with integers is also beneficial when using traditional Constraint Programming systems.
% The MiniZinc model (sans output)
include "globals.mzn";
int: rows = 3;
int: cols = 3;
int: A = 1;
int: B = 2;
int: C = 3;
int: D = 4;
int: E = 5;
int: max_int = E;
array[0..max_int] of string: str = array1d(0..max_int, ["_", "A","B","C","D","E"]);
% problem A (satisfiable)
array[1..rows, 1..cols] of int: matrix =
  array2d(1..rows, 1..cols,
  [
    A,B,0, % fill this shorter array with "0"
    E,D,C,
    A,B,C,
  ]);
% the valid values (we skip 0, zero)
set of int: values = {A,B,C,D,E};
% identify which rows a specific value is in.
% E.g. for problem A:
% value_rows: [{1, 3}, {1, 3}, 2..3, 2..2, 2..2]
array[1..max_int] of set of int: value_rows =
  [ {i | i in 1..rows, j in 1..cols where matrix[i,j] = v} | v in values];
% decision variables
% The resulting matrix
array[1..rows, 1..cols] of var 0..max_int: x;
% the permutations from matrix to x
array[1..rows, 1..cols] of var 0..max_int: perms;
%
% permutation3(a,p,b)
%
% get the permutation from a to b using the permutation p.
%
predicate permutation3(array[int] of var int: a,
                       array[int] of var int: p,
                       array[int] of var int: b) =
  forall(i in index_set(a)) (
    b[i] = a[p[i]]
  )
;
solve satisfy;
constraint
  forall(i in 1..rows) (
    % ensure unicity of the values in the rows in x and perms (except for 0)
    alldifferent_except_0([x[i,j] | j in 1..cols]) /\
    alldifferent_except_0([perms[i,j] | j in 1..cols]) /\
    permutation3([matrix[i,j] | j in 1..cols], [perms[i,j] | j in 1..cols], [x[i,j] | j in 1..cols])
  )
  /\ % zeros in x are where the zeros are in matrix
  forall(i in 1..rows, j in 1..cols) (
    if matrix[i,j] = 0 then
      x[i,j] = 0
    else
      true
    endif
  )
  /\ % ensure that same values are in the same column:
     % - for each of the values
     % - ensure that it is positioned in one column c
  forall(k in 1..max_int where k in values) (
    exists(j in 1..cols) (
      forall(i in value_rows[k]) (
        x[i,j] = k
      )
    )
  )
;
% the output
% ...
I needed a solution in a functional language (XQuery), so I first implemented this in Scala because of its expressiveness; the code is below. It uses a brute-force, depth-first search for solutions. I'm only interested in a single solution (if one exists), so the algorithm throws away the extra solutions.
def order[T](listOfLists: List[List[T]]): List[List[T]] = {
  def isConsistent(list: List[T], listOfLists: List[List[T]]) = {
    def isSafe(list1: List[T], list2: List[T]) =
      (for (i <- list1.indices; j <- list2.indices) yield
        if (list1(i) == list2(j)) i == j else true
      ).forall(_ == true)
    (for (row <- listOfLists) yield isSafe(list, row)).forall(_ == true)
  }
  def solve(fixed: List[List[T]], remaining: List[List[T]]): List[List[T]] =
    if (remaining.isEmpty)
      fixed // Solution found so return it
    else
      (for {
        permutation <- remaining.head.permutations.toList
        if isConsistent(permutation, fixed)
        ordered = solve(permutation :: fixed, remaining.tail)
        if !ordered.isEmpty
      } yield ordered) match {
        case solution1 :: otherSolutions => // There are one or more solutions so just return one
          solution1
        case Nil => // There are no solutions
          Nil
      }
  // Ensure each list has unique items (i.e. no dups within the list)
  require (listOfLists.forall(list => list == list.distinct))
  /*
   * The only optimisations applied to an otherwise full walk through all solutions are to sort the list of lists so that
   * the lists are increasing in length, and then to start the ordering with the first row fixed, i.e. there is one degree of freedom
   * in selecting the first row; by having the shortest row first and fixing it we both guarantee that we aren't disabling a solution from being
   * found (i.e. by violating the "lower triangular" requirement) and can also avoid searching through the permutations of the first row, since
   * these would just result in additional (essentially duplicate except for ordering differences) solutions.
   */
  //solve(Nil, listOfLists).reverse // This is the unoptimised version
  val sorted = listOfLists.sortWith((a, b) => a.length < b.length)
  solve(List(sorted.head), sorted.tail).reverse
}
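For comparison, here is a Python sketch of the same backtracking idea (the names are illustrative, and this is not the XQuery version, which is not shown). Instead of the pairwise isSafe check it keeps a dictionary from each value to its column, and it returns as soon as the first consistent arrangement is found rather than building the full list of solutions first. Like the Scala code, it sorts rows by length and fixes the order of the first (shortest) row; it returns None when no arrangement exists.

```python
from itertools import permutations


def order(list_of_lists):
    """Return one valid arrangement, or None if none exists."""
    if any(len(set(row)) != len(row) for row in list_of_lists):
        raise ValueError("each row must contain distinct values")
    rows = sorted(list_of_lists, key=len)  # stable: shortest rows first
    if not rows:
        return []

    def solve(fixed, column_of, remaining):
        if not remaining:
            return fixed  # every row placed consistently
        for perm in permutations(remaining[0]):
            # Consistent if every value already placed keeps its column
            if all(column_of.get(v, c) == c for c, v in enumerate(perm)):
                extended = {**column_of, **{v: c for c, v in enumerate(perm)}}
                result = solve(fixed + [list(perm)], extended, remaining[1:])
                if result is not None:
                    return result  # stop at the first solution
        return None  # no permutation of this row works: backtrack

    first = list(rows[0])  # fix the shortest row's order, as the Scala code does
    return solve([first], {v: c for c, v in enumerate(first)}, rows[1:])
```

On Example 1 it returns [['A', 'B'], ['E', 'D', 'C'], ['A', 'B', 'C']], the same solution given in the question, and on Example 2 it returns None. The worst case is still exponential, since each row may try all of its permutations.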