Advice for understanding Stan machine learning code - matlab

I need help understanding the following Stan code I found (see below). I am comfortable with MATLAB, but I am having trouble with this.
First, I don't understand what exactly vector[K] u[I] means. Is this a Kx1 vector, an Ix1 vector, or a KxI matrix? (They write something about a latent feature vector u_i (resp. v_j) of dimension K = 4, but I don't understand what that means.)
Then I don't quite understand what exactly normal(0, sigma_0) and cauchy(u[i]'*v[j], lambda) do. I mean, what is the output here? When I use the normpdf function in MATLAB, I need an additional input x at which to evaluate the density. So why is that missing here, and what does that mean for the output?
And lastly, K is only used in the parameters section and never again in any for-loop, so why is K even necessary?
Thanks in advance for any help; it is greatly appreciated!
data {
  int<lower=0> I;
  int<lower=0> J;
  int<lower=0> K;
  real ln_gamma[I,J];
  real<lower=0> sigma_0;
  real<lower=0> lambda;
}
parameters {
  vector[K] u[I];
  vector[K] v[J];
}
model {
  for (i in 1:I)
    u[i] ~ normal(0, sigma_0);
  for (j in 1:J)
    v[j] ~ normal(0, sigma_0);
  for (i in 1:I) {
    for (j in 1:J) {
      if (ln_gamma[i,j] != -99) {
        ln_gamma[i,j] ~ cauchy(u[i]' * v[j], lambda);
      }
    }
  }
}

Related

Has anyone used Dijkstra's algorithm in OPL?

I have a model for a mining problem. I am working on extending the model to use the shortest path inside a mine (open pit) for hauling ore and waste. For this, I was thinking of Dijkstra's algorithm, but I could not find any example of the use of Dijkstra's algorithm in OPL. Has anyone done it before, and can you share some ideas, please?
If you need to write Dijkstra's algorithm itself, then Daniel is right and you'd rather use the scripting part. But if you just need a shortest path within an existing OPL model, you could use the following shortest path example:
.mod
tuple edge
{
  key int o;
  key int d;
  int weight;
}
{edge} edges=...;
{int} nodes={i.o | i in edges} union {i.d | i in edges};
int st=1; // start
int en=8; // end
dvar int obj; // distance
dvar boolean x[edges]; // do we use that edge?
minimize obj;
subject to
{
  obj==sum(e in edges) x[e]*e.weight;
  forall(i in nodes)
    sum(e in edges:e.o==i) x[e]
    -sum(e in edges:e.d==i) x[e]
    ==
    ((i==st)?1:((i==en)?(-1):0));
}
{edge} shortestPath={e | e in edges : x[e]==1};
execute
{
  writeln(shortestPath);
}
.dat
edges=
{
<1,2,9>,
<1,3,9>,
<1,4,8>,
<1,10,18>,
<2,3,3>,
<2,6,6>,
<3,4,9>,
<3,5,2>,
<3,6,2>,
<4,5,8>,
<4,7,7>,
<4,9,9>,
<4,10,10>,
<5,6,2>,
<5,7,9>,
<6,7,9>,
<7,8,4>,
<7,9,5>,
<8,9,1>,
<8,10,4>,
<9,10,3>,
};
which gives
// solution (optimal) with objective 19
{<1 4 8> <4 7 7> <7 8 4>}
If you have a problem that can be solved using Dijkstra's algorithm, then it seems a bit of overkill to use OPL or CPLEX to solve it. You could code up the algorithm in any programming language and call it from there. I guess that is why you don't find any examples.
If you still want to implement it in OPL, then use a scripting (execute) or a main block. The scripting language available there is a superset of JavaScript, so you can implement Dijkstra's algorithm in JavaScript and put it there.
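For reference, here is what a minimal Dijkstra sketch could look like in plain Python (not OPL), using the same edge data as the .dat example above; the function name and structure are just for illustration:
import heapq

def dijkstra(edges, start, end):
    # adjacency list; edges are treated as directed (o -> d), as in the OPL model
    graph = {}
    for o, d, w in edges:
        graph.setdefault(o, []).append((d, w))
    heap = [(0, start, [start])]   # (distance so far, node, path taken)
    visited = set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == end:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (dist + w, nxt, path + [nxt]))
    return None, []

edges = [(1,2,9), (1,3,9), (1,4,8), (1,10,18), (2,3,3), (2,6,6), (3,4,9),
         (3,5,2), (3,6,2), (4,5,8), (4,7,7), (4,9,9), (4,10,10), (5,6,2),
         (5,7,9), (6,7,9), (7,8,4), (7,9,5), (8,9,1), (8,10,4), (9,10,3)]
print(dijkstra(edges, 1, 8))   # (19, [1, 4, 7, 8]) -- matches the OPL objective of 19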

How to overcome indefinite matrix error (NbClust)?

I'm getting the following error when calling NbClust():
Error in NbClust(data = ds[, sapply(ds, is.numeric)], diss = NULL, distance = "euclidean", : The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.
I've called ds <- ds[complete.cases(ds),] just before running NbClust, so there are no missing values.
Any idea what's behind this error?
Thanks
I had the same issue in my research.
So I mailed Nadia Ghazzali, the package maintainer, and got an answer.
I've attached my mail and her reply.
my e-mail:
Dear Nadia Ghazzali. Hello Nadia. I have some questions about the
NbClust function in the R library. I have tried googling but could not
find satisfying answers. First, I'm so grateful to you for making
this awesome R library. It is very helpful for my research. I tested the
NbClust function from the NbClust library with my own data, like below.
> clust <- NbClust(data, distance = "euclidean",
min.nc = 2, max.nc = 10, method = "kmeans", index = "all")
But soon an error occurred. Error: division by zero! Error in
Indices.WBT(x = jeu, cl = cl1, P = TT, s = ss, vv = vv) : object
'scott' not found. So I stepped through the NbClust function line by line and
found that some indices, like CCC, Scott, marriot, tracecovw,
tracew, friedman, and rubin, were not calculated because the object
vv = 0. I'm not very familiar with algebra, so I don't know the meaning
of an eigenvalue. But it seems to me that the object ss (which is the square
root of the eigenvalues) should not be 0 after the product is taken.
So, here are my questions.
I assume that my data is so sparse (a lot of zero values) that
sqrt(eigenValues) becomes too small, is that right? I'm sorry I
can't attach my data, but I can attach some of the eigenvalues and
their square roots.
> head(eigenValues)
[1] 0.039769880 0.017179826 0.007011972 0.005698736 0.005164871 0.004567238
> head(sqrt(eigenValues))
[1] 0.19942387 0.13107184 0.08373752 0.07548997 0.07186704 0.06758134
And if my assumption is right, what can I do about this problem? Is the only
way to drop those 7 indices?
Thank you for reading, and I'll be waiting for your reply. Best regards!
and her reply:
Dear Hansol,
Thank you for your interest. Yes, your understanding is good.
Unfortunately, the seven indices could not be applied.
Best regards,
Nadia Ghazzali
The cause of this error is data-related. If you look at the source code of this function,
NbClust <- function(data, diss="NULL", distance = "euclidean", min.nc=2, max.nc=15, method = "ward", index = "all", alphaBeale = 0.1)
{
  x<-0
  min_nc <- min.nc
  max_nc <- max.nc
  jeu1 <- as.matrix(data)
  numberObsBefore <- dim(jeu1)[1]
  jeu <- na.omit(jeu1) # returns the object with incomplete cases removed
  nn <- numberObsAfter <- dim(jeu)[1]
  pp <- dim(jeu)[2]
  TT <- t(jeu)%*%jeu
  sizeEigenTT <- length(eigen(TT)$value)
  eigenValues <- eigen(TT/(nn-1))$value
  for (i in 1:sizeEigenTT)
  {
    if (eigenValues[i] < 0) {
      print(paste("There are only", numberObsAfter, "nonmissing observations out of a possible", numberObsBefore, "observations."))
      stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.")
    }
  }
I think the root cause of this error is negative eigenvalues, which can seep in when the number of clusters is very high, i.e. when max.nc is high. So to solve the problem, you must look at your data: see if it has more columns than rows, remove missing values, and check for issues like collinearity and multicollinearity, variance, covariance, etc.
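If you want to reproduce the package's test on your own data before calling NbClust, here is a rough Python/numpy equivalent of that eigenvalue check (a sketch only, assuming your data is a purely numeric matrix with no NAs):
import numpy as np

def tss_is_indefinite(X):
    # Mimics the check in the NbClust source above: the eigenvalues of
    # t(X) %*% X / (n - 1) must all be non-negative.
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    TT = X.T @ X
    eigenvalues = np.linalg.eigvalsh(TT / (n - 1))   # TT is symmetric
    return np.any(eigenvalues < 0), eigenvalues

# A rank-deficient matrix (e.g. more columns than rows, or collinear
# columns) can produce tiny negative eigenvalues through floating-point
# roundoff, which is what triggers the "TSS matrix is indefinite" stop().
X = np.random.rand(5, 10)
indefinite, ev = tss_is_indefinite(X)
print(indefinite, ev.min())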
For the other error, "invalid clustering method", look at the source code of the method in the given link, at lines 168 and 169. You are getting this error message because the clustering method is empty:
if (is.na(method))
    stop("invalid clustering method")

3-layered neural network doesn't learn properly

So, I'm trying to implement a neural network with 3 layers in Python; however, I am not the brightest person, so anything with more than 2 layers is kind of difficult for me. The problem with this one is that it gets stuck at 0.5 and does not learn, and I have no actual clue where it went wrong. Thank you to anyone with the patience to explain the error to me. (I hope the code makes sense.)
import numpy as np

def sigmoid(x):
    return 1/(1+np.exp(-x))

def reduce(x):
    return x*(1-x)

l0=[np.array([1,1,0,0]),
    np.array([1,0,1,0]),
    np.array([1,1,1,0]),
    np.array([0,1,0,1]),
    np.array([0,0,1,0]),
   ]
output=[0,1,1,0,1]
syn0=np.random.random((4,4))
syn1=np.random.random((4,1))
for justanumber in range(1000):
    for i in range(len(l0)):
        l1=sigmoid(np.dot(l0[i],syn0))
        l2=sigmoid(np.dot(l1,syn1))
        l2_err=output[i]-l2
        l2_delta=reduce(l2_err)
        l1_err=syn1*l2_delta
        l1_delta=reduce(l1_err)
        syn1=syn1.T
        syn1+=l0[i].T*l2_delta
        syn1=syn1.T
        syn0=syn0.T
        syn0+=l0[i].T*l1_delta
        syn0=syn0.T
print l2
PS: I know it might be a piece of trash as a script, but that is why I asked for assistance.
Your computations are not fully correct. For example, reduce is called on l1_err and l2_err, where it should be called on l1 and l2.
You are performing stochastic gradient descent, and in this case, with so few parameters, it oscillates hugely. Use full-batch gradient descent instead.
The bias units are not present, although technically you can still learn without bias.
I tried to rewrite your code with minimal changes. I have commented out your lines to show the changes.
#!/usr/bin/python3
import matplotlib.pyplot as plt
import numpy as np

def sigmoid(x):
    return 1/(1+np.exp(-x))

def reduce(x):
    return x*(1-x)

l0 = np.array([np.array([1,1,0,0]),
               np.array([1,0,1,0]),
               np.array([1,1,1,0]),
               np.array([0,1,0,1]),
               np.array([0,0,1,0]),
              ])
output = np.array([[0],[1],[1],[0],[1]])
syn0 = np.random.random((4,4))
syn1 = np.random.random((4,1))
final_err = list()
gamma = 0.05
maxiter = 100000
for justanumber in range(maxiter):
    syn0_del = np.zeros_like(syn0)
    syn1_del = np.zeros_like(syn1)
    l2_err_sum = 0
    for i in range(len(l0)):
        this_data = l0[i,np.newaxis]
        l1 = sigmoid(np.matmul(this_data,syn0))[:]
        l2 = sigmoid(np.matmul(l1,syn1))[:]
        l2_err = (output[i,:]-l2[:])
        #l2_delta=reduce(l2_err)
        l2_delta = np.dot(reduce(l2), l2_err)
        l1_err = np.dot(syn1, l2_delta)
        #l1_delta=reduce(l1_err)
        l1_delta = np.dot(reduce(l1), l1_err)
        # Accumulate gradient for this point for layer 1
        syn1_del += np.matmul(l2_delta, l1).T
        #syn1=syn1.T
        #syn1+=l1.T*l2_delta
        #syn1=syn1.T
        # Accumulate gradient for this point for layer 0
        syn0_del += np.matmul(l1_delta, this_data).T
        #syn0=syn0.T
        #syn0-=l0[i,:].T*l1_delta
        #syn0=syn0.T
        # The error for this datapoint: mean sum of squares
        l2_err_sum += np.mean(l2_err ** 2)
    l2_err_sum /= l0.shape[0] # Mean sum of squares
    syn0 += gamma * syn0_del
    syn1 += gamma * syn1_del
    print("iter: ", justanumber, "error: ", l2_err_sum)
    final_err.append(l2_err_sum)
# Predicting
l1 = sigmoid(np.matmul(l0,syn0))[:] # 5 x 4 * 4 x 4 = 5 x 4
l2 = sigmoid(np.matmul(l1,syn1))[:] # 5 x 4 * 4 x 1 = 5 x 1
print("Predicted: \n", l2)
print("Actual: \n", output)
plt.plot(np.array(final_err))
plt.show()
The output I get is:
Predicted:
[[0.05214011]
[0.97596354]
[0.97499515]
[0.03771324]
[0.97624119]]
Actual:
[[0]
[1]
[1]
[0]
[1]]
Therefore the network was able to predict all the toy training examples. (Note that on real data you would not want to fit the training data this exactly, as that leads to overfitting.) Note that you may get a slightly different result, as the weight initialisations are different. Also, as a rule of thumb, initialise the weights in [-0.01, +0.01] when you are not working on a specific problem for which you know a better initialisation.
Here is the convergence plot.
Note that you do not need to iterate over each example; instead you can do the matrix multiplication for the whole batch at once, which is much faster. Also, the above code does not have bias units. Make sure you add bias units when you re-implement the code.
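As an illustration of that batching point, here is roughly what the fully vectorised training loop could look like (a sketch only, reusing sigmoid, reduce, l0, output, syn0, syn1, gamma, and maxiter from the code above; it uses the standard elementwise deltas, and still has no bias units):
for it in range(maxiter):
    # Forward pass over all 5 examples at once
    l1 = sigmoid(np.matmul(l0, syn0))                    # 5 x 4
    l2 = sigmoid(np.matmul(l1, syn1))                    # 5 x 1
    l2_err = output - l2                                 # 5 x 1
    # Backward pass, deltas for the whole batch
    l2_delta = reduce(l2) * l2_err                       # elementwise, 5 x 1
    l1_delta = reduce(l1) * np.matmul(l2_delta, syn1.T)  # elementwise, 5 x 4
    # One step on the summed batch gradient
    syn1 += gamma * np.matmul(l1.T, l2_delta)            # 4 x 1
    syn0 += gamma * np.matmul(l0.T, l1_delta)            # 4 x 4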
I would recommend you go through Raul Rojas' Neural Networks: A Systematic Introduction, Chapters 4, 6 and 7. Chapter 7 will tell you how to implement deeper networks in a simple way.

Merge Sort algorithm efficiency

I am currently taking an online algorithms course in which the teacher doesn't give code to solve the algorithm, but rather rough pseudo code. So before taking to the internet for the answer, I decided to take a stab at it myself.
In this case, the algorithm we are looking at is the merge sort algorithm. After being given the pseudo code, we also dove into analyzing the algorithm for run time against n, the number of items in an array. After a quick analysis, the teacher arrived at 6n*log2(n) + 6n as an approximate run time for the algorithm.
The pseudo code given was for the merge portion of the algorithm only and was given as follows:
C = output [length = n]
A = 1st sorted array [n/2]
B = 2nd sorted array [n/2]
i = 1
j = 1
for k = 1 to n
    if A(i) < B(j)
        C(k) = A(i)
        i++
    else [B(j) < A(i)]
        C(k) = B(j)
        j++
    end
end
He basically did a breakdown of the above, arriving at 4n+2 (2 for the declarations of i and j, and 4 for the operations performed in each iteration: the for, the if, the array position assignment, and the increment). He simplified this, I believe for the sake of the class, to 6n.
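(For completeness, the step from 6n per level to the full bound: the recursion has at most log2(n) + 1 levels, and the merges on each level together process all n elements, so each level costs at most 6n operations, giving
total <= 6n * (log2(n) + 1) = 6n*log2(n) + 6n.)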
This all makes sense to me; my question arises from the implementation I wrote, how it affects the algorithm, and some of the trade-offs/inefficiencies it may add.
Below is my code in swift using a playground:
func mergeSort<T: Comparable>(_ array: [T]) -> [T] {
    guard array.count > 1 else { return array }

    let lowerHalfArray = array[0..<array.count / 2]
    let upperHalfArray = array[array.count / 2..<array.count]

    let lowerSortedArray = mergeSort(Array(lowerHalfArray))
    let upperSortedArray = mergeSort(Array(upperHalfArray))

    return merge(lhs: lowerSortedArray, rhs: upperSortedArray)
}

func merge<T: Comparable>(lhs: [T], rhs: [T]) -> [T] {
    guard lhs.count > 0 else { return rhs }
    guard rhs.count > 0 else { return lhs }

    var i = 0
    var j = 0
    var mergedArray = [T]()
    let loopCount = (lhs.count + rhs.count)

    for _ in 0..<loopCount {
        if j == rhs.count || (i < lhs.count && lhs[i] < rhs[j]) {
            mergedArray.append(lhs[i])
            i += 1
        } else {
            mergedArray.append(rhs[j])
            j += 1
        }
    }

    return mergedArray
}
let values = [5,4,8,7,6,3,1,2,9]
let sortedValues = mergeSort(values)
My questions for this are as follows:
Do the guard statements at the start of the merge<T:Comparable> function actually make it more inefficient? Considering we are always halving the array, the only time they will hold true is for the base case and when there is an odd number of items in the array.
This, to me, seems like it would add more processing for minimal return, since it only happens once we have halved the array to the point where one side has no items.
Concerning the if statement in my merge: since it checks more than one condition, does this affect the overall efficiency of the algorithm I have written? If so, the effect seems to vary based on where it breaks out of the if statement (e.g. at the first condition or the second).
Is this something that is considered heavily when analyzing algorithms, and if so, how do you account for the variance when it breaks out of the algorithm?
Any other analysis/tips you can give me on what I have written would be greatly appreciated.
You will very soon learn about Big-O and Big-Theta where you don't care about exact runtimes (believe me when I say very soon, like in a lecture or two). Until then, this is what you need to know:
Yes, the guards take some time, but it is the same amount of time in every iteration. So if each iteration takes X amount of time without the guard and you do n function calls, then it takes X*n amount of time in total. Now add in the guards, which take Y amount of time in each call. You now need (X+Y)*n time in total. This is a constant factor, and when n becomes very large, the (X+Y) factor becomes negligible compared to the n factor. That is, if you can reduce a function from X*n to (X+Y)*log(n), then it is worthwhile to add the Y amount of work, because you do fewer iterations in total.
The same reasoning applies to your second question. Yes, checking "if X or Y" takes more time than checking "if X" but it is a constant factor. The extra time does not vary with the size of n.
In some languages you only check the second condition if the first fails. How do we account for that? The simplest solution is to realize that the upper bound of the number of comparisons will be 3, while the number of iterations can be potentially millions with a large n. But 3 is a constant number, so it adds at most a constant amount of work per iteration. You can go into nitty-gritty details and try to reason about the distribution of how often the first, second and third condition will be true or false, but often you don't really want to go down that road. Pretend that you always do all the comparisons.
So yes, adding the guards might be bad for your runtime if you do the same number of iterations as before. But sometimes adding extra work in each iteration can decrease the number of iterations needed.

Sudoku solver evaluation function

So I'm trying to write a simple genetic algorithm for solving a sudoku (not the most efficient way, I know, but it's just to practice evolutionary algorithms). I'm having some problems coming up with an efficient evaluation function to test whether the puzzle is solved and how many errors there are. My first instinct would be to check whether each row and column of the matrix (doing it in Octave, which is similar to MATLAB) has unique elements by ordering them, checking for duplicates, and then putting them back the way they were, which seems long-winded. Any thoughts?
Sorry if this has been asked before...
Speedups:
Use bitwise operations instead of sorting.
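For example, a duplicate check with a bitmask instead of sorting might look like this (a Python sketch, not from the original answer; it assumes cell values 1-9 with 0 for blanks):
def has_duplicates(cells):
    # Track each value v as bit v of one integer: O(n), no sorting,
    # no extra arrays, no "putting them back".
    seen = 0
    for v in cells:
        if v == 0:              # blank cell, ignore
            continue
        bit = 1 << v
        if seen & bit:          # value already seen -> duplicate
            return True
        seen |= bit
    return False

print(has_duplicates([5, 3, 0, 0, 7, 0, 0, 0, 5]))   # True (two 5s)
print(has_duplicates([5, 3, 4, 6, 7, 8, 9, 1, 2]))   # False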
I made a 100-line sudoku solver in C; it is reasonably fast. For super speed you need to implement the DLX algorithm; there is also a file on the MATLAB File Exchange for that.
http://en.wikipedia.org/wiki/Exact_cover
http://en.wikipedia.org/wiki/Dancing_Links
http://en.wikipedia.org/wiki/Knuth's_Algorithm_X
#include "stdio.h"
int rec_sudoku(int (&mat)[9][9],int depth)
{
int sol[9][9][10]; //for eliminating
if(depth == 0) return 1;
for(int i=0;i<9;i++)
{
for(int j=0;j<9;j++)
{
sol[i][j][9]=9;
for(int k=0;k<9;k++)
{
if(mat[i][j]) sol[i][j][k]=0;
else sol[i][j][k]=1;
}
}
}
for(int i=0;i<9;i++)
{
for(int j=0;j<9;j++)
{
if(mat[i][j] == 0) continue;
for(int k=0;k<9;k++)
{
if(sol[i][k][mat[i][j]-1])
{
if(--sol[i][k][9]==0) return 0;
sol[i][k][mat[i][j]-1]=0;
}
if(sol[k][j][mat[i][j]-1])
{
if(--sol[k][j][9]==0) return 0;
sol[k][j][mat[i][j]-1]=0;
}
}
for(int k=(i/3)*3;k<(i/3+1)*3;k++)
{
for(int kk=(j/3)*3;kk<(j/3+1)*3;kk++)
{
if(sol[k][kk][mat[i][j]-1])
{
if(--sol[k][kk][9]==0) return 0;
sol[k][kk][mat[i][j]-1]=0;
}
}
}
}
}
for(int c=1;c<=9;c++)
{
for(int i=0;i<9;i++)
{
for(int j=0;j<9;j++)
{
if(sol[i][j][9] != c) continue;
for(int k=0;k<9;k++)
{
if(sol[i][j][k] != 1) continue;
mat[i][j]=k+1;
if(rec_sudoku(mat,depth-1)) return 1;
mat[i][j]=0;
}
return 0;
}
}
}
return 0;
}
int main(void)
{
int matrix[9][9] =
{
{1,0,0,0,0,7,0,9,0},
{0,3,0,0,2,0,0,0,8},
{0,0,9,6,0,0,5,0,0},
{0,0,5,3,0,0,9,0,0},
{0,1,0,0,8,0,0,0,2},
{6,0,0,0,0,4,0,0,0},
{3,0,0,0,0,0,0,1,0},
{0,4,0,0,0,0,0,0,7},
{0,0,7,0,0,0,3,0,0}
};
int d=0;
for(int i=0;i<9;i++) for(int j=0;j<9;j++) if(matrix[i][j] == 0) d++;
if(rec_sudoku(matrix,d)==0)
{
printf("no solution");
return 0;
}
for(int i=0;i<9;i++)
{
for(int j=0;j<9;j++)
{
printf("%i ",matrix[i][j]);
}
printf("\n");
}
return 1;
}
The check is easy: you create sets for the rows, columns, and 3x3 blocks, adding each number if it is not already present and altering your fitness accordingly if it is.
The real trick, however, is "altering your fitness" accordingly. Some problems seem well suited to GA and ES (evolution strategies), that is, where we look for a solution within a tolerance; sudoku has an exact answer... tricky.
My first crack would probably be creating solutions with variable-length chromosomes (well, they could be fixed length, but 9x9s with blanks). The fitness function should be able to determine which part of the solution is guaranteed and which part is not (sometimes you must take a guess in the dark in a really tough sudoku game and then backtrack if it does not work out), so it would be a good idea to create children for each possible branch.
This, then, is a recursive solution. However, you could start scanning from different positions on the board. Recombination would combine solutions whose unverified portions have overlapping solutions.
Just thinking about it in this high-level, easygoing fashion, I can see how mind-bending this will be to implement!
Mutation would only be applied when there is more than one path to take; after all, a mutation is a kind of guess.
Sounds good, except for the 'putting them back' part. You can just put the numbers from any line, column or square of the puzzle into a list and check for duplicates any way you want. If there are duplicates, there is an error; if all numbers are unique, there is not. You don't need to take the actual numbers out of the puzzle, so there is no need to put them back either.
Besides, if you're writing a solver, it should never make an invalid move, so this check would not be needed at all.
I would use the grid's numbers as indices and increment the respective element of a 9-element array => s_array[x]++, where x is the number taken from the grid.
At the end of checking one row, each and every element of the array must be 1. If a 0 occurs anywhere in the array, that row is wrong.
However, this is just a simple sanity check that there are no problems, row-wise.
PS: if it were 10 years ago, I would suggest an assembly solution with bit manipulation (1st bit, 2nd bit, 3rd bit, etc. for the values 1, 2, 3, ...) and check whether the result is 2^10-1.
When I solved this problem, I just counted the number of duplicates in each row, column and sub-grid (in fact I only had to count duplicates in columns and sub-grids, as my evolutionary operators were designed never to introduce duplicates into rows). I just used a HashSet to detect duplicates. There are faster ways, but this was quick enough for me.
You can see this visualised in my Java applet (if it's too fast, increase the population size to slow it down). The coloured squares are duplicates: yellow squares conflict with one other square, orange with two other squares, and red with three or more.
Here is my solution. Sudoku solving solution in C++
Here is my solution using sets. If for a line, a block or a column you get a set length of (let's say) 7, your fitness would be 9 - 7.
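A minimal sketch of that idea in Python (hypothetical helper name; the error for one row/column/block is simply 9 minus the number of distinct values in it):
def unit_error(unit):
    # 9 distinct values -> error 0; every duplicate costs one point
    return 9 - len(set(unit))

print(unit_error([5, 3, 4, 6, 7, 8, 9, 1, 5]))   # 1 (set length 8 -> 9 - 8)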
If you are operating on a small set of integers, sorting can be done in O(n) using bucket sort.
You can use temporary arrays to do this task in MATLAB:
function tf = checkSubSet( board, sel )
%
% given a 9x9 board and a selection (using logical 9x9 sel matrix)
% verify that board(sel) has 9 unique elements
%
% assumptions made:
% - board is 9x9 with numbers 1,2,...,9
% - sel has only 9 "true" entries: nnz(sel) = 9
%
tmp = zeros(1,9);
tmp( board( sel ) ) = 1; % poor man's bucket sorting
tf = all( tmp == 1 ) && nnz(sel) == 9 && numel(tmp) == 9; % check validity
Now we can use checkSubSet to verify that the whole board is correct:
function isCorrect = checkSudokuBoard( board )
%
% assuming board is a 9x9 matrix with entries 1,2,...,9
%
isCorrect = true;
% check rows and columns
for ii = 1:9
    sel = false( 9 );
    sel( :, ii ) = true;
    isCorrect = checkSubSet( board, sel );
    if ~isCorrect
        return;
    end
    sel = false( 9 );
    sel( ii, : ) = true;
    isCorrect = checkSubSet( board, sel );
    if ~isCorrect
        return;
    end
end
% check all 3x3 blocks
for ii = 1:3:9
    for jj = 1:3:9
        sel = false( 9 );
        sel( ii + (0:2), jj + (0:2) ) = true;
        isCorrect = checkSubSet( board, sel );
        if ~isCorrect
            return;
        end
    end
end