Setting up an ANN to classify Tic-Tac-Toe End-Games

I'm having a hard time setting up a neural network to classify Tic-Tac-Toe board states (final or intermediate) as "X wins", "O wins", or "Tie".
I will describe my current solution and results. Any advice is appreciated.
* Data Set *
Dataset = 958 possible end-games + 958 random-games = 1916 board states
(random games may be incomplete but are all legal, i.e. they never have both players winning simultaneously).
Training set = 1600 random sample of Dataset
Test set = remaining 316 cases
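In MATLAB, the split amounts to something like the following sketch (allStates standing for the 1916-case dataset; illustrative, not my exact code):
idx      = randperm(1916);             % shuffle all 1916 board states
trainSet = allStates(idx(1:1600), :);  % 1600 random training cases
testSet  = allStates(idx(1601:end), :); % the remaining 316 test cases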
With my current pseudo-random split, the dataset has the following characteristics.
Training set:
- 527 wins for "X"
- 264 wins for "O"
- 809 ties
Test set:
- 104 wins for "X"
- 56 wins for "O"
- 156 ties
* Modeling *
Input Layer: 18 input neurons, one for each (board position, player) pair. Therefore, the board (B = blank):
x x o
o x B
B o x
is encoded as:
1 0 1 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0
Output Layer: 3 output neurons which correspond to each outcome (X wins, O wins, Tie).
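To make the encoding concrete, something like this MATLAB sketch produces it (variable names are illustrative):
board = ['x' 'x' 'o'; 'o' 'x' 'b'; 'b' 'o' 'x'];  % 'b' marks a blank cell
cells = reshape(board', 1, 9);       % the 9 cells in row-major order
x_in  = zeros(1, 18);
x_in(1:2:end) = (cells == 'x');      % odd units: "X occupies this cell"
x_in(2:2:end) = (cells == 'o');      % even units: "O occupies this cell"
t = [1 0 0];                         % one-hot target: X wins / O wins / Tie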
* Architecture *
Based on: http://www.cs.toronto.edu/~hinton/csc321/matlab/assignment2.tar.gz
1 Single Hidden Layer
Hidden Layer activation function: Logistic
Output Layer activation function: Softmax
Error function: Cross-Entropy
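For reference, the forward pass this architecture amounts to (a sketch; W1, b1, W2, b2 are the network's weights and biases, x_in and t the encoded input and one-hot target from above):
h    = 1 ./ (1 + exp(-(W1 * x_in' + b1)));  % logistic hidden layer
z    = W2 * h + b2;                         % output pre-activations
y    = exp(z) / sum(exp(z));                % softmax over the 3 outcomes
loss = -sum(t' .* log(y));                  % cross-entropy error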
* Results *
No combination of parameters seems to achieve a 100% correct classification rate. Some examples:
NHidden  LRate   InitW  MaxEpoch  Epochs  FMom  Errors  TestErrors
8        0.0025  0.01   10000     4500    0.8   0       7
16       0.0025  0.01   10000     2800    0.8   0       5
16       0.0025  0.1    5000      1000    0.8   0       4
16       0.0025  0.5    5000      5000    0.8   3       5
16       0.0025  0.25   5000      1000    0.8   0       5
16       0.005   0.25   5000      1000    0.9   10      5
16       0.005   0.25   5000      5000    0.8   15      5
16       0.0025  0.25   5000      1000    0.8   0       5
32       0.0025  0.25   5000      1500    0.8   0       5
32       0.0025  0.5    5000      600     0.9   0       5
8        0.0025  0.25   5000      3500    0.8   0       5
Important - you may well think I could improve any of the following:
- the dataset characteristics (source and quantities of training and test cases),
- the problem modeling (encoding of the input/output neurons),
- the network architecture (number of hidden layers, activation/error functions, etc.).
Still, assuming that my current choices, even if not optimal, should not prevent the system from reaching a 100% correct classification rate, I would like to focus on other possible issues.
In other words, considering the simplicity of the game, this dataset/modeling/architecture should do it; therefore, what am I doing wrong regarding the parameters?
I do not have much experience with ANNs, and my main question is the following:
Using 16 hidden neurons, the ANN could learn to associate each hidden unit with "a certain player winning in a certain way":
(3 rows + 3 columns + 2 diagonals) * 2 players = 16
In this setting, an "optimal" set of weights is pretty straightforward: each hidden unit has "large" connection weights from 3 of the input units (corresponding to one row, column, or diagonal of one player) and a "large" connection weight to one of the output units (corresponding to "a win" by that player).
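Concretely, such a set of weights could be written down by hand (a sketch with illustrative magnitudes, not a trained network):
lines = [1 2 3; 4 5 6; 7 8 9; 1 4 7; 2 5 8; 3 6 9; 1 5 9; 3 5 7];  % 8 winning triplets
W1 = zeros(16, 18);
for k = 1:8
    W1(k,     2*lines(k,:) - 1) = 5;  % units 1-8: X completes line k (odd inputs)
    W1(k + 8, 2*lines(k,:))     = 5;  % units 9-16: O completes line k (even inputs)
end
b1 = -12 * ones(16, 1);               % a unit fires only if all 3 cells are taken
W2 = zeros(3, 16);
W2(1, 1:8)  = 10;                     % "X wins" listens to the X-line detectors
W2(2, 9:16) = 10;                     % "O wins" listens to the O-line detectors
b2 = [0; 0; 5];                       % "Tie" wins by default when no detector fires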
No matter what I do, I cannot decrease the number of test errors, as the above table shows.
Any advice is appreciated.

You are doing everything right, but you're simply trying to tackle a difficult problem here, namely generalizing from some examples of tic-tac-toe configurations to all the others.
Unfortunately, the simple neural network you use does not perceive the spatial structure of the input (neighborhood), nor can it exploit its symmetries. So in order to get a perfect test error, you can either:
- increase the size of the dataset to include most (or all) possible configurations -- which the network will then be able to simply memorize, as indicated by the zero training error in most of your setups;
- choose a different problem, where there is more structure to generalize from;
- use a network architecture that can capture the symmetries (e.g. through weight-sharing) and/or the spatial relations between the inputs (e.g. different features). Convolutional networks are just one example of this.
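For the first option, one cheap way to fold the symmetries in (my addition, not part of the answer above) is to augment every board with its eight rotations and reflections:
function boards = symmetries(board)
% Return the 8 rotations/reflections of a 3x3 board (a sketch).
boards = cell(1, 8);
b = board;
for k = 1:4
    boards{k}     = b;           % rotation by (k-1)*90 degrees
    boards{k + 4} = fliplr(b);   % the mirror image of that rotation
    b = rot90(b);
end
end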

Related

Mixed-effects linear regression model using multiple independent measurements

I am trying to implement a linear mixed-effects (LME) regression model for an x-ray imaging quality metric, CNR (contrast-to-noise ratio), which I measured for various tube potentials (kV) and filtration materials (Filter). CNR was measured for 3 consecutive slices, so I also have a standard deviation of the CNR from these independent measurements. I am wondering how I can incorporate these multiple independent measurements into my analysis. A representation of the data for a single measurement and my first attempt using fitlme are shown below. I tried looking at online resources but could not find an answer to my specific questions.
kV=[80 90 100 80 90 100 80 90 100]';
Filter={'Al','Al','Al','Cu','Cu','Cu','Ti','Ti','Ti'}';
CNR=[10 9 8 10.1 8.9 7.9 7 6 5]';
T=table(kV,Filter,CNR);
kV Filter CNR
___ ______ ___
80 'Al' 10
90 'Al' 9
100 'Al' 8
80 'Cu' 10.1
90 'Cu' 8.9
100 'Cu' 7.9
80 'Ti' 7
90 'Ti' 6
100 'Ti' 5
OUTPUT
Linear mixed-effects model fit by ML
Model information:
Number of observations 9
Fixed effects coefficients 4
Random effects coefficients 0
Covariance parameters 1
Formula:
CNR ~ 1 + kV + Filter
Model fit statistics:
AIC      BIC      LogLikelihood  Deviance
-19.442  -18.456  14.721         -29.442
Fixed effects coefficients (95% CIs):
Name           Estimate   SE         pValue
'(Intercept)'  18.3       0.17533    1.5308e-09
'kV'           -0.10333   0.0019245  4.2372e-08
'Filter_Cu'    -0.033333  0.03849    -0.86603
'Filter_Ti'    -3         0.03849    -77.942
Random effects covariance parameters (95% CIs):
Group: Error
Name       Estimate  Lower   Upper
'Res Std'  0.04714   0.0297  0.074821
Questions/Issues with current implementation:
How is the fixed-effects coefficient for '(Intercept)' with P = 1.53E-9 interpreted?
I only included fixed effects. Should the standard deviation of the ROI measurements somehow be incorporated into the random effects as well?
How do I incorporate the three independent measurements of CNR for the three consecutive slices for a given kV/Filter combination? Should I just add more rows to the table "T"? This would result in a total of 27 observations.
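Regarding the last question, a hedged sketch of the "more rows" idea (the per-slice CNR values below are placeholders, and the random-intercept formula is just one possible choice):
kV3     = repmat(kV, 3, 1);                 % each kV/Filter row repeated per slice
Filter3 = repmat(Filter, 3, 1);
Slice   = categorical(repelem((1:3)', 9));  % slice index for each measurement
CNR3    = [CNR; CNR + 0.05; CNR - 0.05];    % placeholder per-slice readings
T3  = table(kV3, Filter3, Slice, CNR3, ...
      'VariableNames', {'kV', 'Filter', 'Slice', 'CNR'});
lme = fitlme(T3, 'CNR ~ 1 + kV + Filter + (1|Slice)');  % slice as a random intercept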

Matlab: I/O Delay detection

I have a continuous process with 3 inputs and 1 output. The 3 inputs act consecutively in time: the output lags Input 1 by 30 minutes, Input 2 by 15, etc.
My dataset below shows a startup for the system after a shutdown:
I1 I2 I3 Out
0 0 0 0
3 0 0 0
8 4 0 0
13 8 6 0
22 13 9 3.2
It can be seen how Input 1 started first and everything else followed.
My question: in Matlab, what should I look for in order to determine such I/O delay for more complex datasets?
You should take a close look at xcorr.
xcorr performs a cross-correlation between two vectors (typically time signals) and measures their similarity as a function of the time shift between them. A constant I/O lag should show up as a local maximum of the correlation coefficient.
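A minimal sketch, assuming I1 and Out are regularly sampled column vectors like the ones in the question:
[c, lags]  = xcorr(Out, I1, 'coeff');  % normalized cross-correlation
[~, iMax]  = max(c);
delaySteps = lags(iMax);               % number of samples by which Out trails I1
% finddelay(I1, Out) gives the same estimate directly.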

How do I identify the correct clustering algorithm for the available data?

I have sample data on flight routes: the number of searches for each route, the gross profit for the route, and the number of transactions for the route. I want to bucket flight routes which show similar characteristics based on the above-mentioned variables. What are the steps to settle on a particular clustering algorithm?
Below is sample data which I would like to cluster.
Route Clicks Impressions CPC Share of Voice Gross-Profit Number of Transactions Conversions
AAE-ALG 2 25 0.22 $4.00 2 1
AAE-CGK 5 40 0.21 $6.00 1 1
AAE-FCO 1 25 0.25 $13.00 4 1
AAE-IST 8 58 0.30 $18.00 3 2
AAE-MOW 22 100 0.11 $1.00 6 5
AAE-ORN 11 70 0.21 $22.00 3 2
AAE-ORY 8 40 0.18 $3.00 4 4
To me it seems an N-dimensional clustering problem, where N is the number of features (Route, Clicks, Impressions, CPC, Share of Voice, Gross-Profit, Number of Transactions, Conversions).
I think if you preprocess the feature values so that distances between them are meaningful, you can apply k-means to cluster your data.
E.g. a route can be represented by the distance between its airports, dA; then the difference between two routes' values gives the distance between them: d = |dA - dA'|.
Don't forget to scale your features.
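A short MATLAB sketch of that recipe (dA, the per-route airport distance, is assumed to be precomputed; the other columns are vectors taken from the question's table):
X  = [dA, Clicks, Impressions, CPC, GrossProfit, Transactions, Conversions];
Xs = zscore(X);                              % zero mean, unit variance per feature
k  = 3;                                      % pick k e.g. via the elbow method
[idx, C] = kmeans(Xs, k, 'Replicates', 10);  % idx maps each route to a bucket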

How to properly model ANN to find relationship between real value input-output data?

I'm trying to make an ANN which could tell me if there is causality between my input and output data. The data is the following:
My inputs are measured values of pesticides (19 in total) in an area, e.g.:
-1.031413662 -0.156086316 -1.079232918 -0.659174849 -0.734577317 -0.944137546 -0.596917991 -0.282641072 -0.023508282 3.405638835 -1.008434997 -0.102330305 -0.65961995 -0.687140701 -0.167400684 -0.4387984 -0.855708613 -0.775964435 1.283238514
And the outputs are the measured values of plant-something in the same area (55 in total), e.g.:
0.00 0.00 0.00 13.56 0 13.56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13.56 0 0 0 1.69 0 0 0 0 0 0 0 0 0 0 1.69 0 0 0 0 13.56 0 0 0 0 13.56 0 0 0 0 0 0
Values for the inputs are in the range from -2.5 to 10, and for the outputs from 0 to 100.
So the question I'm trying to answer is: to what extent does pesticide A affect plant-something?
What are good ways to model (represent) the input/output neurons so they can process the mentioned input/output data? And how should I scale/convert the input/output data to be useful for the NN?
Is there a book/paper that I should look at?
First, a neural network cannot find the causality between output and input, but only the correlation (just like any other probabilistic method). Causality can only be derived logically from reasoning (and even then, it's not always clear; it all depends on your axioms).
Secondly, about how to design a neural network to model your data, here is a pretty simple rule that can be generally applied to make a first working draft:
set the number of input neurons = the number of input variables for one sample
set the number of output neurons = the number of output variables for one sample
then play with the number of hidden layers and the number of hidden neurons per hidden layer. In practice, you want to use the fewest hidden layers/neurons that model your data correctly, but enough that the function approximated by your neural network fits the data well (otherwise the output error will be huge compared to the real output dataset).
Why do you need just enough neurons, and not too many? Because if you use a lot of hidden neurons you are sure to overfit your data, and thus you will make perfect predictions on your training dataset but not in the general case, on real datasets you haven't seen. Theoretically, this is because a neural network is a function approximator, so it can approximate any function, but using a function of too high an order leads to overfitting. See PAC learning for more info on this.
So, in your case, the first thing to do is to clarify how many variables you have as input and as output for each sample. If there are 19 input variables, create 19 input neurons, and if you have 55 output variables, create 55 output neurons.
About scaling and pre-processing: yes, you should normalize your data to the range 0 to 1 (or -1 to 1; it's up to you, and it depends on the activation function). A very good place to start is to watch the videos of the machine learning course by Andrew Ng on Coursera; this should get you kickstarted quickly and correctly (you'll be taught the tools to check that your neural network is working correctly, and this is immensely important and useful).
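A rough MATLAB sketch of both steps (assuming X is a 19 x N input matrix and Y a 55 x N output matrix, one column per sample):
[Xn, psX] = mapminmax(X, 0, 1);    % rescale every input variable to [0, 1]
[Yn, psY] = mapminmax(Y, 0, 1);    % and every output variable too
net  = feedforwardnet(10);         % one hidden layer, 10 neurons to start with
net  = train(net, Xn, Yn);         % 19 inputs / 55 outputs are inferred from the data
pred = mapminmax('reverse', net(Xn), psY);  % map predictions back to raw units
% psX lets you apply the same input scaling to new data later.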
Note: you should check your output variables; from the sample you gave, it seems they take discrete values. If the values are discrete, you can treat the output as discrete classes, which will be a lot more precise and predictive than using real, floating-point values (e.g., instead of having [0, 1.69, 13.56] as the possible output values, you'd have the classes [1, 2, 3]; this is called "binning", or multi-class categorization). In practice, this means using a classification neural network (e.g., with sigmoid or softmax outputs and a cross-entropy error) instead of a regression network (e.g., with a linear output).
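A one-line sketch of that binning step, assuming Y is the output matrix from before and really does repeat a small set of values:
[vals, ~, classIdx] = unique(Y(:));  % e.g. [0; 1.69; 13.56] -> classes 1, 2, 3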

Neural Network for a Robot

I need to implement a robot brain, and I used a feedforward neural network as the controller. The robot has 24 sonar sensors and only one output, which is R=Right, L=Left, F=Forward, or B=Back. I also have a large dataset which contains sonar data and the desired outputs. The FNN is trained using the backpropagation algorithm.
I used Neuroph Studio to construct the FNN and to do the training. Here are the network params:
Input layer: 24
Hidden Layer: 10
Output Layer: 1
LearningRate: 0.5
Momentum: 0.7
GlobalError: 0.1
My problem is that during training the error drops slightly and then seems to stay static. I tried changing the parameters but I'm not getting any useful results!
Thanks for your help.
Use 1-of-n encoding for the output: use 4 output neurons, and set up your target (output) data like this:
1 0 0 0 = right
0 1 0 0 = left
0 0 1 0 = forward
0 0 0 1 = back
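In MATLAB terms, that encoding looks like the following sketch (the question uses Neuroph Studio, but the target layout is the same; labels stands for your desired-output letters, one per sample):
classes = 'RLFB';
targets = zeros(4, numel(labels));
for k = 1:numel(labels)
    targets(classes == labels(k), k) = 1;  % one-hot column per sample
end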
Reduce the number of input sensors (and corresponding input neurons) to begin with, down to 3 or 5. This will simplify things so you can understand what's going on. Later you can build back up to 24 inputs.
Neural networks often get stuck in local minima during training, that could be why your error is static. Increasing the momentum can help avoid this.
Your learning rate looks quite high. Try 0.1, but play around with these values. Every problem is different and there are no values guaranteed to work.