I'm trying to code a Connect 4 - IA based on neural networks.
But it's the first time I use ANN, so I'm completely lost...
I found something that looks great, but need some explanations...
Structure
I'm ok with the input, but I don't understand the number of hiddens.
Also, the output should be the number of the colomn where to put the coin next turn, it means a value between 1 and 7.
But how could I get these values, using a sigmoid function ?
Related
This is my first time working with DL4J (Deep Learning for Java) and also my first Convolutional Neural Network. My Goal is to use the Convolutional Neural Netowrk to give me some predicted values about an image. I gathered and labelled my images myself. The labels or expected outputs consist of two numbers between 0 and 1 (I just wrote them in the file name like 0.01x0.87.jpg).
Now I can't find any way to use the DataSetIterator Class which DL4J uses so that I can also set my label values.
Is there a simple way to tell DL4J that I want to train my Network to recognize that image 0.01x0.01.jpg should spit out the values 0.01 and 0.01?
What you want to do is usually known as regression. In contrast to classification where you want to either have a 0 or 1 output, in regression any value can be the target.
In your case, you will likely want to use a network architecture that uses either a sigmoid (which forces your values to be between 0 and 1) or an identity (which keeps the values as is, i.e. allows for them to be outside of the 0 to 1 range) activation function.
As you have two values that you are trying to predict, you will have to also define that you are using two outputs.
So much for your model architecture.
For data loading, you can use the ImageRecordReader, but also pass it a PathMultiLabelGenerator of your own. When you implement the PathMultiLabelGenerator interface, you will get the full path of the image as a string, and you can do whatever you want with it, like for example remove the file ending, split on x and parse your filename into a list of DoubleWritable. DoubleWritable is just a simple wrapper class for double so creating that is as easy as just instantiating it by passing the actual value to the constructor.
To create a dataset iterator you can now follow the documentation on RecordReaderDataSetIterator.
Recently I've been working on character recognition using Back Propagation Algorithm. I've taken the image and reduced to 5x7 size, therefore I got 35 pixels and trained the network using those pixels with 35 input neurons, 35 hidden nodes, and 10 output nodes. And I had completed the training successfully and I got weights that I needed. And I've got stuck here. I have my test set and I know I should feed forward the network. But I don't know what to do exactly. My test set will be 4 samples of 1x35. My output layer has 10 neurons. how do I exactly distinguish the characters with the output that I will get? I want to know how this testing works. Please guide me through this stage. Thanks in advance.
One vs All
A common approach for testing these types of neural networks is "one-vs-all" approach. We view each of the output nodes as its own classifier that is giving the probability of the sample being that class vs not being that class.
For instance if you network output [1, 0, ..., 0] then class 1 has high probability of being class 1 vs not being class 1. Class 2 has low probability of being class 2 vs not being class 2, etc.
Ties
In the case of a tie, it is common (in research) to have a random function break the tie. If you get [1, 1, 1, ..., 1] then the function would pick a number from 1-10 and that is your prediction. In practice sometimes an expert system is used to break ties. Perhaps class 1 is more expensive than class 2, so we tie in preference to class 2.
Steps
So the steps are:
Split dataset into test/train set
Train weights on train set
Pass test set forward through the neural network
For each sample, choose the argmax (the output with highest value) as your prediction
In case of tie, choose randomly between all tying classes
Aside
In your particular case, I imagine implementation of this strategy will result in a network that barely beats random performance (10%) accuracy.
I would suggest some reconsidering of the network architecture.
If you look at your 5x7 images, can you tell what number that image was originally? It seems likely that scaling the image down to this size losses too much information that the network cannot distinguish between classes.
Debugging
From what you've described I would look at the following when debugging your network.
Is your data preprocessing (down-scaling) leeching out too much information? Check this by manually investigating a few of the images and seeing if you can tell what the image should be.
Does your one-hot algorithm work? When you convert your targets for training, does it successfully convert 1 -> [1, 0, 0, ..., 0]?
Is your back-prop / gradient descent algorithm correct? You should see (roughly) a monotonic decrease in your loss function while training. Try at every step (or every few steps) printing the loss that you are optimizing. Or even for a very simple gut check, print mean squared error: (P-Y)^2
I discovered neural netwaorks a few time ago and i now how they work
but i have a question: how can I train a NN countinue number patterns,
e.g. 0,2,4,6,8 as a input and 10,12,14,16,18 as output.
I dont know how to make this, but I thought about a input [0,1,2,3,4] and
output [0,2,4,6,8],what means 0 is the first number of the stream and 3 the third
Can this work or what are the other ways.
I'm trying to implement the proposed model in a CVPR paper (Deep Interactive Object Selection) in which the data set contains 5 channels for each input sample:
1.Red
2.Blue
3.Green
4.Euclidean distance map associated to positive clicks
5.Euclidean distance map associated to negative clicks (as follows):
To do so, I should fine tune the FCN-32s network using "object binary masks" as labels:
As you see, in the first conv layer I have 2 extra channels, so I did net surgery to use pretrained parameters for the first 3 channels and Xavier initialization for 2 extras.
For the rest of the FCN architecture, I have these questions:
Should I freeze all the layers before "fc6" (except the first conv layer)? If yes, how the extra channels of the first conv will be learned? Are the gradients strong enough to reach the first conv layer during training process?
What should be the kernel size of the "fc6"? should I keep 7? I saw in "Caffe net_surgery" notebook that it depends on the output size of the last layer ("pool5").
The main problem is the number of outputs of the "score_fr" and "upscore" layers, since I'm not doing class segmentation (to use 21 for 20 classes and the background), how should I change it? What about 2? (one for object and the other for the non-object (background) area)?
Should I change "crop" layer "offset" to 32 to have center crops?
In case of changing each of these layers, what is the best initialization strategy for them? "bilinear" for "upscore" and "Xavier" for the rest?
Should I convert my binary label matrix values into zero-centered ( {-0.5,0.5} ) status, or it is OK to use them with the values in {0,1} ?
Any useful idea will be appreciated.
PS:
I'm using Euclidean loss, while I'm using "1" as the number of outputs for "score_fr" and "upscore" layers. If I use 2 for that, I guess it should be softmax.
I can answer some of your questions.
The gradients will reach the first layer so it should be possible to learn the weights even if you freeze the other layers.
Change the num_output to 2 and finetune. You should get a good output.
I think you'll need to experiment with each of the options and see how the accuracy is.
You can use the values 0,1.
I have several datasets i.e. matrices that have a 2 columns, one with a matlab date number and a second one with a double value. Here an example set of one of them
>> S20_EavesN0x2DEAir(1:20,:)
ans =
1.0e+05 *
7.345016409722222 0.000189375000000
7.345016618055555 0.000181875000000
7.345016833333333 0.000177500000000
7.345017041666667 0.000172500000000
7.345017256944445 0.000168750000000
7.345017465277778 0.000166875000000
7.345017680555555 0.000164375000000
7.345017888888889 0.000162500000000
7.345018104166667 0.000161250000000
7.345018312500001 0.000160625000000
7.345018527777778 0.000158750000000
7.345018736111110 0.000160000000000
7.345018951388888 0.000159375000000
7.345019159722222 0.000159375000000
7.345019375000000 0.000160625000000
7.345019583333333 0.000161875000000
7.345019798611111 0.000162500000000
7.345020006944444 0.000161875000000
7.345020222222222 0.000160625000000
7.345020430555556 0.000160000000000
Now that I have those different sensor values, I need to get them together into a matrix, so that I could perform clustering, neural net and so on, the only problem is, that the sensor data was taken with slightly different timings or timestamps and there is nothing I can do about that from a data collection point of view.
My first thought was interpolation to make one sensor data set fit another one, but that seems like a messy approach and I was thinking maybe I am missing something, a toolbox or function that would enable me to do this quicker without me fiddling around. To even complicate things more, the number of sensors grew over time, therefore I am looking at different start dates as well.
Someone a good idea on how to go about this? Thanks
I think your first thought about interpolation was the correct one, at least if you plan to use NNs. Another option would be to use approaches which are designed to deal with missing data, like http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory for example.
It's hard to give an answer for the clustering part, because I have no idea what you're looking for in the data.
For the neural network, beside interpolating there are at least two other methods that come to mind:
training separate networks for each matrix
feeding them all together to the same network, with a flag specifying which matrix the data is coming from, i.e. something like: input (timestamp, flag_m1, flag_m2, ..., flag_mN) => target (value) where the flag_m* columns are mutually exclusive boolean values - i.e. flag_mK is 1 iff the line comes from matrix K, 0 otherwise.
These are the only things I can safely say with the amount of information you provided.