Make a neural network continue number patterns

I discovered neural networks a while ago and I know how they work,
but I have a question: how can I train a NN to continue number patterns,
e.g. 0, 2, 4, 6, 8 as input and 10, 12, 14, 16, 18 as output?
I don't know how to build this, but I thought about using [0, 1, 2, 3, 4] as the input and
[0, 2, 4, 6, 8] as the output, meaning the input is a position in the stream (0 is the first number, 1 the second, and so on) and the output is the value at that position.
Can this work, or what are the other ways?
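One minimal way to set this up (a sketch of the position-to-value idea from the question; the layer sizes, training range, and use of Keras are my assumptions) is to treat it as a regression from position to value:

    import numpy as np
    from tensorflow import keras

    # Positions 0..19 as input, the pattern values 0, 2, 4, ... as targets.
    x = np.arange(20, dtype=np.float32).reshape(-1, 1)
    y = 2.0 * x

    model = keras.Sequential([
        keras.Input(shape=(1,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x, y, epochs=500, verbose=0)

    # Ask for the next positions; ideally close to 40 and 42.
    print(model.predict(np.array([[20.0], [21.0]])))

Note that a plain feed-forward network interpolates much better than it extrapolates, so predictions far beyond the training positions will drift; recurrent networks are the usual tool for open-ended sequence continuation.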

Related

Set Batch Size *and* Number of Training Iterations for a neural network?

I am using the KNIME Doc2Vec Learner node to build a Word Embedding. I know how Doc2Vec works. In KNIME I have the option to set the parameters
Batch Size: The number of words to use for each batch.
Number of Epochs: The number of epochs to train.
Number of Training Iterations: The number of updates done for each batch.
From Neural Networks I know that (lazily copied from https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network):
one epoch = one forward pass and one backward pass of all the training examples
batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).
As far as I understand it, it makes little sense to set both batch size and iterations, because one is determined by the other (given the data size, which is fixed by the circumstances). So why can I change both parameters?
This is not necessarily the case. You can also train "half epochs". For example, in Google's inceptionV3 pretrained script, you usually set the number of iterations and the batch size at the same time. This can lead to "partial epochs", which can be fine.
If it is a good idea or not to train half epochs may depend on your data. There is a thread about this but not a concluding answer.
I am not familiar with KNIME Doc2Vec, so I am not sure if the meaning is somewhat different there. But from the definitions you gave, setting batch size + iterations seems fine. Setting the number of epochs as well could cause conflicts, though, leading to situations where the numbers don't add up to a consistent combination.
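To make the arithmetic concrete (all numbers here are made up for illustration):

    import math

    n_examples = 10000
    batch_size = 64
    epochs = 3

    iters_per_epoch = math.ceil(n_examples / batch_size)  # 157 batches per epoch
    total_iterations = epochs * iters_per_epoch           # 471 updates in total

    # Setting total_iterations to, say, 400 instead would stop partway
    # through the third epoch: a "partial epoch" as described above.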

Connect 4 and neural networks

I'm trying to code a Connect 4 AI based on neural networks.
But it's the first time I've used an ANN, so I'm completely lost...
I found something that looks great, but I need some explanations...
Structure
I'm OK with the input, but I don't understand the number of hidden nodes.
Also, the output should be the number of the column in which to put the coin next turn, i.e. a value between 1 and 7.
But how could I get these values using a sigmoid function?
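One common encoding (my suggestion; the thread itself doesn't settle this) is to use seven output neurons, one per column, and pick the column whose neuron activates most strongly:

    import numpy as np

    # Hypothetical activations from 7 sigmoid output neurons, one per column.
    outputs = np.array([0.1, 0.8, 0.3, 0.05, 0.6, 0.2, 0.4])
    column = int(np.argmax(outputs)) + 1  # columns numbered 1..7
    print(column)  # -> 2

This sidesteps the problem of squeezing a 1-7 value out of a single sigmoid; if you want to read the outputs as move probabilities, apply a softmax first.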

Neural network to figure out the meaning of user input

I'm trying to make a neural network try to figure out the meaning of input(keyboard keys in this case) according to the user.
I have multiple possible output "commands" that the NN can interpret the inputs to mean, and at each state certain outputs count as beneficial while others are a detriment.
When the NN starts up for the first time, no input should have any particular meaning to it but as time goes on I want the NN to be able to figure out what the user most likely meant.
I've tried a multilayer perceptron NN that has as many input nodes as there are physical inputs, as many output nodes as there are commands, and a single hidden layer with a number of nodes equal to the sum of the other two; in this case that makes it a 5-15-10 network.
The NN assumes that the user will only make moves that are in the NN's best interest.
So far it seems the NN is just figuring out which command is most likely to result in a beneficial move, regardless of the input key, rather than figuring out which key should result in which move according to the user.
Because of this I'm wondering (most likely wrongly) if I should create a separate NN for each input to try to figure out the correct output according to the user.
Is there a different type of NN I should look into that will work better, and is there a recommended configuration for this problem?
I'll be happy with some recommendations of reading material that would help in this particular problem.
I'm at best an amateur in NN and would like to learn a lot more about the whole field, But I'm trying to focus my efforts on this problem for now.
As I understand it, you want the output to follow the behaviour of the player, since there are more possible inputs than in the straightforward case. So there should be some type of memory of the actions taken by the player in order to find patterns. This can be done using Long Short-Term Memory (LSTM) networks.
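A minimal sketch of that idea (the 5 inputs and 10 commands match the 5-15-10 network described in the question; the window length, layer width, and use of Keras are my assumptions):

    from tensorflow import keras

    n_keys = 5       # physical inputs, as in the question
    n_commands = 10  # possible output commands, as in the question
    seq_len = 8      # how many past key presses the network sees (assumed)

    model = keras.Sequential([
        keras.Input(shape=(seq_len, n_keys)),  # one-hot key presses over time
        keras.layers.LSTM(32),                 # the "memory" of past actions
        keras.layers.Dense(n_commands, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")

Each training example would then be a window of recent key presses labelled with the command that turned out to be beneficial.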

Transforming text with neural network

I previously worked with neural networks, but just for fun, mainly using them with normalized Categorical (enum), Numeric, and Bit (bool) values. I know NNs have trouble understanding characters, but I was wondering if they could learn to transform text.
So is it possible for example to train NN to do following:
13/20 = 20
aa/bb = bb
20/10 = 10
Or (replaced d with f)
abcde = abcfe
tdfg = tffg
ddhj = ffhj
If yes, how reliably? Or maybe there is something better suited for the job?
The question wasn't about implementation, but since I have just finished a lab on a similar topic, here are some practical tips:
It would be quite possible and easy to use an RNN to transform the text in that way. Clone this repository (https://github.com/karpathy/char-rnn) and train it on a sufficiently large file placed in data/folder/input.txt, in the same format you want the output:
abcde = abcfe
tdfg = tffg
ddhj = ffhj
use this command to train:
th train.lua -data_dir data/folder
When testing the network it should be able to produce a correct output based on the seed text you provide:
th sample.lua cv/[latest_sample] -primetext "abcd" -length 7
should be able to produce the output:
abcd = abcf
Everything depends on the complexity of the transformation. If you are interested in these examples, then, certainly, yes, this is doable and reliable. The second example is trivial: you just present the NN one character at a time, encode input and output as one-hot vectors (one neuron per character), and it will do the job. The first example can be solved by converting the left and right parts into a one-hot input vector representing one of all possible combinations of the two symbols, and having two outputs, asking the NN to select whether the first part or the second part should be chosen (better ways to encode the input exist, especially for long strings). Provided you have enough training examples, everything should work fine.
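For instance, the one-hot, character-at-a-time encoding for the second example could look like this (the alphabet size here is arbitrary, just for illustration):

    import numpy as np

    alphabet = "abcdefghij"  # small alphabet, just for illustration
    eye = np.eye(len(alphabet))
    onehot = {c: eye[i] for i, c in enumerate(alphabet)}

    # One training pair for the d -> f substitution task,
    # presented one character per time step.
    x = np.array([onehot[c] for c in "ddhj"])
    y = np.array([onehot[c] for c in "ffhj"])
    print(x.shape)  # (4, 10): 4 time steps, 10-dimensional one-hot vectors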
Generally, the days when it was prohibitively difficult for NNs to deal with text are long gone. NNs can now be trained to do machine translation (better than any other method) and even, to some extent, to predict the output of simple computer programs from the program's character string (but this is still a difficult task for NNs).

Newbie to Neural Networks

Just starting to play around with Neural Networks for fun after playing with some basic linear regression. I am an English teacher, so I don't have a math background, and trying to read a book on this stuff is way over my head. I thought this would be a better avenue to get some basic questions answered (even though I suspect there is no easy answer). Just looking for some general guidance put in layman's terms. I am using a trial version of an Excel add-in called NEURO XL. I apologize if these questions are too "elementary."
My first project is related to predicting a student's Verbal score on the SAT based on a number of test scores, GPA, practice exam scores, etc. as well as some qualitative data (gender: M=1, F=0; took SAT prep class: Y=1, N=0; plays varsity sports: Y=1, N=0).
In total, I have 21 variables that I would like to feed into the network, with the output being the actual score (200-800).
I have 9000 records of data spanning many years/students. Here are my questions:
How many records of the 9000 should I use to train the network?
1a. Should I completely randomize the selection of this training data or be more involved and make sure I include a variety of output scores and a wide range of each of the input variables?
If I split the data into even groups, say 9x1000 (or however many), created a network for each one, and then tested each of these 9 networks on the other 8 sets to see which had the lowest MSE across the samples, would this be a valid way to "choose" the best network if I wanted to predict the scores for my incoming students (not included in this data at all)?
Since the scores on the tests that I am using as inputs vary in scale (some are on 1-100, and others 1-20 for example), should I normalize all of the inputs to their respective z-scores? When is this recommended vs not recommended?
I am predicting the actual score, but in reality, I'm NOT that concerned about the exact score but more of a range. Would my network be more accurate if I grouped the output scores into buckets and then tried to predict this number instead of the actual score?
E.g.
750-800 = 10
700-740 = 9
etc.
Is there any benefit to doing this or should I just go ahead and try to predict the exact score?
What if ALL I cared about was whether or not the score was above or below 600. Would I then just make the output 0 (below 600) or 1 (above 600)?
5a. I read somewhere that it's not good to use 0 and 1, but instead 0.1 and 0.9 - why is that?
5b. What about -1 (below 600), 0 (exactly 600), 1 (above 600) - would this work?
5c. Would the network always output -1, 0, 1 - or would it output fractions that I would then have to round up or round down to finalize the prediction?
Once I have found the "best" network from Question #3, would I then play around with the different parameters (number of epochs, number of neurons in hidden layer, momentum, learning rate, etc.) to optimize this further?
6a. What about the Activation Function? Will log-sigmoid do the trick, or should I try the other options my software has as well (threshold, hyperbolic tangent, zero-based log-sigmoid)?
6b. What is the difference between log-sigmoid and zero-based log-sigmoid?
Thanks!
First a little bit of meta content about the question itself (and not about the answers to your questions).
I have to laugh a little that you say 'I apologize if these questions are too "elementary."' and then proceed to ask the single most thorough and well thought out question I've seen as someone's first post on SO.
I wouldn't be too worried that you'll have people looking down their noses at you for asking this stuff.
This is a pretty big question in terms of the depth and range of knowledge required, especially the statistical knowledge needed and familiarity with Neural Networks.
You may want to try breaking this up into several questions distributed across the different StackExchange sites.
Off the top of my head, some of it definitely belongs on the statistics StackExchange, Cross Validated: https://stats.stackexchange.com/
You might also want to try out https://datascience.stackexchange.com/ , a beta site specifically targeting machine learning and related areas.
That said, there is some of this that I think I can help to answer.
Anything I haven't answered is something I don't feel qualified to help you with.
Question 1
How many records of the 9000 should I use to train the network? 1a. Should I completely randomize the selection of this training data or be more involved and make sure I include a variety of output scores and a wide range of each of the input variables?
Randomizing the selection of training data is probably not a good idea.
Keep in mind that truly random data includes clusters.
A random selection of students could happen to consist solely of those who scored above a 30 on the ACT exams, which could potentially result in a bias in your result.
Likewise, if you only select students whose SAT scores were below 700, the classifier you build won't have any capacity to distinguish between a student expected to score 720 and a student expected to score 780 -- they'll look the same to the classifier because it was trained without the relevant information.
You want to ensure a representative sample of your different inputs and your different outputs.
Because you're dealing with input variables that may be correlated, you shouldn't try to do anything too complex in selecting this data, or you could mistakenly introduce another bias in your inputs.
Namely, you don't want to select a training data set that consists largely of outliers.
I would recommend trying to ensure that your inputs cover all possible values for all of the variables you are observing, and all possible results for the output (the SAT scores), without constraining how these requirements are satisfied.
I'm sure there are algorithms out there designed to do exactly this, but I don't know them myself -- possibly a good question in and of itself for Cross Validated.
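One standard tool for this kind of representative split (my addition, using scikit-learn; the answer above doesn't name a specific algorithm) is stratified sampling, which keeps the distribution of a chosen variable the same in the training and held-out sets:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # X: the 21 input variables; y: SAT verbal scores (hypothetical data).
    X = np.random.rand(9000, 21)
    y = np.random.randint(200, 801, size=9000)

    # Stratify on coarse score buckets so both sets cover the full range.
    buckets = y // 100
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=buckets, random_state=0
    )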
Question 3
Since the scores on the tests that I am using as inputs vary in scale (some are on 1-100, and others 1-20 for example), should I normalize all of the inputs to their respective z-scores? When is this recommended vs not recommended?
My understanding is that this is not recommended as the input to a Neural Network, but I may be wrong.
The convergence of the network should handle this for you.
Every node in the network will assign a weight to its inputs, multiply them by their weights, and sum those products as a core part of its computation.
That means that every node in the network is searching for some coefficients for each of their inputs.
To do this, all inputs will be converted to numeric values -- so conditions like gender will be translated into "0=MALE,1=FEMALE" or something similar.
For example, a node's metric might look like this at a given point in time:
2*ACT_SCORE + 0*GENDER + (-5)*VARSITY_SPORTS ...
The coefficients for each value are exactly what the network is searching for as it converges.
If you change the scale of a value, like ACT_SCORE, the coefficient the network finds simply changes by the reciprocal of that scaling factor.
The result should still be the same.
There are other concerns in terms of accuracy (computers have limited capacity to represent small fractions) and speed that may enter this, but not being familiar with NEURO XL, I can't say whether or not they apply for this technology.
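For reference, if you do decide to z-score an input, it is a one-liner (the scores below are made up):

    import numpy as np

    act_scores = np.array([22.0, 28.0, 31.0, 19.0, 25.0])  # hypothetical ACT scores
    z = (act_scores - act_scores.mean()) / act_scores.std()
    # z now has mean 0 and standard deviation 1, so inputs on a 1-100
    # scale and a 1-20 scale end up directly comparable.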
Question 4
I am predicting the actual score, but in reality, I'm NOT that concerned about the exact score but more of a range. Would my network be more accurate if I grouped the output scores into buckets and then tried to predict this number instead of the actual score?
This will reduce accuracy, although you should converge to a solution much faster with fewer possible outputs (scores).
Neural Networks actually describe very high-dimensional functions in their input variables.
If you reduce the granularity of that function's output space, you essentially state that you don't care about local minima and maxima in that function, especially around the borders between your output scores.
As a result, you are sacrificing information that may be an essential component of the "true" function that you are searching for.
I hope this has been helpful, but you really should break this question down into its many components and ask them separately on different sites -- potentially some of them do belong here on StackOverflow as well.