Genetic Algorithm A.I. repetitive behavior - neural-network

I am writing a C# Windows Forms Application which simulates a simple environment (grid) with two types of objects: plants and herbivores. The herbivores have neural networks which take contents the surrounding few cells as input that decide which direction to move in. The idea is to train the herbivores to eat the plants using a fitness function and a genetic algorithm.
My problem is that if there is nothing surrounding a herbivore, it will decide to move in a particular direction, then, if there is still nothing around it, it will move in the same direction again. What I end up with is a few herbivores that just move in strait lines and never actually encounter any plants at all.
Would adding a clock signal as an input (with each bit as an individual input to the neural network) change this behavior or is this not recommended? I have also thought about adding an input which is just random data (from a Gaussian distribution) to add some unpredictability, but I don't know if this would help or harm the problem. Another idea I am not sure about is if maybe having inputs for the past few moves (as a sort of memory) might solve this issue.

I think you need a Recurrent Network. You can keep track of the last N decisions the network has made and then use them as extra inputs to your network so it will have some sort of knowledge about where it was going and for how long. It could at some point evolve in such a way that it starts doing some sort of path finding.

What #Can_Alper said is definitely good. Also take a look at LSTM's.

Related

Reinforcement learning. Driving around objects with PPO

I am working on driving industrial robots with neural nets and so far it is working well. I am using the PPO algorithm from the OpenAI baseline and so far I can drive easily from point to point by using the following rewarding strategy:
I calculate the normalized distance between the target and the position. Then I calculate the distance reward with.
rd = 1-(d/dmax)^a
For each time step, I give the agent a penalty calculated by.
yt = 1-(t/tmax)*b
a and b are hyperparameters to tune.
As I said this works really well if I want to drive from point to point. But what if I want to drive around something? For my work, I need to avoid collisions and therefore the agent needs to drive around objects. If the object is not straight in the way of the nearest path it is working ok. Then the robot can adapt and drives around it. But it gets more and more difficult to impossible to drive around objects which are straight in the way.
See this image :
I already read a paper which combines PPO with NES to create some Gaussian noise for the parameters of the neural network but I can't implement it by myself.
Does anyone have some experience with adding more exploration to the PPO algorithm? Or does anyone have some general ideas on how I can improve my rewarding strategy?
What you describe is actually one of the most important research areas of Deep RL: the exploration problem.
The PPO algorithm (like many other "standard" RL algos) tries to maximise a return, which is a (usually discounted) sum of rewards provided by your environment:
In your case, you have a deceptive gradient problem, the gradient of your return points directly at your objective point (because your reward is the distance to your objective), which discourage your agent to explore other areas.
Here is an illustration of the deceptive gradient problem from this paper, the reward is computed like yours and as you can see, the gradient of your return function points directly to your objective (the little square in this example). If your agent starts in the bottom right part of the maze, you are very likely to be stuck in a local optimum.
There are many ways to deal with the exploration problem in RL, in PPO for example you can add some noise to your actions, some other approachs like SAC try to maximize both the reward and the entropy of your policy over the action space, but in the end you have no guarantee that adding exploration noise in your action space will result in efficient of your state space (which is actually what you want to explore, the (x,y) positions of your env).
I recommend you to read the Quality Diversity (QD) literature, which is a very promising field aiming to solve the exploration problem in RL.
Here is are two great resources:
A website gathering all informations about QD
A talk from ICLM 2019
Finally I want to add that the problem is not your reward function, you should not try to engineer a complex reward function such that your agent is able to behave like you want. The goal is to have an agent that is able to solve your environment despite pitfalls like the deceptive gradient problem.

How would I find the cost of a neural network without knowing the target output like for a game?

For example,
I want to create an AI that plays Ticktacktoe, this is how I would go about it.
I have 9 input nodes which is for each space on the board, 3 nodes for one hidden layer (which I'm guessing would somehow benefit the AI by having it select a row or column with 3 spaces), and then 9 output nodes to see where the AI would put its mark on the entire board.
I'm lost on how I would find the cost of this neural network because I don't know how I would judge its prediction and affect its weights and biases.
If I wanted the AI to play a guessing game, it would make sense since I have the correct answer and I can teach it to be more accurate based on how off it was to the actual answer.
(NOTE: I am very new to neural networks, so there may be a simple answer that I've missed somewhere)
So, I did some digging around and found a good introduction to reinforcement learning. This is the method that is used to train neural networks to achieve a goal without knowing an exact target like which move is good in a certain scenario. Backpropagation is not the only learning method, but so many sources only used this method without letting the viewer know of any other methods which confused me.
Going through this playlist right now: https://www.youtube.com/watch?v=2pWv7GOvuf0&index=1&list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT
Hope this will help someone getting started with neural networks!

neural network for sudoku solver

I recently started learning neural networks, and I thought that creating a sudoku solver would be a nice application for NN. I started learning them with backward propagation neural network, but later I figured that there are tens of neural networks. At this point, I find it hard to learn all of them and then pick an appropriate one for my purpose. Hence, I am asking what would be a good choice for creating this solver. Can back propagation NN work here? If not, can you explain why and tell me which one can work.
Thanks!
Neural networks don't really seem to be the best way to solve sudoku, as others have already pointed out. I think a better (but also not really good/efficient) way would be to use an genetic algorithm. Genetic algorithms don't directly relate to NNs but its very useful to know how they work.
Better (with better i mean more likely to be sussessful and probably better for you to learn something new) ideas would include:
If you use a library:
Play around with the networks, try to train them to different datasets, maybe random numbers and see what you get and how you have to tune the parameters to get better results.
Try to write an image generator. I wrote a few of them and they are stil my favourite projects, with one of them i used backprop to teach a NN what x/y coordinate of the image has which color, and the other aproach combines random generated images with ine another (GAN/NEAT).
Try to use create a movie (series of images) of the network learning to create a picture. It will show you very well how backprop works and what parameter tuning does to the results and how it changes how the network gets to the result.
If you are not using a library:
Try to solve easy problems, one after the other. Use backprop or a genetic algorithm for training (whatever you have implemented).
Try to improove your implementation and change some things that nobody else cares about and see how it changes the results.
List of 'tasks' for your Network:
XOR (basically the hello world of NN)
Pole balancing problem
Simple games like pong
More complex games like flappy bird, agar.io etc.
Choose more problems that you find interesting, maybe you are into image recognition, maybe text, audio, who knows. Think of something you can/would like to be able to do and find a way to make you computer do it for you.
It's not advisable to only use your own NN implemetation, since it will probably not work properly the first few times and you'll get frustratet. Experiment with librarys and your own implementation.
Good way to find almost endless resources:
Use google search and add 'filetype:pdf' in the end in order to only show pdf files. Search for neural network, genetic algorithm, evolutional neural network.
Neither neural nets not GAs are close to ideal solutions for Sudoku. I would advise to look into Constraint Programming (eg. the Choco or Gecode solver). See https://gist.github.com/marioosh/9188179 for example. Should solve any 9x9 sudoku in a matter of milliseconds (the daily Sudokus of "Le monde" journal are created using this type of technology BTW).
There is also a famous "Dancing links" algorithm for this problem by Knuth that works very well https://en.wikipedia.org/wiki/Dancing_Links
Just like was mentioned in the comments, you probably want to take a look at convolutional networks. You basically input the sudoku bord as an two dimensional 'image'. I think using a receptive field of 3x3 would be quite interesting, and I don't really think you need more than one filter.
The harder thing is normalization: the numbers 1-9 don't have an underlying relation in sudoku, you could easily replace them by A-I for example. So they are categories, not numbers. However, one-hot encoding every output would mean a lot of inputs, so i'd stick to numerical normalization (1=0.1, 2 = 0.2, etc.)
The output of your network should be a softmax with of some kind: if you don't use softmax, and instead outupt just an x and y coordinate, then you can't assure that the outputedd square has not been filled in yet.
A numerical value should be passed along with the output, to show what number the network wants to fill in.
As PLEXATIC mentionned, neural-nets aren't really well suited for these kind of task. Genetic algorithm sounds good indeed.
However, if you still want to stick with neural-nets you could have a look at https://github.com/Kyubyong/sudoku. As answered Thomas W, 3x3 looks nice.
If you don't want to deal with CNN, you could find some answers here as well. https://www.kaggle.com/dithyrambe/neural-nets-as-sudoku-solvers

Neural network for approximation function for board game

I am trying to make a neural network for approximation of some unkown function (for my neural network course). The problem is that this function has very many variables but many of them are not important (for example in [f(x,y,z) = x+y] z is not important). How could I design (and learn) network for this kind of problem?
To be more specific the function is an evaluation function for some board game with unkown rules and I need to somehow learn this rules by experience of the agent. After each move the score is given to the agent so actually it needs to find how to get max score.
I tried to pass the neighborhood of the agent to the network but there are too many variables which are not important for the score and agent is finding very local solutions.
If you have a sufficient amount of data, your ANN should be able to ignore the noisy inputs. You also may want to try other learning approaches like scaled conjugate gradient or simple heuristics like momentum or early stopping so your ANN isn't over learning the training data.
If you think there may be multiple, local solutions, and you think you can get enough training data, then you could try a "mixture of experts" approach. If you go with a mixture of experts, you should use ANNs that are too "small" to solve the entire problem to force it to use multiple experts.
So, you are given a set of states and actions and your target values are the score after the action is applied to the state? If this problem gets any hairier, it will sound like a reinforcement learning problem.
Does this game have discrete actions? Does it have a discrete state space? If so, maybe a decision tree would be worth trying?

Mapping Vision Outputs To Neural Network Inputs

I'm fairly new to MATLAB, but have acquainted myself with Simulink and Computer Vision over the past few days. My problem statement involves taking a traffic/highway video input and detecting if an accident has occurred.
I plan to do this by extracting the values of centroid to plot trajectory, velocity difference (between frames) and distance between two vehicles. I can successfully track the centroids, and aim to derive the rest of the features.
What I don't know is how to map these to ANN. I mean, every image has more than one vehicle blobs, which means, there are multiple centroids in a single frame/image. So, how does NN act on multiple inputs (the extracted features per vehicle) simultaneously? I am obviously missing the link. Help me figure it out please.
Also, am I looking at time series data?
I am not exactly sure about your question. The problem can be both time series data and not. You might be able to transform the time series version of the problem, such that it can be solved using ANN, but it is sort of a Maslow's hammer :). Also, Could you rephrase the problem.
As you said, you could give it features from two or three frames and then use the classifier to detect accident or not, but it might be difficult to train such a classifier. The problem is really difficult and the so you might need tons of training samples to get it right, esp really good negative samples (for examples cars travelling close to each other) etc.
There are multiple ways you can try to solve this problem of accident detection. For example : Build a classifier (ANN/SVM etc) to detect accidents without time series data. In which case your input would be accident images and non accident images or some sort of positive and negative samples for training and later images for test. In this specific case, you are not looking at the time series data. But here you might need lots of features to detect the same (this in some sense a single frame version of the problem).
The second method would be to use time series data, in which case you will have to detect the features, track the features (say using Lucas Kanade/Horn and Schunck) and then use the information about velocity and centroid to detect the accident. You might even be able to formulate it for HMMs.