Aiming Ki with tensorflow.js - neural-network

I created a little game with p5js. In the game there is a ball which can be shot in an adjustable angel and power. The ball has to hit a target which has an random distance and height.
Now I want to build a Neuralnetwork with tensorflow.js which always hits the target. I know that this probleme can be solved by a simple algorithm but I want to learn tensorflow.js.
To train my model I created an dataset with 10000 entries. It consists of the distance and the height if the target and the values of power and angel to hit it. Than I used the model setup from Daniel shiffmanns color classifier and adjusted the inputs and outputs for my purpose. I maped the output onto the angel and power.
Than I trained it. But the loss didnt decrease.
After a bit of research I found out that the outputs add up to 1, therfore it makes no sence to use the normalized output and map it onto the distance and angel.
Now my questions are:
How can i get the angel and power from the output?
How many layers shoud it have?
Which activation and loss functions should I use?
Or is thevprobleme to complex for my first atempt with tensorflow.js?
I hope I described it clear.
Edit
I realised that my biggest probleme is the outputfunction. Can somebody give me a clue which I should use or how I can design one? I dont find anything to this topic on the tensorflow.js website

Related

Reinforcement learning. Driving around objects with PPO

I am working on driving industrial robots with neural nets and so far it is working well. I am using the PPO algorithm from the OpenAI baseline and so far I can drive easily from point to point by using the following rewarding strategy:
I calculate the normalized distance between the target and the position. Then I calculate the distance reward with.
rd = 1-(d/dmax)^a
For each time step, I give the agent a penalty calculated by.
yt = 1-(t/tmax)*b
a and b are hyperparameters to tune.
As I said this works really well if I want to drive from point to point. But what if I want to drive around something? For my work, I need to avoid collisions and therefore the agent needs to drive around objects. If the object is not straight in the way of the nearest path it is working ok. Then the robot can adapt and drives around it. But it gets more and more difficult to impossible to drive around objects which are straight in the way.
See this image :
I already read a paper which combines PPO with NES to create some Gaussian noise for the parameters of the neural network but I can't implement it by myself.
Does anyone have some experience with adding more exploration to the PPO algorithm? Or does anyone have some general ideas on how I can improve my rewarding strategy?
What you describe is actually one of the most important research areas of Deep RL: the exploration problem.
The PPO algorithm (like many other "standard" RL algos) tries to maximise a return, which is a (usually discounted) sum of rewards provided by your environment:
In your case, you have a deceptive gradient problem, the gradient of your return points directly at your objective point (because your reward is the distance to your objective), which discourage your agent to explore other areas.
Here is an illustration of the deceptive gradient problem from this paper, the reward is computed like yours and as you can see, the gradient of your return function points directly to your objective (the little square in this example). If your agent starts in the bottom right part of the maze, you are very likely to be stuck in a local optimum.
There are many ways to deal with the exploration problem in RL, in PPO for example you can add some noise to your actions, some other approachs like SAC try to maximize both the reward and the entropy of your policy over the action space, but in the end you have no guarantee that adding exploration noise in your action space will result in efficient of your state space (which is actually what you want to explore, the (x,y) positions of your env).
I recommend you to read the Quality Diversity (QD) literature, which is a very promising field aiming to solve the exploration problem in RL.
Here is are two great resources:
A website gathering all informations about QD
A talk from ICLM 2019
Finally I want to add that the problem is not your reward function, you should not try to engineer a complex reward function such that your agent is able to behave like you want. The goal is to have an agent that is able to solve your environment despite pitfalls like the deceptive gradient problem.

Error function and ReLu in a CNN

I'm trying to get a better understanding of neural networks by trying to programm a Convolution Neural Network by myself.
So far, I'm going to make it pretty simple by not using max-pooling and using simple ReLu-activation. I'm aware of the disadvantages of this setup, but the point is not making the best image detector in the world.
Now, I'm stuck understanding the details of the error calculation, propagating it back and how it interplays with the used activation-function for calculating the new weights.
I read this document (A Beginner's Guide To Understand CNN), but it doesn't help me understand much. The formula for calculating the error already confuses me.
This sum-function doesn't have defined start- and ending points, so i basically can't read it. Maybe you can simply provide me with the correct one?
After that, the author assumes a variable L that is just "that value" (i assume he means E_total?) and gives an example for how to define the new weight:
where W is the weights of a particular layer.
This confuses me, as i always stood under the impression the activation-function (ReLu in my case) played a role in how to calculate the new weight. Also, this seems to imply i simply use the error for all layers. Doesn't the error value i propagate back into the next layer somehow depends on what i calculated in the previous one?
Maybe all of this is just uncomplete and you can point me into the direction that helps me best for my case.
Thanks in advance.
You do not backpropagate errors, but gradients. The activation function plays a role in caculating the new weight, depending on whether or not the weight in question is before or after said activation, and whether or not it is connected. If a weight w is after your non-linearity layer f, then the gradient dL/dw wont depend on f. But if w is before f, then, if they are connected, then dL/dw will depend on f. For example, suppose w is the weight vector of a fully connected layer, and assume that f directly follows this layer. Then,
dL/dw=(dL/df)*df/dw //notations might change according to the shape
//of the tensors/matrices/vectors you chose, but
//this is just the chain rule
As for your cost function, it is correct. Many people write these formulas in this non-formal style so that you get the idea, but that you can adapt it to your own tensor shapes. By the way, this sort of MSE function is better suited to continous label spaces. You might want to use softmax or an svm loss for image classification (I'll come back to that). Anyway, as you requested a correct form for this function, here is an example. Imagine you have a neural network that predicts a vector field of some kind (like surface normals). Assume that it takes a 2d pixel x_i and predicts a 3d vector v_i for that pixel. Now, in your training data, x_i will already have a ground truth 3d vector (i.e label), that we'll call y_i. Then, your cost function will be (the index i runs on all data samples):
sum_i{(y_i-v_i)^t (y_i-vi)}=sum_i{||y_i-v_i||^2}
But as I said, this cost function works if the labels form a continuous space (here , R^3). This is also called a regression problem.
Here's an example if you are interested in (image) classification. I'll explain it with a softmax loss, the intuition for other losses is more or less similar. Assume we have n classes, and imagine that in your training set, for each data point x_i, you have a label c_i that indicates the correct class. Now, your neural network should produce scores for each possible label, that we'll note s_1,..,s_n. Let's note the score of the correct class of a training sample x_i as s_{c_i}. Now, if we use a softmax function, the intuition is to transform the scores into a probability distribution, and maximise the probability of the correct classes. That is , we maximse the function
sum_i { exp(s_{c_i}) / sum_j(exp(s_j))}
where i runs over all training samples, and j=1,..n on all class labels.
Finally, I don't think the guide you are reading is a good starting point. I recommend this excellent course instead (essentially the Andrew Karpathy parts at least).

Genetic Algorithm A.I. repetitive behavior

I am writing a C# Windows Forms Application which simulates a simple environment (grid) with two types of objects: plants and herbivores. The herbivores have neural networks which take contents the surrounding few cells as input that decide which direction to move in. The idea is to train the herbivores to eat the plants using a fitness function and a genetic algorithm.
My problem is that if there is nothing surrounding a herbivore, it will decide to move in a particular direction, then, if there is still nothing around it, it will move in the same direction again. What I end up with is a few herbivores that just move in strait lines and never actually encounter any plants at all.
Would adding a clock signal as an input (with each bit as an individual input to the neural network) change this behavior or is this not recommended? I have also thought about adding an input which is just random data (from a Gaussian distribution) to add some unpredictability, but I don't know if this would help or harm the problem. Another idea I am not sure about is if maybe having inputs for the past few moves (as a sort of memory) might solve this issue.
I think you need a Recurrent Network. You can keep track of the last N decisions the network has made and then use them as extra inputs to your network so it will have some sort of knowledge about where it was going and for how long. It could at some point evolve in such a way that it starts doing some sort of path finding.
What #Can_Alper said is definitely good. Also take a look at LSTM's.

Training a model for Latent-SVM

GOOD MORNING COLLEAGUES!
I am very into train a new model from my own data set of faces!
I have found no information about this topic, then I hope my information could help people and I can get some answers as well.
I will try to explain the steps I have needed to do to train my own model and later on some questions...
I have download the Latent code from: http://cs.brown.edu/~pff/latent-release4/
I have download the PASCAL VOC 2008 code (devkit) from: http://host.robots.ox.ac.uk/pascal/VOC/voc2008/index.html
I have emulate the structure of files/folders of the VOC PASCAL but in my own data set:
Annotations. I have created a .xml where I have defined a object, face, (in each image I only have one face). I didn't define difficulties or poses...
JPEGImages where I have stored all the images
ImageSets where I have defined three files:
test.txt, where I wrote the file name of my positive samples
train.txt, where I wrote the file name of my negative samples
trainval.txt, where I wrote the file name of my positive samples (exactly the same file than test.txt).
I have change some things in globals.m and VOCinit.m (to tell the algorithm the path and the location of some files...)
Then I run the training with the command: pascal('face', 1);
Following these steps I have achieved that the training run completely and doesn't fail and I get my own model BUT I have some doubts...
Can you see anything weird in my explanation? Could it work?
Must the files test.txt/trainval.txt be equal? Why... What does it mean?
Do I have to choose the number of parts I want in the model INSIDE the function?
Please, you imagine I have two kind of samples (frontal faces and side faces) and I want to detect both... How can I address this issue? I thought I have to train a model with two components... but How can I tell to the training code which are frontal or side samples?? In the annotations with the label pose?? (I don't think so...) Are there other way to handle this purpose?
Thank you for your time!!
I hope you can solve my doubts :)
I think test.txt should contain samples (images) that will be used to estimate how good the system is after learning the faces. However, trainval.txt is used during the learning stage (training) to fine-tune the parameters of the model; it is an essential part of supervised learning.
Also, it is very hard to have one single SVM to classify faces that are both frontal and sideways. Here is my suggestion:
Train one SVM to detect if the input image is a frontal face or a sideways face. Call this something like SVM-0.
Train another SVM for frontal faces. This SVM will classify all your individuals. Note, however, that SVM is usually a binary classifier, so make sure you choose the right SVM, one that as a multiclass architecture. Call this SVM-F.
Tran a final SVM for sideways faces. Again, use a multiclass SVM. Call it SVM-S.
Present the input image to SVM-0 and if it detects it is a frontal face, present the input again to SVM-F; otherwise, give the input to SVM-S.
In my experience, you should expect very low performance in SVM-S. It is a hard problem to solve. But frontal faces is not a big deal, unless you are working with faces that vary in pose, illumination, and expression (PIE). Face recognition is affected greatly with PIE variations in the images.
I recommend you this website, it contains very good information and tutorials for starters, with or without experience.

Mapping Vision Outputs To Neural Network Inputs

I'm fairly new to MATLAB, but have acquainted myself with Simulink and Computer Vision over the past few days. My problem statement involves taking a traffic/highway video input and detecting if an accident has occurred.
I plan to do this by extracting the values of centroid to plot trajectory, velocity difference (between frames) and distance between two vehicles. I can successfully track the centroids, and aim to derive the rest of the features.
What I don't know is how to map these to ANN. I mean, every image has more than one vehicle blobs, which means, there are multiple centroids in a single frame/image. So, how does NN act on multiple inputs (the extracted features per vehicle) simultaneously? I am obviously missing the link. Help me figure it out please.
Also, am I looking at time series data?
I am not exactly sure about your question. The problem can be both time series data and not. You might be able to transform the time series version of the problem, such that it can be solved using ANN, but it is sort of a Maslow's hammer :). Also, Could you rephrase the problem.
As you said, you could give it features from two or three frames and then use the classifier to detect accident or not, but it might be difficult to train such a classifier. The problem is really difficult and the so you might need tons of training samples to get it right, esp really good negative samples (for examples cars travelling close to each other) etc.
There are multiple ways you can try to solve this problem of accident detection. For example : Build a classifier (ANN/SVM etc) to detect accidents without time series data. In which case your input would be accident images and non accident images or some sort of positive and negative samples for training and later images for test. In this specific case, you are not looking at the time series data. But here you might need lots of features to detect the same (this in some sense a single frame version of the problem).
The second method would be to use time series data, in which case you will have to detect the features, track the features (say using Lucas Kanade/Horn and Schunck) and then use the information about velocity and centroid to detect the accident. You might even be able to formulate it for HMMs.