How does the cross-entropy error function work in an ordinary back-propagation algorithm? - neural-network

I'm working on a feed-forward backpropagation network in C++ but cannot seem to make it work properly. The network I'm basing mine on is using the cross-entropy error function. However, I'm not very familiar with it and even though I'm trying to look it up I'm still not sure. Sometimes it seems easy, sometimes difficult. The network will solve a multinomial classification problem and as far as I understand, the cross-entropy error function is suitable for these cases.
Someone that knows how it works?

Ah yes, good 'ole backpropagation. The joy of it is that it doesn't really matter (implementation wise) what error function you use, so long as it differentiable. Once you know how to calculate the cross entropy for each output unit (see the wiki article), you simply take the partial derivative of that function to find the weights for the hidden layer, and once again for the input layer.
However, if your question isn't about implementation, but rather about training difficulties, then you have your work cut out for you. Different error functions are good at different things (best to just reason it out based on the error function's definition) and this problem is compounded by other parameters like learning rates.
Hope that helps, let me know if you need any other info; your question was a lil vague...

Related

Activation functions - Neural Network

I am working with neural network in my freetime.
I developed already an easy XOR-Operation with a neural network.
But I dont know when I should use the correct activations function.
Is there an trick or is it just math logic?
There are a lot of options of activation functions such as identity, logistic, tanh, Relu, etc.
The choice of the activation function can be based on the gradient computation (back-propagation). E.g. logistic function is always differentiable but it kind of saturate when the input has large value and therefore slows down the speed of optimization. In this case Relu is prefered over logistic.
Above is only one simple example for the choise of activation function. It really depends on the actual situation.
Besides, I dont think the activation functions used in XOR neural network is representative in more complex application.
The subject of when to use a particular activation function over another is a subject of ongoing academic research. You can find papers related to it by searching for journal articles related to "neural network activation function" in an academic database, or through a Google Scholar search, such as this one:
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C2&q=neural+network+activation+function&btnG=&oq=neural+network+ac
Generally, which function to use depends mostly on what you are trying to do. An activation function is like a lens. You put input into your network, and it comes out changed or focused in some way by the activation function. How your input should be changed depends on what you are trying to achieve. You need to think of your problem, then figure out what function will help you shape your signal into the results you are trying to approximate.
Ask yourself, what is the shape of the data you are trying to model? If it is linear or approximately so, then a linear activation function will suffice. If it is more "step-shaped," you would want to use something like Sigmoid or Tanh (the Tanh function is actually just a scaled Sigmoid), because their graphs exhibit a similar shape. In the case of your XOR problem, we know that a either of those--which work by pushing the output closer to the [-1, 1] range--will work quite well. If you need something that doesn't flatten out away from zero like those two do, the ReLU function might be a good choice (in fact ReLU is probably the most popular activation function these days, and deserves far more serious study than this answer can provide).
You should analyze the graph of each one of these functions and think about the effects each will have on your data. You know the data you will be putting in. When that data goes through the function, what will come out? Will that particular function help you get the output you want? If so, it is a good choice.
Furthermore, if you have a graph of some data with a really interesting shape that corresponds to some other function you know, feel free to use that one and see how it works! Some of ANN design is about understanding, but other parts (at least currently) are intuition.
You can solve you problem with a sigmoid neurons in this case the activation function is:
https://chart.googleapis.com/chart?cht=tx&chl=%5Csigma%20%5Cleft%20(%20z%20%5Cright%20)%20%3D%20%5Cfrac%7B1%7D%7B1%2Be%5E%7B-z%7D%7D
Where:
https://chart.googleapis.com/chart?cht=tx&chl=z%20%3D%20%5Csum_%7Bj%7D%20(w_%7Bj%7Dx_%7Bj%7D%2Bb)
In this formula w there are the weights for each input, b is the bias and x there are the inputs, finally you can use back-propagation for calculate the cost function.

neural network for sudoku solver

I recently started learning neural networks, and I thought that creating a sudoku solver would be a nice application for NN. I started learning them with backward propagation neural network, but later I figured that there are tens of neural networks. At this point, I find it hard to learn all of them and then pick an appropriate one for my purpose. Hence, I am asking what would be a good choice for creating this solver. Can back propagation NN work here? If not, can you explain why and tell me which one can work.
Thanks!
Neural networks don't really seem to be the best way to solve sudoku, as others have already pointed out. I think a better (but also not really good/efficient) way would be to use an genetic algorithm. Genetic algorithms don't directly relate to NNs but its very useful to know how they work.
Better (with better i mean more likely to be sussessful and probably better for you to learn something new) ideas would include:
If you use a library:
Play around with the networks, try to train them to different datasets, maybe random numbers and see what you get and how you have to tune the parameters to get better results.
Try to write an image generator. I wrote a few of them and they are stil my favourite projects, with one of them i used backprop to teach a NN what x/y coordinate of the image has which color, and the other aproach combines random generated images with ine another (GAN/NEAT).
Try to use create a movie (series of images) of the network learning to create a picture. It will show you very well how backprop works and what parameter tuning does to the results and how it changes how the network gets to the result.
If you are not using a library:
Try to solve easy problems, one after the other. Use backprop or a genetic algorithm for training (whatever you have implemented).
Try to improove your implementation and change some things that nobody else cares about and see how it changes the results.
List of 'tasks' for your Network:
XOR (basically the hello world of NN)
Pole balancing problem
Simple games like pong
More complex games like flappy bird, agar.io etc.
Choose more problems that you find interesting, maybe you are into image recognition, maybe text, audio, who knows. Think of something you can/would like to be able to do and find a way to make you computer do it for you.
It's not advisable to only use your own NN implemetation, since it will probably not work properly the first few times and you'll get frustratet. Experiment with librarys and your own implementation.
Good way to find almost endless resources:
Use google search and add 'filetype:pdf' in the end in order to only show pdf files. Search for neural network, genetic algorithm, evolutional neural network.
Neither neural nets not GAs are close to ideal solutions for Sudoku. I would advise to look into Constraint Programming (eg. the Choco or Gecode solver). See https://gist.github.com/marioosh/9188179 for example. Should solve any 9x9 sudoku in a matter of milliseconds (the daily Sudokus of "Le monde" journal are created using this type of technology BTW).
There is also a famous "Dancing links" algorithm for this problem by Knuth that works very well https://en.wikipedia.org/wiki/Dancing_Links
Just like was mentioned in the comments, you probably want to take a look at convolutional networks. You basically input the sudoku bord as an two dimensional 'image'. I think using a receptive field of 3x3 would be quite interesting, and I don't really think you need more than one filter.
The harder thing is normalization: the numbers 1-9 don't have an underlying relation in sudoku, you could easily replace them by A-I for example. So they are categories, not numbers. However, one-hot encoding every output would mean a lot of inputs, so i'd stick to numerical normalization (1=0.1, 2 = 0.2, etc.)
The output of your network should be a softmax with of some kind: if you don't use softmax, and instead outupt just an x and y coordinate, then you can't assure that the outputedd square has not been filled in yet.
A numerical value should be passed along with the output, to show what number the network wants to fill in.
As PLEXATIC mentionned, neural-nets aren't really well suited for these kind of task. Genetic algorithm sounds good indeed.
However, if you still want to stick with neural-nets you could have a look at https://github.com/Kyubyong/sudoku. As answered Thomas W, 3x3 looks nice.
If you don't want to deal with CNN, you could find some answers here as well. https://www.kaggle.com/dithyrambe/neural-nets-as-sudoku-solvers

Matlab - output of the algorithm

I have a program using PSO algorithm using penalty function for Constraint Satisfaction. But when I run the program for different iterations, the output of the algorithm would be :
"Iteration 1: Best Cost = Inf"
.
Does anyone know why I always get inf answer?
There could be many reasons for that, none of which will be accurate if you don't provide a MWE with the code you have already tried or a context of the function you are analysing.
For instance, while studying the PSO algorithm you might use it on functions which have analytical solutions first. By doing this you can study the behaviour of the algorithm before applying to a similar problem, and fine tune its parameters.
My guess is that you might not be providing either the right function (I have done that already, getting a signal wrong is easy!), the right constraints (same logic applies), your weights for the penalty function and velocity update are way off.

Alternatives to FMINCON

Are there any faster and more efficient solvers other than fmincon? I'm using fmincon for a specific problem and I run out of memory for modest sized vector variable. I don't have any supercomputers or cloud computing options at my disposal, either. I know that any alternate solution will still run out of memory but I'm just trying to see where the problem is.
P.S. I don't want a solution that would change the way I'm approaching the actual problem. I know convex optimization is the way to go and I have already done enough work to get up until here.
P.P.S I saw the other question regarding the open source alternatives. That's not what I'm looking for. I'm looking for more efficient ones, if someone faced the same problem adn shifted to a better solver.
Hmmm...
Without further information, I'd guess that fmincon runs out of memory because it needs the Hessian (which, given that your decision variable is 10^4, will be 10^4 x numel(f(x1,x2,x3,....)) large).
It also takes a lot of time to determine the values of the Hessian, because fmincon normally uses finite differences for that if you don't specify derivatives explicitly.
There's a couple of things you can do to speed things up here.
If you know beforehand that there will be a lot of zeros in your Hessian, you can pass sparsity patterns of the Hessian matrix via HessPattern. This saves a lot of memory and computation time.
If it is fairly easy to come up with explicit formulae for the Hessian of your objective function, create a function that computes the Hessian and pass it on to fmincon via the HessFcn option in optimset.
The same holds for the gradients. The GradConstr (for your non-linear constraint functions) and/or GradObj (for your objective function) apply here.
There's probably a few options I forgot here, that could also help you. Just go through all the options in the optimization toolbox' optimset and see if they could help you.
If all this doesn't help, you'll really have to switch optimizers. Given that fmincon is the pride and joy of MATLAB's optimization toolbox, there really isn't anything much better readily available, and you'll have to search elsewhere.
TOMLAB is a very good commercial solution for MATLAB. If you don't mind going to C or C++...There's SNOPT (which is what TOMLAB/SNOPT is based on). And there's a bunch of things you could try in the GSL (although I haven't seen anything quite as advanced as SNOPT in there...).
I don't know on what version of MATLAB you have, but I know for a fact that in R2009b (and possibly also later), fmincon has a few real weaknesses for certain types of problems. I know this very well, because I once lost a very prestigious competition (the GTOC) because of it. Our approach turned out to be exactly the same as that of the winners, except that they had access to SNOPT which made their few-million variable optimization problem converge in a couple of iterations, whereas fmincon could not be brought to converge at all, whatever we tried (and trust me, WE TRIED). To this day I still don't know exactly why this happens, but I verified it myself when I had access to SNOPT. Once, when I have an infinite amount of time, I'll find this out and report this to the MathWorks. But until then...I lost a bit of trust in fmincon :)

How to optimize neural network by using genetic algorithm?

I'm quite new with this topic so any help would be great. What I need is to optimize a neural network in MATLAB by using GA. My network has [2x98] input and [1x98] target, I've tried consulting MATLAB help but I'm still kind of clueless about what to do :( so, any help would be appreciated. Thanks in advance.
Edit: I guess I didn't say what is there to be optimized as Dan said in the 1st answer. I guess most important thing is number of hidden neurons. And maybe number of hidden layers and training parameters like number of epochs or so. Sorry for not providing enough info, I'm still learning about this.
If this is a homework assignment, do whatever you were taught in class.
Otherwise, ditch the MLP entirely. Support vector regression ( http://www.csie.ntu.edu.tw/~cjlin/libsvm/ ) is much more reliably trainable across a broad swath of problems, and pretty much never runs into the stuck-in-a-local-minima problem often hit with back-propagation trained MLP which forces you to solve a network topography optimization problem just to find a network which will actually train.
well, you need to be more specific about what you are trying to optimize. Is it the size of the hidden layer? Do you have a hidden layer? Is it parameter optimization (learning rate, kernel parameters)?
I assume you have a set of parameters (# of hidden layers, # of neurons per layer...) that needs to be tuned, instead of brute-force searching all combinations to pick a good one, GA can help you "jump" from this combination to another one. So, you can "explore" the search space for potential candidates.
GA can help in selecting "helpful" features. Some features might appear redundant and you want to prune them. However, say, data has too many features to search for the best set of features by some approaches such as forward selection. Again, GA can "jump" from this set candidate to another one.
You will need to find away to encode the data (input parameters, features...) fed to GA. For finding a set of input paras or a good set of features, I think binary encoding should work. In addition, choosing operators for GA to reproduce offsprings is also important. Yet GA needs to be tuned, too (early stopping which can also be applied to ANN).
Here are just some ideas. You might want to search for more info about GA, feature selection, ANN pruning...
Since you're using MATLAB already I suggest you look into the Genetic Algorithms solver (known as GATool, part of the Global Optimization Toolbox) and the Neural Network Toolbox. Between those two you should be able to save quite a bit of figuring out.
You'll basically have to do 2 main tasks:
Come up with a representation (or encoding) for your candidate solutions
Code your fitness function (which basically tests candidate solutions) and pass it as a parameter to the GA solver.
If you need help in terms of coming up with a fitness function, or encoding of candidate solutions then you'll have to be more specific.
Hope it helps.
Matlab has a simple but great explanation for this problem here. It explains both the ANN and GA part.
For more info on using ANN in command line see this.
There is also plenty of litterature on the subject if you google it. It is however not related to MATLAB, but simply the results and the method.
Look up Matthew Settles on Google Scholar. He did some work in this area at the University of Idaho in the last 5-6 years. He should have citations relevant to your work.