How to optimize neural network by using genetic algorithm? - neural-network

I'm quite new with this topic so any help would be great. What I need is to optimize a neural network in MATLAB by using GA. My network has [2x98] input and [1x98] target, I've tried consulting MATLAB help but I'm still kind of clueless about what to do :( so, any help would be appreciated. Thanks in advance.
Edit: I guess I didn't say what is there to be optimized as Dan said in the 1st answer. I guess most important thing is number of hidden neurons. And maybe number of hidden layers and training parameters like number of epochs or so. Sorry for not providing enough info, I'm still learning about this.

If this is a homework assignment, do whatever you were taught in class.
Otherwise, ditch the MLP entirely. Support vector regression ( http://www.csie.ntu.edu.tw/~cjlin/libsvm/ ) is much more reliably trainable across a broad swath of problems, and pretty much never runs into the stuck-in-a-local-minima problem often hit with back-propagation trained MLP which forces you to solve a network topography optimization problem just to find a network which will actually train.

well, you need to be more specific about what you are trying to optimize. Is it the size of the hidden layer? Do you have a hidden layer? Is it parameter optimization (learning rate, kernel parameters)?

I assume you have a set of parameters (# of hidden layers, # of neurons per layer...) that needs to be tuned, instead of brute-force searching all combinations to pick a good one, GA can help you "jump" from this combination to another one. So, you can "explore" the search space for potential candidates.
GA can help in selecting "helpful" features. Some features might appear redundant and you want to prune them. However, say, data has too many features to search for the best set of features by some approaches such as forward selection. Again, GA can "jump" from this set candidate to another one.
You will need to find away to encode the data (input parameters, features...) fed to GA. For finding a set of input paras or a good set of features, I think binary encoding should work. In addition, choosing operators for GA to reproduce offsprings is also important. Yet GA needs to be tuned, too (early stopping which can also be applied to ANN).
Here are just some ideas. You might want to search for more info about GA, feature selection, ANN pruning...

Since you're using MATLAB already I suggest you look into the Genetic Algorithms solver (known as GATool, part of the Global Optimization Toolbox) and the Neural Network Toolbox. Between those two you should be able to save quite a bit of figuring out.
You'll basically have to do 2 main tasks:
Come up with a representation (or encoding) for your candidate solutions
Code your fitness function (which basically tests candidate solutions) and pass it as a parameter to the GA solver.
If you need help in terms of coming up with a fitness function, or encoding of candidate solutions then you'll have to be more specific.
Hope it helps.

Matlab has a simple but great explanation for this problem here. It explains both the ANN and GA part.
For more info on using ANN in command line see this.
There is also plenty of litterature on the subject if you google it. It is however not related to MATLAB, but simply the results and the method.

Look up Matthew Settles on Google Scholar. He did some work in this area at the University of Idaho in the last 5-6 years. He should have citations relevant to your work.

Related

neural network for sudoku solver

I recently started learning neural networks, and I thought that creating a sudoku solver would be a nice application for NN. I started learning them with backward propagation neural network, but later I figured that there are tens of neural networks. At this point, I find it hard to learn all of them and then pick an appropriate one for my purpose. Hence, I am asking what would be a good choice for creating this solver. Can back propagation NN work here? If not, can you explain why and tell me which one can work.
Thanks!
Neural networks don't really seem to be the best way to solve sudoku, as others have already pointed out. I think a better (but also not really good/efficient) way would be to use an genetic algorithm. Genetic algorithms don't directly relate to NNs but its very useful to know how they work.
Better (with better i mean more likely to be sussessful and probably better for you to learn something new) ideas would include:
If you use a library:
Play around with the networks, try to train them to different datasets, maybe random numbers and see what you get and how you have to tune the parameters to get better results.
Try to write an image generator. I wrote a few of them and they are stil my favourite projects, with one of them i used backprop to teach a NN what x/y coordinate of the image has which color, and the other aproach combines random generated images with ine another (GAN/NEAT).
Try to use create a movie (series of images) of the network learning to create a picture. It will show you very well how backprop works and what parameter tuning does to the results and how it changes how the network gets to the result.
If you are not using a library:
Try to solve easy problems, one after the other. Use backprop or a genetic algorithm for training (whatever you have implemented).
Try to improove your implementation and change some things that nobody else cares about and see how it changes the results.
List of 'tasks' for your Network:
XOR (basically the hello world of NN)
Pole balancing problem
Simple games like pong
More complex games like flappy bird, agar.io etc.
Choose more problems that you find interesting, maybe you are into image recognition, maybe text, audio, who knows. Think of something you can/would like to be able to do and find a way to make you computer do it for you.
It's not advisable to only use your own NN implemetation, since it will probably not work properly the first few times and you'll get frustratet. Experiment with librarys and your own implementation.
Good way to find almost endless resources:
Use google search and add 'filetype:pdf' in the end in order to only show pdf files. Search for neural network, genetic algorithm, evolutional neural network.
Neither neural nets not GAs are close to ideal solutions for Sudoku. I would advise to look into Constraint Programming (eg. the Choco or Gecode solver). See https://gist.github.com/marioosh/9188179 for example. Should solve any 9x9 sudoku in a matter of milliseconds (the daily Sudokus of "Le monde" journal are created using this type of technology BTW).
There is also a famous "Dancing links" algorithm for this problem by Knuth that works very well https://en.wikipedia.org/wiki/Dancing_Links
Just like was mentioned in the comments, you probably want to take a look at convolutional networks. You basically input the sudoku bord as an two dimensional 'image'. I think using a receptive field of 3x3 would be quite interesting, and I don't really think you need more than one filter.
The harder thing is normalization: the numbers 1-9 don't have an underlying relation in sudoku, you could easily replace them by A-I for example. So they are categories, not numbers. However, one-hot encoding every output would mean a lot of inputs, so i'd stick to numerical normalization (1=0.1, 2 = 0.2, etc.)
The output of your network should be a softmax with of some kind: if you don't use softmax, and instead outupt just an x and y coordinate, then you can't assure that the outputedd square has not been filled in yet.
A numerical value should be passed along with the output, to show what number the network wants to fill in.
As PLEXATIC mentionned, neural-nets aren't really well suited for these kind of task. Genetic algorithm sounds good indeed.
However, if you still want to stick with neural-nets you could have a look at https://github.com/Kyubyong/sudoku. As answered Thomas W, 3x3 looks nice.
If you don't want to deal with CNN, you could find some answers here as well. https://www.kaggle.com/dithyrambe/neural-nets-as-sudoku-solvers

Determine function parameters with neural network

I am currently studying a doctoral thesis in control theory. At the end of every chapter there is a simulation of a relative-with-the-subject problem. I have finished the theory,but for further understanding I would like to reproduce the simulations. The first simulation is as follows :
The solution of the problem concludes in a system of differential equations whose right hand side consists of functions with unknown parameters. The author states the following : "We will use neural networks with one hidden layer,sigmoid basis functions and 5 weights in the external layer in order to approximate every parameter of the unknown functions.More specifically, the weights of the hidden layer are selected through iterative trials and are kept stable during the simulation." And then he states the logic with which he selects the initial values of the unknown parameters and then shows the results of the simulation.
Could anyone give me a lead on where to look and what I need to know in order to solve this specific problem myself in MATLAB (since this is the environment I am most familiar with)? Because the results of a google search are chaotic since I don't really know what I'm looking for.
If you need any more info,feel free to ask!
You can try MATLAB's Neural Network Toolbox. This gives you an nice UI where you can configure the network, train it with data to find the parameter values and test for performance. No coding involved.
Or, you can program it by hand. Since you are working with one hidden layer, it should be very simple. I am sure any machine learning or neural net (NN) textbook would have one example of it. You can also look into GitHib for projects. There should be many NN projects there, in case you are looking to salvage code from existing project.
Most importantly, you should start by learning about NN, if you haven't done that already. NN with single hidden layer is easy to implement once you understand the equations for the forward and back propagation.

How do I decide which Neural Network and learning method to use in a particular case?

I am new in neural networks and I need to determine the pattern among a given set of inputs and outputs. So how do I decide which neural network to use for training or even which learning method to use? I have little idea about the pattern or relation between the given input and outputs.
Any sort of help will be appreciated. If you want me to read some stuff then it would be great if links are provided.
If any more info is needed plz say so.
Thanks.
Choosing the right neural networks is something of an art form. It's a bit difficult to give generic suggestions as the best NN for a situation will depend on the problem at hand. As with many of these problems neural netowrks may or may not be the best solution. I'd highly recommned trying out different networks and testing their performance vs a testing data set. When I did this I usually used the ANN tools though the R software package.
Also keep your mind open to other statistical learning techniques as well, things like decision trees and Support Vector Machines may be a better choice for some problems.
I'd suggest the following books:
http://www.amazon.com/Neural-Networks-Pattern-Recognition-Christopher/dp/0198538642
http://www.stats.ox.ac.uk/~ripley/PRbook/#Contents

How does the cross-entropy error function work in an ordinary back-propagation algorithm?

I'm working on a feed-forward backpropagation network in C++ but cannot seem to make it work properly. The network I'm basing mine on is using the cross-entropy error function. However, I'm not very familiar with it and even though I'm trying to look it up I'm still not sure. Sometimes it seems easy, sometimes difficult. The network will solve a multinomial classification problem and as far as I understand, the cross-entropy error function is suitable for these cases.
Someone that knows how it works?
Ah yes, good 'ole backpropagation. The joy of it is that it doesn't really matter (implementation wise) what error function you use, so long as it differentiable. Once you know how to calculate the cross entropy for each output unit (see the wiki article), you simply take the partial derivative of that function to find the weights for the hidden layer, and once again for the input layer.
However, if your question isn't about implementation, but rather about training difficulties, then you have your work cut out for you. Different error functions are good at different things (best to just reason it out based on the error function's definition) and this problem is compounded by other parameters like learning rates.
Hope that helps, let me know if you need any other info; your question was a lil vague...

Choose the right classification algorithm. Linear or non-linear? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I find this question a little tricky. Maybe someone knows an approach to answer this question. Imagine that you have a dataset(training data) which you don't know what it is about. Which features of training data would you look at in order to infer classification algorithm to classify this data? Can we say anything whether we should use a non-linear or linear classification algorithm?
By the way, I am using WEKA to analyze the data.
Any suggestions?
Thank you.
This is in fact two questions in one ;-)
Feature selection
Linear or not
add "algorithm selection", and you probably have three most fundamental questions of classifier design.
As an aside note, it's a good thing that you do not have any domain expertise which would have allowed you to guide the selection of features and/or to assert the linearity of the feature space. That's the fun of data mining : to infer such info without a priori expertise. (BTW, and while domain expertise is good to double-check the outcome of the classifier, too much a priori insight may make you miss good mining opportunities). Without any such a priori knowledge you are forced to establish sound methodologies and apply careful scrutiny to the results.
It's hard to provide specific guidance, in part because many details are left out in the question, and also because I'm somewhat BS-ing my way through this ;-). Never the less I hope the following generic advice will be helpful
For each algorithm you try (or more precisely for each set of parameters for a given algorithm), you will need to run many tests. Theory can be very helpful, but there will remain a lot of "trial and error". You'll find Cross-Validation a valuable technique.
In a nutshell, [and depending on the size of the available training data], you randomly split the training data in several parts and train the classifier on one [or several] of these parts, and then evaluate the classifier on its performance on another [or several] parts. For each such run you measure various indicators of performance such as Mis-Classification Error (MCE) and aside from telling you how the classifier performs, these metrics, or rather their variability will provide hints as to the relevance of the features selected and/or their lack of scale or linearity.
Independently of the linearity assumption, it is useful to normalize the values of numeric features. This helps with features which have an odd range etc.
Within each dimension, establish the range within, say, 2.5 standard deviations on either side of the median, and convert the feature values to a percentage on the basis of this range.
Convert nominal attributes to binary ones, creating as many dimensions are there are distinct values of the nominal attribute. (I think many algorithm optimizers will do this for you)
Once you have identified one or a few classifiers with a relatively decent performance (say 33% MCE), perform the same test series, with such a classifier by modifying only one parameter at a time. For example remove some features, and see if the resulting, lower dimensionality classifier improves or degrades.
The loss factor is a very sensitive parameter. Try and stick with one "reasonnable" but possibly suboptimal value for the bulk of the tests, fine tune the loss at the end.
Learn to exploit the "dump" info provided by the SVM optimizers. These results provide very valuable info as to what the optimizer "thinks"
Remember that what worked very well wih a given dataset in a given domain may perform very poorly with data from another domain...
coffee's good, not too much. When all fails, make it Irish ;-)
Wow, so you have some training data and you don't know whether you are looking at features representing words in a document, or genese in a cell and need to tune a classifier. Well, since you don't have any semantic information, you are going to have to do this soley by looking at statistical properties of the data sets.
First, to formulate the problem, this is more than just linear vs non-linear. If you are really looking to classify this data, what you really need to do is to select a kernel function for the classifier which may be linear, or non-linear (gaussian, polynomial, hyperbolic, etc. In addition each kernel function may take one or more parameters that would need to be set. Determining an optimal kernel function and parameter set for a given classification problem is not really a solved problem, there are only useful heuristics and if you google 'selecting a kernel function' or 'choose kernel function', you will be treated to many research papers proposing and testing various approaches. While there are many approaches, one of the most basic and well travelled is to do a gradient descent on the parameters-- basically you try a kernel method and a parameter set , train on half your data points and see how you do. Then you try a different set of parameters and see how you do. You move the parameters in the direction of best improvement in accuracy until you get satisfactory results.
If you don't need to go through all this complexity to find a good kernel function, and simply want an answer to linear or non-linear. then the question mainly comes down to two things: Non linear classifiers will have a higher risk of overfitting (undergeneralizing) since they have more dimensions of freedom. They can suffer from the classifier merely memorizing sets of good data points, rather than coming up with a good generalization. On the other hand a linear classifier has less freedom to fit, and in the case of data that is not linearly seperable, will fail to find a good decision function and suffer from high error rates.
Unfortunately, I don't know a better mathematical solution to answer the question "is this data linearly seperable" other than to just try the classifier itself and see how it performs. For that you are going to need a smarter answer than mine.
Edit: This research paper describes an algorithm which looks like it should be able to determine how close a given data set comes to being linearly seperable.
http://www2.ift.ulaval.ca/~mmarchand/publications/wcnn93aa.pdf