Parameter settings for neural networks based classification using Matlab

Parameter settings for neural networks based classification using Matlab - matlab

Recently, I am trying to using Matlab build-in neural networks toolbox to accomplish my classification problem. However, I have some questions about the parameter settings.
a. The number of neurons in the hidden layer:
The example on this page Matlab neural networks classification example shows a two-layer (i.e. one-hidden-layer and one-output-layer) feed forward neural networks. In this example, it uses 10 neurons in the hidden layer
net = patternnet(10);
My first question is how to define the best number of neurons for my classification problem? Should I use cross-validation method to get the best performed number of neurons using a training data set?
b. Is there a method to choose three-layer or more multi-layer neural networks?
c. There are many different training method we can use in the neural networks toolbox. A list can be found at Training methods list. The page mentioned that the fastest training function is generally 'trainlm'; however, generally speaking, which one will perform best? Or it totally depends on the data set I am using?
d. In each training method, there is a parameter called 'epochs', which is the training iteration for my understanding. For each training method, Matlab defined the maximum number of epochs to train. However, from the example, it seems like 'epochs' is another parameter we can tune. Am I right? Or we just set the maximum number of epochs or leave it as default?
Any experience with Matlab neural networks toolbox is welcome and thanks very much for your reply. A.

a. You can refer to How to choose number of hidden layers and nodes in neural network? and ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hu
Surely you can do cross-validation to determine the parameter of best number of neurons. But it's not recommended as it's more suitable to use it in the stage of weights training of a certain network.
b. Refer to ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hl
And for more layers of neural network, you can refer to Deep Learning, which is very hot in recent years and gets state-of-the-art performances in many of the pattern recognition tasks.
c. It depends on your data. trainlm performs better on function fitting (nonlinear regression) problems than on pattern recognition problems while training large networks and pattern recognition networks, trainscg and trainrp are good choices. Generally, Gradient Descent and Resilient Backpropagation is recommended. More detailed comparison can be found here: http://www.mathworks.cn/cn/help/nnet/ug/choose-a-multilayer-neural-network-training-function.html
d. Yes, you're right. We can tune the epochs parameter. Generally you can output the recognition results/accuracy at every epoch and you will see that it is promoting more and more slowly, and the more epochs the more computing time. You can make a compromise between the accuracy and computation time.

For part b of your question:
You can use like this code:
net = patternnet([10 15 20]);
This script create a network with 3 hidden layer that first layer has 10 neurons, second layer has 15 neurons and 3th layer has 20 neurons.

Related

what can you say about two exactly same neural networks after training on same dataset?

Supposed you have two convolutional neural networks implemented in matlab and composed by these layers:
imageInputLayer
ConvolutionalLayer
maxPoolinglayer
relulayer
softmaxlayer
fullyconnectedlayer
classification layer
Both of these networks have exactly same architecture.
I apply the same method of training for 2 networks with same hyperparameters.
Both of these networks have exactly same weights in their corresponding layers.
That is, both of these networks are a replica of each other.
Both of these networks are trained using exactly same training set and validation set without shuffle.
I am wondering:
Will the scores (training error and validation error) and trained weights be different for both?
Does it depend upon the method for training?

In short: Yes to both - because inital weights are usually initated using random numbers.
A tad less short: A neural network is simply an algorithm, if there is no noise (i.e. randomness) introduced in any function on the way, 2 networks will end up being completely the same.

Is neural network suitable for supervised learning where the data (inputs and outputs) are continuous?

I am working on a regression model with a set of 158 inputs and 4 outputs of glass manufacturing project which is a continuous process of inputs and outputs. Is the usage of Neural Net a suitable solution for such kind of regression models? If yes, I have understood that Recurrent Neural Nets can be used for time series data, which Recurrent Neural Net shall I use? If usage of NN is not suitable, what are the other types of solutions available other than Linear Regression and Regression Trees?

Neural Networks are indeed suitable for continuous data. In fact, it is continous by default I would say. It is possible to have discrete I/O for sure, it all depend on your functions.
Secondly, it is true that RNN are suitable for time series, in a way. RNN are in fact suitable for timesteps more than timestamps. RNN are working by iterations. Typically, each iteration can be seen as a fixed step forward in time. This said, if you data is more like (date, value) (what I call timestamp), it may not be so good. It would not be absolutely impossible, but that's not the idea.
Hope it helps, start with simple RNN, try to understand how it works, then, if you need more, read about more complex cells.

neural network with multiplicative probability factor

I'm developing a project for the university. I have to create a classifier for a disease. The data-set i have contains several inputs (symptoms) and each of them is associated to a multiplicative probability factor (e.g. if patient has the symptom A, he has a double probability to have that disease).
So, how can i do this type of classifier? Is there any type of neural network or other instrument to do this??
Thanks in advance

You should specify how much labeled data and unlabeled data you have.
Let's assume you have only labeled data. Then you could use neural networks, but IMHO, SVM or random forests are the best techniques for a first try.
Note that if you use machine learning techniques, your prior information about symptoms (multiplicative coefficients) are not used because the labels are used instead. If you want to use these coefficients, it's no more machine learning.

You can use neural network for this purpose also. If to speak about your situation, with binding symptom A to more chances for decease B, that is what neural network should be able to accomplish. To bind connection weights from input A ( symptom A ) to desease B. From your side, you can engrain such classification rule in case if you'll have enough training data in your training data set. Also I propose you to try two different approaches: 1. neural network with N outputs (N = number of deseases to clasif). 2. Create for each desease neural network.

What's the difference between convolutional and recurrent neural networks? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
Improve this question
I'm new to the topic of neural networks. I came across the two terms convolutional neural network and recurrent neural network.
I'm wondering if these two terms are referring to the same thing, or, if not, what would be the difference between them?

Difference between CNN and RNN are as follows:
CNN:
CNN takes a fixed size inputs and generates fixed-size outputs.
CNN is a type of feed-forward artificial neural network - are variations of multilayer perceptrons which are designed to use minimal amounts of preprocessing.
CNNs use connectivity pattern between its neurons and is inspired by the organization of the animal visual cortex, whose individual neurons are arranged in such a way that they respond to overlapping regions tiling the visual field.
CNNs are ideal for images and video processing.
RNN:
RNN can handle arbitrary input/output lengths.
RNN unlike feedforward neural networks - can use their internal memory to process arbitrary sequences of inputs.
Recurrent neural networks use time-series information. i.e. what I spoke last will impact what I will speak next.
RNNs are ideal for text and speech analysis.

Convolutional neural networks (CNN) are designed to recognize images. It has convolutions inside, which see the edges of an object recognized on the image. Recurrent neural networks (RNN) are designed to recognize sequences, for example, a speech signal or a text. The recurrent network has cycles inside that implies the presence of short memory in the net. We have applied CNN as well as RNN choosing an appropriate machine learning algorithm to classify EEG signals for BCI: http://rnd.azoft.com/classification-eeg-signals-brain-computer-interface/

These architectures are completely different, so it is rather hard to say "what is the difference", as the only thing in common is the fact, that they are both neural networks.
Convolutional networks are networks with overlapping "reception fields" performing convolution tasks.
Recurrent networks are networks with recurrent connections (going in the opposite direction of the "normal" signal flow) which form cycles in the network's topology.

Apart from others, in CNN we generally use a 2d squared sliding window along an axis and convolute (with original input 2d image) to identify patterns.
In RNN we use previously calculated memory. If you are interested you can see, LSTM (Long Short-Term Memory) which is a special kind of RNN.
Both CNN and RNN have one point in common, as they detect patterns and sequences, that is you can't shuffle your single input data bits.

Convolutional neural networks (CNNs) for computer vision, and recurrent neural networks (RNNs) for natural language processing.
Although this can be applied in other areas, RNNs have the advantage of networks that can have signals travelling in both directions by introducing loops in the network.
Feedback networks are powerful and can get extremely complicated. Computations derived from the previous input are fed back into the network, which gives them a kind of memory. Feedback networks are dynamic: their state is changing continuously until they reach an equilibrium point.

First, we need to know that recursive NN is different from recurrent NN.
By wiki's definition,
A recursive neural network (RNN) is a kind of deep neural network created by applying the same set of weights recursively over a structure
In this sense, CNN is a type of Recursive NN.
On the other hand, recurrent NN is a type of recursive NN based on time difference.
Therefore, in my opinion, CNN and recurrent NN are different but both are derived from recursive NN.

This is the difference between CNN and RNN
Convolutional Neural NEtwork:
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. ... They have applications in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.
Recurrent Neural Networks:
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.

It is more helpful to describe the convolution and recurrent layers first.
Convolution layer:
Includes input, one or more filters (as well as subsampling).
The input can be one-dimensional or n-dimensional (n>1), for example, it can be a two-dimensional image. One or more filters are also defined in each layer. Inputs are convolving with each filter. The method of convolution is almost similar to the convolution of filters in image processing. In general, the purpose of this section is to extract the features of each filter from the input. The output of each convolution is called a feature map.
For example, a filter is considered for horizontal edges, and the result of its convolution with the input is the extraction of the horizontal edges of the input image. Usually, in practice and especially in the first layers, a large number of filters (for example, 60 filters in one layer) are defined. Also, after convolution, the subsampling operation is usually performed, for example, their maximum or average of each of the two neighborhood values is selected.
The convolution layer allows important features and patterns to be extracted from the input. And delete input data dependencies (linear and nonlinear).
[The following figure shows an example of the use of convolutional layers and pattern extraction for classification.][1]
[1]: https://i.stack.imgur.com/HS4U0.png [Kalhor, A. (2020). Classification and Regression NNs. Lecture.]
Advantages of convolutional layers:
Able to remove correlations and reduce input dimensions
Network generalization is increasing
Network robustness increases against changes because it extracts key features
Very powerful and widely used in supervised learning
...
Recurrent layers:
In these layers, the output of the current layer or the output of the next layers can also be used as the input of the layer. It also can receive time series as input.
The output without using the recurrent layer is as follows (a simple example):
y = f(W * x)
Where x is input, W is weight and f is the activator function.
But in recurrent networks it can be as follows:
y = f(W * x)
y = f(W * y)
y = f(W * y)
... until convergence
This means that in these networks the generated output can be used as an input and thus have memory networks. Some types of recurrent networks are Discrete Hopfield Net and Recurrent Auto-Associative NET, which are simple networks or complex networks such as LSTM.
An example is shown in the image below.
Advantages of Recurrent Layers:
They have memory capability
They can use time series as input.
They can use the generated output for later use.
Very used in machine translation, voice recognition, image description
...
Networks that use convolutional layers are called convolutional networks (CNN). Similarly, networks that use recurrent layers are called recurrent networks. It is also possible to use both layers in a network according to the desired application!

Optimization of Neural Network input data

I'm trying to build an app to detect images which are advertisements from the webpages. Once I detect those I`ll not be allowing those to be displayed on the client side.
Basically I'm using Back-propagation algorithm to train the neural network using the dataset given here: http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements.
But in that dataset no. of attributes are very high. In fact one of the mentors of the project told me that If you train the Neural Network with that many attributes, it'll take lots of time to get trained. So is there a way to optimize the input dataset? Or I just have to use that many attributes?

1558 is actually a modest number of features/attributes. The # of instances(3279) is also small. The problem is not on the dataset side, but on the training algorithm side.
ANN is slow in training, I'd suggest you to use a logistic regression or svm. Both of them are very fast to train. Especially, svm has a lot of fast algorithms.
In this dataset, you are actually analyzing text, but not image. I think a linear family classifier, i.e. logistic regression or svm, is better for your job.
If you are using for production and you cannot use open source code. Logistic regression is very easy to implement compared to a good ANN and SVM.
If you decide to use logistic regression or SVM, I can future recommend some articles or source code for you to refer.

If you're actually using a backpropagation network with 1558 input nodes and only 3279 samples, then the training time is the least of your problems: Even if you have a very small network with only one hidden layer containing 10 neurons, you have 1558*10 weights between the input layer and the hidden layer. How can you expect to get a good estimate for 15580 degrees of freedom from only 3279 samples? (And that simple calculation doesn't even take the "curse of dimensionality" into account)
You have to analyze your data to find out how to optimize it. Try to understand your input data: Which (tuples of) features are (jointly) statistically significant? (use standard statistical methods for this) Are some features redundant? (Principal component analysis is a good stating point for this.) Don't expect the artificial neural network to do that work for you.
Also: remeber Duda&Hart's famous "no-free-lunch-theorem": No classification algorithm works for every problem. And for any classification algorithm X, there is a problem where flipping a coin leads to better results than X. If you take this into account, deciding what algorithm to use before analyzing your data might not be a smart idea. You might well have picked the algorithm that actually performs worse than blind guessing on your specific problem! (By the way: Duda&Hart&Storks's book about pattern classification is a great starting point to learn about this, if you haven't read it yet.)

aplly a seperate ANN for each category of features
for example
457 inputs 1 output for url terms ( ANN1 )
495 inputs 1 output for origurl ( ANN2 )
...
then train all of them
use another main ANN to join results

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse