Non-convex classification with a 2-layer MLP

Can a 2-layer MLP (1 hidden layer and 1 output layer) classify the doughnut shape, or is a 3-layer MLP needed?
Are there any important references that address this problem and prove how many layers are needed for classification?

A three-layer neural network is no more powerful than a two-layer one with a bigger hidden layer: by the universal approximation theorem, a single hidden layer with enough units can approximate the required decision function (including a doughnut-shaped boundary) arbitrarily well. The only reason to go deeper is that multiple layers can be a more efficient use of parameters than one really big hidden layer.
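As a quick sanity check, here is a minimal sketch (assuming Python with scikit-learn, which the question does not mention; hidden size and iteration count are illustrative choices, not tuned values) that fits a one-hidden-layer MLP on a ring-shaped dataset built with make_circles:

# Sketch: a single-hidden-layer MLP separating a "doughnut" (concentric circles) dataset.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Inner disc surrounded by an outer ring: a non-convex decision boundary.
X, y = make_circles(n_samples=1000, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer; 16 tanh units are plenty for this shape.
clf = MLPClassifier(hidden_layer_sizes=(16,), activation='tanh',
                    solver='lbfgs', max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

With settings like these the classifier should separate the two rings essentially perfectly, which matches the statement above: one hidden layer is enough for this kind of non-convex shape.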

Related

Customized neural network Matlab

I am struggling to create a customized neural network in MATLAB. I've made a sketch of my intended neural network.
To explain better how the network should work:
Two input parameters (features) connected to the first hidden layer with two neurons (exactly equal to the number of input parameters)
Each input parameter is connected to one neuron only.
No bias in the first hidden layer.
All the neurons in the first hidden layer are connected to the neurons in the second layer. There is a bias term in the second layer.
The neurons from the second hidden layer are connected to one output.
For simplicity, I did not show the projection functions in the plots.
Could somebody help me with creating this (probably) simple customized network?
I appreciate your help.
You want a feedforwardnet. In your example you have one hidden layer of 3 neurons and an output layer, but no bias on the hidden neurons. For this you'll need to set up your network and change the net.biasConnect element:
net = feedforwardnet(3);      % one hidden layer with 3 neurons
net.biasConnect(1) = false;   % remove the bias of layer 1 (the hidden layer)
view(net)                     % inspect the resulting architecture
Once you train the network the rest will come into place.

Are the units of a neural network layer independent?

In a neural network, there are 3 main parts: the input layer, the hidden layer and the output layer. Is there any correlation between the units of the hidden layer? For example, are the 1st and 2nd neurons of the hidden layer independent of each other, or is there a relation between them? Is there any source that explains this issue?
The answer depends on many factors. From a probabilistic perspective they are independent given the inputs and before training. If the input is not fixed, then they are heavily correlated (as two "almost linear" functions of the same input signal). Finally, after training they will be strongly correlated, and the exact correlations will depend on initialisation and the training itself.
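To illustrate the middle point, here is a small sketch (assuming Python with NumPy; the sizes are purely illustrative): two randomly initialised hidden units computed on the same random inputs are generally correlated before any training has happened.

# Sketch: two untrained hidden units are correlated because they see the same input signal.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))   # 10000 random input vectors with 5 features
W = rng.normal(size=(5, 2))       # weights of two hidden units
b = rng.normal(size=2)            # biases of the two units

H = np.tanh(X @ W + b)            # activations of the two hidden units
print("correlation:", np.corrcoef(H[:, 0], H[:, 1])[0, 1])

The printed correlation is generally non-zero; how strong it is depends on how the two random weight vectors happen to align.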

Is nnet package in R only used to fit a neural network with single hidden layer?

In the description of nnet on CRAN (https://cran.r-project.org/web/packages/nnet/nnet.pdf) it says that nnet fits a single hidden layer:
Description: Fit single-hidden-layer neural network, possibly with skip-layer connections
Is it possible for me to specify the number of hidden layers using nnet? My understanding was that my selection of hidden layers and number of neurons in the hidden layer were the parameters that can be changed to improve a model. Is it true that it could help a model to add/remove hidden layers? Or, are there separate areas of application of single-layered and multi-layered neural networks?
I am new to ANN. I am working on a classification model with training sample size of 55000 x 54.
Thanks in advance!
Simple answer: no, nnet always has a single hidden layer, for which you specify the number of nodes. You can find more information in a similar question here. You will need to use other packages such as neuralnet, or something more sophisticated like h2o or MXNet.
Regarding parameters to improve the model, there are many different parts of neural networks beyond the raw architecture (i.e. layers and nodes). These include optimization functions, activation functions and batch sizes, among many others. You likely want to consult some further resources on using neural networks.
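For contrast with nnet, here is a minimal sketch (in Python with scikit-learn rather than R, purely as an illustration of the idea, not one of the packages named above): in libraries that support deeper networks, the number of hidden layers is just another hyperparameter alongside the other knobs mentioned.

# Sketch: two hidden layers plus a few other hyperparameters, all set explicitly.
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    hidden_layer_sizes=(50, 25),   # two hidden layers with 50 and 25 units
    activation='relu',             # activation function
    solver='adam',                 # optimisation algorithm
    batch_size=64,                 # minibatch size
    max_iter=500,
)
# clf.fit(X_train, y_train)        # X_train would be your 55000 x 54 training matrix

Whether extra layers actually help depends on the data; for tabular problems of this size, one or two hidden layers are a common starting point.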

Why do we need layers in artificial neural network?

I am quite new to artificial neural networks, and what I cannot understand is why we need the concept of a layer.
Isn't it enough to connect each neuron to some other neurons, creating a kind of web rather than a layer-based structure?
For example, to solve XOR we usually need at least 3 layers: 1 input layer with 2 neurons, 1+ hidden layer(s) with some neurons, and 1 output layer with 1 neuron.
Couldn't we create a network with 2 input neurons (we need them) and 1 output connected by a web of other neurons?
Example of what I mean
The term 'layer' means something different from what you might think. There is always a 'web' of neurons. A layer just denotes a group of neurons.
If I want to connect layer X with layer Y, this means I am connecting all neurons from layer X to all neurons from layer Y. But not always! You could also connect each neuron from layer X to just one neuron in layer Y. There are lots of different connection techniques.
But layers aren't required! It just makes the coding (and explanation) process a whole lot easier. Instead of connecting all neurons one by one, you can connect them in layers. It's far easier to say "layer A and B are connected" than "neuron 1,2,3,4,5 are all connected with neurons 6,7,8,9".
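As a tiny sketch of that last point (assuming Python with NumPy; the sizes are made up), "connect layer X to layer Y" computationally just means one weight matrix whose entries are the individual neuron-to-neuron connections:

# Sketch: connecting a 3-neuron layer X to a 2-neuron layer Y is a single 3x2 weight matrix;
# entry W[i, j] is the connection from neuron i in X to neuron j in Y.
import numpy as np

x = np.array([0.5, -1.0, 2.0])                     # activations of the 3 neurons in layer X
W = np.random.default_rng(0).normal(size=(3, 2))   # all 6 connections at once
b = np.zeros(2)                                     # biases of the 2 neurons in layer Y

y = np.tanh(x @ W + b)                              # activations of the 2 neurons in layer Y
print(y)

Saying "layer X and layer Y are connected" is exactly this: every individual connection handled in one operation instead of being wired up one by one.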
If you are interested in 'layerless' networks, please take a look at Liquid State Machines:
(the neurons might look to be layered, but they aren't!)
PS: I develop a JavaScript neural network library, and I have created an online demo in which a neural network evolves into an XOR gate - without layers, just starting with input and output. View it here. Your example picture is exactly the kind of network you could develop with this library.

Can a convolutional neural network be built with perceptrons?

I was reading this interesting article on convolutional neural networks. It showed this image, explaining that for every receptive field of 5x5 pixels/neurons, a value for a hidden neuron is calculated.
We can think of max-pooling as a way for the network to ask whether a given feature is found anywhere in a region of the image. It then throws away the exact positional information.
So max-pooling is applied.
With multiple convolutional layers, it looks something like this:
But my question is: this whole architecture could be built with perceptrons, right?
For every convolutional layer, one perceptron is needed, with layers:
input_size = 5x5;
hidden_size = 10; e.g.
output_size = 1;
Then for every receptive field in the original image, the 5x5 area is inputted into a perceptron to output the value of a neuron in the hidden layer. So basically doing this for every receptive field:
So the same perceptron is used 24x24 times to construct the hidden layer, because:
is that we're going to use the same weights and bias for each of the 24×24 hidden neurons.
And this works for the hidden layer to the pooling layer as well, input_size = 2x2; output_size = 1;. And in the case of a max-pool layer, it's just a max() function on an array.
and then finally:
The final layer of connections in the network is a fully-connected
layer. That is, this layer connects every neuron from the max-pooled
layer to every one of the 10 output neurons.
which is a perceptron again.
So my final architecture looks like this:
-> 1 perceptron for every convolutional layer/feature map
-> run this perceptron for every receptive field to create feature map
-> 1 perceptron for every pooling layer
-> run this perceptron for every field in the feature map to create a pooling layer
-> finally input the values of the pooling layer in a regular ALL to ALL perceptron
Or am I overlooking something? Or is this already how they are programmed?
The answer very much depends on what exactly you call a Perceptron. Common options are:
Complete architecture. Then no, simply because it's by definition a different NN.
A model of a single neuron, specifically y = 1 if (w.x + b) > 0 else 0, where x is the input of the neuron, w and b are its trainable parameters and w.x denotes the dot product. Then yes, you can force a bunch of these perceptrons to share weights and call it a CNN (see the sketch after this list). You'll find variants of this idea being used in binary neural networks.
A training algorithm, typically associated with the Perceptron architecture. This reading does not really fit the question, because the learning algorithm is in principle orthogonal to the architecture. Moreover, you cannot really use the Perceptron algorithm for anything with hidden layers, which suggests no as the answer in this case.
The loss function associated with the original Perceptron. This notion of Perceptron is orthogonal to the problem at hand; your loss function with a CNN is given by whatever you are trying to do with your whole model. You could eventually use it, but it is non-differentiable, so good luck :-)
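To make the second option concrete, here is a rough sketch (assuming Python with NumPy and a 28x28 input image, which is what gives 24x24 positions for a 5x5 receptive field): one weight-sharing "perceptron" is slid over every receptive field to produce the feature map, followed by 2x2 max-pooling.

# Sketch: the same 5x5 perceptron (shared weights + bias, threshold activation) applied at
# every receptive field of a 28x28 image, producing a 24x24 feature map.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))
w = rng.normal(size=(5, 5))   # shared weights of the single perceptron
b = 0.0                       # shared bias

feature_map = np.zeros((24, 24))
for i in range(24):
    for j in range(24):
        field = image[i:i+5, j:j+5]                                    # one 5x5 receptive field
        feature_map[i, j] = 1.0 if np.sum(w * field) + b > 0 else 0.0  # perceptron output

# 2x2 max-pooling, as described in the question: max() over each 2x2 block.
pooled = feature_map.reshape(12, 2, 12, 2).max(axis=(1, 3))
print(feature_map.shape, pooled.shape)   # (24, 24) (12, 12)

This is essentially the computation a convolutional layer performs, except that real CNNs replace the hard threshold with a differentiable activation so the shared weights can be trained by backpropagation.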
A sidenote rant: you will see people refer to feed-forward, fully-connected NNs with hidden layers as "Multilayer Perceptrons" (MLPs). This is a misnomer; there are no Perceptrons in MLPs, see e.g. this discussion on Wikipedia -- unless you go explore some really weird ideas. It would make more sense to call these networks Multilayer Linear Logistic Regression, because that's what they used to be composed of -- up till about 6 years ago.