Can a convolutional neural network be built with perceptrons? - neural-network

I was reading this interesting article on convolutional neural networks. It showed this image, explaining that for every receptive field of 5x5 pixels/neurons, a value for a hidden value is calculated.
We can think of max-pooling as a way for the network to ask whether a given feature is found anywhere in a region of the image. It then throws away the exact positional information.
So max-pooling is applied.
With multiple convolutional layers, it looks something like this:
But my question is, this whole architecture could be build with perceptrons, right?
For every convolutional layer, one perceptron is needed, with layers:
input_size = 5x5;
hidden_size = 10; e.g.
output_size = 1;
Then for every receptive field in the original image, the 5x5 area is inputted into a perceptron to output the value of a neuron in the hidden layer. So basically doing this for every receptive field:
So the same perceptron is used 24x24 amount of times to construct the hidden layer, because:
is that we're going to use the same weights and bias for each of the 24×24 hidden neurons.
And this works for the hidden layer to the pooling layer as well, input_size = 2x2; output_size = 1;. And in the case of a max-pool layer, it's just a max() function on an array.
and then finally:
The final layer of connections in the network is a fully-connected
layer. That is, this layer connects every neuron from the max-pooled
layer to every one of the 10 output neurons.
which is a perceptron again.
So my final architecture looks like this:
-> 1 perceptron for every convolutional layer/feature map
-> run this perceptron for every receptive field to create feature map
-> 1 perceptron for every pooling layer
-> run this perceptron for every field in the feature map to create a pooling layer
-> finally input the values of the pooling layer in a regular ALL to ALL perceptron
Or am I overseeing something? Or is this already how they are programmed?

The answer very much depends on what exactly you call a Perceptron. Common options are:
Complete architecture. Then no, simply because it's by definition a different NN.
A model of a single neuron, specifically y = 1 if (w.x + b) > 0 else 0, where x is the input of the neuron, w and b are its trainable parameters and w.b denotes the dot product. Then yes, you can force a bunch of these perceptrons to share weights and call it a CNN. You'll find variants of this idea being used in binary neural networks.
A training algorithm, typically associated with the Perceptron architecture. This would make no sense to the question, because the learning algorithm is in principle orthogonal to the architecture. Though you cannot really use the Perceptron algorithm for anything with hidden layers, which would suggest no as the answer in this case.
Loss function associated with the original Perceptron. This notion of Peceptron is orthogonal to the problem at hand, you're loss function with a CNN is given by whatever you try to do with your whole model. You can eventually use it, but it is non-differentiable, so good luck :-)
A sidenote rant: You can see people refer to feed-forward, fully-connected NNs with hidden layers as "Multilayer Perceptrons" (MLPs). This is a misnomer, there are no Perceptrons in MLPs, see e.g. this discussion on Wikipedia -- unless you go explore some really weird ideas. It would make sense call these networks as Multilayer Linear Logistic Regression, because that's what they used to be composed of. Up till like 6 years ago.

Related

How does a neural network work with correlated image data

I am new to TensorFlow and deep learning. I am trying to create a fully connected neural network for image processing. I am somewhat confused.
We have an image, say 28x28 pixels. This will have 784 inputs to the NN. For non-correlated inputs, this is fine, but image pixels are generally correlated. For instance, consider a picture of a cow's eye. How can a neural network understand this when we have all pixels lined up in an array for a fully-connected network. How does it determine the correlation?
Please research some tutorials on CNN (Convolutional Neural Network); here is a starting point for you. A fully connected layer of a NN surrenders all of the correlation information it might have had with the input. Structurally, it implements the principle that the inputs are statistically independent.
Alternately, a convolution layer depends upon the physical organization of the inputs (such as pixel adjacency), using that to find simple combinations (convolutions) of feature form one layer to another.
Bottom line: your NN doesn't find the correlation: the topology is wrong, and cannot do the job you want.
Also, please note that a layered network consisting of fully-connected neurons with linear weight combinations, is not deep learning. Deep learning has at least one hidden layer, a topology which fosters "understanding" of intermediate structures. A purely linear, fully-connected layering provides no such hidden layers. Even if you program hidden layers, the outputs remain a simple linear combination of the inputs.
Deep learning requires some other discrimination, such as convolutions, pooling, rectification, or other non-linear combinations.
Let's take it into peaces to understand the intuition behind NN learning to predict.
to predict a class of given image we have to find a correlation or direct link between once of it is input values to the class. we can think about finding one pixel can tell us this image belongs to this class. which is impossible so what we have to do is build up more complex function or let's call complex features. which will help us to find to generate a correlated data to the wanted class.
To make it simpler imagine you want to build AND function (p and q), OR function (p or q) in the both cases there is a direct link between the input and the output. in and function if there 0 in the input the output always zero. so what if we want to xor function (p xor q) there is no direct link between the input and the output. the answer is to build first layer of classifying AND and OR then by a second layer taking the result of the first layer we can build the function and classify the XOR function
(p xor q) = (p or q) and not (p and q)
By applying this method on Multi-layer NN you'll have the same result. but then you'll have to deal with huge amount of parameters. one solution to avoid this is to extract representative, variance and uncorrelated features between images and correlated with their class from the images and feed the to the Network. you can look for image features extraction on the web.
this is a small explanation for how to see the link between images and their classes and how NN work to classify them. you need to understand NN concept and then you can go to read about Deep-learning.

How to enforce feature vector representing label probability with Caffe siamese CNN?

related to How to Create CaffeDB training data for siamese networks out of image directory
If I have N labels. How can I enforce, that the feature vector of size N right before the contrastive loss layer represents some kind of probability for each class? Or comes that automatically with the siamese net design?
If you only use contrastive loss in a Siamese network, there is no way of forcing the net to classify into the correct label - because the net is only trained using "same/not same" information and does not know the semantics of the different classes.
What you can do is train with multiple loss layers.
You should aim at training a feature representation that is reach enough for your domain, so that looking at the trained feature vector of some input (in some high dimension) you should be able to easily classify that input to the correct class. Moreover, given that feature representation of two inputs one should be able to easily say if they are "same" or "not same".
Therefore, I recommend that you train your deep network with two loss layer with "bottom" as the output of one of the "InnerProduct" layers. One loss is the contrastive loss. The other loss should have another "InnerProduct" layer with num_output: N and a "SoftmaxWithLoss" layer.
A similar concept was used in this work:
Sun, Chen, Wang and Tang Deep Learning Face Representation by Joint Identification-Verification NIPS 2014.

How do i take a trained neural network and implement in another system?

I have trained a feedforward neural network in Matlab. Now I have to implement this neural network in C language (or simulate the model in Matlab using mathematical equations, without using direct functions). How do I do that? I know that I have to take the weights and bias and activation function. What else is required?
There is no point in representing it as a mathematical function because it won't save you any computations.
Indeed all you need is the weights, biases, activation and your architecture. I'm assuming it is a simple feedforward network as you said, you need to implement some kind of matrix multiplication and addition in C. Also, you'll need to implement the activation function. After that, you're ready to go. Your feed forward NN is ready to be implemented. If the C code will not be used for training, it won't be necessary to implement the backpropagation algorithm in C.
A feedforward layer would be implemented as follows:
Output = Activation_function(Input * weights + bias)
Where,
Input: (1 x number_of_input_parameters_for_this_layer)
Weights: (number_of_input_parameters_for_this_layer x number_of_neurons_for_this_layer)
Bias: (1 x number_of_neurons_for_this_layer)
Output: (1 x number_of_neurons_for_this_layer)
The output of a layer is the input to the next layer.
After some days of searching, I have found the following webpage to be very useful http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
The picture below shows a simple feedforward neural network. Picture taken from the above website.
In this figure, the circles denote the inputs to the network. The circles labeled “+1” are called bias units, and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer (which, in this example, has only one node). The middle layer of nodes is called the hidden layer, because its values are not observed in the training set. In this example, the neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit.
The mathematical equations representing this feedforward network are
This neural network has parameters (W,b)=(W(l),b(l),W(2),b(2)), where we write Wij(l) to denote the parameter (or weight) associated with the connection between unit j in layer l, and unit i in layer l+1. (Note the order of the indices.) Also, bi(l) is the bias associated with unit i in layer l+1.
So, from the trained model, as Mido mentioned in his answer, we have to take the input weight matrix which is W(1), the layer weight matrix which is W(2), biases, hidden layer transfer function and output layer transfer function. After this, use the above equations to estimate the output hW,b(x). A popular transfer function used for a regression problem is tan-sigmoid transfer function in the hidden layer and linear transfer function in the output layer.
Those who use Matlab, these links are highly useful
try to simulate neural network in Matlab by myself
Neural network in MATLAB
Programming a Basic Neural Network from scratch in MATLAB

Replicator Neural Network for outlier detection, Step-wise function causing same prediction

In my project, one of my objectives is to find outliers in aeronautical engine data and chose to use the Replicator Neural Network to do so and read the following report on it (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.3366&rep=rep1&type=pdf) and am having a slight understanding issue with the step-wise function (page 4, figure 3) and the prediction values due to it.
The explanation of a replicator neural network is best described in the above report but as a background the replicator neural network I have built works by having the same number of outputs as inputs and having 3 hidden layers with the following activation functions:
Hidden layer 1 = tanh sigmoid S1(θ) = tanh,
Hidden layer 2 = step-wise, S2(θ) = 1/2 + 1/(2(k − 1)) {summation each variable j} tanh[a3(θ −j/N)]
Hidden Layer 3 = tanh sigmoid S1(θ) = tanh,
Output Layer 4 = normal sigmoid S3(θ) = 1/1+e^-θ
I have implemented the algorithm and it seems to be training (since the mean squared error decreases steadily during training). The only thing I don't understand is how the predictions are made when the middle layer with the step-wise activation function is applied since it causes the 3 middle nodes' activations to be become specific discrete values (e.g. my last activations on the 3 middle were 1.0, -1.0, 2.0 ) , this causes these values to be forward propagated and me getting very similar or exactly the same predictions every time.
The section in the report on page 3-4 best describes the algorithm but i have no idea what i have to do to fix this, i don't have much time either :(
Any help would be greatly appreciated.
Thank you
I'm facing the problem of implementing this algorithm and here is my insight into the problem that you might have had: The middle layer, by utilizing a step-wise function, is essentially performing clustering on the data. Each layer transforms the data into a discrete number which could be interpreted as a coordinate in a grid system. Imagine we use two neurons in the middle layer with step-wise values ranging from -2 to +2 in increments of 1. This way we define a 5x5 grid where each set of features will be placed. The more steps you allow, the more grids. The more grids, the more "clusters" you have.
This all sounds good and all. After all, we are compressing the data into a smaller (dimensional) representation which then is used to try to reconstruct into the original input.
This step-wise function, however, has a big problem on itself: back-propagation does not work (in theory) with step-wise functions. You can find more about this in this paper. In this last paper they suggest switching the step-wise function with a ramp-like function. That is, to have almost an infinite amount of clusters.
Your problem might be directly related to this. Try switching the step-wise function with a ramp-wise one and measure how the error changes throughout the learning phase.
By the way, do you have any of this code available anywhere for other researchers to use?

Parameter settings for neural networks based classification using Matlab

Recently, I am trying to using Matlab build-in neural networks toolbox to accomplish my classification problem. However, I have some questions about the parameter settings.
a. The number of neurons in the hidden layer:
The example on this page Matlab neural networks classification example shows a two-layer (i.e. one-hidden-layer and one-output-layer) feed forward neural networks. In this example, it uses 10 neurons in the hidden layer
net = patternnet(10);
My first question is how to define the best number of neurons for my classification problem? Should I use cross-validation method to get the best performed number of neurons using a training data set?
b. Is there a method to choose three-layer or more multi-layer neural networks?
c. There are many different training method we can use in the neural networks toolbox. A list can be found at Training methods list. The page mentioned that the fastest training function is generally 'trainlm'; however, generally speaking, which one will perform best? Or it totally depends on the data set I am using?
d. In each training method, there is a parameter called 'epochs', which is the training iteration for my understanding. For each training method, Matlab defined the maximum number of epochs to train. However, from the example, it seems like 'epochs' is another parameter we can tune. Am I right? Or we just set the maximum number of epochs or leave it as default?
Any experience with Matlab neural networks toolbox is welcome and thanks very much for your reply. A.
a. You can refer to How to choose number of hidden layers and nodes in neural network? and ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hu
Surely you can do cross-validation to determine the parameter of best number of neurons. But it's not recommended as it's more suitable to use it in the stage of weights training of a certain network.
b. Refer to ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hl
And for more layers of neural network, you can refer to Deep Learning, which is very hot in recent years and gets state-of-the-art performances in many of the pattern recognition tasks.
c. It depends on your data. trainlm performs better on function fitting (nonlinear regression) problems than on pattern recognition problems while training large networks and pattern recognition networks, trainscg and trainrp are good choices. Generally, Gradient Descent and Resilient Backpropagation is recommended. More detailed comparison can be found here: http://www.mathworks.cn/cn/help/nnet/ug/choose-a-multilayer-neural-network-training-function.html
d. Yes, you're right. We can tune the epochs parameter. Generally you can output the recognition results/accuracy at every epoch and you will see that it is promoting more and more slowly, and the more epochs the more computing time. You can make a compromise between the accuracy and computation time.
For part b of your question:
You can use like this code:
net = patternnet([10 15 20]);
This script create a network with 3 hidden layer that first layer has 10 neurons, second layer has 15 neurons and 3th layer has 20 neurons.