So I’m trying to make a CNN and so far I think I understand all of the forward propagation and the back propagation in the fully connected layers. However, I’m having some issues with the back prop in the convolutional layers.
Basically I’ve written out the dimensions of everything at each stage in a CNN with two convolutional layers and two fully connected layers, with the input having a depth of 1 (as it is black and white) and only one filter being applied at each convolutional layer. I haven’t bothered to use pooling at this stage since, to my knowledge, it shouldn’t have any impact on the calculus, only on where the gradient is assigned, so the dimensions should still fit as long as I also don’t include any unpooling in my backprop. I also haven’t written out the dimensions after the activation functions, as they are the same as those of their inputs and I would be writing the same values twice.
The dimensions, as you will see, vary slightly in format. For the convolutional layers I’ve written them as though they are images rather than matrices, whereas for the fully connected layers I’ve written the sizes of the matrices used (this will hopefully make more sense when you see it).
The issue is that when calculating the delta for the convolutional layers, the dimensions don’t fit. What am I doing wrong?
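As a reference point for the dimension bookkeeping, here is a minimal NumPy sketch of one convolutional layer with a single filter, stride 1 and no padding. The 28x28 input and 5x5 filter are assumed sizes, not taken from the calculation in this question, and the helper names `correlate_valid` and `conv_backward` are made up for illustration. The point it tries to show is that the delta passed back to the previous layer should come out with the shape of the layer's input, and the filter gradient with the shape of the filter.

```python
import numpy as np

def correlate_valid(x, k):
    """Slide k over x with stride 1 and no padding (the usual CNN 'convolution')."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv_backward(x, k, delta_out):
    """Return (dL/dx, dL/dk) given delta_out = dL/dy for y = correlate_valid(x, k)."""
    kh, kw = k.shape
    # Filter gradient: valid correlation of the input with delta -> filter shape.
    grad_k = correlate_valid(x, delta_out)
    # Input delta: zero-pad delta by (k - 1) on every side, then correlate with
    # the filter rotated by 180 degrees (a "full" convolution) -> input shape.
    padded = np.pad(delta_out, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    grad_x = correlate_valid(padded, k[::-1, ::-1])
    return grad_x, grad_k

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28))   # assumed input size
k = rng.standard_normal((5, 5))     # assumed filter size
y = correlate_valid(x, k)           # shape (24, 24)
delta_out = np.ones_like(y)         # stand-in for dL/dy, i.e. L = sum(y)
grad_x, grad_k = conv_backward(x, k, delta_out)
print(y.shape, grad_x.shape, grad_k.shape)  # (24, 24) (28, 28) (5, 5)
```

If the delta for a convolutional layer comes out smaller than that layer's input, a likely cause is that the padding step (the "full" convolution) was skipped.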
Websites used:
http://cs231n.github.io/convolutional-networks/
http://neuralnetworksanddeeplearning.com/chap2.html#the_cross-entropy_cost_function
http://www.jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/
Calculation of dimensions:
I want to ask a general question. Nowadays deep learning, especially the convolutional neural network (CNN), is used in every field. Sometimes a CNN is not necessary for the problem, but researchers use one anyway to follow the trend.
So, for object detection, is a CNN really needed to solve the problem?
The question is a bit unfortunate: the title asks about CNNs, but the body asks about deep learning in general.
We don't necessarily need deep learning for object recognition, but trained deep networks get better results, and companies like Google are thankful for every percent of improvement.
As for CNNs, they get better results than "traditional" ANNs and also have fewer parameters because of weight sharing. CNNs also allow transfer learning (you take a feature detector, i.e. the convolution and pooling layers, and then connect your own fully connected layers to it).
A key concept of CNNs is the idea of translational invariance. In short, using a convolutional kernel on an image allows the machine to learn a set of weights for a specific feature (an edge, or a much more detailed object, depending on the layering of the network) and apply it across the entire image.
Consider detecting a cat in an image. If we designed some set of weights that allowed the learner to recognize a cat, we would like those weights to be the same no matter where the cat is in the image! So we would "assign" one kernel in the convolutional layer to detecting cats, and then convolve it over the entire image.
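The idea can be illustrated with a small NumPy sketch. The 3x3 `pattern` is a hypothetical stand-in for "the cat", and the helper names are made up for this example. The same kernel is slid over two images in which the pattern sits at different positions: the strongest response has the same value in both cases and simply moves with the pattern.

```python
import numpy as np

def correlate_valid(x, k):
    """Slide k over x with stride 1 and no padding."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Hypothetical 3x3 pattern; the kernel is set equal to it as a matched filter.
pattern = np.arange(1.0, 10.0).reshape(3, 3)

def place(top, left, size=10):
    """Return an empty image with the pattern pasted at (top, left)."""
    img = np.zeros((size, size))
    img[top:top + 3, left:left + 3] = pattern
    return img

def strongest_response(img):
    """Location and value of the largest entry of the feature map."""
    fmap = correlate_valid(img, pattern)
    row, col = np.unravel_index(np.argmax(fmap), fmap.shape)
    return (int(row), int(col)), float(fmap.max())

pos_a, val_a = strongest_response(place(1, 2))
pos_b, val_b = strongest_response(place(5, 6))
print(pos_a, val_a)
print(pos_b, val_b)
```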
Whatever the reason for the recent successes of CNNs, it should be noted that regular fully connected ANNs should, in principle, perform just as well. The problem is that they quickly become computationally infeasible on larger images, whereas CNNs are much more efficient due to parameter sharing.
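To put rough numbers on the parameter-sharing argument, here is a back-of-the-envelope sketch. The layer sizes (a 224x224 RGB input, 1000 dense units, 64 filters of size 3x3) are assumptions chosen for illustration, not figures from the answers above.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases of a convolutional layer (independent of image size)."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

height, width, channels = 224, 224, 3   # assumed input size
n_dense = dense_params(height * width * channels, 1000)
n_conv = conv_params(3, 3, channels, 64)
print(n_dense)  # 150529000
print(n_conv)   # 1792
```

The convolutional layer's count does not depend on the image size at all, which is why it stays small as images grow.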
Keeping the details abstract, I need to train a convolutional network and then run this network over a sliding window on an image. The goal is to build a heatmap for making pixel-perfect detection boundaries for certain objects.
I'm wondering if there is an easy way in Keras to train a network and then turn it into a convolutional network that covers the whole image, without needing to run loops over the image, which is very slow?
I'm thinking I can just copy the trained convolutional filters into a larger convolutional network.
If not, I'll need to go directly to TensorFlow.
This is easily done in Keras, as long as you use a fully convolutional net, i.e. replace any dense layers by a convolutional layer with kernel size 1.
The easiest way to get started is to use one of the pre-trained nets included in Keras; see https://keras.io/applications/ for how this is done with a custom input size. If you've trained your own fully convolutional net 'old_model', just do:
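from keras.layers import Input, InputLayer
from keras.models import Model
new_input = Input(new_size)
x = new_input
for layer in old_model.layers:  # reuse the trained layers (assumes a plain stack of layers)
    if not isinstance(layer, InputLayer):
        x = layer(x)
new_model = Model(new_input, x)  # shares its weights with old_model, so no save/load step is needed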
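To see why replacing a dense layer by a convolution with kernel size 1 changes nothing numerically, here is a small NumPy sketch (NumPy rather than Keras, to keep it self-contained). The sizes and the names `dense` and `conv1x1` are assumptions for illustration. A dense classifier that acts on one feature vector gives, when applied as a 1x1 convolution, the same scores at every spatial position, which is exactly the heatmap.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_classes = 8, 3                        # assumed sizes
W = rng.standard_normal((n_features, n_classes))    # dense-layer weights
b = rng.standard_normal(n_classes)                  # dense-layer biases

def dense(v):
    """Dense layer applied to a single feature vector of shape (n_features,)."""
    return v @ W + b

def conv1x1(fmap):
    """The same weights applied as a 1x1 convolution to an (H, W, n_features) map."""
    return np.einsum('hwc,ck->hwk', fmap, W) + b

fmap = rng.standard_normal((6, 10, n_features))     # features of a larger image
heatmap = conv1x1(fmap)                             # shape (6, 10, n_classes)
print(heatmap.shape)
print(np.allclose(heatmap[2, 5], dense(fmap[2, 5])))
```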
I've seen some tutorial examples, like the UFLDL convolutional net, where they use features obtained by unsupervised learning, and some others where kernels are engineered by hand (using Sobel and Gabor detectors, different sharpness/blur settings etc). Strangely, I can't find a general guideline on how one should choose a good kernel for something more than a toy network. For example, considering a deep network with many convolutional-pooling layers, are the same kernels used at each layer, or does each layer have its own kernel subset? If so, where do these deeper layers' filters come from: should I learn them using some unsupervised learning algorithm on data passed through the first convolution-and-pooling layer pair?
I understand that this question doesn't have a single answer; I'd be happy just to learn the general approach (some review article would be fantastic).
The current state of the art suggests learning all the convolutional layers from the data using backpropagation (ref).
Also, this paper recommends small kernels (3x3) and pooling (2x2). You should train different filters for each layer.
Kernels in deep networks are mostly trained all at the same time in a supervised way (with known inputs and outputs of the network) using backpropagation (which computes the gradients) and some version of stochastic gradient descent (the optimization algorithm).
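As a toy illustration of kernels being learned rather than designed, the sketch below recovers a 3x3 kernel from input/output pairs by gradient descent. Everything here is an assumption made for the example: a single 16x16 random image, a Sobel kernel playing the role of the unknown target, a squared-error loss, and plain full-batch gradient descent instead of a stochastic variant.

```python
import numpy as np

def correlate_valid(x, k):
    """Slide k over x with stride 1 and no padding."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
# The kernel to recover; the learner only ever sees x and y_true.
target_kernel = np.array([[-1.0, 0.0, 1.0],
                          [-2.0, 0.0, 2.0],
                          [-1.0, 0.0, 1.0]])
y_true = correlate_valid(x, target_kernel)

kernel = np.zeros((3, 3))            # start from an uninformed kernel
learning_rate = 0.5
for step in range(300):
    error = correlate_valid(x, kernel) - y_true
    # Gradient of mean(error**2) / 2 with respect to the kernel (backprop step).
    grad = correlate_valid(x, error) / error.size
    kernel -= learning_rate * grad   # gradient descent step

print(np.round(kernel, 3))
```

In a real network the same update is applied to every kernel in every layer at once, with the error signal coming from the layers above.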
Kernels in different layers are usually independent. They can have different sizes and their numbers can differ as well. How to design a network is an open question and it depends on your data and the problem itself.
If you want to work with your own dataset, you should start with an existing pre-trained network [Caffe Model Zoo] and fine-tune it on your dataset. This way, the architecture of the network would be fixed, as you would have to respect the architecture of the original network. The networks you can download are trained on very large problems, which makes them able to generalize well to other classification/regression problems. If your dataset is at least partly similar to the original dataset, the fine-tuned networks should work very well.
A good place to get more information is the Caffe @ CVPR2015 tutorial.
I'm new to the topic of neural networks. I came across the two terms convolutional neural network and recurrent neural network.
I'm wondering if these two terms are referring to the same thing, or, if not, what would be the difference between them?
The differences between CNNs and RNNs are as follows:
CNN:
A CNN takes fixed-size inputs and generates fixed-size outputs.
A CNN is a type of feed-forward artificial neural network: a variation of the multilayer perceptron designed to use minimal amounts of preprocessing.
CNNs use a connectivity pattern between their neurons that is inspired by the organization of the animal visual cortex, whose individual neurons are arranged in such a way that they respond to overlapping regions tiling the visual field.
CNNs are ideal for image and video processing.
RNN:
An RNN can handle arbitrary input/output lengths.
RNNs, unlike feed-forward neural networks, can use their internal memory to process arbitrary sequences of inputs.
Recurrent neural networks use time-series information, i.e. what I said last will impact what I say next.
RNNs are ideal for text and speech analysis.
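The "arbitrary input length" and "internal memory" points can be sketched with a bare-bones recurrent cell in NumPy. The sizes, the random untrained weights and the name `rnn_final_state` are assumptions for illustration; this is a plain tanh cell, not an LSTM. The same weights process a sequence of 3 steps and one of 11 steps, and the hidden state carries information from earlier steps to later ones.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 5                        # assumed sizes
W_x = rng.standard_normal((hidden_size, input_size)) * 0.1
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b = np.zeros(hidden_size)

def rnn_final_state(sequence):
    """Run a tanh recurrent cell over a (length, input_size) sequence."""
    h = np.zeros(hidden_size)                         # the internal memory
    for x_t in sequence:
        # The new state depends on the current input and the previous state.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h

short_seq = rng.standard_normal((3, input_size))
long_seq = rng.standard_normal((11, input_size))
print(rnn_final_state(short_seq).shape, rnn_final_state(long_seq).shape)  # (5,) (5,)
```

Feeding the same sequence in reverse order gives a different final state, which is the "what I said last will impact what I say next" behaviour.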
Convolutional neural networks (CNNs) are designed to recognize images. They have convolutions inside, which detect the edges of an object in the image. Recurrent neural networks (RNNs) are designed to recognize sequences, for example a speech signal or a text. A recurrent network has cycles inside, which implies the presence of short-term memory in the net. We have applied both CNNs and RNNs when choosing an appropriate machine learning algorithm to classify EEG signals for BCI: http://rnd.azoft.com/classification-eeg-signals-brain-computer-interface/
These architectures are completely different, so it is rather hard to say what "the difference" is, as the only thing they have in common is that they are both neural networks.
Convolutional networks are networks with overlapping "receptive fields" performing convolution tasks.
Recurrent networks are networks with recurrent connections (going in the opposite direction of the "normal" signal flow) which form cycles in the network's topology.
Apart from the other answers: in a CNN we generally slide a square 2D window along the axes of the input and convolve it with the original 2D input image to identify patterns.
In an RNN we use previously calculated memory. If you are interested, you can look at LSTM (Long Short-Term Memory), which is a special kind of RNN.
CNNs and RNNs have one point in common: both detect patterns and sequences, so you can't shuffle the elements of a single input.
Convolutional neural networks (CNNs) are used for computer vision, and recurrent neural networks (RNNs) for natural language processing.
Although they can be applied in other areas as well, RNNs have the advantage that signals can travel in both directions, which is achieved by introducing loops in the network.
Feedback networks are powerful and can get extremely complicated. Computations derived from the previous input are fed back into the network, which gives them a kind of memory. Feedback networks are dynamic: their state is changing continuously until they reach an equilibrium point.
First, we need to know that recursive NN is different from recurrent NN.
By wiki's definition,
A recursive neural network (RNN) is a kind of deep neural network created by applying the same set of weights recursively over a structure
In this sense, a CNN is a type of recursive NN.
On the other hand, a recurrent NN is a type of recursive NN based on time difference.
Therefore, in my opinion, CNNs and recurrent NNs are different, but both are derived from recursive NNs.
This is the difference between a CNN and an RNN:
Convolutional Neural Network:
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. ... They have applications in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.
Recurrent Neural Networks:
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.
It is more helpful to describe the convolutional and recurrent layers first.
Convolution layer:
It includes an input and one or more filters (as well as subsampling).
The input can be one-dimensional or n-dimensional (n>1); for example, it can be a two-dimensional image. One or more filters are also defined in each layer. The input is convolved with each filter. The convolution works almost the same way as filtering in image processing. In general, the purpose of this step is to extract from the input the feature that each filter encodes. The output of each convolution is called a feature map.
For example, if a filter is designed for horizontal edges, the result of its convolution with the input is the horizontal edges of the input image. Usually, in practice and especially in the first layers, a large number of filters (for example, 60 filters in one layer) are defined. Also, after convolution, a subsampling operation is usually performed; for example, the maximum or average of each group of neighboring values is selected.
The convolution layer allows important features and patterns to be extracted from the input, and removes dependencies (linear and nonlinear) in the input data.
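The horizontal-edge example and the subsampling step can be sketched in a few lines of NumPy. The 8x8 test image, the particular edge filter and the 2x2 max-pooling window are assumptions chosen for illustration, and the helper names are made up.

```python
import numpy as np

def correlate_valid(x, k):
    """Slide k over x with stride 1 and no padding."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 block (even sizes assumed)."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Test image: dark top half, bright bottom half, so one horizontal edge.
image = np.zeros((8, 8))
image[4:, :] = 1.0

# Filter that responds where brightness increases from top to bottom.
horizontal_edge = np.array([[-1.0, -1.0, -1.0],
                            [ 0.0,  0.0,  0.0],
                            [ 1.0,  1.0,  1.0]])

feature_map = correlate_valid(image, horizontal_edge)   # shape (6, 6)
pooled = max_pool_2x2(feature_map)                      # shape (3, 3)
print(pooled)
# [[0. 0. 0.]
#  [3. 3. 3.]
#  [0. 0. 0.]]
```

The feature map is non-zero only along the edge, and pooling keeps that information while halving each dimension.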
[The following figure shows an example of the use of convolutional layers and pattern extraction for classification.][1]
[1]: https://i.stack.imgur.com/HS4U0.png [Kalhor, A. (2020). Classification and Regression NNs. Lecture.]
Advantages of convolutional layers:
Able to remove correlations and reduce input dimensions
Network generalization increases
Network robustness increases against changes because it extracts key features
Very powerful and widely used in supervised learning
...
Recurrent layers:
In these layers, the output of the current layer or of later layers can also be fed back as an input of the layer. These layers can also receive time series as input.
The output without using the recurrent layer is as follows (a simple example):
y = f(W * x)
Where x is the input, W is the weight matrix and f is the activation function.
But in recurrent networks it can be as follows:
y = f(W * x)
y = f(W * y)
y = f(W * y)
... until convergence
This means that in these networks the generated output can be used as an input, which gives the network memory. Recurrent networks range from simple ones, such as the Discrete Hopfield Net and the Recurrent Auto-Associative Net, to complex ones such as LSTM.
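The "iterate until convergence" scheme can be sketched with a tiny discrete Hopfield-style network in NumPy. The stored pattern, the Hebbian weight rule with a zero diagonal, the sign activation and the cap of 100 iterations are all assumptions made for this example. A corrupted input is fed in once, and the output is then fed back until it stops changing.

```python
import numpy as np

def f(z):
    """Sign activation with outputs in {-1, +1}."""
    return np.where(z >= 0, 1, -1)

# Store one pattern with a Hebbian rule; the diagonal is set to zero.
pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])
W = np.outer(pattern, pattern) - np.eye(len(pattern), dtype=int)

x = pattern.copy()
x[0] = -x[0]                      # corrupt one bit of the input

y = f(W @ x)                      # y = f(W * x)
for _ in range(100):              # safety cap on the number of iterations
    y_next = f(W @ y)             # y = f(W * y)
    if np.array_equal(y_next, y):
        break                     # converged: the output no longer changes
    y = y_next

print(y)
print(np.array_equal(y, pattern))
```

Here the memory lives in W: the network falls back to the stored pattern even though the input was not an exact copy of it.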
An example is shown in the image below.
Advantages of Recurrent Layers:
They have memory capability
They can use time series as input.
They can use the generated output for later use.
Widely used in machine translation, voice recognition, and image description
...
Networks that use convolutional layers are called convolutional networks (CNN). Similarly, networks that use recurrent layers are called recurrent networks. It is also possible to use both layers in a network according to the desired application!