How does a flatten layer unroll input images?
For example, suppose the input is a 7x7 image with depth 512.
How exactly does a flatten layer unroll this input data into a vector?
I guess in this case one cannot give an answer that is generally valid.
The flattening depends on the implementation.
It's up to you (or the library you use) how the flattening is performed.
For example, you could flatten a 7x7 image (one channel) by taking every row, turning it into a column vector, and stacking these column vectors on top of each other (in the following, the result is called a 'channel vector').
Imagine you have n channels (e.g. n = 512):
You could perform the above-mentioned flattening for every feature map (i.e. channel), which yields n 'channel vectors'.
You could process them separately in parallel, or you could stack all channel vectors on top of each other to obtain one vector containing the activations of all feature maps.
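As an illustration, here is a minimal NumPy sketch of exactly this scheme (the channels-last layout and the 7x7x512 shape are assumptions taken from the question; a library may well choose a different order):

import numpy as np

# Hypothetical 7x7 feature maps with 512 channels (channels-last layout assumed).
x = np.random.rand(7, 7, 512)

# Flatten each channel row by row into one 'channel vector' of length 49.
channel_vectors = [x[:, :, c].reshape(-1) for c in range(x.shape[-1])]

# Stack all channel vectors on top of each other into a single vector.
flat = np.concatenate(channel_vectors)   # shape (7 * 7 * 512,) = (25088,)

# The same result in one call: put channels first, then flatten row-major.
assert np.array_equal(flat, x.transpose(2, 0, 1).reshape(-1))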
The flattening step is needed so that you can make use of fully connected layers after some convolutional layers.
Fully connected layers don't have a local limitation like convolutional layers (which only observe some local part of an image by using convolutional filters).
This means you can combine all the local features found by the previous convolutional layers.
Related
Suppose we have a set of images and labels meant for a machine-learning classification task. The problem is that these images come with a relatively short retention policy. While one could train a model online (i.e. update it with new image data every day), I'm ideally interested in a solution that can somehow retain images for training and testing.
To this end, I'm interested in whether there are any known techniques, for example some kind of one-way hashing of images, that obfuscate the image but still allow deep learning techniques to be applied to it.
I'm not an expert on this, but the way I'm thinking about it is as follows: we have an NxN image I (say 1024x1024) with pixel values in P := {0,1,...,255}^3, and a one-way hash map f : P^(NxN) -> S. Then, when we train a convolutional neural network on I, we first map the convolutional filters via f and then train in the high-dimensional space S. I think there's no need for f to be locally sensitive, in the sense that pixels near each other don't need to map to values in S near each other, as long as we know how to map the convolutional filters to S. Please note that it's imperative that f is not invertible, and that the resulting stored image in S is unrecognizable.
One option for f and S is to run a convolutional neural network on I and then extract the representation of I from its fully connected layer. This is not ideal because there's a high chance that this network won't retain the finer features needed for the classification task, so I think this rules out a CNN or autoencoder for f.
From my understanding of CNNs, Flatten is used to go from 2D to 1D so that you can use Dense layers to perform classification. Also, in my understanding, flattening results in a vector whose length is the dimensions of the filter output times the number of filters.
Why is it that, after flattening, the first Dense layer does not have to have the same dimensions as the result of Flatten (which would be dims of filter output * number of filters)? CNNs in which the first Dense layer has fewer or more nodes than the output of Flatten both work, but I have no idea why. Isn't Flatten supposed to give you the inputs for the Dense layers?
Flatten does indeed flatten your outputs to one dimension.
However, the dense layer it feeds into can be any size. The number of neurons in the dense layer(s) does not depend on the number of inputs they receive. This is a feature of traditional neural networks (multilayer perceptrons) and has nothing to do with the convolution operations or layers beforehand.
The design of the fully connected part of the network, where the dense layers are, does not have a single definitive solution; however, there are rules of thumb that can be followed.
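For concreteness, here is a minimal sketch (assuming tf.keras; the 28x28 input and the layer widths are arbitrary choices for illustration). The Dense layer after Flatten can be any width; only its weight matrix adapts, here 5408x64:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # -> 26x26x32
    tf.keras.layers.MaxPooling2D((2, 2)),                   # -> 13x13x32
    tf.keras.layers.Flatten(),                               # -> 5408 values
    tf.keras.layers.Dense(64, activation="relu"),            # weight matrix 5408x64
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()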
When coding a convolutional neural network I am unsure where to start with the convolutional layer. When different convolutional filters are used to produce different feature maps, does that mean that the filters have different sizes (for example, 3x3, 2x2, etc.)?
Most examples (which are a good indication of how to go about coding a convolutional neural network) start with one convolutional layer, to which you pass the layer size, a 3x3 window, and the shape of the input data:
model.add(Conv2D(layer_size, (3, 3), input_shape=x.shape[1:]))
The filter sizes usually only differ in the max pooling layer, e.g. 2x2:
model.add(MaxPooling2D(pool_size=(2, 2)))
Layer sizes are usually selected from a range such as layer_size = [32, 64, 128], and you can do the same to experiment with different numbers of convolutional layers, e.g. convolution_layers = [1, 2, 3], as in the sketch below.
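A rough sketch of that kind of experiment (assuming tf.keras and binary labels; the placeholder data below stands in for your real images):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Placeholder data standing in for real images and labels.
x = np.random.rand(100, 64, 64, 1)
y = np.random.randint(0, 2, size=(100,))

layer_sizes = [32, 64, 128]
conv_layers = [1, 2, 3]

for layer_size in layer_sizes:
    for n_conv in conv_layers:
        model = Sequential()
        model.add(Conv2D(layer_size, (3, 3), activation="relu", input_shape=x.shape[1:]))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        for _ in range(n_conv - 1):              # add the remaining convolutional layers
            model.add(Conv2D(layer_size, (3, 3), activation="relu"))
            model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Flatten())
        model.add(Dense(1, activation="sigmoid"))
        model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
        # model.fit(x, y, epochs=3, validation_split=0.1)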
I've never seen different kernel sizes for the filters in the same layer; although it is possible to do so, it is not a default option in the frameworks I have used. What makes filters yield different feature maps are their weights.
Across layers, different kernel sizes are used because the idea of convolutional networks is to gradually reduce dimensionality through downsampling layers (max pooling, for example). In deeper levels you therefore have smaller feature maps, and a smaller filter keeps the layer convolutional rather than effectively fully connected (a kernel the same size as the feature map is equivalent to a dense layer), as the sketch below illustrates.
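A quick way to see the point in parentheses (a sketch assuming tf.keras): a convolution whose kernel covers the whole 7x7 input produces exactly one value per filter, just like a Dense layer on the flattened 49 inputs would.

import numpy as np
import tensorflow as tf

x = np.random.rand(1, 7, 7, 1).astype("float32")   # one 7x7 single-channel input

# Kernel as large as the input: one output value per filter.
conv = tf.keras.layers.Conv2D(filters=10, kernel_size=(7, 7), padding="valid")
print(conv(x).shape)                                # (1, 1, 1, 10)

# The equivalent dense layer on the flattened input: also 10 values, 49 weights each.
flat = tf.keras.layers.Flatten()(x)                 # shape (1, 49)
print(tf.keras.layers.Dense(10)(flat).shape)        # (1, 10)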
If you're starting with convolutional networks, I recommend playing with this interactive visualization of a CNN; it helped me with a lot of concepts.
I have designed a CNN in MATLAB with the following set of layers:
So, if my calculations are correct, the output size of the last ReLU layer should be 7x7x32, which seems right because that size contains 1568 values, and if we look at the Fully Connected Layer, we can see that its matrix of weights has size 10x1568:
Now, I'm working on a project where I'm coding the same CNN by hand. But when I have to code the Fully Connected Layer, I don't know how its matrix of weights is related to the previous output. For example, I guess that a hypothetical output(1,1,1) value is connected to the weights Weights(:,1). But what about the others?
My question therefore is: how should I loop through the output to match the weights, from the first one (1) to the last one (1568)?
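Not being sure of the exact ordering MATLAB uses internally, here is a hedged NumPy sketch of one consistent convention: MATLAB's out(:) unrolls column-major (first dimension fastest), which corresponds to order='F' in NumPy. Whether the toolbox's fully connected layer pairs Weights(:,1) with exactly that first element is an assumption worth checking against MATLAB's own predictions.

import numpy as np

# Placeholder 7x7x32 activation volume and 10x1568 weights, matching the sizes above.
out = np.random.rand(7, 7, 32)
W = np.random.rand(10, 1568)
b = np.random.rand(10)

# Column-major unrolling, like MATLAB's out(:): out(1,1,1), out(2,1,1), ..., out(7,7,32).
flat = out.flatten(order="F")          # shape (1568,)

# With this ordering, out(1,1,1) pairs with W(:,1), out(2,1,1) with W(:,2), and so on.
fc = W @ flat + b                      # shape (10,)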
I am new to TensorFlow and deep learning. I am trying to create a fully connected neural network for image processing. I am somewhat confused.
We have an image, say 28x28 pixels. This gives 784 inputs to the NN. For non-correlated inputs this is fine, but image pixels are generally correlated. For instance, consider a picture of a cow's eye. How can a neural network understand this when all the pixels are lined up in an array for a fully connected network? How does it determine the correlation?
Please research some tutorials on CNNs (Convolutional Neural Networks); here is a starting point for you. A fully connected layer of a NN surrenders all of the correlation information it might have had with the input; structurally, it implements the assumption that the inputs are statistically independent.
In contrast, a convolutional layer depends on the physical organization of the inputs (such as pixel adjacency), using that to find simple combinations (convolutions) of features from one layer to the next.
Bottom line: your NN doesn't find the correlation: the topology is wrong, and cannot do the job you want.
Also, please note that a layered network consisting of fully connected neurons with purely linear weight combinations is not deep learning. Deep learning needs at least one hidden layer, a topology which fosters "understanding" of intermediate structures, and a purely linear, fully connected stack provides no such thing: even if you program hidden layers, the outputs remain a simple linear combination of the inputs.
Deep learning requires some other kind of discrimination, such as convolutions, pooling, rectification, or other non-linear combinations.
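A tiny numerical check of that claim (shapes and random weights here are arbitrary): two purely linear, fully connected layers collapse into a single linear map.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 784))                   # five flattened 28x28 "images"
W1, b1 = rng.normal(size=(784, 100)), rng.normal(size=100)
W2, b2 = rng.normal(size=(100, 10)), rng.normal(size=10)

two_layers = (x @ W1 + b1) @ W2 + b2            # a "hidden" layer, but no non-linearity
one_layer = x @ (W1 @ W2) + (b1 @ W2 + b2)      # the equivalent single linear layer

print(np.allclose(two_layers, one_layer))       # True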
Let's break it into pieces to understand the intuition behind how a NN learns to predict.
To predict the class of a given image, we have to find a correlation or direct link between its input values and the class. We could imagine looking for a single pixel that tells us the image belongs to a given class, which is impossible. So what we have to do is build up more complex functions, or let's call them complex features, which help us generate data that is correlated with the wanted class.
To make it simpler, imagine you want to build an AND function (p and q) or an OR function (p or q). In both cases there is a direct link between the input and the output: in the AND function, if there is a 0 in the input, the output is always zero. So what if we want an XOR function (p xor q)? There is no direct link between the input and the output. The answer is to build a first layer that classifies AND and OR, and then a second layer that takes the result of the first layer and builds the XOR function:
(p xor q) = (p or q) and not (p and q)
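A minimal sketch of that construction with hand-set weights and a step activation (the weight and threshold values below are hypothetical choices that implement OR, AND, and the final combination):

import numpy as np

def step(z):
    return int(z >= 0)          # a simple threshold "neuron"

def xor(p, q):
    x = np.array([p, q])
    # Layer 1: one neuron wired as OR, one wired as AND.
    or_pq = step(x @ np.array([1, 1]) - 0.5)    # fires if p or q
    and_pq = step(x @ np.array([1, 1]) - 1.5)   # fires only if p and q
    # Layer 2: (p or q) and not (p and q)
    return step(np.array([or_pq, and_pq]) @ np.array([1, -1]) - 0.5)

for p in (0, 1):
    for q in (0, 1):
        print(p, q, "->", xor(p, q))            # prints 0, 1, 1, 0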
By applying this method in a multi-layer NN you'll get the same result, but then you'll have to deal with a huge number of parameters. One solution to avoid this is to extract representative, high-variance, uncorrelated features from the images that are correlated with their class, and feed those to the network. You can look up image feature extraction on the web.
This is a small explanation of how to see the link between images and their classes and how NNs work to classify them. You need to understand NN concepts first, and then you can go on to read about deep learning.