Number of parameters calculation in a convolutional NN

I'm new to studying CNNs and started by watching Andrew Ng's lessons.
There is an example that I did not understand:
How did he compute the #parameters value?

As you can see in Answer 1 of this StackOverflow question, the formula for the calculation of the number of parameters of a convolutional network is: channels_in * kernel_width * kernel_height * channels_out + channels_out.
But this formula doesn't agree with your data. In fact, the drawing you are showing does not agree with the table you are giving.
If I go by the drawing, then the first conv layer has 3 input channels, a 5*5 sliding window and 6 output channels, so the number of parameters should be 3*5*5*6+6 = 456.
You give the number 208, which is the number obtained for 1 input channel and 8 output channels (the table says 8, while the drawing says 6): 1*5*5*8+8 = 208. So 208 is correctly obtained from the table data, if we consider one input channel rather than three.
As for the second conv layer, with 6 input channels, a 5*5 sliding window and 16 output channels, you need 6*5*5*16+16 = 2,416 parameters, which looks suspiciously close to the 416 given in the table.
As for the remaining layers, the table always uses the input dimension times the output dimension, plus one: 5*5*16*120+1 = 48,001, 120*84+1 = 10,081, 84*10+1 = 841.
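As a sanity check, the counts above can be reproduced with a short Python helper (a sketch of the channels_in * kernel_width * kernel_height * channels_out + channels_out formula; the function name is my own):

```python
def conv_params(channels_in, kernel_w, kernel_h, channels_out):
    """Weights: one kernel_w x kernel_h x channels_in kernel per output
    channel, plus one bias per output channel."""
    return channels_in * kernel_w * kernel_h * channels_out + channels_out

# Drawing: 3 input channels, 5x5 window, 6 output channels
print(conv_params(3, 5, 5, 6))    # 456
# Table: 1 input channel, 5x5 window, 8 output channels
print(conv_params(1, 5, 5, 8))    # 208
# Second conv layer: 6 input channels, 5x5 window, 16 output channels
print(conv_params(6, 5, 5, 16))   # 2416
```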

Related

What is the encoded format in the Airbus Kaggle competition?

I'm trying to understand the format of the segmentation labels in the Airbus Ship Detection competition on Kaggle. The segmentation looks something like this:
264661 17 265429 33 266197 33 266965 33 267733...
It does not look like VOC format. What kind of format is this?
The evaluation page explains the format.
https://www.kaggle.com/c/airbus-ship-detection/overview/evaluation
In order to reduce the submission file size, our metric uses run-length encoding on the pixel values. Instead of submitting an exhaustive list of indices for your segmentation, you will submit pairs of values that contain a start position and a run length. E.g. '1 3' implies starting at pixel 1 and running a total of 3 pixels (1,2,3).
The competition format requires a space delimited list of pairs. For example, '1 3 10 5' implies pixels 1,2,3,10,11,12,13,14 are to be included in the mask. The pixels are one-indexed
and numbered from top to bottom, then left to right: 1 is pixel (1,1), 2 is pixel (2,1), etc. A prediction of "no ship in image" should have a blank value in the EncodedPixels column.
The metric checks that the pairs are sorted, positive, and the decoded pixel values are not duplicated. It also checks that no two predicted masks for the same image are overlapping.
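A minimal decoder for this run-length format might look like the following Python sketch (my own helper, assuming the one-indexed, column-major pixel numbering the evaluation page describes):

```python
def rle_decode(rle, height, width):
    """Turn '1 3 10 5'-style run-length pairs into a 2D binary mask.
    Pixels are one-indexed and numbered top to bottom, then left to
    right, i.e. the flat index runs down the columns (column-major)."""
    flat = [0] * (height * width)
    nums = [int(n) for n in rle.split()]
    for start, run in zip(nums[0::2], nums[1::2]):
        for i in range(start - 1, start - 1 + run):
            flat[i] = 1
    # flat index k corresponds to row k % height, column k // height
    return [[flat[c * height + r] for c in range(width)]
            for r in range(height)]

# '1 3' marks pixels 1,2,3: the top of the first column
mask = rle_decode('1 3', 4, 2)
print(mask)  # [[1, 0], [1, 0], [1, 0], [0, 0]]
```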

Interpreting time series dimension?

I am wondering if anyone can explain the interpretation of the size (number of features) in a time series. For example, consider this simple script in MATLAB:
X= randn(2,5,2)
X(:,:,1) =
-0.5530 0.4291 0.3937 -1.2534 0.2811
-1.4926 -0.7019 -0.8305 -1.4034 1.9545
X(:,:,2) =
0.2004 0.1438 2.3655 -0.1589 0.7140
0.4905 0.2301 -0.7813 -0.6737 0.2552
Assume X is a time series with the output above.
This generates 2 matrices, each with 2 rows and 5 columns. Can anyone tell me exactly what the first 2 and the 5 mean?
Some websites say this creates 2 vector-valued sequences of length 5 and size 2. What does size mean here?
Is 2 the number of features and 5 the number of time steps? The reason for this confusion is that I do not understand how to interpret the following sentence:
"Generate 2 vector-valued sequences of length 5; each vector has size 2."
What do size 2 and length 5 mean here?
This entirely depends on your data and how you want to store it. If you have some 2D data over time, I find it convenient to have a data matrix with the 2D data per time step in the 1st and 2nd dimensions, and time in the 3rd dimension.
Say I have a movie of 1920 by 1080 pixels with 100 frames; I'd store this as mov = rand(1080,1920,100) (1080 and 1920 swapped because of the row, col order of indexing). Now mov(:,:,1) would give me the first frame, etc.
BTW, your X is a normal array, not to be confused with the timeseries object.
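For illustration, here is a rough NumPy analogue of that storage convention (my own translation, scaled down so it runs quickly; NumPy is zero-indexed where MATLAB is one-indexed):

```python
import numpy as np

# Like mov = rand(1080,1920,100), but a tiny 10x19-pixel, 5-frame "movie"
mov = np.random.rand(10, 19, 5)

first_frame = mov[:, :, 0]   # mov(:,:,1) in MATLAB
print(first_frame.shape)     # (10, 19)
```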

How to calculate the third dimension of a Caffe convnet?

Following this question and this tutorial, I've created a simple net just like the tutorial's, but with 100x100 images and a first convolution kernel of 11x11 with pad=0.
I understand that the formula is (W−F+2P)/S+1, and in my case the dimension became [51x51x3] (3 is the number of RGB channels). But the number 96 pops up in my net diagram, and as the tutorial says, it is the third dimension of the output; in other words, my net after the first conv became [51x51x96]. I couldn't figure out how the number 96 was calculated and why.
Isn't the network's convolution layer supposed to pass through the three color channels, so that the output should be three feature maps? How come its dimension grows like this? Isn't it true that we have one kernel for each channel? How does this one kernel create 96 (or, in the first tutorial, 256 or 384) feature maps?
You are mixing input channels and output channels.
Your input image has three channels: R, G and B. Each filter in your conv layer acts on all three channels over its spatial kernel size (e.g., 3-by-3), and outputs a single number per spatial location. So, if you have one filter in your layer, then your output would have only one output channel(!)
Normally, you would like to compute more than a single filter at each layer; this is what the num_output parameter in convolution_param is for: it lets you define how many filters will be trained in a specific convolutional layer.
Thus a conv layer
layer {
  type: "Convolution"
  name: "my_conv"
  bottom: "x"  # shape 3-by-100-by-100
  top: "y"
  convolution_param {
    num_output: 32  # number of filters = number of output channels
    kernel_size: 3
  }
}
will output "y" with shape 32-by-98-by-98.
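The output shape follows from the (W−F+2P)/S+1 formula in the question, with the channel count set by num_output; here is a small Python sketch (the function name is my own):

```python
def conv_output_shape(num_output, height, width, kernel, pad=0, stride=1):
    """Output channels = number of filters; each spatial dimension
    follows (W - F + 2P)/S + 1."""
    def spatial(d):
        return (d - kernel + 2 * pad) // stride + 1
    return (num_output, spatial(height), spatial(width))

# The layer above: 32 filters, 3x3 kernel, 100x100 input, no pad, stride 1
print(conv_output_shape(32, 100, 100, 3))   # (32, 98, 98)
```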

training neural network for user recognition in MATLAB

I'm working on a gait recognition problem; the aim of this study is user authentication.
I have data for 36 users.
I've successfully extracted 143 features for each sample (or example), so a matrix called All_Feat of 36*143 has been created for each individual user (in other words, per user I have 36 examples, with 143 features extracted for each example).
Columns represent the extracted features and rows represent the samples (examples).
Then, for each user, I divided the matrix (All_Feat) into two matrices: a training matrix (25 rows and 143 columns) and a test matrix (11 rows and 143 columns).
I'm new to classification and this stuff.
I'd like to use machine learning (a neural network) to classify these features.
Therefore, as a first step I need to create a reference template for each user (the training phase).
This can be done by training the classifier with the user's features (data) as well as the remaining users' (the other 35 users are considered impostors).
Based on what I have read, training a neural network requires two classes: the first class contains all the training data of the genuine user (e.g. User1), labelled 1, while the second class has the training data of the impostors, labelled 0 (binary classification: 1 for the authorised user and 0 for impostors).
**Now my questions are:**
1- I don't know how to create these classes!
2- For example, if I want to train a neural network for User1, I have two variables, input and target. What should I assign to them?
Should input = the training matrix of User1 plus the training matrices of User2, User3, ..., User36?
Target = what should I assign to this matrix?
I really appreciate any help!
Try this: https://es.mathworks.com/help/nnet/gs/classify-patterns-with-a-neural-network.html
A few notes:
You said that, for each user, you have extracted 143 features. It sounds like you have only one repetition per user (i.e. the user has tried/used the system once). I don't know the source of your data, but I assume it has some type of randomness: you mention gait analysis, and that suggests the recorded data for a given user will be different each time that user uses the system. In other words: the user uses your system, you capture the data, and you extract the 143 features (numbers); then the user uses the system again, but the 143 extracted features will be slightly different. Therefore, you should get several examples of each user to train the classifier. In terms of a MATLAB matrix, your matrix should have one COLUMN per example and 143 rows (one per feature). Since you should have several repetitions per user (for example 10), your big matrix should be something like 143 rows x 360 columns.
You should "create" one new neural network for each user. Given a user (for example User4), you create a dataset (a new matrix) with samples of that user and samples of several other users (User1, User3, User5, ...). You do a binary classification ("User4" against "other users"). After training, it is advisable to test the classifier with data of users who were not present during the training phase (for example User2 and others). Since you are doing a binary classification, your matrices should be something like the following:
Example, you have 10 trials (examples) of each user. You want to create a neural network to detect the user User1. The matrix should be like:
(notation cU1_t1 means: column with features of user 1, trial 1)
input_matrix = [cU1_t1, cU1_t2, ..., cU1_t10, cU2_t1, ..., cU36_t10]
The target matrix should be like:
target = a matrix whose first 10 columns are [1; 0] and whose other 350 columns are [0; 1]. That means the first 10 columns are of type A and the others of type B; in this case "type A" means "User1" and "type B" means "Not User1".
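In NumPy terms (my own sketch of the layout just described, with the 10-vs-350 split as an example), the target matrix could be built like this:

```python
import numpy as np

n_user1, n_others = 10, 350   # trials of User1 vs. everyone else

# Each column is a one-hot label: [1, 0] for "User1", [0, 1] for "Not User1"
target = np.hstack([
    np.tile([[1], [0]], (1, n_user1)),
    np.tile([[0], [1]], (1, n_others)),
])
print(target.shape)   # (2, 360)
```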
Then, you should segment the data (train data, validation data, test data) to train the neural network and so on. Remember to hold some users back just for the testing phase; for example, the train matrix should not contain any columns from five users: user2, user6, user7, user10, user20 (50 columns).
I think you get the idea.
Regards.
************ UPDATE: ******************************
This example assumes that the user selects/indicates their name and then the system uses the neural network to authenticate the user (like a password). I will give you a small example with random numbers.
Let's say you have recorded data from 15 users (but in the future you will have more). You record "gait data" from them while they do something with your recording device. From the recorded signals you extract some features; let's say you extract 5 features (5 numbers). Hence, every time a user uses the machine you get 5 numbers. Even if the user is the same, the 5 numbers will be different each time, because the recorded signals have some randomness. Therefore, to train the neural network you need several examples of each user. Let's say that each user has performed 18 repetitions.
To sum up this example:
There are 15 users available for the experiment.
Each time the user uses the system you record 5 numbers (features). You get a feature vector. In MATLAB it will be a COLUMN.
For the experiment each user has performed 18 repetitions.
Now you have to create one neural network for each user. To that end, you have to construct several matrices.
Let's say you want to create the neural network (NN) of user 2 (U2). The NN will classify the feature vectors into 2 classes: U2 and NotU2. Therefore, you have to train and test the NN with examples of both. The group NotU2 represents any other user that is not U2; however, you should NOT train the NN with data from every other user in your experiment. That would be cheating (remember you can't have data from every user in the world). Therefore, to create the train dataset you will exclude all the repetitions of some users, to test the NN during training (validation dataset) and after training (test dataset). For this example we will use users {U1,U3,U4} for validation and users {U5,U6,U7} for testing.
Therefore you construct the following matrices:
Train input matrix
It will have 12 examples of U2 (roughly 70%) and every example of users {U8,U9,...,U14,U15}. Each example is a column; hence the train matrix will be a matrix of 5 rows and 156 columns (12+8*18). I will order it as follows: [U2_ex1, U2_ex2, ..., U2_ex12, U8_ex1, U8_ex2, ..., U8_ex18, U9_ex1, ..., U15_ex1, ..., U15_ex18], where U2_ex1 represents a column vector with the 5 features obtained from User 2 during repetition/example number 1.
-- Target matrix of the train matrix. It is a matrix of 2 rows and 156 columns. Each column j holds the correct class of example j: zeros, with a 1 at the row that indicates the class. Since we have only 2 classes, the matrix has only 2 rows. I will say that class U2 is the first one (hence the column vector for each example of this class is [1; 0]), and the other class (NotU2) is the second one (hence [0; 1]). Obviously, the columns of this matrix have the same order as the train matrix. So, according to the order I have used, the target matrix will be:
12 columns [1; 0] and 144 columns [0; 1].
Validation input matrix
It will have 3 examples of U2 (roughly 15%) and every example of users {U1,U3,U4}. Hence, this will be a matrix of 5 rows and 57 columns (3+3*18).
-- Target matrix of the validation matrix: a matrix of 2 rows and 57 columns: 3 columns [1; 0] and 54 columns [0; 1].
Test input matrix
It will have the remaining 3 examples of U2 (15%) and every example of users {U5,U6,U7}. Hence, this will be a matrix of 5 rows and 57 columns (3+3*18).
-- Target matrix of the test matrix: a matrix of 2 rows and 57 columns: 3 columns [1; 0] and 54 columns [0; 1].
IMPORTANT. The columns of each matrix should have a random order to improve the training. That is, do not put all the examples of U2 together and then the others. For this example I have put them in order for clarity. Obviously, if you change the order of the input matrix, you have to use the same order in the target matrix.
To use MATLAB you will have to pass two matrices: the inputMatrix and the targetMatrix. The inputMatrix is the train, validation and test input matrices joined, and the targetMatrix the same with the targets. So, the inputMatrix will be a matrix of 5 rows and 270 columns, and the targetMatrix 2 rows and 270 columns. For clarity I will say that the first 156 columns are the training ones, then the 57 validation columns, and finally the 57 testing columns.
The MATLAB commands will be:
% Create a Pattern Recognition Network
hiddenLayerSize = 10; %You can play with this number
net = patternnet(hiddenLayerSize);
%Specify the indices of each matrix
net.divideFcn = 'divideind';
net.divideParam.trainInd = [1:156];
net.divideParam.valInd = [157:213];
net.divideParam.testInd = [214:270];
% Train the Network
[net,tr] = train(net, inputMatrix, targetMatrix);
In the window that opens you will be able to see the performance of your neural network. The output object "net" is your trained neural network. You can use it with new data if you want.
Repeat this process for each other user (U1, U3, ...U15) to obtain his/her neural network.

How to calculate the number of parameters for GoogLeNet?

I have a pretty good understanding of AlexNet and VGG. I could verify the number of parameters used in each layer with what is being submitted in their respective papers.
However, when I try to do the same for the GoogLeNet paper "Going Deeper with Convolutions", even after many iterations I am NOT able to verify the numbers they have in Table 1 of their paper.
For example, the first layer is the good old plain convolution layer with kernel size (7x7), 3 input maps, and 64 output maps. So the number of parameters needed would be (3 * 49 * 64) + 64 (bias), which is around 9.5k, but they say they use 2.7k. I did the math for the other layers as well and I am always off by a few percent from what they report. Any idea?
Thanks
I think the first line (2.7k) is wrong, but the rest of the lines of the table are correct.
Here is my computation:
http://i.stack.imgur.com/4bDo9.jpg
Be careful to check which input is connected to which layer;
e.g. for the layer "inception_3a/5x5_reduce":
input = "pool2/3x3_s2" with 192 channels
dims_kernel = C*S*S = 192x1x1
num_kernel = 16
Hence the parameter size for that layer = 16*192*1*1 = 3072
Looks like they divided the numbers by 1024^n to convert to the K/M labels on the number of parameters in the paper Table 1. That feels wrong. We're not talking about actual storage numbers here (as in "bytes"), but straight up number of parameters. They should have just divided by 1000^n instead.
Maybe the 7*7 conv layer is actually the combination of a 7*1 conv layer and a 1*7 conv layer; then the number of params would be ((7+7)*64*3 + 64*2) / 1024 = 2.75k, which approaches 2.7k (or you can omit the 128 biases).
As we know, Google introduced asymmetric convolution while doing spatial factorization in paper "Spatial Factorization into Asymmetric Convolutions"
(1x7+7x1)x3x64 = 2688 ≈ 2.7k. This is my opinion; I am a new student.
The number of parameters in a CONV layer would be ((m * n * d) + 1) * k, adding 1 because of the bias term for each filter. The same expression can be written as: (width of the filter * height of the filter * number of filters in the previous layer + 1) * number of filters.
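Putting the numbers from this thread through that expression (a Python sketch; the 7x1+1x7 factorization is the guess discussed above, not something the paper's Table 1 states):

```python
def conv_params(d, m, n, k, bias=True):
    """((m * n * d) + 1) * k: m x n kernels over d input maps, k filters."""
    return (m * n * d + (1 if bias else 0)) * k

# Plain 7x7, 3 -> 64 channels: the ~9.5k the questioner computed
print(conv_params(3, 7, 7, 64))                  # 9472
# Factorized 7x1 + 1x7 pair, 3 -> 64 channels each
pair = conv_params(3, 7, 1, 64) + conv_params(3, 1, 7, 64)
print(pair, pair / 1024)                         # 2816 2.75
# inception_3a/5x5_reduce: 1x1 kernels over 192 maps, 16 filters, no bias
print(conv_params(192, 1, 1, 16, bias=False))    # 3072
```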