I have this model
Can anyone please explain how to calculate the number of parameters in each layer, and why the "conv2d_3" layer has 18464 parameters?
As @today mentioned, you should check this post first.
conv2d_3: 18464 = 3*3*64*32 (convolution kernels) + 32 (one bias per filter)
batch_normalization_1: 128 = 32 * 4 (gamma, beta, moving mean, moving variance per channel)
Two of those four parameters per channel (the moving mean and moving variance) are non-trainable. Therefore the 64 such parameters from bn_1 and the 128 from bn_2 are the 192 non-trainable params reported at the end.
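If you want to double-check these numbers yourself, here is a minimal Keras sketch; it only assumes the two layers described above (a 3x3 Conv2D with 32 filters fed by 64 input channels, followed by BatchNormalization), and the 28x28 spatial size is arbitrary:

from tensorflow import keras

# Minimal model containing just the two layers discussed above; only the
# 64 input channels matter for the parameter count.
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), input_shape=(28, 28, 64)),  # (3*3*64 + 1) * 32 = 18464
    keras.layers.BatchNormalization(),                          # 32 * 4 = 128 (64 non-trainable)
])
model.summary()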
I have a pretty good understanding of AlexNet and VGG; I could verify the number of parameters used in each layer against what is reported in their respective papers.
However, when I try to do the same for the GoogLeNet paper "Going Deeper with Convolutions", even after many attempts I am not able to verify the numbers they give in Table 1 of the paper.
For example, the first layer is a plain old convolution layer with kernel size 7x7, 3 input maps, and 64 output maps. Based on this, the number of parameters needed would be (3 * 49 * 64) + 64 (bias), which is around 9.5K, but they say they use 2.7K. I did the math for the other layers as well and I am always off by a few percent from what they report. Any idea?
Thanks
I think the first row (2.7K) of the table is wrong, but the rest of the rows are correct.
Here is my computation:
http://i.stack.imgur.com/4bDo9.jpg
Be careful to check which input is connected to which layer,
e.g. for the layer "inception_3a/5x5_reduce":
input = "pool2/3x3_s2" with 192 channels
dims_kernel = C*S*S = 192*1*1
num_kernel = 16
Hence parameter size for that layer = 16*192*1*1 = 3072
Looks like they divided the numbers by 1024^n to convert to the K/M labels on the number of parameters in the paper Table 1. That feels wrong. We're not talking about actual storage numbers here (as in "bytes"), but straight up number of parameters. They should have just divided by 1000^n instead.
Maybe the 7*7 conv layer is actually the combination of a 7*1 conv layer and a 1*7 conv layer; then the number of params could be: ((7+7)*64*3 + 64*2) / 1024 = 2.75K, which is close to 2.7K (or you can omit the 128 biases).
As we know, Google introduced asymmetric convolutions as part of spatial factorization (see the section "Spatial Factorization into Asymmetric Convolutions" in their later Inception paper).
(1*7 + 7*1) * 3 * 64 = 2688 ≈ 2.7K. This is just my guess; I am a new student.
The number of parameters in a CONV layer would be: ((m * n * d) + 1) * k, where the 1 accounts for the bias term of each filter. The same expression can be written as: (filter width * filter height * number of filters in the previous layer + 1) * number of filters.
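As a quick sanity check, that formula fits in a couple of lines of Python; the calls below just re-derive the counts already discussed in this thread (the helper name is made up):

def conv_params(m, n, d, k, bias=True):
    # filter width m, filter height n, input channels d, k filters
    return (m * n * d + (1 if bias else 0)) * k

print(conv_params(3, 3, 64, 32))               # 18464 -> the conv2d_3 layer above
print(conv_params(1, 1, 192, 16, bias=False))  # 3072  -> inception_3a/5x5_reduce (weights only)
print(conv_params(7, 7, 3, 64))                # 9472  -> GoogLeNet's first 7x7 conv (~9.5K, not 2.7K)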
I need to use KNN in MATLAB to find the closest data point in the training data to A.
I have data in .mat that has this kind of information (training data):
train_data = 1 232 34 21 0.542
2 32 333 542 0.32
and so on.
Then I have a second piece of information that I will gather through the application, but I will only get:
A = 2 343 543 43 0.23
So now my question is: is something like this all I need to do, and can I use it like this?
Does KNN need to learn anything, or do I only need to load the training data and some new data (like A) and run it through some formula? Or do I preload it into one function that learns it, and then a second function gives me the result?
Best regards.
So you have a training set (with labels) and some test data without labels? I think you can use the function you linked to, ClassificationKNN. If I understand your question, you want something like the example "Predict Classification Based on a KNN Classifier":
http://www.mathworks.se/help/stats/classification-using-nearest-neighbors.html#btap7nm
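If it helps to see the same workflow outside MATLAB, here is a small Python/scikit-learn sketch of the nearest-neighbour lookup; the assumption that the first column is an ID and the rest are features is mine, and if your training rows carry class labels, KNeighborsClassifier would be the counterpart of ClassificationKNN:

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical training data laid out like in the question:
# first column is an ID, the remaining columns are features.
train_data = np.array([
    [1, 232,  34,  21, 0.542],
    [2,  32, 333, 542, 0.32],
])
A = np.array([[2, 343, 543, 43, 0.23]])

# Fit on the feature columns only (drop the ID column), then query with A.
nn = NearestNeighbors(n_neighbors=1).fit(train_data[:, 1:])
dist, idx = nn.kneighbors(A[:, 1:])
print(train_data[idx[0, 0]], dist[0, 0])  # closest training row and its distance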
I've got a problem implementing a multilayer perceptron with the MATLAB Neural Network Toolbox.
I'm trying to implement a neural network that will recognize a single character stored as a binary image (size 40x50).
The image is transformed into a binary vector. The output is encoded in 6 bits. I use the simple newff function like this (with 30 neurons in the hidden layer):
net = newff(P, [30, 6], {'tansig' 'tansig'}, 'traingd', 'learngdm', 'mse');
Then I train my network with a dozen characters in 3 different fonts, with the following training parameters:
net.trainParam.epochs=1000000;
net.trainParam.goal = 0.00001;
net.trainParam.lr = 0.01;
After training, the net recognized all characters from the training sets correctly, but...
It cannot recognize more than a couple of characters from other fonts.
How could I improve this simple network?
You can try adding random elastic distortions to your training set (in order to expand it and make the network generalize better).
You can see the details in this nice article from Microsoft Research:
http://research.microsoft.com/pubs/68920/icdar03.pdf
You have a very large number of input variables (2,000, if I understand your description). My first suggestion is to reduce this number if possible. Some possible techniques include subsampling the input variables or calculating informative features (such as row and column totals, which would reduce the input vector to 90 = 40 + 50 values).
Also, your output is coded as 6 bits, which provides 64 possible combined values, so I assume that you are using these to represent 26 letters. If so, then you may fare better with another output representation. Consider that various letters which look nothing alike will, for instance, share a value of 1 on bit 1, complicating the mapping from inputs to outputs. An output representation with 1 bit for each class would simplify things, as sketched below.
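To make both suggestions concrete, here is a short NumPy sketch (Python rather than MATLAB); the 40x50 image size and the 26 classes come from the question, everything else is made up for illustration:

import numpy as np

# A hypothetical 40x50 binary character image (random here, just for the shapes).
img = (np.random.rand(40, 50) > 0.5).astype(float)

# Feature reduction: row and column totals -> 40 + 50 = 90 features
# instead of the full 2000-element binary vector.
features = np.concatenate([img.sum(axis=1), img.sum(axis=0)])
print(features.shape)   # (90,)

# One bit per class instead of a 6-bit code: a 26-element one-hot target.
target = np.zeros(26)
target[4] = 1.0         # e.g. the 5th letter, 'E'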
You could use patternnet instead of newff; it creates a network more suitable for pattern recognition. As the target, use a 26-element vector with a 1 in the right letter's position (0 elsewhere). The output of the recognition will be a vector of 26 real values between 0 and 1, with the recognized letter having the highest value.
Make sure to use data from all fonts for the training.
Give all the data sets as input; train will automatically divide them into training, validation, and test sets according to the specified percentages:
net.divideParam.trainRatio = .70;
net.divideParam.valRatio = .15;
net.divideParam.testRatio = .15;
(choose your own percentages).
Then test using only the test set; you can find its indices after training:
[net, tr] = train(net,inputs,targets);
tr.testInd
My data set includes 29 inputs and 6 outputs. When I use
net = newff(minmax(Pl),[14 12 8 6]);
to build my feed-forward MLP network and train it with
net.trainParam.epochs=50;
net=train(net,Pl,Tl);
the network cannot learn my data set and its error does not decrease below 0.7, but when I pass the arguments of newff like this:
net=newff(minmax(Pl),[14 12 8 6],{'tansig' 'tansig' 'tansig' 'purelin'},'trainlm');
the error decreases very fast and drops below 0.0001! The strange thing is that when I use the same code with only one hidden layer of 2 neurons:
net=newff(minmax(Pl),[2 6],{'tansig' 'purelin'},'trainlm');
the error again decreases below 0.2, which seems doubtful!
Please give me some tips and help me understand the difference between:
net = newff(minmax(Pl),[14 12 8 6]);
and
net=newff(minmax(Pl),[14 12 8 6],{'tansig' 'tansig' 'tansig' 'purelin'},'trainlm');
?
I think that the second argument to NEWFF (link requires login) is supposed to be the target vectors, not the size of hidden layers (which is the third argument).
Note that the default transfer function for hidden layers is tansig and for output layer is purelin, and the default training algorithm is trainlm.
Finally, remember that if you want to get reproducible results, you have to manually reset the random number generator to a fixed state at the beginning of each run.
I'm struggling to figure out how exactly to begin using SVD with a MovieLens/Netflix type data set for rating predictions. I'd very much appreciate any simple samples in python/java, or basic pseudocode of the process involved. There are a number of papers/posts that summarise the overall concept but I'm not sure how to begin implementing it, even using a number of the suggested libraries.
As far as I understand, I need to convert my initial data set as follows:
Initial data set:
user movie rating
1 43 3
1 57 2
2 219 4
Need to pivot to be (movies as rows, users as columns):

            user 1   user 2
movie  43        3        0
movie  57        2        0
movie 219        0        4
At this point, do I simply need to inject this Matrix into an SVD algorithm as provided by available libraries, and then (somehow) extract results, or is there more work required on my part?
Some information I've read:
http://www.netflixprize.com/community/viewtopic.php?id=1043
http://sifter.org/~simon/journal/20061211.html
http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine
http://www.slideshare.net/bmabey/svd-and-the-netflix-dataset-presentation
.. and a number of other papers
Some libraries:
LingPipe (Java)
Jama (Java)
Pyrsvd (Python)
Any tips at all would be appreciated, especially on a basic data set.
Thanks very much,
Oli
See SVDRecommender in Apache Mahout. Your question about input format entirely depends on what library or code you're using. There's not one standard. At some level, yes, the code will construct some kind of matrix internally. For Mahout, the input for all recommenders, when supplied as a file, is a CSV file with rows like userID,itemID,rating.
Data set: http://www.grouplens.org/node/73
SVD: why not just do it in Sage if you don't understand how to do SVD? Wolfram Alpha or http://www.bluebit.gr/matrix-calculator/ will decompose the matrix for you, or there's an explanation on Wikipedia.
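To make the pivot-then-factorize idea concrete, here is a minimal NumPy sketch of the basic truncated-SVD approach. It naively treats missing ratings as 0 and skips the mean-centering / incremental-learning tricks described in the posts linked above, so treat it as a starting point rather than a working recommender:

import numpy as np

# Ratings pivoted into a movie x user matrix as in the question
# (rows: movies 43, 57, 219; columns: users 1, 2). Missing ratings are 0.
R = np.array([[3.0, 0.0],
              [2.0, 0.0],
              [0.0, 4.0]])

# Factorize, then keep only the top-k singular values (k=1 here).
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 1
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# R_hat is the low-rank reconstruction; entries at positions that were 0
# serve as (very rough) predicted ratings.
print(np.round(R_hat, 2))

In practice you would subtract per-user or per-item means first and fit only the observed entries (for example with the incremental gradient approach in the sifter.org post above) rather than treating missing ratings as literal zeros.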