How to choose the number of filters in each Convolutional Layer? [closed] - neural-network

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
When building a convolutional neural network, how do you determine the number of filters used in each convolutional layer? I know that there is no hard rule about the number of filters, but from your experience, papers you have read, etc., is there an intuition or observation about the number of filters to use?
For instance (I'm just making this up as an example):
use more/fewer filters as the network gets deeper.
use more/fewer filters with a large/small kernel size
If the object of interest in the image is large/small, use ...

As you said, there are no hard rules for this.
But you can get inspiration from VGG16, for example.
It doubles the number of filters from one block of conv layers to the next (64, 128, 256, 512), stopping at 512.
For the kernel size, I usually keep 3x3 or 5x5.
But you can also take a look at Inception by Google.
They apply several kernel sizes in parallel, then concatenate the outputs. Very interesting.
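The doubling pattern can be sketched in a few lines of Python. This is only an illustration: the helper names `vgg_style_filters` and `conv_params` are made up for this example, and the defaults (start at 64, five blocks, cap at 512) are chosen to mirror the VGG16 block widths.

```python
def vgg_style_filters(base=64, num_blocks=5, cap=512):
    """Filters per block: double from block to block, never exceeding cap."""
    return [min(base * 2 ** i, cap) for i in range(num_blocks)]


def conv_params(in_channels, out_channels, kernel_size=3):
    """Learnable parameters of one square-kernel conv layer (weights + biases)."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + out_channels


filters = vgg_style_filters()
print(filters)                      # [64, 128, 256, 512, 512]
print(conv_params(3, filters[0]))   # 1792 for a 3x3 conv on an RGB input
```

The parameter count grows with the product of input and output channels, which is one reason the width is usually increased only after the spatial resolution has been reduced by pooling.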

As far as I know, there is no fixed depth for the convolutional layers. Just several suggestions:
In CS231n they mention that using 3 x 3 or 5 x 5 filters with a stride of 1 or 2 is a widely used practice.
How many of them: depends on the dataset. Also, consider fine-tuning a pretrained network if the data is suitable.
How will the dataset affect the choice? That is a matter of experiment.
What are the alternatives? Have a look at the Inception and ResNet papers for approaches which are close to the state of the art.
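To see what a given kernel size and stride do to the feature map, the usual output-size formula (W - K + 2P) / S + 1 can be checked with a small helper. This is a sketch; the function name `conv_output_size` is hypothetical, and it floors the division as many frameworks do when the stride does not divide evenly.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv layer along one dimension."""
    span = input_size - kernel_size + 2 * padding
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


# 3x3 kernel, stride 1, padding 1 keeps a 32-pixel side unchanged.
print(conv_output_size(32, 3, stride=1, padding=1))  # 32
# 5x5 kernel needs padding 2 to do the same.
print(conv_output_size(32, 5, stride=1, padding=2))  # 32
# Stride 2 roughly halves the side.
print(conv_output_size(32, 3, stride=2, padding=1))  # 16
```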

Related

Caffe CNN: diversity of filters within a conv layer [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have the following theoretical questions regarding the conv layer in a CNN. Imagine a conv layer with 6 filters (conv1 layer and its 6 filters in the figure).
1) What guarantees the diversity of learned filters within a conv layer? (I mean, how does the learning (optimization process) make sure that it does not learn the same (or similar) filters?)
2) Is diversity of filters within a conv layer a good thing or not? Is there any research on this?
3) During the learning (optimization process), is there any interaction between the filters of the same layer? If yes, how?
1.
Assuming you are training your net with SGD (or a similar backprop variant), the random initialization of the weights encourages them to be diverse: the gradient of the loss with respect to each randomly initialized filter is usually different, so the gradient will "pull" the weights in different directions, resulting in diverse filters.
However, there is nothing that guarantees diversity. In fact, sometimes filters become tied to each other (see GrOWL and references therein) or drop to zero.
2.
Of course you want your filters to be as diverse as possible, to capture all sorts of different aspects of your data. Suppose your first layer only has filters responding to vertical edges: how is your net going to cope with classes containing horizontal edges (or other types of textures)?
Moreover, if you have several filters that are the same, why compute the same responses twice? This is highly inefficient.
3.
Using "out-of-the-box" optimizers, the learned filters of each layer are independent of each other (linearity of the gradient). However, one can use more sophisticated loss functions or regularization methods to make them dependent.
For instance, group Lasso regularization can force some of the filters to zero while keeping the others informative.
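As a rough illustration of how group Lasso zeroes out whole filters, the sketch below treats each flattened filter as one group, computes the penalty (the sum of per-filter L2 norms), and applies the standard proximal step for that penalty (block soft-thresholding). The function names and the toy filters are assumptions made for this example; this is not the GrOWL method itself.

```python
import math


def filter_norm(filt):
    """L2 norm of one flattened filter."""
    return math.sqrt(sum(w * w for w in filt))


def group_lasso_penalty(filters, lam):
    """lam times the sum of per-filter L2 norms (one group per filter)."""
    return lam * sum(filter_norm(f) for f in filters)


def group_soft_threshold(filters, threshold):
    """Shrink each filter's norm by threshold; filters with a smaller norm become zero."""
    shrunk = []
    for f in filters:
        norm = filter_norm(f)
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        shrunk.append([scale * w for w in f])
    return shrunk


# Two toy filters: one with norm 5, one with norm 0.5.
filters = [[3.0, 4.0], [0.3, 0.4]]
print(group_lasso_penalty(filters, lam=0.1))
print(group_soft_threshold(filters, threshold=1.0))
```

With a threshold of 1.0, the weak filter is set to zero as a whole while the strong one is only scaled down, which is the "some filters to zero, others informative" behavior described above.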

neural network check plastic parts [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Neural networks are used to generalize and classify...
I have a little experience with classifying digits...
Using neural nets to recognize handwritten digits
I want to use a network to check plastic parts.
I have a video stream from the production of these plastic parts.
Should I train the network with many videos of correct plastic parts to get a positive output, and random videos to get a negative output?
If you have any books or links, I would be happy to see them.
EDIT
It looks like I asked this in a rather unclear way...
During production, faulty plastic parts can be created, and these should be recognized by the network. A lot of different mistakes can happen during production, so I think it only makes sense to train the network with correct plastic parts.
A convolutional neural network would be my recommendation.
You should show individual parts with similar background and lighting.
The training has to be done on both good and bad parts - a sufficient random sampling of both. You should also set aside a test set before training, so you can evaluate your CNN once it is trained.
You'll want to generate a confusion matrix from the test data so you'll know the rate of false positives, false negatives, correct, and incorrect classifications.
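A minimal sketch of the confusion-matrix step for the two-class case, in plain Python. The label strings "good" and "bad", the choice of "bad" as the positive (defect) class, and the toy predictions are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred, positive="bad"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def error_rates(counts):
    """False positive rate and false negative rate from the counts."""
    fpr = counts["fp"] / (counts["fp"] + counts["tn"])
    fnr = counts["fn"] / (counts["fn"] + counts["tp"])
    return fpr, fnr


# Hypothetical test-set labels and predictions.
y_true = ["good", "good", "bad", "bad", "good", "bad"]
y_pred = ["good", "bad", "bad", "good", "good", "bad"]
counts = confusion_counts(y_true, y_pred)
print(counts)               # {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1}
print(error_rates(counts))
```

For a quality check, the false negative rate (defective parts passed as good) is usually the number to watch most closely.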

Questions about word embedding(word2vec) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am trying to understand the word2vec (word embedding) architecture, and I have a few questions about it:
first, why is the word2vec model considered a log-linear model? Is it because it uses a softmax at the output layer?
second, why does word2vec remove the hidden layer? Is it just because of computational complexity?
third, why does word2vec not use an activation function (as compared to NNLM, the Neural Network Language Model)?
first, why is the word2vec model a log-linear model? Because it uses a softmax at the output layer?
Exactly, softmax is a log-linear classification model. The intent is to obtain values at the output that can be considered a posterior probability distribution.
second, why does word2vec remove the hidden layer? Is it just because of computational complexity?
third, why doesn't word2vec use an activation function, compared with NNLM (Neural Network Language Model)?
I think your second and third questions are linked, in the sense that an extra hidden layer and an activation function would make the model more complex than necessary. Note that while no activation is explicitly formulated, we could consider it to be a linear classification function. It appears that the dependencies that the word2vec models try to capture can be expressed with a linear relation between the input words.
Adding a non-linear activation function allows the neural network to represent more complex functions, which could in turn lead it to fit the input to something more complex that doesn't retain the dependencies word2vec seeks.
Also note that linear outputs don't saturate, which facilitates gradient-based learning.
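The "log-linear" point can be made concrete with a toy computation: each score is a plain dot product between an input vector and an output vector (no hidden non-linearity), and the softmax turns the scores into log-probabilities of the form score minus a normalizing constant. The vectors and the function name `context_log_probs` below are made up for illustration and are not taken from any word2vec implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def context_log_probs(input_vec, output_vecs):
    """log p(word | input) = score(word) - log sum_j exp(score(j))."""
    scores = [dot(input_vec, u) for u in output_vecs]
    top = max(scores)  # subtract the max for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_z for s in scores]


# Hypothetical toy embeddings: one input vector, three output vectors.
v_in = [1.0, 0.5]
u_out = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
log_probs = context_log_probs(v_in, u_out)
probs = [math.exp(lp) for lp in log_probs]
print(probs)
# Differences of log-probabilities equal differences of the linear scores.
print(log_probs[0] - log_probs[1])
```

Because the log-probability is linear in the dot-product score up to the shared normalizer, the difference between two log-probabilities equals the difference between the two scores, which is what "log-linear" means here.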

LDA and Dimensionality Reduction [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I have a dataset consisting of about 300 objects with 84 features for each object. The objects are already separated into two classes. With PCA I'm able to reduce the dimensionality to about 24. I'm using 3 principal components covering about 96% of the variance of the original data. The problem I have is that PCA doesn't care about the ability to separate the classes from each other. Is there a way to combine PCA for reducing the feature space and LDA for finding a discriminant function for those two classes?
Or is there a way to use LDA to find the features that best separate the two classes in three-dimensional space?
I'm a bit confused because I found this paper, but I don't really understand it: http://faculty.ist.psu.edu/jessieli/Publications/ecmlpkdd11_qgu.pdf
Thanks in advance.
You should have a look at this article on principal component regression (PCR, which is what you want if the variable to be explained is scalar) and partial least squares regression (PLSR) with MATLAB's Statistics Toolbox. In PCR, essentially, you choose the principal components that best explain the dependent variable. They may not be the ones with the largest variance.
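The point that the most discriminative direction need not be the one with the largest variance can be seen on a tiny two-class example. The sketch below computes Fisher's linear discriminant direction, w = Sw^-1 (m1 - m0), for 2-D points in plain Python; it is an illustration with made-up data and helper names, not the method from the linked article or paper, and it is restricted to two dimensions so that the matrix inverse can be written out by hand.

```python
def mean_2d(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter_2d(points, mu):
    """Entries (sxx, sxy, syy) of the scatter matrix around mu."""
    sxx = sum((p[0] - mu[0]) ** 2 for p in points)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
    syy = sum((p[1] - mu[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(class0, class1):
    """w = Sw^-1 (m1 - m0), with Sw the summed within-class scatter."""
    m0, m1 = mean_2d(class0), mean_2d(class1)
    a0, b0, d0 = scatter_2d(class0, m0)
    a1, b1, d1 = scatter_2d(class1, m1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Inverse of [[a, b], [b, d]] is (1 / det) * [[d, -b], [-b, a]].
    return [(d * dx - b * dy) / det, (a * dy - b * dx) / det]


# Toy data: large spread along x, but the classes differ mostly along y.
class0 = [(-3.0, 0.0), (0.0, 0.5), (3.0, 0.0), (0.0, -0.5)]
class1 = [(-2.0, 2.0), (1.0, 2.5), (4.0, 2.0), (1.0, 1.5)]
print(fisher_direction(class0, class1))
```

Here the first principal component would point along x (largest variance), while the discriminant direction points almost entirely along y, where the classes actually separate.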

Determining the movie popularity for upcoming movies with neural network [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a CSV data set with one movie's details per line.
These are: name, budget, revenue, popularity, runtime, rating, votes, date released.
I'm wondering how to split the data set into training, validation and testing sets?
Then of course, how to get some results?
It would be nice to get a brief step-by-step intro on where/how I should begin.
You should use nntool. In your case I guess curve fitting is appropriate, so use nftool.
Define your input and output in nftool; you can then randomly divide your data into training, validation and testing sets within the tool. In the nftool GUI you can choose the proportions of the split (80-10-10 or anything else). Then follow the interface and set the specifics of the network (e.g. the number of hidden neurons), and train the network. After training you can plot the performance, and depending on the result you can retrain or change the number of hidden neurons, the percentage of training data, and so on.
You can also check this:
http://www.mathworks.com/help/toolbox/nnet/gs/f9-35958.html
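Outside the nftool GUI, the same random 80-10-10 division can be sketched in a few lines. The example below is in Python rather than MATLAB and only illustrates the splitting step; the function name `split_dataset`, the fixed seed, and the stand-in rows are assumptions for this example.

```python
import random


def split_dataset(rows, train=0.8, val=0.1, seed=0):
    """Shuffle rows and split them into training, validation and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed makes the split repeatable
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    train_rows = rows[:n_train]
    val_rows = rows[n_train:n_train + n_val]
    test_rows = rows[n_train + n_val:]  # whatever remains is the test set
    return train_rows, val_rows, test_rows


# Stand-in for the parsed CSV lines (one movie per row).
movies = list(range(100))
train_rows, val_rows, test_rows = split_dataset(movies)
print(len(train_rows), len(val_rows), len(test_rows))  # 80 10 10
```

Every row lands in exactly one of the three lists, so the test set stays untouched until the final evaluation.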