Backpropagation Optimization: How do I use the derivatives for optimizing the weights and biases? [closed] - neural-network

Given the derivative of the cost function with respect to the weights or biases of the neurons of a neural network, how do I adjust these neurons to minimize the cost function?
Do I just subtract the derivative, multiplied by a constant, from the individual weight or bias? If a constant is involved, how do I know what value is reasonable to pick?

You're right about how to perform the update; this is what gradient descent does in its various forms. Learning rates (the constant you are referring to) are generally very small, on the order of 1e-6 to 1e-8. There are numerous articles on the web covering both of these concepts.
In the interest of a direct answer, though: it is good to start out with a small learning rate (on the order suggested above) and check that the loss is decreasing (via plotting). If the loss decreases, you can raise the learning rate a bit. I recommend raising it by about 3x its current value. For example, if it is 1e-6, raise it to 3e-6 and check again that your loss is still decreasing. Keep doing this until the loss is no longer decreasing nicely. The learning-rate plot from Stanford's CS231n lecture series gives some nice intuition on how learning rates affect the loss curve.
You want to raise the learning rate so that the model doesn't take as long to train. You don't want to raise the learning rate too much, because then it is possible to overshoot the local minimum you're descending towards and for the loss to increase (the high-learning-rate curve in that plot). This is an oversimplification, because the loss landscape of a neural network is very non-convex, but this is the general intuition.
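To make the update rule concrete, here is a minimal MATLAB sketch (my own toy example, not from the question) that applies w = w - lr * dC/dw to a one-dimensional cost C(w) = (w - 3)^2:

lr = 0.1;                 % learning rate (tune as described above)
w = 0;                    % initial weight
for step = 1:100
    grad = 2 * (w - 3);   % derivative of the toy cost C(w) = (w - 3)^2
    w = w - lr * grad;    % subtract the derivative times the learning rate
end
disp(w)                   % converges towards the minimum at w = 3

With lr = 0.1 the iterates converge to w = 3; with a much larger value such as lr = 1.5 they overshoot and diverge, which is exactly the behaviour the answer warns about.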

Related

Caffe CNN: diversity of filters within a conv layer [closed]

I have the following theoretical questions regarding the conv layer in a CNN. Imagine a conv layer with 6 filters (e.g., a conv1 layer and its 6 filters).
1) What guarantees the diversity of the learned filters within a conv layer? (I mean, how does the learning/optimization process make sure that it does not learn the same, or similar, filters?)
2) Is diversity of filters within a conv layer a good thing or not? Is there any research on this?
3) During the learning (optimization) process, is there any interaction between the filters of the same layer? If yes, how?
1.
Assuming you are training your net with SGD (or a similar backprop-based variant), the fact that the weights are initialized at random encourages them to be diverse: since the gradient of the loss w.r.t. each randomly initialized filter is usually different, the gradients "pull" the weights in different directions, resulting in diverse filters.
However, there is nothing that guarantees diversity. In fact, filters sometimes become tied to each other (see GrOWL and the references therein) or drop to zero.
2.
Of course you want your filters to be as diverse as possible, to capture all sorts of different aspects of your data. Suppose your first layer only has filters responding to vertical edges; how is your net going to cope with classes containing horizontal edges (or other types of texture)?
Moreover, if you have several filters that are the same, why compute the same responses twice? That is highly inefficient.
3.
Using "out-of-the-box" optimizers, the learned filters of each layer are independent of each other (linearity of gradient). However, one can use more sophisticated loss functions/regularization methods to make them dependent.
For instance, using group Lasso regularization, can force some of the filters to zero while keeping the others informative.
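As a rough illustration (my own, not taken from the GrOWL paper), a group-Lasso penalty treats each filter as one group and adds the sum of per-filter L2 norms to the training loss, which tends to zero out whole filters rather than individual weights:

W = randn(3, 3, 16);                     % 16 random 3x3 filters (toy example)
lambda = 0.01;                           % regularization strength
penalty = 0;
for k = 1:size(W, 3)
    g = W(:, :, k);
    penalty = penalty + norm(g(:));      % L2 norm of the whole filter (one group)
end
penalty = lambda * penalty;              % add this to the data loss during training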

How to choose the number of filters in each Convolutional Layer? [closed]

When building a convolutional neural network, how do you determine the number of filters used in each convolutional layer? I know that there is no hard rule about the number of filters, but from your experience, papers you have read, etc., is there any intuition/observation about the number of filters to use?
For instance (I'm just making this up as an example):
use more/fewer filters as the network gets deeper;
use filters with larger/smaller kernel sizes;
if the object of interest in the image is large/small, use ...
As you said, there are no hard rules for this.
But you can get inspiration from VGG16, for example.
It doubles the number of filters from one conv block to the next.
For the kernel size, I usually keep 3x3 or 5x5.
But you can also take a look at Inception by Google.
They use varying kernel sizes in parallel, then concatenate the results. Very interesting.
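For illustration, here is a minimal sketch of that doubling pattern as a MATLAB Deep Learning Toolbox layer array (the input size, filter counts, and class count are my own placeholders, not VGG16's actual configuration):

layers = [
    imageInputLayer([64 64 3])                      % toy input size
    convolution2dLayer(3, 32, 'Padding', 'same')    % block 1: 32 filters
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 64, 'Padding', 'same')    % block 2: doubled to 64
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 128, 'Padding', 'same')   % block 3: doubled to 128
    reluLayer
    fullyConnectedLayer(10)                         % 10 output classes (placeholder)
    softmaxLayer
    classificationLayer];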
As far as I am concerned, there is no fixed depth (number of filters) for the convolutional layers. Just a few suggestions:
In CS231n they mention that using 3x3 or 5x5 filters with a stride of 1 or 2 is a widely used practice.
How many of them: it depends on the dataset. Also, consider using fine-tuning if the data is suitable for it.
How will the dataset affect the choice? That is a matter of experimentation.
What are the alternatives? Have a look at the Inception and ResNet papers for approaches that are close to the state of the art.

Single Neuron Neural Network - Types of Questions? [closed]

Can anybody think of a real(ish) world example of a problem that can be solved by a single neuron neural network? I'm trying to think of a trivial example to help introduce the concepts.
Using a single neuron for classification is basically logistic regression, as Gordon pointed out.
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more metric (interval or ratio scale) independent variables. (statisticssolutions)
This is a good case to apply logistic regression:
Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether or not the candidate is an incumbent. (ats)
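As a rough sketch of that election example (the data below is synthetic and the coefficients are made up purely for illustration, assuming MATLAB's Statistics and Machine Learning Toolbox):

rng(0);
n = 200;
money = rand(n, 1) * 100;               % campaign spending (arbitrary units)
negative = rand(n, 1) * 10;             % time spent campaigning negatively
incumbent = double(rand(n, 1) > 0.5);   % incumbent (1) or not (0)
p = 1 ./ (1 + exp(-(0.03*money - 0.2*negative + incumbent - 1)));  % synthetic win probability
win = double(rand(n, 1) < p);           % simulated win/lose outcomes
X = [money negative incumbent];
b = glmfit(X, win, 'binomial');         % logistic regression (logit link by default)
phat = glmval(b, X, 'logit');           % predicted win probabilities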
For a single-neuron network, I find solving logic functions a good example. Assuming, say, a sigmoid neuron, you can demonstrate how the network solves the AND and OR functions, which are linearly separable, and how it fails to solve the XOR function, which is not.
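Here is a minimal sketch of that demonstration: a single sigmoid neuron trained with plain gradient descent on the AND function (the hyperparameters and variable names are my own choices):

X = [0 0; 0 1; 1 0; 1 1];                    % the four possible inputs
y = [0; 0; 0; 1];                            % AND targets
w = randn(2, 1) * 0.1;                       % weights
b = 0;                                       % bias
lr = 0.5;                                    % learning rate
for epoch = 1:5000
    p = 1 ./ (1 + exp(-(X * w + b)));        % sigmoid activation
    grad = p - y;                            % dLoss/dz for the cross-entropy loss
    w = w - lr * (X' * grad) / 4;            % average gradient over the 4 examples
    b = b - lr * mean(grad);
end
disp(round(1 ./ (1 + exp(-(X * w + b)))))    % should print 0 0 0 1

Swapping in the OR targets [0; 1; 1; 1] also converges, while the XOR targets [0; 1; 1; 0] cannot be fit by this single neuron, which is the point of the example.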

MATLAB Simulation of Production Machines Failure - MTBF & MTTR [closed]

I want to model production machines' failures in terms of MTBF (mean time between failures) and MTTR (mean time to repair), with an exponential distribution.
I am actually modelling a discrete event simulation (DEV/DEVS) problem. The problem is concerned with an FMS (flexible manufacturing system). It involves six unreliable machines, i.e. machines which are likely to fail during operation.
I want to know how to model unreliable machines in terms of MTBF and MTTR using the exponential distribution.
Also, I want to study the effect of varying the MTBF and MTTR values on the problem, e.g. distribution used: exponential; MTBF: 400, 600, ...; MTTR: 25, 50, ...
Check this and the related folders for examples of stochastic processes simulated in MATLAB, in particular an example of a Poisson process:
% poisson.m simulates a homogeneous Poisson process
lambda = 10;  % arrival rate
Tmax = 3;     % maximum time
T(1) = random('Exponential', 1/lambda);
i = 1;
while T(i) < Tmax
    T(i+1) = T(i) + random('Exponential', 1/lambda);
    i = i + 1;
end
Y = zeros(1, i-1);
plot(T(1:(i-1)), Y, '.');
Just remember that the Poisson process is a chain of exponentially distributed events, as the code suggests. Thus, to represent your machines, you just need to adapt the above code, adding as many variables as you have machines.
The rest is part of your homework.
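As a rough sketch of that adaptation for a single machine (my own code, using the same Statistics Toolbox random function as above): the machine alternates between exponentially distributed up times with mean MTBF and down times with mean MTTR; repeat (or vectorize) this per machine for the six machines:

MTBF = 400;                               % mean time between failures
MTTR = 25;                                % mean time to repair
Tmax = 10000;                             % simulated time horizon
t = 0;
up = true;                                % the machine starts in the working state
events = [];                              % rows: [event time, state after event] (1 = up, 0 = down)
while t < Tmax
    if up
        t = t + random('Exponential', MTBF);   % time until the next failure
        events(end+1, :) = [t, 0];             % machine goes down
    else
        t = t + random('Exponential', MTTR);   % repair duration
        events(end+1, :) = [t, 1];             % machine comes back up
    end
    up = ~up;
end
stairs(events(:, 1), events(:, 2));            % availability of the machine over time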

Can I use neural network in this case? [closed]

Can I use neural networks, an SVM, etc., if my output is a vector of 27,680 values in which all entries are zero and just one of them is one?
I mean, is it right to do this?
When I use an SVM I get this error:
Error using seqminopt>seqminoptImpl (line 198)
No convergence achieved within maximum number of iterations.
SVMs are usually binary classifiers. Basically, that means they separate your data points into two groups, which signals whether or not a data point belongs to a class. Common strategies for solving multi-class problems with SVMs are one-vs-rest and one-vs-one. In the case of one-vs-rest, you would train one classifier per class, which would be 27,680 for you. In the case of one-vs-one, you would train K choose 2 = K(K-1)/2 classifiers, so in your case around 383 million. As you can see, both numbers are rather high, so I would be pessimistic about your chances of successfully solving your problem with SVMs.
Nevertheless, you can try to increase the maximum number of iterations, as described in another Stack Overflow thread. Maybe it still works.
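For reference, a rough sketch of the two multi-class strategies using MATLAB's fitcecoc (toy data shown; in practice X and Y would be your own features and labels, and with K = 27,680 classes this would be extremely slow):

rng(0);
X = randn(300, 10);                                              % toy features
Y = randi(5, 300, 1);                                            % toy labels with K = 5 classes
t = templateSVM('KernelFunction', 'linear');
mdlOVA = fitcecoc(X, Y, 'Learners', t, 'Coding', 'onevsall');    % K binary SVMs
mdlOVO = fitcecoc(X, Y, 'Learners', t, 'Coding', 'onevsone');    % K*(K-1)/2 binary SVMs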
You can use Neural Nets for your task and a 1-of-K output is nothing unusual. However, even with only one hidden layer of 500 neurons (and using the input and output vector sizes mentioned in your comment) you will have (27680*2*500) + (500*27680) = 41,520,000 weights in your network. So I would expect rather long training times (although a Google employee would probably laugh about these numbers). You will also most likely need a lot of training examples, unless your input is really simple.
As an alternative you might look into Decision Trees/Random Forests, Naive Bayes or kNN.