What is bias weight in Matlab perceptron - matlab

What is the bias weight in a Matlab perceptron?
I implemented the OR gate perceptron using nntool. It works OK, but what is the contribution of the bias weight in the case of the nntool perceptron?

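If you interpret the perceptron as a linear classifier, then the perceptron itself is an algorithm that says on which side of a line (a hyperplane, in higher dimensions) your input lies. The bias parameter then sets the offset of this computed line from the origin: for the line w·x + b = 0, the distance from the origin is |b| / ||w||. Without a bias, the line would be forced to pass through the origin.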
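As a rough illustration of the point above, here is a minimal pure-Python sketch of a perceptron learning the OR gate. It is not what nntool does internally: the training loop, the learning rate, and the `>= 0` firing convention are assumptions made for the example, and the names `predict` and `train` are hypothetical.

```python
import math

# OR gate truth table: (inputs, target output).
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def predict(w, b, x):
    # Hard threshold (assumed convention): fire when w.x + b >= 0.
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0

def train(samples, epochs=20, lr=1.0):
    # Classic perceptron rule: nudge weights and bias by the prediction error.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

w, b = train(samples)
outputs = [predict(w, b, x) for x, _ in samples]

# Distance of the decision line w.x + b = 0 from the origin.
distance = abs(b) / math.hypot(w[0], w[1])

# With the bias removed, the weighted sum at (0, 0) is always 0, so the
# output there is fixed by the threshold convention, whatever the weights are.
origin_without_bias = predict(w, 0.0, (0, 0))

print(w, b, outputs, distance, origin_without_bias)
```

In this sketch the input (0, 0) can only be classified as 0 because the bias shifts the line away from the origin; the weights alone have no influence on that point.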

Related

weight and bias of trained LSTM network in Matlab

I tried to get and set the weights and biases of a trained LSTM in Matlab, but I failed.
Does anyone know how I can get and set the weights and biases of an LSTM? Functions like getwb(net) did not work.

Is non-linearity added to neural networks because of its derivatives?

I have a question:
I always assumed that non-linearity was applied to a neural network in order to calculate the minimum of an error surface.
If the function is f(x) = mx + b, the derivative is always the constant f'(x) = m.
Is this one of the reasons why non-linearity (e.g. through sigmoid functions, whose derivative is f'(x) = f(x)*(1 - f(x))) is applied?
Thank you very much.
The neural network is a model of your problem, making predictions for inputs. The loss function is a measure of the accuracy of those predictions with respect to the observed results.
"Linearity" typically refers to the model. A linear model is a very simple one: many interesting problems can be approximated by linear functions, but often you need a more sophisticated model.
Since the sequential composition of linear functions is still linear, the expressiveness of deep networks comes from inserting nonlinear activation functions that modulate the output of artificial neurons (approximating a thresholding filter). These nonlinear functions must be differentiable (at least almost everywhere) to work with the backpropagation algorithm.
Independently of the model, the loss function can be "linear" (L1), such as the sum of absolute deviations, or nonlinear, such as mean squared residuals (L2) or other loss functions. Again, the loss function must be differentiable too, at least almost everywhere.
See for instance this lecture by Hinton et al. for the discussion of a simple linear model with an L2 loss function (then enriched with a sigmoid activation function).
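The two facts used in this answer can be checked numerically. The sketch below, with slopes and intercepts chosen arbitrarily for illustration, shows that composing two functions of the form mx + b collapses into a single function of the same form, that the slope of mx + b is the constant m, and that the sigmoid derivative matches f(x)*(1 - f(x)). The helper names are hypothetical.

```python
import math

def linear(m, b):
    # Returns the function x -> m*x + b.
    return lambda x: m * x + b

f = linear(2.0, 1.0)
g = linear(-3.0, 0.5)

# g(f(x)) = -3*(2x + 1) + 0.5 = -6x - 2.5: still one function of the form mx + b.
composed = lambda x: g(f(x))
collapsed = linear(-6.0, -2.5)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_derivative(fn, x, h=1e-6):
    # Central finite difference, used only as a sanity check.
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

xs = [-2.0, 0.0, 1.5]
max_gap = max(abs(composed(x) - collapsed(x)) for x in xs)
slope_of_f = numeric_derivative(f, 0.3)           # approximately m = 2.0
sigmoid_gap = abs(numeric_derivative(sigmoid, 0.7) - sigmoid_derivative(0.7))

print(max_gap, slope_of_f, sigmoid_gap)
```

However many such layers are stacked without an activation in between, the result is equivalent to a single linear layer, which is why the nonlinearity is needed for expressiveness rather than for computing derivatives.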

Why do we need biases in the neural network?

We have weights and optimizer in the neural network.
Why can't we just compute W * input, then apply the activation, estimate the loss and minimize it?
Why do we need to do W * input + b?
Thanks for your answer!
There are two ways to think about why biases are useful in neural nets. The first is conceptual, and the second is mathematical.
Neural nets are loosely inspired by biological neurons. The basic idea is that human neurons take a bunch of inputs and "add" them together. If the sum of the inputs is greater than some threshold, then the neuron will "fire" (produce an output that goes to other neurons). This threshold is essentially the same thing as a bias. So, in this way, the bias in artificial neural nets helps to replicate the behavior of real, human neurons.
Another way to think about biases is simply by considering any linear function, y = mx + b. Let's say you are using y to approximate some linear function z. If z has a non-zero intercept, and you have no bias in the equation for y (i.e. y = mx), then y can never perfectly fit z, because y is forced through the origin. Similarly, if the neurons in your network have no bias terms, then it can be harder for your network to approximate some functions.
All that said, you don't "need" biases in neural nets, and, indeed, recent developments (like batch normalization, which has its own learnable shift) have made explicit biases less frequent in convolutional neural nets.
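The y = mx versus y = mx + b argument can be made concrete with a small least-squares fit. This is only a sketch: the target z = 2x + 3 and the sample points are made up for illustration, and the function names are hypothetical.

```python
# Target with a non-zero intercept: z = 2x + 3 (illustrative values).
xs = [0.0, 1.0, 2.0, 3.0]
zs = [2.0 * x + 3.0 for x in xs]

def fit_without_bias(xs, zs):
    # Least-squares slope for y = m*x: m = sum(x*z) / sum(x*x).
    return sum(x * z for x, z in zip(xs, zs)) / sum(x * x for x in xs)

def fit_with_bias(xs, zs):
    # Ordinary least squares for y = m*x + b.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    cov = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    var = sum((x - mean_x) ** 2 for x in xs)
    m = cov / var
    return m, mean_z - m * mean_x

def mse(predict, xs, zs):
    return sum((predict(x) - z) ** 2 for x, z in zip(xs, zs)) / len(xs)

m0 = fit_without_bias(xs, zs)
m1, b1 = fit_with_bias(xs, zs)

error_without_bias = mse(lambda x: m0 * x, xs, zs)
error_with_bias = mse(lambda x: m1 * x + b1, xs, zs)

print(m0, error_without_bias)
print(m1, b1, error_with_bias)
```

The fit without a bias is the best line through the origin, yet it still leaves a clearly non-zero error, while the fit with a bias recovers the slope and intercept of the target.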

Backpropagation - error derivative

I am learning the backpropagation algorithm used to train neural networks. It kind of makes sense, but there is still one part I don't get.
As far as I understand, the error derivative is calculated with respect to all weights in the network. This results in an error gradient whose number of dimensions is the number of weights in the net. Then, the weights are changed by the negative of this gradient, multiplied by the learning rate.
This seems about right, but why is the gradient not normalized? What is the rationale behind the length of the delta vector being proportional to the length of the gradient vector?
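In plain gradient descent, which is the method backpropagation is used with, the gradient is normally not normalized. Its length carries information: it is large where the error surface is steep and shrinks toward zero near a minimum, so the steps become smaller as you approach one. What you normalize and scale instead is your input. That gives you more proportional movement on the error surface, and proportional movement on the error surface gives you a faster approach to a local, or sometimes global, minimum. Here you can see an explanation of what normalization does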
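One way to see why the step length is usually left proportional to the gradient is a one-dimensional toy problem. The sketch below assumes the error E(w) = w**2, a fixed learning rate, and a hypothetical `descend` helper; it compares the usual update with one that keeps only the gradient's direction.

```python
def grad(w):
    # Gradient of the toy error E(w) = w**2.
    return 2.0 * w

def descend(w, lr, steps, normalize):
    path = [w]
    for _ in range(steps):
        g = grad(w)
        if normalize and g != 0.0:
            g = g / abs(g)  # keep only the direction (unit length in 1-D)
        w = w - lr * g
        path.append(w)
    return path

# Usual update: the step shrinks automatically as the gradient shrinks.
plain = descend(1.0, lr=0.1, steps=50, normalize=False)

# Normalized update: every step has length lr, so w keeps jumping
# back and forth around the minimum instead of settling.
unit = descend(1.0, lr=0.3, steps=50, normalize=True)

print(abs(plain[-1]), abs(unit[-1]))
```

With a fixed learning rate, the normalized variant cannot get closer to the minimum than roughly one step length, whereas the unnormalized one converges because the gradient itself vanishes there.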

Interpreting neurons in the neural network

I have come up with a solution for a classification problem using neural networks. I have got the weight vectors for the same too. The data is 5 dimensional and there are 5 neurons in the hidden layer.
Suppose neuron 1 has input weights w11, w12, ...w15
I have to explain the physical interpretation of these weights: what does a combination of these weights represent in the problem? Does any such interpretation exist, or does the neuron have no specific interpretation as such?
A single neuron will not give you any interpretation, but looking at a combination of a couple of neurons can tell you which pattern in your data is captured by that set of neurons (assuming your data is complicated enough to have multiple patterns, yet not so complicated that there are too many connections in the network).
The weights corresponding to neuron 1, in your case w11...w15, are the weights that map the 5 input features to that neuron. The weights quantify the extent to which each feature will affect its respective neuron (which, in turn, represents some higher-level feature). The value of each neuron is the weighted sum of its inputs, usually after having an activation function applied.
Mathematically, the values of all neurons in a layer are obtained by matrix multiplication of the feature matrix and the weight matrix. The loss function is, in its most basic form, the sum of the squares of the differences between the network's output and the actual labels. Stochastic gradient descent is then used to adjust the weight matrix's values to minimize the loss function.
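The computation described above can be sketched for a single neuron with five inputs. Everything concrete here is an assumption made for illustration: the weight values standing in for w11...w15, the sample, the sigmoid activation, the learning rate, and the simplification that this one neuron's output is compared directly with the label.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(weights, bias, features):
    # Weighted sum of the inputs plus bias, passed through the activation.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def squared_error(output, label):
    return (output - label) ** 2

def sgd_step(weights, bias, features, label, lr):
    out = neuron_output(weights, bias, features)
    # Chain rule: dL/dz = 2*(out - label) * out*(1 - out), and dz/dw_i = x_i.
    delta = 2.0 * (out - label) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, features)]
    return new_weights, bias - lr * delta

# Hypothetical values for w11..w15 and one 5-dimensional sample.
w1 = [0.2, -0.4, 0.1, 0.0, 0.3]
b1 = 0.0
x = [1.0, 0.5, -1.0, 2.0, 0.0]
label = 1.0

loss_before = squared_error(neuron_output(w1, b1, x), label)
w1_new, b1_new = sgd_step(w1, b1, x, label, lr=0.5)
loss_after = squared_error(neuron_output(w1_new, b1_new, x), label)

print(loss_before, loss_after)
print(w1_new)
```

Note that the weight attached to a feature whose value is 0 in this sample does not change, while the weight on the largest feature moves the most: each weight is updated in proportion to how much its feature contributed to the neuron's value.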