Feedforward Neural Network

Will there ever come a point during my epochs where my weights become greater than 1 if I use the logistic function as my sigmoid? I just want to check that I'm coding my feedforward implementation the proper way. Thanks.

That's possible. For example, if you have a 1-input, 1-output feedforward network that has no bias, your only training input is 0.1, and the corresponding target output is 1, then the higher the weight, the better. The logistic function simply ensures that the output is between 0 and 1; it puts no bound on the weights themselves.
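To see this concretely, here is a minimal numpy sketch of that 1-input, 1-output, bias-free case (the initial weight, learning rate, and epoch count are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = 0.1, 1.0   # the single training pair from the example above
w = 0.5                # arbitrary initial weight
lr = 5.0               # arbitrary learning rate

for epoch in range(2000):
    y = sigmoid(w * x)                     # forward pass, no bias
    grad = (y - target) * y * (1 - y) * x  # d(0.5 * (y - target)**2) / dw
    w -= lr * grad                         # gradient descent step

# w grows far past 1: the output can only approach the target of 1
# as w -> infinity, so gradient descent keeps pushing the weight up.
print(w, sigmoid(w * x))
```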

Related

How would a multiple output classification neural network work?

I currently understand and have made a simple neural network which solves the XOR problem. I want to make a neural network for digit recognition. I know that using the MNIST data I would need 784 input neurons, 15 hidden neurons and 10 output neurons (0-9).
However, I don't understand how the network would be trained and how feedforward would work with multiple output neurons.
For example, if the input was the pixels for the digit 3, how would the network determine which output neuron is picked, and when training, how would the network know which neuron should be associated with the target value?
Any help would be appreciated.
So you have a classification problem with multiple outputs. I'm supposing that you are using a softmax activation function for the output layer.
How the network determines which output neuron is picked: simple, it is the output neuron with the greatest probability of being the target class.
The network would be trained with standard backpropagation, same algorithm that you would have with only one output.
There is only one difference: the activation function.
For binary classification you need only one output (for example with digits 0 and 1, if probability < 0.5 then class is 0, else 1).
For multi-class classification you need an output node for each class; then the network will pick the node with the greatest probability of being the target class.
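To make that concrete, here is a small numpy sketch (the logits are made up for illustration): the prediction is the argmax of the softmax probabilities, and during training the target for the digit 3 is a one-hot vector over the 10 output neurons.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical raw outputs (logits) of the 10 output neurons for one image
logits = np.array([0.1, -1.2, 0.3, 2.5, 0.0, -0.5, 0.4, 0.2, -0.3, 0.1])
probs = softmax(logits)

# The picked neuron is the one with the greatest probability
print(np.argmax(probs))  # -> 3

# During training, the target for the digit 3 is a one-hot vector ...
target = np.zeros(10)
target[3] = 1.0
# ... and the cross-entropy loss between probs and target is backpropagated
loss = -np.sum(target * np.log(probs))
```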

Why do we need biases in the neural network?

We have weights and optimizer in the neural network.
Why can't we just compute W * input, then apply the activation, estimate the loss, and minimize it?
Why do we need to do W * input + b?
Thanks for your answer!
There are two ways to think about why biases are useful in neural nets. The first is conceptual, and the second is mathematical.
Neural nets are loosely inspired by biological neurons. The basic idea is that human neurons take a bunch of inputs and "add" them together. If the sum of the inputs is greater than some threshold, then the neuron will "fire" (produce an output that goes to other neurons). This threshold is essentially the same thing as a bias. So, in this way, the bias in artificial neural nets helps to replicate the behavior of real, human neurons.
Another way to think about biases is simply by considering any linear function, y = mx + b. Let's say you are using y to approximate some linear function z. If z has a non-zero intercept, and you have no bias in the equation for y (i.e. y = mx), then y can never perfectly fit z. Similarly, if the neurons in your network have no bias terms, then it can be harder for your network to approximate some functions.
All that said, you don't "need" biases in neural nets--and, indeed, recent developments (like batch normalization) have made biases less frequent in convolutional neural nets.
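Here is a small numpy illustration of the y = mx + b point above, with made-up data: a least-squares fit through the origin can never match a line with a non-zero intercept, while the same fit with a bias term is exact.

```python
import numpy as np

# Target line with a non-zero intercept: z = 2x + 3
x = np.linspace(-1, 1, 50)
z = 2 * x + 3

# Least-squares fit WITHOUT a bias: y = w * x
w_no_bias = (x @ z) / (x @ x)
err_no_bias = np.mean((w_no_bias * x - z) ** 2)

# Least-squares fit WITH a bias: y = w * x + b
A = np.stack([x, np.ones_like(x)], axis=1)
(w, b), *_ = np.linalg.lstsq(A, z, rcond=None)
err_bias = np.mean((w * x + b - z) ** 2)

print(err_no_bias)  # large: a line through the origin cannot reach z
print(err_bias)     # ~0: with a bias the fit is exact
```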

Backpropagation and training set for dummies

I'm at the very beginning of studying neural networks, but my scarce skills (or lack of intelligence) do not allow me to understand from popular articles how to correctly prepare a training set for the backpropagation training method, or what its limitations are. For example, I want to train the simplest two-layer perceptron to solve XOR with backpropagation (e.g. modify random initial weights for 4 synapses from the first layer and 4 from the second). The simple XOR function has two inputs and one output: {0,0}=>0, {0,1}=>1, {1,0}=>1, {1,1}=>0.
But neural network theory says that "backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient". Does this mean that backpropagation can't be applied if, in the training set, the number of inputs is not strictly equal to the number of outputs, and that this restriction cannot be avoided? Or does it mean that, if I want to use backpropagation for classification tasks such as XOR (i.e. where the number of inputs is bigger than the number of outputs), theory says it's always necessary to remake the training set so the two match (input => desired output): {0,0}=>{0,0}, {0,1}=>{1,1}, {1,0}=>{1,1}, {1,1}=>{0,0}?
Thanks for any help in advance!
Does this mean that backpropagation can't be applied if, in the training set, the number of inputs is not strictly equal to the number of outputs
If you mean that the output is "the class" in a classification task, then no, I don't think so.
backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient
I think it means that every input should have an output, not that each input needs a different output.
In a real-life problem like handwritten digit classification (MNIST), there are around 50,000 training examples (inputs), but they fall into only 10 digit classes.
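To make that concrete, here is a minimal numpy sketch of exactly the setup the asker describes (seed, learning rate, and epoch count are arbitrary): a 2-2-1 perceptron trained with backpropagation on the four XOR pairs, where each 2-dimensional input has a single scalar target.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The XOR training set: every input vector is paired with its ONE desired
# output. Inputs are 2-dimensional, targets 1-dimensional; they need not match.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# 2-2-1 perceptron (the asker's 4 + 4 synapses), plus bias terms
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)
W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)
lr = 0.5

for epoch in range(20000):
    h = sigmoid(X @ W1 + b1)           # forward pass, hidden layer
    y = sigmoid(h @ W2 + b2)           # forward pass, output layer
    d_y = (y - T) * y * (1 - y)        # backprop: squared-error gradient at output
    d_h = (d_y @ W2.T) * h * (1 - h)   # backprop: gradient at hidden layer
    W2 -= lr * h.T @ d_y; b2 -= lr * d_y.sum(axis=0)
    W1 -= lr * X.T @ d_h; b1 -= lr * d_h.sum(axis=0)

# Should approach [0, 1, 1, 0]; with only 2 hidden units, convergence can
# depend on the random initialization, so rerun with another seed if it stalls.
print(y.round(2).ravel())
```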

How to convert RNN into a regression net?

If the output is a tanh function, then I get a number between -1 and 1.
How do I go about converting the output to the scale of my y values (which happens to be around 15 right now, but will vary depending on the data)?
Or am I restricted to functions which vary within some kind of known range...?
Just remove the tanh, and your output will be an unrestricted number. Your error function should probably be squared error.
You might have to change the gradient calculation for your back-prop, if this isn't done automatically by your framework.
Edit to add: You almost certainly want to keep the tanh (or some other non-linearity) between the recurrent connections, so remove it only for the output connection.
In most RNNs for classification, people use a softmax layer on top of their LSTM or tanh layers, so I think you can replace the softmax with just a linear output layer. This is what some people do for regular neural networks as well as convolutional neural networks. You will still have the nonlinearity from the hidden layers, but your outputs will not be restricted to a range such as -1 to 1. The cost function would probably be the squared error, as larspars mentioned.
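A minimal PyTorch sketch of what both answers describe (the module and variable names are illustrative): the tanh nonlinearities stay inside the LSTM recurrence, the output layer is purely linear so predictions are not squashed into [-1, 1], and the loss is squared error.

```python
import torch
import torch.nn as nn

class RNNRegressor(nn.Module):
    """LSTM with a linear (unbounded) readout for regression."""
    def __init__(self, n_features, n_hidden=32):
        super().__init__()
        # tanh nonlinearities stay inside the recurrent connections
        self.rnn = nn.LSTM(n_features, n_hidden, batch_first=True)
        # linear readout: no tanh here, so outputs can sit on any scale
        self.head = nn.Linear(n_hidden, 1)

    def forward(self, x):                # x: (batch, time, n_features)
        out, _ = self.rnn(x)
        return self.head(out[:, -1, :])  # predict from the last time step

model = RNNRegressor(n_features=4)
loss_fn = nn.MSELoss()                   # squared error, as suggested above
opt = torch.optim.Adam(model.parameters())

x = torch.randn(8, 20, 4)                # dummy batch: 8 sequences of length 20
y = torch.full((8, 1), 15.0)             # targets on the asker's scale (~15)
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```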

Non-linear classification vs regression with FFANN

I am trying to differentiate between two classes of data for forecasting. Basically, the dependent variables are features of a signal that I want to forecast. I want to predict whether the signal will have a positive or negative slope in the near future (one time step ahead). I have tried different time-series analysis techniques, such as Fourier analysis, fitting with neural networks, auto-regressive models, and classification with neural nets (using patternet in Matlab).
The function is continuous, so the most logical assumption is to use some regression analysis tool to determine what's going to happen. However, since I only care whether the slope is going to be positive or negative, I changed the signal to a binary one (1 if the slope is positive, -1 if the slope is 0 or negative).
These are by far the best results I have gotten! However, for some unknown reason a neural net designed for classification did not work (the confusion matrix showed a precision of around 50%). So I decided to try with a regular feedforward neural net...
Since the neural network outputs continuous data, I didn't know what to do... But then I remembered logistic regression: since its transfer function is the logistic function (bounded by 0 and 1), its output can be interpreted as a probability. So I basically did the same, defined a threshold (e.g. above 0 is 1, below 0 is -1), and voila! The precision skyrocketed! I am getting a precision of around 70-80%.
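For reference, the thresholding just described amounts to something like this (the network outputs below are made up):

```python
import numpy as np

# y_hat: continuous network outputs in (-1, 1), e.g. from a tanh output unit
y_hat = np.array([0.83, -0.41, 0.07, -0.92])

# Above 0 -> class 1, otherwise -> class -1
y_class = np.where(y_hat > 0, 1, -1)
print(y_class)  # -> [ 1 -1  1 -1]
```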
Since I am using a sigmoid transfer function, the neural network will have a continuous output just as logistic regression does (but in this case between -1 and 1), so I am assuming my approach is technically still regression and not classification. My question is... which is better? For my specific problem, where fitting did not give really good results and I had to convert this to a binary problem, which should give better results: classification or regression?
Should I try a different configuration of the neural net (with a different transfer function)? Should I try a support vector machine or some other classification algorithm? Or should I stick with regression but define a threshold myself, just as I would with logistic regression?