I am trying to predict the solar energy value at a particular date, and for this purpose I am using an artificial neural network. I am having trouble choosing the right activation function. The sigmoid function gives an output between 0 and 1, but I want an output like 256.33. So I thought of applying sigmoid in the hidden layer and ReLU in the output layer to keep the non-linearity in the network. Can you suggest how to do this? Is my approach correct?
About my architecture: I am using 3 layers, of which one is a hidden layer. (1) I tried applying sigmoid as the activation function in both layers. (2) Then I applied ReLU in both layers. Both of these attempts failed. Now I am trying ReLU in the output layer and sigmoid in the hidden layer.
One solution would be to choose some value for the maximum possible solar energy that can be generated in one day, such as the maximum ever generated in one day or the maximum possible in the best-case scenario, and then use that value to scale the output of the sigmoid function:
f(x) = Sigmoid(x) * MAX_ENERGY
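For instance, here is a minimal Keras sketch of that idea (the layer sizes, input dimension, and MAX_ENERGY value are hypothetical placeholders, substitute your own):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

MAX_ENERGY = 400.0  # hypothetical upper bound on daily solar energy
n_features = 8      # hypothetical number of input features

model = models.Sequential([
    layers.Dense(16, activation='sigmoid', input_shape=(n_features,)),
    # Sigmoid squashes the output to (0, 1) ...
    layers.Dense(1, activation='sigmoid'),
    # ... and the Lambda layer rescales it to (0, MAX_ENERGY).
    layers.Lambda(lambda y: y * MAX_ENERGY),
])
model.compile(optimizer='adam', loss='mse')
```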
In a neural network, I need to threshold the output of an intermediate layer. The output of each neuron in the layer is a real value, but I need to binarize it (to 0 or 1). With hard thresholding, however, backpropagation won't work. Is there a way to achieve this?
Details:
I have a GAN-like setup, i.e. two neural networks trained end-to-end. The output of the first network consists of real values, but I need them to be binary. I have read that Gumbel-Softmax (categorical reparameterization) is used to handle discrete variables in a neural network. Is there a way to use it for my use case? If yes, how? If not, is there any other way?
From what I could gather on the internet, Gumbel is a probability distribution, and using it we can sample from a discrete distribution. But for my use case, I need a function that takes a real input and outputs a binary value, i.e. an activation function of that form. How can I achieve that?
Thanks!
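One technique that fits this description (and is related to the straight-through variant of Gumbel-Softmax) is the straight-through estimator: threshold in the forward pass, but let gradients flow through the underlying sigmoid in the backward pass. A minimal TensorFlow sketch, assuming a 0.5 threshold (this is one common workaround, not the only one):

```python
import tensorflow as tf

def straight_through_binarize(logits):
    soft = tf.sigmoid(logits)                # differentiable, in (0, 1)
    hard = tf.cast(soft > 0.5, soft.dtype)   # hard 0/1, no gradient
    # Forward pass returns `hard`; backward pass sees the gradient of `soft`,
    # because the (hard - soft) correction is excluded from the gradient.
    return soft + tf.stop_gradient(hard - soft)
```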
I have been experimenting with neural networks these days, and I have come across a general question regarding which activation function to use. This might be a well-known fact to many, but I couldn't understand it properly. A lot of the examples and papers I have seen work on classification problems, and they use either sigmoid (in the binary case) or softmax (in the multi-class case) as the activation function in the output layer, which makes sense. But I haven't seen any activation function used in the output layer of a regression model.
So my question is: do we deliberately omit the activation function in the output layer of a regression model because we don't want it to limit or restrict the output value? The output can be any number, possibly in the thousands, so activation functions like sigmoid or tanh wouldn't make sense. Or is there another reason? Or is there actually some activation function designed for this kind of problem?
For a linear-regression type of problem, you can simply create the output layer without any activation function, since we are interested in the numerical values without any transformation.
More info:
https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
For classification:
You can use sigmoid, tanh, softmax, etc.
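To make this concrete, here is a minimal Keras sketch (the layer sizes and input dimension are hypothetical placeholders):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 10  # hypothetical input dimension

# Regression: no activation on the output layer (Dense defaults to linear),
# so predictions are unrestricted real numbers.
regressor = Sequential([
    Dense(32, activation='relu', input_shape=(n_features,)),
    Dense(1),  # linear output
])
regressor.compile(optimizer='adam', loss='mse')

# Binary classification: sigmoid squashes the output to (0, 1).
classifier = Sequential([
    Dense(32, activation='relu', input_shape=(n_features,)),
    Dense(1, activation='sigmoid'),
])
classifier.compile(optimizer='adam', loss='binary_crossentropy')
```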
If you have, say, a sigmoid as the activation function in the output layer of your NN, you will never get any value less than 0 or greater than 1.
Basically, if the data you're trying to predict is distributed within that range, you might use a sigmoid function and test whether your prediction performs well on your training set.
More generally, when predicting data you should come up with the function that represents your data in the most effective way.
Hence, if your real data does not fit a sigmoid function well, you have to think of another function (e.g. some polynomial function, a periodic function, or some combination of them), but you should also always consider how easily you can build your cost function and evaluate its derivatives.
Just use a linear activation function without limiting the output range, unless you have some reasonable assumption about what that range should be.
I have trained a feedforward neural network in Matlab. Now I have to implement this neural network in C (or simulate the model in Matlab using the mathematical equations, without using the built-in functions). How do I do that? I know that I have to take the weights, biases, and activation function. What else is required?
There is no point in representing it as a mathematical function because it won't save you any computations.
Indeed, all you need are the weights, biases, activation function, and your architecture. Assuming it is a simple feedforward network, as you said, you need to implement some kind of matrix multiplication and addition in C, and you'll also need to implement the activation function. After that, your feedforward NN is ready to be implemented. If the C code will not be used for training, you won't need to implement the backpropagation algorithm.
A feedforward layer would be implemented as follows:
Output = Activation_function(Input * Weights + Bias)
where the shapes are:
Input: (1 x number_of_input_parameters_for_this_layer)
Weights: (number_of_input_parameters_for_this_layer x number_of_neurons_for_this_layer)
Bias: (1 x number_of_neurons_for_this_layer)
Output: (1 x number_of_neurons_for_this_layer)
The output of a layer is the input to the next layer.
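As a reference for checking your C code, here is a minimal NumPy sketch of the same arithmetic (the sizes are hypothetical; the matrix products map one-to-one onto the loops you would write in C):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical sizes: 3 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # input -> hidden weights
b1 = rng.normal(size=(1, 3))   # hidden biases
W2 = rng.normal(size=(3, 1))   # hidden -> output weights
b2 = rng.normal(size=(1, 1))   # output bias

x = np.array([[0.5, -1.2, 0.3]])   # one sample, shape (1 x 3)
hidden = sigmoid(x @ W1 + b1)      # Output = f(Input * Weights + Bias)
output = hidden @ W2 + b2          # linear output layer for regression
```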
After some days of searching, I found the following webpage to be very useful: http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
[Figure: a simple feedforward neural network, taken from the above website.]
In this figure, the circles denote the inputs to the network. The circles labeled “+1” are called bias units, and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer (which, in this example, has only one node). The middle layer of nodes is called the hidden layer, because its values are not observed in the training set. In this example, the neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit.
The mathematical equations representing this feedforward network are:

a1(2) = f(W11(1) x1 + W12(1) x2 + W13(1) x3 + b1(1))
a2(2) = f(W21(1) x1 + W22(1) x2 + W23(1) x3 + b2(1))
a3(2) = f(W31(1) x1 + W32(1) x2 + W33(1) x3 + b3(1))
hW,b(x) = a1(3) = f(W11(2) a1(2) + W12(2) a2(2) + W13(2) a3(2) + b1(2))

Here f is the transfer (activation) function and ai(l) denotes the output (activation) of unit i in layer l.
This neural network has parameters (W,b) = (W(1),b(1),W(2),b(2)), where we write Wij(l) to denote the parameter (or weight) associated with the connection between unit j in layer l and unit i in layer l+1. (Note the order of the indices.) Also, bi(l) is the bias associated with unit i in layer l+1.
So, from the trained model, as Mido mentioned in his answer, we have to take the input weight matrix W(1), the layer weight matrix W(2), the biases, the hidden layer transfer function, and the output layer transfer function. After this, use the above equations to compute the output hW,b(x). A popular choice for a regression problem is the tan-sigmoid transfer function in the hidden layer and a linear transfer function in the output layer.
For those who use Matlab, these links are highly useful:
try to simulate neural network in Matlab by myself
Neural network in MATLAB
Programming a Basic Neural Network from scratch in MATLAB
If the output is a tanh function, then I get a number between -1 and 1.
How do I go about converting the output to the scale of my y values (which happen to be around 15 right now, but will vary depending on the data)?
Or am I restricted to functions which vary within some kind of known range...?
Just remove the tanh, and your output will be an unrestricted number. Your error function should probably be squared error.
You might have to change the gradient calculation for your back-prop, if this isn't done automatically by your framework.
Edit to add: You almost certainly want to keep the tanh (or some other non-linearity) between the recurrent connections, so remove it only for the output connection.
In most RNNs for classification, people use a softmax layer on top of their LSTM or tanh layers, so I think you can replace the softmax with just a linear output layer. This is what some people do for regular neural networks as well as convolutional neural networks. You will still have the non-linearity from the hidden layers, but your outputs will not be restricted to a range such as [-1, 1]. The cost function would probably be the squared error, as larspars mentioned.
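A minimal Keras sketch of that idea (the sequence length and feature count are hypothetical placeholders): keep the tanh non-linearity inside the recurrent layer, and put a plain linear Dense layer on top for regression.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

timesteps, n_features = 30, 4  # hypothetical sequence shape

model = Sequential([
    # The LSTM keeps its internal tanh non-linearities on the recurrent path.
    LSTM(32, input_shape=(timesteps, n_features)),
    Dense(1),  # linear output: unrestricted real-valued predictions
])
model.compile(optimizer='adam', loss='mse')  # squared error, as suggested
```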
I am trying to make a simple radial basis function network (RBFN) for regression. I have a 20-dimensional feature dataset with over 600 samples. I need the final network to output one scalar value for each 20-dimensional sample.
Note: I am new to machine learning, and I feel like I am missing an important concept here.
With the perceptron, we can train (and I have trained) a linear network until the prediction error is at a minimum, using a small subset of the initial samples.
Is there a similar process with the RBFN?
Yes, there is.
The two main differences between a multi-layer perceptron and an RBFN are that an RBFN usually has just one hidden layer, and that its activation function is a Gaussian instead of a sigmoid.
The training phase can be done using gradient descent on the error loss function, so it is relatively simple to implement.
Keep in mind that an RBFN is a linear combination of RBF units, so the range of the output is limited, and you may need to transform it if you need a scalar outside of that range.
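To illustrate, here is a minimal NumPy sketch of an RBFN, using plain least squares for the output weights instead of the gradient descent mentioned above (the centers, gamma, and data are hypothetical placeholders, sized to match the question's 600 samples and 20 features):

```python
import numpy as np

def rbf_features(X, centers, gamma):
    # Gaussian activations: exp(-gamma * ||x - c||^2) for each center c
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))  # hypothetical 20-dimensional samples
y = rng.normal(size=600)        # hypothetical scalar targets

# pick 30 centers from the data (a common, simple heuristic)
centers = X[rng.choice(len(X), 30, replace=False)]
Phi = rbf_features(X, centers, gamma=0.1)

# linear output layer: solve least squares for the output weights
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_pred = Phi @ w  # one scalar prediction per sample
```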
Here are a few resources that you could consult as reference:
[PDF](http://scholar.lib.vt.edu/theses/available/etd-6197-223641/unrestricted/Ch3.pdf)
[Wikipedia](http://en.wikipedia.org/wiki/Radial_basis_function_network)
[Wolfram](http://reference.wolfram.com/applications/neuralnetworks/NeuralNetworkTheory/2.5.2.html)
Hope it helps,