Output Range of Neural Networks in MATLAB

I'm using a Neural Network to solve a regression problem.
I've scaled all the values to fall in the interval [0,1].
Therefore, all the training inputs and outputs are in [0,1].
However, when I run the network on some test examples, the output values go below 0. How can I fix this? I want all the values to be in [0,1].

If by "scaled all the values in [0,1]" you mean normalization of the dataset, then all only the input vectors are in [0,1]. The output of a neuron by itself can take any value. The activation function is what maps the output to the [0,1] or [-1,1] interval. Since some outputs are below zero, your network is probably using the tansig function as activation. Change that to the logsig function, which has the same shape but gives output in [0,1] instead of [-1,1]

Related

ROC curve from SVM classifier is visualised with limited thresholds in Python

I am trying to plot a ROC curve to evaluate my classifier, however my ROC plot is not "smooth". Is it perhaps a problem with the thresholds? I am quite new to classification in Python, so probably there is something wrong with my code (see image below). Where should I look for a solution?
I used drop_intermediate=False but it does not help.
This is because you are passing 0 and 1 values (predicted labels) to the plotting function. The ROC curve can only be computed when you provide floats in the range 0.0 to 1.0 (predicted label probabilities), so that the ROC curve can consider multiple cutoff values and appears more "smooth" as a result.
Whatever classifier you are using, make sure y_train_pred contains float values in the range [0.0, 1.0]. If you have a scoring classifier with values in the range [-∞, +∞], you can apply a sigmoid function to remap them to [0.0, 1.0].
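The question itself is about Python/scikit-learn, but the same scores-versus-labels distinction can be sketched in MATLAB with perfcurve from the Statistics and Machine Learning Toolbox (labels and scores are placeholder variables for the true classes and the continuous classifier outputs):

```matlab
% Passing continuous scores gives many thresholds (smooth curve);
% passing hard 0/1 labels collapses the curve to a few points.
[fpr_s, tpr_s] = perfcurve(labels, scores, 1);                 % probability scores
[fpr_h, tpr_h] = perfcurve(labels, double(scores >= 0.5), 1);  % hard 0/1 labels
plot(fpr_s, tpr_s, fpr_h, tpr_h);
legend('probability scores', 'hard labels');
```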

Function approximation by ANN

So I have something like this,
y=l3*[sin(theta1)*cos(theta2)*cos(theta3)+cos(theta1)*sin(theta2)*cos(theta3)-sin(theta1)*sin(theta2)*sin(theta3)+cos(theta1)*cos(theta2)*sin(theta3)]+l2*[sin(theta1)*cos(theta2)+cos(theta1)*sin(theta2)]+l1*sin(theta1)+l0;
and something similar for x, where theta_i are angles from specified intervals and l_i are some coefficients. The task is to approximate the inverse of this equation, so you set x and y and the result should be the appropriate thetas. So I randomly generate thetas from the specified intervals and compute x and y. Then I normalize x and y to <-1,1> and the thetas to <0,1>. I use this data as the training set in such a way that the inputs of the network are the normalized x and y, and the outputs are the normalized thetas.
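A minimal MATLAB sketch of this data-generation and normalization step (the theta ranges follow the intervals given in the EDIT below; the link lengths l0..l3 and the expression for x are placeholders, since only y is given above):

```matlab
N  = 1000;                                   % training-set size
t1 = (   0 + 180*rand(1, N)) * pi/180;       % theta1 in <0, 180> degrees
t2 = (-130 + 260*rand(1, N)) * pi/180;       % theta2 in <-130, 130>
t3 = (-150 + 300*rand(1, N)) * pi/180;       % theta3 in <-150, 150>
l0 = 0; l1 = 1; l2 = 1; l3 = 1;              % placeholder link lengths
y  = l3*sin(t1+t2+t3) + l2*sin(t1+t2) + l1*sin(t1) + l0;  % compact form of the expression above
x  = l3*cos(t1+t2+t3) + l2*cos(t1+t2) + l1*cos(t1);       % assumed analogue for x
X  = mapminmax([x; y], -1, 1);               % network inputs, normalized to <-1, 1>
T  = mapminmax([t1; t2; t3], 0, 1);          % network targets, normalized to <0, 1>
```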
I trained the network and tried different configurations, but the absolute error of the network was still around 24.9% after a whole night of training. That is so much that I don't know what to do.
Bigger training set?
Bigger network?
Experiment with learning rate?
Longer training?
Technical info
Error backpropagation was used as the training algorithm. Neurons have a sigmoid activation function and units are biased. I tried the topologies [2 50 3] and [2 100 50 3]; the training set has length 1000 and the training duration was 1000 cycles (in one cycle I go through the whole dataset). The learning rate is 0.2.
The approximation error was computed as
sum(abs(desired_output - reached_output)) / dataset_length.
The optimizer used is stochastic gradient descent.
The loss function is
1/2 * (desired - reached)^2.
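Restated as a small MATLAB snippet (desired, reached and dataset_length are placeholders for the target matrix, the network output matrix and the number of examples):

```matlab
err  = sum(abs(desired(:) - reached(:))) / dataset_length;  % reported absolute error
loss = 0.5 * (desired - reached).^2;                        % per-element squared loss
```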
The network was realized with my own MATLAB template for NNs. I know that is a weak point, but I'm sure my template is right because it has successfully solved the XOR problem, approximated differential equations, and approximated a state regulator. But I show this template because the information may be useful.
Neuron class
Network class
EDIT:
I used 2500 unique data points within the theta ranges.
theta1<0, 180>, theta2<-130, 130>, theta3<-150, 150>
I also experimented with a larger dataset, but the accuracy doesn't improve.

How to model scalar values with a neural network if the magnitude matters too, not just the direction

Say you want to predict temperature changes based on some input data. Temperature changes are positive or negative scalars with a mean of zero. If only the direction matters, one could just use tanh as the activation function in the output layer. But say that for delta-temperatures, predicting the magnitude of the change is also important, not just the sign.
How would you model this output? Tanh doesn't seem to be a good choice because it only gives values between -1 and 1. And say temperature changes have a Gaussian, or some other weird, distribution, so hovering around the quasi-linear domain of tanh near 0 would be difficult for a neural network to learn. I'm worried that the sign would come out right but the magnitude output would be useless.
How about having the network output one-hot vectors of length N and treating the argmax of this output vector as a temperature change on a pre-defined window? Say the window is -30 to +30 degrees with an N=60-long one-hot vector; if argmax(output) = 45, that means the prediction is about 15 degrees.
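For concreteness, a hypothetical MATLAB sketch of this binning scheme (dT_true and output are placeholders for a training target in degrees and the length-N network output vector):

```matlab
N        = 60;                           % number of bins
window   = [-30 30];                     % temperature-change window in degrees
binWidth = diff(window) / N;             % 1 degree per bin
% encode a training target as a one-hot vector:
k_true         = min(N, max(1, ceil((dT_true - window(1)) / binWidth)));
target         = zeros(N, 1);
target(k_true) = 1;
% decode a prediction:
[~, k_pred] = max(output);                     % argmax of the network output
dT_pred     = window(1) + k_pred * binWidth;   % e.g. k_pred = 45 -> 15 degrees
```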
I was actually not sure how to search for this topic.

Creating a 1D second derivative of Gaussian window

In MATLAB I need to generate a second derivative of a Gaussian window to apply to a vector representing the height of a curve. I need the second derivative in order to determine the locations of the inflection points and maxima along the curve. The vector representing the curve may be quite noisy, hence the use of the Gaussian window.
What is the best way to generate this window?
Is it best to use the gausswin function to generate the Gaussian window and then take the second derivative of that?
Or to generate the window manually using the equation for the second derivative of the Gaussian?
Or is it even best to apply the Gaussian window to the data and then take the second derivative of it all? (I know these last two are mathematically the same, however with discrete data points I do not know which will be more accurate.)
The maximum length of the height vector is going to be around 100-200 elements.
Thanks
Chris
I would create a linear filter composed of the weights generated by the second derivative of a Gaussian function and convolve this with your vector.
The weights of a second derivative of a Gaussian are given by:
G''(t) = C * ((t - tau)^2 / sigma^4 - 1 / sigma^2) * exp(-(t - tau)^2 / (2 * sigma^2))
Where:
Tau is the time shift for the filter. If you are generating weights for a discrete filter of length T with an odd number of samples, set tau to zero and allow t to vary over [-T/2, T/2].
sigma varies the scale of your operator. A good choice is sigma around T/6; if you are concerned about the filter being too long, the length can be reduced so that sigma is closer to T/4.
C is the normalising factor. It can be derived algebraically, but in practice I always do this numerically after calculating the filter weights. For unity gain when smoothing periodic signals, I will set C = 1 / sum(G'').
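A minimal sketch of generating such a filter and applying it (height is a placeholder for your curve vector; the filter length T and the normalisation choice are just example values):

```matlab
% Build the G'' weights from the formula above (tau = 0) and convolve with the data.
T     = 21;                            % odd filter length (placeholder)
sigma = T / 6;                         % scale, as suggested above
t     = -(T-1)/2 : (T-1)/2;            % t symmetric about zero
G2    = (t.^2 ./ sigma^4 - 1 ./ sigma^2) .* exp(-t.^2 ./ (2*sigma^2));
C     = 1;                             % normalising factor; choose as described above
G2    = C * G2;
response = conv(height, G2, 'same');   % second-derivative-of-Gaussian response
```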
In terms of your comment on the equivalence of smoothing first and taking a derivative later, I would say it is more involved than that: which derivative operator would you use in the second step? A simple central difference would not yield the same results.
You can get an equivalent (but approximate) response to a second derivative of a Gaussian by filtering the data with two Gaussians of different scales and then taking the point-wise differences between the two resulting vectors. See Difference of Gaussians for that approach.
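A sketch of that difference-of-Gaussians alternative, using gausswin from the Signal Processing Toolbox (the window length and the two alpha values are placeholders; height is your data vector):

```matlab
T        = 21;                                                       % filter length (placeholder)
g_narrow = gausswin(T, 3.0); g_narrow = g_narrow / sum(g_narrow);    % smaller sigma
g_wide   = gausswin(T, 1.5); g_wide   = g_wide   / sum(g_wide);      % larger sigma
smooth_narrow = conv(height, g_narrow, 'same');
smooth_wide   = conv(height, g_wide,   'same');
dog_response  = smooth_wide - smooth_narrow;   % approximates the G'' response up to scale
```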

backpropagation neural network with discrete output

I am working through the XOR example with a three-layer backpropagation network. When the output layer has a sigmoid activation, an input of (1,0) might give 0.99 for a desired output of 1, and an input of (1,1) might give 0.01 for a desired output of 0.
But what if I want the output to be discrete, either 0 or 1? Do I simply set a threshold in between, at 0.5? Would this threshold need to be trained like any other weight?
Well, you can of course put a threshold after the output neuron which maps values above 0.5 to 1 and, vice versa, all outputs below 0.5 to zero. But I suggest not hiding the continuous output behind a discretization threshold, because an output of 0.4 is less "zero" than a value of 0.001, and this difference can give you useful information about your data.
Do the training without the threshold, i.e. compute the error on an example using what the neural network outputs, without thresholding it.
Another little detail: do you use a transfer function such as the sigmoid? The sigmoid function returns values in [0, 1], but 0 and 1 are asymptotes, i.e. the sigmoid can come close to those values but never reach them. A consequence of this is that your neural network cannot output exactly 0 or 1! Thus, using the sigmoid times a factor a little above 1 can correct this. This and some other practical aspects of backpropagation are discussed here: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
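A small MATLAB sketch of that advice (net and inputs are placeholders for an already trained network and its input data):

```matlab
y_cont = net(inputs);             % continuous outputs in (0, 1); use these during training
y_hard = double(y_cont >= 0.5);   % fixed 0.5 threshold, applied only when a hard decision is needed
% Optional trick from the last paragraph: scale the sigmoid by a factor a
% little above 1 so the targets 0 and 1 become reachable (1.05 is a placeholder):
sig = @(v) 1.05 ./ (1 + exp(-v));
```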