Weather forecasting using a neural network

I am trying to write a program for weather forecasting using backpropagation. I have data for several parameters such as temperature, humidity, wind speed, and sea-level pressure. I have set up 4 input nodes, one for each of these parameters (temperature, humidity, wind speed, sea-level pressure).
Now I am confused about what the output/target should be. Is a monthly/seasonal division necessary?
And how can I normalize those 4 different parameters (to between -1 and 1)?

You could treat this as a multi-class classification problem. Say you want to predict whether the weather will be sunny, rainy, cloudy, or windy; those would be your classes.
You could normalize the input features with the formula (x - μ) / σ, where μ is the mean of the feature and σ its standard deviation (z-score normalization). Note that this does not strictly bound the values to [-1, 1]; if you need that exact range, use min-max scaling instead.
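Since the question asks for a [-1, 1] range specifically, min-max scaling is one option. A minimal sketch in Python/NumPy, with made-up readings for the four features (the values and units here are purely illustrative):

```python
import numpy as np

# Hypothetical readings, one column per input feature:
# temperature (°C), relative humidity, wind speed (km/h), sea-level pressure (hPa).
X = np.array([
    [30.5, 0.70, 12.0, 1012.0],
    [25.0, 0.55,  8.5, 1008.0],
    [35.2, 0.80, 15.3, 1015.5],
])

# Min-max scaling per feature to the exact range [-1, 1]:
# x' = 2 * (x - min) / (max - min) - 1
mins = X.min(axis=0)
maxs = X.max(axis=0)
X_scaled = 2 * (X - mins) / (maxs - mins) - 1

print(X_scaled.min(axis=0))  # [-1. -1. -1. -1.]
print(X_scaled.max(axis=0))  # [1. 1. 1. 1.]
```

Note that the min and max must come from the training data only, and the same values are then reused to scale validation and test inputs.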

Neural Networks for predicting Energy at a particular date

I am trying to predict the solar energy value at a particular date. For this purpose I am using an artificial neural network. I am having trouble deciding on the correct activation function. Since the sigmoid function gives outputs in (0, 1), but I want outputs like 256.33, I thought of applying sigmoid in the hidden layer and ReLU in the output layer to keep the network non-linear. Can you suggest a way to do this? Is my approach correct?
About my architecture: I am using 3 layers, of which one is hidden. (1) I tried sigmoid as the activation function for both layers. (2) Then I tried ReLU for both layers. Both methods failed. Now I am trying ReLU on the output layer and sigmoid on the hidden layer.
One solution would be to choose some value for the maximum possible solar energy that can be generated in one day, such as the maximum ever generated in one day or the maximum possible in the best-case scenario. Then use that value to scale the output of the sigmoid function:
f(x) = Sigmoid(x) * MAX_ENERGY
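A minimal sketch of this scaled-sigmoid output in Python; the MAX_ENERGY value here is an assumed placeholder, to be replaced by whatever upper bound fits the actual data:

```python
import numpy as np

MAX_ENERGY = 500.0  # assumed upper bound on daily solar energy (same units as the target)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scaled_output(x):
    # f(x) = sigmoid(x) * MAX_ENERGY maps the net's raw output into [0, MAX_ENERGY]
    return sigmoid(x) * MAX_ENERGY

print(scaled_output(0.0))  # 250.0, the midpoint of the range
```

During training, the targets would correspondingly be divided by MAX_ENERGY so the network learns values in (0, 1).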

Discriminant analysis method to classify data

My aim is to classify the data into two sections, upper and lower, by finding the midline between the peaks.
I would like to apply machine learning methods, i.e. discriminant analysis.
Could you let me know how to do that in MATLAB?
It seems that what you are looking for is a GMM (Gaussian mixture model). With K = 2 components and dimension 1 this is a simple, fast method that gives you a direct solution. Given the fitted components, it is easy to find the local minimum of the density between them analytically (roughly a weighted average of the means, with weights proportional to the standard deviations).
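A hedged sketch of that approach on made-up 1-D data, using scikit-learn's GaussianMixture rather than MATLAB (the equivalent MATLAB route would be fitgmdist); the cluster locations and spreads are invented for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Made-up 1-D data with two peaks.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(2.0, 0.3, 300),
                    rng.normal(5.0, 0.4, 300)]).reshape(-1, 1)

# Fit a two-component Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
means = gmm.means_.ravel()
stds = np.sqrt(gmm.covariances_.ravel())
order = np.argsort(means)
(m1, m2), (s1, s2) = means[order], stds[order]

# Midline estimate: weighted average of the component means,
# with weights proportional to the standard deviations.
midline = (s1 * m1 + s2 * m2) / (s1 + s2)
print(m1 < midline < m2)  # True: the split lies between the two peaks
```

Everything above the midline goes to one class, everything below to the other.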

Evaluating performance of Neural Network embeddings in kNN classifier

I am solving a classification problem. I train my unsupervised neural network for a set of entities (using skip-gram architecture).
The way I evaluate it is to search for the k nearest neighbours of each point in the validation data among the training data. I take a weighted sum (weights based on distance) of the labels of the nearest neighbours and use that as the score for each validation point.
Observation: as I increase the number of epochs (model 1: 600 epochs, model 2: 1400 epochs, model 3: 2000 epochs), my AUC improves at smaller values of k but saturates at similar values.
What could be a possible explanation of this behaviour?
[Reposted from CrossValidated]
To cross-check whether imbalanced classes are an issue, try fitting an SVM model. If that gives better classification (possible if your ANN is not very deep), you can conclude that the classes should be balanced first.
Also, try some kernel functions to check whether such a transformation makes the data linearly separable.
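The distance-weighted kNN evaluation described in the question can be sketched as follows; the 2-D "embeddings" and labels here are synthetic stand-ins for the actual skip-gram vectors:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic embeddings: two 2-D clusters with binary labels.
X_train = rng.normal(size=(200, 2)) + np.repeat([[0, 0], [2, 2]], 100, axis=0)
y_train = np.repeat([0, 1], 100)
X_val = rng.normal(size=(50, 2)) + np.repeat([[0, 0], [2, 2]], 25, axis=0)
y_val = np.repeat([0, 1], 25)

k = 5
nn = NearestNeighbors(n_neighbors=k).fit(X_train)
dist, idx = nn.kneighbors(X_val)

# Distance-based weights: closer neighbours count more.
w = 1.0 / (dist + 1e-8)
scores = (w * y_train[idx]).sum(axis=1) / w.sum(axis=1)

auc = roc_auc_score(y_val, scores)
print(auc)
```

Sweeping k over a range and plotting the AUC curves for the three models would make the saturation behaviour directly visible.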

Training data range for Neural Network

Is it better for a neural network to use a smaller range of training data, or does it not matter? For example, if I want to train an ANN with angles (float values), should I pass those values in degrees [0, 360], in radians [0, 6.28], or should all values be normalized to the range [0, 1]? Does the range of the training data affect the ANN's learning quality?
My neural network has 6 input neurons, 1 hidden layer, and I am using a symmetric sigmoid activation function (tanh).
For the neural network itself it doesn't matter whether the data is normalised.
However, the performance of the training method can vary a lot.
In a nutshell: the methods typically favour variables with larger values, which can send the training off-track.
Crucial for most NN training methods is that all dimensions of the training data have the same domain. If all your variables are angles, it doesn't matter whether they are in [0, 1) or [0, 2π) or [0, 360) as long as they share the same domain. However, you should avoid having one variable for an angle in [0, 2π) and another for a distance in mm, where the distance can be much larger than 2,000,000 mm.
Two cases where an algorithm might suffer:
(a) regularisation: if the weights of the NN are forced to be small, a tiny change in a weight connected to a large-domain input has a much larger impact than one connected to a small-domain input;
(b) gradient descent: if the step size is limited, you get similar effects.
Recommendation: all variables should have the same domain size; whether it is [0, 1] or [0, 2π] or something else doesn't matter.
Addition: for many domains, z-score normalisation works extremely well.
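A minimal sketch of z-score normalisation applied to the mixed-domain case from the answer (an angle in radians next to a distance in mm); the data here is randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical mixed-domain features: angle in [0, 2*pi) and distance in mm.
angle = rng.uniform(0, 2 * np.pi, 1000)
dist_mm = rng.uniform(0, 2_000_000, 1000)
X = np.column_stack([angle, dist_mm])

# Z-score normalisation per column puts both features on the same scale
# (zero mean, unit variance), so neither dominates the training.
X_z = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_z.mean(axis=0))  # ~[0, 0]
print(X_z.std(axis=0))   # [1, 1]
```

After this transformation a weight change of the same size has a comparable effect on either input, which is exactly what regularisation and gradient descent implicitly assume.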
The range of the data points affects the way you train a model. Suppose the ranges of the features in the data set are not normalized. Then, depending on your data, you may end up with elongated ellipses of data points in feature space, and the learning model will have a very hard time learning the manifold on which the data points lie (i.e. the underlying distribution). Also, in most cases the data points are sparsely spread in feature space if not normalized. So the take-home message is to normalize the features when possible.

Will an average of neural net weights be as effective as one humongous simulation?

I'm planning to write a neural network to predict the closing price on day n, using open, high, low, close, and volume for days n-10 to n-1, and doing this for approximately 800 days.
I was going to use an "all but one" (leave-one-out) strategy for validation, but this would basically square the number of simulations I'd have to run. Very inefficient!
Would it be significantly less accurate if I ran the simulation just once for each of the 800 days, stored the weights, and then, for validation, averaged the weights over all dates except the one to be predicted and the 10 preceding dates?
If the transfer function were linear, it would make no difference, but of course the logistic function makes the network non-linear.
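That point can be checked numerically with a toy single-input unit: averaging weights equals averaging outputs when the transfer function is linear, but not when it is a squashing non-linearity like tanh. The weight values below are arbitrary stand-ins for two separately trained models:

```python
import numpy as np

x = 1.0
w = np.array([0.5, 3.0])  # two hypothetical trained weight values

# Linear transfer: output of the averaged weight == average of the outputs.
out_of_avg_linear = np.mean(w) * x
avg_of_outs_linear = np.mean(w * x)
print(out_of_avg_linear, avg_of_outs_linear)  # equal: 1.75 1.75

# Non-linear transfer (tanh): the two quantities differ.
out_of_avg = np.tanh(np.mean(w) * x)
avg_of_outs = np.mean(np.tanh(w * x))
print(out_of_avg, avg_of_outs)  # tanh(1.75) ≈ 0.941 vs ≈ 0.729
```

In a real multi-layer net the discrepancy is worse still, because separately trained networks can represent the same function with permuted hidden units, making a weight-space average meaningless even before the non-linearity is considered.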