Data normalisation for presenting to a neural network

I'm experimenting with neural networks, and as an introduction I'm doing the popular stock market prediction exercise: feed in prices and volumes in order to predict the future price. I need to normalise my data before presenting it to the network, but I'm unsure about the methodology...
Each stock has a closing price and volume figure for each trading day; do I normalise the price data across the prices of all stocks for each day, or do I normalise it against the previous prices for that one stock?
I.e. when I'm presenting StockA to the NN, do I normalise the price data against the previous prices of StockA, or against the prices of StockA, B, C, D... on the date being presented?

In my opinion you should treat this issue as a hyperparameter, that is: try both and use whichever works best.
In the end this comes down to what the information in the stocks is like, and how much stock data you have and how varied it is.
If you normalize each stock individually, you'll probably get better generalization, especially if you only have little data available.
However, if you normalize over the whole stock dataset, each stock's data still keeps information about the dataset as a whole (e.g. the magnitude of the stock price), which might help your model, since more expensive stocks might behave differently from less expensive ones.
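A minimal sketch of the two options in plain Python, assuming z-score scaling (min-max scaling would work the same way): `normalise_per_stock` scales each stock against its own history, while `normalise_pooled` uses one mean and standard deviation over all stocks and days, so magnitude differences between stocks survive. The function names and the example prices are hypothetical. In practice the statistics should be computed on the training period only, so that future prices do not leak into the inputs.

```python
from statistics import mean, pstdev


def _zscore(values, mu, sigma):
    # Guard against a constant series (zero spread).
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalise_per_stock(prices):
    """Normalise each stock against its own price history."""
    return {name: _zscore(series, mean(series), pstdev(series))
            for name, series in prices.items()}


def normalise_pooled(prices):
    """Normalise every stock with one mean/std taken over the whole dataset."""
    pooled = [v for series in prices.values() for v in series]
    mu, sigma = mean(pooled), pstdev(pooled)
    return {name: _zscore(series, mu, sigma) for name, series in prices.items()}


# Hypothetical example data: daily closing prices for two stocks.
prices = {"StockA": [10.0, 11.0, 12.0], "StockB": [100.0, 110.0, 120.0]}
per_stock = normalise_per_stock(prices)
pooled = normalise_pooled(prices)
```

With per-stock scaling, StockA and StockB become the same sequence (up to rounding); with pooled scaling, StockA stays entirely below zero and StockB above it, which is the "magnitude" information mentioned above.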

Related

Having a data set contains

[Image: raw_data]
Given a dataset that contains variables such as ordered units, average price and discount (example attached), how could I find the optimal discount numerically?
From a plot of ordered units vs. discount, the optimal discount seems to be around 10% (judged by the number of units ordered). How can I numerically support or reject this guess? Maximising ordered units does not necessarily maximise profit, but this is all the data I have to decide on the optimal discount.
[Plot: ordered units vs. discount]
Thank you!
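One way to put a number on the 10% guess, sketched under assumptions that are not in the question: model ordered units as a quadratic in the discount, take the vertex of the fitted parabola as the estimated optimum, and use a percentile bootstrap (resampling the rows) to get an interval for that optimum. The quadratic form and the names `fit_quadratic`, `optimal_discount` and `bootstrap_interval` are hypothetical choices; if the relationship is not roughly hump-shaped, a different model is needed.

```python
import random


def fit_quadratic(x, y):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y for the basis 1, x, x^2.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def optimal_discount(x, y):
    """Discount at the peak of the fitted parabola (assumes a hump shape, c < 0)."""
    _, b, c = fit_quadratic(x, y)
    if c >= 0:
        raise ValueError("fitted curve has no interior maximum")
    return -b / (2 * c)


def bootstrap_interval(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the optimal discount."""
    rng = random.Random(seed)
    pairs = list(zip(x, y))
    estimates = []
    for _ in range(n_boot):
        xs, ys = zip(*[rng.choice(pairs) for _ in pairs])
        try:
            estimates.append(optimal_discount(xs, ys))
        except (ValueError, ZeroDivisionError):
            continue  # skip degenerate resamples
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi
```

Calling `bootstrap_interval(discounts, units)` on the real data gives a range of plausible optima: if 10% lies well inside it, the data are consistent with the guess; if it lies outside, they argue against it.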

Imputing missing values for linear regression model, using linear regression

I scraped a real estate website and would like to impute missing data on total area (about 40% missing) using linear regression. I achieve the best results using price, number of rooms, bedrooms, bathrooms, and powder rooms.
[Image: correlation matrix]
Adding price to the room information makes a significant difference. This makes sense, since the number of rooms alone doesn't give you any information on how large those rooms may be; price can reduce some of that uncertainty. There is a 20-point difference between the R^2 scores of the model that includes price and the one that excludes it (0.82 with price vs 0.62 without).
The problem I see is that my final model would likely also be a linear regression with price as the target. Given that, it seems wrong to include price when predicting total area for imputation: my final model will look better as a consequence, but I will have engineered a synthetic correlation. This is especially critical since about 40% of the values need to be replaced.
Does anyone disagree with this? Should I keep price as a predictor to impute missing values even though it will be the target of my final model?
Based on the context, I think you're talking about hotel prices?
In my experience, imputing missing predictor values can give a significant boost to R^2 scores. However, the more of a predictor you impute, the fewer genuine observations your model rests on, so it would be biased to generalise the results to hotel prices as a whole: you can never know whether there are unobserved hotel prices with more variation, right?
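A minimal sketch of the option of leaving the final target out of the imputation model: total area is regressed on the number of rooms only (a single predictor to keep the example short, whereas the question uses several room counts), the fit uses only rows where area is observed, and a 0/1 flag records which rows were imputed so the final price model can account for them. The names `fit_simple_ols` and `impute_area` are hypothetical, and the imputed-row flag is a common practice rather than something either post proposes.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Closed-form OLS for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_area(rooms, area):
    """Fill None entries in `area` from `rooms` only (price deliberately excluded).

    Returns the completed list and a 0/1 flag marking imputed rows.
    """
    observed = [(r, a) for r, a in zip(rooms, area) if a is not None]
    intercept, slope = fit_simple_ols([r for r, _ in observed],
                                      [a for _, a in observed])
    filled, flags = [], []
    for r, a in zip(rooms, area):
        if a is None:
            filled.append(intercept + slope * r)
            flags.append(1)
        else:
            filled.append(a)
            flags.append(0)
    return filled, flags
```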

Neural network for weather forecast?

I have to finish a final project for a data mining course. Does it make sense to predict the weather using a neural network? I want to use today's weather data to predict the next day's events, such as rain and thunderstorms. I am afraid the teacher will say that we can see the weather report every day, so this prediction is useless.
I can't speak for your teacher, but I can suggest a way to make the weather forecast better. Usually a weather forecast is made by analysing the movement of clouds and winds and the speed of that movement, and then the forecast is calculated by a human according to some algorithm. If you want to make predictions with a neural network instead, you can use the data on this website: http://www.wunderground.com. Let's say you want to predict the weather in city A. The weather in that city depends on what happens around it (wind, cloud masses, season, time of day, etc.). So to predict the weather in city A, you can feed the NN the weather in the cities around city A. The more surrounding cities, or even countries, you feed into the NN, the better your chances of a good forecast. And if you provide enough data to the NN, there is a better chance that it will outperform the standard weather report.
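The idea of feeding the network observations from the cities around city A can be sketched as a dataset builder. Assumptions not stated in the answer: daily observations have already been downloaded and aligned by date, the target is a 0/1 rain flag for city A on the next day, and each row uses the last few days from every city. `build_dataset` is a hypothetical name; any classifier can then be trained on `X`, `y`.

```python
def build_dataset(target_rain, city_features, n_lags=3):
    """Turn aligned daily observations into (features, label) pairs.

    target_rain:   list of 0/1 flags, rain in city A on each day.
    city_features: dict mapping city name -> list of per-day feature tuples
                   (e.g. temperature, cloud cover); include city A and its
                   neighbours, all lists covering the same days.
    Row t uses days t, t-1, ..., t-n_lags+1 from every city;
    its label is rain in city A on day t+1.
    """
    cities = sorted(city_features)  # fixed column order
    X, y = [], []
    for t in range(n_lags - 1, len(target_rain) - 1):
        row = []
        for city in cities:
            for lag in range(n_lags):
                row.extend(city_features[city][t - lag])
        X.append(row)
        y.append(target_rain[t + 1])
    return X, y
```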
You can study how difficult the problem is, what you can do to understand it better and to improve the results, and suggest further studies. This is typically what is expected from a project like yours.
I assume this will be a classification problem rather than a regression problem. I would study the prediction performance of chosen features, such as temperatures at various time slices in the past, and likewise cloud cover, cloud type, etc. If you need more, just go outside, look at the sky, feel the weather and get inspired ;)
And try more classifiers, such as SVM and RBF, and build on your conclusions. Best of luck!
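To judge how difficult the problem is, and whether a neural network, an SVM or any other classifier is actually learning something, it helps to compare its accuracy with trivial next-day baselines. A sketch, assuming a 0/1 daily rain series: persistence ("tomorrow will be like today") and always predicting the more common outcome. The function names are hypothetical.

```python
def persistence_accuracy(rain):
    """Accuracy of predicting 'tomorrow = today' on a 0/1 daily rain series."""
    pairs = list(zip(rain[:-1], rain[1:]))
    return sum(today == tomorrow for today, tomorrow in pairs) / len(pairs)


def majority_accuracy(rain):
    """Accuracy of always predicting the more common next-day outcome."""
    tomorrow = rain[1:]
    share = sum(tomorrow) / len(tomorrow)
    return max(share, 1 - share)
```

A model whose test accuracy does not clearly beat both numbers has not learned anything useful about tomorrow's weather.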

Lake Visitor Modeling by Neural Networks

Let's say I want to model the number of visitors at an arbitrary lake at a specific time.
Given Data:
Time Series of Number of Visitors for 12 lakes.
Weather Time Series for the 12 lakes
Number of Trees at lake
Percentage of grass/stone ground of the beach.
I want to use a Neural Network (NN) to model the number of visitors, and I have some essential questions that I want to go through step by step. Note that the visitor time series shall not be used!
1) We only use the inputs:
Time of Day
Day of Week
So there are two inputs and one output. I read of a rule of thumb which says that the number of hidden neurons should be chosen as
#inputs >= #hidden neurons >= #outputs.
Is the number of inputs here 2, or is it an estimate of the real number of variables the output depends on (such as weather, people's mood, the economic situation, ...)? If it is 2, I should choose 1 or 2 hidden neurons, correct?
2) If I want to include lake-specific parameters such as the number of trees or the ground ratio, can I just add these as additional inputs (constant for each of the twelve lakes), or would that not help for some reason? How could I ensure that there is a causal connection between these inputs and the output?
3) Since the weather is a time series, which weather values should I use? How do I find the optimal delay, for example? Would Granger causality be a means to determine that?
I hope you can help. I just want to discuss the strengths of NNs for modeling and hear your opinions. I would use MATLAB's Neural Network Toolbox for this.
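For question 3, a simple first screen (not a Granger causality test) is to correlate a weather series with the visitor counts at several delays and see where the correlation peaks. A plain-Python sketch; `lagged_correlation` and `best_lag` are hypothetical names, both series are assumed to be equally long and sampled at the same times, and a correlation at some lag does not by itself establish causation. For a formal Granger test, statsmodels provides `grangercausalitytests`.

```python
from statistics import mean, pstdev


def lagged_correlation(driver, response, lag):
    """Pearson correlation between driver[t - lag] and response[t]."""
    x = driver[: len(driver) - lag]
    y = response[lag:]
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def best_lag(driver, response, max_lag):
    """Delay in 0..max_lag with the largest absolute correlation."""
    return max(range(max_lag + 1),
               key=lambda k: abs(lagged_correlation(driver, response, k)))
```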
Thanks in advance.
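For questions 1 and 2, a quick sanity check on network size is to count the free parameters and compare them with the number of training rows. A sketch for a fully connected network with one hidden layer; the function names and the four-input encoding are hypothetical. Adding the two lake constants raises the input count from 2 to 4, but with only 12 lakes each constant takes at most 12 distinct values, so the data carry little evidence about their effect, and a good fit would not by itself show a causal connection.

```python
def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases in a fully connected net with one hidden layer."""
    # Each hidden unit has one weight per input plus a bias;
    # each output unit has one weight per hidden unit plus a bias.
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def lake_input_row(hour, weekday, n_trees, grass_share):
    """Hypothetical input encoding: time features plus two per-lake constants.

    n_trees and grass_share are the same for every time step of a given lake.
    """
    return [hour, weekday, n_trees, grass_share]


for n_in in (2, 4):  # time inputs only vs. time inputs + lake constants
    for n_hidden in (1, 2, 4):
        print(n_in, n_hidden, count_parameters(n_in, n_hidden))
```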

Multi Step Prediction Neural Networks

I have been working with the MATLAB Neural Network Toolbox, using the NARX network. I have a dataset consisting of the prices of an object and the quantity of the object purchased over a period of time. Essentially, this network does one-step prediction, which is defined mathematically as follows:
$$y(t) = f\big(y(t-1), y(t-2), \ldots, y(t-n_y), x(t-1), x(t-2), \ldots, x(t-n_x)\big)$$
Here y(t) is the price at time t and x is the amount. So the input features I am using are price and amount, and the target is the price at time t+1. Suppose I have 100 records of such transactions, each consisting of a price and an amount. Then my neural network can predict the price of the 101st transaction. This works fine for one-step prediction.

However, if I want multi-step prediction, say 10 transactions ahead (the 110th transaction), I assume I do a one-step prediction of the price and feed it back into the network, repeating until I reach the 110th prediction. The problem is that after predicting the 101st price, I can feed it back in to predict the 102nd price, but I do not know the amount at the 101st transaction. How do I go about this?

I was thinking about setting my targets to the prices of transactions 10 steps ahead of the current one, so that what would have been the prediction for the 101st transaction is instead the price of the 110th. Is this a viable solution, or am I going about this in a completely wrong manner? Thanks in advance for any help.
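The idea of training on the price 10 transactions ahead is usually called a direct multi-step strategy: the model never needs future amounts, at the cost of training a separate model per horizon. A plain-Python sketch of building its training pairs; `make_direct_dataset` and the window layout are assumptions, independent of the MATLAB toolbox.

```python
def make_direct_dataset(prices, amounts, n_lags, horizon):
    """Build (inputs, target) pairs for direct h-step-ahead prediction.

    Inputs: the last n_lags prices and amounts up to step t (inclusive).
    Target: the price at step t + horizon, so no future amounts are needed.
    """
    X, y = [], []
    for t in range(n_lags - 1, len(prices) - horizon):
        window = range(t - n_lags + 1, t + 1)
        X.append([prices[i] for i in window] + [amounts[i] for i in window])
        y.append(prices[t + horizon])
    return X, y
```

With `horizon=10`, the row built from the 100th transaction has the 110th price as its target.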
Similar to what kostas said, once you have the predicted 101st price, you can use all your data to predict the 101st amount, then use that to predict the 102nd price, then use the 102nd price to predict the 102nd amount, and so on. However, this compounds any error in your predictions for each variable. To mitigate that, you can add several other features, such as a tapering discount on past values or a measure of error to use in the prediction (search for temporal-difference learning for similar ideas in reinforcement learning).
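The alternating scheme described above (predict the next price, then the amount at that step, then the following price, ...) can be sketched as a rollout over two one-step models. `price_step` and `amount_step` are hypothetical stand-ins for trained networks, each taking the price and amount histories as lists.

```python
def recursive_forecast(prices, amounts, price_step, amount_step, n_steps):
    """Alternate one-step predictions of price and amount, feeding each back in.

    price_step(prices, amounts)  -> next price, from equal-length histories.
    amount_step(prices, amounts) -> amount at the step whose price was just
                                    predicted (prices is one element longer).
    Errors compound, because later steps consume earlier predictions.
    """
    prices, amounts = list(prices), list(amounts)
    for _ in range(n_steps):
        prices.append(price_step(prices, amounts))
        amounts.append(amount_step(prices, amounts))
    return prices[-n_steps:]
```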
I guess you can use a separate neural network to do time-series prediction for x, producing x(t+1) up to x(t+10), and then feed these values to another ANN to predict y at those future steps.
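The two-stage idea can be sketched as: forecast the amounts from their own history first, then roll the price model forward using those forecasts. `amount_step` and `price_step` are hypothetical stand-ins for the two trained networks; the price model is assumed to take equal-length price and amount histories, as in the question's one-step setup.

```python
def two_stage_forecast(prices, amounts, amount_step, price_step, n_steps):
    """Forecast amounts from their own history, then roll prices forward.

    amount_step(amounts)        -> next amount (model trained on x alone).
    price_step(prices, amounts) -> next price from equal-length histories.
    """
    n = len(amounts)
    future_amounts = list(amounts)
    for _ in range(n_steps):
        future_amounts.append(amount_step(future_amounts))
    prices = list(prices)
    for k in range(n_steps):
        # Amounts known or forecast up to the latest price in the history.
        prices.append(price_step(prices, future_amounts[: n + k]))
    return prices[-n_steps:], future_amounts[-n_steps:]
```

Unlike the alternating rollout, the amount forecasts here never see the predicted prices, so errors in the price model do not feed back into the amounts.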