Predicting the difference or the quotient? - neural-network

For a time series forecasting problem, I have noticed that some people try to predict the difference or the quotient. For instance, in trading, we can try to predict the price difference P_{t-1} - P_t or the price quotient P_{t-1}/P_t, so that we get a more stationary problem. With a recurrent neural network for a regression problem, trying to predict the price difference can be a real pain if the price does not change sufficiently fast, because the network will mostly predict zero at each step.
Questions:
What are the advantages and disadvantages of using the difference or the quotient instead of the whole quantity?
What would be a good way to deal with the repetitive zeros in a problem like predicting the price movement?

If the assumption is that the price is stationary (P_t = const), then predict the whole quantity.
If the assumption is that the price increase is stationary (P_t = P_{t-1} + const), then predict the absolute difference P_t - P_{t-1}. (Note: this is the ARIMA model with a degree of differencing d = 1.)
If the assumption is that the price growth (in percentage) is stationary (P_t = P_{t-1} + const * P_{t-1}), then predict the relative difference, i.e. the quotient P_t / P_{t-1}.
If the price changes rarely (i.e. the absolute or relative difference is most often zero), then try to predict the time interval between two changes rather than the price itself. The sketch below illustrates these four options.
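Purely as an illustration of these four assumptions (not part of the original answer), here is a minimal Python/pandas sketch; the price series and all variable names are made up.

```python
import numpy as np
import pandas as pd

# Made-up price series that changes only occasionally.
prices = pd.Series([100.0, 100.0, 100.5, 100.5, 100.5, 101.0, 100.8, 100.8])

# (1) Price assumed stationary: model the level P_t directly.
level = prices

# (2) Price increase assumed stationary (ARIMA with d = 1): model P_t - P_{t-1}.
abs_diff = prices.diff().dropna()

# (3) Percentage growth assumed stationary: model P_t / P_{t-1} (or its log).
quotient = (prices / prices.shift(1)).dropna()
log_return = np.log(quotient)  # often used instead of the raw quotient

# (4) Rare changes: model the waiting time between changes and the jump sizes,
#     instead of a difference series that is mostly zero.
diffs = prices.diff().fillna(0.0).to_numpy()
change_idx = np.flatnonzero(diffs != 0.0)
waiting_times = np.diff(change_idx)   # steps between consecutive changes
jump_sizes = diffs[change_idx]        # the non-zero changes themselves

print(abs_diff.tolist(), quotient.round(4).tolist())
print(waiting_times, jump_sizes)
```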

Related

residual standard deviation and mean absolute difference

I study the effect of a drug on the variability of a continuous dependent variable. The study includes two groups, one of them receives the drug. The dependent variable is repeatedly measured 6 times during the study. The variability is assessed by residual standard deviation and mean absolute difference.
Any idea how to perform the analysis in SPSS?
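No SPSS recipe is given here, but purely to illustrate the two variability measures named in the question, a hedged Python sketch follows. Treating the residual as the deviation from each subject's linear time trend, and "mean absolute difference" as the mean absolute successive difference, are my assumptions; all column names and the data are invented.

```python
import numpy as np
import pandas as pd

# Invented long-format data: 6 subjects x 6 measurement occasions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(6), 6),
    "group":   np.repeat(["drug", "control"] * 3, 6),
    "time":    np.tile(np.arange(6), 6),
    "y":       rng.normal(10, 1, 36),
})

def variability(g):
    """Per-subject variability over the 6 repeated measurements."""
    g = g.sort_values("time")
    t, y = g["time"].to_numpy(), g["y"].to_numpy()
    resid = y - np.polyval(np.polyfit(t, y, 1), t)  # residuals from a linear time trend (assumption)
    return pd.Series({
        "residual_sd": resid.std(ddof=1),
        "mean_abs_diff": np.abs(np.diff(y)).mean(),
    })

per_subject = df.groupby(["subject", "group"]).apply(variability).reset_index()
print(per_subject)  # these per-subject measures can then be compared between the two groups
```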

Imputing missing values for linear regression model, using linear regression

I scraped a real estate website and would like to impute missing data on total area (about 40% missing) using linear regression. I achieve the best results using price, number of rooms, bedrooms, bathrooms, and powder rooms.
[Correlation matrix]
Adding price to the room information makes a significant difference. This makes sense, since the number of rooms alone doesn't give you any information on how large those rooms may be. Price can reduce some of that uncertainty. There is a 20-point difference in R^2 between the model that includes price and the one that excludes it (0.82 vs 0.62).
The problem that I see is that my final model would likely also be a linear regression, with price as the target. With this, it seems wrong to include price in predicting total area for imputation. My final model will look better as a consequence, but I will have engineered a synthetic correlation. This is especially critical since about 40% of values need to be replaced.
Does anyone disagree with this? Should I keep price as a predictor to impute missing values even though it will be the target of my final model?
Judging by the context, I think you're talking about hotel prices?
In my experience, imputing missing values for your predictors can give a significant boost to R^2 scores. However, the more values you impute, the fewer genuinely observed values remain, and it becomes biased to generalize to the bigger picture of hotel prices, since you may never know whether there exist unobserved prices with more variation.
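To make the concern in the question concrete, here is a rough sketch (scikit-learn, with a made-up synthetic table standing in for the scraped listings) of imputing total area with and without price among the predictors; nothing here comes from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the scraped listings: area drives price,
# and ~40% of the area values are missing, as in the post.
rng = np.random.default_rng(0)
n = 500
rooms = rng.integers(1, 8, n).astype(float)
area_true = 25 * rooms + rng.normal(0, 10, n)
price = 3000 * area_true + rng.normal(0, 20000, n)
df = pd.DataFrame({"rooms": rooms, "price": price, "area": area_true})
df.loc[rng.random(n) < 0.4, "area"] = np.nan

def impute_area(df, predictors):
    """Regress area on the given predictors where it is observed, fill where it is not."""
    obs = df["area"].notna()
    model = LinearRegression().fit(df.loc[obs, predictors], df.loc[obs, "area"])
    area = df["area"].copy()
    area.loc[~obs] = model.predict(df.loc[~obs, predictors])
    return area

# Including price leaks the eventual target into 40% of the rows and will
# flatter the final price model; the price-free variant avoids that.
area_with_price = impute_area(df, ["rooms", "price"])
area_without_price = impute_area(df, ["rooms"])
```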

Why does classifier accuracy drop after PCA, even though 99% of the total variance is covered?

I have a 500x1000 feature matrix, and principal component analysis says that over 99% of the total variance is covered by the first component. So I replace each 1000-dimensional point with a 1-dimensional point, giving a 500x1 feature matrix (using Matlab's pca function). But my classifier accuracy, which was initially around 80% with 1000 features, now drops to 30% with 1 feature, even though more than 99% of the variance is accounted for by this feature. What could be the explanation for this, or are my methods wrong?
(This question partly arises from my earlier question Significance of 99% of variance covered by the first component in PCA)
Edit:
I used weka's principal components method to perform the dimensionality reduction, and a support vector machine (SVM) classifier.
Principal Components do not necessarily have any correlation to classification accuracy. There could be a 2-variable situation where 99% of the variance corresponds to the first PC but that PC has no relation to the underlying classes in the data. Whereas the second PC (which only contributes to 1% of the variance) is the one that can separate the classes. If you only keep the first PC, then you lose the feature that actually provides the ability to classify the data.
In practice, smaller (lower variance) PCs often are associated with noise so there can be benefit in removing them but there is no guarantee of this.
Consider a case where you have two variables: a person's mass (in grams) and body temperature (in degrees Celsius). You want to predict which people have the flu and which do not. In this case, mass has a much greater variance but probably no correlation with the flu, whereas temperature, which has low variance, has a strong correlation with the flu. After the principal components transformation, the first PC will be strongly aligned with mass (since it has much greater variance), so if you dropped the second PC, you would be losing almost all of your classification accuracy.
It is important to remember that Principal Components is an unsupervised transformation of the data. It does not consider labels of your training data when calculating the transformation (as opposed to something like Fisher's linear discriminant).
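A toy reproduction of the mass-vs-temperature argument (Python/scikit-learn, synthetic data, not from the original thread): PCA on the raw features puts well over 99% of the variance on the mass direction, and a classifier trained on PC1 alone drops to roughly chance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
flu = rng.integers(0, 2, n)                          # class labels
mass = rng.normal(70_000, 10_000, n)                 # grams: huge variance, no class signal
temp = 36.8 + 1.5 * flu + rng.normal(0, 0.3, n)      # deg C: tiny variance, strong class signal
X = np.column_stack([mass, temp])

X_tr, X_te, y_tr, y_te = train_test_split(X, flu, random_state=0)

# SVM on both (standardised) features: the informative direction is still available.
acc_full = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr).score(X_te, y_te)

# PCA on the raw features, as in the question: PC1 is essentially rescaled mass.
pca = PCA(n_components=1).fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)
acc_pc1 = make_pipeline(StandardScaler(), SVC()).fit(Z_tr, y_tr).score(Z_te, y_te)

print(pca.explained_variance_ratio_)   # ~[0.9999...]
print(acc_full, acc_pc1)               # high accuracy vs roughly 50% on PC1 alone
```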

PIV Analysis, Interrogation Area of The Cross Correlation

I'm running a PIV analysis on two consecutive images taken during an experiment to get the vector field. But I would like to know: based on what criteria do I have to choose the percentage of overlap between the two images for the cross-correlation process? 50%, 75%...? The PIVlab_GUI tool designed for MATLAB chooses a 50% overlap by default, but it allows changing it.
I just want to know the criteria by which I can decide how much overlap is best. Do the vectors become less accurate, more dependent, etc., as we increase or decrease the overlap?
My book "Fluid Mechanics Measurements" does not explain how to choose the overlap amount in the cross-correlation process, and I could not find any helpful online reference.
Any help is appreciated.
I suggest you read up on spectral estimation - which is basically equivalent to cross correlation when you segment the data and average the correlation estimates calculated from each segment (the cross correlation is the inverse Fourier transform of the cross spectrum). There's a book chapter on this stuff here, but you may want to find a more complete resource if you are unclear on the basics.
A short answer: increasing the overlap will increase the frequency resolution of the spectral estimate, and give you more segments to average over; your estimate will have a lower variance. But there are diminishing statistical returns the more you increase your overlap past 50%, while the computational complexity continues to rise (more segments = more calculations). Hence most people just choose 50% and have done with it.
It's important to note that you don't get any more information by using overlapping frames, you are simply increasing the frequency resolution (or time lag resolution, for correlation) - similar to the effect of zero-padding a signal before taking its Fourier transform - and this has statistical effects due to the way estimation of this type works.
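Not PIV-specific, but the overlap trade-off described above can be seen with a small SciPy sketch (synthetic white noise, arbitrary parameters). It uses the auto-spectrum via Welch's method as a stand-in; the same segment-and-average logic applies to the cross-spectrum and hence the cross-correlation. The scatter of the estimate drops noticeably from 0% to 50% overlap and then barely improves, while the segment count (and the computation) keeps growing.

```python
import numpy as np
from scipy import signal

# White noise: its true spectrum is flat, so the scatter of the estimated
# spectrum across frequency bins directly reflects the estimator's variance.
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 20_000)

nperseg = 256
for frac in (0.0, 0.5, 0.75, 0.9):
    noverlap = int(nperseg * frac)
    f, Pxx = signal.welch(x, fs=1.0, nperseg=nperseg, noverlap=noverlap)
    n_segments = (x.size - noverlap) // (nperseg - noverlap)
    # Diminishing returns: the relative scatter improves a lot up to ~50% overlap,
    # then hardly at all, even though the number of segments keeps increasing.
    print(f"overlap={frac:.0%}  segments={n_segments}  "
          f"relative scatter={Pxx.std() / Pxx.mean():.3f}")
```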

Multi Step Prediction Neural Networks

I have been working with the MATLAB neural network toolkit. Here I am using the NARX network. I have a dataset consisting of prices of an object as well as the quantity of the object purchased over a period of time. Essentially, this network does one-step prediction, which is defined mathematically as follows:
y(t) = f(y(t-1), y(t-2), ..., y(t-n_y), x(t-1), x(t-2), ..., x(t-n_x))
Here y(t) is the price at time t and x is the amount. So the input features I am using are the price and the amount, and the target is the price at time t+1. Suppose I have 100 records of such transactions, and each transaction consists of the price and the amount. Then essentially my neural network can predict the price of the 101st transaction. This works fine for one-step predictions.

However, if I want to do multiple-step predictions, say predict 10 transactions ahead (the 110th transaction), then I assume that I do a one-step prediction of the price, feed this back into the neural network, and keep doing this until I reach the 110th prediction. But in this scenario, after I predict the 101st price, I can feed this price into the neural network to predict the 102nd price; however, I do not know the amount of the object at the 101st transaction. How do I go about this? I was thinking about setting my targets to be the prices of transactions that are 10 transactions ahead of the current one, so that when I predict the 101st transaction, I am essentially predicting the price of the 110th transaction. Is this a viable solution, or am I going about this in a completely wrong manner? Thanks in advance for any help.
Similar to what kostas said, once you have the predicted 101st price, you can use all your data to predict the 101st amount, then use that to predict the 102nd price, then use the 102nd price to predict the 102nd amount, and so on. However, this compounds any error in your predictions for each variable. To mitigate that, you can add several other features, like a tapering discount on past values or a measure of error to use in the prediction (search temporal difference learning for similar ideas in the reinforcement learning realm).
I guess you can use a separate neural network to do time series prediction for x in order to produce x(t+1) up to x(t+10) and then use these values to feed another ANN to predict y(t).
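The thread is about MATLAB's NARX tool, but the feedback loop both answers describe can be sketched generically. Below is a hedged Python outline in which the one-step models are plain linear regressions standing in for the trained networks; the function name, feature layout, and data are all invented for illustration, not the NARX toolbox API.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def multi_step_forecast(price_model, amount_model, prices, amounts, n_lags, horizon):
    """Recursive multi-step prediction: each one-step forecast of price and amount
    is appended to the history and fed back in as an input for the next step.
    The two models are any fitted one-step regressors with a .predict() method;
    the feature layout (last n_lags prices, then last n_lags amounts) is an
    illustrative choice."""
    p, a = list(prices), list(amounts)
    for _ in range(horizon):
        features = np.r_[p[-n_lags:], a[-n_lags:]].reshape(1, -1)
        p.append(float(price_model.predict(features)[0]))
        a.append(float(amount_model.predict(features)[0]))  # separate model for x, as suggested above
    return p[-horizon:]

# Tiny runnable demo on synthetic data: fit one-step linear models for price
# and amount on the first 100 transactions, then predict 10 steps ahead.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 100)) + 50.0
amounts = rng.normal(10, 2, 100)

n_lags = 5
X = np.array([np.r_[prices[i - n_lags:i], amounts[i - n_lags:i]] for i in range(n_lags, 100)])
price_model = LinearRegression().fit(X, prices[n_lags:])
amount_model = LinearRegression().fit(X, amounts[n_lags:])

print(multi_step_forecast(price_model, amount_model, prices, amounts, n_lags, horizon=10))
```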