Finding the optimal discount numerically

[raw_data: example data set attached as an image]
Given a dataset which contains variables such as ordered units, average price, and discount (example attached), how could I find the optimal discount numerically?
Plotting ordered units vs. discount, it seems the optimal discount is around 10% (in the sense that more units are ordered). How can I numerically support or reject this guess? Admittedly, maximising ordered units need not maximise profit, but this is all the data I have to decide on the optimal discount.
[Plot: ordered units vs. discount]
Thank you!
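One way to support or reject the guess numerically is to fit a simple response curve to the data and locate its maximum. Below is a minimal sketch in Python, assuming the data sits in a CSV with hypothetical columns discount, ordered_units, and average_price; the quadratic form and the revenue objective are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("raw_data.csv")      # hypothetical file holding the attached data
d = df["discount"].to_numpy()         # discount as a fraction, e.g. 0.10
u = df["ordered_units"].to_numpy()

# Fit a quadratic response curve: u ~ b2*d^2 + b1*d + b0
b2, b1, b0 = np.polyfit(d, u, deg=2)

if b2 < 0:                            # concave fit => an interior maximum exists
    d_star = -b1 / (2 * b2)
    print(f"units-maximising discount ~ {d_star:.3f}")
else:
    print("No interior maximum; units increase over the whole observed range.")

# If average price is available, revenue may be a better objective than units:
# revenue(d) = units(d) * price * (1 - d). Grid-search it numerically.
price = df["average_price"].mean()
grid = np.linspace(d.min(), d.max(), 200)
units_hat = np.polyval([b2, b1, b0], grid)
revenue = units_hat * price * (1 - grid)
print(f"revenue-maximising discount ~ {grid[np.argmax(revenue)]:.3f}")
```

Comparing the quadratic fit against a straight-line fit (e.g. via residual sum of squares or cross-validation) is one way to test whether the data actually supports an interior optimum rather than a monotone trend.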

Related

Imputing missing values for linear regression model, using linear regression

I scraped a real estate website and would like to impute missing data on total area (about 40% missing) using linear regression. I achieve the best results using price, number of rooms, bedrooms, bathrooms, and powder rooms.
[Image: correlation matrix]
Adding price to the room information makes a significant difference. This makes sense, since the number of rooms alone doesn't give you any information on how large those rooms may be; price can reduce some of that uncertainty. There is a 20-point difference between the R^2 scores of the model that includes price and the one that excludes it (0.82 vs. 0.62).
The problem I see is that my final model would likely also be a linear regression, with price as the target. Given that, it seems wrong to include price when predicting total area for imputation: my final model will look better as a consequence, but I will have engineered a synthetic correlation. This is especially critical since about 40% of the values need to be replaced.
Does anyone disagree with this? Should I keep price as a predictor to impute missing values even though it will be the target of my final model?
From the context, I think you're talking about hotel prices?
In my experience, imputing missing values for your predictors can give a significant boost to R^2 scores. However, the more of a predictor you have to impute, the fewer genuinely observed values you are working with, so it would be biased to generalise such conclusions to the bigger picture of hotel prices: you never know whether there are unobserved hotel prices with more variation.
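For what it's worth, here is a minimal sketch of the leakage-free variant, imputing total area from the room counts only; the file and column names are hypothetical, and it assumes the room counts themselves are complete:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("listings.csv")      # hypothetical file with the scraped data
predictors = ["rooms", "bedrooms", "bathrooms", "powder_rooms"]  # price left out

known = df[df["total_area"].notna()]      # rows where area was observed
missing = df[df["total_area"].isna()]     # ~40% of rows to impute

# Fit on the observed rows, then fill the gaps with the model's predictions
model = LinearRegression().fit(known[predictors], known["total_area"])
df.loc[df["total_area"].isna(), "total_area"] = model.predict(missing[predictors])
```

Fitting both variants (with and without price) and comparing the final price model on a held-out set would show how much of the R^2 gain is the synthetic correlation the question worries about.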

Predicting the difference or the quotient?

For a time series forecasting problem, I noticed some people try to predict the difference or the quotient. For instance, in trading, we can try to predict the price difference P_t - P_{t-1} or the price quotient P_t / P_{t-1}, so that we get a more stationary problem. With a recurrent neural network on a regression problem, trying to predict the price difference can be a real pain if the price does not change sufficiently fast, because the network will mostly predict zero at each step.
Questions:
What are the advantages and disadvantages of using the difference or the quotient instead of the raw quantity?
What would be a good way to get rid of the repetitive zeros in a problem like predicting the price movement?
If the assumption is that the price is stationary (P_t = c, a constant), then predict the whole quantity.
If the assumption is that the price increase is stationary (P_t = P_{t-1} + c), then predict the absolute difference P_t - P_{t-1}. (Note: this is the ARIMA model with a degree of differencing of 1.)
If the assumption is that the price growth (in percentage) is stationary (P_t = P_{t-1} + c * P_{t-1}), then predict the relative difference P_t / P_{t-1} (both transforms are sketched below).
If the price changes rarely (i.e. the absolute or relative difference is most often zero), then try to predict the time interval between two changes rather than the price itself.
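To make the first three cases concrete, a small numpy sketch of the two transforms and of how to invert a sequence of predicted differences or quotients back into prices (the toy series is made up):

```python
import numpy as np

p = np.array([100.0, 101.0, 101.0, 103.0, 102.5])   # toy price series

diff = p[1:] - p[:-1]    # absolute difference: the ARIMA(d=1) target
quot = p[1:] / p[:-1]    # relative difference (quotient) target

# Inverting the transforms: cumulative sum / product recovers the prices
p_from_diff = np.concatenate(([p[0]], p[0] + np.cumsum(diff)))
p_from_quot = np.concatenate(([p[0]], p[0] * np.cumprod(quot)))
assert np.allclose(p_from_diff, p)
assert np.allclose(p_from_quot, p)
```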

Data normalisation for presenting to neural network

I'm experimenting with neural networks, and as an introduction I'm doing the popular stock market prediction exercise: feed in prices and volumes in order to predict the future price. I need to normalise my data before presenting it to the network, but I'm unsure of the methodology...
Each stock has a closing price and volume figure for each trading day; do I normalise the price data across the prices of all stocks for each day, or do I normalise it against the previous prices for that one stock?
I.e. if I'm presenting StockA to the NN, do I normalise the price data against the previous prices of StockA, or do I normalise it against the prices of StockA, B, C, D... for the date being presented?
In my opinion you should treat this issue as a hyperparameter,
that is to say: try both and do what works best.
In the end this comes down to what the information in the stocks is like, and how much stock data (in quantity and characteristics) you have.
If you normalize each single stock against its own history, you'll probably get better generalization, especially if you have only little data available.
However, if you normalize over the whole stock data, you keep the overall information of the whole dataset in each stock's data (e.g. the magnitude of the stock price), which might help your model, since more expensive stocks might behave differently from less expensive ones.
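To make the two schemes concrete, a small pandas sketch on made-up long-format data (the column names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "date":  ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-03"],
    "stock": ["A", "B", "A", "B"],
    "close": [10.0, 200.0, 11.0, 198.0],
})

def zscore(s):
    # Standardise a series to zero mean and unit variance
    return (s - s.mean()) / s.std()

# Scheme 1: normalize each stock against its own price history
df["z_per_stock"] = df.groupby("stock")["close"].transform(zscore)

# Scheme 2: normalize across all stocks for each trading day
df["z_per_date"] = df.groupby("date")["close"].transform(zscore)

print(df)
```

Scheme 1 discards the price magnitude entirely, while scheme 2 keeps it, which is exactly the trade-off described above.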

Training HMM - The amount of data required

I'm using HMMs for classification. I came across an example in the Wikipedia article on the Baum–Welch algorithm, and I hope someone can help me.
The example as follow: "Suppose we have a chicken from which we collect eggs at noon everyday. Now whether or not the chicken has laid eggs for collection depends on some unknown factors that are hidden. We can however (for simplicity) assume that there are only two states that determine whether the chicken lays eggs."
Note that we have 2 different observations (N and E) and 2 states (S1 and S2) in this example.
My question here is:
How many observations/observed sequences (i.e. how much training data) do we need to train the model well? Is there any way to estimate or test the amount of training data required?
For each parameter in your HMM, you need about 10 samples. Using this rule of thumb, you can easily calculate how many samples you need to construct a reliable classifier.
In your example you have two states, which results in a 2x2 transition matrix A = [a_00, a_01; a_10, a_11], where a_ij is the transition probability from state S_i to S_j.
Moreover, each of these states generates observations with probability p_S1 and p_S2 respectively, i.e. if we are in state S1, then with probability p_S1 the chicken will lay an egg and with probability 1 - p_S1 it will not.
In total you have 6 parameters to estimate. It is more or less obvious that they cannot be estimated accurately from only two observations. As mentioned above, it is conventional to assume that at least 10 samples per parameter are needed to estimate it accurately.
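As a quick arithmetic check of that rule of thumb, here is a tiny Python helper; it mirrors the counting above (all four transition entries plus one emission probability per state) rather than subtracting the row-sum constraints:

```python
def hmm_samples_needed(n_states, n_symbols, samples_per_param=10):
    # Count the free parameters of a discrete HMM as in the answer above
    transition = n_states * n_states          # all a_ij entries
    emission = n_states * (n_symbols - 1)     # e.g. p_S1, p_S2 for two symbols
    return (transition + emission) * samples_per_param

# Chicken example: 2 states, 2 observation symbols (N and E)
# => 4 + 2 = 6 parameters => roughly 60 observations
print(hmm_samples_needed(2, 2))               # 60
```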

Multi Step Prediction Neural Networks

I have been working with the MATLAB neural network toolbox; here I am using the NARX network. I have a dataset consisting of prices of an object as well as the quantity of the object purchased over a period of time. Essentially, this network does one-step prediction, which is defined mathematically as follows:
y(t) = f(y(t-1), y(t-2), ..., y(t-n_y), x(t-1), x(t-2), ..., x(t-n_x))
Here y(t) is the price at time t and x is the amount, so the input features I am using are price and amount, and the target is the price at time t+1. Suppose I have 100 records of such transactions, each consisting of the price and the amount. Then my neural network can essentially predict the price of the 101st transaction. This works fine for one-step predictions.

However, if I want to do multi-step predictions, say predict 10 transactions ahead (the 110th transaction), then I assume I do a one-step prediction of the price and feed it back into the neural network, repeating until I reach the 110th prediction. The problem is that after I predict the 101st price, I can feed this price into the network to predict the 102nd price, but I do not know the amount of the object at the 101st transaction. How do I go about this?

I was thinking about setting my targets to be the prices of transactions that are 10 transactions ahead of the current one, so that when I predict the 101st transaction I am essentially predicting the price of the 110th transaction. Is this a viable solution, or am I going about this in a completely wrong manner? Thanks in advance for any help.
Similar to what kostas said, once you have the predicted 101st price, you can use all your data to predict the 101st amount, then use that to predict the 102nd price, then use the 102nd price to predict the 102nd amount, and so on. However, this compounds any error in your predictions for each variable. To mitigate that, you can add several other features, like a tapering discount on past values or a measure of error to use in the prediction (search for temporal difference learning for similar ideas in the reinforcement learning realm).
I guess you can use a separate neural network to do time series prediction for x in order to produce x(t+1) up to x(t+10) and then use these values to feed another ANN to predict y(t).
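A minimal sketch of that recursive scheme, simplified to a single lag and with hypothetical model objects exposing a scikit-learn-style predict; the real NARX setup would feed in n_y and n_x lagged values instead:

```python
import numpy as np

def multi_step_forecast(price_model, amount_model, prices, amounts, steps=10):
    """Recursively predict `steps` future prices, predicting the unknown
    amounts along the way and feeding both back in as inputs."""
    p, a = list(prices), list(amounts)
    for _ in range(steps):
        features = np.array([[p[-1], a[-1]]])        # last observed/predicted pair
        next_p = price_model.predict(features)[0]    # e.g. the NARX price model
        next_a = amount_model.predict(features)[0]   # separate model for the amount
        p.append(next_p)
        a.append(next_a)
    return p[len(prices):]                           # the new price predictions
```

Note that, as mentioned above, errors compound: each predicted amount feeds the next price prediction, so the further out you go the wider the uncertainty.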