Prediction and time series

How do I decide how far in advance my prediction is?
I am following the featuretools churn tutorial: https://github.com/Featuretools/predict-customer-churn
What I don't quite understand is how it decided that the prediction is for one month in advance. In previous churn examples I tried, I just got aggregated data (it could be historical data over years or months), then built a churn model and predicted, but I don't know whether my prediction is for a month, a year, or some number of days in advance. How is that decided?
Does it depend on the period of aggregation, or on the data I didn't use? I know the cutoff time is the time at which I want to make the prediction, but how do I tell the system I want to make a prediction two months in advance? Do I just disregard the data for the last two months by setting the cutoff time, provide the label from after those two months, and say that my model, based on the features I get, makes a two-month-advance prediction?
For example: the cutoff date is 1/8/2010 and the label is the customer's state on 1/10/2010.
So the two-month period is the advance of the prediction, and I use all historical data prior to the cutoff time?
This might be a time series problem that is turned into a simple classification, but I am not sure!

You pick the amount of time in advance (called the "lead time") using your domain expertise. Depending on the real-world application, the lead time might be more or less, and sometimes you might even build multiple models with different lead times to apply in different situations.
You control the lead time by moving the cutoff earlier with respect to the time the label became known, so the example you give looks correct.
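For concreteness, here is a minimal sketch of how that looks with featuretools cutoff times, assuming an EntitySet named es with a customers dataframe (the ids, dates, and labels below are made up):

    import pandas as pd
    import featuretools as ft

    # hypothetical labels: did the customer churn by 1/10/2010?
    # the cutoff sits two months earlier, so features may only use data
    # recorded before 1/8/2010 -- that two-month gap is the lead time
    cutoff_times = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "time": pd.Timestamp("2010-08-01"),
        "label": [True, False, True],
    })

    # dfs computes each row's features using only events before its "time";
    # the "label" column is passed through untouched, so a classifier trained
    # on the result is a two-month-ahead churn predictor
    # (older featuretools releases call the argument target_entity)
    feature_matrix, feature_defs = ft.dfs(
        entityset=es,
        target_dataframe_name="customers",
        cutoff_time=cutoff_times,
    )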

Related

Tableau: Summing up aggregated data with FIXED

Data granularity is per customer, per invoice date, per product type.
Generally the idea is simple:
We have a moving average calculation of the volume per week, based on the last 12 weeks (MA Volume):

    WINDOW_SUM(SUM([Volume]), -11, 0) / WINDOW_COUNT(COUNT([Volume]), -11, 0)

We need to see the deviation of the current week vs. the MA for that week (Vol DIFF):

    SUM([Volume]) - [MA Calc]

We need to sum up the deviations over a fixed period of time (year/month).
Basically, this should show us whether, on average, for a given period of time, we deviate positively or negatively vs. the base.
Unfortunately I get errors like:
"Argument to SUM (an aggregate function) is already an aggregation, and cannot be further aggregated."
Or
"Level of detail expressions cannot contain table calculations or the ATTR function"
Any ideas how I can get around this one?
Managed to solve this one: I needed to add months to the view and then just use WINDOW_SUM(Vol_DIFF).
Simple as that!
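For anyone who wants to sanity-check the Tableau numbers outside of Tableau, the same logic is easy to replicate in pandas. A sketch, assuming a dataframe df with a datetime "week" column and a "volume" column (both names are placeholders):

    import pandas as pd

    # total volume per week
    weekly = df.groupby("week")["volume"].sum().to_frame("volume")

    # trailing 12-week mean, like WINDOW_SUM(..., -11, 0) / WINDOW_COUNT(..., -11, 0)
    weekly["ma_volume"] = weekly["volume"].rolling(12, min_periods=1).mean()

    # deviation of each week vs. its moving average (Vol DIFF)
    weekly["vol_diff"] = weekly["volume"] - weekly["ma_volume"]

    # sum of deviations per calendar month, i.e. WINDOW_SUM(Vol DIFF)
    # with months on the view
    monthly_diff = weekly["vol_diff"].resample("MS").sum()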

How to make a parameter only count once in an AnyLogic model during simulation?

I am having a hard time solving what I imagine is a very easy issue, but I just can't see it.
Namely, I am building a dynamic simulation model which calculates accumulated costs and benefits.
However, I have introduced a 5-year time span for the model, and there are certain costs and benefits which occur only once (in the first year, for example). Currently, the model applies these parameters in every year of the simulation. How can I model it such that these values are only taken into account once?
Surely there is some kind of formula which could help me with this. The AnyLogic support page did not help me either.
For applying values that change over time, you should use a "Dynamic Variable" object and set it to 0 after one year, as follows (assuming "999" is replaced by whatever value you want):
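The accounting logic itself is just a condition on model time. Sketched here in Python with placeholder values (in AnyLogic you would put the equivalent condition in the Dynamic Variable's expression):

    # hypothetical 5-year accumulation; 999 is the placeholder one-time value
    ONE_TIME_VALUE = 999       # occurs only in the first year
    RECURRING_VALUE = 100      # placeholder, occurs every year

    total = 0.0
    for year in range(1, 6):
        # the dynamic variable holds 999 during year 1 and 0 afterwards
        one_time = ONE_TIME_VALUE if year == 1 else 0
        total += one_time + RECURRING_VALUE

    print(total)   # the one-time value is counted once, not five times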

Dymola / Modelica - District heating

I am trying to validate a district heating model I built using Dymola.
In this case, I am trying to find the mass flow over a one-year period. I have two models running, both with the same loads and with pipes with the same characteristics, as in this picture:
[image: pipe characteristics]
Both models are as follows:
[image: the two models]
My results make sense at least regarding the time of year when my flow should be higher: I am getting very high values during January, February and March, and then again towards the end of the year.
However, those high peaks are VERY different: the first model in the picture gives me peaks of almost 400 kg/s, whereas the second one reaches only up to 70 kg/s.
Can anyone suggest a way to validate the model? I have the heat loads for the year, hour by hour (this is the input I am giving to Dymola), and I know that the minimum temperature of the water is 70 °C and the maximum is 85 °C.
But I am really struggling to validate my model. Any suggestions?
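One sanity check that needs no simulation at all: given the supply and return temperatures, the peak mass flow follows directly from the peak heat load via Q = m_dot * cp * dT. A rough sketch (the peak load below is a placeholder; plug in the maximum of your hourly loads):

    # back-of-envelope check: m_dot = Q / (cp * dT)
    cp = 4186.0          # J/(kg K), water
    dT = 85.0 - 70.0     # K, max minus min network temperature
    Q_peak = 10e6        # W, placeholder -- use your own peak hourly load

    m_dot = Q_peak / (cp * dT)
    print(f"expected peak mass flow: {m_dot:.1f} kg/s")   # ~159 kg/s here

    # comparing this number against the 400 kg/s and 70 kg/s peaks tells
    # you which of the two models is in the right ballpark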

Using a neural network to forecast on time series with a variable horizon

I have used ANNs to classify data before, but not for time series data. Basically, I want to know how feasible it is (how relatively easy) for a neural network to take a bunch of previous time series data and then predict into the future, not just at a single point in time (for which it has been trained) but at an arbitrary point in time (up to certain limits, of course).
Is the best/simplest way to train a bunch of ANNs, each one targeting a different time horizon (e.g. 1 hour, 2 hours, 5 hours, 24 hours), and then, if you want a prediction for another time, say 3 hours, to use something like interpolation to forecast?
How can I structure my problem to handle this? Is there a particular neural network design that is suited to this application? Please let me know your thoughts.
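An alternative to training one network per horizon is to feed the horizon in as an extra input feature, so a single model can be queried at any lead time it saw (or between lead times it saw) during training. A minimal sketch using scikit-learn's MLPRegressor, with made-up data and window sizes:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def make_dataset(series, window, horizons):
        # each sample: `window` lagged values plus the horizon h as a feature;
        # the target is the series value h steps ahead
        X, y = [], []
        for h in horizons:
            for t in range(window, len(series) - h):
                X.append(np.append(series[t - window:t], h))
                y.append(series[t + h])
        return np.array(X), np.array(y)

    series = np.sin(np.linspace(0, 50, 1000))           # placeholder data
    X, y = make_dataset(series, window=24, horizons=[1, 2, 5, 24])

    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                         random_state=0)
    model.fit(X, y)

    # query an intermediate horizon (3 hours) without any interpolation step
    pred_3h = model.predict([np.append(series[-24:], 3)])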

Prediction/delay forecasting using Machine Learning?

I have a set of data for the past 5 years: approximately 7000 rows with about 20+ features that are either binary {yes/no} or multi-class {product A, B, C}.
I am trying to make a program (or a one-time analysis project) to determine (predict) the product ship date, i.e. the shipping delay in days, based on this historical data. I have two columns that indicate when a product was planned to be shipped and when it was actually shipped.
I'm wondering how I can make a prediction program that determines, based on the historical data, when a newly entered product can be expected to ship. I don't care about getting a specific date; even a program that just tells me the number of delay days to add would do...
I took an ML class a while back, and I'm not sure how to start something like this. Any advice? The closest thing I can think of is an image recognition assignment using a NN, but that was easier: here I have to deal with a date instead of black/white pixels. I used Matlab back in the day (I still know how to use it), but I just downloaded the Weka data mining tool.
I was thinking of a neural network, but I'm not sure how to set it up so that the program gives me the expected delay (number of days/months) from the inputted ship date.
Basically, I want to input (size = 5, prod = A, ..., expected ship date = Jan 1st) and have the program return the number of days to add as a delay onto my expected ship date, given the historical trends...
I would appreciate any help on how to start something like this the correct/easiest/best way. Thanks in advance.
If you use Weka, get your input/label data into the ARFF format and then try out all the different regressors (this is a regression problem, after all). To avoid having to do too much programming just yet (if you are still in an exploratory phase), use the Weka Experimenter, which has a GUI for trying out a whole bunch of regressors on your dataset.
Then, when you find one that does something expected and you want to do more data analysis in MATLAB, you can use a Weka/MATLAB interface.
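If you later want to script the same experiment instead of using the Weka GUI, the equivalent in Python with scikit-learn is only a few lines. A sketch; the file name and column names are placeholders for your own data:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("shipments.csv")                   # placeholder file

    # target: delay in days, derived from the two date columns
    df["delay_days"] = (pd.to_datetime(df["actual_ship"])
                        - pd.to_datetime(df["planned_ship"])).dt.days

    # one-hot encode the binary / multi-class features (e.g. product A/B/C)
    X = pd.get_dummies(df.drop(columns=["planned_ship", "actual_ship",
                                        "delay_days"]))
    y = df["delay_days"]

    # mean absolute error in days, estimated by 5-fold cross-validation
    model = RandomForestRegressor(n_estimators=200)
    scores = cross_val_score(model, X, y,
                             scoring="neg_mean_absolute_error", cv=5)
    print("MAE (days):", -scores.mean())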