Using a neural network to forecast a time series with a variable horizon

I have used ANNs to classify data before, but not for time series data. Basically, I want to know how feasible (relatively easy) it is for a neural network to take a set of previous time-series data and then predict not just a single future point in time (for which it has been trained), but an arbitrary point in time (up to certain limits, of course).
Is the best/simplest way to train a bunch of ANNs, each one targeting a different time horizon (e.g. 1 hour, 2 hours, 5 hours, 24 hours), and then, if you want a prediction for another horizon, say 3 hours, use something like interpolation to forecast?
How can I structure my problem to handle this? Is there a particular neural network design that is suited to this application? Please let me know your thoughts.
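One workable alternative to per-horizon networks is to feed the desired horizon to a single network as an extra input. Below is a minimal Python sketch of that idea using scikit-learn's MLPRegressor on a toy synthetic series; the lag window size, horizons, and network size are illustrative assumptions, not anything from the question.

# Sketch: one network, with the horizon passed in as an input feature (assumed setup).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
series = np.sin(np.arange(2000) * 0.05) + 0.1 * rng.standard_normal(2000)  # toy series

LAGS = 24       # how many past points the network sees
MAX_H = 24      # largest horizon used in training (hours)

X, y = [], []
for t in range(LAGS, len(series) - MAX_H):
    for h in (1, 2, 5, 24):                          # horizons seen during training
        X.append(np.append(series[t - LAGS:t], h))   # lag window + horizon feature
        y.append(series[t + h])                      # value h steps ahead
X, y = np.array(X), np.array(y)

model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
model.fit(X, y)

# Predict 3 hours ahead from the latest window: same model, no interpolation
# between separate per-horizon networks needed.
latest = np.append(series[-LAGS:], 3)
print(model.predict(latest.reshape(1, -1)))

Training one ANN per horizon and interpolating between them, as suggested above, can also work; the single-model variant simply avoids the interpolation step.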

Related

AnyLogic - Substantial variances in identical arrival rate schedule outputs

I am currently completing some verification checks on an AnyLogic DES simulation model, and I have two source blocks with identical hourly arrival rate schedules, broken down into 24 x 1h blocks.
The issue I am encountering is a significant difference in the number of agents generated by one block compared with the other. I understand that the arrival rate is based on the Poisson distribution, so there is some randomness in the instants of agent generation, but I would expect the overall numbers generated by these two blocks to be similar, if not identical. For example, in one operating scenario one block generates 78 agents whilst the other generates only 67 over the 24h period. This seems to be a common issue across all operating scenarios.
Are there any idiosyncrasies within AnyLogic that might explain this?
Any pointers would be welcomed.
I think this occurs because arrivals follow a Poisson distribution. To solve it, you could use the interarrival time option of the source block; in that case you would get the same number of arrivals from different source blocks. However, I'm not sure whether this fits a schedule. If not, you could use the getHourOfDay() function together with a parameter representing the interarrival time. You then have to write code like the line below for every hour of the day:
if (getHourOfDay() == 14) parameter = 5; // interarrival time to use during the 14:00 hour
Using sources with Poisson distributions will definitely not produce the same results... that's the magic of stochastic models.
An alternative way to solve this problem is the following:
the sources will generate agents using the inject() function
use dynamic events that will be in charge of calling source.inject();
let's imagine you have R trains coming per day, and this is a fixed value you want to use; you can then distribute the trains across the day by doing this:
for (int i = 0; i < R; i++) {
    create_DynamicEvent1(uniform(0, 1), DAY); // for source1
    create_DynamicEvent2(uniform(0, 1), DAY); // for source2
}
This doesn't follow a Poisson distribution, but it generates a predefined number of train arrivals throughout the day, and you can use another distribution of your choice if the uniform is not good enough for you.
Run this for every day.
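For intuition on the size of the gap, here is a small Python sketch (outside AnyLogic, purely illustrative) comparing two independent Poisson sources with the same hourly rate schedule; the flat 3-per-hour rate is an assumption for the example.

# Sketch: two independent Poisson processes with identical hourly rates.
import numpy as np

rng = np.random.default_rng(1)
hourly_rates = np.full(24, 3.0)   # assumed flat schedule: 3 arrivals/hour -> 72/day expected

for run in range(5):
    source1 = rng.poisson(hourly_rates).sum()
    source2 = rng.poisson(hourly_rates).sum()
    print(f"run {run}: source1={source1}, source2={source2}")
# Differences of ~10 agents around a mean of 72 (std ~ sqrt(72) ~ 8.5) are entirely expected.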

AnyLogic - How to measure work in process inventory (WIP) within simulation

I am currently working on a simple simulation that consists of 4 manufacturing workstations with different processing times and I would like to measure the WIP inside the system. The model is PennyFab2 in case anybody knows it.
So far, I have measured throughput and cycle time and I am calculating WIP using Little's Law; however, the results don't match the expectations. The cycle time is measured using the Time Measure Start and Time Measure End agents, and the throughput by simply counting how many pieces flow through the end of the simulation.
Any ideas on how to directly measure WIP without using Little's law?
Thank you!
For Little's Law you count the arrivals, not the exits... but maybe it doesn't make a difference...
Otherwise, there are many ways:
you can count the number of agents inside your system using a RestrictedAreaStart block and the entitiesInside() function
you can just have a variable that adds +1 when something enters and -1 when something exits
No matter what, you need to add the information into a dataset or a statistics object to get the mean number of agents in your system.
Little's Law defines the relationship between:
Work in Process (WIP)
Throughput (or Flow Rate)
Lead Time (or Flow Time)
This means that if you have two of the three you can calculate the third.
Since you have a simulation model you can record all three items explicitly and this would be my advice.
Little's Law should then be used to validate whether you are recording the three values correctly.
You can record them as follows.
WIP = Record the average number of items in your system
The simplest way is to count the number of items that entered the system and subtract the number of items that left the system. You simply do this calculation every time unit that makes sense for the resolution of your model (hourly, daily, weekly, etc.) and save the values to a DataSet or Statistics object.
Lead Time = The time a unit takes from entering the system to leaving the system
If you are using the Process Modelling Library (PML) simply use the timeMeasureStart and timeMeasureEnd Blocks, see the example model in the help file.
Throughput = the number of units out of the system per time unit
If you run the model and your average WIP is 10 units and on average a unit takes 5 days to exit the system, your throughput will be 10 units / 5 days = 2 units/day.
You can validate this by taking the total number of units that exited your system at the end of the simulation and dividing it by the number of time units your model ran:
if you run a model with the above characteristics for 10 days, you would expect 20 units to have exited the system.
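As a rough illustration of the bookkeeping above, here is a minimal Python sketch with made-up arrival and departure times; in AnyLogic you would do the same with the +1/-1 variable plus a DataSet or Statistics object.

# Sketch: record WIP each day as (cumulative entries - cumulative exits),
# then cross-check with Little's Law (WIP = throughput * lead time).
# Made-up data: one unit enters per day, each stays 5 days in the system.
entry_days = list(range(30))
exit_days = [t + 5 for t in entry_days]
horizon = 30                                   # observe for 30 days

wip_samples = []
for day in range(1, horizon + 1):
    entered = sum(1 for t in entry_days if t <= day)
    exited = sum(1 for t in exit_days if t <= day)
    wip_samples.append(entered - exited)       # same idea as the +1/-1 counter above

avg_wip = sum(wip_samples) / len(wip_samples)
throughput = sum(1 for t in exit_days if t <= horizon) / horizon   # units per day
lead_time = 5.0                                                    # days, by construction

print(f"average WIP = {avg_wip:.2f}")
print(f"throughput * lead time = {throughput * lead_time:.2f}")
# The two numbers agree only approximately on a short run because of warm-up
# and the units still inside the system at the end (boundary effects).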

Prediction and time series

How do I decide how far in advance my prediction is?
I am following the featuretools churn tutorial https://github.com/Featuretools/predict-customer-churn
What I don't quite understand is how it decided that the prediction is for one month in advance. In previous churn examples I tried, I just got aggregated data (it could be historical data for years or months), then I built a churn model and predicted, but I don't know whether my prediction is for a month, a year, or even how many days in advance. How is that decided?
Does it depend on the period of aggregation or on the data I didn't use? I know the cutoff time is the time at which I want to make the prediction, but how do I tell the system I want to make a prediction two months in advance? Do I just disregard the data for the last two months by setting the cut_off time, but provide the label from after those two months, and say that my model, based on the features I get, makes a two-month-ahead prediction?
For example: the cut_off date is 1/8/2010 and the label is the customer's state on 1/10/2010.
So the two-month period is the advance of the prediction, and I use all historical data prior to the cut_off time?
This might be a time-series problem that has been turned into a simple classification, but I am not sure!
You pick the amount of time in advance (called "lead time") using your domain expertise. Depending on the real-world application, the lead time might be more or less. Sometimes you might even build multiple models with different lead times to apply in different situations.
You control the lead time by moving the cutoff earlier with respect to the time the label became known. So, the example you give looks correct.
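As a minimal sketch of that bookkeeping in pandas (the customer IDs, dates, and column names are illustrative, not from the tutorial): pick the date at which the label is observed, subtract the lead time to get the cutoff, and let feature calculation use only data recorded before the cutoff.

# Sketch: build a cutoff-time table with a 2-month lead time (assumed column names).
import pandas as pd

label_date = pd.Timestamp("2010-10-01")      # when the churn label is observed
lead_time = pd.DateOffset(months=2)          # how far in advance we want to predict

cutoff_times = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "time": [label_date - lead_time] * 3,    # cutoff: 2010-08-01; features use data before this
    "label": [True, False, False],           # customer state observed at label_date
})
print(cutoff_times)

# This table is what you would pass as the cutoff_time argument to featuretools.dfs(...);
# feature values are then computed only from data recorded before each row's cutoff time.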

Alert in RAM/CPU Usage Detection in e-Commerce Server

Currently I'm building monitoring services for my e-commerce server, mostly focused on CPU/RAM usage. It's essentially anomaly detection on time-series data.
My approach is to build an LSTM neural network to predict the next CPU/RAM value in the trend and compare the error with the standard deviation (STD) multiplied by some factor (currently 10).
But in real-life conditions, it depends on many different factors, such as:
1- Maintenance time (during this time an "anomaly" is not an anomaly)
2- Sales periods during day-off events, holidays, etc., when increased RAM/CPU usage is of course normal
3- If the percentage decrease in CPU/RAM is the same over 3 observations (5 min, 10 min & 15 min) -> anomaly. But if it decreased 50% at 5 min while at 10 min it didn't change much (-5% ~ +5%) -> not an anomaly.
Currently I detect anomalies with a formula like this:
isAlert = (Diff5m >= 10 && Diff10m >= 15 && Diff30m >= 40)
where Diff is the percentage difference in absolute value.
Unfortunately I didn't save my "pure" data for building the neural network; for example, when an anomaly was detected, I modified the data so that it is no longer an anomaly.
I would like to add some attributes to my model input, such as isMaintenance, isPromotion, isHoliday, etc., but sometimes this leads to overfitting.
I also want my NN to be able to adjust its baseline over time, for example as my service becomes more popular.
Are there any hints on these aims?
Thanks
I would say that an anomaly is an unusual outcome, i.e. an outcome that's not expected given the inputs. As you've figured out, there are a few variables that are expected to influence CPU and RAM usage. So why not feed those to the network? That's the whole point of machine learning. Your network will make a prediction of CPU usage, taking into account the sales volume, whether there is (or was) a maintenance window, etc.
Note that you probably don't need an isPromotion input if you include actual sales volumes. The former is a discrete input, and only captures a fraction of the information present in the totalSales input.
Machine learning definitely needs data. If you threw that away, you'll have to start capturing it again. As for adjusting the baseline, you can achieve that by overweighting recent input data.
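A rough sketch of that idea in Python (the feature names, toy data, and sensitivity factor are all assumptions): train a regressor that predicts expected CPU usage from the context features, and flag an anomaly only when the observed value deviates from that prediction by more than some multiple of the residual standard deviation.

# Sketch: predict expected CPU usage from context, flag large residuals as anomalies.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
hour = rng.integers(0, 24, n)
total_sales = rng.poisson(20, n) * (1 + (hour > 17))    # toy data: more sales in the evening
is_maintenance = rng.random(n) < 0.05

# Toy "true" CPU usage that depends on the context features.
cpu = 30 + 0.8 * total_sales + 15 * is_maintenance + rng.normal(0, 3, n)

X = np.column_stack([hour, total_sales, is_maintenance])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, cpu)

residual_std = np.std(cpu - model.predict(X))
K = 4.0                                                  # assumed sensitivity factor

def is_alert(hour_now, sales_now, maint_now, cpu_now):
    # Anomaly = observed CPU far from what the context predicts.
    expected = model.predict([[hour_now, sales_now, maint_now]])[0]
    return abs(cpu_now - expected) > K * residual_std

print(is_alert(20, 40, False, 62))   # close to what the context predicts -> should not alert
print(is_alert(3, 10, False, 95))    # high CPU at low sales, no maintenance -> should alert

The maintenance flag and sales volume are fed to the model instead of being used as hard-coded exceptions, which is the point made above; whether raw sales volume fully replaces isPromotion depends on your data.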

Prediction/delay forecasting using Machine Learning?

I have a set of data for the past 5 years: approximately 7000 rows with features that are binary {yes/no} or multi-class {product A, B, C}, about 20+ features in total.
I am trying to make a program (or one-time analysis project) to determine (predict) the product ship date (shipping delay in days) based on this historical data. I have two columns: one that indicates when a product was planned to be shipped and another for when it was actually shipped.
Currently, I'm wondering how I can make a prediction program that determines, based on the historical data, when a new product can be expected to ship. I don't care about getting a specific date; even a program that just tells me the number of delay days to add would do...
I took an ML class a while back and I wasn't sure how to start something like this. Any advice? The closest thing to this I can think of is an image recognition assignment using a NN, but that was too easy: here I have to deal with a date instead of black/white pixels... I used Matlab back in the day (I still know how to use it), but I just downloaded the Weka data mining tool.
I was thinking of a neural network, but I'm not sure how to set it up so that the program gives me the expected delay time (# of days/months) from the input ship date.
Basically,
I want to input (size = 5, prod = A, ..., expected ship date = Jan 1st)
and have the program return the number of days to add as a delay onto my expected ship date, given the historical trends...
Would appreciate any help on how to start something like this the correct/easiest/best way... Thanks in advance.
If you use Weka, get your input/label data into the ARFF format and then try out all the different regressors (this is a regression problem, after all). To avoid having to do too much programming quite yet (if you are just in an exploratory phase), use the Weka Experimenter, which has a GUI for trying out a whole bunch of regressors on your dataset.
Then, when you find one that does something expected and you want to do some more data analysis using MATLAB, you can use a Weka/MATLAB interface.
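If you later outgrow the Weka GUI, the core setup is small. Here is a hedged Python sketch (column names and data are made up) that derives the delay-in-days target from the planned and actual ship date columns and fits a regressor, which is the same framing Weka's regressors would use on the ARFF file.

# Sketch: turn (planned ship date, actual ship date) into a delay-in-days target
# and fit a regressor on the categorical/binary features (assumed column names).
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

df = pd.DataFrame({
    "size": [5, 3, 8, 2],
    "product": ["A", "B", "A", "C"],
    "expedited": ["yes", "no", "no", "yes"],
    "planned_ship": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-02-01", "2020-02-10"]),
    "actual_ship": pd.to_datetime(["2020-01-04", "2020-01-05", "2020-02-11", "2020-02-12"]),
})

# Target: shipping delay in days (the number the program should return).
df["delay_days"] = (df["actual_ship"] - df["planned_ship"]).dt.days

# One-hot encode the categorical/binary features, as Weka would via nominal attributes.
X = pd.get_dummies(df[["size", "product", "expedited"]])
y = df["delay_days"]

model = GradientBoostingRegressor().fit(X, y)

# New order: size=5, product=A, expedited=yes -> predicted delay days to add.
new_order = pd.get_dummies(pd.DataFrame(
    {"size": [5], "product": ["A"], "expedited": ["yes"]}
)).reindex(columns=X.columns, fill_value=0)
print(model.predict(new_order))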