I'm studying AnyLogic, and I'm curious about something.
Some people explain that the arrival rate follows an exponential distribution.
I want to know: how can I prove that?
Any kind guidance from you would be very helpful and much appreciated.
Thank you so much.
The arrival rate doesn't follow an exponential distribution. The number of arrivals per time unit follows a Poisson distribution, so there's nothing to prove in that regard.
What follows an exponential distribution is the inter-arrival time between agents.
To check whether the inter-arrival time actually follows a particular distribution, you can use one of the many distribution fitting techniques out there. My favorite is the Cullen and Frey graph. You can see an answer about it here:
https://stats.stackexchange.com/questions/333495/fitting-a-probability-distribution-and-understanding-the-cullen-and-frey-graph
You can also check the wikipedia page on distribution fitting:
https://en.wikipedia.org/wiki/Probability_distribution_fitting
Keep in mind that distribution fitting is something of an art: no technique gives you the correct distribution, only a good enough approximation of one. In this case, though, it should be quite easy.
You can't really prove that a distribution fits the data, though. You can only estimate the error when you compare the distribution function with the actual data, and put a confidence interval on that. I'm not sure if that's what you want.
I'm not exactly sure what you mean by "prove" that it is exponential. In any case, it is not just "some people" who explain that; it is actually mentioned in the AnyLogic help under the "Source" topic as follows:
Rate - agents are generated at the specified arrival rate (which is
equivalent to exponentially distributed interarrival time with mean =
1/rate).
What you can do is collect the interval time between arrivals and plot that distribution to see that it actually looks like an exponential distribution.
To do that:
1. Create a typical DES process (e.g. source, queue, delay, sink).
2. Set the arrival type to rate and specify, for example, 1 per hour.
3. Create a variable in main called "prevTime".
4. Create a histogram data element called "data".
5. In the "On exit" of the source, write the following code:
data.add(time() - prevTime);
prevTime = time();
6. Look at the plot of the histogram and its mean.
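If you want a rough cross-check outside AnyLogic, the same experiment can be sketched in plain Python. This is only an illustration, not AnyLogic's own code: the function names, the seed and the sample size are my assumptions. It draws exponential gaps with mean 1/rate and compares their mean and empirical CDF with the theoretical exponential values.

```python
import math
import random


def interarrival_times(rate, n, seed=0):
    """Draw n exponential inter-arrival times with mean 1/rate (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]


def empirical_cdf_at(samples, t):
    """Fraction of samples that are <= t."""
    return sum(1 for s in samples if s <= t) / len(samples)


rate = 1.0  # one arrival per hour, as in the steps above
gaps = interarrival_times(rate, 100_000)  # sample size is an arbitrary choice
mean_gap = sum(gaps) / len(gaps)
print(f"mean gap = {mean_gap:.3f} (theory: {1 / rate:.3f})")

# Compare the empirical CDF with the exponential CDF 1 - exp(-rate * t).
for t in (0.5, 1.0, 2.0):
    theory = 1 - math.exp(-rate * t)
    print(f"P(gap <= {t}): empirical {empirical_cdf_at(gaps, t):.3f}, exponential {theory:.3f}")
```

Close agreement here is evidence, not a proof, which matches the point made in the other answer.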
I am trying to understand the best practices regarding AnyLogic's source arrival rates. I know that Exponential and Poisson are two different probability distributions. When using "Arrival Rate" in AnyLogic and choosing a rate of 10/hour for example, does this generate 10 agents per hour exponentially or according to a Poisson distribution or is it the same thing?
I really need guidance on understanding the best practices in this matter. To simplify the question, if I have an arrival rate of 10/hour following a Poisson distribution, what is the right way to model that in AnyLogic?
Many thanks!
In any Source in AnyLogic, if you choose a rate, the arrivals will automatically be Poisson, with your rate as the lambda parameter of the Poisson distribution. This means that on average lambda agents will be generated per time unit.
The exponential distribution describes the same arrival process as the Poisson distribution, except that it looks at the time between consecutive arrivals instead of the number of arrivals. (This means that to use it you need to define arrivals by inter-arrival time in your source; otherwise it wouldn't make much sense.)
poisson(lambda) arrivals per time unit is equivalent to exponential(lambda) time units between arrivals, so it doesn't really matter which one you use.
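To see this equivalence numerically, here is a small Python sketch (not AnyLogic code; the helper name, seed and number of simulated hours are assumptions of mine). It generates arrivals from exponential gaps with rate 10 and counts how many fall in each hour. For a Poisson count, the mean and the variance should both be close to lambda.

```python
import random


def counts_per_unit(rate, n_units, seed=1):
    """Count arrivals in each unit interval when gaps are exponential(rate)."""
    rng = random.Random(seed)
    counts = [0] * n_units
    t = rng.expovariate(rate)  # time of the first arrival
    while t < n_units:
        counts[int(t)] += 1  # the arrival falls into interval [int(t), int(t) + 1)
        t += rng.expovariate(rate)
    return counts


lam = 10.0  # 10 arrivals per hour, as in the question
counts = counts_per_unit(lam, 20_000)
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean arrivals per hour: {mean_count:.2f}")
print(f"variance of arrivals per hour: {var_count:.2f}")
```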
In one hierarchical model, we have two hyperparameters: dnorm(A_mu, 0.25^-2) and dnorm(B_mu, 0.25^-2). Here 0.25 is the sd, which I fix to a constant. A_mu and B_mu represent the group-level means. After fitting the data with rjags, we get the posterior distribution for each parameter. So can I just directly compare the highest posterior density intervals (HDI) of A_mu and B_mu? Do I need to calculate something using the sd (0.25)?
In another case, the sd of the two hyperparameters is not fixed, like this: dnorm(A_mu, A_sd) and dnorm(B_mu, B_sd). How can I compare the two hyperparameters and make a decision, e.g. that this group is significantly different from another group?
Remember that you are getting posterior distributions for A_mu and B_mu. This makes your comparison easy, as you can look at 95% credible intervals (CI) for the parameters (or pick a probability level that satisfies your needs). I believe JAGS uses Gibbs sampling, so you should be able to get the raw samples from the posteriors for A_mu and B_mu. You can then ask "what is the probability that B_mu is greater than some value?" by calculating the percentage of posterior samples that are greater than that value. Alternatively, and in a similar way to frequentist hypothesis testing, you can ask what the probability is that the mean of B_mu is a draw from the posterior of A_mu. So the key is just to directly use the samples from your posterior. I would recommend taking a look at Andrew Gelman's BDA3 textbook (Chapter 4) for a really good reference on these concepts.
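As a minimal sketch of working directly with the samples, here it is in Python rather than R. The helper names are hypothetical, and the draws below are synthetic stand-ins, not real JAGS output. Given paired posterior draws of A_mu and B_mu, it estimates P(B_mu > A_mu) and the narrowest 95% interval of the difference.

```python
import random


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples."""
    s = sorted(samples)
    n_in = max(1, int(round(mass * len(s))))
    widths = [(s[i + n_in - 1] - s[i], i) for i in range(len(s) - n_in + 1)]
    _, i = min(widths)
    return s[i], s[i + n_in - 1]


def prob_greater(a, b):
    """Share of paired draws where the draw from b exceeds the draw from a."""
    return sum(1 for x, y in zip(a, b) if y > x) / len(a)


# Synthetic stand-ins for posterior draws (ASSUMPTION: replace with your MCMC samples).
rng = random.Random(42)
a_mu = [rng.gauss(0.0, 0.1) for _ in range(4000)]
b_mu = [rng.gauss(0.3, 0.1) for _ in range(4000)]

diff = [b - a for a, b in zip(a_mu, b_mu)]
p_b_gt_a = prob_greater(a_mu, b_mu)
low, high = hdi(diff)
print(f"P(B_mu > A_mu) = {p_b_gt_a:.3f}")
print(f"95% HDI of B_mu - A_mu: [{low:.3f}, {high:.3f}]")
```

One common convention is to call the groups credibly different when the interval of the difference excludes zero, but the threshold is a judgment call for your application.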
A few things to keep in mind before drawing conclusions from the data: (1) you should always check the validity of your Markov Chains by evaluating things like autocorrelation (2) try to do a posterior predictive check to make sure your model is well fit to the data. If your model is poorly fit to the data then you can get very misleading results from the procedure above.
I am trying to extract common patterns that always appear whenever a certain event occurs.
For example, patients A, B, and C all had a heart attack. Using the readings from their pulse, I want to find the common patterns before the heart attack occurred.
In the next stage I want to do this using multiple dimensions. For example, using the readings from the patients' pulse, temperature, and blood pressure, what are the common patterns that occurred in the three dimensions, taking into consideration the time and order between each dimension?
What is the best way to solve this problem using Neural Networks and which type of network is best?
(Just need some pointing in the right direction)
and thank you all for reading
The described problem looks like a time series prediction problem, that is, a basic prediction problem for a continuous or discrete phenomenon generated by some existing process. The raw data for this problem is a sequence of samples x(t), x(t+1), x(t+2), ..., where x() is the output of the process under consideration and t is some arbitrary timepoint.
For an artificial neural network (ANN) solution, we reorganize the raw data into new sequences. As you probably know, X denotes the matrix of input vectors used in ANN learning. For time series prediction we construct this collection according to the following schema.
In the most basic form, your input vector x is a sequence of samples (x(t-k), x(t-k+1), ..., x(t-1), x(t)): the sample taken at some arbitrary timepoint t, preceded by its predecessor samples from timepoints t-k, t-k+1, ..., t-1. You should generate one such example for every possible timepoint t.
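The windowing schema can be sketched in a few lines of Python. This is just an illustration: the function name make_windows and the toy pulse values are made up.

```python
def make_windows(series, k):
    """Return the input vectors (x(t-k), ..., x(t)) for every valid timepoint t."""
    return [list(series[t - k : t + 1]) for t in range(k, len(series))]


# Toy pulse readings (made-up numbers, for illustration only).
pulse = [72, 75, 71, 80, 95, 110]
X = make_windows(pulse, k=2)
for row in X:
    print(row)
```

Each row of X holds k + 1 consecutive samples, so a series of length n yields n - k input vectors.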
But the key is to preprocess data so that we get the best prediction results.
Assuming your data (phenomenon) is continuous, you should consider applying some sampling technique. You could start by experimenting with some naive sampling period Δt, but there are stronger methods. See for example the Nyquist–Shannon sampling theorem, where the key idea is to make it possible to recover the continuous x(t) from the discrete samples x(Δt). This is reasonable, since that is probably what we expect our ANNs to do.
Assuming your data is discrete, you should still try sampling, as this will speed up your computations and might provide better generalization. But the key advice is: do experiments! The best architecture depends on the data and also requires preprocessing it correctly.
The next thing is the network output layer. From your question, it appears that this will be a binary class prediction. But maybe a wider prediction vector is worth considering? How about predicting the future of the considered samples, that is x(t+1), x(t+2), and experimenting with different horizons (lengths of the future)?
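To experiment with different horizons, the targets can be built alongside the inputs. Below is a hedged Python sketch. The name make_supervised and its parameters are my own, not from any library. Each input window (x(t-k), ..., x(t)) is paired with the next `horizon` samples.

```python
def make_supervised(series, k, horizon):
    """Pair each window (x(t-k), ..., x(t)) with targets (x(t+1), ..., x(t+horizon))."""
    X, Y = [], []
    for t in range(k, len(series) - horizon):
        X.append(list(series[t - k : t + 1]))
        Y.append(list(series[t + 1 : t + 1 + horizon]))
    return X, Y


series = list(range(8))  # toy data: 0, 1, ..., 7
X, Y = make_supervised(series, k=2, horizon=2)
for x_row, y_row in zip(X, Y):
    print(x_row, "->", y_row)
```

Changing `horizon` only changes the width of Y, so the same network input can be reused while you compare output sizes.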
Further reading:
Somebody mentioned Python here. Here is a good tutorial on time series prediction with Keras: Victor Schmidt, Keras recurrent tutorial, Deep Learning Tutorials
This paper is good if you need some real example: Fessant, Francoise, Samy Bengio, and Daniel Collobert. "On the prediction of solar activity using different neural network models." Annales Geophysicae. Vol. 14. No. 1. 1996.
I would like to create a tool for generating a stochastic time-series distribution, for which I can provide the parameters (for a normal distribution) the mean, standard deviation, skewness and kurtosis. There is a similar question here using R, but I am not able to interpret this and put it in MATLAB.
Is there something that someone knows can do this already? (I haven't been able to find anything)
If not, what would be some good advice for starting something of my own? Any known useful functions? I would also like to be able to build upon it afterwards, for example: adding outliers, clusters of volatility, adjusting heteroscedasticity.
I realise me saying 'stochastic' and then in the same sentence 'given parameters' may seem odd, but it isn't - I want each time point to be random, but the parameters to describe, say 10,000 time points.
If you're looking for the equivalent of the solution in R, Matlab's Statistics Toolbox has limited support for the Johnson and Pearson distribution systems. In particular, the johnsrnd function produces random variates for the Johnson system. The Pearson system function, pearsrnd, however, takes moments directly.
A big caveat: using moments to describe, fit, or produce random variates (often referred to as moment matching) is not robust and is poorly regarded by statisticians. A finite set of moments is not guaranteed to uniquely define a distribution unless you have the entire moment generating function.
This might be a silly question! I have an array P which represents the probability distribution of some data, e.g. [0;0.3;0.7]. How can I determine the type or class of discrete probability distribution of P? The original data is unavailable to me.
dfittool or fitdist requires me to give the data as input, while I already have its probability distribution. Any ideas?
You have probably seen different probability distributions in lectures or in your reading. All you have to do is plot the given distribution against the candidates. As the distributions themselves are parametrized, curve fitting or trial and error comes into play. The distribution with the least error (the best fit) might be the one you are looking for.
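A rough sketch of that idea, in Python rather than MATLAB: fit each candidate family to P by grid search and compare the squared errors. Everything here is an assumption for illustration. In particular, I assume that the entries of P correspond to the values 0, 1, 2, and I picked the binomial and a truncated Poisson as candidates arbitrarily.

```python
import math


def binomial_pmf(n, p):
    """Binomial(n, p) probabilities on 0..n."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def truncated_poisson_pmf(n, lam):
    """Poisson(lam) weights on 0..n, renormalised to sum to 1."""
    w = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n + 1)]
    total = sum(w)
    return [x / total for x in w]


def sse(p, q):
    """Sum of squared differences between two PMFs."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def best_fit(target, family, grid):
    """Grid-search the parameter minimising the squared error; returns (error, parameter)."""
    return min((sse(target, family(len(target) - 1, g)), g) for g in grid)


P = [0.0, 0.3, 0.7]  # ASSUMPTION: support is the values 0, 1, 2
p_grid = [i / 1000 for i in range(1, 1000)]
lam_grid = [i / 100 for i in range(1, 2001)]
fits = {
    "binomial": best_fit(P, binomial_pmf, p_grid),
    "truncated Poisson": best_fit(P, truncated_poisson_pmf, lam_grid),
}
for name, (err, param) in sorted(fits.items(), key=lambda kv: kv[1][0]):
    print(f"{name}: parameter {param:.3f}, squared error {err:.5f}")
```

With only three probabilities, several families can fit about equally well, so treat the ranking as a hint rather than an identification.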
It is not possible to find out a priori what kind of distribution some data (especially with as low n as in your example) is coming from.
If you have an idea of the process that generated your data, you might be able to get an idea of which distributions to test. Maybe your data comes from the family of gamma distributions, maybe your data comes from the family of Weibull distributions etc. Then, you can fit these general distributions and see whether they are likely to simplify to a more common distribution.
For a visual representation of how well your data could approximate a certain distribution, you can use PROBPLOT.
Once you have identified possible distributions, you can fit them to the data and use the Bayesian Information Criterion (BIC) to compare which fit describes the data best. Note that unless you have a huge amount of noise-free data, it is impossible to tell which fit is correct if you have several possible distributions with comparatively low BIC.
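For reference, BIC = k ln(n) - 2 ln(L), where k is the number of fitted parameters, n the number of observations and L the maximised likelihood. Here is a minimal Python sketch of such a comparison, assuming you do have raw data. The data below is synthetic, and the two candidate families (exponential and normal) are chosen only for illustration.

```python
import math
import random


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def exponential_loglik(data):
    """Log-likelihood of an exponential fit at its maximum-likelihood rate."""
    rate = len(data) / sum(data)
    return len(data) * math.log(rate) - rate * sum(data)


def normal_loglik(data):
    """Log-likelihood of a normal fit at its maximum-likelihood mean and variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


# Synthetic data (ASSUMPTION: stands in for your observations).
rng = random.Random(7)
data = [rng.expovariate(1.0) for _ in range(2000)]

bic_exp = bic(exponential_loglik(data), n_params=1, n_obs=len(data))
bic_norm = bic(normal_loglik(data), n_params=2, n_obs=len(data))
print(f"BIC exponential: {bic_exp:.1f}")
print(f"BIC normal:      {bic_norm:.1f}")
```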