Encoding a probability distribution for a genetic algorithm

What are some simple and efficient ways to encode a probability distribution as a chromosome for a genetic/evolutionary algorithm?

It depends strongly on the nature of the probability distribution you have in hand. A probability distribution is a mathematical function, so the properties of that function govern how the distribution can be represented as a chromosome. For example, do you have a discrete probability distribution (encoded by a discrete list of the probabilities of the outcomes, as in tossing a coin) or a continuous probability distribution (applicable when the set of possible outcomes can take on values in a continuous range, such as the temperature on a given day)?
As a simple instance, consider encoding the Normal distribution, an important distribution in probability theory. It can be encoded as a two-dimensional chromosome in which the first gene is the mean (mu) and the second is the variance (sigma^2). You can then evaluate the density from these two parameters. For other continuous probability distributions, such as the Cauchy, you can follow a similar approach, using one gene per parameter.
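As a minimal sketch of such an encoding (in Python; the gene layout, value ranges, and mutation scheme are illustrative assumptions, not a prescribed method):

```python
import math
import random

# A real-valued chromosome [mu, sigma2] encoding a Normal distribution.
# Ranges and mutation step sizes below are arbitrary choices.

def random_chromosome():
    """Initialize mu freely and sigma2 strictly positive."""
    return [random.uniform(-10.0, 10.0), random.uniform(0.1, 10.0)]

def mutate(chrom, rate=0.1):
    """Gaussian perturbation of each gene; sigma2 is kept positive."""
    mu, sigma2 = chrom
    if random.random() < rate:
        mu += random.gauss(0.0, 0.5)
    if random.random() < rate:
        sigma2 = max(1e-6, sigma2 + random.gauss(0.0, 0.5))
    return [mu, sigma2]

def pdf(chrom, x):
    """Evaluate the encoded Normal density at x."""
    mu, sigma2 = chrom
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```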

Related

In Bayesian simulation, why use fixed value for the parameters, which has a prior, in the model?

When simulating a Bayesian model, are we supposed to treat the parameters as random variables (with a prior) rather than using fixed values?
For example, take a Bayesian linear model y = Xβ + ε. When simulating it, the literature usually proceeds as follows (sketched in code after this question):
1. set the regression coefficients to fixed values, e.g. (0, 3, -2, 1, 0, ...);
2. simulate the predictors many times;
3. simulate the error term, usually standard normal;
4. generate the response.
If the regression coefficients have a prior (assume they have exchangeable priors), and thus posterior distributions, why would we simulate only one fixed set of coefficient values? The posterior is a distribution, which says we do not commit to any single value, yet the underlying truth is indeed a fixed value. Even though the posterior mean should converge to the OLS estimate under good setups, this still feels difficult to understand.
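A minimal sketch of that simulation recipe (in Python; the sizes and the coefficient vector are illustrative assumptions, not taken from any particular paper):

```python
import numpy as np

rng = np.random.default_rng(0)

beta = np.array([0.0, 3.0, -2.0, 1.0, 0.0])  # 1. fixed "true" coefficients
n = 200                                       # sample size (arbitrary)
X = rng.normal(size=(n, beta.size))           # 2. simulate the predictors
eps = rng.normal(size=n)                      # 3. standard-normal errors
y = X @ beta + eps                            # 4. generate the response
```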

Proper way of generating initial random vectors for generator model in GAN?

Frequently, linear interpolation is used with a Gaussian or uniform prior that has zero mean and unit variance, where the size of the vector can be chosen arbitrarily (e.g. 100), to generate the initial random vectors for the generator model in a Generative Adversarial Network (GAN).
Say we have 1000 training images and a batch size of 64. Then, in each epoch, we need to generate from the prior distribution a random vector for each image in the mini-batch. But the problem I see is that, since there is no mapping between a random vector and a corresponding image, the same image can be generated from multiple initial random vectors. In this paper, it is suggested that the problem can be overcome to some extent by using spherical interpolation.
So what happens if we initially generate random vectors corresponding to the number of training images, and then train the model reusing those same initially generated vectors?
In GANs the random seed used as input does not correspond to any real input image. What GANs actually do is learn a transformation from a known noise distribution (e.g. Gaussian) to a complex unknown distribution, which is represented by i.i.d. samples (e.g. your training set). What the discriminator in a GAN does is compute a divergence (e.g. Wasserstein divergence, KL divergence, etc.) between the generated data (e.g. the transformed Gaussian) and the real data (your training data). This is done in a stochastic fashion, and therefore no link is necessary between the real and the fake data. If you want to explore this on a hands-on example, I recommend training a Wasserstein GAN to transform one 1D Gaussian distribution into another. There you can visualize the discriminator and its gradient and really see the dynamics of such a system.
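To make the "no link between real and fake data" point concrete, here is a minimal sketch of drawing fresh noise each batch (Python; generator and next_real_batch are hypothetical placeholders, not real APIs):

```python
import numpy as np

rng = np.random.default_rng()
latent_dim, batch_size = 100, 64  # sizes are illustrative

def sample_latent():
    # Fresh i.i.d. Gaussian noise each batch; no vector is tied to
    # any particular training image.
    return rng.standard_normal((batch_size, latent_dim))

# Hypothetical training step:
#   z = sample_latent()
#   fake = generator(z)         # generator is a placeholder
#   real = next_real_batch()    # data loader is a placeholder
#   update the discriminator on (real, fake), then the generator on fresh z
```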
Anyway, what your paper is telling you concerns what happens after you have trained your GAN, when you want to see how it maps the known noise space to the unknown image space. For this purpose, interpolation schemes such as the spherical one you are quoting have been devised. They also show that the GAN has learned to map some parts of the latent space to key characteristics in images, like smiles. But this has nothing to do with the training of GANs.
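For reference, spherical linear interpolation (slerp) between two latent vectors is commonly implemented along these lines (a sketch, not the exact formulation of any particular paper):

```python
import numpy as np

def slerp(v0, v1, t):
    """Spherical interpolation between latent vectors v0 and v1, t in [0, 1]."""
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    # Angle between the two vectors on the unit sphere.
    omega = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * v0 + t * v1  # nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)
```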

matlab - pdf for multivariate uniform distribution

The pdf for the multivariate normal distribution in MATLAB is mvnpdf(...). What about the case where multiple variables are uniformly distributed: Is there a function to describe their joint distribution analogous to the multivariate normal distribution? If there is no such function, is there a trick to handle this case?
The simplest way for several variables to be jointly uniformly distributed is for them to be mutually independent; in that case you simply have a uniform distribution over a hypercube in the space spanned by the variables. To sample from this distribution, you just generate samples for each variable separately.
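A sketch of sampling from that independent case (Python for illustration; dimension and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng()
d, n = 3, 1000  # dimension and number of samples (illustrative)

# Independent uniforms on [0, 1]^d: each coordinate is drawn separately.
samples = rng.uniform(0.0, 1.0, size=(n, d))
```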
The point where a "trick" might be necessary is if there are dependencies between the variables even though the marginal distribution of each is still uniform. In this case you have to describe the dependency structure, and I'm not aware of any standard way to do this (analogous to the way dependencies between normally distributed variables are described by a correlation matrix).
Of course such distributions exist. For two dimensions, one possibility would be a joint distribution that looks like a solution to the "eight rooks" problem; another derives from the introductory MATLAB example, the magic square.
Both of these examples are discrete distributions, but they can be produced at arbitrary granularity, or simply interpreted as piecewise constant continuous distributions.
As you can see, there are many possibilities for a multivariate distribution each of whose marginal distributions is uniform. The question you have to answer for yourself is what kind of dependencies, if any, you are interested in.
If I'm understanding the question properly, we want to calculate the pdf of a multivariate uniform distribution. By definition, the pdf is constant over the support of the distribution. Thus, calculating the pdf only requires the normalizing constant, which is the reciprocal of the volume of the support (the integral of 1 over the support set). That is, the pdf is given by
f(x) = 1 / volume(A)
where A is the support set and x is an element of A. If an analytic expression for volume(A) is not available, a numerical integrator can be employed.
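A sketch of this in Python (the function name mvupdf is made up by analogy with mvnpdf, and the support is assumed to be a hyperrectangle, the simplest case where the volume is available in closed form):

```python
import numpy as np

def mvupdf(x, a, b):
    """Density of the uniform distribution on the hyperrectangle
    prod_i [a_i, b_i]: 1/volume inside the support, 0 outside."""
    a, b, x = map(np.asarray, (a, b, x))
    volume = np.prod(b - a)                   # closed-form volume of the box
    inside = np.all((x >= a) & (x <= b))      # is x in the support?
    return (1.0 / volume) if inside else 0.0

# Example: uniform on [0, 1] x [0, 2], so the density inside is 1/2.
print(mvupdf([0.5, 1.0], a=[0.0, 0.0], b=[1.0, 2.0]))  # 0.5
```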

Matlab: Markov chain for Pareto distribution

I often use Markov chains to approximate first-order autoregressive, AR(1), processes. Now I would like to draw values from a Pareto distribution. Does anybody know how to construct a Markov chain for this type of distribution?
The point is that I approximate the infinite state space of the Pareto distribution by a grid of n points. A time series simulated from the Markov chain should then look 'similar' to a time series simulated from the Pareto distribution.
If you want to draw from a Pareto distribution, why not just invert its cumulative distribution function and evaluate the inverse at random values between zero and one?
The CDF of a Pareto distribution is rather simple, and inverting it is no problem (except for the input 1, which maps to infinity in the limit).
Of course this is only a workaround, and it does not do exactly what you asked (which I gather is more of a theoretical exercise).
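A minimal sketch of that inverse-transform approach, assuming the standard two-parameter Pareto with scale xm and shape alpha (Python for illustration):

```python
import numpy as np

def pareto_sample(n, xm, alpha, rng=np.random.default_rng()):
    """Inverse-transform sampling for the Pareto distribution:
    F(x) = 1 - (xm / x)**alpha for x >= xm, so
    F^{-1}(u) = xm / (1 - u)**(1 / alpha).
    u = 1 would map to infinity (the edge case noted above);
    rng.random() draws from [0, 1), which excludes it."""
    u = rng.random(n)
    return xm / (1.0 - u) ** (1.0 / alpha)

# Example: 5 draws from Pareto(xm=1, alpha=2).
print(pareto_sample(5, xm=1.0, alpha=2.0))
```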

How can I efficiently model the sum of Bernoulli random variables?

I am using Perl to model a random variable (Y) which is the sum of roughly 15-40k independent Bernoulli random variables (X_i), each with a different success probability (p_i). Formally, Y = Sum{X_i}, where Pr(X_i = 1) = p_i and Pr(X_i = 0) = 1 - p_i.
I am interested in quickly answering queries such as Pr(Y<=k) (where k is given).
Currently, I use random simulation to answer such queries. I randomly draw each X_i according to its p_i, then sum all the X_i values to get Y'. I repeat this process a few thousand times and return the fraction of runs in which Y' <= k (the sketch after this question outlines the procedure in code).
Obviously, this is not totally accurate, although accuracy greatly increases as the number of simulations I use increases.
Can you think of a reasonable way to get the exact probability?
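For concreteness, here is a sketch of the Monte Carlo procedure described above (Python rather than Perl, purely illustrative):

```python
import numpy as np

def mc_prob_le(p, k, n_sims=10_000, rng=np.random.default_rng()):
    """Monte Carlo estimate of Pr(Y <= k) for Y = sum of independent
    Bernoulli(p_i), mirroring the asker's procedure."""
    p = np.asarray(p)
    # Each row draws all X_i at once; Y' is the row sum.
    y = (rng.random((n_sims, p.size)) < p).sum(axis=1)
    return np.mean(y <= k)
```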
First, I would avoid using the rand built-in for this purpose which is too dependent on the underlying C library implementation to be reliable (see, for example, my blog post pointing out that the range of rand on Windows has cardinality 32,768).
To use the Monte Carlo approach, I would start with a known-good random generator, such as Rand::MersenneTwister, or just use one of Random.org's services, and pre-compute a CDF for Y, assuming Y is pretty stable. If each Y is only used once, pre-computing the CDF is obviously pointless.
To quote Wikipedia:
In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials.
In other words, it is the probability distribution of the number of successes in a sequence of n independent yes/no experiments with success probabilities p1, …, pn. (emphasis mine)
Closed-Form Expression for the Poisson-Binomial Probability Density Function might be of interest. The article is behind a paywall:
and we discuss several of its advantages regarding computing speed and implementation and in simplifying analysis, with examples of the latter including the computation of moments and the development of new trigonometric identities for the binomial coefficient and the binomial cumulative distribution function (cdf).
As far as I recall, shouldn't this end up asymptotically as a normal distribution? See also this newsgroup thread: http://newsgroups.derkeiler.com/Archive/Sci/sci.stat.consult/2008-05/msg00146.html
If so, you can use Statistics::Distrib::Normal.
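As a sketch of that normal approximation (Python for illustration; the approximating mean is Sum{p_i} and the variance Sum{p_i * (1 - p_i)}, and the continuity correction is my own addition, not from the thread):

```python
import math

def normal_approx_prob_le(p, k):
    """Approximate Pr(Y <= k) by a Normal with matching mean and
    variance, using a continuity correction of 0.5."""
    mu = sum(p)
    var = sum(pi * (1.0 - pi) for pi in p)
    z = (k + 0.5 - mu) / math.sqrt(var)
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```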
To obtain the exact solution you can exploit the fact that the probability distribution of the sum of two or more independent random variables is the convolution of their individual distributions. Convolution is a bit expensive but must be calculated only if the p_i change.
Once you have the probability distribution, you can easily obtain the CDF by calculating the cumulative sum of the probabilities.
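A sketch of that convolution approach (Python for illustration; each Bernoulli contributes a two-point distribution, convolved in one at a time, O(n^2) overall):

```python
def poisson_binomial_pmf(p):
    """Exact PMF of Y = sum of independent Bernoulli(p_i)."""
    dist = [1.0]  # Pr(Y = 0) before any trial
    for pi in p:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - pi)  # trial i fails
            new[k + 1] += prob * pi      # trial i succeeds
        dist = new
    return dist

def prob_le(p, k):
    """Pr(Y <= k) as the cumulative sum of the exact PMF."""
    return sum(poisson_binomial_pmf(p)[: k + 1])

# Example: three trials with different success probabilities.
print(prob_le([0.1, 0.5, 0.9], k=1))
```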