Convert 32 bit uniform distribution to uniform distribution on any int - hash

Given a discrete uniform distribution D ~ U([0, 2^N - 1]), from which a sample yields an integer in the inclusive range [0, 2^N - 1] for some integer N, I need a function convert such that, for a sample d ~ D, convert(d, m) is uniformly distributed over the inclusive integer range [0, m], i.e. Dc ~ U([0, m]).
Thoughts:
If the distribution were continuous, this would be easy: just cut off the infinite representation of the number, and uniformity is preserved.
I can't think of a way to do this for every m and keep uniformity.
I could re-roll for tie conditions, but I have not been able to formulate an algorithm.
What I eventually want is a MurmurHash on a custom range (m), rather than on exact 32-bit numbers.
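The re-roll idea mentioned above is usually called rejection sampling. Below is a minimal Python sketch of it, assuming the source can be sampled repeatedly (for a hash, that could mean rehashing with a different seed, which is an assumption and not something stated in the question). The names `convert`, `draw` and `n_bits` are illustrative only.

```python
def convert(draw, m, n_bits=32):
    """Map uniform n_bits-bit integers to a uniform integer in [0, m].

    draw is a hypothetical zero-argument callable returning a uniform
    integer in [0, 2**n_bits - 1]; it is called again on every rejection.
    """
    span = m + 1                      # number of target values
    total = 1 << n_bits               # number of source values
    if not 0 < span <= total:
        raise ValueError("m must satisfy 0 <= m < 2**n_bits")
    limit = total - total % span      # largest multiple of span <= total
    while True:
        d = draw()
        if d < limit:                 # accepted: each residue is equally likely
            return d % span
        # rejected: d fell in the incomplete last block, so draw again
```

With a standard generator this could be called as `convert(lambda: rng.getrandbits(32), 999)` for an `rng = random.Random()`. More than half of the source range is always accepted, so the expected number of draws stays below 2 for any m.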

Related

What are some hash functions with very long period that map to a small integer

Basically, I am looking for a hash function that works on an N-tuple of 32-bit integers (N would be 4-7) and maps it to a number from 0 to M, where M would be in the hundreds or thousands. I don't care if it is predictable; I only need it to have a long period along each dimension (at least 2^32) and not produce the same number twice in a row along any dimension.
The context is that I am doing terrain generation using Perlin noise and I don't want my world to be periodic (the hash provides an index into an array of gradient vectors).

MATLAB: Generate N pseudo-random numbers with a Poisson distribution having mean M and total T where N, M, and T are user defined

I’d like to be able to generate in MATLAB a sequence of N pseudo-random numbers with a Poisson distribution having mean M. The sum of the N numbers should be T. N, M, and T are always positive or zero and would be user-specified parameters to any function.
Obviously, if T is small relative to N it is likely that there will be problems achieving a total of T. In that case the function could just return the values T and then N-1 zeros or an error code. However, it is highly likely that in most cases T>>N.
I have been trying variations based on the method of generating random numbers with a given distribution provided at http://matlabtricks.com/post-44/generate-random-numbers-with-a-given-distribution and trying various normalizations at each step but have not been successful.
You could try to approximate what you want by using multinomial distribution.
If you use Wikipedia's notation, then k=N, n=T and p_i=M/T. The Poisson distribution has the distinctive property that its mean equals its variance, but if your parameters are such that p_i is small, then the mean n*p_i is quite close to the variance n*p_i*(1-p_i). The sum is automatically (by the definition of the multinomial) equal to T.
Multinomial sampling in MATLAB is done using the mnrnd function.
UPDATE
With regard to the comment, let's consider N sampled values v_i, and write their sum
Sum(i=1...N) v_i = T
Let's compute the mean value of the left and right sides of this equation.
Sum(i=1...N) E(v_i) = E(T) = T
On the right side, the mean value of a constant is the constant itself. On the left side we have
Sum(i=1...N) E(v_i) = Sum(i=1...N) M = N*M = T
Therefore, M=T/N and p_i=M/T=1/N.
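To make the multinomial idea concrete, here is a minimal sketch in Python using only the standard library, as a stand-in for a toolbox multinomial sampler. It drops T unit events into N equally likely bins (p_i = 1/N), so the counts always sum to T and each has mean T/N. The function name `fixed_sum_counts` and its parameters are illustrative, not part of any library.

```python
import random


def fixed_sum_counts(n_values, total, seed=None):
    """Draw n_values non-negative integers that sum exactly to total.

    Each of the `total` unit events lands in one of n_values equally
    likely bins, which is a multinomial draw with n = total, p_i = 1/N.
    """
    rng = random.Random(seed)
    counts = [0] * n_values
    for _ in range(total):
        counts[rng.randrange(n_values)] += 1
    return counts
```

The resulting counts are only approximately Poisson: each one is binomial with variance T*(1/N)*(1 - 1/N), which approaches the mean T/N as N grows.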

Generating an uncorrelated random sequence

I want to generate an uncorrelated random sequence with zero mean and unit variance to use as input. I also need to generate a white noise sequence with zero mean and variance 4. How can I do it?
You can find some random generator at https://fr.mathworks.com/help/matlab/random-number-generation.html.
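If you are looking for uniformly distributed random numbers, you can use rand; for normally distributed random numbers, use randn. With randn, you can change the mean and variance by using (randn * sqrt(variance)) + mean, since scaling by a factor s multiplies the variance by s^2. For example, 2*randn gives white noise with zero mean and variance 4.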
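The scaling rule can be checked numerically. The following is a small Python sketch (standard library only) that scales standard normal draws by the standard deviation, i.e. the square root of the requested variance. The function name `white_noise` and its parameters are illustrative.

```python
import math
import random


def white_noise(n, mean=0.0, variance=1.0, seed=None):
    """Return n independent Gaussian samples with the given mean and variance."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)       # scale by the standard deviation, not the variance
    return [mean + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
```

Calling `white_noise(100000, variance=4.0)` should give a sample variance close to 4, whereas multiplying by the variance itself would give a value close to 16.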

Calculate the variance of an integer vector in MATLAB

I need to calculate the variance of a large vector which is stored as uint8. The MATLAB var function however only accepts double and single types as input. The easiest way to calculate the variance would therefore be
vec = randi(255,1,100,'uint8');
var(single(vec))
This of course gives the correct result. However, using the single datatype increases the memory usage by a factor of 4. For large vectors (~1 million elements), this will quickly fill up the memory.
What I tried: The definition of the variance for a discrete random variable X is
Var(X) = Sum(i=1...n) p_i*(x_i - mu)^2, where mu = Sum(i=1...n) p_i*x_i is the mean
(Source: Wikipedia)
I estimated the p's using the histogram, but then got stuck: To calculate the variance in a vectorized fashion, I would need to convert the x_i's to single or double.
Is there any possibility to calculate the variance without converting the whole vector to single or double?
If you're willing to work with uint16, you can use Var(X)=Mean(X^2)-Mean(X)^2. This creates only 3 floating-point numbers (var and the 2 means):
uivec=uint16(vec);
mean(uivec.^2)-mean(uivec)^2
So, not as good as keeping uint8, but still half the memory of converting to single. It works with uint16 because your input is uint8, so the largest possible square, 255^2 = 65025, stays below the uint16 maximum of 2^16-1 = 65535.
If you want the exact same answer as var, you need to remember that MATLAB uses the unbiased estimator for var (it divides the sum by n-1 instead of n, where n is your number of samples) so you need to do:
n=length(vec);
v=(mean(uivec.^2)-mean(uivec)^2)*(n/(n-1))
then your v will match var(single(vec)) up to floating-point rounding.
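The same identity can be written so that everything except the final division is integer arithmetic. This is a sketch in Python, where integers have arbitrary precision; it illustrates the arithmetic only and says nothing about MATLAB's memory behaviour. The function name `variance_from_integer_sums` is illustrative.

```python
def variance_from_integer_sums(values):
    """Unbiased sample variance computed from two exact integer sums.

    Uses Var = (n*sum(x^2) - sum(x)^2) / (n*(n-1)), which equals
    (mean(x^2) - mean(x)^2) * n/(n-1).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    s = sum(values)                      # exact integer sum of x
    s2 = sum(v * v for v in values)      # exact integer sum of x^2
    return (n * s2 - s * s) / (n * (n - 1))
```

Note that the correction factor n/(n-1) applies to the whole difference, not just to the squared-mean term.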
No. The value of the variance is going to be a floating point value most likely, so you need to perform floating point operations.
p_i itself is the probability mass function, so sum(p_i) should be one; therefore each p_i is a floating-point number.
In addition, mu, the mean, will probably not be an integer either.

Probability of generating a particular random number, such as in MATLAB

In real probability, there is a 0% chance that a random number p, selected from all of the real numbers in the interval (0,1), will be 0.5. However, what are the odds that
rand == 0.5
in MATLAB? I suppose this is like asking how many double-precision numbers are between zero and one, or maybe there are other factors at play.
No particular info on MATLAB's generator...
In general, even simple pseudo-random generators have cycles long enough to cover all values representable by a double.
If MATLAB uses some other form of random number generation, it would be even better, so assume it uniformly covers the whole range of double values.
I believe the probability would be the distance between representable numbers around the value you are interested in, divided by the length of the interval. See What is the minimal step in double data type? (.NET) for a discussion of the distance.
Looking at this question, we see that there are 2^62 - 2^52
doubles in the interval (0, 1). Therefore, the probability of picking any single one (like 0.5) would be roughly equal to one divided by this number, or
>> p = 1/(2^62-2^52)
p =
2.170523997312134e-019
However, as horchler already indicates, it also depends on the type of random number generator you use, as well as MATLAB's implementation thereof. Sadly, I have only basic knowledge of the implementation details for each, but you can look here for a list of available random number generators in MATLAB and google a bit further for more precise numbers.
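The count of doubles quoted above can be checked from the IEEE 754 bit layout: positive doubles are ordered the same way as their 64-bit patterns, so the pattern of 1.0 read as an integer tells how many non-negative doubles lie below 1. A small Python sketch (standard library only; `bits_of` is an illustrative helper name):

```python
import struct


def bits_of(x):
    """Return the 64-bit IEEE 754 pattern of a double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# Patterns 1 .. bits_of(1.0) - 1 are exactly the doubles strictly between 0 and 1.
doubles_in_open_unit_interval = bits_of(1.0) - 1
probability_if_equally_likely = 1.0 / doubles_in_open_unit_interval
```

Here `bits_of(1.0)` equals 2^62 - 2^52, so the count matches the figure above up to one. This treats every double as equally likely, which is an assumption: a generator that is uniform over the real interval will not hit tiny doubles near zero as often as doubles near 0.5.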
I am not sure whether Alexei was trying to say this, but inspired by him I think the probability will indeed be approximately the distance between numbers around 0.5.
Therefore I expect the probability to be approximately:
eps(0.5)
Which evaluates to 1.1102e-16
Given the monotonic nature of the difference between double numbers I would actually think this holds:
eps(0.5-eps(0.5)) <= yourprobability <= eps(0.5)
Implying a range of 5.5511e-17 to 1.1102e-16
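The two bounds can be reproduced outside MATLAB as well. The sketch below uses Python's `math.ulp` and `math.nextafter` (available from Python 3.9) as counterparts of eps; the function name `spacing_around` is illustrative.

```python
import math


def spacing_around(x):
    """Return (gap just below x, gap at x) between adjacent doubles."""
    below = math.nextafter(x, 0.0)       # largest double smaller than x (for x > 0)
    return math.ulp(below), math.ulp(x)


lower_bound, upper_bound = spacing_around(0.5)
```

For 0.5 the gaps are 2^-54 and 2^-53, which are the 5.5511e-17 and 1.1102e-16 quoted above. The spacing halves at 0.5 because it is a power of two.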