Random Number Generator Matlab with Multiple CPUs

I would like to write a MATLAB script which runs in parallel using multiple CPUs. The script should then print out a sequence of normally distributed random numbers. At the moment my script looks like this:
matlabpool close force local
clusterObj = parcluster;
matlabpool(clusterObj);
parfor K = 1:10
disp(randn)
end
It prints out a sequence of random numbers as expected. However, when I run the code again, it prints out exactly the same sequence of numbers. I do not want this. Each time I run my script, it should print out an independent random sequence of numbers. Similarly, each time I start MATLAB, my script should print out a different sequence of 10 randomly generated numbers the first time I run it. How do I do this?

The solutions given so far are really not correct and may even be bad ideas. One should avoid setting the seed of the generator repeatedly. More importantly, two streams created separately with different seeds are not necessarily independent. This is addressed on this page that describes the creation of multiple streams:
For generator types that do not explicitly support independent streams, different seeds provide a method to create multiple streams. However, using a generator specifically designed for multiple independent streams is a better option, as the statistical properties across streams are better understood.
Thus, to guarantee the best statistical properties it is best to use a generator that supports multiple streams and substreams. Unfortunately, only the multiplicative lagged Fibonacci generator ('mlfg6331_64') and the combined multiple recursive generator ('mrg32k3a') currently support this property. Compared to the default Mersenne Twister generator ('mt19937ar'), these have significantly smaller periods. Here is how you would go about creating and using a set of independent random number streams, one per iteration:
seed = 1;
n = 10;
[stream{1:n}] = RandStream.create('mrg32k3a','NumStreams',n,'Seed',seed);
parfor k = 1:n
r = randn(stream{k},[1 3]);
disp(r);
end
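The example above hands each iteration its own independent stream. A related sketch, assuming you would rather use substreams of a single 'mrg32k3a' stream and want a different result on every run, is shown below. The variable names (seedStream, out, s) are mine, not from the question; the time-based seed is drawn once on the client so it can be recorded and reused to reproduce a run.

```matlab
% Draw one time-based seed on the client; record it to reproduce the run later.
seedStream = RandStream('mrg32k3a', 'Seed', 'shuffle');
seed = double(seedStream.Seed);

n = 10;
out = zeros(n, 3);
parfor k = 1:n
    % Same generator type and seed in every iteration ...
    s = RandStream('mrg32k3a', 'Seed', seed);
    % ... but each iteration jumps to its own substream.
    s.Substream = k;
    out(k, :) = randn(s, 1, 3);
end
disp(out);
```

Because row k depends only on the seed and on k, the result should not depend on which worker happens to run which iteration.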
Several things. You may get much better performance by simply generating all of your random numbers in one call outside of your loop. This will also allow you to use the default Mersenne Twister algorithm, which may be important if, for example, you plan on doing large-scale Monte Carlo simulations. If you're going to be working with random numbers (and parallelization), I recommend that you spend some time reading the documentation for the RandStream class and going through the examples here.

Reset the random number generator used by rand, randi, and randn to its default startup settings, so that rand produces the same random numbers as if you restarted MATLAB®:
rng('default')
rand(1,5)
ans =
0.8147 0.9058 0.1270 0.9134 0.6324
Save the settings for the random number generator used by rand, randi, and randn, generate 5 values from rand, restore the settings, and repeat those values:
s = rng;
u1 = rand(1,5)
u1 =
0.0975 0.2785 0.5469 0.9575 0.9649
rng(s);
u2 = rand(1,5)
u2 =
0.0975 0.2785 0.5469 0.9575 0.9649
Reinitialize the random number generator used by rand, randi, and randn with a seed based on the current time. rand returns different values each time you do this. Note that it is usually not necessary to do this more than once per MATLAB session as it may affect the statistical properties of the random numbers MATLAB produces:
rng('shuffle');
rand(1,5);
I would try different generators:
rng('shuffle', generator)
rng('shuffle', generator) additionally specifies the type of the random number generator used by rand, randi, and randn. The generator input is one of:
'twister' Mersenne Twister
'combRecursive' Combined Multiple Recursive
'multFibonacci' Multiplicative Lagged Fibonacci
'v5uniform' Legacy MATLAB® 5.0 uniform generator
'v5normal' Legacy MATLAB 5.0 normal generator
'v4' Legacy MATLAB 4.0 generator
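As a small illustration of the two-argument form, here is a sketch that picks the combined multiple recursive generator with a time-based seed and then confirms which generator is active. The variable name cfg is my own choice.

```matlab
rng('shuffle', 'combRecursive');   % time-based seed, 'combRecursive' generator
cfg = rng;                         % struct with fields Type, Seed, and State
disp(cfg.Type);                    % name of the generator now in use

x = randn(1, 5);                   % draw from the newly selected generator
rng(cfg);                          % restore the settings captured above
y = randn(1, 5);                   % repeats the values in x
disp(isequal(x, y));
```

Saving cfg.Seed somewhere (a log file, for instance) lets you rerun the same sequence later even though the seed came from the clock.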

You can set the random seed to a different value for each iteration:
matlabpool close force local
clusterObj = parcluster;
matlabpool(clusterObj);
rng('shuffle');
seeds = round(10000*abs(randn(10,1)));
parfor K = 1:10
rng(seeds(K))
disp(randn)
end

Some of the random number generators store a value that is essentially an index into the sequence of pseudo-random numbers for a particular seed value.
When running in parallel, the different CPUs could be overwriting each other's setting of that index value.
You could pre-allocate a vector of random numbers using one CPU and then execute the parallel for loop that pulls numbers from that vector.
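A minimal sketch of that pre-allocation idea, assuming the client seeds its generator from the clock once per session and the workers only read from the vector (the name r is mine):

```matlab
rng('shuffle');        % seed the client's generator once per MATLAB session
n = 10;
r = randn(n, 1);       % all draws come from the client's single stream

parfor K = 1:n
    disp(r(K));        % r is sliced, so each iteration only sees its own element
end
```

Since every number comes from one stream on the client, this sidesteps the question of independence between worker streams altogether.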

Related

Using rng('shuffle') function in parallel computing

I have a parfor loop for parallel computing in MATLAB. I want different random numbers on every call of this parfor loop on 8 workers. If I don't use rng('shuffle'), I get the same random numbers from randperm(10). So my code runs rng('shuffle') before randperm, at the same time on all workers. Do I get different random numbers in this case? When I look at the randperm outputs in the parfor loop, some of them are the same!
Do I need to save the rng state before rng('shuffle') and use something like rng(saved_rng) after the parallel loop ends?
The MATLAB help says:
Note Because rng('shuffle') seeds the random number generator based
on the current time, you should not use this command to set the random
number stream on different workers if you want to assure independent
streams. This is especially true when the command is sent to multiple
workers simultaneously, such as inside a parfor, spmd, or a
communicating job. For independent streams on the workers, use the
default behavior; or if that is not sufficient for your needs,
consider using a unique substream on each worker.
So what should I do? Do I get different random numbers if I remove rng? I have two versions of this code: one uses parfor and the other uses a for loop. Can I remove shuffle from the for loop? Do I get different random numbers in that case?
Thanks.
Ps.
I can have these structures:
parfor I=1:X
xx = randperm(10)
end
parfor I=1:X
rng('shuffle');
xx = randperm(10)
end
rng('shuffle');
parfor I=1:X
xx = randperm(10)
end
I want different random numbers from the randperm function. How can I do that? For the for-loop structure I need the shuffle call (without it the random numbers are the same), but when I add it to parfor, some of the randperm outputs are the same!
To do this properly, you need to choose an RNG algorithm that supports parallel substreams (in other words, you can split up the random stream into substreams, and each of the substreams still has the right statistical properties that you want from a random stream).
The default RNG algorithm (Mersenne Twister, or mt19937ar) does not support parallel substreams, but MATLAB supports two algorithms that do (the multiplicative lagged Fibonacci generator mlfg6331_64 and the combined multiple recursive generator mrg32k3a).
For example:
s = RandStream.create('mrg32k3a','NumStreams',4,'Seed','shuffle','CellOutput',true)
s is now a cell array of independent random number streams. All have the same seed, and you can record s{1}.Seed for reproducibility if you want.
Now, you can call rand(s{1}) (or randn(s{1})) to generate random numbers from stream 1, and so on. Reset a stream to its initial configuration with reset(s{1}), and you should find that each stream is separately reproducible.
Each worker can then generate random numbers in a way that is still statistically sound, and reproducible even in parallel:
parfor i = 1:4
rand(s{i})
end
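To check the reproducibility claim for yourself, here is a short sketch using randperm, as in the question. It assumes a fixed seed of 42 (an arbitrary value of mine) instead of 'shuffle' so that the run can be repeated exactly.

```matlab
% Four independent streams sharing one seed.
s = RandStream.create('mrg32k3a', 'NumStreams', 4, 'Seed', 42, 'CellOutput', true);

p1 = randperm(s{1}, 10);   % a permutation drawn from stream 1
reset(s{1});               % rewind stream 1 to its initial state
p2 = randperm(s{1}, 10);   % the same permutation again

disp(isequal(p1, p2));
```

Inside a parfor loop you would call randperm(s{i}, 10) in the same way, with one stream per iteration.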
For more information, look in the documentation for Statistics Toolbox under Speed up Statistical Computations. There are a few articles in there that take you through all the complicated details. If you don't have Statistics Toolbox, the documentation is online on the MathWorks website.

Generating similar but different vectors

I want to generate 100 vectors each of size 1x7. I have the following code currently, but when I plot it, it seems to be too linearly spaced. Is there a way to achieve a similar result only rougher?
P = randi([7 12],100,7)'/10.* repmat(randn(1,7),100,1)';
You may use a different distribution for the randomizing part. randi draws from a discrete uniform distribution. You can use the rng function to control random number generation. There are different generators, such as:
'twister' Mersenne Twister
'combRecursive' Combined Multiple Recursive
'multFibonacci' Multiplicative Lagged Fibonacci
as an example:
rng('shuffle')
rng(1);
A = rand(2,2);
rng(2);
B = rand(2,2);
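A and B differ from each other because they are generated from different seeds. Note that with fixed seeds each of them is the same on every run; the preceding rng('shuffle') call is overridden by rng(1).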
check this link for more info.
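One way to read "a different distribution for the randomizing part" is to replace the six-valued randi factor with a continuous one. The sketch below is only an assumption about what "rougher" means here: it scales each element of the base vector by an independent, normally distributed factor, and sigma is a made-up roughness setting to tune by eye.

```matlab
base  = randn(1, 7);                      % the common shape shared by all vectors
sigma = 0.3;                              % hypothetical roughness level
noise = 1 + sigma * randn(100, 7);        % independent factor for every element
P = (repmat(base, 100, 1) .* noise)';     % 7-by-100, same layout as the original P
```

plot(P) then shows the 100 vectors; larger values of sigma give a rougher spread.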

What is the connection between norminv and normrnd?

If I just want to calculate a standard normal variable is there any difference between using:
samples=norminv(rand(N,1),0,1);
and
samples=normrnd(0,1,N,1);
Either in terms of processing time or convergence when used in a Monte Carlo simulation?
The reason I ask is that I want to use a quasi Monte Carlo technique like Halton numbers with norminv() to replicate normrnd(), but first I want to make sure I understand the relationship between them.
I guess one of the central questions is: how are the random numbers generated in rand() and normrnd() respectively? Is it the same method?
If they are entirely equivalent why the duplication?
Both approaches give the same distribution, by the theorem of inverse transformation (as you surely know). One important difference, though, is computation time:
N = 1e6;
tic
samples = norminv(rand(N,1),0,1);
toc
tic
samples = normrnd(0,1,N,1);
toc
tic
samples = randn(N,1);
toc
gives
Elapsed time is 0.171892 seconds.
Elapsed time is 0.039265 seconds.
Elapsed time is 0.029649 seconds.
So, even though MATLAB probably uses uniform random numbers internally to generate Gaussian random numbers, its implementation is faster than doing norminv(rand(...)) yourself.
Why is randn faster than normrnd? Because normrnd is just randn preceded by some input checking, and also, as noted in @chappjc's answer, lets you specify the mean and standard deviation (but you don't seem to need that). (You can see the normrnd source code by typing open normrnd.)
Bottom line: I would use randn.
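For the quasi Monte Carlo case mentioned in the question, norminv is still the tool to use, since there the uniform points do not come from rand. A rough sketch, assuming Statistics Toolbox is available (haltonset, net, and norminv all live there) and a one-dimensional problem:

```matlab
N = 1000;
p = haltonset(1, 'Skip', 1);   % skip the first Halton point, which is exactly 0
u = net(p, N);                 % first N points of the sequence, all inside (0,1)
z = norminv(u, 0, 1);          % map them to standard normal quantiles

fprintf('mean %.4f, std %.4f\n', mean(z), std(z));
```

Skipping the initial point matters because norminv(0) is -Inf.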
You get a uniform distribution with rand, and a normal distribution with randn.
Now, if the question is what is the relationship between normrnd and randn, the answer is that normrnd is a convenience function that takes the mean and standard deviation of the distribution as input arguments. That is, normrnd does the following:
r = randn(sizeOut) .* sigma + mu;
As for normrnd (using randn) vs. norminv (using rand), see Luis Mendo's answer (it will be the same distribution). And as I noted, you can skip normrnd entirely with the equation above.
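You can check the normrnd/randn relationship numerically by resetting the seed between the two calls. This is a sketch that assumes Statistics Toolbox for normrnd; the values of mu and sigma are arbitrary.

```matlab
mu = 2;
sigma = 3;

rng(1);
a = normrnd(mu, sigma, 5, 1);      % toolbox convenience function

rng(1);
b = randn(5, 1) .* sigma + mu;     % the same thing written out by hand

disp(max(abs(a - b)));             % largest absolute difference between the two
```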

Parametric random number generation with MATLAB

I was wondering if it is possible to generate a random distribution that is a function of a certain parameter. In other words, in MATLAB, when I type rand(1,5) I get 5 uniformly distributed random numbers between 0 and 1. Is it possible to have this result as a function of a certain parameter? Do you know of any algorithm for that? I just need this on an interval; I don't need a 2D representation.
I think you want to do this:
http://en.wikipedia.org/wiki/Inverse_transform_sampling
In MATLAB, it's quite straightforward, you simply specify the function.
n = 10000; % number of random draws
r = rand(n, 1); % generate uniform random numbers
f = @norminv; % specify transforming function
tr = f(r); % transformed numbers, now normally distributed
hist(tr, 30) % plot histogram
This example is a bit contrived, since we could simply have used randn. But the method holds generally.
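To make the "function of a parameter" part concrete, here is a sketch of the same method for an exponential distribution with rate lambda, which needs no toolbox. The value of lambda is just a placeholder; the inverse CDF of the exponential distribution is -log(1-u)/lambda.

```matlab
lambda = 2;                               % rate parameter (placeholder value)
n = 10000;                                % number of random draws
u = rand(n, 1);                           % uniform random numbers in (0,1)
invcdf = @(u) -log(1 - u) ./ lambda;      % inverse CDF of the exponential distribution
x = invcdf(u);                            % exponentially distributed draws

fprintf('sample mean %.3f (theory %.3f)\n', mean(x), 1/lambda);
```

Changing lambda changes the distribution of x without touching the underlying uniform draws.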
If you have the Statistics toolbox, and you want to sample from one of the popular distributions, take a look at the random number generators that are available to you, link.

Is there a statistical difference between generating many random vectors vs a single random matrix

Is there a statistical difference between generating a series of paths for a Monte Carlo simulation using the following two methods (note that by path I mean a vector of 350 points, normally distributed):
A)
for path = 1:300000
Zn(path, :) = randn(1, 350);
end
or the far more efficient B)
Zn = randn(300000, 350);
I just want to be sure there is no funny added correlation or dependence between the rows in method B that isn't present in method A. Like maybe method B distributes normally over 2 dimensions where A is over 1 dimension, so maybe that makes the two statistically different?
If there is a difference then I need to know the same for uniform distributions (i.e. rand instead of randn)
Just to add to the answer of @natan (+1), run the following code:
%# Store the seed
Rng1 = rng;
%# Get a matrix of random numbers
X = rand(3, 3);
%# Restore the seed
rng(Rng1);
%# Get a matrix of random numbers one vector at a time
Y = nan(3, 3);
for n = 1:3
Y(:, n) = rand(3, 1);
end
%# Test for differences
if any(any(X - Y ~= 0)); disp('Error'); end;
You'll note that there is no difference between X and Y. That is, there is no difference between building a matrix in one step, and building a matrix from a sequence of vectors.
However, there is a difference between my code and yours. Note I am populating the matrix by columns, not rows, since when rand is used to construct a matrix in one step, it populates by column. By the way, I'm not sure if you realize, but as a general rule you should always try and perform vector operations on the columns of matrices, not the rows. I explained why in a response to a question on SO the other day; see here for more...
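To see the column-order point directly, here is a variant of the code above that fills the matrix by rows instead. The numbers drawn are the same, but they land in transposed positions (the name state is mine).

```matlab
state = rng;               % store the generator settings
X = rand(3, 3);            % one call, filled column by column

rng(state);                % restore the settings
Y = nan(3, 3);
for n = 1:3
    Y(n, :) = rand(1, 3);  % fill one row at a time
end

disp(isequal(Y, X'));      % compare Y against the transpose of X
```

So method A in the question uses the same stream of numbers as method B, just arranged differently in the matrix.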
Regarding the question of independence/dependence, one needs to be careful with the language one uses. The sequence of numbers generated by rand is perfectly dependent: each number is a deterministic function of the generator state. For the vast majority of statistical tests, the numbers will appear to be independent. Nonetheless, in theory, one could construct a statistical test that would demonstrate the dependency within a sequence of numbers generated by rand.
Final thought, if you have a copy of Greene's "Econometric Analysis", he gives a neat discussion of random number generation in section 17.2.
As far as base R's random number generator is concerned, there also doesn't appear to be any difference between generating a sequence of random numbers all at once or one by one. Thus, the behavior that @Colin T Bowers (+1) describes above also holds in R. Below is an R version of Colin's code:
#set seed
set.seed(1234)
# generate a sequence of 10,000 random numbers at once
X<-rnorm(10000)
# reset the seed
set.seed(1234)
# create a vector of 10,000 zeros
Y<-rep(0,times=10000)
# generate a sequence of 10,000 random numbers, one at a time
for (i in 1:10000){
Y[i]<-rnorm(1)
}
# Test for differences
if(any(X-Y!=0)){print("Error")}