One-time randomization - matlab

I have a matrix, ECGsig, with each row containing a 1-second-long ECG signal.
I will classify them later, but first I want to randomly shuffle the rows, like this:
idx = randperm(size(ECGsig,1));
ECGsig = ECGsig(idx,:);
However, I want this to happen just once and not every time I run the program;
in other words, I want the random numbers to be generated only once,
because if they change every time, I get different classification results.
Is there any way to do this besides doing it in a separate m-file and saving the result in a MAT-file?
Thanks,

You can set the random number generator's seed so that every time you run your code, it generates the same random result. You can do this through rng. This way, even though you run the program multiple times, it will still generate the same random sequence. As such, try doing something like:
rng(1234);
The input to rng is the seed. However, as per Luis Mendo's comment, rng is only available in newer versions of MATLAB (R2011a and later). Should rng not be available in your version of MATLAB, do this instead:
rand('seed', 1234);
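You can also take a look at RandStream, but that's a bit too advanced, so let's not look at it right now. To reset the generator to the state it has when MATLAB starts, use a seed of 0 with the default Mersenne Twister generator, or simply call rng('default'). Therefore:
rng(0, 'twister'); %// or
rng('default');
The legacy syntax rand('seed', 0) also gives a fixed sequence, but it switches to an older generator, so it does not reproduce the startup sequence.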
By calling this, any random results you generate from this point will follow a predetermined order. The seed can be any nonnegative integer below 2^32, but use something that you'll remember. Place this at the very beginning of your code before you do anything. The main reason we have control over how random numbers are generated is that it encourages reproducible results and research: other people can regenerate the results you created whenever your work involves random numbers or randomization.
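The reset options above can be checked with a short sketch. It assumes factory-default generator settings (Mersenne Twister) and uses illustrative variable names: rng('default') and rng(0, 'twister') should produce the same numbers, and the structure returned by rng can be stored and passed back to rng to replay a sequence exactly.

```matlab
% Reset to the startup settings (factory default: Mersenne Twister, seed 0)
rng('default');
a = rand(1, 3);

% Explicitly choosing the same generator and seed gives the same numbers
rng(0, 'twister');
b = rand(1, 3);

% Capture the complete generator settings (Type, Seed, State) ...
saved = rng;
c = rand(1, 3);

% ... and restore them later to replay exactly the same values
rng(saved);
d = rand(1, 3);

fprintf('default == seed 0: %d, replayed: %d\n', isequal(a, b), isequal(c, d));
```

Restoring the saved structure is handy when only one part of a longer script needs to be replayed without reseeding everything from the start.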
Even though you said you only want to run this randomization once, this will save you the headache of saving your results to a different file before you run the program multiple times. By setting the seed, even though you're running the program multiple times, you're guaranteed to generate the same random sequence each time.
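Applied to the question, a minimal sketch might look like this. The size of the stand-in ECGsig matrix (10 signals of 360 samples) is an assumption made only so the snippet runs on its own; with real data you would drop that line.

```matlab
% Stand-in for the real data: 10 one-second ECG signals with 360 samples
% each. Both sizes are assumptions made only so the sketch runs on its own.
ECGsig = randn(10, 360);

rng(1234);                          % fix the seed right before shuffling
idx = randperm(size(ECGsig, 1));    % row order, identical on every run
ECGsig = ECGsig(idx, :);            % shuffled rows, as in the question

% Re-seeding with the same value reproduces the same permutation,
% so the classifier sees the rows in the same order each time.
rng(1234);
idxAgain = randperm(size(ECGsig, 1));
```

Saving idx alongside your results also documents exactly which permutation was used.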

Related

How does random number generation ensure reproducibility?

While reading about Transfer Learning with MATLAB I came across a piece of code which says...
rng(2016) % For reproducibility
convnet = trainNetwork(trainDigitData,layers,options);
...before training the network, so that anyone who tries that code can reproduce the results exactly as given in the example. I would like to know how seeding the pseudo-random number generator with rng(seed_value) can make the entire range of results reproducible.
It is not random number generation itself but the random number generator seed that makes this work.
There is no such thing as a truly random number on a computer, just pseudo-random numbers: numbers that behave almost as if they were random, generally produced by some complex mathematical function that requires an initial value. Often, computers take this initial value from a clock or time register in your PC, thus "ensuring" randomness.
However, if you have an algorithm that is based on random numbers (e.g. a NN), reproducibility may be a problem when you want to share your results. Someone who re-runs your code will generally get different results, as randomness is part of the algorithm. But you can tell the random number generator to start from a fixed seed instead of one taken at random. That ensures that, while the generated numbers look random relative to each other, they are the same each time (e.g. [3 84 12 21 43 6] could be the random output, but it will always be the same).
By setting a seed for your NN, you ensure that for the same data it will output the same result, so you can make your code "reproducible", i.e. someone else can run your code and get EXACTLY the same results.
As a test I suggest you try the following:
rand(1,10)
rand(1,10)
and then try
rng(42)
rand(1,10)
rng(42)
rand(1,10)
See also the Wikipedia article on pseudorandom number generators.
Sometimes it is useful to use the same random numbers. This is what the MATLAB documentation says about that:
Set the seed and generator type together when you want to:
Ensure that the behavior of code you write today returns the same results when you run that code in a future MATLAB® release.
Ensure that the behavior of code you wrote in a previous MATLAB release returns the same results using the current release.
Repeat random numbers in your code after running someone else's random number code.
This is the point of repeating the seed: to generate the same random numbers. The MATLAB documentation covers this in two good articles, one on repeating numbers and one on generating different numbers.
You don't want to start with all weights equal to zero, so in the initialization stage you give the weights random values. There may be other random values involved later in the learning process, for example in the search for a minimum or in the order in which you feed your data.
So the real inputs to the whole neural network learning process are your data and the random number generator.
If they are the same, then everything will be the same.
The rng command puts the random number generator in a predefined state, so it will generate the same sequence of numbers.
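To make the weight-initialization point concrete, here is a sketch that is not tied to any toolbox: the layer sizes are hypothetical, and the 0.1 * randn initialization is just one simple choice. Reseeding with the same value reproduces both the initial weights and the order in which samples would be fed.

```matlab
% Hypothetical layer sizes, chosen only for illustration
nIn = 4;
nHidden = 3;
nSamples = 10;

rng(2016);                              % fixed seed before any random draw
W1 = 0.1 * randn(nHidden, nIn);         % small random initial weights
order1 = randperm(nSamples);            % random order for feeding the data

rng(2016);                              % same seed again ...
W2 = 0.1 * randn(nHidden, nIn);         % ... gives the same initial weights
order2 = randperm(nSamples);            % ... and the same feeding order
```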
anquegi's answer pretty much answers your question, so this post is just to elaborate a bit more.
Whenever you ask for a random number, what MATLAB really does is generate a pseudo-random number with distribution U(0,1), that is, the uniform distribution on [0,1]. This is done via a deterministic formula, typically something like the following (see Linear congruential generator):
X_{n+1} = (a X_{n} + b) mod M
then a uniform number is obtained by U = X_{n+1}/M.
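The recurrence can be tried directly. This is an illustrative sketch, not MATLAB's own generator (rand uses the Mersenne Twister by default): it uses the Park-Miller "minimal standard" parameters a = 16807, b = 0, M = 2^31 - 1, so it is the multiplicative special case of the formula above. Two runs started from the same X_{0} produce identical sequences, and a different X_{0} produces a different one.

```matlab
% Illustrative LCG, not MATLAB's own generator. Parameters are the
% Park-Miller "minimal standard" values (a = 16807, b = 0, M = 2^31 - 1);
% every product a*X stays below 2^53, so double arithmetic is exact.
a = 16807;
b = 0;
M = 2^31 - 1;
n = 5;

seeds = [2016 2016 2017];       % X_0 for three runs: two equal, one different
U = zeros(numel(seeds), n);

for r = 1:numel(seeds)
    X = seeds(r);               % initialise the generator with the seed
    for k = 1:n
        X = mod(a*X + b, M);    % X_{n+1} = (a X_n + b) mod M
        U(r, k) = X / M;        % U = X_{n+1} / M, a value in (0, 1)
    end
end

disp(U)   % rows 1 and 2 match; row 3 differs
```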
There is, however, a catch: to compute X_{1}, you need X_{0}. You need to initialise the generator; this is the seed. This also means that once X_{0} is specified, you will draw the same random numbers every time. Try opening a new MATLAB instance, running randn, closing MATLAB, opening it again and running randn again. It will be the same number. That is because MATLAB always uses the same seed whenever it is opened.
So what you do with rng(2016) is that you "reset" the generator, and put X_{0} = 2016, such that you now know all numbers that you ask for, and thus reproduce the results.

Declaring rng('shuffle','twister') many times through the use of functions degrade computation time

I have an optimization program where I have a main program and three subprograms (functions) in MATLAB. I declared rng('shuffle','twister') in my main program but I thought that I needed to declare the same rng('shuffle','twister') under my functions since they also use random sampling. My question is if it is necessary to declare rng('shuffle','twister') in my functions since it greatly degrades the computation time. I seem to be getting the same answers anyway. Is there a way around this?
You do not need to repeatedly run rng(...) in your functions, just once when you start MATLAB if you want to get different numbers each time. The random number functions in MATLAB (i.e. rand, randn, randi, etc.) share a global/system-wide generator, so there is no need to reseed it except when you restart MATLAB.
Since all of these functions access the same underlying stream, a call to one affects the values produced by the others at subsequent calls.
Hence, numbers generated in the different functions and in repeated calls to the functions will be different whether or not you reseed the generator.
More about the 'shuffle' option from this page, which indicates that not only is it not useful to re-seed frequently, but it may actually be undesirable from a statistical standpoint:
'shuffle' is a very easy way to reseed the random number generator. You might think that it's a good idea, or even necessary, to use it to get "true" randomness in MATLAB. For most purposes, though, it is not necessary to use 'shuffle' at all. Choosing a seed based on the current time does not improve the statistical properties of the values you'll get from rand, randi, and randn, and does not make them "more random" in any real sense. While it is perfectly fine to reseed the generator each time you start up MATLAB, or before you run some kind of large calculation involving random numbers, it is actually not a good idea to reseed the generator too frequently within a session, because this can affect the statistical properties of your random numbers.
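A sketch of the recommended pattern: seed once at the start and let every routine draw from the shared global stream. The three anonymous functions are hypothetical stand-ins for the subprograms in the question.

```matlab
rng('shuffle', 'twister');   % seed once, at the start of the main program

% Hypothetical stand-ins for the three subprograms. None of them reseeds;
% they all draw from the same global stream.
sampleA = @(n) rand(1, n);
sampleB = @(n) randi(100, 1, n);
sampleC = @(n) randn(1, n);

x1 = sampleA(5);
x2 = sampleA(5);   % the stream has moved on, so x2 differs from x1
y  = sampleB(5);
z  = sampleC(5);
```

Because nothing is reseeded inside the subprograms, the cost of rng is paid only once per session.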

Random seed across different PBS jobs

I am trying to create random numbers in Matlab which will be different across multiple PBS jobs (I am using a job array). Each Matlab job uses a parallel parfor loop in which random numbers are generated, something like this:
parfor k = 1:10
tmp = randi(100, [1 200]);
end
However, when I plot my result, I see that the results from different jobs are not completely random. I cannot quantify it, e.g. by saying the numbers are exactly the same, since my results are a function of the random numbers, but it is unmistakable when plotting it.
I tried to initialize the random seed in each job, using the process id and/or the clock:
rngSeed = feature('getpid'); % OR: rngSeed = RandStream.shuffleSeed;
rng(rngSeed);
But this didn't solve the problem. I also tried to pause for a different number of seconds in each job, before using the shuffleSeed (which is clock based).
All this made me think the parfor is somehow messing with the random seed - and it makes sense, if the parfor needs to make sure you get different random numbers across different iterations of the parfor.
My questions are, is it really the case, and how can I solve it and get randomness across different PBS jobs?
EDIT: running 4 jobs, each using parfor with 2 workers, I verified that although each job has its own seed (set outside the parfor), the numbers generated are identical across jobs (not across iterations of the parfor; that is handled by MATLAB).
EDIT 2: Trying what was suggested by @Sam Roberts, I use the following code:
matlabpool open local 2
st = RandStream('mlfg6331_64');
RandStream.setGlobalStream(st);
rng('shuffle');
parfor n = 1:4
x=randi(100,[1 10]);
fprintf('%d ',x(:)');
fprintf('\n')
end
matlabpool close
but I still get the same numbers on different calls to the above script.
You may want to look into using random substreams, for correct randomness and reproducibility when running in parallel.
The RandStream class allows you to create a pseudorandom number stream - numbers drawn from this stream have the properties you'd hope for (independence etc) and, if you control the seed, you also have reproducibility.
But it may not be the case that, for example, every second or every fourth number drawn from the stream has the same properties. In addition, when you use parfor you have no control over the order in which the loop iterations are run, which means that you will lose reproducibility. You can use a different substream on each worker within a parfor loop.
Some RNGs, for example mlfg6331_64, a multiplicative lagged Fibonacci generator, or mrg32k3a, a combined multiple recursive generator, support substreams - independent streams that are generated by the same RNG, but which retain the same pseudorandom properties and can be selected from separately, retaining reproducibility. In addition, many MATLAB and Toolbox functions have an option 'UseParallel' and 'UseSubstreams', which will tell them to do this stuff for you automatically.
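A sketch of the substream approach for the PBS setup in the question. Assumptions: jobId is hard-coded here, whereas on a real job array you would read it from the scheduler's environment variable (its name depends on the PBS flavour, e.g. PBS_ARRAYID or PBS_ARRAY_INDEX); the seed value and loop sizes are arbitrary. Because each iteration creates its own stream and selects a substream index that is unique across all jobs and iterations, the numbers depend neither on which worker runs the iteration nor on any client-side seed, which the workers may not see anyway.

```matlab
% Hypothetical job index; on a real job array, read it from the
% scheduler's environment, e.g. str2double(getenv('PBS_ARRAYID')).
jobId = 3;
nIter = 4;           % parfor iterations per job
baseSeed = 0;        % one seed shared by every job; substreams keep them apart

results = zeros(nIter, 10);
parfor k = 1:nIter
    % Each iteration builds the stream itself and picks a substream that
    % is unique across all jobs and iterations, so neither the worker
    % scheduling nor the job start time affects the numbers drawn.
    s = RandStream('mrg32k3a', 'Seed', baseSeed);
    s.Substream = (jobId - 1) * nIter + k;
    results(k, :) = randi(s, 100, [1 10]);
end
```

Without the Parallel Computing Toolbox the parfor loop simply runs serially and gives the same results.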
Although the above is documented at a technical level within the MATLAB documentation, it's kind of hard to find. There's a much more explanatory guide within the Statistics Toolbox documentation (it should really be moved to MATLAB if you ask me), which you can read online.
Hope that helps!

Uniform Random Number blocks in my simulation model

I've used 2 Uniform Random Number blocks in my simulation model, but every time I run the program they generate exactly the same numbers as the last run. I need to test the model with newly generated numbers. What should I do?
Thanks in advance for your help.
The fact that random number generators generate the same random numbers "from the start" is a feature, not a bug. It allows for reproducible testing. You need to initialize your random number generator with a "random seed" in order to give a different result each time - you could use the current time, for example. When you do, it is recommended that you store the seed used - it means you can go back and run exactly the same code again.
For initializing a random seed, you can use the methods given in this earlier answer.
In that answer, the seed is set to 0, which is the opposite of what you are trying to do. You will want a seed that changes between runs (for example, one based on the current date and time) and use that. There is a very useful article on this. To quote:
If you look at the output from rand, randi, or randn in a new MATLAB
session, you'll notice that they return the same sequences of numbers
each time you restart MATLAB. It's often useful to be able to reset
the random number generator to that startup state, without actually
restarting MATLAB. For example, you might want to repeat a calculation
that involves random numbers, and get the same result.
They recommend the command
rng shuffle
to generate a new random seed. You can read back the seed that was used from the Seed field of the structure that rng returns:
s = rng;
s.Seed
and store that for future use. So if you do
rng shuffle
s = rng;
seedStore = s.Seed;
then the next time you want to reproduce the results, you set
rng(seedStore);
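A short sketch of the record-and-replay workflow, assuming the default Mersenne Twister generator; the variable names are illustrative. For the Simulink block itself, note that the Uniform Random Number block has its own Seed parameter, so one option (an assumption about how your model is set up) is to enter a workspace variable such as seedStore in that field.

```matlab
rng('shuffle');              % reseed using the current time
rngState = rng;              % structure with fields Type, Seed and State
seedStore = rngState.Seed;   % record this value (log file, MAT-file, ...)
firstRun = rand(1, 5);

% Later, to reproduce the run, reseed with the recorded value
rng(seedStore);
secondRun = rand(1, 5);
```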

How can I use reproducible randomization in Perl?

I have a Perl script that uses rand to generate pseudorandom integers in some range. I want it to be random (i.e. not set the seed by myself to some constant), but also want to be able to reproduce the results of a specific run if needed.
What would you do?
McWafflestix says:
Possibly you want to have a default randomly determined seed, that will give you complete randomness when desired, but which can be set prior to a run manually to give reproducibility.
The obvious way to implement this is to follow your normal seeding process (either manually from a strong random source, or letting perl do it automatically on the first call to rand), then use the first generated random value as the seed, and record it. If you want to reproduce later, just use a recorded value for the seed.
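# something like this?
if ( defined $input_rand_seed ) {
    srand($input_rand_seed);
} else {
    # srand uses only the integer part of its argument, so a fraction
    # from rand() would always seed with 0; draw an integer instead.
    my $seed = int(rand(2**32));   # or something fancier
    log_random_seed($seed);
    srand($seed);
}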
If the purpose is to be able to reproduce simulation paths which incorporate random shocks (say, when you are running an economic model to produce projections), I would give up on the idea of storing the seed and instead store each sequence alongside the model data.
Note that the built-in rand is subject to the vagaries of the rand implementation provided by the C runtime. On all Windows machines and across all perl versions I have used, this usually means that rand will only ever produce 32768 unique values (Perl 5.20 and later use their own drand48 implementation on every platform, which removes this limitation).
That is severely limited for any serious purpose. In simulations, a crucial criterion is that random sequences used be independent of each other so that each run can be considered an independent realization.
In fact, if you are going to run a simulation 1,000 times, I would pre-produce 1,000 corresponding random sequences using known-good generators that are consistent across platforms and store them with the model inputs.
You can update the simulations using the same sequences or a new set if parameter estimates change when you get new data.
Log the seed for each run and provide a method to call the script and set the seed?
Why don't you want to set the seed, but at the same time set the seed? As I've said to you before, you need to explain why you don't want to do something so we know what you are actually asking.
You might just set it yourself only in certain conditions:
srand( $ENV{SOME_SEED} ) if defined $ENV{SOME_SEED};
If you don't call srand, rand calls it for you automatically but it doesn't report the seed that it used (at least not until Perl 5.14).
It's really just a simple programming problem. Just turn what you outlined into the code that does what you said.
Your goals are at odds with each other. On one hand, you want a self-seeding, completely random sequence of integers; on the other hand, you want reproducibility. Complete randomness and reproducibility are at odds with each other.
You can set the seed to something you want. Possibly you want to have a default randomly determined seed, that will give you complete randomness when desired, but which can be set prior to a run manually to give reproducibility.