How to set custom seed for pseudo-random number generator - matlab

I need to perform few tests where I use randn pseudo random number generator. How can I set the seed on my own, so every time I run this test I will get the same results? (yeah, I know it's a little bit weird, but that's the problem).
I've found the RANDSTREAM object that has the seed property, but it's read only. Is there any way to use it for seeding the generator?

The old way of doing it:
randn('seed',0)
The new way:
s = RandStream('mcg16807','Seed',0)
RandStream.setDefaultStream(s)
Note that if you use the new way, rand and randn share the same stream so if you are calling both, you may find different numbers being generated compared to the old method (which has separate generators). The old method is still supported for this reason (and legacy code).
See http://www.mathworks.com/help/techdoc/math/bsn94u0-1.html for more info.

You can just call rng(mySeed) to set the seed for the global stream (tested in Matlab R2011b). This affects the rand, randn, and randi functions.
The same page that James linked to lists this as the recommended alternative to various old methods (see the middle cell of the right column of the table).
Here's some example code:
format long; % Display numbers with full precision
format compact; % Get rid of blank lines between output
mySeed = 10;
rng(mySeed); % Set the seed
disp(rand([1,3]));
disp(randi(10,[1,10]));
disp(randn([1,3]));
disp(' ');
rng(mySeed); % Set the seed again to duplicate the results
disp(rand([1,3]));
disp(randi(10,[1,10]));
disp(randn([1,3]));
Its output is:
0.771320643266746 0.020751949359402 0.633648234926275
8 5 3 2 8 2 1 7 10 1
0.060379730526407 0.622213879877005 0.109700311365407
0.771320643266746 0.020751949359402 0.633648234926275
8 5 3 2 8 2 1 7 10 1
0.060379730526407 0.622213879877005 0.109700311365407

mySeed=57; % an integer number
rng(mySeed,'twister') %You can replace 'twister' with other generators

When you just want to reset the RNG to some known state, just use:
seed = 0;
randn('state', seed);
rand ('state', seed);
A = round(10*(rand(1,5))); // always will be [10 2 6 5 9]

Related

What is the link between randi and rand?

I'm running on R2012a version. I tried to write a function that imitates randi using rand (only rand), producing the same output when the same arguments are passed and the same seed is provided. I tried something with the command window and here's what I got:
>> s = rng;
>> R1 = randi([2 20], 3, 5)
R1 =
2 16 11 15 14
10 17 10 16 14
9 5 14 7 5
>> rng(s)
>> R2 = 2+18*rand(3, 5)
R2 =
2.6200 15.7793 10.8158 14.7686 14.2346
9.8974 16.3136 10.0206 15.5844 13.7918
8.8681 5.3637 13.6336 6.9685 4.9270
>>
A swift comparison led me to believe that there's some link between the two: each integer in R1 is within plus or minus unity from the corresponding element in R2. Nonetheless, I failed to go any further: I checked for ceiling, flooring, fixing and rounding but neither of them seems to work.
randi([2 20]) generates integers between 2 and 20, both included. That is, it can generate 19 different values, not 18.
19 * rand
generates values uniformly distributed within the half-open interval [0,19), flooring it gives you uniformly distributed integers in the range [0,18].
Thus, in general,
x = randi([a,b]]);
y = rand * (b-a+1) + a;
should yield numbers with the same property. From OP’s experiment it looks like they might generate the same sequence, but this cannot be guaranteed, and it likely doesn't.
Why? It is likely that randi is not implemented in terms of rand, but it’s underlying random generator, which produces integers. To go from a random integer x in a large range ([0,N-1]) to one in a small range ([0,n-1]), you would normally use the modulo operator (mod(x,N)) or a floored division like above, but remove a small subset of the values that skew the distribution. This other anser gives a detailed explanation. I like to think of it in terms of examples:
Say random values are in the range [0,2^16-1] (N=2^16) and you want values in the range [0,18] (n=19). mod(19,2^16)=5. That is, the largest 5 values that can be generated by the random number generator are mapped to the lowest 5 values of the output range (assuming the modulo method), leaving those numbers slightly more likely to be generated than the rest of the output range. These lowest 5 values have a chance floor(N/n)+1, whereas the rest has a chance floor(N/n). This is bad. [Using floored division instead of modulo yields a different distribution of the unevenness, but the end result is the same: some numbers are slightly more likely than others.]
To solve this issue, a correct implementation does as follows: if you get one of the values in the random generator that are floor(N/n)*n or higher, you need to throw it away and try again. This is a very small chance, of course, with a typical random number generator that uses N=2^64.
Though we don't know how randi is implemented, we can be fairly certain that it follows the correct implementation described here. So your sequence based on rand might be right for millions of numbers, but then start deviating.
Interestingly enough, Octave's randi is implemented as an M-file, so we can see how they do it. And it turns out it uses the wrong algorithm shown at the top of this answer, based on rand:
ri = imin + floor ( (imax-imin+1)*rand (varargin{:}) );
Thus, Octave's randi is biased!

Double for loop in MATLAB, storing the information

I have two for loops in MATLAB.
One of the for loops leads to different variables being inserted into the model, which are 43 and then I have 5 horizons.
So I estimate the model 215 times.
My problem is I want to store this in 215x5 matrix, the reason I have x5 is that I am estimating 5 variables, 4 are fixed and the other one comes from the for loop.
I have tried to do this in two ways,
Firstly, I create a variable called out,
out=zeros(215,5);
The first for loop is,
for i=[1,2,3,4,5];
The second for loop is,
for ii=18:60;
The 18:60 is how I define my variables using XLS read, e.g. they are inserted into the model as (data:,ii).
I have tried to store the data in two ways, I want to store OLS which contains the five estimates
First,
out(i,:)=OLS;
This method creates a 5 x 5 matrix, with the estimates for one of the (18:60), at each of the horizons.
Second,
out(ii,:)=OLS;
This stores the variables for each of the variables (18:60), at just one horizon.
I want to have a matrix which stores all of the estimates OLS, at each of the horizons, for each of my (18:60).
Minimal example
clear;
for i=[1,2,3,4,5];
K=i;
for ii=18:60
x=[1,2,3,i,ii];
out(i,:)=x;
end
end
So the variable out will store 1 2 3 5 60
I want the variable out to store all of the combinations
i.e.
1 2 3 1 1
1 2 3 1 2
...
1 2 3 5 60
Thanks
The simplest solution is to use a 3D matrix:
for jj=[1,2,3,4,5];
K=jj;
for ii=18:60
x=[1,2,3,jj,ii];
out(ii-17,jj,:)=x;
end
end
If you now reshape the out matrix you get the same result as the first block in etmuse's answer:
out = reshape(out,[],size(out,3));
(Note I replaced i by jj. i and ii are too similar to use both, it leads to confusion. It is better to use different letters for loop indices. Also, i is OK to use, but it also is the built-in imaginary number sqrt(-1). So I prefer to use ii over i.)
As you've discovered, using just one of your loop variables to index the output results in most of the results being overwritten, leaving only the results from the final iteration of the relevant loop.
There are 2 ways to create an indexing variable.
1- You can use an independent variable, initialised before the loops and incremented at the end of the internal loop.
kk=1;
for i=1:5
for ii=18:60
%calculate OLC
out(kk,:)=OLC;
kk = kk+1;
end
end
2- Use a calculation of i and ii
kk = i + 5*(ii-18)
(Use in loop as before, without increment)

What is the difference between predict and svmclassify?

I tried the following code
data = [27 9 0
11.6723281 28.93422177 0
25 9 0
23 8 0
5.896096039 23.97745722 1
21 6 0
21.16823369 5.292058423 0
4.242640687 13.43502884 1
22 6 0];
Attributes = data(:,1:2);
Classes = data(:,3);
train = [1 3 4 5 6 7];
test = [2 8 9];
%%# Train
SVMModel = fitcsvm(Classes(train),Attributes(train,:))
classOrder = SVMModel.ClassNames
sv = SVMModel.SupportVectors;
figure
gscatter(train(:,1),train(:,2),Classes)
hold on
plot(train(:,1),train(:,2),'ko','MarkerSize',10)
legend('good','bad','Support Vector')
hold off
I tried both predict and svmclassify; but it returns an error. What is the basic difference between these two functions?
[label,score] = predict(SVMModel,test);
label = svmclassify(SVMModel, test);
First off, there's quite a big note on top of the documentation page on svmclassify:
svmclassify will be removed in a future release. See fitcsvm, ClassificationSVM, and CompactClassificationSVM instead.
MATLAB is a bit vague in its naming of functions, as there's loads of functions named predict, using different schemes and algorithms. I suspect you'll want to use the one for SVMs. This should return the same result as svmclassify, but I think that either something went wrong in determining which predict MATLAB decided to use, or that predict has a newer algorithm than the unsupported svmclassify, hence a different output may result.
The conclusion is that you should use the newest functions to be able to run your code in future releases and get the newest algorithms. MATLAB will choose the correct version of predict based on what kind of input structure you feed it.

Average on contiguos segments of a vector

I'm sure this is a trivial question for a signals person. I need to find the function in Matlab that outputs averaging of contiguous segments of windowsize= l of a vector, e.g.
origSignal: [1 2 3 4 5 6 7 8 9];
windowSize = 3;
output = [2 5 8]; % i.e. [(1+2+3)/3 (4+5+6)/3 (7+8+9)/3]
EDIT: Neither one of the options presented in How can I (efficiently) compute a moving average of a vector? seems to work because I need that the window of size 3 slides, and doesnt include any of the previous elements... Maybe I'm missing it. Take a look at my example...
Thanks!
If the size of the original data is always a multiple of widowsize:
mean(reshape(origSignal,windowSize,[]));
Else, in one line:
mean(reshape(origSignal(1:end-mod(length(origSignal),windowSize)),windowSize,[]))
This is the same as before, but the signal is only taken to the end minus the extra values less than windowsize.

Why does crossvalind fail?

I am using cross valind function on a very small data... However I observe that it gives me incorrect results for the same. Is this supposed to happen ?
I have Matlab R2012a and here is my output
crossvalind('KFold',1:1:11,5)
ans =
2
5
1
3
2
1
5
3
5
1
5
Notice the absence of set 4.. Is this a bug ? I expected atleast 2 elements per set but it gives me 0 in one... and it happens a lot that is the values are not uniformly distributed in the sets.
The help for crossvalind says that the form you are using is: crossvalind(METHOD, GROUP, ...). In this case, GROUP is the e.g. the class labels of your data. So 1:11 as the second argument is confusing here, because it suggests no two examples have the same label. I think this is sufficiently unusual that you shouldn't be surprised if the function does something strange.
I tried doing:
numel(unique(crossvalind('KFold', rand(11, 1) > 0.5, 5)))
and it reliably gave 5 as a result, which is what I would expect; my example would correspond to a two-class problem (I would guess that, as a general rule, you'd want something like numel(unique(group)) <= numel(group) / folds) - my hypothesis would be that it tries to have one example of each class in the Kth fold, and at least 2 examples in every other, with a difference between fold sizes of no more than 1 - but I haven't looked in the code to verify this.
It is possible that you mean to do:
crossvalind('KFold', 11, 5);
which would compute 5 folds for 11 data points - this doesn't attempt to do anything clever with labels, so you would be sure that there will be K folds.
However, in your problem, if you really have very few data points, then it is probably better to do leave-one-out cross validation, which you could do with:
crossvalind('LeaveMOut', 11, 1);
although a better method would be:
for leave_out=1:11
fold_number = (1:11) ~= leave_out;
<code here; where fold_number is 0, this is the leave-one-out example. fold_number = 1 means that the example is in the main fold.>
end