What is the link between randi and rand? - matlab

I'm running MATLAB R2012a. I tried to write a function that imitates randi using only rand, producing the same output when the same arguments are passed and the same seed is provided. Here's what I tried in the command window:
>> s = rng;
>> R1 = randi([2 20], 3, 5)
R1 =
2 16 11 15 14
10 17 10 16 14
9 5 14 7 5
>> rng(s)
>> R2 = 2+18*rand(3, 5)
R2 =
2.6200 15.7793 10.8158 14.7686 14.2346
9.8974 16.3136 10.0206 15.5844 13.7918
8.8681 5.3637 13.6336 6.9685 4.9270
>>
A swift comparison led me to believe that there's some link between the two: each integer in R1 is within ±1 of the corresponding element in R2. Nonetheless, I failed to go any further: I checked ceiling, flooring, fixing and rounding, but none of them seems to work.

randi([2 20]) generates integers between 2 and 20, both included. That is, it can generate 19 different values, not 18.
19 * rand
generates values uniformly distributed within the open interval (0,19); flooring it gives you uniformly distributed integers in the range [0,18].
Thus, in general,
x = randi([a,b]);
y = floor(rand * (b-a+1)) + a;
should yield numbers with the same distribution. From the OP's experiment it looks like they might even generate the same sequence, but this cannot be guaranteed, and in practice it likely won't hold.
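As a quick check, here is a sketch (mine, not the OP's) comparing the two constructions from the same generator state; whether they actually match is version-dependent:
s = rng;                              % save generator state
a = 2;  b = 20;
x = randi([a b], 3, 5);               % integers in [a, b]
rng(s);                               % restore the same state
y = floor(rand(3, 5) * (b-a+1)) + a;  % same distribution as randi([a b])
isequal(x, y)                         % may be true here, but not guaranteed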
Why might the sequences diverge? It is likely that randi is not implemented in terms of rand, but in terms of its underlying random generator, which produces integers. To go from a random integer x in a large range ([0,N-1]) to one in a small range ([0,n-1]), you would normally use the modulo operator (mod(x,n)) or a floored division as above, but then remove a small subset of the values that skew the distribution. This other answer gives a detailed explanation. I like to think of it in terms of examples:
Say the raw random values are in the range [0,2^16-1] (N=2^16) and you want values in the range [0,18] (n=19). mod(2^16,19)=5. That is, the largest 5 values that can be generated by the raw generator are mapped to the lowest 5 values of the output range (assuming the modulo method), making those numbers slightly more likely than the rest of the output range: each of the lowest 5 outputs can be produced by floor(N/n)+1 raw values, whereas each of the remaining outputs can be produced by only floor(N/n). This is bad. [Using floored division instead of modulo distributes the unevenness differently, but the end result is the same: some numbers are slightly more likely than others.]
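To make the bias concrete, here is a small sketch (my illustration) that counts exactly how often each output would occur if every raw value in [0,N-1] were mapped with mod:
N = 2^16;  n = 19;
counts = histc(mod(0:N-1, n), 0:n-1);  % occurrences of each output 0..18
[counts(1) counts(end)]                % 3450 vs 3449: outputs 0..4 are favored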
To solve this issue, a correct implementation does the following: if the raw generator returns a value of floor(N/n)*n or higher, throw it away and try again. The chance of rejection is very small, of course, with a typical random number generator where N=2^64.
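A minimal sketch of that rejection step, with randi on a large range standing in for the raw integer generator (an assumption purely for illustration):
N = 2^16;  n = 19;
limit = floor(N/n) * n;     % raw values >= limit would bias the output
raw = randi([0 N-1]);       % stand-in for one raw integer draw
while raw >= limit
    raw = randi([0 N-1]);   % reject and redraw (rarely needed)
end
r = mod(raw, n);            % unbiased integer in [0, n-1]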
Though we don't know how randi is implemented, we can be fairly certain that it follows the correct implementation described above. So your sequence based on rand might agree for millions of numbers, but then start deviating.
Interestingly enough, Octave's randi is implemented as an M-file, so we can see how they do it. And it turns out it uses the wrong algorithm shown at the top of this answer, based on rand:
ri = imin + floor ( (imax-imin+1)*rand (varargin{:}) );
Thus, Octave's randi is biased!

Related

Why do I get different results in different versions of MATLAB (2016 vs 2021)?

Why do I get different results when running the same code, sum(b.*x1) where b is single and x1 is double, in different versions of MATLAB (2016 vs 2021)? How can I avoid such differences between MATLAB versions?
MATLAB 2021:
sum(b.*x1)
ans =
single
-0.0013286
MATLAB 2016
sum(b.*x1)
ans =
single
-0.0013283
In R2017b, MathWorks changed the behavior of sum for single-precision floats, and in R2020b they made the same change for other data types too.
The change speeds up the computation and improves accuracy by reducing rounding errors. Simply put, the old algorithm ran through the array in sequence, adding values to a single running total. The new behavior computes sums over smaller portions of the array, then adds up those partial results. This is more precise because a single running total can become a very large number, and adding small numbers to a large total causes more rounding in those small numbers. The speed improvement comes from loop unrolling: the loop now steps over, say, 8 values at a time, and in the loop body, 8 running totals are computed (MathWorks doesn't specify the number used; the 8 here is an example).
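For illustration, here is a sketch of that idea; the block size of 8 and every other detail are assumptions, not MathWorks' actual code:
function s = blockedSum(a)
% Sketch of the blocked/unrolled summation idea (illustrative only).
blk = 8;                          % hypothetical block size
n = numel(a);
m = floor(n/blk) * blk;           % largest multiple of blk within n
acc = zeros(1, blk, 'like', a);   % blk independent running totals
for i = 1:blk:m
    acc = acc + reshape(a(i:i+blk-1), 1, blk);  % add blk elements at a time
end
s = sum(acc) + sum(a(m+1:n));     % combine partial sums and the tail
end
Each running total stays roughly 8 times smaller than the full sum, so adding small values to it rounds less; combining the partials at the end recovers the total.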
Thus, your newer result is a better approximation to the sum of your array than the old one.
For more details (a better explanation of the new algorithm and the reason for the change), see this blog post.
Regarding how to avoid the difference: you could implement your own sum function, and use that instead of the builtin one. I would suggest writing it as a MEX-file for efficiency. However, do make sure you match the newer behavior of the builtin sum, as that is the better approximation.
Here is an example of the problem. Let's create an array with N+1 elements, where the first one has a value of N and the rest have a value of 1.
N = 1e8;
a = ones(N+1,1,'single');
a(1) = N;
The sum over this array is expected to be 2*N. If we set N large enough w.r.t. the data type, I see this in R2017a (before the change):
>> sum(a)
ans =
single
150331648
And I see this in R2018b (after the change for single-precision sum):
>> sum(a)
ans =
single
199998976
Both implementations make rounding errors here, but one is obviously much, much closer to the expected result (2e8, or 200000000).

Does the rand function ever produce values of 0 or 1 in MATLAB/Octave?

I'm looking for a function that will generate random values between 0 and 1, inclusive. I have generated 120,000 random values using the rand() function in Octave, but never once got 0 or 1 as output. Does rand() ever produce such values? If not, is there any other function I can use to achieve the desired result?
If you read the documentation of rand in both Octave and MATLAB, it draws from the open interval (0,1), so no, it shouldn't generate the numbers 0 or 1.
However, you can generate a set of random integers, then normalize the values so that they lie in [0,1]. For example, use randi (MATLAB docs, Octave docs), which generates integer values from 1 up to a given maximum. Define this maximum number, then subtract 1 from the draws and divide by max_num - 1 to get values in [0,1], both endpoints included:
max_num = 10000; %// Define maximum number
N = 1000; %// Define size of vector
out = (randi(max_num, N, 1) - 1) / (max_num - 1); %// Output
If you want this to act more like rand but including 0 and 1, make the max_num variable quite large.
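A quick sanity check (a sketch; a small max_num is chosen so the endpoints show up quickly):
out = (randi(10, 1e6, 1) - 1) / 9;  % max_num = 10 for the check
[any(out == 0) any(out == 1)]       % both true with overwhelming probability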
Mathematically, if you sample from a (continuous) uniform distribution on the closed interval [0 1], values 0 and 1 (or any value, in fact) have probability strictly zero.
Programmatically, there are two cases:
1. If you have a random generator that produces values of type double on the closed interval [0,1], the probability of getting the value 0, or 1, is not zero, but it's so small it can be neglected.
2. If the random generator produces values from the open interval (0,1), the probability of getting a value 0, or 1, is strictly zero.
So the probability is either strictly zero or so small it can be neglected. Therefore, you shouldn't worry about it: in either case the probability is zero for practical purposes. Even if rand were of type (1) above, and thus could produce 0 and 1, it would produce them with probability so small that you would "never" see those values.
Does that sound strange? Well, that happens with any number. You "never" see rand ever outputting exactly 1/4, either. There are so many possible outputs, all of them equally likely, that the probability of any given output is virtually zero.
rand produces numbers from the open interval (0,1), which does not include 0 or 1, so you should never get those values. This was more clearly documented in previous versions, but it's still stated in the help text for rand (type help rand rather than doc rand).
However, since it produces doubles, there are only a finite number of values that it will actually produce. The precise set varies depending on the RNG algorithm used. For Mersenne twister, the default algorithm, the possible values are all multiples of 2^(-53), within the open interval (0,1). (See doc RandStream.list, and then "Choosing a Random Number Generator" for info on other generators).
Note that 2^(-53) is eps/2. Therefore, it's equivalent to drawing from the closed interval [2^(-53), 1-2^(-53)], or [eps/2, 1-eps/2].
You can scale this interval to [0,1] by subtracting eps/2 and dividing by 1-eps. (Use format hex to display enough precision to check that at the bit level).
So x = (rand-eps/2)/(1-eps) should give you values on the closed interval [0,1].
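To see that this scaling maps the extreme outputs of rand exactly onto 0 and 1, push the endpoint values eps/2 and 1-eps/2 through it by hand, using format hex as suggested above (a sketch):
format hex
lo = (eps/2 - eps/2) / (1 - eps)        % 0000000000000000, i.e. exactly 0
hi = ((1 - eps/2) - eps/2) / (1 - eps)  % 3ff0000000000000, i.e. exactly 1
format short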
But I should give a word of caution: they've put a lot of effort into making sure that output of rand gives an appropriate distribution of any given double within (0,1), and I don't think you're going to get the same nice properties on [0,1] if you apply the scaling I suggested. My knowledge of floating-point math and RNGs isn't up to explaining why, or what you might do about that.
I just tried this:
octave:1> max(rand(10000000,1))
ans = 1.00000
octave:2> min(rand(10000000,1))
ans = 3.3788e-08
It did not give me 0 exactly, so watch out for floating-point operations.
Edit
Even though I said to watch out for floating-point operations, I fell for that myself. As @eigenchris pointed out:
format long g
octave:1> a=max(rand(1000000,1))
a = 0.999999711020176
It yields a floating-point number close to one, but not equal to it, as you can see after increasing the display precision, as @rayryeng suggested.
Although not directly related to the question here, I find it helpful to link to this SO post, Octave - random generate number, which has a one-liner to generate 1s and 0s: r = rand > 0.5.

Compute 4^x mod 2π for large x

I need to compute sin(4^x) with x > 1000 in MATLAB, which is basically sin(4^x mod 2π). Since the values inside the sin function become very large, MATLAB returns Inf for 4^1000. How can I efficiently compute this?
I prefer to avoid large data types.
I think that a transformation to something like sin(n*π+z) could be a possible solution.
You need to be careful, as there will be a loss of precision. The sin function is periodic, but 4^1000 is a big number. So effectively, we subtract off a multiple of 2*pi to move the argument into the interval [0,2*pi).
4^1000 is roughly 1e600, a really big number. So I'll do my computations using my high precision floating point tool in MATLAB. (In fact, one of my explicit goals when I wrote HPF was to be able to compute a number like sin(1e400). Even if you are doing something for the fun of it, doing it right still makes sense.) In this case, since I know that the power we are interested in is roughly 1e600, then I'll do my computations in more than 600 digits of precision, expecting that I'll lose 600 digits by the subtractive cancellation. This is a massive subtractive cancellation issue. Think about it. That modulus operation is effectively a difference between two numbers that will be identical for the first 600 digits or so!
X = hpf(4,1000);
X^1000
ans =
114813069527425452423283320117768198402231770208869520047764273682576626139237031385665948631650626991844596463898746277344711896086305533142593135616665318539129989145312280000688779148240044871428926990063486244781615463646388363947317026040466353970904996558162398808944629605623311649536164221970332681344168908984458505602379484807914058900934776500429002716706625830522008132236281291761267883317206598995396418127021779858404042159853183251540889433902091920554957783589672039160081957216630582755380425583726015528348786419432054508915275783882625175435528800822842770817965453762184851149029376
What is the nearest multiple of 2*pi that does not exceed this number? We can get that by a simple operation.
twopi = 2*hpf('pi',1000);
twopi*floor(X^1000/twopi)
ans = 114813069527425452423283320117768198402231770208869520047764273682576626139237031385665948631650626991844596463898746277344711896086305533142593135616665318539129989145312280000688779148240044871428926990063486244781615463646388363947317026040466353970904996558162398808944629605623311649536164221970332681344168908984458505602379484807914058900934776500429002716706625830522008132236281291761267883317206598995396418127021779858404042159853183251540889433902091920554957783589672039160081957216630582755380425583726015528348786419432054508915275783882625175435528800822842770817965453762184851149029372.6669043995793459614134256945369645075601351114240611660953769955068077703667306957296141306508448454625087552917109594896080531977700026110164492454168360842816021326434091264082935824243423723923797225539436621445702083718252029147608535630355342037150034246754736376698525786226858661984354538762888998045417518871508690623462425811535266975472894356742618714099283198893793280003764002738670747
As you can see, the first 600 digits were the same. Now, when we subtract the two numbers,
X^1000 - twopi*floor(X^1000/twopi)
ans =
3.333095600420654038586574305463035492439864888575938833904623004493192229633269304270385869349155154537491244708289040510391946802229997388983550754583163915718397867356590873591706417575657627607620277446056337855429791628174797085239146436964465796284996575324526362330147421377314133801564546123711100195458248112849130937653757418846473302452710564325738128590071680110620671999623599726132925263826
This is why I referred to it as a massive subtractive cancellation issue. The two numbers were identical for many digits. Even carrying 1000 digits of accuracy, we lost many digits. When you subtract the two numbers, even though we are carrying a result with 1000 digits, only the highest order 400 digits are now meaningful.
HPF is able to compute the trig function of course. But as we showed above, we should only trust roughly the first 400 digits of the result. (On some problems, the local shape of the sin function might cause us to lose more digits than that.)
sin(X^1000)
ans =
-0.1903345812720831838599439606845545570938837404109863917294376841894712513865023424095542391769688083234673471544860353291299342362176199653705319268544933406487071446348974733627946491118519242322925266014312897692338851129959945710407032269306021895848758484213914397204873580776582665985136229328001258364005927758343416222346964077953970335574414341993543060039082045405589175008978144047447822552228622246373827700900275324736372481560928339463344332977892008702220160335415291421081700744044783839286957735438564512465095046421806677102961093487708088908698531980424016458534629166108853012535493022540352439740116731784303190082954669140297192942872076015028260408231321604825270343945928445589223610185565384195863513901089662882903491956506613967241725877276022863187800632706503317201234223359028987534885835397133761207714290279709429427673410881392869598191090443394014959206395112705966050737703851465772573657470968976925223745019446303227806333289071966161759485260639499431164004196825
So am I right, and we cannot trust all of these digits? I'll do the same computation, once in 1000 digits of precision, then a second time in 2000 digits. Compute the absolute difference, then take the log10. The 2000 digit result will be our reference as essentially exact compared to the 1000 digit result.
double(log10(abs(sin(hpf(4,[1000 0])^1000) - sin(hpf(4,[2000 0])^1000))))
ans =
-397.45
Ah. So of those 1000 digits of precision we started out with, we lost 602 digits. The last 602 digits in the result are non-zero, but still complete garbage. This was as I expected. Just because your computer reports high precision, you need to know when not to trust it.
Can we do the computation without recourse to a high-precision tool? Be careful. For example, suppose we use a powermod type of computation: compute the desired power while taking the modulus at every step. Done in double precision:
X = 1;
for i = 1:1000
X = mod(X*4,2*pi);
end
sin(X)
ans =
0.955296299215251
Ah, but remember that the true answer was -0.19033458127208318385994396068455455709388...
So there is essentially nothing of significance remaining. We have lost all our information in that computation. As I said, it is important to be careful.
What happened was after each step in that loop, we incurred a tiny loss in the modulus computation. But then we multiplied the answer by 4, which caused the error to grow by a factor of 4, and then another factor of 4, etc. And of course, after each step, the result loses a tiny bit at the end of the number. The final result was complete crapola.
Let's look at the operation for a smaller power, just to convince ourselves of what happened. Here, for example, try the 20th power. Using double precision,
mod(4^20,2*pi)
ans =
3.55938555711037
Now, use a loop in a powermod computation, taking the mod after every step. Essentially, this discards multiples of 2*pi after each step.
X = 1;
for i = 1:20
X = mod(X*4,2*pi);
end
X
X =
3.55938555711037
But is that the correct value? Again, I'll use hpf to compute the correct value, showing the first 20 digits of that number. (Since I've done the computation in 50 total digits, I'll absolutely trust the first 20 of them.)
mod(hpf(4,[20,30])^20,2*hpf('pi',[20,30]))
ans =
3.5593426962577983146
In fact, while the results in double precision agree to the last digit shown, those double results were both actually wrong past the 5th significant digit. As it turns out, we STILL need to carry more than 600 digits of precision for this loop to produce a result of any significance.
Finally, to fully kill this dead horse, we might ask if a better powermod computation can be done. That is, we know that 1000 can be decomposed into a binary form (use dec2bin) as:
512 + 256 + 128 + 64 + 32 + 8
ans =
1000
Can we use a repeated squaring scheme to expand that large power with fewer multiplications, and so cause less accumulated error? Essentially, we might try to compute
4^1000 = 4^8 * 4^32 * 4^64 * 4^128 * 4^256 * 4^512
However, do this by repeatedly squaring 4, taking the mod after each operation. This fails, however, since the modulo operation can only validly remove integer multiples of 2*pi; modular arithmetic is really designed for integers. So look at what happens. We can express 4^2 as:
4^2 = 16 = 3.43362938564083 + 2*(2*pi)
Can we just square the remainder, then take the mod again? NO!
mod(3.43362938564083^2,2*pi)
ans =
5.50662545075664
mod(4^4,2*pi)
ans =
4.67258771281655
We can understand what happened by expanding this form:
4^4 = (4^2)^2 = (3.43362938564083 + 2*(2*pi))^2
Squaring introduces cross terms: with a = 3.43362938564083, we get (a + 2*(2*pi))^2 = a^2 + 2*pi*(4*a + 8*pi), and 4*a + 8*pi is not an integer. Since mod only removes INTEGER multiples of 2*pi, it cannot discard that term, and mod(a^2, 2*pi) differs from mod(4^4, 2*pi). The direct loop, in contrast, multiplies by the integer 4 at every step: 4*(a + 2*pi*k) = 4*a + 2*pi*(4*k), and 4*k IS an integer, so removing integer multiples of 2*pi remains valid there. Of course, the direct loop failed too, because of the numerical issues shown above.
I would first redefine the question as follows: compute 4^1000 modulo 2pi. So we have split the problem in two.
Use some math trickery:
(a + 2*pi*K) * (b + 2*pi*L) = a*b + 2*pi*(garbage)
Hence, you can just multiply 4 by itself many times, computing mod 2*pi at every stage. The real question to ask, of course, is what the precision of this is. That needs careful mathematical analysis. It may or may not be total crap.
Following Pavel's hint about mod, I found a mod function for high powers, bigmod, on mathworks.com.
However, bigmod(number,power,modulo) can NOT compute 4^4000 mod 2π, because it only works with integer moduli, not with decimals.
So this earlier statement is not correct: sin(4^x) is sin(bigmod(4,x,2*pi)).

generating odd random numbers using Matlab

I need some help on how to generate odd random numbers using Matlab. How do you generate odd random numbers within a given interval, say between 1 and 100?
Well, if I could generate EVEN random numbers within an interval, then I'd just add 1. :)
That is not as silly as it sounds.
Can you generate random integers? If you could, why not multiply by 2? Then you would have EVEN random integers. See above for what to do next.
There are tools in MATLAB to generate random integers in an interval; if there weren't, you could trivially write your own. For example, what does this do:
r = 1 + 2*floor(rand(N,1)*50);
Or this:
r = 1 + 2*randi([0 49], N,1);
Note that Rody edited this answer but made a mistake in the randi call when he did so; I've corrected the problem. randi intentionally samples only up to 49 here, which works because 2*49 + 1 = 99.
So how about in the rand case? Why have I multiplied by 50 there, and not 49? This is taken from the doc for rand:
"r = rand(n) returns an n-by-n matrix containing pseudorandom values drawn from the standard uniform distribution on the open interval (0,1)."
So rand NEVER generates an exact 1. It can generate a number slightly smaller than 1, but never 1. So when I multiply by 50, this results in a number that is never exactly 50, but only potentially slightly less than 50. The floor then generates all integers between 0 and 49, with essentially equal probability. I suppose someone will point out that since 0 is never a possible result from rand, that the integer 0 will be under-sampled by this expression by an amount of the order of eps. If you will generate that many samples that you can see this extent of undersampling, then you will need a bigger, faster computer to do your work. :)
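A quick sanity check of the randi construction above (a sketch):
N = 1e5;
r = 1 + 2*randi([0 49], N, 1);
all(mod(r, 2) == 1)   % 1 (true): every value is odd
[min(r) max(r)]       % [1 99] with overwhelming probability at this N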

Generate a random number with max, min and mean (average) in Matlab

I need to generate random numbers with following properties.
Min must be 1
Max must be 9
Average (mean) is 6.00 (or something else)
Random number must be Integer (positive) only
I have tried several syntaxes but nothing works, for example
r=1+8.*rand(100,1);
This gives me random numbers between 1 and 9, but they are not integers (for example 5.607 or 4.391), and each time I calculate the mean it varies.
You may be able to define a function that satisfies your requirements based on Matlab's randi function. But be careful, it is easy to define functions of random number generators which do not produce random numbers.
Another approach might suit -- create a probability distribution to meet your requirements. In this case you need a vector of 9 floating-point numbers which sum to 1 and which, individually, express the probability of the i-th integer occurring. For example, a distribution might be described by the following vector:
[0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1]
These split the interval [0,1] into 9 parts. Then, take your favourite rng which generates floating-point numbers in the range [0,1) and generate a number, suppose it is 0.45. Read along the interval from 0 to 1 and you find that this is in the 5-th interval, so return the integer 5.
Obviously, I've been too lazy to give you a vector which gives 6 as the mean of the distribution, but that shouldn't be too hard for you to figure out.
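For what it's worth, here is a sketch of that lookup using the example vector above; note this particular vector has mean 5, so tuning it to give 6 remains the exercise mentioned:
p = [0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1];  % probabilities, sum to 1
c = cumsum(p);                              % cumulative distribution, c(9) == 1
u = rand(1000, 1);                          % uniform draws in (0,1)
samp = arrayfun(@(y) find(y <= c, 1), u);   % interval index: 0.45 -> 5
mean(samp)                                  % approaches sum((1:9).*p) = 5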
Here is an algorithm with a loop that reaches a required mean xmean (to within a required precision xeps) by regenerating, at each iteration, one value from the half of the vector that pulls the mean in the wrong direction. In my tests it reached the target mean pretty quickly.
n = 100;
xmean = 6;
xmin = 1;
xmax = 9;
xeps = 0.01;
x = randi([xmin xmax],n,1);
while abs(xmean - mean(x)) >= xeps
    if xmean > mean(x)
        x(find(x < xmean,1)) = randi([xmean xmax]);
    elseif xmean < mean(x)
        x(find(x > xmean,1)) = randi([xmin xmean]);
    end
end
x is the output you need.
You can use randi to get random integers.
You could use floor to truncate your random numbers to integer values only:
r = 1 + floor(9 * rand(100,1));
Obtaining a specified mean is a little trickier; it depends what kind of distribution you're after.
If the distribution is not important and all you're interested in is the mean, then there's a particularly simple function that does that:
function x=myrand
x=6;
end
Before you can design your random number generator you need to specify the distribution it should draw from. You've only partially done that: i.e., you specified it draws from integers in [1,9] and that it has a mean that you want to be able to specify. That still leaves an infinity of distributions to chose among. What other properties do you want your distribution to have?
Edit following comment: The mean of any finite sample from a probability distribution - the so-called sample mean - will only approximate the distribution's mean. There is no way around that.
That having been said, the simplest (in the maximum entropy sense) distribution over the integers in the domain [1,9] is the exponential distribution: i.e.,
p = @(n,x) exp(-x*n)./sum(exp(-x*(1:9)));
The parameter x determines the distribution mean. The corresponding cumulative distribution is
c = cumsum(p(1:9,x));
To draw from the distribution p you can draw a random number from [0,1] and find what sub-interval of c it falls in: i.e.,
samp = arrayfun(@(y)find(y<c,1),rand(n,m));
will return an [n,m] array of integers drawn from p.
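As a concrete sketch of putting this together, one could pick the parameter x numerically so the mean is 6; the fzero step is my addition for illustration, not part of the answer above:
p = @(n,x) exp(-x*n)./sum(exp(-x*(1:9)));           % pmf on the integers 1..9
distMean = @(x) sum((1:9) .* p(1:9, x));            % mean as a function of x
x = fzero(@(x) distMean(x) - 6, [-2 0]);            % bracket where the target mean lies
c = cumsum(p(1:9, x));                              % cumulative distribution
samp = arrayfun(@(y) find(y < c, 1), rand(100, 1)); % 100 draws from p
mean(samp)                                          % close to 6 for large samples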