Compute 4^x mod 2π for large x - matlab

I need to compute sin(4^x) with x > 1000 in Matlab, which is basically sin(4^x mod 2π). Since the values inside the sin function become very large, Matlab returns Inf for 4^1000. How can I compute this efficiently?
I prefer to avoid large data types.
I think that a transformation to something like sin(n*π+z) could be a possible solution.

You need to be careful, as there will be a loss of precision. The sin function is periodic, but 4^1000 is a big number. So effectively, we subtract off a multiple of 2*pi to move the argument into the interval [0,2*pi).
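Written out, the reduction we are after is sin(4^1000) = sin(4^1000 - 2*pi*floor(4^1000/(2*pi))); the floor term is the multiple of 2*pi that gets subtracted off.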
4^1000 is roughly 1e600, a really big number. So I'll do my computations using my high precision floating point tool in MATLAB. (In fact, one of my explicit goals when I wrote HPF was to be able to compute a number like sin(1e400). Even if you are doing something for the fun of it, doing it right still makes sense.) In this case, since I know that the power we are interested in is roughly 1e600, then I'll do my computations in more than 600 digits of precision, expecting that I'll lose 600 digits by the subtractive cancellation. This is a massive subtractive cancellation issue. Think about it. That modulus operation is effectively a difference between two numbers that will be identical for the first 600 digits or so!
X = hpf(4,1000);
X^1000
ans =
114813069527425452423283320117768198402231770208869520047764273682576626139237031385665948631650626991844596463898746277344711896086305533142593135616665318539129989145312280000688779148240044871428926990063486244781615463646388363947317026040466353970904996558162398808944629605623311649536164221970332681344168908984458505602379484807914058900934776500429002716706625830522008132236281291761267883317206598995396418127021779858404042159853183251540889433902091920554957783589672039160081957216630582755380425583726015528348786419432054508915275783882625175435528800822842770817965453762184851149029376
What is the nearest multiple of 2*pi that does not exceed this number? We can get that by a simple operation.
twopi = 2*hpf('pi',1000);
twopi*floor(X^1000/twopi)
ans = 114813069527425452423283320117768198402231770208869520047764273682576626139237031385665948631650626991844596463898746277344711896086305533142593135616665318539129989145312280000688779148240044871428926990063486244781615463646388363947317026040466353970904996558162398808944629605623311649536164221970332681344168908984458505602379484807914058900934776500429002716706625830522008132236281291761267883317206598995396418127021779858404042159853183251540889433902091920554957783589672039160081957216630582755380425583726015528348786419432054508915275783882625175435528800822842770817965453762184851149029372.6669043995793459614134256945369645075601351114240611660953769955068077703667306957296141306508448454625087552917109594896080531977700026110164492454168360842816021326434091264082935824243423723923797225539436621445702083718252029147608535630355342037150034246754736376698525786226858661984354538762888998045417518871508690623462425811535266975472894356742618714099283198893793280003764002738670747
As you can see, the first 600 digits were the same. Now, when we subtract the two numbers,
X^1000 - twopi*floor(X^1000/twopi)
ans =
3.333095600420654038586574305463035492439864888575938833904623004493192229633269304270385869349155154537491244708289040510391946802229997388983550754583163915718397867356590873591706417575657627607620277446056337855429791628174797085239146436964465796284996575324526362330147421377314133801564546123711100195458248112849130937653757418846473302452710564325738128590071680110620671999623599726132925263826
This is why I referred to it as a massive subtractive cancellation issue. The two numbers were identical for many digits. Even carrying 1000 digits of accuracy, we lost many digits. When you subtract the two numbers, even though we are carrying a result with 1000 digits, only the highest order 400 digits are now meaningful.
HPF is able to compute the trig function of course. But as we showed above, we should only trust roughly the first 400 digits of the result. (On some problems, the local shape of the sin function might cause us to lose more digits than that.)
sin(X^1000)
ans =
-0.1903345812720831838599439606845545570938837404109863917294376841894712513865023424095542391769688083234673471544860353291299342362176199653705319268544933406487071446348974733627946491118519242322925266014312897692338851129959945710407032269306021895848758484213914397204873580776582665985136229328001258364005927758343416222346964077953970335574414341993543060039082045405589175008978144047447822552228622246373827700900275324736372481560928339463344332977892008702220160335415291421081700744044783839286957735438564512465095046421806677102961093487708088908698531980424016458534629166108853012535493022540352439740116731784303190082954669140297192942872076015028260408231321604825270343945928445589223610185565384195863513901089662882903491956506613967241725877276022863187800632706503317201234223359028987534885835397133761207714290279709429427673410881392869598191090443394014959206395112705966050737703851465772573657470968976925223745019446303227806333289071966161759485260639499431164004196825
So am I right, and we cannot trust all of these digits? I'll do the same computation, once in 1000 digits of precision, then a second time in 2000 digits. Compute the absolute difference, then take the log10. The 2000 digit result will be our reference as essentially exact compared to the 1000 digit result.
double(log10(abs(sin(hpf(4,[1000 0])^1000) - sin(hpf(4,[2000 0])^1000))))
ans =
-397.45
Ah. So of those 1000 digits of precision we started out with, we lost 602 digits. The last 602 digits in the result are non-zero, but still complete garbage. This was as I expected. Just because your computer reports high precision, you need to know when not to trust it.
Can we do the computation without recourse to a high precision tool? Be careful. For example, suppose we use a powermod type of computation? Thus, compute the desired power, while taking the modulus at every step. Thus, done in double precision:
X = 1;
for i = 1:1000
X = mod(X*4,2*pi);
end
sin(X)
ans =
0.955296299215251
Ah, but remember that the true answer was -0.19033458127208318385994396068455455709388...
So there is essentially nothing of significance remaining. We have lost all our information in that computation. As I said, it is important to be careful.
What happened was after each step in that loop, we incurred a tiny loss in the modulus computation. But then we multiplied the answer by 4, which caused the error to grow by a factor of 4, and then another factor of 4, etc. And of course, after each step, the result loses a tiny bit at the end of the number. The final result was complete crapola.
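To see how quickly that factor-of-4 growth swamps double precision, here is a small sketch of my own (not part of the original computation): run two copies of the loop, one seeded with a one-eps perturbation, and compare.
X  = 1;
Xp = 1 + eps;               % same start, perturbed by one eps
for i = 1:30
    X  = mod(4*X,  2*pi);
    Xp = mod(4*Xp, 2*pi);
end
abs(X - Xp)                 % typically O(1): eps*4^30 already dwarfs 2*pi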
Let's look at the operation for a smaller power, just to convince ourselves of what happened. Here, for example, try the 20th power. Using double precision,
mod(4^20,2*pi)
ans =
3.55938555711037
Now, use a loop in a powermod computation, taking the mod after every step. Essentially, this discards multiples of 2*pi after each step.
X = 1;
for i = 1:20
X = mod(X*4,2*pi);
end
X
X =
3.55938555711037
But is that the correct value? Again, I'll use hpf to compute the correct value, showing the first 20 digits of that number. (Since I've done the computation in 50 total digits, I'll absolutely trust the first 20 of them.)
mod(hpf(4,[20,30])^20,2*hpf('pi',[20,30]))
ans =
3.5593426962577983146
In fact, while the results in double precision agree to the last digit shown, those double results were both actually wrong past the 5th significant digit. As it turns out, we STILL need to carry more than 600 digits of precision for this loop to produce a result of any significance.
Finally, to fully kill this dead horse, we might ask if a better powermod computation can be done. That is, we know that 1000 can be decomposed into a binary form (use dec2bin) as:
512 + 256 + 128 + 64 + 32 + 8
ans =
1000
Can we use a repeated squaring scheme to expand that large power with fewer multiplications, and so cause less accumulated error? Essentially, we might try to compute
4^1000 = 4^8 * 4^32 * 4^64 * 4^128 * 4^256 * 4^512
However, do this by repeatedly squaring 4 and taking the mod after each operation. This fails, since the modulo operation only removes integer multiples of 2*pi; after all, mod is really designed to work on integers. So look at what happens. We can express 4^2 as:
4^2 = 16 = 3.43362938564083 + 2*(2*pi)
Can we just square the remainder, then take the mod again? NO!
mod(3.43362938564083^2,2*pi)
ans =
5.50662545075664
mod(4^4,2*pi)
ans =
4.67258771281655
We can understand what happened when we expand this form:
4^4 = (4^2)^2 = (3.43362938564083 + 2*(2*pi))^2
What will you get when you remove INTEGER multiples of 2*pi? You need to understand why the direct loop allowed me to remove integer multiples of 2*pi, but the above squaring operation does not. Of course, the direct loop failed too because of numerical issues.
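As a quick double-precision cross-check (my addition, not part of the answer above): multiplying the reduced remainder of 4^2 by the integer 4 and reducing again does agree with reducing 4^3 directly; both of the lines below return approximately 1.1681. That is the sense in which multiplying by an integer preserves the congruence, while squaring a reduced remainder does not.
mod(4*3.43362938564083, 2*pi)   % multiply the remainder of 4^2 by the integer 4, then reduce
mod(4^3, 2*pi)                  % reduce 4^3 directly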

I would first redefine the question as follows: compute 4^1000 modulo 2*pi. So we have split the problem in two.
Use some math trickery:
(a + 2*pi*K)*(b + 2*pi*L) = a*b + 2*pi*(garbage)
Hence, you can just multiply 4 by itself many times, taking the result mod 2*pi at every stage. The real question to ask, of course, is what the precision of this is. That needs careful mathematical analysis. It may or may not be total crap.
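To spell out why that works in this particular case (numerical issues aside): if x = 4^n - 2*pi*k with k an integer, i.e. x is 4^n already reduced mod 2*pi, then 4*x = 4^(n+1) - 2*pi*(4*k), and 4*k is again an integer, so mod(4*x, 2*pi) is exactly 4^(n+1) mod 2*pi. The "garbage" is an integer multiple of 2*pi only because one of the factors (the 4) is an integer; that is also why the squaring variant discussed above breaks down.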

Following Pavel's hint with mod, I found a mod function for high powers on mathworks.com.
bigmod(number,power,modulo) can NOT compute 4^4000 mod 2π, because it only works with integer moduli, not with decimals.
So this statement is not correct anymore: sin(4^x) is sin(bigmod(4,x,2*pi)).

Related

MATLAB numeric precision when generating a numeric sequence

I was testing an operation like this:
[input] 3.9/0.1 : 4.1/0.1
[output] 39 40
I don't know why 4.1/0.1 gets treated as 40 here, so the range stops short of 41. If I add a round(), it works as expected:
[input] 3.9/0.1 : round(4.1/0.1)
[output] 39 40 41
What's wrong with the first operation?
In this Q&A I go into detail on how the colon operator works in MATLAB to create a range. But the detail that causes the issue described in this question is not covered there.
That post includes the full code for a function that imitates exactly what the colon operator does. Let's follow that code. We start with start = 3.9/0.1, which is exactly 39, and stop = 4.1/0.1, which, due to rounding errors, is just slightly smaller than 41, and step = 1 (the default if it's not given).
It starts by computing a tolerance:
tol = 2.0*eps*max(abs(start),abs(stop));
This tolerance is intended so that the stop value is still included when it is within tol of an exact number of steps, even if the last step would nominally land just past it. Without a tolerance, it would be really difficult to build correct sequences using floating-point end points and step sizes.
However, then we get this test:
if start == floor(start) && step == 1
% Consecutive integers.
n = floor(stop) - start;
elseif ...
If the start value is an exact integer, and the step size is 1, then it forces the sequence to be an integer sequence. Unfortunately, it does so by taking the number of steps as the distance between floor(stop) and start. That is, it is not using the tolerance computed earlier in determining the right stop! If stop is slightly above an integer, that integer will be in the range. If stop is slightly below an integer (as in the case of the OP), that integer will not be part of the range.
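To make that concrete, here is a rough sketch of the two ways of counting steps described above. This is my own illustration of the logic, not MATLAB's actual source, and the variable names are made up.
start = 3.9/0.1;  stop = 4.1/0.1;  step = 1;
tol = 2.0*eps*max(abs(start),abs(stop));
n_integer_branch = floor(stop) - start                % = 1, giving only 39 and 40
n_general_branch = floor((stop - start)/step + tol)   % = 2, which would give 39, 40 and 41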
It could be debated whether MATLAB should round the stop number up in this case or not. MATLAB chose not to. All of the sequences produced by the colon operator use the start and stop values exactly as given by the user. It leaves it up to the user to ensure the bounds of the sequence are as required.
However, if the colon operator hadn't special-cased the sequence of integers, the result would have been less surprising in this case. Let's add a very small number to the start value, so it's not an integer:
>> a = 3.9/0.1 : 4.1/0.1
a =
39 40
>> b = 3.9/0.1 + eps(39) : 4.1/0.1
b =
39.0000 40.0000 41.0000
Floating-point numbers suffer from loss of precision when represented with a fixed number of bits (64 bits in MATLAB by default). This is because there is an infinite number of real numbers (even within a small range, say 0.0 to 0.1), while an n-bit binary pattern can represent only 2^n distinct numbers. Hence, not all real numbers can be represented: the nearest representable approximation is used instead, resulting in a loss of accuracy.
Because neither 4.1 nor 0.1 is exactly representable in binary, the quotient 4.1/0.1 comes out slightly below 41. The exact quotient of the two stored double-precision operands is
4.1/0.1 ≈ 40.9999999999999941713291207...
which is then rounded to the nearest representable double, 41 - 2^-47 ≈ 40.999999999999993. So, in essence, 4.1/0.1 < 41.0, and that is what you get from the range. If you subtract, for example, 41 - 4.1/0.1 = 7.105427357601002e-15, exactly one unit in the last place at 41. But when you round, you get the closest value of 41.0 as expected.
The representation scheme for 64-bit double-precision according to the IEEE-754 standard:
The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
The following 11 bits represent the exponent (E).
The remaining 52 bits represent the fraction (F).
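If you want to look at those fields for the value in question, num2hex exposes the raw 64-bit pattern. The hex strings in the comments below are what I would expect on an IEEE-754 system; check them on your own machine.
num2hex(4.1/0.1)   % expected '40447fffffffffff': one bit below 41
num2hex(41)        % expected '4044800000000000': 41 exactly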

What is the link between randi and rand?

I'm running on R2012a version. I tried to write a function that imitates randi using rand (only rand), producing the same output when the same arguments are passed and the same seed is provided. I tried something with the command window and here's what I got:
>> s = rng;
>> R1 = randi([2 20], 3, 5)
R1 =
2 16 11 15 14
10 17 10 16 14
9 5 14 7 5
>> rng(s)
>> R2 = 2+18*rand(3, 5)
R2 =
2.6200 15.7793 10.8158 14.7686 14.2346
9.8974 16.3136 10.0206 15.5844 13.7918
8.8681 5.3637 13.6336 6.9685 4.9270
>>
A swift comparison led me to believe that there's some link between the two: each integer in R1 is within plus or minus one of the corresponding element in R2. Nonetheless, I failed to get any further: I checked ceiling, flooring, fixing and rounding, but none of them seems to work.
randi([2 20]) generates integers between 2 and 20, both included. That is, it can generate 19 different values, not 18.
19 * rand
generates values uniformly distributed within the half-open interval [0,19), flooring it gives you uniformly distributed integers in the range [0,18].
Thus, in general,
x = randi([a,b]);
y = floor(rand * (b-a+1)) + a;
should yield numbers with the same properties. From the OP's experiment it looks like they might generate the same sequence, but this cannot be guaranteed, and likely they don't.
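As a quick check of that formula against the OP's own numbers (this snippet is my addition and says nothing about the internals):
s = rng;
R1 = randi([2 20], 3, 5);
rng(s)
R2 = floor(19*rand(3, 5)) + 2;
isequal(R1, R2)    % may well be true for this short test, but see below: it is not guaranteed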
Why? It is likely that randi is not implemented in terms of rand, but in terms of its underlying random generator, which produces integers. To go from a random integer x in a large range ([0,N-1]) to one in a small range ([0,n-1]), you would normally use the modulo operator (mod(x,n)) or a floored division like the one above, but remove a small subset of the values that would skew the distribution. Another answer gives a detailed explanation. I like to think of it in terms of an example:
Say the raw random values are in the range [0,2^16-1] (N=2^16) and you want values in the range [0,18] (n=19). mod(2^16,19)=5. That is, the largest 5 values that can be produced by the random number generator are mapped to the lowest 5 values of the output range (assuming the modulo method), leaving those numbers slightly more likely to be generated than the rest of the output range: each of the lowest 5 output values is hit by floor(N/n)+1 raw values, whereas the rest are hit by only floor(N/n). This is bad. [Using floored division instead of modulo distributes the unevenness differently, but the end result is the same: some numbers are slightly more likely than others.]
To solve this issue, a correct implementation does the following: if the raw generator returns a value of floor(N/n)*n or higher, throw it away and try again. With a typical random number generator that uses N=2^64, the chance of that happening is of course very small.
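Here is a minimal sketch of that rejection step in MATLAB. The names are mine, and randi only stands in for the raw integer generator; a real implementation would work on the generator's output directly.
N = 2^16;  n = 19;              % raw range size and target range size
limit = floor(N/n)*n;           % raw values >= limit would bias the result
x = randi(N) - 1;               % stand-in for a raw generator value in [0, N-1]
while x >= limit
    x = randi(N) - 1;           % reject and redraw
end
r = mod(x, n);                  % unbiased integer in [0, n-1]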
Though we don't know how randi is implemented, we can be fairly certain that it follows the correct implementation described here. So your sequence based on rand might be right for millions of numbers, but then start deviating.
Interestingly enough, Octave's randi is implemented as an M-file, so we can see how they do it. And it turns out it uses the wrong algorithm shown at the top of this answer, based on rand:
ri = imin + floor ( (imax-imin+1)*rand (varargin{:}) );
Thus, Octave's randi is biased!

What does [int.,int] mean in Maple?

I have some code that works as a nonlinear equation system solver.
I am having a lot of trouble with a command that goes like this:
newt[0]:=[-2.,20]:
I don't know what that dot does there!
I thought it might be for showing that it is -2.0, but there is no reason to do that when by default -2 = -2.0.
Can anyone help me with this?
The dot forces float calculations
It is not correct that by default -2 = -2.0. There is a very big difference in how Maple calculates: if you use -2 it calculates exactly (arithmetic expressions), while -2.0 tells Maple to calculate with floats (numerical expressions).
The two expressions -2.*sqrt(5) and -2*sqrt(5.) are quite different in how Maple handles them, if you notice the float position! For the first example, the square root is calculated arithmetically, while in the second example it is calculated numerically!
This can be a very big deal for some calculations; both with regards to speed and precision, and should be considered carefully when one wants to do complicated computations.
Speed example: Calculate exp(x) for x = 1,2,...,50000. (Arithmetic > numerical)
CodeTools:-Usage(seq(exp(x),x=1..50000)): # Arithmetic
memory used=19.84MiB, alloc change=0 bytes, cpu time=875.00ms,
real time=812.00ms, gc time=265.62ms
CodeTools:-Usage(seq(exp(1.*x),x=1..50000)): # Numerical
memory used=292.62MiB, alloc change=0 bytes, cpu time=9.67s,
real time=9.45s, gc time=1.09s
Notice especially the huge difference in memory used.
This is an example of when using floats gives worse performance. On the other hand, if we are just approximating anyway, numerical evaluation is much faster.
Approximate exp(1) (numerical > arithmetic)
CodeTools:-Usage(seq((1+1/x)^x,x=1..20000)): # Arithmetic
memory used=0.64GiB, alloc change=0 bytes, cpu time=39.05s,
real time=40.92s, gc time=593.75ms
CodeTools:-Usage(seq((1+1./x)^x,x=1..20000)): # Numerical
memory used=56.17MiB, alloc change=0 bytes, cpu time=1.06s,
real time=1.13s, gc time=0ns
Precision example: For precision, things can go very wrong if one is not careful.
f:=x->(Pi-x)/sin(x);
limit(f(x),x=Pi); # Arithmetic returns 1 (true value)
limit(f(x),x=Pi*1.); # Numerical returns 0 (wrong!!!)
After working with it a little, I finally found what it does!
Short answer: it calculates the result of the expression where those two numbers are the inputs, as a float rather than an exact value.
Extended answer (example):
Given two functions, we want to calculate the Jacobian matrix for this equation system.
with(linalg);
with(plots);
f := (x, y) -> (1/64)*(x-11)^2-(1/100)*(y-7)^2-1;
g := (x, y) -> (x-3)^2+(y-1)^2-400;
Then we put the functions in a vector:
F:=(x, y) -> vector([f(x,y),g(x,y)]);
F(-2 ,20)
F(-2.,20)
result will be this:
[-79/1600 -14]
[-0.049375000 -14]

Random numbers that add to 1 with a minimum increment: Matlab

Having read carefully the previous question
Random numbers that add to 100: Matlab
I am struggling to solve a similar but slightly more complex problem.
I would like to create an array of n elements that sums to 1, however I want an added constraint that the minimum increment (or if you like number of significant figures) for each element is fixed.
For example if I want 10 numbers that sum to 1 without any constraint the following works perfectly:
num_stocks=10;
num_simulations=100000;
temp = [zeros(num_simulations,1),sort(rand(num_simulations,num_stocks-1),2),ones(num_simulations,1)];
weights = diff(temp,[],2);
I foolishly thought that by scaling this I could add the constraint as follows
num_stocks=10;
min_increment=0.001;
num_simulations=100000;
scaling=1/min_increment;
temp2 = [zeros(num_simulations,1),sort(round(rand(num_simulations,num_stocks-1)*scaling)/scaling,2),ones(num_simulations,1)];
weights2 = diff(temp2,[],2);
However, though this works for small values of n & small increments, if for example n=1,000 & the increment is 0.1%, then over a large number of trials the first and last numbers have a mean which is consistently below 0.1%.
I am sure there is a logical explanation/solution to this, but I have been tearing my hair out trying to find it & wondered if anybody would be so kind as to point me in the right direction. To put the problem into context, I am creating random stock portfolios (hence the sum to 1).
Thanks in advance
Thank you for the responses so far, just to clarify (as I think my initial question was perhaps badly phrased), it is the weights that have a fixed increment of 0.1% so 0%, 0.1%, 0.2% etc.
I did try using integers initially
num_stocks=1000;
min_increment=0.001;
num_simulations=100000;
scaling=1/min_increment;
temp = [zeros(num_simulations,1),sort(randi([0 scaling],num_simulations,num_stocks-1),2),ones(num_simulations,1)*scaling];
weights = (diff(temp,[],2)/scaling);
test=mean(weights);
but this was worse, the mean for the 1st & last weights is well below 0.1%.....
Edit to reflect excellent answer by Floris & clarify
The original code I was using to solve this problem (before finding this forum) was
function x = monkey_weights_original(simulations,stocks)
stockmatrix=1:stocks;
base_weight=1/stocks;
r=randi(stocks,stocks,simulations);
x=histc(r,stockmatrix)*base_weight;
end
This runs very fast, which is important considering I want to run a total of 10,000,000 simulations: 10,000 simulations on 1,000 stocks take just over 2 seconds on a single core, & I am running the whole code on an 8-core machine using the parallel toolbox.
It also gives exactly the distribution I was looking for in terms of means, and I think that it is just as likely to get a portfolio that is 100% in 1 stock as it is to get a portfolio that is 0.1% in every stock (though I'm happy to be corrected).
My issue is that although it works for 1,000 stocks & an increment of 0.1%, and I guess it works for 100 stocks & an increment of 1%, as the number of stocks decreases each pick becomes a very large percentage (in the extreme, with 2 stocks you will always get a 50/50 portfolio).
In effect I think this solution is like the binomial solution Floris suggests (but more limited)
However, my question has arisen because I would like to make my approach more flexible & have the possibility of, say, 3 stocks & an increment of 1%, which my current code will not handle correctly; hence how I stumbled across the original question on stackoverflow.
Floris's recursive approach will get to the right answer, but the speed will be a major issue considering the scale of the problem.
An example of the original research is here
http://www.huffingtonpost.com/2013/04/05/monkeys-stocks-study_n_3021285.html
I am currently working on extending it with more flexibility on portfolio weights & numbers of stock in the index, but it appears my programming & probability theory ability are a limiting factor.......
One problem I can see is that your formula allows numbers to be zero, when the rounding operation results in two consecutive numbers being the same after sorting. Not sure if you consider that a problem, but I suggest you think about it (it would mean your model portfolio has fewer than N stocks in it, since the contribution of one of the stocks would be zero).
The other thing to note is that the probability of getting the extreme values in your distribution is half of what you want it to be: if you have uniformly distributed numbers from 0 to 1000 and you round them, the numbers that round to 0 came from the interval [0, 0.5); the ones that round to 1 came from [0.5, 1.5), which is twice as big. The last number (rounding to 1000) again comes from a smaller interval, [999.5, 1000]. Thus you will not get the first and last numbers as often as you would expect. If you use floor instead of round, I think you will get the answer you expect.
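A quick way to see this edge effect numerically (my sketch, using a plain uniform sample rather than the sorted construction in the question):
u = 1000*rand(1,1e6);
mean(round(u) == 0)      % about 0.0005: the first bin is only half-width
mean(round(u) == 500)    % about 0.0010: a typical interior bin
mean(floor(u) == 0)      % about 0.0010: floor makes the first bin full-width again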
EDIT
I thought about this some more, and came up with a slow but (I think) accurate method for doing this. The basic idea is this:
Think in terms of integers; rather than dividing the interval 0 - 1 in steps of 0.001, divide the interval 0 - 1000 in integer steps
If we try to divide N into m intervals, the mean size of a step should be N / m; but being integer, we would expect the intervals to be binomially distributed
This suggests an algorithm in which we choose the first interval as a binomially distributed variate with mean (N/m) - call the first value v1; then divide the remaining interval N - v1 into m-1 steps; we can do so recursively.
The following code implements this:
% random integers adding up to a definite sum
function r = randomInt(n, limit)
% returns an array of n random integers
% whose sum is limit
% calls itself recursively; slow but accurate
if n>1
    v = binomialRandom(limit, 1 / n);
    r = [v randomInt(n-1, limit - v)];
else
    r = limit;
end
function b = binomialRandom(N, p)
b = sum(rand(1,N)<p); % slow but direct
To get 10000 instances, you run this as follows:
tic
portfolio = zeros(10000, 10);
for ii = 1:10000
portfolio(ii,:) = randomInt(10, 1000);
end
toc
This ran in 3.8 seconds on a modest machine (single thread) - of course the method for obtaining a binomially distributed random variate is the thing slowing it down; there are statistical toolboxes with more efficient functions but I don't have one. If you increase the granularity (for example, by setting limit=10000) it will slow down more since you increase the number of random number samples that are generated; with limit = 10000 the above loop took 13.3 seconds to complete.
As a test, I found mean(portfolio)' and std(portfolio)' as follows (with limit=1000):
100.20 9.446
99.90 9.547
100.09 9.456
100.00 9.548
100.01 9.356
100.00 9.484
99.69 9.639
100.06 9.493
99.94 9.599
100.11 9.453
This looks like a pretty convincing "flat" distribution to me. We would expect the numbers to be binomially distributed with a mean of 100, and standard deviation of sqrt(p*(1-p)*n). In this case, p=0.1 so we expect s = 9.4868. The values I actually got were again quite close.
I realize that this is inefficient for large values of limit, and I made no attempt at efficiency. I find that clarity trumps speed when you develop something new. But for instance you could pre-compute the cumulative binomial distributions for p=1./(1:10), then do a random lookup; but if you are just going to do this once, for 100,000 instances, it will run in under a minute; unless you intend to do it many times, I wouldn't bother. But if anyone wants to improve this code I'd be happy to hear from them.
Eventually I have solved this problem!
I found a paper by two academics at Johns Hopkins University, "Sampling Uniformly From The Unit Simplex"
http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf
In the paper they outline how naive algorithms don't work, in a way very similar to woodchips' answer to the Random numbers that add to 100 question. They then go on to show that the method suggested by David Schwartz can also be slightly biased, and propose a modified algorithm which appears to work.
If you want x numbers that sum to y
Sample uniformly x-1 random numbers from the range 1 to x+y-1 without replacement
Sort them
Add a zero at the beginning & x+y at the end
Difference them & subtract 1 from each value
If you want to scale them as I do, then divide by y
It took me a while to realise why this works when the original approach didn't, and it comes down to the probability of getting a zero weight (as highlighted by Floris in his answer). To get a zero weight in the original version for all but the 1st or last weights, your random numbers had to contain two equal values; but for the 1st & last weights, a random number equal to zero or to the maximum would also produce a zero weight, which is more likely.
In the revised algorithm, zero & the maximum number are not in the set of random choices, & a zero weight occurs only if you select two consecutive numbers, which is equally likely for every position.
I coded it up in Matlab as follows
function weights = unbiased_monkey_weights(num_simulations,num_stocks,min_increment)
scaling=1/min_increment;
sample=NaN(num_simulations,num_stocks-1);
for i=1:num_simulations
allcomb=randperm(scaling+num_stocks-1);
sample(i,:)=allcomb(1:num_stocks-1);
end
temp = [zeros(num_simulations,1),sort(sample,2),ones(num_simulations,1)*(scaling+num_stocks)];
weights = (diff(temp,[],2)-1)/scaling;
end
Obviously the loop is a bit clunky, and as I'm using the 2009 version, randperm only lets me generate permutations of the whole set; despite this I can run 10,000 simulations for 1,000 numbers in 5 seconds on my clunky laptop, which is fast enough.
The mean weights are now correct, & as a quick test I replicated woodchips' example of generating 3 numbers that sum to 1 with a minimum increment of 0.01% & it also looks right.
Thank you all for your help and I hope this solution is useful to somebody else in the future
The simple answer is to use the schemes that work well with NO minimum increment, then transform the problem. As always, be careful. Some methods do NOT yield uniform sets of numbers.
Thus, suppose I want 11 numbers that sum to 100, with a constraint of a minimum increment of 5. I would first find 11 numbers that sum to 45, with no lower bound on the samples (other than zero.) I could use a tool from the file exchange for this. Simplest is to simply sample 10 numbers in the interval [0,45]. Sort them, then find the differences.
X = diff([0,sort(rand(1,10)),1]*45);
The vector X is a sample of numbers that sums to 45. The vector Y below then sums to 100, with a minimum value of 5.
Y = X + 5;
Of course, this is trivially vectorized if you wish to find multiple sets of numbers with the given constraint.
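For completeness, one possible vectorized version of the same construction, as a sketch (one sample per row; the variable names are mine):
nsamp = 1000;                                                % number of sample sets
X = diff([zeros(nsamp,1), sort(rand(nsamp,10),2), ones(nsamp,1)]*45, [], 2);
Y = X + 5;                                                   % each row sums to 100 with a minimum value of 5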

find discretization steps

I have data files F_j, each containing a list of numbers with an unknown number of decimal places. Each file contains discretized measurements of some continuous variable and
I want to find the discretization step d_j for file F_j
A solution I could come up with: for each F_j,
find the number (n_j) of decimal places;
multiply each number in F_j with 10^{n_j} to obtain integers;
find the greatest common divisor of the entire list.
I'm looking for an elegant way to find n_j with Matlab.
Also, finding the gcd of a long list of integers seems hard — do you have any better idea?
Finding the gcd of a long list of numbers isn't too hard. You can do it in time linear in the size of the list. If you get lucky, you can do it in time a lot less than linear. Essentially this is because:
gcd(a,b,c) = gcd(gcd(a,b),c)
and if either a=1 or b=1 then gcd(a,b)=1 regardless of the size of the other number.
So if you have a list of numbers xs you can do
g = xs(1);
for i = 2:length(xs)
    g = gcd(xs(i), g);
    if g == 1
        break
    end
end
The variable g will now store the gcd of the list.
Here is some sample code that I believe will help you get the GCD once you have the numbers you want to look at.
A = [15 30 20];
A_min = min(A);
GCD = 1;
for n = A_min:-1:1
    temp = A / n;
    if (max(mod(temp,1)) == 0)
        % yay, GCD found
        GCD = n;
        break;
    end
end
The basic concept here is that the default GCD will always be 1, since every number is divisible by itself and by 1, of course =). The GCD also can't be greater than the smallest number in the list, thus I start with the smallest number and then decrement by 1. This assumes that you have already converted the numbers to whole-number form at this point. Decimals will throw this off!
By using the modulus of 1 you are testing whether the number is a whole number; if it isn't, you will have a decimal remainder left which is greater than 0. If you anticipate having to deal with negative numbers, you will have to tweak this test!
Other than that, the first time you find an n where the list divided by n leaves zero remainder everywhere (mod 1), you've found the GCD.
Enjoy!