I fear that arc4random has betrayed me - iphone

I have code that picks a random number, either 0 or 1. The number 1 is coming up far more often than the number 0, more often than I would think statistically possible.
This is my code:
int shipNumber = arc4random() % 2;
Should this code work? Am I just going crazy?

That code should work.
What I suspect you're seeing is truly random (or, at least, sufficiently random) and your brain is trying to find patterns. (Everybody's brain tries to find patterns everywhere. That's how you're reading this. The issue is there are no patterns in randomness [that being pretty much the definition] for your brain to latch on to, so it invents some.)
If you really want to check your output for randomness, you'll need to do a statistical analysis of some kind or other.
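As a rough sketch of such an analysis, a chi-square goodness-of-fit test on the counts of 0s and 1s is one option. The function names `coin_chi_square` and `coin_looks_biased` are made up for illustration; 3.841 is the usual 5% critical value for one degree of freedom.

```c
#include <assert.h>

/* Chi-square statistic for a fair-coin hypothesis, given how many 0s and
 * 1s were observed. With expected counts of n/2 each, the usual sum of
 * (observed - expected)^2 / expected simplifies to (ones - zeros)^2 / n. */
double coin_chi_square(unsigned long zeros, unsigned long ones)
{
    unsigned long n = zeros + ones;
    assert(n > 0);  /* at least one draw is required */
    double diff = (double)ones - (double)zeros;
    return diff * diff / (double)n;
}

/* Returns 1 if the imbalance is significant at the 5% level
 * (critical value 3.841 for one degree of freedom), else 0. */
int coin_looks_biased(unsigned long zeros, unsigned long ones)
{
    return coin_chi_square(zeros, ones) > 3.841;
}
```

For example, 4900 zeros against 5100 ones gives a statistic of 4.0, just past the threshold, while 4950 against 5050 gives 1.0 and is consistent with a fair source.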

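You might be seeing modulo bias, although with a modulus of 2 that should not apply: 2 divides the 2^32 possible return values of arc4random() evenly. For ranges that do not divide 2^32, arc4random_uniform() avoids the bias.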
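To make the idea of modulo bias concrete, here is a small sketch that counts how the outputs of a hypothetical 8-bit generator (values 0 to 255) are distributed after `% m`. The function `count_residues` is invented for this illustration and is not part of any library.

```c
#include <assert.h>

/* For every value that a hypothetical generator with `range` equally
 * likely outputs (0 .. range-1) can produce, count which residue it maps
 * to under `% m`. The counts array must have room for m entries. */
void count_residues(unsigned range, unsigned m, unsigned counts[])
{
    assert(m > 0);
    for (unsigned r = 0; r < m; r++)
        counts[r] = 0;
    for (unsigned v = 0; v < range; v++)
        counts[v % m]++;
}
```

With a range of 256 and m = 6, residues 0 to 3 are hit 43 times each and residues 4 and 5 only 42 times, which is the bias. With m = 2, both residues are hit exactly 128 times.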


Faster way to compute `nchoosek` in MATLAB

I want to find something faster than P = nchoosek(1:100,i), which sits inside a loop and is repeated i times in my code.
nchoosek(1:100,10) is absolutely vast: it would have 17,310,309,456,440 rows of 10 elements each, far more than any typical machine could hold in memory.
The MATLAB documentation for nchoosek says
C = nchoosek(v,k) is only practical for situations where length(v) is less than about 15.
You're not really going to be able to do this.
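To put a number on the size problem, here is a sketch that counts the rows nchoosek(1:100,10) would have to return. The helper `binomial` is a name chosen for illustration; it uses the multiplicative formula and assumes the intermediate product fits in 64 bits, which holds for this case.

```c
#include <assert.h>
#include <stdint.h>

/* Number of k-element subsets of an n-element set, C(n, k).
 * After step i, result holds C(n - k + i, i), so each division is exact.
 * Assumption: result * (n - k + i) never overflows 64 bits. */
uint64_t binomial(unsigned n, unsigned k)
{
    assert(k <= n);
    uint64_t result = 1;
    for (unsigned i = 1; i <= k; i++)
        result = result * (n - k + i) / i;
    return result;
}
```

binomial(100, 10) comes out as 17310309456440, so the full matrix would have about 1.7 * 10^13 rows of 10 values; at 8 bytes per double that is more than a petabyte.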
I found that VChoosek(v,k) is much faster than nchoosek.

Getting around floating point error with logarithms?

I'm trying to write a basic digit counter (an integer is inputted and the number of digits of that integer is outputted) for positive integers. This is my general formula:
dig(x) := Math.floor(Math.log(x,10))
I tried implementing the equivalent of dig(x) in Ruby, and found that when I was computing dig(1000) I was getting 2 instead of 3 because Math.log was returning 2.9999999999999996 which would then be truncated down to 2. What is the proper way to handle this problem? (I'm assuming this problem can occur regardless of the language used to implement this approach, but if that's not the case then please explain that in your answer).
To get an exact count of the number of digits in an integer, you can do the usual thing: (in C/C++, assuming n is non-negative)
int digits = 0;
while (n > 0) {
    n = n / 10; // integer division, just drops the ones digit and shifts right
    digits = digits + 1;
}
I'm not certain but I suspect running a built-in logarithm function won't be faster than this, and this will give you an exact answer.
I thought about it for a minute and couldn't come up with a way to make the logarithm-based approach work with any guarantees, and almost convinced myself that it is probably a doomed pursuit in the first place because of floating point rounding errors, etc.
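One way to keep the logarithm and still get a guaranteed answer is to treat its result only as an estimate and correct it with exact integer comparisons. The sketch below assumes the estimate is off by at most one in either direction; `pow10_u64` and `digits_from_estimate` are names invented here, and the estimate itself would come from something like (int)floor(log10((double)n)). Note that the digit count is floor(log10(n)) + 1.

```c
#include <assert.h>
#include <stdint.h>

/* 10^e computed with integer multiplication; exact for 0 <= e <= 19. */
static uint64_t pow10_u64(int e)
{
    uint64_t p = 1;
    while (e-- > 0)
        p *= 10;
    return p;
}

/* Digit count of n > 0, given est, a possibly inexact value of
 * floor(log10(n)). Assumption: est is off by at most one either way. */
int digits_from_estimate(uint64_t n, int est)
{
    assert(n > 0 && est >= 0 && est <= 19);
    if (pow10_u64(est) > n)                       /* estimate too high */
        est--;
    else if (est < 19 && n >= pow10_u64(est + 1)) /* estimate too low */
        est++;
    return est + 1;  /* floor(log10(n)) + 1 digits */
}
```

With the value from the question, digits_from_estimate(1000, 2) repairs the truncated estimate of 2 and returns 4, the same result it gives for the correct estimate of 3.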
Following The Art of Computer Programming, volume 2, we can eliminate one bit of error before the floor function is applied by adding that one bit back in.
Let x be the result of log and then do x += x / 0x10000000 for a single precision floating point number (C's float). Then pass the value into floor.
This should be fast (assuming you already have the value in numerical form) because it uses only a few floating-point instructions.
Floating point is always subject to roundoff error; that's one of the hazards you need to be aware of, and actively manage, when working with it. The proper way to handle it, if you must use floats, is to figure out the expected amount of accumulated error and allow for it in comparisons and printouts -- round off appropriately, check whether the difference is within that range rather than comparing for equality, et cetera.
There is no exact binary-floating-point representation of simple things like 1/10th, for example.
(As others have noted, you could rewrite the problem to avoid the floating-point-based solution entirely, but since you asked specifically about making log() work I wanted to address that question; apologies if I'm off target. Some of the other answers provide specific suggestions for how you might round off the result. That would "solve" this particular case, but as your floating-point operations get more complicated you'll have to continue to allow for roundoff accumulating at each step and either deal with the error at each step or deal with the cumulative error -- the latter being the more complicated but more accurate solution.)
If this is a serious problem for an application, folks sometimes use scaled fixed point instead (running financial computations in terms of pennies rather than dollars, for example). Or they use one of the "big number" packages that compute in decimal rather than in binary; those have their own round-off problems, but they round off more the way humans expect them to.
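As a small illustration of the scaled fixed-point idea, the sketch below adds ten cents ten times, once as a binary double and once as an integer count of pennies. The function names are made up for this example.

```c
#include <stdint.h>

/* Adds 0.10 ten times in binary floating point. Because 0.10 has no
 * exact binary representation, the rounding errors accumulate. */
double sum_dimes_as_double(void)
{
    double total = 0.0;
    for (int i = 0; i < 10; i++)
        total += 0.10;
    return total;
}

/* Adds 10 pennies ten times using integers; every step is exact. */
int64_t sum_dimes_as_pennies(void)
{
    int64_t total = 0;
    for (int i = 0; i < 10; i++)
        total += 10;
    return total;
}
```

On a platform with IEEE 754 doubles, the first function returns a value slightly below 1.0, so comparing it with 1.0 using == fails, while the second returns exactly 100.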

How can I swap a section of a row with another within an array?

I am in the process of coding a simple Genetic Algorithm (GA). There are probably countless areas where I have unnecessarily used a for loop. I would like some tips on how to write more efficient MATLAB as well as an answer to my question. As far as I can tell I have succeeded, but I am not sure. The part this code implements is single-point crossover.
Here is what I have tried...
crossPoints=randi([1 24],popSize/2,1);
for popNo=2:2:popSize
    isolate=chromoParent(popNo-1:popNo,crossPoints(popNo/2,1)+1:end);
    isolate([1 2],:)=isolate([2 1],:);
    chromoParent(popNo-1:popNo,crossPoints(popNo/2,1)+1:end)=isolate;
end
chromoChild=chromoParent;
where
'crossPoints' is the point at which single-point crossover between two binary-encoded chromosomes is required,
'popSize' is the size of the population, required by my code to be an even number,
'isolate' defines the sections of 2 rows which are required to be swapped with each other,
'chromoParent' is the initial population which is required to be changed by single-point crossover,
'chromoChild' is the resulting population.
Both 'chromoParent' and 'chromoChild' are represented by an array of size popSize x 25 binary characters.
Can you spot an error in the way I am thinking about this problem? What's the most efficient way (in computational time) to achieve the same thing? It would help if you could be as broad as possible so that I could begin applying the principles I learn here to the rest of my code.
Thank you.
Your code looks fine. If you want, you can reduce the instructions in the loop to a single line by some very simple indexing:
chromoParent( popNo-1:popNo, crossPoints(popNo/2,1)+1:end) = ...
chromoParent(popNo:-1:popNo-1,crossPoints(popNo/2,1)+1:end);
This may be marginally faster, but as with any optimization, you should profile it first (my guess is that these lines contribute very little to the overall CPU time).

max likelihood fminsearch

I used MATLAB's fminsearch to fit a negative maximum-likelihood model for a binomially distributed function. I don't get any error message, but the parameter I want to estimate always stays at its start value. Apparently there is a mistake somewhere. I know this is a very general question, but has anybody run into the same problem and knows how to deal with it?
Thanks a lot,
@woodchips, thank you a lot. Step by step, I've tried to do what you advised. First of all, I actually maximized (-log(likelihood)), and this is not the problem. I think I have found the problem, but I still have some questions, if I may. I have a model(param) to maximize, starting from paramstart=p1. This model is built for (-log(likelihood(F))), and my F is a vectorized function like F(t,Z,X,T,param,m2,m3,k,l). I have data (tdata,kdata,ldata); X and T are grids, Z is a function on this grid, and (m1,m2,m3) are given parameters. When I evaluate F(tdata,Z,X,T,m1,m2,m3,kdata,ldata), I get a good output. But I think fminsearch treats F(tdata,Z,X,T,p,m2,m3,kdata,ldata) as a constant, and that is why the estimated parameter is always the start value. I would be happy for any advice on how to fix that.
You have some options you can try to tweak. I'd start with algorithm.
It is also problematic when the function value practically doesn't change around your start point. Maybe switching to the log-likelihood helps.
I always use fminunc or fmincon. They also allow you to provide the Hessian (typically better than an "estimated" one) or 'typical values' so the algorithm doesn't spend time in infeasible regions.
It is virtually always true that you should NEVER maximize a likelihood function, but ALWAYS maximize the log of that function. Floating point issues will almost always corrupt the problem otherwise. That your optimization starts and stops at the same point is a good indicator this is the problem.
You may well need to dig a little deeper than the above, but even so, this next test is the test I recommend that all users of optimization tools do for every one of their problems, BEFORE they throw a function into an optimizer. Evaluate your objective for several points in the vicinity. Does it yield significantly different values? If not, then look to see why not. Are you creating a non-smooth objective to optimize, or a zero objective? I.e., zero to within the supplied tolerances?
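That check can be sketched as a small probe around the start value. It is written in C rather than MATLAB purely for illustration; `objective_spread` and the two sample objectives are hypothetical names, and the step size is something you would pick to suit your own parameter scale.

```c
#include <assert.h>

/* Evaluates a one-parameter objective f at p0 - h, p0 and p0 + h and
 * returns the spread (largest value minus smallest value). A spread of
 * zero, or one below the optimizer's tolerance, means the optimizer sees
 * a flat function and has no reason to leave the start point. */
double objective_spread(double (*f)(double), double p0, double h)
{
    assert(h > 0.0);
    double lo = f(p0), hi = lo;
    const double probes[2] = { p0 - h, p0 + h };
    for (int i = 0; i < 2; i++) {
        double v = f(probes[i]);
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    return hi - lo;
}

/* Hypothetical objective that ignores its parameter, which is the
 * failure mode suspected in the question. */
double flat_objective(double p) { (void)p; return 3.0; }

/* Hypothetical well-behaved objective with its minimum at p = 2. */
double bowl_objective(double p) { return (p - 2.0) * (p - 2.0); }
```

objective_spread(flat_objective, 0.5, 0.1) is 0, which reproduces the symptom of an estimate stuck at its start value, whereas objective_spread(bowl_objective, 0.5, 0.1) is clearly positive.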
If it does yield different values but still does not converge, then make sure you know how to call the optimizer correctly. Yeah, right, like nobody has ever made this mistake before. This is actually a very common cause of failure of optimizers.
If it does yield good values that vary, and you ARE calling the optimizer correctly, then think if there are regions into which the optimizer is trying to diverge that yield garbage results. Is the objective generating complex or imaginary results?

iOS -- implementing complex numbers

As a follow-up to this question:
I was in the process of implementing a calculator app using Apple's complex number support when I noticed that if one calculates using that support, one ends up with the following:
(1+i)^2=1.2246063538223773e-16 + 2i
Of course the correct identity is (1+i)^2=2i. This is a specific example of a more general phenomenon -- roundoff errors can be really annoying if they round a part that is supposed to be zero to something that is slightly nonzero.
Suggestions on how to deal with this? I could implement integer powers of complex numbers in other ways, but the general problem will remain, and my solution could itself cause other inconsistencies.
As you note, this is a standard rounding-error issue with floating point. As @Howard notes, you should likely round your double results back into the float range before displaying.
I typically use FLT_EPSILON to help me with these kinds of things as well.
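#include <float.h> // FLT_EPSILON
#include <math.h> // fabs
#define fequal(a,b) (fabs((a) - (b)) < FLT_EPSILON)
#define fequalzero(a) (fabs(a) < FLT_EPSILON)
With those, you might like a function like this (untested):
static inline double froundzero(double a) { return fequalzero(a) ? 0.0 : a; } // returns a, or 0 if a is within FLT_EPSILON of zero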
The complex version is left as an exercise for the reader as they say :D
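For what it's worth, here is one possible take on the complex version, assuming C99 `double complex` from <complex.h>. The names `snap_zero` and `csnapzero` are invented here, and whether FLT_EPSILON is the right threshold depends on the magnitudes your calculator works with.

```c
#include <complex.h>
#include <float.h>

/* Returns x, or 0.0 when |x| is below FLT_EPSILON. */
static double snap_zero(double x)
{
    return (x < FLT_EPSILON && x > -FLT_EPSILON) ? 0.0 : x;
}

/* Applies snap_zero to the real and imaginary parts separately, so a
 * tiny residue in one part is removed without touching the other. */
double complex csnapzero(double complex z)
{
    return snap_zero(creal(z)) + snap_zero(cimag(z)) * I;
}
```

Applied to the result from the question, 1.2246063538223773e-16 + 2i, this returns exactly 2i.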