Matlab nchoosek gives different answers using int64 and sym

This is a question about the function nchoosek in Matlab.
I want to find nchoosek(54,25), which is the same as 54C25. Since the answer is about 10^15, I originally used int64. However, the answer is wrong compared with the symbolic one.
Input:
nchoosek(int64(54),int64(25))
nchoosek(sym(54),sym(25))
Output:
1683191473897753
1683191473897752
You can see that they differ by one. This is not really an urgent problem since I now use sym. However can someone tell me why this happens?
EDIT:
I am using R2013a.
I took a look at nchoosek.m and found that if the inputs are int64, the code can be simplified to
function c = nchoosek2(v,k)
n = v; % rename v to be n. the algorithm is more readable this way.
classOut = 'int64';
nd = double(n);
kd = double(k);
nums = (nd-kd+1):nd;
dens = 1:kd;
nums = nums./dens; %%
c = round(prod(nums));
c = cast(c,classOut);
end
However, the outcome of int64(prod(nums./dens)) is different from prod(sym(nums)./sym(dens)) for me. Is this the same for everyone?

I don't have this problem on R2014a:
Numeric
>> n = int64(54);
>> k = int64(25);
>> nchoosek(n,k)
ans =
1683191473897752 % class(ans) == int64
Symbolic
>> nn = sym(n);
>> kk = sym(k);
>> nchoosek(nn,kk)
ans =
1683191473897752 % class(ans) == sym
% N!/((N-K)! K!)
>> factorial(nn) / (factorial(nn-kk) * factorial(kk))
ans =
1683191473897752 % class(ans) == sym
If you check the source code of the function (edit nchoosek.m), you'll see that it specifically handles the case of 64-bit integers using a separate algorithm. I won't reproduce the code here, but here are the highlights:
function c = nchoosek(v,k)
...
if int64type
% For 64-bit integers, use an algorithm that avoids
% converting to doubles
c = binCoef(n,k,classOut);
else
% Do the computation in doubles.
...
end
....
end
function c = binCoef(n,k,classOut)
% For integers, compute N!/((N-K)! K!) using prime factor cancellations
...
end

In R2013a this can be reproduced.
As @Amro shows, there is a special case in nchoosek for a classOut of int64 or uint64;
however, in R2013a it is only applied when the answer lies between
flintmax (with no argument) and
double(intmax(classOut)) + 2*eps(double(intmax(classOut)))
which for int64 gives 9007199254740992 and 9223372036854775808. The solution here does not lie between these values.
If the solution had fallen between these values, it would have been recalculated using the subfunction binCoef,
for which the help states: For integers, compute N!/((N-K)! K!) using prime factor cancellations
The binCoef function would have produced the right answer for the given int64 inputs.
In R2013a, with these inputs, binCoef is not called.
Instead the "default" Pascal's triangle method is used, in which:
Inputs are cast to double
The product of the vector ((n-k+1):n)./(1:k) is taken
This vector contains k double representations of fractions.
So what we have is almost certainly floating-point error.
What can be done?
I can see two options:
Make your own function based on the code in binCoef,
Modify nchoosek and remove && c >= flintmax from line 81
Removing this expression will force Matlab to use the more accurate integer-based calculation for int64 and uint64 inputs, for any values within their precision. This will be slightly slower but will avoid floating-point errors, which are rightfully unexpected when working with integer types.
Option one - should be fairly straightforward...
Option two - I recommend keeping an unchanged backup of the original function, or making a copy of the function with the modification and using that instead.
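As a rough illustration of option one, here is a minimal sketch of an integer-only computation. It does not reproduce binCoef (which cancels prime factors); it swaps in the simpler multiplicative identity C(m,i) = C(m-1,i-1)*m/i. It assumes every intermediate product fits in int64, which holds for nchoosek(54,25) but not for all inputs, and the variable names are only illustrative.

```matlab
% Integer-only binomial coefficient via C(m,i) = C(m-1,i-1) * m / i.
% Assumption: c * m never exceeds intmax('int64') (true for n = 54, k = 25).
% MATLAB integer arithmetic saturates instead of wrapping, so larger inputs
% could silently give a wrong result.
n = int64(54);
k = int64(25);
c = int64(1);
for i = int64(1):k
    m = n - k + i;            % running "top" value: n-k+1, ..., n
    c = idivide(c * m, i);    % exact, because c*m equals i * C(m,i)
end
disp(c)
```

Because every step stays in int64, no fraction is ever stored as a double, which is where the off-by-one result came from.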

Related

Make vector of elements less than each element of another vector

I have a vector, v, of N positive integers whose values I do not know ahead of time. I would like to construct another vector, a, where the values in this new vector are determined by the values in v according to the following rules:
- The elements in a are all integers up to and including the value of each element in v
- 0 entries are included only once, but positive integers appear twice in a row
For example, if v is [1,0,2] then a should be: [0,1,1,0,0,1,1,2,2].
Is there a way to do this without just doing a for-loop with lots of if statements?
I've written the code in loop format but would like a vectorized function to handle it.
The classical version of your problem is to create a vector a with the concatenation of 1:n(i) where n(i) is the ith entry in a vector b, e.g.
b = [1,4,2];
gives a vector a
a = [1,1,2,3,4,1,2];
This problem is solved using cumsum on a vector ones(1,sum(b)) but resetting the sum at the points 1+cumsum(b(1:end-1)) corresponding to where the next sequence starts.
To solve your specific problem, we can do something similar. As you need two entries per step, we use a vector 0.5 * ones(1,sum(b*2+1)) together with floor. As you also want the entry 0 to occur only once, we just have to start each sequence at 0.5 instead of at 0 (which would yield floor([0,0.5,...]) = [0,0,...]).
So in total we have something like
% construct the list of 0.5s
a = 0.5*ones(1,sum(b*2+1))
% Reset the sum where a new sequence should start
a(cumsum(b(1:end-1)*2+1)+1) =a(cumsum(b(1:end-1)*2+1)+1)*2 -(b(1:end-1)+1)
% Cumulate it and find the floor
a = floor(cumsum(a))
Note that all operations here are vectorised!
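To make the reset step concrete, here is a small worked run on the example from the question, v = [1,0,2]. It is only a sketch: the helper names len and starts are introduced here for readability, and the reset value is written as -v(1:end-1), which is what the expression above reduces to when every entry starts at 0.5.

```matlab
v = [1,0,2];
len = 2*v + 1;                       % each block 0,1,1,...,v(i),v(i) has 2*v(i)+1 entries
a = 0.5*ones(1, sum(len));           % increments of 0.5 everywhere
starts = cumsum(len(1:end-1)) + 1;   % first index of every block after the first
a(starts) = -v(1:end-1);             % previous block ends at v+0.5; drop back to 0.5
a = floor(cumsum(a));
disp(a)                              % 0 1 1 0 0 1 1 2 2
```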
Benchmark:
You can do a benchmark using the following code
function SO()
b =randi([0,100],[1,1000]);
t1 = timeit(@() Nicky(b));
t2 = timeit(@() Recursive(b));
t3 = timeit(@() oneliner(b));
if all(Nicky(b) == Recursive(b)) && all(Recursive(b) == oneliner(b))
disp("All methods give the same result")
else
disp("Something wrong!")
end
disp("Vectorised time: "+t1+"s")
disp("Recursive time: "+t2+"s")
disp("One-Liner time: "+t3+"s")
end
function [a] = Nicky(b)
a = 0.5*ones(1,sum(b*2+1));
a(cumsum(b(1:end-1)*2+1)+1) =a(cumsum(b(1:end-1)*2+1)+1)*2 -(b(1:end-1)+1);
a = floor(cumsum(a));
end
function out=Recursive(arr)
out=myfun(arr);
function local_out=myfun(arr)
if isscalar(arr)
if arr
local_out=sort([0,1:arr,1:arr]); % this is faster
else
local_out=0;
end
else
local_out=[myfun(arr(1:end-1)),myfun(arr(end))];
end
end
end
function b = oneliner(a)
b = cell2mat(arrayfun(@(x)sort([0,1:x,1:x]),a,'UniformOutput',false));
end
Which gives me
All methods give the same result
Vectorised time: 0.00083574s
Recursive time: 0.0074404s
One-Liner time: 0.0099933s
So the vectorised one is indeed the fastest, by a factor of approximately 10.
This can be done with a one-liner using eval:
a = eval(['[' sprintf('sort([0 1:%i 1:%i]) ',[v(:) v(:)]') ']']);
Here is another solution that does not use eval. Not sure what is intended by "vectorized function" but the following code is compact and can be easily made into a function:
a = [];
for i = 1:numel(v)
a = [a sort([0 1:v(i) 1:v(i)])];
end
"Is there a way to do this without just doing a for loop with lots of if statements?"
Sure. How about recursion? Of course, there is no guarantee that Matlab has tail call optimization.
For example, in a file named filename.m
function out=filename(arr)
out=myfun(arr);
function local_out=myfun(arr)
if isscalar(arr)
if arr
local_out=sort([0,1:arr,1:arr]); % this is faster
else
local_out=0;
end
else
local_out=[myfun(arr(1:end-1)),myfun(arr(end))];
end
end
end
in cmd, type
input=[1,0,2];
filename(input);
You can take off the parent function. I added it just hoping Matlab can spot the recursion within filename.m and optimize for it.
"would like a vectorized function to handle it."
Sure. Although I don't see the point of vectorizing in such a unique puzzle that is not generalizable to other applications. I also don't foresee a performance boost.
For example, assuming input is 1-by-N. In cmd, type
input=[1,0,2];
cell2mat(arrayfun(@(x)sort([0,1:x,1:x]),input,'UniformOutput',false))
Benchmark
In R2018a
>> clear all
>> in=randi([0,100],[1,100]); N=10000;
>> T=zeros(N,1);tic; for i=1:N; filename(in) ;T(i)=toc;end; mean(T),
ans =
1.5647
>> T=zeros(N,1);tic; for i=1:N; cell2mat(arrayfun(@(x)sort([0,1:x,1:x]),in,'UniformOutput',false)); T(i)=toc;end; mean(T),
ans =
3.8699
Of course, I tested with a few more inputs. The 'vectorized' method always takes about twice as long.
Conclusion: Recursion is faster.

Approximating an integral in MATLAB

I've been trying to implement the following integral in MATLAB
Given a number n, I wrote the code that returns an array with n elements, containing approximations of each integral.
First, I tried this using a 'for' loop and the recurrence relationship on the first line. But from the 20th integral and above the values are completely wrong (correct to 0 significant figures and wrong sign).
The same goes if I use the explicit formula on the second line and two 'for' loops.
As n grows larger, so does the error on the approximations.
So the main issue here is that I haven't found a way to minimize the error as much as possible.
Any ideas? Thanks in advance.
Here is an example of the code and the resulting values, using the second formula:
This integral, for positive values of n, cannot have values >1 or <0
First attempt:
I tried the iterative method and found something interesting: the approximation may not hold for all n. Keeping track of (n-1)*I(n-1) in each loop shows this:
I = zeros(20,3);
I(1,1) = 1-1/exp(1);
for ii = 2:20
I(ii,2) = ii-1;
I(ii,3) = (ii-1)*I(ii-1,1);
I(ii,1) = 1-I(ii,3);
end
There is some problem around n=18. In fact, I18 = 0.05719 and 18*I18 = 1.029 which is larger than 1. I don't think there is any numerical error or number overflow in this procedure.
Second attempt:
To make sure the maths is correct (I verified twice on paper) I used trapz to numerically evaluate the integral, and n=18 didn't cause any problem.
>> x = linspace(0,1,1+1e4);
>> f = @(n) exp(-1)*exp(x).*x.^(n-1);
>> f = @(n) exp(-1)*exp(x).*x.^(n-1)*1e-4;
>> trapz(f(5))
ans =
1.708934160520510e-01
>> trapz(f(17))
ans =
5.571936009790170e-02
>> trapz(f(18))
ans =
5.277113416899408e-02
>>
A closer look follows. I18 differs between the (stable) numerical method and the (unstable) iterative method: 0.05277 from trapz versus 0.05719 from the iteration. That difference is enough for 18*I18 to exceed 1.
I = zeros(20,3);
I(1,1) = 1-1/exp(1);
for ii = 2:20
I(ii,2) = ii-1;
I(ii,3) = (ii-1)*I(ii-1,1);
I(ii,1) = 1-I(ii,3);
end
J = zeros(20,3);
x = linspace(0,1,1+1e4);
f = @(n) exp(-1)*exp(x).*x.^(n-1)*1e-4;
J(1,1) = trapz(f(1));
for jj = 2:20
J(jj,1) = trapz(f(jj));
J(jj,2) = jj-1;
J(jj,3) = (jj-1)*J(jj-1,1);
end
I suspect there is an error in each iterative step due to the nature of numerical computations. If the iteration is long, the error propagates and, unfortunately in this case, amplifies rapidly. To verify this, I combined the above two methods into a hybrid algorithm. Most of the time the iterative way is used, and once in a while a numerical integral is evaluated from scratch without relying on previous iterations.
K = zeros(40,4);
K(1,1) = 1-1/exp(1);
for kk = 2:40
K(kk,2) = trapz(f(kk));
K(kk,3) = (kk-1)*K(kk-1,1);
K(kk,4) = 1-K(kk,3);
if mod(kk,5) == 0
K(kk,1) = K(kk,2);
else
K(kk,1) = K(kk,4);
end
end
If the iteration lasts more than 4 steps, error amplification becomes large enough to invert the sign, and an unrecoverable oscillation starts.
The code should explain all the data structures, but let me highlight the columns. The second column is the result of trapz, which is the numerical integral computed from the non-iterative definition of I(n). The third column is (n-1)*I(n-1) and should always be positive and less than 1. The fourth column is 1-(n-1)*I(n-1) and should always be positive. The first column is the choice I have made between the trapz result and the iterative result, to be the "true" value of I(n).
As can be seen here, in each iteration there is a small error compared to the independent numerical way. The error grows in the 3rd and 4th iterations and finally breaks things in the 5th. This is observed around n=25, when the numerical result is picked every 5th loop from the beginning.
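The growth described above can be made precise with a short derivation. It assumes the integral is the one used in the code, I_n = \int_0^1 e^{x-1} x^{n-1}\,dx with I_n = 1 - (n-1) I_{n-1}, and it ignores the fresh rounding error added in each step; \hat{I}_n and \varepsilon_n are notation introduced here for the computed value and its error.

```latex
\begin{align*}
\hat{I}_n &= I_n + \varepsilon_n, \qquad
\hat{I}_n = 1 - (n-1)\,\hat{I}_{n-1} \\
\Rightarrow\quad \varepsilon_n &= -(n-1)\,\varepsilon_{n-1}
\quad\Rightarrow\quad
|\varepsilon_n| = (n-1)!\,|\varepsilon_1| .
\end{align*}
% With |\varepsilon_1| of order 10^{-17} to 10^{-16} (double-precision rounding of 1 - 1/e)
% and 18! \approx 6.4 \times 10^{15}, the error near n = 19 is comparable to or larger
% than I_n itself, and its sign alternates from one step to the next.
```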
Conclusion: There is nothing wrong with any definition of this integral. However, the numerical error made when evaluating the expressions unfortunately accumulates, which limits the ways in which you can perform the computation.
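One common way around this kind of instability, not used in the answer above, is to run the same recurrence backwards: I(n-1) = (1 - I(n))/(n-1) divides the error by (n-1) in each step instead of multiplying it. The sketch below assumes the recurrence I(n) = 1 - (n-1)*I(n-1) from the code above; the starting guess of 0 and the 40 extra steps (nExtra) are arbitrary choices made for illustration.

```matlab
nMax   = 20;                 % number of integrals wanted
nExtra = 40;                 % extra steps so the crude starting guess is damped out
nStart = nMax + nExtra;
I = zeros(nStart, 1);
I(nStart) = 0;               % rough guess; its error shrinks at every backward step
for n = nStart:-1:2
    I(n-1) = (1 - I(n)) / (n-1);   % recurrence solved for I(n-1)
end
I = I(1:nMax);
disp(I.')
```

All values produced this way stay between 0 and 1, and I(1) agrees with 1-1/exp(1).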

Solve equation with exponential term

I have the equation $1 = \frac{(\pi r^2)^n}{n!} \, e^{-\pi r^2}$
I want to solve it using MATLAB. Is the following the correct code for doing this? The answer isn't clear to me.
n= 500;
A= 1000000;
d= n / A;
f= factorial( n );
solve (' 1 = ( d * pi * r^2 )^n / f . exp(- d * pi * r^2) ' , 'r')
The answer I get is:
Warning: The solutions are parametrized by the symbols:
k = Z_ intersect Dom::Interval([-(PI/2 -
Im(log(`fexp(-PI*d*r^2)`)/n)/2)/(PI*Re(1/n))], (PI/2 +
Im(log(`fexp(-PI*d*r^2)`)/n)/2)/(PI*Re(1/n)))
> In solve at 190
ans =
(fexp(-PI*d*r^2)^(1/n))^(1/2)/(pi^(1/2)*d^(1/2)*exp((pi*k*(2*i))/n)^(1/2))
-(fexp(-PI*d*r^2)^(1/n))^(1/2)/(pi^(1/2)*d^(1/2)*exp((pi*k*(2*i))/n)^(1/2))
You have several issues with your code.
1. First, you're evaluating some parts in floating-point. This isn't always bad as long as you know the solution will be exact. However, factorial(500) overflows to Inf. In fact, for factorial, anything bigger than 170 will overflow and any input bigger than 21 is potentially inexact because the result will be larger than flintmax. This calculation should be performed symbolically via sym/factorial:
n = sym(500);
f = factorial(n);
which returns an integer approximately equal to 1.22e1134 for f.
2. You're using a period ('.') to specify multiplication. In MuPAD, upon which most of the symbolic math functions are based, a period is shorthand for concatenation.
Additionally, as is stated in the R2015a documentation (and possibly earlier):
String inputs will be removed in a future release. Use syms to declare the variables instead, and pass them as a comma-separated list or vector.
If you had not used a string, I don't think that it would have been possible for your command to get misinterpreted and return such a confusing result. Here is how you could use solve with symbolic variables:
syms r;
n = sym(500);
A = sym(1000000);
d = n/A;
s = solve(1==(d*sym(pi)*r^2)^n/factorial(n)*exp(-d*sym(pi)*r^2),r)
which, after several minutes, returns a 1,000-by-1 vector of solutions, all of which are complex. As @BenVoigt suggests, you can try the 'Real' option for solve. However, in R2015a at least, the four solutions returned in terms of lambertw don't appear to actually be real.
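A quick numerical check, added here as a sketch rather than taken from the answer, suggests why no real solution shows up. Writing x = d*pi*r^2 (so x >= 0 for real r), the right-hand side is x^n*exp(-x)/n!, whose maximum over x >= 0 is at x = n. Working with logarithms via gammaln avoids the overflow of factorial(500):

```matlab
n = 500;
% log of the maximum of x^n * exp(-x) / n! over x >= 0, attained at x = n
logMax = n*log(n) - n - gammaln(n + 1);
fprintf('log of the maximum: %.4f\n', logMax);
% A negative value means the expression stays below 1 for every real r,
% so the equation "1 = ..." cannot have a real root.
```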
A couple things to note:
- MATLAB is not using the values of A, d, and f from your workspace.
- f . exp is not doing at all what you wanted, which was multiplication. It's instead becoming an unknown function fexp.
- Passing additional options of 'Real', true to solve gets rid of most of these extraneous conditions.
- You probably should avoid calling the version of solve which accepts a string, and use the Symbolic Toolbox instead (syms 'r').

Insanely huge numbers in symbolic variables after assigning double

I first define some differential equations:
%% Definitions
% Constants
syms L R J Ke p
% Input
syms ud uq m
% Output
syms id iq ome theta
% Derivations
syms did diq dome dtheta
%% Equations
did=(ud/L)-(R/L)*id+ome*iq;
diq=(uq/L)-(R/L)*iq-ome*id-(Ke/L)*ome;
dome = (p/J)*((3/2)*p*Ke*iq-m);
dtheta = ome;
I'm trying to calculate R and L now. The input and output variables come from simulink:
idvalues = DQ_OUT.signals.values(:,1);
iqvalues = DQ_OUT.signals.values(:,2);
udvalues = UIdq.signals.values(:,1);
uqvalues = UIdq.signals.values(:,2);
% ... define some position in these arrays ...
% Define values for symbolic variables
id=idvalues(position);
ud=udvalues(position);
iq=iqvalues(position);
ome=iqvalues(position);
These are double. I then eval the first equation:
eval(did)
And I get this crap:
ans =
6002386699416615/(18014398509481984*L) - (846927175344863*R)/(1125899906842624*L) + 4168268387464377/9007199254740992
I was thinking that a mathematics calculator like Matlab wouldn't bother you with variable types, but what I see here is definitely a variable type problem - the actual values are less than 1:
Specifically:
id = 0.7522
ud = 0.3332
iq = 0.6803
ome = 0.6803
When doing symbolic calculations, Matlab converts small decimals to rational numbers. This prevents floating-point numerical issues and keeps the results exact. However, as you found, it makes the results harder to read.
Matlab also has a vpa (variable precision arithmetic) function, which is capable of keeping up to 2^(29)+1 digits (apparently) in calculations, which means Matlab doesn't need to stick to rational numbers in order to keep results highly accurate.
Before viewing the output of a symbolic calculation, use vpa to convert rational numbers with large numerators/denominators to decimal expansions; in your case, use vpa(eval(did)).
For example, defining
syms a
b=0.75221
then a*b gives
>> a*b
ans =
(75221*a)/100000
but vpa(a*b) gives
>> vpa(a*b)
ans =
0.75221*a
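Applied to the first equation from the question, the same idea might look like the sketch below. It requires the Symbolic Math Toolbox, uses the four values quoted in the question, and substitutes them with subs instead of overwriting the symbolic variables and calling eval; the names vals, didNum and didShort and the choice of 4 digits are illustrative only.

```matlab
syms L R ud id iq ome
did = (ud/L) - (R/L)*id + ome*iq;          % first equation from the question
vals = [0.3332, 0.7522, 0.6803, 0.6803];   % ud, id, iq, ome as quoted above
didNum   = subs(did, [ud, id, iq, ome], vals);   % exact rationals in L and R
didShort = vpa(didNum, 4);                       % same expression, short decimals
disp(didShort)
```

The result still contains L and R as symbols; only the numeric coefficients are shown as decimals.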

Matlab dec2bin gives wrong values

I'm using Matlab's dec2bin to convert decimal number to binary string. However, I'm getting wrong results. For example:
>> dec2bin(13339262925365424727)
ans =
1011100100011110100101001111010011000111111100011011000000000000
I checked both in a C++ implementation and in wolfram alpha and the correct result is:
1011100100011110100101001111010011000111111100011011001001010111
Is there any problem with my usage of Matlab's dec2bin?
Thanks,
Gil.
Your code is equivalent to:
x=13339262925365424727;
dec2bin(x)
but if you check the value of x, you will notice that it exceeds double precision. The number is simply too large to be stored exactly in a 64-bit double. The precision at this magnitude is 2^11; check eps(x).
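The loss of precision can be checked directly with a couple of lines (a small sketch; the variable name gap is chosen here for illustration):

```matlab
x = 13339262925365424727;   % parsed as a double, so it is already rounded
gap = eps(x);               % spacing between adjacent doubles near x
fprintf('spacing near x: %d\n', gap);
fprintf('x + 1 == x: %d\n', x + 1 == x);   % adding 1 is lost in the rounding
```

Since neighbouring doubles at this magnitude are 2048 apart, the low 11 bits of the original integer cannot survive the conversion, which matches the trailing zeros in the dec2bin output.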
To deal with large numbers, using vpa from the symbolic toolbox is a good option, is this available?
Here is a solution using vpa:
function l=ldec2bin(x)
if x>2^52
head=floor(x/2^52);
tail=x-head*2^52;
l=[ldec2bin(head),dec2bin(double(tail),52)];
else
l=dec2bin(double(x));
end
end
usage:
>> ldec2bin(vpa('13339262925365424727'))
ans =
1011100100011110100101001111010011000111111100011011001001010111
/Update:
I came across a much shorter implementation of dec2bin for symbolic variables:
>> sdec2bin=@(x)(feval(symengine,'int2text',x,2))
sdec2bin =
@(x)(feval(symengine,'int2text',x,2))
>> sdec2bin(sym('13339262925365424727'))
ans =
1011100100011110100101001111010011000111111100011011001001010111
The integer seems too long; maybe you should try the de2bi function:
http://www.mathworks.com/help/comm/ref/de2bi.html
Assuming that the input is less than intmax('uint64'), as in the example, here is a solution that doesn't require the Symbolic Math toolbox. This supports two input arguments, matching dec2bin, is vectorized, and should be much faster:
function s=int2bin(d,n)
%INT2BIN Convert nonnegative integer to a binary string
if isempty(d)
s = '';
return;
end
d = d(:);
if ~isinteger(d) || any(d < 0)
error('int2bin:InvalidIntegerInput',...
'First input must be a nonnegative integer class array.');
end
if nargin < 2
n = 1;
else
n = round(double(n));
end
m = double(nextpow2(max(d)));
s = [repmat('0',length(d),n-m) rem(bsxfun(@bitshift,d,1-m:0),2)+'0'];
If you don't mind a bit less performance and prefer a one-line anonymous function, try:
int2bin = @(d,n)char(rem(bsxfun(@bitshift,d(:),1-max(n,double(nextpow2(max(d(:))))):0),2)+'0');
or this one that uses bitand instead of bitshift:
int2bin = @(d,n)char(~~bsxfun(@bitand,d(:),2.^(max(n,nextpow2(max(d(:)))):-1:0))+'0');
All versions above assume that d is a nonnegative integer class variable, e.g., uint64(13339262925365424727), and that n is a nonnegative numeric scalar. You can find full-featured int2bin and bin2int functions on my GitHub.