How to fix illegal character error in mex files - matlab

I'm getting the following error messages when I try to compile my Fortran code in MATLAB:
>> mex points.f
Warning: MATLAB FORTRAN MEX Files are now defaulting to -largeArrayDims and 8 byte integers.
If you are building a FORTRAN S-Function, please recompile using the -compatibleArrayDims flag.
You can find more about adapting code to use 64-bit array dimensions at:
https://www.mathworks.com/help/matlab/matlab_external/upgrading-mex-files-to-use-64-bit-api.html.
Building with 'Intel Parallel Studio XE 2019 for Fortran with Microsoft Visual Studio 2017'.
Error using mex
C:\Users\Kinan\Desktop\Strathshare\Personal Folders\PhD\MATLABPERIDYNAMICS\points.f(44): error #5149: Illegal character in statement
label field [r]
real*8 dx, r
----^
C:\Users\Kinan\Desktop\Strathshare\Personal Folders\PhD\MATLABPERIDYNAMICS\points.f(45): error #5149: Illegal character in statement
label field [r]
real*8 coordx, coordy, coordz
----^
C:\Users\Kinan\Desktop\Strathshare\Personal Folders\PhD\MATLABPERIDYNAMICS\points.f(46): error #5149: Illegal character in statement
label field [r]
real*8 coord(totnode,3)
----^
The actual code is:
#include "fintrf.h"
C======================================================================
C points.f
C Computational function that creates a cube of equdistant points
C This is a MEX file for MATLAB.
C======================================================================
C Gateway routine
subroutine mexFunction(nlhs, plhs, nrhs, prhs)
C Declarations
implicit none
C mexFunction arguments:
mwPointer plhs(*), prhs(*)
integer nlhs, nrhs
C Function declarations:
mwPointer mxGetDoubles
mwPointer mxCreateDoubleMatrix
integer mxIsNumeric
mwPointer mxGetM, mxGetN
C Pointers to input/output mxArrays:
mwPointer x_ptr, y_ptr
C Array information:
mwPointer mrows, ncols
mwSize size
C Arguments for computational routine:
real*8 dx, r
real*8 coordx, coordy, coordz
real*8 coord(totnode,3)
real*8 ndivx, ndivy, ndivz
integer i, j, k
C Get the size of the input array.
mrows = mxGetM(prhs(1))
ncols = mxGetN(prhs(1))
size = mrows*ncols
MX_HAS_INTERLEAVED_COMPLEX
x_ptr = mxGetDoubles(prhs(1))
C Create matrix for the return argument.
plhs(1) = mxCreateDoubleMatrix(29791,3,0)
y_ptr = mxGetDoubles(plhs(1))
call points(coord,r,dx,ndivx,ndivy,ndivz)
C Load the data into y_ptr, which is the output to MATLAB.
call mxCopyReal8ToPtr(y_output,y_ptr,size)
return
end
C-----------------------------------------------------------------------
C Computational routine
subroutine points(coord,r,dx,ndivx,ndivy,ndivz)
C Arguments for computational routine:
real*8 dx, r, coordx, coordy, coordz
real*8 coord(totnode,3), ndivx, ndivy, ndivz
integer i, j, k
do i = 1,ndivx
do j = 1,ndivy
do k = 1,ndivz
coordx = -1.0d0 / 2.0d0 * r + (dx / 2.0d0) + (i - 1) * dx
coordy = -1.0d0 / 2.0d0 * r + (dx / 2.0d0) + (j - 1) * dx
coordz = -1.0d0 / 2.0d0 * r + (dx / 2.0d0) + (k - 1) * dx
nnum = nnum + 1
coord(nnum,1) = coordx
coord(nnum,2) = coordy
coord(nnum,3) = coordz
enddo
enddo
enddo
return
end
I have a few for loops I need to do this for, so a working template would help a lot.
Sorry, I tried to add more of the error message, but the site said I had too much code.

I managed to build the MEX file, but the code contains quite a few errors.
To suppress the warning about largeArrayDims, you can execute:
warning('Off', 'MATLAB:mex:FortranLargeArrayDimsWarn_link');
Note: your Fortran code uses mxGetDoubles, which belongs to the interleaved complex API (MX_HAS_INTERLEAVED_COMPLEX), so you need to add the -R2018a flag to the mex command.
I could not find a way to avoid the warning when using the -R2018a flag.
The MEX command line with the -R2018a flag:
mex -R2018a points.F
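I had to make several modifications to your code to get it to compile:
I added spaces to the beginning of the lines. In fixed-form Fortran (.f/.F files), columns 1-5 are the statement-label field and column 6 is the continuation column, so statements must start in column 7 or later; a letter such as the r of real*8 in column 1 is what triggers error #5149.
I removed MX_HAS_INTERLEAVED_COMPLEX (it is a preprocessor macro, not a Fortran statement).
I didn't know what to do with totnode, so I replaced it with the value 100.
I didn't know what to do with y_output, so I replaced it with coord.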
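To make the column rule concrete, here is a small, hypothetical C helper (it is not part of MEX or of any compiler) that checks only the label field of one fixed-form line: columns 1-5 may hold blanks or a numeric statement label, while comment lines (C, c, * or ! in column 1) and preprocessor lines such as #include are exempt. It is a simplified sketch that ignores tabs and vendor extensions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical checker for the label field of one fixed-form Fortran line.
   Fixed form: columns 1-5 = optional numeric label, column 6 = continuation
   mark, statements from column 7. Returns 1 if columns 1-5 are acceptable,
   0 if they contain something else (the situation behind error #5149).
   Simplification: tabs and vendor extensions are not handled. */
int fixed_form_label_ok(const char *line)
{
    assert(line != NULL);

    /* Comment lines and preprocessor directives are exempt from the rule. */
    if (line[0] == 'C' || line[0] == 'c' || line[0] == '*' ||
        line[0] == '!' || line[0] == '#')
        return 1;

    size_t len = strlen(line);
    for (size_t col = 0; col < 5 && col < len; col++) {
        unsigned char ch = (unsigned char)line[col];
        if (ch != ' ' && !isdigit(ch))
            return 0;   /* e.g. the 'r' of "real*8" starting in column 1 */
    }
    return 1;
}
```

A side effect of the same rule: a statement such as call points(...) that starts in column 1 is not reported at all, because the c in column 1 turns the whole line into a comment.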
Here is your modified code that passes compilation:
#include "fintrf.h"
C======================================================================
C     points.f
C     Computational function that creates a cube of equidistant points
C     This is a MEX file for MATLAB.
C======================================================================
C     Gateway routine
      subroutine mexFunction(nlhs, plhs, nrhs, prhs)
C     Declarations
      implicit none
C     mexFunction arguments:
      mwPointer plhs(*), prhs(*)
      integer nlhs, nrhs
C     Function declarations:
      mwPointer mxGetDoubles
      mwPointer mxCreateDoubleMatrix
      integer mxIsNumeric
      mwPointer mxGetM, mxGetN
C     Pointers to input/output mxArrays:
      mwPointer x_ptr, y_ptr
C     Array information:
      mwPointer mrows, ncols
      mwSize size
C     Arguments for computational routine:
      real*8 dx, r
      real*8 coordx, coordy, coordz
C     What is totnode???
C     real*8 coord(totnode,3)
      real*8 coord(100,3)
      real*8 ndivx, ndivy, ndivz
      integer i, j, k
C     Get the size of the input array.
      mrows = mxGetM(prhs(1))
      ncols = mxGetN(prhs(1))
      size = mrows*ncols
C     MX_HAS_INTERLEAVED_COMPLEX
      x_ptr = mxGetDoubles(prhs(1))
C     Create matrix for the return argument.
      plhs(1) = mxCreateDoubleMatrix(29791,3,0)
      y_ptr = mxGetDoubles(plhs(1))
      call points(coord,r,dx,ndivx,ndivy,ndivz)
C     Load the data into y_ptr, which is the output to MATLAB.
C     call mxCopyReal8ToPtr(y_output,y_ptr,size) What is y_output???
      call mxCopyReal8ToPtr(coord,y_ptr,size)
      return
      end
C-----------------------------------------------------------------------
C     Computational routine
      subroutine points(coord,r,dx,ndivx,ndivy,ndivz)
C     Arguments for computational routine:
      real*8 dx, r, coordx, coordy, coordz
C     What is totnode???
C     real*8 coord(totnode,3), ndivx, ndivy, ndivz
      real*8 coord(100,3), ndivx, ndivy, ndivz
      integer i, j, k
      do i = 1,ndivx
        do j = 1,ndivy
          do k = 1,ndivz
            coordx = -1.0d0 / 2.0d0 * r + (dx / 2.0d0) + (i - 1) * dx
            coordy = -1.0d0 / 2.0d0 * r + (dx / 2.0d0) + (j - 1) * dx
            coordz = -1.0d0 / 2.0d0 * r + (dx / 2.0d0) + (k - 1) * dx
            nnum = nnum + 1
            coord(nnum,1) = coordx
            coord(nnum,2) = coordy
            coord(nnum,3) = coordz
          enddo
        enddo
      enddo
      return
      end
I hope it helps you continue your development.
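Since 29791 = 31^3, the output size suggests (an assumption, not something the question states) that totnode should be ndivx*ndivy*ndivz with 31 divisions per side, and the point counter nnum must start at 0. The following C sketch of the same computation as the points subroutine makes both points explicit and writes the totnode x 3 result in column-major order (all x values, then all y values, then all z values), which is the layout MATLAB uses for an N-by-3 double matrix. The function name cube_points is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C version of the `points` routine: fills coord, a column-major
   totnode x 3 array (x column, then y column, then z column), with the same
   grid of points as the Fortran formula. Returns totnode. */
size_t cube_points(double *coord, double r, double dx,
                   int ndivx, int ndivy, int ndivz)
{
    size_t totnode = (size_t)ndivx * (size_t)ndivy * (size_t)ndivz;
    size_t nnum = 0;   /* the point counter must start at zero */

    for (int i = 1; i <= ndivx; i++)
        for (int j = 1; j <= ndivy; j++)
            for (int k = 1; k <= ndivz; k++) {
                /* same formula as the Fortran: -r/2 + dx/2 + (index-1)*dx */
                coord[nnum]               = -0.5 * r + dx / 2.0 + (i - 1) * dx;
                coord[nnum + totnode]     = -0.5 * r + dx / 2.0 + (j - 1) * dx;
                coord[nnum + 2 * totnode] = -0.5 * r + dx / 2.0 + (k - 1) * dx;
                nnum++;
            }

    assert(nnum == totnode);
    return totnode;
}
```

In a C MEX gateway, coord could be the pointer returned by mxGetDoubles(plhs[0]) for an output created with mxCreateDoubleMatrix(totnode, 3, mxREAL), so no separate copy step would be needed.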

Related

Speed in Matlab vs. Julia vs. Fortran

I am playing around with different languages to solve a simple value function iteration problem where I loop over a state-space grid. I am trying to understand the performance differences and how I could tweak each code. For posterity, I have posted full-length working examples for each language below. However, I believe that most of the tweaking is to be done in the while loop. I am a bit confused about what I am doing wrong in Fortran, as its speed seems subpar.
MATLAB (~2.7 s): I am avoiding a more efficient solution using the repmat function for now, to keep the codes comparable. The code seems to be automatically multithreaded onto 4 threads.
beta = 0.98;
sigma = 0.5;
R = 1/beta;
a_grid = linspace(0,100,1001);
tic
[V_mat, next_mat] = valfun(beta, sigma, R ,a_grid);
toc
where valfun() is:
function [V_mat, next_mat] = valfun(beta, sigma, R, a_grid)
zeta = 1-1/sigma;
len = length(a_grid);
V_mat = zeros(2,len);
next_mat = zeros(2,len);
u = zeros(2,len,len);
c = zeros(2,len,len);
for i = 1:len
c(1,:,i) = a_grid(i) - a_grid/R + 20.0;
c(2,:,i) = a_grid(i) - a_grid/R;
end
u = c.^zeta * zeta^(-1);
u(c<=0) = -1e8;
tol = 1e-4;
outeriter = 0;
diff = 1000.0;
while (diff>tol) %&& (outeriter<20000)
outeriter = outeriter + 1;
V_last = V_mat;
for i = 1:len
[V_mat(1,i), next_mat(1,i)] = max( u(1,:,i) + beta*V_last(2,:));
[V_mat(2,i), next_mat(2,i)] = max( u(2,:,i) + beta*V_last(1,:));
end
diff = max(abs(V_mat - V_last));
end
fprintf("\n Value Function converged in %i steps. \n", outeriter)
end
Julia (after compilation): ~5.4 s with 4 threads (9425469 allocations: 22.43 GiB), ~7.8 s with 1 thread (2912564 allocations: 22.29 GiB)
[EDIT: after adding correct broadcasting and @views, it's only 1.8-2.1 seconds now; see below!]
using LinearAlgebra, UnPack, BenchmarkTools
struct paramsnew
    β::Float64
    σ::Float64
    R::Float64
end
function valfun(params, a_grid)
    @unpack β,σ, R = params
    ζ = 1-1/σ
    len = length(a_grid)
    V_mat = zeros(2,len)
    next_mat = zeros(2,len)
    u = zeros(2,len,len)
    c = zeros(2,len,len)
    @inbounds for i in 1:len
        c[1,:,i] = @. a_grid[i] - a_grid/R .+ 20.0
        c[2,:,i] = @. a_grid[i] - a_grid/R
    end
    u = c.^ζ * ζ^(-1)
    u[c.<=0] .= typemin(Float64)
    tol = 1e-4
    outeriter = 0
    test = 1000.0
    while test>tol
        outeriter += 1
        V_last = deepcopy(V_mat)
        @inbounds Threads.@threads for i in 1:len # loop over grid points
            V_mat[1,i], next_mat[1,i] = findmax( u[1,:,i] .+ β*V_last[2,:])
            V_mat[2,i], next_mat[2,i] = findmax( u[2,:,i] .+ β*V_last[1,:])
        end
        test = maximum( abs.(V_mat - V_last)[.!isnan.( V_mat - V_last )])
    end
    print("\n Value Function converged in ", outeriter, " steps.")
    return V_mat, next_mat
end
a_grid = collect(0:0.1:100)
p1 = paramsnew(0.98, 1/2, 1/0.98);
@time valfun(p1,a_grid)
print("\n should be compiled now \n")
@btime valfun(p1,a_grid)
Fortran (O3, MKL, qopenmp) ~9.2 s: I must also be doing something wrong when declaring the OpenMP variables, as the program crashes for some grid sizes when using OpenMP (SIGSEGV error).
module mod_calc
use omp_lib
implicit none
integer, parameter :: dp = selected_real_kind(33,4931), len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do i=1, n
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
real(dp):: bbeta, sigma, R, zeta, tol, test
real(dp):: a_grid(len), V_mat(2,len), V_last(2,len), &
u(len,len,2), c(len,len,2)
integer :: outeriter, i, sss, next_mat(2,len), fu
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98
sigma = 0.5
R = 1.0/0.98
zeta = 1.0 - 1.0/sigma
tol = 1e-4
test = 1000.0
outeriter = 0
do i = 1,len
c(:,i,1) = a_grid(i) - a_grid/R + 20.0
c(:,i,2) = a_grid(i) - a_grid/R
end do
u = c**zeta * 1.0/zeta
where (c<=0)
u = -1e6
end where
V_mat = 0.0
next_mat = 0.0
do while (test>tol .and. outeriter<20000)
outeriter = outeriter+1
V_last = V_mat
!$OMP PARALLEL DEFAULT(NONE) &
!$OMP SHARED(V_mat, next_mat,V_last, u, bbeta) &
!$OMP PRIVATE(i)
!$OMP DO SCHEDULE(static)
do i=1,len
V_mat(1,i) = maxval(u(:,i,1) + bbeta*V_last(2,:))
next_mat(1,i) = maxloc(u(:,i,1) + bbeta*V_last(2,:),1)
V_mat(2,i) = maxval(u(:,i,2) + bbeta*V_last(1,:))
next_mat(2,i) = maxloc(u(:,i,2) + bbeta*V_last(1,:),1)
end do
!$OMP END DO
!$OMP END PARALLEL
test = maxval(abs(log(V_last/V_mat)))
end do
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
call omp_set_num_threads(4)
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
There should be ways to reduce the number of allocations (Julia reports 32-45% GC time!), but for now I am too much of a novice to see them, so any comments and tips are welcome.
Edit:
Adding @views and correct broadcasting to the while loop improved the Julia speed considerably (as expected, I guess), so it now beats the MATLAB loop. With 4 threads the code now takes only 1.97 s. Specifically:
@inbounds for i in 1:len
    c[1,:,i] = @views @. a_grid[i] - a_grid/R .+ 20.0
    c[2,:,i] = @views @. a_grid[i] - a_grid/R
end
u = @. c^ζ * ζ^(-1)
@. u[c<=0] = typemin(Float64)
while test>tol && outeriter<20000
    outeriter += 1
    V_last = deepcopy(V_mat)
    @inbounds Threads.@threads for i in 1:len # loop over grid points
        V_mat[1,i], next_mat[1,i] = @views findmax( @. u[1,:,i] + β*V_last[2,:])
        V_mat[2,i], next_mat[2,i] = @views findmax( @. u[2,:,i] + β*V_last[1,:])
    end
    test = @views maximum( @. abs(V_mat - V_last)[!isnan( V_mat - V_last )])
end
The reason the Fortran is so slow is that it uses quadruple precision (selected_real_kind(33,4931)). I don't know Julia or MATLAB, but it looks as though double precision is being used there. Further, as noted in the comments, some of the loop/array orders are wrong for Fortran's column-major storage, and you are not consistent in your use of precision in the Fortran code: most of your constants are single precision. Correcting all of these leads to the following:
Original:  test = 9.83440674663232047922921588613472439E-0005  Time = 31.413 seconds.
Optimised: test = 9.8343643237979391E-005  Time = 0.912 seconds.
Note that I have turned off parallelisation for these; all results are single-threaded. The code is below:
module mod_calc
!!$ use omp_lib
implicit none
!!$ integer, parameter :: dp = selected_real_kind(33,4931), len = 1001
integer, parameter :: dp = selected_real_kind(15), len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do i=1, n
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
real(dp):: bbeta, sigma, R, zeta, tol, test
real(dp):: a_grid(len), V_mat(len,2), V_last(len,2), &
u(len,len,2), c(len,len,2)
integer :: outeriter, i, sss, next_mat(len,2), fu  ! (len,2) to match next_mat(i,1) below
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98_dp
sigma = 0.5_dp
R = 1.0_dp/0.98_dp
zeta = 1.0_dp - 1.0_dp/sigma
tol = 1e-4_dp
test = 1000.0_dp
outeriter = 0
do i = 1,len
c(:,i,1) = a_grid(i) - a_grid/R + 20.0_dp
c(:,i,2) = a_grid(i) - a_grid/R
end do
u = c**zeta * 1.0_dp/zeta
where (c<=0)
u = -1e6_dp
end where
V_mat = 0.0_dp
next_mat = 0.0_dp
do while (test>tol .and. outeriter<20000)
outeriter = outeriter+1
V_last = V_mat
!$OMP PARALLEL DEFAULT(NONE) &
!$OMP SHARED(V_mat, next_mat,V_last, u, bbeta) &
!$OMP PRIVATE(i)
!$OMP DO SCHEDULE(static)
do i=1,len
V_mat(i,1) = maxval(u(:,i,1) + bbeta*V_last(:, 2))
next_mat(i,1) = maxloc(u(:,i,1) + bbeta*V_last(:, 2),1)
V_mat(i,2) = maxval(u(:,i,2) + bbeta*V_last(:, 1))
next_mat(i,2) = maxloc(u(:,i,2) + bbeta*V_last(:, 1),1)
end do
!$OMP END DO
!$OMP END PARALLEL
test = maxval(abs(log(V_last/V_mat)))
end do
Write( *, * ) test
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
!!$ call omp_set_num_threads(2)
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
Compilation / linking:
ian@eris:~/work/stack$ gfortran --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ian@eris:~/work/stack$ gfortran -Wall -Wextra -O3 jul.f90
jul.f90:36:48:
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
1
Warning: Unused parameter ‘file_name’ declared at (1) [-Wunused-parameter]
jul.f90:35:57:
integer :: outeriter, i, sss, next_mat(2,len), fu
1
Warning: Unused variable ‘fu’ declared at (1) [-Wunused-variable]
jul.f90:35:36:
integer :: outeriter, i, sss, next_mat(2,len), fu
1
Warning: Unused variable ‘sss’ declared at (1) [-Wunused-variable]
Running:
ian@eris:~/work/stack$ ./a.out
9.8343643237979391E-005
Time = 0.908 seconds.
What @Ian Bush says in his answer about the quadruple precision is correct. Moreover:
You will likely not need OpenMP for the kind of parallelization in your code. Fortran's do concurrent construct tells the compiler that the loop iterations are independent, so the compiler can parallelize the loop automatically when the code is compiled with its auto-parallelization flag (for example /Qparallel with Intel Fortran).
Also, the where/elsewhere construct can be slow, as it often requires creating a temporary logical mask array and then applying it in a loop. You can use do concurrent in place of where, both to avoid the temporary array and to allow the computation to be parallelized on multiple cores.
Also, when comparing 64-bit real numbers, make sure both values have the same type and kind (e.g. c(i,j,k) <= 0._dp) to avoid an implicit type/kind conversion before the comparison is made.
Also, a_grid(i) - a_grid/R is computed twice when filling the c array; the second line can reuse the first result instead (c(:,i,1) = c(:,i,2) + 20.0_dp).
Here is the modified, optimized parallel Fortran code without any OpenMP:
module mod_calc
use iso_fortran_env, only: dp => real64
implicit none
integer, parameter :: len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do concurrent(i=1:n)
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
implicit none
real(dp) :: bbeta, sigma, R, zeta, tol, test
real(dp) :: a_grid(len), V_mat(len,2), V_last(len,2), u(len,len,2), c(len,len,2)
integer :: outeriter, i, j, k, sss, next_mat(len,2), fu  ! (len,2) to match next_mat(i,1) below
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98_dp
sigma = 0.5_dp
R = 1.0_dp/0.98_dp
zeta = 1.0_dp - 1.0_dp/sigma
tol = 1e-4_dp
test = 1000.0_dp
outeriter = 0
do concurrent(i=1:len)
c(1:len,i,2) = a_grid(i) - a_grid/R
c(1:len,i,1) = c(1:len,i,2) + 20.0_dp
end do
u = c**zeta * 1.0_dp/zeta
do concurrent(i=1:len, j=1:len, k=1:2)
if (c(i,j,k)<=0._dp) u(i,j,k) = -1e6_dp
end do
V_mat = 0.0_dp
next_mat = 0.0_dp
do while (test>tol .and. outeriter<20000)
outeriter = outeriter + 1
V_last = V_mat
do concurrent(i=1:len)
V_mat(i,1) = maxval(u(:,i,1) + bbeta*V_last(:, 2))
next_mat(i,1) = maxloc(u(:,i,1) + bbeta*V_last(:, 2),1)
V_mat(i,2) = maxval(u(:,i,2) + bbeta*V_last(:, 1))
next_mat(i,2) = maxloc(u(:,i,2) + bbeta*V_last(:, 1),1)
end do
test = maxval(abs(log(V_last/V_mat)))
end do
Write( *, * ) test
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
Compiling your original code with the Intel Fortran compiler flags /standard-semantics /F0x1000000000 /O3 /Qip /Qipo /Qunroll /Qunroll-aggressive /inline:all /Ob2 /Qparallel yields the following timing:
original.exe
Time = 37.284 seconds.
Compiling and running the do concurrent version above (on at most 4 cores, if it is parallelized at all) yields:
concurrent.exe
Time = 0.149 seconds.
For comparison, here is MATLAB's timing:
Value Function converged in 362 steps.
Elapsed time is 3.575691 seconds.
One last tip: several vectorized array computations and loops in the code above can still be merged to further improve the speed of your Fortran code. For example, the lines
u = c**zeta * 1.0_dp/zeta
do concurrent(i=1:len, j=1:len, k=1:2)
if (c(i,j,k)<=0._dp) u(i,j,k) = -1e6_dp
end do
can all be merged into the do concurrent loop that precedes them:
do concurrent(i=1:len)
c(1:len,i,2) = a_grid(i) - a_grid/R
c(1:len,i,1) = c(1:len,i,2) + 20.0_dp
end do
If you do so, you can define an auxiliary variable inverse_zeta = 1.0_dp / zeta and use it when computing u inside the loop instead of * 1.0_dp / zeta. This replaces a per-element division (which is more costly than a multiplication) with a multiplication, without hurting the readability of the code.

How to convert a recursive function to mex code?

I have a recursive function choose in MATLAB code as follows:
function nk=choose(n, k)
if (k == 0)
nk=1;
else
nk=(n * choose(n - 1, k - 1)) / k;
end
end
The code computes the binomial coefficient (the number of combinations of n items taken k at a time). I want to speed it up using MEX. I tried to write the MEX code as:
double choose(double* n, double* k)
{
if (k==0)
return 1;
else
return (n * choose(n - 1, k - 1)) / k;
}
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *n, *k, *nk;
int mrows, ncols;
plhs[0] = mxCreateDoubleMatrix(1,1, mxREAL);
/* Assign pointers to each input and output. */
n = mxGetPr(prhs[0]);
k = mxGetPr(prhs[1]);
nk = mxGetPr(plhs[0]);
/* Call the recursive function. */
nk=choose(n,k);
}
However, it does not work. Could you help me modify the MEX code so that it implements the MATLAB code above? Thanks
The following code fixes your C MEX implementation.
The problem is not the recursion, of course.
Your code passes pointers where it should pass values: choose() must take double values, and the result must be written through the output pointer (*nk = ...) instead of reassigning nk (in C it's important to use pointers only in the right places).
You can also use MATLAB's built-in function nchoosek.
See: http://www.mathworks.com/help/matlab/ref/nchoosek.html
The following code works:
//choose.c
#include "mex.h"
double choose(double n, double k)
{
if (k==0)
{
return 1;
}
else
{
return (n * choose(n - 1, k - 1)) / k;
}
}
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *n, *k, *nk;
int mrows, ncols;
plhs[0] = mxCreateDoubleMatrix(1,1, mxREAL);
/* Assign pointers to each input and output. */
n = mxGetPr(prhs[0]);
k = mxGetPr(prhs[1]);
nk = mxGetPr(plhs[0]);
/* Call the recursive function. */
//nk=choose(n,k);
*nk = choose(*n, *k);
}
Compile it within Matlab:
mex choose.c
Execute:
choose(10,5)
ans =
252
It is an inefficient implementation, though.
I am helping fix your implementation so that it can serve as an "inefficient example".
Measuring the execution time of rahnema1's implementation:
tic;n = 1000000;k = 500000;nk = prod((k+1:n) .* prod((1:n-k).^ (-1/(n-k))));toc
Elapsed time is 0.022855 seconds.
Measure execution of choose.mexw64 implementation:
tic;n = 1000000;k = 500000;nk = choose(1000000, 500000);toc
Elapsed time is 0.007952 seconds.
(It took a little less time than prod((k+1:n) .* prod((1:n-k).^ (-1/(n-k))))).
Measuring the MATLAB recursion gives an error (even for n = 700 and k = 500):
tic;n = 700;k = 500;nk = RecursiveFunctionTest(n, k);toc
Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N) to change the limit. Be aware that exceeding your available stack space can
crash MATLAB and/or your computer.
tic;n = 700;k = 400;nk = RecursiveFunctionTest(n, k);toc
Elapsed time is 0.005635 seconds.
Very inefficient...
Measuring MATLAB's built-in function nchoosek:
tic;nchoosek(1000000, 500000);toc
Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits
In nchoosek at 92
Elapsed time is 0.005081 seconds.
Conclusion:
You need to implement the C MEX function without recursion, and measure again.
Measurement without recursion:
static double factorial(double number)
{
int x;
double fac = 1;
if (number == 0)
{
return 1.0;
}
for (x = 2; x <= (int)number; x++)
{
fac = fac * x;
}
return fac;
}
double choose(double n, double k)
{
if (k == 0)
{
return 1.0;
}
else
{
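//n!/((n-k)! k!)
// Caution: factorial() overflows to Inf for arguments above 170,
// so for large n this returns Inf/Inf = NaN.
return factorial(n)/(factorial(n-k)*factorial(k));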
}
}
tic;choose(1000000, 500000);toc
Elapsed time is 0.003079 seconds.
Faster...
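For reference, a non-recursive choose() that avoids factorials altogether is sketched below (choose_iter is a hypothetical name; it would replace choose() inside the MEX file while mexFunction stays the same). It uses the multiplicative formula C(n,k) = prod_{i=1..k} (n-k+i)/i with k replaced by min(k, n-k). After step i the running value is C(n-k+i, i), an integer, so the result is exact as long as the intermediate products stay below 2^53; for huge results such as choose(1000000, 500000) it overflows to Inf rather than producing NaN.

```c
#include <assert.h>

/* Non-recursive binomial coefficient C(n,k) via the multiplicative formula.
   After step i, result equals C(n-k+i, i), which is an integer, so the value
   stays exact while the intermediate products are below 2^53. Very large
   results overflow to +Inf (never Inf/Inf = NaN). */
double choose_iter(double n, double k)
{
    assert(k >= 0.0 && k <= n);
    if (k > n - k)
        k = n - k;   /* symmetry C(n,k) = C(n,n-k) keeps the loop short */
    double result = 1.0;
    for (double i = 1.0; i <= k; i += 1.0)
        result = result * (n - k + i) / i;
    return result;
}
```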
No need for MEX: binomial coefficients can be implemented in MATLAB:
function nk=nchoosek2(n, k)
if n-k > k
nk = prod((k+1:n) .* prod((1:n-k).^ (-1/(n-k))));
else
nk = prod((n-k+1:n) .* prod((1:k).^ (-1/k)) ) ;
end
end

Trying to return array from Fortran to Matlab with mex, getting empty array instead

I'm trying to return an array of the numbers 1 to n.
#include "fintrf.h"
C Gateway routine
subroutine mexFunction(nlhs, plhs, nrhs, prhs)
C Declarations
implicit none
C mexFunction arguments:
mwPointer plhs(*), prhs(*)
integer nlhs, nrhs
mwPointer mxGetPr
mwPointer mxCreateDoubleMatrix
mwPointer mxGetM, mxGetN
mwPointer mrows, ncols
mwSize size
mwPointer x_ptr, y_ptr
integer x_input,i
real*8, allocatable :: vec(:)
x_ptr = mxGetPr(prhs(1))
mrows = mxGetM(prhs(1))
ncols = mxGetN(prhs(1))
size = mrows*ncols
x_ptr=mxGetPr(prhs(1))
call mxCopyPtrToReal8(x_ptr,x_input,size)
allocate (vec(x_input))
do i=1,x_input
vec(i)=i
end do
plhs(1) = mxCreateDoubleMatrix(1, x_input, 0)
y_ptr = mxGetPr(plhs(1))
call mxCopyReal8ToPtr(vec,y_ptr,x_input)
deallocate ( vec )
return
end
I then build the MEX file and call it from MATLAB:
mex testingvec.F
Building with 'gfortran'.
MEX completed successfully.
a=testingvec(10);
and find that the result is empty:
a=[]
Can someone help me with this? Example code showing how to return a matrix as well would be great.
Thanks guys.
Edit: a new version of the code. I'm still looking for help.
#include "fintrf.h"
C Gateway routine
subroutine mexFunction(nlhs, plhs, nrhs, prhs)
C Declarations
implicit none
C mexFunction arguments:
mwPointer plhs(*), prhs(*)
integer nlhs, nrhs
mwPointer mxGetPr
mwPointer mxCreateDoubleMatrix
mwPointer mxGetM, mxGetN
mwPointer mrows, ncols
mwSize size
mwPointer x_ptr, y_ptr
integer i
mwSize sizeone, x_input
integer*4 izero
real*8, allocatable :: vec(:)
x_ptr = mxGetPr(prhs(1))
mrows = mxGetM(prhs(1))
ncols = mxGetN(prhs(1))
size = mrows*ncols
sizeone=1
izero=0
x_ptr=mxGetPr(prhs(1))
call mxCopyPtrToReal8(x_ptr,x_input,size)
allocate (vec(x_input))
do i=1,x_input
vec(i)=i
end do
plhs(1) = mxCreateDoubleMatrix(sizeone,x_input,izero)
call mxCopyReal8ToPtr(vec,mxGetPr(plhs(1)),x_input)
deallocate ( vec )
return
end
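There were both declaration problems and issues with the calls to MEX functions. In particular, mxCopyPtrToReal8 copies real*8 (double) values, so its destination must be a real*8 variable rather than an integer. Here's a solution, which assumes that the input is an integer-valued double giving the length of the output vector (assuming that this is what you wanted):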
#include "fintrf.h"
C Gateway routine
      subroutine mexFunction(nlhs, plhs, nrhs, prhs)
C Declarations
      implicit none
C mexFunction arguments:
      mwPointer plhs(*), prhs(*)
      integer*4 nlhs, nrhs
      mwPointer mxGetPr
      mwPointer mxCreateDoubleMatrix
      mwSize mxGetM, mxGetN
      mwSignedIndex mrows, ncols
      mwSize size, x_input, sizeone
      mwPointer x_ptr, y_ptr
      integer*4 i, izero, x_int
      real*8, allocatable :: vec(:)
      real*8 :: x_dbl
      sizeone = 1
      izero = 0
!check input/output syntax
      if (nrhs /= 1) then
         call mexErrMsgIdAndTxt("MATLAB:testingvec:rhs",
     >        "Exactly 1 input variable required.")
      end if
      if (nlhs /= 1) then
         call mexErrMsgIdAndTxt("MATLAB:testingvec:lhs",
     >        "Exactly 1 output matrix required.")
      end if
      x_ptr = mxGetPr(prhs(1))
      mrows = mxGetM(prhs(1))
      ncols = mxGetN(prhs(1))
      size = mrows*ncols
      call mxCopyPtrToReal8(x_ptr,x_dbl,sizeone)
      x_input = int(x_dbl)
      allocate (vec(x_input))
      do i=1,x_input
         vec(i)=i
      end do
      plhs(1) = mxCreateDoubleMatrix(sizeone, x_input, izero)
      y_ptr = mxGetPr(plhs(1))
      call mxCopyReal8ToPtr(vec,y_ptr,x_input)
      deallocate ( vec )
      return
      end
I introduced a check for the number of input/output arguments (adapt it in your actual program). I also introduced an auxiliary real*8 variable x_dbl: the input arrives as a double, so this version copies it into x_dbl and then truncates it with int() to get x_input.
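The difference between the question's code and the answer can be illustrated outside MATLAB with the C sketch below (helper names are hypothetical; IEEE 754 doubles are assumed). Copying the 8 raw bytes of a double into an integer reinterprets the bit pattern instead of converting the value. For 10.0 the low 4 bytes are all zero, so on a little-endian machine a 4-byte integer x_input would end up as 0, which is consistent with the empty result. The last function shows the column-major layout to use when returning a matrix: element (row r, column c) of an m-by-n matrix sits at 0-based offset r + c*m.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw 8 bytes of a double into an integer: the bit pattern is
   reinterpreted, not converted (what copying a real*8 into an integer does). */
uint64_t reinterpret_double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Read the value as a double, then truncate toward zero, like int(x_dbl). */
size_t double_to_length(double x)
{
    assert(x >= 0.0);
    return (size_t)x;
}

/* Fill an m-by-n matrix stored column-major (MATLAB's layout) with 1..m*n,
   so element (row r, column c), both 0-based, sits at offset r + c*m. */
void fill_column_major(double *a, size_t m, size_t n)
{
    for (size_t c = 0; c < n; c++)
        for (size_t r = 0; r < m; r++)
            a[r + c * m] = (double)(r + c * m + 1);
}
```

Filling a 2-by-3 buffer this way and copying it into an array created with mxCreateDoubleMatrix(2, 3, 0) would give the same matrix as reshape(1:6, 2, 3) in MATLAB.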

how to create a single float sparse matrix in mex files

The question "Creating sparse matrix in MEX" has a good example of mxCreateSparse. But this function returns a double sparse matrix, not a single one. If I want to return a single sparse matrix, what should I do? Thanks!
As @horchler suggested, you could use the undocumented function mxCreateSparseNumericMatrix. Example:
singlesparse.c
#include "mex.h"
#include <string.h> /* memcpy */
/* undocumented function prototype */
EXTERN_C mxArray *mxCreateSparseNumericMatrix(mwSize m, mwSize n,
mwSize nzmax, mxClassID classid, mxComplexity ComplexFlag);
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
const float pr[] = {1.0, 7.0, 5.0, 3.0, 4.0, 2.0, 6.0};
const mwIndex ir[] = {0, 2, 4, 2, 3, 0, 4};
const mwIndex jc[] = {0, 3, 5, 5, 7};
const mwSize nzmax = 10;
const mwSize m = 5;
const mwSize n = 4;
plhs[0] = mxCreateSparseNumericMatrix(m, n, nzmax, mxSINGLE_CLASS, mxREAL);
memcpy((void*)mxGetPr(plhs[0]), (const void*)pr, sizeof(pr));
memcpy((void*)mxGetIr(plhs[0]), (const void*)ir, sizeof(ir));
memcpy((void*)mxGetJc(plhs[0]), (const void*)jc, sizeof(jc));
}
Usage:
>> mex -largeArrayDims singlesparse.c
>> s = singlesparse()
s =
(1,1) 1
(3,1) 7
(5,1) 5
(3,2) 3
(4,2) 4
(1,4) 2
(5,4) 6
>> ss = double(s);
>> whos s ss
  Name      Size            Bytes  Class     Attributes
  s         5x4               160  single    sparse
  ss        5x4               152  double    sparse
>> f = full(s)
One or more output arguments not assigned during call to "full".
>> f = full(ss)
f =
1 0 0 2
0 0 0 0
7 3 0 0
0 4 0 0
5 0 0 6
>> s + s;
Undefined function 'plus' for input arguments of type 'single' and attributes 'sparse 2d real'.
>> ss + ss;
>> 2 * s;
Error using *
Undefined function 'times' for input arguments of type 'single' and attributes 'sparse 2d real'.
>> 2 * ss;
>> s * s';
Error using *
MTIMES is not supported for one sparse input and one single input.
>> ss * ss';
>> nnz(s)
ans =
7
>> nzmax(s)
ans =
10
>> dmperm(s)
Undefined function 'dmperm' for input arguments of type 'single'.
>> dmperm(ss)
ans =
1 3 0 5
>> svds(s)
Error using horzcat
The following error occurred converting from double to single:
Error using single
Attempt to convert to unimplemented sparse type
Error in svds (line 64)
B = [sparse(m,m) A; A' sparse(n,n)];
>> svds(ss)
ans =
9.9249
5.5807
3.2176
0.0000
>> % abs(s), cos(s), sin(s), s.^2, s.*s, etc.. all give errors
As you can see, the sparse single array was created successfully; however, many functions expect the array to be of type double, so a lot of functionality is missing.
Another restriction is that you cannot create multi-dimensional sparse arrays in MATLAB; they have to be 2D matrices.
Bottom line is: stick with double sparse 2D matrices in MATLAB!
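To see what the pr, ir and jc arrays in singlesparse.c encode, here is a C sketch (csc_to_dense is a hypothetical name) that expands MATLAB's compressed sparse column layout into a dense column-major array. Column c owns entries jc[c] to jc[c+1]-1 of pr and ir, and jc[n] is the number of stored nonzeros (7 here), while nzmax (10) is only the allocated capacity. size_t stands in for mwIndex, which is what mwIndex is under -largeArrayDims; the float values mirror mxSINGLE_CLASS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand an m-by-n matrix in compressed sparse column (CSC) form, MATLAB's
   sparse layout, into a dense column-major array.
     pr: nonzero values (float here, mirroring mxSINGLE_CLASS)
     ir: 0-based row index of each stored value
     jc: n+1 column starts; column c owns entries jc[c] .. jc[c+1]-1,
         and jc[n] is the number of stored nonzeros. */
void csc_to_dense(float *dense, const float *pr, const size_t *ir,
                  const size_t *jc, size_t m, size_t n)
{
    assert(jc[0] == 0);
    memset(dense, 0, m * n * sizeof *dense);   /* all-zero bits == 0.0f */
    for (size_t c = 0; c < n; c++)
        for (size_t p = jc[c]; p < jc[c + 1]; p++)
            dense[ir[p] + c * m] = pr[p];
}
```

Applied to the arrays from singlesparse.c with m = 5 and n = 4, it reproduces the full(ss) matrix shown in the session output.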

Symbolic function input non-symbolic parameters

I need to define a function that returns a symbolic matrix (sym). It takes 4 input parameters: 2 symbolic matrices and 2 integers. How do I do that?
This is what I was trying to do:
%my function
function F = matrix(F, F4, i, j)
...
F=...;
end
%calling it in a different file
syms M1;
M1 = ...;
syms M2;
M2 = ...;
syms M3;
M3 = matrix(M1,M2,1,2);
I have done this simple test:
function L = test(A,B,c,d)
syms tmp1 tmp2
tmp1 = c;
tmp2 = d;
L = tmp1*A + tmp2*B;
end
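where A and B are already symbolic matrices. Multiplying symbolic matrices by the numeric inputs c and d yields a symbolic result, so the integer arguments need no special declaration.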