I'm new to parallel pools in MATLAB (I'm using R2019a) and to coding in general. The code I'm sharing was available on the net; I made a few modifications to it for my requirements.
Problem statement: I have a non-linear system (the Rossler equations) and I have to plot its bifurcation diagram. I first tried an ordinary for loop, but the computation took too long and my computer hung several times, so I was advised to run the code on a parallel pool. I tried to learn how to do this from online resources, but I still can't resolve my issues: there are two parfor loops in my code, and I'm having problems with indexing and with the assignment of the global parameter. (Please note: the code below is written for normal execution, without a parallel pool.)
My code is below; please excuse me if it is a lot of lines.
clc;
a = 0.2; b = 0.2; global c;
crange = 1:0.05:90; % Range for parameter c
k = 0; tspan = 0:0.1:500; % Time interval for solving Rossler system
xmax = []; % A matrix for storing the sorted value of x1
for c = crange
f = @(t,x) [-x(2)-x(3); x(1)+a*x(2); b+x(3)*(x(1)-c)];
x0 = [1 1 0]; % initial condition for Rossler system
k = k + 1;
[t,x] = ode45(f,tspan,x0); % call ode() to solve Rossler system
count = find(t>100); % find all the t values which are >100
x = x(count,:);
j = 1;
n = length(x(:,1)); % find the length of vector x1(x in our problem)
for i=2 : n-1
% check for a local maximum in the 1st column of the solution matrix
if (x(i-1,1)+eps) < x(i,1) && x(i,1) > (x(i+1,1)+eps)
xmax(k,j)=x(i,1); % Store the local maxima of x1 in row k
j=j+1;
end
end
% generating bifurcation map by plotting j-1 element of kth row each time
if j>1
plot(c,xmax(k,1:j-1),'k.','MarkerSize',1);
end
hold on;
index(k)=j-1;
end
xlabel('Bifurcation parameter c');
ylabel('x max');
title('Bifurcation diagram for c');
This can be made compatible with parfor by taking a few relatively simple steps. Firstly, parfor workers cannot produce on-screen graphics, so we need to change things to emit a result. In your case, this is not totally trivial since your primary result xmax is being assigned-to in a not-completely-uniform manner - you're assigning different numbers of elements on different loop iterations. Not only that, it appears not to be possible to predict up-front how many columns xmax needs.
Secondly, you need to make some minor changes to the loop iteration to be compatible with parfor, which requires consecutive integer loop iterates.
So, the major change is to have the loop write individual rows of results to a cell array I've called xmax_cell. Outside the parfor loop, it's trivial to convert this back to matrix form.
Putting all this together, we end up with this, which works correctly in R2019b as far as I can tell:
clc;
a = 0.2; b = 0.2;
crange = 1:0.05:90; % Range for parameter c
tspan = 0:0.1:500; % Time interval for solving Rossler system
% PARFOR loop outputs: a cell array of result rows ...
xmax_cell = cell(numel(crange), 1);
% ... and a track of the largest result row
maxNumCols = 0;
parfor k = 1:numel(crange)
c = crange(k);
f = @(t,x) [-x(2)-x(3); x(1)+a*x(2); b+x(3)*(x(1)-c)];
x0 = [1 1 0]; % initial condition for Rossler system
[t,x] = ode45(f,tspan,x0); % call ode() to solve Rossler system
count = find(t>100); % find all the t values which are >100
x = x(count,:);
j = 1;
n = length(x(:,1)); % find the length of vector x1(x in our problem)
this_xmax = [];
for i=2 : n-1
% check for a local maximum in the 1st column of the solution matrix
if (x(i-1,1)+eps) < x(i,1) && x(i,1) > (x(i+1,1)+eps)
this_xmax(j) = x(i,1);
j=j+1;
end
end
% Keep track of what's the maximum number of columns
maxNumCols = max(maxNumCols, numel(this_xmax));
% Store this row into the output cell array.
xmax_cell{k} = this_xmax;
end
% Fix up xmax - push each row into the resulting matrix.
xmax = NaN(numel(crange), maxNumCols);
for idx = 1:numel(crange)
this_max = xmax_cell{idx};
xmax(idx, 1:numel(this_max)) = this_max;
end
% Plot
plot(crange, xmax', 'k.', 'MarkerSize', 1)
xlabel('Bifurcation parameter c');
ylabel('x max');
title('Bifurcation diagram for c');
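The cell-to-matrix step can be tried in isolation on toy data. This is a minimal sketch with made-up rows; `rows`, `nCols`, and `M` are illustrative names that do not appear in the code above.

```matlab
% Toy stand-in for xmax_cell: each cell holds a row of a different length.
rows = {[1 2 3], 4, [5 6]};

% Widest row decides the number of columns; missing entries stay NaN.
nCols = max(cellfun(@numel, rows));
M = NaN(numel(rows), nCols);
for r = 1:numel(rows)
    M(r, 1:numel(rows{r})) = rows{r};
end
disp(M)
```

Because plot skips NaN values, the padded entries do not show up in the diagram.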
The following is my code. I am trying to model a PFR (plug flow reactor) in MATLAB using ode23s. It works well for a one-component irreversible reaction, but when I extend it to more dependent variables, I get a 'Matrix dimensions must agree' error. I have no idea how to fix it. Is it possible to use other software to solve similar problems?
Thank you.
function PFR_MA_length
clear all; clc; close all;
function dCdt = df(t,C)
dCdt = zeros(N,2);
dCddt = [0; -vo*diff(C(:,1))./diff(V)-(-kM*C(2:end,1).*C(2:end,2)-kS*C(2:end,1))];
dCmdt = [0; -vo*diff(C(:,2))./diff(V)-(-kM*C(2:end,1).*C(2:end,2))];
dCdt(:,1) = dCddt;
dCdt(:,2) = dCmdt;
end
kM = 1;
kS = 0.5; % assumptions of the rate constants
C0 = [2, 2]; % assumptions of the entering concentration
vo = 2; % volumetric flow rate
volume = 20; % total volume of reactor, spacetime = 10
N = 100; % number of points to discretize the reactor volume on
init = zeros(N,2); % Concentration in reactor at t = 0
init(1,:) = C0; % concentration at entrance
V = linspace(0,volume,N)'; % discretized volume elements, in column form
tspan = [0 20];
[t,C] = ode23s(@(t,C) df(t,C),tspan,init);
end
You can put a breakpoint on the line that computes dCddt and observe that the sizes of C and V are different.
>> size(C)
ans =
200 1
>> size(V)
ans =
100 1
The element-wise divide operation, ./, between these two variables would then result in the error that you mentioned.
Per the ode23s help, the output of the call dCdt = df(t,C) needs to be a vector. However, you are returning a matrix of size 100x2. In the next call to the same function, ode23s has converted it to a vector when computing the value of C, hence the size 200x1.
In the GNU Octave interpretation of MATLAB behavior, one has to explicitly make sure that the solver only sees flat one-dimensional state vectors. These have to be translated back and forth when applying the model.
Explicitly reading the object A as the flat array A(:) discards the matrix dimension information; it can be added back with the reshape(A,m,n) command.
function dCdt = df(t,C)
C = reshape(C,N,2);
...
dCdt = dCdt(:);
end
...
[t,C] = ode45(@(t,C) df(t,C), tspan, init(:));
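To see the flatten/reshape round trip end to end, here is a minimal runnable sketch on a toy problem rather than the PFR model. The decay rates `k`, the size `N`, and the names `rhs` and `Cend` are assumptions made up for illustration.

```matlab
% Hypothetical toy model: N points, 2 species, each column decays at its own rate.
N = 3;
k = [1, 2];                 % assumed decay rates, one per column
C0 = ones(N, 2);            % state kept as an N-by-2 matrix by the caller

% The solver only ever sees a flat column vector, so the right-hand side
% reshapes on the way in and flattens again on the way out.
rhs = @(t, c) reshape(-reshape(c, N, 2) .* repmat(k, N, 1), [], 1);

[t, c] = ode45(rhs, [0 1], C0(:));   % c has one row per time step, 2*N columns
Cend = reshape(c(end, :), N, 2);     % final state back in matrix form
disp(Cend)
```

The columns of Cend should be close to exp(-1) and exp(-2), which is a quick check that the column ordering survived the round trip.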
I want to parallelize some MATLAB code that does parameter fitting using MATLAB's mle routine. My reason for parallelizing is that I want to run the mle routine for multiple different initial guesses on the same data set.
Initial simulations show that the parallelized code takes anywhere from 200-400 seconds longer than the serial version (~500 seconds) depending on what data set I fit (to time, I just tic and toc at the beginning and end of my code). Am I using parfor incorrectly?
tic
mitdata; % load data
data=dmso; % take the 'DMSO' column data
num=length(data);
mm=linspace(.1, 1, 2)'; % mean values
vv=linspace(.1, 2, 2)'; % variance values
N=length(mm);
n=length(vv);
% get all combinations of the initial parameter guesses I want to try
pp = cell(N, n);
for i = 1:N
for j = 1:n
pp{(i-1)*N+j} = [mm(i), vv(j)];
end
end
temp = nchoosek([1:N*n, 1:N*n], 2);
temp = sort(temp, 2);
idx = unique(temp, 'rows');
P = zeros(length(idx),4);
for ii = 1:length(idx)
P(ii,:) = [pp{idx(ii,1)},pp{idx(ii,2)}];
end
Pcell = num2cell(P); % convert P to cell so that it's easier to assign parameter values for initial guess
options = statset('MaxIter',10000, 'MaxFunEvals',10000);
pd = zeros(length(P),4); % space for best fit parameters
ld = NaN*ones(length(P),1); % space for likelihood values
parfor i=1:length(P)
[m1,v1,m2,v2] = Pcell{i,:};
x0 = [m1,v1,m2,v2]; % initial guess
[p,conf1]=mle(data,'pdf',@convolv_2invG,'start',x0, 'upperbound', [Inf Inf Inf Inf],'lowerbound',[0 0 0 0],'options',options)
pd(i,:)=p; % get best fit parameters from MLE
l=convolv_2invG(data,p(1),p(2),p(3),p(4)); % use best fit parameters to evaluate pdf
l=sum(log(l));
if l<0
ld(i)=l; % store likelihood values
end
end
toc
I have implemented a script that does constrained optimization for solving the optimal parameters of Support Vector Machines model. I noticed that my script for some reason gives inaccurate results (although very close to the real value). For example the typical situation is that the result of a calculation should be exactly 0, but instead it is something like
-1/18014398509481984 = -5.551115123125783e-17
This situation happens when I multiply matrices with vectors. What makes this also strange is that if I do the multiplications by hand in the command window in Matlab I get exactly 0 result.
Let me give an example: If I take the vectors Aq = [-1 -1 1 1] and x = [12/65 28/65 32/65 8/65]' I get exactly 0 result from their multiplication if I do this in the command window, as you can see in the picture below:
If on the other hand I do this in my function-script I don't get the result being 0 but rather the value -1/18014398509481984.
Here is the part of my script that is responsible for this multiplication (I've added the Aq and x into the script to show the contents of Aq and x as well):
disp('DOT PRODUCT OF ACTIVE SET AND NEW POINT: ')
Aq
x
Aq*x
Here is the result of the code above when run:
As you can see the value isn't exactly 0 even though it really should be. Note that this problem doesn't occur for all possible values of Aq and x. If Aq = [-1 -1 1 1] and x = [4/13 4/13 4/13 4/13] the result is exactly 0 as you can see below:
What is causing this inaccuracy? How can I fix this?
P.S. I didn't include my whole code because it's not very well documented and few hundred lines long, but I will if requested.
Thank you!
UPDATE: new test, by using Ander Biguri's advice:
UPDATE 2: THE CODE
function [weights, alphas, iters] = solveSVM(data, labels, C, e)
% FUNCTION [weights, alphas, iters] = solveSVM(data, labels, C, e)
%
% AUTHOR: jjepsuomi
%
% VERSION: 1.0
%
% DESCRIPTION:
% - This function will attempt to solve the optimal weights for a Support
% Vector Machines (SVM) model using active set method with gradient
% projection.
%
% INPUTS:
% "data" a n-by-m data matrix. The number of rows 'n' corresponds to the
% number of data points and the number of columns 'm' corresponds to the
% number of variables.
% "labels" a 1-by-n row vector of data labels from the set {-1,1}.
% "C" Box costraint upper limit. This will constrain the values of 'alphas'
% to the range 0 <= alphas <= C. If hard-margin SVM model is required set
% C=Inf.
% "e" a real value corresponding to the convergence criterion, that is if
% solution Xi and Xi-1 are within distance 'e' from each other stop the
% learning process, i.e. IF |F(Xi)-F(Xi-1)| < e ==> stop learning process.
%
% OUTPUTS:
% "weights" a vector corresponding to the optimal decision line parameters.
% "alphas" a vector of alpha-values corresponding to the optimal solution
% of the dual optimization problem of SVM.
% "iters" number of iterations until learning stopped.
%
% EXAMPLE USAGE 1:
%
% 'Hard-margin SVM':
%
% data = [0 0;2 2;2 0;3 0];
% labels = [-1 -1 1 1];
% [weights, alphas, iters] = solveSVM(data, labels, Inf, 10^-100)
%
% EXAMPLE USAGE 2:
%
% 'Soft-margin SVM':
%
% data = [0 0;2 2;2 0;3 0];
% labels = [-1 -1 1 1];
% [weights, alphas, iters] = solveSVM(data, labels, 0.8, 10^-100)
% STEP 1: INITIALIZATION OF THE PROBLEM
format long
% Calculate linear kernel matrix
L = kron(labels', labels);
K = data*data';
% Hessian matrix
Qd = L.*K;
% The minimization function
L = @(a) (1/2)*a'*Qd*a - ones(1, length(a))*a;
% Gradient of the minimizable function
gL = @(a) a'*Qd - ones(1, length(a));
% STEP 2: THE LEARNING PROCESS, ACTIVE SET WITH GRADIENT PROJECTION
% Initial feasible solution (required by gradient projection)
x = zeros(length(labels), 1);
iters = 1;
optfound = 0;
while optfound == 0 % criterion met
% Negative of the gradient at initial solution
g = -gL(x);
% Set the active set and projection matrix
Aq = labels; % In plane y^Tx = 0
P = eye(length(x))-Aq'*inv(Aq*Aq')*Aq; % In plane projection
% Values smaller than 'eps' are changed into 0
P(find(abs(P-0) < eps)) = 0;
d = P*g'; % Projection onto plane
if ~isempty(find(x==0 | x==C)) % Constraints active?
acinds = find(x==0 | x==C);
for i = 1:length(acinds)
if (x(acinds(i)) == 0 && d(acinds(i)) < 0) || x(acinds(i)) == C && d(acinds(i)) > 0
% Make the constraint vector
constr = zeros(1,length(x));
constr(acinds(i)) = 1;
Aq = [Aq; constr];
end
end
% Update the projection matrix
P = eye(length(x))-Aq'*inv(Aq*Aq')*Aq; % In plane / box projection
% Values smaller than 'eps' are changed into 0
P(find(abs(P-0) < eps)) = 0;
d = P*g'; % Projection onto plane / border
end
%%%% DISPLAY INFORMATION, THIS PART IS NOT NECESSARY, ONLY FOR DEBUGGING
if Aq*x ~= 0
disp('ACTIVE SET CONSTRAINTS Aq :')
Aq
disp('CURRENT SOLUTION x :')
x
disp('MULTIPLICATION OF Aq and x')
Aq*x
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Values smaller than 'eps' are changed into 0
d(find(abs(d-0) < eps)) = 0;
if ~isempty(find(d~=0)) && rank(P) < length(x) % Line search for optimal lambda
lopt = ((g*d)/(d'*Qd*d));
lmax = inf;
for i = 1:length(x)
if d(i) < 0 && -x(i) ~= 0 && -x(i)/d(i) <= lmax
lmax = -x(i)/d(i);
elseif d(i) > 0 && (C-x(i))/d(i) <= lmax
lmax = (C-x(i))/d(i);
end
end
lambda = max(0, min([lopt, lmax]));
if abs(lambda) < eps
lambda = 0;
end
xo = x;
x = x + lambda*d;
iters = iters + 1;
end
% Check whether search direction is 0-vector or 'e'-criterion met.
if isempty(find(d~=0)) || abs(L(x)-L(xo)) < e
optfound = 1;
end
end
%%% STEP 3: GET THE WEIGHTS
alphas = x;
w = zeros(1, length(data(1,:)));
for i = 1:size(data,1)
w = w + labels(i)*alphas(i)*data(i,:);
end
svinds = find(alphas>0);
svind = svinds(1);
b = 1/labels(svind) - w*data(svind, :)';
%%% STEP 4: OPTIMALITY CHECK, KKT conditions. See KKT-conditions for reference.
weights = [b; w'];
datadim = length(data(1,:));
Q = [zeros(1,datadim+1); zeros(datadim, 1), eye(datadim)];
A = [ones(size(data,1), 1), data];
for i = 1:length(labels)
A(i,:) = A(i,:)*labels(i);
end
LagDuG = Q*weights - A'*alphas;
Ac = A*weights - ones(length(labels),1);
alpA = alphas.*Ac;
LagDuG(any(abs(LagDuG-0) < 10^-14)) = 0;
if ~any(alphas < 0) && all(LagDuG == zeros(datadim+1,1)) && all(abs(Ac) >= 0) && all(abs(alpA) < 10^-6)
disp('Optimal found, Karush-Kuhn-Tucker conditions satisfied.')
else
disp('Optimal not found, Karush-Kuhn-Tucker conditions not satisfied.')
end
% VISUALIZATION FOR 2D-CASE
if size(data, 2) == 2
pinds = find(labels > 0);
ninds = find(labels < 0);
plot(data(pinds, 1), data(pinds, 2), 'o', 'MarkerFaceColor', 'red', 'MarkerEdgeColor', 'black')
hold on
plot(data(ninds, 1), data(ninds, 2), 'o', 'MarkerFaceColor', 'blue', 'MarkerEdgeColor', 'black')
Xb = min(data(:,1))-1;
Xe = max(data(:,1))+1;
Yb = -(b+w(1)*Xb)/w(2);
Ye = -(b+w(1)*Xe)/w(2);
lineh = plot([Xb Xe], [Yb Ye], 'LineWidth', 2);
supvh = plot(data(find(alphas~=0), 1), data(find(alphas~=0), 2), 'g.');
legend([lineh, supvh], 'Decision boundary', 'Support vectors');
hold off
end
NOTE:
If you run the EXAMPLE 1, you should get an output starting with the following:
As you can see, the multiplication of Aq and x doesn't produce the value 0, even though it should. This is not a bad thing in this particular example, but if I have more data points with lots of decimals in them, this inaccuracy becomes a bigger and bigger problem, because the calculations are not exact. This is bad, for example, when I'm searching for a new direction vector while moving towards the optimal solution in the gradient projection method: the search direction isn't exactly the correct direction, only close to it. This is why I want exactly correct values. Is this possible?
I wonder if the decimals in the data points have something to do with the accuracy of my results. See the picture below:
So the question is: Is this caused by the data or is there something wrong in the optimization procedure...
Do you use the format function inside your script? It looks like you used format rat somewhere.
You can always use the MATLAB eps function, which returns the precision that is used inside MATLAB. The absolute value of -1/18014398509481984 is smaller than this, according to my MATLAB R2014b:
format long
a = abs(-1/18014398509481984)
b = eps
a < b
This basically means that the result is zero (but matlab stopped calculations because according to eps value, the result was just fine).
Otherwise you can just use format long inside your script before the calculation.
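In the script itself, one option is to compare against a small tolerance instead of testing for exact equality with 0. The sketch below uses the Aq and x from the question; the tolerance `tol` (a few multiples of eps scaled to the data) is an assumed choice, not something prescribed by MATLAB.

```matlab
Aq = [-1 -1 1 1];
x  = [12/65; 28/65; 32/65; 8/65];

r = Aq * x;                       % may be exactly 0 or a few ulps away from it

% Assumed tolerance: ten times the spacing of doubles near the largest entry.
tol = 10 * eps(max(abs(x)));
isZero = abs(r) <= tol;           % treat anything below tol as zero

% A one-ulp perturbation of x(4), like the one seen in the debugger,
% is still classified as zero.
xp = x;
xp(4) = xp(4) + eps(xp(4));
isZeroPerturbed = abs(Aq * xp) <= tol;

disp([isZero, isZeroPerturbed])
```

The same test could replace the `if Aq*x ~= 0` check in the debugging block.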
Edit
I see the inv function inside your code; try replacing it with the \ operator (mldivide). The results from it will be more accurate, as it uses Gaussian elimination without forming the inverse.
The inv documentation states:
In practice, it is seldom necessary to form the explicit inverse of a
matrix. A frequent misuse of inv arises when solving the system of
linear equations Ax = b. One way to solve this is with x = inv(A)*b. A
better way, from both an execution time and numerical accuracy
standpoint, is to use the matrix division operator x = A\b. This
produces the solution using Gaussian elimination, without forming the
inverse.
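Applied to the projection matrix in the question, the substitution could look like the following sketch. It uses the initial active set `Aq = [-1 -1 1 1]` from the example; `P_inv` and `P_div` are illustrative names.

```matlab
Aq = [-1 -1 1 1];                 % active set from the example (labels)
n  = size(Aq, 2);

% Original form, with an explicit inverse.
P_inv = eye(n) - Aq' * inv(Aq*Aq') * Aq;

% Same projection with mldivide: solve (Aq*Aq') * Z = Aq instead of inverting.
P_div = eye(n) - Aq' * ((Aq*Aq') \ Aq);

% Both should project onto the null space of Aq, so Aq*P is (numerically) zero.
disp(max(abs(P_inv(:) - P_div(:))))
disp(norm(Aq * P_div))
```

For this tiny, well-conditioned case the two forms agree; the difference matters more as Aq*Aq' becomes larger or closer to singular.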
With the provided code, this is how I tested:
I added a break-point on the following code:
if Aq*x ~= 0
disp('ACTIVE SET CONSTRAINTS Aq :')
Aq
disp('CURRENT SOLUTION x :')
x
disp('MULTIPLICATION OF Aq and x')
Aq*x
end
When the if branch was taken, I typed at console:
K>> format rat; disp(x);
12/65
28/65
32/65
8/65
K>> disp(x == [12/65; 28/65; 32/65; 8/65]);
0
1
0
0
K>> format('long'); disp(max(abs(x - [12/65; 28/65; 32/65; 8/65])));
1.387778780781446e-17
K>> disp(eps(8/65));
1.387778780781446e-17
This suggests that this is a display problem: format rat deliberately uses small integers to express the value, at the expense of precision. Apparently, the true value of x(4) is the double-precision number right next to 8/65.
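The effect can be reproduced without the SVM code. In this small sketch, `y` is constructed to be the next double after 8/65; the rational display, obtained here with rats, is expected to be the same for both values even though they compare as different.

```matlab
x = 8/65;
y = x + eps(x);          % neighbouring double, one ulp above x

sx = rats(x);            % rational approximation as text, as format rat would show
sy = rats(y);

disp(strtrim(sx))
disp(strtrim(sy))
disp(x == y)             % the values themselves are not equal
fprintf('%.20g\n%.20g\n', x, y)   % enough digits to see the difference
```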
So, this begs the question: are you sure that numeric convergence depends on flipping the least significant bit in a double precision value?
I have code of the following kind in MATLAB:
indices = find([1 2 2 3 3 3 4 5 6 7 7] == 3)
This returns 4,5,6 - the indices of elements in the array equal to 3. Now, my code does this sort of thing with very long vectors. The vectors are always sorted.
Therefore, I would like a function which replaces the O(n) complexity of find with O(log n), at the expense that the array has to be sorted.
I am aware of ismember, but as far as I know it does not return the indices of all items, just the last one (I need all of them).
For reasons of portability, I need the solution to be MATLAB-only (no compiled mex files etc.)
Here is a fast implementation using binary search. This file is also available on GitHub.
function [b,c]=findInSorted(x,range)
%findInSorted fast binary search replacement for ismember(A,B) for the
%special case where the first input argument is sorted.
%
% [a,b] = findInSorted(x,s) returns the range which is equal to s.
% r=a:b and r=find(x == s) produce the same result
%
% [a,b] = findInSorted(x,[from,to]) returns the range which is between from and to
% r=a:b and r=find(x >= from & x <= to) return the same result
%
% For any sorted list x you can replace
% [lia] = ismember(x,from:to)
% with
% [a,b] = findInSorted(x,[from,to])
% lia=a:b
%
% Examples:
%
% x = 1:99
% s = 42
% r1 = find(x == s)
% [a,b] = findInSorted(x,s)
% r2 = a:b
% %r1 and r2 are equal
%
% See also FIND, ISMEMBER.
%
% Author Daniel Roeske <danielroeske.de>
A=range(1);
B=range(end);
a=1;
b=numel(x);
c=1;
d=numel(x);
if A<=x(1)
b=a;
end
if B>=x(end)
c=d;
end
while (a+1<b)
lw=(floor((a+b)/2));
if (x(lw)<A)
a=lw;
else
b=lw;
end
end
while (c+1<d)
lw=(floor((c+d)/2));
if (x(lw)<=B)
c=lw;
else
d=lw;
end
end
end
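For comparison, the same idea can be written as two half-open binary searches (a lower bound and an upper bound). This is a different formulation from the function above, not its author's code; it is a minimal sketch on the vector from the question, with `first`, `last`, and `inds` as illustrative names.

```matlab
x = [1 2 2 3 3 3 4 5 6 7 7];     % sorted input from the question
s = 3;                            % value to look for

% Lower bound: first index i with x(i) >= s (numel(x)+1 if there is none).
lo = 1; hi = numel(x) + 1;
while lo < hi
    mid = floor((lo + hi) / 2);
    if x(mid) < s
        lo = mid + 1;
    else
        hi = mid;
    end
end
first = lo;

% Upper bound: first index i with x(i) > s; the last match is one before it.
lo = first; hi = numel(x) + 1;
while lo < hi
    mid = floor((lo + hi) / 2);
    if x(mid) <= s
        lo = mid + 1;
    else
        hi = mid;
    end
end
last = lo - 1;

inds = first:last;                % empty when s does not occur in x
disp(inds)
```

With this formulation a value that is absent, or outside the range of x, gives first > last and therefore an empty result, without separate boundary checks.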
Daniel's approach is clever and his myFind2 function is definitely fast, but there are errors/bugs that occur near the boundary conditions or in the case that the upper and lower bounds produce a range outside the set passed in.
Additionally, as he noted in his comment on his answer, his implementation had some inefficiencies that could be improved. I implemented an improved version of his code, which runs faster, while also correctly handling boundary conditions. Furthermore, this code includes more comments to explain what is happening. I hope this helps someone the way Daniel's code helped me here!
function [lower_index,upper_index] = myFindDrGar(x,LowerBound,UpperBound)
% fast O(log2(N)) computation of the range of indices of x that satisfy the
% upper and lower bound values using the fact that the x vector is sorted
% from low to high values. Computation is done via a binary search.
%
% Input:
%
% x- A vector of sorted values from low to high.
%
% LowerBound- Lower boundary on the values of x in the search
%
% UpperBound- Upper boundary on the values of x in the search
%
% Output:
%
% lower_index- The smallest index such that
% LowerBound<=x(index)<=UpperBound
%
% upper_index- The largest index such that
% LowerBound<=x(index)<=UpperBound
if LowerBound>x(end) || UpperBound<x(1) || UpperBound<LowerBound
% no indices satisfy bounding conditions
lower_index = [];
upper_index = [];
return;
end
lower_index_a=1;
lower_index_b=length(x); % x(lower_index_b) will always satisfy lowerbound
upper_index_a=1; % x(upper_index_a) will always satisfy upperbound
upper_index_b=length(x);
%
% The following loop increases _a and decreases _b until they differ
% by at most 1. Because one of these index variables always satisfies the
% appropriate bound, this means the loop will terminate with either
% lower_index_a or lower_index_b having the minimum possible index that
% satisfies the lower bound, and either upper_index_a or upper_index_b
% having the largest possible index that satisfies the upper bound.
%
while (lower_index_a+1<lower_index_b) || (upper_index_a+1<upper_index_b)
lw=floor((lower_index_a+lower_index_b)/2); % split the lower index interval
if x(lw) >= LowerBound
lower_index_b=lw; % decrease lower_index_b (whose x value remains \geq to lower bound)
else
lower_index_a=lw; % increase lower_index_a (whose x value remains less than lower bound)
if (lw>upper_index_a) && (lw<upper_index_b)
upper_index_a=lw;% increase upper_index_a (whose x value remains less than lower bound and thus upper bound)
end
end
up=ceil((upper_index_a+upper_index_b)/2);% split the upper index interval
if x(up) <= UpperBound
upper_index_a=up; % increase upper_index_a (whose x value remains \leq to upper bound)
else
upper_index_b=up; % decrease upper_index_b
if (up<lower_index_b) && (up>lower_index_a)
lower_index_b=up;%decrease lower_index_b (whose x value remains greater than upper bound and thus lower bound)
end
end
end
if x(lower_index_a)>=LowerBound
lower_index = lower_index_a;
else
lower_index = lower_index_b;
end
if x(upper_index_b)<=UpperBound
upper_index = upper_index_b;
else
upper_index = upper_index_a;
end
Note that the improved version of Daniel's searchFor function is now simply:
function [lower_index,upper_index] = mySearchForDrGar(x,value)
[lower_index,upper_index] = myFindDrGar(x,value,value);
EDIT many years later: there was an error in the last two if/else blocks, fixed it.
ismember will give you all the indexes if you look at the first output:
>> x = [1 2 2 3 3 3 4 5 6 7 7];
>> [tf,loc]=ismember(x,3);
>> inds = find(tf)
inds =
4 5 6
You just need to use the right order of inputs.
Note that there is a helper function used by ismember that you can call directly:
% ISMEMBC - S must be sorted - Returns logical vector indicating which
% elements of A occur in S
tf = ismembc(x,3);
inds = find(tf);
Using ismembc will save computation time since ismember calls issorted first, but this will omit the check.
Note that newer versions of matlab have a builtin called by builtin('_ismemberoneoutput',a,b) with the same functionality.
Since the above applications of ismember, etc. are somewhat backwards (searching for each element of x in the second argument rather than the other way around), the code is much slower than necessary. As the OP points out, it is unfortunate that [~,loc]=ismember(3,x) only provides the location of a single occurrence of 3 in x, rather than all of them. However, if you have a recent version of MATLAB (R2012b+, I think), you can use yet more undocumented builtin functions to get the first and last indexes! These are ismembc2 and builtin('_ismemberfirst',searchfor,x):
firstInd = builtin('_ismemberfirst',searchfor,x); % find first occurrence
lastInd = ismembc2(searchfor,x); % find last occurrence
% lastInd = ismembc2(searchfor,x(firstInd:end))+firstInd-1; % slower
inds = firstInd:lastInd;
Still slower than Daniel R.'s great MATLAB code, but there it is (rntmX added to randomatlabuser's benchmark) just for fun:
mean([rntm1 rntm2 rntm3 rntmX])
ans =
0.559204323050486 0.263756852283128 0.000017989974213 0.000153682125682
Here are the bits of documentation for these functions inside ismember.m:
% ISMEMBC2 - S must be sorted - Returns a vector of the locations of
% the elements of A occurring in S. If multiple instances occur,
% the last occurrence is returned
% ISMEMBERFIRST(A,B) - B must be sorted - Returns a vector of the
% locations of the elements of A occurring in B. If multiple
% instances occur, the first occurence is returned.
There is actually reference to an ISMEMBERLAST builtin, but it doesn't seem to exist (yet?).
This is not an answer - I am just comparing the running time of the three solutions suggested by chappjc and Daniel R.
N = 5e7; % length of vector
p = 0.99; % probability
KK = 100; % number of instances
rntm1 = zeros(KK, 1); % runtime with ismember
rntm2 = zeros(KK, 1); % runtime with ismembc
rntm3 = zeros(KK, 1); % runtime with Daniel's function
for kk = 1:KK
x = cumsum(rand(N, 1) > p);
searchfor = x(ceil(4*N/5));
tic
[tf,loc]=ismember(x, searchfor);
inds1 = find(tf);
rntm1(kk) = toc;
tic
tf = ismembc(x, searchfor);
inds2 = find(tf);
rntm2(kk) = toc;
tic
a=1;
b=numel(x);
c=1;
d=numel(x);
while (a+1<b||c+1<d)
lw=(floor((a+b)/2));
if (x(lw)<searchfor)
a=lw;
else
b=lw;
end
lw=(floor((c+d)/2));
if (x(lw)<=searchfor)
c=lw;
else
d=lw;
end
end
inds3 = (b:c)';
rntm3(kk) = toc;
end
Daniel's binary search is very fast.
% Mean of running time
mean([rntm1 rntm2 rntm3])
% 0.631132275892504 0.295233981447746 0.000400786666188
% Percentiles of running time
prctile([rntm1 rntm2 rntm3], [0 25 50 75 100])
% 0.410663611685559 0.175298784336465 0.000012828868032
% 0.429120717937665 0.185935198821797 0.000014539383770
% 0.582281366154709 0.268931132925888 0.000019243302048
% 0.775917520641649 0.385297304740352 0.000026940622867
% 1.063753914942895 0.592429428396956 0.037773746662356
I needed a function like this. Thanks for the post @Daniel!
I worked a little with it because I needed to find several indexes in the same array. I wanted to avoid the overhead of arrayfun (or the like) or calling the function multiple times. So you can pass a bunch of values in range and you will get the indexes in the array.
function idx = findInSorted(x,range)
% Author Dídac Rodríguez Arbonès (May 2018)
% Based on Daniel Roeske's solution:
% Daniel Roeske <danielroeske.de>
% https://github.com/danielroeske/danielsmatlabtools/blob/master/matlab/data/findinsorted.m
range = sort(range);
idx = nan(size(range));
for i=1:numel(range)
idx(i) = aux(x, range(i));
end
end
function b = aux(x, lim)
a=1;
b=numel(x);
if lim<=x(1)
b=a;
end
if lim>=x(end)
a=b;
end
while (a+1<b)
lw=(floor((a+b)/2));
if (x(lw)<lim)
a=lw;
else
b=lw;
end
end
end
I guess you can use a parfor or arrayfun instead. I have not tested myself at what size of range it pays off, though.
Another possible improvement would be to use the previous found indexes (if range is sorted) to decrease the search space. I am skeptical of its potential to save CPU because of the O(log n) runtime.
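The idea of reusing the previous result can be sketched as follows, assuming the queries are sorted in ascending order. This is an illustration rather than the function above: it returns, for each query, the first index whose value is not smaller than the query, and `queries` and `start` are made-up names.

```matlab
x = [1 2 2 3 3 3 4 5 6 7 7];     % sorted data
queries = [2 3 7];                % sorted values to locate (assumed ascending)

idx = zeros(size(queries));
start = 1;                        % left end of the search window
for q = 1:numel(queries)
    lo = start; hi = numel(x) + 1;
    while lo < hi
        mid = floor((lo + hi) / 2);
        if x(mid) < queries(q)
            lo = mid + 1;
        else
            hi = mid;
        end
    end
    idx(q) = lo;                  % first index with x(idx) >= queries(q)
    start = lo;                   % the next query cannot lie to the left of this
end
disp(idx)
```

Each search still costs O(log n) in the worst case, so any saving is limited to shrinking the window, which is consistent with the skepticism above.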
The final function ended up running slightly faster. I used @randomatlabuser's framework for that:
N = 5e6; % length of vector
p = 0.99; % probability
KK = 100; % number of instances
rntm1 = zeros(KK, 1); % runtime with the aux-based search above
rntm2 = zeros(KK, 1); % runtime with Daniel's function
rntm3 = zeros(KK, 1); % unused in this benchmark
for kk = 1:KK
x = cumsum(rand(N, 1) > p);
searchfor = x(ceil(4*N/5));
tic
range = sort(searchfor);
idx = nan(size(range));
for i=1:numel(range)
idx(i) = aux(x, range(i));
end
rntm1(kk) = toc;
tic
a=1;
b=numel(x);
c=1;
d=numel(x);
while (a+1<b||c+1<d)
lw=(floor((a+b)/2));
if (x(lw)<searchfor)
a=lw;
else
b=lw;
end
lw=(floor((c+d)/2));
if (x(lw)<=searchfor)
c=lw;
else
d=lw;
end
end
inds3 = (b:c)';
rntm2(kk) = toc;
end
%%
function b = aux(x, lim)
a=1;
b=numel(x);
if lim<=x(1)
b=a;
end
if lim>=x(end)
a=b;
end
while (a+1<b)
lw=(floor((a+b)/2));
if (x(lw)<lim)
a=lw;
else
b=lw;
end
end
end
It is not a big improvement, but it helps because I need to run several thousand searches.
% Mean of running time
mean([rntm1 rntm2])
% 9.9624e-05 5.6303e-05
% Percentiles of running time
prctile([rntm1 rntm2], [0 25 50 75 100])
% 3.0435e-05 1.0524e-05
% 3.4133e-05 1.2231e-05
% 3.7262e-05 1.3369e-05
% 3.9111e-05 1.4507e-05
% 0.0027426 0.0020301
I hope this can help somebody.
EDIT
If there is a significant chance of having exact matches, it pays off to use the very fast built-in ismember before calling the function:
[found, idx] = ismember(range, x);
idx(~found) = arrayfun(@(r) aux(x, r), range(~found));
I need to write MATLAB code that will integrate over a R^5 hypercube using Monte Carlo. I have a basic algorithm that works when I have a generic function. But the function I need to integrate is:
∫dA
A is an element of R^5.
If I had ∫f(x)dA then I think my algorithm would work.
Here is the algorithm:
% Written by Jerome W Lindsey III
clear;
n = 10000;
% Make a matrix of the same dimension
% as the problem. Each row is a dimension
A = rand(5,n);
% Vector to contain the solution
B = zeros(1,n);
for k = 1:n
% insert the integrand here
% I don't know how to enter a function {f(1,n), f(2,n), … f(5,n)} that
% will give me the proper solution
% I threw in a function that will spit out 1/5!
% because that is the correct solution.
B(k) = 1 / (2 * 3 * 4 * 5);
end
mean(B)
In any case, I think I understand what the intent here is, although it does seem like somewhat of a contrived exercise. Consider the problem of trying to find the area of a circle via MC, as discussed here. Here samples are being drawn from a unit square, and the function takes on the value 1 inside the circle and 0 outside. To find the volume of a cube in R^5, we could sample from something else that contains the cube and use an analogous procedure to compute the desired volume. Hopefully this is enough of a hint to make the rest of the implementation straightforward.
I'm guessing here a bit, since the number you give as the "correct" answer doesn't match how you state the exercise (the volume of the unit hypercube is 1).
Given the result should be 1/120 - could it be that you are supposed to integrate the standard simplex in the hypercube?
Then your function would be clear: f(x) = 1 if sum(x) < 1; 0 otherwise.
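Under that reading, the estimate can be computed without a loop. The following is a minimal vectorized sketch; the sample size `n`, the fixed seed, and the names `inside`, `est`, and `se` are assumptions for illustration.

```matlab
rng(0);                           % fixed seed so the run is repeatable
n = 1e6;                          % assumed number of samples
A = rand(5, n);                   % each column is a point in the unit hypercube

inside = sum(A, 1) < 1;           % indicator of the standard simplex
est = mean(inside);               % Monte Carlo estimate of the volume
se  = std(double(inside)) / sqrt(n);   % standard error of the estimate

exact = 1 / factorial(5);         % volume of the 5-dimensional standard simplex
fprintf('estimate %.6f, standard error %.6f, exact %.6f\n', est, se, exact);
```

The estimate should land within a few standard errors of 1/120.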
%Question 2, problem set 1
% Written by Jerome W Lindsey III
clear;
n = 10000;
% Make a matrix of the same dimension
% as the problem. Each row is a dimension
A = rand(5,n);
% Vector to contain the solution
B = zeros(1,n);
for k = 1:n
% insert the integrand here
% this bit of code works as the integrand
if sum(A(:,k)) < 1
B(k) = 1;
end
end
clear k;
clear A;
% Begin error estimation calculations
std_mc = std(B);
clear n;
clear B;
% using the error I calculate a new random
% vector of correct length
N_new = round(std_mc ^ 2 * 3.291 ^ 2 * 1000000);
A_new = rand(5, N_new);
B_new = zeros(1,N_new);
clear std_mc;
for k = 1:N_new
if sum(A_new(:,k)) < 1
B_new(k) = 1;
end
end
clear k;
clear A_new;
% collect descriptive statistics
M_new = mean(B_new);
std_new = std(B_new);
MC_new_error_999 = std_new * 3.291 / sqrt(N_new);
clear N_new;
clear B_new;
clear std_new;
% Display Results
disp('Integral in question #2 is');
disp(M_new);
disp(' ');
disp('Monte Carlo Error');
disp(MC_new_error_999);