Make this matlab code run without a for loop - matlab

Lets say that I have an array x with some data values.
I performed a clustering algorithm that has produced a label map with the label names - labelMap. Each point in the data now has a unique cluster label associated to it.
I then perform the function foo(subset, secondArg) over each subset. The function foo returns a new array with the result which has the same size as its given the argument (its a map() function which also recieves a second argument).
What follows is the current implementation:
x = rand(1,1000);
numClusters = 3; % specified in advance by the user, for example using a clustering algorithm such as K-Means, this is a given.
fooSecondArg = [1,2,3]; % second argument for foo().
labelMap = kmeans(x,numClusters);
res = zeros(size(x));
%% make me run without a for loop :)
for ind = 1:numClusters
res(labelMap == ind) = foo(x(labelMap == ind), fooSecondArg(ind));
end
My question is as follows:
As there is no overlap between the indices in x foo() acts upon, is there a way to perform foo over x without using a for or a parfor loop? (I don't want to use a parfor loop as It takes quite a while to start the additional matlab processes and I cannot later on deploy it as a stand alone application).
Many thanks!
Edit:
I was asked for an example of foo() because I was told the solution may depend on it. A good viable example you may use in your answer:
function out = foo(x,secondArg)
out = x.^2/secondArg;

Whether the loop can be removed or not depends on the function you have. In your case the function is
function out = foo(x,secondArg)
out = x.^2/secondArg;
where secondArg depends on the cluster.
For this function (and for similar ones) you can indeed eliminate the loop as follows:
res = x.^2 ./ fooSecondArg(labelMap);
Here
labelMap is the same size of x (indicates the cluster label for each entry of x);
fooSecondArg is a vector of length equal to the number of clusters (indicates the second argument for each cluster).
Thus fooSecondArg(labelMap) is an array the same size of x, which for each point gives the second argument corresponding to that point's cluster. The operator ./ performs element-wise division to produce the desired result.

Related

Make vector of elements less than each element of another vector

I have a vector, v, of N positive integers whose values I do not know ahead of time. I would like to construct another vector, a, where the values in this new vector are determined by the values in v according to the following rules:
- The elements in a are all integers up to and including the value of each element in v
- 0 entries are included only once, but positive integers appear twice in a row
For example, if v is [1,0,2] then a should be: [0,1,1,0,0,1,1,2,2].
Is there a way to do this without just doing a for-loop with lots of if statements?
I've written the code in loop format but would like a vectorized function to handle it.
The classical version of your problem is to create a vector a with the concatenation of 1:n(i) where n(i) is the ith entry in a vector b, e.g.
b = [1,4,2];
gives a vector a
a = [1,1,2,3,4,1,2];
This problem is solved using cumsum on a vector ones(1,sum(b)) but resetting the sum at the points 1+cumsum(b(1:end-1)) corresponding to where the next sequence starts.
To solve your specific problem, we can do something similar. As you need two entries per step, we use a vector 0.5 * ones(1,sum(b*2+1)) together with floor. As you in addition only want the entry 0 to occur once, we will just have to start each sequence at 0.5 instead of at 0 (which would yield floor([0,0.5,...]) = [0,0,...]).
So in total we have something like
% construct the list of 0.5s
a = 0.5*ones(1,sum(b*2+1))
% Reset the sum where a new sequence should start
a(cumsum(b(1:end-1)*2+1)+1) =a(cumsum(b(1:end-1)*2+1)+1)*2 -(b(1:end-1)+1)
% Cumulate it and find the floor
a = floor(cumsum(a))
Note that all operations here are vectorised!
Benchmark:
You can do a benchmark using the following code
function SO()
b =randi([0,100],[1,1000]);
t1 = timeit(#() Nicky(b));
t2 = timeit(#() Recursive(b));
t3 = timeit(#() oneliner(b));
if all(Nicky(b) == Recursive(b)) && all(Recursive(b) == oneliner(b))
disp("All methods give the same result")
else
disp("Something wrong!")
end
disp("Vectorised time: "+t1+"s")
disp("Recursive time: "+t2+"s")
disp("One-Liner time: "+t3+"s")
end
function [a] = Nicky(b)
a = 0.5*ones(1,sum(b*2+1));
a(cumsum(b(1:end-1)*2+1)+1) =a(cumsum(b(1:end-1)*2+1)+1)*2 -(b(1:end-1)+1);
a = floor(cumsum(a));
end
function out=Recursive(arr)
out=myfun(arr);
function local_out=myfun(arr)
if isscalar(arr)
if arr
local_out=sort([0,1:arr,1:arr]); % this is faster
else
local_out=0;
end
else
local_out=[myfun(arr(1:end-1)),myfun(arr(end))];
end
end
end
function b = oneliner(a)
b = cell2mat(arrayfun(#(x)sort([0,1:x,1:x]),a,'UniformOutput',false));
end
Which gives me
All methods give the same result
Vectorised time: 0.00083574s
Recursive time: 0.0074404s
One-Liner time: 0.0099933s
So the vectorised one is indeed the fastest, by a factor approximately 10.
This can be done with a one-liner using eval:
a = eval(['[' sprintf('sort([0 1:%i 1:%i]) ',[v(:) v(:)]') ']']);
Here is another solution that does not use eval. Not sure what is intended by "vectorized function" but the following code is compact and can be easily made into a function:
a = [];
for i = 1:numel(v)
a = [a sort([0 1:v(i) 1:v(i)])];
end
Is there a way to do this without just doing a for loop with lots of if statements?
Sure. How about recursion? Of course, there is no guarantee that Matlab has tail call optimization.
For example, in a file named filename.m
function out=filename(arr)
out=myfun(in);
function local_out=myfun(arr)
if isscalar(arr)
if arr
local_out=sort([0,1:arr,1:arr]); % this is faster
else
local_out=0;
end
else
local_out=[myfun(arr(1:end-1)),myfun(arr(end))];
end
end
end
in cmd, type
input=[1,0,2];
filename(input);
You can take off the parent function. I added it just hoping Matlab can spot the recursion within filename.m and optimize for it.
would like a vectorized function to handle it.
Sure. Although I don't see the point of vectorizing in such a unique puzzle that is not generalizable to other applications. I also don't foresee a performance boost.
For example, assuming input is 1-by-N. In cmd, type
input=[1,0,2];
cell2mat(arrayfun(#(x)sort([0,1:x,1:x]),input,'UniformOutput',false)
Benchmark
In R2018a
>> clear all
>> in=randi([0,100],[1,100]); N=10000;
>> T=zeros(N,1);tic; for i=1:N; filename(in) ;T(i)=toc;end; mean(T),
ans =
1.5647
>> T=zeros(N,1);tic; for i=1:N; cell2mat(arrayfun(#(x)sort([0,1:x,1:x]),in,'UniformOutput',false)); T(i)=toc;end; mean(T),
ans =
3.8699
Ofc, I tested with a few more different inputs. The 'vectorized' method is always about twice as long.
Conclusion: Recursion is faster.

Indexing elements of parameters of a function within nested for loops

I have two matrices of results, A = 128x631 and B = 128x1014 and I have a function SSD that takes two elements (x,y) as parameters and then calculates the sum of squared differences. I also have a 631x1014 matrix of 0s, called SSDMatrix, ready to put the results of my SSD function into.
What I'm trying to do is compare each element of A with each element of B by passing them into SSD, but I can't figure out how to structure my for loops to get the desired results.
When I try:
SSDMatrix = SSD(A, B);
I get exactly the result I'm looking for, but only for the first cell. How can I repeat this process for each element of A and B?
Currently I have this:
SSDMatrix = zeros(NumFeatures1,NumFeatures2);
for i = 1:631
for j = 1:1014
SSDMatrix(i,j) = SSD(A,B);
end
end
This just results in the first answer being repeated 631*1014 times, so I need a way to index A and B to get the appropriate answer for each (i,j) of SSDMatrix.
It seems you were needed to do something like this -
SSDMatrix = zeros(NumFeatures1,NumFeatures2);
for i = 1:631
for j = 1:1014
SSDMatrix(i,j) = sum( (A(:,i) - B(:,j)).^ 2 );
end
end
This, you can achieve with pdist2 as well that gets us the square root of summed squared distances. Now, please do note that pdist2 is part of the Statistics Toolbox. So, to get the desired output, you can do -
out = pdist2(A.',B.').^2;
Or with bsxfun -
out = squeeze(sum(bsxfun(#minus,A,permute(B,[1 3 2])).^2,1));

Subscript indices must either be real positive integers or logicals error within Matlab decay program

I am having issues with a code of mine dealing with decay. The error "Subscript indices must either be real positive integers or logicals" continues to occur no matter how many times I attempt to fix the line of code: M=M(t)+h.*F
Here is the complete code so that it may be easier to solve the issue:
M=10000;
M=#(t) M*exp(-4.5*t);
F=-4.5*M(t);
h=.1;
t(1)=0;
tmax=20;
n=(tmax-t(1))/h;
i=1;
while h<=.5
while i<=n
t=t+h;
M=M(t)+h.*F;
data_out=[t,M];
dlmwrite('single_decay_euler_h.txt',data_out,'delimiter','\t','-append');
i=i+1;
end
h=h+.1;
end
Thanks for any help.
In the start, you're setting M = 5000;. In the following line, you're creating an anonymous function also called M:
M=#(t) M*exp(-4.5*t);
Now, your initial M = 5000 variable has been overwritten, and is substituted by the function:
M(t) = 5000 * exp(-4.5*t); %// Note that the first M is used to get 5000
Thereafter you do F = -4.5*M(t). I don't know what the value t is here, but you're giving F the value -4.5 * 5000 * exp(-4.5*t), for some value of t. You are not creating a function F.
In the first iteration of the loop, M=M(t)+h.*F; is interpreted as:
M = 5000 * exp(-4.5*0) + 0.1*F %// Where F has some value determined by previous
%// the function above and the previous value of t
%// -4.5*0 is because t = 0
M is now no longer a function, but a single scalar value. The next iteration t = 0.1. When you do: M=M(t)+h.*F; now, it interprets both the first and second M as a variable, not a function. t is therefore used as an index, instead of being an input parameter to the function M (since you have overwritten it).
When you are writing M(t), you are trying to access the 0.1'th element of the 1x1 matrix (scalar) M, which obviously isn't possible.
Additional notes:
The outer while loop has no purpose as it stands now, since i isn't reset after the inner loop. When you're finished with the first iteration of the outer loop, i is already >n, so it will never enter the inner loop again.
You shouldn't mix variable and function names (as you do with M. Use different names, always. Unless you have a very good reason not to.
data_out=[t,M]; is a growing vector inside a loop. This is considered very bad practice, ans is very slow. It's better to pre-allocate memory for the vector, for instance using data_out = zeros(k,1), and insert new values using indexes, data_out(ii) = M.
It's recommended not to use i and j as variable names in MATLAB as these also represent the imaginary unit sqrt(-1). This might cause some strange bugs if you're not paying attention to it.
You can almost certainly do what you're trying to do without loops. However, the function you have written is not functioning, and it's not explained all too well what you're trying to do, so it's hard to give advice as to how you can get what you want (but I'll give it a try). I'm skipping the dlmwrite-part, because I don't really understand what you want to output.
M = 5000;
t0 = 0;
tmax = 20;
h = 0.1; %// I prefer leading zeros in decimal numbers
t = t0: h: tmax;
data_out = M .* exp(-4.5 * t);
The problem is caused by M(t) in your code, because t is not an integer or logical (t=1,1.1,1.2,...)
You need to change your code to pass an integer as a subscript. Either multiply t by 10, or don't use the matrix M if you don't need it.

Why Self-defined Matlab Function Return Empty Matrix

The function is supposed to return a value. However, when I type:
val(1d4)
it returns
ans =
[]
However, if I copied line by line, and set all parameters (V eps_today etc.), and run in the command window, it works fine...Have no clue where the problem is. Thanks:)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This function calculates the value function of both small and big%
%investment %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function negval=val(k_next)
global V eps_today k_today eps_gv k_gv z_today delta xi
theta=0.5327; beta=0.95; rho=0.7675; sigma_eta=0.7704;
%identify the grid point that falls just below
klo=max(sum(k_next>k_gv),1);
%above
khi=klo+1;
%Linear Interpolation, effectively only on the dimension of k
if klo<size(k_gv,2)
V_temp=V(:,klo)+(k_next-k_gv(klo))/(k_gv(khi)-k_gv(klo))*(V(:,khi)-V(:,klo));
else
%when k_next> maximum point of k_gv
V_temp=V(:,size(k_gv,2));
end
EV=mean(V_temp,1);
negval=-(exp(z_today+eps_today)*k_today^theta-xi*k_today-(k_next-(1-delta)*k_today)+beta*EV);
end
I tried to replicate your problem but I couldn't. If I do not set the global variables Matlab shouts and refuses to proceed through the function. Setting some of them to empty values was the same (did not try them all). If I do set the globals then the function always gives me a value.
But I strongly suspect it has something to do with the globals anyway (can you show the code where they are initialized?). As I mentioned in the comment I would get rid of them, the code could look like this
Main programme
%set parameters
Pars.beta = beta=0.95;
Pars.theta = 0.5327;
Pars.rho=0.7675;
Pars.sigma_eta=0.7704;
Pars.delta = ...;
Pars.xi = ...;
%set grid
eps_gv = ...
k_gv = ...
ne = length(eps_gv);
nk = length(k_gv);
V = zeros(ne,nk);
k_next = ...;
value = val(k_next,V,eps_gv,k_gv,Pars);
Val function
negval = function val(k_next,V,eps_gv,k_gv,Pars)
theta=Pars.theta; beta=Pars.beta; rho=Pars.rho; etc.
You don't even need eps_today and k_today as these should be the values on the grid (ie eps_gv and k_gv). I don't know what z_today is but there should either be a grid for z or z_today should be a parameter, in which case it should be in Pars. Of course if you use eps_gv and k_gv instead of eps_today and k_today the negval = ... line needs to be modified to take account of them being arrays not scalars.
One last comment, there is a bug on the EV=mean(V_temp,1); line. The mean function calculates the average of (the columns of) V. To calculate the expected value you need to do a weighted average where you weight each row of V by the probability of eps being eps_gv(1) (ie sum(V(i,:)*prob_eps(i)), sum going over i) . What you did works only if all shocks have equal probability. Pointing out since I am not sure you are aware of that.

nlfilter taking same values twice

I used nlfilter for a test function of mine as follows:
function funct
clear all;
clc;
I = rand(11,11);
ld = input('Enter the lag = ') % prompt for lag distance
A = nlfilter(I, [7 7], #dirvar);
% Subfunction
function [h] = dirvar(I)
c = (size(I)+1)/2
EW = I(c(1),c(2):end)
h = length(EW) - ld
end
end
The function works fine but it is expected that nlfilter progresses element by element, but in first two iterations the values of EW will be same 0.2089 0.4162 0.9398 0.1058. But then onwards for all iterations the next element is selected, for 3rd it is 0.4162 0.9398 0.1058 0.1920, for 4th it is 0.9398 0.1058 0.1920 0.5201 and so on. Why is it so?
This is nothing to worry about. It happens because nlfilter needs to evaluate your function to know what kind of output to create. So it uses feval once before starting to move across the image. The output from this feval call is what you see the first time.
From the nlfilter code:
% Find out what output type to make.
rows = 0:(nhood(1)-1);
cols = 0:(nhood(2)-1);
b = mkconstarray(class(feval(fun,aa(1+rows,1+cols),params{:})), 0, size(a));
% Apply fun to each neighborhood of a
f = waitbar(0,'Applying neighborhood operation...');
for i=1:ma,
for j=1:na,
x = aa(i+rows,j+cols);
b(i,j) = feval(fun,x,params{:});
end
waitbar(i/ma)
end
The 4th line call to eval is what you observe as the first output from EW, but it is not used to anything other than making the b matrix the right class. All the proper iterations happen in the for loop below. This means that the "duplicate" values you observe does not affect your final output matrix, and you need not worry.
I hope you know what the length function does? It does not give you the Euclidean length of a vector, but rather the largest dimension of a vector (so in your case, that should be 4). If you want the Euclidean length (or 2-norm), use the function norm instead. If your code does the right thing, you might want to use something like:
sz = size(I,2);
h = sz - (sz+1)/2 - ld;
In your example, this means that depending on the lag you provide, the output should be constant. Also note that you might want to put semicolons after each line in your subfunction and that using clear all as the first line of a function is useless since a function will always be executed in its own workspace (that will however clear persistent or global variables, but you don't use them in your code).