Summation of dense and sparse vectors - matlab

I need to sum up a dense and a sparse vectors in Matlab, and the naive way of doing it, that is:
w = rand(1e7,1);
d = sprand(1e7,1,.001);
tic
for k = 1 : 100
w = w + d;
end
toc
takes about 3.5 seconds, which is about 20 times slower then the way I'd expect Matlab to implement it behind the hood:
for k = 1 : 100
ind = find(d);
w(ind) = w(ind) + d(ind);
end
(of course the timing of this faster version depends on the sparsity).
So, why doesn't Matlab do it 'the fast way'? My experience with Matlab till now suggests that it is quite good at utilizing sparsity.
More important, are there other 'sparse' operations I should suspect as being not efficient?

I don't know the answer for sure, but I will give you my guess about what is happening. I don't know Fortran, but from a C++ perspective, what you are showing makes sense, we just need to deconstruct that statement.
The pseudo-code translation of a = b + c where a,b are full and c is sparse would look something like a.operator= ( b.operator+ (c) ).
In all likelihood, full matrix containers in Matlab should have specialised arithmetic operators to deal with sparse inputs, ie something like full full::operator+ ( const sparse& ). The main thing to notice here is that the result of a mixed full/sparse arithmetic operation has to be full. So we're going to need to create a new full container to store the result, even though there are few updated values.
[ Note: the returned full container is a temporary, and so the assignment a.operator= ( ... ) may avoid a full additional copy, eg with full& full::operator= ( full&& ). ]
Unfortunately, there is no way around returning a new full container because there is no arithmetic compound operations in Matlab (ie operator +=). This is why Matlab cannot exploit the fact that in your example a is the same as b (try to time your loop with x = w + d instead, there's no runtime difference), and this is where the overhead comes from IMO.
[ Note: even when there is no explicit assignment, eg b+c;, the generic answer variable ans is assigned. ]
Interestingly, there seems to be a notable difference between full full::operator+ ( const sparse& ) and full sparse::operator+ ( const full& ), that is between a = b + c and a = c + b; I can't say more about why that is, but the latter seems to be faster.
In any case, my short answer is "because Matlab doesn't have arithmetic compound operators", which is unfortunate. If you know you are going to do these operations a lot though, it shouldn't be too hard to implement ad hoc optimized versions like the one you proposed. Hope this helps!

Related

Matlab symbolic

I am trying to compare two simple expressions using Matlab symbolic toolbox. For some reason, the code returns 0. Any idea ?
syms a b c
A = (a/b)^c
B = a^c/b^c
isequal(A,B)
It seems like MATLAB has a hard time telling that two expressions are the same when (potentially) fractional exponents are involved.
So, one solution, as suggested by Mikhail is to restrict the values of c to be only integers although, as discussed in the Math.SE question jodag posted, there is nothing wrong with fractional exponents in this case.
Hence, since this restriction to integers is not necessary for the statement to be true, another solution is to use simplify function on the expression for B but allowing it to run more simplification steps in order to get the most simplified expression.
syms a b c
A = (a/b)^c
B = a^c/b^c
isequal(A,simplify(B,'step',4))
Four steps is actually the smallest number that worked for me, but that could vary across versions of MATLAB I'm assuming. To be sure, I would include more, but for really large expressions, this could become computationally intensive, so some judgment is necessary. Note that, you could also use the 'Seconds' option to limit the amount of time allowed for simplification.
In general what you wrote isn't true, under the right "assumptions" it becomes true: for example, assuming c is an integer you can trick MATLAB into expanding A
clc; clear all;
syms a
syms b
syms c integer
A = (a/b)^c;
B = simplify((a^c)/(b^c));
disp(isequal(A,B));
disp(A);
disp(B);
1
a^c/b^c
a^c/b^c

SciPy.optimize.least_squares() Objective Function Questions

I am trying to minimize a highly non-linear function by optimizing three unknown parameters a, b, and c0. I'm attempting to replicate some governing equations of a casino roulette ball in Python 3.
Here is the link to the research paper:
http://www.dewtronics.com/tutorials/roulette/documents/Roulette_Physik.pdf
I will be referencing equations (35) and (40) in the paper.
Basically, I take stopwatch lap measurements of the roulette ball spinning on the wheel. For each successive lap, the lap time will increase because of losses of momentum to non-conservative forces of friction. Then I take these time measurements and fit equation (35) using a Levenberg-Marquardt least squares method in equation (40).
My question is twofold:
(1) I'm using the scipy.optimize.least_squares() method='lm', and I'm not sure how to write the objective function! Right now I have the function written exactly as is in the paper:
def fall_time(k,a,b,c0):
F = (1 / (a * b)) * (c0 - np.arcsinh(c0) * np.exp(a * k * 2 * np.pi))
return F
def parameter_estimation_function(x0,tk):
a = x0[0]
b = x0[1]
c0 = x0[2]
S = 0
for i,t in enumerate(tk):
k = i + 1
S += (t - fall_time(k,a,b,c0))**2
return [S,1,1]
sol = least_squares(parameter_estimation_function,[0.1,0.8,-0.1],args=([tk1]),method='lm',jac='2-point',max_nfev=2000)
print(sol)
Now, in the documentation examples, I never saw the objective function written the way I have it. In the documentation, the objective function is always returns the residual, not the square of the residual. Additionally, in the documentation they never use the sum! So I'm wondering if the sum and the square are automatically handled under the hood of least_squares()?
(2) Perhaps my second question is a result of my failure to understand how to write the objective function. But anyhow, I'm having trouble getting the algorithm to converge on the minimum. I know this is because the levenberg alogrithm is "greedy" and stops near the closest minima, but I figured that I would be able to at least converge on about the same result given different initial guesses. With slight alterations in the initial guess, I'm getting parameter results with different signs. Additionally, I've yet to find a combination of initial guesses that allows the algo to converge! It always times out before it finds the solution. I've even increased the amount of function evaluations to 10,000 to see if it would. To no avail!
Perhaps somebody could shed some light on my mistakes here! I'm still relatively new to python and the scipy library!
Here is some sample data for tk that I've measured myself from the video here: https://www.youtube.com/watch?v=0Zj_9ypBnzg
tk = [0.52,1.28,2.04,3.17,4.53,6.22]
tk1 = [0.51,1.4,2.09,3,4.42,6.17]
tk2 = [0.63,1.35,2.19,3.02,4.57,6.29]
tk3 = [0.63,1.39,2.23,3.28,4.70,6.32]
tk4 = [0.57,1.4,2.1,3.06,4.53,6.17]
Thanks
1) Yes, as you suspected the sum and the square of the residuals are automatically handled.
2) Hard to say, since I'm not deeply familiar with the problem (e.g., how many local minima exist, what constitutes a 'reasonable' result, etc.). I may investigate more later.
But for kicks I fiddled with some of the values to see what would happen. For example, you can just replace the 1/b constant with a standalone variable b_inv, and this seemed to stabilize the results quite a bit. Here's the code I used to check results. (Note that I rewrote the objective function for brevity. It simply leverages the element-wise operations of numpy arrays, without changing the overall result.)
import numpy as np
from scipy.optimize import least_squares
def fall_time(k,a,b_inv,c0):
return (b_inv / a) * (c0 - np.arcsinh(c0) * np.exp(a * k * 2 * np.pi))
def parameter_estimation_function(x,tk):
return np.asarray(tk) - fall_time(k=np.arange(1,len(tk)+1), a=x[0],b_inv=x[1],c0=x[2])
tk_samples = [
[0.52,1.28,2.04,3.17,4.53,6.22],
[0.51,1.4,2.09,3,4.42,6.17],
[0.63,1.35,2.19,3.02,4.57,6.29],
[0.63,1.39,2.23,3.28,4.70,6.32],
[0.57,1.4,2.1,3.06,4.53,6.17]
]
for i in range(len(tk_samples)):
sol = least_squares(parameter_estimation_function,[0.1,1.25,-0.1],
args=(tk_samples[i],),method='lm',jac='2-point',max_nfev=2000)
print(sol.x)
with console output:
[ 0.03621789 0.64201913 -0.12072879]
[ 3.59319972e-02 1.17129458e+01 -6.53358716e-03]
[ 3.55516005e-02 1.48491493e+01 -5.31098257e-03]
[ 3.18068316e-02 1.11828091e+01 -7.75329834e-03]
[ 3.43920725e-02 1.25160378e+01 -6.36307506e-03]

Vectorize matlab code to map nearest values in two arrays

I have two lists of timestamps and I'm trying to create a map between them that uses the imu_ts as the true time and tries to find the nearest vicon_ts value to it. The output is a 3xd matrix where the first row is the imu_ts index, the third row is the unix time at that index, and the second row is the index of the closest vicon_ts value above the timestamp in the same column.
Here's my code so far and it works, but it's really slow. I'm not sure how to vectorize it.
function tmap = sync_times(imu_ts, vicon_ts)
tstart = max(vicon_ts(1), imu_ts(1));
tstop = min(vicon_ts(end), imu_ts(end));
%trim imu data to
tmap(1,:) = find(imu_ts >= tstart & imu_ts <= tstop);
tmap(3,:) = imu_ts(tmap(1,:));%Use imu_ts as ground truth
%Find nearest indecies in vicon data and map
vic_t = 1;
for i = 1:size(tmap,2)
%
while(vicon_ts(vic_t) < tmap(3,i))
vic_t = vic_t + 1;
end
tmap(2,i) = vic_t;
end
The timestamps are already sorted in ascending order, so this is essentially an O(n) operation but because it's looped it runs slowly. Any vectorized ways to do the same thing?
Edit
It appears to be running faster than I expected or first measured, so this is no longer a critical issue. But I would be interested to see if there are any good solutions to this problem.
Have a look at knnsearch in MATLAB. Use cityblock distance and also put an additional constraint that the data point in vicon_ts should be less than its neighbour in imu_ts. If it is not then take the next index. This is required because cityblock takes absolute distance. Another option (and preferred) is to write your custom distance function.
I believe that your current method is sound, and I would not try and vectorize any further. Vectorization can actually be harmful when you are trying to optimize some inner loops, especially when you know more about the context of your data (e.g. it is sorted) than the Mathworks engineers can know.
Things that I typically look for when I need to optimize some piece of code liek this are:
All arrays are pre-allocated (this is the biggest driver of performance)
Fast inner loops use simple code (Matlab does pretty effective JIT on basic commands, but must interpret others.)
Take advantage of any special data features that you have, e.g. use sort appropriate algorithms and early exit conditions from some loops.
You're already doing all this. I recommend no change.
A good start might be to get rid of the while, try something like:
for i = 1:size(tmap,2)
C = max(0,tmap(3,:)-vicon_ts(i));
tmap(2,i) = find(C==min(C));
end

Replacement for repmat in MATLAB

I have a function which does the following loop many, many times:
for cluster=1:max(bins), % bins is a list in the same format as kmeans() IDX output
select=bins==cluster; % find group of values
means(select,:)=repmat_fast_spec(meanOneIn(x(select,:)),sum(select),1);
% (*, above) for each point, write the mean of all points in x that
% share its label in bins to the equivalent row of means
delta_x(select,:)=x(select,:)-(means(select,:));
%subtract out the mean from each point
end
Noting that repmat_fast_spec and meanOneIn are stripped-down versions of repmat() and mean(), respectively, I'm wondering if there's a way to do the assignment in the line labeled (*) that avoids repmat entirely.
Any other thoughts on how to squeeze performance out of this thing would also be welcome.
Here is a possible improvement to avoid REPMAT:
x = rand(20,4);
bins = randi(3,[20 1]);
d = zeros(size(x));
for i=1:max(bins)
idx = (bins==i);
d(idx,:) = bsxfun(#minus, x(idx,:), mean(x(idx,:)));
end
Another possibility:
x = rand(20,4);
bins = randi(3,[20 1]);
m = zeros(max(bins),size(x,2));
for i=1:max(bins)
m(i,:) = mean( x(bins==i,:) );
end
dd = x - m(bins,:);
One obvious way to speed up calculation in MATLAB is to make a MEX file. You can compile C code and perform any operations you want. If you're searching for the fastest-possible performance, turning the operation into a custom MEX file would likely be the way to go.
You may be able to get some improvement by using ACCUMARRAY.
%# gather array sizes
[nPts,nDims] = size(x);
nBins = max(bins);
%# calculate means. Not sure whether it might be faster to loop over nDims
meansCell = accumarray(bins,1:nPts,[nBins,1],#(idx){mean(x(idx,:),1)},{NaN(1,nDims)});
means = cell2mat(meansCell);
%# subtract cluster means from x - this is how you can avoid repmat in your code, btw.
%# all you need is the array with cluster means.
delta_x = x - means(bins,:);
First of all: format your code properly, surround any operator or assignment by whitespace. I find your code very hard to comprehend as it looks like a big blob of characters.
Next of all, you could follow the other responses and convert the code to C (mex) or Java, automatically or manually, but in my humble opinion this is a last resort. You should only do such things when your performance is not there yet by a small margin. On the other hand, your algorithm doesn't show obvious flaws.
But the first thing you should do when trying to improve performance: profile. Use the MATLAB profiler to determine which part of your code is causing your problems. How much would you need to improve this to meet your expectations? If you don't know: first determine this boundary, otherwise you will be looking for a needle in a hay stack which might not even be in there in the first place. MATLAB will never be the fastest kid on the block with respect to runtime, but it might be the fastest with respect to development time for certain kinds of operations. In that respect, it might prove useful to sacrifice the clarity of MATLAB over the execution speed of other languages (C or even Java). But in the same respect, you might as well code everything in assembler to squeeze all of the performance out of the code.
Another obvious way to speed up calculation in MATLAB is to make a Java library (similar to #aardvarkk's answer) since MATLAB is built on Java and has very good integration with user Java libraries.
Java's easier to interface and compile than C. It might be slower than C in some cases, but the just-in-time (JIT) compiler in the Java virtual machine generally speeds things up very well.

Parallelize or vectorize all-against-all operation on a large number of matrices?

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.
In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so
list = who('data_matrix_prefix*')
H = cell(numel(list),numel(list));
for i=1:numel(list)
for j=1:numel(list)
if i ~= j
eval([ 'H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
end
end
end
is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:
# who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i=1:k
[nextData{:}] = deal(data{i});
H{:,i} = cellfun(#compare,data,nextData,'UniformOutput',false);
end
Unfortunately, this is not really any faster, because all the time is in compare(). Both of these code examples seem ill-suited for parallelization. I'm having trouble figuring out how to make my variables sliced.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
Does anyone see a (explicitly) parallelized solution or better vectorization of the problem?
Note
I realize both my examples are inefficient - the first would be twice as fast if it calculated a triangular cell array, and the second is still calculating the self comparisons, as well. But the time savings for a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).
Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear Hi. Unfortunately, the saves are very slow...
Does
compare(a,b) == compare(b,a)
and
compare(a,a) == 1
If so, change your loop
for i=1:numel(list)
for j=1:numel(list)
...
end
end
to
for i=1:numel(list)
for j= i+1 : numel(list)
...
end
end
and deal with the symmetry and identity case. This will cut your calculation time by half.
The second example can be easily sliced for use with the Parallel Processing Toolbox. This toolbox distributes iterations of your code among up to 8 different local processors. If you want to run the code on a cluster, you also need the Distributed Computing Toolbox.
%# who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
parfor i=1:k-1 %# this will run the loop in parallel with the parallel processing toolbox
%# only make the necessary comparisons
H{i+1:k,i} = cellfun(#compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
%# if the above doesn't work, try this
hSlice = cell(k,1);
hSlice{i+1:k} = cellfun(#compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
H{:,i} = hSlice;
end
If I understand correctly you have to perform 5000^2 matrix comparisons ? Rather than try to parallelise the compare function, perhaps you should think of your problem being composed of 5000^2 tasks ? The Matlab Parallel Compute Toolbox supports task-based parallelism. Unfortunately my experience with PCT is with parallelisation of large linear algebra type problems so I can't really tell you much more than that. The documentation will undoubtedly help you more.