Is there a way to extend a vector by making it repeat itself?
>v = [1 2];
>v10 = v x 5; %x represents some function. Something like "1 2" x 5 in perl
Then v10 would be:
>v10
1 2 1 2 1 2 1 2 1 2
This should work for the general case, not just for [1 2]
The function you're looking for is repmat().
v10 = repmat(v, 1, 5)
Obviously repmat is the way to go if you know in which direction you want to expand the vector.
However, if you want a general solution that always repeats the vector in the longest direction, this combination of repmat and indexing should do the trick:
v10=v(repmat(1:length(v),1,5))
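As a quick illustration of the difference, here is a minimal sketch (the column vector v_col is just an example input, not something from the question). repmat(v, 1, 5) tiles along the second dimension whatever the shape of v, while the indexing form keeps the orientation of the input vector:

```matlab
v_row = [1 2];
v_col = [1; 2];                             % example column vector (assumed input)

% repmat(v, 1, 5) always tiles along the second dimension
a = repmat(v_row, 1, 5);                    % 1-by-10
b = repmat(v_col, 1, 5);                    % 2-by-5, probably not what was wanted

% Indexing with a repeated index vector keeps the orientation of v
c = v_row(repmat(1:length(v_row), 1, 5));   % 1-by-10
d = v_col(repmat(1:length(v_col), 1, 5));   % 10-by-1
```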
Although late, I am posting this because this turned out to be the most popular answer to a similar question here.
This is a Faster Method Than repmat or reshape by an Order of Magnitude
One of the best methods for doing such things is Tony's Trick. I came across it in the lecture notes of an Electrical Engineering course at Columbia University. repmat and reshape are usually found to be slower than Tony's Trick because it uses Matlab's built-in indexing directly. To answer your question:
Let's say you want to tile the row vector r=[1 2 3] N times, like r_tiled=[1 2 3 1 2 3 1 2 3...]. Then:
c=r'
cc=c(:,ones(N,1));
r_tiled = cc(:)';
This method has significant time savings against reshape or repmat for large N's.
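A small check, as a sketch, that the three lines above produce the same vector as repmat (N = 4 is an arbitrary choice for the example):

```matlab
r = [1 2 3];
N = 4;                        % arbitrary repeat count for the example

c  = r';                      % 3-by-1 column
cc = c(:, ones(N, 1));        % index column 1 N times -> 3-by-N
r_tiled = cc(:)';             % read out column by column, back to a row

same = isequal(r_tiled, repmat(r, 1, N));
```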
I conducted a small Matlab test to check the speed differential between repmat and tony's trick. Using the code mentioned below, I calculated the times for constructing the same tiled vector from a base vector A=[1:N]. The results show that YES, Tony's-Trick is FASTER BY AN ORDER of MAGNITUDE, especially for larger N. People are welcome to try it themselves. This much time differential can be critical if such an operation has to be performed in loops. Here is the small script I used;
N= 10 ;% Also try N = 100, 1000, 10000
% time for tony_trick
tic;
A=(1:N)';
B=A(:,ones(N,1));
C=B(:)';
t_tony=toc;
clearvars -except t_tony N
% time for repmat
tic;
A=(1:N);
B=repmat(A,1,N);
t_repmat=toc;
clearvars -except t_tony t_repmat N
The Times (in seconds) for both methods are given below;
N        time_repmat   time_tony
10       8e-5          3e-5
100      2.9e-4        6e-5
1000     0.0302        0.0058
10000    2.9199        0.5292
My RAM didn't permit me to go beyond N=10000. I am sure, the time difference between the two methods will be even more significant for N=100000. I know, these times might be different for different machines, but the relative difference in order-of-magnitude of times will stand. Also, I know, the avg of times could have been a better metric, but I just wanted to show the order of magnitude difference in time consumption between the two approaches. My machine/os details are given below :
Relevant Machine/OS/Matlab Details : Athlon i686 Arch, Ubuntu 11.04 32 bit, 3gb ram, Matlab 2011b
This is my code:
variables=1000;
t=20;
x=zeros(t,t,3);
y=rand(variables,3);
z=rand(t,t,variables);
e=rand(variables,1);
for c=1:variables
x(:,:,1)=x(:,:,1)+y(c,1).*((z(:,:,c)-e(c)).^2);
x(:,:,2)=x(:,:,2)+y(c,2).*((z(:,:,c)-e(c)).^2);
x(:,:,3)=x(:,:,3)+y(c,3).*((z(:,:,c)-e(c)).^2);
end
How can I improve calculation speed on this loop? I think that the problem is the for loop with a large c.
It's a myth, but alas a persistent one, that loops are slow in MATLAB. As you've written your for loop, it goes sequentially through the last dimension of your variables. That pretty much translates to a FORTRAN loop directly, leaving little room for improvement using vectorisation. The below does vectorise your output as much as possible, but doesn't improve performance much, even though reshape() is almost free, and severely degrades readability.
In each iteration, all you're doing is calculating y(c,1).*((z(:,:,c)-e(c)).^2), which is added to the total. If we are able to vectorise that expression, we can sum over the dimension of c to get rid of the loop.
z(:,:,c)-e(c) can be vectorised by adding two singleton dimensions to e: reshape(e, [1 1 numel(e)]), then subtract and power by 2 as usual.
Multiplication by y(c,1) also works if we add two singleton dimensions to y(:,1): reshape(y(:,1), [1 1 numel(e)]), then multiply again as usual.
Finally, we just need to sum over our 3rd dimension and we end up with our t -by- t result: sum(tmp2, 3).
All that's left are the hardcoded three dimensions in x, which I've left be in a loop.
The working code on R2007b:
variables=10;
t=2;
x=zeros(t,t,3);
y=rand(variables,3);
z=rand(t,t,variables);
e=rand(variables,1);
for ii = 1:size(x, 3)
x(:, :, ii) = sum(bsxfun(@times, reshape(y(:,ii), [1 1 numel(e)]), bsxfun(@minus, z, reshape(e, [1 1 numel(e)])).^2), 3);
end
I wasn't sure what to do with the hardcoded dimension of 3, so I just left a loop over that. The rest is vectorised away, thanks to a few reshape() calls to arrange the dimensions for the bsxfun() expansion.
Code for R2016b and later, with implicit expansion:
for ii = 1:size(x, 3)
x(:, :, ii) = sum(reshape(y(:,ii), [1 1 numel(e)]) .* (z - reshape(e, [1 1 numel(e)])).^2, 3);
end
A quick timing comparison shows that this is roughly 2x faster than your original loop:
Elapsed time is 0.780516 seconds. Original code
Elapsed time is 0.397369 seconds. My bsxfun() solution
Elapsed time is 0.305160 seconds. My implicit expansion
Note that in the above, 100 loops were run for each code version, i.e. the timings are about 8 ms, 4 ms and 3 ms per run.
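To convince yourself that the vectorised form computes the same thing as the original loop, a minimal sketch along these lines can be used. The sizes are arbitrary small values chosen for the example, and bsxfun is used so that it also runs on releases without implicit expansion:

```matlab
variables = 50;               % arbitrary small sizes for the check
t = 4;
y = rand(variables, 3);
z = rand(t, t, variables);
e = rand(variables, 1);

% Original approach: accumulate one slice of z at a time
x_loop = zeros(t, t, 3);
for c = 1:variables
    d2 = (z(:, :, c) - e(c)).^2;
    for k = 1:3
        x_loop(:, :, k) = x_loop(:, :, k) + y(c, k) .* d2;
    end
end

% Vectorised approach: expand e and y(:,k) along the third dimension
d2_all = bsxfun(@minus, z, reshape(e, [1 1 variables])).^2;
x_vec = zeros(t, t, 3);
for k = 1:3
    w = reshape(y(:, k), [1 1 variables]);
    x_vec(:, :, k) = sum(bsxfun(@times, w, d2_all), 3);
end

% Largest absolute difference; only rounding error is expected
max_diff = max(abs(x_loop(:) - x_vec(:)));
```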
For an introduction to reshape() you can refer to this answer of mine.
The documentation article on implicit broadcasting is rather good, as is this blog.
Is there a vectorised way to do the following? (shown by an example):
input_lengths = [ 1 1 1 4 3 2 1 ]
result = [ 1 2 3 4 4 4 4 5 5 5 6 6 7 ]
I have spaced out the input_lengths so it is easy to understand how the result is obtained
The resultant vector is of length: sum(lengths). I currently calculate result using the following loop:
result = ones(1, sum(input_lengths ));
counter = 1;
for i = 1:length(input_lengths)
start_index = counter;
end_index = counter + input_lengths (i) - 1;
result(start_index:end_index) = i;
counter = end_index + 1;
end
EDIT:
I can also do this using arrayfun (although that is not exactly a vectorised function)
cell_result = arrayfun(@(x) repmat(x, 1, input_lengths(x)), 1:length(input_lengths), 'UniformOutput', false);
cell_result : {[1], [2], [3], [4 4 4 4], [5 5 5], [6 6], [7]}
result = [cell_result{:}];
result : [ 1 2 3 4 4 4 4 5 5 5 6 6 7 ]
A fully vectorized version:
selector=bsxfun(@le,[1:max(input_lengths)]',input_lengths);
V=repmat([1:size(selector,2)],size(selector,1),1);
result=V(selector);
Downside is, the memory usage is O(numel(input_lengths)*max(input_lengths))
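To make the mechanism concrete, here is a sketch that runs the three steps on the example from the question. The final transpose is only there to get a row vector back, since logical indexing into a matrix returns a column:

```matlab
input_lengths = [1 1 1 4 3 2 1];

% Row k of selector is true in column j when input_lengths(j) >= k
selector = bsxfun(@le, (1:max(input_lengths))', input_lengths);

% Every column j of V is filled with the value j
V = repmat(1:size(selector, 2), size(selector, 1), 1);

% Logical indexing reads column by column, so the values come out in order
result = V(selector)';
```

The 4-by-7 selector matrix is where the O(numel(input_lengths)*max(input_lengths)) memory cost comes from.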
Benchmark of all solutions
Following the previous benchmark, I grouped all the solutions given here into a script and ran it for a few hours as a benchmark. I did this because I think it is useful to see how each proposed solution performs with the input length as a parameter. My intention is not to put down the quality of the previous benchmark, which gives additional information about the effect of the JIT. Moreover, and every participant seems to agree on this, quite good work was done in all the answers, so this great post deserves a concluding post.
I won't post the code of the script here; it is quite long and very uninteresting. The benchmark procedure is to run each solution for a set of different input vector lengths: 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000. For each input length, I generate a random input vector by inverse-transform sampling with parameter 0.8, to avoid big values (the variable is called poisson_alpha, although the formula below actually draws from a rounded exponential law rather than a Poisson law):
input_lengths = round(-log(1-rand(1,ILen(i)))/poisson_alpha)+1;
Finally, I average the computation times over 100 runs per input length.
I've run the script on my laptop computer (core I7) with Matlab R2013b; JIT is activated.
And here are the plotted results (sorry, color lines), in a log-log scale (x-axis: input length; y-axis: computation time in seconds):
So Luis Mendo is the clear winner, congrats!
For anyone who wants the numerical results and/or wants to replot them, here they are (cut the table into 2 parts and approximated to 3 digits, for a better display):
N                    10        20        50        100       200       500       1e+03     2e+03
-----------------------------------------------------------------------------------------------------
OP's for-loop        8.02e-05  0.000133  0.00029   0.00036   0.000581  0.00137   0.00248   0.00542
OP's arrayfun        0.00072   0.00117   0.00255   0.00326   0.00514   0.0124    0.0222    0.047
Daniel               0.000132  0.000132  0.000148  0.000118  0.000126  0.000325  0.000397  0.000651
Divakar              0.00012   0.000114  0.000132  0.000106  0.000115  0.000292  0.000367  0.000641
David's for-loop     9.15e-05  0.000149  0.000322  0.00041   0.000654  0.00157   0.00275   0.00622
David's arrayfun     0.00052   0.000761  0.00152   0.00188   0.0029    0.00689   0.0122    0.0272
Luis Mendo           4.15e-05  4.37e-05  4.66e-05  3.49e-05  3.36e-05  4.37e-05  5.87e-05  0.000108
Bentoy13's cumsum    0.000104  0.000107  0.000111  7.9e-05   7.19e-05  8.69e-05  0.000102  0.000165
Bentoy13's sparse    8.9e-05   8.82e-05  9.23e-05  6.78e-05  6.44e-05  8.61e-05  0.000114  0.0002
Luis Mendo's optim.  3.99e-05  3.96e-05  4.08e-05  4.3e-05   4.61e-05  5.86e-05  7.66e-05  0.000111
N                    5e+03     1e+04     2e+04     5e+04     1e+05     2e+05     5e+05     1e+06
-----------------------------------------------------------------------------------------------------
OP's for-loop        0.0138    0.0278    0.0588    0.16      0.264     0.525     1.35      2.73
OP's arrayfun        0.118     0.239     0.533     1.46      2.42      4.83      12.2      24.8
Daniel               0.00105   0.0021    0.00461   0.0138    0.0242    0.0504    0.126     0.264
Divakar              0.00127   0.00284   0.00655   0.0203    0.0335    0.0684    0.185     0.396
David's for-loop     0.015     0.0286    0.065     0.175     0.3       0.605     1.56      3.16
David's arrayfun     0.0668    0.129     0.299     0.803     1.33      2.64      6.76      13.6
Luis Mendo           0.000236  0.000446  0.000863  0.00221   0.0049    0.0118    0.0299    0.0637
Bentoy13's cumsum    0.000318  0.000638  0.00107   0.00261   0.00498   0.0114    0.0283    0.0526
Bentoy13's sparse    0.000414  0.000774  0.00148   0.00451   0.00814   0.0191    0.0441    0.0877
Luis Mendo's optim.  0.000224  0.000413  0.000754  0.00207   0.00353   0.00832   0.0216    0.0441
OK, I've added another solution to the list: I could not stop myself from optimizing the best-so-far solution, Luis Mendo's. No credit to me for that; it's just a variant of Luis Mendo's, and I'll explain it later.
Clearly, the solutions using arrayfun are very time-consuming. The solutions using an explicit for loop are faster, yet still slow compared with the other solutions. So yes, vectorizing is still a major option for optimizing a Matlab script.
Since I saw a large spread in the computing times of the fastest solutions, especially for input lengths between 100 and 10000, I decided to benchmark them more precisely. So I set the slowest ones aside (sorry) and reran the benchmark on the 6 other solutions, which run much faster. This second benchmark on the reduced list is identical except that I averaged over 1000 runs.
(No table here, unless you really want to, it's quite the same numbers as before)
As was remarked, the solution by Daniel is a little faster than the one by Divakar, apparently because using bsxfun with @times is slower than using repmat. Still, they are 10 times faster than the for-loop solutions: clearly, vectorizing in Matlab is a good thing.
The solutions of Bentoy13 and Luis Mendo are very close; the first one uses more instructions, but the second one makes an extra allocation when concatenating 1 with input_lengths(1:end-1) inside the cumsum. That is why Bentoy13's solution tends to be a bit faster for big input lengths (above 5e5): there is no extra allocation. From this consideration, I've made an optimized solution where there is no extra allocation; here is the code (Luis Mendo can put this one in his answer if he wants to :) ):
result = zeros(1,sum(input_lengths));
result(1) = 1;
result(1+cumsum(input_lengths(1:end-1))) = 1;
result = cumsum(result);
Any comment for improvement is welcome.
More of a comment than anything, but I did some tests. I tried a for loop and an arrayfun of my own, and I also tested your for loop and arrayfun version. Your for loop was the fastest. I think this is because it is simple, and allows the JIT compilation to do the most optimisation. I am using Matlab; Octave might be different.
And the timing:
Solution        With JIT   Without JIT
Sam for         0.74       1.22
Sam arrayfun    2.85       2.85
My for          0.62       2.57
My arrayfun     1.27       3.81
Divakar         0.26       0.28
Bentoy          0.07       0.06
Daniel          0.15       0.16
Luis Mendo      0.07       0.06
So Bentoy's code is really fast, and Luis Mendo's is almost exactly the same speed. And I rely on JIT way too much!
And the code for my attempts
clc,clear
input_lengths = randi(20,[1 10000]);
% My for loop
tic()
C=cumsum(input_lengths);
D=diff(C);
results=zeros(1,C(end));
results(1,1:C(1))=1;
for i=2:length(input_lengths)
results(1,C(i-1)+1:C(i))=i*ones(1,D(i-1));
end
toc()
% My arrayfun
tic()
A=arrayfun(@(i) i*ones(1,input_lengths(i)),1:length(input_lengths),'UniformOutput',false);
R=[A{:}];
toc()
result = zeros(1,sum(input_lengths));
result(cumsum([1 input_lengths(1:end-1)])) = 1;
result = cumsum(result);
This should be pretty fast. And memory usage is the minimum possible.
An optimized version of the above code, due to Bentoy13 (see his very detailed benchmarking):
result = zeros(1,sum(input_lengths));
result(1) = 1;
result(1+cumsum(input_lengths(1:end-1))) = 1;
result = cumsum(result);
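To see why this works, here is a sketch that traces the intermediate vectors on the example from the question (the names starts and marks are mine, introduced only for the illustration):

```matlab
input_lengths = [1 1 1 4 3 2 1];

% Start position of every run except the first: 1 + [1 2 3 7 10 12]
starts = 1 + cumsum(input_lengths(1:end-1));

marks = zeros(1, sum(input_lengths));
marks(1) = 1;                 % the first run always starts at position 1
marks(starts) = 1;            % marks = [1 1 1 1 0 0 0 1 0 0 1 0 1]

result = cumsum(marks);       % each mark bumps the running value by one
```

As far as I can tell, the approach assumes every entry of input_lengths is at least 1: with a zero entry two start positions coincide and the later values come out one too small.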
This is a slight variant of @Daniel's answer, and its crux is based on that solution. This one avoids repmat, so in that sense it is maybe a little more "vectorized". Here's the code -
selector=bsxfun(@le,[1:max(input_lengths)]',input_lengths); %//'
V = bsxfun(@times,selector,1:numel(input_lengths));
result = V(V~=0)
For all the desperate one-liner searching people -
result = nonzeros(bsxfun(@times,bsxfun(@le,[1:max(input_lengths)]',input_lengths),1:numel(input_lengths)))
I was looking for an elegant solution, and I think David's solution is a good start. What I have in mind is that one can generate the indices at which the value increases by one relative to the previous element.
For that, if we compute the cumsum of the input vector, we get:
cumsum(input_lengths)
ans = 1 2 3 7 10 12 13
These are the indices of the ends of the runs of identical numbers. That is not what we want, so we flip the vector twice to get the beginnings:
fliplr(sum(input_lengths)+1-cumsum(fliplr(input_lengths)))
ans = 1 2 3 4 8 11 13
Here is the trick. You flip the vector, cumsum it to get the ends of the flipped vector, and then flip back; but you must subtract the result from the total length of the output vector (+1 because indexing starts at 1), because the cumsum is applied to the flipped vector.
Once you have done this, it's very straightforward, you just have to put 1 at computed indexes and 0 elsewhere, and cumsum it:
idx_begs = fliplr(sum(input_lengths)+1-cumsum(fliplr(input_lengths)));
result = zeros(1,sum(input_lengths));
result(idx_begs) = 1;
result = cumsum(result);
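The same start indices can also be obtained without flipping, by shifting the lengths one place to the right before the cumsum (this is the form used in Luis Mendo's answer). A minimal check on the example input, with variable names chosen only for this sketch:

```matlab
input_lengths = [1 1 1 4 3 2 1];

% Double-flip version from above
idx_flip = fliplr(sum(input_lengths) + 1 - cumsum(fliplr(input_lengths)));

% Shifted version: a run starts one past the end of the previous run
idx_shift = cumsum([1 input_lengths(1:end-1)]);

same = isequal(idx_flip, idx_shift);
```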
EDIT
First, please have a look at Luis Mendo's solution; it is very close to mine but simpler and a bit faster (I won't edit mine even though it is very close). I think that, as of this date, it is the fastest solution of all.
Second, while looking at the other solutions, I came up with another one-liner, a little different from my initial solution and from the other one-liner. OK, this won't be very readable, so take a breath:
result = cumsum( full(sparse(cumsum([1,input_lengths(1:end-1)]), ...
ones(1,length(input_lengths)), 1, sum(input_lengths),1)) );
I split it over two lines. OK, now let's explain it.
The similar part is building the array of indices at which the value of the current element is incremented. I use Luis Mendo's solution for that. To build the solution vector in one line, I use the fact that this is in fact a sparse representation of the binary vector, the one we will cumsum at the very end. This sparse vector is built using our computed index vector as row positions, a vector of 1s as column positions, and 1 as the value to put at these locations. The fourth and fifth arguments specify the total size of the vector (important if the last element of input_lengths is not 1). Then we take the full representation of this sparse vector (otherwise the result is a sparse vector with no empty element) and we can cumsum.
This solution has no use other than to offer another approach to the problem. A benchmark can show that it is slower than my original solution, because of a heavier memory load.
I have 12 sets of vectors (about 10-20 vectors each) and I want to pick one vector from each set so that a function f, which takes the sum of these vectors as its argument, is maximized. In addition, I have constraints on some components of that sum.
Example:
a_1 = [3 2 0 5], a_2 = [3 0 0 2], a_3 = [6 0 1 1], ... , a_20 = [2 12 4 3]
b_1 = [4 0 4 -2], b_2 = [0 0 1 0], b_3 = [2 0 0 4], ... , b_16 = [0 9 2 3]
...
l_1 = [4 0 2 0], l_2 = [0 1 -2 0], l_3 = [4 4 0 1], ... , l_19 = [3 0 9 0]
s = [s_1 s_2 s_3 s_4] = a_x + b_y + ... + l_z
Constraints:
s_1 > 40
s_2 < 100
s_4 > -20
Target: Choose x, y, ... , z to maximize f(s):
f(s) -> max
Where f is a nonlinear function that takes the vector s and returns a scalar.
Brute-forcing takes too long because there are about 5.9 trillion combinations, and since I need the maximum (or even better, the top 10 combinations) I cannot use any of the greedy algorithms that came to my mind.
The vectors are quite sparse; about 70-90% of the entries are zeros. Maybe that helps somehow?
The Matlab Optimization Toolbox didn't help either, since it doesn't offer much support for discrete optimization.
Basically this is a lock-picking problem, where the lock's pins have 20 distinct positions, and there are 12 pins. Also:
some of the pin's positions will be blocked, depending on the positions of all the other pins.
Depending on the specifics of the lock, there may be multiple keys that fit
...interesting!
Based on Rasman's approach and Phpdna's comment, and the assumption that you are using int8 as data type, under the given constraints there are
>> d = double(intmax('int8'));
>> (d-40) * (d+100) * (d+20) * 2*d
ans =
737388162
possible vectors s (give or take a few, haven't thought about +1's etc.). ~740 million evaluations of your relatively simple f(s) shouldn't take more than 2 seconds, and having found all s that maximize f(s), you are left with the problem of finding linear combinations in your vector set that add up to one of those solutions s.
Of course, this finding of combinations is no easy feat, and the whole method breaks down anyway if you are dealing with
int16: ans = 2.311325368800510e+018
int32: ans = 4.253529737045237e+037
int64: ans = 1.447401115466452e+076
So, I'll discuss a more direct and more general approach here.
Since we're talking integers and a fairly large search space, I'd suggest using a branch-and-bound algorithm. But unlike the bintprog algorithm, you'd have to use different branching strategies, and of course, these should be based on a non-linear objective function.
Unfortunately, there is nothing like this in the Optimization Toolbox (or on the File Exchange, as far as I could find). fmincon is a no-go, since it uses gradient and Hessian information (which will usually be all-zero for integers), and fminsearch is a no-go, since you'll need a really good initial estimate and the rate of convergence is (roughly) O(N), meaning that for this 12-dimensional problem (one index per set) you'll have to wait quite long before convergence, without the guarantee of having found the global solution.
An interval method could be a possibility, however, I personally have very little experience with this. There is no native interval-related stuff in MATLAB or any of its toolboxes, but there's the freely available INTLAB.
So, if you're not feeling like implementing your own non-linear binary integer programming algorithm, or are not in the mood for an adventure with INTLAB, there's really only one thing left: heuristic methods. In this link there is a similar situation, with an outline of the solution: use the genetic algorithm (ga) from the Global Optimization toolbox.
I would implement the problem roughly like so:
function [sol, fval, exitflag] = bintprog_nonlinear()
%// insert your data here
%// Any sparsity you may have here will only make this more
%// *memory* efficient, not *computationally*
data = [...
... %// this will be an array with size 4-by-20-by-12
... %// (or some permutation of that you find more intuitive)
];
%// offsets into the 3D array to facilitate indexing a bit
offsets = bsxfun(@plus, ...
repmat(1:size(data,1), size(data,3),1), ...
(0:size(data,3)-1)' * size(data,1)*size(data,2)); %//'
%// your objective function
function val = obj(X)
%// limit "X" to integers in [1 20]
X = min(max(round(X),1),size(data,2));
%// "X" will be a collection of 12 integers between 1 and 20, which are
%// indices into the data matrix
%// form "s" from "X" by gathering the selected vectors and summing them
s = sum(data(bsxfun(@plus, offsets, X(:)*size(data,1) - size(data,1))), 1);
%// XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX
%// Compute the NEGATIVE VALUE of your function here
%// XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX
end
%// your "non-linear" constraint function
function [C, Ceq] = nonlcon(X)
%// limit "X" to integers in [1 20]
X = min(max(round(X),1),size(data,2));
%// form "s" from "X" by gathering the selected vectors and summing them
s = sum(data(bsxfun(@plus, offsets, X(:)*size(data,1) - size(data,1))), 1);
%// we have no equality constraints
Ceq = [];
%// Compute inequality constraints
%// NOTE: solver is trying to solve C <= 0, so:
C = [...
40 - s(1)
s(2) - 100
-20 - s(4)
];
end
%// useful GA options
options = gaoptimset(...
'UseParallel', 'always'...
...
);
%// The rest really depends on the specifics of the problem.
%// Useful to look at will be at least 'TolCon', 'Vectorized', and of course,
%// 'PopulationType', 'Generations', etc.
%// THE OPTIMIZATION
[sol, fval, exitflag] = ga(...
@obj, size(data,3), ... %// objective function, taking a vector of 12 values
[],[], [],[], ... %// no linear (in)equality constraints
1,size(data,2), ... %// lower and upper limits
@nonlcon, options); %// your "nonlinear" constraints
end
Note that even though your constraints are essentially linear, the way by which you must compute the value for your s necessitates the use of a custom constraint function (nonlcon).
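The index arithmetic that forms s is the least obvious part, so here is a stand-alone sketch of it. The data array is random stand-in data (the real vectors are not given), X is a made-up choice of one candidate per set, and the 4-by-20-by-12 layout is the one assumed in the comments above. It gathers the chosen vectors through linear indices and compares the sum against a plain loop:

```matlab
% Hypothetical stand-in data: 4 components, 20 candidates per set, 12 sets
data = round(10 * rand(4, 20, 12));
n_comp = size(data, 1);
n_cand = size(data, 2);
n_sets = size(data, 3);

% One candidate index per set (a made-up choice, standing in for "X")
X = min(max(ceil(n_cand * rand(n_sets, 1)), 1), n_cand);

% offsets(k, j) is the linear index of data(j, 1, k)
offsets = bsxfun(@plus, ...
    repmat(1:n_comp, n_sets, 1), ...
    (0:n_sets-1)' * n_comp * n_cand);

% Shift row k to candidate X(k), then gather and sum the chosen vectors
idx = bsxfun(@plus, offsets, (X - 1) * n_comp);
s = sum(data(idx), 1);

% Reference: the same sum with a plain loop
s_ref = zeros(1, n_comp);
for k = 1:n_sets
    s_ref = s_ref + data(:, X(k), k)';
end

% Constraint vector in the "C <= 0" form used by nonlcon
C = [40 - s(1); s(2) - 100; -20 - s(4)];
```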
Especially note that this is currently (probably) a sub-optimal way to use ga -- I don't know the specifics of your objective function, so a lot more may be possible. For instance, I currently use a simple round() to convert the input X to integers, but using 'PopulationType', 'custom' (with a custom 'CreationFcn', 'MutationFcn' etc.) might produce better results. Also, 'Vectorized' will likely speed things up a lot, but I don't know whether your function is easily vectorized.
And yes, I use nested functions (I just love those things!); it prevents these huge, usually identical lists of input arguments if you use sub-functions or stand-alone functions, and they can really be a performance boost because there is little copying of data. But, I realize that their scoping rules make them somewhat akin to goto constructs, and so they are -ahum- "not everyone's cup of tea"...you might want to convert them to sub-functions to prevent long and useless discussions with your co-workers :)
Anyway, this should be a good place to start. Let me know if this is useful at all.
Unless you define some intelligence on how the vector sets are organized, there will be no intelligent way of solving your problem other than pure brute force.
Say you find s s.t. f(s) is max given constraints of s, you still need to figure out how to build s with twelve 4-element vectors (an overdetermined system if there ever was one), where each vector has 20 possible values. Sparsity may help, although I'm not sure how it is possible to have a vector with four elements be 70-90% zero, and sparsity would only be useful if there was some yet to be described methodology in how the vector are organized
So I'm not saying you can't solve the problem, I'm saying you need to rethink how the problem is set-up.
I know, this answer is reaching you really late.
Unfortunately, the problem as stated does not show many patterns to exploit beyond brute force (Branch & Bound, Master & Slave, etc.). Trying a Master-Slave approach, i.e. first solving the continuous nonlinear problem as the master and then solving the discrete selection as the slave, could help, but with this many combinations, and without more information about the vectors, there is not much room to work with.
But given that the functions are continuous almost everywhere, being built from sums, multiplications and their inverses, the sparsity is a clear point to exploit here. If 70-90% of the vector entries are zero, a good part of the solution space will be close to zero or close to infinity. Hence an 80-20 pseudo-solution would easily discard the 'zero' combinations and use only the 'infinite' ones.
This way, the brute-force could be guided.
I'm working with a 30*26000 size matrix that has NaNs at the beginning and at the end. NaNs are also sprinkled throughout each row. I can fill in the NaNs with linear interpolation but that will leave NaNs at the beginning and end of each row. Extrapolating to replace these NaNs at the ends is not ideal for my data set.
I want to just trim the matrix. Take for example a 3 by 6 matrix:
NaN NaN 1 2 3 NaN
NaN 1 2 3 NaN NaN
1 NaN 2 3 4 5
Cut off the left most and right most columns such that no row begins or ends with a NaN.
1 2
2 3
2 3
So we are left with a 3 by 2 matrix.
How can I do this in Matlab? (speed-optimized; I will need to apply this to a million size matrix)
Thanks!
For your example you can do the following:
Let a be your matrix with NaN and numerical values.
ind1 = sum(isnan(a),1); % count the NaN values along columns
s = find(ind1 == 0, 1, 'first'); % find the first column without any NaN
e = find(ind1 == 0, 1, 'last'); % find the last column without any NaN
So now just keep this part of the matrix from s-th to e-th column:
b = a(:,s:e);
Additional check may be needed for the case no column is clear of NaNs.
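One possible form of that check, as a sketch: find returns an empty result when no column is NaN-free, so the code can fall back to an empty matrix. Returning an empty result with the same number of rows is my assumption about the desired behaviour, not something the question specifies:

```matlab
a = [NaN NaN 1 2 3 NaN;
     NaN 1   2 3 NaN NaN;
     1   NaN 2 3 4   5];

clean = ~any(isnan(a), 1);            % true for columns without any NaN
s = find(clean, 1, 'first');
e = find(clean, 1, 'last');

if isempty(s)
    b = zeros(size(a, 1), 0);         % assumed fallback: no clean column at all
else
    b = a(:, s:e);
end
```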
Firstly, the vectorized solution of argyris will work perfectly well (+1). I'm only posting this because you emphasized that you wanted a speed optimized solution. Well, the downside of argyris solution is that the sum and isnan operation are performed on the entire matrix. This will be optimal if you have to come a long way in on either side to find the first non-NaN column. But what if you don't? A loop-based solution that exploits the fact that you may only need to come in a few columns may do better (particularly given how good the JIT accelerator is getting at executing single loops quickly). I've put together a speed test that includes both argyris and my solution:
%#Set up an example case using the matrix size you indicated in the question
T = 30;
N = 26000;
X = rand(T, N);
TrueL = 8;
TrueR = N - 8;
X(:, 1:TrueL) = NaN;
X(:, TrueR+1:end) = NaN;
%#argyris solution
tic
I1 = sum(isnan(X));
argL = find(I1 == 0, 1, 'first');
argR = find(I1 == 0, 1, 'last');
Soln1 = X(:, argL:argR);
toc
%#My loop based solution (faster if TrueL and TrueR are small)
tic
for n = 1:N
if ~any(isnan(X(:, n)))
break
end
end
ColinL = n;
for n = N:-1:1
if ~any(isnan(X(:, n)))
break
end
end
ColinR = n;
Soln2 = X(:, ColinL:ColinR);
toc
In the above example, the solution will need to get rid of the first 8 and last 8 columns. The outcome of the speed test?
Elapsed time is 0.002919 seconds. %#argyris solution
Elapsed time is 0.001007 seconds. %#My solution
The loop based solution is almost 3 times faster. Okay, now let's up the number of columns that we need to get rid of on either side to 100:
Elapsed time is 0.002769 seconds. %#argyris solution
Elapsed time is 0.001999 seconds. %#My solution
Still ahead. What about 1000 columns on either side?
Elapsed time is 0.003597 seconds. %#argyris solution
Elapsed time is 0.003719 seconds. %#My solution
So we've found our tipping point (on my machine at least - Quad core i7, Linux Mint v12, Matlab R2012b). Once we need to come in about 1000 columns on either side, we're better off using the vectorized solution.
One final note of CAUTION: If the routine is occurring inside another (possibly unrelated) loop, then speed comparisons should be re-done. This is because my solution will now involve a double loop. Even if the loops are unrelated, the JIT accelerator is not so good with double loops. I did some quick tests on my machine, and my solution still comes out ahead for small TrueL and TrueR (ie less than 100), but the advantage is not as large as it was when the outer loop was not present.
Anyway, hope this proves useful to you or anyone else who comes a-reading.
Cheers!
EDIT: I've done a few speed tests incorporating angainor's very neat one-liner (+1). It performs almost as well as my loop based solution when the number of columns to be removed is small. Surprisingly, it didn't scale that well when the number of columns to be removed is large, unlike argyris's solution. That may have something to do with the computer I'm on now though: work Windows machine - I've never really trusted it fully :-)
Both earlier proposed solutions are great, I am posting this one-liner for completeness:
A(:,isfinite(sum(A)))
ans =
1 2
2 3
2 3
It avoids going through the matrix entries twice (as Colin pointed out) by first calculating the column sums and then calling isfinite. I also removed the find calls; they are not necessary since you can use logical indexing instead.
I do not have my computer here, so I leave out the performance tests.
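On the example from the question the one-liner and the trimming approach agree, but they are not the same operation in general. A minimal sketch with a made-up matrix: the one-liner drops every column containing a NaN (or Inf), including interior ones, whereas trimming only removes leading and trailing columns.

```matlab
A = [NaN 1 NaN 3 NaN;
       1 2   3 4   5];        % made-up matrix with a NaN in an interior column

% One-liner: keeps only the columns whose sum is finite (columns 2 and 4)
filtered = A(:, isfinite(sum(A)));

% Trimming: keeps everything between the first and last NaN-free column
clean = ~any(isnan(A), 1);
trimmed = A(:, find(clean, 1, 'first'):find(clean, 1, 'last'));
```

Which of the two is wanted depends on whether interior NaNs are going to be interpolated afterwards, as described in the question.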
I'm wondering, what is faster for addressing a single Element of a vector:
1) direct access via
result = a(index)
or
2) access an element via a matrix multiplication e.g
a = [1 2 3 4]';
b = [0 0 1 0];
result = b*a; % Would return 3
In my opinion (which comes from "classic" programming like C++) the first method must be faster because of the direct access; the second method would need an iteration through both vectors(?).
The reason why I'm asking is that Matlab performs very well on matrix and vector operations, so maybe I am missing some aspect and the second method is more efficient...
A quick test:
function [] = fun1()
a = [1 2 3 4]';
b = [0 0 1 0];
tic;
for i=1:1000000
r = a(3);
end
toc;
end
Elapsed time: 0.006 seconds
Change a(3) to b*a
Elapsed time: 0.9 seconds
The performance difference is quite obvious (and you should have tested that yourself before asking this question).
Reason behind that:
No matter how efficient MATLAB's calculation is, MATLAB still needs to fetch the numbers one by one, do the multiplications one by one, and sum up. There is no hope of that being faster than a single access.
In your special case all elements are 0 except one, but in my opinion it is useless to optimize for a single special case, and the best optimization I can come up with still needs to access every element at least once.
EDIT:
It seems I am in quite good mood today....
Change a(3) to a(1)*b(1)+a(2)*b(2)+a(3)*b(3)+a(4)*b(4)
Elapsed time: 0.02 seconds
It seems that boundary checking (and/or other errands) take more time than the access and calculation.
Why would you think that multiplying a lot of numbers by zeros would be at all efficient? Even if MATLAB could be smart enough to do a test first before the multiply, it must then still do many tests.
I'm asking this question to make a point, that the dot product cannot possibly be at all efficient. Even if MATLAB were smart enough to know that there was only one element that was non-zero, to know that, it would need to do a search for the non-zero element. And how would MATLAB be smart enough to know that what you have written as a vector*vector dot product is actually intended just to access a single element, instead of a true dot product for nefarious purposes unknown to it?
How about
3) access an element by a boolean index matrix:
a = [1 2 3 4]';
b = logical([0 0 1 0]); % must be logical; a double 0/1 vector is treated as subscripts and errors
result = a(b)
It's almost certainly going to be faster than (2), slower than (1).
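For reference, a sketch putting the three access styles side by side on the example vector. It only confirms that they return the same value and says nothing about speed; the variable names are mine:

```matlab
a = [1 2 3 4]';
b = [0 0 1 0];

r_direct  = a(3);             % 1) direct subscript
r_dot     = b * a;            % 2) dot product with a selector vector
r_logical = a(logical(b));    % 3) logical mask; b is converted to logical first
```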