MATLAB: Compute mean over a huge array

I need to compute the mean of each column of a huge array, where I must first replace all numbers less than zero with zero. My toy example makes it obvious that this computation takes quite some time.
tmp = -5 + 10 * rand(5000,100000);
tmp(tmp<0) = 0;
result = mean(tmp);
I am wondering whether there is a better way to gain some speed?

Finding values in arrays and then replacing them is a very expensive operation. Instead, clamp the values with max, as the following comparison shows:
% slooooooow
tic
tmp(tmp<0)=0;
mean(tmp);
toc
% faaaaaaaaaast
tic
tmp=max(tmp,0);
mean(tmp);
toc
On my PC, this reports:
Elapsed time is 5.940434 seconds.
Elapsed time is 0.358057 seconds.
Remember that if you want a single mean value over the whole array, you should call mean(tmp(:)).
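To make the distinction concrete, a small sketch combining both points:
tmp = -5 + 10 * rand(5000, 100000);
tmp = max(tmp, 0);        % clamp negatives to zero in one pass
colMeans = mean(tmp);     % 1-by-100000 row vector: one mean per column
allMean  = mean(tmp(:));  % a single scalar mean over every element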

Related

Efficient way of computing second min value

Given a matrix, it's easy to compute the value and index of the min value:
A = rand(10);
[value, index] = min(A(:));
However, I would also like to recover the second min value (likewise for max).
I can of course take either of these two approaches:
Converting A to a vector and sorting it.
PROS: I can then recover the second, third... n minimum value
CONS: If A is large, sorting is expensive
Once the location of the min of A is found, I can replace its value with a large one (e.g. Inf) and then run min again.
PROS: Cheaper than sort
CONS: I must modify my matrix (and save the modified value in an aux variable). Also re-running min is costly on a large matrix.
I'm wondering if there is a better solution:
When computing min, the algorithm has to keep track of the minimum value found so far, updating it whenever a new value is lower.
If we instead keep track of the n smallest values found so far, we can recover the n minimum values in the same pass.
I can implement this, but I'm wondering if it's the best approach or if it's already implemented.
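For reference: since R2017b, MATLAB ships mink, which performs exactly this partial selection without a full sort; maxk is the analogue for max. A minimal sketch:
A = rand(10);
[vals, idx] = mink(A(:), 2); % two smallest values and their linear indices
secondMin      = vals(2);
secondMinIndex = idx(2);
% maxk(A(:), 2) is the analogue for the two largest values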
I don't know in which cases it would be less expensive than sorting, but an easy (though not especially fast) way would be to use the following code. I may be wrong, but I don't think you can get faster with built-in functions if you just want the first and the second min.
A = rand(10);
[firstMin, firstMinIndex] = min(A(:));
secondMin = min(A(A~=firstMin));
secondMinIndex = find(A==secondMin); % slow, but use only if you need the index
Here, you go through the matrix two more times: once for the boolean operation and once for the second min.
After some testing on 2000x2000 and 4000x4000 random matrices, it seems that this code snippet is around 3.5 times faster than the sort function applied to the same matrix.
If you really need more efficiency, you'd have to write your own mex routine, with which you can theoretically get the two values in n + log2(n) - 2 comparisons, as explained in the link provided by @luismendotomas.
Hope this helps!
In a single pass:
a = [53 53 49 49 97 75 4 22 4 37];
first = Inf;
second = Inf;
for i = 1:1:numel(a)
    if (a(i) < first)
        second = first;
        first = a(i);
    elseif (a(i) < second && a(i) ~= first)
        second = a(i);
    end
end
fprintf('First smallest %d\n', first);
fprintf('Second smallest %d\n', second);
You can remove the a(i) ~= first condition if you would rather have 4, 4 as output instead of 4, 22.
Also, see this SO question
As already mentioned, I suppose the best (read: "most efficient") method is to implement the methods from @luismendotomas' link.
However, if you want to avoid doing too much programming yourself, you could apply some k-nearest-neighbours algorithm, given that you have a lower bound on your data: e.g. if all your data points are positive, you can find the 2 nearest neighbours to 0. I am not sure whether this is faster than your initial suggestions, though.
For one k-nearest neighbour algorithm see e.g. this
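As a rough sketch of that idea, assuming the Statistics and Machine Learning Toolbox is available and that 0 really is a lower bound for the data:
A = rand(10);                     % all entries in [0,1], so 0 is a lower bound
idx = knnsearch(A(:), 0, 'K', 2); % linear indices of the two entries closest to 0
twoSmallest = A(idx);             % the two smallest values of A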
beesleep has already pointed out that method 2 (computing the minimum twice) is more efficient than method 1 (sorting). However, the implementation provided in that answer to compute the index of the second minimum via find is, as mentioned, very inefficient.
In fact, to get the index of the second minimum, it is ca. 10x faster to set the first minimum value to inf (as suggested in the question) and then get the index of the second minimum from the min function (as opposed to using find)
[firstMin, firstMinIndex] = min(A(:));
A(firstMinIndex) = inf;
[secondMin, secondMinIndex] = min(A(:));
Here is the code which I used to compare this implementation to the one suggested by beesleep:
for i = 1:10
    A = rand(10000);

    tic
    [firstMin, firstMinIndex] = min(A(:));
    secondMin = min(A(A~=firstMin));
    secondMinIndex = find(A==secondMin); % slow, but use only if you need the index
    t1(i) = toc;

    tic
    [firstMin, firstMinIndex] = min(A(:));
    A(firstMinIndex) = inf;
    [secondMin, secondMinIndex] = min(A(:));
    t2(i) = toc;
end
disp(mean(t1) / mean(t2))

Vectorizing code - How to reduce MATLAB computational time

I have this piece of code
N = 10^4;
GRID = []; % initialize so the concatenation below works
for i = 1:N
    [E,X,T] = fffun(); % Stochastic simulation; each call returns three vectors of length 10^3.
    X_(i,:) = X;
    T_(i,:) = T;
    GRID = [GRID T];
end
GRID = unique(GRID);
% Second part
for i = 1:N
    for j = 1:kmax
        f = find(GRID==T_(i,j) | GRID==T_(i,j+1));
        s = f(1);
        e = f(2)-1;
        counter(X_(i,j), s:e) = counter(X_(i,j), s:e) + 1;
    end
end
The code performs N different simulations of a stochastic process, each consisting of 10^3 events occurring at discrete moments (the T vector) that depend on the specific simulation.
Now (second part) I want to know, as a function of the time instant, how many simulations are in a particular state (X takes values between 1 and 10). The idea I had: create a grid vector with all the moments at which something happens in any simulation. Then, looping over the simulations, loop over the time steps at which something happens and increment all the counter indices that correspond to this particular slice of time.
However, this second part is very heavy (I mean days of processing on a standard quad-core CPU), and it shouldn't be.
Are there any ideas (maybe about comparing vectors in a more efficient way) to cut the CPU time?
Here is a standalone 'second_part':
N = 5000;
counter = zeros(11, length(GRID));
for i = 1:N
    disp(['Counting sim #' num2str(i)]);
    for j = 1:kmax
        f = find(GRID==T_(i,j) | GRID==T_(i,j+1), 2);
        s = f(1);
        e = f(2)-1;
        counter(X_(i,j), s:e) = counter(X_(i,j), s:e) + 1;
    end
end
counter = counter/N;

stop = find(GRID==Tmin);
stop = stop - 1;
plot(counter(:, (stop-500):stop)')
with associated dummy data (filedropper.com/data_38). In the real context, the matrix has 2x the rows and 10x the columns.
Here is what I understand:
T_ is a matrix of time steps from N simulations.
X_ is a matrix of simulation state at T_ in those simulations.
so if you do:
[ut,~,ic]= unique(T_(:));
you get ic, a vector mapping each element of T_(:) to its index in the vector of unique values ut. Then you can write:
counter = accumarray([ic X_(:)],1);
and get counter with as many rows as your unique time steps and as many columns as the unique states in X_ (which are all, and must be, integers). Now you can say that for each time step ut(k), the number of times the simulations were in state m is counter(k,m).
In your data, the only combination of m and k that has a value greater than 1 is (1,1).
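To see the pattern on a hypothetical toy input (same variable names, sizes shrunk for illustration):
T_ = [0 1; 0 2]; % two simulations, two recorded time steps each
X_ = [3 1; 3 2]; % states at those time steps
[ut, ~, ic] = unique(T_(:));
counter = accumarray([ic X_(:)], 1);
% counter(1,3) is 2: both simulations are in state 3 at time ut(1) = 0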
Edit:
From the comments below, I understand that you record all state changes and the time steps at which they occur. Then, every time a simulation changes state, you want to collect the states from all simulations and count how many are of each type.
The main problem here is that your time is continuous, so basically every element in T_ is unique, and you have over a million time steps to loop over. Fully vectorizing such a process would need about 80 GB of memory, which would probably choke your computer.
So I looked for a combination of vectorizing and looping through the time steps. We start by finding all unique intervals, and preallocating counter:
ut = unique(T_(:));
stt = 11; % no. of states
counter = zeros(stt, numel(ut));
r = 1:size(T_,1); % we will need that also later
Then we loop over all the elements in ut, each time locating the relevant time step of every simulation in T_ in a vectorized way, and finally we use histcounts to count the states:
for k = 1:numel(ut)
    temp = T_ <= ut(k);  % mark all time steps up to ut(k)
    s = cumsum(temp, 2); % count along the columns
    col_ind = s(:,end);  % find the column index for each simulation
    % convert the columns to linear indices:
    linind = sub2ind(size(T_), r, col_ind.');
    % count the states:
    counter(:,k) = histcounts(X_(linind), 1:stt+1);
end
This takes about 4 seconds on my computer for 1000 simulations, so it adds up to a little more than one hour for the whole process. Not very quick...
You can also try one or two of the tweaks below to squeeze the run time a little more:
As you can read here, accumarray seems to work faster on small arrays than histcounts, so you may want to switch to it.
Also, computing linear indices directly is quicker than sub2ind, so you may want to try that.
Implementing these suggestions in the loop above, we get:
R = size(T_,1);
r = (1:R).';
K = numel(ut);
for k = 1:K
    temp = T_ <= ut(k);  % mark all time steps up to ut(k)
    s = cumsum(temp, 2); % count along the columns
    col_ind = s(:,end);  % find the column index for each simulation
    % convert the columns to linear indices:
    linind = R*(col_ind-1) + r;
    % count the states:
    counter(:,k) = accumarray(X_(linind), 1, [stt 1]);
end
On my computer, switching to accumarray and/or removing sub2ind gave a slight improvement, but it was not consistent (using timeit for testing on 100 or 1K elements in ut), so you had better test it yourself. However, this still remains very long.
One thing you may want to consider is discretizing your time steps, so you will have far fewer unique elements to loop over. In your data, about 8% of the time intervals are smaller than 1. If you can assume that this is short enough to be treated as one time step, then you could round your T_ and get only ~12.5K unique elements, which takes about a minute to loop over. You can do the same with 0.1 intervals (less than 1% of the time intervals are shorter) and get 122K elements to loop over, which will take about 8 hours...
Of course, all the timing above are rough estimates using the same algorithm. If you do choose to round the times there may be even better ways to solve this.
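For concreteness, a minimal sketch of the rounding idea, assuming a whole-second bin is acceptable for your process:
T_r = round(T_);      % quantize time stamps to whole seconds
ut  = unique(T_r(:)); % ~12.5K unique elements instead of over a million
% the counting loop above then runs against T_r and this shorter ut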

SVD speed in CPU and GPU

I'm testing svd in Matlab R2014a and it seems that there is no CPU vs GPU speedup. I'm using a GTX 460 card and a Core 2 duo E8500.
Here is my code:
%test SVD
n=10000;
%host
Mh= rand(n,1000);
tic
%[Uh,Sh,Vh]= svd(Mh);
svd(Mh);
toc
%device
Md = gpuArray.rand(n,1000);
tic
%[Ud,Sd,Vd]= svd(Md);
svd(Md);
toc
Also, the run times differ from run to run, but the CPU and GPU versions take about the same time. Why is there no speedup?
Here are some tests
for i = 1:10
    clear;
    m = 10000;
    n = 100;
    %host
    Mh = rand(m,n);
    tic
    [Uh,Sh,Vh] = svd(Mh);
    toc
    %device
    Md = gpuArray.rand(m,n);
    tic
    [Ud,Sd,Vd] = svd(Md);
    toc
end
>> test_gpu_svd
Elapsed time is 43.124130 seconds.
Elapsed time is 43.842277 seconds.
Elapsed time is 42.993283 seconds.
Elapsed time is 44.293410 seconds.
Elapsed time is 42.924541 seconds.
Elapsed time is 43.730343 seconds.
Elapsed time is 43.125938 seconds.
Elapsed time is 43.645095 seconds.
Elapsed time is 43.492129 seconds.
Elapsed time is 43.459277 seconds.
Elapsed time is 43.327012 seconds.
Elapsed time is 44.040959 seconds.
Elapsed time is 43.242291 seconds.
Elapsed time is 43.390881 seconds.
Elapsed time is 43.275379 seconds.
Elapsed time is 43.408705 seconds.
Elapsed time is 43.320387 seconds.
Elapsed time is 44.232156 seconds.
Elapsed time is 42.984002 seconds.
Elapsed time is 43.702430 seconds.
for i = 1:10
    clear;
    m = 10000;
    n = 100;
    %host
    Mh = rand(m,n,'single');
    tic
    [Uh,Sh,Vh] = svd(Mh);
    toc
    %device
    Md = gpuArray.rand(m,n,'single');
    tic
    [Ud,Sd,Vd] = svd(Md);
    toc
end
>> test_gpu_svd
Elapsed time is 21.140301 seconds.
Elapsed time is 21.334361 seconds.
Elapsed time is 21.275991 seconds.
Elapsed time is 21.582602 seconds.
Elapsed time is 21.093408 seconds.
Elapsed time is 21.305413 seconds.
Elapsed time is 21.482931 seconds.
Elapsed time is 21.327842 seconds.
Elapsed time is 21.120969 seconds.
Elapsed time is 21.701752 seconds.
Elapsed time is 21.117268 seconds.
Elapsed time is 21.384318 seconds.
Elapsed time is 21.359225 seconds.
Elapsed time is 21.911570 seconds.
Elapsed time is 21.086259 seconds.
Elapsed time is 21.263040 seconds.
Elapsed time is 21.472175 seconds.
Elapsed time is 21.561370 seconds.
Elapsed time is 21.330314 seconds.
Elapsed time is 21.546260 seconds.
Generally, SVD is a difficult routine to parallelize. You can check here that with a high-end Tesla card, the speedup is not very impressive.
You have a GTX 460 card - Fermi architecture. The card is optimized for gaming (single precision computations), not HPC (double precision computations). The single precision / double precision throughput ratio is 12, so the card has 873 GFLOPS SP / 72 GFLOPS DP. Check here.
So if the Md array uses double precision elements, the computation on it will be rather slow. Also, there's a high chance that the CPU routine uses all CPU cores, reducing the possible gain of running the routine on the GPU. Plus, in the GPU run you pay the time for transferring the buffer to the device.
Per Divakar's suggestion, you could use Md = single(Md) to convert your array to single precision and run the benchmark again. You can also try a bigger dataset size to see if something changes. I don't expect too much gain for this routine on your GPU.
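A minimal sketch of that check (assuming the Parallel Computing Toolbox, as in the question):
Md = gpuArray.rand(10000, 1000, 'single'); % single-precision data, created on the GPU
tic
svd(Md);
wait(gpuDevice); % make sure the GPU has finished before stopping the timer
toc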
Update 1:
After you posted the results, I saw that the DP/SP time ratio is 2. On the CPU side this is normal, because you can fit half as many double values into SSE registers. However, a ratio of only 2 on the GPU side means that the GPU code does not make the best use of the SM cores, because the theoretical ratio is 12. In other words, I would have expected much better SP performance from optimized code, compared to DP. It seems that this is not the case.
As VAndrei has already stated, the SVD is an algorithm which is difficult to parallelize.
Your main problem is the size of your matrix. The performance of the SVD drops rapidly with a growing matrix size. So your main goal should be to reduce the size of the matrix.
This can be accomplished using Gaussian normal equations (which is basically a reduction of an overdetermined linear system in the least-squares sense).
This can be done by simply multiplying the transpose onto the matrix:
MhReduced = Mh' * Mh;
This reduces your matrix to the size of cols*cols (if cols is the number of columns of Mh). Then you just call [U,S,V] = svd(MhReduced);
Note: Using this method may yield singular vectors with opposite sign (just important if you're comparing these methods).
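A sketch of how the reduced problem maps back to the original: the singular values of Mh are the square roots of those of Mh'*Mh, and the right singular vectors coincide up to sign.
Mh = rand(30000, 100);
MhReduced = Mh' * Mh;      % 100-by-100 instead of 30000-by-100
[~, S2, V2] = svd(MhReduced);
singVals = sqrt(diag(S2)); % singular values of Mh, up to rounding error
% V2 holds the right singular vectors of Mh, possibly with flipped signs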
If your matrix is well-conditioned this should work without problems. In the case of an ill-conditioned matrix, however, this method may fail to produce a usable result, whereas applying SVD directly could still yield one, thanks to SVD's robustness.
This should increase your performance immensely, at least for matrices big enough. Another advantage is that you can use much larger matrices. You probably won't have to use the GPU at all (since either the matrices are so big that copying to the GPU costs too much, or after the reduction the matrix is so small that the GPU speedup won't be big enough).
Also note that a large chunk of performance is lost if you use return values. If you're only interested in the performance of the SVD calculation itself, don't take any return values. If you are only interested in the "solution vector", just get V (and access the last column): [~,~,V] = svd(Mh);
EDIT:
I've looked at your sample code, but I'm not sure what it is you are calculating. I also realized that it's rather hard to understand what I did with A'*A, so I will explain in detail.
Given a linear system A*x=b, with A denoting the coefficient matrix
with m rows and n cols, x the solution vector (n rows) and b the constant vector (m rows), a solution can be calculated as follows:
if A is square (m=n): x = A^-1 * b,
if A is not square (m!=n, m > n):
A * x = b
A'* A * x = A' * b
x = (A' * A)^-1 * A'*b
A" = (A'*A)^-1 * A' is typically called pseudo-inverse. However this calculation does influence the condition number of the matrix negatively. A solution to this problem is using a singular value decomposition (SVD).
If [U,S,V] = svd(A) denotes the result of the SVD, the pseudo-inverse is given by VS"U', where S" is formed by taking the inverse of the non-zero elements of S.
So A" = VS"U'.
x = A"*b
However, an SVD is rather costly, especially with large matrices. If matrix A is well-conditioned and very precise results are not necessarily required (we're talking 1e-13 or 1e-14), the much faster approach of calculating the pseudo-inverse via (A'*A)^-1 * A' can be used.
If your case actually is A*x=0, just use an SVD and read the last column vector of V; it is the solution.
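In code, that is simply:
[~, ~, V] = svd(A);
x = V(:, end); % right singular vector for the smallest singular value solves A*x ~ 0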
If you use the SVD not to solve a linear system but for the results of U and S (as your example suggests), I'm not sure what I've posted will help you.
Sources:
1, 2, 3
Here is some sample code for you to test. Test it with large matrices: you will see that using (A'*A)^-1 * A' is much faster than the alternatives.
clear all
nbRows = 30000;
nbCols = 100;
% Matrix A
A = rand(nbRows,nbCols);
% Vector b
b = rand(nbRows,1);
% A*x=b
% Solve for x, using SVD
% [U,S,V]=svd(A,0);
% x= V*((U'*b)./diag(S))
tic
[U1,S1,V1]=svd(A,0);
x1= V1*((U1'*b)./diag(S1));
toc
tic
[U1,S1,V1]=svd(A,0);
x2 = V1*inv(S1)*U1'*b;
toc
% Solve for x, using manual pseudo-inverse
% A*x=b
% A'*A*x = A'*b
% x = (A'*A)^-1 * A'*b
tic
x3 = inv(A'*A) * A'*b;
toc
% Solve for x, let Matlab decide how (most likely SVD)
tic
x4 = A\b;
toc
The issue
First of all, I have replicated your issue in Matlab2016b using the following code:
clear all
close all
clc
Nrows = 2500;
Ncols = 2500;
NumTests = 10;
h_A = rand(Nrows, Ncols);
d_A = gpuArray.rand(Nrows, Ncols);
timingCPU = 0;
timingGPU = 0;
for k = 1 : NumTests
    % --- Host
    tic
    [h_U, h_S, h_V] = svd(h_A);
    % h_S = svd(h_A);
    timingCPU = timingCPU + toc;
    % --- Device
    tic
    [d_U, d_S, d_V] = svd(d_A);
    % d_S = svd(d_A);
    timingGPU = timingGPU + toc;
end
fprintf('Timing CPU = %f; Timing GPU = %f\n', timingCPU / NumTests, timingGPU / NumTests);
With the above code, it is possible to either compute the singular values only or compute the full SVD including the singular vectors. It is also possible to compare the different behavior of the CPU and GPU versions of the SVD code.
The timing is reported in the following table (timing in s; Intel Core i7-6700K CPU @ 4.00GHz, 16288 MB, Max threads(8), GTX 960):

                        Matlab svd                |        cuSOLVER
              Sing. values only |    Full SVD     | Sing. val. only | Full SVD
Matrix size    CPU       GPU    |   CPU      GPU  |                 |
200 x 200      0.0021    0.043  |  0.0051   0.024 |      0.098      |    0.15
1000 x 1000    0.0915    0.3    |  0.169    0.458 |      0.5        |    2.3
2500 x 2500    3.35      2.13   |  4.62     3.97  |      2.9        |   23
5000 x 5000    5.2      13.1    | 26.6     73.8   |     16.1        |  161
The first 4 columns refer to a comparison between the CPU and GPU Matlab versions of the svd routine when it is used to calculate the singular values only or the full SVD. As can be seen, the GPU version can be significantly slower than the CPU one. The motivation has already been pointed out in some answers above: there is an inherent difficulty in parallelizing the SVD computation.
Using cuSOLVER?
At this point, the obvious question is: can we get some speedup with cuSOLVER? Indeed, we could use mex files to make the cuSOLVER routines run under Matlab. Unfortunately, the situation with cuSOLVER is even worse, as can be deduced from the last two columns of the above table. Those columns report the timing of the codes at Singular values calculation only with CUDA and Parallel implementation for multiple SVDs using CUDA, using cusolverDnSgesvd for the singular-values-only calculation and the full SVD calculation, respectively. As can be seen, cuSOLVER's cusolverDnSgesvd performs even worse than Matlab, especially if one takes into account that it works in single precision while Matlab works in double precision.
The motivation for this behavior is further explained at cusolverDnCgesvd performance vs MKL, where Joe Eaton, manager of the cuSOLVER library, says:
I understand the confusion here. We do provide a decent speedup for
LU, QR and LDL^t factorizations, which is what we would like to say
for SVD as well. Our purpose with cuSOLVER is to provide dense and
sparse direct solvers as part of the CUDA toolkit for the first time;
we have to start somewhere. Since CULA is no longer supported, we felt
it was urgent to get some functionality into the hands of developers
in CUDA 7.0. Since CUDA runs on more than x86 host CPUs these days,
cuSOLVER fills a need where there is no MKL. That being said, we can
do better with SVD, but it will have to wait for the next CUDA
release, priorities and timelines being tight already.
Using other libraries
At this point, other possibilities are using other libraries like
CULA;
MAGMA;
ArrayFire.
CULA is not offered for free, so I have not tried it.
I had some installation issues with MAGMA dependencies, so I have not investigated this point further (disclaimer: I expect that, with some more time, I could be able to solve such issues).
I then finally ended up with using ArrayFire.
Using ArrayFire, I had the following timing for the full SVD computation:
200 x 200 0.036
1000 x 1000 0.2
2500 x 2500 4.5
5000 x 5000 29
As can be seen, the timing is slightly higher than, but now comparable to, the CPU case.
Here is the ArrayFire code:
#include <arrayfire.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
using namespace af;
int main(int argc, char *argv[])
{
    const int N = 1000;

    try {
        // --- Select a device and display arrayfire info
        int device = argc > 1 ? atoi(argv[1]) : 0;
        af::setDevice(device);
        af::info();

        array A = randu(N, N, f64);
        af::array U, S, Vt;

        // --- Warm-up run
        timer time_last = timer::start();
        af::svd(U, S, Vt, A);
        S.eval();
        af::sync();
        double elapsed = timer::stop(time_last);
        printf("elapsed time using start and stop = %g ms \n", 1000.*elapsed);

        // --- Timed run
        time_last = timer::start();
        af::svd(U, S, Vt, A);
        S.eval();
        af::sync();
        elapsed = timer::stop(time_last);
        printf("elapsed time using start and stop = %g ms \n", 1000.*elapsed);
    }
    catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
    }

    return 0;
}
I tried to parallelize SVD on my laptop (equipped with a GTX 460) for over a month; it was also part of my undergraduate thesis. I ran so many experiments that I eventually discovered MATLAB is extremely fast and outperforms my code. By the way, I used one-sided Jacobi, and I have not yet seen any paper that reveals an algorithm faster than MATLAB's svd. On a GPU, the cost of memory copies can be very high if you are not using an elegant model; I refer you to read more about CUDA.
If you need any help, please contact me.

MATLAB: Select elements in a vector multiple times with different index list

INPUT: a logical row vector u with length n, say [1,0,1,1,0]; and a logical matrix M of size m-by-n, say [1,0,0,1,1;0 0 0 1 1].
OUTPUT: a logical matrix of size m-by-n, the first row of which is obtained by applying the first row of matrix M as "selector", that is, [1,0,0,1,0]; and the second row is, similarly, [0 0 0 1 0].
The row vector is 20000 elements long, and the matrix is 30-by-20000. This will repeat 1000 times, and I want something that costs less than 1 second.
I've tried repmat, bsxfun, and element-wise multiplication, with no luck. I guess there is a simple way to "choose" these elements all at once, since they are all logical values.
The fastest I can give you is 4 seconds at the moment (on my machine). I'm not 100% sure if you've tried this or not, but here you go anyway.
I have an m-file randbool.m with these contents, to generate the test data.
function x = randbool(m,n)
x = logical(rand(m,n) < 0.5);
Generate the data for testing:
>> u = randbool(1,20000);
>> M = randbool(30,20000);
I can think of three ways to loop over the rows of M (use bsxfun, use repmat, use a loop) and two ways to pull out the elements you want (conjunction &, or pointwise multiplication with .*). The fastest is the combination of bsxfun and conjunction:
Bsxfun / conjunction
>> tic, for i=1:1000, bsxfun(@and,u,M); end, toc
Elapsed time is 4.068684 seconds.
Bsxfun / multiplication
>> tic, for i=1:1000, bsxfun(@times,u,M); end, toc
Elapsed time is 4.856784 seconds.
Repmat / conjunction
>> tic, for i=1:1000, utmp=repmat(u,30,1); M&utmp; end, toc
Elapsed time is 7.305158 seconds.
Repmat / multiplication
>> tic, for i=1:1000, utmp=repmat(u,30,1); M.*utmp; end, toc
Elapsed time is 8.117164 seconds.
Looping / conjunction
>> tic, for i=1:1000, for j = 1:30, out(j,:)=u&M(j,:); end; end, toc
Elapsed time is 7.110872 seconds.
Looping / multiplication
>> tic, for i=1:1000, for j = 1:30, out(j,:)=u.*M(j,:); end; end, toc
Elapsed time is 8.322888 seconds.
This looks like a logical AND operation. Perhaps something like this would work:
utemp = repmat(u, size(M,1), 1);
output = M & utemp;
I should add that for a matrix that large, you might run into memory problems. Essentially you need 3 copies of the 600K-element matrix, which could add up.
Other solutions are overcomplicating things.
All you need to do is zero out the entries in the non-chosen columns...
M(:,~u)=0;
That's it. Ten measly characters. Chris Taylor's solution using bsxfun with @and is a little slower; other methods are worse.
octave:8> u = logical(rand(1,20000)<0.5);
octave:9> M = logical(rand(30,20000)<0.5);
octave:10> tic, for i=1:1000, N=M; N(:,~u)=0; end, toc
Elapsed time is 0.66 seconds.
octave:11> tic, for i=1:1000, N=M; N=bsxfun(@and,u,N); end, toc
Elapsed time is 0.82 seconds.
octave:12> tic, for i=1:1000, N=bsxfun(@and,u,M); end, toc
Elapsed time is 0.8 seconds.
Note that I've used "N=M" to standardise the comparison, because this method modifies the matrix in place, but the assignment isn't adding anything significant to the time.

Comparing dates and filling in gap times in matlab

I have a data file which contains time data. The list is quite long, 100,000+ points. There is a data point every 0.1 seconds, and the time stamps look like this:
'2010-10-10 12:34:56'
'2010-10-10 12:34:56.1'
'2010-10-10 12:34:56.2'
'2010-10-10 12:34:56.3'
etc.
Not every 0.1-second interval is necessarily present. I need to check whether a 0.1-second interval is missing and then insert the missing time into the date vector. Comparing strings seems unnecessarily complicated. I tried comparing seconds since midnight:
date_nums=datevec(time_stamps);
secs_since_midnight=date_nums(:,4)*3600+date_nums(:,5)*60+date_nums(:,6);
comparison_secs=linspace(0,86400,864000);
res=(ismember(comparison_secs,secs_since_midnight)~=1);
However, this approach doesn't work due to rounding errors: the seconds since midnight and the linspace of comparison seconds never quite match up (because of the tenth-of-a-second resolution?). The intent is to later run an FFT on the data associated with the time stamps, so I want the data as uniform as possible (the data for the missing intervals will be interpolated). I've considered blocking it into smaller chunks of time and checking the small chunks one at a time, but I don't know if that's the best way to go about it. Thanks!
Multiply your numbers-of-seconds by 10 and round to the nearest integer before comparing against your range.
There may be more efficient ways to do this than ismember. (I don't know offhand how clever the implementation of ismember is, but if it's The Simplest Thing That Could Possibly Work then you'll be taking O(N^2) time that way.) For instance, you could use the timestamps that are actually present (as integer numbers of 0.1-second intervals) as indices into an array.
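A minimal sketch of that index-array idea, assuming one day of data at 0.1 s resolution:
ticks = round(secs_since_midnight * 10);  % integer number of 0.1 s intervals
present = false(1, 864000);               % one slot per 0.1 s in a day
present(ticks + 1) = true;                % +1 for MATLAB's 1-based indexing
missing_secs = (find(~present) - 1) / 10; % times of day with no record, in seconds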
Since you're concerned with missing data records rather than other timing issues such as a drifting time channel, you could check for missing records by converting the time values to seconds, applying diff, and finding the first differences greater than some tolerance. This tells you the indices where the missing records should go; it's then up to you to do something about them. Remember, if you're going to use this list of indices to fill the gaps, process the list in descending index order, since inserting records will desynchronize the index list from the data.
>> time_stamps = now:.1/86400:now+1; % Generate test data.
>> time_stamps(randi(length(time_stamps), 10, 1)) = []; % Remove 10 random records.
>> t = datenum(time_stamps); % Convert to date numbers.
>> t = 86400 * t; % Convert to seconds.
>> index = find(diff(t) > 1.999 * 0.1)' + 1 % Find missing records.
index =
30855
147905
338883
566331
566557
586423
642062
654682
733641
806963