I am writing some MATLAB code and have written an algorithm that works, but I don't think it's particularly efficient. Since I am trying to improve my programming skills, I would like to know if there is a more efficient way of doing this.
I have a reasonably large matrix (~1e7 elements) of unordered values that fall within the range [-100, 100]. I want to create a second matrix based on the first, using the following rules:
If the value of the point is > 70, then the value of the point should be set to 70.
If the value of the point is < -70, then the value of the point should be set to -70.
All other values should be rounded to the nearest multiple of 5.
Here is what I am currently doing:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data));
for i = 1:length(data)
    if (data(i) > 70)
        new_data(i) = 70;
    elseif (data(i) < -70)
        new_data(i) = -70;
    else
        new_data(i) = round(data(i)/5.0)*5.0;
    end
end
Is there a more efficient method? I think there should be a way to do this using logical indexes but those are a new discovery for me...

You do not need a loop at all:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data)); % note that this memory allocation is not necessary at this point
new_data = round(data/5.0)*5.0;
new_data(data>70) = 70;
new_data(data<-70) = -70;

Even easier is to use max and min. Do it in one simple line.
new_data = 5*round(max(-70,min(70,data))/5);
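As a quick sanity check of the clamp-then-round idea, here is a minimal sketch on a few hand-picked values (the sample vector is made up for illustration). It clamps to [-70, 70] with max and min, rounds to the nearest multiple of 5, and compares that against the logical-indexing form.

```matlab
% Hand-picked sample values (illustrative only)
data = [-100 -72.4 -3 2.4 2.6 68 71 100];

% Clamp to [-70, 70] first, then round to the nearest multiple of 5
clamped = 5*round(max(-70, min(70, data))/5);

% Same rules expressed with logical indexing
masked = round(data/5)*5;
masked(data > 70)  = 70;
masked(data < -70) = -70;

disp(clamped)   % -70 -70 -5 0 5 70 70 70
```

Dividing by 5 before rounding and multiplying afterwards is what snaps values to multiples of 5; the opposite order would round to multiples of 0.2.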

The two answers by H.Muster and woodchips are of course the way to do it, but there are still small improvements to be found. If you are after performance, you might want to exploit specifics of your problem. For example, your output data are integers with -70 <= x <= 70, and the input, once rounded, lies within -100 <= x <= 100. Both fit in an 8-bit signed integer data type. This code (note the explicit cast to int8 from arbitrary double precision data)
% your double precision input data
data = 100*(-1+2*rand(1,10000000));
% cast to int8 - matlab does usual round here
data = int8(data);
new_data = 5*(max(-70,min(70,data))/5);
is the fastest for two reasons:
1 data element takes 1 byte, not 8. Memory bandwidth is a limiting factor here, so you get a lot of improvement
round is no longer necessary
Here are some timings from the codes of H.Muster, woodchips, and my small modification:
H.Muster Elapsed time is 0.235885 seconds.
woodchips Elapsed time is 0.167659 seconds.
my code Elapsed time is 0.023061 seconds.
The difference is quite striking. Although MATLAB uses doubles by default, you should try to use integer data types when possible.
Edit: This works because of how MATLAB implements integer arithmetic. Unlike in C, a cast from double to an integer type rounds to the nearest integer instead of truncating:
a = 0.1;
int8(a)
ans =
     0
a = 0.9;
int8(a)
ans =
     1
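The two integer behaviours this answer relies on can be checked with a short sketch; the sample values are chosen only for illustration. Casting to int8 rounds to the nearest integer and saturates at the type limits, and dividing an int8 array by a scalar rounds the quotient as well, which is why no explicit round call is needed.

```matlab
% Casting double to int8 rounds to nearest and saturates at [-128, 127]
v = int8([0.1 0.9 2.5 -2.5 150]);   % 0 1 3 -3 127

% Integer division also rounds to nearest instead of truncating
q = int8([7 8 -7 -8]) / 5;          % 1 2 -1 -2

% So 5*(x/5) snaps an int8 value to the nearest multiple of 5
snapped = 5*(int8([7 8 -7 -8]) / 5);  % 5 10 -5 -10
```

One caveat: the data are rounded twice (once in the cast, once in the division), so a value sitting very close to a rounding boundary could in principle land on a different multiple of 5 than with round(data/5)*5 applied to the doubles.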


dsearchn() Command is slowing down my algorithm, Any better Solution? MATLAB

I am using the following code to calculate altitude.
Data = [Distance1',Gradient];
Result = Data(dsearchn(Data(:,1), Distance2), 2);
Altitude = -cumtrapz(Distance2, Result)/1000;
Distance1 and Distance2 have different sizes but contain the same values, so I compare them to get the Gradient value that corresponds to each entry of Distance2.
MATLAB takes 12 to 15 seconds just to execute these 3 lines, which slows down my whole algorithm.
Is there any better way I can perform above action without slowing down my algorithm?
If I understand correctly, you are looking for the first occurrence of the number Distance2 in the column Data(:,1). With a scalar Distance2 you can do this about 3 times faster using find. Try:
k = find(Data(:,1) == Distance2,1);
Result = Data(k,2);
Here is a timing test, where pow sets the length of your data (10^pow rows, so pow = 5 gives 100000 rows), and fac is the speed-up factor from using find:
pow = 5;
data = round(rand(10^pow,1)*10);
funcFind = @() find(data == 5,1);
timeFind = timeit(funcFind);
funcD = @() dsearchn(data,5);
timeD = timeit(funcD);
fac = timeD/timeFind
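The find approach compares against a single number. If Distance2 is a vector whose values all occur in Data(:,1), one possible alternative is ismember, which returns the matching row index for every element at once. The sketch below uses made-up sample data, and the names tf and loc are placeholders, not taken from the question.

```matlab
% Made-up sample data: distances with one gradient value each
Distance1 = [0 10 20 30 40]';
Gradient  = [1 2 3 4 5]';
Data      = [Distance1, Gradient];

% Query distances, assumed to be an exact subset of Distance1
Distance2 = [10 30]';

% loc(k) is the row of Data whose first column equals Distance2(k)
[tf, loc] = ismember(Distance2, Data(:,1));
Result = Data(loc(tf), 2);    % gradients for the matched rows
```

This only works for exact matches; if the values differ by rounding error, a nearest-neighbour lookup is still needed.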
I managed to find an alternative method using the interp1 function.
Here is some example code:
Distance2= [1:10:1000]';
Distance1= [1:1:1000]';
Gradient= rand(1000,1);
Data= [Distance1,Gradient];
This function adds just 1 second to my original simulation time, which is much better than the previous 12 to 15 seconds.
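The example above does not show the interp1 call itself. A minimal sketch of how it might look follows, assuming the 'nearest' method is used to mimic the nearest-point lookup that dsearchn performed; the exact call used by the poster is not known. Note that interp1 expects the sample points in Distance1 to be distinct.

```matlab
Distance2 = (1:10:1000)';
Distance1 = (1:1:1000)';
Gradient  = rand(1000,1);
Data      = [Distance1, Gradient];

% Assumed call: pick the gradient at the nearest Distance1 sample
Result = interp1(Data(:,1), Data(:,2), Distance2, 'nearest');

% Same follow-up computation as in the question
Altitude = -cumtrapz(Distance2, Result)/1000;
```

Because every value of Distance2 occurs exactly in Distance1 here, Result is simply every tenth gradient value.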

Matlab trouble with a lot of data. Vibration data is 3 million rows long

I have vibration data(g) in x and y direction for a Ball Bearing in 2 columns. Is there a way to find Manhattan distance just with this data & time?
If all you are looking for is the sum of data(:,1) and data(:,2) (as in @Austin's answer), a vectorized sum will be the fastest:
out = data(:,1)+data(:,2);
Here is an example:
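function timming_mat_dist
data = randi(100,1e6,2);
% timing calls assumed; the original listing omitted them
disp(timeit(@() arrmat(data)))
disp(timeit(@() vecmat(data)))
end
function out = arrmat(data)
man_hat_dist_func = @(x,y)x+y;
out = arrayfun(man_hat_dist_func,data(:,1),data(:,2));
end
function out = vecmat(data)
out = data(:,1)+data(:,2);
end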
and the results are 2.8936 seconds for arrmat and 0.0020463 seconds for vecmat.
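One caveat on the formula itself: x+y equals the Manhattan distance from the origin only when both components are non-negative, and vibration samples in g can be negative. A minimal vectorized sketch that takes absolute values first is shown below; the sample matrix is made up for illustration.

```matlab
% Made-up samples: columns are x and y acceleration
data = [1 -2; -3 4; 0 0];

% Manhattan (1-norm) distance of each row from the origin
out = sum(abs(data), 2);     % [3; 7; 0]
```

This stays fully vectorized, so it should scale to millions of rows in the same way as the plain column sum.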
However, if you want to compute all distances you should use pdist:
out = pdist(data,'cityblock');
but be aware that the output will be huge (and MATLAB probably won't let you allocate that much memory), as @hammadian pointed out in his answer.
It depends on which pairs of points you want the distance between. If you are looking for all possible combinations of points, this will be memory and time consuming: the full distance matrix is 3 million by 3 million, and even storing only the upper triangle (the distance between a and b is the same as between b and a) takes about 3000000*3000000/2 = 4.5e12 entries.
At 8 bytes per double that is roughly 36 terabytes. This can be improved, but the memory requirements will still be very high.
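The size estimate can be reproduced with a few lines of arithmetic. The sketch assumes double precision storage (8 bytes per value) and the n*(n-1)/2 output length that pdist uses for n points.

```matlab
n = 3e6;                       % number of samples (rows)
nPairs = n*(n-1)/2;            % number of distinct point pairs
bytesNeeded = 8*nPairs;        % 8 bytes per double
tbNeeded = bytesNeeded/1e12;   % decimal terabytes, roughly 36
```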
If you are trying to find manhattan distance for each set of x and y, I would recommend using arrayfun
The usage is as follows:
man_hat_dist_func = @(x,y)x+y;
out = arrayfun(man_hat_dist_func,data(:,1),data(:,2));
% or, as a one-liner:
out = arrayfun(@(x,y)x+y,data(:,1),data(:,2));
This should be your fastest option to solve the distances. You could also consider pre-allocating memory for out using out = zeros(length(data(:,1)),1);.

Vectorizing array indexing/subsetting in Matlab

Suppose I have a long data vector y, plus some indices into it. I want to extract a short snippet or window around every index.
For example, suppose I want to construct a matrix containing 64 samples before and 64 samples after every value that is below three. This is trivial to do in a for-loop:
WIN_SIZE = 64;
% Sample data with padding
data = [nan(WIN_SIZE,1); randn(1e6,1); nan(WIN_SIZE,1)];
% Sample events, could be anything
index = find(data < 3);
snippets = nan(length(index), 2*WIN_SIZE + 1);
for ii=1:length(index)
    snippets(ii,:) = data((index(ii)-WIN_SIZE):(index(ii)+WIN_SIZE));
end
However, this is not blazingly fast. Is there any way to vectorize (or otherwise speed up) this operation?
(In case this is unclear, the index could be anything and may not necessarily be a property of the data; I just wanted something simple to illustrate the idea.)
Use bsxfun -
snippets = data(bsxfun(@plus,index(:),[-WIN_SIZE:WIN_SIZE]))
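To see what the bsxfun call builds, here is a small sketch with toy sizes (the data vector, window size and indices are made up). Adding the column of indices to the row of offsets gives a matrix of positions, one row per event, and indexing data with that matrix returns all windows at once.

```matlab
WIN_SIZE = 2;                 % toy window half-width
data  = (1:20)';              % toy data: value equals position
index = [5; 10];              % toy event positions

% Each row holds index(k)-WIN_SIZE : index(k)+WIN_SIZE
positions = bsxfun(@plus, index(:), -WIN_SIZE:WIN_SIZE);
snippets  = data(positions);

% Reference result from the loop in the question
ref = nan(numel(index), 2*WIN_SIZE + 1);
for ii = 1:numel(index)
    ref(ii,:) = data((index(ii)-WIN_SIZE):(index(ii)+WIN_SIZE));
end
```

The position matrix has as many elements as the result, so for very many events it may be worth processing the indices in chunks.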

How do I efficiently replace a function with a lookup?

I am trying to increase the speed of code that operates on large datasets. I need to perform the function out = sinc(x), where x is a 2048-by-37499 matrix of doubles. This is very expensive and is the bottleneck of my program (even when computed on the GPU).
I am looking for any solution which improves the speed of this operation.
I expect that this might be achieved by pre-computing a vector LookUp = sinc(y) where y is the vector y = min(min(x)):dy:max(max(x)), i.e. a vector spanning the whole range of expected x elements.
How can I efficiently generate an approximation of sinc(x) from this LookUp vector?
I need to avoid generating a three dimensional array, since this would consume more memory than I have available.
Here is a test for the interp1 solution:
a = -15;
b = 15;
rands = (b-a).*rand(1024,37499) + a;
sincx = -15:0.000005:15;
sincy = sinc(sincx);
% tic/toc placement assumed from the four elapsed times listed below
tic
res1 = interp1(sincx,sincy,rands);
toc
tic
res2 = sinc(rands);
toc
sincx = gpuArray(sincx);
sincy = gpuArray(sincy);
r = gpuArray(rands);
tic
r = interp1(sincx,sincy,r);
toc
r = gpuArray(rands);
tic
r = sinc(r);
toc
Elapsed time is 0.426091 seconds.
Elapsed time is 0.472551 seconds.
Elapsed time is 0.004311 seconds.
Elapsed time is 0.130904 seconds.
These correspond to CPU interp1, CPU sinc, GPU interp1 and GPU sinc, respectively.
I'm not sure I completely understood your problem.
But once you have LookUp = sinc(y), you can use the MATLAB function interp1:
out = interp1(y,LookUp,x)
where x can be a matrix of any size.
I came to the conclusion that your code cannot be improved significantly. The fastest possible lookup table is based on simple indexing. For a performance test, let's just use random data:
%test data:
%relevant code:
out = sinc(x);
Now the lookup based on integer indices:
Regardless of the size of the lookup table or the input data, the last lines in both code blocks take roughly the same time. Since sinc evaluates roughly as fast as an indexing operation, I can only assume that it is already implemented using a lookup table.
I found a faster way (if you have an NVIDIA GPU in your PC). However, it returns NaN for x = 0, so it only applies if you can deal with NaN values or you know that x will never be zero.
Define r = gpuArray(rands); and evaluate the sinc function yourself on the GPU using sin and element-wise division.
This generally gives me about 3.2x the speed of the interp1 version on the GPU, and it's more accurate (tested using your code above, iterating 100 times with different random data, with both methods showing a similar standard deviation).
This works because sin and the element-wise division rdivide are also implemented for the GPU (while for some reason sinc isn't). See: http://uk.mathworks.com/help/distcomp/run-built-in-functions-on-a-gpu.html
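A minimal sketch of the manual evaluation follows, assuming the normalized definition sinc(x) = sin(pi*x)/(pi*x). It runs on ordinary CPU arrays so that it works without a GPU; wrapping x in gpuArray (Parallel Computing Toolbox) should leave the expression unchanged, but that variant is not tested here. The patch for x == 0 is an addition for illustration, not part of the answer.

```matlab
x = [-1.5 -0.5 0 0.5 2];      % sample points, including the problem case 0

% Element-wise evaluation; 0/0 at x == 0 produces NaN
s = sin(pi*x) ./ (pi*x);

% Optional patch: sinc(0) is defined as 1
s(x == 0) = 1;
```

If zeros can never occur in the input, the patch line can be dropped, which is the situation the answer describes.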
m = min(x(:));
y = m:dy:max(x(:));
LookUp = sinc(y);
Now sinc(n) should equal
LookUp((n-m)/dy + 1)
assuming n is an integer multiple of dy away from m and lies within the range from m to max(x(:)). To get the LookUp index (i.e. an integer between 1 and numel(y)), we first shift n by the minimum m, then scale it by dy, and finally add 1 because MATLAB indexes from 1 instead of 0.
I don't know what that will do for your efficiency, though, but give it a try.
You can also put this into an anonymous function to help readability:
sinc_lookup = @(n)(LookUp((n-m)/dy + 1))
and now you can just call sinc_lookup(n).
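In practice x will rarely be an exact multiple of dy, and (n-m)/dy can come out slightly off an integer because of floating point error. A minimal sketch of a nearest-entry lookup that rounds the index is given below. The range, the spacing and the query points are made-up values, and the table is built from sin directly so the sketch does not depend on the toolbox sinc.

```matlab
m  = -15;                 % lower end of the table (assumed range)
dy = 0.01;                % table spacing (assumed)
y  = m:dy:15;

% Table of sin(pi*y)/(pi*y), with the value 1 at y == 0
LookUp = ones(size(y));
nz = (y ~= 0);
LookUp(nz) = sin(pi*y(nz)) ./ (pi*y(nz));

% Arbitrary query points inside [m, 15], none of them zero
x = [-14.237 -3.3 0.004 0.66 7.125 14.99];

% Nearest table entry: round to an integer index, clamp to the valid range
idx = round((x - m)/dy) + 1;
idx = min(max(idx, 1), numel(y));
approx = LookUp(idx);

% Compare against direct evaluation
exact  = sin(pi*x) ./ (pi*x);
maxErr = max(abs(approx - exact));
```

The approximation error shrinks with dy, at the cost of a larger table; whether this beats calling sinc directly has to be measured on the actual data.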

Nearest column in matlab

I want to find the column of a matrix that is nearest to a given vector.
Let the matrix be D and the vector be y. I want to speed up this function:
function BOOLEAN = IsExsist(D,y)
[~, Ysize, ~] = size(D);
BOOLEAN = 0;
MIN = 1.5;
for i=1:Ysize
    if(BOOLEAN == 1)
        break;
    end
    if(norm(y - D(:,i),1) < MIN )
        BOOLEAN = 1;
    end
end
I am assuming you are looking to "accelerate" this procedure. If so, try this -
[~,nearest_column_number] = min(sum(abs(bsxfun(@minus,D,y))))
The above code uses 1-Norm (as used by you) along all the columns of D w.r.t. y. nearest_column_number is your desired output.
If you are interested in using a threshold MIN to get the first column whose distance is below it, you may use the following code -
normvals = sum(abs(bsxfun(@minus,D,y)))
nearest_column_number = find(normvals<MIN,1)
BOOLEAN = ~isempty(nearest_column_number)
nearest_column_number and BOOLEAN are the outputs you might be interested in.
If you are looking to make a function out of it, just wrap the above code in the function format you were using, as you already have the desired output from the code.
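The two variants above can be tried on a tiny made-up matrix. This sketch only illustrates what each expression returns; D, y and the threshold value are chosen for the example.

```matlab
D = [0 1 5;
     0 1 5];            % three candidate columns (made-up)
y = [1.2; 0.9];         % query vector (made-up)
MIN = 1.5;              % threshold from the question

% 1-norm distance from y to every column of D
normvals = sum(abs(bsxfun(@minus, D, y)), 1);    % about [2.1 0.3 7.9]

% Nearest column overall
[~, nearest_column_number] = min(normvals);      % 2

% First column closer than the threshold, and whether one exists
first_below = find(normvals < MIN, 1);           % 2
BOOLEAN = ~isempty(first_below);                 % true
```

Passing the dimension 1 to sum explicitly keeps the result correct even when D has a single row.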
Edit 1: If you are using this procedure for a case with large D matrices with sizes like 9x88800, use this -
normvals = sum(abs(bsxfun(@minus,D,y)));
BOOLEAN = false;
for k = 1:numel(normvals)
    if normvals(k) < MIN
        BOOLEAN = true;
        break;
    end
end
Edit 2: It appears that you are calling this procedure/function many times, which is the bottleneck here. So, my suggestion at this point would be to look into your calling function and see if you could reduce the number of calls. Otherwise use your own code or try this slightly modified version of it -
BOOLEAN = false;
for k = 1:size(D,2)
    if norm(y - D(:,k),1) < MIN %// You can try replacing this with "if sum(abs(y - D(:,k))) < MIN" to see if it gives any performance improvement
        BOOLEAN = true;
        break;
    end
end
To find the nearest column of a matrix D to a column vector y, with respect to 1-norm distance, you can use pdist2:
[~, index] = min(pdist2(y.',D.','minkowski',1));
What you are currently trying to do is optimize your Matlab implementation of linear search.
No matter how much you optimize that, it will always need to calculate all D=88800 distances over all d=9 dimensions for each search.
Now that's easy to implement in Matlab as discussed in the other answer, but if you are planning to do many such searches, I would recommend to use a different data-structure and search-algorithm instead.
A good candidate would be (binary) space partitioning, which recursively splits your space into two parts along your dimensions. This adds quite some initial overhead to create the tree and makes insert and remove operations a bit more expensive. But as I understand your comments, searches are much more frequent, and their cost will drop from O(D) to O(log(D)), which is a tremendous improvement for this problem size.
I think that there should be some usable Matlab-implementations of BSP around, e.g. on Mathworks file-exchange.
But if you don't manage to find one, I'd be happy to provide some pointers as well.