dsearchn() command is slowing down my algorithm. Is there a better solution? (MATLAB)

I am using the following code to calculate altitude.
Data = [Distance1',Gradient];
Result = Data(dsearchn(Data(:,1), Distance2), 2);
Altitude = -cumtrapz(Distance2, Result)/1000;
Distance1 and Distance2 have different sizes but contain the same values, so I compare them to get the Gradient value corresponding to each element of Distance2.
Executing just these 3 lines takes MATLAB 12 to 15 seconds, which slows down my whole algorithm.
Is there a better way to perform this lookup without slowing down my algorithm?

If I understand correctly, you are looking for the first occurrence of the number Distance2 in the column Data(:,1). For a scalar Distance2 with an exact match in the data, you can do this about 3 times faster using find. Try:
k = find(Data(:,1) == Distance2,1);
Result = Data(k,2);
Here is a timing test, where the data has 10^pow rows (100000 rows for pow = 5) and fac is the speed-up factor gained by using find instead of dsearchn:
pow = 5;
data = round(rand(10^pow,1)*10);
funcFind = @() find(data == 5,1);
timeFind = timeit(funcFind);
funcD = @() dsearchn(data,5);
timeD = timeit(funcD);
fac = timeD/timeFind
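If Distance2 is a vector rather than a single number, the == comparison in the find call no longer applies directly. A minimal sketch of an exact-match lookup for that case, assuming every value of Distance2 appears somewhere in Distance1 (the test data below is made up for illustration), uses ismember, which returns the lowest matching index for each query:

```matlab
% Hypothetical test data: Distance2 is a subset of Distance1.
Distance1 = (1:1:1000)';
Distance2 = (1:10:1000)';
Gradient  = rand(1000, 1);
Data = [Distance1, Gradient];

% found(k) is true when Distance2(k) occurs in Data(:,1);
% idx(k) is the row of its first occurrence (0 when not found).
[found, idx] = ismember(Distance2, Data(:,1));

% Keep only the queries that were actually found.
Result = Data(idx(found), 2);
```

Unlike dsearchn, this only matches exactly equal values, so it is not a substitute when the two distance vectors differ by rounding error.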

I managed to find an alternative method using the interp1 function.
Here is some example code:
Distance2= [1:10:1000]';
Distance1= [1:1:1000]';
Gradient= rand(1000,1);
Data= [Distance1,Gradient];
Result = interp1(Distance1,Data(:,2),Distance2,'nearest');
This function adds just 1 second to my original simulation time, which is far better than the previous 12 to 15 seconds.
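As a sanity check, the sketch below compares the interp1 'nearest' lookup against the original dsearchn lookup on made-up data and then computes the altitude as in the question. It assumes the values in Distance1 are distinct (interp1 requires distinct sample points) and that every query lies inside the range of Distance1; queries outside that range would return NaN, and exact ties between two neighbours may be resolved differently by the two functions.

```matlab
% Hypothetical test data, same shape as the example above.
Distance1 = (1:1:1000)';
Distance2 = (1:10:1000)';
Gradient  = rand(1000, 1);

% Nearest-neighbour lookup via interpolation.
ResultInterp = interp1(Distance1, Gradient, Distance2, 'nearest');

% Original approach: index of the nearest point, then pick the gradient.
ResultSearch = Gradient(dsearchn(Distance1, Distance2));

% Both approaches should agree when every query has an exact match.
sameResult = isequal(ResultInterp, ResultSearch);

% Altitude as in the question.
Altitude = -cumtrapz(Distance2, ResultInterp) / 1000;
```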

Related

How do you program the Monte Carlo Integration method in Matlab?

I am trying to figure out how to write a math-based app in MATLAB, but I cannot get the Monte Carlo method of integration to work. I suspect that I have not thought the algorithm out correctly either. As of now, I have something like:
% For the function: integral of cos(x^3)*exp(x^(1/2))+x dx
% from x = 0 to x = 10
ans = 0;
for i = 1:100000000
x = 10*rand;
ans = ans + cos(x^3)*exp(x^(1/2))+x
end
I feel that this is completely wrong because my outputs are hardly even close to what is expected. How should I correctly write this? Or, how should the algorithm for setting this up look?
Two issues:
1) If you look at what you're calculating, "ans" is going to grow as i increases. By using a huge number of samples, you're just increasing your output value. How could you normalize this value so that it stays roughly the same, regardless of the number of samples?
2) Think about what you're trying to calculate here. Your current "ans" is giving you the sum of 100000000 independent random measurements of the output to your function. What does this number represent if you divide by the number of samples you've taken? How could you combine that knowledge with the range of integration in order to get the expected area under the curve?
I managed to solve this with the formula I found here. I ended up using:
ans = 0;
n = 0;
for i = 1:100000000
x = 10*rand;
n = n + cos(x^3)*exp(x^(1/2))+x;
end
ans = ((10-0)/100000000)*n
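The same estimate can be written without a loop: draw all samples at once, average the function values, and multiply by the width of the integration interval. The sketch below is one way to do that, not the only one; the names f, a, b, N and stdErr are chosen here for illustration, and the sample count is reduced to 1e6 so the vector fits comfortably in memory.

```matlab
rng(0);                                   % fixed seed so the run is repeatable

% Integrand from the question, written element-wise.
f = @(x) cos(x.^3) .* exp(sqrt(x)) + x;

a = 0;                                    % lower limit
b = 10;                                   % upper limit
N = 1e6;                                  % number of random samples

x  = a + (b - a) * rand(N, 1);            % uniform samples on [a, b]
fx = f(x);

% Monte Carlo estimate: interval width times the mean function value.
estimate = (b - a) * mean(fx);

% Rough standard error of the estimate; it shrinks like 1/sqrt(N).
stdErr = (b - a) * std(fx) / sqrt(N);
```

The standard error gives a feel for how many samples are needed: halving it requires four times as many samples.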

Matlab trouble with a lot of data. Vibration data is 3 million rows long

I have vibration data(g) in x and y direction for a Ball Bearing in 2 columns. Is there a way to find Manhattan distance just with this data & time?
If all you are looking for is the sum of data(:,1) and data(:,2) (as in @Austin's answer), a vectorized sum will be the fastest:
out = data(:,1)+data(:,2);
Here is an example:
function timming_mat_dist
data = randi(100,1e6,2);
timeit(@()arrmat(data))
timeit(@()vecmat(data))
end
function out = arrmat(data)
man_hat_dist_func = @(x,y)x+y;
out = arrayfun(man_hat_dist_func,data(:,1),data(:,2));
end
function out = vecmat(data)
out = data(:,1)+data(:,2);
end
and the results are 2.8936 seconds for arrmat and 0.0020463 seconds for vecmat.
However, if you want to compute all pairwise distances, you should use pdist (from the Statistics and Machine Learning Toolbox):
out = pdist(data,'cityblock');
but be aware that the output will be huge (and MATLAB probably won't let you allocate that much memory), as @hammadian pointed out in his answer.
It depends on which pairs of points you want to measure the distance between. If you want all possible combinations of points, this will consume a lot of memory and time: the distances form a 3 million by 3 million matrix, of which only half needs to be stored (the distance between a and b is the same as between b and a, so the upper triangle suffices).
That is 3000000*3000000/2 = 4.5e12 values, or about 36 terabytes at 8 bytes per double. This can be improved, but the memory requirements will still be very high.
If you are trying to find the Manhattan distance for each pair of x and y values, I would recommend using arrayfun.
The usage is as follows:
man_hat_dist_func = @(x,y)x+y;
out = arrayfun(man_hat_dist_func,data(:,1),data(:,2));
OR
out = arrayfun(@(x,y)x+y,data(:,1),data(:,2));
This should be your fastest option to solve the distances. You should also consider pre-allocating memory for out using out = zeros(length(data(:,1)),1);.
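Note that x+y equals the Manhattan norm only when both values are non-negative, and vibration readings usually change sign. A minimal sketch, assuming "Manhattan distance" means either the distance of each sample from the origin or the distance between consecutive samples (the question does not say which), takes absolute values first. The three-row data matrix is made up for illustration.

```matlab
% Hypothetical vibration samples: column 1 is x, column 2 is y.
data = [ 1.0 -2.0;
        -3.0  4.0;
         0.5  0.5];

% Manhattan distance of each sample from the origin: |x| + |y|.
distFromOrigin = sum(abs(data), 2);

% Manhattan distance between consecutive samples (one fewer row than data).
stepDist = sum(abs(diff(data, 1, 1)), 2);
```

Both lines are vectorized, so they scale to millions of rows without a per-element function call.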

How do I efficiently replace a function with a lookup?

I am trying to increase the speed of code that operates on large datasets. I need to perform the function out = sinc(x), where x is a 2048-by-37499 matrix of doubles. This is very expensive and is the bottleneck of my program (even when computed on the GPU).
I am looking for any solution which improves the speed of this operation.
I expect that this might be achieved by pre-computing a vector LookUp = sinc(y) where y is the vector y = min(min(x)):dy:max(max(x)), i.e. a vector spanning the whole range of expected x elements.
How can I efficiently generate an approximation of sinc(x) from this LookUp vector?
I need to avoid generating a three dimensional array, since this would consume more memory than I have available.
Here is a test for the interp1 solution:
a = -15;
b = 15;
rands = (b-a).*rand(1024,37499) + a;
sincx = -15:0.000005:15;
sincy = sinc(sincx);
tic
res1 = interp1(sincx,sincy,rands);
toc
tic
res2 = sinc(rands);
toc
sincx = gpuArray(sincx);
sincy = gpuArray(sincy);
r = gpuArray(rands);
tic
r = interp1(sincx,sincy,r);
toc
r = gpuArray(rands);
tic
r = sinc(r);
toc
Elapsed time is 0.426091 seconds.
Elapsed time is 0.472551 seconds.
Elapsed time is 0.004311 seconds.
Elapsed time is 0.130904 seconds.
Corresponding to CPU interp1, CPU sinc, GPU interp1, GPU sinc respectively
I am not sure I completely understood your problem.
But once you have LookUp = sinc(y), you can use the MATLAB function interp1:
out = interp1(y,LookUp,x)
where x can be a matrix of any size
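A small sketch of this table lookup with an accuracy check is given below. Two assumptions are made for illustration: sincFn is a local stand-in for the normalized sinc, sin(pi*t)/(pi*t), so the sketch runs without any toolbox, and the table nodes are built with linspace so that the last node coincides exactly with max(x(:)). With a min:dy:max grid the last node can fall short of the maximum, and interp1 returns NaN for queries outside the table.

```matlab
% Stand-in for the normalized sinc; returns 1 at t == 0 instead of NaN.
sincFn = @(t) sin(pi*t) ./ (pi*t + (t == 0)) + (t == 0);

% Hypothetical input matrix with values in [-15, 15].
x = -15 + 30 * rand(200, 300);

% Table nodes covering exactly the range of x (spacing about 1e-3).
y = linspace(min(x(:)), max(x(:)), 30001);
LookUp = sincFn(y);

% Linear interpolation in the table; the result has the same size as x.
out = interp1(y, LookUp, x);

% Worst-case deviation from the direct evaluation.
maxErr = max(abs(out(:) - sincFn(x(:))));
```

A finer table reduces the error at the cost of memory; for linear interpolation the error falls with the square of the node spacing.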
I came to the conclusion that your code cannot be improved significantly. The fastest possible lookup table is based on simple indexing. For a performance test, let's just use random data:
%test data:
x=rand(2048,37499);
%relevant code:
out = sinc(x);
Now the lookup based on integer indices:
a=min(x(:));
b=max(x(:));
n=1000;
x2=round((x-a)/(b-a)*(n-1)+1);
lookup=sinc(linspace(a,b,n));
out2=lookup(x2);
Regardless of the size of the lookup table or the input data, the last lines in both code blocks take roughly the same time. Since sinc evaluates roughly as fast as an indexing operation, I can only assume that it is already implemented using a lookup table.
I found a faster way if you have an NVIDIA GPU in your PC. It returns NaN for x=0, so it only applies if you can deal with NaN values or you know that x will never be zero.
Define r = gpuArray(rands); and evaluate the sinc function yourself on the GPU:
tic
r=rdivide(sin(pi*r),pi*r);
toc
This is generally about 3.2x faster than the interp1 version on the GPU, and it is more accurate (tested using your code above over 100 iterations with different random data, with both methods showing similar std).
This works because sin and the element-wise division rdivide are also implemented on the GPU (while for some reason sinc isn't). See: http://uk.mathworks.com/help/distcomp/run-built-in-functions-on-a-gpu.html
m = min(x(:));
y = m:dy:max(x(:));
LookUp = sinc(y);
Now sinc(n) should equal
LookUp((n-m)/dy + 1)
assuming n is offset from m by an integer multiple of dy and lies within the range from m to max(x(:)). To get the LookUp index (i.e. an integer between 1 and numel(y)), we first shift n by the minimum m, then scale it by dy, and finally add 1 because MATLAB indexes from 1 instead of 0.
I don't know what that will do for your efficiency, but give it a try.
Also you can put this into an anonymous function to help readability:
sinc_lookup = @(n)(LookUp((n-m)/dy + 1))
and now you can just call
sinc_lookup(n)
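For arbitrary inputs, (n-m)/dy is generally not an integer, so the index has to be rounded before it can be used. The sketch below is one way to handle that, under these assumptions: sincFn is a local stand-in for the normalized sinc so no toolbox is needed, the table is extended by one extra node so that the largest input still rounds to a valid index, and nearest-node accuracy (roughly dy/2 times the largest slope of sinc) is acceptable.

```matlab
% Stand-in for the normalized sinc; returns 1 at t == 0 instead of NaN.
sincFn = @(t) sin(pi*t) ./ (pi*t + (t == 0)) + (t == 0);

% Hypothetical input matrix with values in [-15, 15].
x  = -15 + 30 * rand(200, 300);

dy = 1e-4;                        % table spacing (assumed value)
m  = min(x(:));
y  = m:dy:(max(x(:)) + dy);       % one extra node beyond the maximum
LookUp = sincFn(y);

% Shift by m, scale by dy, round to the nearest node, add 1 for 1-based indexing.
sinc_lookup = @(n) LookUp(round((n - m) / dy) + 1);

out = sinc_lookup(x);             % same size as x, no 3-D array needed
maxErr = max(abs(out(:) - sincFn(x(:))));
```

Indexing a vector with a matrix of indices returns a result shaped like the index matrix, which is what keeps the memory footprint at one extra array the size of x.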

faster method of interpolation in matlab

I am using interp1 to interpolate some data:
temp = 4 + (30-4).*rand(365,10);
depth = 1:10;
dz = 0.5; %define new depth interval
bthD = min(depth):dz:max(depth); %new depth vector
for i = 1:size(temp,1)
i_temp(i,:) = interp1(depth,temp(i,:),bthD);
end
Here, I am increasing the resolution of my measurements by interpolating the measurements from 1 m increments to 0.5 m increments. This code works fine i.e. it gives me the matrix I was looking for. However, when I apply this to my actual data, it takes a long time to run, primarily as I am running an additional loop which runs through various cells. Is there a way of achieving what is described above without using the loop, in other words, is there a faster method?
Replace your for loop with:
i_temp = interp1(depth,temp',bthD)';
You can get rid of the transposes if you change the way that temp is defined, and if you are OK with i_temp being a 19x365 array instead of 365x19.
BTW, the documentation for interp1 is very clear that you can pass in an array as the second argument.
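To confirm that the one-line version gives the same matrix as the loop, the two can be compared directly. This is a quick check on the question's random test data, not a benchmark; loopResult and vecResult are names introduced here.

```matlab
temp  = 4 + (30 - 4) .* rand(365, 10);    % 365 profiles, 10 depths each
depth = 1:10;
dz    = 0.5;                              % new depth interval
bthD  = min(depth):dz:max(depth);         % new depth vector (19 points)

% Loop version: interpolate one row at a time.
loopResult = zeros(size(temp, 1), numel(bthD));
for i = 1:size(temp, 1)
    loopResult(i, :) = interp1(depth, temp(i, :), bthD);
end

% Vectorized version: interp1 interpolates every column of its second
% argument, so pass temp transposed and transpose the result back.
vecResult = interp1(depth, temp', bthD)';

maxDiff = max(abs(loopResult(:) - vecResult(:)));
```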

Looping over matrix elements more efficiently in Matlab

I am writing some MATLAB code and have written an algorithm that works, but I don't think it's particularly efficient. Since I am trying to improve my programming skills, I would like to know if there is a more efficient way of doing this.
I have a reasonably large matrix (~1e7 elements) of unordered values that fall within the range [-100, 100]. I want to create a second matrix based on the first, using the following rules:
If the value of the point is > 70, then the value of the point should be set to 70.
If the value of the point is < -70, then the value of the point should be set to -70.
All other values should be rounded to the nearest multiple of 5.
Here is what I am currently doing:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data));
for i = 1:length(data)
if (data(i) > 70)
new_data(i) = 70;
elseif (data(i) < -70)
new_data(i) = -70;
else
new_data(i) = round(data(i)/5.0)*5.0;
end
end
Is there a more efficient method? I think there should be a way to do this using logical indexes but those are a new discovery for me...
You do not need a loop at all:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data)); % note that this memory allocation is not necessary at this point
new_data = round(data/5.0)*5.0;
new_data(data>70) = 70;
new_data(data<-70) = -70;
Even easier is to use max and min to clamp the values first, then round to the nearest multiple of 5, all in one line:
new_data = 5*round(max(-70,min(70,data))/5);
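To round to multiples of 5, the division by 5 has to happen inside round and the multiplication by 5 outside it. A small check of a clamp-then-round one-liner against the original loop, on a smaller random data set chosen here for speed:

```matlab
data = 100 * (-1 + 2 * rand(1, 1e5));     % random values in [-100, 100]

% Reference: the element-by-element loop from the question.
loopResult = zeros(size(data));
for i = 1:numel(data)
    if data(i) > 70
        loopResult(i) = 70;
    elseif data(i) < -70
        loopResult(i) = -70;
    else
        loopResult(i) = round(data(i) / 5) * 5;
    end
end

% One-liner: clamp to [-70, 70], then round to the nearest multiple of 5.
vecResult = 5 * round(max(-70, min(70, data)) / 5);

sameResult = isequal(loopResult, vecResult);
```

Clamping first is safe because 70 and -70 are themselves multiples of 5, so rounding leaves the clamped values unchanged.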
The two answers by H.Muster and woodchips are of course the way to do it, but there still are small improvements to be found. If you are after performance you might want to exploit specifics of your problem. For example, your output data is integers -100 <= x <= 100. This obviously qualifies for 8-bit signed integer data type. This code (note explicit cast to int8 from arbitrary double precision data)
% your double precision input data
data = 100*(-1+2*rand(1,10000000));
% cast to int8 - matlab does usual round here
data = int8(data);
new_data = 5*(max(-70,min(70,data))/5);
is the fastest for two reasons:
1) A data element takes 1 byte, not 8. Memory bandwidth is the limiting factor here, so you get a large improvement.
2) round is no longer necessary.
Here are some timings from the codes of H.Muster, woodchips, and my small modification:
H.Muster Elapsed time is 0.235885 seconds.
woodchips Elapsed time is 0.167659 seconds.
my code Elapsed time is 0.023061 seconds.
The difference is quite striking. Although MATLAB uses doubles by default, you should try to use integer data types when possible.
Edit: This works because of how MATLAB implements integer arithmetic. Unlike in C, casting a double to an integer type rounds to the nearest integer, and so does integer division, which is why the explicit round is no longer needed:
a = 0.1;
int8(a)
ans =
0
a = 0.9;
int8(a)
ans =
1
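The same rounding applies to integer division, which is what makes the int8 expression work. The sketch below illustrates both behaviours and compares the int8 result with a double-precision clamp-and-round reference on random data. It is a check under the assumption that no input falls exactly on a rounding tie, which is practically certain for rand output.

```matlab
% Casting rounds to nearest, and so does integer division.
castDown = int8(0.1);                     % rounds to 0
castUp   = int8(0.9);                     % rounds to 1
divDown  = int8(7) / 5;                   % 1.4 rounds to 1
divUp    = int8(8) / 5;                   % 1.6 rounds to 2

% Compare the int8 pipeline with a double-precision reference.
data = 100 * (-1 + 2 * rand(1, 1e5));
ref  = 5 * round(max(-70, min(70, data)) / 5);

data8 = int8(data);                       % 1 byte per element
fast  = 5 * (max(-70, min(70, data8)) / 5);

sameResult = isequal(double(fast), ref);
```

Note that the result stays int8, so any later arithmetic on it saturates at -128 and 127 instead of overflowing.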