Find average value of variable in an interval in MATLAB - matlab

I have some data in MATLAB. The picture shows a small portion:
The numbers I'm interested in are RPM and Lambda. As you can see, they are neither strictly decreasing nor strictly increasing (they are non-monotonic, so to speak). I want to find the average Lambda value in RPM intervals, such as 250-500, 500-750, 1000-1250 and so on. But I don't know how to write such code in MATLAB, because I won't know at which indices the values fall into each interval, since the RPM numbers aren't strictly decreasing or increasing.
while RPM >= 1000 && RPM < 1250
Lambda_avg = sum of Lambda values in interval / number of Lambdas in interval
end
while RPM >= 1250 && RPM < 1500
...
end
I could maybe sort the RPM-column from lowest to highest, and then also sort the Lambda-column accordingly, although I'm not sure how to do that either.
Is there any way I can find the average lambda-value in a certain RPM interval across all of the data? I hope my question is clear enough.

If you have all the values of lambda in the variable lambda and all the values of RPM in the RPM variable, then you just do, for example
RPM1 = 1000;
RPM2 = 1500;
lambda_avg = mean(lambda((RPM >= RPM1) & (RPM < RPM2)));
A single & does an element-by-element AND comparison, and a single | does the elementwise OR.
If your data is organized as a MATLAB Table called data, for example, then you can do
lambda_avg = mean(data.lambda((data.RPM >= RPM1) & (data.RPM < RPM2)));
This method takes advantage of MATLAB's logical indexing and lets you skip the loop that you tried to write in your question.
Just for reference, if you want to explicitly write a loop to calculate this mean, you can do it like this:
lambda_avg = 0;
n_lambda = 0; % number of lambdas you found in the interval
for i = 1:numel(RPM)
    if (RPM(i) >= RPM1) && (RPM(i) < RPM2)
        lambda_avg = lambda_avg + lambda(i);
        n_lambda = n_lambda + 1;
    end
end
lambda_avg = lambda_avg / n_lambda;
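Since the question asks for averages over several RPM intervals at once, here is a minimal sketch of one way to do that without sorting. It assumes a MATLAB release that provides discretize (R2015a or later); the sample values and the names edges, bin and valid are made up for illustration.

```matlab
% Hypothetical sample data; replace with your own RPM and lambda vectors.
RPM    = [300 600 400 1100 700 1200];
lambda = [1.0 0.9 1.2 0.8 1.1 1.0];

edges = 250:250:1250;            % bins: [250,500), [500,750), [750,1000), [1000,1250]
bin   = discretize(RPM, edges);  % bin number of each sample, NaN if outside all bins
valid = ~isnan(bin);             % keep only the samples that fall inside a bin

b = bin(valid);
l = lambda(valid);

% Mean lambda per bin; bins that contain no samples are filled with NaN.
lambda_avg = accumarray(b(:), l(:), [numel(edges)-1, 1], @mean, NaN);
```

Each row of lambda_avg corresponds to one interval between consecutive entries of edges. The order of the samples does not matter, so the RPM column does not need to be sorted first.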

Related

Find sum distance to horizontal line for all points in Matlab

I have a scatter plot of approximately 30,000 pts, all of which lie above a horizontal line which I've visually defined in my plot. My goal now is to sum the vertical distance of all of these points to this horizontal line.
The data was read in from a .csv file and is already saved to the workspace, but I also need to check whether a value is NaN, and ignore these.
This is where I'm at right now:
vert_deviation = 0;
idx = 1;
while idx <= numel(my_data(:,5)) && isnan(idx) == 0
vert_deviation = vert_deviation + ((my_data(idx,5) - horiz_line_y_val));
idx = idx + 1;
end
I believe the && operator requires two logical conditions, but I'm not sure how to rewrite this loop accordingly at the moment. I also don't understand why vert_deviation currently returns NaN, but I assume this has to do with the first mistake I described...
I would really appreciate some guidance here - thank you in advance!
EDIT: The 'horizontal line' is a slight oversimplification - in reality the lower limit I need to find the distance to consists of 6 different line segments
I should have specified that the lower limit to which I need to calculate the distance for all scatterplot points varies for different x values (the horizontal line snippet was meant to be a simplification but may have been misleading... apologies for that)
I first modified the data I had already read into the workspace by replacing all NaN values with 0. Next, I wrote a while loop that defines the number of indices to loop through, with an && condition to filter out any zeroes. Inside it, I wrote nested if statements that check which range of x values the given index falls into, and then take the difference between the y value of the given point and the y value of the linear lower limit for that section of the plot. I repeated this for all points.
while idx <= numel(my_data(:,3)) && not(my_data(idx,3) == 0)
...
if my_data(idx,3) < upper_x_lim && my_data(idx,5) > lower_x_lim
vert_deviation = vert_deviation + (my_data(idx,4) - (m6 * (my_data(idx,5)) + b6))
end
...
m6 and b6 in this case are the slope and y-intercept calculated for one section of the plot. The if statement is repeated once for each of the six sections of the lower limit.
I'm sure there are more elegant ways to do this, so I'm open to any feedback if there's room for improvement!
Your loop doesn't exclude NaN values because isnan(idx) == 0 checks whether the index is NaN, rather than checking whether the data point is NaN. Instead, check isnan(my_data(idx,5)).
Also, you can simplify your code using for instead of while:
vert_deviation = 0;
for idx = 1:size(my_data,1)
    if ~isnan(my_data(idx,5))   % MATLAB negation is ~ (the ! form only works in Octave)
        vert_deviation = vert_deviation + (my_data(idx,5) - horiz_line_y_val);
    end
end
As @Adriaan suggested, you can remove the loop altogether, but it seems that the code in the OP is an oversimplification of the problem. Looking at the additional code posted, I guess it is still possible to remove the loops, but I'm not certain it would give a significant speed improvement. Just use a loop.
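For reference, if one did want to drop the loop, a rough sketch could look like the following. It assumes the lower limit can be described by the corner points of its straight segments and evaluated with interp1; the sample data and the names x_knots, y_knots, lower and d are hypothetical, not taken from the question.

```matlab
% Hypothetical sample data: x values, y values (one of them NaN), and a lower
% limit made of straight segments given by its corner points.
x = [0.5; 1.5; 1.0; 0.2];
y = [2.0; 3.0; NaN; 1.2];
x_knots = [0 1 2];   % x positions where the lower limit changes slope
y_knots = [0 1 1];   % height of the lower limit at those positions

lower = interp1(x_knots, y_knots, x);  % lower-limit height at each x (linear)
d     = y - lower;                     % vertical distance of each point
vert_deviation = sum(d(~isnan(d)));    % add up the distances, skipping NaN entries
```

Note that interp1 returns NaN for x values outside the range of x_knots, so those points are skipped as well. For a single horizontal line, lower can simply be replaced by the scalar horiz_line_y_val.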

Approximate pi using finite series

Given the equation to approximate pi
I need to find the number of terms (n) that are needed to obtain an approximation that is within 10^(-12) of the actual value of pi. The code I have to find n looks like this:
The while loop statement I have seems to never end, so I feel like my code must be wrong.
Try something along these lines (transcribed from your image), incrementing the number of approximation terms n inside your infinite while loop:
s = 1
n = 1
while true
s = abs(pi - approximate_pi(n))
if s <= 0.001
break
end
n = n + 1
end
On a related note, this calculation is a little bit pointless if you know the value of pi beforehand. Termination condition should be on the absolute magnitude of the n-th term.
The way you're doing it makes sense only if you're trying to find the minimum n for which your approximation series produces a result within some margin of error.
Edit. So, normally you would do it like this:
n = 1;
sum_running = 0;
sum_target = (pi^2 - 8) / 16;
while true
    % MATLAB has no += operator (that is Octave syntax), so add explicitly
    sum_running = sum_running + 1 / ((2*n-1)^2 * (2*n+1)^2);
    if abs(sum_target - sum_running) <= 1e-12   % 1e-12 is 10^(-12); 10e-12 would be 10^(-11)
        break
    end
    n = n + 1;
end
pi_approx = sqrt(16*sum_running + 8)
There's no need to keep recalculating the pi approximation up to n terms for each new n. This has O(n) complexity, while your initial solution had O(n^2), so it's much faster for large n.
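As a sketch of the other stopping rule mentioned above (terminating on the magnitude of the n-th term instead of comparing against the known value of pi), something like this might work. The tolerance tol and the variable names term and n_terms are assumptions for illustration; the series is the one used in the code above.

```matlab
tol = 1e-12;        % assumed tolerance: stop once a single term drops below it
n = 1;
sum_running = 0;
term = Inf;
while term >= tol
    term = 1 / ((2*n-1)^2 * (2*n+1)^2);  % n-th term of the series
    sum_running = sum_running + term;
    n = n + 1;
end
n_terms = n - 1;                          % number of terms actually added
pi_approx = sqrt(16*sum_running + 8);
```

Keep in mind that the remaining tail of the series is larger than the last term, so the error in pi_approx is somewhat bigger than tol. This rule bounds the size of the last term, not the distance to pi.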

Find zero crossing of sampled function [duplicate]

I have written a function in MATLAB to count the number of zero crossings given a vector of signal data. If I find a zero crossing, I also check whether the absolute difference between the two vector indices involved is greater than a threshold value - this is to try to reduce the influence of signal noise.
zc = [];
thresh = 2;
for i = 1:length(v)-1
if ( (v(i)>0 && v(i+1)<0) || (v(i)<0 && v(i+1)>0) ) && abs(v(i)-v(i+1)) >= thresh
zc = [zc; i+1];
end
end
zcCount = length(zc);
I used the vector from the zero crossings function here to test it: http://hips.seas.harvard.edu/content/count-zero-crossings-matlab
A = [-0.49840598306643,
1.04975509964655,
-1.67055867973620,
-2.01437026154355,
0.98661592496732,
-0.06048256273708,
1.19294080740269,
2.68558025885591,
0.85373360483580,
1.00554850567375];
It seems to work fine but is there a more efficient way of achieving the same result? E.g. on the above webpage, they simply use the following line to calculate zero crossings:
z=find(diff(v>0)~=0)+1;
Is there a way to incorporate the threshold check into something similarly efficient?
How about
zeroCrossIndex = diff(v > 0) ~= 0;
threshholdIndex = abs(diff(v)) >= thresh;   % abs so that downward crossings also count
zcCount = sum(zeroCrossIndex & threshholdIndex)
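A small check of the vectorized idea on the test vector from the question may help. This sketch takes the absolute value of the difference, as the loop in the question does, and also recovers the crossing indices with find; the names signChange and bigJump are made up for illustration.

```matlab
% Test vector from the question, written as a column vector.
v = [-0.49840598306643;  1.04975509964655; -1.67055867973620; ...
     -2.01437026154355;  0.98661592496732; -0.06048256273708; ...
      1.19294080740269;  2.68558025885591;  0.85373360483580; ...
      1.00554850567375];
thresh = 2;

signChange = diff(v > 0) ~= 0;         % true where consecutive samples differ in sign
bigJump    = abs(diff(v)) >= thresh;   % true where the step is at least thresh
zc         = find(signChange & bigJump) + 1;  % index of the sample after each crossing
zcCount    = numel(zc);
```

For this vector the crossings that pass the threshold end at indices 3 and 5, which matches what the loop in the question finds.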

Looping over matrix elements more efficiently in Matlab

I am writing some MATLAB code and have written an algorithm that works, but I don't think it's particularly efficient. Since I am trying to improve my programming skills, I would like to know if there is a more efficient way of doing this.
I have a (reasonably large ~ E07) matrix of values which are unordered, but fall within the range [-100, 100]. I want to create a second matrix based on the first, by using the following rules:
If the value of the point is > 70, then the value of the point should be set to 70.
If the value of the point is < -70, then the value of the point should be set to -70.
All other values should be rounded to the nearest multiple of 5.
Here is what I am currently doing:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data));
for i = 1:length(data)
if (data(i) > 70)
new_data(i) = 70;
elseif (data(i) < -70)
new_data(i) = -70;
else
new_data(i) = round(data(i)/5.0)*5.0;
end
end
Is there a more efficient method? I think there should be a way to do this using logical indexes but those are a new discovery for me...
You do not need a loop at all:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data)); % note that this memory allocation is not necessary at this point
new_data = round(data/5.0)*5.0;
new_data(data>70) = 70;
new_data(data<-70) = -70;
Even easier is to use max and min to clamp the values first, then round to the nearest multiple of 5. It fits in one simple line.
new_data = 5*round(max(-70,min(70,data))/5);
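To see the clamp-then-round idea on a handful of values, here is a minimal sketch. The sample vector and the intermediate name clamped are made up for illustration.

```matlab
data = [-100 -72.4 -68 -2.4 2.5 12.6 69 71 100];  % hypothetical sample values

clamped  = max(-70, min(70, data));   % limit every value to the range [-70, 70]
new_data = 5 * round(clamped / 5);    % round to the nearest multiple of 5
% new_data is [-70 -70 -70 0 5 15 70 70 70]
```

Dividing by 5 before rounding and multiplying by 5 afterwards is what snaps the values to multiples of 5. Doing it the other way around would round to the nearest 0.2 instead.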
The two answers by H.Muster and woodchips are of course the way to do it, but there still are small improvements to be found. If you are after performance you might want to exploit specifics of your problem. For example, your output data is integers -100 <= x <= 100. This obviously qualifies for 8-bit signed integer data type. This code (note explicit cast to int8 from arbitrary double precision data)
% your double precision input data
data = 100*(-1+2*rand(1,10000000));
% cast to int8 - matlab does usual round here
data = int8(data);
new_data = 5*(max(-70,min(70,data))/5);
is the fastest for two reasons:
1 data element takes 1 byte, not 8. Memory bandwidth is a limiting factor here, so you get a lot of improvement
round is no longer necessary
Here are some timings from the codes of H.Muster, woodchips, and my small modification:
H.Muster Elapsed time is 0.235885 seconds.
woodchips Elapsed time is 0.167659 seconds.
my code Elapsed time is 0.023061 seconds.
The difference is quite striking. Although MATLAB uses doubles by default, you should try to use integer data types when possible.
Edit: This works because of how MATLAB implements integer conversion. Unlike in C, casting a double to an integer type rounds to the nearest integer instead of truncating:
a = 0.1;
int8(a)
ans =
0
a = 0.9;
int8(a)
ans =
1

Cutting down large matrix iteration time

I have some massive matrix computation to do in MATLAB. It's nothing complicated (see below), but I'm having trouble making it efficient. What I have below works, but the computation time makes it impractical.
for i = 1 : 100
for j = 1 : 20000
element = matrix{i}(j,1);
if element <= bigNum && element >= smallNum
count = count + 1;
end
end
end
Is there a way of making this faster? MATLAB is meant to be good at these problems so I would imagine so?
Thank you :).
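count = 0;
for i = 1:100
    count = count + sum(matrix{i}(:,1) <= bigNum & matrix{i}(:,1) >= smallNum);
end
If your matrix is an ordinary numeric matrix rather than a cell array, then this will do (note that it counts over all elements, not just the first column):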
count = sum(matrix(:) >= smallNum & matrix(:) <= bigNum);
If your matrix is really huge, use anyExceed. You can profile (check the running time) of both functions on matrix and decide.
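If the data stays in a cell array, the remaining loop can also be avoided by stacking the first columns into one vector. This is only a sketch with made-up sample data; the names firstCols and allVals are hypothetical, and it is not guaranteed to be faster than the loop above, so it is worth timing both.

```matlab
% Hypothetical sample data: a cell array of matrices, as in the question.
matrix   = {[1 9; 5 9; 10 9], [3 0; 7 0]};
smallNum = 3;
bigNum   = 7;

% Pull the first column out of every cell and stack them into one vector.
firstCols = cellfun(@(m) m(:,1), matrix, 'UniformOutput', false);
allVals   = vertcat(firstCols{:});

% Count the values that lie inside the closed interval [smallNum, bigNum].
count = nnz(allVals >= smallNum & allVals <= bigNum);
```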