Speed Up Finding Number of Elements Between Values in MATLAB

I've created what is a fairly simple MATLAB script to simulate the behaviour discussed in this question over on Maths SE.
clearvars;
samples = 1000;
x = 256;
r=exprnd(1/20e6,1,samples); % Generate exponentially distributed randoms.
endTime = sum(r);
quickMean=sum(r(1:x))/x; % Quick calc the mean and median.
quickMedian=0.693 * quickMean;
p = cumsum(r); % Convert event deltas into timestamps
bitstream = false(1,samples);
time = 0;
lastTime = 0;
for i = 1:samples
    lastTime = time;
    time = time + quickMedian;
    if (numel(p(p < time & p > lastTime)) > 0)
        bitstream(i) = true;
    end
    if (time > p(end))
        break
    end
end
ratio = sum(bitstream)/samples;
The script seems to work; however, if I use a large number of samples (say, a million), which would be beneficial, it really crawls.
I'm assuming that the problematic statement is this one:
p(p < time & p > lastTime)
Is there a more efficient way to check if any elements in an array fall between two values?

As I mentioned in my comment, we can use the fact that p is monotonically increasing and ignore values less than lastTime. If we find the last value for which p < time, only the values to the right can be greater than* time on the next iteration (lastTime).
clearvars;
samples = 1000;
x = 256;
r=exprnd(1/20e6,1,samples); % Generate exponentially distributed randoms.
endTime = sum(r);
quickMean=sum(r(1:x))/x; % Quick calc the mean and median.
quickMedian=0.693 * quickMean;
p = cumsum(r); % Convert event deltas into timestamps
bitstream = false(1,samples);
time = 0;
lastTime = 0;
% code is the same up to here ---
lastTimeIdx = 1; % index of (last value < lastTime) + 1
for i = 1:samples
    lastTime = time;
    time = time + quickMedian;
    valsInRange = p(lastTimeIdx:end) < time; % p > lastTime & p < time
    timeIdx = find(valsInRange, 1, 'last'); % returns [] or index
    if ~isempty(timeIdx)
        bitstream(i) = true;
        lastTimeIdx = lastTimeIdx + timeIdx; % update start of next search
    end
    if (time > p(end))
        break
    end
end
ratio = sum(bitstream)/samples;
*Actually, this is "greater than or equal to", but since the values of p are unique, they are the same thing.
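The loop above still evaluates p(lastTimeIdx:end) < time over the whole remaining tail of p on every iteration, so each step costs time proportional to what is left of p. Below is a minimal sketch of a two-pointer scan that relies on the same monotonicity but visits each timestamp only once. The small p, quickMedian and samples values are hypothetical stand-ins for the question's variables, and idx is an illustrative name, not part of the answer above.

```matlab
% Hypothetical stand-ins for the question's variables.
p = [0.5 0.7 2.3 5.1];   % sorted event timestamps (cumsum of the deltas)
quickMedian = 1;         % bin width
samples = 6;

bitstream = false(1, samples);
time = 0;
idx = 1;                 % first timestamp not yet consumed (p is sorted)
for i = 1:samples
    lastTime = time;
    time = time + quickMedian;
    % Skip timestamps at or before lastTime (after the first step only
    % exact ties with the bin boundary can remain here).
    while idx <= numel(p) && p(idx) <= lastTime
        idx = idx + 1;
    end
    % Is there a timestamp strictly inside (lastTime, time)?
    if idx <= numel(p) && p(idx) < time
        bitstream(i) = true;
    end
    % Consume every timestamp below time so the next step starts after them.
    while idx <= numel(p) && p(idx) < time
        idx = idx + 1;
    end
    if time > p(end)
        break
    end
end
ratio = sum(bitstream) / samples;
```

Since idx only moves forward, the total work grows roughly with samples + numel(p) rather than with their product.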
Okay, I just tried histc in Octave. I'm embarrassed to say that it's ridiculously fast, roughly four orders of magnitude faster. Here's the code I used, but histc is not recommended in MATLAB (histcounts replaces it), and the binning for histcounts is different, so you may have to play with it a bit.
bitstream_hist = histc(p, [0:samples]*quickMedian) > 0;
bitstream_hist = bitstream_hist(1:samples);
One million samples finishes in a fraction of a second. Sorry I didn't think of this sooner.
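Because every bin has the same width, the bin an event falls into can also be computed directly, without histc or histcounts. The sketch below assumes that equal-width binning; it matches the loop's (lastTime, time) test except for events landing exactly on a bin edge, plus tiny rounding differences, because the loop builds time by repeated addition while this version divides. The example values are hypothetical.

```matlab
% Hypothetical stand-ins for the question's variables.
p = [0.5 0.7 2.3 5.1];   % sorted event timestamps
quickMedian = 1;         % bin width
samples = 6;

% Bin i covers [(i-1)*quickMedian, i*quickMedian); find each event's bin.
binIdx = floor(p / quickMedian) + 1;
binIdx = binIdx(binIdx >= 1 & binIdx <= samples);  % keep bins the loop visits

bitstream = false(1, samples);
bitstream(binIdx) = true;   % true wherever at least one event fell in the bin
ratio = sum(bitstream) / samples;
```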

Let’s examine that whole expression:
numel(p(p < time & p > lastTime)) > 0
We can separate that out for clarity:
I = p < time & p > lastTime;
tmp = p(I);
n = numel(tmp);
n > 0
Here, the creation of tmp is pretty expensive: it looks at where I is true and copies those elements into a new array. But the only thing you do with this array is check how many elements it has. Logically, n will be equal to the number of true elements in I. And you don’t really need this number; you just need to know whether it’s larger than 0. That is, you want to know whether any of the elements in I is true. You can do so with any:
any(p < time & p > lastTime)
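A small illustration on made-up data: both forms give the same logical answer, but the any version never builds the temporary array. Note that the mask itself is still computed over all of p, so this removes the copy, not the full comparison.

```matlab
% Hypothetical timestamps and window.
p = [0.5 0.7 2.3 5.1];
lastTime = 2;
time = 3;

inRange = p < time & p > lastTime;    % logical mask, same size as p

% Original test: copies the matching elements into a temporary array first.
hasEventCopy = numel(p(inRange)) > 0;

% Suggested test: only inspects the mask, no temporary copy of p.
hasEventAny = any(inRange);
```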

Related

How do I linearly interpolate past missing values using future values in a while loop?

I am using MATLAB R2020a on macOS. I am trying to remove outlier values in a while loop. This involves calculating an exponentially weighted moving mean and then comparing it to each vector value. If the conditions are met, the input value is added to a separate vector of 'acceptable' values. The while loop then advances to the next input and calculates the new exponentially weighted moving average, which includes the newly accepted value.
However, if the condition is not met, I have written the code so that, instead of adding the input sample, a zero is added to the vector of 'acceptable' values. When the next acceptable value is added, the zero immediately before it is currently replaced by the mean of the two framing acceptable values. However, this only handles a single preceding zero, not multiple consecutive outliers. Replacing with a framing mean may also introduce aliasing errors.
Is there any way that the zeros can instead be replaced by linearly interpolating the "candidate outlier" points using the gradient between the two framing accepted values? That is, is there a way of counting backwards within the while loop to search for and replace zeros as soon as a new 'acceptable' value is found?
I would very much appreciate any suggestions, thanks in advance.
%Calculate exponentially weighted moving mean and tau without outliers
accepted_means = zeros(length(cycle_periods_filtered),1); % array for accepted exponentially weighted means
accepted_means(1) = cycle_periods_filtered(1);
k = zeros(length(cycle_periods_filtered),1); % array for accepted raw cycle periods
m = zeros(length(cycle_periods_filtered), 1); % array for raw periods for all cycles with outliers replaced by mean of framing values
k(1) = cycle_periods_filtered(1);
m(1) = cycle_periods_filtered(1);
tau = m/3; % pre-allocation for efficiency
i = 2; % index for counting through input signal
j = 2; % index for counting through accepted exponential mean values
n = 2; % index for counting through raw periods of all cycles
cycle_index3(1) = 1;
while i <= length(cycle_periods_filtered)
    mavCurrent = (1 - 1/w(j))*accepted_means(j - 1) + (1/w(j))*cycle_periods_filtered(i);
    if cycle_periods_filtered(i) < 1.5*(accepted_means(j - 1)) && cycle_periods_filtered(i) > 0.5*(accepted_means(j - 1)) % Identify high and low outliers
        accepted_means(j) = mavCurrent;
        k(j) = cycle_periods_filtered(i);
        m(n) = cycle_periods_filtered(i);
        cycle_index3(n) = i;
        tau(n) = m(n)/3;
        if m(n - 1) == 0
            m(n - 1) = (k(j) + k(j - 1))/2;
            tau(n - 1) = m(n)/3;
        end
        j = j + 1;
        n = n + 1;
    else
        m(n) = 0;
        n = n + 1;
    end
    i = i + 1;
end
% Scrap the tail
accepted_means(j - 1:end)=[];
k(j - 1:end) = [];
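One way to do the backwards search the question asks about is sketched below: when a value is accepted at position n, find the last nonzero entry before it and fill the gap with a straight line between the two framing values. The names m and n mirror the question's variables, but the numbers are hypothetical; in the real loop this would stand in for the if m(n - 1) == 0 block, and tau would need the same update (for example tau(gap) = m(gap)/3). If the interpolation can wait until after the loop, marking outliers with NaN instead of 0 and calling fillmissing(m, 'linear') should achieve the same result in one call.

```matlab
% Hypothetical data: zeros mark rejected outliers, as in the question's m.
m = [10 12 0 0 0 20 21 0 23]';
n = 6;                                   % m(n) has just been accepted

% Count backwards to the most recent accepted (nonzero) value.
prev = find(m(1:n-1) ~= 0, 1, 'last');
if ~isempty(prev) && prev < n - 1
    gap = (prev+1:n-1)';                 % positions of the zeros in between
    % Straight line through (prev, m(prev)) and (n, m(n)).
    m(gap) = m(prev) + (m(n) - m(prev)) * (gap - prev) / (n - prev);
end
```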

Matlab: Is there a quicker way to count the number of occurrences of a value in a vector?

Thanks in advance for the help
I am using the following to count the number of occurrences of the value x in a vector v
count = sum(v == x);
Is there any way that I can decrease the time to count these occurrences? Note that v tends to be small, usually no more than 100 elements. However, this operation occurs tens of thousands of times in my code and seems to be by far the most time-consuming operation when I analyze my code with the profiler. I've looked at the accumarray function, but it appears that the approach above tends to be faster (at least the way I tried to use it).
Depending on the rest of your code and the type of data, one possible way to approach this is to subtract x from v and count zeros instead. E.g.,
v = rand(200,1);
v(121) = v(3); % add some duplicates of v(3)
v(189) = v(3); % add some duplicates of v(3)
x = v(3);
count = numel(v)-nnz(v-x);
Subtracting costs CPU time, but you might benefit from it in the end. Since I don't have your data, I've just made a small test. You can test on your actual data to see whether it's something for you or not.
N = 100000;
for k = 1:1
    v = randn(200,1);
    vy = zeros(size(v));
    v(121) = v(3);
    v(189) = v(3);
    x = v(3);
    t1=tic;
    for j = 1:N
        count1 = sum(v(:)==x);
    end
    t1s=toc(t1)/N;
    t2=tic;
    for j = 1:N % time the cost of subtraction prior to nnz()
        vy=v-x;
        count2 = numel(v)-nnz(vy);
    end
    t2s=toc(t2)/N;
    t3=tic;
    for j = 1:N % time the cost of subtraction within nnz()
        count3 = numel(v)-nnz(v-x);
    end
    t3s=toc(t3)/N;
    [count1 count2 count3]
    [t1s t2s t3s]
end
ans =
3 3 3
ans =
1.0e-05 *
0.1496 0.1048 0.1222
You can see John D'Errico's answer here about counting zeros.
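If the same v is queried for many different values of x (the question does not say whether that is the case), another option is to count every distinct value once and then look the counts up. This sketch uses unique and accumarray on hypothetical data; whether it beats sum(v == x) for vectors of around 100 elements is something to confirm with the profiler, as in the test above.

```matlab
v = [3 1 3 2 3 1]';           % hypothetical data
[u, ~, ic] = unique(v);       % u: distinct values, ic: position of each v in u
counts = accumarray(ic, 1);   % counts(k) = number of times u(k) occurs in v

x = 3;
count = counts(u == x);       % empty if x does not occur in v
if isempty(count)
    count = 0;
end
```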

What's the difference between these two codes to get the sum of infinite series in MATLAB?

1 + 1/(2^4) + 1/(3^4) + 1/(4^4) + ...
This is the infinite series whose sum I'd like to compute, so I wrote this code in MATLAB.
n = 1;
numToAdd = 1;
sum = 0;
while numToAdd > 0
    numToAdd = n^(-4);
    sum = sum + numToAdd;
    n = n + 1;
end
disp(sum);
But I couldn't get a result because this code ran into an infinite loop. However, the code below worked well; it took only a second.
n = 1;
oldsum = -1;
newsum = 0;
while newsum > oldsum
    oldsum = newsum;
    newsum = newsum + n^(-4);
    n = n+1;
end
disp(newsum);
I reread the code and searched online for a while, but couldn't find the critical point. What makes the difference between these two versions? Is it a matter of double precision in MATLAB?
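The first version would have to keep going until n^(-4) underflows to zero, i.e. below the smallest positive double (about 2.2e-308 for normalized numbers, 4.9e-324 counting subnormals), while the second only needs the terms to drop below the rounding threshold of the running sum, which is set by the machine epsilon (eps, about 2.2e-16). eps is the distance from 1.0 to the next larger double; adding a positive number smaller than about half of that to 1 leaves the result equal to 1.
This means the first version would need on the order of 10^77 iterations (and in double precision it cannot even count that high: once n reaches 2^53, n + 1 rounds back to n, so the loop never ends), while the second only needs about 10^4.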
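The iteration counts come from solving for the n at which the next term drops below each threshold. This is a rough estimate that uses 10^-308 and 10^-16 as round numbers and ignores the fact that the running sum is about 1.08 rather than 1:

```latex
\begin{aligned}
n^{-4} < 10^{-308} &\iff n > 10^{308/4} = 10^{77},\\
n^{-4} < 10^{-16}  &\iff n > 10^{16/4}  = 10^{4}.
\end{aligned}
```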
The problem boils down to this:
x = 1.23456789; % Some random number
xEqualsXPlusEps = (x == x + 1e-20)
ZeroEqualsEps = (0 == 1e-20)
xEqualsXPlusEps will be true, while ZeroEqualsEps is false. This is due to the way floating point arithmetic works. The value 1e-20 is smaller than the least significant bit of x, so x+1e-20 won't be larger than x. However 1e-20 is not considered equal to 0. In comparison to x, 1e-20 is relatively small, whereas in comparison to 0, 1e-20 is not small at all.
To fix this problem you would have to use:
while numToAdd > tolerance % instead of > 0
where tolerance is some small number greater than zero.
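A complete version of the first loop with that change is sketched below. The tolerance of 1e-16 is an arbitrary value chosen for illustration, and the accumulator is renamed total so that it no longer shadows the built-in sum function. The result can be checked against the known closed form of this series, pi^4/90 (about 1.0823).

```matlab
tolerance = 1e-16;   % illustrative threshold, not a recommended value
n = 1;
numToAdd = 1;
total = 0;           % renamed from sum to avoid shadowing the built-in
while numToAdd > tolerance
    numToAdd = n^(-4);
    total = total + numToAdd;
    n = n + 1;
end
fprintf('sum = %.15f after %d terms\n', total, n - 1);
fprintf('pi^4/90 = %.15f\n', pi^4/90);
```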

MATLAB - My while-loop conditions for extracting specific data from a timeseries

Background
I have 4 data sets: one is weather data with time and pressure and another is a pressure sensor data set with the same; time and pressure. Essentially, both are a time series. The longer time series is the weather data which has about 64008 data points for both variables. The shorter time series for the pressure sensors is 51759. You could say that the shorter time series is a subset of the longer time series with some missing data points. Regardless, I want to get pressure for the weather but only for the times that my sensor has.
Motivation
So basically, I am trying to implement a while loop so that for every matching time of my pressure sensor and weather data, I take the pressure from the weather data. I don't need to record the time from the weather data because I can just use the time sequence from my pressure sensor.
Example
To give an idea of what I am talking about, I wrote a sample script, and it runs just fine.
x(:,1) = (1:50)';
x(:,2) = (51:100)';
y(:,1) = [1:12 20:25 40:45]';
i = 1;
j = 1;
while i < numel(y)+1
    if y(i) == x(j,1)
        a(i,1) = x(j,2);
        i = i + 1;
        j = j + 1;
    else
        j = j + 1;
    end
end
a
% check
size(y)
size(a)
As you can see, I made a matrix x with a long series in 2 columns. Then I made a vector y containing a subset of the values in x(:,1). When I run my script, the size of a matches the size of y, and the values are the ones I expect. So it works, unless this simplified version is missing something. Either way, my real script is below.
% Pressure Data
west_time;
west_pressure;
% Weather Data
weather_data(:,1) = weather_time;
weather_data(:,2) = weather_pressure;
% Initialize vector
weather_pressure_sensor = zeros(numel(west_time));
% Obtaining the pressure from the weather data at a
% particular point in time when it corresponds
% with the time from my pressure sensor
i = 1;
j = 1;
while i < numel(west_time),
    if west_time(i) == weather_data(j,1)
        weather_pressure_sensor(i,:) = weather_data(j,2);
        i = i + 1;
        j = j + 1;
    else
        i = i;
        j = j + 1;
    end
end
% Weather Pressure
weather_pressure_final = weather_pressure_sensor(:,2);
However, when I run it on my real data set, I get this error:
Attempted to access weather_data(64009,1); index out of
bounds because size(weather_data)=[64008,2].
Error in data_timeset2 (line 69)
if west_time(i) == weather_data(j,1)
I was wondering if I could get some assistance with my code. Am I missing something or did I not define something? This is the way I've always done while loops so I don't know why it decides to fail me now. But in any case, I'm sure it's something really trivial and stupid but I can't figure out for the life of me. Or maybe someone has another way...? Either way, much appreciated in advance!
If the time points in your data set are unique, there is a much better way to do this.
t1 = weather_time;       % time series 1 (weather data, from the question)
t2 = west_time;          % time series 2 (pressure sensor); a subset of t1
p1 = weather_pressure;   % pressure series 1; same length as t1
p2 = west_pressure;      % pressure series 2; same length as t2
[t1, index] = sort(t1);  % make monotonic if it isn't already
p1 = p1(index);          % apply same sorting to pressures
[t2, index] = sort(t2);  % same for t2, p2
p2 = p2(index);
[Lia, Locb] = ismember(t2, t1); % Lia(k) is true if t2(k) occurs in t1
                                % Locb(k) is the index of t2(k) in t1 (0 if absent)
final_p = p1(Locb(Lia));        % values of p1 at the times in t2 that exist in t1
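The same idea applied to the question's own toy example (x and y as defined there) replaces the whole while loop with a single ismember call. Indexing with Locb(Lia) keeps it safe if some times in y are missing from x. Note that ismember compares values exactly; if the two time vectors come from different sources and can differ by rounding, a tolerance-based comparison such as ismembertol may be needed instead.

```matlab
% The question's toy data: x(:,1) are times, x(:,2) the matching pressures.
x = [(1:50)', (51:100)'];
y = [1:12 20:25 40:45]';             % subset of the times in x(:,1)

[Lia, Locb] = ismember(y, x(:,1));   % Lia: is y(k) in x(:,1)? Locb: where
a = x(Locb(Lia), 2);                 % pressures at the matching times
```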

Cutting down large matrix iteration time

I have some massive matrix computations to do in MATLAB. They're nothing complicated (see below), but I'm having trouble making them efficient. What I have below works, but it takes far too long to be feasible.
for i = 1 : 100
    for j = 1 : 20000
        element = matrix{i}(j,1);
        if element <= bigNum && element >= smallNum
            count = count + 1;
        end
    end
end
Is there a way of making this faster? MATLAB is meant to be good at these problems so I would imagine so?
Thank you :).
count = 0;
for i = 1:100
    count = count + sum(matrix{i}(:,1) <= bigNum & matrix{i}(:,1) >= smallNum);
end
If matrix is an ordinary numeric matrix rather than a cell array, this will count over all of its elements:
count = sum(matrix(:) >= smallNum & matrix(:) <= bigNum);
If your matrix is really huge, consider anyExceed. You can profile (check the running time of) both functions on your matrix and decide.
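For the cell-array layout in the question, the per-cell loop can also be removed by extracting every first column, stacking them, and doing one comparison. This is a sketch with a small hypothetical cell array and bounds; it trades the loop for one temporary column holding all the first-column values (about two million doubles for 100 cells of 20000 rows).

```matlab
% Hypothetical stand-in for the question's cell array and bounds.
matrix = {[1 9; 5 9; 7 9], [2 0; 8 0]};
smallNum = 2;
bigNum = 7;

firstCols = cellfun(@(c) c(:,1), matrix, 'UniformOutput', false);
col1 = vertcat(firstCols{:});                  % all first columns stacked
count = nnz(col1 >= smallNum & col1 <= bigNum);
```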