Detecting if values are within range of each other and taking a midpoint - MATLAB

Following on from: Detecting if any values are within a certain value of each other - MATLAB
I am currently using randi to generate a random number from which I then subtract and add a second number - generated using poissrnd:
for k=1:10
a = poissrnd(200,1);
b(k,1) = randi([1,20000]);
c(k,1:2) = [b(k,1)-a,b(k,1)+a];
end
c = sort(c);
c provides an output in this format:
823 1281
5260 5676
5372 5760
5379 5779
6808 7244
6869 7293
9203 9653
12197 12563
14411 14765
15302 15670
Which are essentially the boundaries +/- a around the point chosen in b.
I then want to set an additional variable (i.e. d = 2000) which is used as the threshold by which values are matched and then merged. The boundaries are taken into consideration for this - the output of the above value when d = 2000 would be:
1052
7456
13933
The boundaries 823-1281 are not within 2000 of any other value so the midpoint is taken - reflecting the original value. The next midpoint taken is between 5260 and 9653 because, as you go along, each successive value is within 2000 of the one before it until 9653. The same logic is then applied to take the midpoint between 12197 and 15670.
Is there a quick and easy way to adapt the answer given in the linked question to deal with a 2 column format?
EDIT (in order to make it clearer):
The values held in c can be thought of as demarcating the boundaries of 'blocks' that sit on a line. Every single boundary is checked to see if anything lies within 2000 of it (the black lines).
As soon as any black line touches a red block, that entire red block is incorporated into the same merge block - in full. This is why the first midpoint value calculated is 1052 - nothing is touched by the two black lines emanating from the first two boundaries. However the next set of blocks all touch one another. This incorporates them all into the merge such that the midpoint is taken between 9653 and 5260 = 7456.
The block starting at 12197 is out of reach of its preceding one so it remains separate. I've not shown all the blocks.
EDIT 2 @Esteban:
b =
849
1975
8336
9599
12057
12983
13193
13736
16887
18578
c =
662 1036
1764 2186
8148 8524
9386 9812
11843 12271
12809 13157
12995 13391
13543 13929
16687 17087
18361 18795
Your script then produces the result:
8980
12886
17741
When in fact it should be:
1424
8980
12886
17741
So it is just missing the first value - if no merge is occurring, the midpoint is just taken between the two values. Sometimes this seems to work - other times it doesn't.
For example here it works (when value is set to 1000 instead of 2000 as a test):
c =
2333 2789
5595 6023
6236 6664
10332 10754
11425 11865
12506 12926
12678 13114
15105 15517
15425 15797
19490 19874
result =
2561
6129
11723
15451
19682

See if this works for you -
th = 2000 %// threshold
%// Column arrays
col1 = c(:,1)
col2 = c(:,2)
%// Position of "group" shifts
grp_changes = diff([col2(1:end-1,:) col1(2:end,:)],[],2)>th
%// Start and stop positions of shifts
stops = [grp_changes ; 1]
starts = [1 ; stops(1:end-1)]
%// Finally the mean of shift positions, which is the desired output
out = floor(mean([col1(starts~=0) col2(stops~=0)],2))

Not 100% sure if it will work for all your samples... but this is the code I came up with which works with at least the data in your example:
value=2000;
indices = find(abs(c(2:end,1)-c(1:end-1,2))>value);
indices = vertcat(indices, length(c));
li = indices(1:end-1)+1;
ri = indices(2:end);
if li(1)==2
li=vertcat(1,li);
ri=vertcat(1,ri);
end
result = floor((c(ri,2)+c(li,1))/2)
It's not very clean and could surely be done in fewer lines, but it's easy to understand and it works, and since your c will be small, I don't see the need to optimize this further unless you will run it millions of times.
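The missing first value reported in EDIT 2 of the question comes from the first group only being added when li(1)==2. A minimal sketch of one way around that is to always start the first group at row 1. It assumes, as in the question, that c is sorted and that each block's right edge is the furthest point reached so far; the names gaps and breaks are introduced here for illustration only.

```matlab
% Data and threshold taken from EDIT 2 of the question
c = [662 1036; 1764 2186; 8148 8524; 9386 9812; 11843 12271; ...
     12809 13157; 12995 13391; 13543 13929; 16687 17087; 18361 18795];
value = 2000;

% Distance from the end of each block to the start of the next one
gaps = c(2:end,1) - c(1:end-1,2);

% Rows after which a new merged group begins
breaks = find(gaps > value);

% First and last row of every group; row 1 always opens the first group
li = [1; breaks + 1];
ri = [breaks; size(c,1)];

% Midpoint between the left edge of the first block and the right edge
% of the last block in each group
result = floor((c(li,1) + c(ri,2)) / 2);
```

With the EDIT 2 data this gives 1424, 8980, 12886 and 17741, the values the question lists as the expected result.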

Related

Find sum distance to horizontal line for all points in Matlab

I have a scatter plot of approximately 30,000 pts, all of which lie above a horizontal line which I've visually defined in my plot. My goal now is to sum the vertical distance of all of these points to this horizontal line.
The data was read in from a .csv file and is already saved to the workspace, but I also need to check whether a value is NaN, and ignore these.
This is where I'm at right now:
vert_deviation = 0;
idx = 1;
while idx <= numel(my_data(:,5)) && isnan(idx) == 0
vert_deviation = vert_deviation + ((my_data(idx,5) - horiz_line_y_val));
idx = idx + 1;
end
I believe that the && operator requires two logical statements, but I'm not sure how to rewrite this loop accordingly at the moment. I also don't understand why vert_deviation returns NaN at the moment, but I assume this might have to do with the first mistake I described...
I would really appreciate some guidance here - thank you in advance!
EDIT: The 'horizontal line' is a slight oversimplification - in reality the lower limit I need to find the distance to consists of 6 different line segments
I should have specified that the lower limit to which I need to calculate the distance for all scatterplot points varies for different x values (the horizontal line snippet was meant to be a simplification but may have been misleading... apologies for that)
I first modified the data I had already read into the workspace by replacing all NaN values with 0. Next, I wrote a while loop which defines the number of indices to loop through, and defined an && condition to filter out any zeroes. I then wrote a nested if statement which checks what range of x values the given index falls into, and subsequently takes the delta between the given point and the y value of the linear lower limit for that section of the plot. I repeated this for all points.
while idx <= numel(my_data(:,3)) && not(my_data(idx,3) == 0)
...
if my_data(idx,3) < upper_x_lim && my_data(idx,5) > lower_x_lim
vert_deviation = vert_deviation + (my_data(idx,4) - (m6 * (my_data(idx,5)) + b6))
end
...
m6 and b6 in this case are the slope and y intercept calculated for one section of the plot. The if statement is repeated six times, once for each section of the lower limit.
I'm sure there are more elegant ways to do this, so I'm open to any feedback if there's room for improvement!
Your loop doesn't exclude NaN values because isnan(idx) == 0 checks to see if the index is NaN, rather than checking if the data point is NaN. Instead, check for isnan(my_data(idx,5)).
Also, you can simplify your code using for instead of while:
vert_deviation = 0;
for idx=1:size(my_data,1)
if ~isnan(my_data(idx,5))
vert_deviation = vert_deviation + ((my_data(idx,5) - horiz_line_y_val));
end
end
As @Adriaan suggested, you can remove the loop altogether, but it seems that the code in the OP is an oversimplification of the problem. Looking at the additional code posted, I guess it is still possible to remove the loops, but I'm not certain it will be a significant speed improvement. Just use a loop.
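For the simplified single-line case only, a loop-free version might look like the sketch below. The data matrix and the line height are made up for illustration; the piecewise lower limit from the question's edit is not handled here.

```matlab
% Made-up data: column 5 holds the y values, one of them NaN
my_data = [zeros(4,4), [3; NaN; 5; 7]];
horiz_line_y_val = 2;

y = my_data(:,5);
valid = ~isnan(y);                                   % logical mask of usable points
vert_deviation = sum(y(valid) - horiz_line_y_val);   % sum of vertical distances
```

Logical indexing drops the NaN rows before the subtraction, so a single NaN can no longer turn the whole sum into NaN.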

for loop not reading previous value

I seem to currently be banging my head against a brick wall as try as I might, I can not see my error here.
I am attempting to write a for loop in MATLAB that uses the equation below (adiabatic compression) to calculate the new pressure after one degree of crankshaft rotation in a four stroke engine cycle.
P2 = P1 * (V2 / V1)^1.35
I am using the calculated volume from the crank-slider model as an input. I have tried this in Excel and it works as expected and gives the overall max output correctly.
The for loop in question is below;
Cyl_P = ones(720,1)
for i = (2:1:length(Cyl_V))'
Cyl_P(i,:) = Cyl_P(i-1,:) .* (Cyl_V(i,:) ./ Cyl_V(i-1,:)).^1.35
end
My aim is to use the first element of the vector Cyl_P, which is equal to one, as an input to the equation above, and multiply it by the second element of Cyl_V divided by the first, with that volume ratio raised to the power 1.35. That should calculate the second element of Cyl_P. I would then like to feed that value back into the same equation to calculate the third element and so on.
What am I missing?
I've put the full code below
Theta = deg2rad(1:1:720)'
Stroke = 82 / 1000
R = Stroke / 2
L = 90.5 / 1000
Bore = 71.9 / 1000
d_h = (R+L) - (R.*cos(Theta)) - sqrt(L.^2 - (R.*sin(Theta)).^2)
Pist_h = d_h
figure
plot(Pist_h)
Bore_A = (pi*Bore^2)/4
Swept_V = (Pist_h .* Bore_A)
Clear_V = max(Swept_V) / 10
Total_V = max(Swept_V) + Clear_V
Cyl_V = (Swept_V + Clear_V)
figure
plot(Cyl_V)
for ii = (2:1:length(Cyl_V))'
div_V(ii,:) = (Cyl_V(ii) ./ Cyl_V(ii-1,:)).^1.35
end
Cyl_P = ones(720,1)
for i = (2:1:length(Cyl_V))'
Cyl_P(i,:) = Cyl_P(i-1,:) .* (Cyl_V(i,:) ./ Cyl_V(i-1,:)).^1.35
end
figure
plot(Cyl_P)
Your problem is that you transpose the array you pass to the for loop. MATLAB's for iterates over the columns of its argument, so when you feed it a column vector the loop body runs only once, with the loop variable set to that entire column. General comments:
' is the complex conjugate transpose, .' is the regular transpose.
i is the imaginary unit in MATLAB, it's common practice not to use it as a variable name.
2:1:4 does the same as 2:4, as 1 is the default step size.
Please use semi-colons, ;, after each statement, so as to prevent MATLAB from echoing the result of each line to the command window. This makes the script easier to run, and if you have matrices with >1M entries, echoing the contents might even crash the program altogether. Even in this case, you are echoing 720 entries of Cyl_P 720 times. For checking variable contents, just break the script where necessary (or run it in parts) and examine the content where warranted (e.g. Cyl_P(1:3) would suffice here to check whether the loop fills the vector as intended).
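Putting these comments together, the loop could be sketched as follows. The volume vector is made up so the snippet runs on its own, gamma is a name introduced here for the exponent 1.35, and the volume ratio is kept exactly as written in the question.

```matlab
Cyl_V = [4; 2; 1; 2];            % made-up volumes, stands in for the crank-slider output
gamma = 1.35;                    % exponent used in the question

Cyl_P = ones(numel(Cyl_V), 1);   % first element is the starting pressure
for k = 2:numel(Cyl_V)           % row vector range, so the loop runs once per element
    % Each step uses the pressure computed in the previous iteration
    Cyl_P(k) = Cyl_P(k-1) * (Cyl_V(k) / Cyl_V(k-1))^gamma;
end
```

Because the range 2:numel(Cyl_V) is a row vector, k takes one scalar value per iteration and Cyl_P(k-1) is always the value computed one step earlier.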

trial structure psychtoolbox experiment

I want to program an experiment that should consist of 10 trials (10 pictures) that are shown either on the left or right side. At the same time there is an odd or even number shown on the opposite side. I want to measure reaction time and response (odd or even). I guess I am stuck with the trial structure.
Is it enough to just define the ntrials = length(pictures) or do I need a for loop for the variables (pic_position, number_position)?
This is my approach so far:
pic_pos = {'left' 'right'};
num_pos = {'left' 'right'};
evenodd = {'odd' 'even'};
ntrials = length(pictures);
for n = 1:length(pictures)
trials(ntrials).picture = pictures(n)
end
pictures = Shuffle(pictures);
for trial = 1:ntrials
currentnumber = num2str(numbers{trial})
switch trials(trial).num_pos
case 'right'
x = screencentrex + img_dist
case 'left'
x = screencentrex - img_dist
end;
Screen('TextSize', win, [25]);
DrawFormattedText(win, currentnumber, [x], 'center', [255 255 255]);
Screen('Flip', win);
WaitSecs(3);
Unfortunately it doesn't show me the number.
You don't necessarily need to loop over the position or number variables. Instead, you can generate the stimulus parameters for each trial in advance, for example using the Psychtoolbox function BalanceTrials:
[trialNumberPositions, trialNumberEvenOrOdd] = BalanceTrials(ntrials, 1, num_pos, evenodd);
This returns combinations of the levels of the factors 'num_pos' and 'evenodd', the orders of which are then randomized. So for example the number position for the trial number saved within the variable 'trial', in your example would be accessed as trialNumberPositions{trial}. Keep in mind that you have 4 unique combinations of evenodd and num_pos, so for your trial numbers to be balanced across conditions you would have a total number of trials that is a multiple of 4 (for example 12 trials total, rather than 10). I didn't include pic_pos because the pic_pos would always be whatever num_pos is not, as in your description the two stimuli would never be presented on the same side.
As to why your number isn't being displayed, it is hard to tell without more of the experiment script. But you are currently writing white text to the screen, is the background non-white?
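To make the idea of precomputed trial parameters concrete, here is a plain-MATLAB sketch that builds a balanced, shuffled trial list without any Psychtoolbox calls. It is only an illustration of the kind of result described above: the field names, the use of ndgrid and randperm, and the choice of 12 trials are assumptions, not part of the original script.

```matlab
num_pos = {'left', 'right'};
evenodd = {'odd', 'even'};
ntrials = 12;                               % assumed: a multiple of the 4 combinations

% Enumerate the 4 combinations of number position and parity
[posIdx, parityIdx] = ndgrid(1:2, 1:2);
reps = ntrials / numel(posIdx);
posIdx = repmat(posIdx(:), reps, 1);
parityIdx = repmat(parityIdx(:), reps, 1);

order = randperm(ntrials);                  % random trial order

trials = struct('num_pos', {}, 'pic_pos', {}, 'evenodd', {});
for n = 1:ntrials
    k = order(n);
    trials(n).num_pos = num_pos{posIdx(k)};
    trials(n).pic_pos = num_pos{3 - posIdx(k)};   % picture goes on the opposite side
    trials(n).evenodd = evenodd{parityIdx(k)};
end
```

Each entry is written to trials(n), so every trial gets its own parameters and trials(trial).num_pos exists when the presentation loop reads it.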

Remove noise from a rectangular wave matlab

I have some recordings (from 16:00PM to 16:00PM) where ones indicate some kind of noise and zeros indicate quiet moments. The following code tries to replicate these recordings.
dt = datenum('00:02:00','HH:MM:ss') - datenum('00:01:00','HH:MM:ss');
time_begin = datenum('00:00:00','HH:MM:ss');
time_end = datenum('24:00:00','HH:MM:ss');
time = repmat(cellstr(datestr(time_begin:dt:time_end,'HH:MM:ss')),2,1);
loudness = ones(1,numel(time));
quiet_start = [1489 1737];
quiet_end = [1603 1906];
for i = 1: numel(quiet_start)
loudness(quiet_start(i):quiet_end(i))=0;
end
time = time(961:2400);
loudness = loudness(961:2400);
figure
plot(loudness)
ylim([0 3])
I know that in the interval 16:00PM - 16:00PM there should be only 1 "bout" of zeros. Here (if you plot loudness) you can see that there are 2 bouts of zeros.
I have 2 possibilities:
remove one of the two bouts of zeros
remove the bout of ones in the middle
Is there any measure that I can use to make this decision? I.e. do I make a bigger error converting ones to zeros or vice versa?
There are 2 (or more) bouts of zeros because of some errors in the recordings... but for sure there should be only one. I would like to remove the bouts in order to "modify" the system as little as possible. For instance: in this case I would remove the first bout of zeros since it is the smallest, but what should I do if there are more than 2 bouts? Is there any algorithm that deals with this kind of problem?
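One way to compare the two options is to count how many samples each would flip. The sketch below does that on a short made-up signal; treating the number of flipped samples as the error measure is an assumption made here, not an established answer to the question, and all variable names besides loudness are hypothetical.

```matlab
loudness = [1 1 0 0 1 1 1 0 0 0 0 1];      % made-up signal with two bouts of zeros

% Pad with ones so bouts touching either end are detected too
d = diff([1, loudness, 1]);
bout_start = find(d == -1);                % index of the first zero of each bout
bout_end   = find(d == 1) - 1;             % index of the last zero of each bout
bout_len   = bout_end - bout_start + 1;

% Option 1: keep only the longest bout, turn the other zeros into ones
[~, keep] = max(bout_len);
cost_remove = sum(bout_len) - bout_len(keep);

% Option 2: merge everything, turn the ones between the bouts into zeros
cost_fill = (bout_end(end) - bout_start(1) + 1) - sum(bout_len);

% Apply option 1
cleaned = ones(size(loudness));
cleaned(bout_start(keep):bout_end(keep)) = 0;
```

Here removing the short bout flips 2 samples while filling the gap flips 3, so under this measure removing the short bout changes the signal less. The same counts work for any number of bouts.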

A moving average with different functions and varying time-frames

I have a matrix of time-series data for 8 variables with about 2500 points (~10 years of mon-fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Let's say frames = [100 252 504 756] - I would like to calculate the four functions above over each of the (time-)frames, on a daily basis - so the return for day 300 in the case with the 100-day frame would be [mean variance skewness kurtosis] from the period day 201 - day 300 (100 days in total)... and so on.
I know this means I would get an array output, and the first frame number of days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as @PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(@minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
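One possible route, offered as a sketch and not part of the original answer, is to apply filter to both X and X.^2 and combine the two running sums using the identity sum((x - mean)^2) = sum(x^2) - (sum(x))^2 / n. This formula can lose precision when the mean is large relative to the spread, so it is worth checking against var on real data.

```matlab
T = 100; N = 5;
WindowLength = 10;
X = randn(T, N);

kernel = ones(1, WindowLength);
S1 = filter(kernel, 1, X);        % running sum of X over the window
S2 = filter(kernel, 1, X.^2);     % running sum of X.^2 over the window

% Sample variance from the two running sums
VarianceMA = (S2 - S1.^2 / WindowLength) / (WindowLength - 1);
VarianceMA(1:WindowLength-1, :) = nan;   % incomplete windows
```

For any row t from WindowLength onwards, VarianceMA(t, :) should agree with var(X(t-WindowLength+1:t, :)) up to rounding error.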
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
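The last two points can be sketched together as below: the statistic is passed around as a function handle and the whole calculation is repeated for each window length. The data, the frames values and the names stat and out are made up for illustration; the handle is assumed to work column-wise on a window and return one value per column.

```matlab
X = reshape(1:20, 10, 2);          % made-up data: T = 10 observations, N = 2 series
frames = [3 5];                    % made-up window lengths
stat = @(w) var(w, 0, 1);          % any column-wise statistic, e.g. @(w) mean(w, 1)

T = size(X, 1);
N = size(X, 2);
out = nan(T, N, numel(frames));    % one T-by-N page per window length

for f = 1:numel(frames)
    w = frames(f);
    for t = w:T
        % Window ends at row t, so output rows line up with rows of X
        out(t, :, f) = stat(X(t-w+1:t, :));
    end
end
```

Swapping stat for a handle to skewness or kurtosis (which need the Statistics Toolbox) gives the other moving statistics with the same indexing.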
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window+1,1);
st_dev = zeros(length(returnsHF)-window+1,1);
skew = zeros(length(returnsHF)-window+1,1);
kurt = zeros(length(returnsHF)-window+1,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assigning the values to the zero-matrices
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window +1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')