How do I linearly interpolate past missing values using future values in a while loop?

I am using MATLAB R2020a on macOS. I am trying to remove outlier values in a while loop. This involves calculating an exponentially weighted moving mean and then comparing it to a vector value. If the conditions are met, the vector input is added to a separate vector of 'acceptable' values. The while loop then advances to the next input and calculates the new exponentially weighted moving average, which includes the newly accepted vector input.
However, if the condition is not met, I have written code so that, instead of adding the input sample, a zero is added to the vector of 'acceptable' values. When the next acceptable value is added, the zero immediately before it is currently replaced by the mean of the two framing acceptable values. However, this only accounts for one past zero, not for multiple consecutive outliers. Replacing with a framing mean may also introduce aliasing errors.
Is there any way that the zeros can instead be replaced by linearly interpolating the "candidate outlier" points using the gradient between the two framing accepted values? That is, is there a way of counting backwards within the while loop to search for and replace zeros as soon as a new 'acceptable' value is found?
I would very much appreciate any suggestions, thanks in advance.
%Calculate exponentially weighted moving mean and tau without outliers
accepted_means = zeros(length(cycle_periods_filtered),1); % array for accepted exponentially weighted means
accepted_means(1) = cycle_periods_filtered(1);
k = zeros(length(cycle_periods_filtered),1); % array for accepted raw cycle periods
m = zeros(length(cycle_periods_filtered), 1); % array for raw periods for all cycles with outliers replaced by mean of framing values
k(1) = cycle_periods_filtered(1);
m(1) = cycle_periods_filtered(1);
tau = m/3; % pre-allocation for efficiency
i = 2; % index for counting through input signal
j = 2; % index for counting through accepted exponential mean values
n = 2; % index for counting through raw periods of all cycles
cycle_index3(1) = 1;
while i <= length(cycle_periods_filtered)
    mavCurrent = (1 - 1/w(j))*accepted_means(j - 1) + (1/w(j))*cycle_periods_filtered(i);
    if cycle_periods_filtered(i) < 1.5*(accepted_means(j - 1)) && cycle_periods_filtered(i) > 0.5*(accepted_means(j - 1)) % Identify high and low outliers
        accepted_means(j) = mavCurrent;
        k(j) = cycle_periods_filtered(i);
        m(n) = cycle_periods_filtered(i);
        cycle_index3(n) = i;
        tau(n) = m(n)/3;
        if m(n - 1) == 0
            m(n - 1) = (k(j) + k(j - 1))/2;
            tau(n - 1) = m(n)/3;
        end
        j = j + 1;
        n = n + 1;
    else
        m(n) = 0;
        n = n + 1;
    end
    i = i + 1;
end
% Scrap the tail
accepted_means(j - 1:end) = [];
k(j - 1:end) = [];
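One approach, sketched below (my illustration, not the poster's code): mark rejected samples with NaN instead of 0, and as soon as the next acceptable value is found, backfill the whole NaN run by linear interpolation between the two framing accepted values. The acceptance test here is simplified to compare against the last accepted value rather than the exponentially weighted mean, and x stands in for cycle_periods_filtered:

x = cycle_periods_filtered;   % input signal (assumed defined)
m = nan(size(x));             % repaired signal; NaN marks a candidate outlier
m(1) = x(1);
last = 1;                     % index of the most recent accepted value
for i = 2:numel(x)
    if x(i) < 1.5*m(last) && x(i) > 0.5*m(last)  % simplified acceptance test
        m(i) = x(i);
        if i - last > 1       % one or more rejected samples to backfill
            gap = last:i;     % spans both framing accepted values
            m(gap) = m(last) + (m(i) - m(last)) * (gap - last) / (i - last);
        end
        last = i;
    end                       % rejected: m(i) stays NaN for now
end

Since R2016b, MATLAB also has fillmissing, so an alternative is to leave the NaNs in place during the loop and interpolate every run in one call afterwards, e.g. m = fillmissing(m, 'linear'); how NaNs at the very start or end of the signal are filled is then controlled by its 'EndValues' option.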


Which Bins are occupied in a 3D histogram in MatLab

I have 3D data from which I need to calculate properties.
To reduce computation I want to discretize the space, calculate the properties from the bin instead of from the individual data points, and then reassign the property calculated for the bin back to the data points.
I further only want to calculate the bins which have points within them.
Since there is no 3D binning function in MATLAB, what I do is use histcounts over each dimension and then search for the unique bins that have been assigned to the data points.
a5pre=compositions(:,1);
a7pre=compositions(:,2);
a8pre=compositions(:,3);
%% BINNING
a5pre_edges=[0,linspace(0.005,0.995,19),1];
a5pre_val=(a5pre_edges(1:end-1) + a5pre_edges(2:end))/2;
a5pre_val(1)=0;
a5pre_val(end)=1;
a7pre_edges=[0,linspace(0.005,0.995,49),1];
a7pre_val=(a7pre_edges(1:end-1) + a7pre_edges(2:end))/2;
a7pre_val(1)=0;
a7pre_val(end)=1;
a8pre_edges=a7pre_edges;
a8pre_val=a7pre_val;
[~,~,bin1]=histcounts(a5pre,a5pre_edges);
[~,~,bin2]=histcounts(a7pre,a7pre_edges);
[~,~,bin3]=histcounts(a8pre,a8pre_edges);
bins=[bin1,bin2,bin3];
[A,~,C]=unique(bins,'rows','stable');
a5pre=a5pre_val(A(:,1));
a7pre=a7pre_val(A(:,2));
a8pre=a8pre_val(A(:,3));
It seems like the unique function is pretty time-consuming, so I was wondering if there is a faster way to do it, knowing that the rows can only contain integers... or a totally different approach.
Best regards
function [comps,C]=compo_binner(x,y,z,e1,e2,e3,v1,v2,v3)
    C = NaN(length(x),1);
    comps = NaN(length(x),3);
    id = 1;
    for i = 1:numel(x)
        B_temp(1,1) = v1(sum(x(i)>e1));
        B_temp(1,2) = v2(sum(y(i)>e2));
        B_temp(1,3) = v3(sum(z(i)>e3));
        C_id = sum(ismember(comps,B_temp),2)==3;
        if sum(C_id)>0
            C(i) = find(C_id);
        else
            comps(id,:) = B_temp;
            id = id+1;
            C_id = sum(ismember(comps,B_temp),2)==3;
            C(i) = find(C_id>0);
        end
    end
    comps(any(isnan(comps), 2), :) = [];
end
But it's way slower than the histcounts/unique version. I can't avoid the find function, and that's a function you surely want to avoid in a loop when it's about speed...
If I understand correctly, you want to compute a 3D histogram. If there's no built-in tool to compute one, it is simple to write one:
function [H, lindices] = histogram3d(data, n)
% histogram3d 3D histogram
%   H = histogram3d(data, n) computes a 3D histogram from (x,y,z) values
%   in the Nx3 array `data`. `n` is the number of bins between 0 and 1.
%   It is assumed all values in `data` are between 0 and 1.
assert(size(data,2) == 3, 'data must be Nx3');
H = zeros(n, n, n);
indices = floor(data * n) + 1; % per-axis bin index in 1..n
indices(indices > n) = n;      % clamp values exactly equal to 1
lindices = sub2ind(size(H), indices(:,1), indices(:,2), indices(:,3));
for ii = 1:size(data,1)
    H(lindices(ii)) = H(lindices(ii)) + 1;
end
end
Now, given your compositions array, and binning each dimension into 20 bins, we get:
[H, indices] = histogram3d(compositions, 20);
idx = find(H);
[x,y,z] = ind2sub(size(H), idx);
reduced_compositions = ([x,y,z] - 0.5) / 20;
The bin centers for H are at ((1:20)-0.5)/20.
On my machine this runs in a fraction of a second for 5 million inputs points.
Now, for each composition(ii,:), you have a number indices(ii), which matches another number idx(jj), corresponding to reduced_compositions(jj,:). One easy way to make the assignment of results is as follows:
H(H > 0) = 1:numel(idx);
indices = H(indices);
Now for each composition(ii,:), your closest match in the reduced set is reduced_compositions(indices(ii),:).
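As a side note (my addition, not part of the answer above): the counting loop inside histogram3d can also be replaced by accumarray, which tallies the linear bin indices in one call. A sketch under the same assumptions (data is Nx3 with values in [0,1], n bins per axis):

n = 20;
indices = min(floor(data * n) + 1, n);  % per-axis bin index, clamped at n
lindices = sub2ind([n n n], indices(:,1), indices(:,2), indices(:,3));
H = reshape(accumarray(lindices, 1, [n^3, 1]), n, n, n);  % counts per 3D bin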

Matlab : Help in entropy estimation of a discretized time series

This question is a continuation of a previous one: Matlab : Plot of entropy vs digitized code length.
I want to calculate the entropy of a random variable that is the discretized version (0/1) of a continuous random variable x. The random variable denotes the state of a nonlinear dynamical system called the Tent Map. Iterating the Tent Map yields a time series of length N.
The code should exit as soon as the entropy of the discretized time series becomes equal to the entropy of the dynamical system. It is known theoretically that the entropy of the system is log_2(2). The code exits, but the first 3 values of the entropy array are erroneous - entropy(1) = 1, entropy(2) = NaN and entropy(3) = NaN. I am scratching my head as to why this is happening and how I can get rid of it. Please help in correcting the code. Thank you.
clear all
H = log(2)
threshold = 0.5;
x(1) = rand;
lambda(1) = 1;
entropy(1,1) = 1;
j = 2;
tol = 0.01;
while(~(abs(lambda-H)<tol))
    if x(j - 1) < 0.5
        x(j) = 2 * x(j - 1);
    else
        x(j) = 2 * (1 - x(j - 1));
    end
    s = (x>=threshold);
    p_1 = sum(s==1)/length(s);
    p_0 = sum(s==0)/length(s);
    entropy(:,j) = -p_1*log2(p_1)-(1-p_1)*log2(1-p_1);
    lambda = entropy(:,j);
    j = j+1;
end
plot(entropy)
It looks like one of your probabilities is zero. In that case, you'd be trying to calculate 0*log(0) = 0*-Inf = NaN. The entropy should be zero in this case, so you can just check for this condition explicitly.
A couple of side notes: It looks like you're declaring H = log(2), but your post says the entropy is log_2(2). p_0 is always 1 - p_1, so you don't have to count everything up again. Growing the arrays dynamically is inefficient because MATLAB has to re-copy the entire contents at each step. You can speed things up by pre-allocating them (only worth it if you're going to be running for many timesteps).
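For illustration, a minimal version of that explicit check (my sketch, not the answerer's code); inside the loop it would be used as entropy(:,j) = binary_entropy(p_1);

function h = binary_entropy(p)
% Binary entropy in bits, defining 0*log2(0) as 0 so that a degenerate
% distribution (p = 0 or p = 1) returns 0 instead of NaN.
if p == 0 || p == 1
    h = 0;
else
    h = -p*log2(p) - (1-p)*log2(1-p);
end
end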

Regarding loop structure in Matlab for an iterative procedure

I'm trying to code a loop in MATLAB that iteratively solves for an optimal vector s of zeros and ones. This is my code:
N = 150;
s = ones(N,1);
for i = 1:N
    if s(i) == 0
        i = i + 1;
    else
        i = i;
    end
    select = s;
    HI = (item_c' * (weights.*s)) * (1/(weights'*s));
    s(i) = 0;
    CI = (item_c' * (weights.*s)) * (1/(weights'*s));
    standarderror_afterex = sqrt(var(CI - CM));
    standarderror_priorex = sqrt(var(HI - CM));
    ratio = (standarderror_afterex - standarderror_priorex)/(abs(mean(weights.*s) - weights'*select));
    ratios(i) = ratio;
    s(i) = 1;
end
[M,I] = min(ratios);
s(I) = 0;
This code sets to zero the element of s that has the lowest ratio. But I need this procedure to start all over again, using the new s with one zero, to find the ratios and exclude the element in s that has the lowest ratio. I need that over and over until no ratios are negative.
Do I need another loop, or am I missing something?
I hope that my question is clear enough; just tell me if you need me to explain more.
Thank you in advance for helping out a newbie programmer.
Edit
I think that I need to add some form of while loop as well, but I can't see how to structure it. This is the flow that I want:
With all items included (s(i) = 1 for all i), calculate HI, CI and the standard errors and list the ratios; exclude the item i (s(I) = 0) which corresponds to the lowest negative ratio.
With the new s, including all ones but one zero, calculate HI, CI and the standard errors and list the ratios; exclude the item i which corresponds to the lowest negative ratio.
With the new s, now including all ones but two zeros, repeat the process.
Do this until there is no negative element in ratios left to exclude.
Hope that it is clearer now.
OK, I want to go through a few things before I list my code. This is just how I would try to do it, not necessarily the best or even the fastest way (though I'd think it'd be pretty quick). I tried to keep the structure as you had in your code, so you could follow it nicely (even though I'd probably meld all the calculations down into a single function or line).
Some features that I'm using in my code:
bsxfun: Learn this! It is amazing how it works and can speed up code, and makes some things easier.
v = rand(n,1);
A = rand(n,4);
% The two lines below compute the same value:
W = bsxfun(@(x,y)x.*y,v,A);
W_= repmat(v,1,4).*A;
bsxfun dot multiplies the v vector with each column of A.
Both W and W_ are matrices the same size as A, but the first will be much faster (usually).
Precalculating dropouts: I made select a matrix, where before it was a vector. This allows me to then form a variable included using logical constructs. The ~eye(N) produces an identity matrix and negates it; by logically "and"-ing it with select, the i-th column is now select with the i-th element dropped out.
You were explicitly calculating weights'*s as the denominator in each for-loop iteration. By using the above matrix to calculate this, we can now do sum(W), where each column of W is essentially weights.*s with one element dropped.
Take advantage of column-wise operations: var() is coded to work along the columns of a matrix, returning one result per column as a row vector (and sqrt() then applies element-wise to that row vector).
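For instance, a tiny illustrative snippet (made-up data, not from the answer):

M = rand(100, 5);        % hypothetical data, one variable per column
colVars = var(M);        % 1x5 row vector: the variance of each column
colStds = sqrt(colVars); % element-wise square root of that row vector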
OK, the full thing. Any questions, let me know:
% Start with everything selected:
select = true(N);
stop = false; % Stopping flag
while (~stop)
    % Each column leaves a variable out...
    included = ~eye(N) & select;
    % This calculates the weights with leave-one-out:
    W = bsxfun(@(x,y)x.*y,weights,included);
    % You can comment out the line below, if you'd like...
    W_= repmat(weights,1,N).*included; % This is the same as previous line.
    % This calculates the weights before dropping the variables:
    V = bsxfun(@(x,y)x.*y,weights,select);
    % There's different syntax, depending on whether item_c is a
    % vector or a matrix...
    if(isvector(item_c))
        HI = (item_c' * V)./(sum(V));
        CI = (item_c' * W)./(sum(W));
    else
        % For example: item_c is a matrix...
        % We have to use bsxfun() again
        HI = bsxfun(@rdivide, (item_c' * V),sum(V));
        CI = bsxfun(@rdivide, (item_c' * W),sum(W));
    end
    % As in the question: CI is computed after the exclusion, HI prior to it.
    standarderror_afterex = sqrt(var(bsxfun(@minus,CI,CM)));
    standarderror_priorex = sqrt(var(bsxfun(@minus,HI,CM)));
    % or:
    %
    % standarderror_afterex = sqrt(var(CI - repmat(CM,1,size(CI,2))));
    % standarderror_priorex = sqrt(var(HI - repmat(CM,1,size(HI,2))));
    ratios = (standarderror_afterex - standarderror_priorex)./(abs(mean(W) - sum(V)));
    % Identify the negative ratios:
    negratios = ratios < 0;
    if ~any(negratios)
        % Drop out of the while-loop:
        stop = true;
    else
        % Find the most negative ratio:
        neginds = find(negratios);
        [mn, mnind] = min(ratios(negratios));
        % Drop out the most negative one...
        select(neginds(mnind),:) = false;
    end
end % end while(~stop)
% Your output:
s = select(:,1);
If for some reason it doesn't work, please let me know.

How can I rearrange a list of numbers so that every N numbers is nonrepeating?

So I have a list of 190 numbers ranging over 1:19 (each number is repeated 10 times) that I need to sample 10 at a time. Within each sample of 10, I don't want the numbers to repeat. I tried incorporating a while loop, but the computation time was way too long. So far I'm at the point where I can generate the numbers and see if there are repetitions within each subset. Any ideas?
N=[];
for i=1:10
    N=[N randperm(19)];
end
B=[];
for j=1:10
    if length(unique(N(j*10-9:j*10)))<10
        B=[B 1];
    end
end
sum(B)
Below is an updated version of the code. This might make it a little clearer what I want (19 targets taken 10 at a time without repetition, until all 19 targets have been repeated 10 times).
nTargs = 19;
pairs = nchoosek(1:nTargs, 10);
nPairs = size(pairs, 1);
order = randperm(nPairs);
values = randsample(order,19);
targs = pairs(values,:);
Alltargs = false;
while ~Alltargs
    targs = pairs(randsample(order,19),:);
    B = [];
    for i = 1:19
        G = length(find(targs==i))==10;
        B = [B G];
    end
    if sum(B)==19
        Alltargs = true;
    end
end
Here are some very simple steps to do this: basically you just shuffle the vector once, and then you grab one occurrence of each unique value:
v = repmat(1:19, 1, 10);    % ten copies of 1..19
v = v(randperm(numel(v)));  % shuffle once
[result, idx] = unique(v);  % one occurrence of each value, and its index
v(idx) = [];                % remove those occurrences from v
The algorithm should be fairly efficient. If you want to do the next 10, just run the last part again and combine the results into a totalResult.
You want to sample the numbers 1:19 randomly in blocks of 10 without repetitions. The Matlab function 'randsample' has an optional 'replacement' argument which you can set to 'false' if you do not want repetitions. For example:
N = [];
replacement = false;
for i = 1:19
    N = [N randsample(19,10,replacement)];
end
This generates a 10 x 19 matrix of random integers in the range [1,..,19] without repetitions within each column.
Edit: Here is a solution that addresses the requirement that each of the integers [1,..,19] occurs exactly 10 times, in addition to no repetition within each column / sample:
nRange = 19; nRep = 10;
valueRep = true; % true while there are repetitions
nLoops = 0;      % count the number of iterations
while valueRep
    l = zeros(1,nRange);
    v = [];
    for m = 1:nRep
        v = [v, randperm(nRange,nRange)];
    end
    m1 = reshape(v,nRep,nRange);
    for n = 1:nRange % check every column / sample
        l(n) = length(unique(m1(:,n)));
    end
    if all(l == nRep)
        valueRep = false;
    end
    nLoops = nLoops + 1;
end
result = m1;
For the parameters in the question it takes about 300 iterations to find a result.
I think you should approach this constructively.
It's easy to initially find 19 groups that fulfill your conditions just by rearranging the series 1:19: series1 = repmat(1:19,1,10); and rearranged = reshape(series1,10,19).
Then shuffle the values:
I would select two random columns, copy them, and switch the values at two random positions,
then test whether they still fulfill your condition - like test = @(x) numel(unique(x))==10 - and if yes, replace your columns.
Just keep shuffling till your time runs out or you are happy.
Of course you might come up with more efficient shuffling or testing; a sketch of this idea follows below.
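Here is one way that constructive shuffle could look - my illustrative sketch of the steps above, not the answerer's code; the swap budget nSwaps is an arbitrary assumption:

series1 = repmat(1:19, 1, 10);
rearranged = reshape(series1, 10, 19);  % 19 valid groups of 10 to start from
test = @(x) numel(unique(x)) == 10;     % condition: no repeats within a group
nSwaps = 100000;                        % shuffle budget (arbitrary)
for k = 1:nSwaps
    c = randperm(19, 2);                % pick two distinct random columns
    r = randi(10, 1, 2);                % one random position in each column
    candidate = rearranged(:, c);       % copy the two columns
    tmp = candidate(r(1), 1);           % switch a value between the copies
    candidate(r(1), 1) = candidate(r(2), 2);
    candidate(r(2), 2) = tmp;
    if test(candidate(:, 1)) && test(candidate(:, 2))
        rearranged(:, c) = candidate;   % keep the swap only if both groups still pass
    end
end
result = rearranged(:);                 % 190 values; each group of 10 is repeat-free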
I was given another solution through the MATLAB forum that works pretty well (credit to Niklas Nylen). Computation time is pretty low too. It basically shuffles the numbers until there are no repetitions within every 10 values. Thanks all for your help.
y = repmat(1:19,1,10);
% Run enough iterations to get the output random enough, I selected 100000
for ii = 1:100000
    % Select random index
    index = randi(length(y)-1);
    % Check if it is allowed to switch places
    if y(index)~=y(min(index+10, length(y))) && y(index+1)~=y(max(1,index-9))
        % Make the switch
        yTmp = y(index);
        y(index) = y(index+1);
        y(index+1) = yTmp;
    end
end

Matlab -- random walk with boundaries, vectorized

Suppose I have a vector J of jump sizes and an initial starting point X_0. Also I have boundaries 0 and B (assume 0 < X_0 < B). I want to do a random walk where X_i = [min(X_{i-1} + J_i, B)]^+ (positive part): if the walk goes over a boundary, it is made equal to the boundary. Anyone know a vectorized way to do this? The current way I am doing it consists of doing cumsums, finding the places where the walk violates a condition, starting from there and repeating the cumsum calculation, and so on until I find that I stop violating the boundaries. It works when the boundaries are rarely hit, but if they are hit all the time, it basically becomes a for loop.
In the code below, I am doing this across many samples. To 'fix' the ones that go out of the boundary, I have to loop through the samples to check... (I don't think there is a vectorized 'find'.)
% X_init is a row vector describing initial resource values to use for
% each sample
% J is matrix where each col is a sequence of Jumps (columns = sample #)
% In this code the jumps are subtracted, but same thing
X_intvl = repmat(X_init,NumJumps,1) - cumsum(J);
X = [X_init; X_intvl];
for sample = 1:NumSamples
    k = find(or(X_intvl(:,sample) > B, X_intvl(:,sample) < 0),1);
    while(~isempty(k))
        change = X_intvl(k-1,sample) - X_intvl(k,sample);
        X_intvl(k:end,sample) = X_intvl(k:end,sample)+change;
        k = find(or(X_intvl(:,sample) > B, X_intvl(:,sample) < 0),1);
    end
end
Interesting question (+1).
I faced a similar problem a while back, although slightly more complex as my lower and upper bound depended on t. I never did work out a fully-vectorized solution. In the end, the fastest solution I found was a single loop which incorporates the constraints at each step. Adapting the code to your situation yields the following:
%# Set the parameters
LB = 0;    %# Lower bound
UB = 5;    %# Upper bound
T = 100;   %# Number of observations
N = 3;     %# Number of samples
X0 = (1/2) * (LB + UB); %# Arbitrary start point halfway between LB and UB
%# Generate the jumps
Jump = randn(N, T-1);
%# Build the constrained random walk
X = X0 * ones(N, T);
for t = 2:T
    X(:, t) = max(min(X(:, t-1) + Jump(:, t-1), UB), LB);
end
X = X';
I would be interested in hearing if this method proves faster than what you are currently doing. I suspect it will be for cases where the constraint is binding in more than one or two places. I can't test it myself as the code you provided is not a "working" example, i.e. I can't just copy and paste it into MATLAB and run it, as it depends on several variables for which example (or simulated) values are not provided. I tried adapting it myself, but couldn't get it to work properly.
UPDATE: I just switched the code around so that observations are indexed on columns and samples are indexed on rows, and then I transpose X in the last step. This will make the routine more efficient, since MATLAB allocates memory for numeric arrays column-wise - hence it is faster when performing operations down the columns of an array (as opposed to across the rows). Note, you will only notice the speed-up for large N.
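As an aside, one way to probe this column-major effect yourself is to time the same reduction down columns versus across rows (an illustrative sketch, not from the answer; whether a gap shows up, and how big it is, depends on the operation, MATLAB version, and hardware):

A = randn(5000, 5000);     % a large test array
tic; s1 = sum(A, 1); toc   % down columns: contiguous memory access
tic; s2 = sum(A, 2); toc   % across rows: strided memory access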
FINAL THOUGHT: These days, the JIT accelerator is very good at making single loops in MATLAB efficient (double loops are still pretty slow). Therefore, personally, I'm of the opinion that every time you try to obtain a fully-vectorized solution in MATLAB, i.e. no loops, you should weigh up whether the effort involved in finding a clever solution is worth the slight gains in efficiency over an easier-to-obtain method that utilizes a single loop. And it is important to remember that fully-vectorized solutions are sometimes slower than solutions involving single loops when T and N are small!
I'd like to propose another vectorized solution.
So, first we should set the parameters and generate random jumps. I used a setup similar to Colin T Bowers's:
% Set the parameters
LB = 0; % Lower bound
UB = 20; % Upper bound
T = 1000; % Number of observations
N = 3; % Number of samples
X0 = (1/2) * (UB + LB); % Arbitrary start point halfway between LB and UB
% Generate the jumps
Jump = randn(N, T-1);
But I changed the generation code:
% Generate initial data without bounds
X = cumsum(Jump, 2);
% Apply bounds
Amplitude = UB - LB;
nsteps = ceil( max(abs(X(:))) / Amplitude - 0.5 );
for ii = 1:nsteps
    ind = abs(X) > (1/2) * Amplitude;
    X(ind) = Amplitude * sign(X(ind)) - X(ind);
end
% Shifting X
X = X0 + X;
So, instead of a for loop over every time step, I'm using the cumsum function with smart post-processing (the remaining loop runs only nsteps times, not T times).
N.B. This solution runs significantly slower than Colin T Bowers's for tight bounds (Amplitude < 5), but for loose bounds (Amplitude > 20) it runs much faster.