Removing duplicate entries in a vector, when entries are complex and rounding errors are causing problems - matlab

I want to remove duplicate entries from a vector in MATLAB. The problem I'm having is that rounding errors stop the built-in MATLAB function 'unique' from working properly. Ideally I'd like a way to set some sort of tolerance on 'unique', or otherwise a small procedure that removes the duplicates. If both the real and imaginary parts of two entries differ by less than 0.0001, then I'm happy to consider them equal. How can I do this?
Any help will be greatly appreciated. Thanks.
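To make the problem concrete, here is a tiny example with made-up values: two entries that are equal on paper differ by about 5.6e-17 after floating-point arithmetic, so the exact comparison inside unique keeps both of them.

```matlab
% Two entries that should be equal but differ by floating-point rounding
z = [(0.1 + 0.2) + 2i, 0.3 + 2i];
u = unique(z);              % exact comparison: both entries survive
gap = abs(z(1) - z(2));     % about 5.6e-17, far below the 1e-4 tolerance
```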

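A simple approximation would be to round the numbers and then use the indices returned by unique:
X = ... (input vector)
[b, ind] = unique(round(X / tolerance));
output = X(ind);
(you can probably replace b with ~, depending on your MATLAB version). For complex X, round rounds the real and imaginary parts separately, so dividing by the real scalar tolerance puts both parts on a grid with step tolerance.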
It won't quite have the behaviour you want, since two numbers can be very close but still be rounded differently if they fall on either side of a rounding boundary. I think you could mitigate this by doing:
X = ... (input vector)
[b, ind] = unique(round(X / tolerance));
X = X(ind);
[b, ind] = unique(round(X / tolerance + 0.5 * (1 + 1i)));
X = X(ind);
This rounds twice, the second time on a grid shifted by half a step in both the real and imaginary parts, so most pairs that a rounding boundary split in the first pass are merged by the second unique.
There is still some messiness in this: some numbers will be treated as though the tolerance were doubled. But it might be sufficient for your needs.
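A quick check of the two-pass idea, using made-up values chosen so that the first two entries (2e-8 apart) straddle a rounding boundary in the real part. It divides by the real scalar tolerance, so round acts on the real and imaginary parts separately; the variable names X1 and X2 are illustrative.

```matlab
tolerance = 1e-4;
% Made-up data: the first two entries are 2e-8 apart but sit on either
% side of the real-part rounding boundary at 0.5*tolerance
X = [0.00004999 + 1i, 0.00005001 + 1i, 2 + 3i];

% Pass 1: round real and imaginary parts to multiples of tolerance
[~, ind] = unique(round(X / tolerance));
X1 = X(ind);                 % still 3 entries: the close pair was split

% Pass 2: same rounding on a grid shifted by half a step in both parts
[~, ind] = unique(round(X1 / tolerance + 0.5 * (1 + 1i)));
X2 = X1(ind);                % 2 entries: the close pair is merged
```

Note that the second pass does not rescue every split pair: two values almost a full tolerance apart can straddle a boundary in both grids.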
The alternative is probably a for loop:
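X = sort(X); % complex values are sorted by magnitude, then phase angle
indices = true(numel(X), 1);
for j = 2:numel(X)
    kept = X(indices(1:j-1));
    % X(j) is a duplicate if some kept value is within tolerance in both parts
    if any(abs(real(kept) - real(X(j))) < tolerance & ...
           abs(imag(kept) - imag(X(j))) < tolerance)
        indices(j) = false;
    end
end
X = X(indices);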
I think this has about the best behaviour you can expect. You want to represent the vector by as few unique values as possible, and when there are lots of numbers that differ by less than the tolerance, there may be multiple ways of splitting them. This loop splits them greedily, starting with the smallest (by magnitude, for complex values).

I'm almost certain the function below will always treat any values closer than 1e-8 as equal. Simply replace 1e-8 with whatever threshold you want.
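% unique function that treats values closer than 1e-8 as equal.
% Saving this as unique.m shadows the built-in unique.
function [out, I] = unique(input, first_last)
    threshold = 1e-8;
    if nargin < 2
        first_last = 'last';
    end
    % For complex input, sort orders by magnitude, then phase angle,
    % so close values are not guaranteed to end up adjacent.
    [out, I] = sort(input);
    db = diff(out);
    k = find(abs(db) < threshold);  % adjacent entries that count as equal
    if strcmpi(first_last, 'last')
        k2 = min(I(k), I(k+1));     % drop the earlier occurrence
    elseif strcmpi(first_last, 'first')
        k2 = max(I(k), I(k+1));     % drop the later occurrence
    else
        error('unknown flag option for unique, must be first or last');
    end
    k3 = true(1, length(input));
    k3(k2) = false;
    out = out(k3(I));
    I = I(k3(I));
end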
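One caveat worth knowing about the adjacent-difference test: it chains. In this sketch (made-up real values, following the 'last' branch of the function above but written inline), each value is within the threshold of its neighbour, so everything collapses to a single value even though the two ends are 1.8e-8 apart.

```matlab
threshold = 1e-8;
x = [1, 1 + 0.6e-8, 1 + 1.2e-8, 1 + 1.8e-8];   % made-up chain, steps of 0.6e-8
[xs, I] = sort(x);
k = find(abs(diff(xs)) < threshold);   % every adjacent pair counts as equal
keep = true(size(x));
keep(min(I(k), I(k+1))) = false;       % 'last' option: drop the earlier occurrence
result = x(keep);                      % only one value survives
span = max(x) - min(x);                % about 1.8e-8, larger than the threshold
```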

The following might serve your purposes. Given X, an array of complex doubles, it sorts them, then checks whether adjacent elements differ by less than real_tol in the real part and less than imag_tol in the imaginary part. It removes the first element of every adjacent pair that satisfies both tolerances.
function X_unique = unique_complex_with_tolerance(X, real_tol, imag_tol)
    X_sorted = sort(X); % Sorts by magnitude first, then by phase angle.
    dX_sorted = diff(X_sorted);
    dX_sorted_real = real(dX_sorted);
    dX_sorted_imag = imag(dX_sorted);
    % remove_idx(k) is true when X_sorted(k) and X_sorted(k+1) are within tolerance
    remove_idx = (abs(dX_sorted_real) < real_tol) & (abs(dX_sorted_imag) < imag_tol);
    X_unique = X_sorted;
    X_unique(remove_idx) = [];
end
Note that this code will remove all elements which satisfy this difference tolerance. For example, if X = [1+i,2+2i,3+3i,4+4i], real_tol = 1.1, imag_tol = 1.1, then this function will return only one element, X_unique = [4+4i], even though you might consider, for example, X_unique = [1+i,4+4i] to also be a valid answer.
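A further caveat, shown with made-up values: because sort orders complex numbers by magnitude and then phase angle, two entries that are within tolerance need not end up next to each other, and the adjacent test then misses them. Comparing each entry against every entry already kept avoids this, at O(n^2) cost.

```matlab
real_tol = 1e-4;
imag_tol = 1e-4;
X = [1, 1i, 1 + 1e-5];          % X(1) and X(3) are duplicates within tolerance
Xs = sort(X);                   % complex sort: [1, 1i, 1 + 1e-5]
d = diff(Xs);
adjDup = abs(real(d)) < real_tol & abs(imag(d)) < imag_tol;
nKeptAdjacent = numel(Xs) - nnz(adjDup);   % the adjacent test keeps all 3

% Direct pairwise check of the first and last entries
pairDup = abs(real(X(1) - X(3))) < real_tol && abs(imag(X(1) - X(3))) < imag_tol;
```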

Related

How do I linearly interpolate past missing values using future values in a while loop?

I am using MATLAB R2020a on macOS. I am trying to remove outlier values in a while loop. This involves calculating an exponentially weighted moving mean and then comparing it with a vector value. If the conditions are met, the vector input is then added to a separate vector of 'acceptable' values. The while loop then advances to the next input and calculates the new exponentially weighted moving average, which includes the newly accepted vector input.
However, if the condition is not met, I have written code so that, instead of adding the input sample, a zero is added to the vector of 'acceptable' values. When the next acceptable value is added, the zero immediately before it is currently replaced by the mean of the two framing acceptable values. However, this only accounts for one past zero and not for multiple consecutive outliers. Replacing with a framing mean may also introduce aliasing errors.
Is there any way that the zeros can instead be replaced by linearly interpolating each "candidate outlier" point, using the gradient between the two framing accepted values? That is, is there a way of counting backwards within the while loop to search for and replace zeros as soon as a new 'acceptable' value is found?
I would very much appreciate any suggestions, thanks in advance.
%Calculate exponentially weighted moving mean and tau without outliers
accepted_means = zeros(length(cycle_periods_filtered),1); % array for accepted exponentially weighted means
accepted_means(1) = cycle_periods_filtered(1);
k = zeros(length(cycle_periods_filtered),1); % array for accepted raw cycle periods
m = zeros(length(cycle_periods_filtered), 1); % array for raw periods for all cycles with outliers replaced by mean of framing values
k(1) = cycle_periods_filtered(1);
m(1) = cycle_periods_filtered(1);
tau = m/3; % pre-allocation for efficiency
i = 2; % index for counting through input signal
j = 2; % index for counting through accepted exponential mean values
n = 2; % index for counting through raw periods of all cycles
cycle_index3(1) = 1;
while i <= length(cycle_periods_filtered)
    mavCurrent = (1 - 1/w(j))*accepted_means(j - 1) + (1/w(j))*cycle_periods_filtered(i);
    if cycle_periods_filtered(i) < 1.5*(accepted_means(j - 1)) && cycle_periods_filtered(i) > 0.5*(accepted_means(j - 1)) % Identify high and low outliers
        accepted_means(j) = mavCurrent;
        k(j) = cycle_periods_filtered(i);
        m(n) = cycle_periods_filtered(i);
        cycle_index3(n) = i;
        tau(n) = m(n)/3;
        if m(n - 1) == 0
            m(n - 1) = (k(j) + k(j - 1))/2;
            tau(n - 1) = m(n)/3;
        end
        j = j + 1;
        n = n + 1;
    else
        m(n) = 0;
        n = n + 1;
    end
    i = i + 1;
end
% Scrap the tail
accepted_means(j - 1:end)=[];
k(j - 1:end) = [];
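One way to do the backward fill, sketched on a made-up stream where 0 marks a rejected sample (this assumes a genuine period is never exactly 0): when a value is accepted at position n, look back for the previous non-zero entry and fill the gap with a straight line between the two.

```matlab
% Made-up stream: 0 marks a sample rejected as an outlier
m = [10 11 0 0 14 15 0 17];
for n = 2:numel(m)
    if m(n) ~= 0                                   % a newly accepted value
        prevIdx = find(m(1:n-1) ~= 0, 1, 'last');  % previous accepted index
        if ~isempty(prevIdx) && prevIdx < n - 1    % zeros in between
            gap = prevIdx+1:n-1;
            % straight line through (prevIdx, m(prevIdx)) and (n, m(n))
            m(gap) = m(prevIdx) + (m(n) - m(prevIdx)) * (gap - prevIdx) / (n - prevIdx);
        end
    end
end
```

In your loop, the inner if block would take the place of the if m(n - 1) == 0 ... end block, and tau(gap) = m(gap)/3 would keep tau consistent. If you only need the filled vector after the loop, interp1 over the non-zero entries does the same interpolation in one call.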

Metropolis-Hastings in matlab

I am trying to use the Metropolis-Hastings algorithm with a random walk sampler to simulate samples from a target function f in MATLAB, but something is wrong with my code. The proposal density is the uniform PDF on the ellipse 2s^2 + 3t^2 ≤ 1/4. Can I use the acceptance-rejection method to sample from the proposal density?
N=5000;
alpha = @(x1,x2,y1,y2) (min(1,f(y1,y2)/f(x1,x2)));
X = zeros(2,N);
accept = false;
n = 0;
while n < 5000
    accept = false;
    while ~accept
        s = 1-rand*(2);
        t = 1-rand*(2);
        val = 2*s^2 + 3*t^2;
        % check acceptance
        accept = val <= 1/4;
    end
    % and then draw uniformly distributed points checking that u< alpha?
    u = rand();
    c = u < alpha(X(1,i-1),X(2,i-1),X(1,i-1)+s,X(2,i-1)+t);
    X(1,i) = c*s + X(1,i-1);
    X(2,i) = c*t + X(2,i-1);
    n = n+1;
end
figure;
plot(X(1,:), X(2,:), 'r+');
You may just want to use MATLAB's built-in implementation, mhsample (Statistics and Machine Learning Toolbox).
Regarding your code, there are a few things missing:
- function alpha,
- loop variable i (it might be just n but it is not suited for indexing since it starts at zero).
And you should always preallocate memory in MATLAB for arrays you fill in a loop, i.e. X in your case.
To expand on the suggestions by @max, the code appears to work if you change the i indices to n and replace
n = 0;
with
n = 2;
X(:,1) = [.1,.1];
It would probably be better to initialize X(:,1) to random values within your accept region (using the same code you use later), and/or include a burn-in period.
Depending on what you are going to do with this, it may also make things cleaner to keep the argument to sin in the f function within 0 to 2*pi (for example, by shifting it by 2*pi when it falls outside those bounds).
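Putting these suggestions together, a sketch of the corrected sampler. The question's f is not shown, so f below is a hypothetical positive stand-in; replace it with your own. Because the uniform proposal on the ellipse is symmetric, min(1, f(y)/f(x)) is the right acceptance probability, and rejection sampling from the bounding square is a valid way to draw the proposal step.

```matlab
% Hypothetical positive target (the question's f is not shown); replace with yours
f = @(x1, x2) exp(-(x1.^2 + x2.^2)) .* (1 + sin(x1).^2);
alpha = @(x1, x2, y1, y2) min(1, f(y1, y2) / f(x1, x2));

N = 5000;
X = zeros(2, N);             % preallocated chain
X(:, 1) = [0.1; 0.1];        % starting point
for n = 2:N
    % Proposal step: rejection sampling from the square [-1, 1]^2 gives a
    % point uniformly distributed on the ellipse 2*s^2 + 3*t^2 <= 1/4
    accept = false;
    while ~accept
        s = 1 - 2 * rand;
        t = 1 - 2 * rand;
        accept = 2 * s^2 + 3 * t^2 <= 1/4;
    end
    % Metropolis step: move with probability alpha, otherwise stay put
    c = rand < alpha(X(1, n-1), X(2, n-1), X(1, n-1) + s, X(2, n-1) + t);
    X(:, n) = X(:, n-1) + c * [s; t];
end
```

Only about 8% of the square lies inside the ellipse, so sampling from its tighter bounding box would waste fewer draws, but the result is the same.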

Regarding loop structure in Matlab for an iterative procedure

I'm trying to code a loop in Matlab that iteratively solves for an optimal vector s of zeros and ones. This is my code
N = 150;
s = ones(N,1);
for i = 1:N
    if s(i) == 0
        i = i + 1;
    else
        i = i;
    end
    select = s;
    HI = (item_c' * (weights.*s)) * (1/(weights'*s));
    s(i) = 0;
    CI = (item_c' * (weights.*s)) * (1/(weights'*s));
    standarderror_afterex = sqrt(var(CI - CM));
    standarderror_priorex = sqrt(var(HI - CM));
    ratio = (standarderror_afterex - standarderror_priorex)/(abs(mean(weights.*s) - weights'*select));
    ratios(i) = ratio;
    s(i) = 1;
end
[M,I] = min(ratios);
s(I) = 0;
This code sets to zero the element of s that has the lowest ratio. But I need this procedure to start all over again, using the new s with one zero, to find the ratios and exclude the element in s that has the lowest ratio. I need to repeat that until no ratios are negative.
Do I need another loop, or am I missing something?
I hope that my question is clear enough; just tell me if you need me to explain more.
Thank you in advance for helping out a newbie programmer.
Edit
I think that I need to add some form of while loop as well, but I can't see how to structure it. This is the flow that I want:
1. With all items included (s(i) = 1 for all i), calculate HI, CI and the standard errors, list the ratios, and exclude the item (s(I) = 0) that corresponds to the lowest negative ratio.
2. With the new s, now all ones but one zero, calculate HI, CI and the standard errors, list the ratios, and exclude the item that corresponds to the lowest negative ratio.
3. With the new s, now all ones but two zeros, repeat the process.
4. Continue until there is no negative element in ratios to exclude.
I hope that is clearer now.
OK. I want to go through a few things before I list my code. This is just how I would try to do it, not necessarily the best or even the fastest way (though I'd think it'd be pretty quick). I tried to keep the structure of your code so you can follow it easily (even though I'd probably merge all the calculations into a single function or line).
Some features that I'm using in my code:
bsxfun: Learn this! It is amazing how it works and can speed up code, and makes some things easier.
v = rand(n,1);
A = rand(n,4);
% The two lines below compute the same value:
W = bsxfun(@times,v,A);
W_= repmat(v,1,4).*A;
bsxfun multiplies the vector v element-wise with each column of A.
Both W and W_ are matrices the same size as A, but the first will usually be faster, because it avoids building the replicated copy of v.
Precalculating dropouts: I made select a matrix, where before it was a vector. This allows me to then form a variable included using logical constructs. The ~eye(N) produces an identity matrix and negates it. By logically "and"ing it with select, the i-th column is now select with the i-th element dropped out.
You were explicitly calculating weights'*s as the denominator in each for-loop iteration. By using the above matrix, this becomes sum(W), where each column of W is weights.*s for one leave-one-out choice of s.
Take advantage of column-wise operations: the var() and the sqrt() functions are both coded to work along the columns of a matrix, outputting the action for a matrix in the form of a row vector.
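A small sketch of the leave-one-out mask on a 3-item example with made-up weights: column i of included is the current selection with item i removed, so sum(W) gives weights'*s for every candidate exclusion at once. The last lines mimic dropping an item the way the loop below does.

```matlab
N = 3;
weights = [2; 3; 5];                     % made-up item weights
select = true(N);                        % every item currently selected
included = ~eye(N) & select;             % column i: selection with item i dropped
W = bsxfun(@times, weights, included);   % weights.*s for each leave-one-out s
denominators = sum(W);                   % weights'*s for each column
% In R2016b or later (and in Octave), implicit expansion gives the same result:
W2 = weights .* included;

select(2, :) = false;                    % drop item 2 from every column
denominators2 = sum(bsxfun(@times, weights, ~eye(N) & select));
```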
OK, here is the full thing. If you have any questions, let me know:
% Start with everything selected:
select = true(N);
stop = false; % Stopping flag:
while (~stop)
    % Each column leaves a variable out...
    included = ~eye(N) & select;
    % This calculates the weights with leave-one-out:
    W = bsxfun(@times,weights,included);
    % You can comment out the line below, if you'd like...
    W_= repmat(weights,1,N).*included; % This is the same as previous line.
    % This calculates the weights before dropping the variables:
    V = bsxfun(@times,weights,select);
    % There's different syntax, depending on whether item_c is a
    % vector or a matrix...
    if(isvector(item_c))
        HI = (item_c' * V)./(sum(V));
        CI = (item_c' * W)./(sum(W));
    else
        % For example: item_c is a matrix...
        % We have to use bsxfun() again
        HI = bsxfun(@rdivide, (item_c' * V),sum(V));
        CI = bsxfun(@rdivide, (item_c' * W),sum(W));
    end
    % As in your code: "after exclusion" uses CI, "prior to exclusion" uses HI
    standarderror_afterex = sqrt(var(bsxfun(@minus,CI,CM)));
    standarderror_priorex = sqrt(var(bsxfun(@minus,HI,CM)));
    % or:
    %
    % standarderror_afterex = sqrt(var(CI - repmat(CM,1,size(CI,2))));
    % standarderror_priorex = sqrt(var(HI - repmat(CM,1,size(HI,2))));
    ratios = (standarderror_afterex - standarderror_priorex)./(abs(mean(W) - sum(V)));
    % Identify the negative ratios:
    negratios = ratios < 0;
    if ~any(negratios)
        % Drop out of the while-loop:
        stop = true;
    else
        % Find the most negative ratio:
        neginds = find(negratios);
        [mn, mnind] = min(ratios(negratios));
        % Drop out the most negative one...
        select(neginds(mnind),:) = false;
    end
end % end while(~stop)
% Your output:
s = select(:,1);
If for some reason it doesn't work, please let me know.

Matlab - Sum groups of elements of a vector based on given indices

I am reposting a question I asked this week that, due to a missing tag, went unnoticed (basically it was viewed only by me).
I have two large vectors, values and indices. I need to sum the elements of values using indices as in this brute force example:
% The two vectors, which are given, look like this:
N = 3e7;
values = (rand(N, 1) > 0.3);
indices = cumsum(ceil(4*rand(N, 1)));
indices = [0; indices(find(indices > 1, 1, 'first'):find(indices < N, 1, 'last')); N];
HH = numel(indices) - 1;
% This is the brute force solution
tic
out1 = zeros(HH, 1);
for hh = 1:HH
    out1(hh) = sum(values((indices(hh)+1):indices(hh+1)));
end
toc
A more efficient way to do it is the following:
tic
indices2 = diff(indices);
new_inds = (1:HH+1)';
tmp = zeros(N, 1);
tmp(cumsum(indices2)-indices2+1)=1;
new_inds_long = new_inds(cumsum(tmp));
out2 = accumarray(new_inds_long, values);
toc
A better solution is:
tic
out3 = cumsum(values);
out3 = out3(indices(2:end));
out3 = [out3(1); diff(out3)];
toc
The three solutions are equivalent
all(out1 == out2)
all(out1 == out3)
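A tiny worked example of the cumsum trick with made-up data, which makes it easy to see why it works: the running total at the last index of each group, minus the running total at the end of the previous group, is that group's sum.

```matlab
values  = logical([1 0 1 1 0 1]');
indices = [0 2 3 6]';            % groups: 1:2, 3:3, 4:6 (assumes indices(2) >= 1)
c = cumsum(values);              % running totals [1 1 2 3 3 4]'
ends = c(indices(2:end));        % totals at the end of each group [1 2 4]'
out3 = [ends(1); diff(ends)];    % per-group sums [1 1 2]'
```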
Question is: since this is really a basic function, is there any faster, already known approach/function that does the same and that I may be overlooking or that I am just not aware of?
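If generating your indices is not simply a placeholder for something else, this could be improved. Currently you are wasting most of the generated numbers: the steps ceil(4*rand) average 2.5, so only about N/2.5 of the N cumulative sums stay below N, and roughly 60% are discarded. 1) Determine the number of indices you want (binomial distribution) 2) generate only the used indices.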
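A sketch of the "generate only what you need" idea. Rather than the binomial approach, it simply draws about N/2.5 steps plus a safety margin (the margin size is an arbitrary choice) and tops up in the unlikely case that the total falls short of N. N is reduced here to keep the example quick.

```matlab
N = 1e6;
% Steps ceil(4*rand) take the values 1..4 with equal probability:
% mean 2.5, variance 1.25, so about N/2.5 steps are needed to reach N.
M = ceil(N / 2.5 + 10 * sqrt(1.25 * N / 2.5));    % expected count plus a margin
c = cumsum(ceil(4 * rand(M, 1)));
while c(end) < N                                    % top up if the total falls short
    c = [c; c(end) + cumsum(ceil(4 * rand(M, 1)))]; %#ok<AGROW>
end
indices = [0; c(find(c > 1, 1, 'first'):find(c < N, 1, 'last')); N];
```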

Compute the convolution of two arrays in MATLAB

I am trying to generate an array from some starting values using this formula in MATLAB:
$y_t = a_0 + \sum_{i=1}^{p} a_i \, y_{t-i}, \qquad t \ge p$
p is some small number compared to T (max t). I have been able to make this using two for loops, but it is really slow. Is there some easy way to do it?
First p values of y are provided and vector a (its length is p+1) is provided too...
This is what I have so far, but now when I tried it, it doesn't work 100% (I think it's because of indexing from 1 in MATLAB):
y1 = zeros(T+1, 1);
y1(1:p) = y(1:p);
for t = p+1:T+1
    value = a1(1);
    for j = 2:p+1
        value = value + a1(j)*y1(t-j+1);
    end
    y1(t) = value;
end
EDIT: I solved it, I am just not used to Matlab indexing from 1...
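Since the recursion is linear with constant coefficients, MATLAB's filter can run it without explicit loops. The sketch below uses made-up a1 and starting values, following the loop's convention that a1 = [a0, a1, ..., ap], and chooses the first p inputs so that the output reproduces the given starting values.

```matlab
p = 2;
T = 10;
a1 = [0.5, 0.3, 0.2];      % made-up coefficients [a0, a1, ..., ap]
y = [1; 2];                % made-up starting values y(1:p)

den = [1, -a1(2:end)];               % recursion: y(t) - sum a1(j)*y(t-j+1) = input
x = a1(1) * ones(T+1, 1);            % constant input a0 for t > p
x(1:p) = filter(den, 1, y(1:p));     % inputs that reproduce the starting values
y1 = filter(1, den, x);              % y1(t) = x(t) + sum_{j=2}^{p+1} a1(j)*y1(t-j+1)
```

filter still runs the recursion sample by sample internally, but in compiled code rather than in two nested MATLAB loops.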
This statement
if(p>=t)
looks odd inside a loop whose index expression is
for t = p+1:T+1
which seems to guarantee that t>p for the entire duration of the loop. Is that what you meant to write ?
EDIT in response to comment
Inside a loop indexed with this statement
for j = 2:p
how does the reference you make to a(j) ever call for a(0) ?