I need to exclude some erroneous data from a matrix. I know which data are correct, and I am trying to interpolate values between them so I can get decent diagrams without large errors. I must use this form of the matrix and I must preserve its shape; I may only substitute the data that are marked as errors. Here is my work so far:
M=[0.1000
0.6000
0.7000
0.8000
0.9000
0.9500
1.0000
1.0500
1.1000
1.1500
1.2000
1.2500
1.3000
1.5000
1.7500
2.0000
2.2500
2.5000
3.0000];
CZ1=[ 9.4290
9.5000
9.3250
9.2700
9.2950
9.4350
9.6840
10.0690
10.1840
10.2220
10.2160
9.6160
9.6890
9.4880
9.5000
9.5340
9.3370
9.0990
8.5950];
N1=11;
Nn=13;
Mx1=M(N1);
Mx2=M(Nn);
Mx=[Mx1 Mx2]';
CN1=CZ1(N1);
CN2=CZ1(Nn);
CNy=[CN1 CN2]';
y1=interp1q(Mx,CNy,M(N1:Nn));
CNf=CZ1;
NEWRangeC=y1;
Cfa=changem(CZ1,[NEWRangeC], [CNf(N1:Nn)]);
figure
plot(M,CZ1,'-*b',M,Cfa,'r')
As you can see, I used points 11 and 13 and excluded point 12 by interpolating it from points 11 and 13. This works, but I want to make a modification.
My question is: how can I select the values that are errors, remove them, and interpolate across the gap between their neighbors? I want to use the values in M as my reference (not the point indices, as in my example).
Assuming you know which elements are incorrect, you can use Matlab's interp1 function to interpolate them (this only works if M is actually a vector):
error_indices = [11 13];
all_indices = 1:length(M)
% Get the indices where we have valid data
all_correct_indices = setdiff(all_indices, error_indices)
% the first two arguments are the available data.
% the third argument holds the indices at which to evaluate
M_new = interp1(all_correct_indices, M(all_correct_indices), all_indices)
The above interpolates values at all_indices -- including the missing elements. Where you already have valid data (all_correct_indices), Matlab will return that data. In other places, it will interpolate using the two nearest neighbors.
Try help interp1 for more information on how this function works.
Update - an example
x = 1:10; % all indices
y = x*10;
e = 3:7; % the unknown indices
s = setdiff(x, e); % the known indices
y_est = interp1(s, y(s), x)
y_est =
10 20 30 40 50 60 70 80 90 100
And we see that interp1 has interpolated all values from 30 to 70 linearly using the available data (specifically the adjacent points 20 and 80).
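Applied to the question's data, the same idea can use M itself as the x-coordinate, so the unequal spacing in M is respected and CZ1 keeps its shape. A minimal sketch, assuming the flagged error indices are already known (here only point 12, as in the question); the names bad, good and CZ1_fixed are illustrative:

```matlab
% Question data as column vectors (same values as M and CZ1 above)
M   = [0.1 0.6 0.7 0.8 0.9 0.95 1.0 1.05 1.1 1.15 1.2 1.25 1.3 ...
       1.5 1.75 2.0 2.25 2.5 3.0]';
CZ1 = [9.4290 9.5000 9.3250 9.2700 9.2950 9.4350 9.6840 10.0690 10.1840 ...
       10.2220 10.2160 9.6160 9.6890 9.4880 9.5000 9.5340 9.3370 9.0990 8.5950]';

bad  = 12;                          % indices flagged as errors (assumed known)
good = setdiff(1:numel(M), bad);    % every other index is treated as valid

CZ1_fixed = CZ1;                    % copy, so size and shape are preserved
% x = values of M (not indices); only the flagged entries are overwritten
CZ1_fixed(bad) = interp1(M(good), CZ1(good), M(bad));
```

Plotting M against both CZ1 and CZ1_fixed then shows the original and corrected curves. If a flagged index is the first or last point, interp1 returns NaN there unless you pass 'linear','extrap' as extra arguments.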
Well, you can start by finding the elements that are errors with the find command (it returns their indices). This also works for matrices, where find returns linear indices.
You can then take the elements around each of these indices and interpolate between them, as you did.
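A minimal sketch of the find-then-interpolate idea, assuming (hypothetically) that the erroneous samples have already been marked as NaN; any other logical test for "error" can take the place of isnan:

```matlab
x = (1:10)';                 % reference coordinate (plays the role of M)
y = 10*x;                    % clean signal
y([4 5]) = NaN;              % hypothetical: samples 4 and 5 flagged as errors

bad  = find(isnan(y));       % indices of the errors
good = find(~isnan(y));      % indices of the valid data

% Interpolate each error from the valid neighbours around it (3 and 6 here)
y(bad) = interp1(x(good), y(good), x(bad));
```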
I have a two-column matrix of the form
0 21.3400
0 22.3000
1 22.3000
The left column is the hour and the right column is the value. I need to calculate the average for each hour. The problem is that my samples span more than 24 hours (multiple days), so the hours loop from 0-23 and then 0-23 again. Another problem is that I am sometimes missing samples for a certain hour, for example:
12.0000 29.5000
14.0000 35.7400
Any ideas on how I can solve this problem?
The part "so it would loop back from 0-23 to 0-23 again" is unclear to me. Maybe you are looking for the modulo function mod(). Once you have solved that particular problem, your averaging problem can be taken care of with accumarray; it is the perfect use case for this function.
%// your data
data = [
0 21.3400
0 22.3000
1 22.3000
12.0000 29.5000
14.0000 35.7400];
%// group (find subs)
[hours, b, subs] = unique(data(:,1));
%// apply function mean to grouped data
avg = accumarray(subs, data(:,2), [], @mean);
result = [hours, avg]
The result is stored table-like: the first column holds the unique hours and the second column holds the averaged data values for those hours.
result =
0 21.8200
1.0000 22.3000
12.0000 29.5000
14.0000 35.7400
As an example: for the hour 0 the average of the data values 21.3400 and 22.300 is correctly computed as (21.3400 + 22.300)/2, which equals 21.8200.
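If the hour column actually keeps counting past 23 (24, 25, ...), mod folds it back onto 0-23 before grouping. The sketch below assumes that layout; the two day-2 rows are made-up sample data. Passing an output size of 24 and a fill value of NaN to accumarray also gives missing hours, such as hour 13, an explicit NaN row instead of silently dropping them:

```matlab
% Hypothetical data: column 1 counts hours continuously across days
data = [ 0 21.34
         0 22.30
         1 22.30
        12 29.50
        14 35.74
        24 20.00      % hour 0 of day 2
        25 24.30];    % hour 1 of day 2

hourOfDay = mod(data(:,1), 24);      % 24 -> 0, 25 -> 1, ...
% one row per hour 0..23; hours without samples get NaN
avg = accumarray(hourOfDay + 1, data(:,2), [24 1], @mean, NaN);
result = [(0:23)', avg];
```

If the hours already wrap around at 23, mod leaves them unchanged, so the same code applies to the original layout.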
Given two vectors containing numerical values, say for example
a=1.:0.1:2.;
b=a+0.1;
I would like to select only the differing values. For this, Matlab provides the function setdiff. In the above example it is obvious that setdiff(a,b) should return 1. and setdiff(b,a) should return 2.1. However, due to limited computational precision (see the related questions on floating-point comparison), the result differs. I get
>> setdiff(a,b)
ans =
1.0000 1.2000 1.4000 1.7000 1.9000
Matlab provides a function, eps, which returns the distance from 1.0 to the next larger double-precision number, i.e. the scale of this rounding error for values near 1. This allows us to estimate a tolerance like tol = 100*eps;
My question now: is there an intelligent and efficient way to treat values whose difference is below tol as equal? Or, said differently: how do I write my own version of setdiff, returning both values and indices, that includes a tolerance limit?
I don't like the way it is answered in this question, since matlab already provides part of the required functionality.
Introduction and custom function
In general, when floating-point precision is an issue, you should compare against a very small tolerance instead of testing for exact zero. A slightly more robust method uses a tolerance based on eps. Since setdiff essentially checks whether the difference between two elements is zero, you can use eps directly here: treat an absolute difference less than or equal to eps as zero.
This forms the basis of a modified setdiff for floating point numbers shown here -
function [C,IA] = setdiff_fp(A,B)
%//SETDIFF_FP Set difference for floating point numbers.
%// C = SETDIFF_FP(A,B) for vectors A and B, returns the values in A that
%// are not in B with no repetitions. C will be sorted.
%//
%// [C,IA] = SETDIFF_FP(A,B) also returns an index vector IA such that
%// C = A(IA). If there are repeated values in A that are not in B, then
%// the index of the first occurrence of each repeated value is returned.
%// Get 2D matrix of absolute difference between each element of A against
%// each element of B
abs_diff_mat = abs(bsxfun(@minus,A,B.')); %//'
%// Compare each element against eps to "negate" the floating point
%// precision issues. Thus, we have a binary array of true comparisons.
abs_diff_mat_epscmp = abs_diff_mat<=eps;
%// Find indices of A that are exclusive to it
A_ind = ~any(abs_diff_mat_epscmp,1);
%// Get unique (to account for no repetitions and being sorted) exclusive
%// A elements for the final output along with the indices
[C,IA] = intersect(A,unique(A(A_ind)));
return;
Example runs
Case1 (With integers)
This will verify that setdiff_fp works with integer arrays just the way setdiff does.
A = [2 5];
B = [9 8 8 1 2 1 1 5];
[C_setdiff,IA_setdiff] = setdiff(B,A)
[C_setdiff_fp,IA_setdiff_fp] = setdiff_fp(B,A)
Output
A =
2 5
B =
9 8 8 1 2 1 1 5
C_setdiff =
1 8 9
IA_setdiff =
4
2
1
C_setdiff_fp =
1 8 9
IA_setdiff_fp =
4
2
1
Case2 (With floating point numbers)
This is to show that setdiff_fp produces the correct results, while setdiff doesn't. Additionally, this will also test out the output indices.
A=1.:0.1:1.5
B=[A+0.1 5.5 5.5 2.6]
[C_setdiff,IA_setdiff] = setdiff(B,A)
[C_setdiff_fp,IA_setdiff_fp] = setdiff_fp(B,A)
Output
A =
1.0000 1.1000 1.2000 1.3000 1.4000 1.5000
B =
1.1000 1.2000 1.3000 1.4000 1.5000 1.6000 5.5000 5.5000 2.6000
C_setdiff =
1.2000 1.4000 1.6000 2.6000 5.5000
IA_setdiff =
2
4
6
9
7
C_setdiff_fp =
1.6000 2.6000 5.5000
IA_setdiff_fp =
6
9
7
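setdiff_fp compares against eps, which is the spacing of doubles near 1.0; that spacing grows with magnitude, so for larger values a caller-chosen tolerance is safer. A minimal script-style sketch of the same bsxfun idea with an assumed absolute tolerance tol (the names tol, keep, idx and iu are illustrative), reproducing Case2's setdiff(B,A):

```matlab
A   = 1.:0.1:1.5;
B   = [A+0.1 5.5 5.5 2.6];
tol = 100*eps;                          % assumed absolute tolerance

% numel(A)-by-numel(B) matrix of |B(j) - A(i)|
d    = abs(bsxfun(@minus, B, A(:)));
keep = ~any(d <= tol, 1);               % B elements with no match in A within tol
idx  = find(keep);                      % positions of those elements in B

[C, iu] = unique(B(keep));              % sorted values without repetitions
IA = reshape(idx(iu), [], 1);           % column of indices into B, so C = B(IA)
```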
For a tolerance of 1 eps, this should work:
a=1.0:0.1:2.0;
b=a+0.1;
b=[b b-eps b+eps];
c=setdiff(a,b)
The idea is to expand b so that it also includes its closest neighboring values.
I have encountered a problem in MatLab as I attempt to run a loop. For each iteration in the loop eigenvalues and eigenvectors for a 3x3 matrix are calculated (the matrix differs with each iteration). Further, each iteration should always yield one eigenvector of the form [0 a 0], where only the middle-value, a, is non-zero.
I need to obtain the index of the column of the eigenvector-matrix where this occurs. To do this I set up the following loop within my main-loop (where the matrix is generated):
for i = 1:3
if (eigenvectors(1,i)==0) && (eigenvectors(3,i)==0)
index_sh = i
end
end
The problem is that the eigenvector matrix in question will sometimes have an output of the form:
eigenvectors =
-0.7310 -0.6824 0
0 0 1.0000
0.6824 -0.7310 0
and in this case my code works well, and I get index_sh = 3. However, sometimes the matrix is of the form:
eigenvectors =
0.0000 0.6663 0.7457
-1.0000 0.0000 0.0000
-0.0000 -0.7457 0.6663
In this case, however, MatLab does not assign any value to index_sh, even though I want index_sh to be equal to 1.
If anyone knows how I can tackle this problem, so that MatLab also assigns a value when the zeros are displayed as 0.0000, I would be very grateful!
The problem is, very likely, that those "0.0000" entries are not exactly 0 but tiny round-off values that the display rounds to 0.0000. To solve that, choose a tolerance and use it when comparing with 0:
tol = 1e-6;
index_sh = find(abs(eigenvectors(1,:))<tol & abs(eigenvectors(3,:))<tol);
In your code:
for ii = 1:3
if abs(eigenvectors(1,ii))<tol && abs(eigenvectors(3,ii))<tol
index_sh = ii
end
end
Or, instead of a tolerance, you could choose the column whose first- and third-row entries are closest to 0:
[~, index_sh] = min(abs(eigenvectors(1,:)) + abs(eigenvectors(3,:)));
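To see why the exact comparison fails while both suggestions work, here is a small check on a hand-made matrix whose "0.0000" entries are tiny nonzero values (1e-17 is an assumed stand-in for round-off):

```matlab
% Hand-made stand-in for the second eigenvector matrix in the question
V = [ 1e-17   0.6663  0.7457
     -1.0000  1e-17   1e-17
     -1e-17  -0.7457  0.6663];

exactIdx = find(V(1,:) == 0 & V(3,:) == 0);             % exact test: finds nothing

tol = 1e-6;
tolIdx = find(abs(V(1,:)) < tol & abs(V(3,:)) < tol);   % tolerance test: 1

[~, minIdx] = min(abs(V(1,:)) + abs(V(3,:)));           % closest-to-zero test: 1
```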
In Matlab, there is the unique command, which returns the unique rows in an array. This is a very handy command.
But the problem is that I can't assign a tolerance to it: in double precision, we always have to compare two elements within some precision. Is there a built-in command that returns unique elements within a certain tolerance?
With R2015a, this question finally has a simple answer (see my other answer to this question for details). For releases prior to R2015a, there is such a built-in (undocumented) function: _mergesimpts. A safe guess at the composition of the name is "merge similar points".
The function is called with the following syntax:
xMerged = builtin('_mergesimpts',x,tol,[type])
The data array x is N-by-D, where N is the number of points, and D is the number of dimensions. The tolerances for each dimension are specified by a D-element row vector, tol. The optional input argument type is a string ('first' (default) or 'average') indicating how to merge similar elements.
The output xMerged will be M-by-D, where M<=N. It is sorted.
Examples, 1D data:
>> x = [1; 1.1; 1.05]; % elements need not be sorted
>> builtin('_mergesimpts',x,eps) % but the output is sorted
ans =
1.0000
1.0500
1.1000
Merge types:
>> builtin('_mergesimpts',x,0.1,'first')
ans =
1.0000 % first of [1, 1.05] since abs(1 - 1.05) < 0.1
1.1000
>> builtin('_mergesimpts',x,0.1,'average')
ans =
1.0250 % average of [1, 1.05]
1.1000
>> builtin('_mergesimpts',x,0.2,'average')
ans =
1.0500 % average of [1, 1.1, 1.05]
Examples, 2D data:
>> x = [1 2; 1.06 2; 1.1 2; 1.1 2.03]
x =
1.0000 2.0000
1.0600 2.0000
1.1000 2.0000
1.1000 2.0300
All 2D points unique to machine precision:
>> xMerged = builtin('_mergesimpts',x,[eps eps],'first')
xMerged =
1.0000 2.0000
1.0600 2.0000
1.1000 2.0000
1.1000 2.0300
Merge based on second dimension tolerance:
>> xMerged = builtin('_mergesimpts',x,[eps 0.1],'first')
xMerged =
1.0000 2.0000
1.0600 2.0000
1.1000 2.0000 % first of rows 3 and 4
>> xMerged = builtin('_mergesimpts',x,[eps 0.1],'average')
xMerged =
1.0000 2.0000
1.0600 2.0000
1.1000 2.0150 % average of rows 3 and 4
Merge based on first dimension tolerance:
>> xMerged = builtin('_mergesimpts',x,[0.2 eps],'average')
xMerged =
1.0533 2.0000 % average of rows 1 to 3
1.1000 2.0300
>> xMerged = builtin('_mergesimpts',x,[0.05 eps],'average')
xMerged =
1.0000 2.0000
1.0800 2.0000 % average of rows 2 and 3
1.1000 2.0300 % row 4 not merged because of second dimension
Merge based on both dimensions:
>> xMerged = builtin('_mergesimpts',x,[0.05 .1],'average')
xMerged =
1.0000 2.0000
1.0867 2.0100 % average of rows 2 to 4
This is a difficult problem. I'd even claim it to be impossible to solve in general, because of what I'd call the transitivity problem. Suppose that we have three elements in a set, {A,B,C}. I'll define a simple function isSimilarTo, such that isSimilarTo(A,B) will return a true result if the two inputs are within a specified tolerance of each other. (Note that everything I will say here is meaningful in one dimension as well as in multiple dimensions.) So if two numbers are known to be "similar" to each other, then we will choose to group them together.
So suppose we have values {A,B,C} such that isSimilarTo(A,B) is true, and that isSimilarTo(B,C) is also true. Should we decide to group all three together, even though isSimilarTo(A,C) is false?
Worse, move to two dimensions. Start with k points equally spaced around the perimeter of a circle. Assume the tolerance is chosen such that any point is within the specified tolerance of its immediate neighbors, but not of any other point. How would you choose to resolve which points are "unique" in this setting?
I'll claim that this intransitivity makes the grouping problem impossible to resolve perfectly, and certainly impossible to resolve efficiently. Perhaps one might try an approach based on a k-means style of aggregation, but this will be quite inefficient as well, and such an approach generally needs to know the number of groups in advance.
Having said that, I would still offer a compromise, something that can sometimes work within limits. The trick is used in Consolidator, available on the Matlab Central File Exchange. My approach was to effectively round the inputs to within the specified tolerance. Having done that, a combination of unique and accumarray allows the aggregation to be done efficiently, even for large sets of data in one or many dimensions.
This is a reasonable approach when the tolerance is large enough that when multiple pieces of data belong together, they will be rounded to the same value, with occasional errors made by the rounding step.
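The round-then-group idea can be sketched in a few lines. This is not Consolidator itself, only a minimal 1-D illustration using unique and accumarray; the data and tol are assumptions:

```matlab
x   = [1; 1.001; 1.0004; 2; 2.0003; 3.5];   % hypothetical data
tol = 0.01;                                  % assumed grouping tolerance

key = round(x / tol);                        % snap each value to a tol-sized bin
[~, ~, grp] = unique(key);                   % group label for every element
xMerged = accumarray(grp, x, [], @mean);     % one averaged value per group
```

Values that straddle a bin edge, such as 1.0049 and 1.0051 with tol = 0.01, land in different bins even though they are within tol of each other; these are the occasional rounding errors mentioned above. For N-by-D data, the same idea works with unique(..., 'rows') on the rounded keys followed by one accumarray call per column.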
As of R2015a, there is finally a function to do this, uniquetol (before R2015a, see my other answer):
uniquetol Set unique within a tolerance.
uniquetol is similar to unique. Whereas unique performs exact comparisons, uniquetol performs comparisons using a tolerance.
The syntax is straightforward:
C = uniquetol(A,TOL) returns the unique values in A using tolerance TOL.
As are the semantics:
Each value of C is within tolerance of one value of A, but no two elements in C are within tolerance of each other. C is sorted in ascending order. Two values u and v are within tolerance if:
abs(u-v) <= TOL*max(abs(A(:)))
It can also operate "ByRows", and the tolerance can be scaled by an input "DataScale" rather than by the maximum value in the input data.
But there is an important note about uniqueness of the solutions:
There can be multiple valid C outputs that satisfy the condition, "no two elements in C are within tolerance of each other." For example, swapping columns in A can result in a different solution being returned, because the input is sorted lexicographically by the columns. Another result is that uniquetol(-A,TOL) may not give the same results as -uniquetol(A,TOL).
There is also a new function, ismembertol, which is related to ismember in the same way.
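A small check of both functions (R2015a or later) on the classic 0.1*3 versus 0.3 case, using their default tolerance (1e-12 for double inputs):

```matlab
A = [0.3, 0.1*3, 0.7];              % 0.1*3 is not bit-identical to 0.3

nExact = numel(unique(A));          % exact comparison keeps both "0.3" values
C      = uniquetol(A);              % tolerance-based comparison merges them
tf     = ismembertol(0.1*3, 0.3);   % true within the default tolerance
```

For row-wise comparisons, the 'ByRows' option applies, e.g. uniquetol(X, tol, 'ByRows', true).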
There is no such function that I know of. One tricky aspect is that if your tolerance is, say, 1e-10, and you have a vector with values that are equally spaced at 9e-11, the first and the third entry are not the same, but the first is the same as the second, and the second is the same as the third - so how many "uniques" are there?
One way to solve the problem is to round your values to a desired precision and then run unique on the result. You can do that using round2 (http://www.mathworks.com/matlabcentral/fileexchange/4261-round2), or in the following simple way:
r = rand(100,1); % some random data
roundedData = round(r*1e6)/1e6; % round to 1e-6
uniqueValues = unique(roundedData);
You could also do it using the hist command, as long as the precision is not too high:
r = rand(100,1); % create 100 random values between 0 and 1
centers = 0:0.001:1; % evenly spaced bin centers (not named 'grid', which would shadow the grid function)
counts = hist(r,centers); % now you know for each element in 'centers' how many values there are
uniqueValues = centers(counts>0); % and these are the uniques
I've come across this problem before. The trick is to first sort the data and then use the diff function to find the difference between consecutive items. Then keep an item whenever its difference from the previous item is greater than your tolerance.
This is the code that I use:
tol = 0.001;
[Y, I] = sort(items(:));
uni_mask = [true; diff(Y) > tol]; % always keep the smallest item
%if you just want the unique items:
uni_items = Y(uni_mask); %in sorted order
uni_items = items(sort(I(uni_mask))); % in the original order
This doesn't take care of "drifting" ... so something like 0:0.00001:100 would actually return one unique value.
If you want something that can handle "drifting", then I would use histc, but you need to make some sort of rough guess as to how many unique items you expect.
NUM = round(numel(items) / 10); % a rough guess
bins = linspace(min(items), max(items), NUM);
counts = histc(items, bins);
uni_items = bins(counts > 0);
BTW: I wrote this in a text-editor away from matlab so there may be some stupid typos or off by one errors.
Hope that helps
This is hard to define well. Assume you have a tolerance of 1.
Then what would be the outcome of [1; 2; 3; 4]?
When you have multiple columns, a definition could become even more challenging.
However, if you are mostly worried about rounding issues, you can solve most of it by one of these two approaches:
1. Round all numbers (considering your tolerance), and then use unique.
2. Start with the top row as your unique set; for each subsequent row, use ismemberf to determine whether it is unique and, if so, add it to your unique set.
The first approach has the weakness that 0.499999999 and 0.500000000 may not be seen as duplicates, whereas the second approach has the weakness that the order of your input matters.
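The second approach can be sketched without ismemberf (a File Exchange function) by comparing each new row against the rows kept so far with an assumed absolute per-column tolerance. This is a plain-MATLAB stand-in, not ismemberf itself, and the data are made up:

```matlab
X   = [1 2; 1.00001 2; 3 4; 3 4.00002; 1 2.5];   % hypothetical rows
tol = 1e-3;                                      % assumed absolute tolerance per column

U = X(1, :);                                     % start with the top row
for k = 2:size(X, 1)
    % duplicate if some kept row is within tol in every column
    isDup = any(all(abs(bsxfun(@minus, U, X(k, :))) <= tol, 2));
    if ~isDup
        U(end+1, :) = X(k, :); %#ok<AGROW>
    end
end
```

Here U ends up as [1 2; 3 4; 1 2.5]. Because each row is compared only with the rows kept before it, processing X in a different order can keep different representatives, which is exactly the order dependence mentioned above.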
I was recently stuck with MatLab 2010, so there was no round(X,n) and no _mergesimpts (at least I couldn't get it to work). Here is a simple solution that works, at least for my data:
Using the default tolerance of rat:
unique(cellstr(rat(x)))
Other tolerance:
unique(cellstr(rat(x,tol)))