Filtering sequences in MATLAB - matlab

Is it possible to do something like regular expressions with MATLAB to filter things out? Basically I'm looking for something that will let me take a vector like:
[1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1]
and will return:
[3 3 3 3 4 4 4]
These are the uninterrupted sequences (where there's no interspersion).
Is this possible?

Using regular expressions
Use MATLAB's built-in regexp function for regular expression matching. However, you have to convert the input array to a string first, and only then feed it to regexp:
C = regexp(sprintf('%d ', x), '(.+ )(\1)+', 'match')
Note that I separated the values with spaces so that regexp can match multi-digit numbers as well. Then convert the result back to a numeric array:
res = str2num([C{:}])
The dot (.) in the pattern string represents any character. To find sequences of certain digits only, specify them in brackets ([]). For instance, the pattern to find only sequences of 3 and 4 would be:
([34]+ )(\1)+
A simpler approach
You can extract runs of successively repeating values by comparing adjacent elements with diff:
res = x((diff([NaN; x(:)])' == 0) | (diff([x(:); NaN])' == 0))
Optionally, you can keep only certain values from the result, for example:
res(res == 3 | res == 4)

You can do it like this:
v = [1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1];
vals = unique(v); % find all unique values in the vector
mask = []; % values to retain
for i = 1:length(vals)
    indices = find(v == vals(i)); % find indices of each unique value
    % if the maximum difference between indices containing
    % a given value is 1, it is contiguous
    % --> add this value to the mask
    if max(indices(2:end) - indices(1:end-1)) == 1
        mask = [mask vals(i)];
    end
end
% filter out what's necessary
vproc = v(ismember(v, mask))
Result:
vproc =
3 3 3 3 4 4 4

Here is another approach, although it is a little elaborate.
If you look at a plot of your array, you want to retain its level sets (i.e. a == const) that are topologically connected (i.e. made of one piece).
Accordingly, such level sets are exactly the ones corresponding to a==3 and a==4.
Here is a possible implementation:
a = [1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1]
r = []; % result
b = a; % b will contain the union of the level sets not parsed yet
while ~isempty(b)
    m = a == b(1); % m is the current level set
    % a condition for being a connected set:
    % the derivative must have 1 positive and 1 negative jump
    connected = sum(diff([0 m 0]).^2) == 2;
    if connected % if the level set is connected we add it to the result
        r = [r a(m)];
    end
    b = b(b~=b(1));
end
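The same "one connected piece" test can also be written without a loop. The following is only a sketch under the same definition as above (a value is kept when all its occurrences form a single run, so a value that occurs once also counts as connected); the variable names runStart, runsPerValue, and keep are illustrative and not part of the original answer.

```matlab
a = [1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1];

% true at the first element of every run of equal values
runStart = [true, diff(a) ~= 0];

% idx maps each element of a to its position in the sorted unique values
[vals, ~, idx] = unique(a);

% count how many separate runs each unique value has
runsPerValue = accumarray(idx(:), double(runStart(:)));

% a value is "connected" when it appears in exactly one run
isSingleRun = (runsPerValue == 1).';
keep = vals(isSingleRun);

% retain only the elements whose value is connected
r = a(ismember(a, keep));
```

For the example vector, the values 1 and 2 each appear in several runs, so only 3 and 4 survive.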

If you try something like
a = [1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1] %initial vector
b = a>=3 %apply filter condition
a = a(b) %keep values that satisfy filter
a will then be
a = [3 3 3 3 4 4 4]

Related

How do I compare elements of one array to a column of a matrix, then shorten the matrix correspondingly?

I have a matrix 'Z' sized 100000x2 and imported as an Excel file using readmatrix. I have a created array 'Time' (Time = [-200:0.1:300]'). I would like to compare all values in column 1 of 'Z' to 'Time' and eliminate all values of column 1 of 'Z' that do not equal a value of 'Time', thus shortening my 'Z' matrix to match my desired time values. Column 2 are pressure traces, so this would give me my desired time values and the corresponding pressure trace.
This sort of thing can be done without loops:
x = [1,2,3,4,1,1,2,3,4];
x = [x', (x+1)'] % this is your 'Z' data from the excel file (toy example here)
x =
1 2
2 3
3 4
4 5
1 2
1 2
2 3
3 4
4 5
y = [1,2]; % this is your row of times you want eliminated
z = x(:,1)==y % create a logical matrix marking where the first column matches each element of y
z =
9×2 logical array
1 0
0 1
0 0
0 0
1 0
1 0
0 1
0 0
0 0
z = any(z,2); % true for rows whose first column matches any element of y
b = x(~z,:) % keep only the rows that do not match
b =
3 4
4 5
3 4
4 5
All the rows of x whose first column equals 1 or 2 are now gone.
x = ismember(Z(:,1),Time); % logical indexes of the rows you want to keep
Z(~x,:) = []; % get rid of the other rows
Or instead of shortening Z you could create a new array to use downstream in your code:
x = ismember(Z(:,1),Time); % logical indexes of the rows you want to keep
Znew = Z(x,:); % the subset you want
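One caveat with ismember here: it compares values exactly, and a vector such as -200:0.1:300 contains floating-point values that may differ from the imported ones in the last bits. A hedged sketch using ismembertol with an absolute tolerance follows; the toy Z matrix and the tolerance of 1e-6 are assumptions made for illustration and should be adapted to the resolution of the real data.

```matlab
Time = (-200:0.1:300)';

% hypothetical stand-in for the imported data: [time, pressure]
Z = [-200.05 1; -199.9 2; 0.3 3; 0.35 4; 300 5];

% assumed absolute tolerance, well below the 0.1 spacing of Time
tol = 1e-6;

% 'DataScale', 1 makes tol an absolute rather than a scaled tolerance
keep = ismembertol(Z(:,1), Time, tol, 'DataScale', 1);

% rows of Z whose time lies on the Time grid (within tol)
Znew = Z(keep,:);
```

In this toy example the rows at -200.05 and 0.35 fall between grid points and are dropped.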
Alternatively, you can loop over all rows, use an if statement to check each item, and delete the row if it doesn't match.
Syntax for loops:
for n = 1:100000
% (operation)
end
Syntax for if statements:
if x == y
% (operation)
end
Syntax for deleting a row: Z(rownum,:) = [];
If you delete rows inside the loop, iterate from the last row to the first so that the remaining row numbers stay valid.

Comparing Vectors of Different Length

I am trying to compare two vectors of different size. For instance when I run the code below:
A = [1 4 3 7 9];
B = [1 2 3 4 5 6 7 8 9];
myPadded = [A zeros(1,4)];
C = ismember(myPadded,B)
I get the following output:
C = 1 1 1 1 1 0 0 0 0
However, I want an output that will reflect the positions of the compared values, hence, I would like an output that is displayed as follows:
C = 1 0 1 1 0 0 1 0 1
Please, I need some help :)
There are 2 points. First, you are writing the inputs of ismember in the wrong order. Additionally, you do not need to grow your matrix. Simply try ismember(B, A) and you will get what you expect.
The function ismember(myPadded, B) returns a vector the same size as myPadded, indicating whether the i-th element of myPadded is present in B.
To get what you want, just invert parameter order: ismember(B, myPadded).
A quick way of doing this is to use the values of A directly as indices. The result will have the same length as B only if the last element of B is included in A.
A = [1 4 3 7 9];
c(A) = 1; % or true.
An assumption here is that you want to index into a vector 1:N, so that B is always B = 1:N. If the last element of B is not included in A, this is easy to fix. Just remember to return everything to its previous state after you are done. It will take about 2 extra lines, though.
This solution is meant as a special case working on a very common problem.
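A minimal sketch of that indexing idea, assuming as above that B is 1:N: preallocating c to the length of B removes the dependence on the last element of B being present in A. The variable N is introduced here for illustration.

```matlab
A = [1 4 3 7 9];
N = 9;              % assumed length of B = 1:N

% start with "not a member" for every position of B
c = false(1, N);

% mark the positions whose value occurs in A
c(A) = true;
```

Because c is preallocated, the result still has N elements when A does not contain N.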

MATLAB: Creating a matrix with all possible group combinations

I'm running an experiment with lots of conditions, and particular numbers of groups in each condition.
A. 3 groups
B. 3 groups
C. 2 groups
D. 3 groups
E. 3 groups
I've worked out that there are 3×3×2×3×3 = 162 possible combinations of groups.
I want to create a MATLAB matrix with 162 rows and 5 columns. That is, one row for each combination and one column to indicate the value for each group.
So, for instance, the first row would be [1 1 1 1 1], indicating that this combination is group 1 for all conditions. The second row would be [1 1 1 1 2], indicating that it's group 1 for all conditions except for the last which is group 2. The 162nd and final row would be [3 3 2 3 3].
M = 1 1 1 1 1
1 1 1 1 2
.........
3 3 2 3 3
What's the most efficient way to achieve this? I realise I could use a loop, but feel sure there's a better way. I thought maybe the perms function would work but I can't see how.
You can use combvec, which is part of the Deep Learning Toolbox (see last line, the rest is only generating test data):
% A. 3 groups
% B. 3 groups
% C. 2 groups
% D. 3 groups
% E. 3 groups
ngroups = zeros(5, 1);
ngroups(1) = 3;
ngroups(2) = 3;
ngroups(3) = 2;
ngroups(4) = 3;
ngroups(5) = 3;
v = {};
for i = 1:length(ngroups)
    v{i} = 1:ngroups(i); % generate a vector of valid group indices
end
% get all possible combinations
x = combvec( v{:} )
As this will return a 5 x 162 double you need to transpose the resulting matrix x:
x.'
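Note that combvec varies its first input fastest, so the transposed result starts with [1 1 1 1 1], [2 1 1 1 1] instead of the row order given in the question. If that order matters, or the toolbox is not available, a similar matrix can be built with ndgrid. This is only a sketch; the names grids, cols, and M are illustrative.

```matlab
ngroups = [3 3 2 3 3];
n = numel(ngroups);

% one vector of valid group indices per condition
v = arrayfun(@(k) 1:k, ngroups, 'UniformOutput', false);

% pass the vectors in reverse order so that the last condition varies fastest
grids = cell(1, n);
[grids{n:-1:1}] = ndgrid(v{n:-1:1});

% flatten each grid into a column and place the columns side by side
cols = cellfun(@(g) g(:), grids, 'UniformOutput', false);
M = [cols{:}];
```

Here M is 162 x 5, with the first row [1 1 1 1 1], the second row [1 1 1 1 2], and the last row [3 3 2 3 3].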

Count number of bouts separated by zeros

I have a vector like this:
A = [1 2 1 1 1 4 5 0 0 1 2 0 2 3 2 2 2 0 0 0 0 33]
I would like to count how many GROUPS of non zero elements it contains and save them.
so I want to isolate:
[1 2 1 1 1 4 5]
[1 2]
[2 3 2 2 2]
[33]
and then count the groups (they should be 4) :)
Can you help me please?
Thanks
To count your groups, a fast vectorized method using logical indexing is:
count = sum(diff([A 0]==0)==1)
This assumes that A is a row vector as in your example. This works with no zeros, all zeros, the empty vector, and several other test cases I tried.
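The edge cases mentioned above can be checked with a short sketch that wraps the one-liner in an anonymous function; the name countGroups is introduced here only for illustration.

```matlab
% counts transitions from a nonzero element to a zero (a zero is appended
% so that a group ending at the last element is also counted)
countGroups = @(A) sum(diff([A 0] == 0) == 1);

nExample  = countGroups([1 2 1 1 1 4 5 0 0 1 2 0 2 3 2 2 2 0 0 0 0 33]);
nNoZeros  = countGroups([1 2 3]);
nAllZeros = countGroups([0 0 0]);
nEmpty    = countGroups([]);
```

These give 4, 1, 0, and 0 respectively.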
To obtain your groups of values themselves, you can use a variation of my answer to a similar question:
a0 = (A~=0);
d = diff(a0);
start = find([a0(1) d]==1) % Start index of each group
len = find([d -a0(end)]==-1)-start+1 % Length, number of indexes in each group
In your case it might make sense to replace len with
finish = find([d -a0(end)]==-1) % Last index of each group
The length of start, len, and finish should be the same as the value of count so you could just use this if you need to do the breaking up. You can then use start and len (or finish) to store your groups in a cell array or struct or some other ragged array. For example:
count = length(start);
B = cell(count,1);
for i = 1:count
    B{i} = A(start(i):finish(i));
end

MATLAB Combine matrices of different dimensions, filling values of corresponding indices

I have two matrices, 22007x3 and 352x2. The first column in each is an index, most (but not all) of which are shared (i.e. x1 contains indices that aren't in x2).
I would like to combine the two matrices into a 22007x4 matrix, such that column 4 is filled in with the values that correspond to particular indices in both original matrices.
For example:
x1 =
1 1 5
1 2 4
1 3 5
2 1 1
2 2 1
2 3 2
x2 =
1 15.5
2 -5.6
becomes
x3 =
1 1 5 15.5
1 2 4 15.5
1 3 5 15.5
2 1 1 -5.6
2 2 1 -5.6
2 3 2 -5.6
I've tried something along the lines of
x3(1:numel(x1),1:3)=x1;
x3(1:numel(x2(:,2)),4)=x2(:,2);
but firstly I get the error
??? Subscripted assignment dimension mismatch.
and then I can't figure out how I would fill the rest of it.
An important point is that there are not necessarily an equal number of rows per index in my data.
How might I make this work?
Taking Amro's answer from here
[~, loc] = ismember(x1(:,1), x2(:,1));
The second output of ismember gives the location in x2 where each element of x1(:,1) can be found (or 0 if it can't be found).
a = x2(loc(loc > 0), 2);
This gets the relevant values using these row indices while excluding the zeros, hence the loc > 0 mask. You have to exclude them because, first, those elements are not in x2 and, second, you can't index with 0.
Make a new column of default values to append to x1. NaN() is probably the better choice, but zeros() would also work:
newCol = NaN(size(x1,1),1)
Now use logical indexing to find the locations of the nonzero elements of loc and put a in those locations:
newCol(loc > 0) = a
Finally, append it to x1:
x3 = [x1, newCol]
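Put together on the example data, the steps look roughly like this. The extra row with index 3 is a hypothetical addition to show what happens to an index that is missing from x2, and the first output of ismember (named found here) is used in place of the loc > 0 mask.

```matlab
% example data from the question, plus one assumed row whose index is not in x2
x1 = [1 1 5; 1 2 4; 1 3 5; 2 1 1; 2 2 1; 2 3 2; 3 1 7];
x2 = [1 15.5; 2 -5.6];

% found: whether each index of x1 exists in x2; loc: its row in x2 (0 if absent)
[found, loc] = ismember(x1(:,1), x2(:,1));

% default to NaN, then fill in the matched values
newCol = NaN(size(x1,1), 1);
newCol(found) = x2(loc(found), 2);

x3 = [x1, newCol];
```

The rows with index 1 and 2 receive 15.5 and -5.6, and the row with index 3 keeps NaN in the fourth column.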