Remove specific rows from a structure - matlab

I have a 1x1 structure (EEG) with 42 fields. One of these fields is called event and is a 1x180 structure, with 13 different fields, some of which are strings and some numeric values.
The 4th field of EEG.event is type and it contains strings (i.e. 'preo', 'pred', 'to', 'td', 'po', 'pd').
I would like to keep only those rows of the structure that contain 'preo' in the column EEG.event.type.
My ultimate aim is to create a matrix with all the columns from the structure EEG.event and only the rows with 'preo' in EEG.event.type, plus other columns from other variables.
So far I tried:
S = struct2table(EEG.event);
and it correctly returns a 180x13 table.
However I was not able to select only the rows with 'preo' in type. I tried:
A= S(S.type=='preo', :);
and it gives me an error:
Undefined operator '==' for input arguments of type 'cell'.
I also tried:
array(strcmp(S(:, 4), 'preo'), :) = [];
and it gives me this error:
Deletion requires an existing variable.
Then I thought that maybe I should have converted the table into matrix, to directly delete rows from the matrix. I tried:
B = cell2mat(S);
but it returns this error:
Error using cell2mat (line 42)
You cannot subscript a table using only one subscript. Table subscripting requires both row and variable subscripts.
Any suggestion or tip is welcome, because I don't know how to continue.
Example list that I have (only 18 rows here):
13 1 201011 'preo' 2502 201 1 1 'y' 'h' 13 13.9230000000000 13
14 1 201011 'pred' 2684 201 1 1 'y' 'h' 14 14.1049999960000 14
15 1 201012 'to' 2707 201 1 2 'y' 'h' 15 14.1280000000000 15
16 1 201012 'td' 2993 201 1 2 'y' 'h' 16 14.4140000000000 16
17 1 201013 'po' 3019 201 1 3 'y' 'h' 17 14.4400000000000 17
18 1 201013 'pd' 3383 201 1 3 'y' 'h' 18 14.8040000000000 18
55 2 61011 'preo' 8213 61 1 1 'y' 'h' 55 53.9000000000000 55
56 2 61011 'pred' 8522 61 1 1 'y' 'h' 56 54.2089999850000 56
57 2 61012 'to' 8547 61 1 2 'y' 'h' 57 54.2340000000000 57
58 2 61012 'td' 8834 61 1 2 'y' 'h' 58 54.5210000000000 58
59 2 61013 'po' 8858 61 1 3 'y' 'h' 59 54.5450000000000 59
60 2 61013 'pd' 9091 61 1 3 'y' 'h' 60 54.7780000000000 60
85 3 124011 'preo' 13924 124 1 1 'y' 'h' 85 82.4550000000000 85
86 3 124011 'pred' 14159 124 1 1 'y' 'h' 86 82.6899999990000 86
87 3 124012 'to' 14181 124 1 2 'y' 'h' 87 82.7120000000000 87
88 3 124012 'td' 14448 124 1 2 'y' 'h' 88 82.9790000000000 88
89 3 124013 'po' 14470 124 1 3 'y' 'h' 89 83.0010000000000 89
90 3 124013 'pd' 14713 124 1 3 'y' 'h' 90 83.2440000000000 90
Example list that I would like to have (from the 18 rows above):
13 1 201011 'preo' 2502 201 1 1 'y' 'h' 13 13.9230000000000 13
55 2 61011 'preo' 8213 61 1 1 'y' 'h' 55 53.9000000000000 55
85 3 124011 'preo' 13924 124 1 1 'y' 'h' 85 82.4550000000000 85

I found a solution, I post it here for others with my same issue.
I first create a cell array, and then I keep only the rows I want from that cell array. At the moment this is the best I can think of.
myCell= struct2cell(EEG.event);
% This gives a 3-D cell array: the fields of EEG.event are the first dimension (13), the second dimension is a singleton, and the elements of the structure are the third dimension (180).
new_Cell = permute(myCell,[3,1,2]);
% This moves the singleton dimension to the end and swaps the other two, giving a 180x13 cell array.
[r,c] = find(strcmp(new_Cell,'preo'));
% Row (r) and column (c) indices of the cells that contain the string 'preo'.
y = new_Cell(r,:);
% Keeps only the wanted rows of the permuted cell array 'new_Cell'.
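For comparison, the same filtering can be sketched without going through a cell array, by applying strcmp either to the struct array itself or to the table column. The small `ev` struct below is a made-up stand-in for EEG.event with only three of its fields, so treat this as an illustration rather than a drop-in replacement:

```matlab
% Hypothetical stand-in for EEG.event (the real one has 13 fields and 180 elements)
ev = struct('number',  {13, 14, 55, 56}, ...
            'type',    {'preo', 'pred', 'preo', 'pred'}, ...
            'latency', {2502, 2684, 8213, 8522});

% Option 1: filter the struct array directly with a logical mask
isPreo = strcmp({ev.type}, 'preo');   % 1x4 logical
evPreo = ev(isPreo);                  % struct array with the matching elements only

% Option 2: filter the table; strcmp accepts a cell array of strings, == does not
S = struct2table(ev);
A = S(strcmp(S.type, 'preo'), :);     % table with only the 'preo' rows
```

Both options keep the field names, which is convenient when more columns from other variables have to be appended afterwards.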

Related

Checking if value exists in a matrix and getting its columns

I have a 500x500 matrix with values ranging from 1-100.
I need to look at 5 rows at a time and see if those 5 rows contain values that are greater than 75. I then need to get the index of the first column where the value is greater than 75 and the index of the last column where the value is greater than 75.
So far, I have the following:
i = 1;
while i < size(data,1)
    if (i + 5) <= size(data,1)
        if any(envNoClutterscansV(i:i + 5, 1:500) > 75)
            % do something
        end
    end
    i = i + 5;
end
The idea here is that I am looking at 5 rows at a time. For every 5 rows, I'm looking through all the columns to see if there are values that meet my criteria. So far, this doesn't find any values, even though I'm sure that my dataset contains the values. Additionally, I am not sure what to do from here.
I think the trouble is that the result of any in the above code is a vector of 500 true and false values, and an if on a vector is only true when every element is true. You should sum them if you want to respond every time there are values larger than 75:
if sum(any(envNoClutterscansV(i:i + 5, 1:500) > 75))
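A minimal sketch of that fix on a made-up 10x4 matrix (standing in for the 500x500 data) is given here. It collapses the comparison to a single logical value with any(block(:)), which is one alternative to summing, and it indexes exactly 5 rows per block; note that i:i + 5 in the question spans 6 rows. The names `hits`, `rows` and `block` are only illustrative.

```matlab
% Made-up 10x4 matrix standing in for the 500x500 data
data = [10 20 30 40;
        10 80 30 40;
        10 20 30 40;
        10 20 30 40;
        10 20 30 40;
        10 20 30 40;
        10 20 30 40;
        10 20 30 40;
        10 20 30 40;
        10 20 30 40];

hits = false(1, size(data, 1) / 5);      % one flag per block of 5 rows
for k = 1:numel(hits)
    rows  = (k - 1) * 5 + (1:5);         % exactly 5 rows: 1:5, 6:10, ...
    block = data(rows, :) > 75;          % 5x4 logical
    % any(block) alone is a 1x4 vector, and "if" on a vector needs ALL entries true.
    % Collapsing to a scalar makes the test fire when at least one entry matches.
    if any(block(:))
        hits(k) = true;
    end
end
```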
If you want to speed it up, you can avoid the loop and vectorize it, for example like this:
data = [
11 76 25 44 55 75;
11 75 95 44 85 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 90 25 44 55 88;
11 0 25 44 55 0;
91 0 25 44 55 80;
];
% Getting the number of rows
nRows=size(data,1);
% Getting a logical matrix with all the cells that are above the threshold
cellsOverTreshold=data>75;
% Getting a logical index to all the rows that contain values above
% threshold
matchingRows=any(cellsOverTreshold,2);
% In the next line of code "reshape" rearranges the data to put in columns the
% values associated with each group of 5 rows
% So column 1 has group one, corresponding to data rows 1,2,3,4,5
% column 2 has group two, corresponding to data rows 6,7,8,9,10
% and so on
% Now we can get all the row groups that have values above threshold
matchingRowGroups=find(any(reshape(matchingRows,5,[])));
% Now we put each row in a cell array to be able to operate row-wise
cellRows = num2cell(cellsOverTreshold, 2);
% We now get the first and last column over the threshold for each row
firstColumOfRow = cellfun(@(x)find(x,1,'first'), cellRows,'UniformOutput',false);
lastColumOfRow = cellfun(@(x)find(x,1,'last'), cellRows,'UniformOutput',false);
% We replace the empty cells with NaNs so we can convert them to vectors
% without losing the indexing
firstColumOfRow(~matchingRows)={NaN};
lastColumOfRow(~matchingRows)={NaN};
% We rearrange the data as above and get the minimum of the first columns
% of each group, that is the first column of the group above the threshold
firstColInGroup=nanmin(reshape([firstColumOfRow{:}]',5,[]));
% With the maximum of the last columns we get the last column of each group
lastColInGroup=nanmax(reshape([lastColumOfRow{:}]',5,[]));
% We finally keep only the data of the groups that have at least one
% element above the threshold
firstColInGroup=firstColInGroup(matchingRowGroups);
lastColInGroup=lastColInGroup(matchingRowGroups);
In this way the variable "matchingRowGroups" holds the indices of each group of 5 rows that matches. The variable "firstColInGroup" holds the first matching column of each group and "lastColInGroup" the last one.
In addition to my previous answer, here is another way to vectorize it that avoids converting the data into cell arrays and avoids cellfun, so it is probably faster. Here it is:
data = [
11 76 25 44 55 75;
11 75 95 44 85 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 90 25 44 55 88;
11 0 25 44 55 0;
91 0 25 44 55 80;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 84 55 0;
11 0 25 44 55 0;
];
% Getting the number of rows and columns
[nRows, nCols]=size(data);
% Getting a logical matrix with all the cells that are above the threshold
cellsOverTreshold=data>75;
% Getting a logical index to all the rows that contain values above
% threshold
matchingRows=any(cellsOverTreshold,2);
% In the next line of code "reshape" rearranges the data to put in columns the
% values associated with each group of 5 rows
% So column 1 has group one, corresponding to data rows 1,2,3,4,5
% column 2 has group two, corresponding to data rows 6,7,8,9,10
% and so on
% Now we can get all the row groups that have values above threshold
matchingRowGroups=find(any(reshape(matchingRows,5,[])))
% We find the rows and columns of the first and last element of each row
% that has values above threshold
[firstRow, firstCol]=find(cumsum(cumsum(cellsOverTreshold,2),2)==1);
[lastRow, lastCol]=find(cumsum(cumsum(cellsOverTreshold,2,'reverse'),2,'reverse')==1);
% Sort this data in vectors with one value per row, leaving NANs for rows
% with no element above threshold
firstColumOfRow=NaN(nRows,1);
lastColumOfRow=NaN(nRows,1);
firstColumOfRow(firstRow)=firstCol;
lastColumOfRow(lastRow)=lastCol;
% We rearrange the data as above and get the minimum of the first columns
% of each group, that is the first column of the group above the threshold
firstColInGroup=nanmin(reshape(firstColumOfRow,5,[]));
% With the maximum of the last columns we get the last column of each group
lastColInGroup=nanmax(reshape(lastColumOfRow,5,[]));
% We finally keep only the data of the groups that have at least one
% element above the threshold
firstColInGroup=firstColInGroup(matchingRowGroups)
lastColInGroup=lastColInGroup(matchingRowGroups)
This code looks at 5 rows at a time. It uses find to locate the values > 75 and ind2sub to convert the linear indices returned by find into rows (ignored) and columns (cols).
data = [
11 76 25 44 55 78;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 0 25 44 55 88;
11 0 25 44 55 0;
11 0 25 44 55 0;
];
for row = 1:5:size(data, 1)
    fprintf('Row %d - %d\n', row, row+4);
    indices = find(data(row:row+4,:) > 75);
    if ~isempty(indices)
        [~, cols] = ind2sub([5 size(data, 2)], indices);
        col_min = min(cols);
        col_max = max(cols);
        fprintf('Column: %d and %d\n', col_min, col_max);
    end
end
After thinking a bit more, here is yet another solution that is simpler, faster and more compact. See my first solution for more details on the naming of the variables, although they are quite self-explanatory.
data = [
11 76 25 44 55 75;
11 75 95 44 85 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 90 25 44 55 88;
11 0 25 44 55 0;
91 0 25 44 55 80;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 84 55 0;
11 0 25 44 55 0;
];
% Getting the number of rows and columns
[nRows, nCols]=size(data);
% We create arrays with the row and column numbers of each element
[colNum,rowNum]=meshgrid(1:nCols,1:nRows);
% Set to NaN the column numbers that do not exceed the threshold
colNum(data<=75)=NaN;
% Get the group number of each element
groupNum=ceil(rowNum/5);
% The matching groups are those that have at least one non-NaN element
matchingRowGroups = accumarray(groupNum(:),colNum(:),[],@(x)any(~isnan(x)))
% We get the minimum of the column numbers above the threshold in each group
firstColumOfGroup = accumarray(groupNum(:),colNum(:),[],@nanmin)
% We get the maximum of the column numbers above the threshold in each group
lastColumOfGroup = accumarray(groupNum(:),colNum(:),[],@nanmax)
The only difference with the previous solutions is that matchingRowGroups is a logical index, and firstColumOfGroup and lastColumOfGroup have one entry per group, instead of entries only for groups with elements above the threshold. Groups with no entry above threshold have NaN values

save a cell array in matlab as .xlsx or .csv file

I have a cell array myFile 637x16. The first row of the cell array is made of strings, because they will be the columns' labels in the .xlsx/.csv file.
From the second row on, the cell array is made of some columns with strings, and some columns with numbers.
I would like to export this cell array as a .xlsx or .csv file.
Here is an example of what I have:
'subject' 'PeakA' 'PeakL' 'number' 'epoch' 'code' 'type' 'latency' 'nitem' 'condition' 'ia' 'cover' 'variety' 'init_index' 'init_time' 'urevent'
5 3.50 82 13 1 201011 'pre' 2502 201 1 1 'y' 'h' 13 13.92 13
5 -1.27 112 55 2 61011 'pre' 8213 61 1 1 'y' 'h' 55 53.90 55
5 6.59 99 85 3 124011 'pre' 13924 124 1 1 'y' 'h' 85 82.45 85
5 12.65 105 127 4 178011 'pre' 19635 178 1 1 'y' 'h' 127 122.43 127
5 -0.35 105 157 5 89011 'pre' 25346 89 1 1 'y' 'h' 157 150.98 157
5 10.29 93 163 6 132011 'pre' 31057 132 1 1 'y' 'h' 163 156.69 163
5 4.61 65 193 7 166011 'pre' 36768 166 1 1 'y' 'h' 193 185.25 193
5 1.45 51 199 8 212011 'pre' 42479 212 1 1 'y' 'h' 199 190.96 199
I tried:
xlswrite('filename.xlsx', myFile);
but it gives me this error:
Warning: Could not start Excel server for export.
XLSWRITE will attempt to write file in CSV format.
> In xlswrite (line 174)
Error using xlswrite (line 187)
An error occurred on data export in CSV format.
Caused by:
Error using dlmwrite (line 112)
The input cell array cannot be converted to a matrix.
If you have a sufficiently recent version of Matlab (R2013b or newer), writetable is your friend.
%# create a table
tbl = cell2table(yourCellArray(2:end,:),'variableNames',yourCellArray(1,:));
%# write to file
writetable(tbl,'filename.xlsx')
If you want to use xlswrite, it may be worth converting all data to string first, or to write the variable names separately, before you write the rest - I believe Matlab checks data types on the first row, which can cause typecast errors.
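As a rough illustration of the writetable route, the sketch below uses a made-up 3x3 cell array in the same layout (header row, then mixed text and numeric columns) and writes it as CSV, which does not need an Excel server. The temporary file name and the read-back check are assumptions added for the example, not part of the original answer.

```matlab
% Made-up cell array in the same layout: header row, then mixed text/numeric rows
myFile = {'subject', 'PeakA', 'type';
          5,          3.50,   'pre';
          5,         -1.27,   'pre'};

% The header row becomes the variable names; the remaining rows become the data
tbl = cell2table(myFile(2:end, :), 'VariableNames', myFile(1, :));

% The .csv extension selects delimited-text output, so Excel is not needed
outFile = [tempname '.csv'];          % temporary file name, chosen for this sketch
writetable(tbl, outFile);

% Read the file back to check the round trip, then clean up
check = readtable(outFile);
delete(outFile);
```

The header strings have to be valid MATLAB variable names for cell2table to accept them, which is the case for the labels shown in the question.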

How to remove zero columns from array

I have an array which looks similar to:
0 2 3 4 0 0 7 8 0 10
0 32 44 47 0 0 37 54 0 36
I wish to remove all
0
0
from this to get:
2 3 4 7 8 10
32 44 47 37 54 36
I've tried x(x == 0) = []
but I get:
x =
2 32 3 44 4 47 7 37 8 54 10 36
How can I remove all zero columns?
Here is a possible solution:
x(:,all(x==0))=[]
You were on the right track with x(x == 0) = [];. This removes every zero element, but deleting elements with a logical mask flattens the result into a vector of the non-zero values. As long as the zeros only occur in all-zero columns, all you have to do is reshape that vector back into a matrix with 2 rows:
x(x == 0) = [];
y = reshape(x, 2, [])
y =
2 3 4 7 8 10
32 44 47 37 54 36
Another way is with any:
y = x(:,any(x,1));
In this case, we look for any columns that are non-zero and use these locations to index into x and extract out those corresponding columns.
Result:
y =
2 3 4 7 8 10
32 44 47 37 54 36
Another way, which is more for academic purposes, is to use unique. This assumes that your matrix has no negative values:
[~,~,id] = unique(x.', 'rows');
y = x(:, id ~= 1)
y =
2 3 4 7 8 10
32 44 47 37 54 36
We transpose x so that each column becomes a row, and we look for all unique rows. The matrix must not contain negative values because the third output of unique assigns an ID to each unique row in sorted order. If all values are non-negative, a row of all zeroes sorts first and is assigned an ID of 1. Using this array, we search for IDs that were not assigned a value of 1, and use those to index into x to extract the necessary columns.
You could also use sum.
Sum along each column: a column that contains only zeros also sums to zero. With the > 0 test used below, this assumes the matrix has no negative values.
sum(x,1)
ans =
0 34 47 51 0 0 44 62 0 46
x(:,sum(x,1)>0)
ans =
2 3 4 7 8 10
32 44 47 37 54 36
Also by reshaping nonzeros(x) as follows:
reshape(nonzeros(x), size(x,1), [])
ans =
2 3 4 7 8 10
32 44 47 37 54 36
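The sum, unique and reshape variants above rely on assumptions about the data (no negative values, no stray zeros inside the columns that are kept). A small sketch with a made-up matrix where those assumptions fail suggests that a mask built with any(x ~= 0, 1) still selects the right columns:

```matlab
% Made-up matrix: column 1 is all zeros, column 2 sums to zero, column 3 holds one zero
x = [0  5  0  7;
     0 -5  3  8];

keep = any(x ~= 0, 1);      % true for columns with at least one nonzero entry
y = x(:, keep);             % keeps columns 2, 3 and 4

% For comparison, the sum-based mask also drops column 2 because 5 + (-5) == 0
keepBySum = sum(x, 1) > 0;
```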

matlab: Getting correct indices for an array

I have an array, and when I apply find(im) I get the indices of the non-zero elements. But I want the indices of all elements of the array, regardless of whether they are zero or non-zero.
Here is my array:
im =[94 122 99 101 111 101;
99 92 103 87 107 116;
93 109 113 84 86 106;
5 17 6 54 56 53;
13 11 5 56 44 50;
0 10 5 49 42 51];
When I apply find(im), I get 35 indices (since the array contains a 0). But I need to get 36.
How do I do it?
Since you want the linear indices of all elements in the array, and you know the number of elements in the array, their indices will be:
im = magic(5);
indices = 1:numel(im)
I.e. if you were to loop over these indices, you would visit every element of the array.
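If row and column positions are needed as well, the linear indices can be converted with ind2sub. This is a small sketch on a made-up 3x3 excerpt of the array; the names `idx`, `r` and `c` are only illustrative.

```matlab
% Made-up 3x3 excerpt with one zero in it
im = [94 122  99;
       5  17   6;
       0  10   5];

idx = 1:numel(im);                   % linear indices 1..9, zeros included
[r, c] = ind2sub(size(im), idx);     % matching row and column subscripts

% Linear indices run down the columns, so idx(3) is the zero at row 3, column 1
```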

Tidying up a list

I'm fairly sure there should be an elegant solution to this (in MATLAB), but I just can't think of it right now.
I have a list with [classIndex, start, end], and I want to collapse consecutive class indices into one group like so:
This
1 1 40
2 46 53
2 55 55
2 57 64
2 67 67
3 68 91
1 94 107
Should turn into this
1 1 40
2 46 67
3 68 91
1 94 107
How do I do that?
EDIT
Never mind, I think I got it - it's almost like fmarc's solution, but gets the indices right
a=[ 1 1 40
2 46 53
2 55 55
2 57 64
2 67 67
3 68 91
1 94 107];
d = diff(a(:,1));
startIdx = logical([1;d]);
endIdx = logical([d;1]);
b = [a(startIdx,1),a(startIdx,2),a(endIdx,3)];
Here is one solution:
Ad = find([1; diff(A(:,1))]~=0);
output = A(Ad,:);
output(:,3) = A([Ad(2:end)-1; size(A,1)],3);
clear Ad
One way to do it if the column in question is numeric:
Build the differences along the id-column. Consecutive identical items will have zero here:
diffind = diff(a(:,1)');
Use that to index your array, using logical indexing.
b = a([true [diffind~=0]],:);
Since the first item is always included and the difference vector starts with the difference from first to second element, we need to prepend one true value to the list.
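The indexing step above keeps the first row of each run, so the third column still holds the end value of that first row. One possible way to also carry over the end of each run, sketched here on the example data from the question, is to label the runs with cumsum and aggregate with accumarray. The variable `g` and the use of min and max are assumptions of this sketch; they rely on start and end values increasing down the list, as they do in the example.

```matlab
a = [1  1  40;
     2 46  53;
     2 55  55;
     2 57  64;
     2 67  67;
     3 68  91;
     1 94 107];

% Run number of each row: it increases by one whenever the class index changes
g = cumsum([true; diff(a(:,1)) ~= 0]);      % 1 2 2 2 2 3 4

% One output row per run: class index, smallest start, largest end
b = [accumarray(g, a(:,1), [], @(v) v(1)), ...
     accumarray(g, a(:,2), [], @min), ...
     accumarray(g, a(:,3), [], @max)];
```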