find strings in cell array using wildcards - matlab

I have a cell array of which the last part looks like this:
Columns 8372 through 8375
{'w20091231_2000.nc'} {'w20091231_2020.nc'} {'w20091231_2040.nc'} {'w20091231_2100.nc'}
Columns 8376 through 8379
{'w20091231_2120.nc'} {'w20091231_2140.nc'} {'w20091231_2200.nc'} {'w20091231_2220.nc'}
Columns 8380 through 8383
{'w20091231_2240.nc'} {'w20091231_2300.nc'} {'w20091231_2320.nc'} {'w20091231_2340.nc'}
Columns 8384 through 8387
{'wD1.nc'} {'wD2.nc'} {'wD3.nc'} {'wD4.nc'}
Now I want to rearrange this array so that it only contains the last four strings:
{'wD1.nc'} {'wD2.nc'} {'wD3.nc'} {'wD4.nc'}
I tried
IndexC = strfind(names,'wD*.nc');
Index = find(not(cellfun('isempty',IndexC)))
and
Index = find(contains(names,'wD*.nc'));
names2=names(Index)
Both work if I replace wD*.nc with wD4.nc, but then of course I only select that one value and not the four that I want.
How can I use the * as a wildcard?

I had to do some googling but found this https://www.mathworks.com/matlabcentral/answers/77039-comparing-strings-with-wildcards , and something like the following seems to work:
IndexC = regexp(names, regexptranslate('wildcard', 'wD*.nc'));
Index = find(not(cellfun('isempty',IndexC)));
names2=names(Index)
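The attempts in the question fail because strfind and contains search for the literal text 'wD*.nc' and give the * no special meaning. Below is a minimal sketch of the regexptranslate approach. It makes two assumptions that are not in the original: the sample names are a shortened stand-in for the real array, and the pattern is anchored with ^ and $ so that the whole name has to match.

```matlab
% Hypothetical sample data standing in for the full cell array
names = {'w20091231_2340.nc', 'wD1.nc', 'wD2.nc', 'wD3.nc', 'wD4.nc'};

% Translate the wildcard into a regular expression: 'wD*.nc' becomes 'wD.*\.nc'
pattern = regexptranslate('wildcard', 'wD*.nc');

% Anchor the pattern (assumption: the whole name should match).
% With 'once', each cell holds either a start index or [].
startIdx = regexp(names, ['^' pattern '$'], 'once');

% A logical mask is enough for indexing, so find is optional
mask = ~cellfun('isempty', startIdx);
names2 = names(mask);
```

Indexing with the logical mask gives the same result as indexing with find(mask).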

In one line using regexp with the match option:
names2 = regexp([names{:}],'wD\d+\.nc','match')

Related

How to label structure array elements in an additional field?

I have a 1800x1 structure array with 5 fields. In the field "trial" I've stored 14 different numbers, each indicating a certain trial characteristic. For example, if 1 stands for a rewarded trial and 2 for a non-rewarded trial, I want to add another field that holds the label corresponding to the number in "trial". Any ideas about how to do that?
Assuming you have this data:
a = num2cell(randi(3,15,1));
strings = {'Laurie','rewarded trial','yada yada'};
s = struct('trial',a,'name',[]);
where the value in s(k).trial is the index into strings of the label to be assigned to s(k).name, you can write:
s = struct('trial',a,'name',strings(cell2mat({s.trial})).');
Alternatively, you can do it with a loop:
for k = 1:numel(s)
s(k).name = strings{s(k).trial};
end
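As a third option, the labels can be assigned to an existing struct array without rebuilding it and without a loop. This is only a sketch: the field name trial follows the question, while the label texts and the four-element array are hypothetical.

```matlab
% Hypothetical data: trial codes 1..3 and one label per code
labels = {'rewarded', 'non rewarded', 'aborted'};
s = struct('trial', num2cell([1; 3; 2; 1]));

% Look up the label of every element at once ...
names = labels([s.trial]);

% ... and distribute the resulting comma-separated list over a new field
[s.name] = names{:};
```

After this, s(2).name is 'aborted' because s(2).trial is 3.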

Find and replace in matlab?

I want to do a find-and-replace-all in MATLAB (as we do in MS Office).
https://www.dropbox.com/s/hxfqunjwhnvkl1f/matlab.mat?dl=0
I have a cell array LUT_HS_complete (identifier in column 1, protein name in column 2, and summary in column 3); this is my lookup table. I also have my protein-protein interaction data (named Second_layer, with identifiers in the first two columns and the score in column 3).
I want to replace the first two columns in my Second_layer with the corresponding protein name from my look up table.
I tried strmatch, but that didn't help me.
Source_gene = Second_layer(:,1); Source_gene = regexprep(Source_gene,'[-/\s]','');
Target_gene = Second_layer(:,2); Target_gene = regexprep(Target_gene,'[-/\s]','');
Inter_score = Second_layer(:,3);
%%
for i=1:length(Source_gene(1:end,1));
SG = strmatch(Source_gene(i),LUT_HS_complete(1:end,1),'exact');
renamed_Source_gene(SG,1) = LUT_HS_complete(SG,2);
end
for j=1:length(Target_gene(1:end,1));
TG = strmatch(Target_gene(j),LUT_HS_complete(1:end,1),'exact');
renamed_Target_gene(TG,1) = LUT_HS_complete(TG,2);
end
If you could find a solution, it would be a great help.
Might this work for you?
renamed_Second_layer(:,1)=LUT_HS_complete(cellfun(@(x) find(strcmp(x,LUT_HS_complete(:,1))),Second_layer(:,1)),2);
renamed_Second_layer(:,2)=LUT_HS_complete(cellfun(@(x) find(strcmp(x,LUT_HS_complete(:,1))),Second_layer(:,2)),2);
renamed_Second_layer(:,3)=Second_layer(:,3);
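The cellfun version assumes that every identifier occurs exactly once in the lookup table; otherwise find does not return a scalar and cellfun raises an error. A possible alternative is ismember, which returns the row of each match and flags identifiers that are missing. The table contents below are hypothetical miniature data, and leaving unmatched identifiers unchanged is a design choice made for this sketch, not something stated in the question.

```matlab
% Hypothetical miniature lookup table: identifier, protein name, summary
LUT_HS_complete = {'id1', 'ProtA', 'summary A'; ...
                   'id2', 'ProtB', 'summary B'; ...
                   'id3', 'ProtC', 'summary C'};

% Hypothetical interaction data: source id, target id, score
Second_layer = {'id2', 'id1', 0.9; ...
                'id3', 'id4', 0.4};   % 'id4' is not in the lookup table

% Start from a copy so the score column and unmatched ids are kept
renamed_Second_layer = Second_layer;
for col = 1:2
    % found(k) is true if the id exists; loc(k) is its row in the table
    [found, loc] = ismember(Second_layer(:, col), LUT_HS_complete(:, 1));
    renamed_Second_layer(found, col) = LUT_HS_complete(loc(found), 2);
end
```

Here 'id4' stays as it is, which makes missing lookup entries easy to spot afterwards.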

Is there a quick way to assign unique text entries in an array a number?

In MATLAB, I have several data vectors that are in text. For example:
speciesname = [species1 species2 species3];
genomelength = [8 10 5];
gonometype = [RNA DNA RNA];
I realise that to make a plot, arrays must be numerical. Is there a quick and easy way to assign unique entries in an array a number, for example so that RNA = 1 and DNA = 2? Note that some arrays might not be binary (i.e. have more than two options).
Thanks!
So there is a quick way to do it, but I'm not sure your plots will be very intelligible if you use numbers instead of words.
You can make a unique array like this:
u = unique(gonometype);
The corresponding number array is then just 1:length(u).
When you go through your data, the number of the current word is:
find(u == current_name);
For your particular case you will need to utilize cells:
gonometype = {'RNA', 'DNA', 'RNA'};
u = unique(gonometype)
u =
'DNA' 'RNA'
current = 'RNA';
find(strcmp(u, current))
ans =
2
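The lookup with find and strcmp handles one word at a time. To number all entries in one step, the third output of unique can be used, as in the sketch below. The variable name codes is an assumption made for illustration.

```matlab
gonometype = {'RNA', 'DNA', 'RNA'};

% u holds the sorted unique words; codes(k) is the position of
% gonometype{k} within u, so every entry gets its number at once
[u, ~, codes] = unique(gonometype);

% The original words can be recovered from the numbers
recovered = u(codes);
```

Because unique sorts its output, DNA becomes 1 and RNA becomes 2 here, so the codes for the three entries are 2, 1 and 2. This also works for arrays with more than two distinct values.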

MATLAB: Find numeric values in cell array

In my cell array test = cell(1,2,20,14); I want to find numeric values in the subset test(:,1,1,1).
For example test(:,:,1,1) looks like this:
>> test(:,:,1,1)
ans =
[ 0] [0.1000] [57]
[0.9000] [0.9500] [73]
I want to find the index of the cell containing 0.9 in the first column, so I can access the third column (in this case value 73). I tried:
find(test{:,:,1,1} == 0.9)
which gives:
Error using ==
Too many input arguments.
How can I find the respective index?
Thanks.
Try this to access that third column value directly -
cell2mat(test(vertcat(test{:,1,1,1})==0.9,3,1,1))
Edit 1: If you would like to test out for match w.r.t. the first two columns of test's subset, use this -
v1 = reshape(vertcat(test{:,[1 2],1,1}),[],2)
cell2mat(test(ismember(v1,[0.9 0.95],'rows'),3,1,1))
Just add brackets [] around test{:,:,1,1}. This concatenates the cell contents into a single row vector (in column-major order), so the second output of find is a linear index into test(:,:,1,1).
Like this:
[index1, index2] = find([test{:,:,1,1}] == 0.9)
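Both answers compare floating-point values with ==, which works for the literal 0.9 but can miss values that come out of a calculation. A hedged variant with a tolerance is sketched below. The tolerance 1e-9 and the 2-by-3 sample cell are assumptions chosen for illustration.

```matlab
% Hypothetical sample matching the display in the question
test = {0, 0.1, 57; 0.9, 0.95, 73};

tol = 1e-9;                          % assumed tolerance, adjust to the data

% Stack the first column into a numeric vector
col1 = vertcat(test{:, 1, 1, 1});

% Row whose first-column value is within tol of 0.9
row = find(abs(col1 - 0.9) < tol);

% Read the value from the third column of that row
value = cell2mat(test(row, 3, 1, 1));
```

For this sample, row is 2 and value is 73.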

Complementary array Matlab

We've got an array of values, and we would like to create another array whose values are not in the first one.
Example:
load('internet.mat')
The first column contains the values in MB; we thought of something like:
MB_no = setdiff(v, internet(:,1))
where v is a zero vector whose length equals the number of rows in internet. But it just doesn't work.
So, how do we do this?
You need to specify the range of possible values in order to define which values are not in internet. Say the range is v = 1:10; then setdiff(v,internet(:,1)) will give you the values in 1:10 that are not in the first column of internet.
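A small worked example of this, using hypothetical data in place of internet.mat:

```matlab
% Hypothetical stand-in for the data loaded from internet.mat
internet = [3 120; 1 80; 7 200];

% Candidate values: the range has to be chosen by the user
v = 1:10;

% Values of v that do not occur in the first column of internet
MB_no = setdiff(v, internet(:, 1));
```

With this data, MB_no contains 2, 4, 5, 6, 8, 9 and 10. A zero vector for v cannot work, because setdiff then only has the single value 0 to choose from.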
It seems as if you don't want the first column.
You can simply do:
MB_no=internet(:,2:end);
Assuming internet(:,1) contains only positive integers and you want to find the integers in [1,...,max( internet(:,1) )] that do not appear in it, you can simply do:
app = [];
app( internet(:,1) ) = 1;
MB_no = find( app == 0 );
This is somewhat like bucket sort.
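The same marking idea can be written with a preallocated logical vector, which avoids growing app on the fly. This is a sketch with hypothetical data, and the name seen is made up for illustration.

```matlab
% Hypothetical stand-in for the data loaded from internet.mat
internet = [3 120; 1 80; 7 200; 3 95];

% One flag per integer in 1..max, all initially false
seen = false(1, max(internet(:, 1)));

% Mark every value that occurs in the first column (repeats are harmless)
seen(internet(:, 1)) = true;

% The unmarked positions are the missing integers
MB_no = find(~seen);
```

For this data the result is 2, 4, 5 and 6.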