I created a list of names of data files, e.g. abc123.xml, abc456.xml, via
list = dir('folder/*.xml').
Matlab starts this out as a 10x1 struct with 5 fields, where the first one is the name. I extracted the needed data with struct2table, so I now got a 10x1 table. I only need the numerical value as 10x1 double. How can I get rid of the alphanumerical stuff and change the data type?
I tried regexp (Undefined function 'regexp' for input arguments of type 'table') and strfind (Conversion to double from table is not possible). Couldn't come up with anything else, as I'm very new to Matlab.
You can extract the name fields and place them in a cell array, use regexp to capture the first string of digits it finds in each name, then use str2double to convert those to numeric values:
strs = regexp({list.name}, '(\d+)', 'once', 'tokens');
nums = str2double([strs{:}]);
Related
I have a char array of format a = [1.234 ; 2.345; 3.456] and I need to convert this to a numeric array in MATLAB. I have tried str2num(a) but it only seems to work on integers, as it is returning an empty vector. Here is what the data actually looks like:
Any suggestions on how to tackle this problem are appreciated!
If your character array is either of the following formats:
a = '[1.234; 2.345; 3.456]'; % 1-by-N with brackets, spaces, or semicolons
a = ['1.234'; '2.345'; '3.456']; % M-by-N
Then str2num should work as you want:
vec = str2num(a)
vec =
1.234000000000000
2.345000000000000
3.456000000000000
If it's not working then that probably means that your character array val has rows with invalid formats or characters that don't properly convert. Since the array has 3100 rows, you probably don't want to search through it by hand. One easy way to highlight where invalid rows might be is to identify where there are characters other than numbers, periods, or white space. Here's how you can get a list of rows that may warrant further inspection:
suspiciousRows = find(~all(ismember(val, '0123456789. '), 2));
The function str2double would works for your case. Please refer to this link for detailed usage.
str2num would work as it is but as str2num uses eval,so a better alternative is str2double. But it is not directly applicable on a char array like yours. You can convert that array into a cell using cellstr and then apply str2double.
req = str2double(cellstr(val))
Another approach if you have MATLAB R2016b or a later version is to convert that char array into a string array and then apply str2double.
req = str2double(string(val))
What is the difference between string and character class in MATLAB?
a = 'AX'; % This is a character.
b = string(a) % This is a string.
The documentation suggests:
There are two ways to represent text in MATLAB®. You can store text in character arrays. A typical use is to store short pieces of text as character vectors. And starting in Release 2016b, you can also store multiple pieces of text in string arrays. String arrays provide a set of functions for working with text as data.
This is how the two representations differ:
Element access. To represent char vectors of different length, one had to use cell arrays, e.g. ch = {'a', 'ab', 'abc'}. With strings, they can be created in actual arrays: str = [string('a'), string('ab'), string('abc')].
However, to index characters in a string array directly, the curly bracket notation has to be used:
str{3}(2) % == 'b'
Memory use. Chars use exactly two bytes per character. strings have overhead:
a = 'abc'
b = string('abc')
whos a b
returns
Name Size Bytes Class Attributes
a 1x3 6 char
b 1x1 132 string
The best place to start for understanding the difference is the documentation. The key difference, as stated there:
A character array is a sequence of characters, just as a numeric array is a sequence of numbers. A typical use is to store short pieces of text as character vectors, such as c = 'Hello World';.
A string array is a container for pieces of text. String arrays provide a set of functions for working with text as data. To convert text to string arrays, use the string function.
Here are a few more key points about their differences:
They are different classes (i.e. types): char versus string. As such they will have different sets of methods defined for each. Think about what sort of operations you want to do on your text, then choose the one that best supports those.
Since a string is a container class, be mindful of how its size differs from an equivalent character array representation. Using your example:
>> a = 'AX'; % This is a character.
>> b = string(a) % This is a string.
>> whos
Name Size Bytes Class Attributes
a 1x2 4 char
b 1x1 134 string
Notice that the string container lists its size as 1x1 (and takes up more bytes in memory) while the character array is, as its name implies, a 1x2 array of characters.
They can't always be used interchangeably, and you may need to convert between the two for certain operations. For example, string objects can't be used as dynamic field names for structure indexing:
>> s = struct('a', 1);
>> name = string('a');
>> s.(name)
Argument to dynamic structure reference must evaluate to a valid field name.
>> s.(char(name))
ans =
1
Strings do have a bit of overhead, but still increase by 2 bytes per character. After every 8 characters it increases the size of the variable. The red line is y=2x+127.
figure is created using:
v=[];N=100;
for ct = 1:N
s=char(randi([0 255],[1,ct]));
s=string(s);
a=whos('s');v(ct)=a.bytes;
end
figure(1);clf
plot(v)
xlabel('# characters')
ylabel('# bytes')
p=polyfit(1:N,v,1);
hold on
plot([0,N],[127,2*N+127],'r')
hold off
One important practical thing to note is, that strings and chars behave differently when interacting with square brackets. This can be especially confusing when coming from python. consider following example:
>>['asdf' '123']
ans =
'asdf123'
>> ["asdf" "123"]
ans =
1×2 string array
"asdf" "123"
I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formating is more complex, you may want to use a regular expression. There is a similar question matlab - extracting numbers from (odd) string. Modifying the code found there you can do the following
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str
I searched the documentation about the fastaread function, but I'm still kind of confused about the Sequence part.
So, suppose my file is stored in the file_path location, and fastaread(file_path) will return the data obtained.
fastaread returns two columns, one is with title header and the other one is with title sequence. Then, fastaread(file_path).sequence will return the sequence column? Does that mean that fastaread(file_path).sequence is a column vector?
Actually it's a row vector containing characters.
Example with data provided with Matlab:
p53nt = fastaread('p53nt.txt')
gives a structure with 2 fields: Header and Sequence.
p53nt.Header gives info about the actual sequence:
p53nt.Header
ans =
gi|8400737|ref|NM_000546.2| Homo sapiens tumor protein p53 (Li-Fraumeni syndrome) (TP53), mRNA
while p53nt.Sequence gives a character array of size 1xN:
S = p53nt.Sequence
S =
ACTTGTCATGGCGACTGTCCAGC... And so on
Typing whos S gives this:
Name Size Bytes Class Attributes
S 1x2629 5258 char
So if you want a column vector for some reason, use the colon operator:
S = S(:);
Hope its a bit clearer now!
I have a string in the following format :
fileName.jpg,10,20,10,10,...,12,14,True
Basically, I have a string with comma separated values. The first value is a string, then it follows an array of 100 values and lastly another string being true or false.
Is there a way or directly reading these values into 3 variable? Two strings and an array?
The array of values might contain n\a values which I want to treat as -1 or something similar or by using a cell array and having an empty cell for those? Can you recommend me something for this type of problem?
You can use textscan:
n = 100; % number of integers between filename and logical values
M = textscan(str, ['%s' repmat('%d',1, n) '%s'], 'delimiter', ',',...
'TreatAsEmpty', 'n\a', 'EmptyValue', -1, 'CollectOutput', true);
The result M is a cell array with the file name in the first cell, the 100 integer values in the second, and a string containing the logical value in the last cell.
You can use strsplit and extract the values from your String and store them in separate variables
Code Sample:
a = strsplit("fileName.jpg,10,20,10,10,...,12,14,True",",")
fileName = a(1)
flag = a(end)
data = a(2:end-1)