I have a 1x6 cell array like this:
A = {'25_2.mat','25_3.mat','25_4.mat','25_5.mat','25_6.mat','25_7.mat'};
I want to read the number after the '_' in each element, e.g. 2 from A{1} in my example.
Using cellfun, strfind and str2double:
out = cellfun(@(x) str2double(x(strfind(x,'_')+1:strfind(x,'.')-1)),A)
How does it work?
This code finds the index one position after the '_' character; call it start_index. It then finds the index one position before the '.' character; call it end_index. It extracts all the characters from start_index to end_index and finally converts them to a number using str2double. This assumes each name contains exactly one '_' and one '.'.
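To make the indexing concrete, here is the same extraction unpacked for a single name. This is only a sketch: start_index and end_index are the names used in the explanation, and the name '2545_23.mat' is an illustrative input.

```matlab
x = '2545_23.mat';                  % illustrative file name

start_index = strfind(x, '_') + 1;  % first character after the underscore
end_index   = strfind(x, '.') - 1;  % last character before the dot

% Slice out the digits and convert them to a number
value = str2double(x(start_index:end_index));
```

Here start_index is 6 and end_index is 7, so the slice is '23'.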
Sample Input:
A = {'2545_23.mat','2_3.mat','250_4.mat','25_51.mat','25_6.mat','25_7.mat'};
Output:
>> out
out =
23 3 4 51 6 7
You can access the contents of a cell by using curly braces {...}. Once you have access to the contents, you can index into the string as you would with a normal array. For example:
test = {'25_2.mat', '25_3.mat', '25_4.mat', '25_5.mat', '25_6.mat', '25_7.mat'}
character = test{1}(4);
If your string length is variable, you can use strfind to find the index of the character you want.
Assuming the numbers are non-negative integers after the _ sign: use a regular expression with lookbehind, and then convert from string to number:
numbers = cellfun(@(x) str2num(x{1}), regexp(A, '(?<=\_)\d+', 'match'));
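A possible variant of the same idea, sketched under the same assumption (digits directly follow the underscore): with the 'once' option, regexp returns one char vector per cell, and str2double accepts a cell array of char vectors directly, so neither cellfun nor str2num is needed.

```matlab
A = {'25_2.mat','25_3.mat','25_4.mat','25_5.mat','25_6.mat','25_7.mat'};

% 'once' keeps only the first match in each name, returned as a char vector
digits_after_underscore = regexp(A, '(?<=_)\d+', 'match', 'once');

% str2double converts a cell array of char vectors element by element
numbers = str2double(digits_after_underscore);
```

A name with no match would produce an empty char vector, which str2double turns into NaN.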
I have a text file with 1 million decimal digits of the number "e", with 80 digits on each line except the first and the last lines, which have 76 and 4 digits; the file has 12501 lines. I want to convert it into a vector in MATLAB with one digit in each row. I tried the num2str function, but the problem is that the value gets converted to something like '7.1828e79' (13 characters). What can I do?
P.S.1: The first two lines of the text file (76 and 80 digits) are:
7182818284590452353602874713526624977572470936999595749669676277240766303535 47594571382178525166427427466391932003059921817413596629043572900334295260595630
P.S.2: I used "dlmread" and got a 12501x1 vector, with the first and second row of 7.18281828459045e+75 and 4.75945713821785e+79 and the problem is that when I use num2str for example for the first row value, I get: '7.182818284590453e+75' as a string and not the whole 76 digits. My aim was to do something like this:
e1=dlmread('e.txt');
es1=num2str(e1);
for i=1:12501
for j=1:length(es1(1,:))
a1((i-1)*length(es1(1,:))+j)=es1(i,j);
end
end
e_digits=a1.';
but I get a string like this:
a1='7.182818284590453e+754.759457138217852e+797.381323286279435e+799.244761460668082e+796.133138458300076e+791.416928368190255e+79 5...'
with 262521 characters instead of 1 million digits.
P.S.3: I think the problem might be solved if I can manipulate the text file in a way that I have one digit on each line and simply use dlmread.
Well, this is not hard, there are many ways to do it.
First, load the file as a char array (a char array makes it easy to strip the line breaks):
C = fileread('yourfile.txt'); % loads the file as a char array
D = C(~isspace(C)); % removes whitespace, including the line breaks
Next, insert a space after each character (str2num needs the separators so that it treats every digit as its own number). You can do this with reshape, strtrim or simply a regular expression:
E = strtrim(regexprep(D, '.{1}', '$0 ')); % adds a space after each digit
Now convert it into a vector using str2num:
V = str2num(E).'; % converts the char array to a column vector
which will give you a single digit each row.
Using your example, I get a vector of 156 x 1, with 1 digit each row as you required.
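As an alternative sketch, the digits can also be recovered with plain character arithmetic: subtracting '0' from a digit character gives its numeric value, so no string-to-number parsing is needed. The short string C below is a stand-in for the result of fileread, and e_digits is an illustrative name.

```matlab
% Stand-in for C = fileread('e.txt'): two short lines of digits
C = sprintf('71828\n18284\n');

D = C(~isspace(C));      % drop the line breaks
e_digits = D(:) - '0';   % column vector of doubles, one digit per row
```

For the stand-in input this gives a 10x1 vector starting 7, 1, 8, 2, 8.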
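You can get a digit per row like this:
fid = fopen('e.txt','r');
c = textscan(fid,'%s'); % read each line as one char vector
fclose(fid);
c = cat(1,c{:});
c = cellfun(@(x) str2num(reshape(x,[],1)),c,'UniformOutput',false); % one digit per row
c = cat(1,c{:}); % stack everything into a single column vector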
And it is not the only possible way.
Could you please tell what is the final task, how do you plan using the array of e digits?
What is the difference between string and character class in MATLAB?
a = 'AX'; % This is a character.
b = string(a) % This is a string.
The documentation suggests:
There are two ways to represent text in MATLAB®. You can store text in character arrays. A typical use is to store short pieces of text as character vectors. And starting in Release 2016b, you can also store multiple pieces of text in string arrays. String arrays provide a set of functions for working with text as data.
This is how the two representations differ:
Element access. To represent char vectors of different length, one had to use cell arrays, e.g. ch = {'a', 'ab', 'abc'}. With strings, they can be created in actual arrays: str = [string('a'), string('ab'), string('abc')].
However, to index characters in a string array directly, the curly bracket notation has to be used:
str{3}(2) % == 'b'
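A short sketch of the access patterns side by side may help here. The variable names are illustrative; the point is that numel counts the pieces of text in both containers, while strlength reports the number of characters in each string.

```matlab
ch  = {'a', 'ab', 'abc'};                          % cell array of char vectors
str = [string('a'), string('ab'), string('abc')];  % 1x3 string array

n_ch  = numel(ch);       % 3 cells
n_str = numel(str);      % 3 strings, not the number of characters
lens  = strlength(str);  % characters per string: [1 2 3]

second_char = str{3}(2); % curly braces give the underlying char vector: 'b'
as_char     = char(str(3)); % explicit conversion back to a char vector: 'abc'
```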
Memory use. Chars use exactly two bytes per character. strings have overhead:
a = 'abc'
b = string('abc')
whos a b
returns
  Name      Size      Bytes  Class     Attributes
  a         1x3           6  char
  b         1x1         132  string
The best place to start for understanding the difference is the documentation. The key difference, as stated there:
A character array is a sequence of characters, just as a numeric array is a sequence of numbers. A typical use is to store short pieces of text as character vectors, such as c = 'Hello World';.
A string array is a container for pieces of text. String arrays provide a set of functions for working with text as data. To convert text to string arrays, use the string function.
Here are a few more key points about their differences:
They are different classes (i.e. types): char versus string. As such they will have different sets of methods defined for each. Think about what sort of operations you want to do on your text, then choose the one that best supports those.
Since a string is a container class, be mindful of how its size differs from an equivalent character array representation. Using your example:
>> a = 'AX'; % This is a character.
>> b = string(a) % This is a string.
>> whos
  Name      Size      Bytes  Class     Attributes
  a         1x2           4  char
  b         1x1         134  string
Notice that the string container lists its size as 1x1 (and takes up more bytes in memory) while the character array is, as its name implies, a 1x2 array of characters.
They can't always be used interchangeably, and you may need to convert between the two for certain operations. For example, string objects can't be used as dynamic field names for structure indexing:
>> s = struct('a', 1);
>> name = string('a');
>> s.(name)
Argument to dynamic structure reference must evaluate to a valid field name.
>> s.(char(name))
ans =
1
Strings do have a bit of overhead, but their size still grows by 2 bytes per character on average. The size of the variable increases in steps, once every 8 characters. In the plot, the red line is y=2x+127.
The figure is created using:
v=[];N=100;
for ct = 1:N
s=char(randi([0 255],[1,ct]));
s=string(s);
a=whos('s');v(ct)=a.bytes;
end
figure(1);clf
plot(v)
xlabel('# characters')
ylabel('# bytes')
p=polyfit(1:N,v,1);
hold on
plot([0,N],[127,2*N+127],'r')
hold off
One important practical thing to note is that strings and chars behave differently when used inside square brackets. This can be especially confusing when coming from Python. Consider the following example:
>> ['asdf' '123']
ans =
'asdf123'
>> ["asdf" "123"]
ans =
1×2 string array
"asdf" "123"
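If the goal is to join strings into one piece of text rather than build an array, the + operator or join can be used. A minimal sketch, with illustrative variable names:

```matlab
c = ['asdf' '123'];      % chars: brackets concatenate into one 1x7 char vector
s = ["asdf" "123"];      % strings: brackets build a 1x2 string array

joined_plus = s(1) + s(2);   % + appends strings: "asdf123"
joined_join = join(s, "");   % join the elements with an empty delimiter
```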
I would like to separate each element in the matrix below with a comma.
1 2 3
4 5 6
7 8 9
Here's my attempt:
s= sprintf('%.17g,',matrix)
Output: 1,2,3,4,5,6,7,8,9,
Desired output:
1, 2, 3
4, 5, 6
7, 8, 9
Thanks in advance for your suggestions.
You just need to specify the formatting of the entire first line:
s = sprintf('%.17g, %.17g, %.17g\n',matrix.')
MATLAB keeps re-using the formatting string as long as there are elements left in matrix.
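A small sketch of that recycling behaviour, which also shows why the transpose matters: sprintf consumes the elements in column-major order, so the matrix has to be transposed for the values to come out row by row. The matrix M is an illustrative input.

```matlab
M = [1 2 3; 4 5 6];

% The single '%d,' specifier is reused for every element, column by column
by_column = sprintf('%d,', M);     % '1,4,2,5,3,6,'

% Transposing first makes sprintf walk along the rows of M
by_row = sprintf('%d,', M.');      % '1,2,3,4,5,6,'
```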
To generalize this process, use the following expression:
s = sprintf([strjoin(repmat({'%.17g'},1,size(matrix,2)), ', ') '\n'], matrix.')
So there's a lot going on in this one line - let's unpack it from inside out:
repmat({'%.17g'},1,size(matrix,2))
This sub-expression takes a single cell array of size 1x1, containing the string %.17g, and duplicates it into a cell array with dimensions specified by the next two arguments. We want to construct a cell array with a single row (hence the argument 1) representing all the format specifiers (%...) we need. Since we want one instance of %.17g for each column, we use size(matrix,2) as the last argument to repmat, since that returns the number of columns of the matrix.
As an example, if you have 5 columns, you get this:
>> repmat({'%.17g'},1,5)
ans =
'%.17g' '%.17g' '%.17g' '%.17g' '%.17g'
Next, since you want columns delimited by commas and spaces, you can use strjoin():
>> strjoin(repmat({'%.17g'},1,5), ', ')
ans =
%.17g, %.17g, %.17g, %.17g, %.17g
Note the use of a comma followed by a space as the second argument (the delimiting string) to strjoin(). Adjust the number of spaces according to your display needs. We need one more thing to be able to print a multi-line matrix: a newline at the end of each row. To add it, we use the fact that two strings in square brackets [] are concatenated by MATLAB:
[strjoin(repmat({'%.17g'},1,size(matrix,2)), ', ') '\n']
This produces the final formatting string that we need. All that is left, is to add the sprintf and pass in the matrix argument. As Rijul Sudhir pointed out, you do have to transpose your matrix because MATLAB will walk down a column to pair the matrix elements with the format specifiers.
EDIT: Stewie Griffin was correct about the transpose operation (.') - code has been corrected.
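Putting the pieces together, here is a sketch of the generalized expression applied to the 3x3 matrix from the question. The intermediate variable fmt is only there for readability.

```matlab
matrix = [1 2 3; 4 5 6; 7 8 9];

% One '%.17g' per column, joined by ', ', followed by a newline escape
fmt = [strjoin(repmat({'%.17g'}, 1, size(matrix, 2)), ', ') '\n'];

% Transpose so that sprintf consumes the values row by row
s = sprintf(fmt, matrix.');
```

Here fmt is '%.17g, %.17g, %.17g\n', and s contains the three rows "1, 2, 3", "4, 5, 6" and "7, 8, 9", each terminated by a newline.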
I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formatting is more complex, you may want to use a regular expression. There is a similar question, "matlab - extracting numbers from (odd) string". Modifying the code found there, you can do the following:
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str
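Since the file names are stored in a 1002x1 array, the same idea can be applied to all of them at once. The following is a sketch that assumes the names are held in a cell array of char vectors and that each one starts with digits, then a single letter, then the two-digit number; the three sample names are the ones from the question.

```matlab
names = {'001a02'; '001A43a'; '002A12'};   % sample names from the question

% Capture the two digits that follow the first letter in each name
tokens = regexp(names, '^\d+[a-zA-Z](\d{2})', 'tokens', 'once');

% Each cell of tokens holds a 1x1 cell with the captured text
nums = cellfun(@(t) str2double(t{1}), tokens);
```

For the three sample names, nums is the column vector [2; 43; 12].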
Hello I have these two vectors
Q = [1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4]
and
Year = [2000,2000,2000,2000,2001,2001,2001,2001,2002.....]
and I would like to concatenate them into one single array Time
Time = [20001,20002,20003,20004,20010....]
Or
Time= {'2000Q1', '2000Q2', '2000Q3', '2000Q4', '2001Q1'....}
So far I tried with this code
m = zeros(136,1)
for i=1:136
m(i,1)= strcat(Q(i),Year(i));
end
And MATLAB gave me this error:
Subscripted assignment dimension mismatch.
Help pls ?
If your vectors Year and Q have the same number of elements, you do not need a loop. Just transpose them (or otherwise make sure they are column vectors), then concatenate with the [] operator:
Time = [ num2str(Year.') num2str(Q.') ] ;
will give you:
20001
20002
20003
20004
20011
...
And if you want the 'Q' character, insert it in the expression:
Time = [ num2str(Year.') repmat('Q',length(Q),1) num2str(Q.') ]
Will give you:
2000Q1
2000Q2
2000Q3
2000Q4
2001Q1
...
This will be a char array, if you want a cell array, use cellstr on the same expression:
time = cellstr( [num2str(Year.') repmat('Q',length(Q),1) num2str(Q.')] ) ;
To obtain strings:
strtrim(mat2cell(num2str([Year(:) Q(:) ],'%i%i'), ones(1,numel(Q))));
Explanation:
Concat both numeric vectors as two columns (using [...])
Convert to char array, where each row is the concatenation of two numbers (using num2str with sprintf-like format specifiers). It is assumed that all numbers are integers (if not, change the format specifiers). This may introduce unwanted spaces if not all the concatenated numbers have the same number of digits.
Convert to a cell array, putting each row in a different cell (using mat2cell).
Remove whitespaces in each cell (using strtrim)
To obtain numbers: apply str2double to the above:
str2double(strtrim(mat2cell(num2str([Year(:) Q(:) ],'%i%i'), ones(1,numel(Q)))));
Or compute directly (the multiplier is 10 raised to the number of digits of the largest Q):
10.^(floor(log10(max(Q)))+1)*Year + Q;
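A worked sketch of the direct computation, assuming Q and Year hold positive integers. The helper variable n_digits is illustrative: it is the number of decimal digits of the largest Q, which decides how far Year has to be shifted to the left.

```matlab
Q    = [1 2 3 4 1 2 3 4];
Year = [2000 2000 2000 2000 2001 2001 2001 2001];

% Number of decimal digits in the largest quarter value (1 for Q up to 9)
n_digits = floor(log10(max(Q))) + 1;

% Shift Year left by that many digits, then add Q
Time = Year * 10^n_digits + Q;
```

With these inputs, Time is [20001 20002 20003 20004 20011 20012 20013 20014].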
You can use arrayfun.
If you want your output in string format (with a 'Q' in the middle), use sprintf to format each element:
Time = arrayfun( @(y,q) sprintf('%dQ%d', y, q ), Year, Q, 'uni', 0 );
This results in a cell array:
Time =
'2000Q1' '2000Q2' '2000Q3' '2000Q4' '2001Q1' '2001Q2' '2001Q3'...
Alternatively, if you skip the 'Q', you can store each number in a numeric array:
Time = arrayfun( @(y,q) y*10+q, Year, Q )
This results in a regular array:
Time =
20001 20002 20003 20004 20011 20012 20013 ...
That's because you are initializing m to zeros(136,1) and then trying to store a whole string in a single element, and a single double cannot hold a string.
Here are 2 options; I favor the first one.
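1.- You can just use cell arrays, building each string with sprintf (strcat does not convert numbers to text), so your code becomes:
m = cell(136,1);
for ii=1:136
m{ii} = sprintf('%dQ%d', Year(ii), Q(ii)); % e.g. '2000Q1'
end
and then m will be: m{1}='2000Q1';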
2.- Or, if you know that your strings will ALWAYS be the same size (in your case it looks like they are always 6 characters long), you can:
strsize = 6; % length of every string, e.g. '2000Q1'
m = zeros(136,strsize);
for ii=1:136
m(ii,:) = sprintf('%dQ%d', Year(ii), Q(ii)); % stored as character codes
end
and then m will be: m(1,:)= [ 50 48 48 48 81 49 ], which translated from ASCII codes is 2000Q1
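The relationship between the character codes and the text can be checked directly with double and char. A minimal sketch, with illustrative variable names:

```matlab
codes = double('2000Q1');   % character codes of the text
text  = char(codes);        % back from codes to a char vector
```

Here codes is [50 48 48 48 81 49] and text is '2000Q1' again, so a numeric matrix filled this way can be turned back into text with char(m).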