Reading character by character from a string into an array - matlab

What is the usual way in MATLAB to read an integer into an array, digit by digit?
I'm trying to split a four digit integer, 1234 into an array [1 2 3 4].

Here is a very easy way to do it for a single integer
s = num2str(1234)
for t=length(s):-1:1
result(t) = str2num(s(t));
end
The most compact way however, would be:
'1234'-'0'
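The one-liner works because arithmetic on a char array operates on the character codes, and the codes for '0' through '9' are consecutive. A minimal sketch of the same idea starting from a number, assuming n is a non-negative integer (a minus sign or decimal point would produce meaningless entries); the variable names are only illustrative:

```matlab
n = 1234;                  % assumed to be a non-negative integer
s = num2str(n);            % '1234', a 1x4 char array
digits = s - '0';          % subtract the code of '0' from every character code
% digits is now the double array [1 2 3 4]
```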

Or try this
result = str2num(num2str(1234)')'

You can use arrayfun
arrayfun(@str2num, num2str(x))

Here is an elegant and efficient solution using a recursive function:
function d = int2dig(n)
if n >= 10
d = [int2dig(floor(n/10)),mod(n,10)];
else
d = n;
end
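For comparison, the same floor/mod idea can be written as a loop instead of a recursion. This is only a sketch, assuming n is a non-negative integer:

```matlab
n = 1234;                      % assumed to be a non-negative integer
d = mod(n, 10);                % start with the last digit
n = floor(n / 10);
while n > 0
    d = [mod(n, 10), d];       % prepend the next digit
    n = floor(n / 10);
end
% d is now [1 2 3 4]
```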

Related

Convert number in scientific notation to string in Matlab

I would like to convert a scientific number to a string in Matlab but using a rather specific format. I started with the following:
>> num2str(1E4, '%e')
ans =
'1.000000e+04'
Then played around with the formatstring to get rid of the digits after the decimal point in the first part
>> num2str(1E4, '%.0e')
ans =
'1e+04'
The thing is I want it exactly how I am expressing it in numbers, namely I want a string like this '1E4'. I could use strrep to get rid of that plus sign but I refuse to use it to get rid of the leading 0 on the +04 part since I have other instances of the variable which have things like +10. Is it feasible to reproduce the number as a string without resorting to some big complicated algorithm? Preferably using the format string?
Solution
According to the num2str documentation, you need to pass a format specifier that combines the E conversion with a precision field, as follows:
num2str(1E4,'%.E')
Result
ans = 1E+04
Read about sprintf. Let A be your number; to achieve what you want, you can use:
sprintf('%1.0e',A)
Here is a way to convert integers to scientific notation:
function out= scientific(num)
E = 0;
if mod(num,10) == 0
[f, n] = factor(num); % two-output form of factor is Octave syntax; MATLAB's factor returns only f
E=min(n(ismember(f,[2 5])));
end
out = sprintf('%dE%d',num/10^E,E);
end
>> scientific(134)
ans = 134E0
>> scientific(134000)
ans = 134E3
Another solution that accepts input as vector:
function out= scientific2(num)
E = sum(cumsum(num2str(num(:))-48,2,'reverse')==0,2);
out = num2str([num(:)./10.^E,E],'%dE%d\n');
end
You could use a combination of sprintf and regexprep.
my_format = @(x)regexprep(sprintf('%.E',x),'E\+0*','E');
Examples:
>> my_format(1E4)
ans =
1E4
>> my_format(2E12)
ans =
2E12
This is not ideal for all cases:
>> my_format(5) % Expect 5E0
ans =
5E
>> my_format(1E-4) % Expect 1E-4
ans =
1E-04
We can fix the first case with a token:
f2 = @(x)regexprep(sprintf('%.E',x),'E\+0*(\d)','E$1');
>> {f2(1E4), f2(1E20), f2(5)}
ans =
'1E4' '1E20' '5E0'
And we can fix the second case with tokens and a ? quantifier:
>> f3 = @(x)regexprep(sprintf('%.E',x),'E\+?(-?)0*(\d)','E$1$2');
>> {f3(1E4), f3(1E20), f3(5),f3(1E-1),f3(2E-12)}
ans =
'1E4' '1E20' '5E0' '1E-1' '2E-12'
To explain: sprintf('%.E',x) formats x in scientific notation with E, e.g. 1E+04; regexprep then looks for the pattern
'E\+?(-?)0*(\d)'
E The literal E
\+?(-?) Either a + or a -; if - then save to group $1
0* As many 0s as it can match, subject to...
(\d) At least one digit, saves digit to group $2
Finally, the matched text is replaced with E$1$2, that is the literal E, then group $1 (a minus sign if found E-, nothing if found E+) and the group $2 (a single digit).
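A few more inputs help to show what this pattern does and does not do. The sketch below uses the same regular expression as f3, but writes the precision out as %.0E (assumed here to behave like %.E in the answer); the handle name fmt is only illustrative:

```matlab
% Same pattern as f3, with the precision written explicitly as %.0E.
fmt = @(x) regexprep(sprintf('%.0E', x), 'E\+?(-?)0*(\d)', 'E$1$2');

a = fmt(1E4);      % '1E4'
b = fmt(2E-12);    % '2E-12'
c = fmt(-3E5);     % '-3E5', the sign of the mantissa is untouched
d = fmt(1234);     % '1E3', the mantissa is rounded to a single digit
```

Note the last case: with zero digits of precision, any input that is not a single digit times a power of ten loses information.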

Importing data with engineering notation into Matlab

I've got a .xls file and I want to import it into Matlab with the xlsread function. I get NaNs for numbers in engineering notation, e.g. 15.252 B or 1.25 M.
Any suggestions?
Update: I can use [num,txt,raw] = xlsread('...') and raw is exactly what I want, but how can I replace the Ms with (*10^6)?
First you could extract everything from excel in a cell array using
[~,~,raw] = xlsread('MyExcelFilename.xlsx')
Then you could write a simple function that returns a number from the string based on 'B', 'M' and so on. Here is such an example:
function mynumber = myfunc( mystring )
% get the numeric part
my_cell = regexp(mystring,'[0-9.]+','match');
mynumber = str2double(my_cell{1});
% get ending characters
my_cell = regexp(mystring,'[A-Za-z]+','match'); % [A-z] would also match [ \ ] ^ _ and `
mychars = my_cell{1};
% multiply the number based on char
switch mychars
case 'B'
mynumber = mynumber*1e9;
case 'M'
mynumber = mynumber*1e6;
otherwise
end
end
Of course there are other methods to split the numeric string from the rest, use what you want. For more info see the regexp documentation. Finally use cellfun to convert cell array to numeric array:
my_array = cellfun(@myfunc,raw);
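The raw output of xlsread usually mixes numeric cells with text cells, and some text cells may carry no suffix at all. A slightly more defensive variant of the function above could look like the sketch below. The name suffix2num is hypothetical, only the 'B' and 'M' suffixes from the question are handled, and the decision to return NaN for unparseable text is an assumption:

```matlab
function value = suffix2num(entry)
% SUFFIX2NUM  Hypothetical helper: turn '15.252 B' or '1.25 M' into a number.
% Numeric input is returned unchanged; text that does not parse gives NaN.
    if isnumeric(entry)
        value = entry;
        return
    end
    % token 1: the number, token 2: an optional run of letters after it
    tok = regexp(entry, '^\s*([-+]?[0-9.]+)\s*([A-Za-z]*)\s*$', 'tokens', 'once');
    if isempty(tok)
        value = NaN;
        return
    end
    value = str2double(tok{1});
    switch upper(tok{2})
        case 'B'
            value = value * 1e9;
        case 'M'
            value = value * 1e6;
        % any other suffix (or none) leaves the number unscaled
    end
end
```

It would be applied the same way, e.g. cellfun(@suffix2num, raw), provided every cell holds either a number or a char array.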
EDIT:
Matlab does not offer any built-in formatting of strings in engineering format.
Source: http://se.mathworks.com/matlabcentral/answers/892-engineering-notation-printed-into-files
In the source you will also find a function that should be helpful for you.

Replacing Several Different Character of a string

I have to write a function to replace the characters of a string with those letters.
A=U
T=A
G=C
C=G
Example:
Input: 'ATAGTACCGGTTA'
Therefore, the output should be:
'UAUCAUGGCCAAU'
I can replace only one character. However, I have no idea how to do several. I could replace several if the condition "G=C and C=G" was not there.
I use:
in='ATAGTACCGGTTA'
check=in=='A'
in(check)='U'
ans='UTUGTUCCGGTTU'
If I keep doing this, at some point G will be replaced by C and then all the Cs will be replaced by G. How can I stop this? Any help will be appreciated.
Just for fun, here's probably the absolute simplest way, via indexing:
key = 'UGCA';
[~, ~, idx] = unique(in);
out = key(idx'); % transpose idx since unique() returns a column vector
I do love indexing :D
Edit: As rightly pointed out, this is very optimised for the question as stated. Since [a, ~, idx] = unique(in); returns a and idx such that a(idx) == in, and by default a is sorted, we can just assume that a == 'ACGT' and pre-construct key to be the appropriate translation of indices into a.
If some characters from the known alphabet never appear in the input string, or if other unknown characters appear, then the indices don't match and the assumption breaks. In that case, we have to calculate the appropriate key explicitly - filling in the step that was optimised out above:
alph = 'ACGT';
trans = 'UGCA';
[key, ~, idx] = unique(in);
[~, alphidx, keyidx] = intersect(alph, key); % find which elements of alph
% appear at which points in key
key(keyidx) = trans(alphidx); % translate the elements of key that we can
out = key(idx');
The simplest way would be to use an intermediary letter. For instance:
in='ATAGTACCGGTTA'
in(in == 'A')='U'
in(in == 'T')='A'
in(in == 'C')='X'
in(in == 'G')='C'
in(in == 'X')='G'
This way you keep the 'C' and 'G' characters separate.
EDIT:
As others have mentioned, there are a few other things you could do to improve this approach (though personally I think Notlikethat's way is cleanest). For instance, if you use a second variable, you don't have to worry about keeping 'C' and 'G' separate:
in='ATAGTACCGGTTA'
out=in;
out(in == 'A')='U';
out(in == 'T')='A';
out(in == 'C')='G';
out(in == 'G')='C';
Alternatively, you could make your indices first, then index after:
in='ATAGTACCGGTTA'
inA=in=='A';
inT=in=='T';
inC=in=='C';
inG=in=='G';
in(inA)='U';
in(inT)='A';
in(inC)='G';
in(inG)='C';
Finally, my personal favourite for sheer idiocy:
out=char(in+floor((68-in).*(in<70)*7/4)*4-round(ceil((in-67)/4)*3.7));
(Seriously, that last one works)
You can perform multiple character translation with bsxfun.
Inputs:
in = 'ATAGTACCGGTTA';
pat = ['A','T','G','C'];
subst = ['U','A','C','G'];
out0 ='UAUCAUGGCCAAU';
Translate all characters simultaneously:
>> ii = (1:numel(pat))*bsxfun(@eq,in,pat.'); %' instead of repmat and .*
>> out = subst(ii)
out =
UAUCAUGGCCAAU
>> isequal(out,out0)
ans =
1
Say you only want to translate a subset of the characters, leaving part of the sequence intact, it is easily solved with logical indexing and a few extra lines:
% Leave the Gs and Cs in place
pat = ['A','T'];
subst = ['U','A'];
ii = (1:numel(pat))*bsxfun(@eq,in,pat.'); %' same
out = char(zeros(1,numel(in)));
nz = ii>0;
out(nz) = subst(ii(nz));
out(~nz) = in(~nz)
out =
UAUGAUCCGGAAU
The original Gs and Cs are unchanged; A became U, and T became A (T is gone).
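To see why the matrix product yields indices: bsxfun(@eq,in,pat.') builds a numel(pat)-by-numel(in) logical matrix with at most one true entry per column, and multiplying by 1:numel(pat) from the left picks out the row number of that entry. A small sketch with the full four-letter pattern from this answer; the intermediate name mask and the explicit double() conversion are additions for readability:

```matlab
in    = 'ATAGTACCGGTTA';
pat   = 'ATGC';
subst = 'UACG';

mask = bsxfun(@eq, in, pat.');          % 4x13 logical, mask(k,j) is true when in(j) == pat(k)
ii   = (1:numel(pat)) * double(mask);   % row number of the true entry per column, 0 if none
out  = subst(ii);                       % valid here because every character occurs in pat
% out is 'UAUCAUGGCCAAU'
```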
I would suggest using containers.Map:
m=containers.Map({'A','T','G','C'},{'U','A','C','G'})
mapfkt=@(input)(cell2mat(m.values(num2cell(input))))
Usage:
mapfkt('ATAGTACCGGTTA')
Here is another method that should be fairly efficient, general, and in the line of thought of your original attempt:
%Suppose this is your input
myString = 'abcdeabcde';
fromString = 'ace';
toString = 'xyz';
%Then it just takes this:
[idx, fromLocation] = ismember(myString,fromString)
myString(idx)=toString(fromLocation(idx))
If you know that all letters need to be replaced, the last line can be slightly simplified, as you won't need to use idx.
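Applied to the sequence from the question, the ismember approach might look like the sketch below. The variable names are illustrative, and writing into a copy (out) rather than overwriting the input is a choice made here for clarity:

```matlab
in      = 'ATAGTACCGGTTA';
fromStr = 'ATGC';                        % characters to replace
toStr   = 'UACG';                        % their replacements, in the same order

[found, loc] = ismember(in, fromStr);    % loc(j) is the position of in(j) in fromStr, 0 if absent
out = in;
out(found) = toStr(loc(found));          % all substitutions are applied in one step
% out is 'UAUCAUGGCCAAU'
```

Because every position is looked up in the original string before anything is written, the G/C swap cannot interfere with itself.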

Concatenate strings of digits in matlab

Suppose I have a series of strings such as:
a = '101010101010'
b = '010101'
c = '000101010'
Is there a way in Matlab to concatenate them and produce the binary number 101010101010010101000101010?
Use the concatenation operator [ ], with horizontal concatenation , (vertical concatenation ; will fail here unless you reshape() into column vectors):
[a,b,c]
However, I suggest storing your variables in a cell array:
s = {'101010101010','010101', '000101010'};
[s{:}]
or
cat(2,s{:})
To concatenate strings, you could say:
out = [a b c];
Alternatively:
out = strcat(a,b,c);
Yet another way:
out = sprintf('%s', a,b,c);
I think that this should work:
res = [a,b,c]
or alternatively call
res = strcat(a,b,c)
or, yet
res = cat(2,a,b,c)
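All of these return a char array, not a number. If a numeric value is actually wanted, bin2dec can convert the result, assuming the string is short enough for a double to represent exactly (roughly 52 bits; the 27 bits here are well within that). A minimal sketch:

```matlab
a = '101010101010';
b = '010101';
c = '000101010';

bits  = [a, b, c];          % 1x27 char array
value = bin2dec(bits);      % numeric value of the 27-bit string
back  = dec2bin(value);     % round trip; equals bits because bits starts with '1'
```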

How can I filter my array of numbers in Matlab/Octave?

I have a very trivial example where I'm trying to filter by matching a String:
A = [0:1:999];
B = A(int2str(A) == '999');
This
A(A > 990);
works
This
int2str(5) == '5'
also works
I just can't figure out why I cannot put the two together. I get an error about nonconformant arguments.
int2str(A) produces a very long char array (of size 1 x 4996) containing the string representations of all those numbers (including spacing) appended together end to end.
int2str(A) == '999'
So, in the statement above, you're trying to compare a matrix of size 1 x 4996 with another of size 1 x 3. This, of course, fails as the two either need to be of the same size, or at least one needs to be a scalar, in which case scalar expansion rules apply.
A(A > 990);
The above works because of logical indexing rules, the result will be the elements from the indices of A for which that condition holds true.
int2str(5) == '5'
This only works because the result of the int2str call is a 1 x 1 matrix ('5') and you're comparing it to another matrix of the same size. Try int2str(555) == '55' and it'll fail with the same error as above.
I'm not sure what result you expected from the original statements, but maybe you're looking for this:
A = [0:1:999];
B = int2str(A(A == 999)) % outputs '999'
I am not sure that the int2str() conversion is what you are looking for. (Also, why do you need to convert numbers to strings and then carry out a char comparison?)
Suppose you have a simpler case:
A = 1:3;
strA = int2str(A)
strA =
1  2  3
Note that this is a 1x7 char array. Thus, comparing it against a scalar char:
strA == '2'
ans =
0 0 0 1 0 0 0
Now, you might wanna transpose A and carry out the comparison:
int2str(A')=='2'
ans =
0
1
0
however, this approach will not work if the number of digits of each number is not the same because lower numbers will be padded with spaces (try creating A = 1:10 and comparing against '2').
Instead, create a cell array of strings without whitespace and use strcmp():
csA = arrayfun(@int2str,A','un',0)
csA =
'1'
'2'
'3'
strcmp('2',csA)
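Put together for the array in the question, a string-based filter could be sketched as follows. The name labels is illustrative, and the long form 'UniformOutput' is used in place of the abbreviated 'un':

```matlab
A = 0:999;
labels = arrayfun(@int2str, A, 'UniformOutput', false);   % cell array {'0','1',...,'999'}
B = A(strcmp(labels, '999'));                              % logical indexing with the match mask
% B is 999
```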
It should be much faster, and more robust, to turn the string into a number than the other way around. Try
B = A(A == str2double ('999'));