I would like to convert a scientific number to a string in Matlab but using a rather specific format. I started with the following:
>> num2str(1E4, '%e')
ans =
'1.000000e+04'
Then played around with the formatstring to get rid of the digits after the decimal point in the first part
>> num2str(1E4, '%.0e')
ans =
'1e+04'
The thing is I want it exactly how I am expressing it in numbers, namely I want a string like this '1E4'. I could use strrep to get rid of that plus sign but I refuse to use it to get rid of the leading 0 on the +04 part since I have other instances of the variable which have things like +10. It it feasible to reproduce the number as a string without resorting to some big complicated algorithm? Preferably using the formatstring?
Solution
According to num2str documentation, you need to use a format parameter of and precision parameter as follows:
num2str(1E4,'%.E')
Result
ans = 1E+04
Read about sprintf . LEt A be your number, to achieve what you want, you can use:
sprintf('%1.0e',A)
Here is a way to convert integers to scientific notation:
function out= scientific(num)
E = 0;
if mod(num,10) == 0
[f n]=factor(num);
E=min(n(ismember(f,[2 5])));
end
out = sprintf('%dE%d',num/10^E,E);
end
>> scientific(134)
ans = 134E0
>> scientific(134000)
ans = 134E3
Another solution that accepts input as vector:
function out= scientific2(num)
E = sum(cumsum(num2str(num(:))-48,2,'reverse')==0,2);
out = num2str([num(:)./10.^E,E],'%dE%d\n');
end
You could use a combination of sprintf and regexprep.
my_format = #(x)regexprep(sprintf('%.E',x),'E\+0*','E');
Examples:
>> my_format(1E4)
ans =
1E4
>> my_format(2E12)
ans =
2E12
This is not ideal for all cases:
>> my_format(5) % Expect 5E0
ans =
5E
>> my_format(1E-4) % Expect 1E-4
ans =
1E-04
We can fix the first case with a token:
f2 = #(x)regexprep(sprintf('%.E',x),'E\+0*(\d)','E$1');
>> {f2(1E4), f2(1E20), f2(5)}
ans =
'1E4' '1E20' '5E0'
And we can fix the second case with tokens and a ? quantifier:
>> f3 = #(x)regexprep(sprintf('%.E',x),'E\+?(-?)0*(\d)','E$1$2');
>> {f3(1E4), f3(1E20), f3(5),f3(1E-1),f3(2E-12)}
ans =
'1E4' '1E20' '5E0' '1E-1' '2E-12'
To explain, sprintf('%.E',x) formats x in scientific notation with E, e.g. 1E+04, then it finds
'E\+?(-?)0*(\d)'
E The literal E
\+?(-?) Either a + or a -; if - then save to group $1
0* As many 0s as it can match, subject to...
(\d) At least one digit, saves digit to group $2
Finally, the matched text is replaced with E$1$2, that is the literal E, then group $1 (a minus sign if found E-, nothing if found E+) and the group $2 (a single digit).
Related
Say I have an array that contains the following elements:
1.0e+14 *
1.3325 1.6485 2.0402 1.0485 1.2027 2.0615 1.7432 1.9709 1.4807 0.9012
Now, is there a way to grab 1.0e+14 * (base and exponent) individually?
If I do arr(10), then this will return 9.0120e+13 instead of 0.9012e+14.
Assuming the question is to grab any elements in the array with coefficient less than one. Is there a way to obtain 1.0e+14, so that I could just do arr(i) < 1.0e+14?
I assume you want string output.
Let a denote the input numeric array. You can do it this way, if you don't mind using evalc (a variant of eval, which is considered bad practice):
s = evalc('disp(a)');
s = regexp(s, '[\de+-\.]+', 'match');
This produces a cell array with the desired strings.
Example:
>> a = [1.2e-5 3.4e-6]
a =
1.0e-04 *
0.1200 0.0340
>> s = evalc('disp(a)');
>> s = regexp(s, '[\de+-\.]+', 'match')
s =
'1.0e-04' '0.1200' '0.0340'
Here is the original answer from Alain.
Basic math can tell you that:
floor(log10(N))
The log base 10 of a number tells you approximately how many digits before the decimal are in that number.
For instance, 99987123459823754 is 9.998E+016
log10(99987123459823754) is 16.9999441, the floor of which is 16 - which can basically tell you "the exponent in scientific notation is 16, very close to being 17".
Now you have the exponent of the scientific notation. This should allow you to get to whatever your goal is ;-).
And depending on what you want to do with your exponent and the number, you could also define your own method. An example is described in this thread.
Can I convert a symbol that is a product of products into an array of products?
I tried to do something like this:
syms A B C D;
D = A*B*C;
factor(D);
but it doesn't factor it out (mostly because that isn't what factor is designed to do).
ans =
A*B*C
I need it to work if A B or C is replaced with any arbitrarily complicated parenthesized function, and it would be nice to do it without knowing what variables are in the function.
For example (all variables are symbolic):
D = x*(x-1)*(cos(z) + n);
factoring_function(D);
should be:
[x, x-1, (cos(z) + n)]
It seems like a string parsing problem, but I'm not confident that I can convert back to symbolic variables afterwards (also, string parsing in matlab sounds really tedious).
Thank you!
Use regexp on the string to split based on *:
>> str = 'x*(x-1)*(cos(z) + n)';
>> factors_str = regexp(str, '\*', 'split')
factors_str =
'x' '(x-1)' '(cos(z) + n)'
The result factor_str is a cell array of strings. To convert to a cell array of sym objects, use
N = numel(factors_str);
factors = cell(1,N); %// each cell will hold a sym factor
for n = 1:N
factors{n} = sym(factors_str{n});
end
I ended up writing the code to do this in python using sympy. I think I'm going to port the matlab code over to python because it is a more preferred language for me. I'm not claiming this is fast, but it serves my purposes.
# Factors a sum of products function that is first order with respect to all symbolic variables
# into a reduced form using products of sums whenever possible.
# #params orig_exp A symbolic expression to be simplified
# #params depth Used to control indenting for printing
# #params verbose Whether to print or not
def factored(orig_exp, depth = 0, verbose = False):
# Prevents sympy from doing any additional factoring
exp = expand(orig_exp)
if verbose: tabs = '\t'*depth
terms = []
# Break up the added terms
while(exp != 0):
my_atoms = symvar(exp)
if verbose:
print tabs,"The expression is",exp
print tabs,my_atoms, len(my_atoms)
# There is nothing to sort out, only one term left
if len(my_atoms) <= 1:
terms.append((exp, 1))
break
(c,v) = collect_terms(exp, my_atoms[0])
# Makes sure it doesn't factor anything extra out
exp = expand(c[1])
if verbose:
print tabs, "Collecting", my_atoms[0], "terms."
print tabs,'Seperated terms with ',v[0], ', (',c[0],')'
# Factor the leftovers and recombine
c[0] = factored(c[0], depth + 1)
terms.append((v[0], c[0]))
# Combines trivial terms whenever possible
i=0
def termParser(thing): return str(thing[1])
terms = sorted(terms, key = termParser)
while i<len(terms)-1:
if equals(terms[i][1], terms[i+1][1]):
terms[i] = (terms[i][0]+terms[i+1][0], terms[i][1])
del terms[i+1]
else:
i += 1
recombine = sum([terms[i][0]*terms[i][1] for i in range(len(terms))])
return simplify(recombine, ratio = 1)
I have to write a function to replace the characters of a string with those letters.
A=U
T=A
G=C
C=G
Example:
Input: 'ATAGTACCGGTTA'
Therefore, the output should be:
'UAUCAUGGCCAAU'
I can replace only one character. However, I have no how to do several. I could replace several if '"G=C and C=G" this condition was not there.
I use:
in='ATAGTACCGGTTA'
check=in=='A'
in(check)='U'
ans='UTUGTUCCGGTTU'
if I keep doing this at some point G will be replaced by C then then all the C will be replaced by G. How can I stop this?? Any help will be appreciated.
Just for fun, here's probably the absolute simplest way, via indexing:
key = 'UGCA';
[~, ~, idx] = unique(in);
out = key(idx'); % transpose idx since unique() returns a column vector
I do love indexing :D
Edit: As rightly pointed out, this is very optimised for the question as stated. Since [a, ~, idx] = unique(in); returns a and idx such that a(idx) == in, and by default a is sorted, we can just assume that a == 'ACGT' and pre-construct key to be the appropriate translation of indices into a.
If some characters from the known alphabet never appear in the input string, or if other unknown characters appear, then the indices don't match and the assumption breaks. In that case, we have to calculate the appropriate key explicitly - filling in the step that was optimised out above:
alph = 'ACGT';
trans = 'UGCA';
[key, ~, idx] = unique(in);
[~, alphidx, keyidx] = intersect(alph, key); % find which elements of alph
% appear at which points in key
key(keyidx) = trans(alphidx); % translate the elements of key that we can
out = key(idx');
The simplest way would be to use an intermediary letter. For instance:
in='ATAGTACCGGTTA'
in(in == 'A')='U'
in(in == 'T')='A'
in(in == 'C')='X'
in(in == 'G')='C'
in(in == 'X')='G'
This way you keep the 'C' and 'G' characters separate.
EDIT:
As others have mentioned, there are a few things other things you could do to improve this approach (though personally I think Notlikethat's way is cleanest). For instance, if you use a second variable, you don't have to worry about keeping 'C' and 'G' separate:
in='ATAGTACCGGTTA'
out=in;
out(in == 'A')='U';
out(in == 'T')='A';
out(in == 'C')='G';
out(in == 'G')='C';
Alternatively, you could make your indices first, then index after:
in='ATAGTACCGGTTA'
inA=in=='A';
inT=in=='T';
inC=in=='C';
inG=in=='G';
in(inA)='U';
in(inT)='A';
in(inC)='G';
in(inG)='C';
Finally, my personal favourite for sheer idiocy:
out=char(in+floor((68-in).*(in<70)*7/4)*4-round(ceil((in-67)/4)*3.7));
(Seriously, that last one works)
You can perform multiple character translation with bsxfun.
Inputs:
in = 'ATAGTACCGGTTA';
pat = ['A','T','G','C'];
subst = ['U','A','C','G'];
out0 ='UAUCAUGGCCAAU';
Translate all characters simultaneously:
>> ii = (1:numel(pat))*bsxfun(#eq,in,pat.'); %' instead of repmat and .*
>> out = subst(ii)
out =
UAUCAUGGCCAAU
>> isequal(out,out0)
ans =
1
Say you only want to translate a subset of the characters, leaving part of the sequence intact, it is easily solved with logical indexing and a few extra lines:
% Leave the Gs and Cs in place
pat = ['A','T'];
subst = ['U','A'];
ii = (1:numel(pat))*bsxfun(#eq,in,pat.'); %' same
out = char(zeros(1,numel(in)));
nz = ii>0;
out(nz) = subst(ii(nz));
out(~nz) = in(~nz)
out =
UAUGAUCCGGAAU
The original Gs and Cs are unchanged; A became U, and T became A (T is gone).
I would suggest to use containter.Map:
m=containers.Map({'A','T','G','C'},{'U','A','C','G'})
mapfkt=#(input)(cell2mat(m.values(num2cell(input))))
Usage:
mapfkt('ATAGTACCGGTTA')
Here is another method that should be fairly efficient, general, and in the line of thought of your original attempt:
%Suppose this is your input
myString = 'abcdeabcde';
fromSting = 'ace';
toString = 'xyz';
%Then it just takes this:
[idx fromLocation] = ismember(myString,fromSting)
myString(idx)=toString(fromLocation(idx))
If you know that all letters need to be replaced, the last line can be slightly simplified as you wont need to use idx.
Say I typed x = 'BODD' into the command prompt of MATLAB and then said x(1) it would return B. What I want is x(1) to return the empty String ('') or nothing etc. and x(2) to return B and so forth up until x(5) returning the final D?
I assume that you mean that you really do want the empty zero-length string, ''. There have been some Answers to this question that assume that you meant that you wanted the one-character string that contains a space, ASCII value 32.
If that's the case, I'm afraid you can't to that - MATLAB arrays (including character arrays, which is all that a MATLAB "string" is) don't work that way. There are two ways to look at it...
You asked for x(1). Now, the indexing expression that you used, 1, has size 1x1. Therefore, you are guaranteed to get either a 1x1 return value, OR an error. That means that there's no way to get a 0x1 or 0x0 (the true "empty string"). This is similar to the way that, if you had asked for x(2:4), you would be guaranteed to get a 1x3 array of characters back. In that case, 2:4 is a 1x3 array.
There's no way to "meaningfully" prepend a zero-length string to the beginning of another string. If a = 'WXYZ';, then running b = ['' a] just returns 'WXYZ' back. It didn't somehow stick a magical placeholder for an empty string at the front of the original string.
You can't concatenate '' at end or beginning
However, you can have blank/space, like this :-
>> x=BODD;
>> x=[' ' x]; % Use normal matrix concatenation
>> x(1)
ans =
>> x(2)
ans =
B
Try following concatenation
x = [' ' x];
If you want the string itself to still be 'BODD', you could try writing a custom function:
function [char] = emptyConcat(string, index)
if (index == 1)
char = '';
else
char = string(index - 1);
I'm looking for a completely general way to convert any value to a string in MATLAB.
Basically, I want to be able to write something like
x = disp(y);
The above fails with the error Too many output arguments. (I was not able to find the source code for disp.)
Is there a single MATLAB function for converting any value into a string?
(Note that this function should behave like the identity when passed a string.)
Basically I'm looking for MATLAB's equivalent of Python's str. I thought it might be char, but (for example) char(Inf) fails to produce anything like the string 'Inf'. (Note: that was just an example. It does not begin to cover all the possibilities.)
pm89's answer has the right idea, but doesn't work because evalc requires a string as input. I suggest making your own function like so:
function str = anything2string(thing)
str = evalc('disp(thing)');
It works for anything that Matlab can display:
>> anything2string(3)
ans =
3
>> anything2string(Inf)
ans =
Inf
>> anything2string('hi')
ans =
hi
>> anything2string(1:4)
ans =
1 2 3 4
It's not quite the same as Python's str, but num2str works with Inf and handles strings as input.
num2str(Inf)
ans = Inf
num2str('some string')
ans = some string
You could get the exact same string as you see in your command window using evalc (evaluate and capture the result):
x = evalc('disp(y)'); % y could be anything displayable by Matlab!