Text formatting - write out words following XY - matlab

I am fairly new to MATLAB, and for my homework I have a block of text from which I need to select the words that follow "and" or "And" and then replace every letter X with letter Y. I know how to do this in Python with .split() and by looping over a string (word) and searching for X. In MATLAB, however, I am lost. Could you please tell me whether there are equivalent commands? Something along the lines of
fileread
textscan
fseek
Thank you
EDIT:
What I actually meant was that from a string:
str = 'I like apples and pineapples and other fruit'
I need to obtain
'pineapples'
'other'
and return these with 'e' switched for 'z'

Use case-insensitive regular expressions (regexpi). Find everything after "and" or "And", and switch X with Y:
str = 'This is a text with X and X and Z'
[startIndex,endIndex] = regexpi(str,'and');
str2 = str(endIndex(1) + 1 : end)
str2(str2 == 'X') = 'Y';
str = [str(1:endIndex(1)), str2]
str =
This is a text with X and Y and Z
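It is a bit messy. I guess it could be done more simply, but at least it works! If you don't care about the case of X, compare against upper(str2) instead, i.e. str2(upper(str2) == 'X') = 'Y'; strcmpi compares whole strings, so it cannot stand in for the element-wise ==.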
UPDATE:
After your comment, I guess this should work:
[startIndex,endIndex] = regexpi(str,'and');
str2 = str(endIndex(1) + 1 : end);
words = regexp(str2,' ','split');
nums = cellfun(@(x) find(x == 'e'), words, 'UniformOutput', false);
idx = find(~cellfun(@isempty, nums));
wordList = words(idx)
wordList = strrep(wordList, 'e', 'z') % == is not defined for cell arrays; strrep handles each word
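For the string in the question's edit, the same result can also be reached with a single token pattern. This is only a minimal sketch, assuming the words are separated by whitespace; the variable names tok, words and replaced are made up for illustration:

```matlab
str = 'I like apples and pineapples and other fruit';

% Capture the word that follows each whole word "and" (any case).
tok = regexp(str, '\<and\s+(\w+)', 'tokens', 'ignorecase');

% Each match comes back as a 1x1 cell; flatten into one cell array of words.
words = [tok{:}];

% strrep works element-wise on cell arrays of character vectors.
replaced = strrep(words, 'e', 'z');
disp(replaced)
```

Unlike filtering for words that contain 'e', this keeps every word that directly follows "and", whether or not it contains the letter to be replaced.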

Related

How to print a chain of arrows with fprintf

I am looking for a way to set up the fprintf function so that it returns the string 1->2->...->n for any input n. However, I cannot find a way to do so without having an extra arrow attached at the beginning (->1->2->...->n) or the end of the string (1->2->...->n->). Is there a way around this?

You could use strjoin for this...
n = 4;
str = strjoin( arrayfun(@num2str, 1:n, 'uni', 0), '->' );
% str = '1->2->3->4'
Or if you're set on using fprintf (or sprintf), you could manually add the first element (for ease, assume n >= 1)
str = ['1', sprintf('->%.0f', 2:n )];
If you just want to print these to the Command Window, use disp on either option instead of (or after) assigning to str. If you're writing to a file with fprintf, use fprintf(fid, '%s\n', str) to print the line; passing str as the data rather than as the format keeps any % or \ in it from being interpreted.

For this type of task, the solution is to print either the first or the last element separately:
n = 8;
fprintf('%d', 1);
fprintf('->%d', 2:n);
fprintf('\n');

Here's another approach to build the desired string:
n = 10;
str = regexprep(num2str(1:n), '\s+', '->');
This gives
str =
'1->2->3->4->5->6->7->8->9->10'
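Since the question is about the extra arrow, one more option is to let sprintf produce the trailing arrow and then cut it off. A small sketch, assuming n >= 1 and that the separator is exactly the two characters '->':

```matlab
n = 4;
str = sprintf('%d->', 1:n);   % format is reused for every element, so this ends in '->'
str = str(1:end-2);           % drop the two characters of the trailing arrow
fprintf('%s\n', str);
```

The format string is recycled for each element of 1:n, which is why the extra arrow appears in the first place.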

Convert matlab symbol to array of products

Can I convert a symbol that is a product of products into an array of products?
I tried to do something like this:
syms A B C D;
D = A*B*C;
factor(D);
but it doesn't factor it out (mostly because that isn't what factor is designed to do).
ans =
A*B*C
I need it to work if A B or C is replaced with any arbitrarily complicated parenthesized function, and it would be nice to do it without knowing what variables are in the function.
For example (all variables are symbolic):
D = x*(x-1)*(cos(z) + n);
factoring_function(D);
should be:
[x, x-1, (cos(z) + n)]
It seems like a string parsing problem, but I'm not confident that I can convert back to symbolic variables afterwards (also, string parsing in matlab sounds really tedious).
Thank you!

Use regexp on the string to split based on * (this assumes that no factor contains a * of its own):
>> str = 'x*(x-1)*(cos(z) + n)';
>> factors_str = regexp(str, '\*', 'split')
factors_str =
'x' '(x-1)' '(cos(z) + n)'
The result factors_str is a cell array of strings. To convert to a cell array of sym objects, use
N = numel(factors_str);
factors = cell(1,N); %// each cell will hold a sym factor
for n = 1:N
factors{n} = sym(factors_str{n});
end
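Splitting the text on * stops working as soon as a factor contains a product of its own, and newer releases expect str2sym rather than sym for expression text. As an alternative sketch, assuming the Symbolic Math Toolbox is available, children can read the operands of the top-level product directly. The order of the returned factors is not guaranteed, and the iscell branch is an assumption about release differences (some releases return a cell array, others a vector), so check it against your version:

```matlab
syms x z n
D = x*(x-1)*(cos(z) + n);

parts = children(D);          % operands of the top-level product
if iscell(parts)
    parts = [parts{:}];       % normalise to a symbolic row vector
end
disp(parts)
```

Multiplying the parts back together with prod(parts) should reproduce D, which is a quick way to check the split.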

I ended up writing the code to do this in python using sympy. I think I'm going to port the matlab code over to python because it is a more preferred language for me. I'm not claiming this is fast, but it serves my purposes.
# Factors a sum of products function that is first order with respect to all symbolic variables
# into a reduced form using products of sums whenever possible.
# @params orig_exp A symbolic expression to be simplified
# @params depth Used to control indenting for printing
# @params verbose Whether to print or not
def factored(orig_exp, depth = 0, verbose = False):
    # Prevents sympy from doing any additional factoring
    exp = expand(orig_exp)
    if verbose: tabs = '\t'*depth
    terms = []
    # Break up the added terms
    while(exp != 0):
        my_atoms = symvar(exp)
        if verbose:
            print tabs,"The expression is",exp
            print tabs,my_atoms, len(my_atoms)
        # There is nothing to sort out, only one term left
        if len(my_atoms) <= 1:
            terms.append((exp, 1))
            break
        (c,v) = collect_terms(exp, my_atoms[0])
        # Makes sure it doesn't factor anything extra out
        exp = expand(c[1])
        if verbose:
            print tabs, "Collecting", my_atoms[0], "terms."
            print tabs,'Seperated terms with ',v[0], ', (',c[0],')'
        # Factor the leftovers and recombine
        c[0] = factored(c[0], depth + 1)
        terms.append((v[0], c[0]))
    # Combines trivial terms whenever possible
    i=0
    def termParser(thing): return str(thing[1])
    terms = sorted(terms, key = termParser)
    while i<len(terms)-1:
        if equals(terms[i][1], terms[i+1][1]):
            terms[i] = (terms[i][0]+terms[i+1][0], terms[i][1])
            del terms[i+1]
        else:
            i += 1
    recombine = sum([terms[i][0]*terms[i][1] for i in range(len(terms))])
    return simplify(recombine, ratio = 1)

Copy text from a string that meets a certain condition MATLAB

I have strings from a text file:
20130806_083642832,!AIVDM,1,1,,B,13aFeA0P00PEqQNNC4Um7Ow`2#O2,0*5E
20130806_083643032,!AIVDM,2,1,4,B,E>jN6<0W6#1WPab3bPa2#LtP0000:uoH?9Ur,0*50
I need to go through the characters and extract the date at the start, then the message that starts after "B," (it could also be "A,") and runs up until ",0".
Any thoughts?

OK, there are much more elegant ways to solve this, but the following example will give you a feeling for how to manipulate strings in MATLAB (which might be the thing you are having problems with). Here you go:
String='20130806_083642832,!AIVDM,1,1,,B,13aFeA0P00PEqQNNC4Um7Ow`2#O2,0*5E'
for i=1:length(String)
    if(strcmp(String(i),'B')) %or strcmp(String(i),'A')
        for j=i:length(String) %or "for j=length(String):-1:i" if you meant the last 0 ;)
            if(strcmp(String(j),'0'))
                String2=String(i:j)
                break
            end
        end
        break
    end
end
Output
String =
20130806_083642832,!AIVDM,1,1,,B,13aFeA0P00PEqQNNC4Um7Ow`2#O2,0*5E
String2 =
B,13aFeA0
Just play around with string indexing and with strcmp or strcmpi and you'll get a feeling and will be able to write much nicer expressions.
Now try extracting the date by yourself!
Hope that helps!

Without loops you could do something like this:
startString = ['20130806_083642832,!AIVDM,1,1,,B,13aFeA0P00PEqQNNC4Um7Ow`2#O2,0*5E'];
startPosition = find(startString == 'B') + 1;
if isempty(startPosition) % find returns [] when there is no 'B'
startPosition = find(startString == 'A') + 1;
end
tmpMessage = startString(startPosition:end);
endPosition = find(tmpMessage == '0') - 1;
outMessage = tmpMessage(1:endPosition(1))
dateString = startString(1:8)
This gives the output:
outMessage = ,13aFeA
dateString = 20130806
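Because the fields are comma-separated, another option is to split the line and pick fields by position. A sketch under the assumption, taken from the two sample lines, that the channel letter is always the sixth field and the message the seventh; the names rawLine, fields, channel and message are made up for illustration:

```matlab
rawLine = '20130806_083642832,!AIVDM,1,1,,B,13aFeA0P00PEqQNNC4Um7Ow`2#O2,0*5E';

% Split on every comma; regexp keeps the empty field between ",,".
fields = regexp(rawLine, ',', 'split');

dateString = fields{1}(1:8);   % the eight characters before the underscore
channel    = fields{6};        % 'A' or 'B'
message    = fields{7};        % payload between the channel and ",0*.."
```

regexp with 'split' is used here rather than strsplit because strsplit collapses consecutive delimiters by default, which would shift the field positions for lines containing ",,".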

Replacing Several Different Character of a string

I have to write a function to replace the characters of a string with those letters.
A=U
T=A
G=C
C=G
Example:
Input: 'ATAGTACCGGTTA'
Therefore, the output should be:
'UAUCAUGGCCAAU'
I can replace only one character. However, I have no idea how to do several. I could replace several if the condition "G=C and C=G" were not there.
I use:
in='ATAGTACCGGTTA'
check=in=='A'
in(check)='U'
ans='UTUGTUCCGGTTU'
If I keep doing this, at some point G will be replaced by C, and then all the C's will be replaced by G. How can I stop this? Any help will be appreciated.

Just for fun, here's probably the absolute simplest way, via indexing:
key = 'UGCA';
[~, ~, idx] = unique(in);
out = key(idx'); % transpose idx since unique() returns a column vector
I do love indexing :D
Edit: As rightly pointed out, this is very optimised for the question as stated. Since [a, ~, idx] = unique(in); returns a and idx such that a(idx) == in, and by default a is sorted, we can just assume that a == 'ACGT' and pre-construct key to be the appropriate translation of indices into a.
If some characters from the known alphabet never appear in the input string, or if other unknown characters appear, then the indices don't match and the assumption breaks. In that case, we have to calculate the appropriate key explicitly - filling in the step that was optimised out above:
alph = 'ACGT';
trans = 'UGCA';
[key, ~, idx] = unique(in);
[~, alphidx, keyidx] = intersect(alph, key); % find which elements of alph
% appear at which points in key
key(keyidx) = trans(alphidx); % translate the elements of key that we can
out = key(idx');

The simplest way would be to use an intermediary letter. For instance:
in='ATAGTACCGGTTA'
in(in == 'A')='U'
in(in == 'T')='A'
in(in == 'C')='X'
in(in == 'G')='C'
in(in == 'X')='G'
This way you keep the 'C' and 'G' characters separate.
EDIT:
As others have mentioned, there are a few other things you could do to improve this approach (though personally I think Notlikethat's way is cleanest). For instance, if you use a second variable, you don't have to worry about keeping 'C' and 'G' separate:
in='ATAGTACCGGTTA'
out=in;
out(in == 'A')='U';
out(in == 'T')='A';
out(in == 'C')='G';
out(in == 'G')='C';
Alternatively, you could make your indices first, then index after:
in='ATAGTACCGGTTA'
inA=in=='A';
inT=in=='T';
inC=in=='C';
inG=in=='G';
in(inA)='U';
in(inT)='A';
in(inC)='G';
in(inG)='C';
Finally, my personal favourite for sheer idiocy:
out=char(in+floor((68-in).*(in<70)*7/4)*4-round(ceil((in-67)/4)*3.7));
(Seriously, that last one works)

You can perform multiple character translation with bsxfun.
Inputs:
in = 'ATAGTACCGGTTA';
pat = ['A','T','G','C'];
subst = ['U','A','C','G'];
out0 ='UAUCAUGGCCAAU';
Translate all characters simultaneously:
>> ii = (1:numel(pat))*bsxfun(@eq,in,pat.'); %' instead of repmat and .*
>> out = subst(ii)
out =
UAUCAUGGCCAAU
>> isequal(out,out0)
ans =
1
Say you only want to translate a subset of the characters, leaving part of the sequence intact, it is easily solved with logical indexing and a few extra lines:
% Leave the Gs and Cs in place
pat = ['A','T'];
subst = ['U','A'];
ii = (1:numel(pat))*bsxfun(@eq,in,pat.'); %' same
out = char(zeros(1,numel(in)));
nz = ii>0;
out(nz) = subst(ii(nz));
out(~nz) = in(~nz)
out =
UAUGAUCCGGAAU
The original Gs and Cs are unchanged; A became U, and T became A (T is gone).

I would suggest using containers.Map:
m=containers.Map({'A','T','G','C'},{'U','A','C','G'})
mapfkt=@(input)(cell2mat(m.values(num2cell(input))))
Usage:
mapfkt('ATAGTACCGGTTA')

Here is another method that should be fairly efficient, general, and in the line of thought of your original attempt:
%Suppose this is your input
myString = 'abcdeabcde';
fromString = 'ace';
toString = 'xyz';
%Then it just takes this:
[idx, fromLocation] = ismember(myString,fromString)
myString(idx)=toString(fromLocation(idx))
If you know that all letters need to be replaced, the last line can be slightly simplified, as you won't need to use idx.
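Applied to the sequence from the question, the same ismember idea could look like the sketch below. The names from, to, found and pos are chosen here for illustration; characters outside the alphabet are simply left untouched:

```matlab
in   = 'ATAGTACCGGTTA';
from = 'ATGC';
to   = 'UACG';

% found(k) is true where in(k) occurs in from; pos(k) is its index there.
[found, pos] = ismember(in, from);

out = in;                       % unknown characters stay as they are
out(found) = to(pos(found));    % all replacements happen in one step
disp(out)
```

Because every position is looked up in the original input, the G-to-C and C-to-G swaps cannot interfere with each other.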

Creating a translator in MatLab

I am trying to create a simple program in Matlab where the user can input a string (such as "A", "B", "AB" or "A B") and the program will output a word corresponding to my letter.
Input | Output
A     | Hello
B     | Hola
AB    | HelloHola
A B   | Hello Hola
This is my code:
A='Hello'; B='Hola';
userText = input('What is your message: ', 's');
userText = upper(userText);
for ind = 1:length(userText)
current = userText(ind);
X = ['The output is ', current];
disp(X);
end
Currently I don't get my desired results. I instead get this:
Input | Output
A     | The output is A
B     | The output is B
I'm not totally sure why X = ['The output is ', current]; evaluates to The output is A instead of The output is Hello.
Edit:
How would this program be able to handle numbers... such as 1 = "Goodbye"

What's going on:
%// input text
userText = input('What is your message: ', 's');
%// ...and some lines later
X = ['The output is ', userText];
You never map what you type to what is contained by the variables A and B.
The fact that they are called A and B has nothing to do with what you type. You could call them C and blargh and still get the same result.
Now, you could use eval, but that's really not advisable here. In this case, using eval would force the one typing in the strings to know the exact names of your variables...that's a portability, maintainability, security, etc. disaster waiting to happen.
There are better solutions possible, for instance, create a simple map:
map = {
'A' 'Hello'
'B' 'Hola'
'1' 'Goodbye'
};
userText = input('What is your message: ', 's');
str = map{strcmpi(map(:,1), userText), 2};
disp(['The output is ', str]);
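The lookup above compares the whole input against one key, so inputs such as 'AB' or 'A B' find no match. A sketch of a per-character variant of the same cell-array map, with a fixed userText standing in for the input call and the variable names out and hit made up for illustration:

```matlab
map = {
    'A' 'Hello'
    'B' 'Hola'
    '1' 'Goodbye'
    };
userText = 'A B1';   % stand-in for input('What is your message: ', 's')

out = '';
for k = 1:numel(userText)
    hit = strcmpi(map(:,1), userText(k));   % logical column, one entry per key
    if any(hit)
        out = [out, map{hit, 2}];           % append the translated word
    else
        out = [out, userText(k)];           % keep spaces and unknown characters
    end
end
disp(['The output is ', out]);
```

Unmapped characters are copied through unchanged, which is what preserves the space in 'A B'.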

I would recommend using a map object to contain what you want. This will circumvent the eval function (which I suggest avoiding like the plague). This is pretty simple to read and understand, and is pretty efficient especially in the case of a long input string.
translation = containers.Map();
translation('A') = 'Hello'; % A -> Hello and B -> Hola, as in the question
translation('B') = 'Hola';
translation('1') = 'Goodbye';
inputString = 'ABA1BA1B11ABBA';
resultString = '';
for i = 1:length(inputString)
    if translation.isKey(inputString(i))
        % get mapped string if it exists
        resultString = [resultString,translation(inputString(i))];
    else
        % if mapping does not exist, simply put the input character in (covers space case)
        resultString = [resultString,inputString(i)];
    end
end

Take a look at the command eval. Currently, you are displaying the name of the variable that contains the string you want. eval will help you in actually accessing and printing it.

What you need to do is:
X = ['The output is ', eval(current)];
Here is the documentation: eval
