Adding specific letters to a string MATLAB - matlab

I'm a neuroscience/biomedical engineering major struggling with this whole MATLAB programming ordeal and so far, this website is the best teacher available to me right now. I am currently having trouble with one of my HW problems. What I need to do is take a phrase, find a specific word in it, then take a specific letter in it and increase that letter by the number indicated. In other words:
phrase = 'this homework is so hard'
word = 'so'
letter = 'o'
factor = 5
which should give me 'this homework is sooooo hard'
I got rid of my main error, though I really don't know how. I exited MATLAB, then got back into it. Lo and behold, it magically worked.
function[out1] = textStretch(phrase, word, letter, stretch)
searchword= strfind(phrase, word);
searchletter strfind(hotdog, letter); %Looks for the letter in the word
add = (letter+stretch) %I was hoping this would take the letter and add to it, but that's not what it does
replace= strrep(phrase, word, add) %This would theoretically take the phrase, find the word and put in the new letter
out1 = replace
According to the teacher, the ones() function might be useful, and I have to concatenate strings, but if I can just find it in the string and replace it, why do I need to concatenate?

Since this is homework I won't write the whole thing out for you but you were on the right track with strfind.
a = strfind(phrase, word);
b = strfind(word, letter);
What does phrase(1:a+b-2) return? What does phrase(a+b:end) return?
Making some assumptions about why your teacher wants you to use ones:
What does num = double('o') return?
What does char(num) return? How about char([num num])?
You can concatenate strings like this:
out = [phrase(1:a+b-2),'ooooo',phrase(a+b:end)]; % the letter itself sits at index a+b-1
So all you really need to focus on is how to get a string which is letter repeated factor times.
If you wanted to use strrep instead you would need to give it the full word you are searching for and a copy of that word with the repeated letters in:
new_phrase = strrep(phrase, 'so', 'sooooo');
Again, the issue is how to get the 'sooooo' string.
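For comparison, here is a minimal sketch of the same slice-and-concatenate idea in Python rather than MATLAB (so indices are 0-based). The function name text_stretch is made up for illustration, and it only stretches the first occurrence of the letter in the first occurrence of the word:

```python
def text_stretch(phrase, word, letter, factor):
    """Repeat the first `letter` of the first `word` in `phrase` `factor` times."""
    a = phrase.find(word)          # start of the word, -1 if it is absent
    if a < 0:
        return phrase
    b = word.find(letter)          # offset of the letter inside the word
    if b < 0:
        return phrase
    pos = a + b                    # index of the letter inside the phrase
    # text before the letter + repeated letter + text after the letter
    return phrase[:pos] + letter * factor + phrase[pos + 1:]


print(text_stretch('this homework is so hard', 'so', 'o', 5))
# this homework is sooooo hard
```

In MATLAB the repeated-letter part is what ones (or repmat) would build; here the `*` operator on a string plays that role.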

See if this works for you -
phrase_split = regexp(phrase,'\s','Split'); %// Split into words as cells
wordr = cellstr(strrep(word,letter,letter(:,ones(1,factor))));%// Stretched word
phrase_split(strcmp(phrase_split,word)) = wordr;%//Put stretched word into place
out = strjoin(phrase_split) %// Output as the string cells joined together
Note: strjoin needs a recent MATLAB version, which if unavailable could be obtained from here.
Or you can just use a hack obtained from the m-file itself -
out = [repmat(sprintf(['%s', ' '], phrase_split{1:end-1}), ...
1, ~isscalar(phrase_split)), sprintf('%s', phrase_split{end})]
Sample run -
phrase =
this homework is so hard and so boring
word =
so
letter =
o
factor =
5
out =
this homework is sooooo hard and sooooo boring
So, just wrap the code into a function wrapper like this -
function out = textStretch(phrase, word, letter, factor)
Homework molded edit:
phrase = 'this homework is seriously hard'
word = 'seriously'
letter = 'r'
stretch = 6
out = phrase
stretched_word = letter(:,ones(1,stretch))
hotdog = strfind(phrase, word)
hotdog_st = strfind(word,letter)
start_ind = hotdog+hotdog_st-1
out(start_ind+stretch:end+stretch-1) = out(start_ind+1:end)
out(hotdog+hotdog_st-1:hotdog+hotdog_st-1+stretch-1) = stretched_word
Output -
out =
this homework is serrrrrriously hard
Again, use this syntax to convert it into a function -
function out = textStretch(phrase, word, letter, stretch)

Well Jessica, first of all this is WRONG, but I am not here to give you the solution. Could you please just use it this way? This will surely run.
function main_script()
phrase = 'this homework is so hard';
word = 'so';
letter = 'o';
factor = 5;
[flirty] = textStretchNEW(phrase, word, letter, factor)
end
function [flirty] = textStretchNEW(phrase, word, letter, stretch)
hotdog = strfind(phrase, word);
colddog = strfind(hotdog, letter);
add = letter + stretch;
hug = strrep(phrase, word, add);
flirty = hug
end

Related

Shifting a string in matlab

OK, so I have retrieved this string from a text file, and now I am supposed to shift it by a specified amount. For example, if the string I retrieved was
To be, or not to be
That is the question
and the shift number was 5 then the output should be
stionTo be, or not
to beThat is the que
I was going to use circshift, but the given string wouldn't have matching dimensions. Also, the string I retrieve would come from a .txt file.
So here is the code I used
S = sprintf('To be, or not to be\nThat is the question')
circshift(S,5,2)
but the output is
stionTo be, or not to be
That is the que
but I need
stionTo be, or not
to beThat is the que
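By storing the locations of the newlines, removing them, and adding them back in later, we can achieve this. This code relies on the insertAfter function, which is only available in MATLAB R2016b and later.
S = sprintf('To be, or not to be\nThat is the \n question');
newline = regexp(S,'\n'); % positions of the newlines in the original string
S(newline) = ''; % remove them before shifting
S = circshift(S,5,2);
for ii = 1:numel(newline)
% ii-1 newlines were removed before this one and ii-1 two-character '\n'
% markers have already been inserted, so the net offset is ii-2
S = insertAfter(S,newline(ii)+ii-2,'\n');
end
S = sprintf(S); % turn the literal '\n' markers into real newlines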
You can do this by performing a circular shift on the indices of the non-newline characters. (The code below actually skips all control characters with ASCII code < 32.)
function T = strshift(S, k)
T = S;
c = find(S >= ' '); % shift only printable characters (ascii code >= 32)
T(c) = T(circshift(c, k, 2));
end
Sample run:
>> S = sprintf('To be, or not to be\nThat is the question')
S = To be, or not to be
That is the question
>> r = strshift(S, 5)
r = stionTo be, or not
to beThat is the que
If you want to skip only the newline characters, just change to
c = find(S ~= 10);
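The index-shifting idea carries over to other languages. Here is a rough Python sketch of it (not MATLAB; the name strshift is reused only for illustration): collect the positions of the non-newline characters, rotate the characters found there, and write them back into the same positions so every newline stays where it was.

```python
def strshift(s, k):
    """Circularly shift the non-newline characters of s right by k places,
    leaving every newline at its original index."""
    idx = [i for i, ch in enumerate(s) if ch != '\n']   # positions to shift
    if not idx:
        return s
    k %= len(idx)
    chars = [s[i] for i in idx]
    shifted = chars[-k:] + chars[:-k] if k else chars   # rotate right by k
    out = list(s)
    for i, ch in zip(idx, shifted):
        out[i] = ch                                     # newlines are never touched
    return ''.join(out)


print(strshift('To be, or not to be\nThat is the question', 5))
```

Note that the first output line ends with a space, because the newline keeps its original index 19.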

How to get rid of the punctuation? and check the spelling error

eliminate punctuation
split the words at newlines and spaces, then store them in an array
check the text file for errors with the function in the checkSpelling.m file
sum up the total number of errors in that article
no suggestion is assumed to be no error, then return -1
sum of error>20, return 1
sum of error<=20, return -1
I would like to check the spelling of a paragraph, but I am having trouble getting rid of the punctuation. The problem may have another cause; it returns the error below:
My data2 file is :
checkSpelling.m
function suggestion = checkSpelling(word)
h = actxserver('word.application');
h.Document.Add;
correct = h.CheckSpelling(word);
if correct
suggestion = []; %return empty if spelled correctly
else
%If incorrect and there are suggestions, return them in a cell array
if h.GetSpellingSuggestions(word).count > 0
count = h.GetSpellingSuggestions(word).count;
for i = 1:count
suggestion{i} = h.GetSpellingSuggestions(word).Item(i).get('name');
end
else
%If incorrect but there are no suggestions, return this:
suggestion = 'no suggestion';
end
end
%Quit Word to release the server
h.Quit
f19.m
for i = 1:1
data2=fopen(strcat('DATA\PRE-PROCESS_DATA\F19\',int2str(i),'.txt'),'r')
CharData = fread(data2, '*char')'; %read text file and store data in CharData
fclose(data2);
word_punctuation=regexprep(CharData,'[`~!@#$%^&*()-_=+[{]}\|;:\''<,>.?/','')
word_newLine = regexp(word_punctuation, '\n', 'split')
word = regexp(word_newLine, ' ', 'split')
[sizeData b] = size(word)
suggestion = cellfun(@checkSpelling, word, 'UniformOutput', 0)
A19(i)=sum(~cellfun(@isempty,suggestion))
feature19(A19(i)>=20)=1
feature19(A19(i)<20)=-1
end
Substitute your regexprep call to
word_punctuation=regexprep(CharData,'\W','\n');
Here \W matches every non-alphanumeric character (including spaces), and each match is replaced with a newline.
Then
word = regexp(word_punctuation, '\n', 'split');
As you can see you don't need to split by space (see above). But you can remove the empty cells:
word(cellfun(@isempty,word)) = [];
Everything worked for me. However, I have to say that your checkSpelling function is very slow. On every call it has to create an ActiveX server object, add a new document, and delete the object after the check is done. Consider rewriting the function to accept a cell array of strings.
UPDATE
The only problem I see is that this removes the quote ' character (I'm, don't, etc). You can temporarily substitute it with an underscore (yes, it's considered alphanumeric) or any sequence of unused characters. Or you can list all the non-alphanumeric characters to be removed in square brackets instead of using \W.
UPDATE 2
Another solution to the 1st UPDATE:
word_punctuation=regexprep(CharData,'[^A-Za-z0-9''_]','\n');
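To see what that character class does without starting Word, here is a small Python sketch of the same replace-then-split step (Python's re module rather than MATLAB's regexprep; split_words is a hypothetical helper name):

```python
import re


def split_words(text):
    """Replace every character that is not a letter, digit, apostrophe or
    underscore with a newline, then split and drop the empty pieces."""
    cleaned = re.sub(r"[^A-Za-z0-9'_]", "\n", text)
    return [w for w in cleaned.split("\n") if w]


print(split_words("Hello, world! I'm here; don't go."))
# ['Hello', 'world', "I'm", 'here', "don't", 'go']
```

Contractions survive because the apostrophe is inside the negated class, while commas, spaces and other punctuation all become separators.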

Capitalizing only the first letters without changing any numbers or punctuation

I would like to modify a string so that the first letter of each word is capitalized and all other letters are lower case, while anything else is left unchanged.
I tried this:
function new_string=switchCase(str1)
%str1 represents the given string containing word or phrase
str1Lower=lower(str1);
spaces=str1Lower==' ';
caps1=[true spaces];
%we want the first letter and the letters after space to be capital.
strNew1=str1Lower;
strNew1(caps1)=strNew1(caps1)-32;
end
This function works nicely if there is nothing other than a letter after space. If we have anything else for example:
str1='WOW ! my ~Code~ Works !!'
Then it gives
new_string =
'Wow My ^code~ Works !'
However, it has to give (according to the requirement),
new_string =
'Wow! My ~code~ Works !'
I found some code that addresses a similar problem. However, it is ambiguous. Here I can ask questions if I don't understand.
Any help will be appreciated! Thanks.
Interesting question +1.
I think the following should fulfil your requirements. I've written it as an example sub-routine and broken down each step so it is obvious what I'm doing. It should be straightforward to condense it into a function from here.
Note, there is probably also a clever way to do this with a single regular expression, but I'm not very good with regular expressions :-) I doubt a regular expression based solution will run much faster than what I've provided (but am happy to be proven wrong).
%# Your example string
Str1 ='WOW ! my ~Code~ Works !!';
%# Convert case to lower
Str1 = lower(Str1);
%# Convert to ascii
Str1 = double(Str1);
%# Find an index of all locations after spaces
I1 = logical([0, (Str1(1:end-1) == 32)]);
%# Eliminate locations that don't contain lower-case characters
I1 = logical(I1 .* ((Str1 >= 97) & (Str1 <= 122)));
%# Check manually if the first location contains a lower-case character
if Str1(1) >= 97 && Str1(1) <= 122; I1(1) = true; end;
%# Adjust all appropriate characters in ascii form
Str1(I1) = Str1(I1) - 32;
%# Convert result back to a string
Str1 = char(Str1);
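The same rule (capitalize a character only if it is a lower-case letter and sits at the start of the string or right after a space) can be sketched in a few lines of Python. This is an illustration in another language, not a MATLAB translation, and switch_case is just a stand-in name:

```python
def switch_case(s):
    """Lower-case everything, then capitalize letters that start the string
    or directly follow a space; punctuation and digits are left alone."""
    s = s.lower()
    out = []
    for i, ch in enumerate(s):
        starts_word = (i == 0) or (s[i - 1] == ' ')
        if starts_word and 'a' <= ch <= 'z':
            out.append(ch.upper())
        else:
            out.append(ch)
    return ''.join(out)


print(switch_case('WOW ! my ~Code~ Works !!'))
# Wow ! My ~code~ Works !!
```

Checking that the character is a letter before changing it is what keeps '!' and '~' from being shifted into other symbols.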

Extracting strings from cells in MATLAB [duplicate]

I'm working on an assignment for my CS course.
We're given a plain text file, which, in my case, contains a series of tweets.
What I need to do is create a script that will detect hashtags, and then save each hashtag into a cell array.
So far I know how to write a function that detects the '#' symbol...
strfind(textRead{i},'#');
inside a for loop where i=1:30 (that is, the number of cells of text). However, past that, I'm at a loss as to how I should write a script that will detect the '#' and return the text between that and the next ' ' (space) character.
Try this:
str = '#someHashtag other tweet text ignore #random';
regexp(str, '#[A-Za-z]*', 'match')
I think you'll be able to find the rest out yourself :)
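As a quick way to experiment with the pattern, here is a small Python sketch using re.findall (an illustration only; find_hashtags is a made-up name). It uses the class [A-Za-z] rather than a range like [A-z], because the latter also covers the punctuation that sits between 'Z' and 'a' in ASCII, such as '^' and '_':

```python
import re


def find_hashtags(text):
    """Return every '#' followed by one or more ASCII letters."""
    return re.findall(r'#[A-Za-z]+', text)


print(find_hashtags('#someHashtag other tweet text ignore #random'))
# ['#someHashtag', '#random']

# A range written as [A-z] would swallow '_' and '^' as well:
print(re.findall(r'#[A-z]+', '#foo_bar^baz'))
# ['#foo_bar^baz']
```

If hashtags may contain digits or underscores, widening the class (for example to \w) is a design choice to make deliberately.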
Here is a basic skeleton. But make sure to use the correct regexp to extract the values ;-)
Yes, with Dorin's regexp and match above you get one value at a time. You may add a token as per this example from MathWorks.
Sample:
str = ['if <code>A </code> == x<sup>2 </sup>, ' ...
'<em>disp(x) </em>']
str = if <code>A </code> == x<sup>2 </sup>, <em>disp(x) </em>
expr = '<(\w+).*?>.*?</\1>';
[tok mat] = regexp(str, expr, 'tokens', 'match');
tok{:}
ans = 'code'
ans = 'sup'
ans = 'em'
In the above code you don't really need to loop and can process the entire text in bulk as one string, hopefully without hitting any string length limit.
But if you want or need to loop, you can use the following sample with Rody's regexp and match only.
fid = fopen('data.txt');
hashtags = {}; % hashtags collected so far
dataText = fgetl(fid); % read the first line
while ischar(dataText) % fgetl returns -1 at the end of the file
X = regexp(dataText, '#[A-Za-z]*', 'match'); % hashtags on this line
hashtags = [hashtags, X]; % append them to the running list
dataText = fgetl(fid); % read the next line
end
disp(hashtags)
fclose(fid);

Algorithm to get a list of all words that are anagrams of all substrings (scrabble)?

Eg if input string is helloworld I want the output to be like:
do
he
we
low
hell
hold
roll
well
word
hello
lower
world
...
all the way up to the longest word that is an anagram of a substring of helloworld. Like in Scrabble for example.
The input string can be any length, but rarely more than 16 chars.
I've done a search and come up with structures like a trie, but I am still unsure of how to actually do this.
The structure used to hold your dictionary of valid entries will have a huge impact on efficiency. Organize it as a tree, root being the singular zero letter "word", the empty string. Each child of root is a single first letter of a possible word, children of those being the second letter of a possible word, etc., with each node marked as to whether it actually forms a word or not.
Your tester function will be recursive. It starts with zero letters, finds from the tree of valid entries that "" isn't a word but it does have children, so you call your tester recursively with your start word (of no letters) appended with each available remaining letter from your input string (which is all of them at that point). Check each one-letter entry in tree, if valid make note; if children, re-call tester function appending each of remaining available letters, and so on.
So for example, if your input string is "helloworld", you're going to first call your recursive tester function with "", passing the remaining available letters "helloworld" as a 2nd parameter. Function sees that "" isn't a word, but child "h" does exist. So it calls itself with "h", and "elloworld". Function sees that "h" isn't a word, but child "e" exists. So it calls itself with "he" and "lloworld". Function sees that "e" is marked, so "he" is a word, take note. Further, child "l" exists, so next call is "hel" with "loworld". It will next find "hell", then "hello", then will have to back out and probably next find "hollow", before backing all the way out to the empty string again and then starting with "e" words next.
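A minimal Python sketch of that recursive walk, assuming a trie stored as nested dicts with a None key marking "a word ends here" (the names build_trie, find_words and END are invented for this example, and the tiny word list stands in for a real dictionary):

```python
END = None  # sentinel key marking that a node completes a word


def build_trie(words):
    """Nested-dict trie: each key is a letter, each value is a child node."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_words(node, remaining, prefix='', found=None):
    """Collect every word reachable from `node` using letters of `remaining`."""
    if found is None:
        found = set()
    if END in node and prefix:
        found.add(prefix)                      # this prefix is itself a word
    for ch in set(remaining):                  # try each distinct letter once
        child = node.get(ch)
        if child is not None:
            rest = remaining.replace(ch, '', 1)    # use up one copy of ch
            find_words(child, rest, prefix + ch, found)
    return found


trie = build_trie(['he', 'hell', 'hello', 'hollow', 'world', 'zebra'])
print(sorted(find_words(trie, 'helloworld'), key=lambda w: (len(w), w)))
# ['he', 'hell', 'hello', 'world', 'hollow']
```

The search never explores a prefix that has no children in the trie, which is where the savings over brute-force permutation come from.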
I couldn't resist my own implementation. It creates a dictionary by sorting the letters of each word alphabetically and mapping the sorted letters to the words that can be created from them. This is an O(n) start-up operation that eliminates the need to find all permutations. You could implement the dictionary as a trie in another language to attain further speedups.
The "getAnagrams" command is also an O(n) operation, which checks each key in the dictionary to see whether its letters are a subset of the search word. Doing getAnagrams("radiotelegraphically") (a 20 letter word) took approximately 1 second on my laptop, and returned 1496 anagrams.
# Using the 38617 word dictionary at
# http://www.cs.umd.edu/class/fall2008/cmsc433/p5/Usr.Dict.Words.txt
# Usage: getAnagrams("helloworld")
def containsLetters(subword, word):
    wordlen = len(word)
    subwordlen = len(subword)
    if subwordlen > wordlen:
        return False
    word = list(word)
    for c in subword:
        try:
            index = word.index(c)
        except ValueError:
            return False
        word.pop(index)
    return True

def getAnagrams(word):
    output = []
    for key in mydict:
        if containsLetters(key, word):
            output.extend(mydict[key])
    output.sort(key=len)
    return output

f = open("dict.txt")
wordlist = f.readlines()
f.close()

mydict = {}
for word in wordlist:
    word = word.rstrip()
    temp = list(word)
    temp.sort()
    letters = ''.join(temp)
    if letters in mydict:
        mydict[letters].append(word)
    else:
        mydict[letters] = [word]
An example run:
>>> getAnagrams("helloworld")
['do', 'he', 'we', 're', 'oh', 'or', 'row', 'hew', 'her', 'hoe', 'woo', 'red', 'dew', 'led', 'doe', 'ode', 'low', 'owl', 'rod', 'old', 'how', 'who', 'rho', 'ore', 'roe', 'owe', 'woe', 'hero', 'wood', 'door', 'odor', 'hold', 'well', 'owed', 'dell', 'dole', 'lewd', 'weld', 'doer', 'redo', 'rode', 'howl', 'hole', 'hell', 'drew', 'word', 'roll', 'wore', 'wool','herd', 'held', 'lore', 'role', 'lord', 'doll', 'hood', 'whore', 'rowed', 'wooed', 'whorl', 'world', 'older', 'dowel', 'horde', 'droll', 'drool', 'dwell', 'holed', 'lower', 'hello', 'wooer', 'rodeo', 'whole', 'hollow', 'howler', 'rolled', 'howled', 'holder', 'hollowed']
The data structure you want is called a Directed Acyclic Word Graph (DAWG), and it is described by Andrew Appel and Guy Jacobson in their paper "The World's Fastest Scrabble Program", which unfortunately they have chosen not to make available free online. An ACM membership or a university library will get it for you.
I have implemented this data structure in at least two languages---it is simple, easy to implement, and very, very fast.
A simple-minded approach is to generate all the "substrings" and, for each of them, check whether it's an element of the set of acceptable words. E.g., in Python 2.6:
import itertools
import urllib

def words():
    f = urllib.urlopen(
        'http://www.cs.umd.edu/class/fall2008/cmsc433/p5/Usr.Dict.Words.txt')
    allwords = set(w[:-1] for w in f)
    f.close()
    return allwords

def substrings(s):
    for i in range(2, len(s)+1):
        for p in itertools.permutations(s, i):
            yield ''.join(p)

def main():
    w = words()
    print '%d words' % len(w)
    ss = set(substrings('weep'))
    print '%d substrings' % len(ss)
    good = ss & w
    print '%d good ones' % len(good)
    sgood = sorted(good, key=lambda w:(len(w), w))
    for aword in sgood:
        print aword

main()
will emit:
38617 words
31 substrings
5 good ones
we
ewe
pew
wee
weep
Of course, as other responses pointed out, organizing your data purposefully can greatly speed up your runtime -- although the best data organization for a fast anagram finder could well be different... but that will largely depend on the nature of your dictionary of allowed words (a few tens of thousands, like here -- or millions?). Hash-maps and "signatures" (based on sorting the letters in each word) should be considered, as well as tries &c.
What you want is an implementation of a power set.
Also look at Eric Lippert's blog; he blogged about this very thing a little while back.
EDIT:
Here is an implementation I wrote of getting the powerset from a given string...
private IEnumerable<string> GetPowerSet(string letters)
{
    char[] letterArray = letters.ToCharArray();
    for (int i = 0; i < Math.Pow(2.0, letterArray.Length); i++)
    {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < letterArray.Length; j++)
        {
            int pos = Convert.ToInt32(Math.Pow(2.0, j));
            if ((pos & i) == pos)
            {
                sb.Append(letterArray[j]);
            }
        }
        yield return new string(sb.ToString().ToCharArray().OrderBy(c => c).ToArray());
    }
}
This function gives me the powersets of chars that make up the passed in string, I then can use these as keys into a dictionary of anagrams...
Dictionary<string,IEnumerable<string>>
I created my dictionary of anagrams like so... (there are probably more efficient ways, but this was simple and plenty quick enough with the scrabble tournament word list)
wordlist = (from s in fileText.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
let k = new string(s.ToCharArray().OrderBy(c => c).ToArray())
group s by k).ToDictionary(o => o.Key, sl => sl.Select(a => a));
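For readers who want to try the power-set-plus-signature idea quickly, here is a compact Python sketch of it (not a port of the C# above; build_index and sub_anagrams are hypothetical names, and the six-word list stands in for a real dictionary). Each word is indexed by its sorted letters, and every sorted subset of the input letters is looked up as a key:

```python
from itertools import combinations


def build_index(words):
    """Map each sorted-letter signature to the words that share it."""
    index = {}
    for w in words:
        index.setdefault(''.join(sorted(w)), []).append(w)
    return index


def sub_anagrams(letters, index, min_len=2):
    """Look up every subset of `letters` (as a sorted signature) in the index."""
    letters = sorted(letters)
    found = set()
    for n in range(min_len, len(letters) + 1):
        # combinations of a sorted sequence are themselves sorted,
        # so each one is already a valid signature
        for combo in set(combinations(letters, n)):
            found.update(index.get(''.join(combo), ()))
    return sorted(found, key=lambda w: (len(w), w))


index = build_index(['we', 'ewe', 'pew', 'wee', 'weep', 'peel'])
print(sub_anagrams('weep', index))
# ['we', 'ewe', 'pew', 'wee', 'weep']
```

The number of subsets grows as 2^n, so this stays practical for the roughly 16-letter inputs mentioned in the question but not for much longer strings.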
Like Tim J, Eric Lippert's blog posts were the first thing to come to my mind. I wanted to add that he wrote a follow-up about ways to improve the performance of his first attempt.
A nasality talisman for the sultana analyst
Santalic tailfans, part two
I believe the Ruby code in the answers to this question will also solve your problem.
I've been playing a lot of Wordfeud on my phone recently and was curious if I could come up with some code to give me a list of possible words. The following code takes your available source letters (* for a wildcard) and an array with a master list of allowable words (TWL, SOWPODS, etc) and generates a list of matches. It does this by trying to build each word in the master list from your source letters.
I found this topic after writing my code, and it's definitely not as efficient as John Pirie's method or the DAWG algorithm, but it's still pretty quick.
public IList<string> Matches(string sourceLetters, string [] wordList)
{
    sourceLetters = sourceLetters.ToUpper();
    IList<string> matches = new List<string>();
    foreach (string word in wordList)
    {
        if (WordCanBeBuiltFromSourceLetters(word, sourceLetters))
            matches.Add(word);
    }
    return matches;
}

public bool WordCanBeBuiltFromSourceLetters(string targetWord, string sourceLetters)
{
    string builtWord = "";
    foreach (char letter in targetWord)
    {
        int pos = sourceLetters.IndexOf(letter);
        if (pos >= 0)
        {
            builtWord += letter;
            sourceLetters = sourceLetters.Remove(pos, 1);
            continue;
        }
        // check for wildcard
        pos = sourceLetters.IndexOf("*");
        if (pos >= 0)
        {
            builtWord += letter;
            sourceLetters = sourceLetters.Remove(pos, 1);
        }
    }
    return string.Equals(builtWord, targetWord);
}
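The same check can also be phrased with letter counts instead of repeated string removal. A rough Python sketch of that variation (my own names can_build and matches; unlike the C# above it upper-cases the target word too, which is an assumption about how the word list is stored):

```python
from collections import Counter


def can_build(target, source, wildcard='*'):
    """True if `target` can be spelled from `source`, where each wildcard
    in `source` may stand in for any one missing letter."""
    available = Counter(source.upper())
    wildcards = available.pop(wildcard, 0)      # count and set aside the wildcards
    needed = Counter(target.upper())
    # letters of the target that the plain source letters cannot cover
    missing = sum(max(0, n - available[ch]) for ch, n in needed.items())
    return missing <= wildcards


def matches(source, word_list):
    """Filter the master list down to the words that can be built."""
    return [w for w in word_list if can_build(w, source)]


print(matches('HELXO*', ['HELLO', 'HOLE', 'WORLD']))
# ['HELLO', 'HOLE']
```

Counting each letter once makes the per-word cost proportional to the word length rather than to repeated scans of the source string.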