How do I find letters in words that are part of a string and remove them? (List comprehensions with if statements) - python-3.7

I'm trying to remove vowels from a string. Specifically, remove vowels from words that have more than 4 letters.
Here's my thought process:
(1) First, split the string into an array.
(2) Then, loop through the array and identify words that are more than 4 letters.
(3) Third, replace vowels with "".
(4) Lastly, join the array back into a string.
Problem: I don't think the code is looping through the array.
Can anyone find a solution?
def abbreviate_sentence(sent):
    split_string = sent.split()
    for word in split_string:
        if len(word) > 4:
            abbrev = word.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
    sentence = " ".join(abbrev)
    return sentence
print(abbreviate_sentence("follow the yellow brick road")) # => "fllw the yllw brck road"

I just figured out that the "abbrev = words.replace..." line was incomplete.
I changed it to:
abbrev = [words.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "") if len(words) > 4 else words for words in split_string]
I found part of the solution here: Find and replace string values in list.
It is called a list comprehension.
I also found List Comprehension with If Statement.
The new lines of code look like:
def abbreviate_sentence(sent):
    split_string = sent.split()
    # The comprehension already visits every word, so no outer for loop is needed.
    abbrev = [words.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
              if len(words) > 4 else words for words in split_string]
    sentence = " ".join(abbrev)
    return sentence
print(abbreviate_sentence("follow the yellow brick road")) # => "fllw the yllw brck road"
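As a possible alternative to the chained replace calls, here is a minimal sketch using str.translate. The VOWELS table and the min_length parameter are names introduced here for illustration, and removing uppercase vowels as well is an assumption that goes beyond the original code.

```python
# Translation table that deletes vowels (uppercase handling is an added assumption).
VOWELS = str.maketrans("", "", "aeiouAEIOU")

def abbreviate_sentence(sent, min_length=4):
    """Strip vowels from every word longer than min_length characters."""
    return " ".join(
        word.translate(VOWELS) if len(word) > min_length else word
        for word in sent.split()
    )

print(abbreviate_sentence("follow the yellow brick road"))  # fllw the yllw brck road
```

A single translate call walks each word once, which may be easier to extend than five chained replace calls.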

Related

How to search roman numerals in string with dart language?

I'm new to Flutter. I have a string containing Roman numerals that mark steps, and I want to put each step on its own line, but I don't know how to do it. I tried string.replaceAll() but could not get it to work, since there are many Roman numerals and some of them overlap, such as i and ii. For example, I have this string:
String text = 'some text here i) some step 1 ii) some step 2 iii) some step 3. some text after step blabla ';
I want the output to have '\n' in front of each Roman numeral, which would arrange the output like this:
some text here
i) some step 1
ii) some step 2
iii) some step 3
some text after step blabla
Is there something I can use to detect the numerals and add '\n' in front of them in the string, or is there some other way?
There is probably a simpler way to do this with some sort of regex that I don't know about, but something like this should work.
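String text = 'Your String';
List<String> words = text.split(' ');
String result = '';
for (var word in words) {
  if (word.endsWith(')')) {
    // Start a new line before each step marker such as "i)".
    result += '\n' + word + ' ';
  } else {
    // Re-add the space that split(' ') removed.
    result += word + ' ';
  }
}
// result now contains the desired string (with one trailing space)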
It is tricky to know when "some text after step blabla" should have a line break in front of it, because I don't know what to look for. You would have to specify that more closely.
Use the Numerus package to check for valid Roman numerals.
Give this a go:
String text = 'some text here i) some step 1 ii) some step 2 iii) some step 3. ao) invalid roman numberal. some textafter';
RegExp regexp = new RegExp(r"((\w+)\))");
final stringWithLinebreaks = text.replaceAllMapped(regexp, (match) {
  return match.group(2).isValidRomanNumeral()
      ? '\n${match.group(1)}'
      : '${match.group(1)}';
});
print(stringWithLinebreaks);
That will print out:
some text here
i) some step 1
ii) some step 2
iii) some step 3. ao) invalid roman numberal. some textafter
You could of course improve it in several ways, such as converting the Roman numeral to an int value using toRomanNumeralValue() and then sorting the steps if they appear out of order in the text string. You could also make the RegExp more precise, for instance by replacing \w with [iIvVxXlLcCdDmM].
I found an answer written for JavaScript and modified it. This is my answer. It only matches Roman numerals from 1 to 10, but that is fine since the steps never exceed 10.
String formatText(String text) {
  RegExp regExp = new RegExp(r'(?:viii?|i(?:ii?|[vx])?|vi?|x)\)');
  String value = text.replaceAllMapped(regExp, (match) => "\n${match.group(0)}");
  return value;
}
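For comparison, the same 1-to-10 pattern can be sketched in Python. The names STEP and format_text are hypothetical. The leading \s* (which swallows the space before a step) and the \b word boundary (which keeps a word such as "taxi)" from matching) are additions that are not in the Dart answer above.

```python
import re

# Roman numerals i..x followed by ")". The \s* and \b parts are added assumptions.
STEP = re.compile(r"\s*\b((?:viii?|i(?:ii?|[vx])?|vi?|x)\))")

def format_text(text):
    """Put each step marker such as "ii)" at the start of a new line."""
    return STEP.sub(lambda m: "\n" + m.group(1), text)

text = "some text here i) some step 1 ii) some step 2 iii) some step 3."
print(format_text(text))
```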

Scala: Transforming List of Strings containing long descriptions to list of strings containing only last sentences

I have a List[String], for example:
val test=List("this is, an extremely long sentence. Check; But. I want this sentence.",
"Another. extremely. long. (for eg. description). But I want this sentence.",
..)
I want the result to be like:
List("I want this sentence", "But I want this sentence"..)
I tried a few approaches, but they didn't work:
test.map(x=>x.split(".").reverse.head)
test.map(x=>x.split(".").last)
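Try this (note that it returns only the last sentence of the last string in the list):
test.reverse.head.split("\\.").last
To handle any exception (this needs import scala.util.Try):
Try(List[String]().reverse.head.split("\\.").last).getOrElse("YOUR_DEFAULT_STRING")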
You can map over your List, split each String, and then take the last element. Try the code below.
val list = List("this is, an extremely long sentence. Check; But. I want this sentence.",
"Another. extremely. long. (for eg. description). But I want this sentence.")
list.map(_.split("\\.").last.trim)
It will give you
List(I want this sentence, But I want this sentence)
test.map (_.split("\\.").last)
Split takes a regular expression, and in a regular expression the dot matches any character, so you have to escape it.
Maybe you want to include question marks and bangs:
test.map (_.split("[!?.]").last)
and trim surrounding whitespace:
test.map (_.split("[!?.]").last.trim).
The reverse.head approach would have been a good idea if last did not exist:
scala> test.map (_.split("[!?.]").reverse.head.trim)
res138: List[String] = List(I want this sentence, But I want this sentence)
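For readers who want to try the same idea outside Scala, here is a rough Python sketch. The function name last_sentence is introduced here for illustration. One difference worth noting: Python's re.split keeps trailing empty strings, whereas the Scala split above drops them, so the sketch filters empty pieces before taking the last one.

```python
import re

def last_sentence(text):
    """Return the last non-empty sentence of text, split on '.', '!' or '?'."""
    # re.split keeps the empty string after a final ".", so drop empty pieces.
    parts = [p.strip() for p in re.split(r"[!?.]", text) if p.strip()]
    return parts[-1] if parts else ""

test = [
    "this is, an extremely long sentence. Check; But. I want this sentence.",
    "Another. extremely. long. (for eg. description). But I want this sentence.",
]
print([last_sentence(s) for s in test])
# ['I want this sentence', 'But I want this sentence']
```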
You can do this a number of ways:
For each string in your original list: split by ., reverse the list, take the first value
test.map(_.split('.').reverse.headOption)
// List(Some( I want this sentence), Some( But I want this sentence))
.headOption results in Some("string") or None, and you can do something like a .getOrElse("no valid string found") on it. You can trim the unwanted whitespace if you want.
Regex match
test.map { sentence =>
  val regex = ".*\\.\\s*([^.]*)\\.$".r
  val regex(value) = sentence
  value
}
This will fetch any string at the end of a longer string which is preceded by a full stop and optional whitespace and followed by a full stop. You can modify the regex to change the exact matching rules, and I recommend playing around with regex101.com if you fancy learning more about regex. It's very good.
This solution is better suited to more complicated examples and requirements, so it's worth keeping in mind. If you are worried that the regex might not match, you can check whether it matches before extracting the value:
test.map { sentence =>
  val regexString = ".*\\.\\s*([^.]*)\\.$"
  val regex = regexString.r
  if (sentence.matches(regexString)) {
    val regex(value) = sentence
    value
  } else ""
}
Take the last after splitting the string by .
test.map(_.split('.').map(_.trim).lastOption)

Split: A subscript must be between 1 and the size of the array

I have a super simple formula. The problem is that sometimes the data doesn't have a second value, or sometimes the value is blank.
Split ({PO_RECEIVE.VENDOR_LOT_ID}," ")[2]
ID
111 222
123
123 222
I was thinking that if I could come up with some logic to figure out whether the string has multiple values, it would solve my problem, but I haven't quite found what I'm looking for:
If {PO_RECEIVE.VENDOR_LOT_ID} = SingleOrBlankString then
{PO_RECEIVE.VENDOR_LOT_ID} else
Split ({PO_RECEIVE.VENDOR_LOT_ID}," ")[2]
Better Example Data:
3011111*42011111111
2711 00291111111
711111//12111111111
/J1111 69111111111
170111
If the string can contain a maximum of two values, separated by a space, then you can check if the string contains a space using the InStr function:
If InStr({PO_RECEIVE.VENDOR_LOT_ID}, " ") = 0 Then
    {PO_RECEIVE.VENDOR_LOT_ID}
Else
    Split ({PO_RECEIVE.VENDOR_LOT_ID}," ")[2]
If there can be multiple spaces between the parts you can use following formulas to get the values:
Left part:
This function returns the left part of the string until the first space.
If InStr({PO_RECEIVE.VENDOR_LOT_ID}, " ") > 0 Then
Left({PO_RECEIVE.VENDOR_LOT_ID}, InStr({PO_RECEIVE.VENDOR_LOT_ID}, " "))
Right part:
This function returns the right part of the string after the last space.
The InStrRev-function returns the position of the last space because it searches the string backwards.
The Len-function returns the length of the string.
[length] - [position of last space] = [length of the right part]
If InStr({PO_RECEIVE.VENDOR_LOT_ID}, " ") > 0 Then
    Right({PO_RECEIVE.VENDOR_LOT_ID}, Len({PO_RECEIVE.VENDOR_LOT_ID}) - InStrRev({PO_RECEIVE.VENDOR_LOT_ID}, " "))
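The guard itself is language-independent: check how many parts the split produced before indexing into them. A minimal Python sketch of that logic follows. The function name second_value is hypothetical, and returning the input unchanged when there is no second value is an assumption based on the pseudo-formula in the question.

```python
def second_value(lot_id):
    """Return the second space-separated value, or the input unchanged if there is none."""
    parts = lot_id.split()  # split() with no argument collapses runs of whitespace
    return parts[1] if len(parts) >= 2 else lot_id

for sample in ["111 222", "123", "", "2711 00291111111", "170111"]:
    print(repr(second_value(sample)))
```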

Getting IndexOutOfBoundsException while searching for a substring

I have a string like
var word = "banana"
and a sentence like var sent = "the monkey is holding a banana which is yellow"
sent1 = "banana!!"
I want to search banana in sent and then write to a file in the following way:
the monkey is holding a
banana
which is yellow
I'm doing it in the following way:
var before = sent.substring(0, sent.indexOf(word))
var after = sent.substring(sent.indexOf(word) + word.length)
println(before)
println(after)
This works fine, but when I do the same for sent1, it gives me an IndexOutOfBoundsException. I think it is because there is nothing before banana in sent1. How do I deal with this?
You can split based on the word and you will get an array with everything before and after the word.
val search = sent.split(word)
search: Array[String] = Array("the monkey is holding a ", " which is yellow")
This works in the "banana!!" case:
"banana!!".split(word)
res5: Array[String] = Array("", !!)
Now you can write the three lines to a file in your favorite way:
println(search(0))
println(word)
println(search(1))
What if you had more than one occurrence of the word? .split understands regular expressions, so you could improve the previous solution with something like this:
sent
  .split("\\s+(?=banana)|(?<=banana)\\s+")
  .foreach(println)
\\s means a whitespace character
(?=<word>) means "followed by <word>"
(?<=<word>) means "preceded by <word>"
So, this would split your string into pieces, using any spaces either preceded or followed by the "banana", and not the word itself. The actual word ends up in the list, just like the other parts of the string, so you don't need to print it out explicitly
This regex trick is called "positive look-around" ( ?= is look-ahead, ?<= is look-behind) in case you are wondering.
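The look-around split carries over to other regex engines. Here is a small Python sketch of the same technique. The name split_around is introduced here for illustration, and re.escape is added so that a search word containing regex metacharacters would still be treated literally.

```python
import re

def split_around(sentence, word):
    """Split sentence on whitespace that touches word, keeping word as its own piece."""
    w = re.escape(word)
    # The lookahead and lookbehind consume only the whitespace, never the word.
    return re.split(rf"\s+(?={w})|(?<={w})\s+", sentence)

for piece in split_around("the monkey is holding a banana which is yellow", "banana"):
    print(piece)
```

With "banana!!" there is no whitespace next to the word, so the whole string comes back as a single piece rather than raising an exception.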

Algorithm to get a list of all words that are anagrams of all substrings (scrabble)?

Eg if input string is helloworld I want the output to be like:
do
he
we
low
hell
hold
roll
well
word
hello
lower
world
...
all the way up to the longest word that is an anagram of a substring of helloworld. Like in Scrabble for example.
The input string can be any length, but rarely more than 16 chars.
I've done a search and come up with structures like a trie, but I am still unsure of how to actually do this.
The structure used to hold your dictionary of valid entries will have a huge impact on efficiency. Organize it as a tree, root being the singular zero letter "word", the empty string. Each child of root is a single first letter of a possible word, children of those being the second letter of a possible word, etc., with each node marked as to whether it actually forms a word or not.
Your tester function will be recursive. It starts with zero letters, finds from the tree of valid entries that "" isn't a word but it does have children, so you call your tester recursively with your start word (of no letters) appended with each available remaining letter from your input string (which is all of them at that point). Check each one-letter entry in tree, if valid make note; if children, re-call tester function appending each of remaining available letters, and so on.
So for example, if your input string is "helloworld", you're going to first call your recursive tester function with "", passing the remaining available letters "helloworld" as a 2nd parameter. Function sees that "" isn't a word, but child "h" does exist. So it calls itself with "h", and "elloworld". Function sees that "h" isn't a word, but child "e" exists. So it calls itself with "he" and "lloworld". Function sees that "e" is marked, so "he" is a word, take note. Further, child "l" exists, so next call is "hel" with "loworld". It will next find "hell", then "hello", then will have to back out and probably next find "hollow", before backing all the way out to the empty string again and then starting with "e" words next.
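The recursive walk described above can be sketched roughly as follows. This is one possible reading of the description, not the answerer's own code: the nested-dict trie, the END marker, and the names build_trie and find_words are all assumptions made for illustration, and the word list is a tiny made-up sample.

```python
END = None  # marker key: a node containing END completes a word

def build_trie(words):
    """Build a trie as nested dicts, one level per letter."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def find_words(trie, letters):
    """Return every word in trie that can be spelled from letters, using each letter at most once."""
    found = set()

    def walk(node, prefix, remaining):
        if END in node:
            found.add(prefix)
        # set() avoids exploring the same child twice for repeated letters.
        for ch in set(remaining):
            child = node.get(ch)
            if child is not None:
                walk(child, prefix + ch, remaining.replace(ch, "", 1))

    walk(trie, "", letters)
    return sorted(found, key=lambda w: (len(w), w))

# Hypothetical tiny word list, for illustration only.
trie = build_trie(["he", "hell", "hello", "hollow", "world", "held", "xyz"])
print(find_words(trie, "helloworld"))
# ['he', 'held', 'hell', 'hello', 'world', 'hollow']
```

Because the walk only follows children that exist in the trie, it never generates permutations that cannot be the start of a valid word.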
I couldn't resist writing my own implementation. It creates a dictionary by sorting the letters of each word alphabetically and mapping the sorted letters to the words that can be created from them. This is an O(n) start-up operation that eliminates the need to find all permutations. You could implement the dictionary as a trie in another language for further speedups.
The "getAnagrams" command is also an O(n) operation which checks each key in the dictionary to see if its letters are a subset of the search letters. Doing getAnagrams("radiotelegraphically") (a 20 letter word) took approximately 1 second on my laptop, and returned 1496 anagrams.
# Using the 38617 word dictionary at
# http://www.cs.umd.edu/class/fall2008/cmsc433/p5/Usr.Dict.Words.txt
# Usage: getAnagrams("helloworld")

def containsLetters(subword, word):
    # True if every letter of subword can be taken from word (respecting counts).
    wordlen = len(word)
    subwordlen = len(subword)
    if subwordlen > wordlen:
        return False
    word = list(word)
    for c in subword:
        try:
            index = word.index(c)
        except ValueError:
            return False
        word.pop(index)
    return True

def getAnagrams(word):
    output = []
    # Plain iteration over the dict works in Python 2 and 3 (iterkeys() is Python 2 only).
    for key in mydict:
        if containsLetters(key, word):
            output.extend(mydict[key])
    output.sort(key=len)
    return output

f = open("dict.txt")
wordlist = f.readlines()
f.close()

# Map each sorted-letter key to the list of words that share those letters.
mydict = {}
for word in wordlist:
    word = word.rstrip()
    temp = list(word)
    temp.sort()
    letters = ''.join(temp)
    if letters in mydict:
        mydict[letters].append(word)
    else:
        mydict[letters] = [word]
An example run:
>>> getAnagrams("helloworld")
>>> ['do', 'he', 'we', 're', 'oh', 'or', 'row', 'hew', 'her', 'hoe', 'woo', 'red', 'dew', 'led', 'doe', 'ode', 'low', 'owl', 'rod', 'old', 'how', 'who', 'rho', 'ore', 'roe', 'owe', 'woe', 'hero', 'wood', 'door', 'odor', 'hold', 'well', 'owed', 'dell', 'dole', 'lewd', 'weld', 'doer', 'redo', 'rode', 'howl', 'hole', 'hell', 'drew', 'word', 'roll', 'wore', 'wool','herd', 'held', 'lore', 'role', 'lord', 'doll', 'hood', 'whore', 'rowed', 'wooed', 'whorl', 'world', 'older', 'dowel', 'horde', 'droll', 'drool', 'dwell', 'holed', 'lower', 'hello', 'wooer', 'rodeo', 'whole', 'hollow', 'howler', 'rolled', 'howled', 'holder', 'hollowed']
The data structure you want is called a Directed Acyclic Word Graph (DAWG), and it is described by Andrew Appel and Guy Jacobson in their paper "The World's Fastest Scrabble Program", which unfortunately they have chosen not to make freely available online. An ACM membership or a university library will get it for you.
I have implemented this data structure in at least two languages---it is simple, easy to implement, and very, very fast.
A simple-minded approach is to generate all the "substrings" and, for each of them, check whether it's an element of the set of acceptable words. E.g., in Python 2.6:
import itertools
import urllib

def words():
    f = urllib.urlopen(
        'http://www.cs.umd.edu/class/fall2008/cmsc433/p5/Usr.Dict.Words.txt')
    allwords = set(w[:-1] for w in f)
    f.close()
    return allwords

def substrings(s):
    for i in range(2, len(s)+1):
        for p in itertools.permutations(s, i):
            yield ''.join(p)

def main():
    w = words()
    print '%d words' % len(w)
    ss = set(substrings('weep'))
    print '%d substrings' % len(ss)
    good = ss & w
    print '%d good ones' % len(good)
    sgood = sorted(good, key=lambda w:(len(w), w))
    for aword in sgood:
        print aword

main()
will emit:
38617 words
31 substrings
5 good ones
we
ewe
pew
wee
weep
Of course, as other responses pointed out, organizing your data purposefully can greatly speed up your runtime, although the best data organization for a fast anagram finder could well be different; that will largely depend on the nature of your dictionary of allowed words (a few tens of thousands, like here, or millions?). Hash-maps and "signatures" (based on sorting the letters in each word) should be considered, as well as tries &c.
What you want is an implementation of a power set.
Also look at Eric Lippert's blog; he blogged about this very thing a little while back.
EDIT:
Here is an implementation I wrote of getting the powerset from a given string...
private IEnumerable<string> GetPowerSet(string letters)
{
    char[] letterArray = letters.ToCharArray();
    for (int i = 0; i < Math.Pow(2.0, letterArray.Length); i++)
    {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < letterArray.Length; j++)
        {
            int pos = Convert.ToInt32(Math.Pow(2.0, j));
            if ((pos & i) == pos)
            {
                sb.Append(letterArray[j]);
            }
        }
        yield return new string(sb.ToString().ToCharArray().OrderBy(c => c).ToArray());
    }
}
This function gives me the power set of the chars that make up the passed-in string. I can then use these as keys into a dictionary of anagrams:
Dictionary<string,IEnumerable<string>>
I created my dictionary of anagrams like so... (there are probably more efficient ways, but this was simple and plenty quick enough with the scrabble tournament word list)
wordlist = (from s in fileText.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
let k = new string(s.ToCharArray().OrderBy(c => c).ToArray())
group s by k).ToDictionary(o => o.Key, sl => sl.Select(a => a));
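The same two steps (index words by sorted letters, then look up every letter subset) might look like this in Python. This is a sketch under assumptions, not a port of the C# above: build_index and anagrams_of_subsets are hypothetical names, itertools.combinations stands in for the bit-mask power set, and the word list is a small made-up sample.

```python
from itertools import combinations

def build_index(words):
    """Map each sorted-letter signature to the words that share it."""
    index = {}
    for word in words:
        index.setdefault("".join(sorted(word)), []).append(word)
    return index

def anagrams_of_subsets(letters, index, min_len=2):
    """Look up every distinct letter subset of letters in the signature index."""
    pool = sorted(letters)
    found = set()
    for size in range(min_len, len(pool) + 1):
        # Combinations of a sorted pool are themselves sorted, so each one is a valid signature.
        for combo in set(combinations(pool, size)):
            found.update(index.get("".join(combo), []))
    return sorted(found, key=lambda w: (len(w), w))

index = build_index(["we", "ewe", "pew", "wee", "weep", "keep"])
print(anagrams_of_subsets("weep", index))
# ['we', 'ewe', 'pew', 'wee', 'weep']
```

The number of subsets doubles with every extra letter, so this approach suits short racks better than long input strings.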
Like Tim J, Eric Lippert's blog posts were the first thing to come to my mind. I wanted to add that he wrote a follow-up about ways to improve the performance of his first attempt.
A nasality talisman for the sultana analyst
Santalic tailfans, part two
I believe the Ruby code in the answers to this question will also solve your problem.
I've been playing a lot of Wordfeud on my phone recently and was curious whether I could come up with some code to give me a list of possible words. The following code takes your available source letters (* for a wildcard) and an array with a master list of allowable words (TWL, SOWPODS, etc.) and generates a list of matches. It does this by trying to build each word in the master list from your source letters.
I found this topic after writing my code, and it's definitely not as efficient as John Pirie's method or the DAWG algorithm, but it's still pretty quick.
public IList<string> Matches(string sourceLetters, string [] wordList)
{
    sourceLetters = sourceLetters.ToUpper();
    IList<string> matches = new List<string>();
    foreach (string word in wordList)
    {
        if (WordCanBeBuiltFromSourceLetters(word, sourceLetters))
            matches.Add(word);
    }
    return matches;
}

public bool WordCanBeBuiltFromSourceLetters(string targetWord, string sourceLetters)
{
    string builtWord = "";
    foreach (char letter in targetWord)
    {
        int pos = sourceLetters.IndexOf(letter);
        if (pos >= 0)
        {
            builtWord += letter;
            sourceLetters = sourceLetters.Remove(pos, 1);
            continue;
        }
        // check for wildcard
        pos = sourceLetters.IndexOf("*");
        if (pos >= 0)
        {
            builtWord += letter;
            sourceLetters = sourceLetters.Remove(pos, 1);
        }
    }
    return string.Equals(builtWord, targetWord);
}
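The build-each-word check can also be expressed with letter counts. Below is a minimal Python sketch of that idea. The names can_build and matches are hypothetical, and upper-casing the target word as well as the source letters is an assumption (the C# above upper-cases only the source letters).

```python
from collections import Counter

def can_build(target, source):
    """True if target can be spelled from source, where "*" in source is a wildcard."""
    available = Counter(source.upper())
    wildcards = available.pop("*", 0)
    needed = Counter(target.upper())
    # Counter subtraction keeps only positive counts: the letters we are short of.
    missing = sum((needed - available).values())
    return missing <= wildcards

def matches(source, word_list):
    """Return the words in word_list that can be built from source."""
    return [w for w in word_list if can_build(w, source)]

print(matches("wo*d", ["word", "wood", "world"]))  # ['word', 'wood']
```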