Single user-defined function that preprocesses a Python list of strings (data cleaning)

I have the following list of strings:
my_list = ["This: is the first string", "This: is another String", "This: is the third string of words in the list!"]
I want to create a function that takes each string in my_list and removes the leading "This: " (the first 6 characters), punctuation, and stop words.
This is what I have tried:
def preprocess(any_list):
    [e[6:] for e in any_list]
    return any_list
    no_punct = [char for char in any_list if char not in string.punctuation]
    no_punct = ''.join(no_punct)
    clean_words = [word for word in no_punct.split() if word.lower() not in stopwords('english')]
    return clean_words
preprocess(my_list)
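Here is a minimal sketch of a working version. It makes two assumptions: every string either starts with exactly "This: " or should be left alone, and a small hand-written stop-word set is acceptable for illustration. The STOP_WORDS set below is hypothetical; with NLTK installed you would typically use set(stopwords.words("english")) from nltk.corpus instead.

Compared with the attempt above, it:
- keeps the result of each step instead of discarding it,
- strips punctuation from each string rather than comparing whole list elements against string.punctuation,
- returns only after every string has been processed.

```python
import string

# Hypothetical stand-in stop-word set; NLTK's English list is far larger.
STOP_WORDS = {"this", "is", "the", "of", "in", "a", "an", "and"}
PREFIX = "This: "

def preprocess(any_list, stop_words=STOP_WORDS):
    """Return one list of cleaned words per input string."""
    # A translation table that deletes every ASCII punctuation character
    strip_punct = str.maketrans("", "", string.punctuation)
    cleaned = []
    for text in any_list:
        # Drop the prefix only when it is actually present
        if text.startswith(PREFIX):
            text = text[len(PREFIX):]
        text = text.translate(strip_punct)
        # Compare in lowercase, but keep each word's original case
        words = [w for w in text.split() if w.lower() not in stop_words]
        cleaned.append(words)
    return cleaned

my_list = ["This: is the first string", "This: is another String",
           "This: is the third string of words in the list!"]
print(preprocess(my_list))
# [['first', 'string'], ['another', 'String'], ['third', 'string', 'words', 'list']]
```

Using str.translate removes all punctuation in one pass, which is simpler than joining a character-by-character comprehension. Kept words retain their original case, as "String" shows.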

Related

Swift 5 split string at integer index

You used to be able to use substring to get a portion of a string, but that has been deprecated in favor of String.Index. However, I can't figure out how to make a String.Index from an integer.
var str = "hellooo"
let newindex = str.index(after: 3)
str = str[newindex...str.endIndex]
No matter what the string is, I want the second group of three characters, so here str would contain "loo". How can I do this?
Drop the first three characters, then take the first three of the remaining characters:
let str = "helloo"
let secondThreeCharacters = String(str.dropFirst(3).prefix(3))
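Note that dropFirst and prefix do not crash on short strings; they simply return fewer characters ("hello" gives "lo", "hi" gives ""). Add a length check if strings with fewer than 6 characters should be handled differently.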

How do I find letters in words that are part of a string and remove them? (List comprehensions with if statements)

I'm trying to remove vowels from a string. Specifically, remove vowels from words that have more than 4 letters.
Here's my thought process:
(1) First, split the string into an array.
(2) Then, loop through the array and identify words that are more than 4 letters.
(3) Third, replace vowels with "".
(4) Lastly, join the array back into a string.
Problem: I don't think the code is looping through the array.
Can anyone find a solution?
def abbreviate_sentence(sent):
    split_string = sent.split()
    for word in split_string:
        if len(word) > 4:
            abbrev = word.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
    sentence = " ".join(abbrev)
    return sentence
print(abbreviate_sentence("follow the yellow brick road")) # => "fllw the yllw brck road"
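The loop in the attempt above does run. The real problem is that abbrev is overwritten on every pass and ends up holding a single string. Calling " ".join on a string then joins its individual characters.

The sketch below reproduces that behavior step by step. It assumes the " ".join and return lines run after the loop, as shown above.

```python
# Replay the first attempt's loop outside the function
split_string = "follow the yellow brick road".split()
abbrev = None
for word in split_string:
    if len(word) > 4:
        # Each long word overwrites the previous result
        abbrev = word.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")

print(abbrev)            # brck   (only the last long word survives)
print(" ".join(abbrev))  # b r c k   (join iterates over characters)
```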
I just figured out that the "abbrev = word.replace..." line was incomplete.
I changed it to:
abbrev = [words.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "") if len(words) > 4 else words for words in split_string]
I found part of the solution here: Find and replace string values in list.
This technique is called a list comprehension.
I also found List Comprehension with If Statement.
The new lines of code look like:
def abbreviate_sentence(sent):
    split_string = sent.split()
    # One comprehension handles every word, so no outer for loop is needed
    # (the loop only repeated the same work and left abbrev undefined for an empty sentence)
    abbrev = [words.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
              if len(words) > 4 else words for words in split_string]
    sentence = " ".join(abbrev)
    return sentence
print(abbreviate_sentence("follow the yellow brick road")) # => "fllw the yllw brck road"
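Here is an alternative sketch that is not part of the original answer. str.translate can delete all the vowels in one call, and a generator expression inside " ".join avoids building an intermediate list. Like the code above, it removes only lowercase vowels; add "AEIOU" to VOWELS if capital vowels should go too. The names VOWELS and _drop_vowels are illustrative.

```python
VOWELS = "aeiou"
# Translation table that deletes every character listed in VOWELS
_drop_vowels = str.maketrans("", "", VOWELS)

def abbreviate_sentence(sent):
    """Remove lowercase vowels from every word longer than 4 characters."""
    return " ".join(
        word.translate(_drop_vowels) if len(word) > 4 else word
        for word in sent.split()
    )

print(abbreviate_sentence("follow the yellow brick road"))  # fllw the yllw brck road
```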

Swift: How to write a variable or a value that is changing between parentheses

(In Swift) For example, with "A + D", I want the "A" part to stay the same while the value of D changes depending on, say, Hp. When Hp is "fd", the string should be "A + fd", and so on.
I mean something like "A + %s" % Hp in Python, as described here: What does %s mean in Python?
If you are talking about %s, that is a C-style format specifier, which expects a string variable or value in the argument list. In Swift, you compose strings using the "\(variable)" syntax, which is called string interpolation, as explained in the documentation:
String Interpolation
String interpolation is a way to construct a new String value from a
mix of constants, variables, literals, and expressions by including
their values inside a string literal. You can use string interpolation
in both single-line and multiline string literals. Each item that you
insert into the string literal is wrapped in a pair of parentheses,
prefixed by a backslash (\):
Source: official documentation
Example:
var myVar = "World"
var string = "Hello \(myVar)"
With non-strings:
let multiplier = 3
let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
// Output: message is "3 times 2.5 is 7.5"
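Because the question is framed in terms of Python's % operator, here is a small Python sketch putting the two ideas side by side. It shows %-formatting as in the question, and the f-string, which is Python's closest counterpart to Swift's "\(hp)" interpolation. f-strings require Python 3.6 or later, and the helper names label_percent and label_fstring are hypothetical.

```python
def label_percent(hp):
    # C-style formatting as in the question; the 1-tuple keeps % safe even if hp is a tuple
    return "A + %s" % (hp,)

def label_fstring(hp):
    # f-string: the expression in braces is evaluated and inserted, like Swift's "A + \(hp)"
    return f"A + {hp}"

# Same idea as the Swift multiplier example, with a non-string value
multiplier = 3
message = f"{multiplier} times 2.5 is {multiplier * 2.5}"

print(label_percent("fd"))  # A + fd
print(label_fstring("fd"))  # A + fd
print(message)              # 3 times 2.5 is 7.5
```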

Parse string to arguments

I have a character vector like so:
string = 'a(0:2), b(3), c(rand(4, 5)*0.1)';
I'd like to use this char array as input arguments to a function. The arguments would then be:
a(0:2)
b(3)
c(rand(4, 5)*0.1)
How can I parse the string into those input arguments?
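At first glance, one could split the string on the ', ' separator, but that obviously fails for the third argument, which itself contains ', '.
A simple solution is to use split as follows:
expressions = split(string, "), ");
Then add ")" to the end of every string in expressions except the last one, which already ends with ")". This works for this input, but it would break if an argument itself contained "), ".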
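A more robust approach, not taken from the answer above, is to track parenthesis depth and split only on commas at depth zero. Here is a sketch of that idea in Python rather than MATLAB; the function name split_top_level is hypothetical. It assumes only round parentheses matter, so it does not handle brackets or commas inside quoted text.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            # A top-level separator ends the current argument
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Whatever remains after the last separator is the final argument
    parts.append("".join(current).strip())
    return parts

print(split_top_level("a(0:2), b(3), c(rand(4, 5)*0.1)"))
# ['a(0:2)', 'b(3)', 'c(rand(4, 5)*0.1)']
```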

Problem inserting a string at the beginning of an already-filled StringBuilder (using Insert) and then converting it to a string array (C# 3.0)

I have a StringBuilder like this:
StringBuilder sb = new StringBuilder("Value1");
sb.AppendLine("Value2");
Now I have a string, say:
string str = "value 0";
I did
sb.Insert(0,str);
and then
string[] strArr = sb.ToString().Trim().Replace("\r", string.Empty).Split('\n');
The result I am getting is (an array of size 2, where I expect 3):
[0] value 0 Value1
[1] value2
But the desired output is:
[0] Value 0
[1] Value1
[2] Value2
Where am I going wrong?
I am using C# 3.0.
Please help, it's urgent.
Thanks
The method StringBuilder.Insert does not insert a new line automatically so you have to add one yourself:
string str = "value 0" + Environment.NewLine;
Actually, you would get an array of size one. You put "Value1" in the StringBuilder when you create it, then you add "Value2" and a line break, making the string "Value1Value2\r\n" (assuming a CR+LF line break for this example). Then you insert "value 0" at the beginning, making the string "value 0Value1Value2\r\n". Trimming the string removes the trailing line break, and splitting on a character that no longer exists in the string gives you an array with only one item:
[0] value 0Value1Value2
The Insert method doesn't add a line break like AppendLine does, so you have to add the line break manually:
StringBuilder sb = new StringBuilder();
sb.AppendLine("Value1");
sb.AppendLine("Value2");
string str = "value 0";
sb.Insert(0, str + Environment.NewLine);
Now you can trim and split the string:
string[] strArr =
sb.ToString()
.Trim()
.Split(new string[]{ Environment.NewLine }, StringSplitOptions.None);
You are inserting "value 0" at the very start, so the first line begins with "value 0Value1".
Insert only inserts a string at the specified position; it does not work the same as AppendLine. Because no line break is added, your split won't work as you intended.