Swift: How to identify and delete prepositions in a string - swift

I am trying to identify key words in user input to search for, so I thought of filtering out some parts of speech in order to extract the key words to query in my database.
Currently I use the code below to remove the word "of" from a string:
let rawString = "I’m jealous of my parents. I’ll never have a kid as cool as theirs, one who is smart, has devilishly good looks, and knows all sorts of funny phrases."
var filtered = self.rawString.replacingOccurrences(of: "of", with: "")
What I want to do now is extend it to remove all prepositions in a string.
What I was thinking of doing is creating a huge list of known prepositions like
let prepositions = ["in","through","after","under","beneath","before"......]
and then splitting the string by whitespace with
var WordList : [String] = filtered.components(separatedBy: " ")
and then looping through the word list to find prepositional matches and deleting them. Creating the list will be ugly and might not be efficient for my code.
What is the best way to detect and delete prepositions from a string?

Use NaturalLanguage:
import NaturalLanguage
let text = "The ripe taste of cheese improves with age."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
var newSentence = [String]()
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    // Skip untagged tokens and prepositions; keep every other word
    guard let tag = tag, tag != .preposition else { return true }
    newSentence.append("\(text[tokenRange])")
    return true
}
print("Input: \(text)")
print("Output: \(newSentence.joined(separator: " "))")
This prints:
Input: The ripe taste of cheese improves with age.
Output: The ripe taste cheese improves age
Notice that the two prepositions, of and with, are removed. My approach also removes the punctuation because of the .omitPunctuation option; leave that option out if you want the punctuation tokens enumerated as well.

var newString = rawString
    .split(separator: " ")
    .filter { !prepositions.contains(String($0)) }
    .joined(separator: " ")
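The list-based approach from the question can be sketched as follows. This is only a minimal illustration, not a replacement for the tagger: the `prepositions` set here is a short, deliberately incomplete sample, and the function name `removingPrepositions(from:)` is made up for this example. It lowercases each word and strips surrounding punctuation before the lookup, so "Of" and "of," are both caught, and it uses a `Set` so each lookup is constant time.

```swift
import Foundation

// Illustrative, deliberately incomplete sample list (an assumption for this sketch).
let prepositions: Set<String> = ["in", "of", "with", "through", "after", "under", "beneath", "before"]

/// Removes every space-separated word that, once lowercased and stripped of
/// surrounding punctuation, appears in `prepositions`.
func removingPrepositions(from text: String) -> String {
    return text
        .split(separator: " ")
        .filter { word in
            let key = word.lowercased().trimmingCharacters(in: .punctuationCharacters)
            return !prepositions.contains(key)
        }
        .joined(separator: " ")
}

print(removingPrepositions(from: "The ripe taste of cheese improves with age."))
// The ripe taste cheese improves age.
```

Unlike the NLTagger version, a fixed list cannot tell whether a word such as "in" is actually used as a preposition in a given sentence, so it may remove words that should stay.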

Related

How to lemmatize a single word in Swift

How do you get the stem form of a single word token? Here is my code. It works for some words, but not others.
let text = "people" // works
// let text = "geese" // doesn't work
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text
let (tag, range) = tagger.tag(at: text.startIndex, unit: .word, scheme: .lemma)
let stemForm = tag?.rawValue ?? String(text[range])
However, if I lemmatize the entire text it's able to find all the stem forms of words.
let text = "This is text with plurals such as geese, people, and millennia."
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text
var words: [String] = []
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lemma, options: [.omitWhitespace, .omitPunctuation]) { tag, range in
    let stemForm = tag?.rawValue ?? String(text[range])
    words += [stemForm]
    return true
}
// this be text with plural such as goose person and millennium
words.joined(separator: " ")
Also, is it possible to reverse the process and find the plural version of a stem word?
If you set the language of the text before tagging it, it works:
tagger.string = text
tagger.setLanguage(.english, range: text.startIndex..<text.endIndex)
let (tag, range) = tagger.tag(at: text.startIndex, unit: .word, scheme: .lemma)
Without setting a language, the tagger guesses the language. Apparently, just "geese" alone is too little information for it to guess that it is English. If you check dominantLanguage without setting the language explicitly, it is apparently Dutch.

How to check if a string contains a substring within an array of strings in Swift?

I have a string "A very nice beach" and I want to be able to see if it contains any words of the substring within the array of wordGroups.
let string = "A very nice beach"
let wordGroups = [
    "beach",
    "waterfront",
    "with a water view",
    "near ocean",
    "close to water"
]
The first solution matches the exact word or phrase in wordGroups, using a regex:
var isMatch = false
for word in wordGroups {
    let regex = "\\b\(word)\\b"
    if string.range(of: regex, options: .regularExpression) != nil {
        isMatch = true
        break
    }
}
As suggested in the comments, the above loop can be replaced with a shorter version using contains:
let isMatch = wordGroups.contains {
    string.range(of: "\\b\($0)\\b", options: .regularExpression) != nil
}
The second solution simply tests whether the string contains any of the strings in the array:
let isMatch2 = wordGroups.contains(where: string.contains)
So for "A very nice beach" both return true, but for "Some very nice beaches" only the second one returns true.
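One caveat with the regex variant is that each phrase is interpolated into the pattern unescaped, so a phrase containing a regex metacharacter such as "." would not be matched literally. A hedged sketch of the same whole-word check with escaping is shown here; the function name `containsWholePhrase(_:anyOf:)` and the "st. tropez" phrase are made up for illustration.

```swift
import Foundation

/// Returns true if `text` contains any of `phrases` as a whole word or phrase.
func containsWholePhrase(_ text: String, anyOf phrases: [String]) -> Bool {
    return phrases.contains { phrase in
        // Escape the phrase so characters such as "." are matched literally.
        let pattern = "\\b" + NSRegularExpression.escapedPattern(for: phrase) + "\\b"
        return text.range(of: pattern, options: .regularExpression) != nil
    }
}

let groups = ["beach", "waterfront", "st. tropez"]

print(containsWholePhrase("A very nice beach", anyOf: groups))        // true
print(containsWholePhrase("Some very nice beaches", anyOf: groups))   // false
print(containsWholePhrase("a flat near stx tropez", anyOf: groups))   // false
print(groups.contains { "Some very nice beaches".contains($0) })      // true
```

The last line is the plain substring test for comparison: it accepts "beaches" because "beach" occurs inside it.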
Wasn't too sure how to interpret "to see if it contains any words of the substring within the array of wordGroups", but this solution checks to see if any words of your input string are contained in any substring of your word groups.
func containsWord(str: String, wordGroups: [String]) -> Bool {
    // Get all the words from your input string
    let words = str.split(separator: " ")
    for group in wordGroups {
        // Put all the words in the group into set to improve lookup time
        let set = Set(group.split(separator: " "))
        for word in words {
            if set.contains(word) {
                return true
            }
        }
    }
    return false
}

Swift 5.1 - is there a clean way to deal with locations of substrings/ pattern matches

I'm very, very new to Swift and admittedly struggling with some of its constructs. I have to work with a text file and do many manipulations - here's an example to illustrate the point:
Let's say I have a multi-line text like this:
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
x----------------x
I want to be able to do simple things like find the location of #name, then split it to get the name, and so on. I've done this in JavaScript and it was pretty simple with the use of substr and regex matches.
In Swift, which is supposed to be swift and easy and whatnot, I'm finding this exceedingly confusing.
Can someone help with how one might do the following?
Find the location of the start of a substring
Extract all text from the end of a substring to the end of the text
Sorry if this is trivial, but the Apple documentation feels very complicated, and lots of examples are years old. I also can't seem to find an easy application of regex.
You can use the string's range(of:) method to find the range of the substring, take its upperBound, and search for the end of the line from that position:
Playground testing:
let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""
if let start = sentence.range(of: "#name:")?.upperBound,
   let end = sentence[start...].range(of: "\n")?.lowerBound {
    let substring = sentence[start..<end]
    print("name:", substring)
}
If you need to get the string from there to the end of the string you can use PartialRangeFrom:
if let start = sentence.range(of: "#summary:")?.upperBound {
    let substring = sentence[start...]
    print("summary:", substring)
}
If you find yourself using that a lot you can extend StringProtocol and create your own method:
extension StringProtocol {
    func substring<S: StringProtocol, T: StringProtocol>(between start: S, and end: T, options: String.CompareOptions = []) -> SubSequence? {
        guard
            let lower = range(of: start, options: options)?.upperBound,
            let upper = self[lower...].range(of: end, options: options)?.lowerBound
        else { return nil }
        return self[lower..<upper]
    }
    func substring<S: StringProtocol>(after string: S, options: String.CompareOptions = []) -> SubSequence? {
        guard let lower = range(of: string, options: options)?.upperBound else { return nil }
        return self[lower...]
    }
}
Usage:
let name = sentence.substring(between: "#name:", and: "\n") // " a name"
let summary = sentence.substring(after: "#summary:") // " a paragraph of text\n{{something}}\na whole bunch of multi-line text"
You can use regular expressions as well:
let name = sentence.substring(between: "#\\w+:", and: "\\n", options: .regularExpression) // " a name"
You can do this with range() and distance():
let str = "Example string"
let range = str.range(of: "amp")!
print(str.distance(from: str.startIndex, to: range.lowerBound)) // 2
let lastStr = str[range.upperBound...]
print(lastStr) // "le string"
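Since the question also asks for an easy application of regex, here is a hedged sketch that collects every "#key: value" line of the sample text into a dictionary with NSRegularExpression capture groups. The function name `taggedFields(in:)` and the pattern are assumptions for this example, not part of the answers above.

```swift
import Foundation

let sample = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""

/// Collects every line of the form "#key: value" into a dictionary.
func taggedFields(in text: String) -> [String: String] {
    // .anchorsMatchLines makes ^ and $ match at the start and end of each line.
    guard let regex = try? NSRegularExpression(pattern: "^#(\\w+):[ \\t]*(.*)$",
                                               options: [.anchorsMatchLines]) else { return [:] }
    var fields: [String: String] = [:]
    let whole = NSRange(text.startIndex..., in: text)
    for match in regex.matches(in: text, options: [], range: whole) {
        // Convert the NSRange of each capture group back to a Swift string range.
        if let keyRange = Range(match.range(at: 1), in: text),
           let valueRange = Range(match.range(at: 2), in: text) {
            fields[String(text[keyRange])] = String(text[valueRange])
        }
    }
    return fields
}

let fields = taggedFields(in: sample)
print(fields["name"] ?? "")     // a name
print(fields["summary"] ?? "")  // a paragraph of text
```

Converting between NSRange and Range<String.Index> through the `in:` initializers keeps the offsets correct even when the text contains characters outside the Basic Multilingual Plane.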

How to get all middle names In a name

Say, I have the following fullnames:
1) Whitney Rajakanya SiriVana Giovendi
2) Cheryl Thompson Winston
How to retrieve the middlename from the above respective fullname?
Example:
There are 2 middle names in name (1), and there is one middle name in name(2)
I used this code but it didn't get the middle name.
var components = fullName.components(separatedBy: " ")
if (components.count > 0)
{
    let firstName = components.removeFirst()
}
Problem:
1) How do I get all the middle names in a name? Some names have one or more (as shown above).
Thanks
If you define "middle name" as everything except the first and last word in a name, then you can split the string by spaces, dropFirst and dropLast, then join the result.
var components = fullName.components(separatedBy: " ")
if (components.count <= 2) {
    // no middle name
} else {
    let middleName = components.dropFirst().dropLast().joined(separator: " ")
}
You can also make use of PersonNameComponentsFormatter if the names are from various locales and you need different ways of handling each of them.
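The split-and-drop idea can also be wrapped in a small function that returns the middle names as an array. This is a sketch under the same definition of "middle name" (everything between the first and last word); the function name `middleNames(in:)` is made up for this example.

```swift
import Foundation

/// Returns every word between the first and last word of `fullName`.
func middleNames(in fullName: String) -> [String] {
    // Splitting on whitespace and dropping empty pieces tolerates repeated spaces.
    let parts = fullName.components(separatedBy: .whitespaces).filter { !$0.isEmpty }
    guard parts.count > 2 else { return [] }
    return Array(parts.dropFirst().dropLast())
}

print(middleNames(in: "Whitney Rajakanya SiriVana Giovendi")) // ["Rajakanya", "SiriVana"]
print(middleNames(in: "Cheryl Thompson Winston"))             // ["Thompson"]
print(middleNames(in: "Jean-Luc Picard"))                     // []
```

Returning an array rather than a joined string makes it easy to count the middle names or handle them one by one.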
You can define a greedy regular expression and get the middle names easily, like:
let namesArray = ["Whitney Rajakanya SiriVana Giovendi", "Cheryl Thompson Winston", "James T. Kirk", "Jean-Luc Picard", "J. Archer"]
if let regExp = try? NSRegularExpression(pattern: " (.*) ", options: .caseInsensitive) {
    namesArray.forEach { (name) in
        // NSRange counts UTF-16 code units, so use utf16.count rather than count
        regExp.matches(in: name, options: [], range: NSRange(location: 0, length: name.utf16.count)).forEach({ (textCheckingResult) in
            guard textCheckingResult.numberOfRanges > 1 else { return }
            let middleNames = (name as NSString).substring(with: textCheckingResult.range(at: 1))
            print(middleNames)
        })
    }
}
Then you can see the middle names printed out, like:
Rajakanya SiriVana
Thompson
T.
That could be a clean and sleek solution.
NOTE: it is not clear whether abbreviated middle names (such as "T.") should be filtered out or not, but you can get the gist and will be able to extend this concept at your convenience.

How to use NSStringEnumerationOptions.ByWords with punctuation

I'm using this code to find the NSRange and text content of the string contents of a NSTextField.
nstext.enumerateSubstringsInRange(NSMakeRange(0, nstext.length),
    options: NSStringEnumerationOptions.ByWords, usingBlock: {
        (substring, substringRange, _, _) -> () in
        //Do something with substring and substringRange
})
The problem is that NSStringEnumerationOptions.ByWords ignores punctuation, so that
Stop clubbing, baby seals
becomes
"Stop" "clubbing" "baby" "seals"
not
"Stop" "clubbing," "baby" "seals"
If all else fails I could just check the characters before or after a given word and see if they are on the exempted list (where would I find which characters .ByWords exempts?); but there must be a more elegant solution.
How can I find the NSRanges of a set of words, from a string which includes the punctuation as part of the word?
You can use componentsSeparatedByString instead
var arr = nstext.componentsSeparatedByString(" ")
Output :
"Stop" "clubbing," "baby" "seals"
Inspired by Richa's answer, I used componentsSeparatedByString(" "). I had to add a bit of code to make it work for me, since I wanted the NSRanges from the output. I also wanted it to still work if there were two instances of the same word - e.g. 'please please stop clubbing, baby seals'.
Here's what I did:
var words: [String] = []
var ranges: [NSRange] = []
//nstext is a String I converted to a NSString
words = nstext.componentsSeparatedByString(" ")
//apologies for the poor naming
var nstextLessWordsWeHaveRangesFor = nstext
for word in words
{
    let range:NSRange = nstextLessWordsWeHaveRangesFor.rangeOfString(word)
    ranges.append(range)
    //create a string the same length as word so that the 'ranges' don't change in the future (if I just replace it with "" then the future ranges will be wrong after removing the substring)
    var fillerString:String = ""
    for var i=0;i<word.characters.count;++i{
        fillerString = fillerString.stringByAppendingString(" ")
    }
    nstextLessWordsWeHaveRangesFor = nstextLessWordsWeHaveRangesFor.stringByReplacingCharactersInRange(range, withString: fillerString)
}
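The code above uses Swift 2 era APIs. In current Swift, one way to get the same words and NSRanges is to keep a running UTF-16 offset while walking the space-separated pieces, which avoids the search-and-blank-out step and handles repeated words naturally. This is only a sketch; the function name `spaceSeparatedTokens(in:)` is made up for this example, and it assumes words are separated by plain space characters.

```swift
import Foundation

/// Splits `text` on spaces and returns each non-empty token with its NSRange,
/// keeping attached punctuation ("clubbing,") as part of the token.
func spaceSeparatedTokens(in text: String) -> [(word: String, range: NSRange)] {
    var result: [(word: String, range: NSRange)] = []
    var location = 0  // running UTF-16 offset, the unit NSRange uses
    for piece in text.components(separatedBy: " ") {
        let length = piece.utf16.count
        if length > 0 {
            result.append((word: piece, range: NSRange(location: location, length: length)))
        }
        location += length + 1  // skip the token and the single space after it
    }
    return result
}

let tokens = spaceSeparatedTokens(in: "please please stop clubbing, baby seals")
for token in tokens {
    print(token.word, token.range.location, token.range.length)
}
```

Each range can be passed straight to NSString or NSAttributedString APIs, since the offsets are counted in UTF-16 code units.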