How do you get the stem form of a single word token? Here is my code. It works for some words, but not others.
let text = "people" // works
// let text = "geese" // doesn't work
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text
let (tag, range) = tagger.tag(at: text.startIndex, unit: .word, scheme: .lemma)
let stemForm = tag?.rawValue ?? String(text[range])
However, if I lemmatize the entire text it's able to find all the stem forms of words.
let text = "This is text with plurals such as geese, people, and millennia."
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text
var words: [String] = []
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lemma, options: [.omitWhitespace, .omitPunctuation]) { tag, range in
let stemForm = tag?.rawValue ?? String(text[range])
words += [stemForm]
return true
}
// this be text with plural such as goose person and millennium
words.joined(separator: " ")
Also, is it possible to reverse the process and find the plural version of a stem word?
If you set the language of the text before tagging it, it works:
tagger.string = text
tagger.setLanguage(.english, range: text.startIndex..<text.endIndex)
let (tag, range) = tagger.tag(at: text.startIndex, unit: .word, scheme: .lemma)
Without setting a language, the tagger guesses the language. Apparently, just "geese" alone is too little information for it to guess that it is English. If you check dominantLanguage without setting the language explicitly, it is apparently Dutch.
I have this code to lemmatize a word to find its stem version.
let text = "people"
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text
tagger.setLanguage(.english, range: text.startIndex..<text.endIndex)
let (tag, range) = tagger.tag(at: text.startIndex, unit: .word, scheme: .lemma)
let stemForm = tag?.rawValue ?? String(text[range]) // person
Is it possible to do the reverse in Swift and find the plural version of a word?
I get a message from my response like "your bill is: 10.00"
But I need to show only the number in bold (everything after the ":"). I know I could use Substring, but I don't understand exactly how to split the text and format it correctly.
my old test:
self.disclaimerLabel.attributedText = String(format: my).htmlAttributedString(withBaseFont: Font.overlineRegular07.uiFont, boldFont: Font.overlineBold02.uiFont, baseColor: Color.blueyGreyTwo.uiColor, boldColor: Color.blueyGreyTwo.uiColor)
How was my built? If it was assembled from two parts, set the attributes on each part before joining them.
If you receive my as a whole, you can access the substrings with
let parts = my.split(separator: ":")
parts[0] will be "your bill is"
parts[1] will be " 10.00" (note the leading space)
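As a minimal sketch of that split, assuming my holds the sample message from the question: limiting the split to the first ":" keeps a value that itself contains a colon in one piece, and trimming removes the leading space. The names label and amount are illustrative only.

```swift
import Foundation

let my = "your bill is: 10.00"

// Split only at the first ":" so a value containing ":" stays in one piece.
let parts = my.split(separator: ":", maxSplits: 1)

// Text before the colon; empty if the message itself is empty.
let label = parts.first.map(String.init) ?? ""

// Text after the colon, with the leading space trimmed; empty if there is no colon.
let amount = parts.count > 1 ? parts[1].trimmingCharacters(in: .whitespaces) : ""

print(label)   // your bill is
print(amount)  // 10.00
```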
The need to add styling to a single word or phrase is so common that it is worth having on hand a method to help you:
extension NSMutableAttributedString {
func apply(attributes: [NSAttributedString.Key: Any], to targetString: String) {
let nsString = self.string as NSString
let range = nsString.range(of: targetString)
guard range.length != 0 else { return }
self.addAttributes(attributes, range: range)
}
}
So then your only problem is discovering the stretch of text that you want to apply the attributes to. If you don't know that it is "10.00" then, as you've been told, you can find out by splitting the string at the colon-plus-space.
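A rough sketch of discovering that stretch of text, assuming the message always contains a colon followed by a space. The helper name amountRange(in:) is hypothetical; it returns both the target string (which could be passed to apply(attributes:to:)) and the equivalent NSRange.

```swift
import Foundation

// Hypothetical helper: returns the text after the first ": " and its NSRange,
// or nil when the separator is missing.
func amountRange(in message: String) -> (text: String, range: NSRange)? {
    guard let separator = message.range(of: ": ") else { return nil }
    let text = String(message[separator.upperBound...])
    // Attributed-string APIs expect an NSRange measured in UTF-16 units.
    let range = NSRange(separator.upperBound..., in: message)
    return (text, range)
}

if let found = amountRange(in: "your bill is: 10.00") {
    print(found.text, found.range.location, found.range.length)  // 10.00 14 5
}
```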
You can split your string at the ":" character and then apply text attributes to each part, like this:
let str = "your bill is: 10.00"
let splitArray = str.components(separatedBy: ":")
// The part before the colon keeps the default style.
let normalText = NSMutableAttributedString(string: splitArray[0] + ":")
// The part after the colon gets a bold font.
let boldText = splitArray[1]
let boldTextAtr = NSMutableAttributedString(string: boldText, attributes: [NSAttributedString.Key.font: UIFont.boldSystemFont(ofSize: 16.0)])
normalText.append(boldTextAtr)
let label = UILabel()
label.attributedText = normalText
label.attributedText will then display exactly what you want.
I am trying to identify key words in a user's entry to search for, so I thought of filtering out some parts of speech in order to extract the key words to query in my database.
Currently I use the code below to remove the word "of" from a string:
let rawString = "I’m jealous of my parents. I’ll never have a kid as cool as theirs, one who is smart, has devilishly good looks, and knows all sorts of funny phrases."
var filtered = self.rawString.replacingOccurrences(of: "of", with: "")
What I want to do now is extend it to remove all prepositions from a string.
What I was thinking of doing is creating a huge list of known prepositions like
let prepositions = ["in","through","after","under","beneath","before"......]
and then splitting the string by whitespace with
var WordList : [String] = filtered.components(separatedBy: " ")
and then looping through the wordlist to find a prepositional match and deleting it. Creating the list will be ugly and might not be efficient for my code.
What is the best way to detect and delete prepositions from a string?
Use NaturalLanguage:
import NaturalLanguage
let text = "The ripe taste of cheese improves with age."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
var newSentence = [String]()
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
guard let tag = tag, tag != .preposition else { return true }
newSentence.append("\(text[tokenRange])")
return true
}
print("Input: \(text)")
print("Output: \(newSentence.joined(separator: " "))")
This prints:
Input: The ripe taste of cheese improves with age.
Output: The ripe taste cheese improves age
Notice the two prepositions of and with are removed. My approach also removes the punctuation; you can adjust this with the .omitPunctuation option.
var newString = rawString
.split(separator: " ")
.filter{ !prepositions.contains(String($0))}
.joined(separator: " ")
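The list-based filter above matches words exactly, so "Of" or "of," would slip through. A small sketch of one way to loosen the match, assuming a prepositions collection like the one in the question (the list here is hypothetical and deliberately incomplete) and a helper name, removingPrepositions(from:), invented for illustration:

```swift
import Foundation

// Hypothetical, incomplete list; a Set gives constant-time lookups.
let prepositions: Set<String> = ["of", "with", "in", "through", "after", "under", "beneath", "before"]

func removingPrepositions(from text: String) -> String {
    return text.split(separator: " ")
        .filter { word in
            // Compare case-insensitively and ignore attached punctuation such as "of,".
            let bare = word.lowercased().trimmingCharacters(in: .punctuationCharacters)
            return !prepositions.contains(bare)
        }
        .joined(separator: " ")
}

print(removingPrepositions(from: "The ripe taste of cheese improves with age."))
// The ripe taste cheese improves age.
```

Unlike the NLTagger approach, this cannot tell a preposition from the same word used as another part of speech; it only checks membership in the list.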
I'm very, very new to Swift and admittedly struggling with some of its constructs. I have to work with a text file and do many manipulations - here's an example to illustrate the point:
let's say I have a text like this (multi line)
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
x----------------x
I want to be able to do simple things like find the location of #name, then split it to get the name, and so on. I've done this in JavaScript and it was pretty simple using substr and regex matches.
In Swift, which is supposed to be swift and easy and whatnot, I'm finding this exceedingly confusing.
Can someone help with how one might do
Find the location of the start of a substring
Extract all text from the end of a substring to the end of the text
Sorry if this is trivial, but the Apple documentation feels very complicated, and lots of examples are years old. I also can't seem to find an easy way to apply regex.
You can use the string's range(of:) method to find the range of the substring, take its upperBound, and search for the end of the line starting from that position:
Playground testing:
let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""
if let start = sentence.range(of: "#name:")?.upperBound,
let end = sentence[start...].range(of: "\n")?.lowerBound {
let substring = sentence[start..<end]
print("name:", substring)
}
If you need to get the string from there to the end of the string you can use PartialRangeFrom:
if let start = sentence.range(of: "#summary:")?.upperBound {
let substring = sentence[start...]
print("summary:", substring)
}
If you find yourself using that a lot you can extend StringProtocol and create your own method:
extension StringProtocol {
func substring<S:StringProtocol,T:StringProtocol>(between start: S, and end: T, options: String.CompareOptions = []) -> SubSequence? {
guard
let lower = range(of: start, options: options)?.upperBound,
let upper = self[lower...].range(of: end, options: options)?.lowerBound
else { return nil }
return self[lower..<upper]
}
func substring<S:StringProtocol>(after string: S, options: String.CompareOptions = []) -> SubSequence? {
guard
let lower = range(of: string, options: options)?.upperBound else { return nil }
return self[lower...]
}
}
Usage:
let name = sentence.substring(between: "#name:", and: "\n") // " a name"
let summary = sentence.substring(after: "#summary:") // " a paragraph of text\n\n{{something}}\n\na whole bunch of multi-line text"
You can use regular expressions as well:
let name = sentence.substring(between: "#\\w+:", and: "\\n", options: .regularExpression) // " a name"
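For the regex part of the question, here is a sketch that uses NSRegularExpression capture groups to collect every "#key: value" line into a dictionary. The pattern and the fields dictionary are assumptions made for this illustration, not part of the answer above.

```swift
import Foundation

let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""

// Matches lines of the form "#key: value"; group 1 is the key, group 2 the value.
// .anchorsMatchLines makes ^ and $ match at the start and end of each line.
let regex = try! NSRegularExpression(pattern: "^#(\\w+):[ \\t]*(.*)$", options: [.anchorsMatchLines])

var fields: [String: String] = [:]
let fullRange = NSRange(sentence.startIndex..., in: sentence)
for match in regex.matches(in: sentence, options: [], range: fullRange) {
    // Convert the NSRange of each capture group back to a Swift String range.
    if let keyRange = Range(match.range(at: 1), in: sentence),
       let valueRange = Range(match.range(at: 2), in: sentence) {
        fields[String(sentence[keyRange])] = String(sentence[valueRange])
    }
}

print(fields["name"] ?? "")     // a name
print(fields["summary"] ?? "")  // a paragraph of text
```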
You can do this with range() and distance():
let str = "Example string"
let range = str.range(of: "amp")!
print(str.distance(from: str.startIndex, to: range.lowerBound)) // 2
let lastStr = str[range.upperBound...]
print(lastStr) // "le string"
I am given a string like 4eysg22yl3kk and my output should be like this:
foureysgtwenty-twoylthreekk, or if I am given 0123 the output should be one hundred twenty-three. So basically, as I scan the string, I need to convert the numbers to words.
I do not know how to implement this in Swift as I iterate through the string. Any ideas?
You actually have two basic problems.
The first is converting a "number" to its "spelt out" value (i.e. 1 to one). This is actually easy to solve, as NumberFormatter has a spellOut number style
let formatter = NumberFormatter()
formatter.numberStyle = .spellOut
let text = formatter.string(from: NSNumber(value: 1))
which will result in "one", neat.
The other issue, though, is how do you separate the numbers from the text?
While I can find any number of solutions for "extracting" numbers or characters from a mixed String, I can't find one which returns both, split on their boundaries, so that, based on your input, we'd end up with ["4", "eysg", "22", "yl", "3", "kk"].
So, time to roll our own...
func breakApart(_ text: String, withPattern pattern: String) throws -> [String]? {
    // Use the caller's pattern; an invalid pattern is reported by throwing.
    let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    var parts: [String] = []
    // End of the previous match; text between it and the next match is a non-matching part.
    var cursor = text.startIndex
    // NSRange counts UTF-16 units, so build it from the string rather than from text.count.
    for match in regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text)) {
        guard let range = Range(match.range, in: text) else {
            return nil
        }
        // Keep any text before this match, including text before the first match.
        if cursor < range.lowerBound {
            parts.append(String(text[cursor..<range.lowerBound]))
        }
        parts.append(String(text[range]))
        cursor = range.upperBound
    }
    // Keep any text after the last match.
    if cursor < text.endIndex {
        parts.append(String(text[cursor...]))
    }
    return parts
}
Okay, so this is a little "dirty" (IMHO), but I can't seem to think of a better approach, hopefully someone will be kind enough to provide some hints towards one ;)
Basically, it uses a regular expression to find all the groups of numbers, then builds an array by cutting the string apart around the matching boundaries. Like I said, it's crude, but it gets the job done.
From there, we just need to map the results, spelling out the numbers as we go...
let formatter = NumberFormatter()
formatter.numberStyle = .spellOut
let value = "4eysg22yl3kk"
let pattern = "[0-9]+" // one or more consecutive digits
if let parts = try breakApart(value, withPattern: pattern) {
    let result = parts.map { (part) -> String in
        // Spell out numeric parts; pass everything else through unchanged.
        if let number = Int(part), let text = formatter.string(from: NSNumber(value: number)) {
            return text
        }
        return part
    }.joined(separator: " ")
    print(result)
}
This will end up printing four eysg twenty-two yl three kk. If you don't want the spaces, pass an empty string as the separator to joined (or call joined() with no argument).
I did this in Playgrounds, so it probably needs some cleaning up
I was able to solve my problem with nothing more than converting my String to an array and checking it character by character. When I found a digit, I saved it in a temporary String, and as soon as the next character was not a digit, I converted the collected digits to text.
let inputString = Array(string.lowercased())
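The rest of that snippet is not shown, so the following is only a sketch of how the character-by-character approach described above might look. The function name spellOutDigits(in:) and the fixed en_US locale are assumptions, and unlike the line above it does not lowercase the input.

```swift
import Foundation

func spellOutDigits(in input: String) -> String {
    let formatter = NumberFormatter()
    formatter.numberStyle = .spellOut
    formatter.locale = Locale(identifier: "en_US")  // assumed; pins the output language

    var result = ""
    var digits = ""  // run of consecutive digits collected so far

    // Converts the pending digit run to words and appends it to the result.
    func flushDigits() {
        guard !digits.isEmpty else { return }
        if let number = Int(digits), let words = formatter.string(from: NSNumber(value: number)) {
            result += words
        } else {
            result += digits  // e.g. a run too long for Int; keep it unchanged
        }
        digits = ""
    }

    for character in input {
        if character.isASCII && character.isNumber {
            digits.append(character)
        } else {
            flushDigits()
            result.append(character)
        }
    }
    flushDigits()  // handle a digit run at the very end of the input
    return result
}

print(spellOutDigits(in: "4eysg22yl3kk"))  // foureysgtwenty-twoylthreekk
print(spellOutDigits(in: "0123"))          // one hundred twenty-three
```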