I am trying to analyze a sentence and perform an action depending on its content.
Here are examples of sentences received from speech recognition. I wrote several, because we can't know for sure what the user is going to say, or whether they will follow the expected pattern at all.
var str = "20 minutes to take a shower"
var sentence = "seven minutes to make 10 last homeworks"
var sentence2 = "strum guitar for 15 minutes"
var plans = "launch with friend 12:15, then drawing lesson"
I want to extract "20 minutes", assign it to timeValue, and launch the timer.
I also want to assign the task to taskValue, to represent what I am doing. (I thought of getting the task value by removing "20 minutes" from the initial sentence.)
What do you think is the best way to work with the String that I need to analyze?
I thought of finding index ranges and then cutting/copying with the help of the indexes, but the result it returns is printed like this, and I don't know how to extract the index numbers (in this case, 0 through 10):
<NSSimpleRegularExpressionCheckingResult: 0x6000027f0140>{0, 10}{<NSRegularExpression: 0x600003cfcb40> [0-9]{1,} minutes 0x1}
How do I start the timer? We have to validate that a proper command was given. Then, once we get timeValue and taskValue back, how do we start the timer? (The whole process is: user pushes the button -> user speaks -> speech is recognized (and displayed in the on-screen label) -> sentence is analyzed (?) -> timer starts (?) and the task is displayed in the label (?).)
What architecture would you recommend for a speech analysis system? Maybe you know some articles on this topic?
Here's the logic for the speech detection.
var timeValue: Int = Int()
var taskValue: String = ""
func stringDeduction(of inputText: String) -> (Int?, String?) {
let pattern = "[0-9]{1,} minutes"
let regexOptions: NSRegularExpression.Options = [.caseInsensitive]
let matchingOptions: NSRegularExpression.MatchingOptions = [.reportCompletion]
// TODO - catch errors with regex
let regex = try! NSRegularExpression(pattern: pattern, options: regexOptions)
// } catch {
// print("error in regex")
// }
// NSRange counts UTF-16 code units, so derive it from the string's own indices (not utf8.count)
let range = NSRange(inputText.startIndex..., in: inputText)
// \d - matches any digit
// Pattern for time format like this 00:00
//let patternForTime = "[0-9]{1,}:[0-9]{1,2}"
if let matchIndex = regex.firstMatch(in: inputText, options: matchingOptions, range: range) {
print(matchIndex)
} else {
print("No match.")
}
// check whether the string matches and print one of two messages
if let match = regex.firstMatch(in: inputText, range: NSRange(inputText.startIndex..., in: inputText)) {
print("*: Match!")
} else {
print("*: No match.")
}
/* Question - how to use "matches" properly?!
if let match = regex.matches(in: testString, options: .reportCompletion ,range: NSRange(location: 0, length: testString.utf8.count)) {
print("*: Match!")
print(match)
} else {
print("*: No match.")
}
*/
func matches(for regex: String, in text: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex)
let results = regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
return results.map {
String(text[Range($0.range, in: text)!])
}
} catch {
print("invalid regex")
return []
}
}
var task = regex.stringByReplacingMatches(in: inputText, options: .withoutAnchoringBounds, range: range, withTemplate: "")
//var taskMutableString = NSMutableString(string: str)
//regex.replaceMatches(in: taskMutableString, options: .withoutAnchoringBounds, range: range, withTemplate: "")
//taskMutableString
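// Bail out instead of crashing on index 0 when the sentence contains no "<number> minutes"
guard let timeString = matches(for: pattern, in: inputText).first else { return (nil, nil) }
let time = Int(timeString.replacingOccurrences(of: " minutes", with: ""))
timeValue = time ?? Int()
// Only look for a leading "to" in the first 4 characters, without running past the end of a short string
let prefixEnd = task.index(task.startIndex, offsetBy: 4, limitedBy: task.endIndex) ?? task.endIndex
taskValue = task.replacingOccurrences(of: "to" , with: "", options: .caseInsensitive, range: task.startIndex..<prefixEnd)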
let taskReturn = taskValue
return (time, taskReturn)
// the regex ^[ \t]+|[ \t]+$ matches excess whitespace at the beginning or end of a line.
// what regex or string method matches excess whitespace at the beginning of a line?
}
Here's the extension that lets me index a string like an array, for example str[0...2]:
extension String {
subscript (bounds: CountableClosedRange<Int>) -> String {
let start = index(startIndex, offsetBy: bounds.lowerBound)
let end = index(startIndex, offsetBy: bounds.upperBound)
return String(self[start...end])
}
subscript (bounds: CountableRange<Int>) -> String {
let start = index(startIndex, offsetBy: bounds.lowerBound)
let end = index(startIndex, offsetBy: bounds.upperBound)
return String(self[start..<end])
}
}
Here's another approach I have tried (although it doesn't seem good):
let timeArray = ["one minute", "two minutes", "three minutes", "four minutes", "five minutes", "six minutes", "seven minutes", "eight minutes", "nine minutes"]
let timeValueCheck = "Answer: \(sentence.containsAny(of: timeArray) ?? "doesn't contain")"
//Dividing String into words
let abc: [String] = str.components(separatedBy: " ")
// Finding a number within an array of words
// Problem: if the sentence contains several numbers, this returns all of them, not only the needed time.
let numbers = abc.compactMap {
// convert each substring into an Int
return Int($0)
}
for i in 1...100 {
if str.contains(String(i) + " minutes") {
print(i)
}
}
Thank you for any help! I've been going insane with this task for 3 months. Also, if something is unclear or a bit messy, please tell me and I'll try to correct it.
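For what it's worth, the {0, 10} in the printout above is the match's NSRange, and its numbers are available as match.range.location and match.range.length. A minimal sketch of the extraction, assuming a hypothetical helper named parseCommand and a slightly widened pattern with a capture group (neither is from the code above):

```swift
import Foundation

// Hypothetical helper (not from the question): returns the minutes and the
// remaining task text, or nil when no "<number> minute(s)" phrase is present.
func parseCommand(_ input: String) -> (minutes: Int, task: String)? {
    guard let regex = try? NSRegularExpression(pattern: "([0-9]+) minutes?",
                                               options: [.caseInsensitive]) else { return nil }
    // NSRange counts UTF-16 code units, so build it from the string's own indices.
    let fullRange = NSRange(input.startIndex..., in: input)
    guard let match = regex.firstMatch(in: input, options: [], range: fullRange),
          let matchRange = Range(match.range, in: input),            // whole "20 minutes"
          let numberRange = Range(match.range(at: 1), in: input),    // just "20"
          let minutes = Int(String(input[numberRange])) else { return nil }
    // match.range.location and match.range.length are the plain Ints printed as {0, 10}.
    var task = input
    task.removeSubrange(matchRange)
    return (minutes, task.trimmingCharacters(in: .whitespaces))
}

if let command = parseCommand("20 minutes to take a shower") {
    print(command.minutes, command.task) // 20 to take a shower
}
```

A nil result can serve as the "no proper command given" check. When a command is found, the minutes could be handed to something like Timer.scheduledTimer(withTimeInterval:repeats:block:) and the task text to the label; stripping a leading "to" is left out of this sketch.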
Related
In my app I have a string like
1A11A1
I want to convert it to
1A1 1A1
There should be a space after the first 3 characters.
What I tried (where code = "1A11A1"):
let end = code.index(code.startIndex, offsetBy: code.count)
let range = code.startIndex..<end
if code.count < 3 {
code = code.replacingOccurrences(of: "(\\d+)", with: "$1", options: .regularExpression, range: range)
}
else {
code = code.replacingOccurrences(of: "(\\d{3})(\\d+)", with: "$1 $2", options: .regularExpression, range: range)
}
If your rule is that you want a "space after 3 characters," take three characters, add a space and then the rest:
let result = "\(code.prefix(3)) \(code.dropFirst(3))"
// "1A1 1A1"
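As a side note, the regex in the question cannot match because \d only matches digits, while the code contains letters. If the regex route is preferred, a sketch along these lines should work; the helper name spaceAfterThree and the pattern are illustrative assumptions, not part of the question:

```swift
import Foundation

// Hypothetical helper: inserts a space after the first three characters.
// Strings of three characters or fewer are returned unchanged, because
// the pattern requires at least one character after the first three.
func spaceAfterThree(_ code: String) -> String {
    return code.replacingOccurrences(of: "^(.{3})(.+)$",
                                     with: "$1 $2",
                                     options: .regularExpression)
}

print(spaceAfterThree("1A11A1")) // 1A1 1A1
```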
Rob's solution is fine. Just for the sake of completeness, there's also the option of using insert(" ", at: index), something like this:
extension String {
var postalCode: String {
var result = self
// Check that this string is the right length
guard result.count == 6 else {
return result
}
let index = result.index(result.startIndex, offsetBy: 3)
result.insert(" ", at: index)
return result
}
}
Test:
let str: String = "1A11A1"
print(str.postalCode) // prints 1A1 1A1
let str2: String = "1A1 1A1"
print(str2.postalCode) // prints 1A1 1A1 (doesn't change format)
let str3: String = "12345"
print(str3.postalCode) // prints 12345 (doesn't change format)
For the following code:
import Foundation
extension String {
var fullRange: NSRange {
return .init(self.startIndex ..< self.endIndex, in: self)
}
public subscript(range: Range<Int>) -> Self.SubSequence {
let st = self.index(self.startIndex, offsetBy: range.startIndex)
let ed = self.index(self.startIndex, offsetBy: range.endIndex)
let sub = self[st ..< ed]
return sub
}
func split(regex pattern: String) throws -> [String] {
let regex = try NSRegularExpression.init(pattern: pattern, options: [])
let fRange = self.fullRange
let match = regex.matches(in: self, options: [], range: fRange)
var list = [String]()
var start = 0
for m in match {
let r = m.range
let end = r.location
list.append(String(self[start ..< end]))
start = end + r.length
}
if start < self.count {
list.append(String(self[start ..< self.count]))
}
return list
}
}
print(try! "مرتفع جداً\nVery High".split(regex: "\n"))
The output should be:
["مرتفع جداً", "Very High"]
but instead it is:
["مرتفع جداً\n", "ery High"]
That's because the regex (in this case) matched the \n at offset 10 instead of 9.
Is there anything wrong in my code, or is it a bug in Swift's regex handling?
It's not a bug. You are trying to use Int indexes, which is error-prone and strongly discouraged in a Unicode environment.
This is the equivalent of your code with the proper String.Index type and the dedicated API to convert NSRange to Range<String.Index> and vice versa. The fullRange property and the Int subscript are no longer needed.
I just left out the print line. startIndex and endIndex are properties of String
extension String {
func split(regex pattern: String) throws -> [String] {
let regex = try NSRegularExpression(pattern: pattern)
let matches = regex.matches(in: self, range: NSRange(startIndex..., in: self))
var list = [String]()
var start = startIndex
for match in matches {
let range = Range(match.range, in: self)!
let end = range.lowerBound
list.append(String(self[start..<end]))
start = range.upperBound
}
if start < endIndex {
list.append(String(self[start..<endIndex]))
}
return list
}
}
print(try! "مرتفع جداً\nVery High".split(regex: "\n"))
The result is ["مرتفع جداً", "Very High"]
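I found the cause of this behavior.
Swift Strings differ from strings in most other languages: a Character is an extended grapheme cluster, so a single Character may consist of one or more Unicode scalars (which is what happened in my case, where a letter and its diacritic form one Character). NSRegularExpression reports offsets in UTF-16 code units, not Characters, so the two counts disagree. The solution is to slice a view whose offsets match, such as unicodeScalars for this text (or utf16 in general), instead of the String itself.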
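The mismatch between the two ways of counting can be reproduced with a short sketch. It uses a Latin stand-in ("e" followed by a combining accent) rather than the Arabic text from the question, purely so the offsets are easy to read; that substitution is an assumption of this example:

```swift
import Foundation

// "e" + U+0301 (combining acute accent) is one Character but two UTF-16 code units.
let text = "cafe\u{301}\nVery High"

// NSString offsets are measured in UTF-16 code units...
let nsRange = NSString(string: text).range(of: "\n")
// ...while String counts Characters (extended grapheme clusters).
let characterOffset = text.distance(from: text.startIndex,
                                    to: text.firstIndex(of: "\n")!)

print(text.count, text.utf16.count)        // 14 15
print(nsRange.location, characterOffset)   // 5 4

// Converting the NSRange with the dedicated initializer avoids the off-by-one.
let range = Range(nsRange, in: text)!
let firstLine = String(text[..<range.lowerBound])
print(firstLine == "cafe\u{301}")          // true
```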
I want to search an array of strings and get all strings containing a substring. The function should also work with a wildcard.
I wrote this function:
func wordcontains(word: String, from words: [String]) -> [String] {
// If there are wildcards, use the regex method
// Otherwise use the simple method, because it is much faster
let foundWords = words.filter { otherWord in
let wordregex = word.replacingOccurrences(of: "?", with: ".")
if (otherWord.range(of: "[A-Z]*\(wordregex)[A-Z]*", options: .regularExpression) != nil){
return true
}else {
return false
}
}
return foundWords
}
and it works like this:
input : wordcontains(word: "ARC?", from: ["BOU", "BAC", "ARCS", "ARCH", "TREE","ARCHE","PROUE"])
output : ["ARCS", "ARCH", "ARCHE"]
It works well with a small array, but I need to check an array of 300000 words and it takes a while.
What is the best way to optimize the regex / function?
Perhaps there is a better approach?
For your interest, the code I used for testing.
Create a Command Line Tool project.
import Foundation
func wordcontains(word: String, from words: [String]) -> [String] {
...(exactly the same code as yours)...
}
///Creating NSRegularExpression outside of the loop
func wordcontains2(word: String, from words: [String]) -> [String] {
let wordregex = word.replacingOccurrences(of: "?", with: ".")
let pattern = "[A-Z]*\(wordregex)[A-Z]*"
let regex: NSRegularExpression
do {
regex = try NSRegularExpression(pattern: pattern)
} catch {
fatalError(error.localizedDescription)
}
let foundWords = words.filter { otherWord in
regex.firstMatch(in: otherWord, range: NSRange(0..<otherWord.utf16.count)) != nil
}
return foundWords
}
/// Removing `[A-Z]*` from both ends as suggested in rmaddy's comment.
/// This assumes all words in the parameter `words` consist only of capital letters.
func wordcontains3(word: String, from words: [String]) -> [String] {
let wordregex = word.replacingOccurrences(of: "?", with: ".")
let regex: NSRegularExpression
do {
regex = try NSRegularExpression(pattern: wordregex)
} catch {
fatalError(error.localizedDescription)
}
let foundWords = words.filter { otherWord in
regex.firstMatch(in: otherWord, range: NSRange(0..<otherWord.utf16.count)) != nil
}
return foundWords
}
Generally, creating an instance of NSRegularExpression is an expensive operation, so moving it outside of the loop may improve the performance (of course, in case the regex does not change), but the effect is very limited.
And I added some code for testing.
func makeRandomWords(_ count: Int) -> [String] {
var words: [String] = []
for _ in 0..<count {
let len = Int.random(in: 3...5)
var word = ""
for _ in 0..<len {
let charCode = UInt32.random(in: UInt32(UInt8(ascii: "A"))...UInt32(UInt8(ascii: "Z")))
word.append(Character(UnicodeScalar(charCode)!))
}
words.append(word)
}
return words
}
let words = makeRandomWords(300_000) //I have found the number of words is `300000` after I wrote my comment...
do {
let date1 = Date()
let w1 = wordcontains(word: "ARC?", from: words)
let date2 = Date()
print(date2.timeIntervalSince(date1), w1)
let date3 = Date()
let w2 = wordcontains2(word: "ARC?", from: words)
let date4 = Date()
print(date4.timeIntervalSince(date3), w2)
let date5 = Date()
let w3 = wordcontains3(word: "ARC?", from: words)
let date6 = Date()
print(date6.timeIntervalSince(date5), w3)
}
Result:
6.443639039993286 ["ARCQJ", "ARCZB", "AARCI", "ARCR", "ARCR", "ARCQS", "ARCGM", "ARCKL", "UARCN", "FARCS", "ARCNA", "ARCZM", "PARCL", "ARCTA", "ARCS", "ARCE", "ARCG", "ARCE"]
1.7534430027008057 ["ARCQJ", "ARCZB", "AARCI", "ARCR", "ARCR", "ARCQS", "ARCGM", "ARCKL", "UARCN", "FARCS", "ARCNA", "ARCZM", "PARCL", "ARCTA", "ARCS", "ARCE", "ARCG", "ARCE"]
1.4359259605407715 ["ARCQJ", "ARCZB", "AARCI", "ARCR", "ARCR", "ARCQS", "ARCGM", "ARCKL", "UARCN", "FARCS", "ARCNA", "ARCZM", "PARCL", "ARCTA", "ARCS", "ARCE", "ARCG", "ARCE"]
The results may change because this code uses random words, but the elapsed times should not differ much between runs.
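The comments in the question's code mention using a simple method when there are no wildcards. A possible sketch of that idea follows; the name wordcontains4 is hypothetical and continues the numbering above, and no timing is claimed for it since it was not part of the measurements:

```swift
import Foundation

// Hypothetical variant: plain substring search when the word has no "?" wildcard,
// otherwise a regex compiled once outside the loop (as in wordcontains3).
func wordcontains4(word: String, from words: [String]) -> [String] {
    guard word.contains("?") else {
        // No wildcard: a regular expression is not needed at all.
        return words.filter { $0.contains(word) }
    }
    let pattern = word.replacingOccurrences(of: "?", with: ".")
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    return words.filter { otherWord in
        let range = NSRange(otherWord.startIndex..., in: otherWord)
        return regex.firstMatch(in: otherWord, options: [], range: range) != nil
    }
}

print(wordcontains4(word: "ARC?", from: ["BOU", "BAC", "ARCS", "ARCH", "TREE", "ARCHE", "PROUE"]))
// ["ARCS", "ARCH", "ARCHE"]
```

Like wordcontains3, this assumes the search word contains no other regex metacharacters.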
I have a string like this:
"te_st", and I would like to replace every underscore followed by a character with the uppercased version of that character.
From "te_st" --> find (regex: "_.") --> replace with the next char, uppercased ("s" -> "S") --> "teSt"
From "te_st" ---> to "teSt"
From "_he_l_lo" ---> to "HeLLo"
From "an_o_t_h_er_strin_g" ---> to "anOTHErStrinG"
... but I cannot get it working with Swift's NSRegularExpression, as this small snippet shows:
var result = "te_st" // result should be teSt
result = try! NSRegularExpression(pattern: "_*").stringByReplacingMatches(in: result, range: NSRange(0..<result.count), withTemplate: ("$1".uppercased()))
There's no regular expression syntax to convert a match to uppercase. The code you posted converts the string "$1" to uppercase, which is of course still "$1". It does not convert the value represented by the $1 match at runtime.
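This is easy to confirm in isolation. In the sketch below, the pattern "_(.)" with a capture group is an assumption (the question's pattern "_*" has no group), chosen so that $1 refers to something:

```swift
import Foundation

// uppercased() runs once, on the literal template text, before any matching happens.
let template = "$1".uppercased()
print(template == "$1") // true

let regex = try! NSRegularExpression(pattern: "_(.)")
let input = "te_st"
let output = regex.stringByReplacingMatches(in: input,
                                            options: [],
                                            range: NSRange(input.startIndex..., in: input),
                                            withTemplate: template)
// The underscore is dropped, but the captured letter is inserted unchanged.
print(output) // test
```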
Here's another approach using a regular expression to find the _ followed by a lowercase letter. Those are enumerated and replaced with the uppercase letter.
extension String {
func toCamelCase() -> String {
let expr = try! NSRegularExpression(pattern: "_([a-z])")
var res = self
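// NSRange is measured in UTF-16 code units, so build it from the string's indices rather than count
for match in expr.matches(in: self, range: NSRange(startIndex..., in: self)).reversed() {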
let range = Range(match.range, in: self)!
let letterRange = Range(match.range(at: 1), in: self)!
res.replaceSubrange(range, with: self[letterRange].uppercased())
}
return res
}
}
print("te_st".toCamelCase())
print("_he_l_lo".toCamelCase())
print("an_o_t_h_er_strin_g".toCamelCase())
This outputs:
teSt
HeLLo
anOTHErStrinG
Here is one implementation using NSRegularExpression. I use a capture group to get the character after _, capitalize it, and replace it in the string.
func capitalizeLetterAfterUnderscore(string: String) -> String {
var capitalizedString = string
guard let regularExpression = try? NSRegularExpression(pattern: "_(.)") else {
return capitalizedString
}
let matches = regularExpression.matches(in: string,
options: [],
range: NSRange(string.startIndex..., in: string))
for match in matches {
// Convert the UTF-16 based NSRange of capture group 1 into a Range<String.Index>
guard let range = Range(match.range(at: 1), in: capitalizedString) else { continue }
let capitalizedCharacter = String(capitalizedString[range]).capitalized
capitalizedString = capitalizedString.replacingCharacters(in: range,
with: capitalizedCharacter)
}
}
capitalizedString = capitalizedString.replacingOccurrences(of: "_", with: "")
return capitalizedString
}
capitalizeLetterAfterUnderscore(string: "an_o_t_h_er_strin_g") // anOTHErStrinG
And here is another one without regular expressions. I put the methods in an extension so they can be reused.
extension String {
func indexes(of character: String) -> [Index] {
precondition(character.count == 1, "character should be single letter string")
return enumerated().reduce([]) { (partial, component) in
let currentIndex = index(startIndex,
offsetBy: component.offset)
return String(self[currentIndex]) == character
? partial + [currentIndex]
: partial
}
}
func capitalizeLetter(after indexes: [Index]) -> String {
var modifiedString = self
for currentIndex in indexes {
guard let letterIndex = index(currentIndex,
offsetBy: 1,
limitedBy: endIndex),
letterIndex < endIndex // a trailing "_" has no letter after it
else { continue }
let range = letterIndex ... letterIndex
modifiedString = modifiedString.replacingCharacters(in: range,
with: self[range].capitalized)
}
return modifiedString
}
}
let string = "an_o_t_h_er_strin_g"
let newString = string.capitalizeLetter(after: string.indexes(of: "_"))
.replacingOccurrences(of: "_",with: "")
You can use the String method range(of:options:range:) with the .regularExpression option to find the occurrences of "_[a-z]", then iterate over the found ranges in reverse order and replace each subrange with the uppercased character that follows the range's lower bound:
let string = "an_o_t_h_er_strin_g"
let regex = "_[a-z]"
var start = string.startIndex
var ranges:[Range<String.Index>] = []
while let range = string.range(of: regex, options: .regularExpression, range: start..<string.endIndex) {
start = range.upperBound
ranges.append(range)
}
var finalString = string
for range in ranges.reversed() {
finalString.replaceSubrange(range, with: String(string[string.index(after: range.lowerBound)]).uppercased())
}
print(finalString) // "anOTHErStrinG\n"
The problem is that it is converting the string "$1" to upper case (which is, unsurprisingly unchanged, just "$1") and using "$1" as the template. If you want to use regex, you will have to enumerate through matches yourself.
The alternative is to split the string on _ characters, uppercase the first character of every substring (except the first), and join them back together using reduce:
let input = "te_st"
let output = input.components(separatedBy: "_").enumerated().reduce("") { $0 + ($1.0 == 0 ? $1.1 : $1.1.uppercasedFirst()) }
Or, if your goal isn't to write code as cryptic as most regex, we can make that a tad more legible:
let output = input
.components(separatedBy: "_")
.enumerated()
.reduce("") { result, current in
if current.offset == 0 {
return current.element // because you don’t want the first component capitalized
} else {
return result + current.element.uppercasedFirst()
}
}
Resulting in:
teSt
Note that this uses the following extension for capitalizing the first character:
extension String {
func uppercasedFirst(with locale: Locale? = nil) -> String {
guard count > 0 else { return self }
return String(self[startIndex]).uppercased(with: locale) + self[index(after: startIndex)...]
}
}
If you want to do a sort of dynamic conversion with NSRegularExpression, you can subclass NSRegularExpression and override replacementString(for:in:offset:template:):
class ToCamelRegularExpression: NSRegularExpression {
override func replacementString(for result: NSTextCheckingResult, in string: String, offset: Int, template templ: String) -> String {
if let range = Range(result.range(at: 1), in: string) {
return string[range].uppercased()
} else {
return super.replacementString(for: result, in: string, offset: offset, template: templ)
}
}
}
func toCamelCase(_ input: String) -> String { //Make this a String extension if you prefer...
let regex = try! ToCamelRegularExpression(pattern: "_(.)")
return regex.stringByReplacingMatches(in: input, options: [], range: NSRange(0..<input.utf16.count), withTemplate: "$1")
}
print(toCamelCase("te_st")) //-> teSt
print(toCamelCase("_he_l_lo")) //-> HeLLo
print(toCamelCase("an_o_t_h_er_strin_g")) //-> anOTHErStrinG
I have many strings, like this:
'This is a "table". There is an "apple" on the "table".'
I want to replace "table", "apple" and "table" with spaces. Is there a way to do it?
A simple regular expression:
let sentence = "This is \"table\". There is an \"apple\" on the \"table\""
let pattern = "\"[^\"]+\"" //everything between " and "
let replacement = "____"
let newSentence = sentence.replacingOccurrences(
of: pattern,
with: replacement,
options: .regularExpression
)
print(newSentence) // This is ____. There is an ____ on the ____
If you want to keep the same number of characters, then you can iterate over the matches:
let sentence = "This is table. There is \"an\" apple on \"the\" table."
let regularExpression = try! NSRegularExpression(pattern: "\"[^\"]+\"", options: [])
let matches = regularExpression.matches(
in: sentence,
options: [],
range: NSRange(sentence.startIndex..., in: sentence) // String.characters no longer exists in current Swift
)
var newSentence = sentence
for match in matches {
let replacement = Array(repeating: "_", count: match.range.length - 2).joined()
newSentence = (newSentence as NSString).replacingCharacters(in: match.range, with: "\"" + replacement + "\"")
}
print(newSentence) // This is table. There is "__" apple on "___" table.
I wrote an extension to do this:
extension String {
mutating func replace(from: String, to: String, by new: String) {
guard let from = range(of: from)?.lowerBound, let to = range(of: to)?.upperBound else { return }
let range = from..<to
self = replacingCharacters(in: range, with: new)
}
func replaced(from: String, to: String, by new: String) -> String {
guard let from = range(of: from)?.lowerBound, let to = range(of: to)?.upperBound else { return self }
let range = from..<to
return replacingCharacters(in: range, with: new)
}
}
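A usage sketch for this extension, under the assumption that the two delimiters are different strings. The non-mutating method is repeated here only so the example compiles on its own, and the bracketed sample text is made up for illustration:

```swift
import Foundation

extension String {
    // Same logic as replaced(from:to:by:) above, repeated so the sketch is self-contained.
    func replaced(from: String, to: String, by new: String) -> String {
        guard let from = range(of: from)?.lowerBound,
              let to = range(of: to)?.upperBound else { return self }
        return replacingCharacters(in: from..<to, with: new)
    }
}

// Distinct delimiters: everything from "[" through "]" is replaced.
print("Hello [world]!".replaced(from: "[", to: "]", by: "____")) // Hello ____!

// Identical delimiters: both searches find the first quote, so only that
// single character is replaced.
print("a \"b\" c".replaced(from: "\"", to: "\"", by: "_")) // a _b" c
```

Because both range(of:) calls search from the start of the string, only the first occurrence is handled, and a `to` that appears before `from` would produce an invalid range. For the quoted words in the question, the regular expression answers above are likely the better fit.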