Text Recognition - Matching Strings to Patterns


I am using Apple's text recognition example for reading phone numbers. I would like to change it so that, instead of phone numbers, it recognizes two different patterns: CMW followed by numbers and letters, or DWP followed by numbers and letters.
Here is the code I am using; I am unsure what to change:
import Foundation
extension Character {
// Given a list of allowed characters, try to convert self to those in list
// if not already in it. This handles some common misclassifications for
// characters that are visually similar and can only be correctly recognized
// with more context and/or domain knowledge. Some examples (should be read
// in Menlo or some other font that has different symbols for all characters):
// 1 and l are the same character in Times New Roman
// I and l are the same character in Helvetica
// 0 and O are extremely similar in many fonts
// oO, wW, cC, sS, pP and others only differ by size in many fonts
func getSimilarCharacterIfNotIn(allowedChars: String) -> Character {
let conversionTable = [
"s": "S",
"S": "5",
"5": "S",
"o": "O",
"Q": "O",
"O": "0",
"0": "O",
"l": "I",
"I": "1",
"1": "I",
"B": "8",
"8": "B"
]
// Allow a maximum of two substitutions to handle 's' -> 'S' -> '5'.
let maxSubstitutions = 2
var current = String(self)
var counter = 0
while !allowedChars.contains(current) && counter < maxSubstitutions {
if let altChar = conversionTable[current] {
current = altChar
counter += 1
} else {
// Doesn't match anything in our table. Give up.
break
}
}
return current.first!
}
}
extension String {
// Extracts the first US-style phone number found in the string, returning
// the range of the number and the number itself as a tuple.
// Returns nil if no number is found.
func extractPhoneNumber() -> (Range<String.Index>, String)? {
// Do a first pass to find any substring that could be a US phone
// number. This will match the following common patterns and more:
// xxx-xxx-xxxx
// xxx xxx xxxx
// (xxx) xxx-xxxx
// (xxx)xxx-xxxx
// xxx.xxx.xxxx
// xxx xxx-xxxx
// xxx/xxx.xxxx
// +1-xxx-xxx-xxxx
// Note that this doesn't only look for digits since some digits look
// very similar to letters. This is handled later.
let pattern = #"""
(?x) # Verbose regex, allows comments
(?:\+1-?)? # Potential international prefix, may have -
[(]? # Potential opening (
\b(\w{3}) # Capture xxx
[)]? # Potential closing )
[\ -./]? # Potential separator
(\w{3}) # Capture xxx
[\ -./]? # Potential separator
(\w{4})\b # Capture xxxx
"""#
guard let range = self.range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
// No phone number found.
return nil
}
// Potential number found. Strip out punctuation, whitespace and country
// prefix.
var phoneNumberDigits = ""
let substring = String(self[range])
let nsrange = NSRange(substring.startIndex..., in: substring)
do {
// Extract the characters from the substring.
let regex = try NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: substring, options: [], range: nsrange) {
for rangeInd in 1 ..< match.numberOfRanges {
let range = match.range(at: rangeInd)
let matchString = (substring as NSString).substring(with: range)
phoneNumberDigits += matchString as String
}
}
} catch {
print("Error \(error) when creating pattern")
}
// Must be exactly 10 digits (3 + 3 + 4 captured characters).
guard phoneNumberDigits.count == 10 else {
return nil
}
// Substitute commonly misrecognized characters, for example: 'S' -> '5' or 'l' -> '1'
var result = ""
let allowedChars = "0123456789"
for var char in phoneNumberDigits {
char = char.getSimilarCharacterIfNotIn(allowedChars: allowedChars)
guard allowedChars.contains(char) else {
return nil
}
result.append(char)
}
return (range, result)
}
func extractSerialNumber() -> (Range<String.Index>, String)? {
// Do a first pass to find any substring that could be a US phone
// number. This will match the following common patterns and more:
// xxx-xxx-xxxx
// xxx xxx xxxx
// (xxx) xxx-xxxx
// (xxx)xxx-xxxx
// xxx.xxx.xxxx
// xxx xxx-xxxx
// xxx/xxx.xxxx
// +1-xxx-xxx-xxxx
// Note that this doesn't only look for digits since some digits look
// very similar to letters. This is handled later.
let pattern = #"""
(?x) # Verbose regex, allows comments
(?:\+1-?)? # Potential international prefix, may have -
[(]? # Potential opening (
\b(\w{3}) # Capture xxx
[)]? # Potential closing )
[\ -./]? # Potential separator
(\w{3}) # Capture xxx
[\ -./]? # Potential separator
(\w{4})\b # Capture xxxx
"""#
guard let range = self.range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
// No phone number found.
return nil
}
// Potential number found. Strip out punctuation, whitespace and country
// prefix.
var phoneNumberDigits = ""
let substring = String(self[range])
let nsrange = NSRange(substring.startIndex..., in: substring)
do {
// Extract the characters from the substring.
let regex = try NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: substring, options: [], range: nsrange) {
for rangeInd in 1 ..< match.numberOfRanges {
let range = match.range(at: rangeInd)
let matchString = (substring as NSString).substring(with: range)
phoneNumberDigits += matchString as String
}
}
} catch {
print("Error \(error) when creating pattern")
}
// Must be exactly 10 digits.
guard phoneNumberDigits.count == 10 else {
return nil
}
// Substitute commonly misrecognized characters, for example: 'S' -> '5' or 'l' -> '1'
var result = ""
let allowedChars = "0123456789"
for var char in phoneNumberDigits {
char = char.getSimilarCharacterIfNotIn(allowedChars: allowedChars)
guard allowedChars.contains(char) else {
return nil
}
result.append(char)
}
return (range, result)
}
}
class StringTracker {
var frameIndex: Int64 = 0
typealias StringObservation = (lastSeen: Int64, count: Int64)
// Dictionary of seen strings. Used to get stable recognition before
// displaying anything.
var seenStrings = [String: StringObservation]()
var bestCount = Int64(0)
var bestString = ""
func logFrame(strings: [String]) {
for string in strings {
if seenStrings[string] == nil {
seenStrings[string] = (lastSeen: Int64(0), count: Int64(-1))
}
seenStrings[string]?.lastSeen = frameIndex
seenStrings[string]?.count += 1
print("Seen \(string) \(seenStrings[string]?.count ?? 0) times")
}
var obsoleteStrings = [String]()
// Go through strings and prune any that have not been seen in a while.
// Also find the (non-pruned) string with the greatest count.
for (string, obs) in seenStrings {
// Remove previously seen text after 30 frames (~1s).
if obs.lastSeen < frameIndex - 30 {
obsoleteStrings.append(string)
}
// Find the string with the greatest count.
let count = obs.count
if !obsoleteStrings.contains(string) && count > bestCount {
bestCount = Int64(count)
bestString = string
}
}
// Remove old strings.
for string in obsoleteStrings {
seenStrings.removeValue(forKey: string)
}
frameIndex += 1
}
func getStableString() -> String? {
// Require the recognizer to see the same string at least 10 times.
if bestCount >= 10 {
return bestString
} else {
return nil
}
}
func reset(string: String) {
seenStrings.removeValue(forKey: string)
bestCount = 0
bestString = ""
}
}
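One possible way to adapt the extractSerialNumber() stub, sketched under assumptions: the prefix is read as uppercase "CMW" or "DWP", and the rest of the serial is an unbroken run of ASCII letters and digits (no spaces or dashes). The question does not give the exact serial format, so treat the pattern as a starting point. It replaces the phone-number regex with a single alternation and drops the digit-substitution step, because getSimilarCharacterIfNotIn would turn legitimate letters such as "S" or "B" into digits.

```swift
import Foundation

extension String {
    /// Sketch of a serial-number extractor: finds the first word that starts
    /// with "CMW" or "DWP" followed by one or more ASCII letters or digits.
    /// Assumptions: the prefix is uppercase and the serial contains no spaces
    /// or dashes. Returns the range and the matched text, or nil.
    func extractSerialNumber() -> (Range<String.Index>, String)? {
        // \b on both sides stops matches inside longer words such as "XCMW12".
        let pattern = #"\b(?:CMW|DWP)[A-Za-z0-9]+\b"#
        guard let range = self.range(of: pattern, options: .regularExpression) else {
            // No serial number found.
            return nil
        }
        return (range, String(self[range]))
    }
}
```

You would call it wherever the sample currently calls extractPhoneNumber() on a recognized candidate string. If OCR tends to misread characters inside the prefix, the substitution table could still be applied to the first three characters only; that refinement is not shown here.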

Related

Get Length of a substring in string before certain character Swift

My main string looks like "90000+8000-1000*10". I want to find the length of each substring that contains a number and put those lengths into an array, so that:
print(substringLength[0]) //Show 5
print(substringLength[1]) //Show 4
Could anyone help me with this? Thanks in advance!

⚠️ Be careful with replacingOccurrences!
Although this method (suggested by @Raja Kishan) may work in some cases, it's not forward compatible and will fail if the input contains characters it doesn't handle (like other expression operators).
✅ Just write it as you say it:
let numbers = "90000+8000-1000*10".split { !$0.isWholeNumber && $0 != "." }
Now you have the numbers! Go ahead and count their lengths:
numbers[0].count // 5
numbers[1].count // 4
🎁 You can also have the operators like:
let operators = "90000+8000-1000*10".split { $0.isWholeNumber || $0 == "." }

You can split wherever the character is not a number.
The maxSplits parameter is used for performance, so you don't unnecessarily split the part of the input you don't need. There are also preconditions to handle bad input.
func substringLength(of input: String, at index: Int) -> Int {
precondition(index >= 0, "Index is negative")
let sections = input.split(maxSplits: index + 1, omittingEmptySubsequences: false) { char in
!char.isNumber
}
precondition(index < sections.count, "Out of range")
return sections[index].count
}
let str = "90000+8000-1000*10"
substringLength(of: str, at: 0) // 5
substringLength(of: str, at: 1) // 4
substringLength(of: str, at: 2) // 4
substringLength(of: str, at: 3) // 2
substringLength(of: str, at: 4) // Precondition failed: Out of range

If the set of signs (operators) is fixed, you can replace every sign with one common sign and then split the string by that sign.
Here is an example:
extension String {
func getSubStrings() -> [String] {
let commonSignStr = self.replacingOccurrences(of: "+", with: "-").replacingOccurrences(of: "*", with: "-")
return commonSignStr.components(separatedBy: "-")
}
}
let str = "90000+8000-1000*10"
str.getSubStrings().forEach({print($0.count)})
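If the operators really are a fixed set, Foundation can also split on all of them at once with a CharacterSet, which avoids chaining replacingOccurrences calls. A minimal sketch; the helper name operandLengths and the default operator list "+-*/" are assumptions for illustration:

```swift
import Foundation

extension String {
    /// Hypothetical helper: lengths of the operands between operator characters.
    /// Splits on every character in `operators` in one pass, instead of first
    /// rewriting each operator to "-". Extend the list if your expressions use
    /// other symbols.
    func operandLengths(operators: String = "+-*/") -> [Int] {
        components(separatedBy: CharacterSet(charactersIn: operators))
            .map { $0.count }
    }
}
```

Like the replacingOccurrences version, this keeps empty strings when two operators are adjacent or the input starts with one (for example a leading minus sign), so those show up as lengths of 0.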

I'd assume that the separators are not numbers, regardless of what they are.
let str = "90000+8000-1000*10"
let arr = str.split { !$0.isNumber }
let substringLength = arr.map { $0.count }
print(substringLength) // [5, 4, 4, 2]
print(substringLength[0]) //Show 5
print(substringLength[1]) //Show 4

Don't use the isNumber Character property. It would accept fraction characters as well as many others that are not the single digits 0...9. As its documentation notes:
For example, the following characters all represent numbers:
“7” (U+0037 DIGIT SEVEN)
“⅚” (U+215A VULGAR FRACTION FIVE SIXTHS)
“㊈” (U+3288 CIRCLED IDEOGRAPH NINE)
“𝟠” (U+1D7E0 MATHEMATICAL DOUBLE-STRUCK DIGIT EIGHT)
“๒” (U+0E52 THAI DIGIT TWO)
let numbers = "90000+8000-1000*10".split { !("0"..."9" ~= $0) } // ["90000", "8000", "1000", "10"]
let numbers2 = "90000+8000-1000*10 ५ ๙ 万 ⅚ 𝟠 ๒ ".split { !("0"..."9" ~= $0) } // ["90000", "8000", "1000", "10"]
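Combining the split-and-count approach from the earlier answers with this ASCII-only check gives a small helper. This is a sketch; the function name digitRunLengths is hypothetical:

```swift
/// Hypothetical helper: lengths of each run of ASCII digits 0-9 in `input`.
/// Every other character, including non-ASCII digits such as "⅚" or "𝟠",
/// is treated as a separator.
func digitRunLengths(in input: String) -> [Int] {
    let asciiDigits: ClosedRange<Character> = "0"..."9"
    // split omits empty pieces, so consecutive separators produce no zeros.
    return input.split { !asciiDigits.contains($0) }.map { $0.count }
}
```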

Analyzing speech and doing an action

I am trying to analyze a sentence and make an action depending on that.
Here are examples of sentences received from speech recognition. I wrote a couple of them because we don't know for sure what the user is going to say, or whether they will use the right pattern at all.
var str = "20 minutes to take a shower"
var sentence = "seven minutes to make 10 last homeworks"
var sentence2 = "strum guitar for 15 minutes"
var plans = "launch with friend 12:15, then drawing lesson"
I want to extract "20 minutes", assign it to timeValue and start the timer.
I also want to assign the task to taskValue, to represent the task that I am doing. (I thought of getting the task by removing "20 minutes" from the original sentence.)
What do you think is the best way to work with the String I need to analyze?
I thought of finding index ranges and then cutting/copying with the help of those indexes, but the result it returns looks like this, and I don't know how to extract the index numbers from it (in this case: 0 through 10):
<NSSimpleRegularExpressionCheckingResult: 0x6000027f0140>{0, 10}{<NSRegularExpression: 0x600003cfcb40> [0-9]{1,} minutes 0x1}
How do I start the timer? We have to validate that the proper command was given. Once we have timeValue and taskValue, how do we start the timer? (The whole flow is: user pushes button -> user speaks -> speech is recognized and displayed in a label -> sentence is analyzed(?) -> timer starts(?) and task is displayed in a label(?))
What architecture would you recommend for a speech analysis system? Do you know any articles on this topic?
Here's the logic for the speech detection.
var timeValue: Int = Int()
var taskValue: String = ""
func stringDeduction(of inputText: String) -> (Int?, String?) {
let pattern = "[0-9]{1,} minutes"
let regexOptions: NSRegularExpression.Options = [.caseInsensitive]
let matchingOptions: NSRegularExpression.MatchingOptions = [.reportCompletion]
// TODO - catch errors with regex
let regex = try! NSRegularExpression(pattern: pattern, options: regexOptions)
// } catch {
// print("error in regex")
// }
// NSRange counts UTF-16 code units, so build it from the String rather than utf8.count.
let range = NSRange(inputText.startIndex..., in: inputText)
// \d - matches any digit
// Pattern for time format like this 00:00
//let patternForTime = "[0-9]{1,}:[0-9]{1,2}"
if let matchIndex = regex.firstMatch(in: inputText, options: matchingOptions, range: range) {
print(matchIndex)
} else {
print("No match.")
}
// check whether the string matches and print one of two messages
if regex.firstMatch(in: inputText, range: NSRange(inputText.startIndex..., in: inputText)) != nil {
print("*: Match!")
} else {
print("*: No match.")
}
/* Question - how to use "mathces" properly?!
if let match = regex.matches(in: testString, options: .reportCompletion ,range: NSRange(location: 0, length: testString.utf8.count)) {
print("*: Match!")
print(match)
} else {
print("*: No match.")
}
*/
func matches(for regex: String, in text: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex)
let results = regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
return results.map {
String(text[Range($0.range, in: text)!])
}
} catch {
print("invalid regex")
return []
}
}
var task = regex.stringByReplacingMatches(in: inputText, options: .withoutAnchoringBounds, range: range, withTemplate: "")
//var taskMutableString = NSMutableString(string: str)
//regex.replaceMatches(in: taskMutableString, options: .withoutAnchoringBounds, range: range, withTemplate: "")
//taskMutableString
var timeStringArray = matches(for: pattern, in: inputText)
var timeString = timeStringArray[0]
let time = Int(timeString.replacingOccurrences(of: " minutes", with: ""))
timeValue = time ?? Int()
taskValue = task.replacingOccurrences(of: "to" , with: "", options: .caseInsensitive, range: task.startIndex..<task.index(task.startIndex, offsetBy: 4))
let taskReturn = taskValue
return (time, taskReturn)
// the regex ^[ \t]+|[ \t]+$ matches excess whitespace at the beginning or end of a line.
// what regex, or string method matches exess whitespace at the beginning of a line
}
Here's an extension for subscripting a String like an array, e.g. str[0...2]:
extension String {
subscript (bounds: CountableClosedRange<Int>) -> String {
let start = index(startIndex, offsetBy: bounds.lowerBound)
let end = index(startIndex, offsetBy: bounds.upperBound)
return String(self[start...end])
}
subscript (bounds: CountableRange<Int>) -> String {
let start = index(startIndex, offsetBy: bounds.lowerBound)
let end = index(startIndex, offsetBy: bounds.upperBound)
return String(self[start..<end])
}
}
Here's another approach I tried (although it doesn't seem good):
let timeArray = ["one minute", "two minutes", "three minutes", "four minutes", "five minutes", "six minutes", "seven minutes", "eight minutes", "nine minutes"]
let timeValueCheck = "Answer: \(sentence.containsAny(of: timeArray) ?? "doesn't contain")"
//Dividing String into words
let abc: [String] = str.components(separatedBy: " ")
// Finding a number within an array of words
// There's a problem if there are a couple of numbers in the sentence: it returns all of them, not only the needed time.
let numbers = abc.compactMap {
// convert each substring into an Int
return Int($0)
}
for i in 1...100 {
if str.contains(String(i) + " minutes") {
print(i)
}
}
Thank you for any help! I've been going insane over this task for 3 months. Also, if something is unclear or a bit messy, please tell me and I'll try to fix it.
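A minimal sketch of the parsing step, assuming the duration is always written with digits followed by "minute" or "minutes". It also touches the index question: an NSTextCheckingResult's range is an NSRange, whose location and length are the 0 and 10 printed as {0, 10}, and Range(_:in:) converts it (or a capture group's range(at:)) into String indices. The function name parseCommand and the connector words "to"/"for" are assumptions, not part of the question's code.

```swift
import Foundation

/// Hypothetical helper: splits a recognized sentence into a duration in
/// minutes and the remaining task text. Assumes the duration is written with
/// digits ("20 minutes"); spelled-out numbers such as "seven" are not handled
/// and make the function return nil.
func parseCommand(_ sentence: String) -> (minutes: Int, task: String)? {
    // Group 1 captures only the digits; "minutes?" also accepts "minute".
    guard let regex = try? NSRegularExpression(pattern: #"(\d+)\s+minutes?"#,
                                               options: [.caseInsensitive]) else {
        return nil
    }
    // Build the NSRange from the String itself so UTF-16 offsets are correct.
    let fullRange = NSRange(sentence.startIndex..., in: sentence)
    guard let match = regex.firstMatch(in: sentence, options: [], range: fullRange),
          // match.range is the {location, length} pair; convert it to String indices.
          let matchRange = Range(match.range, in: sentence),
          let digitsRange = Range(match.range(at: 1), in: sentence),
          let minutes = Int(sentence[digitsRange]) else {
        return nil
    }
    // The task is whatever is left once the duration is removed.
    var rest = sentence
    rest.removeSubrange(matchRange)
    var words = rest.split(separator: " ").map(String.init)
    // Drop a dangling connector word at either end (assumed list).
    let connectors: Set<String> = ["to", "for"]
    if let first = words.first, connectors.contains(first.lowercased()) {
        words.removeFirst()
    }
    if let last = words.last, connectors.contains(last.lowercased()) {
        words.removeLast()
    }
    return (minutes, words.joined(separator: " "))
}
```

With the result you could schedule a Foundation Timer, for example Timer.scheduledTimer(withTimeInterval: TimeInterval(minutes * 60), repeats: false) { _ in ... }, and show task in the label. Sentences like "seven minutes to make 10 last homeworks" would need a separate word-to-number step before this parser can handle them.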

Regex to do something only if string has prefix

I have the string "something/w1/w2/". I want to get all strings between "/" characters, but only if my string is prefixed by "something".
For example, if string is "something/w1/w2/" I want to get matches "w1", "w2".
And if it is "otherThing/w1/w2/" I don't want to get any matches.
Currently, I am using "(?<=something/).+?(?=/)", but in "something/w1/w2/" it returns only "w1". How can I also get "w2"?

You could match something at the start of the string, or get consecutive matches using the \G anchor, then match / and use a capturing group that matches any char except /.
The matches are in the first capturing group.
(?:^something|\G(?!^))/([^/\r\n]+)
With double escapes:
(?:^something|\\G(?!^))/([^/\\r\\n]+)
(?: Non capturing group
^something Match something from the start of the string
| Or
\G(?!^) Assert position at the end of previous match, not at the start
) Close non capturing group
/ match literally
([^/\r\n]+) Capture group 1 Match 1+ times any char except a / or newline

You can achieve this without a regex, in plain Swift: check whether the string has the prefix, and then split by slashes.
func extractStringsBetweenSlashes(from string: String, ifPrefix prefix: String) -> [Substring]? {
guard string.hasPrefix(prefix) else { return nil }
return string.dropFirst(prefix.count).split(separator: "/")
}
print(extractStringsBetweenSlashes(from: "something/a/b/c/d/e", ifPrefix: "something/")) // Optional(["a", "b", "c", "d", "e"])
print(extractStringsBetweenSlashes(from: "something/abcdef/", ifPrefix: "something/")) // Optional(["abcdef"])
print(extractStringsBetweenSlashes(from: "else/a/b/c/d/e", ifPrefix: "something/")) // nil

You may use
let string = "something/w1/w2/"
extension String {
func findconsecutiveMatches() -> [[String]] {
let regex = try? NSRegularExpression(pattern: "(?:(?<!\\A)\\G|^something)/([^/]+)", options: [])
if let matches = regex?.matches(in: self, options: [], range: NSRange(startIndex..., in: self)) {
return matches.map { match in
return (1..<match.numberOfRanges).map {
let rangeBounds = match.range(at: $0)
guard let range = Range(rangeBounds, in: self) else {
return ""
}
return String(self[range])
}
}
} else {
return []
}
}
}
let result = string.findconsecutiveMatches().flatMap { $0 }
print(result)
// => ["w1", "w2"]
The regex is
(?:(?<!\A)\G|^something)/([^/]+)
Details
(?:(?<!\A)\G|^something) - either the end of the previous match or something at the start of the string
/ - a / char
([^/]+) - Group 1: one or more chars other than /.

How to use Swift NSRegularExpression to get uppercased letter?

I have a string like this:
"te_st" and I'd like to replace every underscore followed by a character with the uppercased version of that character.
From "te_st" --> Found (regex: "_.") --------replace with next char (+ uppercase ("s"->"S")--------> "teSt"
From "te_st" ---> to "teSt"
From "_he_l_lo" ---> to "HeLLo"
From "an_o_t_h_er_strin_g" ---> to "anOTHErStrinG"
... but I can't get it working with Swift's NSRegularExpression, as this small snippet shows:
var result = "te_st" // result should be teSt
result = try! NSRegularExpression(pattern: "_*").stringByReplacingMatches(in: result, range: NSRange(0..<result.count), withTemplate: ("$1".uppercased()))

There's no regex template syntax to convert a match to uppercase. The code you posted converts the literal string "$1" to uppercase, which of course is still just "$1". It doesn't convert the value captured by $1 at runtime.
Here's another approach using a regular expression to find the _ followed by a lowercase letter. Those are enumerated and replaced with the uppercase letter.
extension String {
func toCamelCase() -> String {
let expr = try! NSRegularExpression(pattern: "_([a-z])")
var res = self
for match in expr.matches(in: self, range: NSRange(startIndex..., in: self)).reversed() {
let range = Range(match.range, in: self)!
let letterRange = Range(match.range(at: 1), in: self)!
res.replaceSubrange(range, with: self[letterRange].uppercased())
}
return res
}
}
print("te_st".toCamelCase())
print("_he_l_lo".toCamelCase())
print("an_o_t_h_er_strin_g".toCamelCase())
This outputs:
teSt
HeLLo
anOTHErStrinG

Here is one implementation using NSRegularExpression. I use a capture group to get the character after _, capitalize it, and replace it in the string.
func capitalizeLetterAfterUnderscore(string: String) -> String {
var capitalizedString = string
guard let regularExpression = try? NSRegularExpression(pattern: "_(.)") else {
return capitalizedString
}
let matches = regularExpression.matches(in: string,
options: .reportCompletion,
range: NSMakeRange(0, string.count))
for match in matches {
let groupRange = match.range(at: 1)
let index = groupRange.location
let characterIndex = string.index(string.startIndex,
offsetBy: index)
let range = characterIndex ... characterIndex
let capitalizedCharacter = String(capitalizedString[characterIndex]).capitalized
capitalizedString = capitalizedString.replacingCharacters(in: range,
with: capitalizedCharacter)
}
capitalizedString = capitalizedString.replacingOccurrences(of: "_", with: "")
return capitalizedString
}
capitalizeLetterAfterUnderscore(string: "an_o_t_h_er_strin_g") // anOTHErStrinG
And here is another one without a regular expression. I wrote the methods as an extension so they can be reused.
extension String {
func indexes(of character: String) -> [Index] {
precondition(character.count == 1, "character should be single letter string")
return enumerated().reduce([]) { (partial, component) in
let currentIndex = index(startIndex,
offsetBy: component.offset)
return String(self[currentIndex]) == character
? partial + [currentIndex]
: partial
}
}
func capitalizeLetter(after indexes: [Index]) -> String {
var modifiedString = self
for currentIndex in indexes {
guard let letterIndex = index(currentIndex,
offsetBy: 1,
limitedBy: endIndex),
letterIndex < endIndex // skip a trailing "_" with no letter after it
else { continue }
let range = letterIndex ... letterIndex
modifiedString = modifiedString.replacingCharacters(in: range,
with: self[range].capitalized)
}
return modifiedString
}
}
let string = "an_o_t_h_er_strin_g"
let newString = string.capitalizeLetter(after: string.indexes(of: "_"))
.replacingOccurrences(of: "_",with: "")

You can use String's range(of:options:range:) method with the .regularExpression option to find each occurrence of "_[a-z]", then iterate over the found ranges in reverse order and replace each one with the uppercased character that follows the underscore:
let string = "an_o_t_h_er_strin_g"
let regex = "_[a-z]"
var start = string.startIndex
var ranges:[Range<String.Index>] = []
while let range = string.range(of: regex, options: .regularExpression, range: start..<string.endIndex) {
start = range.upperBound
ranges.append(range)
}
var finalString = string
for range in ranges.reversed() {
finalString.replaceSubrange(range, with: String(string[string.index(after: range.lowerBound)]).uppercased())
}
print(finalString) // anOTHErStrinG

The problem is that it converts the string "$1" to uppercase (which, unsurprisingly, leaves it unchanged as "$1") and uses "$1" as the template. If you want to use a regex, you will have to enumerate the matches yourself.
The alternative is to split the string by _ characters, uppercase the first character of every substring (except the first one), and join them back together using reduce:
let input = "te_st"
let output = input.components(separatedBy: "_").enumerated().reduce("") { $0 + ($1.0 == 0 ? $1.1 : $1.1.uppercasedFirst()) }
Or, if your goal isn't to write code as cryptic as most regex, we can make that a tad more legible:
let output = input
.components(separatedBy: "_")
.enumerated()
.reduce("") { result, current in
if current.offset == 0 {
return current.element // because you don’t want the first component capitalized
} else {
return result + current.element.uppercasedFirst()
}
}
Resulting in:
teSt
Note that this uses the following extension for capitalizing the first character:
extension String {
func uppercasedFirst(with locale: Locale? = nil) -> String {
guard count > 0 else { return self }
return String(self[startIndex]).uppercased(with: locale) + self[index(after: startIndex)...]
}
}

If you want to do sort of dynamic conversion with NSRegularExpression, you can subclass NSRegularExpression and override replacementString(for:in:offset:template:):
class ToCamelRegularExpression: NSRegularExpression {
override func replacementString(for result: NSTextCheckingResult, in string: String, offset: Int, template templ: String) -> String {
if let range = Range(result.range(at: 1), in: string) {
return string[range].uppercased()
} else {
return super.replacementString(for: result, in: string, offset: 0, template: templ)
}
}
}
func toCamelCase(_ input: String) -> String { //Make this a String extension if you prefer...
let regex = try! ToCamelRegularExpression(pattern: "_(.)")
return regex.stringByReplacingMatches(in: input, options: [], range: NSRange(0..<input.utf16.count), withTemplate: "$1")
}
print(toCamelCase("te_st")) //-> teSt
print(toCamelCase("_he_l_lo")) //-> HeLLo
print(toCamelCase("an_o_t_h_er_strin_g")) //-> anOTHErStrinG

Is it possible to write a Swift function that replaces only part of an extended grapheme cluster like 👩‍👩‍👧‍👧?

I want to write a function that could be used like this:
let 👩‍👩‍👧‍👦 = "👩‍👩‍👧‍👧".replacingFirstOccurrence(of: "👧", with: "👦")
Given how odd both this string and Swift's String library are, is this possible in Swift?

Based on the insights gained at Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?, a sensible approach might be to replace Unicode scalars:
extension String {
func replacingFirstOccurrence(of target: UnicodeScalar, with replacement: UnicodeScalar) -> String {
let uc = self.unicodeScalars
guard let idx = uc.firstIndex(of: target) else { return self }
// Rebuild the string from scalars: prefix + replacement + suffix.
var result = String.UnicodeScalarView()
result.append(contentsOf: uc[..<idx])
result.append(replacement)
result.append(contentsOf: uc[uc.index(after: idx)...])
return String(result)
}
}
Example:
let family1 = "👩‍👩‍👧‍👦"
print(family1.unicodeScalars.map { String($0.value, radix: 16) })
// ["1f469", "200d", "1f469", "200d", "1f467", "200d", "1f466"]
let family2 = family1.replacingFirstOccurrence(of: "👧", with: "👦")
print(family2) // 👩‍👩‍👦‍👦
print(family2.unicodeScalars.map { String($0.value, radix: 16) })
// ["1f469", "200d", "1f469", "200d", "1f466", "200d", "1f466"]
And here is a possible version which locates and replaces the Unicode scalars of an arbitrary string:
extension String {
func replacingFirstOccurrence(of target: String, with replacement: String) -> String {
let uc = self.unicodeScalars
let tuc = target.unicodeScalars
// Target empty or too long:
if tuc.count == 0 || tuc.count > uc.count {
return self
}
// Current search position:
var pos = uc.startIndex
// One past the last possible start position of `tuc` within `uc`:
let end = uc.index(uc.endIndex, offsetBy: 1 - tuc.count)
// Locate the first Unicode scalar of the target:
while let from = uc[pos..<end].firstIndex(of: tuc.first!) {
// Compare all Unicode scalars:
let to = uc.index(from, offsetBy: tuc.count)
if uc[from..<to].elementsEqual(tuc) {
// Rebuild the string from scalars: prefix + replacement + suffix.
var result = String.UnicodeScalarView()
result.append(contentsOf: uc[..<from])
result.append(contentsOf: replacement.unicodeScalars)
result.append(contentsOf: uc[to...])
return String(result)
}
// Continue searching just after the rejected candidate:
pos = uc.index(after: from)
}
// Target not found.
return self
}
}

Using range(of:options:range:locale:), the solution becomes quite concise:
extension String {
func replaceFirstOccurrence(of searchString: String, with replacementString: String) -> String {
guard let range = self.range(of: searchString, options: .literal) else { return self }
return self.replacingCharacters(in: range, with: replacementString)
}
}
This works by first finding the range of searchString within the string; if a range is found, it is replaced with replacementString. Otherwise the string is returned unchanged. Since range(of:) returns as soon as it finds a match, the returned range is guaranteed to be the first occurrence.
"221".replaceFirstOccurrence(of: "2", with: "3") // 321
"👩‍👩‍👧‍👦".replaceFirstOccurrence(of: "\u{1f469}", with: "\u{1f468}") // 👨‍👩‍👧‍👦
*To clarify, the last test case converts woman-woman-girl-boy to man-woman-girl-boy.
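Since Swift 4, String.unicodeScalars is a mutable view, so the scalar-level idea from the first answer can also be written by editing that view in place. A sketch; the helper name replacingFirstScalar is hypothetical. It also shows why the question is tricky: the whole family emoji is a single Character but seven Unicode scalars.

```swift
/// Hypothetical helper: replaces the first occurrence of one Unicode scalar
/// with another by editing the string's mutable `unicodeScalars` view.
func replacingFirstScalar(_ target: Unicode.Scalar,
                          with replacement: Unicode.Scalar,
                          in string: String) -> String {
    var result = string
    if let i = result.unicodeScalars.firstIndex(of: target) {
        // Swap exactly one scalar; the zero-width joiners around it stay as-is.
        result.unicodeScalars.replaceSubrange(i...i, with: CollectionOfOne(replacement))
    }
    return result
}

// woman, ZWJ, woman, ZWJ, girl, ZWJ, boy
let family = "👩\u{200D}👩\u{200D}👧\u{200D}👦"
// family.count is 1 (one grapheme cluster); family.unicodeScalars.count is 7.
let updated = replacingFirstScalar("👧", with: "👦", in: family)
print(updated)
```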

We Keep Coding
