Is there a specific API to get the next letter of the alphabet after a given character?
Example:
If "Somestring".characters.first results in "S", then it should
return "T".
If there is none, I guess I have to iterate through a collection of the alphabet and return the next character in order. Or is there a better solution?
If you think of the Latin capital letters "A" ... "Z" then the
following should work:
func nextLetter(_ letter: String) -> String? {
// Check if the string is built from exactly one Unicode scalar:
guard let uniCode = UnicodeScalar(letter) else {
return nil
}
switch uniCode {
case "A" ..< "Z":
return String(UnicodeScalar(uniCode.value + 1)!)
default:
return nil
}
}
It returns the next Latin capital letter if there is one,
and nil otherwise. It works because the Latin capital letters
have consecutive Unicode scalar values.
(Note that UnicodeScalar(uniCode.value + 1)! cannot fail in that
range.) The guard statement handles both multi-character
strings and extended grapheme clusters (such as flags "🇩🇪").
You can use
case "A" ..< "Z", "a" ..< "z":
if lowercase letters should be covered as well.
Examples:
nextLetter("B") // C
nextLetter("Z") // nil
nextLetter("€") // nil
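The lowercase variant mentioned above can be sketched as a single function. This is only a minimal sketch under the same assumption (plain Latin letters); the name nextLatinLetter is hypothetical and not part of the original answer.

```swift
// Hypothetical variant of nextLetter that covers "A"..."Z" and "a"..."z".
func nextLatinLetter(_ letter: String) -> String? {
    // Fails for empty strings, multi-character strings and multi-scalar clusters.
    guard let scalar = Unicode.Scalar(letter) else {
        return nil
    }
    switch scalar {
    case "A" ..< "Z", "a" ..< "z":
        // The successor is still a valid scalar inside these ranges.
        guard let next = Unicode.Scalar(scalar.value + 1) else { return nil }
        return String(Character(next))
    default:
        return nil
    }
}

print(nextLatinLetter("y") ?? "nil") // z
print(nextLatinLetter("z") ?? "nil") // nil
```

Half-open ranges keep "Z" and "z" out of the match, so the last letter of each case returns nil instead of "[" or "{".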
func nextChar(str:String) {
if let firstChar = str.unicodeScalars.first {
let nextUnicode = firstChar.value + 1
if let var4 = UnicodeScalar(nextUnicode) {
var nextString = ""
nextString.append(Character(UnicodeScalar(var4)))
print(nextString)
}
}
}
nextChar(str: "A") // B
nextChar(str: "ζ") // η
nextChar(str: "z") // {
I am using the Apple example on text recognition/reading phone numbers. I would like to change it so that instead of recognizing phone numbers it recognizes two different patterns, CMW followed by numbers and letters or DWP followed by numbers and letters.
Here is what I am using that I am unsure what to change:
import Foundation
extension Character {
// Given a list of allowed characters, try to convert self to those in list
// if not already in it. This handles some common misclassifications for
// characters that are visually similar and can only be correctly recognized
// with more context and/or domain knowledge. Some examples (should be read
// in Menlo or some other font that has different symbols for all characters):
// 1 and l are the same character in Times New Roman
// I and l are the same character in Helvetica
// 0 and O are extremely similar in many fonts
// oO, wW, cC, sS, pP and others only differ by size in many fonts
func getSimilarCharacterIfNotIn(allowedChars: String) -> Character {
let conversionTable = [
"s": "S",
"S": "5",
"5": "S",
"o": "O",
"Q": "O",
"O": "0",
"0": "O",
"l": "I",
"I": "1",
"1": "I",
"B": "8",
"8": "B"
]
// Allow a maximum of two substitutions to handle 's' -> 'S' -> '5'.
let maxSubstitutions = 2
var current = String(self)
var counter = 0
while !allowedChars.contains(current) && counter < maxSubstitutions {
if let altChar = conversionTable[current] {
current = altChar
counter += 1
} else {
// Doesn't match anything in our table. Give up.
break
}
}
return current.first!
}
}
extension String {
// Extracts the first US-style phone number found in the string, returning
// the range of the number and the number itself as a tuple.
// Returns nil if no number is found.
func extractPhoneNumber() -> (Range<String.Index>, String)? {
// Do a first pass to find any substring that could be a US phone
// number. This will match the following common patterns and more:
// xxx-xxx-xxxx
// xxx xxx xxxx
// (xxx) xxx-xxxx
// (xxx)xxx-xxxx
// xxx.xxx.xxxx
// xxx xxx-xxxx
// xxx/xxx.xxxx
// +1-xxx-xxx-xxxx
// Note that this doesn't only look for digits since some digits look
// very similar to letters. This is handled later.
let pattern = #"""
(?x) # Verbose regex, allows comments
(?:\+1-?)? # Potential international prefix, may have -
[(]? # Potential opening (
\b(\w{3}) # Capture xxx
[)]? # Potential closing )
[\ -./]? # Potential separator
(\w{3}) # Capture xxx
[\ -./]? # Potential separator
(\w{4})\b # Capture xxxx
"""#
guard let range = self.range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
// No phone number found.
return nil
}
// Potential number found. Strip out punctuation, whitespace and country
// prefix.
var phoneNumberDigits = ""
let substring = String(self[range])
let nsrange = NSRange(substring.startIndex..., in: substring)
do {
// Extract the characters from the substring.
let regex = try NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: substring, options: [], range: nsrange) {
for rangeInd in 1 ..< match.numberOfRanges {
let range = match.range(at: rangeInd)
let matchString = (substring as NSString).substring(with: range)
phoneNumberDigits += matchString as String
}
}
} catch {
print("Error \(error) when creating pattern")
}
// Must be exactly 10 digits.
guard phoneNumberDigits.count == 10 else {
return nil
}
// Substitute commonly misrecognized characters, for example: 'S' -> '5' or 'l' -> '1'
var result = ""
let allowedChars = "0123456789"
for var char in phoneNumberDigits {
char = char.getSimilarCharacterIfNotIn(allowedChars: allowedChars)
guard allowedChars.contains(char) else {
return nil
}
result.append(char)
}
return (range, result)
}
func extractSerialNumber() -> (Range<String.Index>, String)? {
// Do a first pass to find any substring that could be a US phone
// number. This will match the following common patterns and more:
// xxx-xxx-xxxx
// xxx xxx xxxx
// (xxx) xxx-xxxx
// (xxx)xxx-xxxx
// xxx.xxx.xxxx
// xxx xxx-xxxx
// xxx/xxx.xxxx
// +1-xxx-xxx-xxxx
// Note that this doesn't only look for digits since some digits look
// very similar to letters. This is handled later.
let pattern = #"""
(?x) # Verbose regex, allows comments
(?:\+1-?)? # Potential international prefix, may have -
[(]? # Potential opening (
\b(\w{3}) # Capture xxx
[)]? # Potential closing )
[\ -./]? # Potential separator
(\w{3}) # Capture xxx
[\ -./]? # Potential separator
(\w{4})\b # Capture xxxx
"""#
guard let range = self.range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
// No phone number found.
return nil
}
// Potential number found. Strip out punctuation, whitespace and country
// prefix.
var phoneNumberDigits = ""
let substring = String(self[range])
let nsrange = NSRange(substring.startIndex..., in: substring)
do {
// Extract the characters from the substring.
let regex = try NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: substring, options: [], range: nsrange) {
for rangeInd in 1 ..< match.numberOfRanges {
let range = match.range(at: rangeInd)
let matchString = (substring as NSString).substring(with: range)
phoneNumberDigits += matchString as String
}
}
} catch {
print("Error \(error) when creating pattern")
}
// Must be exactly 10 digits.
guard phoneNumberDigits.count == 10 else {
return nil
}
// Substitute commonly misrecognized characters, for example: 'S' -> '5' or 'l' -> '1'
var result = ""
let allowedChars = "0123456789"
for var char in phoneNumberDigits {
char = char.getSimilarCharacterIfNotIn(allowedChars: allowedChars)
guard allowedChars.contains(char) else {
return nil
}
result.append(char)
}
return (range, result)
}
}
class StringTracker {
var frameIndex: Int64 = 0
typealias StringObservation = (lastSeen: Int64, count: Int64)
// Dictionary of seen strings. Used to get stable recognition before
// displaying anything.
var seenStrings = [String: StringObservation]()
var bestCount = Int64(0)
var bestString = ""
func logFrame(strings: [String]) {
for string in strings {
if seenStrings[string] == nil {
seenStrings[string] = (lastSeen: Int64(0), count: Int64(-1))
}
seenStrings[string]?.lastSeen = frameIndex
seenStrings[string]?.count += 1
print("Seen \(string) \(seenStrings[string]?.count ?? 0) times")
}
var obsoleteStrings = [String]()
// Go through strings and prune any that have not been seen in while.
// Also find the (non-pruned) string with the greatest count.
for (string, obs) in seenStrings {
// Remove previously seen text after 30 frames (~1s).
if obs.lastSeen < frameIndex - 30 {
obsoleteStrings.append(string)
}
// Find the string with the greatest count.
let count = obs.count
if !obsoleteStrings.contains(string) && count > bestCount {
bestCount = Int64(count)
bestString = string
}
}
// Remove old strings.
for string in obsoleteStrings {
seenStrings.removeValue(forKey: string)
}
frameIndex += 1
}
func getStableString() -> String? {
// Require the recognizer to see the same string at least 10 times.
if bestCount >= 10 {
return bestString
} else {
return nil
}
}
func reset(string: String) {
seenStrings.removeValue(forKey: string)
bestCount = 0
bestString = ""
}
}
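As a starting point for the CMW/DWP requirement described in the question, a first-pass match could look like the sketch below. The function name extractSerialCandidate is hypothetical, and the suffix rule (one or more uppercase letters or digits) is an assumption, since the question does not state the exact length or character set. It does not apply the misrecognition table from the Character extension.

```swift
import Foundation

// Hypothetical helper: finds the first token that starts with CMW or DWP.
// Assumption: the prefix is followed by one or more uppercase letters or digits.
func extractSerialCandidate(from text: String) -> String? {
    let pattern = #"\b(?:CMW|DWP)[A-Z0-9]+\b"#
    guard let range = text.range(of: pattern, options: .regularExpression) else {
        // No candidate found.
        return nil
    }
    return String(text[range])
}

print(extractSerialCandidate(from: "S/N CMW12AB9 rev2") ?? "none") // CMW12AB9
```

If the serial has a fixed length, replacing `+` with an explicit count such as `{8}` would tighten the match.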
For example: I have the character "b" and I want to get "a", so "a" is the previous character.
let b: Character = "b"
let a: Character = b - 1 // Compilation error
It's actually pretty complicated to get the previous character from Swift's Character type because Character is actually comprised of one or more Unicode.Scalar values. Depending on your needs you could restrict your efforts to just the ASCII characters. Or you could support all characters comprised of a single Unicode scalar. Once you get into characters comprised of multiple Unicode scalars (such as the flag Emojis or various skin toned Emojis) then I'm not even sure what the "previous character" means.
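To make the scalar-count distinction concrete, the following small sketch (variable names are illustrative only) prints how many Unicode scalars make up a few sample characters:

```swift
let letter: Character = "b"
let flag: Character = "🇩🇪"          // two regional indicator scalars
let accented: Character = "e\u{301}" // "e" followed by a combining acute accent

print(letter.unicodeScalars.count)   // 1
print(flag.unicodeScalars.count)     // 2
print(accented.unicodeScalars.count) // 2
```

Only the first case has an obvious "previous" value; for the other two, decrementing either scalar produces something unrelated.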
Here is a pair of methods added to a Character extension that can handle ASCII and single-Unicode scalar characters.
extension Character {
var previousASCII: Character? {
if let ascii = asciiValue, ascii > 0 {
return Character(Unicode.Scalar(ascii - 1))
}
return nil
}
var previousScalar: Character? {
if unicodeScalars.count == 1 {
if let scalar = unicodeScalars.first, scalar.value > 0 {
if let prev = Unicode.Scalar(scalar.value - 1) {
return Character(prev)
}
}
}
return nil
}
}
Examples:
let b: Character = "b"
let a = b.previousASCII // Gives Optional("a")
let emoji: Character = "😆"
let previous = emoji.previousScalar // Gives Optional("😅")
I would like to get a func which will be able to reverse a string without affecting special characters, preferably using regex, ex:
Input: “Weather is cool 24/7” -> Output: “rehtaeW si looc 24/7”
Input: “abcd efgh” -> Output: “dcba hgfe”
Input: “a1bcd efg!h” -> Output: “d1cba hgf!e”
I was only able to write a version that reverses all characters without exceptions. I'm a beginner, and I don't know how to use regexes.
func reverseTheWord(reverse: String) -> String {
let parts = reverse.components(separatedBy: " ")
let reversed = parts.map{String($0.reversed())}
let reversedWord = reversed.joined(separator: " ")
return reversedWord
}
thanks in advance!
Here is a solution where I first check what type each word is (only letters, no letters, or a mix of letters and other characters) and handle each type differently.
The first two are self-explanatory. For the mixed case, I first reverse the word and remove all non-letters, and then reinsert the non-letters at their original positions.
func reverseTheWords(_ string: String) -> String {
var words = string.components(separatedBy: .whitespaces)
for (index, word) in words.enumerated() {
//Only letters
if word.allSatisfy(\.isLetter) {
words[index] = String(word.reversed())
continue
}
//No letters
if !word.contains(where: \.isLetter) { continue }
//Mix
var reversed = word.reversed().filter(\.isLetter)
for (index, char) in word.enumerated() {
if !char.isLetter {
index < reversed.endIndex ? reversed.insert(char, at: index) : reversed.append(char)
}
}
words[index] = String(reversed)
}
return words.joined(separator: " ")
}
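The same result can also be reached without removing and reinserting characters. The sketch below swaps letters with two indices and leaves every other character where it is; it is a different technique from the answer above, and the names reverseLetters and reverseWordsKeepingSymbols are hypothetical.

```swift
// Hypothetical alternative: two-index swap that only moves letters.
func reverseLetters(in word: String) -> String {
    var chars = Array(word)
    var lo = 0
    var hi = chars.count - 1
    while lo < hi {
        if !chars[lo].isLetter {
            lo += 1              // skip a non-letter on the left
        } else if !chars[hi].isLetter {
            hi -= 1              // skip a non-letter on the right
        } else {
            chars.swapAt(lo, hi) // both are letters: swap and move inward
            lo += 1
            hi -= 1
        }
    }
    return String(chars)
}

func reverseWordsKeepingSymbols(_ text: String) -> String {
    return text.split(separator: " ", omittingEmptySubsequences: false)
        .map { reverseLetters(in: String($0)) }
        .joined(separator: " ")
}

print(reverseWordsKeepingSymbols("a1bcd efg!h")) // d1cba hgf!e
```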
My main string looks like this: "90000+8000-1000*10". I want to find the length of each substring that contains a number and put the lengths into an array, so that it works like this:
print(substringLength[0]) //Show 5
print(substringLength[1]) //Show 4
Could anyone help me with this? Thanks in advance!
⚠️ Be careful when using replacingOccurrences!
Although this method (mentioned by @Raja Kishan) may work in some cases, it's not forward compatible and will fail if you have unhandled characters (like other expression operators).
✅ Just write it as you say it:
let numbers = "90000+8000-1000*10".split { !$0.isWholeNumber && $0 != "." }
You have the numbers! Go ahead and count their lengths:
numbers[0].count // shows 5
numbers[1].count // shows 4
🎁 You can also have the operators like:
let operators = "90000+8000-1000*10".split { $0.isWholeNumber || $0 == "." }
You can split when the character is not a number.
The maxSplits parameter is used for performance, so you don't unnecessarily split parts of the input you don't need. There are also preconditions to handle any bad input.
func substringLength(of input: String, at index: Int) -> Int {
precondition(index >= 0, "Index is negative")
let sections = input.split(maxSplits: index + 1, omittingEmptySubsequences: false) { char in
!char.isNumber
}
precondition(index < sections.count, "Out of range")
return sections[index].count
}
let str = "90000+8000-1000*10"
substringLength(of: str, at: 0) // 5
substringLength(of: str, at: 1) // 4
substringLength(of: str, at: 2) // 4
substringLength(of: str, at: 3) // 2
substringLength(of: str, at: 4) // Precondition failed: Out of range
If the set of signs (operators) is fixed, you can replace all of them with one common sign and then split the string by that sign.
Here is an example:
extension String {
func getSubStrings() -> [String] {
let commonSignStr = self.replacingOccurrences(of: "+", with: "-").replacingOccurrences(of: "*", with: "-")
return commonSignStr.components(separatedBy: "-")
}
}
let str = "90000+8000-1000*10"
str.getSubStrings().forEach({print($0.count)})
I'd assume that the separators are not numbers, regardless of what they are.
let str = "90000+8000-1000*10"
let arr = str.split { !$0.isNumber }
let substringLength = arr.map { $0.count }
print(substringLength) // [5, 4, 4, 2]
print(substringLength[0]) //Show 5
print(substringLength[1]) //Show 4
Don't use the isNumber Character property. It would also accept fraction characters, as well as many others that are not the single digits 0...9.
Discussion
For example, the following characters all represent numbers:
“7” (U+0037 DIGIT SEVEN)
“⅚” (U+215A VULGAR FRACTION FIVE SIXTHS)
“㊈” (U+3288 CIRCLED IDEOGRAPH NINE)
“𝟠” (U+1D7E0 MATHEMATICAL DOUBLE-STRUCK DIGIT EIGHT)
“๒” (U+0E52 THAI DIGIT TWO)
let numbers = "90000+8000-1000*10".split { !("0"..."9" ~= $0) } // ["90000", "8000", "1000", "10"]
let numbers2 = "90000+8000-1000*10 ५ ๙ 万 ⅚ 𝟠 ๒ ".split { !("0"..."9" ~= $0) } // ["90000", "8000", "1000", "10"]
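The difference between the three checks discussed in these answers can be seen side by side in a short sketch (the sample characters are chosen for illustration only):

```swift
let samples: [Character] = ["7", "⅚", "๙"]
for c in samples {
    // Explicit ASCII range, as used in the split examples above.
    let isASCIIDigit = ("0"..."9").contains(c)
    print(c, c.isNumber, c.isWholeNumber, isASCIIDigit)
}
// 7 true true true
// ⅚ true false false
// ๙ true true false
```

So isWholeNumber rejects fractions but still accepts digits from other scripts; only the explicit "0"..."9" range restricts the match to ASCII digits.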
How can I get the digit characters from a string? I don't want to convert them to Int.
var string = "string_1"
var string2 = "string_20_certified"
My result has to be formatted like this:
newString = "1"
newString2 = "20"
Pattern matching a String's unicode scalars against Western Arabic Numerals
You could pattern match the unicodeScalars view of a String to a given UnicodeScalar pattern (covering e.g. Western Arabic numerals).
extension String {
var westernArabicNumeralsOnly: String {
let pattern = UnicodeScalar("0")..."9"
return String(unicodeScalars
.compactMap { pattern ~= $0 ? Character($0) : nil }) // compactMap replaces the deprecated optional-returning flatMap
}
}
Example usage:
let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"
let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly
print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
Extending to matching any of several given patterns
The Unicode scalar pattern matching approach above is particularly useful when extended to match any of several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:
extension String {
var easternArabicNumeralsOnly: String {
let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
"\u{06F0}"..."\u{06F9}"] // Perso-Arabic variant
return String(unicodeScalars
.compactMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
}
}
This could be used in practice e.g. if writing an Emoji filter, as ranges of unicode scalars that cover emojis can readily be added to the patterns array in the Eastern Arabic example above.
Why use the UnicodeScalar patterns approach over Character ones?
A Character in Swift consists of an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character instances do not have a fixed size in memory, so random access to a character within a collection of sequentially (contiguously) stored characters is not available in O(1), but rather O(n).
Unicode scalars in Swift, on the other hand, are stored in fixed-size UTF-32 code units, which should allow O(1) random access. I'm not entirely sure whether this is the reason for what follows, but when benchmarking the methods above against equivalent methods using the CharacterView (.characters property) for some test String instances, it's very apparent that the UnicodeScalar approach is faster than the Character approach; naive testing showed a factor of 10-25 difference in execution times, steadily growing with String size.
Knowing the limitations of working with Unicode scalars vs Characters in Swift
There are drawbacks to the UnicodeScalar approach, however; namely when working with characters that cannot be represented by a single Unicode scalar, but where one of their Unicode scalars is contained in the pattern we want to match.
E.g., consider a string holding the four characters "Café". The last character, "é", is represented by two Unicode scalars, "e" and "\u{301}". If we were to implement pattern matching against, say, UnicodeScalar("a")..."e", the filtering method as applied above would allow one of the two Unicode scalars to pass.
extension String {
var onlyLowercaseLettersAthroughE: String {
let patterns = [UnicodeScalar("1")..."e"]
return String(unicodeScalars
.compactMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
}
}
let str = "Cafe\u{301}"
print(str) // Café
print(str.onlyLowercaseLettersAthroughE) // Cae
/* possibly we'd want "Ca" or "Caé"
as result here */
In the particular use case queried by the OP in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character pattern matching rather than UnicodeScalar.
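For comparison, a Character-based filter treats "é" as one unit, so it is either kept or dropped as a whole. The following is a minimal sketch that reuses the "1"..."e" bounds from the example above; the variable names are illustrative.

```swift
let cafe = "Cafe\u{301}"                       // "Café": the last character is e + U+0301
let bounds: ClosedRange<Character> = "1"..."e" // same bounds as the scalar example
let kept = cafe.filter { bounds.contains($0) }
print(kept) // Ca
```

Here "é" compares greater than "e", so the whole character falls outside the range and the bare "e" does not leak through.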
Edit: Updated for Swift 4 & 5
Here's a straightforward method that doesn't require Foundation:
let newstring = string.filter { "0"..."9" ~= $0 }
or borrowing from @dfri's idea to make it a String extension:
extension String {
var numbers: String {
return filter { "0"..."9" ~= $0 }
}
}
print("3 little pigs".numbers) // "3"
print("1, 2, and 3".numbers) // "123"
import Foundation
let string = "a_1_b_2_3_c34"
let result = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
print(result)
Output:
12334
Here is a Swift 2 example:
let str = "Hello 1, World 62"
let intString = str.componentsSeparatedByCharactersInSet(
NSCharacterSet
.decimalDigitCharacterSet()
.invertedSet)
.joinWithSeparator("") // Return a string with all the numbers
This method iterates through the string's characters and appends the digits to a new string:
class func getNumberFrom(string: String) -> String {
var number: String = ""
for var c : Character in string.characters {
if let n: Int = Int(String(c)) {
if n >= Int("0")! && n <= Int("9")! { // inclusive upper bound so that 9 is kept
number.append(c)
}
}
}
return number
}
For example, with a regular expression:
let text = "string_20_certified"
let pattern = "\\d+"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: text, options: [], range: NSRange(location: 0, length: text.utf16.count)) { // NSRange counts UTF-16 code units
let newString = (text as NSString).substring(with: match.range)
print(newString)
}
If there are multiple occurrences of the pattern, use matches(in:options:range:):
let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: text.utf16.count))
for match in matches {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}
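In current Swift, the NSRange/Range conversions can be done with the Foundation initializers instead of bridging to NSString. The following is a minimal sketch of the same idea; the function name digitRuns is hypothetical.

```swift
import Foundation

// Hypothetical helper: returns every run of digits found in `text`.
func digitRuns(in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: "\\d+", options: []) else {
        return []
    }
    // Build the NSRange from String indices so that the UTF-16 length is correct.
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        // Convert each match back to a Range<String.Index> before slicing.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

print(digitRuns(in: "string_20_certified")) // ["20"]
print(digitRuns(in: "a_1_b_2_3_c34"))       // ["1", "2", "3", "34"]
```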