Swift - How to check that a string contains no punctuation or numbers

I want to check whether a string is suitable for use as a display name in the app. The block below only accepts English letters. How can I cover letters from all languages? Punctuation and numbers should not be allowed.
func isSuitableForDisplayName(inputString: String) -> Bool {
let mergedString = inputString.stringByRemovingWhitespaces
let characterset = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
if mergedString.rangeOfCharacter(from: characterset.inverted) != nil {
return false
} else {
return true
}
}

You can use CharacterSet.letters, which contains all the characters in the Unicode categories L and M.
Category M includes combining marks. If you don't want those, use:
CharacterSet.letters.subtracting(.nonBaseCharacters)
Also, your way of checking whether a string contains only the characters in a character set is quite weird. I would do something like this:
return mergedString.trimmingCharacters(in: CharacterSet.letters) == ""
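Putting the pieces above together, a minimal sketch of the whole check might look like the following. The function name isLettersOnlyDisplayName is hypothetical, and allowing spaces between words and rejecting empty input are assumptions, not requirements stated in the question.

```swift
import Foundation

// Hypothetical helper (not from the original answer): accepts letters from
// any script plus spaces between words, and rejects everything else.
func isLettersOnlyDisplayName(_ input: String) -> Bool {
    // Assumption: inner spaces are allowed, an empty or all-space name is not.
    let allowed = CharacterSet.letters.union(.whitespaces)
    let trimmed = input.trimmingCharacters(in: .whitespaces)
    guard !trimmed.isEmpty else { return false }
    // Every Unicode scalar must belong to the allowed set.
    return trimmed.unicodeScalars.allSatisfy { allowed.contains($0) }
}

print(isLettersOnlyDisplayName("Anna Maria"))
print(isLettersOnlyDisplayName("user42"))
```

If combining marks should be rejected as well, CharacterSet.letters.subtracting(.nonBaseCharacters) can be used in place of CharacterSet.letters, as described above.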

Related

Check or validation Persian(Farsi) string swift

I searched web pages and Stack Overflow for a way to validate a Persian (Farsi) string. Most results only mention Arabic letters. Also, I want to know whether my string is entirely Persian, not merely whether it contains Persian characters.
for example, these strings are Persian:
"چهار راه"
"خیابان."
And These are not:
"خیابان 5"
"چرا copy کردی؟"
Also, only Persian or Arabic digits are allowed. The characters [.,-!] are exceptions and should be accepted, because Persian keyboards do not provide Persian variants of them.
UPDATE:
I explained a swift version of using regex and predicate in my answer.
Based on this extension found elsewhere:
extension String {
func matches(_ regex: String) -> Bool {
return self.range(of: regex, options: .regularExpression, range: nil, locale: nil) != nil
}
}
and construct your regex containing allowed characters like
let mystra = "چهار راه"
let mystrb = "خیابان."
let mystrc = "خیابان 5"
let mystrd = "چرا copy کردی؟" //and so on
for a in mystra {
if String(a).matches("[\u{600}-\u{6FF}\u{064b}\u{064d}\u{064c}\u{064e}\u{064f}\u{0650}\u{0651}\u{0020}]") { // add the Unicode escapes for dot, comma, and any other punctuation you need; for now only the space is added
} else { // not in range
print("oh no--\(a)---zzzz")
break // or return false
}
}
Make sure you construct the Unicode needed using the above model.
Result for other strings
for a in mystrb ... etc
oh no--.---zzzz
oh no--5---zzzz
oh no--c---zzzz
Enjoy
After some time I found a better way:
extension String {
    var isPersian: Bool {
        // %@ is the placeholder for the regex argument
        let predicate = NSPredicate(format: "SELF MATCHES %@",
                                    "([-.]*\\s*[-.]*\\p{Arabic}*[-.]*\\s*)*[-.]*")
        return predicate.evaluate(with: self)
    }
}
You can use it like this:
print("yourString".isPersian) //response: true or false
The key is combining a regex with a predicate. These links will help you adapt it to whatever you need:
https://nshipster.com/nspredicate/
https://nspredicate.xyz/
http://userguide.icu-project.org/strings/regexp
Feel free to ask any questions about this topic :D
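As a variation on the predicate approach, here is a minimal sketch that uses a single anchored character class instead of nested quantifiers, which can backtrack heavily on long non-matching input. The property name isArabicScriptText and the exact set of allowed punctuation are assumptions taken from the question's [.,-!] list. Like the predicate above, \p{Arabic} matches the Arabic script in general, not only Persian.

```swift
import Foundation

extension String {
    // Hypothetical property (not from the original answer): true when the
    // whole string consists only of Arabic-script characters, whitespace,
    // and the punctuation marks . , ! -
    var isArabicScriptText: Bool {
        // ^ and $ anchor the match to the full string; the hyphen is placed
        // last in the class so it is treated literally.
        let pattern = "^[\\p{Arabic}\\s.,!-]+$"
        return range(of: pattern, options: .regularExpression) != nil
    }
}

print("چهار راه".isArabicScriptText)
print("خیابان 5".isArabicScriptText)
```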
[EDIT] The following regex can be used to also accept Latin digits, as they are commonly accepted in Persian texts:
"([-.]*\\s*[-.]*\\p{Arabic}*[0-9]*[-.]*\\s*)*[-.]*"

how to remove special characters at the beginning / end (first/last) of a string in swift 4

How do I remove special characters at the beginning or end of a string in Swift 4?
For example,
let myString = "abcde."
If I want to remove a special character such as . from the end of myString, the result should be abcde
If I want to remove e. from the end, the result should be abcd
If I pass something that is not a suffix, such as de, the original string should be returned unchanged.
Removing special characters at the beginning works the same way.
To meet the requirement in the question, all you need is an extension of String:
extension String {
// remove the end
func removeTheSpecialCharAtLast(char: String) -> String {
if hasSuffix(char){
return String(dropLast(char.count))
}
return self
}
// remove the beginning
func removeTheSpecialCharAtFirst(char: String) -> String {
if hasPrefix(char){
return String(dropFirst(char.count))
}
return self
}
}
Here is a test that removes characters at the end. Removing characters at the beginning works the same way.
let myString = "abcde."
let op1 = myString.removeTheSpecialCharAtLast(char: "cd")
let op2 = myString.removeTheSpecialCharAtLast(char: ".")
let op3 = myString.removeTheSpecialCharAtLast(char: "de")
let op4 = myString.removeTheSpecialCharAtLast(char: "de.")
print(op1) //abcde.
print(op2) //abcde
print(op3) //abcde.
print(op4) //abc
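If the goal is to strip any punctuation from both ends rather than one exact suffix or prefix, Foundation's trimmingCharacters(in:) may be enough. This is a sketch of a different technique from the extension above: it removes every leading and trailing character in the given set, and it assumes that "special characters" means Unicode punctuation.

```swift
import Foundation

// Trims every leading and trailing punctuation character.
// Characters in the middle of the string are left untouched.
let trimmedEnd = "abcde.".trimmingCharacters(in: .punctuationCharacters)
let trimmedBoth = "..abc!?".trimmingCharacters(in: .punctuationCharacters)

print(trimmedEnd)   // abcde
print(trimmedBoth)  // abc
```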

Strange String.unicodeScalars and CharacterSet behaviour

I'm trying to use a Swift 3 CharacterSet to filter characters out of a String but I'm getting stuck very early on. A CharacterSet has a method called contains
func contains(_ member: UnicodeScalar) -> Bool
Test for membership of a particular UnicodeScalar in the CharacterSet.
But testing this doesn't produce the expected behaviour.
let characterSet = CharacterSet.capitalizedLetters
let capitalAString = "A"
if let capitalA = capitalAString.unicodeScalars.first {
print("Capital A is \(characterSet.contains(capitalA) ? "" : "not ")in the group of capital letters")
} else {
print("Couldn't get the first element of capitalAString's unicode scalars")
}
I'm getting Capital A is not in the group of capital letters yet I'd expect the opposite.
Many thanks.
CharacterSet.capitalizedLetters
returns a character set containing the characters in Unicode General Category Lt, also known as "Letter, titlecase". These are
"Ligatures containing uppercase followed by lowercase letters (e.g., Dž, Lj, Nj, and Dz)" (compare Wikipedia: Unicode character property or
Unicode® Standard Annex #44 – Table 12. General_Category Values).
You can find a list here: Unicode Characters in the 'Letter, Titlecase' Category.
You can also use the code from "NSArray from NSCharacterset" to dump the contents of the character set:
extension CharacterSet {
func allCharacters() -> [Character] {
var result: [Character] = []
for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
result.append(Character(uniChar))
}
}
}
return result
}
}
let characterSet = CharacterSet.capitalizedLetters
print(characterSet.allCharacters())
// ["Dž", "Lj", "Nj", "Dz", "ᾈ", "ᾉ", "ᾊ", "ᾋ", "ᾌ", "ᾍ", "ᾎ", "ᾏ", "ᾘ", "ᾙ", "ᾚ", "ᾛ", "ᾜ", "ᾝ", "ᾞ", "ᾟ", "ᾨ", "ᾩ", "ᾪ", "ᾫ", "ᾬ", "ᾭ", "ᾮ", "ᾯ", "ᾼ", "ῌ", "ῼ"]
What you probably want is CharacterSet.uppercaseLetters which
Returns a character set containing the characters in Unicode General Category Lu and Lt.
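A short sketch of the difference, using the scalar from the question and one titlecase ligature (U+01C5, Dž). The constant names are only illustrative.

```swift
import Foundation

let capitalA: Unicode.Scalar = "A"
let titlecaseDz: Unicode.Scalar = "\u{01C5}"  // Dž, General Category Lt

// "A" is an uppercase letter (Lu), so it is in uppercaseLetters...
let aIsUppercase = CharacterSet.uppercaseLetters.contains(capitalA)
// ...but not in capitalizedLetters, which only holds titlecase letters (Lt).
let aIsTitlecase = CharacterSet.capitalizedLetters.contains(capitalA)
let dzIsTitlecase = CharacterSet.capitalizedLetters.contains(titlecaseDz)

print(aIsUppercase, aIsTitlecase, dzIsTitlecase)  // true false true
```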

Get numbers characters from a string [duplicate]

How can I extract the digit characters from a string? I don't want to convert them to Int.
var string = "string_1"
var string2 = "string_20_certified"
My result has to be formatted like this:
newString = "1"
newString2 = "20"
Pattern matching a String's unicode scalars against Western Arabic Numerals
You could pattern match the unicodeScalars view of a String to a given UnicodeScalar pattern (covering e.g. Western Arabic numerals).
extension String {
    var westernArabicNumeralsOnly: String {
        let pattern = UnicodeScalar("0")..."9"
        // compactMap replaces the Optional-returning flatMap, deprecated since Swift 4.1
        return String(unicodeScalars
            .compactMap { pattern ~= $0 ? Character($0) : nil })
    }
}
Example usage:
let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"
let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly
print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
Extending to matching any of several given patterns
The Unicode scalar pattern matching approach above is particularly useful when extended to match any of several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:
extension String {
    var easternArabicNumeralsOnly: String {
        let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
                        "\u{06F0}"..."\u{06F9}"]                // Perso-Arabic variant
        // compactMap replaces the Optional-returning flatMap, deprecated since Swift 4.1
        return String(unicodeScalars
            .compactMap { uc in patterns.contains { $0 ~= uc } ? Character(uc) : nil })
    }
}
This could be used in practice e.g. if writing an Emoji filter, as ranges of unicode scalars that cover emojis can readily be added to the patterns array in the Eastern Arabic example above.
Why use the UnicodeScalar patterns approach over Character ones?
A Character in Swift consists of an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character instances do not have a fixed size in memory, so random access to a character within a collection of contiguously stored characters is not available in O(1), but rather O(n).
Each Unicode scalar value, on the other hand, fits in a fixed-size 32-bit unit. Note, however, that String.UnicodeScalarView is a bidirectional collection rather than a random-access one, and that native strings are stored as UTF-8 since Swift 5, so O(1) random access into the scalar view should not be assumed. I'm not entirely sure whether storage is the reason for what follows, but when benchmarking the methods above against equivalent methods using the CharacterView (the .characters property) for some test String instances, it was very apparent that the UnicodeScalar approach is faster than the Character approach; naive testing showed a factor 10-25 difference in execution times, steadily growing with String size.
Knowing the limitations of working with Unicode scalars vs Characters in Swift
There are drawbacks to the UnicodeScalar approach, however, namely when working with characters that cannot be represented by a single Unicode scalar but where one of their Unicode scalars is contained in the pattern we want to match.
E.g., consider a string holding the four characters "Café". The last character, "é", is represented here by two Unicode scalars, "e" and "\u{301}". If we were to implement pattern matching against, say, UnicodeScalar("a")..."e", the filtering method as applied above would allow one of the two Unicode scalars to pass.
extension String {
    var onlyLowercaseLettersAthroughE: String {
        // the range starts at "a" so that it matches the property name
        let patterns = [UnicodeScalar("a")..."e"]
        return String(unicodeScalars
            .compactMap { uc in patterns.contains { $0 ~= uc } ? Character(uc) : nil })
    }
}
let str = "Cafe\u{301}"
print(str) // Café
print(str.onlyLowercaseLettersAthroughE) // ae
/* possibly we'd want "a" or "aé"
as result here */
In the particular use case raised by the OP in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character pattern matching rather than UnicodeScalar.
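For comparison, a minimal sketch of the Character-based alternative mentioned above, assuming Swift 4 or later where String is a collection of Character values. The constant names are illustrative only. Because "e" followed by "\u{301}" is treated as the single Character "é", it is never split into its scalars.

```swift
// A range of Characters rather than Unicode scalars.
let lowercaseAThroughE: ClosedRange<Character> = "a"..."e"

let cafe = "Cafe\u{301}"  // "é" is built from two Unicode scalars

// Filtering works on whole grapheme clusters, so "é" is tested as one unit
// and falls outside the range a through e.
let kept = cafe.filter { lowercaseAThroughE.contains($0) }

print(cafe.count)  // 4
print(kept)        // a
```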
Edit: Updated for Swift 4 & 5
Here's a straightforward method that doesn't require Foundation:
let newstring = string.filter { "0"..."9" ~= $0 }
or borrowing from @dfri's idea to make it a String extension:
extension String {
var numbers: String {
return filter { "0"..."9" ~= $0 }
}
}
print("3 little pigs".numbers) // "3"
print("1, 2, and 3".numbers) // "123"
import Foundation
let string = "a_1_b_2_3_c34"
let result = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
print(result)
Output:
12334
Here is a Swift 2 example:
let str = "Hello 1, World 62"
let intString = str.componentsSeparatedByCharactersInSet(
NSCharacterSet
.decimalDigitCharacterSet()
.invertedSet)
.joinWithSeparator("") // Return a string with all the numbers
This method iterates through the string's characters and appends the digits to a new string:
func getNumberFrom(string: String) -> String {
    var number = ""
    for c in string {
        // Int(String(c)) is non-nil only when c is a digit (0 through 9),
        // so no extra range check is needed
        if Int(String(c)) != nil {
            number.append(c)
        }
    }
    return number
}
For example, with a regular expression:
import Foundation
let text = "string_20_certified"
let pattern = "\\d+"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
// NSRange counts UTF-16 code units, so build it from the String's own indices
let fullRange = NSRange(text.startIndex..., in: text)
if let match = regex.firstMatch(in: text, options: [], range: fullRange) {
    let newString = (text as NSString).substring(with: match.range)
    print(newString)
}
If there are multiple occurrences of the pattern, use matches(in:options:range:):
let matches = regex.matches(in: text, options: [], range: fullRange)
for match in matches {
    let newString = (text as NSString).substring(with: match.range)
    print(newString)
}

How can I check if a string contains Chinese in Swift?

How can I check whether a string contains Chinese characters in Swift?
For example, I want to check if there's Chinese inside:
var myString = "Hi! 大家好!It's contains Chinese!"
Thanks!
This answer
to How to determine if a character is a Chinese character can also easily be translated from
Ruby to Swift (now updated for Swift 3):
extension String {
var containsChineseCharacters: Bool {
return self.range(of: "\\p{Han}", options: .regularExpression) != nil
}
}
if myString.containsChineseCharacters {
print("Contains Chinese")
}
In a regular expression, "\p{Han}" matches all characters with the
"Han" Unicode property, which – as I understand it – are the characters
from the CJK languages.
Looking at questions on how to do this in other languages (such as this accepted answer for Ruby) it looks like the common technique is to determine if each character in the string falls in the CJK range. The ruby answer could be adapted to Swift strings as extension with the following code:
extension String {
var containsChineseCharacters: Bool {
return self.unicodeScalars.contains { scalar in
let cjkRanges: [ClosedInterval<UInt32>] = [
0x4E00...0x9FFF, // main block
0x3400...0x4DBF, // extended block A
0x20000...0x2A6DF, // extended block B
0x2A700...0x2B73F, // extended block C
]
return cjkRanges.contains { $0.contains(scalar.value) }
}
}
}
// true:
"Hi! 大家好!It's contains Chinese!".containsChineseCharacters
// false:
"Hello, world!".containsChineseCharacters
The ranges may already exist in Foundation somewhere rather than manually hardcoding them.
The above is for Swift 2.0, for earlier, you will have to use the free contains function rather than the protocol extension (twice):
extension String {
var containsChineseCharacters: Bool {
return contains(self.unicodeScalars) {
// older version of compiler seems to need extra help with type inference
(scalar: UnicodeScalar)->Bool in
let cjkRanges: [ClosedInterval<UInt32>] = [
0x4E00...0x9FFF, // main block
0x3400...0x4DBF, // extended block A
0x20000...0x2A6DF, // extended block B
0x2A700...0x2B73F, // extended block C
]
return contains(cjkRanges) { $0.contains(scalar.value) }
}
}
}
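ClosedInterval no longer exists in Swift 3 and later, so the range-based check above needs a small update for current compilers. The following is a sketch of the same idea using ClosedRange. The property name containsCJKIdeographs is hypothetical, and the list of blocks is copied from the answer above rather than being a complete inventory of CJK ranges.

```swift
extension String {
    // Hypothetical Swift 5 variant of the range check shown above.
    var containsCJKIdeographs: Bool {
        let cjkRanges: [ClosedRange<UInt32>] = [
            0x4E00...0x9FFF,    // main block
            0x3400...0x4DBF,    // extended block A
            0x20000...0x2A6DF,  // extended block B
            0x2A700...0x2B73F,  // extended block C
        ]
        // True as soon as any scalar value falls inside one of the ranges.
        return unicodeScalars.contains { scalar in
            cjkRanges.contains { $0.contains(scalar.value) }
        }
    }
}

print("Hi! 大家好!".containsCJKIdeographs)      // true
print("Hello, world!".containsCJKIdeographs)   // false
```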
The accepted answer only finds out whether a string contains Chinese characters. I created a variant that suits my own case:
enum ChineseRange {
    case notFound, contain, all
}
extension String {
    var findChineseCharacters: ChineseRange {
        // Find the first run of one or more Han characters.
        guard let a = self.range(of: "\\p{Han}+", options: .regularExpression) else {
            return .notFound
        }
        // The run covers the whole string only when every character is Han.
        return a == self.startIndex..<self.endIndex ? .all : .contain
    }
}
if "你好".findChineseCharacters == .all {
print("All Chinese")
}
if "Chinese".findChineseCharacters == .notFound {
print("Not found Chinese")
}
if "Chinese你好".findChineseCharacters == .contain {
print("Contains Chinese")
}
gist here: https://gist.github.com/williamhqs/6899691b5a26272550578601bee17f1a
Try this in Swift 2:
var myString = "Hi! 大家好!It's contains Chinese!"
var a = false
for c in myString.characters {
let cs = String(c)
a = a || (cs != cs.stringByApplyingTransform(NSStringTransformMandarinToLatin, reverse: false))
}
print("\(myString) contains Chinese characters = \(a)")
I have created a Swift 3 String extension for checking how many Chinese characters a String contains. It is similar to the code by Airspeed Velocity but more comprehensive: it checks various Unicode ranges to see whether a character is Chinese. See the Chinese character ranges listed in the tables under section 18.1 of the Unicode standard specification: http://www.unicode.org/versions/Unicode9.0.0/ch18.pdf
The String extension can be found on GitHub: https://github.com/niklasberglund/String-chinese.swift
Usage example:
let myString = "Hi! 大家好!It contains Chinese!"
let chinesePercentage = myString.chinesePercentage()
let chineseCharacterCount = myString.chineseCharactersCount()
print("String contains \(chinesePercentage) percent Chinese. That's \(chineseCharacterCount) characters.")