Arabic text stripper / parser

Arabic text stripper / parser - swift

I'm new to programming. I made a function that removes the vowel marks from Arabic text input. My question is whether I used the best loop for the task, or whether there is a better and more concise way to write this.
It would be nice to improve the code. Thank you.
// It seems to work now. I solved it.
// What do you think? :) Happy that I managed to solve it.
// Programming is fun!!! :D
var arabic:String = "الْعَرَبِيَّةُ لُغَةٌ جَمِيلَةٌ"
func txtStripper(arabic: String) -> String {
var strippedTxt = ""
for character in arabic {
if character != "َ" && character != "ِ" && character != "ّ" && character != "ْ" && character != "ُ" && character != "ٌ" && character != "ً" && character != "ٍ" {
strippedTxt += toString(character)
}
}
return strippedTxt
}
txtStripper(arabic)

First of all, you can clean up the if statement by using the contains function.
func txtStripper(arabic: String) -> String {
var strippedTxt = ""
let vowels : Character[] = ["َ", "ِ", "ّ", "ْ", "ُ", "ٌ", "ً", "ٍ"]
for character in arabic {
if !contains (vowels, character) {
strippedTxt += toString(character)
}
}
return strippedTxt
}
Next, if you are comfortable with closures, you can rewrite the whole function in a much more concise way as follows:
func txtStripperWithClosures(arabic:String) -> String {
let vowels : Character[] = ["َ", "ِ", "ّ", "ْ", "ُ", "ٌ", "ً", "ٍ"]
return Array(arabic).filter({!contains(vowels, $0)}).reduce("",+)
}
It works as follows:
Array(arabic) turns the String into Character[]
filter({!contains(vowels, $0)}) removes the vowel characters from the array
reduce("",+) joins the character list back into a String
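The snippets above use early Swift syntax (Character[], the free contains and toString functions). In current Swift a Character is an extended grapheme cluster, so a vowel mark is normally merged with its base letter and a per-character comparison would not see it. Here is a minimal sketch of the same idea for current Swift, assuming it is enough to drop the eight marks listed above (they occupy U+064B through U+0652). The name stripDiacritics is made up for this example.

```swift
/// Hypothetical helper: removes the eight Arabic vowel marks used above
/// by filtering Unicode scalars instead of Characters.
func stripDiacritics(_ text: String) -> String {
    // U+064B (fathatan) through U+0652 (sukun) covers exactly the marks
    // listed in the `vowels` array of the earlier snippets.
    let marks: ClosedRange<UInt32> = 0x064B...0x0652
    var kept = String.UnicodeScalarView()
    for scalar in text.unicodeScalars where !marks.contains(scalar.value) {
        kept.append(scalar)
    }
    return String(kept)
}

// ain + fatha, ra + fatha, ba  ->  ain, ra, ba
let stripped = stripDiacritics("\u{0639}\u{064E}\u{0631}\u{064E}\u{0628}")
```

Filtering scalars rather than Characters is the only real change from the answer above; the filter-then-rebuild structure is the same.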

Related

Reverse words with exclusion rules

I would like a function that reverses each word in a string without moving special characters (anything that is not a letter), preferably using regex. For example:
Input: “Weather is cool 24/7” -> Output: “rehtaeW si looc 24/7”
Input: “abcd efgh” -> Output: “dcba hgfe”
Input: “a1bcd efg!h” -> Output: “d1cba hgf!e”
So far I have only been able to write a version that reverses all characters without exceptions. I'm a beginner and I don't know how to use regexes.
func reverseTheWord(reverse: String) -> String {
let parts = reverse.components(separatedBy: " ")
let reversed = parts.map{String($0.reversed())}
let reversedWord = reversed.joined(separator: " ")
return reversedWord
}
thanks in advance!

Here is a solution where I first check what type each word is (only letters, no letters, or a mix of letters and other characters) and handle each case differently.
The first two are self-explanatory. For a mixed word, I first reverse the word and remove all non-letters, and then reinsert the non-letters at their original positions.
func reverseTheWords(_ string: String) -> String {
var words = string.components(separatedBy: .whitespaces)
for (index, word) in words.enumerated() {
//Only letters
if word.allSatisfy(\.isLetter) {
words[index] = String(word.reversed())
continue
}
//No letters
if !word.contains(where: \.isLetter) { continue }
//Mix
var reversed = word.reversed().filter(\.isLetter)
for (index, char) in word.enumerated() {
if !char.isLetter {
index < reversed.endIndex ? reversed.insert(char, at: index) : reversed.append(char)
}
}
words[index] = String(reversed)
}
return words.joined(separator: " ")
}

Trim only trailing whitespace from end of string in Swift 3

Every example of trimming strings in Swift removes both leading and trailing whitespace, but how can only trailing whitespace be removed?
For example, if I have a string:
" example "
How can I end up with:
" example"
Every solution I've found shows trimmingCharacters(in: CharacterSet.whitespaces), but I want to retain the leading whitespace.
RegEx is a possibility, or a range can be derived to determine index of characters to remove, but I can't seem to find an elegant solution for this.

With regular expressions:
let string = " example "
let trimmed = string.replacingOccurrences(of: "\\s+$", with: "", options: .regularExpression)
print(">" + trimmed + "<")
// > example<
\s+ matches one or more whitespace characters, and $ matches
the end of the string.

In Swift 4 & Swift 5
This code will also remove trailing new lines.
It works based on a Character struct's method .isWhitespace
extension String {
    var trailingSpacesTrimmed: String {
        var newString = self
        while newString.last?.isWhitespace == true {
            newString = String(newString.dropLast())
        }
        return newString
    }
}

This short Swift 3 String extension uses the .anchored and .backwards options of rangeOfCharacter and then calls itself recursively if it needs to loop. Because the compiler expects a CharacterSet as the parameter, you can just supply the static member when calling, e.g. "1234 ".trailingTrim(.whitespaces) will return "1234". (I've not done timings, but would expect it to be faster than regex.)
extension String {
func trailingTrim(_ characterSet : CharacterSet) -> String {
if let range = rangeOfCharacter(from: characterSet, options: [.anchored, .backwards]) {
return self.substring(to: range.lowerBound).trailingTrim(characterSet)
}
return self
}
}

In Foundation you can get ranges of indices matching a regular expression. You can also replace subranges. Combining this, we get:
import Foundation
extension String {
func trimTrailingWhitespace() -> String {
if let trailingWs = self.range(of: "\\s+$", options: .regularExpression) {
return self.replacingCharacters(in: trailingWs, with: "")
} else {
return self
}
}
}
You can also have a mutating version of this:
import Foundation
extension String {
mutating func trimTrailingWhitespace() {
if let trailingWs = self.range(of: "\\s+$", options: .regularExpression) {
self.replaceSubrange(trailingWs, with: "")
}
}
}
If we match against \s* (as Martin R. did at first) we can skip the if let guard and force-unwrap the optional since there will always be a match. I think this is nicer since it's obviously safe, and remains safe if you change the regexp. I did not think about performance.

Handy String extension In Swift 4
extension String {
func trimmingTrailingSpaces() -> String {
var t = self
while t.hasSuffix(" ") {
t = "" + t.dropLast()
}
return t
}
mutating func trimmedTrailingSpaces() {
self = self.trimmingTrailingSpaces()
}
}

Swift 4
extension String {
var trimmingTrailingSpaces: String {
if let range = rangeOfCharacter(from: .whitespacesAndNewlines, options: [.anchored, .backwards]) {
return String(self[..<range.lowerBound]).trimmingTrailingSpaces
}
return self
}
}

Demosthese's answer is a useful solution to the problem, but it's not particularly efficient. This is an upgrade to their answer, extending StringProtocol instead, and utilizing Substring to remove the need for repeated copying.
extension StringProtocol {
@inline(__always)
var trailingSpacesTrimmed: Self.SubSequence {
var view = self[...]
while view.last?.isWhitespace == true {
view = view.dropLast()
}
return view
}
}

No need to create a new string when dropping from the end each time.
extension String {
func trimRight() -> String {
String(reversed().drop { $0.isWhitespace }.reversed())
}
}
This operates on the collection and only converts the result back into a string once.

It's a little bit hacky :D
let message = " example "
var trimmed = ("s" + message).trimmingCharacters(in: .whitespacesAndNewlines)
trimmed = trimmed.substring(from: trimmed.index(after: trimmed.startIndex))

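If you prefer not to use a regular expression, you can use the function below instead. Note that it counts the leading spaces, removes every space in the string (including any between words), and then puts the leading spaces back, so it only suits strings without interior spaces: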
func removeTrailingSpaces(with spaces : String) -> String{
var spaceCount = 0
for characters in spaces.characters{
if characters == " "{
print("Space Encountered")
spaceCount = spaceCount + 1
}else{
break;
}
}
var finalString = ""
let duplicateString = spaces.replacingOccurrences(of: " ", with: "")
while spaceCount != 0 {
finalString = finalString + " "
spaceCount = spaceCount - 1
}
return (finalString + duplicateString)
}
You can use this function as follows:
let str = " Himanshu "
print(removeTrailingSpaces(with : str))

One line solution with Swift 4 & 5
As a beginner in Swift and iOS programming I really like @demosthese's solution above with the while loop as it's very easy to understand. However the example code seems longer than necessary. The following uses essentially the same logic but implements it as a single line while loop.
// Remove trailing spaces from myString
while myString.last == " " { myString = String(myString.dropLast()) }
This can also be written using the .isWhitespace property, as in @demosthese's solution, as follows:
while myString.last?.isWhitespace == true { myString = String(myString.dropLast()) }
This has the benefit (or disadvantage, depending on your point of view) that this removes all types of whitespace, not just spaces but (according to Apple docs) also including newlines, and specifically the following characters:
“\t” (U+0009 CHARACTER TABULATION)
“ “ (U+0020 SPACE)
U+2029 PARAGRAPH SEPARATOR
U+3000 IDEOGRAPHIC SPACE
Note: Even though .isWhitespace is a Boolean it can't be used directly in the while loop as it ends up being optional ? due to the chaining of the optional .last property, which returns nil if the String (or collection) is empty. The == true logic gets around this since nil != true.
I'd love to get some feedback on this, esp. in case anyone sees any issues or drawbacks with this simple single line approach.
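To make the note about optional chaining concrete, here is a small sketch. The function name trimmingTrailingWhitespace and the constant emptyCheck are made up for this example, and it calls removeLast() on a local copy instead of rebuilding the string with String(dropLast()) on each pass, which is a swapped-in variation rather than the code above.

```swift
// Hypothetical helper wrapping the one-line loop from above.
func trimmingTrailingWhitespace(_ text: String) -> String {
    var result = text
    // `result.last?.isWhitespace` has type Bool?. Once the string is empty,
    // `last` is nil, and `nil == true` is false, so the loop simply stops.
    while result.last?.isWhitespace == true {
        result.removeLast()
    }
    return result
}

// An empty string has no last character, so the chained value is nil.
let emptyCheck: Bool? = "".last?.isWhitespace

let trimmedExample = trimmingTrailingWhitespace("  example \t\n")
```

Leading whitespace is untouched because only the last character is ever inspected.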

Swift 5
extension String {
func trimTrailingWhiteSpace() -> String {
guard self.last == " " else { return self }
var tmp = self
repeat {
tmp = String(tmp.dropLast())
} while tmp.last == " "
return tmp
}
}

How can I check if a string contains Chinese in Swift?

How can I check if a string contains Chinese characters in Swift?
For example, I want to check if there's Chinese inside:
var myString = "Hi! 大家好！It's contains Chinese!"
Thanks!

This answer to "How to determine if a character is a Chinese character" can also easily be translated from Ruby to Swift (now updated for Swift 3):
extension String {
var containsChineseCharacters: Bool {
return self.range(of: "\\p{Han}", options: .regularExpression) != nil
}
}
if myString.containsChineseCharacters {
print("Contains Chinese")
}
In a regular expression, "\p{Han}" matches all characters in the Han script, that is, the ideographs shared by Chinese, Japanese, and Korean text. It does not match kana or hangul.

Looking at questions on how to do this in other languages (such as this accepted answer for Ruby), it looks like the common technique is to determine whether each character in the string falls in a CJK range. The Ruby answer could be adapted to Swift strings as an extension with the following code:
extension String {
var containsChineseCharacters: Bool {
return self.unicodeScalars.contains { scalar in
let cjkRanges: [ClosedInterval<UInt32>] = [
0x4E00...0x9FFF, // main block
0x3400...0x4DBF, // extended block A
0x20000...0x2A6DF, // extended block B
0x2A700...0x2B73F, // extended block C
]
return cjkRanges.contains { $0.contains(scalar.value) }
}
}
}
// true:
"Hi! 大家好！It's contains Chinese!".containsChineseCharacters
// false:
"Hello, world!".containsChineseCharacters
The ranges may already exist in Foundation somewhere rather than manually hardcoding them.
The above is for Swift 2.0. For earlier versions, you will have to use the free contains function rather than the protocol extension (twice):
extension String {
var containsChineseCharacters: Bool {
return contains(self.unicodeScalars) {
// older version of compiler seems to need extra help with type inference
(scalar: UnicodeScalar)->Bool in
let cjkRanges: [ClosedInterval<UInt32>] = [
0x4E00...0x9FFF, // main block
0x3400...0x4DBF, // extended block A
0x20000...0x2A6DF, // extended block B
0x2A700...0x2B73F, // extended block C
]
return contains(cjkRanges) { $0.contains(scalar.value) }
}
}
}
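Both versions above use ClosedInterval, which was replaced by ClosedRange in Swift 3. A minimal sketch of the same range check for current Swift follows. The property name containsCJKIdeographs is made up for this example so that it does not clash with the versions above, and the four ranges are copied unchanged from that code, so later extension blocks are not covered.

```swift
extension String {
    /// Hypothetical name; true if any scalar falls in one of the four
    /// CJK ideograph ranges listed in the answer above.
    var containsCJKIdeographs: Bool {
        let cjkRanges: [ClosedRange<UInt32>] = [
            0x4E00...0x9FFF,   // main block
            0x3400...0x4DBF,   // extended block A
            0x20000...0x2A6DF, // extended block B
            0x2A700...0x2B73F, // extended block C
        ]
        return unicodeScalars.contains { scalar in
            cjkRanges.contains { $0.contains(scalar.value) }
        }
    }
}

let mixed = "Hi! 大家好！It's contains Chinese!".containsCJKIdeographs
let latinOnly = "Hello, world!".containsCJKIdeographs
```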

The accepted answer only finds out whether the string contains a Chinese character. I created a version that suits my own case:
enum ChineseRange {
case notFound, contain, all
}
extension String {
var findChineseCharacters: ChineseRange {
guard let a = self.range(of: "\\p{Han}*\\p{Han}", options: .regularExpression) else {
return .notFound
}
// The guard above already handled the "no match" case, so `a` is never nil here.
// If the first run of Han characters spans the whole string, everything is Chinese.
return a == (self.startIndex..<self.endIndex) ? .all : .contain
}
}
if "你好".findChineseCharacters == .all {
print("All Chinese")
}
if "Chinese".findChineseCharacters == .notFound {
print("Not found Chinese")
}
if "Chinese你好".findChineseCharacters == .contain {
print("Contains Chinese")
}
gist here: https://gist.github.com/williamhqs/6899691b5a26272550578601bee17f1a

Try this in Swift 2:
var myString = "Hi! 大家好！It's contains Chinese!"
var a = false
for c in myString.characters {
let cs = String(c)
a = a || (cs != cs.stringByApplyingTransform(NSStringTransformMandarinToLatin, reverse: false))
}
print("\(myString) contains Chinese characters = \(a)")

I have created a Swift 3 String extension for checking how many Chinese characters a String contains. It is similar to the code by Airspeed Velocity but more comprehensive: it checks various Unicode ranges to see whether a character is Chinese. See the Chinese character ranges listed in the tables under section 18.1 in the Unicode standard specification: http://www.unicode.org/versions/Unicode9.0.0/ch18.pdf
The String extension can be found on GitHub: https://github.com/niklasberglund/String-chinese.swift
Usage example:
let myString = "Hi! 大家好！It contains Chinese!"
let chinesePercentage = myString.chinesePercentage()
let chineseCharacterCount = myString.chineseCharactersCount()
print("String contains \(chinesePercentage) percent Chinese. That's \(chineseCharacterCount) characters.")

How can I check if a string contains letters in Swift? [duplicate]

This question already has answers here:
What is the best way to determine if a string contains a character from a set in Swift
(11 answers)
Closed 7 years ago.
I'm trying to check whether a specific string contains letters or not.
So far I've come across NSCharacterSet.letterCharacterSet() as a set of letters, but I'm having trouble checking whether a character in that set is in the given string. When I use this code, I get an error stating:
'Character' is not convertible to 'unichar'
For the following code:
for chr in input{
if letterSet.characterIsMember(chr){
return "Woah, chill out!"
}
}

You can use NSCharacterSet in the following way:
let letters = NSCharacterSet.letters
let phrase = "Test case"
let range = phrase.rangeOfCharacter(from: letters)
// range will be nil if no letter is found
if range != nil {
    print("letters found")
}
else {
    print("letters not found")
}
Or you can do this too :
func containsOnlyLetters(input: String) -> Bool {
for chr in input {
if (!(chr >= "a" && chr <= "z") && !(chr >= "A" && chr <= "Z") ) {
return false
}
}
return true
}
In Swift 2:
func containsOnlyLetters(input: String) -> Bool {
for chr in input.characters {
if (!(chr >= "a" && chr <= "z") && !(chr >= "A" && chr <= "Z") ) {
return false
}
}
return true
}
It's up to you which way to choose. I hope this helps.

You should use String's built-in range functions with NSCharacterSet rather than roll your own solution. This will give you a lot more flexibility too (like case-insensitive search if you so desire).
let str = "Hey this is a string"
let characterSet = NSCharacterSet(charactersInString: "aeiou")
if let _ = str.rangeOfCharacterFromSet(characterSet, options: .CaseInsensitiveSearch) {
println("true")
}
else {
println("false")
}
Substitute "aeiou" with whatever letters you're looking for.
A less flexible, but fun swift note all the same, is that you can use any of the functions available for Sequences. So you can do this:
contains("abc", "c")
This of course will only work for individual characters, and is not flexible and not recommended.

The trouble with .characterIsMember is that it takes a unichar (a typealias for UInt16).
If you iterate your input using the utf16 view of the string, it will work:
let set = NSCharacterSet.letterCharacterSet()
for chr in input.utf16 {
if set.characterIsMember(chr) {
println("\(chr) is a letter")
}
}
You can also skip the loop and use the contains algorithm if you only want to check for presence/non-presence:
if contains(input.utf16, { set.characterIsMember($0) }) {
println("contains letters")
}
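The snippets above use Swift 1.x names (letterCharacterSet(), characterIsMember, println, the free contains function). A minimal sketch of the same presence check in current Swift, assuming Foundation is available, is shown below. The function name containsLetters is made up for this example, and it iterates Unicode scalars rather than UTF-16 code units.

```swift
import Foundation

/// Hypothetical helper: true if any Unicode scalar in `input`
/// belongs to CharacterSet.letters.
func containsLetters(_ input: String) -> Bool {
    return input.unicodeScalars.contains { scalar in
        CharacterSet.letters.contains(scalar)
    }
}

let withLetters = containsLetters("abc 123")
let withoutLetters = containsLetters("123 !?")
```

CharacterSet.contains takes a Unicode.Scalar directly, so the unichar conversion problem from the question does not come up.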

Check empty string in Swift?

In Objective C, one could do the following to check for strings:
if ([myString isEqualToString:@""]) {
NSLog(@"myString IS empty!");
} else {
NSLog(@"myString IS NOT empty, it is: %@", myString);
}
How does one detect empty strings in Swift?

There is now the built in ability to detect empty string with .isEmpty:
if emptyString.isEmpty {
print("Nothing to see here")
}
Apple Pre-release documentation: "Strings and Characters".

A concise way to check if the string is nil or empty would be:
var myString: String? = nil
if (myString ?? "").isEmpty {
print("String is nil or empty")
}

I am completely rewriting my answer (again). This time it is because I have become a fan of the guard statement and early return. It makes for much cleaner code.
Non-Optional String
Check for zero length.
let myString: String = ""
if myString.isEmpty {
print("String is empty.")
return // or break, continue, throw
}
// myString is not empty (if this point is reached)
print(myString)
If the if statement passes, then you can safely use the string knowing that it isn't empty. If it is empty then the function will return early and nothing after it matters.
Optional String
Check for nil or zero length.
let myOptionalString: String? = nil
guard let myString = myOptionalString, !myString.isEmpty else {
print("String is nil or empty.")
return // or break, continue, throw
}
// myString is neither nil nor empty (if this point is reached)
print(myString)
This unwraps the optional and checks that it isn't empty at the same time. After passing the guard statement, you can safely use your unwrapped nonempty string.

In Xcode 11.3 swift 5.2 and later
Use
var isEmpty: Bool { get }
Example
let lang = "Swift 5"
if lang.isEmpty {
print("Empty string")
}
If you want to ignore white spaces
if lang.trimmingCharacters(in: .whitespaces).isEmpty {
print("Empty string")
}

Here is how I check if string is blank. By 'blank' I mean a string that is either empty or contains only space/newline characters.
struct MyString {
static func blank(text: String) -> Bool {
let trimmed = text.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
return trimmed.isEmpty
}
}
How to use:
MyString.blank(" ") // true

You can also use an optional extension so you don't have to worry about unwrapping or using == true:
extension String {
var isBlank: Bool {
return self.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
}
}
extension Optional where Wrapped == String {
var isBlank: Bool {
if let unwrapped = self {
return unwrapped.isBlank
} else {
return true
}
}
}
Note: when calling this on an optional, make sure not to use ? or else it will still require unwrapping.
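The note above can be seen in a short usage sketch. The two extensions are repeated so the snippet stands alone, and the constants missing, padded, direct, and chained are made up for this example.

```swift
import Foundation

extension String {
    var isBlank: Bool {
        return self.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }
}

extension Optional where Wrapped == String {
    var isBlank: Bool {
        if let unwrapped = self {
            return unwrapped.isBlank
        } else {
            return true
        }
    }
}

let missing: String? = nil
let padded: String? = "  \n"

// Called directly on the optional: uses the Optional extension and yields a plain Bool.
let direct = missing.isBlank

// Called through optional chaining: uses the String property and yields a Bool?,
// which is nil here and would still need unwrapping.
let chained = missing?.isBlank
```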

To do the nil check and length simultaneously
Swift 2.0 and iOS 9 onwards you could use
if(yourString?.characters.count > 0){}

isEmpty will do as you think it will, if string == "", it'll return true.
Some of the other answers point to a situation where you have an optional string.
PLEASE use Optional Chaining!!!!
If the string is not nil, isEmpty will be used, otherwise it will not.
Below, the optionalString will NOT be set because the string is nil
var optionalString: String? = nil
if optionalString?.isEmpty == true {
optionalString = "Lorem ipsum dolor sit amet"
}
Obviously you wouldn't use the above code. The gains come from JSON parsing or other such situations where you either have a value or not. This guarantees code will be run if there is a value.

Check for text that contains only spaces and newline characters:
extension String
{
var isBlank:Bool {
return self.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet()).isEmpty
}
}
using
if text.isBlank
{
//text is blank do smth
}

Swift String (isEmpty vs count)
You should use .isEmpty instead of .count
.isEmpty Complexity = O(1)
.count Complexity = O(n)
isEmpty does not use .count under the hood, it compares start and end indexes startIndex == endIndex
Official doc Collection.count
Complexity: O(1) if the collection conforms to RandomAccessCollection; otherwise, O(n), where n is the length of the collection.
A single character can be represented by several different combinations of Unicode scalar values (with different memory footprints). That is why calculating count requires iterating over all the Unicode scalar values.
String = alex
String = \u{61}\u{6c}\u{65}\u{78}
[Char] = [a, l, e, x]
Unicode text = alex
Unicode scalar values(UTF-32) = u+00000061u+0000006cu+00000065u+00000078
1 Character == 1 extended grapheme cluster == set of Unicode scalar values
Example
//Char á == extended grapheme cluster of Unicode scalar values \u{E1}
//Char á == extended grapheme cluster of Unicode scalar values \u{61}\u{301}
let a1: String = "\u{E1}" // Unicode text = á, UTF-16 = \u00e1, UTF-32 = u+000000e1
print("count:\(a1.count)") //count:1
// Unicode text = a, UTF-16 = \u0061, UTF-32 = u+00000061
// Unicode text = ́, UTF-16 = \u0301, UTF-32 = u+00000301
let a2: String = "\u{61}\u{301}" // Unicode text = á, UTF-16 = \u0061\u0301, UTF-32 = u+00000061u+00000301
print("count:\(a2.count)") //count:1

For optional Strings how about:
if let string = string, !string.isEmpty
{
print(string)
}

if myString?.startIndex != myString?.endIndex {}

I recommend adding a small extension to String or Array that looks like this:
extension Collection {
public var isNotEmpty: Bool {
return !self.isEmpty
}
}
With it you can write code that is easier to read.
Compare these two lines:
if !someObject.someParam.someSubParam.someString.isEmpty {}
and
if someObject.someParam.someSubParam.someString.isNotEmpty {}
It is easy to miss the ! sign at the beginning of the first line.

public extension Swift.Optional {
func nonEmptyValue<T>(fallback: T) -> T {
if let stringValue = self as? String, stringValue.isEmpty {
return fallback
}
if let value = self as? T {
return value
} else {
return fallback
}
}
}

What about
if let notEmptyString = optionalString, !notEmptyString.isEmpty {
// do something with notEmptyString
NSLog("Non-empty string is %@", notEmptyString)
} else {
// empty or nil string
NSLog("Empty or nil string")
}

You can use this extension:
extension String {
static func isNilOrEmpty(string: String?) -> Bool {
guard let value = string else { return true }
return value.trimmingCharacters(in: .whitespaces).isEmpty
}
}
and then use it like this:
let isMyStringEmptyOrNil = String.isNilOrEmpty(string: myString)
