Why does swift substring with range require a special type of Range - swift

Consider this function to build a string of random characters:
func makeToken(length: Int) -> String {
let chars: String = "abcdefghijklmnopqrstuvwxyz0123456789!?##$%ABCDEFGHIJKLMNOPQRSTUVWXYZ"
var result: String = ""
for _ in 0..<length {
let idx = Int(arc4random_uniform(UInt32(chars.characters.count)))
let idxEnd = idx + 1
let range: Range = idx..<idxEnd
let char = chars.substring(with: range)
result += char
}
return result
}
This throws an error on the substring method:
Cannot convert value of type 'Range<Int>' to expected argument
type 'Range<String.Index>' (aka 'Range<String.CharacterView.Index>')
I'm confused why I can't simply provide a Range with 2 integers, and why it's making me go the roundabout way of making a Range<String.Index>.
So I have to change the Range creation to this very over-complicated way:
let idx = Int(arc4random_uniform(UInt32(chars.characters.count)))
let start = chars.index(chars.startIndex, offsetBy: idx)
let end = chars.index(chars.startIndex, offsetBy: idx + 1)
let range: Range = start..<end
Why isn't it good enough for Swift for me to simply create a range with 2 integers and the half-open range operator? (..<)
Quite the contrast to "swift", in javascript I can simply do chars.substr(idx, 1)

I suggest converting your String to [Character] so that you can index it easily with Int:
func makeToken(length: Int) -> String {
let chars = Array("abcdefghijklmnopqrstuvwxyz0123456789!?##$%ABCDEFGHIJKLMNOPQRSTUVWXYZ".characters)
var result = ""
for _ in 0..<length {
let idx = Int(arc4random_uniform(UInt32(chars.count)))
result += String(chars[idx])
}
return result
}

Swift takes great care to provide a fully Unicode-compliant, type-safe, String abstraction.
Indexing a given Character, in an arbitrary Unicode string, is far from a trivial task. Each Character is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character. In particular, hiding all this complexity behind a simple Int based indexing scheme might result in the wrong performance mental model for programmers.
Having said that, you can always convert your string to a Array<Character> once for easy (and fast!) indexing. For instance:
let chars: String = "abcdefghijklmnop"
var charsArray = Array(chars.characters)
...
let resultingString = String(charsArray)

Related

How to get the range of the first line in a string?

I would like to change the formatting of the first line of text in an NSTextView (give it a different font size and weight to make it look like a headline). Therefore, I need the range of the first line. One way to go is this:
guard let firstLineString = textView.string.components(separatedBy: .newlines).first else {
return
}
let range = NSRange(location: 0, length: firstLineString.count)
However, I might be working with quite long texts so it appears to be inefficient to first split the entire string into line components when all I need is the first line component. Thus, it seems to make sense to use the firstIndex(where:) method:
let firstNewLineIndex = textView.string.firstIndex { character -> Bool in
return CharacterSet.newlines.contains(character)
}
// Then: Create an NSRange from 0 up to firstNewLineIndex.
This doesn't work and I get an error:
Cannot convert value of type '(Unicode.Scalar) -> Bool' to expected argument type 'Character'
because the contains method accepts not a Character but a Unicode.Scalar as a parameter (which doesn't really make sense to me because then it should be called a UnicodeScalarSet and not a CharacterSet, but nevermind...).
My question is:
How can I implement this in an efficient way, without first slicing the whole string?
(It doesn't necessarily have to use the firstIndex(where:) method, but appears to be the way to go.)
A String.Index range for the first line in string can be obtained with
let range = string.lineRange(for: ..<string.startIndex)
If you need that as an NSRange then
let nsRange = NSRange(range, in: string)
does the trick.
You can use rangeOfCharacter, which returns the Range<String.Index> of the first character from a set in your string:
extension StringProtocol where Index == String.Index {
var partialRangeOfFirstLine: PartialRangeUpTo<String.Index> {
return ..<(rangeOfCharacter(from: .newlines)?.lowerBound ?? endIndex)
}
var rangeOfFirstLine: Range<Index> {
return startIndex..<partialRangeOfFirstLine.upperBound
}
var firstLine: SubSequence {
return self[partialRangeOfFirstLine]
}
}
You can use it like so:
var str = """
some string
with new lines
"""
var attributedString = NSMutableAttributedString(string: str)
let firstLine = NSAttributedString(string: String(str.firstLine))
// change firstLine as you wish
let range = NSRange(str.rangeOfFirstLine, in: str)
attributedString.replaceCharacters(in: range, with: firstLine)

Get numbers characters from a string [duplicate]

This question already has answers here:
Filter non-digits from string
(12 answers)
Closed 6 years ago.
How to get numbers characters from a string? I don't want to convert in Int.
var string = "string_1"
var string2 = "string_20_certified"
My result have to be formatted like this:
newString = "1"
newString2 = "20"
Pattern matching a String's unicode scalars against Western Arabic Numerals
You could pattern match the unicodeScalars view of a String to a given UnicodeScalar pattern (covering e.g. Western Arabic numerals).
extension String {
var westernArabicNumeralsOnly: String {
let pattern = UnicodeScalar("0")..."9"
return String(unicodeScalars
.flatMap { pattern ~= $0 ? Character($0) : nil })
}
}
Example usage:
let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"
let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly
print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
Extending to matching any of several given patterns
The unicode scalar pattern matching approach above is particularly useful extending it to matching any of a several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:
extension String {
var easternArabicNumeralsOnly: String {
let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
"\u{06F0}"..."\u{06F9}"] // Perso-Arabic variant
return String(unicodeScalars
.flatMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
}
}
This could be used in practice e.g. if writing an Emoji filter, as ranges of unicode scalars that cover emojis can readily be added to the patterns array in the Eastern Arabic example above.
Why use the UnicodeScalar patterns approach over Character ones?
A Character in Swift contains of an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character instances in Swift does not have a fixed size in the memory, which means random access to a character within a collection of sequentially (/contiguously) stored character will not be available at O(1), but rather, O(n).
Unicode scalars in Swift, on the other hand, are stored in fixed sized UTF-32 code units, which should allow O(1) random access. Now, I'm not entirely sure if this is a fact, or a reason for what follows: but a fact is that if benchmarking the methods above vs equivalent method using the CharacterView (.characters property) for some test String instances, its very apparent that the UnicodeScalar approach is faster than the Character approach; naive testing showed a factor 10-25 difference in execution times, steadily growing for growing String size.
Knowing the limitations of working with Unicode scalars vs Characters in Swift
Now, there are drawbacks using the UnicodeScalar approach, however; namely when working with characters that cannot represented by a single unicode scalar, but where one of its unicode scalars are contained in the pattern to which we want to match.
E.g., consider a string holding the four characters "Café". The last character, "é", is represented by two unicode scalars, "e" and "\u{301}". If we were to implement pattern matching against, say, UnicodeScalar("a")...e, the filtering method as applied above would allow one of the two unicode scalars to pass.
extension String {
var onlyLowercaseLettersAthroughE: String {
let patterns = [UnicodeScalar("1")..."e"]
return String(unicodeScalars
.flatMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
}
}
let str = "Cafe\u{301}"
print(str) // Café
print(str.onlyLowercaseLettersAthroughE) // Cae
/* possibly we'd want "Ca" or "Caé"
as result here */
In the particular use case queried by from the OP in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character pattern matching over UnicodeScalar.
Edit: Updated for Swift 4 & 5
Here's a straightforward method that doesn't require Foundation:
let newstring = string.filter { "0"..."9" ~= $0 }
or borrowing from #dfri's idea to make it a String extension:
extension String {
var numbers: String {
return filter { "0"..."9" ~= $0 }
}
}
print("3 little pigs".numbers) // "3"
print("1, 2, and 3".numbers) // "123"
import Foundation
let string = "a_1_b_2_3_c34"
let result = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
print(result)
Output:
12334
Here is a Swift 2 example:
let str = "Hello 1, World 62"
let intString = str.componentsSeparatedByCharactersInSet(
NSCharacterSet
.decimalDigitCharacterSet()
.invertedSet)
.joinWithSeparator("") // Return a string with all the numbers
This method iterate through the string characters and appends the numbers to a new string:
class func getNumberFrom(string: String) -> String {
var number: String = ""
for var c : Character in string.characters {
if let n: Int = Int(String(c)) {
if n >= Int("0")! && n < Int("9")! {
number.append(c)
}
}
}
return number
}
For example with regular expression
let text = "string_20_certified"
let pattern = "\\d+"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: text, options: [], range: NSRange(location: 0, length: text.characters.count)) {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}
If there are multiple occurrences of the pattern use matches(in..
let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: text.characters.count))
for match in matches {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}

String convert to Int and replace comma to Plus sign

Using Swift, I'm trying to take a list of numbers input in a text view in an app and create a sum of this list by extracting each number for a grade calculator. Also the amount of values put in by the user changes each time. An example is shown below:
String of: 98,99,97,96...
Trying to get: 98+99+97+96...
Please Help!
Thanks
Use components(separatedBy:) to break up the comma-separated string.
Use trimmingCharacters(in:) to remove spaces before and after each element
Use Int() to convert each element into an integer.
Use compactMap (previously called flatMap) to remove any items that couldn't be converted to Int.
Use reduce to sum up the array of Int.
let input = " 98 ,99 , 97, 96 "
let values = input.components(separatedBy: ",").compactMap { Int($0.trimmingCharacters(in: .whitespaces)) }
let sum = values.reduce(0, +)
print(sum) // 390
For Swift 3 and Swift 4.
Simple way: Hard coded. Only useful if you know the exact amount of integers coming up, wanting to get calculated and printed/used further on.
let string98: String = "98"
let string99: String = "99"
let string100: String = "100"
let string101: String = "101"
let int98: Int = Int(string98)!
let int99: Int = Int(string99)!
let int100: Int = Int(string100)!
let int101: Int = Int(string101)!
// optional chaining (if or guard) instead of "!" recommended. therefore option b is better
let finalInt: Int = int98 + int99 + int100 + int101
print(finalInt) // prints Optional(398) (optional)
Fancy way as a function: Generic way. Here you can put as many strings in as you need in the end. You could, for example, gather all the strings first and then use the array to have them calculated.
func getCalculatedIntegerFrom(strings: [String]) -> Int {
var result = Int()
for element in strings {
guard let int = Int(element) else {
break // or return nil
// break instead of return, returns Integer of all
// the values it was able to turn into Integer
// so even if there is a String f.e. "123S", it would
// still return an Integer instead of nil
// if you want to use return, you have to set "-> Int?" as optional
}
result = result + int
}
return result
}
let arrayOfStrings = ["98", "99", "100", "101"]
let result = getCalculatedIntegerFrom(strings: arrayOfStrings)
print(result) // prints 398 (non-optional)
let myString = "556"
let myInt = Int(myString)

Swift 3.0 iterate over String.Index range

The following was possible with Swift 2.2:
let m = "alpha"
for i in m.startIndex..<m.endIndex {
print(m[i])
}
a
l
p
h
a
With 3.0, we get the following error:
Type 'Range' (aka 'Range') does not conform to protocol 'Sequence'
I am trying to do a very simple operation with strings in swift -- simply traverse through the first half of the string (or a more generic problem: traverse through a range of a string).
I can do the following:
let s = "string"
var midIndex = s.index(s.startIndex, offsetBy: s.characters.count/2)
let r = Range(s.startIndex..<midIndex)
print(s[r])
But here I'm not really traversing the string. So the question is: how do I traverse through a range of a given string. Like:
for i in Range(s.startIndex..<s.midIndex) {
print(s[i])
}
You can traverse a string by using indices property of the characters property like this:
let letters = "string"
let middle = letters.index(letters.startIndex, offsetBy: letters.characters.count / 2)
for index in letters.characters.indices {
// to traverse to half the length of string
if index == middle { break } // s, t, r
print(letters[index]) // s, t, r, i, n, g
}
From the documentation in section Strings and Characters - Counting Characters:
Extended grapheme clusters can be composed of one or more Unicode scalars. This means that different characters—and different representations of the same character—can require different amounts of memory to store. Because of this, characters in Swift do not each take up the same amount of memory within a string’s representation. As a result, the number of characters in a string cannot be calculated without iterating through the string to determine its extended grapheme cluster boundaries.
emphasis is my own.
This will not work:
let secondChar = letters[1]
// error: subscript is unavailable, cannot subscript String with an Int
Another option is to use enumerated() e.g:
let string = "Hello World"
for (index, char) in string.characters.enumerated() {
print(char)
}
or for Swift 4 just use
let string = "Hello World"
for (index, char) in string.enumerated() {
print(char)
}
Use the following:
for i in s.characters.indices[s.startIndex..<s.endIndex] {
print(s[i])
}
Taken from Migrating to Swift 2.3 or Swift 3 from Swift 2.2
Iterating over characters in a string is cleaner in Swift 4:
let myString = "Hello World"
for char in myString {
print(char)
}
If you want to traverse over the characters of a String, then instead of explicitly accessing the indices of the String, you could simply work with the CharacterView of the String, which conforms to CollectionType, allowing you access to neat subsequencing methods such as prefix(_:) and so on.
/* traverse the characters of your string instance,
up to middle character of the string, where "middle"
will be rounded down for strings of an odd amount of
characters (e.g. 5 characters -> travers through 2) */
let m = "alpha"
for ch in m.characters.prefix(m.characters.count/2) {
print(ch, ch.dynamicType)
} /* a Character
l Character */
/* round odd division up instead */
for ch in m.characters.prefix((m.characters.count+1)/2) {
print(ch, ch.dynamicType)
} /* a Character
l Character
p Character */
If you'd like to treat the characters within the loop as strings, simply use String(ch) above.
With regard to your comment below: if you'd like to access a range of the CharacterView, you could easily implement your own extension of CollectionType (specified for when Generator.Element is Character) making use of both prefix(_:) and suffix(_:) to yield a sub-collection given e.g. a half-open (from..<to) range
/* for values to >= count, prefixed CharacterView will be suffixed until its end */
extension CollectionType where Generator.Element == Character {
func inHalfOpenRange(from: Int, to: Int) -> Self {
guard case let to = min(to, underestimateCount()) where from <= to else {
return self.prefix(0) as! Self
}
return self.prefix(to).suffix(to-from) as! Self
}
}
/* example */
let m = "0123456789"
for ch in m.characters.inHalfOpenRange(4, to: 8) {
print(ch) /* \ */
} /* 4 a (sub-collection) CharacterView
5
6
7 */
The best way to do this is :-
let name = "nick" // The String which we want to print.
for i in 0..<name.count
{
// Operation name[i] is not allowed in Swift, an alternative is
let index = name.index[name.startIndex, offsetBy: i]
print(name[index])
}
for more details visit here
Swift 4.2
Simply:
let m = "alpha"
for i in m.indices {
print(m[i])
}
Swift 4:
let mi: String = "hello how are you?"
for i in mi {
print(i)
}
To concretely demonstrate how to traverse through a range in a string in Swift 4, we can use the where filter in a for loop to filter its execution to the specified range:
func iterateStringByRange(_ sentence: String, from: Int, to: Int) {
let startIndex = sentence.index(sentence.startIndex, offsetBy: from)
let endIndex = sentence.index(sentence.startIndex, offsetBy: to)
for position in sentence.indices where (position >= startIndex && position < endIndex) {
let char = sentence[position]
print(char)
}
}
iterateStringByRange("string", from: 1, to: 3) will print t, r and i
When iterating over the indices of characters in a string, you seldom only need the index. You probably also need the character at the given index. As specified by Paulo (updated for Swift 4+), string.indices will give you the indices of the characters. zip can be used to combine index and character:
let string = "string"
// Define the range to conform to your needs
let range = string.startIndex..<string.index(string.startIndex, offsetBy: string.count / 2)
let substring = string[range]
// If the range is in the type "first x characters", like until the middle, you can use:
// let substring = string.prefix(string.count / 2)
for (index, char) in zip(substring.indices, substring) {
// index is the index in the substring
print(char)
}
Note that using enumerated() will produce a pair of index and character, but the index is not the index of the character in the string. It is the index in the enumeration, which can be different.

Fit Swift string in database VARCHAR(255)

I'm trying to get a valid substring of at most 255 UTF8 code units from a Swift string (the idea is to be able to store it an a database VARCHAR(255) field).
The standard way of getting a substring is this :
let string: String = "Hello world!"
let startIndex = string.startIndex
let endIndex = string.startIndex.advancedBy(255, limit: string.endIndex)
let databaseSubstring1 = string[startIndex..<endIndex]
But obviously that would give me a string of 255 characters that may require more than 255 bytes in UTF8 representation.
For UTF8 I can write this :
let utf8StartIndex = string.utf8.startIndex
let utf8EndIndex = utf8StartIndex.advancedBy(255, limit: string.utf8.endIndex)
let databaseSubstringUTF8View = name.utf8[utf8StartIndex..<utf8EndIndex]
let databaseSubstring2 = String(databaseSubstringUTF8View)
But I run the risk of having half a character at the end, which means my UTF8View would not be a valid UTF8 sequence.
And as expected databaseSubstring2 is an optional string because the initializer can fail (it is defined as public init?(_ utf8: String.UTF8View)).
So I need some way of stripping invalid UTF8 code points at the end, or – if possible – a builtin way of doing what I'm trying to do here.
EDIT
Turns out that databases understand characters, so I should not try to count UTF8 code units, but rather how many characters the database will count in my string (which will probably depend on the database).
According to #OOPer, MySQL counts characters as UTF-16 code units. I have come up with the following implementation :
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
// Start by clipping to 255 characters
let startIndex = string.startIndex
let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
var string = string[startIndex..<endIndex]
// Remove characters from the end one by one until we have less than
// the maximum number of UTF-16 code units
while (string.utf16.count > maxLength) {
let startIndex = string.startIndex
let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
string = string[startIndex..<endIndex]
}
return string
}
The idea is to count UTF-16 code units, but remove characters from the end (that is what Swift think what a character is).
EDIT 2
Still according to #OOPer, Posgresql counts characters as unicode scalars, so this should probably work :
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
// Start by clipping to 255 characters
let startIndex = string.startIndex
let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
var string = string[startIndex..<endIndex]
// Remove characters from the end one by one until we have less than
// the maximum number of Unicode Scalars
while (string.unicodeScalars.count > maxLength) {
let startIndex = string.startIndex
let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
string = string[startIndex..<endIndex]
}
return string
}
As I write in my comment, you may need your databaseStringForString(_:maxLength:) to truncate your string to match the length limit of your DBMS. PostgreSQL with utf8, MySQL with utf8mb4.
And I would write the same functionality as your EDIT 2:
func databaseStringForString(string: String, maxUnicodeScalarLength: Int = 255) -> String {
let start = string.startIndex
for index in start..<string.endIndex {
if string[start..<index.successor()].unicodeScalars.count > maxUnicodeScalarLength {
return string[start..<index]
}
}
return string
}
This may be less efficient, but a little bit shorter.
let s = "abc\u{1D122}\u{1F1EF}\u{1F1F5}" //->"abc𝄢🇯🇵"
let dbus = databaseStringForString(s, maxUnicodeScalarLength: 5) //->"abc𝄢"(=="abc\u{1D122}")
So, someone who works with MySQL with utf8(=utf8mb3) needs something like this:
func databaseStringForString(string: String, maxUTF16Length: Int = 255) -> String {
let start = string.startIndex
for index in start..<string.endIndex {
if string[start..<index.successor()].utf16.count > maxUTF16Length {
return string[start..<index]
}
}
return string
}
let dbu16 = databaseStringForString(s, maxUTF16Length: 4) //->"abc"