I'm trying to get a valid substring of at most 255 UTF8 code units from a Swift string (the idea is to be able to store it in a database VARCHAR(255) field).
The standard way of getting a substring is this:
let string: String = "Hello world!"
let startIndex = string.startIndex
let endIndex = string.startIndex.advancedBy(255, limit: string.endIndex)
let databaseSubstring1 = string[startIndex..<endIndex]
But obviously that would give me a string of 255 characters that may require more than 255 bytes in UTF8 representation.
For UTF8 I can write this:
let utf8StartIndex = string.utf8.startIndex
let utf8EndIndex = utf8StartIndex.advancedBy(255, limit: string.utf8.endIndex)
let databaseSubstringUTF8View = string.utf8[utf8StartIndex..<utf8EndIndex]
let databaseSubstring2 = String(databaseSubstringUTF8View)
But I run the risk of having half a character at the end, which means my UTF8View would not be a valid UTF8 sequence.
And as expected databaseSubstring2 is an optional string because the initializer can fail (it is defined as public init?(_ utf8: String.UTF8View)).
So I need some way of stripping the incomplete UTF-8 sequence at the end or, if possible, a built-in way of doing what I'm trying to do here.
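If a byte limit really is what you need, one sketch in current Swift (5.x) is to walk the string's Unicode scalars and stop before the first one whose UTF-8 bytes would overflow the limit. `utf8Prefix(_:maxBytes:)` is a hypothetical helper, not a standard-library API. Because it cuts only between scalars, the result is always valid UTF-8, but it can separate a letter from a following combining mark; accumulating whole Characters instead would avoid that.

```swift
/// Hypothetical helper: returns the longest prefix of `string` whose UTF-8
/// encoding is at most `maxBytes` bytes. It cuts only between Unicode scalars,
/// so the result is valid UTF-8, but it may split a multi-scalar Character.
func utf8Prefix(_ string: String, maxBytes: Int) -> String {
    var byteCount = 0
    var scalars = String.UnicodeScalarView()
    for scalar in string.unicodeScalars {
        // Number of bytes UTF-8 uses for this scalar (1 to 4).
        let width: Int
        switch scalar.value {
        case 0..<0x80: width = 1
        case 0x80..<0x800: width = 2
        case 0x800..<0x1_0000: width = 3
        default: width = 4
        }
        // Stop before the first scalar that would exceed the byte budget.
        if byteCount + width > maxBytes { break }
        byteCount += width
        scalars.append(scalar)
    }
    return String(scalars)
}

print(utf8Prefix("h\u{E9}llo", maxBytes: 2))
```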
EDIT
It turns out that databases count characters, so I should not count UTF-8 code units but rather how many characters the database will count in my string (which will probably depend on the database).
According to @OOPer, MySQL counts characters as UTF-16 code units. I have come up with the following implementation:
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
// Start by clipping to maxLength characters
let startIndex = string.startIndex
let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
var string = string[startIndex..<endIndex]
// Remove characters from the end one by one until we have at most
// maxLength UTF-16 code units
while (string.utf16.count > maxLength) {
let startIndex = string.startIndex
let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
string = string[startIndex..<endIndex]
}
return string
}
The idea is to count UTF-16 code units, but to remove whole characters from the end (i.e. what Swift considers a Character).
EDIT 2
Still according to @OOPer, PostgreSQL counts characters as Unicode scalars, so this should probably work:
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
// Start by clipping to maxLength characters
let startIndex = string.startIndex
let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
var string = string[startIndex..<endIndex]
// Remove characters from the end one by one until we have at most
// maxLength Unicode scalars
while (string.unicodeScalars.count > maxLength) {
let startIndex = string.startIndex
let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
string = string[startIndex..<endIndex]
}
return string
}
As I wrote in my comment, you may need your databaseStringForString(_:maxLength:) to truncate your string to match the length limit of your DBMS. For PostgreSQL with utf8 and MySQL with utf8mb4, that limit is counted in Unicode scalars.
And I would write the same functionality as your EDIT 2:
func databaseStringForString(string: String, maxUnicodeScalarLength: Int = 255) -> String {
let start = string.startIndex
for index in start..<string.endIndex {
if string[start..<index.successor()].unicodeScalars.count > maxUnicodeScalarLength {
return string[start..<index]
}
}
return string
}
This may be less efficient, but a little bit shorter.
let s = "abc\u{1D122}\u{1F1EF}\u{1F1F5}" //->"abc𝄢🇯🇵"
let dbus = databaseStringForString(s, maxUnicodeScalarLength: 5) //->"abc𝄢"(=="abc\u{1D122}")
So, someone who works with MySQL with utf8(=utf8mb3) needs something like this:
func databaseStringForString(string: String, maxUTF16Length: Int = 255) -> String {
let start = string.startIndex
for index in start..<string.endIndex {
if string[start..<index.successor()].utf16.count > maxUTF16Length {
return string[start..<index]
}
}
return string
}
let dbu16 = databaseStringForString(s, maxUTF16Length: 4) //->"abc"
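Both functions above recount the whole prefix on every iteration, so their cost grows quadratically with the string length. A single-pass sketch in Swift 5, assuming the same counting rules as in this answer, takes the per-Character cost as a closure; `databaseString(_:maxLength:length:)` is a hypothetical name.

```swift
/// Hypothetical helper: keeps whole Characters from the start of `string`
/// while the summed `length` of those Characters stays within `maxLength`.
/// `length` decides what the database counts (Unicode scalars, UTF-16 units, ...).
func databaseString(_ string: String, maxLength: Int,
                    length: (Character) -> Int) -> String {
    var total = 0
    var result = ""
    for character in string {
        let cost = length(character)
        // Stop before the first Character that would exceed the limit.
        if total + cost > maxLength { break }
        total += cost
        result.append(character)
    }
    return result
}

let s = "abc\u{1D122}\u{1F1EF}\u{1F1F5}"
// PostgreSQL (utf8) / MySQL utf8mb4: count Unicode scalars.
let scalarLimited = databaseString(s, maxLength: 5, length: { $0.unicodeScalars.count })
// MySQL utf8 (utf8mb3): count UTF-16 code units, as suggested in the answer above.
let utf16Limited = databaseString(s, maxLength: 4, length: { $0.utf16.count })
print(scalarLimited, utf16Limited)
```

Because the loop only appends whole Characters, a flag like 🇯🇵 is either kept entirely or dropped entirely.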
Related
I have some Strings that vary in length but always end in "listing(number)"
myString = 9AMnep8MAziUCK7VwKF51mXZ2listing28
I want to get the String without "listing(number)":
9AMnep8MAziUCK7VwKF51mXZ2
Methods I've tried, such as .index(of:), only let you search for a single character. Any simple solutions?
A possible solution is to search for the substring with a regular expression and remove the match (replace it with an empty string):
let myString = "9AMnep8MAziUCK7VwKF51mXZ2listing28"
let trimmedString = myString.replacingOccurrences(of: "listing\\d+$", with: "", options: .regularExpression)
\\d+ matches one or more digits
$ represents the end of the string
Alternatively, without creating a new string:
var myString = "9AMnep8MAziUCK7VwKF51mXZ2listing28"
if let range = myString.range(of: "listing\\d+$", options: .regularExpression) {
myString.removeSubrange(range)
}
Another option is to split the string into parts with "listing" as the separator:
let result = myString.components(separatedBy: "listing").first
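If you would rather avoid regular expressions (and Foundation), one standard-library-only sketch drops the trailing digits and then the literal marker in front of them. `removingListingSuffix(_:marker:)` is a hypothetical helper, and it only treats ASCII digits as the number.

```swift
/// Hypothetical helper: strips a trailing "listing<digits>" suffix using only the
/// standard library. Like the pattern listing\d+$, it needs at least one digit;
/// otherwise the string is returned unchanged.
func removingListingSuffix(_ string: String, marker: String = "listing") -> String {
    var base = Substring(string)
    var digitCount = 0
    // Drop the trailing run of ASCII digits ("28" in the example).
    while let last = base.last, last.isASCII, last.isNumber {
        base.removeLast()
        digitCount += 1
    }
    // The marker must sit directly before the digits.
    guard digitCount > 0, base.hasSuffix(marker) else { return string }
    return String(base.dropLast(marker.count))
}

print(removingListingSuffix("9AMnep8MAziUCK7VwKF51mXZ2listing28"))
```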
To solve your issue, see the code below; I added a few comments to explain each step. Note that I arrived at this solution using these links as a guide:
https://stackoverflow.com/a/40070835/6596443
https://www.dotnetperls.com/substring-swift
import Foundation
extension String {
// Parameter inputString: the string you want to manipulate
// Parameter startStringOfUnwanted: the string at which the removal should start
// Returns: the part of inputString before the first occurrence of startStringOfUnwanted,
// or an empty string if it does not occur
static func trimUnWantedEndingString(inputString: String, startStringOfUnwanted: String) -> String {
// Find where the unwanted string starts inside inputString (range(of:) comes from Foundation)
if let range = inputString.range(of: startStringOfUnwanted) {
// Keep everything from the start up to, but not including, the unwanted string
return String(inputString[..<range.lowerBound])
}
return ""
}
}
let myString = "9AMnep8MAziUCK7VwKF51mXZ2listing28"
String.trimUnWantedEndingString(inputString: myString, startStringOfUnwanted:"listing")
I'm trying to split the text of a string into lines no longer than 72 characters (the usual Usenet quoting line length). The division should be done by replacing a space with a newline, choosing the closest space so that every line is at most 72 characters.
The text is present in a string and could also contain emoji or other symbols.
I have tried different approaches, but because I must not split a word and may only break the text where there is a space, I have not found a solution so far.
Does anyone know how to achieve this in Swift? Regular expressions are fine if needed.
In other languages you can index a string with an integer. Not so in Swift: you must interact with its character index, which can be a pain in the neck if you are not familiar with it.
Try this:
private func split(line: Substring, byCount n: Int, breakableCharacters: [Character]) -> String {
var line = String(line)
var lineStartIndex = line.startIndex
while line.distance(from: lineStartIndex, to: line.endIndex) > n {
let maxLineEndIndex = line.index(lineStartIndex, offsetBy: n)
if breakableCharacters.contains(line[maxLineEndIndex]) {
// If line terminates at a breakable character, replace that character with a newline
line.replaceSubrange(maxLineEndIndex...maxLineEndIndex, with: "\n")
lineStartIndex = line.index(after: maxLineEndIndex)
} else if let index = line[lineStartIndex..<maxLineEndIndex].lastIndex(where: { breakableCharacters.contains($0) }) {
// Otherwise, find a breakable character that is between lineStartIndex and maxLineEndIndex
line.replaceSubrange(index...index, with: "\n")
lineStartIndex = index
} else {
// Finally, forcibly break a word
line.insert("\n", at: maxLineEndIndex)
lineStartIndex = maxLineEndIndex
}
}
return line
}
func split(string: String, byCount n: Int, breakableCharacters: [Character] = [" "]) -> String {
precondition(n > 0)
guard !string.isEmpty && string.count > n else { return string }
var string = string
var startIndex = string.startIndex
repeat {
// Break a string into lines.
var endIndex = string[string.index(after: startIndex)...].firstIndex(of: "\n") ?? string.endIndex
if string.distance(from: startIndex, to: endIndex) > n {
let wrappedLine = split(line: string[startIndex..<endIndex], byCount: n, breakableCharacters: breakableCharacters)
string.replaceSubrange(startIndex..<endIndex, with: wrappedLine)
endIndex = string.index(startIndex, offsetBy: wrappedLine.count)
}
startIndex = endIndex
} while startIndex < string.endIndex
return string
}
let str1 = "Iragvzvyn vzzntvav chooyvpngr fh Vafgntenz r pv fbab gnagvffvzv nygev unfugnt, qv zvabe fhpprffb, pur nttertnab vzzntvav pba y’vzznapnovyr zntyvrggn"
let str2 = split(string: str1, byCount: 72)
print(str2)
Edit: this turned out to be more complicated than I thought. The updated answer improves on the original by processing the text line by line. You may ask why I wrote my own loop to break lines instead of using components(separatedBy: "\n"). The reason is to preserve blank lines. (Note that components(separatedBy: "\n") does keep consecutive blank lines as empty strings; it is split(separator: "\n") with its default omittingEmptySubsequences: true that drops them.)
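For comparison, here is a much simpler greedy sketch (Swift 5, hypothetical `wrap(_:width:)` helper, not the answer's algorithm). It splits the text into lines with `split(separator:omittingEmptySubsequences: false)` so blank lines survive, then fills each line word by word. Assumptions: widths are counted in Characters (an emoji counts as one), runs of spaces collapse to a single space, and a word longer than the width stays whole on its own line instead of being broken.

```swift
/// Hypothetical greedy word wrap: breaks only at spaces, never inside a word.
func wrap(_ text: String, width: Int) -> String {
    precondition(width > 0)
    // omittingEmptySubsequences: false keeps blank lines as empty lines.
    return text.split(separator: "\n", omittingEmptySubsequences: false)
        .map { wrapLine($0, width: width) }
        .joined(separator: "\n")
}

/// Fills one input line word by word; runs of spaces collapse to one space.
func wrapLine(_ line: Substring, width: Int) -> String {
    var lines: [String] = []
    var current = ""
    var isFirstWord = true
    for word in line.split(separator: " ") {
        if isFirstWord {
            current = String(word)
            isFirstWord = false
        } else if current.count + 1 + word.count <= width {
            current += " " + String(word)
        } else {
            // The space at this break point becomes the newline.
            lines.append(current)
            current = String(word)
        }
    }
    lines.append(current)
    return lines.joined(separator: "\n")
}

print(wrap("aaa bbb ccc", width: 7))
```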
Consider this function to build a string of random characters:
func makeToken(length: Int) -> String {
let chars: String = "abcdefghijklmnopqrstuvwxyz0123456789!?##$%ABCDEFGHIJKLMNOPQRSTUVWXYZ"
var result: String = ""
for _ in 0..<length {
let idx = Int(arc4random_uniform(UInt32(chars.characters.count)))
let idxEnd = idx + 1
let range: Range = idx..<idxEnd
let char = chars.substring(with: range)
result += char
}
return result
}
This throws an error on the substring method:
Cannot convert value of type 'Range<Int>' to expected argument
type 'Range<String.Index>' (aka 'Range<String.CharacterView.Index>')
I'm confused why I can't simply provide a Range with 2 integers, and why it's making me go the roundabout way of making a Range<String.Index>.
So I have to change the Range creation to this very over-complicated way:
let idx = Int(arc4random_uniform(UInt32(chars.characters.count)))
let start = chars.index(chars.startIndex, offsetBy: idx)
let end = chars.index(chars.startIndex, offsetBy: idx + 1)
let range: Range = start..<end
Why isn't it good enough for Swift for me to simply create a range with 2 integers and the half-open range operator? (..<)
By contrast, in JavaScript I can simply write chars.substr(idx, 1).
I suggest converting your String to [Character] so that you can index it easily with Int:
func makeToken(length: Int) -> String {
let chars = Array("abcdefghijklmnopqrstuvwxyz0123456789!?##$%ABCDEFGHIJKLMNOPQRSTUVWXYZ".characters)
var result = ""
for _ in 0..<length {
let idx = Int(arc4random_uniform(UInt32(chars.count)))
result += String(chars[idx])
}
return result
}
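On Swift 4.2 or later, `randomElement()` removes the need for both `arc4random_uniform` and any Int-to-index conversion. A minimal sketch; the alphanumeric alphabet is an assumption chosen for illustration.

```swift
/// Sketch for Swift 4.2+: pick `length` random Characters from an alphabet.
func makeToken(length: Int) -> String {
    let chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    // randomElement() returns nil only for an empty collection, so ! is safe here.
    return String((0..<length).map { _ in chars.randomElement()! })
}

print(makeToken(length: 20))
```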
Swift takes great care to provide a fully Unicode-compliant, type-safe, String abstraction.
Indexing a given Character, in an arbitrary Unicode string, is far from a trivial task. Each Character is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character. In particular, hiding all this complexity behind a simple Int based indexing scheme might result in the wrong performance mental model for programmers.
Having said that, you can always convert your string to a Array<Character> once for easy (and fast!) indexing. For instance:
let chars: String = "abcdefghijklmnop"
var charsArray = Array(chars.characters)
...
let resultingString = String(charsArray)
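To make the point about Characters concrete, here is a small Swift 5 sketch showing that one string has a different length in each of its views, and that converting to [Character] once gives cheap Int indexing afterwards. The sample string is an arbitrary example.

```swift
// "café" spelled with a combining acute accent (U+0301) after the "e".
let word = "cafe\u{301}"
print(word.count)                 // 4 Characters
print(word.unicodeScalars.count)  // 5 Unicode scalars
print(word.utf16.count)           // 5 UTF-16 code units
print(word.utf8.count)            // 6 UTF-8 bytes (U+0301 takes 2)

// String.Index walks Characters from the start: O(n) per lookup.
print(word[word.index(word.startIndex, offsetBy: 3)])  // é
// Converting once to [Character] gives O(1) Int subscripts afterwards.
let chars = Array(word)
print(chars[3])                                        // é
```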
I have a string of binary values e.g. "010010000110010101111001". Is there a simple way to convert this string into its ascii representation to get (in this case) "Hey"?
Only found the other way or things for Integer:
let binary = "11001"
if let number = Int(binary, radix: 2) {
print(number) // Output: 25
}
Do someone know a good and efficient solution for this case?
A variant of @OOPer's solution would be to use a conditionally binding while loop and index(_:offsetBy:limitedBy:) to iterate over the 8-character substrings, taking advantage of the fact that index(_:offsetBy:limitedBy:) returns nil when you try to advance past the limit.
let binaryBits = "010010000110010101111001"
var result = ""
var index = binaryBits.startIndex
while let next = binaryBits.index(index, offsetBy: 8, limitedBy: binaryBits.endIndex) {
let asciiCode = UInt8(binaryBits[index..<next], radix: 2)!
result.append(Character(UnicodeScalar(asciiCode)))
index = next
}
print(result) // Hey
Note that we're going via Character rather than String in the intermediate step. This is simply to take advantage of the fact that Character is specially optimised for cases where the UTF-8 representation fits into 63 bits, which is the case here. This saves heap-allocating an intermediate buffer for each character.
Purely for the fun of it, another approach could be to use sequence(state:next:) in order to create a sequence of the start and end indices of each substring, and then reduce in order to concatenate the resultant characters together into a string:
let binaryBits = "010010000110010101111001"
// returns a lazily evaluated sequence of the start and end indices for each substring
// of 8 characters.
let indices = sequence(state: binaryBits.startIndex, next: {
index -> (index: String.Index, nextIndex: String.Index)? in
let previousIndex = index
// Advance the current index – if it didn't go past the limit, then return the
// current index along with the advanced index as a new element of the sequence.
return binaryBits.characters.formIndex(&index, offsetBy: 8, limitedBy: binaryBits.endIndex) ? (previousIndex, index) : nil
})
// iterate over the indices, concatenating the resultant characters together.
let result = indices.reduce("") {
$0 + String(UnicodeScalar(UInt8(binaryBits[$1.index..<$1.nextIndex], radix: 2)!))
}
print(result) // Hey
On the face of it, this appears to be much less efficient than the first solution (because reduce should copy the string on each iteration); however, the compiler appears to be able to optimise it so that it is not much slower than the first solution.
You may need to split the input binary digits into 8-bit chunks, and then convert each chunk to an ASCII character. I cannot think of a super simple way:
var binaryBits = "010010000110010101111001"
var index = binaryBits.startIndex
var result: String = ""
for _ in 0..<binaryBits.characters.count/8 {
let nextIndex = binaryBits.index(index, offsetBy: 8)
let charBits = binaryBits[index..<nextIndex]
result += String(UnicodeScalar(UInt8(charBits, radix: 2)!))
index = nextIndex
}
print(result) //->Hey
Does basically the same as OOPer's solution, but he/she was faster and has a shorter, more elegant approach :-)
func getASCIIString(from binaryString: String) -> String? {
guard binaryString.characters.count % 8 == 0 else {
return nil
}
var asciiCharacters = [String]()
var asciiString = ""
let startIndex = binaryString.startIndex
var currentLowerIndex = startIndex
while currentLowerIndex < binaryString.endIndex {
let currentUpperIndex = binaryString.index(currentLowerIndex, offsetBy: 8)
let character = binaryString.substring(with: Range(uncheckedBounds: (lower: currentLowerIndex, upper: currentUpperIndex)))
asciiCharacters.append(character)
currentLowerIndex = currentUpperIndex
}
for asciiChar in asciiCharacters {
if let number = UInt8(asciiChar, radix: 2) {
let character = String(describing: UnicodeScalar(number))
asciiString.append(character)
} else {
return nil
}
}
return asciiString
}
let binaryString = "010010000110010101111001"
if let asciiString = getASCIIString(from: binaryString) {
print(asciiString) // Hey
}
A different approach
let bytes_string: String = "010010000110010101111001"
var range_count: Int = 0
let characters_array: [String] = Array(bytes_string.characters).map({ String($0)})
var conversion: String = ""
repeat
{
let sub_range = characters_array[range_count ..< (range_count + 8)]
let sub_string: String = sub_range.reduce("") { $0 + $1 }
let character: String = String(UnicodeScalar(UInt8(sub_string, radix: 2)!))
conversion += character
range_count += 8
} while range_count < characters_array.count
print(conversion)
You can do this:
extension String {
var binaryToAscii: String {
stride(from: 0, through: count - 1, by: 8)
.map { i in map { String($0)}[i..<(i + 8)].joined() }
.map { String(UnicodeScalar(UInt8($0, radix: 2)!)) }
.joined()
}
}
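All of the answers above turn each byte into `UnicodeScalar(byte)`, which maps byte values 0x80 to 0xFF to the Latin-1 code points U+0080 to U+00FF. That is fine for 7-bit ASCII, but if the bits could encode UTF-8 text, for example "é" as the two bytes C3 A9, those answers would produce "Ã©". A sketch that collects the bytes first and decodes them as UTF-8 (hypothetical `decodeBinary(_:)` helper, Swift 5):

```swift
/// Hypothetical helper: decodes a string of '0'/'1' digits, 8 per byte, and
/// interprets the bytes as UTF-8. Returns nil if the input is not a whole number
/// of bytes or contains other characters. Invalid UTF-8 sequences become U+FFFD.
func decodeBinary(_ bits: String) -> String? {
    let digits = Array(bits)
    guard digits.count % 8 == 0,
          digits.allSatisfy({ $0 == "0" || $0 == "1" }) else { return nil }
    var bytes: [UInt8] = []
    bytes.reserveCapacity(digits.count / 8)
    for start in stride(from: 0, to: digits.count, by: 8) {
        // Eight binary digits always fit in a UInt8, so this cannot fail after the check above.
        guard let byte = UInt8(String(digits[start..<start + 8]), radix: 2) else { return nil }
        bytes.append(byte)
    }
    return String(decoding: bytes, as: UTF8.self)
}

print(decodeBinary("010010000110010101111001") ?? "invalid input")
```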
My attempts at creating a random Unicode character generator have failed with errors such as those mentioned in my other question here. It's obviously not as simple as just generating a random number.
Question: How can I generate a random unicode character in Swift?
Unicode Scalar Value
Any Unicode code point except high-surrogate and low-surrogate code
points. In other words, the ranges of integers 0 to D7FF and E000
to 10FFFF inclusive.
So I've written a small snippet, shown below.
This code works
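import Foundation
func randomUnicodeCharacter() -> String {
// The upper bound is exclusive, so 0x110000 lets 0x10FFFF be drawn too
let i = arc4random_uniform(0x110000)
// Skip the surrogates 55296...57343 (D800...DFFF); any other value is a valid scalar, so ! is safe
return (i > 55295 && i < 57344) ? randomUnicodeCharacter() : String(UnicodeScalar(i)!)
}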
randomUnicodeCharacter()
This code doesn't work!
let N: UInt32 = 65536
let i = arc4random_uniform(N)
var c = String(UnicodeScalar(i))
print(c, appendNewline: false)
I was a little bit confused with this and this. [Maximum value: 65535]
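In Swift 4.2 or later, `UInt32.random(in:)` can replace `arc4random_uniform`, and the failable `Unicode.Scalar` initializer already rejects the surrogates excluded by the definition quoted above, so no manual range check is needed. The helper names below are hypothetical. Most of the 0...0x10FFFF range is unassigned, so the second function filters on the scalar's general category (available through `Unicode.Scalar.Properties` in Swift 5).

```swift
/// Returns a uniformly random Unicode scalar value. Unicode.Scalar's UInt32
/// initializer returns nil for surrogates (D800-DFFF) and values above 10FFFF,
/// so we simply draw again on nil.
func randomUnicodeScalar() -> Unicode.Scalar {
    while true {
        if let scalar = Unicode.Scalar(UInt32.random(in: 0...0x10FFFF)) {
            return scalar
        }
    }
}

/// Keeps drawing until the scalar is not unassigned, a control code, or private use.
func randomAssignedCharacter() -> Character {
    while true {
        let scalar = randomUnicodeScalar()
        switch scalar.properties.generalCategory {
        case .unassigned, .control, .privateUse:
            continue
        default:
            return Character(scalar)
        }
    }
}

print(randomAssignedCharacter())
```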
static func randomCharacters(withLength length: Int = 20) -> String {
let base = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
var randomString: String = ""
for _ in 0..<length {
let randomValue = arc4random_uniform(UInt32(base.characters.count))
randomString += "\(base[base.index(base.startIndex, offsetBy: Int(randomValue))])"
}
return randomString
}
Here you can modify length (Int) and use this for generating random characters.