I'm trying to split the text of a string into lines no longer than 72 characters (the usual Usenet quoting line length). The split should be made by replacing a space with a newline, choosing the last space that still keeps the line at 72 characters or fewer.
The text is present in a string and could also contain emoji or other symbols.
I have tried different approaches, but I have not found a solution yet, because I cannot break inside a word: the text may only be split where there is a space.
Does anyone know how this can be done in Swift? A solution using regular expressions would be fine too.
In other languages you can index a string with an integer. Not so in Swift: you must interact with its character index, which can be a pain in the neck if you are not familiar with it.
Try this:
private func split(line: Substring, byCount n: Int, breakableCharacters: [Character]) -> String {
var line = String(line)
var lineStartIndex = line.startIndex
while line.distance(from: lineStartIndex, to: line.endIndex) > n {
let maxLineEndIndex = line.index(lineStartIndex, offsetBy: n)
if breakableCharacters.contains(line[maxLineEndIndex]) {
// If line terminates at a breakable character, replace that character with a newline
line.replaceSubrange(maxLineEndIndex...maxLineEndIndex, with: "\n")
lineStartIndex = line.index(after: maxLineEndIndex)
} else if let index = line[lineStartIndex..<maxLineEndIndex].lastIndex(where: { breakableCharacters.contains($0) }) {
// Otherwise, find a breakable character that is between lineStartIndex and maxLineEndIndex
line.replaceSubrange(index...index, with: "\n")
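// The next line starts after the newline, so the newline is not counted against n
lineStartIndex = line.index(after: index)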
} else {
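// Finally, forcibly break a word. Inserting invalidates existing indices,
// so remember the position as an offset and recompute the index afterwards.
let offset = line.distance(from: line.startIndex, to: maxLineEndIndex)
line.insert("\n", at: maxLineEndIndex)
lineStartIndex = line.index(line.startIndex, offsetBy: offset + 1)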
}
}
return line
}
func split(string: String, byCount n: Int, breakableCharacters: [Character] = [" "]) -> String {
precondition(n > 0)
guard !string.isEmpty && string.count > n else { return string }
var string = string
var startIndex = string.startIndex
repeat {
// Break a string into lines.
var endIndex = string[string.index(after: startIndex)...].firstIndex(of: "\n") ?? string.endIndex
if string.distance(from: startIndex, to: endIndex) > n {
let wrappedLine = split(line: string[startIndex..<endIndex], byCount: n, breakableCharacters: breakableCharacters)
string.replaceSubrange(startIndex..<endIndex, with: wrappedLine)
endIndex = string.index(startIndex, offsetBy: wrappedLine.count)
}
startIndex = endIndex
} while startIndex < string.endIndex
return string
}
let str1 = "Iragvzvyn vzzntvav chooyvpngr fh Vafgntenz r pv fbab gnagvffvzv nygev unfugnt, qv zvabe fhpprffb, pur nttertnab vzzntvav pba y’vzznapnovyr zntyvrggn"
let str2 = split(string: str1, byCount: 72)
print(str2)
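Edit: this turned out to be more complicated than I thought. The updated answer improves upon the original by processing the text line by line. You may ask why I devised my own algorithm to find the line breaks instead of splitting the string first. The reason is to preserve blank lines: split(separator: "\n") drops empty subsequences by default, which collapses consecutive blank lines into one. (components(separatedBy: "\n") does return the empty components, so it keeps blank lines, but it requires Foundation.)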
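For comparison, a greedy word-by-word wrap can be sketched as follows. This is a minimal sketch and not the algorithm from the answer above: the name wrap(_:width:) is made up for illustration, runs of spaces are collapsed to one, and a word longer than the width is left on its own overlong line rather than broken. Lengths are counted in Characters, so a Character made of several code points counts once.

```swift
// Hypothetical helper (not the answer's algorithm): greedy wrap, one word at a time.
func wrap(_ text: String, width: Int) -> String {
    precondition(width > 0)
    var lines: [String] = []
    // Keep empty pieces so that blank lines in the input survive.
    for paragraph in text.split(separator: "\n", omittingEmptySubsequences: false) {
        var current = ""
        var currentCount = 0 // length of `current` in Characters
        for word in paragraph.split(separator: " ") {
            let wordCount = word.count
            if currentCount == 0 {
                // First word on the line: always accepted, even if overlong.
                current = String(word)
                currentCount = wordCount
            } else if currentCount + 1 + wordCount <= width {
                // The word still fits after a separating space.
                current.append(" ")
                current.append(contentsOf: word)
                currentCount += 1 + wordCount
            } else {
                // The word does not fit: close the line and start a new one.
                lines.append(current)
                current = String(word)
                currentCount = wordCount
            }
        }
        lines.append(current)
    }
    return lines.joined(separator: "\n")
}

print(wrap("aaa bbb ccc", width: 7))
```

Working on whole words avoids mutating the string while holding indices into it, at the cost of not preserving the original spacing exactly.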
Related
I have a UITextView and I want to insert a line break each time the user exceeds a limit of characters per line (let's say 30 characters per line is the maximum). I also want to preserve word wrapping, so if the 30-character limit is reached in the middle of a word, that word should move to the new line.
How should I approach this problem? I was hoping for a native solution but can't find anything related in the documentation.
You can work around this by using the textViewDidChange delegate method from UITextViewDelegate to add a newline after every 30 characters, like this:
func textViewDidChange(_ textView: UITextView) {
if let text = textView.text {
let strings = text.components(withMaxLength: 30) // generating an array of strings with equally split parts
// joining all the strings back with newlines; joined(separator:) adds no trailing newline, so nothing has to be dropped
textView.text = strings.joined(separator: "\n")
}
}
You will need this extension to split String in equal parts for above code to work though:
extension String {
func components(withMaxLength length: Int) -> [String] {
return stride(from: 0, to: self.count, by: length).map {
let start = self.index(self.startIndex, offsetBy: $0)
let end = self.index(start, offsetBy: length, limitedBy: self.endIndex) ?? self.endIndex
return String(self[start..<end])
}
}
}
I am trying to take a hex string and insert a dash after every second character (e.g. "b201a968" to "b2-01-a9-68"). I have found several ways to do it, but my string is fairly large (8066 characters) and even the fastest version I have still takes several seconds. These are the ways I have tried and how long they take. Can anyone help me optimize this function?
//42.68 seconds
func reformatDebugString(string: String) -> String
{
var myString = string
var index = 2
while(true){
myString.insert("-", at: myString.index(myString.startIndex, offsetBy: index))
index += 3
if(index >= myString.characters.count){
break
}
}
return myString
}
//21.65 seconds
func reformatDebugString3(string: String) -> String
{
var myString = ""
let length = string.characters.count
var first = true
for i in 0...length-1{
let index = string.index(myString.startIndex, offsetBy: i)
let c = string[index]
myString += "\(c)"
if(!first){
myString += "-"
}
first = !first
}
return myString
}
//11.37 seconds
func reformatDebugString(string: String) -> String
{
var myString = string
var index = myString.characters.count - 2
while(true){
myString.insert("-", at: myString.index(myString.startIndex, offsetBy: index))
index -= 2
if(index == 0){
break
}
}
return myString
}
The problem with all three of your approaches is the use of index(_:offsetBy:) in order to get the index of the current character in your loop. This is an O(n) operation where n is the distance to offset by – therefore making all three of your functions run in quadratic time.
Furthermore, for solutions #1 and #3, your insertion into the resultant string is an O(n) operation, as all the characters after the insertion point have to be shifted up to accommodate the added character. It's generally cheaper to build up the string from scratch in this case, as we can just add a given character onto the end of the string, which is O(1) if the string has enough capacity, O(n) otherwise.
Also for solution #1, saying myString.characters.count is an O(n) operation, so not something you want to be doing at each iteration of the loop.
So, we want to build the string from scratch, and avoid indexing and calculating the character count inside the loop. Here's one way of doing that:
extension String {
func addingDashes() -> String {
var result = ""
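// String is itself a collection of Characters since Swift 4; the `characters` view is no longer available
for (offset, character) in self.enumerated() {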
// don't insert a '-' before the first character,
// otherwise insert one before every other character.
if offset != 0 && offset % 2 == 0 {
result.append("-")
}
result.append(character)
}
return result
}
}
// ...
print("b201a968".addingDashes()) // b2-01-a9-68
Your best solution (#3) in a release build took 37.79s on my computer, the method above took 0.023s.
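Following the remark above that appending is O(1) only while the string has enough capacity, the storage can also be reserved up front. A minimal sketch, assuming ASCII hex input (one byte per digit); the function name addingDashesReserved(to:) is hypothetical, and whether this is measurably faster than the version above has not been timed here.

```swift
// Hypothetical variant of addingDashes() that reserves storage before appending.
func addingDashesReserved(to hex: String) -> String {
    var result = ""
    // Assumes ASCII input: one byte per digit plus at most one dash per pair of digits.
    let byteCount = hex.utf8.count
    result.reserveCapacity(byteCount + byteCount / 2)
    for (offset, character) in hex.enumerated() {
        // Insert a dash before every second character, except the first.
        if offset != 0 && offset % 2 == 0 {
            result.append("-")
        }
        result.append(character)
    }
    return result
}

print(addingDashesReserved(to: "b201a968")) // b2-01-a9-68
```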
As already noted in Hamish's answer, you should avoid these two things:
calculating each index with string.index(string.startIndex, offsetBy: ...)
modifying a large String with insert(_:at:)
So, this can be another way:
func reformatDebugString4(string: String) -> String {
var result = ""
var currentIndex = string.startIndex
while currentIndex < string.endIndex {
let nextIndex = string.index(currentIndex, offsetBy: 2, limitedBy: string.endIndex) ?? string.endIndex
if currentIndex != string.startIndex {
result += "-"
}
result += string[currentIndex..<nextIndex]
currentIndex = nextIndex
}
return result
}
I have a string of binary values, e.g. "010010000110010101111001". Is there a simple way to convert this string into its ASCII representation to get (in this case) "Hey"?
I have only found the reverse direction, or conversions for integers:
let binary = "11001"
if let number = Int(binary, radix: 2) {
print(number) // Output: 25
}
Does anyone know a good and efficient solution for this case?
A variant of @OOPer's solution would be to use a conditionally binding while loop and index(_:offsetBy:limitedBy:) in order to iterate over the 8 character substrings, taking advantage of the fact that index(_:offsetBy:limitedBy:) returns nil when you try to advance past the limit.
let binaryBits = "010010000110010101111001"
var result = ""
var index = binaryBits.startIndex
while let next = binaryBits.index(index, offsetBy: 8, limitedBy: binaryBits.endIndex) {
let asciiCode = UInt8(binaryBits[index..<next], radix: 2)!
result.append(Character(UnicodeScalar(asciiCode)))
index = next
}
print(result) // Hey
Note that we're going via Character rather than String in the intermediate step – this is simply to take advantage of the fact that Character (in the standard library of the time) is specially optimised for cases where the UTF-8 representation fits into 63 bits, which is the case here. This saves heap-allocating an intermediate buffer for each character.
Purely for the fun of it, another approach could be to use sequence(state:next:) in order to create a sequence of the start and end indices of each substring, and then reduce in order to concatenate the resultant characters together into a string:
let binaryBits = "010010000110010101111001"
// returns a lazily evaluated sequence of the start and end indices for each substring
// of 8 characters.
let indices = sequence(state: binaryBits.startIndex, next: {
index -> (index: String.Index, nextIndex: String.Index)? in
let previousIndex = index
// Advance the current index – if it didn't go past the limit, then return the
// current index along with the advanced index as a new element of the sequence.
return binaryBits.formIndex(&index, offsetBy: 8, limitedBy: binaryBits.endIndex) ? (previousIndex, index) : nil
})
// iterate over the indices, concatenating the resultant characters together.
let result = indices.reduce("") {
$0 + String(UnicodeScalar(UInt8(binaryBits[$1.index..<$1.nextIndex], radix: 2)!))
}
print(result) // Hey
On the face of it, this appears to be much less efficient than the first solution (due to the fact that reduce should copy the string at each iteration) – however it appears the compiler is able to perform some optimisations to make it not much slower than the first solution.
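If the copying in reduce is a concern, reduce(into:) (available since Swift 4) mutates a single accumulator in place instead of returning a new string per step. A minimal sketch: the function name asciiString(fromBinary:) is made up for illustration, a trailing chunk shorter than 8 characters is ignored, and chunks that do not parse as binary are skipped rather than trapping.

```swift
// Hypothetical helper: decode 8-character binary chunks into one string.
func asciiString(fromBinary bits: String) -> String {
    // Collect the 8-character chunks; a trailing partial chunk is ignored.
    var chunks: [Substring] = []
    var index = bits.startIndex
    while let next = bits.index(index, offsetBy: 8, limitedBy: bits.endIndex) {
        chunks.append(bits[index..<next])
        index = next
    }
    // reduce(into:) appends to the same accumulator on every step.
    return chunks.reduce(into: "") { partial, chunk in
        if let code = UInt8(chunk, radix: 2) {
            partial.append(Character(UnicodeScalar(code)))
        }
    }
}

print(asciiString(fromBinary: "010010000110010101111001")) // Hey
```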
You may need to split the input binary digits into 8-bit chunks, and then convert each chunk to an ASCII character. I cannot think of a super simple way:
var binaryBits = "010010000110010101111001"
var index = binaryBits.startIndex
var result: String = ""
for _ in 0..<binaryBits.count/8 {
let nextIndex = binaryBits.index(index, offsetBy: 8)
let charBits = binaryBits[index..<nextIndex]
result += String(UnicodeScalar(UInt8(charBits, radix: 2)!))
index = nextIndex
}
print(result) //->Hey
This does basically the same as OOPer's solution, but they were faster and have a shorter, more elegant approach :-)
func getASCIIString(from binaryString: String) -> String? {
guard binaryString.count % 8 == 0 else {
return nil
}
var asciiCharacters = [String]()
var asciiString = ""
let startIndex = binaryString.startIndex
var currentLowerIndex = startIndex
while currentLowerIndex < binaryString.endIndex {
let currentUpperIndex = binaryString.index(currentLowerIndex, offsetBy: 8)
let character = String(binaryString[currentLowerIndex..<currentUpperIndex])
asciiCharacters.append(character)
currentLowerIndex = currentUpperIndex
}
for asciiChar in asciiCharacters {
if let number = UInt8(asciiChar, radix: 2) {
let character = String(describing: UnicodeScalar(number))
asciiString.append(character)
} else {
return nil
}
}
return asciiString
}
let binaryString = "010010000110010101111001"
if let asciiString = getASCIIString(from: binaryString) {
print(asciiString) // Hey
}
A different approach
let bytes_string: String = "010010000110010101111001"
var range_count: Int = 0
let characters_array: [String] = bytes_string.map { String($0) }
var conversion: String = ""
repeat
{
let sub_range = characters_array[range_count ..< (range_count + 8)]
let sub_string: String = sub_range.reduce("") { $0 + $1 }
let character: String = String(UnicodeScalar(UInt8(sub_string, radix: 2)!))
conversion += character
range_count += 8
} while range_count < characters_array.count
print(conversion)
You can do this:
extension String {
var binaryToAscii: String {
stride(from: 0, through: count - 1, by: 8)
.map { i in map { String($0)}[i..<(i + 8)].joined() }
.map { String(UnicodeScalar(UInt8($0, radix: 2)!)) }
.joined()
}
}
I've been updating some of my old code and answers with Swift 3 but when I got to Swift Strings and Indexing it has been a pain to understand things.
Specifically I was trying the following:
let str = "Hello, playground"
let prefixRange = str.startIndex..<str.startIndex.advancedBy(5) // error
where the second line was giving me the following error
'advancedBy' is unavailable: To advance an index by n steps call 'index(_:offsetBy:)' on the CharacterView instance that produced the index.
I see that String has the following methods.
str.index(after: String.Index)
str.index(before: String.Index)
str.index(String.Index, offsetBy: String.IndexDistance)
str.index(String.Index, offsetBy: String.IndexDistance, limitedBy: String.Index)
These were really confusing me at first so I started playing around with them until I understood them. I am adding an answer below to show how they are used.
All of the following examples use
var str = "Hello, playground"
startIndex and endIndex
startIndex is the index of the first character
endIndex is the index after the last character.
Example
// character
str[str.startIndex] // H
str[str.endIndex] // error: after last character
// range
let range = str.startIndex..<str.endIndex
str[range] // "Hello, playground"
With Swift 4's one-sided ranges, the range can be simplified to one of the following forms.
let range = str.startIndex...
let range = ..<str.endIndex
I will use the full form in the following examples for the sake of clarity, but for the sake of readability, you will probably want to use the one-sided ranges in your code.
after
As in: index(after: String.Index)
after refers to the index of the character directly after the given index.
Examples
// character
let index = str.index(after: str.startIndex)
str[index] // "e"
// range
let range = str.index(after: str.startIndex)..<str.endIndex
str[range] // "ello, playground"
before
As in: index(before: String.Index)
before refers to the index of the character directly before the given index.
Examples
// character
let index = str.index(before: str.endIndex)
str[index] // d
// range
let range = str.startIndex..<str.index(before: str.endIndex)
str[range] // Hello, playgroun
offsetBy
As in: index(String.Index, offsetBy: String.IndexDistance)
The offsetBy value can be positive or negative and starts from the given index. Although it is of the type String.IndexDistance, you can give it an Int.
Examples
// character
let index = str.index(str.startIndex, offsetBy: 7)
str[index] // p
// range
let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end
str[range] // play
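When only a count from the start or the end is needed, the standard collection helpers express the same ranges without building indices by hand. A small sketch using the same example string; each call returns a Substring.

```swift
let str = "Hello, playground"

// First five characters, the range the question was trying to build.
let hello = str.prefix(5)                 // "Hello"

// Same result as offsetting startIndex by 7 and endIndex by -6.
let play = str.dropFirst(7).dropLast(6)   // "play"

// Last six characters.
let ground = str.suffix(6)                // "ground"

print(hello, play, ground)
```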
limitedBy
As in: index(String.Index, offsetBy: String.IndexDistance, limitedBy: String.Index)
The limitedBy is useful for making sure that the offset does not cause the index to go out of bounds. It is a bounding index. Since it is possible for the offset to exceed the limit, this method returns an Optional. It returns nil if the index is out of bounds.
Example
// character
if let index = str.index(str.startIndex, offsetBy: 7, limitedBy: str.endIndex) {
str[index] // p
}
If the offset had been 77 instead of 7, then the if statement would have been skipped.
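One detail worth checking when the limit is endIndex: an offset equal to the character count is still within the limit, so the method returns endIndex rather than nil, and subscripting with endIndex traps. A minimal sketch; the helper character(at:in:) is hypothetical and only illustrates the extra check.

```swift
let str = "Hello, playground" // 17 characters

// Past the limit: nil.
let tooFar = str.index(str.startIndex, offsetBy: 77, limitedBy: str.endIndex)
// Exactly at the limit: endIndex, which is not a valid subscript position.
let atEnd = str.index(str.startIndex, offsetBy: 17, limitedBy: str.endIndex)

// Hypothetical helper: the Character at an integer offset, or nil if there is none.
func character(at offset: Int, in string: String) -> Character? {
    guard offset >= 0,
          let index = string.index(string.startIndex, offsetBy: offset, limitedBy: string.endIndex),
          index < string.endIndex
    else { return nil }
    return string[index]
}

print(tooFar == nil, atEnd == str.endIndex)
```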
Why is String.Index needed?
It would be much easier to use an Int index for Strings. The reason that you have to create a new String.Index for every String is that Characters in Swift are not all the same length under the hood. A single Swift Character might be composed of one, two, or even more Unicode code points. Thus each unique String must calculate the indexes of its Characters.
It is possible to hide this complexity behind an Int index extension, but I am reluctant to do so. It is good to be reminded of what is actually happening.
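To make the point about Characters of different lengths concrete, here is a small sketch comparing the counts of the different views for two short strings (the variable names are only for illustration).

```swift
let cafe = "cafe\u{301}"            // "café": "e" followed by a combining acute accent
let flag = "\u{1F1EF}\u{1F1F5}"     // 🇯🇵, two regional indicator scalars

// Characters, Unicode scalars, UTF-8 bytes
print(cafe.count, cafe.unicodeScalars.count, cafe.utf8.count)   // 4 5 6

// Characters, Unicode scalars, UTF-16 code units, UTF-8 bytes
print(flag.count, flag.unicodeScalars.count, flag.utf16.count, flag.utf8.count) // 1 2 4 8
```

Because the number of bytes per Character varies like this, the string has to walk its storage to find the n-th Character, which is why the offset methods take a starting index.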
I appreciate this question and all the info with it. I have something in mind that's kind of a question and an answer when it comes to String.Index.
I'm trying to see if there is an O(1) way to access a Substring (or Character) inside a String, because string.index(startIndex, offsetBy: 1) is O(n) if you look at the definition of the index function. Of course we can do something like:
let characterArray = Array(string)
and then access any position in characterArray. However, the space complexity of this is O(n), where n is the length of the string, so it is kind of a waste of space.
I was looking at the Swift.String documentation in Xcode and there is a frozen public struct called Index. We can initialize it as:
let index = String.Index(encodedOffset: 0)
Then simply access or print any index in our String object as such:
print(string[index])
Note: be careful not to go out of bounds.
This works and that's great but what is the run-time and space complexity of doing it this way? Is it any better?
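One caveat that may matter here: the offset passed to such an initializer counts code units of an encoding, not Characters. A small sketch using String.Index(utf16Offset:in:), which replaced the deprecated encodedOffset initializer in Swift 5; it only illustrates what the offset means and makes no claim about the cost of the lookup.

```swift
let s = "\u{1F1EF}\u{1F1F5}abc" // "🇯🇵abc"

// The flag is one Character but four UTF-16 code units,
// so UTF-16 offset 4 is where "a" starts, not the fifth Character.
let byUTF16 = String.Index(utf16Offset: 4, in: s)
print(s[byUTF16]) // a

// The Character-based offset of the same position is 1.
let byCharacter = s.index(s.startIndex, offsetBy: 1)
print(s[byCharacter]) // a
```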
func change(string: inout String) {
var character: Character = .normal
enum Character {
case space
case newLine
case normal
}
for i in stride(from: string.count - 1, through: 0, by: -1) {
// first get index
let index: String.Index?
if i != 0 {
index = string.index(after: string.index(string.startIndex, offsetBy: i - 1))
} else {
index = string.startIndex
}
if string[index!] == "\n" {
if character != .normal {
if character == .newLine {
string.remove(at: index!)
} else if character == .space {
let number = string.index(after: string.index(string.startIndex, offsetBy: i))
if string[number] == " " {
string.remove(at: number)
}
character = .newLine
}
} else {
character = .newLine
}
} else if string[index!] == " " {
if character != .normal {
string.remove(at: index!)
} else {
character = .space
}
} else {
character = .normal
}
}
// startIndex
guard string.count > 0 else { return }
if string[string.startIndex] == "\n" || string[string.startIndex] == " " {
string.remove(at: string.startIndex)
}
// endIndex - here is a little more complicated!
guard string.count > 0 else { return }
let index = string.index(before: string.endIndex)
if string[index] == "\n" || string[index] == " " {
string.remove(at: index)
}
}
Create a UITextView inside a tableViewController. I used the textViewDidChange function to check for return-key input:
if it detects return-key input, it deletes the newline that the return key inserted and dismisses the keyboard.
func textViewDidChange(_ textView: UITextView) {
tableView.beginUpdates()
if textView.text.contains("\n"){
textView.text.remove(at: textView.text.index(before: textView.text.endIndex))
textView.resignFirstResponder()
}
tableView.endUpdates()
}
I'm trying to get a valid substring of at most 255 UTF-8 code units from a Swift string (the idea is to be able to store it in a database VARCHAR(255) field).
The standard way of getting a substring is this:
let string: String = "Hello world!"
let startIndex = string.startIndex
let endIndex = string.startIndex.advancedBy(255, limit: string.endIndex)
let databaseSubstring1 = string[startIndex..<endIndex]
But obviously that would give me a string of 255 characters that may require more than 255 bytes in its UTF-8 representation.
For UTF-8 I can write this:
let utf8StartIndex = string.utf8.startIndex
let utf8EndIndex = utf8StartIndex.advancedBy(255, limit: string.utf8.endIndex)
let databaseSubstringUTF8View = string.utf8[utf8StartIndex..<utf8EndIndex]
let databaseSubstring2 = String(databaseSubstringUTF8View)
But I run the risk of having half a character at the end, which means my UTF8View would not be a valid UTF8 sequence.
And as expected databaseSubstring2 is an optional string because the initializer can fail (it is defined as public init?(_ utf8: String.UTF8View)).
So I need some way of stripping the incomplete UTF-8 code units at the end, or – if possible – a builtin way of doing what I'm trying to do here.
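One way to strip a half-cut sequence at the end is to back up over UTF-8 continuation bytes before decoding. A minimal sketch in current Swift, under stated assumptions: the helper name utf8Prefix(of:maxBytes:) is made up, and it cuts on Unicode scalar boundaries only, so it can still split a Character made of several scalars (a flag, for example).

```swift
// Hypothetical helper: longest prefix of `string` that fits in `maxBytes`
// UTF-8 bytes without cutting a Unicode scalar in half.
func utf8Prefix(of string: String, maxBytes: Int) -> String {
    precondition(maxBytes >= 0)
    let bytes = Array(string.utf8)
    guard bytes.count > maxBytes else { return string }
    var end = maxBytes
    // A byte of the form 10xxxxxx continues a multi-byte scalar, so cutting
    // in front of it would split that scalar; back up to the scalar's first byte.
    while end > 0 && (bytes[end] & 0b1100_0000) == 0b1000_0000 {
        end -= 1
    }
    return String(decoding: bytes[..<end], as: UTF8.self)
}

print(utf8Prefix(of: "h\u{E9}llo", maxBytes: 2)) // h
```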
EDIT
Turns out that databases understand characters, so I should not try to count UTF8 code units, but rather how many characters the database will count in my string (which will probably depend on the database).
According to @OOPer, MySQL counts characters as UTF-16 code units. I have come up with the following implementation:
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
// Start by clipping to 255 characters
let startIndex = string.startIndex
let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
var string = string[startIndex..<endIndex]
// Remove characters from the end one by one until we have no more than
// the maximum number of UTF-16 code units
while (string.utf16.count > maxLength) {
let startIndex = string.startIndex
let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
string = string[startIndex..<endIndex]
}
return string
}
The idea is to count UTF-16 code units, but to remove whole characters from the end (characters as Swift defines them).
EDIT 2
Still according to @OOPer, PostgreSQL counts characters as Unicode scalars, so this should probably work:
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
// Start by clipping to 255 characters
let startIndex = string.startIndex
let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
var string = string[startIndex..<endIndex]
// Remove characters from the end one by one until we have no more than
// the maximum number of Unicode scalars
while (string.unicodeScalars.count > maxLength) {
let startIndex = string.startIndex
let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
string = string[startIndex..<endIndex]
}
return string
}
As I wrote in my comment, you may need your databaseStringForString(_:maxLength:) to truncate your string to match the length limit of your DBMS. That is the case for PostgreSQL with utf8 and for MySQL with utf8mb4.
And I would write the same functionality as your EDIT 2:
func databaseStringForString(string: String, maxUnicodeScalarLength: Int = 255) -> String {
let start = string.startIndex
for index in start..<string.endIndex {
if string[start..<index.successor()].unicodeScalars.count > maxUnicodeScalarLength {
return string[start..<index]
}
}
return string
}
This may be less efficient, but a little bit shorter.
let s = "abc\u{1D122}\u{1F1EF}\u{1F1F5}" //->"abc𝄢🇯🇵"
let dbus = databaseStringForString(s, maxUnicodeScalarLength: 5) //->"abc𝄢"(=="abc\u{1D122}")
So, someone who works with MySQL with utf8 (= utf8mb3) needs something like this:
func databaseStringForString(string: String, maxUTF16Length: Int = 255) -> String {
let start = string.startIndex
for index in start..<string.endIndex {
if string[start..<index.successor()].utf16.count > maxUTF16Length {
return string[start..<index]
}
}
return string
}
let dbu16 = databaseStringForString(s, maxUTF16Length: 4) //->"abc"
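The functions above use Swift 2 APIs (successor(), iterating a range of indices). In current Swift the same idea can be sketched as one function that takes the length measure as a closure. This is an illustrative port, not the answer's code: the name truncated(_:maxLength:measuredBy:) is made up, and like the original it re-measures the prefix at every step, so it is quadratic in the worst case.

```swift
// Hypothetical Swift 5 port: cut `string` at the last Character boundary
// where the measured length of the prefix is still within `maxLength`.
func truncated(_ string: String, maxLength: Int, measuredBy length: (Substring) -> Int) -> String {
    for index in string.indices {
        let next = string.index(after: index)
        // Measure the prefix that includes the Character at `index`.
        if length(string[..<next]) > maxLength {
            return String(string[..<index])
        }
    }
    return string
}

let s = "abc\u{1D122}\u{1F1EF}\u{1F1F5}" // "abc𝄢🇯🇵"
let byScalars = truncated(s, maxLength: 5) { $0.unicodeScalars.count } // "abc𝄢"
let byUTF16 = truncated(s, maxLength: 4) { $0.utf16.count }            // "abc"
print(byScalars, byUTF16)
```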