This string extension works with Latin characters, but doesn't work with Cyrillic. Can someone explain why, and how can I fix it?
extension String {
    var asciiArray: [UInt32] {
        return unicodeScalars.filter { $0.isASCII }.map { $0.value }
    }
}
I think you are confusing the original ASCII standard with one of its Cyrillic extensions (such as KOI8-R). The original ASCII is a 7-bit encoding, whereas such an extension uses the codes above 127 (and up to 255) for its own purposes.
Swift's isASCII property on the UnicodeScalar type indicates whether a scalar belongs to the original 7-bit ASCII range, so Cyrillic scalars are filtered out by your extension.
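If the goal is to get the code points of Cyrillic (or any non-ASCII) text as well, one fix is simply to drop the isASCII filter and map every Unicode scalar. A minimal sketch; the property name unicodeScalarValues is my own choice, not from the question:
extension String {
    // Code point of every Unicode scalar, not just the ASCII ones.
    var unicodeScalarValues: [UInt32] {
        return unicodeScalars.map { $0.value }
    }
}

// "Привет".unicodeScalarValues  // [1055, 1088, 1080, 1074, 1077, 1090]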
I have a custom keyboard extension that inputs data from a BLE device into a text field.
func getdata(data: Data) {
    ...
    // processing data from the BLE device
    ...
    dataToSend = "...\n"
    textDocumentProxy.insertText(dataToSend)
}
When this function is used to insert text in different applications, it behaves differently. For example, in Notes the line feed ("\n") works correctly and inserts a new line. But when the data is inserted into an email or a Numbers sheet, it does not work correctly: instead of inserting a new line, it inserts a tab ("\t").
I also have a function that inserts a new line character
func newLine() {
    textDocumentProxy.insertText("\n")
}
that works as expected regardless of what application I am using. Does anyone know why "\n" by itself works correctly, but behaves differently when it is at the end of a string?
For completeness, I have tried calling newLine() at the end of getdata() thinking there may be an issue with inserting "\n" at the end of a string but the results were the same.
There are other newline characters available, but I can't simply paste them here (because they would create new lines).
Using this extension:
extension CharacterSet {
    var allCharacters: [Character] {
        var result: [Character] = []
        // Walk every Unicode plane the set has members in and collect
        // each scalar that the set contains.
        for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
            for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
                if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
                    result.append(Character(uniChar))
                }
            }
        }
        return result
    }
}
you can access all the characters in any CharacterSet. There is a character set called newlines. Use one of its characters to fulfill your requirement:
let newlines = CharacterSet.newlines.allCharacters
for newLine in newlines {
    textDocumentProxy.insertText(String(newLine))
}
Then store the one that you tested and that worked everywhere, and use it from then on.
Note that you can't rely on a character's index within the character set; it may change.
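To identify which of these characters actually worked, a small sketch (building on the allCharacters extension above) that prints each candidate's code point, so you can later hard-code that exact character (for example "\u{2028}", LINE SEPARATOR) rather than relying on its position in the set:
for newLine in CharacterSet.newlines.allCharacters {
    // Print each candidate newline's scalar value(s) in hex so it can be identified.
    print(newLine.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) })
}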
I wanted to replace the first character of a String and got it to work like this:
s.replaceSubrange(Range(NSMakeRange(0, 1), in: s)!, with: ".")
I wonder if there is a simpler method to achieve the same result?
[edit]
The answers to "Get nth character of a string in Swift programming language" don't provide a mutable substring, and they require writing a String extension, which isn't really helping when trying to shorten the code.
To replace the first character, you can use string concatenation with dropFirst():
var s = "😃hello world!"
s = "." + s.dropFirst()
print(s)
Result:
.hello world!
Note: This will not crash if the String is empty; it will just create a String with the replacement character.
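If this comes up repeatedly, the same technique can be wrapped in a small extension; the method name replacingFirstCharacter(with:) is my own, not from the answer:
extension String {
    // Returns a copy with the first character replaced
    // (or just the replacement if the string is empty).
    func replacingFirstCharacter(with replacement: String) -> String {
        return replacement + dropFirst()
    }
}

print("😃hello world!".replacingFirstCharacter(with: ".")) // .hello world!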
Strings work very differently in Swift than in many other languages. In Swift, a Character is not a single byte but a single user-perceived character, which may be made up of several bytes. This is very important when working with multi-byte characters like emoji (see: Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?).
If you really do want to set a single random byte of your string to an arbitrary value as you expanded on in the comments of your question, you'll need to drop out of the string abstraction and work with your data as a buffer. This is sort of gross in Swift thanks to various safety features but it's doable:
var input = "Hello, world!"

// Access the byte buffer (a null-terminated array of CChar).
var utf8Buffer = input.utf8CString

// Replace the first byte with whatever random data we want.
utf8Buffer[0] = 46 // ASCII encoding of '.'

// Now convert back to a Swift string.
var output: String! = nil // holder for our new string
utf8Buffer.withUnsafeBufferPointer { ptr in
    // Load the byte buffer into a Swift string.
    output = String(cString: ptr.baseAddress!)
}

print(output!) // .ello, world!
I thought I understand Unicode scalars in Swift pretty well, but the dog face emoji proved me wrong.
for code in "🐶".utf16 {
    print(code)
}
The UTF-16 codes are 55357 and 56374. In hex, that's d83d and dc36.
Now:
let dog = "\u{d83d}\u{dc36}"
Instead of getting a string with "🐶", I'm getting an error:
Invalid unicode scalar
I tried with the UTF-8 code units as well and it didn't work either: it doesn't throw an error, but it returns "ð¶" instead of the dog face.
What is wrong here?
The \u{nnnn} escape sequence expects a Unicode scalar value, not the UTF-16 representation (with high and low surrogates):
for code in "🐶".unicodeScalars {
    print(String(code.value, radix: 16))
}
// 1f436
let dog = "\u{1F436}"
print(dog) // 🐶
Solutions to reconstruct a string from its UTF-16 representation can be found at Is there a way to create a String from utf16 array in swift?. For example:
let utf16: [UInt16] = [ 0xd83d, 0xdc36 ]
let dog = String(utf16CodeUnits: utf16, count: utf16.count)
print(dog) // 🐶
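For reference, this is the arithmetic that maps a surrogate pair back to a single scalar value; a small sketch using the pair from the question:
let high: UInt32 = 0xD83D // high (lead) surrogate
let low: UInt32 = 0xDC36  // low (trail) surrogate

// Combine the pair: 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
let value = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
// value == 0x1F436

if let scalar = Unicode.Scalar(value) {
    print(String(Character(scalar))) // 🐶
}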
I'm receiving, via a REST API, a string which contains Unicode-escaped characters in the form \uXXXX,
e.g. Ain\u2019t, which should be Ain’t.
Is there a nice way to convert these?
You can use \u{my_unicode}:
print("Ain\u{2019}t this a beautiful day")
// Prints "Ain’t this a beautiful day"
From the Language Guide - Strings and Characters - Unicode:
String literals can include the following special characters:
...
An arbitrary Unicode scalar, written as \u{n}, where n is a 1–8 digit
hexadecimal number with a value equal to a valid Unicode code point
You can apply a string transform (StringTransform):
extension String {
    var decodingUnicodeCharacters: String {
        applyingTransform(.init("Hex-Any"), reverse: false) ?? ""
    }
}
let string = #"Ain\u2019t"#
print(string.decodingUnicodeCharacters) // "Ain’t"
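If the escapes are guaranteed to be Java-style \uXXXX only, the "Any-Hex/Java" transform applied in reverse should also work, and it handles surrogate pairs such as \ud83d\udc36; a sketch under that assumption:
import Foundation

let escaped = #"Ain\u2019t"#
// Applying the "Any-Hex/Java" transform in reverse decodes \uXXXX escapes.
let decoded = escaped.applyingTransform(StringTransform("Any-Hex/Java"), reverse: true)
print(decoded ?? "") // Ain’t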
So I know how to convert a String to UTF-8 like this:
for character in strings.utf8 {
    // for example, "A" will be converted to 65
    var utf8Value = character
}
I've already read the guide, but I can't find how to convert a Unicode code point represented by an integer to a String, for example converting 65 to "A". I already tried "\u" + utf8Value, but it still failed.
Is there any way to do this?
If you look at the definition of Character, you can see the following initializer:
init(_ scalar: UnicodeScalar)
If we then look at the struct UnicodeScalar, we see this initializer:
init(_ v: UInt32)
We can put them together, and we get a whole character
Character(UnicodeScalar(65))
and if we want it in a string, it's just another initializer away...
1> String(Character(UnicodeScalar(65)))
$R1: String = "A"
Or (although I can't figure out why this one works) you can do
String(UnicodeScalar(65))
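In current Swift, the Unicode.Scalar initializer that takes a UInt32 is failable (the value might not be a valid code point, e.g. a surrogate), so a modern version of this should unwrap it first; a small sketch. The last line also hints at why the form above compiles: String has an initializer that takes a Unicode.Scalar directly.
let codePoint: UInt32 = 65
if let scalar = Unicode.Scalar(codePoint) {
    print(String(Character(scalar))) // A
    print(String(scalar))            // A (String's init taking a Unicode.Scalar)
}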