Inconsistent ranges of strings with greek letter - swift

I'll show the code and the output, since it's easier to explain the issue.
Code and output in the commented lines:
let greekLetter = "β"
let string1 = greekLetter
/// string2 is the same as string1 but converted to NSString then back to String
let string2 = String(NSString(string: greekLetter))
print(string1.range(of: greekLetter)!)
/// prints: Index(_rawBits: 0)..<Index(_rawBits: 131072)
print(string2.range(of: greekLetter)!)
/// prints: Index(_rawBits: 0)..<Index(_rawBits: 65536)
The problem: a String that contains a Greek letter returns a different range than an identical String that was converted to NSString and then back to String.
Any ideas why?
Why this question is raised:
I'm doing some parsing: I need to find the range of a specific substring and then insert something else in its place. Because the returned ranges have the wrong lower/upper bounds, the new string is inserted at the wrong position.
UPDATE 2:
Let's say I have a task: in a given string "β-1" change "1" to "2". And this string comes from the server.
Please look at this code sample:
let wordWithGreekLetter = "β-1"
var string1 = wordWithGreekLetter
let data = """
{ "name" : "\(wordWithGreekLetter)" }
""".data(using: String.Encoding.utf8)
struct User: Decodable {
let name: String
}
let user = try! JSONDecoder().decode(User.self, from: data!)
/// string2 is the same as string1 but decoded from the data
var string2 = user.name
let rangeOfNumberOne1 = string1.range(of: "1")!
string1.removeSubrange(rangeOfNumberOne1)
string1.insert("2", at: rangeOfNumberOne1.lowerBound)
/// RESULT: string1 = "β-2"
let rangeOfNumberOne2 = string2.range(of: "1")!
string2.removeSubrange(rangeOfNumberOne2)
string2.insert("2", at: rangeOfNumberOne2.lowerBound)
/// RESULT: string2 = "β2-"

As Rob explained in Why is startIndex not equal to endIndex shifted to startIndex position in String in Swift?, the raw bits of the index are an implementation detail, and you should not care about that value.
The actual problem is that (quote from Collection):
Saved indices may become invalid as a result of mutating operations.
so rangeOfNumberOne1/2 may no longer be valid after you call removeSubrange() on the string.
In this particular case this may happen for string2 (which is bridged from an NSString) because removing a character may reorganize the internal storage. But this is pure speculation: all that matters is that the current code exhibits undefined behavior.
If you replace
let rangeOfNumberOne1 = string1.range(of: "1")!
string1.removeSubrange(rangeOfNumberOne1)
string1.insert("2", at: rangeOfNumberOne1.lowerBound)
by
let rangeOfNumberOne1 = string1.range(of: "1")!
string1.replaceSubrange(rangeOfNumberOne1, with: "2")
(and similarly for string2) then you'll get the same result "β-2" for both strings.
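A minimal sketch of both points, assuming a second copy of the string built from UTF-8 bytes as a stand-in for the decoded server value (the names native and decoded are made up for illustration). It compares positions as character offsets instead of raw index bits, and replaces the match in a single mutation so that no saved index is reused afterwards:

```swift
import Foundation

let native = "β-1"
// Stand-in for a string that arrived through another path (an assumption for illustration)
var decoded = String(decoding: Array("β-1".utf8), as: UTF8.self)

let nativeRange = native.range(of: "1")!
let decodedRange = decoded.range(of: "1")!

// Character offsets are comparable across strings; raw index bits are not
let nativeOffset = native.distance(from: native.startIndex, to: nativeRange.lowerBound)
let decodedOffset = decoded.distance(from: decoded.startIndex, to: decodedRange.lowerBound)
print(nativeOffset, decodedOffset) // 2 2

// One mutation, so the range is used only while it is still valid
decoded.replaceSubrange(decodedRange, with: "2")
print(decoded) // β-2
```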


Trying to concatenate a string with what I think is an unicode suffix

I have this app of mine that reads datamatrix barcodes from drugs using the camera.
When it does for a particular drug, I receive this string from the detector, as seen on Xcode console:
0100000000D272671721123110700XXXX\U0000001d91D1
My problem is the \U0000001d91D1 part.
This code can be decomposed into the following:
01 00000000D27267 17 211231 10 700XXXX \U0000001d 91D1
01 = drug code
17 = expiring date DMY
10 = batch number
The last part is the dosage rate
In another part of the application I am running on the simulator, which has no camera, so I need to pass this string directly to the module that decomposes the code.
I have tried to store the code as a string using
let code = "0100000000D272671721123110700XXXX\U0000001d91D1"
The compiler complains about the backslash, so I changed it to a double backslash:
let code = "0100000000D272671721123110700XXXX\\U0000001d91D1"
The detector analyzes this string and concludes that the batch number is 700XXXX\U0000001d91D1 instead of just 700XXXX, so the information from the \ onward is lost.
I think this is Unicode or something similar.
How do I create this string correctly?
You can use string transform to decode your hex unicode characters:
let str1 = #"0100000000D272671721123110700XXXX\U00000DF491D1"#
let str2 = #"0100000000D272671721123110700XXXX\U0000001d91D1"#
let decoded1 = str1.applyingTransform(.init("Hex-Any"), reverse: false)! // "0100000000D272671721123110700XXXX෴91D1"
let decoded2 = str2.applyingTransform(.init("Hex-Any"), reverse: false)! // "0100000000D272671721123110700XXXX91D1"
You can also reduce the verbosity by extending StringTransform and StringProtocol:
extension StringTransform {
static let hexToAny: Self = .init("Hex-Any")
static let anyToHex: Self = .init("Any-Hex")
}
extension StringProtocol {
var decodingHex: String {
applyingTransform(.hexToAny, reverse: false)!
}
var encodingHex: String {
applyingTransform(.anyToHex, reverse: false)!
}
}
Usage:
let str1 = #"0100000000D272671721123110700XXXX\U00000DF491D1"#
let str2 = #"0100000000D272671721123110700XXXX\U0000001d91D1"#
let decoded1 = str1.decodingHex // "0100000000D272671721123110700XXXX෴91D1"
let decoded2 = str2.decodingHex // "0100000000D272671721123110700XXXX91D1"
The \U0000001d substring probably represents code point U+001D INFORMATION SEPARATOR THREE, which is also the ASCII code point GS (group separator).
In a Swift string literal, we can write that code point using a Unicode escape sequence: \u{1d}. Try writing your string literal like this:
let code = "0100000000D272671721123110700XXXX\u{1d}91D1"
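As a quick check that the escaped literal really contains the separator, the string can be split on that code point. This is only a sketch: the name groupSeparator is made up for illustration, and the real decomposing module may handle the separator differently.

```swift
// U+001D (GS), written with a Swift Unicode escape
let groupSeparator: Character = "\u{1d}"
let code = "0100000000D272671721123110700XXXX\u{1d}91D1"

// Splitting on GS separates the variable-length batch field from what follows it
let fields = code.split(separator: groupSeparator)
print(fields.count) // 2
print(fields[1])    // 91D1
```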

Format String left of multiple characters in Swift 5?

I have some Strings that vary in length but always end in "listing(number)":
myString = 9AMnep8MAziUCK7VwKF51mXZ2listing28
I want to get the String without "listing(number)":
9AMnep8MAziUCK7VwKF51mXZ2
Methods I've tried such as .index(of: ) only let you format based off one character. Any simple solutions?
A possible solution is to search for the substring with a regular expression and remove the match (replace it with an empty string):
let myString = "9AMnep8MAziUCK7VwKF51mXZ2listing28"
let trimmedString = myString.replacingOccurrences(of: "listing\\d+$", with: "", options: .regularExpression)
\\d+ matches one or more digits
$ represents the end of the string
Alternatively, without creating a new string:
var myString = "9AMnep8MAziUCK7VwKF51mXZ2listing28"
if let range = myString.range(of: "listing\\d+$", options: .regularExpression) {
myString.removeSubrange(range)
}
Another option is to split the string into parts with "listing" as the separator and take the first part:
let result = myString.components(separatedBy: "listing").first
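If "listing" could also occur earlier in the string, searching backwards is one way to anchor on the last occurrence without a regular expression. A minimal sketch; the function name droppingListingSuffix is hypothetical, and unlike the regular expression it does not check that digits follow the marker:

```swift
import Foundation

/// Returns the text before the last occurrence of "listing",
/// or the unchanged text if the marker is absent.
func droppingListingSuffix(_ text: String) -> String {
    guard let range = text.range(of: "listing", options: .backwards) else {
        return text
    }
    return String(text[..<range.lowerBound])
}

let trimmed = droppingListingSuffix("9AMnep8MAziUCK7VwKF51mXZ2listing28")
print(trimmed) // 9AMnep8MAziUCK7VwKF51mXZ2
```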
To solve your issue, see the code below; the comments explain each step. I arrived at this solution using these links as a guide:
https://stackoverflow.com/a/40070835/6596443
https://www.dotnetperls.com/substring-swift
import Foundation

extension String {
    // Parameter inputString: the string you want to manipulate
    // Parameter startStringOfUnwanted: the string at which the removal starts
    // Returns: the trimmed string, or an empty string if startStringOfUnwanted is not found
    static func trimUnWantedEndingString(inputString: String, startStringOfUnwanted: String) -> String {
        // Output string
        var outputString: String?
        // Get the range of the unwanted string within the input
        if let range = inputString.range(of: startStringOfUnwanted) {
            // Keep everything before the lower bound of that range
            outputString = String(inputString[..<range.lowerBound])
        }
        return outputString ?? ""
    }
}
let myString = "9AMnep8MAziUCK7VwKF51mXZ2listing28"
let trimmed = String.trimUnWantedEndingString(inputString: myString, startStringOfUnwanted: "listing")
// trimmed is "9AMnep8MAziUCK7VwKF51mXZ2"

character extract from string in swift

I have a task that involves a chess board. The input gives the starting position of a chess piece, for example "b4" or "a6". How can I decompose the input into two integers, as in C++:
string input;
cin >> input;
int coord_x = input[0] - 'a';
int coord_y = input[1];
I cannot manage to do that in Swift. I do something like:
let input: String = readLine()!
let characters = Array(input)
and then try to get the integers, but it doesn't work, no matter what I try.
Also, what is the element type of the Array in Swift?
You can retrieve the C string representation like this:
let string = "a5"
let scalars = string.lowercased().cString(using: .ascii)!
let first = scalars[0]
let second = scalars[1]
It could be safer to retrieve the unicodeScalar characters instead:
let string = "a5".lowercased()
let characters = Array(string.unicodeScalars)
let first = characters[0].value - UnicodeScalar(unicodeScalarLiteral: "a").value
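Putting the pieces together, here is a sketch that turns a square such as "b4" into two zero-based integers. The function name boardCoordinates and the 8x8 bounds check are assumptions for illustration. Note that Array(input) on a String yields [Character], whereas Array(input.unicodeScalars) yields [Unicode.Scalar], whose value is a UInt32; converting to Int before subtracting avoids a trap on unsigned underflow.

```swift
/// Converts a square such as "b4" into zero-based (x, y) indices,
/// or returns nil if the input is not a valid square on an 8x8 board.
func boardCoordinates(from square: String) -> (x: Int, y: Int)? {
    let scalars = Array(square.lowercased().unicodeScalars)
    guard scalars.count == 2 else { return nil }
    // Convert to Int first so the subtraction cannot underflow
    let x = Int(scalars[0].value) - Int(("a" as Unicode.Scalar).value)
    let y = Int(scalars[1].value) - Int(("1" as Unicode.Scalar).value)
    guard (0..<8).contains(x), (0..<8).contains(y) else { return nil }
    return (x, y)
}

if let position = boardCoordinates(from: "b4") {
    print(position.x, position.y) // 1 3
}
```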

Interpolate String Loaded From File

I can't figure out how to load a string from a file and have variables referenced in that string be interpolated.
Let's say a text file at filePath has these contents:
Hello there, \(name)!
I can load this file into a string with:
let string = try! String(contentsOfFile: filePath, encoding: .utf8)
In my class, I have loaded a name in: let name = "George"
I'd like this new string to interpolate the \(name) using my constant, so that its value is Hello there, George!. (In reality the text file is a much larger template with lots of strings that need to be swapped in.)
I see String has a convertFromStringInterpolation method but I can't figure out if that's the right way to do this. Does anyone have any ideas?
This cannot be done as you intend, because it goes against compile-time type safety: the compiler cannot check the variables that a string loaded from a file refers to.
As a workaround, you can manually define a replacement table, as follows:
import Foundation

var string = "Hello there, [firstName] [lastName]. You are [height]cm tall and [age] years old!"
let firstName = "John"
let lastName = "Appleseed"
let age = 33
let height = 1.74
// String, Int and Double already conform to CustomStringConvertible
let tokenTable: [String: CustomStringConvertible] = [
    "[firstName]": firstName,
    "[lastName]": lastName,
    "[age]": age,
    "[height]": height]
for (token, value) in tokenTable {
    string = string.replacingOccurrences(of: token, with: value.description)
}
print(string)
// Prints: "Hello there, John Appleseed. You are 1.74cm tall and 33 years old!"
You can store entities of any type as the values of tokenTable, as long as they conform to the CustomStringConvertible protocol.
To automate things further, you could define the tokenTable constant in a separate Swift file, and auto-generate that file by using a separate script to extract the tokens from your string-containing file.
Note that this approach will probably be quite inefficient with very large string files (but not much more inefficient than reading the whole string into memory in the first place). If that is a problem, consider processing the string file in a buffered way.
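The same replacement-table idea can also keep the \(name) placeholder syntax from the question's file. The sketch below is only an illustration: the function name render and its parameters are hypothetical, and the placeholders are matched as literal text, not evaluated as Swift interpolation.

```swift
import Foundation

/// Replaces every literal "\(key)" placeholder in the template with its value.
func render(template: String, values: [String: String]) -> String {
    var output = template
    for (key, value) in values {
        // "\\(" is a literal backslash and parenthesis; "\(key)" interpolates the key
        output = output.replacingOccurrences(of: "\\(\(key))", with: value)
    }
    return output
}

// A raw string literal stands in for the text loaded from the file
let template = #"Hello there, \(name)!"#
let greeting = render(template: template, values: ["name": "George"])
print(greeting) // Hello there, George!
```

Because each pass rescans the whole string, a value that itself contains a placeholder may be replaced by a later pass; a single-pass scanner would avoid that.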
There is no built-in mechanism for doing this; you will have to create your own.
Here is an example of a VERY rudimentary version:
let values = [
    "name": "George"
]
let textFromFile = "Hello there, <name>!"
// Keep empty pieces so that literal text and variables stay at even and odd positions
let parts = textFromFile.split(maxSplits: 10, omittingEmptySubsequences: false) { $0 == "<" || $0 == ">" }
var output = ""
for index in 0 ..< parts.count {
    if index % 2 == 0 {
        // If it is even, it is not a variable
        output += String(parts[index])
    } else {
        // If it is odd, it is a variable so look it up
        if let value = values[String(parts[index])] {
            output += value
        } else {
            output += "NOT_FOUND"
        }
    }
}
print(output) // "Hello there, George!"
Depending on your use case, you will probably have to make this much more robust.

How do I convert a string into a vector of bytes in rust?

That might be the dumbest Rustlang question ever but I promise I tried my best to find the answer in the documentation or any other place on the web.
I can convert a string to a vector of bytes like this:
let bar = bytes!("some string");
Unfortunately I can't do it this way
let foo = "some string";
let bar = bytes!(foo);
Because bytes! expects a string literal.
But then, how do I get my foo converted into a vector of bytes?
(&str).as_bytes gives you a view of a string as a &[u8] byte slice (it can also be called on a String, since that derefs to str), and String.into_bytes consumes a String to give you a Vec<u8>.
Use the .as_bytes version if you don't need ownership of the bytes.
fn main() {
let string = "foo";
println!("{:?}", string.as_bytes()); // prints [102, 111, 111]
}
BTW, The naming conventions for conversion functions are helpful in situations like these, because they allow you to know approximately what name you might be looking for.
To expand on the answers above, here are a few different conversions between types.
&str to &[u8]:
let my_string: &str = "some string";
let my_bytes: &[u8] = my_string.as_bytes();
&str to Vec<u8>:
let my_string: &str = "some string";
let my_bytes: Vec<u8> = my_string.as_bytes().to_vec();
String to &[u8]:
let my_string: String = "some string".to_owned();
let my_bytes: &[u8] = my_string.as_bytes();
String to Vec<u8>:
let my_string: String = "some string".to_owned();
let my_bytes: Vec<u8> = my_string.into_bytes();
Specifying the variable type is optional in all cases. Just added to avoid confusion.
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5ad228e45a38b4f097bbbba49100ecfc
`let v1: Vec<u8> = string.into_bytes();`
`let v2: &[u8] = string.as_bytes();`
The two differ in ownership. Some libraries take ownership of the bytes; if you pass the result of as_bytes() to them, the compiler reports an error saying the borrowed value must be 'static.
For example, tokio_uring::fs::File::write_at() takes ownership of its buffer, so give it an owned Vec<u8>.
If you only need to borrow the bytes, use as_bytes().