StringTransform to sanitize strings in Swift

I'm attempting to sanitize a string in Swift using a single StringTransform.
I'm using this example string: "Mom's \t Famous \"Ćevapčići\"!"
And the expected result is: "moms-famous-cevapcici"
So far, I've been able to achieve this using a combination of StringTransform and NSRegularExpression:
"Mom's \t Famous \"Ćevapčići\"!"
.applyingTransform(StringTransform("Latin-ASCII; Lower; [:Punctuation:] Remove;"))?
// produces: Optional("moms famous\tcevapcici")
.replacingMatches(
by: try! NSRegularExpression(pattern: "[^a-z0-9]+", options: []),
withTemplate: "-"
)
// produces: Optional("moms-famous-cevapcici")
Is there a way to do this using only StringTransform?
So far, I've only figured out how to remove certain characters, but not how to replace them. E.g.:
StringTransform("Latin-ASCII; Lower; [:Punctuation:] Remove; [^a-z0-9] Remove;")
The above transform produces "momsfamouscevapcici".
I'd also like to avoid this result: moms---famous-cevapcici. Ideally, this transform could replace several consecutive characters with one dash.
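For reference, a minimal runnable sketch of the two-step approach described above (it does not settle whether a single StringTransform is enough). The helper name `slug(_:)` is hypothetical. Because the question's `replacingMatches(by:withTemplate:)` helper is not shown, the sketch substitutes Foundation's `replacingOccurrences(of:with:options:)` with `.regularExpression`. The final trimming of dashes is an added assumption, not part of the question.

```swift
import Foundation

// Hypothetical helper: transliterate, lowercase and strip punctuation with a
// StringTransform, then collapse each remaining run of characters outside
// [a-z0-9] into a single dash.
func slug(_ input: String) -> String? {
    let transform = StringTransform(rawValue: "Latin-ASCII; Lower; [:Punctuation:] Remove;")
    guard let ascii = input.applyingTransform(transform, reverse: false) else {
        return nil
    }
    return ascii
        .replacingOccurrences(of: "[^a-z0-9]+", with: "-", options: .regularExpression)
        // Assumption: leading or trailing dashes are unwanted in the result.
        .trimmingCharacters(in: CharacterSet(charactersIn: "-"))
}

if let result = slug("Mom's \t Famous \"Ćevapčići\"!") {
    print(result)
}
```

Since the regex uses `+`, several consecutive non-alphanumeric characters become one dash, which avoids results such as moms---famous-cevapcici.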

Related

Regular expression with backslash in swift

I am having problems using replacingOccurrences to replace a word that follows specific keywords inside a text view in Swift 5 and Xcode 12.
For example:
My text view will contain the following string: "NAME\JOHN PHONE\555444333"
"NAME" and "PHONE" will be unique, so any time I change the corresponding field I want to change the name or phone inside this text view.
For example, let's change JOHN to CLOE with this code:
txtOther.text = txtOther.text.replacingOccurrences(of: "NAME(.*?)\\s", with: "NAME\\\(new_value) ", options: .regularExpression)
print (string)
output: "NAMECLOE " instead of "NAME\CLOE "
I can't get the backslash to print with this regular expression.
Alternatively, maybe the regex could be changed so that it only replaces JOHN with CLOE after "NAME".
Thanks!!!
Ariel
You can solve this by using a raw string for your regular expression, that is, a string literal delimited by #" and "#:
let pattern = #"(NAME\\)(.*)\s"#
Note that NAME and the \ are in a separate capture group, which can be referenced as $1 when replacing:
let output = string.replacingOccurrences(of: pattern, with: "$1\(new_value) ", options: .regularExpression)
Use
"NAME\\JOHN PHONE\\555444333".replacingOccurrences(
of: #"NAME\\(\S+)"#,
with: "NAME\\\\\(new_value)",
options: .regularExpression
)
Double the backslashes in the replacement: a backslash is a special metacharacter inside a replacement template, so a literal backslash must itself be escaped there.
\S+ matches one or more non-whitespace characters. This is shorter and more efficient than .*?\s, and you do not have to worry about putting the whitespace back.
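The escaping rules above can be checked with a small self-contained sketch. The names `text`, `newValue`, `pattern` and `template` are illustrative. `NSRegularExpression.escapedTemplate(for:)` is added as a precaution, assuming the new value might itself contain `$` or `\`.

```swift
import Foundation

let text = "NAME\\JOHN PHONE\\555444333"   // the string holds single backslashes
let newValue = "CLOE"                      // illustrative replacement value

// Raw-string pattern: NAME, one literal backslash, then a run of non-whitespace.
let pattern = #"NAME\\\S+"#

// In the template, two backslash characters produce one literal backslash.
// escapedTemplate(for:) neutralises any "$" or "\" inside newValue.
let template = "NAME\\\\" + NSRegularExpression.escapedTemplate(for: newValue)

let updated = text.replacingOccurrences(of: pattern,
                                        with: template,
                                        options: .regularExpression)
print(updated)   // NAME\CLOE PHONE\555444333
```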

How to get non-escaped apostrophe from .components(separatedBy: CharacterSet)

How can I get components(separatedBy: CharacterSet) to return the substrings so that they do not contain escaped apostrophes or single quotes?
When I print the resulting array, I want it to not include the backslash character.
I am using a playground to manipulate text and produce output in the terminal that I can copy and use outside of Xcode, so I want to strip the escape character from the string representation produced in the terminal output.
var str = "can't,,, won't, , good-bye, Santa Claus"
var delimiters = CharacterSet.letters.inverted.subtracting(.whitespaces)
delimiters = delimiters.subtracting(CharacterSet(charactersIn: "-"))
delimiters = delimiters.subtracting(CharacterSet(charactersIn: "'"))
var result = str.components(separatedBy: delimiters)
.map({ $0.trimmingCharacters(in: .whitespaces) })
.filter({ !$0.isEmpty })
print(result) // ["can\'t", "won\'t", "good-bye", "Santa Claus"]
What you are asking for is a metaphysical impossibility. You cannot want anything about how print prints. It's only a representation in the log.
Your strings do not actually contain any backslashes, so what's the problem? How the print command output notates them is irrelevant. You might as well "want" the print command to translate your strings into French. No, that's not what it does. It just prints, and the way it prints is the way it prints.
Another way to look at it: An array doesn't contain square brackets at both ends. And a string doesn't contain double-quotes at both ends. Those are things you might write in order to express those things as literals, but they are not real as part of the actual object. Well, I don't see you objecting to those!
Basically, if you want to control the output of something, you write an output routine. If you're going to rely on print, just accept the funny old way it writes stuff and move on.
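As a sketch of the "write an output routine" suggestion: instead of printing the array itself (which uses its debug-style description), join the elements into one string and print that. The comma separator is an assumption about the desired output format.

```swift
import Foundation

let str = "can't,,, won't, , good-bye, Santa Claus"

// Same delimiter set as in the question: everything that is not a letter,
// except whitespace, hyphen and apostrophe.
var delimiters = CharacterSet.letters.inverted.subtracting(.whitespaces)
delimiters = delimiters.subtracting(CharacterSet(charactersIn: "-'"))

let result = str.components(separatedBy: delimiters)
    .map { $0.trimmingCharacters(in: .whitespaces) }
    .filter { !$0.isEmpty }

// Printing a String (not an Array) writes its characters as they are,
// without quotes or escape sequences.
let line = result.joined(separator: ", ")
print(line)   // can't, won't, good-bye, Santa Claus
```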

Count leading tabs in Swift string

I need to count the number of leading tabs in a Swift string. I know there are fairly simple solutions (e.g. looping over the string until a non-tab character is encountered) but I am looking for a more elegant solution.
I have attempted to use a regex such as ^\\t* along with the .numberOfMatches method but this detects all the tab characters as one match. For example, if the string has three leading tabs then that method just returns 1. Is there a way to write a regex that treats each individual tab character as a single match?
Also open to other ways of approaching this without using a regex.
Here is a non-regex solution
let count = someString.prefix(while: {$0 == "\t"}).count
You may use
\G\t
See the regex demo.
Here,
\G - matches a string start position or end of the previous match position, and
\t - matches a single tab.
Swift test:
let string = "\t\t123"
let regex = try! NSRegularExpression(pattern: "\\G\t", options: [])
let numberOfOccurrences = regex.numberOfMatches(in: string, range: NSRange(string.startIndex..., in: string))
print(numberOfOccurrences) // => 2
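Both approaches can be wrapped in small helpers and compared side by side. The function names below are hypothetical. The sketch also shows that a tab appearing later in the string is not counted, because `\G` only matches at the start of the search range or directly after the previous match.

```swift
import Foundation

// Non-regex version: count the characters of the leading run of tabs.
func leadingTabCount(_ s: String) -> Int {
    return s.prefix(while: { $0 == "\t" }).count
}

// Regex version: each match is one tab that is contiguous with the previous match.
func leadingTabCountRegex(_ s: String) -> Int {
    let regex = try! NSRegularExpression(pattern: #"\G\t"#)
    return regex.numberOfMatches(in: s, range: NSRange(s.startIndex..., in: s))
}

let sample = "\t\t\tabc\tdef"
print(leadingTabCount(sample))        // 3
print(leadingTabCountRegex(sample))   // 3
```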

How to split a Korean word into its components?

So, for example, the character 김 is made up of ㄱ, ㅣ and ㅁ. I need to split the Korean word into its components to get the resulting 3 characters.
I tried by doing the following but it doesn't seem to output it correctly:
let str = "김"
let utf8 = str.utf8
let first:UInt8 = utf8.first!
let char = Character(UnicodeScalar(first))
The problem is that this code returns ê, when it should be returning ㄱ.
You need to use the decomposedStringWithCompatibilityMapping property to get the Unicode scalar values, and then use those scalar values to get the characters. Something like the following:
let string = "김"
for scalar in string.decomposedStringWithCompatibilityMapping.unicodeScalars {
print("\(scalar) ")
}
Output:
ᄀ
ᅵ
ᆷ
You can create a list of single-character strings like this:
let chars = string.decomposedStringWithCompatibilityMapping.unicodeScalars.map { String($0) }
print(chars)
// ["ᄀ", "ᅵ", "ᆷ"]
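Note that the scalars produced this way are conjoining jamo (U+1100 block), not the compatibility jamo such as ㄱ (U+3131) written in the question, even though they look alike. A minimal sketch that prints the code points makes this visible. The variable names are illustrative.

```swift
import Foundation

let syllable = "김"   // U+AE40

// Decompose the precomposed syllable and list each scalar's code point in hex.
let scalars = syllable.decomposedStringWithCompatibilityMapping.unicodeScalars
let codePoints = scalars.map { String($0.value, radix: 16, uppercase: true) }

print(codePoints)   // ["1100", "1175", "11B7"]
```

If the compatibility forms (ㄱ, ㅣ, ㅁ) are needed specifically, an additional mapping from conjoining jamo to compatibility jamo would be required, which this sketch does not cover.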
Korean related info in Apple docs
Extended grapheme clusters are a flexible way to represent many
complex script characters as a single Character value. For example,
Hangul syllables from the Korean alphabet can be represented as either
a precomposed or decomposed sequence. Both of these representations
qualify as a single Character value in Swift:
let precomposed: Character = "\u{D55C}" // 한
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}" // ᄒ, ᅡ, ᆫ
// precomposed is 한, decomposed is 한
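The quoted passage can be verified with a short sketch: the two representations compare equal as Character values even though they consist of different numbers of Unicode scalars.

```swift
let precomposed: Character = "\u{D55C}"                  // 한 as one scalar
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}"   // ᄒ, ᅡ, ᆫ as three scalars

// Swift compares characters by canonical equivalence, not by raw scalars.
print(precomposed == decomposed)                  // true
print(String(precomposed).unicodeScalars.count)   // 1
print(String(decomposed).unicodeScalars.count)    // 3
```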

Parsing Infix Mathematical Expressions in Swift Using Regular Expressions

I would like to convert a string that is formatted as an infix mathematical expression to an array of tokens, using regular expressions. I'm very new to regular expressions, so forgive me if the answer to this question turns out to be too trivial.
For example:
"31+2--3*43.8/1%(1*2)" -> ["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"]
I've already implemented a method that achieves this task, however, it consists of many lines of code and a few nested loops. I figured that when I define more operators/functions that may even consist of multiple characters, such as log or cos, it would be easier to edit a regex string rather than adding many more lines of code to my working function. Are regular expressions the right job for this, and if so, where am I going wrong? Or am I better off adding to my working parser?
I've already referred to the following SO posts:
How to split a string, but also keep the delimiters?
This one was very helpful, but I don't believe I'm using 'lookahead' correctly.
Validate mathematical expressions using regular expression?
The solution to the question above doesn't convert the string into an array of tokens. Rather, it checks to see if the given string is a valid mathematical expression.
My code is as follows:
func convertToInfixTokens(expression: String) -> [String]?
{
do
{
let pattern = "^(((?=[+-/*]))(-)?\\d+(\\.\\d+)?)*"
let regex = try NSRegularExpression(pattern: pattern)
let results = regex.matches(in: expression, range: NSRange(expression.startIndex..., in: expression))
return results.map
{
String(expression[Range($0.range, in: expression)!])
}
}
catch
{
return nil
}
}
When I do pass a valid infix expression to this function, it returns nil. Where am I going wrong with my regex string?
NOTE: I haven't even gotten to the point of trying to parse parentheses as individual tokens. I'm still figuring out why it won't work on this expression:
"-99+44+2+-3/3.2-6"
Any feedback is appreciated, thanks!
Your pattern does not work because it only matches text at the start of the string (see ^ anchor), then the (?=[+-/*]) positive lookahead requires the first char to be an operator from the specified set but the only operator that you consume is an optional -. So, when * tries to match the enclosed pattern sequence the second time with -99+44+2+-3/3.2-6, it sees +44 and -?\d fails to match it (as it does not know how to match + with -?).
Here is how your regex matches the string:
You may tokenize the expression using
let pattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"
See the regex demo
Details
(?<!\d) - there should be no digit immediately to the left of the current position
-? - an optional -
\d+ - 1 or more digits
(?:\.\d+)? - an optional sequence of . and 1+ digits
| - or
[-+*/%()] - a single operator or parenthesis character: -, +, *, /, %, ( or ).
Output using your function:
Optional(["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"])
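Put together, a self-contained tokenizer using this pattern might look like the sketch below. The function name `tokenize(_:)` is hypothetical, and returning an empty array when the pattern fails to compile is an assumption made here for simplicity.

```swift
import Foundation

// Splits an infix expression into number and operator tokens.
func tokenize(_ expression: String) -> [String] {
    // A number (optionally negative, if no digit precedes it) or one operator/parenthesis.
    let pattern = #"(?<!\d)-?\d+(?:\.\d+)?|[-+*/%()]"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []
    }
    let fullRange = NSRange(expression.startIndex..., in: expression)
    return regex.matches(in: expression, range: fullRange).compactMap { match in
        // Convert each NSRange back into a Swift range before slicing.
        Range(match.range, in: expression).map { String(expression[$0]) }
    }
}

print(tokenize("-99+44+2+-3/3.2-6"))
// ["-99", "+", "44", "+", "2", "+", "-3", "/", "3.2", "-", "6"]
```

One limitation to keep in mind: the lookbehind only checks for a preceding digit, so a minus sign directly after a closing parenthesis, as in "(1)-3", is read as the sign of "-3" rather than as a binary operator.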