Remove characters from string [Swift] [closed]

There is a line of console output that needs to be displayed in the application. Can anyone suggest how to clean it up? The raw line is:
"\u{1B}[2K\u{1B}[1G\u{1B}[32msuccess\u{1B}[39m Checking for changed pages - 0.000s"
The result should be:
"success Checking for changed pages - 0.000s"

Use the regular expression mentioned in this answer:
import Foundation
let str = "\u{1B}[2K\u{1B}[1G\u{1B}[32msuccess\u{1B}[39m Checking for changed pages - 0.000s"
// Matches ESC followed by either a single escape character or a CSI sequence
// ("[", optional parameters, then a final character).
let regex = try! NSRegularExpression(pattern: "\u{1B}(?:[@-Z\\-_]|\\[[0-?]*[ -/]*[@-~])")
let range = NSRange(str.startIndex..<str.endIndex, in: str)
let cleaned = regex.stringByReplacingMatches(in: str, options: [], range: range, withTemplate: "")
print(cleaned) // "success Checking for changed pages - 0.000s"

@Gereon's answer is one way to skin the cat. Here's another:
let s = "\u{1B}[2K\u{1B}[1G\u{1B}[32msuccess\u{1B}[39m Checking for changed pages - 0.000s"
guard let r = s.range(of: "Checking for changed pages") else {
    fatalError("Insert code for what to do if the substring isn't found")
}
let cleaned = "success " + String(s[r.lowerBound...])
Here I simply insert the literal "success". If you need to verify that it is actually in the string, that can be done too:
guard let r = s.range(of: "Checking for changed pages"), s.contains("success") else {
    fatalError("Insert code for what to do if the substring isn't found")
}
To solve the more general problem of removing ANSI escape sequences, you'll need to parse them. Neither my simple solution, nor the regex solution will do it. You'll need to explicitly look for the possible valid codes that follow the escapes.
let escapeSequences: [String] = [
    /* list of escape sequences */
    "[2K", "[1G", "[32m", "[39m", // etc...
]
let escapeChar = Character("\u{1B}")
var result = ""
var i = s.startIndex
outer: while i != s.endIndex {
    if s[i] == escapeChar {
        // Skip the escape character itself.
        i = s.index(after: i)
        for sequence in escapeSequences {
            if s[i...].hasPrefix(sequence) {
                // Skip the whole known sequence and resume scanning after it.
                i = s.index(i, offsetBy: sequence.count)
                continue outer
            }
        }
        // Unknown sequence: only the escape character is dropped.
        continue
    }
    result.append(s[i])
    i = s.index(after: i)
}
print(result)
The thing is, I think ANSI escape sequences can be combined in interesting ways so that what would be multiple escapes can be merged into a single one in some cases. So it can be more complex than just the simple parser I presented.
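For the common case of CSI sequences (ESC, then "[", then parameters and a final character), a hand-written scanner does not need a list of codes. The following is a minimal sketch, not a full ANSI parser. It assumes the ECMA-48 layout of parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F and a final byte 0x40-0x7E, and the function name strippingCSISequences is made up for illustration. Other escape types are left untouched.

```swift
/// Hypothetical helper: removes CSI sequences (ESC "[" ... final byte) from `input`.
func strippingCSISequences(from input: String) -> String {
    let scalars = Array(input.unicodeScalars)
    var output = String.UnicodeScalarView()
    var i = 0
    while i < scalars.count {
        // A CSI sequence starts with ESC followed by "[".
        if scalars[i] == "\u{1B}", i + 1 < scalars.count, scalars[i + 1] == "[" {
            var j = i + 2
            // Parameter bytes: 0x30...0x3F (digits, ";", "?", ...).
            while j < scalars.count, (0x30...0x3F).contains(scalars[j].value) { j += 1 }
            // Intermediate bytes: 0x20...0x2F.
            while j < scalars.count, (0x20...0x2F).contains(scalars[j].value) { j += 1 }
            // Final byte: 0x40...0x7E. Only a complete sequence is skipped.
            if j < scalars.count, (0x40...0x7E).contains(scalars[j].value) {
                i = j + 1
                continue
            }
        }
        // Anything else, including a malformed sequence, is copied through.
        output.append(scalars[i])
        i += 1
    }
    return String(output)
}

let raw = "\u{1B}[2K\u{1B}[1G\u{1B}[32msuccess\u{1B}[39m Checking for changed pages - 0.000s"
print(strippingCSISequences(from: raw))
```

Working on unicode scalars rather than characters keeps the byte-range checks simple. A malformed sequence is deliberately left in place rather than guessed at.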


AWS Polly - Highlighting special characters

I am using the AWS Polly service for text to speech. If the text contains certain special characters, it returns the wrong start and end values.
For example, if the text is "Böylelikle", it returns: {"time":6,"type":"word","start":0,"end":11,"value":"Böylelikle"}
But it should start at 0 and end at 10.
I've searched the AWS documentation, and it says that the start and end values are offsets in bytes, not characters.
My question is: how can I convert these byte offsets to character offsets?
My code is:
builder.continueOnSuccessWith { (awsTask: AWSTask<NSURL>) -> Any? in
    if builder.error == nil {
        if let url = awsTask.result {
            do {
                let txtData = try Data(contentsOf: url as URL)
                if let txtString = String(data: txtData, encoding: .utf8) {
                    let lines = txtString.components(separatedBy: .newlines)
                    for line in lines {
                        let jsonData = Data(line.utf8)
                        let pollyVoiceSentence = try JSONDecoder().decode(PollyVoiceSentence.self, from: jsonData)
                        voiceSentences.append(pollyVoiceSentence)
                    }
                }
            } catch {
                print("Could not parse TXT file")
            }
        }
    } else {
        print("ParseJSON: \(builder.error!)")
    }
    completionHandler(voiceSentences)
    return nil
}
And to highlight words:
let start = pollyVoiceSentence.start
let end = pollyVoiceSentence.end
let voiceRange = NSRange(location: start, length: end - start)
print("RANGE: \(voiceRange) - Word: \(pollyVoiceSentence.value)")
Thanks.
It looks like they are giving you the equivalent of String.utf8.count for the word. Swift strings are Unicode, and not every character occupies a single byte in UTF-8.
You can read the official docs here:
String and Characters
There are a ton of useful details there. For your input, "Böylelikle" has 10 characters but takes 11 bytes in UTF-8, because "ö" is encoded as two bytes.
What you can do in your case is:
1. Decode the PollyVoiceSentence the way you do today.
2. Create an extension on PollyVoiceSentence to account for this character count issue.
3. Iterate over all the words in a sentence, because each previous word's character count now affects the start of all the subsequent words.
You can't use the start and end provided by the JSON directly as character offsets, because they are byte offsets and don't map onto Swift's String API as they are.
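One way to do the conversion is to walk the string's UTF-8 view to the given byte offsets and translate those positions back into the string. This is only a sketch: it assumes the offsets are relative to the exact text you hold in memory, and nsRange(forUTF8Start:end:in:) is a hypothetical helper name, not part of the AWS SDK.

```swift
import Foundation

/// Hypothetical helper: maps UTF-8 byte offsets to an NSRange (UTF-16 units) in `text`.
/// Returns nil if the offsets are out of bounds or fall inside a multi-byte character.
func nsRange(forUTF8Start start: Int, end: Int, in text: String) -> NSRange? {
    let utf8 = text.utf8
    guard start >= 0, start <= end, end <= utf8.count else { return nil }
    let lower = utf8.index(utf8.startIndex, offsetBy: start)
    let upper = utf8.index(utf8.startIndex, offsetBy: end)
    // samePosition(in:) fails when a byte offset does not land on a character boundary.
    guard let lowerBound = lower.samePosition(in: text),
          let upperBound = upper.samePosition(in: text) else { return nil }
    return NSRange(lowerBound..<upperBound, in: text)
}

let text = "B\u{F6}ylelikle" // "Böylelikle" with a precomposed ö (two bytes in UTF-8)
if let range = nsRange(forUTF8Start: 0, end: 11, in: text) {
    print(range.location, range.length) // location 0, length 10
}
```

The resulting NSRange can be used in place of the one built directly from start and end in the highlighting code.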

Swift String Tokenizer / Parser [closed]

Hello there fellow Swift devs!
I am a junior dev, and I'm trying to figure out the best way to tokenize / parse a Swift String as an exercise.
What I have is a string which looks like this:
let string = "This is a {B}string{/B} and this is a substring."
What I would like to do is tokenize the string and change the "strings / tokens" inside the tags you see.
I could use NSRegularExpression and its matches, but it feels too generic. I would like to have only, say, 2 of these tags that change the text. What would be the best approach in Swift 5.2^?
if let regex = try? NSRegularExpression(pattern: "\\{[a-z0-9]+\\}", options: .caseInsensitive) {
    let string = self as NSString
    return regex.matches(in: self, options: [], range: NSRange(location: 0, length: string.length)).map {
        // now $0 is the result? but it won't work for enclosing the tags :/
    }
}
If the option of using html tags instead of {B}{/B} is acceptable, then you can use the StringEx library that I wrote for this purpose.
You can select a substring inside the html tag and replace it with another string like this:
let string = "This is a <b>string</b> and this is a substring."
let ex = string.ex
ex[.tag("b")].replace(with: "some value")
print(ex.rawString) // This is a <b>some value</b> and this is a substring.
print(ex.string) // This is a some value and this is a substring.
If necessary, you can also style the selected substrings and get an NSAttributedString:
ex[.tag("b")].style([
    .font(.boldSystemFont(ofSize: 16)),
    .color(.black)
])
myLabel.attributedText = ex.attributedString
Not sure whether you have solved it with NLTokenizer or not, but you can certainly solve it with a regex. Here is how. (I have implemented it generically: if in future you have to handle different kinds of tags and substitute a different string for each, a small tweak to the logic should do the job.)
override func viewDidLoad() {
    super.viewDidLoad()
    let regexStr = "(\\{B\\}(\\s*\\w+\\s*)*\\{\\/B\\})"
    let regex = try! NSRegularExpression(pattern: regexStr)
    let string = "Sandeep {B}Bhandaari{/B} is here{B}Sandeep{/B}"
    var foundRanges = [NSRange]()
    // NSRange counts UTF-16 code units, so use utf16.count rather than count.
    regex.enumerateMatches(in: string, options: [], range: NSMakeRange(0, string.utf16.count)) { (match, flag, stop) in
        if let matchRange = match?.range(at: 1) {
            foundRanges.append(matchRange)
        }
    }
    let substituteString = "abcd"
    var replacedString = string as NSString
    let foundRangesCount = foundRanges.count
    var currentRange = 0
    while foundRangesCount > currentRange {
        let range = foundRanges[currentRange]
        replacedString = replacedString.replacingCharacters(in: range, with: substituteString) as NSString
        // Shift the stored ranges by the change in length caused by this replacement.
        reEvaluateAllRanges(ranges: &foundRanges, byOffset: range.length - substituteString.utf16.count)
        currentRange += 1
    }
    debugPrint(replacedString)
}
func reEvaluateAllRanges(ranges: inout [NSRange], byOffset: Int) {
    var newFoundRange = [NSRange]()
    for range in ranges {
        newFoundRange.append(NSMakeRange(range.location - byOffset, range.length))
    }
    ranges = newFoundRange
}
Input: "Sandeep {B}Bhandaari{/B} is here"
Output: Sandeep abcd is here
Input: "Sandeep {B}Bhandaari{/B} is here{B}Sandeep{/B}"
Output: Sandeep abcd is hereabcd
Note the edge cases this handles: longer strings replaced by shorter substitute strings and vice versa, and detection of text enclosed in the tag with or without spaces.
EDIT 1:
The regex (\\{B\\}(\\s*\\w+\\s*)*\\{\\/B\\}) should be self-explanatory; in case you need help understanding it, use a regex cheat sheet.
regex.enumerateMatches(in: string, options: [], range: NSMakeRange(0, string.utf16.count)) { (match, flag, stop) in
    if let matchRange = match?.range(at: 1) {
        foundRanges.append(matchRange)
    }
}
I could have modified the string right here, but if there is more than one match and you mutate the string, the ranges already evaluated become invalid. Hence I save all the found ranges into an array and apply the replacement to each of them later.
let substituteString = "abcd"
var replacedString = string as NSString
let foundRangesCount = foundRanges.count
var currentRange = 0
while foundRangesCount > currentRange {
    let range = foundRanges[currentRange]
    replacedString = replacedString.replacingCharacters(in: range, with: substituteString) as NSString
    reEvaluateAllRanges(ranges: &foundRanges, byOffset: range.length - substituteString.count)
    currentRange += 1
}
Here I iterate through all the found match ranges and replace the characters in each range with the substitute string. You can always put a switch or an if/else ladder inside the while loop to look for different types of tags and pass a different substitute string for each tag.
func reEvaluateAllRanges(ranges: inout [NSRange], byOffset: Int) {
    var newFoundRange = [NSRange]()
    for range in ranges {
        newFoundRange.append(NSMakeRange(range.location - byOffset, range.length))
    }
    ranges = newFoundRange
}
This function shifts all the ranges in the array by the offset. Remember that you only need to modify each range's location; the length remains the same.
One optimisation you could make is to drop the ranges for which the substitute string has already been applied from the array.
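If every occurrence of a tag gets the same replacement, the range bookkeeping can be avoided by letting Foundation do the substitution in one pass. The following is a sketch of that alternative, not the approach used above; replacingTaggedText(in:tag:with:) is a hypothetical helper, and it assumes tags are not nested and do not span lines.

```swift
import Foundation

/// Hypothetical helper: replaces every "{tag}...{/tag}" span (tags included) with `replacement`.
func replacingTaggedText(in text: String, tag: String, with replacement: String) -> String {
    // Escape the braces so they are matched literally.
    let open = NSRegularExpression.escapedPattern(for: "{\(tag)}")
    let close = NSRegularExpression.escapedPattern(for: "{/\(tag)}")
    // ".*?" is lazy, so each match stops at the nearest closing tag.
    let pattern = open + ".*?" + close
    // Escape "$" and "\" so the replacement is inserted verbatim.
    let template = NSRegularExpression.escapedTemplate(for: replacement)
    return text.replacingOccurrences(of: pattern, with: template, options: .regularExpression)
}

let tagged = "Sandeep {B}Bhandaari{/B} is here{B}Sandeep{/B}"
print(replacingTaggedText(in: tagged, tag: "B", with: "abcd")) // Sandeep abcd is hereabcd
```

Calling the helper once per supported tag keeps the set of tags explicit, which fits the wish to support only a couple of them.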

Explanation of lastIndex of: and firstIndex of: used in a string in Swift

I am solving a programming problem in Swift and found a solution online that I don't fully understand. The problem is: write a function that reverses the characters in (possibly nested) parentheses in the input string. The solution is:
var inputString = "foo(bar)baz(ga)kjh"
var s = inputString
while let openIdx = s.lastIndex(of: "(") {
    let closeIdx = s[openIdx...].firstIndex(of: ")")!
    s.replaceSubrange(openIdx...closeIdx, with: s[s.index(after: openIdx)..<closeIdx].reversed())
}
print(s) // output: foorabbazagkjh (the letters inside the parentheses are reversed)
I'd like to know what lastIndex(of:) does in this case,
and what let closeIdx = s[openIdx...].firstIndex(of:")")! does as well.
The best place to experiment with these kinds of questions would be a Playground. Also, check out the documentation.
Now let's go through each of the statements:
let openIdx = s.lastIndex(of: "(") // finds the index of the last "("; the return type here is String.Index?
So if I print the string from that index to the end, it would be:
print(s[openIdx!...]) // `!` force-unwraps the optional
// (ga)kjh
Now for your second question:
let closeIdx = s[openIdx...].firstIndex(of:")")!
Let's break it down: s[openIdx...] is equal to (ga)kjh in the first iteration, so firstIndex(of:) returns the index of the ) after a.
My suggestion is to always break a statement down and learn what each expression is doing.
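The same loop can be written with each step named, which makes it easier to follow in a Playground. This is a sketch of the solution above wrapped in a function; the name reversingParenthesized is made up for illustration, and the force unwrap is replaced by a guard so that an unmatched "(" simply stops the loop.

```swift
/// Hypothetical wrapper around the loop from the question.
func reversingParenthesized(_ input: String) -> String {
    var s = input
    // lastIndex(of:) finds the innermost (right-most) opening parenthesis.
    while let openIdx = s.lastIndex(of: "(") {
        // Search only the part of the string that starts at that parenthesis.
        guard let closeIdx = s[openIdx...].firstIndex(of: ")") else { break }
        // The characters strictly between the two parentheses, reversed.
        let inner = String(s[s.index(after: openIdx)..<closeIdx].reversed())
        // Replace "(...)" including both parentheses with the reversed text.
        s.replaceSubrange(openIdx...closeIdx, with: inner)
    }
    return s
}

print(reversingParenthesized("foo(bar)baz(ga)kjh")) // foorabbazagkjh
print(reversingParenthesized("a(bc(de)f)g"))        // afdecbg
```

Because the last "(" is always processed first, nested groups are resolved from the inside out.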

Substrings in Swift

I'm having a problem understanding how to work with substrings in Swift. Basically, I'm getting a JSON value that has a string with the following format:
<a href="…">Something</a>
I'm trying to get rid of the HTML anchor tag with Swift so I'm left with Something. My thought was to find the index of every < and > in the string so that I could then do a substringWithRange and advance up to the right index.
My problem is that I can't figure out how to find the index. I've read that Swift doesn't support this out of the box (unless you extend String).
I don't want to add CPU cycles unnecessarily. So my question is: how do I find the indexes efficiently? Or is there a better way of filtering out the tags?
Edit: Converted Andrew's first code sample to a function:
func formatTwitterSource(rawStr: String) -> String {
    let unParsedString = rawStr
    var midParseString = ""
    var parsedString = ""
    if let firstEndIndex = find(unParsedString, ">") {
        midParseString = unParsedString[Range<String.Index>(start: firstEndIndex.successor(), end: unParsedString.endIndex)]
        if let secondStartIndex = find(midParseString, "<") {
            parsedString = midParseString[Range<String.Index>(start: midParseString.startIndex, end: secondStartIndex)]
        }
    }
    return parsedString
}
Nothing too complicated. It takes in a String that has the tags in it. Then it uses Andrew's magic to parse everything out. I renamed the variables and made them clearer so you can see which variable does what in the process. Then in the end, it returns the parsed string.
You could do something like this, but it isn't pretty really. Obviously you would want to factor this into a function and possibly allow for various start/end tokens.
let testText = "<a href=\"…\">Something</a>"
if let firstEndIndex = find(testText, ">") {
    let testText2 = testText[Range<String.Index>(start: firstEndIndex.successor(), end: testText.endIndex)]
    if let secondStartIndex = find(testText2, "<") {
        let testText3 = testText2[Range<String.Index>(start: testText2.startIndex, end: secondStartIndex)]
    }
}
Edit
Working on this a little further and came up with something a little more idiomatic?
let startSplits = split(testText, { $0 == "<" })
let strippedValues = map(startSplits) { (s) -> String? in
    if let endIndex = find(s, ">") {
        return s[Range<String.Index>(start: endIndex.successor(), end: s.endIndex)]
    }
    return nil
}
let strings = map(filter(strippedValues, { $0 != "" })) { $0! }
It uses a little more functional style there at the end. Not sure I much enjoy the Swift style of map/filter compared to Haskell. But anyhow, the one potentially dangerous part is that forced unwrapping in the final map. If you can live with a result of [String?] then it isn't necessary.
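The free functions find and successor() used in these snippets come from early Swift and are no longer available. In current Swift the same two-step slicing can be sketched with firstIndex(of:) and range subscripts. The function name textInsideFirstTag is hypothetical, and the URL in the usage line is a placeholder, not the one from the question.

```swift
/// Hypothetical helper: returns the text between the first ">" and the next "<",
/// or nil if the string does not contain that pattern.
func textInsideFirstTag(of source: String) -> String? {
    // End of the opening tag.
    guard let openEnd = source.firstIndex(of: ">") else { return nil }
    // Everything after the opening tag; slicing shares storage with `source`.
    let rest = source[source.index(after: openEnd)...]
    // Start of the closing tag.
    guard let closeStart = rest.firstIndex(of: "<") else { return nil }
    return String(rest[..<closeStart])
}

let sample = "<a href=\"https://example.com\">Something</a>" // placeholder URL
print(textInsideFirstTag(of: sample) ?? "no tag found") // Something
```

Each step scans the string at most once and only the final result is copied, which addresses the concern about unnecessary CPU cycles.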
Even though this question has been already answered, I am adding solution based on regex.
let pattern = "<.*>(.*)<.*>"
let src = "<a href=\"…\">Something</a>"
var error: NSError? = nil
var regex = NSRegularExpression(pattern: pattern, options: .DotMatchesLineSeparators, error: &error)
if let regex = regex {
    var result = regex.stringByReplacingMatchesInString(src, options: nil, range: NSRange(location: 0, length: countElements(src)), withTemplate: "$1")
    println(result)
}

Efficiently remove the last word from a string in Swift

I am trying to build an autocorrect system, so I need to be able to delete the last word typed and replace it with the correct one. My solution:
func autocorrect() {
    hasWordReadyToCorrect = false
    var wordProxy = self.textDocumentProxy as UITextDocumentProxy
    var stringOfWords = wordProxy.documentContextBeforeInput
    fullString = "Unset Value"
    if stringOfWords != nil {
        var words = stringOfWords.componentsSeparatedByCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
        for word in words {
            arrayOfWords += [word]
        }
        println("The last word of the array is \(arrayOfWords.last)")
        for (mistake, word) in autocorrectList {
            println("The mistake is \(mistake)")
            if mistake == arrayOfWords.last {
                fullString = word
                hasWordReadyToCorrect = true
            }
        }
        println("The corrected String is \(fullString)")
    }
}
This method is called after each keystroke, and if the space is pressed, it corrects the word. My problem comes in when the string of text becomes longer than about 20 words. It takes a while for it to fill the array each time a character is pressed, and it starts to lag to a point of not being able to use it. Is there a more efficient and elegant Swift way of writing this function? I'd appreciate any help!
This doesn't answer the OP's "autocorrect" issue directly, but this code is probably the easiest way to answer the question posed in the title:
Swift 3
let myString = "The dog jumped over a fence"
let myStringWithoutLastWord = myString.components(separatedBy: " ").dropLast().joined(separator: " ")
1.
One thing, iteration isn't necessary for this:
for word in words {
    arrayOfWords += [word]
}
You can just do:
arrayOfWords += words
2.
Breaking the for loop will prevent iterating unnecessarily:
for (mistake, word) in autocorrectList {
    println("The mistake is \(mistake)")
    if mistake == arrayOfWords.last {
        fullString = word
        hasWordReadyToCorrect = true
        break // Add this to stop iterating through 'autocorrectList'
    }
}
Or even better, forget the for-loop completely:
// Assumes autocorrectList is a dictionary keyed by the misspelled word.
if let lastWord = arrayOfWords.last, let word = autocorrectList[lastWord] {
    fullString = word
    hasWordReadyToCorrect = true
}
Ultimately what you're doing is seeing if the last word of the entered text matches any of the keys in the autocorrect list. You can just try to get the value directly using optional binding like this.
---
I'll let you know if I think of more.
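Since only the last word is ever compared, the whole context does not have to be split into an array on every keystroke. A minimal sketch, assuming words are separated by whitespace: lastWord(in:) is a hypothetical helper, not part of UIKit, and it scans backwards from the end of the string instead of splitting all of it.

```swift
/// Hypothetical helper: returns the last whitespace-delimited word of `text`,
/// or nil if the text is empty or contains only whitespace.
func lastWord(in text: String) -> Substring? {
    // Ignore trailing whitespace, e.g. the space that was just typed.
    guard let end = text.lastIndex(where: { !$0.isWhitespace }) else { return nil }
    let head = text[...end]
    // The word starts right after the last whitespace character before it.
    if let space = head.lastIndex(where: { $0.isWhitespace }) {
        return head[head.index(after: space)...]
    }
    // No whitespace at all: the whole text is a single word.
    return head
}

let typed = "The dog jumped over a fence "
if let word = lastWord(in: typed) {
    print(word) // fence
}
```

The cost depends on the length of the last word rather than on the length of everything typed so far, and the result can be used directly as the dictionary key after converting it with String(word).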