Access an optional capture by name when using Swift Regex Builder

I'm just getting started with regular expressions and Swift Regex, so a heads up that my terminology may be incorrect. I have boiled this problem down to a very simple task:
I have input lines that have either just one word (a name) or start with the word "Test" followed by one space and then a name. I want to extract the name and also be able to access - without using match indices - the match to "Test " (which may be nil). Here is code that better describes the problem:
import RegexBuilder
let line1 = "Test John"
let line2 = "Robert"
let nameReference = Reference(String.self)
let testReference = Reference(String.self)
let regex = Regex {
Optionally {
Capture(as:testReference) {
"Test "
} transform : { text in
String(text)
}
}
Capture(as:nameReference) {
OneOrMore(.any)
} transform : { text in
String(text)
}
}
if let matches = try? regex.wholeMatch(in: line1) { // USE line1 OR line2 HERE
let theName = matches[nameReference]
print("Name is \(theName)")
// using index to access the test flag works fine for both line1 and line2:
if let flag = matches.1, flag == "Test " {
print("Using index: This is a test line")
} else {
print("Using index: Not a test line")
}
// but for line2, attempting to access with testReference crashes:
if matches[testReference] == "Test " { // crashes for line2 (not surprisingly)
print("Using reference: This is a test line")
} else {
print("Using reference: Not a test line")
}
}
When regex.wholeMatch() is called with line1 things work as expected with output:
Name is John
Using index: This is a test line
Using reference: This is a test line
but when called with line2 it crashes with a SIGABRT and output:
Name is Robert
Using index: Not a test line
Could not cast value of type 'Swift.Optional<Swift.Substring>' (0x7ff84bf06f20) to 'Swift.String' (0x7ff84ba6e918).
The crash is not surprising, because the Capture(as:testReference) was never matched.
My question is: is there a way to do this without using match indices (matches.1)? An answer using Regex Builder would be much appreciated:-)
The documentation says Regex.Match has a subscript(String) method which "returns nil if there's no capture with that name". That would be ideal, but it works only when the match output is type AnyRegexOutput.

I don't think you can get away with not using indexes, or at least code that knows the index but might hide it. Regular expression parsing works like that in any language, because it's always assumed that you know the order of elements in the expression.
For something like this, your example could be simplified to something like
let nameRegex = Regex {
ZeroOrMore("Test ")
Capture { OneOrMore(.anyNonNewline) }
}
if let matches = try? nameRegex.wholeMatch(in: line2) {
let (_, name) = matches.output
print("Name: \(name)")
}
That works for both of your sample lines. The let (_, name) doesn't use a numeric index but it's effectively the same thing since it uses index 1 as the value for name.
If your data is as straightforward as these examples, a regular expression may be overkill. You could work with if line1.hasPrefix("Test ") to detect lines with Test and then drop the first 5 characters, for example.
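As a minimal sketch of that prefix-based alternative (the helper name parseLine and its tuple labels are made up for illustration, and it assumes the prefix is always exactly "Test " followed by the name):

```swift
// Hypothetical helper: splits a line into a test flag and a name
// without any regular expression.
func parseLine(_ line: String) -> (isTest: Bool, name: String) {
    let prefix = "Test "
    if line.hasPrefix(prefix) {
        // Drop the 5-character prefix and keep the rest as the name.
        return (true, String(line.dropFirst(prefix.count)))
    }
    return (false, line)
}

let testLine = parseLine("Test John")   // isTest is true, name is "John"
let plainLine = parseLine("Robert")     // isTest is false, name is "Robert"
print(testLine, plainLine)
```

This gives named access to both pieces (isTest and name) with no match indices involved, at the cost of not being a Regex Builder solution.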

Related

Count the number of lines in a Swift String

After reading a medium-sized file (about 500 kB) from a web service I have a regular Swift String (lines) originally encoded in .isoLatin1. Before actually splitting it I would like to count the number of lines (quickly) in order to be able to initialise a progress bar.
What is the best Swift idiom to achieve this?
I came up with the following:
let linesCount = lines.reduce(into: 0) { (count, letter) in
if letter == "\r\n" {
count += 1
}
}
This does not look too bad but I am asking myself if there is a shorter/faster way to do it. The characters property provides access to a sequence of Unicode graphemes which treat \r\n as only one entity. Checking this with all CharacterSet.newlines does not work, since CharacterSet is not a set of Character but a set of Unicode.Scalar (a little counter-intuitively in my book) which is a set of code points (where \r\n counts as two code points), not graphemes. Trying
var lines = "Hello, playground\r\nhere too\r\nGalahad\r\n"
lines.unicodeScalars.reduce(into: 0) { (cnt, letter) in
if CharacterSet.newlines.contains(letter) {
cnt += 1
}
}
will count to 6 instead of 3. So this is more general than the above method, but it will not work correctly for CRLF line endings.
Is there a way to allow for more line ending conventions (as in CharacterSet.newlines) that still achieves the correct result for CRLF? Can the number of lines be computed with less code (while still remaining readable)?

If it's ok for you to use a Foundation method on an NSString, I suggest using
enumerateLines(_ block: @escaping (String, UnsafeMutablePointer<ObjCBool>) -> Void)
Here's an example:
import Foundation
let base = "Hello, playground\r\nhere too\r\nGalahad\r\n"
let ns = base as NSString
ns.enumerateLines { (str, _) in
print(str)
}
It separates the lines properly, taking into account all linefeed types, such as "\r\n", "\n", etc:
Hello, playground
here too
Galahad
In my example I print the lines but it's trivial to count them instead, as you need to - my version is just for the demonstration.

As I did not find a generic way to count newlines I ended up just solving my problem by iterating through all the characters using
let linesCount = text.reduce(into: 0) { (count, letter) in
if letter == "\r\n" { // This treats CRLF as one "letter", contrary to UnicodeScalars
count += 1
}
}
I was sure this would be a lot faster than enumerating lines for just counting, but I resolved to eventually do the measurement. Today I finally got to it and found ... that I could not have been more wrong.
Counting the lines of a 10,000-line string as above took about 1.0 seconds, but counting through enumeration using
var enumCount = 0
text.enumerateLines { (str, _) in
enumCount += 1
}
only took around 0.8 seconds and was consistently faster by a little more than 20%. I do not know what tricks the Swift engineers hide up their sleeves, but they sure manage to enumerateLines very quickly. This is just for the record.

You can use the following extension
extension String {
var numberOfLines: Int {
return self.components(separatedBy: "\n").count
}
}

Swift 5 Extension
extension String {
func numberOfLines() -> Int {
return self.numberOfOccurrencesOf(string: "\n") + 1
}
func numberOfOccurrencesOf(string: String) -> Int {
return self.components(separatedBy:string).count - 1
}
}
Example:
let testString = "First line\nSecond line\nThird line"
let numberOfLines = testString.numberOfLines() // returns 3

I use this, a CharacterSet which Apple provides, made for this task:
let newLines = text.components(separatedBy: .newlines).count - 1
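On the question's CRLF concern: splitting on CharacterSet.newlines works on Unicode scalars, so a "\r\n" pair is seen as two separators. One possible sketch that stays at the Character level uses the standard library's Character.isNewline property, which is true for "\n", "\r" and the single grapheme "\r\n" alike. The variable names are illustrative only.

```swift
let sample = "Hello, playground\r\nhere too\r\nGalahad\r\n"

// Iterate over Characters (graphemes), so "\r\n" is one element,
// and count every element that Unicode classifies as a newline.
let lineBreaks = sample.reduce(into: 0) { count, character in
    if character.isNewline {
        count += 1
    }
}
print(lineBreaks) // 3
```

Note that this counts line terminators, so a final line without a trailing newline is not included.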

Swift: what is characters.split used for, and why is map(String.init) used?

import Foundation
for i in 1 ... n {
let entry = readLine()!.characters.split(" ").map(String.init)
let name = entry[0]
let phone = Int(entry[1])!
phoneBook[name] = phone
}
Can someone explain this piece of code?

I assume you know everything else in the code except this line:
let entry = readLine()!.characters.split(" ").map(String.init)
readLine() reads user input and returns it. Let's say the user input is
Sweeper 12345678
using .characters.split(" "), we split the input using a separator. What is this separator? A space (" ")! Now the input has been split into two - "Sweeper" and "12345678".
We want the two split parts to be strings, right? Strings are much easier to manipulate. Currently the split parts are stored in an array of String.CharacterView.SubSequence. We want to turn each String.CharacterView.SubSequence into a string. That is why we use map. map applies a certain function to everything in a collection. So
.map(String.init)
is like
// this is for demonstration purposes only, not real code
for x in readLine()!.characters.split(" ") {
String.init(x)
}
We have now transformed the whole collection into strings!
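For readers on current Swift, where the characters view has been deprecated since Swift 4 and String is itself a collection of Character, the same two steps might look like the sketch below. The input is hard-coded instead of coming from readLine() so that the snippet runs on its own.

```swift
let input = "Sweeper 12345678"

// Step 1: split at each space. The pieces are Substring values
// that still share storage with `input`.
let pieces = input.split(separator: " ")

// Step 2: map(String.init) calls String's initializer on every piece,
// turning [Substring] into [String].
let entry = pieces.map(String.init)

let name = entry[0]        // "Sweeper"
let phone = Int(entry[1])  // Optional(12345678)
print(name, phone ?? -1)
```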

There is an error in your code. Replace it as shown below:
let entry = readLine()!.characters.split(separator: " ").map(String.init)
Alternative to the above code is:
let entry = readLine()!.components(separatedBy: " ")
Example:
var str = "Hello, playground"
let entry = str.characters.split(separator: " ").map(String.init)
print(entry)
characters.split splits the characters at the separator you give, in the above case " " (a space), producing an array of character subsequences. Since you need strings, map() converts each subsequence into a String.

How to get multiple lines of stdin in Swift on HackerRank?

I just tried out a HackerRank challenge, and if a question gives you x lines of input, putting x lines of let someVariable = readLine() simply doesn't cut it, because there are lots of test cases that shoot way more input to the code we write, so hard coded readLine() for each line of input won't fly.
Is there some way to get multiple lines of input into one variable?

For anyone else out there who's trying a HackerRank challenge for the first time, you might need to know a couple of things that you may have never come across. I only recently learned about this piece of magic called the readLine() command, which is a native function in Swift.
When the HackerRank system executes your code, it passes your code lines of input and this is a way of retrieving that input.
let line1 = readLine()
let line2 = readLine()
let line3 = readLine()
line1 is now given the value of the first line of input mentioned in the question (or delivered to your code by one of the test cases), with line2 being the second and so on.
Your code may work just great but may fail on a bunch of other test cases. These test cases don't send your code the same number of lines of input. Here's food for thought:
var string = ""
while let thing = readLine() {
string += thing + " "
}
print(string)
Now the string variable contains all the input there was to receive (as a String, in this case).
Hope that helps someone
:)

Definitely you shouldn't do this:
while let readString = readLine() {
s += readString
}
This is because, if standard input is never closed, readLine() keeps waiting for more input and the loop never terminates, causing your application to die by timeout.
Instead, use a for loop, assuming you know how many lines you need to read, which is usually the case on HackerRank ;)
Try something like this:
let n = Int(readLine()!)! // Number of test cases
for _ in 1 ... n { // Loop from 1 to n
let line = readLine()! // Read a single line
// do something with input
}
If you know that each line is an integer, you can use this:
let line = Int(readLine()!)!
Or if you know each line is an array of integers, use this:
let line = readLine()!.characters.split(" ").map{ Int(String($0))! }
Or if each line is an array of strings:
let line = readLine()!.characters.split(" ").map{ String($0) }
I hope this helps.

For new version, to get an array of numbers separated by space
let numbers = readLine()!.components(separatedBy: [" "]).map { Int($0)! }
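The force unwraps above crash on any token that is not an integer. A slightly more forgiving sketch, assuming that silently skipping bad tokens is acceptable for your task, uses split and compactMap (parseInts is a made-up helper name):

```swift
// Hypothetical helper: parses space-separated integers,
// dropping tokens that are not valid Int values.
func parseInts(_ line: String) -> [Int] {
    // split(separator:) omits empty pieces, so repeated spaces are harmless.
    return line.split(separator: " ").compactMap { Int($0) }
}

let parsed = parseInts("3 14  x 15")
print(parsed) // [3, 14, 15]
```

With real input this would be called as parseInts(readLine() ?? "").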

Using readLine() and AnyGenerator to construct a String array of the std input lines
readLine() will read from standard input line-by-line until EOF is hit, whereafter it returns nil.
Returns Characters read from standard input through the end of the
current line or until EOF is reached, or nil if EOF has already been
reached.
This is quite neat, as it makes readLine() a perfect candidate for generating a sequence using the AnyGenerator initializer init(body:), which repeatedly (as next()) invokes body, terminating once body returns nil.
AnyGenerator
init(body: () -> Element?)
Create a GeneratorType instance whose next method invokes body
and returns the result.
With this, there's no need to actually supply the amount of lines we expect from standard input, and hence, we can catch all input from standard input e.g. into a String array, where each element corresponds to an input line:
let allLines = AnyGenerator { readLine() }.map{ $0 }
// type: Array<String>
After which we can work with the String array to apply whatever operations needed to solve a given task (/HackerRank task).
// example standard input
4 3
<tag1 value = "HelloWorld">
<tag2 name = "Name1">
</tag2>
</tag1>
tag1.tag2~name
tag1~name
tag1~value
/* resulting allLines array:
["4 3", "<tag1 value = \"HelloWorld\">",
"<tag2 name = \"Name1\">",
"</tag2>",
"</tag1>",
"tag1.tag2~name",
"tag1~name",
"tag1~value"] */

I recently discovered a neat trick to get a certain amount of lines. I'm gonna assume the first line gives you the amount of lines you get:
guard let count = readLine().flatMap({ Int($0) }) else { fatalError("No count") }
let lines = AnyGenerator{ readLine() }.prefix(count)
for line in lines {
}

I usually use this form.
if let line = readLine(), let cnt = Int(line) {
for _ in 1...cnt {
if let line = readLine() {
// your code for a line
}
}
}
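The count-then-loop pattern can be wrapped in a function so it is testable without real standard input. This is only a sketch: readCountedLines and its next parameter are invented names, and passing the line source as a closure is a design choice made here for illustration. It also uses 0..<count rather than 1...count, because a closed range like 1...0 traps at runtime when the count is zero.

```swift
// Hypothetical helper: reads a count from the first line,
// then up to that many further lines from `next`.
func readCountedLines(next: () -> String?) -> [String] {
    guard let first = next(), let count = Int(first), count >= 0 else {
        return []
    }
    var lines: [String] = []
    for _ in 0..<count {
        // Stop early if the input ends before `count` lines arrive.
        guard let line = next() else { break }
        lines.append(line)
    }
    return lines
}

// Canned input standing in for stdin.
var canned = ["2", "alpha", "beta", "ignored"].makeIterator()
let counted = readCountedLines(next: { canned.next() })
print(counted) // ["alpha", "beta"]
```

Against real standard input the call would be readCountedLines(next: { readLine() }).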

Following the answer from dfrib, for Swift 3+, AnyIterator can be used instead of AnyGenerator, in the same way:
let allLines = AnyIterator { readLine() }.map{ $0 }
// type: Array<String>

Interpolate String Loaded From File

I can't figure out how to load a string from a file and have variables referenced in that string be interpolated.
Let's say a text file at filePath that has these contents:
Hello there, \(name)!
I can load this file into a string with:
let string = String.stringWithContentsOfFile(filePath, encoding: NSUTF8StringEncoding, error: nil)!
In my class, I have loaded a name in: let name = "George"
I'd like this new string to interpolate the \(name) using my constant, so that its value is Hello there, George!. (In reality the text file is a much larger template with lots of strings that need to be swapped in.)
I see String has a convertFromStringInterpolation method but I can't figure out if that's the right way to do this. Does anyone have any ideas?

This cannot be done as you intend, because it goes against type safety at compile time (the compiler cannot check type safety on the variables that you are trying to refer to in the string file).
As a workaround, you can manually define a replacement table, as follows:
// Extend String to conform to the Printable protocol
extension String: Printable
{
public var description: String { return self }
}
var string = "Hello there, [firstName] [lastName]. You are [height]cm tall and [age] years old!"
let firstName = "John"
let lastName = "Appleseed"
let age = 33
let height = 1.74
let tokenTable: [String: Printable] = [
"[firstName]": firstName,
"[lastName]": lastName,
"[age]": age,
"[height]": height]
for (token, value) in tokenTable
{
string = string.stringByReplacingOccurrencesOfString(token, withString: value.description)
}
println(string)
// Prints: "Hello there, John Appleseed. You are 1.74cm tall and 33 years old!"
You can store entities of any type as the values of tokenTable, as long as they conform to the Printable protocol.
To automate things further, you could define the tokenTable constant in a separate Swift file, and auto-generate that file by using a separate script to extract the tokens from your string-containing file.
Note that this approach will probably be quite inefficient with very large string files (but not much more inefficient than reading the whole string into memory in the first place). If that is a problem, consider processing the string file in a buffered way.
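The code above was written for an early Swift release. In current Swift, Printable is named CustomStringConvertible, println is print, and stringByReplacingOccurrencesOfString is replacingOccurrences(of:with:). A rough modern sketch of the same replacement-table idea follows. The render function is a hypothetical name, and using [String: Any] with string interpolation in place of the Printable-typed table is a simplification chosen here.

```swift
import Foundation

// Hypothetical helper: replaces every "[key]" token in the template
// with the textual form of the matching value.
func render(_ template: String, with values: [String: Any]) -> String {
    var result = template
    for (key, value) in values {
        result = result.replacingOccurrences(of: "[\(key)]", with: "\(value)")
    }
    return result
}

let template = "Hello there, [firstName] [lastName]. You are [height]cm tall and [age] years old!"
let rendered = render(template, with: [
    "firstName": "John",
    "lastName": "Appleseed",
    "age": 33,
    "height": 1.74,
])
print(rendered)
// Hello there, John Appleseed. You are 1.74cm tall and 33 years old!
```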

There is no built-in mechanism for doing this; you will have to create your own.
Here is an example of a VERY rudimentary version:
var values = [
"name": "George"
]
var textFromFile = "Hello there, <name>!"
var parts = split(textFromFile, {$0 == "<" || $0 == ">"}, maxSplit: 10, allowEmptySlices: true)
var output = ""
for index in 0 ..< parts.count {
if index % 2 == 0 {
// If it is even, it is not a variable
output += parts[index]
}
else {
// If it is odd, it is a variable so look it up
if let value = values[parts[index]] {
output += value
}
else {
output += "NOT_FOUND"
}
}
}
println(output) // "Hello there, George!"
Depending on your use case, you will probably have to make this much more robust.
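The global split function and println used above no longer exist in current Swift. The same even/odd idea could be sketched with the split method on String, as below. The function name fill is made up, and like the original it assumes every "<" is closed by a matching ">".

```swift
// Hypothetical helper: text between "<" and ">" is treated as a variable name.
func fill(_ template: String, with values: [String: String]) -> String {
    // Keep empty pieces so that even/odd positions stay aligned.
    let parts = template.split(omittingEmptySubsequences: false) {
        $0 == "<" || $0 == ">"
    }
    var output = ""
    for (index, part) in parts.enumerated() {
        if index % 2 == 0 {
            // Even positions are literal text.
            output += String(part)
        } else {
            // Odd positions are variable names to look up.
            output += values[String(part)] ?? "NOT_FOUND"
        }
    }
    return output
}

let greeting = fill("Hello there, <name>!", with: ["name": "George"])
print(greeting) // Hello there, George!
```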

Efficiently remove the last word from a string in Swift

I am trying to build an autocorrect system, so I need to be able to delete the last word typed and replace it with the correct one. My solution:
func autocorrect() {
hasWordReadyToCorrect = false
var wordProxy = self.textDocumentProxy as UITextDocumentProxy
var stringOfWords = wordProxy.documentContextBeforeInput
fullString = "Unset Value"
if stringOfWords != nil {
var words = stringOfWords.componentsSeparatedByCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
for word in words {
arrayOfWords += [word]
}
println("The last word of the array is \(arrayOfWords.last)")
for (mistake, word) in autocorrectList {
println("The mistake is \(mistake)")
if mistake == arrayOfWords.last {
fullString = word
hasWordReadyToCorrect = true
}
}
println("The corrected String is \(fullString)")
}
}
This method is called after each keystroke, and if the space is pressed, it corrects the word. My problem comes in when the string of text becomes longer than about 20 words. It takes a while for it to fill the array each time a character is pressed, and it starts to lag to a point of not being able to use it. Is there a more efficient and elegant Swift way of writing this function? I'd appreciate any help!

This doesn't answer the OP's "autocorrect" issue directly, but this code is probably the easiest way to answer the question posed in the title:
Swift 3
let myString = "The dog jumped over a fence"
let myStringWithoutLastWord = myString.components(separatedBy: " ").dropLast().joined(separator: " ")
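If splitting and rejoining the whole string is a concern for long texts, one alternative sketch searches backwards for the last space and slices once. The function name is invented, and it assumes words are separated by single spaces, as in the example above.

```swift
// Hypothetical helper: drops everything from the last space onwards.
func removingLastWord(from text: String) -> String {
    // With no space at all, the only word is the last word.
    guard let lastSpace = text.lastIndex(of: " ") else { return "" }
    return String(text[..<lastSpace])
}

let shortened = removingLastWord(from: "The dog jumped over a fence")
print(shortened) // The dog jumped over a
```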

1.
One thing, iteration isn't necessary for this:
for word in words {
arrayOfWords += [word]
}
You can just do:
arrayOfWords += words
2.
Breaking the for loop will prevent iterating unnecessarily:
for (mistake, word) in autocorrectList {
println("The mistake is \(mistake)")
if mistake == arrayOfWords.last {
fullString = word
hasWordReadyToCorrect = true
break; // Add this to stop iterating through 'autocorrectList'
}
}
Or even better, forget the for-loop completely:
if let lastWord = arrayOfWords.last, let word = autocorrectList[lastWord] {
fullString = word
hasWordReadyToCorrect = true
}
Ultimately what you're doing is seeing if the last word of the entered text matches any of the keys in the autocorrect list. You can just try to get the value directly using optional binding like this.
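Putting the direct lookup together with the original performance complaint, a sketch that never builds an array of all words could look like this. It assumes autocorrectList is a [String: String] dictionary keyed by the misspelling, and the function and parameter names are made up.

```swift
// Hypothetical helper: returns the correction for the last word, if any.
func correction(forTextBeforeCursor text: String,
                in autocorrectList: [String: String]) -> String? {
    // Find the last word by scanning back to the final space.
    let lastWord: Substring
    if let lastSpace = text.lastIndex(of: " ") {
        lastWord = text[text.index(after: lastSpace)...]
    } else {
        lastWord = text[...]
    }
    // A dictionary lookup replaces the loop over every entry.
    return autocorrectList[String(lastWord)]
}

let fixes = ["teh": "the", "recieve": "receive"]
let suggestion = correction(forTextBeforeCursor: "I saw teh", in: fixes)
print(suggestion ?? "no correction") // the
```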
---
I'll let you know if I think of more.