After reading a medium-sized file (about 500 kB) from a web service, I have a regular Swift String (lines) originally encoded in .isoLatin1. Before actually splitting it, I would like to count the number of lines (quickly) in order to initialise a progress bar.
What is the best Swift idiom to achieve this?
I came up with the following:
let linesCount = lines.reduce(into: 0) { (count, letter) in
    if letter == "\r\n" {
        count += 1
    }
}
This does not look too bad but I am asking myself if there is a shorter/faster way to do it. The characters property provides access to a sequence of Unicode graphemes which treat \r\n as only one entity. Checking this with all CharacterSet.newlines does not work, since CharacterSet is not a set of Character but a set of Unicode.Scalar (a little counter-intuitively in my book) which is a set of code points (where \r\n counts as two code points), not graphemes. Trying
var lines = "Hello, playground\r\nhere too\r\nGalahad\r\n"
lines.unicodeScalars.reduce(into: 0) { (cnt, letter) in
    if CharacterSet.newlines.contains(letter) {
        cnt += 1
    }
}
will count to 6 instead of 3. So this is more general than the above method, but it will not work correctly for CRLF line endings.
Is there a way to allow for more line ending conventions (as in CharacterSet.newlines) that still achieves the correct result for CRLF? Can the number of lines be computed with less code (while still remaining readable)?
If it's ok for you to use a Foundation method on an NSString, I suggest using
enumerateLines(_ block: @escaping (String, UnsafeMutablePointer<ObjCBool>) -> Void)
Here's an example:
import Foundation
let base = "Hello, playground\r\nhere too\r\nGalahad\r\n"
let ns = base as NSString
ns.enumerateLines { (str, _) in
    print(str)
}
It separates the lines properly, taking into account all linefeed types, such as "\r\n", "\n", etc:
Hello, playground
here too
Galahad
In my example I print the lines but it's trivial to count them instead, as you need to - my version is just for the demonstration.
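Counting instead of printing might look like the following minimal sketch. The helper name countLines(in:) is made up for illustration, and the sketch assumes the String version of enumerateLines that Foundation provides (the same call used on text elsewhere in this thread).

```swift
import Foundation

// Hypothetical helper: counts lines by letting Foundation split them.
func countLines(in text: String) -> Int {
    var count = 0
    // The closure receives each line and a stop flag; both are ignored here.
    text.enumerateLines { _, _ in
        count += 1
    }
    return count
}

print(countLines(in: "Hello, playground\r\nhere too\r\nGalahad\r\n"))
```

Because the line contents are discarded, the only state kept is the counter.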
As I did not find a generic way to count newlines I ended up just solving my problem by iterating through all the characters using
let linesCount = text.reduce(into: 0) { (count, letter) in
    if letter == "\r\n" { // This treats CRLF as one "letter", contrary to UnicodeScalars
        count += 1
    }
}
I was sure this would be a lot faster than enumerating lines for just counting, but I resolved to eventually do the measurement. Today I finally got to it and found ... that I could not have been more wrong.
Counting the lines of a 10,000-line string as above took about 1.0 seconds, but counting through enumeration using
var enumCount = 0
text.enumerateLines { (str, _) in
    enumCount += 1
}
only took around 0.8 seconds and was consistently faster by a little more than 20%. I do not know what tricks the Swift engineers hide up their sleeves, but they sure manage to enumerateLines very quickly. This is just for the record.
You can use the following extension
import Foundation

extension String {
    var numberOfLines: Int {
        return self.components(separatedBy: "\n").count
    }
}
Swift 5 Extension
import Foundation

extension String {
    func numberOfLines() -> Int {
        return self.numberOfOccurrencesOf(string: "\n") + 1
    }
    func numberOfOccurrencesOf(string: String) -> Int {
        return self.components(separatedBy: string).count - 1
    }
}
Example:
let testString = "First line\nSecond line\nThird line"
let numberOfLines = testString.numberOfLines() // returns 3
I use this, a CharacterSet which Apple provides, made for this task:
let newLines = text.components(separatedBy: .newlines).count - 1
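One caveat with splitting on .newlines: CR and LF are both members of that set, so a CRLF ending produces an empty component in between and is counted twice. A possible alternative, sketched here under the assumption that Character.isNewline from Swift 5 is available, keeps the grapheme-based counting from the question but accepts every newline convention. The property name newlineCount is illustrative, not an existing API.

```swift
extension String {
    // Hypothetical helper: number of line terminators in the string.
    // "\r\n" is a single Character, so a CRLF ending is counted once.
    var newlineCount: Int {
        reduce(into: 0) { count, character in
            if character.isNewline {
                count += 1
            }
        }
    }
}

print("Hello, playground\r\nhere too\r\nGalahad\r\n".newlineCount)
```

This counts terminators, so a final line without a trailing newline is not included; add one if that convention matters for the progress bar.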
I am trying to write a func that counts the characters between two specified characters, for example:
count the characters between "@" and "." or between "@" and ".com"
If this is the only solution, could the code be written more simply, with .count or something less confusing?
func validateEmail(_ str: String) -> Bool {
    let range = 0..<str.count
    var numAt = 0
    var numDot = 0
    if str.contains("@") && str.contains(".") && str.first != "@" {
        for num in range {
            if str[str.index(str.startIndex, offsetBy: num)] == "@" {
                numAt = num
                print("The position of @ is \(numAt)")
            } else if str[str.index(str.startIndex, offsetBy: num)] == "." {
                numDot = num
                print("The position of . is \(numDot)")
            }
        }
        if (numDot - numAt) > 1 {
            return true
        }
    }
    return false
}
With help from @Βασίλης Δ., I wrote a direct if statement for validateEmail that checks whether fewer than one character lies in between:
if (str.split(separator: "@").last?.split(separator: ".").first!.count)! < 1 {
    return false
}
It could be useful.
There are many edge cases to what you're trying to do, and email validation is notoriously complicated. I recommend doing as little of it as possible. Many, many things are legal email addresses. So you will need to think carefully about what you want to test. That said, this addresses what you've asked for, which is the distance between the first @ and the first . that follows it.
func lengthOfFirstComponentAfterAt(in string: String) -> Int? {
    guard
        // Find the first @ in the string
        let firstAt = string.firstIndex(of: "@"),
        // Find the first "." after that
        let firstDotAfterAt = string[firstAt...].firstIndex(of: ".")
    else {
        return nil
    }
    // Return the distance between them (not counting the dot itself)
    return string.distance(from: firstAt, to: firstDotAfterAt) - 1
}
lengthOfFirstComponentAfterAt(in: "rob@example.org") // Optional(7)
There's a very important lesson about Collections in this code. Notice the expression:
string[firstAt...].firstIndex(of: ".")
When you subscript a Collection, each element of the resulting slice has the same index as in the original collection. The returned value from firstIndex can be used directly to subscript string without offsetting. This is very different from how indexes work in many other languages; it allows powerful algorithms, but it also creates a lot of bugs when developers forget it.
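To make the shared-index point concrete, here is a small sketch. The function name hostLabel(of:) is hypothetical; it only illustrates that an index found in a slice can subscript the original string directly.

```swift
// Hypothetical helper: returns the text between the first "@" and the next ".".
func hostLabel(of address: String) -> Substring? {
    guard
        let at = address.firstIndex(of: "@"),
        // `dot` is found in the slice, yet it is a valid index into `address`.
        let dot = address[at...].firstIndex(of: ".")
    else {
        return nil
    }
    // Both bounds index `address` directly; no offset arithmetic is needed.
    return address[address.index(after: at)..<dot]
}

print(hostLabel(of: "rob@example.org") ?? "no match")
```

Returning a Substring avoids a copy; wrap the result in String(...) if it has to outlive the original.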
I tried to print a content of the CharacterSet.decimalDigits with:
print(CharacterSet.decimalDigits)
output: CFCharacterSet Predefined DecimalDigit Set
But my expectation was something like this:
[1, 2, 3, 4 ...]
So my question is: How to print content of the CharacterSet.decimalDigits?
This is not easy. Character sets are not made to be iterated, they are made to check whether a character is inside them or not. They don't contain the characters themselves and the ranges cannot be accessed.
The only thing you can do is to iterate over all characters and check every one of them against the character set, e.g.:
let set = CharacterSet.decimalDigits
let allCharacters = UInt32.min ... UInt32.max
allCharacters
.lazy
.compactMap { UnicodeScalar($0) }
.filter { set.contains($0) }
.map { String($0) }
.forEach { print($0) }
However, note that such a thing takes significant time and shouldn't be used inside a production application.
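A somewhat cheaper variant of the same brute-force idea is sketched below. It assumes nothing beyond the fact that Unicode scalar values end at U+10FFFF, so there is no need to walk all the way to UInt32.max. The function name members(of:) is made up for this example.

```swift
import Foundation

// Hypothetical helper: lists the members of a CharacterSet by testing every scalar.
func members(of set: CharacterSet) -> [Unicode.Scalar] {
    // Unicode scalar values stop at U+10FFFF.
    let codePoints: ClosedRange<UInt32> = 0...0x10FFFF
    return codePoints
        .compactMap { Unicode.Scalar($0) }   // nil for the surrogate range
        .filter { set.contains($0) }
}

let digits = members(of: .decimalDigits)
print(digits.prefix(10).map { String($0) }.joined(separator: ", "))
```

This still tests about 1.1 million scalars per call, so it is better suited to a playground or debugging session than to a hot path.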
I don't think you can do that, at least not directly. If you look at the output of
let data = CharacterSet.decimalDigits.bitmapRepresentation
for byte in data {
    print(String(format: "%02x", byte))
}
you'll see that the set internally stores bits at the code positions where the decimal digits are.
I have a piece of code in my iOS app that should go through a word and check if a character is in it. When it finds at least one, it should change a string full of "_" of the same length as the word to one with the character in the right place:
wordToGuess = six
letterGuessed = i
wordAsUnderscores = _i_
The code works. But I start to have problems when I type in characters like: "ć", "ł", "ą", etc. From using character.utf8.count I saw that Swift thinks those are not 1 but 2 characters. So I get something like this:
wordToGuess = cześć
letterGuessed = ś
wordAsUnderscores = _ _ ś (place filled with empty char) _
It takes up 2 places.
I was at it for 6 hours and didn't figure out how to fix it, so I'm asking you guys for help.
Code that is supposed to do that:
let characterGuessed = Character(letterGuessed)
for index in wordToGuess.indices {
    if (wordToGuess[index] == characterGuessed) {
        let endIndex = wordToGuess.index(after: index)
        let charRange = index..<endIndex
        wordAsUnderscores = wordAsUnderscores.replacingCharacters(in: charRange, with: letterGuessed)
        wordToGuessLabel.text = wordAsUnderscores
    }
}
I would like the code to treat "ć", "ł", "ą" characters the same as "i", "a" and so on. I don't want them to be treated as 2.
The reason is that you cannot use indices from one string (wordToGuess) for subscripting another string (wordAsUnderscores). Generally, indices of one collection must not be used with a different collection. (There are exceptions, like Array, though.)
Here is a working variant:
let wordToGuess = "cześć"
let letterGuessed: Character = "ś"
var wordAsUnderscores = "c____"
wordAsUnderscores = String(zip(wordToGuess, wordAsUnderscores)
.map { $0 == letterGuessed ? letterGuessed : $1 })
print(wordAsUnderscores) // c__ś_
The strings are traversed in parallel, and for each correctly guessed character in wordToGuess the corresponding character in wordAsUnderscores is replaced by that character.
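For repeated guesses, the same zip-based idea could be wrapped in a small function. This is only a sketch; the name reveal(_:in:masked:) and its parameter labels are invented here for illustration.

```swift
// Hypothetical helper: reveals every occurrence of `guess` in the masked word.
func reveal(_ guess: Character, in word: String, masked: String) -> String {
    // Walk both strings in parallel, so no index is shared between them.
    String(zip(word, masked).map { original, shown in
        original == guess ? guess : shown
    })
}

let word = "cześć"
var masked = String(repeating: "_", count: word.count)
masked = reveal("ś", in: word, masked: masked)
masked = reveal("c", in: word, masked: masked)
print(masked)
```

Since comparison happens per Character, "c" and "ć" stay distinct and each accented letter occupies exactly one slot.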
I just tried out a HackerRank challenge, and if a question gives you x lines of input, putting x lines of let someVariable = readLine() simply doesn't cut it, because there are lots of test cases that shoot way more input to the code we write, so a hard-coded readLine() for each line of input won't fly.
Is there some way to get multiple lines of input into one variable?
For anyone else out there who's trying a HackerRank challenge for the first time, you might need to know a couple of things that you may have never come across. I only recently learned about this piece of magic called the readLine() command, which is a native function in Swift.
When the HackerRank system executes your code, it passes your code lines of input and this is a way of retrieving that input.
let line1 = readLine()
let line2 = readLine()
let line3 = readLine()
line1 is now given the value of the first line of input mentioned in the question (or delivered to your code by one of the test cases), with line2 being the second and so on.
Your code may work just great but may fail on a bunch of other test cases. These test cases don't send your code the same number of lines of input. Here's food for thought:
var string = ""
while let thing = readLine() {
    string += thing + " "
}
print(string)
Now the string variable contains all the input there was to receive (as a String, in this case).
Hope that helps someone
:)
Definitely you shouldn't do this:
while let readString = readLine() {
    s += readString
}
This is risky because, if standard input is never closed, readLine() keeps waiting for input and the loop never terminates, causing your application to die by timeout.
Instead, think in terms of a for loop, assuming you know how many lines you need to read, which is usually the case in HackerRank ;)
Try something like this:
let n = Int(readLine()!)! // Number of test cases
for _ in 1 ... n { // Loop from 1 to n
    let line = readLine()! // Read a single line
    // do something with input
}
If you know that each line is an integer, you can use this:
let line = Int(readLine()!)!
Or if you know each line is an array of integers, use this:
let line = readLine()!.characters.split(" ").map{ Int(String($0))! }
Or if each line is an array of strings:
let line = readLine()!.characters.split(" ").map{ String($0) }
I hope this helps.
For newer Swift versions, to get an array of numbers separated by spaces:
let numbers = readLine()!.components(separatedBy: [" "]).map { Int($0)! }
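Note that components(separatedBy:) yields empty strings when two spaces follow each other, and Int("")! then crashes. A more forgiving variant might look like this sketch; parseInts(_:) is a hypothetical name, and silently skipping invalid tokens is a design choice that may not suit every challenge.

```swift
// Hypothetical helper: parses space-separated integers, skipping bad tokens.
func parseInts(_ line: String) -> [Int] {
    // split(separator:) drops empty pieces, so repeated spaces are harmless.
    line.split(separator: " ").compactMap { Int($0) }
}

print(parseInts("3  14 15"))
```

With standard input it could be called as parseInts(readLine() ?? ""), which returns an empty array once the input is exhausted.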
Using readLine() and AnyGenerator to construct a String array of the std input lines
readLine() will read from standard input line-by-line until EOF is hit, whereafter it returns nil.
Returns Characters read from standard input through the end of the
current line or until EOF is reached, or nil if EOF has already been
reached.
This is quite neat, as it makes readLine() a perfect candidate for generating a sequence using the AnyGenerator initializer init(body:), which invokes body each time next() is called and terminates once body returns nil.
AnyGenerator
init(body: () -> Element?)
Create a GeneratorType instance whose next method invokes body
and returns the result.
With this, there's no need to actually supply the amount of lines we expect from standard input, and hence, we can catch all input from standard input e.g. into a String array, where each element corresponds to an input line:
let allLines = AnyGenerator { readLine() }.map{ $0 }
// type: Array<String>
After which we can work with the String array to apply whatever operations needed to solve a given task (/HackerRank task).
// example standard input
4 3
<tag1 value = "HelloWorld">
<tag2 name = "Name1">
</tag2>
</tag1>
tag1.tag2~name
tag1~name
tag1~value
/* resulting allLines array:
["4 3", "<tag1 value = \"HelloWorld\">",
"<tag2 name = \"Name1\">",
"</tag2>",
"</tag1>",
"tag1.tag2~name",
"tag1~name",
"tag1~value"] */
I recently discovered a neat trick to get a certain amount of lines. I'm gonna assume the first line gives you the amount of lines you get:
guard let count = readLine().flatMap({ Int($0) }) else { fatalError("No count") }
let lines = AnyGenerator{ readLine() }.prefix(count)
for line in lines {
}
I usually use this form.
if let line = readLine(), let cnt = Int(line) {
    for _ in 1...cnt {
        if let line = readLine() {
            // your code for a line
        }
    }
}
Following the answer from dfrib, for Swift 3+, AnyIterator can be used instead of AnyGenerator, in the same way:
let allLines = AnyIterator { readLine() }.map{ $0 }
// type: Array<String>
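To try the AnyIterator approach without piping data into the program, the line source can be passed in as a closure. The sketch below assumes that shape; collectLines(from:) and the array-backed stand-in for standard input are both invented for illustration.

```swift
// Hypothetical helper: drains a line source until it returns nil.
func collectLines(from nextLine: @escaping () -> String?) -> [String] {
    // AnyIterator calls the closure on every next() and stops at the first nil.
    Array(AnyIterator(nextLine))
}

// Stand-in for standard input so the sketch runs on its own.
var pending = ["4 3", "tag1~name", "tag1~value"]
let collected = collectLines { pending.isEmpty ? nil : pending.removeFirst() }
print(collected)
```

For real input the call would be collectLines { readLine() }, and appending .prefix(n) to the AnyIterator limits it to the first n lines.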
What's the most Swiftian way to iterate backwards through the Characters in a String? i.e. like for ch in str, only in reverse?
I think I must be missing something obvious, because the best I could come up with just now was:
for var index = str.endIndex;
index != str.startIndex;
index = index.predecessor() {
let ch = str[index.predecessor()]
...
}
I realise "what's the best..." may be classed as subjective; I suppose what I'm really looking for is a terse yet readable way of doing this.
Edit: While reverse() works and is terse, it looks like this might be quite inefficient compared to the above, i.e. it seems like it's not actually iterating backwards, but creating a full reverse copy of the characters in the String. This would be much worse than my original if, say, you were looking for something that was usually a few characters from the end of a 10,000-character String. I'm therefore leaving this question open for a bit to attract other approaches.
The reversed function reverses a C: CollectionType and returns a ReversedCollection:
for char in "string".characters.reversed() {
// ...
}
If you find that reversed pre-reverses the string, try:
for char in "string".characters.lazy.reversed() {
// ...
}
lazy returns a lazily evaluated sequence (LazyBidirectionalCollection) then reversed() returns another LazyBidirectionalCollection that is visited in reverse.
As of December 2015 with Swift version 2.1, the proper way to do this is
for char in string.characters.reverse() {
//loop backwards
}
String no longer conforms to SequenceType, but its characters view does.
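In current Swift (4 and later), String is itself a collection of Characters again, and calling reversed() on a bidirectional collection in a for loop should resolve to the ReversedCollection view rather than a reversed copy. A minimal sketch of an early-exit backwards search under that assumption follows; lastDigit(in:) is a hypothetical name.

```swift
// Hypothetical helper: finds the last digit by walking backwards and stopping early.
func lastDigit(in text: String) -> Character? {
    // The loop visits characters from the end; it returns at the first match.
    for character in text.reversed() where character.isNumber {
        return character
    }
    return nil
}

print(lastDigit(in: "abc1def2gh") ?? "?")
```

If only the position is needed, lastIndex(where:) on the string expresses the same search without an explicit loop.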
Not sure about efficiency, but I will suggest
for ch in reverse(str) {
println(ch)
}
Here is a code for reversing a string that doesn't use reverse(str)
// Reverse String
func myReverse(str:String) -> String {
    var buffer = ""
    for character in str {
        buffer.insert(character, atIndex: buffer.startIndex)
    }
    return buffer
}
myReverse("Paul") // gives “luaP”
Just a little experiment. For what it's worth.
Ok, learnt how to read the question....
Would this work Matt?
func ReverseIteration(str:String) {
    func myReverse(str:String) -> String {
        var buffer = ""
        for character in str {
            buffer.insert(character, atIndex: buffer.startIndex)
        }
        return buffer
    }
    // reverse string then iterate forward.
    var newStr = myReverse(str)
    for char in newStr {
        println(char)
        // do some code here
    }
}
this?
extension String {
    var reverse: String {
        var reverseStr = ""
        for character in self {
            reverseStr = String(character) + reverseStr
        }
        return reverseStr
    }
}