If I have a long range of numbers such as 1...1000000, what would be an efficient way to convert them to strings with the following mapping?
1->A, 2->B, 3->C, ... 10->A0, 11->AA, 12->AB etc.
I took the approach of splitting each number into digits (using modulus) and using it to get a character from an array to build the strings. Takes about 5 seconds for 1...1000. Is there a faster approach?
My code:
let numbers = 1...1000000
let charArray:[Character] = ["0","A","B","C","D","E","F","G","H","I"]
var results: [String] = []
func transformNumbers() {
for number in numbers {
var string = ""
var i = number
while i > 0 {string.insert(charArray[(i%10)], at: string.startIndex); i/=10}
results.append(string)
}
}
Your code took about 15 seconds on my old MacBook for 1...1000000, and the code below, less than 1 second:
(Using Xcode 8.3.3 with Release build on macOS 10.12.5)
let unicodeScalarArray:[UnicodeScalar] = ["0","A","B","C","D","E","F","G","H","I"]
let utf16CodeUnitArray:[UInt16] = unicodeScalarArray.map{UInt16($0.value)}
var results: [String] = []
func transformNumbers7() {
results = numbers.map {number in
var digits: [UInt16] = []
var i = number
while i > 0 {digits.append(utf16CodeUnitArray[i%10]); i/=10}
digits.reverse()
return String(utf16CodeUnits: digits, count: digits.count)
}
}
Generally,
Repeated insert(_:at:) can be slower than repeated append(_:) and reverse()
Working with Characters may be less efficient than UnicodeScalar, UTF-16 Code Units or UTF-8 Code Units.
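As a rough sketch of both points in later Swift versions (4 and up), the digits can be collected as raw UTF-8 bytes, reversed once, and decoded into a String in a single step. The names digitBytes and letterCode(for:) are made up for this illustration and are not part of the answer above.

```swift
// Digit-to-letter table as ASCII bytes: 0 -> "0", 1 -> "A", ... 9 -> "I".
let digitBytes: [UInt8] = Array("0ABCDEFGHI".utf8)

// Hypothetical helper: maps a positive integer to its lettered form.
// Like the original code, it returns an empty string for 0.
func letterCode(for number: Int) -> String {
    var bytes: [UInt8] = []
    var remaining = number
    while remaining > 0 {
        // Append the least significant digit first (cheap), reverse later.
        bytes.append(digitBytes[remaining % 10])
        remaining /= 10
    }
    bytes.reverse()
    // Decode the ASCII bytes into a String in one pass.
    return String(decoding: bytes, as: UTF8.self)
}

let firstCodes = (1...12).map(letterCode(for:))
print(firstCodes)
```

Timings are not included here because they depend heavily on the Swift version and build configuration.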
Not sure if it is the fastest way, but switching to a map expression instead of mutating the results list speeds things up a bit over 10x on my machine:
let results = numbers.map { (val: Int) -> String in
var string = ""
var i = val
while i > 0 {string.insert(charArray[(i%10)], at: string.startIndex); i/=10}
return string
}
I'm doing an exercise which requires producing 64-bit positive integers in Swift, but I have no idea how that can be achieved. My machine is 64-bit for sure, but my test code cannot even produce 63-bit positive integers.
Using Double may solve the problem, but that's not what the exercise intends. Is there any solution for this issue? Thank you.
The test code is as follows:
import Foundation
func numberOfGrainsOnChessBoard () {
let ar = Array(1...64)
let arr = ar.map{twoMultipliedNTimes($0)}
var index = 1
for i in arr {
print("\(index): \(i)")
index = index + 1
}
}
func twoMultipliedNTimes (_ times: Int) -> UInt64 {
var product : UInt64 = 1;
for _ in 1...times {
product = product * 2
}
return product
}
numberOfGrainsOnChessBoard()
The above code fails with an overflow error.
The code below generates a random 64-bit integer (Int64) between its minimum and its maximum value. You can change the range to fit your needs.
let myInt: Int64 = Int64.random(in: Int64.min...Int64.max)
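Regarding the overflow in the question itself: twoMultipliedNTimes(64) multiplies by 2 sixty-four times, which is 2^64, while UInt64 can hold at most 2^64 - 1, so the last multiplication traps. If the exercise is the usual chessboard one, square n holds 2^(n-1) grains, and the largest value, 2^63, does fit. The sketch below assumes that reading; grains(onSquare:) and twoToThePower(_:) are hypothetical names.

```swift
// Hypothetical helper: grains on a 1-based square, i.e. 2^(square - 1).
// Returns nil for squares outside 1...64.
func grains(onSquare square: Int) -> UInt64? {
    guard (1...64).contains(square) else { return nil }
    return UInt64(1) << UInt64(square - 1)
}

// Repeated doubling that reports overflow instead of trapping.
func twoToThePower(_ exponent: Int) -> UInt64? {
    guard exponent >= 0 else { return nil }
    var product: UInt64 = 1
    for _ in 0..<exponent {
        let (value, overflow) = product.multipliedReportingOverflow(by: 2)
        if overflow { return nil }   // 2^64 and above do not fit in UInt64
        product = value
    }
    return product
}

for square in 1...64 {
    if let count = grains(onSquare: square) {
        print("\(square): \(count)")
    }
}
```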
I have a function in Swift that computes the hamming distance of two strings and then puts them into a connected graph if the result is 1.
For example, read to hear returns a hamming distance of 2 because read[0] != hear[0] and read[3] != hear[3].
At first, I thought my function was taking a long time because of the quantity of input (8,000+ word dictionary), but I knew that several minutes was too long. So, I rewrote my same algorithm in Java, and the computation took merely 0.3s.
I have tried writing this in Swift two different ways:
Way 1 - Substrings
extension String {
subscript (i: Int) -> String {
return self[Range(i ..< i + 1)]
}
}
private func getHammingDistance(w1: String, w2: String) -> Int {
if w1.length != w2.length { return -1 }
var counter = 0
for i in 0 ..< w1.length {
if w1[i] != w2[i] { counter += 1 }
}
return counter
}
Results: 434 seconds
Way 2 - Removing Characters
private func getHammingDistance(w1: String, w2: String) -> Int {
if w1.length != w2.length { return -1 }
var counter = 0
var c1 = w1, c2 = w2 // need to mutate
let length = w1.length
for i in 0 ..< length {
if c1.removeFirst() != c2.removeFirst() { counter += 1 }
}
return counter
}
Results: 156 seconds
Same Thing in Java
Results: 0.3 seconds
Where it's being called
var graph: Graph
func connectData() {
let verticies = graph.canvas // canvas is Array<Node>
// Node has key that holds the String
for vertex in 0 ..< verticies.count {
for compare in vertex + 1 ..< verticies.count {
if getHammingDistance(w1: verticies[vertex].key!, w2: verticies[compare].key!) == 1 {
graph.addEdge(source: verticies[vertex], neighbor: verticies[compare])
}
}
}
}
156 seconds is still far too inefficient for me. What is the absolute most efficient way of comparing characters in Swift? Is there a possible workaround for computing hamming distance that involves not comparing characters?
Edit
Edit 1: I am taking an entire dictionary of 4 and 5 letter words and creating a connected graph where the edges indicate a hamming distance of 1. Therefore, I am comparing 8,000+ words to each other to generate edges.
Edit 2: Added method call.
Unless you choose a fixed-length character model for your strings, operations such as .count and positional access through .characters have a complexity of O(n) (where n is the string length). If you store your data in an array of characters (e.g. [Character]), your functions will perform much better.
You can also combine the whole calculation in a single pass using the zip() function
let hammingDistance = zip(word1.characters,word2.characters)
.filter{$0 != $1}.count
but that still requires going through all characters of every word pair.
...
Given that you're only looking for Hamming distances of 1, there is a faster way to get to all the unique pairs of words:
The strategy is to group words by the 4 (or 5) patterns that correspond to one "missing" letter. Each of these pattern groups defines a smaller scope for word pairs because words in different groups would be at a distance other than 1.
Each word will belong to as many groups as its character count.
For example :
"hear" will be part of the pattern groups:
"*ear", "h*ar", "he*r" and "hea*".
Any other word that would correspond to one of these 4 pattern groups would be at a Hamming distance of 1 from "hear".
Here is how this can be implemented:
// Test data 8500 words of 4-5 characters ...
var seenWords = Set<String>()
var allWords = try! String(contentsOfFile: "/usr/share/dict/words")
.lowercased()
.components(separatedBy:"\n")
.filter{$0.characters.count == 4 || $0.characters.count == 5}
.filter{seenWords.insert($0).inserted}
.enumerated().filter{$0.0 < 8500}.map{$1}
// Compute patterns for a Hamming distance of 1
// Replace each letter position with "*" to create patterns of
// one "non-matching" letter
public func wordH1Patterns(_ aWord:String) -> [String]
{
var result : [String] = []
let fullWord : [Character] = aWord.characters.map{$0}
for index in 0..<fullWord.count
{
var pattern = fullWord
pattern[index] = "*"
result.append(String(pattern))
}
return result
}
// Group words around matching patterns
// and add unique pairs from each group
func addHamming1Edges()
{
// Prepare pattern groups ...
//
var patternIndex:[String:Int] = [:]
var hamming1Groups:[[String]] = []
for word in allWords
{
for pattern in wordH1Patterns(word)
{
if let index = patternIndex[pattern]
{
hamming1Groups[index].append(word)
}
else
{
let index = hamming1Groups.count
patternIndex[pattern] = index
hamming1Groups.append([word])
}
}
}
// add edge nodes ...
//
for h1Group in hamming1Groups
{
for (index,sourceWord) in h1Group.dropLast(1).enumerated()
{
for targetIndex in index+1..<h1Group.count
{ addEdge(source:sourceWord, neighbour:h1Group[targetIndex]) }
}
}
}
On my 2012 MacBook Pro, the 8500 words go through 22817 (unique) edge pairs in 0.12 sec.
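For readers on later Swift versions (4 and up), the same pattern-grouping idea can be written more compactly with a dictionary that has a default value. This is only a sketch: hamming1Pairs(_:) is a made-up name, and it assumes the input words are unique and never contain "*".

```swift
// Hypothetical helper: returns every pair of words at Hamming distance 1.
// Assumes `words` has no duplicates and no word contains "*".
func hamming1Pairs(_ words: [String]) -> [(String, String)] {
    // Group words by each pattern with one position masked out.
    var groups: [String: [String]] = [:]
    for word in words {
        let chars = Array(word)
        for position in chars.indices {
            var pattern = chars
            pattern[position] = "*"
            groups[String(pattern), default: []].append(word)
        }
    }
    // Two distinct words at distance 1 share exactly one pattern,
    // so each pair is produced exactly once.
    var pairs: [(String, String)] = []
    for group in groups.values where group.count > 1 {
        for i in 0..<(group.count - 1) {
            for j in (i + 1)..<group.count {
                pairs.append((group[i], group[j]))
            }
        }
    }
    return pairs
}

let samplePairs = hamming1Pairs(["hear", "head", "heat", "read"])
print(samplePairs.count)
```

In the sample, "hear", "head" and "heat" share the pattern "hea*", and "head" and "read" share "*ead".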
[EDIT] to illustrate my first point, I made a "brute force" algorithm using arrays of characters instead of Strings :
let wordArrays = allWords.map{Array($0.unicodeScalars)}
for i in 0..<wordArrays.count-1
{
let word1 = wordArrays[i]
for j in i+1..<wordArrays.count
{
let word2 = wordArrays[j]
if word1.count != word2.count { continue }
var distance = 0
for c in 0..<word1.count
{
if word1[c] == word2[c] { continue }
distance += 1
if distance > 1 { break }
}
if distance == 1
{ addEdge(source:allWords[i], neighbour:allWords[j]) }
}
}
This goes through the unique pairs in 0.27 sec. The reason for the speed difference is the internal model of Swift Strings, which is not an array of equal-length elements (characters) but rather a chain of variable-length encoded characters (similar to UTF-8, where special bytes indicate that the following bytes are part of the same character). Such a structure has no simple base + displacement indexing; it must always be iterated from the beginning to reach the Nth element.
Note that I used unicodeScalars instead of Character because Unicode scalars are fixed-length values (32 bits each) that allow a direct binary comparison. The Character type isn't as straightforward and takes longer to compare.
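A small illustration of why Character comparison costs more, and of the trade-off in switching to scalars: a Character can consist of several scalars, and Swift compares Characters by canonical equivalence rather than bit for bit. The constants below are chosen only for this example.

```swift
let composed = "\u{E9}"        // "é" as one precomposed scalar
let decomposed = "e\u{301}"    // "e" followed by a combining acute accent

// Both strings contain exactly one Character, and they compare equal,
// because String and Character equality use canonical equivalence.
print(composed.count, decomposed.count)
print(composed == decomposed)

// At the scalar level they are different sequences of different lengths.
let composedScalars = Array(composed.unicodeScalars)
let decomposedScalars = Array(decomposed.unicodeScalars)
print(composedScalars.count, decomposedScalars.count)
print(composedScalars == decomposedScalars)
```

For a dictionary of plain lowercase English words this difference does not matter, which is what makes the scalar (or byte) comparison a safe shortcut there.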
Try this:
extension String {
func hammingDistance(to other: String) -> Int? {
guard self.characters.count == other.characters.count else { return nil }
return zip(self.characters, other.characters).reduce(0) { distance, chars in
distance + (chars.0 == chars.1 ? 0 : 1)
}
}
}
print("read".hammingDistance(to: "hear")) // => Optional(2)
The following code executed in 0.07 seconds for 8500 characters:
func getHammingDistance(w1: String, w2: String) -> Int {
if w1.characters.count != w2.characters.count {
return -1
}
let arr1 = Array(w1.characters)
let arr2 = Array(w2.characters)
var counter = 0
for i in 0 ..< arr1.count {
if arr1[i] != arr2[i] { counter += 1 }
}
return counter
}
After some messing around, I found a faster solution than @Alexander's answer (and my previous broken answer):
extension String {
func hammingDistance(to other: String) -> Int? {
guard !self.isEmpty, !other.isEmpty, self.characters.count == other.characters.count else {
return nil
}
var w1Iterator = self.characters.makeIterator()
var w2Iterator = other.characters.makeIterator()
var distance = 0;
while let w1Char = w1Iterator.next(), let w2Char = w2Iterator.next() {
distance += (w1Char != w2Char) ? 1 : 0
}
return distance
}
}
For comparing strings with a million characters, on my machine it's 1.078 sec compared to 1.220 sec, so roughly a 10% improvement. My guess is this is due to avoiding .zip and the slight overhead of .reduce and tuples
As others have noted, calling .characters repeatedly takes time. If you convert all of the strings once, it should help.
func connectData() {
let verticies = graph.canvas // canvas is Array<Node>
// Node has key that holds the String
// Convert all of the keys to utf16, and keep them
let nodesAsUTF = verticies.map { $0.key!.utf16 }
for vertex in 0 ..< verticies.count {
for compare in vertex + 1 ..< verticies.count {
if getHammingDistance(w1: nodesAsUTF[vertex], w2: nodesAsUTF[compare]) == 1 {
graph.addEdge(source: verticies[vertex], neighbor: verticies[compare])
}
}
}
}
// Calculate the hamming distance of two UTF16 views
func getHammingDistance(w1: String.UTF16View, w2: String.UTF16View) -> Int {
if w1.count != w2.count {
return -1
}
var counter = 0
for i in w1.startIndex ..< w1.endIndex {
if w1[i] != w2[i] {
counter += 1
}
}
return counter
}
I used UTF16, but you might want to try UTF8 depending on the data. Since I don't have the dictionary you are using, please let me know the result!
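A possible UTF-8 variant of the same idea, sketched under the assumption that the words are plain ASCII (so one byte is one letter): convert each word to a byte array once, then compare by integer index and stop as soon as a second mismatch is seen. The function name differsInExactlyOnePosition is hypothetical.

```swift
// Hypothetical helper: true when the two byte arrays have the same length
// and differ in exactly one position (Hamming distance of 1).
func differsInExactlyOnePosition(_ a: [UInt8], _ b: [UInt8]) -> Bool {
    guard a.count == b.count else { return false }
    var mismatches = 0
    for i in a.indices where a[i] != b[i] {
        mismatches += 1
        if mismatches > 1 { return false }   // early exit, distance is already 2+
    }
    return mismatches == 1
}

// Convert every word once, outside the pairwise loops.
let sampleWords = ["read", "hear", "head", "heat"]
let encodedWords = sampleWords.map { Array($0.utf8) }

for i in 0..<encodedWords.count {
    for j in (i + 1)..<encodedWords.count {
        if differsInExactlyOnePosition(encodedWords[i], encodedWords[j]) {
            print(sampleWords[i], sampleWords[j])
        }
    }
}
```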
*broken*, see new answer
My approach:
private func getHammingDistance(w1: String, w2: String) -> Int {
guard w1.characters.count == w2.characters.count else {
return -1
}
let countArray: Int = w1.characters.indices
.reduce(0, {$0 + (w1[$1] == w2[$1] ? 0 : 1)})
return countArray
}
comparing 2 strings of 10,000 random characters took 0.31 seconds
To expand a bit: it should only require one iteration through the strings, adding as it goes.
Also it's way more concise 🙂.
Consider this function to build a string of random characters:
func makeToken(length: Int) -> String {
let chars: String = "abcdefghijklmnopqrstuvwxyz0123456789!?##$%ABCDEFGHIJKLMNOPQRSTUVWXYZ"
var result: String = ""
for _ in 0..<length {
let idx = Int(arc4random_uniform(UInt32(chars.characters.count)))
let idxEnd = idx + 1
let range: Range = idx..<idxEnd
let char = chars.substring(with: range)
result += char
}
return result
}
This throws an error on the substring method:
Cannot convert value of type 'Range<Int>' to expected argument
type 'Range<String.Index>' (aka 'Range<String.CharacterView.Index>')
I'm confused why I can't simply provide a Range with 2 integers, and why it's making me go the roundabout way of making a Range<String.Index>.
So I have to change the Range creation to this very over-complicated way:
let idx = Int(arc4random_uniform(UInt32(chars.characters.count)))
let start = chars.index(chars.startIndex, offsetBy: idx)
let end = chars.index(chars.startIndex, offsetBy: idx + 1)
let range: Range = start..<end
Why isn't it good enough for Swift for me to simply create a range with 2 integers and the half-open range operator? (..<)
Quite the contrast to "swift": in JavaScript I can simply do chars.substr(idx, 1).
I suggest converting your String to [Character] so that you can index it easily with Int:
func makeToken(length: Int) -> String {
let chars = Array("abcdefghijklmnopqrstuvwxyz0123456789!?##$%ABCDEFGHIJKLMNOPQRSTUVWXYZ".characters)
var result = ""
for _ in 0..<length {
let idx = Int(arc4random_uniform(UInt32(chars.count)))
result += String(chars[idx])
}
return result
}
Swift takes great care to provide a fully Unicode-compliant, type-safe, String abstraction.
Indexing a given Character, in an arbitrary Unicode string, is far from a trivial task. Each Character is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character. In particular, hiding all this complexity behind a simple Int based indexing scheme might result in the wrong performance mental model for programmers.
Having said that, you can always convert your string to an Array<Character> once for easy (and fast!) indexing. For instance:
let chars: String = "abcdefghijklmnop"
var charsArray = Array(chars.characters)
...
let resultingString = String(charsArray)
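Applied to the token generator from the question, and assuming Swift 4.2 or later (where Int.random(in:) is available), the convert-once approach might look like the sketch below. The names tokenAlphabet and makeRandomToken(length:) are invented for this example, and the alphabet is shortened to letters and digits.

```swift
// Convert the alphabet to [Character] once so it can be indexed by Int.
let tokenAlphabet = Array("abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")

// Hypothetical helper: builds a random token of the requested length.
// A negative length is treated as zero.
func makeRandomToken(length: Int) -> String {
    var result = ""
    result.reserveCapacity(max(0, length))
    for _ in 0..<max(0, length) {
        // Array indexing is O(1), unlike offsetting a String.Index.
        result.append(tokenAlphabet[Int.random(in: 0..<tokenAlphabet.count)])
    }
    return result
}

print(makeRandomToken(length: 16))
```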
I am trying to take a hex string and insert dashes between every other character (e.g. "b201a968" to "b2-01-a9-68"). I have found several ways to do it, but the problem is my string is fairly large (8066 characters) and the fastest I can get it to work it still takes several seconds. These are the ways I have tried and how long they are taking. Can anyone help me optimize this function?
//42.68 seconds
func reformatDebugString(string: String) -> String
{
var myString = string
var index = 2
while(true){
myString.insert("-", at: myString.index(myString.startIndex, offsetBy: index))
index += 3
if(index >= myString.characters.count){
break
}
}
return myString
}
//21.65 seconds
func reformatDebugString3(string: String) -> String
{
var myString = ""
let length = string.characters.count
var first = true
for i in 0...length-1{
let index = string.index(myString.startIndex, offsetBy: i)
let c = string[index]
myString += "\(c)"
if(!first){
myString += "-"
}
first = !first
}
return myString
}
//11.37 seconds
func reformatDebugString(string: String) -> String
{
var myString = string
var index = myString.characters.count - 2
while(true){
myString.insert("-", at: myString.index(myString.startIndex, offsetBy: index))
index -= 2
if(index == 0){
break
}
}
return myString
}
The problem with all three of your approaches is the use of index(_:offsetBy:) in order to get the index of the current character in your loop. This is an O(n) operation where n is the distance to offset by – therefore making all three of your functions run in quadratic time.
Furthermore, for solutions #1 and #3, your insertion into the resultant string is an O(n) operation, as all the characters after the insertion point have to be shifted up to accommodate the added character. It's generally cheaper to build up the string from scratch in this case, as we can just add a given character onto the end of the string, which is O(1) if the string has enough capacity, O(n) otherwise.
Also for solution #1, saying myString.characters.count is an O(n) operation, so not something you want to be doing at each iteration of the loop.
So, we want to build the string from scratch, and avoid indexing and calculating the character count inside the loop. Here's one way of doing that:
extension String {
func addingDashes() -> String {
var result = ""
for (offset, character) in characters.enumerated() {
// don't insert a '-' before the first character,
// otherwise insert one before every other character.
if offset != 0 && offset % 2 == 0 {
result.append("-")
}
result.append(character)
}
return result
}
}
// ...
print("b201a968".addingDashes()) // b2-01-a9-68
Your best solution (#3) in a release build took 37.79s on my computer, the method above took 0.023s.
As already noted in Hamish's answer, you should avoid these two things:
calculate each index with string.index(string.startIndex, offsetBy: ...)
modifying a large String with insert(_:at:)
So, this can be another way:
func reformatDebugString4(string: String) -> String {
var result = ""
var currentIndex = string.startIndex
while currentIndex < string.endIndex {
let nextIndex = string.index(currentIndex, offsetBy: 2, limitedBy: string.endIndex) ?? string.endIndex
if currentIndex != string.startIndex {
result += "-"
}
result += string[currentIndex..<nextIndex]
currentIndex = nextIndex
}
return result
}
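Since the input is a hex string, one further option is to work on raw bytes. The sketch below assumes the string contains only ASCII characters (true for hex digits) and a Swift version with String(decoding:as:) (4 and up); addingDashesASCII(_:) is a hypothetical name, not part of either answer.

```swift
// Hypothetical helper: inserts "-" after every two bytes of an ASCII string.
// Assumes ASCII-only input; for other input a dash could land inside
// a multi-byte character.
func addingDashesASCII(_ hex: String) -> String {
    let bytes = Array(hex.utf8)
    var output: [UInt8] = []
    // Final size is known up front: one dash per two input bytes at most.
    output.reserveCapacity(bytes.count + bytes.count / 2)
    for (offset, byte) in bytes.enumerated() {
        if offset != 0 && offset % 2 == 0 {
            output.append(UInt8(ascii: "-"))
        }
        output.append(byte)
    }
    return String(decoding: output, as: UTF8.self)
}

print(addingDashesASCII("b201a968"))
```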
My current attempts at creating a random unicode character generator have failed with errors such as those mentioned in my other question here. It's obviously not as simple as just generating a random number.
Question: How can I generate a random unicode character in Swift?
Unicode Scalar Value
Any Unicode code point except high-surrogate and low-surrogate code
points. In other words, the ranges of integers 0 to D7FF and E000
to 10FFFF inclusive.
So, I've made a small code snippet. See below.
This code works
func randomUnicodeCharacter() -> String {
let i = arc4random_uniform(1114111)
return (i > 55295 && i < 57344) ? randomUnicodeCharacter() : String(UnicodeScalar(i))
}
randomUnicodeCharacter()
This code doesn't work!
let N: UInt32 = 65536
let i = arc4random_uniform(N)
var c = String(UnicodeScalar(i))
print(c, appendNewline: false)
I was a little bit confused with this and this. [Maximum value: 65535]
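In later Swift versions (4.2 and up), the initializer Unicode.Scalar(_: UInt32) is failable and returns nil for surrogate code points, so the definition quoted above can be followed by retrying until a valid scalar comes back. This is a sketch with a hypothetical function name; note that many of the resulting scalars are unassigned or non-printing.

```swift
// Hypothetical helper: returns a uniformly random Unicode scalar value.
func randomUnicodeScalar() -> Unicode.Scalar {
    while true {
        // Closed range, so U+10FFFF itself can be produced.
        let value = UInt32.random(in: 0...0x10FFFF)
        // The initializer rejects the surrogate range D800...DFFF.
        if let scalar = Unicode.Scalar(value) {
            return scalar
        }
    }
}

let randomCharacter = Character(randomUnicodeScalar())
print(String(randomCharacter).unicodeScalars.map { String($0.value, radix: 16) })
```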
static func randomCharacters(withLength length: Int = 20) -> String {
let base = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
var randomString: String = ""
for _ in 0..<length {
let randomValue = arc4random_uniform(UInt32(base.characters.count))
randomString += "\(base[base.index(base.startIndex, offsetBy: Int(randomValue))])"
}
return randomString
}
Here you can modify length (Int) and use this for generating random characters.