I have a set of strings I need to sort in an order which is not Latin alphabetic.
Specifically, I have a string "AiyawbpfmnrhHxXzsSqkgtTdD" which specifies the sorting order, i.e., "y" comes before "a", but after "A". In case you are interested, this is the sort order for ancient Egyptian hieroglyphs as specified in the Manuel de Codage.
In Swift, is there a convenient way to specify a predicate or other approach for this type of collation order?
First, turn your alphabet into a Dictionary that maps each Character to its integer position in the alphabet:
import Foundation
let hieroglyphAlphabet = "AiyawbpfmnrhHxXzsSqkgtTdD"
let hieroglyphCodes = Dictionary(
    uniqueKeysWithValues: hieroglyphAlphabet
        .enumerated()
        .map { (key: $0.element, value: $0.offset) }
)
Next, extend StringProtocol with a property that returns an array of such alphabetic positions:
extension StringProtocol {
    var hieroglyphEncoding: [Int] { map { hieroglyphCodes[$0] ?? -1 } }
}
I'm turning non-alphabetic characters into -1, so they will be treated as less-than alphabetic characters. You could turn them into .max to treat them as greater-than, or use a more complex type than Int if you need more special treatment.
Now you can sort an array of strings by hieroglyphEncoding, using the lexicographicallyPrecedes method of Sequence:
let unsorted = "this is the sort order".components(separatedBy: " ")
let sorted = unsorted.sorted {
    $0.hieroglyphEncoding.lexicographicallyPrecedes($1.hieroglyphEncoding)
}
print(sorted)
Output:
["order", "is", "sort", "the", "this"]
It is not efficient to recompute the hieroglyphEncoding of each string on demand during the sort, so if you have many strings to sort, you should wrap each string and its encoding into a wrapper for sorting or use a Schwartzian transform.
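A minimal sketch of that decorate-sort-undecorate idea follows. The names hieroglyphOrder, hieroglyphRank and sortedByHieroglyphOrder are made up for this illustration, and unknown characters are still mapped to -1 as above:

```swift
// Hypothetical names for illustration; the alphabet is the one from the question.
let hieroglyphOrder = "AiyawbpfmnrhHxXzsSqkgtTdD"
let hieroglyphRank = Dictionary(
    uniqueKeysWithValues: hieroglyphOrder.enumerated().map { ($0.element, $0.offset) }
)

func sortedByHieroglyphOrder(_ strings: [String]) -> [String] {
    // Decorate: compute each string's sort key exactly once.
    let decorated: [(string: String, key: [Int])] = strings.map { string in
        (string: string, key: string.map { hieroglyphRank[$0] ?? -1 })
    }
    // Sort on the precomputed keys, then undecorate.
    return decorated
        .sorted { $0.key.lexicographicallyPrecedes($1.key) }
        .map { $0.string }
}

print(sortedByHieroglyphOrder(["this", "is", "the", "sort", "order"]))
// ["order", "is", "sort", "the", "this"]
```

Each key is built once per string instead of once per comparison, which should matter more as the array grows.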
In a mobile app I use an API that can only handle about 300 words. How can I trim a string in Swift so that it doesn't contain more words than that?
The native .trimmingCharacters(in: CharacterSet) does not seem to be able to do this, as it is intended to trim certain characters.
There is no off-the-shelf way to limit the number of words in a string.
If you look at this post, it documents using the method enumerateSubstrings(in: Range) with the option .byWords. The method calls your closure once per word and passes in that word's Range.
You could use that to create an extension on String that would return the first X words of that string:
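import Foundation

extension String {
    func firstXWords(_ wordCount: Int) -> Substring {
        // A non-positive count yields an empty result rather than indexing ranges[-1].
        guard wordCount > 0 else { return self[startIndex..<startIndex] }
        var ranges: [Range<String.Index>] = []
        // Collect the range of every word in the string.
        self.enumerateSubstrings(in: self.startIndex..., options: .byWords) { _, range, _, _ in
            ranges.append(range)
        }
        if ranges.count > wordCount - 1 {
            return self[self.startIndex..<ranges[wordCount - 1].upperBound]
        } else {
            return self[self.startIndex..<self.endIndex]
        }
    }
}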
If we then run the code:
let sentence = "I want to an algorithm that could help find out how many words are there in a string separated by space or comma or some character. And then append each word separated by a character to an array which could be added up later I'm making an average calculator so I want the total count of data and then add up all the words. By words I mean the numbers separated by a character, preferably space Thanks in advance"
print(sentence.firstXWords(10))
The output is:
I want to an algorithm that could help find out
Using enumerateSubstrings(in: Range) is going to give much better results than splitting your string using spaces, since there are a lot more separators than just spaces in normal text (newlines, commas, colons, em spaces, etc.) It will also work for languages like Japanese and Chinese that often don't have spaces between words.
You might be able to rewrite the function to stop enumerating as soon as it reaches the desired number of words. If you want only a small fraction of the words in a very long string, that would make it significantly faster. (The code above should have O(n) performance, although I haven't dug deeply enough to be sure of that. I also couldn't figure out how to terminate enumerateSubstrings() early, although I didn't try that hard.)
Leo Dabus provided an improved version of my function. It extends StringProtocol rather than String, which means it can work on substrings. Plus, it stops once it hits your desired word count, so it will be much faster for finding the first few words of very long strings:
extension StringProtocol {
    func firstXWords(_ n: Int) -> SubSequence {
        var endIndex = self.endIndex
        var words = 0
        enumerateSubstrings(in: startIndex..., options: .byWords) { _, range, _, stop in
            words += 1
            if words == n {
                stop = true
                endIndex = range.upperBound
            }
        }
        return self[..<endIndex]
    }
}
I'm a new Swift developer. I'm using Swift 4.2 and Xcode 10.2.
I would like to search an array for a single result that has the most characters compared to my search string. To be more specific, I need the longest string from my array which is a prefix of the search string.
For example, if my array is:
let array = ["1", "13", "1410", "1649", "1670"]
and my search string is:
let searchString = "16493884777"
I would like the result to be "1649".
I can't find another SO question that has a Swift solution.
You could iterate over the prefix array from the end (assuming the prefix array is sorted) and return as soon as you hit a match. Every matching entry is a prefix of the same search string, so in a sorted array a longer match always comes after a shorter one, and two different matches of the same length cannot exist. The first match found from the end is therefore the longest:
import Foundation
func longestMatchingPrefix(_ prefixArray: [String], _ searchString: String) -> String {
    for p in prefixArray.reversed() {
        if searchString.hasPrefix(p) {
            return p
        }
    }
    return "No matching prefix found"
}
print(longestMatchingPrefix(["1", "13", "1410", "1649", "1670"], "16493884777"))
Output:
1649
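If the array is not guaranteed to be sorted, one possible variant is to filter for matching prefixes and take the longest one. This is only a sketch: the name longestPrefix(of:in:) is made up here, and it returns an Optional instead of a message string so that "no match" cannot be confused with a real prefix:

```swift
// Hypothetical helper: does not assume the candidates are sorted.
func longestPrefix(of searchString: String, in candidates: [String]) -> String? {
    return candidates
        .filter { searchString.hasPrefix($0) }   // keep only real prefixes
        .max { $0.count < $1.count }             // pick the longest, nil if none
}

let unsortedPrefixes = ["1670", "1", "1649", "13", "1410"]
if let match = longestPrefix(of: "16493884777", in: unsortedPrefixes) {
    print(match) // 1649
}
```

This checks every candidate, so it trades the early exit of the sorted version for not depending on the order of the array.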
I am wondering why map format has to be {( )} rather than just { }
func intersect(_ nums1: [Int], _ nums2: [Int]) -> [Int] {
    // the following is right
    var num1Reduce = nums1.reduce(0){ $0 + $1 }
    // the following is wrong ??
    var num2Dict = Dictionary(nums2.map{ $0, 1 }, uniquingKeysWith : +)
    // the following is right
    var num1Dict = Dictionary(nums1.map{ ($0, 1) }, uniquingKeysWith : +)
}
and I even see the following format ({ }). I am totally confused!
let cars = peopleArray.map({ $0.cars })
print(cars)
You are using the following Dictionary initializer:
init<S>(_ keysAndValues: S, uniquingKeysWith combine: (Dictionary<Key, Value>.Value, Dictionary<Key, Value>.Value) throws -> Dictionary<Key, Value>.Value) rethrows where S : Sequence, S.Element == (Key, Value)
Note that S is a sequence whose elements are key/value tuples.
When you pass nums1.map{ ($0, 1) } to the first parameter, you are creating an array of key/value tuples from nums1.
It fails when you use nums2.map{ $0, 1 } because that is missing the parentheses for the tuple.
Keep in mind that nums1.map{ ($0, 1) } is shorthand for nums1.map({ ($0, 1) }). That is trailing closure syntax, which has nothing to do with the tuple parentheses that appear inside the { }.
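To make the distinction concrete, here is a small sketch with made-up values. Both map calls build the same array of (key, value) tuples, and the dictionary initializer then sums the 1s for duplicate keys:

```swift
let nums = [1, 2, 2, 3, 3, 3]

// Trailing closure syntax: no parentheses around the closure itself.
let pairs = nums.map { ($0, 1) }
// Same call with the closure passed as an ordinary parenthesised argument.
let pairsExplicit = nums.map({ ($0, 1) })

// The inner ( ) build the tuple; uniquingKeysWith: + adds up the values of duplicate keys.
let counts = Dictionary(pairs, uniquingKeysWith: +)
print(counts[3] ?? 0) // 3
```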
map is a function that takes a closure as a parameter. We can call map and pass the closure as we would in any ordinary function call, keeping the parentheses (), e.g.
(0...100).map ({ _ in print("yeti")})
But Swift lets us drop the parentheses as a shorthand (trailing closure syntax), so we can write it like this:
(0...100).map { _ in print("yeti")}
But in case you want to access the individual elements of the array, you can do so in two ways.
Given an array, you can access the current element using $0, which is shorthand for the closure's first argument. map calls the closure once per element, so $0 is whichever element is being processed.
(0...100).map {$0}
Instead of using Swift's default shorthand names, you can give the value you are accessing a readable name, e.g.
(0...100).map {element in}
This gives the closure's first argument (otherwise $0) the name element. The in keyword separates the parameter list from the closure body. If you remove in, the compiler says it doesn't know any variable called element.
Dictionaries have two values per element, the key and the value. If you want to access the contents of a dictionary while mapping, you can again do it in two ways: (a) use Swift's default shorthand names, or (b) give the key and the value readable names, e.g.
let dictionary = ["a": 3, "b": 4, "c": 5]
dictionary.map{($0, $1)}
Here $0 is the key and $1 is the value. Please note that the inner parentheses () create a tuple, which is what the closure returns. With named parameters it looks like this:
dictionary.map {(key, value) in }
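As a small sketch of both forms with a non-empty closure body (the dictionary letterValues and the "key=value" formatting are made up for this example), each variant below produces the same strings. The results are sorted because dictionary order is not guaranteed:

```swift
let letterValues = ["a": 3, "b": 4, "c": 5]

// (a) Shorthand names: $0 is the key, $1 is the value.
let shorthand = letterValues.map { "\($0)=\($1)" }.sorted()

// (b) Readable names for the same two values.
let named = letterValues.map { (key, value) in "\(key)=\(value)" }.sorted()

// The element can also be used as a single (key:, value:) tuple.
let whole = letterValues.map { "\($0.key)=\($0.value)" }.sorted()

print(named) // ["a=3", "b=4", "c=5"]
```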
I'm looking for a way, in Swift 4, to test if a Character is a member of an arbitrary CharacterSet. I have this Scanner class that will be used for some lightweight parsing. One of the functions in the class is to skip any characters, at the current position, that belong to a certain set of possible characters.
class MyScanner {
    let str: String
    var idx: String.Index
    init(_ string: String) {
        str = string
        idx = str.startIndex
    }
    var remains: String { return String(str[idx..<str.endIndex])}
    func skip(charactersIn characters: CharacterSet) {
        while idx < str.endIndex && characters.contains(str[idx]) {
            idx = str.index(idx, offsetBy: 1)
        }
    }
}
let scanner = MyScanner("fizz buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print("what remains: \"\(scanner.remains)\"")
I would like to implement the skip(charactersIn:) function so that the above code would print buzz fizz.
The tricky part is characters.contains(str[idx]) in the while condition: .contains() requires a Unicode.Scalar, and I'm at a loss trying to figure out the next step.
I know I could pass in a String to the skip function, but I'd like to find a way to make it work with a CharacterSet, because of all the convenient static members (alphanumerics, whitespaces, etc.).
How does one test a CharacterSet if it contains a Character?
Not sure if it's the most efficient way, but you can create a new CharacterSet and check whether one is a subset or superset of the other (set comparison is rather quick):
let newSet = CharacterSet(charactersIn: "a")
// let newSet = CharacterSet(charactersIn: "\(character)")
print(newSet.isSubset(of: CharacterSet.decimalDigits)) // false
print(newSet.isSubset(of: CharacterSet.alphanumerics)) // true
Swift 4.2
CharacterSet extension function to check whether it contains Character:
extension CharacterSet {
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy(contains(_:))
    }
}
Usage example:
CharacterSet.decimalDigits.containsUnicodeScalars(of: "3") // true
CharacterSet.decimalDigits.containsUnicodeScalars(of: "a") // false
I know that you wanted to use CharacterSet rather than String, but CharacterSet does not (yet, at least) support characters that are composed of more than one Unicode.Scalar. See the "family" character (👩‍👩‍👧‍👦) or the international flag characters (e.g. "🇯🇵" or "🇯🇲") that Apple demonstrated in the string discussion in WWDC 2017 video What's New in Swift. The multiple skin tone emoji also manifest this behavior (e.g. 👩🏻 vs 👩🏽).
As a result, I'd be wary of using CharacterSet (which is a "set of Unicode character values for use in search operations"). Or, if you want to provide this method for the sake of convenience, be aware that it will not work correctly with characters represented by multiple unicode scalars.
So, you might offer a scanner that provides both CharacterSet and String renditions of the skip method:
class MyScanner {
    let string: String
    var index: String.Index
    init(_ string: String) {
        self.string = string
        index = string.startIndex
    }
    var remains: String { return String(string[index...]) }
    /// Skip characters in a string
    ///
    /// This rendition is safe to use with strings that have characters
    /// represented by more than one unicode scalar.
    ///
    /// - Parameter skipString: A string with all of the characters to skip.
    func skip(charactersIn skipString: String) {
        while index < string.endIndex, skipString.contains(string[index]) {
            index = string.index(index, offsetBy: 1)
        }
    }
    /// Skip characters in character set
    ///
    /// Note, character sets cannot (yet) include characters that are represented by
    /// more than one unicode scalar (e.g. 👩‍👩‍👧‍👦 or 🇯🇵 or 👰🏻). If you want to test
    /// for these multi-unicode characters, you have to use the `String` rendition of
    /// this method.
    ///
    /// This will simply stop scanning if it encounters a multi-unicode character in
    /// the string being scanned (because it knows the `CharacterSet` can only represent
    /// single-unicode characters) and you want to avoid false positives (e.g., mistaking
    /// the Jamaican flag, 🇯🇲, for the Japanese flag, 🇯🇵).
    ///
    /// - Parameter characterSet: The character set to check for membership.
    func skip(charactersIn characterSet: CharacterSet) {
        while index < string.endIndex,
            string[index].unicodeScalars.count == 1,
            let character = string[index].unicodeScalars.first,
            characterSet.contains(character) {
                index = string.index(index, offsetBy: 1)
        }
    }
}
Thus, your simple example will still work:
let scanner = MyScanner("fizz buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print(scanner.remains) // "buzz fizz"
But use the String rendition if the characters you want to skip might include multiple unicode scalars:
let family = "👩\u{200D}👩\u{200D}👧\u{200D}👦" // 👩‍👩‍👧‍👦
let boy = "👦"
let charactersToSkip = family + boy
let string = boy + family + "foobar" // 👦👩‍👩‍👧‍👦foobar
let scanner = MyScanner(string)
scanner.skip(charactersIn: charactersToSkip)
print(scanner.remains) // foobar
As Michael Waterfall noted in the comments below, CharacterSet has a bug and doesn't even handle 32-bit Unicode.Scalar values correctly, meaning that it doesn't even handle single scalar characters properly if the value exceeds 0xffff (including emoji, amongst others). The String rendition, above, handles these correctly, though.
The String function padding(toLength:withPad:startingAt:) will pad strings by adding padding characters on the end to "fill out" the string to the desired length.
Is there an equivalent function that will pad strings by prepending padding characters at the beginning?
This would be useful if you want to right-justify a substring in a monospaced output string, for example.
I could certainly write one, but I would expect there to be a built-in function, seeing as how there is already a function that pads at the end.
You can do this by reversing the string, padding at the end, and then reversing again...
let string = "abc"
// Pad at end
string.padding(toLength: 7, withPad: "X", startingAt: 0)
// "abcXXXX"
// Pad at start
String(String(string.reversed()).padding(toLength: 7, withPad: "X", startingAt: 0).reversed())
// "XXXXabc"
Since Swift string manipulations are a rehash of the old NSString class, I suppose Apple never bothered to complete the feature set and just gave us toll-free bridging as manna from the gods.
Or, since Objective-C never shied away from super verbose yet cryptic code, they expect us to use the native function twice :
let a = "42"
"".padding(toLength:10, withPad:a.padding(toLength:10, withPad:"0", startingAt:0), startingAt:a.count)
// 0000000042
[EDIT] Objective-C ranting aside, the solution is a bit more subtle than that and adding some more useful padding methods to the String type is probably going to make things easier to use and maintain:
For example:
import Foundation

extension String
{
    func padding(leftTo paddedLength:Int, withPad pad:String=" ", startingAt padStart:Int=0) -> String
    {
        let rightPadded = self.padding(toLength:max(count,paddedLength), withPad:pad, startingAt:padStart)
        return "".padding(toLength:paddedLength, withPad:rightPadded, startingAt:count % paddedLength)
    }
    func padding(rightTo paddedLength:Int, withPad pad:String=" ", startingAt padStart:Int=0) -> String
    {
        return self.padding(toLength:paddedLength, withPad:pad, startingAt:padStart)
    }
    func padding(sidesTo paddedLength:Int, withPad pad:String=" ", startingAt padStart:Int=0) -> String
    {
        let rightPadded = self.padding(toLength:max(count,paddedLength), withPad:pad, startingAt:padStart)
        return "".padding(toLength:paddedLength, withPad:rightPadded, startingAt:(paddedLength+count)/2 % paddedLength)
    }
}
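If all you need is left padding with a single character, a simpler alternative that avoids Foundation is to prepend a repeated character. This is a sketch rather than a drop-in replacement: the method name leftPadded(toLength:with:) is made up here, and unlike padding(toLength:withPad:startingAt:) it never truncates a string that is already longer than the target length:

```swift
extension String {
    // Hypothetical helper: prepends `pad` until the string has `length` characters.
    func leftPadded(toLength length: Int, with pad: Character = " ") -> String {
        // Strings that are already long enough are returned unchanged (no truncation).
        guard count < length else { return self }
        return String(repeating: pad, count: length - count) + self
    }
}

print("abc".leftPadded(toLength: 7, with: "X")) // XXXXabc
print("42".leftPadded(toLength: 10, with: "0")) // 0000000042
```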