I'm pulling my hair out trying to generate a valid NSRange; it doesn't seem like it should be this complicated, so I'm guessing I'm using the wrong approach. Here is what I'm trying to do:
I have a string with a Unicode character in it:
"The quick brown fox\n❄jumped\n❄over the lazy dog"
I want to create an NSRange from that character to the end of the string, and while I can get the corresponding index for the first occurrence of the character:
text.rangeOfString("❄")?.startIndex
I can't seem to get the end of the string in a consistent format (something I can pass to NSMakeRange) to actually generate the range. This seems like it should be pretty simple, yet I've been stuck for over an hour trying to figure out how to get it to work; I keep ending up with Index types that I can't cast to integers to convert back to the length that NSMakeRange requires for its second argument.
Ideally I'd do something like this (which is invalid because the types are incompatible and not castable, Index vs Int):
let start = text.rangeOfString("❄")?.startIndex
NSMakeRange(start, text.endIndex - start)
I am using Swift, so I have the ability to use Swift's Range<String.Index> if it will make things easier, although it seems to be yet another range representation, different from NSRange, and I'm not sure how compatible the two are (I don't want to run into another dimension of Index vs Int).
Cast your String as NSString.
You will be able to use Foundation's .rangeOfString instead of Swift's .rangeOfString.
The Foundation version returns an NSRange.
Be careful though: it doesn't treat Unicode the same way as Swift's method (NSRange works in UTF-16 code units), and NSRange and Range are not compatible (although there are ways to convert between them).
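For example, a minimal sketch of that approach in Swift 2 syntax (note that location and length here count UTF-16 code units, not Characters):

import Foundation

let text = "The quick brown fox\n❄jumped\n❄over the lazy dog"
let nsText = text as NSString

let start = nsText.rangeOfString("❄")        // Foundation's version, returns an NSRange
if start.location != NSNotFound {
    // from the first ❄ to the end of the string, in UTF-16 code units
    let range = NSMakeRange(start.location, nsText.length - start.location)
    print(nsText.substringWithRange(range))  // "❄jumped\n❄over the lazy dog"
}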
In Swift 3, you can count the characters in a String with:
str.characters.count
I need to do this frequently, and that line above looks like it could be O(N). Is there a way to get a string length, or a length of something — maybe the underlying unicode buffer — with an operation that is guaranteed to not have to walk the entire string? Maybe:
str.utf16.count
I ask because I'm checking the length of some text every time the user types a character, to limit the size of a UITextView. The call doesn't need to be an exact count of the glyphs, like characters.count.
This is a good question. The answer is... complicated. Converting from UTF-8 to UTF-16, or vice-versa, or converting to or from some other encoding, will all require examining the string, since the characters can be made up of more than one code unit. So if you want to get the count in constant time, it's going to come down to what the internal representation is. If the string is using UTF-16 internally, then it's a reasonable assumption that string.utf16.count would be in constant time, but if the internal representation is UTF-8 or something else, then the string will need to be analyzed to determine what the length in UTF-16 would be. So what's String using internally? Well:
https://github.com/apple/swift/blob/master/stdlib/public/core/StringCore.swift
/// The core implementation of a highly-optimizable String that
/// can store both ASCII and UTF-16, and can wrap native Swift
/// _StringBuffer or NSString instances.
This is discouraging. The internal representation could be ASCII or UTF-16, or it could be wrapping a Foundation NSString. Hrm. We do know that NSString uses UTF-16 internally, since this is actually documented, so that's good. So the main outlier here is when the string stores ASCII. The saving grace is that since the first 128 Unicode code points have the same values as the ASCII character set, any ASCII character 0xXX corresponds to the UTF-16 code unit 0x00XX, so the UTF-16 code-unit count is simply the ASCII length (the storage is merely twice as many bytes), and thus calculable in constant time. Is this the case in the implementation? Let's look.
In the UTF16View source, there is no implementation of count. It appears that count is inherited from Collection's implementation, which is implemented via distance():
public var count: IndexDistance {
    return distance(from: startIndex, to: endIndex)
}
UTF16View's implementation of distance() looks like this:
public func distance(from start: Index, to end: Index) -> IndexDistance {
    // FIXME: swift-3-indexing-model: range check start and end?
    return start.encodedOffset.distance(to: end.encodedOffset)
}
And in the String.Index source, encodedOffset looks like this:
public var encodedOffset : Int {
    return Int(_compoundOffset >> _Self._strideBits)
}
where _compoundOffset appears to be a simple 64-bit integer:
internal var _compoundOffset : UInt64
and _strideBits appears to be a straight integer as well:
internal static var _strideBits : Int { return 2 }
So it... looks... like you should get constant time from string.utf16.count, since unless I'm making a mistake somewhere, you're just bit-shifting a couple of integers and subtracting one from the other (I'd probably still run some tests to be sure). The caveat is, of course, that this isn't documented, and thus could change in the future, particularly since the documentation for String does claim that it needs to iterate through the string:
Unlike with isEmpty, calculating a view’s count property requires iterating through the elements of the string.
With all that said, you're using a UITextView, which is implemented in Objective-C via NSAttributedString. If you're willing to incur the Objective-C message-passing overhead (which, let's be honest, is probably occurring behind the scenes anyway to generate the String), you can just call its length property, which, since NSAttributedString is built on top of NSString, which does guarantee that it uses UTF-16 internally, is almost certain to be in constant time.
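For example, here is a minimal sketch of that UITextView approach (Swift 3 syntax; the delegate class name and the 200-character limit are made up for illustration):

import UIKit

class LengthLimitingDelegate: NSObject, UITextViewDelegate {
    let maxLength = 200  // arbitrary limit, for illustration only

    func textView(_ textView: UITextView, shouldChangeTextIn range: NSRange,
                  replacementText text: String) -> Bool {
        // textStorage is an NSTextStorage (an NSMutableAttributedString subclass),
        // so .length is a UTF-16 code-unit count and shouldn't walk the string.
        let newLength = textView.textStorage.length - range.length + (text as NSString).length
        return newLength <= maxLength
    }
}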
I'm having a problem with non-Latin character sets, and I need to check that a Range is in bounds before calling .substringWithRange. This seems really easy, but I can't find a way to do it.
Given a range:
let lastCharRange = currentString.endIndex.predecessor() ..< currentString.endIndex
How can I check:
let lastExpected = expectedString.substringWithRange(lastCharRange)
is in bounds?
Indexes are tied closely to the String that generated them -- actually to the String's CharacterView, which is a CollectionType. This holds true generally for collections.
So, you simply can't use the Index you got from one String on another String.
Depending on what you are doing, you might have to get a substring from the first and then search the second. You can also get the two Strings' CharacterViews and work with them via their collection-based interface: expectedString.characters.last, for example.
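To make that concrete, here is a small sketch using the same Swift 2 API as the question (the string values are made up): derive the last character from each string separately rather than reusing currentString's range on expectedString.

let currentString = "hello"
let expectedString = "hé"   // shorter, non-Latin string

// An index built from currentString can trap or point at the wrong place
// when applied to expectedString, so build the range from expectedString itself:
if !expectedString.isEmpty {
    let lastCharRange = expectedString.endIndex.predecessor() ..< expectedString.endIndex
    let lastExpected = expectedString.substringWithRange(lastCharRange)
    print(lastExpected)                    // "é"
}

// Or skip ranges entirely:
print(expectedString.characters.last)      // Optional("é")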
I have a buffer that contains just characters
let buffer: [Int8] = ....
Then I need to pass this to a function process that takes [UInt8] as an argument.
func process(buffer: [UInt8]) {
    // some code
}
What would be the best way to cast the [Int8] buffer to [UInt8]? I know the following code would work, but in this case the buffer contains just a bunch of characters, and it seems unnecessary to use functions like map.
process(buffer.map{ x in UInt8(x) }) // OK
process([UInt8](buffer)) // error
process(buffer as! [UInt8]) // error
I am using Xcode 7 b3, Swift 2.
I broadly agree with the other answers that you should just stick with map, however, if your array were truly huge, and it really was painful to create a whole second buffer just for converting to the same bit pattern, you could do it like this:
// first, change your process logic to be generic on any kind of container
func process<C: CollectionType where C.Generator.Element == UInt8>(chars: C) {
    // just to prove it's working...
    print(String(chars.map { UnicodeScalar($0) }))
}

// sample input
let a: [Int8] = [104, 101, 108, 108, 111] // ASCII "hello"

// access the underlying raw buffer as a pointer
a.withUnsafeBufferPointer { buf -> Void in
    process(
        UnsafeBufferPointer(
            // cast the underlying pointer to the type you want
            start: UnsafePointer(buf.baseAddress),
            count: buf.count))
}
// this prints [h, e, l, l, o]
Note withUnsafeBufferPointer means what it says. It's unsafe and you can corrupt memory if you get this wrong (be especially careful with the count). It relies on your external knowledge that, for example, if any of the integers are negative, your code doesn't mind them being reinterpreted as large unsigned integers. You might know that, but the Swift type system can't, so it won't allow it without resorting to the unsafe types.
That said, the above code is correct and within the rules and these techniques are justifiable if you need the performance edge. You almost certainly won’t unless you’re dealing with gigantic amounts of data or writing a library that you will call a gazillion times.
It’s also worth noting that there are circumstances where an array is not actually backed by a contiguous buffer (for example if it were cast from an NSArray) in which case calling .withUnsafeBufferPointer will first copy all the elements into a contiguous array. Also, Swift arrays are growable so this copy of underlying elements happens often as the array grows. If performance is absolutely critical, you could consider allocating your own memory using UnsafeMutablePointer and using it fixed-size style using UnsafeBufferPointer.
For a humorous but definitely not within the rules example that you shouldn’t actually use, this will also work:
process(unsafeBitCast(a, [UInt8].self))
It's also worth noting that these solutions are not the same as a.map { UInt8($0) } since the latter will trap at runtime if you pass it a negative integer. If this is a possibility you may need to filter them first.
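If negative values are possible and what you actually want is the same bit pattern rather than a trap or a filter, one option (standard Swift, not from the question) is UInt8's bitPattern: initializer:

let signed: [Int8] = [104, 101, 108, 108, 111, -1]

// Reinterprets the bits of each Int8 instead of trapping on negatives:
let unsigned = signed.map { UInt8(bitPattern: $0) }   // [104, 101, 108, 108, 111, 255]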
IMO, the best way to do this would be to stick to the same base type throughout the whole application to avoid the whole need to do casts/coercions. That is, either use Int8 everywhere, or UInt8, but not both.
If you have no choice, e.g. if you use two separate frameworks over which you have no control, and one of them uses Int8 while the other uses UInt8, then you should use map if you really want to use Swift. The latter two lines from your examples (process([UInt8](buffer)) and
process(buffer as! [UInt8])) look more like the C approach to the problem, that is, "we don't care that this area in memory is an array of signed integers, we will now treat it as if it were unsigned." Which basically throws the whole Swift idea of strong types out the window.
What I would probably try to do is use lazy sequences. E.g. check whether it is possible to feed the process() method with something like:
let convertedBuffer = buffer.lazy.map {
    UInt8($0)
}
process(convertedBuffer)
This would at least save you from the extra memory overhead (otherwise you would have to keep two arrays), and possibly save you some performance thanks to laziness. Note that this requires process() to accept a generic sequence of UInt8 rather than a concrete [UInt8], as in the generic version from the other answer.
You cannot cast arrays in Swift. It looks like you can, but what's really happening is that you are casting all the elements, one by one. Therefore, you can use cast notation with an array only if the elements can be cast.
Well, you cannot cast between numeric types in Swift. You have to coerce, which is a completely different thing - that is, you must make a new object of a different numeric type, based on the original object. The only way to use an Int8 where a UInt8 is expected is to coerce it: UInt8(x).
So what is true for one Int8 is true for an entire array of Int8. You cannot cast from an array of Int8 to an array of UInt8, any more than you could cast one of them. The only way to end up with an array of UInt8 is to coerce all the elements. That is exactly what your map call does. That is the way to do it; saying it is "unnecessary" is meaningless.
I'm pretty new in Swift and I was wondering what is the difference between this (that compiles successfully, and returns "A"):
var label = "Apoel"
label[label.startIndex]
and the following, for which the compiler is complaining:
label[0]
I know that label is not an array of chars as in C, but using the first approach makes string manipulation in Swift look similar to C's.
Also, I understand that the word finishes with something like C's "\0" because
label[label.endIndex]
gives an empty character while
label[label.endIndex.predecessor()]
returns "l" which is the last letter of the String.
startIndex is of type String.Index, which is a struct, not a simple integer, so you can't subscript a String with a plain Int.
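A small sketch of what that means in practice (Swift 2 syntax, matching the question; in Swift 3 advancedBy(1) becomes label.index(label.startIndex, offsetBy: 1)):

let label = "Apoel"

label[label.startIndex]                  // "A"
label[label.startIndex.advancedBy(1)]    // "p" (advance the index, not an Int)
label[label.endIndex.predecessor()]      // "l", the last Character
// label[label.endIndex]                 // runtime error: endIndex is "past the end"
// label[1]                              // does not compile: String has no Int subscript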
So I am trying to do a few things with numbers in Objective-C and realize there is a plethora of options, and I am just bewildered as to which type to use for my app.
So here are the types:
NSNumber (which is a class)
NSDecimal (which is a struct)
NSDecimalNumber (which is a class)
float/double (which are primitive types)
So essentially what I need to do is take an NSString which represents decimal-based hours (10.4 would be 10 hours and (4/10)*60 minutes) and convert it into:
a string representation D H:M (this needs division, multiplication and basic arithmetic)
a number type to store for easy calculations later (mostly converting to and from NSTimeIntervals and doing subtraction)
Oh, and I need to be able to take the absolute value of these as well.
It appears that the hard part is actually transitioning between the types.
To me this is a very trivial problem, so I'm not sure if it's because it's getting late or because Objective-C numeric types suck, but I could use a hand.
Use primitive types (double, CGFloat, NSInteger) for typical arithmetic and when you need to store a number as an instance variable that's going to be used primarily for arithmetic in other places. You can use C math functions (fabs(), pow(), etc.) as needed. NSTimeInterval is a typedef for double, so you can interchange the two.
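Since the rest of this thread is Swift, here is a quick sketch of that conversion flow using plain doubles (the same arithmetic applies to double in Objective-C; the helper name is made up for illustration):

import Foundation

// Hypothetical helper: "10.4" -> "0 10:24" (days hours:minutes)
func formatDecimalHours(string: String) -> String? {
    guard let decimalHours = Double(string) else { return nil }

    let totalMinutes = Int(round(fabs(decimalHours) * 60))   // fabs() gives the absolute value
    let days = totalMinutes / (24 * 60)
    let hours = (totalMinutes % (24 * 60)) / 60
    let minutes = totalMinutes % 60
    return "\(days) \(hours):\(String(format: "%02d", minutes))"
}

formatDecimalHours("10.4")                                    // "0 10:24"

// For later calculations, keep a plain double; seconds interchange with NSTimeInterval:
let interval: NSTimeInterval = (Double("10.4") ?? 0) * 3600   // 37440.0 seconds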
Use NSNumber when you need to store a number as an object, for example if you're creating an NSArray of numbers. Some parts of Cocoa like Core Data or key-value coding deal more with NSNumber than primitive types, so you may find yourself using NSNumber more than usual in those situations. For example, if you write [timeKeepersArray valueForKeyPath:@"@sum.seconds"] you'll get back an NSNumber, so you may find it easier just to keep that variable instead of converting it to a primitive.
Since it's a small amount of extra code to convert between NSNumber and primitive types, usually your application will end up favoring one or the other depending on what you're doing with numbers.
Oh, and NSDecimal and NSDecimalNumber? Don't worry too much about them; they only come up when you need really precise decimal operations, such as when you're storing financial data.