Serialize [Bit] to NSData in Swift - swift

I'm implementing Haffman coding in Swift and to get benefit from this coding I need to serialize my one-zero sequences as effective as possible. Usual approach in such situations is to use bit arrays. Swift contains Bit type and there is no problem to convert sequences into [Bit], but I haven't found any standard solution to get NSData from this array. The only way I see is to create [Int8] (which could be serialized) and fill all bits manually using bitwise operators, in the worst case I will lose 7 bits in the last element.
So if there is any standard (ready-to-use) solution for [Bit] serialization?

The simple answer is NO.
Bit in Swift is implemented as public enum Bit : Int, Comparable, RandomAccessIndexType, _Reflectable { ... }. I don't see any advantage to use Bit type against the Int, except the well defined level of abstraction. The minimal NSData instance use at least one byte (in reality the amount of memory usage depends on the underlying processor capabilities. Serialization is also just an abstraction, you can serialize your [Bit] as sequence of words "Bit with binary value One", "Bit with binary value Zero", .... To save your bit sequence to NSData and to be able to reconstruct it (unserialize it), you still need to make some kind of 'binary protocol'. At least you need to save the numbers of bits as a part of your data. If you use Huffman coding, you need to save your symbol table too ...

Related

Scodec: Using vectorOfN with a vlong field

I am playing around with the Bitcoin blockchain to learn Scala and some useful libraries.
Currently I am trying to decode and encode Blocks with SCodec and my problem is that the vectorOfN function takes its size as an Int. How can I use a long field for the size while still preserving the full value range.
In other words is there a vectorOfLongN function?
This is my code which would compile fine if I were using vintL instead of vlongL:
object Block {
implicit val codec: Codec[Block] = {
("header" | Codec[BlockHeader]) ::
(("numTx" | vlongL) >>:~
{ numTx => ("transactions" | vectorOfN(provide(numTx), Codec[Transaction]) ).hlist })
}.as[Block]
}
You may assume that appropriate Codecs for the Blockheader and the Transactions are implemented. Actually, vlong is used as a simplification for this question, because Bitcoin uses its own codec for variable sized ints.
I'm not a scodec specialist but my common sense suggests that this is not possible because Scala's Vector being a subtype of GenSeqLike is limited to have length of type Int and apply that accepts Int index as its argument. And AFAIU this limitation comes from the underlying JVM platform where you can't have an array of size more than Integer.MAX_VALUE i.e. around 2^31 (see also "Criticism of Java" wiki). And although Vector theoretically could have work this limitation around, it was not done. So it makes no sense for vectorOfN to support Long size as well.
In other words, if you want something like this, you probably should start from creating your own Vector-like class that does support Long indices working around JVM limitations.
You may want to take a look at scodec-stream, which comes in handy when all of your data is not available immediately or does not fit into memory.
Basically, you would use your usual codecs.X and turn it into a StreamDecoder via scodec.stream.decode.many(normal_codec). This way you can work with the data through scodec without the need to load it into memory entirely.
A StreamDecoder then offers methods like decodeInputStream along scodec's usual decode.
(I used it a while ago in a slightly different context – parsing data sent by a client to a server – but it looks like it would apply here as well).

How to cast [Int8] to [UInt8] in Swift

I have a buffer that contains just characters
let buffer: [Int8] = ....
Then I need to pass this to a function process that takes [UInt8] as an argument.
func process(buffer: [UInt8]) {
// some code
}
What would be the best way to pass the [Int8] buffer to cast to [Int8]? I know following code would work, but in this case the buffer contains just bunch of characters, and it is unnecessary to use functions like map.
process(buffer.map{ x in UInt8(x) }) // OK
process([UInt8](buffer)) // error
process(buffer as! [UInt8]) // error
I am using Xcode7 b3 Swift2.
I broadly agree with the other answers that you should just stick with map, however, if your array were truly huge, and it really was painful to create a whole second buffer just for converting to the same bit pattern, you could do it like this:
// first, change your process logic to be generic on any kind of container
func process<C: CollectionType where C.Generator.Element == UInt8>(chars: C) {
// just to prove it's working...
print(String(chars.map { UnicodeScalar($0) }))
}
// sample input
let a: [Int8] = [104, 101, 108, 108, 111] // ascii "Hello"
// access the underlying raw buffer as a pointer
a.withUnsafeBufferPointer { buf -> Void in
process(
UnsafeBufferPointer(
// cast the underlying pointer to the type you want
start: UnsafePointer(buf.baseAddress),
count: buf.count))
}
// this prints [h, e, l, l, o]
Note withUnsafeBufferPointer means what it says. It’s unsafe and you can corrupt memory if you get this wrong (be especially careful with the count). It works based on your external knowledge that, for example, if any of the integers are negative then your code doesn't mind them becoming corrupt unsigned integers. You might know that, but the Swift type system can't, so it won't allow it without resort to the unsafe types.
That said, the above code is correct and within the rules and these techniques are justifiable if you need the performance edge. You almost certainly won’t unless you’re dealing with gigantic amounts of data or writing a library that you will call a gazillion times.
It’s also worth noting that there are circumstances where an array is not actually backed by a contiguous buffer (for example if it were cast from an NSArray) in which case calling .withUnsafeBufferPointer will first copy all the elements into a contiguous array. Also, Swift arrays are growable so this copy of underlying elements happens often as the array grows. If performance is absolutely critical, you could consider allocating your own memory using UnsafeMutablePointer and using it fixed-size style using UnsafeBufferPointer.
For a humorous but definitely not within the rules example that you shouldn’t actually use, this will also work:
process(unsafeBitCast(a, [UInt8].self))
It's also worth noting that these solutions are not the same as a.map { UInt8($0) } since the latter will trap at runtime if you pass it a negative integer. If this is a possibility you may need to filter them first.
IMO, the best way to do this would be to stick to the same base type throughout the whole application to avoid the whole need to do casts/coercions. That is, either use Int8 everywhere, or UInt8, but not both.
If you have no choice, e.g. if you use two separate frameworks over which you have no control, and one of them uses Int8 while another uses UInt8, then you should use map if you really want to use Swift. The latter 2 lines from your examples (process([UInt8](buffer)) and
process(buffer as! [UInt8])) look more like C approach to the problem, that is, we don't care that this area in memory is an array on singed integers we will now treat it as if it is unsigneds. Which basically throws whole Swift idea of strong types to the window.
What I would probably try to do is to use lazy sequences. E.g. check if it possible to feed process() method with something like:
let convertedBuffer = lazy(buffer).map() {
UInt8($0)
}
process(convertedBuffer)
This would at least save you from extra memory overhead (as otherwise you would have to keep 2 arrays), and possibly save you some performance (thanks to laziness).
You cannot cast arrays in Swift. It looks like you can, but what's really happening is that you are casting all the elements, one by one. Therefore, you can use cast notation with an array only if the elements can be cast.
Well, you cannot cast between numeric types in Swift. You have to coerce, which is a completely different thing - that is, you must make a new object of a different numeric type, based on the original object. The only way to use an Int8 where a UInt8 is expected is to coerce it: UInt8(x).
So what is true for one Int8 is true for an entire array of Int8. You cannot cast from an array of Int8 to an array of UInt8, any more than you could cast one of them. The only way to end up with an array of UInt8 is to coerce all the elements. That is exactly what your map call does. That is the way to do it; saying it is "unnecessary" is meaningless.

Using signed integers instead of unsigned integers [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
In software development, it's usually a good idea to take advantage of compiler-errors. Allowing the compiler to work for you by checking your code makes sense. In strong-type languages, if a variable only has two valid values, you'd make it a boolean or define an enum for it. Swift furthers this by bringing the Optional type.
In my mind, the same would apply to unsigned integers: if you know a negative value is impossible, program in a way that enforces it. I'm talking about high-level APIs; not low-level APIs where the negative value is usually used as a cryptic error signalling mechanism.
And yet Apple suggests avoiding unsigned integers:
Use UInt only when you specifically need an unsigned integer type with the same size as the platform’s native word size. If this is not the case, Int is preferred, even when the values to be stored are known to be non-negative. [...]
Here's an example: Swift's Array.count returns an Int. How can one possibly have negative amount of items?!
Why?!
Apple's states that:
A consistent use of Int for integer values aids code interoperability, avoids the need to convert between different number types, and matches integer type inference, as described in Type Safety and Type Inference.
But I don't buy it! Using Int wouldn't aid "interoperability" anymore than UInt since Int could resolve to Int32 or Int64 (for 32-bit and 64-bit platforms respectively).
If you care about robustness at all, using signed integers where it makes no logical sense essentially forces to do an additional check (What if the value is negative?)
I can't see the act of casting between signed and unsigned as being anything other than trivial. Wouldn't that simply indicate the compiler to resolve the machine-code to use either signed or unsigned byte-codes?!
Casting back and forth between signed and unsigned integers is extremely bug-prone on one side while adds little value on the other.
One reason to have unsigned int that you suggest, being an implicit guarantee that an index never gets negative value.. well, it's a bit speculative. Where would the potential to get a negative value come from? Of course, from the code, that is, either from a static value or from a computation. But in both cases, for a static or a computed value to be able to get negative they must be handled as signed integers. Therefore, it is a language implementation responsibility to introduce all sorts of checks every time you assign signed value to unsigned variable (or vice versa). This means that we talk not about being forced "to do an additional check" or not, but about having this check implicitly made for us by the language every time we feel lazy to bother with corner cases.
Conceptually, signed and unsigned integers come into the language from low level (machine codes). In other words, unsigned integer is in the language not because it is the language that has a need, but because it is directly bridge-able to machine instructions, hence allows performance gain just for being native. No other big reason behind. Therefore, if one has just a glimpse of portability in mind, then one would say "Be it Int and this is it. Let developer write clean code, we bring the rest".
As long as we have an opinions based question...
Basing programming language mathematical operations on machine register size is one of the great travesties of Computer Science. There should be Integer*, Rational, Real and Complex - done and dusted. You need something that maps to a U8 Register for some device driver? Call it a RegisterOfU8Data - or whatever - just not 'Int(eger)'
*Of course, calling it an 'Integer' means it 'rolls over' to an unlimited range, aka BigNum.
Sharing what I've discovered which indirectly helps me understand... at least in part. Maybe it ends up helping others?!
After a few days of digging and thinking, it seems part of my problem boils down to the usage of the word "casting".
As far back as I can remember, I've been taught that casting was very distinct and different from converting in the following ways:
Converting kept the meaning but changed the data.
Casting kept the data but changed the meaning.
Casting was a mechanism allowing you to inform the compiler how both it and you would be manipulating some piece of data (No changing of data, thus no cost). " Me to the compiler: Okay! Initially I told you this byte was an number because I wanted to perform math on it. Now, lets treat it as an ASCII character."
Converting was a mechanism for transforming the data into different formats. " Me to the compiler: I have this number, please generate an ASCII string that represents that value."
My problem, it seems, is that in Swift (and most likely other languages) the line between casting and converting is blurred...
Case in point, Apple explains that:
Type casting in Swift is implemented with the is and as operators. […]
var x: Int = 5
var y: UInt = x as UInt // Casting... Compiler refuses claiming
// it's not "convertible".
// I don't want to convert it, I want to cast it.
If "casting" is not this clearly defined action it could explain why unsigned integers are to be avoided like the plague...

NSCoding and integer arrays

How do you use NSCoding to code (and decode) an array of of ten values of primitive type int? Encode each integer individually (in a for-loop). But what if my array held one million integers? Is there a more satisfying alternative to using a for-loop here?
Edit (after first answer): And decode? (#Justin: I'll then tick your answer.)
If performance is your concern here: CFData/NSData is NSCoding compliant, so just wrap your serialized representation of the array as NSCFData.
edit to detail encoding/decoding:
your array of ints will need to to be converted to a common endian format (depending on the machine's endianness) - e.g. always store it as little or big endian. during encoding, convert it to an array of integers in the specified endianness, which is passed to the NSData object. then pass the NSData representation to the NSCoder instance. at decode, you'll receive an NSData object for the key, you conditionally convert it to the native endianness of the machine when decoding it. one set of byte swapping routines available for OS X and iOS begin with OSSwap*.
alternatively, see -[NSCoder encodeBytes:voidPtr length:numBytes forKey:key]. this routine also requires the client to swap endianness.

How do you use the different Number Types in Objective C

So I am trying to do a few things with numbers in Objective C and realize there is a plethora of options, and i am just bewildered as to which type to use for my app.
so here are the types.
NSNumber (which is a class)
NSDecmial (which is a struct)
NSDecimalNumber (which is a class)
float/double (which are primitive types)
so essentially what i need to do is take an NSString, which is representing decimal based hours. (10.4 would be 10 hours and (4/10)*60 minutes) and convert it into:
a string representation D H:M (this needs division, multiplication and basic arithmatic)
a Number type to store for easy calculations latter (will mostly be converting between NSTimeIntervals and doing subtractions)
Oh and i need to be able to do an Absolute value as well on these
It appears that the hard part is actually transitioning between the types.
To me this is a very trivial problem so I"m not sure if its getting late or because objective C numerical types suck, but i could use a hand.
Use primitive types (double, CGFLoat, NSInteger) for typical arithmetic and when you need to store a number as an instance variable that's going to be used primarily for arithmetic in other places. You can use C math functions (abs(), pow(), etc) as needed. NSTimeInterval is a typedef for double, so you can interchange the two.
Use NSNumber when you need to store a number as an object, for example if you're creating an NSArray of numbers. Some parts of Cocoa like Core Data or key value coding deal more with NSNumber than primitive types, so you may find yourself using NSNumber more then usual in those situations. For example, if you write [timeKeepersArray valueForKeyPath:#"sum.seconds"] you'll get back an NSNumber, so you may find it easier just to keep that variable instead of converting it to a primitive.
Since it's a small amount of extra code to convert between NSNumber and primitive types, usually your application will end up favoring one or the other depending on what you're doing with numbers.
Oh, and NSDecmial and NSDecimalNumber? Don't worry too much about them, they only come up when you need really precise decimal operations, such as if you're storing financial data.