Using signed integers instead of unsigned integers [closed] - swift

Closed. This question is opinion-based. It is not currently accepting answers.
In software development, it's usually a good idea to take advantage of compiler errors. Letting the compiler work for you by checking your code makes sense. In strongly typed languages, if a variable only has two valid values, you'd make it a Boolean or define an enum for it. Swift takes this further with the Optional type.
In my mind, the same applies to unsigned integers: if you know a negative value is impossible, program in a way that enforces it. I'm talking about high-level APIs, not low-level APIs where a negative value is typically used as a cryptic error-signalling mechanism.
And yet Apple suggests avoiding unsigned integers:
Use UInt only when you specifically need an unsigned integer type with the same size as the platform’s native word size. If this is not the case, Int is preferred, even when the values to be stored are known to be non-negative. [...]
Here's an example: Swift's Array.count returns an Int. How can one possibly have a negative number of items?!
Why?!
Apple states that:
A consistent use of Int for integer values aids code interoperability, avoids the need to convert between different number types, and matches integer type inference, as described in Type Safety and Type Inference.
But I don't buy it! Using Int wouldn't aid "interoperability" any more than UInt, since Int could resolve to Int32 or Int64 (on 32-bit and 64-bit platforms respectively).
If you care about robustness at all, using signed integers where they make no logical sense essentially forces you to do an additional check (what if the value is negative?).
I can't see the act of casting between signed and unsigned as being anything other than trivial. Wouldn't it simply tell the compiler to emit either signed or unsigned machine instructions?!
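To make that concrete, here is a hedged sketch (the function names are made up for illustration): with a UInt parameter the "no negative values" rule lives in the signature, while with Int the same guarantee needs an explicit runtime check.
// Hypothetical API, for illustration only.

// With UInt, a negative count is unrepresentable at the call site:
func stars(count: UInt) -> String {
    // Int(count) would trap only in the extreme case count > Int.max.
    return String(repeating: "*", count: Int(count))
}

// With Int, the same guarantee requires a manual check:
func starsSigned(count: Int) -> String {
    precondition(count >= 0, "count must be non-negative")
    return String(repeating: "*", count: count)
}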

Casting back and forth between signed and unsigned integers is extremely bug-prone on one side and adds little value on the other.
One reason you suggest for having unsigned integers, the implicit guarantee that an index can never become negative, is a bit speculative. Where would a negative value come from? From the code, of course: either from a static value or from a computation. But in both cases, for a static or computed value to be able to go negative, it must be handled as a signed integer. Therefore it is the language implementation's responsibility to introduce all sorts of checks every time you assign a signed value to an unsigned variable (or vice versa). This means we are not really talking about being forced "to do an additional check" or not, but about having this check made implicitly for us by the language every time we don't feel like bothering with the corner cases.
Conceptually, signed and unsigned integers come into the language from the low level (machine code). In other words, the unsigned integer is in the language not because the language needs it, but because it is directly bridgeable to machine instructions and hence allows a performance gain just for being native. There is no other big reason behind it. Therefore, if one has even a glimpse of portability in mind, one would say "Let it be Int and that's it. Let the developer write clean code; we'll take care of the rest."
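To make the "implicit check" point concrete, here is a small Swift sketch (illustrative only): the converting initializer performs the range check for you, and the failable variant surfaces the corner case instead of trapping.
let offset = -1   // a signed value produced somewhere in "the code"

// UInt(offset) would trap at runtime here, because a negative value
// is not representable as UInt. That trap *is* the implicit check.
// let u = UInt(offset)

// The failable initializer makes the corner case explicit instead:
if let u = UInt(exactly: offset) {
    print("converted to \(u)")
} else {
    print("offset is negative; not representable as UInt")
}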

As long as we have an opinion-based question...
Basing programming language mathematical operations on machine register size is one of the great travesties of Computer Science. There should be Integer*, Rational, Real and Complex - done and dusted. You need something that maps to a U8 Register for some device driver? Call it a RegisterOfU8Data - or whatever - just not 'Int(eger)'
*Of course, calling it an 'Integer' means it 'rolls over' to an unlimited range, aka BigNum.

Sharing what I've discovered, which indirectly helps me understand... at least in part. Maybe it ends up helping others?!
After a few days of digging and thinking, it seems part of my problem boils down to the usage of the word "casting".
As far back as I can remember, I've been taught that casting was very distinct and different from converting in the following ways:
Converting kept the meaning but changed the data.
Casting kept the data but changed the meaning.
Casting was a mechanism allowing you to inform the compiler how both it and you would be manipulating some piece of data (no changing of data, thus no cost). Me to the compiler: "Okay! Initially I told you this byte was a number because I wanted to perform math on it. Now, let's treat it as an ASCII character."
Converting was a mechanism for transforming the data into different formats. Me to the compiler: "I have this number; please generate an ASCII string that represents that value."
My problem, it seems, is that in Swift (and most likely other languages) the line between casting and converting is blurred...
Case in point, Apple explains that:
Type casting in Swift is implemented with the is and as operators. […]
var x: Int = 5
var y: UInt = x as UInt // Casting... the compiler refuses, claiming
                        // it's not "convertible".
                        // I don't want to convert it, I want to cast it.
If "casting" is not this clearly defined action it could explain why unsigned integers are to be avoided like the plague...

Related

Precondition failed: Negative count not allowed

Error:
Precondition failed: Negative count not allowed: file /BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang/swiftlang-900.0.74.1/src/swift/stdlib/public/core/StringLegacy.swift, line 49
Code:
String(repeating: "a", count: -1)
Thinking:
Well, it doesn't make sense to repeat some string a negative number of times. Since we have types in Swift, why not use a UInt?
Here we have some documentation about it.
Use UInt only when you specifically need an unsigned integer type with the same size as the platform's native word size. If this isn't the case, Int is preferred, even when the values to be stored are known to be nonnegative. A consistent use of Int for integer values aids code interoperability, avoids the need to convert between different number types, and matches integer type inference, as described in Type Safety and Type Inference.
Apple Docs
OK, Int is preferred, so the API is just following the rules. But why is the String API designed like that? Why isn't this initializer private, with a public one taking a UInt or something like that? Is there a "real" reason? Is this some "undefined behavior" kind of thing?
Also: https://forums.developer.apple.com/thread/98594
This isn't undefined behavior; in fact, a precondition indicates the exact opposite: an explicit check was made to ensure that the given count is non-negative.
As to why the parameter is an Int and not a UInt — this is a consequence of two decisions made early in the design of Swift:
Unlike C and Objective-C, Swift does not allow implicit (or even explicit) casting between integer types. You cannot pass an Int to a function which takes a UInt, and vice versa, nor will the following cast succeed: myInt as? UInt. Swift's preferred method of converting is using initializers: UInt(myInt).
Since Ints are more generally applicable than UInts, they would be the preferred integer type.
As such, since converting between Ints and UInts can be cumbersome and verbose, the easiest way to interoperate between the largest number of APIs is to write them all in terms of the common integer currency type: Int. As the docs you quote mention, this "aids code interoperability, avoids the need to convert between different number types, and matches integer type inference"; trapping at runtime on invalid input is a tradeoff of this decision.
In fact, Int is so strongly ingrained in Swift that when Apple framework interfaces are imported into Swift from Objective-C, NSUInteger parameters and return types are converted to Int and not UInt, for significantly easier interoperability.
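A hedged illustration of the trade-off described above: with Int as the currency type, index arithmetic near zero stays testable, whereas bringing UInt into the mix adds conversions at every API boundary and turns "one below zero" into a trap.
let items: [String] = []

// With Int (what the standard library uses), "count - 1" is simply -1,
// a value you can test for:
let lastIndex = items.count - 1
if lastIndex >= 0 {
    print(items[lastIndex])
}

// If count were a UInt, the same arithmetic would trap with an
// arithmetic overflow instead of producing a testable value:
let unsignedCount = UInt(items.count)
// let bad = unsignedCount - 1   // runtime trap on an empty array

// And every boundary with an Int-based API would need an explicit conversion:
let backToInt = Int(unsignedCount)
print(backToInt)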

What is the correct type for returning a C99 `bool` to Rust via the FFI?

A colleague and I have been scratching our heads over how to return a bool from <stdbool.h> (a.k.a. _Bool) back to Rust via the FFI.
We have our C99 code we want to use from Rust:
bool
myfunc(void) {
    ...
}
We let Rust know about myfunc using an extern C block:
extern "C" {
fn myfunc() -> T;
}
What concrete type should T be?
Rust doesn't have a c_bool in the libc crate, and if you search the internet, you will find various GitHub issues and RFCs where people discuss this, but don't really come to any consensus as to what is both correct and portable:
https://github.com/rust-lang/rfcs/issues/1982#issuecomment-297534238
https://github.com/rust-lang/rust/issues/14608
https://github.com/rust-lang/rfcs/issues/992
https://github.com/rust-lang/rust/pull/46156
As far as I can gather:
The size of a bool in C99 is undefined other than the fact it must be at least large enough to store true (1) and false (0). In other words, at least one bit long.
It could even be one bit wide.
Its size might be ABI defined.
This comment suggests that if a C99 bool is passed into a function as a parameter or out of a function as the return value, and the bool is smaller than a C int then it is promoted to the same size as an int. Under this scenario, we can tell Rust T is u32.
All right, but what if (for some reason) a C99 bool is 64 bits wide? Is u32 still safe? Perhaps under this scenario we truncate the 4 most significant bytes, which would be fine, since the 4 least significant bytes are more than enough to represent true and false.
Is my reasoning correct? Until Rust gets a libc::c_bool, what would you use for T and why is it safe and portable for all possible sizes of a C99 bool (>=1 bit)?
As of 2018-02-01, the size of Rust's bool is officially the same as C's _Bool.
This means that bool is the correct type to use in FFI.
The rest of this answer applies to versions of Rust before the official decision was made
Until Rust gets a libc::c_bool, what would you use for T and why is it safe and portable for all possible sizes of a C99 bool (>=1 bit)?
As you've already linked to, the official answer is still "to be determined". That means that the only possibility that is guaranteed to be correct is: nothing.
That's right, as sad as it may be. The only truly safe thing would be to convert your bool to a known, fixed-size integral type, such as u8, for the purpose of FFI. That means you need to marshal it on both sides.
Practically, I'd keep using bool in my FFI code. As people have pointed out, it magically lines up on all the platforms that are in wide use at the moment. If the language decides to make bool FFI compatible, you are good to go. If they decide something else, I'd be highly surprised if they didn't introduce a lint to allow us to catch the errors quickly.
See also:
Is bool guaranteed to be 1 byte?
After a lot of thought, I'm going to try answering my own question. Please comment if you can find a hole in the following reasoning.
This is not the correct answer -- see the comments below
I think a Rust u8 is always safe for T.
We know that a C99 bool is an integer large enough to store 0 or 1, which means it's free to be an unsigned integer of at least 1-bit, or (if you are feeling weird) a signed integer of at least 2-bits.
Let's break it down by case:
If the C99 bool is 8 bits then a Rust u8 is perfect. Even in the signed case, the top bit will be zero, since representing 0 and 1 never requires a negative power of two.
If the C99 bool is larger than a Rust u8, then by "casting it down" to an 8-bit size we only ever discard leading zeros. Thus this is safe too.
Now consider the case where the C99 bool is smaller than the Rust u8. When returning a value from a C function, it's not possible to return a value smaller than one byte due to the underlying calling convention. The calling convention will require the return value to be loaded into a register or into a location on the stack. Since the smallest register or memory location is one byte, the return value will need to be extended (with zeros) to at least a one-byte value (and I believe the same is true of function arguments, which must also adhere to the calling convention). If the value is extended to a one-byte value, then it's the same as case 1. If the value is extended to a larger size, then it's the same as case 2.

c cast and dereference a pointer strict aliasing

In http://blog.regehr.org/archives/1307, the author claims that the following snippet has undefined behavior:
unsigned long bogus_conversion(double d) {
    unsigned long *lp = (unsigned long *)&d;
    return *lp;
}
The argument is based on http://port70.net/~nsz/c/c11/n1570.html#6.5p7, which specifies the allowed access circumstances. However, footnote 88 for this bullet point says that the list is only for the purpose of checking aliasing, so I think this snippet is fine, assuming sizeof(long) == sizeof(double).
My question is whether the above snippet is allowed.
The snippet is erroneous, but not because of aliasing. First, there is a simple rule that says dereferencing a pointer to an object through a type different from its effective type is wrong. Here the effective type is double, so there is an error.
This safeguard is there in the standard because the bit representation of a double need not be a valid representation for unsigned long, although that would be quite exotic nowadays.
Second, from a more practical point of view, double and unsigned long may have different alignment properties, and accessing one as the other may produce a bus error or simply incur a run-time penalty.
Generally, casting pointers like that is almost always wrong, has no defined behavior, is bad style, and on top of that is mostly useless anyhow. Focusing on aliasing in the argumentation about these problems is a bad habit that probably originates in incomprehensible and scary gcc warnings.
If you really want to know the bit representation of some type, there are some exceptions of the "effective type" rule. There are two portable solutions that are well defined by the C standard:
Use unsigned char* and inspect the bytes.
Use a union that comprises both types, store the value in there and read it with the other type. By that you are telling the compiler that you want an object that can be seen as both types. But here you should not use unsigned long as a target type but uint64_t, since you have to be sure that the size is exactly what you think it is, and that there are no trap representations.
To illustrate that, here is the same function as in the question but with defined behavior.
unsigned long valid_conversion(double d) {
    union {
        unsigned long ul;
        double d;
    } ub = { .d = d, };
    return ub.ul;
}
My compiler (gcc on a Debian, nothing fancy) compiles this to exactly the same assembler as the code in the question, only now you know that this code is portable.

Why didn't Scala design around integer overflow?

I am a former Java developer and I have recently watched the insightful and entertaining introduction to Scala for Java developers by professor Venkat Subramaniam (https://www.youtube.com/watch?v=LH75sJAR0hc).
A major point introduced is the elimination of declared types in lieu of "type inference". Presumably, this means the higher-order compiler recognizes the type I intend to use, by the context.
Being an application security expert by trade, the first thing I tried to do was break this type inference... Example:
// declare a function that returns the square of an input Int. The return type is to be inferred.
scala> val square = (x:Int) => x*x
square: Int => Int = <function1>
// I can see the compiler inferred an Int for the output value, which I do not agree with.
scala> square(2147483647)
res1: Int = 1
// integer overflow
My question is why did the compiler not see that "*" is an operator with a threat of overflow, and wrap the inputs in something a little more protective like a BigInteger?
According to the professor, I am supposed to forget about the internal implementation and just get on with my business logic. But after my quick demonstration I'm not so sure that Scala is safe for a programmer who doesn't understand what the compiler is doing with my methods.
I think @rightføld somewhat overstates how often overflows do or don't happen (particularly when considering an attacker who is actively trying to overflow you). But I agree with his basic point. Converting all math to BigInteger would almost certainly have created a massive performance impact over Java. For developers to choose such a language, they'd have to get something visible for that cost.
String objects have a much smaller performance overhead over cstrings for many operations. They also provide very visible benefits to the developer, which is why people use them, not security per se. There are many common things that string objects make easy to do over cstrings. BigInteger provides none of that. It requires exactly the same code at a fraction of the speed, but just won't overflow (a bug few developers see day to day, even if security guys see it more often).
The equivalent would have been a cstring (with strcmp, strcpy, strcat, etc.) that ran at a fraction of the speed, but just didn't require a null terminator. I don't think many people would have jumped to use that, either, no matter how much that would help security over null-terminated strings. And if the language required it, I don't see a lot of people anxious to use the language.
And as @rightføld suggests in the comments, interoperability with Java would be trashed, since most if not all numbers would wind up being BigInteger. You'd constantly be converting, which raises the same dangers of overflows while adding a lot of code complexity (and more performance impacts).
A from-scratch language might get away with ubiquitous BigInteger (like python) if the language had a lot of other compelling features, but it's a very hard thing to retrofit into a language that wants to be a natural transition from (and with) Java.
In addition to the above answers, I think this question misunderstands the purpose of type inference in a statically typed language. Type inference does not make the choices you are referring to, such as promoting an Int to a BigInt. It is restricted to simply "inferring" the type of an expression based on the known types of its subexpressions at compile time.
The * function in Int returns an Int when supplied with an Int input parameter
def *(x: Int): Int
In this case, since x is declared to be an Int, then x*x must be an Int based on the signature of *.
If we really wanted this behavior, we could define a function that promotes Int to BigInt when multiplying.
implicit class SafeInt(x: Int) {
    def safeMult(a: Int): scala.math.BigInt = scala.math.BigInt(x) * a
}
Then we can define a square with the desired property:
scala> val square = (x: Int) => x safeMult x
square: Int => scala.math.BigInt = <function1>
The compiler infers based on the methods available. Int has a method *(Int): Int that is, as far as the compiler knows, perfectly well defined; 2147483647*2147483647 is a perfectly good method call with the result 1, it doesn't throw ClassCastException or anything like that.
Why is the Int type written this way? Largely for Java/JVM compatibility; many parts of Scala have design compromises for the sake of Java compatibility. If you don't need that functionality, you might prefer to use Haskell or a similar language. (I suspect that even without the requirement for JVM compatibility, Scala would have wanted to expose the machine-native integer types so that users could make that performance/correctness tradeoff where desired. They might not have been the default though)
If you're doing numeric computation in Scala you probably want to use the Spire library, which makes it easy to abstract over numeric types, and provides several high-performance numeric types with particular properties. In particular it has a SafeLong type that handles arbitrary-precision integers but with much better performance than BigInt for values which fall within the Long range, similar to Python's integer type.
Because overflow occurs almost never in practice, and BigInteger is slow as a dog compared to Int. It is also most inconvenient to have all * operations on Ints return BigIntegers.
"Recognizes the type I intend to use" is not an accurate description of what scala tries to do. It infers the most generic type possible given the constraints imposed by the context. Hence if you write List(Nil, "1"), you'll get List[Serializable], because Serializable is an interface that List and String share - disregarding that Serializable was probably not on your mind at all.
The question you're asking could be asked more precisely as "why is Int the type of numeric literals instead of BigInteger?" - inference doesn't have much to do with it.
And we can opine all we want on that topic, but there's one most accurate answer describing why Scala is what it is: "because Java".
If you wanted the type of safety that you seem to want, then one approach is to define via a partial function which guards against numeric overflow and then returns either an Option[Int] or even perhaps an Either[Int, BigInteger].
The type inference for your square function is correct - given that it's inferred from the input types you've specified and the type of the * function...it's not really broken in my opinion.

How do you use the different Number Types in Objective C

So I am trying to do a few things with numbers in Objective-C and realize there is a plethora of options, and I am just bewildered as to which type to use for my app.
So here are the types:
NSNumber (which is a class)
NSDecimal (which is a struct)
NSDecimalNumber (which is a class)
float/double (which are primitive types)
So essentially what I need to do is take an NSString representing decimal-based hours (10.4 would be 10 hours and (4/10)*60 = 24 minutes) and convert it into:
a string representation D H:M (this needs division, multiplication and basic arithmetic)
a Number type to store for easy calculations later (will mostly be converting between NSTimeIntervals and doing subtractions)
Oh, and I need to be able to take an absolute value of these as well.
It appears that the hard part is actually transitioning between the types.
To me this is a very trivial problem, so I'm not sure if it's just getting late or if Objective-C numeric types suck, but I could use a hand.
Use primitive types (double, CGFloat, NSInteger) for typical arithmetic and when you need to store a number as an instance variable that's going to be used primarily for arithmetic in other places. You can use the C math functions (fabs(), pow(), etc.) as needed. NSTimeInterval is a typedef for double, so you can interchange the two.
Use NSNumber when you need to store a number as an object, for example if you're creating an NSArray of numbers. Some parts of Cocoa like Core Data or key-value coding deal more with NSNumber than primitive types, so you may find yourself using NSNumber more than usual in those situations. For example, if you write [timeKeepersArray valueForKeyPath:@"@sum.seconds"] you'll get back an NSNumber, so you may find it easier just to keep that variable instead of converting it to a primitive.
Since it's a small amount of extra code to convert between NSNumber and primitive types, usually your application will end up favoring one or the other depending on what you're doing with numbers.
Oh, and NSDecimal and NSDecimalNumber? Don't worry too much about them; they only come up when you need really precise decimal operations, such as when you're storing financial data.
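To round this out, here is a minimal sketch of the decimal-hours conversion the question asks about, using only primitive double/integer arithmetic as recommended above. It is written in Swift purely for brevity; the same arithmetic translates directly to Objective-C, and the function name is hypothetical.
import Foundation

// Hypothetical helper: "10.4" -> "0 10:24" (days, hours:minutes).
func formatDecimalHours(_ string: String) -> String? {
    guard let decimalHours = Double(string) else { return nil }

    // Absolute value, then split into whole minutes.
    let totalMinutes = Int((abs(decimalHours) * 60).rounded())
    let days = totalMinutes / (24 * 60)
    let hours = (totalMinutes % (24 * 60)) / 60
    let minutes = totalMinutes % 60
    return "\(days) \(hours):\(String(format: "%02d", minutes))"
}

print(formatDecimalHours("10.4") ?? "invalid")   // "0 10:24"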