C: cast and dereference a pointer under strict aliasing - C11

In http://blog.regehr.org/archives/1307, the author claims that the following snippet has undefined behavior:
unsigned long bogus_conversion(double d) {
    unsigned long *lp = (unsigned long *)&d;
    return *lp;
}
The argument is based on http://port70.net/~nsz/c/c11/n1570.html#6.5p7, which specifies the allowed access circumstances. However, footnote 88 for this bullet point says the list is only for checking aliasing purposes, so I think this snippet is fine, assuming sizeof(long) == sizeof(double).
My question is whether the above snippet is allowed.

The snippet is erroneous, but not because of aliasing. First, there is a simple rule that says dereferencing a pointer to an object with a different type than its effective type is wrong. Here the effective type is double, so there is an error.
This safeguard is in the standard because the bit representation of a double need not be a valid representation for unsigned long, although that would be quite exotic nowadays.
Second, from a more practical point of view, double and unsigned long may have different alignment properties, and accessing the object that way may produce a bus error or just incur a run-time penalty.
Generally, casting pointers like that is almost always wrong, has no defined behavior, is bad style, and is mostly useless anyhow. Focusing on aliasing when arguing about these problems is a bad habit that probably originates in incomprehensible and scary GCC warnings.
If you really want to know the bit representation of some type, there are some exceptions to the "effective type" rule. There are two portable solutions that are well defined by the C standard:
Use unsigned char* and inspect the bytes.
Use a union that comprises both types, store the value through one member and read it through the other. By doing that you are telling the compiler that you want an object that can be seen as both types. But here you should not use unsigned long as the target type but uint64_t, since you have to be sure that the size is exactly what you think it is, and that there are no trap representations.
To illustrate that, here is the same function as in the question but with defined behavior.
unsigned long valid_conversion(double d) {
    union {
        unsigned long ul;
        double d;
    } ub = { .d = d, };
    return ub.ul;
}
My compiler (GCC on Debian, nothing fancy) compiles this to exactly the same assembler as the code in the question; the only difference is that you know this code is portable.


Precondition failed: Negative count not allowed

Error:
Precondition failed: Negative count not allowed: file /BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang/swiftlang-900.0.74.1/src/swift/stdlib/public/core/StringLegacy.swift, line 49
Code:
String(repeating: "a", count: -1)
Thinking:
Well, it doesn't make sense to repeat some string a negative number of times. Since we have types in Swift, why not use a UInt?
Here we have some documentation about it.
Use UInt only when you specifically need an unsigned integer type with
the same size as the platform’s native word size. If this isn’t the
case, Int is preferred, even when the values to be stored are known to
be nonnegative. A consistent use of Int for integer values aids code
interoperability, avoids the need to convert between different number
types, and matches integer type inference, as described in Type Safety
and Type Inference.
Apple Docs
OK, so Int is preferred, and therefore the API is just following the rules. But why is the String API designed like that? Why is this initializer not private, with a public one taking a UInt or something like that? Is there a "real" reason? Is this some "undefined behavior" kind of thing?
Also: https://forums.developer.apple.com/thread/98594
This isn't undefined behavior — in fact, a precondition indicates the exact opposite: an explicit check was made to ensure that the given count is non-negative.
As to why the parameter is an Int and not a UInt — this is a consequence of two decisions made early in the design of Swift:
Unlike C and Objective-C, Swift does not allow implicit (or even explicit) casting between integer types. You cannot pass an Int to a function which takes a UInt, and vice versa, nor will the following cast succeed: myInt as? UInt. Swift's preferred method of converting is using initializers: UInt(myInt).
Since Ints are more generally applicable than UInts, they would be the preferred integer type.
As such, since converting between Ints and UInts can be cumbersome and verbose, the easiest way to interoperate between the largest number of APIs is to write them all in terms of the common integer currency type: Int. As the docs you quote mention, this "aids code interoperability, avoids the need to convert between different number types, and matches integer type inference"; trapping at runtime on invalid input is a tradeoff of this decision.
In fact, Int is so strongly ingrained in Swift that when Apple framework interfaces are imported into Swift from Objective-C, NSUInteger parameters and return types are converted to Int and not UInt, for significantly easier interoperability.

What is the correct type for returning a C99 `bool` to Rust via the FFI?

A colleague and I have been scratching our heads over how to return a bool from <stdbool.h> (a.k.a. _Bool) back to Rust via the FFI.
We have our C99 code we want to use from Rust:
bool
myfunc(void) {
    ...
}
We let Rust know about myfunc using an extern "C" block:
extern "C" {
    fn myfunc() -> T;
}
What concrete type should T be?
Rust doesn't have a c_bool in the libc crate, and if you search the internet, you will find various GitHub issues and RFCs where people discuss this, but don't really come to any consensus as to what is both correct and portable:
https://github.com/rust-lang/rfcs/issues/1982#issuecomment-297534238
https://github.com/rust-lang/rust/issues/14608
https://github.com/rust-lang/rfcs/issues/992
https://github.com/rust-lang/rust/pull/46156
As far as I can gather:
The size of a bool in C99 is undefined other than the fact it must be at least large enough to store true (1) and false (0). In other words, at least one bit long.
It could even be one bit wide.
Its size might be ABI defined.
This comment suggests that if a C99 bool is passed into a function as a parameter or out of a function as the return value, and the bool is smaller than a C int then it is promoted to the same size as an int. Under this scenario, we can tell Rust T is u32.
All right, but what if (for some reason) a C99 bool is 64 bits wide? Is u32 still safe? Perhaps under this scenario we truncate the 4 most significant bytes, which would be fine, since the 4 least significant bytes are more than enough to represent true and false.
Is my reasoning correct? Until Rust gets a libc::c_bool, what would you use for T and why is it safe and portable for all possible sizes of a C99 bool (>=1 bit)?
As of 2018-02-01, the size of Rust's bool is officially the same as C's _Bool.
This means that bool is the correct type to use in FFI.
The rest of this answer applies to versions of Rust before the official decision was made
Until Rust gets a libc::c_bool, what would you use for T and why is it safe and portable for all possible sizes of a C99 bool (>=1 bit)?
As you've already linked to, the official answer is still "to be determined". That means that the only possibility that is guaranteed to be correct is: nothing.
That's right, as sad as it may be. The only truly safe thing would be to convert your bool to a known, fixed-size integral type, such as u8, for the purpose of FFI. That means you need to marshal it on both sides.
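On the C side, that marshalling could be a small shim with a fixed-size return type. This is a sketch: myfunc_u8 is a made-up wrapper name, and the static myfunc here is just a stand-in for the real function from the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the real C99 function from the question. */
static bool myfunc(void) {
    return true;
}

/* Shim with a fixed-size return type: uint8_t has a known width
   on both sides of the FFI boundary, whatever sizeof(bool) is. */
uint8_t myfunc_u8(void) {
    return myfunc() ? 1 : 0;
}
```

The Rust side would then declare `fn myfunc_u8() -> u8;` and compare the result against 0.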
Practically, I'd keep using bool in my FFI code. As people have pointed out, it magically lines up on all the platforms that are in wide use at the moment. If the language decides to make bool FFI compatible, you are good to go. If they decide something else, I'd be highly surprised if they didn't introduce a lint to allow us to catch the errors quickly.
See also:
Is bool guaranteed to be 1 byte?
After a lot of thought, I'm going to try answering my own question. Please comment if you can find a hole in the following reasoning.
This is not the correct answer -- see the comments below
I think a Rust u8 is always safe for T.
We know that a C99 bool is an integer large enough to store 0 or 1, which means it's free to be an unsigned integer of at least 1 bit, or (if you are feeling weird) a signed integer of at least 2 bits.
Let's break it down by case:
If the C99 bool is 8-bits then a Rust u8 is perfect. Even in the signed case, the top bit will be a zero since representing 0 and 1 never requires a negative power of two.
If the C99 bool is larger than a Rust u8, then by "casting it down" to an 8-bit size, we only ever discard leading zeros. Thus this is safe too.
Now consider the case where the C99 bool is smaller than the Rust u8. When returning a value from a C function, it's not possible to return a value of size less than one byte, due to the underlying calling convention, which will require the return value to be loaded into a register or into a location on the stack. Since the smallest register or memory location is one byte, the return value will need to be zero-extended to at least a one-byte value (and I believe the same is true of function arguments, which must also adhere to the calling convention). If the value is extended to a one-byte value, then it's the same as case 1. If the value is extended to a larger size, then it's the same as case 2.

Using signed integers instead of unsigned integers [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
In software development, it's usually a good idea to take advantage of compiler errors. Letting the compiler work for you by checking your code makes sense. In strongly typed languages, if a variable only has two valid values, you'd make it a Boolean or define an enum for it. Swift goes further by bringing the Optional type.
In my mind, the same would apply to unsigned integers: if you know a negative value is impossible, program in a way that enforces it. I'm talking about high-level APIs, not low-level APIs where a negative value is usually used as a cryptic error-signalling mechanism.
And yet Apple suggests avoiding unsigned integers:
Use UInt only when you specifically need an unsigned integer type with the same size as the platform’s native word size. If this is not the case, Int is preferred, even when the values to be stored are known to be non-negative. [...]
Here's an example: Swift's Array.count returns an Int. How can one possibly have a negative number of items?!
Why?!
Apple's states that:
A consistent use of Int for integer values aids code interoperability, avoids the need to convert between different number types, and matches integer type inference, as described in Type Safety and Type Inference.
But I don't buy it! Using Int wouldn't aid "interoperability" any more than UInt, since Int could resolve to Int32 or Int64 (on 32-bit and 64-bit platforms respectively).
If you care about robustness at all, using signed integers where a negative value makes no logical sense essentially forces you to do an additional check (what if the value is negative?).
I can't see the act of casting between signed and unsigned as anything other than trivial. Wouldn't that simply tell the compiler to use either signed or unsigned machine instructions?!
Casting back and forth between signed and unsigned integers is extremely bug-prone on one side while adds little value on the other.
One reason to have unsigned int that you suggest, namely an implicit guarantee that an index never holds a negative value, is a bit speculative. Where would the potential for a negative value come from? From the code, of course; that is, either from a static value or from a computation. But in both cases, for a static or computed value to be able to go negative, it must be handled as a signed integer. Therefore, it is the language implementation's responsibility to introduce all sorts of checks every time you assign a signed value to an unsigned variable (or vice versa). This means we are not talking about being forced "to do an additional check" or not, but about having this check made implicitly for us by the language every time we feel too lazy to bother with corner cases.
Conceptually, signed and unsigned integers come into the language from the low level (machine code). In other words, unsigned integers are in the language not because the language has a need for them, but because they are directly bridgeable to machine instructions and hence allow a performance gain just for being native. There is no other big reason behind them. Therefore, if one has just a glimpse of portability in mind, one would say "Be it Int and that is it. Let developers write clean code; we bring the rest."
As long as we have an opinion-based question...
Basing programming language mathematical operations on machine register size is one of the great travesties of Computer Science. There should be Integer*, Rational, Real and Complex - done and dusted. You need something that maps to a U8 Register for some device driver? Call it a RegisterOfU8Data - or whatever - just not 'Int(eger)'
*Of course, calling it an 'Integer' means it 'rolls over' to an unlimited range, aka BigNum.
Sharing what I've discovered which indirectly helps me understand... at least in part. Maybe it ends up helping others?!
After a few days of digging and thinking, it seems part of my problem boils down to the usage of the word "casting".
As far back as I can remember, I've been taught that casting was very distinct and different from converting in the following ways:
Converting kept the meaning but changed the data.
Casting kept the data but changed the meaning.
Casting was a mechanism allowing you to inform the compiler how both it and you would be manipulating some piece of data (no changing of data, thus no cost). Me to the compiler: "Okay! Initially I told you this byte was a number because I wanted to perform math on it. Now, let's treat it as an ASCII character."
Converting was a mechanism for transforming the data into a different format. Me to the compiler: "I have this number; please generate an ASCII string that represents that value."
My problem, it seems, is that in Swift (and most likely other languages) the line between casting and converting is blurred...
Case in point, Apple explains that:
Type casting in Swift is implemented with the is and as operators. […]
var x: Int = 5
var y: UInt = x as UInt // Casting... the compiler refuses, claiming
                        // it's not "convertible".
                        // I don't want to convert it, I want to cast it.
If "casting" is not this clearly defined action it could explain why unsigned integers are to be avoided like the plague...

Working with opaque types (Char and Long)

I'm trying to export a Scala implementation of an algorithm for use in JavaScript. I'm using @JSExport. The algorithm works with Scala Char and Long values, which are marked as opaque in the interoperability guide.
I'd like to know (a) what this means; and (b) what the recommendation is for dealing with this.
I presume it means I should avoid Char and Long and work with String plus a run-time check on length (or perhaps use a shapeless Sized collection) and Int instead.
But other ideas welcome.
More detail...
The kind of code I'm looking at is:
@JSExport("Foo")
class Foo(val x: Int) {
  @JSExport("add")
  def add(n: Int): Int = x + n
}
...which works just as expected: new Foo(1).add(2) produces 3.
Replacing the types with Long, the same call reports:
java.lang.ClassCastException: 1 is not an instance of scala.scalajs.runtime.RuntimeLong (and something similar happens with methods that take and return Char).
Being opaque means that
There is no corresponding JavaScript type
There is no way to create a value of that type from JavaScript (except if there is an @JSExported constructor)
There is no way of manipulating a value of that type (other than calling @JSExported methods and fields)
It is still possible to receive a value of that type from Scala.js code, pass it around, and give it back to Scala.js code. It is also always possible to call .toString(), because java.lang.Object.toString() is @JSExported. Besides toString(), neither Char nor Long exports anything, so you can't do anything else with them.
Hence, as you have experienced, a JavaScript 1 cannot be used as a Scala.js Long, because it's not of the right type. Neither is 'a' a valid Char (but it's a valid String).
Therefore, as you have inferred yourself, you must indeed avoid opaque types, and use other types instead if you need to create/manipulate them from JavaScript. The Scala.js side can convert back and forth using the standard tools in the language, such as someChar.toInt and someInt.toChar.
The choice of which type is best depends on your application. For Char, it could be Int or String. For Long, it could be String, a pair of Ints, or possibly even Double if the possible values never use more than 52 bits of precision.

Must I cast this that way?

size_t pixelsWidth = (size_t)bitmapSize.width;
Or is it totally fine to do without the casting to size_t? bitmapSize is of type CGSize...
You should use the proper type, which is probably CGFloat. size_t is something int-ish and inappropriate here.
In this case, the type of bitmapSize.width is CGFloat (currently float on iPhone).
Converting from float to size_t has undefined behavior (according to the C standard - not sure whether Apple provides any further guarantees) if the value converted doesn't fit into a size_t. When it does fit, the conversion loses the fractional part. It makes no difference whether the conversion is implicit or explicit.
The cast is "good" in the sense that it shows, in the code, that a conversion takes place, so programmers reading the code won't forget. It's "bad" in the sense that it probably suppresses any warnings that the compiler would give you that this conversion is dangerous.
So, if you're absolutely confident that the conversion will always work then you want the cast (to suppress the warning). If you're not confident then you don't want the cast (and you probably should either avoid the conversion or else get confident).
In this case, it seems likely that the conversion is safe as far as size is concerned. Assuming that your CGSize object represents the size of an actual UI element, it won't be all that big. A comment in the code would be nice, but this is the kind of thing that programmers stop commenting after the fiftieth time, and start thinking it's "obvious" that of course an image width fits in a size_t.
A further question is whether the conversion is safe regarding fractions. Under what circumstances, if any, would the size be fractional?
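To make those dangerous cases explicit, the conversion can be wrapped in a checked helper. This is a sketch: width_to_size is a made-up name, and the upper-bound check relies on (double)SIZE_MAX rounding up to a power of two on typical 64-bit platforms, which makes it slightly conservative but always safe.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a floating-point width to size_t, rejecting values for which
   the conversion would be undefined (negative, NaN, or too large). */
bool width_to_size(double w, size_t *out) {
    if (!(w >= 0.0))                /* !(w >= 0) also rejects NaN */
        return false;
    if (w >= (double)SIZE_MAX)      /* too large to fit */
        return false;
    *out = (size_t)w;               /* fractional part is discarded */
    return true;
}
```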
C performs the conversion implicitly, and you will also get a warning if size_t is less precise than CGFloat for some reason.
The size_t type is for something completely different; you should not use it for such purposes.
Its purpose is to express the sizes of objects in memory. For example, sizeof(int) is of type size_t, and it yields the size of the int type.
As the others suggested, use the appropriate type for that variable.
A cast is usually not needed (and sometimes wrong).
C does the "right thing" most of the time.
In your case, the cast is not needed (but not wrong, merely redundant).
You do need to cast:
arguments to printf when they don't match the conversion specification
printf("%p\n", (void*)some_pointer)
arguments to is*() (isdigit, isblank, ...) and toupper() and tolower()
if (isxdigit((unsigned char)ch)) { /* ... */ }
(If I remember more, I'll add them here)