How does the assembler handle classes and objects, and how are they stored in RAM and the executable?

How does an assembler handle classes and objects when a program is compiled? And how are these stored in RAM and in the executable file?
At first, memory is allocated according to the class's size, for example 20 bytes. In these 20 bytes, all member variables of the class are stored. But there are two problems:
How do pointers to classes work?
How does a called method know which object it belongs to?
Could you explain this to me? (If using example code, I prefer Objective-C, C++ and x86 Assembly)

The assembler has no clue what a class is; it only assembles machine code, with the occasional macro tossed in. For all intents and purposes a class is merely a struct with an optional vftable, with all the handling and class 'special features' (virtual dispatch, polymorphism, inheritance, etc.) being done in the intermediate stage, when IR code is created. Memory is allocated the same as for a struct, variable, array or any other data 'blob' (statically or dynamically, taking alignment, const-ness and packing into account), except for the support code to handle stack- and static-based dtor unwinding (done again at the IR level), ctors, and static initialization (though static initialization can happen for more than class objects). I suggest you give the Dragon Book a read-through (the first eight chapters would cover it) to get a clearer picture of how a compiler and assembler work, seeing as these things are not handled by the assembler, but by the compiler front and/or back ends, depending on how the compiler and its IL are structured.
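To make the "class is merely a struct" point concrete, here is a minimal C++ sketch (the names Point, PointData and Point_set are illustrative assumptions; the struct equivalence is what typical compilers produce, not something the standard spells out):
#include <cstdio>

// A simple class with one data member and one method...
class Point {
    int x;
public:
    void set(int v) { x = v; }
};

// ...occupies memory like this plain C-style struct,
struct PointData {
    int x;
};

// and its method compiles down to an ordinary function that
// receives the object's address as a hidden first parameter.
void Point_set(PointData *self, int v) { self->x = v; }

int main() {
    // Typically prints the same size twice: methods add no
    // per-object storage.
    std::printf("%zu %zu\n", sizeof(Point), sizeof(PointData));
}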

(2) Member functions are rewritten by the compiler. Consider class A as follows.
class A {
    int i;
public:
    A () : i (0) { }
    void f (int a, char *b) { i = a; }
};
Then what the compiler makes of A::f looks something like this (pseudocode):
void A::f (A * const this, int a, char *b) { this->i = a; }
Consider now a call to A::f.
A a;
a.f (25, "");
The compiler generates code similar to this (pseudocode):
A a;
A::f (&a, 25, "");
In other words, the compiler works the hidden this pointer into every non-static member function, and each call passes in the instance it was invoked on. This, in particular, means that you can invoke non-static member functions on NULL pointers (formally undefined behaviour, but with typical compilers the call itself goes through):
A *a = NULL;
a->f (25, "");
The crash only occurs when you actually access non-static member variables.
The resulting crash report also illustrates the answer to question (1). In many cases you'll not crash at address 0, but at a small offset from it. That offset corresponds to the offset the accessed variable has in the memory layout the compiler chose for class A (in this case, many compilers will place i at offset 0x0, and class A will not be distinguishable in memory from struct A { int i; };). Basically, pointers to classes are pointers to the equivalent C struct. Member functions are implemented as free functions taking an instance pointer as their first argument. Any access checks with regard to public, protected and private members are done up front by the compiler; the generated assembly has no clue about any of those concepts. In fact, the earliest C++ implementations, such as cfront, worked by translating C++ into plain C.
The memory layout (typically) changes a bit when you have virtual functions.
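A minimal sketch of that change (exact sizes are implementation-defined; the numbers in the comment assume a typical 64-bit ABI):
#include <cstdio>

class Plain { int i; };  // no virtual functions: no vtable

class WithVirtual {      // gains a hidden pointer to its vtable
    int i;
public:
    virtual ~WithVirtual() {}
};

int main() {
    // Typically prints "4 16": the virtual destructor adds an
    // 8-byte vptr, and alignment pads the int member to 8 bytes.
    std::printf("%zu %zu\n", sizeof(Plain), sizeof(WithVirtual));
}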

Related

Reverse C++/WinRT ABI parameter order for MIDL 3.0 array parameter?

I have an existing interface that I'm trying to define using MIDL 3.0. One of its methods has this C++ signature:
HRESULT GetArray(struct FOO** outArray, uint32_t* outSize);
I tried translating this to IDL like so:
namespace Examples {
    struct Foo {
        Int32 n1;
        Int32 n2;
    };
    interface IExample {
        void GetArray(out Foo[] array);
    }
}
However, the resulting C++/WinRT ABI has the parameters in the opposite order:
template <> struct abi<Examples::IExample>{ struct type : IInspectable
{
virtual HRESULT __stdcall GetArray(uint32_t* __arraySize, struct struct_Examples_Foo** array) noexcept = 0;
};};
This does make sense, considering that this is the recommended order. Unfortunately, I don't have the ability to change the parameter order of the existing interface. Instead, I figured I might be able to work around it using the "classic" style:
namespace Examples {
    [uuid("d7675bdc-7b6e-4936-a4a0-f113c1a3ef70"), version(1)]
    interface IExample {
        HRESULT GetArray(
            [out, size_is(, *size)] Foo** array,
            [out] unsigned long* size
        );
    }
}
But, this is rejected by the MIDL compiler:
MIDL4058: [msg]The size parameter of an array parameter must appear directly before the array parameter. [context]size
How do I write this interface in IDL in such a way that results in the correct ABI?
WinRT has a strict ABI definition for the ordering of array parameters, and as you have discovered it is (size, pointer) and not the other way around. There is no way to change this, since all the projections (such as .NET, JavaScript, and C++/CX) expect this order and would fail catastrophically if passed in the wrong order.
If you cannot change the ordering, can you write a wrapper class that exposes the correct ordering and simply forwards the calls on to your existing code with the parameters reversed?
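For illustration, a hedged sketch of that wrapper idea (GetArray and FOO come from the question; ILegacy, WinRtWrapper and the use of long as a stand-in for HRESULT are assumptions made to keep the sketch self-contained):
#include <cstdint>

struct FOO { int32_t n1; int32_t n2; };

// The existing interface, with the legacy (pointer, size) order.
struct ILegacy {
    virtual long GetArray(FOO **outArray, uint32_t *outSize) = 0;
};

// A wrapper exposing the WinRT-mandated (size, pointer) order,
// simply forwarding with the parameters swapped.
struct WinRtWrapper {
    ILegacy *inner;
    long GetArray(uint32_t *arraySize, FOO **array) {
        return inner->GetArray(array, arraySize);
    }
};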
Failing that, there is another way to support this if you only care about C++ (and maybe C# clients). That is, rather than defining a WinRT interface for this method, you can define a classic-COM interface and have your WinRT object implement that interface as well. Then the clients of your WinRT object QI for that COM interface and can pass the arguments in the order you require.

Are initializer expressions part of the constructor in D?

In D, can I initialize a member directly at its declaration and expect the initializer expression to run as part of the constructor?
I came from C#, where this is the case. But with DMD 2.071.0 I'm getting different behavior.
class Other { }
class Test { Other nonStaticMember = new Other; }
void test()
{
    auto t1 = new Test;
    auto t2 = new Test;
    // Assert fails: the two Test instances are
    // initialized to the same reference,
    // instead of executing the Other constructor twice.
    assert(t1.nonStaticMember !is t2.nonStaticMember);
}
If this is the intended behavior, it should be documented here: https://dlang.org/spec/class.html, right?
This code doesn't do in D what it would do in C#.
In your example, Other is instantiated during compilation: once, at compile time, with the resulting instance placed in the program's data segment. nonStaticMember's initial value will then, by default, point to that single instance of Other, for all Test instances.
So, everything is working exactly as designed, even if it may be surprising when coming from other languages.
If this is the intended behavior, it should be documented here: https://dlang.org/spec/class.html, right?
Perhaps, but note that this behavior is not at all specific to classes. A pointer to any heap-allocated value, used as the initial value of a global or local static variable, will behave the same. Whenever the value of an expression is demanded during compilation (and that includes initializers for global/static variables), D attempts to evaluate it at compile time. A few years ago this was extended to allocating values on the "heap" too; such values then end up in the program's initial data segment.
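For contrast, a minimal C++ sketch of the behavior the asker expected from C# (in C++, a non-static member initializer runs on every construction, so the assertion below holds; this is an analogy, not D code):
#include <cassert>

struct Other {};

struct Test {
    // Unlike in D, this initializer is evaluated each time a
    // Test is constructed, so every instance gets its own Other.
    Other *nonStaticMember = new Other;  // leaked here; fine for a sketch
};

int main() {
    Test t1, t2;
    assert(t1.nonStaticMember != t2.nonStaticMember);  // passes in C++
}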

Why do these complicated structure constants have internal linkage?

As you know, constants default to internal linkage.
const int Buf = 1000; // defaults to internal linkage
Buf can be defined in a header file; it's visible only within the files where it is defined and cannot be seen at link time by other translation units.
However, suppose some complicated structure constant is defined as below:
- constants.h
const complicatedClass myObject("I'm a const object","internal linkage",5);
complicatedClass definition:
class complicatedClass
{
private:
    char* charArry;
    std::string strTemp;
    static int numbers;
    int mSize;
public:
    complicatedClass();
    complicatedClass(char* pChrArry, std::string temp, int size);
    ~complicatedClass();
public:
    void print() const;
    std::string getStrTemp() const;
};
It seems that the compiler must create storage for complicated structure constants, so myObject should have external linkage. However, everything is fine when this constants header file (constants.h) is included in multiple files. I would have expected a linker error, since myObject shouldn't be allowed to be defined in many places (in multiple translation units).
Can anyone explain this? Thanks in advance.
Internal linkage does not mean no storage. Rather it means the variable is not visible in other translation units.
In C++ const allows the compiler to either create storage for the variable or not. Whether it does so or not depends on whether it needs it.
So in your example, the compiler will create storage for myObject only if it needs it (which it probably does); being const is what gives the compiler that latitude. Also because it is const, myObject has internal linkage, which means each translation unit will have its own copy of myObject if storage is required.
A simple test to see this in action is to take the address of myObject in a number of different translation units (effectively in different .cpp files) and print it out. This does two things: it forces storage to be created for myObject even if it wasn't already, and you will see different addresses because of the internal linkage.
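A hedged sketch of that test (the class is replaced by a plain const int to keep the sketch short, and the file names are hypothetical; the mechanism is the same):
// constants.h
const int myObject = 1000;  // const at namespace scope: internal linkage

// a.cpp
#include <cstdio>
#include "constants.h"
void printFromA() { std::printf("a.cpp sees %p\n", (const void *)&myObject); }

// b.cpp
#include <cstdio>
#include "constants.h"
void printFromB() { std::printf("b.cpp sees %p\n", (const void *)&myObject); }

// main.cpp
void printFromA();
void printFromB();
int main() {
    printFromA();  // taking the address forces storage in each TU,
    printFromB();  // and the two lines print different addresses
}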

Scala: Do classes that extend a trait always take the traits properties?

Given the following:
class TestClass extends TestTrait {
  def doesSomething() = methodValue + intValue
}

trait TestTrait {
  val intValue = 4
  val unusedValue = 5
  def methodValue = "method"
  def unusedMethod = "unused method"
}
When the above code runs, will TestClass actually have memory allocated for unusedValue or unusedMethod? I've used javap, and I know that an unusedValue and an unusedMethod exist, but I cannot tell whether they actually receive any state or memory allocation.
Basically, I'm trying to understand if a class ALWAYS gets all that a trait provides, or if the compiler is smart enough to only provide what the class actually uses from the trait?
If a trait always imposes itself on a class, it seems like it could be inefficient, since I expect many programmers will use traits as mixins and would therefore be wasting memory everywhere.
Thanks to all who read and help me get to the bottom of this!
Generally speaking, in languages like Scala and Java (and C++, for virtual methods), each class has a table of pointers to its instance methods. If your question is whether the Scala compiler will allocate slots in the method table for unusedMethod, then I would say yes, it should.
I think your question is whether the Scala compiler will look at the body of TestClass and say "whoa, I only see uses of methodValue and intValue, so being a good compiler I'm going to refrain from allocating space in TestClass's method table for unusedMethod." But it can't really do this in general. The reason is that TestClass will be compiled into a class file, TestClass.class, and this class may be used in a library by programmers you don't even know.
And what will they want to do with your class? This:
val x = new TestClass()
println(x.unusedMethod)
See, the thing is the compiler can't predict who is going to use this class in the future, so it puts all methods into its method table, even the ones not called by other methods in the class. This applies to methods declared in the class or picked up via an implemented trait.
If you expect the compiler to do global system-wide static analysis and optimization over a fixed, closed system then I suppose in theory it could whittle away such things, but I suspect that would be a very expensive optimization and not really worth it. If you need this kind of memory savings you would be better off writing smaller traits on your own. :)
It may be easiest to think about how Scala implements traits at the JVM level:
- An interface is generated with the same name as the trait, containing all the trait's method signatures
- If the trait contains only abstract methods, then nothing more is needed
- If the trait contains any concrete methods, then their definitions are copied into any class that mixes in the trait
- Any vals/vars also get copied verbatim
It's also worth noting how a hypothetical var bippy: Int is implemented in equivalent Java:
private int bippy; //backing field
public int bippy() { return this.bippy; } //getter
public void bippy_$eq(int x) { this.bippy = x; } //setter
For a val, the backing field is final and no setter is generated
When mixing in a trait, the compiler doesn't analyse usage. For one thing, doing so would break the contract made by the interface. It would also take an unacceptably long time to perform such an analysis. This means that you will always inherit the cost of the backing fields from any vals/vars that get mixed in.
As you already hinted, if this is a problem then the solution is to just use defs in your traits.
There are several other benefits to such an approach and, thanks to the uniform access principle, you can always override such a method with a val further down the inheritance hierarchy if you need to.

Are there any static duck-typed languages?

Can I specify interfaces when I declare a member?
After thinking about this question for a while, it occurred to me that a static duck-typed language might actually work. Why can't predefined classes be bound to an interface at compile time? Example:
public interface IMyInterface
{
    public void MyMethod();
}

public class MyClass //Does not explicitly implement IMyInterface
{
    public void MyMethod() //But contains a compatible method definition
    {
        Console.WriteLine("Hello, world!");
    }
}
...
public void CallMyMethod(IMyInterface m)
{
    m.MyMethod();
}
...
MyClass obj = new MyClass();
CallMyMethod(obj); // Automatically recognize that MyClass "fits"
                   // MyInterface, and force a type-cast.
Do you know of any languages that support such a feature? Would it be helpful in Java or C#? Is it fundamentally flawed in some way? I understand you could subclass MyClass and implement the interface or use the Adapter design pattern to accomplish the same thing, but those approaches just seem like unnecessary boilerplate code.
A brand-new answer to this question: Go has exactly this feature. I think it's really cool and clever (though I'll be interested to see how it plays out in real life), and kudos for thinking of it.
As documented in the official documentation (as part of the Tour of Go, with example code):
Interfaces are implemented implicitly
A type implements an interface by implementing its methods. There is no explicit declaration of intent, no "implements" keyword.
Implicit interfaces decouple the definition of an interface from its implementation, which could then appear in any package without prearrangement.
How about using templates in C++?
#include <iostream>

class IMyInterface // Inheritance from this is optional
{
public:
    virtual void MyMethod() = 0;
};

class MyClass // Does not explicitly implement IMyInterface
{
public:
    void MyMethod() // But contains a compatible method definition
    {
        std::cout << "Hello, world!\n";
    }
};

template<typename MyInterface>
void CallMyMethod(MyInterface& m)
{
    m.MyMethod(); // instantiation succeeds iff MyInterface has MyMethod
}

int main()
{
    MyClass obj;
    CallMyMethod(obj); // Automatically generates code with MyClass as MyInterface
}
I haven't actually compiled this code, but I believe it's workable and a pretty trivial C++-ization of the original proposed (but nonworking) code.
Statically typed languages, by definition, check types at compile time, not run time. One obvious consequence for the system described above is that any such conformance check must happen when the program is compiled, not while it runs.
Now, you could build more intelligence into the compiler so it could derive types rather than having the programmer explicitly declare them; the compiler might be able to see that MyClass implements a MyMethod() method and handle this case accordingly, without the need to explicitly declare interfaces (as you suggest). Such a compiler could use a type-inference algorithm such as Hindley-Milner.
Of course, some statically typed languages like Haskell already do something similar to what you suggest; the Haskell compiler is able to infer types (most of the time) without the need to explicitly declare them. But obviously, Java/C# don't have this ability.
I don't see the point. Why not be explicit that the class implements the interface and be done with it? Implementing the interface is what tells other programmers that this class is supposed to behave in the way that interface defines. Simply having the same name and signature on a method conveys no guarantee that the intent of the designer was to perform similar actions with the method. That may be, but why leave it up to interpretation (and misuse)?
The reason you can "get away" with this successfully in dynamic languages has more to do with TDD than with the language itself. In my opinion, if the language offers the facility to give this sort of guidance to others who use/view the code, you should use it. It actually improves clarity and is worth the few extra characters. In cases where you don't have access to do this, an Adapter serves the same purpose of explicitly declaring how the interface relates to the other class.
F# supports static duck typing, though with a catch: you have to use member constraints. Details are available in this blog entry.
Example from the cited blog:
let inline speak (a: ^a) =
    let x = (^a : (member speak: unit -> string) (a))
    printfn "It said: %s" x
    let y = (^a : (member talk: unit -> string) (a))
    printfn "Then it said %s" y

type duck() =
    member x.speak() = "quack"
    member x.talk() = "quackity quack"

type dog() =
    member x.speak() = "woof"
    member x.talk() = "arrrr"

let x = new duck()
let y = new dog()
speak x
speak y
TypeScript!
Well, OK... so it's a JavaScript superset and maybe does not constitute a "language", but this kind of static duck typing is vital in TypeScript.
Most of the languages in the ML family support structural types with inference and constrained type schemes, which is the geeky language-designer terminology that seems most likely to match what the original question means by "static duck-typing".
The more popular languages in this family that spring to mind include: Haskell, Objective Caml, F# and Scala. The one that most closely matches your example, of course, would be Objective Caml. Here's a translation of your example:
open Printf

class type iMyInterface = object
  method myMethod : unit
end

class myClass = object
  method myMethod = printf "Hello, world!"
end

let callMyMethod : #iMyInterface -> unit = fun m -> m#myMethod

let myClass = new myClass
let () = callMyMethod myClass
Note: some of the names you used have to be changed to comply with OCaml's notion of identifier case semantics, but otherwise, this is a pretty straightforward translation.
Also, worth noting, neither the type annotation in the callMyMethod function nor the definition of the iMyInterface class type is strictly necessary. Objective Caml can infer everything in your example without any type declarations at all.
Crystal is a statically duck-typed language. Example:
def add(x, y)
  x + y
end
add(true, false)
The call to add causes this compilation error:
Error in foo.cr:6: instantiating 'add(Bool, Bool)'
add(true, false)
^~~
in foo.cr:2: undefined method '+' for Bool
x + y
^
A pre-release design for Visual Basic 9 had support for static duck typing using dynamic interfaces, but they cut the feature in order to ship on time.
Boo definitely is a static duck-typed language: http://boo.codehaus.org/Duck+Typing
An excerpt:
Boo is a statically typed language, like Java or C#. This means your boo applications will run about as fast as those coded in other statically typed languages for .NET or Mono. But using a statically typed language sometimes constrains you to an inflexible and verbose coding style, with the sometimes necessary type declarations (like "x as int", but this is not often necessary due to boo's Type Inference) and sometimes necessary type casts (see Casting Types). Boo's support for Type Inference and eventually generics help here, but...
Sometimes it is appropriate to give up the safety net provided by static typing. Maybe you just want to explore an API without worrying too much about method signatures or maybe you're creating code that talks to external components such as COM objects. Either way the choice should be yours not mine.
Along with the normal types like object, int, string... boo has a special type called "duck". The term is inspired by the ruby programming language's duck typing feature ("If it walks like a duck and quacks like a duck, it must be a duck").
Newer versions of C++ move in the direction of static duck typing. Since C++20 you can write something like this:
auto plus(auto x, auto y) {
    return x + y;
}
and it will fail to compile if there's no matching operator+ for x and y.
As for your criticism:
A new "CallMyMethod" is created for each different type you pass to it, so it's not really type inference.
But it IS type inference (you can say foo(bar) where foo is a templated function), and it has the same effect, except it's more time-efficient at run time and takes more space in the compiled code.
Otherwise, you would have to look up the method at run time. You'd have to find a name, then check that the name has a method with the right parameters.
Or you would have to store all that information about matching interfaces, and look into every class that matches an interface, then automatically add that interface.
In either case, that would allow you to implicitly and accidentally break the class hierarchy, which is bad for a new feature because it goes against the habits of C#/Java programmers. With C++ templates, you already know you're in a minefield (and C++ has since added a feature, "concepts", to allow restrictions on template parameters).
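For reference, a minimal sketch of such a restriction using the C++20 concepts feature mentioned above (the Addable name is an illustrative assumption):
#include <concepts>

// A concept: any type whose values can be added together.
template<typename T>
concept Addable = requires(T a, T b) {
    { a + b } -> std::convertible_to<T>;
};

// plus() now only accepts types satisfying the constraint;
// passing anything else is a clear compile-time error.
auto plus(Addable auto x, Addable auto y) {
    return x + y;
}

int main() {
    plus(1, 2);                 // fine: int satisfies Addable
    // plus(nullptr, nullptr);  // rejected at compile time
}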
Structural types in Scala do something like this.
See Statically Checked “Duck Typing” in Scala
D (http://dlang.org) is a statically compiled language and provides duck-typing via wrap() and unwrap() (http://dlang.org/phobos-prerelease/std_typecons.html#.unwrap).
Sounds like Mixins or Traits:
http://en.wikipedia.org/wiki/Mixin
http://www.iam.unibe.ch/~scg/Archive/Papers/Scha03aTraits.pdf
The latest version of my programming language, Heron, supports something similar through a structural-subtyping coercion operator called as. So instead of:
MyClass obj = new MyClass();
CallMyMethod(obj);
You would write:
MyClass obj = new MyClass();
CallMyMethod(obj as IMyInterface);
Just like in your example, in this case MyClass does not have to explicitly implement IMyInterface, but if it did, the cast could happen implicitly and the as operator could be omitted.
I wrote a bit more about this technique, which I call explicit structural subtyping, in this article.