Questions regarding address binding - operating-system

There are three different address binding techniques:
Compile time binding
Load time binding
Run time binding
I have many questions regarding them :
1- According to my understanding, every OS implements a particular address binding technique. Modern OSes use run-time binding and MS-DOS used compile-time binding. Is that right? Or can the programmer select which address binding to use?
2- Compile-time binding is used if the compiler knows in advance where the program will reside in main memory. How does the compiler know this future information? Is it given by the programmer?
3- Run-time binding is used if the program may change its location during execution. Are swapping and segmentation compaction examples of why programs change their location during execution, or are those different concepts here?

1- According to my understanding, every OS implements a particular address binding technique. Modern OSes use run-time binding and MS-DOS used compile-time binding. Is that right? Or can the programmer select which address binding to use?
Actually, even the most recent OSes use all of the binding/linking types. It isn't really a matter of the OS but mostly a matter of user-mode implementation. It is mostly the programmer who decides which linking type to use, but it isn't an explicit choice: it is chosen implicitly by the way the high-level code is written. For example, if the programmer decides to use virtual functions in C++, then the address to jump to will be determined at run time based on the type of the object referenced instead of the type of the reference. It is useful to know how linking works in order to write more efficient code and to understand what you are doing.
2- Compile-time binding is used if the compiler knows in advance where the program will reside in main memory. How does the compiler know this future information? Is it given by the programmer?
It is quite complex and there is a lot to say about that. Today, you have paging that is always in use. A program has access to the whole address space. This means that a compiler doesn't have to care much about the position of an executable. It cares mostly about the start address and the rest is position independent or absolute. Position independent code doesn't even care all that much about the start address. The start address mostly provides a position to find the _start symbol but that's it. The rest of the code is using relative jumps to some position in the text segment and RIP-relative addressing to access global/static data.
It isn't so much about where the program will reside. It is a matter of whether a function is implemented within your own code or in another library. If a function lives in another library, the compiler leaves an unresolved symbol in the executable for the linker to resolve before execution. This is called dynamic linking, an example of load-time binding. The dynamic linker will find that dynamic library, place it somewhere in RAM, and resolve where to jump to reach that function.
3- Run-time binding is used if the program may change its location during execution. Are swapping and segmentation compaction examples of why programs change their location during execution, or are those different concepts here?
There is no program that "will always change its location". Run-time binding is mostly a matter of virtual functions (in C++, for example). All linking is done at compile time or at load time, except for virtual function calls. A virtual function call selects which function to run based on the type of the object referenced instead of the type of the reference itself. This provides polymorphism: the same function signature can have several implementations, and which one to call is determined at run time by the dynamic type of the object. (For more info see: https://cs.stackexchange.com/questions/145444/what-attributes-are-bound-statically-vs-dynamically/145470#145470)

Related

How is Dart FFI actually implemented? Are FFI calls as cheap as a normal function call, or do they do heavy lifting under the hood?

I am quite interested in how Dart FFI is implemented. Are FFI calls as cheap as a normal function call, or do they do heavy lifting under the hood?
I have searched the Internet but cannot find much information. I only found this article, which talks a bit about the internals of argument passing and ABIs. In addition, I guess it should have some protections, because Dart has things like GC while C does not.
Thanks for any hints!
Dart FFI uses C's dlopen() on platforms other than Windows (which is not POSIX). It's a very lightweight interface to shared objects. A compiled shared object contains a table mapping symbol names to memory offsets. The shared object is opened with dlopen(), then specific items in it are located with dlsym() by symbol name. This yields a memory address for, say, a function named 'testDartFFI()'. The Dart runtime can then call the function at that address, using the Dart prototype to pass and return values correctly according to C conventions. Overhead-wise it's not much different from calling other dynamically linked system libraries.

How are CPU Register values updated?

I know this might be a silly question, but I am really curious about this, since I don't have much background in computer architecture.
Suppose I have a register R1 and I load the value of a variable, say LOCK = 5, into it, so R1 now holds 5. If I later update LOCK to 10, will the register still hold 5, or will it be updated?
When it comes to register-based CPU architectures, I think Neo from The Matrix has a valuable lesson: "There are no variables."
Variables, as you use them in a higher-level programming language, are an abstraction for describing to the compiler what operations to perform on a particular piece of data. That data may reside in system memory, or, for temporary values, never leave the register file.
However, once the program has been compiled to a binary, there are no variables anymore! For debugging purposes the compiler may annotate the code with information of the kind "at this particular position in the code, what is referred to as variable 'x' right now happens to be held in …".
I think the best way to understand this is to compile some very simple programs and look at their respective assembly, to see how things fit together. The Godbolt Compiler Explorer is a really valuable tool, here.

Finding class and function names from an x86-64 executable

I am wondering whether it is always possible, in some way, to obtain the function and class names when reversing an application (in this case a game). I have tried for around a month to reverse a game (Assassin's Creed Unity, Anvil engine) but still have no luck getting the function names. I have found a way to obtain the class names, but no clue on function names.
So my question is: is it possible to actually obtain the function names without having the documentation, and to build a hierarchy? (I am doing this to get better at reversing and to learn new things (x64 asm).)
Any tips and tricks related to reversing classes/structures are appreciated.
No, function and class names aren't needed for compiled code to work, and usually aren't part of an executable that's had its symbol table stripped.
The exception to that would be calls across DLL boundaries where you might get some mangled C++ names containing function and class names, or if there are any error-check / assert messages in the release build then some names might show up in strings.
C++ with RTTI (RunTime Type Info) might have type names somewhere, maybe mapping vtable pointers to strings, or for classes with no virtual members probably only if typeid was ever actually used. (Or not at all if compiled with RTTI disabled; see: activate RTTI in c++.)
Even exception-handling I think doesn't need class names in the binary.
Other than that, there's no need for class names or function names in the compiled binary. Definitely not in the machine code itself; that's of course all pointers / relative offsets, even for classes with virtual functions. (See: How do objects work in x86 at the assembly level?)
C++ does not generally support introspection, unlike Java, so there's no default need for any of the info you're looking for to be in the executable anywhere.

Solidity contracts : Source control

Suppose, I wrote a contract in Solidity that is now currently being run by a number of nodes. For some reason, I made a change - code or configuration or both. How do I know that, all the nodes running this contract are running the latest version of the code?
Conversely, if the contract was placed in an open repository (e.g. GitHub), how do I know the code wasn't tampered with?
What if a majority of nodes decided to tamper with the code and run that version?
Short answer
There is no "latest" version. Basically, you cannot even make a "change" to the deployed code. Please continue to read if you really need your DApp to be updatable.
Explanation
When you compile your Solidity source code into EVM byte code, deploy it to the blockchain (by spending some Ether), and wait for the transaction to be mined, the DApp program is fixed. During runtime, the only way that could possibly make a change in the state of the DApp is invoking a public function of the contract.
Solution
You may make use of the above fact to make your DApp updatable:
Define an interface (Cat) having some abstract functions (meow) that you want them to be updatable
Define a contract (Shorthair) that implements the interface (contract Shorthair is Cat)
Deploy the contract (Shorthair)
Define another contract (FunMaker)
Define an address variable (address catAddress)
This is to store the address of your "latest" implementation
In the constructor, accept an address parameter
Assign the value to catAddress
When you want to call the updatable function (meow), do this:
Cat cat = Cat(catAddress)
cat.meow()
Create another function that accepts an address parameter (setCatAddress) so that you can update the value of the address variable (catAddress)
Now, your smart contract (FunMaker) is updatable. When you want to make an update, just write another contract that implements Cat (Shorthair2, Persian, whatever) and deploy it. After the deployment, record the address and call setCatAddress on FunMaker with it to overwrite the value of catAddress.
Caveats
The spirit of smart contracts is "define what to do, do what is defined". Trust is based on the fact that nothing could be changed after deployment. However, if you implement your DApp like this, you are basically reversing the principles.
Besides, the contract being called (Shorthair) cannot deal with balances and variables in the caller (FunMaker) directly. Although it is still doable with careful design and workarounds, it is better to evaluate whether it is worth doing in the first place.
It's organized on completely different lines.
The contract bytecode (generally produced by a compiler), not the source, is part of the blockchain. It's indifferent to traditional distribution channels.
The existence of the contract is part of the shared history of the chain, because the bytecode was (part of) a specific transaction that deployed the contract. Said deployment transaction is also part of the immutable history of the chain.
Nodes don't have very much latitude. They don't get to decide what version they want to run. Either they run the actual code or they cease being part of the consensus.
So, basically, you know all the nodes are running the contract you deployed, with few (if any) exceptions. It's the only correct interpretation of the chain.
Hope it helps.
How do I know the source code of the contract at a specific address?
Basically, there is no simple way.
The simplest way I can imagine is by byte code comparison:
Get the byte code at a specific address by this call
Get the source code from the owner
In most of the cases, they are willing to provide you the source code
This is because the spirit of smart contracts is "define what to do, do what is defined"
The trust comes from the fact that the user knows exactly how it is implemented --- no more and no less
Compile it with a Solidity compiler
You may find the official online compiler handy
Do a string comparison on the results

How can I tell if a Perl module is actually used in my program?

I have been on a "cleaning spree" lately at work, doing a lot of touch-up stuff that should have been done a while ago. One thing I have been doing is deleting modules that were imported into files and never used, or were used at one point but not anymore. To do this I have just been deleting an import and running the program's test file, which gets really, really tedious.
Is there any programmatic way of doing this? Short of me writing a program myself to do it.
Short answer, you can't.
Longer possibly more useful answer, you won't find a general purpose tool that will tell you with 100% certainty whether the module you're purging will actually be used. But you may be able to build a special purpose tool to help you with the manual search that you're currently doing on your codebase. Maybe try a wrapper around your test suite that removes the use statements for you and ignores any error messages except messages that say Undefined subroutine &__PACKAGE__::foo and other messages that occur when accessing missing features of any module. The wrapper could then automatically perform a dumb source scan on the codebase of the module being purged to see if the missing subroutine foo (or other feature) might be defined in the unwanted module.
You can supplement this with Devel::Cover to determine which parts of your code don't have tests so you can manually inspect those areas and maybe get insight into whether they are using code from the module you're trying to purge.
Due to the halting problem, you can't statically determine whether any program of sufficient complexity will exit or not. This applies to your problem because the "last" instruction of your program might be the one that uses the module you're purging. And since it is impossible to determine what the last instruction is, or whether it will ever be executed, it is impossible to statically determine if that module will be used. Further, in a dynamic language, which can extend the program during its run, analysis of the source or even the post-compile symbol tables would only tell you what was calling the unwanted module just before run time (whatever that means).
Because of this you won't find a general purpose tool that works for all programs. However, if you are positive that your code doesn't use certain run-time features of Perl you might be able to write a tool suited to your program that can determine if code from the module you're purging will actually be executed.
You might create alternative versions of the modules in question which have only an AUTOLOAD method (and an import) in them. Make this AUTOLOAD method croak on use. Put this module first in the include path.
You might refine this method by making AUTOLOAD only log the usage and then load the real module and forward the original function call. You could also put a subroutine first in @INC which creates the fake module on the fly if necessary.
Of course you need a good test coverage to detect even rare uses.
This concept is definitely not perfect, but it might work with lots of modules and simplify the testing.