64bit Hello world freezes after compiling - x86-64

The program compiles, but it freezes after starting. If I replace the format and include directives with their 32-bit versions, or comment out the MessageBox call, everything works fine.
format PE64 GUI
include 'E:\Fresh\include\win64a.inc'
entry start
section '.data' data readable writeable
text db 'Hello world!',0
section '.text' code readable executable
start:
invoke MessageBox,0,text,text,0
invoke ExitProcess,0
section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL', user32, 'USER32.DLL'
import kernel32, ExitProcess, 'ExitProcess'
import user32, MessageBox, 'MessageBoxA'

Your stack is not aligned to 16 bytes, as the ABI requires. Add and rsp, -16 to the beginning of your code, and it will work.
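A minimal sketch of the fix, keeping the rest of the program exactly as posted (the alignment instruction is the only addition):
start:
and rsp, -16 ; force 16-byte stack alignment before calling any API
invoke MessageBox,0,text,text,0
invoke ExitProcess,0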
Regarding this exchange in the comments:
Ruslan: What does the disassembly look like? Are invoke macros expanded as expected?
rancid_rot: Not sure; there is MessageBox in cs instead of ds, and mov rcx,0 instead of push 0.
I'd recommend avoiding invoke and similar macros until you learn what they should expand to. Otherwise you think you are writing in assembly, but you are actually writing in a high-level language that only resembles assembly, without even knowing what code you will get in the end, which defeats the whole purpose of using an assembler.
To actually learn to call functions in Win64 assembly, see the documentation on Win64 calling conventions.
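For reference, a hedged sketch of what such a call can look like without the invoke macro, following the Win64 convention (first four arguments in RCX, RDX, R8 and R9, 32 bytes of shadow space reserved by the caller, RSP 16-byte aligned before the call); the bracketed names are the IAT labels produced by the import macros above:
start:
and rsp, -16 ; align the stack as in the answer above
sub rsp, 32 ; reserve the 32-byte shadow space required by the callee
xor ecx, ecx ; hWnd = 0
lea rdx, [text] ; lpText
lea r8, [text] ; lpCaption
xor r9d, r9d ; uType = 0
call [MessageBox] ; call MessageBoxA through the import table
xor ecx, ecx ; exit code 0
call [ExitProcess]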

Related

How does a disassembler work and how is it different from a decompiler?

I'm looking into installing a disassembler (or decompiler) on my Linux Mint 17.3 OS and I wanted to know what the difference is between a disassembler and a decompiler. I have a rough idea of what they are (the names are fairly self-explanatory), but they are still a bit confusing.
I've read that a disassembler turns a program into assembly language, which I don't know, so it seems kind of useless to me. I've also read that a decompiler turns a 'binary file' into its source code. What exactly is a binary file?
Apparently, decompilers cannot decompile to C, only Python and other similar languages. So how can I turn a program into its original C source code?
A disassembler is a pretty straightforward application that translates machine code into assembly language statements. This is the reverse of what an assembler does, and it is straightforward because there is a strict one-to-one relationship between machine code and assembly. A disassembler targets a specific CPU; the original assembler that was used to create the executable is only of minor relevance.
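To make the one-to-one idea concrete, here is a toy sketch in Python (nothing like a real disassembler, with an opcode table limited to three hand-picked instructions) that maps a few bytes of x86 machine code back to mnemonics:
import struct

# A few bytes of x86 machine code: mov eax, 5 / nop / ret
code = bytes([0xB8, 0x05, 0x00, 0x00, 0x00, 0x90, 0xC3])

i = 0
while i < len(code):
    op = code[i]
    if op == 0xB8:  # B8 id: mov eax, imm32
        imm = struct.unpack_from("<I", code, i + 1)[0]
        print("mov eax, %d" % imm)
        i += 5
    elif op == 0x90:  # 90: nop
        print("nop")
        i += 1
    elif op == 0xC3:  # C3: ret
        print("ret")
        i += 1
    else:
        raise ValueError("unknown opcode %#04x" % op)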
A decompiler aims to recreate a compiled high-level-language program from machine code in its original form, attempting the reverse of what a C or Forth compiler does (two popular languages for which decompilers exist). Because there are so many high-level languages, and so many ways a given high-level construct can be expressed in machine code (many different strategies even for the same language and construct, even within the same compiler, and different again depending on compiler mode and situation), this operation is much more complex and depends heavily on the original compiler (and perhaps even on the command line that was used, the chosen optimization level, and the compiler version).
Even if all of that matches, most of a decompiler's work is educated guessing, and it will most probably never reach the point where it can reconstruct the original program in its source-code form 100%. It will rather end up with a version of source code that could have been the original program.

How could I run a single line of code (not script) from command prompt?

Simple question here; I just can't seem to phrase it for Google in a way it can understand.
Say I wanted to execute a line of actual programming code (C++ or Java or Python... etc.), like SetCursorPos or printf, from the command prompt command line. I vaguely imagine I would have to invoke the compiler and pass the command to it as a parameter, from where it would then be converted into machine language and passed to... where exactly?
Okay so that was kind of two questions.
How to run actual code from the command line and
what exactly is happening when a fully compiled program, or converted line of code (presuming these are essentially binary containers at that point), is executed?
Question one takes priority obviously. Unfortunately, I can not find any documentation on it, just a bunch of stuff vaguely related to it.
How to run actual code from the command line
Without delving into the vast amounts of blurriness between them, there are two major categories of language implementations: interpreters and compilers.
With many interpreters (or implementations with implicit compilation, such as V8 JavaScript's JIT compiler, or pretty much anything with a REPL), running a single line from the command line should be fairly trivial. CPython (the standard implementation of Python) has the -c command-line option:
$ python -c 'print("Hello, world!")'
Hello, world!
Language implementations with explicit compilation steps will tend to be decidedly less simple. In particular, the compiler would need to accept source either directly from the argument list or from standard input (via piping or redirection). On the output side, the compiler would have to support either immediately executing that program or writing it to standard output, so that an operating-system feature (if one exists) can execute it from a pipe.
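For example, GCC will read C source from standard input when given -x c -, though you still end up with an executable file on disk that has to be run as a second step (a sketch assuming a POSIX shell and gcc on the PATH):
printf '#include <stdio.h>\nint main(void){puts("Hello, world!");return 0;}\n' | gcc -x c - -o hello && ./hello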
To my knowledge, most explicit compilers are not designed with such usage in mind. In such cases, your best bet is to see if there is a REPL available for the language in question, preferably one as compatible with your compiler as possible, or to create (or find) a wrapper that makes it look like your language has a REPL. The wrapper would:
Accept input along the lines of CPython above.
Create a temporary source file behind the scenes with the code to be run and any necessary boilerplate.
Pass that file to the compiler.
Automatically run the resulting executable.
Delete the source file and executable. These may be cleaned up by the operating system later instead, if they're in a temp directory.
From the point of view of the user, this should look pretty similar to the CPython example, as they wouldn't have to interact with or see the compiler or temporary files.
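As a concrete sketch of such a wrapper for C (purely illustrative: the file names, the boilerplate, and the assumption of gcc on the PATH of a POSIX-style system are my own choices):
import os
import subprocess
import sys
import tempfile

def run_c_line(statement):
    # Wrap the single statement in minimal boilerplate (assumes it only needs <stdio.h>).
    source = (
        "#include <stdio.h>\n"
        "int main(void) {\n"
        "    " + statement + "\n"
        "    return 0;\n"
        "}\n"
    )
    with tempfile.TemporaryDirectory() as tmp:  # the temp dir removes source and executable afterwards
        src = os.path.join(tmp, "snippet.c")
        exe = os.path.join(tmp, "snippet")
        with open(src, "w") as f:
            f.write(source)
        subprocess.run(["gcc", src, "-o", exe], check=True)  # compile
        subprocess.run([exe], check=True)  # run the resulting executable

if __name__ == "__main__":
    run_c_line(sys.argv[1])
From the command line this would then look much like the CPython example, e.g. python runc.py 'printf("Hello, world!\n");' (assuming the script is saved as runc.py).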

How can I reference #defines in a C file from python?

I have a C file that has a bunch of #defines for bits that I'd like to reference from Python. There are enough of them that I'd rather not copy them into my Python code; instead, is there an accepted method to reference them directly from Python?
Note: I know I can just open the header file and parse it; that would be simple, but if there's a more Pythonic way, I'd like to use it.
Edit:
These are very simple #defines that define the meanings of bits in a mask, for example:
#define FOO_A 0x3
#define FOO_B 0x5
Assuming that the C .h file contains only #defines (and therefore has nothing external to link against), the following works with SWIG 2.0 (http://www.swig.org/) and Python 2.7 (tested). Suppose the file containing just the defines is named just_defines.h, as above:
#define FOO_A 0x3
#define FOO_B 0x5
Then:
swig -python -module just just_defines.h ## generates just.py and just_defines_wrap.c
gcc -c -fpic just_defines_wrap.c -I/usr/include/python2.7 -I. ## creates just_defines_wrap.o
gcc -shared just_defines_wrap.o -o _just.so ## creates _just.so, which goes with just.py
Usage:
$ python
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import just
>>> dir(just)
['FOO_A', 'FOO_B', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_just', '_newclass', '_object', '_swig_getattr', '_swig_property', '_swig_repr', '_swig_setattr', '_swig_setattr_nondynamic']
>>> just.FOO_A
3
>>> just.FOO_B
5
>>>
If the .h file also contains entry points, then you need to link against one or more libraries to resolve those entry points. That makes the solution a little more complicated, since you may have to hunt down the correct libraries. But for the "just defines" case you don't have to worry about this.
You might have some luck with the h2py.py script found in the Tools/scripts directory of the Python source tarball. While it can't handle complex preprocessor macros, it might be sufficient for your needs.
Here is a description of the functionality from the top of the script:
Read #define's and translate to Python code.
Handle #include statements.
Handle #define macros with one argument.
Anything that isn't recognized or doesn't translate into valid
Python is ignored.
Without filename arguments, acts as a filter.
If one or more filenames are given, output is written to corresponding
filenames in the local directory, translated to all uppercase, with
the extension replaced by ".py".
By passing one or more options of the form "-i regular_expression"
you can specify additional strings to be ignored. This is useful
e.g. to ignore casts to u_long: simply specify "-i '(u_long)'".
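For example, a rough sketch of using it on the just_defines.h file from the other answer (the path to the script and the generated file name follow the description above, so treat them as assumptions):
$ python Tools/scripts/h2py.py just_defines.h
$ python
>>> import JUST_DEFINES
>>> JUST_DEFINES.FOO_A
3
>>> JUST_DEFINES.FOO_B
5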
#defines are macros that have no meaning whatsoever outside of your C compiler's preprocessor. As such, they are the bane of multi-language programmers everywhere. (For example, see this Ada question from two weeks ago: Setting the license for modules in the Linux kernel.)
Short of running your source code through the C preprocessor, there really is no good way to deal with them. I typically just figure out what they evaluate to (in complex cases, often there's no better way to do this than to actually compile and run the damn code!) and hard-code that value into my program.
The (well, one of the) annoying parts is that the C preprocessor is considered by C coders to be a very simple little thing that they often use without giving it a second thought. As a result, they tend to be shocked that it causes big problems for interoperability, while we can deal with most other problems C throws at us fairly easily.
In the simple case shown above, by far the easiest way to handle it would be to encode the same two values as constants somewhere in your Python program. If keeping up with changes is a big deal, it probably wouldn't be too much trouble to write a Python program to parse those values out of the file. However, you'd have to realise that your C code would only re-evaluate those values on a compile, while your Python program would do it whenever it runs (and thus should probably only be run when the C code is also compiled).
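A rough sketch of such a parser, handling only the trivial "#define NAME VALUE" form from the question (the helper name and regular expression are my own, not an established recipe):
import re

DEFINE_RE = re.compile(r'^\s*#define\s+(\w+)\s+(0[xX][0-9A-Fa-f]+|\d+)\s*$')

def parse_defines(path):
    # Collect simple integer #defines into a dict, e.g. {'FOO_A': 3, 'FOO_B': 5}.
    values = {}
    with open(path) as f:
        for line in f:
            m = DEFINE_RE.match(line)
            if m:
                values[m.group(1)] = int(m.group(2), 0)  # base 0 handles both hex and decimal
    return values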
If you're writing an extension module, use http://docs.python.org/3/c-api/module.html#PyModule_AddIntMacro
I had almost exactly this same problem, so I wrote a Python script to parse the C file. It's intended to be renamed to match your C file (but with .py instead of .h) and imported as a Python module.
Code: https://gist.github.com/camlee/3bf869a5bf39ac5954fdaabbe6a3f437
Example:
configuration.h
#define VERBOSE 3
#define DEBUG 1
#ifdef DEBUG
#define DEBUG_FILE "debug.log"
#else
#define NOT_DEBUGGING 1
#endif
Using from Python:
>>> import configuration
>>> print("The verbosity level is %s" % configuration.VERBOSE)
The verbosity level is 3
>>> configuration.DEBUG_FILE
'"debug.log"'
>>> configuration.NOT_DEBUGGING is None
True

Racket Interactive vs Compiled Performance

Whether or not I compile a Racket program seems to make no difference to the runtime performance.
Is it just the loading of the file initially that is improved by compilation? In other words, does running racket src.rkt do a jit compilation on the fly, which is why I see no difference in compiling vs interactive?
Even for tight loops of integer arithmetic, where I thought some difference would occur, the profile times are equivalent whether or not I previously did a raco make.
Am I missing something simple?
PS, I notice that I can run racket against the source file (.rkt) or .zo file. Does racket automatically use the .zo if one is found that corresponds to the .rkt file, or does the .zo file need to be used explicitly? Either way, it makes no difference to the performance numbers I'm seeing.
Yes, you're right.
Racket compiles code in two stages: first, the code is compiled into bytecode form, and then when it runs it gets jitted into machine code. When you compile a file, you're basically creating the bytecode which saves on re-compiling it later. Since that's usually not something that takes a lot of time for small pieces of code, you won't see any noticeable difference in runtimes. For an extreme example, you can delete all *.zo files in the collection tree and start DrRacket -- it will take a lot of time to start since there's a ton of code, but once it does start, it would run almost as usual. (It would also be slow to click "run" since that will reload and recompile some files.) Another concern for bigger pieces of code is that the compilation process can make memory consumption higher, but that's also not an issue with smaller pieces of code.
See also the Performance chapter in the Guide for hints on how to improve performance.
Racket will always compile your code, regardless of whether it is run interactively at the REPL or run from the command line. Here is the section in the Guide that explains it. In interactive mode, the compiler turns every expression/definition into bytecode in memory and executes that. Otherwise, the compiler outputs the bytecode to .zo files.
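As a small illustration of the bytecode step (the file name is just an example): raco make writes the bytecode into a compiled/ subdirectory, and running the source file afterwards reuses that bytecode if it is up to date instead of recompiling:
raco make src.rkt ## writes bytecode to compiled/src_rkt.zo
racket src.rkt ## reuses the up-to-date .zo instead of recompiling the source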
Note: Eli replied at the same time I did. See his response for more details.

Matlab code after compilation

I am totally a newbie in Matlab
I want to ask: when we write a program in the MATLAB software or IDE, save it as a .m file, and then compile and execute it, which kind of file is that .m file converted into? I want to know this because I heard that MATLAB is platform independent. I did Google this, but I only got results about converting MATLAB files to C, C++, etc.
Sorry for the silly question and thanks in advance.
MATLAB is an interpreted language, so in most cases there is no persistent intermediate form. However, there is an encrypted intermediate form called pcode, and there are also the MATLAB Compiler and MATLAB Coder, which deliver code in other high-level languages such as C.
edit:
pcode is not generated automatically and should be platform/version independent. But its major purpose is to encrypt the code, not to compile it (although it does some partial compilation). To use pcode, you still need the MATLAB environment installed, so in many ways it acts like interpreted code.
But from your follow-up question I guess you don't quite understand how MATLAB works. The code gets interpreted (although with a bit of just-in-time compilation), so there is no need for a persistent intermediate code file: the actual data structures representing your code are maintained by MATLAB. In contrast to compiled languages, where your development cycle is something like "write code, compile & link, execute", the compilation (actually: interpretation) step is part of the execution, so you end up with "write code, execute" in most cases.
Just to give you some intuitive understanding of the difference between a compiler and an interpreter: a compiler translates a high-level language to a lower-level language (let's say machine code that can be executed by your computer). Afterwards, that compiled code (most likely stored in a file) is executed by your computer. An interpreter, on the other hand, interprets your high-level code piece by piece, determining what machine code corresponds to your high-level code during the runtime of the program and immediately executing that machine code. So there is no real need to have a machine-code equivalent of your entire program available (and in many cases an interpreter will not store the complete machine code, as that would just be wasted effort and space).
You could look at interpretation more or less as a human would interpret code: when you try to manually determine the output of some code, you follow the calculations line by line and keep track of your results. You don't generally translate that entire code into some different form and afterwards execute that code. And since you don't translate the code entirely, there is no need to persistently store the intermediate form.
As I said above: you can use other tools such as MATLAB Coder to convert your MATLAB code to other high-level languages such as C/C++, or you can use the MATLAB Compiler to compile your code to an executable form that depends on some runtime libraries. But those are only used in very specific cases (e.g. when you have to deploy a MATLAB application on computers/embedded devices without MATLAB, or when you need to improve the performance of your code, ...).
Note: my explanation of compilers and interpreters is a quick comparison of the archetypal interpreter and compiler. Many real-life cases are somewhere in between; e.g. Java generally compiles to (JVM) bytecode, which is then interpreted by the JVM, and something similar can be said about the .NET languages and the CLR.
Since MATLAB is an interpreter, you can write code and just execute it from the IDE, without compilation.
If you want to deploy your program, you can use the MATLAB Compiler to create a stand-alone executable or a shared library that you can use in a C++ project. On Windows, MATLAB code would compile to an .EXE file or a .DLL file, respectively.
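For instance, a hypothetical sketch using the MATLAB Compiler's mcc command with a file named hello.m:
mcc -m hello.m ## builds a stand-alone executable (hello.exe on Windows) plus supporting files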