"All programs are interpreted". How? - perl

A computer scientist will correctly explain that all programs are
interpreted and that the only question is at what level. --perlfaq
How are all programs interpreted?

A Perl program is a text file read by the perl program which causes the perl program to follow a sequence of actions.
A Java program is a text file which has been converted into a series of byte codes which are then interpreted by the java program to follow a sequence of actions.
A C program is a text file which is converted via the C compiler into an assembly program which is converted into machine code by the assembler. The machine code is loaded into memory which causes the CPU to follow a sequence of actions.
The CPU is a jumble of transistors, resistors, and other electrical bits which is laid out by hardware engineers so that when electrical impulses are applied, it will follow a sequence of actions as governed by the laws of physics.
Physicists are currently working out what makes those rules and how they are interpreted.
Essentially, every computer program is interpreted by something else which converts it into something else which eventually gets translated into how the electrons in your local neighborhood fly around.
EDIT/ADDED: I know the above is a bit tongue-in-cheek, so let me add a slightly less goofy addition:
Interpreted languages are where you can go from a text file to something running on your computer in one simple step.
Compiled languages are where you have to take an extra step in the middle to convert the language text into machine- or byte-code.
The latter can easily be easily be converted into the former by a simple transformation:
Make a program called interpreted-c, which can take one or more C files and can run a program which doesn't take any arguments:
#!/bin/sh
MYEXEC=/tmp/myexec.$$
gcc -o $MYEXEC ${1+"$#"} && $MYEXEC
rm -f $MYEXEC
Now which definition does your C program fall into? Compare & contrast:
$ perl foo.pl
$ interpreted-c foo.c

Machine code is interpreted by the processor at runtime, given that the same machine code supplied to a processor of a certain arch (x86, PowerPC etc), should theoretically work the same regardless of the specific model's 'internal wiring'.
EDIT:
I forgot to mention that an arch may add new instructions for things like accessing new registers, in which case code written to use it won't work on older processors in the range. Much like when you try to use an old version of a library and then try to use capabilities only found in newer libraries.
Example: many Linux distros are released as 686 only, despite the fact it's in the 'x86 family'. This is due to the use of new instructions.

My first thought was too look inside the CPU — see below — but that's not right. The answer is much much simpler than that.
A high-level description of a CPU is:
1. execute the current op
2. grab the next op
3. goto 1
Compare it to Perl's interpreter:
while ((PL_op = op = op->op_ppaddr(aTHX))) {
}
(Yeah, that's the whole thing.)
There can be no doubt that the CPU is an interpreter.
It just goes to show how useless it is to classify something is interpreted or not.
Original answer:
Even at the CPU level, programs get rewritten into simpler instructions to allow the CPU to execute more them more quickly. This is done by changing the order in which they are executed and executing them in parallel. For example, Intel's Hyperthreading.
Even deeper, each instruction is considered a program of its own, one that routes electronic signals. See microcode.

The Levels of interpretions are really easy to explain:
2: Runtimelanguage (CLR, Java Runtime...) & Scriptlanguage (Python, Ruby...)
1: Assemblies
0: Binary Code
Edit: I changed the level of Scriptinglanguages to the same level of Runtimelanguages. Thank's for the hint. :-)

I can write a Game Boy interpreter that works similarly to how the Java Virtual Machine works, treating the z80 machine instructions as byte code. Assuming the original was written in C1, does that mean C suddenly became an interpreted language just because I used it like one?
From another angle, gcc can compile C into machine code for a number of different processors. There's no reason the target machine has to be the same as the machine you're compiling on. In fact, this is a common way to compile C code for AVRs and other microcontrollers.
As a matter of abstraction, the compiler's job is to translate flat text into a structure, then translate that structure into something that can be executed somewhere. Whatever is doing the execution may have its own levels of breaking out the structure before really executing it.
A lot of power becomes available once you start thinking along these lines.
A good book on this is Structure and Interpretation of Computer Programs. Even if you only get through the first chapter (or half of the first chapter), I think you'll learn a lot.
1 I think most Game Boy stuff was hand coded ASM, but the principle remains.

Related

Programming in QuickBasic with repl.it?

I'm trying to get a "retro-computing" class open and would like to give people the opportunity to finish projects at home (without carrying a 3kb monstrosity out of 1980 with them) I've heard that repl.it has every programming language, does it have QuickBasic and how do I use it online? Thanks for the help in advance!
You can do it (hint: search for QBasic; it shares syntax with QuickBASIC), but you should be aware that it has some limitations as it's running on an incomplete JavaScript implementation. For completeness, I'll reproduce the info from the original blog post:
What works
Only text mode is supported. The most common commands (enough to run
nibbles) are implemented. These include:
Subs and functions
Arrays
User types
Shared variables
Loops
Input from screen
What doesn't work
Graphics modes are not supported
No statements are allowed on the same line as IF/THEN
Line numbers are not supported
Only the built-in functions used by NIBBLES.BAS are implemented
All subroutines and functions must be declared using DECLARE
This is far from being done. In the comments, AC0KG points out that
P=1-1 doesn't work.
In short, it would need another 50 or 100 hours of work and there is
no reason to do this.
One caveat that I haven't been able to determine is a statement like INPUT or LINE INPUT... They just don't seem to work for me on repl.it, and I don't know where else one might find qb.js hosted.
My recommendation: FreeBASIC
I would recommend FreeBASIC instead, if possible. It's essentially a modern reimplementation coded in C++ (last I knew) with additional functionality.
Old DOS stuff like the DEF SEG statement and VARSEG function are no longer applicable since it is a modern BASIC implementation operating on a 32-bit flat address space rather than 16-bit segmented memory. I'm not sure what the difference between the old SADD function and the new StrPtr function is, if there is any, but the idea is the same: return the address of the bytes that make up a string.
You could also disable some stuff and maintain QB compatibility using #lang "qb" as the first line of a program as there will be noticeable differences when using the default "fb" dialect, or you could embrace the new features and avoid the "qb" dialect, focusing primarily on the programming concepts instead; the choice is yours. Regardless of the dialect you choose, the basic stuff should work just fine:
DECLARE SUB collatz ()
DIM SHARED n AS INTEGER
INPUT "Enter a value for n: ", n
PRINT n
DO WHILE n <> 4
collatz
PRINT n
LOOP
PRINT 2
PRINT 1
SUB collatz
IF n MOD 2 = 1 THEN
n = 3 * n + 1
ELSE
n = n \ 2
END IF
END SUB
A word about QB64
One might argue that there is a much more compatible transpiler known as QB64 (except for some things like DEF FN...), but I cannot recommend it if you want a tool for students to use. It's a large download for Windows users, and its syntax checking can be a bit poor at times, to the point that you might see the QB code compile only to see a cryptic message like "C++ compilation failed! See internals\temp\compile.txt for details". Simply put, it's usable and highly compatible, but it needs some work, like the qb.js script that repl.it uses.
An alternative: DOSBox and autorun
You could also find a way to run an actual copy of QB 4.5 in something like DOSBox and simply modify the autorun information in the default DOSBox.conf (or whatever it's called) to automatically launch QB. Then just repackage it with the modified DOSBox.conf in a nice installer for easy distribution (NSIS, Inno Setup, etc.) This will provide the most retro experience beyond something like a FreeDOS virtual machine as you'll be dealing with the 16-bit segmented memory, VGA, etc.—all emulated of course.

How does a disassembler work and how is it different from a decompiler?

I'm looking into installing a disassembler (or decompiler) on my Linux Mint 17.3 OS and I wanted to know what the difference is between a disassembler and a decompiler. I have a rough idea of what they are (the names are fairly self-explanatory), but they are still a bit confusing.
I've read that a disassembler turns a program into assembly language, which I don't know, so it seems kind of useless to me. I've also read that a decompiler turns a 'binary file' into its source code. What exactly is a binary file?
Apparently, decompilers cannot decompile to C, only Python and other similar languages. So how can I turn a program into its original C source code?
A disassembler is a pretty straightforward application that transfers machine code into assembly language statements - This activity is the reverse operation that an assembler program does and is straightforward because there is a strict one-to-one relationship between machine code and assembly. A disassembler aims at a specific CPU. The original assembler that was used to create the executable is only of minor relevance.
A decompiler aims at recreating a compiled high-level language program from machine code into its original format - Thus trying the reverse operation of a C or Forth (popular languages for which de-compilers exist) compiler. Because there are so many high-level languages and thus so many ways in how original high-level language constructs could be expressed in machine code (even a lot of different strategies for the same language and construct, even in the same compiler, and even different strategies depending on the compiler mode and situation), this operation is much more complex and very dependent on the original compiler (and maybe even the command line that was used, it's chosen optimization level and also the used version).
Even if all that fits, most of the work of a decompiler is educated guessing and will most probably never reach a point where it can reconstruct the original program in its source code form 100% - It will rather end up with a version of source code that could have been the original program.

How could I run a single line of code (not script) from command prompt?

Simple question here, just can't seem to pass it google in a way it can understand.
Say I wanted to execute a line of actual programming code (c++ or java or python... etc) like SetCursorPos or printf from the command prompt command line. I vaguely imagine I would have to invoke the compiler and pass the command to it like a parameter, where from it would then be converted into machine language and passed to... where exactly?
Okay so that was kind of two questions.
How to run actual code from the command line and
what exactly is happening when a fully compiled program, or converted line of code (presuming these are essentially binary containers at that point), is executed?
Question one takes priority obviously. Unfortunately, I can not find any documentation on it, just a bunch of stuff vaguely related to it.
How to run actual code from the command line
Without delving into the vast amounts of blurriness between them, there are two major categories of language implementations: interpreters and compilers.
With many interpreters (or implementations with implicit compilation, such as V8 JavaScript's jit compiler, or pretty much anything with a repl), running a single line from the command line should be fairly trivial. CPython (the standard implementation of Python) has the -c command option:
$ python -c 'print("Hello, world!")'
Hello, world!
Language implementations with explicit compilation steps will tend to be decidedly less simple. In particular, the compiler would need to either accept source either from directly out of the argument list, or from standard input (via piping or redirection). On the output side, your compiler would have to support immediately executing that program, or outputting it to standard out, so that an operating system feature (if it exists) can execute it from a pipe.
To my knowledge, most explicit compilers are not designed with such usage in mind. In such cases, your best bet is to see if there is a REPL available for the language in question, preferably one as compatible with your compiler as possible, or to create (or find) a wrapper that makes it look like your language has a REPL. The wrapper would:
Accept input along the lines of CPython above.
Create a temporary source file behind the scenes with the code to be run and any necessary boilerplate.
Pass that file to the compiler.
Automatically run the resulting executable.
Delete the source file and executable. These may be cleaned up by the operating system later instead, if they're in a temp directory.
From the point of view of the user, this should look pretty similar to the CPython example, as they wouldn't have to interact with or see the compiler or temporary files.

Matlab code after compilation

I am totally a newbie in Matlab
I want to ask that when we write a program in Matlab software or IDE and save it with a
.m (dot m) file and then compile and execute it, then that .m (dot m) file is converted into which file? I want to know this because i heard that matlab is platform independent and i did google this but i got converting matlab file to C, C++ etc
Sorry for the silly question and thanks in advance.
Matlab is an interpreted language. So in most cases there is no persistent intermediate form. However, there is an encrypted intermediate form called pcode and there are also the MATLAB compiler and MATLAB coder which delivers code in other high level languages such as C.
edit:
pcode is not generated automatically and should be platform/version independent. But it's major purpose is to encrypt the code, not to compile it (although, it does some partial compilation). To use pcode, you still need the MATLAB environment installed, so in many ways it acts like interpreted code.
But from your follow-up question I guess you don't quite understand how MATLAB works. The code gets interpreted (although with a bit of Just-In-Time Compilation), so there is no need for a persistent intermediate code file: the actual data structures representing your code are maintained by MATLAB. In contrast to compiled languages, where your development cycle is something like "write code, compile & link, execute", the compilation (actually: interpretation) step is part of the execution, so you end up with "write code, execute" in most of the cases.
Just to give you some intuitive understanding of the difference between a compiler and an interpreter. A compiler translates a high level language to a lower level language (let's say machine code that can be executed by your computer). Afterwards that compiled code (most likely stored in a file) is executed by your computer. An interpreter on the other hand, interprets your high level code piece by piece, determining what machine code corresponds to your high level code during the runtime of the program and immediately executes that machine code. So there is no real need to have a machine code equivalent of your entire program available (so in many cases an interpreter will not store the complete machine code, as that is just wasted effort and space).
You could look at interpretation more or less as a human would interpret code: when you try to manually determine the output of some code, you follow the calculations line by line and keep track of your results. You don't generally translate that entire code into some different form and afterwards execute that code. And since you don't translate the code entirely, there is no need to persistently store the intermediate form.
As I said above: you can use other tools such as MATLAB coder to convert your MATLAB code to other high languages such as C/C++, or you can use the MATLAB compiler to compile your code to executable form that depends on some runtime libraries. But those are only used in very specific cases (e.g. when you have to deploy a MATLAB application on computers/embedded devices without MATLAB, when you need to improve performance of your code, ...)
note: My explanation about compilers and interpreters is a quick comparison of the archetypal interpreter and compiler. Many real-life cases are somewhere in between, e.g. Java generally compiles to (JVM) bytecode which is then interpreted by the JVM and something similar can be said about the .NET languages and its CLR.
Since MATLAB is an interpreter, you can write code and just execute it from the IDE, without compilation.
If you want to deploy your program, you can use the MATLAB compiler to create an stand-alone executable or a shared library that you can use in a C++ project. On Windows, MATLAB code would compile to an .EXE file or a .DLL file, respectively.

Is there some kind of tool to look at the encoding of Intel x86 instructions?

Forgive me if this might be a dumb question but, I'm in an assembly class that was mostly taught using an emulated CPU that was supposed to teach the concepts of assembly code. We haven't even written an Intel program, so I'm trying to adjust. In our emulated CPU, we were able to generate a symbol table file that gave the bytes equivalent for instructions:
http://imgur.com/tw5S8.png
Would I be able to do such a thing with Intel x86 instructions?
Try IDA. It has an option to show binary values of opcodes.
EDIT: Well.. it's a disassembler. Try opening a binary file, and set the number of opcode bytes to show (in Options/General/) to something that is not zero.
If you are looking for an IDE that shows you in real time the opcodes for the instruction you've used, then I don't think you'll find one, because of lack of "market". Can you explain why you need it? Do you want to know just their length, or want to learn them? There is simple pattern for lengths, so by dissasembling many binaries you'll catch it. If it's the opcodes you want.. well, there are lots of them, almost no rules, and practically no use to do it.
I see.. then you have to generate the list file . Your assembler should have an option for that. (for NASM it's -l listfile). Just put any instruction(s) in your .asm file, and generate listing for it. It should contain the binary encoding for each instruction.
First, get Intel Instruction Set Refference, or, better, this link: http://siyobik.info/index.php?module=x86 . There you'll find that most opcodes have several encodings. In your particular case, the bit 1 of the opcode specifies direction, and since both operands are registers, you can toggle the direction and swap the register codes, and the result will be the same. Usually you have this freedom on most register to register arithmetic operations. To check this, try decompiling with IDA this source file:
db 02h, E0h
db 00h, C4h
There is a demo program shipped with fasm.dll which has an editor and hex-viewer: