How does Perl react during compile time and run time? - perl

What my understanding over this would be:
During compile time, an error is one that keeps Perl from being able to parse the file; such as a missing semi-colon.
And a run time error is an error that can not be detected until the code is run; such as a divide by zero error or a call to an undefined subroutine.
As Perl is an interpreted language, will the entire code or script be complied once and then run or will it compile for each and every line and then go for run?
What is the explanation?

The program will be compiled once into an optree. The optree is traversed and executed.
At runtime, it can happen that additional compile phases become necessary. The usual culprits are string eval and delayed/dynamic loading of code units, e.g. require, do.

Perl is an interpreted language, which means that both the compile phase and the run phase happen in sequence every time you attempt to start a script.
Compiled languages on the other hand (C, Pascal, etc) separate out those two phases, and typically have an intermediate phase called linking, which joins together object files and libraries into the final executable file.
In compiled languages, detecting an undefined function can happen in either the compile phase, or the linking phase, depending on how strict the language specification is. In original C calling an undefined function would be found by the linker, but in C++ it would be found by the compiler.
To further complicated matters, some languages such as Java have separate compile and execution phases, but the compilation is actually to an intermediate "byte code", which is then interpreted by a run time system (i.e. the Java Virtual Machine).
Strictly speaking, Perl also uses an intermediate byte code, but the separation of the phases is mostly invisible.

Related

How could I run a single line of code (not script) from command prompt?

Simple question here, just can't seem to pass it google in a way it can understand.
Say I wanted to execute a line of actual programming code (c++ or java or python... etc) like SetCursorPos or printf from the command prompt command line. I vaguely imagine I would have to invoke the compiler and pass the command to it like a parameter, where from it would then be converted into machine language and passed to... where exactly?
Okay so that was kind of two questions.
How to run actual code from the command line and
what exactly is happening when a fully compiled program, or converted line of code (presuming these are essentially binary containers at that point), is executed?
Question one takes priority obviously. Unfortunately, I can not find any documentation on it, just a bunch of stuff vaguely related to it.
How to run actual code from the command line
Without delving into the vast amounts of blurriness between them, there are two major categories of language implementations: interpreters and compilers.
With many interpreters (or implementations with implicit compilation, such as V8 JavaScript's jit compiler, or pretty much anything with a repl), running a single line from the command line should be fairly trivial. CPython (the standard implementation of Python) has the -c command option:
$ python -c 'print("Hello, world!")'
Hello, world!
Language implementations with explicit compilation steps will tend to be decidedly less simple. In particular, the compiler would need to either accept source either from directly out of the argument list, or from standard input (via piping or redirection). On the output side, your compiler would have to support immediately executing that program, or outputting it to standard out, so that an operating system feature (if it exists) can execute it from a pipe.
To my knowledge, most explicit compilers are not designed with such usage in mind. In such cases, your best bet is to see if there is a REPL available for the language in question, preferably one as compatible with your compiler as possible, or to create (or find) a wrapper that makes it look like your language has a REPL. The wrapper would:
Accept input along the lines of CPython above.
Create a temporary source file behind the scenes with the code to be run and any necessary boilerplate.
Pass that file to the compiler.
Automatically run the resulting executable.
Delete the source file and executable. These may be cleaned up by the operating system later instead, if they're in a temp directory.
From the point of view of the user, this should look pretty similar to the CPython example, as they wouldn't have to interact with or see the compiler or temporary files.

What are the avaialble compilers/interpreters for Perl 5?

Like C where gcc, borland and many more compilers are available, I am wondering whether any other Compiler/Interpreters are available for Perl 5?
From my reading, I understand there was perlcc which compiled the code into B:OP format and then interpreter was used to convert the optree to machine executable.
Ignore perlcc; it is no longer part of Perl, and will only confuse you*.
Perl is an interpreted language. Upon startup, the Perl interpreter parses the source code of a script and executes it immediately. While there is an intermediate representation (the optree), it is purely in memory, and is not reused.
There is only one Perl interpreter. There are no alternate implementations.
(If you're curious: perlcc worked by storing the optree as constant data in an executable which linked against the Perl interpreter. This was a dubious optimization; it didn't actually save much startup time, didn't affect runtime at all, and broke many scripts. It wasn't actually transforming the Perl script to C.)

Racket Interactive vs Compiled Performance

Whether or not I compile a Racket program seems to make no difference to the runtime performance.
Is it just the loading of the file initially that is improved by compilation? In other words, does running racket src.rkt do a jit compilation on the fly, which is why I see no difference in compiling vs interactive?
Even for tight loops of integer arithmetic, where I thought some difference would occur, the profile times are equivalent whether or not I previously did a raco make.
Am I missing something simple?
PS, I notice that I can run racket against the source file (.rkt) or .zo file. Does racket automatically use the .zo if one is found that corresponds to the .rkt file, or does the .zo file need to be used explicitly? Either way, it makes no difference to the performance numbers I'm seeing.
Yes, you're right.
Racket compiles code in two stages: first, the code is compiled into bytecode form, and then when it runs it gets jitted into machine code. When you compile a file, you're basically creating the bytecode which saves on re-compiling it later. Since that's usually not something that takes a lot of time for small pieces of code, you won't see any noticeable difference in runtimes. For an extreme example, you can delete all *.zo files in the collection tree and start DrRacket -- it will take a lot of time to start since there's a ton of code, but once it does start, it would run almost as usual. (It would also be slow to click "run" since that will reload and recompile some files.) Another concern for bigger pieces of code is that the compilation process can make memory consumption higher, but that's also not an issue with smaller pieces of code.
See also the Performace chapter in the guide for hints on how to improve performance.
Racket will always compile your code, regardless of whether it is run interactively at the REPL or run from the command-line. Here is the section in the guide that explains it. In interactive mode, the compiler turns every expression/definition into bytecode in memory and executes that. Otherwise, the compilers outputs the bytecode to zo files.
Note: Eli replied at the same time I did. See his response for more details.

Matlab code after compilation

I am totally a newbie in Matlab
I want to ask that when we write a program in Matlab software or IDE and save it with a
.m (dot m) file and then compile and execute it, then that .m (dot m) file is converted into which file? I want to know this because i heard that matlab is platform independent and i did google this but i got converting matlab file to C, C++ etc
Sorry for the silly question and thanks in advance.
Matlab is an interpreted language. So in most cases there is no persistent intermediate form. However, there is an encrypted intermediate form called pcode and there are also the MATLAB compiler and MATLAB coder which delivers code in other high level languages such as C.
edit:
pcode is not generated automatically and should be platform/version independent. But it's major purpose is to encrypt the code, not to compile it (although, it does some partial compilation). To use pcode, you still need the MATLAB environment installed, so in many ways it acts like interpreted code.
But from your follow-up question I guess you don't quite understand how MATLAB works. The code gets interpreted (although with a bit of Just-In-Time Compilation), so there is no need for a persistent intermediate code file: the actual data structures representing your code are maintained by MATLAB. In contrast to compiled languages, where your development cycle is something like "write code, compile & link, execute", the compilation (actually: interpretation) step is part of the execution, so you end up with "write code, execute" in most of the cases.
Just to give you some intuitive understanding of the difference between a compiler and an interpreter. A compiler translates a high level language to a lower level language (let's say machine code that can be executed by your computer). Afterwards that compiled code (most likely stored in a file) is executed by your computer. An interpreter on the other hand, interprets your high level code piece by piece, determining what machine code corresponds to your high level code during the runtime of the program and immediately executes that machine code. So there is no real need to have a machine code equivalent of your entire program available (so in many cases an interpreter will not store the complete machine code, as that is just wasted effort and space).
You could look at interpretation more or less as a human would interpret code: when you try to manually determine the output of some code, you follow the calculations line by line and keep track of your results. You don't generally translate that entire code into some different form and afterwards execute that code. And since you don't translate the code entirely, there is no need to persistently store the intermediate form.
As I said above: you can use other tools such as MATLAB coder to convert your MATLAB code to other high languages such as C/C++, or you can use the MATLAB compiler to compile your code to executable form that depends on some runtime libraries. But those are only used in very specific cases (e.g. when you have to deploy a MATLAB application on computers/embedded devices without MATLAB, when you need to improve performance of your code, ...)
note: My explanation about compilers and interpreters is a quick comparison of the archetypal interpreter and compiler. Many real-life cases are somewhere in between, e.g. Java generally compiles to (JVM) bytecode which is then interpreted by the JVM and something similar can be said about the .NET languages and its CLR.
Since MATLAB is an interpreter, you can write code and just execute it from the IDE, without compilation.
If you want to deploy your program, you can use the MATLAB compiler to create an stand-alone executable or a shared library that you can use in a C++ project. On Windows, MATLAB code would compile to an .EXE file or a .DLL file, respectively.

"All programs are interpreted". How?

A computer scientist will correctly explain that all programs are
interpreted and that the only question is at what level. --perlfaq
How are all programs interpreted?
A Perl program is a text file read by the perl program which causes the perl program to follow a sequence of actions.
A Java program is a text file which has been converted into a series of byte codes which are then interpreted by the java program to follow a sequence of actions.
A C program is a text file which is converted via the C compiler into an assembly program which is converted into machine code by the assembler. The machine code is loaded into memory which causes the CPU to follow a sequence of actions.
The CPU is a jumble of transistors, resistors, and other electrical bits which is laid out by hardware engineers so that when electrical impulses are applied, it will follow a sequence of actions as governed by the laws of physics.
Physicists are currently working out what makes those rules and how they are interpreted.
Essentially, every computer program is interpreted by something else which converts it into something else which eventually gets translated into how the electrons in your local neighborhood fly around.
EDIT/ADDED: I know the above is a bit tongue-in-cheek, so let me add a slightly less goofy addition:
Interpreted languages are where you can go from a text file to something running on your computer in one simple step.
Compiled languages are where you have to take an extra step in the middle to convert the language text into machine- or byte-code.
The latter can easily be easily be converted into the former by a simple transformation:
Make a program called interpreted-c, which can take one or more C files and can run a program which doesn't take any arguments:
#!/bin/sh
MYEXEC=/tmp/myexec.$$
gcc -o $MYEXEC ${1+"$#"} && $MYEXEC
rm -f $MYEXEC
Now which definition does your C program fall into? Compare & contrast:
$ perl foo.pl
$ interpreted-c foo.c
Machine code is interpreted by the processor at runtime, given that the same machine code supplied to a processor of a certain arch (x86, PowerPC etc), should theoretically work the same regardless of the specific model's 'internal wiring'.
EDIT:
I forgot to mention that an arch may add new instructions for things like accessing new registers, in which case code written to use it won't work on older processors in the range. Much like when you try to use an old version of a library and then try to use capabilities only found in newer libraries.
Example: many Linux distros are released as 686 only, despite the fact it's in the 'x86 family'. This is due to the use of new instructions.
My first thought was too look inside the CPU — see below — but that's not right. The answer is much much simpler than that.
A high-level description of a CPU is:
1. execute the current op
2. grab the next op
3. goto 1
Compare it to Perl's interpreter:
while ((PL_op = op = op->op_ppaddr(aTHX))) {
}
(Yeah, that's the whole thing.)
There can be no doubt that the CPU is an interpreter.
It just goes to show how useless it is to classify something is interpreted or not.
Original answer:
Even at the CPU level, programs get rewritten into simpler instructions to allow the CPU to execute more them more quickly. This is done by changing the order in which they are executed and executing them in parallel. For example, Intel's Hyperthreading.
Even deeper, each instruction is considered a program of its own, one that routes electronic signals. See microcode.
The Levels of interpretions are really easy to explain:
2: Runtimelanguage (CLR, Java Runtime...) & Scriptlanguage (Python, Ruby...)
1: Assemblies
0: Binary Code
Edit: I changed the level of Scriptinglanguages to the same level of Runtimelanguages. Thank's for the hint. :-)
I can write a Game Boy interpreter that works similarly to how the Java Virtual Machine works, treating the z80 machine instructions as byte code. Assuming the original was written in C1, does that mean C suddenly became an interpreted language just because I used it like one?
From another angle, gcc can compile C into machine code for a number of different processors. There's no reason the target machine has to be the same as the machine you're compiling on. In fact, this is a common way to compile C code for AVRs and other microcontrollers.
As a matter of abstraction, the compiler's job is to translate flat text into a structure, then translate that structure into something that can be executed somewhere. Whatever is doing the execution may have its own levels of breaking out the structure before really executing it.
A lot of power becomes available once you start thinking along these lines.
A good book on this is Structure and Interpretation of Computer Programs. Even if you only get through the first chapter (or half of the first chapter), I think you'll learn a lot.
1 I think most Game Boy stuff was hand coded ASM, but the principle remains.