Does choosing a programming language decide performance when all of it is compiled to some 1's and 0's
Eg: printf (in C) vs cout (C++) vs print (in Python)
Do all of the above have same binary compiled code ?
Appreciate any help in understanding this concept of programming language and role on hardware in detail! Thanks in advance
The choice of programming language can have many impacts on the performance of your code, how portable it is, the comparability and among other things, how easily the objective can be put into code. To answer you question directly, C and C++ would likely produce the 'same binary' when printing an output, if they were both done for the same target environment. Python is different because it is an interpreted language, meaning the code is read by a program written in code native to the architecture and acted upon accordingly. Python is something of an edge case in this regard because it is technically compiled at execution time (and can be before distribution) but into an intermediate code similar in principle to Java byte code that is only understood by the Python interpreter.
The difference you bring up between lower language's like C and higher ones like Java, Python and even JavaScript is the nature of their execution being done by native hardware or by the interpreter. Language's running on bare metal are generally understood to be faster than those on interpreters as the interpreter takes time to understand the code and uses it's own system resources. Java tends to break this rule because it's interpreter is a full virtual machine that understands very simple byte code, making it competitive in speed to language's like C.
To what kind of binary code they are compiled depends on the compiler. For C and C++ there are dozens of different compilers which might generate different binary code. Besides that, most compilers even have optimization flags that influence the generated binary code a lot.
Python isn't even directly compiled into "machine code", it's compiled into bytecode for a python interpreter. The Python interpreter itself is a program that runs on the machine, then reads the python-bytecode and executes it probably by internally calling predefined functions (that already exist in machine-code)
Related
I'm looking into installing a disassembler (or decompiler) on my Linux Mint 17.3 OS and I wanted to know what the difference is between a disassembler and a decompiler. I have a rough idea of what they are (the names are fairly self-explanatory), but they are still a bit confusing.
I've read that a disassembler turns a program into assembly language, which I don't know, so it seems kind of useless to me. I've also read that a decompiler turns a 'binary file' into its source code. What exactly is a binary file?
Apparently, decompilers cannot decompile to C, only Python and other similar languages. So how can I turn a program into its original C source code?
A disassembler is a pretty straightforward application that transfers machine code into assembly language statements - This activity is the reverse operation that an assembler program does and is straightforward because there is a strict one-to-one relationship between machine code and assembly. A disassembler aims at a specific CPU. The original assembler that was used to create the executable is only of minor relevance.
A decompiler aims at recreating a compiled high-level language program from machine code into its original format - Thus trying the reverse operation of a C or Forth (popular languages for which de-compilers exist) compiler. Because there are so many high-level languages and thus so many ways in how original high-level language constructs could be expressed in machine code (even a lot of different strategies for the same language and construct, even in the same compiler, and even different strategies depending on the compiler mode and situation), this operation is much more complex and very dependent on the original compiler (and maybe even the command line that was used, it's chosen optimization level and also the used version).
Even if all that fits, most of the work of a decompiler is educated guessing and will most probably never reach a point where it can reconstruct the original program in its source code form 100% - It will rather end up with a version of source code that could have been the original program.
I know there is no such thing, strictly speaking, as a compiled or interpreted language.
But, generally speaking, is LISP used to write scripts like Python, bash script, and batch script?
Or is it a general purpose programming language like C++, JAVA, and C#?
Can anyone explain this in simple terms?
Early versions of Lisp programming language and Dartmouth BASIC would be examples interpreter language (parse the source code and perform its behavior directly.). However, Common lisp (Current version) is a compiler language.
Note that most Lisp compilers are not Just In Time compilers. You as a programmer can invoke the compiler, for example in Common Lisp with the functions COMPILE and COMPILE-FILE. Then Lisp code gets compiled.
Additionally most Lisp systems with both a compiler and an interpreter allow the execution of interpreted and compiled code to be freely mixed.
For more details check here
Lisp is a compiled general purpose language, in its modern use.
To clarify:
“LISP” is nowadays understood as “Common Lisp”
Common Lisp is an ANSI Standard
There are several implementations of Common Lisp, both free and commercial
Code is usually compiled, then loaded into an image. The order in which the individual parts/files of an entire system are compiled and loaded is usually defined through a system definition facility (which mostly means ASDF nowadays).
Most implementations also provide a means for loading source code when started. Example:
sbcl --load 'foo.lisp'
This makes it also possible to use lisp source files as “scripts”, even though they will very likely be compiled before execution.
Traditionally, LISP can be interpreted or compiled -- with some of each running at the same time. Compilation, in some cases, would be to a virtual machine like JAVA.
LISP is a general purpose programming language, but rarely used as such anymore. In the days of microcoded LISP machines, the entire operating system, including things like network, graphics and printer drivers, were all written in LISP itself. The very first IMAP mail client, for example, was written entirely in LISP.
The unusual syntax likely makes other programming languages, like Python, more attractive. But if one looks carefully, you can find LISP-inspired elements in popular languages like Perl.
I wish that John McCarthy was still alive, but...
From LISP 1.5 Programmer's Manual :
LISP can interpret and execute programs written in the form of S-
expressions. Thus, like machine language, and unlike most other higher
level languages, it can be used to generate programs for further
execution.
I need more clarification about how machine language can used to generate programs and how Lisp can do it?
All that is saying is that machine code can directly write machine instructions to memory and jump to those instructions to execute them; this is the basis of many attack vectors to break into software, in fact.
The point is, when you're writing machine code, it's easy to generate machine code. But when you're writing in a compiled language like C, you can't just generate C code at run time and then execute it - unless your program includes a C compiler.
Lisp - and, these days, many other languages, especially "scripting languages" like Perl, Python, Ruby, Tcl, Javascript, and command shells - have the ability to execute code that is generated at runtime. In Lisp, since code and data have the same structure, this is usually less work than it is in the other languages, where the code to be evaluated is generally a string that has to be parsed. (Though Perl has the ability to eval a block instead of a string, which lets the compiler do the parsing ahead of time for literal code.)
A machine language can alter itself while running. The last assembly programming i did was for MS DOS and resident program that i used to run before testing other programs. When my program misbehaved, a keystroke switched to the resident program and could peek into the running program and alter it directly before resuming. It was quite handy since I didn't have a debugger.
LISP had this from the very beginning since it was originally interpreted. You could change the definition of a function while you were running and the whole langugage was always available at runtime, even eval and define. When it started getting compiled it wasn't compiled like Algol, but partially, allowing for interpreted and compiled code to intermix at the same time. The fact that its code structure was list structure and that symbols are a data type contributed to this.
Last interview I saw with McCarthy he was asked about what he thought of modern programming languages (Not LISP family but the Algol family language Ruby, that is said to be influenced by LISP), and before answering he asked if they could represent code as data (like list structure). Since it didn't, Ruby is still behind what LISP was in the 60s in his opinion.
Many new programming languages are emerging in the Algol family and some of the most promising ones, like Perl6 and Nemerle, are getting closer to the features LISP had in the 60s.
Machine language programs can fill memory regions with arbitrary bytes. Then they can just jump to the start of such region which will thus get executed right away.
Lisp language programs can easily create arbitrary S-expressions in memory, using cons. Then they can just call eval on these S-expressions to evaluate (interpret) them.
High level languages programs can easily fill memory regions with characters representing new code in the language's syntax. But they can not run such a code.
I am totally a newbie in Matlab
I want to ask that when we write a program in Matlab software or IDE and save it with a
.m (dot m) file and then compile and execute it, then that .m (dot m) file is converted into which file? I want to know this because i heard that matlab is platform independent and i did google this but i got converting matlab file to C, C++ etc
Sorry for the silly question and thanks in advance.
Matlab is an interpreted language. So in most cases there is no persistent intermediate form. However, there is an encrypted intermediate form called pcode and there are also the MATLAB compiler and MATLAB coder which delivers code in other high level languages such as C.
edit:
pcode is not generated automatically and should be platform/version independent. But it's major purpose is to encrypt the code, not to compile it (although, it does some partial compilation). To use pcode, you still need the MATLAB environment installed, so in many ways it acts like interpreted code.
But from your follow-up question I guess you don't quite understand how MATLAB works. The code gets interpreted (although with a bit of Just-In-Time Compilation), so there is no need for a persistent intermediate code file: the actual data structures representing your code are maintained by MATLAB. In contrast to compiled languages, where your development cycle is something like "write code, compile & link, execute", the compilation (actually: interpretation) step is part of the execution, so you end up with "write code, execute" in most of the cases.
Just to give you some intuitive understanding of the difference between a compiler and an interpreter. A compiler translates a high level language to a lower level language (let's say machine code that can be executed by your computer). Afterwards that compiled code (most likely stored in a file) is executed by your computer. An interpreter on the other hand, interprets your high level code piece by piece, determining what machine code corresponds to your high level code during the runtime of the program and immediately executes that machine code. So there is no real need to have a machine code equivalent of your entire program available (so in many cases an interpreter will not store the complete machine code, as that is just wasted effort and space).
You could look at interpretation more or less as a human would interpret code: when you try to manually determine the output of some code, you follow the calculations line by line and keep track of your results. You don't generally translate that entire code into some different form and afterwards execute that code. And since you don't translate the code entirely, there is no need to persistently store the intermediate form.
As I said above: you can use other tools such as MATLAB coder to convert your MATLAB code to other high languages such as C/C++, or you can use the MATLAB compiler to compile your code to executable form that depends on some runtime libraries. But those are only used in very specific cases (e.g. when you have to deploy a MATLAB application on computers/embedded devices without MATLAB, when you need to improve performance of your code, ...)
note: My explanation about compilers and interpreters is a quick comparison of the archetypal interpreter and compiler. Many real-life cases are somewhere in between, e.g. Java generally compiles to (JVM) bytecode which is then interpreted by the JVM and something similar can be said about the .NET languages and its CLR.
Since MATLAB is an interpreter, you can write code and just execute it from the IDE, without compilation.
If you want to deploy your program, you can use the MATLAB compiler to create an stand-alone executable or a shared library that you can use in a C++ project. On Windows, MATLAB code would compile to an .EXE file or a .DLL file, respectively.
A computer scientist will correctly explain that all programs are
interpreted and that the only question is at what level. --perlfaq
How are all programs interpreted?
A Perl program is a text file read by the perl program which causes the perl program to follow a sequence of actions.
A Java program is a text file which has been converted into a series of byte codes which are then interpreted by the java program to follow a sequence of actions.
A C program is a text file which is converted via the C compiler into an assembly program which is converted into machine code by the assembler. The machine code is loaded into memory which causes the CPU to follow a sequence of actions.
The CPU is a jumble of transistors, resistors, and other electrical bits which is laid out by hardware engineers so that when electrical impulses are applied, it will follow a sequence of actions as governed by the laws of physics.
Physicists are currently working out what makes those rules and how they are interpreted.
Essentially, every computer program is interpreted by something else which converts it into something else which eventually gets translated into how the electrons in your local neighborhood fly around.
EDIT/ADDED: I know the above is a bit tongue-in-cheek, so let me add a slightly less goofy addition:
Interpreted languages are where you can go from a text file to something running on your computer in one simple step.
Compiled languages are where you have to take an extra step in the middle to convert the language text into machine- or byte-code.
The latter can easily be easily be converted into the former by a simple transformation:
Make a program called interpreted-c, which can take one or more C files and can run a program which doesn't take any arguments:
#!/bin/sh
MYEXEC=/tmp/myexec.$$
gcc -o $MYEXEC ${1+"$#"} && $MYEXEC
rm -f $MYEXEC
Now which definition does your C program fall into? Compare & contrast:
$ perl foo.pl
$ interpreted-c foo.c
Machine code is interpreted by the processor at runtime, given that the same machine code supplied to a processor of a certain arch (x86, PowerPC etc), should theoretically work the same regardless of the specific model's 'internal wiring'.
EDIT:
I forgot to mention that an arch may add new instructions for things like accessing new registers, in which case code written to use it won't work on older processors in the range. Much like when you try to use an old version of a library and then try to use capabilities only found in newer libraries.
Example: many Linux distros are released as 686 only, despite the fact it's in the 'x86 family'. This is due to the use of new instructions.
My first thought was too look inside the CPU — see below — but that's not right. The answer is much much simpler than that.
A high-level description of a CPU is:
1. execute the current op
2. grab the next op
3. goto 1
Compare it to Perl's interpreter:
while ((PL_op = op = op->op_ppaddr(aTHX))) {
}
(Yeah, that's the whole thing.)
There can be no doubt that the CPU is an interpreter.
It just goes to show how useless it is to classify something is interpreted or not.
Original answer:
Even at the CPU level, programs get rewritten into simpler instructions to allow the CPU to execute more them more quickly. This is done by changing the order in which they are executed and executing them in parallel. For example, Intel's Hyperthreading.
Even deeper, each instruction is considered a program of its own, one that routes electronic signals. See microcode.
The Levels of interpretions are really easy to explain:
2: Runtimelanguage (CLR, Java Runtime...) & Scriptlanguage (Python, Ruby...)
1: Assemblies
0: Binary Code
Edit: I changed the level of Scriptinglanguages to the same level of Runtimelanguages. Thank's for the hint. :-)
I can write a Game Boy interpreter that works similarly to how the Java Virtual Machine works, treating the z80 machine instructions as byte code. Assuming the original was written in C1, does that mean C suddenly became an interpreted language just because I used it like one?
From another angle, gcc can compile C into machine code for a number of different processors. There's no reason the target machine has to be the same as the machine you're compiling on. In fact, this is a common way to compile C code for AVRs and other microcontrollers.
As a matter of abstraction, the compiler's job is to translate flat text into a structure, then translate that structure into something that can be executed somewhere. Whatever is doing the execution may have its own levels of breaking out the structure before really executing it.
A lot of power becomes available once you start thinking along these lines.
A good book on this is Structure and Interpretation of Computer Programs. Even if you only get through the first chapter (or half of the first chapter), I think you'll learn a lot.
1 I think most Game Boy stuff was hand coded ASM, but the principle remains.