LISP 1.5: How is Lisp like a machine language?

I wish John McCarthy were still alive, but...
From the LISP 1.5 Programmer's Manual:
LISP can interpret and execute programs written in the form of S-expressions. Thus, like machine language, and unlike most other higher level languages, it can be used to generate programs for further execution.
I need more clarification about how machine language can be used to generate programs, and how Lisp can do it.

All that is saying is that machine code can directly write machine instructions to memory and jump to those instructions to execute them; this is the basis of many attack vectors to break into software, in fact.
The point is, when you're writing machine code, it's easy to generate machine code. But when you're writing in a compiled language like C, you can't just generate C code at run time and then execute it - unless your program includes a C compiler.
Lisp - and, these days, many other languages, especially "scripting languages" like Perl, Python, Ruby, Tcl, JavaScript, and command shells - can execute code that is generated at runtime. In Lisp, since code and data have the same structure, this is usually less work than in the other languages, where the code to be evaluated is generally a string that has to be parsed. (Perl, though, can eval a block instead of a string, which lets the compiler parse literal code ahead of time.)
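Here is the string-versus-structure point made concrete in one of the languages named above. Python's eval normally takes a string. Its standard ast module also lets a program build the syntax tree directly and compile that, which is closer to what Lisp does with S-expressions. This is a minimal sketch. The expression and the helper name build_sum_expr are illustrative, not taken from the answer.

```python
import ast


def build_sum_expr(values):
    """Build the tree for v0 + v1 + ... + vn directly, with no source text.

    values must be non-empty (hypothetical helper, for illustration only).
    """
    tree = ast.Constant(value=values[0])
    for v in values[1:]:
        tree = ast.BinOp(left=tree, op=ast.Add(), right=ast.Constant(value=v))
    return ast.Expression(body=tree)


# Route 1: generate source *text* and let eval parse it at runtime.
source = " + ".join(str(v) for v in [1, 2, 3, 4])
from_string = eval(source)

# Route 2: generate the *tree* and hand it straight to the compiler.
tree = ast.fix_missing_locations(build_sum_expr([1, 2, 3, 4]))
code = compile(tree, "<generated>", "eval")
from_tree = eval(code)

print(from_string, from_tree)  # 10 10
```

Route 2 skips parsing, but the Python tree is still a special object type rather than ordinary lists. That is the gap the answer points to when it says Lisp code and data share one structure.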

A machine language program can alter itself while running. The last assembly programming I did was for MS-DOS: a resident program that I would load before testing other programs. When a program misbehaved, a keystroke switched to the resident program, which could peek into the running program and alter it directly before resuming. It was quite handy, since I didn't have a debugger.
LISP had this from the very beginning, since it was originally interpreted. You could change the definition of a function while the program was running, and the whole language was always available at runtime, even eval and define. When LISP started being compiled, it wasn't compiled wholesale like Algol, but partially, so interpreted and compiled code could be mixed in the same running program. The fact that its code was list structure, and that symbols are a data type, contributed to this.
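Python gives only a loose analogue of redefining a function while the program runs. It looks up a global function name each time the function is called, so a later redefinition takes effect immediately. This is a minimal sketch. The names greet and main are hypothetical. The last step uses exec on text built at runtime to stand in for LISP having eval and define always available.

```python
def greet(name):
    return f"Hello, {name}"


def main():
    # `greet` is looked up in the module's globals on every call,
    # so whatever definition is current at call time is the one used.
    return greet("McCarthy")


before = main()


# Redefine the function while the program is running.
def greet(name):  # noqa: F811 (the redefinition is the point of the example)
    return f"Hi again, {name}"


after = main()

# The new definition can even arrive as text assembled at runtime.
exec("def greet(name):\n    return 'Yo, ' + name", globals())
latest = main()

print(before)  # Hello, McCarthy
print(after)   # Hi again, McCarthy
print(latest)  # Yo, McCarthy
```

The analogy has limits. In Python the new definition still arrives as text or as a compiled function object. In LISP it can be built as ordinary list structure.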
In the last interview I saw with McCarthy, he was asked what he thought of modern programming languages (not the LISP family, but Ruby, an Algol-family language that is said to be influenced by LISP). Before answering, he asked whether they could represent code as data (like list structure). Since Ruby can't, in his opinion it is still behind where LISP was in the 60s.
Many new programming languages are emerging in the Algol family, and some of the most promising ones, like Perl 6 and Nemerle, are getting closer to the features LISP had in the 60s.

Machine language programs can fill memory regions with arbitrary bytes. Then they can just jump to the start of such a region, which will then be executed right away.
Lisp language programs can easily create arbitrary S-expressions in memory, using cons. Then they can just call eval on these S-expressions to evaluate (interpret) them.
Programs in most compiled high-level languages can easily fill memory regions with characters representing new code in the language's syntax, but they cannot run such code without a compiler or interpreter for it.
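The Lisp half of this contrast can be sketched in Python. Python is used only because it is easy to run; nothing here reflects how LISP 1.5 was actually implemented. A cons cell is modeled as a (car, cdr) tuple, and None stands in for NIL. The names cons, evaluate, and make_sum are hypothetical. The program builds an S-expression at runtime, one cons at a time, and then hands it to a tiny evaluator, much as a Lisp program would build a list and call eval on it.

```python
import math
from typing import Any, Optional, Tuple

# Hypothetical model: a cons cell is a (car, cdr) tuple; None stands for NIL.
Cons = Tuple[Any, Any]


def cons(car: Any, cdr: Any) -> Cons:
    return (car, cdr)


def to_pylist(cell: Optional[Cons]) -> list:
    """Walk a chain of cons cells and collect each car."""
    items = []
    while cell is not None:
        car, cell = cell
        items.append(car)
    return items


# Hypothetical primitive table: operator symbol -> Python function.
PRIMITIVES = {
    "+": lambda *args: sum(args),
    "*": lambda *args: math.prod(args),
}


def evaluate(expr: Any) -> Any:
    """A tiny eval: numbers evaluate to themselves, lists are (op arg ...)."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, tuple):
        op, *args = to_pylist(expr)
        return PRIMITIVES[op](*(evaluate(a) for a in args))
    raise ValueError(f"cannot evaluate {expr!r}")


def make_sum(n: int) -> Cons:
    """Generate the S-expression (+ 1 2 ... n) at runtime, one cons at a time."""
    tail = None
    for i in range(n, 0, -1):
        tail = cons(i, tail)
    return cons("+", tail)


# (+ 1 (* 2 3)), built by hand with cons
expr = cons("+", cons(1, cons(cons("*", cons(2, cons(3, None))), None)))
print(evaluate(expr))          # 7
print(evaluate(make_sum(10)))  # 55
```

A real Lisp evaluator also handles symbols, special forms such as quote and lambda, and environments. The point here is only that the generated program is ordinary data until evaluate is called on it.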

Related

What is the difference and the relation between the Lisp interpreter and the Lisp image? Can they be used as synonyms?

I noticed some people using the terms as if they were synonyms.
For instance, in the same scenario, I heard "add this function to the Lisp image by evaluating it" and "eval this function into the Lisp interpreter to use it later".
However, I am not sure this usage is technically precise, hence the question.
These are two orthogonal concepts. Let’s start from the usually comprehensive Common Lisp Glossary:
Lisp image n. a running instantiation of a Common Lisp implementation. A Lisp image is characterized by a single address space in which any object can directly refer to any another in conformance with this specification, and by a single, common, global environment.
So the key idea is that an image is a set of mutually referring Lisp objects (functions and data) that can be “called” or “accessed” during the execution of a program.
The way in which a Common Lisp program is executed depends instead on the way in which the system is implemented. It could be executed by compilation to machine language, for instance, or through some form of interpretation (or even a mix of the two). So a Lisp interpreter is just one particular way of implementing the language (and current Common Lisp systems implement it in many different ways).
Image
"Image" is a file on disk.
"Add a function to the image"
means evaluate the function and save the image, so the function is immediately available on the next invocation.
REPL
"Interpreter" is (usually) the wrong level of abstraction; one should use "REPL" instead. E.g., SBCL by default compiles everything, even forms typed at the REPL (its interpreter is used only if sb-ext:*evaluator-mode* is set to :interpret), but that detail is not relevant to this topic.
"eval this function into the Lisp interpreter to use it later"
means evaluate the function in the current REPL and use it in the same process (i.e., it is available until Lisp is restarted).
An image is a copy of a Lisp heap written to disk (or other secondary storage). The Lisp heap is the region of the computer's RAM where Lisp stores its data. To write a Lisp heap to an image, the running Lisp is stopped and the memory is dumped to disk. Then the Lisp is either resumed or quit.
The image can be used to restore the heap upon starting a new Lisp. That's usually faster than starting a fresh Lisp and then loading the corresponding software.
A Lisp interpreter is a program which executes Lisp programs from source. Many Lisp implementations don't use an interpreter, but they execute compiled Lisp code, typically Lisp code compiled to native machine code.

Impact of choosing a programming language on the OS performance

Does the choice of programming language decide performance when everything is compiled to 1's and 0's anyway?
E.g.: printf (in C) vs cout (in C++) vs print (in Python)
Do all of the above compile to the same binary code?
I'd appreciate any help understanding this concept of programming languages and their role on the hardware, in detail! Thanks in advance.
The choice of programming language can have many impacts: on the performance of your code, on how portable and compatible it is, and on how easily the objective can be put into code, among other things. To answer your question directly, C and C++ are both compiled ahead of time to native machine code for the target environment. However, printf and cout go through different libraries (C standard I/O versus C++ iostreams), so the resulting binaries would not be identical. Python is different because it is an interpreted language. Its code is read by a program written in code native to the architecture, which acts on it accordingly. Python is something of an edge case in this regard because it is technically compiled at execution time (and can be compiled before distribution). It is compiled into an intermediate bytecode, similar in principle to Java bytecode, that only the Python interpreter understands.
The difference you bring up between lower-level languages like C and higher-level ones like Java, Python and even JavaScript is whether they are executed by the native hardware or by an interpreter. Languages running on bare metal are generally understood to be faster than those running on interpreters, since the interpreter takes time to understand the code and uses its own system resources. Java tends to break this rule. Its virtual machine runs a compact bytecode and compiles frequently executed parts of it to native machine code at run time (just-in-time compilation), making it competitive in speed with languages like C.
To what kind of binary code they are compiled depends on the compiler. For C and C++ there are dozens of different compilers which might generate different binary code. Besides that, most compilers even have optimization flags that influence the generated binary code a lot.
Python isn't even compiled directly into "machine code"; it's compiled into bytecode for a Python interpreter. The Python interpreter itself is a program that runs on the machine, reads the Python bytecode and executes it, probably by internally calling predefined functions that already exist in machine code.
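You can observe the bytecode step described above with CPython's standard dis module, which disassembles what the built-in compile() produces. This is a minimal sketch assuming CPython. The exact opcode names printed differ between CPython versions, and other implementations such as PyPy work differently.

```python
import dis

# Compile a one-line Python program to a code object; nothing runs yet.
code = compile('print("hello")', "<example>", "exec")

# The result is bytecode for CPython's virtual machine,
# not native machine instructions for the CPU.
print(type(code.co_code))  # <class 'bytes'>
print(code.co_names)       # ('print',)

# Human-readable listing; opcode names vary between CPython versions.
dis.dis(code)
```

The interpreter loop, which is itself native machine code, reads these bytes and dispatches on each opcode. That is the extra layer the answers above contrast with C and C++.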

The concept of Self-Hosting

So I'm developing a small programming language, and am trying to grasp around the concept of "Self-Hosting".
Wikipedia states:
The first self-hosting compiler (excluding assemblers) was written for Lisp by Hart and Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.
From this, I understand that someone had a Lisp interpreter (let's say written in Python).
The Python program then reads a Lisp program which in turn can also read Lisp programs.
By the term, "Self-Hosting", this surely can't mean the Python program can cease to be of use, because removing that would remove the ability to run the Lisp program which reads other Lisp programs!
So by this, how does a program become able to host itself directly on the OS? Maybe I'm just not understanding it correctly.
In this case, the term self-hosting applies to the Lisp compiler they wrote, not the interpreter.
The Python Lisp interpreter (as in your example) would take Lisp source as input, and execute it directly.
The Lisp compiler (written in Lisp) can take any Lisp source as input and generate a native machine binary[1] as output (which could then run without an interpreter).
With those two pieces, eliminating Python becomes feasible. The process would go as follows:
python.exe lispinterpret.py lispcompiler.lisp -i lispcompiler.lisp -o lispcompiler.exe
We ask Python to interpret a Lisp program from source (lispcompiler.lisp), and we pass lispcompiler.lisp itself as input. lispcompiler.lisp then writes lispcompiler.exe, a native machine binary that doesn't depend on Python.
The next time you want to compile the compiler, the command is:
lispcompiler.exe -i lispcompiler.lisp -o lispcompiler2.exe
And you will have a new compiler without the use of Python.
[1] Or you could generate assembly code, which is passed to an assembler.

Is LISP a compiled or interpreted language?

I know there is no such thing, strictly speaking, as a compiled or interpreted language.
But, generally speaking, is LISP used to write scripts like Python, bash script, and batch script?
Or is it a general-purpose programming language like C++, Java, and C#?
Can anyone explain this in simple terms?
Early versions of the Lisp programming language (like many early BASIC dialects) are examples of interpreted languages: the interpreter parses the source code and performs its behavior directly. However, Common Lisp (the current version) is normally a compiled language.
Note that most Lisp compilers are not Just In Time compilers. You as a programmer can invoke the compiler, for example in Common Lisp with the functions COMPILE and COMPILE-FILE. Then Lisp code gets compiled.
Additionally most Lisp systems with both a compiler and an interpreter allow the execution of interpreted and compiled code to be freely mixed.
Lisp is a compiled general purpose language, in its modern use.
To clarify:
“LISP” is nowadays understood as “Common Lisp”
Common Lisp is an ANSI Standard
There are several implementations of Common Lisp, both free and commercial
Code is usually compiled, then loaded into an image. The order in which the individual parts/files of an entire system are compiled and loaded is usually defined through a system definition facility (which mostly means ASDF nowadays).
Most implementations also provide a means for loading source code when started. Example:
sbcl --load 'foo.lisp'
This makes it also possible to use lisp source files as “scripts”, even though they will very likely be compiled before execution.
Traditionally, LISP can be interpreted or compiled, with some of each running at the same time. Compilation, in some cases, would be to a virtual machine, as with Java.
LISP is a general-purpose programming language, but it is rarely used as such anymore. In the days of microcoded LISP machines, the entire operating system, including things like network, graphics and printer drivers, was written in LISP itself. The very first IMAP mail client, for example, was written entirely in LISP.
Its unusual syntax likely makes other programming languages, like Python, more attractive. But if you look carefully, you can find LISP-inspired elements in popular languages like Perl.

The purpose of Lisp syntax to model AST

As far as I know, Lisp syntax represents the AST, but in a high-level format that lets humans easily read and modify it while also making it easy for the machine to process the source code.
For this reason, it is said that in Lisp code is data and data is code, since code (an s-expression) is essentially just an AST. We can plug more ASTs (our data, which is just Lisp code) into other ASTs (Lisp code), or use them independently, to extend functionality. We can also manipulate that code on the fly (at runtime) without recompiling the whole OS to integrate the new code. In other languages, we have to recompile from source to turn the human-readable source code into a valid AST before it is compiled into executable code.
Is this why Lisp syntax was designed this way in the first place (representing an AST while remaining human readable, to satisfy both humans and the machine)? To enable stronger (on-the-fly, at runtime) as well as simpler (no recompiling, faster) communication between human and machine?
I heard that the Lisp machine has only a single address space that holds all data. In an operating system like Linux, programmers only get a virtual address space, which they can treat as if it were the real physical address space and do whatever they want with. Data and code in Linux are separate regions, because effectively, data is data and code is code. In a normal OS written in C (or a C-like language), operating on a single address space for the whole system and mixing data with code would be very messy.
On a Lisp machine, since code is data and data is code, is this the reason for having only a single address space (without the virtual layer)? Since we have GC and no raw pointers, should it be safe to operate directly on physical memory without breaking it (since having only a single space is a lot less complicated)?
EDIT: I ask this because a single address space is said to be one of the advantages of Lisp:
A safe language means a reliable environment without the need to
separate tasks out into their own separate memory spaces.
The "clearly separated process" model characteristic of Unix has
potent merits when dealing with software that might be unreliable to
the point of being unsafe, as is the case with code written in C or
C++ , where an invalid pointer access can "take down the system."
MS-DOS and its heirs are very unreliable in that sense, where just
about any program bug can take the whole system down; "Blue Screen of
Death" and the likes.
If the whole system is constructed and coded in Lisp, the system is as
reliable as the Lisp environment. Typically this is quite safe, as
once you get to the standards-compliant layers, they are quite
reliable, and don't offer direct pointer access that would allow the
system to self-destruct.
Third Law of Sane Personal Computing
Volatile storage devices (i.e. RAM) shall serve exclusively as
read/write cache for non-volatile storage devices. From the
perspective of all software except for the operating system, the
machine must present a single address space which can be considered
non-volatile. No computer system obeys this law which takes longer to
fully recover its state from a disruption of its power source than an
electric lamp would.
A single address space, as stated above, holds all running processes in the same memory space. I'm curious why people insist that a single address space is better. I relate it to Lisp's AST-like syntax to try to explain how it fits the single-address-space model.
Your question doesn't reflect reality very accurately, especially the part about code/data separation in Linux and other OSes. Actually, this separation is enforced not at the OS level, but by the compiler/program loader. At the OS level there are just memory pages that can have different protection bits set (like executable, read-only, etc.), and above this level different executable formats exist (like ELF in Linux) that specify restrictions on different parts of program memory.
Returning to Lisp, as far as I know, historically the S-expression format was used by Lisp's creators because they wanted to concentrate on the semantics of the language, putting syntax aside for some time. There was a plan to eventually create a syntax for Lisp (see M-expressions), and there were some Lisp-based languages with more syntax, like Dylan. But overall, the Lisp community came to the consensus that the benefits of S-expressions outweigh their cons, so they stuck.
Regarding code as data, this is not strictly bound to S-expressions, as code in other languages can also be treated as data. This whole approach is called meta-programming and is supported at different levels and with different mechanisms by many languages. Every language that supports eval (Perl, JavaScript, Python) allows code to be treated as data. In those languages the representation is almost always a string, while in Lisp it is a tree, which is much more convenient and facilitates advanced features like macros.
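Here is a minimal sketch of why a tree representation makes macros easy, written in Python. Python is an assumption made for illustration; this is not how any Lisp implements macroexpansion. S-expressions are modeled as nested lists with strings standing for symbols. A hypothetical unless-style macro, similar in spirit to Common Lisp's unless, is expanded by rewriting the tree directly, with no parsing or string surgery.

```python
from typing import Any

# S-expressions modeled as nested Python lists; strings stand for symbols.
# The `unless` rewrite below is a hypothetical macro for illustration.


def expand(form: Any) -> Any:
    """Recursively rewrite (unless c body...) into (if (not c) (progn body...))."""
    if not isinstance(form, list) or not form:
        return form  # atoms and the empty list expand to themselves
    if form[0] == "unless":
        _, condition, *body = form
        return ["if", ["not", expand(condition)],
                ["progn", *(expand(b) for b in body)]]
    # Not a macro call: expand every sub-form and rebuild the list.
    return [expand(sub) for sub in form]


program = ["defun", "check", ["x"],
           ["unless", ["numberp", "x"], ["error", "not a number"]]]
print(expand(program))
# ['defun', 'check', ['x'], ['if', ['not', ['numberp', 'x']], ['progn', ['error', 'not a number']]]]
```

The same rewrite done with string-based eval would first need a parser to find where the condition ends and the body begins. With a tree, it is just list indexing.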