What is a Lisp image?

Essentially, I would like to know what a Lisp image is. Is it a slice of memory containing the Lisp interpreter and one or more programs, or something else?

The Lisp image as dumped memory
The image is usually a file. It is a dump of the memory of a Lisp system. It contains all functions (often compiled to machine code), variable values, symbols, etc. of the Lisp system. It is a snapshot of a running Lisp.
To create an image, one starts the Lisp, uses it for a while and then one dumps an image (the name of the function to do that depends on the implementation).
Using a Lisp image
The next time one starts Lisp, one can use the dumped image and get back a state roughly like the one before. When dumping an image one can also tell the Lisp what it should do when the dumped image is started. That way one can reconnect to servers, re-open files, etc.
To start such a Lisp system, one needs a kernel and an image. Sometimes the Lisp can put both into a single file, so that an executable file contains both the kernel (with some runtime functionality) and the image data.
On a Lisp Machine (a computer, running a Lisp operating system) a kind of boot loader (the FEP, Front End Processor) may load the image (called 'world') into memory and then start this image. In this case there is no kernel and all that is running on the computer is this Lisp image, which contains all functionality (interpreter, compiler, memory management, GC, network stack, drivers, ...). Basically it is an OS in a single file.
Some Lisp systems will optimize the memory before dumping an image. They may do a garbage collection, order the objects in memory, etc.
Why use images?
Why would one use images? It saves the time otherwise spent loading things, and one can deliver preconfigured Lisp systems with application code and data to users. Starting a Common Lisp implementation with a saved image is usually fast - a few milliseconds on a current computer.
Since the Lisp image may contain a lot of functionality (a compiler, even a development environment, lots of debugging information, ...) it is typically several megabytes in size.
Using images in Lisp is very similar to what Smalltalk systems do. Squeak for example also uses an image of Smalltalk code and data and a runtime executable. There is a practical difference: most current Lisp systems use compiled machine code. Thus the image is not portable between different processor architectures (x86, x86-64, SPARC, POWER, ARM, ...) or even operating systems.
History
Such Lisp images have been in use for a long time. For example the function SYSOUT in BBN Lisp from 1967 created such an image. SYSIN would read such an image at start.
Examples of functions that save images
For an example see the function save-image of LispWorks or read the SBCL manual on saving core images.
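For instance, a minimal SBCL sketch might look like this (the function name MAIN and the file name are made up; sb-ext:save-lisp-and-die with its :toplevel and :executable arguments is SBCL's actual API, and LispWorks and others have their own equivalents):
;; What the restored image should do when it is started.
(defun main ()
  (format t "Hello from a restored image!~%"))

;; Stop the running Lisp and dump it. With :executable t the kernel
;; (runtime) and the image data are combined into one file.
(sb-ext:save-lisp-and-die "my-app"
                          :toplevel #'main
                          :executable t)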

Several language implementations use an 'image' to store everything that exists in the current context. This image may contain several abstraction layers of compilation (i.e. an intermediate parse level, an intermediate byte code, native op codes). Loading this image will be much faster than compiling all the source files. It's very much like the hibernate feature, at the program level.
For example, if your Lisp implementation uses an image, then first you (or the compiler vendor) bootstrap an image and save it.
Then you have two options: (1) load your Lisp files every time you invoke Lisp, or (2) load all your Lisp once, save the image, and use that image from then on (sketched below).
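In SBCL, option (2) might look like this (the file and core names are made up; save-lisp-and-die is SBCL's actual image-saving function, and other implementations have equivalents):
;; Load all your code once...
(load "my-app.lisp")
;; ...then dump the heap so future startups skip the loading step.
(sb-ext:save-lisp-and-die "my-app.core")
Afterwards you start the saved image directly, e.g. with sbcl --core my-app.core.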
Hope that helps

In general, it's the storage part of the lisp process (that is, all "lisp" functions and data), but no parts of the underlying lisp binary. On the plus side, this gives a fast start-up, since there's (essentially) no book-keeping to be done on loading the image, everything is just there. On the other hand, it means any open files, sockets and what-have-you are missing, so saving images as some sort of check-pointing will require a bit of implementation to make it work.

Related

What is the difference and the relation between the Lisp interpreter and the Lisp image? Can they be used as synonyms?

I noticed some people using the terms as if they were synonyms.
For instance, in the same scenario, I heard "add this function to the lisp image evaluating it" and "eval this function into the Lisp interpreter to use it later".
However, I am not sure the use is technically precise. Thus, the question.
These are two orthogonal concepts. Let’s start from the usually comprehensive Common Lisp Glossary:
Lisp image n. a running instantiation of a Common Lisp implementation. A Lisp image is characterized by a single address space in which any object can directly refer to any other in conformance with this specification, and by a single, common, global environment.
So the key idea is that an image is a set of mutually referring Lisp objects (functions and data) that can be “called” or “accessed” during the execution of a program.
The way in which a Common Lisp program is executed depends instead on the way in which a system is implemented. It could be executed by compilation to machine language, for instance, or through some form of interpretation (or even a mix of the two). So a Lisp interpreter is just one particular way in which an implementation can be done (and among current Common Lisp systems there are many different ways to implement the language).
Image
"Image" is a file on disk.
"Add a function to the image"
means evaluate the function and save the image, so that the function is immediately available on the next invocation.
REPL
"Interpreter" is (usually) a wrong level of abstraction; one should use
"REPL" instead. E.g., SBCL does not have an interpreter at all (everything is always compiled) but this is not a detail that is relevant to this topic.
"eval this function into the Lisp interpreter to use it later"
means evaluate the function in the current REPL and use it in the same process (i.e., it is available until Lisp is restarted).
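The distinction is easy to see at an SBCL REPL, for example: a function "evaluated into the Lisp" is immediately compiled to machine code, yet it exists only in the current process until an image is saved:
CL-USER> (defun square (x) (* x x))
SQUARE
CL-USER> (compiled-function-p #'square)
T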
An image is a copy of a Lisp heap written to disk (or other secondary storage). The Lisp heap is the memory for data storage in a computer's RAM. To write a Lisp heap to an image, the running Lisp is stopped and the memory is dumped to disk. Then the Lisp is either resumed or quit.
The image can be used to restore the heap upon starting a new Lisp. That's usually faster than starting a fresh Lisp and then loading the corresponding software.
A Lisp interpreter is a program which executes Lisp programs from source. Many Lisp implementations don't use an interpreter, but they execute compiled Lisp code, typically Lisp code compiled to native machine code.

Getting number of processors in emacs

Is there a good cross-platform method for determining the number of processors a machine has in elisp? I'm trying to make my config auto detect some build options, and would like to have it automatically use the number of processors + 1. Grepping /proc/cpuinfo isn't a solution for me because it won't work on Windows.
Emacs 24.3 Lisp doesn't have access to that information. Your options would seem to include:
writing an Elisp library which uses the value of SYSTEM-TYPE to choose a system-specific method for getting the processor count (a minimal sketch of this approach follows the list);
modifying the Emacs C source and rebuilding it so that it can, for each potentially multiprocessor platform on which Emacs builds, expose the processor count at the Lisp level.
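A minimal sketch of the first option might look like this (the helper commands here, nproc on GNU/Linux, sysctl on Darwin, and the NUMBER_OF_PROCESSORS environment variable on Windows, are assumptions chosen for brevity; the library described below uses other mechanisms, such as system_profiler):
(defun my-logical-processor-count ()
  "Return the number of logical processors, or nil if unknown."
  (cond ((eq system-type 'gnu/linux)
         (string-to-number (car (process-lines "nproc"))))
        ((eq system-type 'darwin)
         (string-to-number (car (process-lines "sysctl" "-n" "hw.logicalcpu"))))
        ((eq system-type 'windows-nt)
         (let ((n (getenv "NUMBER_OF_PROCESSORS")))
           (and n (string-to-number n))))))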
At least, that was true four hours ago, when I first started writing this answer. But then I got interested in the problem, and now you have a third option:
downloading my system-cores.el library, which implements the first of the two options above, and calling (system-cores :physical) to get the number of physical processors, (system-cores :logical) to get the number of logical cores, or just plain (system-cores) to get an alist containing both.
Caveats include:
This library relies strongly on the PROCESS-LINES function. If that function can't do anything sensible in the context where you need to call SYSTEM-CORES, then SYSTEM-CORES can't either. (For example, if you're on Darwin and (getenv "PATH") doesn't contain /usr/sbin, PROCESS-LINES will bomb out with "Searching for program: no such file or directory, system_profiler".)
The systems currently known to be supported are GNU/Linux (anything with /proc/cpuinfo, more or less), Windows NT (and its derivatives, including 2000, XP, and all subsequent versions), and Darwin (OS X, at least 10.8, theoretically as far back as 10.2). Not coincidentally, these are also the systems to which I have access.
I've also included a delegate which should work properly on at least some flavors of BSD, but I don't have a BSD box on which to test it, so there's no telling whether or not it'll really work -- at the very least, you'll almost certainly need to modify the list of sysctls examined by the SYSTEM-CORES-SYSCTL delegate.
If you're using a modern variety of Linux, Windows, or OS X, great! You should be good to go, right out of the box. If not, and if your platform includes a command-line utility which provides the requisite information in its results, then it shouldn't be hard to write a delegate for your system. Or, if you don't want to write a delegate yourself, then email me all of:
the proper invocation for the command-line utility in question
a sample of the output it produces on your system
the output of M-: system-type in Emacs
the output of M-: system-configuration in Emacs
and I'll be able to write a delegate myself and add it to the library.
Edit: The :cores and :processors keywords have been replaced with :physical and :logical, respectively; I couldn't keep them straight, and I don't see why I should expect anyone else to, either.
Edit: Per #lunaryorn, replaced (split-string (shell-command-to-string ...)) with (process-lines ...). This saves invoking a shell, which might make the library more reliable, and certainly makes its code easier to read.

LISP 1.5: How is Lisp like a machine language?

I wish that John McCarthy was still alive, but...
From LISP 1.5 Programmer's Manual :
LISP can interpret and execute programs written in the form of S-expressions. Thus, like machine language, and unlike most other higher level languages, it can be used to generate programs for further execution.
I need more clarification about how machine language can be used to generate programs, and how Lisp can do it.
All that is saying is that machine code can directly write machine instructions to memory and jump to those instructions to execute them; this is the basis of many attack vectors to break into software, in fact.
The point is, when you're writing machine code, it's easy to generate machine code. But when you're writing in a compiled language like C, you can't just generate C code at run time and then execute it - unless your program includes a C compiler.
Lisp - and, these days, many other languages, especially "scripting languages" like Perl, Python, Ruby, Tcl, Javascript, and command shells - have the ability to execute code that is generated at runtime. In Lisp, since code and data have the same structure, this is usually less work than it is in the other languages, where the code to be evaluated is generally a string that has to be parsed. (Though Perl has the ability to eval a block instead of a string, which lets the compiler do the parsing ahead of time for literal code.)
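A sketch of that in Common Lisp: the program builds code as an ordinary list at run time and hands it to the resident compiler (make-adder is a made-up name for illustration; compile and funcall are standard):
;; Build a lambda expression as data, then compile it into a
;; callable function -- no external compiler, no separate build step.
(defun make-adder (n)
  (compile nil `(lambda (x) (+ x ,n))))

(funcall (make-adder 5) 10)   ; => 15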
A machine language program can alter itself while running. The last assembly programming I did was for MS-DOS: a resident program that I would load before testing other programs. When my program misbehaved, a keystroke switched to the resident program, which could peek into the running program and alter it directly before resuming. It was quite handy, since I didn't have a debugger.
LISP had this from the very beginning, since it was originally interpreted. You could change the definition of a function while you were running, and the whole language was always available at runtime, even eval and define. When it started getting compiled, it wasn't compiled like Algol, but partially, allowing interpreted and compiled code to intermix at the same time. The fact that its code structure was list structure and that symbols are a data type contributed to this.
In the last interview I saw with McCarthy, he was asked what he thought of modern programming languages (not the LISP family, but the Algol-family language Ruby, which is said to be influenced by LISP), and before answering he asked if they could represent code as data (like list structure). Since Ruby doesn't, it is still behind what LISP was in the 60s, in his opinion.
Many new programming languages are emerging in the Algol family, and some of the most promising ones, like Perl 6 and Nemerle, are getting closer to the features LISP had in the 60s.
Machine language programs can fill memory regions with arbitrary bytes. Then they can just jump to the start of such region which will thus get executed right away.
Lisp language programs can easily create arbitrary S-expressions in memory, using cons. Then they can just call eval on these S-expressions to evaluate (interpret) them.
High level language programs can easily fill memory regions with characters representing new code in the language's syntax. But they cannot run such code.
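For example, in Common Lisp (cons, list, and eval are standard; any Lisp works the same way):
;; Construct the list (+ 1 2 3) in memory, then evaluate it.
(let ((form (cons '+ (list 1 2 3))))
  (eval form))   ; => 6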

The purpose of Lisp syntax: to model the AST

Lisp syntax represents the AST, as far as I know, but in a high-level format that lets humans easily read and modify it while also making it easy for the machine to process the source code.
For this reason, in Lisp, it is said that code is data and data is code, since code (an s-expression) is, in essence, just an AST. We can plug more ASTs (our data, which is just Lisp code) into other ASTs (Lisp code), or use them independently, to extend functionality and manipulate the program on the fly (at runtime), without having to recompile the whole system to integrate new code. In other languages, we have to recompile to turn the human-readable source code into a valid AST before it is compiled into executable code.
Is this why Lisp syntax was designed the way it is (representing an AST, yet human-readable, satisfying both human and machine) in the first place? To enable stronger (on-the-fly, at runtime) as well as simpler (no recompilation, faster) communication between man and machine?
I heard that the Lisp Machine has only a single address space, which holds all data. In an operating system like Linux, programmers get a virtual address space, treat it as if it were the real physical address space, and can do whatever they want within it. Data and code in Linux live in separate regions because, effectively, data is data and code is code. In a normal OS written in C (or a C-like language), operating a single address space for the whole system would be very messy, and mixing data with code would be very messy too.
On a Lisp Machine, since code is data and data is code, is this the reason it has only a single address space (without the virtual layer)? Since we have GC and no raw pointers, should it be safe to operate on physical memory without breaking it (since having only a single space is a lot less complicated)?
EDIT: I ask this because it is said that one of the advantages of Lisp is a single address space:
A safe language means a reliable environment without the need to separate tasks out into their own separate memory spaces.
The "clearly separated process" model characteristic of Unix has potent merits when dealing with software that might be unreliable to the point of being unsafe, as is the case with code written in C or C++, where an invalid pointer access can "take down the system." MS-DOS and its heirs are very unreliable in that sense, where just about any program bug can take the whole system down; "Blue Screen of Death" and the like.
If the whole system is constructed and coded in Lisp, the system is as reliable as the Lisp environment. Typically this is quite safe, as once you get to the standards-compliant layers, they are quite reliable, and don't offer direct pointer access that would allow the system to self-destruct.
Third Law of Sane Personal Computing
Volatile storage devices (i.e. RAM) shall serve exclusively as read/write cache for non-volatile storage devices. From the perspective of all software except for the operating system, the machine must present a single address space which can be considered non-volatile. No computer system obeys this law which takes longer to fully recover its state from a disruption of its power source than an electric lamp would.
A single address space, as stated, holds all the running processes in the same memory space. I am just curious why people insist that a single address space is better. I relate it to the AST-like syntax of Lisp to try to explain how it fits the single-space model.
Your question doesn't reflect reality very accurately, especially in the part about code/data separation in Linux and other OSes. Actually, this separation is enforced not at the OS level, but by the compiler/program loader. At the OS level there are just memory pages that can have different protection bits set (executable, read-only, etc.), and above this level different executable formats exist (like ELF in Linux) that specify restrictions on different parts of program memory.
Returning to Lisp: as far as I know, historically the S-expression format was used by Lisp's creators because they wanted to concentrate on the semantics of the language, putting syntax aside for a while. There was a plan to eventually create a syntax for Lisp (see M-expressions), and there have been Lisp-based languages with more syntax, like Dylan. But, overall, the Lisp community came to the consensus that the benefits of S-expressions outweigh their drawbacks, so it stuck with them.
Regarding code as data: this is not strictly bound to S-expressions, as other code can be treated as data too. This whole approach is called metaprogramming, and it is supported at different levels and with different mechanisms by many languages. Every language that supports eval (Perl, JavaScript, Python) allows one to treat code as data; it's just that the representation is almost always a string, while in Lisp it is a tree, which is much more convenient and facilitates advanced features like macros.
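A small illustration of why the tree representation matters: a Common Lisp macro receives code as a list and builds new code with ordinary list operations (unless* is a made-up name, chosen to avoid clashing with the standard unless):
(defmacro unless* (test &body body)
  `(if (not ,test) (progn ,@body)))

(macroexpand-1 '(unless* (> x 0) (print "non-positive")))
;; => (IF (NOT (> X 0)) (PROGN (PRINT "non-positive")))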

"All programs are interpreted". How?

A computer scientist will correctly explain that all programs are
interpreted and that the only question is at what level. --perlfaq
How are all programs interpreted?
A Perl program is a text file read by the perl program which causes the perl program to follow a sequence of actions.
A Java program is a text file which has been converted into a series of byte codes which are then interpreted by the java program to follow a sequence of actions.
A C program is a text file which is converted via the C compiler into an assembly program which is converted into machine code by the assembler. The machine code is loaded into memory which causes the CPU to follow a sequence of actions.
The CPU is a jumble of transistors, resistors, and other electrical bits which is laid out by hardware engineers so that when electrical impulses are applied, it will follow a sequence of actions as governed by the laws of physics.
Physicists are currently working out what makes those rules and how they are interpreted.
Essentially, every computer program is interpreted by something else which converts it into something else which eventually gets translated into how the electrons in your local neighborhood fly around.
EDIT/ADDED: I know the above is a bit tongue-in-cheek, so let me add a slightly less goofy addition:
Interpreted languages are where you can go from a text file to something running on your computer in one simple step.
Compiled languages are where you have to take an extra step in the middle to convert the language text into machine- or byte-code.
The latter can easily be converted into the former by a simple transformation:
Make a program called interpreted-c, which takes one or more C files, compiles them, and runs the resulting program (which must not take any arguments):
#!/bin/sh
# Compile the given C files into a temporary executable, run it,
# then clean up.
MYEXEC=/tmp/myexec.$$
gcc -o $MYEXEC ${1+"$@"} && $MYEXEC
rm -f $MYEXEC
Now which definition does your C program fall into? Compare & contrast:
$ perl foo.pl
$ interpreted-c foo.c
Machine code is interpreted by the processor at runtime: the same machine code supplied to a processor of a certain architecture (x86, PowerPC, etc.) should theoretically work the same regardless of the specific model's 'internal wiring'.
EDIT:
I forgot to mention that an architecture may add new instructions for things like accessing new registers, in which case code written to use them won't work on older processors in the range. Much like when you use an old version of a library and try to use capabilities only found in newer versions.
Example: many Linux distros are released as 686-only, despite being in the 'x86 family'. This is due to the use of newer instructions.
My first thought was to look inside the CPU (see below), but that's not right. The answer is much simpler than that.
A high-level description of a CPU is:
1. execute the current op
2. grab the next op
3. goto 1
Compare it to Perl's interpreter:
while ((PL_op = op = op->op_ppaddr(aTHX))) {
}
(Yeah, that's the whole thing.)
There can be no doubt that the CPU is an interpreter.
It just goes to show how useless it is to classify something as interpreted or not.
Original answer:
Even at the CPU level, programs get rewritten into simpler instructions to allow the CPU to execute them more quickly. This is done by changing the order in which they are executed and by executing them in parallel. For example, Intel's Hyper-Threading.
Even deeper, each instruction is considered a program of its own, one that routes electronic signals. See microcode.
The levels of interpretation are really easy to explain:
2: Runtime languages (CLR, Java Runtime, ...) & scripting languages (Python, Ruby, ...)
1: Assembly language
0: Binary code
Edit: I moved scripting languages to the same level as runtime languages. Thanks for the hint. :-)
I can write a Game Boy interpreter that works similarly to how the Java Virtual Machine works, treating the Z80 machine instructions as byte code. Assuming the original was written in C[1], does that mean C suddenly became an interpreted language just because I used it like one?
From another angle, gcc can compile C into machine code for a number of different processors. There's no reason the target machine has to be the same as the machine you're compiling on. In fact, this is a common way to compile C code for AVRs and other microcontrollers.
As a matter of abstraction, the compiler's job is to translate flat text into a structure, then translate that structure into something that can be executed somewhere. Whatever is doing the execution may have its own levels of breaking out the structure before really executing it.
A lot of power becomes available once you start thinking along these lines.
A good book on this is Structure and Interpretation of Computer Programs. Even if you only get through the first chapter (or half of the first chapter), I think you'll learn a lot.
[1] I think most Game Boy stuff was hand-coded ASM, but the principle remains.