Interested in VM for lisp-like languages on 8-bit system

I'm looking for recommended virtual machines that can run on an 8-bit microprocessor AND support dynamic languages. I'd like a VM solution because I perceive benefits in terms of code density, portability, and the ability to have a smaller interpreter, leaving more room for larger programs.
My goal is to run a complete LOGO interpreter, following "LOGO for the Apple II" syntax, on something like a 6502 microprocessor.
I've seen references to PyMite, Java "micro edition", and of course now the UCSD p-System sources from the 1970s are available.
Suggestions are welcome.

(Note: I've already +1'ed the FORTH answer.)
Since you mention the 6502: Steve Wozniak (!) wrote an article for Byte magazine in the late 1970s describing the SWEET16 interpreter for the 6502. This was a partial VM for the 6502 that provided 16-bit integer arithmetic and could EASILY be interspersed with 6502 assembly language. It was the basis for the original Integer BASIC, which (as I recall) was later replaced by the floating-point Applesoft BASIC.
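To make the idea concrete, here is a tiny interpreter in Python (used purely for illustration), loosely modeled on SWEET16's design of sixteen 16-bit registers with R0 acting as the accumulator. The mnemonics, the tuple encoding and the `run` function are assumptions for this sketch, not Wozniak's actual byte codes. The point is that one interpreted instruction does 16-bit work that would take several native 8-bit instructions.

```python
MASK = 0xFFFF  # each register holds an unsigned 16-bit value

def run(program):
    """Interpret (mnemonic, operand...) tuples on sixteen 16-bit registers.

    R0 acts as the accumulator, loosely following the SWEET16 idea; the
    mnemonics and tuple encoding are illustrative only.
    """
    regs = [0] * 16
    for op, *args in program:
        if op == "SET":      # SET n, k : Rn = k
            n, k = args
            regs[n] = k & MASK
        elif op == "LD":     # LD n     : R0 = Rn
            regs[0] = regs[args[0]]
        elif op == "ST":     # ST n     : Rn = R0
            regs[args[0]] = regs[0]
        elif op == "ADD":    # ADD n    : R0 = R0 + Rn, wrapping at 16 bits
            regs[0] = (regs[0] + regs[args[0]]) & MASK
        elif op == "SUB":    # SUB n    : R0 = R0 - Rn, wrapping at 16 bits
            regs[0] = (regs[0] - regs[args[0]]) & MASK
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return regs

regs = run([
    ("SET", 1, 0x1234),
    ("SET", 2, 0xF000),
    ("LD", 1),
    ("ADD", 2),   # 0x1234 + 0xF000 overflows 16 bits and wraps
    ("ST", 3),
])
print(hex(regs[3]))  # 0x234
```

A real design would pack these operations into compact byte codes rather than Python tuples; that packing is where the code-density benefit mentioned in the question comes from.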

FORTH implementation for 6502.

You might want to check out the PICOBIT system, which is a Scheme implementation that works on very very small systems, such as the PIC18. It has since been ported to ARM, and could almost certainly be ported to the 6502 or other processors.


Impact of choosing a programming language on the OS performance

Does the choice of programming language determine performance, when all of it is compiled down to 1's and 0's anyway?
E.g.: printf (in C) vs cout (in C++) vs print (in Python)
Do all of the above compile to the same binary code?
I'd appreciate any help in understanding this concept of programming languages and the role of the hardware in detail! Thanks in advance.
The choice of programming language can have many impacts: on the performance of your code, on how portable and compatible it is, and on how easily the objective can be put into code, among other things. To answer your question directly, C and C++ would likely produce broadly similar native code when printing output, if both were built for the same target environment. Python is different because it is an interpreted language, meaning the code is read by a program written in code native to the architecture, which acts on it accordingly. Python is something of an edge case in this regard because it is technically compiled at execution time (and can be compiled before distribution), but into an intermediate bytecode, similar in principle to Java bytecode, that only the Python interpreter understands.
The difference you bring up between lower-level languages like C and higher-level ones like Java, Python and even JavaScript is whether they are executed directly by the native hardware or by an interpreter. Languages running on bare metal are generally understood to be faster than those running on interpreters, as the interpreter takes time to understand the code and uses its own system resources. Java tends to break this rule because its virtual machine executes very simple bytecode and compiles frequently used parts of it to native code at run time (JIT compilation), making it competitive in speed with languages like C.
Which binary code they are compiled to depends on the compiler. For C and C++ there are dozens of different compilers, which may generate different binary code. Besides that, most compilers have optimization flags that can influence the generated binary code a lot.
Python isn't even compiled directly into "machine code"; it's compiled into bytecode for a Python interpreter. The Python interpreter itself is a program that runs on the machine; it reads the Python bytecode and executes it, probably by internally calling predefined functions (which already exist as machine code).
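You can look at this bytecode directly with CPython's standard `dis` module. A small sketch, assuming CPython; the exact opcode names differ between Python versions, so treat the printed listing as version-specific:

```python
import dis

def greet():
    print("hello")

# CPython stores the compiled bytecode as a bytes object on the code object;
# these are instructions for the interpreter, not native x86/ARM machine code.
raw = greet.__code__.co_code
print(type(raw).__name__, len(raw), "bytes")

# Human-readable listing of the same bytecode (opcode names vary by version).
dis.dis(greet)
opnames = [ins.opname for ins in dis.get_instructions(greet)]
```

The listing shows the interpreter loading the global `print`, loading the constant `"hello"` and calling it; the interpreter then carries out each of those steps with its own precompiled machine code.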

How does a disassembler work and how is it different from a decompiler?

I'm looking into installing a disassembler (or decompiler) on my Linux Mint 17.3 OS and I wanted to know what the difference is between a disassembler and a decompiler. I have a rough idea of what they are (the names are fairly self-explanatory), but they are still a bit confusing.
I've read that a disassembler turns a program into assembly language, which I don't know, so it seems kind of useless to me. I've also read that a decompiler turns a 'binary file' into its source code. What exactly is a binary file?
Apparently, decompilers cannot decompile to C, only Python and other similar languages. So how can I turn a program into its original C source code?
A disassembler is a pretty straightforward application that translates machine code into assembly language statements. This is the reverse of what an assembler does, and it is straightforward because there is a close, essentially one-to-one relationship between machine instructions and assembly statements. A disassembler targets a specific CPU; the original assembler that was used to create the executable is only of minor relevance.
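A toy, table-driven disassembler for a handful of real 6502 opcodes shows why the job is mostly a lookup: each opcode byte maps to a mnemonic plus a known number of operand bytes. This is a Python sketch covering only a small subset of the instruction set; the `disassemble` function and its output format are illustrative.

```python
# A handful of real 6502 opcodes: opcode byte -> (mnemonic, addressing mode).
OPCODES = {
    0xA9: ("LDA", "imm"), 0xA2: ("LDX", "imm"),
    0xAD: ("LDA", "abs"), 0x8D: ("STA", "abs"),
    0x4C: ("JMP", "abs"), 0x20: ("JSR", "abs"),
    0xE8: ("INX", "impl"), 0xEA: ("NOP", "impl"), 0x60: ("RTS", "impl"),
}
OPERAND_SIZE = {"impl": 0, "imm": 1, "abs": 2}

def disassemble(code):
    lines, pc = [], 0
    while pc < len(code):
        opcode = code[pc]
        entry = OPCODES.get(opcode)
        size = OPERAND_SIZE[entry[1]] if entry else 0
        if entry is None or pc + size >= len(code):
            # Unknown opcode or truncated operand: show the byte as raw data.
            lines.append(f".byte ${opcode:02X}")
            pc += 1
            continue
        mnemonic, mode = entry
        if mode == "impl":
            lines.append(mnemonic)
        elif mode == "imm":
            lines.append(f"{mnemonic} #${code[pc + 1]:02X}")
        else:  # absolute: 16-bit address stored little-endian (low byte first)
            addr = code[pc + 1] | (code[pc + 2] << 8)
            lines.append(f"{mnemonic} ${addr:04X}")
        pc += 1 + size
    return lines

program = bytes([0xA9, 0x01, 0x8D, 0x00, 0x02, 0x60])
for line in disassemble(program):
    print(line)
```

For the six bytes above this prints `LDA #$01`, `STA $0200` and `RTS`. Note that nothing in the bytes themselves says where code ends and data begins. This sketch simply falls back to `.byte` for anything it cannot decode.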
A decompiler aims to recreate, from machine code, a compiled high-level language program in its original form, thus attempting the reverse operation of a C or Forth compiler (popular languages for which decompilers exist). Because there are so many high-level languages, and thus so many ways in which high-level constructs can be expressed in machine code (even many different strategies for the same language and construct, even in the same compiler, and different strategies depending on the compiler mode and situation), this operation is much more complex and very dependent on the original compiler (and maybe even on the command line that was used, its chosen optimization level and the compiler version).
Even if all of that is known, most of the work of a decompiler is educated guessing, and it will most probably never be able to reconstruct the original source code 100%. Rather, it will end up with source code that could have been the original program.
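A small CPython experiment (bytecode rather than native code, but the principle is the same) shows why decompilation is guesswork: two different sources can compile to identical instructions. This sketch assumes CPython's compile-time constant folding of `1 + 2`; `listing` is a hypothetical helper:

```python
import dis

def written_as_sum():
    return 1 + 2

def written_as_literal():
    return 3

def listing(fn):
    """Opcode names and resolved arguments, ignoring offsets and line numbers."""
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(fn)]

# CPython folds the constant expression 1 + 2 at compile time, so both functions
# end up with the same instructions and nothing records which form was written.
print(listing(written_as_sum) == listing(written_as_literal))  # True
```

A decompiler looking at either function can only produce `return 3`. Optimizing C compilers discard far more information than this, which is why the output is "a program that could have been the original".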

LISP 1.5: How is Lisp like a machine language?

I wish that John McCarthy was still alive, but...
From the LISP 1.5 Programmer's Manual:
LISP can interpret and execute programs written in the form of S-expressions. Thus, like machine language, and unlike most other higher level languages, it can be used to generate programs for further execution.
I need more clarification: how can machine language be used to generate programs, and how can Lisp do it?
All that is saying is that machine code can directly write machine instructions to memory and jump to those instructions to execute them; this is the basis of many attack vectors to break into software, in fact.
The point is, when you're writing machine code, it's easy to generate machine code. But when you're writing in a compiled language like C, you can't just generate C code at run time and then execute it - unless your program includes a C compiler.
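The machine-code trick can be simulated with a toy machine whose single memory holds both code and data. The opcodes (`POKE`, `JMP`, `LOADI`, `ADDI`, `HALT`) and the `run` function are invented for this Python sketch: the program first writes new instructions into memory, then jumps to them.

```python
# Invented opcodes for a toy machine; code and data share one flat memory.
HALT, POKE, JMP, LOADI, ADDI = range(5)

def run(memory):
    pc, acc = 0, 0
    while True:
        op = memory[pc]
        if op == HALT:
            return acc
        elif op == POKE:          # POKE addr value : memory[addr] = value
            addr, value = memory[pc + 1], memory[pc + 2]
            memory[addr] = value
            pc += 3
        elif op == JMP:           # JMP addr : continue execution at addr
            pc = memory[pc + 1]
        elif op == LOADI:         # LOADI k : acc = k
            acc = memory[pc + 1]
            pc += 2
        elif op == ADDI:          # ADDI k : acc = acc + k
            acc += memory[pc + 1]
            pc += 2
        else:
            raise ValueError(f"bad opcode {op} at {pc}")

memory = [0] * 32
boot = [
    POKE, 20, LOADI, POKE, 21, 40,   # write "LOADI 40" at address 20
    POKE, 22, ADDI,  POKE, 23, 2,    # write "ADDI 2" at address 22
    POKE, 24, HALT,                  # write "HALT" at address 24
    JMP, 20,                         # jump into the freshly written code
]
memory[:len(boot)] = boot
print(run(memory))  # 42
```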
Lisp - and, these days, many other languages, especially "scripting languages" like Perl, Python, Ruby, Tcl, JavaScript, and command shells - have the ability to execute code that is generated at runtime. In Lisp, since code and data have the same structure, this is usually less work than it is in the other languages, where the code to be evaluated is generally a string that has to be parsed. (Though Perl has the ability to eval a block instead of a string, which lets the compiler do the parsing ahead of time for literal code.)
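In Python, for instance, run-time code generation goes through a string that has to be parsed and compiled first. A minimal sketch; `make_power_fn` is a hypothetical helper:

```python
def make_power_fn(n):
    # Generate the *text* of a new function at run time...
    source = f"def power(x):\n    return x ** {n}\n"
    namespace = {}
    # ...then parse and compile that text, and execute it to define power().
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["power"]

cube = make_power_fn(3)
print(cube(2))  # 8
```

Any structure in the generated program has to be expressed by gluing text together, which is exactly the extra work that Lisp's list-based code avoids.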
A machine language program can alter itself while running. The last assembly programming I did was for MS-DOS: a resident program that I used to load before testing other programs. When my program misbehaved, a keystroke switched to the resident program, which could peek into the running program and alter it directly before resuming. It was quite handy since I didn't have a debugger.
LISP had this from the very beginning since it was originally interpreted. You could change the definition of a function while the program was running, and the whole language was always available at runtime, even eval and define. When it started getting compiled, it wasn't compiled like Algol but only partially, allowing interpreted and compiled code to intermix. The fact that its code structure was list structure, and that symbols are a data type, contributed to this.
In the last interview I saw with McCarthy, he was asked what he thought of modern programming languages (not the LISP family, but Ruby, an Algol-family language that is said to be influenced by LISP). Before answering, he asked whether it could represent code as data (like list structure). Since it couldn't, in his opinion Ruby is still behind where LISP was in the 60s.
Many new programming languages are emerging in the Algol family, and some of the most promising ones, like Perl 6 and Nemerle, are getting closer to the features LISP had in the 60s.
Machine language programs can fill memory regions with arbitrary bytes. Then they can just jump to the start of such region which will thus get executed right away.
Lisp language programs can easily create arbitrary S-expressions in memory, using cons. Then they can just call eval on these S-expressions to evaluate (interpret) them.
High-level language programs can easily fill memory regions with characters representing new code in the language's syntax, but they cannot run such code.
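The Lisp side can be imitated in Python with a toy evaluator in which a program is just a nested list, built with a hypothetical `cons` helper and run with a hypothetical `evaluate` function. This is a sketch handling only integers and three binary primitives, not a real Lisp:

```python
import operator

# Primitive procedures known to this toy evaluator (binary only, for brevity).
PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def cons(head, tail):
    """Lisp-style cons, using a Python list to stand in for a Lisp list."""
    return [head] + tail

def evaluate(expr):
    if isinstance(expr, int):      # numbers evaluate to themselves
        return expr
    if isinstance(expr, str):      # symbols name primitive procedures
        return PRIMITIVES[expr]
    # A list is a call: evaluate every element, then apply the first to the rest.
    fn, *args = [evaluate(e) for e in expr]
    return fn(*args)

# Build a program as ordinary data at run time: (* (+ 1 2) 14)
program = cons("*", [cons("+", [1, 2]), 14])
print(program)            # ['*', ['+', 1, 2], 14]
print(evaluate(program))  # 42

# Because the program is just a list, it can be edited like any other data.
program[0] = "+"
print(evaluate(program))  # 17
```

No parsing step is involved: the code is constructed and modified with the same operations used for any other list.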

The purpose of Lisp syntax to model AST

As far as I know, Lisp syntax represents an AST, but in a high-level format that humans can easily read and modify, while also making it easy for the machine to process the source code.
For this reason, in Lisp it is said that code is data and data is code, since code (an s-expression) is, in essence, just an AST. We can plug more ASTs (which are our data, which is just Lisp code) into other ASTs (Lisp code), or use them independently, to extend functionality and manipulate it on the fly (at runtime) without having to recompile the whole OS to integrate new code. In other languages, we have to recompile the human-language source code into a valid AST before it is compiled into code.
Is this the reason Lisp syntax was designed the way it is in the first place (representing an AST while remaining human-readable, to satisfy both the human and the machine)? To enable stronger (on the fly, at runtime) as well as simpler (no recompile, faster) communication between human and machine?
I heard that the Lisp machine only has a single address space which holds all data. In an operating system like Linux, programmers only have a virtual address space, which they can treat as if it were the real physical address space and do whatever they want with. Data and code in Linux are in separate regions, because effectively, data is data and code is code. In a normal OS written in C (or a C-like language), operating a single address space for the whole system and mixing data with code would be very messy.
In a Lisp Machine, since code is data and data is code, is this the reason it has only a single address space (without the virtual layer)? Since we have GC and no raw pointers, should it be safe to operate on physical memory without breaking it (since having only a single space is a lot less complicated)?
EDIT: I ask this because it is said that one of the advantages of Lisp is a single address space:
A safe language means a reliable environment without the need to separate tasks out into their own separate memory spaces.
The "clearly separated process" model characteristic of Unix has potent merits when dealing with software that might be unreliable to the point of being unsafe, as is the case with code written in C or C++, where an invalid pointer access can "take down the system." MS-DOS and its heirs are very unreliable in that sense, where just about any program bug can take the whole system down; "Blue Screen of Death" and the likes.
If the whole system is constructed and coded in Lisp, the system is as reliable as the Lisp environment. Typically this is quite safe, as once you get to the standards-compliant layers, they are quite reliable, and don't offer direct pointer access that would allow the system to self-destruct.
Third Law of Sane Personal Computing
Volatile storage devices (i.e. RAM) shall serve exclusively as read/write cache for non-volatile storage devices. From the perspective of all software except for the operating system, the machine must present a single address space which can be considered non-volatile. No computer system obeys this law which takes longer to fully recover its state from a disruption of its power source than an electric lamp would.
A single address space, as stated, holds all the running processes in the same memory space. I am just curious why people insist that a single address space is better. I relate it to the AST-like syntax of Lisp to try to explain how it fits the single-space model.
Your question doesn't reflect reality very accurately, especially in the part about code/data separation in Linux and other OSes. Actually, this separation is enforced not at the OS level, but by the compiler/program loader. At the OS level there are just memory pages that can have different protection bits set (like executable, read-only etc.), and above this level different executable formats exist (like ELF in Linux) that specify restrictions on different parts of program memory.
Returning to Lisp, as far as I know, historically the S-expression format was used by Lisp's creators because they wanted to concentrate on the semantics of the language, putting syntax aside for some time. There was a plan to eventually create some syntax for Lisp (see M-expressions), and there were some Lisp-based languages which had more syntax, like Dylan. But, overall, the Lisp community came to the consensus that the benefits of S-expressions outweigh their cons, so they stuck.
Regarding code as data, this is not strictly bound to S-expressions, as other code can be treated as data as well. This whole approach is called metaprogramming and is supported at different levels and with different mechanisms by many languages. Every language that supports eval (Perl, JavaScript, Python) lets you treat code as data; it's just that the representation is almost always a string, while in Lisp it is a tree, which is much more convenient and facilitates advanced features like macros.
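Python's standard `ast` module gives a taste of the tree representation: source text is parsed into nodes, which can be rewritten as data before being compiled. The `PlusToMinus` transformer below is a made-up, macro-like example, and the sketch assumes Python 3.9+ for `ast.unparse`:

```python
import ast

class PlusToMinus(ast.NodeTransformer):
    """Toy macro-like rewrite: turn every `a + b` node into `a - b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # rewrite nested sub-expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

# Parse source text into a tree of nodes (Python's own AST).
tree = ast.parse("x + 2 * y", mode="eval")
print(ast.dump(tree.body))

# Transform the tree as ordinary data, then compile and run the result.
new_tree = ast.fix_missing_locations(PlusToMinus().visit(tree))
print(ast.unparse(new_tree.body))       # x - 2 * y
result = eval(compile(new_tree, "<rewritten>", "eval"), {"x": 10, "y": 3})
print(result)                           # 4
```

The difference from Lisp is that Python's tree is a separate object model produced by a parser, whereas in Lisp the printed source already is the list structure that macros operate on.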

Lisp as a meta environment

I'm working towards my Ph.D. on better software reuse through integrating different types of computer languages. Due to performance and safety issues, I am not considering integrating them via foreign function calls and/or web services.
Lisp is my favorite vehicle, because of interactive development, macros, doing modifications at runtime, code as data (the usual things one would imagine hearing the word Lisp), and others.
There are some ports of different Lisps to virtual machines like the JVM (Clojure, Kawa, SISC, ABCL, etc.) or .NET (Clojure .NET, DotLisp, IronLisp). This is quite interesting, but one is restricted to the "universe" of the respective virtual machine.
Does anybody know of approaches the other way round, i.e. running Java or C# on a Lisp system? I have found what remains of cloak, but it seems to be more or less a dead project. To me it would be much more sensible to have Lisp as a common abstraction, hosting other languages like Java and C#.
Which obstacles do you see to overcoming this lack of a generic and extensible "language environment" integrating languages like Java or C# (without foreign function calls or (web) services)? Is it because no Lisp system runs on a kind of virtual machine, like LLVM for instance, or is it something else?
Best regards, Ingmar
Lisp is a good platform for this kind of language hosting because of its macro capabilities. However, you want many more language features to do it well: modules, reader macros, high-level macro specification, and so on. Racket is one Lisp variant that's going forward in this direction. You can already use Algol 60, a variant of Prolog, a typed sister language, and so on. Guile is also moving in this direction with an ECMAScript implementation.
As far as implementing Java or C# on Lisp, it is possible in theory but it would require a massive amount of work to support these languages at a practical level (Racket used to implement a small portion of Java). It's also not clear that you would really gain anything considering that the CLR and JVM are both multi-language platforms now. What is more interesting is harnessing Lisp macros to define better custom languages (DSLs), defining useful dialects of your Lisp, or implementing another language specifically to bootstrap a useful tool (e.g., Guile implementing Emacs Lisp).
Well, "it depends", as always, right?
How much of Lisp do you want to expose to Java, if any? For example, if you port the JVM to Lisp, do you somehow mate the JVM's need for a garbage collector to the actual underlying GC of the Lisp implementation, or do you simply write your own GC that collects the JVM objects within the JVM heap?
It may very well be impractical to mate the two, for several reasons. The Lisp GC is pretty much hidden from the actual implementation, much like Java's GC. That may be too hidden to work with a JVM implementation.
There's no reason you can't build a JVM in Lisp; it's just a bunch of bytecodes, and Lisp handles bytes just fine.
There have been implementations of the JVM in JavaScript, which at its core is not much different from a Lisp.
But beyond having a Lispy command line to interact with the JVM, the JVM itself wouldn't be very "Lispy". How could it be? It's a JAVA VM. The IMPLEMENTATION can be "Lispy", but, ideally, none of that Lisp-ness would bubble out to the JVM itself.
Beyond any advantages Lisp has in developing ANY program, I don't think Lisp lends itself specifically to being "better" to developing a virtual machine.
Lisp is great at developing other languages, particularly other S-expression-based languages. But a VM is a VM: a monster case statement, or some other mechanism that dispatches on numeric values.
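A sketch of that dispatch in Python, interpreting a tiny subset of real JVM integer opcodes (`iconst_0`..`iconst_5`, `bipush`, `iadd`, `isub`, `imul`, `ireturn`) through a table keyed by the numeric opcode. Everything else a JVM needs (class loading, the constant pool, frames, exceptions, GC) is left out, and `execute` is a hypothetical name:

```python
import operator

# A few real JVM opcode values; the rest of the instruction set is left out.
ICONST_0 = 0x03          # iconst_0 .. iconst_5 occupy 0x03 .. 0x08
BIPUSH = 0x10            # push the following signed byte
IADD, ISUB, IMUL = 0x60, 0x64, 0x68
IRETURN = 0xAC

def to_int32(value):
    """Wrap a Python int to a signed 32-bit value, as JVM int arithmetic does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def execute(code):
    stack = []

    def iconst(n):
        def handler(pc):
            stack.append(n)
            return pc + 1
        return handler

    def bipush(pc):
        stack.append(int.from_bytes(code[pc + 1:pc + 2], "big", signed=True))
        return pc + 2

    def binop(fn):
        def handler(pc):
            right = stack.pop()
            left = stack.pop()
            stack.append(to_int32(fn(left, right)))
            return pc + 1
        return handler

    # The dispatch table: numeric opcode -> handler returning the next pc.
    table = {BIPUSH: bipush, IADD: binop(operator.add),
             ISUB: binop(operator.sub), IMUL: binop(operator.mul)}
    for n in range(6):
        table[ICONST_0 + n] = iconst(n)

    pc = 0
    while True:
        opcode = code[pc]
        if opcode == IRETURN:
            return stack.pop()
        handler = table.get(opcode)
        if handler is None:
            raise NotImplementedError(f"opcode 0x{opcode:02X} is not in this sketch")
        pc = handler(pc)

# bipush 7; bipush 6; imul; ireturn
print(execute(bytes([BIPUSH, 7, BIPUSH, 6, IMUL, IRETURN])))  # 42
```

Whether the host is Lisp or Python, the core loop looks much the same, which supports the point that the host language matters less for the VM itself than for the tooling around it.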
Lisp is a perfect host language for such a meta-platform, but it is not necessarily an ideal target language for compiling something low-level and imperative. Fortunately, nothing stops you from generating, say, assembly code within your Lisp environment.
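As a small illustration, here is a Python sketch (standing in for the Lisp environment) that turns an S-expression-like nested list into instructions for a simple stack machine. The mnemonics `push`, `add`, `sub` and `mul` and the `compile_expr` function are hypothetical, not any real assembler's syntax:

```python
# Hypothetical stack-machine mnemonics for this sketch.
MNEMONICS = {"+": "add", "-": "sub", "*": "mul"}

def compile_expr(expr, out=None):
    """Compile a nested prefix list like ['*', ['+', 1, 2], 14] to stack code."""
    if out is None:
        out = []
    if isinstance(expr, int):
        out.append(f"push {expr}")
    else:
        op, left, right = expr
        compile_expr(left, out)     # operands first (postorder walk) ...
        compile_expr(right, out)
        out.append(MNEMONICS[op])   # ... then the operator
    return out

for line in compile_expr(["*", ["+", 1, 2], 14]):
    print(line)
```

The tree-shaped source makes the generator a short recursive walk; emitting real assembly for a particular CPU would replace the mnemonic table with that CPU's instructions and add register or stack management.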