Is Lisp a virtual machine like JVM?

Why would someone refer to Lisp as a virtual machine like JVM?

Probably because that person is referring to a specific implementation of Lisp that runs on top of a virtual machine. Various Lisp systems since at least the 1970s have run on top of specialized virtual machines. Some current implementations, such as CLISP and CMUCL, still have their own virtual machines.
Virtual machines for Lisp are usually tailored to the demands of Lisp. They provide the necessary primitive data types (cons cells, symbols, large integers), an instruction set (generic function calls, run-time type checking, ...), memory management (garbage collection), and other services (dynamic loading of code). They often take the form of an extended stack machine.
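For concreteness, here is a rough sketch (in Scala, purely illustrative; the type names are hypothetical) of the kind of tagged data such a VM manages. Real Lisp VMs use tagged machine words rather than a class hierarchy, but the idea is the same:

```scala
// Hypothetical sketch of a Lisp VM's primitive data types.
sealed trait LispObject
case object LispNil extends LispObject
final case class Cons(car: LispObject, cdr: LispObject) extends LispObject
final case class Symbol(name: String) extends LispObject
final case class Fixnum(value: Long) extends LispObject   // small integers
final case class Bignum(value: BigInt) extends LispObject // arbitrary precision

// VM instructions operate generically and type-check at run time,
// e.g. a hypothetical CAR instruction:
def carOf(x: LispObject): LispObject = x match {
  case Cons(car, _) => car
  case LispNil      => LispNil // (car nil) is nil in Common Lisp
  case _            => sys.error("CAR: not a list")
}
```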

Related

Is it possible/useful to transpile Scala to golang?

Scala Native has recently been released, but the garbage collector it uses (for now) is extremely rudimentary, which makes it unsuitable for serious use.
So I wonder: why not just transpile Scala to Go (à la Scala.js)? It would give a fast, portable runtime. And Go's GC is getting better and better. Not to mention inheriting a great concurrency model: channels and goroutines.
So why did scala-native choose to go so low level with LLVM?
What would be the catch with a golang transpiler?
There are two kinds of languages that are good targets for compilers:
Languages whose semantics closely match the source language's semantics.
Languages which have very low-level and thus very general semantics (or one might argue: no semantics at all).
Examples for #1 include: compiling ECMAScript 2015 to ECMAScript 5 (most language additions were specifically designed as syntactic sugar for existing features, you just have to desugar them), compiling CoffeeScript to ECMAScript, compiling TypeScript to ECMAScript (basically, after type checking, just erase the types and you are done), compiling Java to JVM byte code, compiling C♯ to CLI CIL bytecode, compiling Python to CPython bytecode, compiling Python to PyPy bytecode, compiling Ruby to YARV bytecode, compiling Ruby to Rubinius bytecode, compiling ECMAScript to SpiderMonkey bytecode.
Examples for #2 include: machine code for a general purpose CPU (RISC even more so), C--, LLVM.
Compiling Scala to Go fits neither of the two. Their semantics are very different.
You need either a language with powerful low-level semantics as the target language, so that you can build your own semantics on top, or you need a language with closely matching semantics, so that you can map your own semantics into the target language.
In fact, even JVM bytecode is already too high-level! It has constructs such as classes that do not match constructs such as Scala's traits, so there has to be a fairly complex encoding of traits into classes and interfaces. Likewise, before invokedynamic, it was actually pretty much impossible to represent dynamic dispatch on structural types in JVM bytecode. The Scala compiler had to resort to reflection, or in other words, deliberately stepping outside of the semantics of JVM bytecode (which resulted in a terrible performance overhead for method dispatch on structural types compared to method dispatch on other class types, even though both are the exact same thing).
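For illustration, here is the kind of structural type whose dispatch had to go through reflection (Scala 2 syntax; the example itself is mine, not from the compiler documentation):

```scala
import scala.language.reflectiveCalls

// A structural type: "any value with a close(): Unit method". Pre-invokedynamic
// JVM bytecode has no way to invoke `close` on an arbitrary, unrelated class,
// so scalac compiles this call into a (slow) reflective lookup.
def closeQuietly(resource: { def close(): Unit }): Unit =
  try resource.close() catch { case _: Exception => () }

// Works for any class with a close() method, Closeable or not:
class Door { def close(): Unit = println("door closed") }
closeQuietly(new Door)
```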
Proper Tail Calls are another example: we would like to have them in Scala, but because JVM bytecode is not powerful enough to express them without a very complex mapping (basically, you have to forego using the JVM's call stack altogether and manage your own stack, which destroys both performance and Java interoperability), it was decided to not have them in the language.
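A small illustration: Scala does rewrite self tail calls into loops (and `@tailrec` verifies this), but a general tail call, such as mutual recursion, cannot be expressed in JVM bytecode and consumes a stack frame per call:

```scala
object TailCalls {
  import scala.annotation.tailrec

  // Self tail recursion: scalac compiles this into a loop, so the stack stays flat.
  @tailrec
  def sumTo(n: Long, acc: Long = 0L): Long =
    if (n == 0) acc else sumTo(n - 1, acc + n)

  // Mutual tail recursion: the JVM has no tail-call instruction, so each call
  // pushes a new frame; large inputs throw StackOverflowError.
  def isEven(n: Int): Boolean = if (n == 0) true  else isOdd(n - 1)
  def isOdd(n: Int):  Boolean = if (n == 0) false else isEven(n - 1)
}

TailCalls.sumTo(10000000L)    // fine: runs as a loop
// TailCalls.isEven(10000000) // would blow the stack
```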
Go has some of the same problems: in order to implement Scala's expressive non-local control-flow constructs such as exceptions or threads, we need an equally expressive non-local control-flow construct to map to. For typical target languages, this "expressive non-local control-flow construct" is either continuations or the venerable GOTO. Go has GOTO, but it is deliberately limited in its "non-localness". For humans writing code, limiting the expressive power of GOTO is a good thing; for a compiler target language, not so much.
It is very likely possible to rig up powerful control-flow using goroutines and channels, but now we are already leaving the comfortable confines of just mapping Scala semantics to Go semantics, and start building Scala high-level semantics on top of Go high-level semantics that weren't designed for such usage. Goroutines weren't designed as a general control-flow construct to build other kinds of control-flow on top of. That's not what they're good at!
So why did scala-native choose to go so low level with LLVM?
Because that's precisely what LLVM was designed for and is good at.
What would be the catch with a golang transpiler?
The semantics of the two languages are too different for a direct mapping and Go's semantics are not designed for building different language semantics on top of.
their GC is getting better and better
So can Scala Native's. As far as I understand, the current choice of the Boehm–Demers–Weiser collector is basically one of laziness: it's there, it works, you can drop it into your code and it'll just do its thing.
Note that changing the GC is under discussion. There are other GCs which are designed as drop-ins rather than being tightly coupled to the host VM's object layout. E.g. IBM is currently in the process of re-structuring J9, their high-performance JVM, into a set of loosely coupled, independently re-usable "runtime building blocks" components and releasing them under a permissive open source license.
The project is called "Eclipse OMR" (source on GitHub) and it is already production-ready: the Java 8 implementation of IBM J9 was built completely out of OMR components. There is a Ruby + OMR project which demonstrates how the components can easily be integrated into an existing language runtime, because the components themselves assume no language semantics and no specific memory or object layout. The commit which swaps out the GC and adds a JIT and a profiler clocks in at just over 10000 lines. It isn't production-ready, but it boots and runs Rails. They also have a similar project for CPython (not public yet).
why not just transpile Scala to Go (a la Scala.js)?
Note that Scala.js has a lot of the same problems I mentioned above. But they are doing it anyway, because the gain is huge: you get access to every web browser on the planet. There is no comparable gain for a hypothetical Scala.go.
There's a reason why there are initiatives for getting low-level semantics into the browser such as asm.js and WebAssembly, precisely because compiling a high-level language to another high-level language always has this "semantic gap" you need to overcome.
In fact, note that even for lowish-level languages that were specifically designed as compilation targets for a specific language, you can still run into trouble. E.g. Java has generics, JVM bytecode doesn't. Java has inner classes, JVM bytecode doesn't. Java has anonymous classes, JVM bytecode doesn't. All of these have to be encoded somehow, and specifically the encoding (or rather non-encoding) of generics has caused all sorts of pain.
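A tiny demonstration of the erasure pain (shown in Scala; Java behaves the same way):

```scala
val xs: List[Int] = List(1, 2, 3)

// Type arguments are erased from the bytecode: at run time the JVM only sees
// `List`, so this check cannot work (scalac emits an "unchecked" warning):
println(xs.isInstanceOf[List[String]]) // prints true!

// Erasure also forbids overloads that differ only in their type arguments;
// both of these would erase to the identical JVM signature:
// def f(ys: List[Int]): Unit = ()
// def f(ys: List[String]): Unit = ()
```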

Where are names/values of expressions kept in functional programs?

In C#, value fields like int and float are kept on the stack, while for reference variables the pointer is on the stack and the actual value is kept on the heap (I hope my understanding is correct here).
1. Since the functional programming model has no value vs. reference types, where are the values bound to names kept?
2. How do the stack and heap come into play in functional programs?
You're trying to compare C#, which is one specific language, with functional languages all as a group. This is an apples-to-oranges comparison (or maybe more accurately, apples-to-spices comparison?).
Even within imperative languages you can observe differences between which values are stored on the stack and which go on the heap. For example, C and C++ (as I understand it) allow the programmer to manually choose between the two for any type.
And another subtlety is the difference between what the language guarantees to the programmer vs. the way the language is implemented. One example is that recent versions of Oracle's Java VM have an optimization called "escape analysis", which can allocate an object on the stack if the VM can prove that the object reference does not escape the method (determined after inlining is performed). So even though Java calls its object types "reference" types, this doesn't mean they will always be allocated on the heap. Quoting this article by Brian Goetz:
The Java language does not offer any way to explicitly allocate an object on the stack, but this fact doesn't prevent JVMs from still using stack allocation where appropriate. JVMs can use a technique called escape analysis, by which they can tell that certain objects remain confined to a single thread for their entire lifetime, and that lifetime is bounded by the lifetime of a given stack frame. Such objects can be safely allocated on the stack instead of the heap. Even better, for small objects, the JVM can optimize away the allocation entirely and simply hoist the object's fields into registers.
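For instance, in a method like the following sketch (mine, for illustration), a JVM with escape analysis may never heap-allocate the pair at all, though no language rule promises this:

```scala
// `d` never escapes this method, so after inlining the JIT may stack-allocate
// it or scalar-replace it into registers instead of heap-allocating a Tuple2.
def distance(x1: Double, y1: Double, x2: Double, y2: Double): Double = {
  val d = (x2 - x1, y2 - y1)
  math.sqrt(d._1 * d._1 + d._2 * d._2)
}
```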
Similar considerations apply to functional languages—it all depends on (a) what does the language promise, and (b) how the language implementation works and how sophisticated it is. But we can divide the functional language world into two important camps:
Eager functional languages like Scheme, Scala, Clojure or ML.
Lazy functional languages like Haskell.
There are several types of implementation for eager languages:
Pure stack-based implementations. These work the same way as modern imperative languages. Common Lisp works this way. Since JVM functional languages use the same VM as Java does, so do they.
Pure continuation-passing style implementations. These are completely stackless—everything, including activation frames, is allocated on the heap. These make it easy to support tail-call optimization and first-class continuations. This technique I believe was pioneered by Scheme implementations, and is also used by the Standard ML of New Jersey compiler. (A minimal sketch of the style follows after this list.)
Mixed implementations. These typically are trying to be mostly stack-based but also support tail-call optimization, and maybe first-class continuations. Example: a bunch of random Scheme systems.
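To make "continuation-passing style" concrete, here is a minimal sketch (in Scala, illustrative only). Every function receives an extra argument, the continuation, representing "the rest of the program"; since every call is then a tail call, no conventional call stack is needed:

```scala
// Direct style: the result is returned, so the caller's frame waits on the stack.
def add(a: Int, b: Int): Int = a + b

// CPS: the rest of the computation is passed in explicitly as `k`,
// so no call ever returns and no frame needs to wait.
def addK(a: Int, b: Int, k: Int => Unit): Unit = k(a + b)
def mulK(a: Int, b: Int, k: Int => Unit): Unit = k(a * b)

// (2 + 3) * 4 in CPS: each step hands its result to the next continuation.
addK(2, 3, sum => mulK(sum, 4, product => println(product))) // prints 20
```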
Lazy languages are another story, because the conventional call-stack implementation does not translate directly to lazy evaluation. The GHC Haskell compiler is based on a model called the "STG Machine", which does use a stack and a heap, but the way the STG stack works is different from imperative languages; an entry in the STG stack does not correspond to a "function call" as conventional stack entries do.
Since functional languages generally use immutable values (meaning: "variables" that you can't modify), it doesn't matter for the user whether the values are stored on the stack or on the heap.
Because of this, typically the compiler will decide how to store the values. For example, it might decide that small values (integers, floats, pairs of integers, 8 byte arrays, etc) are stored on the stack and large values (strings, lists, ...) are stored on the heap. It is fully the compiler's decision.
For languages like Haskell, which support lazy evaluation, values need to be stored on the heap (except when certain optimizations apply). This is because a variable needs to be either a pointer to the function/closure that computes the value, or a pointer to the actual, already-computed value.
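Scala's `lazy val`, although Scala is eager by default, gives a feel for this thunk mechanism: the slot initially refers to unevaluated code, and on first use the computed value replaces it:

```scala
// `x` starts life as a thunk: a closure that knows how to compute the value.
// The closure runs on first access, and its result overwrites the slot,
// so the side effect is observed exactly once.
lazy val x: Int = { println("computing..."); 6 * 7 }

println("before first use")
println(x) // prints "computing..." and then 42
println(x) // prints only 42: the thunk has been replaced by the value
```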
Since Scala is mentioned in the tags, I'll add an answer about that language. Scala compiles to JVM bytecode, so, at the end of the day, it works just like any other JVM language (including Java):
references and locally defined primitives go on the stack;
objects (including their primitive fields) go on the heap.
About primitive types, it's worth noting that Scala doesn't actually have primitive types in the language; but value types (like Int or Long) do get compiled to the JVM's primitive types in the bytecode, when possible.
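A small illustration of when Scala's value types stay primitive and when they get boxed onto the heap:

```scala
// Compiles to the JVM primitive `int`: lives in a local-variable slot/register.
val n: Int = 42

// Generic containers erase their type parameter to Object, so these Ints are
// boxed into java.lang.Integer objects on the heap.
val xs: List[Int] = List(1, 2, 3)

// Arrays are the exception: Array[Int] compiles to the JVM's int[] —
// still a heap object, but one holding unboxed primitive ints.
val a: Array[Int] = Array(1, 2, 3)
```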
edit: To avoid leaving something incorrect in this answer: as mentioned in Luis Casillas' extensive answer, objects may end up stored on the stack (or even not allocated as objects at all) if the JVM can judge that it's safe and efficient to do so.

Why doesn't a primitive `call-with-current-continuation` exist in Common Lisp?

Scheme offers a primitive call-with-current-continuation, commonly abbreviated call/cc, which has no equivalent in the ANSI Common Lisp specification (although there are some libraries that try to implement it).
Does anybody know why the decision was made not to include a similar primitive in the ANSI Common Lisp specification?
Common Lisp has a detailed file compilation model as part of the standard language. The model supports compiling the program to object files in one environment and loading them into an image in another environment. There is nothing comparable in Scheme: no eval-when, no compile-file, no load-time-value, and no concepts like what makes an object externalizable or how semantics in compiled code must agree with interpreted code. Lisp gives you control over whether functions are inlined or not, so you basically control with great precision what happens when a compiled module is re-loaded.
By contrast, until a recent revision of the Scheme report, the Scheme language was completely silent on the topic of how a Scheme program is broken into multiple files. No functions or macros were provided for this. Look at R5RS, under 6.6.4 System Interface. All that you have there is a very loosely defined load function:
optional procedure: (load filename)
Filename should be a string naming an existing file containing Scheme source code. The load procedure reads expressions and definitions from the file and evaluates them sequentially. It is unspecified whether the results of the expressions are printed. The load procedure does not affect the values returned by current-input-port and current-output-port. Load returns an unspecified value.
Rationale: For portability, load must operate on source files. Its operation on other kinds of files necessarily varies among implementations.
So if that is the extent of your vision about how applications are built from modules, and all details beyond that are left to implementors to work out, of course the sky is the limit regarding inventing programming language semantics. Note in part the Rationale part: if load is defined as operating on source files (with all else being a bonus courtesy of the implementors) then it is nothing more than a textual inclusion mechanism like #include in the C language, and so the Scheme application is really just one body of text that is physically spread into multiple text files pulled together by load.
If you're thinking about adding any feature to Common Lisp, you have to think about how it fits into its detailed dynamic loading and compilation model, while preserving the good performance that users expect.
If the feature you're thinking of requires global, whole-program optimization (whereby the system needs to see the structural source code of everything) in order that users' programs not run poorly (and in particular programs which don't use that feature) then it won't really fly.
Specifically with regard to the semantics of continuations, there are issues. In the usual semantics of a block scope, once we leave a scope and perform cleanup, that scope is gone; we cannot travel back to it in time and resume the computation. Common Lisp is ordinary in that way. We have the unwind-protect construct, which performs unconditional cleanup actions when a scope terminates. This is the basis for features like with-open-file, which provides an open file handle object to a block scope and ensures that it is closed no matter how the block scope terminates.

If a continuation escapes from that scope, that continuation no longer has a valid file. We cannot simply leave the file open when we exit the scope, because there is no assurance that the continuation will ever be used; that is to say, we have to assume that the scope is in fact being abandoned forever and clean up the resource in a timely way.

The band-aid solution for this kind of problem is dynamic-wind, which lets us add handlers on entry to and exit from a block scope. Thus we can re-open the file when the block is restarted by a continuation; and not only re-open it, but actually position the stream at exactly the same position in the file, and so on: if the stream was halfway through decoding some UTF-8 character, we must put it into the same state. So if Lisp got continuations, either they would be broken by the various with- constructs that perform cleanup (poor integration), or else those constructs would have to acquire much hairier semantics.
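Scala has the same cleanup idiom (try/finally rather than unwind-protect), which makes the conflict easy to see; a sketch with hypothetical names:

```scala
import scala.io.Source

// The moral equivalent of with-open-file: the resource is closed no matter how
// the block exits, because we must assume the scope is being abandoned for good.
def withFile[A](path: String)(body: Source => A): A = {
  val src = Source.fromFile(path)
  try body(src)
  finally src.close() // unconditional cleanup, like unwind-protect
}

// If a first-class continuation captured inside `body` were resumed later, it
// would find `src` already closed; making that work would require dynamic-wind
// style re-entry hooks that reopen the file and reposition the stream.
```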
There are alternatives to continuations. Some uses of continuations are non-essential: essentially the same code organization can be obtained with closures or restarts.

Also, there is a powerful language/operating-system construct that can compete with the continuation: the thread. While continuations have aspects that are not modeled nicely by threads (and, unlike threads, they do not introduce deadlocks and race conditions into the code), they also have disadvantages compared to threads, like the lack of actual concurrency for utilizing multiple processors, or prioritization. Many problems expressible with continuations can be expressed with threads almost as easily.

For instance, continuations let us write a recursive-descent parser which looks like a stream-like object that just returns progressive results as it parses; the code is actually a recursive-descent parser, not a state machine which simulates one. Threads let us do the same thing: we can put the parser into a thread wrapped in an "active object" which has some "get next thing" method that pulls items from a queue. As the thread parses, instead of returning a continuation, it just pushes objects onto the queue (possibly blocking until some other thread removes them). Continuation of execution is provided by resuming that thread; its thread context is the continuation.

Not all threading models suffer from race conditions (as much); there is, for instance, cooperative threading, under which one thread runs at a time and thread switches only potentially take place when a thread makes an explicit call into the threading kernel. Major Common Lisp implementations have had lightweight threads (typically called "processes") for decades and have gradually moved toward more sophisticated threading with multiprocessing support. Support for threads lessens the need for continuations, and is a greater implementation priority, because language run-times without thread support are at a technological disadvantage: they cannot take full advantage of the hardware's resources.
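The thread-backed "active object" with a queue described above can be sketched like this (in Scala for concreteness; the names are mine and this is illustrative, not production code):

```scala
import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue}

// A generator backed by a thread: the producer keeps its natural recursive
// structure and "yields" by blocking on a bounded queue. The thread's context
// plays the role of the continuation; next() resumes it.
final class Generator[A](producer: (A => Unit) => Unit) {
  private val queue: BlockingQueue[Option[A]] = new LinkedBlockingQueue(1)
  private val thread = new Thread(() => {
    producer(a => queue.put(Some(a))) // blocks until the consumer catches up
    queue.put(None)                   // end-of-stream marker
  })
  thread.setDaemon(true)
  thread.start()

  def next(): Option[A] = queue.take() // taking an item lets the producer run on
}

// Usage: the producer reads as ordinary (possibly recursive) code.
val gen = new Generator[Int](emit => (1 to 5).foreach(emit))
Iterator.continually(gen.next()).takeWhile(_.isDefined).flatten.foreach(println)
```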
This is what Kent M. Pitman, one of the designers of Common Lisp, had to say on the topic on comp.lang.lisp.
The design of Scheme was based on using function calls to replace most common control structures. This is why Scheme requires tail-call elimination: it allows a loop to be converted to a recursive call without the risk of running out of stack space. The approach underlying this is continuation-passing style.
Common Lisp is more practical and less pedagogic. It doesn't dictate implementation strategies, and continuations are not required to implement it.
Common Lisp is the result of a standardization effort on several flavors of practical (applied) Lisps (thus "Common"). CL is geared towards real life applications, thus it has more "specific" features (like handler-bind) instead of call/cc.
Scheme was designed as a small, clean language for teaching CS, so it has the fundamental call/cc, which can be used to implement other tools.
See also Can call-with-current-continuation be implemented only with lambdas and closures?

Dynamic typing and programming distributed systems

Coming from Scala (and Akka), I recently began looking at other languages that were designed with distributed computing in mind, namely Erlang (and a tiny bit of Oz and Bloom). Both Erlang and Oz are dynamically typed, and if I remember correctly (I will try to find the link), people have tried to add types to Erlang and managed to type a good portion of it, but could not coerce the system to fit the last bit.
Oz, while a research language, is certainly interesting to me, but that is dynamically typed as well.
Bloom's current implementation is in Ruby, and is consequently dynamically typed.
To my knowledge, Scala (and I suppose Haskell, though I believe that was initially built more as an exploration into pure lazy functional languages than for distributed systems) is the only language that is statically typed and offers language-level abstractions (for lack of a better term) for distributed computing.
I am just wondering if there are inherent advantages of dynamic typing over static typing, specifically in the context of providing language level abstractions for programming distributed systems.
Not really. For example, the same group that invented Oz later did some work on Alice ML, a project whose mission statement was to rethink Oz as a typed, functional language. And although it remained a research project, I'd argue that it was enough proof of concept to demonstrate that the same basic functionality can be supported in such a setting.
(Full disclosure: I was a PhD student in that group at the time, and the type system of Alice ML was my thesis.)
Edit: The problem with adding types to Erlang isn't distribution, it simply is an instance of the general problem that adding types to a language after the fact never works out well. On the other hand, there still is Dialyzer for Erlang.
Edit 2: I should mention that there were other interesting research projects for typed distributed languages, e.g. Acute, which had a scope similar to Alice ML, or ML5, which used modal types to enable stronger checking of mobility characteristics. But they have only survived in the form of papers.
There are no inherent advantages of dynamic typing over static typing for distributed systems. Both have their own advantages and disadvantages in general.
Erlang (Akka's actor model is inspired by Erlang's) is dynamically typed. Dynamic typing in Erlang was historically chosen for simple reasons: those who first implemented Erlang mostly came from dynamically typed languages, particularly Prolog, so making Erlang dynamic was the most natural option for them. Erlang was also built with failure in mind.
Static typing helps catch many errors at compile time, rather than at runtime as with dynamic typing. Static typing was tried in Erlang and it was a failure. But dynamic typing helps with faster prototyping. Check this link for reference, which says a lot about the difference.
Subjectively, I would rather think about the solution/algorithm for a problem than about the type of each of the variables I use in the algorithm. It also helps with quick development.
Here are a few links which might help:
BenefitsOfDynamicTyping
static-typing-vs-dynamic-typing
BizarroStaticTypingDebate
Cloud Haskell is maturing quickly, statically-typed, and awesome. The only thing it doesn't feature is Erlang-style hot code swapping - that's the real "killer feature" of dynamically-typed distributed systems (the "last bit" that made Erlang difficult to statically type).

Are there any managed programming languages that compile to machine code?

Managed languages being the ones that handle memory cleanup for you.
EDIT I'm not talking about garbage collection. I was just interested in knowing about languages that would free() memory for me automatically, and still compile down to machine code.
You seem to be confusing "managed" and "garbage collection": while managed languages (for example C# and Java) often have automated garbage collection, "managed" actually refers to the fact that there is a "virtual machine" which executes your code (see http://en.wikipedia.org/wiki/Managed_code).
So for example the CLR (common language runtime) is the virtual machine executing .Net code, and the JVM (Java virtual machine) is the virtual machine executing java code.
You can in fact have garbage collection for unmanaged languages (for example C++), and vice versa have managed languages without garbage collection (EDIT: I was looking for some, but I can't seem to find any, unless Objective-C counts; I'm not sure it makes a huge amount of sense to create a managed language without garbage collection anyway).
Both Java and C# can in fact be compiled directly into machine code, so that they are executed directly rather than on a virtual machine. For .NET code this is done using NGEN (in fact, the CLR compiles .NET assemblies into machine code as you execute them, so-called "just-in-time" compilation).
EDIT: As an update to the update of your question: there are in fact a number of alternatives to garbage collection on a spectrum between the extremes of completely manual memory management and garbage collection, and a lot of languages which compile to machine code incorporate varying forms of memory management which don't require you to explicitly free memory.
Can I ask: is this an "out of interest" question, or are you trying to select a language for a project? If the latter, why are you so interested in having your language compile down to machine code? Certainly in the case of .NET, having your code JIT-compiled offers a number of performance advantages (in the majority of cases); also, NGENing your code doesn't remove the dependency on the .NET framework.
lots:
LISP (and variants), Erlang, C# (under Mono), Haskell, Java (with gcj)
Sure there are. Java, for instance. (gcj)
However, the term "managed" itself implies that you have to carry some runtime around.
A few more, in the broader sense of "managed" meaning safe (via runtime type checking or exhaustive static analysis) and/or garbage collected:
OCaml
D
Ada
Prolog
Clean
Eiffel
Analogous to Efraims's answer, any .NET program will compile to machine code as well, usually in two steps (JIT), but there is an NGEN tool to pre-compile the MSIL to native code.
There is a semi-GC choice: GLib.
GLib uses reference counting to manage object lifetimes: when an object's reference count reaches 0, the object is freed. A minimal sketch of the idea follows below.
It is much more inconvenient than .NET, Java, or Python, but when you have to use C, it's better than nothing.
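The idea, sketched in a few lines (Scala for illustration; GLib itself does this in C with g_object_ref/g_object_unref):

```scala
// Minimal sketch of GLib-style reference counting; the names are hypothetical.
final class RefCounted[A](resource: A, release: A => Unit) {
  private var count = 1                 // the creator holds the first reference
  def ref(): RefCounted[A] = { count += 1; this }
  def unref(): Unit = {
    count -= 1
    if (count == 0) release(resource)   // the last owner frees the resource
  }
  def get: A = resource
}

// Every owner must pair each ref() with exactly one unref(): a forgotten
// unref leaks, and reference cycles are never reclaimed (unlike a tracing GC).
```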