Typed Racket Optimizer - compiler-optimization

I am learning some Typed Racket at the moment and i have a somewhat philosophical dilemma:
Racket claims to be a language development framework and Typed Racket is one such languages implemented on top of it. The documentation mentions that due to types being used, the compiler now can do more/better optimizations.
The concrete question:
Where do these optimizations happen?
1) In the compile/expand part (which is "programmable" as part of the language building framework)
-or-
2) further down the line in the (bytecode) optimizer (which is written in C and not directly modifieable via the framework).
If 2) is true, does that mean the type information is lost after the compile/expand stage and later "rebuilt/guessed" by the optimizer or has the intermediate representation been altered to to accomodate the type information and inform later stages about them?
The reason i am asking this specific question is because i want to get a feeling for how general the Racket language framework really is, i.e. is also viable for statically typed languages without any modifications in the backend versus the type system being only a front-end thing, while the code at runtime is still dynamically typed (but statically checked of course).
Thank you.

Typed Racket's optimizations occur during macro expansion. To see for yourself, you can change #lang typed/racket to #lang typed/racket #:no-optimize, which shows Typed Racket is in complete control of what optimizations are applied.
The optimizations consist of using type information to replace various uses of certain procedures with their unsafe equivalents. The unsafe procedures perform no runtime checks on the types of their arguments and cause undefined behavior (read: segfaults) if used incorrectly. You can find out more in the documentation section entitled Optimization in Typed Racket.
The exposure of the unsafe variants of procedures is what really makes it possible for user-defined languages to implement these optimizations. For example, if you wrote your own language with a type system that could prove vectors were never accessed with out-of-bounds indices you could replaces uses of vector-ref with unsafe-vector-ref.
There are similar optimizations that occur at the bytecode level, but these mostly apply when the JIT can infer type information that's not visible at macro expansion time. These are not user-controlled, but you don't have to rely on them.

Related

Common Lisp: Are all functions built from the core functions, CAR, CDR, CONS, etc?

True or False?
Common Lisp has a mountain of functions. All of those functions are built (or could be built) using this small set of core functions: CAR, CDR, CONS, ATOM, EQ, QUOTE, COND, LAMBDA, LABEL, NULL.
If the answer is False, can you provide an example of a function that cannot be implemented using the core functions? Perhaps the list of core functions is incomplete, and another or two additional core functions are required?
[..] All of those functions are built (or could be built) using [..]
The important part is the could, which you already figured out yourself. That (almost*) all of (a) Lisp could be built using that small set of core functions (forms) is part of the beauty of Lisp. In practice, though, the set of functions (forms) not implemented in Lisp is a lot larger.
Why do implementations then bother to implement that much, when they only could implement that minimal core? As a small example, think of this expression:
(+ 1 2)
You could implement a Lisp that uses only the small set of core functions and it would be able (given an appropriate parser for the numbers) to evaluate this expression. But it would be painfully slow! The systems (CPUs) that are available to us mostly provide a lot of different instructions which Lisp implementations (especially the compiling ones) try to leverage as much as possible in order to allow fast execution of Lisp programs. And, to get back to the example, that also means that one wouldn't do actual calculations using peano arithmetic but rather use the "boolean logic arithmetic" implemented in hardware.
If the answer is False, can you provide an example of a function that cannot be implemented using the core functions?
That's easy, how'd you implement format? Anything that's not part of the "algorithmic" nature of a programming language, i.e. the stuff interfacing with the "outside world", is often not implemented in itself but - being dependent on the underlying system - most often in C or assembly.
False. Any function which deals with a datatype other than conses can not be implemented from your proposed core set: for instance important types such as NUMBER and SYMBOL can not be dealt with at all. Any function which does I/O likewise, and probably other vast reaches of the language.
Your list sounds like it came from some incomplete description of a proposed core of Lisp 1.5.

What is the difference between Clojure REPL and Scala REPL?

I’ve been working with Scala language for a few months and I’ve already created couple projects in Scala. I’ve found Scala REPL (at least its IntelliJ worksheet implementation) is quite convenient for quick development. I can write code, see what it does and it’s nice. But I do the procedure only for functions (not whole program). I can’t start my application and change it on spot. Or at least I don’t know how (so if you know you are welcome to give me piece of advice).
Several days ago my associate told me about Clojure REPL. He uses Emacs for development process and he can change code on spot and see results without restarting. For example, he starts the process and if he changes implementation of a function, his code will change his behavior without restart. I would like to have the same thing with Scala language.
P.S. I want to discuss neither which language is better nor does functional programming better than object-oriented one. I want to find a good solution. If Clojure is the better language for the task so let it be.
The short answer is that Clojure was designed to use a very simple, single pass compiler which reads and compiles a single s-expression or form at a time. For better or worse there is no global type information, no global type inference and no global analysis or optimization. Clojure uses clojure.lang.Var instances to create global bindings through a series of hashmaps from textual symbols to transactional values. def forms all create bindings at global scope in this global binding map. So where in Scala a "function" (method) will be resolved to an instance or static method on a given JVM class, in Clojure a "function" (def) is really just a reference to an entry in the table of var bindings. When a function is invoked, there isn't a static link to another class, instead the var is reference by symbolic name, then dereferenced to get an instance of a clojure.lang.IFn object which is then invoked.
This layer of indirection means that it is possible to re-evaluate only a single definition at a time, and that re-evaluation becomes globaly visible to all clients of the re-defined var.
In comparison, when a definition in Scala changes, scalac must reload the changed file, macroexpand, type infer, type check, and compile. Then due to the semantics of classloading on the JVM, scalac must also reload all classes which depend on methods in the class which changed. Also all values which are instances of the changed class become trash.
Both approaches have their strengths and weaknesses. Obviously Clojure's approach is simpler to implement, however it pays an ongoing cost in terms of performance due to continual function lookup operations forget correctness concerns due to lack of static types and what have you. This is arguably suitable for contexts in which lots of change is happening in a short timeframe (interactive development) but is less suitable for context when code is mostly static (deployment, hence Oxcart). some work I did suggests that the slowdown on Clojure programs from lack of static method linking is on the order of 16-25%. This is not to call Clojure slow or Scala fast, they just have different priorities.
Scala chooses to do more work up front so that the compiled application will perform better which is arguably more suitable for application deployment when little or no reloading will take place, but proves a drag when you want to make lots of small changes.
Some material I have on hand about compiling Clojure code more or less cronological by publication order since Nicholas influenced my GSoC work a lot.
Clojure Compilation [Nicholas]
Clojure Compilation: Full Disclojure [Nicholas]
Why is Clojure bootstrapping so slow? [Nicholas]
Oxcart and Clojure [me]
Of Oxen, Carts and Ordering [me]
Which I suppose leaves me in the unhappy place of saying simply "I'm sorry, Scala wasn't designed for that the way Clojure was" with regards to code hot swapping.

Why is Scala's Type system not a Library in Clojure

I've heard people claim that:
Scala's type system is amazing (existential types, variant, co-variant)
Because of the power of macros, everything is a library in Clojure: (pattern matching, logic programming, non-determinism, ..)
Question:
If both assertions are true, why is Scala's type system not a library in Clojure? Is it because:
types are one of these things that do not work well as a library? [i.e. the changes would somehow have to threaded through every existing clojure library, including clojure.core?]
is Scala's notion of types fundamentally incompatible with clojure protocol / records?
... ?
It's an interesting question.
You are certainly right about Scala having an amazing type system, and about Clojure being phenomenal for meta-programming and extension of the language (although that is about more than just macros....).
A few reasons I can think of:
Clojure is a dynamically typed language while Scala is a statically typed language. Having powerful type inference isn't so much use in a language where you can assume relatively little about the types of your inputs.
Clojure already has a very interesting project to add typing as a library (Typed Clojure) which looks very promising - however it's very different in approach to Scala as it is designed for a dynamic language from the start (inspired more by Typed Racket, I believe).
Clojure philosophy actually discourages certain OOP concepts (particularly implementation inheritance, mutable objects, and data encapsulation). A type system that supports these things (as Scala does) wouldn't be a good fit for Clojure idioms - at best they would be ignored, but they could easily encourage a style of development that would cause people to run into severe problems later.
Clojure already provides tools that solve many of the problems you would typically solve with types in other languages - e.g. the use of protocols for polymorphism.
There's a strong focus in the Clojure community on simplicity (in the sense of the excellent video "Simple Made Easy" - see particularly the slide at 39:30). While Scala's type system is certainly amazing, I think it's a stretch to describe it as "Simple"
Putting in a Scala-style type system would probably require a complete rewrite of the Clojure compiler and make it substantially more complex. Nobody seems to have signed up so far to take on that particular challenge... and there's a risk that even if someone were willing and able to do this then the changes could be rejected for the various cultural / technical reasons covered above.
In the absence of a major change to Clojure itself (which I think would be unlikely) then one interesting possibility would be to create a DSL within Clojure that provided Scala-style type inference for a specific domain and compiled this DSL direct to optimised Java bytecode. I could see that being a useful approach for specific problem domains (large scale numerical data crunching with big matrices, for example).
To simply answer your question "... why is Scala's type system not a library in Clojure?":
Because the type system is part of the scala compiler and not of the scala library. The whole power of scalas type system only exists at compile time. The JVM has no support for things like that, because of type erasure and also, because it would simply slow down execution. And also there is no need for it. If you have a statically typed language, you don't need type information at runtime, unless you want to do dirty stuff.
edit:
#mikera the jvm is sure capable of running the scala compiler, I did not say anything like that. I just said, that the jvm has no support for type systems like that. It does not even support generics. At runtime all these types are gone. The compiler checks for the correctness of a program and removes all the higher kinded types / generics.
example:
val xs: List[Int] = List(1,2,3,4)
val x1: Int = xs.head
will at runtime look like this:
val xs: List = List.apply(1,2,3,4)
val x1: Int = xs.head.asInstanceOf[Int]
But it doesn't matter, because the compiler checked it before. You can only get in trouble here, when you use reflection, because you could put any value in the list and it would break at runtime exactly where the value is casted to Int.
And this is one of the reasons, why the scala type system is not part of the scala library, but built into the compiler.
And also the question of the OP was "... why is Scala's type system not a library in Clojure?" and not "Is it possible to create a type system such as scalas for clojure?" and I perfectly answered that question.

Would the ability to declare Lisp functions 'pure' be beneficial?

I have been reading a lot about Haskell lately, and the benefits that it derives from being a purely functional language. (I'm not interested in discussing monads for Lisp) It makes sense to me to (at least logically) isolate functions with side-effects as much as possible. I have used setf and other destructive functions plenty, and I recognize the need for them in Lisp and (most of) its derivatives.
Here we go:
Would something like (declare pure) potentially help an optimizing compiler? Or is this a moot point because it already knows?
Would the declaration help in proving a function or program, or at least a subset that was declared as pure? Or is this again something that is unnecessary because it's already obvious to the programmer and compiler and prover?
If for nothing else, would it be useful to a programmer for the compiler to enforce purity for functions with this declaration and add to the readability/maintainablity of Lisp programs?
Does any of this make any sense? Or am I too tired to even think right now?
I'd appreciate any insights here. Info on compiler implementation or provability is welcome.
EDIT
To clarify, I didn't intend to restrict this question to Common Lisp. It clearly (I think) doesn't apply to certain derivative languages, but I'm also curious if some features of other Lisps may tend to support (or not) this kind of facility.
You have two answers but neither touch on the real problem.
First, yes, it would obviously be good to know that a function is pure. There's a ton of compiler level things that would like to know that, as well as user level things. Given that lisp languages are so flexible, you could twist things a bit: instead of a "pure" declaration that asks the compiler to try harder or something, you just make the declaration restrict the code in the definition. This way you can guarantee that the function is pure.
You can even do that with additional supporting facilities -- I mentioned two of them in a comment I made to johanbev's answer: add the notion of immutable bindings and immutable data structures. I know that in Common Lisp these are very problematic, especially immutable bindings (since CL loads code by "side-effecting" it into place). But such features will help simplifying things, and they're not inconceivable (see for example the Racket implementation that has immutable pairs and other data structures, and has immutable bindings.
But the real question is what can you do in such restricted functions. Even a very simple looking problem would be infested with issues. (I'm using Scheme-like syntax for this.)
(define-pure (foo x)
(cons (+ x 1) (bar)))
Seems easy enough to tell that this function is indeed pure, it doesn't do anything . Also, seems that having define-pure restrict the body and allow only pure code would work fine in this case, and will allow this definition.
Now start with the problems:
It's calling cons, so it assumes that it is also known to be pure. In addition, as I mentioned above, it should rely on cons being what it is, so assume that the cons binding is immutable. Easy, since it's a known builtin. Do the same with bar, of course.
But cons does have a side effect (even if you're talking about Racket's immutable pairs): it allocates a new pair. This seems like a minor and ignorable point, but, for example, if you allow such things to appear in pure functions, then you won't be able to auto-memoize them. The problem is that someone might rely on every foo call returning a new pair -- one that is not-eq to any other existing pair. Seems that to make it fine you need to further restrict pure functions to deal not only with immutable values, but also values where the constructor doesn't always create a new value (eg, it could hash-cons instead of allocate).
But that code also calls bar -- so no you need to make the same assumptions on bar: it must be known as a pure function, with an immutable binding. Note specifically that bar receives no arguments -- so in that case the compiler could not only require that bar is a pure function, it could also use that information and pre-compute its value. After all, a pure function with no inputs could be reduced to a plain value. (Note BTW that Haskell doesn't have zero-argument functions.)
And that brings another big issue in. What if bar is a function of one input? In that case you'd have an error, and some exception will get thrown ... and that's no longer pure. Exceptions are side-effects. You now need to know the arity of bar in addition to everything else, and you need to avoid other exceptions. Now, how about that input x -- what happens if it isn't a number? That will throw an exception too, so you need to avoid it too. This means that you now need a type system.
Change that (+ x 1) to (/ 1 x) and you can see that not only do you need a type system, you need one that is sophisticated enough to distinguish 0s.
Alternatively, you could re-think the whole thing and have new pure arithmetic operations that never throw exceptions -- but with all the other restrictions you're now quite a long way from home, with a language that is radically different.
Finally, there's one more side-effect that remains a PITA: what if the definition of bar is (define-pure (bar) (bar))? It certainly is pure according to all of the above restrictions... But diverging is a form of a side effect, so even this is no longer kosher. (For example, if you did make your compiler optimize nullary functions to values, then for this example the compiler itself would get stuck in an infinite loop.) (And yes, Haskell doesn't deal with that, it doesn't make it less of an issue.)
Given a Lisp function, knowing if it is pure or not is undecidable in general. Of course, necessary conditions and sufficient conditions can be tested at compile time. (If there are no impure operations at all, then the function must be pure; if an impure operation gets executed unconditionally, then the function must be impure; for more complicated cases, the compiler could try to prove that the function is pure or impure, but it will not succeed in all cases.)
If the user can manually annotate a function as pure, then the compiler could either (a.) try harder to prove that the function is pure, ie. spend more time before giving up, or (b.) assume that it is and add optimizations which would not be correct for impure functions (like, say, memoizing results). So, yes, annotating functions as pure could help the compiler if the annotations are assumed to be correct.
Apart from heuristics like the "trying harder" idea above, the annotation would not help to prove stuff, because it's not giving any information to the prover. (In other words, the prover could just assume that the annotation is always there before trying.) However, it could make sense to attach to pure functions a proof of their purity.
The compiler could either (a.) check if pure functions are indeed pure at compile time, but this is undecidable in general, or (b.) add code to try to catch side effects in pure functions at runtime and report those as an error. (a.) would probably be helpful with simple heuristics (like "an impure operation gets executed unconditionally), (b.) would be useful for debug.
No, it seems to make sense. Hopefully this answer also does.
The usual goodies apply when we can assume purity and referential
transparency. We can automatically memoize hotspots. We can
automatically parallelize computation. We can deal away with a lot of
race conditions. We can also use structure sharing with data that we
know cannot be modified, for instance the (quasi) primitive ``cons()''
does not need to copy the cons-cells in the list it's consing to.
These cells are not affected in any way by having another cons-cell
pointing to it. This example is kinda obvious, but compilers are often
good performers in figuring out more complex structure sharing.
However, actually determining if a lambda (a function) is pure or has
referential transparency is very tricky in Common Lisp. Remember that
a funcall (foo bar) start by looking at (symbol-function foo). So in
this case
(defun foo (bar)
(cons 'zot bar))
foo() is pure.
The next lambda is also pure.
(defun quux ()
(mapcar #'foo '(zong ding flop)))
However, later on we can redefine foo:
(let ((accu -1))
(defun foo (bar)
(incf accu)))
The next call to quux() is no longer pure! The old pure foo() has been
redefined to an impure lambda. Yikes. This example is maybe somewhat
contrived but it's not that uncommon to lexically redefine some
functions, for instance with a let block. In that case it's not
possible to know what would happen at compile time.
Common Lisp has a very dynamic semantic, so actually being
able to determine control flow and data flow ahead of time (for
instance when compiling) is very hard, and in most useful cases
entirely undecidable. This is quite typical of languages with dynamic
type systems. There is a lot of common idioms in Lisp you cannot use
if you must use static typing. It's mainly these that fouls any
attempt to do much meaningful static analysis. We can do it for primitives
like cons and friends. But for lambdas involving other things than
primitives we are in much deeper water, especially in the cases where
we need to look at complex interplay between functions. Remember that
a lambda is only pure if all the lambdas it calls are also pure.
On the top of my head, it could be possible, with some deep macrology,
to do away with the redefinition problem. In a sense, each lambda gets
an extra argument which is a monad that represents the entire state of
the lisp image (we can obviously restrict ourselves to what the function
will actually look at). But it's probably more useful to be able do
declare purity ourselves, in the sense that we promise the compiler
that this lambda is indeed pure. The consequences if it isn't is then
undefined, and all sorts of mayhem could ensue...

Writing programs in dynamic languages that go beyond what the specification allows

With the growth of dynamically typed languages, as they give us more flexibility, there is the very likely probability that people will write programs that go beyond what the specification allows.
My thinking was influenced by this question, when I read the answer by bobince:
A question about JavaScript's slice and splice methods
The basic thought is that splice, in Javascript, is specified to be used in only certain situations, but, it can be used in others, and there is nothing that the language can do to stop it, as the language is designed to be extremely flexible.
Unless someone reads through the specification, and decides to adhere to it, I am fairly certain that there are many such violations occuring.
Is this a problem, or a natural extension of writing such flexible languages? Or should we expect tools like JSLint to help be the specification police?
I liked one answer in this question, that the implementation of python is the specification. I am curious if that is actually closer to the truth for these types of languages, that basically, if the language allows you to do something then it is in the specification.
Is there a Python language specification?
UPDATE:
After reading a couple of comments, I thought I would check the splice method in the spec and this is what I found, at the bottom of pg 104, http://www.mozilla.org/js/language/E262-3.pdf, so it appears that I can use splice on the array of children without violating the spec. I just don't want people to get bogged down in my example, but hopefully to consider the question.
The splice function is intentionally generic; it does not require that its this value be an Array object.
Therefore it can be transferred to other kinds of objects for use as a method. Whether the splice function
can be applied successfully to a host object is implementation-dependent.
UPDATE 2:
I am not interested in this being about javascript, but language flexibility and specs. For example, I expect that the Java spec specifies you can't put code into an interface, but using AspectJ I do that frequently. This is probably a violation, but the writers didn't predict AOP and the tool was flexible enough to be bent for this use, just as the JVM is also flexible enough for Scala and Clojure.
Whether a language is statically or dynamically typed is really a tiny part of the issue here: a statically typed one may make it marginally easier for code to enforce its specs, but marginally is the key word here. Only "design by contract" -- a language letting you explicitly state preconditions, postconditions and invariants, and enforcing them -- can help ward you against users of your libraries empirically discovering what exactly the library will let them get away with, and taking advantage of those discoveries to go beyond your design intentions (possibly constraining your future freedom in changing the design or its implementation). And "design by contract" is not supported in mainstream languages -- Eiffel is the closest to that, and few would call it "mainstream" nowadays -- presumably because its costs (mostly, inevitably, at runtime) don't appear to be justified by its advantages. "Argument x must be a prime number", "method A must have been previously called before method B can be called", "method C cannot be called any more once method D has been called", and so on -- the typical kinds of constraints you'd like to state (and have enforced implicitly, without having to spend substantial programming time and energy checking for them yourself) just don't lend themselves well to be framed in the context of what little a statically typed language's compiler can enforce.
I think that this sort of flexibility is an advantage as long as your methods are designed around well defined interfaces rather than some artificial external "type" metadata. Most of the array functions only expect an object with a length property. The fact that they can all be applied generically to lots of different kinds of objects is a boon for code reuse.
The goal of any high level language design should be to reduce the amount of code that needs to be written in order to get stuff done- without harming readability too much. The more code that has to be written, the more bugs get introduced. Restrictive type systems can be, (if not well designed), a pervasive lie at worst, a premature optimisation at best. I don't think overly restrictive type systems aid in writing correct programs. The reason being that the type is merely an assertion, not necessarily based on evidence.
By contrast, the array methods examine their input values to determine whether they have what they need to perform their function. This is duck typing, and I believe that this is more scientific and "correct", and it results in more reusable code, which is what you want. You don't want a method rejecting your inputs because they don't have their papers in order. That's communism.
I do not think your question really has much to do with dynamic vs. static typing. Really, I can see two cases: on one hand, there are things like Duff's device that martin clayton mentioned; that usage is extremely surprising the first time you see it, but it is explicitly allowed by the semantics of the language. If there is a standard, that kind of idiom may appear in later editions of the standard as a specific example. There is nothing wrong with these; in fact, they can (unless overused) be a great productivity boost.
The other case is that of programming to the implementation. Such a case would be an actual abuse, coming from either ignorance of a standard, or lack of a standard, or having a single implementation, or multiple implementations that have varying semantics. The problem is that code written in this way is at best non-portable between implementations and at worst limits the future development of the language, for fear that adding an optimization or feature would break a major application.
It seems to me that the original question is a bit of a non-sequitor. If the specification explicitly allows a particular behavior (as MUST, MAY, SHALL or SHOULD) then anything compiler/interpreter that allows/implements the behavior is, by definition, compliant with the language. This would seem to be the situation proposed by the OP in the comments section - the JavaScript specification supposedly* says that the function in question MAY be used in different situations, and thus it is explicitly allowed.
If, on the other hand, a compiler/interpreter implements or allows behavior that is expressly forbidden by a specification, then the compiler/interpreter is, by definition, operating outside the specification.
There is yet a third scenario, and an associated, well defined, term for those situations where the specification does not define a behavior: undefined. If the specification does not actually specify a behavior given a particular situation, then the behavior is undefined, and may be handled either intentionally or unintentionally by the compiler/interpreter. It is then the responsibility of the developer to realize that the behavior is not part of the specification, and, should s/he choose to leverage the behavior, the developer's application is thereby dependent upon the particular implementation. The interpreter/compiler providing that implementation is under no obligation to maintain the officially undefined behavior beyond backwards compatibility and whatever commitments the producer may make. Furthermore, a later iteration of the language specification may define the previously undefined behavior, making the compiler/interpreter either (a) non-compliant with the new iteration, or (b) come out with a new patch/version to become compliant, thereby breaking older versions.
* "supposedly" because I have not seen the spec, myself. I go by the statements made, above.