Would the ability to declare Lisp functions 'pure' be beneficial?

I have been reading a lot about Haskell lately, and the benefits that it derives from being a purely functional language. (I'm not interested in discussing monads for Lisp) It makes sense to me to (at least logically) isolate functions with side-effects as much as possible. I have used setf and other destructive functions plenty, and I recognize the need for them in Lisp and (most of) its derivatives.
Here we go:
Would something like (declare pure) potentially help an optimizing compiler? Or is this a moot point because it already knows?
Would the declaration help in proving a function or program, or at least a subset that was declared as pure? Or is this again something that is unnecessary because it's already obvious to the programmer and compiler and prover?
If for nothing else, would it be useful to a programmer for the compiler to enforce purity for functions with this declaration and add to the readability/maintainability of Lisp programs?
Does any of this make any sense? Or am I too tired to even think right now?
I'd appreciate any insights here. Info on compiler implementation or provability is welcome.
EDIT
To clarify, I didn't intend to restrict this question to Common Lisp. It clearly (I think) doesn't apply to certain derivative languages, but I'm also curious if some features of other Lisps may tend to support (or not) this kind of facility.

You have two answers, but neither touches on the real problem.
First, yes, it would obviously be good to know that a function is pure. There's a ton of compiler level things that would like to know that, as well as user level things. Given that lisp languages are so flexible, you could twist things a bit: instead of a "pure" declaration that asks the compiler to try harder or something, you just make the declaration restrict the code in the definition. This way you can guarantee that the function is pure.
You can even do that with additional supporting facilities -- I mentioned two of them in a comment I made to johanbev's answer: add the notion of immutable bindings and immutable data structures. I know that in Common Lisp these are very problematic, especially immutable bindings (since CL loads code by "side-effecting" it into place). But such features will help simplify things, and they're not inconceivable (see for example the Racket implementation, which has immutable pairs and other data structures, and has immutable bindings).
But the real question is what can you do in such restricted functions. Even a very simple looking problem would be infested with issues. (I'm using Scheme-like syntax for this.)
(define-pure (foo x)
  (cons (+ x 1) (bar)))
Seems easy enough to tell that this function is indeed pure: it doesn't do anything. Also, it seems that having define-pure restrict the body and allow only pure code would work fine in this case, and would allow this definition.
Now start with the problems:
It's calling cons, so it assumes that it is also known to be pure. In addition, as I mentioned above, it should rely on cons being what it is, so assume that the cons binding is immutable. Easy, since it's a known builtin. Do the same with bar, of course.
But cons does have a side effect (even if you're talking about Racket's immutable pairs): it allocates a new pair. This seems like a minor and ignorable point, but, for example, if you allow such things to appear in pure functions, then you won't be able to auto-memoize them. The problem is that someone might rely on every foo call returning a new pair -- one that is not-eq to any other existing pair. Seems that to make it fine you need to further restrict pure functions to deal not only with immutable values, but also values where the constructor doesn't always create a new value (e.g., it could hash-cons instead of allocating).
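To make the allocation point concrete, here is a minimal sketch in Python standing in for Lisp. `foo` and `bar` mirror the functions above (with `bar` assumed to return a constant), a two-element list plays the role of a freshly consed pair, and `is` plays the role of `eq`. Without memoization every call returns a distinct object; once the "pure" function is auto-memoized, callers silently share one object.

```python
from functools import lru_cache

def bar():
    # Hypothetical nullary helper, assumed pure.
    return 0

def foo(x):
    # Like (cons (+ x 1) (bar)): allocates a fresh "pair" on every call.
    return [x + 1, bar()]

@lru_cache(maxsize=None)
def foo_memo(x):
    # Same body, but auto-memoized: repeated calls return the cached object.
    return [x + 1, bar()]

a, b = foo(1), foo(1)
print(a == b, a is b)    # True False: equal contents, distinct objects

c, d = foo_memo(1), foo_memo(1)
print(c == d, c is d)    # True True: the "new" pair is now shared

# Code that treated its result as private now affects every other caller.
c.append("oops")
print(foo_memo(1))       # [2, 0, 'oops']
```

Any caller that relied on getting a fresh, unshared pair observes the difference, which is why memoization is only safe when identity is not observable.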
But that code also calls bar -- so now you need to make the same assumptions on bar: it must be known as a pure function, with an immutable binding. Note specifically that bar receives no arguments -- so in that case the compiler could not only require that bar is a pure function, it could also use that information and pre-compute its value. After all, a pure function with no inputs could be reduced to a plain value. (Note BTW that Haskell doesn't have zero-argument functions.)
And that brings another big issue in. What if bar is a function of one input? In that case you'd have an error, and some exception will get thrown ... and that's no longer pure. Exceptions are side-effects. You now need to know the arity of bar in addition to everything else, and you need to avoid other exceptions. Now, how about that input x -- what happens if it isn't a number? That will throw an exception too, so you need to avoid it too. This means that you now need a type system.
Change that (+ x 1) to (/ 1 x) and you can see that not only do you need a type system, you need one that is sophisticated enough to distinguish 0s.
Alternatively, you could re-think the whole thing and have new pure arithmetic operations that never throw exceptions -- but with all the other restrictions you're now quite a long way from home, with a language that is radically different.
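One way to picture a "never throws" operation is a total division that returns an explicit failure value instead of raising. The Python sketch below is illustrative only: `pure_div` is a hypothetical name and using `None` as the failure value is an assumption. Note how the failure case now has to be threaded through every caller, which is part of the "long way from home" cost.

```python
from fractions import Fraction
from typing import Optional

def pure_div(n: Fraction, d: Fraction) -> Optional[Fraction]:
    # Total version of (/ n d): division by zero yields None instead of raising.
    if d == 0:
        return None
    return n / d

def foo(x: Fraction) -> Optional[tuple]:
    # Analogue of (cons (/ 1 x) (bar)) with bar assumed to return 0;
    # every caller must now handle the None case explicitly.
    q = pure_div(Fraction(1), x)
    if q is None:
        return None
    return (q, 0)

print(foo(Fraction(4)))   # (Fraction(1, 4), 0)
print(foo(Fraction(0)))   # None
```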
Finally, there's one more side effect that remains a PITA: what if the definition of bar is (define-pure (bar) (bar))? It certainly is pure according to all of the above restrictions... But diverging is a form of side effect, so even this is no longer kosher. (For example, if you did make your compiler optimize nullary functions to values, then for this example the compiler itself would get stuck in an infinite loop.) (And yes, Haskell doesn't deal with that either, but that doesn't make it less of an issue.)

Given a Lisp function, knowing if it is pure or not is undecidable in general. Of course, necessary conditions and sufficient conditions can be tested at compile time. (If there are no impure operations at all, then the function must be pure; if an impure operation gets executed unconditionally, then the function must be impure; for more complicated cases, the compiler could try to prove that the function is pure or impure, but it will not succeed in all cases.)
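A toy version of this three-way answer (pure, impure, or unknown) can be sketched over a tiny expression language. Expressions are nested Python tuples standing in for s-expressions (so `(bar)` is written `("bar",)`); the `KNOWN_PURE`/`KNOWN_IMPURE` tables and `classify` are hypothetical, and the analysis deliberately ignores divergence and exceptions and answers "unknown" whenever it cannot settle the question.

```python
from typing import Union

# Hypothetical tables of primitives whose purity is taken as known.
KNOWN_PURE = {"+", "-", "*", "<", "cons", "car", "cdr"}
KNOWN_IMPURE = {"set!", "display", "vector-set!"}

Expr = Union[int, str, tuple]   # literal, variable, or (operator, *operands)

def classify(expr: Expr) -> str:
    """Return 'pure', 'impure' or 'unknown'; divergence and exceptions are ignored."""
    if not isinstance(expr, tuple):
        return "pure"                          # literals and variable references
    head, *rest = expr
    if head == "if":
        cond, then, other = (classify(e) for e in rest)
        if cond == "impure" or then == other == "impure":
            return "impure"                    # an impure operation always runs
        if cond == then == other == "pure":
            return "pure"
        return "unknown"                       # impurity only on some paths
    results = [classify(arg) for arg in rest]  # operands are always evaluated
    if head in KNOWN_IMPURE or "impure" in results:
        return "impure"
    if head not in KNOWN_PURE or "unknown" in results:
        return "unknown"                       # e.g. a call to an unknown function
    return "pure"

print(classify(("cons", ("+", "x", 1), 2)))                    # pure
print(classify(("cons", ("+", "x", 1), ("bar",))))             # unknown
print(classify(("display", "x")))                              # impure
print(classify(("if", ("<", "x", 0), ("display", "x"), "x")))  # unknown
```

The sufficient condition ("no impure operations anywhere") yields "pure", the unconditional case yields "impure", and everything in between is left undecided rather than guessed.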
If the user can manually annotate a function as pure, then the compiler could either (a.) try harder to prove that the function is pure, i.e., spend more time before giving up, or (b.) assume that it is and add optimizations which would not be correct for impure functions (like, say, memoizing results). So, yes, annotating functions as pure could help the compiler if the annotations are assumed to be correct.
Apart from heuristics like the "trying harder" idea above, the annotation would not help to prove stuff, because it's not giving any information to the prover. (In other words, the prover could just assume that the annotation is always there before trying.) However, it could make sense to attach to pure functions a proof of their purity.
The compiler could either (a.) check if pure functions are indeed pure at compile time, but this is undecidable in general, or (b.) add code to try to catch side effects in pure functions at runtime and report those as an error. (a.) would probably be helpful with simple heuristics (like "an impure operation gets executed unconditionally"), and (b.) would be useful for debugging.
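Option (b.) can be approximated as a debugging aid. The hypothetical `checked_pure` decorator below (Python, not a feature of any Lisp) deep-copies the arguments before the call and raises if the function changed them. It catches only one kind of side effect, mutation of its inputs, and assumes the arguments compare by value; it says nothing about I/O or global state, which is why this is a debugging tool rather than a proof.

```python
import copy
import functools

class PurityViolation(Exception):
    """Raised when a function declared pure mutates its arguments."""

def checked_pure(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Snapshot the inputs so in-place modification can be detected.
        # Assumes arguments support value equality (==).
        before = copy.deepcopy((args, kwargs))
        result = fn(*args, **kwargs)
        if (args, kwargs) != before:
            raise PurityViolation(f"{fn.__name__} mutated its arguments")
        return result
    return wrapper

@checked_pure
def total(xs):
    return sum(xs)

@checked_pure
def sneaky_total(xs):
    xs.append(0)          # destructive update, like a setf on the argument
    return sum(xs)

print(total([1, 2, 3]))   # 6
try:
    sneaky_total([1, 2, 3])
except PurityViolation as e:
    print(e)              # sneaky_total mutated its arguments
```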
No, it seems to make sense. Hopefully this answer also does.

The usual goodies apply when we can assume purity and referential transparency. We can automatically memoize hotspots. We can automatically parallelize computation. We can do away with a lot of race conditions. We can also use structure sharing with data that we know cannot be modified; for instance, the (quasi) primitive cons() does not need to copy the cons cells in the list it's consing onto. These cells are not affected in any way by having another cons cell pointing to them. This example is kinda obvious, but compilers are often good at figuring out more complex structure sharing.
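The structure-sharing point can be sketched with an immutable linked list built from Python tuples, where `cons` is a hypothetical helper that builds a `(head, tail)` cell. Because cells are never modified, prepending reuses the existing list as the tail instead of copying it, so two lists can safely share the same cells.

```python
from typing import Any, Optional, Tuple

Cons = Optional[Tuple[Any, "Cons"]]   # None is the empty list

def cons(head: Any, tail: Cons) -> Cons:
    # Allocates one new cell; the tail is shared, not copied.
    return (head, tail)

def to_pylist(lst: Cons) -> list:
    out = []
    while lst is not None:
        out.append(lst[0])
        lst = lst[1]
    return out

shared = cons(2, cons(3, None))
xs = cons(1, shared)
ys = cons(0, shared)

print(to_pylist(xs), to_pylist(ys))   # [1, 2, 3] [0, 2, 3]
print(xs[1] is ys[1])                 # True: both lists point at the same cells
```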
However, actually determining whether a lambda (a function) is pure or referentially transparent is very tricky in Common Lisp. Remember that a function call (foo bar) starts by looking up (symbol-function 'foo). So in this case
(defun foo (bar)
  (cons 'zot bar))
foo() is pure.
The next lambda is also pure.
(defun quux ()
  (mapcar #'foo '(zong ding flop)))
However, later on we can redefine foo:
(let ((accu -1))
  (defun foo (bar)
    (declare (ignore bar))
    (incf accu)))   ; accu persists between calls: hidden state
The next call to quux() is no longer pure! The old pure foo() has been redefined to an impure lambda. Yikes. This example is maybe somewhat contrived, but it's not that uncommon to redefine functions as closures over some lexical state, for instance inside a let block. In that case it's not possible to know at compile time what will happen.
Common Lisp has very dynamic semantics, so being able to determine control flow and data flow ahead of time (for instance when compiling) is very hard, and in most useful cases entirely undecidable. This is quite typical of languages with dynamic type systems. There are a lot of common idioms in Lisp you cannot use if you must use static typing, and it's mainly these that foil any attempt to do much meaningful static analysis. We can do it for primitives like cons and friends. But for lambdas involving things other than primitives we are in much deeper water, especially in cases where we need to look at complex interplay between functions. Remember that a lambda is only pure if all the lambdas it calls are also pure.
Off the top of my head, it could be possible, with some deep macrology, to do away with the redefinition problem. In a sense, each lambda gets an extra argument which is a monad that represents the entire state of the Lisp image (we can obviously restrict ourselves to what the function will actually look at). But it's probably more useful to be able to declare purity ourselves, in the sense that we promise the compiler that this lambda is indeed pure. The consequences if it isn't are then undefined, and all sorts of mayhem could ensue...
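The "promise the compiler" approach can be sketched as a decorator that trusts the declaration and optimizes on that basis. `assume_pure` is a hypothetical name, and memoization is just one example of an optimization that is only valid for pure functions. When the promise is broken, the cached answers silently go stale, a small taste of the undefined consequences described above.

```python
import functools

def assume_pure(fn):
    """Trust the programmer's purity claim and cache results by argument."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

rate = 2   # global state the function secretly depends on

@assume_pure
def price(qty):
    return qty * rate        # not actually pure: reads a mutable global

print(price(10))   # 20
rate = 3
print(price(10))   # still 20: the broken promise yields a stale answer
```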

Related

What is the advantage of saying your function should never be inlined?

I understand Swift's inlining well. I know the nuances between the four function-inlining attributes. I use @inline(__always) a lot, especially when I'm just making sugary APIs like this:
public extension String {
    @inline(__always)
    var length: Int { count }
}
I do this because there's not really a cost involved in inlining it, but there would be the cost of an extra stack frame if it weren't inlined. For less-obvious sugar, I'll lean toward @inlinable and/or @usableFromInline as needed.
However, one distinction vexes me. The two possible arguments to @inline are never and __always. Despite the lack of actual documentation, this choice of spelling acts as a sort of self-documentation, implying that if you are going to use one of these, you should lean toward never, and __always is discouraged.
But why is this the direction the Swift language designers encourage? As far as I know, if no attribute is applied at all, then this is the behavior:
If a function (et al) is used within the module in which it's declared, the compiler might choose to inline it or not, depending on which would produce better code (by some measure)
If that function (et al) is used outside the module, its implementation is not exposed in a way that allows it to be inlined, so it is never inlined.
So, it seems most of the time, not-inlining is the default. That's fine and dandy, I have no problem with that on the surface; don't bloat the executable any more than you need to.
But then, I've never had a reason to think @inline(never) is useful. From what I understand, the only reason I would use @inline(never) is if I've noticed that the Swift compiler is choosing to inline a non-annotated function too much, and it's bloating my executable. This seems like a super-niche occurrence:
My software is running fine
The Swift compiler's algorithm for deciding whether to inline something is not making the right choice for my code
I care about the size of the binary so much that I'm inspecting it closely enough to discover that a function is being inlined automatically too much
The problem is only in code that I've written into my own module; not code I'm using from some other module
Or, as Rob said in the comments, if you're going through some disassembly and automatic inlining makes it hard to read.
I can't imagine that these are the use cases which the Swift language designers had in mind when designing this attribute. Especially since Swift is not meant for embedded systems, binary size (and the (dis)assembly in general) isn't really that much of a concern. I've never seen an unreasonably-large Swift binary anyway (>50MB).
So why is never encouraged more than __always? I often run into reasons why I should force a function to be inlined, but I've not yet seen a reason to force a function to be stacked, at least in my own work.

Typed Racket Optimizer

I am learning some Typed Racket at the moment and I have a somewhat philosophical dilemma:
Racket claims to be a language development framework and Typed Racket is one such languages implemented on top of it. The documentation mentions that due to types being used, the compiler now can do more/better optimizations.
The concrete question:
Where do these optimizations happen?
1) In the compile/expand part (which is "programmable" as part of the language building framework)
-or-
2) further down the line in the (bytecode) optimizer (which is written in C and not directly modifiable via the framework).
If 2) is true, does that mean the type information is lost after the compile/expand stage and later "rebuilt/guessed" by the optimizer, or has the intermediate representation been altered to accommodate the type information and inform later stages about it?
The reason I am asking this specific question is that I want to get a feeling for how general the Racket language framework really is, i.e. is it also viable for statically typed languages without any modifications in the backend, versus the type system being only a front-end thing while the code at runtime is still dynamically typed (but statically checked, of course).
Thank you.
Typed Racket's optimizations occur during macro expansion. To see for yourself, you can change #lang typed/racket to #lang typed/racket #:no-optimize, which shows Typed Racket is in complete control of what optimizations are applied.
The optimizations consist of using type information to replace various uses of certain procedures with their unsafe equivalents. The unsafe procedures perform no runtime checks on the types of their arguments and cause undefined behavior (read: segfaults) if used incorrectly. You can find out more in the documentation section entitled Optimization in Typed Racket.
The exposure of the unsafe variants of procedures is what really makes it possible for user-defined languages to implement these optimizations. For example, if you wrote your own language with a type system that could prove vectors were never accessed with out-of-bounds indices, you could replace uses of vector-ref with unsafe-vector-ref.
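The kind of rewrite described here can be modelled as a tree transformation that runs before execution. The Python below is only a model, not Typed Racket's implementation: expressions are tuples, `known_lengths` is a hypothetical environment mapping variables to vector lengths that a type checker is assumed to have established, and `vector-ref` becomes `unsafe-vector-ref` only when the index is a literal that is provably in bounds.

```python
def optimize(expr, known_lengths):
    """Rewrite vector-ref to unsafe-vector-ref where the index is provably in bounds.

    known_lengths maps variable names to vector lengths that an (assumed)
    type checker has already established.
    """
    if not isinstance(expr, tuple):
        return expr
    head, *args = expr
    args = [optimize(a, known_lengths) for a in args]
    if head == "vector-ref" and len(args) == 2:
        vec, idx = args
        length = known_lengths.get(vec) if isinstance(vec, str) else None
        if isinstance(idx, int) and length is not None and 0 <= idx < length:
            return ("unsafe-vector-ref", vec, idx)   # runtime check dropped
    return (head, *args)                              # everything else stays checked

prog = ("+", ("vector-ref", "v", 1), ("vector-ref", "v", 5), ("vector-ref", "w", 0))
print(optimize(prog, {"v": 3}))
# ('+', ('unsafe-vector-ref', 'v', 1), ('vector-ref', 'v', 5), ('vector-ref', 'w', 0))
```

Index 5 is out of range for a length-3 vector and the length of `w` is unknown, so both keep the checked operation.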
There are similar optimizations that occur at the bytecode level, but these mostly apply when the JIT can infer type information that's not visible at macro expansion time. These are not user-controlled, but you don't have to rely on them.

Examples of non-trivial fexpr usage

I'm looking for (real world) uses of fexprs, where they are used in a way different to what can be accomplished with lazy evaluation.
Most examples that I could find use fexprs only to implement conditional evaluation, like for a short circuit "and" operative (Evaluate first argument, if false, don't evaluate second and directly return false).
I'm looking for "useful" uses, that is where using fexpr leads to code that is "better" (cleaner) than what could be done without fexprs.
There are two main reasons you would want to use fexprs.
The first one is because they allow you to evaluate the arguments an arbitrary number of times. This makes it possible to implement operators that evaluate their arguments lazily like you suggested. Constructs built this way are also capable of evaluating their arguments more than once. This makes it possible to implement loops through fexprs!
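To illustrate evaluating arguments zero, one, or many times, here is a toy Python interpreter in which operatives (fexpr-like procedures) receive their operand expressions unevaluated, together with the environment, and decide for themselves when to call `evaluate`. The expression format and the names `op_and`, `op_set` and `op_while` are assumptions for this sketch, and truthiness follows Python rather than Lisp; `op_while` shows a loop built from an operative that re-evaluates its operands repeatedly.

```python
import operator

# Ordinary functions: their operands are evaluated before the call.
FUNCTIONS = {"+": operator.add, "*": operator.mul, "<": operator.lt}

def evaluate(expr, env):
    if isinstance(expr, int):
        return expr                             # literal (bools count as ints)
    if isinstance(expr, str):
        return env[expr]                        # variable reference
    head, *operands = expr
    if head in OPERATIVES:
        return OPERATIVES[head](operands, env)  # operands passed unevaluated
    return FUNCTIONS[head](*(evaluate(o, env) for o in operands))

def op_and(operands, env):
    # Evaluates operands left to right, stopping at the first false one.
    result = True
    for o in operands:
        result = evaluate(o, env)
        if not result:
            break
    return result

def op_set(operands, env):
    # The first operand is used as a name and never evaluated.
    name, value = operands
    env[name] = evaluate(value, env)
    return env[name]

def op_while(operands, env):
    # Re-evaluates the condition and body forms as many times as needed.
    cond, *body = operands
    while evaluate(cond, env):
        for form in body:
            evaluate(form, env)
    return None

OPERATIVES = {"and": op_and, "set": op_set, "while": op_while}

env = {"i": 0, "acc": 1}
evaluate(("while", ("<", "i", 5),
          ("set", "acc", ("*", "acc", 2)),
          ("set", "i", ("+", "i", 1))), env)
print(env)                                              # {'i': 5, 'acc': 32}
print(evaluate(("and", ("<", 1, 0), ("boom",)), env))   # False; ("boom",) never runs
```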
The other case is for transformation. Transforming code is basically a way of writing a compiler on top of your existing Lisp. Although it uses macros and not fexprs, cl-who is a great example of the kind of transformations that can be made.
Fexprs are somewhat orthogonal to lazy/eager evaluation.
The usual function approach is to eval the arguments to a function then call it on the result. Lazy eval still behaves like this, it just delays the evaluation until immediately before the parameter is used.
The usual macro approach is to pass the unevaluated arguments into a template which evaluates anything that isn't quoted. The resulting piece of AST is injected into the call site where it is usually evaluated again. This works much the same with lazy eval.
The historically insane fexpr approach is to pass unevaluated arguments to the function, which does as it pleases with them. The result is injected directly into the call site and usually not evaluated automatically.
The fexpr is pretty close to an arbitrary transform. So you can implement macros and lambdas with them. You can also implement whatever hybrid of eager/lazy evaluation you wish. Likewise you could implement fexpr given default lazy eval and explicit calls to eval() in various places to force eager behaviour.
I don't think I would characterise fexprs as an easy solution to implementing lazy eval, though, in a "the cure is worse than the disease" sense.

Side effects in Scala

I am learning Scala these days. I have a slight familiarity with Haskell, although I cannot claim to know it well.
Parenthetical remark for those who are not familiar with Haskell
One trait that I like in Haskell is that not only are functions first-class citizens, but so are side effects (let me call them actions). An action that, when executed, will endow you with a value of type a belongs to a specific type, IO a. You can pass these actions around pretty much like any other value, and combine them in interesting ways.
In fact, combining the side effects is the only way in Haskell to do something with them, as you cannot execute them. Rather, the program that will be executed is the combined action which is returned by your main function. This is a neat trick that allows functions to be pure, while letting your program actually do something other than consuming power.
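The "actions as values" idea can be mimicked in Python with a small wrapper around a zero-argument callable. The `IO` class and its `pure`, `map`, `then` and `run` methods are illustrative assumptions, loosely modelled on Haskell's `return`, `fmap` and `>>=`. Building and combining actions performs no effects; only the final `run`, playing the role of the runtime executing `main`, does.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")

class IO(Generic[A]):
    """A description of an effectful computation producing an A."""
    def __init__(self, thunk: Callable[[], A]):
        self._thunk = thunk

    @staticmethod
    def pure(value: A) -> "IO[A]":
        return IO(lambda: value)

    def map(self, f: Callable[[A], B]) -> "IO[B]":
        return IO(lambda: f(self._thunk()))

    def then(self, f: Callable[[A], "IO[B]"]) -> "IO[B]":
        # Like >>=: the next action may depend on this action's result.
        return IO(lambda: f(self._thunk())._thunk())

    def run(self) -> A:
        # The only place where effects actually happen.
        return self._thunk()

log = []

def say(msg: str) -> IO[None]:
    return IO(lambda: log.append(msg))

program = say("hello").then(lambda _: IO.pure(21)).map(lambda n: n * 2)
print(log)            # []: nothing has happened yet
print(program.run())  # 42
print(log)            # ['hello']
```

Unlike Haskell, nothing here stops other Python code from performing effects directly; the wrapper only shows the shape of the idea.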
The main advantage of this approach is that the compiler is aware of the parts of the code where you perform side effects, so it can help you catch errors with them.
Actual question
Is there some way in Scala to have the compiler type check side effects for you, so that - for instance - you are guaranteed not to execute side effects inside a certain function?
No, this is not possible in principle in Scala, as the language does not enforce referential transparency -- the language semantics are oblivious to side effects. Your compiler will not track and enforce freedom from side effects for you.
You can, however, use the type system to tag some actions as being of an IO type and, with programmer discipline, get some of the compiler support, but without the compiler's proof.
The ability to enforce referential transparency is pretty much incompatible with Scala's goal of having a class/object system that is interoperable with Java.
Java code can be impure in arbitrary ways (and may not be available for analysis when the Scala compiler runs) so the Scala compiler would have to assume all foreign code is impure (assigning them an IO type). To implement pure Scala code with calls to Java, you would have to wrap the calls in something equivalent to unsafePerformIO. This adds boilerplate and makes the interoperability much less pleasant, but it gets worse.
Having to assume that all Java code is in IO unless the programmer promises otherwise would pretty much kill inheriting from Java classes. All the inherited methods would have to be assumed to be in the IO type; this would even be true of interfaces, since the Scala compiler would have to assume the existence of an impure implementation somewhere out there in Java-land. So you could never derive a Scala class with any non-IO methods from a Java class or interface.
Even worse, even for classes defined in Scala, there could theoretically be an untracked subclass defined in Java with impure methods, whose instances might be passed back in to Scala as instances of the parent class. So unless the Scala compiler can prove that a given object could not possibly be an instance of a class defined by Java code, it must assume that any method call on that object might call code that was compiled by the Java compiler without respecting the laws of what functions returning results not in IO can do. This would force almost everything to be in IO. But putting everything in IO is exactly equivalent to putting nothing in IO and just not tracking side effects!
So ultimately, Scala encourages you to write pure code, but it makes no attempt to enforce that you do so. As far as the compiler is concerned, any call to anything can have side effects.

Which Clojure library interface design is best?

I want to provide multiple implementations of a message reader/writer. What is the best approach?
Here is some pseudo-code of what I'm currently thinking:
just have a set of functions that all implementations must provide and leave it up to the caller to hold onto the right streams
(ns x-format)
(defn read-message [stream] ...)
(defn write-message [stream message] ...)
return a map with two closed functions holding onto the stream
(ns x-format)
(defn make-formatter [socket]
  {:read  (fn [] (.read (.getInputStream socket)))
   :write (fn [message] (.write (.getOutputStream socket) message))})
something else?
I think the first option is better. It's more extensible, depending on how these objects are going to be used. It's easier to add a new function, or change an existing one, that works on an existing object if the functions and objects are separate. In Clojure there usually isn't much reason to bundle functions along with the objects they work on, unless you really want to hide implementation details from users of your code.
If you're writing an interface for which you expect many implementations, consider using multimethods also. You can have the default throw a "not implemented" exception, to force implementors to implement your interface.
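Clojure multimethods dispatch on an arbitrary function of the arguments; a rough Python analogue for this design is `functools.singledispatch`, which dispatches on the type of the first argument, with a default that raises `NotImplementedError` as suggested above. `JsonStream` and `EdnStream` are hypothetical stand-ins for different message formats, and no real parsing is done.

```python
from functools import singledispatch

class JsonStream:
    """Hypothetical stream of JSON-encoded messages."""
    def __init__(self, lines):
        self.lines = list(lines)

class EdnStream:
    """Hypothetical stream of EDN-encoded messages."""
    def __init__(self, lines):
        self.lines = list(lines)

@singledispatch
def read_message(stream):
    # Default method: forces each new format to provide its own implementation.
    raise NotImplementedError(f"read_message not implemented for {type(stream).__name__}")

@read_message.register
def _(stream: JsonStream):
    return {"format": "json", "raw": stream.lines.pop(0)}

@read_message.register
def _(stream: EdnStream):
    return {"format": "edn", "raw": stream.lines.pop(0)}

print(read_message(JsonStream(['{"id": 1}'])))   # {'format': 'json', 'raw': '{"id": 1}'}
try:
    read_message("not a stream")
except NotImplementedError as e:
    print(e)                                     # read_message not implemented for str
```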
As Gutzofter said, if the only reason you're considering the second option is to allow people not to have to type a parameter on every function call, you could consider having all of your functions use some var as the default socket object and writing a with-socket macro which uses binding to set that var's value. See the builtin printing methods which default to using the value of *out* as the output stream, and with-out-str which binds *out* to a string writer, as a Clojure example.
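The `*out*`/`binding` idiom can be approximated in Python with `contextvars.ContextVar` plus a context manager. `current_stream`, `with_stream` and `write_message` are hypothetical names for this sketch: the writer uses the dynamically bound default unless a stream is passed explicitly, and the previous binding is restored when the `with` block exits, even on error.

```python
import contextvars
import io
import sys
from contextlib import contextmanager

# Plays the role of a dynamic var like *out*; defaults to stdout.
current_stream = contextvars.ContextVar("current_stream", default=sys.stdout)

@contextmanager
def with_stream(stream):
    # Like (binding [*socket* stream] ...): rebind, then restore on exit.
    token = current_stream.set(stream)
    try:
        yield stream
    finally:
        current_stream.reset(token)

def write_message(message, stream=None):
    if stream is None:
        stream = current_stream.get()   # fall back to the dynamic binding
    stream.write(message + "\n")

buf = io.StringIO()
with with_stream(buf):
    write_message("hello")          # goes to buf without an explicit argument
write_message("back on stdout")     # the old binding has been restored
print(repr(buf.getvalue()))         # 'hello\n'
```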
This article may interest you; it compares and contrasts some OOP idioms with Clojure equivalents.
I think that read-message and write-message are utility functions. What you need to do is encapsulate your functions in a with- macro(s). Look at with-output-to-string in Common Lisp to see what I mean.
Edit:
When you use a with- macro you can have error handling and resource allocation in the macro expansion.
I'd go with the first option and make all those functions multimethods.