I am interested in S-expressions, but I still don't have the right idioms in mind.
Imagine a VLSI component, characterized by a name and a list of typed ports. What is preferable :
(component name (p1 float) (p2 float))
or
(component name ((p1 float) (p2 float)))
?
The first says,
I am absolutely sure, for all time, that the component item consists of a name and 0-n ports, and nothing else.
The second states:
The component item consists of a name and a list of ports, and I might decide to add some other things later.
I'd personally prefer the first because it's more readable/editable, but either is possible. It depends on how complicated your sexprs are going to be and how you're going to process them.
It depends why you want S-expressions.
You can view them as a way to express textually some structured data. (In that broad sense, JSON, YAML and even XML could be similar).
Or you can view them as expressions for some interpreter which interprets them. Then you have to define that interpreter.
Related
I often read some programming languages have "First Class" support for modules (OCaml, Scala, TypeScript[?]) and recently stumbled upon an answer on SO citing modules as first class citizens among the distinguishing features of Scala.
I thought I knew very well what Modular Programming means but after these incidents I'm beginning to doubt my understanding...
I think modules are nothing special but instances of certain classes that are acting as mini-libraries. The mini-library code goes into a class, objects of that class are the modules. You can pass them around as dependencies to any other class that requires the services provided by the module, so any decent OOPL has first class modules but apparently not!
What exactly is a module? How is it different than, say, a plain class or an object?
How is (1) related (or not) to the Modular Programming that we all know?
What exactly does it mean for a language to have first class modules? What are the benefits? what are the drawbacks if a languages lacks such feature?
A module, as well as a subroutine, is a way of organizing your code. When we develop programs, we pack instructions into subroutines, subroutines into structures, structures into packages, libraries, assemblies, frameworks, solutions, and so on. So, putting everything else aside, it is just a mechanism to organize your code.
The essential reason, why we use all those mechanisms, instead of just laying out our instructions linearly, is because the complexity of a program grows non-linearly with respect to its size. In other words, a program built from n pieces each having m instructions is easier to comprehend than a program which is built from n*m instructions. This is, of course, not always true (otherwise we can just split our program into arbitrary parts and be happy). In fact, for that to be true, we have to introduce one essential mechanism called abstraction. We can benefit from splitting a program into manageable subparts only if each part provides some sort of abstraction. For example, we can have, connect_to_database, query_for_students, sort_by_grade, and take_the_first_n abstractions packed as functions or subroutines, and is much easier to understand the code which is expressed in terms of those abstractions, rather than trying to understand the code in which all those functions are inlined.
So now we have functions and it is natural to introduce the next level of organization -- collections of functions. It is common to see that some functions build families around some common abstraction, e.g., student_name, student_grade, student_courses, etc, they all revolve around the same abstraction student. The same is for connection_establish, connection_close, etc. Therefore we need some mechanism that will tie together those functions. Here we are starting to have options. Some languages took the OOP path, in which objects and classes are the units of the organization. Where a bunch of functions and a state is called an object. Other languages took a different path and decided to combine functions into static structures called modules. The main difference is that a module is a static, compile-time structure, where objects are runtime structures that have to be created in runtime to be used. As a result, naturally, objects tend to contain state, while modules do not (and contain only code). And objects are inherently regular values, which you can assign to variables, store them in files, and do other manipulations which you can do with data. Classical modules, contrary to objects, do not have runtime representation, therefore you can't pass modules as parameters to your functions, store them in a list, and otherwise perform any computations on modules. This is basically what people mean by saying first class citizen - an ability to treat an entity as a simple value.
Back to composable programs. In order to make objects/modules composable, we need to be sure that they create abstractions. For functions abstraction boundary is clearly defined - it is the tuple of parameters. For objects, we have a notion of interfaces and classes. While for modules we have only interfaces. Since modules are inherently more simple (they do not include the state) we do not have to deal with their constructing and deconstructing, therefore we do not need a more complicated notion of a class. Both classes and interfaces is a way to classify objects and modules by some criteria so that we can reason about different modules without looking into the implementation, the same way as we did with connect_to_database, query_for_students, et al functions - we were reasoning about them only based on their name and interface (and probably documentation). Now we can have a class student or a module Student both defining an abstraction called student, so that we can save a lot of brain power, without having to deal with the way how are those students implemented.
And beyond making our programs easier to understand, abstractions give us another benefit -- generalization. Since we don't need to reason about the implementation of a function or a module, it means that all implementations are interchangeable to some degree. Therefore, we can write our programs so that they will express their behavior in a general way, without breaking the abstractions, and then choose particular instances when we run our programs. Objects are runtime instances and essentially it means that we can choose our implementation in runtime. Which is nice. Classes are, however, rarely first-class citizens, therefore we have to invent different cumbersome methods to make the selection, like the Abstract Factory and Builder design patterns. For modules, the situation is even worse, since they are inherently a compile-time structure, we have to choose our implementation at the program building/lining time. Which is not what people want to do in the modern world.
And here comes first-class modules, being an amalgamation of modules and objects, they give us the best of two worlds - an easy to reason about stateless structures, which are, at the same time, a pure first-class citizens, which you can store in a variable, put into list and select the desired implementation in runtime.
Speaking of OCaml, underneath the hood, first-class modules are simply a record of functions. In OCaml, you can even add state to the first-class module making it practically indistinguishable from an object. This brings us to another topic - in the real world, the separation between objects and structures is not that clear. For example, OCaml provides both modules and objects and you can put objects inside modules and even vice verse. In C/C++ we have compilation units, symbols visibility, opaque data types, and header files, which enables some sort of modular programming, as well as we have structures and namespaces. Therefore, the difference is sometimes hard to tell.
Therefore, to summarize. Modules are pieces of code with a well-defined interface to access this code. First class modules are modules which could be manipulated as a regular value, e.g., stored in a data structure, assigned a variable, and picked at runtime.
OCaml perspective here.
Modules and classes are very different.
First of all, classes in OCaml are a very specific (and complex) feature. To go into some detail, classes implement inheritance, row polymorphism and dynamic dispatch (aka virtual methods). It allows them to be highly flexible at the expense of some efficiency.
Modules, however, are quite a different thing altogether.
Indeed, you can see modules as atomic mini-libraries, and usually they are used to define a type and its accessors, but they are much more powerful than just that.
Modules allow you to create several types, as well as module types and submodules. Basically, they allow to create complex compartmentalization and abstraction.
Functors give you behavior similar to c++'s templates. Except they are safe. Basically, they are functions on modules, which allows you to parameterize a data structure or algorithm over some other module.
Modules are usually solved statically and therefore easy to inline, allowing you to write clear code without fear of a loss in efficiency.
Now, a first-class citizen is an entity that can be put in a variable, passed to a function and tested for equality. In a way, it means they will be dynamically evaluated.
For example, suppose you have a module Jpeg and a module Png that allow you to manipulate different kind of pics. Statically, you don't know what kind of image you'll need to display. So you can use first-class modules:
let get_img_type filename =
match Filename.extension filename with
| ".jpg" | ".jpeg" -> (module Jpeg : IMG_HANDLER)
| ".png" -> (module Png : IMG_HANDLER)
let display_img img_type filename =
let module Handler = (val img_type : IMG_HANDLER) in
Handler.display filename
The main differences between a module and an object usually are
Modules are second-class, i.e., they are rather static entities that cannot be passed around as values, while objects can.
Modules can contain types and all other forms of declarations (and types can be made abstract), while objects typically cannot.
However, as you note, there are languages where modules can be wrapped up as first-class values (e.g. Ocaml) and there are languages where objects can contain types (e.g. Scala). That blurs the line a little. There still tend to be various biases towards certain patterns, with different trade-offs made in the type systems. For example, objects focus on recursive types, while modules focus on type abstraction and allowing any definition. It is a very hard problem to support both at the same time without severe compromises, since that quickly leads to an undecidable type system.
As has been mentioned already, "modules", "classes" and "objects" are more like tendencies than strict formal definitions. And if you implement modules as objects for example, as I understand Scala does, then obviously there are no fundamental differences between them, but mostly just syntactic differences that make them more convenient for certain use cases.
In regards to OCaml specifically though, here's a practical example of something you cannot do with modules that you can do with classes because of fundamental differences in implementation:
Modules have functions, which can reference each other recursively using the rec and and keyword. A module can also "inherit" the implementation of another module using include and override its definitions. For example:
module Base = struct
let name = "base"
let print () = print_endline name
end
module Child = struct
include Base
let name = "child"
end
but because modules are early bound, that is, names are resolved at compile time, it's not possible to get Base.print to reference Child.name instead of Base.name. At least not without altering both Base and Child significantly to explicitly enable it:
module AbstractBase(T : sig val name : string end) = struct
let name = T.name
let print () = print_endline name
end
module Base = struct
include AbstractBase(struct let name = "base" end)
end
module Child = struct
include AbstractBase(struct let name = "child" end)
end
With classes on the other hand, overriding is trivial and the default:
class base = object(self)
method name = "base"
method print = print_endline self#name
end
class child = object
inherit base
method! name = "child"
end
Classes can reference themselves, through a variable conventionally named this or self (in OCaml you can name it whatever you want, but self is the convention). They are also late bound, meaning they are resolved at runtime and can therefore call method implementations that didn't exist when it was defined. This is called open recursion.
So why aren't modules late bound too? Primarily for performance reasons I think. Doing a dictionary search on the name of every function call will undoubtedly have a significant impact on execution time.
I see this a lot in examples I read in books and articles:
(caddr *something*)
Or the many variants of c***r commands.
It seems a bit ridiculous to me, when you can more clearly just pull things out with elt:
(elt *something* 2)
But I don't see this technique used as much.
Is there a convention I don't understand that prefers the c***r functions?
elt is a generic function that works on both lists and arrays. It is needed when you want to write a generic algorithm which works in the same way on both data types. But there aren't many such algorithms because this would usually disadvantage lists.
The general convention seems to be that:
If you write a general-purpose function to work on lists you would use c(a|d)+r function (these cases are exceedingly rare because most of the time there is a library function for that). This usually happens in Stackoverflow questions code / code for class assignments etc :)
Seasoned Lisp programmers would use first, second etc. if possible. This is also some times mentioned in best practices. The best practices would also usually mention that one should create appropriate data structures instead of dealing with non-trivial lists.
nth or elt are indeed rare because it's hard to think of a good use case for them. I can imagine how both can be used in macros, where the performance isn't important, but some kind of generality is desirable, for instance if someone would want to deal with strings and lists of characters in the same way. Maybe in some prototyping code, where the programmer isn't yet sure of what data type they are going to use, but that's about it.
Functions like caddr, etc can extract components from lists nested inside other lists. But elt only operates on the top-level list, thus it could return a nested list in its entirety, but you'd need to also nest elt commands to extract components, which is what caddr is ultimately doing, so they are not really so comparable.
In some cases, you could interchange them, as they might return the same results if your list has no lists inside it.
When I return something of type Option, it seems useful to explain in the name of the function name that it is an option, not the thing itself. For example, seqs have reduceOption. Is there a standard naming convention? Things I have seen:
maybeFunctionName
functionNameOption
- neither seems to be all that great.
reduceOption and friends (headOption, etc.) are only named that way to distinguish them from their unsafe alternatives (which arguably shouldn't exist in the first placeāi.e, there should just be a head that returns an Option[A]).
whateverOption isn't the usual practice in the standard library (or most other Scala libraries that I'm aware of), and in general you shouldn't need or want to use this kind of Hungarian notation in Scala.
Why would you want to make your function names longer? It doesn't contribute anything, as the fact that it returns an Option is obvious when looking at the function's type.
reduceOption is sort of a special case, since in most cases you really want to use reduce, except that it doesn't work on empty sequences.
I have been reading a lot about Haskell lately, and the benefits that it derives from being a purely functional language. (I'm not interested in discussing monads for Lisp) It makes sense to me to (at least logically) isolate functions with side-effects as much as possible. I have used setf and other destructive functions plenty, and I recognize the need for them in Lisp and (most of) its derivatives.
Here we go:
Would something like (declare pure) potentially help an optimizing compiler? Or is this a moot point because it already knows?
Would the declaration help in proving a function or program, or at least a subset that was declared as pure? Or is this again something that is unnecessary because it's already obvious to the programmer and compiler and prover?
If for nothing else, would it be useful to a programmer for the compiler to enforce purity for functions with this declaration and add to the readability/maintainablity of Lisp programs?
Does any of this make any sense? Or am I too tired to even think right now?
I'd appreciate any insights here. Info on compiler implementation or provability is welcome.
EDIT
To clarify, I didn't intend to restrict this question to Common Lisp. It clearly (I think) doesn't apply to certain derivative languages, but I'm also curious if some features of other Lisps may tend to support (or not) this kind of facility.
You have two answers but neither touch on the real problem.
First, yes, it would obviously be good to know that a function is pure. There's a ton of compiler level things that would like to know that, as well as user level things. Given that lisp languages are so flexible, you could twist things a bit: instead of a "pure" declaration that asks the compiler to try harder or something, you just make the declaration restrict the code in the definition. This way you can guarantee that the function is pure.
You can even do that with additional supporting facilities -- I mentioned two of them in a comment I made to johanbev's answer: add the notion of immutable bindings and immutable data structures. I know that in Common Lisp these are very problematic, especially immutable bindings (since CL loads code by "side-effecting" it into place). But such features will help simplifying things, and they're not inconceivable (see for example the Racket implementation that has immutable pairs and other data structures, and has immutable bindings.
But the real question is what can you do in such restricted functions. Even a very simple looking problem would be infested with issues. (I'm using Scheme-like syntax for this.)
(define-pure (foo x)
(cons (+ x 1) (bar)))
Seems easy enough to tell that this function is indeed pure, it doesn't do anything . Also, seems that having define-pure restrict the body and allow only pure code would work fine in this case, and will allow this definition.
Now start with the problems:
It's calling cons, so it assumes that it is also known to be pure. In addition, as I mentioned above, it should rely on cons being what it is, so assume that the cons binding is immutable. Easy, since it's a known builtin. Do the same with bar, of course.
But cons does have a side effect (even if you're talking about Racket's immutable pairs): it allocates a new pair. This seems like a minor and ignorable point, but, for example, if you allow such things to appear in pure functions, then you won't be able to auto-memoize them. The problem is that someone might rely on every foo call returning a new pair -- one that is not-eq to any other existing pair. Seems that to make it fine you need to further restrict pure functions to deal not only with immutable values, but also values where the constructor doesn't always create a new value (eg, it could hash-cons instead of allocate).
But that code also calls bar -- so no you need to make the same assumptions on bar: it must be known as a pure function, with an immutable binding. Note specifically that bar receives no arguments -- so in that case the compiler could not only require that bar is a pure function, it could also use that information and pre-compute its value. After all, a pure function with no inputs could be reduced to a plain value. (Note BTW that Haskell doesn't have zero-argument functions.)
And that brings another big issue in. What if bar is a function of one input? In that case you'd have an error, and some exception will get thrown ... and that's no longer pure. Exceptions are side-effects. You now need to know the arity of bar in addition to everything else, and you need to avoid other exceptions. Now, how about that input x -- what happens if it isn't a number? That will throw an exception too, so you need to avoid it too. This means that you now need a type system.
Change that (+ x 1) to (/ 1 x) and you can see that not only do you need a type system, you need one that is sophisticated enough to distinguish 0s.
Alternatively, you could re-think the whole thing and have new pure arithmetic operations that never throw exceptions -- but with all the other restrictions you're now quite a long way from home, with a language that is radically different.
Finally, there's one more side-effect that remains a PITA: what if the definition of bar is (define-pure (bar) (bar))? It certainly is pure according to all of the above restrictions... But diverging is a form of a side effect, so even this is no longer kosher. (For example, if you did make your compiler optimize nullary functions to values, then for this example the compiler itself would get stuck in an infinite loop.) (And yes, Haskell doesn't deal with that, it doesn't make it less of an issue.)
Given a Lisp function, knowing if it is pure or not is undecidable in general. Of course, necessary conditions and sufficient conditions can be tested at compile time. (If there are no impure operations at all, then the function must be pure; if an impure operation gets executed unconditionally, then the function must be impure; for more complicated cases, the compiler could try to prove that the function is pure or impure, but it will not succeed in all cases.)
If the user can manually annotate a function as pure, then the compiler could either (a.) try harder to prove that the function is pure, ie. spend more time before giving up, or (b.) assume that it is and add optimizations which would not be correct for impure functions (like, say, memoizing results). So, yes, annotating functions as pure could help the compiler if the annotations are assumed to be correct.
Apart from heuristics like the "trying harder" idea above, the annotation would not help to prove stuff, because it's not giving any information to the prover. (In other words, the prover could just assume that the annotation is always there before trying.) However, it could make sense to attach to pure functions a proof of their purity.
The compiler could either (a.) check if pure functions are indeed pure at compile time, but this is undecidable in general, or (b.) add code to try to catch side effects in pure functions at runtime and report those as an error. (a.) would probably be helpful with simple heuristics (like "an impure operation gets executed unconditionally), (b.) would be useful for debug.
No, it seems to make sense. Hopefully this answer also does.
The usual goodies apply when we can assume purity and referential
transparency. We can automatically memoize hotspots. We can
automatically parallelize computation. We can deal away with a lot of
race conditions. We can also use structure sharing with data that we
know cannot be modified, for instance the (quasi) primitive ``cons()''
does not need to copy the cons-cells in the list it's consing to.
These cells are not affected in any way by having another cons-cell
pointing to it. This example is kinda obvious, but compilers are often
good performers in figuring out more complex structure sharing.
However, actually determining if a lambda (a function) is pure or has
referential transparency is very tricky in Common Lisp. Remember that
a funcall (foo bar) start by looking at (symbol-function foo). So in
this case
(defun foo (bar)
(cons 'zot bar))
foo() is pure.
The next lambda is also pure.
(defun quux ()
(mapcar #'foo '(zong ding flop)))
However, later on we can redefine foo:
(let ((accu -1))
(defun foo (bar)
(incf accu)))
The next call to quux() is no longer pure! The old pure foo() has been
redefined to an impure lambda. Yikes. This example is maybe somewhat
contrived but it's not that uncommon to lexically redefine some
functions, for instance with a let block. In that case it's not
possible to know what would happen at compile time.
Common Lisp has a very dynamic semantic, so actually being
able to determine control flow and data flow ahead of time (for
instance when compiling) is very hard, and in most useful cases
entirely undecidable. This is quite typical of languages with dynamic
type systems. There is a lot of common idioms in Lisp you cannot use
if you must use static typing. It's mainly these that fouls any
attempt to do much meaningful static analysis. We can do it for primitives
like cons and friends. But for lambdas involving other things than
primitives we are in much deeper water, especially in the cases where
we need to look at complex interplay between functions. Remember that
a lambda is only pure if all the lambdas it calls are also pure.
On the top of my head, it could be possible, with some deep macrology,
to do away with the redefinition problem. In a sense, each lambda gets
an extra argument which is a monad that represents the entire state of
the lisp image (we can obviously restrict ourselves to what the function
will actually look at). But it's probably more useful to be able do
declare purity ourselves, in the sense that we promise the compiler
that this lambda is indeed pure. The consequences if it isn't is then
undefined, and all sorts of mayhem could ensue...
I want to provide multiple implementations of a message reader/writer. What is the best approach?
Here is some pseudo-code of what I'm currently thinking:
just have a set of functions that all implementations must provide and leave it up to the caller to hold onto the right streams
(ns x-format)
(read-message [stream] ...)
(write-message [stream message] ...)
return a map with two closed functions holding onto the stream
(ns x-format)
(defn make-formatter [socket]
{:read (fn [] (.read (.getInputStream socket))))
:write (fn [message] (.write (.getOutputStream socket) message)))})
something else?
I think the first option is better. It's more extensible, depending how these objects are going to be used. It's easier to add or change a new function that works on an existing object if the functions and objects are separate. In Clojure there usually isn't much reason to bundle functions along with the objects they work on, unless you really want to hide implementation details from users of your code.
If you're writing an interface for which you expect many implementations, consider using multimethods also. You can have the default throw a "not implemented" exception, to force implementors to implement your interface.
As Gutzofter said, if the only reason you're considering the second option is to allow people not to have to type a parameter on every function call, you could consider having all of your functions use some var as the default socket object and writing a with-socket macro which uses binding to set that var's value. See the builtin printing methods which default to using the value of *out* as the output stream, and with-out-str which binds *out* to a string writer, as a Clojure example.
This article may interest you; it compares and contrasts some OOP idioms with Clojure equivalents.
I think that read-message and write-message are utility functions. What you need to do is encapsulate your functions in a with- macro(s). See 'with-output-to-string' in common lisp to see what I mean.
Edit:
When you use a with- macro you can have error handling and resource allocation in the macro expansion.
I'd go with the first option and make all those functions multimethods.