Recommended macros to add functionality to Clojure's defrecord constructor? - macros

defrecord in clojure allows for defining simple data containers with custom fields.
e.g.
user=> (defrecord Book [author title ISBN])
user.Book
The minimal constructor that results takes only positional arguments with no additional functionality such as defaulting of fields, field validation etc.
user=> (Book. "J.R.R Tolkien" "The Lord of the Rings" 9780618517657)
#:user.Book{:author "J.R.R Tolkien", :title "The Lord of the Rings", :ISBN 9780618517657}
It is always possible to write functions wrapping the default constructor to get more complex construction semantics - using keyword arguments, supplying defaults and so on.
This seems like the ideal scenario for a macro to provide expanded semantics. What macros have people written and/or recommend for richer defrecord construction?

Examples of support for full and partial record constructor functions and support for eval-able print and pprint forms:
http://david-mcneil.com/post/765563763/enhanced-clojure-records
http://github.com/david-mcneil/defrecord2
David is a colleague of mine and we are using this defrecord2 extensively in our project. I think something like this should really be part of Clojure core (details might vary considerably of course).
The things we've found to be important are:
Ability to construct a record with named (possibly partial) parameters: (new-foo {:a 1})
Ability to construct a record by copying an existing record and making modifications: (new-foo old-foo {:a 10})
Field validation - if you pass a field outside the declared record fields, throw an error. Of course, this is actually legal and potentially useful, so there are ways to make it optional. Since it would be rare in our usage, it's far more likely to be an error.
Default values - these would be very useful but we haven't implemented it. Chas Emerick has written about adding support for default values here: http://cemerick.com/2010/08/02/defrecord-slot-defaults/
Print and pprint support - we find it very useful to have records print and pprint in a form that is eval-able back to the original record. For example, this allows you to run a test, swipe the actual output, verify it, and use it as the expected output. Or to swipe output from a debug trace and get a real eval-able form.

Here is one that defines a record with default values and invariants. It creates a ctor that can take keyword args to set the values of the fields.
(defconstrainedrecord Foo [a 1 b 2]
[(every? number? [a b])])
(new-Foo)
;=> #user.Foo{:a 1, :b 2}
(new-Foo :a 42)
; #user.Foo{:a 42, :b 2}
And like I said... invariants:
(new-Foo :a "bad")
; AssertionError
But they only make sense in the context of Trammel.

Here is one approach: http://david-mcneil.com/post/765563763/enhanced-clojure-records

Related

How do purely functional compilers annotate the AST with type info?

In the syntax analysis phase, an imperative compiler can build an AST out of nodes that already contain a type field that is set to null during construction, and then later, in the semantic analysis phase, fill in the types by assigning the declared/inferred types into the type fields.
How do purely functional languages handle this, where you do not have the luxury of assignment? Is the type-less AST mapped to a different kind of type-enriched AST? Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?
Are there purely functional programming tricks that help the compiler writer with this problem?
I usually rewrite a source (or an already several steps lowered) AST into a new form, replacing each expression node with a pair (tag, expression).
Tags are unique numbers or symbols which are then used by the next pass which derives type equations from the AST. E.g., a + b will yield something like { numeric(Tag_a). numeric(Tag_b). equals(Tag_a, Tag_b). equals(Tag_e, Tag_a).}.
Then types equations are solved (e.g., by simply running them as a Prolog program), and, if successful, all the tags (which are variables in this program) are now bound to concrete types, and if not, they're left as type parameters.
In a next step, our previous AST is rewritten again, this time replacing tags with all the inferred type information.
The whole process is a sequence of pure rewrites, no need to replace anything in your AST destructively. A typical compilation pipeline may take a couple of dozens of rewrites, some of them changing the AST datatype.
There are several options to model this. You may use the same kind of nullable data fields as in your imperative case:
data Exp = Var Name (Maybe Type) | ...
parse :: String -> Maybe Exp -- types are Nothings here
typeCheck :: Exp -> Maybe Exp -- turns Nothings into Justs
or even, using a more precise type
data Exp ty = Var Name ty | ...
parse :: String -> Maybe (Exp ())
typeCheck :: Exp () -> Maybe (Exp Type)
I cant speak for how it is supposed to be done, but I did do this in F# for a C# compiler here
The approach was basically - build an AST from the source, leaving things like type information unconstrained - So AST.fs basically is the AST which strings for the type names, function names, etc.
As the AST starts to be compiled to (in this case) .NET IL, we end up with more type information (we create the types in the source - lets call these type-stubs). This then gives us the information needed to created method-stubs (the code may have signatures that include type-stubs as well as built in types). From here we now have enough type information to resolve any of the type names, or method signatures in the code.
I store that in the file TypedAST.fs. I do this in a single pass, however the approach may be naive.
Now we have a fully typed AST you could then do things like compile it, fully analyze it, or whatever you like with it.
So in answer to the question "Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?", I cant say definitively that this is the case, but it is certainly what I did, and it appears to be what MS have done with Roslyn (although they have essentially decorated the original tree with type info IIRC)
"Are there purely functional programming tricks that help the compiler writer with this problem?"
Given the ASTs are essentially mirrored in my case, it would be possible to make it generic and transform the tree, but the code may end up (more) horrendous.
i.e.
type 'type AST;
| MethodInvoke of 'type * Name * 'type list
| ....
Like in the case when dealing with relational databases, in functional programming it is often a good idea not to put everything in a single data structure.
In particular, there may not be a data structure that is "the AST".
Most probably, there will be data structures that represent parsed expressions. One possible way to deal with type information is to assign a unique identifier (like an integer) to each node of the tree already during parsing and have some suitable data structure (like a hash map) that associates those node-ids with types. The job of the type inference pass, then, would be just to create this map.

Scala: when to use explicit type annotations

I've been reading a lot of other people's Scala code recently, and one of the things that I have difficultly with (coming from Java) is a lack of explicit type annotations.
It's certainly convenient when writing code to be able to leave out type annotations -- however when reading code I often find that explicit type annotations help me to understand at a glance what code is doing more easily.
The Scala style guide (http://docs.scala-lang.org/style/types.html) doesn't seem to provide any definitive guidance on this, stating:
Use type inference where possible, but put clarity first, and favour explicitness in public APIs.
To my mind, this is a bit contradictory. While it's clearly obvious what type this variable is:
val tokens = new HashMap[String, Int]
It's not so obvious what type this one is:
val tokens = readTokens()
So, if I was putting clarity first I would probably annotate all variables where the type is not already declared on the same line.
Do any Scala practitioners have guidance on this? Am I crazy to be considering adding type annotations to my local variables? I'm particularly interested in hearing from folks who spend a lot of time reading scala code (for example, in code reviews), as well as writing it.
It's not so obvious what type this one is:
val tokens = readTokens()
Good names are important: the name is plural, ergo it returns some collection of some kind. The most general collection types in Scala are Traversable and Iterator, and they mostly share a common interface, so it's not really important which one of the two it is. The name also talks about "reading tokens", ergo it obviously should return Tokens in some fashion. And last but not least, the method call has parentheses, which according to the style guide means it has side-effects, so I wouldn't count on being able to traverse the collection more than once.
Ergo, the return type is something like
Traversable[Token]
or
Iterator[Token]
and which of the two it is doesn't really matter because their client interfaces are mostly identical.
Note also that the latter constraint (only traversing the collection once) isn't even captured in the type, even if you were providing an explicit type, you would still have to look at the name and the style!

Motivation for Scala underscore in terms of formal language theory and good style?

Why is it that many people say that using underscore is good practice in Scala and makes your code more readable? They say the motivation comes from formal language theory. Nevertheless many programmers, particularly from other languages, especially those that have anonymous functions, prefer not to use underscores particularly for placeholders.
So what is the point in the underscore? Why does Scala (and some other functional languages as pointed by om-nom-nom) have the underscore? And what is the formal underpinning, in terms of complexity and language theory, as to why it often good style to use it?
Linguistics
The origin and motivation for most of the underscore uses in Scala is to allow one to construct expressions and declarations without the need to always give every variable (I mean "variable" as in Predicate Calculus, not in programming) of the language a name. We use this all the time in Natural Language, for example I referred to a concept in the previous sentence in this sentence using "this" and I referred to this sentence using "this" without there being any confusion over what I mean. In Natural Language these words are usually called "pronouns", "anaphors", "cataphors", the referents "antecedent" or "postcedent", and the process of understanding/dereferencing them is called "anaphora".
Algorithmic Information Theory
If we had to name every 'thing' in Natural Language before we can refer to it, similarly every type of thing in order to quantify over it, as in Predicate Calculus and in most programming languages, then speaking would become extremely long winded. It is thanks to context that we can infer what is meant by words like "this", "it", "that", etc, we do it easily.
Therefore why restrict this simple, elegant and efficient means to communicate to Natural Language? So it was added to Scala.
If we did attempt to name every single 'thing' or 'type of thing', sentences become so long and complicated that it becomes very difficult to understand due to it's verbosity and the introduction of redundant symbols. The more symbols you add to a sentence the more difficult it becomes to understand, ergo this is why it's good practice, not only in Natural Language, but in Scala too. In fact one could formalize this assertion in terms of Kolmogorov Complexity and prove that a sequence of sentences adopting placeholders have lower complexity than those that unnecessarily name everything (unless the name is exactly the same in every instance, but that usually doesn't make sense). Therefore we can conclusively say contrary to some programmers belief, that the placeholder syntax is simpler and easier to read.
The reason why it has some resistance in it's use, is that if one is already a programmer, one must make an effort to retrain the brain not to name everything, just as (if they can remember) they may have found learning to code in the first place required quite an effort.
Examples
Now let's look at some specific uses more formally:
Placeholder Syntax
Means "it", "them", "that", "their" etc (i.e. pronouns), e.g. 1
lines.map(_.length)
can be read as "map lines to their length", similarly we can read lineOption.map(_.length) as "map the line to it's length". In terms of complexity theory, this is simpler than "for each 'line' in lines, take the length of 'line'" - which would be lines.map(line => line.length).
Can also be read as "the" (definite article) when used with type annotation, e.g.
(_: Int) + 1
"Add 1 to the integer"
Existential Types
Means "of some type" ("some" the pronoun), e.g
foo: Option[_]
means "foo is an Option of some type".
Higher Kinded type parameters
Again, basically means "of some type" ("some" the pronoun), e.g.
class A[K[_],T](a: K[T])
Can be read "class A takes some K of some type ..."
Pattern Match Wildcards
Means "anything" or "whatever" (pronouns), e.g.
case Foo(_) => "hello"
can be read as "for a Foo containing anything, return 'hello'", or "for a Foo containing whatever, return 'hello'"
Import Wildcards
Means "everything" (pronoun), e.g.
import foo._
can be read as "import everything from foo".
Default Values
Now I read this like "a" (indefinite article), e.g.
val wine: RedWine = _
"Give me a red wine", the waiter should give you the house red.
Other uses of underscore
The other uses of underscores are not really related to the point of this Q&A, nevertheless we breifly discuss them
Ignored Values/Params/Extractions
Allow us to ignore things in an explicit 'pattern safe' way. E.g.
val (x, _) = getMyPoint
Says, we are not going to use the second coordinate, so no need to get freaky when you cant find a use in the code.
Import Hidding
Just a way to say "except" (preposition).
Function Application
E.g.
val f: String => Unit = println _
This is an interesting one as it has an exact analogue in linguistics, namely nominalization, "the use of a verb, an adjective, or an adverb as the head of a noun phrase, with or without morphological transformation" - wikipedia. More simply it is the process of turning verbs or adjectives into nouns.
Use in special method names
Purely a syntax thing and doesn't really relate to linguistics.

PLT Redex: parameterizing a language definition

This is a problem that's been nagging at me for some time, and I wonder if anyone here can help.
I have a PLT Redex model of a language called lambdaLVar that is more or less a garden-variety untyped lambda calculus, but extended with a store containing "lattice variables", or LVars. An LVar is a variable whose value can only increase over time, where the meaning of "increase" is given by a partially ordered set (aka a lattice) that the user of the language specifies. Therefore lambdaLVar is really a family of languages -- instantiate it with one lattice and you get one language; with a different lattice, and you get another. You can take a look at the code here; the important stuff is in lambdaLVar.rkt.
In the on-paper definition of lambdaLVar, the language definition is parameterized by that user-specified lattice. For a long time, I've wanted to do the same kind of parameterization in the Redex model, but so far, I haven't been able to figure out how. Part of the trouble is that the grammar of the language depends on how the user instantiates the lattice: elements of the lattice become terminals in the grammar. I don't know how to express a grammar in Redex that is abstract over the lattice.
In the meantime, I tried to make lambdaLVar.rkt as modular as I could. The language defined in that file is specialized to a particular lattice: natural numbers with max as the least-upper-bound (lub) operation. (Or, equivalently, natural numbers ordered by <=. It's a very boring lattice.) The only parts of the code that are specific to that lattice are the line (define lub-op max) near the top, and natural appearing in the grammar. (There's a lub metafunction that is defined in terms of the user-specified lub-op function. The latter is just a Racket function, so lub has to escape out to Racket to call lub-op.)
Barring the ability to actually specify lambdaLVar in a way that is abstract over the choice of lattice, it seems like I ought to be able to write a version of lambdaLVar with the most bare-bones of lattices -- just Bot and Top elements, where Bot <= Top -- and then use define-extended-language to add more stuff. For instance, I could define a language called lambdaLVar-nats that is specialized to the naturals lattice I described:
;; Grammar for elements of a lattice of natural numbers.
(define-extended-language lambdaLVar-nats
lambdaLVar
(StoreVal .... ;; Extend the original language
natural))
;; All we have to specify is the lub operation; leq is implicitly <=
(define-metafunction/extension lub lambdaLVar-nats
lub-nats : d d -> d
[(lub-nats d_1 d_2) ,(max (term d_1) (term d_2))])
Then, to replace the two reduction relations slow-rr and fast-rr that I had for lambdaLVar, I could define a couple of wrappers:
(define nats-slow-rr
(extend-reduction-relation slow-rr
lambdaLVar-nats))
(define nats-fast-rr
(extend-reduction-relation fast-rr
lambdaLVar-nats))
My understanding from the documentation on extend-reduction-relation is that it should reinterpret the rules in slow-rr and fast-rr, but using lambdaLVar-nats. Putting all this together, I tried running the test suite that I had with one of the new, extended reduction relations:
> (program-test-suite nats-slow-rr)
The first thing I get is a contract violation complaint: small-step-base: input (((l 3)) new) at position 1 does not match its contract. The contract line of small-step-base is just #:contract (small-step-base Config Config), where Config is a grammar nonterminal that has a new meaning if reinterpreted under lambdaLVar-nats than it did under lambdaLVar, because of the specific lattice stuff. As an experiment, I got rid of the contracts onsmall-step-base and small-step-slow.
I was then able to actually run my 19 test programs, but 10 of them fail. Perhaps unsurprisingly, all the ones that fail are programs that use natural-number-valued LVars in some way. (The rest are "pure" programs that don't interact with the store of LVars at all.) So, the tests that fail are exactly the ones that use the extended grammar.
So I kept following the rabbit hole, and it seems like Redex wants me to extend all of the existing judgment forms and metafunctions to be associated with lambdaLVar-nats rather than lambdaLVar. That makes sense, and it seems to work OK for judgment forms as far as I can tell, but with metafunctions I get into trouble: I want the new metafunction to overload the old one of the same name (because existing judgment forms are using it) and there doesn't seem to be a way to do that. If I have to rename the metafunctions, it defeats the purpose, because I'll have to write whole new judgment forms anyway. I suppose that what I want is a sort of late binding of metafunction calls!
My question in a nutshell: Is there any way in Redex to parameterize the definition of a language in the way I want, or to extend the definition of a language in a way that will do what I want? Will I end up just having to write Redex-generating macros?
Thanks for reading!
I asked the Racket users mailing list; the thread begins here. To summarize the resulting discussion: In Redex as it stands today, the answer is no, there is no way to parameterize a language definition in the way I want. However, it should be possible in a future version of Redex with a module system, which is in the works right now.
It also doesn't work to try to use Redex's existing extension forms (define-extended-language, extend-reduction-relation, and so on) in the way I tried to do here, because -- as I discovered -- the original metafunctions do not get transitively reinterpreted to use the extended languages. But a module system would apparently help with this, too, because it would allow you to package up metafunctions, judgment-forms, and reduction relations together and simultaneously extend them (see the discussion here).
So, for now, the answer is, indeed, to write a Redex-generating macro. Something like this works:
(define-syntax-rule (define-lambdaLVar-language name lub-op lattice-values ...)
(begin
;; Entire original Redex model goes here, with `natural` replaced with
;; `lattice-values ...`, and instances of `...` replaced with `(... ...)`
))
And then you can instantiate particular lattices with, e.g.,:
(define-lambdaLVar-language lambdaLVar-nat max natural)
I hope Redex does get modules soon, but in the meantime, this seems to work well.

Questions about Lists and other stuff in Clojure

I have a few questions concerning Lists, classes and variables in clojure.
This may seem quite stupid but how do I access elements in a List ?
I'm coding a program that allows you to manipulate a phonebook; You can add an entry, delete one or print the information about one. This leads me to two questions :
Is there a way to create a class "entry" that would contain "name" "adress" "phone number" variables ? or is that impossible in clojure (and functional programming in general ?) If I can't have a List of objects containing that information, how would I go about this task ?
I was thinking of having a function that reads user input to know what the user wants to do (add entry, delete entry or print info) then calls the appropriate function to do that which calls back the first function when it's done. Is passing as a parameter the List of entries to each function the right thing to do ?
This may seem quite stupid but how do I access elements in a List ?
(nth coll index)
For example:
(nth [1 2 3 4] 2) ; -> 3 (since it uses zero-based indexing)
Is there a way to create a class "entry" that would contain "name" "adress" "phone number" variables ? or is that impossible in clojure (and functional programming in general ?) If I can't have a List of objects containing that information, how would I go about this task ?
It's possible in Clojure, but unidiomatic. In Clojure, the basic abstraction for data entities are maps, not classes (except for some corner cases where direct interoperation with Java frameworks is needed). So you would just use a map:
(def entry {:name "x" :address "y" :phone-number "z"})
To access the item's name, you either use
(:name entry)
or
(get entry :name)
The former works only when the keys of the map are keywords, the latter works with all types of key.
So for your example, your data model (the phonebook) would be a seq (say, a list or a vector) of such maps.
I was thinking of having a function that reads user input to know what the user wants to do (add entry, delete entry or print info) then calls the appropriate function to do that which calls back the first function when it's done. Is passing as a parameter the List of entries to each function the right thing to do?
Since your model consists of only one main data structure (the phone book seq), passing it as an arg is certainly an appropriate way to design your functions. If you expect to have more kinds of top-level containers (i.e. for a more real world application), I'd recommend looking into the Application Context Pattern, which will look a bit intimidating at first (at least it did for me, and it contains a lot of Clojure-specific jargon), but is well worth the effort to learn.
Have you considered buying the book Programming Clojure? A pdf version is only $21 US. Well worth the money in my opinion.
(entry :name)
will also work as well when accessing a map. So you have three ways of accessing an element of a map by using the keyword:
(entry :name)
or
(:name entry)
or
(get entry :name)
where
(def entry {:name "x" :address "y" :phone-number "z"})
As Rayne mentioned, the second form is only possible if the key is a keyword. You can use the other "short" form with keys of other types:
user=>(def my-map {"a" "b" "c" "d"})
user=>(my-map "c")
"d"
user=>(get my-map "a")
"b"
For part one of your question, if you will be accessing the items in your list with (nth ...) you may consider using a vector. Vectors are not like arrays in other languages. for instance chopping them and adding new elements to the end are also efficient in addition to numerical indexing. Under the hood the arrays are actually very similar to maps.
best of all arrays are functions of an index:
(def a [1 2 3 4])
(a 2) ==> 3
for parts 2 and 3 pmf's answer cover them very nicely.