PLT Redex: parameterizing a language definition

This is a problem that's been nagging at me for some time, and I wonder if anyone here can help.
I have a PLT Redex model of a language called lambdaLVar that is more or less a garden-variety untyped lambda calculus, but extended with a store containing "lattice variables", or LVars. An LVar is a variable whose value can only increase over time, where the meaning of "increase" is given by a partially ordered set (aka a lattice) that the user of the language specifies. Therefore lambdaLVar is really a family of languages -- instantiate it with one lattice and you get one language; with a different lattice, and you get another. You can take a look at the code here; the important stuff is in lambdaLVar.rkt.
In the on-paper definition of lambdaLVar, the language definition is parameterized by that user-specified lattice. For a long time, I've wanted to do the same kind of parameterization in the Redex model, but so far, I haven't been able to figure out how. Part of the trouble is that the grammar of the language depends on how the user instantiates the lattice: elements of the lattice become terminals in the grammar. I don't know how to express a grammar in Redex that is abstract over the lattice.
In the meantime, I tried to make lambdaLVar.rkt as modular as I could. The language defined in that file is specialized to a particular lattice: natural numbers with max as the least-upper-bound (lub) operation. (Or, equivalently, natural numbers ordered by <=. It's a very boring lattice.) The only parts of the code that are specific to that lattice are the line (define lub-op max) near the top, and natural appearing in the grammar. (There's a lub metafunction that is defined in terms of the user-specified lub-op function. The latter is just a Racket function, so lub has to escape out to Racket to call lub-op.)
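Concretely, that escape looks something like this (a sketch based on the description above; the actual definition in lambdaLVar.rkt may differ in details):

(define-metafunction lambdaLVar
  lub : d d -> d
  [(lub d_1 d_2)
   ;; escape to Racket to apply the user-specified lub-op
   ,(lub-op (term d_1) (term d_2))])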
Barring the ability to actually specify lambdaLVar in a way that is abstract over the choice of lattice, it seems like I ought to be able to write a version of lambdaLVar with the most bare-bones of lattices -- just Bot and Top elements, where Bot <= Top -- and then use define-extended-language to add more stuff. For instance, I could define a language called lambdaLVar-nats that is specialized to the naturals lattice I described:
;; Grammar for elements of a lattice of natural numbers.
(define-extended-language lambdaLVar-nats
  lambdaLVar
  (StoreVal ....  ;; Extend the original language
            natural))

;; All we have to specify is the lub operation; leq is implicitly <=
(define-metafunction/extension lub lambdaLVar-nats
  lub-nats : d d -> d
  [(lub-nats d_1 d_2) ,(max (term d_1) (term d_2))])
Then, to replace the two reduction relations slow-rr and fast-rr that I had for lambdaLVar, I could define a couple of wrappers:
(define nats-slow-rr
  (extend-reduction-relation slow-rr
                             lambdaLVar-nats))

(define nats-fast-rr
  (extend-reduction-relation fast-rr
                             lambdaLVar-nats))
My understanding from the documentation on extend-reduction-relation is that it should reinterpret the rules in slow-rr and fast-rr, but using lambdaLVar-nats. Putting all this together, I tried running the test suite that I had with one of the new, extended reduction relations:
> (program-test-suite nats-slow-rr)
The first thing I get is a contract violation complaint: small-step-base: input (((l 3)) new) at position 1 does not match its contract. The contract line of small-step-base is just #:contract (small-step-base Config Config), where Config is a grammar nonterminal that has a different meaning when reinterpreted under lambdaLVar-nats than it did under lambdaLVar, because of the lattice-specific stuff. As an experiment, I got rid of the contracts on small-step-base and small-step-slow.
I was then able to actually run my 19 test programs, but 10 of them fail. Perhaps unsurprisingly, all the ones that fail are programs that use natural-number-valued LVars in some way. (The rest are "pure" programs that don't interact with the store of LVars at all.) So, the tests that fail are exactly the ones that use the extended grammar.
So I kept following the rabbit hole, and it seems like Redex wants me to extend all of the existing judgment forms and metafunctions to be associated with lambdaLVar-nats rather than lambdaLVar. That makes sense, and it seems to work OK for judgment forms as far as I can tell, but with metafunctions I get into trouble: I want the new metafunction to overload the old one of the same name (because existing judgment forms are using it) and there doesn't seem to be a way to do that. If I have to rename the metafunctions, it defeats the purpose, because I'll have to write whole new judgment forms anyway. I suppose that what I want is a sort of late binding of metafunction calls!
My question in a nutshell: Is there any way in Redex to parameterize the definition of a language in the way I want, or to extend the definition of a language in a way that will do what I want? Will I end up just having to write Redex-generating macros?
Thanks for reading!

I asked the Racket users mailing list; the thread begins here. To summarize the resulting discussion: In Redex as it stands today, the answer is no, there is no way to parameterize a language definition in the way I want. However, it should be possible in a future version of Redex with a module system, which is in the works right now.
It also doesn't work to try to use Redex's existing extension forms (define-extended-language, extend-reduction-relation, and so on) in the way I tried to do here, because -- as I discovered -- the original metafunctions do not get transitively reinterpreted to use the extended languages. But a module system would apparently help with this, too, because it would allow you to package up metafunctions, judgment-forms, and reduction relations together and simultaneously extend them (see the discussion here).
So, for now, the answer is, indeed, to write a Redex-generating macro. Something like this works:
(define-syntax-rule (define-lambdaLVar-language name lub-op lattice-values ...)
  (begin
    ;; Entire original Redex model goes here, with `natural` replaced with
    ;; `lattice-values ...`, and instances of `...` replaced with `(... ...)`
    ))
And then you can instantiate particular lattices with, e.g.,:
(define-lambdaLVar-language lambdaLVar-nat max natural)
I hope Redex does get modules soon, but in the meantime, this seems to work well.

Related

When to use macro functions in Erlang?

I'm currently following the book Learn You Some Erlang for Great Good! by Fred Hébert, and one of the sections covers macros.
I understand using macros for variables (constant values, mainly); however, I don't understand the use case for macros as functions. For example, Hébert writes:
Defining a "function" macro is similar. Here's a simple macro used to subtract one number from another:
-define(sub(X, Y), X-Y).
Why not just define this as a function elsewhere? Why use a macro? Is there some sort of performance advantage from the compiler or is this merely just a "this function is so simple, let's just define it in one line" type of thing?
I'm not trying to start a debate or preference argument, but after seeing some production Erlang code, I've started noticing lots of macro-function usage.
In this case, the one obvious advantage of the macro not being a function (-define(sub(X, Y), X-Y), which would be safer as -define(sub(X, Y), (X-Y))) is that it can be used in guards, since calls to user-defined functions are forbidden there.
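For example (positive_diff is a hypothetical function; the extra parentheses in the safer definition guard against precedence surprises):

-define(sub(X, Y), (X - Y)).

%% A macro application may appear in a guard, where a call to a
%% user-defined function would be rejected by the compiler.
positive_diff(A, B) when ?sub(A, B) > 0 -> ?sub(A, B);
positive_diff(_, _) -> 0.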
In many cases it would otherwise be safer to define the function as an inlined one.
On the other hand, there are other interesting cases, such as assertions in tests or shortcuts where what you want is to keep some local context in the final place.
For example, let's say I want to make a generic call for a test where the objective is 'match a given pattern and return a given value, or fail after M milliseconds'.
I cannot make this generic with code since patterns are not data structures you are allowed to carry around. However, with macros:
-define(wait_for(PAT, Timeout),
        receive
            PAT -> ok  %% variables bound in PAT stay visible at the call site
        after Timeout ->
            error(timeout)
        end).
This macro can then be used as:
my_test() ->
    Pid = start_whatever(),
    %% ...
    ?wait_for({'EXIT', Pid, Reason}, 5000),
    ?assertMatch(shutdown, Reason).
By doing this, I'm able to simplify the form of text in some tests without needing a bunch of nesting, and in a way that is not possible with functions.
Do note that the assertion itself as defined by eunit is using a function macro, and does something akin to
-define(assertMatch(PAT, TERM),
        %% funs to avoid leaking bindings into parent scope
        (fun() ->
            try
                PAT = TERM,
                true
            catch _:_ ->
                error({assertion_failed, ?LINE, ...})
            end
        end)()).
This similarly lets you carry patterns and bindings and do fancy forms that couldn't be possible otherwise.
In this last case, you'll notice I used the ?LINE macro. That's another advantage of macros: you preserve information and locality about the call site, such as its module name, line number, and so on. This is useful when such metadata is required, such as when you're reporting test failures.
If you're looking at old code, there might be macros used as a way of inlining small functions under the assumption that function calls are very expensive. I'm not sure if that was ever true, but it's not something you need to worry about today.
Macros can be used to define constants, like
-define(MAX_TIMEOUT, 30 * 1000).
%% ...
gen_server:call(my_server, {do_stuff, Data}, ?MAX_TIMEOUT),
%% ...
I mostly prefer to pass in environment variables for this job, but it's more work to read them on startup and stash them somewhere and write accessors.
Finally, you can do some simple metaprogramming:
-define(MAKE_REQUEST_FUN(Method),
        Method(Request, HTTPOptions, Options) ->
            httpc:request(Method, Request, HTTPOptions, Options)).

?MAKE_REQUEST_FUN(get).
?MAKE_REQUEST_FUN(put).

%% Now we've defined a get/3 that can be called as
%% get(Request, [], []).

How do purely functional compilers annotate the AST with type info?

In the syntax analysis phase, an imperative compiler can build an AST out of nodes that already contain a type field that is set to null during construction, and then later, in the semantic analysis phase, fill in the types by assigning the declared/inferred types into the type fields.
How do purely functional languages handle this, where you do not have the luxury of assignment? Is the type-less AST mapped to a different kind of type-enriched AST? Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?
Are there purely functional programming tricks that help the compiler writer with this problem?
I usually rewrite a source AST (or one already lowered through several steps) into a new form, replacing each expression node with a pair (tag, expression).
Tags are unique numbers or symbols that are then used by the next pass, which derives type equations from the AST. E.g., a + b will yield something like { numeric(Tag_a). numeric(Tag_b). equals(Tag_a, Tag_b). equals(Tag_e, Tag_a). }, where Tag_e is the tag of the whole expression.
Then the type equations are solved (e.g., by simply running them as a Prolog program), and, if successful, all the tags (which are variables in this program) are bound to concrete types; if not, they're left as type parameters.
In the next step, our previous AST is rewritten again, this time replacing tags with all the inferred type information.
The whole process is a sequence of pure rewrites; there is no need to update anything in your AST destructively. A typical compilation pipeline may take a couple dozen rewrites, some of them changing the AST datatype.
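For instance, here is a minimal sketch of the tagging pass in Haskell (hypothetical datatypes; a real pipeline would have many more node kinds):

data Exp t = Num t Int | Add t (Exp t) (Exp t) deriving Show

-- One pure rewrite: thread a counter through the tree so that
-- every node receives a unique integer tag.
tagExp :: Exp () -> Int -> (Exp Int, Int)
tagExp (Num _ k)   n = (Num n k, n + 1)
tagExp (Add _ l r) n =
    let (l', n1) = tagExp l (n + 1)
        (r', n2) = tagExp r n1
    in  (Add n l' r', n2)

-- tagExp (Add () (Num () 1) (Num () 2)) 0
-- => (Add 0 (Num 1 1) (Num 2 2), 3)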
There are several options to model this. You may use the same kind of nullable data fields as in your imperative case:
data Exp = Var Name (Maybe Type) | ...
parse :: String -> Maybe Exp -- types are Nothings here
typeCheck :: Exp -> Maybe Exp -- turns Nothings into Justs
or even, using a more precise type
data Exp ty = Var Name ty | ...
parse :: String -> Maybe (Exp ())
typeCheck :: Exp () -> Maybe (Exp Type)
I can't speak for how it is supposed to be done, but I did do this in F# for a C# compiler here.
The approach was basically: build an AST from the source, leaving things like type information unconstrained. So AST.fs is basically the AST, with strings for the type names, function names, etc.
As the AST starts to be compiled to (in this case) .NET IL, we end up with more type information (we create the types in the source; let's call these type-stubs). This then gives us the information needed to create method-stubs (the code may have signatures that include type-stubs as well as built-in types). From here we now have enough type information to resolve any of the type names or method signatures in the code.
I store that in the file TypedAST.fs. I do this in a single pass, however the approach may be naive.
Now we have a fully typed AST you could then do things like compile it, fully analyze it, or whatever you like with it.
So in answer to the question "Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?", I can't say definitively that this is the case, but it is certainly what I did, and it appears to be what MS have done with Roslyn (although they have essentially decorated the original tree with type info, IIRC).
"Are there purely functional programming tricks that help the compiler writer with this problem?"
Given the ASTs are essentially mirrored in my case, it would be possible to make it generic and transform the tree, but the code may end up (more) horrendous.
i.e.

type 'ty AST =
    | MethodInvoke of 'ty * Name * 'ty list
    | ....
As in the case of relational databases, in functional programming it is often a good idea not to put everything in a single data structure.
In particular, there may not be a data structure that is "the AST".
Most probably, there will be data structures that represent parsed expressions. One possible way to deal with type information is to assign a unique identifier (like an integer) to each node of the tree already during parsing and have some suitable data structure (like a hash map) that associates those node-ids with types. The job of the type inference pass, then, would be just to create this map.
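One way to sketch that in Haskell (toy Type and inference rules assumed; a real pass would generate and solve constraints):

import qualified Data.Map as Map

type NodeId = Int
data Type   = TInt deriving Show
data Exp    = Num NodeId Int | Add NodeId Exp Exp

-- The inference pass never touches the tree; it only builds the
-- NodeId -> Type map (every node is numeric in this toy language).
inferTypes :: Exp -> Map.Map NodeId Type
inferTypes (Num n _)   = Map.singleton n TInt
inferTypes (Add n l r) =
    Map.insert n TInt (Map.union (inferTypes l) (inferTypes r))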

Motivation for Scala underscore in terms of formal language theory and good style?

Why is it that many people say that using underscore is good practice in Scala and makes your code more readable? They say the motivation comes from formal language theory. Nevertheless, many programmers, particularly those coming from other languages, especially ones that have anonymous functions, prefer not to use underscores, particularly for placeholders.
So what is the point of the underscore? Why does Scala (and some other functional languages, as pointed out by om-nom-nom) have the underscore? And what is the formal underpinning, in terms of complexity and language theory, as to why it is often good style to use it?
Linguistics
The origin and motivation for most of the underscore uses in Scala is to allow one to construct expressions and declarations without the need to always give every variable (I mean "variable" as in Predicate Calculus, not in programming) of the language a name. We use this all the time in Natural Language, for example I referred to a concept in the previous sentence in this sentence using "this" and I referred to this sentence using "this" without there being any confusion over what I mean. In Natural Language these words are usually called "pronouns", "anaphors", "cataphors", the referents "antecedent" or "postcedent", and the process of understanding/dereferencing them is called "anaphora".
Algorithmic Information Theory
If we had to name every 'thing' in Natural Language before we could refer to it, and similarly every type of thing in order to quantify over it, as in Predicate Calculus and in most programming languages, then speaking would become extremely long-winded. It is thanks to context that we can infer what is meant by words like "this", "it", "that", etc.; we do it easily.
Therefore why restrict this simple, elegant and efficient means to communicate to Natural Language? So it was added to Scala.
If we did attempt to name every single 'thing' or 'type of thing', sentences would become so long and complicated that they would be very difficult to understand, due to their verbosity and the introduction of redundant symbols. The more symbols you add to a sentence, the more difficult it becomes to understand; ergo, this is why it's good practice, not only in Natural Language, but in Scala too. In fact, one could formalize this assertion in terms of Kolmogorov Complexity and prove that a sequence of sentences adopting placeholders has lower complexity than one that unnecessarily names everything (unless the name is exactly the same in every instance, but that usually doesn't make sense). Therefore we can conclusively say, contrary to some programmers' belief, that the placeholder syntax is simpler and easier to read.
The reason why it meets some resistance is that if one is already a programmer, one must make an effort to retrain the brain not to name everything, just as (if they can remember) they may have found that learning to code in the first place required quite an effort.
Examples
Now let's look at some specific uses more formally:
Placeholder Syntax
Means "it", "them", "that", "their" etc (i.e. pronouns), e.g. 1
lines.map(_.length)
can be read as "map lines to their length"; similarly, we can read lineOption.map(_.length) as "map the line to its length". In terms of complexity theory, this is simpler than "for each 'line' in lines, take the length of 'line'" - which would be lines.map(line => line.length).
Can also be read as "the" (definite article) when used with a type annotation, e.g.
(_: Int) + 1
"Add 1 to the integer"
Existential Types
Means "of some type" ("some" the pronoun), e.g
foo: Option[_]
means "foo is an Option of some type".
Higher Kinded type parameters
Again, basically means "of some type" ("some" the pronoun), e.g.
class A[K[_],T](a: K[T])
Can be read "class A takes some K of some type ..."
Pattern Match Wildcards
Means "anything" or "whatever" (pronouns), e.g.
case Foo(_) => "hello"
can be read as "for a Foo containing anything, return 'hello'", or "for a Foo containing whatever, return 'hello'"
Import Wildcards
Means "everything" (pronoun), e.g.
import foo._
can be read as "import everything from foo".
Default Values
Now I read this like "a" (indefinite article), e.g.
var wine: RedWine = _
"Give me a red wine", the waiter should give you the house red.
Other uses of underscore
The other uses of underscores are not really related to the point of this Q&A; nevertheless we briefly discuss them.
Ignored Values/Params/Extractions
Allow us to ignore things in an explicit 'pattern safe' way. E.g.
val (x, _) = getMyPoint
This says we are not going to use the second coordinate, so no need to get freaky when you can't find a use for it in the code.
Import Hiding
Just a way to say "except" (preposition).
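For example (assuming an object foo with a member bar):

import foo.{bar => _, _}  // import everything from foo except bar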
Function Application
E.g.
val f: String => Unit = println _
This is an interesting one as it has an exact analogue in linguistics, namely nominalization, "the use of a verb, an adjective, or an adverb as the head of a noun phrase, with or without morphological transformation" - wikipedia. More simply it is the process of turning verbs or adjectives into nouns.
Use in special method names
Purely a syntax thing and doesn't really relate to linguistics.

What is a "multisorted algebra", and how do I use it to solve "real problems"?

Apparently, Alexander Stepanov has stated the following in an interview:
“I find OOP [object-oriented programming] technically unsound. It attempts to decompose the world in terms of interfaces that vary on a single type. To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types.” [Emphasis added.]
Ignoring his statement regarding OOP for a moment, what are "multisorted algebras", beyond his terse definition, and can you give a practical example of how they are used (in the language of your choice)?
I believe he was talking about generic programming (he coined the term), whether meant in the context of this talk about the STL, or 'at large', in the sense of:
(1) programming against a sort of interface that describes something that could fit all (and hopefully several) types (hence multi-sorted), ...
(2) ... provided they have some properties, often something about the nature of some operations on elements of that type (hence algebras).
To do (1), you need to have a way to specify a program that takes a type as a parameter, i.e. polymorphism, and to do (2), you need a way to say that you also want that type to carry specific operations (and, provided you can express them, properties). In effect, you're parametrizing your program by the structure of the data it manipulates. The paradigm is called in some places bounded polymorphism, datatype-generic programming, ... which reflects that languages have different notions of how to implement that idea — hence the italicized 'sort of' above.
For C++, it seems that —to Stepanov at least— this corresponds to templates (though ideas on how to do this best are still evolving).
For OO languages (Generic Java, C#), constraints on type parameters are typically expressed using subtype bounds ('bounded wildcards' ...).
For Haskell or Scala, you have (respectively, and similarly) type classes or implicits.
The ML family of languages prefers to do this using modules.
Note that a number of proof assistants (which can express 'honest-to-god' properties as types) have developed a flavor of type classes: Isabelle, Coq, and Matita are such examples.
Note that Stepanov just co-wrote an entire book giving an exhaustive development of a library that embodies exactly what (I think) he means. So if you want examples in C++, this is definitely where you should look. Note also that this is much more evolved than the now-common advice of coding against an interface, rather than an object.
By 'practical example', I don't know if you mean 'how' or 'why' one uses it. To give a caricaturally quick answer to the 'why': genericity is nice because, a bit like run-of-the-mill polymorphism, it lets you reuse code. But, more importantly:
polymorphic code that has to work with every single type often can't do anything interesting, whereas having a constrained interface to play with allows you to write richer programs
by specifying how that interface fits some of your data, you have a type-safe way to select just those elements that suit your needs. For example, you probably know that the reduction operator (the reduce of Python & Hadoop, fold of a bunch of functional languages) is parallelizable only if the order in which you apply your reduction function doesn't matter (+, x, min, and max work, but set difference doesn't). If you have a notion of 'type equipped with an associative operation', you know that you will be able to call a parallel reduction on it (see the sketch after this list).
any overhead incurred by genericity occurs at compile time. For example, templates are legendarily fast.
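Here is the sketch promised above, in Java (Associative is a hypothetical interface; java.util.stream's reduce makes the same associativity demand of its accumulator):

import java.util.List;

// A type equipped with an associative operation:
// combine(a, combine(b, c)) must equal combine(combine(a, b), c).
interface Associative<T> {
    T combine(T x, T y);
}

class Reductions {
    // Associativity is what makes the divide-and-conquer split safe;
    // the two halves could just as well be reduced on separate threads.
    // (Assumes xs is non-empty.)
    static <T> T reduce(List<T> xs, Associative<T> op) {
        if (xs.size() == 1) return xs.get(0);
        int mid = xs.size() / 2;
        return op.combine(reduce(xs.subList(0, mid), op),
                          reduce(xs.subList(mid, xs.size()), op));
    }
}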
If you have seen some generic Java, look at, say, the Comparable generic interface. It defines just one operation, but the contract it makes, though basic, is very much of an algebraic flavor. I quote:
For the mathematically inclined, the relation that defines the natural ordering on a given class C is:
{(x, y) such that x.compareTo((Object)y) <= 0}.
The quotient for this total order is:
{(x, y) such that x.compareTo((Object)y) == 0}.
It follows immediately from the contract for compareTo that the quotient is an equivalence relation on C, and that the natural ordering is a total order on C.
Now, I can write a method that selects the minimum, once, and use it for any type that fits this interface:
public static <T extends Comparable<T>> T min(T x, T y) {
    return x.compareTo(y) < 0 ? x : y;
}
Naturally, since the way programming languages implement that notion varies wildly, what you will get in terms of usability & expressivity will also vary. Perhaps you should not judge data-generic programming just by OO languages like C++ or Java, but I've written too much already to start on module ascription or the automatic instance generation of type classes.
I'm too late, but maybe it will be helpful for you. User huitseeker wrote an excellent answer from the viewpoint of software design. I want to answer your question from the viewpoint of mathematics. Before diving into the software world, Alex Stepanov was a mathematician and studied abstract and universal algebra. He often tried to bring rigorous mathematical foundations into the world of software and algorithm design; in his books From Mathematics to Generic Programming and Elements of Programming he advocates this design practice. His ideas about mixing concepts of algebraic structures and software design were realised in the notion of generic programming. And now let's talk about his quote:
To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types
In my opinion there are two main concepts he wanted to mention here: the idea of an abstract data type (ADT) and that of an algebraic structure. First concept: ADT. An ADT is a mathematical model for data types, where a data type is defined only by its semantics. Stepanov contrasted the idea of the ADT with the idea of the object in the OOP sense. Objects contain data and state, whilst ADTs do not. An ADT is a behavioural abstraction, an operation cluster which describes interaction with data. Behavioural abstraction is entirely described by means of an algebraic specification of the abstract data type. You can read more about this in the original Liskov and Zilles paper; I also recommend the paper Object-Oriented Programming Versus Abstract Data Types by William R. Cook.
(Disclaimer: you can skip this paragraph, because it is more mathematical and not so important.) First I want to clarify some terminology. When I talk about an algebraic structure, it is the same as an algebra; the word algebra is often also used for an algebraic structure. To be more precise, when we talk about algebraic structures (algebras) we usually mean an algebra over an algebraic theory. There is a concept of a variety of algebras, because there are several notions of an algebraic structure on an object of some category. By definition, an algebraic theory (and an algebra over it) consists of a specification of operations and the laws that these operations must satisfy: this is the working definition of algebraic structure we will use, and the definition, I think, Stepanov implicitly referred to in the quote.
The second concept Stepanov wanted to mention is the most interesting property of ADTs: they can be formally modelled directly as many-sorted algebraic structures. Let's talk about it more formally. An algebraic structure is a carrier set with one or more finitary operations defined on it. These operations are often defined not over one set but over several. For example, let's define an algebra which models string concatenation. This algebra will be defined not over one set of strings but over two sets: a set of strings S and the set of natural numbers N, because we can define an operation which concatenates a string with itself some finite number of times. This operation takes two operands, which belong to different underlying (carrier) sets: S and N.

The set which names the kinds of operands (their types) in an algebra is called the set of sorts. A sort is the algebraic analog of a type, and an algebra with multiple sorts is called a multi-sorted algebra. In universal algebra, a signature lists the operations that characterize an algebraic structure. A many-sorted algebraic structure can have an arbitrary number of domains; the sorts are part of the signature, and they play the role of names for the different domains. Many-sorted signatures also prescribe on which sorts the functions and relations of a many-sorted algebraic structure are defined. For a one-sorted variety of algebras, a signature is a set whose elements are called operations, to each of which is assigned a cardinal number (0, 1, 2, …) called its arity. A signature of a multi-sorted algebra can be defined as Σ = (S, OP, A), where S is the set of sort names (types), OP the set of operation names, and A the arities as before, except that now an arity is a list (a sequence, or more generally a free monoid) of input sorts instead of merely a natural number (the length of the list), together with one output sort. Now we can create an algebraic specification of an abstract data type as a triple:
ADT = (N, Σ, E)
where N is the name of the abstract data type, Σ = (S, OP, A) is the signature of the multi-sorted algebraic structure, and E = {e1, e2, …, en} is a finite collection of equalities in the signature. As you can see, we now have a rigorous mathematical description of an ADT. In mathematics, many-sorted algebraic structures are often used as a convenient tool even when they could be avoided with a little effort; they are rarely defined in a rigorous way, because it is straightforward to carry out the generalization explicitly. That's why the theory of many-sorted algebras can be successfully applied to software design.
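In code, that string-concatenation algebra might be sketched in Java like this (hypothetical names; the point is only that repeat draws its two operands from different carrier sets):

// Two sorts: S (strings) and N (naturals), and one operation
// repeat : S x N -> S, concatenating a string with itself n times.
interface StringConcatAlgebra {
    String repeat(String s, int n);
}

class Repetition implements StringConcatAlgebra {
    public String repeat(String s, int n) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) out.append(s);
        return out.toString();  // repeat(s, 0) yields the empty string
    }
}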
So, Alex Stepanov wanted to say that he prefers ADTs and generic programming to OOP, because they let us create programs with rigorous mathematical/algebraic foundations. I appreciate his efforts a lot. We all know that algebraic design is always correct, rigorous, beautiful, simple, and gives us better abstractions.
I'm not an expert in the theory of any of those, but let's take a look at the quote so that I can try to add my practical understanding to the discussion.
To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types.
From my readings, I think families of interfaces that span multiple types sounds a lot like type classes from Haskell, which are similar to concepts in C++. Take a type class like Foldable: it actually is a type-parametrized interface, i.e. a family of interfaces that span multiple types. So, regarding your question of how to solve problems with multisorted algebras: generic programming is all about that, if you take it to mean type classes or concepts.
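For instance, in Haskell, one constrained function written against the real Foldable class works across every instance of it:

-- One interface spanning many types: any Foldable container of
-- numbers can be summed by the same code.
total :: (Foldable f, Num a) => f a -> a
total = foldr (+) 0

-- total [1, 2, 3] == 6
-- total (Just 4)  == 4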

Recommended macros to add functionality to Clojure's defrecord constructor?

defrecord in Clojure allows for defining simple data containers with custom fields.
e.g.
user=> (defrecord Book [author title ISBN])
user.Book
The minimal constructor that results takes only positional arguments with no additional functionality such as defaulting of fields, field validation etc.
user=> (Book. "J.R.R Tolkien" "The Lord of the Rings" 9780618517657)
#:user.Book{:author "J.R.R Tolkien", :title "The Lord of the Rings", :ISBN 9780618517657}
It is always possible to write functions wrapping the default constructor to get more complex construction semantics - using keyword arguments, supplying defaults and so on.
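For instance, such a hand-written wrapper might look like this (a sketch; new-book is a hypothetical name, not something defrecord generates):

(defn new-book [& {:keys [author title ISBN]
                   :or   {author "Unknown"}}]
  (Book. author title ISBN))

;; (new-book :title "The Hobbit" :ISBN 9780618002214)
;; => #:user.Book{:author "Unknown", :title "The Hobbit", :ISBN 9780618002214}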
This seems like the ideal scenario for a macro to provide expanded semantics. What macros have people written and/or recommend for richer defrecord construction?
Examples of support for full and partial record constructor functions and support for eval-able print and pprint forms:
http://david-mcneil.com/post/765563763/enhanced-clojure-records
http://github.com/david-mcneil/defrecord2
David is a colleague of mine and we are using this defrecord2 extensively in our project. I think something like this should really be part of Clojure core (details might vary considerably of course).
The things we've found to be important are:
Ability to construct a record with named (possibly partial) parameters: (new-foo {:a 1})
Ability to construct a record by copying an existing record and making modifications: (new-foo old-foo {:a 10})
Field validation - if you pass a field outside the declared record fields, throw an error. Of course, this is actually legal and potentially useful, so there are ways to make it optional. Since it would be rare in our usage, it's far more likely to be an error.
Default values - these would be very useful but we haven't implemented it. Chas Emerick has written about adding support for default values here: http://cemerick.com/2010/08/02/defrecord-slot-defaults/
Print and pprint support - we find it very useful to have records print and pprint in a form that is eval-able back to the original record. For example, this allows you to run a test, swipe the actual output, verify it, and use it as the expected output. Or to swipe output from a debug trace and get a real eval-able form.
Here is one that defines a record with default values and invariants. It creates a ctor that can take keyword args to set the values of the fields.
(defconstrainedrecord Foo [a 1 b 2]
  [(every? number? [a b])])

(new-Foo)
;=> #user.Foo{:a 1, :b 2}

(new-Foo :a 42)
;=> #user.Foo{:a 42, :b 2}
And like I said... invariants:
(new-Foo :a "bad")
; AssertionError
But they only make sense in the context of Trammel.