How do you override `GetHashCode()` idiomatically for an algebraic data type? - hash

I have an algebraic type of the following form:
type Example =
| Ctor0 of int
| Ctor1 of Example * Example
| Ctor2 of Example * Example
| Ctor3 of Example list
If I need to hash a value of type Example, I can use the default method: myValue.GetHashCode() or use hash which does the same thing. Now, suppose I want to define my own hash function for some reason because I need a special hash for the Ctor3 constructor, then I would override the GethashCode() method:
type Example =
| Ctor0 of int
| Ctor1 of Example * Example
| Ctor2 of Example * Example
| Ctor3 of Example list
override this.GethashCode() =
match this with
| Ctor0 (n) -> hash n (* The default `hash` is sufficient here *)
| ...
| Ctor3 (xs) -> myCustomHash xs
I need to define a hash value for Ctor1 and Ctor2, but their form is similar and using hash would not help. Indeed, for example one could define Ctor1(Ctor 7, Ctor 4) and Ctor2(Ctor 7, Ctor 4) if, in GetHashCode(), I hash only the value of the constructors, I would have the same hash for two different values.
override this.GetHashCode() =
match this with
| Ctor0 (n) -> hash n
| Ctor3 (xs) -> myCustomHash xs
(* I could do something like that: *)
| Ctor1 (a, b) -> hash a + hash b (* or maybe 1 + hash (a, b) *)
| Ctor2 (a, b) -> hash a - hash b (* 2 + hash (a, b) *)
The latter code does not seem to me very satisfactory... In such a situation, and assuming I have a large number of constructors, how am I supposed to define a hash function that of course produces a different value for each constructor (with a high probability anyway)?
I don't know much about hash algorithms, so my question may be very silly, but an indication of the idiomatic way to define (override) the hash function for this style of situation would be appreciated.
PS: I know you have to use CustomEquality and NoComparison, I'm just talking about how to do what I want here.

The recommended way to combine hash codes since .NET Standard 2.1 is using System.HashCode.Combine(). To differentiate between cases, you can add an integer argument.
override this.GetHashCode() =
match this with
| Ctor0 n -> HashCode.Combine(0, n)
| Ctor1 (a, b) -> HashCode.Combine(1, a, b)
| Ctor2 (a, b) -> HashCode.Combine(2, a, b)
| Ctor3 xs -> HashCode.Combine(3, myCustomHash xs)
I added the integer argument in all cases for consistency, but I think you're right that Ctor0 and Ctor3 don't really need it.
If you need to be compatible with older .NET versions, you can rely on hash for tuples instead. (I'm using struct tuples here which should allocate less than normal tuples)
override this.GetHashCode() =
match this with
| Ctor0 n -> hash struct(0, n)
| Ctor1 (a, b) -> hash struct(1, a, b)
| Ctor2 (a, b) -> hash struct(2, a, b)
| Ctor3 xs -> hash struct(3, myCustomHash xs)

Related

Is there a function that transforms/maps both Either's Left and Right cases taking two transformation functions respectively?

I have not found a function in Scala or Haskell that can transform/map both Either's Left and Right cases taking two transformation functions at the same time, namely a function that is of the type
(A => C, B => D) => Either[C, D]
for Either[A, B] in Scala, or the type
(a -> c, b -> d) -> Either a b -> Either c d
in Haskell. In Scala, it would be equivalent to calling fold like this:
def mapLeftOrRight[A, B, C, D](e: Either[A, B], fa: A => C, fb: B => D): Either[C, D] =
e.fold(a => Left(fa(a)), b => Right(fb(b)))
Or in Haskell, it would be equivalent to calling either like this:
mapLeftOrRight :: (a -> c) -> (b -> d) -> Either a b -> Either c d
mapLeftOrRight fa fb = either (Left . fa) (Right . fb)
Does a function like this exist in the library? If not, I think something like this is quite practical, why do the language designers choose not to put it there?
Don't know about Scala, but Haskell has a search engine for type signatures. It doesn't give results for the one you wrote, but that's just because you take a tuple argument while Haskell functions are by convention curried†. https://hoogle.haskell.org/?hoogle=(a -> c) -> (b -> d) -> Either a b -> Either c d does give matches, the most obvious being:
mapBoth :: (a -> c) -> (b -> d) -> Either a b -> Either c d
...actually, even Google finds that, because the type variables happen to be exactly as you thought. (Hoogle also finds it if you write it (x -> y) -> (p -> q) -> Either x p -> Either y q.)
But actually, as Martijn said, this behaviour for Either is only a special case of a bifunctor, and indeed Hoogle also gives you the more general form, which is defined in the base library:
bimap :: Bifunctor p => (a -> b) -> (c -> d) -> p a c -> p b d
†TBH I'm a bit disappointed that Hoogle doesn't by itself figure out to curry the signature or to swap arguments. Pretty sure it actually used to do that automatically, but at some point they simplified the algorithm because with the huge number of libraries the time it took and number of results got out of hand.
Cats provides Bifunctor, for example
import cats.implicits._
val e: Either[String, Int] = Right(41)
e.bimap(e => s"boom: $e", v => 1 + v)
// res0: Either[String,Int] = Right(42)
The behaviour you are talking about is a bifunctor behaviour, and would commonly be called bimap. In Haskell, a bifunctor for either is available: https://hackage.haskell.org/package/bifunctors-5/docs/Data-Bifunctor.html
Apart from the fold you show, another implementation in scala would be either.map(fb).left.map(fa)
There isn't such a method in the scala stdlib, probably because it wasn't found useful or fundamental enough. I can somewhat relate to that: mapping both sides in one operation instead of mapping each side individually doesn't come across as fundamental or useful enough to warrant inclusion in the scala stdlib to me either. The bifunctor is available in Cats though.
In Haskell, the method exists on Either as mapBoth and BiFunctor is in base.
In Haskell, you can use Control.Arrow.(+++), which works on any ArrowChoice:
(+++) :: (ArrowChoice arr) => arr a b -> arr c d -> arr (Either a c) (Either b d)
infixr 2 +++
Specialised to the function arrow arr ~ (->), that is:
(+++) :: (a -> b) -> (c -> d) -> Either a c -> Either b d
Hoogle won’t find +++ if you search for the type specialised to functions, but you can find generalised operators like this by replacing -> in the signature you want with a type variable: x a c -> x b d -> x (Either a b) (Either c d).
An example of usage:
renderResults
:: FilePath
-> Int
-> Int
-> [Either String Int]
-> [Either String String]
renderResults file line column
= fmap ((prefix ++) +++ show)
where
prefix = concat [file, ":", show line, ":", show column, ": error: "]
renderResults "test" 12 34 [Right 1, Left "beans", Right 2, Left "bears"]
==
[ Right "1"
, Left "test:12:34: error: beans"
, Right "2"
, Left "test:12:34: error: bears"
]
There is also the related operator Control.Arrow.(|||) which does not tag the result with Either:
(|||) :: arr a c -> a b c -> arr (Either a b) c
infixr 2 |||
Specialised to (->):
(|||) :: (a -> c) -> (b -> c) -> Either a b -> c
Example:
assertRights :: [Either String a] -> [a]
assertRights = fmap (error ||| id)
sum $ assertRights [Right 1, Right 2]
==
3
sum $ assertRights [Right 1, Left "oh no"]
==
error "oh no"
(|||) is a generalisation of the either function in the Haskell Prelude for matching on Eithers. It’s used in the desugaring of if and case in arrow proc notation.

Similar record types in a list/array in purescript

Is there any way to do something like
first = {x:0}
second = {x:1,y:1}
both = [first, second]
such that both is inferred as {x::Int | r} or something like that?
I've tried a few things:
[{x:3}] :: Array(forall r. {x::Int|r}) -- nope
test = Nil :: List(forall r. {x::Int|r})
{x:1} : test -- nope
type X r = {x::Int | r}
test = Nil :: List(X) -- nope
test = Nil :: List(X())
{x:1} : test
{x:1, y:1} : test -- nope
Everything I can think of seems to tell me that combining records like this into a collection is not supported. Kind of like, a function can be polymorphic but a list cannot. Is that the correct interpretation? It reminds me a bit of the F# "value restriction" problem, though I thought that was just because of CLR restrictions whereas JS should not have that issue. But maybe it's unrelated.
Is there any way to declare the list/array to support this?
What you're looking for is "existential types", and PureScript just doesn't support those at the syntax level the way Haskell does. But you can roll your own :-)
One way to go is "data abstraction" - i.e. encode the data in terms of operations you'll want to perform on it. For example, let's say you'll want to get the value of x out of them at some point. In that case, make an array of these:
type RecordRep = Unit -> Int
toRecordRep :: forall r. { x :: Int | r } -> RecordRep
toRecordRep {x} _ = x
-- Construct the array using `toRecordRep`
test :: Array RecordRep
test = [ toRecordRep {x:1}, toRecordRep {x:1, y:1} ]
-- Later use the operation
allTheXs :: Array Int
allTheXs = test <#> \r -> r unit
If you have multiple such operations, you can always make a record of them:
type RecordRep =
{ getX :: Unit -> Int
, show :: Unit -> String
, toJavaScript :: Unit -> Foreign.Object
}
toRecordRep r =
{ getX: const r.x
, show: const $ show r.x
, toJavaScript: const $ unsafeCoerce r
}
(note the Unit arguments in every function - they're there for the laziness, assuming each operation could be expensive)
But if you really need the type machinery, you can do what I call "poor man's existential type". If you look closely, existential types are nothing more than "deferred" type checks - deferred to the point where you'll need to see the type. And what's a mechanism to defer something in an ML language? That's right - a function! :-)
newtype RecordRep = RecordRep (forall a. (forall r. {x::Int|r} -> a) -> a)
toRecordRep :: forall r. {x::Int|r} -> RecordRep
toRecordRep r = RecordRep \f -> f r
test :: Array RecordRep
test = [toRecordRep {x:1}, toRecordRep {x:1, y:1}]
allTheXs = test <#> \(RecordRep r) -> r _.x
The way this works is that RecordRep wraps a function, which takes another function, which is polymorphic in r - that is, if you're looking at a RecordRep, you must be prepared to give it a function that can work with any r. toRecordRep wraps the record in such a way that its precise type is not visible on the outside, but it will be used to instantiate the generic function, which you will eventually provide. In my example such function is _.x.
Note, however, that herein lies the problem: the row r is literally not known when you get to work with an element of the array, so you can't do anything with it. Like, at all. All you can do is get the x field, because its existence is hardcoded in the signatures, but besides the x - you just don't know. And that's by design: if you want to put anything into the array, you must be prepared to get anything out of it.
Now, if you do want to do something with the values after all, you'll have to explain that by constraining r, for example:
newtype RecordRep = RecordRep (forall a. (forall r. Show {x::Int|r} => {x::Int|r} -> a) -> a)
toRecordRep :: forall r. Show {x::Int|r} => {x::Int|r} -> RecordRep
toRecordRep r = RecordRep \f -> f r
test :: Array RecordRep
test = [toRecordRep {x:1}, toRecordRep {x:1, y:1}]
showAll = test <#> \(RecordRep r) -> r show
Passing the show function like this works, because we have constrained the row r in such a way that Show {x::Int|r} must exist, and therefore, applying show to {x::Int|r} must work. Repeat for your own type classes as needed.
And here's the interesting part: since type classes are implemented as dictionaries of functions, the two options described above are actually equivalent - in both cases you end up passing around a dictionary of functions, only in the first case it's explicit, but in the second case the compiler does it for you.
Incidentally, this is how Haskell language support for this works as well.
Folloing #FyodorSoikin answer based on "existential types" and what we can find in purescript-exists we can provide yet another solution.
Finally we will be able to build an Array of records which will be "isomorphic" to:
exists tail. Array { x :: Int | tail }
Let's start with type constructor which can be used to existentially quantify over a row type (type of kind #Type). We are not able to use Exists from purescript-exists here because PureScript has no kind polymorphism and original Exists is parameterized over Type.
newtype Exists f = Exists (forall a. f (a :: #Type))
We can follow and reimplement (<Ctrl-c><Ctrl-v> ;-)) definitions from Data.Exists and build a set of tools to work with such Exists values:
module Main where
import Prelude
import Unsafe.Coerce (unsafeCoerce)
import Data.Newtype (class Newtype, unwrap)
newtype Exists f = Exists (forall a. f (a :: #Type))
mkExists :: forall f a. f a -> Exists f
mkExists r = Exists (unsafeCoerce r :: forall a. f a)
runExists :: forall b f. (forall a. f a -> b) -> Exists f -> b
runExists g (Exists f) = g f
Using them we get the ability to build an Array of Records with "any" tail but we have to wrap any such a record type in a newtype before:
newtype R t = R { x :: Int | t }
derive instance newtypeRec :: Newtype (R t) _
Now we can build an Array using mkExists:
arr :: Array (Exists R)
arr = [ mkExists (R { x: 8, y : "test"}), mkExists (R { x: 9, z: 10}) ]
and process values using runExists:
x :: Array [ Int ]
x = map (runExists (unwrap >>> _.x)) arr

Encoding of inferrable records

As you probably know, records are somewhat special in ocaml, as each label has to be uniquely assigned to a nominal record type, i.e. the following function cannot be typed without context:
let f r = r.x
Proper first class records (i.e. things that behave like tuples with labels) are trivially encoded using objects, e.g.
let f r = r#x
when creating the objects in the right way (i.e. no self-recursion, no mutation), they behave just like records.
I am however, somewhat unhappy with this solution for two reasons:
when making records updatetable (i.e. by adding an explicit "with_l" method for each label l), the type is somewhat too loose (it should be the same as the original record). Admitted, one can enforce this equality, but this is still inconvenient.
I have the suspicion that the OCaml compiler does not infer that these records are actually immutable: In a function
let f r = r#x + r#x
would the compiler be able to run a common subexpression elimination?
For these reasons, I wonder if there is a better encoding:
Is there another (aside from using objects) type-safe encoding (e.g. using polymorphic variants) of records with inferrable type in OCaml?
Can this encoding avoid the problems mentioned above?
If I understand you correctly you're looking for a very special kind of polymorphism. You want to write a function that will work for all types, such that the type is a record with certain fields. This sounds more like a syntactic polymorphism in a C++ style, not as semantic polymorphism in ML style. If we will slightly rephrase the task, by capturing the idea that a field accessing is just a syntactic sugar for a field projection function, then we can say, that you want to write a function that is polymorphic over all types that provide a certain set of operations. This kind of polymorphism can be captured by OCaml using one of the following mechanisms:
functors
first class modules
objects
I think that functors are obvious, so I will show an example with first class modules. We will write a function print_student that will work on any type that satisfies the Student signature:
module type Student = sig
type t
val name : t -> string
val age : t -> int
end
let print_student (type t)
(module S : Student with type t = t) (s : t) =
Printf.printf "%s %d" (S.name s) (S.age s)
The type of print_student function is (module Student with type t = 'a) -> 'a -> unit. So it works for any type that satisfies the Student interface, and thus it is polymorphic. This is a very powerful polymorphism that comes with a price, you need to pass the module structure explicitly when you're invoking the function, so it is a System F style polymorphism. Functors will also require you to specify concrete module structure. So both are not inferrable (i.e., not an implicit Hindley-Milner-like style polymorphism, that you are looking for). For the latter, only objects will work (there are also modular implicits, that relax the explicitness requirement, but they are still not in the trunk, but they will actually answer your requirements).
With object-style row polymorphism it is possible to write a function that is polymorphic over a set of types conforming to some signature, and to infer this signature implicitly from the function definintion. However, such power comes with a price. Since object operations are encoded with methods and methods are just function pointers that are assigned dynamically in the runtime, you shouldn't expect any compile time optimizations. It is not possible to perform any static analysis on something that is bound dynamically. So, of course, no Common Subexpression elimination, nor inlining. For functors and first class modules, the optimization is possible on a newer branch of the compiler with flamba (See 4.03.0+flambda opam switch). But on a regular compiler installation no inlining will be performed.
Different approaches
What concerning other techniques. First of all we can use camlp{4,5}, or ppx or even m4 and cpp to preprocess code, but this would be hardly idiomatic and of doubtful usefulness.
Another way, is instead of writing a function that is polymorphic, we can try to find a suitable monomorphic data type. A direct approach would be to use a list of polymorphic variants, e.g.,
type attributes = [`name of string | `age of int]
type student = attribute list
In fact we even don't need to specify all these types ahead, and our function can require only those fields that are needed, a form of a row polymorphism:
let rec name = function
| [] -> raise Not_found
| `name n -> n
| _ :: student -> name student
The only problem with this encoding, is that you cannot guarantee that the same named attribute can occur once and only once. So it is possible that a student doesn't have a name at all, or, that is worser, it can have more then one names. Depending on your problem domain it can be acceptable.
If it is not, then we can use GADT and extensible variants to encode heterogenous maps, i.e., an associative data structures that map keys to
different type (in a regular (homogenous) map or assoc list value type is unified). How to construct such containers is beyond the scope of the answer, but fortunately there're at least two available implementations. One, that I use personally is called universal map (Univ_map) and is provided by a Core library (Core_kernel in fact). It allows you to specify two kinds of heterogenous maps, with and without a default values. The former corresponds to a record with optional field, the latter has default for each field, so an accessor is a total function. For example,
open Core_kernel.Std
module Dict = Univ_map.With_default
let name = Dict.Key.create ~name:"name" ~default:"Joe" sexp_of_string
let age = Dict.Key.create ~name:"age" ~default:18 sexp_of_int
let print student =
printf "%s %d"
(Dict.get student name) (Dict.get age name)
You can hide that you're using universal map using abstract type, as there is only one Dict.t that can be used across different abstractions, that may break modularity. Another example of heterogeneous map implementation is from Daniel Bunzli. It doesn't provide With_default kind of map, but has much less dependencies.
P.S. Of course for such a redundant case, where this only one operation it is much easier to just pass this operation explicitly as function, instead of packing it into a structure, so we can write function f from your example as simple as let f x r = x r + x r. But this would be the same kind of polymoprism as with first class modules/functors, just simplified. And I assume, that your example was specifically reduced to one field, and in your real use case you have more complex set of fields.
Very roughly speaking, an OCaml object is a hash table whose keys are its method name hash. (The hash of a method name can be obtained by Btype.hash_variant of OCaml compiler implementation.)
Just like objects, you can encode polymorphic records using (int, Obj.t) Hashtbl.t. For example, a function to get a value of a field l can be written as follows:
(** [get r "x"] is poly-record version of [r.x] *)
let get r k = Hashtbl.find t (Btype.hash_variant k))
Since it is easy to access the internals unlike objects, the encoding of {r with l = e} is trivial:
(** [copy_with r [(k1,v1);..;(kn,vn)]] is poly-record version of
[{r with k1 = v1; ..; kn = vn}] *)
let copy_with r fields =
let r = Hashtbl.copy r in
List.iter (fun (k,v) -> Hashtbl.replace r (Btype.hash_variant k) v) fields
and the creation of poly-records:
(** [create [(k1,v1);..(kn,vn)]] is poly-record version of [{k1=v1;..;kn=vn}] *)
let create fields = copy_with fields (Hashtbl.create (List.length fields))
Since all the types of the fields are squashed into one Obj.t, you have to use Obj.magic to store various types into this implementation and therefore this is not type-safe by itself. However, we can make it type-safe wrapping (int, Obj.t) Hashtbl.t with phantom type whose parameter denotes the fields and their types of a poly-record. For example,
<x : int; y : float> Poly_record.t
is a poly-record whose fields are x : int and y : float.
Details of this phantom type wrapping for the type safety is too long to explain here. Please see my implementation https://bitbucket.org/camlspotter/ppx_poly_record/src . To tell short, it uses PPX preprocessor to generate code for type-safety and to provide easier syntax sugar.
Compared with the encoding by objects, this approach has the following properties:
The same type safety and the same field access efficiency as objects
It can enjoy structural subtyping like objects, what you want for poly-records.
{r with l = e} is possible
Streamable outside of a program safely, since hash tables themselves have no closure in it. Objects are always "contaminated" with closures therefore they are not safely streamable.
Unfortunately it lacks efficient pattern matching, which is available for mono-records. (And this is why I do not use my implementation :-( ) I feel for it PPX reprocessing is not enough and some compiler modification is required. It will not be really hard though since we can make use of typing of objects.
Ah and of course, this encoding is very side effective therefore no CSE optimization can be expected.
Is there another (aside from using objects) type-safe encoding (e.g. using polymorphic variants) of records with inferrable type in OCaml?
For immutable records, yes. There is a standard theoretical duality between polymorphic records ("inferrable" records as you describe) and polymorphic variants. In short, a record { l_1 = v_1; l_2 = v_2; ...; l_n = v_n } can be implemented by
function `l_1 k -> k v_1 | `l_2 k -> k v_2 | ... | `l_n k -> k v_n
and then the projection r.l_i becomes r (`l_i (fun v -> v)). For instance, the function fun r -> r.x is encoded as fun r -> r (`x (fun v -> v)). See also the following example session:
# let myRecord = (function `field1 k -> k 123 | `field2 k -> k "hello") ;;
(* encodes { field1 = 123; field2 = "hello" } *)
val myRecord : [< `field1 of int -> 'a | `field2 of string -> 'a ] -> 'a = <fun>
# let getField1 r = r (`field1 (fun v -> v)) ;;
(* fun r -> r.field1 *)
val getField1 : ([> `field1 of 'a -> 'a ] -> 'b) -> 'b = <fun>
# getField1 myRecord ;;
- : int = 123
# let getField2 r = r (`field2 (fun v -> v)) ;;
(* fun r -> r.field2 *)
val getField2 : ([> `field2 of 'a -> 'a ] -> 'b) -> 'b = <fun>
# getField2 myRecord ;;
- : string = "hello"
For mutable records, we can add setters like:
let ref1 = ref 123
let ref2 = ref "hello"
let myRecord =
function
| `field1 k -> k !ref1
| `field2 k -> k !ref2
| `set_field1(v1, k) -> k (ref1 := v1)
| `set_field2(v2, k) -> k (ref2 := v2)
and use them like myRecord (`set_field1(456, fun v -> v)) and myRecord (`set_field2("world", fun v -> v)) for example. However, localizing ref1 and ref2 like
let myRecord =
let ref1 = ref 123 in
let ref2 = ref "hello" in
function
| `field1 k -> k !ref1
| `field2 k -> k !ref2
| `set_field1(v1, k) -> k (ref1 := v1)
| `set_field2(v2, k) -> k (ref2 := v2)
causes a value restriction problem and requires a little more polymorphic typing trick (which I omit here).
Can this encoding avoid the problems mentioned above?
The "common subexpression elimination" for (the encoding of) r.x + r.x can be done only if OCaml knows the definition of r and inlines it. (Sorry my previous answer was inaccurate here.)

How to concisely express function iteration?

Is there a concise, idiomatic way how to express function iteration? That is, given a number n and a function f :: a -> a, I'd like to express \x -> f(...(f(x))...) where f is applied n-times.
Of course, I could make my own, recursive function for that, but I'd be interested if there is a way to express it shortly using existing tools or libraries.
So far, I have these ideas:
\n f x -> foldr (const f) x [1..n]
\n -> appEndo . mconcat . replicate n . Endo
but they all use intermediate lists, and aren't very concise.
The shortest one I found so far uses semigroups:
\n f -> appEndo . times1p (n - 1) . Endo,
but it works only for positive numbers (not for 0).
Primarily I'm focused on solutions in Haskell, but I'd be also interested in Scala solutions or even other functional languages.
Because Haskell is influenced by mathematics so much, the definition from the Wikipedia page you've linked to almost directly translates to the language.
Just check this out:
Now in Haskell:
iterateF 0 _ = id
iterateF n f = f . iterateF (n - 1) f
Pretty neat, huh?
So what is this? It's a typical recursion pattern. And how do Haskellers usually treat that? We treat that with folds! So after refactoring we end up with the following translation:
iterateF :: Int -> (a -> a) -> (a -> a)
iterateF n f = foldr (.) id (replicate n f)
or point-free, if you prefer:
iterateF :: Int -> (a -> a) -> (a -> a)
iterateF n = foldr (.) id . replicate n
As you see, there is no notion of the subject function's arguments both in the Wikipedia definition and in the solutions presented here. It is a function on another function, i.e. the subject function is being treated as a value. This is a higher level approach to a problem than implementation involving arguments of the subject function.
Now, concerning your worries about the intermediate lists. From the source code perspective this solution turns out to be very similar to a Scala solution posted by #jmcejuela, but there's a key difference that GHC optimizer throws away the intermediate list entirely, turning the function into a simple recursive loop over the subject function. I don't think it could be optimized any better.
To comfortably inspect the intermediate compiler results for yourself, I recommend to use ghc-core.
In Scala:
Function chain Seq.fill(n)(f)
See scaladoc for Function. Lazy version: Function chain Stream.fill(n)(f)
Although this is not as concise as jmcejuela's answer (which I prefer), there is another way in scala to express such a function without the Function module. It also works when n = 0.
def iterate[T](f: T=>T, n: Int) = (x: T) => (1 to n).foldLeft(x)((res, n) => f(res))
To overcome the creation of a list, one can use explicit recursion, which in reverse requires more static typing.
def iterate[T](f: T=>T, n: Int): T=>T = (x: T) => (if(n == 0) x else iterate(f, n-1)(f(x)))
There is an equivalent solution using pattern matching like the solution in Haskell:
def iterate[T](f: T=>T, n: Int): T=>T = (x: T) => n match {
case 0 => x
case _ => iterate(f, n-1)(f(x))
}
Finally, I prefer the short way of writing it in Caml, where there is no need to define the types of the variables at all.
let iterate f n x = match n with 0->x | n->iterate f (n-1) x;;
let f5 = iterate f 5 in ...
I like pigworker's/tauli's ideas the best, but since they only gave it as a comments, I'm making a CW answer out of it.
\n f x -> iterate f x !! n
or
\n f -> (!! n) . iterate f
perhaps even:
\n -> ((!! n) .) . iterate

Datatype for terms over a signature in SML

I want to implement an arbitrary signature in SML. How can I define a datatype for terms over that signature ?I would be needing it to write functions that checks whether the terms are well formed .
In my point of view, there are two major ways of representing an AST. Either as series of (possibly mutually recursive) datatypes or just as one big datatype. There are pros an cos for both.
If we define the following BNF (extracted from the SML definition and slightly simplified)
<exp> ::= <exp> andalso <exp>
| <exp> orelse <exp>
| raise <exp>
| <appexp>
<appexp> ::= <atexp> | <appexp> <atexp>
<atexp> ::= ( <exp>, ..., <exp> )
| [ <exp>, ..., <exp> ]
| ( <exp> ; ... ; <exp> )
| ( <exp> )
| ()
As stated this is simplified and much of the atexp is left out.
1. A series of possibly mutually recursive datatypes
Here you would for example create a datatype for expressions, declarations, patterns, etc.
Basicly you would create a datatype for each of the non-terminals in your BNF.
We would most likely create the following datatypes
datatype exp = Andalso of exp * exp
| Orelse of exp * exp
| Raise of exp
| App of exp * atexp
| Atexp of atexp
and atexp = Tuple of exp list
| List of exp list
| Seq of exp list
| Par of exp
| Unit
Notice that the non-terminal has been consumed into exp datatype instead of having it as its own. That would just clutter up the AST for no reason. You have to remember that a BNF is often written in such a way that it also defined precedens and assosiativity (e.g., for arithmetic). In such cases you can often simplify the BNF by merging multiple non-terminals into one datatype.
The good thing about defining multiple datatypes is that you kind of get some well formednes of your AST. If for example we also had non-terminal for declarations, we know that the AST will newer contain a declaration inside a list (as only expressions can be there). Because of this, most of you well formedness check is not nessesary.
This is however not always a good thing. Often you need to do some checking on the AST anyways, for example type checking. In many cases the BNF is quite large and thus the number of datatypes nessesary to model the AST is also quite large. Keeping this in mind, you have to create one function for each of your datatypes,for every type of modification you wan't to do on your AST. In many cases you only wan't to change a small part of your AST but you will (most likely) still need to define a function for each datatype. Most of these functions will basicly be the identity and then only in a few cases you will do the desired work.
If for example we wan't to count how many units there are in a given AST we could define the following functions
fun countUnitexp (Andalso (e1, e2)) = countUnitexp e1 + countUnitexp e2
| countUnitexp (Orelse (e1, e2)) = countUnitexp e1 + countUnitexp e2
| countUnitexp (Raise e1) = countUnitexp e1
| countUnitexp (App (e1, atexp)) = countUnitexp e1 + countUnitatexp atexp
| countUnitexp (Atexp atexp) = countUnitatexp atexp
and countUnitatexp (Tuple exps) = sumUnit exps
| countUnitatexp (List exps) = sumUnit exps
| countUnitatexp (Seq exps) = sumUnit exps
| countUnitatexp (Par exp) = countUnitexp exp
| countUnitatexp Unit = 1
and sumUnit exps = foldl (fn (exp,b) => b + countUnitexp exp) 0 exps
As you may see we are doing a lot of work, just for this simple task. Imagine a bigger grammar and a more complicated task.
2. One (big) datatype (nodes) -- and a Tree of these nodes
Lets combine the datatypes from before, but change them such that they don't (themself) contain their children. Because in this approach we build a tree structure that has a node and some children of that node. Obviously if you have an identifier, then the identifier needs to contain the actual string representation (e.g., variable name).
So lets start out by defined the nodes for the tree structure.
(* The comment is the kind of children and possibly specific number of children
that the BNF defines to be valid *)
datatype node = Exp_Andalso (* [exp, exp] *)
| Exp_Orelse (* [exp, exp] *)
| Exp_Raise (* [exp] *)
| Exp_App (* [exp, atexp] *)
(* Superflous:| Exp_Atexp (* [atexp] *) *)
| Atexp_Tuple (* exp list *)
| Atexp_List (* exp list *)
| Atexp_Seq (* exp list *)
| Atexp_Par (* [exp] *)
| Atexp_Unit (* [] *)
See how the Atexp from the tupe now becomes superflous and thus we remove it. Personally I think it is nice to have the comment next by telling which children (in the tree structure) we can expect.
(* Note this is a non empty tree. That is you have to pack it in an option type
if you wan't to represent an empty tree *)
datatype 'a tree = T of 'a * 'a tree list
(* Define the ast as trees of our node datatype *)
type ast = node tree
We then define a generic tree and define the type ast to be a "tree of nodes".
If you use some library then there is a big chance that such a tree structure is already present. Also it might be handy late on to extend this tree structure to contain more than just the node as data, however we just keep it simple here.
fun foldTree f b (T (n, [])) = f (n, b)
| foldTree f b (T (n, ts)) = foldl (fn (t, b') => foldTree f b' t)
(f (n, b)) ts
For this example we define a fold function over the tree, again if you are using a library then all these functions for folding, mapping, etc. are most likely already defined.
fun countUnit (Atexp_Unit) = 1
| countUnit _ = 0
If we then take the example from before, that we wan't to count the number of occurances of unit, we can then just fold the above function over the tree.
val someAST = T(Atexp_Tuple, [ T (Atexp_Unit, [])
, T (Exp_Raise, [])
, T (Atexp_Unit, [])
]
)
A simple AST could look like the above (note that this is actually not valid as we have a Exp_Raise with no children). And we could then do the counting by
foldTree (fn (a,b) => (countUnit a) + b) 0 someAST
The down side of this approach is that you have to write a check function that verifies that your AST is well formed, as there is no restrictions when you create the AST. This includes that the children are of the correct "type" (e.g., only Exp_* as children in an Exp_Andalso) and that there are the correct number of children (e.g., exactly two children in Exp_Andalso).
This approach also requires a bit of builk getting started, given you don't use some library that has a tree defined (including auxilary functions for modifying the tree). However in the long run it pays of.