How does an instance of "Arbitrary" look for a tree?

In our CS lectures we are currently learning about QuickCheck in Haskell. I have now been given a task to use QuickCheck with the following tree type:
data Tree = Leaf Int | Node Tree Tree
  deriving (Eq, Show)
I have already written the necessary equations to check various properties of trees.
I know that I need an instance of "Arbitrary" to run the whole thing.
So I tried this:
instance Arbitrary Tree where
    arbitrary = sized tree'
      where tree' 0 = do a <- arbitrary
                         oneof [return (Leaf a)]
            tree' n = do a <- arbitrary
                         oneof [return (Leaf a), return (Node (tree' (n-1)) (tree' (n-1)))]
But now I am getting some errors such as:
Couldn't match type `Gen Tree' with `Tree'
  Expected type: a -> Tree
    Actual type: a -> Gen Tree
* In an equation for `arbitrary':
      arbitrary
        = sized tree'
        where
            tree' 0
              = do a <- arbitrary
                   ....
            tree' n
              = do a <- arbitrary
                   ....
  In the instance declaration for `Arbitrary Tree'
   |
61 |       where tree' 0 = do a <- arbitrary
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
or:
* Couldn't match type `Tree' with `Gen Tree'
  Expected type: Int -> Gen Tree
    Actual type: Int -> Tree
* In the first argument of `sized', namely tree'
  In the expression: sized tree'
  In an equation for `arbitrary':
      arbitrary
        = sized tree'
        where
            tree' 0
              = do a <- arbitrary
                   ....
            tree' n
              = do a <- arbitrary
                   ....
   |
60 |     arbitrary = sized tree'
   |                       ^^^^^
I think the problem is the recursion when choosing a Node: in that case the subtrees of that node are not trees but rather generators of trees (Gen Tree). Hopefully you know what I mean.
Can somebody help me with this?
Thank you :)

The simplest way to implement this is with:
instance Arbitrary Tree where
    arbitrary = frequency [
          (3, Leaf <$> arbitrary)
        , (1, Node <$> arbitrary <*> arbitrary)
        ]
Here the two arbitrary calls in the Node case are the ones of the Tree instance we are defining; the arbitrary for Leaf is the Arbitrary instance for Int.
Here we thus specify that an arbitrary tree is a leaf with an arbitrary Int, or it is a Node with an arbitrary left and right sub-Tree.
or with sized :: (Int -> Gen a) -> Gen a:
instance Arbitrary Tree where
    arbitrary = sized go
      where go 0 = Leaf <$> arbitrary
            go n = oneof [Leaf <$> arbitrary, Node <$> go' <*> go']
              where go' = go (n-1)
Here the size specifies the depth of the tree, not the number of elements.
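With either instance in scope you can run properties as usual; prop_leaves below is a hypothetical property, just to show the plumbing:
-- Count the leaves of a tree.
leaves :: Tree -> Int
leaves (Leaf _)   = 1
leaves (Node l r) = leaves l + leaves r

-- Hypothetical property: every generated tree has at least one leaf.
prop_leaves :: Tree -> Bool
prop_leaves t = leaves t >= 1

-- ghci> quickCheck prop_leaves
-- +++ OK, passed 100 tests.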

This can be derived using the generic-random library
{-# Language DataKinds #-}
{-# Language DeriveGeneric #-}
{-# Language DerivingVia #-}
import GHC.Generics
import Generic.Random.DerivingVia
import Test.QuickCheck
-- ghci> :set -XTypeApplications
-- ghci> sample @Tree arbitrary
-- Node (Leaf 0) (Node (Leaf 0) (Node (Leaf 0) (Node (Node (Leaf 0) (Leaf 0)) (Leaf 0))))
-- Leaf 0
-- Leaf (-2)
-- Leaf 5
-- Leaf 0
-- Leaf 2
-- Leaf 1
-- Leaf 7
-- Node (Leaf (-7)) (Leaf (-2))
-- Node (Leaf 4) (Node (Leaf 0) (Leaf 3))
-- Node (Leaf 5) (Leaf (-2))
data Tree = Leaf Int | Node Tree Tree
  deriving stock (Eq, Show, Generic)
  deriving Arbitrary
    via GenericArbitraryRec '[2, 1] Tree
Let me know if there is something wrong with the distribution!

Related

Capturing rules of graph using types in Scala, OCaml and Haskell

I am trying to describe a complex graph with many different types of nodes and edges which can only be connected to each other according to a set of rules. I would like these rules to be checked at compile time using the type system of the language. There are many different node and edge types in my real application.
I have easily created a simple example in Scala:
sealed trait Node {
  val name: String
}

case class NodeType1(override val name: String) extends Node
case class NodeType2(override val name: String) extends Node
case class NodeType3(override val name: String) extends Node

sealed trait Edge
case class EdgeType1(source: NodeType1, target: NodeType2) extends Edge
case class EdgeType2(source: NodeType2, target: NodeType1) extends Edge

object Edge {
  def edgeSource(edge: Edge): Node = edge match {
    case EdgeType1(src, _) => src
    case EdgeType2(src, _) => src
  }
}

object Main {
  def main(args: Array[String]) {
    val n1 = NodeType1("Node1")
    val n2 = NodeType2("Node2")
    val edge = EdgeType1(n1, n2)
    val source = Edge.edgeSource(edge)
    println(source == n1) // true
  }
}
A valid graph can only connect a given edge type between the given types of nodes as shown in the Scala example above. The function "edgeSource" extracts the source node from the edge, as simple as that.
Here comes a non working example of what I would like to write in OCaml:
type node =
    NodeType1 of string
  | NodeType2 of string

type edge =
    EdgeType1 of NodeType1 * NodeType2
  | EdgeType2 of NodeType2 * NodeType1

let link_source (e : edge) : node =
  match e with
  | EdgeType1 (src, _) -> src
  | EdgeType2 (src, _) -> src
The problem here is that "NodeTypeX" are constructors, not types. Hence I am unable to use them when describing the source/target tuples over which the edges are defined. The "link_source" function can also only return one type, and "node" is the variant that covers all the node constructors.
I have been trying out how to fix this in both OCaml and Haskell and here is an example of one go in OCaml where the node type wraps node_type_X:
type node_type_1 = NodeType1 of string
type node_type_2 = NodeType2 of string

type node =
    NodeType1Node of node_type_1
  | NodeType2Node of node_type_2

type edge =
    EdgeType1 of node_type_1 * node_type_2
  | EdgeType2 of node_type_2 * node_type_1

let link_source (e : edge) : node =
  match e with
  | EdgeType1 (src, _) -> NodeType1Node src
  | EdgeType2 (src, _) -> NodeType2Node src
But the problem with this is that I am duplicating the type information. I am specifying the source node type in the definition of edge, and it is also given when matching the edge in link_source as NodeTypeXNode.
Obviously I am not understanding how to solve this problem; I am stuck thinking in class hierarchies. What would be the correct way to express in OCaml or Haskell what I achieve in the Scala code above?
Edit: the answer with GADTs is much more direct.
Here's a Haskell version (without unsafeCoerce), which is one possible translation of your Scala code. I can't help with an OCaml solution however.
Note that, in Haskell, == cannot be used on values of different type (and the ability to do so in Scala is frequently frowned upon and a source of annoyance and bugs). However, I've provided a solution below for comparing different node types if you really need it. If you don't truly need it, I'd recommend avoiding it, as it depends on GHC features/extensions that make your code less portable and could potentially even cause problems for the type checker.
WITHOUT polymorphic node comparison:
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}

-- the FlexibleContexts extension can be eliminated
-- by removing the constraint on edgeSource.

-- let's start with just the data types
data NodeType1 = NodeType1 { name1 :: String } deriving Eq
data NodeType2 = NodeType2 { name2 :: String } deriving Eq
data NodeType3 = NodeType3 { name3 :: String } deriving Eq

data EdgeType1 = EdgeType1 { source1 :: NodeType1, target1 :: NodeType2 }
data EdgeType2 = EdgeType2 { source2 :: NodeType2, target2 :: NodeType1 }

-- you tell the compiler that the node types
-- somehow "belong together" by using a type class
class Node a where name :: a -> String
instance Node NodeType1 where name = name1
instance Node NodeType2 where name = name2
instance Node NodeType3 where name = name3

-- same about the edges, however in order to
-- map each Edge type to a different Node type,
-- you need to use TypeFamilies; see
-- https://wiki.haskell.org/GHC/Type_families
class Edge a where
  type SourceType a
  -- the constraint here isn't necessary to make
  -- the code compile, but it ensures you can't
  -- map Edge types to non-Node types.
  edgeSource :: Node (SourceType a) => a -> SourceType a

instance Edge EdgeType1 where
  type SourceType EdgeType1 = NodeType1
  edgeSource = source1

instance Edge EdgeType2 where
  type SourceType EdgeType2 = NodeType2
  edgeSource = source2

main = do
  let n1 = NodeType1 "Node1"
      n2 = NodeType2 "Node2"
      edge = EdgeType1 n1 n2
      source = edgeSource edge
  print (source == n1) -- True
  -- print (source == n2) -- False -- DOESN'T COMPILE
WITH polymorphic node comparison:
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- again, constraint not required but makes sure you can't
-- define node equality for non-Node types.
class (Node a, Node b) => NodeEq a b where
  nodeEq :: a -> b -> Bool

-- I wasn't able to avoid OVERLAPPING/OVERLAPS here.
-- Also, if you forget `deriving Eq` for a node type N,
-- `nodeEq` just yields False for any a, b :: N, without warning.
instance {-# OVERLAPPING #-} (Node a, Eq a) => NodeEq a a where
  nodeEq = (==)
instance {-# OVERLAPPING #-} (Node a, Node b) => NodeEq a b where
  nodeEq _ _ = False

main = do
  let n1 = NodeType1 "Node1"
      n2 = NodeType2 "Node2"
      edge = EdgeType1 n1 n2
      source = edgeSource edge
  print (source `nodeEq` n1) -- True
  print (source `nodeEq` n2) -- False
The above is not the only way to tell the Haskell type system about your constraints; for example, functional dependencies seem applicable, as do GADTs.
Explanation:
It's worth understanding why the solution seems to be more direct in Scala.
Scala is a hybrid between subtype-polymorphism-based OO, such as that found in C++, Java/C#, and Python/Ruby, and (often Haskell-like) functional programming, which typically avoids subtyping (a.k.a. datatype inheritance) and resorts to other, arguably better, forms of polymorphism.
In Scala, the way you define ADTs is by encoding them as a sealed trait plus a number of (potentially sealed) case classes and/or case objects. However, this is a pure ADT only if you never refer to the types of the case objects and case classes, i.e. if you pretend they are like the Haskell or ML ADTs. Your Scala solution, however, does use those types: it points "into" the ADT.
There's no way to do that in Haskell as individual constructors of an ADT do not have a distinct type. Instead, if you need to type-distinguish between individual constructors of an ADT, you need to split the original ADT into separate ADTs, one per constructor of the original ADT. You then "group" these ADTs together, so as to be able to refer to all of them in your type signatures, by putting them in a type class, which is a form of ad-hoc polymorphism.
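To make that concrete, here is a minimal sketch (the Shape type is invented for illustration): both constructors below build values of the same type Shape, so no type signature can demand "a Circle only"; to distinguish the two at the type level you would have to split Shape into two datatypes and group them with a class.
data Shape = Circle Double | Square Double

-- There is no type "Circle"; both constructors yield a Shape.
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Square s) = s * s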
I think the most straightforward translation of your Scala version is using phantom types to mark the node and edge type and bind them to specific constructors with GADTs.
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}

data Type = Type1 | Type2

data Edge t where
  EdgeType1 :: Node Type1 -> Node Type2 -> Edge Type1
  EdgeType2 :: Node Type2 -> Node Type1 -> Edge Type2

data Node t where
  NodeType1 :: String -> Node Type1
  NodeType2 :: String -> Node Type2

instance Eq (Node t) where
  NodeType1 a == NodeType1 b = a == b
  NodeType2 a == NodeType2 b = a == b

edgeSource :: Edge t -> Node t
edgeSource (EdgeType1 src _) = src
edgeSource (EdgeType2 src _) = src

main :: IO ()
main = do
  let n1 = NodeType1 "Node1"
      n2 = NodeType2 "Node2"
      edge = EdgeType1 n1 n2
      src = edgeSource edge
  print $ src == n1
This is now actually safer than the Scala version since we know the exact type returned from edgeSource statically instead of just getting an abstract base class that we would need to type-cast or pattern match against.
If you want to mimic the Scala version exactly, you could hide the phantom type in an existential wrapper to return a generic, "unknown" Node from edgeSource.
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE FlexibleInstances #-}

data Some t where
  Some :: t x -> Some t

edgeSource :: Edge t -> Some Node
edgeSource (EdgeType1 src _) = Some src
edgeSource (EdgeType2 src _) = Some src

label :: Node t -> String
label (NodeType1 l) = l
label (NodeType2 l) = l

instance Eq (Some Node) where
  Some n1 == Some n2 = label n1 == label n2
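For completeness, a short usage sketch of this wrapped version (a hypothetical main reusing the GADT definitions above; note that this Eq instance compares labels only):
main :: IO ()
main = do
  let n1 = NodeType1 "Node1"
      n2 = NodeType2 "Node2"
      edge = EdgeType1 n1 n2
  print (edgeSource edge == Some n1) -- True
  print (edgeSource edge == Some n2) -- False (different label)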
You were asking too much of the OCaml type system. At this point in your second attempt:
let link_source (e : edge) : node =
  match e with
  | EdgeType1 (src, _) ->
you are saying: it should be clear that src is of node_type_1, and I gave the return type node, so the compiler should be able to sort out the correct constructor to use from the type of src. However, this is not possible in general: in a given variant, there is no unique mapping from 'member types' to constructors; consider for example type a = A of int | B of int. So you do have to specify the constructor (though you could give it a shorter name).
If you don't want that, you have to use polymorphism. One option is to make the link_source function polymorphic. One attempt would be
type e12 = node_type_1 * node_type_2
type e21 = node_type_2 * node_type_1
let link_source = fst
but then you have to expose the link types as tuples separately. Another option is to use polymorphic variants.
type node1 = [`N1 of string]
type node2 = [`N2 of string]
type node3 = [`N3 of string]
type node = [node1 | node2 | node3]
type edge = E12 of node1 * node2 | E21 of node2 * node1
then one could write
let link_source (e : edge) : [< node] = match e with
  | E12 (`N1 s, _) -> `N1 s
  | E21 (`N2 s, _) -> `N2 s
This automatically unifies the return type and checks that it's an existing node. The last pattern match can also be handled by type coercion:
let link_source (e : edge) : node = match e with
  | E12 (n1, _) -> (n1 :> node)
  | E21 (n2, _) -> (n2 :> node)
GADTs can also help. With the same definitions for node{,1,2,3} above one can define
type ('a, 'b) edge =
| E12 : node1 * node2 -> (node1, node2) edge
| E21 : node2 * node1 -> (node2, node1) edge
and then a polymorphic
let link_source : type a b . (a, b) edge -> a = function
| E12 (n1, _) -> n1
| E21 (n2, _) -> n2
Addendum: when using GADTs it's not necessary to use polymorphic variants. So one can just have
type node1 = N1 of string
type node2 = N2 of string
type node3 = N3 of string
and the same definitions of edge and link_source will work.

Is it possible to lazily traverse a recursive data-structure with O(1) memory usage, tail-call optimized?

Let's say that we have a recursive data-structure, like a binary tree. There are many ways to traverse it, and they have different memory-usage profiles. For instance, if we were to simply print the value of each node, using pseudo-code like the following in-order traversal...
visitNode(node) {
  if (node == null) return;
  visitNode(node.leftChild);
  print(node.value);
  visitNode(node.rightChild);
}
...our memory usage would be constant, but due to the recursive calls, we would increase the size of the call stack. On very large trees, this could potentially overflow it.
Let's say that we decided to optimize for call-stack size; assuming that this language is capable of proper tailcalls, we could rewrite this as the following pre-order traversal...
visitNode(node, nodes = []) {
  if (node != null) {
    print(node.value);
    visitNode(nodes.head, nodes.tail + [node.left, node.right]);
  } else if (node == null && nodes.length != 0) {
    visitNode(nodes.head, nodes.tail);
  } else return;
}
While we would never blow the stack, we would now see heap usage increase linearly with respect to the size of the tree.
Let's say we were then to attempt to lazily traverse the tree - here is where my reasoning gets fuzzy. I think that even using a basic lazy evaluation strategy, we would grow memory at the same rate as the tailcall optimized version. Here is a concrete example using Scala's Stream class, which provides lazy evaluation:
sealed abstract class Node[A] {
  def toStream: Stream[Node[A]]
  def value: A
}

case class Fork[A](value: A, left: Node[A], right: Node[A]) extends Node[A] {
  def toStream: Stream[Node[A]] = this #:: left.toStream.append(right.toStream)
}

case class Leaf[A](value: A) extends Node[A] {
  def toStream: Stream[Node[A]] = this #:: Stream.empty
}
Although only the head of the stream is strictly evaluated, anytime the left.toStream.append(right.toStream) is evaluated, I think this would actually evaluate the head of both the left and right streams. Even if it doesn't (due to append cleverness), I think that recursively building this thunk (to borrow a term from Haskell) would essentially grow memory at the same rate. Rather than saying, "put this node in the list of nodes to traverse", we're basically saying, "here's another value to evaluate that will tell you what to traverse next", but the outcome is the same; linear memory growth.
The only strategy I can think of that would avoid this is having mutable state in each node declaring which paths have been traversed. This would allow us to have a referentially transparent function that says, "Given a node, I will tell you which single node you should traverse next", and we could use that to build an O(1) iterator.
Is there another way to accomplish O(1), tailcall optimized traversal of a binary tree, possibly without mutable state?
Is there another way to accomplish O(1), tailcall optimized traversal of a binary tree, possibly without mutable state?
As I stated in my comment, you can do this if the tree need not survive the traversal. Here's a Haskell example:
data T = Leaf | Node T Int T
inOrder :: T -> [Int]
inOrder Leaf = []
inOrder (Node Leaf x r) = x : inOrder r
inOrder (Node (Node l x m) y r) = inOrder $ Node l x (Node m y r)
This takes O(1) auxiliary space if we assume the garbage collector will clean up any Node that we just processed, so we effectively replace it by a right-rotated version. However, if the nodes we process cannot immediately be garbage-collected, then the final clause may build up an O(n) number of nodes before it hits a leaf.
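For instance, a quick sanity check of inOrder (the example tree is arbitrary):
-- The rotating traversal still yields the usual in-order listing.
main :: IO ()
main = print (inOrder (Node (Node Leaf 1 (Node Leaf 2 Leaf)) 3 (Node Leaf 4 Leaf)))
-- [1,2,3,4]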
If you have parent pointers, then it's also doable. Parent pointers require mutable state, though, and prevent sharing of subtrees, so they're not really functional. If you represent an iterator by a pair (cur, prev) that is initially (root, nil), then you can perform iteration as outlined here. You need a language with pointer comparisons to make this work, though.
Without parent pointers and mutable state, you need to maintain some data structure that at least tracks where the root of the tree is and how to get there, since you'll need such a structure at some point during in-order or post-order traversal. Such a structure necessarily takes Ω(d) space where d is the depth of the tree.
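For concreteness, one such structure is an explicit stack of pending ancestors, essentially the spine of a zipper. Here is a sketch (not from the answers above; it reuses the T type and takes O(d) auxiliary space):
-- In-order traversal driven by an explicit ancestor stack.
inOrderStack :: T -> [Int]
inOrderStack = go []
  where
    -- Descend the left spine, remembering each value and right subtree.
    go stack (Node l x r)    = go ((x, r) : stack) l
    -- At a leaf, pop the nearest ancestor: emit it, then walk its right side.
    go ((x, r) : stack) Leaf = x : go stack r
    go [] Leaf               = []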
A fancy answer: we can use free monads to get an efficient memory-usage bound.
{-# LANGUAGE RankNTypes
           , MultiParamTypeClasses
           , FlexibleInstances
           , UndecidableInstances #-}

import Control.Monad (join)

class Algebra f x where
  phi :: f x -> x
An algebra of a functor f is a function phi from f x to x for some x. For example, any monad has an algebra for any object m x:
instance (Monad m) => Algebra m (m x) where
  phi = join
A free monad for any functor f can be constructed (possibly, some sort of functors only, like omega-cocomplete, or some such; but all Haskell types are polynomial functors, which are omega-cocomplete, so the statement is certainly true for all Haskell functors):
data Free f a = Free (forall x. Algebra f x => (a -> x) -> x)

runFree g (Free m) = m g

instance Functor (Free f) where
  fmap f m = Free $ \g -> runFree (g . f) m

wrap :: (Functor f) => f (Free f a) -> Free f a
wrap f = Free $ \g -> phi $ fmap (runFree g) f

instance (Functor f) => Algebra f (Free f a) where
  phi = wrap

instance (Functor f) => Monad (Free f) where
  return a = Free ($ a)
  m >>= f = fjoin $ fmap f m

fjoin :: (Functor f) => Free f (Free f a) -> Free f a
fjoin mma = Free $ \g -> runFree (runFree g) mma
Now we can use Free to construct free monad for functor T a:
data T a b = T a b b

instance Functor (T a) where
  fmap f (T a l r) = T a (f l) (f r)
For this functor we can define an algebra for the object [a]:
instance Algebra (T a) [a] where
  phi (T a l r) = l ++ (a : r)
A tree is a free monad over functor T a:
type Tree a = Free (T a) ()
It can be constructed using the following functions (if defined as ADT, they'd be constructor names, so nothing extraordinary):
tree :: a -> Tree a -> Tree a -> Tree a
tree a l r = phi $ T a l r -- phi here is for Algebra f (Free f a)
                           -- and translates T a (Tree a) into Tree a

leaf :: Tree a
leaf = return ()
To demonstrate how this works:
bar = tree 'a' (tree 'b' leaf leaf) $ tree 'r' leaf leaf
buz = tree 'b' leaf $ tree 'u' leaf $ tree 'z' leaf leaf
foo = tree 'f' leaf $ tree 'o' (tree 'o' leaf leaf) leaf
toString = runFree (\_ -> [] :: String)
main = print $ map toString [bar, buz, foo]
As runFree traverses the tree to replace leaf () with [], the algebra for T a [a] in all contexts is the algebra that constructs a string representing in-order traversal of the tree. Because functor T a b constructs a new tree as it goes, it must have the same memory consumption characteristics as the solution quoted by larsmans - if the tree is not kept in memory, the nodes are discarded as soon as they are replaced by the string representing the whole subtree.
Given that you have references to nodes' parents, there's a nice solution posted here. Replace the while loop with a tail-recursive call (passing in last and current) and that should do it.
The built-in back-references allow you to keep track of traversal ordering. Without these, I can't think of a way to do it on a (balanced) tree with less than O(log(n)) auxiliary space.
I was not able to find an answer but I got some pointers. Go have a look at http://www.ics.uci.edu/~dan/pub.html, scroll down to
[33] D.S. Hirschberg and S.S. Seiden, A bounded-space tree traversal algorithm, Information Processing Letters 47 (1993)
Download the postscript file, you may need to convert it to PDF (my ps viewer was unable to present it correctly). It mentions on page 2 (Table 1) a number of algorithms and additional literature.

Why does fold left expect (a -> b -> a) instead of (b -> a -> a)?

I wonder why the function expected by fold left has type signature a -> b -> a instead of b -> a -> a. Is there a design decision behind this?
In Haskell, for example, I have to write foldl (\xs x -> x:xs) [] xs to reverse a list instead of the shorter foldl (:) [] xs (which would be possible with b -> a -> a). On the other hand, there are use cases which require the standard a -> b -> a. In Scala, this could be appending: xs.foldLeft(List.empty[Int]) ((xs, x) => xs:+x) which can be written as xs.foldLeft(List.empty[Int]) (_:+_).
Do proportionately more use cases occur requiring the given type signature instead of the alternative one, or are there other decisions which led to the design that fold left has in Haskell and Scala (and probably lots of other languages)?
Conceptually speaking, a right fold, say foldr f z [1..4] replaces a list of the following form
        :
       / \
      1   :
         / \
        2   :
           / \
          3   :
             / \
            4  []
with the value of an expression of the following form
        f
       / \
      1   f
         / \
        2   f
           / \
          3   f
             / \
            4   z
If we were to represent this expression on a single line, all parentheses would associate to the right, hence the name right fold: (1 `f` (2 `f` (3 `f` (4 `f` z)))). A left fold is dual in some sense to a right fold. In particular, we would like the shape of the corresponding diagram for a left fold to be a mirror image of the one for a right fold, as in the following:
          f
         / \
        f   4
       / \
      f   3
     / \
    f   2
   / \
  z   1
If we were to write out this diagram on a single line, we would get an expression where all parentheses associate to the left, which jibes well with the name of a left fold:
((((z `f` 1) `f` 2) `f` 3) `f` 4)
But notice that in this mirror diagram, the recursive result of the fold is fed to f as the first argument, while each element of the list is fed as the second argument, i.e. the arguments are fed to f in reverse order compared to right folds.
The type signature is foldl :: (a -> b -> a) -> a -> [b] -> a; it's natural for the combining function to have the initial value on the left, because that's the way it combines with the elements of the list. Similarly, you'll notice foldr has it the other way round. The complication in your definition of reverse is because you're using a lambda expression where flip would have been nicer: foldl (flip (:)) [] xs, which also has the pleasant similarity between the concepts of flip and reverse.
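For instance, here is the runnable version of that idiom (rev is a hypothetical name, just to show the plumbing):
-- reverse via a left fold: flip (:) lets each element, fed as the
-- second argument, be consed onto the accumulator on the left.
rev :: [a] -> [a]
rev = foldl (flip (:)) []

-- ghci> rev [1..4]
-- [4,3,2,1]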
Because you write (a /: bs) for foldLeft in short form; this is an operator which pushes a through all the bs, so it is natural to write the function the same way (i.e. (A,B) => A). Note that foldRight does it in the other order.
Say you have this:
List(4, 2, 1).foldLeft(8)(_ / _)
That's the same as:
((8 / 4) / 2) / 1
See how the first parameter is always the accumulator? Having the parameters in this order makes placeholder syntax (the underscore) a direct translation to the expanded expression.
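For comparison, the same computation with Haskell's foldl (a sketch):
-- foldl (/) 8 [4, 2, 1] unfolds to ((8 / 4) / 2) / 1.
main :: IO ()
main = print (foldl (/) 8 [4, 2, 1 :: Double]) -- 1.0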

Datatype for terms over a signature in SML

I want to implement an arbitrary signature in SML. How can I define a datatype for terms over that signature? I need it to write functions that check whether the terms are well formed.
In my point of view, there are two major ways of representing an AST: either as a series of (possibly mutually recursive) datatypes, or as one big datatype. There are pros and cons for both.
If we define the following BNF (extracted from the SML definition and slightly simplified)
<exp>    ::= <exp> andalso <exp>
           | <exp> orelse <exp>
           | raise <exp>
           | <appexp>

<appexp> ::= <atexp> | <appexp> <atexp>

<atexp>  ::= ( <exp>, ..., <exp> )
           | [ <exp>, ..., <exp> ]
           | ( <exp> ; ... ; <exp> )
           | ( <exp> )
           | ()
As stated this is simplified and much of the atexp is left out.
1. A series of possibly mutually recursive datatypes
Here you would, for example, create a datatype for expressions, declarations, patterns, etc.: basically one datatype for each of the non-terminals in your BNF.
We would most likely create the following datatypes
datatype exp = Andalso of exp * exp
             | Orelse of exp * exp
             | Raise of exp
             | App of exp * atexp
             | Atexp of atexp

     and atexp = Tuple of exp list
               | List of exp list
               | Seq of exp list
               | Par of exp
               | Unit
Notice that the appexp non-terminal has been folded into the exp datatype instead of getting a datatype of its own; that would just clutter up the AST for no reason. You have to remember that a BNF is often written in such a way that it also defines precedence and associativity (e.g., for arithmetic). In such cases you can often simplify the BNF by merging multiple non-terminals into one datatype.
The good thing about defining multiple datatypes is that you get some well-formedness of your AST for free. If, for example, we also had a non-terminal for declarations, we would know that the AST can never contain a declaration inside a list (as only expressions can occur there). Because of this, most of your well-formedness checks are unnecessary.
This is, however, not always a good thing. Often you need to do some checking on the AST anyway, for example type checking. In many cases the BNF is quite large, and thus the number of datatypes necessary to model the AST is also quite large. Keeping this in mind, you have to create one function for each of your datatypes, for every kind of modification you want to do on your AST. In many cases you only want to change a small part of your AST, but you will (most likely) still need to define a function for each datatype. Most of these functions will basically be the identity, and only in a few cases will you do the desired work.
If, for example, we want to count how many units there are in a given AST, we could define the following functions:
fun countUnitexp (Andalso (e1, e2)) = countUnitexp e1 + countUnitexp e2
  | countUnitexp (Orelse (e1, e2)) = countUnitexp e1 + countUnitexp e2
  | countUnitexp (Raise e1) = countUnitexp e1
  | countUnitexp (App (e1, atexp)) = countUnitexp e1 + countUnitatexp atexp
  | countUnitexp (Atexp atexp) = countUnitatexp atexp

and countUnitatexp (Tuple exps) = sumUnit exps
  | countUnitatexp (List exps) = sumUnit exps
  | countUnitatexp (Seq exps) = sumUnit exps
  | countUnitatexp (Par exp) = countUnitexp exp
  | countUnitatexp Unit = 1

and sumUnit exps = foldl (fn (exp, b) => b + countUnitexp exp) 0 exps
As you may see we are doing a lot of work, just for this simple task. Imagine a bigger grammar and a more complicated task.
2. One (big) datatype (nodes) -- and a Tree of these nodes
Let's combine the datatypes from before, but change them such that they don't (themselves) contain their children, because in this approach we build a tree structure that has a node and some children of that node. Obviously, if you have an identifier, then the identifier needs to contain the actual string representation (e.g., a variable name).
So let's start out by defining the nodes for the tree structure.
(* The comment is the kind of children, and possibly the specific number
   of children, that the BNF defines to be valid *)
datatype node = Exp_Andalso (* [exp, exp] *)
              | Exp_Orelse  (* [exp, exp] *)
              | Exp_Raise   (* [exp] *)
              | Exp_App     (* [exp, atexp] *)
              (* Superfluous: | Exp_Atexp (* [atexp] *) *)
              | Atexp_Tuple (* exp list *)
              | Atexp_List  (* exp list *)
              | Atexp_Seq   (* exp list *)
              | Atexp_Par   (* [exp] *)
              | Atexp_Unit  (* [] *)
See how the Atexp constructor from the exp type now becomes superfluous, and thus we remove it. Personally I think it is nice to have the comment next to each constructor telling which children (in the tree structure) we can expect.
(* Note this is a non-empty tree. That is, you have to pack it in an
   option type if you want to represent an empty tree *)
datatype 'a tree = T of 'a * 'a tree list

(* Define the AST as trees of our node datatype *)
type ast = node tree
We then define a generic tree, and define the type ast to be a "tree of nodes".
If you use a library, there is a big chance that such a tree structure is already present. It might also be handy later on to extend this tree structure to contain more than just the node as data; however, we keep it simple here.
fun foldTree f b (T (n, [])) = f (n, b)
  | foldTree f b (T (n, ts)) = foldl (fn (t, b') => foldTree f b' t)
                                     (f (n, b)) ts
For this example we define a fold function over the tree, again if you are using a library then all these functions for folding, mapping, etc. are most likely already defined.
fun countUnit (Atexp_Unit) = 1
  | countUnit _ = 0
If we then take the example from before, that we want to count the number of occurrences of unit, we can just fold the above function over the tree.
val someAST = T (Atexp_Tuple, [ T (Atexp_Unit, [])
                              , T (Exp_Raise, [])
                              , T (Atexp_Unit, [])
                              ])
A simple AST could look like the above (note that this one is actually not valid, as we have an Exp_Raise with no children). We could then do the counting by
foldTree (fn (a,b) => (countUnit a) + b) 0 someAST
The downside of this approach is that you have to write a check function that verifies that your AST is well formed, as there are no restrictions when you create the AST. This includes checking that the children are of the correct "type" (e.g., only Exp_* as children of an Exp_Andalso) and that the number of children is correct (e.g., exactly two children for an Exp_Andalso).
This approach also requires a bit of bulk to get started, given you don't use some library that has a tree defined (including auxiliary functions for modifying the tree). However, in the long run it pays off.

Defining multiple-type container classes in Haskell, trouble binding variables

I'm having trouble with classes in Haskell.
Basically, I have an algorithm (a weird sort of graph-traversal algorithm) that takes as input, among other things, a container to store the already-seen nodes (I'm keen on avoiding monads, so let's move on. :)). The thing is, the function takes the container as a parameter, and calls just one function: "set_contains", which asks if the container... contains node v. (If you're curious, another function passed in as a parameter does the actual node-adding).
Basically, I want to try a variety of data structures as parameters. Yet, as there is no overloading, I cannot have more than one data structure work with the all-important contains function!
So, I wanted to make a "Set" class (I shouldn't roll my own, I know). I already have a pretty nifty Red-Black tree set up, thanks to Chris Okasaki's book, and now all that's left is simply making the Set class and declaring RBT, among others, as instances of it.
Here is the following code:
(Note: code heavily updated -- e.g., contains now does not call a helper function, but is the class function itself!)
data Color = Red | Black
data (Ord a) => RBT a = Leaf | Tree Color (RBT a) a (RBT a)

instance Show Color where
  show Red = "r"
  show Black = "b"

class Set t where
  contains :: (Ord a) => t -> a -> Bool

-- I know this is nonsense, just showing it can compile.
instance (Ord a) => Eq (RBT a) where
  Leaf == Leaf = True
  (Tree _ _ x _) == (Tree _ _ y _) = x == y

instance (Ord a) => Set (RBT a) where
  contains Leaf b = False
  contains t@(Tree c l x r) b
    | b == x = True
    | b < x = contains l b
    | otherwise = contains r b
Note how I have a pretty stupidly-defined Eq instance of RBT. That is intentional --- I copied it (but cut corners) from the gentle tutorial.
Basically, my question boils down to this: If I comment out the instantiation statement for Set (RBT a), everything compiles. If I add it back in, I get the following error:
RBTree.hs:21:15:
    Couldn't match expected type `a' against inferred type `a1'
      `a' is a rigid type variable bound by
          the type signature for `contains' at RBTree.hs:11:21
      `a1' is a rigid type variable bound by
           the instance declaration at RBTree.hs:18:14
    In the second argument of `(==)', namely `x'
    In a pattern guard for
       the definition of `contains':
         b == x
    In the definition of `contains':
        contains (t@(Tree c l x r)) b
          | b == x = True
          | b < x = contains l b
          | otherwise = contains r b
And I simply cannot, for the life of me, figure out why that isn't working. (As a side note, the "contains" function is defined elsewhere, and basically has the actual set_contains logic for the RBT data type.)
Thanks! - Agor
Third edit: removed the previous edits, consolidated above.
You could also use higher-kinded polymorphism. The way your class is defined, it expects a type t of kind *. What you probably want is for your Set class to take a container type, like your RBT, which has kind * -> *.
You can easily modify your class to give your type t a kind * -> * by applying t to a type variable, like this:
class Set t where
  contains :: (Ord a) => t a -> a -> Bool
and then modify your instance declaration to remove the type variable a:
instance Set RBT where
  contains Leaf b = False
  contains t@(Tree c l x r) b
    | b == x = True
    | b < x = contains l b
    | otherwise = contains r b
So, here is the full modified code with a small example at the end:
data Color = Red | Black
data (Ord a) => RBT a = Leaf | Tree Color (RBT a) a (RBT a)

instance Show Color where
  show Red = "r"
  show Black = "b"

class Set t where
  contains :: (Ord a) => t a -> a -> Bool

-- I know this is nonsense, just showing it can compile.
instance (Ord a) => Eq (RBT a) where
  Leaf == Leaf = True
  (Tree _ _ x _) == (Tree _ _ y _) = x == y

instance Set RBT where
  contains Leaf b = False
  contains t@(Tree c l x r) b
    | b == x = True
    | b < x = contains l b
    | otherwise = contains r b

tree = Tree Black (Tree Red Leaf 3 Leaf) 5 (Tree Red Leaf 8 (Tree Black Leaf 12 Leaf))

main =
  putStrLn ("tree contains 3: " ++ test1) >>
  putStrLn ("tree contains 12: " ++ test2) >>
  putStrLn ("tree contains 7: " ++ test3)
  where test1 = f 3
        test2 = f 12
        test3 = f 7
        f = show . contains tree
If you compile this, the output is
tree contains 3: True
tree contains 12: True
tree contains 7: False
You need a multi-parameter type class. Your current definition of Set t doesn't mention the contained type in the class definition, so the member contains has to work for any a. Try this:
class Set t a | t -> a where
  contains :: (Ord a) => t -> a -> Bool

instance (Ord a) => Set (RBT a) a where
  contains Leaf b = False
  contains t@(Tree c l x r) b
    | b == x = True
    | b < x = contains l b
    | otherwise = contains r b
The | t -> a bit of the definition is a functional dependency, saying that for any given t there is only one possible a. It's useful to have (when it makes sense) since it helps the compiler figure out types and reduces the number of ambiguous type problems you often otherwise get with multi-parameter type classes.
You'll also need to enable the language extensions MultiParamTypeClasses and FunctionalDependencies at the top of your source file:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
The error means that the types don't match. What is the type of contains? (If its type is not something like t -> a -> Bool as set_contains is, something is wrong.)
Why do you think you shouldn't roll your own classes?
When you write the instance for Set (RBT a), you define contains for the specific type a only. I.e. RBT Int is a set of Ints, RBT Bool is a set of Bools, etc.
But your definition of Set t requires that t be a set of all ordered a's at the same time!
That is, this should typecheck, given the type of contains:
tree :: RBT Bool
tree = ...
foo = contains tree 1
and it obviously won't.
There are three solutions:
Make t a type constructor variable:
class Set t where
contains :: (Ord a) => t a -> a-> Bool
instance Set RBT where
...
This will work for RBT, but not for many other cases (for example, you may want to use a bitset as a set of Ints).
Functional dependency:
class (Ord a) => Set t a | t -> a where
  contains :: t -> a -> Bool

instance (Ord a) => Set (RBT a) a where
  ...
See GHC User's Guide for details.
Associated types:
class Set t where
  type Element t :: *
  contains :: t -> Element t -> Bool

instance (Ord a) => Set (RBT a) where
  type Element (RBT a) = a
  ...
See GHC User's Guide for details.
To expand on Ganesh's answer, you can use Type Families instead of Functional Dependencies. Imho they are nicer, and they also change your code less.
{-# LANGUAGE FlexibleContexts, TypeFamilies #-}

class Set t where
  type Elem t
  contains :: (Ord (Elem t)) => t -> Elem t -> Bool

instance (Ord a) => Set (RBT a) where
  type Elem (RBT a) = a
  contains Leaf b = False
  contains (Tree c l x r) b
    | b == x = True
    | b < x = contains l b
    | otherwise = contains r b
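A quick check in GHCi, reusing the tree value (containing 3, 5, 8, and 12) from the higher-kinded answer above:
-- ghci> contains tree 12
-- True
-- ghci> contains tree 7
-- False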