Gentle Intro to Haskell: " .... there is no single type that contains both 2 and 'b'." Can I not make such a type ? - scala

I am currently learning Haskell, so here are some beginner's questions:
What is meant by single type in the text below ?
Is single type a special Haskell term ? Does it mean atomic type here ?
Or does it mean that I can never make a list in Haskell in which I can put both 1 and 'c' ?
I was thinking that a type is a set of values.
So I cannot define a type that contains Chars and Ints ?
What about algebraic data types ?
Something like: data IntOrChar = In Int | Ch Char ? (I guess that should work but I am confused what the author meant by that sentence.)
Btw, is that the only way to make a list in Haskell in which I can put both Ints and Chars? Or is there a more tricky way ?
A Scala analogy: in Scala it would be possible to write implicit conversions to a type that represents both Ints and Chars (like IntOrChar), and then it would be possible to seamlessly put Ints and Chars into a List[IntOrChar]. Is that not possible in Haskell? Do I always have to explicitly wrap every Int or Char in IntOrChar if I want to put them into a list of IntOrChar?
From Gentle Intro to Haskell:
Haskell also incorporates polymorphic types---types that are
universally quantified in some way over all types. Polymorphic type
expressions essentially describe families of types. For example,
(forall a)[a] is the family of types consisting of, for every type a,
the type of lists of a. Lists of integers (e.g. [1,2,3]), lists of
characters (['a','b','c']), even lists of lists of integers, etc., are
all members of this family. (Note, however, that [2,'b'] is not a
valid example, since there is no single type that contains both 2 and
'b'.)

Short answer.
In Haskell there are no implicit conversions. There are also no union types, only disjoint unions (which are algebraic data types). So you can only write:
someList :: [IntOrChar]
someList = [In 1, Ch 'c']
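For reference, here is a minimal, self-contained sketch of the tagged-union approach. The constructor names In and Ch come from the question; describe is a hypothetical helper added only to show how a consumer pattern-matches on the tag.

```haskell
-- A closed sum type: every element carries a tag saying which case it is.
data IntOrChar = In Int | Ch Char
  deriving (Show, Eq)

someList :: [IntOrChar]
someList = [In 1, Ch 'c']

-- Hypothetical helper (not from the answer): consumers inspect the tag.
describe :: IntOrChar -> String
describe (In n) = "Int " ++ show n
describe (Ch c) = "Char " ++ show c

main :: IO ()
main = mapM_ (putStrLn . describe) someList
```

Every value has to be wrapped by hand at the point where the list is built; nothing converts a bare 1 or 'c' for you.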
Longer and certainly not gentle answer.
Note: This is a technique that's very rarely used. If you need it you're probably overcomplicating your API.
There are however existential types.
{-# LANGUAGE ExistentialQuantification, RankNTypes #-}
class IntOrChar a where
  intOrChar :: a -> Either Int Char
instance IntOrChar Int where
  intOrChar = Left
instance IntOrChar Char where
  intOrChar = Right
data List = Nil
          | forall a. (IntOrChar a) => Cons a List
someList :: List
someList = (1 :: Int) `Cons` ('c' `Cons` Nil)
Here I have created a typeclass IntOrChar with a single function, intOrChar. This way you can convert a value of any type a that has an IntOrChar instance to Either Int Char.
I have also defined a special kind of list that uses an existential type in its second constructor.
Here the type variable a is bound (with forall) at the constructor's scope. Therefore every time you use Cons you can pass a value of any type that has an IntOrChar instance as the first argument. Consequently, during destruction (i.e. pattern matching) all you know about the first argument is that its type has an IntOrChar instance. The only thing you can do with it is either pass it on or call intOrChar on it and convert it to Either Int Char.
withHead :: (forall a. (IntOrChar a) => a -> b) -> List -> Maybe b
withHead f Nil = Nothing
withHead f (Cons x _) = Just (f x)
intOrCharToString :: (IntOrChar a) => a -> String
intOrCharToString x =
  case intOrChar x of
    Left i -> show i
    Right c -> show c
someListHeadString :: Maybe String
someListHeadString = withHead intOrCharToString someList
Again note that you cannot write
{- Won't compile
safeHead :: IntOrChar a => List -> Maybe a
safeHead Nil = Nothing
safeHead (Cons x _) = Just x
-}
-- This will
safeHead2 :: List -> Maybe (Either Int Char)
safeHead2 Nil = Nothing
safeHead2 (Cons x _) = Just (intOrChar x)
safeHead will not work because you want a type of IntOrChar a => Maybe a with a bound at safeHead scope and Just x will have a type of IntOrChar a1 => Maybe a1 with a1 bound at Cons scope.
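As a further illustration of the same rule, the whole existential list can be turned into an ordinary list as long as every element is converted before it leaves the pattern match. The sketch below repeats the definitions from the answer so that it stands alone; toEithers is a hypothetical helper, not part of the original answer.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

class IntOrChar a where
  intOrChar :: a -> Either Int Char

instance IntOrChar Int where
  intOrChar = Left

instance IntOrChar Char where
  intOrChar = Right

data List
  = Nil
  | forall a. IntOrChar a => Cons a List

-- Hypothetical helper: the hidden element type never escapes, because
-- intOrChar is applied inside the scope of the pattern match.
toEithers :: List -> [Either Int Char]
toEithers Nil         = []
toEithers (Cons x xs) = intOrChar x : toEithers xs

main :: IO ()
main = print (toEithers (Cons (1 :: Int) (Cons 'c' Nil)))
```

The result type mentions only Either Int Char, so there is no type variable that would have to be bound outside the constructor.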

In Scala there are types that include both Int and Char such as AnyVal and Any, which are both supertypes of Char and Int. In Haskell there is no such hierarchy, and all the basic types are disjoint.
You can create your own union types which describe the concept of 'either an Int or a Char' (or you could use the built-in Either type), but there are no implicit conversions in Haskell to transparently convert an Int into an IntOrChar.
You could emulate the concept of 'Any' using existential types:
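{-# LANGUAGE ExistentialQuantification #-}
-- Hashable and hash come from the hashable package
import Data.Hashable (Hashable, hash)
data AnyBox = forall a. (Show a, Hashable a) => AB a
heteroList :: [AnyBox]
heteroList = [AB (1::Int), AB 'b']
showWithHash :: AnyBox -> String
showWithHash (AB v) = show v ++ " - " ++ (show . hash) v
strs :: [String]
strs = map showWithHash heteroList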
Be aware, however, that this pattern is discouraged.

I think that the distinction that is being made here is that your algebraic data type IntOrChar is a "tagged union" - that is, when you have a value of type IntOrChar you will know if it is an Int or a Char.
By comparison consider this anonymous union definition (in C):
typedef union { char c; int i; } intorchar;
If you are given a value of type intorchar you don't know (a priori) which selector is valid. That's why most of the time the union constructor is used in conjunction with a struct to form a tagged-union construction:
typedef struct {
    int tag;
    union { char c; int i; } intorchar_u;
} IntOrChar;
Here the tag field encodes which selector of the union is valid.
The other major use of the union constructor is to overlay two structures to get an efficient mapping between sub-structures. For example, this union is one way to efficiently access the individual bytes of an int (assuming 8-bit chars and 32-bit ints):
union { char b[4]; int i; };
Now, to illustrate the main difference between "tagged unions" and "anonymous unions" consider how you go about defining a function on these types.
To define a function on an IntOrChar value (the tagged union) I claim you need to supply two functions - one which takes an Int (in the case that the value is an Int) and one which takes a Char (in case the value is a Char). Since the value is tagged with its type, it knows which of the two functions it should use.
If we let F(a,b) denote the set of functions from type a to type b, we have:
F(IntOrChar,b) = F(Int,b) \times F(Char,b)
where \times denotes the cross product.
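The correspondence between functions on the tagged union and pairs of functions can be sketched directly in Haskell. The names fromPair and toPair are hypothetical; In and Ch are the constructors from the question.

```haskell
data IntOrChar = In Int | Ch Char

-- From a pair of functions to a single function on the tagged union:
-- the tag decides which component is used.
fromPair :: (Int -> b, Char -> b) -> (IntOrChar -> b)
fromPair (f, _) (In n) = f n
fromPair (_, g) (Ch c) = g c

-- And back again: restrict the function to each constructor.
toPair :: (IntOrChar -> b) -> (Int -> b, Char -> b)
toPair h = (h . In, h . Ch)

main :: IO ()
main = do
  let h = fromPair (show, \c -> [c])
  putStrLn (h (In 2))
  putStrLn (h (Ch 'b'))
```

Going through fromPair and then toPair gives back functions that behave like the original pair, which is the sense in which the two sides of the equation match. This is the same shape as the standard either function for the built-in Either type.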
As for the anonymous union intorchar, since a value doesn't encode anything about its type, the only functions which can be applied are those which are valid for both Int and Char values, i.e.:
F(intorchar,b) = F(Int,b) \cap F(Char,b)
where \cap denotes intersection.
In Haskell there is only one function (to my knowledge) which can be applied to both integers and chars, namely the identity function. So there's not much you could do with a list like [2, 'b'] in Haskell. In other languages this intersection may not be empty, and then constructions like this make more sense.
To summarize, you can have integers and characters in the same list if you create a tagged union, and in that case you have to tag each of the values, which will make your list look like:
[ I 2, C 'b', ... ]
If you don't tag your values then you are creating something akin to an anonymous union, but since there aren't any (useful) functions which can be applied to both integers and chars there's not really anything you can do with that kind of union.

Related

How declare tagged union of polymorphic collection types

I'm new to Purescript. My current learning exercise is to create a tagged union of polymorphic Array and List. I'll use it in a function that finds the length of any Array or List. Here's my attempt:
import Data.List as L
import Data.Array as A
data Collection = CollectionList (forall a. L.List a)
                | CollectionArray (forall b. Array b)
colLength :: Collection -> Int
colLength (CollectionList list) = L.length list
colLength (CollectionArray arr) = A.length arr
main :: Effect Unit
main = do
  logShow (colLength (CollectionArray [3,5]))
The compiler doesn't like it:
Could not match type Int with type b0
while checking that type Int is at least as general as type b0
while checking that expression 3 has type b0
in value declaration main
where b0 is a rigid type variable
I'm confused by the parts, checking that type Int is at least as general as type b0 and b0 is a rigid type variable. My intention was to allow b to be anything. Not sure what I did to make the compiler put conditions on what b can be.
If you know how, please show the correct way to define a tagged union of polymorphic types that'll work in my colLength function.
forall a doesn't mean "any type goes here"
It means that whoever accesses the value, gets to choose what a is, and whoever provides the value has to make sure that the value is of that type. It's a contract between the provider and the consumer.
So when you provide the value CollectionArray [3,5], you have to make it such that it works for all possible a that whoever accesses that value later might choose.
Obviously, there is only one way you can construct such value:
CollectionArray []
What you probably actually meant to do (and I'm guessing here) was to make your collection polymorphic, in the sense that it can contain values of any type, but the type is chosen by whoever creates the collection, and then whoever accesses it later has to deal with that particular type.
To do that, you have to put the type variable on the outside:
data Collection a = CollectionList (L.List a)
                  | CollectionArray (Array a)
That way, when you create a collection CollectionArray [3,5], it becomes of type Collection Int, and now everywhere you pass it, such as colLength, will have to deal with that Int.
This, in turn, can be achieved by making colLength itself generic:
colLength :: forall a. Collection a -> Int
colLength (CollectionList list) = L.length list
colLength (CollectionArray arr) = A.length arr
Now whoever accesses (i.e. calls) colLength itself gets to choose what a is, which works fine, because it's the same place that created the Collection Int in the first place.
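The same distinction can be sketched in Haskell, where the rule about who chooses the type variable is the same. This is only an analogy under stated assumptions: AnyList and emptyOnly are hypothetical names, and Data.Sequence (from the containers package) stands in for PureScript's Array.

```haskell
{-# LANGUAGE RankNTypes #-}

import qualified Data.Sequence as Seq

-- forall inside the constructor: the consumer picks the element type,
-- so the producer must supply a value that works for every type.
newtype AnyList = AnyList (forall a. [a])

emptyOnly :: AnyList
emptyOnly = AnyList []
-- AnyList [3, 5] is rejected by the type checker: the literals would
-- need a Num instance for every possible element type.

-- Type parameter outside: the producer picks the element type.
data Collection a
  = CollectionList [a]
  | CollectionSeq (Seq.Seq a)

colLength :: Collection a -> Int
colLength (CollectionList xs) = length xs
colLength (CollectionSeq s)   = Seq.length s

main :: IO ()
main = do
  print (colLength (CollectionSeq (Seq.fromList [3, 5 :: Int])))
  print (colLength (CollectionList "abc"))
```

colLength never looks at the elements, so it stays polymorphic in a while each collection value fixes a concrete element type.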

Get Array containing all data type possible values?

Isn't it possible given
data Letter = A | B | C | ... | Z
automagically get an array that contains all the possible values:
[A, B, C, ..., Z]
?
You can use the autoderived Generic type class:
data Letter = ...
derive instance Generic Letter _
Note the underscore at the end of the derive line: the Generic type class takes two types - (1) the data type you're describing and (2) its generic description, but the latter will be provided by the compiler automatically for you. That's what the underscore means.
And then you can enumerate all elements by their indexes, using genericBottom/genericTop to get the index range and genericFromEnum/genericToEnum to convert to/from integers.
One hiccup is that genericToEnum returns a Maybe, because strictly speaking not every integer number can be converted to an enum value, but in this case you know all the numbers are valid, because you obtained them by genericFromEnum in the first place, so you can just mapMaybe instead of regular map:
allElements ::
  forall a rep.
  Generic a rep =>
  GenericBoundedEnum rep =>
  GenericTop rep =>
  GenericBottom rep =>
  Array a
allElements = mapMaybe genericToEnum (idxFrom..idxTo)
  where
    idxFrom = genericFromEnum (genericBottom :: a)
    idxTo = genericFromEnum (genericTop :: a)
Usage:
allLetters :: Array Letter
allLetters = allElements
Note that the allElements function is generic enough to work with any type, provided (1) all its constructors are parameterless and (2) it has a Generic instance.
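For comparison, the analogous idea in Haskell needs no generic machinery, because Enum and Bounded can be derived for types whose constructors take no arguments. A minimal sketch, with Letter shortened to four constructors purely for illustration:

```haskell
-- Shortened stand-in for the A .. Z type from the question.
data Letter = A | B | C | D
  deriving (Show, Eq, Enum, Bounded)

-- Works for any type with derived (or lawful) Enum and Bounded instances.
allElements :: (Enum a, Bounded a) => [a]
allElements = [minBound .. maxBound]

allLetters :: [Letter]
allLetters = allElements

main :: IO ()
main = print allLetters
```

As in the PureScript version, the element type is chosen by the annotation at the use site.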

Scala currying example in the tutorial is confusing me

I'm reading a bit about Scala currying here and I don't understand this example very much:
def foldLeft[B](z: B)(op: (B, A) => B): B
What is the [B] in square brackets? Why is it in brackets? The B after the colon is the return type right? What is the type?
It looks like this method has 2 parameter lists: one with a parameter named z and one with a parameter named op which is a function.
op looks like it takes a function (B, A) => B. What does the right side mean? It returns B?
And this is apparently how it is used:
val numbers = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val res = numbers.foldLeft(0)((m, n) => m + n)
print(res) // 55
What is going on? Why wasn't the [B] needed when called?
In the Scala documentation, the A type (sometimes A1) is often the placeholder for a collection's element type. So if you have...
List('c','q','y').foldLeft( //....etc.
...then A becomes the reference for Char because the list is a List[Char].
The B is a placeholder for a 2nd type that foldLeft will have to deal with. Specifically it is the type of the 1st parameter as well as the type of the foldLeft result. Most of the time you actually don't need to specify it when foldLeft is invoked because the compiler will infer it. So, yeah, it could have been...
numbers.foldLeft[Int](0)((m, n) => m + n)
...but why bother? The compiler knows it's an Int and so does anyone reading it (anyone who knows Scala).
(B, A) => B is a function that takes 2 parameters, one of type B and one of type A, and produces a result of type B.
What is the [B] in square brackets?
A type parameter (that's also known as "generics" if you've seen something like Java before)
Why is it in brackets?
Because type parameters are written in brackets, that's just Scala syntax. Java, C#, Rust, and C++ use angle brackets < > for similar purposes, but since arrays in Scala are accessed as arr(idx), and (unlike Haskell or Python) Scala does not use [ ... ] for list comprehensions, the square brackets could be used for type parameters, and there was no need for angle brackets (those are more difficult to parse anyway, especially in a language which allows almost arbitrary names for infix and postfix operators).
The B after the colon is the return type right?
Right.
What is the type?
Ditto. The return type is B.
It looks like this method has 2 parameter lists: one with a parameter named z and one with a parameter named op which is a function.
This method has a type parameter B and two argument lists for value arguments, correct. This is done to simplify the type inference: the type B can be inferred from the first argument z, so it does not have to be repeated when writing down the lambda-expression for op. This wouldn't work if z and op were in the same argument list.
op looks like it takes a function (B, A) => B.
The type of the argument op is (B, A) => B, that is, Function2[B, A, B], a function that takes a B and an A and returns a B.
What does the right side mean? It returns B?
Yes.
What is going on?
m acts as accumulator, n is the current element of the list. The fold starts with integer value 0, and then accumulates from left to right, adding up all numbers. Instead of (m, n) => m + n, you could have written _ + _.
Why wasn't the [B] needed when called?
It was inferred from the type of z. There are many other cases where the type parameter cannot be inferred automatically; in those cases you have to specify it explicitly by passing it as a type argument in the square brackets.
This is what is called polymorphism. The function can work on multiple types, and you sometimes want to say which type it will work with. Basically, B is a type parameter: it can either be given explicitly in square brackets (here it would be Int) or be inferred from a value argument in parentheses, as you did with the 0. For more, read about polymorphism in Scala.
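The same inference can be seen in a Haskell sketch of a left fold. The name foldLeft and the argument order below are chosen to mirror the Scala signature and are assumptions for illustration; Haskell's own foldl takes the function first.

```haskell
-- b plays the role of Scala's B, a the role of the element type A.
foldLeft :: b -> (b -> a -> b) -> [a] -> b
foldLeft z _  []       = z
foldLeft z op (x : xs) = foldLeft (op z x) op xs

numbers :: [Int]
numbers = [1 .. 10]

main :: IO ()
main = do
  -- b is inferred as Int from the initial value and the operator.
  print (foldLeft 0 (\m n -> m + n) numbers)
  -- b is inferred as String, while a is still Int.
  putStrLn (foldLeft "" (\acc n -> acc ++ show n) numbers)
```

In both calls the accumulator type is never written down; it follows from the first argument, just as B follows from z in the Scala example.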

in insufficiently-polymorphic why are there fewer ways to implement `List a -> List a -> List a` than `List Char -> List Char -> List Char`

in insufficiently-polymorphic
the author says about:
def foo[A](fst: List[A], snd: List[A]): List[A]
There are fewer ways we can implement the function. In particular, we
can’t just hard-code some elements in a list, because we have no
ability to manufacture values of an arbitrary type.
I did not understand this, because in the [Char] version we also had no ability to manufacture values of an arbitrary type; we had to have them of type [Char]. So why are there fewer ways to implement this?
In the generic version you know that the output list can only contain some arrangement of the elements contained in fst and snd, since there is no way to construct new values of some arbitrary type A. In contrast, if you know the output type is Char you can write, e.g.:
def foo(fst: List[Char], snd: List[Char]) = List('a', 'b', 'c')
In addition you cannot use the values contained in the input lists to make decisions which affect the output, since you don't know what they are. You can do this if you know the input type e.g.
def foo(fst: List[Char], snd: List[Char]) = fst match {
  case Nil => snd
  case 'a'::fs => snd
  case _ => fst
}
I'm assuming the author means, that there's no way to construct a non-empty List a but there's a way to construct a List Char, e.g. by using a String literal. You could just ignore the arguments and just return a hard-coded String.
An example of this would be:
foo :: List Char -> List Char -> List Char
foo a b = "Whatever"
You can't construct a value of an arbitrary type a, but you can construct a value of type Char.
This is a simple case of a property called "parametricity" or "free theorem", which applies to every polymorphic function.
An even simpler example is the following:
fun1 :: Int -> Int
fun2 :: forall a. a -> a
fun1 can be anything: successor, predecessor, square, factorial, etc. This is because it can "read" its input, and act accordingly.
fun2 must be the identity function (or loop forever). This is because fun2 receives its input but cannot examine it in any useful way: since it is of an abstract, unknown type a, no operations can be performed on it. The input is effectively an opaque token. The output of fun2 must be of type a, for which we do not know any means of construction: we cannot create a value of type a from nothing. The only option is to take the input a and use it to craft the output a. Hence, fun2 is the identity.
The above parametricity result holds when you have no way to perform tests on the input or the type a. If we, e.g., allowed if x.instanceOf[Int] ..., or if x==null ..., or type casts (in OOP) then we could write fun2 in other ways.
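To make the contrast concrete, here is a small Haskell sketch with hypothetical function names. The first two functions are only possible because the element type is known; the last two are typical of what remains when it is not.

```haskell
-- With a concrete element type, the function can invent values ...
hardCoded :: [Char] -> [Char] -> [Char]
hardCoded _ _ = "abc"

-- ... or inspect elements to decide what to return.
inspecting :: [Char] -> [Char] -> [Char]
inspecting ('a' : _) t = t
inspecting s         _ = s

-- With an unknown element type, the result can only be built from
-- elements of the inputs (reordered, dropped, or duplicated).
appendBoth :: [a] -> [a] -> [a]
appendBoth xs ys = xs ++ ys

interleave :: [a] -> [a] -> [a]
interleave (x : xs) ys = x : interleave ys xs
interleave []       ys = ys

main :: IO ()
main = do
  putStrLn (hardCoded "x" "y")
  putStrLn (inspecting "apple" "pear")
  print (appendBoth [1, 2 :: Int] [3])
  print (interleave [1, 2, 3 :: Int] [10, 20])
```

Neither appendBoth nor interleave could be changed to return "abc" or to branch on whether an element equals 'a' without making the type less general.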

Encoding of inferrable records

As you probably know, records are somewhat special in ocaml, as each label has to be uniquely assigned to a nominal record type, i.e. the following function cannot be typed without context:
let f r = r.x
Proper first class records (i.e. things that behave like tuples with labels) are trivially encoded using objects, e.g.
let f r = r#x
when creating the objects in the right way (i.e. no self-recursion, no mutation), they behave just like records.
I am however, somewhat unhappy with this solution for two reasons:
when making records updatable (i.e. by adding an explicit "with_l" method for each label l), the type is somewhat too loose (it should be the same as the original record). Admittedly, one can enforce this equality, but this is still inconvenient.
I have the suspicion that the OCaml compiler does not infer that these records are actually immutable: In a function
let f r = r#x + r#x
would the compiler be able to run a common subexpression elimination?
For these reasons, I wonder if there is a better encoding:
Is there another (aside from using objects) type-safe encoding (e.g. using polymorphic variants) of records with inferrable type in OCaml?
Can this encoding avoid the problems mentioned above?
If I understand you correctly you're looking for a very special kind of polymorphism. You want to write a function that will work for all types, such that the type is a record with certain fields. This sounds more like a syntactic polymorphism in a C++ style, not as semantic polymorphism in ML style. If we will slightly rephrase the task, by capturing the idea that a field accessing is just a syntactic sugar for a field projection function, then we can say, that you want to write a function that is polymorphic over all types that provide a certain set of operations. This kind of polymorphism can be captured by OCaml using one of the following mechanisms:
functors
first class modules
objects
I think that functors are obvious, so I will show an example with first class modules. We will write a function print_student that will work on any type that satisfies the Student signature:
module type Student = sig
type t
val name : t -> string
val age : t -> int
end
let print_student (type t)
(module S : Student with type t = t) (s : t) =
Printf.printf "%s %d" (S.name s) (S.age s)
The type of the print_student function is (module Student with type t = 'a) -> 'a -> unit. So it works for any type that satisfies the Student interface, and thus it is polymorphic. This is a very powerful form of polymorphism that comes at a price: you need to pass the module structure explicitly when you invoke the function, so it is System F style polymorphism. Functors will also require you to specify a concrete module structure. So neither is inferrable (i.e., neither is the implicit Hindley-Milner style polymorphism that you are looking for). For the latter, only objects will work (there are also modular implicits, which relax the explicitness requirement and would actually meet your requirements, but they are still not in the trunk).
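The explicit-passing style described here has a close analogue in Haskell, where a record of functions can play the role of the Student signature. This is a sketch by analogy only, not a translation of the OCaml module system; StudentOps, describeStudent, and tupleOps are hypothetical names.

```haskell
-- A record of operations stands in for the Student signature.
data StudentOps t = StudentOps
  { name :: t -> String
  , age  :: t -> Int
  }

-- The operations are passed explicitly, like the first-class module.
describeStudent :: StudentOps t -> t -> String
describeStudent ops s = name ops s ++ " " ++ show (age ops s)

-- One possible representation of a student: a plain tuple.
tupleOps :: StudentOps (String, Int)
tupleOps = StudentOps { name = fst, age = snd }

main :: IO ()
main = putStrLn (describeStudent tupleOps ("Ann", 20))
```

The function is polymorphic in the representation t, but the caller has to name the operations at every call site, which is the price mentioned above.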
With object-style row polymorphism it is possible to write a function that is polymorphic over a set of types conforming to some signature, and to infer this signature implicitly from the function definition. However, such power comes at a price. Since object operations are encoded as methods, and methods are just function pointers that are assigned dynamically at runtime, you shouldn't expect any compile-time optimizations. It is not possible to perform any static analysis on something that is bound dynamically. So, of course, no common subexpression elimination, nor inlining. For functors and first class modules, the optimization is possible on a newer branch of the compiler with flambda (see the 4.03.0+flambda opam switch). But on a regular compiler installation no inlining will be performed.
Different approaches
As for other techniques: first of all, we can use camlp{4,5}, ppx, or even m4 and cpp to preprocess the code, but this would hardly be idiomatic and is of doubtful usefulness.
Another way, is instead of writing a function that is polymorphic, we can try to find a suitable monomorphic data type. A direct approach would be to use a list of polymorphic variants, e.g.,
type attribute = [`name of string | `age of int]
type student = attribute list
In fact we don't even need to specify all these types ahead of time, and our function can require only those fields that are needed, a form of row polymorphism:
let rec name = function
  | [] -> raise Not_found
  | `name n :: _ -> n
  | _ :: student -> name student
The only problem with this encoding is that you cannot guarantee that an attribute with a given name occurs once and only once. So it is possible that a student doesn't have a name at all or, even worse, has more than one name. Depending on your problem domain, this can be acceptable.
If it is not, then we can use GADTs and extensible variants to encode heterogeneous maps, i.e., associative data structures that map keys to values of different types (in a regular (homogeneous) map or assoc list the value type is unified). How to construct such containers is beyond the scope of this answer, but fortunately there are at least two available implementations. One, which I use personally, is called universal map (Univ_map) and is provided by the Core library (Core_kernel in fact). It allows you to specify two kinds of heterogeneous maps, without and with default values. The former corresponds to a record with optional fields; the latter has a default for each field, so an accessor is a total function. For example,
open Core_kernel.Std
module Dict = Univ_map.With_default
let name = Dict.Key.create ~name:"name" ~default:"Joe" sexp_of_string
let age = Dict.Key.create ~name:"age" ~default:18 sexp_of_int
let print student =
  printf "%s %d"
    (Dict.get student name) (Dict.get student age)
You may want to hide the fact that you're using a universal map behind an abstract type, since there is only one Dict.t that can be used across different abstractions, which may break modularity. Another implementation of heterogeneous maps is by Daniel Bünzli. It doesn't provide a With_default kind of map, but it has far fewer dependencies.
P.S. Of course, for such a simple case, where there is only one operation, it is much easier to just pass this operation explicitly as a function instead of packing it into a structure, so we can write the function f from your example as simply as let f x r = x r + x r. But this would be the same kind of polymorphism as with first class modules/functors, just simplified. And I assume that your example was specifically reduced to one field, and that in your real use case you have a more complex set of fields.
Very roughly speaking, an OCaml object is a hash table whose keys are its method name hash. (The hash of a method name can be obtained by Btype.hash_variant of OCaml compiler implementation.)
Just like objects, you can encode polymorphic records using (int, Obj.t) Hashtbl.t. For example, a function to get a value of a field l can be written as follows:
(** [get r "x"] is poly-record version of [r.x] *)
let get r k = Hashtbl.find r (Btype.hash_variant k)
Since it is easy to access the internals unlike objects, the encoding of {r with l = e} is trivial:
(** [copy_with r [(k1,v1);..;(kn,vn)]] is poly-record version of
[{r with k1 = v1; ..; kn = vn}] *)
let copy_with r fields =
  let r = Hashtbl.copy r in
  List.iter (fun (k,v) -> Hashtbl.replace r (Btype.hash_variant k) v) fields;
  r
and the creation of poly-records:
(** [create [(k1,v1);..(kn,vn)]] is poly-record version of [{k1=v1;..;kn=vn}] *)
let create fields = copy_with (Hashtbl.create (List.length fields)) fields
Since all the types of the fields are squashed into one Obj.t, you have to use Obj.magic to store values of various types in this implementation, and therefore it is not type-safe by itself. However, we can make it type-safe by wrapping (int, Obj.t) Hashtbl.t in a phantom type whose parameter denotes the fields of a poly-record and their types. For example,
<x : int; y : float> Poly_record.t
is a poly-record whose fields are x : int and y : float.
The details of this phantom type wrapping are too long to explain here. Please see my implementation at https://bitbucket.org/camlspotter/ppx_poly_record/src . In short, it uses a PPX preprocessor to generate code for type safety and to provide convenient syntactic sugar.
Compared with the encoding by objects, this approach has the following properties:
The same type safety and the same field access efficiency as objects
It can enjoy structural subtyping like objects, what you want for poly-records.
{r with l = e} is possible
Streamable outside of a program safely, since hash tables themselves have no closure in it. Objects are always "contaminated" with closures therefore they are not safely streamable.
Unfortunately it lacks efficient pattern matching, which is available for mono-records. (And this is why I do not use my implementation :-( ) I feel that PPX preprocessing is not enough for it and that some compiler modification is required. It would not be really hard, though, since we can make use of the typing of objects.
Ah, and of course, this encoding relies heavily on side effects, so no CSE optimization can be expected.
Is there another (aside from using objects) type-safe encoding (e.g. using polymorphic variants) of records with inferrable type in OCaml?
For immutable records, yes. There is a standard theoretical duality between polymorphic records ("inferrable" records as you describe) and polymorphic variants. In short, a record { l_1 = v_1; l_2 = v_2; ...; l_n = v_n } can be implemented by
function `l_1 k -> k v_1 | `l_2 k -> k v_2 | ... | `l_n k -> k v_n
and then the projection r.l_i becomes r (`l_i (fun v -> v)). For instance, the function fun r -> r.x is encoded as fun r -> r (`x (fun v -> v)). See also the following example session:
# let myRecord = (function `field1 k -> k 123 | `field2 k -> k "hello") ;;
(* encodes { field1 = 123; field2 = "hello" } *)
val myRecord : [< `field1 of int -> 'a | `field2 of string -> 'a ] -> 'a = <fun>
# let getField1 r = r (`field1 (fun v -> v)) ;;
(* fun r -> r.field1 *)
val getField1 : ([> `field1 of 'a -> 'a ] -> 'b) -> 'b = <fun>
# getField1 myRecord ;;
- : int = 123
# let getField2 r = r (`field2 (fun v -> v)) ;;
(* fun r -> r.field2 *)
val getField2 : ([> `field2 of 'a -> 'a ] -> 'b) -> 'b = <fun>
# getField2 myRecord ;;
- : string = "hello"
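The shape of this encoding can also be sketched in Haskell, with one important caveat: Haskell has no polymorphic variants, so the selector type below is an ordinary closed data type and the structural, inferrable aspect of the OCaml version is lost. Selector, Field1, and Field2 are hypothetical names that mirror the session above.

```haskell
-- Each selector carries a continuation that receives the field's value.
data Selector k
  = Field1 (Int -> k)
  | Field2 (String -> k)

-- Encodes { field1 = 123; field2 = "hello" } as a function on selectors.
myRecord :: Selector k -> k
myRecord (Field1 k) = k 123
myRecord (Field2 k) = k "hello"

-- Projection: pass the selector with the identity continuation.
getField1 :: (Selector Int -> r) -> r
getField1 r = r (Field1 id)

getField2 :: (Selector String -> r) -> r
getField2 r = r (Field2 id)

main :: IO ()
main = do
  print (getField1 myRecord)
  putStrLn (getField2 myRecord)
```

The record is polymorphic in the continuation's result type, which is what lets the same value answer both an Int-valued and a String-valued projection.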
For mutable records, we can add setters like:
let ref1 = ref 123
let ref2 = ref "hello"
let myRecord =
function
| `field1 k -> k !ref1
| `field2 k -> k !ref2
| `set_field1(v1, k) -> k (ref1 := v1)
| `set_field2(v2, k) -> k (ref2 := v2)
and use them like myRecord (`set_field1(456, fun v -> v)) and myRecord (`set_field2("world", fun v -> v)) for example. However, localizing ref1 and ref2 like
let myRecord =
let ref1 = ref 123 in
let ref2 = ref "hello" in
function
| `field1 k -> k !ref1
| `field2 k -> k !ref2
| `set_field1(v1, k) -> k (ref1 := v1)
| `set_field2(v2, k) -> k (ref2 := v2)
causes a value restriction problem and requires a little more polymorphic typing trick (which I omit here).
Can this encoding avoid the problems mentioned above?
The "common subexpression elimination" for (the encoding of) r.x + r.x can be done only if OCaml knows the definition of r and inlines it. (Sorry my previous answer was inaccurate here.)