OCaml interface vs. signature? - interface

I'm a bit confused about interfaces vs. signatures in OCaml.
From what I've read, interfaces (the .mli files) are what govern what values can be used/called by the other programs. Signature files look like they're exactly the same, except that they name it, so that you can create different implementations of the interface.
For example, if I want to create a module that is similar to a set in Java:
I'd have something like this:
the set.mli file:
type 'a set
val is_empty : 'a set -> bool
val ....
etc.
The signature file (setType.ml)
module type Set = sig
type 'a set
val is_empty : 'a set -> bool
val ...
etc.
end
and then an implementation would be another .ml file, such as SpecialSet.ml, which includes a struct that defines all the values and what they do.
module SpecialSet : Set
struct
...
I'm a bit confused as to what exactly the "signature" does, and what purpose it serves. Isn't it acting like a sort of interface? Why is both the .mli and .ml needed? The only difference in lines I see is that it names the module.
Am I misunderstanding this, or is there something else going on here?

OCaml's module system is tied into separate compilation (the pairs of .ml and .mli files). So each .ml file implicitly defines a module, each .mli file defines a signature, and if there is a corresponding .ml file that signature is applied to that module.
It is useful to have an explicit syntax to manipulate modules and interfaces to one's liking inside a .ml or .mli file. This allows signature constraints, as in S with type t = M.t.
Not least is the possibility it gives to define functors, modules parameterized by one or several modules: module F (X : S) = struct ... end. All these would be impossible if the only way to define a module or signature was as a file.
I am not sure how that answers your question, but I think the answer to your question is probably "yes, it is as simple as you think, and the system of having .mli files and explicit signatures inside files is redundant on your example. Manipulating modules and signatures inside a file allows more complicated tricks in addition to these simple things".

This question is old but maybe this is useful to someone:
A file named a.ml appears as a module A in the program...
The interface of the module a.ml can be written in file named a.mli
slide link
This is from the OCaml MOOC from Université Paris Diderot.

Related

How do you resolve an OCaml circular build error?

I have code that produced a circular build error, and I looked up the error. This page gives a similar but smaller example of what's in my .mli file: https://ocaml.org/learn/tutorials/ocamlbuild/New_kinds_of_build_errors.html
Essentially the problem is that my file is both defining a type and defining functions that use arguments and return values of that same type. However, that's exactly what I want my program to do. My type is not private, it's declared explicitly in the .mli file:
type state = {
current_pos : int*int;
contents : int*int list;
}
val update_state : state -> state
It seems to me reasonable to want to build a module that defines a type and then to share that type with other files, but it seems like the circular build error will always prevent that. Is there some "more proper" way of doing this sharing?
There's nothing at all wrong with the code you posted. It compiles fine. So the problem is in your .ml file.
The page you point to shows code that is incorrect. The only point being made is that you'll get a different error if you use ocamlbuild than you would if you just compile the file directly.
The key point is that you should not use the name of a module inside the definition of the module.
Instead of this (in a.ml):
type t = int
let x : A.t = 14
You should have this:
type t = int
let x: t = 14
If your code is really like this example, you just need to remove the module names inside the .ml file.
As you say, what you want to do is by far the most common use of a module.

Extract Types/Classnames from flat Modelica code

I was wondering if there already exists a possibility to extract from flat Modelica code all variables AND their corresponding types (classnames respectively).
For example:
Given an extract from a flattened Modelica model:
constant Integer nSurfaces = 8;
constant Integer construction1.nLayers(min = 1.0) = 2 "Number of layers of the construction";
parameter Modelica.SIunits.Length construction1.thickness[construction1.nLayers]= {0.2, 0.1} "Thickness of each construction layer";
Here, the wanted output would be something like:
nSurfaces, Integer, constant;
construction1.nLayers, Integer, constant;
construction1.thickness[construction1.nLayers], Modelica.SIunits.Length, parameter
Ideally, for construction1.thickness there would be two lines (=number of construction1.nLayers).
I know, that it is possible to get a list of used variables from the dsin.txt, which is produced while translating a model. But until now I did not find an already existing way to get the corresponding types. And I really would like to avoid writing an own parser :-).
You could try to generate the file modelDescription.xml as defined by the FMI standard. It contains a ton of information and XML should be easier to parse, e.g. python has a couple of xml parsing/reading packages.
If you are using Dymola you just set the flag Advanced.FMI.GenerateModelDescriptionInterface2 = true to generate the model description file.
The second idea could be to let the compiler/tool parse the Modelica file for you as they need to do that anyway, try searching for AST (abstract syntax tree). In Dymola, this is available through the ModelManagement library, and also through the Python interface.
Third idea could be to use one of the Modelica parsers available, e.g. have a look at:
https://github.com/lbl-srg/modelica-json
https://hackage.haskell.org/package/modelicaparser
https://github.com/xie-dongping/modparc
https://github.com/pymoca/pymoca
https://github.com/pymola/pymola/tree/master/src/pymola
Fourth, if all that did not work, you still do not have to write a full parser, you could use ANTLR, then use an existing grammar file (look for e.g. modelica.g4).

How do purely functional compilers annotate the AST with type info?

In the syntax analysis phase, an imperative compiler can build an AST out of nodes that already contain a type field that is set to null during construction, and then later, in the semantic analysis phase, fill in the types by assigning the declared/inferred types into the type fields.
How do purely functional languages handle this, where you do not have the luxury of assignment? Is the type-less AST mapped to a different kind of type-enriched AST? Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?
Are there purely functional programming tricks that help the compiler writer with this problem?
I usually rewrite a source (or an already several steps lowered) AST into a new form, replacing each expression node with a pair (tag, expression).
Tags are unique numbers or symbols which are then used by the next pass which derives type equations from the AST. E.g., a + b will yield something like { numeric(Tag_a). numeric(Tag_b). equals(Tag_a, Tag_b). equals(Tag_e, Tag_a).}.
Then types equations are solved (e.g., by simply running them as a Prolog program), and, if successful, all the tags (which are variables in this program) are now bound to concrete types, and if not, they're left as type parameters.
In a next step, our previous AST is rewritten again, this time replacing tags with all the inferred type information.
The whole process is a sequence of pure rewrites, no need to replace anything in your AST destructively. A typical compilation pipeline may take a couple of dozens of rewrites, some of them changing the AST datatype.
There are several options to model this. You may use the same kind of nullable data fields as in your imperative case:
data Exp = Var Name (Maybe Type) | ...
parse :: String -> Maybe Exp -- types are Nothings here
typeCheck :: Exp -> Maybe Exp -- turns Nothings into Justs
or even, using a more precise type
data Exp ty = Var Name ty | ...
parse :: String -> Maybe (Exp ())
typeCheck :: Exp () -> Maybe (Exp Type)
I cant speak for how it is supposed to be done, but I did do this in F# for a C# compiler here
The approach was basically - build an AST from the source, leaving things like type information unconstrained - So AST.fs basically is the AST which strings for the type names, function names, etc.
As the AST starts to be compiled to (in this case) .NET IL, we end up with more type information (we create the types in the source - lets call these type-stubs). This then gives us the information needed to created method-stubs (the code may have signatures that include type-stubs as well as built in types). From here we now have enough type information to resolve any of the type names, or method signatures in the code.
I store that in the file TypedAST.fs. I do this in a single pass, however the approach may be naive.
Now we have a fully typed AST you could then do things like compile it, fully analyze it, or whatever you like with it.
So in answer to the question "Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?", I cant say definitively that this is the case, but it is certainly what I did, and it appears to be what MS have done with Roslyn (although they have essentially decorated the original tree with type info IIRC)
"Are there purely functional programming tricks that help the compiler writer with this problem?"
Given the ASTs are essentially mirrored in my case, it would be possible to make it generic and transform the tree, but the code may end up (more) horrendous.
i.e.
type 'type AST;
| MethodInvoke of 'type * Name * 'type list
| ....
Like in the case when dealing with relational databases, in functional programming it is often a good idea not to put everything in a single data structure.
In particular, there may not be a data structure that is "the AST".
Most probably, there will be data structures that represent parsed expressions. One possible way to deal with type information is to assign a unique identifier (like an integer) to each node of the tree already during parsing and have some suitable data structure (like a hash map) that associates those node-ids with types. The job of the type inference pass, then, would be just to create this map.

Interface with multiple implementations in OCaml

What is the conventional way to create an interface in OCaml? It's possible to have an interface with a single implementation by creating an interface file foo.mli and an implementation file foo.ml, but how can you create multiple implementations for the same interface?
You must use modules and signatures. A .ml file implicitly define a module, and a .mli its signature. With explicit modules and signature, you can apply a signature to several different modules.
See this chapter of the online book "Developing Applications with OCaml".
If you're going to have multiple implementations for the same signature, define your signature inside a compilation unit, rather than as a compilation unit, and (if needed) similarly for the modules. There's an example of that in the standard library: the OrderedType signature, that describes modules with a type and a comparison function on that type:
module type OrderedType = sig
type t
val compare : t -> t -> int
end
This signature is defined in both set.mli and map.mli (you can refer to it as either Set.OrderedType or Map.OrderedType, or even write it out yourself: signatures are structural). There are several compilation units in the standard library that have this signature (String, Nativeint, etc.). You can also define your own module, and you don't need to do anything special when defining the module: as long as it has a type called t and a value called compare of type t -> t -> int, the module has that signature. There's a slightly elaborate example of that in the standard library: the Set.Make functor builds a module which has the signature OrderedType, so you can build sets of sets that way.
(* All four modules passed as arguments to Set.Make have the signature Set.OrderedType *)
module IntSet = Set.Make(module type t = int val compare = Pervasives.compare end)
module StringSet = Set.Make(String)
module StringSetSet = Set.Make(StringSet)
module IntSetSet = Set.Make(IntSet)

Redundancy in OCaml type declaration (ml/mli)

I'm trying to understand a specific thing about ocaml modules and their compilation:
am I forced to redeclare types already declared in a .mli inside the specific .ml implementations?
Just to give an example:
(* foo.mli *)
type foobar = Bool of bool | Float of float | Int of int
(* foo.ml *)
type baz = foobar option
This, according to my normal way of thinking about interfaces/implementations, should be ok but it says
Error: Unbound type constructor foobar
while trying to compile with
ocamlc -c foo.mli
ocamlc -c foo.ml
Of course the error disappears if I declare foobar inside foo.ml too but it seems a complex way since I have to keep things synched on every change.
Is there a way to avoid this redundancy or I'm forced to redeclare types every time?
Thanks in advance
OCaml tries to force you to separate the interface (.mli) from the implementation (.ml. Most of the time, this is a good thing; for values, you publish the type in the interface, and keep the code in the implementation. You could say that OCaml is enforcing a certain amount of abstraction (interfaces must be published; no code in interfaces).
For types, very often, the implementation is the same as the interface: both state that the type has a particular representation (and perhaps that the type declaration is generative). Here, there can be no abstraction, because the implementer doesn't have any information about the type that he doesn't want to publish. (The exception is basically when you declare an abstract type.)
One way to look at it is that the interface already contains enough information to write the implementation. Given the interface type foobar = Bool of bool | Float of float | Int of int, there is only one possible implementation. So don't write an implementation!
A common idiom is to have a module that is dedicated to type declarations, and make it have only a .mli. Since types don't depend on values, this module typically comes in very early in the dependency chain. Most compilation tools cope well with this; for example ocamldep will do the right thing. (This is one advantage over having only a .ml.)
The limitation of this approach is when you also need a few module definitions here and there. (A typical example is defining a type foo, then an OrderedFoo : Map.OrderedType module with type t = foo, then a further type declaration involving'a Map.Make(OrderedFoo).t.) These can't be put in interface files. Sometimes it's acceptable to break down your definitions into several chunks, first a bunch of types (types1.mli), then a module (mod1.mli and mod1.ml), then more types (types2.mli). Other times (for example if the definitions are recursive) you have to live with either a .ml without a .mli or duplication.
Yes, you are forced to redeclare types. The only ways around it that I know of are
Don't use a .mli file; just expose everything with no interface. Terrible idea.
Use a literate-programming tool or other preprocessor to avoid duplicating the interface declarations in the One True Source. For large projects, we do this in my group.
For small projects, we just duplicate type declarations. And grumble about it.
You can let ocamlc generate the mli file for you from the ml file:
ocamlc -i some.ml > some.mli
In general, yes, you are required to duplicate the types.
You can work around this, however, with Camlp4 and the pa_macro syntax extension (findlib package: camlp4.macro). It defines, among other things, and INCLUDE construct. You can use it to factor the common type definitions out into a separate file and include that file in both the .ml and .mli files. I haven't seen this done in a deployed OCaml project, however, so I don't know that it would qualify as recommended practice, but it is possible.
The literate programming solution, however, is cleaner IMO.
No, in the mli file, just say "type foobar". This will work.