What (exactly) are "First Class" modules? - scala

I often read that some programming languages have "First Class" support for modules (OCaml, Scala, TypeScript[?]), and I recently stumbled upon an answer on SO citing modules as first class citizens among the distinguishing features of Scala.
I thought I knew very well what Modular Programming means but after these incidents I'm beginning to doubt my understanding...
I think modules are nothing special, just instances of certain classes that act as mini-libraries. The mini-library code goes into a class, and objects of that class are the modules. You can pass them around as dependencies to any other class that requires the services provided by the module, so I would have thought any decent OOPL has first class modules, but apparently not!
1. What exactly is a module? How is it different from, say, a plain class or an object?
2. How is (1) related (or not) to the Modular Programming that we all know?
3. What exactly does it mean for a language to have first class modules? What are the benefits? What are the drawbacks if a language lacks such a feature?

A module, as well as a subroutine, is a way of organizing your code. When we develop programs, we pack instructions into subroutines, subroutines into structures, structures into packages, libraries, assemblies, frameworks, solutions, and so on. So, putting everything else aside, it is just a mechanism to organize your code.
The essential reason why we use all those mechanisms, instead of just laying out our instructions linearly, is that the complexity of a program grows non-linearly with respect to its size. In other words, a program built from n pieces, each having m instructions, is easier to comprehend than a program built from n*m instructions. This is, of course, not always true (otherwise we could just split our program into arbitrary parts and be happy). For it to be true, we have to introduce one essential mechanism called abstraction. We can benefit from splitting a program into manageable subparts only if each part provides some sort of abstraction. For example, we can have connect_to_database, query_for_students, sort_by_grade, and take_the_first_n abstractions packed as functions or subroutines, and it is much easier to understand code expressed in terms of those abstractions than code in which all those functions are inlined.
So now we have functions, and it is natural to introduce the next level of organization: collections of functions. Functions commonly form families around some shared abstraction, e.g., student_name, student_grade, student_courses, etc. all revolve around the same abstraction, student. The same goes for connection_establish, connection_close, etc. Therefore we need some mechanism that ties those functions together. Here we start to have options. Some languages took the OOP path, in which objects and classes are the units of organization, and a bunch of functions together with some state is called an object. Other languages took a different path and decided to combine functions into static structures called modules. The main difference is that a module is a static, compile-time structure, whereas objects are runtime structures that have to be created at runtime to be used. As a result, objects naturally tend to contain state, while modules do not (they contain only code). Objects are also inherently regular values, which you can assign to variables, store in files, and otherwise manipulate like any other data. Classical modules, unlike objects, have no runtime representation, so you can't pass modules as parameters to your functions, store them in a list, or otherwise perform any computations on them. This is basically what people mean by a first-class citizen: an entity that can be treated as a simple value.
Back to composable programs. In order to make objects/modules composable, we need to be sure that they create abstractions. For functions, the abstraction boundary is clearly defined: it is the tuple of parameters. For objects, we have the notions of interfaces and classes, while for modules we have only interfaces. Since modules are inherently simpler (they do not include state), we do not have to deal with constructing and destroying them, and therefore we do not need the more complicated notion of a class. Both classes and interfaces are ways to classify objects and modules by some criteria, so that we can reason about different modules without looking into the implementation, the same way as we did with the connect_to_database, query_for_students, et al. functions: we reasoned about them based only on their names and interfaces (and probably documentation). Now we can have a class student or a module Student, both defining an abstraction called student, and save a lot of brain power by not having to deal with how those students are implemented.
Beyond making our programs easier to understand, abstractions give us another benefit: generalization. Since we don't need to reason about the implementation of a function or a module, all implementations are interchangeable to some degree. Therefore, we can write our programs so that they express their behavior in a general way, without breaking the abstractions, and then choose particular instances when we run our programs. Objects are runtime instances, which essentially means that we can choose our implementation at runtime. Which is nice. Classes, however, are rarely first-class citizens, so we have to invent various cumbersome methods to make the selection, like the Abstract Factory and Builder design patterns. For modules the situation is even worse: since they are inherently a compile-time structure, we have to choose our implementation at program build/link time. Which is not what people want to do in the modern world.
And here come first-class modules. Being an amalgamation of modules and objects, they give us the best of both worlds: easy-to-reason-about stateless structures that are, at the same time, pure first-class citizens, which you can store in a variable, put into a list, and use to select the desired implementation at runtime.
Speaking of OCaml, underneath the hood, first-class modules are simply a record of functions. In OCaml, you can even add state to a first-class module, making it practically indistinguishable from an object. This brings us to another topic: in the real world, the separation between objects and structures is not that clear. For example, OCaml provides both modules and objects, and you can put objects inside modules and even vice versa. In C/C++ we have compilation units, symbol visibility, opaque data types, and header files, which enable some sort of modular programming, and we also have structures and namespaces. Therefore, the difference is sometimes hard to tell.
To summarize: modules are pieces of code with a well-defined interface for accessing that code. First-class modules are modules that can be manipulated as regular values, e.g., stored in a data structure, assigned to a variable, and picked at runtime.
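Since the question is tagged Scala, here is a minimal sketch of what "stored in a data structure and picked at runtime" can look like there, assuming modules are encoded as singleton objects behind a trait. The names Greeter, English, Spanish and greetIn are made up for illustration and do not come from any library.

```scala
// Hypothetical example: every name below is illustrative.
object FirstClassModuleDemo {
  // The module interface: a signature every implementation must satisfy.
  trait Greeter {
    def greet(name: String): String
  }

  // Two stateless "modules", written as singleton objects.
  object English extends Greeter {
    def greet(name: String): String = s"Hello, $name"
  }

  object Spanish extends Greeter {
    def greet(name: String): String = s"Hola, $name"
  }

  // Modules stored in an ordinary data structure, like any other value.
  val modules: Map[String, Greeter] = Map("en" -> English, "es" -> Spanish)

  // The implementation is selected at runtime from a string key,
  // falling back to English when the key is unknown.
  def greetIn(lang: String, name: String): String =
    modules.getOrElse(lang, English).greet(name)
}
```

With this encoding, FirstClassModuleDemo.greetIn("es", "Ana") selects the Spanish module only when the call is made, which is the runtime selection described above.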

OCaml perspective here.
Modules and classes are very different.
First of all, classes in OCaml are a very specific (and complex) feature. To go into some detail, classes implement inheritance, row polymorphism and dynamic dispatch (aka virtual methods). It allows them to be highly flexible at the expense of some efficiency.
Modules, however, are quite a different thing altogether.
Indeed, you can see modules as atomic mini-libraries, and usually they are used to define a type and its accessors, but they are much more powerful than just that.
Modules allow you to create several types, as well as module types and submodules. Basically, they allow you to create complex compartmentalization and abstraction.
Functors give you behavior similar to C++'s templates, except that they are safe. Basically, they are functions on modules, which allow you to parameterize a data structure or algorithm over some other module.
Modules are usually resolved statically and are therefore easy to inline, allowing you to write clear code without fear of a loss in efficiency.
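For readers coming from the Scala side of the question, the functor idea can be approximated there by a class that takes a module-like object as a constructor argument. This is only a rough sketch under that assumption, not how OCaml functors are implemented; Order, Ascending, Descending and Sorter are hypothetical names.

```scala
// Hypothetical example: a functor-like class parameterized over a module.
object FunctorDemo {
  // Module signature: an ordering on some type T.
  trait Order[T] {
    def lessOrEqual(a: T, b: T): Boolean
  }

  // Two modules implementing the signature for Int.
  object Ascending extends Order[Int] {
    def lessOrEqual(a: Int, b: Int): Boolean = a <= b
  }

  object Descending extends Order[Int] {
    def lessOrEqual(a: Int, b: Int): Boolean = a >= b
  }

  // "Functor": given an Order module, build an insertion-sort module.
  class Sorter[T](order: Order[T]) {
    private def insert(x: T, sorted: List[T]): List[T] = sorted match {
      case Nil => List(x)
      case head :: tail =>
        if (order.lessOrEqual(x, head)) x :: sorted
        else head :: insert(x, tail)
    }

    def sort(xs: List[T]): List[T] =
      xs.foldLeft(List.empty[T])((acc, x) => insert(x, acc))
  }

  // Applying the "functor" to two different argument modules.
  val ascending: Sorter[Int] = new Sorter(Ascending)
  val descending: Sorter[Int] = new Sorter(Descending)
}
```

The sorting algorithm is written once against the Order signature, and each application of Sorter yields a specialized module.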
Now, a first-class citizen is an entity that can be put in a variable, passed to a function and tested for equality. In a way, it means they will be dynamically evaluated.
For example, suppose you have a module Jpeg and a module Png that let you manipulate different kinds of images. Statically, you don't know what kind of image you'll need to display. So you can use first-class modules:
let get_img_type filename =
  match Filename.extension filename with
  | ".jpg" | ".jpeg" -> (module Jpeg : IMG_HANDLER)
  | ".png" -> (module Png : IMG_HANDLER)
  (* Fallback added so the match is exhaustive; Jpeg, Png and IMG_HANDLER are assumed to be defined elsewhere. *)
  | ext -> invalid_arg ("unsupported image type: " ^ ext)

let display_img img_type filename =
  let module Handler = (val img_type : IMG_HANDLER) in
  Handler.display filename

The main differences between a module and an object usually are
Modules are second-class, i.e., they are rather static entities that cannot be passed around as values, while objects can.
Modules can contain types and all other forms of declarations (and types can be made abstract), while objects typically cannot.
However, as you note, there are languages where modules can be wrapped up as first-class values (e.g. OCaml) and there are languages where objects can contain types (e.g. Scala). That blurs the line a little. There still tend to be various biases towards certain patterns, with different trade-offs made in the type systems. For example, objects focus on recursive types, while modules focus on type abstraction and allowing any definition. It is a very hard problem to support both at the same time without severe compromises, since that quickly leads to an undecidable type system.
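As a small illustration of objects containing types in Scala, here is a sketch using abstract type members. Stack, intStack and top are hypothetical names chosen for this example; the point is only that the representation type Repr stays abstract for clients, much like an abstract type in a module signature.

```scala
// Hypothetical example: an object that carries type members.
object TypeMemberDemo {
  trait Stack {
    type Elem
    type Repr // representation type, kept abstract for clients
    def empty: Repr
    def push(x: Elem, s: Repr): Repr
    def pop(s: Repr): Option[(Elem, Repr)]
  }

  // Clients learn that Elem = Int, but not that Repr is a List[Int].
  val intStack: Stack { type Elem = Int } = new Stack {
    type Elem = Int
    type Repr = List[Int]
    def empty: Repr = Nil
    def push(x: Int, s: Repr): Repr = x :: s
    def pop(s: Repr): Option[(Int, Repr)] = s match {
      case head :: tail => Some((head, tail))
      case Nil => None
    }
  }

  // Values of type intStack.Repr can only be built through the object's own operations.
  def top: Option[Int] = {
    val s = intStack.push(2, intStack.push(1, intStack.empty))
    intStack.pop(s).map(_._1)
  }
}
```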

As has been mentioned already, "modules", "classes" and "objects" are more like tendencies than strict formal definitions. And if you implement modules as objects for example, as I understand Scala does, then obviously there are no fundamental differences between them, but mostly just syntactic differences that make them more convenient for certain use cases.
In regards to OCaml specifically though, here's a practical example of something you cannot do with modules that you can do with classes because of fundamental differences in implementation:
Modules have functions, which can reference each other recursively using the rec and and keywords. A module can also "inherit" the implementation of another module using include and override its definitions. For example:
module Base = struct
  let name = "base"
  let print () = print_endline name
end

module Child = struct
  include Base
  let name = "child"
end
but because modules are early bound, that is, names are resolved at compile time, it's not possible to get Base.print to reference Child.name instead of Base.name. At least not without altering both Base and Child significantly to explicitly enable it:
module AbstractBase(T : sig val name : string end) = struct
  let name = T.name
  let print () = print_endline name
end

module Base = struct
  include AbstractBase(struct let name = "base" end)
end

module Child = struct
  include AbstractBase(struct let name = "child" end)
end
With classes on the other hand, overriding is trivial and the default:
class base = object(self)
  method name = "base"
  method print = print_endline self#name
end

class child = object
  inherit base
  method! name = "child"
end
Objects can reference themselves through a variable conventionally named this or self (in OCaml you can name it whatever you want, but self is the convention). Methods are also late bound, meaning they are resolved at runtime, and can therefore call method implementations that didn't exist when the class was defined. This is called open recursion.
So why aren't modules late bound too? Primarily for performance reasons I think. Doing a dictionary search on the name of every function call will undoubtedly have a significant impact on execution time.

Related

How to wrap procedural algorithms in OOP language

I have to implement an algorithm which fits the procedural design approach perfectly. It is not tied to any data structure; it just takes a couple of objects and a bunch of control parameters and performs complicated operations on them, including creating and modifying intermediate temporary data, subroutine calls, and many CPU-intensive data transformations. The algorithm is too specific to include in either parameter object as a method.
What is the idiomatic way to wrap such algorithms in an OOP language? Define a static object with a static method that performs the calculation? Define a class that takes all algorithm parameters as constructor arguments and has a result method to return the result? Any other way?
If you need more specifics, I'm writing in Scala. But any general OOP approach is also applicable.
A static method (or a method on a singleton object in the case of Scala -- which I'm just gonna call a static method because that's the most common terminology) can work perfectly fine and is probably the most common approach to this.
There are some reasons to use other approaches, but they aren't strictly necessary and I'd avoid them unless you actually need an advantage that they give, because static methods are the simplest (if least versatile) approach.
Using a non-static method can be useful because you can then utilize design patterns like the factory pattern. For example, you might have an Operator class with a method evaluate. Now you could have different factories create different Operators so that you can swap your algorithm on the fly. Perhaps a calculator might have an AddOperatorFactory, MultiplyOperatorFactory and so on. Obviously this requires that you are able to instantiate an object that represents the algorithm. Of course, you could just pass a function around directly, as Scala and many other languages allow. Classes allow for inheritance, though, which opens the doors for some design patterns and, well, you're asking about OOP, not Scala specifically.
Also useful is the ability to have state with an object. With static methods, your only options for retaining state are either having global state (ew) or making the user of the static methods keep track of this state (more work for the users). With an instance of an object, you can keep that state inside the instance. For example, if your algorithm is a graph search, perhaps you'd want to allow resuming a search after you find the first match (which obviously requires storing state).
It's not much harder to have to do new MyAlgorithm().doStuff() instead of MyAlgorithm.doStuff(), so if in doubt, I would err on the side of avoiding static methods if you think you'll need the functionality that having an instance offers.
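To make the two options concrete, here is a minimal Scala sketch. Stats and MatchFinder are hypothetical names, and the linear scan merely stands in for the graph search mentioned above: the first is a stateless algorithm on a singleton object, the second keeps its position between calls so the search can be resumed.

```scala
// Hypothetical example: stateless singleton method vs. stateful instance.
object AlgorithmWrappingDemo {
  // Option 1: the "static method" approach, a method on a singleton object.
  object Stats {
    def mean(xs: Seq[Double]): Double =
      if (xs.isEmpty) 0.0 else xs.sum / xs.size
  }

  // Option 2: an instance holding state, so the caller can resume the search.
  class MatchFinder(data: Vector[Int], wanted: Int => Boolean) {
    private var position = 0 // index where the next search starts

    // Returns the next matching element, or None once the data is exhausted.
    def next(): Option[Int] = {
      val idx = data.indexWhere(wanted, position)
      if (idx < 0) {
        position = data.length
        None
      } else {
        position = idx + 1
        Some(data(idx))
      }
    }
  }
}
```

The caller of MatchFinder never has to track the position itself, which is the "state inside the instance" advantage described above.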

Scala - Does pattern matching break the Open-Closed principle? [duplicate]

If I add a new case class, does that mean I need to search through all of the pattern matching code and find out where the new class needs to be handled? I've been learning the language recently, and as I read about some of the arguments for and against pattern matching, I've been confused about where it should be used. See the following:
Pro:
Odersky1 and
Odersky2
Con:
Beust
The comments are pretty good in each case, too. So is pattern matching something to be excited about or something I should avoid using? Actually, I imagine the answer is "it depends on when you use it," but what are some positive use cases for it and what are some negative ones?
Jeff, I think you have the right intuition: it depends.
Object-oriented class hierarchies with virtual method dispatch are good when you have a relatively fixed set of methods that need to be implemented, but many potential subclasses that might inherit from the root of the hierarchy and implement those methods. In such a setup, it's relatively easy to add new subclasses (just implement all the methods), but relatively difficult to add new methods (you have to modify all the subclasses to make sure they properly implement the new method).
Data types with functionality based on pattern matching are good when you have a relatively fixed set of classes that belong to a data type, but many potential functions that operate on that data type. In such a setup, it's relatively easy to add new functionality for a data type (just pattern match on all its classes), but relatively difficult to add new classes that are part of the data type (you have to modify all the functions that match on the data type to make sure they properly support the new class).
The canonical example for the OO approach is GUI programming. GUI elements need to support very little functionality (drawing themselves on the screen is the bare minimum), but new GUI elements are added all the time (buttons, tables, charts, sliders, etc). The canonical example for the pattern matching approach is a compiler. Programming languages usually have a relatively fixed syntax, so the elements of the syntax tree will change rarely (if ever), but new operations on syntax trees are constantly being added (faster optimizations, more thorough type analysis, etc).
Fortunately, Scala lets you combine both approaches. Case classes can both be pattern matched and support virtual method dispatch. Regular classes support virtual method dispatch and can be pattern matched by defining an extractor in the corresponding companion object. It's up to the programmer to decide when each approach is appropriate, but I think both are useful.
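A minimal sketch of the two styles side by side, with hypothetical names (Expr, Num, Add, Widget, Button, Slider). Marking the data type sealed is a choice made for this sketch: it lets the compiler warn about non-exhaustive matches, which helps when a new case is added.

```scala
// Hypothetical example contrasting the two extension styles.
object DispatchStylesDemo {
  // Pattern-matching style: a fixed set of cases, new functions are cheap.
  sealed trait Expr
  case class Num(value: Int) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr

  def eval(e: Expr): Int = e match {
    case Num(v) => v
    case Add(l, r) => eval(l) + eval(r)
  }

  // A second operation added without touching Num or Add.
  def show(e: Expr): String = e match {
    case Num(v) => v.toString
    case Add(l, r) => s"(${show(l)} + ${show(r)})"
  }

  // Virtual-dispatch style: a fixed set of methods, new subclasses are cheap.
  trait Widget {
    def draw(): String
  }

  class Button(label: String) extends Widget {
    def draw(): String = s"[${label}]"
  }

  // A second widget added without touching Button or any caller of draw().
  class Slider(percent: Int) extends Widget {
    def draw(): String = s"<${percent}%>"
  }
}
```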
While I respect Cedric, he's completely wrong on this issue. Scala's pattern matching can be fully-encapsulated from class changes when desired. While it is true that a change to a case class would require changing any corresponding pattern matching instances, this is only when using such classes in a naive fashion.
Scala's pattern matching always delegates to the deconstructor of a class's companion object. With a case class, this deconstructor is automatically generated (along with a factory method in the companion object), though it is still possible to override this auto-generated version. At all times, you can assert complete control over the pattern matching process, insulating any patterns from potential changes in the class itself. Thus, pattern matching is simply another way of accessing class data through the safe filter of encapsulation, just like any other method.
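Here is a small sketch of that insulation using a regular (non-case) class with a hand-written unapply in its companion object. Person, stored and greeting are hypothetical names; the idea is that patterns only see what the extractor chooses to expose, so the private representation could change without touching any match expression.

```scala
// Hypothetical example: a custom extractor hiding the class representation.
object ExtractorDemo {
  // The representation (one string) is private to the class and its companion.
  class Person(private val stored: String)

  object Person {
    // Patterns see (first, last), regardless of how the name is stored.
    def unapply(p: Person): Option[(String, String)] =
      p.stored.split(" ", 2) match {
        case Array(first, last) => Some((first, last))
        case _ => None
      }
  }

  def greeting(p: Person): String = p match {
    case Person(first, _) => s"Hi, $first"
    case _ => "Hi there"
  }
}
```

If Person later stored the first and last names in separate fields, only unapply would need to change.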
So, Dr. Odersky's opinion would be the one to trust here, particularly given the sheer volume of research he has performed in the area of object-oriented programming and design.
As for where it should be used, that is entirely according to taste. If it makes your code more concise and maintainable, use it! Otherwise, don't. For most object-oriented programs, pattern matching is unnecessary. However, once you begin to integrate more functional idioms (Option, List, etc) I think you'll find that pattern matching will significantly reduce syntactic overhead as well as improving the safety offered by the type system. In general, any time you want to extract data while simultaneously testing some condition (e.g. extracting a value from Some), pattern matching will likely be of use.
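For instance, extracting a value from an Option while testing a condition on it might look like the following sketch; the ports map and the describe function are made up for illustration.

```scala
// Hypothetical example: extract from Option and test a condition in one step.
object OptionMatchDemo {
  val ports: Map[String, Int] = Map("http" -> 80, "postgres" -> 5432)

  def describe(service: String): String = ports.get(service) match {
    case Some(port) if port < 1024 => s"$service: privileged port $port"
    case Some(port) => s"$service: port $port"
    case None => s"$service: unknown"
  }
}
```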
Pattern matching is definitely good if you are doing functional programming. In case of OO, there are some cases where it is good. In Cedric's example itself, it depends on how you view the print() method conceptually. Is it a behavior of each Term object? Or is it something outside it? I would say it is outside, and makes sense to do pattern matching. On the other hand if you have an Employee class with various subclasses, it is a poor design choice to do pattern matching on an attribute of it (say name) in the base class.
Also pattern matching offers an elegant way of unpacking members of a class.

Architecture-independent "pure logic" code generation

I don't really know if any common terms exist for what I'm asking about, so I apologize for possible stupid misuse of the terms.
I'm interested in whether there are any solutions, or at least experiments, for writing "pure logic" code that is abstracted from any architectural patterns, and later generating architecture-specific code from it.
For example:
"pure logic" is addition of two integers — a and b
it can be dumped as inline "= a + b"
or as a function "function sum(a,b){return a+b}; =sum(a,b)"
or as a class "class Sum(a, b){...}; s = new Sum(a,b); =s.result();"
or maybe this class has no constructor arguments but requires applying them after construction
or it accepts a dictionary with dozen possible keys including 2 we need
or maybe we have a DI/IoC container and we call a lazy-loaded singleton service with 2 injected arguments
and so on
So, basically, it's like we have a mix of global functions and variables, and then we apply generation rules and templates to get a specific coder-friendly result.
Basically, you cannot escape having to define some syntax, and giving it semantics. And that gives you a language. In this language you have types (integers) and an operation (you can add them).
So now this business of generating code is basically your compiler for the language, which uses various high level languages as the back end.
Since some of the languages are perhaps not as "pure" as your high level language, or are semantically distant in various ways, the generated code in some of the back-end dialects might end up looking like a dog's breakfast in order to precisely implement the semantics.
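A toy version of that idea, sketched in Scala: the "pure logic" is a tiny expression language with variables and addition, and each back end is just a function that renders it into one of the target shapes from the question. All names here (Expr, Var, Plus, asInline, asFunction) are hypothetical, and the emitted text simply mimics the question's pseudo-syntax.

```scala
// Hypothetical example: one tiny language, two code-generation back ends.
object CodeGenDemo {
  // The "pure logic": variables and addition, nothing else.
  sealed trait Expr
  case class Var(name: String) extends Expr
  case class Plus(left: Expr, right: Expr) extends Expr

  // Shared helper: render the expression itself.
  private def render(e: Expr): String = e match {
    case Var(n) => n
    case Plus(l, r) => s"${render(l)} + ${render(r)}"
  }

  // Shared helper: variables in order of first appearance, without repeats.
  private def variables(e: Expr): List[String] = e match {
    case Var(n) => List(n)
    case Plus(l, r) => (variables(l) ++ variables(r)).distinct
  }

  // Back end 1: emit the logic inline.
  def asInline(e: Expr): String = s"= ${render(e)}"

  // Back end 2: emit a function definition followed by a call to it.
  def asFunction(name: String, e: Expr): String = {
    val params = variables(e).mkString(",")
    s"function $name($params){return ${render(e)}}; =$name($params)"
  }
}
```

Further back ends (a class, a DI-registered service, and so on) would be additional functions over the same Expr, which is exactly a compiler with several targets.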

Interface in a dynamic language?

An interface (or an abstract class with all of its methods abstract) is a powerful weapon in a statically typed language such as C# or Java. It allows different derived types to be used in a uniform way. Design patterns encourage us to use interfaces as much as possible.
However, in a dynamically typed language, objects are not checked for their type at compile time. They don't have to implement an interface to be used in a specific way. You just need to make sure that they have some methods (attributes) defined. This makes interfaces unnecessary, or at least not as useful as they are in a static language.
Does a typical dynamic language (e.g. Ruby) have interfaces? If it does, then what are the benefits of having them? If it doesn't, then are we losing many of the beautiful design patterns that require an interface?
Thanks.
I guess there is no single answer for all dynamic languages. In Python, for instance, there are no interfaces, but there is multiple inheritance. Using interface-like classes is still useful:
Interface-like classes can provide default implementation of methods;
Duck-typing is good, but to an extent; sometimes it is useful to be able to write isinstance(x, SomeType), especially when SomeType contains many methods.
Interfaces in dynamic languages are useful as documentation of APIs that can be checked automatically, e.g. by development tools or asserts at runtime.
As an example, zope.interface is the de facto standard for interfaces in Python. Projects such as Zope and Twisted that expose huge APIs for consumption find it useful, but as far as I know it's not used much outside this type of project.
In Ruby, which is a dynamically-typed language and only allows single inheritance, you can mimic an "interface" via mixins, rather than polluting the class with the methods of the "interface".
Mixins partially mimic multiple inheritance, allowing an object to "inherit" from multiple sources, but without the ambiguity and complexity of actually having multiple parents. There is only one true parent.
To implement an interface (in the abstract sense, not an actual interface type as in statically-typed languages), you define a module as if it were an interface in a static language. You then include it in the class. Voila! You've gathered the duck type into what is essentially an interface.
Very simplified example:
module Equippable
  def weapon
    "broadsword"
  end
end

class Hero
  include Equippable

  def hero_method_1
  end

  def hero_method_2
  end
end

class Mount
  include Equippable

  def mount_method_1
  end
end

h = Hero.new
h.weapon # => "broadsword"
m = Mount.new
m.weapon # => "broadsword"
Equippable is the interface for Hero, Mount, and any other class or model that includes it.
(Obviously, the weapon will most likely be dynamically set by an initializer, which has been simplified away in this example.)

Are static inner classes a good idea or poor design?

I find I have several places where public static inner classes that extend "helper" classes make my code a lot more type safe and, in my opinion, readable. For example, imagine I have a "SearchCriteria" class. There are a lot of commonalities among the different things I search for (a search term and then a group of search term types, a date range, etc.). By extending it in a static inner class, I tightly couple the extension and the searchable class with the specific differences. This seems like a bad idea in theory (Tight Coupling Bad!) but the extension is specific to this searchable class (One Class, One Purpose).
My question is, in your experience, has the use of static inner classes (or whatever your language equivalent is) made your code more readable/maintainable or has this ended up biting you in the EOF?
Also, I'm not sure if this is community wiki material or not.
Sounds perfectly reasonable to me. By making it an inner class, you're making it easy to find and an obvious candidate for review when the searchable class changes.
Tight coupling is only bad when you couple things that don't really belong together just because one of them happens to call the other one. For classes that collaborate closely, e.g. when, as in your case, one of them exists to support the other, then it's called "cohesion", and it's a good thing.
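For comparison, the closest Scala equivalent of a static inner class is a class nested in a companion object. The sketch below assumes a hypothetical ProductCatalog as the searchable class; only SearchCriteria comes from the question, and its fields here are invented for illustration.

```scala
// Hypothetical example: a criteria class nested next to the class it serves.
object NestedClassDemo {
  // The shared "helper" base class from the question (fields are assumed).
  class SearchCriteria(val term: String)

  class ProductCatalog(products: Map[String, Int]) {
    // Only accepts the catalog-specific criteria, which keeps calls type safe.
    def search(c: ProductCatalog.Criteria): List[String] =
      products.collect {
        case (name, price) if name.contains(c.term) && price <= c.maxPrice => name
      }.toList.sorted
  }

  object ProductCatalog {
    // Plays the role of a static inner class: it exists to support ProductCatalog.
    class Criteria(searchTerm: String, val maxPrice: Int)
        extends SearchCriteria(searchTerm)
  }
}
```

Keeping Criteria inside ProductCatalog's companion makes the cohesion visible: anyone changing the catalog finds its criteria right beside it.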
Note that the class is not the unit of reuse. Therefore, some coupling between classes is normal and expected. The unit of reuse is usually a collection of related classes.
In Python, we have a variety of structures.
Packages. They contain modules. These are essentially directories with a little bit of Python machinery thrown in.
Modules. They contain classes (and functions). These are files; and can contain any number of closely-related classes. Often, the "inner class" business is handled at this level.
Classes. These can contain inner class definitions as well as method functions. Sometimes (not very often) inner classes may actually be used. This is rare, since the module-level coupling among classes is usually perfectly clear.
The only caveat with using inner classes is making sure you're not repeating yourself all over the place - as in - make sure, when you define an inner class, you're not going to need to use that functionality anywhere else, and, that that functionality is necessarily coupled with the outer class. You don't want to end up with a whole bunch of inner classes that all implement the exact same setOrderyByNameDesc() method.
The point of "loose coupling" is to keep the two classes separate so that if there are code changes in your "SearchCriteria" class, nothing has to be changed in the other classes. I think that the static inner classes you are talking about could potentially make maintaining code a nightmare. One change in SearchCriteria could send you searching through all of the static classes to figure out which ones are now broken because of the update. Personally, I would stay away from any such inner classes unless they are really needed for some reason.