To me, classes are quite similar to Node.js (CommonJS) modules. You can have many of them, they can be reused, they can use each other, and they are generally one per file.
What makes modules so different from classes? The way you use them differs, and the namespace difference is obvious. Besides that, they seem like the same thing to me, or perhaps I am just missing the obvious benefit here.
Modules are more like packages (to use the Java term) than classes. You don't instantiate a module; there is only one copy of it. It's a tool for organizing related functionality, but it doesn't typically encapsulate the data of a particular instance of an object.
Probably the closest analogue to a class (setting aside those libraries that actually construct class-based inheritance in JavaScript) is just a constructor function. You can of course put such functions inside a module.
function Car() {
    this.colour = 'red';
}
Car.prototype.getColour = function() { return this.colour; };
var myCar = new Car();
myCar.getColour(); // returns 'red'
You use both modules and classes for encapsulation, but the nature of that encapsulation is different.
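To make the difference concrete, here is a minimal sketch that uses an immediately invoked function as a stand-in for a CommonJS module file (Node runs each module file once, inside a wrapper function, and caches the result). The names `counterModule` and `Counter` are hypothetical:

```javascript
// Module-like: the function body runs once, so there is exactly one copy of `count`.
const counterModule = (function () {
  let count = 0; // private, shared by every caller of the "module"
  return {
    increment() { return ++count; },
  };
})();

// Class-like: every `new Counter()` gets its own `count`.
function Counter() {
  this.count = 0;
}
Counter.prototype.increment = function () {
  return ++this.count;
};

counterModule.increment();
counterModule.increment(); // the single module-level count is now 2

const a = new Counter();
const b = new Counter();
a.increment();
a.increment();
b.increment(); // a.count is 2, b.count is 1
```

The module encapsulates one hidden piece of state for everyone; the constructor encapsulates a separate piece of state per instance.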
JS was initially a prototypal inheritance system. It was super simple, like the rest of the language. But then Netscape decided to make it look more like Java and added the idea of constructors to the language. Hence pseudo-classes were born.
See this article for how prototypal OOP is used in JS:
http://howtonode.org/prototypical-inheritance
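For comparison, a minimal sketch of both styles: plain prototypal inheritance with `Object.create`, and the constructor-based pseudo-class shown earlier. `carProto`, `redCar` and `blueCar` are illustrative names:

```javascript
// Prototypal style: an object inherits directly from another object.
const carProto = {
  getColour() { return this.colour; },
};
const redCar = Object.create(carProto); // redCar's prototype is carProto
redCar.colour = "red";

// Constructor ("pseudo-class") style: `new` links the instance to Car.prototype.
function Car(colour) {
  this.colour = colour;
}
Car.prototype.getColour = carProto.getColour;
const blueCar = new Car("blue");

// Both end up as plain objects with a prototype link.
console.log(redCar.getColour());  // prints red
console.log(blueCar.getColour()); // prints blue
console.log(Object.getPrototypeOf(redCar) === carProto);       // true
console.log(Object.getPrototypeOf(blueCar) === Car.prototype); // true
```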
One critical correction: the "generally one-per-file" part is not true; modules are strictly one per file. A require() call, which brings the module's exports into your namespace, has no way of choosing among the things that module exports; it gives you everything that module (file) exports. Attempting to put more than one module into a file only means that you'll get everything in that file when you try to load "either" module.
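A quick way to see this, assuming a CommonJS context in Node and using the built-in `path` module purely as an example:

```javascript
// require() returns the module's single exports object and caches it.
const path1 = require("path");
const path2 = require("path");
console.log(path1 === path2); // true: the module was evaluated once

// Destructuring does not load less of the module; it only picks properties
// off the exports object that require() already returned in full.
const { join } = require("path");
console.log(join === path1.join); // true
```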
I often read that some programming languages have "first-class" support for modules (OCaml, Scala, TypeScript[?]), and I recently stumbled upon an answer on SO citing first-class modules as one of the distinguishing features of Scala.
I thought I knew very well what modular programming means, but after these incidents I'm beginning to doubt my understanding...
I think modules are nothing special, just instances of certain classes that act as mini-libraries. The mini-library code goes into a class, and objects of that class are the modules. You can pass them around as dependencies to any other class that requires the services provided by the module, so any decent OOPL has first-class modules, but apparently not!
1. What exactly is a module? How is it different from, say, a plain class or an object?
2. How is (1) related (or not) to the modular programming that we all know?
3. What exactly does it mean for a language to have first-class modules? What are the benefits? What are the drawbacks if a language lacks such a feature?
A module, as well as a subroutine, is a way of organizing your code. When we develop programs, we pack instructions into subroutines, subroutines into structures, structures into packages, libraries, assemblies, frameworks, solutions, and so on. So, putting everything else aside, it is just a mechanism to organize your code.
The essential reason we use all those mechanisms, instead of just laying out our instructions linearly, is that the complexity of a program grows non-linearly with its size. In other words, a program built from n pieces, each having m instructions, is easier to comprehend than a program built from n*m instructions. This is, of course, not always true (otherwise we could just split our program into arbitrary parts and be happy). In fact, for it to be true, we have to introduce one essential mechanism called abstraction. We benefit from splitting a program into manageable subparts only if each part provides some sort of abstraction. For example, we can have connect_to_database, query_for_students, sort_by_grade, and take_the_first_n abstractions packed as functions or subroutines, and it is much easier to understand code expressed in terms of those abstractions than code in which all those functions are inlined.
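As an illustration, the four abstractions above might look like this in JavaScript; the bodies are hypothetical, with an in-memory array standing in for a real database:

```javascript
// Hypothetical stand-ins for connect_to_database, query_for_students,
// sort_by_grade and take_the_first_n.
function connectToDatabase() {
  return {
    students: [
      { name: "Ada", grade: 91 },
      { name: "Brian", grade: 78 },
      { name: "Chen", grade: 85 },
    ],
  };
}
function queryForStudents(db) { return db.students.slice(); }
function sortByGrade(students) {
  return students.slice().sort((x, y) => y.grade - x.grade); // highest first
}
function takeTheFirstN(list, n) { return list.slice(0, n); }

// The top-level code reads as a sentence built from those abstractions.
const topTwo = takeTheFirstN(sortByGrade(queryForStudents(connectToDatabase())), 2);
console.log(topTwo.map((s) => s.name)); // [ 'Ada', 'Chen' ]
```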
So now we have functions, and it is natural to introduce the next level of organization: collections of functions. It is common to see some functions form families around a common abstraction, e.g., student_name, student_grade, and student_courses all revolve around the same abstraction, student. The same goes for connection_establish, connection_close, etc. Therefore we need some mechanism to tie those functions together. Here we start to have options. Some languages took the OOP path, in which objects and classes are the units of organization: a bunch of functions together with some state is called an object. Other languages took a different path and decided to combine functions into static structures called modules. The main difference is that a module is a static, compile-time structure, whereas objects are runtime structures that have to be created at runtime to be used. As a result, objects naturally tend to contain state, while modules do not (they contain only code). And objects are inherently regular values, which you can assign to variables, store in files, and manipulate in any other way you can manipulate data. Classical modules, unlike objects, do not have a runtime representation, so you can't pass modules as parameters to your functions, store them in a list, or otherwise perform any computations on them. This is basically what people mean by saying first-class citizen: the ability to treat an entity as a simple value.
Back to composable programs. In order to make objects/modules composable, we need to be sure that they create abstractions. For functions, the abstraction boundary is clearly defined: it is the tuple of parameters. For objects, we have the notions of interfaces and classes, while for modules we have only interfaces. Since modules are inherently simpler (they do not include state), we do not have to deal with constructing and destructing them, so we do not need the more complicated notion of a class. Both classes and interfaces are ways to classify objects and modules by some criteria, so that we can reason about different modules without looking into their implementation, the same way we did with the connect_to_database, query_for_students, et al. functions: we reasoned about them based only on their names and interfaces (and probably documentation). Now we can have a class student or a module Student, both defining an abstraction called student, so that we save a lot of brain power without having to deal with how those students are implemented.
And beyond making our programs easier to understand, abstractions give us another benefit: generalization. Since we don't need to reason about the implementation of a function or a module, all implementations are interchangeable to some degree. Therefore, we can write our programs so that they express their behavior in a general way, without breaking the abstractions, and then choose particular instances when we run them. Objects are runtime instances, which essentially means that we can choose our implementation at runtime. Which is nice. Classes, however, are rarely first-class citizens, so we have to invent various cumbersome methods to make the selection, like the Abstract Factory and Builder design patterns. For modules, the situation is even worse: since they are inherently a compile-time structure, we have to choose our implementation at program build/link time, which is not what people want to do in the modern world.
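A minimal sketch of runtime selection using plain JavaScript objects as the interchangeable, stateless "modules"; `JsonFormat`, `CsvFormat` and the `EXPORT_FORMAT` environment variable are assumptions for illustration:

```javascript
// Two hypothetical, interchangeable implementations of one abstraction:
// an object with a serialize(row) function. Neither holds any state.
const JsonFormat = {
  serialize(row) { return JSON.stringify(row); },
};
const CsvFormat = {
  serialize(row) { return Object.values(row).join(","); },
};

// Code written only against the abstraction.
function exportRow(format, row) {
  return format.serialize(row);
}

// Since the implementations are ordinary values, choosing one at runtime is a
// table lookup rather than an Abstract Factory or a relink.
const formats = new Map([["json", JsonFormat], ["csv", CsvFormat]]);
const requested = process.env.EXPORT_FORMAT || "csv"; // hypothetical setting
const row = { id: 1, name: "Ada" };
console.log(exportRow(formats.get(requested) || CsvFormat, row));
```

Because `exportRow` depends only on `serialize`, adding another format means adding one entry to the map.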
And here come first-class modules. Being an amalgamation of modules and objects, they give us the best of both worlds: easy-to-reason-about stateless structures that are, at the same time, pure first-class citizens, which you can store in a variable, put into a list, and use to select the desired implementation at runtime.
Speaking of OCaml, under the hood, first-class modules are simply records of functions. In OCaml, you can even add state to a first-class module, making it practically indistinguishable from an object. This brings us to another topic: in the real world, the separation between objects and structures is not that clear. For example, OCaml provides both modules and objects, and you can put objects inside modules and even vice versa. In C/C++ we have compilation units, symbol visibility, opaque data types, and header files, which enable some sort of modular programming, and we also have structures and namespaces. Therefore, the difference is sometimes hard to tell.
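JavaScript has no module system of this kind, but the "record of functions" view, and how closing over state turns such a record into an object, can be sketched with plain objects. `ListStack` and `makeStackObject` are hypothetical, and this does not show how OCaml represents modules internally:

```javascript
// A hypothetical "Stack module": a record of functions over an explicit value.
// It holds no state itself, so it behaves like a classical module passed as a value.
const ListStack = {
  empty: () => [],
  push: (stack, x) => [x, ...stack],
  top: (stack) => stack[0],
};

// Closing over mutable state turns the same functionality into something
// practically indistinguishable from an object.
function makeStackObject() {
  let items = ListStack.empty();
  return {
    push(x) { items = ListStack.push(items, x); },
    top() { return ListStack.top(items); },
  };
}

const s = ListStack.push(ListStack.push(ListStack.empty(), 1), 2);
console.log(ListStack.top(s)); // 2

const obj = makeStackObject();
obj.push("a");
obj.push("b");
console.log(obj.top()); // b
```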
To summarize: modules are pieces of code with a well-defined interface for accessing that code. First-class modules are modules that can be manipulated as regular values, e.g., stored in a data structure, assigned to a variable, and picked at runtime.
OCaml perspective here.
Modules and classes are very different.
First of all, classes in OCaml are a very specific (and complex) feature. To go into some detail, classes implement inheritance, row polymorphism and dynamic dispatch (aka virtual methods). It allows them to be highly flexible at the expense of some efficiency.
Modules, however, are quite a different thing altogether.
Indeed, you can see modules as atomic mini-libraries, and usually they are used to define a type and its accessors, but they are much more powerful than just that.
Modules allow you to create several types, as well as module types and submodules. Basically, they allow you to create complex compartmentalization and abstraction.
Functors give you behavior similar to C++'s templates, except that they are type-safe. Basically, they are functions on modules, which allow you to parameterize a data structure or algorithm over some other module.
Modules are usually resolved statically and are therefore easy to inline, allowing you to write clear code without fear of a loss in efficiency.
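A rough, dynamically typed analogue of a functor, loosely modeled on OCaml's `Set.Make`, which takes a module providing `compare`: here a function takes an object with a `compare` function and returns a new object of set operations. `MakeSortedSet`, `IntOrd` and `CaseInsensitiveOrd` are hypothetical names, and none of the static checking OCaml performs on functor arguments happens here:

```javascript
// "Functor": takes an ordering "module" and returns a set "module" built on it.
function MakeSortedSet(Ord) {
  return {
    empty: () => [],
    add(set, x) {
      if (set.some((y) => Ord.compare(x, y) === 0)) return set; // already present
      return [...set, x].sort(Ord.compare);
    },
    toArray: (set) => set.slice(),
  };
}

const IntOrd = { compare: (a, b) => a - b };
const CaseInsensitiveOrd = {
  compare: (a, b) => a.toLowerCase().localeCompare(b.toLowerCase()),
};

// Two "modules" produced from the same functor.
const IntSet = MakeSortedSet(IntOrd);
const NameSet = MakeSortedSet(CaseInsensitiveOrd);

let ints = IntSet.empty();
for (const n of [3, 1, 2, 3]) ints = IntSet.add(ints, n);
console.log(IntSet.toArray(ints)); // [ 1, 2, 3 ]

let names = NameSet.empty();
for (const n of ["bob", "Alice", "BOB"]) names = NameSet.add(names, n);
console.log(NameSet.toArray(names)); // [ 'Alice', 'bob' ]
```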
Now, a first-class citizen is an entity that can be put in a variable, passed to a function and tested for equality. In a way, it means they will be dynamically evaluated.
For example, suppose you have a module Jpeg and a module Png that let you manipulate different kinds of images. Statically, you don't know which kind of image you'll need to display, so you can use first-class modules:
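module type IMG_HANDLER = sig val display : string -> unit end
(* Minimal stand-ins for the Jpeg and Png modules assumed above *)
module Jpeg : IMG_HANDLER = struct let display f = print_endline ("JPEG " ^ f) end
module Png : IMG_HANDLER = struct let display f = print_endline ("PNG " ^ f) end
let get_img_type filename =
  match Filename.extension filename with
  | ".jpg" | ".jpeg" -> (module Jpeg : IMG_HANDLER)
  | ".png" -> (module Png : IMG_HANDLER)
  | ext -> invalid_arg ("unsupported extension: " ^ ext)
let display_img img_type filename =
  let module Handler = (val img_type : IMG_HANDLER) in
  Handler.display filename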
The main differences between a module and an object are usually:
Modules are second-class, i.e., they are rather static entities that cannot be passed around as values, while objects can.
Modules can contain types and all other forms of declarations (and types can be made abstract), while objects typically cannot.
However, as you note, there are languages where modules can be wrapped up as first-class values (e.g. Ocaml) and there are languages where objects can contain types (e.g. Scala). That blurs the line a little. There still tend to be various biases towards certain patterns, with different trade-offs made in the type systems. For example, objects focus on recursive types, while modules focus on type abstraction and allowing any definition. It is a very hard problem to support both at the same time without severe compromises, since that quickly leads to an undecidable type system.
As has been mentioned already, "modules", "classes" and "objects" are more like tendencies than strict formal definitions. And if you implement modules as objects for example, as I understand Scala does, then obviously there are no fundamental differences between them, but mostly just syntactic differences that make them more convenient for certain use cases.
In regards to OCaml specifically though, here's a practical example of something you cannot do with modules that you can do with classes because of fundamental differences in implementation:
Modules have functions, which can reference each other recursively using the "rec" and "and" keywords. A module can also "inherit" the implementation of another module using include and override its definitions. For example:
module Base = struct
  let name = "base"
  let print () = print_endline name
end
module Child = struct
  include Base
  let name = "child"
end
But because modules are early bound, that is, names are resolved at compile time, it's not possible to get Base.print to reference Child.name instead of Base.name: Child.print () still prints "base". At least not without significantly altering both Base and Child to enable it explicitly:
module AbstractBase(T : sig val name : string end) = struct
  let name = T.name
  let print () = print_endline name
end
module Base = struct
  include AbstractBase(struct let name = "base" end)
end
module Child = struct
  include AbstractBase(struct let name = "child" end)
end
With classes on the other hand, overriding is trivial and the default:
class base = object (self)
  method name = "base"
  method print = print_endline self#name
end
class child = object
  inherit base
  method! name = "child"
end
Classes can reference themselves through a variable conventionally named this or self (in OCaml you can name it whatever you want, but self is the convention). Their methods are also late bound, meaning calls are resolved at runtime, so a class can call method implementations that didn't exist when it was defined. This is called open recursion.
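The same contrast can be sketched in JavaScript, where a closure plays the role of the early-bound module and `this` gives late binding. The names mirror the OCaml example, and `print` returns the name instead of printing it so the result is easy to check:

```javascript
// Early binding: print() closes over the `name` variable of Base's scope,
// so copying Base into Child and redefining `name` doesn't affect it.
const Base = (function () {
  const name = "base";
  return { name, print: () => name };
})();
const Child = { ...Base, name: "child" };
console.log(Child.print()); // base

// Late binding: print() looks up this.name() when it is called,
// so a subclass override is picked up (open recursion).
class BaseClass {
  name() { return "base"; }
  print() { return this.name(); }
}
class ChildClass extends BaseClass {
  name() { return "child"; }
}
console.log(new ChildClass().print()); // child
```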
So why aren't modules late bound too? Primarily for performance reasons, I think. Doing a dictionary search on the name of every function call would undoubtedly have a significant impact on execution time.
I am reading through O'Reilly's Perl Objects, References & Modules, more specifically its section about modules. It states that when using use Some::Module you can specify an import list. From its explanation, it seems that the only benefit of this list is keeping your namespace clean. In other words, if you have a subroutine some_sub in your main package and the loaded module has a sub with the same name, your subroutine will be overwritten. However, if you specify an import list and leave some_sub out of it, you won't have this conflict. You can then still call the module's version by its fully qualified name: Some::Module::some_sub.
Is there any other benefit than the one I described above? I am asking because in some cases you load modules with loads of functionality, even though you are only interested in a few of their subroutines. At first I thought that by specifying an import list you loaded only those subroutines and avoided bloating memory with ones you wouldn't use anyway. However, from the explanation above, that does not seem to be the case. Can you selectively save resources by loading only parts of a module? Or is Perl smart enough to do this when compiling, without the programmer's intervention?
The documentation for use says that use Module LIST; is exactly equivalent to
BEGIN { require Module; Module->import( LIST ); }
And the documentation for require says:
Otherwise, require demands that a library file be included if it hasn't already been included. The file is included via the do-FILE mechanism, [...]
and do 'file' executes the file as a Perl script. Thus with use we load the whole module.
"Importing" a sub means that its name is added to (or overwritten in) the caller's symbol table (via the CODE slot of the typeglob, normally aliased) by the package's import function. The sub's code isn't copied. Now, import can be written any way the author wants, but generally the import list in the use statement merely controls which symbols are brought into the namespace. The preferred way to provide import in a module is to use Exporter's import method.
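A rough JavaScript analogue of that mechanism, with plain objects standing in for Perl symbol tables: the whole "module" exists once, and `import` only aliases the requested functions into the caller's namespace. `SomeModule` and `main` are hypothetical, and this is not how Exporter is implemented:

```javascript
// The "module" is loaded in full; import() aliases selected names into the caller.
const SomeModule = {
  some_sub() { return "from SomeModule"; },
  other_sub() { return "other"; },
  import(callerNamespace, list) {
    for (const name of list) {
      callerNamespace[name] = this[name]; // alias the same function; no code is copied
    }
  },
};

const main = {
  some_sub() { return "from main"; }, // would be overwritten by a full import
};

SomeModule.import(main, ["other_sub"]); // like: use Some::Module qw(other_sub);

console.log(main.some_sub());                          // from main (no collision)
console.log(main.other_sub === SomeModule.other_sub); // true: same function, aliased
console.log(SomeModule.some_sub());                    // fully qualified call still works
```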
Selective importing spares the symbol table some entries (and perhaps some related mechanisms some work), but I am not aware of any practical benefit from that. The real benefits are for the programmer: fewer chances of name collisions.
Another clear benefit is that it nicely documents what is used in the code.
Note that the "import list" is just a convention. A module's import function is free to do whatever it pleases with this list, and you can see it (ab)used by many so-called pragma modules. Therefore partial loading is NOT tied to use in any way. For example, a module can load lightweight stubs for its heavy functions whether or not you've imported them, and dynamically load the heavy implementation on the first actual call.
Therefore, use with a partial import list may or may not actually save any resources; it all depends on the actual implementation of the module being used.
While require and use do load the entire .pm file, that file could well be just a lightweight stub and loader for the actual code located elsewhere. By convention, such heavy code often lives in a module with a ::Heavy suffix (Exporter::Heavy, for example).
Modules are free to implement partial loading in any way they please as well. Here are just some of the ways a module can save resources:
AUTOLOAD (with its complementary AutoLoader, AutoSplit, and SelfLoader modules).
Use stubs that load necessary submodules.
Dynamically load heavy data (i.e. dictionaries or encoding maps) when they are first accessed by their name.
If you depend on other heavy modules, dynamically require them at runtime in functions that depend on them instead of compile-time use at the very start.
Everything on this list could work automatically behind the scenes, be exposed through the use import list, or be triggered in some other, completely arbitrary way. Once again, it's entirely up to the module's implementation.
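The stub-then-load idea can be sketched in JavaScript as follows. `loadHeavyImplementation` is a hypothetical placeholder for expensive setup, and the sketch illustrates only the general technique, not how AUTOLOAD or SelfLoader work internally:

```javascript
// Counts how often the expensive setup actually runs.
let loadCount = 0;

// Hypothetical placeholder for expensive setup (a big dependency, a dictionary file, ...).
function loadHeavyImplementation() {
  loadCount += 1;
  const table = new Map([["a", 1], ["b", 2]]); // pretend building this is expensive
  return (key) => table.get(key);
}

const lazyModule = {
  // Cheap stub: on the first call it replaces itself with the real function.
  lookup(key) {
    lazyModule.lookup = loadHeavyImplementation();
    return lazyModule.lookup(key);
  },
};

console.log(loadCount);              // 0: nothing heavy loaded yet
console.log(lazyModule.lookup("a")); // 1
console.log(lazyModule.lookup("b")); // 2
console.log(loadCount);              // 1: loaded exactly once
```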
In Java, there is a way to import a class and all of its children in one line:
import java.util.*;
In Perl, I've found I can only import specific classes:
use Perl::Utils::Folder;
use Perl::Utils::Classes qw(new run_class);
Is there a similar way, as in Java, to import everything that falls under a tree structure in Perl?
No, there is not a way to easily do what you are after.
You could walk the relevant paths in your Perl library's filesystem and use every .pm file you come across (that's what Module::Find, as suggested by @Daniel Böhmer, does), but that can miss a few things:
Packages that are declared in funny ways/at runtime.
Multiple packages per module file.
Other cases I haven't thought of.
This is also a bad idea, for a few reasons:
You mentioned "classes" in your question, rather than just packages. Perl packages and subpackages do not necessarily represent classes/instantiable object-oriented code. If you were to programmatically generate a list of all packages in a hierarchy and then call $packagename->new() on each of them, you would get a runtime error if one of the packages was just a library of functions.
Packages and subpackages often are not directly related, developed by the same people, or used for similar things. Just because a package starts with Net:: doesn't mean that it will obey standard conventions that other Net::-prefixed packages expect. For example, File::Find and File::Tail share a prefix, but have very little to do with each other; the prefix is in common because both utilities work with files as their goal.
Lots of packages do things at BEGIN/INIT/etc time when they're compiled. Some of them (sadly) do different things depending on the order in which they're used relative to other modules. The solution to this problem for module developers is "don't do that", but for module users, it's "use sparingly, and only when needed".
It clutters your local namespace with lots of potentially-exported symbols you don't necessarily need (to conditionally import symbols, you'll have to use import arguments like you're doing in your example; there's no programmatic way to define "symbols I'm interested in", since Perl doesn't have that kind of static analysis at compile time, at least not for lots of call styles).
It slows down your program's startup time by compiling things you might not necessarily need. This might not seem important at the early phase of a project, but for larger projects it is very easy to end up in situations where you're pulling in over a thousand CPAN modules when you start Apache (or launch your main script, or whatever), and your app takes more than a minute just to start.
I have a hunch that you're trying to reduce boilerplate (as in: all of your modules have a big block of use statements at the top, and that's duplicated everywhere). There are a few ways to do this, starting with:
Don't: import things in each module as you need them, and use strict/warnings and lots of tests to be told early on if you're calling functionality that you haven't imported yet.
You could also make your own Exporter subclass that uses all of your standard modules and adds the functions that you frequently use from them to its @EXPORT (or splices their @EXPORT onto its own, or uses Exporter sub-sub-classing, or something).
Factor your code so that the parts that depend on multiple imported modules live in a single utility module, and import that.
Factor your code so that the parts that depend on the imported modules live in a parent class, and address its methods via instances of subclasses (or SUPER), so your subclasses don't have to explicitly contain the imports, e.g. $instance->method_that_calls_an_imported_function_in_the_parent();
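The "single utility module" idea, sketched in JavaScript rather than Perl: one module collects the functions you use most often from several others, so consumers import just that one. The `common` object is hypothetical; in a real project it would be the exports of its own file (in Perl, an Exporter-based module listing those functions in @EXPORT or @EXPORT_OK):

```javascript
// A hypothetical "common" module gathering frequently used functions.
const path = require("path");
const util = require("util");

const common = {
  join: path.join,
  basename: path.basename,
  format: util.format,
};

// A consumer then depends on one module instead of repeating several imports.
const { join, format } = common;
console.log(join("lib", "My", "Util.pm"));         // lib/My/Util.pm on POSIX systems
console.log(format("%s has %d items", "list", 3)); // list has 3 items
```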
Also, as an aside, using package.* imports in Java is debatable, and has many of the same drawbacks of doing it in Perl.
In Perl, the class Foo::Bar::Foo may not be a subclass of Foo::Bar. Nor is there any guarantee that a subclass module even has the same class prefix. IO::File is a subclass of IO::Handle, not of IO, which is not a class at all (IO.pm merely loads the IO::* modules).
There also isn't an easy way to tell whether a Perl module is a subclass of another Perl module. There are (at least) three ways to declare a subclass's relationship to a class:
use parent
use base
The @ISA package variable
It is possible to use @INC to find all modules, then look at the source for use parent, use base, and @ISA declarations, build a Perl class matrix, and then go through that matrix to load the classes you do need. That will probably be slow and cumbersome, and it doesn't even cover Moose-based classes.
You're asking the wrong question. You're asking "How do I find all of the subclasses of a particular class?" That will include classes you're probably not even interested in. I know that some distributions (LWP, for example) contain dozens of classes and subclasses with stuff you're not even interested in.
What you should be asking is "What do I need to do?", and then find the classes that fulfill your needs. If these classes happen to be child classes of a particular parent class, these subclasses will load the required class.
We do Java programming here, and one of the standards is not to use asterisks in our import statements. This is considered sloppy programming. If you need a particular class, you should declare it rather than simply declaring a superclass. Many of our reporting tools have problems with asterisk declarations in import statements.
There is a Module::Find module, but I am not sure exactly how it works. I believe it simply assumes that subclasses are in the same module hierarchy as the superclass, but that's far from true in Perl.
In general, I think it is a bad idea to load a whole 'tree' of modules (or subclasses so to speak).
There is definitely something wrong with your design if you need to know everything about subclasses/submodules. You break the rules of encapsulation; you should not need to know how a class handles its responsibilities.
I'm attempting to create a small ORM library for use in a Mojolicious web application. I've grown very fond of Ruby's DataMapper library and would like to emulate some of its behaviour if possible.
In DataMapper you can mix in Resource and then have methods such as all added to your class:
# User.rb
class User
  include DataMapper::Resource
end
...
# Application.rb
users = User.all
For my library I'm attempting to add some package level functionality to modules that inherit from a base Model in order to achieve a similar behaviour.
In essence, I would like to be able to do something approximating the following:
# User.pm
package User;
use base 'Model';
...
# Application.pm
my @users = User::all();
I've had a look around for examples of meta-programming in perl and haven't found anything immediately helpful.
What I'm after is the following:
Alternate perl patterns that achieve similar elegance in a more idiomatic fashion
Ability to inherit subroutines on the package level, as well as the object level
Ability to execute code on 'use' in the scope of the current package, or
Have the current package passed to code that is executed on 'use'
A guide to meta-programming in Perl
An existing declarative ORM library that supports easily creating mock adapters as well as DB2 and MySQL
Ideally I would like to avoid running eval on large strings as much as possible.
Any help would be greatly appreciated :-)
Alternate perl patterns that achieve similar elegance in a more idiomatic fashion
Roles surpass mixins.
Ability to inherit subroutines on the package level, as well as the object level
Roles are normally consumed on the package level, but with some trickery they can also be applied to a single instance (for example with Moose::Util's apply_all_roles, which also accepts an object).
Ability to execute code on 'use' in the scope of the current package
import
Have the current package passed to code that is executed on 'use'
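All parameters on the use statement are passed to import as arguments (after the module's own package name), and the package doing the use can be found inside import with caller().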
A guide to meta-programming in Perl
Moose::Manual, Moose::Cookbook
An existing declarative ORM library that supports easily creating mock adapters as well as DB2 and MySQL
DBIx::Class
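For comparison, in JavaScript class-level ("static") methods are inherited along with instance methods, and `this` inside them is the subclass, which is the behaviour the question asks for. In Perl, the closest equivalent is calling all as a class method (User->all), since method calls follow @ISA while plain function calls such as User::all() do not. A minimal sketch, with hypothetical Model, User and Post classes and an in-memory registry instead of a database:

```javascript
class Model {
  static create(fields) {
    // `this` is the class the method was called on (User, Post, ...).
    const record = Object.assign(new this(), fields);
    if (!Model.registry.has(this)) Model.registry.set(this, []);
    Model.registry.get(this).push(record);
    return record;
  }
  static all() {
    return Model.registry.get(this) || [];
  }
}
Model.registry = new Map(); // hypothetical in-memory storage keyed by class

class User extends Model {}
class Post extends Model {}

User.create({ name: "Ada" });
User.create({ name: "Brian" });
Post.create({ title: "Hello" });

console.log(User.all().map((u) => u.name)); // [ 'Ada', 'Brian' ]
console.log(Post.all().length);             // 1
console.log(User.all()[0] instanceof User); // true
```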
Using Perl and TAP, I have written a lot of Selenium tests and saved them in *.t files.
I have created some helper functions and put them into a non-object-oriented package, say My::Util::SeleniumHelper.
All functions are exported from this module.
In the beginning, one package was sufficient; now the single-module API contains quite a few unrelated functions. These functions are called, for example, make_sel(), head_ok(), cms_logout(), cms_login(), cms_clickthru_simple(), selenium_rc_running(), treecontrol_toggles(); you get the idea.
Moreover, many blocks of code in the t-files are still redundant, making the .t file look like a template.
Thus, I want to give my *.t code a more OO design.
Any ideas on how to design the new API?
Essentially, I am also looking for code examples (here, or on the internet) where someone has extended the selenium object in a clever way. It does not have to be in perl.
Would it be useful to add methods to the Test::WWW::Selenium object $sel?
$sel->my_click_ok()
Should I try to override the $sel object by deriving a Test::WWW::Selenium::Customized class from Test::WWW::Selenium?
This would violate the "Prefer composition over inheritance" idiom.
Should I wrap the Selenium object in another object using composition?
$myobj->{sel}->click_ok()
Here are some more requirements or thoughts:
I also want to use the Page Object pattern, which I am not doing yet.
Maybe like this:
$myobj->{current_page}->loginbox
or
$myobj->do_stuff($current_page->loginbox)
I noticed that in most cases, I'd basically like to give the Selenium methods something like Moose's around() modifier: do the standard thing, but do some things before and after.
However, I prefer not to use Moose here, because the tests need to run on a few different machines, and I don't want to install Moose and all its dependencies on all these PCs. I am not saying it is impossible to use Moose; however, I have not yet used non-Moose objects (Test::WWW::Selenium) and Moose objects together.
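One way to get around()-like behaviour without Moose is plain composition: a helper object holds the Selenium object and wraps each delegated call with before/after hooks. The sketch is in JavaScript; `FakeSelenium` is a hypothetical stand-in for Test::WWW::Selenium, and `call` and the hook names are illustrative, not part of any real API:

```javascript
// Hypothetical stand-in for a Test::WWW::Selenium object; it just records calls.
class FakeSelenium {
  constructor() { this.log = []; }
  click_ok(locator) { this.log.push(`click ${locator}`); return true; }
}

// Composition: the helper holds a Selenium object and wraps every delegated
// call with before/after hooks, similar in spirit to Moose's around().
class SeleniumHelper {
  constructor(sel, hooks) {
    this.sel = sel;
    this.hooks = hooks; // { before(name, args), after(name, result) }
  }
  call(methodName, ...args) {
    this.hooks.before(methodName, args);
    const result = this.sel[methodName](...args);
    this.hooks.after(methodName, result);
    return result;
  }
}

const sel = new FakeSelenium();
const helper = new SeleniumHelper(sel, {
  before: (name) => sel.log.push(`before ${name}`),
  after: (name, ok) => sel.log.push(`after ${name}: ${ok ? "ok" : "not ok"}`),
});
helper.call("click_ok", "id=login");
console.log(sel.log); // [ 'before click_ok', 'click id=login', 'after click_ok: ok' ]
```

The wrapped object is never modified, so the same Selenium instance can still be used directly where no hooks are wanted.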
I'm using Moose and delegation to extend Test::WWW::Selenium. The only thing that's in the extension is configuration stuff (host, port, browser, etc.). Everything else is in roles.
Making a custom class inheriting from the Selenium one seems completely reasonable in this case. Eric's Moose delegation solution is a little cleaner; but a bit more complicated too.
I'm subclassing Test::WWW::Selenium. new() needs to call SUPER::new, but from then on, it looks and tastes like the parent. I've got a new open() that lints the HTML and checks links (memoized, of course).
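A sketch of that subclassing approach in JavaScript, with a hypothetical `FakeSelenium` standing in for the parent class and a trivial in-memory link check instead of real HTTP requests; HTML linting is left out:

```javascript
// Hypothetical stand-in for the parent class (Test::WWW::Selenium in the answer).
class FakeSelenium {
  constructor(options) {
    this.host = options.host;
    this.opened = [];
  }
  open(url) {
    this.opened.push(url);
    return '<html><a href="/home">home</a></html>'; // pretend page source
  }
}

// Inheritance: the constructor calls the parent's (like SUPER::new in Perl),
// and open() runs the inherited behaviour, then adds link checks on top.
class CheckedSelenium extends FakeSelenium {
  constructor(options) {
    super(options);
    this.linkResults = new Map(); // memoized link-check results
    this.checkCalls = 0;
  }
  checkLink(href) {
    if (!this.linkResults.has(href)) {
      this.checkCalls += 1; // a real version would request the URL here
      this.linkResults.set(href, true);
    }
    return this.linkResults.get(href);
  }
  open(url) {
    const html = super.open(url);
    for (const match of html.matchAll(/href="([^"]+)"/g)) {
      this.checkLink(match[1]);
    }
    return html;
  }
}

const sel = new CheckedSelenium({ host: "localhost" });
sel.open("/");
sel.open("/again"); // same page source, so the link is not checked twice
console.log(sel.checkCalls);              // 1
console.log(sel instanceof FakeSelenium); // true
```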