How to localize CPAN module and dependencies - perl

I am trying to localize a CPAN module MooX::Options using Locale::TextDomain after having read "On the state of I18N in perl".
In the discussion on the pull request, the question came up of how to deal with messages that do not originate in the module itself, but in a dependency. In this specific case, when you specify an option on the command line which is not defined anywhere in the code, you'll get the warning:
Unknown option: xyz
originating in the module Getopt::Long, which in itself is not localized yet.
The question is how to deal with these. I see basically three strategies:
Ignore them, which I find dissatisfactory.
Try, in some way or other, to catch all the corner cases and messages in the module I'm currently localizing (in this case MooX::Options), and in this way work around the missing localization in the dependent modules (see the sketch after this list). This option seems brittle, as I'd have to constantly adapt to changes in the base modules. Sometimes, it might be next to impossible to catch messages, as they're written to output streams directly by the modules (as is the case in this example).
Try to localize the dependent modules themselves. This option seems hard to achieve, as different projects might use different I18N tools and strategies themselves and the dependency graph might be huge.
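To make the second strategy concrete, this is roughly what I have in mind (a sketch only: the text domain name is made up, it only works for messages the dependency routes through warn, and the regex has to track Getopt::Long's exact wording):

use Getopt::Long qw(GetOptions);
use Locale::TextDomain 'com.example.myapp';   # hypothetical text domain

my %opts;
{
    # Intercept warnings emitted by the dependency and re-issue them translated.
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        if ( $msg =~ /^Unknown option: (\S+)/ ) {
            warn __x( "Unknown option: {option}\n", option => $1 );
        }
        else {
            warn $msg;    # pass everything else through untranslated
        }
    };
    GetOptions( \%opts, 'verbose!', 'output=s' );
}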
All in all, I think this problem is more general and not that specific to Perl and CPAN modules. So, I'm interested in your thoughts, strategies, and approaches.

I have rather strong opinions on the idea of translating computing terms, and most people disagree with my views, so take what I am saying with a grain of salt.
I do not understand the point of internationalizing a library for parsing command line options unless you want to further ghettoize what is already a small group of users of said library.
Would wget be more useful to Turkish users if instead it was called wal or wgetir? Or, instead of wget --mirror, should Turkish users write getir --ayna? What about that w?
If you just translate the messages, what is the point of outputting a help message in response to wget -h when the Turkish equivalent would be wget -y?
The fact is, almost all attempts at translating programming-related terms I have seen are simply awful. The people who are most eager to translate are usually not in command of either human language, nor do they seem to understand what they are translating.
However, as a result of these eager people, I find that at least the Turkish translations of pretty much any software I touch are just awful. Whatever Danish translations I have seen did not fare much better, but at least they were tolerable, owing to the greater commonality of structure between Danish and English.
I think everyone's energy is better spent on actually making sure their programs handle content, including names of external resources/references, in different languages well, rather than giving me error messages in some Frankenstein language, letting me specify command line options whose mnemonics do not match their descriptions, or presenting menus that consist of strings of words that do not really convey any meaning.
I have felt this way for many decades now ... even back when I was patching IBM PC keyboard drivers with hex editors so people at various places could type reports in WordStar and create charts in Harvard Graphics.
So, my unpopular advice is to put your energy elsewhere ...
For example, use exception objects so the user of your library (who is likely a programmer and will understand "Directory not found" much more readily than "Kütük bulunamadı") can deduce in a human-language independent way what happened, and what message to show the user. I haven't looked closely at MooX::Options, but I notice there is at least one string croak.
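Here is a rough sketch of the kind of thing I mean (the class and field names are made up for this answer, not anything taken from MooX::Options; real code might use Throwable or Exception::Class instead of a hand-rolled class):

package MyApp::X::NotFound;

# A tiny exception class carrying structured data instead of an English string.
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub throw { my $class = shift; die $class->new(@_) }
sub path  { $_[0]->{path} }

package main;

use Scalar::Util qw(blessed);

sub read_config {
    my ($path) = @_;
    -d $path or MyApp::X::NotFound->throw( path => $path );
    # ...
}

eval { read_config('/no/such/dir') };
if ( blessed($@) && $@->isa('MyApp::X::NotFound') ) {
    # The caller decides which human language (if any) to render this in.
    warn "Directory not found: ", $@->path, "\n";
}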
Here is an actual error message from an IBM product:
Belirtilen kütük örüntüsüyle eşleşen hiçbir kütük bulunamadı
You can ask every one of the almost 200 million Turkic people on earth what a "kütük örüntüsü" is, and only the person who actually came up with this nonsensical string of characters will be able to tell you that it corresponds to "file pattern". What, then, do they gain by using the phrase "kütük örüntüsü" instead of "file pattern"? Nothing.
However, they lose the ability to communicate with, and, also, compete with, programmers in the English speaking world.
PS: Apologies for all Turkish examples, but I feel most comfortable drawing abominable examples based on my native language.


Perl SQLite error handling: taking different actions according to the error code

I'm new to Perl programming (and to SO too) so my question may be formulated in a bad way, but I really have read a lot of books and tutorials and I haven't found anything addressing (even mentioning) my problem.
I'm trying to use DBI and SQLite to write some code which retries an insertion query if a recoverable error occurs (DB full or locked, etc.), but dies if the error is unrecoverable (DB deleted or corrupted, etc.).
I've found that the SQLite C interface exports the error codes:
http://www.sqlite.org/c3ref/c_abort.html
but I haven't found anything similar for Perl. I really hope I don't have to use magic numbers in my very first Perl program! :-)
By the way, the documents and examples I saw online explain very well the manual vs. automatic (i.e. with exceptions) error handling in DBI, but none of them shows how to take different actions according to the error type. Isn't this a common use case?
Moreover, they all agree that DBI::err is not the correct variable to tell which error occurred. They more or less implicitly say that DBI::errstr is to be used, but I find it a bit awkward to rely on a string comparison against a human oriented, possibly multi-line error string...
Thanks for any suggestion!
My work with DBI has almost always been with mysql instead of sqlite3, so I can't speak from direct experience. However, don't feel too bad if you absolutely must check for magic strings and numbers. The key thing is how you do it. Whenever you have to rely on magic strings and/or numbers, put them in a constant or hash in the configuration portion of your script (or perhaps even a configuration file).
The bad thing about magic numbers/strings is that they are hard to manage: they may change, or no one quite remembers what generates a given condition, and so on. But this is mitigated if you do it correctly and document it.
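As a sketch of what I mean (the retry policy is illustrative, and whether $dbh->err returns the native SQLite result code depends on your DBD::SQLite version, so check that before relying on it):

use DBI;

# Keep the magic numbers in one documented place.  These are native SQLite
# result codes from http://www.sqlite.org/c3ref/c_abort.html
use constant {
    SQLITE_BUSY   => 5,     # the database is locked by another connection
    SQLITE_LOCKED => 6,     # a table within the database is locked
    SQLITE_FULL   => 13,    # the database or disk is full
};
my %RECOVERABLE = map { $_ => 1 } ( SQLITE_BUSY, SQLITE_LOCKED, SQLITE_FULL );

my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '',
    { RaiseError => 0, PrintError => 0 } );

my $max_tries = 5;
for my $try ( 1 .. $max_tries ) {
    last if $dbh->do( 'INSERT INTO items (name) VALUES (?)', undef, 'widget' );

    my $code = $dbh->err;           # native error code, if the driver exposes it
    die 'Unrecoverable: ' . $dbh->errstr
        unless $RECOVERABLE{ defined $code ? $code : '' };
    sleep 1;                        # back off, then retry the recoverable error
}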
BTW - if you're just getting started, I strongly recommend reading Damian Conway's "Perl Best Practices". I checked, and he doesn't address handling magic strings/numbers, but it's still the best book I've read on Perl. Once you go through PBP, take a look at Perl::Critic -- it's a wonderful tool that will make you cry. It will flag magic strings, and many, many other things :)

What are "not so well defined problems" that LISP is supposed to solve?

Most people agree that LISP helps to solve problems that are not well defined, or that are not fully understood at the beginning of the project.
"Not fully understood"" might indicate that we don't know what problem we are trying to solve, so the developer refines the problem domain continuously. But isn't this process language independent?
All this refinement does not take away the need for, say, developing algorithms/solutions for the final problem that does need to be solved. And that is the actual work.
So, I'm not sure what advantage LISP provides if the developer has no idea where he's going i.e. solving a problem that is not finalised yet.
Lisp (not "LISP") has a number of advantages when you're facing problems that are not well-defined. First of all, you have a REPL where you can quickly experiment with -- that helps in sketching out quick functions and trying to play with them, leading to a very rapid development cycle. Second, having a dynamically typed language is working well in this context too: with a statically typed language you need to "design more" before you begin, and changing the design leads to changing more code -- in contrast, with Lisps you just write the code and the data it operates on can change as needed. In addition to these, there's the usual benefits of a functional language -- one with first class lambda functions, etc (eg, garbage collection).
In general, these advantages have been finding their way into other languages. For example, JavaScript has everything that I have listed so far. But there is one more advantage of Lisps that is still not present in other languages -- macros. This is an important tool when your problem calls for a domain-specific language: in Lisp you can extend the language with constructs that are specific to your problem -- even if these constructs amount to a completely different language.
Finally, you need to plan ahead for what happens when the code becomes more than a quick experiment. In this case you want your language to cope with "growing scripts into applications" -- for example, having a module system means that you can get a more "serious" application. In Racket you can separate your solution into such modules, where each one can be written in its own language -- it even has a statically typed language, which makes it possible to start with a dynamically typed development cycle and, once the code becomes stable and/or big enough that maintenance becomes difficult, switch some modules to the static language and get the usual benefits from that. Racket is actually unique among Lisps and Schemes in this kind of support, but even with others the situation is still far more advanced than in non-Lisp languages.
In AI (Artificial Intelligence) historically Lisp was seen as the AI assembly language. It was used to build higher-level languages which help to work with the problem domain in a more direct way. Many of these domains need a lot of 'knowledge' for finding usable answers.
A typical example is an expert system for, say, oil exploration. The expert system gets (geological) observations as inputs and gives information about the chances of finding oil, what kind of oil, at what depths, etc. To do that it needs 'expert knowledge' of how to interpret the data. When you start a project to develop such an expert system, it is typically not clear what kind of inferences are needed, what kind of 'knowledge' the experts can provide, and how this 'knowledge' can be written down for a computer.
In this case one typically develops new languages on top of Lisp and you are not working with a fixed predefined language.
As an example see this old paper about Dipmeter Advisor, a Lisp-based expert system developed by Schlumberger in the 1980s.
So, Lisp does not solve any problems by itself. But it was originally used to solve problems that are complex to program, by providing new language layers that should make it easier to express the domain 'knowledge', rules, constraints, etc. needed to find solutions that are not straightforward to compute.
The "big" win with a language that allows for incremental development is that you (typically) has a read-eval-print loop (or "listener" or "console") that you interact with, plus you tend to not need to lose state when you compile and load new code.
The ability to keep state around from test run to test run means that lengthy computations that are untouched by your changes can simply be kept around instead of being re-computed.
This allows you to experiment and iterate faster. Being able to iterate faster means that exploration is less of a hassle. Very useful for exploratory programming, something that is typical when dealing with less well-defined problems.

Moose OOP or Standard Perl?

I'm going to write some crawlers for a website. The idea is that the site will use some back-end Perl scripts to fetch data from other sites. My design (in a very abstract way...) will be to write a package, let's say:
package MyApp::Crawler::SiteName
where SiteName will be a module/package for crawling a specific site. I will obviously have other packages that will be shared across different modules, but that's not relevant here.
Anyway, to make a long story short, my question is: why (or why not...) should I prefer Moose over standard OO Perl?
While I disagree with Flimzy's introduction ("I've not used Moose, but I have used this thing that uses Moose"), I agree with his premise.
Use what you feel you can produce the best results with. If the (or a) goal is to learn how to effectively use Moose then use Moose. If the goal is to produce good code and learning Moose would be a distraction to that, then don't use Moose.
Your question however is open-ended (as others have pointed out). There is no answer that will be universally accepted as true, otherwise Moose's adoption rate would be much higher and I wouldn't be answering this. I can really only explain why I choose to use Moose every time I start a new project.
As Sid quotes from the Moose documentation, Moose's core goal is to be a cleaner, standardized way of doing what Object Oriented Perl programmers have been doing since Perl 5.0 was released. It provides shortcuts to make doing the right thing simpler than doing the wrong thing -- something that is, in my opinion, lacking in standard Perl. It provides new tools to make abstracting your problem into smaller, more easily solved problems simpler, and it provides a robust introspection and meta-programming API that tries to normalize the bestiary that is hacking Perl internals from Perl space (i.e. what I used to refer to as Symbol Table Witchery).
I've found that my natural sense of how much code is "too much" has been reduced by 66% since I started using Moose[^1]. I've found that I more easily follow good design principles such as encapsulation and information hiding, since Moose provides tools that make doing so easier than not. Because Moose automatically generates much of the boilerplate that I normally would have to write out (such as accessor methods, delegate methods, and other such things), I find that it's easier to quickly get up to speed on what I was doing six months ago. I also find myself writing far less tricky code just to save myself a few keystrokes than I would have a few years ago.
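For a feel of how much boilerplate disappears, here is a small illustrative sketch (the class, attribute, and delegation names are invented for this answer, not taken from a real project):

package MyApp::Crawler::Fetcher;

use Moose;
use LWP::UserAgent;

# One declaration buys a read-only accessor, a type check, a lazily built
# default, and a delegated fetch() method.
has ua => (
    is      => 'ro',
    isa     => 'LWP::UserAgent',
    lazy    => 1,
    default => sub { LWP::UserAgent->new },
    handles => { fetch => 'get' },    # $self->fetch($url) calls $self->ua->get($url)
);

__PACKAGE__->meta->make_immutable;

1;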
It is possible to write clean, robust, elegant Object Oriented Perl that doesn't use Moose[^2]. In my experience it requires more effort and self-control. I've found that in those occasions where the project demands I can't use Moose, my regular Object Oriented code has benefited from the habits I have picked up from Moose. I think about it the same way I would think about writing it with Moose, and then type three times as much code, writing out by hand what I expect Moose would generate for me[^3].
So I use Moose because I find it makes me better programmer, and because of it I write better programs. If you don't find this to be true for you too, then Moose isn't the right answer.
[^1]: I used to start to think about my design when I reached ~300 lines of code in a module. Now I start getting uneasy at ~100 lines.
[^2]: Miyagawa's code in Twiggy is an excellent example that I was reading just today.
[^3]: This isn't universally true. There are several stories going around about people who write less maintainable, horrific code by going overboard with the tools that Moose provides. Bad programmers can write bad code anywhere.
You will find the answer to why you should use Moose in its documentation:
The main goal of Moose is to make Perl 5 Object Oriented programming easier, more consistent, and less tedious. With Moose you can think more about what you want to do and less about the mechanics of OOP.
From my experience, and probably others will tell you the same, Moose reduces your code size dramatically. It has a lot of features, and standard things like validation, requiring values on creation of an object, lazy validation, and default values are just so easy and readable that you will never want to miss Moose.
Use Moose. This is from something I wrote last night (using Mouse in this case). It should be pretty obvious what it does, what it validates, and what it sets up. Now imagine writing the equivalent raw OO (a rough hand-rolled version is sketched after the snippet). It's not super hard, but it does start to get much harder to read, and not just the code proper but the intent, which can be the most important part when reading code you haven't seen before or in a while.
has "io" =>
is => "ro",
isa => "FileHandle",
required => 1,
handles => [qw( sysread )],
trigger => sub { binmode +shift->{io}, ":bytes" },
;
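For comparison, a rough hand-rolled equivalent might look something like this (an illustrative sketch only; the package name is invented and the type check is approximate):

package IO::Wrapper;

use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(blessed);

sub new {
    my ($class, %args) = @_;

    croak "Attribute 'io' is required"
        unless exists $args{io};
    croak "Attribute 'io' must be a filehandle object"
        unless blessed( $args{io} ) && $args{io}->isa('IO::Handle');

    my $self = bless { io => $args{io} }, $class;
    binmode $self->{io}, ':bytes';    # what the trigger did for free
    return $self;
}

sub io { $_[0]->{io} }                # read-only accessor

sub sysread {                         # hand-written delegation
    my $self = shift;
    return $self->{io}->sysread(@_);
}

1;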
I wrote a big test class last year that also used the handles functionality to redispatch a ton of methods to the underlying Selenium/WWWMech object. Disappearing this sort of boilerplate can really help readability and maintenance.
I've never used Moose, but I've used Catalyst, and have extensive experience with OO Perl and non-OO Perl. And my experience tells me that the best method to use is the method you're most comfortable using.
For me, that method has become "anything except Catalyst" :) (That's not to say that people who love and swear by Catalyst are wrong--it's just my taste).
If you already have the core of a crawler app that you can build on, use whatever it's written in. If you're starting from scratch, use whatever you have the most experience in--unless this is your chance to branch out and try something new, then by all means, accomplish your task while learning something new at the same time.
I think this is just another instance of "which language is best?" which can never be answered, except by the individual.
When I learned about objects in Perl, the first thing I thought was: why is it so complicated, when Perl usually tries to keep things simple?
With Moose I see that uncomplicated OOP is possible in Perl. It sort of brings Perl OOP back to a manageable level.
(yes, I admit, I don't like perl's object design.)
Although this was asked 10 years ago, much has changed in the Perl world. I think most people now see that Moose didn't deliver all people thought it might. There are several derivative projects, lots of MooseX shims, and so on. That much churn is the code smell of something that's close to useful but not quite there. Even Stevan Little, the original author, was working on its replacement, Moxie, but it never really went anywhere. I don't think anyone was ever completely satisfied with Moose.
Currently, the future is probably not Moose, irrespective of your estimation of that framework. I have a chapter for Moose in Intermediate Perl, but I'd remove it in the next edition. That's not a decision from quality, just on-the-ground truth from what's happening in that space.
Ovid is working on Corinna, a similar but also different framework that tries to solve some of the same problems. Ovid outlines various things he wants to improve, so you can use that as a basis for your own decision. No matter what you think about either Moose or Corinna, I think the community is now transitioning out of its Moose phase.
You may want to listen to How Moose made me a bad OO programmer, a short talk from Tadeusz Sośnierz at PerlCon 2019.
Even if you're skeptical, I'd say try Moose in a small project before you commit to reconfiguring a large codebase. Then try it in a medium project. I tend to think that frameworks like Moose look appealing in the sorts of examples that fit on a page, but don't show their weaknesses until the problems get much more complex. Don't go all in until you know a little more.
You should at least know the basics of Moose, though, because you are likely to run into other code that uses it.
The Perl 5 Porters may even include this one in core perl, but a full delivery may happen in the five-to-ten-year range, and even then you'd have to insist on a supported Perl. Most people I encounter are about three to five years behind on Perl versions. If you control your perl version, that's not a problem; not everyone does, though. Holding out for Corinna may not be a good short-term plan.

Are MooseX::Declare and MooseX::Method::Signatures production ready?

From the current version (0.98) of the Moose::Manual::MooseX are the lines:
We have high hopes for the future of MooseX::Method::Signatures and MooseX::Declare. However, these modules, while used regularly in production by some of the more insane members of the community, are still marked alpha just in case backwards incompatible changes need to be made.
I noticed that for MooseX::Method::Signatures the change log for September 2009 mentions the removal of the "scary ALPHA disclaimer".
So, are these still "alpha"?
Would I still be considered one of the "more insane" to use them?
I'd say they are production ready - I'm using them in production - but there are several things to consider:
Performance
MooseX::Declare and its dependencies do almost all of their magic at compile time. Depending on the size of your program, you might find anywhere from half a second to several seconds of additional initialization overhead. If this is a problem, don't use MooseX::Declare.
At runtime, the main overhead is type and argument checking, which you should (ideally) be doing anyway. That said, Moose type constraints have some overheads, namely coercion and the more complex (MooseX::Types::Structured-style) constraints. Don't use these if performance is an issue.
Stability
MooseX::Declare and MooseX::Method::Signatures' external syntax is now stable. But it is important to know that the internals are subject to extreme change. (Fortunately, changes for the better.)
To give you an idea, the signature itself is grabbed using a big block of C code stolen from the Perl tokenizer (toke.c). This can break in some situations since it isn't actually parsing anything. The bit inside the brackets is parsed using PPI, which is designed for pure Perl, but the resulting PPI tree is then hacked up to get something useful. Devel::Declare itself is a hack - after it sees specific keywords (e.g. 'role', 'class', 'method') the Devel::Declare-using module must rewrite the source code by hand, with no interaction with the real Perl parser.
Corner cases may cause Perl to segfault, or rewrite the source code badly, so you get syntax errors but have no idea what's causing them without -MO=Deparse. If you mess up the MooseX::Declare syntax by accident, there is no guarantee that the module will detect this and give you a sensible error. The ALPHA message may have gone, but this is still doing dark and scary things internally, and you should be prepared for that.
UPDATE
MooseX::Declare has not been updated much, and you may wish to look at alternatives such as Moops. Personally, I have decided to stick with pure Moose until Perl itself begins to support class/method/has syntax natively, which is possibly on the cards.
I think it's a matter of differing perspectives as much as anything -- rafl is one of the aforementioned "more insane members of the community" while Rolsky is more conservative. It's up to you to decide who you agree with, and really I think that the most important variable is your own code.
MooseX::Declare is good code. It won't randomly blow up your machine, it's not awful for performance, and it offers a lot of nifty stuff while reducing the amount of boilerplate that you have to write. But it might change in the future, making your code refuse to compile until it's updated; it might confuse your editor and other development tools when they see syntax they can't parse; it might piss off your collaborators by making them learn a new module to work with your code; or it might piss off your boss by making it so any future maintainer has to learn a new module to work with your code. Which of those things apply to you, and to what degree? You know better than I do, I hope.
There are people who feel that the maturity and stability of MooseX::Declare, of Devel::Declare on which it's based, or even of Moose itself are not yet ready for "prime time". I also know of two large companies with millions of visitors a month who have MooseX::Declare in their production environment. I personally am happy with the stack Moose provides and do not yet see a need to bring in MooseX::Declare. I know people whose opinion I deeply respect who refuse to write new code without the declarative sugar from MooseX::Declare.
All of this is to say, the decision on whether something is or is not production ready is highly dependent upon your production environment, your development needs, and taste for risk. Without being in your shoes we can't possibly give an informed decision as to how well any given tool matches that profile.
MooseX::Method::Signatures (MXMS), and MooseX::Declare which uses it, are not production ready. This is not because the code isn't stable, but because it is appallingly slow. Simply using the method keyword, with no types or arguments, is a 500-1000x runtime performance hit over a regular method call. My MacBook Pro can do about 6,000 simple method calls per second using MXMS vs 5,000,000 with plain Perl.
Method::Signatures, in contrast, has almost no performance hit above what it would normally cost to do the requested checks. The syntax is almost exactly the same as MXMS and it supports Moose (and Mouse) types. Both rely on the same underlying syntax modifying technique. (Full disclosure, I am the author of Method::Signatures.)
If you like MooseX::Declare but want the performance of Method::Signatures, try Method::Signatures::Modifiers.
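If you want to check the overhead on your own machine, a quick Benchmark comparison along these lines should do it (a sketch, assuming MooseX::Method::Signatures installs cleanly on your perl; a Method::Signatures variant can be added the same way):

use strict;
use warnings;
use Benchmark qw(cmpthese);

{
    package Plain;
    sub new { bless {}, shift }
    sub inc { my ($self, $n) = @_; return $n + 1 }
}

{
    package Sugared;
    use Moose;
    use MooseX::Method::Signatures;

    method inc (Int $n) { return $n + 1 }
}

my $plain   = Plain->new;
my $sugared = Sugared->new;

# Calls per second for the same trivial method, with and without MXMS.
cmpthese( -2, {
    plain => sub { $plain->inc(1) },
    mxms  => sub { $sugared->inc(1) },
} );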
It depends on what you mean by "production ready". I wouldn't depend on them until their velocity slows down quite a bit. I like my production stuff to not need frequent care from external code changes, API adjustments, and so on. That's not something particular to Moose, but any young project.
You have to judge how much that matters to you. In some situations, pushing stuff into production is a lengthy process, so you must be circumspect with such things. At the other extreme, some places let you edit files directly on the production server. That is, you have to define your tolerance before anyone can tell you which side a given MooseX module is on.
MooseX::Declare and MooseX::Method::Signatures work well, but they can carry a really nasty performance penalty depending on what your code does. This can be avoided by simply not using the method keyword, or by using Method::Signatures::Modifiers.
The performance penalty I am seeing is around 2-5x compared to Method::Signatures::Modifiers (the 5x being mostly for one specific class I was using). And it seems to be mostly compile time, or maybe first-time initialization, because it drops under 2x when the computation runs longer.
Method::Signatures::Modifiers has better errors, but you have to turn this optimization off when you use the debugger (it goes haywire because it does not see these methods; you can see for yourself in the -MO=Deparse output).
It may be worth it to get rid of Perl argument shifting hell.
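For the record, the "argument shifting hell" versus the declarative style looks roughly like this (a sketch; the class and method names are invented):

{
    package Box::Plain;

    # Plain Perl: unpack and validate @_ by hand in every method.
    sub set_size {
        my ($self, $width, $height) = @_;
        die "width and height are required"
            unless defined $width && defined $height;
        @{$self}{qw(width height)} = ($width, $height);
        return $self;
    }
}

{
    package Box::Sugared;
    use Method::Signatures;    # MooseX::Method::Signatures reads much the same

    # $self is unpacked for you, and missing positional arguments raise an error.
    method set_size ($width, $height) {
        @{$self}{qw(width height)} = ($width, $height);
        return $self;
    }
}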

When should I use OO Perl?

I'm just learning Perl.
When is it advisable to use OO Perl instead of non-OO Perl?
My tendency would be to always prefer OO unless the project is just a code snippet of < 10 lines.
TIA
From Damian Conway:
10 criteria for knowing when to use object-oriented design
Design is large, or is likely to become large
When data is aggregated into obvious structures, especially if there’s a lot of data in each aggregate
For instance, an IP address is not a good candidate: there are only 4 bytes of information related to an IP address. An immigrant going through customs has a lot of data related to him, such as name, country of origin, luggage carried, destination, etc.
When types of data form a natural hierarchy that lets us use inheritance.
Inheritance is one of the most powerful features of OO, and the ability to use it is a flag.
When operations on data vary with the data type
GIFs and JPGs might have their cropping done differently, even though they’re both graphics.
When it’s likely you’ll have to add data types later
OO gives you the room to expand in the future.
When interactions between data are best shown by operators
Some relations are best shown by using operators, which can be overloaded.
When implementation of components is likely to change, especially in the same program
When the system design is already object-oriented
When huge numbers of clients use your code
If your code will be distributed to others who will use it, a standard interface will make maintenance and safety easier.
When you have a piece of data on which many different operations are applied
Graphics images, for instance, might be blurred, cropped, rotated, and adjusted.
When the kinds of operations have standard names (check, process, etc)
Objects allow you to have DB::check, ISBN::check, Shape::check, etc. without conflicts between the different kinds of check.
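A minimal sketch of that last point (the check bodies are placeholders):

{
    package ISBN;
    sub new   { my ($class, %args) = @_; return bless { %args }, $class }
    sub check { my $self = shift; return $self->{number} =~ /^[\d-]{10,17}$/ }
}

{
    package Shape;
    sub new   { my ($class, %args) = @_; return bless { %args }, $class }
    sub check { my $self = shift; return $self->{sides} >= 3 }
}

# The same method name, dispatched to the right implementation per object:
for my $thing ( ISBN->new( number => '0-596-00173-8' ),
                Shape->new( sides  => 4 ) ) {
    print ref($thing), ( $thing->check ? ' passed' : ' failed' ), "\n";
}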
There is a good discussion about the same subject at PerlMonks.
Having Moose certainly makes it easier to always use OO from the word go. The only real exception is if compilation start-up is an issue (Moose does currently have a compile time overhead).
I don't think you should measure it by lines of code.
You are right that often, when you are just writing a simple script, OO is probably too much overhead, but I think you should be more flexible regarding the 10-line approach.
In all cases, when you are using OO Perl, remember to use Moose (or Mouse).
This question doesn't have that much to do with Perl. The question is "when, given a choice, should I use OO?" That "given a choice" bit is because in some languages (Java, for example), you really don't have any choice.
The answer is "when it makes sense". Think about the problem you're trying to solve. Does the problem fit into the OO concepts of classes and object? If it does, great, use OO. Otherwise use some other paradigm.
Perl is fairly flexible, and you can easily write procedural, functional, or OO Perl, or even mix them together. Don't get hung up on doing OO because everyone else is. Learn to use the right approach for each task.
All of this takes experience and practice, so make sure to try all these approaches out, and maybe even take some smaller problems and solve them in multiple ways to see how each works.
Damian Conway has a passage in Perl Best Practices about this. It is not a rule that you have to follow, but it is probably better advice than I can give without knowing a lot about what you are doing.
Here is the publisher's page if that is a better place to link to the book.