How can I add internationalization to my Perl script?

I'm looking at introducing multi-lingual support to a mature CGI application written in Perl. I had originally considered rolling my own solution using a Perl hash (stored on disk) for translation files but then I came across a CPAN module which appears to do just what I want (i18n).
Does anyone have any experience with internationalization (specifically the i18n CPAN module) in Perl? Is the i18n module the preferred method for multi-lingual support or should I reconsider a custom solution?

There is a Perl Journal article on software localisation. It will give you a good idea of what to expect when adding multi-lingual support. It's beautifully written and humorous.
Specifically, the article is written by the folks who wrote and maintain Locale::Maketext, so I would recommend that module simply based upon the amount of pain it is clear the authors have had to endure to make it work correctly.
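For a feel of what Locale::Maketext looks like in practice, here is a minimal sketch. The MyApp::L10N namespace, the greet helper, and the sample phrases are assumptions made up for illustration; only get_handle, maketext, %Lexicon, and _AUTO come from the module itself.

```perl
use strict;
use warnings;

# Base class for the project's translations. The MyApp::L10N namespace is
# hypothetical; Locale::Maketext looks for one subclass per language tag.
package MyApp::L10N;
use parent 'Locale::Maketext';

package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (
    _AUTO => 1,    # English keys double as their own translations
);

package MyApp::L10N::fr;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (
    'Hello, [_1]!' => 'Bonjour, [_1] !',    # [_1] is the first argument
);

package main;

# Return the translated greeting for a language tag such as 'en' or 'fr'.
sub greet {
    my ($lang, $name) = @_;
    my $lh = MyApp::L10N->get_handle($lang)
        or die "No lexicon available for '$lang'";
    return $lh->maketext('Hello, [_1]!', $name);
}

print greet('fr', 'Alice'), "\n";
print greet('en', 'Alice'), "\n";
```

In a real application each language class would normally live in its own .pm file, and the language tag would come from the request rather than being hard-coded.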

If you have the time, do take a look at the way I18N is done in the Jifty framework. Although initially quite confusing, it is very elegant and usable.
They overload _ so that you can use _("text to translate") anywhere in the code. These strings are then translated using Locale::Maketext as normal.
What makes it really powerful is that they defer the translation until the string is needed, using Scalar::Defer, so that you can start adding the strings at any time, even before you know which language they will be translated into (for example, in config files). This really makes I18N easy to work with.
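The deferral idea can be sketched without Jifty. The example below is not Jifty's code and does not use Scalar::Defer; it substitutes a small hypothetical class (Deferred::Text) that overloads stringification, and a toy hash in place of a real Locale::Maketext lexicon. The function is called loc rather than _ to keep the sketch simple.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Scalar::Defer: an object that runs its code
# reference every time it is used as a string.
package Deferred::Text;
use overload '""' => sub { $_[0]->() }, fallback => 1;

sub new {
    my ($class, $code) = @_;
    return bless $code, $class;
}

package main;

# Toy lexicon; a real application would call Locale::Maketext here.
my %lexicon = ( fr => { 'Save' => 'Enregistrer' } );
my $current_lang = 'en';

# Mark a string for translation without translating it yet.
sub loc {
    my ($text) = @_;
    return Deferred::Text->new(
        sub { $lexicon{$current_lang}{$text} // $text }
    );
}

my $label = loc('Save');    # created before the language is known
$current_lang = 'fr';       # decided later, for example per request
print "$label\n";           # the lookup happens only here
```

Because the lookup runs at stringification time, the same $label object yields a different translation if the language changes between requests.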


Is this worth doing for practice and learning, without modules (Perl)

I'm looking into connecting to a secure FTP site (using Perl) and downloading all the .log files, saving them in new directories named after the day I downloaded the files. I want to do this without modules, as a learning experience, but before I start I wanted to know if you guys thought it was worth doing, or whether it is way too much for a relatively new programmer and I should just learn the modules.
If it's production work, no, use the modules. Your implementation will be buggy, missing features and unknown to the next person maintaining that code.
Otherwise, yes. It's good to learn the principles of a network protocol. I do have a reservation about FTP as it is a bit baroque, insecure, inefficient and on its way out. scp, HTTP or rsync would be more useful to put your energy into.
I'd start with reading the RFC and putting together your own FTP module using just network sockets. Document and test it as if you were going to release it to CPAN, as a full learning exercise in making a network module. Run it against various FTP server implementations, as they often interpret the spec differently (or not at all). Don't be afraid to cheat and look at what the existing modules do. Who knows, you might write something better than what's already there.
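To give a flavour of what working from the RFC involves, here is a small sketch of two pieces a hand-rolled plain-FTP client (RFC 959) needs: splitting a control-connection reply into code and text, and decoding the reply to PASV. The function names and the sample reply are made up for illustration, and no socket code is included.

```perl
use strict;
use warnings;

# Parse one FTP control-connection reply line into its three-digit code
# and text. A "-" after the code marks a line of a multi-line reply.
sub parse_reply {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    my ($code, $sep, $text) = $line =~ /\A(\d{3})([ -])(.*)\z/
        or return;
    return { code => $code, more => $sep eq '-', text => $text };
}

# Extract host and port from a 227 reply to PASV: the six numbers are
# four address bytes followed by the high and low bytes of the port.
sub parse_pasv {
    my ($text) = @_;
    my @n = $text =~ /(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})/
        or return;
    return ( join('.', @n[0 .. 3]), $n[4] * 256 + $n[5] );
}

my $reply = parse_reply("227 Entering Passive Mode (192,168,1,10,19,137)\r\n");
my ($host, $port) = parse_pasv($reply->{text});
print "$host:$port\n";    # the data connection would be opened to this address
```

Keeping the parsing in pure functions like these makes it easy to test against the odd replies that different servers send, before any network code is involved.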
Learning the principles, just as we did at school for long multiplication and division, means we know how things work when we use a shorthand.
However, when you are new to the world, it is like learning to speak: you did "A is for Apple" and so on; nobody explained the finer points of grammar, and you learnt to express yourself well enough to be understood.
Programming is a little like that. In an ideal world you can easily argue that a prewritten generic library is often far less efficient than a specifically targeted set of routines, but if the wheel you need has already been invented, it is a lot of work to make a new one.
So use the wheels and cogs available, and once you really have the hang of it, then look at inventing your own, more efficient ones.
Regarding CPAN modules:
Modules are a great learning source. There are a zillion modules, and you can learn a lot by studying some of them.
And as you master Perl, you will start writing your own modules. Since your program will use modules anyway (your own), you can ask: why not use modules that are already developed and debugged?
So learn the Perl basics, study some modules (for example Net::SFTP), and if you still want to write your own solution, it is up to you. :)

Mixed language web dev environments

I have inherited a broad, ill-designed web portfolio at my job. Most pages are written in Perl as most of the data ingested, processed, and displayed on the site comes in the form of flat-files which then have to be meticulously regexed and databased in our MySQL and Oracle databases.
As the first IT-trained manager of this environment, I am taking it upon myself to scrub the websites and lay down some structure to the development process. One of the choices I have been given is to choose whether or not to continue in Perl. There is substantial in-house talent for Java and PHP is rather easy to learn. I have considered taking the reins off the developers and letting them choose whatever language they want to use for their pages but that sounds like it might be trouble if the guy who chose PHP gets hit by a bus and no one else can fix it.
As the years go by, hiring Perl programmers becomes more and more difficult, and the complexity of maintaining legacy Perl code from previous developers, whose main focus may have been just getting a page up and running, is becoming very resource-consuming. A previous (non-IT) manager was more focused on quick turnaround and immediate gratification of pages than on ensuring that it was done right the first time (he has since been promoted outside our branch).
The production server is Solaris. MySQL has most of our data but new projects have begun using Oracle more and more (for GIS data). Web servers are universally Apache. We live in an intranet disconnected from the regular internet. Our development is conducted in an Agile, iterative manner.
Whatever language is chosen to push forward into, there are resources to have the existing development staff re-trained. No matter what, the data that comes into our environment will have to be regexed to death so perl isn't going away anytime soon. My question to the community is, what are the pros and cons of the following languages for the above-defined web dev environment: Perl, PHP, Java, Python, and --insert your favorite language here--. If you had it to do all over again, which language set-up would you have chosen?
Edits and Clarifications:
Let me clarify a bit on my original post. I'm not throwing everything away. I've been given the opportunity to adjust the course of the ship to what I believe is a better heading. Even if I chose a new language, the perl code will be around for some time to come.
Hypothetically speaking, if I chose Assembly as my new language (haha) I would have to get the old developers up to speed probably by sending them to some basic assembly classes. New pages/projects would be in the new language and the old pages/projects would have to play nice with the new pages/projects. Some might one day be rewritten into the new language, some may never be changed.
What will likely always be in Perl will be the parsing scripts we wrote years ago to sift through and database information from the flat files. But that's okay because they don't interface with the webpages, they interface with the database.
Thank you all for your input, it has been very very helpful thus far.
It seems that your problem is more legacy code and informal development methodology than the language per se. So if you already have Perl developers on staff, why not start modernizing your methods and your code base, instead of switching to a new language and creating a heterogeneous code base?
Modern Perl offers a lot in terms of good practices and powerful tools: testing is emphasized, with the Test::* modules and WWW::Mechanize, data base interaction can be done through plain DBI, but also using ORM modules like the excellent DBIx::Class, OO with Moose is now on par with more modern languages, mod_perl gives you access to a lot of power within Apache. There are also quite a few MVC frameworks for Perl. One that's getting a lot of attention is Catalyst.
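As a small illustration of the testing culture mentioned above, here is a sketch of a Test::More test for a flat-file parsing routine. The parse_record function and the pipe-delimited id|name|date layout are assumptions invented for the example, not something taken from the original site.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical parser for one pipe-delimited flat-file record; the field
# layout (id|name|date) is an assumption for illustration.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my ($id, $name, $date) = split /\|/, $line, -1;
    return unless defined $date && $id =~ /\A\d+\z/;
    return { id => $id, name => $name, date => $date };
}

is_deeply(
    parse_record("42|Widget|2009-01-15\n"),
    { id => 42, name => 'Widget', date => '2009-01-15' },
    'well-formed record is parsed',
);

my $bad = parse_record("garbage");
is( $bad, undef, 'malformed record is rejected' );

done_testing();
```

Tests like this can be wrapped around legacy parsing code before it is refactored, so that changes in behaviour show up immediately.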
Invest in a few copies of Perl Best Practices, bring in a proper trainer for a few classes on modern development methods, and start changing the culture of your group.
And if you have trouble finding developers that are already proficient in Perl, you can always hire good PHP people and train them, that shouldn't be too difficult. At least their willingness to learn a new language would be a good sign of their flexibility and will to improve.
It is always tempting to blame the state of your code on the language it's written in, but I am not sure that applies in your case. Lots of big companies seem to have no problem managing huge code bases in Perl; the list is long, but the main Web companies are all there, along with a number of financial institutions.
I would bring in someone who is very good at Perl, to at least look at the current design. They would be able to tell you how bad the Perl code really is, and what needs to be done to get it into good shape.
At that point I would start considering my options. If the Perl code is salvageable, then great: hire someone proficient in Perl. Also train some of your existing staff to help on the existing code base. If you don't have someone proficient in Perl in charge of the Perl code, your code base may become even worse than it already is.
Only if it was in terrible shape would I consider abandoning it for another language. What that language is, you're going to have to think about yourself.
p.s. I'm a bit biased, I prefer Perl
If regex is important I would choose a language with good support.
If you use Java, you will not be able to just copy and paste your regexes from the Perl code, because the backslashes have to be escaped inside Java string literals. So I would vote against Java.
I'm not familiar enough with PHP to know its regex features, but given your choices I would go for Python. You can create cleaner code in Python.
Would Ruby also be an option? It also has good Perl-like regex support, and Rails supports agile web development out of the box.
First off, let me point out that MySQL's spatial extensions work with GIS.
Second, if you have a bunch of Perl programmers who will need to work on the new sites, then your best bet is to choose something they won't have too much trouble understanding. The obvious "something" there is PHP. When I learnt PHP I'd done some Perl years before, and I picked up PHP in no time at all.
Switching to something like Java or .Net (or even Ruby on Rails) would be a far more dramatic shift in design.
Plus with Apache servers you already have your environment set up and you can probably stage any development as a mix of Perl and PHP reasonably easily.
As to the last part of your question: what would you do it in if you could do it over again? To me that's a separate and basically irrelevant question. The fact is you're not rewinding and starting over, you're just... starting over. So legacy support, transition, skills development and all those other issues are far more important than any hypothetical question of what you'd choose in a perfect world if all other things were equal.
Love it or hate it, PHP is popular and is going nowhere anytime soon. Finding skilled people to do it is not too difficult (well, the difficulty is filtering them out from the self-taught cowboy script jockeys who think they can code but can't) and it's not a far step from Web-based Perl.
If your developers are any good, they'll be able to handle anything thrown at them. Deciding what language to use is quite a tricky strategic decision, but I recommend you think very carefully before introducing any MORE languages (i.e. don't).
Unless of course, there is something that you absolutely cannot do (or cannot do sanely) with what you've got.
I think choosing a single language is key and if your database is primarily MySQL, then PHP seems like the obvious choice. It naturally works with your database, it's open source and there is tons of documentation, source code, doesn't require compiling, etc.
People come and go through positions and any website will evolve over time. If you have the ability to set some guidelines and rules, I would choose something that is forgiving, common-place, and easy(er) to learn.
I'd also suggest writing it down so people in the future don't re-invent the wheel.

How do I develop web 2.0 apps with

A few years ago I did a lot of work with I'm evaluating using it again for a quick project. Can someone bring me up to speed on the current state of developing with in the "Web 2.0" world? What are the best libraries on CPAN to use with it? Are there clean ways to include jQuery, YUI, other CSS libs, etc, and do some AJAX. There are of course lots of libraries on CPAN but what works and what is commonly used?
We aren't still doing this?
I realize people are going to offer Catalyst as an answer. However, many people may have legacy apps they simply want to enhance. Is starting over really the best answer?
Personally, I'm no fan of Catalyst (too heavy for my taste) or Mason (mixing code and HTML is bad ju-ju), but I do quite well using for input[1], HTML::Template for output, and CGI::Ajax to provide AJAX functionality where called for.
If you're looking at frameworks, you may also want to consider CGI::Application, which is a widely-used and lighter-weight alternative to Catalyst/Mason.
[1] I can't recall the last time I called anything other than $q->param or $q->cookie from There are still a lot of tutorials out there saying to use its HTML-generation functions, but that's still mixing code and HTML in a way that's just as bad as using here docs, if not worse.
Consider using something more modern, for example Catalyst. It will make your life much easier and you won't have to reinvent the wheel. I understand that it is just a small project, but from my experience many small projects in time become large ones :)
The "web 2.0" apps that I've worked with usually use client-side JavaScript to request JSON data from the server, then use that data to update the page in-place via DOM.
The JSON module is useful for returning structured data to a browser.
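A minimal sketch of such a response follows. It uses JSON::PP, which ships with Perl 5.14 and later, as a stand-in for the JSON module; the json_response helper and the payload fields are assumptions made up for the example.

```perl
use strict;
use warnings;
use JSON::PP;    # core module, used here in place of the CPAN JSON module

# Build the full CGI response (header plus body) for an AJAX request.
# The payload fields below are made up for illustration.
sub json_response {
    my ($data) = @_;
    my $json = JSON::PP->new->utf8->canonical;    # canonical: sorted keys
    return "Content-Type: application/json; charset=utf-8\r\n\r\n"
         . $json->encode($data);
}

print json_response({ status => 'ok', items => [ 'alpha', 'beta' ] });
```

On the client, the JavaScript that made the request parses the body and updates the page, so the CGI script never has to generate HTML for these calls.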
As far as including JavaScript, HTML, or whatever in a here doc - that was never a good idea, and still isn't. Instead, use one of the plethora of template modules to be found on CPAN. For a CGI, I'd avoid "heavy" modules like Mason or Template Toolkit, and use a lighter module for quicker startup, such as Text::Template, or Template::Simple.
Yes, you can write perfectly good web 2.0 applications WITHOUT using any framework on the server side, in any language (Perl, Python, Java, etc.), and WITHOUT using any JavaScript library or framework on the client side. The definition of web 2.0 is rather loose, and I'm guessing that by web 2.0 you mean Ajax or partial page refresh. In that case, all you really need is to focus on the following:
Know about the XMLHttpRequest object.
Know how to return a JSON object from the server to the client.
Know how to safely evaluate/parse the JSON object using JavaScript and know how to manipulate the DOM. Also, at least know about innerHTML, which is occasionally helpful.
Know CSS.
Having said that, it's a lot easier to use a framework on the server side (though not because web 2.0 requires it), and it's a lot easier to use a JavaScript library on the client, such as jQuery, MooTools, or YUI. You can mix and match depending on your needs and tastes. Most JavaScript libraries provide a wrapper around XMLHttpRequest so that it works across all browsers. No one writes "naked" XMLHttpRequest code anymore, unless they want to show some samples.
It's perfectly possible to write "Web 2.0" apps using, but you'll have to do the work yourself. From what I've seen, the focus in the Perl development community has been on developing successor frameworks to CGI, not on writing helper modules to let legacy apps get bootstrapped into modern paradigms. So you're somewhat on your own.
As to whether to start over, what are you really trying to accomplish? Everyone's definition of "Web 2.0" is somewhat different.
If you're trying to introduce a few modern features (like AJAX) to a legacy app, then there's no reason you need to start over.
On the other hand if you're trying to write something that truly looks, feels, and works like a modern web app (for example, moving away from the page-load is app-state model), you should probably consider starting from the ground up. Trying to make that much of a transformation happen after the fact is going to be more trouble than it's worth for anything but the most trivial of apps.
I agree with Adam's answer, you probably want to use Catalyst. That being said, if you really don't want to, there's nothing preventing you from using only The thing is, Catalyst is a collection of packages that do the things you want to make Web 2.0 easy. It combines the various templating engines such as Template Toolkit or Mason with the various ORM interfaces like DBIx::Class and Class::DBI.
Certainly you don't have to use these things to write Web 2.0 apps, it's just a good idea. Part of your question is wondering if javascript and CSS frameworks like jQuery, or prototype require anything from the server-side code. They don't, you can use them with any kind of server-side code you want.
For new apps, if you don't find Catalyst to your taste, Dancer is another lightweight framework you may like. There are also plenty of others, including CGI::Simple, Mojo/Mojolicious, Squatting...
Any of these lightweight frameworks can take care of the boring parts of web programming for you, and let you get on with writing the fun parts the way you want to.
If the jump from to Catalyst seems too daunting then perhaps something like Squatting might be more appropriate?
Squatting is a web microframework and I have found it ideal for quick prototyping and for replacing/upgrading my old CGI scripts.
I have recently built a small "web 2.0" app with Squatting using jQuery with no issues at all. Inside the CPAN distribution there is an example directory which contains some programs using jQuery and AJAX, including a very interesting COMET example which makes use of Continuity (which Squatting "squats" on by default).
NB. If required then you can later "squat" your app onto Catalyst with Squatting::On::Catalyst
There is also CGI::Ajax.

Is Perl a good option for heavy text-processing?

I have this web application which needs to do several heavy text processing tasks: removing certain characters, parsing XML files, among others. Some of them involve regular expressions.
The web application has some implementations in Java and others in PHP. Is it worth using Perl or other specific text processing language for such tasks, or is there really no difference with using PHP?
I even thought of using Sed, Awk maybe even some compiled C scripts for processing texts. There's a lot of text to be processed...
Yes, Perl is a good option. As a language, it's definitely more suitable for those kinds of tasks than Java or PHP. If you have the Perl knowledge, I would recommend it for this kind of task.
I too suggest you use Perl, it's made for text crunching.
However, if you are going to parse/process XML, please don't try to roll your own solution, there are several high quality modules that do the job correctly. As a starter, I recommend you take a look at XML::Twig
Also, for regular expressions, there are dozens of already-made ones under the Regexp::Common distribution. Most probably you'll find what you need there and it will save you time.
Perl is THE language for text processing. It was designed with this in mind.
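As a rough illustration of the kind of job the question describes, here is a sketch that strips unwanted characters and tallies words using only built-in operators. Which characters count as unwanted, and the sample input lines, are assumptions for the example.

```perl
use strict;
use warnings;

# Clean one line of input: drop control characters, collapse runs of
# whitespace, and trim the ends. The character ranges are an assumption;
# adjust them to the data at hand.
sub clean_line {
    my ($line) = @_;
    $line =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F\x7F//d;    # delete control chars, keep tab/CR/LF
    $line =~ s/\s+/ /g;                                # collapse whitespace
    $line =~ s/\A\s+|\s+\z//g;                         # trim both ends
    return $line;
}

# Count case-insensitive word frequencies across the cleaned lines.
my %count;
for my $line ("  Perl\t is\x07 great \n", "perl is   fast\n") {
    $count{ lc $_ }++ for split ' ', clean_line($line);
}
printf "%-6s %d\n", $_, $count{$_} for sort keys %count;
```

For large inputs the same loop can read from a filehandle line by line, so memory use stays flat regardless of file size.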
Text processing is exactly what Perl was created for. After all it's Practical Extraction and Report Language. On the other hand, for web application I'd prefer Python.
Yes, Perl was designed with processing text in mind.
It has tons of useful text processing features, and it was the first language I used (long ago) that had regular expressions.
Yes. Text processing is Perl's #1 strong point. Since you will integrate into your existing app, you'll need to execute an external program, so think about how to run it securely and perhaps as a background process (to avoid startup delays in your real-time web app).

What's the best way to write a Perl CGI application?

Every example I've seen of CGI/Perl is basically a bunch of print statements containing HTML, and this doesn't seem like the best way to write a CGI app. Is there a better way to do this? Thanks.
EDIT: I've decided to use CGI::Application and HTML::Template, and use the following tutorial: Thanks!
Absolutely (you're probably looking at tutorials from the 90s). You'll want to pick a framework. In Perl-land these are the most popular choices:
CGI::Application - very lightweight with lots of community plugins
Catalyst - heavier with more bells and whistles
Jifty - more magical than the above
This is a really, really big question. In short, the better way is called Model/View/Controller (aka MVC). With MVC, your application is split into three pieces.
The Model is your data and business logic. It's the stuff that makes up the core of your application.
The View is the code for presenting things to the user. In a web application, this will typically be some sort of templating system, but it could also be a PDF or Excel spreadsheet. Basically, it's the output.
Finally, you have the Controller. This is responsible for putting the Model and View together. It takes a user's request, gets the relevant model objects, and calls the appropriate view.
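The three roles can be sketched in plain Perl, without any framework. Everything below (the package names, the in-memory task list, the /tasks path) is hypothetical and only meant to show where each responsibility lives; a real application would use a database and a template module.

```perl
use strict;
use warnings;

# Model: data and business logic. The in-memory list stands in for a database.
package Model;
my @tasks = (
    { title => 'Write report', done => 0 },
    { title => 'Send invoice', done => 1 },
);
sub open_tasks { return grep { !$_->{done} } @tasks }

# View: presentation only. A real application would use a template module
# and escape the values it interpolates.
package View;
sub render_tasks {
    my (@titles) = @_;
    return "<ul>\n" . join('', map { "  <li>$_</li>\n" } @titles) . "</ul>\n";
}

# Controller: maps a request to model calls and picks the view.
package Controller;
sub handle {
    my ($path) = @_;
    if ($path eq '/tasks') {
        my @titles = map { $_->{title} } Model::open_tasks();
        return View::render_tasks(@titles);
    }
    return "<p>Not found</p>\n";
}

package main;
print Controller::handle('/tasks');
```

Swapping the HTML view for one that emits PDF or a spreadsheet would not touch the Model or the Controller's routing, which is the point of the split.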
mpeters already mentioned several MVC frameworks for Perl. You'll also want to pick a templating engine. The two most popular are Template Toolkit and Mason.
Leaving the question of CGI vs MVC framework for the moment, what you're going to want is one of the output templating modules from the CPAN.
The Template Toolkit is very popular (available on CPAN).
Also popular are Text::Template, HTML::Template, and HTML::Mason.
HTML::Mason is much more than a template module, and as such might be a little too heavy for a simple CGI app, but is worth investigating a little while you're deciding which would be best for you.
Text::Template is reasonably simple, and uses Perl inside the templates, so you can loop over data and perform display logic in Perl. This is seen as both a pro and con by people.
HTML::Template is also small and simple. It implements its own small set of tags for if/then/else processing, variable setting, and looping. That's it. This is seen as both a pro and a con for the exact opposite reasons as Text::Template.
Template toolkit (TT) implements a very large, perlish template language that includes looping and logic, and much more.
I used HTML::Template once, and found I wanted a few more features. I then used Text::Template with success, but found its desire to twiddle with namespaces to be a little annoying. I've come to know and love Template Toolkit. For me it just feels right.
Your mileage may vary.
Of course, there is still the old "print HTML" method, sometimes a couple of print statements suffices. But you've hit upon the idea of separating your display from your main logic. Which is a good thing.
It's the first step down the road to Model/View/Controller (MVC), in which you keep separate your data model and business logic (the code that accepts the input, does something with it, and decides what needs to be output), your input/output (templates or print statements: HTML, PDF, etc.), and the code that connects the two (CGI, CGI::Application, Catalyst MVC Framework, etc.). The idea is that a change to your data structure (in the Model) should not require changes to your output routines (View).
The Perl5 Wiki provides a good (though not yet complete) list of web frameworks & templates.
The comparison articles linked in the "templates" wiki entry are worth reading. I would also recommend reading this push style templating systems article on PerlMonks.
For templating, Template Toolkit is the one I've used most, and I can highly recommend it. There is also an O'Reilly book about it, and it is probably the most used template system in the Perl kingdom (inside or outside of web frameworks).
Another approach which I've been drawn more and more to is non template "builder" solutions. Modules like Template::Declare & HTML::AsSubs fit this bill.
One solution that I feel strikes the right balance in the Framework/Roll-your-own dilemma is the use of three key perl modules:, Template Toolkit , and DBI. With these three modules you can do elegant MVC programming that is easy to write and to maintain.
All three modules are very flexible with Template Toolkit (TT) allowing you to output data in html, XML, or even pdf if you need to. You can also include perl logic in TT, even add your database interface there. This makes your CGI scripts very small and easy to maintain, especially when you use the "standard" pragma.
It also allows you to put JavaScript and AJAXy stuff in the template itself which mimics the separation between client and server.
These three modules are among the most popular on CPAN, have excellent documentation and a broad user base. Plus developing with these means you can rather quickly move to mod_perl once you have prototyped your code giving you a quick transition to Apache's embedded perl environment.
All in all a compact yet flexible toolset.
You can also separate presentation out from code and just use a templating system without needing to bring in all the overhead of a full-blown framework. Template Toolkit can be used by itself in this fashion, as can Mason, although I tend to consider it to be more of a framework disguised as a templating system.
If you're really gung-ho about separating code from presentation, though, be aware that TT and Mason both allow (or even encourage, depending on which docs you read) executable code to be embedded in the templates. Personally, I feel that embedding code in your HTML is no better than embedding HTML in your code, so I tend to opt for HTML::Template.