Best ways to become familiar with a Perl codebase?

Best ways to become familiar with a Perl codebase? - perl

I recently joined a Perl project and I need to start being productive with the codebase fairly quickly. However, I'm finding that I'm getting stuck because I don't know where I need to change or how all the parts of the code fit together.
What are your tips and tools for becoming familiar with a Perl codebase that you have no experience with?
(Note: I realize that there's already a similar question. I'm wondering if there's any Perl-specific strategies.)

First, if the previous maintainers were doing their job well, you should have an extensive test suite and perldoc documentation for each module and script in the codebase. If so, read through the perldoc, and read through the tests. The perldoc should give you an overview of what things do, and the test suite will give you examples of the code being used in context.
Depending on the author, the internal comments may be useful in understanding the intention of the code, so looking through the actual source my provide insights into algorithms, bugs, and intended use as well.
If you don't have any of these, proceed as you would for any badly-maintained codebase: start small, writing programs that try to use the code, and use Test::More and the like to start turning these into a test suite.
In the first case, you may find it to be very simple, in the second, very hard. Peter Scott's Perl Medic can be very useful in assisting you in turning such a codebase into something usable and useful if you're stuck with the second case, and Mike Thomsen's recommendation of Effective Perl Programming is also a good one.

I work on Melody which is written primarily in Perl. It's a rather large code base, and I've found the process of learning the Melody code base is identical to any Java system I've worked on.
It really comes down to just working with it, googling when you see behavior you've never seen before and experimentation.
This book is a great reference for picking up Perl in a serious way. It's not very dense and it will teach you a lot about proper Perl development.

Besides the "similar question", http://perldoc.perl.org/ and an empty test.pl file is a good starting point!

I would like to see a real answer here. The only thing I have is more questions (you don't have to provide answers here, just ask yourself):
Goal
What is the goal of the project, what it is supposed to do?
Who knows the workflow?
Environment
Can you set up the project in a clean test environment?
Does it use a versioning system?
Where are the entry points (i.e. executable files)?
Does it rely on external programs?
Does it require additional system tweaks (i.e. cron scripts)?
Perl code
Is your project using strict and warnings everywhere?
Which CPAN modules are used?
Are there any frameworks used (Moose, Catalyst, probably some ORM, ...)?
Are there any perldocs in the project's modules?
Are there any tests (notably t/*.t)?

I usually start with working on some simple bug report or a simple feature I want to add. While working on code I write comments for code and commit them. Writing tests also helps.

Related

Is it "OK" to wrap standard Perl modules with Moose?

Many standard modules are all using straight up perl -- problem is these guys arent using Moosey stuff, so I catch myself wrapping them with Moose or reinventing some simple functions in bigger libraries for convenience.
I wondered if there was any general approach to how developers using Moose incorporate other libraries that are non-Moose.
Being new to Perl and Moose I'd like to have a better understanding of how Moose is used in situations like this, or when it is generally preferred to use Moose vs Perl or even MooseX, or some other package, or whether its arbitrary.
Seems like there are different schools of thought, but Perl being as old as it is -- there are too many conflicting sources, so it's hard to navigate to a consistent truth. I'm not sure what to believe!
Anyone have a definitive source they turn to for "modern" usage of perl? Understand I've only been using perl for a month so I'm green to this community.
Updated
I don't want to hurt anybody's feelings by talking about libraries they love in a way they may not appreciate, so I've removed my side commentary about certain libraries Ive used to refocus on the question at hand.
Thanks for your guidance!

While I do not know what others do, I would be very reluctant to create myself extra work. I do not see any general need to Moosify a bunch of modules that already work.
If you want to inherit from non-Moose modules, take a look at MooseX::NonMoose.
If the HTML generation cruft in CGI.pm bothers you, you can use CGI::Simple.

About Reinventing CGI
CGI is a library which has been thoroughly tried and tested and if it needs improvement, you could build an extension, or contribute/contact the maintainers. You have to remember that modules are only as good as their track record (reliability) and their upkeep. Many people created decent modules, but didn't continue maintaining it, so they sort of fell to obscurity.
CGI is a boat all of its own, which if you think there's a lot of overhead, you could use CGI::Simple or CGI::Minimal. CGI.pm does more than parse querystrings, it also has cookie management (sessions), HTML generation, and other useful functions.
Others have had some criticisms of the overhead with CGI.pm, but that's why they developed FastCGI, which is modifying the server to use a persistent state of the script, thus loading the overhead once, rather than on every page load.
It is possible for you to create another (even better) version, but why bother? Many people may probably tell you, you shouldn't reinvent the wheel, with good reason. CGI has been around for almost 2 decades, with so many users testing it, finding holes, and having patched the holes; however, I'm never a big fan of saying "you shouldn't do something." If you think something could be better, make it better. There are many OSes that exist today just because of that reason, why settle for something that does 95% of what you need, if you need that other 5% too? But I also say, weigh your costs vs. benefits and determine if you want to devote your time to this, or if maybe there's another problem out there that has yet to be solved, that could use a little more manpower. To have something successful, you're going to need to test it thoroughly, and will most likely need to create something that other people would want and (at this point) there isn't much of a reason for CGI-users to be motivated to switch.
About Modern Perl
I think "modern Perl" is an oxymoron. I would jokingly call modern Perl; Ruby or Python.
That isn't to say that Perl isn't useful, because it is, but it's been around a long time. While it has had its significant share of changes from version to version, the most popular, Perl5, hasn't changed all that much; mind you, my definition of change is not adding to the language (new operators and functionality), but deprecating/replacing old features or changing the behavior of existing ones (like for/foreach loops).
Note: Perl6 could be considered modern perl (and does have many significant changes), but it's not widely adopted and was supposed to be released many many years ago (it's the Duke Nukem 4 Ever of programming languages).
About XS
I haven't done much module programming, but if memory serves correct, XS is the interface between Perl and C, which I think allows you to compile your perl modules for faster execution. Consider the PostgreSQL DBI module. There's a DBD::PgPP, which is a pure perl module to interface with Postgres, but there's also DBD::Pg, which I think compiles some of the code using C and takes advantage of some other OS utilities. Compiled modules have the benefit of faster load and execution (there may be some better resource management in there too).

Is this worth doing for practice and learning, without modules (Perl)

looking into connecting to a secure ftp site (using perl), and downloading all the .log files, saving in new directories named after the day I downloaded the files. I want to do this without modules, as a learning experience, but before I start I wanted to know if you guys thought it was doing, or is way too much for a relatively new programmer and I should just learn the modules?

If it's production work, no, use the modules. Your implementation will be buggy, missing features and unknown to the next person maintaining that code.
Otherwise, yes. It's good to learn the principles of a network protocol. I do have a reservation about FTP as it is a bit baroque, insecure, inefficient and on its way out. scp, HTTP or rsync would be more useful to put your energy into.
I'd start with reading the RFC and putting together your own FTP module using just network sockets. Document and test it as if you were going to release to CPAN as a full learning exercise in making a network module. Run it against some various FTP server implementations as they often interpret the spec differently (or not at all). Don't be afraid to cheat and look at what the existing modules do. Who knows, you might write something better than what's already there.

Learning the principals, just like we did at school for long multiplication and division, means we know how things work when we use a short hand.
However, when new to the world,just like when you learn to speak, you did "A is for Apple" etc, you didnt get explained about the finesse of grammar and all that, you learnt to express yourself enough to be understood.
Programming is a little like the same. While in an ideal world you can easily argue a prewritten generic library is often way less efficient than a specifically targeted set of routines. If the wheel you are using was already invented, it seems a lot of work to make a new one.
So, use the wheels and cogs afailable, once you really have the hang of it, NOW look at inventing your own more efficient ones.

Ad cpan modules:
Modules are an great learning source. Here is zilion modules and you can really learn much studying some of them.
And when/while you mastering your perl, you will start writing you own modules. When your program will use modules anyway (yours one), you can ask - why don't use modules already developed and debugged?
So, learn perl basics, study some modules (for example Net::SFTP) and if you still want write your own solution - it is up to you. :)
'

Which cpan modules are the best to read and study?

I've looked into the source code of DBIx::Class recently and found that I don't understand a thing (though I mastered a couple tricks while trying to).
So my question is: which CPAN modules are a must read for someone who wants to learn, and in what order?

If I were doing the same I’d probably start with the ::Tiny space. I’d expect it to be less distracting—fewer edge cases cluttering things—and more idiomatic—terseness lends itself to Perl idiom—in general.
Then I’d attack the medium–large nodes from this excellent document—Map of the CPAN’s authors (large PDF). Update: Web version. Zoom into the bigger nodes then search on search.cpan.org for them. The largest nodes sometimes represent old-school and while exceptional code exists in the old school, not a lot of good teaching examples do (so I say). Authors like Miyagawa, Kennedy, and Kogman come to mind immediately as worth reviewing. There are plenty of others. Basically any module you see recommended here often, look up the author and poke around his or her other packages, as it were.

I learned quite a bit (tie-ing, platform independent filesystem access, etc) by reading the code for File::chdir. It is also a very handy module to use in your scripts, I use it all the time.
I would also add to bvr's list: read the source for modules that you use frequently, since you are already familiar with their expected behavior, you can more clearly see what is being done to achieve that result.

The question is what you want to learn, but it is certainly good idea to study various modules, because you learn to read other people code and learn various tricks. Some random recommendations I can think of
start with smaller modules with clear interface you know and are interested in
once you feel familiar with organization of modules and basics, try something larger
try rather newer modules
look into test suite and into examples
if you don't understand specific piece, try to make reduced example a play with it
It is hard to recommend something specific, but I liked my recent look into Web::Scraper module.

If you're fluent in perl - if don't perldoc ;) - , sugest learn the packages Task::Kensho or Modern::Perl.
These packages do cover comprehensive in culture Perl, since tests until hacks, passing by crawling, modules to developers, e-mail, dates, modern oriented obejct in Perl.
Participe of discussion lists, read the history of list, irc.
Perl have many tricks, the community always responds with enthusiasm =)

Which Perl test module should I use?

There's Test::Simple, Test::More, Test::Builder (all part of the Test::Simple distribution), Test::Class, Test::Unit, Test::Moose...
I'm starting a new project using Moose—which module should I use for writing my tests?

You can eliminate Test::Builder from the list for a while. Test::Builder is the base module that other Test:: modules are built on. So until you want to start writing your own test modules you won't need it.
I'd also ignore Test::Simple. Test::More does all that Test::Simple does - and plenty more.
Test::Class is a nice way to write unit tests in a really object oriented way. I'd recommend it for complex OO-based systems.
Test::Moose is for testing various Moose-related features in your code. You say that you're using Moose so it might well be useful for you. It can be used in conjunction with Test::More.
So my recommendation would be to start with Test::More and Test::Moose. But also take a look at Test::Class to see if it fits with the way that you want to write tests.
Perl Testing: A Developers Notebook is a great introduction to this topic.

In addition to davorg's excellent answer, I'd like to note that I'm still mostly using Test::More (with some help from Test-Differences, Test-WWW-Mechanize-LibXML, and other modules). I can recommend against using Test.pm which is old and stupid, and Test::Simple, which is a small subset of the Test::More functionality.
There are also Test::Most (an extension of Test::More), Test::Class and Test::Class::Most, which some people prefer, but I didn't take the time to learn them yet.
There's an ongoing debate about whether a plan (= tests' count) is a good thing or not. Personally, I've already noticed a case in someone else's CPAN module where the count of tests was different on my system's than on them (and varied based on different versions of DBI (IIRC)) and which convinced me that the plan is a good thing. As a result I've created Test-Count which is a way to count and update the assertions count based on annotations inside well-formed comments (and which supports source code in other languages besides Perl 5). I'm still supporting it, so if you need anything, give me a shout.

I'd recommend Test::Class as a basis for your test framework. It makes for better structured, more modular code. And you can still use Test::More and other Test modules with it.
Also check Test::Exception.

Others have suggested Test::Class; I have found the following PDF overview from $foo Magazin (I didn't write it, just found it) was quite helpful for some exampled beyond what the POD documentation provides.

It is not really essential for moose development.
However, if you're into Web Development , I think Test::WWW::Selenium is becoming indespensable for testing javascript-heavy web pages and their behavior in the most common web browsers (firefox, iexplorer, googlechrome, etc.)

Perl OO frameworks and program design - Moose and Conway's inside-out objects (Class::Std)

This is more of a use-case type of question... but also generic enough to be more broadly applicable:
In short, I'm working on a module that's more or less a command-line wrapper; OO naturally. Without going into too many details (unless someone wants them), there isn't a crazy amount of complexity to the system, but it did feel natural to have three or four objects in this framework. Finally, it's an open source thing I'll put out there, rather than a module with a few developers in the same firm working on it.
First I implemented the OO using Class::Std, because Perl Best Practices (Conway, 2005) made a good argument for why to use inside-out objects. Full control over what attributes get accessed and so on, proper encapsulation, etc. Also his design is surprisingly simple and clever.
I liked it, but then noticed that no one really uses this; in fact it seems Conway himself doesn't really recommend this anymore?
So I moved to everyone's favorite, Moose. It's easy to use, although way way overkill feature-wise for what I want to do. The big, major downside is: it's got a slew of module dependencies that force users of my module to download them all. A minor downside is it's got way more functionality than I really need.
What are recommendations? Inconvenience fellow developers by forcing them to use a possibly-obsolete module, or force every user of the module to download Moose and all its dependencies?
Is there a third option for a proper Perl OO framework that's popular but neither of these two?

To be perfectly fair, seeing virtually everything interesting these days in Perl world has Moose somewhere as a dependency, I don't see it being much a debt for other "fellow Perl developers".
Chances are they already have it installed as we speak!
Edit: Some statistics:
Moose is currently rated at 65th place on the "Most Depended on" modules list, Aliases top 100, with over 1637 packages depending on it. Thats almost as much as stuff like Time::HiRes , and more than DBI, and I don't think you're as likely to question depending on those would you?

The currently accepted "modern Perl OO framework" is Moose. I'd say make your users download it, or you can bundle it up with your modules in the installation using PAR::Packer.
Quoting from "But I can't use CPAN" (...because my users won't want to install things):
Assuming you're just handling your users a tarball, then Module::Install provides a solution - if you put your script into script/ and then do
install_script(glob 'script/*');
auto_install;
in your Makefile.PL, then not only will 'make install' put your script somewhere useful for you but 'make installdeps' will invoke cpan (or if present, cpanplus) to install all missing dependencies for you.

Well, there's Mouse, which is like Moose but without all the dependencies (and some of the features). It also starts up a bit faster. I haven't tried it myself, but it's generally well thought of.

To add to the existing fine answers...
Some of what was recommended in PBP isn't bad advice, but Perl marches on. When it was written, inside-out objects were the new hotness. Now the Moose has absorbed all. There is MooseX::InsideOut which gives you the power of Moose with the total encapsulation of Class::Std, but unless you work with undisciplined programmers its really not necessary.
Those features of Moose you don't need now, you'll need them eventually. Even if you don't need all of them, with Moose you won't have to use and learn Yet Another OO System every time you need an interesting feature. And god forbid you need TWO features at the SAME TIME!

There is also a Perl Module Object::InsideOut , actively maintained as of 2010.
Kind of a predecessor to Moose, or to be clear: development started independently at the same time as Moose started,
I know it exists but I haven't used it.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse