Method to understand existing perl code? [closed] - perl

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I've got some Perl code (about 300-500 lines) that I've got to get working again. I have limited programming experience, and normally when I'm coding I just find the best solution that makes sense to me as is. In this case, I have to use this code because it's built for an existing legacy system, for which the code is the documentation for the wiring, logic and rendering of the system. It's not a lot of code, but I also can't post even chunks of code or data to get help. What's the best method of understanding the syntax, what it's doing, how the code is wired, how the logic is modeled, etc.?
Questions, feedback, comments -- just comment, thanks!!

See the book
Perl Medic, Transforming Legacy Code, by Peter Scott, from 2004.
A few review notes, including a table of contents, are listed at this perlmonks node.
A brief review is here.
The book covers many techniques in enough depth to learn & apply them to your problem. If you only have a little time, scan the whole book, then I recommend Chapter 3 on testing, Chapter 4 on rewriting, and Chapter 11, A Case Study (30+ pages).
If you don't use this book, at least use perldoc to learn Test::More and related modules. Having useful tests that exercise the original and modified code will build your confidence in making changes, because you can see when a specific change causes a test to fail.
Update.
See this book for more detailed data on Perl's test tools than given in Perl Medic:
Perl Testing: A Developer's Notebook, by Ian Langworth & chromatic, 2006.
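As a rough illustration of the Test::More advice above, here is a minimal sketch of a "characterization" test: it records what a routine does today so that later edits which change the behavior show up as failures. The format_record sub is a made-up stand-in for something lifted from the legacy script, not real code from the question.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for a routine copied out of the legacy script.
sub format_record {
    my ($name, $count) = @_;
    return sprintf '%-10s|%5d', $name, $count;
}

# Pin down what the code does today, before changing anything.
is( format_record('alpha', 7), 'alpha     |    7',
    'pads the name to 10 characters, right-aligns the count' );
is( format_record('longer_name', 123), 'longer_name|  123',
    'names longer than 10 characters are not truncated' );
is( length(format_record('beta', 42)), 16,
    'a short name gives a 16-character record' );

done_testing();
```

Run it with perl or prove. Once the current behavior is captured this way, each change to the real code can be checked by rerunning the file.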

Some time ago I read a nice text on PerlMonks about understanding a large project quickly: Swallowing an elephant in 10 easy steps. It has many useful suggestions that can pay off.

Have a look for tools like perltidy which will regularize the formatting.
Also consider running the code in the debugger, and stepping through line by line. See perldoc perldebug for details.

My advice would be to start with the functions... go through the code, find everywhere a function is used and determine if it is a regular perl function that is universal, or if it is a custom function. Then search the code and find where all of the custom functions are created and determine what each individual function does.
Add comments to the code as you go, once you figure out what a function is doing, add a comment.
That alone should give you a good head start.
In more nitty gritty, you may want to go through after that and label/comment/make note of what each perl variable is used for/what it is the first time it is used.
That should get you well on your way to figuring the code out... and if there is a function or something else you can't figure out, search the web using the wonderful Google... or post here not necessarily with the exact code, as you said you can't, but with a general idea of what it is doing.
Also, one thing I forgot to mention, is to find any loops, whether while, for, etc and determine what is running inside of them and what they are being looped through.
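The "start with the functions" step above can be partly automated. This is only a sketch under simple assumptions: it finds named subs with a naive regex and counts other mentions of each name, so it will miss anonymous subs, method calls through variables, and anything built with string eval. The inventory_subs name and the sample source are invented for the illustration.

```perl
use strict;
use warnings;

# Naive inventory of named subs in a piece of Perl source: where each is
# defined and how many times its name appears elsewhere in the code.
sub inventory_subs {
    my ($source) = @_;
    my %info;
    my @lines = split /\n/, $source;

    # First pass: record the line number of every "sub NAME" definition.
    for my $i (0 .. $#lines) {
        if ($lines[$i] =~ /^\s*sub\s+(\w+)/) {
            $info{$1} = { defined_at => $i + 1, calls => 0 };
        }
    }

    # Second pass: count other mentions of each name.
    # Comments are stripped crudely, which also mangles things like $#array.
    for my $line (@lines) {
        (my $code = $line) =~ s/#.*//;
        for my $name (keys %info) {
            next if $code =~ /^\s*sub\s+\Q$name\E\b/;
            my $count = () = $code =~ /\b\Q$name\E\b/g;
            $info{$name}{calls} += $count;
        }
    }
    return \%info;
}

my $sample = <<'END';
sub load { return 1 }
sub render { load(); return 2 }
load();
render();   # called once at top level
END

my $subs = inventory_subs($sample);
for my $name (sort keys %$subs) {
    printf "%-8s defined at line %d, used %d time(s)\n",
        $name, $subs->{$name}{defined_at}, $subs->{$name}{calls};
}
```

Any sub name that appears in the code but not in this inventory is either a Perl builtin or comes from a module, which is the distinction the answer suggests making first.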

There have been some good suggestions already, another one is the B::Deparse module, which attempts to rewrite your code (often) in a longer clearer way.
For example when you run perl -MO=Deparse ob.pl on the file ob.pl containing this obfuscated code:
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
You get:
@P = split(??, '.URRUUxR', 0);
@d = split(??, "\nrekcah xinU / lreP rehtona tsuJ", 0);
sub p {
    @p{"r$p", "u$p"} = ('P', 'P');
    pipe "r$p", "u$p";
    ++$p;
    ($q *= 2) += $f = !fork;
    map {$P = $P[$f ^ ord $p{$_} & 6];
    $p{$_} = / ^$P/xi ? $P : close $_;} keys %p;
}
p ;
p ;
p ;
p ;
p ;
map {close $_ if $p{$_} =~ /^[P.]/;} %p;
wait until $?;
map {<$_> if /^r/;} %p;
$_ = $d[$q];
sleep rand 2 if /\S/;
print $_;
ob.pl syntax OK
Which is (possibly) better; I don't know, maybe not. Worth a try anyway.
Edit: you can get even a little more info by changing the command to perl -MO=Deparse,-p ob.pl which puts in explicit parentheses which can help understand operator precedence.

Aren't there any comments in the program that can let you know what each line or each function is doing?
I would go through the code from start to finish, using a pen and lots of paper. This is where the author of the program should have written test cases, so that new programmers, like yourself, can understand how the code works. Not many programmers write test cases.

Related

Execute Commands in the Linux Commandline [Lazarus / Free Pascal]

I have a problem. I want to execute some commands on the Linux command line. I tried TProcess (I am using Lazarus), but when I start the program, it does nothing.
Here is my Code:
uses [...], unix, process;
[...]
var LE_Path: TLabeledEdit;
[...]
Pro1:=TProcess.Create(nil);
Pro1.CommandLine:=(('sudo open'+LE_Path.Text));
Pro1.Options := Pro1.Options; //Here i used Options before
Pro1.Execute;
With this program, I want to open files with sudo (the program is running in the user interface).
Sorry for my bad English and for any mistakes in the question; this is my first time using Stack Overflow.
I guess the solution was a missing space char?
Change
Pro1.CommandLine:=(('sudo open'+LE_Path.Text));
to
Pro1.CommandLine:=(('sudo open '+LE_Path.Text));
# ----------------------------^--- added this space char.
But if you're a beginner programmer, my other comments are still worth considering:
trying to use sudo in your first bit of code may be adding a whole extra set of problems. SO... Get something easier to work first, maybe
/bin/ls -l /path/to/some/dir/that/has/only/a/few/files.
find out how to print a statement that will be executed. This is the most basic form of debugging and any language should support that.
Your English communicated your problem well enough, and by including sample code and a reasonable (not perfect) problem description, "we" were able to help you. In general, a good question contains the fewest number of steps needed to re-create the problem. OR, if you're trying to manipulate data,
a. small sample input,
b. sample output from that same input
c. your "best" code you have tried
d. your current output
e. your thoughts about why it is not working
AND comments to indicate generally other things you have tried.

Would it be worth it to use Inline::C to speed up math?

I have been working on a Perl program to process large amounts of DNA. It outputs exactly what I need, but it takes much longer than I would like. Using NYTProf, I have narrowed down the major problem area to the loop that adds my values together. Would using Inline::C to do the math make my program faster, or should I accept the speed and move on? Is there another way to improve the speed? Here is my program and an input it would run, as well as an executable with the default values entered already.
It's unlikely you'll get useful help here (this included). I can see various problems with your code, and none have to do with the choice of language.
Use CPAN. If you're parsing GenBank, then use an appropriate module.
You're writing assembly in Perl, and neither Perl nor you are very good at that. It's near impossible to know what's going on when you don't pass parameters to subroutines, instead relying on globals all over the place. What do @X1, @X2, @Y1, @Y2 mean?
The following might be your problem: until ($ender - $starter > $tlength) { (line 153). According to your test case, these start by being 103, 1, and 200, and it's not clear when or if they change. Depending on what's in @te, it might or might not ever get out of the loop; I just can't tell from your code.
It would help if we knew, exactly, what are the parameters to add, the in-out invariants, and what it is returning.
That's all I got.
I second the recommendation of PDL made in a comment, if it's applicable. Or the use of a CPAN module tailored to your problem (again, if applicable).
I didn't see anything that looked unambiguously like "the loop that adds my values together" in that code; please, show just the code you are considering optimizing, ideally with just enough structure around it to actually run it.
So to answer your generic question generically, yes, Inline::C can be a useful tool for optimization if you are certain your performance problem is limited to what it actually can do for you. In using it, be aware that invoking your C code from Perl or vice versa is non-trivially expensive, so you have to have enough code translated to C to minimize the transitions.
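Before reaching for Inline::C, it may be worth measuring whether an existing C-backed routine already covers the hot loop. The sketch below uses only core modules: Benchmark to time things, and List::Util's sum, which is normally implemented in C (XS). The data and the sub names loop_sum and xs_sum are invented, since the original program is not shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(sum);

# Made-up data standing in for the values being added up.
my @values = map { $_ % 7 } 1 .. 100_000;

# The summing loop written in plain Perl.
sub loop_sum {
    my $total = 0;
    $total += $_ for @values;
    return $total;
}

# The same sum, delegated to List::Util.
sub xs_sum {
    return sum(@values);
}

# Check that both give the same answer before trusting any timing.
die "sums differ" unless loop_sum() == xs_sum();

# Run each version for at least one CPU second and print a comparison table.
cmpthese(-1, { perl_loop => \&loop_sum, list_util => \&xs_sum });
```

If the gap is small for your real data, the arithmetic is probably not the bottleneck, and moving it to C is unlikely to pay for the Perl-to-C call overhead mentioned above.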

What makes Perl code maintainable?

I've been writing Perl for several years now and it is my preferred language for text processing (many of the genetics/genomics problems I work on are easily reduced to text processing problems). Perl as a language can be very forgiving, and it's possible to write very poor, but functional, code in Perl. Just the other day, my friend said he calls Perl a write-only language: write it once, understand it once, and never ever try to go back and fix it after it's finished.
While I have definitely been guilty of writing bad scripts at times, I feel like I have also written some very clear and maintainable code in Perl. However, if someone asked me what makes the code clear and maintainable, I wouldn't be able to give a confident answer.
What makes Perl code maintainable? Or maybe a better question is what makes Perl code hard to maintain? Let's assume I'm not the only one that will be maintaining the code, and that the other contributors, like me, are not professional Perl programmers but scientists with programming experience.
What makes Perl code unmaintainable? Pretty much anything that makes any other program unmaintainable. Assuming anything other than a short script intended to carry out a well defined task, these are:
Global variables
Lack of separation of concerns: Monolithic scripts
NOT using self-documenting identifiers (variable names and method names). E.g. you should know what a variable's purpose is from its name. $c bad. $count better. $token_count good.
Spell identifiers out. Program size is no longer of paramount concern.
A subroutine or method called doWork doesn't say anything
Make it easy to find the source of symbols from another package. Either use explicit package prefix, or explicitly import every symbol used via use MyModule qw(list of imports).
Perl-specific:
Over-reliance on short-cuts and obscure builtin variables
Abuse of subroutine prototypes
not using strict and not using warnings
Reinventing the wheel rather than using established libraries
Not using a consistent indentation style
Not using horizontal and vertical white space to guide the reader
etc etc etc.
Basically, if you think Perl is -f>@+?*<.-&'_:$#/%!, and you aspire to write stuff like that in production code, then, yeah, you'll have problems.
People tend to confuse stuff Perl programmers do for fun (e.g., JAPHs, golf etc) with what good Perl programs are supposed to look like.
I am still unclear on how they are able to separate in their minds code written for IOCCC from maintainable C.
I suggest:
Don't get too clever with the Perl. If you start playing golf with the code, it's going to result in harder-to-read code. The code you write needs to be readable and clear more than it needs to be clever.
Document the code. If it's a module, add POD describing typical usage and methods. If it's a program, add POD to describe command line options and typical usage. If there's a hairy algorithm, document it and provide references (URLs) if possible.
Use the /.../x form of regular expressions, and document them. Not everyone understands regexes well.
Know what coupling is, and the pros/cons of high/low coupling.
Know what cohesion is, and the pros/cons of high/low cohesion.
Use modules appropriately. A nice well-defined, well-contained concept makes a great module. Reuse of such modules is the goal. Don't use modules simply to reduce the size of a monolithic program.
Write unit tests for your code. A good test suite will not only allow you to prove your code is working today, but tomorrow as well. It will also let you make bolder changes in the future, with confidence that you are not breaking older applications. If you do break things, then, well, your test suite wasn't broad enough.
But overall, the fact that you care enough about maintainability to ask a question about it, tells me that you're already in a good place and thinking the right way.
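To make the /.../x suggestion from the list above concrete, here is a small sketch. The log-line format is invented for the example; the point is only that /x lets the pattern be spread out and commented piece by piece.

```perl
use strict;
use warnings;

# Parse a made-up log line of the form "2024-01-31 ERROR disk full".
my $log_line_re = qr/
    ^
    (\d{4}) - (\d{2}) - (\d{2})   # date: year, month, day
    \s+
    (DEBUG|INFO|WARN|ERROR)       # severity level
    \s+
    (.*)                          # message text, rest of the line
    $
/x;

my @fields = '2024-01-31 ERROR disk full' =~ $log_line_re;
print join('|', @fields), "\n";   # prints 2024|01|31|ERROR|disk full
```

Under /x, literal whitespace in the pattern is ignored, so a real space has to be written as \s, \x20 or [ ].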
I don't use all of Perl Best Practices, but that's the thing that Damian wrote it for. Whether or not I use all the suggestions, they are all worth at least considering.
What makes Perl code maintainable?
At the least:
use strict;
use warnings;
See perldoc perlstyle for some general guidelines that will make your programs easier to read, understand, and maintain.
One factor very important to code readability that I haven't seen mentioned in other answers is the importance of white space, which is both Perl-agnostic and in some ways Perl-specific.
Perl lets you write VERY concise code, but concise chunks don't have to be all bunched together.
White space has lots of meaning/uses when we are talking about readability, not all of them widely used but most useful:
Spaces around tokens to easier separate them visually.
This space is doubly important in Perl due to prevalence of line noise characters even in best-style Perl code.
I find $myHashRef->{$keys1[$i]}{$keys3{$k}} to be less readable at 2am in the middle of a production emergency compared to spaced out:
$myHashRef->{ $keys1[$i] }->{ $keys3{$k} }.
As a side note, if you find your code doing a lot of deep nested reference expressions all starting with the same root, you should absolutely consider assigning that root into a temporary pointer (see Sinan's comment/answer).
A partial but VERY important special case of this is of course regular expressions. The difference was illustrated to death in all the main materials I recall (PBP, RegEx O'Reilly book, etc..) so I won't lengthen this post even further unless someone requests examples in the comments.
Correct and uniform indentation. D'oh. Obviously. Yet I see way too much code 100% unreadable due to crappy indentation, and even less readable when half of the code was indented with TABs by a person whose editor used 4 character tabs and another by a person whose editor used 8 character TABs. Just set your bloody editor to do soft (e.g. space-emulated) TABs and don't make others miserable.
Empty lines around logically separate units of code (both blocks and just sets of lines). You can write a 10000 line Java program in 1000 lines of good Perl. Now don't feel like Benedict Arnold if you add 100-200 empty lines to those 1000 to make things more readable.
Splitting uber-long expressions into multiple lines, closely followed by...
Correct vertical alignment. Witness the difference between:
if ($some_variable > 11 && ($some_other_bigexpression < $another_variable || $my_flag eq "Y") && $this_is_too_bloody_wide == 1 && $ace > my_func() && $another_answer == 42 && $pi == 3) {
and
if ($some_variable > 11 && ($some_other_bigexpression < $another_variable ||
    $my_flag eq "Y") && $this_is_too_bloody_wide == 1 && $ace > my_func()
    && $another_answer == 42 && $pi == 3) {
and
if (   $some_variable > 11
    && ($some_other_bigexpression < $another_variable || $my_flag eq "Y")
    && $this_is_too_bloody_wide == 1
    && $ace > my_func()
    && $another_answer == 42
    && $pi == 3) {
Personally, I prefer to fix the vertical alignment one more step by aligning LHS and RHS (this is especially readable in case of long SQL queries but also in Perl code itself, both the long conditionals like this one as well as many lines of assignments and hash/array initializations):
if (   $some_variable            >  11
    && ($some_other_bigexpression < $another_variable || $my_flag eq "Y")
    && $this_is_too_bloody_wide  == 1
    && $ace                      >  my_func()
    && $another_answer           == 42
    && $pi                       == 3  ) {
As a side note, in some cases the code could be made even more readable/maintainable by not having such long expressions in the first place. E.g. if the contents of the if(){} block is a return, then doing multiple if/unless statements each of which has a return block may be better.
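A sketch of that last idea, with the long condition broken into separate guard clauses. The sub name is_acceptable and the hash keys are invented; they loosely echo the variables in the example above.

```perl
use strict;
use warnings;

# Hypothetical check written with guard clauses: each test sits on its own
# line, so it can carry its own comment and be changed independently.
sub is_acceptable {
    my (%arg) = @_;
    return 0 unless $arg{some_variable} > 11;
    return 0 unless $arg{bigexpression} < $arg{another_variable}
                 || $arg{my_flag} eq 'Y';
    return 0 unless $arg{answer} == 42;
    return 1;
}

print is_acceptable(
    some_variable    => 12,
    bigexpression    => 1,
    another_variable => 5,
    my_flag          => 'N',
    answer           => 42,
), "\n";
```

Whether this reads better than one aligned conditional is a judgment call; it tends to help most when the conditions are unrelated to each other.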
i see this as an issue of people being told that perl is unreadable, and they start to make assumptions about the maintainability of their own code. if you are conscientious enough to consider readability as a hallmark of quality code, chances are this critique doesn't apply to you.
most people will cite regexes when they discuss readability. regexes are a dsl embedded in perl and you can either read them or not. if someone can't take the time to understand something so basic and essential to many languages, i'm not concerned about trying to bridge some inferred cognitive gap...they should just man up, read the perldocs, and ask questions where necessary.
others will cite perl's use of short-form vars such as @_, $! etc. these are all easily disambiguated...i'm not interested in making perl look like java.
the upside of all of these quirks and perlisms is that codebases written in the language are often terse and compact. i'd rather read ten lines of perl than one hundred lines of java.
to me there is so much more to "maintainability" than simply having easy-to-read code. write tests, make assertions...do everything else you can do to lean on perl and its ecosystem to keep code correct.
in short: write programs to be first correct, then secure, then well-performing....once these goals have been met, then worry about making it nice to curl up with near a fire.
I would say the packaging/object models, which get reflected in the directory structure for .pm files. For my PhD I wrote quite a lot of Perl code that I reused afterwards. It was for an automatic LaTeX diagram generator.
I'll mention some positive things that make Perl maintainable.
It's true that you usually shouldn't get too clever with really dense statements a la return !$@;#% and the like, but a good amount of clever using list-processing operators, like map and grep and list-context returns from the likes of split and similar operators, in order to write code in a functional style can make a positive contribution to maintainability. At my last employer we also had some snazzy hash-manipulation functions that worked in a similar way (hashmap and hashgrep, though technically we only fed them even-sized lists). For instance:
# Look for all the servers, and return them in a pipe-separated string
# (because we want this for some lame reason or another)
return join '|',
sort
hashmap {$a =~ /^server_/ ? $b : +()}
%configuration_hash;
See also Higher Order Perl, http://hop.perl.plover.com - good use of metaprogramming can make defining tasks more coherent and readable, if you can keep the metaprogramming itself from getting in the way.
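The hashmap used above was that employer's own helper and its source isn't shown, so the following is only a guess at how such a function could be written: it calls the block once per key/value pair with the key in $a and the value in $b. It assumes the block is written in the same package as hashmap, because $a and $b are package variables.

```perl
use strict;
use warnings;

# Hypothetical re-implementation: call the block once per key/value pair,
# with the key in $a and the value in $b, and collect whatever it returns.
sub hashmap(&@) {
    my ($code, @pairs) = @_;
    my @out;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        local ($a, $b) = ($key, $value);
        push @out, $code->();
    }
    return @out;
}

my %configuration_hash = (
    server_web => 'www1',
    server_db  => 'db1',
    timeout    => 30,
);

# Collect into a named array first; writing "sort hashmap {...}" directly
# risks Perl taking hashmap as the name of a sort comparison sub.
my @servers = hashmap { $a =~ /^server_/ ? $b : () } %configuration_hash;
print join('|', sort @servers), "\n";   # prints db1|www1
```

The (&@) prototype is what lets the block be passed without the sub keyword, the same calling shape as the builtin map and grep.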

How can I profile a subroutine without using modules?

I'm tempted to relabel this question 'Look at this brick. What type of house does it belong to?'
Here's the situation: I've effectively been asked to profile some subroutines having access to neither profilers (even Devel::DProf) nor Time::HiRes. The purpose of this exercise is to 'locate' bottlenecks.
At the moment, I'm sprinkling print statements at the beginning and end of each sub that log entries and exits to file, along with the result of the time function. Not ideal, but it's the best I can go by given the circumstances. At the very least it'll allow me to see how many times each sub is called.
The code is running under Unix. The closest thing I see to my need is perlfaq8, but that doesn't seem to help (I don't know how to make a syscall, and am wondering if it'll affect the code timing unpredictably).
Not your typical everyday SO question...
This technique should work.
Basically, the idea is if you run Perl with the -d flag, it goes into the debugger. Then, when you run the program, ctrl-Break or ctrl-C should cause it to pause in the middle of whatever it is doing. Then you can type T to show the stack, and examine any other variables if you like, before continuing it.
Do this about 10 or 20 times. Any line of code (or any function, if you prefer) costing a significant percent of time will appear on that percent of stack samples, roughly, so you will not miss it.
For example, if a line of code (typically a function call) costs 20% of time, and you pause the program 20 times, you will see that line on 4 stack samples, give or take 1.8 samples. The amount of time that could be saved if you could avoid executing that line, or execute it a lot less, is a 20% reduction in overall execution time.
Then you can repeat it to find more problems.
You said the purpose is to 'locate' bottlenecks. This method does exactly that. Measuring function execution time is only a very indirect way to do that.
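The "4 samples, give or take 1.8" figure above follows from treating each pause as an independent yes/no sample, which is an assumption of this sketch. The helper name sample_stats is made up.

```perl
use strict;
use warnings;

# Expected number of stack samples that show a given line, and the spread
# around that number, under a binomial model of independent pauses.
sub sample_stats {
    my ($samples, $fraction) = @_;
    my $expected = $samples * $fraction;
    my $std_dev  = sqrt($samples * $fraction * (1 - $fraction));
    return ($expected, $std_dev);
}

my ($expected, $std_dev) = sample_stats(20, 0.20);
printf "expect %.1f samples, give or take %.1f\n", $expected, $std_dev;
# prints: expect 4.0 samples, give or take 1.8
```

The same function shows why a handful of samples is enough for big problems: a line costing 50% of the time shows up on about 10 of 20 samples, so it is very unlikely to be missed.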
As far as syscall, there's a pretty good example in this post: http://www.cpan.org/scripts/date_and_time/gettimeofday
I think it's clear enough even for someone who never used syscall before (like myself :)
May I ask what the specifics of "having no access" are?
It's usually possible to get access to CPAN modules, even in cases where installing them in a central location is not in the cards. Is there a problem with downloading the module? Installing it in your home directory? Using software with the module included?
If one of those is a hang-up it can probably be fixed... if it's some company policy, that's priceless :(
Well, you can write your own profiler. It's not as bad as it sounds. A profiler is just a very special-case debugger. You want to read the perldebguts man page for some good first-cut code to get started if you must write your own.
What you want, and what your boss wants, though he or she may not know it, is to use Devel::NYTProf to do a really good job of profiling your code, and getting the job done instead of having to wait for you to partially duplicate the functions of it while learning how it is done.
The comment you made about "personal use" doesn't make sense. You're doing a job for work, and the work needs to get done, and you need (or your manager needs to get you) the resources to do that work. "Personal use" doesn't seem to enter into it.
Is it a question of someone else refusing to sign off on the module to have it installed on the machine running the software to be measured? Is it a licensing question? Is it not being allowed to install arbitrary software on a production machine (understandable, but there's got to be some way the software's tested before it goes live - I hope - profile it there)?
What is the reason that a well-known module from a trustworthy source can't be used? Have you made the money case to your manager that more money will be spent coding a new, less-functional, profiler from scratch than finding a way to use one that is both good and already available?
For each subroutine, create a wrapper around it which reports the time in some format which you can export to something like R, a database, Excel or something similar (CSV would be a good choice). Add something like this to your code. If you are using a Perl less than 5.7 (when Time::HiRes was first added to core), use syscall as mentioned above instead of Time::HiRes's functions below.
INIT {
    sub wrap_sub {
        no strict 'refs';
        my $sub    = shift;
        my $subref = *{$sub}{CODE};
        return sub {
            local *__ANON__ = "wrapped_$sub";
            my $want = wantarray;    # capture caller context before any eval
            my @return;
            my $fsecs = Time::HiRes::gettimeofday();
            print STDERR "$sub,$fsecs,";
            # The trailing 1 makes eval's result true on success, so a sub
            # that legitimately returns 0 or an empty list does not die.
            if ($want) {
                eval { @return = $subref->(@_); 1 } or die $@;
            } else {
                eval { $return[0] = $subref->(@_); 1 } or die $@;
            }
            $fsecs = Time::HiRes::gettimeofday();
            print STDERR "$fsecs\n";
            return $want ? @return : $return[0];
        };
    }
    require Time::HiRes;
    my @subs = qw{the subs you want to profile};
    no strict 'refs';
    no warnings 'redefine';
    foreach my $sub (@subs) {
        *{$sub} = wrap_sub($sub);
    }
}
Replace 'subs you want to profile' with the subs you need profiled, and use an open()ed file handle instead of STDERR if you need to, bearing in mind you can get the results of the run separate from the output of the script (on Unix, with the bourne, korn and bash shells), like this
perl ./myscript.pl 2>myscript.profile

What's the minimal set of characters I need to filter before passing a string to a system call?

Assume that the following Perl code is given:
my $user_supplied_string = &retrieved_from_untrusted_user();
$user_supplied_string =~ s/.../.../g; # filtering done here
my $output = `/path/to/some/command '${user_supplied_string}'`;
The code is clearly insecure, but assume that the only thing that can be changed is the filtering code on line #2.
My question:
What is the minimal set of characters that needs to be filtered on line #2 to make the above code secure?
Please note:
Whitelisting is not an option in this case, so please keep your answer focused on what to filter out to make it secure. And more specifically; what is the minimal set of characters to filter out to make it secure? Everything else is off-topic.
Make sure your answer addresses the question stated ("What is the minimal set of characters that needs to be filtered on line #2 to make the above code secure?"). If your answer does not address that very specific question then don't post. Thanks.
First, given that you are concerned with security, I suggest you look into taint mode. As for the minimal set of characters to allow to be visible to shell, you are better off not letting any characters be seen by the shell:
my $output = do {
    local $/;
    open my $pipe, "-|", "/path/to/some/command", $user_supplied_string
        or die "could not run /path/to/some/command: $!";
    <$pipe>;
};
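A runnable sketch of the same idea, to show that nothing in the string is interpreted when no shell is involved. Here $^X (the running perl) is an assumption standing in for /path/to/some/command, and the one-liner it runs just echoes its first argument back.

```perl
use strict;
use warnings;

# Hostile-looking input with quotes and shell metacharacters.
my $user_supplied_string = q{'; echo pwned; '$(id)};

# List form of open: the command and each argument are passed straight to
# exec, so no shell ever parses the string.
open my $pipe, '-|', $^X, '-e', 'print $ARGV[0]', $user_supplied_string
    or die "could not run $^X: $!";
my $output = do { local $/; <$pipe> };
close $pipe or die "command failed: $! $?";

print "unchanged\n" if $output eq $user_supplied_string;
```

The command still receives the raw string, so it has to cope with whatever the user typed; the list form only takes the shell out of the picture.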
The set of characters that you allow depends on what the application in that system call is going to do with them. There are the shell special characters, but that's only one part of the problem. You also have to ensure that the value you give to the command is valid input, and that requires some more work.
See, for instance, my chapter on security in Mastering Perl where I go into the gory details of the problem.
Perhaps you can explain why your problem ties both your hands behind your back and blindfolds you. Your problem isn't technical if those are your constraints.
After a little research, the following may be the minimal set you're looking for, at least on a subset of UNIX-like systems. Of course, I have not personally tested it, so YMMV:
&;`'\"|*?~<>^()[]{}$\n\r
In a regex:
s/[\&\;\`\'\\\"\|\*\?\~\<\>\^\(\)\[\]\{\}\$\n\r]//g
I don't think actually using this in real code would be a good idea, but I can see how it could be interesting out of pure curiosity.