I am trying to write in Perl to count the number of each A/C/G/T bases in a DNA sequence. But couldn't figure out what went wrong in my code. "ATCTAGCTAGCTAGCTA" is the kind of data I am given.
#!usr/bin/perl
use strict;
use warnings;
my $in_file = <$ARGV[0]>;
open( my $FH_IN, "<", $in_file );
my $dna = <$FH_IN>;
my $index = 0;
my ( $freq_a, $freq_c, $freq_g, $freq_t ) = 0;
my $dna_length = length($dna);
while ( $index < $dna_length ) {
my $base = substr( $dna, $index, 1 );
if ( $base eq "A" ) {
$freq_a++;
$index++;
next;
} elsif ( $base eq "C" ) {
$freq_c++;
$index++;
next;
} elsif ( $base eq "G" ) {
$freq_g++;
$index++;
next;
} elsif ( $base eq "T" ) {
$freq_t++;
$index++;
next;
} else {
next;
}
}
print "$freq_a\n$freq_c\n$freq_g\n$freq_t\n";
exit;
I know there are a lot of ways to do it, but what I want to know is what I did wrong so I can learn from mistakes.
Perl has a special file handle to use with these kinds of problems: The diamond operator <>. It will read input from either a file name, if provided, and standard input if not.
Secondly, since you are only interested in ACGT, might as well look for only them, using a regex: /([ACGT])/g.
Thirdly, using a hash is the idiomatic way to count characters in Perl: $count{A}++
So your script becomes:
use strict;
use warnings;
my %count;
while (<>) {
while (/([ACGT])/g) {
$count{$1}++;
}
}
print "$_\n" for #count{qw(A C G T)};
Usage:
script.pl input.txt
Okay, you've done well so far and there's only one problem that stops your program from working.
It's far from obvious, but each line that's read from the file has a newline character "\n" at the end. So what's happening is that $index reaches the newline in the string, which is processed by the else clause (because it's not A, C, G or T) which just does a next, so the same character is processed over and over again. Your program just hangs, right?
You could remove the newline with chomp, but a proper fix is to increment $index in the else clause just as you do with all the other characters. So it looks like
else {
++$index;
next;
}
As you've suspected, there are much better ways to write this. There are also a couple of other nasties in your code, but that change should get you on your way for now.
It would be instructive for you to print out the values in $dna_length, $index and $base each time you go round the loop - immediately after you assign a value to $base.
Your code would be more robust if you moved the incrementing of $index to the end of the loop (outside of the if/elsif/else block) and removed all of your next statements.
An alternative "quick fix" is to chomp() the input line before you start processing it.
I am starting out in Perl and am parsing some text line by line in a while loop and could not find help on this particular problem. I would like to use information from previously read lines of text in a current line of text.
My code is as follows:
while(<data>){
my $message = substr $_, 0, 1;
if ($message eq 'A'){
my $order_ref = substr $_, 1, 9;
my $order_book = substr $_, 20, 6;
push #add_orders, $_;
print add_order_file "$order_ref,$order_book\n";
}
if ($message eq 'X'){
my $order_ref = substr $_, 1, 9;
#now I would like to use order_ref to look up order_book from a previous line of text
# where the message is equal to A,
my $order_book = LOOKED UP VALUE FROM PREVIOUS TEXT;
push #add_orders, $_;
print add_order_file "$order_ref,$order_book\n";
}
}
"A" messages always precede "X" messages, so I know for sure that if I see an X entry with an order_ref number I scroll back and find the associated A message where I can pull out the order_book variable. I realize this will involve regexp's of some sort but I have no idea how to make Perl search previous lines only. Thanks!
EDIT: I should be clearer on this. "A" messages precede "X" messages, but they can all have different order_refs, so the data looks like this:
A order_ref1, order_book1
A order_ref2,order_book2
A order_ref3,order_book1
X order_ref2
X order_ref1
For the X orders I want to look up the order_book using order_ref2 and order_ref1.
With your re-definition of your entire question, a new answer is required.
You need to store your order_refs in a hash, to use for later lookup. This variable needs to be declared outside the while loop.
Note that I have changed the numbers in your substr calls to match your sample input. If you share some information on how the input lines are constructed, there may be a better way to extract the different values. Using substr assumes a fixed width type data.
use strict;
use warnings;
my %order_book; # your lookup hash
my #add_orders;
while (<DATA>) {
chomp;
my $message = substr $_, 0, 1;
if ($message eq 'A' or $message eq 'X') {
my $order_ref = substr $_, 2, 10;
if ($message eq 'A') {
$order_book{$order_ref} = substr $_, 13;
}
push #add_orders, $_;
print "$order_ref,$order_book{$order_ref}\n";
}
}
__DATA__
A order_ref1,order_book1
A order_ref2,order_book2
A order_ref3,order_book1
X order_ref2
X order_ref1
X order_ref3
Output:
order_ref1,order_book1
order_ref2,order_book2
order_ref3,order_book1
order_ref2,order_book2
order_ref1,order_book1
order_ref3,order_book1
TLP's answer already is correct. Here are some more suggestions to your code:
use strict; use warnings;
my #add_orders;
my $last_order_book;
while (my $line = <DATA>) {
my $message = substr $line, 0, 1;
if ( $message eq "A" ) {
my $order_ref = substr $line, 1, 9;
my $order_book = $last_order_book = substr $line, 20, 6;
push( #add_orders, $line );
print "$order_ref,$order_book\n";
}
elsif ( $message eq "Q" ) {
# Stuff happening ...
}
elsif ( $message eq "X" ) {
my $order_ref = substr $line, 1, 9;
my $order_book = $last_order_book;
push( #add_orders, $line );
print "$order_ref,$order_book\n";
}
}
__DATA__
A123456789012345678901234567890
XLine XLine XLine XLine XABCDEF
I've changed a couple of things in the code.
First of all, let's answer your question: You can add a variable that is scoped outside of the block to store your $order_book if you do not want to use the one you had inside the loop. I named it $last_order_book. It remembers the last seen value from the "A" part. Note that you can assign values to multiple variables by chaining them like my $foo = my $bar = "baz".
Now to my suggestions:
Always use strict and use warnings. I don't know if you did, but I'll say it just in case.
You are using $_ a lot. I believe that if you have to use it explicitly very often then you should actually just give it a name and use that instead. It will save you trouble understanding what is going on later.
Each line can only ever have one kind of $message, so it does not make sense to have multiple if {} constructs. Instead, use if {} elsif {} and sort them by the number of times each kind of line occurs. That will save time because it stops executing the whole if-construct once it found one of the conditions. This is useful if you are dealing with a lot of data, but it does not hurt to always do it this way. In order to make it more clear, I added a $message eq "Q" case.
I have the following question:
I want to create a perl script that reads from a text file (file with several columns of numbers) and calculate some statistics (mean, median, sd, variance). I already built one script, but as I am not in love yet with perl, I can't fix the problems of syntax on it...
Here is my perl script..
#!/usr/bin/perl -w
use strict;
open(FILEHANDLE, data.txt);
while (<FILEHANDLE>) {
shift #ARGV;
my #array = split(\t,$_);
}
close(FILEHANDLE);
###### mean, sum and size
$N = $sum = 0;
$array[$x-1];
$N++;
$sum += $array[$x-1];
###### minimum and the maximum
($min = 0, $max = 0);
$max = $array[$x-1] if ($max < $array[$x-1]), (my#sorted = sort { $a <=> $b } #samples) {
print join(" ",#sorted);
}
##### median
if ($N % 2==1) {
print "$median = $sorted[int($N/2)]\n"; ## check this out
};
else ($median = ($sorted[$N/2] + $sorted[($N/2)-1]) / 2)) {
print "$median\n"; # check this out
};
##### quantiles 1º and 3º
if $qt1 = $sorted[$r25-1] {
print "\n"; # check this out
};
else $qt1 = $fr*($sorted[$ir] - $sorted[$ir-1]) + $sorted[$ir-1] {
print "\n"; # check this out
};
##### variance
for (my $i=0;
$i<scalar(#samples);
$i++)
{
$Var += ($samples[$i]-$mean)**2;
$Var = $Var/($N-1);
};
###### standard error
($Std = sqrt($Var)/ sqrt($N));
############################################################
print "$min\n";
print "$max\n";
print "$mean\n";
print "$median\n";
print "$qt1\n";
print "$var\n";
print "$std\n";
exit(0);
I want to get it working. Please help. THANKS IN ADVANCE!
Errors in your code:
open(FILEHANDLE, data.txt);
data.txt needs to be quoted. You are not checking the return value of the open, e.g. ... or die $!. You should use a lexical filehandle and three argument open, e.g. open my $fh, '<', "data.txt" or die $!.
shift #ARGV;
This does nothing except remove the first value from you argument list, which is then promptly discarded.
my #array = split(\t,$_);
You are using \t as a bareword, it should be a regex, /\t/. Your #array is declared inside a lexical scope of the while loop, and will be undefined outside this block.
$N = $sum = 0;
Both variables are not declared, which will cause the script to die when you use strict (which is a very good idea). Use my $N to solve that. Also, $N is not a very good variable name.
$array[$x-1];
This will do nothing. $x is not declared (see above), and also undefined. The whole statement does nothing, it is like having a line 3;. I believe you will get an error such as Useless use of variable in void context.
$N++;
This increments $N to 1, which is a useless thing to do, since you only a few lines above initialized it to 0.
Well.. the list goes on. I suggest you start smaller, use strict and warnings since they are very good tools, and work out the errors one by one. A very good idea would be to make subroutines of your calculations, e.g.:
sub sum {
# code here
return $sum;
}
Go to perldoc.perl.org and read the documentation. Especially useful would be the syntax related ones and perlfunc.
Also, you should be aware that this functionality can be found in modules, which you can find at CPAN.
Your main problem is you have not declared your variables such as $N, $max, etc.
You need to introduce all new variables with my the first time you reference them. Just like you did with $array and $i. So for example
$N = $sum = 0;
Should become
my( $N, $sum ) = ( 0, 0 );
I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
It looks like you are not using strict and warnings. Use them, especially since you do not know Perl, you might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $. are very useful use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA_ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}
My co-workers complain that my Perl looks too much like C, which is natural since I program in C most of the time, and Perl just a bit. Here's my latest effort. I'm interest in Perl that is easy to understand. I'm a bit of a Perl critic, and have little tolerance for cryptic Perl. But with readability in mind, how could the following code be more Perlish?
It's goal is to do a traffic analysis and find which IP addresses are within the ranges given in the file "ips". Here's my effort:
#!/usr/bin/perl -w
# Process the files named in the arguments, which will contain lists of IP addresses, and see if
# any of them are in the ranges spelled out in the local file "ip", which has contents of the
# form start-dotted-quad-ip-address,end-dotted-quad-ip_address,stuff_to_be_ignored
use English;
open(IPS,"ips") or die "Can't open 'ips' $OS_ERROR";
# Increment a dotted-quad ip address
# Ignore the fact that part1 could get erroneously large.
sub increment {
$ip = shift;
my ($part_1, $part_2, $part_3, $part_4) = split (/\./, $ip);
$part_4++;
if ( $part_4 > 255 ) {
$part_4 = 0;
($part_3++);
if ( $part_3 > 255 ) {
$part_3 = 0;
($part_2++);
if ( $part_2 > 255 ) {
$part_2 = 0;
($part_1++);
}
}
}
return ("$part_1.$part_2.$part_3.$part_4");
}
# Compare two dotted-quad ip addresses.
sub is_less_than {
$left = shift;
$right = shift;
my ($left_part_1, $left_part_2, $left_part_3, $left_part_4) = split (/\./, $left);
my ($right_part_1, $right_part_2, $right_part_3, $right_part_4) = split (/\./, $right);
if ($left_part_1 != $right_part_1 ) {
return ($left_part_1 < $right_part_1);
}
if ($left_part_2 != $right_part_2 ) {
return ($left_part_2 < $right_part_2);
}
if ($left_part_3 != $right_part_3 ) {
return ($left_part_3 < $right_part_3);
}
if ($left_part_4 != $right_part_4 ) {
return ($left_part_4 < $right_part_4);
}
return (false); # They're equal
}
my %addresses;
# Parse all the ip addresses and record them in a hash.
while (<IPS>) {
my ($ip, $end_ip, $junk) = split /,/;
while (is_less_than($ip, $end_ip) ) {
$addresses{$ip}=1;
$ip = increment($ip);
}
}
# print IP addresses in any of the found ranges
foreach (#ARGV) {
open(TRAFFIC, $_) or die "Can't open $_ $OS_ERROR";
while (<TRAFFIC> ) {
chomp;
if (defined $addresses{$_}) {
print "$_\n";
}
}
close (TRAFFIC);
}
From years of seeing Perl code written by C programmers, here's some generic advice:
Use hashes. Use lists. USE HASHES! USE LISTS! Use list operations (map, grep, split, join), especially for small loops. Don't use fancy list algorithms; pop, splice, push, shift and unshift are cheaper. Don't use trees; hashes are cheaper. Hashes are cheap, make them, use them and throw them out! Use the iterator for loop, not the 3-arg one. Don't call things $var1, $var2, $var3; use a list instead. Don't call things $var_foo, $var_bar, $var_baz; use a hash instead. Use $foo ||= "default". Don't use $_ if you have to type it.
Don't use prototypes, IT'S A TRAP!!
Use regexes, not substr() or index(). Love regexes. Use the /x modifier to make them readable.
Write statement if $foo when you want a block-less conditional. There's almost always a better way to write a nested condition: try recursion, try a loop, try a hash.
Declare variables when you need them, not at the top of the subroutine. use strict. use warnings, and fix them all. use diagnostics. Write tests. Write POD.
Use CPAN. Use CPAN! USE CPAN! Someone's probably already done it, better.
Run perlcritic. Run it with --brutal just for kicks. Run perltidy. Think about why you do everything. Change your style.
Use the time not spent fighting the language and debugging memory allocation to improve your code.
Ask questions. Take style commentary on your code graciously. Go to a Perl Mongers meeting. Go onto perlmonks.org. Go to YAPC or a Perl Workshop. Your Perl knowledge will grow by leaps and bounds.
Most of writing code to be "Perlish" would be taking advantage of the built-in functions in Perl.
For instance, this:
my ($part_1, $part_2, $part_3, $part_4) = split (/\./, $ip);
$part_4++;
if ( $part_4 > 255 ) {
$part_4 = 0;
($part_3++);
if ( $part_3 > 255 ) {
$part_3 = 0;
($part_2++);
if ( $part_2 > 255 ) {
$part_2 = 0;
($part_1++);
}
}
}
I would rewrite something like:
my #parts = split (/\./, $ip);
foreach my $part(reverse #parts){
$part++;
last unless ($part > 255 && !($part = 0));
}
That does what your code posted above does but is a little cleaner.
Are you sure the code does what you want though? Just to me it looks a little strange that you only move to the previous 'part' of the IP if the one after it is > 255.
Sometimes the most Perlish thing to do is to turn to CPAN instead of writing any code at all.
Here is a quick and dirty example using Net::CIDR::Lite and Net::IP::Match::Regexp:
#!/path/to/perl
use strict;
use warnings;
use English;
use IO::File;
use Net::CIDR::Lite;
use Net::IP::Match::Regexp qw(create_iprange_regexp match_ip);
my $cidr = Net::CIDR::Lite->new();
my $ips_fh = IO::File->new();
$ips_fh->open("ips") or die "Can't open 'ips': $OS_ERROR";
while (my $line = <$ips_fh>) {
chomp $line;
my ($start, $end) = split /,/, $line;
my $range = join('-', $start, $end);
$cidr->add_range($range);
}
$ips_fh->close();
my $regexp = create_iprange_regexp($cidr->list());
foreach my $traffic_fn (#ARGV) {
my $traffic_fh = IO::File->new();
$traffic_fh->open($traffic_fn) or die "Can't open '$traffic_fh': $OS_ERROR";
while (my $ip_address = <$traffic_fh>) {
chomp $ip_address;
if (match_ip($ip_address, $regexp)) {
print $ip_address, "\n";
}
}
$traffic_fh->close();
}
DISCLAIMER: I just banged that out, it's had minimal testing and no benchmarking. Sanity checks, error handling and comments omitted to keep the line count down. I didn't scrimp on the whitespace, though.
As for your code: There is no need to define your functions before you use them.
Another example rewrite:
sub is_less_than {
my $left = shift; # I'm sure you just "forgot" to put the my() here...
my $right = shift;
my ($left_part_1, $left_part_2, $left_part_3, $left_part_4) = split (/\./, $left);
my ($right_part_1, $right_part_2, $right_part_3, $right_part_4) = split (/\./, $right);
if ($left_part_1 != $right_part_1 ) {
return ($left_part_1 < $right_part_1);
}
if ($left_part_2 != $right_part_2 ) {
return ($left_part_2 < $right_part_2);
}
if ($left_part_3 != $right_part_3 ) {
return ($left_part_3 < $right_part_3);
}
if ($left_part_4 != $right_part_4 ) {
return ($left_part_4 < $right_part_4);
}
return (false); # They're equal
}
To this:
sub is_less_than {
my #left = split(/\./, shift);
my #right = split(/\./, shift);
# one way to do it...
for(0 .. 3) {
if($left[$_] != $right[$_]) {
return $left[$_] < $right[$_];
}
}
# another way to do it - let's avoid so much indentation...
for(0 .. 3) {
return $left[$_] < $right[$_] if $left[$_] != $right[$_];
}
# yet another way to do it - classic Perl unreadable one-liner...
$left[$_] == $right[$_] or return $left[$_] < $right[$_] for 0 .. 3;
# just a note - that last one uses the short-circuit logic to condense
# the if() statement to one line, so the for() can be added on the end.
# Perl doesn't allow things like do_this() if(cond) for(0 .. 3); You
# can only postfix one conditional. This is a workaround. Always use
# 'and' or 'or' in these spots, because they have the lowest precedence.
return 0 == 1; # false is not a keyword, or a boolean value.
# though honestly, it wouldn't hurt to just return 0 or "" or undef()
}
Also, here:
my ($ip, $end_ip, $junk) = split /,/;
$junk might need to be #junk to capture all the junk, or you can probably leave it off - if you assign an unknown-sized array to an "array" of two elements, it will silently discard all the extra stuff. So
my($ip, $end_ip) = split /,/;
And here:
foreach (#ARGV) {
open(TRAFFIC, $_) or die "Can't open $_ $OS_ERROR";
while (<TRAFFIC> ) {
chomp;
if (defined $addresses{$_}) {
print "$_\n";
}
}
close (TRAFFIC);
}
Instead of TRAFFIC, use a variable to store the filehandle. Also, in general, you should use exists() to check if a hash element exists, rather than defined() - it might exist but be set to undef (this shouldn't happen in your program, but it's a nice habit for when your program gets more complicated):
foreach (#ARGV) {
open(my $traffic, $_) or die "Can't open $_ $OS_ERROR";
while (<$traffic> ) {
chomp;
print "$_\n" if exists $addresses{$_};
}
# $traffic goes out of scope, and implicitly closes
}
Of course, you could also use Perl's wonderful <> operator, which opens each element of #ARGV for reading, and acts as a filehandle that iterates through them:
while(<>) {
chomp;
print "$_\n" if exists $addresses{$_};
}
As has been noted before, try to avoid useing English unless you use English qw( -no_match_vars ); to avoid the significant performance penalty of those evil match_vars in there. And as hasn't been noted yet, but should be...
ALWAYS ALWAYS ALWAYS always use strict; and use warnings; or else Larry Wall will descend from heaven and break your code. I see you have -w - this is enough, because even off of Unix, Perl parses the shebang line, and will find your -w and will use warnings; like it should. However, you need to use strict;. This will catch a lot of serious errors in your code, like not declaring variables with my or using false as a language keyword.
Making your code work under strict as well as warnings will result in MUCH cleaner code that never breaks for reasons you can't figure out. You'll spend hours at the debugger debugging and you'll probably end up using strict and warnings anyway just to figure out what the errors are. Only remove them if (and only if) your code is finished and you're releasing it and it never generates any errors.
While doing this certainly is one way to do it in Perl.
use strict;
use warnings;
my $new_ip;
{
my #parts = split ('\.', $ip);
foreach my $part(reverse #parts){
$part++;
if( $part > 255 ){
$part = 0;
next;
}else{
last;
}
}
$new_ip = join '.', reverse #parts;
}
This is how I would actually implement it.
use NetAddr::IP;
my $new_ip = ''.(NetAddr::IP->new($ip,0) + 1) or die;
I can't say that this solution will make your program more Perl-ish, but it might simplify your algorithm.
Rather than treating an IP address as a dotted-quad, base-256 number which needs the nested-if structure to implement the increment function, consider an IP address to be a 32-bit integer. Convert an IP of the form a.b.c.d into an integer with this (not tested):
sub ip2int {
my $ip = shift;
if ($ip =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/) {
return ($1 << 24) + ($2 << 16) + ($3 << 8) + $4;
} else {
return undef;
}
}
Now it's easy to determine if an IP falls between two endpoint IPs. Just do simple integer arithmetic and comparisons.
$begin = "192.168.5.0";
$end = "192.168.10.255";
$target = "192.168.6.2";
if (ip2int($target) >= ip2int($begin) && ip2int($target) <= ip2int($end)) {
print "$target is between $begin and $end\n";
} else {
print "$target is not in range\n";
}
Tell your coworkers that their perl looks too much like line noise. Please don't obfuscate your code just for the sake of obfuscation - it's asinine development goals like that which give perl such a bad reputation for being unreadable, when it's really bad programmers (apparently, like your coworkers) who write sloppy code. Nicely structured, indented, and logical code is a good thing. C is a good thing.
Seriously, though - the best place to figure out how to write perl is in the O'Reilly "Perl Best Practices", by Damian Conway. It tells you how he thinks you should do things, and he always gives good reasons for his position as well as occasionally giving good reasons to disagree. I do disagree with him on some points, but his reasoning is sound. The odds that you work with anyone who knows perl better than Mr. Conway are pretty slim, and having a printed book (or at least a Safari subscription) gives you some more solid backing for your arguments. Pick up a copy of the Perl Cookbook while you're at it, as looking at code examples for solving common problems should get you on the right track. I hate to say "buy the book", but those are exceptionally good books that any perl developer should read.
With regards to your specific code, you're using foreach, $_, split with no parens, shift, etc. It looks plenty perl-ish to my eyes - which have been developing with perl for quite a while. One note, though - I hate the English module. If you must use it, do it like use English qw( -no_match_vars );. The match_vars option slows down regexp parsing measurably, and the $PREMATCH / $POSTMATCH variables it provides aren't usually useful.
There is only 1 advice: use strict. Rest of it is hardly relevant.
I know exactly how you feel. My first language was FORTRAN and like a good FORTRAN programmer, I wrote FORTRAN in every language since :).
I have this really wonderful book Effective Perl Programming that I keep re-reading every now and then. Especially a chapter called "Idiomatic Perl". Here are a few things I use to keep my Perl looking like Perl: List Operators like for map and grep, slices and hash slices, the quote operators.
Another thing that keeps my Perl from looking like FORTRAN/C is a regular reading of module sources especially those of the masters.
You could use Acme::Bleach or Acme::Morse
While this would work:
use strict;
use warnings;
use 5.010;
use NetAddr::IP;
my %addresses;
# Parse all the ip addresses and record them in a hash.
{
open( my $ips_file, '<', 'ips') or die;
local $_; # or my $_ on Perl 5.10 or later
while( my $line = <$ips_file> ){
my ($ip, $end_ip) = split ',', $line;
next unless $ip and $end_ip;
$ip = NetAddr::IP->new( $ip, 0 ) or die;
$end_ip = NetAddr::IP->new( $end_ip ) or die;
while( $ip <= $end_ip ){
$addresses{$ip->addr} = 1;
$ip++;
}
}
close $ips_file
}
# print IP addresses in any of the found ranges
use English;
for my $arg (#ARGV) {
open(my $traffic, '<',$arg) or die "Can't open $arg $OS_ERROR";
while( my $ip = <$traffic> ){
chomp $ip;
if( $addresses{$ip} ){
say $ip
}
}
close ($traffic);
}
I would if possible use netmasks, because it gets even simpler:
use Modern::Perl;
use NetAddr::IP;
my #addresses;
{
open( my $file, '<', 'ips') or die;
while( (my $ip = <$file>) =~ s(,.*){} ){
next unless $ip;
$ip = NetAddr::IP->new( $ip ) or die;
push #addresses, $ip
}
close $file
}
for my $filename (#ARGV) {
open( my $traffic, '<', $filename )
or die "Can't open $filename";
while( my $ip = <$traffic> ) {
chomp $ip;
next unless $ip;
$ip = NetAddr::IP->new($ip) or next; # skip line on error
my #match;
for my $cmp ( #addresses ){
if( $ip->within($cmp) ){
push #match, $cmp;
#last;
}
}
say "$ip => #match" if #match;
say "# no match for $ip" unless #match;
}
close ($traffic);
}
Test ips file:
192.168.0.1/24
192.168.0.0
0:0:0:0:0:0:C0A8:0/128
Test traffic file:
192.168.1.0
192.168.0.0
192.168.0.5
Output:
# no match for 192.168.1.0/32
192.168.0.0/32 => 192.168.0.1/24 192.168.0.0/32 0:0:0:0:0:0:C0A8:0/128
192.168.0.5/32 => 192.168.0.1/24
Instead of doing this :
if ($left_part_1 != $right_part_1 ) {
return ($left_part_1 < $right_part_1);
}
you could do this :
return $left_part_1 < $right_part_1 if($left_part_1 != $right_part_1);
Also, you could use the Fatal module, to avoid checking stuff for errors.
The only criteria I use for "how my code looks" is how easy it is to read and understand the purpose of the code (especially by programmers unfamiliar with Perl), not whether it follows a particular style.
If a Perl language feature makes some logic easier to understand then I use it, if not I don't - even if it can do it in less code.
Your co-workers may think my code is extremely "un perl-ish", but I'll bet they understood exactly what the code is doing and could modify it to fix / extend it without any trouble:
my version:
#******************************************************************************
# Load the allowable ranges into a hash
#******************************************************************************
my %ipRanges = loadIPAddressFile("../conf/ip.cfg");
#*****************************************************************************
# Get the IP to check on the command line
#*****************************************************************************
my ( $in_ip_address ) = #ARGV;
# Convert it to number for comparison
my $ipToCheckNum = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $in_ip_address));
#*****************************************************************************
# Loop through the ranges and see if the number is in any of them
#*****************************************************************************
my $startIp;
my $endIp;
my $msg = "IP [$in_ip_address] is not in range.\n";
foreach $startIp (keys(%ipRanges))
{
$endIp = $ipRanges{$startIp};
if ( $startIp <= $ipToCheckNum and $endIp >= $ipToCheckNum )
{
$msg = "IP [$in_ip_address] is in range [$startIp] to [$endIp]\n";
}
}
print $msg;
#******************************************************************************
# Function: loadIPAddressFile()
# Author: Ron Savage
# Date: 04/10/2009
#
# Description:
# loads the allowable IP address ranges into a hash from the specified file.
# Hash key is the starting value of the range, value is the end of the range.
#******************************************************************************
sub loadIPAddressFile
{
my $ipFileHandle;
my $startIP;
my $endIP;
my $startIPnum;
my $endIPnum;
my %rangeList;
#***************************************************************************
# Get the arguments sent
#***************************************************************************
my ( $ipFile ) = #_;
if ( open($ipFileHandle, "< $ipFile") )
{
while (<$ipFileHandle>)
{
( $startIP, $endIP ) = split(/\,/, $_ );
# Convert them to numbers for comparison
$startIPnum = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $startIP));
$endIPnum = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $endIP));
$rangeList{$startIPnum} = $endIPnum;
}
close($ipFileHandle);
}
else
{
print "Couldn't open [$ipFile].\n";
}
return(%rangeList);
}
(Note: the extra "#" lines are in there to preserve my freakin' spacing, which always gets whacked when posting code here)
Am I missing something... will any of the above array versions work? The mods are performed on variables local to the for loop. I think Brad Gilbert's Net::IP solution would be my choice. Chris Lutz pretty much cleaned the rest the way I would've.
As an aside - some of the comments about readability strike me as curious. Are there fewer [vigorous] complaints about the readability of Erlang/Lisp syntax because there is ONLY ONE way to write code in them?
This is probably more like C, but is also more simple:
use Socket qw(inet_aton inet_ntoa);
my $ip = ("192.156.255.255");
my $ip_1 = inet_ntoa(pack("N", unpack("N", inet_aton($ip))+1));
print "$ip $ip_1\n";
Update: I posted this before reading all of the code in the question. The code here just does the incrementing of the ip address.