Perl sub skips foreach within which it is called - perl

I'm having some problem with a subroutine that locates certain files and extracts some data out of them.
This subroutine is called inside a foreach loop, but whenever the call is made the loop skips to its next iteration. So I am wondering whether any of the next;'s are somehow escaping from the subroutine to the foreach loop where it is called?
To my knowledge the sub looks solid though so I'm hoping if anyone can see something I'm missing?
sub FindKit{
opendir(DH, "$FindBin::Bin\\data");
my #kitfiles = readdir(DH);
closedir(DH);
my $nametosearch = $_[0];
my $numr = 1;
foreach my $kitfile (#kitfiles)
{
# skip . and .. and Thumbs.db and non-K-files
if($kitfile =~ /^\.$/) {shift #kitfiles; next;}
if($kitfile =~ /^\.\.$/) {shift #kitfiles; next;}
if($kitfile =~ /Thumbs\.db/) {shift #kitfiles; next;}
if($kitfile =~ /^[^K]/) {shift #kitfiles; next;}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<","data\\$kitfile") or die "$!";
while (<$fhkits>) {}
if ($. <= 1) {
print " Empty File!";
next;
}
seek($fhkits,0,0);
while (my $kitrow = <$fhkits>) {
if ($. == 0 && $kitrow =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/g) {
close $fhkits;
return $1;
}
}
$numr++;
close $fhkits;
}
return 0;
}

To summarize comments, the refactored code:
use File::Glob ':bsd_glob';
sub FindKit {
my $nametosearch = $_[0];
my #kitfiles = glob "$FindBin::Bin/data/K*"; # files that start with K
foreach my $kitfile (#kitfiles)
{
open my $fhkits, '<', $kitfile or die "$!";
my $kitrow_first_line = <$fhkits>; # read first line
return if eof; # next read is end-of-file so it was just header
my ($result) = $kitrow_first_line =~
/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
return $result if $result;
}
return 0;
}
I use core File::Glob and enable :bsd_glob option, which can handle spaces in filenames. I follow the docs note to use "real slash" on Win32 systems.
I check whether there is only a header line using eof.†
I do not see how this can affect the calling code, other than by its return value. Also, I don't see how the posted code can make the caller skip the beat, either. That problem is unlikely to be in this sub.
Please let me know if I missed some point with the above rewrite.
† Previous version used to check whether there is just one (header) line by
1 while <$fhkits>; # check number of lines ...
return if $. == 1; # there was only one line, the header
Also correct but eof is way better

The thing that is almost certainly screwing you here, is that you are shifting the list that you are iterating.
That's bad news, as you're deleting elements ... but in places you aren't necessarily thinking.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
my #list = qw ( one two three );
my $count;
foreach my $value ( #list ) {
print "Iteration ", ++$count," value is $value\n";
if ( $value eq 'two' ) { shift #list; next };
}
print "#list";
How many times do you think that should iterate, and which values should end up in the array?
Because you shift you never process element 'three' and you delete element 'one'. That's almost certainly what's causing you problems.
You also:
open using a relative path, when your opendir used an absolute one.
skip a bunch of files, and then skip anything that doesn't start with K. Why not just search for things that do start with K?
read the file twice, and one is to just check if it's empty. The perl file test -z will do this just fine.
you set $kitrow for each line in the file, but don't really use it for anything other than pattern matching. It'd probably work better using implicit variables.
You only actually do anything on the first line - so you don't ever need to iterate the whole file. ($numr seems to be discarded).
you use a global match, but only use one result. The g flag seems redundant here.
I'd suggest a big rewrite, and do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
sub FindKit{
my ($nametosearch) = #_;
my $numr = 1;
foreach my $kitfile (glob "$FindBin::Bin\\data\\K*" )
{
if ( -z $kitfile ) {
print "$kitfile is empty\n";
next;
}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<", $kitfile) or die "$!";
<$kitfile> =~ m/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/
and return $1;
return 0;
}
}

As a big fan of the Path::Tiny module (me have it always installed and using it in every project) my solution would be:
use strict;
use warnings;
use Path::Tiny;
my $found = FindKit('mykit');
print "$found\n";
sub FindKit {
my($nametosearch) = #_;
my $datadir = path($0)->realpath->parent->child('data');
die "$datadir doesn't exists" unless -d $datadir;
for my $file ($datadir->children( qr /^K/ )) {
next if -z $file; #skip empty
my #lines = $file->lines;
return $1 if $lines[0] =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
}
return;
}
Some comments and still opened issues:
Using the Path::Tiny you could always use forward slashes in the path-names, regardless of the OS (UNIX/Windows), e.g. the data/file will work on windows too.
AFAIK the FindBin is considered broken - so the above uses the $0 and realpath ...
what if the Kit is in multiple files? The above always returns on the 1st found one
the my #lines = $file->lines; reads all lines - unnecessary - but on small files doesn't big deal.
the the reality this function returns the arg for the Maakartikel, so probably better name would be find_articel_by_kit or find_articel :)
easy to switch to utf8 - just change the $file->lines to $file->lines_utf8;

Related

Perl: Manipulating while(<>) loop in file reading

My question is regarding the while loop that reads line from files. The situation is that I want to store values from or the entire next line when the while loop while(<FILEHANDLE>) is performing the action on present line ($_). So what is the way to address this problem? Is there a specific function or module that does this thing?
If you want to process four lines at a time and each set of lines is separated by #FCC then you need to change perl's input file separator.
In your script put
$/="\#FCC"
This means that when you do (<>), each record you get in $_ is now four lines of your file.
use warnings;
use strict;
local $/="\#FCC";
while (<>) {
chomp;
#Each time we iterate, $_ is now all four lines of each record.
}
Edit
You'll need to backslash the #
You can read from <> anywhere, not just in the head of the loop, e.g.
while (my $line = <>) {
chomp $line;
my $another_line = <>;
chomp $another_line;
print "$line followed by $another_line\n";
}
Assuming your file is small-ish (perhaps less than 1gb) you could just stuff it into an array and walk it:
use warnings;
use strict;
my #lines;
while (<>) {
chomp;
push #lines, $_;
}
my $num_lines = #lines; #use of array in scalar context give length of array
# don't do last line (there is no next one)
$num_lines -= 1;
foreach (my $i = 0; $i < $num_lines; $i++) {
my $next_line = $i+1;
print "line $i plus $next_line:",$lines[$i],$lines[$i+1],"\n";
}
Note that the semantics of my solution is a bit different from the answer above. My solution would print out everything except the first line twice; if you wanted everything to be printed once, the above solution might make more sense.
If you want to read n lines at a time from a file you can use Tie::File and use an array to reference n elements at a time, like this:
use strict;
use warnings;
use Tie::File;
my $filename = 'path_to_your_file';
tie my #array, 'Tie::File', $filename or die 'Unable to open file';
my $index = 0;
my $size = #array;
while (1) {
last if ($index > $size); # Be careful! Try to do a better check than this!
print #array[$index..$index+3];
print "----\n";
$index += 4;
}
(This is just an example, try to write better code)
As the documentation says, the file is not loaded into memory all at once, so it will work even for large files.

In Perl, how can I make two passes over all the files specified on the command line via the diamond operator?

If i have a text file and i want to run two types of operations, but each operation must read each line of the text separately from the other. The only way i know how to do it is
open out,(">>out.txt");
while (<>){
#operation one
}
while (<>){
#operation two
}
close out;
but this will run only on the first while, in which the operation runs fine, but the second one will not be complete because the second while(<>) does not actually re-read the file but tries to continue from where the first while left. Which is at the end of the file. So is there another way? Or is there a way to tell the second while to start again at the beginning?
Given you mention in a comment:
perl example.pl text.txt
The answer is - don't use <> and instead open a filehandle.
my ( $filename ) = #ARVG;
open ( my $input, "<", $filename ) or die $!;
while ( <$input> ) {
print;
}
seek ( $input, 0, 0 );
while ( <$input> ) {
#something else
}
Alternatively, you can - assuming test.txt isn't particularly large - just read the whole thing into an array.
my #input_lines = <$input>;
foreach ( #input_lines ) {
#something
}
If you want to specify multiple files on the command line, you can wrap the whole thing in a foreach loop:
foreach my $filename ( #ARVG ) {
## open; while; seek; while etc.
}
Couldn't you simply use the following?
while (<>) {
operation1($_);
operation2($_);
}
If not, then I'm assuming you need to process the content of all the files using one operation before it's process by the other.
<> reads from the files listed in #ARGV, removing them as it opens them, so the simplest solution is to backup #ARGV and repopulate it.
my #argv = #ARGV;
while (<>) { operation1($_); }
#ARGV = #argv;
while (<>) { operation2($_); }
Of course, it will fail if <> reads from something other than a plain file or a symlink to a plain file. (Same goes for any solution using seek.) The only to make that work would be to load the entire file into temporary storage (e.g. memory or a temporary file). The following is the simplest example of that:
my #lines = <>;
for (#lines) { operation1($_); }
for (#lines) { operation2($_); }
If the data fits into memory:
my #lines = <>;
for ( #lines ){
# operation one
}
for ( #lines ){
# operation two
}
You can localize #ARGV before the first pass.
#!/usr/bin/env perl
use strict;
use warnings;
{
local #ARGV = #ARGV;
while (<>){
print "Pass 1: $_";
}
}
while (<>){
print "Pass 2: $_";
}
If no file handle is used with the diamond operator, Perl will examine the #ARGV special variable. If #ARGV has no elements, then the diamond operator will read from STDIN.
This is other way of achieve your requirements:
my #stdin=<>;
foreach my $item( #stdin ) {
# ...
}
foreach my $item( #stdin ) {
# ...
}
If you need to run the operation line by line, why not try something like this
sub operation_1 {
my $line = shift;
#processing for operation 1
}
sub operation_2 {
my $line = shift;
#processing for operation 2
}
while(<>) {
my $line = $_;
chomp($line);
operation_1($line);
operation_2($line);
}
If you were reading from an actual file, you could use
seek FILEHANDLE,0,0;
However, you are using stdin and I don't think that it's possible to rewind stdin and start over.

Print only the first word in line

I need some help with following perl code.
#!perl -w
use strict;
use warnings;
open my $file, '<', 'ubb' or die $1;
my $spool = 0;
my #matchingLines;
while (<$file>) {
if (/GROUPS/i) {
$spool = 1;
next;
}
elsif (/SERVERS/i) {
$spool = 0;
print map { "$_" } #matchingLines;
#matchingLines = ();
}
if ($spool) {
push (#matchingLines, $_);
}
}
close ($file);
Output from that is shown below.
ADM LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=1
ADM_TMS LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=2
TMSNAME=TMS
ADM_1 LMID=GW_S4_1_PM
GRPNO=11
ADM_2 LMID=GW_S4_2_BM
GRPNO=12
DMWSG_Gateway_1 LMID=GW_S4_1_PM
GRPNO=101
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_Gateway_2 LMID=GW_S4_2_BM
GRPNO=201
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_1 LMID=GW_S4_1_PM
GRPNO=106
DMWSG_2 LMID=GW_S4_2_BM
GRPNO=206
But I only would like to get the first word of each line (e.g. ADM, ADM_TMS, ADM_1).
Note that the file has a lot of other lines above and below what's printed here. I only want to do this for lines that is in between GROUPS and SERVERS.
I would suggest 2 changes in your code
Note: Tested these with your sample data (plus other stuff) in your question.
I: Extract first word before push
Change this
push (#matchingLines, $_);
to
push (#matchingLines, /^(\S+)/);
This would push the first word of each line into the array, instead of the entire line.
Note that /^(\S+)/ is shorthand for $_ =~ /^(\S+)/. If you're using an explicit loop variable like in 7stud's answer, you can't use this shorthand, use the explicit syntax instead, say $line =~ /^(\S+)/ or whatever your loop variable is.
Of course, you can also use split function as suggested in 7stud's answer.
II: Change how you print
Change this
print map { "$_" } #matchingLines;
into
local $" = "\n";
print "#matchingLines \n";
$" specifies the delimiter used for list elements when the array is printed with print or say inside double quotes.
Alternatively, as per TLP's suggestion,
$\ = $/;
print for #lines;
or
print join("\n", #lines), "\n"
Note that $/ is the input record separator (newline by default), $\ is the output record separator (undefined by default). $\ is appended after each print command.
For more information on $/, $\, and $":
See perldoc perlvar (just use CTRL+F to find them in that page)
Or you can simply use perldoc -v '$/' etc on your console to get those information.
Note on readability
I don't think implicit regex matching i.e. /pattern/ is bad per se.
But matching against a variable, i.e. $variable =~ /pattern/ is more readable (as in you can immediately see there's a regex matching going on) and more beginner-friendly, at the cost of conciseness.
use strict;
use warnings;
use 5.014; #say()
my $fname = 'data.txt';
open my $INFILE, '<', $fname
or die "Couldn't open $fname: $!"; #-->Not $1"
my $recording_on = 0;
my #matching_lines;
for my $line (<$INFILE>) {
if ($line =~ /groups/i) {
$recording_on = 1;
next;
}
elsif ($line =~ /servers/i) {
say for #matching_lines; #say() is the same as print(), but it adds a newline at the end
#matching_lines = ();
$recording_on = 0;
}
if ($recording_on) {
my ($first_word, $trash) = split " ", $line, 2;
push #matching_lines, $first_word;
}
}
close $INFILE;
You can use the flip-flop operator (range) to select a part of your input. The idea of this operator is that it returns false until its LHS (left hand side) returns true, and after that it returns true until its RHS returns false, after which it is reset. It is somewhat like preserving a state.
Note that the edge lines are also included in the match, so we need to remove those. After that, use doubleDown's idea and push /^(\S+)/ onto an array. The nice thing about using this with push is that the capture regex returns an empty list if it fails, and this gives us a warning-free failure when the regex does not match.
use strict;
use warnings;
my #matches;
while (<>) {
if (/GROUPS/i .. /SERVERS/i) { # flip-flop remembers the matches
next if (/GROUPS/i or /SERVERS/i);
push #matches, /^(\S+)/;
}
}
# #matches should now contain the first words of those lines

Perl nested loops

I have a text file with the following contents:
NW1 SN1 DEV1
NW2 SN1 DEV2
I wrote a Perl script to iterate over the file, but it is running only once. The code is:
open(INPUT1,"input.txt");
#input_array = <INPUT1>;
for($i=0;$i<#input_array;$i++)
{
my ($ser,$node,#dev)=split(/ +/,$input_array[$i]);
for($x=0;$x<#dev;$x++)
{
print("Hi");
}
}
The script is iterating for the first line but not iterating for second line.
The code you posted could be improved, and brought up to more modern standards.
It uses a bareword filehandle INPUT1.
It doesn't use 3-arg open.
It doesn't use strict or warnings (see this question).
It doesn't check the return value of open or close. ( That's what the autodie line is for in the following code )
It uses C-style for loops when it doesn't need to.
It loads the entire file into memory even though it only deals with the file one line at a time.
use strict;
use warnings;
use autodie; # checks return value of open and close for us
# 3 arg open
open( my $in_fh, '<', 'input.txt' );
# don't read the file into memory until needed
while( <$in_fh> ){
# using $_ simplified this line
my ($ser,$node,#dev) = split;
# no need to track the indices just loop over the array
for my $dev (#dev){
print "Hi\n";
}
}
close $in_fh;
If for some reason you really did need the indices of the #dev array it would be better to write it like this:
for my $x ( 0..$#dev ){
...
}
If you want to explicitly store the line into a variable of a different name, you would change the while loop to this:
while( my $line = <$in_fh> ){
my ($ser,$node,#dev) = split / +/ $line;
...
}
You forgot to enter to slurp mode of the '<>' operator. To suck in complete file you should do this:
open(INPUT1,"input.txt");
undef $/;
#input_array = split(/\r?\n/, <INPUT1>);
close INPUT1;
or better yet like this:
open(INPUT1,"input.txt");
while(<INPUT1>) {
chomp;
my ($ser,$node,#dev)=split(/ +/,$_);
for($x=0;$x<#dev;$x++)
{
print("Hi");
}
}
close INPUT1;

How can I write Perl that doesn't look like C?

My co-workers complain that my Perl looks too much like C, which is natural since I program in C most of the time, and Perl just a bit. Here's my latest effort. I'm interest in Perl that is easy to understand. I'm a bit of a Perl critic, and have little tolerance for cryptic Perl. But with readability in mind, how could the following code be more Perlish?
It's goal is to do a traffic analysis and find which IP addresses are within the ranges given in the file "ips". Here's my effort:
#!/usr/bin/perl -w
# Process the files named in the arguments, which will contain lists of IP addresses, and see if
# any of them are in the ranges spelled out in the local file "ip", which has contents of the
# form start-dotted-quad-ip-address,end-dotted-quad-ip_address,stuff_to_be_ignored
use English;
open(IPS,"ips") or die "Can't open 'ips' $OS_ERROR";
# Increment a dotted-quad ip address
# Ignore the fact that part1 could get erroneously large.
sub increment {
$ip = shift;
my ($part_1, $part_2, $part_3, $part_4) = split (/\./, $ip);
$part_4++;
if ( $part_4 > 255 ) {
$part_4 = 0;
($part_3++);
if ( $part_3 > 255 ) {
$part_3 = 0;
($part_2++);
if ( $part_2 > 255 ) {
$part_2 = 0;
($part_1++);
}
}
}
return ("$part_1.$part_2.$part_3.$part_4");
}
# Compare two dotted-quad ip addresses.
sub is_less_than {
$left = shift;
$right = shift;
my ($left_part_1, $left_part_2, $left_part_3, $left_part_4) = split (/\./, $left);
my ($right_part_1, $right_part_2, $right_part_3, $right_part_4) = split (/\./, $right);
if ($left_part_1 != $right_part_1 ) {
return ($left_part_1 < $right_part_1);
}
if ($left_part_2 != $right_part_2 ) {
return ($left_part_2 < $right_part_2);
}
if ($left_part_3 != $right_part_3 ) {
return ($left_part_3 < $right_part_3);
}
if ($left_part_4 != $right_part_4 ) {
return ($left_part_4 < $right_part_4);
}
return (false); # They're equal
}
my %addresses;
# Parse all the ip addresses and record them in a hash.
while (<IPS>) {
my ($ip, $end_ip, $junk) = split /,/;
while (is_less_than($ip, $end_ip) ) {
$addresses{$ip}=1;
$ip = increment($ip);
}
}
# print IP addresses in any of the found ranges
foreach (#ARGV) {
open(TRAFFIC, $_) or die "Can't open $_ $OS_ERROR";
while (<TRAFFIC> ) {
chomp;
if (defined $addresses{$_}) {
print "$_\n";
}
}
close (TRAFFIC);
}
From years of seeing Perl code written by C programmers, here's some generic advice:
Use hashes. Use lists. USE HASHES! USE LISTS! Use list operations (map, grep, split, join), especially for small loops. Don't use fancy list algorithms; pop, splice, push, shift and unshift are cheaper. Don't use trees; hashes are cheaper. Hashes are cheap, make them, use them and throw them out! Use the iterator for loop, not the 3-arg one. Don't call things $var1, $var2, $var3; use a list instead. Don't call things $var_foo, $var_bar, $var_baz; use a hash instead. Use $foo ||= "default". Don't use $_ if you have to type it.
Don't use prototypes, IT'S A TRAP!!
Use regexes, not substr() or index(). Love regexes. Use the /x modifier to make them readable.
Write statement if $foo when you want a block-less conditional. There's almost always a better way to write a nested condition: try recursion, try a loop, try a hash.
Declare variables when you need them, not at the top of the subroutine. use strict. use warnings, and fix them all. use diagnostics. Write tests. Write POD.
Use CPAN. Use CPAN! USE CPAN! Someone's probably already done it, better.
Run perlcritic. Run it with --brutal just for kicks. Run perltidy. Think about why you do everything. Change your style.
Use the time not spent fighting the language and debugging memory allocation to improve your code.
Ask questions. Take style commentary on your code graciously. Go to a Perl Mongers meeting. Go onto perlmonks.org. Go to YAPC or a Perl Workshop. Your Perl knowledge will grow by leaps and bounds.
Most of writing code to be "Perlish" would be taking advantage of the built-in functions in Perl.
For instance, this:
my ($part_1, $part_2, $part_3, $part_4) = split (/\./, $ip);
$part_4++;
if ( $part_4 > 255 ) {
$part_4 = 0;
($part_3++);
if ( $part_3 > 255 ) {
$part_3 = 0;
($part_2++);
if ( $part_2 > 255 ) {
$part_2 = 0;
($part_1++);
}
}
}
I would rewrite something like:
my #parts = split (/\./, $ip);
foreach my $part(reverse #parts){
$part++;
last unless ($part > 255 && !($part = 0));
}
That does what your code posted above does but is a little cleaner.
Are you sure the code does what you want though? Just to me it looks a little strange that you only move to the previous 'part' of the IP if the one after it is > 255.
Sometimes the most Perlish thing to do is to turn to CPAN instead of writing any code at all.
Here is a quick and dirty example using Net::CIDR::Lite and Net::IP::Match::Regexp:
#!/path/to/perl
use strict;
use warnings;
use English;
use IO::File;
use Net::CIDR::Lite;
use Net::IP::Match::Regexp qw(create_iprange_regexp match_ip);
my $cidr = Net::CIDR::Lite->new();
my $ips_fh = IO::File->new();
$ips_fh->open("ips") or die "Can't open 'ips': $OS_ERROR";
while (my $line = <$ips_fh>) {
chomp $line;
my ($start, $end) = split /,/, $line;
my $range = join('-', $start, $end);
$cidr->add_range($range);
}
$ips_fh->close();
my $regexp = create_iprange_regexp($cidr->list());
foreach my $traffic_fn (#ARGV) {
my $traffic_fh = IO::File->new();
$traffic_fh->open($traffic_fn) or die "Can't open '$traffic_fh': $OS_ERROR";
while (my $ip_address = <$traffic_fh>) {
chomp $ip_address;
if (match_ip($ip_address, $regexp)) {
print $ip_address, "\n";
}
}
$traffic_fh->close();
}
DISCLAIMER: I just banged that out, it's had minimal testing and no benchmarking. Sanity checks, error handling and comments omitted to keep the line count down. I didn't scrimp on the whitespace, though.
As for your code: There is no need to define your functions before you use them.
Another example rewrite:
sub is_less_than {
my $left = shift; # I'm sure you just "forgot" to put the my() here...
my $right = shift;
my ($left_part_1, $left_part_2, $left_part_3, $left_part_4) = split (/\./, $left);
my ($right_part_1, $right_part_2, $right_part_3, $right_part_4) = split (/\./, $right);
if ($left_part_1 != $right_part_1 ) {
return ($left_part_1 < $right_part_1);
}
if ($left_part_2 != $right_part_2 ) {
return ($left_part_2 < $right_part_2);
}
if ($left_part_3 != $right_part_3 ) {
return ($left_part_3 < $right_part_3);
}
if ($left_part_4 != $right_part_4 ) {
return ($left_part_4 < $right_part_4);
}
return (false); # They're equal
}
To this:
sub is_less_than {
my #left = split(/\./, shift);
my #right = split(/\./, shift);
# one way to do it...
for(0 .. 3) {
if($left[$_] != $right[$_]) {
return $left[$_] < $right[$_];
}
}
# another way to do it - let's avoid so much indentation...
for(0 .. 3) {
return $left[$_] < $right[$_] if $left[$_] != $right[$_];
}
# yet another way to do it - classic Perl unreadable one-liner...
$left[$_] == $right[$_] or return $left[$_] < $right[$_] for 0 .. 3;
# just a note - that last one uses the short-circuit logic to condense
# the if() statement to one line, so the for() can be added on the end.
# Perl doesn't allow things like do_this() if(cond) for(0 .. 3); You
# can only postfix one conditional. This is a workaround. Always use
# 'and' or 'or' in these spots, because they have the lowest precedence.
return 0 == 1; # false is not a keyword, or a boolean value.
# though honestly, it wouldn't hurt to just return 0 or "" or undef()
}
Also, here:
my ($ip, $end_ip, $junk) = split /,/;
$junk might need to be #junk to capture all the junk, or you can probably leave it off - if you assign an unknown-sized array to an "array" of two elements, it will silently discard all the extra stuff. So
my($ip, $end_ip) = split /,/;
And here:
foreach (#ARGV) {
open(TRAFFIC, $_) or die "Can't open $_ $OS_ERROR";
while (<TRAFFIC> ) {
chomp;
if (defined $addresses{$_}) {
print "$_\n";
}
}
close (TRAFFIC);
}
Instead of TRAFFIC, use a variable to store the filehandle. Also, in general, you should use exists() to check if a hash element exists, rather than defined() - it might exist but be set to undef (this shouldn't happen in your program, but it's a nice habit for when your program gets more complicated):
foreach (#ARGV) {
open(my $traffic, $_) or die "Can't open $_ $OS_ERROR";
while (<$traffic> ) {
chomp;
print "$_\n" if exists $addresses{$_};
}
# $traffic goes out of scope, and implicitly closes
}
Of course, you could also use Perl's wonderful <> operator, which opens each element of #ARGV for reading, and acts as a filehandle that iterates through them:
while(<>) {
chomp;
print "$_\n" if exists $addresses{$_};
}
As has been noted before, try to avoid useing English unless you use English qw( -no_match_vars ); to avoid the significant performance penalty of those evil match_vars in there. And as hasn't been noted yet, but should be...
ALWAYS ALWAYS ALWAYS always use strict; and use warnings; or else Larry Wall will descend from heaven and break your code. I see you have -w - this is enough, because even off of Unix, Perl parses the shebang line, and will find your -w and will use warnings; like it should. However, you need to use strict;. This will catch a lot of serious errors in your code, like not declaring variables with my or using false as a language keyword.
Making your code work under strict as well as warnings will result in MUCH cleaner code that never breaks for reasons you can't figure out. You'll spend hours at the debugger debugging and you'll probably end up using strict and warnings anyway just to figure out what the errors are. Only remove them if (and only if) your code is finished and you're releasing it and it never generates any errors.
While doing this certainly is one way to do it in Perl.
use strict;
use warnings;
my $new_ip;
{
my #parts = split ('\.', $ip);
foreach my $part(reverse #parts){
$part++;
if( $part > 255 ){
$part = 0;
next;
}else{
last;
}
}
$new_ip = join '.', reverse #parts;
}
This is how I would actually implement it.
use NetAddr::IP;
my $new_ip = ''.(NetAddr::IP->new($ip,0) + 1) or die;
I can't say that this solution will make your program more Perl-ish, but it might simplify your algorithm.
Rather than treating an IP address as a dotted-quad, base-256 number which needs the nested-if structure to implement the increment function, consider an IP address to be a 32-bit integer. Convert an IP of the form a.b.c.d into an integer with this (not tested):
sub ip2int {
my $ip = shift;
if ($ip =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/) {
return ($1 << 24) + ($2 << 16) + ($3 << 8) + $4;
} else {
return undef;
}
}
Now it's easy to determine if an IP falls between two endpoint IPs. Just do simple integer arithmetic and comparisons.
$begin = "192.168.5.0";
$end = "192.168.10.255";
$target = "192.168.6.2";
if (ip2int($target) >= ip2int($begin) && ip2int($target) <= ip2int($end)) {
print "$target is between $begin and $end\n";
} else {
print "$target is not in range\n";
}
Tell your coworkers that their perl looks too much like line noise. Please don't obfuscate your code just for the sake of obfuscation - it's asinine development goals like that which give perl such a bad reputation for being unreadable, when it's really bad programmers (apparently, like your coworkers) who write sloppy code. Nicely structured, indented, and logical code is a good thing. C is a good thing.
Seriously, though - the best place to figure out how to write perl is in the O'Reilly "Perl Best Practices", by Damian Conway. It tells you how he thinks you should do things, and he always gives good reasons for his position as well as occasionally giving good reasons to disagree. I do disagree with him on some points, but his reasoning is sound. The odds that you work with anyone who knows perl better than Mr. Conway are pretty slim, and having a printed book (or at least a Safari subscription) gives you some more solid backing for your arguments. Pick up a copy of the Perl Cookbook while you're at it, as looking at code examples for solving common problems should get you on the right track. I hate to say "buy the book", but those are exceptionally good books that any perl developer should read.
With regards to your specific code, you're using foreach, $_, split with no parens, shift, etc. It looks plenty perl-ish to my eyes - which have been developing with perl for quite a while. One note, though - I hate the English module. If you must use it, do it like use English qw( -no_match_vars );. The match_vars option slows down regexp parsing measurably, and the $PREMATCH / $POSTMATCH variables it provides aren't usually useful.
There is only 1 advice: use strict. Rest of it is hardly relevant.
I know exactly how you feel. My first language was FORTRAN and like a good FORTRAN programmer, I wrote FORTRAN in every language since :).
I have this really wonderful book Effective Perl Programming that I keep re-reading every now and then. Especially a chapter called "Idiomatic Perl". Here are a few things I use to keep my Perl looking like Perl: List Operators like for map and grep, slices and hash slices, the quote operators.
Another thing that keeps my Perl from looking like FORTRAN/C is a regular reading of module sources especially those of the masters.
You could use Acme::Bleach or Acme::Morse
While this would work:
use strict;
use warnings;
use 5.010;
use NetAddr::IP;
my %addresses;
# Parse all the ip addresses and record them in a hash.
{
open( my $ips_file, '<', 'ips') or die;
local $_; # or my $_ on Perl 5.10 or later
while( my $line = <$ips_file> ){
my ($ip, $end_ip) = split ',', $line;
next unless $ip and $end_ip;
$ip = NetAddr::IP->new( $ip, 0 ) or die;
$end_ip = NetAddr::IP->new( $end_ip ) or die;
while( $ip <= $end_ip ){
$addresses{$ip->addr} = 1;
$ip++;
}
}
close $ips_file
}
# print IP addresses in any of the found ranges
use English;
for my $arg (#ARGV) {
open(my $traffic, '<',$arg) or die "Can't open $arg $OS_ERROR";
while( my $ip = <$traffic> ){
chomp $ip;
if( $addresses{$ip} ){
say $ip
}
}
close ($traffic);
}
I would if possible use netmasks, because it gets even simpler:
use Modern::Perl;
use NetAddr::IP;
my #addresses;
{
open( my $file, '<', 'ips') or die;
while( (my $ip = <$file>) =~ s(,.*){} ){
next unless $ip;
$ip = NetAddr::IP->new( $ip ) or die;
push #addresses, $ip
}
close $file
}
for my $filename (#ARGV) {
open( my $traffic, '<', $filename )
or die "Can't open $filename";
while( my $ip = <$traffic> ) {
chomp $ip;
next unless $ip;
$ip = NetAddr::IP->new($ip) or next; # skip line on error
my #match;
for my $cmp ( #addresses ){
if( $ip->within($cmp) ){
push #match, $cmp;
#last;
}
}
say "$ip => #match" if #match;
say "# no match for $ip" unless #match;
}
close ($traffic);
}
Test ips file:
192.168.0.1/24
192.168.0.0
0:0:0:0:0:0:C0A8:0/128
Test traffic file:
192.168.1.0
192.168.0.0
192.168.0.5
Output:
# no match for 192.168.1.0/32
192.168.0.0/32 => 192.168.0.1/24 192.168.0.0/32 0:0:0:0:0:0:C0A8:0/128
192.168.0.5/32 => 192.168.0.1/24
Instead of doing this :
if ($left_part_1 != $right_part_1 ) {
return ($left_part_1 < $right_part_1);
}
you could do this :
return $left_part_1 < $right_part_1 if($left_part_1 != $right_part_1);
Also, you could use the Fatal module, to avoid checking stuff for errors.
The only criteria I use for "how my code looks" is how easy it is to read and understand the purpose of the code (especially by programmers unfamiliar with Perl), not whether it follows a particular style.
If a Perl language feature makes some logic easier to understand then I use it, if not I don't - even if it can do it in less code.
Your co-workers may think my code is extremely "un perl-ish", but I'll bet they understood exactly what the code is doing and could modify it to fix / extend it without any trouble:
my version:
#******************************************************************************
# Load the allowable ranges into a hash
#******************************************************************************
my %ipRanges = loadIPAddressFile("../conf/ip.cfg");
#*****************************************************************************
# Get the IP to check on the command line
#*****************************************************************************
my ( $in_ip_address ) = #ARGV;
# Convert it to number for comparison
my $ipToCheckNum = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $in_ip_address));
#*****************************************************************************
# Loop through the ranges and see if the number is in any of them
#*****************************************************************************
my $startIp;
my $endIp;
my $msg = "IP [$in_ip_address] is not in range.\n";
foreach $startIp (keys(%ipRanges))
{
$endIp = $ipRanges{$startIp};
if ( $startIp <= $ipToCheckNum and $endIp >= $ipToCheckNum )
{
$msg = "IP [$in_ip_address] is in range [$startIp] to [$endIp]\n";
}
}
print $msg;
#******************************************************************************
# Function: loadIPAddressFile()
# Author: Ron Savage
# Date: 04/10/2009
#
# Description:
# loads the allowable IP address ranges into a hash from the specified file.
# Hash key is the starting value of the range, value is the end of the range.
#******************************************************************************
sub loadIPAddressFile
{
my $ipFileHandle;
my $startIP;
my $endIP;
my $startIPnum;
my $endIPnum;
my %rangeList;
#***************************************************************************
# Get the arguments sent
#***************************************************************************
my ( $ipFile ) = #_;
if ( open($ipFileHandle, "< $ipFile") )
{
while (<$ipFileHandle>)
{
( $startIP, $endIP ) = split(/\,/, $_ );
# Convert them to numbers for comparison
$startIPnum = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $startIP));
$endIPnum = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $endIP));
$rangeList{$startIPnum} = $endIPnum;
}
close($ipFileHandle);
}
else
{
print "Couldn't open [$ipFile].\n";
}
return(%rangeList);
}
(Note: the extra "#" lines are in there to preserve my freakin' spacing, which always gets whacked when posting code here)
Am I missing something... will any of the above array versions work? The mods are performed on variables local to the for loop. I think Brad Gilbert's Net::IP solution would be my choice. Chris Lutz pretty much cleaned the rest the way I would've.
As an aside - some of the comments about readability strike me as curious. Are there fewer [vigorous] complaints about the readability of Erlang/Lisp syntax because there is ONLY ONE way to write code in them?
This is probably more like C, but is also more simple:
use Socket qw(inet_aton inet_ntoa);
my $ip = ("192.156.255.255");
my $ip_1 = inet_ntoa(pack("N", unpack("N", inet_aton($ip))+1));
print "$ip $ip_1\n";
Update: I posted this before reading all of the code in the question. The code here just does the incrementing of the ip address.