How can I split a pipe-separated string in a list? - perl

Here at work, we are building a newsletter system that our clients can use. As an intern, one of my jobs is to help with the smaller pieces of the puzzle. In this case I need to scan the email server's logs for bounced messages and add each email address, along with the reason it bounced, to a "bad email database".
The bad emails table has two columns: 'email' and 'reason'.
I use the following command to pull the information from the logs and send it to the Perl script:
grep " 550 " /var/log/exim/main.log | awk '{print $5 "|" $23 " " $24 " " $25 " " $26 " " $27 " " $28 " " $29 " " $30 " " $31 " " $32 " " $33}' | perl /devl/bademails/getbademails.pl
If you have suggestions for a more efficient awk script, I would be glad to hear those too, but my main focus is the Perl script. The awk pipes "foo@bar.com|reason for bounce" to the Perl script. I want to take in these strings, split them at the | and put the two parts into their respective columns in the database. Here's what I have:
#!usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbpath = "dbi:mysql:database=system;host=localhost:3306";
my $dbh = DBI->connect($dbpath, "root", "******")
or die "Can't open database: $DBI::errstr";
while(<STDIN>) {
my $line = $_;
my @list = # ? this is where i am confused
for (my($i) = 0; $i < 1; $i++)
{
if (defined($list[$i]))
{
my @val = split('|', $list[$i]);
print "Email: $val[0]\n";
print "Reason: $val[1]";
my $sth = $dbh->prepare(qq{INSERT INTO bademails VALUES('$val[0]', '$val[1]')});
$sth->execute();
$sth->finish();
}
}
}
exit 0;

Something like this would work:
while(<STDIN>) {
my $line = $_;
chomp($line);
my ($email,$reason) = split(/\|/, $line);
print "Email: $email\n";
print "Reason: $reason";
my $sth = $dbh->prepare(qq{INSERT INTO bademails VALUES(?, ?)});
$sth->execute($email, $reason);
$sth->finish();
}
You might find it easier to just do the whole thing in Perl. "next unless / 550 /" could replace the grep and a regex could probably replace the awk.

I'm not sure what you want to put in @list. If the awk pipes one line per entry, you'll have that in $line, and you don't need the for loop over @list.
That said, if you're going to pipe it into Perl, why bother with the grep and AWK in the first place?
#!/usr/bin/perl -w
use strict;
while (<>) {
next unless / 550 /;
my @tokens = split ' ', $_;
my $addr = $tokens[4];
my $reason = join " ", @tokens[5..$#tokens];
# ... DBI code
}
Side note about the DBI calls: you should really use placeholders so that a "bad email" wouldn't be able to inject SQL into your database.

Have you considered using App::Ack instead? Instead of shelling out to an external program, you can just use Perl instead. Unfortunately, you'll have to read through the ack program code to really get a sense of how to do this, but you should get a more portable program as a result.

Why not forgo the grep and awk and go straight to Perl?
Disclaimer: I have not checked if the following code compiles:
while (<STDIN>) {
next unless /550/; # skips over the rest of the while loop
my @fields = split;
my $email = $fields[4];
my $reason = join(' ', @fields[22..32]);
...
}
EDIT: See @dland's comment for a further optimisation :-)
Hope this helps?

my(@list) = split /\|/, $line;
This will generate more than two entries in @list if you have extra pipe symbols in the tail of the line. To avoid that, use:
$line =~ m/^([^|]+)\|(.*)$/;
my(@list) = ($1, $2);
The dollar in the regex is arguably superfluous, but also documents 'end of line'.
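Another way to keep extra pipes in the reason is to give split a limit of 2. The sketch below is only an illustration: the helper name split_bounce and the sample line are made up for this example. It also shows why the pipe must be escaped. In split('|', ...) the first argument is treated as a pattern, and an unescaped | is an alternation of two empty patterns, so the string is split into single characters.

```perl
use strict;
use warnings;

# Hypothetical helper: split "email|reason" into exactly two fields.
# The limit of 2 keeps any later pipe characters inside the reason
# instead of creating extra fields.
sub split_bounce {
    my ($line) = @_;
    chomp $line;
    my ($email, $reason) = split /\|/, $line, 2;
    return ($email, $reason);
}

# Sample input invented for this example.
my ($email, $reason) = split_bounce("foo\@bar.com|550 mailbox full | try later\n");
print "Email: $email\n";
print "Reason: $reason\n";
```

With the limit in place, the second pipe in the sample line stays part of the reason.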

Related

Perl input multiple text files from command line and print them

I'm trying to count the positive integers (P), the negative integers (N), the words in all lower case (L), the words in all upper case (F), and the words with the first character capitalized and the rest in lower case (U).
The program should also print the list of words in alphabetical order, together with the line number and the filename of each occurrence. The following example illustrates the output of the program on sample input.
file1
Hello! world my friend. ALI went to school. Ali has -1 dollars and 10 TL
file2
Hello there my friend. VELI went to school. Veli has 10,
dollars and -10,TL
After you run your program,
>prog.pl file1 file2
the output you get is as follows:
N=2
P=2
L=18
F=4
U=4
-----------
ali file1 (1 1)
and file1 (2) file2 (2)
dollars file1 (2) file2 (2)
friend file1 (1) file2 (1)
has file1 (1) file2 (1)
hello file1 (1) file2 (1)
my file1 (1) file2 (1)
school file1 (1) file2 (1)
there file2 (1)
tl file1 (2) file2 (2)
to file1 (1) file2 (1)
veli file2 (1 1)
went file1 (1) file2 (1)
world file1 (1)
I tried to fill in the blanks. Could you help me with the rest?
#!/usr/bin/perl
$N= 0 ;
$P= 0 ;
$L= 0 ;
$F= 0 ;
$U= 0 ;
foreach __________ ( ____________) {__________________
or die("Cannot opened because: $!") ;
$lineno = 0 ;
while($line=<>) {
chomp ;
$lineno++ ;
@tokens = split $line=~ (/[ ,.:;!\?]+/) ;
foreach $str (@tokens) {
$N++ if ($str =~ /^-\d+$/) ;
$P++ if ($str =~ /^\d+$/) ;
$L++ if ($str =~ /^[a-z]+$/) ;
$F++ if ($str =~ /^[A-Z][a-z]+$/) ;
$U++ if ($str =~ /^[A-Z]+$/) ;
if ($str =~ /^[a-zA-Z]+$/) {
$str =~ __________________;
if ( (____________________) || ($words{$str} =~ /\)$/ ) ) {
$words{$str} = $words{$str} . " " . $file . " (" . $lineno ;
}
else {_______________________________________;
}}}}
close(FH) ;
foreach $w (__________________) {
if ( ! ($words{$w} =~ /\)$/ )) {
$words{$w} = ______________________;
}}}
print "N=$N\n" ;
print "P=$P\n" ;
print "L=$L\n" ;
print "F=$F\n" ;
print "U=$U\n" ;
print "-----------\n" ;
foreach $w (sort(keys(%words))) {
print $w," ", $words{$w}, "\n";
}
A few hints, and I'll let you get on your way...
Perl has what is called the diamond operator (<>). It opens each file named on the command line (the names are taken from the @ARGV array) and reads them line by line.
use strict;
use warnings;
use autodie;
use feature qw(say);
while ( my $line = <> ) {
chomp $line;
say "The line read in is '$line'";
}
Try this program and run it as you would your program. See what happens.
Next, take a look at the Perl documentation for variables related to file handles, especially the $/ variable. This variable is the input record separator: it determines where one record ends and the next begins. It's normally set to a newline, so when you read a file, you read it line by line. You may want to try changing it. If not, you can fall back on something like this:
use strict;
use warnings;
use autodie;
use feature qw(say);
while ( my $line = <> ) {
chomp $line;
my @words = split /\s+/, $line;
for my $word ( @words ) {
say "The word is '$word'";
}
}
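To see what changing $/ does, here is a minimal sketch that reads records separated by ". " instead of a newline. It reads from an in-memory filehandle so it runs without any input files. The helper name read_records and the sample text are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from an in-memory filehandle
# using a custom input record separator.
sub read_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $separator;      # restored automatically when the sub returns
    my @records;
    while (my $record = <$fh>) {
        chomp $record;          # chomp removes the current value of $/
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Sample text invented for this example.
my @sentences = read_records("Hello there. My friend. Bye", '. ');
print "Record: '$_'\n" for @sentences;
```

Using local keeps the change to $/ confined to the helper, so the rest of the program still reads line by line.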
Now you can use a hash to track which words were in each file and how many times. You can also track the various types of words you've mentioned. However, please don't use variables such as $U. Use $first_letter_uppercase. This will have more meaning in your program and will be less confusing for you.
Your teacher is teaching you the way Perl was written almost 30 years ago. This was back before God created the Internet. (Well, not quite. The Internet was already 10 years old, but no one outside of a few academics had heard of it). Perl programming has greatly evolved since then. Get yourself a good book on Modern Perl (that is Perl 5.x).
The pragmas at the beginning of my program (the use statements) do the following:
use strict - Use strict syntax. This does several things, but the main thing is to make sure you cannot use a variable unless you first declare it. (using most likely my). This prevents mistakes such as putting $name in one place, and referring to $Name in another place.
use warnings - This warns you of basic errors, such as attempting to use a variable that has no defined value. By default, Perl treats such a variable as an empty string, or as zero in an arithmetic context. If you get a warning when you print or check a variable that hasn't been assigned a value, it probably means you have a logic mistake.
The above two pragmas will catch 90% of your errors.
use autodie - This will cause your program to die automatically in many circumstances, for example when you attempt to open a nonexistent file for reading. This way, you don't have to remember to check whether each such operation succeeded or failed.
use feature qw(say) - This allows you to use say instead of print. The say command is just like print, but automatically adds a new line on the end. It can make your code way cleaner and easier to understand.
For example:
print "N=$N\n" ;
vs.
say "N=$N" ;
Here's how I'd write that program. But it won't get you many marks as it's a long way from the "fill in the blanks" approach that your teacher is using. But that's good, because your teacher's Perl is very dated.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my ($N, $P, $L, $F, $U);
my %words;
while (<>) {
my @tokens = split /[^-\w]+/;
foreach my $token (@tokens) {
$N++ if $token =~ /^-\d+$/;
$P++ if $token =~ /^\d+$/;
next unless $token =~ /[a-z]/i;
$L++ if $token eq lc $token;
$U++ if $token eq uc $token;
$F++ if $token eq ucfirst lc $token;
push @{$words{lc $token}{$ARGV}}, $.;
}
close ARGV if eof;
}
say "N=$N";
say "P=$P";
say "L=$L";
say "F=$F";
say "U=$U";
for my $word (sort { $a cmp $b } keys %words) {
print "$word ";
for my $file (sort { $a cmp $b } keys %{$words{$word}} ) {
print "$file (", join(' ', @{$words{$word}{$file}}), ') ';
}
print "\n";
}

Perl script not running conditional statements?

I know I'm a newb in Perl so please excuse my stupid mistakes. I am making a calculator that takes user input, but it isn't working. It runs fine and dandy until the if statement, when it reaches the if statement it just ends the program. I looked through forums and books but couldn't find anything.
use warnings;
print "number\n";
$number = <STDIN>;
# Asks the user for what number to calculate.
print "Second number\n";
$secnumber = <STDIN>;
# Asks the user for second number to calculate the first number with
print "Calculation\n Multiplication x\n Addition +\n Substraction -\n ";
$calculation = <STDIN>;
# Asks the user for which calculation to make.
if ($calculation eq "x") {
print "$number" . 'x' . "\n$secnumber" . '=' . "\n" . ($number * $secnumber);
} elsif ($calculation eq "+") {
print "$number" . '+' . "\n$secnumber" . '=' . "\n" . ($number + $secnumber);
} elsif ($calculation eq "-") {
print "$number" . '-' . "\n$secnumber" . '=' . "\n" . ($number - $secnumber);
}
# Displays the calculation and answer.
The value assigned to $calculation will contain a newline character. So on a Unix-type system the value assigned to $calculation would actually be +\n
You need to use the chomp function, which removes the trailing newline. You can find more information on chomp at http://perldoc.perl.org/functions/chomp.html
You can apply chomp in two ways. You can chomp the newline at the time of reading it:
chomp ($calculation = <STDIN>);
Or you can do it after the assignment:
$calculation = <STDIN>;
chomp ($calculation);
Also, as a new user of Perl, I would recommend that alongside the warnings pragma you also use the strict pragma to help you keep your code maintainable.
use warnings;
use strict;
You need to call chomp($calculation) after the initial assignment ($calculation = <STDIN>;) and before the if statement.
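Here is a small sketch of the effect, written as a function so it can run without typing anything at a prompt. The function name calculate is an assumption for this example, and the arguments stand in for values read from <STDIN>, trailing newline included.

```perl
use strict;
use warnings;

# Hypothetical helper: pick an operation based on the operator the user typed.
sub calculate {
    my ($number, $secnumber, $calculation) = @_;
    chomp($number, $secnumber, $calculation);   # strip trailing newlines
    return $number * $secnumber if $calculation eq 'x';
    return $number + $secnumber if $calculation eq '+';
    return $number - $secnumber if $calculation eq '-';
    return;                                     # unknown operator
}

# Values as they would arrive from <STDIN>, newline included.
my $result = calculate("6\n", "7\n", "x\n");
print defined $result ? "Result: $result\n" : "Unknown calculation\n";
```

Without the chomp, "x\n" eq 'x' is false for every branch, which is why the original program appears to skip the if statement entirely.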

Why are Perl's $. and $ARGV behaving strangely after setting @ARGV and using <>

I wrote a perl program to take a regex from the command line and do a recursive search of the current directory for certain filenames and filetypes, grep each one for the regex, and output the results, including filename and line number. [ basic grep + find functionality that I can go in and customize as needed ]
cat <<'EOF' >perlgrep2.pl
#!/usr/bin/env perl
$expr = join ' ', @ARGV;
my @filetypes = qw(cpp c h m txt log idl java pl csv);
my @filenames = qw(Makefile);
my $find="find . ";
my $nfirst = 0;
foreach(@filenames) {
$find .= " -o " if $nfirst++;
$find .= "-name \"$_\"";
}
foreach(@filetypes) {
$find .= " -o " if $nfirst++;
$find .= "-name \\*.$_";
}
@files=`$find`;
foreach(@files) {
s#^\./##;
chomp;
}
@ARGV = @files;
foreach(<>) {
print "$ARGV($.): $_" if m/\Q$expr/;
close ARGV if eof;
}
EOF
cat <<'EOF' >a.pl
print "hello ";
$a=1;
print "there";
EOF
cat <<'EOF' >b.pl
print "goodbye ";
print "all";
$a=1;
EOF
chmod ugo+x perlgrep2.pl
./perlgrep2.pl print
If you copy and paste this into your terminal, you will see this:
perlgrep2.pl(36): print "hello ";
perlgrep2.pl(0): print "there";
perlgrep2.pl(0): print "goodbye ";
perlgrep2.pl(0): print "all";
perlgrep2.pl(0): print "$ARGV($.): $_" if m/\Q$expr/;
This is very surprising to me. The program appears to be working except that the $. and $ARGV variables do not have the values I expected. It appears from the state of the variables that perl has already read all three files (total of 36 lines) when it executes the first iteration of the loop over <>. What's going on ? How to fix ? This is Perl 5.12.4.
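You're using foreach(<>) where you should be using while(<>). foreach(<>) evaluates <> in list context, so it reads every line of every file in @ARGV into a temporary list before it starts iterating. By the time the loop body first runs, $. and $ARGV already describe the end of the last file.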
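The difference can be reproduced on a small scale. This sketch uses an in-memory filehandle rather than <> and @ARGV, so it is a simplified stand-in for the program above. The function names and the sample text are invented for the illustration.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# foreach evaluates <$fh> in list context: every line is read before the
# loop body runs, so $. already holds the final line number.
sub line_numbers_foreach {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @seen;
    foreach my $line (<$fh>) {
        push @seen, $.;
    }
    close $fh;
    return @seen;
}

# while reads one line per iteration, so $. tracks the current line.
sub line_numbers_while {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @seen;
    while (my $line = <$fh>) {
        push @seen, $.;
    }
    close $fh;
    return @seen;
}

print "foreach: @{[ line_numbers_foreach() ]}\n";   # prints foreach: 3 3 3
print "while:   @{[ line_numbers_while() ]}\n";     # prints while:   1 2 3
```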

Splitting and printing with Perl

My Perl script is attempting to take in a command line argument that is a file name such as name.txt or hello.txt.exe and parse out the file extension based on the . delimiter, and print only the extension like exe or txt. Here's what I currently have which doesn't print anything and I'm not entirely sure why.
#!/usr/bin/perl -w
use strict;
my ($user_arg) = shift @ARGV;
my ($ext);
if ( ($ext) = $user_arg =~ /(\.[^.].+)$/)
{
print "Ends in ", ($ext) = $user_arg =~ /(\.[^.].+)$/ , "\n";
print "Ends in" , ($ext) = $user_arg =~ /(\.[^.]+)$/, "\n";
}
elsif( ($ext) = $user_arg =~ /(\.[^.]+)$/)
{
print"Ends in " , ($ext), "\n";
}
else
{
print "No Extension";
}
*Updated: now my problem is that the first branch prints both results. For something like name.txt it prints .txt twice, where I want it to print .txt only once. Only for something like name.txt.exe would I like it to print .txt.exe and then .exe.
There's two main issues here:
1) You need to shift off @ARGV
my $arg = shift @ARGV;
2) You need to escape the 'dot'
my @values = split /\./, $user_arg;
Other things...
You usually want to sanitize user input:
die "usage: $0 filename\n" if {some condition}
I think you mean chomp $val; in your foreach.
It wouldn't hurt to be familiar with File::Basename, fileparse could make your life easier. Although it might be overkill here.
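As a rough sketch of the fileparse route: the suffix pattern below is the one shown in the File::Basename documentation, while the wrapper name last_extension and the sample filenames are assumptions for this example. It returns only the final extension, so it covers the single-extension case rather than the ".txt.exe then .exe" output.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical wrapper: return the last extension of a filename
# (including the dot), or an empty string when there is none.
sub last_extension {
    my ($filename) = @_;
    my ($name, $dirs, $suffix) = fileparse($filename, qr/\.[^.]*/);
    return $suffix;
}

for my $file (qw(name.txt hello.txt.exe nosuffix)) {
    my $ext = last_extension($file);
    print "$file: ", ($ext ne '' ? "ends in $ext" : "no extension"), "\n";
}
```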
UPDATE
You should be able to integrate this yourself. In your case you won't need to loop
over a list of files, you'll just have one.
This doesn't do what you want where it prints "txt.exe", "exe". But you can fine tune this to your liking.
my @file_tests = qw(nosuffix testfile.txt /path/to/file.exe foo.bar.baz);
for my $fullname (@file_tests) {
my @names = split /\./, $fullname;
# shift off the first element, which will
# give you the list of suffixes or an empty list
shift @names;
# you can decide how you want to print this list
# if scalar @names is 0 don't print anything
print "list of suffixes: " . join( ', ', @names ) . "\n"
if scalar(@names) > 0;
}
OUTPUT:
list of suffixes: txt
list of suffixes: exe
list of suffixes: bar, baz

Replace every second comma in a text file

I recorded some data on my laptop and because the OS system language is German it converted the decimal separator to a comma (didn't think of that at the time...).
The column separator (there are three columns in the text file) is a comma too and so I end up with six columns instead of three
Example.txt
4,0,5,0,6,0
should be
4.0, 5.0, 6.0
How can I loop through all files in a folder and replace every first, third and fifth comma with a point in all lines in my data-files? I would prefer a bash script (.sh) or possibly a perl solution
Or how about awk
for F in * ; do awk -F, 'BEGIN { OFS = "," } ; { print $1"."$2, $3"."$4, $5"."$6 } ' "$F" | sponge "$F" ; done
You need "moreutils" for sponge, by the way. And back up your files first!
Generally for csv parsing you should use Text::CSV, however for this correction task, a quick and dirty could be:
#!/usr/bin/perl
use strict;
use warnings;
my $output;
#open my $out, '>', 'outfile.dat';
#open my $in, '<', 'infile.dat';
#while(<$in>){
while(<DATA>){
chomp;
my @fields = split ',';
while (@fields) {
$output .= shift(@fields) . '.' . shift(@fields);
$output .= ', ' if @fields;
}
$output .= "\n";
}
#print $out $output;
print $output;
__DATA__
4,0,5,0,6,0
4,0,5,0,6,0
Of course, you would read from a file rather than DATA, and presumably print to a new file. I have added this real-world usage as comments.
Well I see lots of valid and good answers here, here's another.
perl -wpe 'my $i; s/,/($i^=1) ? "." : ","/ge'
Here /e means "execute the replacement part"; $i^=1 generates a 1,0,1,0...sequence, and x?y:z selects y or z based on x's value (i.e. if (x) {y} else {z})
Following perl script should help you.
perl -e '$a = $ARGV[0]; $a =~ s/(\d)\,(\d\,)?/$1\.$2/g; print $a' "4,0,5,0,6,0"
OUTPUT
4.0,5.0,6.0
In Perl, the necessary regex would be s/,([^,]*,?)/.$1/g. If you apply this to a string, it will replace the first comma with a period, preserve the next comma (if any), and then start looking for commas again after the second one.
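Applied to the sample line from the question, that substitution can be tried like this. The wrapper name fix_decimal_commas is made up for this sketch. The output keeps the bare comma separators without the space shown in the question's "should be" line.

```perl
use strict;
use warnings;

# Hypothetical wrapper: replace the 1st, 3rd, 5th, ... comma on a line
# with a period.
sub fix_decimal_commas {
    my ($line) = @_;
    # Match a comma, then capture everything up to and including the next
    # comma (if any); the captured part is put back unchanged.
    $line =~ s/,([^,]*,?)/.$1/g;
    return $line;
}

print fix_decimal_commas("4,0,5,0,6,0"), "\n";   # prints 4.0,5.0,6.0
```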