How can I reformat the GECOS field with Perl or awk?

I want to scan the passwd file and change the order of words in the comment field from firstname lastname to lastname firstname, and force the surname to capitals.
So, change every line from:
jbloggs:x:9999:99:Joe Bloggs:/home/jbloggs:/bin/ksh
to:
jbloggs:x:9999:99:BLOGGS Joe:/home/jbloggs:/bin/ksh
I'm new to Perl and I'm having problems with different field separators in awk.
Appreciate any help.

Use Passwd::Unix or Passwd::Linux.

$ awk -v FS=":" '{split($5, a, " "); name = toupper(a[2]) " " a[1]; gsub($5, name); print $0}' passwd
Won't work if you have middle names, though; see the Perl sketch after the readable version below for one way around that.
Edit: easier to read version
awk -v FS=":" '
{
    split($5, name_parts, " ")
    name = toupper(name_parts[2]) " " name_parts[1]
    gsub($5, name)
    print $0
}' passwd
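For names with middle parts, one workaround is to treat only the final word as the surname. A minimal Perl sketch of that idea (not part of the original answer):
use strict;
use warnings;

while (<>) {
    chomp;
    my @fields = split /:/, $_, -1;    # limit of -1 keeps trailing empty fields
    my @parts  = split ' ', $fields[4];
    if (@parts >= 2) {
        my $last = uc pop @parts;      # assume the final word is the surname
        $fields[4] = join ' ', $last, @parts;
    }
    print join(':', @fields), "\n";
}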

Stand-alone example:
use strict;
use warnings;
my $s = 'jbloggs:x:9999:99:Joe Bloggs:/home/jbloggs:/bin/ksh';
my @tokens = split /:/, $s;
my ($first, $last) = split /\s+/, $tokens[4];
$tokens[4] = uc($last) . " $first";
print join(':', @tokens), "\n";
__END__
jbloggs:x:9999:99:BLOGGS Joe:/home/jbloggs:/bin/ksh
As a script (output goes to STDOUT; redirect it to a file to save it):
use strict;
use warnings;
while (<>) {
    chomp;
    my @tokens = split /:/;
    my ($first, $last) = split /\s+/, $tokens[4];
    $tokens[4] = uc($last) . " $first";
    print join(':', @tokens), "\n";
}

This will process the file as you read it and put the new-format entries into the array @newEntries.
open PASSWD, "/etc/passwd" or die "Can't open /etc/passwd: $!";
while (<PASSWD>) {
    my @fields = split /:/;
    my ($first, $last) = split (/\s/, $fields[4]);
    $last = uc $last;
    $fields[4] = "$last $first";
    push @newEntries, join(':', @fields);
}
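Each pushed entry keeps the trailing newline from the last field, so writing the rebuilt entries out afterwards (to STDOUT, say) is just:
print for @newEntries;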

$ awk -F":" ' { split($5,a," ");$5=toupper(a[2])" "a[1] } 1 ' OFS=":" /etc/passwd
jbloggs:x:9999:99:BLOGGS Joe:/home/jbloggs:/bin/ksh

$ perl -pe 's/^((.*:){4})(\S+)\s+(\S+?):/$1\U$4\E $3:/' \
    /etc/passwd >/tmp/newpasswd
To rewrite only those users who have logged in within the past sixty days according to lastlog, you might use
#! /usr/bin/perl -n
use warnings;
use strict;
use Date::Parse;

my($user) = /^([^:]+):/;
die "$0: $ARGV:$.: no user\n" unless defined $user;

if ($user eq "+") {
    next;
}

my $lastlog = (split " ", `lastlog -u "$user" | sed 1d`, 4)[-1];
die "$0: $ARGV:$.: lastlog -u $user failed\n"
    unless defined $lastlog;

my $time_t = str2time $lastlog;

# rewrites users who've never logged in
#next if defined $time_t && $time_t < ($^T - 60 * 24 * 60 * 60);

# users who have logged in within the last sixty days
next unless defined $time_t && $time_t >= ($^T - 60 * 24 * 60 * 60);

s/^((.*:){4})(\S+)\s+(\S+?):/$1\U$4\E $3:/;
print;
as in
$ ./fixusers /etc/passwd >/tmp/fixedusers

Related

find nearest key match with input value greater or equal to key and records unsorted

I have the input file below, and I'm trying to find the nearest key match where the input value is greater than or equal to the key. It works when the input file is sorted.
Input file:
10,Line1
20,Line2
30,Line3
40,Line4
50,Line5
55,Line6
70,Line7
75,Line8
90,Line9
95,Line10
99,Line11
Code that I tried:
$ awk -F, -v inp=85 ' NR==1 { dp=0 } { dt=($1-inp); d=sqrt(dt*dt); if(d<=dp && inp >= $1 ) { rec=$0 } dp=d } END { print rec } ' source.txt
75,Line8
$ awk -F, -v inp=55 ' NR==1 { dp=0 } { dt=($1-inp); d=sqrt(dt*dt); if(d<=dp && inp >= $1 ) { rec=$0 } dp=d } END { print rec } ' source.txt
55,Line6
It works fine when source.txt is sorted on the key column, i.e. the first one, but it gives incorrect results when the file is not sorted:
$ shuf source.txt | awk -F, -v inp=85 ' NR==1 { dp=0 } { dt=($1-inp); d=sqrt(dt*dt); if(d<=dp && inp >= $1 ) { rec=$0 } dp=d } END { print rec } '
50,Line5 # Wrong
Can this be fixed for the unsorted file?
Solutions using any unix tools are welcome!
You may use this awk:
awk -F, -v n=85 'n>=$1 && (max=="" || $1>max){max=$1; rec=$0} END{print rec}' file
75,Line8
Run this again with a different value:
awk -F, -v n=55 'n>=$1 && (max=="" || $1>max){max=$1; rec=$0} END{print rec}' file
55,Line6
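For comparison, the same single-pass, order-independent logic as a Perl one-liner (a sketch; the input value is hard-coded in the BEGIN block):
perl -F, -lane '
    BEGIN { $n = 85 }
    if ($n >= $F[0] and (!defined $max or $F[0] > $max)) {
        ($max, $rec) = ($F[0], $_);
    }
    END { print $rec }
' file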
With Perl
perl -0777 -wnE' $in = shift // 85;
%h = split /(?:\s*,\s*|\s*\n\s*)/;
END { --$in while not exists $h{$in}; say "$in, $h{$in}" }
' data.txt 57
Notes
read the whole file into a string ("slurp"), by -0777
populate a hash with file data; I strip surrounding spaces in the process
count down from input-value and check for such a key, until we get to one that exists
input is presumed to be an integer and in range
The nearest key is the first one that exists as input "clicks" down toward it an integer at a time.
The invocation above (for 57) prints the line: 55, Line6.
A version that does check the range of input and allows non-integer input
perl -MList::Util=min -0777 -wnE' $in = int(shift // 85);
%h = split /(?:\s*,\s*|\s*\n\s*)/;
die "Input $in out of range\n" if $in < min keys %h;
END { --$in while not exists $h{$in}; say "$in, $h{$in}" }
' data.txt 57
The following code complies with your requirement:
use strict;
use warnings;
my $target = shift
    or die "Please enter a value";
my $line;
while ( <DATA> ) {
    my @data = split ',';
    last if $data[0] > $target;
    $line = $_;
}
print $line;
__DATA__
10,Line1
20,Line2
30,Line3
40,Line4
50,Line5
55,Line6
70,Line7
75,Line8
90,Line9
95,Line10
99,Line11
Code for unsorted lines
use strict;
use warnings;
my $target = shift
    or die "Please enter a value";
my @lines = <DATA>;
my $line;
my %data;
map { my @array = split ',', $_; $data{$array[0]} = $_ } @lines;
foreach my $key ( sort { $a <=> $b } keys %data ) {  # numeric sort, not string sort
    last if $key > $target;
    $line = $data{$key};
}
print $line;
__DATA__
10,Line1
20,Line2
30,Line3
40,Line4
50,Line5
55,Line6
70,Line7
75,Line8
90,Line9
95,Line10
99,Line11

Why are Perl's $. and $ARGV behaving strangely after setting @ARGV and using <>

I wrote a Perl program that takes a regex from the command line, does a recursive search of the current directory for certain filenames and filetypes, greps each one for the regex, and outputs the results, including filename and line number. [Basic grep + find functionality that I can go in and customize as needed.]
cat <<'EOF' >perlgrep2.pl
#!/usr/bin/env perl
$expr = join ' ', @ARGV;
my @filetypes = qw(cpp c h m txt log idl java pl csv);
my @filenames = qw(Makefile);
my $find = "find . ";
my $nfirst = 0;
foreach (@filenames) {
    $find .= " -o " if $nfirst++;
    $find .= "-name \"$_\"";
}
foreach (@filetypes) {
    $find .= " -o " if $nfirst++;
    $find .= "-name \\*.$_";
}
@files = `$find`;
foreach (@files) {
    s#^\./##;
    chomp;
}
@ARGV = @files;
foreach(<>) {
    print "$ARGV($.): $_" if m/\Q$expr/;
    close ARGV if eof;
}
EOF
cat <<'EOF' >a.pl
print "hello ";
$a=1;
print "there";
EOF
cat <<'EOF' >b.pl
print "goodbye ";
print "all";
$a=1;
EOF
chmod ugo+x perlgrep2.pl
./perlgrep2.pl print
If you copy and paste this into your terminal, you will see this:
perlgrep2.pl(36): print "hello ";
perlgrep2.pl(0): print "there";
perlgrep2.pl(0): print "goodbye ";
perlgrep2.pl(0): print "all";
perlgrep2.pl(0): print "$ARGV($.): $_" if m/\Q$expr/;
This is very surprising to me. The program appears to be working except that the $. and $ARGV variables do not have the values I expected. It appears from the state of the variables that Perl has already read all three files (a total of 36 lines) when it executes the first iteration of the loop over <>. What's going on? How can I fix it? This is Perl 5.12.4.
You're using foreach(<>) where you should be using while(<>). foreach(<>) will read every file in @ARGV into a temporary list before it starts iterating over it, so by the time the loop body first runs all the input has already been consumed, which is why $ARGV and $. hold leftover values.
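A minimal sketch of the fix, using the question's own loop body:
while (<>) {
    print "$ARGV($.): $_" if m/\Q$expr/;
    close ARGV if eof;    # resets $. so line numbers restart for each file
}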

`perl -ane`, grabbing all the arguments until the end (variable number of args)?

I've got a pretty messy perl script (I'm not a Perl guru) which is this one:
perl -ane '($h,$m,$s) = split /:/, $F[0];
$pid = $F[1];
$args = $F[2]." ".$F[3]." ".$F[4]." ".$F[5]." ".$F[6]." ".$F[7]." ".$F[8]." ".$F[9]." ".$F[10]." ".$F[11]." ".$F[12]." ".$F[13];
if(($h == 1 && $m > 30) || ($h > 1)) {
print "$h :: $m $kb $pid\nArguments:\n$args\n\n "; kill 9, $pid }'
I'm searching for a way, instead of having all these concatenations for $arg, to say something like $arg=$F[2-end]
I'd love any help on that :)
Thanks!
$args = join " ", @F[2..$#F];
$#arrayname is the index of the last element of @arrayname; @arrayname[$start..$end] gets you a subarray starting with $arrayname[$start] and ending with $arrayname[$end], and containing all the elements in between. Put those together and you get @F[2..$#F] for "all the elements of @F from $F[2] through the end of the array".
Then you use join to concatenate all those array elements together into a single string; the first argument tells Perl what to put in between the rest.
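For example, a quick self-contained illustration:
my @F = qw(08:45 1234 foo bar baz);
my $args = join " ", @F[2..$#F];
print "$args\n";    # prints "foo bar baz"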
You can do this in bash as well:
while read -r time pid args; do
IFS=: read -r h m s <<< "$time"
(( 10#$h * 60 + 10#$m > 90 )) && {  # 10# forces base 10 in case of leading zeros
# I don't see where $kb was defined in the original code
cat <<EOF
$h:$m $kb $pid
ARGUMENTS
$args
EOF
# Are you sure a more gentle kill won't work?
# kill -9 should be the last resort for buggy code
kill "$pid"
}
done
Instead of using -a (autosplit), consider limiting the number of fields being split in the first place.
($t, $pid, $args) = split " ", $_, 3;
($h, $m, $s) = split /:/, $t;
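Applied to the original command it might look like this (a sketch; $kb is left out because the original never defines it):
perl -ne 'chomp;
    ($t, $pid, $args) = split " ", $_, 3;
    ($h, $m, $s) = split /:/, $t;
    if (($h == 1 && $m > 30) || $h > 1) {
        print "$h :: $m $pid\nArguments:\n$args\n\n";
        kill 9, $pid;
    }'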
You could always write a program. Nobody will hate you for it and you might get thanks from people who can read your code.
It would look something like this, although you don't say where $kb comes from
use strict;
use warnings;

my $kb = '';    # the question never defines $kb, so this is just a placeholder

while (<>) {
    my ($time, $pid, @args) = split;
    my ($h, $m) = split /:/, $time;
    if ( $h > 1 or $h == 1 and $m > 30 ) {
        print "${h}::$m $kb $pid\n";
        print "Arguments:\n";
        print "@args\n\n";
        kill 9, $pid;
    }
}

extract every nth number

I want to extract every 3rd number (42.034, 41.630, 40.158, and so on) from the file.
See the example:
42.034 13.749 28.463 41.630 12.627 28.412 40.158 12.173 30.831 26.823
12.596 32.191 26.366 13.332 32.938 25.289 12.810 32.419 23.949 13.329
Any suggestions using a Perl script?
Thanks,
dac
You can split the file's contents into separate numbers and use the modulo operator to extract every 3rd number:
my $contents = do { local $/; open my $fh, '<', 'file' or die $!; <$fh> };
my @numbers = split /\s+/, $contents;
for (0..$#numbers) {
    $_ % 3 == 0 and print "$numbers[$_]\n";
}
use strict;
use warnings;
use 5.010; ## for say
use List::MoreUtils qw/natatime/;
my @vals = qw/42.034 13.749 28.463 41.630 12.627 28.412 40.158 12.173 30.831
26.823 12.596 32.191 26.366 13.332 32.938 25.289 12.810 32.419 23.949 13.329/;
my $it = natatime 3, @vals;
say while (($_) = $it->());
This is probably the shortest way to specify that. If @list is your list of numbers:
@list[ grep { $_ % 3 == 0 } 0..$#list ]
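For instance:
my @list = qw(42.034 13.749 28.463 41.630 12.627 28.412 40.158);
my @every3rd = @list[ grep { $_ % 3 == 0 } 0..$#list ];
print "@every3rd\n";    # 42.034 41.630 40.158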
It's a one-liner!
$ perl -lane 'print for grep {++$i % 3 == 1} @F' /path/to/your/input
-n gives you line-by-line processing, -a autosplitting for field processing, and $i (effectively initialized to zero for our purposes) keeps count of the number of fields processed...
This method avoids reading the entire file into memory at once:
use strict;
my @queue;
while (<>) {
    push @queue, / ( \d+ (?: \. \d* ) ? ) /gx;
    while (@queue >= 3) {
        my $third = (splice @queue, 0, 3)[2];
        print $third, "\n"; # Or do whatever with it.
    }
}
If the file has 10 numbers in every line you can use this:
perl -pe 's/([\d.]+) [\d.]+ [\d.]+/$1/g;' file
It's not a clean solution but it should "do the job".
Looks like this post lacked a solution that uses grep and doesn't read the whole file into memory.
#!/usr/bin/perl -w
use strict;
my $re = qr/-?\d+(?:\.\d*)?/; # Insert a more precise regexp here
my $n = 3;
my $count = 0;
while (<>) {
    my @res = grep { not $count++ % $n } m/($re)/go;
    print "@res\n";
}
I believe you'll find that this works per spec, behaves politely, and never reads in more than it needs to.
#!/usr/bin/env perl
use 5.010_001;
use strict;
use autodie;
use warnings qw[ FATAL all ];
use open qw[ :std IO :utf8 ];
END { close STDOUT }
use Regexp::Common;
my $real_num_rx = $RE{num}{real};
my $left_edge_rx = qr{
    (?: (?<= \A )                # or use \b
      | (?<= \p{White_Space} )   # or use \D
    )
}x;
my $right_edge_rx = qr{
    (?= \z                       # or use \b
      | \p{White_Space}          # or use \D
    )
}x;
my $a_number_rx = $left_edge_rx
                . $real_num_rx
                . $right_edge_rx
                ;
if (-t STDIN && @ARGV == 0) {
    warn "$0: reading numbers from stdin,"
       . " type ^D to end, ^C to kill\n";
}
$/ = " ";
my $count = 0;
while (<>) {
    while (/($a_number_rx)/g) {
        say $1 if $count++ % 3 == 0;
    }
}

How can I split a pipe-separated string into a list?

Here at work, we are working on a newsletter system that our clients can use. As an intern one of my jobs is to help with the smaller pieces of the puzzle. In this case what I need to do is scan the logs of the email server for bounced messages and add the emails and the reason the email bounced to a "bad email database".
The bad emails table has two columns: 'email' and 'reason'
I use the following statement to get the information from the logs and send it to the Perl script
grep " 550 " /var/log/exim/main.log | awk '{print $5 "|" $23 " " $24 " " $25 " " $26 " " $27 " " $28 " " $29 " " $30 " " $31 " " $32 " " $33}' | perl /devl/bademails/getbademails.pl
If you have suggestions on a more efficient awk script, then I would be glad to hear those too, but my main focus is the Perl script. The awk pipes "foo@bar.com|reason for bounce" to the Perl script. I want to take in these strings, split them at the | and put the two different parts into their respective columns in the database. Here's what I have:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbpath = "dbi:mysql:database=system;host=localhost:3306";
my $dbh = DBI->connect($dbpath, "root", "******")
or die "Can't open database: $DBI::errstr";
while (<STDIN>) {
    my $line = $_;
    my @list = # ? this is where I am confused
    for (my($i) = 0; $i < 1; $i++)
    {
        if (defined($list[$i]))
        {
            my @val = split('|', $list[$i]);
            print "Email: $val[0]\n";
            print "Reason: $val[1]";
            my $sth = $dbh->prepare(qq{INSERT INTO bademails VALUES('$val[0]', '$val[1]')});
            $sth->execute();
            $sth->finish();
        }
    }
}
exit 0;
Something like this would work:
while (<STDIN>) {
    my $line = $_;
    chomp($line);
    my ($email, $reason) = split(/\|/, $line);
    print "Email: $email\n";
    print "Reason: $reason";
    my $sth = $dbh->prepare(qq{INSERT INTO bademails VALUES(?, ?)});
    $sth->execute($email, $reason);
    $sth->finish();
}
You might find it easier to just do the whole thing in Perl. "next unless / 550 /" could replace the grep and a regex could probably replace the awk.
I'm not sure what you want to put in @list? If the awk pipes one line per entry, you'll have that in $line, and you don't need the for loop on the @list.
That said, if you're going to pipe it into Perl, why bother with the grep and AWK in the first place?
#!/usr/bin/perl -w
use strict;
while (<>) {
    next unless / 550 /;
    my @tokens = split ' ', $_;
    my $addr = $tokens[4];
    my $reason = join " ", @tokens[5..$#tokens];
    # ... DBI code
}
Side note about the DBI calls: you should really use placeholders so that a "bad email" wouldn't be able to inject SQL into your database.
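For example, preparing the statement once outside the loop and letting the driver do the quoting (a sketch; the columns 'email' and 'reason' are the ones named in the question):
my $sth = $dbh->prepare(q{INSERT INTO bademails (email, reason) VALUES (?, ?)});
while (<STDIN>) {
    chomp;
    my ($email, $reason) = split /\|/, $_, 2;    # limit 2: extra pipes stay in the reason
    $sth->execute($email, $reason);
}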
Have you considered using App::Ack instead? Instead of shelling out to an external program, you can just use Perl instead. Unfortunately, you'll have to read through the ack program code to really get a sense of how to do this, but you should get a more portable program as a result.
Why not forgo the grep and awk and go straight to Perl?
Disclaimer: I have not checked if the following code compiles:
while (<STDIN>) {
    next unless /550/; # skips over the rest of the while loop
    my @fields = split;
    my $email = $fields[4];
    my $reason = join(' ', @fields[22..32]);
    ...
}
EDIT: See @dland's comment for a further optimisation :-)
Hope this helps?
my(@list) = split /\|/, $line;
This will generate more than two entries in @list if you have extra pipe symbols in the tail of the line. To avoid that, use:
$line =~ m/^([^|]+)\|(.*)$/;
my(@list) = ($1, $2);
The dollar in the regex is arguably superfluous, but also documents 'end of line'.
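The same guarantee is also available from split's optional third argument, which caps the number of fields:
my(@list) = split /\|/, $line, 2;    # at most two fields; extra pipes stay in $list[1]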