Perl script throws syntax error for awk command - perl

I have a file which contains each users userid and password. I need to fetch userid and password from that file by passing userid as an search element using awk command.
user101,smith,smith#123
user102,jones,passj#007
user103,albert,albpass#01
I am using a awk command inside my perl script like this:
...
...
my $userid = ARGV[0];
my $user_report_file = "report_file.txt";
my $data = `awk -F, '$1 ~ /$userid/ {print $2, $3}' $user_report_file`;
my ($user,$pw) = split(" ",$data);
...
...
Here I am getting the error:
awk: ~ /user101/ {print , }
awk: ^ syntax error
But if I run same command in terminal window its able to give result like below:
$] awk -F, '$1 ~ /user101/ {print $2, $3}' report_file.txt
smith smith#123
What could be the issue here?

The backticks are a double-quoted context, so you need to escape any literal $ that you want awk to interpret.
my $data = `awk -F, '\$1 ~ /$userid/ {print \$2, \$3}' $user_report_file`;
If you don't do that, you're interpolating the capture variables from the last successful Perl match.
When I have these sorts of problems, I try the command as a string first to see if it is what I expect:
my $data = "awk -F, '\$1 ~ /$userid/ {print \$2, \$3}' $user_report_file";
say $data;
Here's the Perl equivalent of that command:
$ perl -aF, -e '$F[0]=~/101/ && print "#F[1,2]"' report_file
But, this is something you probably want to do in Perl instead of creating another process:
Interpolating data into external commands can go wrong, such as a filename that is foo.txt; rm -rf /.
The awk you run is the first one in the path, so someone can make that a completely different program (so use the full path, like /usr/bin/awk).
Taint checking can tell you when you are passing unsanitized data to the shell.
Inside a program you don't get all the shortcuts, but if this is the part of your program that is slow, you probably want to rethink how you are accessing this data because scanning the entire file with any tool isn't going to be that fast:
open my $fh, '<', $user_report_file or die;
while( <$fh> ) {
chomp;
my #F = split /,/;
next unless $F[0] =~ /\Q$userid/;
print "#F[1,2]";
last; # if you only want the first one
}

Related

Sed - replace words

I have a problem with replacing string.
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
I want to find occurrence of Svc till | appears and swap place with Stm till | appears.
My attempts went to replacing characters and this is not my goal.
awk -F'|' -v OFS='|'
'{a=b=0;
for(i=1;i<=NF;i++){a=$i~/^Stm=/?i:a;b=$i~/^Svc=/?i:b}
t=$a;$a=$b;$b=t}7' file
outputs:
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
the code exchange the column of Stm.. and Svc.., no matter which one comes first.
If perl solution is okay, assumes only one column matches each for search terms
$ cat ip.txt
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
$ perl -F'\|' -lane '
#i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F;
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t;
print join "|", #F;
' ip.txt
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
-F'\|' -lane split input line on |, see also Perl flags -pe, -pi, -p, -w, -d, -i, -t?
#i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F get index of columns matching Svc and Stm
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t swap the two columns
Or use ($F[$i[0]], $F[$i[1]]) = ($F[$i[1]], $F[$i[0]]); courtesy How can I swap two Perl variables
print join "|", #F print the modified array
You need to use capture groups and backreferences in a string substition.
The below will swap the 2:
echo '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | sed 's/\(Stm.*|\)\(.*\)\(Svc.*|\)/\3\2\1/'
As pointed out in the comment from #Kent, this will not work if the strings were not in that order.

Merge two lines into one within a configuration file

I have several AIX systems with a configuration file, let's call it /etc/bar/config. The file may or may not have a line declaring values for foo. An example would be:
foo = A_1,GROUP_1,USER_1,USER_2,USER_3
The foo line may or may not be the same on all systems. Different systems may have different values and different a different number of values. My task is to add "bare minimum" values in the config file on all systems. The bare minimum line will look like this.
foo = A_1,USER_1,SYS_1,SYS_2
If the line does not exist, I must create it. If the line does exist, I must merge the two lines. Using my examples, the result would be this. The order of the values does not matter.
foo = A_1,GROUP_1,USER_1,USER_3,USER_2,SYS_1,SYS_2
Obviously I want a script to do my work. I have the standard sh, ksh, awk, sed, grep, perl, cut, etc. Since this is AIX, I do not have access to the GNU versions of these utilities.
Originally, I had a script with these commands to replace the entire foo line.
cp /etc/bar/config /etc/bar/config.$$
sed "s/foo = .*/foo = A_1,USER_1,SYS_1,SYS_2/" /etc/bar/config.$$ > /etc/bar/config
But this simply replaces the line. It does take into consideration any pre-existing configuration, including a line that's missing. And I'm doing other configuration modifications in the script, such as adding completely unique lines to other files and restarting a process, so I'd perfer this be some type of shell-based code snippet I can add to my change script. I am open to other options, especially if the solution is simpler.
Some dirty bash/sed:
#!/usr/bin/bash
input_file="some_filename"
v=$(grep -n '^foo *=' "$input_file")
lineno=$(cut -d: -f1 <<< "${v}0:")
base="A_1,USER_1,SYS_1,SYS_2,"
if [[ "$lineno" == 0 ]]; then
echo "foo = A_1,USER_1,SYS_1,SYS_2" >> "$input_file"
else
all=$(sed -n ${lineno}'s/^foo *= */'"$base"'/p' "$input_file" | \
tr ',' '\n' | sort | uniq | tr '\n' ',' | \
sed -e 's/^/foo = /' -e 's/, *$//' -e 's/ */ /g' <<< "$all")
sed -i "${lineno}"'s/.*/'"$all"'/' "$input_file"
fi
Untested bash, etc.
config=/etc/bar/config
default=A_1,USER_1,SYS_1,SYS_2
pattern='^foo[[:blank:]]*=[[:blank:]]*' # shared with grep and sed
if current=$( grep "$pattern" "$config" | sed "s/$pattern//" )
then
new=$( echo "$current,$default" | tr ',' '\n' | sort | uniq | paste -sd, )
sed "s/$pattern.*/foo = $new/" "$config" > "$config.$$.tmp" &&
mv "$config.$$.tmp" "$config"
else
echo "foo = $default" >> "$config"
fi
A vanilla perl solution:
perl -i -lpe '
BEGIN {%foo = map {$_ => 1} qw/A_1 USER_1 SYS_1 SYS_2/}
if (s/^foo\s*=\s*//) {
$found=1;
$foo{$_}=1 for split /,/;
$_ = "foo = " . join(",", keys %foo);
}
END {print "foo = " . join(",", keys %foo) unless $found}
' /etc/bar/config
This Perl code will do as you ask. It expects the path to the file to be modified as a parameter on the command line.
Note that it reads the entire input file into the array #config and then overwrites the same file with the modified data.
It works by building a hash %values from a combination of the items already present in the foo = line and the list of defaults items in #defaults. The combination is sorted in alphabetical order and joined eith a comma
use strict;
use warnings;
my #defaults = qw/ A_1 USER_1 SYS_1 SYS_2 /;
my ($file) = #ARGV;
my #config = <>;
open my $out_fh, '>', $file or die $!;
select $out_fh;
for ( #config ) {
if ( my ($pfx, $vals) = /^(foo \s* = \s* ) (.+) /x ) {
my %values;
++$values{$_} for $vals =~ /[^,\s]+/g;
++$values{$_} for #defaults;
print $pfx, join(',', sort keys %values), "\n";
}
else {
print;
}
}
close $out_fh;
output
foo = A_1,GROUP_1,SYS_1,SYS_2,USER_1,USER_2,USER_3
Since you didn't provide sample input and expected output I couldn't test this but this is the right approach:
awk '
/foo = / { old = ","$3; next }
{ print }
END {
split("A_1,USER_1,SYS_1,SYS_2"old,all,/,/)
for (i in all)
if (!seen[all[i]]++)
new = (new ? new "," : "") all[i]
print "foo =", new
}
' /etc/bar/config > tmp && mv tmp /etc/bar/config

variable for field separator in perl

In awk I can write: awk -F: 'BEGIN {OFS = FS} ...'
In Perl, what's the equivalent of FS? I'd like to write
perl -F: -lane 'BEGIN {$, = [what?]} ...'
update with an example:
echo a:b:c:d | awk -F: 'BEGIN {OFS = FS} {$2 = 42; print}'
echo a:b:c:d | perl -F: -ane 'BEGIN {$, = ":"} $F[1] = 42; print #F'
Both output a:42:c:d
I would prefer not to hard-code the : in the Perl BEGIN block, but refer to wherever the -F option saves its argument.
To sum up, what I'm looking for does not exist:
there's no variable that holds the argument for -F, and more importantly
Perl's "FS" is fundamentally a different data type (regular expression) than the "OFS" (string) -- it does not make sense to join a list of strings using a regex.
Note that the same holds true in awk: FS is a string but acts as regex:
echo a:b,c:d | awk -F'[:,]' 'BEGIN {OFS=FS} {$2=42; print}'
outputs "a[:,]42[:,]c[:,]d"
Thanks for the insight and workarounds though.
You can use perl's -s (similar to awk's -v) to pass a "FS" variable, but the split becomes manual:
echo a:b:c:d | perl -sne '
BEGIN {$, = $FS}
#F = split $FS;
$F[1] = 42;
print #F;
' -- -FS=":"
If you know the exact length of input, you could do this:
echo a:b:c:d | perl -F'(:)' -ane '$, = $F[1]; #F = #F[0,2,4,6]; $F[1] = 42; print #F'
If the input is of variable lengths, you'll need something more sophisticated than #f[0,2,4,6].
EDIT: -F seems to simply provide input to an automatic split() call, which takes a complete RE as an expression. You may be able to find something more suitable by reading the perldoc entries for split, perlre, and perlvar.
You can sort of cheat it, because perl is actually using the split function with your -F argument, and you can tell split to preserve what it splits on by including capturing parens in the regex:
$ echo a:b:c:d | perl -F'(:)' -ane 'print join("/", #F);'
a/:/b/:/c/:/d
You can see what perl's doing with some of these "magic" command-line arguments by using -MO=Deparse, like this:
$ perl -MO=Deparse -F'(:)' -ane 'print join("/", #F);'
LINE: while (defined($_ = <ARGV>)) {
our(#F) = split(/(:)/, $_, 0);
print join('/', #F);
}
-e syntax OK
You'd have to change your #F subscripts to double what they'd normally be ($F[2] = 42).
Darnit...
The best I can do is:
echo a:b:c:d | perl -ne '$v=":";#F = split("$v"); $F[1] = 42; print join("$v", #F) . "\n";'
You don't need the -F: this way, and you're only stating the colon once. I was hoping there was someway of setting variables on the command line like you can with Awk's -v switch.
For one liners, Perl is usually not as clean as Awk, but I remember using Awk before I knew of Perl and writing 1000+ line Awk scripts.
Trying things like this made people think Awk was either named after the sound someone made when they tried to decipher such a script, or stood for AWKward.
There is no input record separator in Perl. You're basically emulating awk by using the -a and -F flags. If you really don't want to hard code the value, then why not just use an environmental variable?
$ export SPLIT=":"
$ perl -F$SPLIT -lane 'BEGIN { $, = $ENV{SPLIT}; } ...'

awk or sed CSV file manipulation

"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"
I want to print the first column and the second column without any extra character I want eliminate all ("", and the third column) Thanks in advance
awk -F'^"|","|"$' '{print $2,$3}' ./infile.csv
The above script will even handle fields that have embedded double quotes or commas. The only downside (if you can call it that) is that the first field starts at $2
Proof of Concept
$ awk -F'^"|","|"$' '{print $2,$3}' ./infile.csv
a004-1b North
a004-1c south
a010-1a midwest
a010-1b south
You need GNU Awk 4 for this to work:
$ gawk -vFPAT='[^",]+' '{print $1,$2}'
I love this new "field pattern" feature. It's my new hammer and everything is a nail. Read up on it at http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
(Written this way it doesn't account for embedded commas or quotes, because the question implies this is not needed.)
If you're using awk for this, why put a Perl tag on it?
In Perl:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
use Text::CSV;
my $csv = Text::CSV->new();
while( my $row = $csv->getline( \*DATA )){
print 'row: ', Dumper $row;
}
__DATA__
"a004-1b","North","at006754"
"a004-1c","south","atytgh0""a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"
awk -F'\"|\,' '{print $2,$5}' sample
Not handling embedded double quotes:
sed -e 's/^"\([^"]*\)","\([^"]*\)".*/\1 \2/'
To handle them:
sed -n -e 's/^"//;s/"$//;s/","/ /;s/","/\n/;P'
The above works even for a 1 or 2 field input.
If you want it "pure" awk or sed, this won't fit the bill, but otherwise it works:
awk -F, '{print $1 " " $2}' | tr -d '"'

awk to perl conversion

I have a directory full of files containing records like:
FAKE ORGANIZATION
799 S FAKE AVE
Northern Blempglorff, RI 99xxx
01/26/2011
These items are being held for you at the location shown below each one.
IF YOU ASKED THAT MATERIAL BE MAILED TO YOU, PLEASE DISREGARD THIS NOTICE.
The Waltons. The complete DAXXXX12118198
Pickup at:CHUPACABRA LOCATION 02/02/2011
GRIMLY, WILFORD
29 FAKE LANE
S. BLEMPGLORFF RI 99XXX
I need to remove all entries with the expression Pickup at:CHUPACABRA LOCATION.
The "record separator" issue:
I can't touch the input file's formatting -- it must be retained as is. Each record
is separated by roughly 40+ new lines.
Here's some awk ( this works ):
BEGIN {
RS="\n\n\n\n\n\n\n\n\n+"
FS="\n"
}
!/CHUPACABRA/{print $0}
My stab with perl:
perl -a -F\n -ne '$/ = "\n\n\n\n\n\n\n\n\n+";$\ = "\n";chomp;$regex="CHUPACABRA";print $_ if $_ !~ m/$regex/i;' data/lib51.000
Nothing is returned. I'm not sure how to specify 'field separator' in perl except at the commandline. Tried the a2p utility -- no dice. For the curious, here's what it produces:
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z
# process any FOO=bar switches
#$FS = ' '; # set field separator
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$/ = "\n\n\n\n\n\n\n\n\n+";
$FS = "\n";
while (<>) {
chomp; # strip record separator
if (!/CHUPACABRA/) {
print $_;
}
}
This has to run under someone's Windows box otherwise I'd stick with awk.
Thanks!
Bubnoff
EDIT ( SOLVED ) **
Thanks mob!
Here's a ( working ) perl script version ( adjusted a2p output ):
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z
# process any FOO=bar switches
#$FS = ' '; # set field separator
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$/ = "\n"x10;
$FS = "\n";
while (<>) {
chomp; # strip record separator
if (!/CHUPACABRA/) {
print $_;
}
}
Feel free to post improvements or CPAN goodies that make this more idiomatic and/or perl-ish. Thanks!
In Perl, the record separator is a literal string, not a regular expression. As the perlvar doc famously says:
Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)
Still, it looks like you can get away with $/="\n" x 10 or something like that:
perl -a -F\n -ne '$/="\n"x10;$\="\n";chomp;$regex="CHUPACABRA";
print if /\S/ && !m/$regex/i;' data/lib51.000
Note the extra /\S/ &&, which will skip empty paragraphs from input that has more than 20 consecutive newlines.
Also, have you considered just installing Cygwin and having awk available on your Windows machine?
There is no need for (much)conversion if you can download gawk for windows
Did you know that Perl comes with a program called a2p that does exactly what you described you want to do in your title?
And, if you have Perl on your machine, the documentation for this program is already there:
C> perldoc a2p
My own suggestion is to get the Llama book and learn Perl anyway. Despite what the Python people say, Perl is a great and flexible language. If you know shell, awk and grep, you'll understand many of the Perl constructs without any problems.