Variable for field separator in Perl

In awk I can write: awk -F: 'BEGIN {OFS = FS} ...'
In Perl, what's the equivalent of FS? I'd like to write
perl -F: -lane 'BEGIN {$, = [what?]} ...'
Update with an example:
echo a:b:c:d | awk -F: 'BEGIN {OFS = FS} {$2 = 42; print}'
echo a:b:c:d | perl -F: -ane 'BEGIN {$, = ":"} $F[1] = 42; print @F'
Both output a:42:c:d
I would prefer not to hard-code the : in the Perl BEGIN block, but refer to wherever the -F option saves its argument.

To sum up, what I'm looking for does not exist:
there's no variable that holds the argument for -F, and more importantly
Perl's "FS" is fundamentally a different data type (regular expression) than the "OFS" (string) -- it does not make sense to join a list of strings using a regex.
Note that the same holds true in awk: FS is a string but acts as a regex:
echo a:b,c:d | awk -F'[:,]' 'BEGIN {OFS=FS} {$2=42; print}'
outputs "a[:,]42[:,]c[:,]d"
Thanks for the insight and workarounds though.
You can use Perl's -s switch (similar to awk's -v) to pass an "FS" variable, but then the split becomes manual:
echo a:b:c:d | perl -sne '
BEGIN {$, = $FS}
@F = split $FS;
$F[1] = 42;
print @F;
' -- -FS=":"
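For comparison, here is a minimal Python sketch of the same single-variable approach. It assumes the separator is a literal string, because Python's str.split does not treat it as a regex. The function name set_field is hypothetical.

```python
def set_field(line: str, sep: str, index: int, value: str) -> str:
    """Split on a literal separator, replace one field, rejoin with the same separator."""
    fields = line.split(sep)  # str.split treats sep literally, not as a regex
    fields[index] = value
    return sep.join(fields)   # the same string doubles as the output separator


print(set_field("a:b:c:d", ":", 1, "42"))  # a:42:c:d
```

This round-trips only because the input and output separators are the same literal string. With a regex separator there is no single string to join with.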

If you know the exact number of fields in the input, you could do this:
echo a:b:c:d | perl -F'(:)' -ane '$, = $F[1]; @F = @F[0,2,4,6]; $F[1] = 42; print @F'
If the number of fields varies, you'll need something more sophisticated than @F[0,2,4,6].
EDIT: -F simply supplies the pattern for an automatic split() call, which accepts a complete regular expression. You may be able to find something more suitable by reading the perldoc entries for split, perlre, and perlvar.

You can sort of cheat it, because perl is actually using the split function with your -F argument, and you can tell split to preserve what it splits on by including capturing parens in the regex:
$ echo a:b:c:d | perl -F'(:)' -ane 'print join("/", @F);'
a/:/b/:/c/:/d
You can see what Perl is doing with some of these "magic" command-line arguments by using -MO=Deparse, like this:
$ perl -MO=Deparse -F'(:)' -ane 'print join("/", @F);'
LINE: while (defined($_ = <ARGV>)) {
our(@F) = split(/(:)/, $_, 0);
print join('/', @F);
}
-e syntax OK
You'd have to double your @F subscripts, since the separators now occupy the odd positions (e.g. $F[2] = 42 for the second field).
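Python offers the same capture-the-separator trick. When the pattern contains a capturing group, re.split returns the separators as well as the fields. The following is a minimal sketch. The helper name is hypothetical, and it assumes the pattern has no capturing groups of its own. Because each original separator is written back unchanged, it also handles variable-length input and a regex separator such as [:,].

```python
import re


def set_field_keep_seps(line: str, pattern: str, index: int, value: str) -> str:
    """Replace field `index`, keeping every original separator as it was."""
    # With a capturing group, re.split returns field, sep, field, sep, ...
    # (the same idea as split /(:)/ in Perl), so field n sits at position 2n.
    parts = re.split(f"({pattern})", line)
    parts[2 * index] = value
    return "".join(parts)


print(set_field_keep_seps("a:b:c:d", ":", 1, "42"))     # a:42:c:d
print(set_field_keep_seps("a:b,c:d", "[:,]", 1, "42"))  # a:42,c:d
```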

Darnit...
The best I can do is:
echo a:b:c:d | perl -ne 'chomp; $v=":"; @F = split($v); $F[1] = 42; print join($v, @F) . "\n";'
You don't need -F this way, and you state the colon only once. I was hoping there was some way of setting variables on the command line like you can with Awk's -v switch.
For one-liners, Perl is usually not as clean as Awk, but I remember using Awk before I knew of Perl and writing 1000+ line Awk scripts.
Trying things like this made people think Awk was either named after the sound someone made when they tried to decipher such a script, or stood for AWKward.

There is no input field separator variable in Perl. You're basically emulating awk by using the -a and -F flags. If you really don't want to hard-code the value, why not use an environment variable?
$ export SPLIT=":"
$ perl -F"$SPLIT" -lane 'BEGIN { $, = $ENV{SPLIT}; } ...'
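Here is a Python sketch of the environment-variable approach. It assumes the variable is named SPLIT, as above, and holds a literal separator. The function name is hypothetical. The mapping is passed in as a parameter so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping


def rewrite_second_field(line: str, env: Mapping[str, str] = os.environ) -> str:
    sep = env["SPLIT"]        # one source of truth, like $ENV{SPLIT}
    fields = line.split(sep)  # input side (Perl: -F"$SPLIT")
    fields[1] = "42"
    return sep.join(fields)   # output side (Perl: $, = $ENV{SPLIT})


print(rewrite_second_field("a:b:c:d", {"SPLIT": ":"}))  # a:42:c:d
```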

Related

Perl script throws syntax error for awk command

I have a file which contains each user's userid and password. I need to fetch the userid and password from that file by passing the userid to an awk command as the search key.
user101,smith,smith#123
user102,jones,passj#007
user103,albert,albpass#01
I am using an awk command inside my Perl script like this:
...
...
my $userid = $ARGV[0];
my $user_report_file = "report_file.txt";
my $data = `awk -F, '$1 ~ /$userid/ {print $2, $3}' $user_report_file`;
my ($user,$pw) = split(" ",$data);
...
...
Here I am getting the error:
awk: ~ /user101/ {print , }
awk: ^ syntax error
But if I run the same command in a terminal window, it gives the expected result:
$] awk -F, '$1 ~ /user101/ {print $2, $3}' report_file.txt
smith smith#123
What could be the issue here?
The backticks are a double-quoted context, so you need to escape any literal $ that you want awk to interpret.
my $data = `awk -F, '\$1 ~ /$userid/ {print \$2, \$3}' $user_report_file`;
If you don't do that, you're interpolating the capture variables from the last successful Perl match.
When I have these sorts of problems, I try the command as a string first to see if it is what I expect:
my $data = "awk -F, '\$1 ~ /$userid/ {print \$2, \$3}' $user_report_file";
say $data;
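Another way to avoid the escaping problem altogether is to bypass the shell. You can then pass the user ID to awk as a variable with -v instead of splicing it into the program text. The sketch below shows this in Python with subprocess. The function names are hypothetical. Note that it compares with == rather than the question's ~ regex match, and that awk still processes backslash escapes in a -v value.

```python
import subprocess
from typing import List


def awk_lookup_argv(userid: str, path: str) -> List[str]:
    """Build an argv list; the user ID travels as an awk variable, not as code."""
    program = "$1 == uid {print $2, $3}"  # $1/$2/$3 reach awk untouched
    return ["awk", "-F,", "-v", f"uid={userid}", program, path]


def awk_lookup(userid: str, path: str) -> str:
    # A list argv means no shell is involved, so there is nothing to escape,
    # and a hostile path such as "foo.txt; rm -rf /" is just a filename.
    result = subprocess.run(awk_lookup_argv(userid, path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Usage would be along the lines of awk_lookup("user101", "report_file.txt").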
Here's the Perl equivalent of that command:
$ perl -aF, -e '$F[0]=~/101/ && print "@F[1,2]"' report_file
But this is something you probably want to do in Perl instead of creating another process:
Interpolating data into external commands can go wrong, for example with a filename such as foo.txt; rm -rf /.
The awk you run is the first one in the path, so someone can make that a completely different program (so use the full path, like /usr/bin/awk).
Taint checking can tell you when you are passing unsanitized data to the shell.
Inside a program you don't get all the shortcuts, but if this is the part of your program that is slow, you probably want to rethink how you are accessing this data because scanning the entire file with any tool isn't going to be that fast:
open my $fh, '<', $user_report_file or die "Cannot open $user_report_file: $!";
while( <$fh> ) {
chomp;
my @F = split /,/;
next unless $F[0] =~ /\Q$userid/;
print "@F[1,2]\n";
last; # if you only want the first one
}
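For comparison, here is a Python sketch of the same in-process scan. It assumes the comma-separated layout shown in the question. The substring test mirrors the unanchored /\Q$userid/ match, so user10 would also match user101. The function name find_user is hypothetical.

```python
import io
from typing import Optional, TextIO, Tuple


def find_user(fh: TextIO, userid: str) -> Optional[Tuple[str, str]]:
    for line in fh:
        fields = line.rstrip("\n").split(",")
        # Literal, unanchored test, like $F[0] =~ /\Q$userid/ in Perl.
        if len(fields) >= 3 and userid in fields[0]:
            return fields[1], fields[2]  # first match only, like `last`
    return None


sample = io.StringIO("user101,smith,smith#123\nuser102,jones,passj#007\n")
print(find_user(sample, "user102"))  # ('jones', 'passj#007')
```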

Sed - replace words

I have a problem with replacing a string.
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
I want to find the Svc field (from Svc up to the next |) and swap its place with the Stm field (from Stm up to the next |).
My attempts only replaced individual characters, which is not my goal.
awk -F'|' -v OFS='|' '{a=b=0;
for(i=1;i<=NF;i++){a=$i~/^Stm=/?i:a;b=$i~/^Svc=/?i:b}
t=$a;$a=$b;$b=t}7' file
outputs:
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
The code swaps the Stm... and Svc... columns, no matter which one comes first.
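Here is the same index-based swap sketched in Python. It assumes each of the Stm= and Svc= columns appears exactly once; next() raises StopIteration otherwise. swap_fields is a hypothetical name.

```python
def swap_fields(line: str, first: str = "Stm=", second: str = "Svc=") -> str:
    fields = line.split("|")
    # Find each column by its prefix, wherever it sits in the line.
    i = next(n for n, f in enumerate(fields) if f.startswith(first))
    j = next(n for n, f in enumerate(fields) if f.startswith(second))
    fields[i], fields[j] = fields[j], fields[i]  # tuple swap
    return "|".join(fields)


line = "|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631"
print(swap_fields(line))
# |Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
```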
If a Perl solution is okay (this assumes exactly one column matches each search term):
$ cat ip.txt
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
$ perl -F'\|' -lane '
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F;
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t;
print join "|", @F;
' ip.txt
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
-F'\|' -lane splits each input line on | (see also the question "Perl flags -pe, -pi, -p, -w, -d, -i, -t?")
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F gets the indices of the columns matching Svc and Stm
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t swaps the two columns
Or use ($F[$i[0]], $F[$i[1]]) = ($F[$i[1]], $F[$i[0]]); (courtesy of the question "How can I swap two Perl variables")
print join "|", @F prints the modified array
You need to use capture groups and backreferences in a string substitution.
The following swaps the two fields:
echo '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | sed 's/\(Stm=[^|]*\)\(.*\)\(Svc=[^|]*\)/\3\2\1/'
As pointed out in the comment from @Kent, this will not work if the fields are not in that order.
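Python's re.sub supports the same capture-and-rearrange technique. In this sketch, which uses a hypothetical function name, [^|]* stops each captured field at the next |, so only the two fields move. Like the sed version, it leaves the line unchanged if Svc comes before Stm.

```python
import re

SWAP = re.compile(r"(Stm=[^|]*)(.*)(Svc=[^|]*)")


def swap_with_backrefs(line: str) -> str:
    # \1 = Stm field, \2 = everything in between, \3 = Svc field; emit \3\2\1.
    return SWAP.sub(r"\3\2\1", line)


print(swap_with_backrefs("|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514"))
# |Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514
```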

Three levels of parentheses/quoting in shell snippets in tcsh?

I'm using tcsh; I want to run a snippet of sh on the command line, which itself contains a Perl snippet, which contains some strings that are to be printed.
This results in three levels of quoting, but only two kinds of quotes are available: " and '.
Is there a way around this?
tcsh# sh -c 'while (true); do mtr --order "SRL BGAWV M" …; hping --icmp-ts --count 12 … | perl -ne '... if (/tsrtt=(\d+)/) {print $0,"\t"…}' ; done'
To include a single quote inside single quotes, use '\''. For example,
perl -ne'... print $0, "\t" ...'
becomes
sh -c '... | perl -ne'\''... print $0, "\t" ...'\'''
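The '\'' rule is mechanical, so it can be applied once per nesting level. Below is a Python sketch; sh_single_quote is a hypothetical helper and the Perl code string is illustrative. It quotes a Perl snippet for perl -e, then quotes that whole command again for sh -c. It then checks the result with shlex.split, which parses POSIX-shell-style quoting.

```python
import shlex


def sh_single_quote(s: str) -> str:
    """Quote s for a POSIX shell: wrap it in '...' and turn each ' into '\\''."""
    return "'" + s.replace("'", "'\\''") + "'"


perl_code = "print 'x', qq{\\t}"                  # innermost level: Perl source
middle = "perl -e " + sh_single_quote(perl_code)  # level 2: a command for sh
outer = "sh -c " + sh_single_quote(middle)        # level 3: the full command line
print(outer)

# Unquoting each level should give back exactly what was quoted.
print(shlex.split(outer) == ["sh", "-c", middle])       # True
print(shlex.split(middle) == ["perl", "-e", perl_code])  # True
```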
In this particular case, an alternative is to replace
perl -ne'... print $0, "\t" ...'
with
perl -ne"... print \$0, qq{\t} ..."
so you'd get
sh -c '... | perl -ne"... print \$0, qq{\t} ..."'
I'd just write the whole thing in Perl:
perl -e'
while (1) {
system("mtr", "--order", "SRL BGAWV M");
open(my $pipe, "-|", "hping", "--icmp-ts", "--count", "12") or die "Cannot run hping: $!";
while (<$pipe>) {
...
}
}
'
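The Perl version above runs commands without a shell and reads a child's output line by line through a pipe. Here is the same pattern in Python. This is a sketch: read_tsrtt is a hypothetical name. The real hping arguments are elided in the original, so a small hypothetical stand-in command that prints tsrtt= lines is used instead.

```python
import re
import subprocess
import sys
from typing import List

TSRTT = re.compile(r"tsrtt=(\d+)")


def read_tsrtt(argv: List[str]) -> List[int]:
    """Run argv without a shell and collect every tsrtt=N value it prints."""
    values = []
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:  # like while (<$pipe>) { ... } in Perl
            match = TSRTT.search(line)
            if match:
                values.append(int(match.group(1)))
    return values


# Hypothetical stand-in for hping; its output lines are made up for the demo.
fake_hping = [sys.executable, "-c",
              "print('seq=0 tsrtt=12'); print('no reply'); print('tsrtt=7')"]
print(read_tsrtt(fake_hping))  # [12, 7]
```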
Use q/../ for single quotes and qq/.../ for double quotes within your Perl code.
For instance, print $0, qq/\t/
Another solution is to build the script with one long echo whose arguments are each single-quoted, take each literal ' from the output of printf "'", and pipe the whole echo to sh instead of passing the string to sh as an argument.
This seems somewhat easier, because it doesn't involve escaping the whole Perl snippet, only the two ' characters used for perl -ne.
tcsh# echo 'while (true); do mtr --order "SRL BGAWV M" …; hping --icmp-ts --count 12 … | perl -ne' `printf "'"` '... if (/tsrtt=(\d+)/) {print $0,"\t"…}' `printf "'"` '; done' | sh

How can I let perl interpret a string variable that represents an address

I want to feed input to a C program with a Perl script like this:
./cprogram $(perl -e 'print "\xab\xcd\xef";').
However, the string must be read from a file, so I get something like this:
./cprogram $(perl -e 'open FILE, "<myfile.txt"; $file_contents = do { local $/; <FILE> }; print $file_contents'). However, now Perl treats the contents as the literal string "\xab\xcd\xef", and I want them interpreted as the byte sequence, as in the first example.
How can this be achieved? It has to be run on a server without File::Slurp.
In the first case, you pass the three bytes AB CD EF (produced by the string literal "\xAB\xCD\xEF") to print.
In the second case, you must be passing something other than those three bytes to print. I suspect you are passing the twelve-character string \xAB\xCD\xEF to print.
So your question becomes: how does one convert the twelve-character string \xAB\xCD\xEF into the three bytes AB CD EF? You'd need some kind of parser, such as:
s/\\x([0-9a-fA-F][0-9a-fA-F])|\\([^x])|([^\\]+)/
$1 ? chr(hex($1)) : $2 ? $2 : $3
/eg
And here it is at work:
$ perl -e'print "\\xAB\\xCD\\xEF";' >file
$ echo -n "$( perl -0777pe'
s{\\x([0-9a-fA-F][0-9a-fA-F])|\\([^x])|([^\\]+)}{
$1 ? chr(hex($1)) : $2 // $3
}eg;
' file )" | od -t x1
0000000 ab cd ef
0000003
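Here is a Python sketch of the same parser. It assumes the input is ASCII text containing literal \xHH escapes; decode_escapes is a hypothetical name. As in the Perl version, \x followed by two hex digits becomes that byte. A backslash followed by any character other than x becomes that character.

```python
import re

# \xHH -> that byte; backslash + any non-x char -> that char (as in the Perl parser).
ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})|\\([^x])")


def decode_escapes(text: str) -> bytes:
    def replace(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        return m.group(2)
    # latin-1 maps code points 0-255 one-to-one onto byte values.
    return ESCAPE.sub(replace, text).encode("latin-1")


print(decode_escapes(r"\xAB\xCD\xEF").hex())  # abcdef
```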
Is Perl's eval too evil? If not, end with print eval("\"$file_contents\"");
Or can you prepare the file in advance using Perl? E.g. print FILE "\xAB\xCD\xEF"; then read the resulting file with your existing code.
Using a bash trick:
perl -e "$(echo "print \"$(cat input)"\")"
which for your example becomes:
./cprogram "$(perl -e "$(echo "print \"$(cat myfile.txt)"\")")"

awk or sed CSV file manipulation

"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"
I want to print the first and second columns without any extra characters: I want to eliminate all the double quotes, the commas, and the third column. Thanks in advance.
awk -F'^"|","|"$' '{print $2,$3}' ./infile.csv
The above script will even handle fields that have embedded double quotes or commas. The only downside (if you can call it that) is that the first field starts at $2.
Proof of Concept
$ awk -F'^"|","|"$' '{print $2,$3}' ./infile.csv
a004-1b North
a004-1c south
a004-1d east
a010-1a midwest
a010-1b south
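For comparison, Python's standard csv module applies full CSV quoting rules, so embedded commas and doubled quotes are handled without a custom separator regex. Below is a minimal sketch using the sample data from the question; first_two_columns is a hypothetical name.

```python
import csv
import io
from typing import List, TextIO

DATA = '''"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"
'''


def first_two_columns(fh: TextIO) -> List[str]:
    # csv.reader strips the quotes and understands embedded "," and "" inside fields.
    return [f"{row[0]} {row[1]}" for row in csv.reader(fh) if row]


for out in first_two_columns(io.StringIO(DATA)):
    print(out)
```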
You need GNU Awk 4 for this to work:
$ gawk -v FPAT='[^",]+' '{print $1,$2}' ./infile.csv
I love this new "field pattern" feature. It's my new hammer and everything is a nail. Read up on it at http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
(Written this way it doesn't account for embedded commas or quotes, because the question implies this is not needed.)
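Python's re.findall gives the same "fields defined by content" behavior as FPAT. Below is a sketch with the same pattern. It carries the same caveats: no embedded commas or quotes, and empty fields simply disappear. first_two is a hypothetical name.

```python
import re

FIELD = re.compile(r'[^",]+')  # same character class as the FPAT value


def first_two(line: str) -> str:
    # The fields are the runs that match, not the text between separators.
    fields = FIELD.findall(line.rstrip("\n"))
    return " ".join(fields[:2])


print(first_two('"a004-1b","North","at006754"'))  # a004-1b North
```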
If you're using awk for this, why put a Perl tag on it?
In Perl:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
use Text::CSV;
my $csv = Text::CSV->new();
while( my $row = $csv->getline( \*DATA )){
print 'row: ', Dumper $row;
}
__DATA__
"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"
awk -F'\"|\,' '{print $2,$5}' sample
Not handling embedded double quotes:
sed -e 's/^"\([^"]*\)","\([^"]*\)".*/\1 \2/'
To handle them:
sed -n -e 's/^"//;s/"$//;s/","/ /;s/","/\n/;P'
The above works even for a 1 or 2 field input.
If you want it "pure" awk or sed, this won't fit the bill, but otherwise it works:
awk -F, '{print $1 " " $2}' | tr -d '"'