Perl grep-like one-liner with regex match condition and assignment? - perl

Say I have this file:
cat > testfile.txt <<'EOF'
test1line
test23line
test456line
EOF
Now, I want to use perl in a "grep like" manner (a one-liner expression/command with file argument, which dumps output to terminal/stdout), such that I match all of the numbers in the above lines, and I make them into zero-padded three digit representation. So, I tried this to check if matching generally works:
perl -nE '/(.*test)(\d+)(line.*)/ && print "$1 - $2 -$3\n";' testfile.txt
#test - 1 -line
#test - 23 -line
#test - 456 -line
Well, that works; so now, I was thinking, I'll just call sprintf to format, assign that to a variable, and print that variable instead, and I'm done; unfortunately, my attempt there failed:
perl -nE '/(.*test)(\d+)(line.*)/ && $b = sprintf("%03d", $2); print "$1 - $b -$3\n";' testfile.txt
#Can't modify logical and (&&) in scalar assignment at -e line 1, near ");"
#Can't modify pattern match (m//) in scalar assignment at -e line 1, near ");"
#Execution of -e aborted due to compilation errors.
Ok, so something went wrong there. As far as I can tell from the error messages, mixing logical AND (&&) with an assignment, as in the one-liner above, does not work.
So, how can I have a Perl one-liner where a regex-condition is checked; and if match is detected, a series of commands are executed, which may involve one or more assignments, and conclude with a print?
EDIT: found an invocation that works:
perl -nE '/(.*test)(\d+)(line.*)/ && printf("$1%03d$3\n", $2);' testfile.txt
#test001line
#test023line
#test456line
... but I'd still like to know how to do the same via sprintf and assignment to variable.

Precedence issue.
/.../ && $b = ...;
means
( /.../ && $b ) = ...;
You could use
/.../ && ( $b = ... );
/.../ && do { $b = ...; };
/.../ and $b = ...;
$b = ... if /.../;
But there's a second problem. You call print unconditionally.
perl -ne'printf "%s-%03d-%s\n", $1, $2, $3 if /(.*test)(\d+)(line.*)/'

Related

Perl CLI code cannot do a string line appended

I'm trying to use a perl -npe one-liner to surround each line with =.
$ for i in {1..4}; { echo $i ;} |perl -npe '...'
=1=
=2=
=3=
=4=
The following is my first attempt. Note that the line feeds are in the incorrect position.
$ for i in {1..4}; { echo $i ;} |perl -npe '$_= "=".$_."=" '
=1
==2
==3
==4
=
I tried using chop to remove the line feeds and then re-add them in the correct position, but it didn't work.
$ for i in {1..4} ;{ echo $i ;} |perl -npe '$_= "=".chop($_)."=\n" '
=
=
=
=
=
=
=
=
How can I fix this? Thanks.
chop returns the removed character, not the remaining string. It modifies the variable in-place. So the following is the correct usage:
perl -npe'chop( $_ ); $_ = "=$_=\n"'
But we can improve this.
It's safer to use chomp instead of chop to remove trailing line feeds.
-n is implied by -p, and it's customary to leave it out when -p is used.
chomp and chop modify $_ by default, so we don't need to explicitly pass $_.
perl -pe'chomp; $_ = "=$_=\n"'
Finally, we can get the same exact behaviour out of -l.
perl -ple'$_ = "=$_="'

Get value of autosplit delimiter?

If I run a script with perl -Fsomething, is that something value saved anywhere in the Perl environment where the script can find it? I'd like to write a script that by default reuses the input delimiter (if it's a string and not a regular expression) as the output delimiter.
Looking at the source, I don't think the delimiter is saved anywhere. When you run
perl -F, -an
the lexer actually generates the code
LINE: while (<>) {our @F=split(q\0,\0);
and parses it. At this point, any information about the delimiter is lost.
Your best option is to split by hand:
perl -ne'BEGIN { $F="," } @F=split(/$F/); print join($F, @F)' foo.csv
or to pass the delimiter as an argument to your script:
F=,; perl -F$F -sane'print join($F, @F)' -- -F=$F foo.csv
or to pass the delimiter as an environment variable:
export F=,; perl -F$F -ane'print join($ENV{F}, @F)' foo.csv
As @ThisSuitIsBlackNot says, it looks like the delimiter is not saved anywhere.
This is how perl.c stores the -F parameter
case 'F':
PL_minus_a = TRUE;
PL_minus_F = TRUE;
PL_minus_n = TRUE;
PL_splitstr = ++s;
while (*s && !isSPACE(*s)) ++s;
PL_splitstr = savepvn(PL_splitstr, s - PL_splitstr);
return s;
And then the lexer generates the code
LINE: while (<>) {our @F=split(q\0,\0);
However this is of course compiled, and if you run it with B::Deparse you can see what is stored.
$ perl -MO=Deparse -F/e/ -e ''
LINE: while (defined($_ = <ARGV>)) {
our(@F) = split(/e/, $_, 0);
}
-e syntax OK
Being perl there is always a way, however ugly. (And this is some of the ugliest code I have written in a while):
use B::Deparse;
use Capture::Tiny qw/capture_stdout/;
BEGIN {
my $f_var;
}
unless ($f_var) {
$stdout = capture_stdout {
my $sub = B::Deparse::compile();
&{$sub}; # Have to capture stdout, since I won't bother to setup compile to return the text, instead of printing
};
my (undef, $split_line, undef) = split(/\n/, $stdout, 3);
($f_var) = $split_line =~ /our\(\@F\) = split\((.*)\, \$\_\, 0\);/;
print $f_var,"\n";
}
Output:
$ perl -Fe/\\\(\\[\\\<\\{\"e testy.pl
m#e/\(\[\<\{"e#
You could possibly traverse the bytecode instead, since the start will probably be identical every time until you reach the pattern.

Perl confusion over grep {//} and eval {grep //} syntax

Kindly shed some light on these two ways of grep'ping in Perl as how they differ from each other
eval {grep /pattern/, ....};
and the normal one,
grep {/pattern/} ....
First of all, there are 2 independent differences between your alternatives, and they have different purposes. Wrapping the grep in eval allows you to catch errors that are normally fatal (like a syntax error in the regular expression). Putting a block after the grep keyword lets you use a matching rule that is more complex than a single expression.
Here are the 4 combinations that can be made out of your 2 examples:
@y = grep /pattern/, @x; # grep EXPR, no eval
@y = grep { /pattern/ } @x; # grep BLOCK, no eval
eval { @y = grep /pattern/, @x }; # grep EXPR inside eval BLOCK
eval { @y = grep { /pattern/ } @x }; # grep BLOCK inside eval BLOCK
Now we can look in more detail at 2 separate questions: what do you gain from the eval, and what do you gain from using the grep BLOCK syntax? In the simple cases shown above, you gain nothing from either one.
When you want to do a grep where the matching condition is more complicated than a simple regexp, grep BLOCK gives you more flexibility in how you express the condition. You can put multiple statements in the block and use temporary variables. For example this grep within a grep:
# Note: not the most efficient method for finding an intersection of arrays.
my @a = qw/A E I O U/;
my @b = qw/A B D O P Q R/;
my @intersection = grep { my $x = $_; grep { $_ eq $x } @b } @a;
print "@intersection\n";
In the above example, we needed a temporary $x to hold the value being tested by the outer grep so it could be compared to $_ in the inner grep. The inner grep could have been written without a BLOCK as grep $_ eq $x, @b but I think using the same syntax for both looks better.
The eval block would be useful if you were looking for matches of a regexp that is determined at runtime, and you don't want your program to abort when the regexp is invalid. For example:
@x = qw/foo bar baz quux xyzzy/;
do {
print STDERR 'Enter pattern: ';
$pat = <STDIN>;
chomp $pat;
eval {
@y = grep /$pat/, @x;
};
print STDERR "Invalid pattern\n" if $@; # report the error caught by eval
} while ($@);
print "result: @y\n";
We ask the user for a pattern and print the list of matches from @x. If the pattern is not a valid regexp, the eval catches the error and puts it into $@, and the program keeps running: the "Invalid pattern" message is printed and the loop continues so the user can try again. When a valid regexp is entered, there is no error, so $@ is false and the "result" line is printed. Sample run:
Enter pattern: z$
result: baz
Enter pattern: ^(?!....)
result: foo bar baz
Enter pattern: ([^z])\1
result: foo quux
Enter pattern: [xyz
Invalid pattern
Enter pattern: [xyz]
result: baz quux xyzzy
Enter pattern: ^C
Note that eval doesn't catch syntax errors in a fixed regexp. Those are compiled when the script is compiled, so if you have a simple script like
perl -ne 'print if eval { /[xyz/ } or eval { /^ba/ }'
it fails immediately. The evals don't help. Compare to
perl -ne '$x = "[xyz"; $y = "^ba"; print if eval { /$x/ } or eval { /$y/ }'
which is the same thing but with regexps built from variables; this one runs and prints matches for /^ba/. The first eval always returns false (and sets $@, which doesn't matter if you don't look at it).

how to return the search results in perl

I would like to write a script that returns the matched text whenever a regex matches. I am having difficulty writing the regex.
Content of My input file is as below:
Number a123;
Number b456789 vit;
alphabet fty;
I want it to return a123 and b456789, that is, the string after "Number " and before whitespace or ";".
I have tried with below cmd line:
my @result=grep /Number/,@input_file;
print "@result\n";
The result i obtained is shown below:
Number a123;
Number b456789 vit;
Whereas the expected result is:
a123
b456789
Can anyone help on this?
Perl's grep function selects/filters all elements from a list that match a certain condition. In your case, you selected all elements that match the regex /Number/ from the @input_file array.
To select the non-whitespace string after Number use this Regex:
my $regex = qr{
Number # Match the literal string 'Number'
\s+ # match any number of whitespace characters
([^\s;]+) # Capture the following non-spaces-or-semicolons into $1
# using a negated character class
}x; # use /x modifier to allow whitespaces in pattern
# for better formatting
My suggestion would be to loop directly over the input file handle:
while(defined(my $line = <$input>)) {
$line =~ /$regex/;
print "Found: $1" if length $1; # skip if nothing was found
}
If you have to use an array, a foreach-loop would be preferable:
foreach my $line (@input_lines) {
$line =~ /$regex/;
print "Found: $1" if length $1; # skip if nothing was found
}
If you don't want to print your matches directly but to store them in an array, push the values into the array inside your loop (both work) or use the map function. The map function replaces each input element by the value of the specified operation:
my @result = map {/$regex/; length $1 ? $1 : ()} @input_file;
or
my @result = map {/$regex/; length $1 ? $1 : ()} <$input>;
Inside the map block, we match the regex against the current array element. If we have a match, we return $1; otherwise we return an empty list. This gets flattened into invisibility, so we don't create an entry in @result. This is different from returning undef, which would create an undef element in your array.
If your script is intended as a simple filter, you can use
$ cat FILE | perl -nle 'print $1 if /Number\s+([^\s;]+)/'
or
$ cat FILE | perl -nle 'for (/Number\s+([^\s;]+)/g) { print }'
if there can be multiple occurrences on the same line.
perl -lne 'if(/Number/){s/.*\s([a-zA-Z])([\d]+).*$/\1\2/g;print}' your_file
tested below:
> cat temp
Number a123;
Number b456789 vit;
alphabet fty;
> perl -lne 'if(/Number/){s/.*\s([a-zA-Z])([\d]+).*$/\1\2/g;print}' temp
a123
b456789
>

Perl: extract rows from 1 to n (Windows)

I want to extract rows 1 to n from my .csv file. Using this
perl -ne 'if ($. == 3) {print;exit}' infile.txt
I can extract only one row. How to put a range of rows into this script?
If you have only a single range and a single, possibly concatenated input stream, you can use:
#!/usr/bin/perl -n
if (my $seqno = 1 .. 3) {
print;
exit if $seqno =~ /E/;
}
But if you want it to apply to each input file, you need to catch the end of each file:
#!/usr/bin/perl -n
print if my $seqno = 1 .. 3;
close ARGV if eof || $seqno =~ /E/;
And if you want to be kind to people who forget args, add a nice warning in a BEGIN or INIT clause:
#!/usr/bin/perl -n
BEGIN { warn "$0: reading from stdin\n" if @ARGV == 0 && -t }
print if my $seqno = 1 .. 3;
close ARGV if eof || $seqno =~ /E/;
Notable points include:
You can use -n or -p on the #! line. You could also put some (but not all) other command line switches there, like ‑l or ‑a.
Numeric literals as operands to the scalar flip‐flop operator are each compared against the readline counter, so a scalar 1 .. 3 is really ($. == 1) .. ($. == 3).
Calling eof with neither an argument nor empty parens means the last file read in the magic ARGV list of files. This contrasts with eof(), which is the end of the entire <ARGV> iteration.
A flip‐flop operator’s final sequence number is returned with an "E0" appended to it.
The -t operator, which calls libc’s isatty(3), defaults to the STDIN handle, unlike any of the other filetest operators.
A BEGIN{} block happens during compilation, so if you try to decompile this script with ‑MO=Deparse to see what it really does, that check will execute. With an INIT{}, it will not.
Doing just that will reveal that the implicit input loop has a label called LINE that you perhaps might in other circumstances use to your advantage.
HTH
What's wrong with:
head -3 infile.txt
If you really must use Perl then this works:
perl -ne 'if ($. <= 3) {print} else {exit}' infile.txt
You can use the range operator:
perl -ne 'if (1 .. 3) { print } else { last }' infile.txt