Perl filter with substitution

I am attempting to create a Perl script that filters data presented on STDIN, changing all occurrences of one string to another and outputting all input lines, changed and unchanged, to STDOUT. FROMSTRING and TOSTRING can be Perl-compatible regular expressions. I am unable to get matching output.
Here is an example of what I am trying to achieve.
echo "Today is Saturday" | filter.pl 'a' '#'
Output: Tod#y is S#turd#y
echo io | filter.pl '([aeiou])([aeiou])' '$2$1'
Output: oi
#!/usr/bin/perl
use strict;
use warnings;
if (@ARGV != 2){
    print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n"
}
exit 1;
my $FROM = $ARGV[0];
my $TO = $ARGV[1];
my $inLine = "";
while (<STDIN>){
    $inLine = $_;
    $inLine =~ s/$FROM/$TO/;
    print $inLine
}
exit 0;

First off, the replacement part of a s/.../.../ operation is not a regex; it works like a double-quoted string.
There are a couple of issues with your code.
Your exit 1; statement appears in the middle of the main code, not in the error block. You probably want:
if (@ARGV != 2) {
    print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n";
    exit 1;
}
You're missing a g flag if you want multiple substitutions to happen in the same line:
$inLine =~ s/$FROM/$TO/g;
There's no need to predeclare $inLine; it's only used in one block.
There's also no need to read a line into $_ just to copy it into $inLine.
It's common to use $names_like_this for variables and functions, not $namesLikeThis.
You can use $0 instead of hardcoding the program name in the error message.
exit 0; is redundant at the end.
The following is closer to how I'd write it:
#!/usr/bin/perl
use strict;
use warnings;
if (@ARGV != 2) {
    die "Usage: $0 FROMSTRING TOSTRING\n";
}
my ($from, $to) = @ARGV;
while (my $line = readline STDIN) {
    $line =~ s/$from/$to/g;
    print $line;
}
That said, none of this addresses your second example with '$2$1' as the replacement. The above code won't do what you want because $to is a plain string. Perl won't scan it to look for things like $1 and replace them.
When you write "foo $bar baz" in your code, it means the same thing as 'foo ' . $bar . ' baz', but this only applies to code, i.e. stuff that literally appears in your source code. The contents of $bar aren't re-scanned at runtime to expand e.g. \n or $quux. This also applies to $1 and friends, which are just normal variables.
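A small sketch of that point, with made-up variable names: interpolation happens once, on text that appears in the source code, and the contents of a variable are never scanned again.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $bar  = 'world';
my $code = "hello $bar";   # $bar appears in source code, so it is interpolated
my $data = 'hello $bar';   # single quotes: a literal dollar sign followed by "bar"

# Interpolating $data inserts its contents as-is; they are not re-scanned.
my $copy = "$data";

print "$code\n";   # hello world
print "$copy\n";   # hello $bar

# The replacement part of s/// behaves the same way: $to is inserted verbatim.
my $to = '$2$1';
(my $line = 'io') =~ s/([aeiou])([aeiou])/$to/;
print "$line\n";   # $2$1
```

The last line prints the literal text $2$1 rather than oi, which is exactly the problem with the second example.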
So how do you get '$2$1' to work?
One way is to mess around with eval, but I don't like it because, well, it's eval: If you're not very careful, it would allow someone to execute arbitrary code by passing the right replacement "string".
Doing it without eval is possible and even easy with e.g. Data::Munge::replace:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Munge qw(replace);
if (@ARGV != 2) {
    die "Usage: $0 FROMSTRING TOSTRING\n";
}
my ($from, $to) = @ARGV;
while (my $line = readline STDIN) {
    print replace($line, $from, $to, 'g');
}
replace works like JavaScript's String#replace in that it expands special $ sequences.
Doing it by hand is also possible but slightly annoying because you basically have to treat $to as a template and expand all $ sequences by hand (e.g. by using another regex substitution):
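# untested
$line =~ s{$from}{
    # copy the match offsets; the inner s/// below overwrites @- and @+
    my @start = @-;
    my @stop = @+;
    (my $r = $to) =~ s{\$([0-9]+|\$)}{
        $1 eq '$'
            ? '$'
            # the offsets refer to $line, the string being matched
            : substr($line, $start[$1], $stop[$1] - $start[$1])
    }eg;
    $r
}eg;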
(This does not implement braced groups such as ${1}, ${2}, etc. Those are left as an exercise for the reader.)
This code is sufficiently annoying to write (and look at) that I much prefer using a module like Data::Munge for this sort of thing.
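For completeness, here is a rough sketch of the by-hand approach wrapped in a subroutine, including the braced form. The names replace_all and $token are made up for this illustration, and it assumes Perl 5.10 or later for the // operator. It is not how Data::Munge is implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Matches one template token: $1, ${1} or $$ (a literal dollar sign).
my $token = qr/\$(?:([0-9]+)|\{([0-9]+)\}|(\$))/;

# Hypothetical helper: replace every match of $from in $text with $to,
# expanding $N, ${N} and $$ in $to by hand (no eval).
sub replace_all {
    my ($text, $from, $to) = @_;
    $text =~ s{$from}{
        # Collect the captured groups now; the inner s/// resets @- and @+.
        my @groups = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : ''
        } 0 .. $#+;
        (my $r = $to) =~ s{$token}{
            defined $3 ? '$' : ($groups[ defined $1 ? $1 : $2 ] // '')
        }eg;
        $r;
    }eg;
    return $text;
}

print replace_all('Today is Saturday', 'a', '#'), "\n";                # Tod#y is S#turd#y
print replace_all('io', '([aeiou])([aeiou])', '$2$1'), "\n";           # oi
```

Groups that did not take part in the match, or that do not exist, expand to the empty string here. That is a design choice of this sketch, not something the question specifies.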

Three errors found:
a missing ; after the error message
exit 1; is outside the if block
a missing g flag: $inLine =~ s/$FROM/$TO/g;
Like this:
#!/usr/bin/perl
use strict;
use warnings;
if (@ARGV != 2){
    print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n";
    exit 1;
}
my $FROM = $ARGV[0];
my $TO = $ARGV[1];
my $inLine = "";
while (<STDIN>){
    $inLine = $_;
    $inLine =~ s/$FROM/$TO/g;
    print $inLine
}
exit 0;

Related

How can I have my Perl script read parameters from the command line?

A parsing script I am using looks like this:
use strict;
use warnings;
use 5.010;
use autodie;
my (@header, @fh);
while ( <> ) {
    if ( /^(\d+)/ ) {
        my $n = int $1 / 1000;
        unless ( $fh[$n] ) {
            my $file = sprintf 'file%d.txt', $n+1;
            open $fh[$n], '>', $file;
            print { $fh[$n] } @header;
        }
        print { $fh[$n] } $_;
    }
    else {
        push @header, $_;
    }
}
close $_ for grep $_, @fh;
The file I pass to the script is processed and the output goes to file1, file2, .... How can I modify the script so that it takes an additional parameter that changes the output names: file1_1, file1_2, ... if the additional parameter is 1, and file2_1, file2_2, ... if it is 2?

If I were to make a change like that, I would make that parameter optional, so as not to break other dependencies. Since you are already relying on @ARGV, that means that we cannot simply shift the parameter, or we must make the parameter non-optional.
E.g.:
my $prefix = shift; # non-optional parameter now
...
my $file = sprintf 'file%s_%d.txt', $prefix, $n+1
But if this program is then used by someone or something that does not expect that parameter, it will remove a file from the input and break the program.
Instead, you can use the rudimentary switch parsing on the command line with the -s switch, or use Getopt::Long, which is a popular module for this purpose.
perl -s program.pl -prefix=1 input1 input2 ...
And then inside the program, either use our $prefix or $main::prefix, so that you do not get strict errors. Then you can also check if $prefix is defined, and handle it accordingly. E.g.:
if (defined $main::prefix) {
    $main::prefix .= "_"; # append "_"
} else {
    $main::prefix = ""; # empty string
}
my $file = sprintf 'file%s%d.txt', $main::prefix, $n + 1;
Or using Getopt::Long:
use strict;
use warnings;
use Getopt::Long;
my $prefix;
GetOptions("prefix=s" => \$prefix);
Usage:
perl program.pl -prefix=1 input1 input2 ...
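Putting the Getopt::Long pieces together, a minimal sketch might look like the following. The helper name parse_args is made up, and it uses GetOptionsFromArray (exported by Getopt::Long on request) rather than GetOptions only so that the example can run on a fixed argument list instead of the real @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an optional --prefix option and return the
# string to put into file names, followed by the remaining arguments.
sub parse_args {
    my @args = @_;
    my $prefix;
    GetOptionsFromArray(\@args, 'prefix=s' => \$prefix)
        or die "Usage: $0 [--prefix=N] FILE...\n";
    # With no prefix given, fall back to the old file names.
    my $file_prefix = defined $prefix ? "${prefix}_" : '';
    return ($file_prefix, @args);
}

my ($file_prefix, @inputs) = parse_args('--prefix=1', 'input1', 'input2');
printf "file%s%d.txt\n", $file_prefix, 1;   # file1_1.txt
print "@inputs\n";                          # input1 input2
```

Because the option is removed from the argument list before the input files are read, callers that never pass a prefix keep getting file1.txt, file2.txt and so on.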

basic perl conditional script not working

I am a beginner to perl and have just been messing around trying to create little scripts. I'm not sure what is wrong here but it just falls through to the else every time as if nothing I input satisfies the if or elsif conditions. Is it because eq is the wrong operator? Or is there something else wrong in my code? Thanks!
#!/usr/bin/perl
use strict;
use warnings;
print "what is your name?\n";
my $name = readline STDIN;
print "Hello $name How are you today?\n";
my $feeling = readline STDIN;
if ($feeling eq "happy") {
print "that's good!\n";
}
elsif ($feeling eq "good") {
print "okay!\n";
}
else {
print "Interesting\n";
}

Use chomp($feeling);
#!/usr/bin/perl
use strict;
use warnings;
print "what is your name?\n";
my $name = readline STDIN;
chomp($name);
print "Hello $name How are you today?\n";
my $feeling = readline STDIN;
chomp($feeling);
if ($feeling eq "happy") {
print "that's good!\n";
}
elsif ($feeling eq "good") {
print "okay!\n";
}
else {
print "Interesting\n";
}
readline STDIN returns everything that was typed, including the newline produced by the final Enter. If you type "happy" and press Enter, $feeling holds "happy\n", which is not equal to "happy". chomp removes that trailing newline.
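A quick way to see the difference without typing anything, using a hard-coded string in place of the real input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $feeling = "happy\n";   # what readline returns for "happy" plus Enter
my $before  = ($feeling eq 'happy') ? 'match' : 'no match';

my $removed = chomp $feeling;   # chomp returns the number of characters removed
my $after   = ($feeling eq 'happy') ? 'match' : 'no match';

print "before chomp: $before\n";                    # before chomp: no match
print "after chomp: $after (removed $removed)\n";   # after chomp: match (removed 1)
```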

chomp is used to "chomp off" the input record separator, which by default is a newline character.
#!/usr/bin/perl
use strict;
use warnings;
use 5.012; # to use things like 'say' and 'given'
say "what is your name?"; # 'say' is like 'print', but means you don't have to use '\n'
my $name = <STDIN>; # good to include angled brackets <>
chomp($name); # remove the newline that was entered along with the name
say qq{Hello $name, how are you today?}; # qq{} acts like double-quotes ("")
my $feeling = <STDIN>;
chomp $feeling; # notice parentheses aren't always needed
# you could also do chomp(my $feeling=<STDIN>);
given (lc $feeling){ # 'given' is Perl's version of a Switch and lc makes input lowercase
when('happy') { say q{That's good.} } # q{} acts like single-quotes ('')
when('good') { say q{Okay!} }
default { say q{Interesting} } # your else-case
}
As the warnings suggest, given is experimental until smartmatch is figured out. It is perfectly acceptable to use the if-elsif-else structure, if you choose.

doing a substitution until certain condition is true

I'm trying to edit a text using Perl. I need to make a substitution, but the substitution must no longer be applied once a specific word is found in the text. So, imagine I want to replace every "hello" with "goodbye", but the substitution must stop once the word "foo" is found.
I tried to do this:
use warnings;
use strict;
$/ = undef;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
while(<F>) {
do {s/hello/goodbay/} until (m{foo});
print;
}
close F;
But, as a result, only the first "hello" of my text is changed.
Any suggestion?

Trying to think what would be the most efficient. It should be one of the following:
s{^(.*?)(foo|\z)}{
my $s = $1;
$s =~ s{hello}{goodbay}g;
$s.$2
}se;
print;
or (same as above, but requires 5.14+)
s{^(.*?)(foo|\z)}{ s{hello}{goodbay}gr . $2 }se;
print;
or
my $pos = /foo/ ? $-[0] : length;
my $s = substr($_, 0, $pos, '');
$s =~ s{hello}{goodbay}g;
print($s);
print;
Both work even if foo isn't present.
This solution uses less memory:
# Assumes foo will always be present
# (though it could be expanded to handle that).
# Assumes foo isn't a regex pattern.
local $/ = "foo";
$_ = <$fh>;
chomp;
s{hello}{goodbay}g;
print;
print $/;
local $/;
print <$fh>;

If the substrings you work on (the hello and foo of your example) are single words, an easy way would probably be to replace $/ = undef; with $/ = " ";. Currently you slurp in the whole file at once, meaning the while loop gets executed at most once.
That is because there is only one "line" in the whole input after you told perl that there are no line separators.
If you use a space as input separator, it will loop over the input word by word and hopefully work as you intend.

Use a flag variable:
use warnings;
use strict;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
my $replace=1;
while(<F>) {
$replace = 0 if m{foo};
s/hello/goodbye/g if $replace;
print;
}
close F;
This stops at the line containing the end pattern. It will be slightly more complicated if you want to substitute up to just before the match.
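As a rough sketch of that more complicated variant, the following replaces everything up to just before the first occurrence of the end word, including the part of that line in front of it. The helper name replace_until is made up, and it treats the end word as a plain string rather than a regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace "hello" with "goodbye" in @lines, but only
# in the text that comes before the first occurrence of $stop.
sub replace_until {
    my ($stop, @lines) = @_;
    my $replace = 1;
    for my $line (@lines) {
        next unless $replace;            # past the stop word: leave untouched
        my $pos = index $line, $stop;
        if ($pos >= 0) {
            # Only rewrite the part of this line in front of the stop word.
            my $head = substr $line, 0, $pos;
            $head =~ s/hello/goodbye/g;
            substr($line, 0, $pos, $head);
            $replace = 0;
        }
        else {
            $line =~ s/hello/goodbye/g;
        }
    }
    return @lines;
}

print replace_until('foo', "hello a\n", "hello foo hello\n", "hello b\n");
```

With that input it prints "goodbye a", then "goodbye foo hello", then "hello b", each on its own line.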

This answer uses ${^PREMATCH} and related variables introduced in Perl 5.10.
#!/usr/bin/env perl
use v5.10.0;
use strict;
use warnings;
my $foo_found;
while (my $line = <>) {
    if (!$foo_found) {
        if ($line =~ m/foo/ip) {
            # ${^PREMATCH} and friends are read-only, so copy them first
            my ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
            # only replace hellos in the part before foo
            $pre =~ s/hello/goodbye/g;
            $line = "$pre$match$post";
            $foo_found++;
        } else {
            $line =~ s/hello/goodbye/ig;
        }
    }
    print $line;
}
Given the following input:
hello cruel world
hello baseball
hello mudda, hello fadda
foo
The rest of the hellos should stay
Last hello
I get the following output
goodbye cruel world
goodbye baseball
goodbye mudda, goodbye fadda
foo
The rest of the hellos should stay
Last hello
If you don't have 5.10 you can use $` and related variables but they come with a performance hit. See perldoc perlvar for details.

Print only the first word in line

I need some help with following perl code.
#!perl -w
use strict;
use warnings;
open my $file, '<', 'ubb' or die $1;
my $spool = 0;
my @matchingLines;
while (<$file>) {
    if (/GROUPS/i) {
        $spool = 1;
        next;
    }
    elsif (/SERVERS/i) {
        $spool = 0;
        print map { "$_" } @matchingLines;
        @matchingLines = ();
    }
    if ($spool) {
        push (@matchingLines, $_);
    }
}
close ($file);
Output from that is shown below.
ADM LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=1
ADM_TMS LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=2
TMSNAME=TMS
ADM_1 LMID=GW_S4_1_PM
GRPNO=11
ADM_2 LMID=GW_S4_2_BM
GRPNO=12
DMWSG_Gateway_1 LMID=GW_S4_1_PM
GRPNO=101
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_Gateway_2 LMID=GW_S4_2_BM
GRPNO=201
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_1 LMID=GW_S4_1_PM
GRPNO=106
DMWSG_2 LMID=GW_S4_2_BM
GRPNO=206
But I only would like to get the first word of each line (e.g. ADM, ADM_TMS, ADM_1).
Note that the file has a lot of other lines above and below what's printed here. I only want to do this for lines that are in between GROUPS and SERVERS.

I would suggest 2 changes in your code
Note: Tested these with your sample data (plus other stuff) in your question.
I: Extract first word before push
Change this
push (@matchingLines, $_);
to
push (@matchingLines, /^(\S+)/);
This would push the first word of each line into the array, instead of the entire line.
Note that /^(\S+)/ is shorthand for $_ =~ /^(\S+)/. If you're using an explicit loop variable like in 7stud's answer, you can't use this shorthand, use the explicit syntax instead, say $line =~ /^(\S+)/ or whatever your loop variable is.
Of course, you can also use split function as suggested in 7stud's answer.
II: Change how you print
Change this
print map { "$_" } @matchingLines;
into
local $" = "\n";
print "@matchingLines \n";
$" specifies the separator placed between list elements when an array is interpolated into a double-quoted string (a space by default).
Alternatively, as per TLP's suggestion,
$\ = $/;
print for @lines;
or
print join("\n", @lines), "\n"
Note that $/ is the input record separator (newline by default), $\ is the output record separator (undefined by default). $\ is appended after each print command.
For more information on $/, $\, and $":
See perldoc perlvar (just use CTRL+F to find them in that page)
Or you can simply use perldoc -v '$/' etc on your console to get those information.
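A small sketch of how $" and $\ behave, using a few first words from the question as sample data. Both variables are set with local inside a block so the change does not leak into the rest of the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = qw(ADM ADM_TMS ADM_1);

my $joined;
{
    local $" = "\n";       # separator used when an array is interpolated
    $joined = "@lines\n";
}
print $joined;             # one word per line

{
    local $\ = "\n";       # appended to the output of every print
    print for @lines;      # prints the same three lines again
}
```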
Note on readability
I don't think implicit regex matching i.e. /pattern/ is bad per se.
But matching against a variable, i.e. $variable =~ /pattern/ is more readable (as in you can immediately see there's a regex matching going on) and more beginner-friendly, at the cost of conciseness.

use strict;
use warnings;
use 5.014; #say()
my $fname = 'data.txt';
open my $INFILE, '<', $fname
    or die "Couldn't open $fname: $!"; #-->Not $1"
my $recording_on = 0;
my @matching_lines;
for my $line (<$INFILE>) {
    if ($line =~ /groups/i) {
        $recording_on = 1;
        next;
    }
    elsif ($line =~ /servers/i) {
        say for @matching_lines; #say() is the same as print(), but it adds a newline at the end
        @matching_lines = ();
        $recording_on = 0;
    }
    if ($recording_on) {
        my ($first_word, $trash) = split " ", $line, 2;
        push @matching_lines, $first_word;
    }
}
close $INFILE;

You can use the flip-flop operator (range) to select a part of your input. The idea of this operator is that it returns false until its LHS (left hand side) returns true, and after that it returns true until its RHS returns true, after which it is reset. It is somewhat like preserving a state.
Note that the edge lines are also included in the match, so we need to remove those. After that, use doubleDown's idea and push /^(\S+)/ onto an array. The nice thing about using this with push is that the capture regex returns an empty list if it fails, and this gives us a warning-free failure when the regex does not match.
use strict;
use warnings;
my @matches;
while (<>) {
    if (/GROUPS/i .. /SERVERS/i) { # flip-flop remembers the matches
        next if (/GROUPS/i or /SERVERS/i);
        push @matches, /^(\S+)/;
    }
}
# @matches should now contain the first words of those lines
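To try the flip-flop without an input file, here is a sketch that runs the same loop over a hard-coded list. The sample lines are an assumption: the continuation lines such as GRPNO=1 are taken to be indented in the real file, which is why /^(\S+)/ skips them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input; the tab-indented lines stand in for continuation lines.
my @input = (
    "header line\n",
    "GROUPS\n",
    "ADM LMID=GW_S4_1_PM,GW_S4_2_BM\n",
    "\tGRPNO=1\n",
    "ADM_1 LMID=GW_S4_1_PM\n",
    "\tGRPNO=11\n",
    "SERVERS\n",
    "trailer line\n",
);

my @matches;
for (@input) {
    # True from the GROUPS line through the SERVERS line, false elsewhere.
    if (/GROUPS/i .. /SERVERS/i) {
        next if /GROUPS/i or /SERVERS/i;   # drop the two edge lines
        push @matches, /^(\S+)/;           # pushes nothing when there is no match
    }
}

print "$_\n" for @matches;   # ADM, then ADM_1
```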

Perl if equals sign

I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}

Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'

Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
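A short sketch of why == goes wrong here: neither "a" nor "=" looks like a number, so both are converted to 0 and compare as equal. The variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $first_char = '=';
my $other_char = 'a';

# eq compares as strings: only an actual "=" matches.
my $eq_same  = ($first_char eq '=') ? 1 : 0;   # 1
my $eq_other = ($other_char eq '=') ? 1 : 0;   # 0

# == compares as numbers: "a" and "=" both convert to 0, so they look equal.
my $num_other;
{
    no warnings 'numeric';   # suppress the "isn't numeric" warnings for this demo
    $num_other = ($other_char == '=') ? 1 : 0; # 1
}

print "eq: $eq_same $eq_other, ==: $num_other\n";   # eq: 1 0, ==: 1
```

With warnings enabled and without the no warnings line, Perl reports the non-numeric arguments, which is usually the quickest way to spot this mistake.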

It looks like you are not using strict and warnings. Use them, especially since you do not know Perl. You might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $., are very useful. Use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA__ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}