I'm trying to edit a text using Perl. I need to make a substitution, but it must stop being applied once a specific word is found in the text. So, imagine I want to replace every "hello" with "goodbye", but the substitution must not be applied after the word "foo" is found.
I tried to do this:
use warnings;
use strict;
$/ = undef;
my $filename = shift;
open F, $filename or die "Usage: $0 FILENAME\n";
while(<F>) {
do {s/hello/goodbay/} until (m{foo});
print;
}
close F;
But, as a result, only the first "hello" of my text is changed.
Any suggestion?
Trying to think what would be the most efficient. It should be one of the following:
s{^(.*?)(foo|\z)}{
my $s = $1;
$s =~ s{hello}{goodbay}g;
$s.$2
}se;
print;
or (same as above, but requires 5.14+)
s{^(.*?)(foo|\z)}{ s{hello}{goodbay}gr . $2 }se;
print;
or
my $pos = /foo/ ? $-[0] : length;
my $s = substr($_, 0, $pos, '');
$s =~ s{hello}{goodbay}g;
print($s);
print;
All of these work even if foo isn't present.
This solution uses less memory:
# Assumes foo will always be present
# (though it could be expanded to handle that).
# Assumes foo isn't a regex pattern.
local $/ = "foo";
$_ = <$fh>;
chomp;
s{hello}{goodbay}g;
print;
print $/;
local $/;
print <$fh>;
If the substrings you work on (the hello and foo of your example) are single words, an easy way would probably be to replace $/ = undef; with $/ = " ";. Currently you slurp in the whole file at once, meaning the while loop gets executed at most once.
That is because there is only one "line" in the whole input after you told perl that there are no line separators.
If you use a space as input separator, it will loop over the input word by word and hopefully work as you intend.
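A minimal sketch of this word-by-word idea, assuming an in-memory string in place of the file and a hypothetical helper name replace_until_foo (neither is from the question). Note that words separated only by a newline land in the same chunk, so this is an approximation rather than a true word split:

```perl
use strict;
use warnings;

# Hypothetical helper: read $text in space-terminated chunks and stop
# substituting as soon as a chunk contains "foo".
sub replace_until_foo {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = " ";                      # a single space ends each "record"
    my ($out, $seen_foo) = ('', 0);
    while (my $chunk = <$fh>) {
        $seen_foo = 1 if $chunk =~ /foo/;
        $chunk =~ s/hello/goodbye/g unless $seen_foo;
        $out .= $chunk;
    }
    close $fh;
    return $out;
}

# prints: goodbye world goodbye foo hello again
print replace_until_foo("hello world hello foo hello again\n");
```

Unlike slurping, the loop body now runs once per chunk, so a flag can switch the substitution off partway through the input.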
Use a flag variable:
use warnings;
use strict;
my $filename = shift;
open F, $filename or die "Usage: $0 FILENAME\n";
my $replace=1;
while(<F>) {
$replace = 0 if m{foo};
s/hello/goodbye/g if $replace;
print;
}
close F;
This stops at the line containing the end pattern. It will be slightly more complicated if you want to substitute up to just before the match.
This answer uses the ${^PREMATCH} and related variables introduced in Perl 5.10.
#!/usr/bin/env perl
use v5.10.0;
use strict;
use warnings;
my $foo_found;
while (my $line = <>) {
if (!$foo_found) {
if ($line =~ m/foo/ip) {
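# only replace hellos in the part before foo;
# copy the match variables first, since they are read-only
my ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
$pre =~ s/hello/goodbye/ig;
$line = "$pre$match$post";
$foo_found++;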
} else {
$line =~ s/hello/goodbye/ig;
}
}
print $line;
}
Given the following input:
hello cruel world
hello baseball
hello mudda, hello fadda
foo
The rest of the hellos should stay
Last hello
I get the following output
goodbye cruel world
goodbye baseball
goodbye mudda, goodbye fadda
foo
The rest of the hellos should stay
Last hello
If you don't have 5.10 you can use $` and related variables but they come with a performance hit. See perldoc perlvar for details.
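If you would rather avoid the match variables altogether, one possible sketch (not from the answer above) uses the @- offsets array together with substr. The helper name replace_until_foo_lines and the list-of-lines interface are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns them with
# hello -> goodbye applied only up to the first "foo".
sub replace_until_foo_lines {
    my @lines = @_;
    my $foo_found = 0;
    for my $line (@lines) {
        next if $foo_found;              # leave everything after foo alone
        if ($line =~ /foo/i) {
            my $start = $-[0];           # offset where "foo" begins
            my $head  = substr($line, 0, $start);
            $head =~ s/hello/goodbye/ig; # only touch the part before foo
            $line = $head . substr($line, $start);
            $foo_found = 1;
        }
        else {
            $line =~ s/hello/goodbye/ig;
        }
    }
    return @lines;
}

print replace_until_foo_lines("hello a\n", "hello foo hello\n", "hello b\n");
```

The offset is copied into $start before the inner substitution runs, because that substitution resets @- itself.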
Related
I'm trying to read contents from an input file, copy only certain lines of code from the file and print in an output file.
The lines to copy are determined by:
Code name to determine the first line (IP1_NAME or IP2_NAME)
Pattern to determine the last line (END_OF_LIST)
Input file:
IP1_NAME
/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir
/ip1testname/ip1testdir/ //CLIENT_NAME/ip1testname/ip1testdir
END_OF_LIST
IP2_NAME
/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir
/ip2testname/ip2testdir/ //CLIENT_NAME/ip2testname/ip2testdir
END_OF_LIST
Output file:
(If IP1_NAME is chosen and the CLIENT_NAME should be replaced by tester_ip)
/ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
/ip1testname/ip1testdir/ //tester_ip/ip1testname/ip1testdir
You could use the following one-liner to pull out the lines between the two patterns:
perl -0777 -ne 'print "$1\n" while /IP1_NAME(.*?)END_OF_LIST/gs' in.txt > out.txt
Where in.txt is your input file and out.txt is the output file.
This use case is actually described in perlfaq6: Regular Expressions.
You can then modify the output file to replace CLIENT_NAME with tester_ip:
perl -pi -e 's/CLIENT_NAME/tester_ip/' out.txt
As a script instead of a one-liner, using the scalar range operator:
#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
process('input.txt', qr/^IP1_NAME$/, qr/^END_OF_LIST$/, 'tester_ip');
sub process {
my ($filename, $startpat, $endpat, $newip) = @_;
open my $file, '<', $filename;
while (my $line = <$file>) {
chomp $line;
if ($line =~ /$startpat/ .. $line =~ /$endpat/) {
next unless $line =~ /^\s/; # Skip the start and end lines.
$line =~ s/^\s+//; # Remove indentation
$line =~ s/CLIENT_NAME/$newip/g; # Replace with desired value
say $line;
}
}
}
Running this on your sample input file produces:
/ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
/ip1testname/ip1testdir/ //tester_ip/ip1testname/ip1testdir
I am assuming there is additional stuff in your input file, otherwise we would not have to jump through the hoops with these start and end markers and we could just say
perl -ne "print if /^ /"
and that would be silly, right ;-)
So, the flipflop has potential problems as I stated in my comment. And while clever, it does not buy you that much in terms of readability or verbosement (verbocity?), since you have to test again anyway in order to not process the marker lines.
As long as there is no exclusive flip flop operator, I would go for a more robust solution.
my $in;
while (<DATA>) {
$in = 1, next if /^IP\d_NAME/;
$in = 0 if /^END_OF_LIST/;
if ( $in )
{
s/CLIENT_NAME/tester_ip/;
print;
}
}
__DATA__
cruft
IP1_NAME
/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir
/ip1testname/ip1testdir/ //CLIENT_NAME/ip1testname/ip1testdir
END_OF_LIST
more
cruft
IP2_NAME
/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir
/ip2testname/ip2testdir/ //CLIENT_NAME/ip2testname/ip2testdir
END_OF_LIST
Lore Ipsargh!
I am attempting to create a Perl script that filters data presented on STDIN, changing all occurrences of one string to another and outputting all input lines, changed and unchanged, to STDOUT. FROMSTRING and TOSTRING can be Perl-compatible regular expressions. I am unable to get matching output.
Here is an example of what I am trying to achieve.
echo "Today is Saturday" | f.pl 'a' '#'
Output Tod#y is S#turd#y.
echo io | filter.pl '([aeiou])([aeiou])' '$2$1'
Output oi.
#!/usr/bin/perl
use strict;
use warnings;
if (@ARGV != 2){
print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n"
}
exit 1;
my $FROM = $ARGV[0];
my $TO = $ARGV[1];
my $inLine = "";
while (<STDIN>){
$inLine = $_;
$inLine =~ s/$FROM/$TO/;
print $inLine
}
exit 0;
First off, the replacement part of a s/.../.../ operation is not a regex; it works like a double-quoted string.
There are a couple of issues with your code.
Your exit 1; statement appears in the middle of the main code, not in the error block. You probably want:
if (@ARGV != 2) {
print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n";
exit 1;
}
You're missing a g flag if you want multiple substitutions to happen in the same line:
$inLine =~ s/$FROM/$TO/g;
There's no need to predeclare $inLine; it's only used in one block.
There's also no need to read a line into $_ just to copy it into $inLine.
It's common to use $names_like_this for variables and functions, not $namesLikeThis.
You can use $0 instead of hardcoding the program name in the error message.
exit 0; is redundant at the end.
The following is closer to how I'd write it:
#!/usr/bin/perl
use strict;
use warnings;
if (@ARGV != 2) {
die "Usage: $0 FROMSTRING TOSTRING\n";
}
my ($from, $to) = @ARGV;
while (my $line = readline STDIN) {
$line =~ s/$from/$to/g;
print $line;
}
That said, none of this addresses your second example with '$2$1' as the replacement. The above code won't do what you want because $to is a plain string. Perl won't scan it to look for things like $1 and replace them.
When you write "foo $bar baz" in your code, it means the same thing as 'foo ' . $bar . ' baz', but this only applies to code, i.e. stuff that literally appears in your source code. The contents of $bar aren't re-scanned at runtime to expand e.g. \n or $quux. This also applies to $1 and friends, which are just normal variables.
So how do you get '$2$1' to work?
One way is to mess around with eval, but I don't like it because, well, it's eval: If you're not very careful, it would allow someone to execute arbitrary code by passing the right replacement "string".
Doing it without eval is possible and even easy with e.g. Data::Munge::replace:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Munge qw(replace);
if (@ARGV != 2) {
die "Usage: $0 FROMSTRING TOSTRING\n";
}
my ($from, $to) = @ARGV;
while (my $line = readline STDIN) {
print replace($line, $from, $to, 'g');
}
replace works like JavaScript's String#replace in that it expands special $ sequences.
Doing it by hand is also possible but slightly annoying because you basically have to treat $to as a template and expand all $ sequences by hand (e.g. by using another regex substitution):
# untested
$line =~ s{$from}{
my @start = @-;
my @stop = @+;
(my $r = $to) =~ s{\$([0-9]+|\$)}{
$1 eq '$'
? '$'
: substr($line, $start[$1], $stop[$1] - $start[$1])
}eg;
$r
}eg;
(This does not implement braced groups such as ${1}, ${2}, etc. Those are left as an exercise for the reader.)
This code is sufficiently annoying to write (and look at) that I much prefer using a module like Data::Munge for this sort of thing.
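For completeness, here is one possible way the by-hand expansion could be written out and tested, including braced groups such as ${1}. This is a sketch, not Data::Munge's implementation: the helper names replace_with_template, expand_template and _expand_one are made up for illustration, and it walks the matches with a while loop instead of nesting one s///e inside another.

```perl
use strict;
use warnings;

# Expand $1, ${1} and $$ in $template; $groups->[0] is the whole match.
sub expand_template {
    my ($template, $groups) = @_;
    $template =~ s/\$(?:(\d+)|\{(\d+)\}|(\$))/_expand_one($groups, $1, $2, $3)/ge;
    return $template;
}

# Decide what a single $ sequence turns into.
sub _expand_one {
    my ($groups, $plain, $braced, $dollar) = @_;
    return '$' if defined $dollar;                 # "$$" is a literal dollar
    my $n = defined $plain ? $plain : $braced;
    return defined $groups->[$n] ? $groups->[$n] : '';
}

# Replace every match of $from in $line with the expanded template $to.
sub replace_with_template {
    my ($line, $from, $to) = @_;
    my ($out, $last) = ('', 0);
    while ($line =~ /$from/g) {
        my ($start, $end) = ($-[0], $+[0]);
        # Copy the captured texts before any other regex runs.
        my @groups = map {
            defined $-[$_] ? substr($line, $-[$_], $+[$_] - $-[$_]) : undef
        } 0 .. $#+;
        $out .= substr($line, $last, $start - $last) . expand_template($to, \@groups);
        $last = $end;
    }
    return $out . substr($line, $last);
}

# prints: oi
print replace_with_template("io\n", '([aeiou])([aeiou])', '$2$1');
```

Because the template is only ever treated as data, a hostile replacement string cannot run code, which is the main advantage over eval.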
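Three errors found:
a missing ; after the error message
exit 1; sits outside the if block, so the script always exits
the substitution needs the g flag: $inLine =~ s/$FROM/$TO/g;
Like this: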
#!/usr/bin/perl
use strict;
use warnings;
if (@ARGV != 2){
print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n";
exit 1;
}
my $FROM = $ARGV[0];
my $TO = $ARGV[1];
my $inLine = "";
while (<STDIN>){
$inLine = $_;
$inLine =~ s/$FROM/$TO/g;
print $inLine
}
exit 0;
I am working on a Perl script and need some help with it. The requirement is that I have to find a label and, once the label is found, replace a word in the line immediately following the label. For example, if the label is ABC:
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
I want to write a script to match the label (ABC) and once the label is found, replace a word in the next line immediately following the label.
Here is my attempt:
open(my $fh, "<", "file1.txt") or die "cannot open file:$!";
while (my $line = <$fh>))
{
next if ($line =~ /ABC/) {
$line =~ s/original_string/replaced_string/;
}
else {
$msg = "pattern not found \n ";
print "$msg";
}
}
Is this correct..? Any help will be greatly appreciated.
The following one-liner will do what you need:
perl -pe '++$x and next if /ABC:/; $x-- and s/old/new/ if $x' inFile > outFile
The code sets a flag and gets the next line if the label is found. If the flag is set, it's unset and the substitution is executed.
Hope this helps!
You're doing this in your loop:
next if ($line =~ /ABC/);
So, you're reading the file, and if a line contains ABC anywhere in that line, you skip the line. However, for every other line, you do the replacement. In the end, you're replacing the string on all other lines and printing that out, and you're not printing out your labels.
Here's what you said:
I have to read the file until I find a line with the label:
Once the label is found
I have to read the next line and replace the word in a line immediately following the label.
So:
You want to read through a file line-by-line.
If a line matches the label
read the next line
replace the text on the line
Print out the line
Following these directions:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
say "$line"; # Print out the current line
$line = <$fh>; # Read the next line
$line =~ s/old/new/; # Replace that word
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Note that when I use say, I don't have to put the \n on the end of the line. Also, by doing my chomp after my read, I can easily match the label without worrying about the \n on the end.
This is done exactly as you said it should be done, but there are a couple of issues. The first is that when we do $line = <$fh>, there's no guarantee we are really reading a line. What if the file ends right there?
Also, it's bad practice to read a file in multiple places. It makes it harder to maintain the program. To get around this issue, we'll use a flag variable. This allows us to know if the line before was a tag or not:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
my $tag_found = 0; # Flag isn't set
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $tag_found ) { # The previous line was the label
$line =~ s/old/new/; # Replace that word
$tag_found = 0; # Reset our flag variable
}
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
$tag_found = 1; # We found the tag! Act on the next line
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Of course, I would prefer to eliminate mysterious values. For example, the tag should be a variable or constant. Same with the string you're searching for and the string you're replacing.
You mentioned this was a word, so your regular expression replacement should probably look like this:
$line =~ s/\b$old_word\b/$new_word/;
The \b marks a word boundary. This way, if you're supposed to replace the word cat with dog, you don't get tripped up on a line that says:
The Jeopardy category is "Say what".
You don't want to change category to dogegory.
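A small sketch of the difference, assuming a hypothetical helper replace_word. The \Q...\E quoting is an addition that is not in the line above; it only makes sure the old word is treated as literal text.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first whole-word occurrence of $old_word.
sub replace_word {
    my ($line, $old_word, $new_word) = @_;
    $line =~ s/\b\Q$old_word\E\b/$new_word/;   # \b keeps "category" intact
    return $line;
}

# prints: The dog sat in the Jeopardy category.
print replace_word('The cat sat in the Jeopardy category.', 'cat', 'dog'), "\n";
```

Without the two \b anchors, a line that only contains "category" would come out as "dogegory".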
Your problem is that reading in a file does not work like that. You're doing it line by line, so when your regex tests true, the line you want to change isn't there yet. You can try adding a boolean variable to check if the last line was a label.
#!/usr/bin/perl
use strict;
use warnings;
my $found;
my $replacement = "Hello";
while(my $line = <>){
if($line =~ /ABC/){
$found = 1;
next;
}
if($found){
$line =~ s/^.*?$/$replacement/;
$found = 0;
print $line, "\n";
}
}
Or you could use File::Slurp and read the whole file into one string:
use File::Slurp;
$x = read_file( "file.txt" );
$x =~ s/^(ABC:\s*$ [\n\r]{1,2}^.*?)to\sbe/$1to was/mgx;
print $x;
This uses /m to make ^ and $ match at embedded line beginnings and ends,
and /x to allow the space after the $ (there is probably a better way).
Yields:
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
Also, relying on perl's in-place editing:
use File::Slurp qw(read_file write_file);
use strict;
use warnings;
my $file = 'fakefile1.txt';
# Initialize Fake data
write_file($file, <DATA>);
# Enclosed is the actual code that you're looking for.
# Everything else is just for testing:
{
local @ARGV = $file;
local $^I = '.bac';
while (<>) {
print;
if (/ABC/ && !eof) {
$_ = <>;
s/.*/replaced string/;
print;
}
}
unlink "$file$^I";
}
# Compare new file.
print read_file($file);
1;
__DATA__
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
ABC:
outputs
ABC:
replaced string
some other lines
ABC:
replaced string
some other lines
ABC:
replaced string
ABC:
I need some help with following perl code.
#!perl -w
use strict;
use warnings;
open my $file, '<', 'ubb' or die $1;
my $spool = 0;
my @matchingLines;
while (<$file>) {
if (/GROUPS/i) {
$spool = 1;
next;
}
elsif (/SERVERS/i) {
$spool = 0;
print map { "$_" } @matchingLines;
@matchingLines = ();
}
if ($spool) {
push (@matchingLines, $_);
}
}
close ($file);
Output from that is shown below.
ADM LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=1
ADM_TMS LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=2
TMSNAME=TMS
ADM_1 LMID=GW_S4_1_PM
GRPNO=11
ADM_2 LMID=GW_S4_2_BM
GRPNO=12
DMWSG_Gateway_1 LMID=GW_S4_1_PM
GRPNO=101
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_Gateway_2 LMID=GW_S4_2_BM
GRPNO=201
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_1 LMID=GW_S4_1_PM
GRPNO=106
DMWSG_2 LMID=GW_S4_2_BM
GRPNO=206
But I only would like to get the first word of each line (e.g. ADM, ADM_TMS, ADM_1).
Note that the file has a lot of other lines above and below what's printed here. I only want to do this for lines that are between GROUPS and SERVERS.
I would suggest 2 changes in your code
Note: Tested these with your sample data (plus other stuff) in your question.
I: Extract first word before push
Change this
push (@matchingLines, $_);
to
push (@matchingLines, /^(\S+)/);
This would push the first word of each line into the array, instead of the entire line.
Note that /^(\S+)/ is shorthand for $_ =~ /^(\S+)/. If you're using an explicit loop variable like in 7stud's answer, you can't use this shorthand, use the explicit syntax instead, say $line =~ /^(\S+)/ or whatever your loop variable is.
Of course, you can also use split function as suggested in 7stud's answer.
II: Change how you print
Change this
print map { "$_" } @matchingLines;
into
local $" = "\n";
print "@matchingLines \n";
$" specifies the separator placed between list elements when an array is interpolated into a double-quoted string (a single space by default).
Alternatively, as per TLP's suggestion,
$\ = $/;
print for @lines;
or
print join("\n", @lines), "\n"
Note that $/ is the input record separator (newline by default), $\ is the output record separator (undefined by default). $\ is appended after each print command.
For more information on $/, $\, and $":
See perldoc perlvar (just use CTRL+F to find them in that page)
Or you can simply run perldoc -v '$/' etc. on your console to get that information.
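To see the two separators side by side, here is a short sketch. The helper names and the sample words are assumptions chosen for illustration; both calls print one word per line.

```perl
use strict;
use warnings;

# Hypothetical helper: build the output using $" (the list separator).
sub lines_via_list_separator {
    my @items = @_;
    local $" = "\n";      # only affects arrays interpolated into "..."
    return "@items\n";
}

# Hypothetical helper: print the items using $\ (the output record separator).
sub print_with_record_separator {
    my @items = @_;
    local $\ = "\n";      # appended after every print in this scope
    print for @items;
    return;
}

my @words = qw(ADM ADM_TMS ADM_1);
print lines_via_list_separator(@words);
print_with_record_separator(@words);
```

Using local inside a sub or block keeps the changed separator from leaking into the rest of the program.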
Note on readability
I don't think implicit regex matching i.e. /pattern/ is bad per se.
But matching against a variable, i.e. $variable =~ /pattern/ is more readable (as in you can immediately see there's a regex matching going on) and more beginner-friendly, at the cost of conciseness.
use strict;
use warnings;
use 5.014; #say()
my $fname = 'data.txt';
open my $INFILE, '<', $fname
or die "Couldn't open $fname: $!"; #-->Not $1"
my $recording_on = 0;
my @matching_lines;
for my $line (<$INFILE>) {
if ($line =~ /groups/i) {
$recording_on = 1;
next;
}
elsif ($line =~ /servers/i) {
say for @matching_lines; # say() is the same as print(), but it adds a newline at the end
@matching_lines = ();
$recording_on = 0;
}
if ($recording_on) {
my ($first_word, $trash) = split " ", $line, 2;
push @matching_lines, $first_word;
}
}
close $INFILE;
You can use the flip-flop operator (range) to select a part of your input. The idea of this operator is that it returns false until its LHS (left hand side) returns true, and after that it returns true until its RHS returns false, after which it is reset. It is somewhat like preserving a state.
Note that the edge lines are also included in the match, so we need to remove those. After that, use doubleDown's idea and push /^(\S+)/ onto an array. The nice thing about using this with push is that the capture regex returns an empty list if it fails, and this gives us a warning-free failure when the regex does not match.
use strict;
use warnings;
my @matches;
while (<>) {
if (/GROUPS/i .. /SERVERS/i) { # flip-flop remembers the matches
next if (/GROUPS/i or /SERVERS/i);
push @matches, /^(\S+)/;
}
}
# @matches should now contain the first words of those lines
I'm reading this textfile to get ONLY the words in it and ignore all kind of whitespaces:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
#print $line;
@arr = split /\s+/, $line;
foreach $w (@arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
#print @arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
my @arr = split /\s+/, $line;
foreach my $w (@arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6, so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade.
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is an empty field before the first separator. Imagine if your data was comma-separated, then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that
my @arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split, which is a single-quoted space (not a regular expression)
my @arr = split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
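A quick sketch of the difference between the two forms, using a made-up sample line with leading whitespace:

```perl
use strict;
use warnings;

my $line = "   do you see\n";

my @by_regex = split /\s+/, $line;   # leading empty field is kept
my @by_space = split ' ',   $line;   # awk-style: leading whitespace is skipped

print scalar(@by_regex), " fields with /\\s+/\n";   # 4: '', 'do', 'you', 'see'
print scalar(@by_space), " fields with ' '\n";      # 3: 'do', 'you', 'see'
```

That extra empty first field is what ends up printed as a blank line in the original program.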
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
my @arr = split;
foreach my $w (@arr) {
print "$w\n" if $w !~ /^\s+$/;
}
}
This line is the problem:
@arr=split(/\s+/,$line);
\s+ matches the leading spaces, so split returns an empty field before them. Use ' ' instead.
@arr=split(' ',$line);
I believe that in this line:
if(!($w =~ /^\s+$/))
you wanted to say: if there's nothing in this field, don't print it.
But the "+" in the regex forces it to match at least one whitespace character, so an empty string gets through.
If you change the "\s+" to "\s*", you'll see that it works, because * means zero or more occurrences.