How to match multiline data in perl

How to match multiline data in perl - perl

This question refers to
How to replace text using greedy approach in sed?
I have to match multiline data in file and need to replace them with some other text using perl.
cat file
<strong>ABC
</strong>
perl script: code.pl
#!/bin/perl
open(fh, $ARGV[0]) or die "could not open file\n";
while($input = <fh>)
{
if($input =~/<strong>(.*?)\n(\s)*<\/strong>/)
{
print($1,"\n");
}
}
close(fh);
perl code.pl file
Output: No output
How to solve above pblm.
Regards

use File::Slurp qw( read_file );
my $string = read_file( $ARGV[0] );
$string =~ s/\<strong>(.*?)<\/strong>/<b>${1}<\/b>/gs;
print $string;
This example uses the File::Slurp module to read in the entire file at once.
It then uses a regex with the g and s modifiers. The s allows .*? to match newline characters. The g makes the search global. Global meaning it will find all matches in the given string. Without the g only the first instance would be replaced. If you want your search to be case insensitive, you can use the i regex modifier.
The ${1} is a back-reference to the match in parentheses.
This example produces:
<b>ABC
</b>

Related

Search and replace a string in a file

I'm trying to read contents from an input file, copy only certain lines of code from the file and print in an output file.
Certain lines of code is determined by:
Code name to determine the first line (IP1_NAME or IP2_NAME)
Pattern to determine the last line (END_OF_LIST)
Input file:
IP1_NAME
/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir
/ip1testname/ip1testdir/ //CLIENT_NAME/ip1testname/ip1testdir
END_OF_LIST
IP2_NAME
/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir
/ip2testname/ip2testdir/ //CLIENT_NAME/ip2testname/ip2testdir
END_OF_LIST
Output file:
(If IP1_NAME is chosen and the CLIENT_NAME should be replaced by tester_ip)
/ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
/ip1testname/ip1testdir/ //tester_ip/ip1testname/ip1testdir

You could use the following one-liner to pull out the lines between the two patterns:
perl -0777 -ne 'print "$1\n" while /IP1_NAME(.*?)END_OF_LIST/gs' in.txt > out.txt
Where in.txt is your input file and out.txt is the output file.
This use case is actually described in perlfaq6: Regular Expressions.
You can then modify the output file to replace CLIENT_NAME with tester_ip:
perl -pi -e 's/CLIENT_NAME/tester_ip/' y.txt

As a script instead of a one-liner, using the scalar range operator:
#/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
process('input.txt', qr/^IP1_NAME$/, qr/^END_OF_LIST$/, 'tester_ip');
sub process {
my ($filename, $startpat, $endpat, $newip) = #_;
open my $file, '<', $filename;
while (my $line = <$file>) {
chomp $line;
if ($line =~ /$startpat/ .. $line =~ /$endpat/) {
next unless $line =~ /^\s/; # Skip the start and lines.
$line =~ s/^\s+//; # Remove indentation
$line =~ s/CLIENT_NAME/$newip/g; # Replace with desired value
say $line;
}
}
}
Running this on your sample input file produces:
/ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
/ip1testname/ip1testdir/ //tester_ip/ip1testname/ip1testdir

I am assuming there is additional stuff in your input file, otherwise we would not have to jump through the hoops with these start and end markers as and we could just say
perl -ne "print if /^ /"
and that would be silly, right ;-)
So, the flipflop has potential problems as I stated in my comment. And while clever, it does not buy you that much in terms of readability or verbosement (verbocity?), since you have to test again anyway in order to not process the marker lines.
As long as there is no exclusive flip flop operator, I would go for a more robust solution.
my $in;
while (<DATA>) {
$in = 1, next if /^IP\d_NAME/;
$in = 0 if /^END_OF_LIST/;
if ( $in )
{
s/CLIENT_NAME/tester_ip/;
print;
}
}
__DATA__
cruft
IP1_NAME
/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir
/ip1testname/ip1testdir/ //CLIENT_NAME/ip1testname/ip1testdir
END_OF_LIST
more
cruft
IP2_NAME
/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir
/ip2testname/ip2testdir/ //CLIENT_NAME/ip2testname/ip2testdir
END_OF_LIST
Lore Ipsargh!

Can not remove . and .. using grep?

I use readdir to get the files of a directory , but I want to remove . and .. using grep . The output shows it still contain the . and .. , but I can't figure out what's wrong with it ?
here is my code
#!/usr/bin/perl
opendir(Dir,$ARGV[0]);
#Dirs = readdir(Dir);
#Dirs = grep { $_ != /./ } #Dirs;
# #Dirs = grep { $_ =~ /^./ } #Dirs;
print join("\n",#Dirs);
Thanks

I strongly suggest you take note of the following
Always use strict and use warnings, even for the tiniest bit of code. They will repay you the extra typing time many times over
Always use lexical directory handles and file handles. Global handles like this have been the
wrong choice for over twelve years now
Always check the success of file and directory open calls, and use a die string that includes the $! variable to say why the open failed
Use lower-case letters and underscores for local variable names. Upper case is reserved by convention for global items like package names and built-in variables
Use print "$_\n" for #array instead of print join "\n", #array because a) using join produces a second copy of the text in the array and wastes space, and b) using join omits the newline from the last line of the array
Take a look at this alternative to your program, which applies the advice above. I have excluded all directory entries beginning with a dot, as it successfully removes . and .. as well as Linux "hidden" entries that start with a dot. You may require something different.
#!/usr/bin/perl
use strict;
use warnings;
opendir my $dh, $ARGV[0] or die $!;
my #dirs = grep { not /^\./ } readdir $dh;
print "$_\n" for #dirs;

Try escaping the .:
#Dirs = grep { $_ !~ /^\.\.?$/ } #Dirs;
The dot is a special metacharacter which matches any character when not escaped.

. in a regexp means "any character", try escaping it like this: \.

How to search and replace string in a file in Perl

The content of my input file is shown below:
abc\**def\ghi**\abc\!!!!!
abc\**4nfiug\frgrefd\gtefe\wf4fs**\abc\df3gwddw
abc\**eg4/refw**\abc\f3
I need to replace whatever string in between abc \ --------------\abc in my input file with ABC\CBA.
I have tried something like below to get the strings that need to be replaced. But I get stuck when I need to use the search and replace:
my $string1 = qr/abc\W+([^a]+)/;
my $string2 = map{/$string1/ => " "} #input_file; # The string that needs to be replaced
my $string3 = 'ABC\CBA' # String in that. I want it to replace to
s/$string2/$string3/g
How can I fix this?

perl -i -pe 's/this/that/g;' file1

A one-liner to fix a file:
perl -plwe 's/abc\\\K.*(?=\\abc)/ABC\\CBA/' input.txt > output.txt
Or as a script:
use strict;
use warnings;
while (<DATA>) {
s/abc\\\K.*(?=\\abc)/ABC\\CBA/;
print;
}
__DATA__
abc\**def\ghi**\abc\!!!!!
abc\**4nfiug\frgrefd\gtefe\wf4fs**\abc\df3gwddw
abc\**eg4/refw**\abc\f3
The \K (keep) escape sequence means these characters will not be removed. Similarly, the look-ahead assertion (?= ... ) will keep that part of the match. I assumed you only wanted to change the characters in between.
Instead of \K one can use a look-behind assertion: (?<=abc\\). As a personal preference, I used \K instead.

#!/usr/bin/perl
use strict;
use warnings;
open my $fh,"<", "tryit.txt" or die $!;
while (my $line = <$fh>) {
$line =~ s/(abc\\)(.*?)(\\abc)/$1ABC\\CBA$3/;
print $line;
}
gives the following with the input data.
abc\ABC\CBA\abc\!!!!!
abc\ABC\CBA\abc\df3gwddw
abc\ABC\CBA\abc\f3

If you do not want the substitution to operate on the default variable $_, you have to use the =~ operator:
#!/usr/bin/perl
use warnings;
use strict;
my #input_file = split /\n/, <<'__EOF__';
abc\**def\ghi**\abc\!!!!!
abc\**4nfiug\frgrefd\gtefe\wf4fs**\abc\df3gwddw
abc\**eg4/refw**\abc\f3
__EOF__
my $pattern = qr/abc\\.*\\abc/; # pattern to be matched
my $string2 = join "\n", #input_file; # the string that need to be replaced
my $string3 = 'ABC\CBA'; # string i that i want it to replace to
$string2 =~ s/$pattern/$string3/g;
print $string2;

To address your comment about replacing text "inplace" in the file directly, you can use the -i switch for a one-liner. In a script, you can perhaps look at using Tie::File, which allows read-write access to lines of a file as (mutable) elements in an array. To copy Mike/TLP's answer:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
tie my #file, "Tie::File", "tryit.txt" or die $!;
# I think you have to use $_ here (done implicitly)
while (#file) {
s/(abc\\)(.*?)(\\abc)/$1ABC\\CBA$3/;
print;
}

perl + append text between two lines in file

I need to edit file , the main issue is to append text between two known lines in the file
for example I need to append the following text
a b c d e f
1 2 3 4 5 6
bla bla
Between the first_line and the second_line
first_line=")"
second_line="NIC Hr_Nic ("
remark: first_line and second_line argument can get any line or string
How to do this by perl ? ( i write bash script and I need to insert the perl syntax in my script)
lidia

You could read the file in as a single string and then use a regular expression to do the search and replace:
use strict;
use warnings;
# Slurp file myfile.txt into a single string
open(FILE,"myfile.txt") || die "Can't open file: $!";
undef $/;
my $file = <FILE>;
# Set strings to find and insert
my $first_line = ")";
my $second_line = "NIC Hr_Nic (";
my $insert = "hello world";
# Insert our text
$file =~ s/\Q$first_line\E\n\Q$second_line\E/$first_line\n$insert\n$second_line/;
# Write output to output.txt
open(OUTPUT,">output.txt") || die "Can't open file: $!";
print OUTPUT $file;
close(OUTPUT);
By unsetting $/ we put Perl into "slurp mode" and so can easily read the whole file into $file.
We use the s/// operator to do a search and replace using our two search lines as a pattern.
The \Q and \E tell Perl to escape the strings between them, i.e. to ignore any special characters that happen to be in $first_line or $second_line.
You could always write the output over the original file if desired.
The problem as you state it is not solvable using the -i command line option since this option processes a file one line at a time; to insert text between two specific lines you'll need to know about two lines at once.

Well to concenate strings you do
my $text = $first_line . $second_line;
or
my $text = $first_line;
$text .= $second_line;

I'm not sure if I understand your question correctly. A "before and after" example of the file content would, I think, be easier. Anyhow, Here's my take on it, using splice instead of a regular expression. We must of course know the line numbers for this to work.
Load the file into an array:
my #lines;
open F, '<', 'filename' or die $!;
push #lines, $_ for <F>;
close F;
Insert the stuff (see perldoc -f splice):
splice #lines, 1, 0, ('stuff');
..and you're done. All you need to do now is save the array again:
open F, '>', 'filename' or die $!;
print F #lines;
close F;

Neatest way to remove linebreaks in Perl

I'm maintaining a script that can get its input from various sources, and works on it per line. Depending on the actual source used, linebreaks might be Unix-style, Windows-style or even, for some aggregated input, mixed(!).
When reading from a file it goes something like this:
#lines = <IN>;
process(\#lines);
...
sub process {
#lines = shift;
foreach my $line (#{$lines}) {
chomp $line;
#Handle line by line
}
}
So, what I need to do is replace the chomp with something that removes either Unix-style or Windows-style linebreaks.
I'm coming up with way too many ways of solving this, one of the usual drawbacks of Perl :)
What's your opinion on the neatest way to chomp off generic linebreaks? What would be the most efficient?
Edit: A small clarification - the method 'process' gets a list of lines from somewhere, not nessecarily read from a file. Each line might have
No trailing linebreaks
Unix-style linebreaks
Windows-style linebreaks
Just Carriage-Return (when original data has Windows-style linebreaks and is read with $/ = '\n')
An aggregated set where lines have different styles

After digging a bit through the perlre docs a bit, I'll present my best suggestion so far that seems to work pretty good. Perl 5.10 added the \R character class as a generalized linebreak:
$line =~ s/\R//g;
It's the same as:
(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
I'll keep this question open a while yet, just to see if there's more nifty ways waiting to be suggested.

Whenever I go through input and want to remove or replace characters I run it through little subroutines like this one.
sub clean {
my $text = shift;
$text =~ s/\n//g;
$text =~ s/\r//g;
return $text;
}
It may not be fancy but this method has been working flawless for me for years.

$line =~ s/[\r\n]+//g;

Reading perlport I'd suggest something like
$line =~ s/\015?\012?$//;
to be safe for whatever platform you're on and whatever linefeed style you may be processing because what's in \r and \n may differ through different Perl flavours.

Note from 2017: File::Slurp is not recommended due to design mistakes and unmaintained errors. Use File::Slurper or Path::Tiny instead.
extending on your answer
use File::Slurp ();
my $value = File::Slurp::slurp($filename);
$value =~ s/\R*//g;
File::Slurp abstracts away the File IO stuff and just returns a string for you.
NOTE
Important to note the addition of /g , without it, given a multi-line string, it will only replace the first offending character.
Also, the removal of $, which is redundant for this purpose, as we want to strip all line breaks, not just line-breaks before whatever is meant by $ on this OS.
In a multi-line string, $ matches the end of the string and that would be problematic ).
Point 3 means that point 2 is made with the assumption that you'd also want to use /m otherwise '$' would be basically meaningless for anything practical in a string with >1 lines, or, doing single line processing, an OS which actually understands $ and manages to find the \R* that proceed the $
Examples
while( my $line = <$foo> ){
$line =~ $regex;
}
Given the above notation, an OS which does not understand whatever your files '\n' or '\r' delimiters, in the default scenario with the OS's default delimiter set for $/ will result in reading your whole file as one contiguous string ( unless your string has the $OS's delimiters in it, where it will delimit by that )
So in this case all of these regex are useless:
/\R*$// : Will only erase the last sequence of \R in the file
/\R*// : Will only erase the first sequence of \R in the file
/\012?\015?// : When will only erase the first 012\015 , \012 , or \015 sequence, \015\012 will result in either \012 or \015 being emitted.
/\R*$// : If there happens to be no byte sequences of '\015$OSDELIMITER' in the file, then then NO linebreaks will be removed except for the OS's own ones.
It would appear nobody gets what I'm talking about, so here is example code, that is tested to NOT remove line feeds. Run it, you'll see that it leaves the linefeeds in.
#!/usr/bin/perl
use strict;
use warnings;
my $fn = 'TestFile.txt';
my $LF = "\012";
my $CR = "\015";
my $UnixNL = $LF;
my $DOSNL = $CR . $LF;
my $MacNL = $CR;
sub generate {
my $filename = shift;
my $lineDelimiter = shift;
open my $fh, '>', $filename;
for ( 0 .. 10 )
{
print $fh "{0}";
print $fh join "", map { chr( int( rand(26) + 60 ) ) } 0 .. 20;
print $fh "{1}";
print $fh $lineDelimiter->();
print $fh "{2}";
}
close $fh;
}
sub parse {
my $filename = shift;
my $osDelimiter = shift;
my $message = shift;
print "Parsing $message File $filename : \n";
local $/ = $osDelimiter;
open my $fh, '<', $filename;
while ( my $line = <$fh> )
{
$line =~ s/\R*$//;
print ">|" . $line . "|<";
}
print "Done.\n\n";
}
my #all = ( $DOSNL,$MacNL,$UnixNL);
generate 'Windows.txt' , sub { $DOSNL };
generate 'Mac.txt' , sub { $MacNL };
generate 'Unix.txt', sub { $UnixNL };
generate 'Mixed.txt', sub {
return #all[ int(rand(2)) ];
};
for my $os ( ["$MacNL", "On Mac"], ["$DOSNL", "On Windows"], ["$UnixNL", "On Unix"]){
for ( qw( Windows Mac Unix Mixed ) ){
parse $_ . ".txt", #{ $os };
}
}
For the CLEARLY Unprocessed output, see here: http://pastebin.com/f2c063d74
Note there are certain combinations that of course work, but they are likely the ones you yourself naívely tested.
Note that in this output, all results must be of the form >|$string|<>|$string|< with NO LINE FEEDS to be considered valid output.
and $string is of the general form {0}$data{1}$delimiter{2} where in all output sources, there should be either :
Nothing between {1} and {2}
only |<>| between {1} and {2}

In your example, you can just go:
chomp(#lines);
Or:
$_=join("", #lines);
s/[\r\n]+//g;
Or:
#lines = split /[\r\n]+/, join("", #lines);
Using these directly on a file:
perl -e '$_=join("",<>); s/[\r\n]+//g; print' <a.txt |less
perl -e 'chomp(#a=<>);print #a' <a.txt |less

To extend Ted Cambron's answer above and something that hasn't been addressed here: If you remove all line breaks indiscriminately from a chunk of entered text, you will end up with paragraphs running into each other without spaces when you output that text later. This is what I use:
sub cleanLines{
my $text = shift;
$text =~ s/\r/ /; #replace \r with space
$text =~ s/\n/ /; #replace \n with space
$text =~ s/ / /g; #replace double-spaces with single space
return $text;
}
The last substitution uses the g 'greedy' modifier so it continues to find double-spaces until it replaces them all. (Effectively substituting anything more that single space)