I'm maintaining a Perl script (Perl 5.10 on Linux) which needs to process a file line-by-line while being as flexible as possible regarding line separator characters. Any sequence of newlines and/or carriage return characters should mark the end of a line. Blank lines in the file aren't significant. For example, all of these should yield two lines:
FOO\nBAR FOO\rBAR
FOO\r\nBAR FOO\n\rBAR
FOO\r\n\r\r\r\n\n\nBAR
It doesn't look like it's possible to get this behavior through PerlIO or by setting $/. The files aren't large, so I suppose I could just read the whole file into memory and then split it with a regular expression. Is there a more clever way to do this in Perl?
Just slurp the file and use split:
use strict;
use warnings;
use autodie;
use Data::Dump;
my @data = (
"FOO\nBAR",
"FOO\rBAR",
"FOO\r\nBAR",
"FOO\n\rBAR",
"FOO\r\n\r\r\r\n\n\nBAR",
);
for my $filedata (@data) {
dd $filedata;
open my $fh, "<", \"$filedata";
local $/;
for my $line (split /[\n\r]+/, <$fh>) {
print " $line\n";
}
}
Outputs:
"FOO\nBAR"
FOO
BAR
"FOO\rBAR"
FOO
BAR
"FOO\r\nBAR"
FOO
BAR
"FOO\n\rBAR"
FOO
BAR
"FOO\r\n\r\r\r\n\n\nBAR"
FOO
BAR
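One caveat with the split approach: if the data begins with a run of line terminators, split returns an empty leading field (trailing empty fields are dropped by default). A minimal sketch of a wrapper that filters it out, assuming a hypothetical helper name split_lines that is not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: split a slurped string on any run of CR/LF characters.
sub split_lines {
    my ($text) = @_;
    # A leading separator run yields one empty leading field; grep drops it.
    return grep { length } split /[\r\n]+/, $text;
}

my @lines = split_lines("\r\n\nFOO\r\n\r\rBAR\n\r");
print scalar(@lines), " lines: @lines\n";   # 2 lines: FOO BAR
```

Filtering on length rather than truth keeps a line consisting of the single character 0.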
Related
I'm trying to read contents from an input file, copy only certain lines of code from the file and print in an output file.
Certain lines of code is determined by:
Code name to determine the first line (IP1_NAME or IP2_NAME)
Pattern to determine the last line (END_OF_LIST)
Input file:
IP1_NAME
/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir
/ip1testname/ip1testdir/ //CLIENT_NAME/ip1testname/ip1testdir
END_OF_LIST
IP2_NAME
/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir
/ip2testname/ip2testdir/ //CLIENT_NAME/ip2testname/ip2testdir
END_OF_LIST
Output file:
(If IP1_NAME is chosen and the CLIENT_NAME should be replaced by tester_ip)
/ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
/ip1testname/ip1testdir/ //tester_ip/ip1testname/ip1testdir
You could use the following one-liner to pull out the lines between the two patterns:
perl -0777 -ne 'print "$1\n" while /IP1_NAME(.*?)END_OF_LIST/gs' in.txt > out.txt
Where in.txt is your input file and out.txt is the output file.
This use case is actually described in perlfaq6: Regular Expressions.
You can then modify the output file to replace CLIENT_NAME with tester_ip:
perl -pi -e 's/CLIENT_NAME/tester_ip/' out.txt
As a script instead of a one-liner, using the scalar range operator:
#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
process('input.txt', qr/^IP1_NAME$/, qr/^END_OF_LIST$/, 'tester_ip');
sub process {
my ($filename, $startpat, $endpat, $newip) = @_;
open my $file, '<', $filename;
while (my $line = <$file>) {
chomp $line;
if ($line =~ /$startpat/ .. $line =~ /$endpat/) {
next unless $line =~ /^\s/; # Skip the start and end marker lines (only indented lines are kept).
$line =~ s/^\s+//; # Remove indentation
$line =~ s/CLIENT_NAME/$newip/g; # Replace with desired value
say $line;
}
}
}
Running this on your sample input file produces:
/ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
/ip1testname/ip1testdir/ //tester_ip/ip1testname/ip1testdir
I am assuming there is additional stuff in your input file, otherwise we would not have to jump through hoops with these start and end markers, and we could just say
perl -ne "print if /^ /"
and that would be silly, right ;-)
So, the flip-flop has potential problems, as I stated in my comment. And while clever, it does not buy you much in terms of readability or verbosity, since you have to test again anyway in order to not process the marker lines.
As long as there is no exclusive flip flop operator, I would go for a more robust solution.
my $in;
while (<DATA>) {
$in = 1, next if /^IP\d_NAME/;
$in = 0 if /^END_OF_LIST/;
if ( $in )
{
s/CLIENT_NAME/tester_ip/;
print;
}
}
__DATA__
cruft
IP1_NAME
/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir
/ip1testname/ip1testdir/ //CLIENT_NAME/ip1testname/ip1testdir
END_OF_LIST
more
cruft
IP2_NAME
/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir
/ip2testname/ip2testdir/ //CLIENT_NAME/ip2testname/ip2testdir
END_OF_LIST
Lore Ipsargh!
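The loop above prints every IP block. A minimal sketch of the same state-flag idea restricted to one chosen block, assuming a hypothetical helper extract_block that takes the input lines as an array reference (the helper name and its parameters are illustrative, not from the question):

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines between "$name" and END_OF_LIST,
# excluding both marker lines, with CLIENT_NAME replaced by $client.
sub extract_block {
    my ($lines, $name, $client) = @_;
    my ($in, @out) = (0);
    for my $line (@$lines) {
        if ($line =~ /^\Q$name\E\s*$/)   { $in = 1; next }   # start marker
        if ($line =~ /^END_OF_LIST\s*$/) { $in = 0; next }   # end marker
        next unless $in;
        (my $copy = $line) =~ s/CLIENT_NAME/$client/g;
        push @out, $copy;
    }
    return @out;
}

my @input = (
    "cruft",
    "IP1_NAME", "/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir", "END_OF_LIST",
    "IP2_NAME", "/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir", "END_OF_LIST",
);
print "$_\n" for extract_block(\@input, 'IP1_NAME', 'tester_ip');
# /ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
```

Because both marker tests call next, the marker lines never reach the output, so no second test is needed inside the block.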
Perl's documentation says that $/ is:
The input record separator, newline by default. This influences Perl's
idea of what a "line" is.
So, is it basically wrong to:
print STDERR $var, $/;
instead of:
print STDERR "$var\n";
?
What could go wrong if I do the former?
Perhaps you are looking for the output record separator instead?
perldoc perlvar:
IO::Handle->output_record_separator( EXPR )
$OUTPUT_RECORD_SEPARATOR
$ORS
$\
The output record separator for the print operator.
If defined, this value is printed after the last of print's arguments. Default is "undef".
You cannot call "output_record_separator()" on a handle, only as a static method. See IO::Handle.
Mnemonic: you set "$\" instead of adding "\n" at the end of the print. Also, it's just like $/, but it's what you get "back" from Perl.
For example,
$\ = $/;
print STDERR $var;
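Assigning to $\ globally affects every later print in the program. A minimal sketch that limits the change with local, assuming a hypothetical helper render_lines that writes to an in-memory filehandle so the result can be inspected:

```perl
use strict;
use warnings;

# Hypothetical helper: print each item with "\n" supplied by $\ and
# return everything that was written.
sub render_lines {
    my @items = @_;
    my $buffer = '';
    open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";
    {
        local $\ = "\n";              # in effect only inside this block
        print {$out} $_ for @items;
    }
    close $out;
    return $buffer;
}

print render_lines('first', 'second');   # prints "first" and "second" on separate lines
```

Once the inner block ends, $\ is back to its previous value (undef by default), so other print calls are unaffected.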
$/ is LF (U+000A) by default. This is the same character produced by "\n"[1]. So unless you changed $/, $/ and "\n" are equivalent. If you did change $/, then only you know why, and therefore only you know whether $/ or "\n" is more appropriate.
[1] On ancient MacOS boxes, $/'s default was CR (U+000D), but that's also what "\n" produced there.
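To make "what could go wrong" concrete, here is a minimal sketch of what gets appended when $/ has been changed. The helper name line_with_irs is hypothetical; it mirrors print STDERR $var, $/ but returns the string instead of printing it:

```perl
use strict;
use warnings;

# Hypothetical helper: build the string that `print $var, $/` would emit.
sub line_with_irs {
    my ($var) = @_;
    return $var . (defined $/ ? $/ : '');
}

print length(line_with_irs('x')), "\n";       # 2: "x" plus the default "\n"

{
    local $/ = "";                            # paragraph mode
    print length(line_with_irs('x')), "\n";   # 1: nothing is appended
}

{
    local $/;                                 # slurp mode: $/ is undef
    print length(line_with_irs('x')), "\n";   # 1: again no newline
}
```

In the slurp-mode case, a bare print STDERR $var, $/ would also emit an "uninitialized value" warning under use warnings.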
You need the output record separator $\, as xxfelixxx has answered.
$/, as you read, is the input record separator. Changing it affects how Perl reads the file data you give it. For example:
open my $fh, "<", $filename or die $!;
local $/; # enable localized slurp mode
my $content = <$fh>;
close $fh;
The above slurps the whole content of the file into the scalar $content, because local $/ sets $/ to undef.
Consider the below code:
#!/usr/bin/perl
use strict;
use warnings;
my $content;
{local $/; $content = <DATA>}
print "Content is $content";
__DATA__
line 1
line 2
line 3
Output:
Content is line 1
line 2
line 3
But if you do not reset $/, as in the code below:
#!/usr/bin/perl
use strict;
use warnings;
my $content = <DATA>;
print "Content is $content";
__DATA__
line 1
line 2
line 3
The output will be Content is line 1.
This is because the input record separator was still set to newline, so the read returned after the first line.
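Since a changed $/ affects every read that follows, it helps to keep the change inside a small scope. A minimal sketch, assuming a hypothetical helper named slurp whose argument is either a filename or a reference to a string (an in-memory file):

```perl
use strict;
use warnings;

# Hypothetical helper; $source may be a filename or a reference to a string,
# since three-argument open accepts both.
sub slurp {
    my ($source) = @_;
    open my $fh, '<', $source or die "Cannot open input: $!";
    local $/;                 # undef only until this sub returns
    my $content = <$fh>;
    close $fh;
    return $content;
}

my $text = "line 1\nline 2\nline 3\n";
my $all  = slurp(\$text);
print length($all), "\n";                      # 21
print $/ eq "\n" ? "restored\n" : "leaked\n";  # restored
```

Code that runs after slurp returns still reads line by line, because local restores the previous value of $/.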
I have written the following code to read a list of filenames from a file, one per line, and append some data to each.
open my $info,'<',"abc.txt";
while(<$info>){
chomp $_;
my $filename = "temp/".$_.".xml";
print"\n";
print $filename;
print "\n";
}
close $info;
Content of abc.txt
file1
file2
file3
Now I was expecting my code to give me following output
temp/file1.xml
temp/file2.xml
temp/file3.xml
but instead I am getting output
.xml/file1
.xml/file2
.xml/file3
Your file has Windows line endings (\r\n). chomp removes the \n (newline) but leaves the \r (carriage return). Using Data::Dumper with Useqq you can examine the variable:
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper($filename);
This should output something like:
$VAR1 = "temp/file1\r.xml";
When printed normally, it will output temp/file1, move the cursor back to the start of the line, and overwrite temp with .xml.
To remove the line endings, replace chomp with:
s/\r\n$//;
or as noted by @Borodin:
s/\s+\z//;
which "has the advantage of working for any line terminator, as well as removing trailing whitespace, which is commonly unwanted"
As has been stated, your file has Windows line endings.
The following self-contained example demonstrates what you're working with:
use strict;
use warnings;
open my $info, '<', \ "file1\r\nfile2\r\nfile3\r\n";
while(<$info>){
chomp;
my $filename = "temp/".$_.".xml";
use Data::Dump;
dd $filename;
print $filename, "\n";
}
Outputs:
"temp/file1\r.xml"
.xml/file1
"temp/file2\r.xml"
.xml/file2
"temp/file3\r.xml"
.xml/file3
Now there are two ways to fix this
Adjust the $INPUT_RECORD_SEPARATOR to that of your file.
local $/ = "\r\n";
while(<$info>){
chomp;
chomp automatically works on the value of $/.
Use a regex instead of chomp to strip the line endings
Since Perl 5.10 there is an escape sequence, \R, which matches a generic newline.
while(<$info>){
s/\R//;
Alternatively, you could strip all trailing whitespace to be even more sure of covering your bases:
while(<$info>){
s/\s+\z//;
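A minimal sketch of the \R approach applied to several kinds of line ending, assuming a hypothetical helper named strip_eol. Anchoring with \z makes sure only a terminator at the very end of the string is removed:

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing generic newline, if present.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\R\z//;    # \R requires Perl 5.10 or later
    return $line;
}

for my $raw ("file1\r\n", "file2\n", "file3\r", "file4") {
    print "temp/", strip_eol($raw), ".xml\n";
}
# temp/file1.xml
# temp/file2.xml
# temp/file3.xml
# temp/file4.xml
```

Unlike s/\s+\z//, this leaves trailing spaces and tabs in place, which matters if they are significant in the data.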
This question refers to
How to replace text using greedy approach in sed?
I have to match multiline data in file and need to replace them with some other text using perl.
cat file
<strong>ABC
</strong>
perl script: code.pl
#!/bin/perl
open(fh, $ARGV[0]) or die "could not open file\n";
while($input = <fh>)
{
if($input =~/<strong>(.*?)\n(\s)*<\/strong>/)
{
print($1,"\n");
}
}
close(fh);
perl code.pl file
Output: No output
How can I solve the above problem?
Regards
use File::Slurp qw( read_file );
my $string = read_file( $ARGV[0] );
$string =~ s/\<strong>(.*?)<\/strong>/<b>${1}<\/b>/gs;
print $string;
This example uses the File::Slurp module to read in the entire file at once.
It then uses a regex with the g and s modifiers. The s allows .*? to match newline characters. The g makes the search global. Global meaning it will find all matches in the given string. Without the g only the first instance would be replaced. If you want your search to be case insensitive, you can use the i regex modifier.
The ${1} is a back-reference to the match in parentheses.
This example produces:
<b>ABC
</b>
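The question's script printed the captured text rather than replacing it. A minimal sketch of that extraction on an already-slurped string, assuming a hypothetical helper named extract_strong (the sample string below is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside every <strong>...</strong>
# pair, without trailing whitespace before the closing tag.
sub extract_strong {
    my ($html) = @_;
    # /s lets . match newlines; /g in list context returns every capture.
    return $html =~ /<strong>(.*?)\s*<\/strong>/gs;
}

my $doc = "<strong>ABC\n</strong>\n<strong>DEF</strong>\n";
print "$_\n" for extract_strong($doc);
# ABC
# DEF
```

The original loop produced no output because it read one line at a time, so no single line ever contained both the opening and the closing tag.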
I'm reading this textfile to get ONLY the words in it and ignore all kind of whitespaces:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
#print $line;
@arr = split /\s+/, $line;
foreach $w (@arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
#print @arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
my @arr = split /\s+/, $line;
foreach my $w (@arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6, so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade.
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is an empty field before the first separator. Imagine if your data was comma-separated: then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that
my #arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split, which is a single quoted space (not a regular expression)
my @arr = split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
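A minimal sketch comparing the three forms on a line with leading whitespace (the sample line is made up for illustration):

```perl
use strict;
use warnings;

my $line = "   do you see\n";

my @by_regex = split /\s+/, $line;   # leading empty field is kept
my @by_space = split ' ',  $line;    # awk-style: leading whitespace skipped
my @by_match = $line =~ /\S+/g;      # every run of non-space characters

print scalar(@by_regex), "\n";   # 4
print scalar(@by_space), "\n";   # 3
print scalar(@by_match), "\n";   # 3
```

The extra element in the first list is the empty string before the leading spaces, which is what produced the blank lines in the original output.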
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
my @arr = split;
foreach my $w (@arr) {
print "$w\n" if $w !~ /^\s+$/;
}
}
This line is the problem:
@arr=split(/\s+/,$line);
When the line starts with whitespace, \s+ matches at the very beginning, so split returns an empty first field. Use ' ' instead.
@arr=split(' ',$line);
I believe that with this line:
if(!($w =~ /^\s+$/))
you wanted to skip the word when there is nothing in it.
But the "+" in the regex requires at least one whitespace character, so an empty string does not match and gets printed anyway.
If you change "\s+" to "\s*", you'll see that it works, because * means zero or more occurrences.
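A minimal sketch of the difference between the two quantifiers, using two hypothetical helper names (is_blank_plus and is_blank_star) and three sample strings:

```perl
use strict;
use warnings;

# Hypothetical helpers: return 1 if the pattern matches, 0 otherwise.
sub is_blank_plus { return $_[0] =~ /^\s+$/ ? 1 : 0 }   # needs 1+ whitespace chars
sub is_blank_star { return $_[0] =~ /^\s*$/ ? 1 : 0 }   # also matches ''

for my $w ('', ' ', 'word') {
    print "[$w] plus=", is_blank_plus($w), " star=", is_blank_star($w), "\n";
}
# [] plus=0 star=1
# [ ] plus=1 star=1
# [word] plus=0 star=0
```

Only the empty string is treated differently, and that is exactly the field split /\s+/ produces for a line with leading whitespace.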