Perl's documentation says that $/ is:
The input record separator, newline by default. This influences Perl's
idea of what a "line" is.
So, is it basically wrong to:
print STDERR $var, $/;
instead of:
print STDERR "$var\n";
What could go wrong if I do the former?
Perhaps you are looking for the output record separator instead?
perldoc perlvar:
IO::Handle->output_record_separator( EXPR )
$OUTPUT_RECORD_SEPARATOR
$ORS
$\
The output record separator for the print operator.
If defined, this value is printed after the last of print's arguments. Default is "undef".
You cannot call "output_record_separator()" on a handle, only as a static method. See IO::Handle.
Mnemonic: you set "$\" instead of adding "\n" at the end of the print. Also, it's just like $/, but it's what you get "back" from Perl.
For example,
$\ = $/;
print STDERR $var;
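A minimal sketch, assuming you only need the separator for a few prints: localize $\ in a block so the change cannot leak into other code. The in-memory handle ($out, $captured) stands in for STDERR and is illustrative only.

```perl
use strict;
use warnings;

# Print into an in-memory string instead of STDERR so the result is easy to inspect.
my $captured = '';
open my $out, '>', \$captured or die "Cannot open in-memory handle: $!";

my $var = 'hello';
{
    # Localize $\ so the change ends with this block.
    local $\ = "\n";
    print {$out} $var;      # $\ is appended automatically: "hello\n"
}
print {$out} $var;          # $\ is back to undef: nothing appended

close $out;
print $captured;            # "hello\nhello"
```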
$/ is LF (U+000A) by default. This is the same character produced by "\n"[1]. So unless you changed $/, $/ and "\n" are equivalent. If you did change $/, then only you know why, and therefore only you know whether $/ or "\n" is more appropriate.
[1] On ancient MacOS boxes, $/'s default was CR (U+000D), but that's also what "\n" produced there.
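To make "what could go wrong" concrete, here is a small sketch (the variable names and the in-memory handle are illustrative): print $var, $/ only ends the line while $/ still holds its default, whereas "\n" is unaffected by any change to $/.

```perl
use strict;
use warnings;

my $captured = '';
open my $out, '>', \$captured or die "Cannot open in-memory handle: $!";

my $var = 'warning';
print {$out} $var, $/;          # default $/ is "\n": prints "warning\n"
{
    local $/;                   # slurp mode: $/ is now undef
    no warnings 'uninitialized';
    print {$out} $var, $/;      # prints just "warning", no newline
}
{
    local $/ = "END OF HEADER"; # a custom input record separator
    print {$out} $var, $/;      # prints "warningEND OF HEADER"
}
print {$out} $var, "\n";        # "\n" is a newline regardless of $/
close $out;

print $captured;
```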
You need the output record separator, $\, as xxfelixxx has answered.
$/, as you read, is the input record separator. Changing it affects how Perl reads the data from the files you open. For example:
open my $fh, "<", $filename or die $!;
local $/; # enable localized slurp mode
my $content = <$fh>;
close $fh;
The above slurps the whole content of the file into the scalar $content, because local $/; sets $/ to undef.
Consider the below code:
#!/usr/bin/perl
use strict;
use warnings;
my $content;
{local $/; $content = <DATA>}
print "Content is $content";
__DATA__
line 1
line 2
line 3
Output:
Content is line 1
line 2
line 3
But if you do not reset $/, as in the code below:
#!/usr/bin/perl
use strict;
use warnings;
my $content = <DATA>;
print "Content is $content";
__DATA__
line 1
line 2
line 3
The output will be only: Content is line 1
This is because the input record separator is still a newline, so the read returned after the first line.
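The same contrast as a self-contained sketch, reading from an in-memory string instead of __DATA__ (the sample text is illustrative):

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\nline 3\n";

# Default $/ ("\n"): one readline returns a single line.
open my $fh, '<', \$text or die $!;
my $first = <$fh>;
close $fh;

# $/ set to undef inside a block: one readline returns everything.
open $fh, '<', \$text or die $!;
my $all = do { local $/; <$fh> };
close $fh;

print "first: $first";
print "all:\n$all";
```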
The following code copies the content of readfile to writefile. Instead of copying up to the end, I want to copy up to some keyword.
use strict;
use warnings;
use File::Slurp;
my @lines = read_file('readfile.txt');
while ( my $line = shift @lines) {
next unless ($line =~ m/END OF HEADER/);
last; # here suggest some other logic
}
append_file('writefile.txt', @lines);
next will continue to the next iteration of the loop, effectively skipping the rest of the statements in the loop for that iteration (in this case, the last).
last will immediately exit the loop, which sounds like what you want. So you should be able to simply put the conditional statement on the last.
Also, why read the entire file into memory just to iterate over its lines? Why not use a regular while(<>) loop? And I would recommend avoiding File::Slurp; it has some long-standing issues.
You don't show any example input with expected output, and your description is unclear - you said "i want to copy upto some keyword" but in your code you use shift, which removes items from the beginning of the array.
Do you want to remove the lines before or after and including or not including "END OF HEADER"?
This code will copy over only the header:
use warnings;
use strict;
my $infile = 'readfile.txt';
my $outfile = 'writefile.txt';
open my $ifh, '<', $infile or die "$infile: $!";
open my $ofh, '>', $outfile or die "$outfile: $!";
while (<$ifh>) {
last if /END OF HEADER/;
print $ofh $_;
}
close $ifh;
close $ofh;
Whereas if you want to copy everything after the header, you could replace the while above with:
while (<$ifh>) {
last if /END OF HEADER/;
}
while (<$ifh>) {
print $ofh $_;
}
This loops and does nothing until it sees END OF HEADER, then breaks out of the first loop and moves on to the second, which prints the lines after the header.
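A self-contained sketch of both variants, assuming hypothetical in-memory data in place of readfile.txt and writefile.txt so it can be run as-is:

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for readfile.txt.
my $input = "h1\nh2\nEND OF HEADER\nbody1\nbody2\n";

# Header only: copy lines until the marker, excluding the marker.
my $header = '';
open my $in,  '<', \$input  or die $!;
open my $out, '>', \$header or die $!;
while (<$in>) {
    last if /END OF HEADER/;
    print {$out} $_;
}
close $in;
close $out;

# After header: skip up to and including the marker, then copy the rest.
my $body = '';
open $in,  '<', \$input or die $!;
open $out, '>', \$body  or die $!;
while (<$in>) {
    last if /END OF HEADER/;
}
while (<$in>) {
    print {$out} $_;
}
close $in;
close $out;

print "header:\n$header";
print "body:\n$body";
```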
data.txt:
fsffs
sfsfsf
sfSDFF
END OF HEADER
{ dsgs xdgfxdg zFZ }
dgdbg
vfraeer
Code:
use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;
my $infile = 'data.txt';
my $header_file = 'header.txt';
my $after_header_file = 'after_header.txt';
open my $DATA, '<', $infile;
open my $HEADER, '>', $header_file;
open my $AFTER_HEADER, '>', $after_header_file;
{
local $/ = "END OF HEADER";
my $header = <$DATA>;
say {$HEADER} $header;
my $rest = <$DATA>;
say {$AFTER_HEADER} $rest;
}
close $DATA;
close $HEADER;
close $AFTER_HEADER;
say "Created files: $header_file, $after_header_file";
Output:
$ perl 1.pl
Created files: header.txt, after_header.txt
$ cat header.txt
fsffs
sfsfsf
sfSDFF
END OF HEADER
$ cat after_header.txt
{ dsgs xdgfxdg zFZ }
dgdbg
vfraeer
$/ specifies the input record separator, which by default is a newline. Therefore, when you read from a file:
while (my $x = <$INFILE>) {
}
each value of $x is a sequence of characters up to and including the input record separator, i.e. a newline, which is what we normally think of as a line of text in a file. Often, we chomp off the newline (the input record separator) at the end of the text:
while (my $x = <$INFILE>) {
chomp $x;
say "$x is a dog";
}
But, you can set the input record separator to anything you want, like your "END OF HEADER" text. That means a line will be all the text up to and including the input record separator, which in this case is "END OF HEADER". For example, a line will be: "abc\ndef\nghi\nEND OF HEADER". Furthermore, chomp() will now remove "END OF HEADER" from the end of its argument, so you could chomp your line if you don't want the "END OF HEADER" marker in the output file.
If Perl cannot find the input record separator, it keeps reading until it hits the end of the file, then returns all the text it read.
You can use this behavior to your advantage when you want to skip ahead to some specific text in a file.
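A short sketch of both behaviors described above, using an illustrative in-memory string: chomp strips the custom separator (and returns how many characters it removed), and the next read, finding no further separator, returns everything up to end of file.

```perl
use strict;
use warnings;

my $text = "abc\ndef\nEND OF HEADER\nrest\n";
open my $fh, '<', \$text or die $!;

my ($header, $rest, $removed);
{
    local $/ = "END OF HEADER";
    $header  = <$fh>;         # "abc\ndef\nEND OF HEADER"
    $removed = chomp $header; # chomp strips the current $/, returns chars removed
    $rest    = <$fh>;         # no further separator: returns up to EOF
}
close $fh;

print "chomp removed $removed characters\n";
print "header: [$header]\n";
print "rest: [$rest]\n";
```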
Declaring a variable as local makes the variable magical: when the closing brace of the surrounding block is encountered, perl sets the variable back to the value it had just before the opening brace of the surrounding block:
#Here, by default $/ = "\n", but some code out here could have
#also set $/ to something else
{
local $/ = "END OF HEADER";
} # $/ gets set back to whatever value it had before this block
When you change one of perl's predefined global variables, it's considered good practice to only change the variable for as long as you need to use the variable, then change the variable back to what it was.
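A small sketch of that restore behavior. One detail worth knowing: local is dynamically scoped, so a subroutine called from inside the block also sees the temporary value. The sub name inner is hypothetical.

```perl
use strict;
use warnings;

my @seen;
push @seen, $/;                     # "\n" by default
{
    local $/ = "END OF HEADER";
    push @seen, $/;                 # changed inside the block
    inner();
}
push @seen, $/;                     # restored after the block

# Hypothetical helper: code called from inside the block sees the localized value.
sub inner { push @seen, "inner saw: $/" }

print map { "[$_]\n" } @seen;
```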
If you want to target just the text between the braces, you can do:
data.txt:
fsffs
sfsfsf
sfSDFF
END OF HEADER { dsgs xdgfxdg zFZ }
dgdbg
vfraeer
Code snippet:
...
...
{
local $/ = 'END OF HEADER {';
my $pre_brace = <$DATA>;
$/ = '}';
my $target_text = <$DATA>;
chomp $target_text; #Removes closing brace
say "->$target_text<-";
}
--output:--
-> dsgs xdgfxdg zFZ <-
I have written the following code to read a list of filenames, one per line, from a file and append some text to each name.
open my $info,'<',"abc.txt";
while(<$info>){
chomp $_;
my $filename = "temp/".$_.".xml";
print"\n";
print $filename;
print "\n";
}
close $info;
Content of abc.txt
file1
file2
file3
I was expecting my code to give me the following output:
temp/file1.xml
temp/file2.xml
temp/file3.xml
but instead I am getting this output:
.xml/file1
.xml/file2
.xml/file3
Your file has Windows line endings, \r\n. chomp removes the \n (newline) but leaves the \r (carriage return). Using Data::Dumper with Useqq, you can examine the variable:
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper($filename);
This should output something like:
$VAR1 = "temp/file1\r.xml";
When printed normally, it outputs temp/file1, then the \r moves the cursor to the start of the line and .xml overwrites temp, leaving .xml/file1 on screen.
To remove the line endings, replace chomp with:
s/\r\n$//;
or, as noted by @Borodin:
s/\s+\z//;
which "has the advantage of working for any line terminator, as well as removing trailing whitespace, which is commonly unwanted"
As has been stated, your file has Windows line endings.
The following self-contained example demonstrates what you're working with:
use strict;
use warnings;
open my $info, '<', \ "file1\r\nfile2\r\nfile3\r\n";
while(<$info>){
chomp;
my $filename = "temp/".$_.".xml";
use Data::Dump;
dd $filename;
print $filename, "\n";
}
Outputs:
"temp/file1\r.xml"
.xml/file1
"temp/file2\r.xml"
.xml/file2
"temp/file3\r.xml"
.xml/file3
Now there are two ways to fix this:
Adjust the $INPUT_RECORD_SEPARATOR to that of your file.
local $/ = "\r\n";
while(<$info>){
chomp;
chomp automatically works on the value of $/.
Use a regex instead of chomp to strip the line endings
Since Perl 5.10 there is an escape sequence, \R, which matches a generic newline (including \r\n).
while(<$info>){
s/\R//;
Alternatively, you could just strip all trailing spacing to be even more sure of covering your bases:
while(<$info>){
s/\s+\z//;
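All three fixes side by side, as a runnable sketch on an in-memory copy of abc.txt with CRLF endings. Note that the \R variant here is anchored with \z, a small variation on the s/\R//; shown above, so that only the trailing line ending is removed.

```perl
use strict;
use warnings;

my $crlf = "file1\r\nfile2\r\nfile3\r\n";

# Fix 1: make $/ match the file's line ending, so chomp removes "\r\n".
my @fix1;
{
    open my $info, '<', \$crlf or die $!;
    local $/ = "\r\n";
    while (<$info>) {
        chomp;
        push @fix1, "temp/$_.xml";
    }
}

# Fix 2: strip a trailing generic newline with \R (Perl 5.10+).
my @fix2;
{
    open my $info, '<', \$crlf or die $!;
    while (<$info>) {
        s/\R\z//;
        push @fix2, "temp/$_.xml";
    }
}

# Fix 3: strip all trailing whitespace, including "\r" and "\n".
my @fix3;
{
    open my $info, '<', \$crlf or die $!;
    while (<$info>) {
        s/\s+\z//;
        push @fix3, "temp/$_.xml";
    }
}

print "$_\n" for @fix1;
```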
I'm maintaining a Perl script (Perl 5.10 on Linux) which needs to process a file line-by-line while being as flexible as possible regarding line separator characters. Any sequence of newlines and/or carriage return characters should mark the end of a line. Blank lines in the file aren't significant. For example, all of these should yield two lines:
FOO\nBAR
FOO\rBAR
FOO\r\nBAR
FOO\n\rBAR
FOO\r\n\r\r\r\n\n\nBAR
It doesn't look like it's possible to get this behavior through PerlIO or by setting $/. The files aren't large, so I suppose I could just read the whole file into memory and then split it with a regular expression. Is there a more clever way to do this in Perl?
Just slurp the file and use split:
use strict;
use warnings;
use autodie;
use Data::Dump;
my @data = (
"FOO\nBAR",
"FOO\rBAR",
"FOO\r\nBAR",
"FOO\n\rBAR",
"FOO\r\n\r\r\r\n\n\nBAR",
);
for my $filedata (@data) {
dd $filedata;
open my $fh, "<", \"$filedata";
local $/;
for my $line (split /[\n\r]+/, <$fh>) {
print " $line\n";
}
}
Outputs:
"FOO\nBAR"
FOO
BAR
"FOO\rBAR"
FOO
BAR
"FOO\r\nBAR"
FOO
BAR
"FOO\n\rBAR"
FOO
BAR
"FOO\r\n\r\r\r\n\n\nBAR"
FOO
BAR
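If the files ever grow too large to slurp, one possible streaming variant (my own sketch, not from the answer above) keeps the default $/ and splits each chunk on runs of CR/LF, skipping empty pieces. Caveat: a file that uses only \r never contains "\n", so it is still read as a single chunk. The helper name read_lines is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read with the default $/ ("\n") and split each chunk
# on runs of CR/LF. Lone "\r" separators never end a read, so the split
# handles them; empty pieces (blank lines) are dropped.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (my $chunk = <$fh>) {
        push @lines, grep { length } split /[\r\n]+/, $chunk;
    }
    return @lines;
}

my @inputs = (
    "FOO\nBAR",
    "FOO\rBAR",
    "FOO\r\nBAR",
    "FOO\n\rBAR",
    "FOO\r\n\r\r\r\n\n\nBAR",
);

my @results;
for my $data (@inputs) {
    open my $fh, '<', \$data or die $!;
    push @results, [ read_lines($fh) ];
}

print join(',', @$_), "\n" for @results;
```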
Why does setting the Perl input record separator to $/ = "__Data__\n" not work?
The data record is set as follows:
__Data__\n
1aaaaaaaaaa\n
aaaaaaaaaaa\n
aaaaaaaaaaaaa\n
__Data__\n
1bbbbbbbbbb\n
bbbbbbbbbbb\n
bbbbbbbbbbbbb\n
__Data__\n
1cccccccccc\n
ccccccccccc\n
ccccccccccccc\n
__Data__\n
Here is the Perl code to access the first row of each data record...
$/ = "__Data__\n";
open READFILE, "<", "logA.txt" or die "Unable to open file";
while (<READFILE>)
{
if (/([^\n]*)\n(.*)/sm)
{
print "$1\n";
}
}
close(<READFILE>);
I get the undesirable output of:
__Data__
and not the desirable output of:
1aaaaaaaaaaa
1bbbbbbbbbbb
1ccccccccccc
Why is the input record separator $/="__Data__"; not working? How should it work?
If I understand the question correctly, you want to strip out the __Data__ part. You want this...
1aaaaaaaaaa
1bbbbbbbbbb
1cccccccccc
...but you're getting this...
__Data__
1aaaaaaaaaa
1bbbbbbbbbb
1cccccccccc
You can use the chomp function to remove the end of the line. Normally this is just a newline, but chomp removes whatever you set $/ to.
use strict;
use warnings;
{
local $/="__Data__\n";
open my $fh, "<", "logA.txt" or die "Unable to open file";
while(my $record = <$fh>) {
chomp $record;
print $record;
}
}
BTW, because you changed the concept of "end of line", everything between the __Data__ fields will be considered a single line. If you need to split the lines up, you can use my @lines = split "\n", $record.
use strict;
use warnings;
{
# Isolate the change to the global $/
local $/="__Data__\n";
open my $fh, "<", "logA.txt" or die "Unable to open file";
while(my $record = <$fh>) {
# Remove the __Data__ separator
chomp $record;
# Split the record by line
my @lines = split /\n/, $record;
# Empty record, skip it
next if !@lines;
# Print the first line of the record
print $lines[0], "\n";
}
}
I also made some general improvements to your code. $/ is global and will affect everything that reads files. local ensures your change only happens inside the block.
I've used lexical filehandles; they automatically close themselves when they go out of scope (when the block they're declared in is done).
And I've turned on strict and warnings, which will catch typos and little mistakes like close(<READFILE>).
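The same approach as a self-contained sketch, with a shortened in-memory copy of logA.txt (sample data is illustrative) so the result can be checked directly:

```perl
use strict;
use warnings;

my $log = "__Data__\n1aaaaaaaaaa\naaaaaaaaaaa\n__Data__\n"
        . "1bbbbbbbbbb\nbbbbbbbbbbb\n__Data__\n";

my @first_lines;
{
    local $/ = "__Data__\n";
    open my $fh, '<', \$log or die $!;
    while (my $record = <$fh>) {
        chomp $record;                  # removes the trailing "__Data__\n"
        my @lines = split /\n/, $record;
        next if !@lines;                # file starts with a separator, so the first record is empty
        push @first_lines, $lines[0];
    }
}

print "$_\n" for @first_lines;
```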
input.txt
__Data__
1aaaaaaaaaa
aaaaaaaaaaa
aaaaaaaaaaaaa
__Data__
1bbbbbbbbbb
bbbbbbbbbbb
bbbbbbbbbbbbb
__Data__
1cccccccccc
ccccccccccc
ccccccccccccc
__Data__
using $/=qq{__Data__\n}
perl -e 'use Data::Dumper;$Data::Dumper::Useqq=1; $/=qq{__Data__\n}; open $fh,"input.txt"; print Dumper [ <$fh> ]'
$VAR1 = [
"__Data__\n",
"1aaaaaaaaaa\naaaaaaaaaaa\naaaaaaaaaaaaa\n__Data__\n",
"1bbbbbbbbbb\nbbbbbbbbbbb\nbbbbbbbbbbbbb\n__Data__\n",
"1cccccccccc\nccccccccccc\nccccccccccccc\n__Data__"
];
using $/=qq{Data}
$VAR1 = [
"__Data",
"__\n1aaaaaaaaaa\naaaaaaaaaaa\naaaaaaaaaaaaa\n__Data",
"__\n1bbbbbbbbbb\nbbbbbbbbbbb\nbbbbbbbbbbbbb\n__Data",
"__\n1cccccccccc\nccccccccccc\nccccccccccccc\n__Data",
"__"
];
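I guess it's self-explanatory: the file starts with the separator, so the first record is just "__Data__\n" on its own, and its first line is __Data__. That is the output you were seeing.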
I have a program that pulls data from a number of other files to form a large (~200MB) bulk SQL insert statement
INSERT INTO ...
VALUES
('a','b',1,2,3),
('c','d',4,5,6),
Unfortunately, the last line needs to end on a semicolon instead of a comma. Is there a way to (ideally within my perl program) turn only the very last character from a , into a ;?
Things I've tried:
1) After the file has been finished and closed:
open(DAT,">>$output") || die("Cannot Open File");
seek(DAT, 2, SEEK_END);
print DAT ";";
close(DAT);
This just puts a semicolon at the very end.
2) Calling `perl -p -i -e 's/,$/;/g' $output`; from within my perl program, but this is replacing every comma.
3) While printing the last line, end with a semicolon instead of a comma. This doesn't work however because I don't actually know it was the last line until the line has been written.
4) Copy the whole file into a new file, except the last character is a ; instead of a ,. This is slow however, and thus not ideal.
If you know that the ',' you are replacing is always going to be the second last byte in the file (last byte being a "\n"), then you can try this:
use Fcntl qw(SEEK_SET);
my $fsize = -s $filename;
open(my $FILE, "+<", $filename) or die $!;
seek $FILE, $fsize-2, SEEK_SET; # or 0 (numeric) instead of SEEK_SET
print $FILE ";";
close $FILE;
You tried
perl -p -i -e 's/,$/;/g'
Which will apply this replacement on every line in the file. To do it only once, slurp the whole file using the -0777 switch:
perl -0777 -pi -e 's/,$/;/'
This will only match if the last character is a comma (with optional trailing newline). If you have trailing whitespace or other characters, it will not work.
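The same substitution inside a Perl program, applied to a string that stands in for the slurped file (hypothetical sample). Without /m, $ matches only at the very end of the string or just before a final newline, so only the last comma is replaced.

```perl
use strict;
use warnings;

my $sql = "INSERT INTO t\nVALUES\n('a','b',1,2,3),\n('c','d',4,5,6),\n";

# Copy $sql into $fixed, then replace only the comma at the end of the string.
(my $fixed = $sql) =~ s/,$/;/;

print $fixed;
```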
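You used the wrong offset. SEEK_END adds(!) the offset to the end position, so use -2 as the offset. Also open the file for reading and writing (+<) rather than for appending (>>): in append mode every write goes to the end of the file, no matter where you seek. Try this: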
use strict;
use warnings;
open my $fh, '+<', 'x.txt' or die "x.txt: $!";
seek $fh, -2, 2;
print $fh ";\n";
close $fh;
Or a little bit more talkative:
use strict;
use warnings;
use Fcntl qw(SEEK_END);
open my $fh, '+<', 'x.txt' or die "x.txt: $!";
seek $fh, -2, SEEK_END;
print $fh ";\n";
close $fh;
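A runnable sketch of the same in-place fix on a temporary file (the file content is a hypothetical stand-in for the generated SQL), assuming the file ends in ",\n":

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);
use File::Temp qw(tempfile);

# Create a small sample file ending in ",\n"; UNLINK removes it at exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "VALUES\n('a','b',1,2,3),\n('c','d',4,5,6),\n";
close $tmp;

# Open read/write (+<), not append (>>): in append mode every write goes
# to the end of the file no matter where you seek.
open my $fh, '+<', $path or die "$path: $!";
seek $fh, -2, SEEK_END or die "seek: $!";   # 2 bytes before the end: ",\n"
print {$fh} ";\n";                           # overwrite them in place
close $fh;

# Read the file back to check the result.
open $fh, '<', $path or die "$path: $!";
my $result = do { local $/; <$fh> };
close $fh;
print $result;
```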