perl + print to file_out in place to standard output - perl

I have the following script:
#!/usr/bin/perl
open IN, "/tmp/file";
s/(.*)=/$k{$1}++;"$1$k{$1}="/e and print while <IN>;
How do I print the output of the script to file_out instead of to standard output?
lidia

#!/usr/bin/perl
open IN, "/tmp/file";
open OUT, ">file_out.txt";
s/(.*)=/$k{$1}++;"$1$k{$1}="/e and print OUT while <IN>;
Explanation:
open IN, "/tmp/file"
open command to open file
IN filehandle name
/tmp/file name of file and specifier that it is for reading
if there is no modifier, it means reading
if there is a <, i.e. "</tmp/file" it also means reading
open OUT, ">file_out.txt"
open command to open file
OUT filehandle name
>file_out.txt name of file and specifier that it is for writing
there must be a >, i.e. ">file_out.txt" to write
s/.../.../e your substitution (I assume you know what it does)
and is a boolean operator that short-circuits, meaning it only does the thing afterwards if the thing beforehand is true. In this case, it will only print if the substitution actually matched something.
print OUT print to the filehandle OUT
while <IN> for each line from the file behind filehandle IN
Note:
Used this way, it makes extensive use of the magical default variable $_. Do a search for $_ on the perlintro site. In short:
If you don't tell a s/// substitution what string to work on, it uses $_
If you don't tell a print what to print, it prints $_
If you don't tell a while loop going through a filehandle's data where to put each line, it gets put into $_
Your program could have been rewritten:
#!/usr/bin/perl
open IN, "/tmp/file";
open OUT, ">file_out.txt";
while( defined( $line = <IN> ) )
{
$line =~ s/(.*)=/$k{$1}++;"$1$k{$1}="/e or next;
print OUT $line;
}
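For comparison, here is a minimal sketch of the same idea with lexical filehandles, three-argument open, and error checks. The sub name number_keys, the %seen hash (standing in for %k), and taking the two paths from the command line are assumptions made for illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy lines from $in to $out, turning the Nth "name=" into "nameN=".
# Lines without "=" are skipped, as in the original one-liner.
sub number_keys {
    my ($in, $out) = @_;
    my %seen;    # per-name counter (the original script's %k)
    while (my $line = <$in>) {
        $line =~ s/(.*)=/$1 . ++$seen{$1} . '='/e or next;
        print {$out} $line;
    }
    return;
}

# Assumed usage: perl number_keys.pl /tmp/file file_out.txt
if (@ARGV == 2) {
    my ($src, $dst) = @ARGV;
    open my $in,  '<', $src or die "Cannot read $src: $!";
    open my $out, '>', $dst or die "Cannot write $dst: $!";
    number_keys($in, $out);
    close $out or die "Cannot close $dst: $!";
}
```

Keeping the loop in a sub that takes two handles means the same code can write to a file, to STDOUT, or to an in-memory string without changes.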

Simply add the filehandle you are printing to after the print statement; opening for writing is a small change from opening for reading:
#!/usr/bin/perl -w
open IN, "/tmp/file";
open OUT, '>', "/tmp/file_out";
s/(.*)=/Sk_$1_++;"$1Sk_$1_="/ and print OUT while <IN>;
(I munged the replacement a bit, so it was easier for me to test.)

Related

Change line in textfile using perl

I read other places on how to do this but they were confusing for me.
I want to read lines from a text file and when I come across a certain line I want to append something to it.
My code is:
open my $p, "$username_filename" or die "can not open $username_filename: $!";
foreach $line (<$p>){
if ($line =~ /^listen/){
`echo "whatever" >> $username_file`;
}
}
However when I run this I get this error
sh: -c: line 0: syntax error near unexpected token `newline' sh: -c: line 0: `echo "current_user" >> '
Is this way correct to edit the file and why am I getting this error?
Working with files is not like editing in a word processor. Lines are an illusion, a file is just a big string of characters. You can't change a line in the middle of a file for the same reason you can't change a line in the middle of a book, the words can't be moved around to make room.
Instead, like a book, if you want to change something you need to rewrite the whole thing.
The basic algorithm is to...
1. Open the file for reading.
2. Open a temporary file for writing.
3. Read a line, alter the line, write the line.
4. Repeat 3 until done reading.
5. Overwrite the file with the temp file.
Some other notes...
print writes to STDOUT by default, but you can give it a filehandle to write to instead.
foreach my $line (<$fh>) is unfortunately not optimized to read files. It will read the possibly enormous file into memory. while(my $line = <$fh>) reads one line at a time.
I've turned on strict. This forces you to declare your variables. It protects you from typos like the one you made of $username_file vs $username_filename.
You could use something like "$filename.tmp" but File::Temp provides temp files that are guaranteed to be temporary, unique and cleaned up when the program exits.
use strict;
use warnings;
use autodie; # because writing 'or die' gets old fast
use File::Temp; # provides safe temp files
my $filename = ...; # set it somehow
open my $read, "<", $filename;
my $temp = File::Temp->new;
while(my $line = <$read>) {
if( $line =~ /^listen/ ) {
chomp $line; # remove the newline
$line .= " whatever\n"; # add our content and put a newline back
}
# Write the line to the temp file
print $temp $line;
}
# Overwrite our file with the rewritten temp file
rename $temp->filename, $filename;
That's inside a program. If you just want to do it quickly, you can do it on the command line with -i and -p.
perl -i.bak -pe 'if( /^listen/ ) { chomp; $_ .= " whatever\n" }' filename
-p says to run the code on each line of the file. The line will be put into $_ and whatever is in $_ will be printed. -i says to edit the file in place. -i.bak makes a backup of the original file just in case you make a mistake.
There are a few problems with your attempt. The big one is that using echo >> file will append to the file, not insert at some arbitrary place inside the file.
Another problem is that you're trying to append to a file called $username_file, and you haven't declared or defined that variable.
I don't think perl lets you insert into the middle of a file. I think your best bet would be to read the file a line at a time, and on the correct line(s), append the text you want. Write each line to a new file, then swap the files around at the end.
For example:
#!/usr/bin/perl
my $in_filename = "in.txt";
my $out_filename = "out.txt";
open (my $in, "<", $in_filename) or die;
open (my $out, ">", $out_filename) or die;
while (my $lline = <$in>)
{
chomp $lline;
if ( $lline =~ /listen/ )
{
print $out "$lline whatever\n";
}
else
{
print $out "$lline\n";
}
}
close $in;
close $out;
rename $in_filename, "$in_filename.original";
rename $out_filename, $in_filename;
I use chomp to remove line endings, because <$in> gives us a line including its line ending, which otherwise messes up the append.
As always there are many ways to achieve this. I think using sed is probably a better option for this, but you specifically asked how to do it in perl, so perl it is.

Perl incorrectly adding newline characters?

This is my tab delimited input file
Name<tab>Street<tab>Address
This is how I want my output file to look like
Street<tab>Address<tab>Address
(yes duplicate the next two columns) My output file looks like this instead
Street<tab>Address
<tab>Address
What is going on with perl? This is my code.
open (IN, $ARGV[0]);
open (OUT, ">output.txt");
while ($line = <IN>){
chomp $line;
@line=split/\t/,$line;
$line[2]=~s/\n//g;
print OUT $line[1]."\t".$line[2]."\t".$line[2]."\n";
}
close( OUT);
First of all, you should always
use strict and use warnings for even the most trivial programs. You will also need to declare each of your variables using my as close as possible to their first use
use lexical file handles and the three-parameter form of open
check the success of every open call, and die with a string that includes $! to show the reason for the failure
Note also that there is no need to explicitly open files named on the command line that appear in @ARGV: you can just read from them using <>.
As others have said, it looks like you are reading a file of DOS or Windows origin on a Linux system. Instead of using chomp, you can remove all trailing whitespace characters from each line using s/\s+\z//. Since CR and LF both count as "whitespace", this will remove all line terminators from each record. Beware, however, that, if trailing space is significant or if the last field may be blank, then this will also remove spaces and tabs. In that case, s/[\r\n]+\z// is more appropriate.
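The difference between the two substitutions can be seen on a made-up DOS-style record whose last field is blank. This is only a sketch; the sub names strip_eol and strip_trailing_space and the sample record are invented for illustration.

```perl
use strict;
use warnings;

# Remove only CR/LF line terminators; trailing tabs and spaces survive.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

# Remove all trailing whitespace: tabs, spaces, CR and LF alike.
sub strip_trailing_space {
    my ($line) = @_;
    $line =~ s/\s+\z//;
    return $line;
}

my $record = "Name\tStreet\t\r\n";    # made-up record with a blank last field

my $chomped = $record;
chomp $chomped;                       # with the default $/, only "\n" is removed

# A limit of -1 makes split keep trailing empty fields so they can be counted.
my @kept = split /\t/, strip_eol($record), -1;
my @lost = split /\t/, strip_trailing_space($record), -1;

printf "chomp leaves a CR behind: %s\n", $chomped =~ /\r\z/ ? 'yes' : 'no';
printf "fields after s/[\\r\\n]+\\z//: %d\n", scalar @kept;
printf "fields after s/\\s+\\z//: %d\n",     scalar @lost;
```

With s/\s+\z// the tab that marks the empty last field is stripped along with the line ending, which is the case the warning above is about.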
This version of your program works fine.
use strict;
use warnings;
@ARGV = 'addr.txt';
open my $out, '>', 'output.txt' or die $!;
while (<>) {
s/\s+\z//;
my @fields = split /\t/;
print $out join("\t", @fields[1, 2, 2]), "\n";
}
close $out or die $!;
If you know beforehand the origin of your data file, and know it to be a DOS-like file that terminates records with CR LF, you can use the PerlIO crlf layer when you open the file. Like this
open my $in, '<:crlf', $ARGV[0] or die $!;
then all records will appear to end in just "\n" when they are read on a Linux system.
A general solution to this problem is to install PerlIO::eol. Then you can write
open my $in, '<:raw:eol(LF)', $ARGV[0] or die $!;
and the line ending will always be "\n" regardless of the origin of the file, and regardless of the platform where Perl is running.
Did you try to eliminate not only the "\n" but also the "\r"???
$file[2] =~ s/\r\n//g;
$file[3] =~ s/\r\n//g; # Is it the "good" one?
It could work. DOS line endings are "\r\n" (not only "\n").
Another way to avoid end of line problems is to only capture the characters you're interested in:
open (IN, $ARGV[0]);
open (OUT, ">output.txt");
while (<IN>) {
print OUT "$1\t$2\t$2\n" if /^(\w+)\t\w+\t(\w+)\s*/;
}
close( OUT);

issues for a code snippet to handle the input file

I am studying a Perl program, which includes the following segment for handling an input file. I do not understand what s/^\s+//; is used for. Moreover, what do '|' and '||' stand for in open(FILE, "cat $fileName |") || die "could not open file";?
open(FILE, "cat $fileName |") || die "could not open file";
while (<FILE>)
{
s/^\s+//;
my @line = split;
if ($line[0]!~ /\:/) {$mark=0}
my $var = $line[$mark];
## some other code
}
You can read the documentation for the various functions in perlfunc.
This code will open a file for reading, by the rather roundabout way of piping from cat instead of simply opening the file. The trailing | means that the output of the shell command cat is piped to our program, and our file handle will read from that output.
|| is simply or. Open the pipe, and if that fails, the program dies.
while(<FILE>) will read through every line of the input and assign each line to $_. That line is then used implicitly in the substitution and split below. I.e. s/^\s+// is equal to $_ =~ s/^\s+//, and split is equal to split(' ', $_).
s/^\s+//
Will remove leading whitespace. The split will split each line on whitespace, and the elements are stored in the array @line.
Because the implicit split on whitespace already skips leading whitespace, stripping it first with s/^\s+// is not really needed.
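A small sketch of why the substitution is redundant here: a bare split behaves like split ' ', which skips leading whitespace, whereas an explicit /\s+/ pattern does not. The sub names and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# split ' ' is what a bare "split" does on $_: leading whitespace is skipped.
sub split_awk_style {
    my ($line) = @_;
    return split ' ', $line;
}

# split /\s+/ keeps an empty first field when the line starts with whitespace.
sub split_on_regex {
    my ($line) = @_;
    return split /\s+/, $line;
}

my $line  = "   key: value more\n";    # made-up input with leading whitespace
my @awk   = split_awk_style($line);
my @regex = split_on_regex($line);

printf "split ' '  : %d fields, first is '%s'\n", scalar @awk,   $awk[0];
printf "split /\\s+/: %d fields, first is '%s'\n", scalar @regex, $regex[0];
```

So the s/^\s+// would only matter if the code were later changed to split on an explicit pattern.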
If the first element does not contain a colon :, $mark is set to 0. Otherwise, it is not set, and will presumably use the value from the previous iteration, since it is not defined inside the loop. Finally, $var is initialized as element number $mark, which is either 0 or whatever.
ETA: As a rather insidious oops: If $mark is undefined, i.e. the first element contained a colon before $mark was ever set, then $var will still be assigned $line[0], since undef will be converted to 0, with a warning. If use warnings is not in effect, this error is silent, and therefore insidious.
This code seems to be written by someone who does not know too much about perl, and it might not be very safe to use.
The substitution trims leading whitespace that appears at the beginning of the line (^), leaving any non-whitespace characters as the first.
The || operator in open... || die ... is a high-precedence or. If open fails, die executes.
open(FILE, "cat $fileName |") is a waste of an external process. To read a file for input, simply do:
open FILE, '<', $filename or die qq{Could not open "$filename" for reading: $!};
The parentheses for the open call are optional because or does not bind tightly.
It is also better to use lexical file handles:
open my $fh, '<', $filename or die qq{Could not open "$filename" for reading: $!};
This file handle is assigned to a lexical variable that lives only within the scope it is declared. Once the program flow exits this scope, the file closes automatically.
Part of the confusion is that the developer is using the default variable, $_. Many Perl commands (I would say about 1/3 of them) act upon $_ when you don't specify the name of the variable in the function. For example, these are syntactically the same:
my $uppercase_name = uc($_);
my $uppercase_name = uc;
In both cases, the uc function will return the string in the $_ variable converted to upper case characters. In fact, even the print statement uses the $_ variable. Again, these are both the same:
print $_;
print;
It's frowned upon to use the default variable in newer Perl scripts because it doesn't add clarity to the program and it doesn't make the program faster. I've rewritten the same code snippet you used in order to show the missing $_ variable. It might make the code easier to understand:
open(FILE, "cat $fileName |") || die "could not open file";
while ($_ = <FILE>)
{
$_ =~ s/^\s+//;
my @line = split ' ', $_;
if ($line[0] !~ /\:/) {
$mark = 0;
}
my $var = $line[$mark];
## some other code
}
Notice that the while statement is putting the value of the line read into the $_ variable and that the substitute command (the s/^\s+//) is also operating on the $_ variable. I hope that clarifies the code a bit for you.
Now for your questions:
[W]hat do '|' and '||' stand for?
The || means or as in do this or that. In practice, the or can be thought of as an if statement:
if (not open(FILE, "cat $fileName |")) {
die "could not open file";
}
That is, if the open statement failed, then execute the die statement. If the open statement did manage to open the file, then don't execute the die statement.
In Perl, you now see or instead of || in cases like this:
open(FILE, "cat $fileName |") or die "could not open file";
which makes the meaning a bit more obvious: Open the file, or kill the program.
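The choice between || and or starts to matter once the parentheses around open's arguments are dropped. The following is a sketch only; the sub names and the path are invented, and the path is assumed not to exist on the machine running it.

```perl
use strict;
use warnings;

# "||" binds tighter than the comma, so it applies to $path alone:
# this parses as open(my $fh, '<', ($path || die ...)) and never dies
# for a non-empty $path, even when open itself fails.
sub open_with_double_pipe {
    my ($path) = @_;
    my $ok = open my $fh, '<', $path || die "never reached for a true \$path";
    return $ok ? 1 : 0;
}

# "or" has lower precedence than the comma, so it tests open's result.
sub open_with_or {
    my ($path) = @_;
    open my $fh, '<', $path or die qq{Could not open "$path": $!\n};
    return 1;
}

my $missing = '/no/such/dir/no-such-file';    # assumed not to exist

print "|| without parentheses returned ", open_with_double_pipe($missing),
      " and did not die\n";
eval { open_with_or($missing); 1 } or print "or version died: $@";
```

With the parentheses in place, as in the original open(FILE, ...) || die, both operators behave the same.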
The single pipe (|) at the end of the file name means execute the command in the open statement (the cat $filename) and read from the output of this command. Imagine something like this:
open (COMMAND, "java -jar foo.war|") or die "Can't execute 'java -jar foo.war'";
Now, I'm running the command java -jar foo.war and using its output in my Perl script.
You can do this the other way around too:
open (MAIL, "|mail $recipient") or die "Can't mail $recipient";
print MAIL "Dear $recipient\n\n";
print MAIL "I hope everything is well.\n";
print MAIL "Sincerely,\n\nDavid";
close MAIL;
I'm now opening the command mail $recipient and writing to it with the print statements. In this case, I'm emailing $recipient with a simple message.
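Strings such as "cat $fileName |" or "|mail $recipient" are handed to the shell, so unusual characters in the variable can change the command. As a hedged alternative, open also accepts a mode of '-|' (read from a command) followed by the command as a list, which skips the shell. The sub name read_from_command is made up, and the demo uses the running perl itself ($^X) as the external program.

```perl
use strict;
use warnings;

# Run a command and return its output lines. The list form of open
# passes the arguments to the program directly, without a shell.
sub read_from_command {
    my @cmd = @_;
    open my $pipe, '-|', @cmd or die "Cannot run '@cmd': $!";
    my @lines = <$pipe>;
    close $pipe or die "'@cmd' failed: " . ($! ? $! : "exit status $?");
    return @lines;
}

# Demo: use the running perl ($^X) as the external command.
print read_from_command($^X, '-e', 'print "line $_\n" for 1 .. 3');
```

The mirror image for writing to a command is the '|-' mode, used the same way.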
I do not understand what is s/^\s+//; used for?
In the original program, it was on a line by itself:
s/^\s+//;
I've added the missing variable which should help clarify it a bit:
$_ =~ s/^\s+//;
This is the substitute command in Perl. It takes the $_ variable and replaces whatever matches the regular expression ^\s+ with nothing. If you don't understand what regular expressions are, you should take a look at the Perldoc tutorial on the subject. Basically, this is removing all spaces, tabs, and other forms of white space from the beginning of the line.

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command line arguments so that, in addition to copying those files, it also modifies them. The following perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (@ARGV) {
my $orig = $_;
(my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line, and further processes the lines within those files. So far I am able to get a file handle successfully, as the following script and its output show:
use strict;
use warnings;
foreach $_ (@ARGV) {
print "$_\n";
open(my $fh, "+>", $_) or die $!;
print $fh;
#while (my $line = <$fh>) {
# print $line;
#}
close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files lines, which I've tried to do with the commented out code above.
But that code has no effect, the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason why print $fh is returning GLOB(0x1a457de8) is because the scalar $fh is a filehandle and not the contents of the file itself. To access the contents of the file itself, use <$fh>. For example:
while (my $line = <$fh>) {
print $line;
}
# or simply print while <$fh>;
will print the contents of the entire file.
This is documented in perldoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
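The effect of the two modes can be checked on a throwaway file. This is a minimal sketch: the sub name slurp_with_mode is invented, and the temp file comes from File::Temp so nothing real is overwritten.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Open $path with the given mode and return whatever can be read from it.
sub slurp_with_mode {
    my ($mode, $path) = @_;
    open my $fh, $mode, $path or die "Cannot open $path ($mode): $!";
    local $/;                 # slurp mode: read the whole file at once
    my $content = <$fh>;
    close $fh;
    return defined $content ? $content : '';
}

# Create a scratch file with two lines in it.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line one\nline two\n";
close $tmp or die "Cannot close $path: $!";

printf "+< read %d bytes\n", length slurp_with_mode('+<', $path);
printf "+> read %d bytes\n", length slurp_with_mode('+>', $path);
printf "size on disk afterwards: %d bytes\n", -s $path;
```

Once the '+>' open has run, the data is gone from disk as well, which is why the commented-out loop in the question printed nothing.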
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
To read the contents of a filehandle you have to call readline or read, or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDOUT you have to have it as the first argument to print, followed by what you want to print, without a comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
my $oldfh = select $fh;
print 'something';
select $oldfh; # reset it back to the previous handle
}
Also, your mode argument to open causes it to clobber the contents of the file, at which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to perl, and trying some tricky regexes, it can be nice to use a source file for them, as the command line may get rather crowded. I.e.:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js

How do I get a filehandle from the command line?

I have a subroutine that takes a filehandle as an argument. How do I make a filehandle from a file path specified on the command line? I don't want to do any processing of this file myself, I just want to pass it off to this other subroutine, which returns an array of hashes with all the parsed data from the file.
Here's what the command line input I'm using looks like:
$ ./getfile.pl /path/to/some/file.csv
Here's what the beginning of the subroutine I'm calling looks like:
sub parse {
my $handle = shift;
my @data = <$handle>;
while (my $line = shift(@data)) {
# do stuff
}
}
Command line arguments are available in the predefined @ARGV array. You can get the file name from there and use open to open a filehandle to it. Assuming that you want read-only access to the file, you would do it this way:
my $file = shift @ARGV;
open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
parse($fh);
Note that the or die... checks whether the call to open succeeded and dies with an error message if it did not. The built-in variable $! will contain the (OS dependent) error message on failure that tells you why the call wasn't successful, e.g. "Permission denied."
parse(*ARGV) is the simplest solution: the explanation is a bit long, but an important part of learning how to use Perl effectively is to learn Perl.
When you use a null filehandle (<>), it actually reads from the magical ARGV filehandle, which has special semantics: it reads from all the files named in @ARGV, or STDIN if @ARGV is empty.
From perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk. Input from <> comes either from standard
input, or from each file listed on the command line. Here’s how it
works: the first time <> is evaluated, the @ARGV array is checked, and
if it is empty, $ARGV[0] is set to "-", which when opened gives you
standard input. The @ARGV array is then processed as a list of
filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn’t so cumbersome to say, and will actually work. It
really does shift the @ARGV array and put the current filename into the
$ARGV variable. It also uses filehandle ARGV internally--<> is just a
synonym for <ARGV>, which is magical. (The pseudo code above doesn’t
work because it treats <ARGV> as non-magical.)
You don't have to use <> in a while loop -- my $data = <> will read one line from the first non-empty file, my @data = <>; will slurp it all up at once, and you can pass *ARGV around as if it were a normal filehandle.
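As a sketch of passing *ARGV to a sub that expects a filehandle: count_lines below is a made-up stand-in for the question's parse(), and the @ARGV guard is an added assumption so the script does not sit waiting on STDIN when no file is named.

```perl
use strict;
use warnings;

# Stand-in for the question's parse(): counts the lines it can read.
sub count_lines {
    my ($handle) = @_;
    my $count = 0;
    while (my $line = <$handle>) {
        $count++;
    }
    return $count;
}

# Assumed usage: perl getfile.pl /path/to/some/file.csv
# The guard keeps the script from waiting on STDIN when no file is named.
if (@ARGV) {
    print count_lines(*ARGV), " lines\n";
}
```

If several files are named on the command line, the sub reads through all of them in turn, since that is how the ARGV handle behaves.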
This is what the -n switch is for!
Take your parse method, and do this:
#!/usr/bin/perl -n
#do stuff
Each line is stored in $_. So you run
./getfile.pl /path/to.csv
And it does this.
See here and here for some more info about these. I like -p too, and have found the combo of -a and -F to be really useful.
Also, if you want to do some extra processing, add BEGIN and END blocks.
#!/usr/bin/perl -n
BEGIN {
our $accumulator = 0; # package variable: a "my" here would be visible only inside BEGIN
}
# do stuff
END {
print process_total($accumulator);
}
or whatever. This is very, very useful.
Am I missing something or are you just looking for the open() call?
open($fh, "<$ARGV[0]") or die "couldn't open $ARGV[0]: $!";
do_something_with_fh($fh);
close($fh);