I'm starting to work with Perl and I need to edit some text. Sometimes I need Perl to read the input line by line, and sometimes I need it to read the input as a whole. I know that this can be set with something like:
$/ = undef;
or with something like:
{
local $/;
$myfile= <$MYFILE>;
}
But I'm not sure what to do if, in the same script, I want to switch from reading "as a whole" to "line by line" or vice versa. That is, imagine a script which starts as:
use warnings;
use strict;
my $filename = shift;
open F, $filename or die "Usage: $0 FILENAME\n";
while(<F>) {
}
Then I make some replacements (s///;). After that I need to go on editing, but reading the input as a whole. So I write:
{
local $/;
$filename = <F>;
}
But then I need to go back to reading line by line...
Can somebody explain the logic behind this, so that I can learn how to change from one mode to the other while always keeping the last edited version of the input? Thanks.
OK, sorry. I will try to focus on Y instead of X. For instance, I need to edit a text and make a replacement only in a portion of the text which is delimited by two words. So imagine I want to replace every "dog" with "cat", but only the ones that appear between two occurrences of the word "hello". My input:
hello
dog
dog
dog
hello
dog
dog
My desired output:
hello
cat
cat
cat
hello
dog
My script:
use warnings;
use strict;
my $file = shift;
open my $FILE, $file or die "Usage: $0 FILENAME\n";
{
local $/;
$file = <$FILE>;
do {$file =~ s{dog}{cat}g} until
($file =~ m/hello/);
}
print $file;
But all the "dogs" get replaced.
I tried another strategy:
use warnings;
use strict;
my $file = shift;
open my $FILE, $file or die "Usage: $0 FILENAME\n";
{
local $/;
$file = <$FILE>;
while ($file =~ m{hello(.*?)hello}sg) {
my $text = $1;
$text =~ s{dog}{cat}g;
}
}
print $file;
But in this case I get no replacement...
You don't have to resort to slurping and multi-line regexes to solve this. Simply use a variable to track the state of the current line, i.e. whether you're inside or outside your delimiters:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $in_hello;
while (<DATA>) {
chomp;
$in_hello = ! $in_hello if $_ eq 'hello';
s/dog/cat/ if $in_hello;
say;
}
__DATA__
hello
dog
dog
dog
hello
dog
dog
Output:
hello
cat
cat
cat
hello
dog
dog
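If you would rather stay with the slurped string from the question, note that the second attempt fails because $text is only a copy of the captured text, so changing it never touches $file. Here is a minimal sketch of one way around that, using s///e so the edited text is written back into the string. The sub name replace_between and its parameters are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: replace $from with $to only between pairs of $delim.
sub replace_between {
    my ($text, $delim, $from, $to) = @_;
    # /s lets . match newlines, /g handles every delimited block,
    # /e evaluates the replacement part as Perl code.
    $text =~ s{(\Q$delim\E)(.*?)(\Q$delim\E)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $body =~ s/\Q$from\E/$to/g;
        $open . $body . $close;
    }sge;
    return $text;
}

my $input = "hello\ndog\ndog\ndog\nhello\ndog\ndog\n";
print replace_between($input, 'hello', 'dog', 'cat');
```

The captures are copied into lexical variables first because the inner substitution resets $1, $2 and $3.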
Line-by-line mode (the default) is back in effect outside the code block:
{
local $/;
$filename = <F>;
}
$/ is a global variable; local gives it a temporary, dynamically scoped value that lasts only until the enclosing block exits.
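To see the switch back and forth in one script, here is a small sketch. It uses an in-memory filehandle instead of a real file (that part is my assumption, just to keep it self-contained). After slurping, the original handle is at end of file, so one way to continue line by line on the edited text is to read from the string itself:

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $first_line = <$fh>;         # $/ is "\n", so this reads "one\n"

my $rest;
{
    local $/;                   # undef inside this block only: slurp mode
    $rest = <$fh>;              # reads everything that is left
}
close $fh;

# Outside the block $/ is "\n" again. Edit the slurped text, then read
# the edited version line by line through a second in-memory handle.
$rest =~ s/two/2/;
open my $again, '<', \$rest or die "Cannot reopen edited text: $!";
my @lines = <$again>;
close $again;

print "first: $first_line";
print "line:  $_" for @lines;
```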
I am new to Perl.
My input file contains:
james1
84012345
aaron5
2332111 42332
2345112 18238
wayne[2]
3505554
Question: I am not sure of the correct way to read the input and store each name as a key with its numbers as values. For example, "james" is the key and "84012345" is the value.
This is my code:
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $input= $ARGV[0];
my %hash;
open my $data , '<', $input or die " cannot open file : $_\n";
my @names = split ' ', $data;
my @values = split ' ', $data;
@hash{@names} = @values;
print Dumper \%hash;
I'mma go over your code real quick:
#!/usr/bin/perl -w
-w is not recommended. You should use warnings; instead (which you're already doing, so just remove -w).
use strict;
use warnings;
Very good.
use Data::Dumper;
my $input= $ARGV[0];
OK.
my %hash;
Don't declare variables before you need them. Declare them in the smallest scope possible, usually right before their first use.
open my $data , '<', $input or die " cannot open file : $_\n";
You have a spurious space at the beginning of your error message and $_ is unset at this point. You should include $input (the name of the file that failed to open) and $! (the error reason) instead.
my @names = split ' ', $data;
my @values = split ' ', $data;
Well, this doesn't make sense. $data is a filehandle, not a string. Even if it were a string, this code would assign the same list to both @names and @values.
@hash{@names} = @values;
print Dumper \%hash;
My version (untested):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
@ARGV == 1
or die "Usage: $0 FILE\n";
my $file = $ARGV[0];
my %hash;
{
open my $fh, '<', $file or die "$0: can't open $file: $!\n";
local $/ = '';
while (my $paragraph = readline $fh) {
my @words = split ' ', $paragraph;
my $key = shift @words;
$hash{$key} = \@words;
}
}
print Dumper \%hash;
The idea is to set $/ (the input record separator) to "" for the duration of the input loop, which makes readline return whole paragraphs, not lines.
The first (whitespace separated) word of each paragraph is taken to be the key; the remaining words are the values.
You have opened a file with open() and attached the file handle to $data. The regular way of reading data from a file is to loop over each line, like so:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $input = $ARGV[0];
my %hash;
open my $data , '<', $input or die "cannot open file $input: $!\n";
while (my $line = <$data>) {
chomp $line; # Removes the trailing newline (\n)
if ($line) { # Skips empty lines
my ($key, $value) = split ' ', $line;
$hash{$key} = $value;
}
}
print Dumper \%hash;
OK, +1 for using strict and warnings.
First, take a look at the $/ variable, which controls how a file is broken into records when it's read in.
$data is a file handle; you need to read the data from the file through it. If the file is not too big you can load it all into an array; if it's a large file you can loop over it one record at a time. See the <> operator in perlop.
Looking at your code, it appears that you want to end up with the following data structure from your input file:
%hash = (
james1 => [
84012345,
],
aaron5 => [
2332111,
42332,
2345112,
18238,
],
'wayne[2]' => [
3505554,
],
)
See perldsc on how to do that.
All the documentation can be read using the perldoc command which comes with Perl. Running perldoc on its own will give you some tips on how to use it and running perldoc perldoc will give you possibly far more info than you need at the moment.
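For what it's worth, here is a rough sketch of building that hash of arrays line by line. It rests on an assumption the question doesn't confirm: a line starting with a letter is a name, and every other non-blank line holds values for the most recent name. The sub name parse_records is made up, and the sample data is read from an in-memory handle rather than a real file:

```perl
use strict;
use warnings;

# Hypothetical helper: build { name => [values...] } from a filehandle.
# Assumption: a line that starts with a letter is a name; any other
# non-blank line holds whitespace-separated values for the latest name.
sub parse_records {
    my ($fh) = @_;
    my (%hash, $current);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;          # skip blank lines
        if ($line =~ /^[A-Za-z]/) {
            $current = $line;
            $hash{$current} = [];
        }
        elsif (defined $current) {
            push @{ $hash{$current} }, split ' ', $line;
        }
    }
    return \%hash;
}

my $sample = "james1\n84012345\naaron5\n2332111 42332\n2345112 18238\nwayne[2]\n3505554\n";
open my $fh, '<', \$sample or die "Cannot open sample data: $!";
my $records = parse_records($fh);
close $fh;

print "$_ => @{ $records->{$_} }\n" for sort keys %$records;
```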
In a terminal, using Perl, how can I search all PHP files recursively, starting from the current working directory, for a single-line or multi-line pattern like:
<script>var a=''; * hamoorabi.com * </script>
Read it like this: find everything between <script>var a=''; and </script>, but only if it contains hamoorabi.com, and replace it with an empty string (remove it).
As it's JavaScript code, there can be a bunch of unescaped characters inside the search string.
From a Unix or Cygwin prompt:
$ find . -name '*.php' -print0 | xargs -0 ./xx1.pl
where the Perl script xx1.pl is:
#!/usr/bin/perl
use strict;
use warnings;
undef $/; # slurp mode: read each file in one go
for my $path (@ARGV) {
open(my $in, '<', $path) or die "Can't read $path: $!";
my $content = <$in>;
close($in);
my $beginning = '<script>var a=\'\'';
my $end = '</script>';
my $containing = 'hamoorabi.com';
#$content =~ s/(<script>.*?)(hamoorabi.com)(.*?<\/script>)/$1$3/sg;
#much better regex provided by ysth
$content =~ s/\Q$beginning\E(?:(?!\Q$end\E).)*\Q$containing\E.*?\Q$end\E//gs;
open(my $out, '>', $path) or die "Can't write $path: $!";
print $out $content;
close($out);
}
Use File::Find to walk a directory tree looking for files.
use strict;
use warnings;
use File::Find;
sub wanted {
# Ignore anything that isn't a file
return unless -f;
# Ignore anything without a .php extension
return unless /\.php$/;
# Your filename is in $_. Your current directory is the
# one which contains the current file.
# Do what you need to do.
open my $fh, '<', $_ or die $!;
...;
}
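The wanted sub above still has to be handed to find() together with a starting directory. A minimal sketch of the complete call follows. The helper name find_php_files and the throwaway directory tree are my own, only there so the example runs anywhere:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: return the paths of all .php files under $dir.
sub find_php_files {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            return unless -f;                # plain files only
            return unless /\.php$/;          # .php extension only
            push @found, $File::Find::name;  # path including the start dir
        },
        $dir,
    );
    my @sorted = sort @found;
    return @sorted;
}

# Small throwaway tree to try it on.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir failed: $!";
for my $name ('a.php', 'b.txt', 'sub/c.php') {
    open my $fh, '>', "$root/$name" or die "Cannot create $name: $!";
    close $fh;
}

print "$_\n" for find_php_files($root);
```

Inside the callback $_ is the bare file name and the current directory is the one containing it, while $File::Find::name is the path relative to where find() started.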
I hope this also helps.
use strict;
use warnings;
String replacement using a regex:
my $str = "<script>var a=''\; * hamoorabi.com * </script>";
$str=~s#<script[^>]*>((?:(?!<\/script>).)*)<\/script># my $script=$&;
$script=~s/^(.*)hamoorabi\.com(.*)$/$1$2/g;
($script);
#esg;
print $str;
Replace the content in the same file using Tie::File:
my @array;
use Tie::File;
tie @array, 'Tie::File', "Input.php" or die "Can't tie Input.php: $!";
my $len = join "\n", @array;
@array = split/\n/, $len;
untie @array;
I'm learning Perl and wrote a small script to open perl files and remove the comments
# Will remove this comment
my $name = ""; # Will not remove this comment
#!/usr/bin/perl -w <- wont remove this special comment
The names of the files to be edited are passed as arguments on the command line:
die "You need to give at least one file name as an argument\n" unless (@ARGV);
foreach (@ARGV) {
$^I = "";
(-w && open FILE, $_) || die "Oops: $!";
/^\s*#[^!]/ || print while(<>);
close FILE;
print "Done! Please see file: $_\n";
}
Now when I ran it via Terminal:
perl removeComments file1.pl file2.pl file3.pl
I got the output:
Done! Please see file:
This script is working EXACTLY as I expect, but:
Issue 1: Why didn't $_ print the name of the file?
Issue 2: Since the loop runs 3 times, why was Done! Please see file: printed only once?
How would you write this script in as few lines as possible?
Please comment on my code as well, if you have time.
Thank you.
The while stores the lines read by the diamond operator <> into $_, so you're writing over the variable that stores the file name.
On the other hand, you open the file with open but don't actually use the handle to read; the loop uses the empty diamond operator instead. The empty diamond operator makes an implicit loop over the files in @ARGV, removing file names as it goes, so the foreach runs only once.
To fix the second issue you could use while(<FILE>), or rewrite the loop to take advantage of the implicit loop in <> and write the entire program as:
$^I = "";
/^\s*#[^!]/ || print while(<>);
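If you want to watch <> eat @ARGV, here is a small sketch. The temporary files and the local @ARGV are my own scaffolding so it can run without command-line arguments. It also shows that the name of the file currently being read is available in $ARGV, which is what you would print instead of $_:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two small throwaway files to read.
my @paths;
for my $text ("a1\na2\n", "b1\n") {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!";
    push @paths, $path;
}

my %lines_per_file;
my $left_over;
{
    local @ARGV = @paths;           # pretend these came from the command line
    while (my $line = <>) {
        $lines_per_file{$ARGV}++;   # $ARGV holds the name of the current file
    }
    $left_over = @ARGV;             # <> shifted every name off @ARGV
}

print "names left in \@ARGV after the loop: $left_over\n";
print "$_: $lines_per_file{$_} line(s)\n" for @paths;
```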
Here's a more readable approach.
#!/usr/bin/perl
# always!!
use warnings;
use strict;
use autodie;
use File::Copy;
# die with some usage message
die "usage: $0 [ files ]\n" if @ARGV < 1;
for my $filename (@ARGV) {
# create tmp file name that we are going to write to
my $new_filename = "$filename\.new";
# open $filename for reading and $new_filename for writing
open my $fh, "<", $filename;
open my $new_fh, ">", $new_filename;
# Iterate over each line in the original file: $filename,
# if our regex matches, we skip the line. Otherwise we print the line to
# our temporary file.
while(my $line = <$fh>) {
next if $line =~ /^\s*#[^!]/;
print $new_fh $line;
}
close $fh;
close $new_fh;
# use File::Copy's move function to rename our files.
move($filename, "$filename\.bak");
move($new_filename, $filename);
print "Done! Please see file: $filename\n";
}
Sample output:
$ ./test.pl a.pl b.pl
Done! Please see file: a.pl
Done! Please see file: b.pl
$ cat a.pl
#!/usr/bin/perl
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
$ cat a.pl.bak
#!/usr/bin/perl
# this doesn't do much
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
It's not safe to use multiple loops and hope to get the right $_. The while loop is clobbering your $_. Try to give your file names a variable of their own inside that loop. You can do it like this:
foreach my $filename (@ARGV) {
$^I = ""; # note: in-place editing only applies to <>, so this prints to STDOUT
(-w $filename && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close $FILE;
print "Done! Please see file: $filename\n";
}
or this way:
foreach (@ARGV) {
my $filename = $_;
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close $FILE;
print "Done! Please see file: $filename\n";
}
Please never use barewords for filehandles, and do use the 3-argument open.
Good: open my $FILE, '<', $filename
Bad: open FILE, $filename
Simpler solution: Don't use $_.
When Perl was first written, it was conceived as a replacement for Awk and shell, and it borrowed heavily from their syntax. For readability, Perl also created the special variable $_, which allows you to use various commands without having to create variables:
while ( <INPUT> ) {
next if /foo/;
print OUTPUT;
}
The problem is that if everything uses $_, then everything will affect $_, with many unpleasant side effects.
Now, Perl is a much more sophisticated language and has things like locally scoped variables. (Hint: you don't use local to create these variables; that merely gives package variables (aka global variables) a local value.)
Since you're learning Perl, you might as well learn Perl correctly. The problem is that there are too many books that are still based on Perl 3.x. Find a book or web page that incorporates modern practice.
In your program, $_ switches from the file name to the line in the file and back to the next file. That's what's confusing you. If you used named variables, you could distinguish between files and lines.
I've rewritten your program using more modern syntax, but with your same logic:
use strict;
use warnings;
use autodie;
use feature qw(say);
if ( not $ARGV[0] ) {
die "You need to give at least one file name as an argument\n";
}
for my $file ( @ARGV ) {
# Remove suffix and copy file over
if ( $file !~ /\..+?$/ ) {
die qq(File "$file" doesn't have a suffix);
}
( my $output_file = $file ) =~ s/\..+?$/./; #Remove suffix for output
open my $input_fh, "<", $file;
open my $output_fh, ">", $output_file;
while ( my $line = <$input_fh> ) {
print {$output_fh} $line unless $line =~ /^\s*#[^!]/;
}
close $input_fh;
close $output_fh;
}
This is a bit more typing than your version of the program, but it's easier to see what's going on and maintain.
I have a directory with a list of image header files of the format
image1.hd
image2.hd
image3.hd
image4.hd
I want to search for the regular expression Image type:=4 in the directory and find the file number which has the first occurrence of this pattern. I can do this with a couple of pipes easily in bash:
grep -l 'Image type:=4' image*.hd | sed ' s/.*image\(.*\).hd/\1/' | head -n1
which returns 1 in this case.
This pattern match will be used in a perl script. I know I could use
my $number = `grep -l 'Image type:=4' image*.hd | sed ' s/.*image\(.*\).hd/\1/' | head -n1`
but is it preferable to use pure perl in such cases? Here is the best I could come up with using perl. It is very cumbersome:
my $tmp;
#want to find the planar study in current study
foreach (glob "$DIR/image*.hd"){
$tmp = $_;
open FILE, "<", "$_" or die $!;
while (<FILE>)
{
if (/Image type:=4/){
$tmp =~ s/.*image(\d+).hd/$1/;
}
}
close FILE;
last;
}
print "$tmp\n";
This also returns the desired output of 1. Is there a more efficient way of doing this?
This is simple with the help of a couple of utility modules
use strict;
use warnings;
use File::Slurp 'read_file';
use List::MoreUtils 'firstval';
print firstval { read_file($_) =~ /Image type:=4/ } glob "$DIR/image*.hd";
But if you are restricted to core Perl, then this will do what you want
use strict;
use warnings;
my $firstfile;
while (my $file = glob 'E:\Perl\source\*.pl') {
open my $fh, '<', $file or die $!;
local $/;
if ( <$fh> =~ /Image type:=4/) {
$firstfile = $file;
last;
}
}
print $firstfile // 'undef';
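Since the question asks for the image number rather than the file name, here is a core-Perl sketch that returns it directly. The sub name first_image_number is made up, and the temporary directory with three sample headers is only there to make the example runnable:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: number of the first image*.hd file in $dir whose
# contents match $pattern, or nothing if no file matches.
sub first_image_number {
    my ($dir, $pattern) = @_;
    # String order, so image10.hd would sort before image2.hd.
    for my $file (sort { $a cmp $b } glob("$dir/image*.hd")) {
        next unless $file =~ /image(\d+)\.hd$/;
        my $number = $1;
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            return $number if $line =~ $pattern;   # stop at the first hit
        }
    }
    return;
}

# Throwaway header files to try it on.
my $dir = tempdir(CLEANUP => 1);
my %content = (
    1 => "Image type:=2\n",
    2 => "name:=x\nImage type:=4\n",
    3 => "Image type:=4\n",
);
for my $n (sort keys %content) {
    open my $out, '>', "$dir/image$n.hd" or die "Cannot create image$n.hd: $!";
    print {$out} $content{$n};
    close $out or die "Cannot close image$n.hd: $!";
}

my $number = first_image_number($dir, qr/Image type:=4/);
print defined $number ? "$number\n" : "no match\n";
```

Reading line by line and returning early means the rest of a large file is never read once the pattern has been found.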
I want a Perl module that reads from the special file handle, <STDIN>, and passes this to a subroutine. You will understand what I mean when you see my code.
Here is how it was before:
#!/usr/bin/perl
use strict; use warnings;
use lib '/usr/local/custom_pm';
package Read_FH;
sub read_file {
my ($filein) = @_;
open FILEIN, $filein or die "could not open $filein for read\n";
# reads each line of the file text one by one
while(<FILEIN>){
# do something
}
close FILEIN;
}
Right now the subroutine takes a file name (stored in $filein) as an argument, opens the file with a file handle, and reads each line of the file one by one using the file handle.
Instead, I want to get the file name from <STDIN>, store it inside a variable, then pass this variable into a subroutine as an argument.
From the main program:
$file = <STDIN>;
$variable = read_file($file);
The subroutine for the module is below:
#!/usr/bin/perl
use strict; use warnings;
use lib '/usr/local/custom_pm';
package Read_FH;
# subroutine that parses the file
sub read_file {
my ($file)= @_;
# !!! Should I open $file here with a file handle? !!!!
# read each line of the file
while($file){
# do something
}
}
Does anyone know how I can do this? I appreciate any suggestions.
It is a good idea in general to use lexical filehandles, that is, a lexical variable containing the file handle instead of a bareword.
You can pass it around like any other variable. If you use read_file from File::Slurp you do not need a separate file handle; it slurps the content into a variable.
As it is also good practice to close opened file handles as soon as possible, this should be the preferred way if you really only need the complete file content.
With File::Slurp:
use strict;
use warnings;
use autodie;
use File::Slurp;
sub my_slurp {
my ($fname) = @_;
my $content = read_file($fname);
print $content; # or do something else with $content
return 1;
}
my $filename = <STDIN>;
chomp $filename; # remove the trailing newline before using it as a file name
my_slurp($filename);
exit 0;
Without extra modules:
use strict;
use warnings;
use autodie;
sub my_handle {
my ($handle) = @_;
my $content = '';
## slurp mode
{
local $/;
$content = <$handle>;
}
## or line wise
#while (my $line = <$handle>){
# $content .= $line;
#}
print $content; # or do something else with $content
return 1;
}
my $filename = <STDIN>;
chomp $filename; # remove the trailing newline before using it as a file name
open my $fh, '<', $filename;
my_handle($fh); # pass the handle around
close $fh;
exit 0;
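One detail that is easy to miss: a file name read from a handle still carries its trailing newline, and open will look for a file whose name ends in "\n" unless you chomp it first. A small sketch of the whole round trip follows. The sub name slurp_named_on is made up, and an in-memory handle plays the role of STDIN so the example can run unattended (in a real script you would pass \*STDIN):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read one file name from $in, open that file,
# and return its whole content.
sub slurp_named_on {
    my ($in) = @_;
    my $filename = <$in>;
    defined $filename or die "No file name given\n";
    chomp $filename;                    # strip the newline typed after the name
    open my $fh, '<', $filename or die "Cannot open '$filename': $!\n";
    local $/;                           # slurp mode until the sub returns
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Throwaway file, plus an in-memory handle standing in for STDIN.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "this is a\ntestfile\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

my $typed = "$tmp_path\n";
open my $fake_stdin, '<', \$typed or die "Cannot open in-memory handle: $!";
print slurp_named_on($fake_stdin);
```

Note that local $/ comes after the file name has been read; putting it first would slurp all of the input into $filename.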
I agree with @mugen kenichi, his solution is a better way to do it than building your own. It's often a good idea to use stuff the community has tested. Anyway, here are the changes you can make to your own program to make it do what you want.
#!/usr/bin/perl
use strict; use warnings;
package Read_FH;
sub read_file {
my $filein = <STDIN>;
chomp $filein; # Remove the newline at the end
open my $fh, '<', $filein or die "could not open $filein for read\n";
# reads each line of the file text one by one
my $content = '';
while (<$fh>) {
# do something
$content .= $_;
}
close $fh;
return $content;
}
# This part only for illustration
package main;
print Read_FH::read_file();
If I run it, it looks like this:
simbabque@geektour:~/scratch$ cat test
this is a
testfile
with blank lines.
simbabque@geektour:~/scratch$ perl test.pl
test
this is a
testfile
with blank lines.