perl matching full string - perl

I am very new to Perl. Trying to grep the full line for matched pattern of the string. Seems like it is not able to search for full string. Any suggestion?
use strict;
use warnings;
my $prev;
#my $pattern = ":E: (Sub level extra file/dir checks):";
open(INPUTFILE, "<log_be_sum2.txt") or die "$!";
open(OUTPUTFILE, ">>extract.txt") or die "$!";
while (<INPUTFILE>){
if ($_ =~ /^:E: (Sub level extra file/dir checks):/){
print OUTPUTFILE $prev, $_;
}
$prev = $_;
}

If you want to match a literal / in a regular expression, it either needs to be escaped with a backslash, or you need to use a different character as the regexp quote character (! in the below example). The parenthesis also have to be escaped so they're not treated as a capturing group:
use strict;
use warnings;
my $prev;
#my $pattern = ":E: (Sub level extra file/dir checks):";
open(INPUTFILE, "<", "log_be_sum2.txt") or die "$!";
open(OUTPUTFILE, ">>", "extract.txt") or die "$!";
while (<INPUTFILE>){
if ($_ =~ m!^:E: \(Sub level extra file/dir checks\):!){
print OUTPUTFILE $prev, $_;
}
$prev = $_;
}
Note the change to the three-argument version of open, which is highly recommended. Might consider lexical file handles too. And good for you for using warnings and strict mode! Don't see that enough in new users of the language.

It's good that you are using strict and warnings but you should pay attention to the error/warning messages, and post them along with the question if you don't understand them. My version of Perl fails with an error about an unmatched (. The reason this particular error is thrown is because Perl thinks your regexp is complete when it sees the "/" in "file/dir".
When you have special characters, a good practice is to use quotemeta. I noticed you have a commented line with a variable assignment to pattern. You could uncomment that and use it like this:
...
my $pattern = quotemeta ":E: (Sub level extra file/dir checks):";
...
if ($_ =~ /^$pattern/){
...
}
...
}
But there is also a shortcut documented in perlre: the \Q and \E escape sequences. You can use it like $_ =~ /^\Q$pattern\E/. You can still use it and avoid the variable assignment, but in your case you will need to use a different character for the quote-like operator, since your pattern contains a literal /. I tend to prefer m{}, but it's really up to you as long as it's not /.
use strict;
use warnings;
my $prev = q{}; # NOTE: see NOTE below
open INPUTFILE, "<", "log_be_sum2.txt" or die "$!";
open OUTPUTFILE, ">>", "extract.txt" or die "$!";
while (<INPUTFILE>){
if ($_ =~ m{^\Q:E: \(Sub level extra file/dir checks\):\E}){
print OUTPUTFILE $prev, $_;
}
$prev = $_;
}
*NOTE - I seeded $prev with an empty string, because otherwise if your match is on the first line, you will try to print an undefined value, which will result in a warning.

Related

replace a string containing forward slash "/" in perl

I am string to replace a string containing "/" using Perl, using below code
file.txt contains
/usr/open/xyz -getCh $svr
code
open(FILE, "</tmp/file.txt") || die "File not found";
my #lines = <FILE>;
close(FILE);
my $stringToReplace = "\/usr\/open\/xyz -getCh \$svr";
my $stringToReplaceWith = "echo \"y\" | \/usr\/open\/xyz -getCh \$svr";
my #newlines;
foreach(#lines) {
$_ =~ s/$stringToReplace/$stringToReplaceWith/g;
push(#newlines,$_);
}
open(FILE, ">/tmp/file.txt") || die "File not found";
print FILE #newlines;
close(FILE);
The above code is not working for me.
Some notes on your code
Always use strict and use warnings 'all' at the top of every Perl program you write
Use lexical file handles and the three-parameter form of open
An open call may fail for many reasons other than that the file cannot be found. The error message is in $! and you should include it in your die string
Using single quotes removes the need for most backslashes in string literals. Forward slashes don't need to be escaped inside eithern single or double quotes
You should use constant to define constant values, especially if you use more than once
Use the fact that many of Perl's operators default to acting on $_
There is no need for the array #newlines. You are modifying #lines anyway so #newlines is just a copy
Use \Q...\E in regex patterns or double-quoted strings to escape every non-alphanumeric character
The last point will solve your problem. A dollar sign $ inside a regex pattern means the end 0f a line, and needs to be escaped if you want it taken literally
This variation of your program works correctly
use strict;
use warnings 'all';
use constant FILE => '/tmp/file.txt';
my #input = do {
open my $fh, '<', FILE or die "Unable to open input file: $!";
<$fh>;
};
my $old = '/usr/open/xyz -getCh $svr';
my $new = 'echo "y" | ' . $old;
open my $fh, '>', FILE or die "Unable to open output file: $!";
for ( #input ) {
s/\Q$old/$new/g;
print $fh $_;
}
print "Changes complete\n";

Pattern match (e.g /4947000219 in file01) & replace that digit with plus one in each file

Searching a pattern (e.g /4947000219 in file01) & replacing that digit with one increment in each file.(e.g /4947000219 with /4947000220 in file02, /4947000221 in file 03..) same try with 20140924105028
#!/usr/bin/perl -w
use strict;
use warnings;
use 5.010;
use autodie;
use Time::Piece;
use Time::Seconds qw/ ONE_MINUTE /;
use constant DATE_FORMAT => '%Y%m%d%H%M%S';
my $n;
my $directory = "/home/e/Doc/AutoMation";
opendir(DIR, $directory) or die "couldn't open $directory: $!\n";
my #files = readdir DIR;
foreach (#files) {
open my $in_fh, '<', $_;
my #lines = $_;
close $in_fh;
++$n;
$lines[0] =~ s/\/4947000219/\K(4947000219+)/$1 + $n / e;
$lines[1] =~ s{:20140924105028\K(\d+)}{
my $tp = Time::Piece->strptime($1, DATE_FORMAT);
($tp + ONE_MINUTE * 2 * $n)->strftime(DATE_FORMAT);
}e;
my $backup = "$_.backup";
unlink $backup if -f $backup;
rename $_, $backup;
open my $out_fh, '>', $_;
print $out_fh #lines;
close $out_fh;
}
closedir DIR;
Getting Error message:
Unrecognized escape \K passed through at /home/e/Doc/AutoMation line 27.
Scalar found where operator expected at /home/e/Doc/AutoMation.pl line 27, near "s/\/4947000219/\K(4947000219+)/$1"
syntax error at /home/e/Doc/AutoMation.pl line 27, near "s/\/4947000219/\K(4947000219+)/$1"
You have a syntax error in this line:
$lines[0] =~ s/\/4947000219/\K(4947000219+)/$1+$n/e;
# ││ │└─────── Syntax error
# ││ └──────── End of substitution string
# │└─────────────────────── \K is regex only, warning
# └──────────────────────── Not escaped, end of regex
You haven't escaped all the / in your regex. This is why you get a syntax error. I'd advise you to try to find a delimiter that doesn't occur in your regex, like this:
$lines[0] =~ s~/4947000219/\K(4947000219+)~$1+$n~e
Also, your regex might not do what you want. The + after the 9 is oddly suspicious. I'd move it outside of the parens if there is a possibility of more than one occurrence of your number (now it is only quantifying the 9).
Edit: Additionally reading from a filehandle is done with <$fh> and only in the special case of while(<$fh>) the contents get assigned to $_. Thus your #lines instantiation should look like this:
my #lines = <$in_fh>;

foreach and special variable $_ not behaving as expected

I'm learning Perl and wrote a small script to open perl files and remove the comments
# Will remove this comment
my $name = ""; # Will not remove this comment
#!/usr/bin/perl -w <- wont remove this special comment
The name of files to be edited are passed as arguments via terminal
die "You need to a give atleast one file-name as an arguement\n" unless (#ARGV);
foreach (#ARGV) {
$^I = "";
(-w && open FILE, $_) || die "Oops: $!";
/^\s*#[^!]/ || print while(<>);
close FILE;
print "Done! Please see file: $_\n";
}
Now when I ran it via Terminal:
perl removeComments file1.pl file2.pl file3.pl
I got the output:
Done! Please see file:
This script is working EXACTLY as I'm expecting but
Issue 1 : Why $_ didn't print the name of the file?
Issue 2 : Since the loop runs for 3 times, why Done! Please see file: was printed only once?
How you would write this script in as few lines as possible?
Please comment on my code as well, if you have time.
Thank you.
The while stores the lines read by the diamond operator <> into $_, so you're writing over the variable that stores the file name.
On the other hand, you open the file with open but don't actually use the handle to read; it uses the empty diamond operator instead. The empty diamond operator makes an implicit loop over files in #ARGV, removing file names as it goes, so the foreach runs only once.
To fix the second issue you could use while(<FILE>), or rewrite the loop to take advantage of the implicit loop in <> and write the entire program as:
$^I = "";
/^\s*#[^!]/ || print while(<>);
Here's a more readable approach.
#!/usr/bin/perl
# always!!
use warnings;
use strict;
use autodie;
use File::Copy;
# die with some usage message
die "usage: $0 [ files ]\n" if #ARGV < 1;
for my $filename (#ARGV) {
# create tmp file name that we are going to write to
my $new_filename = "$filename\.new";
# open $filename for reading and $new_filename for writing
open my $fh, "<", $filename;
open my $new_fh, ">", $new_filename;
# Iterate over each line in the original file: $filename,
# if our regex matches, we bail out. Otherwise we print the line to
# our temporary file.
while(my $line = <$fh>) {
next if $line =~ /^\s*#[^!]/;
print $new_fh $line;
}
close $fh;
close $new_fh;
# use File::Copy's move function to rename our files.
move($filename, "$filename\.bak");
move($new_filename, $filename);
print "Done! Please see file: $filename\n";
}
Sample output:
$ ./test.pl a.pl b.pl
Done! Please see file: a.pl
Done! Please see file: b.pl
$ cat a.pl
#!/usr/bin/perl
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
$ cat a.pl.bak
#!/usr/bin/perl
# this doesn't do much
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
Its not safe to use multiple loops and try to get the right $_. The while Loop is killing your $_. Try to give your files specific names inside that loop. You can do this with so:
foreach my $filename(#ARGV) {
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close FILE;
print "Done! Please see file: $filename\n";
}
or that way:
foreach (#ARGV) {
my $filename = $_;
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close FILE;
print "Done! Please see file: $filename\n";
}
Please never use barewords for filehandles and do use a 3-argument open.
open my $FILE, '<', $filename — good
open FILE $filename — bad
Simpler solution: Don't use $_.
When Perl was first written, it was conceived as a replacement for Awk and shell, and Perl heavily borrowed from that syntax. Perl also for readability created the special variable $_ which allowed you to use various commands without having to create variables:
while ( <INPUT> ) {
next if /foo/;
print OUTPUT;
}
The problem is that if everything is using $_, then everything will effact $_ in many unpleasant side effects.
Now, Perl is a much more sophisticated language, and has things like locally scoped variables (hint: You don't use local to create these variables -- that merely gives _package variables (aka global variables) a local value.)
Since you're learning Perl, you might as well learn Perl correctly. The problem is that there are too many books that are still based on Perl 3.x. Find a book or web page that incorporates modern practice.
In your program, $_ switches from the file name to the line in the file and back to the next file. It's what's confusing you. If you used named variables, you could distinguished between files and lines.
I've rewritten your program using more modern syntax, but your same logic:
use strict;
use warnings;
use autodie;
use feature qw(say);
if ( not $ARGV[0] ) {
die "You need to give at least one file name as an argument\n";
}
for my $file ( #ARGV ) {
# Remove suffix and copy file over
if ( $file =~ /\..+?$/ ) {
die qq(File "$file" doesn't have a suffix);
}
my ( $output_file = $file ) =~ s/\..+?$/./; #Remove suffix for output
open my $input_fh, "<", $file;
open my $output_fh, ">", $output_file;
while ( my $line = <$input_fh> ) {
print {$output_fh} $line unless /^\s*#[^!]/;
}
close $input_fh;
close $output_fh;
}
This is a bit more typing than your version of the program, but it's easier to see what's going on and maintain.

Perl Remove Stop Words from multiple files

I have read so many forms on how to remove stop words from files, my code remove many other things but I want to include also stop words. This is how far I reached, but I don't know what I am missing. Please Advice
use Lingua::StopWords qw(getStopWords);
my $stopwords = getStopWords('en');
chdir("c:/perl/input");
#files = <*>;
foreach $file (#files)
{
open (input, $file);
while (<input>)
{
open (output,">>c:/perl/normalized/".$file);
chomp;
#####What should I write here to remove the stop words#####
$_ =~s/<[^>]*>//g;
$_ =~ s/\s\.//g;
$_ =~ s/[[:punct:]]\.//g;
if($_ =~ m/(\w{4,})\./)
{
$_ =~ s/\.//g;
}
$_ =~ s/^\.//g;
$_ =~ s/,/' '/g;
$_ =~ s/\(||\)||\\||\/||-||\'//g;
print output "$_\n";
}
}
close (input);
close (output);
The stop words are the keys of %$stopwords which have the value 1, i.e.:
#stopwords = grep { $stopwords->{$_} } (keys %$stopwords);
It might happen be true that the stop words are just the keys of %$stopwords, but according the the Lingua::StopWords docs you also need to check the value associated with the key.
Once you have the stop words, you can remove them with code like this:
# remove all occurrences of #stopwords from $_
for my $w (#stopwords) {
s/\b\Q$w\E\b//ig;
}
Note the use of \Q...\E to quote any regular expression meta-characters that might appear in the stop word. Even though it is very unlikely that stop words will contains meta-characters, this is a good practice to follow any time you want to represent a literal string in a regular expression.
We also use \b to match a word boundary. This helps ensure that we won't a stop word that occurs in the middle of another word. Hopefully this will work for you - it depends a lot on what your input text is like - i.e. do you have punctuation characters, etc.
# Always use these in your Perl programs.
use strict;
use warnings;
use File::Basename qw(basename);
use Lingua::StopWords qw(getStopWords);
# It's often better to build scripts that take their input
# and output locations as command-line arguments rather than
# being hard-coded in the program.
my $input_dir = shift #ARGV;
my $output_dir = shift #ARGV;
my #input_files = glob "$input_dir/*";
# Convert the hash ref of stop words to a regular array.
# Also quote any regex characters in the stop words.
my #stop_words = map quotemeta, keys %{getStopWords('en')};
for my $infile (#input_files){
# Open both input and output files at the outset.
# Your posted code reopened the output file for each line of input.
my $fname = basename $infile;
my $outfile = "$output_dir/$fname";
open(my $fh_in, '<', $infile) or die "$!: $infile";
open(my $fh_out, '>', $outfile) or die "$!: $outfile";
# Process the data: you need to iterate over all stop words
# for each line of input.
while (my $line = <$fh_in>){
$line =~ s/\b$_\b//ig for #stop_words;
print $fh_out $line;
}
# Close the files within the processing loop, not outside of it.
close $fh_in;
close $fh_out;
}

Perl script to make first 8 characters all caps but not the whole file name

What Perl script should I be using to only change the first 8 characters in a file name to all caps instead of the script changing the entire file name to all caps?
Here is how I am setting it up:
#!/usr/bin/perl
chdir "directory path";
##files = `ls *mw`;
#files = `ls | grep mw`;
chomp #files;
foreach $oldname (#files) {
$newname = $oldname;
$newname =~ s/mw//;
print "$oldname -> $newname\n";
rename("$oldname","$newname");
}
You can use this regex:
my $str = 'Hello World!';
$str =~ s/^(.{8})/uc($1)/se; # $str now contains 'HELLO WOrld!'
The substitution
s/^(.{1,8})/\U$1/
will set the first eight characters of a string to upper case. The complete program looks like this
use strict;
use warnings;
chdir "directory path" or die "Unable to change current directory: $!";
opendir my $dh, '.' or die $!;
my #files = grep -f && /mw/, readdir $dh;
foreach my $file (#files) {
(my $new = $file) =~ s/mw//;
$new =~ s/^(.{1,8})/\U$1/s;
print "$file -> $new\n";
rename $file, $new;
}
How about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
chdir'/path/to/directory';
# Find all files that contain 'mw'
my #files = glob("*mw*");
foreach my $file(#files) {
# skip directories
next if -d $file;
# remve 'mw' from the filename
(my $FILE = $file) =~ s/mw//;
# Change filename to uppercase even if the length is <= 8 char
$FILE =~ s/^(.{1,8})/uc $1/se;
move($file, $FILE);
}
As said in the doc for rename, you'd better use File::Copy to be platform independent.
Always check return values of system calls!
When you make any call to OS services, you should always check the return value. For example, the Perl documentation for chdir is (with added emphasis)
chdir EXPR
chdir FILEHANDLE
chdir DIRHANDLE
chdir
Changes the working directory to EXPR, if possible. If EXPR is omitted, changes to the directory specified by $ENV{HOME}, if set; if not, changes to the directory specified by $ENV{LOGDIR}. (Under VMS, the variable $ENV{SYS$LOGIN} is also checked, and used if it is set.) If neither is set, chdir does nothing. It returns true on success, false otherwise. See the example under die.
On systems that support fchdir(2), you may pass a filehandle or directory handle as the argument. On systems that don't support fchdir(2), passing handles raises an exception.
As written in your question, your code discards important information: whether system calls chdir and rename succeeded or failed.
Providing useful error messages
An example of a common idiom for checking return values in Perl is
chdir $path or die "$0: chdir $path: $!";
The error message contains three important bits of information:
the program emitting the error, $0
what it was trying to do, chdir in this case
why it failed, $!
Also note that die also the name of the file and line number where program control was if your error message does not end with newline. When the chdir fails, the standard error will resemble
./myprogram: chdir: No such file or directory at ./myprogram line 3.
Logical or is true when at least one of its arguments is true. The “do something or die” idiom works because if chdir above fails, it returns a false value and requires or to evaluate the right-hand side and terminates execution with die. In the happy case where chdir succeeds and returns a true value, there is no need to evaluate the right-hand side because we already have one true argument to logical or.
Suggested improvements to your code
For what you’re doing, I recommend using readdir to avoid problems in case one of the filenames contains whitespace. Note the defined test in the code below that’s there to stop a file named 0 (i.e., a single zero character) terminating your loop.
#! /usr/bin/env perl
chdir "directory path" or die "$0: chdir: $!";
opendir $dh, "." or die "$0: opendir: $!";
while (defined($oldname = readdir $dh)) {
next unless ($newname = $oldname) =~ s/mw//;
$newname =~ s/^(.{1,8})/\U$1/;
rename $oldname, $newname or die "$0: rename $oldname, $newname: $!";
}
For the rename to have any hope, you have to preserve the value of $oldname, so right away, the code above copies it to $newname and starts changing the copy rather than the original. You will see
($new = $old) =~ s/.../.../; # or /.../
in Perl code, so it is also an important idiom to understand.
The perlop documentation defines handy escape sequences for use in strings and regex substitutions:
\l lowercase next character only
\u titlecase (not uppercase!) next character only
\L lowercase all characters till \E seen
\U uppercase all characters till \E seen
\Q quote non-word characters till \E
\E end either case modification or quoted section (whichever was last seen)
The code above grabs the first eight characters (or fewer if $newname is shorter in length) and replaces them with their upcased counterparts.
Example output
See the code in action:
$ ls directory\ path/
defmwghijk mwabc nochange qrstuvwxyzmw
$ ./prog
$ ls directory\ path/
ABC DEFGHIJK QRSTUVWXyz nochange
I figure there's more to your requirements than you're telling us, such as not uppercasing parts of the file extension. Instead of matching the first eight characters, I'll match the first eight letters:
use v5.14;
use utf8;
chdir "/Users/brian/test/";
my #files = glob( 'mw*' );
foreach my $old (#files) {
my $new = $old =~ s/\Amw(\pL{1,8})/\U$1/ir;
print "$old → $new\n";
}
Some other notes:
You can do the glob directly in Perl. You don't need ls.
It looks like you were stripping off mv, so I did that. If that's not what you want, it's easy to change.
In lieu of a regular expression to up-case the first eight characters you could use the 4-argument form of substr. This offers in situ replacement.
my $old = q(abcdefghij);
my $new = $old;
substr( $new, 0, 8, substr( uc($old), 0, 8 ) );
print "$old\n$new\n";
abcdefghij
ABCDEFGHij
Use rename or File::Copy::move (as M42 showed) to perform the actual rename.