Perl Substitute String in Text File [duplicate] - perl

I want to replace the word "blue" with "red" in all text files named as 1_classification.dat, 2_classification.dat and so on. I want to edit the same file so I tried the following code, but it does not work. Where am I going wrong?
#files = glob("*_classification.dat");
foreach my $file (#files)
{
open(IN,$file) or die $!;
<IN>;
while(<IN>)
{
$_ = '~s/blue/red/g';
print IN $file;
}
close(IN)
}

Use a one-liner:
$ perl -pi.bak -e 's/blue/red/g' *_classification.dat
Explanation
-p processes, then prints <> line by line
-i activates in-place editing. Files are backed up using the .bak extension
The regex substitution acts on the implicit variable, which are the contents of the file, line-by-line

None of the existing answers here have provided a complete example of how to do this from within a script (not a one-liner). Here is what I did:
rename($file, $file . '.bak');
open(IN, '<' . $file . '.bak') or die $!;
open(OUT, '>' . $file) or die $!;
while(<IN>)
{
$_ =~ s/blue/red/g;
print OUT $_;
}
close(IN);
close(OUT);

$_='~s/blue/red/g';
Uh, what??
Just
s/blue/red/g;
or, if you insist on using a variable (which is not necessary when using $_, but I just want to show the right syntax):
$_ =~ s/blue/red/g;

It can be done using a single line:
perl -pi.back -e 's/oldString/newString/g;' inputFileName
Pay attention that oldString is processed as a Regular Expression.
In case the string contains any of {}[]()^$.|*+? (The special characters for Regular Expression syntax) make sure to escape them unless you want it to be processed as a regular expression.
Escaping it is done by \, so \[.

Related

replace a string containing forward slash "/" in perl

I am string to replace a string containing "/" using Perl, using below code
file.txt contains
/usr/open/xyz -getCh $svr
code
open(FILE, "</tmp/file.txt") || die "File not found";
my #lines = <FILE>;
close(FILE);
my $stringToReplace = "\/usr\/open\/xyz -getCh \$svr";
my $stringToReplaceWith = "echo \"y\" | \/usr\/open\/xyz -getCh \$svr";
my #newlines;
foreach(#lines) {
$_ =~ s/$stringToReplace/$stringToReplaceWith/g;
push(#newlines,$_);
}
open(FILE, ">/tmp/file.txt") || die "File not found";
print FILE #newlines;
close(FILE);
The above code is not working for me.
Some notes on your code
Always use strict and use warnings 'all' at the top of every Perl program you write
Use lexical file handles and the three-parameter form of open
An open call may fail for many reasons other than that the file cannot be found. The error message is in $! and you should include it in your die string
Using single quotes removes the need for most backslashes in string literals. Forward slashes don't need to be escaped inside eithern single or double quotes
You should use constant to define constant values, especially if you use more than once
Use the fact that many of Perl's operators default to acting on $_
There is no need for the array #newlines. You are modifying #lines anyway so #newlines is just a copy
Use \Q...\E in regex patterns or double-quoted strings to escape every non-alphanumeric character
The last point will solve your problem. A dollar sign $ inside a regex pattern means the end 0f a line, and needs to be escaped if you want it taken literally
This variation of your program works correctly
use strict;
use warnings 'all';
use constant FILE => '/tmp/file.txt';
my #input = do {
open my $fh, '<', FILE or die "Unable to open input file: $!";
<$fh>;
};
my $old = '/usr/open/xyz -getCh $svr';
my $new = 'echo "y" | ' . $old;
open my $fh, '>', FILE or die "Unable to open output file: $!";
for ( #input ) {
s/\Q$old/$new/g;
print $fh $_;
}
print "Changes complete\n";

foreach and special variable $_ not behaving as expected

I'm learning Perl and wrote a small script to open perl files and remove the comments
# Will remove this comment
my $name = ""; # Will not remove this comment
#!/usr/bin/perl -w <- wont remove this special comment
The name of files to be edited are passed as arguments via terminal
die "You need to a give atleast one file-name as an arguement\n" unless (#ARGV);
foreach (#ARGV) {
$^I = "";
(-w && open FILE, $_) || die "Oops: $!";
/^\s*#[^!]/ || print while(<>);
close FILE;
print "Done! Please see file: $_\n";
}
Now when I ran it via Terminal:
perl removeComments file1.pl file2.pl file3.pl
I got the output:
Done! Please see file:
This script is working EXACTLY as I'm expecting but
Issue 1 : Why $_ didn't print the name of the file?
Issue 2 : Since the loop runs for 3 times, why Done! Please see file: was printed only once?
How you would write this script in as few lines as possible?
Please comment on my code as well, if you have time.
Thank you.
The while stores the lines read by the diamond operator <> into $_, so you're writing over the variable that stores the file name.
On the other hand, you open the file with open but don't actually use the handle to read; it uses the empty diamond operator instead. The empty diamond operator makes an implicit loop over files in #ARGV, removing file names as it goes, so the foreach runs only once.
To fix the second issue you could use while(<FILE>), or rewrite the loop to take advantage of the implicit loop in <> and write the entire program as:
$^I = "";
/^\s*#[^!]/ || print while(<>);
Here's a more readable approach.
#!/usr/bin/perl
# always!!
use warnings;
use strict;
use autodie;
use File::Copy;
# die with some usage message
die "usage: $0 [ files ]\n" if #ARGV < 1;
for my $filename (#ARGV) {
# create tmp file name that we are going to write to
my $new_filename = "$filename\.new";
# open $filename for reading and $new_filename for writing
open my $fh, "<", $filename;
open my $new_fh, ">", $new_filename;
# Iterate over each line in the original file: $filename,
# if our regex matches, we bail out. Otherwise we print the line to
# our temporary file.
while(my $line = <$fh>) {
next if $line =~ /^\s*#[^!]/;
print $new_fh $line;
}
close $fh;
close $new_fh;
# use File::Copy's move function to rename our files.
move($filename, "$filename\.bak");
move($new_filename, $filename);
print "Done! Please see file: $filename\n";
}
Sample output:
$ ./test.pl a.pl b.pl
Done! Please see file: a.pl
Done! Please see file: b.pl
$ cat a.pl
#!/usr/bin/perl
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
$ cat a.pl.bak
#!/usr/bin/perl
# this doesn't do much
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
Its not safe to use multiple loops and try to get the right $_. The while Loop is killing your $_. Try to give your files specific names inside that loop. You can do this with so:
foreach my $filename(#ARGV) {
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close FILE;
print "Done! Please see file: $filename\n";
}
or that way:
foreach (#ARGV) {
my $filename = $_;
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close FILE;
print "Done! Please see file: $filename\n";
}
Please never use barewords for filehandles and do use a 3-argument open.
open my $FILE, '<', $filename — good
open FILE $filename — bad
Simpler solution: Don't use $_.
When Perl was first written, it was conceived as a replacement for Awk and shell, and Perl heavily borrowed from that syntax. Perl also for readability created the special variable $_ which allowed you to use various commands without having to create variables:
while ( <INPUT> ) {
next if /foo/;
print OUTPUT;
}
The problem is that if everything is using $_, then everything will effact $_ in many unpleasant side effects.
Now, Perl is a much more sophisticated language, and has things like locally scoped variables (hint: You don't use local to create these variables -- that merely gives _package variables (aka global variables) a local value.)
Since you're learning Perl, you might as well learn Perl correctly. The problem is that there are too many books that are still based on Perl 3.x. Find a book or web page that incorporates modern practice.
In your program, $_ switches from the file name to the line in the file and back to the next file. It's what's confusing you. If you used named variables, you could distinguished between files and lines.
I've rewritten your program using more modern syntax, but your same logic:
use strict;
use warnings;
use autodie;
use feature qw(say);
if ( not $ARGV[0] ) {
die "You need to give at least one file name as an argument\n";
}
for my $file ( #ARGV ) {
# Remove suffix and copy file over
if ( $file =~ /\..+?$/ ) {
die qq(File "$file" doesn't have a suffix);
}
my ( $output_file = $file ) =~ s/\..+?$/./; #Remove suffix for output
open my $input_fh, "<", $file;
open my $output_fh, ">", $output_file;
while ( my $line = <$input_fh> ) {
print {$output_fh} $line unless /^\s*#[^!]/;
}
close $input_fh;
close $output_fh;
}
This is a bit more typing than your version of the program, but it's easier to see what's going on and maintain.

Perl script to make first 8 characters all caps but not the whole file name

What Perl script should I be using to only change the first 8 characters in a file name to all caps instead of the script changing the entire file name to all caps?
Here is how I am setting it up:
#!/usr/bin/perl
chdir "directory path";
##files = `ls *mw`;
#files = `ls | grep mw`;
chomp #files;
foreach $oldname (#files) {
$newname = $oldname;
$newname =~ s/mw//;
print "$oldname -> $newname\n";
rename("$oldname","$newname");
}
You can use this regex:
my $str = 'Hello World!';
$str =~ s/^(.{8})/uc($1)/se; # $str now contains 'HELLO WOrld!'
The substitution
s/^(.{1,8})/\U$1/
will set the first eight characters of a string to upper case. The complete program looks like this
use strict;
use warnings;
chdir "directory path" or die "Unable to change current directory: $!";
opendir my $dh, '.' or die $!;
my #files = grep -f && /mw/, readdir $dh;
foreach my $file (#files) {
(my $new = $file) =~ s/mw//;
$new =~ s/^(.{1,8})/\U$1/s;
print "$file -> $new\n";
rename $file, $new;
}
How about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
chdir'/path/to/directory';
# Find all files that contain 'mw'
my #files = glob("*mw*");
foreach my $file(#files) {
# skip directories
next if -d $file;
# remve 'mw' from the filename
(my $FILE = $file) =~ s/mw//;
# Change filename to uppercase even if the length is <= 8 char
$FILE =~ s/^(.{1,8})/uc $1/se;
move($file, $FILE);
}
As said in the doc for rename, you'd better use File::Copy to be platform independent.
Always check return values of system calls!
When you make any call to OS services, you should always check the return value. For example, the Perl documentation for chdir is (with added emphasis)
chdir EXPR
chdir FILEHANDLE
chdir DIRHANDLE
chdir
Changes the working directory to EXPR, if possible. If EXPR is omitted, changes to the directory specified by $ENV{HOME}, if set; if not, changes to the directory specified by $ENV{LOGDIR}. (Under VMS, the variable $ENV{SYS$LOGIN} is also checked, and used if it is set.) If neither is set, chdir does nothing. It returns true on success, false otherwise. See the example under die.
On systems that support fchdir(2), you may pass a filehandle or directory handle as the argument. On systems that don't support fchdir(2), passing handles raises an exception.
As written in your question, your code discards important information: whether system calls chdir and rename succeeded or failed.
Providing useful error messages
An example of a common idiom for checking return values in Perl is
chdir $path or die "$0: chdir $path: $!";
The error message contains three important bits of information:
the program emitting the error, $0
what it was trying to do, chdir in this case
why it failed, $!
Also note that die also the name of the file and line number where program control was if your error message does not end with newline. When the chdir fails, the standard error will resemble
./myprogram: chdir: No such file or directory at ./myprogram line 3.
Logical or is true when at least one of its arguments is true. The “do something or die” idiom works because if chdir above fails, it returns a false value and requires or to evaluate the right-hand side and terminates execution with die. In the happy case where chdir succeeds and returns a true value, there is no need to evaluate the right-hand side because we already have one true argument to logical or.
Suggested improvements to your code
For what you’re doing, I recommend using readdir to avoid problems in case one of the filenames contains whitespace. Note the defined test in the code below that’s there to stop a file named 0 (i.e., a single zero character) terminating your loop.
#! /usr/bin/env perl
chdir "directory path" or die "$0: chdir: $!";
opendir $dh, "." or die "$0: opendir: $!";
while (defined($oldname = readdir $dh)) {
next unless ($newname = $oldname) =~ s/mw//;
$newname =~ s/^(.{1,8})/\U$1/;
rename $oldname, $newname or die "$0: rename $oldname, $newname: $!";
}
For the rename to have any hope, you have to preserve the value of $oldname, so right away, the code above copies it to $newname and starts changing the copy rather than the original. You will see
($new = $old) =~ s/.../.../; # or /.../
in Perl code, so it is also an important idiom to understand.
The perlop documentation defines handy escape sequences for use in strings and regex substitutions:
\l lowercase next character only
\u titlecase (not uppercase!) next character only
\L lowercase all characters till \E seen
\U uppercase all characters till \E seen
\Q quote non-word characters till \E
\E end either case modification or quoted section (whichever was last seen)
The code above grabs the first eight characters (or fewer if $newname is shorter in length) and replaces them with their upcased counterparts.
Example output
See the code in action:
$ ls directory\ path/
defmwghijk mwabc nochange qrstuvwxyzmw
$ ./prog
$ ls directory\ path/
ABC DEFGHIJK QRSTUVWXyz nochange
I figure there's more to your requirements than you're telling us, such as not uppercasing parts of the file extension. Instead of matching the first eight characters, I'll match the first eight letters:
use v5.14;
use utf8;
chdir "/Users/brian/test/";
my #files = glob( 'mw*' );
foreach my $old (#files) {
my $new = $old =~ s/\Amw(\pL{1,8})/\U$1/ir;
print "$old → $new\n";
}
Some other notes:
You can do the glob directly in Perl. You don't need ls.
It looks like you were stripping off mv, so I did that. If that's not what you want, it's easy to change.
In lieu of a regular expression to up-case the first eight characters you could use the 4-argument form of substr. This offers in situ replacement.
my $old = q(abcdefghij);
my $new = $old;
substr( $new, 0, 8, substr( uc($old), 0, 8 ) );
print "$old\n$new\n";
abcdefghij
ABCDEFGHij
Use rename or File::Copy::move (as M42 showed) to perform the actual rename.

Using Perl to rename files in a directory

I'd like to take a directory and for all email (*.msg) files, remove the 'RE ' at the beginning. I have the following code but the rename fails.
opendir(DIR, 'emails') or die "Cannot open directory";
#files = readdir(DIR);
closedir(DIR);
for (#files){
next if $_ !~ m/^RE .+msg$/;
$old = $_;
s/RE //;
rename($old, $_) or print "Error renaming: $old\n";
}
If your ./emails directory contains these files:
1.msg
2.msg
3.msg
then your #files will look something like ('.', '..', '1.msg', '2.msg', '3.msg') but your rename wants names like 'emails/1.msg', 'emails/2.msg', etc. So you can chdir before renaming:
chdir('emails');
for (#files) {
#...
}
You'd probably want to check the chdir return value too.
Or add the directory names yourself:
rename('emails/' . $old, 'emails/' . $_) or print "Error renaming $old: $!\n";
# or rename("emails/$old", "emails/$_") if you like string interpolation
# or you could use map if you like map
You might want to combine your directory reading and filtering using grep:
my #files = grep { /^RE .+msg$/ } readdir(DIR);
or even this:
opendir(DIR, 'emails') or die "Cannot open directory";
for (grep { /^RE .+msg$/ } readdir(DIR)) {
(my $new = $_) =~ s/^RE //;
rename("emails/$_", "emails/$new") or print "Error renaming $_ to $new: $!\n";
}
closedir(DIR);
You seem to be assuming glob-like behavior rather than than readdir-like behavior.
The underlying readdir system call returns just the filenames within the directory, and will include two entries . and ... This carries through to the readdir function in Perl, just to give a bit more detail on mu's answer.
Alternately, there's not much point to using readdir if you're collecting all the results in an array anyways.
#files = glob('emails/*');
As already mentioned, your script fails because of the path you expect and the script uses are not the same.
I would suggest a more transparent usage. Hardcoding a directory is not a good idea, IMO. As I learned one day when I made a script to alter some original files, with the hardcoded path, and a colleague of mine thought this would be a nice script to borrow to alter his copies. Ooops!
Usage:
perl script.pl "^RE " *.msg
i.e. regex, then a file glob list, where the path is denoted in relation to the script, e.g. *.msg, emails/*.msg or even /home/pat/emails/*.msg /home/foo/*.msg. (multiple globs possible)
Using the absolute paths will leave the user with no doubt as to which files he'll be affecting, and it will also make the script reusable.
Code:
use strict;
use warnings;
use v5.10;
use File::Copy qw(move);
my $rx = shift; # e.g. "^RE "
if ($ENV{OS} =~ /^Windows/) { # Patch for Windows' lack of shell globbing
#ARGV = map glob, #ARGV;
}
for (#ARGV) {
if (/$rx/) {
my $new = s/$rx//r; # Using non-destructive substitution
say "Moving $_ to $new ...";
move($_, $new) or die $!;
}
}
I don't know if the regex fits the specifig name of the files, but in one line this could be done with:
perl -E'for (</path/to/emails*.*>){ ($new = $_) =~ s/(^RE)(.*$)/$2/; say $_." -> ".$new}
(say ... is nice for testing, just replace it with rename $_,$new or rename($_,$new) )
<*.*> read every file in the current directory
($new = $_) =~ saves the following substitution in $new and leaves $_ as intact
(^RE) save this match in $1 (optional) and just match files with "RE" at the beginning
(.*$) save everything until and including the end ($) of the line -> into $2
substitute the match with the string in$2

Perl-script to read and print lines from multiple txt files?

We have 300+ txt files, of which are basically replicates of an email, each txt file has the following format:
To: blabla#hotmail.com
Subject: blabla
From: bla1#hotmail.com
Message: Hello World!
The platform I am to the script on is Windows, and everything is local (including the Perl instance). The aim is to write a script, which crawls through each file (all located within the same directory), and print out a list of each 'unique' email address in the from field. The concept is very easy.
Can anyone point me in the right direction here? I know how to start off a Perl script, and I am able to read a single file and print all details:
#!/usr/local/bin/perl
open (MYFILE, 'emails/email_id_1.txt');
while (<MYFILE>) {
chomp;
print "$_\n";
}
close (MYFILE);
So now, I need to be able to read and print line 3 of this file, but perform this activity not just once, but for all of the files. I've looked into the File::Find module, could this be of any use?
What platform? If Linux then it's simple:
foreach $f (#ARGS) {
# Do stuff
}
and then call with:
perl mything.pl *.txt
In Windows you'll need to expand the wildcard first as cmd.exe doesn't expand wildcards (unlike Linux shells):
#ARGV = map glob, #ARGV
foreach $f (#ARGS) {
# Do stuff
}
then extracting the third line is just a simple case of reading each line in and counting when you've got to line 3 so you know to print the results.
The glob() builtin can give you a list of files in a directory:
chdir $dir or die $!;
my #files = glob('*');
You can use Tie::File to access the 3rd line of a file:
use Tie::File;
for (#files) {
tie my #lines, 'Tie::File', $_ or die $!;
print $lines[2], "\n";
}
Perl one-liner, windows-version:
perl -wE "#ARGV = glob '*.txt'; while (<>) { say $1 if /^From:\s*(.*)/ }"
It will check all the lines, but only print if it finds a valid From: tag.
Are you using a Unix-style shell? You can do this in the shell without even using Perl.
grep "^From:" ./* | sort | uniq -c"
The breakdown is as follows:
grep will grab every line that starts with "From:", and send it to...
sort, which will alpha sort those lines, then...
uniq, which will filter out dupe lines. The "-c" part will count the occurrences.
Your output would look like:
3 From: dave#example.com
5 From: foo#bar.example.com
etc...
Possible issues:
I'm not sure how complex your "From" lines will be, e.g. multiple addresses, different formats, etc.
You could enhance that grep step in a few ways, or replace it with a Perl script that has less-broad functionality than your proposed all-in-one script.
Please comment if anything isn't clear.
Here's my solution (I hope this isn't homework).
It checks all files in the current directory whose names end with ".txt", case-insensitive (e.g., it will find "foo.TXT", which is probably what you want under Windows). It also allows for possible variations in line terminators (at least CR-LF and LF), and searches for the From: prefix case-insensitively, and allows arbitrary whitespace after the :.
#!/usr/bin/perl
use strict;
use warnings;
opendir my $DIR, '.' or die "opendir .: $!\n";
my #files = grep /\.txt$/i, readdir $DIR;
closedir $DIR;
# print "Got ", scalar #files, " files\n";
my %seen = ();
foreach my $file (#files) {
open my $FILE, '<', $file or die "$file: $!\n";
while (<$FILE>) {
if (/^From:\s*(.*)\r?$/i) {
$seen{$1} = 1;
}
}
close $FILE;
}
foreach my $addr (sort keys %seen) {
print "$addr\n";
}