Perl - Encoding error when working with .html file

I have some .html files in a directory, and I want to add one line of CSS to each. Using Perl, I can locate the position with a regex and add the CSS; this works very well.
However, my first .html file contains an accented letter, é, and the resulting .html file has an encoding problem: it prints \xE9 instead.
In the Perl script, I have been careful to specify UTF-8 encoding when opening the files, as shown in the MWE below, but that does not solve the problem. How can I solve this encoding error?
MWE
use strict;
use warnings;
use File::Spec::Functions qw/ splitdir rel2abs /; # To get the current directory name
# Define variables
my ($inputfile, $outputfile, $dir);
# Initialize variables
$dir = '.';
# Open current directory
opendir(DIR, $dir);
# Scan all files in directory
while (my $inputfile = readdir(DIR)) {
#Name output file based on input file
$outputfile = $inputfile;
$outputfile =~ s/_not_centered//;
# Open output file
open(my $ofh, '>:encoding(UTF-8)', $outputfile);
# Process only files whose names end in _not_centered.html
next unless (-f "$dir/$inputfile");
next unless ($inputfile =~ m/\_not_centered.html$/);
# Open input file
open(my $ifh, '<:encoding(UTF-8)', $inputfile);
# Read input file
while(<$ifh>) {
# Catch and store the number of the chapter
if(/(<h2)(.*?)/) {
# $_ =~ s/<h2/<h2 style="text-align: center;"/;
print $ofh "$1 style=\"text-align: center;\"$2";
}else{
print $ofh "$_";
}
}
# Close input and output files
close $ifh;
close $ofh;
}
# Close output file and directory
closedir(DIR);
Problematic file named "Chapter_001_not_centered.html"
<html >
<head></head>
<body>
<h2 class="chapterHead"><span class="titlemark">Chapter 1</span><br /><a id="x1-10001"></a>Brocéliande</h2>
Brocéliande
</body></html>

The following demo script performs the required injection, using the glob function to select the files.
Note: the script creates a new file; uncomment the rename call to replace the original file with the new one.
use strict;
use warnings;
use open ":encoding(Latin1)";
my $dir = '.';
process($_) for glob("$dir/*_not_centered.html");
sub process {
my $fname_in = shift;
my $fname_new = $fname_in . '.new';
open my $in, '<', $fname_in
or die "Couldn't open $fname_in";
open my $out, '>', $fname_new
or die "Couldn't open $fname_new";
while( <$in> ) {
s/<h2/<h2 style="text-align: center;"/;
print $out $_;
}
close $in;
close $out;
# rename $fname_new, $fname_in
# or die "Couldn't rename $fname_new to $fname_in";
}
If you do not mind running the script on one file at a time, as in script.pl in_file > out_file:
use strict;
use warnings;
print s/<h2/<h2 style="text-align: center;"/ ? $_ : $_ for <>;
If the task arises only occasionally, a one-liner is enough:
perl -pe "s/<h2/<h2 style='text-align: center;'/" in_file

This question was answered in the comments by @Shawn and @sticky bit:
Changing the encoding used to open the files to ISO 8859-1 solves the problem. If one of you wants to post it as an answer, I will accept it.
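A minimal sketch of why ISO 8859-1 helps here, assuming (as the comments suggest) that the file stores é as the single byte 0xE9. That byte is not well-formed UTF-8 on its own, which is consistent with the \xE9 seen in the output. Decoding it as ISO 8859-1 yields the intended character, which can then be written out as UTF-8. The sample string is hard-coded for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they would appear in an ISO 8859-1 file (assumed sample data).
my $latin1_bytes = "Broc\xE9liande";

# A lone 0xE9 byte is not well-formed UTF-8, so this in-place decode fails.
my $copy = $latin1_bytes;
my $is_valid_utf8 = utf8::decode($copy) ? 1 : 0;

# Decode with the encoding the file really uses, then re-encode as UTF-8.
my $text       = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $text);

printf "valid UTF-8 input: %s\n", $is_valid_utf8 ? 'yes' : 'no';
printf "characters: %d, UTF-8 bytes: %d\n", length($text), length($utf8_bytes);
```

If the output files should be UTF-8, one option is to open the input with '<:encoding(ISO-8859-1)' and the output with '>:encoding(UTF-8)'; any charset declared in the HTML itself would then need to match.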

Related

How to open a file that has a special character in it such as $?

Seems fairly simple, but the "$" in the name causes the name to split. I tried escaping the character, but when I try to open the file I get GLOB().
my $path = 'C:\dir\name$.txt';
open my $file, '<', $path || die
print "file = $file\n";
It should open the file so I can traverse the entries.
It has nothing to do with the "$". Just follow standard file handling procedure.
use strict;
use warnings;
my $path = 'C:\dir\name$.txt';
open my $file_handle, '<', $path or die "Can't open $path: $!";
# read and print the file line by line
while (my $line = <$file_handle>) {
# the <> in scalar context gets one line from the file
print $line;
}
# reset the handle
seek $file_handle, 0, 0;
# read the whole file at once, print it
{
# enclose in a block to localize the $/
# $/ is the line separator, so when it's set to undef,
# it reads the whole file
local $/ = undef;
my $file_content = <$file_handle>;
print $file_content;
}
Consider using the CPAN modules File::Slurper or Path::Tiny which will handle the exact details of using open and readline, checking for errors, and encoding if appropriate (most text files are encoded to UTF-8).
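use strict;
use warnings;
my $path = 'C:\dir\name$.txt';
# Option 1: File::Slurper
use File::Slurper 'read_text';
my $file_content = read_text($path);
# Option 2: Path::Tiny (an alternative; use one option or the other)
use Path::Tiny 'path';
my $file_content = path($path)->slurp_utf8;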
If it's a data file, use read_binary or slurp_raw.

Perl Script: sorting through log files.

I am trying to write a script that opens a directory, reads a set of log files line by line, and searches for information such as:
"Attendance = 0". Previously I used grep "Attendance =" * to find this information, but now I want a script to do the search.
I need your help to finish this task.
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/path/';
opendir (DIR, $dir) or die $!;
while (my $file = readdir(DIR))
{
print "$file\n";
}
closedir(DIR);
exit 0;
What's your perl experience?
I'm assuming each file is a text file. I'll give you a hint. Try to figure out where to put this code.
# Now to open and read a text file.
my $fn='file.log';
# $! is a variable which holds a possible error msg.
open(my $INFILE, '<', $fn) or die "ERROR: could not open $fn. $!";
my @filearr=<$INFILE>; # Read the whole file into an array.
close($INFILE);
# Now look in @filearr, which has one entry per line of the original file.
exit; # Normal exit
I prefer to use File::Find::Rule for things like this. It preserves path information, and it's easy to use. Here's an example that does what you want.
use strict;
use warnings;
use File::Find::Rule;
my $dir = '/path/';
my $type = '*';
my @files = File::Find::Rule->file()
->name($type)
->in(($dir));
for my $file (@files){
print "$file\n\n";
open my $fh, '<', $file or die "can't open $file: $!";
while (my $line = <$fh>){
if ($line =~ /Attendance =/){
print $line;
}
}
}

My perl script isn't working, I have a feeling it's the grep command

I'm trying to search one file for instances of a
number and print the line if the other file contains that number.
#!/usr/bin/perl
open(file, "textIds.txt"); #
@file = <file>; #file looking into
# close file; #
while(<>){
$temp = $_;
$temp =~ tr/|/\t/; #puts tab between name and id
@arrayTemp = split("\t", $temp);
@found=grep{/$arrayTemp[1]/} <file>;
if (defined $found[0]){
#if (grep{/$arrayTemp[1]/} <file>){
print $_;
}
@found=();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;.
This would have told you about odd things in your script in advance.
Never use barewords as filehandles, as they are global identifiers. Use the three-parameter open
instead: open( my $fh, '<', 'textIds.txt');
use autodie; or check whether the open succeeded.
You read and store textIds.txt into the array @file, but later on (in your grep) you
try to read from that filehandle again (with <file>). As @PaulL said, this will always
return nothing, because the file has already been read to the end.
Replacing | with tabs and then splitting at tabs is not necessary. You can split at
tabs and pipes at the same time (assuming "John|7791 154" is really "John|7791\t154").
You're talking about "input file" and "in file" without saying exactly which is which.
I assume your "textIds.txt" is the one with only the numbers, and the other input file is the
one read from STDIN (with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;
# Open 'textIds.txt' and slurp it into the array @file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my @file = <$fh>;
close($fh);
# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
# split "John|7791\t154" into ("John", "7791", "154"):
my ($name, $number1, $number2) = split(/\||\t/, $line);
# compare $number1 to each member of @file and print if found:
if ( grep( /$number1/, @file) ) {
print $line;
}
}
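Note that grep with /$number1/ matches substrings, so an ID such as 779 would also match the line 7791. A possible variant, sketched here under the assumption that IDs must match exactly, builds a hash of the chomped IDs and looks each number up directly. The arrays below are hypothetical sample data standing in for textIds.txt and STDIN.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for textIds.txt and the piped input.
my @id_lines    = ("5432\n", "7791\n", "6590\n", "23140\n");
my @input_lines = ("John|7791\t154\n", "Smith|5432\t290\n", "Ann|779\t12\n");

# Build a lookup table keyed by the ID with its newline removed.
chomp(my @ids = @id_lines);
my %known = map { $_ => 1 } @ids;

my @matches;
for my $line (@input_lines) {
    # Split on pipe or tab; the second field is the ID.
    my (undef, $id) = split /[|\t]/, $line;
    push @matches, $line if defined $id && $known{$id};
}

print @matches;
```

The hash lookup also avoids rescanning the whole ID list for every input line.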

Search a word in file and replace in Perl

I want to replace the word "a" with "red" in a.txt files. I want to edit the same file, so I tried this code, but it does not work. Where am I going wrong?
@files=glob("a.txt");
foreach my $file (@files)
{
open(IN,$file) or die $!;
<IN>;
while(<IN>)
{
$_=~s/a/red/g;
print IN $file;
}
close(IN)
}
I'd suggest it's probably easier to use perl in sed mode:
perl -i.bak -p -e 's/a/red/g' *.txt
-i is in-place edit (-i.bak saves the old file with a .bak extension; -i without a suffix doesn't create a backup, which is often not a good idea).
-p creates a loop that iterates over all the files specified, one line at a time ($_), applying whatever code is specified by -e before printing that line. In this case, s/// applies a sed-style pattern replacement to $_, so this runs a search and replace over every .txt file.
Perl uses <ARGV> or <> to do some magic: it checks whether you specified files on your command line. If you did, it opens them and iterates over them. If you didn't, it reads from STDIN.
So you can also filter the output of another command (without -i, since there is no file to edit in place):
somecommand.sh | perl -p -e 's/a/red/g'
In your code you are writing to the same filehandle that you opened for reading. Open the file again in write mode, and then write to that handle.
Always use lexical filehandles and the three-argument form of open. Here is your modified code:
use warnings;
use strict;
my @files = glob("a.txt");
my @data;
foreach my $file (@files)
{
open my $fhin, "<", $file or die $!;
<$fhin>; # reads and discards the first line, as in the original code
while(<$fhin>)
{
$_ =~ s/\ba\b/red/g;
push @data, $_;
}
open my $fhw, ">", $file or die "Couldn't modify file: $!";
print $fhw @data;
close $fhw;
}
Here is another way (read whole file in a scalar):
foreach my $file (glob "/path/to/dir/a.txt")
{
#read whole file in a scalar
my $data = do {
local $/ = undef;
open my $fh, "<", $file or die $!;
<$fh>;
};
$data =~ s/\ba\b/red/g; #replace a with red,
#modify the file
open my $fhw, ">", $file or die "Couldn't modify file: $!";
print $fhw $data;
close $fhw;
}
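The two answers above use \b word boundaries, whereas the question's pattern does not. A small sketch of the difference, using a hypothetical sample line:

```perl
use strict;
use warnings;

my $sample = "a cat and a hat";   # hypothetical input line

# Without boundaries, every letter "a" is replaced, even inside words.
(my $naive = $sample) =~ s/a/red/g;

# With \b on both sides, only the standalone word "a" is replaced.
(my $bounded = $sample) =~ s/\ba\b/red/g;

print "$naive\n";     # red credt rednd red hredt
print "$bounded\n";   # red cat and red hat
```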

Perl: opening a file, and saving it under a different name after editing

I'm trying to write a configuration script.
For each customer, it will ask for variables, and then write several text files.
But each text file needs to be used more than once, so it can't overwrite them. I'd prefer it read from each file, made the changes, and then saved them to $name.originalname.
Is this possible?
You want something like Template Toolkit. You let the templating engine open a template, fill in the placeholders, and save the result. You shouldn't have to do any of that magic yourself.
For very small jobs, I sometimes use Text::Template.
Why not copy the file first and then edit the copy?
The code below expects to find a configuration template for each customer where, for example, Joe's template is joe.originaljoe and writes the output to joe:
foreach my $name (@customers) {
my $template = "$name.original$name";
open my $in, "<", $template or die "$0: open $template";
open my $out, ">", $name or die "$0: open $name";
# whatever processing you're doing goes here
my $output = process_template $in;
print $out $output or die "$0: print $out: $!";
close $in;
close $out or warn "$0: close $name";
}
Assuming you want to read in one file, make changes to it line by line, and then write to another file:
#!/usr/bin/perl
use strict;
use warnings;
# set $input_file and $output_file accordingly
# input file
open my $in_filehandle, '<', $input_file or die $!;
# output file
open my $out_filehandle, '>', $output_file or die $!;
# iterate through the input file one line at a time
while ( <$in_filehandle> ) {
# save this line and remove the newline
my $input_line = $_;
chomp $input_line;
# prepare the line to be written out
my $output_line = do_something( $input_line );
# write to the output file
print $out_filehandle $output_line . "\n";
}
close $in_filehandle;
close $out_filehandle;
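A self-contained sketch of the same read, transform, write loop that can be run without any files on disk. It assumes in-memory filehandles (opening a reference to a scalar) in place of real files, and do_something here is a hypothetical stand-in that upper-cases each line.

```perl
use strict;
use warnings;

# Hypothetical transformation; stands in for do_something() above.
sub do_something {
    my ($line) = @_;
    return uc $line;
}

my $input  = "alpha\nbeta\n";   # sample input, read through an in-memory handle
my $output = '';                # collects what is written to the output handle

open my $in_filehandle,  '<', \$input  or die "Cannot read input: $!";
open my $out_filehandle, '>', \$output or die "Cannot write output: $!";

while ( my $input_line = <$in_filehandle> ) {
    chomp $input_line;
    # Write to the output handle explicitly, not to STDOUT.
    print {$out_filehandle} do_something($input_line), "\n";
}

close $in_filehandle;
close $out_filehandle or die "Cannot close output: $!";

print $output;
```

Replacing \$input and \$output with file names gives the on-disk version, with the original file left untouched.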