using perl tie::file with utf encoded file - perl

Can I use Tie::File with an output file of utf encoding? I can't get this to work right.
What I am trying to do is open this utf encoded file, remove the match string from the file and rename the file.
Code:
use strict;
use warnings;
use Tie::File;
use File::Copy;
my ($input_file) = qw (test.txt);
open my $infh, "<:encoding(UTF-16LE)", $input_file or die "cannot open '$input_file': $!";
for (<$infh>) {
tie my @lines, "Tie::File", $_;
shift @lines if $lines[0] =~ m/MyHeader/;
untie @lines;
my ($name) = /^(.*).csv/i;
move($_, $name . ".dat");
}
close $infh
or die "Cannot close '$input_file': $!";
Code: (updated)
my ($input_file) = qw (test.txt);
my $qfn_in = $input_file;
my $qfn_out = $qfn_in . ".dat";
open(my $fh_in, "<:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_in)
or die("Can't open \"$qfn_in\": $!\n");
open(my $fh_out, ">:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_out)
or die("Can't open \"$qfn_out\": $!\n");
while (<$fh_in>) {
next if $. == 1 && /MyHeader/;
print($fh_out $_)
or die("Can't write to \"$qfn_out\": $!");
}
close($fh_in);
close($fh_out) or die("Can't write to \"$qfn_out\": $!");
rename($qfn_out, $qfn_in)
or die("Can't rename: $!\n");

This is underdocumented in the Tie::File perldoc, but you want to pass the discipline => ':encoding(UTF-16LE)' option when you tie the file:
tie my @lines, 'Tie::File', $input_file, discipline => ':encoding(UTF-16LE)';
Note that the third argument is the name of the file to associate with the tied array. Tie::File will automatically open and manage the filehandle for you; there is no need to call open on the file yourself.
@lines now contains the contents of the file, so the next thing to do is check the first line:
if ($lines[0] =~ m/pattern/) {
my $line = shift @lines;
untie @lines; # rewrites, closes the file, w/o first line
my ($name) = $line =~ /^(.*).csv/i;
rename $input_file, "$name.dat";
}
But I concur with TLP that Tie::File is overkill for this job.
(My previous answer about opening a filehandle with the correct encoding and passing the glob as the third arg to Tie::File won't work, as (1) it didn't open the file in read/write mode and (2) even if it did, Tie::File can't or doesn't apply the encoding on both the reading from and writing to the file handle)

my $qfn_in = ...;
my $qfn_out = $qfn_in . ".tmp";
open(my $fh_in, "<:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_in)
or die("Can't open \"$qfn_in\": $!\n");
open(my $fh_out, ">:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_out)
or die("Can't open \"$qfn_out\": $!\n");
while (<$fh_in>) {
next if $. == 1 && /MyHeader/;
print($fh_out $_)
or die("Can't write to \"$qfn_out\": $!");
}
close($fh_in);
close($fh_out) or die("Can't write to \"$qfn_out\": $!");
rename($qfn_out, $qfn_in)
or die("Can't rename: $!\n");
(:perlio and :utf8 are workarounds for bugs that existed back then.)

The line:
tie my @lines, "Tie::File", $_;
tries to tie @lines to a file named after each line of test.txt. Since test.txt does not appear to be a list of file names, I suspect that the tie fails.
What you are probably after is using Tie::File on test.txt. If you only want to check the first line of that file, you do not need a loop.
So you'd need something like:
use autodie; #handy to check for fatal errors
tie my @lines, "Tie::File", $input_file;
shift @lines if $lines[0] =~ /MyHeader/;
untie @lines;
if ($input_file =~ /(.+).csv/i) {
move($input_file, $1);
}
But there are simpler ways to check the first line of a file. This will check one file:
perl -we '$_=<>; print unless /MyHeader/; print <>;' test.txt > test.dat

Related

Read a text file and store each line in a variable using perl

I have a text file (sample.txt) with some data. I want to read the text file and store each line in an array or a variable.
sample.txt
ab1234
str:abcd
pq4567
How can i store each of these lines in an array or a variable using perl script.
It is easy: open the file, chomp the newline (\n) from each line, and push the line onto an array. To test it, print the array.
Here $_ holds each line read from the file, and @lines collects those lines in an array.
use strict;
use warnings;
my $file = "sample.txt";
open(my $fh, "<", $file) or die "Unable to open < $file: $!";
my @lines;
while (<$fh>) {
chomp $_;
push (@lines, $_);
}
close $fh or die "Unable to close $file: $!";
print @lines;
An even easier method is to read the whole file straight into an array.
use strict;
use warnings;
my $file = "sample.txt";
open(my $fh, "<", $file) or die "Unable to open < $file: $!";
my @lines = <$fh>;
chomp(@lines);
print @lines;
# open the file
open my $fh, '<', 'sample.txt'
or die "Could not open sample.txt: $!";
# Read the file into an array
my @lines = <$fh>;
# Optionally, remove newlines from all lines in the array
chomp(@lines);
If you are able to use modules, Tie::File can help (it is a core module).
With it you can modify, add, or delete lines in the file through an array.
Here is the script:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
my @contents=();
tie @contents, 'Tie::File','sample.txt' or die "Not able to Tie sample.txt\n";
my $count=1;
foreach (@contents)
{
print "line $count: $_\n";
$count++;
}
untie @contents;
output:
line 1: ab1234
line 2: str:abcd
line 3: pq4567

Fix files "corrupted" by Perl

I have a bunch of files that were created using this code:
use LWP::Simple;
my $xl = get("http://www.somewhere.com/file.xls");
open(my $outf, '>', "C:/file.xls") || die $!;
print $outf $xl;
Only recently did I realize that I should have been using '>:raw' in the filehandle rather than just '>'. So now I have a bunch of files that have been modified in some way that prevents Excel from opening them.
My question is whether there is some processing I can do with Perl to these files to get back to the original Excel files. In other words, is it possible to figure out what edits would have been made to the file that I can undo with a new Perl script?
It converted LF to CRLF. You can simply change any instance of CRLF back to LF.
my $qfn_in = $qfn;
my $qfn_out = $qfn . ".new";
open(my $fh_in, '<:raw', $qfn_in ) or die $!;
open(my $fh_out, '>:raw', $qfn_out) or die $!;
while (<$fh_in>) {
s/\r\n\z/\n/;
print($fh_out $_);
}
Or
my $qfn_in = $qfn;
my $qfn_out = $qfn . ".new";
open(my $fh_in, '<:raw:crlf', $qfn_in ) or die $!;
open(my $fh_out, '>:raw', $qfn_out) or die $!;
print($fh_out $_) while <$fh_in>;
If you have dos2unix, you could also use that. (Though JRFerguson says that his version of it will corrupt files with character 1A in it.)

Read and write file bit by bit

There is a .jpg file for example or some other file. I want to read it bit by bit. I do this:
open(FH, "<", "red.jpg") or die "Error: $!\n";
my $str;
while(<FH>) {
$str .= unpack('B*', $_);
}
close FH;
Well it gives me $str with 0101001 of the file. After that I do this:
open(AB, ">", "new.jpg") or die "Error: $!\n";
binmode(AB);
print AB $str;
close AB;
but it doesn't work.
How can I do it? And how can I make it work regardless of byte order (cross-platform)?
Problems:
You didn't use binmode when reading either.
It makes no sense to read a binary file line by line since they don't have lines.
You're needlessly using global variables for your file handles.
And the one that answers your question: You didn't reverse the unpack.
open(my $FH, "<", "red.jpg")
or die("Can't open red.jpg: $!\n");
binmode($FH);
my $file; { local $/; $file = <$FH>; }
my $binary = unpack('B*', $file);
open(my $OUT, ">", "new.jpg")
or die("Can't create new.jpg: $!\n");
binmode($OUT);
print $OUT pack('B*', $binary);
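The round trip between bytes and a string of 0s and 1s can be checked in isolation. The sketch below is only an illustration: `bytes_to_bits` and `bits_to_bytes` are hypothetical helper names, and the sample bytes are made up. Because the `B*` template works on one byte at a time, most significant bit first, the result should not depend on the machine's byte order.

```perl
use strict;
use warnings;

# Hypothetical helpers: convert a byte string to '0'/'1' characters and back.
sub bytes_to_bits { return unpack('B*', $_[0]) }
sub bits_to_bytes { return pack('B*', $_[0]) }

my $bytes = "\xFF\xD8\x00\x41";      # arbitrary sample bytes, not a real JPEG
my $bits  = bytes_to_bits($bytes);   # 8 characters per input byte

print length($bits), " bits\n";
print bits_to_bytes($bits) eq $bytes ? "round trip ok\n" : "mismatch\n";
```

Writing `$bits` itself to a file produces a text file of digits; only the packed form is the original binary data.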

Read Increment Then Write to a text file in perl

I have this little perl script which opens a txt file, reads the number in it, then overwrites the file with the number incremented by 1. I can open and read from the file, I can write to the file but I'm having issues overwriting. In addition, I'm wondering if there is a way to do this without opening the file twice. Here's my code:
#!/usr/bin/perl
open (FILE, "<", "data.txt") or die "$! error trying to append";
undef $/;
$number = <FILE>;
$number = int($number);
$myNumber = $number++;
print $myNumber+'\n';
close(FILE);
open(FILE, ">data.txt") or die "$! error";
print FILE $myNumber;
close(FILE);
Change the line
$myNumber = $number++;
to
$myNumber = $number+1;
That should solve the problem.
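The reason is that the postfix form returns the value from before the increment. A minimal sketch, with variable names chosen only for illustration:

```perl
use strict;
use warnings;

my $number = 5;
my $post   = $number++;   # $post receives the old value; $number is now 6
my $pre    = ++$number;   # $number becomes 7 first, then $pre receives 7

print "post=$post pre=$pre number=$number\n";   # prints "post=5 pre=7 number=7"
```

So `$myNumber = $number++;` stores the unincremented number, which is what then gets written back to the file.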
Below is how you could do by opening the file just once:
open(FILE, "+<data.txt") or die "$! error";
undef $/;
$number = <FILE>;
$number = int($number);
$myNumber = $number+1;
seek(FILE, 0, 0);
truncate(FILE, tell FILE);
print "$myNumber\n";
print FILE $myNumber;
close(FILE);
It's good that you used the three-argument form of open the first time. You also needed to do that in your second open. Also, you should use lexical variables, i.e., those which begin with my, in your script--even for your file handles.
You can just increment the variable that holds the number instead of assigning it to a new variable. It's also a good idea to use chomp. That being said, consider the following option:
#!/usr/bin/env perl
use strict;
use warnings;
open my $fhIN, "<", "data.txt" or die "Error trying to open for reading: $!";
chomp( my $number = <$fhIN> ); # $/ is left at its default, so chomp strips the newline
close $fhIN;
$number++;
open my $fhOUT, ">", "data.txt" or die "Error trying to open for writing: $!";
print $fhOUT $number;
close $fhOUT;
Another option is to use the Module File::Slurp, letting it handle all the I/O operations:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp qw/edit_file/;
edit_file { chomp; $_++ } 'data.txt';
Try this:
#!/usr/bin/perl
use strict;
use warnings;
my $file = "data.txt";
my $number = 0;
my $fh;
if( -e $file ) {
open $fh, "+<", $file or die "Opening '$file' failed, because $!\n";
$number = <$fh>;
seek( $fh, 0, 0 );
} else { # if no data.txt exists - yet
open $fh, ">", $file or die "Creating '$file' failed, because $!\n";
}
$number++;
print "$number\n";
print $fh $number;
close( $fh );
If you're using a bash shell, and you save the code to test.pl, you can test it with:
for i in {1..10}; do ./test.pl; done
Then 'cat data.txt', should show a 10.

perl: Writing file at Nth position

I am trying to write in to file at Nth POSITION. I have tried with below example but it writes at the end. Please help to achieve this.
#!/usr/bin/perl
open(FILE,"+>>try.txt")
or
die ("Cant open file try.txt");
$POS=5;
seek(FILE,$POS,0);
print FILE "CP1";
You are opening the file in read-write appending mode. Try opening the file in read-write mode:
my $file = "try.txt";
open my $fh, "+<", $file
or die "could not open $file: $!";
Also, note the use of the three argument open, the lexical filehandle, and $!.
#!/usr/bin/perl
use strict;
use warnings;
#create an in-memory file
my $fakefile = "1234567890\n";
open my $fh, "+<", \$fakefile
or die "Cant open file: $!";
my $offset = 5;
seek $fh, $offset, 0
or die "could not seek: $!";
print $fh "CP1";
print $fakefile;
The code above prints:
12345CP190
If I understand you correctly, if the file contents are
123456789
you want to change that to
1234CP156789
You cannot achieve that using modes supplied to open (regardless of programming language).
You need to open the source file and a temporary file (see File::Temp). Read up to the insertion point from the source and write that to the temporary file, write what you want to insert, then write the remainder of the source file to the temporary file. Finally, close both files and rename the temporary file to the source.
If you are going to do this using seek, both files must be opened in binary mode.
Here is an example using line oriented input and text mode:
#!/usr/bin/perl
use strict; use warnings;
use File::Temp qw( :POSIX );
my $source = 'test.test';
my $temp = tmpnam;
open my $source_h, '<', $source
or die "Failed to open '$source': $!";
open my $temp_h, '>', $temp
or die "Failed to open '$temp' for writing: $!";
while ( my $line = <$source_h> ) {
if ( $line =~ /^[0-9]+$/ ) {
$line = substr($line, 0, 5) . "CP1" . substr($line, 5);
}
print $temp_h $line;
}
close $temp_h
or die "Failed to close '$temp': $!";
close $source_h
or die "Failed to close '$source': $!";
rename $temp => $source
or die "Failed to rename '$temp' to '$source': $!";
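For comparison, the byte-offset variant described earlier (both files in binary mode, copy up to the insertion point, write the new text, copy the rest) might look like the sketch below. `insert_at` is a hypothetical helper written for this illustration, and the demo file contents are assumed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: rewrite $path with $text inserted at byte offset $pos.
sub insert_at {
    my ($path, $pos, $text) = @_;

    open my $in, '<:raw', $path or die "Can't open '$path': $!";
    # Keep the temporary file in the same directory so rename stays on one filesystem.
    my ($out, $tmp) = tempfile(DIR => dirname($path));
    binmode $out;

    # Copy the first $pos bytes unchanged, then write the inserted text.
    my $head = '';
    defined read($in, $head, $pos) or die "Can't read '$path': $!";
    print {$out} $head, $text or die "Can't write '$tmp': $!";

    # Copy the remainder of the source in chunks.
    my $buf;
    while (read($in, $buf, 65536)) {
        print {$out} $buf or die "Can't write '$tmp': $!";
    }

    close $in;
    close $out or die "Can't close '$tmp': $!";
    rename $tmp, $path or die "Can't rename '$tmp' to '$path': $!";
}

# Demo on a throwaway file with assumed contents.
my ($fh, $demo) = tempfile();
print {$fh} "123456789\n";
close $fh or die "Can't close '$demo': $!";
insert_at($demo, 4, "CP1");
open my $check, '<:raw', $demo or die "Can't open '$demo': $!";
print <$check>;   # prints 1234CP156789
close $check;
unlink $demo;
```

One caveat of this sketch: the file created by `tempfile` has restrictive permissions, so the original file's mode is not preserved across the rename.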
This works for me:
use strict;
use warnings;
open( my $fh, '+<', 'foo.txt' ) or die $!;
seek( $fh, 3, 0 );
print $fh "WH00t?";
This is also a more "modern" use of open(), see http://perldoc.perl.org/functions/open.html
The file will be closed when $fh goes out of scope.
"Inserting" a string into a file can (mostly) be done in place. See the lightly used truncate built-in function.
open my $fh, '+<', $file or die $!;
seek $fh, 5, 0;
my $x = do { local $/; <$fh> }; # read everything after the 5th byte into $x
truncate $fh, 5;
seek $fh, 5, 0; # truncate leaves the file position unchanged, so seek back
print $fh "CP1";
print $fh $x;
close $fh;
If your file is line or record oriented, you can insert lines or modify individual lines easily with the core module Tie::File. It lets you treat the file as an array and use Perl string and array operations to modify it. You can safely operate on huge files larger than your RAM with this method.
Here is an example:
use strict; use warnings;
use Tie::File;
#create the default .txt file:
open (my $out, '>', "nums.txt") or die $!;
while(<DATA>) { print $out "$_"; }
close $out or die $!;
tie my @data, 'Tie::File', "nums.txt" or die $!;
my $offset=5;
my $insert="INSERTED";
#insert in a string:
$data[0]=substr($data[0],0,$offset).$insert.substr($data[0],$offset)
if (length($data[0])>$offset);
#insert a new array element that becomes a new file line:
splice @data,$offset,0,join(':',split(//,$insert));
#insert vertically:
$data[$_]=substr($data[$_],0,$offset) .
substr(lc $insert,$_,1) .
substr($data[$_],$offset) for (0..length($insert));
untie @data; #close the file too...
__DATA__
123456789
234567891
345678912
456789123
567891234
678912345
789123456
891234567
912345678
Output:
12345iINSERTED6789
23456n7891
34567s8912
45678e9123
56789r1234
I:N:St:E:R:T:E:D
67891e2345
78912d3456
891234567
912345678
The file modifications with Tie::File are made in place, as the array is modified. You could use Tie::File on just the first line of your file to modify and insert as you requested. You can put sleep between the array modifications and run tail -n +0 -f on the file to watch it change if you wish.
Alternatively, if your file is reasonable size and you want to treat it like characters, you can read the entire file into memory, do string operations on the data, then write the modified data back out. Consider:
use strict; use warnings;
#create the default .txt file:
open (my $out, '>', "nums.txt") or die $!;
while(<DATA>) { print $out "$_"; }
close $out or die $!;
my $data;
open (my $in, '<', "nums.txt") or die $!;
{ local $/=undef; $data=<$in>; }
close $in or die $!;
my $offset=5;
my $insert="INSERTED";
open ($out, '>', "nums.txt") or die $!; # reuse $out declared above
print $out substr($data,0,$offset).$insert.substr($data,$offset);
close $out or die $!;
__DATA__
123456789
2
3
4
5
6
7
8
9
Output:
12345INSERTED6789
2
3
4
5
6
7
8
9
If you treat files as characters, beware that under Windows, files in text mode have a \r\n for a new line. That is two characters if opened in binary mode.
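The difference is easy to see by reading the same bytes through different layers. The following is a rough sketch rather than a definitive recipe: `slurp` is a hypothetical helper, and the `:raw:crlf` layer stack stands in for what text mode does on Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file through the given PerlIO layers.
sub slurp {
    my ($layers, $file) = @_;
    open my $in, "<$layers", $file or die "Can't open '$file': $!";
    local $/;                 # slurp mode: read everything at once
    my $data = <$in>;
    close $in;
    return $data;
}

# Write two lines with Windows-style line endings, byte for byte.
my ($fh, $path) = tempfile();
binmode $fh;
print {$fh} "ab\r\ncd\r\n";
close $fh or die "Can't close '$path': $!";

printf "binary: %d characters\n", length(slurp(':raw', $path));        # 8
printf "text:   %d characters\n", length(slurp(':raw:crlf', $path));   # 6
unlink $path;
```

Any offset computed on the text-mode string is therefore off by one for every line ending that precedes it in the file on disk.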