Fix files "corrupted" by Perl - perl

I have a bunch of files that were created using this code:
use LWP::Simple;
my $xl = get("http://www.somewhere.com/file.xls");
open(my $outf, '>', "C:/file.xls") || die $!;
print $outf $xl;
Only recently did I realize that I should have been using '>:raw' in the filehandle rather than just '>'. So now I have a bunch of files that have been modified in some way that prevents Excel from opening them.
My question is whether there is some processing I can do with Perl to these files to get back to the original Excel files. In other words, is it possible to figure out what edits would have been made to the file that I can undo with a new Perl script?

It converted LF to CRLF. You can simply change any instance of CRLF back to LF.
my $qfn_in = $qfn;
my $qfn_out = $qfn . ".new";
open(my $fh_in, '<:raw', $qfn_in ) or die $!;
open(my $fh_out, '>:raw', $qfn_out) or die $!;
while (<$fh_in>) {
s/\r\n\z/\n/;
print($fh_out $_);
}
Or
my $qfn_in = $qfn;
my $qfn_out = $qfn . ".new";
open(my $fh_in, '<:raw:crlf', $qfn_in ) or die $!;
open(my $fh_out, '>:raw', $qfn_out) or die $!;
print($fh_out $_) while <$fh_in>;
If you have dos2unix, you could also use that. (Though JRFerguson says that his version of it will corrupt files with character 1A in it.)
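To see why the replacement is lossless, here is a minimal sketch that simulates the damage on an in-memory string and then undoes it. The sample bytes and variable names are made up for illustration; real files would still be read and written through :raw handles as in the snippets above.

```perl
use strict;
use warnings;

# Hypothetical sample: binary data containing a bare LF and a genuine CRLF.
my $original = "\xD0\xCF\x11\xE0\x0A\x00\x0D\x0A\xFF";

# Simulate what the text-mode handle did on output: every LF became CRLF.
(my $damaged = $original) =~ s/\n/\r\n/g;

# Undo it: every CRLF goes back to LF.
(my $restored = $damaged) =~ s/\r\n/\n/g;

print $restored eq $original ? "round trip ok\n" : "mismatch\n";
```

A CRLF that was already in the original became CR CR LF in the damaged copy, so collapsing CRLF to LF restores it to CRLF rather than destroying it.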

Related

Printing a content of a file to the screen in perl

I have a Perl script which writes a few lines into a file. (I checked, and the file is written correctly.)
Right after that I want to print the content to the screen. The way I'm trying to do it is to read the file and print it:
open (FILE, '>', "tmpLogFile.txt") or die "could not open the log file\n";
$aaa = <FILE>;
close (FILE);
print $aaa;
But I get nothing on the screen. What am I doing wrong?
To read, you need to specify the open mode as <. (Opening with > also truncates the file, so nothing is left to read.)
Also, $aaa = <FILE> is in scalar context, so it reads only one line.
With print <FILE> the read happens in list context and returns all lines:
open (FILE, '<', "tmpLogFile.txt") or die "could not open the log file\n";
print <FILE>;
close (FILE);
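The difference between the two contexts can be sketched with an in-memory filehandle. The contents and variable names here are invented for the demonstration and stand in for tmpLogFile.txt.

```perl
use strict;
use warnings;

# In-memory "file" standing in for tmpLogFile.txt (contents are made up).
my $content = "first\nsecond\nthird\n";

open(my $fh, '<', \$content) or die "could not open in-memory file: $!";
my $one_line = <$fh>;   # scalar context: reads a single line
my @rest     = <$fh>;   # list context: reads all remaining lines
close($fh);

print "scalar read: $one_line";
print "list read got ", scalar(@rest), " more lines\n";
```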
try this:
use strict;
use warnings;
my $filename = 'data.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while (my $row = <$fh>) {
chomp $row;
print "$row\n";
}
print "done\n"

perl read a file into string then check each character for unicode range

I am trying to read a file into a string and then check each character against the Unicode range 2816-2943. All characters should be skipped except those in the range and \n. I got the following code from the net, but it isn't working for me. I am sorry if I am making silly mistakes; I am new to Perl. Please help, I need to finish this today.
use utf8;
use encoding 'utf8';
use open qw/:std :utf8/;
binmode(STDOUT, ":utf8"); #makes STDOUT output in UTF-8 instead of ordinary ASCII.
$file="content.txt";
open FILE1, ">filtered.txt" or die $!;
open(FILE, "<$file") or die "Can't read file 'filename' [$!]\n";
binmode(FILE);
my $document = <FILE>;
close (FILE);
print $document;
The following reads line by line from the $input file and writes the filtered line to the $output file.
my $input = 'content.txt';
my $output = 'filtered.txt';
open(my $src_fh, '<:encoding(UTF-8)', $input)
or die qq/Could not open file '$input' for reading: '$!'/;
open(my $dst_fh, '>:encoding(UTF-8)', $output)
or die qq/Could not open file '$output' for writing: '$!'/;
while(<$src_fh>) {
s/[^\x{0B00}-\x{0B7F}\n]//g;
print {$dst_fh} $_
or die qq/Could not write to file '$output': '$!'/;
}
close $dst_fh
or die qq/Could not close output filehandle: '$!'/;
close $src_fh
or die qq/Could not close input filehandle: '$!'/;
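The substitution can be tried on a single string without touching any files. This is only a sketch: the sample line is made up, and it prints code points rather than the characters themselves so that it does not depend on the terminal's encoding.

```perl
use strict;
use warnings;

# Decimal 2816-2943 is hexadecimal 0B00-0B7F, the range used in the regex.
printf "%04X-%04X\n", 2816, 2943;

# Made-up sample line: Latin letters, two characters from the range, newline.
my $line = "abc \x{0B05}\x{0B06} xyz\n";
(my $filtered = $line) =~ s/[^\x{0B00}-\x{0B7F}\n]//g;

# Show the surviving code points instead of printing raw wide characters.
print join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $filtered)), "\n";
```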

Read and write file bit by bit

There is a .jpg file for example or some other file. I want to read it bit by bit. I do this:
open(FH, "<", "red.jpg") or die "Error: $!\n";
my $str;
while(<FH>) {
$str .= unpack('B*', $_);
}
close FH;
This gives me $str containing the 0s and 1s of the file. After that I do this:
open(AB, ">", "new.jpg") or die "Error: $!\n";
binmode(AB);
print AB $str;
close AB;
But it doesn't work.
How can I do it? And how can I make it work regardless of byte order (cross-platform)?
Problems:
You didn't use binmode when reading either.
It makes no sense to read a binary file line by line since they don't have lines.
You're needlessly using global variables for your file handles.
And the one that answers your question: You didn't reverse the unpack.
open(my $in_fh, "<", "red.jpg")
or die("Can't open red.jpg: $!\n");
binmode($in_fh);
my $file; { local $/; $file = <$in_fh>; } # slurp the whole file
my $binary = unpack('B*', $file);
open(my $out_fh, ">", "new.jpg") # separate name: declaring $in_fh twice with my would warn
or die("Can't create new.jpg: $!\n");
binmode($out_fh);
print $out_fh pack('B*', $binary);
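The round trip can be checked on a few bytes in memory. In this sketch the sample bytes are made up. The B template lists each byte's bits most-significant first, so the bit string does not depend on the machine's byte order.

```perl
use strict;
use warnings;

# Made-up sample bytes; a real file would be slurped from a binmode'd handle.
my $bytes = "\xFF\xD8\x00\x0A";

my $bits = unpack('B*', $bytes);   # one '0' or '1' character per bit
my $back = pack('B*', $bits);      # reverse: bit string back to bytes

print "$bits\n";
print $back eq $bytes ? "round trip ok\n" : "mismatch\n";
```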

using perl tie::file with utf encoded file

Can I use Tie::File with an output file of utf encoding? I can't get this to work right.
What I am trying to do is open this utf encoded file, remove the match string from the file and rename the file.
Code:
use strict;
use warnings;
use Tie::File;
use File::Copy;
my ($input_file) = qw (test.txt);
open my $infh, "<:encoding(UTF-16LE)", $input_file or die "cannot open '$input_file': $!";
for (<$infh>) {
tie my @lines, "Tie::File", $_;
shift @lines if $lines[0] =~ m/MyHeader/;
untie @lines;
my ($name) = /^(.*).csv/i;
move($_, $name . ".dat");
}
close $infh
or die "Cannot close '$input_file': $!";
Code: (updated)
my ($input_file) = qw (test.txt);
my $qfn_in = $input_file;
my $qfn_out = $qfn_in . ".dat";
open(my $fh_in, "<:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_in)
or die("Can't open \"$qfn_in\": $!\n");
open(my $fh_out, ">:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_out)
or die("Can't open \"$qfn_out\": $!\n");
while (<$fh_in>) {
next if $. == 1 && /MyHeader/;
print($fh_out $_)
or die("Can't write to \"$qfn_out\": $!");
}
close($fh_in);
close($fh_out) or die("Can't write to \"$qfn_out\": $!");
rename($qfn_out, $qfn_in)
or die("Can't rename: $!\n");
This is underdocumented in the Tie::File perldoc, but you want to pass the discipline => ':encoding(UTF-16LE)' option when you tie the file:
tie my @lines, 'Tie::File', $input_file, discipline => ':encoding(UTF-16LE)';
Note that the third argument is the name of the file to associate with the tied array. Tie::File will automatically open and manage the filehandle for you; there is no need to call open on the file yourself.
@lines now contains the contents of the file, so the next thing to do is check the first line:
if ($lines[0] =~ m/pattern/) {
my $line = shift @lines;
untie @lines; # rewrites, closes the file, w/o first line
my ($name) = $line =~ /^(.*).csv/i;
rename $input_file, "$name.dat";
}
But I concur with TLP that Tie::File is overkill for this job.
(My previous answer about opening a filehandle with the correct encoding and passing the glob as the third arg to Tie::File won't work, as (1) it didn't open the file in read/write mode and (2) even if it did, Tie::File can't or doesn't apply the encoding on both the reading from and writing to the file handle)
my $qfn_in = ...;
my $qfn_out = $qfn_in . ".tmp";
open(my $fh_in, "<:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_in)
or die("Can't open \"$qfn_in\": $!\n");
open(my $fh_out, ">:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_out)
or die("Can't open \"$qfn_out\": $!\n");
while (<$fh_in>) {
next if $. == 1 && /MyHeader/;
print($fh_out $_)
or die("Can't write to \"$qfn_out\": $!");
}
close($fh_in);
close($fh_out) or die("Can't write to \"$qfn_out\": $!");
rename($qfn_out, $qfn_in)
or die("Can't rename: $!\n");
(:perlio and :utf8 are workarounds for bugs that existed back then.)
The line:
tie my @lines, "Tie::File", $_;
tries to tie @lines to a file named after each line of test.txt. Since test.txt does not seem to be a file with filenames in it, I suspect that the tie fails.
What you are probably after is using Tie::File on test.txt. If you only want to check the first line of that file, you do not need a loop.
So you'd need something like:
use autodie; #handy to check for fatal errors
tie my @lines, "Tie::File", $input_file;
shift @lines if $lines[0] =~ /MyHeader/;
untie @lines;
if ($input_file =~ /(.+).csv/i) {
move($input_file, $1);
}
But there are simpler ways to check the first line of a file. This will check one file:
perl -we '$_=<>; print unless /MyHeader/; print <>;' test.txt > test.dat

perl: Writing file at Nth position

I am trying to write to a file at the Nth position. I tried the example below, but it writes at the end. Please help me achieve this.
#!/usr/bin/perl
open(FILE,"+>>try.txt")
or
die ("Cant open file try.txt");
$POS=5;
seek(FILE,$POS,0);
print FILE "CP1";
You are opening the file in read-write append mode, where every write goes to the end of the file regardless of seek. Try opening the file in read-write mode:
my $file = "try.txt";
open my $fh, "+<", $file
or die "could not open $file: $!";
Also, note the use of the three argument open, the lexical filehandle, and $!.
#!/usr/bin/perl
use strict;
use warnings;
#create an in-memory file
my $fakefile = "1234567890\n";
open my $fh, "+<", \$fakefile
or die "Cant open file: $!";
my $offset = 5;
seek $fh, $offset, 0
or die "could not seek: $!";
print $fh "CP1";
print $fakefile;
The code above prints:
12345CP190
If I understand you correctly, if the file contents are
123456789
you want to change that to
12345CP16789
You cannot achieve that using modes supplied to open (regardless of programming language).
You need to open the source file and another temporary file (see File::Temp). Read up to the insertion point from the source and write that to the temporary file, write what you want to insert, then write the remainder of the source file to the temporary file. Finally, close both files and rename the temporary file to the source.
If you are going to do this using seek, both files must be opened in binary mode.
Here is an example using line oriented input and text mode:
#!/usr/bin/perl
use strict; use warnings;
use File::Temp qw( :POSIX );
my $source = 'test.test';
my $temp = tmpnam;
open my $source_h, '<', $source
or die "Failed to open '$source': $!";
open my $temp_h, '>', $temp
or die "Failed to open '$temp' for writing: $!";
while ( my $line = <$source_h> ) {
if ( $line =~ /^[0-9]+$/ ) {
$line = substr($line, 0, 5) . "CP1" . substr($line, 5);
}
print $temp_h $line;
}
close $temp_h
or die "Failed to close '$temp': $!";
close $source_h
or die "Failed to close '$source': $!";
rename $temp => $source
or die "Failed to rename '$temp' to '$source': $!";
This works for me:
use strict;
use warnings;
open( my $fh, '+<', 'foo.txt' ) or die $!;
seek( $fh, 3, 0 );
print $fh "WH00t?";
This is also a more "modern" use of open(); see http://perldoc.perl.org/functions/open.html
The file will be closed when $fh goes out of scope.
"Inserting" a string into a file can (mostly) be done in place. See the lightly used truncate built-in function.
open my $fh, '+<', $file or die $!;
seek $fh, 5, 0;
my $x = do { local $/; <$fh> }; # read everything after the 5th byte into $x
truncate $fh, 5;
seek $fh, 5, 0; # go back to the insertion point before writing
print $fh "CP1";
print $fh $x;
close $fh;
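The same idea can be wrapped in a small helper and tried on a scratch file. This is a sketch under assumptions: insert_at is a hypothetical name, the file is assumed small enough to hold its tail in memory, and File::Temp is used only so that the demo touches no real data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: insert $text at byte $offset of $path, in place.
sub insert_at {
    my ($path, $offset, $text) = @_;
    open(my $fh, '+<:raw', $path) or die "Can't open '$path': $!";
    seek($fh, $offset, 0) or die "Can't seek: $!";
    my $tail = do { local $/; <$fh> };   # slurp everything after $offset
    $tail = '' unless defined $tail;
    seek($fh, $offset, 0) or die "Can't seek: $!";
    print {$fh} $text, $tail or die "Can't write: $!";
    close($fh) or die "Can't close '$path': $!";
}

# Demo on a scratch file so no real data is touched.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "1234567890\n";
close($tmp_fh) or die $!;

insert_at($tmp_name, 5, "CP1");

open(my $check, '<:raw', $tmp_name) or die $!;
my $result = do { local $/; <$check> };
close($check);
print $result;
```

Because an insertion only ever makes the file longer, this version seeks back and overwrites the tail without calling truncate.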
If your file is line or record oriented, you can insert lines or modify individual lines easily with the core module Tie::File. This allows the file to be treated as an array, so Perl string and array manipulation can be used to modify the file. You can safely operate on huge files larger than your RAM with this method.
Here is an example:
use strict; use warnings;
use Tie::File;
#create the default .txt file:
open (my $out, '>', "nums.txt") or die $!;
while(<DATA>) { print $out "$_"; }
close $out or die $!;
tie my @data, 'Tie::File', "nums.txt" or die $!;
my $offset=5;
my $insert="INSERTED";
#insert in a string:
$data[0]=substr($data[0],0,$offset).$insert.substr($data[0],$offset)
if (length($data[0])>$offset);
#insert a new array element that becomes a new file line:
splice @data,$offset,0,join(':',split(//,$insert));
#insert vertically:
$data[$_]=substr($data[$_],0,$offset) .
substr(lc $insert,$_,1) .
substr($data[$_],$offset) for (0..length($insert));
untie @data; #close the file too...
__DATA__
123456789
234567891
345678912
456789123
567891234
678912345
789123456
891234567
912345678
Output:
12345iINSERTED6789
23456n7891
34567s8912
45678e9123
56789r1234
I:N:St:E:R:T:E:D
67891e2345
78912d3456
891234567
912345678
The file modifications with Tie::File are made in place, as the array is modified. You could use Tie::File on just the first line of your file to modify and insert as you requested. You can put sleep between the array modifications and run tail -n +0 -f on the file to watch it change if you wish.
Alternatively, if your file is reasonable size and you want to treat it like characters, you can read the entire file into memory, do string operations on the data, then write the modified data back out. Consider:
use strict; use warnings;
#create the default .txt file:
open (my $out, '>', "nums.txt") or die $!;
while(<DATA>) { print $out "$_"; }
close $out or die $!;
my $data;
open (my $in, '<', "nums.txt") or die $!;
{ local $/=undef; $data=<$in>; }
close $in or die $!;
my $offset=5;
my $insert="INSERTED";
open ($out, '>', "nums.txt") or die $!; # reuse $out; a second my would warn
print $out substr($data,0,$offset).$insert.substr($data,$offset);
close $out or die $!;
__DATA__
123456789
2
3
4
5
6
7
8
9
Output:
12345INSERTED6789
2
3
4
5
6
7
8
9
If you treat files as characters, beware that under Windows, files in text mode have a \r\n for a new line. That is two characters if opened in binary mode.