How can I open a Unicode file with Perl?

I'm using osql to run several sql scripts against a database and then I need to look at the results file to check if any errors occurred. The problem is that Perl doesn't seem to like the fact that the results files are Unicode.
I wrote a little test script to check it, and the output comes out all garbled:
$file = shift;
open OUTPUT, $file or die "Can't open $file: $!\n";
while (<OUTPUT>) {
print $_;
if (/Invalid|invalid|Cannot|cannot/) {
push(@invalids, $file);
print "invalid file - $inputfile - schedule for retry\n";
last;
}
}
Any ideas? I've tried decoding using decode_utf8 but it makes no difference. I've also tried to set the encoding when opening the file.
I think the problem might be that osql puts the result file in UTF-16 format, but I'm not sure. When I open the file in textpad it just tells me 'Unicode'.
Edit: Using perl v5.8.8
Edit: Hex dump:
file name: Admin_CI.User.sql.results
mime type:
0000-0010: ff fe 31 00-3e 00 20 00-32 00 3e 00-20 00 4d 00 ..1.>... 2.>...M.
0000-0020: 73 00 67 00-20 00 31 00-35 00 30 00-30 00 37 00 s.g...1. 5.0.0.7.
0000-0030: 2c 00 20 00-4c 00 65 00-76 00 65 00-6c 00 20 00 ,...L.e. v.e.l...
0000-0032: 31 00 1.

The file is presumably in UCS-2LE (or UTF-16LE) format.
C:\Temp> notepad test.txt
C:\Temp> xxd test.txt
0000000: fffe 5400 6800 6900 7300 2000 6900 7300 ..T.h.i.s. .i.s.
0000010: 2000 6100 2000 6600 6900 6c00 6500 2e00 .a. .f.i.l.e...
When opening such a file for reading, you need to specify the encoding:
#!/usr/bin/perl
use strict; use warnings;
my ($infile) = @ARGV;
open my $in, '<:encoding(UCS-2le)', $infile
or die "Cannot open '$infile': $!";
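Note that the ff fe at the beginning is the byte order mark (BOM). With an explicitly little-endian encoding name such as UCS-2LE, Perl decodes it as an ordinary U+FEFF character at the start of the first line rather than stripping it.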
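As a minimal sketch of the BOM handling described above: opening the file with the generic UTF-16 encoding name (instead of an explicit LE/BE name) lets Perl read the BOM, choose the byte order from it, and drop it from the data. The sample text and the temporary file are assumptions made only so the example is self-contained; they stand in for the osql results file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Build a small UTF-16LE sample with a BOM (bytes ff fe), standing in for
# the osql results file. The text itself is made up for this example.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode('UTF-16LE', "\x{FEFF}1> 2> Msg 15007, Level 1\n");
close $tmp or die "Cannot close '$path': $!";

# The generic name UTF-16 makes the decoder consume the BOM and use it
# to pick the byte order.
open my $in, '<:encoding(UTF-16)', $path
    or die "Cannot open '$path': $!";
my @lines = <$in>;
close $in;

print "first line: $lines[0]";
```

On Windows, the usual advice is to write the layers as '<:raw:encoding(UTF-16):crlf' so that CRLF translation happens after decoding rather than on the raw bytes; treat that as something to verify on your own setup.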

The answer is in the documentation for open, which also points you to perluniintro. :)
open my $fh, '<:encoding(UTF-16LE)', $file or die ...;
You can get a list of the names of the encodings that your perl supports:
% perl -MEncode -le "print for Encode->encodings(':all')"
After that, it's up to you to find out what the file encoding is. This is the same way you'd open any file with an encoding different than the default, whether it's one defined by Unicode or not.
We have a chapter in Effective Perl Programming that goes through the details.

Try opening the file with an I/O layer specified, e.g.:
open OUTPUT, "<:encoding(UTF-8)", $file or die "Can't open $file: $!\n";
See perldoc open for more on this.

#
# -----------------------------------------------------------------------------
# Reads a file and returns a string; if the second param is 'utf8', returns a UTF-8 string
# usage:
# ( $ret , $msg , $str_file )
# = $objFileHandler->doReadFileReturnString ( $file , 'utf8' ) ;
# or
# ( $ret , $msg , $str_file )
# = $objFileHandler->doReadFileReturnString ( $file ) ;
# -----------------------------------------------------------------------------
use Carp qw(cluck); # cluck() comes from Carp; the snippet assumes it is loaded
sub doReadFileReturnString {
my $self = shift;
my $file = shift;
my $mode = shift ;
my $msg = {} ;
my $ret = 1 ;
my $s = q{} ;
$msg = " the file : $file does not exist !!!" ;
cluck ( $msg ) unless -e $file ;
$msg = " the file : $file is not actually a file !!!" ;
cluck ( $msg ) unless -f $file ;
$msg = " the file : $file is not readable !!!" ;
cluck ( $msg ) unless -r $file ;
$msg .= "can not read the file $file !!!";
return ( $ret , "$msg ::: $! !!!" , undef )
unless ((-e $file) && (-f $file) && (-r $file));
$msg = '' ;
$s = eval {
my $string = (); #slurp the file
{
local $/ = undef;
if ( defined ( $mode ) && $mode eq 'utf8' ) {
open FILE, "<:utf8", $file # no trailing space: 3-arg open takes the name literally
or cluck("failed to open \$file $file : $!");
$string = <FILE> ;
die "did not find utf8 string in file: $file"
unless utf8::valid ( $string ) ;
}
else {
open FILE, '<', $file
or cluck "failed to open \$file $file : $!" ;
$string = <FILE> ;
}
close FILE;
}
$string ;
};
if ( $@ ) {
$msg = $! . " " . $@ ;
$ret = 1 ;
$s = undef ;
} else {
$ret = 0 ; $msg = "ok for read file: $file" ;
}
return ( $ret , $msg , $s ) ;
}
#eof sub doReadFileReturnString

Related

eliminate empty files in a subroutine in perl

I want to add code to the following script to eliminate empty output files.
The script converts a single fastq file, or all the fastq files in a folder, to fasta format; each output fasta file keeps the name of its fastq file. The script has an option to exclude all sequences that contain a given number of consecutive N repeats (e.g. NNNNNNNNNNNNNNNNNNATAGTGAAGAATGCGACGTACAGGATCATCTA); I added this option because some sequences consist only of Ns. For example, with -n 15 it excludes every sequence that contains 15 or more consecutive Ns. Up to this point the code works well, but it generates empty files (for those fastq files in which every sequence has 15 or more N repeats and is therefore excluded). I want to delete all the empty files (those without sequences) and report a count of how many files were deleted because they were empty.
Code:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;
my ($infile, $file_name, $file_format, $N_repeat, $help, $help_descp,
$options, $options_descrp, $nofile, $new_file, $count);
my $fastq_extension = "\\.fastq";
GetOptions (
'in=s' => \$infile,
'N|n=i' =>\$N_repeat,
'h|help' =>\$help,
'op' =>\$options
);
# Help
$help_descp =(qq(
Usage:
fastQF -in fastq_folder/ -n 15
or
fastQF -in file.fastq -n 15
));
$options_descrp =(qq(
-in infile.fastq or fastq_folder/ required
-n exclude sequences with more than N repeat optional
-h Help description optional
-op option section optional
));
$nofile =(qq(
ERROR: "No File or Folder Were Chosen !"
Usage:
fastQF -in folder/
Or See -help or -op section
));
# Check Files
if ($help){
print "$help_descp\n";
exit;
}
elsif ($options){
print "$options_descrp\n";
exit;
}
elsif (!$infile){
print "$nofile\n";
exit;
}
#Subroutine to convert from fastq to fasta
sub fastq_fasta {
my $file = shift;
($file_name = $file) =~ s/(.*)$fastq_extension.*/$1/;
# eliminate old files
my $oldfiles= $file_name.".fasta";
if ($oldfiles){
unlink $oldfiles;
}
open LINE, '<', $file or die "can't read or open $file\n";
open OUTFILE, '>>', "$file_name.fasta" or die "can't write $file_name\n";
while (
defined(my $head = <LINE>) &&
defined(my $seq = <LINE>) &&
defined(my $qhead = <LINE>) &&
defined(my $quality = <LINE>)
) {
substr($head, 0, 1, '>');
if (!$N_repeat){
print OUTFILE $head, $seq;
}
elsif ($N_repeat){
my $number_n=$N_repeat-1;
if ($seq=~ m/(n)\1{$number_n}/ig){
next;
}
else{
print OUTFILE $head, $seq;
}
}
}
close OUTFILE;
close LINE;
}
# execute the subroutine to extract the sequences
if (-f $infile) { # -f is true for a plain file (a single fastq), not a folder
fastq_fasta($infile);
}
else {
foreach my $file (glob("$infile/*.fastq")) {
fastq_fasta($file);
}
}
exit;
I tried the following code outside the subroutine (just before exit), but it only works for the last file:
$new_file =$file_name.".fasta";
foreach ($new_file){
if (-z $new_file){
$count++;
if ($count==1){
print "\n\"The choosen File present not sequences\"\n";
print " \"or was excluded due to -n $N_repeat\"\n\n";
}
elsif ($count >=1){
print "\n\"$count Files present not sequences\"\n";
print " \" or were excluded due to -n $N_repeat\"\n\n";
}
unlink $new_file;
}
}
I also tried something similar inside the subroutine, but that code doesn't work either.
Any advice?
Thanks so much!

You should check whether anything was written to your new file at the end of your fastq_fasta subroutine. Just put your code after the close OUTFILE statement:
close OUTFILE;
close LINE;
my $outfile = $file_name.".fasta";
if (-z $outfile)
{
unlink $outfile or die "Error while deleting '$outfile': $!"; # "or", not "||", so die applies to unlink
}
Additionally, it would be better to add a die/warn statement to the other unlink line as well. Empty files should then be deleted.
Another option, if you are not tied to Perl and can use sed and a bash loop:
for i in *.fastq
do
out=$(dirname "$i")/$(basename "$i" .fastq).fasta
sed -n '1~4{s/^@/>/;N;p}' "$i" > "$out"
if [ ! -s "$out" ]
then
echo "Empty output file $out"
rm "$out"
fi
done
Hope that helps!
Best Frank

The easiest thing to do is probably to add a counter to your subroutine to keep track of the number of sequences in the outfile:
sub fastq_fasta {
my $counter1 = 0;
my $file = shift;
($file_name = $file) =~ s/(.*)$fastq_extension.*/$1/;
# eliminate old files
my $oldfiles= $file_name.".fasta";
if ($oldfiles){
unlink $oldfiles;
}
open LINE, '<', $file or die "can't read or open $file\n";
open OUTFILE, '>>', "$file_name.fasta" or die "can't write $file_name\n";
while (
defined(my $head = <LINE>) &&
defined(my $seq = <LINE>) &&
defined(my $qhead = <LINE>) &&
defined(my $quality = <LINE>)
) {
$counter1 ++;
substr($head, 0, 1, '>');
if (!$N_repeat){
print OUTFILE $head, $seq;
}
elsif ($N_repeat){
my $number_n=$N_repeat-1;
if ($seq=~ m/(n)\1{$number_n}/ig){
$counter1 --;
next;
}
else{
print OUTFILE $head, $seq;
}
}
}
close OUTFILE;
close LINE;
return $counter1;
}
You can then delete files when the returned count is zero:
if (-f $infile) { # -f is true for a plain file (a single fastq), not a folder
fastq_fasta($infile);
}
else {
foreach my $file (glob("$infile/*.fastq")) {
if (fastq_fasta($file) == 0) {
$file =~ s/(.*)$fastq_extension.*/$1.fasta/;
unlink $file;
}
}
}
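The counter approach above also makes it easy to report how many empty outputs were deleted, which is the other thing the question asked for. The following is only a sketch under assumptions: the sub name fastq_to_fasta, the two sample reads, and the temporary directory are made up for the example, and lexical filehandles are used in place of the original bareword ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Convert one fastq file to fasta, skipping reads that contain a run of
# $n_repeat or more Ns. Returns the number of sequences written.
sub fastq_to_fasta {
    my ($fastq, $fasta, $n_repeat) = @_;
    my $n_run   = $n_repeat ? 'N' x $n_repeat : undef;
    my $written = 0;
    open my $in,  '<', $fastq or die "Cannot read '$fastq': $!";
    open my $out, '>', $fasta or die "Cannot write '$fasta': $!";
    while (defined(my $head = <$in>)) {
        my $seq   = <$in>;
        my $qhead = <$in>;
        my $qual  = <$in>;
        last unless defined $qual;    # incomplete record at end of file
        next if defined $n_run && index(uc $seq, $n_run) >= 0;
        substr($head, 0, 1, '>');     # "@read" becomes ">read"
        print {$out} $head, $seq;
        $written++;
    }
    close $in;
    close $out or die "Cannot close '$fasta': $!";
    return $written;
}

# Two made-up input files: one normal read, one read that is mostly Ns.
my $dir    = tempdir(CLEANUP => 1);
my %sample = (
    'good.fastq' => "\@r1\nACGTACGT\n+\nIIIIIIII\n",
    'bad.fastq'  => "\@r2\nNNNNNNNNNNNNNNNNACGT\n+\nIIIIIIIIIIIIIIIIIIII\n",
);
for my $name (sort keys %sample) {
    open my $fh, '>', "$dir/$name" or die "Cannot write '$dir/$name': $!";
    print {$fh} $sample{$name};
    close $fh or die "Cannot close '$dir/$name': $!";
}

# Convert each file and delete any output that ended up with no sequences.
my $removed = 0;
for my $name (sort keys %sample) {
    (my $fasta = "$dir/$name") =~ s/\.fastq$/.fasta/;
    if (fastq_to_fasta("$dir/$name", $fasta, 15) == 0) {
        unlink $fasta or warn "Could not delete '$fasta': $!";
        $removed++;
    }
}
print "$removed empty file(s) removed\n";
```

Returning the count from the sub keeps the delete decision with the caller, so the single-file and folder cases can share the same check.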

rename the file according PDF title

I am trying to write a Perl script that renames files, to reduce manual effort. Currently I open each PDF file by hand, copy its title, and rename the file to match the title.
I wrote the code below to rename each PDF according to its title, e.g. if SPE-180024-MS is the title, the PDF should be renamed to that.
By my logic it should rename the file, but the output is not correct.
#!/usr/bin/perl
use strict;
#use warnings;
use Cwd;
use File::Basename;
#use File::Copy;
use File::Find;
use PDF::API2;
use CAM::PDF;
my $path1 = getcwd;
open( F6, ">Ref.txt" );
opendir( DIR, $path1 ) or die $!;
my @dots = grep /(.*?)\-(MS)$/, readdir(DIR);
closedir(DIR);
my @file;
my @files;
my $check;
my $err_1;
my $err_2;
my $err_3;
foreach my $file (@dots) {
#print F6 $file."\n";
opendir DIR1, $file or die "Can't open $file: $!";
my @files = sort grep { -f "$file/$_" } readdir DIR1;
my $data1 = join( ",", <@files> );
closedir DIR1;
#print F6 @files."\n";
my $a = @files;
if ($data1 =~ m#(((\w+)\-(\d+)\-MS)\.(pdf))#
#&& $data1=~m#((\w+)\-(\d+)\-MS\.(xml))#) #((.*?)\.xml)#
) {
my $check = $2;
#print F6 $1."\n";
if ( $data1 =~ m#(((\w+)\-(\d+)\-MS)\.(xml))# ) {
my $check1 = $2;
my $first = $1;
if ( $check eq $file || $check1 eq $file ) {
}
else {
#print F6 $file."\tDIFFERENT FILE PRESENT\n";
}
}
}
foreach my $f1 ( glob("$file/*.xml") ) {
#print F6 $f1."\n";
open( FH, '<', $f1 ) or die "Cannot open file: $f1";
my $data2 = join( "", <FH> );
#print F6 $data2."\n";
close FH;
if ( $data2 =~ m#(<page-count count="(\d+)"/>)# ) {
my $page = $2;
#print F6 $f1."\t".$1."\n";
if ( $f1 =~ m#(.*?)-MS/((.*?)-MS)#s
#SPE-173391-MS/SPE-173393-MS #(.*?)\.(.*?)$/s)
) {
my $f11 = $2;
#print F6 $f11."\n";
if ( $file eq $f11 ) {
}
else {
$err_1
= $err_1
. $file . "\t"
. $f11
. "\tDIFFERENT XML FILE PRESENT\n";
#print F6 $file."\t".$f11."\tDIFFERENT XML FILE PRESENT\n";
#print F6 $file."\tDIFFERENT XML FILE PRESENT\n";
}
foreach my $f2 ( glob("$file/*.pdf") ) {
open( F2, "<$f2" ) or die "Cannot open file: $f2";
my $data = join( "", <F2> );
close F2;
my $xml_list = $data;
my $pdf = PDF::API2->open($f2);
my $pages = $pdf->pages;
#print F6 $f2."\t".$pages."\n";
if ($f2 =~ m#(.*?)-MS/((.*?)-MS)#
#/(.*?)\.(.*?)$/s
) {
my $f21 = $2;
if ( $file eq $f21 ) {
}
else {
$err_2
= $err_2
. $file . "\t"
. $f21
. "\tDIFFERENT PDF FILE PRESENT\n";
#print F6 $file."\t".$f21."\tDIFFERENT PDF FILE PRESENT\n";
}
while ( $f11 =~ m/$f21/gs ) {
if ( $page !~ m#$pages#s ) {
$err_3
= $err_3
. $f1 . "\t"
. $page . "\t"
. $f2 . "\t"
. $pages . "\n";
#print F6 $f1."\t".$page."\t".$f2."\t".$pages."\n";
$data2 =~ s#<page-count count="$page"\/>#<page-count count="$pages"\/>#gs;
open( FH, '>', $f1 ) or die "Cannot open file: $f1";
print FH $data2 . "\n";
close FH;
}
}
}
}
}
}
}
}
close F6;
This is the document. The marked heading is what I want.

You cannot just open a PDF file and operate on it. It's different from a text file, so it has to be parsed.
You can use CAM::PDF. It will convert your PDF to text, which can later be analysed to get the title.
The links provided above cover enough to get your job done. I am reproducing the relevant parts here:
use CAM::PDF;
my $pdf = CAM::PDF->new('test1.pdf');
my $pageNum = 1;
my $page1 = $pdf->getPageContent($pageNum);
The variable $page1 will hold the content of the page specified by $pageNum. The rest is a matter of extracting the required information.
If you would rather convert the entire PDF to text, you can use getpdftext.pl, which is part of CAM::PDF; however, that is inefficient compared to reading a single page.

PDFs usually have a bunch of metadata, among them is the document title. If you're lucky, you will find the desired PDF title in there. A Perl example using PDF::API2 and its info method:
use autodie;
use Modern::Perl;
use PDF::API2;
my $file = '/your/sample/file.pdf';
my $pdf = PDF::API2->open( $file );
my %pdf_info = $pdf->info;
my $title = $pdf_info{Title};
my $renamed_dir = '/some/where/else/';
if ( $title ) {
my $new_name = $renamed_dir . $title;
if ( -f $new_name ) {
warn "File $new_name already exists, move it out of the way!";
} else {
$pdf->saveas( $new_name );
}
} else {
warn "No title found in document info.";
}
If you need to use some part of the text, then you should convert it to text first. Since you failed to mention any OS restrictions you get a Debian/Ubuntu solution for that. First, install the package poppler-utils. Then use the freshly installed tool pdftotext to extract all the text from the PDF. It might be a good idea to use pdftotext -layout. From the resulting text you will have to grep/parse the line with your "title", and then use that to rename (or much safer: copy) the PDF.

Perl : search a keywords from a file and display the occurence

I have two files
1. input.txt
2. keyword.txt
input.txt has contents like
.src_ref 0 "call.s" 24 first
0x000000 0x5a80 0x0060 BRA.l 0x60
.src_ref 0 "call.s" 30 first
0x000002 0x1bc5 RETI
.src_ref 0 "call.s" 31 first
0x000003 0x6840 MOV R0L,R0L
.src_ref 0 "call.s" 35 first
0x000004 0x1bc5 RETI
keyword.txt has contents
MOV
BRA.l
RETI
ADD
SUB
..
etc
Now I want to read the keyword.txt file, search for each keyword in input.txt, and find how many times MOV has occurred, how many times BRA.l has occurred, and so on.
So far I have only managed to get it working from a single file. Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
sub retriver();
my @lines;
my $lines_ref;
my $count;
$lines_ref=retriver();
@lines=@$lines_ref;
$count=@lines;
print "Count :$count\nLines\n";
print join "\n",@lines;
sub retriver()
{
my $file='C:\Users\vk18434\Desktop\input.txt';
my $keyword_file = 'C:\Users\vk18434\Desktop\keywords.txt';
open FILE, $file or die "FILE $file NOT FOUND - $!\n";
my @contents=<FILE>;
open FILE, $keyword_file or die "FILE $file NOT FOUND - $!\n";
my @key=<FILE>;
my @filtered=grep(/^$key$/,@contents);
#my @filtered = grep $_ eq $keywords,@contents;
return \@filtered;
}
Output should look like:
MOV appeared 1 time
RETI appeared 2 times
Any help is appreciated!

I couldn't get your code working, but this code works and is a little easier to read IMO (change the paths back to the ones on your filesystem):
#!/usr/bin/perl
open(my $keywordFile, "<", '/Users/mark/workspace/stackOverflow/keyword.txt')
or die 'Could not open keywords.txt';
foreach my $key(<$keywordFile>) {
chomp $key;
open (my $file, '<', '/Users/mark/workspace/stackOverflow/input.txt')
or die 'Could not open input.txt';
my $count = 0;
foreach my $line (<$file>) {
my $number = () = $line =~ /\Q$key\E/gi; # \Q...\E matches the keyword literally (e.g. the "." in BRA.l)
$count = $count + $number;
}
close($file);
print "$key was found $count times.\n";
}
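The one confusing part is the regex line. Assigning the match to an empty list, () =, forces list context so that every match is returned, and the outer scalar assignment then yields the number of matches. I found it on Stack Overflow and didn't have time to come up with anything cleaner: "Is there a Perl shortcut to count the number of matches in a string?"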
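An alternative to reopening input.txt once per keyword is to read it a single time and tally matches in a hash. This is a sketch rather than a drop-in replacement: the input is embedded as a string so the example is self-contained, keywords are matched case-sensitively as whole whitespace-delimited tokens (unlike the /i substring match above), and the variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keywords and input are embedded here; in practice they would be read
# from keyword.txt and input.txt.
my @keywords = ('MOV', 'BRA.l', 'RETI', 'ADD', 'SUB');
my $input = <<'END';
.src_ref 0 "call.s" 24 first
0x000000 0x5a80 0x0060 BRA.l 0x60
.src_ref 0 "call.s" 30 first
0x000002 0x1bc5 RETI
.src_ref 0 "call.s" 31 first
0x000003 0x6840 MOV R0L,R0L
.src_ref 0 "call.s" 35 first
0x000004 0x1bc5 RETI
END

# Start every keyword at zero so keywords that never occur are reported too.
my %count = map { $_ => 0 } @keywords;

# One alternation of all keywords, each escaped with quotemeta so that the
# "." in BRA.l is literal. Longer keywords go first in the alternation.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @keywords;
my $re = qr/(?<!\S)($alternation)(?!\S)/;

# Read the input once (here from an in-memory filehandle) and tally.
open my $in, '<', \$input or die "Cannot open in-memory input: $!";
while (my $line = <$in>) {
    $count{$1}++ while $line =~ /$re/g;
}
close $in;

for my $kw (@keywords) {
    printf "%s appeared %d time%s\n",
        $kw, $count{$kw}, $count{$kw} == 1 ? '' : 's';
}
```

With the sample data this reports MOV once, BRA.l once, RETI twice, and ADD and SUB zero times.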

Check this and try:
#!/usr/bin/perl
use strict;
use warnings;
my (@text, @lines);
my $lines_ref;
my $count;
$lines_ref = &retriver;
sub retriver
{
my $file='input.txt';
my $keyword_file = 'keywords.txt';
open KEY, $keyword_file or die "FILE $keyword_file NOT FOUND - $!\n";
my @key=<KEY>;
my @filtered;
foreach my $keys(@key)
{
my $total = '0';
chomp($keys);
open FILE, $file or die "FILE $file NOT FOUND - $!\n";
while(<FILE>)
{
my $line = $_;
my $counter = () = $line =~ /\Q$keys\E/gi; # \Q...\E matches the keyword literally (e.g. the "." in BRA.l)
$total = $total + $counter;
}
close(FILE);
print "$keys found in $total\n";
}
}

# perl pe3.pl
Prototype mismatch: sub main::retriver () vs none at pe3.pl line 36.
cygwin warning:
MS-DOS style path detected: C:\Users\xxxxx\Desktop\input.txt
Preferred POSIX equivalent is: /cygdrive/c/Users/xxxxx/Desktop/input.txt
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Count :3
Lines
BRA.l
RETI
RETI

How do I search for a value in a file and print it using Perl?

I am having trouble searching for a value and printing it. This is what I have so far. What am I doing wrong? How do I get the desired value by searching the output?
my $host = $ARGV[0];
my $port = $ARGV[1];
my $domain = $ARGV[2];
my $bean = $ARGV[3];
my $get = $ARGV[4];
open(FILE, ">", "/home/hey");
print FILE "open $host:$port\n";
print FILE "domain $domain\n";
print FILE "bean $bean\n";
print FILE "get -s $get\n";
print FILE "close\n";
close FILE;
open JMX, "/root/jdk1.6.0_37/bin/java -jar /var/scripts/jmxterm-1.0-alpha-4-uber.jar -v silent -n < /home//hey |";
open (dbg, ">", "/home/donejava1");
#print JMX "help \n";
foreach ( <JMX> )
{
chomp;
print $_;
open (LOG, ">", "/home/out1");
print LOG $_;
close LOG;
}
//output
{
committed = 313733;
init = 3221225472;
max = 3137339392;
used = 1796598680;
}
// how do i print 1796598680, looking for the attribute "used" ?

The following example should provide a solution for you.
perl -lne'print $1 if /used\s*=\s*(\d+);/' filename
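The same pattern can be used inside the script instead of a separate one-liner. The sketch below assumes the jmxterm output has already been collected into a string (here hard-coded from the sample output in the question); the hash name %attr is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample output copied from the question; in the real script this text
# would come from the JMX filehandle.
my $output = <<'END';
{
committed = 313733;
init = 3221225472;
max = 3137339392;
used = 1796598680;
}
END

# Collect every "name = number;" pair into a hash, then look up "used".
my %attr;
while ($output =~ /^\s*(\w+)\s*=\s*(\d+);/mg) {
    $attr{$1} = $2;
}

print "used: $attr{used}\n";
```

Keeping all the attributes in a hash means committed, init, and max are available as well without another pass over the output.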

How to read binary file in Perl

I'm having an issue with writing a Perl script to read a binary file.
My code is below; each $file is a file in binary format. I searched the web and applied what I found to my code, then tried to print the data, but it doesn't seem to work well.
Currently it only prints the "&&&&&&&&&&&" and "ppppppppppp" markers, but what I really want is to print each $line so that I can do some post-processing later. Also, I'm not quite sure what $data is; I took it from a sample in an article, which says it is supposed to be a scalar. Can somebody pinpoint where my code goes wrong? Below is what I did.
my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my @dirs = readdir(TEMP1);
closedir(TEMP1);
foreach my $dirs (@dirs) {
next if ($dirs eq "." || $dirs eq "..");
print "---->$dirs\n";
my $d = "$basedir/$key/$dirs";
if (-d "$d") {
opendir (TEMP2, $d) || die $!;
my @files = readdir (TEMP2); # This should read binary files
closedir (TEMP2);
#my $buffer = "";
#opendir (FILE, $d) || die $!;
#binmode (FILE);
#my @files = readdir (FILE, $buffer, 169108570);
#closedir (FILE);
foreach my $file (@files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $file || die $!;
binmode FILE;
foreach ($line = read (FILE, $data, 169108570)) {
print "&&&&&&&&&&&$line\n";
print "ppppppppppp$data\n";
}
close FILE;
}
}
}
I have altered my code as shown below, and now I can read $data. Thanks to J-16 SDiZ for pointing that out. I'm trying to push the data read from each binary file onto an array called "@array", and then grep that array for any string matching "p04", but it fails. Can someone point out where the error is?
my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my @dirs = readdir (TEMP1);
closedir (TEMP1);
foreach my $dirs (@dirs) {
next if ($dirs eq "." || $dirs eq "..");
print "---->$dirs\n";
my $d = "$basedir/$key/$dirs";
if (-d "$d") {
opendir (TEMP2, $d) || die $!;
my @files = readdir (TEMP2); #This should read binary files
closedir (TEMP2);
foreach my $file (@files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $file || die $!;
binmode FILE;
foreach ($line = read (FILE, $data, 169108570)) {
print "&&&&&&&&&&&$line\n";
print "ppppppppppp$data\n";
push @array, $data;
}
close FILE;
}
}
}
foreach $item (@array) {
#print "==>$item<==\n"; # It prints out content of binary file without the ==> and <== if I uncomment this.. weird!
if ($item =~ /p04(.*)/) {
print "=>$item<===============\n"; # It prints "=><===============" according to the number of binary file I have. This is wrong that I aspect it to print the content of each binary file instead :(
next if ($item !~ /^w+/);
open (LOG, ">log") or die $!;
#print LOG $item;
close LOG;
}
}
I changed my code again as follows, but it still doesn't work: judging by the "log" file, it does not extract the "p04" strings correctly. It grabs the whole line, binary data included, like this: "@^@^@^@^G^D^@^@^@^^@p04bbhi06^@^^@^@^@^@^@^@^@^@hh^R^@^@^@^^@^@^@p04lohhj09^@^@^@^^@@". What I expect is to extract only the strings that start with p04, such as p04bbhi06 and p04lohhj09. Here is my code:
foreach my $file (@files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $f || die $!;
binmode FILE;
my @lines = <FILE>;
close FILE;
foreach $cell (@lines) {
if ($cell =~ /b12/) {
push @array, $cell;
}
}
}
#my @matches = grep /p04/, @lines;
#foreach $item (@matches) {
foreach $item (@array) {
#print "-->$item<--";
open (LOG, ">log") or die $!;
print LOG $item;
close LOG;
}

Use:
$line = read (FILE, $data, 169108570);
The data is in $data; and $line is the number of bytes read.
my $f = "$d/$file" ;
print "==>$file\n" ;
open FILE, $file || die $! ;
I guess the full path is in $f, but you are opening $file. (In my testing -- even $f is not the full path, but I guess you may have some other glue code...)
If you just want to walk all the files in a directory, try File::DirWalk or File::Find.

I am not sure if I understood you right.
If you need to read a binary file, you can do the same as for a text file:
open F, "/bin/bash";
my $file = do { local $/; <F> };
close F;
Under Windows you may need to add binmode F; under *nix it works without it.
If you need to find which lines in an array contain some word, you can use the grep function:
my @matches = grep /something/, @array_to_grep;
You will get all matched lines in the new array @matches.
BTW: I don't think it's a good idea to read tons of binary files into memory at once. You can search them 1 by 1...
If you need to find where the match occurs you can use another standard function, index:
my $offset = index($file, 'myword'); # index(STRING, SUBSTRING); returns -1 if not found
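Putting these pieces together for the question's goal (pulling names such as p04bbhi06 out of binary data), here is a minimal sketch. The sample bytes are invented to resemble the data shown in the question, and the assumption that a name is "p04" followed by letters and digits is mine, not something the question states.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small made-up binary file: NUL and control bytes around two names.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\x00\x00\x07\x04\x00p04bbhi06\x00\x00hh\x12\x00\x00p04lohhj09\x00\x00";
close $tmp or die "Cannot close '$path': $!";

# Slurp the whole file as raw bytes.
open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
my $data = do { local $/; <$fh> };
close $fh;

# A global match in list context returns only the captured names,
# not the surrounding binary data.
my @names = $data =~ /(p04[A-Za-z0-9]+)/g;
print "$_\n" for @names;

# index(STRING, SUBSTRING) gives the byte offset of the first occurrence.
my $offset = index($data, 'p04');
print "first p04 at byte offset $offset\n";
```

Capturing with a regex, rather than grepping whole lines, is what keeps the binary bytes out of the result.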

I'm not sure I'll be able to answer the OP question exactly, but here are some notes that may be related. (edit: this is the same approach as answer by @Dimanoid, but with more detail)
Say you have a file, which is a mix of ASCII data, and binary. Here is an example in a bash terminal:
$ echo -e "aa aa\x00\x0abb bb" | tee tester.txt
aa aa
bb bb
$ du -b tester.txt
13 tester.txt
$ hexdump -C tester.txt
00000000 61 61 20 61 61 00 0a 62 62 20 62 62 0a |aa aa..bb bb.|
0000000d
Note that byte 00 (specified as \x00) is a non-printable character, (and in C, it also means "end of a string") - thereby, its presence makes tester.txt a binary file. The file has size of 13 bytes as seen by du, because of the trailing \n added by the echo (as it can be seen from hexdump).
Now, let's see what happens when we try to read it with perl's <> diamond operator (see also What's the use of <> in perl?):
$ perl -e '
open IN, "<./tester.txt";
binmode(IN);
$data = <IN>; # does this slurp entire file in one go?
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'
length is: 7
data is: --aa aa
--
Clearly, the entire file didn't get slurped - it broke at the line end \n (and not at the binary \x00). That is because the diamond filehandle <FH> operator is actually shortcut for readline (see Perl Cookbook: Chapter 8, File Contents)
The same link says that one should undef the input record separator, $/ (which by default is set to \n), in order to slurp the entire file. You may want this change to be local only, which is why the braces and local are used instead of undef (see Perl Idioms Explained - my $string = do { local $/; };); so we have:
$ perl -e '
open IN, "<./tester.txt";
print "_$/_\n"; # check if $/ is \n
binmode(IN);
{
local $/; # undef $/; is global
$data = <IN>; # this should slurp one go now
};
print "_$/_\n"; # check again if $/ is \n
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'
_
_
_
_
length is: 13
data is: --aa aa
bb bb
--
... and now we can see the file is slurped in its entirety.
Since binary data implies unprintable characters, you may want to inspect the actual contents of $data by printing via sprintf or pack/unpack instead.
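As a small sketch of that last point, unpack can turn the slurped bytes into a hex listing comparable to the hexdump output above. The string below reproduces the 13 bytes of tester.txt directly, so no file is needed; the variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same 13 bytes as tester.txt: text, a NUL byte, and two newlines.
my $data = "aa aa\x00\nbb bb\n";

# (H2)* unpacks every byte as a two-digit hex string.
my $hex = join ' ', unpack '(H2)*', $data;
print "$hex\n";

# Replace non-printable bytes with "." for a hexdump-style text column.
(my $printable = $data) =~ s/[^\x20-\x7e]/./g;
print "|$printable|\n";
```

The hex line should match the byte column of the hexdump shown earlier, with the NUL byte visible as 00.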
Hope this helps someone,
Cheers!
