How to check for files that have two different extensions in Perl

I have a file reflog with the content as below. There will be items with the same name but different extensions. I want to check that each of the items (file1, file2 & file3 here, for example) exists with both extensions (.abc and .def). If both extensions exist, it will perform some regex and print out. Otherwise it should report the file name together with its extension (i.e., if only one of file1.abc or file1.def exists, that name will be printed out).
reflog:
file1.abc
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file8abc.def
file7.abc
file1.def
file9abc.def
file10def.abc
My script is as below (edited from yb007's script), but I have some issues with the output that I don't know how to resolve. I notice the output goes wrong when the reflog file contains any file named *abc.def (such as file8abc.def & file9abc.def). The last 4-character suffix gets trimmed off and the wrong extension is returned (.abc here, but I suppose it should be .def).
#!/usr/bin/perl
use strict;
use warnings;

my @files_abc;
my @files_def;
my $line;
open(FILE1, 'reflog') || die("Could not open reflog");
open(FILE2, '>log')   || die("Could not open log");
while ($line = <FILE1>) {
    if ($line =~ /(.*).abc/) {
        push(@files_abc, $1);
    } elsif ($line =~ /(.*).def/) {
        push(@files_def, $1);
    }
}
close(FILE1);
my %first    = map { $_ => 1 } @files_def;
my @same     = grep { $first{$_} } @files_abc;
my @abc_only = grep { !$first{$_} } @files_abc;
foreach my $abc (sort @abc_only) {
    $abc .= ".abc";
}
my %second   = map { $_ => 1 } @files_abc;
my @same2    = grep { $second{$_} } @files_def;    ## @same and @same2 are equal.
my @def_only = grep { !$second{$_} } @files_def;
foreach my $def (sort @def_only) {
    $def .= ".def";
}
my @combine_all = sort (@same, @abc_only, @def_only);
print "\nCombine all:-\n @combine_all\n";
print "\nList of files with same extension\n @same";
print "\nList of files with abc only\n @abc_only";
print "\nList of files with def only\n @def_only";
foreach my $item (sort @combine_all) {
    print FILE2 "$item\n";
}
close(FILE2);
My output is like this, which is wrong:-
1st: the print-screen output is as below:
Combine all:-
file.abc file.abc file1 file10def.abc file2 file3 file4.abc file5 file6.def file7.abc
List of files with same extension
file1 file2 file3 file5
List of files with abc only
file4.abc file.abc file7.abc file.abc file10def.abc
List of files with def only
file6.def
Log output as below:
**file.abc
file.abc**
file1
file10def.abc
file2
file3
file4.abc
file5
file6.def
file7.abc
Can you please help me take a look at where it goes wrong? Thanks heaps.

ALWAYS add
use strict;
use warnings;
to the head of your program. They will catch most simple errors before you need to ask for help.
You should always check whether a file open succeeded with open FILE, "reflog" or die $!;
You are using a variable $ine that doesn't exist. You mean $line
The lines you read into the array contain a trailing newline. Write chomp @lines; to remove them.
Your regular expressions are wrong, and you need || instead of &&. Instead write if ($line =~ /\.(abc|def)$/)
If you still have problems when these are fixed then please ask again.
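To see why anchoring the extension matters, here is a minimal sketch (the file names are taken from the question's reflog):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored /(.*).abc/ lets the unescaped dot match any character, so
# "file8abc.def" wrongly matches on "8abc". Anchoring with /\.(abc|def)$/
# only accepts a real extension at the end of the name.
for my $name (qw(file1.abc file8abc.def file10def.abc)) {
    if ($name =~ /^(.*)\.(abc|def)$/) {
        print "base=$1 ext=.$2\n";
    }
}
# prints:
#   base=file1 ext=.abc
#   base=file8abc ext=.def
#   base=file10def ext=.abc
```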

Aside from the errors already pointed out, you appear to be loading @lines from FUNC instead of FILE. Is that also a typo?
Also, If reflog truly contains a series of lines with one filename on each line, why would you ever expect the conditional "if ($line =~ /.abc/ && $line =~ /.def/)" to evaluate true?
It would really help if you could post an example from the actual file you are reading from, along with the actual code you are debugging. Or at least edit the question to fix the typos already mentioned

use strict;
use warnings;

my @files_abc;
my @files_def;
my $line;
open(FILE, 'reflog') || die("could not open reflog");
while ($line = <FILE>) {
    if ($line =~ /(.*)\.abc/) {
        push(@files_abc, $1);
    }
    elsif ($line =~ /(.*)\.def/) {
        push(@files_def, $1);
    }
}
close(FILE);
my %second = map { $_ => 1 } @files_def;
my @same   = grep { $second{$_} } @files_abc;
print "\nList of files with same extension\n @same";
foreach my $abc (@files_abc) {
    $abc .= ".abc";
}
foreach my $def (@files_def) {
    $def .= ".def";
}
print "\nList of files with abc extension\n @files_abc";
print "\nList of files with def extension\n @files_def";
Output is
List of files with same extension
file1 file2 file3 file5
List of files with abc extension
file1.abc file2.abc file3.abc file4.abc file5.abc file7.abc file10def.abc
List of files with def extension
file2.def file3.def file5.def file6.def file8abc.def file1.def file9abc.def
Hope this helps...
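To finish the task as originally stated (print the bare name when both extensions exist, otherwise the name with its lone extension), one option is a hash of hashes keyed by base name. This is a sketch assuming reflog holds one file name per line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %ext_for;    # base name => { extension => 1 }
open my $fh, '<', 'reflog' or die "Could not open reflog: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^(.*)\.(abc|def)$/) {
        $ext_for{$1}{$2} = 1;
    }
}
close $fh;

for my $base (sort keys %ext_for) {
    my @exts = sort keys %{ $ext_for{$base} };
    if (@exts == 2) {
        print "$base\n";             # both .abc and .def exist
    } else {
        print "$base.$exts[0]\n";    # report the lone file with its extension
    }
}
```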

You don't need to slurp the whole file; you can read one line at a time. I think this code works on this extended version of your reflog file:
xx.pl
#!/usr/bin/env perl
use strict;
use warnings;

open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log'    or die "Failed to create file log for writing ($!)";
my ($oldline, $oldname, $oldextn) = ("", "", "");
while (my $newline = <$file>)
{
    chomp $newline;
    $newline =~ s/^\s*//;
    my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
    if ($oldname eq $newname)
    {
        # Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
        print $func "$newname\n";
        print "$newname\n";
        $oldline = "";
        $oldname = "";
        $oldextn = "";
    }
    else
    {
        print $func "$oldline\n" if ($oldline);
        print "$oldline\n" if ($oldline);
        $oldline = $newline;
        $oldname = $newname;
        $oldextn = $newextn;
    }
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
#unlink "reflog";
chmod 0644, "log";
close $func;
close $file;
Since the code does not actually check the extensions, it would be feasible to omit $oldextn and $newextn; on the other hand, you might well want to check the extensions if you're sufficiently worried about the input format to need to deal with leading white space.
I very seldom find it good for a processing script like this to remove its own input, hence I've left unlink "reflog"; commented out; your mileage may vary. I would also often just read from standard input and write to standard output; that would simplify the code quite a bit. This code writes to both the log file and to standard output; obviously, you can omit either output stream. I was too lazy to write a function to handle the writing, so the print statements come in pairs.
This is a variant on control-break reporting.
reflog
file1.abc
file1.def
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file7.abc
Output
$ perl xx.pl
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$ cat log
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$
To handle unsorted file names with blank lines
#!/usr/bin/env perl
use strict;
use warnings;

open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log'    or die "Failed to create file log for writing ($!)";
my @lines;
while (<$file>)
{
    chomp;
    next if m/^\s*$/;
    push @lines, $_;
}
@lines = sort @lines;
my ($oldline, $oldname, $oldextn) = ("", "", "");
foreach my $newline (@lines)
{
    chomp $newline;
    $newline =~ s/^\s*//;
    my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
    if ($oldname eq $newname)
    {
        # Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
        print $func "$newname\n";
        print "$newname\n";
        $oldline = "";
        $oldname = "";
        $oldextn = "";
    }
    else
    {
        print $func "$oldline\n" if ($oldline);
        print "$oldline\n" if ($oldline);
        $oldline = $newline;
        $oldname = $newname;
        $oldextn = $newextn;
    }
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
#unlink "reflog";
chmod 0644, "log";
close $func;
close $file;
This is very similar to the original code I posted. The new lines are these:
my @lines;
while (<$file>)
{
    chomp;
    next if m/^\s*$/;
    push @lines, $_;
}
@lines = sort @lines;
my ($oldline, $oldname, $oldextn) = ("", "", "");    # Old
foreach my $newline (@lines)
This reads the 'reflog' file, skipping blank lines, saving the rest in the @lines array. When the lines are all read, they're sorted. Then, instead of a loop reading from the file, the new code reads entries from the sorted array of lines. The rest of the processing is as before. For your described input file, the output is:
file1
file2
file3
Urgh: the chomp $newline; is not needed, though it is not otherwise harmful. The old-fashioned chop (a precursor to chomp) would have been dangerous. Score one for modern Perl.

open(FILE, "reflog") or die $!;
open(FUNC, '>log')   or die $!;
my %seen;
while (my $line = <FILE>) {
    chomp $line;
    $line =~ s/^\s*//;
    if ($line =~ /(.+)\.(abc|def)$/) {
        $seen{$1}++;
    }
}
foreach my $file (keys %seen) {
    if ($seen{$file} > 1) {
        ## do whatever you want to
    }
}
unlink "reflog";
chmod(0750, "log");
close(FUNC);
close(FILE);

Related

Can't find file trying to move

I'm trying to clean up a directory that contains a lot of subdirectories, some of which actually belong in other subdirectories, not the main directory.
For example, there is
Main directory
sub1
sub2
sub3
HHH
And HHH belongs in sub3. HHH has multiple text files inside it (as well as some ..txt and ...txt files that I would like to ignore), and each of these text files contains a string
some_pattern [sub3].
So, I attempted to write a script that looks into each file and then moves it into its corresponding directory:
use File::Find;
use strict;
use warnings;
use File::Copy;

my $DATA = "D:/DATA/DATA_x/*";
my @dirs = grep { -d } glob $DATA;
foreach (@dirs) {
    if ($_ =~ m/HHH/) {
        print "$_\n";
        my $file = "$_/*";
        my @files = grep { -f } glob $file;
        foreach (@files) {
            print "file $_\n";
        }
        foreach (@files) {
            print "\t$_\n";
            my @folders = split('/', $_);
            if ($folders[4] eq '..txt' or $folders[4] eq '...txt') {
                print "$folders[4] ..txt\n";
            }
            foreach (@folders) {
                print "$_\n";
            }
            open(FH, '<', $_);
            my $value;
            while (my $line = <FH>) {
                if ($line =~ m/some_pattern/) {
                    ($value) = $line =~ /\[(.+?)\]/;
                    ($value) =~ s/\s*$//;
                    print "ident'$value'\n";
                    my $new_dir = "$folders[0]/$folders[1]/$folders[2]/$value/$folders[3]/$folders[4]";
                    print "making $folders[0]/$folders[1]/$folders[2]/$value/$folders[3]\n";
                    print "file is $folders[4]\n";
                    my $new_over_dir = "$folders[0]/$folders[1]/$value/$folders[2]/$folders[3]";
                    mkdir $new_over_dir or die "Can't make it $!";
                    print "going to swap\n '$_'\n for\n '$new_dir'\n";
                    move($_, $new_dir) or die "Can't $!";
                }
            }
        }
    }
}
It's saying
Can't make it No such file or directory at foo.pl line 57, <FH> line 82.
Why is it saying that it can't make a directory that doesn't exist yet?
A while later: here is my final script:
use File::Find;
use strict;
use warnings;
use File::Copy;

my $DATA = "D:/DATA/DATA_x/*";
my @dirs = grep { -d } glob $DATA;
foreach (@dirs) {
    if ($_ =~ m/HHH/) {
        my $value;
        my @folders;
        print "$_\n";
        my $file = "$_/*";
        my @files = grep { -f } glob $file;
        foreach (@files) {
            print "file $_\n";
        }
        foreach (@files) {
            print "\t$_\n";
            @folders = split('/', $_);
            if ($folders[4] eq '..txt' or $folders[4] eq '...txt') {
                print "$folders[4] ..txt\n";
            }
            foreach (@folders) {
                print "$_\n";
            }
            open(FH, '<', $_);
            while (my $line = <FH>) {
                if ($line =~ m/some_pattern/) {
                    ($value) = $line =~ /\[(.+?)\]/;
                    ($value) =~ s/\s*$//;
                    print "ident'$value'\n";
                }
            }
        }
        if ($value) {
            print "value $value\n";
            my $dir1 = "/$folders[1]/$folders[2]/$folders[3]/$folders[4]/$folders[5]";
            my $dir2 = "/$folders[1]/$folders[2]/$folders[3]/$folders[4]/$value";
            system("cp -r $dir1 $dir2");
        }
    }
}
This works. It looks like part of my problem from before was that I was trying to run this on a directory in my D: drive--when I moved it to the C: drive, it worked fine without any permissions errors or anything. I did try to implement something with Path::Tiny, but this script was so close to being functional (and it was functional in a Unix environment), that I decided to just complete it.
You really should read the Path::Tiny docs. They probably contain everything you need.
Some starting points, without error handling and so on...
use strict;
use warnings;
use feature 'say';
use Path::Tiny;

my $start = path('D:/DATA/DATA_x');
my $iter  = path($start)->iterator({ recurse => 1 });
while ( my $curr = $iter->() ) {
    # select the needed files here - add more conditions if needed
    next if $curr->is_dir;                  # skip directories
    next if $curr =~ m/HHH.*\.{2,3}txt$/;   # skip ..txt / ...txt
    #say "$curr";
    my $content = $curr->slurp;
    if ( $content =~ m/some_pattern/ ) {
        # do something with the file
        say "doing something with $curr";
        my $newfilename = path("insert what you need here");  # create the needed new path for the file
        path($newfilename->dirname)->mkpath;    # make directories
        $curr->move($newfilename);              # move the file
    }
}
Are you sure of the directory path you are trying to create? The mkdir call might be failing because some of the intermediate directories don't exist. If your code is robust enough to ensure that the variable $new_over_dir contains the directory path you have to create, you can use the make_path function from the Perl module File::Path to create the new directory instead of mkdir.
From the documentation of make_path:
The make_path function creates the given directories if they don't exist before, much like the Unix command mkdir -p.
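A short sketch of make_path in that spirit; the path used here is just an example from the question:

```perl
use strict;
use warnings;
use File::Path qw(make_path);

my $new_over_dir = "D:/DATA/DATA_x/sub3/HHH";   # illustrative path
# make_path creates any missing intermediate directories (like mkdir -p)
# and throws an exception on a fatal error, so wrap it in eval to catch that.
eval { make_path($new_over_dir) };
die "Can't create $new_over_dir: $@" if $@;
```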

how to find the first occurrence of a string in all the files in a folder in perl

I'm trying to find the line of the first occurrence of the string "victory" in each .txt file in a folder. For each first "victory" in a file, I would like to save the number from that line to @num and the file name to @filename.
Example: for the file a.txt that starts with the line "lalala victory 123456" -> $num[$i]=123456 and $filename[$i]="a.txt"
@ARGV holds all the file names. My problem is that I'm trying to go line by line and I don't know what I'm doing wrong.
One more thing - how can I get the last occurrence of "victory" in the last file?
use strict;
use warnings;
use File::Find;

my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push @ARGV, $File::Find::name } }, $dir);
$^I = ".bak";
my $argvv;
my $counter  = 0;
my $prev_arg = 0;
my $line     = 0;
my @filename = 0;
my @num      = 0;
my $i        = 0;
foreach $argvv (@ARGV)
{
    #open $line, $argvv or die "Could not open file: $!";
    my $line = IN
    while (<$line>)
    {
        if (/victory/)
        {
            $line = s/[^0-9]//g;
            $first_bit[$i] = $line;
            $filename[$i]  = $argvv;
            $i++;
            last;
        }
    }
    close $line;
}
for ($i = 0; $i < 3; $i++)
{
    print $filename[$i] . " " . $num[$i] . "\n";
}
Thank you very much! :)
Your example script has a number of minor problems. The following example should do what you want in a fairly clean manner:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Find the files we're interested in parsing
my @files = ();
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push @files, $File::Find::name } }, $dir);

# We'll store our results in a hash, rather than in 2 arrays as you did
my %foundItems = ();

foreach my $file (@files)
{
    # Using a lexical file handle is the recommended way to open files
    open my $in, '<', $file or die "Could not open $file: $!";
    while (<$in>)
    {
        # Uncomment the next two lines to see what's being parsed
        # chomp;        # Not required, but helpful for the debug print below
        # print "$_\n"; # Print out the line being parsed; for debugging

        # Capture the number if we find the word 'victory'
        # This assumes the number is immediately after the word; if that
        # is not the case, it's up to you to modify the logic here
        if (m/victory\s+(\d+)/)
        {
            $foundItems{$file} = $1;    # Store the item
            last;
        }
    }
    close $in;
}

foreach my $file (sort keys %foundItems)
{
    print "$file => $foundItems{$file}\n";
}
The below searches for the string abc in all the files (file*.txt) and prints only the first matching line from each file.
perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' file*.txt
tested:
> cat temp
abc 11
22
13
,,
abc 22
bb
cc
,,
ww
kk
ll
,,
> cat temp2
abc t goes into 1000
fileA1, act that abc specific place
> perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' temp temp2
abc 11
abc t goes into 1000
>
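As for the follow-up question (the last occurrence of "victory" in the last file), a sketch: scan only the final file and keep overwriting the captured number instead of stopping at the first match. This assumes, like the answer above, that the number follows the word:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @files = @ARGV;              # assumes the file names are on the command line
my $last_file = $files[-1];
my $last_num;
open my $in, '<', $last_file or die "Could not open $last_file: $!";
while (<$in>) {
    # overwrite on every match, so the final value is the last occurrence
    $last_num = $1 if /victory\s+(\d+)/;
}
close $in;
print "$last_file => $last_num\n" if defined $last_num;
```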

Perl script to merge multiple files line by line

Can anyone please help me write a Perl script that takes five text files as input and creates a new text file by merging the rows of all five files, row by row?
Should this be done by opening five read streams at a time, or does Perl have something like Java's random-access file readers?
Thank you!
Here is a Perl script that will work on an arbitrary number of files:
use strict;
use warnings;

my @files = ('a.txt', 'b.txt');
my @fh;

# create an array of open filehandles.
@fh = map { open my $f, $_ or die "Can't open $_: $!"; $f } @files;
open my $out_file, ">merged.txt" or die "can't open out_file: $!";

my $output;
do
{
    $output = '';
    foreach (@fh)
    {
        my $line = <$_>;
        if (defined $line)
        {
            # Special case: there might not be a newline at the end of the file;
            # add a newline if none is found.
            $line .= "\n" if ($line !~ /\n$/);
            $output .= $line;
        }
    }
    print {$out_file} $output;
}
while ($output ne '');
a.txt:
foo1
foo2
foo3
foo4
foo5
b.txt:
bar1
bar2
bar3
merged.txt:
foo1
bar1
foo2
bar2
foo3
bar3
foo4
foo5
This program expects a list of files on the command line (or, on Unix systems, a wildcard file spec). It creates an array of filehandles @fh for these files and then reads from each of them in turn, printing the merged data to STDOUT.
use strict;
use warnings;

my @fh;
for (@ARGV) {
    open my $fh, '<', $_ or die "Unable to open '$_' for reading: $!";
    push @fh, $fh;
}

while (grep { not eof } @fh) {
    for my $fh (@fh) {
        if (defined(my $line = <$fh>)) {
            chomp $line;
            print "$line\n";
        }
    }
}
If a non-Perl solution is OK with you, you can try this:
paste -d"\n\n\n\n\n" f1 f2 f3 f4 f5
where f1,f2..are your text files.

How to read binary file in Perl

I'm having an issue writing a Perl script to read a binary file.
My code is as follows, where the $file entries are files in binary format. I tried to search the web and apply what I found in my code, and tried to print the result out, but it doesn't seem to work well.
Currently it only prints the '&&&&&&&&&&&' and 'ppppppppppp' markers, but what I really want is for it to print out each $line, so that I can do some other post-processing later. Also, I'm not quite sure what $data is, as I see it is part of the code from a sample in an article stating it is supposed to be a scalar. I need somebody to pinpoint where my code goes wrong. Below is what I did.
my $tmp = "$basedir/$key";
opendir(TEMP1, "$tmp");
my @dirs = readdir(TEMP1);
closedir(TEMP1);

foreach my $dirs (@dirs) {
    next if ($dirs eq "." || $dirs eq "..");
    print "---->$dirs\n";
    my $d = "$basedir/$key/$dirs";
    if (-d "$d") {
        opendir(TEMP2, $d) || die $!;
        my @files = readdir(TEMP2);    # This should read binary files
        closedir(TEMP2);

        #my $buffer = "";
        #opendir(FILE, $d) || die $!;
        #binmode(FILE);
        #my @files = readdir(FILE, $buffer, 169108570);
        #closedir(FILE);

        foreach my $file (@files) {
            next if ($file eq "." || $file eq "..");
            my $f = "$d/$file";
            print "==>$file\n";
            open FILE, $file || die $!;
            binmode FILE;
            foreach ($line = read(FILE, $data, 169108570)) {
                print "&&&&&&&&&&&$line\n";
                print "ppppppppppp$data\n";
            }
            close FILE;
        }
    }
}
I have altered my code so that it goes as below. Now I can read $data. Thanks, J-16 SDiZ, for pointing that out. I'm trying to push the info I got from the binary file into an array called @array, thinking to grep the array for strings matching "p04", but it fails. Can someone point out where the error is?
my $tmp = "$basedir/$key";
opendir(TEMP1, "$tmp");
my @dirs = readdir(TEMP1);
closedir(TEMP1);

foreach my $dirs (@dirs) {
    next if ($dirs eq "." || $dirs eq "..");
    print "---->$dirs\n";
    my $d = "$basedir/$key/$dirs";
    if (-d "$d") {
        opendir(TEMP2, $d) || die $!;
        my @files = readdir(TEMP2);    # This should read binary files
        closedir(TEMP2);
        foreach my $file (@files) {
            next if ($file eq "." || $file eq "..");
            my $f = "$d/$file";
            print "==>$file\n";
            open FILE, $file || die $!;
            binmode FILE;
            foreach ($line = read(FILE, $data, 169108570)) {
                print "&&&&&&&&&&&$line\n";
                print "ppppppppppp$data\n";
                push @array, $data;
            }
            close FILE;
        }
    }
}

foreach $item (@array) {
    #print "==>$item<==\n";    # It prints out the content of the binary file without the ==> and <== if I uncomment this... weird!
    if ($item =~ /p04(.*)/) {
        print "=>$item<===============\n";    # It prints "=><===============" once per binary file. This is wrong; I expect it to print the content of each binary file instead :(
        next if ($item !~ /^w+/);
        open(LOG, ">log") or die $!;
        #print LOG $item;
        close LOG;
    }
}
Again, I changed my code as follows, but it still doesn't work, as it is not able to grep "p04" correctly, judging by the "log" file. It grepped the whole file, including binary, like this: "#^#^#^#^G^D^#^#^#^^#p04bbhi06^#^^#^#^#^#^#^#^#^#hh^R^#^#^#^^#^#^#p04lohhj09^#^#^#^^##". What I expect is for it to grep only the items with p04, such as p04bbhi06 and p04lohhj09. Here is how my code goes:-
foreach my $file (@files) {
    next if ($file eq "." || $file eq "..");
    my $f = "$d/$file";
    print "==>$file\n";
    open FILE, $f || die $!;
    binmode FILE;
    my @lines = <FILE>;
    close FILE;
    foreach $cell (@lines) {
        if ($cell =~ /b12/) {
            push @array, $cell;
        }
    }
}

#my @matches = grep /p04/, @lines;
#foreach $item (@matches) {
foreach $item (@array) {
    #print "-->$item<--";
    open(LOG, ">log") or die $!;
    print LOG $item;
    close LOG;
}
Use:
$line = read (FILE, $data, 169108570);
The data is in $data; and $line is the number of bytes read.
my $f = "$d/$file" ;
print "==>$file\n" ;
open FILE, $file || die $! ;
I guess the full path is in $f, but you are opening $file. (In my testing -- even $f is not the full path, but I guess you may have some other glue code...)
If you just want to walk all the files in a directory, try File::DirWalk or File::Find.
I am not sure if I understood you right.
If you need to read a binary file, you can do the same as for a text file:
open F, "/bin/bash";
my $file = do { local $/; <F> };
close F;
Under Windows you may need to add binmode F; under *nix it works without it.
If you need to find which lines in an array contain some word, you can use the grep function:
my @matches = grep /something/, @array_to_grep;
You will get all matched lines in the new array @matches.
BTW: I don't think it's a good idea to read tons of binary files into memory at once. You can search them one by one...
If you need to find where the match occurs, you can use another standard function, index:
my $offset = index($file, 'myword');
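Applied to the original problem, grepping over lines is fragile because binary data has no meaningful line structure; a global regex match over the slurped content pulls out just the readable identifiers. The sample string below stands in for real file content, and the p04\w+ pattern is an assumption about the identifier format:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data standing in for slurped binary content (assumption:
# identifiers are runs of word characters starting with "p04").
my $data = "\x00\x01p04bbhi06\x00\x02junk\x00p04lohhj09\x00";
my @ids = $data =~ /(p04\w+)/g;
print "$_\n" for @ids;    # prints p04bbhi06 then p04lohhj09
```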
I'm not sure I'll be able to answer the OP's question exactly, but here are some notes that may be related. (edit: this is the same approach as the answer by @Dimanoid, but with more detail)
Say you have a file, which is a mix of ASCII data, and binary. Here is an example in a bash terminal:
$ echo -e "aa aa\x00\x0abb bb" | tee tester.txt
aa aa
bb bb
$ du -b tester.txt
13 tester.txt
$ hexdump -C tester.txt
00000000 61 61 20 61 61 00 0a 62 62 20 62 62 0a |aa aa..bb bb.|
0000000d
Note that byte 00 (specified as \x00) is a non-printable character, (and in C, it also means "end of a string") - thereby, its presence makes tester.txt a binary file. The file has size of 13 bytes as seen by du, because of the trailing \n added by the echo (as it can be seen from hexdump).
Now, let's see what happens when we try to read it with perl's <> diamond operator (see also What's the use of <> in perl?):
$ perl -e '
open IN, "<./tester.txt";
binmode(IN);
$data = <IN>; # does this slurp entire file in one go?
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'
length is: 7
data is: --aa aa
--
Clearly, the entire file didn't get slurped - it broke at the line end \n (and not at the binary \x00). That is because the diamond filehandle operator <FH> is actually a shortcut for readline (see Perl Cookbook: Chapter 8, File Contents)
The same link tells us that one should undef the input record separator, $/ (which by default is set to \n), in order to slurp the entire file. You may want this change to be only local, which is why the braces and local are used instead of undef (see Perl Idioms Explained - my $string = do { local $/; <$fh> };); so we have:
$ perl -e '
open IN, "<./tester.txt";
print "_$/_\n"; # check if $/ is \n
binmode(IN);
{
local $/; # undef $/; is global
$data = <IN>; # this should slurp one go now
};
print "_$/_\n"; # check again if $/ is \n
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'
_
_
_
_
length is: 13
data is: --aa aa
bb bb
--
... and now we can see the file is slurped in its entirety.
Since binary data implies unprintable characters, you may want to inspect the actual contents of $data by printing via sprintf or pack/unpack instead.
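For example, unpack can render $data as a hexdump-style byte listing, making the unprintable bytes visible (using the example file's bytes from above):

```perl
use strict;
use warnings;

my $data = "aa aa\x00\x0abb bb\x0a";      # contents of tester.txt
my $hex  = unpack 'H*', $data;            # one long lowercase hex string
print join(' ', $hex =~ /../g), "\n";     # 61 61 20 61 61 00 0a 62 62 20 62 62 0a
```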
Hope this helps someone,
Cheers!

File manipulation in Perl

I have a simple .csv file that I want to extract data out of and write to a new file.
I want to write a script that reads in a file, reads each line, then splits and structures the columns in a different order; and if a line in the .csv contains 'xxx', doesn't output that line to the output file.
I have already managed to read in a file and create a secondary file; however, I am new to Perl and still trying to work out the commands. The following is a test script I wrote to get to grips with Perl, and I was wondering if I could alter this to do what I need:-
open(FILE, "c1.csv") || die "couldn't open the file!";
open(F1, ">c2.csv")  || die "couldn't open the file!";
#print "start\n";

sub trim($);
sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

$a = 0;
$b = 0;
while ($line = <FILE>)
{
    chop($line);
    if ($line =~ /xxx/)
    {
        $addr = $line;
        $post = substr($line, length($line) - 18, 8);
    }
    $a = $a + 1;
}
print $b;
print " end\n";
Any help is much appreciated.
To manipulate CSV files it is better to use one of the available modules on CPAN. I like Text::CSV:
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, empty_is_undef => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, "<", 'c1.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2');
while (my $l = $csv->getline_hr($fh)) {
    next if ($l->{'field1'} =~ /xxx/);
    printf "Field1: %s Field2: %s\n", $l->{'field1'}, $l->{'field2'};
}
close $fh;
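The snippet above only prints; to also write the reordered rows to the new file, Text::CSV's print method works on an output handle. A sketch, where the column order 2,0,1 and the file names are illustrative:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $in,  '<', 'c1.csv' or die "ERROR: $!";
open my $out, '>', 'c2.csv' or die "ERROR: $!";

while (my $row = $csv->getline($in)) {
    next if grep { defined && /xxx/ } @$row;    # skip lines containing xxx
    $csv->print($out, [ @{$row}[2, 0, 1] ]);    # reorder columns (example order)
}
close $in;
close $out;
```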
If you need to do this only once, and don't need the program later, you can do it with a one-liner:
perl -F, -lane 'next if /xxx/; @n=map { s/(^\s*|\s*$)//g;$_ } @F; print join(",", (map{$n[$_]} qw(2 0 1)));'
Breakdown:
perl -F, -lane
     ^^^  ^ <- split lines at ',' and store fields into array @F
next if /xxx/; #skip lines that contain xxx
@n=map { s/(^\s*|\s*$)//g;$_ } @F;
    #trim spaces from the beginning and end of each field
    #and store the result into new array @n
print join(",", (map{$n[$_]} qw(2 0 1)));
    #recombine array @n into new order - here 2 0 1
    #join them with comma
    #print
Of course, for repeated use, or in a bigger project, you should use some CPAN module. And the above one-liner has many caveats too.