perl + read multiple csv files + manipulate files + provide output_files

Apologies if this is a bit long-winded, but I would really appreciate an answer here as I am having difficulty getting this to work.
Building on from this question here, I have this script that works on a CSV file (orig.csv) and produces the CSV file that I want (format.csv). What I want is to make this more generic, so that it accepts any number of '.csv' files and provides an 'output_csv' for each input file. Can anyone help?
#!/usr/bin/perl
use strict;
use warnings;
open my $orig_fh, '<', 'orig.csv' or die $!;
open my $format_fh, '>', 'format.csv' or die $!;
print $format_fh scalar <$orig_fh>; # Copy header line
my %data;
my @labels;
while (<$orig_fh>) {
    chomp;
    my @fields = split /,/, $_, -1;
    my ($label, $max_val) = @fields[1,12];
    if ( exists $data{$label} ) {
        my $prev_max_val = $data{$label}[12] || 0;
        $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
    }
    else {
        $data{$label} = \@fields;
        push @labels, $label;
    }
}
for my $label (@labels) {
    print $format_fh join(',', @{ $data{$label} }), "\n";
}
I was hoping to use this script from here, but am having great difficulty putting the two together:
#!/usr/bin/perl
use strict;
use warnings;
#If you want to open a new output file for every input file
#Do it in your loop, not here.
#my $outfile = "KAC.pdb";
#open( my $fh, '>>', $outfile );
opendir( DIR, "/data/tmp" ) or die "$!";
my @files = readdir(DIR);
closedir DIR;
foreach my $file (@files) {
    open( FH, "/data/tmp/$file" ) or die "$!";
    my $outfile = "output_$file"; #Add a prefix (anything, doesn't have to say 'output')
    open(my $fh, '>', $outfile);
    while (<FH>) {
        my ($line) = $_;
        chomp($line);
        if ( $line =~ m/KAC 50/ ) {
            print $fh $_;
        }
    }
    close($fh);
}
The script reads all the files in the directory, finds the lines containing the string 'KAC 50', and appends them to an output_$file for that input file, so there will be one output_$file for every input file that is read.
Issues with this script that I have noted and was looking to fix:
- it reads the '.' and '..' entries in the directory and produces an 'output_.' and 'output_..' file
- it will also do the same with this script file; one possible fix is sketched below
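One way to fix both issues is to filter what readdir() returns before the loop; a minimal sketch (assuming, as in the main question, that the data files end in .csv):
opendir( DIR, "/data/tmp" ) or die "$!";
# Keep only regular .csv files, skipping '.', '..', this script,
# and any output_* files generated by an earlier run.
my @files = grep { /\.csv$/ && !/^output_/ && -f "/data/tmp/$_" } readdir(DIR);
closedir DIR;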
I was also trying to make it dynamic by getting this script to work in any directory it is run in by adding this code:
use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";
and
opendir( DIR, $path ) or die "$!"; # open the current directory
open( FH, "$path/$file" ) or die "$!"; #open the file
EDIT: I have tried combining the versions but am getting errors. Advice greatly appreciated.
UserName@wabcl13 ~/Perl
$ perl formatfile_QforStackOverflow.pl
Parentheses missing around "my" list at formatfile_QforStackOverflow.pl line 13.
source dir -> /home/UserName/Perl
Can't use string ("/home/UserName/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 28.
Combined code:
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
for my $file (readdir($dh)) {
    next if $file !~ /\.csv$/;
    next if $file =~ /^\Q$output_prefix\E/;
    my $orig_file = "$source_dir/$file";
    my $format_file = "$source_dir/$output_prefix$file";
    # .... old processing code here ...
    ## Start:: This part works on one file edited for this script ##
    #open my $orig_fh, '<', 'orig.csv' or die $!; #lines 14 and 15 above already do this!!
    #open my $format_fh, '>', 'format.csv' or die $!;
    #print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changing
    print $format_file scalar <$orig_file>; # Copy header line
    my %data;
    my @labels;
    #while (<$orig_fh>) { #orig needs changing
    while (<$orig_file>) {
        chomp;
        my @fields = split /,/, $_, -1;
        my ($label, $max_val) = @fields[1,12];
        if ( exists $data{$label} ) {
            my $prev_max_val = $data{$label}[12] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }
    for my $label (@labels) {
        #print $format_fh join(',', @{ $data{$label} }), "\n"; #orig needs changing
        print $format_file join(',', @{ $data{$label} }), "\n";
    }
    ## END:: This part works on one file edited for this script ##
}

How do you plan on inputting the list of files to process and their preferred output destination? Maybe just have a fixed directory in which you want to process all the csv files, and prefix the results.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $source_dir = '/some/dir/with/csv/files';
my $output_prefix = 'format_';
opendir my $dh, $source_dir;
for my $file (readdir($dh)) {
    next if $file !~ /\.csv$/;
    next if $file =~ /^\Q$output_prefix\E/;
    my $orig_file = "$source_dir/$file";
    my $format_file = "$source_dir/$output_prefix$file";
    # .... old processing code here ...
}
Alternatively, you could just have an output directory instead of prefixing the files. Either way, this should get you on your way.
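For completeness, here is one way the two pieces might be combined once the loop opens real filehandles — a sketch that assumes the same column layout as the original script (label in field 1, maximum value in field 12) and reuses the placeholder directory from above:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;    # open/opendir failures die automatically

my $source_dir    = '/some/dir/with/csv/files';
my $output_prefix = 'format_';

opendir my $dh, $source_dir;
for my $file (readdir $dh) {
    next if $file !~ /\.csv$/;
    next if $file =~ /^\Q$output_prefix\E/;
    # Open real filehandles; $orig_file and $format_file alone are only names.
    open my $orig_fh,   '<', "$source_dir/$file";
    open my $format_fh, '>', "$source_dir/$output_prefix$file";
    print $format_fh scalar <$orig_fh>;    # Copy header line
    my (%data, @labels);
    while (<$orig_fh>) {
        chomp;
        my @fields = split /,/, $_, -1;
        my ($label, $max_val) = @fields[1,12];
        if ( exists $data{$label} ) {
            my $prev_max_val = $data{$label}[12] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }
    print $format_fh join(',', @{ $data{$_} }), "\n" for @labels;
    close $orig_fh;
    close $format_fh;
}
closedir $dh;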

Related

How to handle soft links in Perl

Hi, I am writing a Perl script to search for a file inside a specified directory, using File::Find. But the directory I was searching turned out to point to a symlink: if I give the symlink path, the file is found correctly, but if I give the folder path, it says the file is not found. I have included the script below. How can I tell whether the folder/directory being searched points to a symlink, and if it does, how can I get the symlink path to continue the search?
#!/pkg/qct/software/perl/5.22.0/bin/perl -w
##!usr/bin/perl
use strict;
#use warnings;
use File::Find;
my $gds_file_num;
my @gds_file;
our $dir = "/<path>/";
my @files_list;
use 5.010;
open( my $fh, '<', "data.txt") or die "Can't open data.txt: $!";
# -----------------------------------------------------------------------------
# Find the file from given directory path
# -----------------------------------------------------------------------------
sub find_file {
    my $file_name = $_[0];
    my $dir_path = $_[1];
    my @result;
    undef @result;
    print "file=$file_name dir=$dir_path\n";
    find ( sub {
        return unless /$file_name$/;
        push @result, $File::Find::name;
    }, $dir_path );
    if(($result[0] ~~ undef)){
        print "[Warn] : File not exist : $file_name is not available under $dir_path\n";
        return 0;
    }
    else {
        return $result[0];
    }
}
while ( my $line = <$fh> ) {
    # chomp $line;
    if ( $line =~ /g0hd/ ) {
        print $line;
    }
}
close $fh;
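One approach worth noting: the -l filetest and readlink can tell you whether a path is a symlink and where it points, and File::Find will descend through symlinked directories if you pass the follow option. A minimal sketch (the directory argument is illustrative):
use strict;
use warnings;
use File::Find;

my $dir = shift @ARGV or die "usage: $0 <dir>\n";
# Report whether the starting directory is itself a symlink.
printf "%s is a symlink to %s\n", $dir, readlink $dir if -l $dir;
# follow => 1 makes find() traverse symlinked directories as well
# (File::Find also tracks visited nodes to avoid symlink loops).
find( { wanted => sub { print "$File::Find::name\n" if -f }, follow => 1 }, $dir );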

What produces the whitespace in my Perl program?

As the title says, I have a program, or rather two functions, to read a file into an array and to write an array back to a file. But now to the main reason why I am writing this: when running my test several times, the test program that exercises my functions produces more and more whitespace. Could somebody explain my mistake and correct me?
My code:
Helper.pm:
#!/usr/bin/env perl
package KconfCtl::Helper;
sub file_to_array($) {
    my $file = shift();
    my ( $filestream, $string );
    my @rray;
    open( $filestream, $file ) or die("cant open $file: $!");
    @rray = <$filestream>;
    close($filestream);
    return @rray;
}
sub array_to_file($$;$) {
    my @rray = @{ shift() };
    my $file = shift();
    my $mode = shift();
    $mode='>' if not $mode;
    my $filestream;
    if ( not defined $file ) {
        $filestream = STDOUT;
    }
    else {
        open( $filestream, $mode, $file ) or die("cant open $file: $!");
    }
    my $l = @rray; print $l,"\n";
    foreach my $line (@rray) {
        print $filestream "$line\n";
    }
    close($filestream);
}
1;
test_helper.pl:
use KconfCtl::Helper;
use strict;
my @t;
@t = KconfCtl::Helper::file_to_array("kconf.test");
#print @t;
my $t_index=@t;
@t[$t_index]="n";
KconfCtl::Helper::array_to_file(\@t, "kconf.test", ">");
The result after the first run:
n
and after the second run:
n
n
When you read from a file, the data includes the newline characters at the end of each line. You're not stripping those off, but you are adding an additional newline when you output the data again. That means your file gains additional blank lines each time you read and write it.
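In other words, each read/write cycle turns "foo\n" into "foo\n\n"; a two-line illustration (the names here are just for show):
my $line = "foo\n";   # as read from the file, newline still attached
print $fh "$line\n";  # writes "foo\n\n" -- an extra blank line each pass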
Also, you must always use strict and use warnings 'all' at the top of every Perl script; you should avoid using subroutine prototypes; and you should declare all of your variables as late as possible.
Here's a more idiomatic version of your module code which removes the newlines on input using chomp. Note that you don't need the #! line on the module file as it won't be run from the command line, but you may want it on the program file. It's also more normal to export symbols from a module using the Exporter module so that you don't have to qualify the subroutine names by prefixing them with the full package name.
use strict;
use warnings 'all';
package KconfCtl::Helper;
sub file_to_array {
    my ($file) = @_;
    open my $fh, '<', $file or die qq{Can't open "$file" for input: $!}; #'
    chomp(my @array = <$fh>);
    return @array;
}
sub array_to_file {
    my ($array, $file, $mode) = @_;
    $mode //= '>';
    my $fh;
    if ( $file ) {
        open $fh, $mode, $file or die qq{Can't open "$file" for output: $!}; #'
    }
    else {
        $fh = \*STDOUT;
    }
    print $fh $_, "\n" for @$array;
}
1;
and your test program would be like this:
#!/usr/bin/env perl
use strict;
use warnings 'all';
use KconfCtl::Helper;
use constant FILE => 'kconf.test';
my @t = KconfCtl::Helper::file_to_array(FILE);
push @t, 'n';
KconfCtl::Helper::array_to_file(\@t, FILE);
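As a sketch of the Exporter suggestion above, the module could offer its subroutines for import, so the test program can call them without the full package prefix:
package KconfCtl::Helper;
use strict;
use warnings 'all';
use Exporter 'import';
our @EXPORT_OK = qw( file_to_array array_to_file );
# ... subroutine definitions as above ...
1;
and in the test program:
use KconfCtl::Helper qw( file_to_array array_to_file );
my @t = file_to_array(FILE);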
When you read in from your file, you need to chomp() the lines, or else the \n at the end of the line is included.
Try this and you'll see what's happening:
use Data::Dumper; ## add this line
sub file_to_array($) {
    my $file = shift();
    my ( $filestream, $string );
    my @rray;
    open( $filestream, '<', $file ) or die("cant open $file: $!");
    @rray = <$filestream>;
    close($filestream);
    print Dumper( \@rray ); ### add this line
    return @rray;
}
You can add
foreach (@rray) {
    chomp();
}
into your module to stop this happening.
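Note that chomp() also accepts a list, so the loop can be collapsed into a single statement:
chomp(@rray);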

Merge txt files in Perl, but modify them before, leaving original files untouched

I've already posted a question and fixed the problem in my code, but now my "specification has changed", so to speak, and I need to change some things about it.
Here's code that takes all .txt files from the current directory, cuts off the last line of the first file, the first and last lines of every file in between, and the first line of the last file, and writes everything to a new file (in other words: it merges all files, deleting headers and footers so that the new file has only one header and one footer).
#!/usr/bin/perl
use warnings;
use Cwd;
use Tie::File;
use Tie::Array;
my $cwd = getcwd();
my $buff = '';
# Get all files in cwd.
my @files = grep ( -f ,<*.txt>);
# Cut off header and footer of $files[1] to $files[$#files-1],
# but only footer of $files[0] and header of $files[$#files]
for (my $i = 0; $i <= $#files; $i++) {
    print 'Opening ' . $files[$i] . "\n";
    tie (@lines, Tie::File, $files[$i]) or die "can't update $file: $!";
    splice @lines, 0, 1 unless $i == 0;
    splice @lines, -1, 1 unless $i == $#files;
    untie @lines;
    open (file, "<", $files[$i]) or die "can't update $file: $!";
    while (my $line =<file>) {
        $buff .= $line;
    }
    close file;
}
# Write the buffer to a new file.
my $allfilename = $cwd.'/Trace.txt';
print 'Writing all files into new file: ' . $allfilename . "\n";
open $outputfile, ">".$allfilename or die "can't write to new file $outputfile: $!";
# Write the buffer into the output file.
print $outputfile $buff;
close $outputfile;
My problem: I don't want to change the original files, but my code does exactly that, and I'm having trouble coming up with a solution. The simplest way (simple meaning not having to change too much code) would be to copy all the files to a tmp directory, mess around with them there, and leave the original files untouched. Problem: a simple use of dircopy doesn't do it for me, since you have to give a new tmp dir to the dircopy function, making the code only usable on Windows or UNIX systems (but I need portability).
The next approach would be to make use of the File::Temp module, but I'm really having trouble with the docs on this one.
Does anybody have a good idea on this one?
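On the File::Temp point, the module can create a scratch directory portably and remove it for you when the program exits; a minimal sketch (File::Copy and File::Spec are used for portable copying, and the glob pattern matches the .txt files from the script above):
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Copy qw( copy );
use File::Spec;

# A temporary directory that is deleted automatically at program exit.
my $tmpdir = tempdir( CLEANUP => 1 );
for my $file ( grep -f, glob '*.txt' ) {
    my $tmpcopy = File::Spec->catfile( $tmpdir, $file );
    copy( $file, $tmpcopy ) or die "can't copy $file: $!";
    # ... trim headers/footers in $tmpcopy; the original stays untouched ...
}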
I suspected that you didn't really want your original files modified when I answered your previous question.
I don't understand why you've gone back to accumulating all the text in a buffer before printing it, or why you've removed use strict, which is essential to any well-written Perl code.
Here's my previous solution modified to leave the input data untouched.
use strict;
use warnings;
use Tie::File;
my @files = grep -f, glob '*.txt';
my $all_filename = 'Trace.txt';
open my $out_fh, '>', $all_filename or die qq{Unable to open "$all_filename" for output: $!};
for my $i ( 0 .. $#files ) {
    my $file = $files[$i];
    next if $file eq $all_filename;
    print "Opening $file\n";
    tie my @lines, 'Tie::File', $file or die qq{Can't open "$file": $!};
    my ($start, $end) = (0, $#lines);
    ++$start unless $i == 0;
    --$end unless $i == $#files;
    print $out_fh "$_\n" for @lines[$start..$end];
}
close $out_fh;
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $outfile = 'Trace.txt';
# Get all files in cwd.
my @files = grep { -f && $_ ne $outfile } <*.txt>;
open my $outfh, '>', $outfile;
for my $file (@files) {
    my @lines = do { local @ARGV = $file; <> };
    shift @lines unless $file eq $files[0];
    pop @lines unless $file eq $files[-1];
    print $outfh @lines;
}
Just do not use Tie::File. Or is there a reason you use it, for example that all your files together do not fit in memory?
A version very close to your current implementation would be something like the following (untested) code. It just skips the part where you update the file only to reopen and read it afterwards. (Note that this is certainly not a very efficient or overly elegant way to do it; it just sticks as close as possible to your implementation.)
#!/usr/bin/perl
use warnings;
use Cwd;
# use Tie::File;
# use Tie::Array;
my $cwd = getcwd();
my $buff = '';
# Get all files in cwd.
my @files = grep ( -f ,<*.txt>);
# Cut off header and footer of $files[1] to $files[$#files-1],
# but only footer of $files[0] and header of $files[$#files]
for (my $i = 0; $i <= $#files; $i++) {
    print 'Opening ' . $files[$i] . "\n";
    open (my $fh, "<", $files[$i]) or die "can't open $files[$i] for reading: $!";
    my @lines = <$fh>;
    splice @lines, 0, 1 unless $i == 0;
    splice @lines, -1, 1 unless $i == $#files;
    foreach my $line (@lines) {
        $buff .= $line;
    }
}
# Write the buffer to a new file.
my $allfilename = $cwd.'/Trace.txt';
print 'Writing all files into new file: ' . $allfilename . "\n";
open $outputfile, ">".$allfilename or die "can't write to new file $outputfile: $!";
# Write the buffer into the output file.
print $outputfile $buff;
close $outputfile;
Based on Miller's answer, but better suited to large files.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $outfile = 'Trace.txt';
# Get all files in cwd.
my @files = grep { -f && $_ ne $outfile } <*.txt>;
open my $outfh, '>', $outfile;
my $counter = 0;
for my $file (@files) {
    open my $fh, '<', $file;
    my ($line, $prev) = ('', '');
    my $l = 0;
    while ($line = <$fh>) {
        print $outfh $prev unless $l++ == 1 and $counter > 0;
        $prev = $line;
    }
    $counter++;
    print $outfh $prev if $counter == @files and $l > 0;
    close $fh;
}

perl + read multiple csv files + manipulate files + provide output_files + syntax error symbol ref

Building on from this question, I am still having syntax trouble with this script:
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
#print "dh -> $dh\n";
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
# added the "()" here ($dh) as otherwise an error
for my $file (readdir($dh)) {
    next if $file !~ /\.csv$/;
    next if $file =~ /^\Q$output_prefix\E/;
    my $orig_file = "$source_dir/$file";
    my $format_file = "$source_dir/$output_prefix$file";
    # .... old processing code here ...
    ## Start:: This part works on one file edited for this script ##
    #open my $orig_fh, '<', 'orig.csv' or die $!; #lines 14 and 15 above already do this!!
    #open my $format_fh, '>', 'format.csv' or die $!;
    print "format_file-> $format_file\n";
    #print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changing
    print {$format_file} scalar <$orig_file>; # Copy header line
    my %data;
    my @labels;
    #while (<$orig_fh>) { #orig needs changing
    while (<$orig_file>) {
        chomp;
        my @fields = split /,/, $_, -1;
        my ($label, $max_val) = @fields[1,12];
        if ( exists $data{$label} ) {
            my $prev_max_val = $data{$label}[12] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }
    for my $label (@labels) {
        #print $format_fh join(',', @{ $data{$label} }), "\n"; #orig needs changing
        print $format_file join(',', @{ $data{$label} }), "\n";
    }
    ## END:: This part works on one file edited for this script ##
}
I can fix the line opendir my $dh, $source_dir; by adding parentheses: ($dh),
but I am still having trouble with this line: print {$format_file} scalar <$orig_file>; # Copy header line
I get the following error:
Can't use string ("/home/Kevin Smith/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 29.
Can anyone advise?
I have tried using the advice here, but with not much joy.
Use print $format_file ... or print ${format_file} ...
However $format_file is just a string containing the name of the file, not a filehandle. You have to open the file:
open my $format_fh, '>', $format_file or die $!;
...
print $format_fh ... ;
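Applied to the loop in question, that means opening both names before copying the header — a short sketch reusing the variables from the script above:
open my $orig_fh,   '<', $orig_file   or die $!;
open my $format_fh, '>', $format_file or die $!;
print $format_fh scalar <$orig_fh>;    # Copy header line
# ... and read from $orig_fh / print to $format_fh from here on ...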

Can I collect the output of find(\&wanted, @directories) in an array?

I am writing a script which will traverse a directory (including its subdirectories) and push the desired files onto an array so that I can work on each file.
Here is my code:
use strict;
use warnings;
use File::Find;
my $path = $ARGV[0];
find({ wanted => \&GetappropriateFile }, $path);
sub GetappropriateFile
{
    my $file = $_;
    my @all_file;
    # print "$file\n";
    if ( -f and /traces[_d+]/)
    {
        #print "$file\n";
        open(my $fh, "<", $file) or die "cannot open file:$!\n";
        while( my $line = <$fh>){
            $line =~ /Cmd\sline:\s+com.android*/;
            push(@all_file,$file);
            #print "$file\n";
        }
        close($fh);
        #print "@all_file\n";
    }
}
Problem area: my $file = $_;
Instead of using $file, if I could find a way to use an array here, then I could easily read those files one by one and filter them.
Here is what I am trying to do: I have to open each file and check for the string "Cmd line: com.android"; as soon as I find this string in a file, I have to push that file onto an array and start reading the next file.
It would be better to avoid global vars.
use strict;
use warnings;
use File::Find qw( find );
sub IsAppropriateFile {
    my ($file) = @_;
    if (-f $file && $file =~ /traces[_d+]/) {
        open(my $fh, "<", $file) or die "cannot open file:$!\n";
        while ( my $line = <$fh> ) {
            if ($line =~ /Cmd\sline:\s+com.android*/) {
                return 1;
            }
        }
    }
    return 0;
}
{
    my $path = $ARGV[0];
    my @matching_files;
    find({
        wanted => sub {
            push @matching_files, $_ if IsAppropriateFile($_);
        },
    }, $path);
    print("$_\n") for @matching_files; # Or whatever.
}
Put the declaration of @all_file outside of the function, and use it after find() finishes:
my @all_file;
sub GetappropriateFile
{
    ..
}
You could also stop reading the file after a successful match:
if ($line =~ /Cmd\sline:\s+com.android*/) {
    push(@all_file, $file);
    last;
}