perl + read multiple csv files + manipulate files + provide output_files + syntax error symbol ref - perl

Building on this question, I am still having syntax trouble with this script:
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
#print "dh -> $dh\n";
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
# added the "()" here ($dh) as otherwise an error
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
## Start:: This part works on one file edited for this script ##
#open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
#open my $format_fh, '>', 'format.csv' or die $!;
print "format_file-> $format_file\n";
#print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
print {$format_file} scalar <$orig_file>; # Copy header line
my %data;
my @labels;
#while (<$orig_fh>) { #orig needs changing
while (<$orig_file>) {
chomp;
my @fields = split /,/, $_, -1;
my ($label, $max_val) = @fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \@fields;
push @labels, $label;
}
}
for my $label (@labels) {
#print $format_fh join(',', @{ $data{$label} }), "\n"; #orig needs changing
print $format_file join(',', @{ $data{$label} }), "\n";
}
## END:: This part works on one file edited for this script ##
}
I can fix the line opendir my $dh, $source_dir; by adding parentheses around ($dh), but I am still having trouble with this line:
print {$format_file} scalar <$orig_file>; # Copy header line
I get the following error:
Can't use string ("/home/Kevin Smith/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 29.
Can anyone advise?
I have tried the advice here, but without much success.

Use print $format_file ... or print ${format_file} ...
However, $format_file is just a string containing the name of the file, not a filehandle, so you have to open the file first. The same applies to $orig_file, which the script reads from with <$orig_file>.
open my $format_fh, '>', $format_file or die $!;
...
print $format_fh ... ;
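A minimal sketch of that distinction, assuming a throwaway directory from File::Temp in place of the real source directory. The sample rows and the $seed_fh handle are made up for illustration. Each path string gets its own filehandle, and only the handles are passed to print and to the <...> read operator:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a temporary directory stands in for $source_dir.
my $source_dir  = tempdir( CLEANUP => 1 );
my $orig_file   = "$source_dir/orig.csv";          # a path (plain string)
my $format_file = "$source_dir/format_orig.csv";   # a path (plain string)

# Create a small input file so the example is self-contained (made-up data).
open my $seed_fh, '>', $orig_file or die "Can't write $orig_file: $!";
print {$seed_fh} "id,label\n", "1,a\n", "2,b\n";
close $seed_fh or die "Can't close $orig_file: $!";

# Turn each path into a filehandle before reading or writing.
open my $orig_fh,   '<', $orig_file   or die "Can't read $orig_file: $!";
open my $format_fh, '>', $format_file or die "Can't write $format_file: $!";

print {$format_fh} scalar <$orig_fh>;    # copy the header line
while ( my $line = <$orig_fh> ) {        # copy the remaining lines
    print {$format_fh} $line;
}

close $orig_fh;
close $format_fh or die "Can't close $format_file: $!";
```

Passing $format_file itself to print is what triggers the "symbol ref" error under strict refs, because Perl then tries to treat the path string as the name of a filehandle.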

Related

Writing a file and Reading it in Perl

I'm trying to build a primary key into a new file from an original file which has the following structure (tbl_20180615.txt):
573103150033,0664,54,MSS02VEN*',INT,zxzc,,,,,
573103150033,0665,54,MSS02VEN,INT,zxzc,,,,,
573103150080,0659,29,MSS05ARA',INT,zxzc,,,,,
573103150080,0660,29,MSS05ARA ,INT,zxzc,,,,,
573103154377,1240,72,MSSTRI01,INT,zxzc,,,,,
573103154377,1240,72,MSSTRI01,INT,zxzc,,,,,
I run my Perl script Verify.pl with two arguments: the first is the number of columns used to build the primary key in the new file, and the second is the name of the original file.
(Verify.pl)
#!/usr/bin/perl
use strict;
use warnings;
my $n1 = $ARGV[0];
my $name = $ARGV[1];
$n1 =~ s/"//g;
my $n2 = $n1 + 1;
my %seen;
my ( $file3 ) = qw(log.txt);
open my $fh3, '>', $file3 or die "Can't open $file3: $!";
print "Loading file ...\n";
open( my $file, "<", "$name" ) || die "Can't read file somefile.txt: $!";
while ( <$file> ) {
chomp;
my @rec = split( /,/, $_, $n2 ); # $n2 is used to build the primary key: split into the desired fields
for ( my $i = 0; $i < $n1; $i++ ) {
print $fh3 "@rec[$i],";
}
print $fh3 "\n";
}
close( $file );
print "Done!\n";
#########to check duplicates
my ($file4) = qw(log.txt);
print "Checking duplicates records...\n\n";
open (my $file4, "<", "log.txt") || die "Can't read file log.txt: $!";
while ( <$file4> ) {
print if $seen{$_}++;
}
close($file4);
If I run the following command:
perl Verify.pl 2 tbl_20180615.txt
the code builds a new file called "log.txt" with the following structure, keeping only the first two columns of the original file, as given by the first argument:
(log.txt)
573103150033,0664,
573103150033,0665,
573103150080,0659,
573103150080,0660,
573103154377,1240,
573103154377,1240,
That works, but reading the new file log.txt back to check for duplicates does not. If I comment out the lines that generate log.txt (everything before the "to check duplicates" marker) and run only the second part of the code against an existing log.txt, it works and reports the duplicate lines, like this:
(Result in command line)
573103154377,1240
573103154377,1240
How can I solve this issue?
I think this does what you're asking for. It builds a unique list of derived keys before printing any of them, using a hash to check whether a key has already been generated.
Note that I have assigned values to @ARGV to emulate input values. You must remove that statement before running the program with input from the command line.
#!/usr/bin/perl
use strict;
use warnings;
use autodie; # Handle bad IO statuses automatically
local @ARGV = qw/ 2 tbl_20180615.txt /; # For testing only
tr/"//d for @ARGV; # "
my ($key_fields, $input_file) = @ARGV;
my $output_file = 'log.txt';
my (@keys, %seen);
print "Loading input ... ";
open my $in_fh, '<', $input_file;
while ( <$in_fh> ) {
chomp;
my @rec = split /,/;
my $key = join ',', @rec[0..$key_fields-1];
push @keys, $key unless $seen{$key}++;
}
print "Done\n";
open my $out_fh, '>', $output_file;
print $out_fh "$_\n" for @keys;
close $out_fh;
output log.txt
573103150033,0664
573103150033,0665
573103150080,0659
573103150080,0660
573103154377,1240
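A likely cause of the original symptom is that the question's script never closes $fh3 before reopening log.txt for reading, so the buffered output may not have reached the file yet. The sketch below keeps the two-step approach (write the keys, then read them back) and closes the write handle in between. The temporary directory and the hard-coded key list are assumptions made so the example runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a temporary directory stands in for the working directory.
my $dir      = tempdir( CLEANUP => 1 );
my $log_file = "$dir/log.txt";
my @keys     = (
    '573103150033,0664', '573103150033,0665',
    '573103150080,0659', '573103150080,0660',
    '573103154377,1240', '573103154377,1240',
);

# Step 1: write the keys, then close the handle so the data is flushed.
open my $out_fh, '>', $log_file or die "Can't write $log_file: $!";
print {$out_fh} "$_\n" for @keys;
close $out_fh or die "Can't close $log_file: $!";

# Step 2: only now reopen the same file for reading.
open my $in_fh, '<', $log_file or die "Can't read $log_file: $!";
my ( %seen, @duplicates );
while ( my $line = <$in_fh> ) {
    chomp $line;
    push @duplicates, $line if $seen{$line}++;   # second and later sightings
}
close $in_fh;

print "$_\n" for @duplicates;
```

With this data the loop records one duplicate, the repeated 573103154377,1240 key.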

How to handle soft links in Perl

Hi, I am writing a Perl script to search for a file inside a specified directory, and I used File::Find to do so. The folder I was searching points to a soft link. If I give the soft link path, the script finds the file correctly, but if I give the folder path, it reports that the file was not found. The script is below. How can I tell whether the directory being searched points to a soft link? And if it does, how do I get the soft link path so that the search can continue?
#!/pkg/qct/software/perl/5.22.0/bin/perl -w
##!usr/bin/perl
use strict;
#use warnings;
use File::Find;
my $gds_file_num;
my @gds_file;
our $dir = "/<path>/";
my @files_list;
use 5.010;
open( my $fh, '<', "data.txt") or die "Can't open data.txt: $!";
# -----------------------------------------------------------------------------
# Find the file from given directory path
# -----------------------------------------------------------------------------
sub find_file {
my $file_name = $_[0];
my $dir_path = $_[1];
my @result;
undef @result;
print "file=$file_name dir=$dir_path\n";
find ( sub {
return unless /$file_name$/;
push @result, $File::Find::name;
}, $dir_path );
if(($result[0] ~~ undef)){
print "[Warn] : File not exist : $file_name is not available under $dir_path\n";
return 0;
}
else {
return $result[0];
}
}
while ( my $line = <$fh> ) {
# chomp $line;
if ( $line =~ /g0hd/ ) {
print $line;
}
}
close $fh;
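One possible approach, sketched under assumptions: File::Find does not follow symbolic links by default, and its follow option changes that. The built-in -l test, readlink, and Cwd's realpath report whether a path is a link and where it points. The directory layout, the file name needle.gds, and the three-argument find_file signature below are invented for the demonstration, and the platform must support symlink:

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Assumed layout: search/link is a symlink to target, which holds the file.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/target/data", "$root/search" );
open my $fh, '>', "$root/target/data/needle.gds"
    or die "Can't create sample file: $!";
close $fh or die "Can't close sample file: $!";
symlink "$root/target", "$root/search/link"
    or die "Can't create symlink: $!";

# Return every path under $dir_path whose basename equals $file_name.
sub find_file {
    my ( $file_name, $dir_path, $follow ) = @_;
    my @result;
    find(
        {
            wanted => sub { push @result, $File::Find::name if $_ eq $file_name },
            follow => $follow,    # true: descend through symbolic links
        },
        $dir_path,
    );
    return @result;
}

my @without = find_file( 'needle.gds', "$root/search", 0 );
my @with    = find_file( 'needle.gds', "$root/search", 1 );

my $link = "$root/search/link";
if ( -l $link ) {
    printf "%s -> %s (resolved: %s)\n", $link, readlink($link), realpath($link);
}
printf "without follow: %d match(es), with follow: %d match(es)\n",
    scalar @without, scalar @with;
```

An alternative is to resolve the starting directory with realpath first and pass the resolved path to find, which avoids the extra bookkeeping that follow performs.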

What produces the white space in my Perl program?

As the title says, I have a program, or rather two functions, to read a file into an array and to write an array to a file. Now to the main reason I am writing: each time I run my test program, which exercises these functions, it produces more and more white space. Could somebody explain my mistake and correct me?
my code
Helper.pm:
#!/usr/bin/env perl
package KconfCtl::Helper;
sub file_to_array($) {
my $file = shift();
my ( $filestream, $string );
my @rray;
open( $filestream, $file ) or die("cant open $file: $!");
@rray = <$filestream>;
close($filestream);
return @rray;
}
sub array_to_file($$;$) {
my @rray = @{ shift() };
my $file = shift();
my $mode = shift();
$mode='>' if not $mode;
my $filestream;
if ( not defined $file ) {
$filestream = STDOUT;
}
else {
open( $filestream, $mode, $file ) or die("cant open $file: $!");
}
my $l = @rray; print $l,"\n";
foreach my $line (@rray) {
print $filestream "$line\n";
}
close($filestream);
}
1;
test_helper.pl:
use KconfCtl::Helper;
use strict;
my @t;
@t= KconfCtl::Helper::file_to_array("kconf.test");
#print @t;
my $t_index=@t;
@t[$t_index]="n";
KconfCtl::Helper::array_to_file(\@t, "kconf.test", ">");
the result after the first:
n
and the 2nd run:
n
n
When you read from a file, the data includes the newline characters at the end of each line. You're not stripping those off, but you are adding an additional newline when you output your data again. That means your file is gaining additional blank lines each time you read and write it.
Also, you must always use strict and use warnings 'all' at the top of every Perl script; you should avoid using subroutine prototypes; and you should declare all of your variables as late as possible.
Here's a more idiomatic version of your module code which removes the newlines on input using chomp. Note that you don't need the #! line on the module file as it won't be run from the command line, but you may want it on the program file. It's also more normal to export symbols from a module using the Exporter module so that you don't have to qualify the subroutine names by prefixing them with the full package name.
use strict;
use warnings 'all';
package KconfCtl::Helper;
sub file_to_array {
my ($file) = @_;
open my $fh, '<', $file or die qq{Can't open "$file" for input: $!}; #'
chomp(my @array = <$fh>);
return @array;
}
sub array_to_file {
my ($array, $file, $mode) = @_;
$mode //= '>';
my $fh;
if ( $file ) {
open $fh, $mode, $file or die qq{Can't open "$file" for output: $!}; #'
}
else {
$fh = \*STDOUT;
}
print $fh $_, "\n" for @$array;
}
1;
and your test program would be like this
#!/usr/bin/env perl
use strict;
use warnings 'all';
use KconfCtl::Helper;
use constant FILE => 'kconf.test';
my @t = KconfCtl::Helper::file_to_array(FILE);
push @t, 'n';
KconfCtl::Helper::array_to_file(\@t, FILE);
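The Exporter suggestion can be sketched as follows. To keep the example in one runnable file, the package is declared inline and imported with an explicit import call; with a separate KconfCtl/Helper.pm you would instead write use KconfCtl::Helper qw(file_to_array array_to_file);. The temporary file and the three-run loop are assumptions added to show that the file no longer gains blank lines:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

{
    package KconfCtl::Helper;    # would normally live in KconfCtl/Helper.pm
    use Exporter 'import';
    our @EXPORT_OK = qw(file_to_array array_to_file);

    # Read a file into a list of lines with the newlines removed.
    sub file_to_array {
        my ($file) = @_;
        open my $fh, '<', $file or die qq{Can't open "$file" for input: $!};
        chomp( my @array = <$fh> );
        return @array;
    }

    # Write each element of the array reference as one line.
    sub array_to_file {
        my ( $array, $file ) = @_;
        open my $fh, '>', $file or die qq{Can't open "$file" for output: $!};
        print {$fh} "$_\n" for @$array;
        close $fh or die qq{Can't close "$file": $!};
    }
}

# Stand-in for: use KconfCtl::Helper qw(file_to_array array_to_file);
KconfCtl::Helper->import(qw(file_to_array array_to_file));

# Assumption: an empty temporary file stands in for kconf.test.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
close $tmp_fh or die "Can't close $path: $!";

for my $run ( 1 .. 3 ) {
    my @lines = file_to_array($path);
    push @lines, 'n';
    array_to_file( \@lines, $path );
}

my @final = file_to_array($path);
print scalar(@final), " lines after three runs\n";
```

After the import, the subroutines can be called by their short names, and each run adds exactly one line because chomp removes the newline that array_to_file puts back.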
When you read in from your file, you need to chomp() the lines, or else the \n at the end of the line is included.
Try this and you'll see what's happening:
use Data::Dumper; ## add this line
sub file_to_array($) {
my $file = shift();
my ( $filestream, $string );
my @rray;
open( $filestream, '<', $file ) or die("cant open $file: $!");
@rray = <$filestream>;
close($filestream);
print Dumper( \@rray ); ### add this line
return @rray;
}
You can add
foreach(@rray){
chomp();
}
to your module to stop this happening.

perl + read multiple csv files + manipulate files + provide output_files

Apologies if this is a bit long-winded, but I would really appreciate an answer, as I am having difficulty getting this to work.
Building on this question here, I have this script that works on a CSV file (orig.csv) and produces the CSV file that I want (format.csv). What I want is to make this more generic, so that it accepts any number of '.csv' files and produces an 'output_csv' for each input file. Can anyone help?
#!/usr/bin/perl
use strict;
use warnings;
open my $orig_fh, '<', 'orig.csv' or die $!;
open my $format_fh, '>', 'format.csv' or die $!;
print $format_fh scalar <$orig_fh>; # Copy header line
my %data;
my @labels;
while (<$orig_fh>) {
chomp;
my @fields = split /,/, $_, -1;
my ($label, $max_val) = @fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \@fields;
push @labels, $label;
}
}
for my $label (@labels) {
print $format_fh join(',', @{ $data{$label} }), "\n";
}
I was hoping to use this script from here, but I am having great difficulty putting the two together:
#!/usr/bin/perl
use strict;
use warnings;
#If you want to open a new output file for every input file
#Do it in your loop, not here.
#my $outfile = "KAC.pdb";
#open( my $fh, '>>', $outfile );
opendir( DIR, "/data/tmp" ) or die "$!";
my @files = readdir(DIR);
closedir DIR;
foreach my $file (@files) {
open( FH, "/data/tmp/$file" ) or die "$!";
my $outfile = "output_$file"; #Add a prefix (anything, doesn't have to say 'output')
open(my $fh, '>', $outfile);
while (<FH>) {
my ($line) = $_;
chomp($line);
if ( $line =~ m/KAC 50/ ) {
print $fh $_;
}
}
close($fh);
}
The script reads all the files in the directory, finds the lines containing the string 'KAC 50', and writes those lines to an output_$file for that input file, so there will be one output_$file for every input file that is read.
Issues with this script that I have noted and was looking to fix:
- it reads the '.' and '..' entries in the directory and produces an 'output_.' and an 'output_..' file
- it will also do the same with this script file.
I was also trying to make it dynamic by getting this script to work in any directory it is run in by adding this code:
use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";
and
opendir( DIR, $path ) or die "$!"; # open the current directory
open( FH, "$path/$file" ) or die "$!"; #open the file
EDIT: I have tried combining the versions but am getting errors. Advice greatly appreciated.
UserName@wabcl13 ~/Perl
$ perl formatfile_QforStackOverflow.pl
Parentheses missing around "my" list at formatfile_QforStackOverflow.pl line 13.
source dir -> /home/UserName/Perl
Can't use string ("/home/UserName/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 28.
combined code::
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
## Start:: This part works on one file edited for this script ##
#open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
#open my $format_fh, '>', 'format.csv' or die $!;
#print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
print $format_file scalar <$orig_file>; # Copy header line
my %data;
my @labels;
#while (<$orig_fh>) { #orig needs changing
while (<$orig_file>) {
chomp;
my @fields = split /,/, $_, -1;
my ($label, $max_val) = @fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \@fields;
push @labels, $label;
}
}
for my $label (@labels) {
#print $format_fh join(',', @{ $data{$label} }), "\n"; #orig needs changing
print $format_file join(',', @{ $data{$label} }), "\n";
}
## END:: This part works on one file edited for this script ##
}
How do you plan on inputting the list of files to process and their preferred output destinations? Maybe just have a fixed directory in which you process all the CSV files, and prefix the results.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $source_dir = '/some/dir/with/cvs/files';
my $output_prefix = 'format_';
opendir my $dh, $source_dir;
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
.... old processing code here ...
}
Alternatively, you could just have an output directory instead of prefixing the files. Either way, this should get you on your way.
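The output-directory alternative could look roughly like the sketch below. It is not the answerer's code: the temporary directories, the 'formatted' subdirectory name, the three-column sample file, and the $label_col and $value_col variables are all assumptions (the original script uses columns 1 and 12). Each input file gets its own pair of filehandles and its own %data and @labels:

```perl
use strict;
use warnings;
use autodie;
use File::Spec;
use File::Temp qw(tempdir);

# Assumptions: temporary directories and a small three-column sample file.
my $source_dir = tempdir( CLEANUP => 1 );
my $output_dir = File::Spec->catdir( $source_dir, 'formatted' );
my ( $label_col, $value_col ) = ( 1, 2 );    # the original uses 1 and 12

mkdir $output_dir;
open my $seed_fh, '>', File::Spec->catfile( $source_dir, 'orig.csv' );
print {$seed_fh} "id,label,max\n", "1,a,5\n", "2,a,9\n", "3,b,2\n", "4,a,7\n";
close $seed_fh;

opendir my $dh, $source_dir;
for my $file ( readdir($dh) ) {
    next if $file !~ /\.csv$/;

    my $orig_file   = File::Spec->catfile( $source_dir, $file );
    my $format_file = File::Spec->catfile( $output_dir, $file );
    next if !-f $orig_file;                  # skip anything that is not a file

    open my $orig_fh,   '<', $orig_file;
    open my $format_fh, '>', $format_file;

    print {$format_fh} scalar <$orig_fh>;    # copy the header line

    my ( %data, @labels );                   # fresh state for every input file
    while ( my $line = <$orig_fh> ) {
        chomp $line;
        my @fields = split /,/, $line, -1;
        my ( $label, $max_val ) = @fields[ $label_col, $value_col ];
        if ( exists $data{$label} ) {
            my $prev_max_val = $data{$label}[$value_col] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }

    # Keep the row with the largest value per label, in first-seen order.
    print {$format_fh} join( ',', @{ $data{$_} } ), "\n" for @labels;

    close $orig_fh;
    close $format_fh;
}
closedir $dh;
```

Writing into a separate directory means the output files never match the input filter, so the check against the output prefix is no longer needed.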

opening file in perl gives errors?

I am writing a Perl script that reads a text file (containing the absolute paths of many files, one per line), extracts the file name from each absolute path, and then appends all the file names, separated by spaces, to the same file. So, consider a test.txt file:
D:\work\project\temp.txt
D:\work/tests/test.abc
C:/office/work/files.xyz
So after running the script the same file will contain:
D:\work\project\temp.txt
D:\work/tests/test.abc
C:/office/work/files.xyz
temp.txt test.abc files.xyz
I have this script revert.pl:
use strict;
foreach my $arg (@ARGV)
{
open my $file_handle, '>>', $arg or die "\nError trying to open the file $arg : $!";
print "Opened File : $arg\n";
my @lines = <$file_handle>;
my $all_names = "";
foreach my $line (@lines)
{
my @paths = split(/\\|\//, $line);
my $last = @paths;
$last = $last - 1;
my $name = $paths[$last];
$all_names = "$all_names $name";
}
print $file_handle "\n\n$all_names";
close $file_handle;
}
When I run the script I am getting the following error:
>> perl ..\revert.pl .\test.txt
Too many arguments for open at ..\revert.pl line 5, near "$arg or"
Execution of ..\revert.pl aborted due to compilation errors.
What is wrong over here?
UPDATE: The problem is that we are using a very old version of Perl, so I changed the code to:
use strict;
for my $arg (@ARGV)
{
print "$arg\n";
open (FH, ">>$arg") or die "\nError trying to open the file $arg : $!";
print "Opened File : $arg\n";
my $all_names = "";
my $line = "";
for $line (<FH>)
{
print "$line\n";
my @paths = split(/\\|\//, $line);
my $last = @paths;
$last = $last - 1;
my $name = $paths[$last];
$all_names = "$all_names $name";
}
print "$line\n";
if ($all_names == "")
{
print "Could not detect any file name.\n";
}
else
{
print FH "\n\n$all_names";
print "Success!\n";
}
close FH;
}
Now it's printing the following:
>> perl ..\revert.pl .\test.txt
.\test.txt
Opened File : .\test.txt
Could not detect any file name.
What could be wrong now?
Perhaps you are running an old Perl version, in which case you have to use the two-argument form of open:
open(File_handle, ">>$arg") or die "\nError trying to open the file $arg : $!";
Note that I wrote File_handle without the $. Reading from and writing to the file then become:
@lines = <File_handle>;
#...
print File_handle "\n\n$all_names";
#...
close File_handle;
Update: reading file lines:
open FH, "+>>$arg" or die "open file error: $!";
#...
while( $line = <FH> ) {
#...
}
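A handle opened with '>>' is write-only, which is why the read loop sees no lines. As a hedged sketch of the read-then-append idea, the version below opens the file once in '+<' (read/write) mode, collects the names, then seeks to the end before writing. It assumes Perl 5.6 or later (three-argument open and lexical filehandles); on an older Perl the equivalent two-argument form would be open(FH, "+<$file"). The temporary file stands in for test.txt:

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);
use File::Temp qw(tempfile);

# Assumption: a temporary file stands in for test.txt from the question.
my ( $seed_fh, $list_file ) = tempfile( UNLINK => 1 );
print {$seed_fh} "D:\\work\\project\\temp.txt\n",
                 "D:\\work/tests/test.abc\n",
                 "C:/office/work/files.xyz\n";
close $seed_fh or die "Can't close $list_file: $!";

# '+<' opens an existing file for reading and writing without truncating it.
open my $fh, '+<', $list_file or die "Can't open $list_file: $!";

my @names;
while ( my $line = <$fh> ) {
    chomp $line;
    next if $line eq '';
    my @parts = split m{[\\/]}, $line;    # split on either kind of slash
    push @names, $parts[-1];              # last component is the file name
}

if (@names) {
    # Reposition explicitly before switching from reading to writing.
    seek( $fh, 0, SEEK_END ) or die "Can't seek in $list_file: $!";
    print {$fh} "\n", join( ' ', @names ), "\n";
}
else {
    print "Could not detect any file name.\n";
}
close $fh or die "Can't close $list_file: $!";
```

Note also that the updated script in the question tests $all_names == "", which is a numeric comparison: both sides evaluate to 0, so the test is always true. The string comparison operator is eq.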