Perl - Import contents of file into another file

Here is what I'm trying to do: a BTEQ script gets data from a DB and exports it to a flat file. That flat file is then picked up by the Perl script below, and with this post I'm trying to get Perl to import the contents of that file into a fastload file. Does that make more sense?
use File::Copy;

while (1) {
    # Objective: open dir, get the flat file which was exported from bteq
    opendir (DIR, "C:/q2refresh/") or die "Cannot open /my/dir: $!\n"; # open directory with the flat file
    my @Dircontent = readdir DIR;
    my $filetobecopied = "C:/q2refresh/q2_refresh_prod_export.txt"; # flat file exported from bteq
    my $newfile = "C:/q2refresh/Q2_FastLoadFromFlatFile.txt"; # new file the flat-file contents will be copied to as "fastload"
    copy($filetobecopied, $newfile) or die "File cannot be copied.";
    closedir DIR;
    my $items_in_dir = @Dircontent;
    if ($items_in_dir > 2) { # > 2 because of "." and ".."
        # -->>>>>> take the copied flat file above and import it into a fastload script located at C:/q2refresh/q2Fastload.txt
    }
    else { sleep 100; }
}
I need help with implementing the bolded section above. How do I import the contents of C:/q2refresh/Q2_FastLoadFromFlatFile.txt into a fastload script located at C:/q2refresh/q2Fastload.txt?
I apologize if this is somewhat newbish, but I am new to Perl.
Thanks.

if ($items_in_dir > 2) { # > 2 because of "." and ".."
Well, counting . and .., plus the original and the copy of q2_refresh_prod_export.txt, you will always have more than two entries in the directory. And if q2_refresh_prod_export.txt ever fails to be copied, the script dies, so the else clause will never be reached.
Also, it is pointless to copy the file to a new place if you are simply going to copy it to another place a moment later. This is not like cut and paste: you actually, physically copy the file to a new file, not to a clipboard.
If by "import into" you mean that you want to append the contents of q2_refresh_prod_export.txt to an existing q2Fastload.txt, there are ways to do that, such as what Troy suggested in another answer: an open with >> (append mode).
You will have to sort out what you mean by this whole $items_in_dir condition. You are keeping files and copying files in that directory, so what exactly are you checking for? Whether the files have all been removed (by some other process)?

I can't tell what you're trying to do. Could it be that you just want to do this?
open SOURCE, '<', $newfile or die "Cannot open $newfile: $!";
open SINK, '>>', 'C:/q2refresh/q2Fastload.txt' or die "Cannot open fastload file: $!";
while (<SOURCE>) {
    print SINK $_;
}
close SOURCE;
close SINK;
That will append the contents of $newfile to your fastload file.
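If you prefer lexical filehandles and the three-argument open, a minimal sketch of the same append, under the same path assumptions as above, would be:

use strict;
use warnings;

my $newfile = 'C:/q2refresh/Q2_FastLoadFromFlatFile.txt';

# Append the flat-file contents to the fastload script.
open my $source, '<', $newfile
    or die "Cannot open $newfile: $!";
open my $sink, '>>', 'C:/q2refresh/q2Fastload.txt'
    or die "Cannot open fastload file: $!";
print {$sink} $_ while <$source>;
close $source;
close $sink;

Lexical handles are closed automatically when they go out of scope, but the explicit close makes the intent clear.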

Related

Perl: Select Filepath for csv file inside folder

I have a Perl script which does some data manipulation with a selected CSV file. In the past, I have renamed the CSV file to match the one specified inside my script.
I now want to change it so that the sole file in a folder is selected, even though the CSV file is not always named the same. There will only ever be a single file in the folder.
I currently use this method:
my $filepath_in = 'C:\delete_csv_files\files_new\input.csv';
my $filepath_out = 'C:\delete_csv_files\files_processed\output.csv';
open my $in, '<:encoding(utf8)', $filepath_in or die "Cannot open $filepath_in: $!";
open my $out, '>:encoding(utf8)', $filepath_out or die "Cannot open $filepath_out: $!";
I also want the file to retain its original name after it's been processed.
Can anyone give me any pointers?
As suggested by toolic and commented by ikegami, you can use glob.
my ($filepath_in) = glob 'C:\delete_csv_files\files_new\*';
Then you can use a regex to generate the name of the output file, like:
(my $filepath_out = $filepath_in) =~ s!\\files_new\\!\\files_processed\\!;
This will give you a file with the same name, in directory files_processed.
If you want to force the name of the output file to output.csv like in your code snippet, then use this regex instead:
(my $filepath_out = $filepath_in) =~ s!\\files_new\\.*$!\\files_processed\\output.csv!;
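Putting the pieces together, a minimal sketch of the whole flow (the directory names follow the question; the pass-through loop is a placeholder for your data manipulation):

use strict;
use warnings;

# Pick up the single file in files_new, whatever it is called.
my ($filepath_in) = glob 'C:\delete_csv_files\files_new\*'
    or die "No file found in files_new";

# Keep the original name, but write to files_processed.
(my $filepath_out = $filepath_in) =~ s!\\files_new\\!\\files_processed\\!;

open my $in,  '<:encoding(utf8)', $filepath_in  or die "Cannot open $filepath_in: $!";
open my $out, '>:encoding(utf8)', $filepath_out or die "Cannot open $filepath_out: $!";

while (my $line = <$in>) {
    # ... data manipulation goes here; this just copies the line through ...
    print {$out} $line;
}

close $in;
close $out;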

How to get the list of files that are not in another directory in perl

I have to fix a Perl script, which does the following:
# Get the list of files in the staging directory; skip all beginning with '.'
opendir ERR_STAGING_DIR, "$ERR_STAGING" or die "$PID: Cannot open directory $ERR_STAGING";
@allfiles = grep !/^$ERR_STAGING\/\./, map "$ERR_STAGING/$_", readdir(ERR_STAGING_DIR);
closedir(ERR_STAGING_DIR);
I have two directories, one is STAGING and the other is ERROR. STAGING contains files like ABC_201608100000.fin and ERR_STAGING_DIR contains ABC_201608100000.fin.bc_lerr.xml. The Perl script is run as a daemon process which constantly looks for files in the ERR_STAGING_DIR directory and processes the error files.
However, my requirement is to not process the file if ABC_201608100000.fin still exists in STAGING.
Question:
Is there a way I can filter the @allfiles array and select the files that don't exist in the STAGING directory?
WHAT I HAVE TRIED:
I have tried a programmatic way to ignore the files that exist in the STAGING dir, though it is not working.
# Move file from the staging directory to the processing directory.
@splitf = split(/\.bc_lerr\.xml/, basename($file));
my $finFile = $STAGING . "/" . $splitf[0];
print LOG "$PID: Staging File $finFile \n";
foreach $file (@sorted_allfiles) {
    if ( -e $finFile )
    {
        print LOG "$PID: Staging File still exist.. moving to next $finFile \n";
        next;
    }
    # DO THE PROCESSING.
The questions of timing aside, I assume that a snapshot of files may be processed without worrying about new files showing up. I take it that @allfiles has all file names from the ERROR directory.
Remove a file name from the front of the array at each iteration. Check for the corresponding file in STAGING and if it's not there process away, otherwise push it on the back of the array and skip.
while (@allfiles)
{
    my $errfile = shift @allfiles;
    my ($file) = $errfile =~ /(.*)\.bc_lerr\.xml$/;
    if (-e "$STAGING/$file")
    {
        push @allfiles, $errfile;
        sleep 1;  # more time for existing files to clear
        next;
    }
    # process the error file
}
If the processing is faster than the time it takes for existing files in STAGING to go away, we would exhaust all processable files and then continuously run file tests. There is no reason for such abuse of resources, thus the sleep, to give STAGING files some more time to go away. Note that if just one file in STAGING fails to go away, this loop will keep checking it forever, so you want to add some guard against that; see the sketch below.
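A minimal sketch of such a guard, assuming a hypothetical cap on how many times a file is rechecked, could track requeue counts and give up once the limit is hit:

use constant MAX_RETRIES => 60;  # hypothetical limit; tune it to your environment

my %retries;
while (@allfiles)
{
    my $errfile = shift @allfiles;
    my ($file) = $errfile =~ /(.*)\.bc_lerr\.xml$/;
    if (-e "$STAGING/$file")
    {
        if (++$retries{$errfile} > MAX_RETRIES) {
            warn "giving up on $errfile; $file never left STAGING\n";
            next;  # drop it instead of requeueing it forever
        }
        push @allfiles, $errfile;
        sleep 1;
        next;
    }
    # process the error file
}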
Another way would be to process the error files with a foreach, and add those that should be skipped to a separate array. That can then be attempted separately, perhaps with a suitable wait.
How suitable this is depends on details of the whole process. For how long do STAGING files hang around, and is this typical or exceptional? How often do new files show up? How many files are there typically?
If you only wish to filter out the error files that have their counterparts in STAGING
my @errfiles_nostaging = grep {
    my ($file) = $_ =~ /(.*)\.bc_lerr\.xml$/;
    not -e "$STAGING/$file";
} @allfiles;
The output array contains the files from @allfiles which have no corresponding file in $STAGING and can be readily processed. This would be suitable if the error files are processed very fast in comparison to how long the $STAGING files stay around.
The filter can be written in one statement as well. For example
grep { not -e "$STAGING/" . s/\.bc_lerr\.xml$//r }  # or
grep { not -e "$STAGING/" . (split /\.bc_lerr\.xml$/, $_)[0] }
The first example uses the non-destructive /r modifier, available since 5.14. It changes the substitution to return the changed string and not change the original one. See it in perlrequick and in perlop.
This is an extremely brute-force example, but if you have the contents of the staging directory in an array, you can check against that array as you read the contents of the error directory.
I've made some GIGANTIC assumptions about the relationship of the filenames -- basically that the staging directory contains the truncated filename, exactly the way you listed in your example. If that's universally the case, then a substring would work even faster, but this example is a little more scalable, in the event your example was simplified to illustrate the issue.
use strict;
use warnings;

my @error = qw(
    ABC_201608100000.fin.bc_lerr.xml
    ABD_201608100000.fin.bc_lerr.xml
    ABE_201608100000.fin.bc_lerr.xml
    ABF_201608100000.fin.bc_lerr.xml
);
my @staging = qw(
    ABC_201608100000.fin
    ABD_201608100000.fin
);
foreach my $error (@error) {
    my $stage = $error;
    $stage =~ s/\.bc_lerr\.xml//;
    # \Q...\E quotes the dots in the filename so they match literally
    unless (grep { /\Q$stage\E/ } @staging) {
        ## process the file here
    }
}
The grep in this example is O(n), so if you have a really large list in either array you would want to load the staging names into a hash first, which gives O(1) lookups.
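For illustration, a minimal sketch of that hash-based lookup, reusing the @error and @staging arrays from the example above:

# Build a lookup hash of the staging names once; each check is then O(1).
my %in_staging = map { $_ => 1 } @staging;

foreach my $error (@error) {
    (my $stage = $error) =~ s/\.bc_lerr\.xml$//;
    next if $in_staging{$stage};  # counterpart still in STAGING, skip it
    ## process the file here
}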

Perl: Substitute text string with value from list (text file or scalar context)

I am a Perl novice, but have read "Learning Perl" by Schwartz, foy and Phoenix and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
1. Search a specific folder (current folder) and grab the filenames with full path. Save the filenames with complete path and the current folder name.
2. Open a template file and insert the filenames with full path at a specific location (e.g. using substitution), as well as the current folder name (in another location in the same text file; I have not gotten this far yet).
3. Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process, and I plan to copy the Perl program to each of these folders so the program can make the new files there.
This is how far I have gotten:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my @secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my @seq_files = grep {
    /gz/
} readdir (DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my @fastqfiles;
foreach (@seq_files) {
    $_ = File::Spec->catfile($current_dir, $_);
    push(@fastqfiles, $_);
}
print FASTQFILENAMES @fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol, $_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL @secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol, $_);
}
The substitution will not replace the "#" with the list of files in $fastqfilenames. Instead I get the "#" replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use substitution, and rather insert the list of files ($fastqfilenames) into the template.txt file some other way? Instead of $fastqfilenames, can I substitute with the content of a file (e.g. s/A/{r file.txt ...)? Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (@secontrol_template) {
    s/#/$fastqfilenames/g;
    push @secontrol, $_;
}
And as both answers point out, $fastqfilenames was a filehandle.
replaced this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", @fastqfiles;
made it all good. Thanks both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: Instead of reopening the file and reading it, just use the @fastqfiles variable you previously created when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to replace each # with a list of all the filenames together? If so, you will probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template file for each filename? If so, you need an inner for loop that goes over each filename for each template line. And you will need something other than a simple replacement, because the replacement will change the original string the first time through. If you are on Perl 5.16, you can use the /r option to replace non-destructively: push(@secontrol, s/#/$file_name/gr); otherwise, you should copy to another variable before doing the replacement. A sketch follows below.
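A minimal sketch of that inner loop, assuming the @secontrol_template and @fastqfiles arrays from the question and Perl 5.16+ for /r (here everything lands in one @secontrol array; writing one output file per filename would move the file writing into the outer loop):

my @secontrol;
foreach my $line (@secontrol_template) {
    foreach my $file_name (@fastqfiles) {
        # /r returns the substituted copy and leaves $line untouched,
        # so the same template line can be reused for the next filename
        push @secontrol, $line =~ s/#/$file_name/gr;
    }
}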
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the Text::Template module for this kind of work (textual file substitution).
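For illustration, a hedged sketch with Text::Template; it assumes the template file uses a Perl fragment such as { $filenames } instead of the bare # placeholder, and reuses @fastqfiles as built in the question:

use strict;
use warnings;
use Text::Template;

# The template file contains "{ $filenames }" where the list should go.
my $template = Text::Template->new(
    TYPE   => 'FILE',
    SOURCE => 'secontrol_template.txt',
) or die "Cannot load template: $Text::Template::ERROR";

my $filenames = join "\n", @fastqfiles;

my $text = $template->fill_in(HASH => { filenames => $filenames });
die "Cannot fill in template: $Text::Template::ERROR" unless defined $text;

open my $out, '>', 'secontrol.txt' or die "Can't open secontrol.txt: $!";
print {$out} $text;
close $out;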

perl file size calculation not working

I am trying to write a simple perl script that will iterate through the regular files in a directory and calculate the total size of all the files put together. However, I am not able to get the actual size of the file, and I can't figure out why. Here is the relevant portion of the code. I put in print statements for debugging:
$totalsize = 0;
while ($_ = readdir (DH)) {
    print "current file is: $_\t";
    $cursize = -s _;
    print "size is: $cursize\n";
    $totalsize += $cursize;
}
This is the output I get:
current file is: test.pl size is:
current file is: prob12.pl size is:
current file is: prob13.pl size is:
current file is: prob14.pl size is:
current file is: prob15.pl size is:
So the file size remains blank. I tried using $cursize = -s $_ instead, but the only effect of that was to retrieve the file sizes for the current and parent directories as 4096 bytes each; it still didn't get any of the actual file sizes for the regular files.
I have looked online and through a couple of books I have on perl, and it seems that perl isn't able to get the file sizes because the script can't read the files. I tested this by putting in an if statement:
print "Cannot read file $_\n" if (! -r _);
Sure enough for each file I got the error saying that the file could not be read. I do not understand why this is happening. The directory that has the files in question is a subdirectory of my home directory, and I am running the script as myself from another subdirectory in my home directory. I have read permissions to all the relevant files. I tried changing the mode on the files to 755 (from the previous 711), but I still got the Cannot read file output for each file.
I do not understand what's going on. Either I am mixed up about how permissions work when running a perl script, or I am mixed up about the proper way to use -s _. I appreciate your guidance. Thanks!
If it isn't just your typo (-s _ instead of the correct -s $_), then please remember that readdir returns file names relative to the directory you've opened with opendir. The proper way would be something like
my $base_dir = '/path/to/somewhere';
opendir DH, $base_dir or die;
while ($_ = readdir DH) {
    print "size of $_: " . (-s "$base_dir/$_") . "\n";
}
closedir DH;
You could also take a look at the core module IO::Dir which offers a tie way of accessing both the file names and the attributes in a simpler manner.
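For reference, a small sketch of that tie interface, restricted to regular files as in the question; the tied hash maps each entry name to a File::stat object, so the size comes from the object rather than a manually prefixed path:

use strict;
use warnings;
use IO::Dir;

my $base_dir = '/path/to/somewhere';
tie my %dir, 'IO::Dir', $base_dir or die "Cannot tie $base_dir: $!";

my $totalsize = 0;
foreach my $entry (keys %dir) {
    next unless -f "$base_dir/$entry";  # count regular files only
    $totalsize += $dir{$entry}->size;   # File::stat object for this entry
}
print "total size: $totalsize\n";
untie %dir;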
You have a typo:
$cursize = -s _;
Should be:
$cursize = -s $_;

Perl - Print output to files in a different directory

Good morning!
I am sorry if this has been posted before; I could not find it. I just need a little pointer in the right direction. This is my homework and I feel it is almost done. What I want to do is take data from files in a folder different from where the script is run, process the data inside Perl, then print the output to another directory. Now I have the two parts done, but what I fail at is that Perl does not find the path where I want to save the files. It just says "No file or directory with that name exists", but it does exist! Here is the relevant part of the script:
my @files = <$ENV{HOME}/Docs/unprocessed/*.txt>;
my $path = "$ENV{HOME}/Docs/results";
<looping through @files, processing each file in the unprocessed folder...>
open (OUTFILE, $path . '>$file') or die $!;
print OUTFILE ""; # "" is really the finished calculations from the loop, not important here.
close FILE;
close OUTFILE;
I bet it's something stupid...
Because you are mixing the "write" token > in with the filename. This:
open (OUTFILE, $path . '>$file')
Should be:
open (OUTFILE, ">$path/$file")
You will also likely have to strip the .../Docs/unprocessed/ prefix from your filename:
use File::Basename;
open (OUTFILE, ">$path/" . basename($file))
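Putting it all together, a minimal sketch of the whole loop under the question's paths (the processing step is a placeholder):

use strict;
use warnings;
use File::Basename;

my @files = <$ENV{HOME}/Docs/unprocessed/*.txt>;
my $path  = "$ENV{HOME}/Docs/results";

foreach my $file (@files) {
    open my $in, '<', $file or die "Cannot open $file: $!";
    # ... read and process the contents of $file here ...
    close $in;

    # Write the result under the same basename in the results directory.
    open my $out, '>', "$path/" . basename($file)
        or die "Cannot open output file for $file: $!";
    print {$out} "";  # placeholder for the finished calculations
    close $out;
}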