Log4Perl: How do I change the logger file used from running code? (After a fork) - perl

I have an ETL process set up in Perl to process a number of files and load them into a database.
Recently, for performance reasons, I set the code up to run in parallel, through use of a fork() call and a call to system("perl someOtherPerlProcess.pl $arg1 $arg2").
I end up with about 12 instances of someOtherPerlProcess.pl running with different arguments, and each of these processes works through one directory's worth of files (corresponding to a single table in our database).
The application's main functions work, but I am having trouble figuring out how to configure my logging.
Ideally, I would like all the someOtherPerlProcess.pl instances to share the same $log_config value to initialize their loggers, but have each of them create its log file in the directory it is working on.
I haven't been able to figure out how to do that. I also noticed that in the directory I am calling these Perl scripts from, I see several files (ARRAY(0x260eec), ARRAY(0x313f8), etc.) that contain all my logging messages!
Is there a simple way to change the log4perl.appender.A1.filename value from running code?
Or to otherwise dynamically configure the file name we use, but use all other values from a config file?
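For context, the dispatch described above boils down to something like the sketch below; the directory glob, argument handling, and exit handling are placeholders rather than the actual code.
use strict;
use warnings;

# Rough sketch of the parent process: one forked child per directory, each
# child handing its directory to someOtherPerlProcess.pl via system().
my @directories = glob("data/*");    # placeholder for the real directory list

for my $dir (@directories) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # child: run the worker for this directory, then exit with its status
        my $status = system("perl", "someOtherPerlProcess.pl", $dir);
        exit($status >> 8);
    }
}

# parent: wait for all workers to finish
1 while wait() != -1;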

I came up with a less than ideal solution for this, which is to configure my logger from someOtherPerlProcess.pl directly.
use Log::Log4perl qw(get_logger);

my $FORKED_LOG_CONF = "log4perl.appender.A1.filename=$directory_to_load/log.txt
log4perl.rootLogger=WARN, A1
log4perl.appender.A1=Log::Log4perl::Appender::File
log4perl.appender.A1.mode=append
log4perl.appender.A1.autoflush=1
log4perl.appender.A1.layout=PatternLayout
log4perl.appender.A1.layout.ConversionPattern=[%p] %d{yyyy-MM-dd HH:mm:ss}: %m%n";

#Logger start up
Log::Log4perl::init( \$FORKED_LOG_CONF );
my $logger = get_logger();
Here $directory_to_load is the process-specific part of the configuration. It works because the string is interpolated inside the running Perl process, which has a (local) value for that variable, but the same trick fails if used in an external config file.
I would be happy to hear of any alternative solutions.

In your config file:
log4perl.appender.A1.filename=__LOGFILE__
In your script:
use File::Slurp;
use Log::Log4perl;

my $log_cfg = read_file( $log_cfgfile );
my $logfile = "$directory_to_load/log.txt";
$log_cfg =~ s/__LOGFILE__/$logfile/;
Log::Log4perl::init( \$log_cfg );
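If you would rather leave the config file completely untouched, it may also be worth looking at the "Variable Substitution" feature of Log::Log4perl config files (the ${...} syntax). Assuming it resolves variables that are not defined in the file from the environment, as the docs describe (please verify against your Log::Log4perl version), each child could do something like:
# In the shared config file (unchanged for every child):
#   log4perl.appender.A1.filename = ${LOGDIR}/log.txt
#
# In each child, before initialization (assumption: ${LOGDIR} is picked up
# from %ENV by Log4perl's variable substitution):
$ENV{LOGDIR} = $directory_to_load;
Log::Log4perl::init( $log_cfgfile );
my $logger = Log::Log4perl::get_logger();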

Related

Jenkins Pipeline - Create file in workspace (Windows Slave)

For a number of reasons, it would be really useful if I could create a file from a Jenkins pipeline and put it in my workspace. If I could do this, I could avoid pulling in some repositories that I currently pull in for just one or two files, keep those files in a maintainable place, and also use this to create temporary PowerShell scripts, working around a limitation of the solution described in https://stackoverflow.com/a/42576572
This might be possible through a Pipeline utility, although https://jenkins.io/doc/pipeline/steps/pipeline-utility-steps/ doesn't list any such utility; or it might be possible using a batch script, as long as the content can be passed in as a string.
You can do something like this:
node('') {
    stage('test') {
        // create a file in the workspace from a batch step
        bat 'echo something > file.txt'
        // read it back into a Groovy variable
        String out = readFile('file.txt').trim()
        print out               // the file contents are now available Groovy-side
        // a Groovy file created this way could also be executed with the load() step
        bat 'type file.txt'     // batch steps can still see the file in the workspace
    }
}

gsutil cp: concurrent execution leads to local file corruption

I have a Perl script which calls 'gsutil cp' to copy a selected file from GCS to a local folder:
$cmd = "[bin-path]/gsutil cp -n gs://[gcs-file-path] [local-folder]";
$output = `$cmd 2>&1`;
The script is called via HTTP and hence can be initiated multiple times (e.g. by double-clicking on a link). When this happens, the local file can end up being exactly double the correct size, and hence obviously corrupt. Three things appear odd:
1. gsutil seems not to be locking the local file while it is writing to it, allowing another thread (in this case another instance of gsutil) to write to the same file.
2. The '-n' seems to have no effect. I would have expected it to prevent the second instance of gsutil from attempting the copy action.
3. The MD5 signature check is failing: normally gsutil deletes the target file if there is a signature mismatch, but this is clearly not always happening.
The files in question are larger than 2MB (typically around 5MB) so there may be some interaction with the automated resume feature. The Perl script only calls gsutil if the local file does not already exist, but this doesn't catch a double-click (because of the time lag for the GCS transfer authentication).
gsutil version: 3.42 on FreeBSD 8.2
Anyone experiencing a similar problem? Anyone with any insights?
Edward Leigh
1) You're right, I don't see a lock in the source.
2) This can be caused by a race condition: Process 1 checks and sees the file is not there. Process 2 checks and sees the file is not there. Process 1 begins its copy. Process 2 begins its copy. The docs say the existence check happens before the actual copy operation, so it is not atomic with the copy itself.
3) No input on this.
You can fix the issue by having your script maintain an atomic lock of some sort on the file prior to initiating the transfer - i.e. your check would be something along the lines of:
use Lock::File qw(lockfile);

# take a non-blocking lock on a sidecar .lock file before transferring
if (my $lock = lockfile("$localfile.lock", { blocking => 0 })) {
    ... perform transfer ...
    undef $lock;
}
else {
    die "Unable to retrieve $localfile, file is locked";
}
1) gsutil doesn't currently do file locking.
2) -n does not protect against other instances of gsutil run concurrently with an overlapping destination.
3) Hash digests are calculated on the bytes as they are being downloaded as a performance optimization. This avoids a long-running computation once the download completes. If the hash validation succeeds, you're guaranteed that the bytes were written successfully at one point. But if something (even another instance of gsutil) modifies the contents in-place while the process is running, the digesters will not detect this.
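As an extra safety net (not part of gsutil itself), you could recompute the local file's digest once the transfer has finished, so corruption introduced by a second writer is caught even though gsutil's own hash was computed while the bytes streamed in. A small standalone sketch with Digest::MD5; comparing the result against the expected object hash is left out:
use strict;
use warnings;
use Digest::MD5;

# recompute the MD5 of an already-downloaded file
my $filePath = shift @ARGV or die "usage: $0 <local-file>\n";

open my $fh, '<:raw', $filePath or die "cannot open $filePath: $!";
my $local_md5 = Digest::MD5->new->addfile($fh)->hexdigest;
close $fh;
print "local md5: $local_md5\n";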
Thanks to Oesor and Travis for answering all points between them. As an addendum to Oesor's suggested solution, I offer this alternative for systems lacking Lock::File:
use Fcntl ':flock'; # import LOCK_* constants

# if lock file exists ...
if (-e $lockFile)
{
    # abort if lock file still locked (or sleep and re-check)
    abort() if !unlink($lockFile);
    # otherwise delete local file and download again
    unlink($filePath);
}

# if file has not been downloaded already ...
if (!-e $filePath)
{
    $cmd = "[bin-path]/gsutil cp -n gs://[gcs-file-path] [local-dir]";
    abort() if !open(LOCKFILE, ">$lockFile");
    flock(LOCKFILE, LOCK_EX);
    my $output = `$cmd 2>&1`;
    flock(LOCKFILE, LOCK_UN);
    close(LOCKFILE);
    unlink($lockFile);
}

file for saving cookie data not found when using HTTP::Cookies in Perl script

Hi all. I have some questions about the Perl module HTTP::Cookies. The example on CPAN is like the one below:
$cookie_jar = HTTP::Cookies->new( file => '$ENV{\'HOME\'}/lwp_cookies.dat', autosave => 1);
As I understand it, the lwp_cookies.dat file is used to save cookie data on my local machine. On my machine, '$ENV{\'HOME\'}' is an empty path. The script runs fine, but after execution I can't find any file named "lwp_cookies.dat" on my machine. I changed '$ENV{\'HOME\'}' to '$ENV{\'TMP\'}', which is a path that really exists (I verified it with a Perl print). Still, I can't find "lwp_cookies.dat" in my TEMP folder. My first question is how HTTP::Cookies works with the "lwp_cookies.dat" file.
On the other hand, on one of my systems (they are all Windows systems, as mentioned), the same code produces the error message below:
Can't open $ENV{'HOME'}/lwp_cookies.dat: No such file or directory
So it's strange to me. On my good system, even though the file or path does not exist, the script runs fine (I assumed the file was being created in some temporary location instead); on the bad system, the code example doesn't work at all.
If you want the $ENV{'HOME'} variable to interpolate into the string, you need double quotes; single quotes don't interpolate variables:
`file => "$ENV{'HOME'}/lwp_cookies.dat",`
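For completeness, a corrected constructor might look like the sketch below. The USERPROFILE/TMP fallback is an assumption for Windows boxes where HOME is unset, and ignore_discard is optional (it makes session cookies, which are normally discarded rather than saved, get written out as well):
use HTTP::Cookies;

# double quotes so $ENV{HOME} really interpolates; fall back to other
# locations on Windows, where HOME is often unset (adjust to your setup)
my $home = $ENV{HOME} || $ENV{USERPROFILE} || $ENV{TMP} || '.';

my $cookie_jar = HTTP::Cookies->new(
    file           => "$home/lwp_cookies.dat",
    autosave       => 1,
    ignore_discard => 1,   # optional: also persist session ("discard") cookies
);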

How to handle this situation in Perl

I have a configuration INI file that stores all the configuration required for my script to run. I have a Logger.PM which uses Log4Perl, and a ConfigReader.PM which reads the INI file and stores the values in global variables. My Start.PL is the entry point, where I call the methods from Logger and ConfigReader.
What I do currently
In Start.PL I hardcoded the INI file path
In Logger.PM I hardcoded the directory name where log files should be stored
What I want
I want the INI file path as configurable
I want the log folder path to be taken from the INI file
I could do this by following
Pass the INI file path as a parameter to the start.pl
Read the INI file and get the folder path from INI file
What I could face is that
I cannot (fully) use Logger.PM in ConfigReader, since the folder name required for the logger is part of the INI file
I want to log every step of my script (for logging/debugging purposes in case of failure)
I can use print, but that writes to the console, and to capture it I would need to redirect with >>log.txt; then I would be forced to maintain two logs for my application, which is not what I want
Anyone have a good solution for this scenario?
You can pass the INI file path on the command line using Getopt::Long and command line switches, for instance:
Start.pl --ini=/path/to/INI_file
Here is a code sample to show what changes are needed in Start.pl, in order to have switches:
#!/usr/bin/env perl
use v5.12;
use strict;
use Getopt::Long;
# That little tiny 's' after 'ini=' is for string
GetOptions ( 'ini=s' => \my $ini_file );
say $ini_file;
After this change, you can read all options from your INI file, including log folder path ( are you already using a module to manage INI files like Config::IniFiles? ).
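As a rough sketch of how the pieces could fit together (the [paths] section and logdir key below are made-up names; adjust them to your INI layout), using Config::IniFiles to pull the log folder out of the INI file and only then initializing Log::Log4perl:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;
use Config::IniFiles;
use Log::Log4perl qw(get_logger);

GetOptions( 'ini=s' => \my $ini_file )
    or die "usage: $0 --ini=/path/to/config.ini\n";

# read the INI file first; [paths]/logdir is a placeholder key
my $cfg = Config::IniFiles->new( -file => $ini_file )
    or die "cannot read $ini_file: @Config::IniFiles::errors";
my $log_dir = $cfg->val( 'paths', 'logdir' );

# only now build the Log4perl config, so the log folder comes from the INI file
my $log_conf = "
log4perl.rootLogger=INFO, A1
log4perl.appender.A1=Log::Log4perl::Appender::File
log4perl.appender.A1.filename=$log_dir/start.log
log4perl.appender.A1.layout=PatternLayout
log4perl.appender.A1.layout.ConversionPattern=[%p] %d: %m%n
";
Log::Log4perl::init( \$log_conf );
get_logger()->info("logging to $log_dir/start.log");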
There is something still unclear in your question about print: although one of my masters said that print with a pair of square brackets is the best debugger in the world, why use print when you have set up Log::Log4perl?
When you say that Logger.PM can't be used in ConfigReader, are you referring to the log object?

What is $ENV{DOCUMENT_ROOT} equivalent in perl CGI on Windows IIS (2003)

I'm migrating a perl cgi script from linux to windows IIS server 2003 and see that there is no DOCUMENT_ROOT environment variable.
Some googling suggests I can hack it by stripping stuff off the end of $0 or cwd, but getting the site root should be a common task. Is there a better or standard way of doing this?
IIS doesn't really have the notion of a document root in the same way, as each application is more or less self-contained and independent. For any request, PATH_TRANSLATED is usually a good base on which to build; it is set to the physical path name for the handling component set in PATH_INFO, and from that you can usually get to the file system locations using a little File::Spec navigation.
There are also SCRIPT_TRANSLATED and SCRIPT_NAME, which may be closer to what you need. SCRIPT_NAME is essentially the host-absolute URL (minus the scheme, host, and port) for the script, and SCRIPT_TRANSLATED is the corresponding physical file. I use the URI and URI::file classes, and methods to manipulate them, for some of these tasks.
These will only be useful if your request is handled by the same application that serves files, but they do allow you do derive URLs which work. If you need the file system for the root application, the one mapped to "/", and your script is not in the same root application, you will likely have to do some accesses to the IIS metabase (essentially the equivalent to httpd.conf and friends, but queryable) to find this out.
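A hedged sketch of the File::Spec navigation mentioned above, approximating an application root by peeling the URL path in SCRIPT_NAME off the physical path in SCRIPT_TRANSLATED (falling back to PATH_TRANSLATED); which of these variables IIS populates differs between setups, so treat this as a starting point rather than a recipe:
use strict;
use warnings;
use File::Spec;

my $url_path  = $ENV{SCRIPT_NAME} || '';
my $phys_path = $ENV{SCRIPT_TRANSLATED} || $ENV{PATH_TRANSLATED} || '';

my @url_parts  = grep { length } split m{/}, $url_path;
my @phys_parts = File::Spec->splitdir($phys_path);

# drop one physical component per URL component (e.g. /cgi-bin/script.pl -> 2)
splice @phys_parts, -scalar(@url_parts) if @url_parts;

my $approx_root = File::Spec->catdir(@phys_parts);
print "Content-type: text/plain\n\n";
print "approximate root: $approx_root\n";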
You can print out all ENV variables with a simple CGI script, like this:
#!/usr/bin/perl
use strict;
use warnings;

print "Content-type: text/html\n\n";
foreach my $key (sort keys %ENV) {
    print "$key --> $ENV{$key}<br>";
}
From that output, it should be semi-obvious what the variable you're looking for is.