How can my previously untainted data become tainted again? - perl

I have a bit of a mystery here that I am not quite understanding the root cause of. I am getting an 'Insecure dependency in unlink while running with -T switch' when trying to invoke unlink from a script. That is not the mystery, as I realize that this means Perl is saying I am trying to use tainted data. The mystery is that this data was previously untainted in another script that saved it to disk without any problems.
Here's how it goes... The first script creates a binary file name using the following:
# For the binary file upload
my $extensioncheck = '';
my $safe_filename_characters = "a-zA-Z0-9_.";
if ( $item_photo )
{
    # Allowable File Type Check
    my ( $name, $path, $extension ) = fileparse ( $item_photo, '\..*' );
    $extensioncheck = lc($extension);
    if (( $extensioncheck ne ".jpg" ) && ( $extensioncheck ne ".jpeg" ) &&
        ( $extensioncheck ne ".png" ) && ( $extensioncheck ne ".gif" ))
    {
        die "Your photo file is in a prohibited file format.";
    }
    # Rename file to Ad ID for adphoto directory use and untaint
    $item_photo = join "", $adID, $extensioncheck;
    $item_photo =~ tr/ /_/;
    $item_photo =~ s/[^$safe_filename_characters]//g;
    if ( $item_photo =~ /^([$safe_filename_characters]+)$/ ) { $item_photo = $1; }
    else { die "Filename contains invalid characters"; }
}
$adID is generated by the script itself using a localtime(time) function, so it should not be tainted. $item_photo is reassigned using $adID and $extensioncheck BEFORE the taint check, so the new $item_photo is now untainted. I know this because $item_photo itself has no problem with unlink later in the script. $item_photo is only used long enough to create three other image files with ImageMagick before it's tossed using the unlink function. The three filenames produced by the ImageMagick processing of $item_photo are created simply like so.
$largepicfilename = $adID . "_large.jpg";
$adpagepicfilename = $adID . "_adpage.jpg";
$thumbnailfilename = $adID . "_thumbnail.jpg";
The paths are prepended to the new filenames to create the URLs, and are defined at the top of the script, so they can't be tainted either. The URLs for these files are generated like so.
my $adpageURL = join "", $adpages_dir_URL, $adID, '.html';
my $largepicURL = join "", $adphotos_dir_URL, $largepicfilename;
my $adpagepicURL = join "", $adphotos_dir_URL, $adpagepicfilename;
my $thumbnailURL = join "", $adphotos_dir_URL, $thumbnailfilename;
Then I write them to the record, knowing everything is untainted.
Now comes the screwy part. In a second script I read these records back in so the files can be deleted using the unlink function, and this is where I am getting my 'Insecure dependency' flag.
# Read in the current Ad Records Database
open (ADRECORDS, $adrecords_db) || die("Unable to Read Ad Records Database");
flock(ADRECORDS, LOCK_SH);
seek (ADRECORDS, 0, SEEK_SET);
my @adrecords_data = <ADRECORDS>;
close(ADRECORDS);

# Find the Ad in the Ad Records Database
ADRECORD1: foreach $AdRecord (@adrecords_data)
{
    chomp($AdRecord);
    my ($adID_In, $adpageURL_In, $largepicURL_In, $adpagepicURL_In, $thumbnailURL_In) = split(/\|/, $AdRecord);
    if ($flagadAdID ne $adID_In) { $AdRecordArrayNum++; next ADRECORD1 }
    else
    {
        # Delete the Ad Page and Ad Page Images
        unlink ("$adpageURL_In");
        unlink ("$largepicURL_In");
        unlink ("$adpagepicURL_In");
        unlink ("$thumbnailURL_In");
        last ADRECORD1;
    }
}
I know I can just untaint them again, or even just blow them on through knowing that the data is safe, but that is not the point. What I want is to understand WHY this is happening in the first place, as I don't see how this previously untainted data is now being seen as tainted. Any help in seeing where I'm missing the connection would be truly appreciated, because I really want to understand this rather than just write the hack to fix it.

Saving data to a file doesn't save any "tainted" bit with the data. It's just data, coming from an external source, so when Perl reads it, it becomes automatically tainted. In your second script, you will have to explicitly untaint the data.
After all, some other malicious program could have changed the data in the file before the second script has a chance to read it.
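For example, a minimal sketch of what the second script could do before its unlink calls, assuming the same safe-character policy as the first script (plus / and - for paths):

# Re-untaint each path read back from the records file.
# The character class is an assumption based on the first script's policy.
for my $path ($adpageURL_In, $largepicURL_In, $adpagepicURL_In, $thumbnailURL_In) {
    if ($path =~ m{^([a-zA-Z0-9_./-]+)$}) {
        $path = $1;  # the loop variable aliases the original scalar, so this writes back
    }
    else {
        die "Path '$path' contains unexpected characters";
    }
}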

Related

"sh: 1: file: not found" thrown in Perl

So this is an issue I see thrown around on several coding help sites, always with a slight variation. I'm not entirely familiar with what it means, and what's even more curious is that this error is thrown midway through a larger Upload.pm script and does not cause any sort of fatal error. It gets tossed into my error log somewhere during this unless conditional snippet:
# If this is the first slice, validate the file extension and mime-type.
# Mime-type of following slices should be "application/octet-stream".
unless ( defined $response{'error'} ) {
    if ( $slice->{'index'} == 1 ) {
        my ($filename, $directory, $extension) = fileparse($path.$parent_file, qr/\.[^.]*/);
        unless ( is_valid_filetype($slice->{'tmp_file'}, $extension) ) {
            $response{'error'} = "Invalid file type.";
            $response{'retry'} = 0;
        }
    }
}
Now, let me be perfectly honest. I don't really understand the error message, and I could really use some help understanding it, as well as solving it.
Our Perl-based web app has refused to let us upload files correctly since upgrading to Debian Bullseye, and I've been stuck debugging this code I didn't write for a few days now. I'm wondering if the upgrade deprecated some Perl modules, or if the paths to those modules no longer work.
I'm testing this in an Ubuntu-based Docker environment running Debian Bullseye on an Apache 2 server.
If you need any more context, clarification, etc, please let me know.
is_valid_filetype() looks like this:
sub is_valid_filetype
{
    my ($tmp_file, $extension) = @_;
    if ( $tmp_file && $extension ) {
        # Get temp file's actual mime-type.
        my $mime = qx/file --mime-type -b '${tmp_file}'/;
        $mime =~ s/^\s+|\s+$//g;
        # Get valid mime-types matching this extension.
        my $dbh = JobTracker::Common::dbh or die("DBH not available.");
        my $mime_types = $dbh->selectrow_array('SELECT `mime_types` FROM `valid_files` WHERE `extension` = ?', undef, substr($extension, 1));
        if ( $mime && $mime_types ) {
            if ( $mime_types !~ /,/ ) {
                # Single valid mime-type for this extension.
                if ( $mime eq $mime_types ) {
                    return 1;
                }
            } else {
                # Multiple valid mime-types for this extension.
                my %valid_mimes = map { $_ => 1 } split(/,/, $mime_types);
                if ( defined $valid_mimes{$mime} ) {
                    return 1;
                }
            }
        }
    }
    return 0;
}
It's a message from sh (not Perl). It concerns an error on line 1 of the script, which was apparently an attempt to run the file utility. But sh couldn't find it.
The code in question executes this command using
qx/file --mime-type -b '${tmp_file}'/
Install file or adjust the PATH so it can be found.
Note that this code suffers from a code injection bug. It will fail if the string in $tmp_file contains a single quote ('), possibly resulting in the unintentional execution of code.
Fixed:
use String::ShellQuote qw( shell_quote );
my $cmd = shell_quote( "file", "--mime-type", "-b", $tmp_file );
qx/$cmd/
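It is also worth checking $? after qx// so a missing file(1) surfaces as an explicit error instead of an empty $mime; for example:

my $mime = qx/$cmd/;
die "Could not run file(1); is it installed and on PATH?" if $? != 0;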
Debian Bullseye was reading our CSV files as the wrong mime-type: the file command reported them as application/csv, despite a CSV obviously not being an application.
This may be an actual bug in Bullseye, because both my boss and I have scoured the internet with no luck finding anyone else with this issue. I may even report it to Bullseye's devs for further awareness.
The fix was manually adding in our own mime-types that interpreted this file correctly.
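For anyone hitting the same thing, here is a sketch of that kind of fix, assuming (as in is_valid_filetype() above) the allowed mime-types live in the valid_files table; the exact statement is hypothetical:

# Hypothetical: append the mime-type Bullseye's file(1) reports for CSVs
# to the list already stored for the csv extension.
$dbh->do(
    'UPDATE `valid_files` SET `mime_types` = CONCAT(`mime_types`, ?) WHERE `extension` = ?',
    undef, ',application/csv', 'csv'
);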
It took us dumping the tmp directory to confirm the files existed, and triple checking I had my modules installed.
This was such a weird and crazy upstream issue that neither of us could have imagined it would be the file-type interpretation at the OS level in Bullseye.
I really hope this helps someone, saves them the time it took us to find this.

Use Archive::Zip to determine if a member is a text file or not

I'm working on a script that will grep the contents of members of zip archives when the member name matches a pattern, using a given search string.
I have the following sub that processes a single archive (the script can take more than one archive on the command line):
sub processArchive($$$$) {
    my ($zip, $searchstr, $match, $zipName) = @_;
    print "zip[$zip] searchstr[$searchstr] match[$match] zipName[$zipName]\n";
    my @matchingList = $zip->membersMatching($match);
    my $len = @matchingList;
    if ($len > 0) {
        print $zipName . ":\n";
        for my $member (@matchingList) {
            print "member[$member]\n";
            print "textfile[" . $member->isTextFile() . "] contents[" . $member->contents() . "]\n";
            if ($member->isTextFile()) {
                print "Is a text file.\n";
            }
            else {
                print "Is not a text file.\n";
            }
            my @matchingLines = grep /$searchstr/, $member->contents();
            my $len = @matchingLines;
            if ($len > 0) {
                print @matchingLines;
            }
        }
    }
}
The logic isn't even complete yet. I'm first experimenting with calling "isTextFile()" to see what it does. I must be doing something wrong, because I get "Is not a text file" for at least one member that is clearly a text file.
I also note that when I print the value of the return from "isTextFile()", it's always an empty string. Is that what I should expect from printing a "true" or "false" value, or is something else wrong here?
The "text file" status is read from a flag in the ZIP file. Many archiving tools do not set this flag properly, as it is rarely used and has no impact on normal use.
If you actually need to check whether a file contains text, you will need to extract it and see for yourself.
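As for the value you printed: Perl's canonical false value stringifies to the empty string, so printing a false return from isTextFile() shows exactly what you saw. If a rough heuristic on the bytes you already have from contents() is enough, something like this minimal sketch may do (the helper name and thresholds are my own):

sub looks_like_text {
    my ($data) = @_;
    return 0 unless length $data;
    return 0 if index($data, "\0") >= 0;               # NUL bytes strongly suggest binary
    my $sample  = substr($data, 0, 512);               # inspect at most the first 512 bytes
    my $nontext = ($sample =~ tr/\x20-\x7E\t\r\n//c);  # count bytes outside printable ASCII
    return $nontext / length($sample) < 0.3;           # mostly printable => treat as text
}

# e.g. instead of $member->isTextFile():
# if ( looks_like_text($member->contents()) ) { ... }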

Perl Read a file into a variable and add a suffix to each line

I'm very new to Perl and I'm having a hard time finding out how to do what I want.
I have a text file containing something like
text 2015-02-02:
- blabla1
- blabla2
text2 2014-12-12:
- blabla
- ...
I'm trying to read the file, put it in a var, add </br> to the end of each line (of my var) and use it to send it to a web page.
This is what I have for the moment. It works except for the </br> part.
if (open (IN, "CHANGELOG.OLD")) {
    local $/;
    $oldchangelog = <IN>'</br>';
    close (IN);
    $tmplhtml{'CHANGELOG'} = $oldchangelog;
} else {
    # changelog not available
    $tmplhtml{'CHANGELOG'} = "Changelog not available";
}
thanks for the help!
As someone commented, this looks like YAML, so parsing it as YAML is probably more appropriate.
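For instance, a minimal sketch using YAML::XS, assuming the file really is valid YAML:

use YAML::XS qw(LoadFile);

# Yields a hashref like { 'text 2015-02-02' => [ 'blabla1', 'blabla2' ], ... }
my $changelog = LoadFile('CHANGELOG.OLD');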
However to address your scenario:
- 3-argument file opens are good.
- You're using local $/;, which means you're reading the whole file into a string. That is not suitable for line-by-line processing.
- It looks like you're putting everything into one element of a hash. Is there any particular reason you're doing this?
Anyway:
if ( open ( my $input, "<", "CHANGELOG.OLD" ) ) {
    while ( my $line = <$input> ) {
        $tmplhtml{'CHANGELOG'} .= $line . " <BR/>\n";
    }
}
else {
    $tmplhtml{'CHANGELOG'} = "Changelog not available";
}
As an alternative - you can render text 'neatly' to HTML using <PRE> tags.
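A sketch of that alternative, escaping the text first (HTML::Entities assumed available):

use HTML::Entities qw(encode_entities);

if ( open ( my $input, "<", "CHANGELOG.OLD" ) ) {
    local $/;  # slurp the whole file at once
    my $text = <$input>;
    $tmplhtml{'CHANGELOG'} = "<pre>" . encode_entities($text) . "</pre>";
}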

Perl mechanize Find all links array loop issue

I am currently attempting to create a Perl webspider using WWW::Mechanize.
What I am trying to do is create a webspider that will crawl the whole site of the URL (entered by the user) and extract all of the links from every page on the site.
But I have a problem with how to spider the whole site to get every link, without duplicates.
What I have done so far (the part I'm having trouble with anyway):
foreach (@nonduplicates) {  # array contains URLs like www.tree.com/contact-us, www.tree.com/varieties....
    $mech->get($_);
    my @list = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/);  # find all links on this page that start with http://www.tree.com

    # NOW THIS IS WHAT I WANT IT TO DO AFTER THE ABOVE (IN PSEUDOCODE), BUT CAN'T GET WORKING
    # foreach (@list) {
    #     if $_ is already in @nonduplicates
    #         then do nothing because that link has already been found
    #     } else {
    #         append the link to the end of @nonduplicates so that if it has not been crawled for links already, it will be
How would I be able to do the above?
I am doing this to try and spider the whole site to get a comprehensive list of every URL on the site, without duplicates.
If you think this is not the best/easiest method of achieving the same result I'm open to ideas.
Your help is much appreciated, thanks.
Create a hash to track which links you've seen before and put any unseen ones onto @nonduplicates for processing:
$| = 1;

my $scanned = 0;
my @nonduplicates = ( $urlToSpider );  # Add the first link to the queue.
my %link_tracker = map { $_ => 1 } @nonduplicates;  # Keep track of what links we've found already.

while (my $queued_link = pop @nonduplicates) {
    $mech->get($queued_link);
    my @list = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/);

    for my $new_link (@list) {
        # Add the link to the queue unless we already encountered it.
        # Increment so we don't add it again.
        push @nonduplicates, $new_link->url_abs() unless $link_tracker{$new_link->url_abs()}++;
    }
    printf "\rPages scanned: [%d] Unique Links: [%s] Queued: [%s]", ++$scanned, scalar keys %link_tracker, scalar @nonduplicates;
}
use Data::Dumper;
print Dumper(\%link_tracker);
use List::MoreUtils qw/uniq/;
...
my @list = $mech->find_all_links(...);
my @unique_urls = uniq( map { $_->url } @list );
Now @unique_urls contains the unique URLs from @list.
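One design note: url() returns the link exactly as it appears in the page, while url_abs() (used in the answer above) resolves it against the page's base URL. For de-duplicating links across pages, the absolute form is usually the one you want.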

What does this Perl crash mean?

Can someone tell me what this means?
if (not defined $config{'crontab'}) {
    die "no crontab defined!";
}
I want to open a file crontab.txt, but the Perl script crashes at this line and I don't really know any Perl.
EDIT 1
It goes like this:
sub main()
{
    my %config = %{getCommandLineOptions()};
    my $programdir = File::Spec->canonpath ( (fileparse ( Win32::GetFullPathName($PROGRAM_NAME) ))[1] );
    my $logdir = File::Spec->catdir ($programdir, 'logs');
    $logfile = File::Spec->catfile ($logdir, 'cronw.log');
    configureLogger($logfile);
    $log = get_logger("cronw::cronService-pl");
    # if --exec option supplied, we are being invoked to execute a job
    if ($config{exec}) {
        execJob(decodeArgs($config{exec}), decodeArgs($config{args}));
        return;
    }
    my $cronfile = $config{'crontab'};
    $log->info('starting service');
    $log->debug('programdir: '.$programdir);
    $log->debug('logfile: '.$logfile);
    if (not defined $config{'crontab'}) {
        $log->error("no crontab defined!\n");
        die "no crontab defined!";
        # fixme: crontab detection?
    }
    $log->debug('crontab: '.$config{'crontab'});
And I'm trying to load this 'crontab.txt' file...
sub getCommandLineOptions()
{
    my $clParser = new Getopt::Long::Parser config => ["gnu_getopt", "pass_through"];
    my %config = ();
    my @parameter = ( 'crontab|cronfile=s',
                      'exec=s',
                      'args=s',
                      'v|verbose'
                    );
    $clParser->getoptions (\%config, @parameter);
    if (scalar (@ARGV) != 0) { $config{'unknownParameter'} = $true; }
    return \%config;
}
Probably I have to give the script an argument
Probably I have to give the script an argument
I would say so.
$ script --cronfile=somefile
That code looks to see whether there is a key 'crontab' in the hash %config. If not, then it calls die and terminates.
If that's not what you expect to happen, then somewhere else in your script there should be something that is setting $config{'crontab'}, but there is not currently enough information in your question to determine what that might be.
The file path of crontab.txt is probably expected in the %config hash under the 'crontab' key, but it isn't there! If so, a DIRTY solution CAN BE:
$config{'crontab'}='FULLPATH/crontab.txt';
#if (not defined $config{'crontab'}) {
# die "no crontab defined!";
#}
but this may not work, because there may be something like $config{'prefix'} so that what actually gets opened is the concatenation of the two, or simply because $config{'crontab'} is expected to hold something other than a full path!
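A slightly cleaner variant of the same hack (a sketch; //= needs Perl 5.10+) keeps the original check and only supplies a default when the option was not passed on the command line:

# Fall back to a default crontab only when --crontab/--cronfile was not given.
$config{'crontab'} //= 'FULLPATH/crontab.txt';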