How to sanitize input from open files in Perl

I have a Perl script which opens a file, processes it and prints some output.
The input file is gzipped.
The path in $file is passed to the script as an argument.
Below is the current solution I'm using:
open(my $fh, "-|", "$gzcat $file") or die("Cannot open $file: $!");
The script recently failed Checkmarx's security audit with the following error:
<script> gets user input for the $fh element. This element’s value then flows through the code without being properly sanitized or validated and is eventually displayed to the user in method <method>. This may enable a CrossSite-Scripting attack.
I have tried validating that the file exists with Perl's -f file test, and also removing unwanted characters using $file =~ s/[^A-Za-z0-9_\-\.\/]//g;, yet neither satisfies Checkmarx.
I would like to know the proper way to sanitize an input that contains a path to a file in Perl.

As long as you are on Perl 5.8 or newer on an OS that supports forking, or 5.22 or newer on Windows, you can use the list form of pipe open to bypass the shell when running your command. This avoids problems where the filename contains metacharacters the shell will interpret, such as & and spaces.
open(my $fh, "-|", $gzcat, $file) or die("Cannot open $file: $!");
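Here is a minimal, self-contained sketch of the list form. It assumes a Unix-like system with gzip on the PATH. The command gzip -dc stands in for whatever program $gzcat names in the question, and the temporary file name is made up so that it contains both a space and an &.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);

# Create a throwaway gzipped file whose name contains a space and an '&'.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/report & notes.txt.gz";
my $text = "line one\nline two\n";
gzip(\$text => $file) or die "gzip failed: $GzipError";

# List-form pipe open: 'gzip -dc' (equivalent to gzcat) receives $file as a
# single argument. No shell is involved, so the '&' and the space are harmless.
open(my $fh, '-|', 'gzip', '-dc', $file) or die "Cannot open $file: $!";
my @lines = <$fh>;
# For a piped open, close() returns false if the child exited non-zero.
close($fh) or die "gzip -dc failed: exit status $?";

print for @lines;
```

With the string form, "$gzcat $file" would be handed to /bin/sh. The shell would then treat the & as "run in background" and split the name at the space.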
However, this is neither validation nor sanitization as you asked for. It is simply an important way to avoid both vulnerabilities and misbehavior. The cross-site scripting possibility that Checkmarx mentions would be due to the filename being displayed to the user, as its message says. If it is displayed in an HTML page, for example, you must HTML-escape it; most templating systems have methods to do this.
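Below is a minimal sketch of HTML-escaping a filename before it is shown in a page. escape_html is a hypothetical helper, written out by hand so the example has no dependencies. In practice you would normally use your templating system's escaping, or encode_entities from the HTML::Entities module.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters that are special in HTML with
# entities. A single s///g pass means an '&' is never escaped twice.
sub escape_html {
    my ($s) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $s =~ s/([&<>"'])/$entity{$1}/g;
    return $s;
}

# A hostile file name, such as an attacker might pass on the command line.
my $file = q{/tmp/<script>alert("x")</script>.gz};
my $safe = escape_html($file);
print "<p>Processing $safe</p>\n";
```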

I ended up removing unwanted characters with
$file =~ s/[^A-Za-z0-9_\-\.\/]//g;
checking that the file exists with Perl's -f file test, and opening the file using
IO::Uncompress::Gunzip.
This passes Checkmarx's audit.
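Here is one way to combine those steps, sketched under some assumptions. Rather than silently deleting characters, which can turn one path into a different but still valid one, this version accepts the path only if it consists entirely of the same whitelisted characters, and rejects it otherwise. open_gz is a hypothetical helper name, and the demo file is a throwaway. Under perl -T, extracting the path through a regex capture like this is also what untaints it. Note that the whitelist alone still allows "../" components, so restricting the base directory is a separate decision.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: accept only paths made of the whitelisted characters,
# reject anything else, then decompress in-process (no shell involved).
sub open_gz {
    my ($path) = @_;
    my ($clean) = $path =~ m{\A([A-Za-z0-9_\-./]+)\z}
        or die "Refusing suspicious path\n";
    -f $clean or die "Not a regular file: $clean\n";
    my $z = IO::Uncompress::Gunzip->new($clean)
        or die "Cannot open $clean: $GunzipError\n";
    return $z;
}

# Demonstration with a throwaway gzipped file (assumes the temp dir path
# itself contains only whitelisted characters, as File::Temp's names do).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data_1.txt.gz";
my $text = "alpha\nbeta\n";
gzip(\$text => $file) or die "gzip failed: $GzipError";

my $z = open_gz($file);
my @lines;
while (my $line = <$z>) {
    push @lines, $line;
}
$z->close;
print @lines;
```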

Related

How to write to an existing file in Perl?

I want to open an existing file on my desktop and write to it, but for some reason I can't do it in Ubuntu. Maybe I'm not writing the path correctly?
Is it possible without modules?
open(WF,'>','/home/user/Desktop/write1.txt';
$text = "I am writing to this file";
print WF $text;
close(WF);
print "Done!\n";
To add to an existing file without erasing what is already there, open it in append (>>) mode; the > mode truncates the file first.
(Use the modern way to open a file, with a lexical filehandle.)
Here is the code snippet (tested in Ubuntu 20.04.1 with Perl v5.30.0):
#!/usr/bin/perl
use strict;
use warnings;
my $filename = '/home/vkk/Scripts/outfile.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
print $fh "Write this line to file\n";
close $fh;
print "done\n";
For more information, see the documentation for open, or Gabor's article on appending to files.
The following code sample demonstrates some aspects of correct usage of open and environment variables, and reports an error if the file cannot be opened for writing.
Note: search Google for "Perl bookshelf" for further reading.
#!/usr/bin/env perl
#
# vim: ai ts=4 sw=4
#
use strict;
use warnings;
use feature 'say';
my $fname = $ENV{HOME} . '/Desktop/write1.txt';
my $text = 'I am writing to this file';
open my $fh, '>', $fname
or die "Can't open $fname: $!";
say $fh $text;
close $fh;
say 'Done!';
Documentation quote
About modes
When calling open with three or more arguments, the second argument -- labeled MODE here -- defines the open mode. MODE is usually a literal string comprising special characters that define the intended I/O role of the filehandle being created: whether it's read-only, or read-and-write, and so on.
If MODE is <, the file is opened for input (read-only). If MODE is >, the file is opened for output, with existing files first being truncated ("clobbered") and nonexisting files newly created. If MODE is >>, the file is opened for appending, again being created if necessary.
You can put a + in front of the > or < to indicate that you want both read and write access to the file; thus +< is almost always preferred for read/write updates--the +> mode would clobber the file first. You can't usually use either read-write mode for updating textfiles, since they have variable-length records. See the -i switch in perlrun for a better approach. The file is created with permissions of 0666 modified by the process's umask value.
These various prefixes correspond to the fopen(3) modes of r, r+, w, w+, a, and a+.
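Here is a small sketch that makes the difference between > and >> concrete. write_with_mode and slurp are hypothetical helpers, and the file lives in a temporary directory rather than on the Desktop.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/write1.txt";

# Hypothetical helper: open $file in the given MODE and write $text.
sub write_with_mode {
    my ($mode, $text) = @_;
    open(my $fh, $mode, $file) or die "Can't open '$file' with mode '$mode': $!";
    print $fh $text;
    close($fh) or die "Can't close '$file': $!";
}

# Hypothetical helper: read the whole file back as one string.
sub slurp {
    open(my $fh, '<', $file) or die "Can't read '$file': $!";
    local $/;    # undef record separator: read everything at once
    return <$fh>;
}

write_with_mode('>',  "first\n");    # creates the file
write_with_mode('>>', "second\n");   # appends: the file now has two lines
my $after_append = slurp();
write_with_mode('>',  "third\n");    # truncates ("clobbers"), then writes
my $after_clobber = slurp();

print "after >>:\n$after_append", "after >:\n$after_clobber";
```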
Documentation: open, close

Unable to Downsample audio file in CGI perl script using sox

I am working on a CGI script that receives an uploaded audio file, downsamples it to 8000 Hz, and then passes it on for recognition.
I get an error while downsampling the file. The relevant code is:
1) Code for File Upload:
use CGI;
use strict;
use File::Copy qw(copy);
use CGI::Carp 'fatalsToBrowser';
my $PROGNAME = "file_upload.cgi";
my $cgi = new CGI();
print "Content-type: text/html\n\n";
my $upfile = $cgi->param('upfile');
# Get the basename in case we want to use it.
my $basename = GetBasename($upfile);
no strict 'refs';
if (! open(OUTFILE, ">../cgi-bin/upload/".$basename) ) {
print "Can't open for writing - $!";
exit(-1);
}
2) Code for downsampling:
my $source_file="/var/www/cgi-bin/upload/$upfile";
system("sox $source_file -r 8000 /var/www/cgi-bin/upload/temp.wav".";"."mv /var/www/cgi-bin/upload/temp.wav $source_file");
where:
$source_file is the path to the uploaded audio file
$upfile is the name of the uploaded WAV file
temp.wav is the temporary downsampled file which is overwritten on the original file using mv command
Error
sox FAIL formats: can't open input file `/var/www/cgi-bin/upload/file1.wav': WAVE: RIFF header not found
file1.wav is the file I uploaded
Can you help me understand why the sox command fails even though it appears to be written correctly?
This isn't really an answer to your question as we don't have enough information yet.
Have you tried running the command from your Unix command line? I'd assume you get the same error. What do you get if you run file on the file that you have saved? How big is the file before and after you upload it?
You don't show the code that writes the uploaded file. I suspect there's a bug in that. If you add that to your question, we could help you find it.
Where is GetBasename() defined? Can we see the code?
Your shell command seems strange. sox writes its output to a file called temp.wav, and then you move that file over your uploaded file. Perhaps there are a couple of steps that you aren't telling us.
Some other suggestions for improvement:
Use CGI->new, not new CGI. The latter (indirect object syntax) has some strange corner cases that you will have real problems debugging if you ever come across them.
If you're loading the CGI module, why not use its header method instead of writing your own (technically incorrect) header?
no strict 'refs' is a really bad idea (and, as far as I can see, isn't needed here).
Please use the three-argument version of open() and lexical filehandles:
open my $out_fh, '>', "../cgi-bin/upload/$basename";
Include the file path in your error message:
my $file = "../cgi-bin/upload/$basename";
my $out_fh;    # declared outside the if, so the handle is still in scope afterwards
if (!open($out_fh, '>', $file)) {
print "Can't open file '$file' for writing - $!";
exit(-1);
}
You are loading the File::Copy module, but then moving your file with a shell mv command. File::Copy's move function does the same job without a shell.
Allowing random users to upload files into a directory under your cgi-bin directory is a massive potential security hole. You should find another directory to store the uploaded files.
Oh, and then there's the whole question of why on Earth you would be writing CGI programs in 2017!
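Here is a minimal sketch that follows two of the suggestions above: run sox without a shell, and use File::Copy instead of mv. sox_cmd and downsample_in_place are hypothetical helper names. The sox arguments are copied from the question. The temporary file is assumed to live next to the source, so that move() amounts to a rename.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper: the sox argument list from the question
# (input file, target sample rate, output file), one element per argument.
sub sox_cmd {
    my ($in, $out) = @_;
    return ('sox', $in, '-r', '8000', $out);
}

# Hypothetical helper: resample $source in place via a temporary file.
sub downsample_in_place {
    my ($source) = @_;
    my $tmp = "$source.tmp.wav";    # assumption: same directory as $source
    # List form: no shell parses the file names.
    system(sox_cmd($source, $tmp)) == 0
        or die "sox failed (status $?)\n";
    move($tmp, $source) or die "Cannot move $tmp to $source: $!\n";
}

downsample_in_place($ARGV[0]) if @ARGV;
```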
The issue is resolved. I had trouble executing the sox and copy commands because of where I had placed them in the code; basically a beginner's error. I was opening the file as shown in the question, but I ran the copy and sox commands before closing the filehandle, so they did not execute successfully.
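A plausible reason that closing the handle matters is output buffering. Until the handle is closed (or flushed), some of the uploaded bytes may still sit in Perl's buffer rather than in the file. That would fit sox's "RIFF header not found" error. This sketch uses a made-up 12-byte payload in a temporary file to show the effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/upload.wav";

open(my $out, '>', $path) or die "Can't open '$path': $!";
binmode($out);
print $out "RIFF....WAVE";    # stand-in for the uploaded bytes

# Perl buffers writes to files, so at this point the data is still in memory.
# An external program started now would see an empty (or partial) file.
my $size_before_close = -s $path || 0;

close($out) or die "Can't close '$path': $!";    # flushes the buffer to disk

# Only now is it safe to hand $path to sox, mv, and so on.
my $size_after_close = -s $path;
print "before close: $size_before_close bytes, after close: $size_after_close bytes\n";
```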

Read multiple files with the same extension in one directory in Perl

I currently have an issue with reading files in one directory.
I need to take all the fastq files in a directory, run the script on each file, and put the new files in an ‘Edited_sequences’ folder.
The one script I have is:
perl -ne '$i++; if($i<80001){print}' BM2003_TCCCAGAACAAC_L001_R1_001.fastq > ./Edited_sequences/BM2003_TCCCAGAACAAC_L001_R1_001.fastq
It takes the first 80000 lines of one fastq file and outputs the result.
Now, for example, I have 2000 fastq files, so I would need to copy and paste the command 2000 times.
I know there is a glob command suited to this situation, but I just do not know how to use it.
Please help me out.
You can use Perl to do the copying and pasting for you. The first argument, *.fastq, expands to all the fastq files, and the last one, ./Edited_sequences, is the target folder for the new files:
perl -e '$d=pop; `head -n 80000 "$_" > "$d/$_"` for @ARGV' *.fastq ./Edited_sequences
glob gets you a list of filenames matching a particular pattern. It's frequently written with angle brackets (<>), a lot like reading input (you can think of it as reading filenames from a directory).
This is a simple example that will print the names of every ".fastq" file in the current directory:
print "$_\n" for <*.fastq>;
The important part is <*.fastq>, which gives us a list of filenames matching that pattern (in this case, every file with the .fastq extension). If you need to change which directory your Perl script is working in, you can use chdir.
From there, we can process your files as needed:
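while (my $filename = <*.fastq>) {
open(my $in, '<', $filename) or die "Can't read $filename: $!";
open(my $out, '>', "./Edited_sequences/$filename") or die "Can't write ./Edited_sequences/$filename: $!";
for (1..80000) {
my $line = <$in>;
last unless defined $line;    # stop early if the file has fewer lines
print $out $line;
}
}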
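Here is a slightly fuller sketch along the same lines. It assumes the files sit directly in one source directory. It creates the output folder if needed, builds each output name with basename, and uses Perl's input line counter $. to stop after a given number of lines. truncate_fastq_files is a hypothetical name, and the script only does anything when given three arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);

# Hypothetical helper: copy at most $max lines of every *.fastq file in
# $src_dir into a file with the same name under $dst_dir.
# Note: glob() splits its pattern on whitespace, so $src_dir must not contain spaces.
sub truncate_fastq_files {
    my ($src_dir, $dst_dir, $max) = @_;
    make_path($dst_dir);    # create the output folder if it doesn't exist
    my @done;
    for my $in_path (glob("$src_dir/*.fastq")) {
        my $name = basename($in_path);
        open(my $in,  '<', $in_path)         or die "Can't read $in_path: $!";
        open(my $out, '>', "$dst_dir/$name") or die "Can't write $dst_dir/$name: $!";
        while (my $line = <$in>) {
            print $out $line;
            last if $. >= $max;    # $. is the current input line number
        }
        close($out) or die "Can't close $dst_dir/$name: $!";
        close($in);
        push @done, $name;
    }
    return @done;
}

truncate_fastq_files(@ARGV) if @ARGV == 3;
```

Usage, with a hypothetical script name: perl trim_fastq.pl . ./Edited_sequences 80000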
You have two choices:
Use Perl to read in the 2000 files and run it as part of your program
Use the Shell to pass each of those 2000 file to your command line
Here's the bash alternative:
for file in *.fastq
do
perl -ne '$i++; if($i<80001){print}' "$file" > "./Edited_sequences/$file"
done
This is your same Perl one-liner, but with the shell finding each file. Because the for loop expands the glob itself and runs one command per file, it won't overflow the command-line length limit.
However, I always recommend not executing the commands right away, but echoing them into a file first:
for file in *.fastq
do
echo "perl -ne '\$i++; if(\$i<80001){print}' \
\"$file\" > \"./Edited_sequences/$file\"" >> myoutput.txt
done
Then, you can look at myoutput.txt to make sure it looks good before you actually do any real harm. Once you've determined that myoutput.txt is a good file, you can execute that as a shell script:
$ bash myoutput.txt

How do I run shell commands in a CGI program as the nobody user?

I want to run shell commands in a CGI program (written in Perl). My program doesn’t have root permission. It runs as nobody. I want to use this code:
use strict;
system <<'EEE';
awk '{a[$1]+=$2;b[$1]+=$3}END{for(i in a)print i, a[i], b[i]|"sort -nk 3"}' s.txt
EEE
I can run my code successfully with perl from the command line but not as a CGI program.
Based on the code in your question, there are at least four possibilities for failure.
The nobody user does not have permission to execute your program.
The Perl code in your question has no shebang (#!) line. You are trying to run awk, so I assume you are running on some form of Unix. If your code is missing this line, then your operating system does not know how to run your program.
The file s.txt is either not in the executing program’s working directory, or it is not readable by the nobody user.
For whatever reason, awk is not reachable via the PATH of your executing program’s environment.
To quickly diagnose such low-level problems, try to make all error output show up in the browser. One way to do this is to add the following just after the shebang line in your code:
BEGIN {
print "Content-type: text/plain\n\n";
open STDERR, ">&", \*STDOUT or print "$0: dup: $!";
}
The output will render as plain text rather than HTML, but this is a temporary measure to see your program’s output. Wrapping the code in a BEGIN block makes it execute as soon as it is parsed. Redirecting STDERR means your browser also gets anything written to the standard error.
Another way to do this is with the CGI::Carp module.
use CGI::Carp 'fatalsToBrowser';
This way, errors go to the browser and also to the web server’s error log.
If you still see 500-series errors from your server, the problem is happening at a lower level: probably some failure to start perl. Go examine your server’s error log. Once your program is executing, you can remove this temporary redirection of error output.
Finally, I recommend changing your program to:
#! /usr/bin/perl -T
BEGIN { print "Content-type: text/plain\n\n"; }
use strict;
use warnings;
$ENV{PATH} = "/bin:/usr/bin";
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};    # taint mode checks these too (see perlsec)
my $input = "/path/to/your/s.txt";
my $buckets = <<'EOProgram';
{ a[$1] += $2; b[$1] += $3 }
END { for (i in a) print i, a[i], b[i] }
EOProgram
open STDIN, "-|", "awk", $buckets, $input or die "$0: open: $!";
exec "sort", "-nk", 3 or die "$0: exec: $!";
The -T switch enables a security dataflow analysis called taint mode that prevents you from using unsanitized input on system operations such as open, exec, and so on that an attacker (or benign user supplying unexpected input) could use to harm your system. You should always add -T to CGI programs and any other code that runs on behalf of another user.
Given the nature of your awk program, a content type of text/plain seems reasonable. Output it as soon as possible.
With taint mode enabled, be explicit about the value of your PATH environment variable. If instead you stick with whatever untrusted PATH your program inherits, attempting to run external programs will fail.
Nail down the full path of your input. This will eliminate surprises.
Using the multi-argument forms of open and exec eliminates the shell and its argument parsing. (For completeness, system also has a similar multi-argument form.) Yes, writing it this way can mean being a little more deliberate (such as breaking out the arguments and setting up the pipeline yourself), but it also avoids nasty surprises.
I'm sure nobody is allowed to run shell commands. The problem is that nobody doesn't have permission to open the file s.txt. Add read permission for everyone on s.txt, and execute permission for everyone on every directory on the path to s.txt.
I would suggest finding out the fully qualified path for awk and specifying it directly. Likely the nobody user that launched httpd had a very minimal path in its $ENV{PATH}; displaying $ENV{PATH} will, I suspect, show this.
This is a good thing. I wouldn't modify the PATH; I would just specify the full path, /usr/bin/awk or whatever it is.
If you have shell access and it works, type 'which awk' to find this out.
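Without shell access, a rough Perl equivalent of which can be run from the CGI program itself. It shows what PATH the nobody user actually gets and whether awk is on it. find_in_path is a hypothetical helper built on File::Spec->path. It is a diagnostic sketch, not a replacement for hard-coding the full path as suggested above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first executable file named $name in the
# directories listed in $ENV{PATH} (roughly what `which` does), or undef.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;    # _ reuses the -f stat
    }
    return;
}

print "Content-type: text/plain\n\n";    # so the output also shows up under CGI
print "PATH is: ", ($ENV{PATH} // '(unset)'), "\n";
my $awk = find_in_path('awk');
print defined $awk ? "awk found at $awk\n" : "awk not found in PATH\n";
```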
I can run my code successfully as a Perl file but not as a CGI file.
What web server are you running under? For instance, Apache requires the script to print a CGI header, e.g. print "Content-type: text/plain; charset=utf-8\n\n", or
use CGI;
my $q = CGI->new();
print $q->header('text/html');
(See CGI)
Apache will complain in its log (error.log) about "premature end of script headers" if that is the case.
You could also just do it inline, without having to fork another process:
if ( open my $fh, '<', 's.txt' ) {
my %data;
while (<$fh>) {
my ($c1,$c2,$c3) = split;
$data{a}{$c1} += $c2;
$data{b}{$c1} += $c3;
}
foreach ( sort { $data{b}{$a} <=> $data{b}{$b} } keys %{ $data{b} } ) {
print "$_ $data{a}{$_} $data{b}{$_}\n";
}
} else {
warn "Unable to open s.txt: $!\n";
}

wkhtmltopdf/perl: HTTP headers & logging

I just discovered wkhtmltopdf and I'm trying to use it in a perl CGI script to generate PDFs. Basically, the perl script writes an HTML file, calls wkhtmltopdf via system() to create a pdf, then downloads the pdf and deletes the temporary files.
open NNN, ">$path_to_files/${file}_pdf.html" or die "can't write file: $!";
print NNN $text;
close NNN;
my @pdfSettings = (
"d:/very/long/path/wkhtmltopdf",
"$path_to_files/${file}_pdf.html",
"$path_to_files/$file.pdf"
);
system(@pdfSettings);
open(DLFILE, '<', "$path_to_files/$file.pdf");
print $q->header(
-type=> 'application/x-download',
-attachment => "$file.pdf",
-filename => "$file.pdf",
'Content-length' => -s "$path_to_files/$file.pdf",
);
binmode DLFILE;
print while <DLFILE>;
close (DLFILE);
unlink("$path_to_files/${file}_pdf.html");
unlink("$path_to_files/${file}.pdf");
This works fine on my local server. However, when I upload it to my public server, it gets as far as creating the pdf file and then dies with "The specified CGI application misbehaved by not returning a complete set of HTTP headers."
Moving the "print $q->header" to before the system() call causes the pdf to generate with wkhtmltopdf's console output ("Loading pages (1/6)," etc.) at the top of the file. So I think what's happening is that wkhtmltopdf is sending that information to the server without headers, which causes it to fail. But I can't find any options in the wkhtmltopdf docs to turn off the console output, and I can't figure out a Perl method to suppress or redirect that output.
(Yes, I'm aware of WKHTMLTOPDF.pm, but I was having trouble installing it for my flavor of ActivePerl and I wanted to avoid switching if possible.)
How about executing via qx or backticks instead of system(), and redirecting the output to NUL:?
qx("d:/very/long/path/wkhtmltopdf" "$path_to_files/${file}_pdf.html" "$path_to_files/$file.pdf" > NUL: 2> NUL:);
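Another option, which avoids the shell, is to point Perl's own STDOUT and STDERR at the null device while system() runs, and then restore them. The child process inherits the redirected handles. This is a sketch: run_quietly is a hypothetical helper name, and File::Spec->devnull picks the platform's null device (nul on Windows). It has not been tested under IIS or ActivePerl specifically.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use IO::Handle;

# Hypothetical helper: run @cmd in list form (no shell) with its STDOUT and
# STDERR sent to the null device, then restore ours. Returns the child's exit
# code, or -1 if the program could not be started.
sub run_quietly {
    my @cmd = @_;
    STDOUT->flush;    # write out anything we already printed (e.g. headers)
    STDERR->flush;
    open(my $saved_out, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
    open(my $saved_err, '>&', \*STDERR) or die "Can't dup STDERR: $!";
    my $null = File::Spec->devnull;
    open(STDOUT, '>', $null) or die "Can't redirect STDOUT: $!";
    open(STDERR, '>', $null) or die "Can't redirect STDERR: $!";
    my $rc = system(@cmd);    # the child inherits the redirected handles
    open(STDOUT, '>&', $saved_out) or die "Can't restore STDOUT: $!";
    open(STDERR, '>&', $saved_err) or die "Can't restore STDERR: $!";
    return $rc == -1 ? -1 : $? >> 8;
}
```

In the question's script, the system(@pdfSettings) line would then become something like run_quietly(@pdfSettings) == 0 or die "wkhtmltopdf failed";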