wkhtmltopdf/perl: HTTP headers & logging

I just discovered wkhtmltopdf and I'm trying to use it in a perl CGI script to generate PDFs. Basically, the perl script writes an HTML file, calls wkhtmltopdf via system() to create a pdf, then downloads the pdf and deletes the temporary files.
open NNN, ">$path_to_files/${file}_pdf.html" or die "can't write file: $!";
print NNN $text;
close NNN;
my @pdfSettings = (
"d:/very/long/path/wkhtmltopdf",
"$path_to_files/${file}_pdf.html",
"$path_to_files/$file.pdf"
);
system(@pdfSettings);
open(DLFILE, '<', "$path_to_files/$file.pdf");
print $q->header(
-type=> 'application/x-download',
-attachment => "$file.pdf",
-filename => "$file.pdf",
'Content-length' => -s "$path_to_files/$file.pdf",
);
binmode DLFILE;
print while <DLFILE>;
close (DLFILE);
unlink("$path_to_files/${file}_pdf.html");
unlink("$path_to_files/${file}.pdf");
This works fine on my local server. However, when I upload it to my public server, it gets as far as creating the pdf file and then dies with "The specified CGI application misbehaved by not returning a complete set of HTTP headers."
Moving the "print $q->header" to before the system() call causes the pdf to generate with wkhtmltopdf's console output ("Loading pages (1/6)," etc.) at the top of the file, so I think what's happening is that wkhtmltopdf is spewing that information headerless to the server and causing it to fail. But I can't find any options in the wkhtmltopdf docs to turn off the console output, and I can't figure out a Perl method to suppress or redirect that output.
(Yes, I'm aware of WKHTMLTOPDF.pm, but I was having trouble installing it for my flavor of ActivePerl and I wanted to avoid switching if possible.)

How about executing via qx or backticks instead of system(), and redirecting the output to NUL:?
qx("d:/very/long/path/wkhtmltopdf" "$path_to_files/${file}_pdf.html" "$path_to_files/$file.pdf" > NUL: 2> NUL:);
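If you would rather keep the list form of system() (which bypasses the shell, so no quoting is needed), the same effect can be had by pointing STDOUT and STDERR at the null device around the call. The following is only a sketch under that assumption; run_quietly is a hypothetical helper name, not part of any module. wkhtmltopdf also accepts a --quiet (-q) switch, which is worth trying first.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: run a command in list form (no shell) with its
# STDOUT and STDERR sent to the null device, then restore both handles.
# Returns the raw value of $? from system().
sub run_quietly {
    my @cmd = @_;

    # Duplicate the current handles so they can be restored afterwards.
    open(my $saved_out, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
    open(my $saved_err, '>&', \*STDERR) or die "Can't dup STDERR: $!";

    my $null = File::Spec->devnull;   # "nul" on Windows, "/dev/null" on Unix
    open(STDOUT, '>', $null) or die "Can't redirect STDOUT: $!";
    open(STDERR, '>', $null) or die "Can't redirect STDERR: $!";

    system(@cmd);
    my $status = $?;

    # Put the original handles back before printing any HTTP headers.
    open(STDOUT, '>&', $saved_out) or die "Can't restore STDOUT: $!";
    open(STDERR, '>&', $saved_err) or die "Can't restore STDERR: $!";

    return $status;
}

# Usage with the variables from the question:
#   my $status = run_quietly(
#       "d:/very/long/path/wkhtmltopdf", "--quiet",
#       "$path_to_files/${file}_pdf.html", "$path_to_files/$file.pdf",
#   );
#   die "wkhtmltopdf failed with status $status" if $status != 0;
```

Checking the return value before printing the download headers also avoids sending a header for a PDF that was never created.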

Related

How to sanitize input from open files in Perl

I have a Perl script which opens a file, processes it and prints some output.
The input file is gzipped.
The path to $file is passed to the script as an argument.
Below is the current solution I'm using:
open(my $fh, "-|", "$gzcat $file") or die("Cannot open $file$!");
The script has failed in Checkmarx's security audit recently, with the following error:
<script> gets user input for the $fh element. This element’s value then flows through the code without being properly sanitized or validated and is eventually displayed to the user in method <method>. This may enable a CrossSite-Scripting attack.
I have tried validating the file exists with perl -f, and also removing unwanted characters using $file =~ s/[^A-Za-z0-9_\-\.\/]//g;, yet it does not satisfy Checkmarx.
I would like to know the proper way of sanitizing an input which contains a path to a file in Perl.
As long as you are on Perl 5.8 or newer on an OS that supports forking, or 5.22 or newer on Windows, you can use the list form of pipe open to bypass the shell when running your command. This avoids problems where the filename contains metacharacters the shell will interpret, such as & and spaces.
open(my $fh, "-|", $gzcat, $file) or die("Cannot open $file: $!");
This is not the validation or sanitization you asked about, but it is important for avoiding both vulnerabilities and misbehavior. The cross-site scripting possibility mentioned in the report would come from the filename being displayed later; if it is displayed in an HTML page, for example, you must HTML-escape it. Most templating systems have methods to do this.
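As a rough illustration of the escaping step, here is a minimal hand-rolled sketch. The name escape_html is hypothetical; in real code you would more likely use your templating system's filter or a module such as HTML::Entities (encode_entities).

```perl
use strict;
use warnings;

# Hypothetical helper: replace the five characters that are special in
# HTML text and attribute values with their entity equivalents.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Usage: escape the user-supplied filename before it reaches the page.
#   print "<p>Processing ", escape_html($file), "</p>\n";
```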
I ended up removing unwanted characters with
$file =~ s/[^A-Za-z0-9_\-\.\/]//g;
checking that the file exists with Perl's -f, and opening the file using IO::Uncompress::Gunzip.
This passes Checkmarx's audit.
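Put together, those three steps might look something like the sketch below. The sub name read_gzipped_lines is made up for illustration, and whether this exact shape satisfies a given scanner is not something the thread confirms.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: strip unexpected characters from the path, check
# that it names a regular file, and read it without spawning gzcat.
sub read_gzipped_lines {
    my ($file) = @_;

    # Same whitelist as in the answer: letters, digits, _ - . and /
    $file =~ s/[^A-Za-z0-9_\-\.\/]//g;
    -f $file or die "Not a regular file: $file\n";

    my $gz = IO::Uncompress::Gunzip->new($file)
        or die "Cannot gunzip $file: $GunzipError\n";

    my @lines;
    while (defined(my $line = $gz->getline)) {
        push @lines, $line;
    }
    $gz->close;
    return @lines;
}
```

Because no external command is run, there is no shell (or argument list) for the filename to be injected into.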

Unable to Downsample audio file in CGI perl script using sox

I am working on a CGI script where I receive an uploaded audio file, downsample it to 8000 Hz and then have it recognised later.
I am facing an error while downsampling the file. The code for downsampling goes like:
1) Code for File Upload:
use CGI;
use strict;
use File::Copy qw(copy);
use CGI::Carp 'fatalsToBrowser';
my $PROGNAME = "file_upload.cgi";
my $cgi = new CGI();
print "Content-type: text/html\n\n";
my $upfile = $cgi->param('upfile');
# Get the basename in case we want to use it.
my $basename = GetBasename($upfile);
no strict 'refs';
if (! open(OUTFILE, ">../cgi-bin/upload/".$basename) ) {
print "Can't open for writing - $!";
exit(-1);
}
2) Code for downsampling:
my $source_file="/var/www/cgi-bin/upload/$upfile";
system("sox $source_file -r 8000 /var/www/cgi-bin/upload/temp.wav".";"."mv /var/www/cgi-bin/upload/temp.wav $source_file");
where:
source_file is the path for uploaded audio file
$upfile is the name of the uploaded wav file
temp.wav is the temporary downsampled file which is overwritten on the original file using mv command
Error
sox FAIL formats: can't open input file `/var/www/cgi-bin/upload/file1.wav': WAVE: RIFF header not found
file1.wav is the file I uploaded
Please help me understand why the sox command is not executing despite it being correctly written?
This isn't really an answer to your question as we don't have enough information yet.
Have you tried running the command from your Unix command line? I'd assume you get the same error. What do you get if you run file on the file that you have saved? How big is the file before and after you upload it?
You don't show the code that writes the uploaded file. I suspect there's a bug in that. If you add that to your question, we could help you find it.
Where is GetBasename() defined? Can we see the code?
Your sox command seems strange. You're running sox on a file called temp.wav and then copying that file over your uploaded file. Perhaps there are a couple of steps that you aren't telling us.
Some other suggestions for improvement:
Use CGI->new, not new CGI. The latter (indirect object syntax) has some strange corner cases that you will have real problems debugging if you ever come across them.
If you're loading the CGI module, then why not use its header method instead of writing your own (technically incorrect) header?
no strict 'refs' is a really bad idea (and, as far as I can see, isn't needed here).
Please use the three-arg version of open() and lexical filehandles:
open my $out_fh, '>', "../cgi-bin/upload/$basename"
Include the file path in your error message
my $file = "../cgi-bin/upload/$basename";
if (!open my $out_fh, '>', $file) {
print "Can't open file '$file' for writing - $!";
exit(-1);
}
You are loading the File::Copy module, but then moving your file using a shell command.
Allowing random users to upload files into a directory under your cgi-bin directory is a massive potential security hole. You should find another directory to store the uploaded files.
Oh, and then there's the whole - why on Earth would you be writing CGI programs in 2017!
The issue is resolved. The reason I was having problems executing the sox and copy commands was where I had placed them in the code. Basically a beginner's error. I was opening the file as shown in the question, but I ran the copy and sox commands before closing the filehandle, so they did not execute successfully (presumably the uploaded data had not been flushed to disk yet).
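For anyone hitting the same thing, the ordering that matters is roughly as follows. This is a sketch, not the asker's actual code (which was never posted); save_upload is a hypothetical name, and the sox call is left in comments because it depends on sox being installed.

```perl
use strict;
use warnings;

# Hypothetical helper: copy an uploaded filehandle to disk in binary
# mode and close the output BEFORE anything else reads the file.
sub save_upload {
    my ($in_fh, $dest) = @_;

    open(my $out_fh, '>', $dest)
        or die "Can't open '$dest' for writing: $!";
    binmode $in_fh;
    binmode $out_fh;

    my $buffer;
    while (read($in_fh, $buffer, 8192)) {
        print {$out_fh} $buffer;
    }

    # close() flushes Perl's buffer; until then the file on disk may be
    # empty or truncated, which is what sox would see.
    close($out_fh) or die "Can't close '$dest': $!";
    return -s $dest;
}

# Only after save_upload() has returned, run the conversion, e.g.:
#   use File::Copy qw(move);
#   system('sox', $dest, '-r', '8000', $temp) == 0 or die "sox failed: $?";
#   move($temp, $dest) or die "move failed: $!";
```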

Redirect and Restore STDERR in Dancer

When starting my HTTP server I don't want to see >> Dancer2 v0.201000 server <pid> listening on http://0.0.0.0:<port> printed on stderr. That's why I added the open(local *STDERR, ...) line shown below before calling start():
get "/pwd" => sub {
my $pwd = cwd;
print STDERR "\n\n[PWD] : $pwd\n"; # this line is not being printed
print "\n\n[STDOUT::PWD] : $pwd\n";
my %responseHash = ( pwd => $pwd );
my $response = encode_json \%responseHash;
return $response;
};
my $dancerStartErr;
sub startServer {
open (local *STDERR, ">", \$dancerStartErr)
or die "Dup err to variable error: $!\n";
start();
}
startServer();
The problem is that later I can't print anything to STDERR. How can I reopen STDERR (open(STDERR, ">", \*STDERR); doesn't help)?
If you don't want your application to log anything, you can change the logging engine to Dancer2::Logger::Null. You do that by editing your config.yml, or one of your environment files. For example, to turn it off in production, edit appdir/environments/production.yml:
logger: 'null'
The default is the logging engine 'console', which prints stuff to your terminal.
There are other Dancer2::Logger:: classes available bundled with Dancer2 and on CPAN in their own distributions. A better solution to just dumping everything into a black hole might be to log to a file instead. Documentation of how to configure it further can be found in Dancer2::Core::Role::Logger.
Also note that instead of printing to STDERR in your code, you should use the logging keywords with the appropriate log level.
print STDERR "\n\n[PWD] : $pwd\n"; # this line is not being printed
This is not a good idea, because you cannot distinguish if this is an error, or a warning, or just debugging output. That's why there are different log levels built into Dancer2.
core
debug
info
warning
error
All of them are available as keywords. There is documentation on it in Dancer2::Manual.
Since the working directory is probably not relevant in production, but only during development, you'd go with debug.
debug "[PWD] : $pwd";
That's it. It takes care of newlines and such for you automatically.
You could use select before redirecting to save it in a variable
my $oldfh = select(STDERR);
and then use it later
select($oldfh);
Also check out:
Capture::Tiny::Extended
How to redirect and restore STDOUT/STDERR
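On the plain-Perl side of the question: a localized STDERR is only redirected for as long as the enclosing scope is active, and it is restored automatically when that scope exits. In the question, start() is called inside that scope and presumably does not return while the server is running, which would explain why the route's print STDERR never shows up. A small sketch of the scoping, with capture_stderr as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: run a code reference with STDERR captured into a
# string. "local" puts the original STDERR back when the sub returns.
sub capture_stderr {
    my ($code) = @_;
    my $captured = '';
    open(local *STDERR, '>', \$captured)
        or die "Can't redirect STDERR to a variable: $!";
    $code->();
    close(STDERR);   # make sure everything has reached $captured
    return $captured;
}

# The message printed inside the callback is captured, not displayed.
my $startup_noise = capture_stderr(sub {
    print STDERR ">> server listening\n";
});

# Outside the helper, STDERR is the original handle again.
print STDERR "[PWD] this line is visible again\n";
```

Note that an in-memory handle like this only captures output written by Perl code in the same process, not output from child processes.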

Unable to write file with Apache CGI scripts

I want to add some logs to my CGI scripts with Perl code like this:
open(LOG, ">/path/to/my.log") or die;
print LOG "Some content...\n";
close(LOG);
However, logs are never written to my log file, while the scripts are still correctly handling requests.
I'm not very familiar with Apache, CGI and Perl, so gurus please shine a light.
It is probably a permission problem. The script's runner (probably user: apache, httpd or nobody) has no permission to write to the file. However, to be sure, you need to check what $! contains. Also try checking Apache's ErrorLog file when the script is run.
I would rewrite your code as:
use CGI::Carp qw( croak );
open my $log, '>', '/path/to/my.log' or croak "Error opening file: $!";
print $log "Some content...\n";
close $log;
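One thing to watch in both versions: opening with '>' truncates the file each time the script runs, so only the last request's output survives. For a log you would normally append instead. A minimal sketch, assuming a hypothetical append_log helper and a timestamp format chosen purely for illustration:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: append one timestamped line to a log file.
sub append_log {
    my ($path, $message) = @_;

    # '>>' appends; the file is created if it does not exist yet.
    open(my $log, '>>', $path)
        or die "Can't open '$path' for appending: $!";

    my $stamp = strftime('%Y-%m-%d %H:%M:%S', localtime);
    print {$log} "[$stamp] $message\n";

    close($log) or die "Can't close '$path': $!";
    return;
}

# Usage:
#   append_log('/path/to/my.log', 'Some content...');
```

If the open fails, the message in $! (for example "Permission denied") ends up in the web server's error log, which is the quickest way to confirm a permission problem.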
The problem has been solved: changes to my Perl script take effect only after restarting Apache. Not sure why it behaves like this because I am thinking Perl is an interpreted language and it can be modified on the fly...

How do I run shell commands in a CGI program as the nobody user?

I want to run shell commands in a CGI program (written in Perl). My program doesn’t have root permission. It runs as nobody. I want to use this code:
use strict;
system <<'EEE';
awk '{a[$1]+=$2;b[$1]+=$3}END{for(i in a)print i, a[i], b[i]|"sort -nk 3"}' s.txt
EEE
I can run my code successfully with perl from the command line but not as a CGI program.
Based on the code in your question, there are at least four possibilities for failure.
The nobody user does not have permission to execute your program.
The Perl code in your question has no shebang (#!) line. You are trying to run awk, so I assume you are running on some form of Unix. If your code is missing this line, then your operating system does not know how to run your program.
The file s.txt is either not in the executing program’s working directory, or it is not readable by the nobody user.
For whatever reason, awk is not reachable via the PATH of your executing program’s environment.
To quickly diagnose such low-level problems, try to have all error output to show up in the browser. One way to do this is adding the following just after the shebang line in your code.
BEGIN {
print "Content-type: text/plain\n\n";
open STDERR, ">&", \*STDOUT or print "$0: dup: $!";
}
The output will render as plain text rather than HTML, but this is a temporary measure to see your program’s output. Because it is wrapped in a BEGIN block, the code executes as soon as it is parsed. Redirecting STDERR to STDOUT means your browser also gets anything written to the standard error.
Another way to do this is with the CGI::Carp module.
use CGI::Carp 'fatalsToBrowser';
This way, errors go to the browser and also to the web server’s error log.
If you still see 500-series errors from your server, the problem is happening at a lower level: probably some failure to start perl. Go examine your server’s error log. Once your program is executing, you can remove this temporary redirection of error output.
Finally, I recommend changing your program to
#! /usr/bin/perl -T
BEGIN { print "Content-type: text/plain\n\n"; }
use strict;
use warnings;
$ENV{PATH} = "/bin:/usr/bin";
my $input = "/path/to/your/s.txt";
my $buckets = <<'EOProgram';
{ a[$1] += $2; b[$1] += $3 }
END { for (i in a) print i, a[i], b[i] }
EOProgram
open STDIN, "-|", "awk", $buckets, $input or die "$0: open: $!";
exec "sort", "-nk", 3 or die "$0: exec: $!";
The -T switch enables a security dataflow analysis called taint mode that prevents you from using unsanitized input on system operations such as open, exec, and so on that an attacker (or benign user supplying unexpected input) could use to harm your system. You should always add -T to CGI programs and any other code that runs on behalf of another user.
Given the nature of your awk program, a content type of text/plain seems reasonable. Output it as soon as possible.
With taint mode enabled, be explicit about the value of your PATH environment variable. If instead you stick with whatever untrusted PATH your program inherits, attempting to run external programs will fail.
Nail down the full path of your input. This will eliminate surprises.
Using the multi-argument forms of open and exec eliminates the shell and its argument parsing. (For completeness, system also has a similar multi-argument form.) Yes, writing it this way can mean being a little more deliberate (such as breaking out the arguments and setting up the pipeline yourself), but it also avoids nasty surprises.
I'm sure nobody is allowed to run shell commands. The problem is that nobody doesn't have permission to open the file s.txt. Add read permission for everyone to s.txt, and add execute permission to everyone on every directory up to s.txt.
I would suggest finding out the full qualified path for awk and specifying it directly. Likely the nobody that launched httpd had a very minimal path in its $ENV{PATH}. Displaying the $ENV{PATH} I am guessing will show this.
This is a good thing, I wouldn't modify the path, but just specify the path /usr/bin/awk or what not.
If you have shell access and it works, type 'which awk' to find this out.
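If you have no shell access, the CGI program itself can report where (or whether) awk is found on the PATH it actually runs with. The following is a sketch of that lookup; find_in_path is a hypothetical helper, roughly what 'which' does on Unix.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first executable regular file named
# $program in the directories listed in $ENV{PATH}, or nothing.
sub find_in_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Usage inside the CGI program, after the Content-type header:
#   print "PATH is $ENV{PATH}\n";
#   my $awk = find_in_path('awk');
#   print defined $awk ? "awk found at $awk\n" : "awk not found on PATH\n";
```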
"i can run my codes successfully in perl file but not in cgi file."
What web server are you running under? For instance, apache requires printing a CGI header i.e. print "Content-type: text/plain; charset=utf-8\n\n", or
use CGI;
my $q = CGI->new();
print $q->header('text/html');
(See CGI)
Apache will complain in the log (error.log) about "premature end of script headers" if what I said is the case.
You could just do it inline without having to fork out to another process...
if ( open my $fh, '<', 's.txt' ) {
my %data;
while (<$fh>) {
my ($c1,$c2,$c3) = split;
$data{a}{$c1} += $c2;
$data{b}{$c1} += $c3;
}
foreach ( sort { $data{b}{$a} <=> $data{b}{$b} } keys %{ $data{b} } ) {
print "$_ $data{a}{$_} $data{b}{$_}\n";
}
} else {
warn "Unable to open s.txt: $!\n";
}