Perl script to access a webpage - perl

I have a server from which I can access a webpage. I want to test the server's reaction when several users (say 60,000 users) access the same webpage simultaneously.
I'm looking for a script to do this; a Perl script would be better.
Here is the code I tried:
#!c:\\perl\\bin
use strict;
use WWW::Mechanize;
my $url = "http://www.cpan.org";
my $searchstring = "WWW::Mechanize";
my $mech = WWW::Mechanize->new();
while (i == 60000)
{
    $mech->get($url);
    i++;
}
But this script accesses the URL one at a time, and I need simultaneous access.

You can use ab. Use its concurrency (-c) and number of requests (-n) options.
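For example, an invocation along these lines would fire 60,000 requests with 100 of them in flight at a time (the URL and numbers are placeholders, and most ab builds cap the usable concurrency well below 60,000, so ramp up in stages):
ab -n 60000 -c 100 http://www.cpan.org/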

Your code, as shown above, doesn't even compile. It would be far easier for people to help you if you gave us the exact code that you're using, rather than trying to retype it and adding typos.
To get your code to compile, I had to declare $i and add the $ before the two uses of i in your programme. I ended up with this:
#!/usr/bin/perl
use strict;
use WWW::Mechanize;
my $url = "http://www.cpan.org";
my $searchstring = "WWW::Mechanize";
my $mech = WWW::Mechanize->new();
my $i;
while ($i == 60000) {
    $mech->get($url);
    $i++;
}
You say that the loop only gets executed once. That's not true. The loop doesn't get executed at all. The condition on the loop is $i == 60000. The first time this condition is checked, $i is undefined so the condition is false and the loop is skipped.
I think you probably wanted $i <= 60000 instead.
But as Alan points out, you'd be far better off using an existing tool like ab.

You can either use Perl itself to launch your script 6000 times in parallel, or write one (or maybe more than one) .BAT file or similar Windows script to launch the Perl script.
Be careful to clearly understand how to execute stuff in detached mode under Windows.
(Of course you have to remove the while loop in your original script).
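If you want to stay in Perl, here is a sketch of that idea using the CPAN module Parallel::ForkManager (my suggestion, not part of the original answer; the worker and request counts are placeholders to tune, and on Windows fork is emulated with threads, so keep the worker count modest):
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Parallel::ForkManager;

my $url      = "http://www.cpan.org";
my $workers  = 50;      # number of parallel clients (assumption; tune for your machine)
my $requests = 1200;    # requests per client; 50 * 1200 = 60000 total

my $pm = Parallel::ForkManager->new($workers);
for my $client (1 .. $workers) {
    $pm->start and next;                              # parent: move on and spawn the next worker
    my $mech = WWW::Mechanize->new(autocheck => 0);   # each child gets its own agent; don't die on HTTP errors
    $mech->get($url) for 1 .. $requests;
    $pm->finish;                                      # child exits
}
$pm->wait_all_children;
Each child holds its own connection, so the requests overlap instead of running one after another.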

ab has already been mentioned, but that only tests against the root page (/) of the site. We found siege more useful in replicating real load, since we were able to test against a long list of URLs (10,000 or so) taken from our live access log...
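For reference, a typical siege invocation looks something like this (the concurrency level and URL-list filename are placeholders; -c sets the number of concurrent users and -f points at a file of URLs):
siege -c 50 -f urls.txt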

Related

Fetch a URL 100 times using Perl

The problem is that I need to fetch one URL (I can't be specific about the exact link; it makes a request and looks like http://link.com/?name=name&password=password& and so on).
I need to fetch this URL 100 times in a row. I cannot do this manually in a browser; it takes too much time.
Is there any way to run this link (just run it, as if you put the link in a browser and pressed Enter) 100 times in a row using a Perl script?
I have not worked with Perl before, so I am asking for help directly. I googled some information and put together a little script, but it seems I am missing something:
#!/usr/bin/perl -w
use LWP::Simple;
my $uri = 'http://my link here';
my $content = get $uri;
Could you please advise to me how I can finish this script?
Use a (simple) for loop.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $uri = 'http://my link here';
get $uri for 1 .. 100;
Update: Just read in a comment that you don't care about the returned data, so I've edited my answer to remove the unnecessary $content variable.

Change output filename from WGET when using input file option

I have a Perl script I wrote that gets some image URLs, puts the URLs into an input file, and then runs wget with the --input-file option. This works perfectly... or at least it did as long as the image filenames were unique.
I have a new company sending me data and they use a very TROUBLESOME naming scheme. All files have the same name, 0.jpg, in different folders.
for example:
cdn.blah.com/folder/folder/202793000/202793123/0.jpg
cdn.blah.com/folder/folder/198478000/198478725/0.jpg
cdn.blah.com/folder/folder/198594000/198594080/0.jpg
When I run my script with this, wget works fine and downloads all the images, but they are titled 0.jpg.1, 0.jpg.2, 0.jpg.3, etc. I can't just count them and rename them because files can be broken, not available, whatever.
I tried running wget once for each file with -O, but it's embarrassingly slow: starting the program, connecting to the site, downloading, and ending the program. Thousands of times. It's an hour vs minutes.
So, I'm trying to find a method to change the output filenames from wget without it taking so long. The original approach works so well that I don't want to change it too much unless necessary, but I am open to suggestions.
Additional:
LWP::Simple is too simple for this. Yes, it works, but very slowly. It has the same problem as running individual wget commands: each get() or getstore() call makes the system reconnect to the server. Since the files are so small (60 kB on average) and there are so many to process (1,851 for this one test file alone), the connection time is considerable.
The filename I will be using can be found with /\/(\d+)\/(\d+.jpg)/i, where the filename will simply be $1$2 to get 2027931230.jpg. Not really important for this question.
I'm now looking at LWP::UserAgent with LWP::ConnCache, but it times out and/or hangs on my pc. I will need to adjust the timeout and retry values. The inaugural run of the code downloaded 693 images (43mb) in just a couple minutes before it hung. Using simple, I only got 200 images in 5 minutes.
use LWP::UserAgent;
use LWP::ConnCache;
chomp(my @filelist = <INPUTFILE>);
my $browser = LWP::UserAgent->new;
$browser->conn_cache(LWP::ConnCache->new());
foreach (@filelist) {
    /\/(\d+)\/(\d+.jpg)/i;
    my $newfilename = $1 . $2;
    my $response = $browser->mirror($_, $folder . $newfilename);
    die 'response failure' if $response->is_error();
}
LWP::Simple's getstore function allows you to specify a URL to fetch from and the filename to store the data from it in. It's an excellent module for many of the same use cases as wget, but with the benefit of being a Perl module (i.e. no need to outsource to the shell or spawn off child processes).
use LWP::Simple;
# Grab the filename from the end of the URL
my $filename = (split '/', $url)[-1];
# If the file exists, increment its name
while (-e $filename)
{
    $filename =~ s{ (\d+)[.]jpg }{ $1+1 . '.jpg' }ex
        or die "Unexpected filename encountered";
}
getstore($url, $filename);
The question doesn't specify exactly what kind of renaming scheme you need, but this will work for the examples given by simply incrementing the filename until the current directory doesn't contain that filename.
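Since the question's main complaint is the reconnection overhead, here is a sketch of the keep-alive variant (not the poster's final code; the list filename, output directory, and regex are taken from or assumed from the question):
use strict;
use warnings;
use LWP::UserAgent;

# keep_alive => 1 gives the agent an LWP::ConnCache, so the TCP connection
# is reused across requests instead of reconnecting for every small file.
my $ua = LWP::UserAgent->new(keep_alive => 1, timeout => 30);

my $folder = 'images/';    # assumption: output directory

open my $list, '<', 'urls.txt' or die "Cannot open URL list: $!";
while (my $url = <$list>) {
    chomp $url;
    # Build a unique local name from the last two path components,
    # e.g. .../202793123/0.jpg -> 2027931230.jpg (regex from the question).
    next unless $url =~ m{/(\d+)/(\d+\.jpg)$}i;
    my $filename = $folder . $1 . $2;
    my $response = $ua->mirror($url, $filename);
    warn "Failed $url: " . $response->status_line . "\n"
        if $response->is_error;
}
close $list;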

For/foreach loop crashing CGI server

I need to write some Perl scripts for a class which include some for or foreach loops. For some reason even the simplest for loops are just returning a 500 server error. I've checked many times and the code also works on codepad.org, but I don't know why it is not working on the server.
I don't have access to the server logs, so I can't really tell what's going on.
These are some very simple loops that are also causing the error.
@a = (2, 3, 4);
foreach my $r (@a) {
    print $r;
}
or
@a = (2, 3, 4);
for ($i = 0; $i <= 2; $i++) {
    print $a[$i];
}
Any ideas?
Any time you are writing a CGI script, put this at the top while you are working on it:
use CGI::Carp qw(fatalsToBrowser);
This will redirect errors in your Perl script to the browser window. It should give you a better idea of what is causing your problem (assuming it is at the script level).
It also lets you easily send your own errors and test statements to the browser using die:
die "testing 1 2 3";
This can be useful for doing quick debugging.
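For completeness, a very common cause of a 500 on an otherwise fine loop is output sent before (or without) a CGI header. Here is a minimal sketch that should run under most CGI servers (the shebang path and Content-type line are assumptions about your setup):
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Carp qw(fatalsToBrowser);   # show fatal errors in the browser while debugging

# A CGI script must print a complete header before any other output;
# otherwise many servers answer with a 500 error.
print "Content-type: text/plain\n\n";

my @a = (2, 3, 4);
foreach my $r (@a) {
    print "$r\n";
}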

Is it possible to send POST parameters to a CGI script without another HTTP request?

I'm attempting to run a CGI script in the current environment from another Perl module. All works well using standard system calls for GET requests. POST is fine too, until the parameter list gets too long; then it gets cut off.
Has anyone run into this problem, or do you have any suggestions for other ways to attempt this?
The following are somewhat simplified for clarity. There is more error checking, etc.
For GET requests and POST requests w/o parameters, I do the following:
# $query is a CGI object.
my $perl = $^X;
my $cgi = $cgi_script_location; # /path/file.cgi
system {$perl} $cgi;
Parameters are passed through the QUERY_STRING environment variable. STDOUT is captured by the calling script, so whatever the CGI script prints behaves as normal. This part works.
For POST requests with parameters the following works, but seemingly limits my available query length:
# $query is a CGI object.
my $perl = $^X;
my $cgi = $cgi_script_location; # /path/file.cgi
# Gather parameters into a URL-escaped string suitable
# to pass to a CGI script ran from the command line.
# Null characters are handled properly.
# e.g., param1=This%20is%20a%20string&param2=42&... etc.
# This works.
my $param_string = $self->get_current_param_string();
# Various ways to do this, but system() doesn't pass any
# parameters (different question).
# Using qx// and printing the return value works as well.
open(my $cgi_pipe, "|$perl $cgi");
print {$cgi_pipe} $param_string;
close($cgi_pipe);
This method works for short parameter lists, but if the entire command gets to be close to 1000 characters, the parameter list is cut short. This is why I attempted to save the parameters to a file; to avoid shell limitations.
If I dump the parameter list from the executed CGI script I get something like the following:
param1=blah
... a bunch of other parameters ...
paramN=whatever
p <-- cut off after 'p'. There are more parameters.
Other things I've done that didn't help or work
Followed the CGI troubleshooting guide
Saved the parameters to a file using CGI->save(), passing that file to the CGI script. Only the first parameter is read using this method.
$> perl index.cgi < temp-param-file
Saved $param_string to a file, passing that file to the CGI script just like above. Same limitations as passing the commands through the command line; still gets cut off.
Made sure $CGI::POST_MAX is acceptably high (it's -1).
Made sure the CGI's command-line processing was working. (:no_debug is not set)
Ran the CGI from the command line with the same parameters. This works.
Leads
Obviously, this seems like a character limit of the shell Perl is using to execute the command, but it wasn't resolved by passing the parameters through a file.
Passing parameters to system as a single string, from HTTP input, is extremely dangerous.
From perldoc -f system,
If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument,..
In other words, if I pass in arguments -e printf("working..."); rm -rf /; I can delete information from your disk (everything if your web server is running as root). If you choose to do this, make sure you call system("perl", @cgi) instead.
The argument length issue you're running into may be an OS limitation (described at http://www.in-ulm.de/~mascheck/various/argmax/):
There are different ways to learn the upper limit:
command: getconf ARG_MAX
system header: ARG_MAX in e.g. <[sys/]limits.h>
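If you want to check that limit from inside Perl rather than from the shell, here is a small sketch (it assumes the standard POSIX module, which exposes sysconf):
use strict;
use warnings;
use POSIX qw(sysconf _SC_ARG_MAX);

# Maximum combined size of arguments plus environment the OS allows for exec().
my $arg_max = sysconf(_SC_ARG_MAX);
print defined $arg_max
    ? "ARG_MAX is $arg_max bytes\n"
    : "ARG_MAX is not available on this platform\n";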
Saving to a temp file is risky: multiple calls to the CGI might save to the same file, creating a race condition where one user's parameters might be used by another user's process.
You might try opening a file handle to the process and passing arguments as standard input, instead: open my $cgi_pipe, '|-', 'perl' or die; print {$cgi_pipe} @cgi;
I didn't want to do this, but I've gone with the most direct approach and it works. I'm tricking the environment to think the request method is GET so that the called CGI script will read its input from the QUERY_STRING environment variable it expects. Like so:
$ENV{'QUERY_STRING'} = $long_parameter_string . '&' . $ENV{'QUERY_STRING'};
$ENV{'REQUEST_METHOD'} = 'GET';
system {$perl_exec} $cgi_script;
I'm worried about potential problems this may cause, but I can't think of what this would harm, and it works well so far. But, because I'm worried, I thought I'd ask the horde if they saw any potential problems:
Are there any problems handling a POST request as a GET request on the server
I'll save marking this as the official answer until people have confirmed or at least debated it on the above post.
Turns out that the problem is actually related to the difference in Content-Length between the original parameters and the parameter string I cobbled together. I didn't realize that the CGI module was using this value from the original headers as the limit to how much input to read (makes sense!). Apparently the extra escaping I was doing was adding some characters.
My solution's trick is simply to piece together the parameter string I'll be passing, and then set the environment variable the CGI module checks to determine the content length equal to the length of that string.
Here's the final working code:
use CGI::Util qw(escape);
my $params;
foreach my $param (sort $query->param) {
    my $escaped_param = escape($param);
    foreach my $value ($query->param($param)) {
        $params .= "$escaped_param=" . escape("$value") . "&";
    }
}
foreach (keys %{$query->{'.fieldnames'}}) {
    $params .= ".cgifields=" . escape("$_") . "&";
}
# This is the trick.
$ENV{'CONTENT_LENGTH'} = length($params);
open(my $cgi_pipe, "| $perl $cgi_script") || die("Cannot fork CGI: $!");
local $SIG{PIPE} = sub { warn "spooler pipe broke" };
print {$cgi_pipe} $params;
warn("param chars: " . length($params));
close($cgi_pipe) || warn "Error: CGI exited with value $?";
Thanks for all the help!

How can I quickly find the user's terminal PID in Perl?

The following snippet of code is used to find the PID of a user's terminal, by using ptree and grabbing the third PID from the results it returns. All terminal PIDs are stored in a hash with the user's login as the key.
## If process is a TERMINAL.
## The command ptree is used to get the terminal's process ID.
## The user can then use this ID to peek the user's terminal.
if ($PID =~ /(\w+)\s+(\d+) .+basic/) {
    $user = $1;
    if (open(PTREE, "ptree $2 |")) {
        while ($PTREE = <PTREE>) {
            if ($PTREE =~ /(\d+)\s+-pksh-ksh/) {
                $terminals{$user} = $terminals{$user} . " $1";
                last;
            }
            next;
        }
        close(PTREE);
    }
    next;
}
Below is a sample ptree execution:
ares./home_atenas/lmcgra> ptree 29064
485 /usr/lib/inet/inetd start
23054 /usr/sbin/in.telnetd
23131 -pksh-ksh
26107 -ksh
29058 -ksh
29064 /usr/ob/bin/basic s=61440 pgm=/usr/local/etc/logon -q -nr trans
412 sybsrvr
I'd like to know if there is a better way to code this. This is the part of the script that takes longest to run.
Note: this code, along with other snippets, are inside a loop and are executed a couple of times.
I think the main problem is that this code is in a loop. You don't need to run ptree and parse the results more than once! You need to figure out a way to run ptree once and put its output into a data structure that you can use later. Probably some kind of simple hash will suffice. You may even be able to just keep around your %terminals hash and keep reusing it.
Some nitpicks...
Both of your "next" statements seem unnecessary to me... you should be able to just remove them.
Replace
$terminals{$user} = $terminals{$user} . " $1";
with:
$terminals{$user} .= " $1";
Replace the bareword PTREE which you are using as a filehandle with $ptreeF or some such... using barewords became unnecessary for filehandles about 10 years ago :)
I don't know why your $PID variable is all caps... it could be confusing to readers of your code because it looks like there is something special about that variable, and there isn't.
I think you'll get the best performance improvement by avoiding the overhead of repeatedly executing an external command (ptree, in this case). I'd look for a CPAN module that provides a direct interface to the data structures that ptree is reading. Check the Linux:: namespace, maybe? (I'm not sure if ptree is setuid; that may complicate things.)
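One candidate from CPAN (my suggestion, not one this answer names; it has to be installed and to support your platform) is Proc::ProcessTable, which reads the process table directly so you can follow parent links in memory instead of shelling out to ptree:
use strict;
use warnings;
use Proc::ProcessTable;

# Build a pid -> process-object map once, then walk ppid links in memory.
my $table = Proc::ProcessTable->new;
my %proc_by_pid = map { $_->pid => $_ } @{ $table->table };

sub ancestors {
    my ($pid) = @_;
    my @chain;
    while (my $p = $proc_by_pid{$pid}) {
        my $ppid = $p->ppid;
        last if !$ppid || $ppid == $pid;   # reached init / avoid cycles
        push @chain, $ppid;
        $pid = $ppid;
    }
    return @chain;                         # parent, grandparent, ...
}

my @up = ancestors(29064);                 # hypothetical PID from the ptree example
print "grandparent of 29064 is $up[1]\n" if @up > 1;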
The above advice aside, some additional style and robustness notes based on the posted snippet only (forgive me if the larger code invalidates them):
I'd start by using strict, at the very least. Lexical filehandles would also be a good idea.
You appear to be silently ignoring the case when you cannot open() the ptree command. That could happen for many reasons, some of which I can't imagine you wanting to ignore, such as…
You're not using the full path to the ptree command, but rather assuming it's in your path—and that the one in your path is the right one.
How many users are on the system? Can you invert this? List all -pksh-ksh processes in the system along with their EUIDs, and build the map from that - that might be only one execution of ps/ptree.
I was thinking of using ps to get the parent's PID, but I would need to loop this to get the great-grandparent's PID. That's the one I need. Thanks. – lamcro
Sorry, there are many users and each can have up to three terminals open. The whole script is used to find those terminals that are using a file. I use fuser to find the processes that use a file. Then use ptree to find the terminal's pid. – lamcro
If you have (or can get) a list of PIDs using a file, and just need all of the grand-parents of that PID, there's an easier way, for sure.
#!perl
use warnings;
use strict;
#***** these PIDs are gotten with fuser or some other method *****
my($fpids) = [27538, 31812, 27541];
#***** get all processes, assuming linux PS *****
my($cmd) = "ps -ef";
open(PS, "$cmd |") || die qq([ERROR] Cannot open pipe from "$cmd" - $!\n);
my($processlist) = {};
while (<PS>) {
    chomp;
    my($user, $pid, $ppid, $rest) = split(/ +/, $_, 4);
    $processlist->{$pid} = $ppid;
}
close PS;
#***** lookup grandparent *****
foreach my $fpid (@$fpids) {
    my($parent) = $processlist->{$fpid} || 0;
    my($grandparent) = $processlist->{$parent} || 0;
    if ($grandparent) {
        #----- do something here with grandparent's pid -----
        print "PID:GRANDPID - $fpid:$grandparent\n";
    }
    else {
        #----- some error condition -----
        print "ERROR - Cannot determine GrandPID: $fpid ($parent)\n";
    }
}
Which for me produces:
ERROR - Cannot determine GrandPID: 27538 (1)
PID:GRANDPID - 31812:2804
PID:GRANDPID - 27541:27538
Have you considered using 'who -u' to tell you which process is the login shell for a given tty instead of using ptree? This would simplify your search - irrespective of the other changes you should also make.
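A rough sketch of that idea (the field layout of 'who -u' output varies a little between platforms, so treat the parsing below as an assumption to verify on your system):
use strict;
use warnings;

# 'who -u' usually prints: user  tty  login-time  idle  pid  [comment]
open my $who, '-|', 'who', '-u' or die "Cannot run who -u: $!";

my %shell_pids_for;    # login name => list of login-shell PIDs
while (my $line = <$who>) {
    my @fields = split ' ', $line;
    # Take the right-most all-digit field as the PID; the idle column is
    # usually '.', 'old', or hh:mm, so it will not match.
    my ($pid) = grep { /^\d+$/ } reverse @fields;
    push @{ $shell_pids_for{ $fields[0] } }, $pid if defined $pid;
}
close $who;

for my $user (sort keys %shell_pids_for) {
    print "$user: @{ $shell_pids_for{$user} }\n";
}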
I just did some trivial timings here based on your script (calling "cat ptree.txt" instead of ptree itself) and confirmed my thoughts that all of your time is spent creating new sub-processes and running ptree itself. Unless you can factor away the need to call ptree (maybe there's a way to open up the connection once and reuse it, like with nslookup), you won't see any real gains.