I have a strange problem with my PowerShell script. I want to measure the average time it takes to download a page, so I wrote a script that runs frequently. Sometimes, though, the script returns a result of 0, which would mean it downloaded the site in 0 ms. When I modified the script to save the whole page to a file whenever the download time is about 0 ms, nothing gets saved. I'm wondering whether I'm doing something wrong, or whether the PowerShell function isn't accurate enough to measure such "small" times.
P.S. The other "good" results are about 4-9 ms.
Here is the part of my script that measures the download time:
$StartTime = Get-Date
$PageDownload = $Request.DownloadString("mypage.com")
$TimeTaken = ((Get-Date) - $StartTime).TotalMilliseconds
Get-Date should be as precise as the system clock is.
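If the clock itself turns out to be the limiting factor, one option is to time the call with a high-resolution monotonic timer instead of subtracting two wall-clock readings (in .NET that role is played by System.Diagnostics.Stopwatch). Here is a minimal sketch of the idea in Python; `time_call` is a hypothetical helper name, not part of the original script:

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds).

    Uses time.perf_counter(), a monotonic high-resolution timer,
    rather than the wall clock.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Example: time a short sleep as a stand-in for the download call.
_, ms = time_call(time.sleep, 0.02)
print(f"took {ms:.3f} ms")
```

A reading of exactly 0 ms from a timer like this would point at caching or a failed download rather than at timer resolution.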
There could be web caching going on. Unfortunately, disabling caching for WebClient is not possible, from what I see elsewhere. The "do it right" method is to construct your own Http request with the TcpClient class, but that's also pretty complex.
One easy way to make sure you're not being served a cached copy is to append an arbitrary, changing value as a GET parameter. It's a hack, but it is often enough to fool a cache. So, instead of:
"http://mypage.com"
You use:
"http://mypage.com?someUnusedValueName=$([System.Environment]::TickCount)"
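The cache-busting trick from the answer above (appending a changing, unused GET parameter) can be sketched in Python as follows. The helper name `add_cache_buster` is hypothetical; the parameter name is the one used in the answer, and the value is just a counter that changes between calls:

```python
import time
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_cache_buster(url, name="someUnusedValueName", value=None):
    """Return url with an extra query parameter that changes on every call."""
    if value is None:
        # Any value that differs between requests will do.
        value = time.monotonic_ns()
    parts = urlsplit(url)
    # Keep any parameters the URL already has, then append ours.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_cache_buster("http://mypage.com", value=123))
```

Going through a URL parser rather than plain string concatenation means the parameter is joined with `&` when the URL already has a query string.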
(Resolved: The delay/issues I was having, was not due to the delay of adding in try-catch, or redirecting errors to null. It was a result of using usernames that were not real, for testing the script, as opposed to actual/production names. The AD lookup times are obviously different -- didn't think of that before.)
I've been editing https://github.com/dafthack/DomainPasswordSpray for a work project, with the intent of adding logging. The script writes to a log file about 6 times a second, which throws IOException errors about the file not being accessible. It's a non-terminating error.
I'm ok with the error and potentially non-logged info. However, it clutters the console.
Since it's a non-terminating error, I tried set-content -erroraction 'silentlycontinue', but the error is still displayed. However, there is no measurable performance impact -- just a cluttered console.
So, researching further, I tried a try/catch block, which does eliminate the error from the console, BUT the script now takes a tad over 2x as long to run.
Is there a 3rd alternative?
I can mitigate it by splitting the data lists in two and running the script twice, concurrently. But that's less than ideal, as I'm already splitting the data list up to speed up the process. (A single list takes 2 hours and 30m is the goal, so I'd need 8 windows to maintain the current timing...)
Anyway. Hope that makes sense. Any thoughts/input appreciated. (attempting to copy the code to a machine where I can upload a portion here for those who want to review, but gmail blocks it. working on it.)
The code causing issues:
$FileLocation2 = "LOG-LastTested_$($Userlist)"
# Write out the last user tried
$tm = get-date
# This file is written so rapidly that errors can occur because it's still open ("in use by another process"). SilentlyContinue is not working, so use try-catch, which suppresses the error (but is 2x slower).
Try {Write-Output "Num: $cu, Name: $name, Pass: $Password, Time: $tm" | Set-Content -ErrorAction 'Stop' $FileLocation2}
Catch [System.IO.IOException] {continue}
Resolved: The delay/issues I was having, was not due to the delay of adding in try-catch, or redirecting errors to null. It was a result of using usernames that were not real, for testing the script, as opposed to actual/production names. The AD lookup times are obviously different -- didn't think of that before.
I was testing the error suppression (try-catch/redirect to null) with a list of dummy users. When I checked the rate of testing, it had dropped from 10 users/second with production accounts, to 3/s with dummy accounts. After reverting all the error suppression it was still at the slower 3/s rate. So I put my try-catch lines back in, and tried with production accounts, and it worked great -- 8-10 users/sec.
So, the issue was due to the only change: using dummy accounts instead of production ones. Once I went back to all production accounts, WITH my error suppression, it worked like a champ. Hope that helps someone else.
Thanks to #RetiredGeek and #Mr.Sven who kept me testing and looking for a solution.
I used this link to fetch the pound to euro exchange rate on a daily (nightly) basis:
http://www.google.com/ig/calculator?hl=en&q=1pound=?euro
This returned an array, which I then stripped down to the data I needed.
Since the first of November they have retired iGoogle, so the URL now redirects to: https://support.google.com/websearch/answer/2664197
Does anyone know an alternative URL that won't require me to rewrite the whole function? I'm sure Google didn't stop providing this service entirely.
I started getting cronjob errors today on this very issue. So I fell back to a prior URL I was using before I switched to the faster/reliable iGoogle.
Url to programmatically hit (USD to EUR):
http://www.webservicex.net/CurrencyConvertor.asmx/ConversionRate?FromCurrency=USD&ToCurrency=EUR
Details about it:
http://www.webservicex.net/ws/WSDetails.aspx?CATID=2&WSID=10
It works for now, but it's prone to being slow at times, and it used to respond with an "Out of space" error randomly. Just be sure to code in a way that handles that, and maybe run the cron four times a day instead of once. I run ours every hour.
Example code to get the rate out of the return (there is probably a more elegant way):
$ci = curl_init($accessurl);
curl_setopt($ci, CURLOPT_HTTPGET, 1);
curl_setopt($ci, CURLOPT_RETURNTRANSFER, 1);
$rawreturn = curl_exec($ci);
curl_close($ci);
$rate = trim(preg_replace("/.*<double[^>]*>([^<]*)<\/double[^>]*>.*/i","$1",$rawreturn));
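As a possibly more robust alternative to the regex, the same extraction can be done with an XML parser. This is only a sketch in Python: it assumes the response is a single `<double>` element wrapping the number (which is what the regex above implies), and the sample payload and the helper name `parse_rate` are illustrative, not taken from the service's documentation:

```python
import xml.etree.ElementTree as ET

def parse_rate(raw_return):
    """Extract the rate from a '<double ...>value</double>' response.

    Raises ValueError for anything else, e.g. a plain-text error body.
    """
    try:
        root = ET.fromstring(raw_return)
    except ET.ParseError as exc:
        raise ValueError(f"response is not XML: {raw_return!r}") from exc
    # Drop a '{namespace}' prefix if the element carries an xmlns.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag.lower() != "double" or root.text is None:
        raise ValueError(f"unexpected response: {raw_return!r}")
    return float(root.text.strip())

# Assumed sample payload, for illustration only.
sample = '<double xmlns="http://www.webserviceX.NET/">0.7412</double>'
print(parse_rate(sample))
```

Raising on anything unexpected makes it easy for the cron job to keep yesterday's rate when the service returns an error page instead of a number.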
I have written a Perl script which uses WWW::Mechanize to connect to a site, log in and then visit a few pages inside the site. It all works well; however, when I try to visit a large number of pages, the script gets killed. I am sure this has nothing to do with the HTTP server's configuration or its connection limits, because the script is running against my own site.
Here's a high level overview of my script:
$url="http://example.com";
$mech=WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get($url);
login to the site using the form fields.
Now, once I am logged in, I connect to URLs within the site as follows:
$i is the iteration counter in a for loop
$internal_url="http://example.com/index.php?page=$i";
$mech->get($internal_url);
perform some operations on the page returned ($mech->content using HTML::TreeBuilder::XPath)
now, I iterate over the for loop connecting to a different internal_url, since the value of $i is incremented in every iteration.
As I said, it all works good. However, after about 180 pages, the script gets killed.
What could be the reason? I have tried multiple times.
I even added a $mech->delete; right before the end of the FOR loop to prevent any memory leak.
However, the only issue is that the login session which was maintained by $mech would be destroyed as a result of this.
I have tried multiple times and this script always gets killed after visiting the same number of pages.
Thanks.
Try this code:
$mech=WWW::Mechanize->new();
$mech->stack_depth(0);
OR
$mech=WWW::Mechanize->new(stack_depth=>0);
According to the docs: Get or set the page stack depth. Use this if
you're doing a lot of page scraping and running out of memory.
A value of 0 means "no history at all." By default, the max stack
depth is humongously large, effectively keeping all history.
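To see why a bounded page stack helps, here is a toy model in Python. It is not how WWW::Mechanize is implemented; `TinyBrowser` and its fields are hypothetical and only mimic the documented behaviour that each fetch pushes the previous page onto a history stack unless the depth is limited:

```python
from collections import deque

class TinyBrowser:
    """Toy browser that remembers previously fetched pages."""

    def __init__(self, stack_depth=None):
        # None = unbounded history; 0 = keep no history at all.
        self.history = deque(maxlen=stack_depth)
        self.current = None

    def get(self, url):
        if self.current is not None:
            # The page we are leaving goes onto the history stack.
            self.history.append(self.current)
        # Stand-in for a downloaded page body.
        self.current = f"<html>{url}</html>"

unbounded = TinyBrowser()
bounded = TinyBrowser(stack_depth=0)
for i in range(180):
    url = f"http://example.com/index.php?page={i}"
    unbounded.get(url)
    bounded.get(url)

print(len(unbounded.history), len(bounded.history))
```

With unbounded history every earlier page stays in memory for the life of the object, so memory use grows with the number of pages visited; with a depth of 0 only the current page is kept, and the session state (cookies in the real module) is unaffected.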
This is a weird one. :)
I have a script running under Apache 1.3, with Apache::PerlRun option of mod_perl. It uses the standard CGI.pm module. It's a regularly accessed script on a busy server, accessed over https.
The URL is typically something like...
/script.pl?action=edit&id=47049
Which is then brought into Perl the usual way...
my $action = $cgi->param("action");
my $id = $cgi->param("id");
This has been working successfully for a couple of years. However we started getting support requests this week from our customers who were accessing this script and getting blank pages. We already had a line like the following that put the current URL into a form we use for customers to report an issue about a page...
$cgi->url(-query => 1);
And when we view source of the page, the result of that command is the same URL, but with an entirely different query string.
/script.pl?action=login&user=foo&password=bar
A query string that we recognise as being from a totally different script elsewhere on our system.
However crazy it sounds, it seems that when users are accessing a URL with a query string, the query string that the script is seeing is one from a previous request on another script. Of course the script can't handle that action and outputs nothing.
We have some automated test scripts running to see how often this happens, and it's not every time. To throw some extra confusion into the mix, after an Apache restart, the problem seems to initially disappear completely only to come back later. So whatever is causing it is somehow relieved by a restart, but we can't see how Apache can possibly take the request from one user and mix it up with another.
This, it appears, is an interesting combination of Apache 1.3, mod_perl 1.31, CGI.pm and Apache::GTopLimit.
A bug was logged against CGI.pm in May last year: RT #57184
Which also references CGI.pm params not being cleared?
CGI.pm registers a cleanup handler in order to clean up all of its cache (line 360):
$r->register_cleanup(\&CGI::_reset_globals);
Apache::GTopLimit (like Apache::SizeLimit mentioned in the bug report) also has a handler like this:
$r->post_connection(\&exit_if_too_big) if $r->is_main;
In versions before mod_perl 1.31, post_connection and register_cleanup appear to push onto the stack, while in 1.31 it appears that the GTopLimit handler clobbers the CGI.pm entry. So if your GTopLimit function fires because the Apache process has grown too large, CGI.pm won't be cleaned up, leaving it open to returning the same parameters the next time you use it.
The solution seems to be to change line 360 of CGI.pm to:
$r->push_handlers( 'PerlCleanupHandler', \&CGI::_reset_globals);
Which explicitly pushes the handler onto the list.
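The difference between clobbering and pushing a cleanup handler can be shown with a toy model in Python. This is not mod_perl or CGI.pm code; every name here (`parse_query`, `reset_globals`, `size_check`, `clobber`, `push`, `serve`) is hypothetical and only mirrors the behaviour described above:

```python
_cached_query = None  # stands in for CGI.pm's per-process globals

def parse_query(actual_query):
    """Return the cached query if a previous request left one behind."""
    global _cached_query
    if _cached_query is None:
        _cached_query = actual_query
    return _cached_query

def reset_globals():
    """Cleanup handler that clears the per-process cache."""
    global _cached_query
    _cached_query = None

def size_check():
    """Stand-in for the memory-limit handler registered second."""

def clobber(handlers, fn):
    handlers[:] = [fn]      # registering replaces whatever was there

def push(handlers, fn):
    handlers.append(fn)     # registering adds to the list

def serve(queries, register):
    """Handle requests in one long-lived process; return what each one saw."""
    seen = []
    for query in queries:
        handlers = []
        register(handlers, reset_globals)
        register(handlers, size_check)
        seen.append(parse_query(query))
        for handler in handlers:
            handler()
    reset_globals()  # leave the model clean for the next run
    return seen

requests = ["action=login&user=foo", "action=edit&id=47049"]
print(serve(requests, clobber))
print(serve(requests, push))
```

In the clobbering run the second request sees the first request's query string, because the reset handler was replaced and never ran; in the pushing run each request sees its own.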
Our restart of Apache temporarily resolved the problem because it reduced the size of all the processes and gave GTopLimit no reason to fire.
And we assume it has appeared over the past few weeks because we have increased the size of the Apache processes through new development that pulled in something that wasn't loaded before.
All tests so far point to this being the issue, so fingers crossed it is!
How would I validate that a JPG file is a valid image file? We are having files written to a directory using FTP, but we seem to be picking up a file before it has finished being written, which gives us invalid images. I need to be able to identify when it is no longer being written to. Any ideas?
Easiest way might just be to write the file to a temporary directory and then move it to the real directory after the write is finished.
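A minimal sketch of that write-then-move idea in Python, assuming the staging and final directories live on the same filesystem so the move is a plain rename; the function name `publish` and the directory layout are hypothetical:

```python
import os

def publish(data, staging_dir, final_dir, name):
    """Write data under staging_dir, then move it into final_dir in one step."""
    tmp_path = os.path.join(staging_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before the move
    final_path = os.path.join(final_dir, name)
    # On the same filesystem the rename is atomic, so a reader of
    # final_dir sees either no file or the whole file.
    os.replace(tmp_path, final_path)
    return final_path
```

With FTP the same effect is usually achieved by having the uploader send the file to the staging directory and rename it once the transfer has finished, so the consumer only ever watches the final directory.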
Or you could check here.
JPEG::Error
[arguments: none] If the file reference remains undefined after a call to new, the file is to be considered not parseable by this module, and one should issue some error message and go to another file. An error message explaining the reason of the failure can be retrieved with the Error method:
EDIT:
Image::TestJPG might be even better.
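If pulling in a module is not an option, a cheap sanity check is to look at the JPEG markers: a JPEG stream starts with the SOI marker (FF D8) and ends with the EOI marker (FF D9), and a partially uploaded file will usually be missing the latter. The sketch below is in Python, is not what either module above does, and uses a hypothetical function name:

```python
import os

SOI = b"\xff\xd8"  # start-of-image marker
EOI = b"\xff\xd9"  # end-of-image marker

def looks_complete_jpeg(path):
    """Return True if the file starts with SOI and ends with EOI."""
    with open(path, "rb") as f:
        if f.read(2) != SOI:
            return False
        # Jump to the last two bytes of the file.
        f.seek(-2, os.SEEK_END)
        return f.read(2) == EOI
```

This is a heuristic only: some files carry extra bytes after the EOI marker, and a truncated file could by chance end in FF D9, so it complements rather than replaces knowing when the upload has finished.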
You're solving the wrong problem, I think.
What you should be doing is figuring out how to tell when whatever FTPd you're using is done writing the file - that way when you come to have the same problem for (say) GIFs, DOCs or MPEGs, you don't have to fix it again.
Precisely how you do that depends rather crucially on what FTPd on what OS you're running. Some do, I believe, have hooks you can set to trigger when an upload's done.
If you can run your own FTPd, Net::FTPServer or POE::Component::Server::FTP are customizable to do the right thing.
In the absence of that:
1) try tailing the logs with a Perl script that looks for 'upload complete' messages
2) use something like lsof or fuser to check whether anything is locking a file before you try and copy it.
Again looking at the FTP issue rather than the JPG issue.
I check the timestamp on the file to make sure it hasn't been modified in the last X (5) mins; that way I can be reasonably sure it has finished uploading:
# time in seconds that the file was last modified
my $last_modified = (stat("$path/$file"))[9];
# get the time in secs since epoch (ie 1970)
my $epoch_time = time();
# ensure file's not been modified during the last 5 mins, ie still uploading
unless ( $last_modified >= ($epoch_time - 300)) {
# move / edit or what ever
}
I had something similar come up once, more or less what I did was:
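var oldImageSize = -1; // start at -1 so an empty (0-byte) file doesn't end the loop on the first check
var currentImageSize;
// keep polling until the size is the same on two consecutive checks
while((currentImageSize = checkImageSize(imageFile)) != oldImageSize){
    oldImageSize = currentImageSize;
    sleep(10);
}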
processImage(imageFile);
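A runnable version of the same polling idea, sketched in Python. The function name `wait_until_stable`, the 10-second default interval and the number of consecutive checks are assumptions for illustration:

```python
import os
import time

def wait_until_stable(path, interval=10.0, checks=2):
    """Block until the file size is unchanged for `checks` consecutive polls.

    Returns the final size in bytes.
    """
    last_size = -1  # -1 so a 0-byte file is not treated as stable at once
    stable = 0
    while stable < checks:
        size = os.path.getsize(path)
        if size == last_size:
            stable += 1
        else:
            stable = 0
            last_size = size
        if stable < checks:
            time.sleep(interval)
    return last_size
```

The interval has to be longer than the longest pause you expect in the middle of an upload, otherwise a stalled transfer will look finished.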
Have the FTP process set the readonly flag, then only work with files that have the readonly flag set.