How can I limit execution time for a Perl script in IIS?

This is a shared hosting environment. I control the server, but not necessarily the content. I've got a client with a Perl script that seems to run out of control every now and then and suck down 50% of the processor until the process is killed.
With ASP scripts, I'm able to restrict the amount of time the script can run, and IIS will simply shut it down after, say, 90 seconds. This doesn't work for Perl scripts, since they run as CGI processes (IIS actually launches an external process to execute the script).
Similarly, techniques that look for excess resource consumption in a worker process will likely not catch this, since the resource being consumed (the processor) is being chewed up by a child process rather than the worker process itself.
Is there a way to make IIS abort a Perl script (or other CGI-type process) that's running too long? How?

On a UNIX-style system, I would install a signal handler to trap ALRM, then use the alarm function to start a timer before beginning an action that I expected might time out. If the action completed, I'd use alarm(0) to turn off the alarm and exit normally; otherwise the signal handler should pick it up and close everything up gracefully.
I have not worked with Perl on Windows recently, and while Windows is somewhat POSIXy, I cannot guarantee this will work; you'll have to check the perl documentation (perlport covers this) to see if, or to what extent, signals are supported on your platform.
More detailed information on signal handling and this sort of self-destruct programming using alarm() can be found in the Perl Cookbook. Here's a brief example lifted from another post and modified a little:
eval {
    # Create the signal handler and make it local so it falls out of scope
    # outside the eval block
    local $SIG{ALRM} = sub {
        print "Print this if we time out, then die.\n";
        die "alarm\n";
    };

    # Set the alarm, take your chance running the routine, and turn off
    # the alarm if it completes.
    alarm(90);
    routine_that_might_take_a_while();
    alarm(0);
};
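One follow-up worth adding (this check is mine, not part of the quoted example): when the alarm fires, the die "alarm\n" is caught by the surrounding eval, so the caller can tell a timeout apart from any other failure by inspecting $@ afterwards:

if ($@) {
    if ($@ eq "alarm\n") {
        # The alarm fired before the routine finished.
        warn "routine_that_might_take_a_while() timed out\n";
    }
    else {
        # Something other than the timeout went wrong; re-raise it.
        die $@;
    }
}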

The ASP script timeout applies to all scripting languages. If the script is running in an ASP page, the script timeout will close the offending page.

An update on this one...
It turns out that this particular script apparently is a little buggy, and that the Googlebot has the uncanny ability to "press its buttons" and drive it crazy. The script is an older, commercial application that does calendaring. Apparently, it displays links for "next month" and "previous month", and if you follow the "next month" link too many times, you'll fall off a cliff. The resulting page, however, still includes a "next month" link. Googlebot would continuously beat the script to death and chew up the processor.
Curiously, adding a robots.txt with Disallow: / didn't solve the problem. Either the Googlebot had already gotten ahold of the script and wouldn't let loose, or else it simply disregarded the robots.txt.
Anyway, Microsoft's Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) was a huge help, as it allowed me to see the environment for the perl.exe process in more detail, and I was able to determine from it that it was the Googlebot causing my problems.
Once I knew that (and determined that robots.txt wouldn't solve the problem), I was able to use IIS directly to block all traffic to this site from *.googlebot.com, which worked well in this case, since we don't care if Google indexes this content.
Thanks much for the other ideas that everyone posted!
Eric Longman

Googling for "iis cpu limit" gives these hits:
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/38fb0130-b14b-48d5-a0a2-05ca131cf4f2.mspx?mfr=true
"The CPU monitoring feature monitors and automatically shuts down worker processes that consume large amounts of CPU time. CPU monitoring is enabled for individual application pools."
http://technet.microsoft.com/en-us/library/cc728189.aspx
"By using CPU monitoring, you can monitor worker processes for CPU usage and optionally shut down the worker processes that consume large amounts of CPU time. CPU monitoring is only available in worker process isolation mode."

Related

Can I open and run from multiple command line prompts in the same directory?

I want to open two command line prompts (I am using CMDer) from the same directory and run different commands at the same time.
Would those two commands interrupt each other?
One is for compiling a web application I am building (takes like 7 minutes to compile), and the other is to see the history of the commands I ran (this one should be done quickly).
Thank you!
Assuming that CMDer does nothing more than issue the same commands to the operating system that a standard cmd.exe console would, then the answer is a clear "Yes, they do interfere, but it depends" :D
Breakdown:
The first part, "opening multiple consoles", is certainly possible. You can open up N console windows and in each of them switch to the same directory without any problems (except maybe RAM restrictions).
The second part, "run commands which do or do not interfere", is the tricky part. If your idea is that a console window presents you with something like an isolated environment, where you can do things as you like and, if you close the window, everything is back to normal as if you had never touched anything (think of a virtual machine snapshot which is lost/reverted when closing the VM), then the answer is: this is not the case. There will be cross-console effects observable.
Think about deleting a file in one console window and then opening this file in a second console window: it would not be very intuitive if the file had not vanished in the second console window as well.
However, sometimes there are delays until changes to the file system are visible to another console window. It could be that you delete the file in one console, run dir in another console on the directory where the file was sitting, and still see that file in the listing. But if you try to access it, the operating system will certainly quit with an error message of the kind "File not found".
Generally you should consider a console window to be a "View" on your system. If you do something in one window, the effect will be present in the other, because you changed the underlying system which exists only once (the system is the "Model" - as in "Model-View-Controller Design Pattern" you may have heard of).
An exception to this might be changes to the environment variables. These are copied from the current state when a console window is started. And if you change the value of such a variable, the other console windows will stay unaffected.
So, in your scenario, if you let a build/compile operation run, and during this process some files on your file system are created, read (locked), altered or deleted, then there would be a possible conflict if the other console window tries to access the same files. It would be a so-called "race condition", that is, a non-deterministic situation: which state of a file the second console window ends up seeing depends on timing (and the same applies in reverse if the second console also changes files the first one wants to work with).
If there is no interference on a file level (reading the same files is allowed, writing to the same file is not), then there should be no problem of letting both tasks run at the same time.
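Purely as an illustration (in Perl, since that is the language used elsewhere on this page, and with a made-up log file name), advisory locking is the conventional way to let two independently started processes take turns writing to the same file:

use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical example: both console windows append to the same build.log.
# flock() makes each writer wait for the other, so lines never interleave.
open my $log, '>>', 'build.log' or die "open build.log: $!";
flock($log, LOCK_EX)            or die "flock build.log: $!";
print {$log} scalar(localtime), ": step finished\n";
flock($log, LOCK_UN);
close $log or die "close build.log: $!";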
However, on a very detailed view, both processes do interfere in that they need the same limited (but plentiful) CPU and RAM resources of your system. This should not pose any problem with today's PC computing power, considering features like X separate cores, 16GB of RAM, terabytes of hard drive storage or fast SSDs, and so on.
Unless there is a very demanding, highly parallelizable, high-priority task to be considered, which eats up 98% of CPU time, for example. Then there might be a considerable slowdown impact on other processes.
Normally, the operating system's scheduler does a good job of giving each user process enough CPU time to finish as quickly as possible, while still presenting a responsive mouse cursor, playing some music in the background, allowing a Chrome to run with more than 2 tabs ;) and uploading the newest telemetry data to some servers on the internet, all at the same time.
There are techniques which make it possible for a file to be available as snapshots taken at given timestamps. The key phrase would be "Shadow Copy" under Windows. Without going into details, this technique allows, for example, defragmenting a file while it is being edited in some application, or a backup to copy a (large) file while a delete operation runs on the same file. The operating system ensures that the access time is considered when a process requests access to a file. So the OS could let the backup finish first before it schedules the delete operation to run, since the delete was started after the backup (in this example), or it could do even more sophisticated things to present a synchronized file system state, even if it is actually changing at the moment.

Powershell memory usage - expensive?

I am new to PowerShell but have written a few scripts running on a Windows 2003 server. It's definitely more powerful than cmd scripting (maybe due to me having a programming background). However, when I delved further, I noticed that:
1. Each script launched runs under its own powershell process, i.e. you see a new powershell process for each script.
2. The scripts I tested for memory are really simple, say, build a string or query an environment variable, then Start-Sleep for 60 seconds. So nothing needy (as to memory usage). But each process takes around >30MB. Call me stingy, but as there are memory-intensive applications scheduled to run every day, and if I need to schedule a few PowerShell scripts to run regularly and maybe some script running continuously as a service, I'd certainly try to keep memory consumption as low as possible. (This is because we recently experienced a large application failure due to lack of memory.)
I have not touched C# yet, but would anyone reckon that it may sometimes be better to write the task in C#?
Meanwhile, I've seen posts regarding memory leaks in PowerShell. Am I right to think that the memory created by the script lives within the process space of powershell, so that when the script terminates, and hence powershell terminates, the memory it created gets cleared?
My PowerShell.exe 2.0 by itself (not running a script) is ~30MB on XP. This shouldn't worry you much with the average memory per machine these days. Regarding memory leaks, there have been cases where people use 3rd-party libraries that leak memory when objects aren't properly disposed of. To address those you have to manually invoke the garbage collector using [gc]::Collect(), but this is rare. Other times I've seen people use Get-Content to read a very large file and assign it to a variable before using it. This will take a lot of memory as well. In that case you can use the pipeline to read the file a portion at a time and reduce your memory footprint.
1 - Yes, a new process is created. The same is true when running a cmd script, a VBScript, or a compiled C# executable.
2 - Loading the powershell host and runtime will take some non-trivial amount of memory, which will vary from system to system and version to version. It will generally be a heavier-weight process than a cmd shell or a dedicated C# exe. For those MB, you are getting the rich runtime and library support that makes Powershell so powerful.
General comments:
The OS allocates memory per-process. Once a process terminates, all of its memory is reclaimed. This is the general design of any modern OS, and is not specific to Powershell or even Windows.
If your team is running business-critical applications on hardware such that a handful of 30MB processes can cause a catastrophic failure, you have bigger problems. Opening a browser and going to Facebook will eat more memory than that.
In the time it takes you to figure out some arcane batch script solution, you could probably create a better solution in Powershell, and your company could afford new dedicated hardware with the savings in billable hours :-)
You should use the tool which is most appropriate for the job. Powershell is often the right tool, but not always. It's great for automating administrative tasks in a Windows environment (file processing, working with AD, scheduled tasks, setting permissions, etc, etc). It's less great for high-performance, heavily algorithmic tasks, or for complex coding against raw .NET APIs. For these tasks, C# would make more sense.
Powershell has huge backing/support from Microsoft (and a big user community!), and it's been made very clear that it is the preferred scripting environment for Windows going forward. All new server-side tech for Windows has powershell support. If you are working in admin/IT, it would be a wise investment to build up some skills in Powershell. I would never discourage someone from learning C#, but if your role is more IT than dev then Powershell will be the right tool much more often, and your colleagues are more likely to also understand it.
Powershell requires (much) more resources (RAM) than cmd so if all you need is something quick and simple, it makes more sense to use cmd.
CMD uses native Win32 calls and Powershell uses the .Net framework. Powershell takes longer to load, and can consume a lot more RAM than CMD.
"I monitored a Powershell session executing Get-ChildItem. It grew to
2.5GB (all of it private memory) after a few minutes and was no way nearly finished. CMD “dir /o-d” with a small scrollback buffer
finished in about 2 minutes, and never took more than 300MB of
memory."
https://qr.ae/pGmwoe

Non-blocking / Asynchronous Execution in Perl

Is there a way to implement non-blocking / asynchronous execution (without fork()'ing) in Perl?
I used to be a Python developer for many years... Python has the really great 'Twisted' framework that allows you to do so (using DEFERREDs). When I ran a search to see if there is anything in Perl to do the same, I came across the POE framework - which seemed "close" enough to what I was searching for. But... after spending some time reading the documentation and "playing" with the code, I came up against "the wall" - namely the following limitation (from the POE::Session documentation):
Callbacks are not preemptive. As long as one is running, no others will be dispatched. This is known as cooperative multitasking. Each session must cooperate by returning to the central dispatching kernel.
This limitation essentially defeats the purpose of asynchronous/parallel/non-blocking execution - by restricting to only one callback (block of code) executing at any given moment. No other callback can start running while another is already running!
So... is there any way in Perl to implement multi-tasking (parallel, non-blocking, asynchronous execution of code) without fork()'ing - similar to DEFERREDs in Python?
Coro is a mix between POE and threads. From reading its CPAN documentation, I think that IO::Async does real asynchronous execution. threads can be used too - at least Padre IDE successfully uses them.
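To give a feel for the IO::Async style (this sketch is mine, not from the answer, and assumes a reasonably recent IO::Async that provides the run/stop loop methods), here is a single process servicing several timers concurrently without fork() or threads:

use strict;
use warnings;
use IO::Async::Loop;
use IO::Async::Timer::Countdown;

my $loop = IO::Async::Loop->new;

# Two independent timers; the event loop interleaves their callbacks
# within one process.
for my $delay (1, 2) {
    my $timer = IO::Async::Timer::Countdown->new(
        delay     => $delay,
        on_expire => sub { print "timer with ${delay}s delay fired\n" },
    );
    $timer->start;
    $loop->add($timer);
}

# A third timer shuts the loop down so the example terminates.
my $quit = IO::Async::Timer::Countdown->new(
    delay     => 3,
    on_expire => sub { $loop->stop },
);
$quit->start;
$loop->add($quit);

$loop->run;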
I'm not very familiar with Twisted or POE, but basic parallel execution is pretty simple with threads. Note that not every perl interpreter is compiled with threading support, so you would need to check for that (perl -V:usethreads will tell you). The forks package is a drop-in replacement for threads (it implements the full threads API) but uses processes seamlessly under the hood. Then you can do stuff like this:
use threads;    # or "use forks;" - both export async()

my $thread = async {
    print "you can pass a block of code as an arg unlike Python :p\n";
    return some_func();
};
my $result = $thread->join();
I've definitely implemented callbacks from an event loop in an async process using forks and I don't see why it wouldn't work with threads.
Twisted also uses cooperative multi-tasking just like POE & Coro.
However it looks like Twisted Deferred does (or can) make use of threads. NB. See this answer from the SO question Twisted: Making code non-blocking
So you would need to go the same route with POE (though using fork is probably preferable).
So one POE solution would be to use: POE::Wheel::Run - portably run blocking code and programs in subprocesses.
For alternatives to POE take a look at AnyEvent and Reflex.
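For comparison, here is a minimal AnyEvent sketch (mine, not the answerer's): a condition variable with begin/end counting blocks the main flow until two timers, both serviced by one process, have fired:

use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;
$cv->begin;    # expect two end() calls before recv() returns
$cv->begin;

# Both timers are armed at once; the event loop runs their callbacks
# as they come due, with no forking and no preemption.
my $w1 = AnyEvent->timer(after => 1, cb => sub { print "first timer fired\n";  $cv->end });
my $w2 = AnyEvent->timer(after => 2, cb => sub { print "second timer fired\n"; $cv->end });

$cv->recv;     # blocks here, dispatching events, until the count drops to zero
print "both timers done\n";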
I believe you use select for that kind of thing. More similarly to forking, there's threading.
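A bare-bones version of the select idea, using the core IO::Select module (my sketch, not the answerer's), waits on a filehandle with a timeout instead of blocking on a read:

use strict;
use warnings;
use IO::Select;

my $sel = IO::Select->new(\*STDIN);

# can_read() returns as soon as STDIN has data, or after 5 seconds,
# whichever comes first - the program is never stuck in a blocking read.
if (my @ready = $sel->can_read(5)) {
    my $line = readline($ready[0]);
    print defined $line ? "read: $line" : "STDIN closed\n";
}
else {
    print "nothing arrived within 5 seconds\n";
}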
POE is fine if you want asynchronous processing and using only a single CPU (core) is acceptable.
For example, if the app is I/O limited, a single process will be enough most of the time.
No other callback can start running while another is already running!
As far as I can tell, this is the same in all languages (per CPU thread, of course; modern web servers usually spawn at least one process or thread per CPU core, so it will look (to users) like stuff is working in parallel, but the long-running callback didn't get interrupted - some other core just did that work).
You can't interrupt an interrupt, unless the interrupted interrupt has been programmed specifically to accommodate it.
Imagine code that takes 1min to run, and a PC with 16 cores - now imagine a million people try to load that page, you can deliver working results to 16 people, and "time out" all the rest, or, you can crash your web server and give no results to anyone. Folks choose not to crash their web server, which is why they never permit callbacks to interrupt other callbacks (not that they could even if they tried - the caller never gets control back to make a new call before the prior one has ended anyhow...)

Perl scripts, to use forks or threads?

I am writing a couple of scripts that go and collect data from a number of servers. The number will grow, and I'm trying to future-proof my scripts, but I'm a little stuck.
To start off with, I have a script that looks up an IP in a MySQL database, then connects to each server, grabs some information, and then puts it into the database again.
What I have been thinking is that there is a limited amount of time to do this, and if I have 100 servers it will take a little bit of time to go out to each server, get the information, and then push it to a DB. So I have thought about using either forks or threads in Perl?
Which would be the preferred option in my situation? And has anyone got any examples?
Thanks!
Edit: OK, so a bit more information was needed: I'm running on Linux, and what I thought was I could get the master script to collect the DB information, then send off each sub-process / task to connect and gather information, then push the information back to the DB.
Which is best depends a lot on your needs; but for what it's worth here's my experience:
Last time I used perl's threads, I found it was actually slower and more problematic for me than forking, because:
Threads copied all the data anyway (Perl ithreads clone the interpreter's data rather than share it), but did it all upfront
Threads didn't always clean up complex resources on exit; causing a slow memory leak that wasn't acceptable in what was intended to be a server
Several modules didn't handle threads cleanly, including the database module I was using which got seriously confused.
One trap to watch for is the "forks" library, which emulates "threads" but uses real forking. The problem I faced here was that many of the behaviours it emulated were exactly what I was trying to get away from. I ended up using a classic old-school "fork" and using sockets to communicate where needed.
Issues with forks (the library, not the fork command):
Still confused the database system
Shared variables still very limited
Overrode the 'fork' command, resulting in unexpected behaviour elsewhere in the software
Forking is more "resource safe" (think database modules and so on) than threading, so you might want to end up on that road.
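If you do go down the fork road, one common pattern for the question's "poll many servers, then write to MySQL" job is the CPAN module Parallel::ForkManager (not mentioned in the answers above; the helper subs here are hypothetical stand-ins for the real collection and database code):

use strict;
use warnings;
use Parallel::ForkManager;

# Stand-ins for the real work described in the question.
sub grab_info_from { my ($server) = @_; sleep 1; return "uptime report for $server" }
sub store_in_db    { my ($server, $info) = @_; print "$server => $info\n" }

my @servers = ('10.0.0.1', '10.0.0.2', '10.0.0.3');   # would come from the MySQL lookup
my $pm = Parallel::ForkManager->new(10);              # at most 10 children at once

for my $server (@servers) {
    $pm->start and next;                # parent: schedule the next server
    my $info = grab_info_from($server); # child: slow network work happens here
    store_in_db($server, $info);
    $pm->finish;                        # child exits
}
$pm->wait_all_children;

One caveat echoing the answer above: open a fresh database handle inside each child rather than reusing the parent's, since DBI connections generally do not survive a fork cleanly.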
Depending on your platform of choice, on the other hand, you might want to avoid fork()-ing in Perl. Quote from perlfork(1):
Perl provides a fork() keyword that corresponds to the Unix system call of the same name. On most Unix-like platforms where the fork() system call is available, Perl's fork() simply calls it.
On some platforms such as Windows where the fork() system call is not available, Perl can be built to emulate fork() at the interpreter level. While the emulation is designed to be as compatible as possible with the real fork() at the level of the Perl program, there are certain important differences that stem from the fact that all the pseudo child "processes" created this way live in the same real process as far as the operating system is concerned.

So, who should daemonize? The script or the caller?

I'm always wondering who should do it. In Ruby, we have the Daemons library which allows Ruby scripts to daemonize themselves. And then, looking at God (a process monitoring tool, similar to monit) page, I see that God can daemonize processes.
Any definitive answer out there?
You probably cannot get a definitive answer, as we generally end up with both: the process has the ability to daemonize itself, and the process monitor has the ability to daemonize its children.
Personally I prefer to have the process monitor or script do it, for a few reasons:
1. If the process monitor wishes to closely follow its children to restart them if they die, it can choose not to daemonize them. A SIGCHLD will be delivered to the monitor when one of its child processes exits. In embedded systems we do this a lot.
2. Typically when daemonizing, you also set the euid and egid. I prefer not to encode into every child process a knowledge of system-level policy like uids to use.
3. It allows re-use of the same application as either a command line tool or a daemon (I freely admit that this rarely happens in practice).
I would say it is better for your script to do it. I don't know the process monitoring tool you mention, but I would think users could potentially use an alternative tool, which means that having the script do it would be preferable.
If you can envision the script being run in non-daemon fashion, I would add an option to the script to enable or disable daemonization.
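In Perl terms (keeping to the language used elsewhere on this page), such an option might look roughly like the sketch below; the --daemon flag and every detail of it are illustrative, not taken from either answer:

use strict;
use warnings;
use POSIX qw(setsid);
use Getopt::Long;

# The same script runs in the foreground by default and detaches
# only when explicitly asked to.
GetOptions('daemon' => \my $daemonize) or die "usage: $0 [--daemon]\n";

if ($daemonize) {
    chdir '/'                      or die "chdir /: $!";
    open STDIN,  '<',  '/dev/null' or die "reopen STDIN: $!";
    open STDOUT, '>>', '/dev/null' or die "reopen STDOUT: $!";
    defined(my $pid = fork())      or die "fork: $!";
    exit 0 if $pid;                # parent returns control to the caller
    (setsid() != -1)               or die "setsid: $!";
    open STDERR, '>&', \*STDOUT    or die "reopen STDERR: $!";
}

# ... the real work of the script goes here ...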