Polluted pipeline troubleshooting - powershell

I have a script in which this code fails with an exit code of -2145124322:
$new.ExitCode > $null
$filePath = "wusa.exe"
$argumentList = "`"\\PX_SERVER\Rollouts\Microsoft\VirtualPC\Windows6.1-KB958559-x64-RefreshPkg.msu`" /quiet /norestart"
$exitCode = (Start-Process -FilePath:$filePath -argumentList:$argumentList -wait -errorAction:Stop -PassThru).ExitCode
Write-Host $exitCode
Now, the main script has about 15,000 lines of "other stuff going on", and these lines were not originally exactly like this. The variables are pulled from XML, there is data validation and try/catch blocks, all sorts of stuff. So I started pulling the pertinent lines out, put them in a tiny separate script, and hard-coded the variables. And there it works: I get a nice 3010 exit code and I'm off to the races. So I took my working code, hard-coded variables and all, and pasted it back into the original script, and it breaks again.
So, I moved the code out of the function where it belongs, and just put it after I initialize everything and before I start working through the main loop. And there it works! Now, I gotta believe it's the usual "polluted pipeline", but dang if I can figure out what could cause this. My next step I guess is to just start stepping through the code, dropping this nugget in somewhere, run the test, if it works move it farther down, try again. Gack!
So, hoping someone has some insights. Either what it might be, or perhaps an improved test protocol, or some trick to actually see the whole pipeline and somehow recognize the pollution.
FWIW, I normally work with PoSH v2, but I have tried this with v4 with the exact same results. But perhaps there is some pipeline monitoring feature in a later version that could help with the troubleshooting?
Also, my understanding is that PoSH v2 has issues with negative return codes, so they can't be trusted. But I think newer versions fixed this, correct? So the fact that I get the same code in v4 means it is meaningful to Google? Not that I have found any hint of that exit code anywhere thus far.
Crossed fingers.
EDIT: OK, a little more data. I searched on the exit code without the -, and with DuckDuckGo instead of Google, and found this.
0x8024001E (-2145124322) WU_E_SERVICE_STOP: Operation did not complete because the service or system was being shut down.
OK, that's some direction. And I have some code that would allow me to kill a service temporarily. But that seems a little draconian. Isn't the whole point of this, like 10th way to install updates from Microsoft, supposed to be to make automation easier? In any case, I can't find any indication there are command line flags for WUSA that would avoid the problem, but I have to believe I am doing something wrong.
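(Side note for anyone chasing a similar mystery code: the negative decimal value is just the signed form of the HRESULT, so it can be converted directly rather than hoping a search engine matches the minus sign. A minimal sketch using only built-in formatting, with the value from above:)
# decimal exit code -> hex HRESULT
$exitCode = -2145124322
'0x{0:X8}' -f $exitCode                      # prints 0x8024001E
# hex HRESULT -> signed decimal exit code
[int]::Parse('8024001E', 'HexNumber')        # prints -2145124322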

Solved! After tracking a number of different errors trying different things, including turning off the firewall and such, it turns out the error isn't that a service won't stop, but that a service won't start. See, some of those 15K lines of code suppress Windows Update for the duration of my script, because Windows Update causes lots of Autodesk deployments to fail, which is the whole point of my code. Well, of course WUSA needs that service. So it looks like, rather than suppressing Windows Update for the duration of script execution, I need to be less heavy-handed and only suppress it for the duration of a deployment task. That will take a few hours to implement and test, but is totally doable. And probably more elegant anyway. Woot!
And yeah, for once it wasn't me pooping in my pipeline unintentionally. ;)
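The narrower suppression described above isn't shown in the post, but a rough sketch of the idea might look like this (the helper function name, the deployment command, and the assumption that "suppressing Windows Update" means stopping the wuauserv service are all illustrative, not taken from the original script):
# Sketch only: suppress Windows Update just for the duration of one deployment task.
function Invoke-DeploymentTask {            # hypothetical helper, not from the original script
    param([scriptblock]$Task)
    try {
        Stop-Service -Name 'wuauserv' -Force -ErrorAction Stop           # suppress only while the task runs
        & $Task
    }
    finally {
        Start-Service -Name 'wuauserv' -ErrorAction SilentlyContinue     # restore so wusa.exe can use it later
    }
}
# The Autodesk deployment runs inside the suppressed window (path is made up)...
Invoke-DeploymentTask -Task { & '\\PX_SERVER\Rollouts\SomeDeployment\setup.exe' /quiet }
# ...while the wusa.exe install from the question runs outside it, with the service available.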

Related

How to get Powershell in VSCode or ISE to give the specific failing line

I'm sure I must be missing something really basic but I've been revisiting Powershell of late to get up to speed with 7.1 and can't seem to get it to tell me where an error is thrown, either in VSCode or ISE.
In the error report from VSCode (same report in the ISE), the error isn't on the line it points at; it's a couple of levels deeper, in a function called by CompareFiles. It always seems to report the caller of the caller of the code which has failed, rather than the actual failing line.
I've searched here, there and everywhere and found lots of clever tweaks and debugging ideas which I could add, but I don't understand why it doesn't just give me the failing line here, rather than a line a level or two up in the call stack. It's as if the CompareFiles function has some kind of pragma that says "Don't record debugging info for me or anything I call", but it hasn't (and that probably doesn't exist anyway!).
I can't help feeling I've just not set some obvious debug setting, or set one incorrectly while I've been tinkering.
If it makes a difference, I'm calling a PS module from a PS Script, the module is loaded fine from the PSPath via Import-Module, and the line being reported is in the module, as is the actual failing line (both are in the same module), so it's not some problem where it's only debugging the script and not the module.
Both the script and the module have the below at the top;
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
As I say, I get an identical error when I use the ISE so it's not a VSCode setting.
Debugging line by line works fine, so I can step through to find the failing line but surely it should just pop up and tell me.
[Later] I should note it's not just this error; it's been like this for days with all sorts of runtime errors in this and other scripts.
Silly me - I simply removed..
$ErrorActionPreference = "Stop"
..from the script and the module; this was essentially implementing the imaginary pragma I mentioned. With it removed I now get the actual failing line.
I probably only needed it at one of the two levels, if anywhere, but error handling works just fine without it, so I have removed it everywhere; perhaps I'll look into what it does properly at some point.
Serves me right for adding something blindly because it sounded good, i.e. "Sure, I want it to stop when there's an error, why wouldn't I? I'll add that statement then", and not re-testing or looking further into it.
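For what it's worth, even with $ErrorActionPreference = "Stop" left in place, the deeper location is usually still recorded on the error record itself; a hedged sketch (the wrapped call and its arguments are placeholders):
# After a failure, the last error record still carries the deeper location:
$Error[0].ScriptStackTrace                   # full call stack at the point the error was raised
$Error[0].InvocationInfo.PositionMessage     # file/line/column of the statement that was reported
# Or wrap the call and print the stack before rethrowing:
try {
    CompareFiles $fileA $fileB               # placeholder arguments
}
catch {
    Write-Host $_.ScriptStackTrace
    throw
}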

Powershell Remove-Variable cmdlet, do I need to call it at the end of each function/scriptblock?

This is a generic question, no code.
Not sure if I need to remove local variables as I thought it should be done by the Powershell Engine.
I had a script to gather info from WMI and used a lot of local variables. The output was messed up when running it multiple times, but it got fixed after I cleaned up all the local variables at the end of the function/scriptblock.
Any thoughts/ideas would be appreciated.
The trouble does not come from the fact that you do not remove your vars, but from at least two beginner errors (or ones made by lazy developers like me, supposing I am a developer):
We forget to initialize our vars before using them.
We do not test every return value of our function or cmdlet calls.
Once these two things are done (the code at least doubles in size), you can rerun your script without cleaning anything except the processed data.
For me, scripting is most of the time done on the corner of a table, not even pushed to a source repository.
So start scripting and ask yourself fewer questions.
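To illustrate the "initialize before use" point: when code is dot-sourced or re-run in the same session (e.g. in the ISE), an accumulator that is never reset keeps the previous run's data. A made-up sketch (the variable name and WMI class are just examples):
# Without this explicit reset, re-running the same code in one session
# keeps appending to the previous run's results.
$report = @()
foreach ($disk in (Get-WmiObject -Class Win32_LogicalDisk)) {
    $report += [pscustomobject]@{
        Drive  = $disk.DeviceID
        FreeGB = [math]::Round($disk.FreeSpace / 1GB, 2)
    }
}
$report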

back ticks not working in perl

Got stuck with one problem on our live server.
I have a Perl script which runs almost 15 to 18 hours a day. It creates 100+ sub-processes every day. In one place it has a command (a product command which we run on the command line of a Solaris box) which is triggered with backticks inside the Perl code.
It looks like the backticks command gets skipped or fails randomly.
For example, if I need to run it for 50 customers, 2 or 3 fail randomly.
I do not see any evidence that the command was triggered at all.
Since it's a live server, we can't try making many code changes until we are sure about the problem.
here is the code:
my $comm = "inventory -noX customer1";    # sample command I have given here
my $newLogFile = "...";                   # path where the file that captures the command output gets created (actual path omitted)
my $piddy = `$comm 2>&1 > $newLogFile`;   # backticks return the command's output, not a PID; with this redirection order, stdout goes to the log file and only stderr ends up in $piddy
Is it because of the backticks that this happens? I am really not sure :(.
I also tried various analyses, like memory/CPU/disk space, adding librtld_db.so to LD_LIBRARY_PATH, etc., but no luck. Also, the Perl is 64-bit. What else can I do? :(
I suspect you are not checking for errors (and perl doesn't make that easy to do correctly for backticks).
Consider using IPC::System::Simple's capture in place of your backticks/qx.
As its doc says, "If there's an error, it will die with a detailed description of what went wrong."
It shouldn't fail just because of backticks; however, because it is spawning a new process, that process may periodically be subject to failure due to system conditions (e.g. system load). Backticks are really a "fire and forget" method and should never be used for anything critical in a production environment. As previously suggested, there are far more detailed ways to manage spawning external processes.
If the command's output is being lost due to buffering, you might try turning off buffering, but keep an eye on it for performance degradation (it's usually not significant).
Buffering can be turned off for an entire script by adding this near the top:
$|=1;
When calling external commands, I use system from IPC::System::Simple or open3 from IPC::Open3.

how to profile(timing) in powershell

My PowerShell script runs slowly; is there any way to profile the script?
Posting your script here would really help in giving an accurate answer.
You can use Measure-Command to see how much time each statement in your script is taking. However, you have to wrap each statement in Measure-Command.
Trace-Command can also be used to trace what is happening when the script runs. The output from this cmdlet can be quite verbose.
http://www.jonathanmedd.net/2010/06/powershell-2-0-one-cmdlet-at-a-time-104-trace-command.html
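A rough illustration of the Measure-Command approach (the statements being timed are placeholders):
# Wrap the individual statements you suspect, and compare the elapsed times:
$t1 = Measure-Command { Get-ChildItem $env:windir -Recurse -ErrorAction SilentlyContinue }
$t2 = Measure-Command { Get-Service | Sort-Object Status }
"step 1: {0:N0} ms   step 2: {1:N0} ms" -f $t1.TotalMilliseconds, $t2.TotalMilliseconds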
You can do random-pausing in the Powershell debugger. Get the script running, and while it's running, type Ctrl-C. It will halt and then you can display the stack. That will tell you where it is, what it's doing, and why. Do this several times, not just once.
Suppose it is taking twice as long as it could. That means each time you interrupt it the probability you will catch it doing the slow thing is 50%. So if you interrupt it 10 times, you should see that on about 5 samples.
Suppose it is taking 5 times as long as it could. That means 4/5 of the time is being wasted, so you should see it about 8 times out of 10.
Even if as little as 1/5 of the time is being wasted, you should see it about 2 times out of 10. Anything you see on as few as 2 samples, if you can find a faster way to do it, will give you a good speed improvement.
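One concrete way to get that stack display once you've broken in is Get-PSCallStack at the debugger prompt; a sketch (the script name and line number are made up):
# Drop a breakpoint somewhere in the slow region, run the script, and inspect the stack:
Set-PSBreakpoint -Script .\myscript.ps1 -Line 123     # script name / line are examples only
.\myscript.ps1
# at the [DBG]: prompt that appears when the breakpoint hits:
Get-PSCallStack        # shows the chain of functions and the line currently executing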
Here's a recent blog about speeding up for loops that shows you how to build a "test harness" for timing loops:
http://www.dougfinke.com/blog/index.php/2011/01/16/make-your-powershell-for-loops-4x-faster/
A quick and simple poor-man's profiler is simply to step through the code in the ISE debugger. You can sometimes feel how slow a part of the code is just by stepping over it or by running to some breakpoint.

Finding a Perl memory leak

SOLVED see Edit 2
Hello,
I've been writing a Perl program to handle automatic upgrading of local (proprietary) programs (for the company I work for).
Basically, it runs via cron, and unfortunately has a memory leak (or something similar). The problem is that the leak only happens when I'm not looking (aka when run via cron, not via command line).
My code does not contain any circular (or other) references, so the commonly cited tools will not help me (Devel::Cycle, Devel::Peek).
How would I go about figuring out what is using so much memory that the kernel kills it?
Basically, the code SFTPs into a server (running `sftp ...` via backticks), calls OpenSSL to verify the file, then SFTPs more if more files are needed, and installs them (untars them).
I have seen delays (~15 sec) before the first SFTP session, but it has never used so much memory as to be killed (in my presence).
If I can't sort this out, I'll need to re-write in a different language, and that will take precious time.
Edit: The following message is printed out by the kernel which led me to believe it was a memory leak:
[100023.123] Out of memory: kill process 9568 (update.pl) score 325406 or a child
[100023.123] Killed Process 9568 (update.pl)
I don't believe it is an issue with cron because of the stalling (for ~15 sec, sometimes) when running it via the command-line. Also, there are no environmental variables used (at least by what I've written, maybe underlying things do?)
Edit 2: I found the issue myself, with help from the below comment by mobrule (in response to this question). It turns out that the script was called from a crontab of a user (non-root) just once a day and that (non-root privs) caused a special infinite loop situation.
Sorry guys, I feel kinda stupid for not finding this before, but thanks.
mobrule, if you submit your comment as an answer, I will accept it, as it led to me finding the problem.
End Edits
Thanks,
Brian
P.S. I may be able to post small snippets of code, but not the whole thing due to company policy.
You could try using Devel::Size to profile some of your objects. e.g. in the main:: scope (the .pl file itself), do something like this:
use Devel::Size qw(total_size);
foreach my $varname (qw(varname1 varname2))
{
    # $$varname is a symbolic reference to the package variable of that name,
    # so this needs "no strict 'refs'" if the script runs under strict
    print "size used for variable $varname: " . total_size($$varname) . "\n";
}
Compare the actual size used to what you think is a reasonable value for each object. Something suspicious might pop out immediately (e.g. a cache that is massively bloated beyond anything that sounds reasonable).
Other things to try:
Eliminate bits of functionality one at a time to see if suddenly things get a lot better; I'd start with the use of any external libraries
Is the bad behaviour localized to just one particular machine, or one particular operating system? Move the program to other systems to see how its behaviour changes.
(In a separate installation) try upgrading to the latest Perl (5.10.1), and also upgrade all your CPAN modules
How do you know that it's a memory leak? I can think of many other reasons why the OS would kill a program.
The first question I would ask is "Does this program always work correctly from the command line?". If the answer is "No" then I'd fix these issues first.
On the other hand if the answer is "Yes", I would investigate all the differences between having the program executed under cron and from the command line to find out why it is misbehaving.
If it is run by cron, then shouldn't it die after each iteration? If that is the case, it's hard for me to see how a memory leak would be a big deal...
Are you sure it is the script itself, and not the child processes, that are using the memory? Perhaps it ends up creating a whole lot of ssh sessions, instead of doing a bunch of stuff in one session?