Does New-Object have a memory leak? - powershell

Using PowerShell v3, I'm using the .NET classes System.DirectoryServices.DirectoryEntry and System.DirectoryServices.DirectorySearcher to query a list of properties for the users in a domain. The code for this is basically found here.
The only thing you need to add is the line $ds.PageSize = 1000 between $ds.Filter = '(&(objectCategory=person)(objectClass=user))' and $ds.PropertiesToLoad.AddRange($properties). This removes the default limit of returning only 1000 users.
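For reference, a minimal sketch of the kind of searcher setup the question refers to, with that PageSize line added (the LDAP path and property list here are assumptions; the original code is behind the link above):
$de = New-Object System.DirectoryServices.DirectoryEntry('LDAP://DC=domain1,DC=example,DC=com')
$ds = New-Object System.DirectoryServices.DirectorySearcher($de)
$ds.Filter = '(&(objectCategory=person)(objectClass=user))'
$ds.PageSize = 1000   # enables paged results, so more than 1000 users come back
$ds.PropertiesToLoad.AddRange(@('samaccountname', 'displayname', 'mail'))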
One domain (we'll call it domain1) has over 80,000 users; another domain (we'll call it domain2) has over 200,000.
If I run the code against domain1, it takes roughly 12 minutes (which is fantastic compared to the 24 hours Get-QADUser was taking). However, after the script finishes, the PowerShell window is still holding on to about 500 MB of memory. Domain2 leaves about 1.5 GB behind.
Note: the memory leak with Get-QADUser is much, much, MUCH worse. Domain2 leaves about 6 GB behind and takes roughly 72 hours to complete (vs. less than an hour with the .NET classes).
The only way to free the memory is to close the PowerShell window. But what if I want to write a script that invokes all of these domain scripts one after the other? I would run out of memory by the time I got to the 6th script.
The only thing I can think of is that New-Object invokes a constructor but there is no destructor (unlike Java). I've tried calling [System.GC]::Collect() during the loop iterations, but this has had no effect.
Is there a reason? Is it solvable or am I stuck with this?

One thing to note: If there actually is a memory leak, you can just run the script in a new shell:
powershell { <# your code here #> }
As for the leak: as long as any variable still references an object that holds on to large data, that data cannot be collected. You may have luck using a memory profiler to look at what is still in memory and why. As far as I can see, though, if you put that code in a script file and execute the script (with &, not by dot-sourcing it with .), this shouldn't really happen.
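If you do go the separate-shell route, one simple pattern (script paths here are hypothetical) is to run each domain query in its own child powershell.exe process, so all of its memory is returned to the OS as soon as that process exits:
# Each domain script runs in a fresh child process; its memory is freed when the process exits
foreach ($script in 'C:\scripts\domain1.ps1', 'C:\scripts\domain2.ps1') {
    powershell -NoProfile -File $script
}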

I guess it's $table that is using all the memory. Why don't you write the collected data directly to a file, instead of adding it to the array?
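For example, a hedged sketch of that idea ($ds and the property names are assumptions based on the linked code): stream each SearchResult straight to a CSV instead of growing a $table array, and dispose of the SearchResultCollection explicitly, since it holds unmanaged resources:
$results = $ds.FindAll()
try {
    $results | ForEach-Object {
        $props = $_.Properties
        New-Object PSObject -Property @{
            SamAccountName = $props['samaccountname'][0]
            DisplayName    = $props['displayname'][0]
            Mail           = $props['mail'][0]
        }
    } | Export-Csv -Path 'C:\temp\domain1-users.csv' -NoTypeInformation
}
finally {
    # SearchResultCollection holds unmanaged resources; Dispose() releases them
    # without waiting for the whole process to exit
    $results.Dispose()
}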

Related

Odd failures with PS v2 remoting

I have a moderately complex script made up of a PS1 file that does Import-Module on a number of PSM1 files, plus a small number of global variables that define state.
I have it working as a regular script, and I am now trying to implement it for remoting, and I'm running into some very odd issues. I am seeing a ton of .NET runtime errors with an event ID of 0, and the script only works intermittently, with the time between attempts seeming to affect the results. I know that isn't a very good description of the problem, but I haven't had a chance to test more deeply, and I am wondering whether I am pushing PowerShell v2 further than it can really handle by trying to do remoting with a script this large and complex. Or does this look more like something wrong in my code, and once I get that sorted I will get consistent script processing? I am relatively new to PowerShell and completely new to remoting.
The event data is:
.NET Runtime version 2.0.50727.5485 - Application Error
Application has generated an exception that could not be handled.
Process ID=0x52c (1324), Thread ID=0x760 (1888).
Click OK to terminate the application. Click CANCEL to debug the application.
Which doesn't exactly provide anything meaningful. Also, rather oddly, if I clear the event log it seems I have a better chance of not hitting the issue; once there are errors in the event log, the chance of the script failing is much higher.
Any thoughts before I dig into troubleshooting are much appreciated. And suggestions on best practices when troubleshooting Remote scripts also much appreciated.
One thing about v2 remoting is that the default per-shell memory limit is set pretty small: 150 MB. You might try bumping that up to, say, 1 GB like so:
Set-Item WSMan:\localhost\shell\MaxMemoryPerShellMB 1024 -force
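If you go that route, you can check the current quota first; and if the new value doesn't seem to take effect for new sessions, restarting WinRM is a common next step (a sketch; run from an elevated prompt):
Get-Item WSMan:\localhost\Shell\MaxMemoryPerShellMB   # shows the current per-shell quota
Restart-Service WinRM                                 # if new remote shells don't pick up the change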

How much job output will PowerShell hold if I don't receive it?

I have a long-running PowerShell workflow that is invoked as a job using the -AsJob parameter. It periodically writes warnings and output messages that need to be retrieved using the Receive-Job cmdlet.
I'm curious where PowerShell stores this job output that has not yet been retrieved and if there is a practical upper limit. If so, would it automatically purge job output in a FIFO manner? Crash?
I don't know if this really answers your question, but I tried a variety of Start-Job commands that just created a range from 1 to some big number. Most threw an out-of-memory exception, but I finally got one to run to completion:
# these all bombed out
Start-Job -ScriptBlock { 1..4000000000 }
Start-Job -ScriptBlock { 1..2000000000 }
Start-Job -ScriptBlock { 1..1000000000 }
# this one finally started evaluating
Start-Job -ScriptBlock { 1..100000000 }
I let it run for a little while, and it ate up one full CPU core while it was evaluating the range. It ended up using about 5 GB of RAM according to Task Manager and would have used more, but at that point it started to run out of physical memory and the CPU usage dropped as the OS had to page memory like crazy.
It actually spawned another powershell.exe process (visible in Task Manager), so I guess that partially answers your question of where the data is stored while the job is running. There appears to be a practical limit that depends on the resources available. PowerShell will not purge or otherwise lose information in the job buffer (unless the second process crashes?), but retrieving that much data may also become problematic. You may want to do a few trial runs to see what the acceptable and practical limits are in your environment.
Note: using a much smaller range, like 1..100000, provides more than enough records to observe the behavior without completely overwhelming the system. I used the larger numbers to find the practical limits.
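A smaller run also makes it easy to see where the buffered output lives and how it drains (a sketch using that smaller range):
$job = Start-Job -ScriptBlock { 1..100000 }   # output builds up inside the child powershell.exe
Wait-Job $job | Out-Null
$job.HasMoreData                              # True while unreceived output is still buffered
$results = Receive-Job $job                   # drains the buffer into the current session
Remove-Job $job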

How to instruct PowerShell to garbage collect .NET objects like XmlSchemaSet?

I created a PowerShell script which loops over a large number of XML Schema (.xsd) files, and for each creates a .NET XmlSchemaSet object, calls Add() and Compile() to add a schema to it, and prints out all validation errors.
This script works correctly, but there is a memory leak somewhere, causing it to consume gigabytes of memory when run on hundreds of files.
What I essentially do in a loop is the following:
$schemaSet = New-Object -TypeName System.Xml.Schema.XmlSchemaSet
Register-ObjectEvent $schemaSet ValidationEventHandler -Action {
    # ...write-host the event details...
}
$reader = [System.Xml.XmlReader]::Create($schemaFileName)
[void] $schemaSet.Add($null_for_dotnet_string, $reader)
$reader.Close()
$schemaSet.Compile()
(A full script to reproduce this problem can be found in this gist: https://gist.github.com/3002649. Just run it, and watch the memory usage increase in Task Manager or Process Explorer.)
Inspired by some blog posts, I tried adding
remove-variable reader, schemaSet
I also tried picking up the $schema from Add() and doing
[void] $schemaSet.RemoveRecursive($schema)
These seem to have some effect, but there is still a leak. I'm presuming that older instances of XmlSchemaSet are still holding on to memory without being garbage collected.
The question: How do I properly teach the garbage collector that it can reclaim all memory used in the code above? Or more generally: how can I achieve my goal with a bounded amount of memory?
Microsoft has confirmed that this is a bug in PowerShell 2.0, and they state that this has been resolved in PowerShell 3.0.
The problem is that an event handler registered using Register-ObjectEvent is not garbage collected. In response to a support call, Microsoft said that
"we’re dealing with a bug in PowerShell v.2. The issue is caused
actually by the fact that the .NET object instances are no longer
released due to the event handlers not being released themselves. The
issue is no longer reproducible with PowerShell v.3".
The best solution, as far as I can see, is to interface between PowerShell and .NET at a different level: do the validation completely in C# code (embedded in the PowerShell script), and just pass back a list of ValidationEventArgs objects. See the fixed reproduction script at https://gist.github.com/3697081: that script is functionally correct and leaks no memory.
(Thanks to Microsoft Support for helping me find this solution.)
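A minimal, hypothetical sketch of that approach (the type and names below are invented here; the complete working script is in the gist above): compile a small C# helper with Add-Type so the validation event handler lives entirely in C#, and only plain data objects come back to PowerShell:
Add-Type -ReferencedAssemblies System.Xml -TypeDefinition @"
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;
public static class SchemaChecker
{
    // Collect validation events inside C#, so no PowerShell event subscription is needed
    public static List<ValidationEventArgs> Validate(string schemaFile)
    {
        List<ValidationEventArgs> errors = new List<ValidationEventArgs>();
        XmlSchemaSet set = new XmlSchemaSet();
        set.ValidationEventHandler += delegate(object s, ValidationEventArgs e) { errors.Add(e); };
        XmlReader reader = XmlReader.Create(schemaFile);
        try { set.Add(null, reader); set.Compile(); }
        finally { reader.Close(); }
        return errors;
    }
}
"@
[SchemaChecker]::Validate('C:\schemas\example.xsd') | ForEach-Object { Write-Host $_.Message }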
Initially Microsoft offered another workaround, which is to use $xyzzy = Register-ObjectEvent -SourceIdentifier XYZZY, and then at the end do the following:
Unregister-Event XYZZY
Remove-Job $xyzzy -Force
However, this workaround is functionally incorrect. Any events that are still 'in flight' are lost at the time these two additional statements are executed. In my case, that means that I miss validation errors, so the output of my script is incomplete.
After the Remove-Variable you can try to force a GC collection:
[GC]::Collect()
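If finalizable objects are involved, the usual fuller pattern is to collect, wait for pending finalizers, and collect again; worth trying as a diagnostic, although it does not fix the Register-ObjectEvent bug described above:
[GC]::Collect()
[GC]::WaitForPendingFinalizers()
[GC]::Collect()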

How to preserve data between executions of program

I am running a Perl script on an HP-UX box. The script executes every 15 minutes and needs to compare its results with the results from the last time it executed.
I will need to store two variables (IsOccuring and ErrorCount) between the executions. What is the best way to do this?
Edit clarification:
It only compares the most recent execution to the current execution.
It doesn't matter if the value is lost between reboots.
And touching the filesystem is pretty much off limits.
If you can't touch the file system, try using a shared memory segment. There are helper modules for that like IPC::ShareLite, or you can use the shmget and related functions directly.
You'll have to store them in a file. This sort of file is often kept in /tmp, but any place where the user running the cron job has access would do. Make sure your script can handle the case where the file is missing.
You could create a separate process running a "remember stuff" service over your choice of IPC mechanism. This sounds like a rather tortured solution to "I don't want to touch the disk" but if it's important enough to offset a couple of days of development work (realistically, if you are new to IPC, and HP-SUX continues to live up to its name) then by all means read man perlipc for a start.
Does it have to be completely re-executed? Can you just have it running in a loop, sleeping for 15 minutes between iterations? Then you don't have to worry about saving the values externally; the program never stops.
I definitely think IPC is the way to go here.
I'd save off the data in a file. Then, inside the script I'd load the last results if the file exists.
Use module Storable to serialize Perl data structures, save them anywhere you want and deserialize them during next script execution.

What can I do to find out what's causing my program to consume lots of memory over time?

I have an application using POE which has about 10 sessions doing various tasks. Over time, the app starts consuming more and more RAM and this usage doesn't go down even though the app is idle 80% of the time. My only solution at present is to restart the process often.
I'm not allowed to post my code here, so I realize it is difficult to get help, but maybe someone can tell me what I can do to find this out myself?
Don't expect the process size to decrease. Memory isn't released back to the OS until the process terminates.
That said, might you have reference loops in data structures somewhere? AFAIK, the perl garbage collector can't sort out reference loops.
Are you using any XS modules anywhere? There could be leaks hidden inside those.
A guess: your program executes a loop for as long as it is running; in this loop it may be that you allocate memory for a buffer (or more) each time some condition occurs; since the scope is never exited, the memory remains and will never be cleaned up. I suggest you check for something like this. If it is the case, place the allocating code in a sub that you call from the loop and where it will go out of scope, and get cleaned up, on return to the loop.
Looks like Test::Valgrind is a tool for searching for memory leaks. I've never used it myself though (but I used plain valgrind with C source).
One technique is to periodically dump the contents of $POE::Kernel::poe_kernel to a time- or sequence-named file. $poe_kernel is the root of a tree spanning all known sessions and the contents of their heaps. The snapshots should monotonically grow if the leaked memory is referenced. You'll be able to find out what's leaking by diff'ing an early snapshot with a later one.
You can export POE_ASSERT_DATA=1 to enable POE's internal data consistency checks. I don't expect it to surface problems, but if it does I'd be very happy to receive a bug report.
Perl cannot resolve reference rings. Either you have zombie processes (which you can detect via ps axl) or you have a memory leak (reference rings/cycles).
There are a ton of tools for detecting memory leaks:
strace, mtrace, Devel::LeakTrace::Fast, Devel::Cycle