How to instruct PowerShell to garbage collect .NET objects like XmlSchemaSet?

I created a PowerShell script which loops over a large number of XML Schema (.xsd) files, and for each creates a .NET XmlSchemaSet object, calls Add() and Compile() to add a schema to it, and prints out all validation errors.
This script works correctly, but there is a memory leak somewhere, causing it to consume gigabytes of memory when run on hundreds of files.
What I essentially do in a loop is the following:
$null_for_dotnet_string = $null  # a $null that binds to the string parameter of Add()
$schemaSet = New-Object -TypeName System.Xml.Schema.XmlSchemaSet
Register-ObjectEvent $schemaSet ValidationEventHandler -Action {
    # ...Write-Host the event details...
}
$reader = [System.Xml.XmlReader]::Create($schemaFileName)
[void] $schemaSet.Add($null_for_dotnet_string, $reader)
$reader.Close()
$schemaSet.Compile()
(A full script to reproduce this problem can be found in this gist: https://gist.github.com/3002649. Just run it, and watch the memory usage increase in Task Manager or Process Explorer.)
Inspired by some blog posts, I tried adding
remove-variable reader, schemaSet
I also tried capturing the $schema returned by Add() and doing
[void] $schemaSet.RemoveRecursive($schema)
These seem to have some effect, but still there is a leak. I'm presuming that older instances of XmlSchemaSet are still using memory without being garbage collected.
The question: How do I properly teach the garbage collector that it can reclaim all memory used in the code above? Or more generally: how can I achieve my goal with a bounded amount of memory?

Microsoft has confirmed that this is a bug in PowerShell 2.0, and they state that this has been resolved in PowerShell 3.0.
The problem is that an event handler registered using Register-ObjectEvent is not garbage collected. In response to a support call, Microsoft said that
"we’re dealing with a bug in PowerShell v.2. The issue is caused
actually by the fact that the .NET object instances are no longer
released due to the event handlers not being released themselves. The
issue is no longer reproducible with PowerShell v.3".
The best solution, as far as I can see, is to interface between PowerShell and .NET at a different level: do the validation completely in C# code (embedded in the PowerShell script), and just pass back a list of ValidationEventArgs objects. See the fixed reproduction script at https://gist.github.com/3697081: that script is functionally correct and leaks no memory.
(Thanks to Microsoft Support for helping me find this solution.)
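In outline, that fix looks like the following. This is a condensed sketch of the approach, not the gist itself; the SchemaChecker class name and Validate method are illustrative, and the sketch returns plain strings rather than ValidationEventArgs objects:
Add-Type -TypeDefinition @'
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class SchemaChecker
{
    // Validation runs entirely in .NET: no PowerShell event
    // subscriptions are created, so there is nothing for
    // PowerShell v2 to leak.
    public static List<string> Validate(string schemaFile)
    {
        List<string> messages = new List<string>();
        XmlSchemaSet schemaSet = new XmlSchemaSet();
        schemaSet.ValidationEventHandler += delegate(object s, ValidationEventArgs e)
        {
            messages.Add(e.Severity + ": " + e.Message);
        };
        using (XmlReader reader = XmlReader.Create(schemaFile))
        {
            schemaSet.Add(null, reader);
        }
        schemaSet.Compile();
        return messages;
    }
}
'@

[SchemaChecker]::Validate($schemaFileName) | ForEach-Object { Write-Host $_ }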
Initially Microsoft offered another workaround, which is to use $xyzzy = Register-ObjectEvent -SourceIdentifier XYZZY, and then at the end do the following:
Unregister-Event XYZZY
Remove-Job $xyzzy -Force
However, this workaround is functionally incorrect. Any events that are still 'in flight' are lost at the time these two additional statements are executed. In my case, that means that I miss validation errors, so the output of my script is incomplete.

After the remove-variable you can try to force a GC collection:
[GC]::Collect()
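A slightly fuller sketch of that idea, applied per iteration of the loop from the question (though, as the accepted answer explains, on PowerShell 2.0 this still won't reclaim the leaked event handlers):
Remove-Variable reader, schemaSet
[GC]::Collect()
[GC]::WaitForPendingFinalizers()   # let pending finalizers run
[GC]::Collect()                    # collect objects freed by those finalizers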

Related

Cannot remove variable because it has been optimized and is not removable - releasing a COM object

At the end of my script I use 'ie' | ForEach-Object {Remove-Variable $_ -Force}. It works fine in PS 2 (Windows 7) but PS 5 (Windows 10) throws an error:
Cannot remove variable ie because the variable has been optimized and
is not removable. Try using the Remove-Variable cmdlet (without any
aliases), or dot-sourcing the command that you are using to remove the
variable.
How can I make it play nice with PS 5; or should I just use Remove-Variable 'ie' -Force?
The recommended way to remove COM objects is to call the ReleaseComObject method, passing it the reference ($ie) to your COM object instance.
Here is a more detailed explanation and sample code from a Windows PowerShell Tip of the Week that shows how to get rid of COM objects:
Whenever you call a COM object from the common language runtime (which
happens to be the very thing you do when you call a COM object from
Windows PowerShell), that COM object is wrapped in a “runtime callable
wrapper,” and a reference count is incremented; that reference count
helps the CLR (common language runtime) keep track of which COM
objects are running, as well as how many COM objects are running. When
you start Excel from within Windows PowerShell, Excel gets packaged up
in a runtime callable wrapper, and the reference count is incremented
to 1.
That’s fine, except for one thing: when you call the Quit method and
terminate Excel, the CLR’s reference count does not get decremented
(that is, it doesn’t get reset back to 0). And because the reference
count is not 0, the CLR maintains its hold on the COM object: among
other things, that means that our object reference ($x) is still valid
and that the Excel.exe process continues to run. And that’s definitely
not a good thing; after all, if we wanted Excel to keep running we
probably wouldn’t have called the Quit method in the first place. ...
... calling the ReleaseComObject method [with] our
instance of Excel ... decrements the reference count for the object in
question. In this case, that means it’s going to change the reference
count for our instance of Excel from 1 to 0. And that is a good thing:
once the reference count reaches 0 the CLR releases its hold on the
object and the process terminates. (And this time it really does
terminate.)
$x = New-Object -ComObject Excel.Application
$x.Visible = $true
Start-Sleep 5
$x.Quit()
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($x)
Remove-Variable x
The message "Cannot remove variable ie because the variable has been optimized and is not removable." you get, most likely means you have tried to access (inspect, watch, or otherwise access) a variable which has been already removed by the optimizer.
wp78de's helpful answer explains what you need to do to effectively release a COM object instantiated in PowerShell code with New-Object -ComObject.
Releasing the underlying COM object (which means terminating the process of a COM automation server such as Internet Explorer) is what matters most, but it's worth pointing out that:
Even without calling [System.Runtime.Interopservices.Marshal]::ReleaseComObject($ie) first, there's NO reason why your Remove-Variable call should fail (even though, if successful, it wouldn't by itself release the COM object).
I have no explanation for the error you're seeing (I cannot recreate it, but it may be related to this bug).
There's usually no good reason to use ForEach-Object with Remove-Variable, because you can pass not only a single variable name but even an array of names directly to the (implied) -Name parameter - see Remove-Variable -?;
Remove-Variable ie -Force should work.
Generally, note that -Force is only needed to remove read-only variables; if you want to (also) guard against the case where no variable by the specified name(s) exists, (also) use -ErrorAction Ignore.
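Putting the pieces together for the question's $ie variable, a minimal sketch of the cleanup described above:
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($ie) | Out-Null   # release the underlying COM object
Remove-Variable ie -Force -ErrorAction Ignore                                # then drop the PowerShell variable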

Do I have to close a file if I use Get-ChildItem

I'm using Get-ChildItem to read the files in a folder, then I get the LastWriteTime for each file and sort them.
Do I have to close the files after getting the LastWriteTime?
No. This is just a list of info about the files; no streams are opened and no locks are taken.
As has already been mentioned, the answer is no.
The reason being that when you use Get-ChildItem against the file system it doesn't actually open any files - it interrogates an underlying API which then returns metadata about the files in the file system - so there's no file handle to be "closed".
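For example, the question's task can be done entirely on that metadata (a minimal sketch; the folder path is illustrative):
Get-ChildItem -Path 'C:\Logs' | Sort-Object LastWriteTime   # sorts on metadata only; no file handles are opened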
From the comments on your question, I sense some confusion as to "why don't I need to manage system resource allocation in PowerShell?"
PowerShell runs on .NET, and the .NET runtime is garbage-collected. At some (undefined) point in time after a block of memory is no longer referenced by any pointers, the garbage collector will take care of freeing it, and you don't need to worry about managing this process yourself.
Of course there are situations in which resource allocation is external to the runtime and has to be managed, but the usual pattern in .NET is to implement the IDisposable interface when defining classes that depend on unmanaged resources. An example is a StreamReader (with which you could read a text file). In C# you can use the using statement to automatically dispose of such an object once execution leaves the scope in which it's required:
using (StreamReader reader = File.OpenText(@"C:\path\to\file.txt"))
{
    // use reader in here
}
// at this point, reader.Dispose() has been called automatically
In PowerShell, there is no such semantic construct. What I usually do when allocating many disposable objects is wrap them in a try/finally block:
try {
    $FileReader = [System.IO.File]::OpenText("C:\path\to\file.txt")
    # use $FileReader here
}
finally {
    if ($FileReader -ne $null) {
        $FileReader.Dispose()
    }
}
Of course, all of this is hidden away from you when invoking Get-Content, for example - the developer of the underlying function in the file system provider has already taken care of disposing of the object by the time the pipeline stops running. It's really only needed when you want to write your own cmdlets and interact with more "primitive" types directly.
I hope this sheds some light on your confusion.

Get References to Powershell's Stream Objects?

I am interested in getting .NET object references for the different streams that come with a Powershell host (stdin, plus the 5 output streams debug, info, error, etc.) I am interested in passing these to custom .NET types which will NOT be cmdlets... just .NET types that expect to use 5 output streams and 1 input stream.
I have spent lots of time googling and msdning and I just can't seem to find information about these streams beyond the cmdlets that read/write them.
If this is not possible, then a link to some related documentation would make for an answer.
Update
Thanks for the feedback so far, and sorry for the delay in making it back to this question.
@CharlieJoynt the idea here is that I will be using PowerShell as an entry point for a number of custom .NET types. These are types that will also be imported into other class libraries and EXEs, so they cannot be PowerShell-specific. Anything that does host the types will, however, provide streams for info/log/error/etc. output (instead of choosing a specific logging framework like log4net).
@PetSerAl I am not sure what an XY question is? If my update doesn't add the clarity you are looking for, can you clarify ( :P ) what the gap is?
Thanks again for the feedback so far, folks.
I have been able to intercept data written to certain streams by using the Register-ObjectEvent cmdlet.
Register-ObjectEvent
https://technet.microsoft.com/en-us/library/hh849929.aspx
The Register-ObjectEvent cmdlet subscribes to events that are
generated by .NET Framework objects on the local computer or on a
remote computer. When the subscribed event is raised, it is added to
the event queue in your session. To get events in the event queue, use
the Get-Event cmdlet.
You can use the parameters of Register-ObjectEvent to specify
property values of the events that can help you to identify the event
in the queue. You can also use the Action parameter to specify actions
to take when a subscribed event is raised and the Forward parameter to
send remote events to the event queue in the local session.
In my case I had created a new System.Diagnostics.Process object as $Process, but before starting that process I registered some event handlers, which exist as Jobs, e.g.
$StdOutJob = Register-ObjectEvent -InputObject $Process `
    -EventName OutputDataReceived -Action $ScriptBlock
...where $ScriptBlock is a pre-determined script block that handles the events coming from that stream. Within that script block, the events are accessible via some built-in variables:
The value of the Action parameter can include the $Event,
$EventSubscriber, $Sender, $EventArgs, and $Args
automatic variables, which provide information about the event to
the Action script block.
So your ScriptBlock could take $EventArgs.Data and do something with it.
Disclaimer: I have not used this method to try to intercept all the streams you mention, just OutputDataReceived and ErrorDataReceived.
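Putting that together, a minimal end-to-end sketch (the command and the script block body are illustrative, not from the original answer):
$Process = New-Object System.Diagnostics.Process
$Process.StartInfo.FileName = 'ping.exe'              # illustrative command
$Process.StartInfo.Arguments = 'localhost'
$Process.StartInfo.UseShellExecute = $false           # required for stream redirection
$Process.StartInfo.RedirectStandardOutput = $true
$ScriptBlock = { Write-Host $EventArgs.Data }         # $EventArgs is the automatic variable
$StdOutJob = Register-ObjectEvent -InputObject $Process -EventName OutputDataReceived -Action $ScriptBlock
[void] $Process.Start()
$Process.BeginOutputReadLine()                        # OutputDataReceived won't fire without this
$Process.WaitForExit()
Unregister-Event -SourceIdentifier $StdOutJob.Name    # clean up the event subscription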

Does New-Object have a memory leak?

Using PowerShell v3, I'm using the .NET library classes System.DirectoryServices.DirectoryEntry and System.DirectoryServices.DirectorySearcher to query a list of properties from users in a domain. The code for this is basically found here.
The only thing you need to add is the line $ds.PageSize = 1000 between $ds.Filter = '(&(objectCategory=person)(objectClass=user))' and $ds.PropertiesToLoad.AddRange($properties). This removes the limit of only grabbing 1000 users.
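In context, that section of the linked code would look something like this (a sketch; $de and $properties come from the linked article, not from this post):
$ds = New-Object System.DirectoryServices.DirectorySearcher($de)
$ds.Filter = '(&(objectCategory=person)(objectClass=user))'
$ds.PageSize = 1000   # enable paged results so the search isn't capped at 1000 entries
$ds.PropertiesToLoad.AddRange($properties)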
The number of users from one domain (we'll call it domain1) I have has over 80,000. Another domain (we'll call this domain2) has over 200,000 users.
If I run the code on domain1, it takes roughly 12 minutes (which is fantastic compared to the 24 hours Get-QADUser was taking). However, after the script finishes, the PowerShell window keeps hold of about 500 MB of memory. Domain2 leaves behind about 1.5 GB.
Note: the memory leak with Get-QADUser is much much MUCH worse. Domain2 leaves behind about 6 GB and takes roughly 72 hours to complete (vs. less than an hour with the .NET classes).
The only way to free the memory is to close the PowerShell window. But what if I want to write a script to invoke all these domains scripts to run them one after the other? I would run out of memory after getting to the 6th script.
The only thing I can think of is that New-Object calls a constructor, but there is no destructor to invoke afterwards (unlike Java). I've tried using [System.GC]::Collect() during the loop iterations, but this has had no effect.
Is there a reason? Is it solvable or am I stuck with this?
One thing to note: If there actually is a memory leak, you can just run the script in a new shell:
powershell { <# your code here #> }
As for the leak: as long as any variable references an object that holds on to large data, that object cannot be collected. You may have luck using a memory profiler to look at what is still in memory and why. As far as I can see, if you use that code in a script file and execute the script (with &, not with .!), this shouldn't really happen, though.
I guess it's $table that is using all the memory. Why don't you write the collected data directly to a file, instead of adding it to the array?
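A sketch of that suggestion (assuming $table is the array being built up in a loop over $obj records; Export-Csv -Append requires PowerShell 3.0, which the question is using):
# instead of accumulating rows in memory with:  $table += $obj
$obj | Export-Csv -Path 'users.csv' -Append -NoTypeInformation   # stream each record to disk as it's produced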

Odd failures with PS v2 remoting

I have a moderately complex script made up of a PS1 file that does Import-Module on a number of PSM1 files, and includes a small number of global variables that define state.
I have it working as a regular script, and I am now trying to implement it for remoting, and I'm running into some very odd issues. I am seeing a ton of .NET runtime errors with an eventID of 0, and the script seems to work intermittently, with the time between attempts seeming to affect the results. I know that isn't a very good description of the problem, but I haven't had a chance to test more deeply, and I am just wondering: am I perhaps pushing PowerShell v2 further than it can really handle, trying to do remoting with a script this large and complex? Or does this look more like something wrong in my code, such that once I get it sorted I will get consistent script processing? I am relatively new to PowerShell and completely new to remoting.
The event data is
.NET Runtime version: 2.0.50727.5485 - Application Error
Application has generated an exception that could not be handled.
Process ID=0x52c (1324), Thread ID=0x760 (1888).
Click OK to terminate the application. Click CANCEL to debug the application.
That doesn't exactly provide anything meaningful. Also, rather oddly, if I clear the event log it seems like I have a better chance of not having an issue; once there are errors in the event log, the chance of the script failing is much higher.
Any thoughts before I dig into troubleshooting are much appreciated. And suggestions on best practices when troubleshooting Remote scripts also much appreciated.
One thing about v2 remoting is that the shell memory limit is set pretty small - 150 MB. You might try bumping that up to, say, 1 GB like so:
Set-Item WSMan:\localhost\shell\MaxMemoryPerShellMB 1024 -Force
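To check the current quota before changing it (a quick sketch; the WSMan: drive requires an elevated session):
Get-Item WSMan:\localhost\shell\MaxMemoryPerShellMB   # shows the current per-shell memory limit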