I have several PowerShell scripts that I use to manage a medium-sized Microsoft Exchange organization (~10,000 mailboxes). Several of the scripts process all of the organization's mailboxes in some way. One common problem I run into while running these scripts is resource exhaustion: the scripts end up using gigabytes of RAM.
My research suggests that using the pipeline reduces memory consumption because the results aren't loaded into an array prior to processing. However, under certain conditions, Get-Mailbox still seems to load the entire list of results into memory before it passes anything to the next command in the pipeline.
For instance, I assumed the following example code would start listing the mobile devices associated with each mailbox as soon as the command is executed:
EXAMPLE 1
function GetMailboxDevices
{
process
{
Write-Host $_.Alias -ForegroundColor Green
Get-MobileDevice -Mailbox $_
}
}
Get-Mailbox -ResultSize Unlimited | GetMailboxDevices
However, this code does not appear to process the results in real time while the Get-Mailbox cmdlet is running. Instead, Get-Mailbox appears to take a few minutes to run and then passes all of the results to the second command in the pipeline at once. The PowerShell session's RAM usage climbs to 1.5 GB or higher during this process.
Nevertheless, I can work around the issue using code similar to the following:
EXAMPLE 2
function GetMailboxAliases
{
process
{
Write-Host $_.Alias -ForegroundColor Green
$_.Alias
}
}
$aliases = Get-Mailbox -ResultSize Unlimited | GetMailboxAliases
foreach ($alias in $aliases)
{
Get-MobileDevice -Mailbox $alias
}
In the second example, Get-Mailbox does pass each result down the pipeline in real time rather than all at once (the Write-Host output confirms this), and the RAM usage does not increase significantly. Of course, this code is less elegant, since I have to collect the aliases into an array and then process the array with a foreach statement.
The pipeline seems to be effective if I do something simple in the function (such as returning the alias of each mailbox), but the behavior changes as soon as I introduce another Exchange cmdlet into the function (such as Get-MobileDevice).
My question is this: why doesn't the code in example 1 leverage the pipeline efficiently but example 2 does? What steps can be taken to ensure the pipeline is leveraged efficiently?
I don't use Exchange much, but in my scripts I would do this:
function GetMailboxDevices ($mb)
{
    # $mb is the mailbox object passed in by ForEach-Object below.
    Write-Host $mb.Alias -ForegroundColor Green
    Get-MobileDevice -Mailbox $mb
}
Get-Mailbox -ResultSize Unlimited | ForEach-Object { GetMailboxDevices $_ }
or
Get-Mailbox -ResultSize Unlimited | % { GetMailboxDevices $_ }
When you are running a script that returns a lot of objects, collecting them into a variable and processing them afterwards can help keep memory usage under control, which is why example 2 performs better.
Running the script remotely from another server or workstation can also simplify execution and make resource usage easier to troubleshoot.
You can verify your impressions with Measure-Command:
Measure-Command { .\script1.ps1 }
Measure-Command { .\script2.ps1 }
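If memory rather than runtime is the main concern, a rough spot check is to compare the current session's working set before and after each version of the script. This is only a sketch, using the automatic $PID variable:
# Working set of the current PowerShell session, in megabytes.
'{0:N0} MB' -f ((Get-Process -Id $PID).WorkingSet64 / 1MB)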
I tried the following PowerShell command, but 1000 windows opened and the PowerShell ISE crashed. Is there a way to run the batch file 1000 times in the background? And is there a smarter way to get the average execution time?
This is the code I tried:
cd C:\scripts
Measure-Command {
for($i = 0;$i -lt 1000;$i++){
Start-Process -FilePath "C:\scripts\open.bat"
}
}
Start-Process by default runs programs asynchronously, in a new console window.
Since you want to run your batch file synchronously, in the same console window, invoke it directly (which, since the path is double-quoted - though it doesn't strictly have to be in this case - requires &, the call operator for syntactic reasons):
Measure-Command {
foreach ($i in 1..1000){
& "C:\scripts\open.bat"
}
}
Note: Measure-Command discards the success output from the script block being run; if you do want to see it in the console, use the following variation, though note that it will slow down processing:
Measure-Command {
& {
foreach ($i in 1..1000){
& "C:\scripts\open.bat"
}
} | Out-Host
}
This answer explains in more detail why Start-Process is typically the wrong tool for invoking console-based programs and scripts.
Measure-Command is the right tool for performance measurement in PowerShell, but it's important to note that such measurements are far from an exact science, given PowerShell's dynamic nature, which involves many caches and on-demand compilation behind the scenes.
Averaging multiple runs generally makes sense, especially when calling external programs; by contrast, if PowerShell code is executed repeatedly and the repeat count exceeds 16, on-demand compilation occurs and speeds up subsequent executions, which can skew the result.
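For instance, a minimal sketch of averaging several timed runs, reusing the batch-file path from the question:
# Time the 1000-iteration loop several times and report the average in seconds.
$runs = 1..5 | ForEach-Object {
    (Measure-Command {
        foreach ($i in 1..1000) { & "C:\scripts\open.bat" }
    }).TotalSeconds
}
'{0:N2} s on average' -f ($runs | Measure-Object -Average).Average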
Time-Command is a friendly wrapper around Measure-Command, available from this MIT-licensed Gist[1]; it can be used to simplify your tests.
# Download and define function `Time-Command` on demand (will prompt).
# To be safe, inspect the source code at the specified URL first.
if (-not (Get-Command -ea Ignore Time-Command)) {
$gistUrl = 'https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1'
if ((Read-Host "`n====`n OK to download and define benchmark function ``Time-Command`` from Gist ${gistUrl}?`n=====`n(y/n)?").Trim() -notin 'y', 'yes') { Write-Warning 'Aborted.'; exit 2 }
Invoke-RestMethod $gistUrl | Invoke-Expression
if (-not ${function:Time-Command}) { exit 2 }
}
Write-Verbose -Verbose 'Running benchmark...'
# Omit -OutputToHost to run the commands quietly.
Time-Command -Count 1000 -OutputToHost { & "C:\scripts\open.bat" }
Note that while Time-Command is a convenient wrapper even for measuring a single command's performance, it also allows you to compare the performance of multiple commands, passed as separate script blocks ({ ... }).
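For instance, a hypothetical comparison, assuming Time-Command has been defined as above and accepts the script blocks positionally as described (the cmd /c variant is just an illustrative alternative):
# Compare two ways of invoking the same batch file over 100 runs each.
Time-Command -Count 100 { & "C:\scripts\open.bat" }, { cmd /c "C:\scripts\open.bat" }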
[1] Assuming you have looked at the linked Gist's source code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:
irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
In AD, I'm trying to identify user accounts where the same EmployeeID value is populated in 2 or more records. Below is my code (Credit: I'm using a Show-Progress function defined here), and the Get-ADUser command alone has taken more than 2 hours to fetch all the records. The other steps (2 to 5) have been pretty quick. While I've completed the work, I'd like to know whether this could have been done more efficiently with PowerShell.
Get-ADUser -LDAPFilter "(&(ObjectCategory=Person)(objectclass=user)(employeeid=*))" -Properties $properties -Server $server_AD_GC -ResultPageSize 1000 |
# *ISSUE HERE*
# The Get-ADUser extract process seems to work very slow.
# However, it is important to note that the above command will be retrieving more than 200K records
# NOTE: I've inferred that employeeid is an indexed attribute and is replicated to GlobalCatalogs and hence have used it in the filter
Show-Progress -Activity "(1/5) Getting AD Users ..." |
select $selectPropsList -OutVariable results_UsersBaseSet |
Group-Object EmployeeID |
Show-Progress -Activity "(2/5) Grouping on EmployeeID ..." |
? { $_.Count -gt 1 } |
Show-Progress -Activity "(3/5) Filtering only dup EmpID records ..." |
select -Exp Group |
Show-Progress -Activity "(4/5) UnGrouping ..." |
Export-Csv "C:\Users\me\op_GetADUser_w_EmpID_Dupes_EntireForest - $([datetime]::Now.ToString("MM-dd-yyyy_hhmmss")).csv" -NoTypeInformation |
Show-Progress -Activity "(5/5) Exporting ..." |
Out-Null
PS: I've also tried exporting all the user accounts to a CSV file first and post-processing them in Excel, but given the size of the dataset that was both time- and memory-intensive.
Any suggestion is highly appreciated.
Since we don't know what is in $properties or $selectPropsList, your question is really only about finding out to which users the same EmployeeID has been issued, right?
By default, Get-ADUser already returns these properties:
DistinguishedName, Enabled, GivenName, Name, ObjectClass, ObjectGUID, SamAccountName, SID, Surname, UserPrincipalName
So all you need on top of that is EmployeeID, I guess. Collecting lots of properties does slow things down, so keeping the list to a bare minimum helps speed things up.
Next, by using the Show-Progress script you have linked to, you will slow down the execution of the script considerably. Do you really need to have a progress bar?
Why not simply write the lines with activity steps directly to the console?
Also, piping everything together doesn't help in the speed department either.
$server_AD_GC = 'YourServer'
$selectPropsList = 'EmployeeID', 'Name', 'SamAccountName', 'Enabled'
$outFile = "C:\Users\me\op_GetADUser_w_EmpID_Dupes_EntireForest - $([datetime]::Now.ToString("MM-dd-yyyy_hhmmss")).csv"
Write-Host "Step (1/4) Getting AD Users ..."
$users = Get-ADUser -Filter "EmployeeID -like '*'" -Properties EmployeeID -Server $server_AD_GC -ResultPageSize 1000
Write-Host "Step (2/4) Grouping on EmployeeID ..."
$dupes = $users | Group-Object -Property EmployeeID | Where-Object { $_.Count -gt 1 }
Write-Host "Step (3/4) Collecting duplicates ..."
$result = foreach ($group in $dupes) {
$group.Group | Select-Object $selectPropsList
}
Write-Host "Step (4/4) Exporting ..."
$result | Export-Csv -Path $outFile -NoTypeInformation
Write-Host "All done" -ForegroundColor Green
P.S. Get-ADUser already returns user objects only, so there is no need for the LDAP filter (ObjectCategory=Person)(objectclass=user). Using -Filter "EmployeeID -like '*'" is probably faster.
This answer complements Theo's helpful answer and focuses on showing progress during the operation:
The linked Show-Progress function, which is the latest as of this writing:
has an outright bug, in that it doesn't pass pipeline input through (the relevant line is accidentally commented out)
is conceptually flawed in that it doesn't use a process block, which means that all pipeline input is collected first, before it is processed - which defeats the idea of a progress bar.
Therefore, your Show-Progress calls won't show progress until the previous command in the pipeline has emitted all of its output. A simple alternative is to break the pipeline into separate commands and simply emit one progress message before each command, announcing the next stage of processing (rather than per-object progress), as shown in Theo's answer.
Generally, there is no way to show the progress of command-internal processing, only the progress of a command's (multi-object) output.
The simplest way to do this is via a ForEach-Object call in which you call Write-Progress, but that comes with two challenges:
In order to show a percent-complete progress bar, you need to know how many objects there will be in total, and you must determine that ahead of time, because a pipeline cannot know how many objects it will receive. Your only option is to collect all output first (or find some other way to count it) and then use the collected output as pipeline input, using the object count as the basis for calculating the value to pass to Write-Progress -PercentComplete.
Calling Write-Progress for each object received will result in a significant slowdown of overall processing; a compromise is to call it only for every N objects, as shown in this answer. The approach there could be wrapped in a properly implemented function à la Show-Progress that requires the total object count as an argument and performs proper streaming input-object processing (via a process block); that said, the mere act of using PowerShell code to pass input objects through is costly.
Conclusion:
Percent-complete progress displays have two inherent problems:
They require you to know the total number of objects to process beforehand (a pipeline has no way of knowing how many objects will pass through it):
Either: Collect all objects to process in memory, beforehand, if feasible; the count of elements in the collection can then serve as the basis for the percent-complete calculations. This may not be an option with very large input sets.
Or: Perform an extra processing step beforehand that merely counts all objects without actually retrieving them. This may not be practical in terms of the additional processing time added.
Object-by-object processing in PowerShell code - either via ForEach-Object or an advanced script/function - is inherently slow.
You can mitigate that somewhat by limiting Write-Progress calls to every N objects, as shown in this answer and in the sketch below.
Overall it's a tradeoff between processing speed and the ability to show percent-complete progress to the end user.
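For illustration, a minimal sketch of that compromise applied to the EmployeeID scenario; the up-front collection step and the interval of 100 are assumptions you would tune:
# Collect the input up front so a percentage can be computed, then report
# progress only every 100th object to limit the cost of Write-Progress.
$users = Get-ADUser -Filter "EmployeeID -like '*'" -Properties EmployeeID
$total = $users.Count
$i = 0
$dupes = $users | ForEach-Object {
    $i++
    if ($i % 100 -eq 0) {
        Write-Progress -Activity 'Grouping users' -Status "$i of $total" -PercentComplete (($i / $total) * 100)
    }
    $_  # pass the current user through to the rest of the pipeline
} | Group-Object EmployeeID | Where-Object { $_.Count -gt 1 }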
I have a script I'm using to loop through a bunch of domains and get dates from whois.exe. This works line-by-line, but when run as a script, it'll freeze. Here is where it gets stuck:
ForEach ($domain in $domains)
{
$domainname = $domain.Name
Write-Host "Processing $domainname..."
# WhoIsCL responds with different information depending on if it's a .org or something else.
if($domainname -like "*.org" -and $domainname)
{
$date = .\WhoIs.exe -v "$domainname" | Select-String -Pattern "Registry Expiry Date: " -AllMatches
Write-Host "Domain is a .org" -ForegroundColor "Yellow"
When I press CTRL+C to cancel the command, I can verify that $domain holds the correct value. I can then write this:
if($domainname -like "*.org" -and $domainname)
{
"Test"
}
... and "Test" appears in the command line. I then run:
$date = .\WhoIs.exe -v "$domainname" | Select-String -Pattern "Registry Expiry Date: " -AllMatches
Upon checking $date, I get the appropriate expiry date. Given that it freezes right after printing "Processing $domainname..." and right before "Domain is a .org", I can only assume WhoIs.exe is freezing. So why does this happen when the script is run, but not when the commands are entered directly in the PowerShell window?
Lastly, I did a final test by simply copying and pasting the entire script into a PowerShell window (which is just silly, but it appears to function) and got the same result. It freezes at whois.exe.
My best guess is that whois.exe needs to be run differently to be reliable inside my PowerShell loop. However, I don't seem to have a way to test running it via Start-Process and still capture its string output.
Anyway, advice would be great. I've definitely hit a wall.
Thanks!
If your script is running through lots of domains, it could be that you're being throttled. Here is a quote from the Nominet AUP:
The maximum query rate is 5 queries per second with a maximum of 1,000
queries per rolling 24 hours. If you exceed the query limits a block
will be imposed. For further details regarding blocks please see the
detailed instructions for use page. These limits are not per IP
address, they are per user.
http://registrars.nominet.org.uk/registration-and-domain-management/acceptable-use-policy
Different registrars may behave differently, but I'd expect some sort of rate limit. This would explain why a script (with high volume) behaves differently to ad-hoc manual lookups.
The proposed solution from the comments is to add Start-Sleep -Seconds 1 to the loop between each whois lookup, as sketched below.
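A minimal sketch of that workaround, keeping the structure of the loop from the question (the .org/.com branching is omitted for brevity):
ForEach ($domain in $domains)
{
    $domainname = $domain.Name
    Write-Host "Processing $domainname..."
    $date = .\WhoIs.exe -v "$domainname" | Select-String -Pattern "Registry Expiry Date: " -AllMatches
    # Throttle the lookups to stay under the registrar's rate limit.
    Start-Sleep -Seconds 1
}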
I'm completely new to PowerShell and to computer programming in general. I'm trying to create a script that kills a process (Firefox in this case) if it exceeds a working set of 10 MB. I would like to do this using an if statement and also have a piping command included. So far I have this:
get-process|where-object {$_.ProcessName -eq"firefox"}
if ($_.WorkingSet -gt10000000)
{kill}
else
{"This process is not using alot of the computers resources"}
Can anyone help fix this? Even though Firefox exceeds a 10 MB working set, the else branch always runs.
You need to wrap the conditional in a loop:
Get-Process | ? { $_.ProcessName -eq 'firefox' } | % {
    if ($_.WorkingSet -gt 10MB) {
        # kill is an alias for Stop-Process; pipe the process object to it.
        $_ | Stop-Process
    } else {
        "This process is not using a lot of the computer's resources"
    }
}
Otherwise the conditional would be evaluated independently from the pipeline, which means that the current object variable ($_) would be empty at that point.
You can filter the process in question by using the Name parameter (no need to use Where-Object for this purpose), then pipe the objects to the Stop-Process cmdlet. Notice the -WhatIf switch: it shows what would happen if the cmdlet were to run, without actually running it. Remove it to execute the cmdlet.
Get-Process -Name firefox |
Where-Object {$_.WorkingSet -gt 10mb} |
Stop-Process -WhatIf
I need to run a Get-Mailbox | Get-MailboxStatistics command across a large number of mailboxes but the majority have never been used as it is a new install. As a result, I have to sit through hundreds of lines of
WARNING: There is no data to return for the specified mailbox '<mailbox DN>' because it has not been logged on to.
It would seem that I need to use a server-side filter of some kind but I haven't been able to find anything appropriate.
What can I do here?
There is no server-side filtering in Get-MailboxStatistics, and I can't repro the issue. Can you try this:
Get-Mailbox | Get-MailboxStatistics -WarningAction SilentlyContinue
This is the standard PowerShell behavior for warnings. You can find Shay's parameter in the help for common parameters (get-help about_CommonParameters). Alternatively, you can set $WarningPreference = 'SilentlyContinue'. There are no statistics to return because the mailboxes have not yet been initialized, hence the warning.
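For example, a short sketch of the preference-variable approach (note that it affects all subsequent warnings in the session until you restore it):
# Suppress warnings session-wide, run the report, then restore the default.
$WarningPreference = 'SilentlyContinue'
Get-Mailbox | Get-MailboxStatistics
$WarningPreference = 'Continue'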