Run N parallel jobs in PowerShell

I have the following PowerShell script:
$list = invoke-sqlcmd 'exec getOneMillionRows' -Server...
$list | % {
GetData $_ > $_.txt
ZipTheFile $_.txt $_.txt.zip
...
}
How can I run the script block ({ GetData $_ > $_.txt ... }) in parallel with a limited maximum number of jobs, e.g. so that at most 8 files are generated at one time?

Same idea as user "Start-Automating" posted, but with a fix for the bug where the jobs that are held back when hitting the else clause in his example are never started:
$servers = @('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n')
foreach ($server in $servers) {
$running = @(Get-Job | Where-Object { $_.State -eq 'Running' })
if ($running.Count -ge 4) {
$running | Wait-Job -Any | Out-Null
}
Write-Host "Starting job for $server"
Start-Job {
# do something with $using:server. Just sleeping for this example.
Start-Sleep 5
return "result from $using:server"
} | Out-Null
}
# Wait for all jobs to complete and results ready to be received
Wait-Job * | Out-Null
# Process the results
foreach($job in Get-Job)
{
$result = Receive-Job $job
Write-Host $result
}
Remove-Job -State Completed

The Start-Job cmdlet allows you to run code in the background. To do what you're asking, something like the code below should work.
foreach ($server in $servers) {
$running = @(Get-Job | Where-Object { $_.State -eq 'Running' })
if ($running.Count -le 8) {
Start-Job {
Add-PSSnapin SQL
$list = invoke-sqlcmd 'exec getOneMillionRows' -Server...
...
}
} else {
$running | Wait-Job
}
Get-Job | Receive-Job
}
Hope this helps.

It should be really easy with the Split-Pipeline cmdlet of the SplitPipeline module.
The code will look as simple as this:
Import-Module SplitPipeline
$list = invoke-sqlcmd 'exec getOneMillionRows' -Server...
$list | Split-Pipeline -Count 8 {process{
GetData $_ > $_.txt
ZipTheFile $_.txt $_.txt.zip
...
}}
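If the module is not already installed, it should be available from the PowerShell Gallery (assuming your session can reach the gallery):
Install-Module SplitPipeline -Scope CurrentUser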

Old thread but I think this could help:
$List = "C:\List.txt"
$Jobs = 8
Foreach ($PC in Get-Content $List)
{
Do
{
$Job = (Get-Job -State Running | Measure-Object).Count
} Until ($Job -lt $Jobs)
Start-Job -Name $PC -ScriptBlock { "Your command here $Using:PC" }
Get-Job -State Completed | Remove-Job
}
Wait-Job -State Running
Get-Job -State Completed | Remove-Job
Get-Job
The "Do" loop pause the "foreach" when the amount of job "running" exceed the amount of "$jobs" that is allowed to run.
Than wait for the remaining to complete and show failed jobs...

Background jobs are the answer. You can also throttle the jobs in the run queue using [System.Collections.Queue]. There is a blog post from the PowerShell team on this topic: https://devblogs.microsoft.com/powershell/scaling-and-queuing-powershell-background-jobs/
The queuing method is probably the best way to throttle background jobs.
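A minimal sketch of that queue-based throttle, assuming generic work items and a placeholder script block (this is an illustration, not the blog post's exact code):
$maxJobs = 8
$queue = New-Object System.Collections.Queue
1..20 | ForEach-Object { $queue.Enqueue($_) }  # illustrative work items
while ($queue.Count -gt 0 -or @(Get-Job -State Running).Count -gt 0) {
    # Top the running set up to the limit from the queue
    while ($queue.Count -gt 0 -and @(Get-Job -State Running).Count -lt $maxJobs) {
        Start-Job -ScriptBlock { param($i) Start-Sleep 2; "done: $i" } -ArgumentList $queue.Dequeue() | Out-Null
    }
    # Harvest finished jobs as we go
    Get-Job -State Completed | ForEach-Object { Receive-Job $_; Remove-Job $_ }
    Start-Sleep -Milliseconds 200
}
Get-Job -State Completed | ForEach-Object { Receive-Job $_; Remove-Job $_ }  # final harvest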

I use and improved a multithreading function; you can use it like this:
$Script = {
param($Computername)
get-process -Computername $Computername
}
@('Srv1','Srv2') | Run-Parallel -ScriptBlock $Script
include this code in your script
function Run-Parallel {
<#
.Synopsis
This is a quick and open-ended script multi-threader
http://www.get-blog.com/?p=189#comment-28834
Improved by Alban LOPEZ 2016
.Description
This script will allow any general, external script to be multithreaded by providing a single
argument to that script and opening it in a separate thread. It works as a filter in the
pipeline, or as a standalone script. It will read the argument either from the pipeline
or from a filename provided. It will send the results of the child script down the pipeline,
so it is best to use a script that returns some sort of object.
.PARAMETER ScriptBlock
This is where you provide the PowerShell ScriptBlock that you want to multithread.
.PARAMETER ItemObj
The ItemObj represents the arguments that are provided to the child script. This is an open ended
argument and can take a single object from the pipeline, an array, a collection, or a file name. The
multithreading script does its best to find out which you have provided and handle it as such.
If you would like to provide a file, then the file is read with one object on each line and will
be provided as is to the script you are running as a string. If this is not desired, then use an array.
.PARAMETER InputParam
This allows you to specify the parameter for which your input objects are to be evaluated. As an example,
if you were to provide a computer name to the Get-Process cmdlet as just an argument, it would attempt to
find all processes where the name was the provided computername and fail. You need to specify that the
parameter that you are providing is the "ComputerName".
.PARAMETER AddParam
This allows you to specify additional parameters to the running command. For instance, if you are trying
to find the status of the "BITS" service on all servers in your list, you will need to specify the "Name"
parameter. This command takes a hash pair formatted as follows:
#{"key" = "Value"}
#{"key1" = "Value"; "key2" = 321; "key3" = 1..9}
.PARAMETER AddSwitch
This allows you to add additional switches to the command you are running. For instance, you may want
to include "RequiredServices" to the "Get-Service" cmdlet. This parameter will take a single string, or
an array of strings as follows:
"RequiredServices"
@("RequiredServices", "DependentServices")
.PARAMETER MaxThreads
This is the maximum number of threads to run at any given time. If resources are too congested, try lowering
this number. The default value is 20.
.PARAMETER SleepTimer_ms
This is the time between cycles of the child-process detection loop. The default value is 100 ms. If CPU
utilization is high then you can consider increasing this delay. If the child script takes a long time to
run, then you might increase this value to around 1000 (or 1 second in the detection cycle).
.PARAMETER TimeOutGlobal
The global timeout in seconds: once it elapses, all remaining threads are closed and only the results completed so far are returned.
.PARAMETER TimeOutThread
The per-thread timeout in seconds; a thread is aborted once it exceeds this limit.
.PARAMETER PSModules
List of PSModule names to load for use in the ScriptBlock
.PARAMETER PSSapins
List of PSSnapin names to load for use in the ScriptBlock
.EXAMPLE
1..20 | Run-Parallel -ScriptBlock {param($i) Start-Sleep $i; "> $i sec <"} -TimeOutGlobal 15 -TimeOutThread 5
.EXAMPLE
This will execute the scriptBlock for each of the server names in AllServers.txt
while providing the results to GridView. The results will be the output of the child script.
gc AllServers.txt | Run-Parallel $ScriptBlock_GetTSUsers -MaxThreads $findOut_AD.ActiveDirectory.Servers.count -PSModules 'PSTerminalServices' | out-gridview
#>
Param(
[Parameter(ValueFromPipeline=$true,ValueFromPipelineByPropertyName=$true)]
$ItemObj,
[ScriptBlock]$ScriptBlock = $null,
$InputParam = $Null,
[HashTable] $AddParam = @{},
[Array] $AddSwitch = @(),
$MaxThreads = 20,
$SleepTimer_ms = 100,
$TimeOutGlobal = 300,
$TimeOutThread = 100,
[string[]]$PSSapins = $null,
[string[]]$PSModules = $null,
$Modedebug = $true
)
Begin{
$ISS = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
ForEach ($Snapin in $PSSapins){
[void]$ISS.ImportPSSnapIn($Snapin, [ref]$null)
}
ForEach ($Module in $PSModules){
[void]$ISS.ImportPSModule($Module)
}
$RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads, $ISS, $Host)
$RunspacePool.CleanupInterval=1000
$RunspacePool.Open()
$Jobs = @()
}
Process{
#ForEach ($Object in $ItemObj){
if ($ItemObj){
Write-Host $ItemObj -ForegroundColor Yellow
$PowershellThread = [powershell]::Create().AddScript($ScriptBlock)
If ($InputParam -ne $Null){
$PowershellThread.AddParameter($InputParam, $ItemObj.ToString()) | out-null
}Else{
$PowershellThread.AddArgument($ItemObj.ToString()) | out-null
}
ForEach($Key in $AddParam.Keys){
$PowershellThread.AddParameter($Key, $AddParam.$key) | out-null
}
ForEach($Switch in $AddSwitch){
$PowershellThread.AddParameter($Switch) | out-null
}
$PowershellThread.RunspacePool = $RunspacePool
$Handle = $PowershellThread.BeginInvoke()
$Job = [pscustomobject][ordered]@{
Handle = $Handle
Thread = $PowershellThread
object = $ItemObj.ToString()
Started = Get-Date
}
$Jobs += $Job
}
#}
}
End{
$GlobalStartTime = Get-Date
$continue = $true
While (@($Jobs | Where-Object {$_.Handle -ne $Null}).count -gt 0 -and $continue) {
ForEach ($Job in $($Jobs | Where-Object {$_.Handle.IsCompleted -eq $True})){
$out = $Job.Thread.EndInvoke($Job.Handle)
$out # send to standard output
#Write-Host $out -ForegroundColor green
$Job.Thread.Dispose() | Out-Null
$Job.Thread = $Null
$Job.Handle = $Null
}
foreach ($InProgress in $($Jobs | Where-Object {$_.Handle})) {
if ($TimeOutGlobal -and (($(Get-Date) - $GlobalStartTime).totalseconds -gt $TimeOutGlobal)){
$Continue = $false
#Write-Host $InProgress -ForegroundColor magenta
}
if (!$Continue -or ($TimeOutThread -and (($(Get-Date) - $InProgress.Started).totalseconds -gt $TimeOutThread))) {
$InProgress.thread.Stop() | Out-Null
$InProgress.thread.Dispose() | Out-Null
$InProgress.Thread = $Null
$InProgress.Handle = $Null
#Write-Host $InProgress -ForegroundColor red
}
}
Start-Sleep -Milliseconds $SleepTimer_ms
}
$RunspacePool.Close() | Out-Null
$RunspacePool.Dispose() | Out-Null
}
}

Old thread, but my contribution to it, is the part where you count the running jobs. Some of the answers above do not work for 0 or 1 running job. A little trick I use is to throw the results in a forced array, and then count it:
[array]$JobCount = Get-job -state Running
$JobCount.Count
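This matters mainly on PowerShell 2.0, where a scalar result has no .Count property; on 3.0+ the cast is still harmless. A quick illustration:
$running = Get-Job -State Running   # $null for none, a single job object for one
[array]$running = Get-Job -State Running
$running.Count                      # reliably 0, 1, or N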


PowerShell, test the performance/efficiency of asynchronous tasks with Start-Job and Start-Process

I'm curious to test out the performance/usefulness of asynchronous tasks in PowerShell with Start-ThreadJob, Start-Job and Start-Process. I have a folder with about 100 zip files and so came up with the following test:
New-Item "000" -ItemType Directory -Force # Move the old zip files in here
foreach ($i in $zipfiles) {
$name = $i -split ".zip"
Start-Job -scriptblock {
7z.exe x -o"$name" .\$name
Move-Item $i 000\ -Force
7z.exe a $i .\$name\*.*
}
}
The problem with this is that it would start jobs for all 100 zip, which would probably be too much, so I want to set a value $numjobs, say 5, which I can change, such that only $numjobs will be started at the same time, and then the script will check for all 5 of the jobs ending before the next block of 5 will start. I'd like to then watch the CPU and memory depending upon the value of $numjobs
How would I tell a loop only to run 5 times, then wait for the Jobs to finish before continuing?
I see that it's easy to wait for jobs to finish
$jobs = $commands | Foreach-Object { Start-ThreadJob $_ }
$jobs | Receive-Job -Wait -AutoRemoveJob
but how might I wait for Start-Process tasks to end?
Although I would like to use ForEach-Object -Parallel, the Enterprises that I work in will be solidly tied to PowerShell 5.1 for the next 3-4 years, I expect, with no chance to install PowerShell 7.x (although I would be curious to test ForEach-Object -Parallel on my home system to compare all approaches).
ForEach-Object -Parallel and Start-ThreadJob have built-in functionality to limit the number of threads that can run at the same time; the same applies to runspaces with their RunspacePool, which is what is used behind the scenes by both cmdlets.
Start-Job does not offer such functionality because each job runs in a separate process, as opposed to the cmdlets mentioned before, which run in different threads all in the same process. I would also personally not consider it a parallelism alternative; it is pretty slow, and in most cases a linear loop will be faster than it. Serialization and deserialization can be a problem in some cases too.
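The serialization cost is easy to demonstrate: the object that crosses the process boundary comes back as a property-only snapshot with its methods stripped.
Start-Job { Get-Process -Id $PID } | Receive-Job -Wait -AutoRemoveJob | Get-Member
# TypeName: Deserialized.System.Diagnostics.Process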
How to limit the number of running threads?
Both cmdlets offer the -ThrottleLimit parameter for this.
https://learn.microsoft.com/en-us/powershell/module/threadjob/start-threadjob?view=powershell-7.2#-throttlelimit
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/foreach-object?view=powershell-7.2#-throttlelimit
How would the code look?
$dir = (New-Item "000" -ItemType Directory -Force).FullName
# ForEach-Object -Parallel
$zipfiles | ForEach-Object -Parallel {
$name = [IO.Path]::GetFileNameWithoutExtension($_)
7z.exe x -o $name .\$name
Move-Item $_ $using:dir -Force
7z.exe a $_ .\$name\*.*
} -ThrottleLimit 5
# Start-ThreadJob
$jobs = foreach ($i in $zipfiles) {
Start-ThreadJob {
$name = [IO.Path]::GetFileNameWithoutExtension($using:i)
7z.exe x -o $name .\$name
Move-Item $using:i $using:dir -Force
7z.exe a $using:i .\$name\*.*
} -ThrottleLimit 5
}
$jobs | Receive-Job -Wait -AutoRemoveJob
How to achieve the same having only PowerShell 5.1 available and no ability to install new modules?
The RunspacePool offers this same functionality, either with its .SetMaxRunspaces(Int32) method or by targeting one of the RunspaceFactory.CreateRunspacePool overloads that take a maxRunspaces limit as an argument.
How would the code look?
$dir = (New-Item "000" -ItemType Directory -Force).FullName
$limit = 5
$iss = [initialsessionstate]::CreateDefault2()
$pool = [runspacefactory]::CreateRunspacePool(1, $limit, $iss, $Host)
$pool.ThreadOptions = [Management.Automation.Runspaces.PSThreadOptions]::ReuseThread
$pool.Open()
$tasks = foreach ($i in $zipfiles) {
$ps = [powershell]::Create().AddScript({
param($path, $dir)
$name = [IO.Path]::GetFileNameWithoutExtension($path)
7z.exe x -o $name .\$name
Move-Item $path $dir -Force
7z.exe a $path .\$name\*.*
}).AddParameters(@{ path = $i; dir = $dir })
$ps.RunspacePool = $pool
@{ Instance = $ps; AsyncResult = $ps.BeginInvoke() }
}
foreach($task in $tasks) {
$task['Instance'].EndInvoke($task['AsyncResult'])
$task['Instance'].Dispose()
}
$pool.Dispose()
Note that for all examples, it's unclear if the 7zip code is correct or not, this answer attempts to demonstrate how async is done in PowerShell not how to zip files / folders.
Below is a helper function that can simplify the process of parallel invocations, tries to emulate ForEach-Object -Parallel and is compatible with PowerShell 5.1, though shouldn't be taken as a robust solution:
using namespace System.Management.Automation
using namespace System.Management.Automation.Runspaces
using namespace System.Collections.Generic
function Invoke-Parallel {
[CmdletBinding()]
param(
[Parameter(Mandatory, ValueFromPipeline, DontShow)]
[object] $InputObject,
[Parameter(Mandatory, Position = 0)]
[scriptblock] $ScriptBlock,
[Parameter()]
[int] $ThrottleLimit = 5,
[Parameter()]
[hashtable] $ArgumentList
)
begin {
$iss = [initialsessionstate]::CreateDefault2()
if($PSBoundParameters.ContainsKey('ArgumentList')) {
foreach($argument in $ArgumentList.GetEnumerator()) {
$iss.Variables.Add([SessionStateVariableEntry]::new($argument.Key, $argument.Value, ''))
}
}
$pool = [runspacefactory]::CreateRunspacePool(1, $ThrottleLimit, $iss, $Host)
$tasks = [List[hashtable]]::new()
$pool.ThreadOptions = [PSThreadOptions]::ReuseThread
$pool.Open()
}
process {
try {
$ps = [powershell]::Create().AddScript({
$args[0].InvokeWithContext($null, [psvariable]::new("_", $args[1]))
}).AddArgument($ScriptBlock.Ast.GetScriptBlock()).AddArgument($InputObject)
$ps.RunspacePool = $pool
$invocationInput = [PSDataCollection[object]]::new(1)
$invocationInput.Add($InputObject)
$tasks.Add(@{
Instance = $ps
AsyncResult = $ps.BeginInvoke($invocationInput)
})
}
catch {
$PSCmdlet.WriteError($_)
}
}
end {
try {
foreach($task in $tasks) {
$task['Instance'].EndInvoke($task['AsyncResult'])
if($task['Instance'].HadErrors) {
$task['Instance'].Streams.Error
}
$task['Instance'].Dispose()
}
}
catch {
$PSCmdlet.WriteError($_)
}
finally {
if($pool) { $pool.Dispose() }
}
}
}
An example of how it works:
# Hashtable Key becomes the Variable Name inside the Runspace!
$outsideVariables = @{ Message = 'Hello from {0}' }
0..10 | Invoke-Parallel {
"[Item $_] - " + $message -f [runspace]::DefaultRunspace.InstanceId
Start-Sleep 5
} -ArgumentList $outsideVariables -ThrottleLimit 3
To add to Santiago Squarzon's helpful answer:
Below is a helper function, Measure-Parallel, which allows you to compare the speed of the following approaches to parallelism:
Start-Job:
Child-process-based: creates a child PowerShell process behind the scenes, which makes this approach both slow and resource-intensive.
Start-ThreadJob - ships with PowerShell (Core) (v6+); installable via Install-Module ThreadJob in Windows PowerShell v5.1:
Thread-based: Much lighter-weight than Start-Job while providing the same functionality; additionally avoids potential loss of type fidelity due to cross-process serialization / deserialization.
ForEach-Object -Parallel - available only in PowerShell (Core) 7.0+:
Thread-based: In essence a simplified wrapper around Start-ThreadJob with support for direct pipeline input and direct output, with invariably synchronous overall execution (all launched threads are waited for).
Start-Process
Child-process-based: Invokes an external program asynchronously by default, on Windows in a new window by default.
Note that this approach only makes sense if your parallel tasks only consist of a single call to an external program, as opposed to needing to execute a block of PowerShell code.
Notably, the only way to capture output with this approach is by redirection to a file, invariably as text only.
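For illustration, a minimal sketch of that file-redirection pattern (the file and archive names are placeholders):
$proc = Start-Process 7z.exe -ArgumentList 'x', '-oExtracted', 'archive.zip' `
    -NoNewWindow -PassThru -RedirectStandardOutput out.txt -RedirectStandardError err.txt
$proc | Wait-Process
Get-Content out.txt   # the captured output, text only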
Note:
Given that the tests below wrap a single call to an external executable (such as 7z.exe in your case), the Start-Process approach will perform best, because it doesn't have the overhead of job management. However, as noted above, this approach has fundamental limitations.
Due to its complexity, the runspace-pool-based approach from Santiago's answer wasn't included; if Start-ThreadJob or ForEach-Object -Parallel are available to you, you won't need to resort to this approach.
Sample Measure-Parallel call, which contrasts the runtime performance of the approaches:
# Run 20 jobs / processes in parallel, 5 at a time, comparing
# all approaches.
# Note: Omit the -Approach argument to enter interactive mode.
Measure-Parallel -Approach All -BatchSize 5 -JobCount 20
Sample output from a macOS machine running PowerShell 7.2.6 (timings vary based on many factors, but the ratios should provide a sense of relative performance):
# ... output from the jobs
JobCount : 20
BatchSize : 5
BatchCount : 4
Start-Job (secs.) : 2.20
Start-ThreadJob (secs.) : 1.17
Start-Process (secs.) : 0.84
ForEach-Object -Parallel (secs.) : 0.94
Conclusions:
ForEach-Object -Parallel adds the least thread/job-management overhead, followed by Start-ThreadJob.
Start-Job, due to needing an extra child process - for the hidden PowerShell instance running each task - is noticeably slower. It seems that on Windows the performance discrepancy is much more pronounced.
Measure-Parallel source code:
Important:
The function hard-codes sample input objects as well as what external program to invoke - you'll have to edit it yourself as needed; the hard-coded external program is the platform-native shell in this case (cmd.exe on Windows, /bin/sh on Unix-like platforms), which is passed a command to simply echo each input object.
It wouldn't be too hard to modify the function to accept a script block as an argument, and to receive input objects for the jobs via the pipeline (though that would preclude the Start-Process approach, except if you explicitly call the block via the PowerShell CLI - but in that case Start-Job could just be used).
Output from the jobs / processes goes directly to the display and cannot be captured.
The batch size, which defaults to 5, can be modified with -BatchSize; for the thread-based approaches, the batch size is also used as the -ThrottleLimit argument, i.e. the limit on how many threads are allowed to run at the same time. By default, a single batch is run, but you may request multiple batches indirectly by passing the total number of parallel runs to the -JobCount parameter.
You can select approaches via the array-valued -Approach parameter, which supports Job, ThreadJob, Process, ForEachParallel, and All, which combines all of the preceding.
If -Approach isn't specified, interactive mode is entered, where you're (repeatedly) prompted for the desired approach.
Except in interactive mode, a custom object with comparative timings is output.
function Measure-Parallel {
[CmdletBinding()]
param(
[ValidateRange(2, 2147483647)] [int] $BatchSize = 5,
[ValidateSet('Job', 'ThreadJob', 'Process', 'ForEachParallel', 'All')] [string[]] $Approach,
[ValidateRange(2, 2147483647)] [int] $JobCount = $BatchSize # pass a higher count to run multiple batches
)
$noForEachParallel = $PSVersionTable.PSVersion.Major -lt 7
$noStartThreadJob = -not (Get-Command -ErrorAction Ignore Start-ThreadJob)
$interactive = -not $Approach
if (-not $interactive) {
# Translate the approach arguments into their corresponding hashtable keys (see below).
if ('All' -eq $Approach) { $Approach = 'Job', 'ThreadJob', 'Process', 'ForEachParallel' }
$approaches = $Approach.ForEach({
if ($_ -eq 'ForEachParallel') { 'ForEach-Object -Parallel' }
else { $_ -replace '^', 'Start-' }
})
}
if ($noStartThreadJob) {
if ($interactive -or $approaches -contains 'Start-ThreadJob') {
Write-Warning "Start-ThreadJob is not installed, omitting its test; install it with ``Install-Module ThreadJob``"
$approaches = $approaches.Where({ $_ -ne 'Start-ThreadJob' })
}
}
if ($noForEachParallel) {
if ($interactive -or $approaches -contains 'ForEach-Object -Parallel') {
Write-Warning "ForEach-Object -Parallel is not available in this PowerShell version (requires v7+), omitting its test."
$approaches = $approaches.Where({ $_ -ne 'ForEach-Object -Parallel' })
}
}
# Simulated input: create 'f0.zip', 'f1.zip', ... file names.
$zipFiles = 0..($JobCount - 1) -replace '^', 'f' -replace '$', '.zip'
# Sample executables to run - here, the native shell is called to simply
# echo the argument given.
# The external program to invoke.
$exe = if ($env:OS -eq 'Windows_NT') { 'cmd.exe' } else { 'sh' }
# The list of its arguments *as a single string* - use '{0}' as the placeholder for where the input object should go.
$exeArgList = if ($env:OS -eq 'Windows_NT') { '/c "echo {0}"' } else { '-c "echo {0}"' }
# A hashtable with script blocks that implement the 3 approaches to parallelism.
$approachImpl = [ordered] @{}
$approachImpl['Start-Job'] = { # child-process-based job
param([array] $batch)
$batch |
ForEach-Object {
Start-Job { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
} |
Receive-Job -Wait -AutoRemoveJob # wait for all jobs, relay their output, then remove them.
}
if (-not $noStartThreadJob) {
# If Start-ThreadJob is available, add an approach for it.
$approachImpl['Start-ThreadJob'] = { # thread-based job - requires Install-Module ThreadJob in WinPS
param([array] $batch)
$batch |
ForEach-Object {
Start-ThreadJob -ThrottleLimit $BatchSize { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
} |
Receive-Job -Wait -AutoRemoveJob
}
}
if (-not $noForEachParallel) {
# If ForEach-Object -Parallel is supported (v7+), add an approach for it.
$approachImpl['ForEach-Object -Parallel'] = {
param([array] $batch)
$batch | ForEach-Object -ThrottleLimit $BatchSize -Parallel {
Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $_))
}
}
}
$approachImpl['Start-Process'] = { # direct execution of an external program
param([array] $batch)
$batch |
ForEach-Object {
Start-Process -NoNewWindow -PassThru $exe -ArgumentList ($exeArgList -f $_)
} |
Wait-Process # wait for all processes to terminate.
}
# Partition the array of all indices into subarrays (batches)
$batches = @(
0..([math]::Ceiling($zipFiles.Count / $batchSize) - 1) | ForEach-Object {
, $zipFiles[($_ * $batchSize)..($_ * $batchSize + $batchSize - 1)]
}
)
# In interactive use, print verbose messages by default
if ($interactive) { $VerbosePreference = 'Continue' }
:menu while ($true) {
if ($interactive) {
# Prompt for the approach to use.
$choices = $approachImpl.Keys.ForEach({
if ($_ -eq 'ForEach-Object -Parallel') { '&' + $_ }
else { $_ -replace '-', '-&' }
}) + '&Quit'
$choice = $host.ui.PromptForChoice("Approach", "Select parallelism approach:", $choices, 0)
if ($choice -eq $approachImpl.Count) { break }
$approachKey = @($approachImpl.Keys)[$choice]
}
else {
# Use the given approach(es)
$approachKey = $approaches
}
$tsTotals = foreach ($appr in $approachKey) {
$i = 0; $tsTotal = [timespan] 0
$batches | ForEach-Object {
$ts = Measure-Command { & $approachImpl[$appr] $_ | Out-Host }
Write-Verbose "$batchSize-element '$appr' batch finished in $($ts.TotalSeconds.ToString('N2')) secs."
$tsTotal += $ts
if (++$i -eq $batches.Count) {
# last batch processed.
if ($batches.Count -gt 1) {
Write-Verbose "'$appr' processing of $JobCount items overall finished in $($tsTotal.TotalSeconds.ToString('N2')) secs."
}
$tsTotal # output the overall timing for this approach
}
elseif ($interactive) {
$choice = $host.ui.PromptForChoice("Continue?", "Select action", ('&Next batch', '&Return to Menu', '&Quit'), 0)
if ($choice -eq 1) { continue menu }
if ($choice -eq 2) { break menu }
}
}
}
if (-not $interactive) {
# Output a result object with the overall timings.
$oht = [ordered] @{}; $i = 0
$oht['JobCount'] = $JobCount
$oht['BatchSize'] = $BatchSize
$oht['BatchCount'] = $batches.Count
foreach ($appr in $approachKey) {
$oht[($appr + ' (secs.)')] = $tsTotals[$i++].TotalSeconds.ToString('N2')
}
[pscustomobject] $oht
break # break out of the infinite :menu loop
}
}
}
You could add a counter to your foreach loop and break when the counter reaches your desired value:
$numjobs = 5
$counter = 0
foreach ($i in $zipfiles) {
$counter++
if ($counter -ge $numjobs) {
break
}
<your code>
}
or with PowerShell's ForEach-Object:
$numjobs = 5
$zipfiles | select -first $numjobs | Foreach-Object {
<your code>
}
If you want to process the whole array in batches and wait for each batch to complete you have to save the object that is returned by Start-Job and pass it to Wait-Job like this:
$items = 1..100
$batchsize = 5
while ($true) {
$jobs = @()
$counter = 0
foreach ($i in $items) {
if ($counter -ge $batchsize) {
$items = $items[$batchsize..($items.Length)]
break
}
$jobs += Start-Job -ScriptBlock { Start-Sleep 10 }
$counter++
}
foreach ($job in $jobs) {
$job | Wait-Job | Out-Null
}
if (!$items) {
break
}
}
By design, arrays have fixed lengths; that's why I'm rewriting the whole array with $items = $items[$batchsize..($items.Length)].

Export and filter 365 users' mailbox size results, sorted by Total Size from high to low

Continuing from my previous question:
I have a PowerShell script that exports mailbox size results to a CSV file.
The results contain a "Total Size" column that displays the size, followed by the unit.
However, I want the exported CSV file to be filtered so it displays only results greater than 25 GB, from high to low.
Now, there is the traditional way of using Excel to filter the numbers in the CSV results after the PowerShell export.
But I want to have it in the CSV file itself, so I do not have to do it over and over again.
Here's the script:
Param
(
[Parameter(Mandatory = $false)]
[switch]$MFA,
[switch]$SharedMBOnly,
[switch]$UserMBOnly,
[string]$MBNamesFile,
[string]$UserName,
[string]$Password
)
Function Get_MailboxSize
{
$Stats=Get-MailboxStatistics -Identity $UPN
$IsArchieved=$Stats.IsArchiveMailbox
$ItemCount=$Stats.ItemCount
$TotalItemSize=$Stats.TotalItemSize
$TotalItemSizeinBytes= $TotalItemSize -replace "(.*\()|,| [a-z]*\)", ""
$TotalSize=$stats.TotalItemSize.value -replace "\(.*",""
$DeletedItemCount=$Stats.DeletedItemCount
$TotalDeletedItemSize=$Stats.TotalDeletedItemSize
#Export result to csv
$Result=@{'Display Name'=$DisplayName;'User Principal Name'=$upn;'Mailbox Type'=$MailboxType;'Primary SMTP Address'=$PrimarySMTPAddress;'IsArchieved'=$IsArchieved;'Item Count'=$ItemCount;'Total Size'=$TotalSize;'Total Size (Bytes)'=$TotalItemSizeinBytes;'Deleted Item Count'=$DeletedItemCount;'Deleted Item Size'=$TotalDeletedItemSize;'Issue Warning Quota'=$IssueWarningQuota;'Prohibit Send Quota'=$ProhibitSendQuota;'Prohibit send Receive Quota'=$ProhibitSendReceiveQuota}
$Results= New-Object PSObject -Property $Result
$Results | Select-Object 'Display Name','User Principal Name','Mailbox Type','Primary SMTP Address','Item Count',@{Name = 'Total Size'; Expression = {($_."Total Size").Split(" ")[0]}},@{Name = 'Unit'; Expression = {($_."Total Size").Split(" ")[1]}},'Total Size (Bytes)','IsArchieved','Deleted Item Count','Deleted Item Size','Issue Warning Quota','Prohibit Send Quota','Prohibit Send Receive Quota' | Export-Csv -Path $ExportCSV -Notype -Append
}
Function main()
{
#Check for EXO v2 module installation
$Module = Get-Module ExchangeOnlineManagement -ListAvailable
if($Module.count -eq 0)
{
Write-Host Exchange Online PowerShell V2 module is not available -ForegroundColor yellow
$Confirm= Read-Host Are you sure you want to install module? [Y] Yes [N] No
if($Confirm -match "[yY]")
{
Write-host "Installing Exchange Online PowerShell module"
Install-Module ExchangeOnlineManagement -Repository PSGallery -AllowClobber -Force
}
else
{
Write-Host EXO V2 module is required to connect Exchange Online. Please install the module using the Install-Module ExchangeOnlineManagement cmdlet.
Exit
}
}
#Connect Exchange Online with MFA
if($MFA.IsPresent)
{
Connect-ExchangeOnline
}
#Authentication using non-MFA
else
{
#Storing credential in script for scheduling purpose/ Passing credential as parameter
if(($UserName -ne "") -and ($Password -ne ""))
{
$SecuredPassword = ConvertTo-SecureString -AsPlainText $Password -Force
$Credential = New-Object System.Management.Automation.PSCredential $UserName,$SecuredPassword
}
else
{
$Credential=Get-Credential -Credential $null
}
Connect-ExchangeOnline -Credential $Credential
}
#Output file declaration
$ExportCSV=".\MailboxSizeReport_$((Get-Date -format yyyy-MMM-dd-ddd` hh-mm` tt).ToString()).csv"
$Result=""
$Results=@()
$MBCount=0
$PrintedMBCount=0
Write-Host Generating mailbox size report...
#Check for input file
if([string]$MBNamesFile -ne "")
{
#We have an input file, read it into memory
$Mailboxes=@()
$Mailboxes=Import-Csv -Header "MBIdentity" $MBNamesFile
foreach($item in $Mailboxes)
{
$MBDetails=Get-Mailbox -Identity $item.MBIdentity
$UPN=$MBDetails.UserPrincipalName
$MailboxType=$MBDetails.RecipientTypeDetails
$DisplayName=$MBDetails.DisplayName
$PrimarySMTPAddress=$MBDetails.PrimarySMTPAddress
$IssueWarningQuota=$MBDetails.IssueWarningQuota -replace "\(.*",""
$ProhibitSendQuota=$MBDetails.ProhibitSendQuota -replace "\(.*",""
$ProhibitSendReceiveQuota=$MBDetails.ProhibitSendReceiveQuota -replace "\(.*",""
$MBCount++
Write-Progress -Activity "`n Processed mailbox count: $MBCount "`n" Currently Processing: $DisplayName"
Get_MailboxSize
$PrintedMBCount++
}
}
#Get all mailboxes from Office 365
else
{
Get-Mailbox -ResultSize Unlimited | foreach {
$UPN=$_.UserPrincipalName
$Mailboxtype=$_.RecipientTypeDetails
$DisplayName=$_.DisplayName
$PrimarySMTPAddress=$_.PrimarySMTPAddress
$IssueWarningQuota=$_.IssueWarningQuota -replace "\(.*",""
$ProhibitSendQuota=$_.ProhibitSendQuota -replace "\(.*",""
$ProhibitSendReceiveQuota=$_.ProhibitSendReceiveQuota -replace "\(.*",""
$MBCount++
Write-Progress -Activity "`n Processed mailbox count: $MBCount "`n" Currently Processing: $DisplayName"
if($SharedMBOnly.IsPresent -and ($Mailboxtype -ne "SharedMailbox"))
{
return
}
if($UserMBOnly.IsPresent -and ($MailboxType -ne "UserMailbox"))
{
return
}
Get_MailboxSize
$PrintedMBCount++
}
}
#Open output file after execution
If($PrintedMBCount -eq 0)
{
Write-Host No mailbox found
}
else
{
Write-Host `nThe output file contains $PrintedMBCount mailboxes.
if((Test-Path -Path $ExportCSV) -eq "True")
{
Write-Host `nThe Output file available in $ExportCSV -ForegroundColor Green
$Prompt = New-Object -ComObject wscript.shell
$UserInput = $Prompt.popup("Do you want to open output file?",`
0,"Open Output File",4)
If ($UserInput -eq 6)
{
Invoke-Item "$ExportCSV"
}
}
}
#Disconnect Exchange Online session
Disconnect-ExchangeOnline -Confirm:$false | Out-Null
}
. main
How can i achieve that?
It's a bit of an unfortunate scenario, but if you're outputting the results to the file on each iteration, you have 2 options:
1. At the end of the script, read the output file, filter the >25 GB mailboxes, sort the objects, then output again.
2. Instead of writing each user's mailbox to the file as you go, save the results to a variable. At the end, filter, sort, then export to file.
Without going too far into the code...
Option 1
Might be simplest, as the export already splits the size into a value and a unit. After you've retrieved all mailboxes and gotten all statistics, read the exported CSV file and filter + sort the data. Then, export the info back to the file, overwriting. Something like:
$tempImport = Import-CSV $exportCSV | Where-Object {($_.'Total Size' -ge 25) -and ($_.Unit -eq "GB")} | Sort-Object 'Total Size' -descending
$tempImport | Export-CSV $exportCSV -noTypeInformation
PowerShell may not like overwriting a file it is reading in the same command, hence saving to a temp variable first.
Option 2
Create a live variable storing all mailbox data, and write the information at the end of the script instead of opening the file to append data each iteration. Then, at the end of the script, filter and sort before exporting.
$global:largeMailboxes = @() # definition to allow reading through all functions
Then, instead of exporting to CSV each time, add the result to the above variable
$tvar = $null
$tvar = $Results | Select-Object 'Display Name','User Principal Name','Mailbox Type','Primary SMTP Address','Item Count',@{Name = 'Total Size'; Expression = {($_."Total Size").Split(" ")[0]}},@{Name = 'Unit'; Expression = {($_."Total Size").Split(" ")[1]}},'Total Size (Bytes)','IsArchieved','Deleted Item Count','Deleted Item Size','Issue Warning Quota','Prohibit Send Quota','Prohibit Send Receive Quota'
$global:largeMailboxes += $tvar
#
# Alternatively, only add the mailbox if it's larger than 25GB, to avoid adding objects you don't care about
if ($TotalItemSizeinBytes -ge 26843545600) # This is 25 GB, better to make a variable called $minSize or such to store this in, in case you want to change it later.
{
# Above code to add to global variable
}
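As an aside, PowerShell's byte-unit numeric literals make that threshold self-documenting; a possible refinement using the $minSize variable the comment above suggests:
$minSize = 25GB   # PowerShell expands this literal to 26843545600 bytes
if ($TotalItemSizeinBytes -ge $minSize)
{
# Above code to add to global variable
}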
Once all mailboxes have been added, sort the collection
$global:largeMailboxes = $global:largeMailboxes | Sort-Object 'Total Size' -descending
Then export as needed
$global:largeMailboxes | Export-CSV $exportCSV -NoTypeInformation

Writing an output on a .txt file on Powershell

I found a little script that gets all the local groups and members, and it's working perfectly, but I need to write the output to a .txt file.
Trap {"Error: $_"; Break;}
function EnumLocalGroup($LocalGroup) {
$Group = [ADSI]"WinNT://$strComputer/$LocalGroup,group"
"`r`n" + "Group: $LocalGroup"
$Members = @($Group.psbase.Invoke("Members"))
foreach ($Member In $Members) {
$Name = $Member.GetType().InvokeMember("Name", 'GetProperty', $Null, $Member, $Null)
$Name
}
}
$strComputer = gc env:computername
"Computer: $strComputer"
$computer = [adsi]"WinNT://$strComputer"
$objCount = ($computer.PSBase.Children | Measure-Object).Count
$i = 0
foreach ($adsiObj in $computer.PSBase.Children) {
switch -regex ($adsiObj.PSBase.SchemaClassName) {
"group" {
$group = $adsiObj.Name
EnumLocalGroup $group
}
}
$i++
}
I already tried this:
function EnumLocalGroup($LocalGroup) | Out-File -FilePath "E:\PS\Malik\group.txt"
But the code won't start if I do that. I also tried to put the whole Out-File line at the end of the code, after the closing }, but that doesn't work either, and this is the only solution I could find on the Internet.
If you want to incorporate logging into a function you need to put it into the function body, e.g.
function EnumLocalGroup($LocalGroup) {
....
$foo = 'something'
$foo # output returned by function
$foo | Add-Content 'log.txt' # output to log file
...
}
or
function EnumLocalGroup($LocalGroup) {
...
$foo = 'something'
$foo | Tee-Object 'log.txt' -Append # output goes to log file and StdOut
...
}
Otherwise you have to do the logging when you call the function:
EnumLocalGroup $group | Add-Content 'C:\log.txt'

PowerShell result redirect into file

$z = "slc10nzf" , "slc12vbi"
$cls = gc C:\temp\cls.txt
foreach ($cl in $cls)
{
$vms = Get-Vm -ComputerName (Get-ClusterNode -Cluster $cl)
foreach ($vm in $vms)
{
$name = $vm.Name
if ($z -eq $name)
{
Write-Output "$name, $cl" | Out-File c:\temp\result.txt -Append
}
}
}
We have 4 Hyper-V clusters with VMs running on them.
Cluster names:
slchypervcl001, slchypervcl002, slchypervcl003, slchycl001
I have created a script to find which VM belongs to which cluster.
The script is working fine, but it redirects duplicate results to the file. Any help is appreciated.
The present script output is:
slc10nzf, slchypervcl001
slc12vbi, slchypcl001
slc12vbi,
You're using the incorrect comparison operator.
If you change $z -eq $name to $z -contains $name, your script should work as expected.
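For reference, when the left-hand operand of -eq is an array, PowerShell acts as a filter and returns the matching elements rather than a Boolean, which makes it unreliable as a membership test in an if condition; -contains is the membership test that returns $true or $false:
$z = "slc10nzf", "slc12vbi"
$z -eq "slc12vbi"        # returns the matching element: slc12vbi
$z -eq "other"           # returns an empty array
$z -contains "slc12vbi"  # True
$z -contains "other"     # False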

Hyper-V Powershell checkpoint creation/deletion is not synchronous. Is there a better option?

I've found myself having to write wrappers around PowerShell's Remove-VMSnapshot and Checkpoint-VM. The docs make no mention of it, but based on the Write-Host calls in both of the code snippets below executing, checkpoints are not fully deleted/created when the MS-provided cmdlets return. I hit this when restoring to a checkpoint by name immediately after creating it yielded an error.
Has anyone else encountered this? Thoughts on better ways to handle it? Tricks to prevent any of my code from calling the MS cmdlets directly?
function Remove-VMSnapshots-Sync
{
[CmdletBinding()]
Param(
[Parameter(Mandatory=$True)][object]$VM,
[Parameter(Mandatory=$True)][string]$CheckpointName
)
$matchingSnapshots = @(Get-VMSnapshot $VM | Where-Object {$_.Name -eq $CheckpointName})
$matchingSnapshots | Remove-VMSnapshot
do
{
$matchingSnapshots = @(Get-VMSnapshot $VM | Where-Object {$_.Name -eq $CheckpointName})
$stillThere = $matchingSnapshots.length -gt 0
if ($stillThere)
{
Write-Host 'sleeping to try to let snapshot disappear'
start-sleep -s 1
}
} while ($stillThere)
}
function Checkpoint-VM-Sync
{
[CmdletBinding()]
Param(
[Parameter(Mandatory=$True)][object]$VM,
[Parameter(Mandatory=$True)][string]$CheckpointName
)
$checkpoint = Checkpoint-VM -VM $VM -SnapshotName $CheckpointName -PassThru
$checkpoint | Write-Host
while (-not (@(Get-VMSnapshot $VM | Select -ExpandProperty Id)).Contains($checkpoint.Id))
{
Write-Host 'waiting for checkpoint to be in list'
Get-VMSnapshot $VM | Write-Host
start-sleep -s 1
}
}
Had a similar issue; see the answer in "Can I override a Powershell native cmdlet" - it shows you how easy it is to override commands.
You can add the override to your profile (only for you) or to the script (for everyone that runs the script), depending on your situation.
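A minimal sketch of such an override, using the module-qualified name to reach the real cmdlet from inside the wrapper (the polling body is adapted from the Checkpoint-VM-Sync function above; assumes the Hyper-V module, and only the parameters shown here are handled):
function Checkpoint-VM
{
    # Shadows Hyper-V\Checkpoint-VM; functions take precedence over cmdlets
    param(
        [Parameter(Mandatory=$True)][object]$VM,
        [Parameter(Mandatory=$True)][string]$SnapshotName
    )
    $checkpoint = Hyper-V\Checkpoint-VM -VM $VM -SnapshotName $SnapshotName -PassThru
    while (-not (@(Get-VMSnapshot $VM | Select-Object -ExpandProperty Id)).Contains($checkpoint.Id))
    {
        Start-Sleep -Seconds 1
    }
    $checkpoint
}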