This article shows how to use Invoke-Async in PowerShell: https://sqljana.wordpress.com/2018/03/16/powershell-sql-server-run-in-parallel-collect-sql-results-with-print-output-from-across-your-sql-farm-fast/
I wish to run in parallel the copy-item cmdlet in PowerShell because the alternative is to use FileSystemObject via Excel and copy one file at a time out of a total of millions of files.
I have cobbled together the following:
<#
.SYNOPSIS
<Brief description>
For examples type:
Get-Help .\<filename>.ps1 -examples
.DESCRIPTION
Copies files from one path to another
.PARAMETER FileList
e.g. C:\path\to\list\of\files\to\copy.txt
.PARAMETER NumCopyThreads
default is 8 (but can be 100 if you want to stress the machine to maximum!)
.EXAMPLE
.\CopyFilesToBackup -filelist C:\path\to\list\of\files\to\copy.txt
.NOTES
#>
[CmdletBinding()]
Param(
[String] $FileList = "C:\temp\copytest.csv",
[int] $NumCopyThreads = 8
)
$filesToCopy = New-Object "System.Collections.Generic.List[fileToCopy]"
$csv = Import-Csv $FileList
foreach($item in $csv)
{
$file = New-Object fileToCopy
$file.SrcFileName = $item.SrcFileName
$file.DestFileName = $item.DestFileName
$filesToCopy.add($file)
}
$sb = [scriptblock] {
param($file)
Copy-item -Path $file.SrcFileName -Destination $file.DestFileName
}
$results = Invoke-Async -Set $filesToCopy -SetParam file -ScriptBlock $sb -Verbose -Measure:$true -ThreadCount 8
$results | Format-Table
Class fileToCopy {
[String]$SrcFileName = ""
[String]$DestFileName = ""
}
the csv input for which looks like this:
SrcFileName,DestFileName
C:\Temp\dummy-data\101438\101438-0154723869.zip,\\backupserver\Project Archives\101438\0154723869.zip
C:\Temp\dummy-data\101438\101438-0165498273.xlsx,\\backupserver\Project Archives\101438\0165498273.xlsx
What am I missing to get this working, because when I run .\CopyFiles.ps1 -FileList C:\Temp\test.csv nothing happens. The files exist in the source path, but the file objects aren't being pulled from the -Set collection. (Unless I have misunderstood how the collection is used?)
No, I can't use robocopy to do this because there are millions of files which resolve to different paths depending upon their original location.
I have no explanation for your symptom based on the code in your question (see bottom section), but I suggest basing your solution on the (now) standard Start-ThreadJob cmdlet (comes with PowerShell Core; in Windows PowerShell, install it with Install-Module ThreadJob -Scope CurrentUser, for instance[1]):
Such a solution is more efficient than use of the third-party Invoke-Async function, which as of this writing is flawed in that it waits for jobs to finish in a tight loop, which creates unnecessary processing overhead.
Start-ThreadJob jobs are a lightweight, thread-based alternative to the process-based Start-Job background jobs, yet they integrate with the standard job-management cmdlets, such as Wait-Job and Receive-Job.
Here's a self-contained example based on your code that demonstrates its use:
Note: Whether you use Start-ThreadJob or Invoke-Async, you won't be able to explicitly reference custom classes such as [fileToCopy] in a script block that runs in separate threads (runspaces; see bottom section), so the solution below simply uses [pscustomobject] instances with the properties of interest, for simplicity and brevity.
# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList
# How many threads at most to run concurrently.
$NumCopyThreads = 8
Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow
# Import the CSV data and transform it to [pscustomobject] instances
# with only .SrcFileName and .DestFileName properties - they take
# the place of your original [fileToCopy] instances.
$jobs = Import-Csv $FileList | Select-Object SrcFileName, DestFileName |
ForEach-Object {
# Start the thread job for the file pair at hand.
Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList $_ {
param($f)
$simulatedRuntimeMs = 2000 # How long each job (thread) should run for.
# Delay output for a random period.
$randomSleepPeriodMs = Get-Random -Minimum 100 -Maximum $simulatedRuntimeMs
Start-Sleep -Milliseconds $randomSleepPeriodMs
# Produce output.
"Copied $($f.SrcFileName) to $($f.DestFileName)"
# Wait for the remainder of the simulated runtime.
Start-Sleep -Milliseconds ($simulatedRuntimeMs - $randomSleepPeriodMs)
}
}
Write-Host "Waiting for $($jobs.Count) jobs to complete..."
# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"
# Clean up the temp. file
Remove-Item $FileList
The above yields something like:
Creating jobs...
Waiting for 10 jobs to complete...
Copied c:\tmp\b to \\server\share\b
Copied c:\tmp\g to \\server\share\g
Copied c:\tmp\d to \\server\share\d
Copied c:\tmp\f to \\server\share\f
Copied c:\tmp\e to \\server\share\e
Copied c:\tmp\h to \\server\share\h
Copied c:\tmp\c to \\server\share\c
Copied c:\tmp\a to \\server\share\a
Copied c:\tmp\j to \\server\share\j
Copied c:\tmp\i to \\server\share\i
Total time lapsed: 00:00:05.1961541
Note that the output received does not reflect the input order, and that the overall runtime is roughly 2 times the per-thread runtime of 2 seconds (plus overhead), because 2 "batches" have to be run due to the input count being 10, whereas only 8 threads were made available.
If you upped the thread count to 10 or more (50 is the default), the overall runtime would drop to 2 seconds plus overhead, because all jobs then run concurrently.
Caveat: The above numbers stem from running in PowerShell Core, version on Microsoft Windows 10 Pro (64-bit; Version 1903), using version 2.0.1 of the ThreadJob module.
Inexplicably, the same code is much slower in Windows PowerShell, v5.1.18362.145.
However, for better performance and lower memory consumption, it is preferable to use batching (chunking) in your case, i.e., to process multiple file pairs per thread.
The following solution demonstrates this approach; tweak $chunkSize to find a batch size that works for you.
# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList
# How many threads at most to run concurrently.
$NumCopyThreads = 8
# How many files to process per thread
$chunkSize = 3
# The script block to run in each thread, which now receives a
# $chunkSize-sized *array* of file pairs.
$jobScriptBlock = {
param([pscustomobject[]] $filePairs)
$simulatedRuntimeMs = 2000 # How long each job (thread) should run for.
# Delay output for a random period.
$randomSleepPeriodMs = Get-Random -Minimum 100 -Maximum $simulatedRuntimeMs
Start-Sleep -Milliseconds $randomSleepPeriodMs
# Produce output for each pair.
foreach ($filePair in $filePairs) {
"Copied $($filePair.SrcFileName) to $($filePair.DestFileName)"
}
# Wait for the remainder of the simulated runtime.
Start-Sleep -Milliseconds ($simulatedRuntimeMs - $randomSleepPeriodMs)
}
Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow
$jobs = & {
# Process the input objects in chunks.
$i = 0
$chunk = [pscustomobject[]]::new($chunkSize)
Import-Csv $FileList | Select-Object SrcFileName, DestFileName | ForEach-Object {
$chunk[$i % $chunkSize] = $_
if (++$i % $chunkSize -ne 0) { return }
# Note the need to wrap $chunk in a single-element helper array (, $chunk)
# to ensure that it is passed *as a whole* to the script block.
Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList (, $chunk) -ScriptBlock $jobScriptBlock
$chunk = [pscustomobject[]]::new($chunkSize) # we must create a new array
}
# Process any remaining objects.
# Note: $chunk -ne $null returns those elements in $chunk, if any, that are non-null
if ($remainingChunk = $chunk -ne $null) {
Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList (, $remainingChunk) -ScriptBlock $jobScriptBlock
}
}
Write-Host "Waiting for $($jobs.Count) jobs to complete..."
# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"
# Clean up the temp. file
Remove-Item $FileList
While the output is effectively the same, note how only 4 jobs were created this time, each of which processed (up to) $chunkSize (3) file pairs.
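To turn the simulation into real work, the sleep calls in the job script block can be swapped for an actual Copy-Item call. The following is only a sketch under stated assumptions: the name $copyScriptBlock, the per-file "FAILED" message format, and the choice to create missing destination directories are all illustrative and not part of the original code. It is invoked directly here (no threads) against throwaway temp files so that it can be tried safely; in the chunked solution it would be passed to Start-ThreadJob via -ScriptBlock.

```powershell
# Hypothetical replacement for the simulated work in $jobScriptBlock:
# copies each file pair and emits one status string per pair.
$copyScriptBlock = {
    param([pscustomobject[]] $filePairs)
    foreach ($filePair in $filePairs) {
        try {
            # Assumption: missing destination directories should be created.
            $destDir = Split-Path -Parent $filePair.DestFileName
            if ($destDir -and -not (Test-Path -LiteralPath $destDir)) {
                $null = New-Item -ItemType Directory -Path $destDir -Force
            }
            # -ErrorAction Stop turns copy failures into catchable errors.
            Copy-Item -LiteralPath $filePair.SrcFileName -Destination $filePair.DestFileName -ErrorAction Stop
            "Copied $($filePair.SrcFileName) to $($filePair.DestFileName)"
        }
        catch {
            "FAILED $($filePair.SrcFileName): $_"
        }
    }
}

# Quick non-threaded check with temp files: one existing source, one missing.
$root    = Join-Path ([IO.Path]::GetTempPath()) "copydemo.$PID"
$srcDir  = Join-Path $root 'src'
$destDir = Join-Path $root 'dest'
$null = New-Item -ItemType Directory -Path $srcDir -Force
Set-Content -LiteralPath (Join-Path $srcDir 'a.txt') -Value 'hello'

$pairs = @(
    [pscustomobject] @{ SrcFileName = (Join-Path $srcDir 'a.txt');       DestFileName = (Join-Path $destDir 'a.txt') }
    [pscustomobject] @{ SrcFileName = (Join-Path $srcDir 'missing.txt'); DestFileName = (Join-Path $destDir 'missing.txt') }
)

$copyResults = @(& $copyScriptBlock -filePairs $pairs)
$destExists  = Test-Path -LiteralPath (Join-Path $destDir 'a.txt')
$copyResults

# Clean up the temp. directory
Remove-Item -LiteralPath $root -Recurse -Force
```

Emitting a string per file rather than letting one failure abort the whole chunk keeps the remaining pairs in that chunk moving; whether that is the right trade-off depends on how failures should be handled in your environment.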
As for what you tried:
The screen shot you show suggests that the problem is that your custom class, [fileToCopy], isn't visible to the script block run by Invoke-Async.
Since Invoke-Async invokes the script block via the PowerShell SDK in separate runspaces that know nothing about the caller's state, it is to be expected that these runspaces don't know your class (this equally applies to Start-ThreadJob).
However, it is unclear why that is a problem in your code, because your script block doesn't make an explicit reference to your class: your script-block parameter $file is not type-constrained (it is implicitly [object]-typed).
Therefore, simply accessing the properties of your custom-class instance inside the script block should work, and indeed does in my tests on Windows PowerShell v5.1.18362.145 on Microsoft Windows 10 Pro (64-bit; Version 1903).
However, if your real script-block code were to explicitly reference custom class [fileToCopy] - such as by defining the parameter as param([fileToCopy] $file) - you would see the symptom.
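As a small illustration of the workaround used in the solutions above, here is a sketch that projects a class instance onto a plain object before handing it to a script block with an untyped parameter. The variable names $plain, $sb and $message are made up for this example, and the script block is called directly with & rather than in a separate runspace, so it only shows the data shape, not the threading.

```powershell
class fileToCopy {
    [string] $SrcFileName = ''
    [string] $DestFileName = ''
}

$file = [fileToCopy]::new()
$file.SrcFileName  = 'C:\tmp\a'
$file.DestFileName = '\\server\share\a'

# Select-Object creates a new [pscustomobject] that carries only the
# named properties, so no reference to [fileToCopy] is needed later.
$plain = $file | Select-Object SrcFileName, DestFileName

# Untyped parameter: works with any object that has the two properties.
$sb = { param($f) "Would copy $($f.SrcFileName) to $($f.DestFileName)" }
$message = & $sb $plain
$message
```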
[1] In Windows PowerShell v3 and v4, which do not come with the PowerShellGet module, Install-Module isn't available by default. However, the module can be installed on demand, as described in Installing PowerShellGet.
I have a simple Powershell script that I wrote in the Powershell ISE. The gist of it is that it watches a named pipe for a write as a signal to perform an action, while at the same time monitoring its boss process. When the boss-process exits, the script exits as well. Simple.
After struggling to get the named pipe working in Powershell without crashing, I managed to get working code, which is shown below. However, while this functions great in the Powershell ISE and interactive terminals, I've been hopeless in getting this to work as a standalone script.
$bosspid = 16320
# Create the named pipe
$pipe = new-object System.IO.Pipes.NamedPipeServerStream(
-join('named-pipe-',$bosspid),
[System.IO.Pipes.PipeDirection]::InOut,
1,
[System.IO.Pipes.PipeTransmissionMode]::Byte,
[System.IO.Pipes.PipeOptions]::Asynchronous
)
# If we don't do it this way, Powershell crashes
# Brazenly stolen from github.com/Tadas/PSNamedPipes
Add-Type @"
using System;
public sealed class CallbackEventBridge
{
public event AsyncCallback CallbackComplete = delegate {};
private void CallbackInternal(IAsyncResult result)
{
CallbackComplete(result);
}
public AsyncCallback Callback
{
get { return new AsyncCallback(CallbackInternal); }
}
}
"@
$cbbridge = New-Object CallBackEventBridge
Register-ObjectEvent -InputObject $cbbridge -EventName CallBackComplete -Action {
param($asyncResult)
$pipe.EndWaitForConnection($asyncResult)
$pipe.Disconnect()
$pipe.BeginWaitForConnection($cbbridge.Callback, 1)
Write-Host 'The named pipe has been written to!'
}
# Make sure to close when boss closes
$bossproc = Get-Process -pid $bosspid -ErrorAction SilentlyContinue
$exitsequence = {
$pipe.Dispose()
[Environment]::Exit(0)
}
if (-Not $bossproc) {$exitsequence.Invoke()}
Register-ObjectEvent $bossproc -EventName Exited -Action {$exitsequence.Invoke()}
# Begin watching for events until boss closes
$pipe.BeginWaitForConnection($cbbridge.Callback, 1)
The first problem is that the script terminates before doing anything meaningful. Delaying the end of execution with tricks like while($true) loops, the -NoExit flag, the pause command, or even commands that seem made for the purpose, like Wait-Event, keeps the process open, but still doesn't make it respond to the events.
I gave up on doing it the "proper" way and have instead reverted to using synchronous code wrapped in while-true blocks and Job control.
$bosspid = (get-process -name notepad).id
# Construct the named pipe's name
$pipename = -join('named-pipe-',$bosspid)
$fullpipename = -join("\\.\pipe\", $pipename) # fix SO highlighting: "
# This will run in a separate thread
$asyncloop = {
param($pipename, $bosspid)
# Create the named pipe
$pipe = new-object System.IO.Pipes.NamedPipeServerStream($pipename)
# The core loop
while($true) {
$pipe.WaitForConnection()
# The specific signal I'm using to let the loop continue is
# echo m > %pipename%
# in CMD. Powershell's echo will *not* work. Anything other than m
# will trigger the exit condition.
if ($pipe.ReadByte() -ne 109) {
break
}
$pipe.Disconnect()
# (The action this loop is supposed to perform on trigger goes here)
}
$pipe.Dispose()
}
# Set up the exit sequence
$bossproc = Get-Process -pid $bosspid -ErrorAction SilentlyContinue
$exitsequence = {
# While PS's echo doesn't work for passing messages, it does
# open and close the pipe which is enough to trigger the exit condition.
&{echo q > $fullpipename} 2> $null
[Environment]::Exit(0)
}
if ((-Not $bossproc) -or $bossproc.HasExited) { $exitsequence.Invoke() }
# Begin watching for events until boss closes
Start-Job -ScriptBlock $asyncloop -Name "miniloop" -ArgumentList $pipename,$bosspid
while($true) {
Start-Sleep 1
if ($bossproc.HasExited) { $exitsequence.Invoke() }
}
This code works just fine now and does the job I need.
I've written a script in Powershell 3.0 to monitor a log file for specific errors. The script starts a background process, which monitors the file. When anything gets written to the file, the background process simply passes it to the foreground process, if it matches the proper format (a datestamped line). The foreground process then counts the number of errors.
Everything works correctly with no errors. The issue is that, as the source logfile grows in size, the memory consumed by Powershell increases dramatically. These logs are capped at ~24M before they are rotated, which amounts to ~250K lines. In my tests, by the time the log size reaches ~80K lines or so, the monitor process is consuming 250M RAM (foreground and background processes combined). They're consuming ~70M combined when they first start. This type of growth is unacceptable in our environment. What can I do to decrease this?
Here's the script:
# Constants.
$F_IN = "C:\Temp\test.log"
$RE = "^\d+-\d+-\d+ \d+:\d+:\d+,.+ERROR.+Foo$"
$MAX_RESTARTS = 3 # Max restarts for failed background job.
$SLEEP_DELAY = 60 # In seconds.
# Background job.
$SCRIPT_BLOCK = { param($f, $r)
Get-Content -Path $f -Tail 0 -Wait -EA SilentlyContinue `
| Where { $_ -match $r }
}
function Start-FileMonitor {
Param([parameter(Mandatory=$true,Position=0)][alias("f")]
[String]$file,
[parameter(Mandatory=$true,Position=1)][alias("b")]
[ScriptBlock]$SCRIPT_BLOCK,
[parameter(Mandatory=$true,Position=2)][alias("re","r")]
[String]$regex)
$j = Start-Job -ScriptBlock $SCRIPT_BLOCK -Arg $file,$regex
return $j
}
function main {
# Tail log file in the background, return any errors.
$job = Start-FileMonitor -b $SCRIPT_BLOCK -f $F_IN -r $RE
$restarts = 0 # Current number of restarts.
# Poll background $job every $SLEEP_DELAY seconds.
While ($true) {
$a = (Receive-Job $job | Measure-Object)
If ($job.JobStateInfo.State -eq "Running") {
$restarts = 0
If ($a.Count -gt 0) {
$t0 = $a.Count
Write-Host "Error Count: ${t0}"
}
}
Else {
If ($restarts -lt $MAX_RESTARTS) {
$job = Start-FileMonitor -b $SCRIPT_BLOCK -f $F_IN -r $RE
$restarts++
Write-Host "Background job not running. Attempted restart ${restarts}."
}
Else {
Write-Host "`$MAX_RESTARTS (${MAX_RESTARTS}) exceeded. Exiting."
Break
}
}
# Sleep for $SLEEP_DELAY.
Start-Sleep -Seconds $SLEEP_DELAY
}
Write-Host "Done."
}
# Execute script.
main
...and here's the sample data:
2015-11-19 00:00:00, WARN Foo
2015-11-19 00:00:00, ERROR Foo
In order to replicate this issue:
Paste the sample data lines into the file C:\Temp\test.log. Save.
Start the monitoring script.
Paste additional sample data lines into the log and save. Wait for the Error Count: line to confirm that everything is working correctly.
Continue to paste additional lines and watch the memory consumption for powershell.exe in Task Manager. Note how much it increases at 400 lines...800 lines...8,000 lines...80,000 lines...
I wasn't sure how to describe this problem in the title so here goes.
I call a function defined in one script from another script. In that function I have a while loop that keeps looping through a set of IPs and looks up their hostnames. When the while loop times out or we have all the hostnames, it returns the hostnames.
My problem is that the return value contains every single Write-Host I'm doing in that function.
I know it's because Write-Host puts stuff on the pipeline and the return just returns whatever it has.
How do I go about fixing this?
The entire script I run gets logged in a log file, which is why I want to have some verbose logging.
| Out-Null on Write-Host fixes the issue, but then the Write-Host values aren't printed by the script.
In main.psm1 I have a function call like so:
$nodes = @("ip1", "ip2", "ip3", "ip4")
$nodesnames = DoStuff -nodes $nodes
Then in functions.psm1 I have functions like:
Function DoStuff
{
param($nodes)
$timeout = 300
$timetaken = 0
$sleepseconds = 5
$nodenames = @("$env:COMPUTERNAME")
while(($nodenames.count -lt $nodes.count) -and ($timetaken -lt $timeout))
{
try
{
Write-Host "Stuff"
foreach($node in $nodes)
{
$nodename = SuperawesomeFunction $node
Write-Host "$nodename"
if($nodenames -notcontains $nodename)
{
$nodenames += @($nodename)
}
}
}
catch
{
Write-Host "DoStuff Failed because $_"
}
Start-Sleep $sleepseconds
$timetaken += $sleepseconds
}
return $nodenames
}
Function SuperawesomeFunction
{
param($node)
$nodename = [System.Net.Dns]::GetHostEntry("$node")
return $nodename
}
Thanks.
The short answer is that your function is working as designed. In PowerShell, a function sends all of its output to the pipeline unless specifically directed otherwise.
You used Echo before, which is an alias of Write-Output, and output is passed down the pipe as I mentioned before. As such it would be collected along with the returned $nodenames array.
Replacing Echo with Write-Host changes everything because Write-Host specifically tells PowerShell to send the information to the host application (usually the PowerShell Console or PowerShell ISE).
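The difference can be seen in a minimal sketch. The two function names below are invented for the illustration, and Write-Output is spelled out instead of its alias Echo:

```powershell
function Get-WithWriteOutput {
    Write-Output 'progress message'   # goes to the pipeline, so it is captured
    return 'node1'
}

function Get-WithWriteHost {
    Write-Host 'progress message'     # goes to the host, so it is not captured
    return 'node1'
}

# @(...) forces an array so that .Count is meaningful even for one element.
$viaOutput = @(Get-WithWriteOutput)
$viaHost   = @(Get-WithWriteHost)

"Write-Output variant returned $($viaOutput.Count) item(s)"
"Write-Host variant returned $($viaHost.Count) item(s)"
```

The first variable ends up holding both the message and the intended return value, which is the symptom described in the question; the second holds only the return value.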
How do you avoid this? You could add a parameter specifying a path for a logfile, and have your function update the logfile directly, and only output the relevant data.
Or you can make an object with a pair of properties that gets passed back down the pipe which has the DNS results in one property, and the errors in another.
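A rough sketch of that second idea follows. The function name Get-NodeNameReport, its -Lookup parameter, and the property names NodeNames and Errors are all assumptions made for this example; the default lookup mirrors the DNS call from the question, and a fake lookup is passed in so the sketch can run without a network.

```powershell
function Get-NodeNameReport {
    param(
        [string[]] $Nodes,
        # Lookup applied to each node; defaults to the DNS call from the question.
        [scriptblock] $Lookup = { param($n) [System.Net.Dns]::GetHostEntry($n).HostName }
    )
    $names  = @()
    $errors = @()
    foreach ($node in $Nodes) {
        try   { $names += & $Lookup $node }
        catch { $errors += "$node failed: $_" }
    }
    # One object out: results in one property, errors in the other.
    [pscustomobject] @{ NodeNames = $names; Errors = $errors }
}

# Stand-in lookup for demonstration: fails for the node named 'bad'.
$fakeLookup = { param($n) if ($n -eq 'bad') { throw 'no such host' }; "host-$n" }

$report = Get-NodeNameReport -Nodes 'ip1', 'bad', 'ip2' -Lookup $fakeLookup
$report.NodeNames
$report.Errors
```

The caller can then log $report.Errors however it likes while using $report.NodeNames as the clean result.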
You could use Write-Error in the function, and set it up as an advanced function to support -errorvariable and capture the errors in a separate variable. To be honest, I'm not sure how to do that, I've never done it, but I'm 90% sure that it can be done.
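For what it's worth, here is a minimal sketch of how that can look. The function name Get-NodeName and the rule that names starting with "bad" fail are invented purely for the demonstration; the relevant parts are the [CmdletBinding()] attribute, which makes the function an advanced function, and the common parameter -ErrorVariable on the call.

```powershell
function Get-NodeName {
    [CmdletBinding()]   # advanced function: enables common parameters such as -ErrorVariable
    param([string[]] $Nodes)
    foreach ($node in $Nodes) {
        if ($node -like 'bad*') {
            # Stand-in for a failed lookup; written to the error stream, not the pipeline.
            Write-Error "Lookup failed for $node"
        }
        else {
            "host-$node"    # success output
        }
    }
}

# Errors are collected in $lookupErrors (note: no $ in the parameter argument)
# and, because of -ErrorAction SilentlyContinue, are not displayed.
$names = Get-NodeName -Nodes 'ip1', 'bad1', 'ip2' -ErrorVariable lookupErrors -ErrorAction SilentlyContinue

$names
$lookupErrors | ForEach-Object { "Logged error: $_" }
```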
I have made a very simple chat program. All it does is save the message as a new line in a txt file and display that txt file in a richtextbox. The issue I am having is that the user has to click "Update" to update the chat log. I wrote a loop that would check every second to see if there was a new message; however, this locks up the form, and if the user wants to send a message during that time, they would need to kill the form.
Is this in any way possible? Either auto-updating the chat log when a new line is added to the txt document, or even updating at a regular interval?
Currently what I am using is this:
$i = 1
While ($i -eq 1)
{
sleep -Seconds 1
$BEFORE = $richtextbox1.Text
$CHATLOG = "\\NetworkShareEveryoneHasAccess\Chat.txt"
$TOOUTPUT = Get-Content $CHATLOG | Out-String
$richtextbox1.Text = $TOOUTPUT
$AFTER = $richtextbox1.Text
if ($BEFORE -ne $AFTER)
{
$i = 0
$richtextbox2.Enabled = $true
$richtextbox2.SelectionStart = $richtextbox2.TextLength;
$richtextbox2.ScrollToCaret()
$richtextbox2.Focus()
}
}
Again, the problem with this is that it freezes the form while it is checking for a new message (a new line in the txt file).
With my limited knowledge of PowerShell, I want to say this is not possible, but as I said my PowerShell knowledge is limited.
Instead of using sleep and checking the whole file for new writes, you could separate the processes for writing to and reading from the file. Then, in the reading process, use a form of Get-Content -Path Chatfile.log -Tail 1 -Wait, which returns a new line every time the chat file is altered. This shows the concept:
$EXITFLAG = 0
$chatLog = "chat.txt"
$logRead = Start-Process powershell.exe -PassThru -ArgumentList "-file logread.ps1"
while ($EXITFLAG -eq 0)
{
$userTxt = Read-Host "Enter some text to chat (Q to exit)"
if ($userTxt -eq "q")
{
$EXITFLAG = 1
}
else
{
$userTxt >> $chatLog
}
}
Stop-Process $logRead.Id