PowerShell Job Memory Consumption Issue

I've been struggling with this for a week now and have exhausted all the methods and options I have found online. I am hoping someone here will be able to help me out with this.
I am using PowerShell to start 8 jobs, each running FFmpeg to stream a 7-minute file to a remote RTMP server. Each job pulls a different file from disk, and the command sits in a do-while loop so that it restreams continuously.
This causes the shell I launched the jobs from to accumulate a massive amount of memory, consuming all it can; in 24 hours it used 30 of my server's 32 GB.
Here is my launch code, any help would be appreciated.
Start-Job -Name v6 -ScriptBlock {
    do {
        $d = $true
        $f = Invoke-Expression -Command "ffmpeg -re -i `"C:\Shares\Matthew\180p_3000k.mp4`" -vcodec copy -acodec copy -f flv -y rtmp://<ip>/<appName>/<streamName>"
        $f = $null
    } while ($d = $true)
}
I've tried receiving the jobs and piping them to Out-Null, I've tried setting $f to $null inside the do-while loop, and several other things I found online, but to no avail. Thanks everyone for your time!

Better late than never, I guess. I've had the same problem with huge memory consumption when running FFmpeg in PowerShell jobs. The core of the issue is that a PowerShell job stores any and all output in memory, and FFmpeg is extremely happy to log to both the standard output and standard error streams.
My solution was to add the parameter -loglevel quiet to ffmpeg. Alternatively, you could redirect both the standard and error streams to null (it is not enough to redirect just the standard stream). For more on redirecting the standard streams, see this question: Redirection of standard and error output appending to the same log-file
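For illustration, here is a minimal sketch of the original job with FFmpeg's logging silenced. It assumes ffmpeg is on the PATH (the Invoke-Expression wrapper is not needed to run a native command), and it reduces the do-while quirk to a plain infinite loop:

Start-Job -Name v6 -ScriptBlock {
    while ($true) {
        # -loglevel quiet stops ffmpeg from writing to stdout/stderr, so the
        # job has nothing to buffer. Alternatively, keep the logging but
        # discard every stream inside the job by appending: *> $null
        ffmpeg -re -i "C:\Shares\Matthew\180p_3000k.mp4" -vcodec copy -acodec copy -f flv -loglevel quiet -y "rtmp://<ip>/<appName>/<streamName>"
    }
}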

There are many ways to invoke a PowerShell script or expression.
One of my favorites is a RunspacePool: each task starts in a new runspace, and when the task completes the runspace is disposed of, which should keep memory consumption fairly constant. It is not the easiest technique to get your head around, but here is a bare-bones example that might work for you.
$Throttle = 10  # Maximum runspaces the pool may use at once
$maxjobs = 8    # Maximum jobs that will run simultaneously
[System.Collections.ArrayList]$jobs = @()
$script:currentjobs = 0
$jobindex = 0
$jobaction = "ffmpeg -re -i `"C:\Shares\Matthew\180p_3000k.mp4`" -vcodec copy -acodec copy -f flv -y rtmp://<ip>/<appName>/<streamName>"

$sessionstate = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$runspace = [runspacefactory]::CreateRunspacePool(1, $Throttle, $sessionstate, $Host)
$runspace.Open()

function QueueJob {
    param($jobindex)
    $job = "" | Select-Object JobID, Script, Errors, PS, Runspace, Handle
    $job.JobID = $jobindex
    $job.Script = $jobaction
    $job.PS = [System.Management.Automation.PowerShell]::Create()
    $job.PS.RunspacePool = $runspace
    [void]$job.PS.AddScript($job.Script)
    $job.Runspace = $job.PS.BeginInvoke()
    $job.Handle = $job.Runspace.AsyncWaitHandle
    Write-Host "----------------- Starting Job: $("{0:D4}" -f $job.JobID) --------------------" -ForegroundColor Blue
    [void]$jobs.Add($job)  # [void] discards the index ArrayList.Add() returns
    $script:currentjobs++
}

function ServiceJobs {
    foreach ($job in $jobs) {
        if ($job.Runspace.IsCompleted) {
            $result = $job.PS.EndInvoke($job.Runspace)
            $job.PS.Dispose()
            $job.Runspace = $null
            $job.PS = $null
            $script:currentjobs--
            Write-Host "----------------- Job Completed: $("{0:D4}" -f $job.JobID) --------------------" -ForegroundColor Yellow
            Write-Host $result
            $jobs.Remove($job)
            return  # stop enumerating: the collection was just modified
        }
    }
}

while ($true) {
    while ($script:currentjobs -lt $maxjobs) {  # -lt, so no more than $maxjobs ever queue
        QueueJob $jobindex
        $jobindex++
    }
    ServiceJobs
    Start-Sleep 1
}

$runspace.Close()
[gc]::Collect()
It uses an infinite loop without proper termination and lacks any error checking, but hopefully it is enough to demonstrate the technique.
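One refinement worth noting (a sketch, not part of the original example): since the while loop above never exits, the Close() call after it is unreachable. Wrapping the loop in try/finally guarantees the pool is released even when the script is interrupted:

try {
    while ($true) {
        while ($script:currentjobs -lt $maxjobs) {
            QueueJob $jobindex
            $jobindex++
        }
        ServiceJobs
        Start-Sleep 1
    }
}
finally {
    # Runs even on Ctrl+C, so the runspace pool is always cleaned up.
    $runspace.Close()
    $runspace.Dispose()
}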

Related

Test-Path timeout for PowerShell

I'm trying to routinely check the presence of particular strings in text files on hundreds of computers on our domain.
foreach ($computer in $computers) {
    $hostname = $computer.DNSHostName
    if (Test-Connection $hostname -Count 2 -Quiet) {
        $FilePath = "\\" + $hostname + "\c$\SomeDirectory\SomeFile.txt"
        if (Test-Path -Path $FilePath) {
            # Check for string
        }
    }
}
For the most part, the pattern of Test-Connection followed by Test-Path is effective and fast. There are certain computers, however, that ping successfully but where Test-Path takes around 60 seconds to resolve to FALSE. I'm not sure why, but it may be a domain trust issue.
For situations like this, I would like to have a timeout for Test-Path that defaults to FALSE if it takes more than 2 seconds.
Unfortunately the solution in a related thread (How can I wrap this Powershell cmdlet into a timeout function?) does not apply to my situation. The proposed do-while loop gets hung up in the code block.
I've been experimenting with Jobs but it appears even this won't force quit the Test-Path command:
Start-Job -ScriptBlock {param($Path) Test-Path $Path} -ArgumentList $Path | Wait-Job -Timeout 2 | Remove-Job -Force
The job continues to hang in the background. Is this the cleanest way I can achieve my requirements above? Is there a better way to timeout Test-Path so the script doesn't hang besides spawning asynchronous activities? Many thanks.
Wrap your code in a [powershell] object and call BeginInvoke() to execute it asynchronously, then use the associated WaitHandle to wait for it to complete only for a set amount of time.
$sleepDuration = Get-Random 2,3
$ps = [powershell]::Create().AddScript("Start-Sleep -Seconds $sleepDuration; 'Done!'")
# execute it asynchronously
$handle = $ps.BeginInvoke()
# Wait 2500 milliseconds for it to finish
if (-not $handle.AsyncWaitHandle.WaitOne(2500)) {
    throw "timed out"
}
# WaitOne() returned $true, let's fetch the result
$result = $ps.EndInvoke($handle)
return $result
In the example above, we randomly sleep for either 2 or 3 seconds, but set a 2 and a half second timeout - try running it a couple of times to see the effect :)
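Applied to the original Test-Path question, the same pattern could be wrapped in a small helper. This is only a sketch under the question's stated requirements (a 2-second default timeout that resolves to FALSE), and the function name Test-PathWithTimeout is made up for illustration:

function Test-PathWithTimeout {
    param(
        [string] $Path,
        [int] $TimeoutMs = 2000  # default to 2 seconds, per the requirement
    )
    $ps = [powershell]::Create().AddCommand('Test-Path').AddParameter('Path', $Path)
    $handle = $ps.BeginInvoke()
    try {
        if (-not $handle.AsyncWaitHandle.WaitOne($TimeoutMs)) {
            return $false  # timed out: treat the path as absent
        }
        return [bool]$ps.EndInvoke($handle)[0]
    }
    finally {
        # Best effort: disposing also stops a still-running pipeline, though
        # a pipeline stuck on a network call may take a moment to let go.
        $ps.Dispose()
    }
}

# Usage:
# if (Test-PathWithTimeout -Path $FilePath) { ... }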

Copy-Item using Invoke-Async in PowerShell

This article shows how to use Invoke-Async in PowerShell: https://sqljana.wordpress.com/2018/03/16/powershell-sql-server-run-in-parallel-collect-sql-results-with-print-output-from-across-your-sql-farm-fast/
I wish to run the Copy-Item cmdlet in parallel in PowerShell, because the alternative is to use FileSystemObject via Excel and copy one file at a time, out of a total of millions of files.
I have cobbled together the following:
<#
.SYNOPSIS
<Brief description>
For examples type:
Get-Help .\<filename>.ps1 -examples
.DESCRIPTION
Copies files from one path to another
.PARAMETER FileList
e.g. C:\path\to\list\of\files\to\copy.txt
.PARAMETER NumCopyThreads
default is 8 (but can be 100 if you want to stress the machine to maximum!)
.EXAMPLE
.\CopyFilesToBackup -filelist C:\path\to\list\of\files\to\copy.txt
.NOTES
#>
[CmdletBinding()]
Param(
    [String] $FileList = "C:\temp\copytest.csv",
    [int] $NumCopyThreads = 8
)

$filesToCopy = New-Object "System.Collections.Generic.List[fileToCopy]"
$csv = Import-Csv $FileList
foreach ($item in $csv) {
    $file = New-Object fileToCopy
    $file.SrcFileName = $item.SrcFileName
    $file.DestFileName = $item.DestFileName
    $filesToCopy.Add($file)
}

$sb = [scriptblock] {
    param($file)
    Copy-Item -Path $file.SrcFileName -Destination $file.DestFileName
}

$results = Invoke-Async -Set $filesToCopy -SetParam file -ScriptBlock $sb -Verbose -Measure:$true -ThreadCount 8
$results | Format-Table

class fileToCopy {
    [String]$SrcFileName = ""
    [String]$DestFileName = ""
}
the csv input for which looks like this:
SrcFileName,DestFileName
C:\Temp\dummy-data\101438\101438-0154723869.zip,\\backupserver\Project Archives\101438\0154723869.zip
C:\Temp\dummy-data\101438\101438-0165498273.xlsx,\\backupserver\Project Archives\101438\0165498273.xlsx
What am I missing to get this working? When I run .\CopyFiles.ps1 -FileList C:\Temp\test.csv, nothing happens. The files exist in the source path, but the file objects aren't being pulled from the -Set collection (unless I have misunderstood how the collection is used?).
No, I can't use robocopy to do this because there are millions of files which resolve to different paths depending upon their original location.
I have no explanation for your symptom based on the code in your question (see bottom section), but I suggest basing your solution on the (now) standard Start-ThreadJob cmdlet (comes with PowerShell Core; in Windows PowerShell, install it with Install-Module ThreadJob -Scope CurrentUser, for instance[1]):
Such a solution is more efficient than the third-party Invoke-Async function, which as of this writing is flawed: it waits for jobs to finish in a tight loop, which creates unnecessary processing overhead.
Start-ThreadJob jobs are a lightweight, thread-based alternative to the process-based Start-Job background jobs, yet they integrate with the standard job-management cmdlets, such as Wait-Job and Receive-Job.
Here's a self-contained example based on your code that demonstrates its use:
Note: Whether you use Start-ThreadJob or Invoke-Async, you won't be able to explicitly reference custom classes such as [fileToCopy] in the script block that runs in separate threads (runspaces; see bottom section), so the solution below simply uses [pscustomobject] instances with the properties of interest, for simplicity and brevity.
# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList
# How many threads at most to run concurrently.
$NumCopyThreads = 8
Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow
# Import the CSV data and transform it to [pscustomobject] instances
# with only .SrcFileName and .DestFileName properties - they take
# the place of your original [fileToCopy] instances.
$jobs = Import-Csv $FileList | Select-Object SrcFileName, DestFileName |
    ForEach-Object {
        # Start the thread job for the file pair at hand.
        Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList $_ {
            param($f)
            $simulatedRuntimeMs = 2000 # How long each job (thread) should run for.
            # Delay output for a random period.
            $randomSleepPeriodMs = Get-Random -Minimum 100 -Maximum $simulatedRuntimeMs
            Start-Sleep -Milliseconds $randomSleepPeriodMs
            # Produce output.
            "Copied $($f.SrcFileName) to $($f.DestFileName)"
            # Wait for the remainder of the simulated runtime.
            Start-Sleep -Milliseconds ($simulatedRuntimeMs - $randomSleepPeriodMs)
        }
    }
Write-Host "Waiting for $($jobs.Count) jobs to complete..."
# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"
# Clean up the temp. file
Remove-Item $FileList
The above yields something like:
Creating jobs...
Waiting for 10 jobs to complete...
Copied c:\tmp\b to \\server\share\b
Copied c:\tmp\g to \\server\share\g
Copied c:\tmp\d to \\server\share\d
Copied c:\tmp\f to \\server\share\f
Copied c:\tmp\e to \\server\share\e
Copied c:\tmp\h to \\server\share\h
Copied c:\tmp\c to \\server\share\c
Copied c:\tmp\a to \\server\share\a
Copied c:\tmp\j to \\server\share\j
Copied c:\tmp\i to \\server\share\i
Total time lapsed: 00:00:05.1961541
Note that the output received does not reflect the input order, and that the overall runtime is roughly 2 times the per-thread runtime of 2 seconds (plus overhead), because 2 "batches" have to be run due to the input count being 10, whereas only 8 threads were made available.
If you upped the thread count to 10 or more (50 is the default), the overall runtime would drop to 2 seconds plus overhead, because all jobs then run concurrently.
Caveat: The above numbers stem from running in PowerShell Core on Microsoft Windows 10 Pro (64-bit; Version 1903), using version 2.0.1 of the ThreadJob module.
Inexplicably, the same code is much slower in Windows PowerShell, v5.1.18362.145.
However, for performance and memory consumption it is better to use batching (chunking) in your case, i.e., to process multiple file pairs per thread.
The following solution demonstrates this approach; tweak $chunkSize to find a batch size that works for you.
# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList
# How many threads at most to run concurrently.
$NumCopyThreads = 8
# How many files to process per thread
$chunkSize = 3
# The script block to run in each thread, which now receives a
# $chunkSize-sized *array* of file pairs.
$jobScriptBlock = {
    param([pscustomobject[]] $filePairs)
    $simulatedRuntimeMs = 2000 # How long each job (thread) should run for.
    # Delay output for a random period.
    $randomSleepPeriodMs = Get-Random -Minimum 100 -Maximum $simulatedRuntimeMs
    Start-Sleep -Milliseconds $randomSleepPeriodMs
    # Produce output for each pair.
    foreach ($filePair in $filePairs) {
        "Copied $($filePair.SrcFileName) to $($filePair.DestFileName)"
    }
    # Wait for the remainder of the simulated runtime.
    Start-Sleep -Milliseconds ($simulatedRuntimeMs - $randomSleepPeriodMs)
}
Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow
$jobs = & {
    # Process the input objects in chunks.
    $i = 0
    $chunk = [pscustomobject[]]::new($chunkSize)
    Import-Csv $FileList | Select-Object SrcFileName, DestFileName | ForEach-Object {
        $chunk[$i % $chunkSize] = $_
        if (++$i % $chunkSize -ne 0) { return }
        # Note the need to wrap $chunk in a single-element helper array (, $chunk)
        # to ensure that it is passed *as a whole* to the script block.
        Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList (, $chunk) -ScriptBlock $jobScriptBlock
        $chunk = [pscustomobject[]]::new($chunkSize) # we must create a new array
    }
    # Process any remaining objects.
    # Note: $chunk -ne $null returns those elements in $chunk, if any, that are non-null.
    if ($remainingChunk = $chunk -ne $null) {
        Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList (, $remainingChunk) -ScriptBlock $jobScriptBlock
    }
}
Write-Host "Waiting for $($jobs.Count) jobs to complete..."
# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"
# Clean up the temp. file
Remove-Item $FileList
While the output is effectively the same, note how only 4 jobs were created this time, each of which processed (up to) $chunkSize (3) file pairs.
As for what you tried:
The screen shot you show suggests that the problem is that your custom class, [fileToCopy], isn't visible to the script block run by Invoke-Async.
Since Invoke-Async invokes the script block via the PowerShell SDK in separate runspaces that know nothing about the caller's state, it is to be expected that these runspaces don't know your class (this equally applies to Start-ThreadJob).
However, it is unclear why that is a problem in your code, because your script block doesn't make an explicit reference to your class: your script-block parameter $file is not type-constrained (it is implicitly [object]-typed).
Therefore, simply accessing the properties of your custom-class instance inside the script block should work, and indeed does in my tests on Windows PowerShell v5.1.18362.145 on Microsoft Windows 10 Pro (64-bit; Version 1903).
However, if your real script-block code were to explicitly reference custom class [fileToCopy], such as by defining the parameter as param([fileToCopy] $file), you would see the symptom.
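To make that failure mode concrete, here is a hypothetical two-line contrast using the question's [fileToCopy] class (a sketch; thread jobs receive the live object, but the script block itself is re-parsed in the job's runspace, where the class name cannot be resolved):

class fileToCopy { [String]$SrcFileName; [String]$DestFileName }

# Fails: the job's runspace cannot resolve the caller-defined [fileToCopy] type.
Start-ThreadJob { param([fileToCopy] $file) $file.SrcFileName } -ArgumentList ([fileToCopy]::new())

# Works: an untyped parameter merely accesses properties by name.
Start-ThreadJob { param($file) $file.SrcFileName } -ArgumentList ([fileToCopy]::new())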
[1] In Windows PowerShell v3 and v4, which do not come with the PowerShellGet module, Install-Module isn't available by default. However, the module can be installed on demand, as described in Installing PowerShellGet.

Memory leak in PowerShell 3.0 script to monitor log file

I've written a script in PowerShell 3.0 to monitor a log file for specific errors. The script starts a background process, which monitors the file. When anything gets written to the file, the background process simply passes it to the foreground process, if it matches the proper format (a datestamped line). The foreground process then counts the number of errors.
Everything works correctly with no errors. The issue is that, as the source log file grows, the memory consumed by PowerShell increases dramatically. These logs are capped at ~24 MB before they are rotated, which amounts to ~250K lines. In my tests, by the time the log reaches ~80K lines or so, the monitor is consuming 250 MB of RAM (foreground and background processes combined); they consume ~70 MB combined when they first start. This type of growth is unacceptable in our environment. What can I do to decrease it?
Here's the script:
# Constants.
$F_IN = "C:\Temp\test.log"
$RE = "^\d+-\d+-\d+ \d+:\d+:\d+,.+ERROR.+Foo$"
$MAX_RESTARTS = 3  # Max restarts for failed background job.
$SLEEP_DELAY = 60  # In seconds.

# Background job.
$SCRIPT_BLOCK = { param($f, $r)
    Get-Content -Path $f -Tail 0 -Wait -EA SilentlyContinue |
        Where { $_ -match $r }
}

function Start-FileMonitor {
    Param([parameter(Mandatory=$true,Position=0)][alias("f")]
          [String]$file,
          [parameter(Mandatory=$true,Position=1)][alias("b")]
          [ScriptBlock]$SCRIPT_BLOCK,
          [parameter(Mandatory=$true,Position=2)][alias("re","r")]
          [String]$regex)
    $j = Start-Job -ScriptBlock $SCRIPT_BLOCK -Arg $file,$regex
    return $j
}

function main {
    # Tail the log file in the background, return any errors.
    $job = Start-FileMonitor -b $SCRIPT_BLOCK -f $F_IN -r $RE
    $restarts = 0  # Current number of restarts.

    # Poll background $job every $SLEEP_DELAY seconds.
    While ($true) {
        $a = (Receive-Job $job | Measure-Object)
        If ($job.JobStateInfo.State -eq "Running") {
            $restarts = 0
            If ($a.Count -gt 0) {
                $t0 = $a.Count
                Write-Host "Error Count: ${t0}"
            }
        }
        Else {
            If ($restarts -lt $MAX_RESTARTS) {
                $job = Start-FileMonitor -b $SCRIPT_BLOCK -f $F_IN -r $RE
                $restarts++
                Write-Host "Background job not running. Attempted restart ${restarts}."
            }
            Else {
                Write-Host "`$MAX_RESTARTS (${MAX_RESTARTS}) exceeded. Exiting."
                Break
            }
        }
        # Sleep for $SLEEP_DELAY.
        Start-Sleep -Seconds $SLEEP_DELAY
    }
    Write-Host "Done."
}

# Execute script.
main
...and here's the sample data:
2015-11-19 00:00:00, WARN Foo
2015-11-19 00:00:00, ERROR Foo
In order to replicate this issue:
1. Paste the sample data lines into the file C:\Temp\test.log. Save.
2. Start the monitoring script.
3. Paste additional sample data lines into the log and save. Wait for the Error Count: line to confirm that everything is working correctly.
4. Continue to paste additional lines and watch the memory consumption for powershell.exe in Task Manager. Note how much it increases at 400 lines... 800 lines... 8,000 lines... 80,000 lines...

Huge memory use from small PowerShell script

I have this script to trap the IPs on a network to compare against historic packet captures, as part of a larger problem-solving exercise.
function qp($comp) {
    $p = ping $comp -n 1 -w 2 -4
    if ($? -eq $true) { $out = $p[1].split("[")[1].split("]")[0] }
    else { $out = $false }
    return $out
}

$comps = Get-Content C:\PacketCapture\comps.txt
do {
    foreach ($comp in $comps) {
        echo "$(qp $comp);$comp" >> "C:\PacketCapture\IP_$(Get-Date -Format HHmm-ddMMyy).txt"
    }
    Start-Sleep 3600
}
until ($null -eq "WANG")
I set this off a fortnight ago; by the middle of last week, the terminal it was running on was grinding to a halt, with the PowerShell process using nearly 2 GB of memory.
I stopped and restarted it, and this morning it was back up to 1.2 GB of RAM.
Whilst not particularly critical, I've modified this to run once and then stop/start itself. I'm interested to know which element is causing the memory leak, and how I would identify it in the future.
You could try invoking garbage collection manually. I think the do/until loop is a good place for it:
do {
    foreach ($comp in $comps) {
        echo "$(qp $comp);$comp" >> "C:\PacketCapture\IP_$(Get-Date -Format HHmm-ddMMyy).txt"
    }
    Start-Sleep 3600
    [GC]::Collect()
}
until ($null -eq "WANG")
Try introducing garbage collection into your code with [System.GC]::Collect().
Source: https://dmitrysotnikov.wordpress.com/2012/02/24/freeing-up-memory-in-powershell-using-garbage-collector
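A common variation on this (a sketch, not from either answer) is to pair the collection with a wait for pending finalizers, which can reclaim a little more memory in a long-running loop:

[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
[System.GC]::Collect()  # collect anything released by the finalizers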

Limited PowerShell start-jobs

I'm curious whether you can answer this or point me in the right direction.
I've written a script that tests/monitors URLs. I'm not posting the code (unless you want me to) because there is no error in it; it works great, and I can even run it as a script block with Start-Job. The issue seems to be that I cannot run more than 3 jobs at a time, or they hang. I'm not sure why. I can run it against a total of 15 URLs throttled to 3 and it's fine. If I run it on 15 URLs with 4 as my limit, they hang, and I can kill them one at a time until only 3 remain, and those will finish. So it seems I can only start a total of 3 PowerShell instances or they hang. Can anyone explain why? All my searches lead to pages that show how to throttle, and that's not really my issue.
Watching the processes, each consumes about 25 MB of memory and sits there idle. If I kill one, the other 3 start using CPU, climb to maybe 30 MB of memory, and terminate completed. The system has 8 GB of memory and a quad-core i5-2400 CPU @ 3.10 GHz. As requested...
Param(
    $file
)

$testscript = {
    Param(
        [string]$url,
        #[ValidateSet('InternetExplorer','Chrome','Firefox','Safari','Opera', IgnoreCase = $true)]
        [string]$browser = "InternetExplorer",
        [string]$teststring = "Solution Center",
        [int]$timeout = 20,
        [int]$retry
    )
    $i = 0
    do {
        $userAgent = [Microsoft.PowerShell.Commands.PSUserAgent]::$browser
        $data = Invoke-WebRequest $url -UserAgent $userAgent -TimeoutSec $timeout
        $data.Content
        $findit = $data.Content.Contains($teststring)
        $i++
        if ($findit) {
            break
        }
    } while ($i -lt $retry)
    if (!$findit) {
        echo "opcmsg a=PSURLCheck o=NHTSA msg_t='$teststring was not found on $url or $url failed to load'"
    }
}

$urls = Import-Csv $file | % {
    Start-Job -ScriptBlock $testscript -ArgumentList $_.url, $_.browser, $_.teststring, $_.retry
}

While (@(Get-Job | Where { $_.State -eq "Running" }).Count -ne 0) {
    Write-Host "Processing URLs..."
    Get-Job
    Start-Sleep -Seconds 5
}

$Data = ForEach ($Job in (Get-Job)) {
    Receive-Job $Job
    Remove-Job $Job
}
$Data | select *
So I've used New-Object System.Net.WebClient and I've even tried doing this with [System.Collections.Queue], but all three methods use jobs, so it appears I cannot run more than three started jobs at any one time.
Are you sure your code is fine? If you're calling separate PowerShell sessions multiple times, memory can be consumed very quickly. Check Process Monitor for high CPU or memory usage and make sure your script blocks are terminating. Or post the code.
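In that spirit, here is a minimal sketch (reusing the question's own $testscript and CSV columns) of keeping the job count and memory bounded by draining each job as soon as it finishes, rather than letting completed jobs and their buffered output pile up. The $throttle value of 3 simply mirrors the limit the asker observed working:

$throttle = 3
Import-Csv $file | ForEach-Object {
    $row = $_
    # Block while the concurrent-job ceiling is reached.
    while (@(Get-Job -State Running).Count -ge $throttle) {
        # Drain finished jobs immediately so their output buffers are freed.
        Get-Job -State Completed | ForEach-Object { Receive-Job $_; Remove-Job $_ }
        Start-Sleep -Milliseconds 250
    }
    Start-Job -ScriptBlock $testscript -ArgumentList $row.url, $row.browser, $row.teststring, $row.retry
}
# Drain whatever is still running at the end.
Get-Job | Wait-Job | ForEach-Object { Receive-Job $_; Remove-Job $_ }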