How to wait until each file is created within PowerShell?

I am using the following script to call an API that creates a bunch of log files. It loops through a text file, splitting each line into its first and second comma-separated items (much the same way tokens are defined in CMD). This works fine:
$lines = @(Get-Content -Path "$path") -Replace " "
ForEach ($line in $lines) {
    $s = $line -split ","
    $var1s = $s[0]
    $var2s = $s[1]
    Start-Process -FilePath $APIpath -ArgumentList "$argument1", "$argument2", "$argument3", "$path\folder\$var2s.log"
}
However, this does not wait until the files are created before executing the next statement, which causes my script to fail. Is there a way to wait until each $var2s log file is created?

Start-Process is asynchronous, in the sense that it instructs the operating system to create and start a new process, and then immediately returns - a classic "fire-and-forget" mechanism.
As a result, multiple processes may end up running simultaneously, which can cause issues with access to shared file resources.
To prevent such a race condition, use the -Wait switch parameter with Start-Process to force it to wait until the resulting process has exited before returning:
$lines = @(Get-Content -Path "$path") -Replace " "
ForEach ($line in $lines) {
    $s = $line -split ","
    $var1s = $s[0]
    $var2s = $s[1]
    Start-Process -FilePath $APIpath -ArgumentList "$argument1", "$argument2", "$argument3", "$path\folder\$var2s.log" -Wait
}
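Note that -Wait waits for the process to exit, which isn't quite the same as the log file existing on disk; if the API hands file creation off to a child process, you could additionally poll for the file. A minimal sketch (the 30-second timeout and 200-millisecond interval are illustrative assumptions):
$logPath = "$path\folder\$var2s.log"
$deadline = (Get-Date).AddSeconds(30)  # illustrative timeout
while (-not (Test-Path -LiteralPath $logPath) -and (Get-Date) -lt $deadline) {
    Start-Sleep -Milliseconds 200      # illustrative poll interval
}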

Related

Add the result of a Powershell Start-Process to a file instead of replacing it with -RedirectStandardOutput

I use the following command in Powershell to convert files in the background but would like to log the results all in one file. Now the -RedirectStandardOutput replaces the file each run.
foreach ($l in gc ./files.txt) {Start-Process -FilePath "c:\Program Files (x86)\calibre2\ebook-convert.exe" -Argumentlist "'$l' '$l.epub'" -Wait -WindowStyle Hidden -RedirectStandardOutput log.txt}
I tried with a redirect but then the log is empty.
If possible I would like to keep it a one-liner.
foreach ($l in gc ./files.txt) {Start-Process -FilePath "c:\Program Files (x86)\calibre2\ebook-convert.exe" -Argumentlist "`"$l`" `"$l.epub`"" -Wait -WindowStyle Hidden *> log.txt}
If sequential, synchronous execution is acceptable, you can simplify your command to use a single output redirection (the assumption is that ebook-convert.exe is a console-subsystem application, which PowerShell therefore executes synchronously, in a blocking manner):
Get-Content ./files.txt | ForEach-Object {
    & 'c:\Program Files (x86)\calibre2\ebook-convert.exe' $_ "$_.epub"
} *> log.txt
Placing * before > tells PowerShell to redirect all output streams, which in the case of external programs means both stdout and stderr.
If you want to control the character encoding, use Out-File - which > is effectively an alias for - with its -Encoding parameter; or, preferably with text output - which external-program output always is in PowerShell - Set-Content. To also capture stderr output, append *>&1 to the command in the pipeline segment before the Out-File / Set-Content call.
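For example, a sketch of the Set-Content variant (the UTF-8 encoding is an illustrative assumption, not a recommendation specific to ebook-convert.exe):
Get-Content ./files.txt | ForEach-Object {
    & 'c:\Program Files (x86)\calibre2\ebook-convert.exe' $_ "$_.epub"
} *>&1 | Set-Content -Encoding utf8 log.txt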
Note that PowerShell never passes raw output from external programs through to files - they are first always decoded into .NET strings, based on the encoding stored in [Console]::OutputEncoding (the system's active legacy OEM code page by default), and then re-encoded on saving to a file, using the file-writing cmdlet's own defaults, unless overridden with -Encoding - see this answer for more information.
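For instance, if the external program is known to emit UTF-8, you could match [Console]::OutputEncoding before invoking it; a sketch, where $inputFile is a hypothetical stand-in for one of the files being converted:
$prev = [Console]::OutputEncoding
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8   # assume UTF-8 output
try {
    & 'c:\Program Files (x86)\calibre2\ebook-convert.exe' $inputFile "$inputFile.epub" *> log.txt
} finally {
    [Console]::OutputEncoding = $prev   # restore the original decoding behavior
}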
If you want asynchronous, parallel execution (such as via Start-Process, which is asynchronous by default), your best bet is to:
- Write to separate (temporary) files: pass a different output file to -RedirectStandardOutput / -RedirectStandardError in each invocation. Note that if you want to merge stdout and stderr output and capture it in the same file, you'll have to call your .exe file via a shell (possibly another PowerShell instance) and use its redirection features; for PowerShell, that would be *> log.txt; for cmd.exe (as shown below), it would be > log.txt 2>&1.
- Wait for all launched processes to finish: pass -PassThru to Start-Process and collect the process-information objects returned, then use Wait-Process to wait for all processes to terminate; use the -Timeout parameter as needed.
- Then merge the temporary files into a single log file.
Here's an implementation:
$procsAndLogFiles =
  Get-Content ./files.txt | ForEach-Object -Begin { $i = 0 } {
    # Create a distinct log file for each process,
    # and return its name along with a process-information object representing
    # each process as a custom object.
    $logFile = 'log{0:000}.txt' -f ++$i
    [pscustomobject] @{
      LogFile = $logFile
      Process = Start-Process -PassThru -WindowStyle Hidden `
        -FilePath 'cmd.exe' `
        -ArgumentList "/c `"`"c:\Program Files (x86)\calibre2\ebook-convert.exe`" `"$_`" `"$_.epub`" >`"$logFile`" 2>&1`""
    }
  }
# Wait for all processes to terminate.
# Add -Timeout and error handling as needed.
$procsAndLogFiles.Process | Wait-Process
# Merge all log files.
Get-Content -LiteralPath $procsAndLogFiles.LogFile > log.txt
# Clean up.
Remove-Item -LiteralPath $procsAndLogFiles.LogFile
If you want throttled parallel execution, so as to limit how many background processes can run at a time:
# Limit how many background processes may run in parallel at most.
$maxParallelProcesses = 10
# Initialize the log file.
# Use -Force to unconditionally replace an existing file.
New-Item -Force log.txt
# Initialize the list in which those input files whose conversion
# failed due to timing out are recorded.
$allTimedOutFiles = [System.Collections.Generic.List[string]]::new()
# Process the input files in batches of $maxParallelProcesses
Get-Content -ReadCount $maxParallelProcesses ./files.txt |
  ForEach-Object {
    $i = 0
    $launchInfos = foreach ($file in $_) {
      # Create a distinct log file for each process,
      # and return its name along with the input file name / path, and
      # a process-information object representing each process, as a custom object.
      $logFile = 'log{0:000}.txt' -f ++$i
      [pscustomobject] @{
        InputFile = $file
        LogFile = $logFile
        Process = Start-Process -PassThru -WindowStyle Hidden `
          -FilePath 'cmd.exe' `
          -ArgumentList "/c `"`"c:\Program Files (x86)\calibre2\ebook-convert.exe`" `"$file`" `"$file.epub`" >`"$logFile`" 2>&1`""
      }
    }
    # Wait for the processes to terminate, with a timeout.
    $launchInfos.Process | Wait-Process -Timeout 30 -ErrorAction SilentlyContinue -ErrorVariable errs
    # If not all processes terminated within the timeout period,
    # forcefully terminate those that didn't.
    if ($errs) {
      $timedOut = $launchInfos | Where-Object { -not $_.Process.HasExited }
      Write-Warning "Conversion of the following input files timed out; the processes will be killed:`n$($timedOut.InputFile)"
      $timedOut.Process | Stop-Process -Force
      $allTimedOutFiles.AddRange(@($timedOut.InputFile))
    }
    # Merge all temp. log files and append to the overall log file.
    $tempLogFiles = Get-Item -ErrorAction Ignore -LiteralPath ($launchInfos.LogFile | Sort-Object)
    $tempLogFiles | Get-Content >> log.txt
    # Clean up.
    $tempLogFiles | Remove-Item
  }
# * log.txt now contains all combined logs
# * $allTimedOutFiles now contains all input file names / paths
# whose conversion was aborted due to timing out.
Note that the above throttling technique isn't optimal, because each batch of inputs is waited for together, at which point the next batch is started. A better approach is to launch a new process as soon as one of the available parallel "slots" frees up, as shown in the next section; note, however, that PowerShell (Core) 7+ is required.
PowerShell (Core) 7+: Efficiently throttled parallel execution, using ForEach-Object -Parallel:
PowerShell (Core) 7+ introduced thread-based parallelism to the ForEach-Object cmdlet, via the -Parallel parameter, which has built-in throttling that defaults to a maximum of 5 threads, but can be controlled explicitly via the -ThrottleLimit parameter.
This enables efficient throttling, as a new thread is started as soon as an available slot opens up.
The following is a self-contained example that demonstrates the technique; it works on both Windows and Unix-like platforms:
Inputs are 9 integers, and the conversion process is simulated simply by sleeping a random number of seconds between 1 and 9, followed by echoing the input number.
A timeout of 6 seconds is applied to each child process, meaning that a random number of child processes will time out and be killed.
#requires -Version 7
# Use ForEach-Object -Parallel to launch child processes in parallel,
# limiting the number of parallel threads (from which the child processes are
# launched) via -ThrottleLimit.
# -AsJob returns a single job whose child jobs track the threads created.
$job =
  1..9 | ForEach-Object -ThrottleLimit 3 -AsJob -Parallel {
    # Determine a temporary, thread-specific log file name.
    $logFile = 'log_{0:000}.txt' -f $_
    # Pick a random sleep time that may or may not be smaller than the timeout period.
    $sleepTime = Get-Random -Minimum 1 -Maximum 9
    # Launch the external program asynchronously and save information about
    # the newly launched child process.
    if ($env:OS -eq 'Windows_NT') {
      $ps = Start-Process -PassThru -WindowStyle Hidden cmd.exe "/c `"timeout $sleepTime >NUL & echo $_ >$logFile 2>&1`""
    }
    else { # macOS, Linux
      $ps = Start-Process -PassThru sh "-c `"{ sleep $sleepTime; echo $_; } >$logFile 2>&1`""
    }
    # Wait for the child process to exit within a given timeout period.
    $ps | Wait-Process -Timeout 6 -ErrorAction SilentlyContinue
    # Check if a timeout has occurred (implied by the process not having exited yet).
    $timedOut = -not $ps.HasExited
    if ($timedOut) {
      # Note: Only [Console]::WriteLine produces immediate output, directly to the display.
      [Console]::WriteLine("Warning: Conversion timed out for: $_")
      # Kill the timed-out process.
      $ps | Stop-Process -Force
    }
    # Construct and output a custom object that indicates the input at hand,
    # the associated log file, and whether a timeout occurred.
    [pscustomobject] @{
      InputFile = $_
      LogFile = $logFile
      TimedOut = $timedOut
    }
  }
# Wait for all child processes to exit or be killed
$processInfos = $job | Receive-Job -Wait -AutoRemoveJob
# Merge all temporary log files into an overall log file.
$tempLogFiles = Get-Item -ErrorAction Ignore -LiteralPath ($processInfos.LogFile | Sort-Object)
$tempLogFiles | Get-Content > log.txt
# Clean up the temporary log files.
$tempLogFiles | Remove-Item
# To illustrate the results, show the overall log file's content
# and which inputs caused timeouts.
[pscustomobject] @{
    CombinedLogContent = Get-Content -Raw log.txt
    InputsThatFailed = ($processInfos | Where-Object TimedOut).InputFile
} | Format-List
# Clean up the overall log file.
Remove-Item log.txt
You can use redirection and append to files if you don't use Start-Process, but a direct invocation:
foreach ($l in gc ./files.txt) {& 'C:\Program Files (x86)\calibre2\ebook-convert.exe' "$l" "$l.epub" *>> log.txt}
For the moment I'm using an adaptation of mklement0's answer.
ebook-convert.exe often hangs, so I need to close it down if the process takes longer than the designated time.
This needs to run asynchronously because of the number of files and the processor time taken (5 to 25% depending on the conversion).
The timeout needs to be per file, not on the whole of the jobs.
$procsAndLogFiles =
  Get-Content ./files.txt | ForEach-Object -Begin { $i = 0 } {
    # Create a distinct log file for each process,
    # and return its name along with a process-information object representing
    # each process as a custom object.
    $logFile = 'd:\temp\log{0:000}.txt' -f ++$i
    Write-Host "$(Get-Date) $_"
    [pscustomobject] @{
      LogFile = $logFile
      Process = Start-Process `
        -PassThru `
        -FilePath "c:\Program Files (x86)\calibre2\ebook-convert.exe" `
        -ArgumentList "`"$_`" `"$_.epub`"" `
        -WindowStyle Hidden `
        -RedirectStandardOutput $logFile `
        | Wait-Process -Timeout 30
    }
  }
# Wait for all processes to terminate.
# Add -Timeout and error handling as needed.
$procsAndLogFiles.Process
# Merge all log files.
Get-Content -LiteralPath $procsAndLogFiles.LogFile > log.txt
# Clean up.
Remove-Item -LiteralPath $procsAndLogFiles.LogFile
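Note that piping Start-Process -PassThru directly into Wait-Process means the Process property above ends up empty (Wait-Process emits nothing), so the later $procsAndLogFiles.Process line has no effect. A sketch that keeps the process object while still enforcing a per-file timeout (untested against ebook-convert.exe):
$p = Start-Process -PassThru -WindowStyle Hidden `
    -FilePath 'c:\Program Files (x86)\calibre2\ebook-convert.exe' `
    -ArgumentList "`"$_`" `"$_.epub`"" `
    -RedirectStandardOutput $logFile
$p | Wait-Process -Timeout 30 -ErrorAction SilentlyContinue
if (-not $p.HasExited) { $p | Stop-Process -Force }   # kill on timeout
[pscustomobject] @{ LogFile = $logFile; Process = $p }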
Since the problem in my other answer was not completely solved (not killing all the processes that take longer than the timeout limit), I rewrote it in Ruby.
It's not PowerShell, but if you land on this question and also know Ruby (or not), it could help you.
I believe it's the use of threads that solves the killing issue.
require 'logger'

LOG = Logger.new("log.txt")
PROGRAM = 'c:\Program Files (x86)\calibre2\ebook-convert.exe'
LIST = 'E:\ebooks\english\_convert\mobi\files.txt'
TIMEOUT = 30
MAXTHREADS = 6

def run file, log: nil
  output = ""
  command = %Q{"#{PROGRAM}" "#{file}" "#{file}.epub" 2>&1}
  IO.popen(command) do |io|
    begin
      while (line = io.gets) do
        output += line
        log.info line.chomp if log
      end
    rescue => ex
      log.error ex.message
      system("taskkill /f /pid #{io.pid}") rescue log.error $@
    end
  end
  if File.exist? "#{file}.epub"
    puts "converted #{file}.epub"
    File.delete(file)
  else
    puts "error #{file}"
  end
  output
end
threads = []
File.readlines(LIST).each do |file|
  file.chomp! # remove line feed
  # some checks
  if !File.exist? file
    puts "not found #{file}"
    next
  end
  if File.exist? "#{file}.epub"
    puts "skipping #{file}"
    File.delete(file) if File.exist? file
    next
  end
  # go on with the conversion
  thread = Thread.new { run(file, log: LOG) }
  threads << thread
  next if threads.length < MAXTHREADS
  threads.each do |t|
    t.join(TIMEOUT)
    unless t.alive?
      t.kill
      threads.delete(t)
    end
  end
end

PowerShell: What is the best method for parallel execution of commands with logging

I have a vendor-provided application that needs to be run once with each of several locally stored configuration files. I have tried two different methods to run these commands in parallel; one is slower but seems to work fine, the second seems faster but I get no logs, so I can't confirm that the application is actually being executed.
This first method is definitely running in parallel. It is much faster than the old procedural version, so that is great:
$configList = $(Get-ChildItem E:\Path\*config.ps1 -recurse).FullName
$configList | ForEach-Object -Parallel {
    # Get the config and logfile path for each
    . $_
    Invoke-Expression "& E:\dir\app.exe $config" | Out-File -Append -FilePath $logfile
}
From my reading I understand that using a RunspacePool can be even faster, and this does execute faster, but it is producing no entries in my logs so I can't ensure it is running properly. Any help is appreciated.
$RunspacePool = [runspacefactory]::CreateRunspacePool(1, 5)
$RunspacePool.Open()
$Jobs = @()
$configList = $(Get-ChildItem E:\Path\*config.ps1 -recurse).FullName
$configList | ForEach-Object {
    # Get the config and logfile path for each
    . $_
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    $PowerShell.AddScript({Invoke-Expression "& E:\dir\app.exe $config" | Out-File -Append -FilePath $logfile})
    $Jobs += $PowerShell.BeginInvoke()
}
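A likely cause of the empty logs is that dot-sourcing `. $_` defines $config and $logfile in the calling session, not in the runspace that later executes the script block, so the script block sees them as undefined. A sketch that passes the values in explicitly via AddParameters (hedged; variable names follow the question, and the wait/cleanup logic is an assumption):
$RunspacePool = [runspacefactory]::CreateRunspacePool(1, 5)
$RunspacePool.Open()
$configList = $(Get-ChildItem E:\Path\*config.ps1 -Recurse).FullName
$Jobs = foreach ($cfg in $configList) {
    . $cfg   # defines $config and $logfile in this session
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    # Pass the values into the runspace explicitly; they are not inherited.
    $null = $PowerShell.AddScript({
        param($config, $logfile)
        & 'E:\dir\app.exe' $config *>> $logfile
    }).AddParameters(@{ config = $config; logfile = $logfile })
    [pscustomobject] @{ PowerShell = $PowerShell; Handle = $PowerShell.BeginInvoke() }
}
# Wait for all invocations to finish, then dispose.
foreach ($j in $Jobs) {
    $null = $j.PowerShell.EndInvoke($j.Handle)
    $j.PowerShell.Dispose()
}
$RunspacePool.Close()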

Word com object failing

SCRIPT PURPOSE
The idea behind the script is to recursively extract the text from a large amount of documents and update a field in an Azure SQL database with the extracted text. Basically we are moving away from Windows Search of document contents to an SQL full text search to improve the speed.
ISSUE
When the script encounters an issue opening the file such as it being password protected, it fails for every single document that follows. Here is the section of the script that processes the files:
foreach ($list in (Get-ChildItem (Join-Path $PSScriptRoot "\FileLists\*") -Include *.txt)) {
    ## Word object
    $word = New-Object -ComObject word.application
    $word.Visible = $false
    $saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatText")
    $word.DisplayAlerts = 0
    Write-Output ""
    Write-Output "################# Parsing $list"
    Write-Output ""
    $query = "INSERT INTO tmp_CachedText (tCachedText, tOID)
    VALUES "
    foreach ($file in (Get-Content $list)) {
        if ($file -like "*-*" -and $file -notlike "*~*") {
            Write-Output "Processing: $($file)"
            Try {
                $doc = $word.Documents.OpenNoRepairDialog($file, $false, $false, $false, "ttt")
                if ($doc) {
                    $fileName = [io.path]::GetFileNameWithoutExtension($file)
                    $fileName = $filename + ".txt"
                    $doc.SaveAs("$env:TEMP\$fileName", [ref]$saveFormat)
                    $doc.Close()
                    $4ID = $fileName.split('-')[-1].replace(' ', '').replace(".txt", "")
                    $text = Get-Content -Raw "$env:TEMP\$fileName"
                    $text = $text.replace("'", "''")
                    $query += "
                    ('$text', $4ID),"
                    Remove-Item -Force "$env:TEMP\$fileName"
                    <# Upload to azure #>
                    $query = $query.Substring(0, $query.Length - 1)
                    $query += ";"
                    Invoke-Sqlcmd @params -Query $query -ErrorAction "SilentlyContinue"
                    $query = "INSERT INTO tmp_CachedText (tCachedText, tOID)
                    VALUES "
                }
            }
            Catch {
                Write-Host "$($file) failed to process" -ForegroundColor RED;
                continue
            }
        }
    }
    Remove-Item -Force $list.FullName
    Write-Output ""
    Write-Output "Uploading to azure"
    Write-Output ""
    <# Upload to azure #>
    Invoke-Sqlcmd @params -Query $setQuery -ErrorAction "SilentlyContinue"
    $word.Quit()
    TASKKILL /f /IM WINWORD.EXE
}
Basically it parses through a folder of .txt files that contain x amount of document paths, creates a T-SQL update statement and runs against an Azure SQL database after each file is fully parsed. The files are generated with the following:
if (!($continue)) {
    if ($pdf) {
        $files = (Get-ChildItem -Force -Recurse $documentFolder -Include *.pdf).FullName
    }
    else {
        $files = (Get-ChildItem -Force -Recurse $documentFolder -Include *.doc, *.docx).FullName
    }
    $files | Out-File (Join-Path $PSScriptRoot "\documents.txt")
    $i = 0; Get-Content $documentFile -ReadCount $interval | %{ $i++; $_ | Out-File (Join-Path $PSScriptRoot "\FileLists\documents_$i.txt") }
}
The $interval variable defines how many files are to be extracted for each given upload to Azure. Initially I had the Word object being created outside the loop and never closed until the end. Unfortunately this doesn't seem to work: every time the script hits a file it cannot open, every file that follows fails as well, until it reaches the end of the inner foreach loop (foreach ($file in (Get-Content $list))).
This means that to get the expected outcome I have to run this with an interval of 1, which takes far too long.
This is a shot in the dark.
But to me it sounds like the reason it's failing is that the Word COM object is now prompting you for some action since it cannot open the file, so all following items in the loop also fail. This might explain why it works if you set $interval to 1: when it's 1, the COM object is closed and reopened every time, and that takes forever (I ran into the same thing with Excel).
What you can do is, in your catch statement, close and open a new Word COM object, which should let you continue on with the loop (but it will be a bit slower if it needs to reopen the COM object a lot).
If you want to debug the problem even more, set the COM object to be visible, and slowly step through your program without interacting with Word. This will show you what is happening with Word and whether there are any prompts that are causing the application to hang.
Of course, if you want to run it at full speed, you will need to detect beforehand which documents you can't open, or you could multithread it by opening several Word COM objects, which would allow you to load several documents at a time.
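In code, the recovery described above might look like this; a sketch that modifies the question's Catch block (not the poster's exact fix):
Catch {
    Write-Host "$($file) failed to process" -ForegroundColor RED
    # Recreate the Word COM object so that subsequent files still process.
    try { $word.Quit() } catch { }
    $word = New-Object -ComObject word.application
    $word.Visible = $false
    $word.DisplayAlerts = 0
    continue
}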
As for...
ISSUE
When the script encounters an issue opening the file such as it being password protected, it fails for every single document that follows.
... then test for this as noted here...
How to check if a word file has a password?
$filename = "C:\path\to\your.doc"
$wd = New-Object -COM "Word.Application"
try {
    $doc = $wd.Documents.Open($filename, $null, $null, $null, "")
} catch {
    Write-Host "$filename is password-protected!"
}
... and skip the file to avoid the failure of the remaining files.
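Folded into the question's inner loop, that check might look like this (a sketch):
foreach ($file in (Get-Content $list)) {
    try {
        $doc = $wd.Documents.Open($file, $null, $null, $null, "")
    } catch {
        Write-Host "$file is password-protected - skipping."
        continue   # skip this file so the remaining files still process
    }
    # ... process and close $doc as before ...
}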

Powershell read file live and output to speech

What I am looking for is to take PowerShell and read the file content out to the speech synthesis module.
File name for this example will be read.txt.
Start of the Speech module:
Add-Type -AssemblyName System.speech
$Narrator1 = New-Object System.Speech.Synthesis.SpeechSynthesizer
$Narrator1.SelectVoice('Microsoft Zira Desktop')
$Narrator1.Rate = 2
$Location = "$env:userprofile\Desktop\read.txt"
$Contents = Get-Content $Location
Get-Content $Location -wait -Tail 2 | where {$Narrator1.Speak($Contents)}
This works once. I'd like to use Clear-Content to wipe read.txt after each initial read, and have PowerShell wait until a new line is added to read.txt, then process it again and speak the content. I believe I can also make it run in the background with -WindowStyle Hidden.
Thank you in advance for any assistance.
Scott
I don't think a loop is the answer, I would use the FileSystemWatcher to detect when the file has changed. Try this:
$fsw = New-Object System.IO.FileSystemWatcher
$fsw.Path = "$env:userprofile\Desktop"
$fsw.Filter = 'read.txt'
Register-ObjectEvent -InputObject $fsw -EventName Changed -Action {
    Add-Type -AssemblyName System.speech
    $Narrator1 = New-Object System.Speech.Synthesis.SpeechSynthesizer
    $Narrator1.SelectVoice('Microsoft Zira Desktop')
    $Narrator1.Rate = 2
    $file = $Event.SourceEventArgs.FullPath
    $Contents = Get-Content $file
    $Narrator1.Speak($Contents)
}
Your only problem was that you accidentally used the previously assigned $Contents variable in the where (Where-Object) script block rather than $_, the automatic variable representing the current pipeline object:
Get-Content $Location -Wait -Tail 2 | Where-Object { $Narrator1.Speak($_) }
Get-Content $Location -Wait will poll the input file ($Location here) every second to check for new content and pass it through the pipeline (the -Tail argument only applies to the initial reading of the file; as new lines are added, they are all passed through).
The pipeline will stay alive indefinitely - until you delete the $Location file or abort processing.
Since the command is blocking, you obviously need another session / process to add content to file $Location, such as another PowerShell window or a text editor that has the file open and modifies its content.
You can keep appending to the file with >>, but that will keep growing it.
To discard the file's previous content, you must indeed use Clear-Content, as you say, which truncates the existing file without recreating it, and therefore keeps the pipeline alive; e.g.:
Clear-Content $Location
'another line to speak' > $Location
Caveat: Special characters such as ! and ? seem to cause a silent failure to speak. If anyone knows why, do tell us. The docs offer no immediate clues.
As for background operation:
With a background job, curiously, the Clear-Content / > combination appears not to work; if anybody knows why, please tell us.
However, using >> - which grows the file - does work.
The following snippet demonstrates the use of a background job to keep speaking input as it is being added to a specified file (with some delay), until a special end-of-input string is sent:
# Determine the input file (on the user's desktop).
$file = Join-Path ([environment]::GetFolderPath('Desktop')) 'read.txt'
# Initialize the input file.
$null > $file
# Define a special string that acts as the end-of-input marker.
$eofMarker = '[quit]'
# Start the background job (PSv3+ syntax).
$job = Start-Job {
    Add-Type -AssemblyName System.speech
    $Narrator1 = New-Object System.Speech.Synthesis.SpeechSynthesizer
    $Narrator1.SelectVoice('Microsoft Zira Desktop')
    $Narrator1.Rate = 2
    while ($true) { # A dummy loop we can break out of on receiving the end-of-input marker.
        Get-Content $using:file -Wait | Where-Object {
            if ($_ -eq $using:eofMarker) { break } # End-of-input marker received -> exit the pipeline.
            $Narrator1.Speak($_)
        }
    }
    # Remove the input file.
    Remove-Item -ErrorAction Ignore -LiteralPath $using:file
}
# Speak 1, 2, ..., 10.
1..10 | ForEach-Object {
    Write-Verbose -Verbose $_
    # !! Inexplicably, using Clear-Content followed by > to keep
    # !! replacing the file content does *not* work with a background task.
    # !! >> - which *appends* to the file - does work, however.
    $_ >> $file
}
# Send the end-of-input marker to make the background job stop reading.
$eofMarker >> $file
# Wait for background processing to finish.
# Note: We'll get here long before the background job has finished speaking.
Write-Verbose -Verbose 'Waiting for processing to finish to cleanup...'
$null = Receive-Job $job -Wait -AutoRemoveJob

Informatica pre session error: process waiting infinitely for child process to finish

I have a PowerShell script, shown below, that converts a delimiter string in a file to new lines:
$path = Join-Path $args[0] $args[1]
$word = "#####"
$replacement = "`r`n"
$text = get-content $path
$newText = $text -replace $word,$replacement
$newText > $path
$c=get-content $path
Set-Content -Encoding ASCII $c -Path $path
and calling it from a .bat file
This works fine when called manually, but goes into an infinite loop when called from an Informatica pre-session task, with the error message:
waiting n seconds for child process of the shell command to exit.
What could possibly have gone wrong with the code?