I've encountered this issue in a longer script and have simplified here to show the minimal code required to reproduce it (I think). It outputs numbers followed by letters:
1 a
1 b
1 c...
2 a
2 b
2 c...
all the way to "500 z"
Function Write-HelloWorld
{
    Param($number)
    write-host -Object $number
}
$numbers = 1..500
$letters = "a".."z"
$Function = get-command Write-HelloWorld
$numbers | ForEach-Object -Parallel {
    ${function:Write-HelloWorld} = $using:Function
    foreach($letter in $using:letters) {
        Write-HelloWorld -number "$_ $letter"
    }
}
I'm seeing 2 types of errors sporadically (not every time I run it):
"The term 'write-host' is not recognized as a name of a cmdlet, function, script file, or executable program." As I understand it, write-host should always be available. Adding the line "Import-Module Microsoft.PowerShell.Utility" just before the call to write-host didn't help.
Odd output like the below, specifically all the "write-host :" lines.
Santiago Squarzon's helpful answer demonstrates the problem with your approach well and links to a GitHub issue explaining the underlying problem (runspace affinity); however, that demonstration isn't the right solution (it wasn't meant to be), as it uses explicit synchronization to allow only one thread at a time to call the function, which negates the benefits of parallelism.
As for a solution:
You must pass a string representation of your Write-HelloWorld function's body to the ForEach-Object -Parallel call:
Function Write-HelloWorld
{
    Param($number)
    write-host -Object $number
}
$numbers = 1..500
$letters = "a".."z"
# Get the body of the Write-HelloWorld function *as a string*
# Alternative, as suggested by @Santiago:
# $funcDefString = (Get-Command -Type Function Write-HelloWorld).Definition
$funcDefString = ${function:Write-HelloWorld}.ToString()
$numbers | ForEach-Object -Parallel {
    # Redefine the Write-HelloWorld function in this thread,
    # using the *string* representation of its body.
    ${function:Write-HelloWorld} = $using:funcDefString
    foreach($letter in $using:letters) {
        Write-HelloWorld -number "$_ $letter"
    }
}
${function:Write-HelloWorld} is an instance of namespace variable notation, which allows you to both get a function (its body as a [scriptblock] instance) and to set (define) it, by assigning either a [scriptblock] or a string containing the function body.
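For instance (a minimal demo; Get-Greeting is a hypothetical function used only for illustration):

function Get-Greeting { 'hello' }
${function:Get-Greeting}.GetType().Name # get: the body as a [scriptblock] -> ScriptBlock
${function:Get-Greeting} = "'hi there'" # set: (re)define the function from a string body
Get-Greeting                            # -> hi there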
By passing a string, the function is recreated in the context of each thread, which avoids the cross-thread issues that can arise when you pass a [System.Management.Automation.FunctionInfo] instance, as output by Get-Command. Such an instance contains a [scriptblock] that is bound to the runspace in which it was defined, i.e., the caller's (the script block is said to have affinity with the caller's runspace), and calling this bound [scriptblock] instance from other threads (runspaces) isn't safe.
By contrast, by redefining the function in each thread, via a string, a thread-specific [scriptblock] instance bound to that thread is created, which can safely be called.
In fact, you appear to have found a loophole, given that when you attempt to use a [scriptblock] instance directly with the $using: scope, the command by design breaks with an explicit error message:
A ForEach-Object -Parallel using variable cannot be a script block.
Passed-in script block variables are not supported with ForEach-Object -Parallel,
and can result in undefined behavior
In other words: PowerShell shouldn't even let you do what you attempted to do, but unfortunately does, as of PowerShell Core 7.2.7, resulting in the obscure failures you saw - see GitHub issue #16461.
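For reference, the guarded case is easy to reproduce (a minimal sketch; $sb is a hypothetical variable):

$sb = { 'hello' }
1..2 | ForEach-Object -Parallel { & $using:sb } # fails with the error quoted above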
Potential future improvement:
An enhancement is being discussed in GitHub issue #12240 to support copying the caller's state to the parallel threads on demand, which would automatically make the caller's functions available, without the need for manual redefinition.
Note: this answer is meant to prove a point, but does not provide the correct solution to the problem.
See mklement0's helpful answer for the proper way to solve this by simply passing the function's definition as string to the runspaces. See also GitHub Issue #4003 for more details.
It's a very bad idea to pass in a reference object and use it without thread safety; here is proof that simply adding thread safety to your code solves the problem:
function Write-HelloWorld {
    param($number)
    Write-Host -Object $number
}
$numbers = 1..500
$letters = "a".."z"
$Function = Get-Command Write-HelloWorld
$numbers | ForEach-Object -Parallel {
    $refObj = $using:Function
    [System.Threading.Monitor]::Enter($refObj)
    ${function:Write-HelloWorld} = $using:Function
    foreach($letter in $using:letters) {
        Write-HelloWorld -number "$_ $letter"
    }
    [System.Threading.Monitor]::Exit($refObj)
}
To be precise, this issue is related to runspace affinity: all runspaces try to send the invocation back to the origin runspace's thread, hence poor PowerShell collapses.
I want to use start-job to run a .ps1 script requiring a parameter. Here's the script file:
#Test-Job.ps1
Param (
    [Parameter(Mandatory=$True)][String]$input
)
$output = "$input to output"
return $output
and here is how I am running it:
$input = "input"
Start-Job -FilePath 'C:\PowerShell\test_job.ps1' -ArgumentList $input -Name "TestJob"
Get-Job -name "TestJob" | Wait-Job | Receive-Job
Get-Job -name "TestJob" | Remove-Job
Run like this, it returns " to output", so $input is null in the script run by the job.
I've seen other questions similar to this, but they mostly use -Scriptblock in place of -FilePath. Is there a different method for passing parameters to files through Start-Job?
tl;dr
$input is an automatic variable (value supplied by PowerShell) and shouldn't be used as a custom variable.
Simply renaming $input to, say, $InputObject solves your problem.
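That is (a sketch of the renamed script and invocation, with the caller-side variable renamed as well for clarity):

#Test-Job.ps1
Param (
    [Parameter(Mandatory=$True)][String]$InputObject
)
$output = "$InputObject to output"
return $output

$inputValue = "input"
Start-Job -FilePath 'C:\PowerShell\test_job.ps1' -ArgumentList $inputValue -Name "TestJob"
Get-Job -Name "TestJob" | Wait-Job | Receive-Job # -> "input to output"
Get-Job -Name "TestJob" | Remove-Job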
As Lee_Dailey notes, $input is an automatic variable and shouldn't be assigned to (it is automatically managed by PowerShell to provide an enumerator of pipeline input in non-advanced scripts and functions).
Regrettably and unexpectedly, several automatic variables, including $input, can be assigned to: see this answer.
$input is a particularly insidious example, because if you use it as a parameter variable, any value you pass to it is quietly discarded, because in the context of a function or script $input invariably is an enumerator for any pipeline input.
Here's a simple example to demonstrate the problem:
PS> & { param($input) "[$input]" } 'hi'
# !! No output - the argument was quietly discarded.
That the built-in definition of $input takes precedence can be demonstrated as follows:
PS> 'ho' | & { param($input) "[$input]" } 'hi'
ho # !! pipeline input took precedence
While you can technically get away with using $input as a regular variable (rather than a parameter variable) as long as you don't cross scope boundaries, custom use of $input should still be avoided:
& {
    $input = 'foo'   # TO BE AVOIDED
    "[$input]"       # Technically works: -> '[foo]'
    & { "[$input]" } # FAILS, due to child scope: -> '[]'
}
I have a function that executes a script block. For convenience, the script block does not need to have explicitly defined parameters, but instead can use $_ and $A to refer to the inputs.
In the code, this is done as such:
$_ = $Value
$A = $Value2
& $ScriptBlock
This whole thing is wrapped in a function. Minimal example:
function F {
    param(
        [ScriptBlock]$ScriptBlock,
        [Object]$Value,
        [Object]$Value2
    )
    $_ = $Value
    $A = $Value2
    & $ScriptBlock
}
If this function is written in a PowerShell script file (.ps1), but imported using Import-Module, the behaviour of F is as expected:
PS> F -Value 7 -Value2 1 -ScriptBlock {$_ * 2 + $A}
15
PS>
However, when the function is written in a PowerShell module file (.psm1) and imported using Import-Module, the behaviour is unexpected:
PS> F -Value 7 -Value2 1 -ScriptBlock {$_ * 2 + $A}
PS>
Using {$_ + 1} instead gives 1. It seems that $_ has a value of $null instead. Presumably, some security measure restricts the scope of the $_ variable or otherwise protects it. Or, possibly, the $_ variable is assigned by some automatic process. Regardless, if only the $_ variable was affected, the first unsuccessful example would return 1.
Ideally, the solution would involve the ability to explicitly specify the environment in which a script block is run. Something like:
Invoke-ScriptBlock -Variables @{"_" = $Value; "A" = $Value2} -InputObject $ScriptBlock
In conclusion, the questions are:
Why can't script blocks in module files access variables defined in functions from which they were called?
Is there a method for explicitly specifying the variables accessible by a script block when invoking it?
Is there some other way of solving this that does not involve including an explicit parameter declaration in the script block?
Out of order:
Is there some other way of solving this that does not involve including an explicit parameter declaration in the script block?
Yes, if you just want to populate $_, use ForEach-Object!
ForEach-Object executes in the caller's local scope, which helps you work around the issue - except you won't have to, because it also automatically binds input to $_/$PSItem:
# this will work both in module-exported commands and standalone functions
function F {
    param(
        [ScriptBlock]$ScriptBlock,
        [Object]$Value
    )
    ForEach-Object -InputObject $Value -Process $ScriptBlock
}
Now F will work as expected:
PS C:\> F -Value 7 -ScriptBlock {$_ * 2}
14
Ideally, the solution would involve the ability to explicitly specify the environment in which a script block is run. Something like:
Invoke-ScriptBlock -Variables @{"_" = $Value; "A" = $Value2} -InputObject $ScriptBlock
Execute the script block using ScriptBlock.InvokeWithContext():
$functionsToDefine = @{
    'Do-Stuff' = {
        param($a, $b)
        Write-Host "$a - $b"
    }
}
$variablesToDefine = @(
    [PSVariable]::new("var1", "one")
    [PSVariable]::new("var2", "two")
)
$argumentList = @()
{Do-Stuff -a $var1 -b $var2}.InvokeWithContext($functionsToDefine, $variablesToDefine, $argumentList)
# -> prints: one - two
Or, wrapped in a function like your original example:
function F
{
    param(
        [scriptblock]$ScriptBlock,
        [object]$Value
    )
    $ScriptBlock.InvokeWithContext(@{}, @([PSVariable]::new('_', $Value)), @())
}
Now that you know how to solve your problem, let's get back to the question(s) about module scoping.
First, it's worth noting that you could actually achieve the above using modules, but sort of in reverse.
(In the following, I use in-memory modules defined with New-Module, but the module scope resolution behavior described is the same as when you import a script module from disk.)
While module scoping "bypasses" normal scope resolution rules (see below for explanation), PowerShell actually supports the inverse - explicit execution in a specific module's scope.
Simply pass a module reference as the first argument to the & call operator, and PowerShell will treat the subsequent arguments as a command to be invoked in said module:
# Our non-module test function
$twoPlusTwo = { return $two + $two }
$two = 2
& $twoPlusTwo # yields 4
# let's try it with explicit module-scoped execution
$myEnv = New-Module {
    $two = 2.5
}
& $myEnv $twoPlusTwo # Hell froze over, 2+2=5 (returns 5)
Why can't script blocks in module files access variables defined in functions from which they were called?
If they can, why can't the $_ automatic variable?
Because loaded modules maintain state, and the implementers of PowerShell wanted to isolate module state from the caller's environment.
Why might that be useful, and why might one preclude the other, you ask?
Consider the following example, a non-module function to test for odd numbers:
$two = 2
function Test-IsOdd
{
    param([int]$n)
    return $n % $two -ne 0
}
If we run the above statements in a script or an interactive prompt, subsequently invoking Test-IsOdd should yield the expected result:
PS C:\> Test-IsOdd 123
True
So far, so great, but relying on the non-local $two variable carries a flaw in this scenario: if, somewhere in our script or in the shell, we accidentally reassign $two, we might break Test-IsOdd completely:
PS C:\> $two = 1 # oops!
PS C:\> Test-IsOdd 123
False
This is expected since, by default, variable scope resolution just wanders up the call stack until it reaches the global scope.
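A quick illustration of that walk up the call stack:

$two = 2
& { & { $two } } # -> 2: neither nested scope defines $two, so lookup ascends to the caller's scope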
But sometimes you might require state to be kept across executions of one or more functions, like in our example above.
Modules solve this by following slightly different scope resolution rules - module-exported functions defer to something we call module scope (before reaching the global scope).
To illustrate how this solves our problem from before, consider this module-exported version of the same function:
$oddModule = New-Module {
    function Test-IsOdd
    {
        param([int]$n)
        return $n % $two -ne 0
    }
    $two = 2
}
Now, if we invoke our new module-exported Test-IsOdd, we predictably get the expected result, regardless of "contamination" in the caller's scope:
PS C:\> Test-IsOdd 123
True
PS C:\> $two = 1
PS C:\> Test-IsOdd 123 # still works
True
This behavior, while maybe surprising, basically serves to solidify the implicit contract between the module author and the user: the module author doesn't need to worry too much about what's "out there" (the caller's session state), and the user can expect whatever is going on "in there" (the loaded module's state) to work correctly without worrying about what they assign to variables in the local scope.
Module scoping behavior is poorly documented in the help files, but is explained in some depth in chapter 8 of Bruce Payette's "PowerShell in Action" (ISBN 9781633430297).
Basically I'm trying to get the below "inline if-statement" function working (credit here)
Function IIf($If, $Then, $Else) {
    If ($If -IsNot "Boolean") {$_ = $If}
    If ($If) {If ($Then -is "ScriptBlock") {&$Then} Else {$Then}}
    Else {If ($Else -is "ScriptBlock") {&$Else} Else {$Else}}
}
Using PowerShell v5, it doesn't seem to work for me, and calling it like
IIf "some string" {$_.Substring(0, 4)} "no string found :("
gives the following error:
You cannot call a method on a null-valued expression.
At line:1 char:20
+ IIf "some string" {$_.Substring(0, 4)} "no string found :("
+ ~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
So, as a more general question, how do you make $_ available to the scriptblock passed into a function?
I kind of tried following this answer, but it seems it's meant for passing it to a separate process, which is not what I'm looking for.
Update:
It seems the issue is that I have the function in a module rather than directly in a script/PS session. A workaround would be to avoid putting it in the module, but I feel a module is more portable, so I'd like to figure out a solution for that.
There are two changes worth making, which make your problem go away:
Do not try to assign to $_ directly; it is an automatic variable under PowerShell's control, not meant to be set by user code (even though it may work situationally, it shouldn't be relied upon).
Instead, use the ForEach-Object cmdlet to implicitly set $_ via its -InputObject parameter.
Note that use of ForEach-Object with -InputObject, rather than with input from the pipeline, is unusual, because it results in atypical behavior: even collections passed to -InputObject are passed as a single object to the -Process block; that is, the usual enumeration does not take place. In the case at hand, however, this is precisely what is desired: whatever $If represents should be passed as-is to the -Process script block, even if it happens to be a collection (see the short demo after the solution code below).
Use the -is operator with type literals such as [Boolean], not type names such as "Boolean".
Function IIf($If, $Then, $Else) {
    If ($If) {
        If ($Then -is [scriptblock]) { ForEach-Object -InputObject $If -Process $Then }
        Else { $Then }
    } Else {
        If ($Else -is [scriptblock]) { ForEach-Object -InputObject $If -Process $Else }
        Else { $Else }
    }
}
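To see the non-enumerating -InputObject behavior for yourself (a minimal demo):

ForEach-Object -InputObject (1..3) -Process { $_.Count } # -> 3: the whole array is bound to $_ as one object
1..3 | ForEach-Object { $_ }                             # -> 1 2 3: pipeline input is enumerated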
As for what you tried:
In a later update you state that your IIf function is defined in a module, which explains why your attempt to set $_ by direct assignment ($_ = $If; as stated, this is to be avoided in general) was ineffective:
It created a function-local $_ instance, which the $Then script block, due to being bound to the scope of the (module-external) caller, does not see.
The reason is that each module has its own scope domain (hierarchy of scopes aka session state), which only shares the global scope with non-module callers - see the bottom section of this answer for more information about scopes in PowerShell.
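A sketch that demonstrates this isolation, using an in-memory module created with New-Module (Invoke-WithUnderscore is a hypothetical function):

$null = New-Module {
    function Invoke-WithUnderscore([scriptblock]$ScriptBlock) {
        $_ = 'module-local' # direct assignment; for demonstration only
        & $ScriptBlock
    }
}
Invoke-WithUnderscore { "[$_]" } # -> []: the block is bound to the caller's scope domain and never sees the module's $_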
Here's what I'm trying to do:
param([Switch]$myparameter)
If($myparameter -eq $true) {$export = Export-CSV c:\temp\temp.csv}
Get-MyFunction | $export
If $myparameter is passed, export the data to said location. Else, just display the normal output (in other words, ignore the $export). What doesn't work here is setting $export to the "Export-csv...". Wrapping it in quotes does not work.
I'm trying to avoid an if, then statement saying "if it's passed, export this. If it's not passed, output data"
I have a larger module that everything works in so there is a reason behind why I am looking to do it this way. Please let me know if any additional information is needed.
Thank you everyone in advance.
tl;dr:
param([Switch] $myparameter)
# Define the core command as a *script block* (enclosed in { ... }),
# to be invoked later, either with operator . (no child variable scope)
# or & (with child variable scope)
$scriptBlock = { Get-MyFunction }
# Invoke the script block with . (or &), and pipe it to the Export-Csv cmdlet,
# if requested.
If ($myparameter) { # short for: ($myparameter -eq $True), because $myparameter is a switch
    . $scriptBlock | Export-Csv c:\temp\temp.csv
} else {
    . $scriptBlock
}
TessellatingHeckler's answer is concise, works, and uses a number of advanced features cleverly - however, while it avoids an if statement, as requested, doing so may not yield the best or most readable solution in this case.
What you're looking for is to store a command in a variable for later execution, but your own attempt to do so:
If ($myparameter -eq $true) { $export = Export-CSV c:\temp\temp.csv }
results in immediate execution, which is not only unintended, but fails, because the Export-Csv cmdlet is missing input in the above statement.
You can store a snippet of source code for later execution in a variable via a script block, simply by enclosing the snippet in { ... }, which in your case would mean:
If ($myparameter -eq $true) { $export = { Export-Csv c:\temp\temp.csv } }
Note that what you pass to if is itself a script block, but it is by definition one that is executed as soon as the if condition is found to be true.
A variable containing a script block can then be invoked on demand, using one of two operators:
., the "dot-sourcing" operator, which executes the script block in the current scope.
&, the call operator, which executes the script block in a child scope with respect to potential variable definitions.
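The difference is easy to demonstrate with a variable assignment inside a script block:

$sb = { $x = 42 }
& $sb; "[$x]" # -> []: $x was created in a child scope and discarded
. $sb; "[$x]" # -> [42]: $x was created directly in the current scope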
However, given that you only need the pipeline with an additional command if switch $myparameter is specified, it's better to change the logic:
Store the shared core command, Get-MyFunction, in a script block, in variable $scriptBlock.
Invoke that script block in an if statement, either standalone (by default), or by piping it to Export-Csv (if -MyParameter was specified).
I'm trying to avoid an if, then statement
Uh, if you insist...
param([Switch]$myparameter)
$cmdlet, $params = (('Write-Output', @{}),
    ('Export-Csv', @{'LiteralPath'='c:\temp\temp.csv'}))[$myparameter]
Get-MyFunction | & $cmdlet @params
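For reference: this works because a [switch] (or [bool]) used as an array index is coerced to 0 or 1, selecting the ('Write-Output', ...) pair or the ('Export-Csv', ...) pair, respectively. A minimal illustration:

('if false', 'if true')[$false] # -> if false (index 0)
('if false', 'if true')[$true]  # -> if true  (index 1)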