ForEach is documented as an alias of ForEach-Object. When I run Get-Alias ForEach, it tells me that it is an alias of ForEach-Object.
But,
ForEach-Object accepts parameters such as -Begin, -Process, and -End, whereas ForEach doesn't accept them.
When we call ForEach-Object without anything, it prompts for the Process parameter, whereas calling ForEach opens a nested prompt with >>.
% and ForEach behave the same, but ForEach-Object doesn't.
My questions are:
Is ForEach really an alias of ForEach-Object?
Which is better, ForEach or ForEach-Object?
Please share your thoughts.
Thank you!
There are two distinct underlying constructs:
The ForEach-Object cmdlet
This cmdlet has a built-in alias name: foreach, which just so happens to match the name of the distinct foreach statement (see next point).
The foreach loop statement (akin to the lesser-used for statement).
What foreach refers to depends on the parsing context - see about_Parsing:
In argument mode (in the context of a command), foreach is ForEach-Object's alias.
In expression mode (more strictly: statement mode in this case), foreach is the foreach loop statement.
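A quick illustration of the two parsing contexts - both of the following produce 2, 4, 6:

# Argument mode: foreach is ForEach-Object's alias; input arrives via the pipeline, in $_.
1..3 | foreach { $_ * 2 }

# Statement mode: foreach is the loop keyword, with a self-chosen iterator variable.
foreach ($n in 1..3) { $n * 2 }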
As a cmdlet, ForEach-Object operates on pipeline input.
Use it to process output from other commands in a streaming manner, object by object, as these objects are being received, via the automatic $_ variable.
As a language statement, the foreach loop operates on variables and expressions (which may include output collected from commands).
Use it to process already-collected-in-memory results efficiently, via a self-chosen iterator variable (e.g., $num in foreach ($num in 1..3) { ... }); doing so is noticeably faster than processing via ForEach-Object.[1]
Note that you cannot send outputs from a foreach statement directly to the pipeline, because PowerShell's grammar doesn't permit it; for streaming output to the pipeline, wrap a foreach statement in & { ... }. (By contrast, simple expressions (e.g., 1..3) can be sent directly to the pipeline).
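For example, a minimal sketch of that wrapping technique:

# This is a syntax error - a foreach *statement* cannot start a pipeline:
#   foreach ($n in 1..3) { $n } | ForEach-Object { $_ * 2 }
# Wrapping it in & { ... } makes its output stream to the pipeline:
& { foreach ($n in 1..3) { $n } } | ForEach-Object { $_ * 2 }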
For more information, a performance comparison and a discussion of the tradeoffs (memory use vs. performance), including the .ForEach() array method, see this answer.
[1] However, note that the main reason for this performance discrepancy as of PowerShell 7.2.x isn't the pipeline itself, but ForEach-Object's inefficient implementation - see GitHub issue #10982.
Related
I've learnt from this thread how to append to an array of arrays. However, I've observed the weirdest behaviour ever. It seems to append stdout too! Suppose I want to append to an array of arrays, but I want to echo debug messages in the loop. Consider the following function.
function WTF {
    $Result = @()
    $A = 1,2
    $B = 11,22
    $A,$B | % {
        Write-Output "Appending..."
        $Result += , $_
    }
    return $Result
}
Now if you do $Thing = WTF, you might think you get a $Thing that is an array of two arrays: $(1,2) and $(11,22). But that's not the case. If you type $Thing in the terminal, you actually get:
Appending...
Appending...
1
2
11
22
That is, $Thing[0] is the string "Appending...", $Thing[1] is the string "Appending...", $Thing[2] is the array @(1,2), and $Thing[3] is the array @(11,22). Never mind that they seem to be in a weird order, which is a whole other can of worms I don't want to get into, but why does the echoed string "Appending..." get appended to the result??? This is so extremely weird. How do I stop it from doing that?
EDIT
Oh wait, upon further experimenting, the following simpler function might be more revealing:
function WTF {
    $Result = @()
    1,2 | % {
        Write-Output "LOL"
    }
    return $Result
}
Now if you do $Thing = WTF, then $Thing becomes an array of length 2, containing the string "LOL" twice. Clearly there is something fundamental about PowerShell loops and/or Write-Output that I'm not understanding.
ANOTHER EDIT!!!
In fact, the following even simpler function does the same thing:
function WTF {
    1,2 | % {
        Write-Output "LOL"
    }
}
Maybe I just shouldn't be using Write-Output, but should use Write-Information or Write-Debug instead.
PowerShell doesn't have return values; it has a success output stream (the analog of stdout in traditional shells).
The PowerShell pipeline serves as the conduit for this stream, both when capturing command output in a variable and when sending it to another command via |, the pipeline operator.
Any statement - including multiple ones, possibly in a loop - inside a function or script can write to that stream, typically implicitly - by not capturing, suppressing, or redirecting output - or explicitly, with Write-Output, although its use is rarely needed - see this answer for more information.
Output is sent instantly to the success output stream, as it is being produced - even before the script or function exits.
return exists for flow control, independently of PowerShell's output behavior; as syntactic sugar you may also use it to write to the output stream; that is, return $Result is syntactic sugar for: $Result; return, with $Result by itself producing implicit output, and return exiting the scope.
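A tiny demonstration of this point, using a hypothetical Get-Value function:

function Get-Value {
    'before'        # implicit output - emitted even though a return follows
    return 'after'  # same as: 'after'; return
}
Get-Value  # -> 'before', then 'after'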
To avoid polluting the success output stream - intended for data output only - with status messages, write to one of the other available output streams - see the conceptual about_Redirection help topic.
Write-Verbose is a good choice, because it is silent by default, and can be activated on demand, either via $VerbosePreference = 'Continue', or, on a per-call basis, with the common -Verbose parameter, assuming the script or function is an advanced one.
Write-Host, by contrast, unconditionally prints information to the host (display), and allows control over formatting, notably coloring.
Outputting a collection (array) from a script or function enumerates it. That is, instead of sending the collection itself, as a whole, its elements are sent to PowerShell's pipeline, one by one, a process called streaming.
PowerShell commands generally expect streaming output, and may not behave as expected if you output a collection as a whole.
When you do want to output a collection as a whole (which may sometimes be necessary for performance reasons), wrap it in a transient auxiliary single-element array, using the unary form of ,, the array constructor operator: , $Result
A conceptually clearer (but less efficient) alternative is to use Write-Output -NoEnumerate
See this answer for more information.
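A quick demonstration of the difference, using hypothetical function names:

function Out-Enumerated { $arr = 1, 2, 3; $arr }   # elements are streamed one by one
function Out-AsWhole    { $arr = 1, 2, 3; , $arr } # the array is sent as a single object
(Out-Enumerated | Measure-Object).Count  # -> 3
(Out-AsWhole | Measure-Object).Count     # -> 1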
Therefore, the PowerShell idiomatic reformulation of your function is:
function WTF {
    # Even though no parameters are declared,
    # these two lines are needed to activate support for the -Verbose switch;
    # they implicitly make the function an *advanced* one.
    # Even without them, you could still control verbose output via
    # the $VerbosePreference preference variable.
    [CmdletBinding()]
    param()
    $A = 1,2
    $B = 11,22
    # Use Write-Verbose for status information.
    # It is silent by default, but if you pass -Verbose
    # on invocation or set $VerbosePreference = 'Continue', you'll
    # see the message.
    Write-Verbose 'Appending...'
    # Construct an array containing the input arrays as elements
    # and output it, which *enumerates* it, meaning that each
    # input array is output by itself, as a whole.
    # If you need to output the nested array *as a whole*, use
    # , ($A, $B)
    $A, $B
}
Sample invocation:
PS> $nestedArray = WTF -Verbose
VERBOSE: Appending...
Note:
Only the success output (stream 1) was captured in variable $nestedArray, whereas the verbose output (stream 4) was passed through to the display.
$nestedArray ends up as an array - even though $A and $B were in effect streamed separately - because PowerShell automatically collects multiple streamed objects in an array, which is always of type [object[]].
A notable pitfall is that if there's only one output object, it is assigned as-is, not wrapped in an array.
To ensure that a command's output is always an array, even in the case of single-object output:
You can enclose the command in @(...), the array-subexpression operator:
$txtFiles = @(Get-ChildItem *.txt)
In the case of a variable assignment, you can also use a type constraint with [array] (effectively the same as [object[]]):
[array] $txtFiles = Get-ChildItem *.txt
However, note that if a given single output object itself is a collection (which, as discussed, is unusual), no extra array wrapper is created by the commands above, and if that collection is of a type other than an array, it will be converted to a regular [object[]] array.
Additionally, if @(...) is applied to a strongly typed array (e.g., [int[]] (1, 2)), it is in effect enumerated and rebuilt as an [object[]] array; by contrast, the [array] type constraint (cast) preserves such an array as-is.
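To illustrate that last point:

$ints = [int[]] (1, 2)
@($ints).GetType().Name          # -> Object[] (enumerated and rebuilt)
([array] $ints).GetType().Name   # -> Int32[]  (preserved as-is)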
As for your specific observations and questions:
I've learnt from this thread how to append to an array of arrays
While using += in order to incrementally build an array is convenient, it is also inefficient, because a new array must be constructed every time - see this answer for how to construct arrays efficiently, and this answer for how to use an efficiently extensible list type as an alternative.
Nevermind that they seem to be in a weird order, which is a whole other can of worms I don't want to get into
Output to the pipeline - whether implicit, or explicit with Write-Output - is instantly sent to the success output stream.
Thus, your Write-Output output came first, given that you didn't output the $Result array until later, via return.
How do I stop it from doing that?
As discussed, don't use Write-Output for status messages.
Using BorderAround emits "True" to the console.
$range = $sum_wksht.Range('B{0}:G{0}' -f ($crow))
$range.BorderAround(1, -4138)
This can be overcome by using one of the following.
$wasted = $range.BorderAround(1, -4138)
[void]$range.BorderAround(1, -4138)
Why is this needed? Am I not creating the range correctly? Is there a better workaround?
Why is this needed?
It is needed because the BorderAround method has a return value and, in PowerShell, any data that a command or expression outputs (returns) is implicitly sent to the (success) output stream, which by default goes to the host - typically the console window (terminal) in which a PowerShell session runs.
That is, the data shows in the console/terminal, unless it is:
captured ($var = ...)
sent through the pipeline for further processing (... | ...; the last pipeline segment's command may or may not produce output itself)
redirected (... >)
or any combination thereof.
That is:
$range.BorderAround(1, -4138)
is (more efficient) shorthand for:
Write-Output $range.BorderAround(1, -4138)
(Explicit use of Write-Output is rarely needed.)
Since you don't want that output, you must suppress it, for which you have several options:
$null = ...
[void] (...)
... > $null
... | Out-Null
$null = ... may be the best overall choice, because:
It conveys the intent to suppress up front
While [void] (...) does that too, it often requires you to enclose an expression in (...) for syntactic reasons; e.g., [void] 1 + 2 doesn't work as intended, only [void] (1 + 2); similarly, a command must always be enclosed in (...): [void] New-Item test.txt doesn't work, only [void] (New-Item test.txt) does.
It performs well with both command output (e.g., $null = Get-AdUser ...) and expression output (e.g., $null = $range.BorderAround(1, -4138)).
Conversely, avoid ... | Out-Null, because it is generally much slower (except in the edge case of a side effect-free expression's output in PowerShell (Core) 6+)[1].
However, if you need to silence all output streams - not just the success output, but also errors, verbose output, ... - you must use *> $null
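Applied to the method call from the question, the options look like this:

$null = $range.BorderAround(1, -4138)     # best overall choice
[void] $range.BorderAround(1, -4138)      # cast to void
$range.BorderAround(1, -4138) > $null     # redirection
$range.BorderAround(1, -4138) | Out-Null  # generally the slowest option - avoid
$range.BorderAround(1, -4138) *> $null    # silences *all* streams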
Why does PowerShell produce output implicitly?
As a shell, PowerShell's output behavior is based on streams, as in traditional shells such as cmd.exe or Bash. (While traditional shells have 2 output streams - stdout and stderr - PowerShell has 6, so as to provide more sophisticated functionality - see about_Redirection.)
A cmdlet, script, or function can write to the output streams as often as it wants, and such output is usually instantly available for display but notably also to potential consumers, which enables the streaming, one-by-one processing that the pipeline provides.
This contrasts with traditional programming languages, whose output behavior is based on return values, typically provided via the return keyword, which conflates output data (the return value) with flow control (exit the scope and return to the caller).
A frequent pitfall is to expect PowerShell's return statement to act the same, but it doesn't: return <val> is just syntactic sugar for <val>; return, i.e., implicit output of <val> followed by an unconditional return of control to the caller; notably, the use of return does not preclude generation of output from earlier statements in the same scope.
Unlike traditional shells, PowerShell doesn't require an explicit write-to-the-output stream command in order to produce output:
While PowerShell does have a counterpart to echo, namely Write-Output, its use is rarely needed.
Among the rare cases where Write-Output is useful are preventing enumeration of a collection on output with -NoEnumerate, and using the common -OutVariable parameter to both output data and capture it in a variable (which is generally only needed for expressions, because cmdlets and advanced functions / scripts themselves support -OutVariable).
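For instance:

# Emit the numbers *and* capture them, via the common -OutVariable parameter:
Write-Output (1..3) -OutVariable captured
$captured.Count  # -> 3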
The implicit output behavior:
is generally a blessing:
for interactive experimentation - just type any statement - notably including expressions such as [IO.Path]::GetExtension('foo.txt') and [math]::Pow(2, 32) - and see its output (akin to the behavior of a REPL).
for writing concise code that doesn't need to spell out implied behavior (see example below).
can occasionally be a pitfall:
for users accustomed to the semantics of traditional programming languages.
due to the potential for accidental pollution of the output stream from statements that one doesn't expect to produce output, such as in your case; a more typical example is the .Add() method of the [System.Collections.ArrayList] class unexpectedly producing output.
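For instance, here's the ArrayList pitfall just mentioned:

$list = [System.Collections.ArrayList]::new()
$list.Add('one')          # .Add() returns the new element's index (0) - accidental output!
$null = $list.Add('two')  # suppressing the return value avoids the pollution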
Example:
# Define a function that takes an array of integers and
# outputs their hex representation (e.g., '0xa' for decimal 10)
function Get-HexNumber {
    param([int[]] $numbers)
    foreach ($i in $numbers) {
        # Format the integer at hand
        # *and implicitly output it*.
        '0x{0}' -f $i.ToString('x')
    }
}
# Call the function with integers 0 to 16 and loop over the
# results, sleeping 1 second between numbers.
Get-HexNumber (0..16) | ForEach-Object { "[$_]"; Start-Sleep 1 }
The above yields the following:
[0x0]
# 1-second pause
[0x1]
# 1-second pause
[0x2]
...
[0x10]
This demonstrates the streaming aspect of the behavior: Get-HexNumber's output is available to the ForEach-Object cmdlet call as it is being produced, not after Get-HexNumber has exited.
[1] In PowerShell (Core) 6+, Out-Null has an optimization if the only preceding pipeline segment is a side effect-free expression rather than a method or command call; e.g., 1..1e6 | Out-Null executes in almost no time, because the expression is seemingly not even executed. However, such a scenario is atypical, and the functionally equivalent Write-Output (1..1e6) | Out-Null takes a long time to run, much longer than $null = Write-Output (1..1e6).
I can't find a way to pass the function. Just variables.
Any ideas without putting the function inside the ForEach loop?
function CustomFunction {
    Param (
        $A
    )
    Write-Host $A
}
$List = "Apple", "Banana", "Grape"
$List | ForEach-Object -Parallel {
    Write-Host $using:CustomFunction $_
}
The solution isn't quite as straightforward as one would hope:
# Sample custom function.
function Get-Custom {
Param ($A)
"[$A]"
}
# Get the function's definition *as a string*
$funcDef = ${function:Get-Custom}.ToString()
"Apple", "Banana", "Grape" | ForEach-Object -Parallel {
# Define the function inside this thread...
${function:Get-Custom} = $using:funcDef
# ... and call it.
Get-Custom $_
}
Note: This answer contains an analogous solution for using a script block from the caller's scope in a ForEach-Object -Parallel script block.
Note: If your function were defined in a module that is placed in one of the locations known to the module-autoloading feature, your function calls would work as-is with ForEach-Object -Parallel, without extra effort - but each thread would incur the cost of (implicitly) importing the module.
The above approach is necessary, because - aside from the current location (working directory) and environment variables (which apply process-wide) - the threads that ForEach-Object -Parallel creates do not see the caller's state, notably neither with respect to variables nor functions (and also not custom PS drives and imported modules).
As of PowerShell 7.2.x, an enhancement is being discussed in GitHub issue #12240 to support copying the caller's state to the parallel threads on demand, which would make the caller's functions automatically available.
Note that redefining the function in each thread via a string is crucial, as an attempt to make do without the aux. $funcDef variable and trying to redefine the function with ${function:Get-Custom} = ${using:function:Get-Custom} fails, because ${function:Get-Custom} is a script block, and the use of script blocks with the $using: scope specifier is explicitly disallowed in order to avoid cross-thread (cross-runspace) issues.
However, ${function:Get-Custom} = ${using:function:Get-Custom} would work with Start-Job; see this answer for an example.
It would not work with Start-ThreadJob, which currently - even though it shouldn't - syntactically allows you to do & ${using:function:Get-Custom} $_, because ${using:function:Get-Custom} is preserved as a script block (unlike with Start-Job, where it is deserialized as a string, which is itself surprising behavior - see GitHub issue #11698). That is, direct cross-thread use of [scriptblock] instances causes obscure failures, which is why ForEach-Object -Parallel prevents it in the first place.
A similar loophole that leads to cross-thread issues even with ForEach-Object -Parallel is using a command-info object obtained in the caller's scope with Get-Command as the function body in each thread via the $using: scope: this too should be prevented, but isn't as of PowerShell 7.2.7 - see this post and GitHub issue #16461.
${function:Get-Custom} is an instance of namespace variable notation, which allows you to both get a function (its body as a [scriptblock] instance) and to set (define) it, by assigning either a [scriptblock] or a string containing the function body.
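A minimal demonstration of the notation, using the sample function from above:

function Get-Custom { param($A) "[$A]" }
${function:Get-Custom}                       # gets the body, as a [scriptblock]
${function:Get-Custom} = 'param($A) "<$A>"'  # (re)defines the function from a *string*
Get-Custom 'hi'                              # -> <hi>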
I just figured out another way using get-command, which works with the call operator. $a ends up being a FunctionInfo object.
EDIT: I'm told this isn't thread safe, but I don't understand why.
function hi { 'hi' }
$a = get-command hi
1..3 | foreach -parallel { & $using:a }
hi
hi
hi
So I figured out another little trick that may be useful for people trying to add the functions dynamically, particularly if you might not know the name of it beforehand, such as when the functions are in an array.
# Store the current function list in a variable
$initialFunctions = Get-ChildItem Function:
# Source all .ps1 files in the current folder and all subfolders
Get-ChildItem . -Recurse | Where-Object { $_.Name -like '*.ps1' } |
    ForEach-Object { . "$($_.FullName)" }
# Get only the functions that were added above, and store them in an array
$functions = @()
Compare-Object $initialFunctions (Get-ChildItem Function:) -PassThru |
    ForEach-Object { $functions = @($functions) + @($_) }
1..3 | ForEach-Object -Parallel {
    # Pull the $functions array from the outer scope and set each function
    # to its definition
    $using:functions | ForEach-Object {
        Set-Content "Function:$($_.Name)" -Value $_.Definition
    }
    # Call one of the functions in the sourced .ps1 files by name
    SourcedFunction $_
}
The main "trick" of this is using Set-Content with Function: plus the function name, since PowerShell essentially treats each entry of Function: as a path.
This makes sense when you consider the output of Get-PSDrive. Since each of those entries can be used as a "Drive" in the same way (i.e., with the colon).
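For example, a small sketch (Get-Greeting is a hypothetical function name):

Set-Content Function:Get-Greeting -Value "'Hello!'"  # define a function via its Function: path
Get-Greeting                                         # -> Hello!
Get-ChildItem Function:Get-Greeting                  # listed like an item on a drive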
The documentation for ForEach-Object says "When you use the InputObject parameter with ForEach-Object, instead of piping command results to ForEach-Object, the InputObject value is treated as a single object." This behavior can easily be observed directly:
PS C:\WINDOWS\system32> ForEach-Object -InputObject @(1, 2, 3) {write-host $_}
1 2 3
This seems weird. What is the point of a "ForEach" if there is no "each" to do "for" on? Is there really no way to get ForEach-Object to act directly on the individual elements of an array without piping? If not, it seems that ForEach-Object with InputObject is completely useless. Is there something I don't understand about that?
In the case of ForEach-Object, or any cmdlet designed to operate on a collection, using -InputObject as a direct parameter doesn't make sense, because the cmdlet expects the collection to be unrolled and processed one element at a time. However, I would also not call the parameter "useless", because it still needs to be defined so it can be set to allow input via the pipeline.
Why is it this way?
-InputObject is, by convention, a generic parameter name for what should be considered pipeline input. It's a parameter with [Parameter(ValueFromPipeline = $true)] set on it, and as such is better suited to taking input from the pipeline rather than being passed a direct argument. The main drawback of passing it as a direct argument is that the collection is not guaranteed to be unwrapped, and may exhibit some other behavior that may not be intended. From the about_pipelines page linked to above:
When you pipe multiple objects to a command, PowerShell sends the objects to the command one at a time. When you use a command parameter, the objects are sent as a single array object. This minor difference has significant consequences.
To explain the above quote in different words: passing a collection (e.g., an array or a list) through the pipeline automatically unrolls the collection, handing its elements to the next command in the pipeline one at a time. The cmdlet does not unroll -InputObject itself; with pipeline input, the data is already delivered one element at a time. This is why you might see problems when passing a collection to the -InputObject parameter directly - the cmdlet is probably not designed to unroll a collection itself; it expects each collection element to be handed to it in piecemeal fashion.
Consider the following example:
# Array of hashes with a common key
$myHash = @{name = 'Alex'}, @{name = 'Bob'}, @{name = 'Sarah'}
# This works as intended
$myHash | Where-Object { $_.name -match 'alex' }
The above code outputs the following as expected:
Name Value
---- -----
name Alex
But if you pass the collection directly to -InputObject like this:
Where-Object -InputObject $myHash { $_.name -match 'alex' }
It returns the whole collection, because -InputObject was never unrolled as it is when passed in via the pipeline, yet in this context $_.name -match 'alex' still returns true (via member enumeration, $_.name yields the array of all name values, and -match acts as a filter on it, returning the matching elements, which is truthy). In other words, when providing a collection directly to -InputObject, it's treated as a single object, rather than the script block executing against each element of the collection. This can also give the appearance of working as expected when checking for a false condition against that data set:
Where-Object -InputObject $myHash { $_.name -match 'frodo' }
which ends up returning nothing, because even in this context frodo is not the value of any of the name keys in the collection of hashes.
In short, if something expects the input to be passed in as pipeline input, it's usually, if not always, a safer bet to do it that way, especially when passing in a collection. However, if you are working with a non-collection, then there is likely no issue if you opt to use the -InputObject parameter directly.
Bender the Greatest's helpful answer explains the current behavior well.
For the vast majority of cmdlets, direct use of the -InputObject parameter is indeed pointless and the parameter should be considered an implementation detail whose sole purpose is to facilitate pipeline input.
There are exceptions, however, such as the Get-Member cmdlet, where direct use of -InputObject allows you to inspect the type of a collection itself, whereas providing that collection via the pipeline would report information about its elements' types.
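For example:

$arr = 1, 2, 3
$arr | Get-Member               # reports on the *elements'* type, System.Int32
Get-Member -InputObject $arr    # reports on the collection's own type, System.Object[]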
Given how things currently work, it is quite unfortunate that -InputObject features so prominently in most cmdlets' help topics, alongside "real" parameters, and that its description does not frame the issue with enough clarity (as of this writing): it should clearly convey the message "Don't use this parameter directly; use the pipeline instead".
This GitHub issue provides a categorized overview of how various cmdlets process direct -InputObject arguments.
Taking a step back:
While technically a breaking change, it would make sense for -InputObject parameters (or any pipeline-binding parameter) to by default accept and enumerate collections even when they're passed by direct argument rather than via the pipeline, in a manner that is transparent to the implementing command.
This would put direct-argument input on par with pipeline input, with the added benefit of the former resulting in faster processing of already-in-memory collections.
I have a file that looks like this:
a,1
b,2
c,3
a,4
b,5
c,6
(...repeat 1,000s of lines)
How can I transpose it into this?
a,b,c
1,2,3
4,5,6
Thanks
Here's a brute-force one-liner from hell that will do it:
PS> Get-Content foo.txt |
Foreach -Begin {$names=@();$values=@();$hdr=$false;$OFS=',';
function output { if (!$hdr) {"$names"; $global:hdr=$true}
"$values";
$global:names=@();$global:values=@()}}
-Process {$n,$v = $_ -split ',';
if ($names -contains $n) {output};
$names+=$n; $values+=$v }
-End {output}
a,b,c
1,2,3
4,5,6
It's not what I'd call elegant but should get you by. This should copy/paste correctly as-is. However, if you reformat it to what is shown above, you will need to put back-ticks after the last curly brace on both the Begin and Process scriptblocks. This script requires PowerShell 2.0, as it relies on the new -split operator.
This approach makes heavy use of the Foreach-Object cmdlet. Normally when you use Foreach-Object (alias is Foreach) in the pipeline you specify just one scriptblock like so:
Get-Process | Foreach {$_.HandleCount}
That prints out the handle count for each process. This usage of Foreach-Object uses the -Process scriptblock implicitly, which means it executes once for each object it receives from the pipeline. Now what if we want to total up all the handles for each process? Ignore the fact that you could just use Measure-Object HandleCount -Sum to do this; I'll show you how Foreach-Object can do this. As you see in the original solution to this problem, Foreach can take both a Begin scriptblock, which is executed once before the first object from the pipeline is processed, and an End scriptblock, which executes when there are no more objects in the pipeline. Here's how you can total the handle count using Foreach-Object:
gps | Foreach -Begin {$sum=0} -Process {$sum += $_.HandleCount } -End {$sum}
Relating this back to the problem solution, in the Begin scriptblock I initialize some variables to hold the array of names and values, as well as a bool ($hdr) that tells me whether or not the header has been output (we only want to output it once). The next mildly mind-blowing thing is that I also declare a function (output) in the Begin scriptblock, which I call from both the Process and End scriptblocks to output the current set of data stored in $names and $values.
The only other trick is that the Process scriptblock uses the -contains operator to see if the current line's field name has already been seen before. If so, then output the current names and values and reset those arrays to empty. Otherwise just stash the name and value in the appropriate arrays so they can be saved later.
BTW, the reason the output function needs to use the global: specifier on the variables is that PowerShell takes a "copy-on-write" approach when a nested scope modifies a variable defined outside it. When we really want that modification to occur at the higher scope, we have to tell PowerShell so, using a modifier like global: or script:.
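A small demonstration of that copy-on-write behavior, using two throwaway helper functions:

$count = 0
function Bump       { $count += 1 }         # reads the caller's value, but assigns to a new *local* $count
function BumpGlobal { $global:count += 1 }  # modifies the global variable itself
Bump;       $count  # -> 0 (unchanged)
BumpGlobal; $count  # -> 1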