In a PowerShell pipeline, how to wait for a cmdlet to complete before proceeding to the next one?

Consider the following function:
function myFunction {
    100
    sleep 1
    200
    sleep 1
    300
    sleep 1
}
As you can see, it emits those values one by one down the pipeline.
But I want to wait for all the values to be emitted before going on. Like
myFunction | waitForThePreviousCommandToComplete | Format-Table
I want the Format-Table above to receive the entire array, instead of one-by-one items.
Is it even possible in PowerShell?

Use (...), the grouping operator, in order to collect a command's output in full first, before sending it to the success output stream (the pipeline).
# Due to (...), doesn't send myfunction's output to Format-Table until it has run
# to completion and all its output has been collected.
(myFunction) | Format-Table
# Also works for entire pipelines.
(100, 200, 300 | ForEach-Object { $_; Start-Sleep 1 }) | Format-Table
Note:
If you need to collect the output of multiple commands (pipelines) and/or language statements up front, use $(...), the subexpression operator, instead, e.g. $(Get-Date -Year 2020; Get-Date -Year 2030) | Format-Table; the next point applies to it as well.
Whatever output was collected by (...) is enumerated, i.e., if the collected output is an enumerable, its elements are emitted one by one to the success output stream - albeit without any delay at that point.
Note that the collected output is invariably an enumerable (an array of type [object[]]) if two or more output objects were collected, but it can also be one in the event that the single object collected is itself an enumerable.
E.g., (Write-Output -NoEnumerate 1, 2, 3) | Measure-Object reports a count of 3, even though Write-Output -NoEnumerate output the given array as a single object (without (...), Measure-Object would report 1).
Typically, commands (cmdlets, functions, scripts) stream their output objects, i.e. emit them to the pipeline one by one, as soon as they are produced, while the command is still running (as your function does), and they likewise act on their pipeline input one by one. However, some cmdlets invariably collect all input objects themselves first, before they start emitting their output object(s), out of conceptual necessity: notable examples are Sort-Object, Group-Object, and Measure-Object, all of which must act on the entirety of their input before they can start emitting results. Ditto for Format-Table when it is passed the -AutoSize switch, discussed further below.
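For example, the following pipeline prints nothing for about 3 seconds, because Sort-Object must see all of its input before it can emit anything (a quick demonstration you can paste into any session):
# No output for ~3 seconds: Sort-Object collects all input first,
# then emits the sorted results at once.
300, 100, 200 | ForEach-Object { $_; Start-Sleep 1 } | Sort-Object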
In the case of Format-Table, specifically, you can use the -AutoSize switch to force it to collect all input first, in order to determine suitable display column widths based on all data (by default, Format-Table waits 300 msec. to determine column widths, based on whatever subset of the input data it has received by then).
However, this does not apply to so-called out-of-band-formatted objects, notably strings and primitive .NET types, which are still emitted (by their culture-invariant .ToString() representation) as they're being received.
Only complex objects (those with properties) are collected first, notably hashtables and [pscustomobject] instances; e.g.:
# Because this ForEach-Object call outputs complex objects (hashtables),
# Format-Table, due to -AutoSize, collects them all first,
# before producing its formatted output.
100, 200, 300 | ForEach-Object { @{ num = $_ }; Start-Sleep 1 } |
    Format-Table -AutoSize
If you want to create a custom function that collects all of its pipeline input up front, you have two options:
Create a simple function that uses the automatic $input variable in its function body, which implicitly runs only after all input has been received; e.g.:
# This simple function simply relays its input, but
# implicitly only after all of it has been collected.
function waitForThePreviousCommandToComplete { $input }
# Output doesn't appear until after the ForEach-Object
# call has emitted all its output.
100, 200, 300 | ForEach-Object { $_; Start-Sleep 1 } | waitForThePreviousCommandToComplete
In the context of an advanced function, you'll have to manually collect all input, iteratively in the process block, via a list-type instance allocated in the begin block, which you can then process in the end block.
While using a simple function with $input is obviously simpler, you may still want an advanced one for all the additional benefits it offers (preventing unbound arguments, parameter validation, multiple pipeline-binding parameters, ...).
See this answer for an example.
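As a rough sketch of that advanced-function pattern (one possible implementation; the list type and parameter name are this example's choices, not mandated):
function waitForThePreviousCommandToComplete {
  [CmdletBinding()]
  param(
    [Parameter(ValueFromPipeline)] $InputObject
  )
  begin   { $all = [System.Collections.Generic.List[object]]::new() }  # allocate the collector once
  process { $all.Add($InputObject) }                                   # collect each pipeline input
  end     { $all }                                                     # emit everything, enumerated, at the end
}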

Sort-Object waits until it has everything:
myFunction | sort-object
Or:
(myFunction)
$(myfunction1; myFunction2)
myFunction | format-table -autosize
myFunction | more
See also: How to tell PowerShell to wait for each command to end before starting the next?

For some unknown reason, just putting the function inside parentheses solved my problem:
(myFunction) | Format-Table

PowerShell output PSCustomObject blocks output hashtable [duplicate]

I'm learning PowerShell, and a vast number of articles I read strongly discourage the use of write-host, telling me it's "bad practice" and that, almost always, the output can be displayed in another way.
So, I'm taking the advice and try to avoid use of write-host. One suggestion I found was to use write-output instead. As far as I understand, this puts everything in a pipeline, and the output is executed at the end of the script (?).
However, I have problems outputting what I want. This example demonstrates the issue:
$properties = @{'OSBuild'="910ef01.2.8779";
                'OSVersion'="CustomOS 3";
                'BIOSSerial'="A5FT-XT2H-5A4B-X9NM"}
$object = New-Object -TypeName PSObject -Prop $properties
Write-Output $object
$properties = @{'Site'="SQL site";
                'Server'="SQL Server";
                'Database'="SQL Database"}
$object = New-Object -TypeName PSObject -Prop $properties
Write-Output $object
This way I get a nice output of the first object displaying the OS data, but the second object containing the SQL data is never displayed. I've tried renaming the variable names, and a bunch of other different stuff, but no luck.
While troubleshooting this problem, I found similar problems with suggestions to just replace write-output with write-host. This gets me very confused. Why are some people strongly discouraging write-host, while other people encourage it?
And how exactly do I output these two objects in a fashionable manner? I do not fully understand the pipeline mechanism of Write-Output.
Just to clarify: the problem is only a display problem:
When outputting to the console, if the first object is table-formatted (if Format-Table is applied, which happens implicitly in your case), the display columns are locked in based on that first object's properties.
Since your second output object shares no properties with the first one, it contributes nothing to the table display and is therefore effectively invisible.
By contrast, if you programmatically process the script's output - assign it to a variable or send its output through the pipeline to another command - both objects will be there.
See Charlie Joynt's answer for a helpful example of assigning the two output objects to separate variables.
The simplest solution to the display problem is to explicitly format for display each input object individually - see below.
For a given single object inside a script, you can force formatted to-display (to-host) output with Out-Host:
$object | Out-Host # same as: Write-Output $object | Out-Host
Note, however, that this outputs directly and invariably to the console only and the object is then not part of the script's data output (the objects written to the success output stream, the stream with index 1).
In other words: if you try to assign the script's output to a variable or send its output to another command in a pipeline, that object won't be there.
See below for why Out-Host is preferable to Write-Host, and why it's better to avoid Write-Host in most situations.
To apply the technique ad hoc to a given script's output as a whole, so as to make sure you see all output objects, use:
./some-script.ps1 | % { $_ | Out-String } # % is the built-in alias of ForEach-Object
Note that here too you could use Out-Host, but the advantage of using Out-String is that it still allows you to capture the for-display representation in a file, if desired.
Here's a simple helper function (filter) that you can put in your $PROFILE:
# Once defined, you can use: ./some-script.ps1 | Format-Each
Filter Format-Each { $_ | Out-String }
PetSerAl's suggestion - ./some-script.ps1 | Format-List - works in principle too, but it switches the output from the usual table-style output to list-style output, with each property listed on its own line, which may be undesired.
Conversely, however, Format-Each, if an output object is (implicitly) table-formatted, prints a header for each object.
Why Write-Output doesn't help:
Write-Output doesn't help, because it writes to where output objects go by default anyway: the aforementioned success output stream, where data should go.
If the output stream's objects aren't redirected or captured in some form, they are sent to the host by default (typically, the console), where the automatic formatting is applied.
Also, use of Write-Output is rarely necessary, because simply not capturing or redirecting a command or expression implicitly writes to the success stream; another way of putting it:
Write-Output is implied.
Therefore, the following two statements are equivalent:
Write-Output $object # write $object to the success output stream
$object # same; *implicitly* writes $object to the success output stream
Why use of Write-Host is ill-advised, both here and often in general:
Assuming you do know the implications of using Write-Host in general - see below - you could use it for the problem at hand, but Write-Host applies simple .ToString() formatting to its input, which does not give you the nice, multi-line formatting that PowerShell applies by default.
Thus, Out-Host (and Out-String) were used above, because they do apply the same, friendly formatting.
Contrast the following two statements, which print a hash-table ([hashtable]) literal:
# (Optional) use of Write-Output: The friendly, multi-line default formatting is used.
# ... | Out-Host and ... | Out-String would print the same.
PS> Write-Output @{ foo = 1; bar = 'baz' }
Name Value
---- -----
bar baz
foo 1
# Write-Host: The hashtable's *entries* are *individually* stringified
# and the result prints straight to the console.
PS> Write-Host @{ foo = 1; bar = 'baz' }
System.Collections.DictionaryEntry System.Collections.DictionaryEntry
Write-Host did two things here, which resulted in near-useless output:
The [hashtable] instance's entries were enumerated and each entry was individually stringified.
The .ToString() stringification of hash-table entries (key-value pairs) is System.Collections.DictionaryEntry, i.e., simply the type name of the instance.
The primary reasons for avoiding Write-Host in general are:
It outputs directly to the host (console) rather than to PowerShell's success output stream.
As a beginner, you may mistakenly think that Write-Host is for writing results (data), but it isn't.
In bypassing PowerShell's system of streams, Write-Host output cannot be redirected - that is, it can neither be suppressed nor captured (in a file or variable).
That said, starting with PowerShell v5.0, you can now redirect its output via the newly introduced information stream (number 6; e.g., ./some-script.ps1 6>write-host-output.txt); however, that stream is more properly used with the new Write-Information cmdlet.
By contrast, Out-Host output still cannot be redirected.
That leaves just the following legitimate uses of Write-Host:
Creating end-user prompts and colored for-display-only representations:
Your script may have interactive prompts that solicit information from the user; using Write-Host - optionally with coloring via the -ForegroundColor and -BackgroundColor parameters - is appropriate, given that prompt strings should not become part of the script's output and users also provide their input via the host (typically via Read-Host).
Similarly, you can use Write-Host with selective coloring to explicitly create friendlier for-display-only representations.
Quick prototyping: writing status/diagnostic information directly to the console in a quick-and-dirty way, without interfering with a script's data output.
However, it is generally better to use Write-Verbose and Write-Debug in such cases.
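For example (a minimal illustration; the message text is made up):
# Writes to the verbose stream (4), not the success stream; normally silent,
# shown here via -Verbose (or set $VerbosePreference = 'Continue').
Write-Verbose 'Fetching data...' -Verbose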
Generally speaking the expectation is for script/functions to return a single "type" of object, often with many instances. For example, Get-Process returns a load of processes, but they all have the same fields. As you'll have seen from the tutorials, etc. you can then pass the output of Get-Process along a pipeline and process the data with subsequent cmdlets.
In your case you are returning two different types of object (i.e. with two different sets of properties). PS outputs the first object, but not the second one (which doesn't match the first) as you discovered. If you were to add extra properties to the first object that match those used in the second one, then you'd see both objects.
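A hypothetical sketch of that workaround, using the question's objects as $object1 and $object2 and emitting both from the script (property names taken from the question):
# Pad the first object with empty placeholders for the second object's
# properties, so the implicit table gets columns for both.
$object1 | Select-Object *, @{ n = 'Site'; e = { $null } },
                            @{ n = 'Server'; e = { $null } },
                            @{ n = 'Database'; e = { $null } }
$object2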
Write-Host doesn't care about this sort of stuff. The push-back against using this is mainly to do with (1) it being a lazy way to give feedback about script progress, i.e. use Write-Verbose or Write-Debug instead and (2) it being imperfect when it comes to passing objects along a pipeline, etc.
Clarification on point (2), helpfully raised in the comments to this answer:
Write-Host is not just imperfect with respect to the pipeline /
redirection / output capturing, you simply cannot use it for that in
PSv4 and earlier, and in PSv5+ you have to awkwardly use 6>; also,
Write-Host stringifies with .ToString(), which often produces
unhelpful representations
If your script is really just meant to print data to the console then go ahead and Write-Host.
Alternatively, you can return multiple objects from a script or function. Using return or Write-Output, just return the objects comma-separated. For example:
Test-WriteOutput.ps1
$object1 = [PSCustomObject]@{
    OSBuild = "910ef01.2.8779"
    OSVersion = "CustomOS 3"
    BIOSSerial = "A5FT-XT2H-5A4B-X9NM"
}
$object2 = [PSCustomObject]@{
    Site = "SQL site"
    Server = "SQL Server"
    Database = "SQL Database"
}
Write-Output $object1,$object2
Then run the script, assigning the output to two variables:
$a,$b = .\Test-WriteOutput.ps1
You'll see that $a is $object1 and $b is $object2.
Use Write-Host; Write-Output is for the pipeline (and, by default, ends up on the console anyway).

PowerShell: write variable to stdout as it's running

In my PowerShell script I have a variable which captures its command's output:
$files= dosomething
Running dosomething alone produces output to stdout. My question is: how do I get that same behavior out of my variable assignment as it's running? Mostly as a way to see progress.
Maybe it's best to run the command and then somehow capture that to a variable?
UPDATE 1:
here's the command I'm trying to monitor:
$remotefiles = rclone md5sum remote: --dry-run --max-age "$lastrun" --filter-from "P:\scripts\filter-file.txt"
UPDATE 2:
running rclone md5sum remote: --dry-run --max-age "$lastrun" --filter-from "P:\scripts\filter-file.txt" directly produces output ...
The simplest solution is to enclose your assignment in (...), the grouping operator, which passes the result of the assignment through; e.g.:
# Enclosing an assignment in (...) passes the value that is assigned through.
PS> ($date = Get-Date)
Friday, May 21, 2021 7:43:48 PM
Note:
Use of (...) implies that all command output is captured first, before passing it through; that is, you won't see the passed-through output until after the command has terminated - see the bottom section for a solution.
An assignment by default captures only success output (from PowerShell commands) / stdout output (from external commands).
If you (also) want to capture error-stream / stderr output, use 2>&1 to merge the error-stream / stderr output into the success stream / stdout (see the conceptual about_Redirection topic); in the case of the external rclone program (for brevity, the actual arguments are omitted and represented with placeholder ...):
# Merge stderr into stdout and capture the combination, and also pass it through.
([string[]] $remotefiles = rclone ... 2>&1)
Note: the [string[]] type constraint ensures that the stderr lines too are captured as strings; by default, PowerShell returns them as [System.Management.Automation.ErrorRecord] instances. If you instead want the usual assignment logic, i.e. to capture a [string] scalar if there's only one output line and an array only for multiple lines, use
($remotefiles = rclone ... 2>&1 | % ToString) instead.
As Abraham Zinala points out, PowerShell cmdlets (including advanced functions and scripts) support the common -OutVariable parameter, which also allows capturing a cmdlet's success output in addition to outputting it to the success output stream:
# Outputs *and* saves in variable $date.
# Note that the -OutVariable argument must *not* start with "$"
PS> Get-Date -OutVariable date
Friday, May 21, 2021 7:43:48 PM
Note: Using 2>&1 does not cause the -OutVariable to also capture error-stream output, but you can capture errors separately, using the common -ErrorVariable parameter.
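For example (a hedged sketch; the missing path is made up):
# Success output is captured in $out, errors in $errs (again, no "$" on the variable names).
Get-Item $PSHOME, C:\definitely\missing.txt -OutVariable out -ErrorVariable errs -ErrorAction SilentlyContinue
$errs.Count   # -> 1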
Caveat: Surprisingly, the -OutVariable target variable[1] always receives a [System.Collections.ArrayList] instance, which is unusual in two respects:
PowerShell generally uses regular array ([object[]]) instances to collect multiple output objects, such as in an assignment.
While the difference in (read-only) access between an array list and a regular array will rarely matter, the more problematic aspect is that an array list is used even when the cmdlet outputs just ONE object, unlike in assignments; in the case at hand this means:
$date = Get-Date stores a [datetime] instance in variable $date
whereas Get-Date -OutVariable date stores a single-element array list containing a [datetime] instance in $date.
This unexpected behavior is the subject of GitHub issue #3154.
By contrast, Tee-Object's -Variable parameter, as shown in SADIK KUZU's helpful answer, does not exhibit this unexpected behavior; that is, it behaves the same way as an assignment.
However, given that use of Tee-Object is by definition an additional call and therefore requires an extra pipeline segment, you do pay a performance penalty (though that may not matter in practice).
Solution for a long-running (external) command whose output should be passed through as it is being produced:
# Merge stderr into stdout and output in streaming fashion,
# while capturing the (stringified) lines in variable $remoteFiles
rclone ... 2>&1 | % ToString -OutVariable remoteFiles
Note: % is the built-in alias for the ForEach-Object cmdlet; ToString calls the .ToString() method on each object, which is necessary for converting the stderr output lines to strings, because PowerShell by default wraps them in [System.Management.Automation.ErrorRecord] instances; if conversion to strings isn't desired or necessary, replace
% ToString -OutVariable remoteFiles with
Tee-Object -Variable remoteFiles.
[1] The same applies to the other common -*Variable parameters, such as -ErrorVariable.
Tee-Object saves command output in a file or variable and also sends it down the pipeline.
Get-Process notepad | Tee-Object -Variable proc | Select-Object processname,handles
ProcessName Handles
----------- -------
notepad 43
notepad 37
notepad 38
notepad 38
This example gets a list of the processes running on the computer, saves them to the $proc variable, and pipes them to Select-Object.
Out-Host will send output to the console without sending it down the pipeline:
Function Test-Output {
    param()
    $list = @(
        'Dogs',
        'Cats',
        'Monkeys'
    )
    $list | Out-Host
    $list | Select-Object -First 1
}
$output = Test-Output
# When running the line above Dogs, Cats, and Monkeys will display in the console
# Dogs
# Cats
# Monkeys
# However the variable $output will only contain 'Dogs'
$output
# Dogs

PowerShell: How to get count of piped collection?

Suppose I have a process that generates a collection of objects. For a very simple example, consider $(1 | get-member). I can get the number of objects generated:
PS C:\WINDOWS\system32> $(1 | get-member).count
21
or I can do something with those objects.
PS C:\WINDOWS\system32> $(1 | get-member) | ForEach-object {write-host $_.name}
CompareTo
Equals
...
With only 21 objects, doing the above is no problem. But what if the process generates hundreds of thousands of objects? Then I don't want to run the process once just to count the objects and then run it again to execute what I want to do with them. So how can I get a count of objects in a collection sent down the pipeline?
A similar question was asked before, and the accepted answer was to use a counter variable inside the script block that works on the collection. The problem is that I already have that counter and what I want is to check that the outcome of that counter is correct. So I don't want to just count inside the script block. I want a separate, independent measure of the size of the collection that I sent down the pipeline. How can I do that?
If processing and counting is needed:
Doing your own counting inside a ForEach-Object script block is your best bet to avoid processing in two passes.
The problem is that I already have that counter and what I want is to check that the outcome of that counter is correct.
ForEach-Object is reliably invoked for each and every input object, including $null values, so there should be no need to double-check.
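A quick way to see this for yourself:
# The script block runs 3 times - once even for the $null input.
$null, 1, 2 | ForEach-Object { "got: [$_]" }
# got: []
# got: [1]
# got: [2]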
If you want a cleaner separation of processing and counting, you can pass multiple -Process script blocks to ForEach-Object (in this example, { $_ + 1 } is the input-processing script block and { ++$count } is the input-counting one):
PS> 1..5 | ForEach-Object -Begin { $count = 0 } `
-Process { $_ + 1 }, { ++$count } `
-End { "--- count: $count" }
2
3
4
5
6
--- count: 5
Note that, due to a quirk in ForEach-Object's parameter binding, passing -Begin and -End script blocks is actually required in order to pass multiple -Process (per-input-object) blocks; pass $null if you don't actually need -Begin and/or -End - see GitHub issue #4513.
Also note that the $count variable lives in the caller's scope, and is not scoped to the ForEach-Object call; that is, $count = 0 potentially updates a preexisting $count variable, and, if it didn't previously exist, lives on after the ForEach-Object call.
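To illustrate that scoping behavior (hypothetical snippet):
$count = 42
1..3 | ForEach-Object -Begin { $count = 0 } -Process { ++$count }
$count   # -> 3; the preexisting variable was overwritten and lives on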
If only counting is needed:
Measure-Object is the cmdlet to use with large, streaming input collections in the pipeline[1]:
The following example generates 100,000 integers one by one and has Measure-Object count them one by one, without collecting the entire input in memory.
PS> (& { $i=0; while ($i -lt 1e5) { (++$i) } } | Measure-Object).Count
100000
Caveat: Measure-Object ignores $null values in the input collection - see GitHub issue #10905.
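For example:
# $null is skipped: only 2 of the 3 input objects are counted.
($null, 1, 2 | Measure-Object).Count   # -> 2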
Note that while counting input objects is Measure-Object's default behavior, it supports a variety of other operations as well, such as summing (-Sum) and averaging (-Average), optionally combined in a single invocation.
[1] Measure-Object, as a cmdlet, is capable of processing input in a streaming fashion, meaning it counts objects it receives one by one, as they're being received, which means that even very large streaming input sets (those also created one by one, such as enumerating the rows of a large CSV file with Import-Csv) can be processed without the risk of running out of memory - there is no need to load the input collection as a whole into memory. However, if (a) the input collection already is in memory, or (b) it can fit into memory and performance is important, then use (...).Count.

Pipeline record invalidation

I want to chain cmdlets together in a pipeline, starting with an Import-Csv of a file of NN records to process; at the end I want to write out a result file for all NN records and the accumulated results of processing. Along the way I may want to add data to the pipeline that says "don't process this record any further", but still pass it along in the pipeline.
I envisioned this looking like this:
Import-Csv input | step-1 -env DEV | step-2 | step-3 | Export-Csv result
Each cmdlet being written to pipe all $_ properties for each record and keep them in the pipeline.
What's the best way to read some sort of "CanContinue" property and if it is false short circuit processing and just pass it along to the next cmdlet in the pipeline without processing?
I'm assuming you don't want this flag to be part of the resulting CSV. The way I see it, you can use two similar approaches: add the flag to the object being processed before returning it, or wrap the object in another object (which contains the flag and another property that holds the original object).
For now I'm going to explore the first option, where you add the property. I'm going to change it from a positive CanContinue to a negative, DoNotContinue, so that its non-existence can coalesce to $false (do continue).
To do this, in each of your processing functions, just check the value of the DoNotContinue property. If it's $true, return the original object you received without additional processing.
If it's $false, you can do your processing and if the conditions are met that processing should stop, you force add the property with $true:
Process {
    # processing done
    $MyObj |
        Add-Member -NotePropertyName DoNotContinue -NotePropertyValue $true -Force -PassThru
}
All such commands can handle it this way.
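For instance, a hypothetical sketch of what step-2 from the question might look like with this manual check:
function step-2 {
  [CmdletBinding()]
  param(
    [Parameter(ValueFromPipeline)] $InputObject
  )
  Process {
    if ($InputObject.DoNotContinue) { return $InputObject }  # short-circuit: pass along untouched
    # ... actual processing here ...
    $InputObject   # pass the (possibly newly flagged) object on
  }
}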
Now when it comes to the end of the pipeline, you don't want this property written to the CSV. For that, strip it off with Select-Object:
Import-Csv input |
step-1 -env DEV |
step-2 |
step-3 |
Select-Object -Property * -ExcludeProperty DoNotContinue |
Export-Csv result
Bonus:
Refer back to my answer on another question of yours, and instead of manually checking for the property, define it as a parameter in your processing cmdlets with [Parameter(ValueFromPipelineByPropertyName)] like so:
param(
    [Parameter(ValueFromPipelineByPropertyName)]
    [Switch]
    $DoNotContinue
)
Why do that? Because you let PowerShell process the object for you and you only have to check the value of $DoNotContinue. It also allows you to override that value for a particular call.
(in this case, I'd rename it to $DoNotProcess or $SkipProcessing or something; remember you can also use [Alias()] if you want it to have multiple names)
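Putting the parameter-based variant together (a hypothetical sketch, keeping the $DoNotContinue name for continuity; a separate ValueFromPipeline parameter still carries the object itself):
function step-2 {
  [CmdletBinding()]
  param(
    [Parameter(ValueFromPipeline)] $InputObject,
    [Parameter(ValueFromPipelineByPropertyName)] [Switch] $DoNotContinue
  )
  Process {
    if ($DoNotContinue) { return $InputObject }  # PowerShell bound the flag from the object's property
    # ... actual processing here ...
    $InputObject
  }
}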

How to transpose data in PowerShell

I have a file that looks like this:
a,1
b,2
c,3
a,4
b,5
c,6
(...repeat 1,000s of lines)
How can I transpose it into this?
a,b,c
1,2,3
4,5,6
Thanks
Here's a brute-force one-liner from hell that will do it:
PS> Get-Content foo.txt |
Foreach -Begin {$names=@();$values=@();$hdr=$false;$OFS=',';
function output { if (!$hdr) {"$names"; $global:hdr=$true}
"$values";
$global:names=@();$global:values=@()}}
-Process {$n,$v = $_ -split ',';
if ($names -contains $n) {output};
$names+=$n; $values+=$v }
-End {output}
a,b,c
1,2,3
4,5,6
It's not what I'd call elegant but should get you by. This should copy/paste correctly as-is. However, if you reformat it to what is shown above, you will need to put backticks after the last curly brace on both the Begin and Process scriptblocks. This script requires PowerShell 2.0, as it relies on the new -split operator.
This approach makes heavy use of the Foreach-Object cmdlet. Normally when you use Foreach-Object (alias is Foreach) in the pipeline you specify just one scriptblock like so:
Get-Process | Foreach {$_.HandleCount}
That prints out the handle count for each process. This usage of Foreach-Object uses the -Process scriptblock implicitly which means it executes once for each object it receives from the pipeline. Now what if we want to total up all the handles for each process? Ignore the fact that you could just use Measure-Object HandleCount -Sum to do this, I'll show you how Foreach-Object can do this. As you see in the original solution to this problem, Foreach can take both a Begin scriptblock that is executed once for the first object in the pipeline and a End scripblock that executes when there are no more objects in the pipeline. Here's how you can total the handle count using Foreach-Object:
gps | Foreach -Begin {$sum=0} -Process {$sum += $_.HandleCount } -End {$sum}
Relating this back to the problem solution: in the Begin scriptblock I initialize some variables to hold the arrays of names and values, as well as a bool ($hdr) that tells me whether or not the header has been output (we only want to output it once). The next mildly mind-blowing thing is that I also declare a function (output) in the Begin scriptblock, which I call from both the Process and End scriptblocks to output the current set of data stored in $names and $values.
The only other trick is that the Process scriptblock uses the -contains operator to see if the current line's field name has already been seen before. If so, then output the current names and values and reset those arrays to empty. Otherwise just stash the name and value in the appropriate arrays so they can be saved later.
BTW the reason the output function needs to use the global: specifier on the variables is that PowerShell performs a "copy-on-write" approach when a nested scope modifies a variable defined outside its scope. However when we really want that modification to occur at the higher scope, we have to tell PowerShell that by using a modifier like global: or script:.
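A quick demonstration of that behavior, run at the prompt (hypothetical minimal example):
$x = 1
function bump { $x = $x + 1 }            # assignment creates a local copy; caller's $x is untouched
function bumpGlobal { $global:x += 1 }   # explicitly targets the global variable
bump;       $x   # -> 1
bumpGlobal; $x   # -> 2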