How does the PowerShell Pipeline Concept work? - powershell

I understand that PowerShell piping works by taking the output of one cmdlet and passing it to another cmdlet as input. But how does it go about doing this?
Does the first cmdlet finish and then pass all the output variables across at once, which are then processed by the next cmdlet?
Or is each output object from the first cmdlet taken one at a time and run through all of the remaining piped cmdlets?

You can see how pipeline order works with a simple bit of script:
function a {begin {Write-Host 'begin a'} process {Write-Host "process a: $_"; $_} end {Write-Host 'end a'}}
function b {begin {Write-Host 'begin b'} process {Write-Host "process b: $_"; $_} end {Write-Host 'end b'}}
function c { Write-Host 'c' }
1..3 | a | b | c
Outputs:
begin a
begin b
process a: 1
process b: 1
process a: 2
process b: 2
process a: 3
process b: 3
end a
end b
c

The PowerShell pipeline works asynchronously: the output of the first cmdlet is available to the second cmdlet immediately, one object at a time (even if the first cmdlet has not finished executing).
For example if you run the below line:
dir -Recurse | Out-File C:\a.txt
and then stop the execution by pressing Ctrl+C, you will see that part of the directory listing has already been written to the text file.
A better example is the following code (which is indeed useful for deleting all .tmp files on drive C:):
Get-ChildItem C:\ -Include *.tmp -Recurse | ForEach-Object { Remove-Item $_.FullName }
Each time, $_ in the second command receives a single file object.
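You can watch this one-object-at-a-time behavior directly by delaying each object as it passes through (the function name here is made up for illustration):

```powershell
# Each object takes one second to pass through Slow, and the downstream
# scriptblock receives it immediately afterward -- not after all three are done.
function Slow { process { Start-Sleep -Seconds 1; $_ } }

1..3 | Slow | ForEach-Object { "received $_ at $(Get-Date -Format HH:mm:ss)" }
```

The timestamps arrive one second apart, confirming that the downstream command does not wait for the upstream command to finish.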

Both answers thus far give you some good information about pipelining. However, there is more to be said.
First, to directly address your question, you posited two possible ways the pipeline might work. And they are both right... depending on the cmdlets on either side of the pipe!
However, the way the pipeline should work is closer to your second notion: objects are processed one at a time. (Though there's no guarantee that an object will go all the way through before the next one is started because each component in the pipeline is asynchronous, as S Nash mentioned.)
So what do I mean by "it depends on your cmdlets" ?
If you are talking about cmdlets supplied by Microsoft, they likely all work as you would expect, passing each object through the pipeline as efficiently as possible. But if you are talking about cmdlets that you write, it depends on how you write them: it is just as easy to write cmdlets that fail to do proper pipelining as those that succeed!
There are two principal failure modes:
generating all output before emitting any into the pipeline, or
collecting all pipeline input before processing any.
What you want to strive for, of course, is to process each input as soon as it is received and emit its output as soon as it is determined. For detailed examples of all of these see my article, Ins and Outs of the PowerShell Pipeline, just published on Simple-Talk.com.
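The failure modes above can be sketched with two hypothetical functions (the names and the squaring logic are purely illustrative):

```powershell
# Failure mode 1: generate all output before emitting any of it.
function Get-SquaresBuffered {
    param([int[]] $Numbers)
    $results = @()
    foreach ($n in $Numbers) { $results += $n * $n }
    $results   # nothing reaches the pipeline until every square is computed
}

# Streaming version: emit each result as soon as it is determined.
function Get-SquaresStreaming {
    process { $_ * $_ }   # runs once per object arriving on the pipeline
}

1..3 | Get-SquaresStreaming   # each square flows downstream immediately
```

A downstream command such as Select-Object -First 1 can stop the streaming version after a single computation, while the buffered version always does all of its work up front.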

Related

Powershell variable assignment vs pipeline

What is the fundamental difference between these two commands?
$myVar = & "notepad.exe"
and
& "notepad.exe" | Set-Variable "myVar"
With the first one, the command returns immediately without waiting for the exe to terminate, which was not what I expected.
With the second one (or anything else with pipeline, such as | Out-File or | Set-Content), the command waits properly for the exe to write a result in stdout and terminate.
A pipeline simply takes the output of the first command and passes it as input to the second. Pipelines act like a series of connected pipe segments; items moving along the pipeline must pass through each segment.
In your case, PowerShell is actually waiting in both cases, but if you use Measure-Command you will see a difference in execution time, which is better in the case of $myVar = & "C:\path to\program.exe" $argument
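If you want to check the timing claim yourself, here is a quick sketch with Measure-Command (notepad.exe is only an example; substitute any short-lived program, and expect the absolute numbers to vary):

```powershell
# Time both invocation styles and compare TotalMilliseconds on the results.
Measure-Command { $myVar = & "notepad.exe" }
Measure-Command { & "notepad.exe" | Set-Variable "myVar" }
```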

Pipeline semantics aren't propagated into Where-Object

I use the following command to run a pipeline.
.\Find-CalRatioSamples.ps1 data16 `
| ? {-Not (Test-GRIDDataset -JobName DiVertAnalysis -JobVersion 13 -JobSourceDatasetName $_ -Exists -Location UWTeV-linux)}
The first is a custom script of mine, and runs very fast (milliseconds). The second is a custom command, also written by me (see https://github.com/LHCAtlas/AtlasSSH/blob/master/PSAtlasDatasetCommands/TestGRIDDataset.cs). It is very slow.
Actually, it isn't so slow processing each line of input. The setup before the first line of input can be processed is very expensive. That done, however, it goes quite quickly. So all the expensive code gets executed once, and only the fairly fast code needs to be executed for each new pipeline input.
Unfortunately, when I use the ? { } construct above, it seems like PowerShell doesn't keep the pipeline as it did before. It now calls my command afresh for each line of input, causing the command to redo all the setup every time.
Is there something I can change in how I invoke the pipe-line? Or in how I've coded up my cmdlet to prevent this from happening? Or am I stuck because this is just the way Where-Object works?
It is working as designed. You're starting a new (nested) pipeline inside the scriptblock when you call your command.
If your function is doing the expensive code in its Begin block, then you need to directly pipe the first script into your function to get that advantage.
.\Find-CalRatioSamples.ps1 data16 |
Test-GRIDDataset -JobName DiVertAnalysis -JobVersion 13 -Exists -Location UWTeV-linux |
Where-Object { $_ }
But then it seems that you are not returning the objects you want (the original).
One way you might be able to change Test-GRIDDataset is to implement a -PassThru switch, though you aren't actually accepting the full objects from your original script, so I'm unable to tell if this is feasible; but the code you wrote seems to be retrieving... stuff(?) from somewhere based on the name. Perhaps that would be sufficient? When -PassThru is specified, send the objects through the pipeline if they exist (rather than just a boolean of whether or not they do).
Then your code would look like this:
.\Find-CalRatioSamples.ps1 data16 |
Test-GRIDDataset -JobName DiVertAnalysis -JobVersion 13 -Exists -Location UWTeV-linux -PassThru
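A minimal sketch of how such a -PassThru switch could be wired into a pipeline-aware command (the function name and body are illustrative; this is not the actual Test-GRIDDataset implementation):

```powershell
function Test-Dataset {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)] [string] $Name,
        [switch] $PassThru
    )
    begin {
        # Expensive one-time setup runs here, once for the whole pipeline.
    }
    process {
        $exists = $true   # placeholder for the real existence check
        if ($PassThru) {
            if ($exists) { $Name }   # pass the original object downstream
        }
        else {
            $exists                  # default behavior: emit a boolean
        }
    }
}
```

Because the existence check lives in the process block and the setup in the begin block, the setup cost is paid once no matter how many names arrive on the pipeline.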

Foreach of a piped number array isn't breaking properly

I don't quite understand this: why doesn't the following code work?
"start"
1..5 | foreach {
    "$_"
    break
}
"stop"
I've done a couple of tests, and this code does work properly:
"start"
foreach ($num in 1..5) {
    "$num"
    break
}
"stop"
Is there a way to make the first example run properly? The last line of output should be "stop".
Like so:
start
1
stop
First, you should know that you are using two entirely different language features when you use foreach ($thing in $things) {} vs. $things | foreach { }.
The first is the built-in foreach statement, and the second is an alias for ForEach-Object, and they work very differently.
ForEach-Object runs the scriptblock for each of the items, and it works within a pipeline.
The break statement in that case is not breaking out of a loop at all: the ForEach-Object scriptblock is not a loop body, so break looks for an enclosing loop, finds none, and tears down the entire pipeline (and can end the script), which is why "stop" is never printed.
How you would go about limiting the results depends on what you want to do.
If you just want to stop producing results, don't return anything once the condition is met. You'll still run every iteration, but the results will be correct.
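For example, filtering inside the scriptblock keeps the output correct even though all five iterations still run:

```powershell
# All five iterations execute, but only values below 2 are emitted.
1..5 | ForEach-Object { if ($_ -lt 2) { "$_" } }
# emits only: 1
```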
If you only need to return a certain number of items, like the first N items, the best way (from PowerShell v3 on) is to add Select-Object:
1..10 | ForEach-Object {
$_*2
} | Select-Object -First 5
This will only execute 5 times, and it will return the sequence 2,4,6,8,10.
This is because of how the pipeline works where each object gets sent through each cmdlet, and Select-Object can stop the pipeline so it doesn't keep executing.
Before version 3.0, the pipeline cannot be stopped that way; although the results will be correct, you won't have prevented the extra executions.
If you give more details on what your conditions are for exiting, I could give more input as to how you'd want to approach that particular problem (which may involve not using ForEach-Object).

Powershell: Write-Output -NoEnumerate not suppressing output to console

I'm writing a function in PowerShell that I want to be called via other PowerShell functions as well as be used as a standalone function.
With that objective in mind, I want to send a message down the pipeline using Write-Output to these other functions.
However, I don't want Write-Output to write to the PowerShell console. The TechNet page for Write-Output states:
Write-Output:
Sends the specified objects to the next command in the pipeline. If the command is the last command in the pipeline, the objects are displayed in the console.
-NoEnumerate:
By default, the Write-Output cmdlet always enumerates its output. The NoEnumerate parameter suppresses the default behavior, and prevents Write-Output from enumerating output. The NoEnumerate parameter has no effect on collections that were created by wrapping commands in parentheses, because the parentheses force enumeration.
For some reason, this -NoEnumerate switch will not work for me in either the PowerShell ISE or the PowerShell CLI. I always get output to my screen.
$data = "Text to suppress"
Write-Output -InputObject $data -NoEnumerate
This will always return 'Text to suppress' (no quotes).
I've seen people suggest to pipe to Out-Null like this:
$data = "Text to suppress"
Write-Output -InputObject $data -NoEnumerate | Out-Null
$_
This suppresses screen output, but when I use $_ I have nothing in my pipeline afterwards which defeats the purpose of me using Write-Output in the first place.
System is Windows 2012 with PowerShell 4.0
Any help is appreciated.
Write-Output doesn't write to the console unless it's the last command in the pipeline. In your first example, Write-Output is the only command in the pipeline, so its output is being dumped to the console. To keep that from happening, you need to send the output somewhere. For example:
Write-Output 5
will send "5" to the console, because Write-Output is the last and only command in the pipeline. However:
Write-Output 5 | Start-Sleep
no longer does that because Start-Sleep is now the next command in the pipeline, and has therefore become the recipient of Write-Output's data.
Try this:
Write your function as you have written it with Write-Output as the last command in the pipeline. This should send the output up the line to the invoker of the function. It's here that the invoker can use the output, and at the same time suppress writing to the console.
MyFunction blah, blah, blah | % {do something with each object in the output}
I haven't tried this, so I don't know if it works. But it seems plausible.
My question is not the greatest.
First of all, Write-Output -NoEnumerate doesn't suppress output from Write-Output.
Secondly, Write-Output is supposed to write its output. Trying to make it stop is a silly goal.
Thirdly, piping Write-Output to Out-Null or Out-File means that the value you gave Write-Output will not continue down the pipeline which was the only reason I was using it.
Fourth, $suppress = Write-Output "String to Suppress" also doesn't pass the value down the pipeline.
So I'm answering my question by realizing if it prints out to the screen that's really not a terrible thing and moving on. Thank you for your help and suggestions.
Explicitly storing the output in a variable would be more prudent than trying to use an implicit automatic variable. As soon as another command is run, that implicit variable will lose the prior output stored in it. No automatic variable exists to do what you're asking.
If you want to type out a set of commands without storing everything in temporary variables along the way, you can write a scriptblock at the command line as well, and make use of the $_ automatic variable you've indicated you're trying to use.
You just need to start a new line using shift + enter and write the code block as you would in a normal scriptblock - in which you could use the $_ automatic variable as part of a pipeline.
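For example (using Get-Process purely for illustration), a multi-line scriptblock typed at the prompt can use $_ for each pipeline object:

```powershell
Get-Process | ForEach-Object {
    # $_ is the current pipeline object inside this scriptblock
    '{0}: {1:N0} KB' -f $_.Name, ($_.WorkingSet64 / 1KB)
}
```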

Is it possible to make a cmdlet work with all items being piped into it at once?

Instead of counting sheep this evening, I created a cmdlet that lists all duplicate files in a directory. It's dirt stupid simple and it can only work with all files in a directory, and I'm not keen on reinventing the wheel to add filtering, so here's what I want to do with it instead:
dir | duplicates | del
The only catch is that, normally, any given command in the pipe only works with one object at a time, which will do no good whatsoever for detecting duplicates. (Of course there are no duplicates in a set of one, right?)
Is there a trick I can use to have the second command in the chain collect all the output from the first before doing its job and passing things on to the third command?
You can work with a single file at a time; you just have to store each file you receive in the process block and then process all the files in an end block. This is how commands like Group-Object and Sort-Object work. They can't group or sort until they have all the input. Once they have all the input, they do their operation and then begin streaming the results down the pipeline again in grouped/sorted order.
So I actually came up with the answer while I was in the shower and came back to find Keith had already provided it. Here's an example anyway.
begin
{
    Add-Type -Path ($env:USERPROFILE + '\bin\CollectionHelper.cs')
    [string[]] $files = @()
}
process
{
    $files += $_.FullName
}
end
{
    [JMA.CollectionHelper]::Duplicates($files)
}