Pipeline record invalidation

I want to chain cmdlets together in a pipeline, starting with an Import-Csv of a file with NN records to process; at the end, I want to write out a result file containing all NN records and the accumulated results of processing. Along the way, I may want to add data to a record that says "don't process this record any further", while still passing it along the pipeline.
I envisioned it looking like this:
Import-Csv input | step-1 -env DEV | step-2 | step-3 | Export-Csv result
Each cmdlet would be written to pass along all of the $_ properties of each record, keeping them in the pipeline.
What's the best way to read some sort of "CanContinue" property and, if it is false, short-circuit processing and just pass the record along to the next cmdlet in the pipeline without processing it?

I'm assuming you don't want this flag to be part of the resulting CSV. The way I see it, you can use two similar approaches: add the flag to the object being processed before returning it, or wrap the object in another object (which contains the flag and another property that holds the original object).
For now, I'm going to explore the first option, where you add the property. I'm going to change it from a positive CanContinue to a negative DoNotContinue, so that its non-existence coalesces to $false (i.e., do continue).
To do this, in each of your processing functions, check the value of the DoNotContinue property. If it's $true, return the original object you received without additional processing.
If it's $false, do your processing, and if the conditions are met that processing should stop, force-add the property with a value of $true:
process {
    # ... processing done; $MyObj stands for the object being processed ...
    $MyObj |
        Add-Member -NotePropertyName DoNotContinue -NotePropertyValue $true -Force -PassThru
}
All such commands can handle it this way.
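Putting both checks together, a single processing step might look like the following minimal sketch. The function name Invoke-ExampleStep, the Id column, and the "empty Id" stop rule are hypothetical placeholders (the question's -env parameter is omitted); only the DoNotContinue handling reflects the approach described above.

```powershell
# Hypothetical processing step: 'Invoke-ExampleStep', the 'Id' column and the
# "empty Id" stop rule are placeholders, not part of the original question.
function Invoke-ExampleStep {
    process {
        # Already flagged by an earlier step: pass the record through untouched.
        # (Outside Set-StrictMode, a missing DoNotContinue property yields $null, i.e. "continue".)
        if ($_.DoNotContinue) {
            $_
            return
        }

        # ... real processing of $_ would go here ...

        if ([string]::IsNullOrEmpty($_.Id)) {
            # Stop condition met (assumed rule): flag the record, but keep it in the pipeline.
            $_ | Add-Member -NotePropertyName DoNotContinue -NotePropertyValue $true -Force -PassThru
        }
        else {
            $_
        }
    }
}

# Usage: the second record gets flagged; both records stay in the pipeline.
$records = [pscustomobject]@{ Id = '1' }, [pscustomobject]@{ Id = '' }
$result = $records | Invoke-ExampleStep
```

Running $result through the same (or another) step again relays the flagged record unchanged, which is what lets every step in the chain share this pattern.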
Now when it comes to the end of the pipeline, you don't want this property written to the CSV. For that, strip it off with Select-Object:
Import-Csv input |
step-1 -env DEV |
step-2 |
step-3 |
Select-Object -Property * -ExcludeProperty DoNotContinue |
Export-Csv result
Bonus:
Refer back to my answer on another question of yours, and instead of manually checking for the property, define it as a parameter in your processing cmdlets with [Parameter(ValueFromPipelineByPropertyName)] like so:
param(
[Parameter(ValueFromPipelineByPropertyName)]
[Switch]
$DoNotContinue
)
Why do that? Because you let PowerShell process the object for you and you only have to check the value of $DoNotContinue. It also allows you to override that value for a particular call.
(in this case, I'd rename it to $DoNotProcess or $SkipProcessing or something; remember you can also use [Alias()] if you want it to have multiple names)
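A sketch of the Bonus approach, assuming PowerShell v3 or later: the whole record binds to an -InputObject parameter via ValueFromPipeline, while a DoNotContinue property, when a record has one, binds to the switch via ValueFromPipelineByPropertyName. The function name Invoke-BonusStep, the SkipProcessing alias, and the Processed column are hypothetical placeholders.

```powershell
# Hypothetical step showing the "Bonus" approach; all names are illustrative only.
function Invoke-BonusStep {
    [CmdletBinding()]
    param(
        # The whole record, bound from the pipeline.
        [Parameter(ValueFromPipeline)]
        [psobject] $InputObject,

        # Bound from a DoNotContinue property on the record, if it has one;
        # the alias gives the same switch a second (hypothetical) name.
        [Parameter(ValueFromPipelineByPropertyName)]
        [Alias('SkipProcessing')]
        [switch] $DoNotContinue
    )
    process {
        if ($DoNotContinue) {
            # Flagged: relay the record unchanged.
            return $InputObject
        }
        # ... real processing would go here; as a placeholder, mark the record as processed.
        $InputObject | Add-Member -NotePropertyName Processed -NotePropertyValue $true -Force -PassThru
    }
}

$records = [pscustomobject]@{ Name = 'a' }, [pscustomobject]@{ Name = 'b'; DoNotContinue = $true }
$result = $records | Invoke-BonusStep
```

Passing -DoNotContinue explicitly on the command line is the per-call override mentioned above: once bound that way, the switch is not rebound from the records, so every record is relayed unprocessed.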

Related

In a PowerShell pipeline, how to wait for a cmdlet to complete before proceeding to the next one?

Consider the following function:
function myFunction {
100
sleep 1
200
sleep 1
300
sleep 1
}
As you can see, it emits those values one by one down the pipeline.
But I want to wait for all the values to be emitted before going on, like:
myFunction | waitForThePreviousCommandToComplete | Format-Table
I want the Format-Table above to receive the entire array, instead of one-by-one items.
Is it even possible in Powershell?

Use (...), the grouping operator, in order to collect a command's output in full first, before sending it to the success output stream (the pipeline).
# Due to (...), doesn't send myfunction's output to Format-Table until it has run
# to completion and all its output has been collected.
(myFunction) | Format-Table
# Also works for entire pipelines.
(100, 200, 300 | ForEach-Object { $_; Start-Sleep 1 }) | Format-Table
Note:
If you need to up-front collect the output of multiple commands (pipelines) and / or language statements, use $(...), the subexpression operator instead, e.g. $(Get-Date -Year 2020; Get-Date -Year 2030) | Format-Table; the next point applies to it as well.
Whatever output was collected by (...) is enumerated, i.e., if the collected output is an enumerable, its elements are emitted one by one to the success output stream - albeit without any delay at that point.
Note that the collected output is invariably an enumerable (an array of type [object[]]) if two or more output objects were collected, but it can also be one in the unusual event that a single object that itself is an enumerable was collected.
E.g., (Write-Output -NoEnumerate 1, 2, 3) | Measure-Object reports a count of 3, even though Write-Output -NoEnumerate output the given array as a single object (without (...), Measure-Object would report 1).
Typically, commands (cmdlets, functions, scripts) stream their output objects, i.e., emit them one by one to the pipeline as soon as they are produced, while the command is still running (as your function does), and they also act on their pipeline input one by one. However, some cmdlets invariably collect all input objects themselves first, before they start emitting their output object(s), out of conceptual necessity: notable examples are Sort-Object, Group-Object, and Measure-Object, all of which must act on the entirety of their input before they can start emitting results. Ditto for Format-Table when it is passed the -AutoSize switch, discussed next.
In the case of Format-Table specifically, you can use the -AutoSize switch in order to force it to collect all input first, in order to determine suitable display column widths based on all data (by default, Format-Table waits for 300 msecs. in order to determine column widths, based on whatever subset of the input data it has received by then).
However, this does not apply to so-called out-of-band-formatted objects, notably strings and primitive .NET types, which are still emitted (by their culture-invariant .ToString() representation) as they're being received.
Only complex objects (those with properties) are collected first, notably hashtables and [pscustomobject] instances; e.g.:
# Because this ForEach-Object call outputs complex objects (hashtables),
# Format-Table, due to -AutoSize, collects them all first,
# before producing its formatted output.
100, 200, 300 | ForEach-Object { @{ num = $_ }; Start-Sleep 1 } |
  Format-Table -AutoSize
If you want to create a custom function that collects all of its pipeline input up front, you have two options:
Create a simple function that uses the automatic $input variable in its function body, which implicitly runs only after all input has been received; e.g.:
# This simple function simply relays its input, but
# implicitly only after all of it has been collected.
function waitForThePreviousCommandToComplete { $input }
# Output doesn't appear until after the ForEach-Object
# call has emitted all its output.
100, 200, 300 | ForEach-Object { $_; Start-Sleep 1 } | waitForThePreviousCommandToComplete
In the context of an advanced function, you'll have to manually collect all input, iteratively in the process block, via a list-type instance allocated in the begin block, which you can then process in the end block.
While using a simple function with $input is obviously simpler, you may still want an advanced one for all the additional benefits it offers (preventing unbound arguments, parameter validation, multiple pipeline-binding parameters, ...).
See this answer for an example.
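A minimal sketch of that advanced-function variant, assuming PowerShell v5+ (for the ::new() syntax); it reuses the function name from the question, which is not a built-in command:

```powershell
function waitForThePreviousCommandToComplete {
    [CmdletBinding()]
    param(
        # Each pipeline object binds here, one at a time.
        [Parameter(ValueFromPipeline)]
        $InputObject
    )
    begin {
        # List-type instance allocated once, before any input arrives.
        $buffer = [System.Collections.Generic.List[object]]::new()
    }
    process {
        # Runs once per input object: just collect it.
        $buffer.Add($InputObject)
    }
    end {
        # Runs only after all input has been received: emit everything.
        $buffer
    }
}

# Output doesn't appear until ForEach-Object has emitted all its output.
100, 200, 300 | ForEach-Object { $_; Start-Sleep 1 } | waitForThePreviousCommandToComplete
```

Unlike the simple $input version, this sketch emits a single $null when called without any pipeline input, because the process block then runs once with $InputObject unbound.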

Sort waits until it has everything.
myFunction | sort-object
Or:
(myFunction)
$(myfunction1; myFunction2)
myFunction | format-table -autosize
myFunction | more
See also: How to tell PowerShell to wait for each command to end before starting the next?

For some unknown reason, just putting the function inside brackets solved my problem:
(myFunction) | Format-Table

Powershell output PSCustomObject blocks output hashtable [duplicate]

I'm learning PowerShell, and a vast number of articles I read strongly discourage the use of Write-Host, telling me it's "bad practice" and that the output can almost always be displayed in another way.
So, I'm taking the advice and trying to avoid Write-Host. One suggestion I found was to use Write-Output instead. As far as I understand, this puts everything in a pipeline, and the output is executed at the end of the script (?).
However, I have problems outputting what I want. This example demonstrates the issue:
$properties = @{'OSBuild'="910ef01.2.8779";
'OSVersion'="CustomOS 3";
'BIOSSerial'="A5FT-XT2H-5A4B-X9NM"}
$object = New-Object -TypeName PSObject -Prop $properties
Write-Output $object
$properties = @{'Site'="SQL site";
'Server'="SQL Server";
'Database'="SQL Database"}
$object = New-Object -TypeName PSObject -Prop $properties
Write-Output $object
This way I get a nice output of the first object displaying the OS data, but the second object, containing the SQL data, is never displayed. I've tried renaming the variables and a bunch of other things, but no luck.
While troubleshooting this problem, I found similar problems with suggestions to just replace Write-Output with Write-Host. This confuses me. Why are some people strongly discouraging Write-Host, while other people encourage it?
And how exactly do I output these two objects in a fashionable manner? I do not fully understand the pipeline mechanism of Write-Output.

Just to clarify: the problem is only a display problem:
When outputting to the console, if the first object is table-formatted (if Format-Table is applied, which happens implicitly in your case), the display columns are locked in based on that first object's properties.
Since your second output object shares no properties with the first one, it contributes nothing to the table display and is therefore effectively invisible.
By contrast, if you programmatically process the script's output - assign it to a variable or send its output through the pipeline to another command - both objects will be there.
See Charlie Joynt's answer for a helpful example of assigning the two output objects to separate variables.
The simplest solution to the display problem is to explicitly format for display each input object individually - see below.
For a given single object inside a script, you can force formatted to-display (to-host) output with Out-Host:
$object | Out-Host # same as: Write-Output $object | Out-Host
Note, however, that this outputs directly and invariably to the console only and the object is then not part of the script's data output (the objects written to the success output stream, the stream with index 1).
In other words: if you try to assign the script's output to a variable or send its output to another command in a pipeline, that object won't be there.
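To see this, compare what a caller captures from a function that pipes one object to Out-Host and emits another one normally (Test-OutHost is a made-up name for this illustration):

```powershell
# Hypothetical function for illustration.
function Test-OutHost {
    # Formatted and printed to the console right away; NOT part of the function's output.
    [pscustomobject]@{ OSVersion = 'CustomOS 3' } | Out-Host
    # Implicitly written to the success output stream: this is the function's data output.
    [pscustomobject]@{ Site = 'SQL site' }
}

$captured = Test-OutHost   # the OSVersion object still prints, but isn't captured
$captured.Site             # -> SQL site
```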
See below for why Out-Host is preferable to Write-Host, and why it's better to avoid Write-Host in most situations.
To apply the technique ad hoc to a given script's output as a whole, so as to make sure you see all output objects, use:
./some-script.ps1 | % { $_ | Out-String } # % is the built-in alias of ForEach-Object
Note that here too you could use Out-Host, but the advantage of using Out-String is that it still allows you to capture the for-display representation in a file, if desired.
Here's a simple helper function (filter) that you can put in your $PROFILE:
# Once defined, you can use: ./some-script.ps1 | Format-Each
Filter Format-Each { $_ | Out-String }
PetSerAl's suggestion - ./some-script.ps1 | Format-List - works in principle too, but it switches the output from the usual table-style output to list-style output, with each property listed on its own line, which may be undesired.
Conversely, however, Format-Each prints a separate header for each output object that is (implicitly) table-formatted.
Why Write-Output doesn't help:
Write-Output doesn't help, because it writes to where output objects go by default anyway: the aforementioned success output stream, where data should go.
If the output stream's objects aren't redirected or captured in some form, they are sent to the host by default (typically, the console), where the automatic formatting is applied.
Also, use of Write-Output is rarely necessary, because simply not capturing or redirecting a command or expression implicitly writes to the success stream; another way of putting it:
Write-Output is implied.
Therefore, the following two statements are equivalent:
Write-Output $object # write $object to the success output stream
$object # same; *implicitly* writes $object to the success output stream
Why use of Write-Host is ill-advised, both here and often in general:
Assuming you do know the implications of using Write-Host in general - see below - you could use it for the problem at hand, but Write-Host applies simple .ToString() formatting to its input, which does not give you the nice, multi-line formatting that PowerShell applies by default.
Thus, Out-Host (and Out-String) were used above, because they do apply the same, friendly formatting.
Contrast the following two statements, which print a hash-table ([hashtable]) literal:
# (Optional) use of Write-Output: The friendly, multi-line default formatting is used.
# ... | Out-Host and ... | Out-String would print the same.
PS> Write-Output @{ foo = 1; bar = 'baz' }
Name                           Value
----                           -----
bar                            baz
foo                            1
# Write-Host: The hashtable's *entries* are *individually* stringified
# and the result prints straight to the console.
PS> Write-Host @{ foo = 1; bar = 'baz' }
System.Collections.DictionaryEntry System.Collections.DictionaryEntry
Write-Host did two things here, which resulted in near-useless output:
The [hashtable] instance's entries were enumerated and each entry was individually stringified.
The .ToString() stringification of hash-table entries (key-value pairs) is System.Collections.DictionaryEntry, i.e., simply the type name of the instance.
The primary reasons for avoiding Write-Host in general are:
It outputs directly to the host (console) rather than to PowerShell's success output stream.
As a beginner, you may mistakenly think that Write-Host is for writing results (data), but it isn't.
In bypassing PowerShell's system of streams, Write-Host output cannot be redirected - that is, it can neither be suppressed nor captured (in a file or variable).
That said, starting with PowerShell v5.0, you can now redirect its output via the newly introduced information stream (number 6; e.g., ./some-script.ps1 6>write-host-output.txt); however, that stream is more properly used with the new Write-Information cmdlet.
By contrast, Out-Host output still cannot be redirected.
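A short illustration of the v5+ information-stream redirection described above; Test-WriteHost is a made-up name, and the 6>$null and 6>&1 redirections require PowerShell v5 or later:

```powershell
# Hypothetical function: one Write-Host message plus one data object.
function Test-WriteHost {
    Write-Host 'status message'   # goes to the information stream (6) in v5+
    'data'                        # goes to the success output stream (1)
}

# Suppress the Write-Host output; only the data is captured.
$dataOnly = Test-WriteHost 6>$null

# Merge the information stream into the success stream to capture both.
$merged = Test-WriteHost 6>&1
# $merged[0] is an InformationRecord wrapping the Write-Host message; $merged[1] is 'data'.
```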
That leaves just the following legitimate uses of Write-Host:
Creating end-user prompts and colored for-display-only representations:
Your script may have interactive prompts that solicit information from the user; using Write-Host - optionally with coloring via the -ForegroundColor and -BackgroundColor parameters - is appropriate, given that prompt strings should not become part of the script's output and users also provide their input via the host (typically via Read-Host).
Similarly, you can use Write-Host with selective coloring to explicitly create friendlier for-display-only representations.
Quick prototyping: If you want a quick-and-dirty way to write status/diagnostic information directly to the console without interfering with a script's data output.
However, it is generally better to use Write-Verbose and Write-Debug in such cases.
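A minimal sketch of the Write-Verbose alternative, assuming the default $VerbosePreference (SilentlyContinue); Get-ExampleStatus is a made-up name:

```powershell
# Hypothetical function: diagnostic message via Write-Verbose, data via the success stream.
function Get-ExampleStatus {
    [CmdletBinding()]   # makes the common -Verbose parameter available
    param()
    Write-Verbose 'Connecting...'   # shown only with -Verbose (or a changed $VerbosePreference)
    'result'                        # data output
}

Get-ExampleStatus            # prints: result
Get-ExampleStatus -Verbose   # additionally prints a "VERBOSE: Connecting..." line
```

Unlike Write-Host output in PSv4 and earlier, the verbose messages live in their own stream (number 4), so they can be silenced, shown on demand, or redirected without touching the data.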

Generally speaking the expectation is for script/functions to return a single "type" of object, often with many instances. For example, Get-Process returns a load of processes, but they all have the same fields. As you'll have seen from the tutorials, etc. you can then pass the output of Get-Process along a pipeline and process the data with subsequent cmdlets.
In your case you are returning two different types of object (i.e. with two different sets of properties). PS outputs the first object, but not the second one (which doesn't match the first) as you discovered. If you were to add extra properties to the first object that match those used in the second one, then you'd see both objects.
Write-Host doesn't care about this sort of stuff. The push-back against using this is mainly to do with (1) it being a lazy way to give feedback about script progress, i.e. use Write-Verbose or Write-Debug instead and (2) it being imperfect when it comes to passing objects along a pipeline, etc.
Clarification on point (2), helpfully raised in the comments to this answer:
Write-Host is not just imperfect with respect to the pipeline / redirection / output capturing, you simply cannot use it for that in PSv4 and earlier, and in PSv5+ you have to awkwardly use 6>; also, Write-Host stringifies with .ToString(), which often produces unhelpful representations.
If your script is really just meant to print data to the console, then go ahead and use Write-Host.
Alternatively, you can return multiple objects from a script or function. Using return or Write-Output, just return the objects comma-separated. For example:
Test-WriteOutput.ps1
$object1 = [PSCustomObject]@{
    OSBuild    = "910ef01.2.8779"
    OSVersion  = "CustomOS 3"
    BIOSSerial = "A5FT-XT2H-5A4B-X9NM"
}
$object2 = [PSCustomObject]@{
    Site     = "SQL site"
    Server   = "SQL Server"
    Database = "SQL Database"
}
Write-Output $object1, $object2
Then run the script, assigning the output to two variables:
$a,$b = .\Test-WriteOutput.ps1
You'll see that $a is $object1 and $b is $object2.
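The suggestion earlier in this answer, adding the second object's properties to the first one, can be sketched with the values from the question. Explicit Format-Table -AutoSize and Out-String are used here only so the rendered table can be inspected as text; the -Width value of 200 is an arbitrary choice to avoid truncation. (With six properties, PowerShell's implicit display would switch to list format anyway, which also shows both objects.)

```powershell
# The first object also declares the second object's properties (with empty values),
# so the table columns locked in by the first object also cover the second one.
$object1 = [PSCustomObject]@{
    OSBuild    = "910ef01.2.8779"
    OSVersion  = "CustomOS 3"
    BIOSSerial = "A5FT-XT2H-5A4B-X9NM"
    Site       = $null
    Server     = $null
    Database   = $null
}
$object2 = [PSCustomObject]@{
    Site     = "SQL site"
    Server   = "SQL Server"
    Database = "SQL Database"
}

# Render the table as text instead of sending it to the console.
$table = $object1, $object2 | Format-Table -AutoSize | Out-String -Width 200
$table
```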

Use Write-Host; Write-Output is for the pipeline (and, by default, its output ends up on the console).

PowerShell: What is the point of ForEach-Object with InputObject?

The documentation for ForEach-Object says "When you use the InputObject parameter with ForEach-Object, instead of piping command results to ForEach-Object, the InputObject value is treated as a single object." This behavior can easily be observed directly:
PS C:\WINDOWS\system32> ForEach-Object -InputObject @(1, 2, 3) {write-host $_}
1 2 3
This seems weird. What is the point of a "ForEach" if there is no "each" to do "for" on? Is there really no way to get ForEach-object to act directly on the individual elements of an array without piping? if not, it seems that ForEach-Object with InputObject is completely useless. Is there something I don't understand about that?

In the case of ForEach-Object, or any cmdlet designed to operate on a collection, using the -InputObject as a direct parameter doesn't make sense because the cmdlet is designed to operate on a collection, which needs to be unrolled and processed one element at a time. However, I would also not call the parameter "useless" because it still needs to be defined so it can be set to allow input via the pipeline.
Why is it this way?
-InputObject is, by convention, a generic parameter name for what should be considered to be pipeline input. It's a parameter with [Parameter(ValueFromPipeline = $true)] set on it, and as such is better suited to take input from the pipeline rather than being passed as a direct argument. The main drawback of passing it in as a direct argument is that the collection is not guaranteed to be unwrapped, and may exhibit some other behavior that may not be intended. From the about_pipelines page linked to above:
When you pipe multiple objects to a command, PowerShell sends the objects to the command one at a time. When you use a command parameter, the objects are sent as a single array object. This minor difference has significant consequences.
To explain the above quote in different words: passing a collection (e.g., an array or a list) through the pipeline automatically unrolls the collection and passes its elements to the next command in the pipeline one at a time. The cmdlet does not unroll -InputObject itself; the pipeline delivers the data one element at a time. This is why you might see problems when passing a collection to the -InputObject parameter directly: the cmdlet is probably not designed to unroll a collection itself and expects each collection element to be handed to it piecemeal.
Consider the following example:
# Array of hashes with a common key
$myHash = @{name = 'Alex'}, @{name='Bob'}, @{name = 'Sarah'}
# This works as intended
$myHash | Where-Object { $_.name -match 'alex' }
The above code outputs the following as expected:
Name Value
---- -----
name Alex
But if you pass the array of hashtables to -InputObject directly, like this:
Where-Object -InputObject $myHash { $_.name -match 'alex' }
It returns the whole collection, because -InputObject was never unrolled as it is when passed in via the pipeline, yet in this context $_.name -match 'alex' still evaluates to true: $_ is the whole array, so member-access enumeration makes $_.name return all three names, and -match with an array on its left-hand side returns the matching element ('Alex'), which is truthy. In other words, when a collection is provided as a direct argument to -InputObject, it's treated as a single object rather than the script block being executed against each element in the collection. This can also give the appearance of working as expected when checking for a false condition against that data set:
Where-Object -InputObject $myHash { $_.name -match 'frodo' }
which ends up returning nothing, because even in this context frodo is not the value of any of the name keys in the collection of hashes.
In short, if something expects the input to be passed in as pipeline input, it's usually, if not always, a safer bet to do it that way, especially when passing in a collection. However, if you are working with a non-collection, then there is likely no issue if you opt to use the -InputObject parameter directly.
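The single-object behavior quoted in the question can be made visible with ForEach-Object itself by asking for the type of $_ in each case:

```powershell
$arr = 1, 2, 3

# Direct argument: the script block runs once, and $_ is the whole array.
ForEach-Object -InputObject $arr { $_.GetType().FullName }   # -> System.Object[]

# Pipeline input: the array is enumerated, so the script block runs once per element.
$arr | ForEach-Object { $_.GetType().FullName }             # -> System.Int32 (three times)
```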

Bender the Greatest's helpful answer explains the current behavior well.
For the vast majority of cmdlets, direct use of the -InputObject parameter is indeed pointless and the parameter should be considered an implementation detail whose sole purpose is to facilitate pipeline input.
There are exceptions, however, such as the Get-Member cmdlet, where direct use of -InputObject allows you to inspect the type of a collection itself, whereas providing that collection via the pipeline would report information about its elements' types.
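For example, Get-Member reports different types for the same array depending on how it is passed; TypeName is a property of the member-definition objects Get-Member outputs:

```powershell
$arr = 1, 2, 3

# Pipeline: the elements are inspected, so the reported type is that of the elements.
($arr | Get-Member)[0].TypeName              # -> System.Int32

# -InputObject: the array itself is inspected.
(Get-Member -InputObject $arr)[0].TypeName   # -> System.Object[]
```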
Given how things currently work, it is quite unfortunate that -InputObject features so prominently in most cmdlets' help topics, alongside "real" parameters, and that its description does not frame the issue with enough clarity (as of this writing): it should clearly convey the message "Don't use this parameter directly, use the pipeline instead".
This GitHub issue provides a categorized overview of how each cmdlet processes direct -InputObject arguments.
Taking a step back:
While technically a breaking change, it would make sense for -InputObject parameters (or any pipeline-binding parameter) to by default accept and enumerate collections even when they're passed by direct argument rather than via the pipeline, in a manner that is transparent to the implementing command.
This would put direct-argument input on par with pipeline input, with the added benefit of the former resulting in faster processing of already-in-memory collections.

Using PowerShell and piping output of Select-Object to access selected columns

I have the PowerShell command below that selects certain fields:
dir -Path E:\scripts\br\test | Get-FileMetaData | Select-Object name, Comments, Path, Rating
What I want to do is use Name, Comments, Path, and Rating in further pipes, but $_.name etc. doesn't work.

If I understand your question correctly, you want to do something with the output of Select-Object, but you want to do it in a pipeline.
To do this, you need to pass the output down the pipeline into a cmdlet that accepts pipeline input (such as ForEach-Object). If the next operation in the pipeline does not accept pipeline input, you will have to assign the output to a variable and access the information through the variable.
Using ForEach-Object
In this method, you process each object individually, i.e., you deal with the individual items in the collection of items returned by Select-Object.
dir | Get-FileMetaData | Select-Object Name,Comments,Path,Rating | ForEach-Object {
# Do stuff with $_
# Note that $_ is a single item in the collection returned by Select-Object
}
The variable method is included in case your next Cmdlet does not accept pipeline input.
Using Variable
In this method, you treat $tempVariable as an array and can operate on each item. If need be, you can also access a single column across all items at once.
$tempVariable = dir | Get-FileMetaData | Select-Object Name,Comments,Path,Rating
# Do stuff with each Name by using $tempVariable[$i].Name, etc.
# Or do stuff with all Names by using $tempVariable.Name, etc.
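Both methods can be tried without Get-FileMetaData (which is not a built-in cmdlet) by feeding in stand-in objects; the file names, comments, and ratings below are hypothetical sample data:

```powershell
# Hypothetical stand-in for the output of Get-FileMetaData.
$files = [pscustomobject]@{ Name = 'a.txt'; Comments = 'draft'; Path = 'E:\scripts\br\test\a.txt'; Rating = 3 },
         [pscustomobject]@{ Name = 'b.txt'; Comments = 'final'; Path = 'E:\scripts\br\test\b.txt'; Rating = 5 }

# ForEach-Object: inside the script block, $_ is one selected object.
$lines = $files | Select-Object Name, Comments, Path, Rating | ForEach-Object {
    '{0} ({1}): rating {2}' -f $_.Name, $_.Comments, $_.Rating
}

# Variable: index a single item, or use member-access enumeration for a whole column.
$tempVariable = $files | Select-Object Name, Comments, Path, Rating
$tempVariable[0].Name   # -> a.txt
$tempVariable.Name      # -> a.txt and b.txt, one per line
```

Note that $_ only refers to the current object inside a script block such as ForEach-Object's; outside of one, it is not set by the preceding pipeline.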
