I have a console application which can take standard input. It buffers up the data until the execute command, at which point it executes it all and sends the output to standard output.
At the moment, I am running this application from PowerShell, piping commands into it, and then parsing the output. The data piped in is relatively small; however, the application is being called about 1000 times. Each time it is executed, it has to load and create network connections. I am wondering whether it might be more efficient to pipeline all the commands into a single instantiation of the console application.
I have tried this by putting all the PowerShell script that manufactures the standard input for the console application into a function, then piping that function to the console application. This seems to work at first, but you eventually realise that PowerShell is buffering up all the data until the function has finished, and only then sending it to the console application's stdin. You can see this because I have a whole load of Write-Host statements that flash by, and only then do you see the output.
e.g.
Function Run-Command1
{
Write-Host "Run-Command1"
"GET nethost xxxx COLS id,name"
"EXEC"
}
Function Run-Command2
{
Write-Host "Run-Command2"
"GET nethost yyyy COLS id,name"
"GET users yyyy COLS id,name"
"EXEC"
}
...
Function Run-CommandX
{
...
}
Previously, I would use this as:
Run-Command1 | netapp.exe -connect QQQQ -U user -P password
Run-Command2 | netapp.exe -connect QQQQ -U user -P password
...
Run-CommandX | netapp.exe -connect QQQQ -U user -P password
But now I would like to do:
Function Run-Commands
{
Run-Command1
Run-Command2
...
Run-CommandX
}
Run-Commands |
netapp.exe -connect QQQQ -U user -P password
Ideally, I would like the PowerShell pipeline behaviour to be extended to an external application. Is this possible?
"I would like the PowerShell pipeline behaviour to be extended to an external application."
"I have a whole load of Write-Host statements that flash by, and only then do you see the output."
Tip of the hat to marsze.
PowerShell [Core] v6+ performs no buffering at all: it sends (stringified) output to an external program as that output is being produced by a command, in the same manner that output is streamed between PowerShell commands.[1]
PowerShell's legacy edition (versions up to 5.1), Windows PowerShell, buffers in that it collects all output from a command first, before sending it (in stringified form) to an external program.
marsze's helpful answer shows a workaround based on direct use of .NET APIs.
However, I think even Windows PowerShell's behavior isn't the problem here: Your Run-Commands function executes very quickly - given that the functions it calls merely output string literals - and the resulting array of lines is then sent all at once to netapp.exe - and further processing, including when to produce output, is then up to netapp.exe. In PowerShell [Core] v6+, with PowerShell-side buffering out of the picture, the individual Run-Command<n> functions' output would be sent to netapp.exe ever so slightly earlier, but I wouldn't expect that to make a difference.
The upshot is that unless netapp.exe offers a way to adjust its input and output buffering, you won't be able to control the timing of its input processing and output production.
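To see the edition difference for yourself, here is a minimal sketch (assuming Windows, with findstr standing in merely for any line-processing external program):
# In PowerShell [Core] v6+, each "line <n>" appears roughly a second apart,
# because output is streamed to findstr as it is produced.
# In Windows PowerShell, all 5 lines appear together after ~5 seconds,
# because all output is collected first.
& { 1..5 | ForEach-Object { "line $_"; Start-Sleep -Seconds 1 } } | findstr "line"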
How PowerShell sends objects to an external program (native utility) via the pipeline:
It sends a stringified representation of each object:
in PowerShell [Core] v6+: as the object becomes available.
in Windows PowerShell: after having collected all output objects in memory first.
In other words: on the PowerShell side, from v6 onward, there is no buffering.[1]
However, receiving external programs typically do buffer the stdin (standard input) data they receive via the pipeline[2].
Similarly, external programs typically do buffer their stdout (standard output) streams (but PowerShell performs no additional buffering before passing the output on, such as to the terminal (console)).
PowerShell has no control over this behavior; either the external program itself offers an option to adjust buffering or, in limited cases on Linux, you can call the external program via the stdbuf utility.
Optional reading: How PowerShell stringifies objects when piping to external programs:
PowerShell, as of v7.1, knows only text when communicating with external programs; that is, data sent to such programs is converted to text, and output from such programs is interpreted as text - even though the underlying system IPC features are simply byte conduits.
The UTF-16-based .NET strings PowerShell uses are converted to byte streams for external programs based on the character encoding specified in the $OutputEncoding preference variable, which, regrettably, defaults to ASCII(!) in Windows PowerShell, and now sensibly to (BOM-less) UTF-8 in PowerShell [Core] v6+.
In other words: The encoding specified via $OutputEncoding must match the character encoding that the external program expects.
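For instance, a minimal sketch (assuming the external program expects UTF-8 input; utf8tool.exe is a hypothetical stand-in):
# Make PowerShell encode pipeline input to external programs as BOM-less UTF-8;
# only needed in Windows PowerShell, where the default is ASCII.
$OutputEncoding = [System.Text.UTF8Encoding]::new($false)
"héllo wörld" | utf8tool.exe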
Conversely, it is the encoding specified in [Console]::OutputEncoding that determines how PowerShell interprets text received from an external program, i.e. how it converts the bytes received to .NET strings, line by line, with newlines stripped (which, when captured in a variable, amounts to either a single string, if only one line was output, or an array of strings).
The for-display representations you see in the PowerShell console (terminal) are also what is sent to external programs via the pipeline, as lines of text, specifically:
If an object (already) is a string (or [char] instance), PowerShell sends it as-is to the pipe, but with a platform-appropriate newline invariably appended.
That is, a CRLF newline is appended on Windows, and a LF-only newline on Unix-like platforms.
This behavior can be problematic, as there are situations where you do not want that, and there's no way to prevent it - see GitHub issue #5974, GitHub issue #13579, and this answer for a workaround.
If an object is, loosely speaking, of a primitive type - something that is conceptually a single value, notably the various number types - it is stringified in a culture-sensitive manner, where available[3], and a platform-appropriate newline is again invariably appended.
E.g., with a French culture in effect (as reflected in Get-Culture), the decimal fraction 1.2 - which PowerShell parses as a [double] value - is sent as 1,2<newline>.
Note that [bool] instances are not culture-sensitive and are always converted to strings True or False.
All other (complex) types are subject to PowerShell's rich for-display output formatting, and whatever you would see in the terminal (console) is also what is sent to external programs - which not only again potentially contains culture-sensitive representations, but is generally problematic in that these representations are designed for the human observer, not for programmatic processing.
The upshot:
Beware encoding problems - make sure $OutputEncoding and [Console]::OutputEncoding are set correctly.
To avoid unexpected culture-sensitivity and unexpected for-display formatting, it is best to deliberately construct the string representation you want to send.
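A sketch of that approach (findstr serves merely as a stand-in for any external program):
# Stringify numbers explicitly and culture-invariantly *before* they reach
# the pipe, so that the external program always sees "1.2", never "1,2".
1.2, 0.000051 | ForEach-Object {
  $_.ToString('0.######', [cultureinfo]::InvariantCulture)
} | findstr "."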
[1] By default; however, you can explicitly request buffering - expressed as an object count - via the common -OutBuffer parameter.
[2] On recent macOS and Linux platforms, the stdin buffer size is 64KB. On Unix-like platforms, utilities typically switch to line-buffering in interactive invocations, i.e. when the stream in question is connected to a terminal.
[3] The behavior is delegated to the .ToString() method of a type at hand, i.e. whether or not that method outputs a culture-sensitive representation.
EDIT: As @mklement0 pointed out, this is different in PowerShell Core.
In PowerShell 5.1 (and lower), I think you would have to manually write each pipeline item to the external application's input stream.
Here's an attempt to build a function for that:
function Invoke-Pipeline {
[CmdletBinding()]
param (
[Parameter(Mandatory, Position = 0)]
[string]$FileName,
[Parameter(Position = 1)]
[string[]]$ArgumentList,
[int]$TimeoutMilliseconds = -1,
[Parameter(ValueFromPipeline)]
$InputObject
)
begin {
$process = [System.Diagnostics.Process]::Start((New-Object System.Diagnostics.ProcessStartInfo -Property @{
FileName = $FileName
Arguments = $ArgumentList
UseShellExecute = $false
RedirectStandardInput = $true
RedirectStandardOutput = $true
}))
$output = [System.Collections.Concurrent.ConcurrentQueue[string]]::new()
$event = Register-ObjectEvent -InputObject $process -EventName 'OutputDataReceived' -Action {
$Event.MessageData.TryAdd($EventArgs.Data)
} -MessageData $output
$process.BeginOutputReadLine()
}
process {
$process.StandardInput.WriteLine($InputObject)
[string]$line = ""
while (-not ($output.TryDequeue([ref]$line))) {
Start-Sleep -Milliseconds 1
}
do {
$line
} while ($output.TryDequeue([ref]$line))
}
end {
# Signal EOF to the child process so it can finish and exit.
$process.StandardInput.Close()
if ($TimeoutMilliseconds -lt 0) {
# The parameterless overload returns void, so set the flag explicitly.
$process.WaitForExit()
$exited = $true
}
else {
$exited = $process.WaitForExit($TimeoutMilliseconds)
}
if ($exited) {
$process.Close()
}
else {
try {$process.Kill()} catch {}
}
# Clean up the event subscription.
Unregister-Event -SourceIdentifier $event.Name
}
}
Run-Commands | Invoke-Pipeline netapp.exe "-connect QQQQ -U user -P password"
The problem is that there is no perfect solution, because, by definition, you cannot know when the external program will write something to its output stream, or how much.
Note: This function doesn't redirect the error stream. The approach would be the same though.
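A hedged sketch of that extension (untested; it mirrors the stdout wiring in the begin block above):
# Additionally set RedirectStandardError = $true in the ProcessStartInfo
# properties, then wire up the equivalent event and start reading:
$errEvent = Register-ObjectEvent -InputObject $process -EventName 'ErrorDataReceived' -Action {
$Event.MessageData.TryAdd($EventArgs.Data)
} -MessageData $output
$process.BeginErrorReadLine()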
Related
Performing URL decoding in a CGI hybrid script via a PowerShell one-liner:
echo %C4%9B%C5%A1%C4%8D%C5%99%C5%BE%C3%BD%C3%A1%C3%AD%C3%A9%C5%AF%C3%BA| powershell.exe "Add-Type -AssemblyName System.Web;[System.Web.HttpUtility]::UrlDecode($Input) | Write-Host"
The execution time of this one-liner is between 2-3 seconds on a virtual machine. Is it because a .NET object is employed? Is there any way to decrease the execution time? I also have a lightning-fast urldecode.exe utility written in C, but unfortunately it does not eat STDIN.
A note if you're passing the input data as a string literal, as in the sample call in the question (rather than as output from a command):
If you're calling from an interactive cmd.exe session, %C4%9B%C5%A1%C4%8D%C5%99%C5%BE%C3%BD%C3%A1%C3%AD%C3%A9%C5%AF%C3%BA works as-is - unless any of the tokens between paired % instances happen to be the name of an existing environment variable (which seems unlikely).
From a batch file, you'll have to escape the % chars. by doubling them - see this answer. You can obtain the escaped string by applying the following PowerShell operation to the original string:
'%C4%9B%C5%A1%C4%8D%C5%99%C5%BE%C3%BD%C3%A1%C3%AD%C3%A9%C5%AF%C3%BA' -replace '%', '%%'
Is it because a .NET object is employed?
Yes: powershell.exe, as a .NET-based application, requires starting the .NET runtime (CLR), which is nontrivial in terms of performance.
Additionally, powershell.exe by default loads the initialization files listed in its $PROFILE variable, which can take additional time.
Pass the -NoProfile CLI option to suppress that.
I also have a lightning-fast urldecode.exe utility written in C, but unfortunately it does not eat STDIN.
If so, pass the data as an argument, if feasible; e.g.:
urldecode.exe "%C4%9B%C5%A1%C4%8D%C5%99%C5%BE%C3%BD%C3%A1%C3%AD%C3%A9%C5%AF%C3%BA"
If the data comes from another command's output, you can use for /f to capture it in a variable first, and then pass that variable.
If you do need to call powershell.exe, PowerShell's CLI, after all:
There's not much you can do in terms of optimizing performance:
Add -NoProfile, as suggested.
Pass the input data as an argument.
Avoid unnecessary calls such as Write-Host and rely on PowerShell's implicit output behavior instead.[1]
powershell.exe -NoProfile -c "Add-Type -AssemblyName System.Web;[System.Web.HttpUtility]::UrlDecode('%C4%9B%C5%A1%C4%8D%C5%99%C5%BE%C3%BD%C3%A1%C3%AD%C3%A9%C5%AF%C3%BA')"
[1] Optional reading: How do I output machine-parseable data from a PowerShell CLI call?
Note: The sample commands are assumed to be run from cmd.exe / outside PowerShell.
PowerShell's CLI only supports text as output, not also raw byte data.
In order to output data for later programmatic processing, you may have to explicitly ensure that what is output is machine-parseable rather than something that is meant for display only.
There are two basic choices:
Rely on PowerShell's default output formatting for outputting what are strings (text) to begin with, as well as for numbers - though for fractional and very large or small non-integer numbers additional effort may be required.
Explicitly use a structured, text-based data format, such as CSV or Json, to represent complex objects.
Rely on PowerShell's default output formatting:
If the output data is itself text (strings), no extra effort is needed. This applies to your case, and therefore simply implicitly outputting the string returned from the [System.Web.HttpUtility]::UrlDecode() call is sufficient:
# A simple example that outputs a 512-character string; note that
# printing to the _console_ (terminal) will insert line breaks for
# *display*, but the string data itself does _not_ contain any
# (other than a single _trailing_ one), irrespective of the
# console window width:
powershell -noprofile -c "'x' * 512"
If the output data comprises numbers, you may have to apply explicit, culture-invariant formatting if your code must run with different cultures in effect:
True integer types do not require special handling, as their string representation is in effect culture-neutral.
However, fractional numbers ([double], [decimal]) do use the current culture's decimal mark, which later processing may not expect:
# With, say, culture fr-FR (French) in effect, this outputs
# "1,2", i.e. uses a *comma* as the decimal mark.
powershell -noprofile -c "1.2"
# Simple workaround: Let PowerShell *stringify* the number explicitly
# in an *expandable* (interpolating) string, which uses
# the *invariant culture* for formatting, where the decimal
# mark is *always* "." (dot).
# The following outputs "1.2", irrespective of what culture is in effect.
powershell -noprofile -c " $num=1.2; \"$num\" "
Finally, very large and very small [double] values can result in exponential notation being output (e.g., 5.1E-07 for 0.00000051); to avoid that, explicit number formatting is required, which can be done via the .ToString() method:
# The following outputs 0.000051 in all cultures, as intended.
powershell -noprofile -c "$n=0.000051; $n.ToString('F6', [cultureinfo]::InvariantCulture)"
More work is needed if you want to output representations of complex objects in machine-parseable form, as discussed in the next section.
Relying on PowerShell's default output formatting is not an option in this case, because implicit output (and equivalent explicit Write-Output calls) causes the CLI to apply for-display-only formatting, which is meaningful to the human observer but cannot be robustly parsed.
# Produces output helpful to the *human observer*, but isn't
# designed for *parsing*.
# `Get-ChildItem` outputs [System.IO.FileSystemInfo] objects.
powershell -noprofile -c "Get-ChildItem /"
Note that use of Write-Host is not an alternative: Write-Host fundamentally isn't designed for data output, and the textual representations it creates for complex objects are typically not even meaningful to the human observer - see this answer for more information.
Use a structured, text-based data format, such as CSV or Json:
Note:
Hypothetically, the simplest approach is to use the CLI's -OutputFormat Xml option, which serializes the output using the XML-based CLIXML format PowerShell itself uses for remoting and background jobs - see this answer.
However, this format is only natively understood by PowerShell itself, and for third-party applications to parse it they'd have to be .NET-based and use the PowerShell SDK.
Also, this format is automatically used for both serialization and deserialization if you call another PowerShell instance from PowerShell, with the command specified as a script block ({ ... }) - see this answer. However, there is rarely a need to call the PowerShell CLI from PowerShell itself, and direct invocation of PowerShell code and scripts provides full type fidelity as well as better performance.
Finally, note that all serialization formats, including CSV and JSON discussed below, have limits with respect to faithfully representing all aspects of the data, though -OutputFormat Xml comes closest.
PowerShell comes with cmdlets such as ConvertTo-Csv and ConvertTo-Json, which make it easy to convert output to the structured CSV and JSON formats.
Using a Get-Item call to get information about PowerShell's installation directory ($PSHOME) as an example; Get-Item outputs a System.IO.DirectoryInfo instance in this case:
Use of ConvertTo-Csv:
C:\>powershell -noprofile -c "Get-Item $PSHOME | ConvertTo-Csv -NoTypeInformation"
"PSPath","PSParentPath","PSChildName","PSDrive","PSProvider","PSIsContainer","Mode","BaseName","Target","LinkType","Name","FullName","Parent","Exists","Root","Extension","CreationTime","CreationTimeUtc","LastAccessTime","LastAccessTimeUtc","LastWriteTime","LastWriteTimeUtc","Attributes"
"Microsoft.PowerShell.Core\FileSystem::C:\Windows\System32\WindowsPowerShell\v1.0","Microsoft.PowerShell.Core\FileSystem::C:\Windows\System32\WindowsPowerShell","v1.0","C","Microsoft.PowerShell.Core\FileSystem","True","d-----","v1.0","System.Collections.Generic.List`1[System.String]",,"v1.0","C:\Windows\System32\WindowsPowerShell\v1.0","WindowsPowerShell","True","C:\",".0","12/7/2019 4:14:52 AM","12/7/2019 9:14:52 AM","3/14/2021 10:33:10 AM","3/14/2021 2:33:10 PM","11/6/2020 3:52:41 AM","11/6/2020 8:52:41 AM","Directory"
Note: -NoTypeInformation is no longer needed in PowerShell (Core) 7+
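To keep the CSV output manageable, you can select just the properties of interest first (a sketch):
C:\>powershell -noprofile -c "Get-Item $PSHOME | Select-Object Name, FullName | ConvertTo-Csv -NoTypeInformation"
"Name","FullName"
"v1.0","C:\Windows\System32\WindowsPowerShell\v1.0"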
Using ConvertTo-Json:
C:\>powershell -noprofile -c "Get-Item $PSHOME | ConvertTo-Json -Depth 1"
{
"Name": "v1.0",
"FullName": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0",
"Parent": {
"Name": "WindowsPowerShell",
"FullName": "C:\\Windows\\System32\\WindowsPowerShell",
"Parent": "System32",
"Exists": true,
"Root": "C:\\",
"Extension": "",
"CreationTime": "\/Date(1575710092565)\/",
"CreationTimeUtc": "\/Date(1575710092565)\/",
"LastAccessTime": "\/Date(1615733476841)\/",
"LastAccessTimeUtc": "\/Date(1615733476841)\/",
"LastWriteTime": "\/Date(1575710092565)\/",
"LastWriteTimeUtc": "\/Date(1575710092565)\/",
"Attributes": 16
},
"Exists": true
// ...
}
Since JSON is a hierarchical data format, the serialization depth must be limited with -Depth in order to prevent "runaway" serialization when serializing arbitrary .NET types; this isn't necessary for [pscustomobject] and [hashtable] object graphs composed of primitive .NET types only.
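For instance (a sketch), a [pscustomobject] built from primitive values serializes fully without -Depth:
powershell -noprofile -c "[pscustomobject] @{ Name = 'v1.0'; Exists = $true } | ConvertTo-Json"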
I am trying to pipe the content of a file to a simple ASCII symmetrical encryption program I made. It's a simple program that reads input from STDIN and adds or subtracts a certain value (224) to/from each byte of the input.
For example: if the first byte is 4 and we want to encrypt, then it becomes 228. If the result exceeds 255, the program just applies modulo arithmetic.
This is the output I get with cmd (test.txt contains "this is a test"):
type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt
this is a test
It also works the other way around, since it is a symmetrical encryption algorithm:
type .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt
this is a test
But, the behaviour on PowerShell is different. When encrypting first, I get:
type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt
this is a test_*
And that is what I get when decrypting first:
Maybe it is an encoding problem. Thanks in advance.
tl;dr:
Up to at least PowerShell 7.3.1, if you need raw byte handling and/or need to prevent PowerShell from situationally adding a trailing newline to your text data, avoid the PowerShell pipeline altogether.
Future support for passing raw byte data between external programs and to-file redirections is the subject of GitHub issue #1908.
For raw byte handling, shell out to cmd with /c (on Windows; on Unix-like platforms / Unix-like Windows subsystems, use sh or bash with -c):
cmd /c 'type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt'
Use a similar technique to save raw byte output in a file - do not use PowerShell's > operator:
cmd /c 'someexe > file.bin'
Note that if you want to capture an external program's text output in a PowerShell variable or process it further in a PowerShell pipeline, you need to make sure that [Console]::OutputEncoding matches your program's output character encoding (the active OEM code page, typically), which should be true by default in this case; see the next section for details.
Generally, however, byte manipulation of text data is best avoided.
There are two separate problems, only one of which has a simple solution:
Problem 1: There is indeed a character encoding problem, as you suspected:
PowerShell invisibly inserts itself as an intermediary in pipelines, even when sending data to and receiving data from external programs: It converts data from and to .NET strings (System.String), which are sequences of UTF-16 code units.
As an aside: Even when using only PowerShell-native commands, this means that reading input from files and saving them again can result in a different character encoding, because the information about the original character encoding is not preserved once (string) data has been read into memory, and on saving it is the cmdlets' default character encoding that is used; while this default encoding is consistently BOM-less UTF-8 in PowerShell (Core) 6+, it varies by cmdlet in Windows PowerShell - see this answer.
In order to send to and receive data from external programs (such as Crypt.exe in your case), you need to match their character encoding; in your case, with a Windows console application that uses raw byte handling, the implied encoding is the system's active OEM code page.
On sending data, PowerShell uses the encoding of the $OutputEncoding preference variable to encode (what is invariably treated as text) data, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell (Core).
The receiving end is covered by default: PowerShell uses [Console]::OutputEncoding (which itself reflects the code page reported by chcp) for decoding data received, and on Windows this by default reflects the active OEM code page, both in Windows PowerShell and PowerShell [Core][1].
To fix your primary problem, you therefore need to set $OutputEncoding to the active OEM code page:
# Make sure that PowerShell uses the OEM code page when sending
# data to `.\Crypt.exe`
$OutputEncoding = [Console]::OutputEncoding
Problem 2: PowerShell invariably appends a trailing newline to data that doesn't already have one when piping data to external programs:
That is, "foo" | .\Crypt.exe doesn't send (the $OutputEncoding-encoded bytes representing) "foo" to .\Crypt.exe's stdin, it sends "foo`r`n" on Windows; i.e., a (platform-appropriate) newline sequence (CRLF on Windows) is automatically and invariably appended (unless the string already happens to have a trailing newline).
This problematic behavior is discussed in GitHub issue #5974 and also in this answer.
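You can verify the implicitly appended newline with a quick sketch (Windows; findstr simply echoes the matching line, with cmd doing the raw file redirection):
"foo" | cmd /c "findstr foo > out.txt"
(Get-Item out.txt).Length   # 5 bytes: "foo" + CR + LF, though the input string has no newline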
In your specific case, the implicitly appended "`r`n" is also subject to the byte-value shifting, which means that the 1st Crypt.exe call transforms it to -*, causing another "`r`n" to be appended when the data is sent to the 2nd Crypt.exe call.
The net result is an extra newline that is round-tripped (the intermediate -*), plus an encrypted newline that results in φΩ.
In short: If your input data had no trailing newline, you'll have to cut off the last 4 characters from the result (representing the round-tripped and the inadvertently encrypted newline sequences):
# Ensure that .\Crypt.exe output is correctly decoded.
$OutputEncoding = [Console]::OutputEncoding
# Invoke the command and capture its output in variable $result.
# Note the use of the `Get-Content` cmdlet; in PowerShell, `type`
# is simply a built-in *alias* for it.
$result = Get-Content .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt
# Remove the last 4 chars. and print the result.
$result.Substring(0, $result.Length - 4)
Given that calling cmd /c as shown at the top of the answer works too, that hardly seems worth it.
How PowerShell handles pipeline data with external programs:
Unlike cmd (or POSIX-like shells such as bash):
PowerShell doesn't support raw byte data in pipelines.[2]
When talking to external programs, it only knows text (whereas it passes .NET objects when talking to PowerShell's own commands, which is where much of its power comes from).
Specifically, this works as follows:
When you send data to an external program via the pipeline (to its stdin stream):
It is converted to text (strings) using the character encoding specified in the $OutputEncoding preference variable, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell (Core).
Caveat: If you assign an encoding with a BOM to $OutputEncoding, PowerShell (as of v7.0) will emit the BOM as part of the first line of output sent to an external program; therefore, for instance, do not use [System.Text.Encoding]::Utf8 (which emits a BOM) in Windows PowerShell, and use [System.Text.Utf8Encoding]::new($false) (which doesn't) instead.
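In concrete terms (a sketch):
# Windows PowerShell: use a BOM-less UTF-8 encoding ...
$OutputEncoding = [System.Text.UTF8Encoding]::new($false)  # no BOM - safe
# ... and avoid the BOM-emitting built-in instance:
# $OutputEncoding = [System.Text.Encoding]::UTF8           # emits a BOM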
If the data is not captured or redirected by PowerShell, encoding problems may not always become apparent, namely if an external program is implemented in a way that uses the Windows Unicode console API to print to the display.
Something that isn't already text (a string) is stringified using PowerShell's default output formatting (the same format you see when you print to the console), with an important caveat:
If the (last) input object already is a string that doesn't itself have a trailing newline, one is invariably appended (and even an existing trailing newline is replaced with the platform-native one, if different).
This behavior can cause problems, as discussed in GitHub issue #5974 and also in this answer.
When you capture / redirect data from an external program (from its stdout stream), it is invariably decoded as lines of text (strings), based on the encoding specified in [Console]::OutputEncoding, which defaults to the active OEM code page on Windows (surprisingly, in both PowerShell editions, as of v7.0-preview6[1]).
PowerShell-internally, text is represented using the .NET System.String type, which is based on UTF-16 code units (often loosely, but incorrectly, called "Unicode"[3]).
The above also applies:
when piping data between external programs,
when data is redirected to a file; that is, irrespective of the source of the data and its original character encoding, PowerShell uses its default encoding(s) when sending data to files; in Windows PowerShell, > produces UTF-16LE-encoded files (with BOM), whereas PowerShell (Core) sensibly defaults to BOM-less UTF-8 (consistently, across file-writing cmdlets).
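To control the file encoding explicitly rather than relying on >'s edition-specific default, pipe to Out-File with -Encoding (a sketch; someexe.exe stands in for any text-producing program):
someexe.exe | Out-File -Encoding utf8 out.txt  # note: in Windows PowerShell, utf8 here includes a BOM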
[1] In PowerShell (Core), given that $OutputEncoding commendably already defaults to UTF-8, it would make sense to have [Console]::OutputEncoding be the same - i.e., for the active code page to be effectively 65001 on Windows, as suggested in GitHub issue #7233.
[2] With input from a file, the closest you can get to raw byte handling is to read the file as a .NET System.Byte array with Get-Content -AsByteStream (PowerShell (Core)) / Get-Content -Encoding Byte (Windows PowerShell), but the only way you can further process such an array is to pipe it to a PowerShell command that is designed to handle a byte array, or to pass it to a .NET type's method that expects a byte array. If you tried to send such an array to an external program via the pipeline, each byte would be sent as its decimal string representation on its own line.
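A sketch of that edition-specific byte reading, handing the bytes to a .NET method:
# PowerShell (Core) 6+:
$bytes = Get-Content -AsByteStream -Raw .\test.txt   # [byte[]]
# Windows PowerShell:
$bytes = Get-Content -Encoding Byte -Raw .\test.txt  # [byte[]]
# Process via .NET, e.g. render as hex:
[BitConverter]::ToString($bytes)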
[3] Unicode is the name of the abstract standard describing a "global alphabet". In concrete use, it has various standard encodings, UTF-8 and UTF-16 being the most widely used.
I'm trying to use Putty's plink.exe as part of a Powershell script, and am having trouble teeing the output.
Some of the commands invoke an interactive response (eg: entering password). Specifically, I'm testing against an Isilon.
Example code:
$command = '&"C:\Program Files\Putty\plink.exe" root#10.0.0.141 -pw "password" -t -batch "isi auth users create testuser --set-password"'
iex $command
Expected result:
I get a prompt password:
I enter the password
I get a prompt confirm:
I enter the password again
Command ends
If I try to tee the output, using iex $command | tee-object -variable result or even just redirect with iex $command *>test.log, the prompt text doesn't show up until after I've responded to it. While still technically functional, if you don't know exactly what prompt to expect, it's useless.
I've tried using Start-Transcript, but that doesn't capture the output at all. I've also tried using plink's -sshlog argument, but that logs way too much, in a less than readable format.
Is there any way to have stdout be unbuffered in the console, and also have it stored in a variable?
To answer some potential questions:
-This is to be run in an environment that doesn't allow modules, so can't use Posh-SSH.
-The Powershell version available isn't new enough to use the built-in openssh functionality.
This is all about redirecting streams.
When you use redirection, all output streams are redirected and written to the file. When you execute:
Write-Host "Some Text" *>out.txt
You don't see any output and it is all redirected to the file.
Key note: Redirection works (to simplify) on a line-by-line basis, as the redirection writes to the file one line at a time.
Similarly, when you use Tee-Object, all output is redirected from the stream and down the pipeline, where it is passed to the Tee-Object cmdlet. Tee-Object takes the input and then writes it both to the variable/file you specify and to the screen. This happens after the input has been gathered and processed.
This means that both redirection and Tee-Object work on a line-by-line basis. It makes sense that they work this way, because it is hard to deal with things like deleting characters, moving the cursor around, and editing text dynamically while trying to maintain an open file at the same time. They are designed for one-way output, produced once the statement is complete.
In this case, when running it interactively, the password: prompt is written to the screen and you can respond.
When redirecting/teeing the output, the password: prompt text is redirected and buffered, awaiting your response. This makes sense, because the statement has not completed yet: you don't want to send half a statement - which could still change - or half an object down the pipeline. It is only after you complete the statement (e.g., by entering the password + Enter) that the whole statement is passed down the stream/pipeline. Once the whole statement is sent, it is redirected/Tee'd and can be displayed.
@Bill_Stewart is correct, in the sense that you should pick either an interactive prompt, or a fully automated solution.
Edit: To add some more information from comments.
If we use Tee-Object, it relies on the pipeline. Pipelines can only pass complete objects down the pipeline (e.g. complete strings, including the newline). Pipelines have to interact with other commands like ForEach-Object or Select-Object, and those can't handle being passed incomplete data. That's how the PowerShell console works, and you can't change it.
Similarly, redirection works line by line. The underlying reason why, I will explain why in a moment.
So, if you want to interact with it character by character, then you are dealing with streams. And if you want to deal with streams directly, it's 100 times more complicated, because you can't use the convenience of the PowerShell console: you have to run the process manually and handle all the input and output yourself.
To start, you have to manually launch the process. To do this we use the System.Diagnostics.Process class. The pseudocode looks something like this:
$p = [System.Diagnostics.Process]::New()
$p.StartInfo.RedirectStandardOutput = $true
$p.StartInfo.RedirectStandardError = $true
$p.StartInfo.RedirectStandardInput = $true
$p.StartInfo.UseShellExecute = $false
#$p.StartInfo.CreateNoWindow = $true
$p.StartInfo.FileName = "plink.exe"
$p.StartInfo.Arguments = 'root@10.0.0.141 -pw "password" -t -batch "isi auth users create testuser --set-password"'
$p.EnableRaisingEvents = $true
....
We essentially create the process and specify that we are going to redirect the stdout (StartInfo.RedirectStandardOutput = $true), as well as the stdin, so that we can handle them ourselves. How do we know when to read the data? Well, the class has the Process.OutputDataReceived event. You bind to this event to read in the additional data. But:
The OutputDataReceived event indicates that the associated Process has
written a line, terminating with a newline character, to its
redirected StandardOutput stream.
So even the Process class revolves around newlines for streaming data. This is why even redirects (*>) work on a line-by-line basis. PowerShell, cmd, etc. all use the Process class as a basis to run processes. They all bind to the same events and methods to do their processing. Hence, everything revolves around newlines and statement completion.
(big breath) So. You still want to interactively work with things one character at a time? Well, then you can't use the convenience of events. You will have to fall back to using a stream reader and directly binding to the Process.StandardOutput property. Unfortunately, this is where I stop and say that accomplishing this is beyond the scope of SO, and will require much more research.
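For the adventurous, a minimal, hedged sketch of the direction such code could take (untested; it assumes plink.exe is on the PATH, ignores stderr and error handling, and omits plink's -batch switch from the original command, since -batch suppresses the interactive prompts this scenario depends on):
$p = [System.Diagnostics.Process]::new()
$p.StartInfo.FileName = 'plink.exe'
$p.StartInfo.Arguments = 'root@10.0.0.141 -pw "password" -t "isi auth users create testuser --set-password"'
$p.StartInfo.RedirectStandardOutput = $true
$p.StartInfo.UseShellExecute = $false
$null = $p.Start()
$sb = [System.Text.StringBuilder]::new()
$buf = New-Object char[] 1
# Read the redirected stdout one character at a time: echo each character
# immediately (unbuffered display) while also capturing it.
while ($p.StandardOutput.Read($buf, 0, 1) -gt 0) {
Write-Host -NoNewline $buf[0]
[void]$sb.Append($buf[0])
}
$p.WaitForExit()
$result = $sb.ToString()   # the captured output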