Random linebreaks in PowerShell standard error output - powershell

I want to convert many .iso files to .mp4 with HandBrake, so I am trying to use the command line interface. I would prefer to write my scripts for this in powershell instead of batch files. However, the standard error output contains line breaks at random locations if I use powershell.
For troubleshooting, I created a simplified script both in powershell and in batch.
Powershell:
& "$Env:ProgramFiles\HandBrake\HandBrakeCLI.exe" @(
    '--input', 'V:\',
    '--title', '1', '--chapter', '1',
    '--start-at', 'duration:110', '--stop-at', 'duration:15',
    '--output', 'pmovie.mp4',
    '--format', 'av_mp4'
) > ".\pstd.txt" 2> ".\perr.txt"
Batch file:
"%ProgramFiles%\HandBrake\HandBrakeCLI.exe" --input V:\ --title 1 --chapter 1 --start-at duration:110 --stop-at duration:15 --output ".\cmovie.mp4" --format av_mp4 > ".\cstd.txt" 2> ".\cerr.txt"
Both scripts create the same .mp4 file; the only difference is the standard error output they create:
Powershell:
HandBrakeCLI.exe : [10:41:44] hb_init: starting libhb thread
At C:\Test\phandbrake.ps1:1 char:2
+ & <<<< "$Env:ProgramFiles\HandBrake\HandBrakeCLI.exe" @(
+ CategoryInfo : NotSpecified: ([10:41:44] hb_i...ng libhb thread
:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
[10:41:44] thread 541fc20 started ("libhb")
HandBrake 1.1.2 (2018090500) - MinGW x86_64 - https://handbrake.fr
8 CPUs detected
O
pening V:\...
[10:41:44] CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
[10:41:44] - Intel microarchitecture Sandy Bridge
[10:41:44] - logical processor count: 8
[10:41:44] Intel Quick Sync Video support: no
[10:41:44] hb_scan: path=V:\, title_index=1
src/libbluray/disc/disc.c:424: error opening file BDMV\index.bdmv
src/libbluray/disc/disc.c:424: error opening file BDMV\BACKUP\index.bdmv
[10:41:44] bd: not a bd - trying as a stream/file instead
libdvdnav: Using dvdnav version 6.0.0
l
ibdvdnav: Unable to open device file V:\.
libdvdnav: vm: dvd_read_name failed
libdvdnav: DVD disk re
ports i
tself wi
th Region mask 0x
0000000
0. Reg
ions:
1 2 3 4 5
6 7 8
Batch file:
[10:41:35] hb_init: starting libhb thread
[10:41:35] thread 5a2cc30 started ("libhb")
HandBrake 1.1.2 (2018090500) - MinGW x86_64 - https://handbrake.fr
8 CPUs detected
Opening V:\...
[10:41:35] CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
[10:41:35] - Intel microarchitecture Sandy Bridge
[10:41:35] - logical processor count: 8
[10:41:35] Intel Quick Sync Video support: no
[10:41:35] hb_scan: path=V:\, title_index=1
src/libbluray/disc/disc.c:424: error opening file BDMV\index.bdmv
src/libbluray/disc/disc.c:424: error opening file BDMV\BACKUP\index.bdmv
[10:41:35] bd: not a bd - trying as a stream/file instead
libdvdnav: Using dvdnav version 6.0.0
libdvdnav: Unable to open device file V:\.
libdvdnav: vm: dvd_read_name failed
libdvdnav: DVD disk reports itself with Region mask 0x00000000. Regions: 1 2 3 4 5 6 7 8
libdvdread: Attempting to retrieve all CSS keys
libdvdread: This can take a _long_ time, please be patient
libdvdread: Get key for /VIDEO_TS/VIDEO_TS.VOB at 0x00000130
libdvdread: Elapsed time 0
This bothers me because I would like to check these text files to be sure that there was no error during the encoding.
I suppose this may be related to a lack of synchronization between threads that write to the same stream but I am not sure about it.
The question: What can I do to get the standard error output from PowerShell without these random line breaks?

You might try the Start-Process command, with -RedirectStandardError, -RedirectStandardInput, and -Wait options.
These -Redirect... options on Start-Process perform OS-level I/O redirection directly to the target file, as most shells do. As I understand it, that's not how PowerShell's angle-bracket redirection works; instead, the angle brackets pipe the output through another PowerShell pipeline (using Out-File or something similar), which inserts line breaks between the strings it receives.
I'm not sure of the exact details of this, but I'm glad to hear it seems to address the problem for you as it has for me.
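For the HandBrake call from the question, a Start-Process invocation might look like the following sketch (the argument list mirrors the question's example; -NoNewWindow and -PassThru are optional additions):

```powershell
# Sketch: OS-level stream redirection via Start-Process, mirroring the question's HandBrake call.
$proc = Start-Process -FilePath "$Env:ProgramFiles\HandBrake\HandBrakeCLI.exe" `
    -ArgumentList @(
        '--input', 'V:\',
        '--title', '1', '--chapter', '1',
        '--start-at', 'duration:110', '--stop-at', 'duration:15',
        '--output', 'pmovie.mp4',
        '--format', 'av_mp4'
    ) `
    -RedirectStandardOutput '.\pstd.txt' `
    -RedirectStandardError '.\perr.txt' `
    -NoNewWindow -Wait -PassThru

# -PassThru returns the process object, so the exit code can be checked afterwards.
$proc.ExitCode
```

Because the redirection happens at the OS level, perr.txt should contain the raw stderr text without the PowerShell error-record decoration or extra line breaks.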

I think the issue here is that there is a certain width to the console, and the console itself is essentially being redirected to a file.
My solution to this is to redirect the output directly to the pipeline, using:
2>&1 #Interpreted by the console
2>&1 | x #Output directly to x
And then using Out-File with the available -Width parameter:
$(throw thisisnotsometthingyoucanthrowbutisinfactaverylongmessagethatdemonstratesmypoint) 2>&1 |
Out-File "test.txt" -Width 10000
In this case, powershell will write 10,000 characters before wrapping the text.
However, you also have some odd line breaks in there that I can't replicate right now. That said, now that you know how to send output through the pipeline, you can use other methods to remove the line breaks.
For example, you can use this function which prints out the exact control characters that cause line breaks.
$(throw error) 2>&1 | Out-String | Debug-String
Then, you can go through the output and replace the problem characters, like so:
$(throw error) 2>&1 | Out-String | % {$_ -replace "`r"} | Out-File "test.txt" -Width 10000

Burt Harris' helpful answer shows you one way to avoid the problem, via Start-Process; however, that approach requires you to structure the command fundamentally differently.
If the output that an equivalent batch file produces is sufficient, there's an easier way: simply call cmd /c and let cmd handle the output redirections, as in your batch file:
cmd /c "`"`"$Env:ProgramFiles\HandBrake\HandBrakeCLI.exe`"`"" @(
    '--input', 'V:\',
    '--title', '1', '--chapter', '1',
    '--start-at', 'duration:110', '--stop-at', 'duration:15',
    '--output', 'pmovie.mp4',
    '--format', 'av_mp4'
) '> .\pstd.txt 2> .\perr.txt'
Note how the two output redirections are passed as a single, quoted string, to ensure that they are interpreted by cmd.exe rather than by PowerShell.
Also note the embedded escaped double quotes (`") around the executable path to ensure that cmd.exe sees the entire path as a single, double-quoted string.
As for the extra line breaks you're seeing:
I have no specific explanation, but I can tell you how > and 2> work differently in PowerShell - both compared to cmd.exe (batch files) and Start-Process with -RedirectStandard*:
cmd.exe's redirection operator (>) writes raw bytes to the specified target file, both when redirecting stdout (just > or, explicitly, 1>) and stderr (2>); as such, text output by external programs such as HandBrakeCLI.exe is passed through as-is.
Start-Process, which uses the .NET API under the hood, does essentially the same when -RedirectStandardOutput and/or -RedirectStandardError parameters are specified.
By contrast, Powershell's own > operator functions differently:
PowerShell-internally (when calling native PowerShell commands) it converts input objects (that aren't already strings) to strings using PowerShell's rich output formatting system, before sending them to the output file(s), using the character encoding detailed below.
Output received from external programs is assumed to be text, whose encoding is assumed to be the system's OEM character encoding by default, as reflected in [console]::OutputEncoding and chcp. The decoded text is loaded into .NET strings (which are inherently UTF-16-based) line by line.
For redirected stdout output, these strings are re-encoded on output to the target file, using the following encoding by default:
Windows PowerShell: UTF-16LE ("Unicode")
PowerShell Core: UTF-8 without BOM
Note: Only in Windows PowerShell v5.1 or higher and PowerShell Core can you change these defaults - see this answer for details.
By contrast, when redirecting stderr output, via stream 2 (PowerShell's error stream), the strings are wrapped in error objects (instances of type [System.Management.Automation.ErrorRecord]) before being output, and the resulting objects are converted to strings based on PowerShell's output-formatting system, and the same character encoding as above is applied on output to the target file.
You can see evidence of that in your output containing extra information and lines such as HandBrakeCLI.exe : [10:41:44] hb_init: starting libhb thread and
At C:\Test\phandbrake.ps1:1 char:2, ...
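This wrapping of stderr lines in error objects can be reproduced with a trivial external command; the following sketch uses cmd.exe as the external program:

```powershell
# Merge stderr into the success stream and inspect the type of each captured line.
$lines = cmd /c 'echo data & echo err >&2' 2>&1

# Stdout lines arrive as plain strings; stderr lines arrive as ErrorRecord instances,
# which is why they pick up the extra formatting when redirected to a file.
$lines | ForEach-Object { $_.GetType().Name }
```

In Windows PowerShell, redirecting `$lines` to a file with `>` then renders the ErrorRecord entries through the output-formatting system, producing the decorated, width-sensitive output shown above.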
It also means that extra line breaks can be introduced, because text produced by the output-formatting system assumes a fixed line width based on the console window's width.
That said, that doesn't explain the oddly placed line breaks in your case.

Related

Powershell : How do I capture Success and Failure of call to command? [duplicate]

I'm using some GIT commands in my PowerShell scripts. Most of the time I'm calling the GIT commands via Invoke-Expression so that I, e.g.
can parse the output, and/or
forward the output to a logging method.
With some GIT commands I noticed that not all output is returned via Invoke-Expression, though the documentation states:
Outputs
PSObject
Returns the output that is generated by the invoked command (the value of the Command parameter).
Here is an example:
> $x = iex "git fetch --all"
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4 (delta 3), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (4/4), done.
Content of $x:
> $x
Fetching origin
Fetching upstream
So the main information is not returned to $x. I can't imagine that git fetch --all is returning the main information via stderr (wouldn't make sense ...).
I also found this PowerShell question, which is unanswered and uses PowerShell version 2.
Used PowerShell version:
> $PSVersionTable
Name Value
---- -----
PSVersion 6.2.0
PSEdition Core
GitCommitId 6.2.0
OS Microsoft Windows 10.0.18362
Platform Win32NT
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0
How can I force Invoke-Expression to return the whole output?
Thx
As I mentioned in "PowerShell Capture Git Output", with Git 2.16 (Q1 2018), you can try and set first:
set GIT_REDIRECT_STDERR=2>&1
Then, in your Powershell script, you should get both stdout and stderr outputs.
See also dahlbyk/posh-git issue 109 for a more Powershell-like example:
$env:GIT_REDIRECT_STDERR = '2>&1'
VonC's answer works well with git, specifically, but it's worth discussing a generic solution:
Note: Invoke-Expression should generally be avoided and there is no reason to use it for invocation of external programs: just invoke them directly and assign to a variable:
$capturedStdout = git ... # capture git's stdout output as an array of lines
As has been noted, git outputs status information to stderr, whereas data goes to stdout; a PowerShell variable assignment only captures stdout output.[1]
To capture a combination of stdout and stderr, interleaved, as it would print to the terminal, you can use redirection 2>&1, as in other shells, to merge the error stream / stderr (2) into (>&) the data output stream (stdout equivalent, 1 - see about_Redirection):
$combinedOutput = git fetch --all 2>&1
Caveat: In PowerShell versions up to v7.1, if $ErrorActionPreference = 'Stop' happens to be in effect, the use of 2> unexpectedly triggers a terminating error; this problematic behavior is discussed in GitHub issue #4002.
There are non-obvious differences to the behavior of other shells, however:
The output will be an array of lines, not a single, multi-line string,
Note: As of PowerShell 7.2 - external-program output is invariably interpreted as text (strings) - there is no support for raw binary output; see this answer.
Lines that originated from stdout are represented as strings, as expected, but lines originating from stderr are actually [System.Management.Automation.ErrorRecord] instances, though they print like strings and on conversion to strings do reproduce the original line, such as when sending the result to an external program.
This answer shows how to separate the captured lines by stream of origin (assuming stdout and stderr were merged).
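A minimal sketch of such a separation by type, assuming the streams were merged with 2>&1:

```powershell
# Capture the merged output of stdout and stderr.
$all = git fetch --all 2>&1

# Stderr lines are ErrorRecord instances; stdout lines are plain strings,
# so filtering on the .NET type separates the two streams after the fact.
$fromStderr = $all | Where-Object { $_ -is [System.Management.Automation.ErrorRecord] }
$fromStdout = $all | Where-Object { $_ -is [string] }
```

Converting the ErrorRecord entries with `"$_"` (string interpolation) reproduces the original stderr lines if needed.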
Being able to capture stderr output separately in a variable would be desirable, but that isn't supported as of PowerShell 7.2.x. Adding future support, along the lines of 2>variable:errs, is the subject of GitHub issue #4332.
The array-based result can be advantageous for parsing; e.g., to find a line that contains the word unpacking:
PS> $combinedOutput -match 'unpacking'
Unpacking objects: 100% (4/4), done.
Note: If there's a chance that only one line was output, use @($combinedOutput) -match 'unpacking' to ensure the output is treated as an array.
If you prefer to receive a single, multi-line string instead:
$combinedOutput = (git fetch --all 2>&1) -join "`n" # \n (LF); or: [Environment]::NewLine
If you don't mind a trailing newline as part of the string, you can more simply use Out-String:[2]
$combinedOutput = git fetch --all 2>&1 | Out-String
Caveat: In Windows PowerShell this won't work as expected if stderr lines are present, as they are rendered like PowerShell error records (this problem has been fixed in PowerShell (Core) 6+); run cmd /c 'echo data & echo err >&2' 2>&1 | Out-String to see the problem. Use the -join "`n" solution to avoid the problem.
Note:
As usual, irrespective of what redirections you use, determining whether an external-program call succeeded or failed should be based only on its exit code, reflected in PowerShell's automatic $LASTEXITCODE variable: By convention (which most, but not all, programs observe), 0 indicates success and any nonzero value failure (a notable exception is robocopy, which uses several nonzero exit codes to communicate additional information in the success case).
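A sketch of combining output capture with an exit-code check:

```powershell
# Capture the merged output while still letting it flow through the pipeline.
git fetch --all 2>&1 | Tee-Object -Variable output

# Only $LASTEXITCODE reliably signals failure; the presence of stderr output does not.
if ($LASTEXITCODE -ne 0) {
    throw "git fetch failed with exit code ${LASTEXITCODE}:`n$($output -join "`n")"
}
```

Tee-Object is used here so the output is both displayed and retained in `$output` for the error message.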
[1] For comprehensive information on capturing output from external programs in PowerShell, see this answer.
[2] This problematic Out-String behavior is discussed in GitHub issue #14444.
try this (without iex)
$x=git fetch --all

Docker save images twice the size when using powershell - saving raw byte streams

Docker version 18.03.1-ce, build 9ee9f40
I'm using powershell to build a big project on windows.
When issuing the command
docker save docker.elastic.co/kibana/kibana > deploy/kibana.docker
I'm getting a 1.4 GB file.
The same command run in CMD produces a 799 MB image.
The same command run in bash produces a 799 MB image.
CMD and Bash take less than a minute to save an image, while PowerShell takes about 10 minutes.
I did not manage to find any normal explanation of this phenomenon in docker or MS docs.
Right now the "solution" is
Write-Output "Saving images to files"
cmd /c .\deploy-hack.cmd
But I want to find the actual underlying reason for this.
PowerShell doesn't support outputting / passing raw byte streams through - any output from an external program such as docker is parsed line by line into strings, and the strings are then re-encoded on output to a file (if necessary).
It is the overhead of parsing, decoding and re-encoding that explains the performance degradation.
Windows PowerShell's > redirection operator produces UTF-16LE ("Unicode") files by default (whereas PowerShell Core uses UTF-8), i.e., files that use (at least) 2 bytes per character. Therefore, it produces files that are twice the size of raw byte input[1], because each byte is interpreted as a character that receives a 2-byte representation in the output.
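The doubling can be observed in isolation with a sketch like the following (file sizes are approximate, since a BOM and a trailing newline add a few bytes in Windows PowerShell):

```powershell
# Windows PowerShell's default Out-File / > encoding is UTF-16LE: 2 bytes per ASCII character.
'x' * 1000 | Out-File .\demo.txt
(Get-Item .\demo.txt).Length           # roughly 2000 bytes, plus BOM and newline

# Forcing a single-byte encoding roughly halves the file size for ASCII content.
'x' * 1000 | Out-File .\demo.txt -Encoding ascii
(Get-Item .\demo.txt).Length           # roughly 1000 bytes, plus newline
```

The same per-character expansion is what turns docker's 799 MB byte stream into a 1.4 GB file.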
Your best bet is to use docker save with the -o / --output option to specify the output file (see the docs):
docker save docker.elastic.co/kibana/kibana -o deploy/kibana.docker
[1] Strictly speaking, how PowerShell interprets output from external programs depends on the value of [console]::OutputEncoding, which, if set to UTF8 (chcp 65001 on Windows), could situationally interpret multiple bytes as a single character. However, on Windows PowerShell the default is determined by the (legacy) system locale's OEM code page, which is always a single-byte encoding.

Powershell Encoding Default Output

I have the following problems with a powershell script that runs inside a TFS build. Both problems are unrelated to TFS and can be reproduced using an simple powershell command line window.
1) Completely unrelated to TFS. It seems Powershell does not like German umlauts when it comes to pipes.
1a) This line of code works fine and all umlauts are shown correctly
.\TF.exe hist "$/Test" /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /version:C1~T
1b) This line messes with umlauts
.\TF.exe hist "$/Test" /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /version:C1~T | Out-String
Initially I tried Out-File and changed the encoding, only to see that the umlauts are encoded wrong with every encoding I tried (UTF8, Unicode, UTF32, ...).
I really do not know how to extract a string from standard output and get the umlauts right.
2) When using Out-File or Out-String, each line in the output gets truncated after 80 characters, which seems to be the default screen buffer setting. How can I change that inside a powershell script, and why does it even have an impact when redirecting the output?
Problem number 2 is not a Powershell problem. tfs documentation says following about default /format parameter (i.e. /format:brief)
Some of the data may be truncated.
/format:detailed does not have that warning, but it returns more information, which you can process with Powershell before doing Out-String or Out-File.
tl;dr
The following should solve both your problems, which stem from tf.exe using the ANSI character encoding rather than the expected OEM encoding, and from tf.exe truncating its output by default:
If you're using Windows PowerShell (the Windows-only legacy edition of PowerShell with versions up to v5.1):
$correctlyCapturedOutput =
& {
$prev = [Console]::OutputEncoding
[Console]::OutputEncoding = [System.Text.Encoding]::Default
# Note the addition of /format:detailed
.\tf.exe hist '$/Test' /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /format:detailed /version:C1~T
[Console]::OutputEncoding = $prev
}
If you're using the cross-platform, install-on-demand PowerShell (Core) 7+:
Note: [System.Text.Encoding]::Default, which reports the active ANSI code page's encoding in Windows PowerShell, reports (BOM-less) UTF-8 in PowerShell (Core) (reflecting .NET Core's / .NET 5+'s behavior). Therefore, the active ANSI code page must be determined explicitly, which is most robustly done via the registry.
$correctlyCapturedOutput =
& {
$prev = [Console]::OutputEncoding
[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(
[int] ((Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP).ACP)
)
# Note the addition of /format:detailed
.\tf.exe hist '$/Test' /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /format:detailed /version:C1~T
[Console]::OutputEncoding = $prev
}
This Gist contains helper function Invoke-WithEncoding, which can simplify the above in both PowerShell editions as follows:
$correctlyCapturedOutput =
Invoke-WithEncoding -Encoding Ansi {
.\tf.exe hist '$/Test' /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /format:detailed /version:C1~T
}
You can directly download and define the function with the following command (while I can personally assure you that doing so is safe, it is advisable to check the source code first):
# Downloads and defines function Invoke-WithEncoding in the current session.
irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw/Invoke-WithEncoding.ps1 | iex
Read on for a detailed discussion.
Re the umlaut (character encoding) problem:
While the output from external programs may print OK to the console, when it comes to capturing the output in a variable or redirecting it - such as sending it through the pipeline to Out-String in your case - PowerShell decodes the output into .NET strings, using the character encoding stored in [Console]::OutputEncoding.
If [Console]::OutputEncoding doesn't match the actual encoding used by the external program, PowerShell will misinterpret the output.
The solution is to (temporarily) set [Console]::OutputEncoding to the actual encoding used by the external program.
While the official tf.exe documentation doesn't discuss character encodings, this comment on GitHub suggests that tf.exe uses the system's active ANSI code page, such as Windows-1252 on US-English or Western European systems.
It should be noted that the use of the ANSI code page is nonstandard behavior for a console application, because console applications are expected to use the system's active OEM code page. As an aside: python too exhibits this nonstandard behavior by default, though its behavior is configurable.
The solutions at the top show how to temporarily switch [Console]::OutputEncoding to the active ANSI code page's encoding in order to ensure that PowerShell correctly decodes tf.exe's output.
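To see the two code pages side by side on a given system, a quick sketch (the values in the comments are merely typical for Western European systems):

```powershell
# The console (OEM) encoding PowerShell uses by default to decode external-program output:
[Console]::OutputEncoding.CodePage    # e.g. 850

# The system's active ANSI code page, which tf.exe actually uses for its output:
(Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage).ACP   # e.g. 1252
```

Whenever the two differ, characters outside the ASCII range, such as German umlauts, are misinterpreted on capture.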
Re output-line truncation with Out-String / Out-File (and therefore also > and >>):
As Mustafa Zengin's helpful answer points out, in your particular case - due to use of tf.exe - the truncation happens at the source, i.e. it is tf.exe itself that outputs truncated data per its default formatting (implied /format:brief when /noprompt is also specified).
In general, Out-String and Out-File / > / >> do situationally truncate or line-wrap their output lines based on the console-window width (with a default of 120 chars. in the absence of a console):
Truncation or line-wrapping applies only to output lines stemming from the representations of non-primitive, non-string objects generated by PowerShell's rich output-formatting system:
Strings themselves ([string] input) as well as the string representations of .NET primitive types (plus a few more single-value-only types) are not subject to truncation / line-wrapping.
Since PowerShell only ever interprets output from external programs as text ([string] instances), truncation / line-wrapping does not occur.
It follows that there's usually no reason to use Out-String on external-program output - unless you need to join the stream (array) of output lines to form a single, multiline string for further in-memory processing.
However, note that Out-String invariably adds a trailing newline to the resulting string, which may be undesired; use (...) -join [Environment]::NewLine to avoid that; Out-String's problematic behavior is discussed in GitHub issue #14444.

Output to text file with cyrillic content

Trying to get an output through cmd with the list of folders and files inside a drive.
Some folders are written in cyrillic alphabet so I only get ??? symbols.
My command:
tree /f /a |clip
or
tree /f /a >output.txt
Result:
\---???????????
\---2017 - ????? ??????? ????
01. ?????.mp3
02. ? ???????.mp3
03. ????.mp3
04. ?????? ? ???.mp3
05. ?????.mp3
06. ???? ?????.mp3
07. ???????? ????.mp3
08. ??? ?? ?????.mp3
Cover.jpg
Any idea?
tree.com uses the native UTF-16 encoding when writing to the console, just like cmd.exe and powershell.exe. So at first you'd expect redirecting the output to a file or pipe to also use Unicode. But tree.com, like most command-line utilities, encodes output to a pipe or disk file using a legacy codepage. (Speaking of legacy, the ".com" in the filename here is historical. In 64-bit Windows it's a regular 64-bit executable, not 16-bit DOS code.)
When writing to a pipe or disk file, some programs hard code the system ANSI codepage (e.g. 1252 in Western Europe) or OEM codepage (e.g. 850 in Western Europe), while some use the console's current output codepage (if attached to a console), which defaults to OEM. The latter would be great because you can change the console's output codepage to UTF-8 via chcp.com 65001. Unfortunately tree.com uses the OEM codepage, with no option to use anything else.
cmd.exe, on the other hand, at least provides a /u option to output its built-in commands as UTF-16. So, if you don't really need tree-formatted output, you could simply use cmd's dir command. For example:
cmd /u /c "dir /s /b" | clip
If you do need tree-formatted output, one workaround would be to read the output from tree.com directly from a console screen buffer, which can be done relatively easily for up to 9,999 lines. But that's not generally practical.
Otherwise PowerShell is probably your best option. For example, you could modify the Show-Tree script to output files in addition to directories.
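If a flat list of paths is acceptable instead of tree-formatted output, one hedged alternative is to emit full paths from PowerShell and write the file as UTF-8 yourself, which preserves Cyrillic names (the drive letter `D:\` below is a placeholder for the drive in question):

```powershell
# Recursively list everything on the drive and save the full paths as UTF-8
# (Windows PowerShell writes a BOM; PowerShell Core writes BOM-less UTF-8).
Get-ChildItem -LiteralPath 'D:\' -Recurse |
    ForEach-Object { $_.FullName } |
    Out-File .\output.txt -Encoding utf8
```

Because the strings never pass through a legacy code page, the Cyrillic folder and file names survive intact in the output file.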

windows cmd pipe not unicode even with /U switch

I have a little c# console program that outputs some text using Console.WriteLine. I then pipe this output into a textfile like:
c:\>myprogram > textfile.txt
However, the file is always an ansi text file, even when I start cmd with the /u switch.
cmd /? says about the /u switch:
/U Causes the output of internal
commands to a pipe or file to be Unicode
And it indeed makes a difference, when I do an
c:\>echo "foo" > text.txt
the text.txt is unicode (without BOM)
I wonder why piping the output of my console program into a new file does not create an unicode file likewise and how i could change that?
For now I just use Windows PowerShell (which produces a unicode file with correct BOM), but I'd still like to know how to do it with cmd.
Thanks!
The /U switch, as the documentation says, affects whether internal commands generate Unicode output. Your program is not one of cmd.exe's internal commands, so the /U option does not affect it.
To create a Unicode text file, you need to make sure your program is generating Unicode text.
Even that may not be enough, though. I came across this blog from Junfeng Zhang describing how to write Unicode text in a console program. It checks the file type of the standard output handle. For character files (a console or LPT port), it calls WriteConsoleW. For all other types of handles (including disk files and pipes), it converts the output string to the console's current code page. I'm afraid I don't know how that translates into .NET terms, though.
I had a look at how mscorlib implements Console.WriteLine, and it seems to decide which text-output encoding to use based on a call to GetConsoleOutputCP. So I'm guessing (but have not yet confirmed) that the codepage returned is a different one for a PS console than for a cmd console, so that my program indeed only outputs ANSI when running from cmd.