PowerShell - Write-Output produces string with a BOM character

I'm trying to execute the following command in a PowerShell script:
Write-Output "Some Command" | some-application
When the script runs, some-application receives the string \xef\xbb\xbfSome Command. The first character is a UTF-8 BOM. All the solutions I can find on Google apply only to redirecting output to a file, but I'm trying to send a string to another command (via a pipe).
The $OutputEncoding variable shows that ASCII is configured; UTF-8 is not set.
I'm running this script from an Azure DevOps pipeline, and the problem occurs only there.

Note: This answer deals with how to control the encoding that PowerShell uses when data is piped to an external program (to be read via stdin by such a program).
This is separate from:
what character encoding PowerShell cmdlets use by default on output - for more information, see this answer.
what character encoding PowerShell uses when reading data received from external programs - for more information, see this answer.
The implication is that you've mistakenly set the $OutputEncoding preference variable, which determines what character encoding PowerShell uses to send data to external programs, to a UTF-8 encoding with a BOM.
Your problem goes away if you assign a BOM-less UTF-8 encoding instead:
$OutputEncoding = [System.Text.Utf8Encoding]::new($false) # BOM-less
"Some Command" | some-application
Note that this problem wouldn't arise in PowerShell [Core] v6+, where $OutputEncoding defaults to BOM-less UTF-8 (in Windows PowerShell, unfortunately, it defaults to ASCII).
To illustrate the problem:
$OutputEncoding = [System.Text.Encoding]::Utf8 # *with* BOM
"Some Command" | findstr .
The above outputs ∩╗┐Some Command, where ∩╗┐ is the rendering of the 3-byte UTF-8 BOM (0xef, 0xbb, 0xbf) in the OEM code page 437 (on US-English systems, for instance).
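As a rough sketch of the difference between the two encoding objects, the following compares their preambles (the bytes .NET emits as a BOM). It only calls standard .NET encoding APIs; the variable names are illustrative, and [type]::new() assumes PowerShell v5 or later.

```powershell
# The shared UTF-8 instance carries a 3-byte preamble (the BOM).
$withBom = [System.Text.Encoding]::UTF8
# An instance constructed with $false has no preamble at all.
$withoutBom = [System.Text.UTF8Encoding]::new($false)

# Render the preamble bytes as hex for inspection.
$bomHex = ($withBom.GetPreamble() | ForEach-Object { $_.ToString('x2') }) -join ' '
$bomHex                           # ef bb bf
$withoutBom.GetPreamble().Length  # 0
```

Checking $OutputEncoding.GetPreamble().Length in the failing environment should therefore show whether the configured encoding is one that emits a BOM.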

Related

Convert bat file to powershell script

I have a simple .bat file that executes an exe file and also passes some parameters from a txt file. How do I achieve the same in a PowerShell script (.ps1 file)?
.bat file content:
@echo on
C:\Windows\System32\cmd.exe /C "C:\Program Files\BMC Software\AtriumCore\cmdb\server64\bin\cmdbdiag.exe" -u test -p test -s remedyar -t 41900 < "C:\Program Files\BMC Software\ARSystem\diserver\data-integration\batch\CleanupInputs.txt" > "C:\Program Files\BMC Software\ARSystem\diserver\data-integration\batch\Snow_Output\DailyOutput.log"
Exit 0
Fundamentally, you invoke console applications the same way in PowerShell as you do in cmd.exe, but there are important differences:
# If you really want to emulate `@echo ON` - see comments below.
Set-PSDebug -Trace 1
# * PowerShell doesn't support `<` for *input* redirection, so you must
# use Get-Content to *pipe* a file's content to another command.
# * `>` for *output* redirection *is* supported, but beware encoding problems:
# * Windows PowerShell creates a "Unicode" (UTF-16LE) file,
# * PowerShell (Core, v6+) a BOM-less UTF-8 file.
# * To control the encoding, pipe to Out-File / Set-Content with -Encoding
# * For syntactic reasons, because your executable path is *quoted*, you must
# invoke it via `&`, the call operator.
Get-Content "C:\...\CleanupInputs.txt" |
& "C:\...\cmdbdiag.exe" -u test -p test -s remedyar -t 41900 > "C:\...\DailyOutput.log"
# Turn tracing back off.
Set-PSDebug -Trace 0
exit 0
Note:
For brevity I've replaced the long directory paths in your command with ...
Character-encoding caveats:
When PowerShell communicates with external programs, it only "speaks text" (and it generally never passes raw bytes through its pipelines (as of v7.2)), which therefore involves potentially multiple passes of encoding and decoding strings; specifically:
Get-Content doesn't just pass a text file's raw bytes through; it decodes the content into .NET strings and then sends the content line by line through the pipeline. If the input file lacks a BOM, Windows PowerShell assumes the active ANSI encoding, whereas PowerShell (Core) 7+ assumes UTF-8; you can use the -Encoding parameter to specify the encoding explicitly.
Since the receiving command is an external program (executable), PowerShell (re-)encodes the lines before sending them to the program, based on the $OutputEncoding preference variable, which defaults to ASCII(!) in Windows PowerShell, and UTF-8 in PowerShell (Core) 7+.
Since > - effectively an alias for Out-File - is used to redirect the external program to a file, another round of decoding and encoding happens:
PowerShell first decodes the external program's output into .NET strings, based on the character encoding stored in [Console]::OutputEncoding, which defaults to the system's active OEM code page.
Then Out-File encodes the decoded strings based on its default encoding, which is UTF-16LE ("Unicode") in Windows PowerShell, and BOM-less UTF-8 in PowerShell (Core); to control the encoding, you need to use Out-File (or Set-Content) explicitly and use its -Encoding parameter.
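To make the effect of the ASCII default more concrete, here is a small sketch of what happens to a non-ASCII character when a string is encoded as ASCII versus BOM-less UTF-8. It uses only .NET encoding calls and does not invoke an external program; the sample string and variable names are made up for illustration, and [type]::new() assumes PowerShell v5 or later.

```powershell
# "café", built from a char code so that this sample is itself pure ASCII.
$text = 'caf' + [char]0xE9

# ASCII cannot represent U+00E9, so the encoder substitutes '?'.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$roundTrip = [System.Text.Encoding]::ASCII.GetString($asciiBytes)
$roundTrip          # caf?

# BOM-less UTF-8 keeps the character, using two bytes for it.
$utf8Bytes = [System.Text.UTF8Encoding]::new($false).GetBytes($text)
$utf8Bytes.Length   # 5
```

This is why a $OutputEncoding left at ASCII silently turns accented or CJK characters into question marks before the external program ever sees them.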
See also:
about_Redirection
&, the call operator
This answer discusses default encodings in both PowerShell editions; the short of it: they vary wildly in Windows PowerShell, but PowerShell (Core) 7+ now consistently uses BOM-less UTF-8.
Re execution tracing: use of @echo ON in your batch file and how it compares to PowerShell's Set-PSDebug -Trace 1:
Batch files typically run with @echo OFF so as not to echo each command before printing its output.
@echo ON (or omitting an @echo ON/OFF statement altogether) can be helpful for diagnosing problems during execution, however.
Set-PSDebug -Trace 1 is similar to @echo ON, but it has one disadvantage: the raw source code of commands is echoed, which means that you won't see the value of embedded variable references and expressions - see this answer for more information.

Powershell script fails to run due to unicode character

I have a Powershell script that is failing due to unicode characters in it:
MyScript.ps1:
Write-Host "Installing 無書籤..."
When I run this script from Powershell command line, I get the following error:
I gather the issue is Powershell is running in ASCII or some other non-unicode mode. I tried changing it like this:
$OutputEncoding = [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8
But the error still persists. How do I get Powershell to run my script?
The screen shot implies that you're using Windows PowerShell, which interprets BOM-less *.ps1 files as ANSI-encoded (using the usually single-byte ANSI code page determined by the legacy system locale); by contrast, PowerShell [Core] v6+ now assumes UTF-8.
Therefore, for Windows PowerShell to correctly interpret CJK characters you must save your *.ps1 file using a Unicode encoding with BOM; given that PowerShell source code itself uses ASCII-range characters (resulting in a mix of Latin and CJK characters), the best choice is UTF-8 with BOM.
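If your editor makes it hard to tell what it saved, one way to produce a UTF-8 file with BOM is to write it through .NET with an explicitly BOM-emitting encoding. The following is only a sketch: the file name bom-demo.ps1 and the sample script text are hypothetical, and [type]::new() assumes PowerShell v5 or later.

```powershell
# Sample script text with one CJK character (U+7121), built from a char code
# so that this snippet itself stays pure ASCII.
$source = 'Write-Host "Installing ' + [char]0x7121 + '..."'

# Hypothetical target file in the temp directory (full path, because .NET
# methods do not follow PowerShell's current location).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-demo.ps1'

# UTF8Encoding constructed with $true emits the BOM when writing.
[System.IO.File]::WriteAllText($path, $source, [System.Text.UTF8Encoding]::new($true))

# Confirm that the file starts with the UTF-8 BOM.
$firstBytes = ([System.IO.File]::ReadAllBytes($path)[0..2] | ForEach-Object { $_.ToString('x2') }) -join ' '
$firstBytes   # ef bb bf

Remove-Item $path
```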
As for what you tried:
$OutputEncoding = [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8
These settings only apply when PowerShell communicates with external programs - see this answer.
First, write and save your script, which contains Unicode characters, in Windows PowerShell ISE.
After that, you can edit it in any other editor, such as Visual Studio Code.
This approach should work.

Broken Cmd scripts after creation with PowerShell Write-Output

We have a domain-wide automation tool that can start jobs on servers as admin (Stonebranch UAC - please note: this has nothing to do with Windows "User Account Control"; Stonebranch UAC is an enterprise automation tool). Natively, it looks for Cmd batch scripts, so we use those.
However, I prefer to use PowerShell for everything, so I bulk created dozens of .bat scripts using PowerShell. Nothing worked, the automation tool broke whenever it tried to run the .bat scripts. So I pared back the scripts so that they contained a single line echo 123 and still, everything was broken. We thought it was a problem with the tool, but then tried to run the .bat scripts on the server and they were broken too, just generating some unicode on the command line and failing to run.
So it dawned on us that something about how PowerShell pumps Write-Output commands to create the batch scripts was breaking them (this is on Windows 2012 R2 and PowerShell is 5.1). I can reproduce this: for example, if I type the following in a PowerShell console:
Write-Output "echo 123" > test.bat
If I now open a cmd.exe and then try to run test.bat, I just get a splat of 2 unicode-looking characters on the screen and nothing else.
Can someone explain to me a) why this behaviour happens, and b) how can I continue to use PowerShell to generate these batch scripts without them being broken? i.e. do I have to change BOM or UTF-8 settings or whatever to get this working and how do I do that please?
In Windows PowerShell, >, like the underlying Out-File cmdlet, invariably[1] creates "Unicode" (UTF-16LE) files, which cmd.exe cannot read (not even with the /U switch).
In PowerShell [Core] v6+, BOM-less UTF-8 encoding is consistently used, including by >.
Therefore:
If you're using PowerShell [Core] v6+ AND the content of the batch file comprises ASCII-range characters only (7-bit range), you can get away with >.
Otherwise, use Set-Content with -Encoding Oem.
'@echo 123' | Set-Content -Encoding Oem test.bat
If your batch-file source code only ever contains ASCII-range characters (7-bit range), you can also get away with (in both PowerShell editions):
'@echo 123' | Set-Content test.bat
Note:
As the -Encoding argument implies, the system's active OEM code page is used, which is what batch files expect.
OEM code pages are supersets of ASCII encoding, so a file saved with -Encoding Oem that is composed only of ASCII-range characters is implicitly also an ASCII file. The same applies to BOM-less UTF-8 and ANSI (Default) encoded files composed of ASCII-range characters only.
-Encoding Oem - as opposed to -Encoding Ascii or even using Set-Content's default encoding[2] - therefore only matters if you have non-ASCII-range characters in your batch file's source code, such as é. Such characters, however, are limited to a set of 256 characters in total, given that OEM code pages are fixed-width single-byte encodings, which means that many Unicode characters are inherently unusable, such as €.
[1] In Windows PowerShell v5.1 (and above), it is possible to change >'s encoding via the $PSDefaultParameterValues preference variable - see this answer - however, you won't be able to select a BOM-less UTF-8 encoding, which would be needed for creation of batch files (composed of ASCII-range characters only).
[2] Set-Content's default encoding is the active ANSI code page (Default) in Windows PowerShell (another ASCII superset), and (as for all cmdlets) BOM-less UTF-8 in PowerShell [Core] v6+; for an overview of the wildly inconsistent character encodings in Windows PowerShell, see this answer.
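To see why cmd.exe chokes on such a file, it can help to look at the bytes that a "Unicode" (UTF-16LE) encoding produces for the one-line batch file from the question. This sketch uses only .NET encoding calls; the variable names are illustrative.

```powershell
$line = 'echo 123'
$utf16 = [System.Text.Encoding]::Unicode   # what .NET calls UTF-16LE

# The 2-byte BOM that precedes the content of a UTF-16LE file.
$bom = ($utf16.GetPreamble() | ForEach-Object { $_.ToString('x2') }) -join ' '
$bom               # ff fe

# Every ASCII-range character takes 2 bytes in UTF-16LE...
$utf16Length = $utf16.GetBytes($line).Length
$utf16Length       # 16

# ...but only 1 byte in ASCII (and in OEM, ANSI, or UTF-8).
$asciiLength = [System.Text.Encoding]::ASCII.GetBytes($line).Length
$asciiLength       # 8
```

The two BOM bytes are presumably the "2 unicode-looking characters" that show up in the console, and the interleaved NUL bytes make the remainder unusable as a command.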
A method that creates BOM-less UTF-8 files by default is:
new-item -Path outfile.bat -itemtype file -value "echo this"

Changing PowerShell's default output encoding to UTF-8

By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8.
It can be done on a case-by-case basis by replacing the >foo.txt syntax with | out-file foo.txt -encoding utf8 but this is awkward to have to repeat every time.
The persistent way to set things in PowerShell is to put them in \Users\me\Documents\WindowsPowerShell\profile.ps1; I've verified that this file is indeed executed on startup.
It has been said that the output encoding can be set with $PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'} but I've tried this and it had no effect.
https://blogs.msdn.microsoft.com/powershell/2006/12/11/outputencoding-to-the-rescue/ which talks about $OutputEncoding looks at first glance as though it should be relevant, but then it talks about output being encoded in ASCII, which is not what's actually happening.
How do you set PowerShell to use UTF-8?
Note:
The next section applies primarily to Windows PowerShell.
See the section after it for the cross-platform PowerShell Core (v6+) edition.
In both cases, the information applies to making PowerShell use UTF-8 for reading and writing files.
By contrast, for information on how to send and receive UTF-8-encoded strings to and from external programs, see this answer.
A system-wide switch to UTF-8 is possible nowadays (since recent versions of Windows 10): see this answer, but note the following caveats:
The feature has far-reaching consequences, because both the OEM and the ANSI code page are then set to 65001, i.e. UTF-8; also the feature is still considered a beta feature as of this writing.
In Windows PowerShell, this takes effect only for those cmdlets that default to the ANSI code page, notably Set-Content, but not Out-File / >, and it also applies to reading files, notably including Get-Content and how PowerShell itself reads source code.
The Windows PowerShell perspective:
In PSv5.1 or higher, where > and >> are effectively aliases of Out-File, you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
Note:
In Windows PowerShell (the legacy edition whose latest and final version is v5.1), this invariably creates UTF-8 files with a (pseudo) BOM.
Many Unix-based utilities do not recognize this BOM (see bottom); see this post for workarounds that create BOM-less UTF-8 files.
In PowerShell (Core) v6+, BOM-less UTF-8 is the default (see next section), but if you do want a BOM there, you can use 'utf8BOM'.
In PSv5.0 or below, you cannot change the encoding for > / >>, but, on PSv3 or higher, the above technique does work for explicit calls to Out-File.
(The $PSDefaultParameterValues preference variable was introduced in PSv3.0).
In PSv3.0 or higher, if you want to set the default encoding for all cmdlets that support an -Encoding parameter (which in PSv5.1+ includes > and >>), use:
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
If you place this command in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default, but note that this makes it a session-global setting that will affect all commands / scripts that do not explicitly specify an encoding via their -Encoding parameter.
Similarly, be sure to include such commands in your scripts or modules that you want to behave the same way, so that they indeed behave the same even when run by another user or on a different machine; however, to avoid a session-global change, use the following form to create a local copy of $PSDefaultParameterValues:
$PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }
For a summary of the wildly inconsistent default character encoding behavior across many of the Windows PowerShell standard cmdlets, see the bottom section.
The automatic $OutputEncoding variable is unrelated, and only applies to how PowerShell communicates with external programs (what encoding PowerShell uses when sending strings to them) - it has nothing to do with the encoding that the output redirection operators and PowerShell cmdlets use to save to files.
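To check what the preset actually does in a given edition, you can write a short string with Out-File and inspect the first bytes of the result. This is just a sketch: the file name psdefault-demo.txt is hypothetical, and the preset is removed again at the end so that the session is left unchanged.

```powershell
# Preset -Encoding utf8 for Out-File (and, in PSv5.1+, for > and >>).
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

# Hypothetical scratch file in the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'psdefault-demo.txt'
'hello' | Out-File $path

# Inspect the first three bytes of the file.
$bytes = [System.IO.File]::ReadAllBytes($path)
$head = ($bytes[0..2] | ForEach-Object { $_.ToString('x2') }) -join ' '
$head
# Windows PowerShell 5.1: ef bb bf   (the UTF-8 BOM)
# PowerShell (Core) v6+:  68 65 6c   ("hel", i.e. no BOM)

# Clean up: delete the file and remove the preset again.
Remove-Item $path
$PSDefaultParameterValues.Remove('Out-File:Encoding')
```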
Optional reading: The cross-platform perspective: PowerShell Core:
PowerShell is now cross-platform, via its PowerShell Core edition, whose encoding - sensibly - defaults to BOM-less UTF-8, in line with Unix-like platforms.
This means that source-code files without a BOM are assumed to be UTF-8, and using > / Out-File / Set-Content defaults to BOM-less UTF-8; explicit use of the utf8 -Encoding argument too creates BOM-less UTF-8, but you can opt to create files with the pseudo-BOM with the utf8bom value.
If you create PowerShell scripts with an editor on a Unix-like platform and nowadays even on Windows with cross-platform editors such as Visual Studio Code and Sublime Text, the resulting *.ps1 file will typically not have a UTF-8 pseudo-BOM:
This works fine on PowerShell Core.
It may break on Windows PowerShell, if the file contains non-ASCII characters; if you do need to use non-ASCII characters in your scripts, save them as UTF-8 with BOM.
Without the BOM, Windows PowerShell (mis)interprets your script as being encoded in the legacy "ANSI" codepage (determined by the system locale for pre-Unicode applications; e.g., Windows-1252 on US-English systems).
Conversely, files that do have the UTF-8 pseudo-BOM can be problematic on Unix-like platforms, as they cause Unix utilities such as cat, sed, and awk - and even some editors such as gedit - to pass the pseudo-BOM through, i.e., to treat it as data.
This may not always be a problem, but definitely can be, such as when you try to read a file into a string in bash with, say, text=$(cat file) or text=$(<file) - the resulting variable will contain the pseudo-BOM as the first 3 bytes.
Inconsistent default encoding behavior in Windows PowerShell:
Regrettably, the default character encoding used in Windows PowerShell is wildly inconsistent; the cross-platform PowerShell Core edition, as discussed in the previous section, has commendably put an end to this.
Note:
The following doesn't aspire to cover all standard cmdlets.
Googling cmdlet names to find their help topics now shows you the PowerShell Core version of the topics by default; use the version drop-down list above the list of topics on the left to switch to a Windows PowerShell version.
Historically, the documentation frequently incorrectly claimed that ASCII is the default encoding in Windows PowerShell; fortunately, this has since been corrected.
Cmdlets that write:
Out-File and > / >> create "Unicode" - UTF-16LE - files by default - in which every ASCII-range character (too) is represented by 2 bytes - which notably differs from Set-Content / Add-Content (see next point); New-ModuleManifest and Export-CliXml also create UTF-16LE files.
Set-Content (and Add-Content if the file doesn't yet exist / is empty) uses ANSI encoding (the encoding specified by the active system locale's ANSI legacy code page, which PowerShell calls Default).
Export-Csv indeed creates ASCII files, as documented, but see the notes re -Append below.
Export-PSSession creates UTF-8 files with BOM by default.
New-Item -Type File -Value currently creates BOM-less(!) UTF-8.
The Send-MailMessage help topic also claims that ASCII encoding is the default - I have not personally verified that claim.
Start-Transcript invariably creates UTF-8 files with BOM, but see the notes re -Append below.
Re commands that append to an existing file:
>> / Out-File -Append make no attempt to match the encoding of a file's existing content.
That is, they blindly apply their default encoding, unless instructed otherwise with -Encoding, which is not an option with >> (except indirectly in PSv5.1+, via $PSDefaultParameterValues, as shown above).
In short: you must know the encoding of an existing file's content and append using that same encoding.
Add-Content is the laudable exception: in the absence of an explicit -Encoding argument, it detects the existing encoding and automatically applies it to the new content (thanks, js2010). Note that in Windows PowerShell this means that it is ANSI encoding that is applied if the existing content has no BOM, whereas it is UTF-8 in PowerShell Core.
This inconsistency between Out-File -Append / >> and Add-Content, which also affects PowerShell Core, is discussed in GitHub issue #9423.
Export-Csv -Append partially matches the existing encoding: it blindly appends UTF-8 if the existing file's encoding is any of ASCII/UTF-8/ANSI, but correctly matches UTF-16LE and UTF-16BE.
To put it differently: in the absence of a BOM, Export-Csv -Append assumes UTF-8, whereas Add-Content assumes ANSI.
Start-Transcript -Append partially matches the existing encoding: It correctly matches encodings with BOM, but defaults to potentially lossy ASCII encoding in the absence of one.
Cmdlets that read (that is, the encoding used in the absence of a BOM):
Get-Content and Import-PowerShellDataFile default to ANSI (Default), which is consistent with Set-Content.
ANSI is also what the PowerShell engine itself defaults to when it reads source code from files.
By contrast, Import-Csv, Import-CliXml and Select-String assume UTF-8 in the absence of a BOM.
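Because so much of the above hinges on whether a file has a BOM, a small helper that reports it can be handy before appending to or reading an existing file. The function below is a hypothetical sketch, not a built-in cmdlet: the name Get-FileBom is made up, it only recognizes the common BOMs, and it reads the whole file into memory, so it is meant for small files.

```powershell
function Get-FileBom {
    param([Parameter(Mandatory)][string] $Path)

    # Resolve to a full path, because .NET methods do not follow
    # PowerShell's current location.
    $fullPath = Convert-Path -LiteralPath $Path
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # Hex string of up to the first 4 bytes, e.g. 'efbbbf68'.
    $head = ($bytes | Select-Object -First 4 | ForEach-Object { $_.ToString('x2') }) -join ''

    # Test the 4-byte UTF-32LE BOM before the 2-byte UTF-16LE BOM it starts with.
    switch -Regex ($head) {
        '^efbbbf'   { 'UTF-8 with BOM'; break }
        '^fffe0000' { 'UTF-32LE'; break }
        '^fffe'     { 'UTF-16LE'; break }
        '^feff'     { 'UTF-16BE'; break }
        default     { 'no BOM' }
    }
}

# Usage: create a UTF-16LE file the way > would in Windows PowerShell.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-check-demo.txt'
[System.IO.File]::WriteAllText($demo, 'echo 123', [System.Text.Encoding]::Unicode)
$result = Get-FileBom $demo
$result   # UTF-16LE
Remove-Item $demo
```

A result of 'no BOM' does not identify the encoding; it only tells you that the reading cmdlet's default assumption (ANSI or UTF-8, as listed above) will apply.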
To be short, use:
write-output "your text" | out-file -append -encoding utf8 "filename"
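You may want to wrap parts of the script in a script block so that you can redirect the output of several commands at once; note that the block must be invoked with &, the call operator, because a bare { ... } is only passed along as an object and never executed:
& {
command 1
command 2
} | out-file -append -encoding utf8 "filename"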
A dump made using PowerShell on Windows with output redirection creates a file that has UTF-16 encoding. To work around this issue, you can try:
mysqldump.exe [options] --result-file=dump.sql
Reference link: mysqldump_result-file

How to cat a UTF-8 (no BOM) file properly/globally in PowerShell? (to another file)

Create a file utf8.txt. Ensure the encoding is UTF-8 (no BOM). Set its content to €
In cmd.exe:
type utf8.txt > out.txt
Content of out.txt is €
In PowerShell (v4):
cat .\utf8.txt > out.txt
or
type .\utf8.txt > out.txt
Out.txt content is â‚¬
How do I globally make PowerShell work correctly?
Note: This answer is about Windows PowerShell (up to v5.1); PowerShell [Core, v6+], the cross-platform edition of PowerShell, now fortunately defaults to BOM-less UTF-8 on both in- and output.
Windows PowerShell, unlike the underlying .NET Framework[1], uses the following defaults:
on input: files without a BOM (byte-order mark) are assumed to be in the system's default encoding, which is the legacy Windows code page ("ANSI" code page: the active, culture-specific single-byte encoding, as configured via Control Panel).
on output: the > and >> redirection operators produce UTF-16 LE files by default (which do have - and need - a BOM).
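The input-side default can be simulated without any files: encode € as UTF-8 and then decode those bytes as if they were ANSI. The sketch below assumes Windows-1252 as the ANSI code page (typical for US-English and Western European systems; yours may differ) and uses char codes so that the snippet itself is pure ASCII.

```powershell
# The euro sign, U+20AC.
$euro = [string][char]0x20AC

# Its UTF-8 representation is the 3 bytes e2 82 ac.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($euro)

# Decoding those bytes as Windows-1252 (assumed ANSI code page) yields
# three separate characters instead of one.
$misread = [System.Text.Encoding]::GetEncoding(1252).GetString($utf8Bytes)
$misread.Length   # 3

# Code points of the misread characters.
$codes = ($misread.ToCharArray() | ForEach-Object { ([int]$_).ToString('x4') }) -join ' '
$codes            # 00e2 201a 00ac
```

Those three characters are then what > writes to the output file, this time as UTF-16LE.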
File-consuming and -producing cmdlets do usually support an -Encoding parameter that lets you specify the encoding explicitly.
Prior to Windows PowerShell v5.1, using the underlying Out-File cmdlet explicitly was the only way to change the encoding.
In Windows PowerShell v5.1+, > and >> became effective aliases of Out-File, allowing you to change the encoding behavior of > and >> via the $PSDefaultParameterValues preference variable; e.g.:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
For Windows PowerShell to handle UTF-8 properly, you must specify it as both the input and output encoding[2], but note that on output, PowerShell invariably adds a BOM to UTF-8 files.
Applied to your example:
Get-Content -Encoding utf8 .\utf8.txt | Out-File -Encoding utf8 out.txt
To create a UTF-8 file without a BOM in PowerShell, see this answer of mine.
[1] .NET Framework uses (BOM-less) UTF-8 by default, both for in- and output.
This - intentional - difference in behavior between Windows PowerShell and the framework it is built on is unusual. The difference went away in PowerShell [Core] v6+: both .NET [Core] and PowerShell [Core] default to BOM-less UTF-8.
[2] Get-Content does, however, automatically recognize UTF-8 files with a BOM.
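Footnote [1] also suggests one possible route to a BOM-less UTF-8 file from Windows PowerShell: call the .NET file API directly and let it use its own default. This is a sketch rather than the approach from the linked answer; the file name bomless-demo.txt is hypothetical, and a full path is used because .NET methods do not follow PowerShell's current location.

```powershell
# The euro sign, built from its code point so this snippet stays pure ASCII.
$euro = [string][char]0x20AC

# Hypothetical scratch file in the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bomless-demo.txt'

# Without an encoding argument, WriteAllText writes UTF-8 without a BOM.
[System.IO.File]::WriteAllText($path, $euro)

# The file holds exactly the 3 UTF-8 bytes of the euro sign and nothing else.
$hex = ([System.IO.File]::ReadAllBytes($path) | ForEach-Object { $_.ToString('x2') }) -join ' '
$hex   # e2 82 ac

Remove-Item $path
```

Unlike Out-File, WriteAllText adds no trailing newline, so add one to the string yourself if the consumer expects it.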
For PowerShell 5.1, enable this setting:
Control Panel, Region, Administrative, Change system locale, Use Unicode UTF-8 for worldwide language support
Then enter this into PowerShell:
$PSDefaultParameterValues['*:Encoding'] = 'Default'
Alternatively, you can upgrade to PowerShell 6 or higher.
https://github.com/PowerShell/PowerShell