Changing PowerShell's default output encoding to UTF-8 - powershell

By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8.
It can be done on a case-by-case basis by replacing the >foo.txt syntax with | out-file foo.txt -encoding utf8 but this is awkward to have to repeat every time.
The persistent way to set things in PowerShell is to put them in \Users\me\Documents\WindowsPowerShell\profile.ps1; I've verified that this file is indeed executed on startup.
It has been said that the output encoding can be set with $PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'} but I've tried this and it had no effect.
https://blogs.msdn.microsoft.com/powershell/2006/12/11/outputencoding-to-the-rescue/, which talks about $OutputEncoding, looks at first glance as though it should be relevant, but it then talks about output being encoded in ASCII, which is not what's actually happening.
How do you set PowerShell to use UTF-8?

Note:
The next section applies primarily to Windows PowerShell.
See the section after it for the cross-platform PowerShell Core (v6+) edition.
In both cases, the information applies to making PowerShell use UTF-8 for reading and writing files.
By contrast, for information on how to send and receive UTF-8-encoded strings to and from external programs, see this answer.
A system-wide switch to UTF-8 is possible nowadays (since recent versions of Windows 10): see this answer, but note the following caveats:
The feature has far-reaching consequences, because both the OEM and the ANSI code page are then set to 65001, i.e. UTF-8; also the feature is still considered a beta feature as of this writing.
In Windows PowerShell, this takes effect only for those cmdlets that default to the ANSI code page, notably Set-Content, but not Out-File / >, and it also applies to reading files, notably including Get-Content and how PowerShell itself reads source code.
The Windows PowerShell perspective:
In PSv5.1 or higher, where > and >> are effectively aliases of Out-File, you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
Note:
In Windows PowerShell (the legacy edition whose latest and final version is v5.1), this invariably creates UTF-8 files with a (pseudo-)BOM.
Many Unix-based utilities do not recognize this BOM (see bottom); see this post for workarounds that create BOM-less UTF-8 files (a minimal sketch follows after this note).
In PowerShell (Core) v6+, BOM-less UTF-8 is the default (see next section), but if you do want a BOM there, you can use 'utf8BOM'
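One commonly used workaround in Windows PowerShell is to call the .NET API directly, which writes BOM-less UTF-8 by default; a minimal sketch (the file name and content are just examples, and .NET resolves relative paths against the process working directory, hence the use of $PWD):
$lines = 'line 1', 'line 2'   # whatever content you want to write
[System.IO.File]::WriteAllLines("$PWD\foo.txt", $lines)   # writes BOM-less UTF-8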
In PSv5.0 or below, you cannot change the encoding for > / >>, but, on PSv3 or higher, the above technique does work for explicit calls to Out-File.
(The $PSDefaultParameterValues preference variable was introduced in PSv3.0).
In PSv3.0 or higher, if you want to set the default encoding for all cmdlets that support
an -Encoding parameter (which in PSv5.1+ includes > and >>), use:
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
If you place this command in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default, but note that this makes it a session-global setting that will affect all commands / scripts that do not explicitly specify an encoding via their -Encoding parameter.
Similarly, include such commands in any scripts or modules that you want to behave the same way, so that they behave consistently even when run by another user or on a different machine; however, to avoid a session-global change there, use the following form to create a local copy of $PSDefaultParameterValues:
$PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }
For a summary of the wildly inconsistent default character encoding behavior across many of the Windows PowerShell standard cmdlets, see the bottom section.
The automatic $OutputEncoding variable is unrelated, and only applies to how PowerShell communicates with external programs (what encoding PowerShell uses when sending strings to them) - it has nothing to do with the encoding that the output redirection operators and PowerShell cmdlets use to save to files.
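To illustrate the distinction, here is a minimal sketch (findstr serves only as a handy example of an external program; what you then see printed additionally depends on [Console]::OutputEncoding):
# Affects only data piped TO external programs:
$OutputEncoding = [System.Text.UTF8Encoding]::new($false)   # BOM-less UTF-8; ::new() requires PSv5+
'héllo' | findstr .   # findstr now receives the string UTF-8-encoded
# Has NO effect on file redirection - with default settings this still creates a UTF-16LE file:
'héllo' > out.txt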
Optional reading: The cross-platform perspective: PowerShell Core:
PowerShell is now cross-platform, via its PowerShell Core edition, whose encoding - sensibly - defaults to BOM-less UTF-8, in line with Unix-like platforms.
This means that source-code files without a BOM are assumed to be UTF-8, and using > / Out-File / Set-Content defaults to BOM-less UTF-8; explicit use of the utf8 -Encoding argument too creates BOM-less UTF-8, but you can opt to create files with the pseudo-BOM with the utf8bom value.
If you create PowerShell scripts with an editor on a Unix-like platform and nowadays even on Windows with cross-platform editors such as Visual Studio Code and Sublime Text, the resulting *.ps1 file will typically not have a UTF-8 pseudo-BOM:
This works fine on PowerShell Core.
It may break on Windows PowerShell, if the file contains non-ASCII characters; if you do need to use non-ASCII characters in your scripts, save them as UTF-8 with BOM.
Without the BOM, Windows PowerShell (mis)interprets your script as being encoded in the legacy "ANSI" codepage (determined by the system locale for pre-Unicode applications; e.g., Windows-1252 on US-English systems).
Conversely, files that do have the UTF-8 pseudo-BOM can be problematic on Unix-like platforms, as they cause Unix utilities such as cat, sed, and awk - and even some editors such as gedit - to pass the pseudo-BOM through, i.e., to treat it as data.
This may not always be a problem, but definitely can be, such as when you try to read a file into a string in bash with, say, text=$(cat file) or text=$(<file) - the resulting variable will contain the pseudo-BOM as the first 3 bytes.
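If you need to hand such a file to BOM-unaware utilities, one way to re-save it without the BOM from PowerShell is the following sketch (file.txt is a placeholder; Get-Content honors the BOM when reading, and WriteAllText() writes BOM-less UTF-8 by default):
$file = Convert-Path file.txt   # .NET needs a full path
[System.IO.File]::WriteAllText($file, (Get-Content -Raw $file))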
Inconsistent default encoding behavior in Windows PowerShell:
Regrettably, the default character encoding used in Windows PowerShell is wildly inconsistent; the cross-platform PowerShell Core edition, as discussed in the previous section, has commendably put an end to this.
Note:
The following doesn't aspire to cover all standard cmdlets.
Googling cmdlet names to find their help topics now shows you the PowerShell Core version of the topics by default; use the version drop-down list above the list of topics on the left to switch to a Windows PowerShell version.
Historically, the documentation frequently incorrectly claimed that ASCII is the default encoding in Windows PowerShell; fortunately, this has since been corrected.
Cmdlets that write:
Out-File and > / >> create "Unicode" - UTF-16LE - files by default - in which even ASCII-range characters are represented by 2 bytes each - which notably differs from Set-Content / Add-Content (see next point); New-ModuleManifest and Export-CliXml also create UTF-16LE files.
Set-Content (and Add-Content if the file doesn't yet exist / is empty) uses ANSI encoding (the encoding specified by the active system locale's ANSI legacy code page, which PowerShell calls Default).
Export-Csv indeed creates ASCII files, as documented, but see the notes re -Append below.
Export-PSSession creates UTF-8 files with BOM by default.
New-Item -Type File -Value currently creates BOM-less(!) UTF-8.
The Send-MailMessage help topic also claims that ASCII encoding is the default - I have not personally verified that claim.
Start-Transcript invariably creates UTF-8 files with BOM, but see the notes re -Append below.
Re commands that append to an existing file:
>> / Out-File -Append make no attempt to match the encoding of a file's existing content.
That is, they blindly apply their default encoding, unless instructed otherwise with -Encoding, which is not an option with >> (except indirectly in PSv5.1+, via $PSDefaultParameterValues, as shown above).
In short: you must know the encoding of an existing file's content and append using that same encoding (see the sketch after this list).
Add-Content is the laudable exception: in the absence of an explicit -Encoding argument, it detects the existing encoding and automatically applies it to the new content (thanks, js2010). Note that in Windows PowerShell this means that it is ANSI encoding that is applied if the existing content has no BOM, whereas it is UTF-8 in PowerShell Core.
This inconsistency between Out-File -Append / >> and Add-Content, which also affects PowerShell Core, is discussed in GitHub issue #9423.
Export-Csv -Append partially matches the existing encoding: it blindly appends UTF-8 if the existing file's encoding is any of ASCII/UTF-8/ANSI, but correctly matches UTF-16LE and UTF-16BE.
To put it differently: in the absence of a BOM, Export-Csv -Append assumes UTF-8, whereas Add-Content assumes ANSI.
Start-Transcript -Append partially matches the existing encoding: It correctly matches encodings with BOM, but defaults to potentially lossy ASCII encoding in the absence of one.
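For instance, if you know that the existing file is ANSI-encoded (as files created by Set-Content are by default), append with the matching encoding explicitly - a sketch:
'new line' | Out-File -Append -Encoding Default existing.txt   # Default = the active ANSI code page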
Cmdlets that read (that is, the encoding used in the absence of a BOM):
Get-Content and Import-PowerShellDataFile default to ANSI (Default), which is consistent with Set-Content.
ANSI is also what the PowerShell engine itself defaults to when it reads source code from files.
By contrast, Import-Csv, Import-CliXml and Select-String assume UTF-8 in the absence of a BOM.
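If you want to verify these defaults on your own system, a quick check is to inspect the first bytes that a given cmdlet writes; a sketch (file names are arbitrary; -Encoding Byte is the Windows PowerShell syntax - PowerShell Core uses -AsByteStream instead):
'test' > of.txt               # Out-File / >  - with default settings, expect FF FE (UTF-16LE BOM)
'test' | Set-Content sc.txt   # Set-Content   - expect plain ANSI bytes, no BOM
Get-Content -Encoding Byte -TotalCount 4 of.txt | ForEach-Object { '{0:X2}' -f $_ }
Get-Content -Encoding Byte -TotalCount 4 sc.txt | ForEach-Object { '{0:X2}' -f $_ }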

In short, use:
write-output "your text" | out-file -append -encoding utf8 "filename"
You may want to wrap parts of the script in a script block and invoke it with the call operator (&), so you can redirect the output of several commands at once:
& {
command 1
command 2
} | out-file -append -encoding utf8 "filename"

A dump made with mysqldump from PowerShell on Windows using output redirection creates a file that has UTF-16 encoding. To work around this issue, you can let mysqldump write the output file itself:
mysqldump.exe [options] --result-file=dump.sql
Reference link: mysqldump_result-file
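For example (the connection options are illustrative only; --result-file makes mysqldump write the file itself, bypassing PowerShell's redirection encoding entirely):
mysqldump.exe --user=root --password --databases mydb --result-file=dump.sql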

Related

Powershell prevent UTF8 conversion

When I try to manipulate an .ini file with PowerShell, it always switches the encoding to UTF-8.
My Code:
Get-Content -Path "./update.ini" -Encoding ascii | Out-File -FilePath "ascii_update.ini" -Encoding ascii
The file needs to stay ASCII, so how can I disable this behaviour or how to switch it back to ASCII?
German characters will not be shown correctly
Given that you don't want UTF-8 encoding yet you want German umlauts, what you're looking for is ANSI encoding, not ASCII.
In Windows PowerShell, ANSI encoding is the default encoding used by Get-Content and Set-Content (but not Out-File, which defaults to "Unicode" (UTF-16LE)), so all you need is the following:
# Windows PowerShell: ANSI encoding is the default for Get-Content / Set-Content
Get-Content ./update.ini | Set-Content ascii_update.ini
In PowerShell (Core) 7+, (BOM-less) UTF-8 is now the default, across all cmdlets, so you must request ANSI encoding explicitly, using -Encoding
Unfortunately, whereas Default refers to the system's active ANSI encoding in Windows PowerShell, in PowerShell (Core) it now refers to UTF-8, and there is no predefined ANSI enumeration value to complement the OEM value - this baffling omission is discussed in GitHub issue #6562.
Therefore, you must determine the active ANSI code page explicitly, as shown below.
$ansiEnc = [cultureinfo]::CurrentCulture.TextInfo.ANSICodePage
Get-Content -Encoding $ansiEnc ./update.ini |
Set-Content -Encoding $ansiEnc ascii_update.ini
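If you are curious which code page and encoding that resolves to on your machine, you can inspect the value directly (a sketch):
[cultureinfo]::CurrentCulture.TextInfo.ANSICodePage   # e.g. 1252 on US-English systems
[System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage).EncodingName   # e.g. Western European (Windows)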
notepad shows it as UTF-8 on the bottom right side.
ASCII encoding is a subset of UTF-8 encoding, which is why most editors show pure ASCII files as UTF-8, because they are by definition also valid UTF-8 files.
Note that if you save or read text that contains non-ASCII characters with -Encoding ASCII, the non-ASCII characters are "lossily" transcoded to verbatim ? characters.
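A quick demonstration of that lossy behavior (a sketch; the file name is arbitrary):
'äöü' | Set-Content -Encoding Ascii t.txt
Get-Content t.txt   # -> ???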
Optional reading: managing INI files as UTF-16LE ("Unicode") encoded, support via Windows API functions:
zett42 points out that the WritePrivateProfileString and GetPrivateProfileString
Windows API functions interpret INI files as follows:
If a file has a UTF-16LE ("Unicode") BOM, it is read and updated as such.
Otherwise, it is invariably interpreted as ANSI-encoded (even if it has a different Unicode encoding's BOM, such as UTF-8).
If you let WritePrivateProfileString create an INI file implicitly, it is always created without a BOM, and therefore treated as ANSI-encoded (even if you use the Unicode version of the API function). If you try to write non-ANSI-range Unicode characters to such a file, they are quietly and lossily transcoded as follows: either to an ASCII-range equivalent, for accented letters, if applicable (e.g., ă is transcoded to a); otherwise, to verbatim ?
Thus, creating the INI file of interest explicitly with a UTF-16LE BOM is necessary in order to maintain the file as UTF-16LE-encoded and therefore enable full Unicode support.
Thus, you could create the INI file initially with a command such as Set-Content -Encoding Unicode ./update.ini -Value @(), which creates an (otherwise) empty file that contains only a UTF-16LE BOM, and then stick with -Encoding Unicode if you need to manipulate the file directly.
This MIT-licensed Gist (authored by me) contains module file IniFileHelper.psm1, whose Get-IniValue and Set-IniValue functions wrap the above-mentioned Windows API functions, with the crucial difference that when Set-IniValue implicitly creates an INI file it uses UTF-16LE encoding.
The following, self-contained example demonstrates this:
# Download the module code and import it via a session-scoped, dynamic module.
# IMPORTANT:
# While I can personally assure you that doing this is safe,
# you should always check the source code yourself first.
$null = New-Module -Verbose -ScriptBlock ([scriptblock]::Create((Invoke-RestMethod 'https://gist.githubusercontent.com/mklement0/006c2352ddae7bb05693be028240f5b6/raw/1e2520810213f76f2e8f419d0e48892a4009de6a/IniFileHelper.psm1')))
# Implicitly create file "test.ini" in the current directory,
# and write key "TestKey" to section "Main", with a value
# that contains an ASCII-range character, an ANSI-range character,
# and a character beyond either of these two ranges.
Set-IniValue test.ini Main TestKey 'haäă'
# Now retrieve the same entry, which should show the exact
# same value, 'haäă'
# Note: If there is a preexisting "test.ini" file that does NOT
# have a UTF-16LE BOM, the non-ANSI 'ă' character would be
# "best-fit" transcoded to ASCII 'a'.
# Other non-ANSI characters that do not have ASCII-range analogs
# would be lossily transcoded to verbatim '?'
Get-IniValue test.ini Main TestKey

Powershell - Write-Output produces string with a BOM character

I'm trying to execute such command in a Powershell script:
Write-Output "Some Command" | some-application
When running the PS script, some-application receives the string \xef\xbb\xbfSome Command. The first character is a UTF-8 BOM. All solutions that I can Google apply only to redirecting output to a file, but I'm trying to redirect a string to another command (via pipe).
The variable $OutputEncoding shows that ASCII is configured; no UTF-8 is set.
I'm running this script from Azure DevOps Pipeline and only there this problem exists.
Note: This answer deals with how to control the encoding that PowerShell uses when data is piped to an external program (to be read via stdin by such a program).
This is separate from:
what character encoding PowerShell cmdlets use by default on output - for more information, see this answer.
what character encoding PowerShell uses when reading data received from external programs - for more information, see this answer.
The implication is that you've mistakenly set the $OutputEncoding preference variable, which determines what character encoding PowerShell uses to send data to external programs, to a UTF-8 encoding with a BOM.
Your problem goes away if you assign a BOM-less UTF-8 encoding instead:
$OutputEncoding = [System.Text.Utf8Encoding]::new($false) # BOM-less
"Some Command" | some-application
Note that this problem wouldn't arise in PowerShell [Core] v6+, where $OutputEncoding defaults to BOM-less UTF-8 (in Windows PowerShell, unfortunately, it defaults to ASCII).
To illustrate the problem:
$OutputEncoding = [System.Text.Encoding]::Utf8 # *with* BOM
"Some Command" | findstr .
The above outputs ∩╗┐Some Command, where ∩╗┐ is the rendering of the 3-byte UTF-8 BOM (0xef, 0xbb, 0xbf) in the OEM code page 437 (on US-English systems, for instance).
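A quick way to check whether your current $OutputEncoding will prepend such a BOM (a sketch):
$OutputEncoding                        # the encoding currently in effect
$OutputEncoding.GetPreamble().Length   # 0 means no BOM will be prepended to piped data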

Broken Cmd scripts after creation with PowerShell Write-Output

We have a domain-wide automation tool that can start jobs on servers as admin (Stonebranch UAC - please note: this has nothing to do with Windows "User Access Control", Stonebranch UAC is an Enterprise automation tool). Natively, it looks for Cmd batch scripts, so we use those.
However, I prefer to use PowerShell for everything, so I bulk created dozens of .bat scripts using PowerShell. Nothing worked, the automation tool broke whenever it tried to run the .bat scripts. So I pared back the scripts so that they contained a single line echo 123 and still, everything was broken. We thought it was a problem with the tool, but then tried to run the .bat scripts on the server and they were broken too, just generating some unicode on the command line and failing to run.
So it dawned on us that something about how PowerShell pumps Write-Output commands to create the batch scripts was breaking them (this is on Windows 2012 R2 and PowerShell 5.1). I can reproduce this easily; for example, if I type the following at a PowerShell console:
Write-Output "echo 123" > test.bat
If I now open a cmd.exe and then try to run test.bat, I just get a splat of 2 unicode-looking characters on the screen and nothing else.
Can someone explain to me a) why this behaviour happens, and b) how can I continue to use PowerShell to generate these batch scripts without them being broken? i.e. do I have to change BOM or UTF-8 settings or whatever to get this working and how do I do that please?
In Windows PowerShell, >, like the underlying Out-File cmdlet, invariably[1] creates "Unicode" (UTF-16LE) files, which cmd.exe cannot read (not even with the /U switch).
In PowerShell [Core] v6+, BOM-less UTF-8 encoding is consistently used, including by >.
Therefore:
If you're using PowerShell [Core] v6+ AND the content of the batch file comprises ASCII-range characters only (7-bit range), you can get away with >.
Otherwise, use Set-Content with -Encoding Oem.
'@echo 123' | Set-Content -Encoding Oem test.bat
If your batch-file source code only ever contains ASCII-range characters (7-bit range), you can also get away with (in both PowerShell editions):
'@echo 123' | Set-Content test.bat
Note:
As the -Encoding argument implies, the system's active OEM code page is used, which is what batch files expect.
OEM code pages are supersets of ASCII encoding, so a file saved with -Encoding Oem that is composed only of ASCII-range characters is implicitly also an ASCII file. The same applies to BOM-less UTF-8 and ANSI (Default) encoded files composed of ASCII-range characters only.
-Encoding Oem - as opposed to -Encoding Ascii or even using Set-Content's default encoding[2] - therefore only matters if you have non-ASCII-range characters in your batch file's source code, such as é. Such characters, however, are limited to a set of 256 characters in total, given that OEM code pages are fixed-width single-byte encodings, which means that many Unicode characters are inherently unusable, such as €.
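To see which OEM code page is active on your system and to confirm that a non-ASCII character such as é survives the OEM round trip, you could run something along these lines (a sketch; the file name is arbitrary):
[Console]::OutputEncoding.CodePage   # the active OEM code page, e.g. 437, unless it was changed
"@echo off`r`necho Café" | Set-Content -Encoding Oem cafe.bat
cmd /c cafe.bat   # should print: Café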
[1] In Windows PowerShell v5.1 (and above), it is possible to change >'s encoding via the $PSDefaultParameterValues preference variable - see this answer - however, you won't be able to select a BOM-less UTF-8 encoding, which would be needed for creation of batch files (composed of ASCII-range characters only).
[2] Set-Content's default encoding is the active ANSI code page (Default) in Windows PowerShell (another ASCII superset), and (as for all cmdlets) BOM-less UTF-8 in PowerShell [Core] v6+; for an overview of the wildly inconsistent character encodings in Windows PowerShell, see this answer.
A method that by default creates BOM-less UTF-8 files is:
new-item -Path outfile.bat -itemtype file -value "echo this"

How to pass UTF-8 characters to clip.exe with PowerShell without conversion to another charset?

I'm a Windows and Powershell noobie. I'm coming from Linux Land. I used to have this little Bash function in my .bashrc that would copy a "shruggie" (¯\_(ツ)_/¯) to the clipboard for me so that I could paste it into conversations on Slack and such.
My Bash alias looked like this: alias shruggie='printf "¯\_(ツ)_/¯" | xclip -selection c && echo "¯\_(ツ)_/¯"'
I realize that this question is juvenile, but the answer does have value to me as I'm sure that I will need to pipe odd UTF-8 characters to output in a Powershell script at some point in the future.
I wrote this function in my PowerShell profile:
function shruggie() {
'¯\_(ツ)_/¯' | clip
Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
However, this gives me: ??\_(???)_/?? (Unknown UTF-8 chars are converted to ?) when I call it on the command line.
I've looked at [System.Text.Encoding]::UTF8 and some other questions but I don't know how to cast my string as UTF-8 and pass that through clip.exe and receive UTF-8 out on the other side (on the clipboard).
There are two distinct, independent aspects:
copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe
writing (echoing) ¯\_(ツ)_/¯ to the console
Prerequisite: PowerShell must properly recognize your source code's encoding in order for the solutions below to work: if your source code is UTF-8-encoded, be sure to save the enclosing script file(s) as UTF-8 with BOM for Windows PowerShell to recognize the encoding correctly.
Windows PowerShell, in the absence of a BOM, interprets source code as "ANSI"-encoded, referring to the legacy, single-byte, extended-ASCII code page in effect, such as Windows-1252 on US-English systems, and would therefore interpret UTF-8-encoded source code incorrectly.
Note that, by contrast, PowerShell Core uses UTF-8 as the default, so the BOM is no longer necessary (but still recognized).
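If your script file is currently UTF-8 without a BOM, one way to re-save it as UTF-8 with BOM from PowerShell itself is the following sketch (the file name is a placeholder; in Windows PowerShell, Set-Content -Encoding utf8 writes a BOM, and -NoNewline (PSv5+) avoids appending an extra newline):
(Get-Content -Raw -Encoding utf8 .\shruggie.ps1) | Set-Content -NoNewline -Encoding utf8 .\shruggie.ps1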
Copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe:
In Windows PowerShell v5.1+, you can use the built-in Set-Clipboard cmdlet to copy text to the clipboard from within PowerShell; given that PowerShell uses the .NET System.String type that is capable of representing all Unicode characters, there are no encoding issues.
Note that PowerShell Core, even when run on Windows, does NOT have this cmdlet (as of PowerShell Core v6.0.0-rc.2)
See this answer of mine for clipboard functions that work in earlier PowerShell versions as well as in PowerShell Core.
In earlier versions of Windows PowerShell and in PowerShell Core, use of clip.exe is a viable alternative, but its use requires additional work:
function shruggie() {
$OutputEncoding = (New-Object System.Text.UnicodeEncoding $False, $False).psobject.BaseObject
'¯\_(ツ)_/¯' | clip
Write-Verbose -Verbose "Shruggie copied to clipboard." # see section about console output
}
New-Object System.Text.UnicodeEncoding $False, $False creates a BOM-less UTF16-LE encoding, which clip.exe understands.
The magic .psobject.BaseObject incantation is, unfortunately, required to work around a bug; in PSv5+, you can bypass this bug by using the following instead:
[System.Text.UnicodeEncoding]::new($False, $False)
Assigning that encoding to preference variable $OutputEncoding ensures that PowerShell uses that encoding to pipe data to external utility clip.exe.
Writing ¯\_(ツ)_/¯ to the console:
Note: PowerShell Core on Unix platforms generally uses consoles (terminals) with a default encoding of (BOM-less) UTF-8, so no additional work is needed there.
To merely echo (print) Unicode characters (beyond the 8-bit range), it is sufficient to switch to a font that can display Unicode characters (beyond the extended ASCII range), because, as PetSerAl points out, PowerShell uses the Unicode version of the WriteConsole Windows API function to print to the console.
To support (most) Unicode characters, you must switch to one of the "TT" (TrueType) fonts.
PetSerAl points out in a comment that console windows on Windows are currently limited to a single 16-bit code unit per output character (cell); given that only (most of) the characters in the BMP (Basic Multilingual Plane) are self-contained 16-bit code units, the (rare) characters beyond the BMP cannot be represented.
Sadly, even that may not be enough for some (BMP) Unicode characters, given that the Unicode standard is versioned and font representations / implementations may lag.
Indeed, as of Windows 10 release ID 1703, only a select few fonts can render ツ (Unicode character KATAKANA LETTER TU, U+30C4, UTF-8: E3 83 84):
MS Gothic
NSimSun
Note that if you want to (also) change how other applications interpret such output, you must again set $OutputEncoding:
For instance, to make PowerShell expect UTF-8 input from external utilities as well as output UTF-8-encoded data to external utilities, use the following:
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
The above implicitly changes the code page to 65001 (UTF-8), as reflected in chcp (chcp.com).
Note that, for backward compatibility, Windows console windows still default to the single-byte, extended-ASCII legacy OEM code page, such as 437 on US-English systems.
Unfortunately, as of v6.0.0-rc.2, this also applies to PowerShell Core, even though it has otherwise switched to BOM-less UTF-8 as the default encoding, as also reflected in $OutputEncoding.
If you cannot use PowerShell 5's Set-Clipboard function (which is IMO the go-to solution), you can convert/encode your output in a way that clip.exe understands correctly.
There are two ways to achieve what you want here:
Feed clip.exe with a UTF-16 file: clip < UTF16-Shruggie.txt
The important part here is to save the file encoded as Unicode (which means UTF-16 format, little-endian byte order, with BOM).
Encode the string appropriately (the following part works in a PoSh editor like ISE but unfortunately not in a regular console; see mklement0's answer for how to achieve that):
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
function shruggie() {
[System.Text.Encoding]::Default.GetString(
[System.Text.Encoding]::UTF8.GetBytes('¯\_(ツ)_/¯')
) | clip.exe
Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
shruggie
This works for me. Here is an MSDN blog post that gives further explanations about $OutputEncoding/[Console]::OutputEncoding.
The posted Set-Clipboard option is the most direct answer, but, as noted, it is a PoSHv5-and-higher thing. However, depending on what OS the OP is on, not all cmdlets are available on all OS/PoSH versions. This is not to say that Set-Clipboard is not, but since the OP says they're new, it's just a heads-up.
If you can't go there for whatever reason, you can create your own and or use add-on modules. See this post:
Convert Keith Hill's PowerShell Get-Clipboard and Set-Clipboard to a PSM1 script
The results from using the Set-Clipboard function from the above post and modifying the OP's post for its use:
(Get-CimInstance -ClassName Win32_OperatingSystem).Caption
Microsoft Windows Server 2012 R2 Standard
$PSVersionTable
Name Value
---- -----
PSVersion 4.0
WSManStackVersion 3.0
SerializationVersion 1.1.0.1
CLRVersion 4.0.30319.42000
BuildVersion 6.3.9600.18773
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0}
PSRemotingProtocolVersion 2.2
function Set-ClipBoard
{
Param
(
[Parameter(ValueFromPipeline=$true)]
[string] $text
)
Add-Type -AssemblyName System.Windows.Forms
$tb = New-Object System.Windows.Forms.TextBox
$tb.Multiline = $true
$tb.Text = $text
$tb.SelectAll()
$tb.Copy()
}
function New-Shruggie
{
Set-ClipBoard -text '¯\_(ツ)_/¯'
Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
New-Shruggie
¯\_(ツ)_/¯ copied to clipboard.
Results pasted from clipboard
¯\_(ツ)_/¯
There are options however, such as the following, but the above are still the best route.
First remember that what external programs receive is controlled by the OS code page and by the interpreter (PoSH); PoSH's $OutputEncoding defaults to ASCII.
You can see the PoSH default CP settings by looking at the output of the built-in variable
$OutputEncoding
As the PoSH creator Jeffrey Snover says:
The reason we convert to ASCII when piping to existing executables is that most commands today do not process UNICODE correctly.
Some do, most don’t.
So, all that being said... you can change the code page by doing things like:
[Console]::OutputEncoding
Or ...
$OutputEncoding = New-Object -typename System.Text.UTF8Encoding
If sending output to a file...
$OutPutData | Out-File $outFile -Encoding UTF8

How to cat a UTF-8 (no BOM) file properly/globally in PowerShell? (to another file)

Create a file utf8.txt. Ensure the encoding is UTF-8 (no BOM). Set its content to €
In cmd.exe:
type utf8.txt > out.txt
Content of out.txt is €
In PowerShell (v4):
cat .\utf8.txt > out.txt
or
type .\utf8.txt > out.txt
Out.txt content is â‚¬
How do I globally make PowerShell work correctly?
Note: This answer is about Windows PowerShell (up to v5.1); PowerShell [Core, v6+], the cross-platform edition of PowerShell, now fortunately defaults to BOM-less UTF-8 on both in- and output.
Windows PowerShell, unlike the underlying .NET Framework[1], uses the following defaults:
on input: files without a BOM (byte-order mark) are assumed to be in the system's default encoding, which is the legacy Windows code page ("ANSI" code page: the active, culture-specific single-byte encoding, as configured via Control Panel).
on output: the > and >> redirection operators produce UTF-16 LE files by default (which do have - and need - a BOM).
File-consuming and -producing cmdlets do usually support an -Encoding parameter that lets you specify the encoding explicitly.
Prior to Windows PowerShell v5.1, using the underlying Out-File cmdlet explicitly was the only way to change the encoding.
In Windows PowerShell v5.1+, > and >> became effective aliases of Out-File, allowing you to change the encoding behavior of > and >> via the $PSDefaultParameterValues preference variable; e.g.:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'.
For Windows PowerShell to handle UTF-8 properly, you must specify it as both the input and output encoding[2], but note that on output, PowerShell invariably adds a BOM to UTF-8 files.
Applied to your example:
Get-Content -Encoding utf8 .\utf8.txt | Out-File -Encoding utf8 out.txt
To create a UTF-8 file without a BOM in PowerShell, see this answer of mine.
[1] .NET Framework uses (BOM-less) UTF-8 by default, both for in- and output.
This - intentional - difference in behavior between Windows PowerShell and the framework it is built on is unusual. The difference went away in PowerShell [Core] v6+: both .NET [Core] and PowerShell [Core] default to BOM-less UTF-8.
[2] Get-Content does, however, automatically recognize UTF-8 files with a BOM.
For PowerShell 5.1, enable this setting:
Control Panel, Region, Administrative, Change system locale, Use Unicode UTF-8 for worldwide language support
Then enter this into PowerShell:
$PSDefaultParameterValues['*:Encoding'] = 'Default'
Alternatively, you can upgrade to PowerShell 6 or higher.
https://github.com/PowerShell/PowerShell