When I try to manipulate an .ini file with PowerShell, it always switches the encoding to UTF-8.
My Code:
Get-Content -Path "./update.ini" -Encoding ascii | Out-File -FilePath "ascii_update.ini" -Encoding ascii
The file needs to stay ASCII, so how can I disable this behaviour, or how can I switch it back to ASCII? German characters are not shown correctly.
Given that you don't want UTF-8 encoding yet you want German umlauts, what you're looking for is ANSI encoding, not ASCII.
In Windows PowerShell, ANSI encoding is the default encoding used by Get-Content and Set-Content (but not Out-File, which defaults to "Unicode" (UTF-16LE)), so all you need is the following:
# Windows PowerShell: ANSI encoding is the default for Get-Content / Set-Content
Get-Content ./update.ini | Set-Content ascii_update.ini
In PowerShell (Core) 7+, (BOM-less) UTF-8 is now the default across all cmdlets, so you must request ANSI encoding explicitly, using -Encoding.
Unfortunately, whereas Default refers to the system's active ANSI encoding in Windows PowerShell, in PowerShell (Core) it now refers to UTF-8, and there is no predefined ANSI enumeration value to complement the OEM value - this baffling omission is discussed in GitHub issue #6562.
Therefore, you must determine the active ANSI code page explicitly, as shown below.
$ansiEnc = [cultureinfo]::CurrentCulture.TextInfo.ANSICodePage
Get-Content -Encoding $ansiEnc ./update.ini |
  Set-Content -Encoding $ansiEnc ascii_update.ini
notepad shows it as UTF-8 on the bottom right side.
ASCII encoding is a subset of UTF-8 encoding, which is why most editors show pure ASCII files as UTF-8: by definition, they are also valid UTF-8 files.
Note that if you save or read text that contains non-ASCII characters with -Encoding ASCII, the non-ASCII characters are "lossily" transcoded to verbatim ? characters.
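To see this lossy behavior in action, here is a minimal sketch (the temp-file path and sample string are arbitrary):

```powershell
# Write a string containing German umlauts with ASCII encoding.
$file = Join-Path ([IO.Path]::GetTempPath()) 'ascii-demo.txt'
'Grüße' | Set-Content -Encoding Ascii $file

# The non-ASCII characters ü and ß were lossily replaced with '?'.
Get-Content $file   # -> Gr??e
```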
Optional reading: managing INI files as UTF-16LE ("Unicode") encoded, support via Windows API functions:
zett42 points out that the WritePrivateProfileString and GetPrivateProfileString
Windows API functions interpret INI files as follows:
If a file has a UTF-16LE ("Unicode") BOM, it is read and updated as such.
Otherwise, it is invariably interpreted as ANSI-encoded (even if it has a different Unicode encoding's BOM, such as UTF-8).
If you let WritePrivateProfileString create an INI file implicitly, it is always created without a BOM and is therefore treated as ANSI-encoded (even if you use the Unicode version of the API function). If you try to write non-ANSI-range Unicode characters to such a file, they are quietly and lossily transcoded: accented letters are mapped to an ASCII-range equivalent where one exists (e.g., ă is transcoded to a); all other characters become verbatim ?
Thus, creating the INI file of interest explicitly with a UTF-16LE BOM is necessary in order to maintain the file as UTF-16LE-encoded and therefore enable full Unicode support.
Thus, you could create the INI file initially with a command such as Set-Content -Encoding Unicode ./update.ini -Value @(), which creates an (otherwise) empty file that contains only a UTF-16LE BOM, and then stick with -Encoding Unicode if you need to manipulate the file directly.
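To verify that such a file really starts with a UTF-16LE BOM (the byte pair 0xFF 0xFE), you can inspect its raw bytes; a sketch, using the temp directory so the demo is self-contained:

```powershell
$ini = Join-Path ([IO.Path]::GetTempPath()) 'update.ini'

# Create an (otherwise) empty file that contains only a UTF-16LE BOM.
Set-Content -Encoding Unicode -Value @() $ini

# A UTF-16LE BOM is the byte pair 0xFF 0xFE.
$bytes = [IO.File]::ReadAllBytes($ini)
'{0:X2} {1:X2}' -f $bytes[0], $bytes[1]   # -> FF FE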
This MIT-licensed Gist (authored by me) contains module file IniFileHelper.psm1, whose Get-IniValue and Set-IniValue functions wrap the above-mentioned Windows API functions, with the crucial difference that when Set-IniValue implicitly creates an INI file it uses UTF-16LE encoding.
The following, self-contained example demonstrates this:
# Download the module code and import it via a session-scoped, dynamic module.
# IMPORTANT:
# While I can personally assure you that doing this is safe,
# you should always check the source code yourself first.
$null = New-Module -Verbose -ScriptBlock ([scriptblock]::Create((Invoke-RestMethod 'https://gist.githubusercontent.com/mklement0/006c2352ddae7bb05693be028240f5b6/raw/1e2520810213f76f2e8f419d0e48892a4009de6a/IniFileHelper.psm1')))
# Implicitly create file "test.ini" in the current directory,
# and write key "TestKey" to section "Main", with a value
# that contains an ASCII-range character, an ANSI-range character,
# and a character beyond either of these two ranges.
Set-IniValue test.ini Main TestKey 'haäă'
# Now retrieve the same entry, which should show the exact
# same value, 'haäă'
# Note: If there is a preexisting "test.ini" file that does NOT
# have a UTF-16LE BOM, the non-ANSI 'ă' character would be
# "best-fit" transcoded to ASCII 'a'.
# Other non-ANSI characters that do not have ASCII-range analogs
# would be lossily transcoded to verbatim '?'
Get-IniValue test.ini Main TestKey
Related
I'm trying to execute such command in a Powershell script:
Write-Output "Some Command" | some-application
When running PS script some-application receives string \xef\xbb\xbfSome Command. The first character is an UTF-8 BOM. All solutions that I can Google, apply only to redirecting output to a file. But I'm trying to redirect a string to another command (via pipe).
The variable $OutputEncoding shows that ASCII is configured; UTF-8 is not set.
I'm running this script from Azure DevOps Pipeline and only there this problem exists.
Note: This answer deals with how to control the encoding that PowerShell uses when data is piped to an external program (to be read via stdin by such a program).
This is separate from:
what character encoding PowerShell cmdlets use by default on output - for more information, see this answer.
what character encoding PowerShell uses when reading data received from external programs - for more information, see this answer.
The implication is that you've mistakenly set the $OutputEncoding preference variable, which determines what character encoding PowerShell uses to send data to external programs, to a UTF-8 encoding with a BOM.
Your problem goes away if you assign a BOM-less UTF-8 encoding instead:
$OutputEncoding = [System.Text.UTF8Encoding]::new($false) # BOM-less
"Some Command" | some-application
Note that this problem wouldn't arise in PowerShell [Core] v6+, where $OutputEncoding defaults to BOM-less UTF-8 (in Windows PowerShell, unfortunately, it defaults to ASCII).
To illustrate the problem:
$OutputEncoding = [System.Text.Encoding]::Utf8 # *with* BOM
"Some Command" | findstr .
The above outputs ∩╗┐Some Command, where ∩╗┐ is the rendering of the 3-byte UTF-8 BOM (0xef, 0xbb, 0xbf) in OEM code page 437 (on US-English systems, for instance).
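The difference between the two UTF-8 encoding objects comes down to their preamble (the BOM bytes they emit when writing begins); a quick sketch:

```powershell
# The static UTF8 property represents UTF-8 *with* a 3-byte BOM preamble...
[System.Text.Encoding]::UTF8.GetPreamble().Length             # -> 3

# ...whereas constructing UTF8Encoding with $false yields no preamble (BOM-less).
[System.Text.UTF8Encoding]::new($false).GetPreamble().Length  # -> 0
```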
We have a domain-wide automation tool that can start jobs on servers as admin (Stonebranch UAC - please note: this has nothing to do with Windows "User Account Control"; Stonebranch UAC is an enterprise automation tool). Natively, it looks for cmd batch scripts, so we use those.
However, I prefer to use PowerShell for everything, so I bulk-created dozens of .bat scripts using PowerShell. Nothing worked; the automation tool broke whenever it tried to run the .bat scripts. So I pared back the scripts until they contained a single line, echo 123, and still everything was broken. We thought it was a problem with the tool, but then we tried to run the .bat scripts on the server and they were broken there too, just generating some Unicode-looking characters on the command line and failing to run.
So it dawned on us that something about how PowerShell pumps Write-Output commands to create the batch scripts was breaking them (this is on Windows 2012 R2 and PowerShell is 5.1). And I repeat this, for example, if I type the following on a PowerShell console:
Write-Output "echo 123" > test.bat
If I now open a cmd.exe and then try to run test.bat, I just get a splat of 2 unicode-looking characters on the screen and nothing else.
Can someone explain to me a) why this behaviour happens, and b) how can I continue to use PowerShell to generate these batch scripts without them being broken? i.e. do I have to change BOM or UTF-8 settings or whatever to get this working and how do I do that please?
In Windows PowerShell, >, like the underlying Out-File cmdlet, invariably[1] creates "Unicode" (UTF-16LE) files, which cmd.exe cannot read (not even with the /U switch).
In PowerShell [Core] v6+, BOM-less UTF-8 encoding is consistently used, including by >.
Therefore:
If you're using PowerShell [Core] v6+ AND the content of the batch file comprises ASCII-range characters only (7-bit range), you can get away with >.
Otherwise, use Set-Content with -Encoding Oem.
'@echo 123' | Set-Content -Encoding Oem test.bat
If your batch-file source code only ever contains ASCII-range characters (7-bit range), you can also get away with the following (in both PowerShell editions):
'@echo 123' | Set-Content test.bat
Note:
As the -Encoding argument implies, the system's active OEM code page is used, which is what batch files expect.
OEM code pages are supersets of ASCII encoding, so a file saved with -Encoding Oem that is composed only of ASCII-range characters is implicitly also an ASCII file. The same applies to BOM-less UTF-8 and ANSI (Default) encoded files composed of ASCII-range characters only.
-Encoding Oem - as opposed to -Encoding Ascii or even using Set-Content's default encoding[2] - therefore only matters if you have non-ASCII-range characters in your batch file's source code, such as é. Such characters, however, are limited to a set of 256 characters in total, given that OEM code pages are fixed-width single-byte encodings, which means that many Unicode characters are inherently unusable, such as €.
[1] In Windows PowerShell v5.1 (and above), it is possible to change >'s encoding via the $PSDefaultParameterValues preference variable - see this answer - however, you won't be able to select a BOM-less UTF-8 encoding, which would be needed for creation of batch files (composed of ASCII-range characters only).
[2] Set-Content's default encoding is the active ANSI code page (Default) in Windows PowerShell (another ASCII superset), and (as for all cmdlets) BOM-less UTF-8 in PowerShell [Core] v6+; for an overview of the wildly inconsistent character encodings in Windows PowerShell, see this answer.
A method that by default creates BOM-less UTF-8 files is:
New-Item -Path outfile.bat -ItemType File -Value "echo this"
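You can verify the absence of a BOM by inspecting the file's first byte (a sketch; the file is created in the temp directory just to keep the demo self-contained, and -Force overwrites any leftover file):

```powershell
# Create the file via New-Item, which writes BOM-less UTF-8 by default.
$path = Join-Path ([IO.Path]::GetTempPath()) 'outfile.bat'
New-Item -Path $path -ItemType File -Value "echo this" -Force | Out-Null

# No 0xEF 0xBB 0xBF BOM: the content starts immediately with 'e' (0x65).
'{0:X2}' -f ([IO.File]::ReadAllBytes($path))[0]   # -> 65
```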
I have a banking application script that generates a “filtered” output file by removing error records from a daily input bank file (see How do I create a Windows Server script to remove error records, AND the previous record to each, from a file with the results written to a NEW file). The “filtered” output file will be sent to the State for updating their system. As a side note, the original input files that we receive from the bank show as Unix 1252 (ANSI Latin 1) in my file editor (UltraEdit), and each record ends only with a line feed.
I sent a couple of test output files generated from both “clean” (no errors) and “dirty” (contained 4 errors) input files to the State for testing on their end to make sure all was good before implementation, but was a little concerned because the output files were generated in UTF-16 encoding with CRLF line endings, where the input and current unfiltered output are encoded in Windows-1252. All other output files on this system are Windows-1252 encoded.
Sure enough… I got word back that the encoding is incorrect for the state’s system. Their comments were:
“The file was encoded UCS-2 Little Endian and needed to be converted to ANSI to run on our system. That was unexpected.
After that the file with no detail transactions would run through our EFT rejects program ok.
It seems that it was processed ok, but we had to do some conversion. Can it be sent in ANSI or needs to be done in UCS 2 Little Endian?”
I have tried unsuccessfully adding -Encoding "Windows-1252" and -Encoding windows-1252 to my Out-File statement, with both returning the message:
Out-File : Cannot validate argument on parameter 'Encoding'. The argument
"Windows-1252" does not belong to the set
"unknown,string,unicode,bigendianunicode,utf8,utf7,utf32,ascii,default,oem"
specified by the ValidateSet attribute. Supply an argument that is in the set
and then try the command again.
At C:\EZTRIEVE\PwrShell\TEST2_FilterR02.ps1:47 char:57
+ ... OutputStrings | Out-File $OutputFileFiltered -Encoding "Windows-1252"
+ ~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (:) [Out-File], ParameterBindingVal
idationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.Power
Shell.Commands.OutFileCommand
I’ve looked high and low for help with this for days, but nothing is really clear, and the vast majority of what I found involved converting FROM Windows-1252 TO another encoding. Yesterday, I found a comment somewhere on Stack Overflow that “ANSI” is the same as Windows-1252, but so far I have not found anything that shows me how to properly append the Windows-1252 encoding option to my Out-File statement so PowerShell will accept it. I really need to get this project finished so I can tackle the next several that have been added to my queue. Is there possibly a subparameter that I’m missing that needs to be appended to -Encoding?
This is being tested under Dollar Universe (job scheduler) on a new backup server running Windows Server 2016 Standard with PowerShell 5.1. Our production system runs Dollar Universe on Windows Server 2012 R2, also with PowerShell 5.1 (yes, we are looking for a sufficient upgrade window :-)).
As of my last attempt, my Powershell script is :
[cmdletbinding()]
Param
(
[string] $InputFilePath
)
# Read the text file
$InputFile = Get-Content $InputFilePath
# Initialize output record counter
$Inrecs = 0
$Outrecs = 0
# Get the time
$Time = Get-Date -Format "MM_dd_yy"
# Set up the output file name
$OutputFileFiltered = "C:\EZTRIEVE\CFIS\DATA\TEST_CFI_EFT_RETURN_FILTERED"
# Initialize the variable used to hold the output
$OutputStrings = @()
# Loop through each line in the file
# Check the line ahead for "R02" and add it to the output
# or skip it appropriately
for ($i = 0; $i -lt $InputFile.Length - 1; $i++)
{
if ($InputFile[$i + 1] -notmatch "R02")
{
# The next record does not contain "R02", increment count and add it to the output
$Outrecs++
$OutputStrings += $InputFile[$i]
}
else
{
# The next record does contain "R02", skip it
$i++
}
}
# Add the trailer record to the output
$OutputStrings += $InputFile[$InputFile.Length - 1]
# Write the output to a file
# $OutputStrings | Out-File $OutputFileFiltered
$OutputStrings | Out-File $OutputFileFiltered -Encoding windows-1252
# Display record processing stats:
$Filtered = $i - $Outrecs
Write-Host $i Input records processed
Write-Host $Filtered Error records filtered out
Write-Host $Outrecs Output records written
Note:
You later clarified that you need LF (Unix-style) newlines - see the bottom section.
The next section deals with the question as originally asked and presents solutions that result in files with CRLF (Windows-style) newlines (when run on Windows).
If your system's Language for non-Unicode programs setting (a.k.a. the system locale) happens to have Windows-1252 as the active ANSI code page (e.g., on US-English or Western European systems), use -Encoding Default, because Default refers to that code page in Windows PowerShell (but not in PowerShell Core, which defaults to BOM-less UTF-8 and doesn't support the Default encoding identifier).
Verify with: (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP) -eq '1252'
... | Out-File -Encoding Default $file
Note:
If you are certain that your data is actually composed exclusively of ASCII-range characters (characters with code points in the 7-bit range, which excludes accented characters such as ü), -Encoding Default will work even if your system locale uses an ANSI code page other than Windows-1252, given that all (single-byte) ANSI code pages share all ASCII characters in their 7-bit subrange; you could then also use -Encoding ASCII, but note that if there are non-ASCII characters present after all, they will be transliterated to literal ? characters, resulting in loss of information.
The Set-Content cmdlet actually defaults to the Default encoding in Windows PowerShell (but not PowerShell Core, where the consistent default is UTF-8 without BOM).
While Set-Content's stringification behavior differs from that of Out-File - see this answer - it's actually the better choice if the objects to write to the file already are strings.
Otherwise, you have two options:
Use the .NET Framework file I/O functionality directly, where you can use any encoding supported by .NET; e.g.:
$lines = ... # array of strings (to become lines in a file)
# CAVEAT: Be sure to specify an *absolute file path* in $file,
# because .NET typically has a different working dir.
[IO.File]::WriteAllLines($file, $lines, [Text.Encoding]::GetEncoding(1252))
Use PowerShell Core, which allows you to pass any supported .NET encoding to the -Encoding parameter:
... | Out-File -Encoding ([Text.Encoding]::GetEncoding(1252)) $file
Note that in PSv5.1+ you can actually change the encoding used by the > and >> operators, as detailed in this answer.
However, in Windows PowerShell you are again limited to the encodings supported by Out-File's -Encoding parameter.
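As a sketch of the .NET approach (file name and sample text are arbitrary; in PowerShell Core this works because PowerShell registers the code-page encoding provider at startup):

```powershell
$file  = Join-Path ([IO.Path]::GetTempPath()) 'cp1252-demo.txt'
$lines = @('Résumé')   # sample line with non-ASCII Windows-1252 characters

# Write with Windows-1252: é is stored as the single byte 0xE9.
[IO.File]::WriteAllLines($file, [string[]] $lines, [Text.Encoding]::GetEncoding(1252))
([IO.File]::ReadAllBytes($file))[1]   # -> 233 (0xE9)
```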
Creating text files with LF (Unix-style) newlines on Windows:
PowerShell (invariably) and .NET (by default) use the platform-appropriate newline sequence - as reflected in [Environment]::NewLine - when writing strings as lines to a file.
In other words: on Windows you'll end up with files with CRLF newlines, and on Unix-like platforms (PowerShell Core) with LF newlines.
Note that the solutions below assume that the data to write to your file is an array of strings that represent the lines to write, as returned by Get-Content, for instance (where the resulting array elements are the input file's lines without their trailing newline sequence).
To explicitly create a file with LF newlines on Windows (PSv5+):
$lines = ... # array of strings (to become lines in a file)
($lines -join "`n") + "`n" | Set-Content -NoNewline $file
"`n" produces a LF character.
Note:
In Windows PowerShell this implicitly uses the active ANSI code page's encoding.
In PowerShell Core this implicitly creates a UTF-8 file without BOM. If you want to use the active ANSI code page instead, use:
-Encoding ([Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)))
In PSv4- (PowerShell version 4 or lower), you'll have to use the .NET Framework directly:
$lines = ... # array of strings (to become lines in a file)
# CAVEAT: Be sure to specify an *absolute file path* in $file,
# because .NET typically has a different working dir.
[IO.File]::WriteAllText($file, ($lines -join "`n") + "`n")
Note:
In both Windows PowerShell and PowerShell Core this creates a UTF-8 file without BOM.
If you want to use the active ANSI code page instead, pass the following as an additional argument to WriteAllText():
([Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)))
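To sanity-check the result of the joining approach, the file's bytes should contain LF (0x0A) but never CR (0x0D); a sketch with made-up sample lines:

```powershell
$lines = 'line1', 'line2'   # sample lines
$file  = Join-Path ([IO.Path]::GetTempPath()) 'lf-demo.txt'

# .NET defaults to BOM-less UTF-8; "`n" supplies the LF-only newlines.
[IO.File]::WriteAllText($file, ($lines -join "`n") + "`n")

# Verify: no CR (0x0D) bytes anywhere in the file.
[IO.File]::ReadAllBytes($file) -contains 0x0D   # -> False
```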
By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8.
It can be done on a case-by-case basis by replacing the >foo.txt syntax with | out-file foo.txt -encoding utf8 but this is awkward to have to repeat every time.
The persistent way to set things in PowerShell is to put them in \Users\me\Documents\WindowsPowerShell\profile.ps1; I've verified that this file is indeed executed on startup.
It has been said that the output encoding can be set with $PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'} but I've tried this and it had no effect.
https://blogs.msdn.microsoft.com/powershell/2006/12/11/outputencoding-to-the-rescue/ which talks about $OutputEncoding looks at first glance as though it should be relevant, but then it talks about output being encoded in ASCII, which is not what's actually happening.
How do you set PowerShell to use UTF-8?
Note:
The next section applies primarily to Windows PowerShell.
See the section after it for the cross-platform PowerShell Core (v6+) edition.
In both cases, the information applies to making PowerShell use UTF-8 for reading and writing files.
By contrast, for information on how to send and receive UTF-8-encoded strings to and from external programs, see this answer.
A system-wide switch to UTF-8 is possible nowadays (since recent versions of Windows 10): see this answer, but note the following caveats:
The feature has far-reaching consequences, because both the OEM and the ANSI code page are then set to 65001, i.e. UTF-8; also the feature is still considered a beta feature as of this writing.
In Windows PowerShell, this takes effect only for those cmdlets that default to the ANSI code page, notably Set-Content, but not Out-File / >, and it also applies to reading files, notably including Get-Content and how PowerShell itself reads source code.
The Windows PowerShell perspective:
In PSv5.1 or higher, where > and >> are effectively aliases of Out-File, you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
Note:
In Windows PowerShell (the legacy edition whose latest and final version is v5.1), this invariably creates UTF-8 files with a (pseudo) BOM.
Many Unix-based utilities do not recognize this BOM (see bottom); see this post for workarounds that create BOM-less UTF-8 files.
In PowerShell (Core) v6+, BOM-less UTF-8 is the default (see the next section), but if you do want a BOM there, you can use 'utf8BOM'.
In PSv5.0 or below, you cannot change the encoding for > / >>, but, on PSv3 or higher, the above technique does work for explicit calls to Out-File.
(The $PSDefaultParameterValues preference variable was introduced in PSv3.0).
In PSv3.0 or higher, if you want to set the default encoding for all cmdlets that support
an -Encoding parameter (which in PSv5.1+ includes > and >>), use:
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
If you place this command in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default, but note that this makes it a session-global setting that will affect all commands / scripts that do not explicitly specify an encoding via their -Encoding parameter.
Similarly, be sure to include such commands in your scripts or modules that you want to behave the same way, so that they indeed behave the same even when run by another user or a different machine; however, to avoid a session-global change, use the following form to create a local copy of $PSDefaultParameterValues:
$PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }
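The scoping effect of creating a local copy can be sketched as follows: an assignment inside a script block (or a script or function) hides, rather than modifies, the caller's value:

```powershell
$PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }   # caller's (e.g., session-global) value

& {
  # Local copy: affects only commands run in this scope.
  $PSDefaultParameterValues = @{ '*:Encoding' = 'ascii' }
}

$PSDefaultParameterValues['*:Encoding']   # -> utf8 (unchanged by the script block)
```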
For a summary of the wildly inconsistent default character encoding behavior across many of the Windows PowerShell standard cmdlets, see the bottom section.
The automatic $OutputEncoding variable is unrelated, and only applies to how PowerShell communicates with external programs (what encoding PowerShell uses when sending strings to them) - it has nothing to do with the encoding that the output redirection operators and PowerShell cmdlets use to save to files.
Optional reading: The cross-platform perspective: PowerShell Core:
PowerShell is now cross-platform, via its PowerShell Core edition, whose encoding - sensibly - defaults to BOM-less UTF-8, in line with Unix-like platforms.
This means that source-code files without a BOM are assumed to be UTF-8, and that > / Out-File / Set-Content default to BOM-less UTF-8; explicit use of the utf8 -Encoding argument also creates BOM-less UTF-8, but you can opt to create files with the pseudo-BOM by using the utf8BOM value.
If you create PowerShell scripts with an editor on a Unix-like platform and nowadays even on Windows with cross-platform editors such as Visual Studio Code and Sublime Text, the resulting *.ps1 file will typically not have a UTF-8 pseudo-BOM:
This works fine on PowerShell Core.
It may break on Windows PowerShell, if the file contains non-ASCII characters; if you do need to use non-ASCII characters in your scripts, save them as UTF-8 with BOM.
Without the BOM, Windows PowerShell (mis)interprets your script as being encoded in the legacy "ANSI" codepage (determined by the system locale for pre-Unicode applications; e.g., Windows-1252 on US-English systems).
Conversely, files that do have the UTF-8 pseudo-BOM can be problematic on Unix-like platforms, as they cause Unix utilities such as cat, sed, and awk - and even some editors such as gedit - to pass the pseudo-BOM through, i.e., to treat it as data.
This may not always be a problem, but definitely can be, such as when you try to read a file into a string in bash with, say, text=$(cat file) or text=$(<file) - the resulting variable will contain the pseudo-BOM as the first 3 bytes.
Inconsistent default encoding behavior in Windows PowerShell:
Regrettably, the default character encoding used in Windows PowerShell is wildly inconsistent; the cross-platform PowerShell Core edition, as discussed in the previous section, has commendably put an end to this.
Note:
The following doesn't aspire to cover all standard cmdlets.
Googling cmdlet names to find their help topics now shows you the PowerShell Core version of the topics by default; use the version drop-down list above the list of topics on the left to switch to a Windows PowerShell version.
Historically, the documentation frequently incorrectly claimed that ASCII is the default encoding in Windows PowerShell; fortunately, this has since been corrected.
Cmdlets that write:
Out-File and > / >> create "Unicode" - UTF-16LE - files by default - in which every ASCII-range character (too) is represented by 2 bytes - which notably differs from Set-Content / Add-Content (see next point); New-ModuleManifest and Export-CliXml also create UTF-16LE files.
Set-Content (and Add-Content if the file doesn't yet exist / is empty) uses ANSI encoding (the encoding specified by the active system locale's ANSI legacy code page, which PowerShell calls Default).
Export-Csv indeed creates ASCII files, as documented, but see the notes re -Append below.
Export-PSSession creates UTF-8 files with BOM by default.
New-Item -Type File -Value currently creates BOM-less(!) UTF-8.
The Send-MailMessage help topic also claims that ASCII encoding is the default - I have not personally verified that claim.
Start-Transcript invariably creates UTF-8 files with BOM, but see the notes re -Append below.
Re commands that append to an existing file:
>> / Out-File -Append make no attempt to match the encoding of a file's existing content.
That is, they blindly apply their default encoding, unless instructed otherwise with -Encoding, which is not an option with >> (except indirectly in PSv5.1+, via $PSDefaultParameterValues, as shown above).
In short: you must know the encoding of an existing file's content and append using that same encoding.
Add-Content is the laudable exception: in the absence of an explicit -Encoding argument, it detects the existing encoding and automatically applies it to the new content (thanks, js2010). Note that in Windows PowerShell this means that ANSI encoding is applied if the existing content has no BOM, whereas it is UTF-8 in PowerShell Core.
This inconsistency between Out-File -Append / >> and Add-Content, which also affects PowerShell Core, is discussed in GitHub issue #9423.
Export-Csv -Append partially matches the existing encoding: it blindly appends UTF-8 if the existing file's encoding is any of ASCII/UTF-8/ANSI, but correctly matches UTF-16LE and UTF-16BE.
To put it differently: in the absence of a BOM, Export-Csv -Append assumes UTF-8, whereas Add-Content assumes ANSI.
Start-Transcript -Append partially matches the existing encoding: It correctly matches encodings with BOM, but defaults to potentially lossy ASCII encoding in the absence of one.
Cmdlets that read (that is, the encoding used in the absence of a BOM):
Get-Content and Import-PowerShellDataFile default to ANSI (Default), which is consistent with Set-Content.
ANSI is also what the PowerShell engine itself defaults to when it reads source code from files.
By contrast, Import-Csv, Import-CliXml and Select-String assume UTF-8 in the absence of a BOM.
In short, use:
write-output "your text" | out-file -append -encoding utf8 "filename"
You may want to put parts of the script into braces so you could redirect output of few commands:
{
command 1
command 2
} | out-file -append -encoding utf8 "filename"
A dump made using PowerShell on Windows with output redirection creates a file that has UTF-16 encoding. To work around this issue, you can try:
mysqldump.exe [options] --result-file=dump.sql
Reference link: mysqldump_result-file
Create a file utf8.txt. Ensure the encoding is UTF-8 (no BOM). Set its content to €
In cmd.exe:
type utf8.txt > out.txt
Content of out.txt is € (the bytes are copied through unchanged).
In PowerShell (v4):
cat .\utf8.txt > out.txt
or
type .\utf8.txt > out.txt
Content of out.txt is â‚¬ (the UTF-8 bytes were misread as ANSI).
How do I globally make PowerShell work correctly?
Note: This answer is about Windows PowerShell (up to v5.1); PowerShell [Core, v6+], the cross-platform edition of PowerShell, now fortunately defaults to BOM-less UTF-8 on both in- and output.
Windows PowerShell, unlike the underlying .NET Framework[1], uses the following defaults:
on input: files without a BOM (byte-order mark) are assumed to be in the system's default encoding, which is the legacy Windows code page ("ANSI" code page: the active, culture-specific single-byte encoding, as configured via Control Panel).
on output: the > and >> redirection operators produce UTF-16 LE files by default (which do have - and need - a BOM).
File-consuming and -producing cmdlets do usually support an -Encoding parameter that lets you specify the encoding explicitly.
Prior to Windows PowerShell v5.1, using the underlying Out-File cmdlet explicitly was the only way to change the encoding.
In Windows PowerShell v5.1+, > and >> became effective aliases of Out-File, allowing you to change the encoding behavior of > and >> via the $PSDefaultParameterValues preference variable; e.g.:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'.
For Windows PowerShell to handle UTF-8 properly, you must specify it as both the input and output encoding[2], but note that on output, PowerShell invariably adds a BOM to UTF-8 files.
Applied to your example:
Get-Content -Encoding utf8 .\utf8.txt | Out-File -Encoding utf8 out.txt
To create a UTF-8 file without a BOM in PowerShell, see this answer of mine.
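A round-trip sketch (temp-file names are arbitrary) showing that explicitly specifying UTF-8 on both ends preserves a non-ASCII character such as €:

```powershell
$in  = Join-Path ([IO.Path]::GetTempPath()) 'utf8-in.txt'
$out = Join-Path ([IO.Path]::GetTempPath()) 'utf8-out.txt'

# Create a BOM-less UTF-8 input file via .NET (whose default is BOM-less UTF-8).
[IO.File]::WriteAllText($in, "€`n")

# Read and write with UTF-8 specified explicitly on both ends.
Get-Content -Encoding utf8 $in | Out-File -Encoding utf8 $out
Get-Content -Encoding utf8 $out   # -> €
```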
[1] .NET Framework uses (BOM-less) UTF-8 by default, both for in- and output.
This - intentional - difference in behavior between Windows PowerShell and the framework it is built on is unusual. The difference went away in PowerShell [Core] v6+: both .NET [Core] and PowerShell [Core] default to BOM-less UTF-8.
[2] Get-Content does, however, automatically recognize UTF-8 files with a BOM.
For PowerShell 5.1, enable this setting:
Control Panel, Region, Administrative, Change system locale, Use Unicode UTF-8 for worldwide language support
Then enter this into PowerShell:
$PSDefaultParameterValues['*:Encoding'] = 'Default'
Alternatively, you can upgrade to PowerShell 6 or higher.
https://github.com/PowerShell/PowerShell