PowerShell script fails to run due to Unicode characters

I have a PowerShell script that is failing due to Unicode characters in it:
MyScript.ps1:
Write-Host "Installing 無書籤..."
When I run this script from the PowerShell command line, I get the following error:
I gather the issue is that PowerShell is running in ASCII or some other non-Unicode mode. I tried changing it like this:
$OutputEncoding = [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8
But the error still persists. How do I get PowerShell to run my script?

The screenshot implies that you're using Windows PowerShell, which interprets BOM-less *.ps1 files as ANSI-encoded (using the - usually single-byte - ANSI code page determined by the legacy system locale); by contrast, PowerShell [Core] v6+ now assumes UTF-8.
Therefore, for Windows PowerShell to correctly interpret CJK characters, you must save your *.ps1 file with a Unicode encoding that includes a BOM; given that the bulk of a script's source code consists of ASCII-range characters (your file being a mix of ASCII and CJK characters), the best choice is UTF-8 with BOM.
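For example, one way to re-save the script as UTF-8 with BOM from PowerShell itself - a sketch that assumes the file is currently UTF-8 without a BOM (the file name matches the question's example):
# Read the file explicitly as UTF-8, then rewrite it with a BOM ($true below).
$file = Convert-Path .\MyScript.ps1   # the .NET APIs need a full path
$text = [System.IO.File]::ReadAllText($file, [System.Text.UTF8Encoding]::new($false))
[System.IO.File]::WriteAllText($file, $text, [System.Text.UTF8Encoding]::new($true))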
As for what you tried:
$OutputEncoding = [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8
These settings only apply when PowerShell communicates with external programs - see this answer.
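To see what those two settings actually govern - a sketch; note that neither affects how the engine decodes the *.ps1 file itself at parse time:
$OutputEncoding           # encoding used for data piped *to* external programs
[Console]::OutputEncoding # encoding used to decode output received *from* external programs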

First, write and save your script, with its Unicode characters, in the Windows PowerShell ISE, which saves files in an encoding that Windows PowerShell reads correctly.
Afterwards, you can edit it with any editor, such as Visual Studio Code.
This approach should work.

Related

PowerShell Script Running on one server but throwing error on another [duplicate]

$logstring = Invoke-Command -ComputerName $filesServer -ScriptBlock {
    param(
        $logstring,
        $grp
    )
    $Klassenbuchordner = "KB " + $grp.Gruppe
    $Gruppenordner = $grp.Gruppe
    $share = $grp.Gruppe
    $path = "D:\Gruppen\$Gruppenordner"
    if ((Test-Path D:\Dozenten\01_Klassenbücher\$Klassenbuchordner) -eq $true) {
        $logstring += "Verzeichnis für Klassenbücher existiert bereits"
    }
    else {
        mkdir D:\Dozenten\01_Klassenbücher\$Klassenbuchordner
        $logstring += "Klassenbuchordner wurde erstellt!"
    }
} -ArgumentList $logstring, $grp
My goal is to test the existence of a directory and create it on demand.
The problem is that the path contains German letters (umlauts), which aren't seen correctly by the target server.
For instance, the server receives a mangled path such as "D:\Dozenten\01_KlassenbÃ¼cher" instead of the expected "D:\Dozenten\01_Klassenbücher".
How can I force proper UTF-8 encoding?
Note: Remoting and use of Invoke-Command are incidental to your problem.
Since the problem occurs with a string literal in your source code (...\01_Klassenbücher\...), the likeliest explanation is that your script file is misinterpreted by PowerShell.
In Windows PowerShell, if your script file is de facto UTF-8-encoded but lacks a BOM, the PowerShell engine will misinterpret any non-ASCII-range characters (such as ü) in the script.[1]
Therefore: Re-save your script as UTF-8 with BOM.
Note:
A UTF-8 BOM is no longer strictly necessary in the install-on-demand, cross-platform PowerShell (Core) 7+ edition (which consistently defaults to (BOM-less) UTF-8), but continues to be required if you want your scripts to work in both PowerShell editions.
Why you should save your scripts as UTF-8 with BOM:
Visual Studio Code and other modern editors create UTF-8 files without BOM by default, which is what causes the problem in Windows PowerShell.
By contrast, the PowerShell ISE creates "ANSI"-encoded[1] files, which Windows PowerShell - but not PowerShell Core - reads correctly.
You can only get away with "ANSI"-encoded files:
if your scripts will never be run in PowerShell Core - where all future development effort will go.
if your scripts will never run on a machine where a different "ANSI" code page is in effect.
if your script doesn't contain characters - e.g., emoji - that cannot be represented with your "ANSI" code page.
Given these limitations, it's safest - and future-proof - to always create PowerShell scripts as UTF-8 with BOM.
(Alternatively, you can use UTF-16 (which is always saved with a BOM), but that bloats the file size if you're primarily using ASCII/"ANSI"-range characters, which is likely in PS scripts).
How to make Visual Studio Code create UTF-8 files with a BOM for PowerShell scripts by default:
Note: The following is still required as of v1.11.0 of the PowerShell extension for VSCode, but note that there's a suggestion on GitHub to make the extension default PowerShell files to UTF-8 with BOM.
Add the following to your settings.json file (open it from the command palette (Ctrl+Shift+P) by typing settings and selecting Preferences: Open Settings (JSON)):
"[powershell]": {
"files.encoding": "utf8bom"
}
Note that the setting is intentionally scoped to PowerShell files only, because you wouldn't want all files to default to UTF-8 with BOM, given that many utilities on Unix platforms neither expect nor know how to handle such a BOM.
[1] In the absence of a BOM, Windows PowerShell defaults to the encoding of the system's current "ANSI" code page, as determined by the legacy system locale; e.g., in Western European cultures, Windows-1252.
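To verify the re-saved file, here's a quick sketch that checks for the 3-byte UTF-8 BOM (0xEF 0xBB 0xBF); the file name is illustrative:
$bytes = [System.IO.File]::ReadAllBytes((Convert-Path .\YourScript.ps1))
$bytes.Count -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF  # $true if a BOM is present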

Powershell: "Unexpected token " error with hebrew letters in string variable [duplicate]

$logstring = Invoke-Command -ComputerName $filesServer -ScriptBlock {
param(
$logstring,
$grp
)
$Klassenbuchordner = "KB " + $grp.Gruppe
$Gruppenordner = $grp.Gruppe
$share = $grp.Gruppe
$path = "D:\Gruppen\$Gruppenordner"
if ((Test-Path D:\Dozenten\01_Klassenbücher\$Klassenbuchordner) -eq $true)
{$logstring += "Verzeichnis für Klassenbücher existiert bereits"}
else {
mkdir D:\Dozenten\01_Klassenbücher\$Klassenbuchordner
$logstring += "Klassenbuchordner wurde erstellt!"
}} -ArgumentList $logstring, $grp
My goal is to test the existence of a directory and create it on demand.
The problem is that the path contains German letters (umlauts), which aren't seen correctly by the target server.
For instance, the server receives path "D:\Dozent\01_Klassenbücher" instead of the expected "D:\Dozent\01_Klassenbücher".
How can I force proper UTF-8 encoding?
Note: Remoting and use of Invoke-Command are incidental to your problem.
Since the problem occurs with a string literal in your source code (...\01_Klassenbücher\...), the likeliest explanation is that your script file is misinterpreted by PowerShell.
In Windows PowerShell, if your script file is de facto UTF-8-encoded but lacks a BOM, the PowerShell engine will misinterpret any non-ASCII-range characters (such as ü) in the script.[1]
Therefore: Re-save your script as UTF-8 with BOM.
Note:
A UTF-8 BOM is no longer strictly necessary in the install-on-demand, cross-platform PowerShell (Core) 7+ edition (which consistently defaults to (BOM-less) UTF-8), but continues to be required if you want your scripts to work in both PowerShell editions.
Why you should save your scripts as UTF-8 with BOM:
Visual Studio Code and other modern editors create UTF-8 files without BOM by default, which is what causes the problem in Windows PowerShell.
By contrast, the PowerShell ISE creates "ANSI"-encoded[1] files, which Windows PowerShell - but not PowerShell Core - reads correctly.
You can only get away with "ANSI"-encoded files:
if your scripts will never be run in PowerShell Core - where all future development effort will go.
if your scripts will never run on a machine where a different "ANSI" code page is in effect.
if your script doesn't contain characters - e.g., emoji - that cannot be represented with your "ANSI" code page.
Given these limitations, it's safest - and future-proof - to always create PowerShell scripts as UTF-8 with BOM.
(Alternatively, you can use UTF-16 (which is always saved with a BOM), but that bloats the file size if you're primarily using ASCII/"ANSI"-range characters, which is likely in PS scripts).
How to make Visual Studio Code create UTF-8 files with-BOM for PowerShell scripts by default:
Note: The following is still required as of v1.11.0 of the PowerShell extension for VSCode, but not that there's a suggestion to make the extension default PowerShell files to UTF-8 with BOM on GitHub.
Add the following to your settings.json file (from the command palette (Ctrl+Shift+P, type settings and select Preferences: Open Settings (JSON)):
"[powershell]": {
"files.encoding": "utf8bom"
}
Note that the setting is intentionally scoped to PowerShell files only, because you wouldn't want all files to default to UTF-8 with BOM, given that many utilities on Unix platforms neither expect nor know how to handle such a BOM.
[1] In the absence of a BOM, Windows PowerShell defaults to the encoding of the system's current "ANSI" code page, as determined by the legacy system locale; e.g., in Western European cultures, Windows-1252.

PowerShell - Write-Output produces string with a BOM character

I'm trying to execute this command in a PowerShell script:
Write-Output "Some Command" | some-application
When the script runs, some-application receives the string \xef\xbb\xbfSome Command. The leading bytes are a UTF-8 BOM. All the solutions I can find apply only to redirecting output to a file, but I'm trying to redirect a string to another command (via a pipe).
The $OutputEncoding variable shows that ASCII is configured; no UTF-8 encoding is set.
I'm running this script from an Azure DevOps pipeline, and the problem only occurs there.
Note: This answer deals with how to control the encoding that PowerShell uses when data is piped to an external program (to be read via stdin by such a program).
This is separate from:
what character encoding PowerShell cmdlets use by default on output - for more information, see this answer.
what character encoding PowerShell uses when reading data received from external programs - for more information, see this answer.
The implication is that you've mistakenly set the $OutputEncoding preference variable, which determines what character encoding PowerShell uses to send data to external programs, to a UTF-8 encoding with a BOM.
Your problem goes away if you assign a BOM-less UTF-8 encoding instead:
$OutputEncoding = [System.Text.UTF8Encoding]::new($false) # BOM-less
"Some Command" | some-application
Note that this problem wouldn't arise in PowerShell [Core] v6+, where $OutputEncoding defaults to BOM-less UTF-8 (in Windows PowerShell, unfortunately, it defaults to ASCII).
To illustrate the problem:
$OutputEncoding = [System.Text.Encoding]::UTF8 # *with* BOM
"Some Command" | findstr .
The above outputs ∩╗┐Some Command, where ∩╗┐ is the rendering of the 3-byte UTF-8 BOM (0xEF, 0xBB, 0xBF) in OEM code page 437 (in effect on US-English systems, for instance).
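The root cause is the preamble (BOM) that the assigned encoding object reports, which PowerShell emits before the data; a quick way to see the difference between the two UTF-8 instances:
[System.Text.Encoding]::UTF8.GetPreamble().Length             # 3 -> prepends a BOM
[System.Text.UTF8Encoding]::new($false).GetPreamble().Length  # 0 -> BOM-less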

Broken Cmd scripts after creation with PowerShell Write-Output

We have a domain-wide automation tool that can start jobs on servers as admin (Stonebranch UAC - please note: this has nothing to do with Windows "User Account Control"; Stonebranch UAC is an enterprise automation tool). Natively, it looks for Cmd batch scripts, so we use those.
However, I prefer to use PowerShell for everything, so I bulk-created dozens of .bat scripts using PowerShell. Nothing worked; the automation tool broke whenever it tried to run the .bat scripts. So I pared the scripts back to the single line echo 123 and still everything was broken. We thought it was a problem with the tool, but then we tried to run the .bat scripts on the server directly and they were broken too, just printing a couple of Unicode-looking characters on the command line and failing to run.
So it dawned on us that something about how PowerShell writes Write-Output's output when creating the batch scripts was breaking them (this is on Windows 2012 R2 with PowerShell 5.1). To reproduce it, type the following at a PowerShell console:
Write-Output "echo 123" > test.bat
If I now open cmd.exe and try to run test.bat, I just get a splat of 2 Unicode-looking characters on the screen and nothing else.
Can someone explain a) why this behaviour happens, and b) how I can continue to use PowerShell to generate these batch scripts without them being broken? That is, do I have to change BOM or UTF-8 settings to get this working, and if so, how?
In Windows PowerShell, >, like the underlying Out-File cmdlet, invariably[1] creates "Unicode" (UTF-16LE) files, which cmd.exe cannot read (not even with the /U switch).
In PowerShell [Core] v6+, BOM-less UTF-8 encoding is consistently used, including by >.
Therefore:
If you're using PowerShell [Core] v6+ AND the content of the batch file comprises ASCII-range characters only (7-bit range), you can get away with >.
Otherwise, use Set-Content with -Encoding Oem.
'@echo 123' | Set-Content -Encoding Oem test.bat
If your batch file's source code only ever contains ASCII-range characters (7-bit range), you can also get away with the following (in both PowerShell editions):
'@echo 123' | Set-Content test.bat
Note:
As the -Encoding argument implies, the system's active OEM code page is used, which is what batch files expect.
OEM code pages are supersets of ASCII encoding, so a file saved with -Encoding Oem that is composed only of ASCII-range characters is implicitly also an ASCII file. The same applies to BOM-less UTF-8 and ANSI (Default) encoded files composed of ASCII-range characters only.
-Encoding Oem - as opposed to -Encoding Ascii or even using Set-Content's default encoding[2] - therefore only matters if you have non-ASCII-range characters in your batch file's source code, such as é. Such characters, however, are limited to a set of 256 characters in total, given that OEM code pages are fixed-width single-byte encodings, which means that many Unicode characters are inherently unusable, such as €.
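To check which code pages are in effect on a given system, a quick sketch (the values shown are typical for US-English systems):
[Console]::OutputEncoding        # typically the OEM code page, e.g. IBM437
[System.Text.Encoding]::Default  # the ANSI code page in Windows PowerShell, e.g. Windows-1252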
[1] In Windows PowerShell v5.1 (and above), it is possible to change >'s encoding via the $PSDefaultParameterValues preference variable - see this answer - however, you won't be able to select a BOM-less UTF-8 encoding, which would be needed for creation of batch files (composed of ASCII-range characters only).
[2] Set-Content's default encoding is the active ANSI code page (Default) in Windows PowerShell (another ASCII superset), and (as for all cmdlets) BOM-less UTF-8 in PowerShell [Core] v6+; for an overview of the wildly inconsistent character encodings in Windows PowerShell, see this answer.
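As a sketch of the redirection override mentioned in footnote [1], for Windows PowerShell v5.1 (note that BOM-less UTF-8 is not among the selectable encodings):
$PSDefaultParameterValues['Out-File:Encoding'] = 'oem'  # > is built on Out-File, so it honors this too
'@echo 123' > test.bat                                  # now written in the active OEM code page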
A method that by default creates BOM-less UTF-8 files is:
New-Item -Path outfile.bat -ItemType File -Value "echo this"
