How to configure the encoding for the PowerShell console?

I have some problems with displaying Chinese characters in the PowerShell console: all Chinese characters are shown as rectangles. I believe this is an encoding problem. Does anyone know how to configure the PowerShell console to use UTF-8 encoding?

Have a look at this post.
Current encoding: [Console]::OutputEncoding
Set encoding (UTF-8): [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
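As a minimal sketch of the above (the sample string is my own; a console font that contains CJK glyphs must also be configured):
[Console]::OutputEncoding                                  # check the current output encoding
[Console]::InputEncoding                                   # check the current input encoding
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8   # switch output to UTF-8
[Console]::InputEncoding  = [System.Text.Encoding]::UTF8   # switch input to UTF-8
'中文测试'                                                 # renders only if the console font has the glyphs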

Related

Change Unicode to UTF-8 | PowerShell script

When I use Write-Host in my PowerShell script, the output looks like this: ????? ?????.
This happens because I'm entering strings in Arabic with Write-Host, and it seems that PowerShell doesn't support Arabic...
How do I print text using Write-Host, but in Unicode/UTF-8 (which supports Arabic)?
Example: Write-Host "مرحباً بالعالم"
The output in this case will be: ????? ?????
Any solutions?
Fixed
You need to set a font that supports those characters, like Cascadia Code PL.
Note: the non-PL version didn't work, so get the PL one.
You might have to set the console encoding as well. Unless you really need a different encoding, defaulting to UTF-8 is a good idea:
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
While I didn't actually need Windows Terminal (wt) for this, I'd suggest it. Windows Terminal is a modern terminal that is now an inbox app, meaning it will be the default on Windows going forward.
There's UTF-8 output that doesn't work in the legacy console but does work in Windows Terminal (both using the Cascadia Code PL font).
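As a quick check that font plus encoding now round-trip, a sketch reusing the test string from the question:
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
Write-Host "مرحباً بالعالم"   # should print the Arabic text instead of ????? ?????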

Encoding issue with PowerShell `Get-Clipboard`

I would like to retrieve HTML from the clipboard via the command line and am struggling to get the encoding right.
For instance, if you open a command prompt/WSL, copy the following ⇧Shift+⭾TAB and run:
powershell.exe Get-Clipboard
The correct text is retrieved (⇧Shift+⭾TAB).
But if you then try to retrieve the clipboard as html:
powershell.exe "Get-Clipboard -TextFormatType html"
The following garbled text is retrieved:
...â‡§Shift+â­¾TAB...
This seems to be an encoding confusion on the part of the Get-Clipboard cmdlet. How do I work around this?
Edit: As @Zilog80 indicates in the comments, the actual encoding of the text does not match the encoding it is assumed to have. I can rectify this in Ruby, for instance, using:
out = `powershell.exe Get-Clipboard -TextFormatType html`
puts out.encode('cp1252').force_encoding('utf-8')
Any idea for how to achieve the same on the command line?
This is indeed a shortcoming of Get-Clipboard. The HTML format is documented to support only UTF-8, regardless of the source encoding of the page, so the cmdlet should interpret it as such, but it doesn't.
I'm speculating as to the encoding PowerShell is going to use when decoding the data, but it's probably whatever the system default ANSI encoding is. In that case,
# Round-trip: encode the mis-decoded text back to ANSI bytes, then decode those bytes as UTF-8
[Text.Encoding]::UTF8.GetString([Text.Encoding]::Default.GetBytes( `
(Get-Clipboard -TextFormatType Html -Raw) `
))
will recode the text, but with the caveat that if the default ANSI encoding does not cover all code points from 0-255, some characters might get lost. Fortunately Windows-1252 (the most common default) does cover all code points.
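If you need this often, the recode can be wrapped in a small helper. A sketch; the function name is my own invention, not part of PowerShell:
function Get-ClipboardHtmlUtf8 {
    # Grab the raw HTML clipboard data (mis-decoded by Get-Clipboard)
    $raw = Get-Clipboard -TextFormatType Html -Raw
    if (-not $raw) { return }
    # Round-trip: back to the original ANSI bytes, then decode those bytes as UTF-8
    [System.Text.Encoding]::UTF8.GetString(
        [System.Text.Encoding]::Default.GetBytes($raw))
}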

PowerShell keeps converting to ASCII

I've followed the guide here: Use Windows PowerShell to Look for and Replace a Word in a Microsoft Word Document
My problem is that if I put a UTF-8 string into the search field, it gets converted to ASCII before going into the Word application.
If you simply copy and paste his code and change the find text to something like Japanese: カルシウム
It will go into Word and search for the garbled ANSI equivalent: カルシウム
I have tried every suggestion about setting input and output to UTF-8 that I can find, but nothing seems to work. I can't even get the PowerShell console to display Japanese characters; all I get are boxes. That might have something to do with the fact that I only have 3 fonts, and perhaps none of them can display Japanese characters in the console... but I don't care about that; I want to be able to send the Japanese characters in UTF-8 for the find and replace.
Any Help?
For people whose output encoding keeps ending up as ASCII or Unicode: you can set the output encoding to whatever encoding you want via $OutputEncoding (described on the Microsoft blog):
PS C:\> $OutputEncoding                                        # shows the current default encoding
PS C:\> $OutputEncoding = [Console]::OutputEncoding            # match the console encoding, or set one explicitly:
PS C:\> $OutputEncoding = New-Object System.Text.UTF8Encoding  # or whatever encoding you want (e.g. a Japanese codepage)
PS C:\> $OutputEncoding                                        # verify the encoding
The answer is actually quite easy: if the PowerShell script is saved as plain UTF-8 (without a BOM), the characters are not decoded correctly. You need to save the .ps1 script as "UTF-8 with BOM" in order to get the characters right.
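Re-saving a script as UTF-8 with BOM can itself be scripted; a sketch, where the path is a placeholder:
$path = 'C:\scripts\replace-word.ps1'                  # placeholder path to your script
$text = [System.IO.File]::ReadAllText($path)           # ReadAllText detects and drops any existing BOM
$utf8Bom = New-Object System.Text.UTF8Encoding $true   # $true = emit a BOM when writing
[System.IO.File]::WriteAllText($path, $text, $utf8Bom)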

What's the  sign at the beginning of my source file?

I have a PHP source file where  characters automatically got added in! I don't know where they came from. I'm not getting any parse errors, but it results in weird behavior in the execution of the file, e.g. the header-location functionality sometimes not working. I'm curious how these kinds of symbols get auto-generated. I'm using UTF-8 encoding, and the  sign does not show in Notepad++ or Windows Notepad, but it does in NetBeans IDE.
Eg. Code:
<?php
echo "no errors!";
header("Location: http://stackoverflow.com");
exit;
?>
What is this? How can I prevent it?
You probably saved the files as UTF-8 with BOM. You should save them as UTF-8 without BOM.
It's called a Byte Order Mark, and it doesn't always have to be "". http://en.wikipedia.org/wiki/Byte_order_mark
Some Windows applications add a BOM by default. In Notepad++ you can use the options in the Encoding menu, such as Encode in UTF-8 without BOM or Convert to UTF-8 without BOM.
I believe it still happens whether you save it as UTF-8 with or without BOM. I don't think it makes a difference.
Try it, see if it helps.
From a tool like vi or vim, you can modify and save the file without a BOM with the following two commands:
:setlocal nobomb
and then
:w
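The same is possible from PowerShell; a sketch, with a placeholder path:
$path = 'C:\www\index.php'                                # placeholder path to the affected file
$text = [System.IO.File]::ReadAllText($path)              # decoding drops the BOM from the text
$utf8NoBom = New-Object System.Text.UTF8Encoding $false   # $false = do not emit a BOM
[System.IO.File]::WriteAllText($path, $text, $utf8NoBom)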

How to do proper Unicode and ANSI output redirection on cmd.exe?

If you are doing automation on Windows and redirecting the output of different commands (internal cmd.exe ones or external ones), you'll discover that your log files contain combined Unicode and ANSI output (meaning they are invalid and will not load well in viewers/editors).
Is it possible to make cmd.exe work with UTF-8? This question is not about display; it's about stdin/stdout/stderr redirection and Unicode.
I am looking for a solution that would allow you to:
redirect the output of the internal commands to a file using UTF-8
redirect output of external commands supporting Unicode to the files but encoded as UTF-8.
If it is impossible to obtain this kind of consistency using batch files, is there another way of solving the problem, such as using Python scripting? In that case, I would like to know if it is possible to do the Unicode detection alone (the user of the script should not have to remember whether the called tools output Unicode or not; the script should just convert the output to UTF-8).
For simplicity, we'll assume that if a tool's output is not Unicode, it will be treated as UTF-8 (no codepage conversion).
You can use chcp to change the active code page. This will be used for redirecting text as well:
chcp 65001
Keep in mind, though, that this will have no effect if cmd was started with the /u switch, which forces Unicode (UTF-16 in this case) output for redirection. If that switch is active, all output will be in UTF-16LE, regardless of the codepage set with chcp.
Also note that the console will be unusable for interactive output when set to Raster Fonts. I'm getting fun error messages in that case:
C:\Users\Johannes Rössel\Documents>x
Active code page: 65001
The system cannot write to the specified device.
So either use a sane setup (TrueType font for the console) or don't pull this stunt when using the console interactively and having a path that contains non-ASCII characters.
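If PowerShell is an option instead of plain batch files, a hedged sketch of UTF-8 logging (some_tool.exe stands in for whatever external command you are capturing):
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8    # decode the external tool's output as UTF-8
some_tool.exe | Out-File -FilePath log.txt -Encoding utf8   # write the log file as UTF-8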
In Perl,
binmode(STDOUT, ":unix");
without
use encoding 'utf8';
helped me. With the latter, I was getting "Wide character in print" warnings.