PowerShell keeps converting to ASCII

I've followed the guide here: Use Windows PowerShell to Look for and Replace a Word in a Microsoft Word Document
My problem is that if I put a UTF-8 string into the search field, it gets converted to ASCII before it reaches the Word application.
If you simply copy and paste his code and change the find text to something in Japanese, such as カルシウム,
it goes into Word and searches for the ASCII equivalent instead: カルシウム
I have tried every suggestion I can find about setting the input and output encoding to UTF-8, but nothing seems to work. I can't even get the PowerShell console to display Japanese characters; all I get are boxes. That may be because I only have three console fonts and none of them can display Japanese, but I don't care about the console display. I want to send the Japanese characters as UTF-8 to Word's find and replace.
Any help?

If your output keeps getting encoded as ASCII or Unicode, you can set the output encoding to whatever you want through $OutputEncoding (see the Microsoft blog post about it):
PS C:\> $OutputEncoding    # shows your current default encoding
PS C:\> $OutputEncoding = [Console]::OutputEncoding    # match the console's encoding, or set one explicitly:
PS C:\> $OutputEncoding = New-Object -TypeName System.Text.UTF8Encoding    # or whichever encoding you need, e.g. one for Japanese
PS C:\> $OutputEncoding    # verify the encoding
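To show what an ASCII output encoding does to the search text, here is a rough Ruby sketch. It uses Ruby rather than PowerShell, and the helper name encode_for_output is made up for illustration. It encodes a string into a target encoding and replaces every character the target cannot represent with "?", which is also what .NET's ASCIIEncoding does:

```ruby
# Hypothetical helper: encode text into a target encoding, replacing every
# character the target cannot represent with "?" (as an ASCII encoder does).
def encode_for_output(text, encoding)
  text.encode(encoding, undef: :replace, invalid: :replace)
end

japanese = "カルシウム"

ascii = encode_for_output(japanese, "US-ASCII")
utf8  = encode_for_output(japanese, "UTF-8")

puts ascii              # prints ?????  (every katakana character was lost)
puts utf8 == japanese   # prints true   (UTF-8 keeps the text intact)
puts utf8.bytesize      # prints 15     (3 bytes per katakana character)
```

Once text has been through an ASCII encoder, the original characters are gone. Nothing downstream can recover them, so the encoding has to be right before the string is handed on.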

The answer is actually quite easy. If the PowerShell script is saved as UTF-8 without a BOM, Windows PowerShell reads it using the system's ANSI code page, so the non-ASCII characters are decoded incorrectly. You need to save the .ps1 script as "UTF-8 with BOM" to get the characters right.
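You can also fix the file itself instead of re-saving it by hand. The following minimal Ruby sketch prepends the three-byte UTF-8 BOM (EF BB BF) to a script that is already UTF-8 without one. The function name ensure_utf8_bom and the file name in the usage line are hypothetical:

```ruby
# Hypothetical helper: prepend a UTF-8 byte order mark (EF BB BF) to a file
# whose content is UTF-8 without one. Returns true if the file was changed.
UTF8_BOM = "\xEF\xBB\xBF".b

def ensure_utf8_bom(path)
  bytes = File.binread(path)
  return false if bytes.start_with?(UTF8_BOM)   # already has a BOM

  # Refuse to label bytes as UTF-8 if they are not valid UTF-8 (e.g. ANSI text).
  unless bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
    raise ArgumentError, "#{path} is not valid UTF-8"
  end

  File.binwrite(path, UTF8_BOM + bytes)
  true
end

# Usage (hypothetical file name):
# ensure_utf8_bom("Replace-WordText.ps1")
```

The validity check matters because prepending a BOM to a file that was actually saved in an ANSI code page would only mislabel it. PowerShell 7 reads BOM-less scripts as UTF-8 by default, so the BOM matters mainly for Windows PowerShell 5.1.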

Related

Change Unicode to UTF-8 | PowerShell script

When I use Write-Host in my PowerShell script, the output looks like this: ????? ?????.
This happens because I'm entering strings in Arabic with Write-Host, and it seems that PowerShell doesn't support Arabic...
How do I print text using Write-Host in Unicode UTF-8 (which supports Arabic)?
Example: Write-Host "مرحباً بالعالم"
The output in this case will be: ????? ?????
Any solutions?
You need to use a font that supports those characters, such as Cascadia Code PL.
Note: the non-PL version didn't work, so get the PL one.
You might have to set the console encoding as well. Unless you really need a different encoding, defaulting to UTF-8 is a good idea.
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
I didn't actually need wt (Windows Terminal) here, but I'd suggest it. Windows Terminal is a modern terminal that now ships as an inbox app, meaning it is the default on Windows going forward.
Some UTF-8 text that doesn't render in the classic console does render in wt (both using the Cascadia Code PL font).

Encoding issue with Powershell `Get-Clipboard`

I would like to retrieve HTML from the clipboard via the command line and am struggling to get the encoding right.
For instance, if you open a command prompt or WSL shell, copy the text ⇧Shift+⭾TAB, and run:
powershell.exe Get-Clipboard
The correct text is retrieved (⇧Shift+⭾TAB).
But if you then try to retrieve the clipboard as html:
powershell.exe "Get-Clipboard -TextFormatType html"
The following text is retrieved
...⇧Shift+⭾TAB...
This seems to be an encoding mix-up on the part of the Get-Clipboard cmdlet. How can I work around this?
Edit: As @Zilog80 points out in the comments, the actual encoding of the text does not match the encoding it is assumed to have. I can fix this in Ruby, for instance, using:
out = `powershell.exe Get-Clipboard -TextFormatType html`
puts out.encode('cp1252').force_encoding('utf-8')
Any idea for how to achieve the same on the command line?
This is indeed a shortcoming of Get-Clipboard. The HTML format is documented to support only UTF-8, regardless of the source encoding of the page, so the cmdlet should interpret it as such, but it doesn't.
I'm speculating about which encoding PowerShell uses when decoding the data, but it's probably the system's default ANSI encoding. In that case,
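# Re-encode the mis-decoded string with the ANSI code page to recover the raw bytes, then decode those bytes as UTF-8
[Text.Encoding]::UTF8.GetString([Text.Encoding]::Default.GetBytes( `
(Get-Clipboard -TextFormatType Html -Raw) `
))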
will recode the text, with the caveat that if the default ANSI encoding does not map every byte value from 0 to 255 to a character, some characters might get lost. Fortunately, Windows-1252 (the most common default) does cover all of them.
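The same repair can be wrapped up in Ruby as an extension of the one-liner from the question. This is a sketch that assumes the text was mis-decoded as Windows-1252, and the name repair_mojibake is hypothetical. It also checks that the recovered bytes really are valid UTF-8. Ruby's Windows-1252 table may leave the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. In that case encode raises Encoding::UndefinedConversionError rather than silently dropping characters.

```ruby
# Hypothetical helper: undo "UTF-8 bytes decoded as Windows-1252" mojibake by
# turning the string back into its Windows-1252 bytes and reading them as UTF-8.
def repair_mojibake(text, misread_as: "Windows-1252")
  bytes = text.encode(misread_as)                 # recover the original bytes
  fixed = bytes.force_encoding(Encoding::UTF_8)   # relabel them as UTF-8
  raise ArgumentError, "result is not valid UTF-8" unless fixed.valid_encoding?
  fixed
end

original = "⇧Shift+⭾TAB"
# Simulate the bug: take the UTF-8 bytes and decode them as Windows-1252.
garbled = original.b.force_encoding("Windows-1252").encode("UTF-8")

puts garbled                               # the mojibake, starting with "â‡§"
puts repair_mojibake(garbled) == original  # prints true
```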

Converting from ANSI when the text has non-Latin letters

I have a very old program (not a server or anything on the internet) that I think uses the ANSI (Windows-1252) encoding.
The problem is that some inputs to this program are written in Arabic.
However, when I try to read the result, the Arabic words are written with very weird characters. For example, the input "نور" is converted to "äæÑ".
The program output should contain a combination of English words and Arabic words.
For example, it outputs "Name äæÑ" when the correct output should be something like "Name نور".
In general, the English words are correct and readable with both UTF-8 and ANSI. But the Arabic words are read for example as "���" with UTF-8 and as "äæÑ" with ANSI.
I understand that this is because ANSI doesn't support non-Latin letters.
But what should I do now? How can I convert them back to Arabic?
Note: I know the exact input and the exact output that this program should produce.
Note2: I don't have the source code of this program. I just want to convert the output file of this program to have the correct words or encoding.
I've now solved this problem by running the following in the terminal:
iconv -f WINDOWS-1256 -t utf8 < my_File.ged > result.ged
I tried writing Java code that does the same thing, but it didn't give me the result I wanted.
I also tried the terminal command above with WINDOWS-1252 instead of WINDOWS-1256, but that didn't work: the program evidently writes Windows-1256 (the Arabic ANSI code page), not Windows-1252. So it is worth trying different encodings until one works.
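Here is a Ruby sketch of the same conversion. The names convert_file and fix_misread are hypothetical. It also shows why Windows-1256 was the right choice. The Arabic "نور" encoded in Windows-1256 is the byte sequence E4 E6 D1, and Windows-1252 displays those bytes as "äæÑ".

```ruby
# Hypothetical helper, roughly `iconv -f WINDOWS-1256 -t utf8 < src > dst`:
# read the raw bytes, label them with the legacy code page, write UTF-8.
def convert_file(src, dst, from: "Windows-1256")
  raw = File.binread(src).force_encoding(from)
  File.binwrite(dst, raw.encode(Encoding::UTF_8))
end

# Hypothetical helper: repair a string such as "äæÑ" that came from reading
# Windows-1256 bytes as if they were Windows-1252.
def fix_misread(text, shown_as: "Windows-1252", actual: "Windows-1256")
  text.encode(shown_as).force_encoding(actual).encode(Encoding::UTF_8)
end

puts fix_misread("Name äæÑ")   # prints Name نور (the English part is unchanged)
```

English letters survive in both code pages because both are ASCII-compatible. Only the bytes above 0x7F change meaning, which is why just the Arabic words came out garbled.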

How to configure the encoding for the PowerShell console?

I have some problems displaying Chinese characters in the PowerShell console. All Chinese characters are shown as rectangles. I believe this is an encoding problem. Does anyone know how to configure the PowerShell console to use UTF-8 encoding?
Have a look at this post
Current encoding: [Console]::OutputEncoding
Set Encoding (UTF8): [Console]::OutputEncoding = [System.Text.Encoding]::UTF8

How to convert Unicode Hebrew that appears as gibberish in VBScript?

I am gathering information from a Hebrew (Windows-1255 / UTF-8 encoded) website using VBScript and the WinHttp.WinHttpRequest.5.1 object.
For example:
Set objWinHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
...
'writes the file as unicode (can't use Ascii)
Set Fileout = FSO.CreateTextFile("c:\temp\myfile.xml", true, true)
....
Fileout.WriteLine(objWinHttp.responsetext)
When viewing the file in Notepad / Notepad++, I see the Hebrew as gibberish.
For example:
äìëåú - äøá àáøäí éåñó - îåøùú
I need a VBScript function that returns the Hebrew correctly, similar to http://www.pixiesoft.com/flip/ (choose the second radio button and press the Convert button, and you will see the Hebrew correctly).
Your script is correctly fetching the byte stream and saving it as-is. No problems there.
Your problem is that the local text editor doesn't know it is supposed to read the file as cp1255, so it falls back to your machine's default, cp1252. You can't save the file locally as cp1252 so that Notepad reads it correctly, because cp1252 doesn't include any Hebrew characters.
What is ultimately going to read the file or byte stream and need to pick up the Hebrew correctly? If it does not support cp1255, you will need to find an encoding that the tool does support and convert the cp1255 string to that encoding. I suggest trying UTF-8 or UTF-16LE (the encoding Windows misleadingly calls 'Unicode').
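The following Ruby sketch follows the suggestion to store the text as UTF-16LE. It assumes you have the raw Windows-1255 bytes of the response, not text that was already decoded with the wrong code page. The name save_as_utf16le and the file path in the usage line are hypothetical.

```ruby
# Hypothetical helper: transcode raw Windows-1255 bytes to UTF-16LE and write
# them with a byte order mark (FF FE), the format Windows calls "Unicode".
UTF16LE_BOM = "\uFEFF".encode(Encoding::UTF_16LE)

def save_as_utf16le(cp1255_bytes, path)
  text = cp1255_bytes.dup.force_encoding("Windows-1255")   # label the raw bytes
  File.binwrite(path, UTF16LE_BOM + text.encode(Encoding::UTF_16LE))
end

# Usage (raw_bytes and the path are hypothetical):
# save_as_utf16le(raw_bytes, "c:/temp/myfile.txt")
```

The BOM is what lets Notepad recognize the file as UTF-16LE without being told. Without it, the editor may guess the encoding wrong again.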
Converting text between encodings in VBScript/JScript can be done as a side-effect of an ADODB stream. See the example in this answer.
Thanks to Charming Bobince (who posted the answer), I can now see the Hebrew correctly (saving a Windows-1255 encoded string to a .txt file and opening it in Notepad) by implementing the following:
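Function ConvertFromUTF8(sIn)
    ' Despite the name, this re-interprets text rather than decoding UTF-8:
    ' write the string out as bytes in the system ANSI code page ("X-ANSI"),
    ' then read those same bytes back as Windows-1255 (Hebrew).
    Dim oIn: Set oIn = CreateObject("ADODB.Stream")
    oIn.Open
    oIn.CharSet = "X-ANSI"
    oIn.WriteText sIn
    oIn.Position = 0             ' rewind before switching the character set
    oIn.CharSet = "WINDOWS-1255"
    ConvertFromUTF8 = oIn.ReadText
    oIn.Close
End Function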
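Outside VBScript, the same trick can be sketched in Ruby. The name ansi_to_hebrew is hypothetical, and the sketch assumes the machine's ANSI code page (the "X-ANSI" charset above) is Windows-1252. Applied to the gibberish from the question, it recovers the Hebrew text:

```ruby
# Hypothetical Ruby counterpart of ConvertFromUTF8: get the ANSI bytes back,
# relabel them as Windows-1255, and decode them into a UTF-8 Ruby string.
def ansi_to_hebrew(garbled, ansi: "Windows-1252")
  bytes = garbled.encode(ansi)             # like CharSet = "X-ANSI" + WriteText
  bytes.force_encoding("Windows-1255")     # like CharSet = "WINDOWS-1255"
  bytes.encode(Encoding::UTF_8)            # like ReadText
end

puts ansi_to_hebrew("äìëåú - äøá àáøäí éåñó - îåøùú")
# prints הלכות - הרב אברהם יוסף - מורשת
```

This works because both code pages use one byte per character. Each byte maps to exactly one Latin-1-style character in Windows-1252 and to one Hebrew letter in Windows-1255, so nothing is lost in between.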