PowerShell Unicode Characters Transforming Unexpectedly

I've got a program that uses a few hash tables to resolve information. I'm getting some weird issues with foreign characters. Below is an accurate representation:
$Props = @{
    P1 = 'Norte Americano e Inglês'
}
$Expressions = @{
    E1 = { $Props['P1'] }
}
& $Expressions['E1']
If I paste this into the PowerShell 5.1 console, or run the selection in VSCode, I get:
Norte Americano e Inglês
As expected. But if I run the code in VSCode (hit F5), I get:
Norte Americano e InglÃªs
By debugging with a breakpoint set right after the hash literal, I can tell the incorrect version is already in the hash table, so this isn't somehow a side effect of the call operator or of the use of script blocks.
I attempted to set the output encoding like:
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
But this doesn't seem to change the pattern. Frankly, I'm surprised the console is handling Unicode so well in the first place; what I can't understand is the inconsistency. Ultimately this data is written to an AD attribute, which again works fine if I execute the steps manually, but gets mangled if I actually run the script, even with the output encoding set as described above.
I did look through this Q&A, but I don't seem to be having a console display issue, although that may be down to the TrueType fonts; perhaps they're masking the problem.
Interestingly it does seem to work correctly in VSCode if I switch it to PowerShell 7.1. However, because of integration with the AD cmdlets, which do not function well through implicit session compatibility, it's not possible to use PowerShell Core for this project.
The dev environment is Windows Server 2012 R2, up to date. I'm not sure there's any way to change the system code page there, as is mentioned for Windows 10 (1909).

This is pretty ugly but what happens if you try this at the end of your code:
$enc = [System.Text.Encoding]::UTF8
$enc.GetString($enc.GetBytes($(& $Expressions['E1'])))
Also, this might help you: Encode a string in UTF-8
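For reference, a minimal round-trip sketch of that suggestion. Note that GetString(GetBytes(...)) returns the original string unchanged if it was already correct in memory, so this diagnoses rather than repairs:
$enc = [System.Text.Encoding]::UTF8
$bytes = $enc.GetBytes('Norte Americano e Inglês')   # 'ê' encodes as 0xC3 0xAA
$enc.GetString($bytes)                               # -> Norte Americano e Inglês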


How to get PowerShell in VSCode or ISE to give the specific failing line

I'm sure I must be missing something really basic, but I've been revisiting PowerShell of late to get up to speed with 7.1, and I can't seem to get it to tell me where an error is thrown, either in VSCode or the ISE.
In the output above from VSCode (the report is the same in the ISE), the error isn't on the reported line; it's a couple of levels deeper, in a function called by CompareFiles. It always seems to report the caller of the caller of the code that failed, rather than the actual failing line.
I've searched here, there and everywhere and found lots of clever tweaks and debugging ideas I could add, but I don't understand why it doesn't just give me the failing line here, rather than a line a level or two up the call stack. It's as if the CompareFiles function has some kind of pragma that says "Don't record debugging info for me or anything I call", but it hasn't (and that probably doesn't exist anyway!).
I can't help feeling I've just not set some obvious debug setting, or set one incorrectly while I've been tinkering.
If it makes a difference, I'm calling a PS module from a PS script; the module loads fine from the PSPath via Import-Module, and the line being reported is in the module, as is the actual failing line (both are in the same module), so it's not some problem where only the script is being debugged and not the module.
Both the script and the module have the below at the top:
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
As I say, I get an identical error when I use the ISE, so it's not a VSCode setting.
Debugging line by line works fine, so I can step through to find the failing line, but surely it should just pop up and tell me.
[Later] I should note it's not just this error; it's been like this for days, with all sorts of runtime errors in this and other scripts.
Silly me - I simply removed..
$ErrorActionPreference = "Stop"
..from the script and the module; this was essentially implementing the imaginary pragma I mentioned above. Now I get the actual failing line in the error.
I probably only needed it at one of the two levels, if anywhere, but error handling works just fine without it, so I've removed it everywhere; perhaps I'll look into what it actually does at some point.
Serves me right for adding something blindly because it sounded good ("Sure, I want it to stop when there's an error - why wouldn't I? I'll add that statement then") and not re-testing or looking further into it.
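For what it's worth, if you do want to keep $ErrorActionPreference = "Stop", the original throw site is still recorded on the error record, and you can dig it out after the failure; for example:
$Error[0].ScriptStackTrace    # full call stack of the failure, innermost frame first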

CMD pipe different from PowerShell pipe?

I am trying to pipe Node.js output to pino-pretty:
node .\dist\GameNode.js | pino-pretty
Running this in cmd.exe I get my formatted output, but running it inside PowerShell I get nothing.
I read that PowerShell uses objects when piping, so I tried:
node .\dist\GameNode.js | Out-String -Stream | pino-pretty
But this also does not work.
Why does it work inside CMD but not inside PowerShell?
Thanks :)
Note: The specific pino-pretty problem described in the question is not resolved by the information below. Lukas (the OP) has filed a bug report here.
It's surprising that you get nothing, but the fundamental difference is:
cmd.exe's pipeline conducts raw data, i.e. byte streams (which a given program receiving the data may or may not itself interpret as text).
PowerShell's pipeline, when talking to external programs, conducts only text (strings), which has two implications:
On piping data to an external program, text must be encoded, which happens based on the character encoding stored in preference variable $OutputEncoding.
On receiving data from an external program, data must be decoded, which happens based on the character encoding stored in [Console]::OutputEncoding, which by default is the system's OEM code page, as reflected in chcp.
This decoding happens invariably, irrespective of whether the data is then further processed in PowerShell or passed on to another external program.
This sometimes problematic lack of ability to send raw data through PowerShell's pipeline even between two external programs is discussed in this answer.
The only exception is if external-program output is neither captured, sent on through the pipeline, nor redirected to a file: in that case, the data prints straight to the console (terminal), but only in a local console (when using PowerShell remoting to interact with a remote machine, decoding is again invariably involved).
This direct-to-display printing can sometimes hide encoding problems, because some programs, notably python, use full Unicode support situationally in that case; that is, the output may print fine, but when you try to process it further, encoding problems can surface.
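To see which encodings are currently in effect on your machine, you can inspect both settings directly:
$OutputEncoding              # encodes text piped TO an external program
[Console]::OutputEncoding    # decodes text received FROM one; reflects chcp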
A simple way to force decoding is to enclose the call in (...); e.g.,
python -c "print('eé')" prints fine, but
(python -c "print('eé')") surfaces an encoding problem; see the bottom section for more information.
While console applications traditionally use the active OEM code page for character encoding and decoding, Node.js always uses UTF-8.
Therefore, in order for PowerShell to communicate properly with Node.js programs, you must (temporarily) set the following first:
$OutputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
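A minimal sketch of the "temporarily" part, using the command from the question and restoring the previous settings afterwards:
$prevOutputEncoding  = $OutputEncoding
$prevConsoleEncoding = [Console]::OutputEncoding
$OutputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
try {
    node .\dist\GameNode.js | pino-pretty   # Node.js now receives and emits UTF-8
}
finally {
    $OutputEncoding = $prevOutputEncoding
    [Console]::OutputEncoding = $prevConsoleEncoding
}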
If you want to fundamentally switch to UTF-8, either system-wide (which has far-reaching consequences) or only for PowerShell console windows, see this answer.
As an aside: an intermediate Out-String -Stream pipeline segment is never needed for relaying an external program's output - it is effectively a (costly) no-op, because streaming stdout output line by line is what PowerShell does by default. In other words, it is not surprising that it made no difference in your case.
Optional reading: Convenience function Invoke-WithEncoding and diagnostic function Debug-NativeInOutput for ad-hoc encoding needs / diagnosis:
If switching all PowerShell consoles to UTF-8 isn't an option and/or you need to deal with "rogue" programs that use a specific encoding other than UTF-8 or the active OEM code page, you can install:
Function Invoke-WithEncoding, which temporarily switches to a given encoding when invoking an external program, directly from this Gist as follows (I can assure you that doing so is safe, but you should always check):
# Download and define advanced function Invoke-WithEncoding in the current session.
irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw/Invoke-WithEncoding.ps1 | iex
Function Debug-NativeInOutput, which helps diagnose encoding problems with external programs, directly from this Gist as follows (again, you should check first):
# Download and define advanced function Debug-NativeInOutput in the current session.
irm https://gist.github.com/mklement0/eac1f18fbe0fc2798b214229b747e5dd/raw/Debug-NativeInOutput.ps1 | iex
Below are example commands that use a python command to print an accented character.
Like Node.js, Python's behavior is nonstandard, although it uses not UTF-8 but the system's active ANSI(!) code page (rather than the expected OEM code page).
That is, even if you switch your PowerShell consoles to UTF-8, communication with Python scripts won't work properly by default unless extra effort is made, which Invoke-WithEncoding can encapsulate for you:
Note: I'm using Python as an example here, to illustrate how the functions work. It is possible to make Python use UTF-8, namely by either setting environment variable PYTHONUTF8 to 1 or - in v3.7+ - by passing parameter -X utf8 (case-exactly).
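For instance, either of the following makes the Python call emit UTF-8 (both per the note above, for Python 3.7+):
$env:PYTHONUTF8 = '1'               # opt in via environment variable...
python -c "print('eé')"
Remove-Item Env:\PYTHONUTF8         # ...and undo it again
python -X utf8 -c "print('eé')"     # or opt in per invocation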
Invoke-WithEncoding example:
# Outputs *already-decoded* output, so if the output *prints* fine,
# then *decoding* worked fine too.
PS> Invoke-WithEncoding { python -c "print('eé')" } -Encoding Ansi -WindowsOnly
eé
Note that Invoke-WithEncoding ensures that actual decoding to a .NET string happens before it outputs, so that encoding problems aren't accidentally masked by the direct-to-display output seemingly being correct on Windows (see below for more).
-WindowsOnly is for cross-platform compatibility and ensures that the encoding is only applied on Windows in this case (on Unix, Python uses UTF-8).
Debug-NativeInOutput example:
With the PowerShell console at its default, using the system's OEM code page, you'll see the following with the same Python command when calling from PowerShell (Core) 7.1:
PS> Debug-NativeInOutput { python -c "print('eé')" }
Note the DecodedOutput property, showing the mis-decoded result based on interpreting Python's output as OEM- rather than as ANSI-encoded: 'eΘ'. (The Input* properties are blank, because the command did not involve piping data to the Python script.)
By contrast, with direct-to-display printing the output prints fine (because Python then - and only then - uses Unicode), which hides the problem; but as soon as you want to programmatically process the output - capture it in a variable, send it to another command in the pipeline, redirect it to a file - the encoding problem will surface.
Like Invoke-WithEncoding, Debug-NativeInOutput supports an -Encoding parameter, so if you pass -Encoding Ansi to the call above, you'll see that Python's output is decoded properly.
The output reflects the fact that, in PowerShell (Core), $OutputEncoding defaults to UTF-8, whereas in Windows PowerShell it defaults to ASCII(!). This mismatch with the actual encoding in effect in the console window is problematic, and this comment on GitHub issue #14945 proposes a way to resolve this (for PowerShell (Core) only) in the future.
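You can verify that default directly in each shell; e.g.:
$OutputEncoding.EncodingName    # Windows PowerShell: US-ASCII; PowerShell 7+: Unicode (UTF-8)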

PowerShell Get-ItemProperty returns data differently depending upon 32- or 64-bit OS

My first question on this forum so please forgive any mistakes etc.
I'm writing a PowerShell script that needs to run on both 32- and 64-bit OSs. That in itself is not a problem, as I can easily identify the two architectures. The problem arises when I issue a Get-ItemProperty command against the registry. On a 32-bit OS I get four lines of unwanted data before the data I actually want, i.e. PSPath, PSParentPath, PSChildName and PSProvider. The same command issued on a 64-bit OS places those same pieces of data after my data. Having written some script with Select-Object -Last 1 to grab the data from the end of the last line, which works perfectly on 32-bit machines, I then found that everything was reversed on 64-bit machines and the script no longer worked. I've tried Select-Object -First 1, but this only returns the first part of my data line; if I change the value to 2 I get everything. So, is there a way of either collecting the whole of the first line or stopping Get-ItemProperty from returning all the unwanted lines?
I hope all of that makes sense?
Thanks in advance
MrMackyD
Maybe I'm overlooking something here, but shouldn't
Get-ItemProperty <item> | Select-Object PSPath,PSParentPath,PSChildName,PSProvider
work perfectly fine in both cases?
What is the data you actually want? Can you simply select that directly? For example, (gp Registry::HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion).DevicePath reads the DevicePath entry directly, ignoring the PS* properties added by PowerShell.
Your issue is not so much one of bitness as of needing to ignore the PowerShell-specific properties that Get-ItemProperty adds.
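For completeness, a small sketch of reading a single value without the PS* noise (DevicePath is just the example entry from above; substitute your own):
(Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion').DevicePath
Get-ItemPropertyValue -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion' -Name DevicePath   # PowerShell 5.0+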

/SUBSYSTEM:Windows program will not write to command line

I have a mixed mode C++-CLI program in Visual Studio 2005 that is set to use the /SUBSYSTEM:Windows. Generally speaking it is a graphical application that is launched from its shortcut or through the filetype registered to it.
However, there is the rare occasion where a user will want to run it from the command line with arguments. I can access the arguments just fine; the problem is that when the program is launched from the command line, I don't see Console::WriteLine having any effect.
What am I doing wrong?
This one's annoying, I agree. You're not doing anything wrong, it's a quirk of the way Windows is set up.
It is possible to solve this, at least in some cases; see http://blogs.msdn.com/junfeng/archive/2004/02/06/68531.aspx. I've not come across anybody else who's actually used these methods, though.
Most people IME just create two versions of the executable with different names, one for batch users ("myapp.exe") and one for when it's run from the start menu ("myappw.exe").
For more information, some of the suggestions at How to output to the console in C++/Windows may be useful.
It's an old problem - see http://www.codeproject.com/KB/cpp/EditBin.aspx for solutions
You can also allocate a console and reopen the standard streams against it:
#include <windows.h>
#include <fstream>
#include <iostream>
using namespace std;

int WINAPI WinMain(HINSTANCE hInst, HINSTANCE /*hPrevInst*/, LPSTR cmd_line, int showmode)
{
    AllocConsole();             // create a console for this GUI-subsystem process
    ifstream conin("con");      // not sure if this should be "con:" ?
    ofstream conout("con");
    cout.rdbuf(conout.rdbuf()); // point the standard streams at the console
    cerr.rdbuf(conout.rdbuf());
    cin.rdbuf(conin.rdbuf());

    cout << "console output now works" << endl;

    FreeConsole();              // release the console when done with it
    return 0;
}
Edit: sorry, this is pure C++; I don't know about C++/CLI.

Python 3, is using sys.stdout.buffer.write() good style?

After learning about reading Unicode files in a Python 3.0 web script, it's time for me to learn to use print() with Unicode.
I searched for how to write Unicode; for example, this question explains that you can't write Unicode characters to a non-Unicode console. In my case, however, the output is given to Apache, and I am sure it is capable of handling Unicode text. For some reason, though, the stdout of my web script is in ASCII.
Obviously, if I was opening a file to write myself, I would do something like
open(filename, 'w', encoding='utf8')
but since I'm given an open stream, I resorted to using
sys.stdout.buffer.write(mytext.encode('utf-8'))
and everything seems to work. Does this violate some rule of good behavior or has any unintended consequences?
I don't think you're breaking any rule, but
sys.stdout = codecs.EncodedFile(sys.stdout, 'utf8')
looks like it might be handier / less clunky.
Edit: per comments, this isn't quite right - @Miles gave the right variant (thanks!):
sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
Edit: if you can arrange for environment variable PYTHONIOENCODING to be set to utf8 when Apache starts your script, that would be even better, making sys.stdout be set to utf8 automatically; but if that's unfeasible or impractical the codecs solution stands.
This is an old answer but I'll add my version here since I first ventured here before finding my solution.
One of the issues with codecs.getwriter is that, if you are running a script, the output will be buffered (whereas normally Python's stdout prints after every line).
sys.stdout in the console is an io.TextIOWrapper, so my solution uses that. This also allows you to set line_buffering=True or False.
For example, to set stdout to, instead of erroring, backslash encode all output:
import io, sys

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding,
                              errors="backslashreplace", line_buffering=True)
To force a specific encoding (in this case utf8):
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding="utf8",
                              line_buffering=True)
A note: calling sys.stdout.detach() leaves the old wrapper unusable (the underlying buffer is handed over to the new wrapper, not closed). Some modules use sys.__stdout__, which initially refers to the same stream as sys.stdout, so you may want to set that as well:
sys.stdout = sys.__stdout__ = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding,
                                               errors="backslashreplace", line_buffering=True)
sys.stderr = sys.__stderr__ = io.TextIOWrapper(sys.stderr.detach(), encoding=sys.stderr.encoding,
                                               errors="backslashreplace", line_buffering=True)