Storing standard output from a native app with UTF-8 characters - PowerShell

I'm trying to capture standard output from npm that I run from PowerShell. Npm downloads the packages that are missing and outputs the appropriate dependency tree.
What it looks like when run directly:
[screenshot of npm's correctly rendered dependency tree]
That's the correct output.
When I try to do the same from PowerShell and capture the result, I'm not able to get the same characters:
It's the same when I use
npm install | Tee-Object -FilePath
or
npm install | Out-File -FilePath .. -Encoding Unicode # or Utf8
or
$a = npm install
When I redirect the output to a file in cmd.exe, the content looks like this:
How can I capture the output correctly in PowerShell?

PowerShell is an object-based shell with an object-based pipeline, but for native applications the pipeline is byte-stream-based. So PowerShell has to convert from/to a byte stream when it passes data from/to a native application. This conversion happens even when you pipe data from one native application to another or redirect a native application's output to a file.
When PowerShell receives data from a native application, it decodes the byte stream as a string, and splits that string by the newline character. For decoding byte streams to strings PowerShell uses the console's output encoding: [Console]::OutputEncoding. If you know that your application uses a different output encoding, you can explicitly change the console's output encoding to match your application's:
[Console]::OutputEncoding=[Text.Encoding]::UTF8
When PowerShell passes data to a native application, it converts objects to strings using the encoding specified in the $OutputEncoding preference variable.
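For example, a minimal sketch of capturing npm's output correctly, assuming npm emits UTF-8 (the file name npm.log is arbitrary):
# Tell PowerShell to decode the native app's byte stream as UTF-8
$prev = [Console]::OutputEncoding
[Console]::OutputEncoding = [Text.Encoding]::UTF8
try {
    $a = npm install                      # tree characters now survive the capture
    $a | Out-File npm.log -Encoding utf8  # and round-trip to disk intact
}
finally {
    [Console]::OutputEncoding = $prev     # restore the console's original encoding
}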

Related

PowerShell: convert stdin to UTF-8

I'm trying to automate some Terraform work on Windows. I want to export my Terraform state as JSON then use the Windows version of jq.exe to pull out relevant bits of information.
Ideally, my command line would look like:
terraform show -json | jq '<my-jq-query>'
Unfortunately, by default Windows appears to use UTF-16 LE (so that's what the Terraform JSON output is encoded with) and jq.exe only supports UTF-8.
I found that the PowerShell command Set-Content has an -Encoding parameter that can be used to specify output encoding, but I can't figure out how to get Set-Content to read from stdin instead of from a file. I mean, I'd like to do:
terraform show -json | Set-Content -Encoding utf8 | jq '<my-jq-query>'
but I can't figure out how to get it to work.
How can I coax PowerShell into allowing me to convert character encoding in the pipeline without reading/writing to a file?
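Building on the answer to the first question above: the UTF-16 likely comes from PowerShell's own re-encoding rather than from terraform itself, so a sketch that sets both encodings to UTF-8 (assuming terraform emits UTF-8) lets the pipeline flow with no intermediate file:
# Decode terraform's output as UTF-8, and re-encode it as UTF-8 for jq
$OutputEncoding = [Console]::OutputEncoding = [Text.Encoding]::UTF8
terraform show -json | jq '<my-jq-query>'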

Running a self-decrypting base64 PowerShell script locally with powershell -file /path/to/ps1

I want to keep my PowerShell scripts on my local server in base64, but when run from schtasks or locally using powershell -file /path/to/ps1 they should decode themselves. Is this possible?
I tried:
function Decode {
    $data = 'base 64 script'
    [System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String($data))
}
Decode
This does not work. Any ideas?
I see at least two options for this situation. One option is to send the base64 encoded command to Powershell.exe using the -EncodedCommand parameter. The second option is to create a decoding script that accepts, as a parameter value, the path of a file containing the base64 encoded command.
Option 1: Passing the Encoded Command
This assumes your base64 encoded command is a string version of your PowerShell commands formatted using UTF-16LE character encoding (Unicode). Let's also assume that you have a script called Encoded.ps1 that contains your base64 encoded command. With the prerequisites met, you can do the following:
Powershell.exe -EncodedCommand (Get-Content Encoded.ps1)
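If you don't already have Encoded.ps1, a quick sketch of producing it (the sample command string is arbitrary):
$command = 'Get-Date'
# [Text.Encoding]::Unicode is UTF-16LE, the encoding -EncodedCommand expects
$bytes = [System.Text.Encoding]::Unicode.GetBytes($command)
[System.Convert]::ToBase64String($bytes) | Set-Content Encoded.ps1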
Option 2: Running a Decode Script Against the Encoded Script
The Unicode requirement does not matter in this case (you can use ANSI if you like); you just need to know your original command string's encoding so you can properly decode it. We will assume the ASCII character set, and again that Encoded.ps1 contains your base64 encoded command.
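As with Option 1, a sketch of producing an ASCII-based Encoded.ps1, with an arbitrary sample command:
$command = 'Get-Process | Select-Object -First 3'
[System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($command)) | Set-Content Encoded.ps1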
First, let's create the decode script called Decode.ps1.
# Decode.ps1
param([string]$FilePath)
$64EncodedData = Get-Content $FilePath
$DecodedData = [System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String($64EncodedData))
& ([scriptblock]::Create($DecodedData))
Second, let's run the Powershell.exe command to decode Encoded.ps1 and execute the decoded command.
Powershell.exe -File Decode.ps1 -FilePath Encoded.ps1
The code above is not intended to display the contents of the decoded commands but rather to execute them. $FilePath is the path to your Encoded.ps1 file, which contains a base64 encoded string produced from an ASCII character set; you can change the encoding in Decode.ps1 to whichever applies to your situation. $DecodedData contains the original command string. Finally, a script block is created from $DecodedData and invoked with the call operator &.

How to append binary blob to PowerShell script to be ignored as script text?

A Unix shell (bash and dash tried) interprets a script line by line, which makes it possible to attach binary data to the end of the script. My particular example is a Jar file that can be automatically unzipped or run from that very script.
I wonder if this is somehow possible with PowerShell too. I tried it and got errors which indicate that PowerShell parses the whole file before starting to run it.
Is there a way to mark the rest of a file such that PowerShell does not try to interpret it but just ignores it?
Since my specific use case is that I want to make a Jar file executable, solutions relying on base64 encoding the binary blob do not work.
To be even more explicit: the script will basically run
java -jar $MyInvocation.MyCommand.Definition @Args
such that java shall use the file as a jar file.
What makes you think a solution using base64 encoding wouldn't work? Convert the file to a base64 string like this:
$bytes = [IO.File]::ReadAllBytes('C:\path\to\your.jar')
[Convert]::ToBase64String($bytes)
and put the string into your script as a variable:
$jarData = 'UEsDBBQAAAAA...'
If you prefer a multiline base64 string you can wrap it like this:
[Convert]::ToBase64String($bytes) -replace '(.{80})', "`$1`n"
and put it into the script like this:
$jarData = @'
UEsDBBQAAAAA...
...
'@
Have your script decode the data and save it back to a file upon execution:
$bytes = [Convert]::FromBase64String($jarData)
[IO.File]::WriteAllBytes("$env:TEMP\your.jar", $bytes)
To my knowledge this is the only way to embed binary data in PowerShell scripts.
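Putting the pieces together, the tail of such a script might decode, save, and launch the jar, forwarding any script arguments (paths and names here are illustrative):
$bytes = [Convert]::FromBase64String($jarData)
$jarPath = Join-Path $env:TEMP 'your.jar'
[IO.File]::WriteAllBytes($jarPath, $bytes)
java -jar $jarPath @Args   # run the extracted jar, passing the script's arguments through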

What is the difference between the > operator and the Set-Content cmdlet

I figured out that these 2 lines:
echo "hello world" > hi.txt
echo "hello world" | Set-Content hi.txt
aren't doing exactly the same job. I created a simple script that replaces some values in a configuration file and stores it (using >), but that seems to store the file in some weird format. Standard Windows text editors show the file as normal, but the IDE which is supposed to load this file (it's a project configuration file) is unable to read it (I think it expects some particular encoding or whatever).
However when I replaced it with Set-Content it works fine.
What is the default behaviour of these commands, and what does Set-Content do differently that makes it work?
The difference is in the default encoding each uses. From MSDN, we can see that Set-Content defaults to ASCII encoding, which is readable by most programs (but may not work if you're not writing English). The > output redirection operator, on the other hand, works with PowerShell's internal string representation, which is .NET's System.String, i.e. UTF-16.
As a side note, you can also use Out-File, which defaults to Unicode (UTF-16) encoding.
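A quick way to see the difference is to compare the first bytes each command writes (behaviour shown is Windows PowerShell 5.1; in PowerShell 7+ both default to BOM-less UTF-8, so the gap disappears):
echo "hello world" > hi1.txt
echo "hello world" | Set-Content hi2.txt
Get-Content hi1.txt -Encoding Byte -TotalCount 4  # 255 254 104 0 -> UTF-16 LE byte-order mark, then 'h'
Get-Content hi2.txt -Encoding Byte -TotalCount 4  # 104 101 108 108 -> single-byte 'hell', no BOM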
The default encoding of Set-Content is ASCII. This can be confirmed with the following:
Get-Help -Name Set-Content -Parameter Encoding;
The default encoding of the PowerShell redirection operator > is Unicode. This can be confirmed by looking at the about_Redirection help topic in PowerShell.
http://technet.microsoft.com/en-us/library/hh847746.aspx