PowerShell: convert stdin to UTF-8

I'm trying to automate some Terraform work on Windows. I want to export my Terraform state as JSON then use the Windows version of jq.exe to pull out relevant bits of information.
Ideally, my command line would look like:
terraform show -json | jq '<my-jq-query>'
Unfortunately, by default Windows appears to use UTF-16 LE (so that's what the Terraform JSON output is encoded with) and jq.exe only supports UTF-8.
I found that the PowerShell command Set-Content has an -Encoding parameter that can be used to specify output encoding, but I can't figure out how to get Set-Content to read from stdin instead of from a file. I mean, I'd like to do:
terraform show -json | Set-Content -Encoding utf8 | jq '<my-jq-query>'
but I can't figure out how to get it to work.
How can I coax PowerShell into letting me convert character encodings in the pipeline, without reading from or writing to a file?
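One approach that avoids a temporary file entirely (a sketch, not tested against this exact toolchain): PowerShell decodes a native command's output using [Console]::OutputEncoding and re-encodes strings piped into a native command using $OutputEncoding, so setting both before the pipeline should do the conversion in flight. This assumes, as described above, that the Terraform output really arrives as UTF-16 LE:
# Decode terraform's stdout as UTF-16 LE ("Unicode")...
[Console]::OutputEncoding = [System.Text.Encoding]::Unicode
# ...and re-encode anything piped into a native command as UTF-8 for jq.
$OutputEncoding = [System.Text.Encoding]::UTF8
terraform show -json | jq '<my-jq-query>'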

Related

Using gh api in PowerShell to get a file with a UTF-8 BOM adds a corrupted version of the BOM to the beginning of the file

If I execute the following command from PowerShell Core 7.2.6 on Windows 11 to grab an XML file with a UTF-8 BOM:
gh api -H "Accept: application/vnd.github.raw+text" repos/zippy1981/CodeFirstBetter/contents/Zippysoft.CodeFirst.AD.Importer/Zippysoft_CodeFirst_AD_Importer.csproj | Out-File Zippysoft_CodeFirst_AD_Importer.csproj
I get a file that Visual Studio Code thinks has no BOM, but which does contain four characters that seem to be a corrupted version of the BOM.
dotnet cannot parse the file until I delete them. Changing the end of the pipeline to Out-File Zippysoft_CodeFirst_AD_Importer.csproj -Encoding utf8BOM does not fix the problem, but it does prepend an ACTUAL BOM to it. How do I get rid of those characters?
The best I can do is two long lines of PowerShell to pull the file's metadata and contents in Base64 and then decode it.
$contents = gh api -H "Accept: application/vnd.github.text" repos/zippy1981/CodeFirstBetter/contents/Zippysoft.CodeFirst.AD.Importer/Zippysoft_CodeFirst_AD_Importer.csproj | ConvertFrom-Json
[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($contents.content)) | Out-File -FilePath $contents.name
This seems like a lot of work when the gh api command could simply have a flag to write standard output directly to a file and skip any weirdness of the PowerShell (or other shell) pipeline.
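For what it's worth, a possible fix without the Base64 round-trip (a sketch; exact behavior depends on the PowerShell version): the junk characters look like the UTF-8 BOM bytes (EF BB BF) decoded with the console's legacy code page, so telling PowerShell to decode gh's byte stream as UTF-8 may be enough:
# Decode gh's output as UTF-8 so the BOM arrives as a single U+FEFF character
# instead of being mangled by the legacy console code page.
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
# PowerShell 7 writes UTF-8 without BOM by default; the decoded U+FEFF at the
# start of the text re-serializes as the file's BOM, so no -Encoding is needed.
gh api -H "Accept: application/vnd.github.raw+text" repos/zippy1981/CodeFirstBetter/contents/Zippysoft.CodeFirst.AD.Importer/Zippysoft_CodeFirst_AD_Importer.csproj | Out-File Zippysoft_CodeFirst_AD_Importer.csproj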

PowerShell: input CSV in UTF-7, output in UTF-8, problem with some characters

I have a client who requires a database export from a SQL Server 2016 database in UTF-8 without a BOM. I've used PowerShell to import the raw output from the database (which is in ANSI) and write the file out in UTF-8.
Now I'm hearing back asking whether I could remove some 'special characters', and I saw that PowerShell has changed one of them, as shown in the picture.
Is there any way PowerShell could keep the character or remove it entirely?
This might also happen with other characters in the future, our sample dataset only contains this particular character.
EDIT: The Customer has a batch script which exports a select request from a MSSQL Server to a CSV File. Script as follows:
sqlcmd -S [SERVER]\[INSTANCE] -U sa -P [PASSWORD] -d [DATABASE] -I -i "C:\Path\To\Query.sql" -o "C:\Path\For\Output\Ouput.csv" -W -s"|"
The CSV is separated by a pipe character.
The request was then to add double quotes as a text qualifier and to change the encoding to UTF-8 without a BOM. The database apparently exports the file in ANSI.
I've created a PowerShell script, since Export-Csv will add the double quotes automatically and I should be able to change the encoding with it.
Script goes as follows:
$file = Import-Csv -Path "C:\Path\For\Output\Ouput.csv" -Encoding "UTF7" -Delimiter "|"
$file | Export-Csv -path "C:\Path\For\Output\Ouput.csv" -delimiter "|" -Encoding "UTF8noBOM" -NoTypeInformation
The reason for the -Encoding UTF7 flag in the input step was that without it, we had problems with special letters like ß and äöü (we're in Germany, those will be frequent).
After running the file through this script, it's mostly as it should be; however, the character in the screenshot is a problem for the people trying to import the file into their system afterwards.
Does this help? I'll gladly provide any further information. Thank you in advance!
EDIT: Found a solution. I've edited the customer's original script that creates the export from the database, adding the -u flag so that the output is Unicode (UTF-16). It's not UTF-8 yet, but the PowerShell script can now convert the file properly, and there is no longer any need to force the import encoding to UTF-7. Thanks to JosefZ for questioning my use of forced UTF-7 encoding, which made me realise I was looking in the wrong place.
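Putting the pieces together, the fixed pipeline looks roughly like this (a sketch; placeholders kept from the original scripts):
# Customer's export, with -u added so sqlcmd writes Unicode (UTF-16) output:
sqlcmd -S [SERVER]\[INSTANCE] -U sa -P [PASSWORD] -d [DATABASE] -I -u -i "C:\Path\To\Query.sql" -o "C:\Path\For\Output\Ouput.csv" -W -s"|"
# Read the UTF-16 export and re-write it as UTF-8 without a BOM;
# Export-Csv adds the double quotes automatically:
$file = Import-Csv -Path "C:\Path\For\Output\Ouput.csv" -Encoding Unicode -Delimiter "|"
$file | Export-Csv -Path "C:\Path\For\Output\Ouput.csv" -Delimiter "|" -Encoding UTF8NoBOM -NoTypeInformation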

Encoding lost on save [duplicate]

I'm trying to capture the standard output of npm, which I run from PowerShell. npm downloads any missing packages and prints the corresponding dependency tree.
Run directly in the console, the tree renders correctly (shown as a screenshot in the original post).
When I try to do the same from PowerShell and capture the result, I don't get the same characters (again shown as a screenshot). It's the same whether I use
npm install | Tee-Object -FilePath .. or
npm install | Out-File -FilePath .. -Encoding Unicode # or Utf8
$a = npm install
When I redirect the output to a file in cmd.exe, the content looks like this (a third screenshot in the original post).
How can I capture the output correctly in PowerShell?
PowerShell is an object-based shell with an object-based pipeline, but for native applications the pipeline is byte-stream-based. So PowerShell has to convert from/to a byte stream when it passes data from/to a native application. This conversion happens even when you pipe data from one native application to another or redirect a native application's output to a file.
When PowerShell receives data from a native application, it decodes the byte stream as a string and splits that string at newline characters. To decode byte streams into strings, PowerShell uses the console's output encoding, [Console]::OutputEncoding. If you know that your application uses a different output encoding, you can explicitly change the console's output encoding to match your application's:
[Console]::OutputEncoding=[Text.Encoding]::UTF8
When PowerShell passes data to a native application, it converts objects to strings using the encoding specified in the $OutputEncoding preference variable.
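Applied to the npm case above, a minimal sketch (assuming npm emits UTF-8, which is typical):
# Decode bytes coming FROM the native app:
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
# Encode strings piped TO native apps:
$OutputEncoding = [System.Text.Encoding]::UTF8
$tree = npm install                        # the tree's box-drawing characters now survive
$tree | Out-File npm.log -Encoding utf8    # and round-trip to a file intact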

What is the difference between the > operator and the Set-Content cmdlet

I figured out that these 2 lines:
echo "hello world" > hi.txt
echo "hello world" | Set-Content hi.txt
aren't doing exactly the same job. I created a simple script that replaces some values in a configuration file and stores it (using >), but that seems to save the file in some odd format. Standard Windows text editors display the file normally, but the IDE that is supposed to load this file (it's a project configuration file) is unable to read it (I think the file is written with some extra encoding).
However when I replaced it with Set-Content it works fine.
What is the default behaviour of these commands, and what does Set-Content do differently so that it works?
The difference is in the default encoding each one uses. From the MSDN documentation we can see that, in Windows PowerShell, Set-Content defaults to ASCII encoding, which is readable by most programs (but may lose characters if you're not writing English). The > redirection operator, on the other hand, writes PowerShell's internal string representation, .NET's System.String, which is UTF-16.
As a side note, you can also use Out-File, which defaults to Unicode (UTF-16 LE) encoding.
In Windows PowerShell, the default encoding of Set-Content is ASCII. This can be confirmed with the following:
Get-Help -Name Set-Content -Parameter Encoding;
The default encoding of the PowerShell redirection operator > is Unicode (UTF-16 LE). This can be confirmed by looking at the about_Redirection help topic in PowerShell:
http://technet.microsoft.com/en-us/library/hh847746.aspx
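A quick way to see the difference for yourself (Format-Hex ships with PowerShell 5.0 and later):
"hello world" > hi-redirect.txt
"hello world" | Set-Content hi-setcontent.txt
Format-Hex hi-redirect.txt     # begins FF FE, two bytes per character: UTF-16 LE with a BOM
Format-Hex hi-setcontent.txt   # one byte per character, no BOM: ASCII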