UTF8 encoding without BOM - PowerShell

I have a bat file where I encode some CSV files. The problem is that there is an extra character at the beginning of the file once the encoding has been done (the BOM, I guess). This character bothers me because, after encoding, I use this file to generate a database.
Here is the line for encoding (inside bat file):
powershell -Command "&{ param($Path); (Get-Content $Path) | Out-File $Path -Encoding UTF8 }" CSVs\\pass.csv
Is there any way to encode the file without a BOM (if this is the problem)?
Thanks!

I found the solution.
Just replace the line with this:
powershell -Command "&{ param($Path); $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False); $MyFile = Get-Content $Path; [System.IO.File]::WriteAllLines($Path, $MyFile, $Utf8NoBomEncoding) }" CSVs\\pass.csv
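The effect of passing $False to the UTF8Encoding constructor can be checked by looking at the first three bytes of the written file. The following is a minimal sketch, not part of the original answer; the temporary file name and the sample CSV rows are made up for the demonstration.

```powershell
# Hypothetical temp file used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bom_demo.csv'

# $false asks the encoder not to emit the EF BB BF preamble (the BOM).
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
[System.IO.File]::WriteAllLines($path, [string[]]@('id,name', '1,alice'), $utf8NoBom)

# Inspect the first three bytes of the result.
$bytes = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"Has BOM: $hasBom"
```

One thing to keep in mind: the [System.IO.File] methods resolve relative paths against the process working directory, which can differ from the current PowerShell location, so a full path is the safer choice inside longer scripts.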

Related

How to convert UTF16LE CSV file to UTF8 without losing Commas

We receive Cognos reports that are encoded as UTF16LE. I am trying to create a PowerShell script to convert the UTF16LE files to UTF8. My logic so far loops through the directory (whichever directory I place the script in, because hardcoding directory names that contain dates or numbers caused errors) and saves the files as UTF-8; however, the delimiters seem to be removed.
I believe this may be due to the way I am reading the data, as I am not specifying UTF16LE; however, I am unsure how to do that. My script so far is:
$files = Get-ChildItem
$dt = Get-Date -Format yyyyMMdd
$extension = "_" + "$dt" + "_utf8.csv"
ForEach ($file in $files) {
    $file_name = $file.basename
    $new_file = "$file_name" + "$extension"
    echo $new_file
    #Get-Content $file | Set-Content -Encoding UTF8 $new_file
}
Read-Host -Prompt "Press Enter to Close Window"
Any and all insight into this issue would be greatly appreciated.
PowerShell's Import-CSV and Export-CSV cmdlets support the -Encoding parameter, so you could replace your line
Get-Content $file | Set-Content -Encoding UTF8 $new_file
with
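Import-CSV -Path $File -Encoding Unicode | Export-CSV -Path $New_File -Encoding UTF8 -NoTypeInformation
(UTF16LE encoding is what PowerShell calls "Unicode"; UTF16BE is "BigEndianUnicode". In PowerShell 6 and later the default is UTF8NoBOM, UTF-8 without a byte order mark. In Windows PowerShell 5.1, -NoTypeInformation keeps Export-CSV from writing a #TYPE header line.)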
Since all you want to do is convert the character encoding, reading and writing as a string would be the most straightforward. As always, read a text file with the character encoding it was written with:
Get-Content -Encoding Unicode $file | Set-Content -Encoding UTF8 $new_file
Calling UTF-16 "Unicode" harks back to the infancy of the Unicode character set, when UCS-2 was going to be "it" for many environments. Then the explosion happened and UTF-16 was born from UCS-2. Systems invented since then quite reasonably use UTF16 or similar when they mean UTF-16; using "Unicode" for UTF-16 is esoteric and imponderable.
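Putting the Get-Content / Set-Content suggestion back into the asker's loop might look like the sketch below. It is an illustration under assumptions: the demo directory, the sample report.csv, and the check that skips earlier outputs are all invented here so the snippet can run on its own. In Windows PowerShell, -Encoding UTF8 writes a BOM; in PowerShell 6 and later it does not.

```powershell
# Hypothetical demo directory and sample file (not from the original question).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'utf16_demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$src = Join-Path $dir 'report.csv'
# Create a small UTF-16LE file to stand in for a Cognos report.
[System.IO.File]::WriteAllText($src, "a,b,c`r`n1,2,3`r`n", [System.Text.Encoding]::Unicode)

$dt = Get-Date -Format yyyyMMdd
# Collect the file list first so files created in the loop are not picked up.
$files = @(Get-ChildItem -Path $dir -Filter *.csv -File)
foreach ($file in $files) {
    # Skip files produced by an earlier run of this sketch.
    if ($file.Name -like '*_utf8.csv') { continue }
    $newFile = Join-Path $dir ('{0}_{1}_utf8.csv' -f $file.BaseName, $dt)
    # Read with the encoding the file was written in, write as UTF-8.
    Get-Content -LiteralPath $file.FullName -Encoding Unicode |
        Set-Content -LiteralPath $newFile -Encoding UTF8
}

# The commas survive because the text is decoded correctly on the way in.
Get-Content -LiteralPath $newFile
```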

Why does appending to a file insert whitespace (NUL)

I am running this command
Get-Content generated\no_animate.css >> generated\all.css
I want to append the contents of no_animate.css to all.css.
This is working if I run it like this from a cmd prompt:
powershell Get-Content generated\no_animate.css >> generated\all.css
If I put the exact same code into a .ps1 file and run that, it copies the contents but inserts whitespace (shown as NUL in a text editor) between every character.
Why would it be doing this? How do I prevent it?
In PowerShell the redirection operators > and >> are shorthands for Out-File and Out-File -Append respectively. In Windows PowerShell, Out-File uses Unicode (little-endian UTF-16 specifically) as its default encoding. With this encoding every character is represented by 2 bytes instead of just 1. For ASCII characters (characters from the Basic Latin block) the second byte has the value 0, which is the NUL your editor shows.
Running powershell Get-Content generated\no_animate.css >> generated\all.css from CMD uses the CMD redirection operator instead of the PowerShell one, which doesn't transform the text to Unicode.
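The byte layout described above can be seen directly by encoding a short string both ways. This is only an illustrative sketch; the sample string 'a{' and the variable names are arbitrary.

```powershell
# Encode the same two characters as UTF-16LE ("Unicode") and as ASCII.
$utf16 = [System.Text.Encoding]::Unicode.GetBytes('a{')
$ascii = [System.Text.Encoding]::ASCII.GetBytes('a{')

# Render each byte array as hex for comparison.
$utf16Hex = ($utf16 | ForEach-Object { $_.ToString('X2') }) -join ' '
$asciiHex = ($ascii | ForEach-Object { $_.ToString('X2') }) -join ' '

"UTF-16LE: $utf16Hex"   # UTF-16LE: 61 00 7B 00
"ASCII:    $asciiHex"   # ASCII:    61 7B
```

Each ASCII character is followed by a 00 byte in the UTF-16LE form, which is why appending such output to an ASCII or UTF-8 file looks like text with NUL between the characters.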
If you want to use PowerShell and your input file is ascii-encoded use Add-Content (available in PowerShell v3 or newer):
Get-Content generated\no_animate.css | Add-Content generated\all.css
or Out-File with explicit encoding.
Get-Content generated\no_animate.css |
Out-File generated\all.css -Append -Encoding Ascii

Powershell logging from Invoke-Expression with encoding

I have a specific scenario where I have to log the output of a batch file using Invoke-Expression in PowerShell, but my logs are being saved with "UCS-2 Little Endian" encoding and I would like to save them as UTF-8 or any other encoding.
This is a simple example of what I'm trying to do:
batch file (test.bat):
echo Test
PowerShell file (test.ps1):
Invoke-Expression "c:\test.bat > log.txt"
Is there a way I could change the encoding on log.txt?
You can try this:
C:\test.bat | Out-File C:\log.txt -Encoding UTF8
Or if for whatever reason you really have to use Invoke-Expression:
Invoke-Expression "C:\test.bat" | Out-File C:\log.txt -Encoding UTF8
Note that this will overwrite log.txt every time. If you want to append to the file, do this:
Invoke-Expression "C:\test.bat" | Out-File C:\log.txt -Encoding UTF8 -append
or
Invoke-Expression "C:\test.bat" | Add-Content C:\log.txt -Encoding UTF8

Encode file with cmd

I have a bat file that performs some actions, and I need to encode a text file as UTF-8.
Is there any way to do this from the Windows command line?
Thanks in advance.
Only with other programs, which may or may not be installed. If you're targeting Windows 7 and higher, you could just use PowerShell:
powershell -Command "&{ param($Path); (Get-Content $Path) | Out-File $Path -Encoding UTF8 }" somefile.txt

Iconv is converting to UTF-16 instead of UTF-8 when invoked from powershell

I have a problem while trying to batch convert the encoding of some files from ISO-8859-1 to UTF-8 using iconv in a PowerShell script.
I have this bat file, which works fine:
for %%f in (*.txt) do (
    echo %%f
    C:\"Program Files"\GnuWin32\bin\iconv.exe -f iso-8859-1 -t utf-8 %%f > %%f.UTF_8_MSDOS
)
I need to convert all files in the directory structure, so I wrote this other script, this time using PowerShell:
Get-ChildItem -Recurse -Include *.java |
ForEach-Object {
    $inFileName = $_.DirectoryName + '\' + $_.name
    $outFileName = $inFileName + "_UTF_8"
    Write-Host "Converting $inFileName -> $outFileName"
    C:\"Program Files"\GnuWin32\bin\iconv.exe -f iso-8859-1 -t utf-8 $inFileName > $outFileName
}
With this script, the files end up converted to UTF-16. I have no clue what I am doing wrong.
Could anyone help me with this? Could it be some kind of problem with the encoding of PowerShell itself?
I am using Windows 7 and Windows XP, with LibIconv 1.9.2.
The > operator is essentially using the Out-File cmdlet, whose default encoding is Unicode. Try:
iconv.exe ... | Out-File -Encoding Utf8
or with params:
& "C:\Program Files\GnuWin32\bin\iconv.exe" -f iso-8859-1 -t utf-8 $inFileName |
Out-File -Encoding Utf8 $outFileName
And since iconv.exe outputs UTF-8, you have to tell the .NET console subsystem how to interpret the stream it reads from iconv.exe, like so (execute this before iconv.exe):
[Console]::OutputEncoding = [Text.Encoding]::UTF8
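As an alternative to piping iconv.exe output through PowerShell, the same ISO-8859-1 to UTF-8 conversion can be done with .NET classes alone, which sidesteps the console decoding step entirely. This is a different technique from the one in the answer above, shown only as a sketch; the demo directory, the sample file, and its content are assumptions made up for the example.

```powershell
# Hypothetical demo directory and sample file (not from the original question).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'latin1_demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$inFileName = Join-Path $dir 'Sample.java'

# Write "// a<n-tilde>o" as ISO-8859-1; U+00F1 is the single byte F1 there.
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
[System.IO.File]::WriteAllText($inFileName, "// a$([char]0xF1)o", $latin1)

# Decode as ISO-8859-1, then re-encode as UTF-8 without a BOM.
$outFileName = $inFileName + '_UTF_8'
$text = [System.IO.File]::ReadAllText($inFileName, $latin1)
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
[System.IO.File]::WriteAllText($outFileName, $text, $utf8NoBom)

# In UTF-8 the same character becomes the two bytes C3 B1.
$hex = ([System.IO.File]::ReadAllBytes($outFileName) |
        ForEach-Object { $_.ToString('X2') }) -join ' '
$hex   # 2F 2F 20 61 C3 B1 6F
```

To apply this to a whole tree, the read and write calls could replace the iconv.exe line inside the ForEach-Object block, using $_.FullName as the input path.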