Override PowerShell > shortcut

In PowerShell, using > is the same as using | Out-File, so I can write
"something" > file.txt and it will write 'something' into file.txt. This is what I expect of a shell. Unfortunately, PowerShell uses Unicode (UTF-16 LE) when writing file.txt. The only way to change it to UTF-8 is to write the rather long command:
"something" | Out-File file.txt -Encoding UTF8
I want to override the > shortcut so that it applies UTF-8 encoding by default. Is there a way to do that?
NOT A DUPLICATE CLARIFICATION:
This is not a duplicate. As is explained clearly here, Out-File has a hard-coded default. I don't want to change Out-File's behavior; I want to change >'s behavior.

No, can't be done
Even the documentation alludes to this.
From the last paragraph of Get-Help about_Redirection:
When you are writing to files, the redirection operators use Unicode encoding. If the file has a different encoding, the output might not be formatted correctly. To redirect content to non-Unicode files, *use the Out-File cmdlet with its Encoding parameter*.
(emphasis added)

The output encoding can be overridden by changing the $OutputEncoding variable. However, that only affects piping output into native executables; it has no effect on the redirection operators. If you need a specific encoding for file output, you must use Out-File or Set-Content with the -Encoding parameter (or a StreamWriter).
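For illustration, a minimal sketch of the distinction (assumes PowerShell v5+ for the ::new() syntax; findstr.exe merely stands in for any native executable):
# $OutputEncoding only governs how piped data is encoded for native executables:
$OutputEncoding = [System.Text.UTF8Encoding]::new()
"something" | findstr.exe "some"   # findstr.exe now receives UTF-8 input
# It has no effect on > or >>; for files, state the encoding on the cmdlet itself:
"something" | Out-File file.txt -Encoding UTF8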

Related

PowerShell Out-File special characters

I have a script that processes data from files and, based on a condition, writes the result to a .txt file. The data are strings containing words like "Distribución" or "México". After processing, special characters like "é" and "ó" come out broken (the typical white square or question mark).
How can I encode the output file to make it work with those characters? I tried UTF8 and UTF8 without BOM; it doesn't work. Here is the file-writing line:
...| Out-File -Encoding XXX .\result.txt
For XXX I tried ASCII and UTF8; nothing works :/
Out-File will always add a BOM. It's a particularly annoying "feature" of that cmdlet. Unfortunately, to my knowledge, there is no quick way to save a file as UTF-8 WITHOUT a BOM in PowerShell. You can, however, leverage .NET to do this. This isn't really production-ready, but here's a quick example:
$outputPath = "D:\temp.txt"
$data = "Distribución or México"
[System.IO.File]::WriteAllLines($outputPath, $data)
Wrap it in a cmdlet, function and/or module to make it reusable. Of course, you can take more control over the file encoding with .NET too.
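A minimal sketch of such a wrapper, assuming PowerShell v5+ for the ::new() syntax (the function name Write-Utf8NoBom is made up for illustration):
function Write-Utf8NoBom {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string[]] $Content
    )
    # Resolve the path against PowerShell's current location, since .NET's
    # notion of the current directory can differ from PowerShell's.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    # An explicit UTF8Encoding($false) makes the no-BOM intent unmistakable.
    [System.IO.File]::WriteAllLines($fullPath, $Content, [System.Text.UTF8Encoding]::new($false))
}
Write-Utf8NoBom -Path result.txt -Content "Distribución", "México"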

How to expand file content with PowerShell

I want to do this:
$content = get-content "test.html"
$template = get-content "template.html"
$template | out-file "out.html"
where template.html contains
<html>
<head>
</head>
<body>
$content
</body>
</html>
and test.html contains:
<h1>Test Expand</h1>
<div>Hello</div>
I get weird characters in the first 2 characters of out.html:
��
and the content is not expanded.
How can I fix this?
To complement Mathias R. Jessen's helpful answer with a solution that:
- is more efficient.
- ensures that the input files are read as UTF-8, even if they don't have a (pseudo-)BOM (byte-order mark).
- avoids the "weird character" problem altogether by writing a UTF-8-encoded output file without that pseudo-BOM.
# Explicitly read the input files as UTF-8, as a whole.
$content = get-content -raw -encoding utf8 test.html
$template = get-content -raw -encoding utf8 template.html
# Write to output file using UTF-8 encoding *without a BOM*.
[IO.File]::WriteAllText(
  "$PWD/out.html",
  $ExecutionContext.InvokeCommand.ExpandString($template)
)
get-content -raw (PSv3+) reads the files in as a whole, into a single string (instead of an array of strings, line by line), which, while more memory-intensive, is faster. With HTML files, memory usage shouldn't be a concern.
An additional advantage of reading the files in full is that if the template were to contain multi-line subexpressions ($(...)), the expansion would still function correctly.
get-content -encoding utf8 ensures that the input files are interpreted as using character encoding UTF-8, as is typical in the web world nowadays.
This is crucial, given that UTF-8-encoded HTML files normally do not have the 3-byte pseudo-BOM that PowerShell needs in order to correctly identify a file as UTF-8-encoded (see below).
A single $ExecutionContext.InvokeCommand.ExpandString() call is then sufficient to perform the template expansion.
Out-File -Encoding utf8 would invariably create a file with the pseudo-BOM, which is undesired.
Instead, [IO.File]::WriteAllText() is used, taking advantage of the fact that the .NET Framework by default creates UTF-8-encoded files without the BOM.
Note the use of $PWD/ before out.html, which is needed to ensure that the file gets written in PowerShell's current location (directory); unfortunately, what the .NET Framework considers the current directory is not necessarily in sync with PowerShell's.
Finally, the obligatory security warning: use this expansion technique only on input that you trust, given that arbitrary embedded commands may get executed; a template containing, say, $(Remove-Item *) would actually run that command during expansion.
Optional background information
PowerShell's Out-File, > and >> use UTF-16 LE character encoding with a BOM (byte-order mark) by default (the "weird characters", as mentioned).
While Out-File -Encoding utf8 allows creating UTF-8 output files instead,
PowerShell invariably prepends a 3-byte pseudo-BOM to the output file, which some utilities, notably those with Unix heritage, have problems with - so you would still get "weird characters" (albeit different ones).
If you want a more PowerShell-like way of creating BOM-less UTF-8 files,
see this answer of mine, which defines an Out-FileUtf8NoBom function that otherwise emulates the core functionality of Out-File.
Conversely, on reading files, you must use Get-Content -Encoding utf8 to ensure that BOM-less UTF-8 files are recognized as such.
In the absence of the UTF-8 pseudo-BOM, Get-Content assumes that the file uses the single-byte, extended-ASCII encoding specified by the system's legacy codepage (e.g., Windows-1252 on English-language systems, an encoding that PowerShell calls Default).
Note that while Windows-only editors such as Notepad create UTF-8 files with the pseudo-BOM (if you explicitly choose to save as UTF-8; default is the legacy codepage encoding, "ANSI"), increasingly popular cross-platform editors such as Visual Studio Code, Atom, and Sublime Text by default do not use the pseudo-BOM when they create files.
For the "weird characters", they're probably BOMs (Byte-order marks). Specify the output encoding explicitly with the -Encoding parameter when using Out-File, for example:
$Template |Out-File out.html -Encoding UTF8
For the string expansion, you need to explicitly tell powershell to do so:
$Template = $Template |ForEach-Object {
$ExecutionContext.InvokeCommand.ExpandString($_)
}
$Template | Out-File out.html -Encoding UTF8

Cannot write help text to a file in PowerShell

I was trying to write help text to a file with
Set-Content -path "help.txt" -Value $(help -Full "help")
Then I found that the help cmdlet generates an object rather than text.
But simply adding .ToString() at the end does not work either.
So how can I get clean text from the help command and write it to a file using Set-Content?
In order to capture output as it would print on the screen, use either the output redirection operator >, or pipe to the Out-File cmdlet; the latter is required if you want an output character encoding other than the default, UTF-16 LE:
help -full help > help.txt # invariably creates a UTF-16 LE file
help -full help | Out-File help.txt # equivalent, but supports -Encoding <name>
By contrast, Set-Content:
- does not use PowerShell's default output formatting; instead, it applies (at least conceptually) a .ToString() call to each input object, which may or may not give a meaningful representation (see the workaround sketched after this list).
- creates ASCII files by default, but, like Out-File, supports different encodings via the -Encoding parameter.
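If you do want to use Set-Content, one workaround (a sketch, not the only way) is to apply PowerShell's default formatting yourself with Out-String first:
# Out-String renders the help object the way it would print on screen;
# Set-Content then writes that single string (ASCII by default).
help -Full help | Out-String | Set-Content help.txt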

PowerShell generates .bat and puts in special characters

I'm currently working with PowerShell to create a .bat script.
I put text into the .bat script with >>
For example,
Write "start program xxx" >> script.bat
but when I try to execute this script.bat with cmd, it says:
"■s" is not recognized ... etc.
And in PowerShell it says: 'þp' is not recognized ...
So I guess that redirecting with >> puts special characters at the beginning of the line. Does someone have information on this, and on what those "■s" and 'þp' are?
The file redirection operators (>> etc.) will write text encoded in UTF-16. If the file already contains text in a different encoding, everything will be confused (and I'm not sure cmd.exe understands UTF-16 at all).
It's easier to use Out-File with the -Encoding parameter to specify something consistent. Use the -Append switch parameter to append rather than overwrite.
Eg.
"Some text" | Out-File -encoding ASCII -append -FilePath 'script.bat`
(If you find yourself writing the same Out-File call and parameters repeatedly, put it in a helper advanced function that reads pipeline input and encapsulates the Out-File call.)
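Such a helper might look roughly like this (the name Add-BatchLine and its parameter set are made up for illustration):
function Add-BatchLine {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [string] $Line,
        [string] $Path = 'script.bat'
    )
    process {
        # Consistently append ASCII so cmd.exe can read the resulting script.
        $Line | Out-File -Encoding ASCII -Append -FilePath $Path
    }
}
'start program xxx' | Add-BatchLine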

PowerShell: Get default system encoding

The PowerShell cmdlet Out-File has the switch -Encoding, which you can set to Default. The Default value uses the encoding of the system's current ANSI code page.
My question is: how can I get the name of this default encoding that Out-File will use in PowerShell?
Take a look at [System.Text.Encoding]::Default; I believe that is what is used for "Default".
E.g. in my case:
[System.Text.Encoding]::Default.EncodingName
gets
Cyrillic (Windows)
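Related properties on the same object can also be useful; for example, on that same Cyrillic system (values shown for illustration):
[System.Text.Encoding]::Default.WebName   # windows-1251
[System.Text.Encoding]::Default.CodePage  # 1251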