tee with utf-8 encoding - powershell

I'm trying to tee a server's output to both the console and a file in Powershell 4. The file is ending up with a UTF-16 encoding, which is incompatible with some other tools I'm using. According to help tee -full:
Tee-Object uses Unicode encoding when it writes to files.
...
To specify the encoding, use the Out-File cmdlet
So tee doesn't support changing the encoding, and the help for tee and Out-File doesn't show any examples of splitting a stream and encoding it with UTF-8.
Is there a simple way in Powershell 4 to tee (or otherwise split a stream) to a file with UTF-8 encoding?

One option is to use Add-Content or Set-Content instead of Out-File.
The *-Content cmdlets use ASCII encoding by default, and have a -Passthru switch so you can write to the file, and then have the input pass through to the console:
Get-Childitem -Name | Set-Content file.txt -Passthru
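Since the question asks for UTF-8, the same pattern can be combined with the -Encoding parameter. A minimal sketch, assuming a hypothetical temp-file path and sample strings; note that Windows PowerShell writes a byte-order mark with -Encoding UTF8, and that Set-Content stringifies objects with .ToString() rather than applying console formatting:

```powershell
# Hypothetical output path, used only for illustration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'tee-demo.txt'

# Set-Content writes each input object's string form to the file as UTF-8;
# -PassThru also passes the input on, so it can be displayed or piped further.
$passed = 'alpha', 'beta' | Set-Content -Path $path -Encoding UTF8 -PassThru
$passed
```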

You would have to use -Variable and then write it out to a file in a separate step.
$data = $null
Get-Process | Tee-Object -Variable data
$data | Out-File -FilePath $path -Encoding Utf8
At first glance it seems like it's easier to avoid tee altogether and just capture the output in a variable, then write it to the screen and to a file.
But because of the way the pipeline works, this method allows for a long running pipeline to display data on screen as it goes along. Unfortunately the same cannot be said for the file, which won't be written until afterwards.
Doing Both
An alternative is to roll your own tee so to speak:
[String]::Empty | Out-File -FilePath $path -Encoding Utf8 # initialize the file (with the same encoding) since we're appending later
Get-Process | ForEach-Object {
    $_ | Out-File -FilePath $path -Append -Encoding Utf8
    $_
}
That will write to the file and back to the pipeline, and it will happen as it goes along. It's probably quite slow though.
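If reopening the file for every object turns out to be too slow, one possible variation keeps a single .NET StreamWriter open for the whole pipeline. This is only a sketch under stated assumptions: Tee-Utf8File is a hypothetical name, objects are written via simple string conversion (not PowerShell's default formatting), the file is UTF-8 without a byte-order mark, and the writer is only closed if the pipeline runs to completion.

```powershell
function Tee-Utf8File {
    param(
        [Parameter(ValueFromPipeline = $true)] $InputObject,
        [Parameter(Mandatory = $true, Position = 0)] [string] $Path
    )
    begin {
        # Resolve the path against PowerShell's current location, which can
        # differ from the process-wide working directory that .NET uses.
        $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
        # $false in both places: overwrite the file, and emit no BOM.
        $encoding = New-Object System.Text.UTF8Encoding($false)
        $writer   = New-Object System.IO.StreamWriter($fullPath, $false, $encoding)
    }
    process {
        # Write the object's string form, flush so the file grows as data arrives,
        # then pass the original object on down the pipeline.
        $writer.WriteLine([string]$InputObject)
        $writer.Flush()
        $InputObject
    }
    end {
        $writer.Dispose()
    }
}
```

Usage would look like `Get-Process | Tee-Utf8File out.txt`, where out.txt is a placeholder name.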

Tee-object seems to invoke out-file, so this will make tee output utf8:
$PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'}

First create the file using appropriate flags then append to it:
Set-Content out $null -Encoding Unicode
...
cmd1 | tee out -Append
...
cmdn | tee out -Append

Related

Why does Powershell's Tee-Object mess up the encoding of my file?

I used Tee-Object over the weekend to capture the output of a log file I was tailing. When I tried grepping the output file, I could not get any results, although I was able to grep the original log file.
It seems that Tee-Object has changed the encoding of the file.
https://adamtheautomator.com/tee-object-powershell/
Is there a setting I can change to just spit out the same encoding it read in to begin with, and also keep the line endings the same?
Short answer no, there is no -Encoding parameter.
From PowerShell Tee-Object documentation:
Tee-Object uses Unicode encoding when it writes to files. As a result,
the output might not be formatted properly in files with a different
encoding. To specify the encoding, use the Out-File cmdlet.
As a workaround, tee to a variable, then use set-content to save it to a file. The default encoding is "ansi".
echo hi | tee -Variable a
set-content file $a
Here's an example, if you want the extra formatting that something like out-file normally provides. I'm guessing, because the original question has no example:
ps cmd | tee -var a
$a | out-string | set-content file
Actually, it looks like tee-object is invoking out-file, so this will set the encoding to ascii for tee-object:
$PSDefaultParameterValues = @{'Out-File:Encoding' = 'Ascii'}
HAL's helpful answer shows that, in Windows PowerShell and as of PowerShell (Core) 7.2.x, Tee-Object does not support specifying an output encoding explicitly when outputting to a file, and instead invariably uses "Unicode" (UTF-16LE) encoding in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell (Core).
GitHub issue #11104 suggests removing this limitation by adding an -Encoding parameter to Tee-Object that allows specifying the desired output encoding.
js2010's answer shows that there actually is an indirect way to control the encoding, via an entry in the default-parameter-value table $PSDefaultParameterValues aimed at Out-File (e.g., $PSDefaultParameterValues['Out-File:Encoding'] = 'utf8').
However, this coupling between Tee-Object and Out-File is an implementation detail, so it is best not to rely on it. (Besides, it's nontrivial to scope it to an individual invocation of Tee-Object).
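For completeness, one way to approximate per-invocation scoping is to set the default just before the call and restore the previous state afterwards. A rough sketch, assuming a hypothetical temp-file path; whether Tee-Object actually honors the entry depends on the implementation detail described above, so this is not something to rely on:

```powershell
# Hypothetical output path, used only for illustration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'tee-default-demo.txt'
$key  = 'Out-File:Encoding'

# Remember whether a default was already present so it can be restored.
$hadKey   = $PSDefaultParameterValues.ContainsKey($key)
$oldValue = $PSDefaultParameterValues[$key]
try {
    $PSDefaultParameterValues[$key] = 'utf8'
    # Assumption: Tee-Object picks up Out-File's default here (implementation detail).
    'sample line' | Tee-Object -FilePath $path
}
finally {
    # Put the table back the way it was, even if the pipeline failed.
    if ($hadKey) { $PSDefaultParameterValues[$key] = $oldValue }
    else         { $PSDefaultParameterValues.Remove($key) }
}
```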
js2010's answer also is on the right track for a good workaround, by teeing to a variable first, but Set-Content is not the right cmdlet to use to then save the captured objects, because it performs simple .ToString() stringification on its input, whereas Tee-Object - like Out-File - applies PowerShell's rich default formatting.
Therefore, consider the following workaround:
# Tee to a *variable* first ($out)...
$PSVersionTable | Tee-Object -Variable out # | ...
# ... then use Out-File -Encoding to save to a file with the desired encoding;
# e.g., with UTF-8
Out-File -InputObject $out out.txt -Encoding utf8
As for:
Is there a setting I can change to just spit out the same encoding
No - PowerShell doesn't support that in general: It reads file content into .NET strings in memory, and applies default (or specified) character encoding when saving back to a file.
The only workaround is to determine the input file's encoding manually, and then pass that encoding's name to a write-to-file cmdlet that has an -Encoding parameter, such as Out-File or Set-Content.
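Determining the encoding manually could, for instance, mean checking for a byte-order mark (BOM). The following is a minimal sketch, not a general detector: Get-BomEncodingName is a hypothetical helper, it only recognizes a few common BOMs, and a file without a BOM (ANSI or BOM-less UTF-8) cannot be identified this way, so the function returns $null for it.

```powershell
function Get-BomEncodingName {
    param([Parameter(Mandatory = $true)] [string] $LiteralPath)

    $fullPath = (Resolve-Path -LiteralPath $LiteralPath).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        # Read up to the first four bytes; short files yield fewer.
        $buffer = New-Object byte[] 4
        $read = $stream.Read($buffer, 0, 4)
    }
    finally {
        $stream.Dispose()
    }

    # Hex form such as 'EF-BB-BF-41'; empty string for an empty file.
    $hex = [System.BitConverter]::ToString($buffer, 0, $read)

    # Return names that Out-File / Set-Content accept for -Encoding.
    # UTF-32LE must be tested before UTF-16LE, since both start with FF-FE.
    if ($hex.StartsWith('EF-BB-BF'))    { return 'utf8' }
    if ($hex.StartsWith('FF-FE-00-00')) { return 'utf32' }
    if ($hex.StartsWith('FF-FE'))       { return 'unicode' }
    if ($hex.StartsWith('FE-FF'))       { return 'bigendianunicode' }
    return $null   # no recognized BOM
}
```

The returned name could then be passed to -Encoding when saving, with a fallback of your choice for the $null case.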
As already noted, there is no encoding option for the tee command. To get around this, I was able to use the following for the conversion:
<powershell command> | tee -Variable content
$content | Set-Content -Encoding utf8 test_output.txt
This worked better than the testing I did to try and use Out-File.

ANSI Encoding via PowerShell [duplicate]

In PowerShell, what's the difference between Out-File and Set-Content? Or Add-Content and Out-File -append?
I've found if I use both against the same file, the text is fully mojibaked.
(A minor second question: > is an alias for Out-File, right?)
Here's a summary of what I've deduced, after a few months experience with PowerShell, and some scientific experimentation. I never found any of this in the documentation :(
[Update: Much of this now appears to be better documented.]
Read and write locking
While Out-File is running, another application can read the log file.
While Set-Content is running, other applications cannot read the log file. Thus never use Set-Content to log long running commands.
Encoding
Out-File saves in the Unicode (UTF-16LE) encoding by default (though this can be specified), whereas Set-Content defaults to ASCII (US-ASCII) in PowerShell 3+ (this may also be specified). In earlier PowerShell versions, Set-Content wrote content in the Default (ANSI) encoding.
Editor's note: PowerShell as of version 5.1 still defaults to the culture-specific Default ("ANSI") encoding, despite what the documentation claims. If ASCII were the default, non-ASCII characters such as ü would be converted to literal ?, but that is not the case: 'ü' | Set-Content tmp.txt; (Get-Content tmp.txt) -eq '?' yields $False.
PS > $null | out-file outed.txt
PS > $null | set-content set.txt
PS > md5sum *
f3b25701fe362ec84616a93a45ce9998 *outed.txt
d41d8cd98f00b204e9800998ecf8427e *set.txt
This means the defaults of two commands are incompatible, and mixing them will corrupt text, so always specify an encoding.
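To illustrate the point about always specifying an encoding, here is a small sketch (the file names and sample text are made up) that writes the same string with both cmdlets using an explicit -Encoding and prints the resulting bytes for comparison:

```powershell
# Hypothetical file names in the temp folder, used only for illustration.
$dir = [System.IO.Path]::GetTempPath()
$outFilePath    = Join-Path $dir 'enc-outfile.txt'
$setContentPath = Join-Path $dir 'enc-setcontent.txt'

# Same text, same explicitly chosen encoding for both cmdlets.
'hello' | Out-File    -FilePath $outFilePath -Encoding ascii
'hello' | Set-Content -Path $setContentPath  -Encoding ascii

# Dump the raw bytes of each file as hex so they can be compared by eye.
$outFileHex    = [System.BitConverter]::ToString([System.IO.File]::ReadAllBytes($outFilePath))
$setContentHex = [System.BitConverter]::ToString([System.IO.File]::ReadAllBytes($setContentPath))
$outFileHex
$setContentHex
```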
Formatting
As Bartek explained, Out-File saves the fancy formatting of the output, as seen in the terminal. So in a folder with two files, the command dir | out-file out.txt creates a file with 11 lines.
Whereas Set-Content saves a simpler representation. In that folder with two files, the command dir | set-content sc.txt creates a file with two lines. To emulate the output in the terminal:
PS > dir | ForEach-Object {$_.ToString()}
out.txt
sc.txt
I believe this formatting has a consequence for line breaks, but I can't describe it yet.
File creation
Set-Content doesn't reliably create an empty file when Out-File would:
In an empty folder, the command dir | out-file out.txt creates a file, while dir | set-content sc.txt does not.
Pipeline Variable
Set-Content takes the filename from the pipeline; allowing you to set a number of files' contents to some fixed value.
Out-File takes the data from the pipeline, updating a single file's content.
Parameters
Set-Content includes the following additional parameters:
Exclude
Filter
Include
PassThru
Stream
UseTransaction
Out-File includes the following additional parameters:
Append
NoClobber
Width
For more information about what those parameters are, please refer to help; e.g. get-help out-file -parameter append.
Out-File has the behavior of overwriting the output path unless the -NoClobber and/or the -Append flag is set. Add-Content will append content if the output path already exists by default (if it can). Both will create the file if one doesn't already exist.
Another interesting difference is that Add-Content will create an ASCII encoded file by default and Out-File will create a little endian unicode encoded file by default.
> is syntactic sugar for Out-File. It's Out-File with some pre-defined parameter settings.
Well, I would disagree... :)
Out-File has -Append (-NoClobber is there to avoid overwriting), which will add content. But this is not the same beast.
command | Add-Content will use .ToString() method on input. Out-File will use default formatting.
so:
ls | Add-Content test.txt
and
ls | Out-File test.txt
will give you totally different results.
And no, '>' is not alias, it's redirection operator (same as in other shells). And has very serious limitation... It will cut lines same way they are displayed. Out-File has -Width parameter that helps you avoid this. Also, with redirection operators you can't decide what encoding to use.
HTH
Bartek
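The -Width point above can be sketched as follows. The object, its property names, and the file path are invented for the example; the idea is simply that Out-File is given enough width for a wide table row, which the > operator offers no way to do:

```powershell
# Hypothetical output path and sample object, used only for illustration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'width-demo.txt'
$row  = New-Object psobject -Property @{ Name = 'demo'; Value = ('x' * 200) }

# -Width 300 gives the formatter room for the 200-character column,
# and -Encoding picks the file encoding explicitly.
$row | Format-Table -AutoSize | Out-File -FilePath $path -Width 300 -Encoding utf8
```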
Set-Content supports -Encoding Byte, while Out-File does not.
So when you want to write binary data or result of Text.Encoding#GetBytes() to a file, you should use Set-Content.
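A short sketch of writing the result of GetBytes() with Set-Content. The path and sample text are assumptions; the version check is there because PowerShell 6 and later use the -AsByteStream switch instead of -Encoding Byte:

```powershell
# Hypothetical output path and sample text, used only for illustration.
$path  = Join-Path ([System.IO.Path]::GetTempPath()) 'bytes-demo.bin'
$text  = 'h' + [char]0xE9 + 'llo'                      # "hello" with an e-acute
$bytes = [System.Text.Encoding]::UTF8.GetBytes($text)  # 6 bytes in UTF-8

if ($PSVersionTable.PSVersion.Major -ge 6) {
    # PowerShell 6+: byte output is requested with -AsByteStream.
    Set-Content -Path $path -Value $bytes -AsByteStream
}
else {
    # Windows PowerShell: byte output is requested with -Encoding Byte.
    Set-Content -Path $path -Value $bytes -Encoding Byte
}
```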
Wanted to add about difference on encoding:
Windows with PowerShell 5.1:
Out-File - Default encoding is utf-16le
Set-Content - Default encoding is us-ascii
Linux with PowerShell 7.1:
Out-File - Default encoding is us-ascii
Set-Content - Default encoding is us-ascii
Out-file -append or >> can actually mix two encodings in the same file. Even if the file is originally ASCII or ANSI, it will add Unicode by default to the bottom of it. Add-content will check the encoding and match it before appending. Btw, export-csv defaults to ASCII (no accents), and set-content/add-content to ANSI.
TL;DR: use Set-Content, as it's more consistent than Out-File.
Set-Content behavior is the same across different PowerShell versions.
Out-File, as @JagWireZ says, produces different encodings with the default settings, even on the same OS (Windows): the docs for PowerShell 5.1 and PowerShell 7.3 state that the default encoding changed from unicode to utf8NoBOM.
Some issues, like malformed XML, arise from using Out-File. They could of course be fixed by setting the desired encoding, but it's easy to forget to set the encoding and end up with issues.

Powershell logging from Invoke-Expression with encoding

I have a specific scenario where I have to log a batch file using Invoke-Expression in Powershell, but my logs are being saved with "UCS-2 Little Endian" encoding and I would like to save them with UTF-8 or any other encoding.
This is a simple example of what I'm trying to do:
batch file (test.bat):
echo Test
Powershell file (test.ps1):
Invoke-Expression "c:\test.bat > log.txt"
Is there a way I could change the encoding on log.txt?
You can try this:
C:\test.bat | Out-File C:\log.txt -Encoding UTF8
Or if for whatever reason you really have to use Invoke-Expression:
Invoke-Expression "C:\test.bat" | Out-File C:\log.txt -Encoding UTF8
Note that this will overwrite log.txt every time. If you want to append to the file, do this:
Invoke-Expression "C:\test.bat" | Out-File C:\log.txt -Encoding UTF8 -append
or
Invoke-Expression "C:\test.bat" | Add-Content C:\log.txt -Encoding UTF8

Concatenate files using PowerShell

I am using PowerShell 3.
What is best practice for concatenating files?
file1.txt + file2.txt = file3.txt
Does PowerShell provide a facility for performing this operation directly? Or do I need each file's contents be loaded into local variables?
If all the files exist in the same directory and can be matched by a simple pattern, the following code will combine all files into one.
Get-Content .\File?.txt | Out-File .\Combined.txt
I would go this route:
Get-Content file1.txt, file2.txt | Set-Content file3.txt
Use the -Encoding parameter on Set-Content if you need something other than ASCII which is the default for Set-Content.
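As a sketch of that suggestion, with made-up file names and contents, reading and writing with the same explicit encoding might look like this:

```powershell
# Hypothetical files in the temp folder, used only for illustration.
$dir   = [System.IO.Path]::GetTempPath()
$file1 = Join-Path $dir 'concat-1.txt'
$file2 = Join-Path $dir 'concat-2.txt'
$file3 = Join-Path $dir 'concat-3.txt'

# Create two small UTF-8 input files.
'first'  | Set-Content -Path $file1 -Encoding UTF8
'second' | Set-Content -Path $file2 -Encoding UTF8

# Read both inputs as UTF-8 and write the combined lines as UTF-8.
Get-Content -Path $file1, $file2 -Encoding UTF8 |
    Set-Content -Path $file3 -Encoding UTF8
```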
If you need more flexibility, you could use something like
Get-ChildItem -Recurse *.cs | ForEach-Object { Get-Content $_ } | Out-File -Path .\all.txt
Warning: Concatenation using a simple Get-Content (whether or not using the -Raw flag) works only for text files; for anything else, PowerShell is too helpful:
Without -Raw, it "fixes" (i.e. breaks, pun intended) line breaks, or what PowerShell thinks is a line break.
With -Raw, you get a terminating line end (normally CR+LF) at the end of each file part, which is added at the end of the pipeline. There's an option for that in newer PowerShells' Set-Content.
To concatenate a binary file (that is, an arbitrary file that was split for some reason and needs to be put together again), use either this:
Get-Content -Raw file1, file2 | Set-Content -NoNewline destination
or something like this:
Get-Content file1 -Encoding Byte -Raw | Set-Content destination -Encoding Byte
Get-Content file2 -Encoding Byte -Raw | Add-Content destination -Encoding Byte
An alternative is to use the CMD shell and use
copy file1 /b + file2 /b + file3 /b + ... destinationfile
You must not overwrite any part, that is, use any of the parts as destination. The destination file must be different from any of the parts. Otherwise you're up for a surprise and must find a backup copy of the file part.
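Another possibility for binary parts is to bypass the content cmdlets and copy the raw bytes with .NET streams. A minimal sketch, not a tested utility: Join-BinaryFile is a hypothetical name, and the function refuses to run if the destination is one of the parts, in line with the warning above.

```powershell
function Join-BinaryFile {
    param(
        [Parameter(Mandatory = $true)] [string[]] $Source,
        [Parameter(Mandatory = $true)] [string]   $Destination
    )

    # Resolve everything to full file-system paths before touching any file.
    $destPath    = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)
    $sourcePaths = @($Source | ForEach-Object { (Resolve-Path -LiteralPath $_).ProviderPath })

    # Creating the destination truncates it, so it must not be one of the parts.
    if ($sourcePaths -contains $destPath) {
        throw "Destination '$destPath' must not be one of the source parts."
    }

    $out = [System.IO.File]::Create($destPath)
    try {
        foreach ($part in $sourcePaths) {
            $in = [System.IO.File]::OpenRead($part)
            try     { $in.CopyTo($out) }   # byte-for-byte copy, no decoding
            finally { $in.Dispose() }
        }
    }
    finally {
        $out.Dispose()
    }
}
```

It would be called as `Join-BinaryFile -Source file1, file2 -Destination destination`, using the placeholder names from the answer.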
A generalization based on @Keith's answer:
gc <some regex expression> | sc output
Here is an interesting example of how to make a zip-in-image file based on Powershell 7
Get-Content -AsByteStream file1.png, file2.7z | Set-Content -AsByteStream file3.png
Get-Content -AsByteStream file1.png, file2.7z | Add-Content -AsByteStream file3.png
gc file1.txt, file2.txt > output.txt
I think this is as short as it gets.
In case you would like to ensure the concatenation is done in a specific order, use the Sort-Object -Property <Some Name> argument. For example, concatenate based on the name sorting in an ascending order:
Get-ChildItem -Path ./* -Include *.txt -Exclude output.txt | Sort-Object -Property Name | ForEach-Object { Get-Content $_ } | Out-File output.txt
IMPORTANT: -Exclude and Out-File MUST contain the same values, otherwise, it will recursively keep on adding to output.txt until your disk is full.
Note that you must append a * at the end of the -Path argument because you are using -Include, as mentioned in Get-ChildItem documentation.
