PowerShell command - what does this output?

My colleague found this batch command (a PowerShell command on Windows) online, and I need to know which "encoding" it uses when it writes to the file. I know that it is not UTF-8. Here is the command:
Select-String logfile.log -pattern "string pattern" | tee-object -filepath "output_file.txt" -append
My Windows edition is Windows Server 2012 RC Standard

Taken from https://technet.microsoft.com/en-us/library/hh849937.aspx
Tee-Object uses Unicode encoding when it writes to files. As a result, the output might not be formatted properly in files with a different encoding. To specify the encoding, use the Out-File cmdlet.
The command Select-String is found in PowerShell.
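Following the documentation's advice, a minimal sketch of the same pipeline with an explicit encoding might look like this. It assumes the matches should still flow down the pipeline, hence Tee-Object -Variable. The variable name hits, the temp-folder paths, the sample log content and the choice of UTF-8 are all illustrative and not part of the original command.

```powershell
# Sample input so the sketch is self-contained (hypothetical log content).
$log = Join-Path ([System.IO.Path]::GetTempPath()) 'logfile.log'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'output_file.txt'
Set-Content -Path $log -Value 'first line', 'a string pattern here', 'last line'
Remove-Item -Path $outFile -ErrorAction SilentlyContinue  # start clean

# Capture the matches in a variable ('hits' is an illustrative name) while
# still passing them down the pipeline, instead of teeing straight to a file.
Select-String -Path $log -Pattern 'string pattern' | Tee-Object -Variable hits

# Write the file in a separate step; Out-File offers both -Append and -Encoding.
$hits | Out-File -FilePath $outFile -Append -Encoding utf8
```

Splitting the capture from the write keeps the encoding choice explicit, at the cost of holding all matches in memory until the pipeline finishes.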

Related

Why does Powershell's Tee-Object mess up the encoding of my file?

I used Tee-Object over the weekend to capture output from a log file I was tailing. When I tried grepping the output file, I got no results, although grepping the original log file works.
It seems that Tee-Object has changed the encoding of the file.
https://adamtheautomator.com/tee-object-powershell/
Is there a setting I can change to just spit out the same encoding it read in to begin with, and also keep the line endings the same?
Short answer: no, there is no -Encoding parameter.
From PowerShell Tee-Object documentation:
Tee-Object uses Unicode encoding when it writes to files. As a result,
the output might not be formatted properly in files with a different
encoding. To specify the encoding, use the Out-File cmdlet.
As a workaround, tee to a variable, then use Set-Content to save it to a file. Set-Content's default encoding (in Windows PowerShell) is "ANSI".
echo hi | tee -Variable a
set-content file $a
Here's an example, if you want the extra formatting that something like out-file normally provides. I'm guessing, because the original question has no example:
ps cmd | tee -var a
$a | out-string | set-content file
Actually, it looks like tee-object is invoking out-file, so this will set the encoding to ascii for tee-object:
$PSDefaultParameterValues = @{'Out-File:Encoding' = 'Ascii'}
HAL's helpful answer shows that, in Windows PowerShell and as of PowerShell (Core) 7.2.x, Tee-Object does not support specifying an output encoding explicitly when outputting to a file, and instead invariably uses "Unicode" (UTF-16LE) encoding in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell (Core).
GitHub issue #11104 suggests removing this limitation by adding an -Encoding parameter to Tee-Object that allows specifying the desired output encoding.
js2010's answer shows that there actually is an indirect way to control the encoding, via an entry in the default-parameter-value table $PSDefaultParameterValues aimed at Out-File (e.g., $PSDefaultParameterValues['Out-File:Encoding'] = 'utf8').
However, this coupling between Tee-Object and Out-File is an implementation detail, so it is best not to rely on it. (Besides, it's nontrivial to scope it to an individual invocation of Tee-Object).
js2010's answer also is on the right track for a good workaround, by teeing to a variable first, but Set-Content is not the right cmdlet to use to then save the captured objects, because it performs simple .ToString() stringification on its input, whereas Tee-Object - like Out-File - applies PowerShell's rich default formatting.
Therefore, consider the following workaround:
# Tee to a *variable* first ($out)...
$PSVersionTable | Tee-Object -Variable out # | ...
# ... then use Out-File -Encoding to save to a file with the desired encoding,
# e.g., UTF-8
Out-File -InputObject $out out.txt -Encoding utf8
As for:
Is there a setting I can change to just spit out the same encoding
No - PowerShell doesn't support that in general: It reads file content into .NET strings in memory, and applies default (or specified) character encoding when saving back to a file.
The only workaround is to determine the input file's encoding manually, and then pass that encoding's name to a write-to-file cmdlet that has an -Encoding parameter, such as Out-File or Set-Content.
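One way to "determine the input file's encoding manually" is to look at its byte-order mark (BOM). The sketch below is only a rough heuristic under stated assumptions: Get-BomEncodingName is a hypothetical helper (not a built-in cmdlet), the file names are illustrative, and a file without a BOM cannot be identified this way, so the 'default' fallback is a guess.

```powershell
# Hypothetical helper: map a file's BOM to an -Encoding name for Out-File.
function Get-BomEncodingName {
    param([Parameter(Mandatory = $true)][string]$Path)

    # .NET does not follow PowerShell's current location, so resolve the path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        # Read at most the first 4 bytes of the file.
        $buffer = New-Object byte[] 4
        $count = $stream.Read($buffer, 0, 4)
    } finally {
        $stream.Dispose()
    }
    if ($count -eq 0) { return 'default' }   # empty file: nothing to inspect

    # Format the bytes read as e.g. "FF-FE-68-00" and compare prefixes.
    $hex = [System.BitConverter]::ToString($buffer, 0, $count)
    if ($hex.StartsWith('EF-BB-BF')) { return 'utf8' }
    if ($hex.StartsWith('FF-FE-00-00')) { return 'utf32' }
    if ($hex.StartsWith('FF-FE')) { return 'unicode' }            # UTF-16LE
    if ($hex.StartsWith('FE-FF')) { return 'bigendianunicode' }   # UTF-16BE
    return 'default'   # no BOM found: an assumption, not a detection
}

# Usage: create a UTF-16LE sample, detect its encoding, and re-save with it.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-sample.txt'
$copy = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-copy.txt'
'sample text' | Out-File -FilePath $sample -Encoding unicode
$name = Get-BomEncodingName -Path $sample
Get-Content -Path $sample | Out-File -FilePath $copy -Encoding $name
```

Note that in PowerShell (Core) the name utf8 writes no BOM, so preserving a UTF-8 BOM there would need utf8BOM instead.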
As already noted, there is no encoding option for the tee command. To get around this, I was able to use the following for the conversion:
<powershell command> | tee -Variable content
$content | Set-Content -Encoding utf8 test_output.txt
This worked better than the testing I did to try and use Out-File.

ANSI Encoding via PowerShell [duplicate]

In PowerShell, what's the difference between Out-File and Set-Content? Or Add-Content and Out-File -append?
I've found if I use both against the same file, the text is fully mojibaked.
(A minor second question: > is an alias for Out-File, right?)
Here's a summary of what I've deduced, after a few months' experience with PowerShell and some scientific experimentation. I never found any of this in the documentation :(
[Update: Much of this now appears to be better documented.]
Read and write locking
While Out-File is running, another application can read the log file.
While Set-Content is running, other applications cannot read the log file. Thus never use Set-Content to log long running commands.
Encoding
Out-File saves in the Unicode (UTF-16LE) encoding by default (though this can be specified), whereas Set-Content defaults to ASCII (US-ASCII) in PowerShell 3+ (this may also be specified). In earlier PowerShell versions, Set-Content wrote content in the Default (ANSI) encoding.
Editor's note: PowerShell as of version 5.1 still defaults to the culture-specific Default ("ANSI") encoding, despite what the documentation claims. If ASCII were the default, non-ASCII characters such as ü would be converted to literal ?, but that is not the case: 'ü' | Set-Content tmp.txt; (Get-Content tmp.txt) -eq '?' yields $False.
PS > $null | out-file outed.txt
PS > $null | set-content set.txt
PS > md5sum *
f3b25701fe362ec84616a93a45ce9998 *outed.txt
d41d8cd98f00b204e9800998ecf8427e *set.txt
This means the defaults of the two commands are incompatible, and mixing them will corrupt text, so always specify an encoding.
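As a minimal sketch of "always specify an encoding": when both cmdlets are given the same explicit -Encoding, they should write the same bytes for a plain string. The file names, the temp-folder location and the choice of ascii are illustrative.

```powershell
# Write the same string with both cmdlets, using one explicit encoding.
$dir = [System.IO.Path]::GetTempPath()
$outed = Join-Path $dir 'outed.txt'
$set = Join-Path $dir 'set.txt'

'hello' | Out-File -FilePath $outed -Encoding ascii
'hello' | Set-Content -Path $set -Encoding ascii

# Compare the raw bytes of the two files as hex strings.
$outedHex = [System.BitConverter]::ToString([System.IO.File]::ReadAllBytes($outed))
$setHex = [System.BitConverter]::ToString([System.IO.File]::ReadAllBytes($set))
$sameBytes = ($outedHex -eq $setHex)
$sameBytes
```

This only holds for string input; for other objects the two cmdlets still differ in how they turn the objects into text.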
Formatting
As Bartek explained, Out-File saves the fancy formatting of the output, as seen in the terminal. So in a folder with two files, the command dir | out-file out.txt creates a file with 11 lines.
Whereas Set-Content saves a simpler representation. In that folder with two files, the command dir | set-content sc.txt creates a file with two lines. To emulate the output in the terminal:
PS > dir | ForEach-Object {$_.ToString()}
out.txt
sc.txt
I believe this formatting has a consequence for line breaks, but I can't describe it yet.
File creation
Set-Content doesn't reliably create an empty file when Out-File would:
In an empty folder, the command dir | out-file out.txt creates a file, while dir | set-content sc.txt does not.
Pipeline Variable
Set-Content can take the filename from the pipeline (by property name), allowing you to set the contents of a number of files to some fixed value.
Out-File takes the data from the pipeline, updating a single file's content.
Parameters
Set-Content includes the following additional parameters:
Exclude
Filter
Include
PassThru
Stream
UseTransaction
Out-File includes the following additional parameters:
Append
NoClobber
Width
For more information about what those parameters are, please refer to help; e.g. get-help out-file -parameter append.
Out-File has the behavior of overwriting the output path unless the -NoClobber and/or the -Append flag is set. Add-Content will append content if the output path already exists by default (if it can). Both will create the file if one doesn't already exist.
Another interesting difference is that Add-Content will create an ASCII-encoded file by default, and Out-File will create a little-endian Unicode (UTF-16LE) file by default.
> is syntactic sugar for Out-File. It's Out-File with some pre-defined parameter settings.
Well, I would disagree... :)
Out-File has -Append (-NoClobber is there to avoid overwriting), which makes it append like Add-Content. But this is not the same beast.
command | Add-Content will use the .ToString() method on its input. Out-File will use default formatting.
so:
ls | Add-Content test.txt
and
ls | Out-File test.txt
will give you totally different results.
And no, '>' is not an alias; it's a redirection operator (same as in other shells). And it has a very serious limitation: it will cut lines the same way they are displayed. Out-File has a -Width parameter that helps you avoid this. Also, with redirection operators you can't decide what encoding to use.
HTH
Bartek
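Bartek's point about -Width can be sketched as follows. The object, its property names and the file name are made up for illustration; the idea is simply that an explicit -Width leaves room for a value that is wider than a typical console line.

```powershell
# An illustrative object with one very long property value (150 characters).
$row = [pscustomobject]@{ Name = 'example'; Value = ('x' * 150) }
$wide = Join-Path ([System.IO.Path]::GetTempPath()) 'wide.txt'

# -Width 300 gives the formatter enough columns to keep the long value intact;
# -Encoding is set explicitly, which the '>' operator does not allow.
$row | Format-Table -AutoSize | Out-File -FilePath $wide -Width 300 -Encoding utf8
```

The width of 300 is arbitrary; it only needs to exceed the widest line you expect.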
Set-Content supports -Encoding Byte, while Out-File does not.
So when you want to write binary data or result of Text.Encoding#GetBytes() to a file, you should use Set-Content.
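A minimal sketch of writing the result of GetBytes() with Set-Content is shown below. One caveat: -Encoding Byte applies to Windows PowerShell, while PowerShell 6 and later use the -AsByteStream switch instead, so the sketch branches on the version. The file name and the sample text are illustrative.

```powershell
# Encode a short string (with one non-ASCII character, U+00E9) as UTF-8 bytes.
$bytes = [System.Text.Encoding]::UTF8.GetBytes("caf$([char]0xE9)")
$bin = Join-Path ([System.IO.Path]::GetTempPath()) 'bytes.bin'

if ($PSVersionTable.PSVersion.Major -ge 6) {
    # PowerShell 6+: raw bytes are written with -AsByteStream.
    Set-Content -Path $bin -Value $bytes -AsByteStream
} else {
    # Windows PowerShell: raw bytes are written with -Encoding Byte.
    Set-Content -Path $bin -Value $bytes -Encoding Byte
}
```

Because the bytes are written as-is, the file gets no BOM and no trailing newline.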
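I wanted to add a note on the differences in default encoding:
Windows with PowerShell 5.1:
Out-File - Default encoding is utf-16le
Set-Content - Default encoding is the system's ANSI code page (the documentation says us-ascii; the two look identical for pure-ASCII text)
Linux with PowerShell 7.1:
Out-File - Default encoding is utf8NoBOM (looks like us-ascii for pure-ASCII text)
Set-Content - Default encoding is utf8NoBOM (looks like us-ascii for pure-ASCII text)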
Out-File -Append or >> can actually mix two encodings in the same file: even if the file is originally ASCII or ANSI, it appends text in Unicode (UTF-16LE) by default. Add-Content checks the existing encoding and matches it before appending. By the way, Export-Csv defaults to ASCII (no accents), and Set-Content/Add-Content to ANSI.
TL;DR: use Set-Content, as it's more consistent than Out-File.
Set-Content behaves the same across different PowerShell versions.
Out-File, as @JagWireZ says, produces different encodings with the default settings, even on the same OS (Windows): the docs for PowerShell 5.1 and PowerShell 7.3 state that the default encoding changed from unicode to utf8NoBOM.
Some issues, such as malformed XML, arise from using Out-File. They can of course be fixed by setting the desired encoding, but it's easy to forget to do so and end up with problems.

Notepad++ Open file, change encoding and save from command line

I'm looking for a way to automate a task with Notepad++ from the command line:
Open file
Change encoding to UTF-8
Save file
Is there any way to do it with a plugin, or even with another program?
Why do you want to use Notepad++ for that task? Which OS are you using?
Notepad++ has a plugin manager from which you can install the Python Script plugin.
http://pw999.wordpress.com/2013/08/19/mass-convert-a-project-to-utf-8-using-notepad/
But if you want to convert files to UTF-8, you can do that much more easily with PowerShell on Windows or the command line on Linux.
For Windows PowerShell:
$yourfile = "C:\path\to\your\file.txt"
# The parentheses make Get-Content read the whole file before Out-File overwrites it.
(Get-Content -Path $yourfile) | Out-File $yourfile -Encoding utf8
For Linux use (e.g.) iconv:
iconv -f ISO-8859-15 -t UTF-8 source.txt > new-file.txt
Windows Powershell script to change all the files in the current folder (and in all subfolders):
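# Re-save every file under the current folder (recursively) as UTF-8.
foreach ($file in @(Get-ChildItem *.* -File -Recurse)) {
    # Read the whole file into memory first, so it is closed before being overwritten.
    $content = Get-Content -LiteralPath $file.FullName
    Out-File -FilePath $file.FullName -InputObject $content -Encoding utf8
}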
If you want to change only specific files, just change the *.* (in the first line).
Note: I tried the plain pipe (|) approach from Broco's answer, without parentheses around Get-Content, and it did not work (I got empty output files, as Josh commented). I think this is because we cannot read from and write to the same file at the same time (whereas in my approach the content is first read into a variable in memory).

Encode file with cmd

I have a bat file that performs some actions, and I need to encode a text file as UTF-8.
Is there any way to do this from the Windows command line?
Thanks in advance.
Only with other programs, which may or may not be installed. If you're targeting Windows 7 and higher, you could just use PowerShell:
powershell -Command "&{ param($Path); (Get-Content $Path) | Out-File $Path -Encoding UTF8 }" somefile.txt
