How to save to file non-ascii output of program in Powershell? - powershell

I want to run a program in PowerShell and write its output to a file with UTF-8 encoding.
However, non-ASCII characters are not written correctly.
I have already read many similar questions on Stack Overflow, but I still can't find an answer.
I tried both PowerShell 5.1.19041.1023 and PowerShell Core 7.1.3; they encode the output file differently, but the content is broken in the same way.
I tried simple programs in Python and Golang.
(Please assume that I can't change the source code of the programs.)
Python
print('Hello ąćęłńóśźż world')
Results:
python hello.py
Hello ąćęłńóśźż world
python hello.py > file1.txt
Hello ╣Šŕ│˝ˇťč┐ world
python hello.py | out-file -encoding utf8 file2.ext
Hello ╣Šŕ│˝ˇťč┐ world
On cmd:
python hello.py > file3.txt
Hello ����󜟿 world
Golang
package main
import "fmt"
func main() {
	fmt.Printf("Hello ąćęłńóśźż world\n")
}
Results:
go run hello.go:
Hello ąćęłńóśźż world
go run hello.go > file4.txt
Hello ─ů─ç─Ö┼é┼ä├│┼Ť┼║┼╝ world
go run hello.go | out-file -encoding utf8 file5.txt
Hello ─ů─ç─Ö┼é┼ä├│┼Ť┼║┼╝ world
On cmd it works ok:
go run hello.go > file6.txt
Hello ąćęłńóśźż world

You should set the OutputEncoding property of the console first.
In PowerShell, enter this line before running your programs:
[Console]::OutputEncoding = [Text.Encoding]::Utf8
You can then use Out-File with your encoding type:
py hello.py | Out-File -Encoding UTF8 file2.ext
go run hello.go | Out-File -Encoding UTF8 file5.txt

Note: These character-encoding problems only plague PowerShell on Windows, in both editions. On Unix-like platforms, UTF-8 is consistently used.[1]
Quicksilver's answer is fundamentally correct:
It is the character encoding stored in [Console]::OutputEncoding that determines how PowerShell decodes text received from external programs[2] - and note that it invariably interprets such output as text (strings).
[Console]::OutputEncoding by default reflects a console's active code page, which itself defaults to the system's active OEM code page, such as 437 (CP437) on US-English systems.
The standard chcp program also reports the active OEM code page, and while it can in principle also be used to change it for the active console (e.g., chcp 65001), this does not work from inside PowerShell, due to .NET caching the encodings.
Therefore, you may have to (temporarily) set [Console]::OutputEncoding to match the actual character encoding used by a given external console program:
While many console programs respect the active console code page (in which case no workarounds are required), some do not, typically in order to provide full Unicode support. Note that you may not notice a problem until you programmatically process such a program's output (meaning: capturing in a variable, sending through the pipeline to another command, redirection to a file), because such a program may detect the case when its stdout is directly connected to the console and may then selectively use full Unicode support for display.
Notable CLIs that do not respect the active console code page:
Python exhibits nonstandard behavior in that it uses the active ANSI code page by default, i.e. the code page normally only used by non-Unicode GUI-subsystem applications.
However, you can use $env:PYTHONUTF8=1 before invoking Python scripts to instruct Python to use UTF-8 instead (which then applies to all Python calls made from the same process); in v3.7+, you can alternatively pass command-line option -X utf8 (case-sensitive) as a per-call opt-in.
Go and also Node.js invariably use UTF-8 encoding.
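The effect of such a mismatch can be reproduced outside PowerShell. The following is only a sketch in Python: it assumes a Polish-locale machine whose OEM code page is 852 and whose ANSI code page is 1250 (inferred from the garbled output in the question, not stated there), and the helper name misdecode is made up for illustration.

```python
# Simulate a reader decoding a program's output bytes with a different
# code page than the one the program used to encode them.
# Assumption: OEM code page 852, ANSI code page 1250 (Polish locale).

TEXT = "ąćęłńóśźż"


def misdecode(text: str, written_as: str, read_as: str) -> str:
    """Encode text as the program would, then decode it as the reader would."""
    return text.encode(written_as).decode(read_as)


# Python writes ANSI (cp1250) bytes; the reader decodes them as OEM (cp852).
python_garbled = misdecode(TEXT, "cp1250", "cp852")

# Go writes UTF-8 bytes; the reader decodes them as OEM (cp852).
go_garbled = misdecode(TEXT, "utf-8", "cp852")

# Decoding with the encoding that was actually used recovers the text.
python_ok = misdecode(TEXT, "cp1250", "cp1250")
go_ok = misdecode(TEXT, "utf-8", "utf-8")

print(python_garbled)
print(go_garbled)
print(python_ok, go_ok)
```

Under these assumptions the two garbled strings are the same ones that ended up in the files in the question, which is why telling the reader the real encoding (rather than re-encoding the file afterwards) is the fix.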
The following snippet shows how to set [Console]::OutputEncoding temporarily as needed:
# Save the original encoding.
$orig = [Console]::OutputEncoding
# Work with console programs that use UTF-8 encoding,
# such as Go and Node.js
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
# Piping to Write-Output is a dummy operation that forces
# decoding of the external program's output, so that encoding problems would show.
go run hello.go | Write-Output
# Work with console programs that use ANSI encoding, such as Python.
# As noted, the alternative is to configure Python to use UTF-8.
[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP))
python hello.py | Write-Output
# Restore the original encoding.
[Console]::OutputEncoding = $orig
Your own answer provides an effective alternative, but it comes with caveats:
Activating the Use Unicode UTF-8 for worldwide language support feature via Control Panel (or the equivalent registry settings) changes the code pages system-wide, which affects not only all console windows and console applications, but also legacy (non-Unicode) GUI-subsystem applications, given that both the OEM and the ANSI code pages are being set.
Notable side effects include:
Windows PowerShell's default behavior changes, because it uses the ANSI code page both to read source code and as the default encoding for the Get-Content and Set-Content cmdlets.
For instance, existing Windows PowerShell scripts that contain non-ASCII range characters such as é will then misbehave, unless they were saved as UTF-8 with a BOM (or as "Unicode", UTF-16LE, which always has a BOM).
By contrast, PowerShell (Core) v6+ consistently uses (BOM-less) UTF-8 to begin with.
Old console applications may break with 65001 (UTF-8) as the active OEM code page, as they may not be able to handle the variable-length encoding aspect of UTF-8 (a single character can be encoded by up to 4 bytes).
See this answer for more information.
[1] The cross-platform PowerShell (Core) v6+ edition uses (BOM-less) UTF-8 consistently. While it is possible to configure Unix terminals and thereby console (terminal) applications to use a character encoding other than UTF-8, doing so is rare these days - UTF-8 is almost universally used.
[2] By contrast, it is the $OutputEncoding preference variable that determines the encoding used for sending text to external programs, via the pipeline.

The solution is to enable Beta: Use Unicode UTF-8 for worldwide language support, as described in What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?
Note: this solution may cause problems with legacy programs. Please read the answers by mklement0 and Quicksilver for details and alternative solutions.
I also found the explanation written by Ghisler helpful (source):
If you check this option, Windows will use codepage 65001 (Unicode
UTF-8) instead of the local codepage like 1252 (Western Latin1) for
all plain text files. The advantage is that text files created in e.g.
Russian locale can also be read in other locale like Western or
Central Europe. The downside is that ANSI-Only programs (most older
programs) will show garbage instead of accented characters.
Also, PowerShell before version 7.1 has a bug when this option is enabled. If you enable it, you may want to upgrade to version 7.1 or later.
I like this solution because it only needs to be set once and then keeps working. It brings consistent Unix-like UTF-8 behaviour to Windows. I hope I will not see any issues.
How to enable it:
Win+R → intl.cpl
Administrative tab
Click the Change system locale button
Enable Beta: Use Unicode UTF-8 for worldwide language support
Reboot
or alternatively via reg file:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"ACP"="65001"
"OEMCP"="65001"
"MACCP"="65001"

Related

how to find the encoding type of configuration.properties files in powershell [duplicate]

This isn't really a programming question, is there a command line or Windows tool (Windows 7) to get the current encoding of a text file? Sure I can write a little C# app but I wanted to know if there is something already built in?
Open up your file using regular old vanilla Notepad that comes with Windows.
It will show you the encoding of the file when you click "Save As...".
Whatever the default-selected encoding is, that is what your current encoding is for the file.
If it is UTF-8, you can change it to ANSI and click save to change the encoding (or vice versa).
I realize there are many different types of encoding, but this was all I needed when I was informed our export files were in UTF-8 and they required ANSI. It was a one-time export, so Notepad fit the bill for me.
FYI: From my understanding I think "Unicode" (as listed in Notepad) is a misnomer for UTF-16.
More here on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicdoe
If you have "git" or "Cygwin" on your Windows Machine, then go to the folder where your file is present and execute the command:
file *
This will give you the encoding details of all the files in that folder.
The (Linux) command-line tool 'file' is available on Windows via GnuWin32:
http://gnuwin32.sourceforge.net/packages/file.htm
If you have git installed, it's located in C:\Program Files\git\usr\bin.
Example:
C:\Users\SH\Downloads\SquareRoot>file *
_UpgradeReport_Files; directory
Debug; directory
duration.h; ASCII C++ program text, with CRLF line terminators
ipch; directory
main.cpp; ASCII C program text, with CRLF line terminators
Precision.txt; ASCII text, with CRLF line terminators
Release; directory
Speed.txt; ASCII text, with CRLF line terminators
SquareRoot.sdf; data
SquareRoot.sln; UTF-8 Unicode (with BOM) text, with CRLF line terminators
SquareRoot.sln.docstates.suo; PCX ver. 2.5 image data
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary info
SquareRoot.vcproj; XML document text
SquareRoot.vcxproj; XML document text
SquareRoot.vcxproj.filters; XML document text
SquareRoot.vcxproj.user; XML document text
squarerootmethods.h; ASCII C program text, with CRLF line terminators
UpgradeLog.XML; XML document text
C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
_UpgradeReport_Files; binary
Debug; binary
duration.h; us-ascii
ipch; binary
main.cpp; us-ascii
Precision.txt; us-ascii
Release; binary
Speed.txt; us-ascii
SquareRoot.sdf; binary
SquareRoot.sln; utf-8
SquareRoot.sln.docstates.suo; binary
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary infobinary
SquareRoot.vcproj; us-ascii
SquareRoot.vcxproj; utf-8
SquareRoot.vcxproj.filters; utf-8
SquareRoot.vcxproj.user; utf-8
squarerootmethods.h; us-ascii
UpgradeLog.XML; us-ascii
Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker
EXE can be found here
Install git (on Windows, you have to use the Git Bash console). Type:
file --mime-encoding *
for all files in the current directory, or
file --mime-encoding */*
for the files in its immediate subdirectories.
Here's my take on detecting the Unicode family of text encodings via the BOM. The accuracy of this method is low: it only works on text files (specifically Unicode files), and it defaults to ascii when no BOM is present (as most text editors do; the default would be UTF8 if you wanted to match the HTTP/web ecosystem).
Update 2018: I no longer recommend this method. I recommend using file.exe from Git or *nix tools as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.
# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
    # Read at most the first four bytes (Windows PowerShell syntax;
    # PowerShell 6+ uses -AsByteStream instead of -Encoding byte).
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)
    if(!$bytes) { return 'utf8' }
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   { return 'utf8' }
        '^2b2f76'   { return 'utf7' }
        # UTF-32 LE must be tested before UTF-16 LE, whose BOM (ff fe) is a prefix of it.
        '^fffe0000' { return 'utf32' }
        '^fffe'     { return 'unicode' }
        '^feff'     { return 'bigendianunicode' }
        '^0000feff' { return 'bigendianutf32' }
        default     { return 'ascii' }
    }
}
dir ~\Documents\WindowsPowershell -File |
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
    ft -AutoSize
Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem only checks known text files, and when you're only looking for "bad encodings" from a known list of tools. (i.e. SQL Management Studio defaults to UTF16, which broke GIT auto-cr-lf for Windows, which was the default for many years.)
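For comparison, the same BOM check can be sketched in Python using the constants from the standard codecs module. The function names detect_bom and detect_file_bom are made up for this example, and like any BOM-only check it says nothing about files that have no BOM.

```python
import codecs

# BOM signatures, longest first, so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(head: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def detect_file_bom(path):
    """Read only the first four bytes of a file and check them for a BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

One caveat of checking the longer signatures first: a UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE. That is rare in text files, but it is a trade-off of this ordering.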
A simple solution might be opening the file in Firefox.
Drag and drop the file into firefox
Press Ctrl+I to open the page info
and the text encoding will appear on the "Page Info" window.
Note: If the file is not in txt format, just rename it to txt and try again.
P.S. For more info see this article.
I wrote the #4 answer (at time of writing). But lately I have git installed on all my computers, so now I use @Sybren's solution. Here is a new answer that makes that solution handy from PowerShell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).
Add this to your profile.ps1:
$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe
Then use it like this: file.exe --mime-encoding *. You must include .exe in the command for the PS alias to work.
But if you don't customize your PowerShell profile.ps1 I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0
and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without git, but will write warnings when git is not found.
The .exe in the command is also how I use C:\WINDOWS\system32\where.exe from powershell; and many other OS CLI commands that are "hidden by default" by powershell, *shrug*.
You can simply check that by opening Git Bash in the file's location and running the command file -i file_name
Example:
user filesData
$ file -i data.csv
data.csv: text/csv; charset=utf-8
Some C code here for reliable ascii, bom's, and utf8 detection: https://unicodebook.readthedocs.io/guess_encoding.html
Only ASCII, UTF-8 and encodings using a BOM (UTF-7 with BOM, UTF-8 with BOM,
UTF-16, and UTF-32) have reliable algorithms to get the encoding of a document.
For all other encodings, you have to trust heuristics based on statistics.
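The "reliable" part for BOM-less files can be sketched in a few lines of Python: strict decoding either succeeds or raises, so ASCII and UTF-8 can be confirmed, while everything else stays unknown. The function name classify is made up for this example.

```python
def classify(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', or 'unknown' by strict decoding."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some single-byte or legacy encoding; only heuristics can narrow it down.
        return "unknown"
```

Note that a positive "utf-8" result means the bytes are valid UTF-8, which for non-trivial text is strong evidence but not proof that the author intended UTF-8.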
EDIT:
A powershell version of a C# answer from: Effective way to find any file's Encoding. Only works with signatures (boms).
# get-encoding.ps1
param([Parameter(ValueFromPipeline=$True)] $filename)
begin {
    # Set .NET's current directory to PowerShell's, so relative paths resolve as expected.
    [Environment]::CurrentDirectory = (pwd).path
}
process {
    # The third argument ($true) enables encoding detection from a BOM.
    $reader = [System.IO.StreamReader]::new($filename,
        [System.Text.Encoding]::default,$true)
    # Peek() makes the reader inspect the start of the file.
    $peek = $reader.Peek()
    $encoding = $reader.currentencoding
    $reader.close()
    [pscustomobject]@{Name=split-path $filename -leaf
        BodyName=$encoding.BodyName
        EncodingName=$encoding.EncodingName}
}
.\get-encoding chinese8.txt
Name         BodyName EncodingName
----         -------- ------------
chinese8.txt utf-8    Unicode (UTF-8)
get-childitem -file | .\get-encoding
Looking for a Node.js/npm solution? Try encoding-checker:
npm install -g encoding-checker
Usage
Usage: encoding-checker [-p pattern] [-i encoding] [-v]
Options:
--help Show help [boolean]
--version Show version number [boolean]
--pattern, -p, -d [default: "*"]
--ignore-encoding, -i [default: ""]
--verbose, -v [default: false]
Examples
Get encoding of all files in current directory:
encoding-checker
Return encoding of all md files in current directory:
encoding-checker -p "*.md"
Get encoding of all files in current directory and its subfolders (will take quite some time for huge folders; seemingly unresponsive):
encoding-checker -p "**"
For more examples, refer to the npm documentation or the official repository.
Similar to the solution listed above with Notepad, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options..."
The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file. It has a lot more text encodings listed in there than Notepad does, so it's useful when dealing with various files from around the world and whatever else.
Just like in Notepad, you can also change the encoding from the list of options there, and then save the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).
The only way that I have found to do this is VIM or Notepad++.
EncodingChecker
File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.
File Encoding Checker requires .NET 4 or above to run.

Is there a way to make VS Code not replace unknown text characters?

I'm currently using VS Code to write a PowerShell script. As part of this script, a regex is used to replace/remove an atypical character that ends up in the data fairly often and causes trouble down the line. The character is ’ (U+2019), and when the script is opened in VS Code it is permanently replaced with � (U+FFFD).
Thus the line:
$user.Name = $user.Name -Replace "'|\’|\(|\)|\s+",""
Permanently becomes: $user.Name = $user.Name -Replace "'|\�|\(|\)|\s+",""
until it is manually changed. Seeing as I can paste the U+2019 character in once the file is open and then run the code, I assume that VS code can interpret it okay and the problem is with loading the file in. Is there some option that I can set to stop this being replaced when I open the file?
In my case, turning on the VS Code setting, "Files: Auto Guess Encoding," has fixed the problem, both for reading and saving.
This looks like it all comes down to encoding. Visual Studio Code by default uses UTF-8 and can in general handle saving/viewing Unicode properly.
If the issue is on opening the file, then Visual Studio Code is misinterpreting the file's encoding when it opens it. You can change the encoding (Configuring VS Code encoding) in the VS Code settings, choosing a specific encoding (e.g. UTF-8, UTF-8 with BOM, UTF-16 LE, etc.) by changing the "files.encoding" setting.
"files.encoding": "utf8bom"
If the issue is on saving the file, then it is being saved in an ANSI encoding (e.g. Windows-1252) and not as proper UTF-8 or equivalent. The character then shows up as the Replacement Character (U+FFFD) the next time the file is opened as UTF-8.
Note: The default encoding used for Windows PowerShell v5.1 is Windows-1252, and may be why saving the scripts with special characters may not work. PowerShell Core v6+ uses UTF-8 by default.
If I save in VS Code with Windows-1252 encoding, I see the character "’" change to � the next time I open the file. I think the problem is that VS Code doesn't recognize Windows-1252; it opens the file as UTF-8. If you reopen with Windows-1252 encoding, it displays correctly. The other encodings work fine, including for displaying the character. This includes UTF-8 without a BOM.
Even PowerShell 5 doesn't have this problem with Windows-1252, only VS Code. Set-Content and Get-Content in PowerShell 5 default to Windows-1252.
"’" | set-content file
get-content file
’
Powershell 7 would actually have the same problem:
get-content file
�
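The byte-level reason can be checked with a short Python sketch. It assumes Windows-1252 as the ANSI code page, as in the answers above.

```python
# The right single quotation mark (U+2019) as stored by each encoding.
QUOTE = "\u2019"

as_cp1252 = QUOTE.encode("cp1252")  # one byte: 0x92
as_utf8 = QUOTE.encode("utf-8")     # three bytes: 0xE2 0x80 0x99

# A Windows-1252 file opened as UTF-8: a lone 0x92 is not valid UTF-8,
# so a lenient decoder substitutes U+FFFD for it.
opened_as_utf8 = as_cp1252.decode("utf-8", errors="replace")

# Reopening with the encoding the file was actually saved in recovers it.
reopened = as_cp1252.decode("cp1252")

print(as_cp1252.hex(), as_utf8.hex())
print(opened_as_utf8 == "\ufffd", reopened == QUOTE)
```

If the file is saved again after being opened as UTF-8, the U+FFFD is written out and the original character is lost, which would explain why the replacement looks permanent.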

Powershell Encoding Default Output

I have the following problems with a PowerShell script that runs inside a TFS build. Both problems are unrelated to TFS and can be reproduced in a simple PowerShell command-line window.
1) Completely unrelated to TFS: it seems PowerShell does not like German umlauts when it comes to piping.
1a) This line of code works fine and all umlauts are shown correctly
.\TF.exe hist "$/Test" /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /version:C1~T
1b) This line messes with umlauts
.\TF.exe hist "$/Test" /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /version:C1~T | Out-String
Initially I tried Out-File and changed the encoding, only to find that the umlauts are encoded wrongly with every encoding (UTF8, Unicode, UTF32, ...).
I really do not know how to extract a string from standard output and get the umlauts right.
2) When using Out-File or Out-String, each line in the output gets truncated after 80 characters, which seems to be the default screen buffer setting. How can I change that inside a PowerShell script, and why does it even have an impact when redirecting the output?
Problem number 2 is not a PowerShell problem. The tf documentation says the following about the default /format parameter (i.e. /format:brief):
Some of the data may be truncated.
/format:detailed does not have that warning, but it returns more information, which you can process with Powershell before doing Out-String or Out-File.
tl;dr
The following should solve both your problems, which stem from tf.exe using ANSI character encoding rather than the expected OEM encoding, and from tf.exe truncating output by default:
If you're using Windows PowerShell (the Windows-only legacy edition of PowerShell with versions up to v5.1):
$correctlyCapturedOutput =
  & {
    $prev = [Console]::OutputEncoding
    [Console]::OutputEncoding = [System.Text.Encoding]::Default
    # Note the addition of /format:detailed
    .\tf.exe hist '$/Test' /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /format:detailed /version:C1~T
    [Console]::OutputEncoding = $prev
  }
If you're using the cross-platform, install-on-demand PowerShell (Core) 7+:
Note: [System.Text.Encoding]::Default, which reports the active ANSI code page's encoding in Windows PowerShell, reports (BOM-less) UTF-8 in PowerShell (Core) (reflecting .NET Core's / .NET 5+'s behavior). Therefore, the active ANSI code page must be determined explicitly, which is most robustly done via the registry.
$correctlyCapturedOutput =
  & {
    $prev = [Console]::OutputEncoding
    [Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(
      [int] ((Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP).ACP)
    )
    # Note the addition of /format:detailed
    .\tf.exe hist '$/Test' /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /format:detailed /version:C1~T
    [Console]::OutputEncoding = $prev
  }
This Gist contains the helper function Invoke-WithEncoding, which can simplify the above in both PowerShell editions as follows:
$correctlyCapturedOutput =
Invoke-WithEncoding -Encoding Ansi {
.\tf.exe hist '$/Test' /recursive /collection:https://TestTFS/tfs/TestCollection /noprompt /format:detailed /version:C1~T
}
You can directly download and define the function with the following command (while I can personally assure you that doing so is safe, it is advisable to check the source code first):
# Downloads and defines function Invoke-WithEncoding in the current session.
irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw/Invoke-WithEncoding.ps1 | iex
Read on for a detailed discussion.
Re the umlaut (character encoding) problem:
While the output from external programs may print OK to the console, when it comes to capturing the output in a variable or redirecting it - such as sending it through the pipeline to Out-String in your case - PowerShell decodes the output into .NET strings, using the character encoding stored in [Console]::OutputEncoding.
If [Console]::OutputEncoding doesn't match the actual encoding used by the external program, PowerShell will misinterpret the output.
The solution is to (temporarily) set [Console]::OutputEncoding to the actual encoding used by the external program.
While the official tf.exe documentation doesn't discuss character encodings, this comment on GitHub suggests that tf.exe uses the system's active ANSI code page, such as Windows-1252 on US-English or Western European systems.
It should be noted that the use of the ANSI code page is nonstandard behavior for a console application, because console applications are expected to use the system's active OEM code page. As an aside: python too exhibits this nonstandard behavior by default, though its behavior is configurable.
The solutions at the top show how to temporarily switch [Console]::OutputEncoding to the active ANSI code page's encoding in order to ensure that PowerShell correctly decodes tf.exe's output.
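The same principle (decode captured bytes with the encoding the program really used) can be illustrated in Python. This is only a sketch: the child process below is a hypothetical stand-in for a program like tf.exe, the code pages 1252 (ANSI) and 850 (OEM) are assumed Western European defaults, and the helper name capture is made up.

```python
import subprocess
import sys

# Hypothetical stand-in for a console program such as tf.exe: a child process
# that writes "Grüße" as Windows-1252 (ANSI) bytes, whatever the console code page.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'Gr\\xfc\\xdfe')"]


def capture(cmd, encoding):
    """Run cmd, capture its raw stdout bytes, and decode them with `encoding`."""
    raw = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
    return raw.decode(encoding, errors="replace")


right = capture(CHILD, "cp1252")  # matches what the child actually wrote
wrong = capture(CHILD, "cp850")   # assumed OEM code page: umlauts come out wrong

print(right)
print(wrong)
```

Capturing bytes and decoding them explicitly is the Python counterpart of temporarily setting [Console]::OutputEncoding: in both cases the caller, not the program, has to know which encoding the program emits.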
Re output-line truncation with Out-String / Out-File (and therefore also > and >>):
As Mustafa Zengin's helpful answer points out, in your particular case - due to use of tf.exe - the truncation happens at the source, i.e. it is tf.exe itself that outputs truncated data per its default formatting (implied /format:brief when /noprompt is also specified).
In general, Out-String and Out-File / > / >> do situationally truncate or line-wrap their output lines based on the console-window width (with a default of 120 chars. in the absence of a console):
Truncation or line-wrapping applies only to output lines stemming from the representations of non-primitive, non-string objects generated by PowerShell's rich output-formatting system:
Strings themselves ([string] input) as well as the string representations of .NET primitive types (plus a few more single-value-only types) are not subject to truncation / line-wrapping.
Since PowerShell only ever interprets output from external programs as text ([string] instances), truncation / line-wrapping do not occur.
It follows that there's usually no reason to use Out-String on external-program output - unless you need to join the stream (array) of output lines to form a single, multiline string for further in-memory processing.
However, note that Out-String invariably adds a trailing newline to the resulting string, which may be undesired; use (...) -join [Environment]::NewLine to avoid that; Out-String's problematic behavior is discussed in GitHub issue #14444.

Output to text file with cyrillic content

I'm trying to get a listing of the folders and files on a drive through cmd.
Some folder names are written in the Cyrillic alphabet, so I only get ??? symbols.
My command:
tree /f /a |clip
or
tree /f /a >output.txt
Result:
\---???????????
\---2017 - ????? ??????? ????
01. ?????.mp3
02. ? ???????.mp3
03. ????.mp3
04. ?????? ? ???.mp3
05. ?????.mp3
06. ???? ?????.mp3
07. ???????? ????.mp3
08. ??? ?? ?????.mp3
Cover.jpg
Any idea?
tree.com uses the native UTF-16 encoding when writing to the console, just like cmd.exe and powershell.exe. So at first you'd expect redirecting the output to a file or pipe to also use Unicode. But tree.com, like most command-line utilities, encodes output to a pipe or disk file using a legacy codepage. (Speaking of legacy, the ".com" in the filename here is historical. In 64-bit Windows it's a regular 64-bit executable, not 16-bit DOS code.)
When writing to a pipe or disk file, some programs hard code the system ANSI codepage (e.g. 1252 in Western Europe) or OEM codepage (e.g. 850 in Western Europe), while some use the console's current output codepage (if attached to a console), which defaults to OEM. The latter would be great because you can change the console's output codepage to UTF-8 via chcp.com 65001. Unfortunately tree.com uses the OEM codepage, with no option to use anything else.
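Why the result is question marks rather than mojibake can be shown with a short Python sketch. The folder name is a made-up example, and cp850 stands in for a legacy code page without Cyrillic letters; the actual code page depends on the system.

```python
# Encoding text into a code page that lacks its characters loses them for good.
name = "Музыка"  # hypothetical folder name

# A Western European OEM code page (cp850) has no Cyrillic letters; a lenient
# encoder substitutes "?" for each one, and that is what lands in the file.
lossy = name.encode("cp850", errors="replace")

# A Cyrillic OEM code page (cp866) or UTF-8 can represent the name.
ok_866 = name.encode("cp866").decode("cp866")
ok_utf8 = name.encode("utf-8").decode("utf-8")

print(lossy)
print(ok_866, ok_utf8)
```

Since the substitution happens when the program encodes its output, no choice of encoding when reading output.txt afterwards can bring the names back.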
cmd.exe, on the other hand, at least provides a /u option to output its built-in commands as UTF-16. So, if you don't really need tree-formatted output, you could simply use cmd's dir command. For example:
cmd /u /c "dir /s /b" | clip
If you do need tree-formatted output, one workaround would be to read the output from tree.com directly from a console screen buffer, which can be done relatively easily for up to 9,999 lines. But that's not generally practical.
Otherwise PowerShell is probably your best option. For example, you could modify the Show-Tree script to output files in addition to directories.

windows cmd pipe not unicode even with /U switch

I have a little C# console program that outputs some text using Console.WriteLine. I then redirect this output into a text file like:
c:myprogram > textfile.txt
However, the file is always an ANSI text file, even when I start cmd with the /u switch.
cmd /? says about the /u switch:
/U Causes the output of internal
commands to a pipe or file to be Unicode
And it indeed makes a difference: when I do
c:echo "foo" > text.txt
the text.txt is Unicode (without BOM).
I wonder why redirecting the output of my console program into a new file does not likewise create a Unicode file, and how I could change that.
For now I just use Windows PowerShell (which produces a Unicode file with a correct BOM), but I'd still like to know how to do it with cmd.
Thanks!
The /U switch, as the documentation says, affects whether internal commands generate Unicode output. Your program is not one of cmd.exe's internal commands, so the /U option does not affect it.
To create a Unicode text file, you need to make sure your program is generating Unicode text.
Even that may not be enough, though. I came across this blog from Junfeng Zhang describing how to write Unicode text in a console program. It checks the file type of the standard output handle. For character files (a console or LPT port), it calls WriteConsoleW. For all other types of handles (including disk files and pipes), it converts the output string to the console's current code page. I'm afraid I don't know how that translates into .NET terms, though.
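The handle-type check can be sketched in Python, where isatty() plays the role of asking whether stdout is a console. This is a variation, not the blog's code: instead of converting to the console code page when redirected, it writes UTF-16 LE without a BOM to mirror what cmd /u does for internal commands. The function name emit and its parameters are made up.

```python
import io
import sys


def emit(text: str, stream=None, file_encoding: str = "utf-16-le") -> None:
    """Write text normally to a console; write explicitly encoded bytes otherwise."""
    stream = sys.stdout if stream is None else stream
    if stream.isatty():
        # Interactive console: let the runtime handle Unicode output.
        stream.write(text)
    else:
        # Redirected to a file or pipe: choose the byte encoding ourselves.
        stream.flush()
        stream.buffer.write(text.encode(file_encoding))
        stream.buffer.flush()


# Usage with a stand-in for redirected stdout (an in-memory, non-tty stream):
raw = io.BytesIO()
redirected = io.TextIOWrapper(raw, encoding="ascii")
emit("foo\r\n", redirected)
captured = raw.getvalue()
print(captured)
```

Each ASCII character ends up as two bytes in the captured output, which is the same shape as the BOM-less Unicode file that echo produces under cmd /u.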
I had a look at how mscorlib implements Console.WriteLine, and it seems to decide which text output encoding to use based on a call to GetConsoleOutputCP. So I'm guessing (but have not yet confirmed) that the code page returned is a different one for a PS console than for a cmd console, so that my program indeed only outputs ANSI when running from cmd.