How can I get the '…' character into a findstr in cmd? - command-line

I'm trying to search for a special character using the CMD command:
findstr /s /i <special_character> *.*
The character I want to search for is …
That is, three dots all compressed into a single character.
How can I do this?

If the character encoding of the text file is Microsoft Windows proprietary codepage 1252 (English and Western Europe), then you can use FINDSTR.EXE.
The ellipsis in codepage 1252 is at 0x85. To enter the character at the command line using codepage 437, you would hold down the Alt key while entering the decimal value of 0x85. That decimal value is 133.
Enter the following text into Notepad by either copy and paste or holding down the Alt key while entering on the numeric keypad 0133. Save this as myscript.bat.
findstr /L "…" myfile.txt
From a cmd shell, enter TYPE myscript.bat. It will appear as shown below. This is because 133 in codepage 1252 is the HORIZONTAL ELLIPSIS character, but in codepage 437 it is a LATIN SMALL LETTER A WITH GRAVE.
findstr /L "à" myfile.txt
Use the chcp command to see the codepage currently used in the cmd shell.
When myscript.bat is run from the cmd shell, the character at codepoint 0x85 will be displayed as LATIN SMALL LETTER A WITH GRAVE because the console is using codepage 437.
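The point above, that one byte value is shown as two different characters depending on the codepage, can be checked outside the console. This is a minimal Python sketch; the helper name show_in_codepage is made up for illustration, and it uses Python's built-in cp1252 and cp437 codecs rather than anything findstr itself does.

```python
# Byte value discussed above: decimal 133 == hexadecimal 0x85.
ELLIPSIS_BYTE = bytes([133])


def show_in_codepage(raw: bytes, codepage: str) -> str:
    """Decode raw bytes the way a viewer using `codepage` would (hypothetical helper)."""
    return raw.decode(codepage)


in_cp1252 = show_in_codepage(ELLIPSIS_BYTE, "cp1252")  # HORIZONTAL ELLIPSIS
in_cp437 = show_in_codepage(ELLIPSIS_BYTE, "cp437")    # LATIN SMALL LETTER A WITH GRAVE

# Print code points only, so the output is safe on any console codepage.
print(f"0x{ELLIPSIS_BYTE[0]:02X} in cp1252 is U+{ord(in_cp1252):04X}")
print(f"0x{ELLIPSIS_BYTE[0]:02X} in cp437  is U+{ord(in_cp437):04X}")
```

The byte stored in the batch file never changes; only its on-screen interpretation does.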

Related

Output to text file with cyrillic content

I'm trying to get a list of the folders and files on a drive as output through cmd.
Some folder names are written in the Cyrillic alphabet, so I only get ??? symbols for them.
My command:
tree /f /a |clip
or
tree /f /a >output.txt
Result:
\---???????????
\---2017 - ????? ??????? ????
01. ?????.mp3
02. ? ???????.mp3
03. ????.mp3
04. ?????? ? ???.mp3
05. ?????.mp3
06. ???? ?????.mp3
07. ???????? ????.mp3
08. ??? ?? ?????.mp3
Cover.jpg
Any idea?
tree.com uses the native UTF-16 encoding when writing to the console, just like cmd.exe and powershell.exe. So at first you'd expect redirecting the output to a file or pipe to also use Unicode. But tree.com, like most command-line utilities, encodes output to a pipe or disk file using a legacy codepage. (Speaking of legacy, the ".com" in the filename here is historical. In 64-bit Windows it's a regular 64-bit executable, not 16-bit DOS code.)
When writing to a pipe or disk file, some programs hard code the system ANSI codepage (e.g. 1252 in Western Europe) or OEM codepage (e.g. 850 in Western Europe), while some use the console's current output codepage (if attached to a console), which defaults to OEM. The latter would be great because you can change the console's output codepage to UTF-8 via chcp.com 65001. Unfortunately tree.com uses the OEM codepage, with no option to use anything else.
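The effect of encoding to a legacy codepage can be approximated in Python. This is only a sketch: the folder name is hypothetical, cp850 stands in for a Western European OEM codepage, and Python's errors="replace" handler is not the same routine Windows uses, although for Cyrillic text it yields the same question marks seen in the output above.

```python
def encode_for_redirect(text: str, codepage: str) -> bytes:
    """Encode text as a tool would when writing to a file or pipe.

    Characters missing from the codepage become '?' (Python's 'replace' handler).
    """
    return text.encode(codepage, errors="replace")


name = "Музыка"  # hypothetical Cyrillic folder name

lost = encode_for_redirect(name, "cp850")   # Western European OEM: no Cyrillic
kept = encode_for_redirect(name, "cp866")   # Cyrillic OEM codepage
utf8 = encode_for_redirect(name, "utf-8")   # what codepage 65001 would give

print(lost)                                 # only ASCII question marks remain
print(kept.decode("cp866") == name)
print(utf8.decode("utf-8") == name)
```

Once the characters have been replaced during encoding, nothing downstream (clip, a text editor) can recover them.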
cmd.exe, on the other hand, at least provides a /u option to output its built-in commands as UTF-16. So, if you don't really need tree-formatted output, you could simply use cmd's dir command. For example:
cmd /u /c "dir /s /b" | clip
If you do need tree-formatted output, one workaround would be to read the output from tree.com directly from a console screen buffer, which can be done relatively easily for up to 9,999 lines. But that's not generally practical.
Otherwise PowerShell is probably your best option. For example, you could modify the Show-Tree script to output files in addition to directories.
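If a scripting language is available, another route is to build the listing yourself and write it as UTF-8. The following Python sketch is not the Show-Tree script and does not reproduce tree.com's exact layout; render_tree and write_tree are names made up for this example, and the connectors only imitate the ASCII style of tree /a.

```python
from pathlib import Path
from typing import List


def render_tree(root: Path, prefix: str = "") -> List[str]:
    """Return tree-style lines for everything below `root` (directories first)."""
    lines: List[str] = []
    entries = sorted(root.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "\\---" if is_last else "+---"
        lines.append(f"{prefix}{connector}{entry.name}")
        if entry.is_dir():
            # Keep a vertical bar only while siblings remain at this level.
            child_prefix = prefix + ("    " if is_last else "|   ")
            lines.extend(render_tree(entry, child_prefix))
    return lines


def write_tree(root: Path, destination: Path) -> None:
    """Write the listing as UTF-8 so non-ASCII names survive."""
    text = "\n".join([str(root)] + render_tree(root)) + "\n"
    destination.write_text(text, encoding="utf-8")
```

A call such as write_tree(Path("."), Path("output.txt")) would then take the place of the redirected tree command. Because the file is written directly rather than through the console, the console codepage does not matter.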

GHCi: incorrect text output despite the right font and codepage

Windows 8.1 x64 Russian.
I create and fill the %AppData%\ghc\ghci.conf file:
:! title GHCi (Haskell interpreter)
putStrLn $ replicate 30 '*'
putStrLn "© Андрей Бушман, 2014" -- The sample of some not English chars...
:set prompt "\x03BB: "
This file is encoded as UTF-8 without BOM. I run ghci via cmd.exe and PowerShell.exe, having set the necessary font and codepage beforehand. But I get an unexpected result: incorrect text output. Why?
UPD
I can resave my ghci.conf file with Windows-1251 encoding and add :! chcp 1251 as the first row:
:! chcp 1251
:! title GHCi (Haskell interpreter)
putStrLn $ replicate 30 '*'
putStrLn "© Андрей Бушман, 2014" -- The sample of not English chars...
:set prompt "\x03BB: "
Now I see the correct result:
But why doesn't it work when I save the file as UTF-8 (with or without BOM) and set codepage 65001?
Also... I get incorrect text in the title when I use the cyrillic chars:
:! chcp 1251
-- Cyrillic in the title:
:! title GHCi (Интерпретатор Haskell)
putStrLn $ replicate 30 '*'
putStrLn "© Андрей Бушман, 2014 (\"авторские права\" на данное сообщение ;) )"
:set prompt "\x03BB: "
How can I fix it?
This is probably related to the following question:
Unicode console I/O in Haskell on Windows
Btw, setting the font in the console will not affect what codepage is used.

UTF8 Script in PowerShell outputs incorrect characters

I've created a UTF8 script for PowerShell with non-ascii characters.
characters.ps1:
Write-Host "ç â ã á à"
When the script is run in PowerShell console, it outputs wrong characters.
However, if I write the chars directly in the console, they are shown as expected:
Does anyone know what causes this behavior?
The problem arose from a script I wrote that has hardcoded paths containing non-ASCII characters. When I try to pass such a path as an argument to cmdlets (in this case I was going to robocopy a folder), the command fails because it cannot find the path (which is displayed incorrectly on the screen).
Changing the encoding of the script to UTF-8 with BOM solved the issue.
I was using SublimeText with the EncodingHelper plugin to control the character-set of the script. It was set correctly to UTF8.
I changed the encoding of the script in SublimeText to "UTF-8 with BOM" and the output was shown correctly.
I created the same script with Notepad++, which defaults to "UTF-8 with BOM", and the string was shown correctly in the console.
I changed the encoding of the script in Notepad++ to "UTF-8 without BOM" and it was shown incorrectly.
It seems PowerShell cannot guess correctly the encoding of UTF-8 files with no BOM.
In my case the problem was caused by creating a new PowerShell script with Visual Studio Code, which has a default encoding of UTF-8 without BOM. Setting the encoding to "Windows 1252" solved the problem.
It seems that PowerShell can't handle UTF-8 without a BOM; it needs the "Windows 1252" or "UTF-8 with BOM" encoding.
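The behavior reported in these answers is consistent with a reader that trusts a BOM when present and otherwise falls back to the system ANSI codepage. The Python sketch below is a simplified model of that, not PowerShell's actual code: the function name is hypothetical, cp1252 is assumed as the ANSI codepage, and UTF-16 BOMs are ignored.

```python
import codecs


def decode_script_like_windows_powershell(raw: bytes, ansi_codepage: str = "cp1252") -> str:
    """Simplified model: honour a UTF-8 BOM, otherwise assume the ANSI codepage."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode(ansi_codepage)  # assumption: system ANSI codepage is 1252


source = 'Write-Host "ç â ã á à"'
without_bom = source.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

# With a BOM the text round-trips; without one, each accented letter
# turns into two wrong characters.
print(decode_script_like_windows_powershell(with_bom) == source)
print(decode_script_like_windows_powershell(without_bom) == source)
```

This also suggests why saving as "Windows 1252" works: the bytes on disk then match the fallback codepage.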
Try this before invoking your script:
$OutputEncoding = [Console]::OutputEncoding
There is a reliable way to detect utf8nobom (https://unicodebook.readthedocs.io/guess_encoding.html). Like a lot of other little things, this seems to work better in PS 6. Even my beloved emacs 25 for windows gets the encoding wrong.
PS C:\users\admin> pwsh
PowerShell 6.1.0
Copyright (c) Microsoft Corporation. All rights reserved.
https://aka.ms/pscore6-docs
Type 'help' to get help.
PS C:\users\admin> "write-host 'ç â ã á à'" | set-content -Encoding utf8NoBOM accent.ps1
PS C:\users\admin> .\accent
ç â ã á à
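One common heuristic for BOM-less files, in the spirit of the guide linked above, is strict validation: bytes that decode cleanly as UTF-8 are very likely UTF-8, because text in a legacy codepage with accented letters almost never forms valid UTF-8 sequences. A minimal Python sketch follows; the function name is made up, and whether PowerShell 6 uses this exact check is not something the source states.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes are valid UTF-8 (pure ASCII also passes)."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


sample = "write-host 'ç â ã á à'"

print(looks_like_utf8(sample.encode("utf-8")))   # valid UTF-8 sequences
print(looks_like_utf8(sample.encode("cp1252")))  # lone high bytes are invalid UTF-8
```

The check cannot tell UTF-8 from plain ASCII, but for ASCII-only files the distinction does not matter.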

windows cmd pipe not unicode even with /U switch

I have a little C# console program that outputs some text using Console.WriteLine. I then redirect this output into a text file like this:
C:\>myprogram > textfile.txt
However, the file is always an ANSI text file, even when I start cmd with the /u switch.
cmd /? says about the /u switch:
/U Causes the output of internal
commands to a pipe or file to be Unicode
And it indeed makes a difference, when I do an
C:\>echo "foo" > text.txt
the resulting text.txt is Unicode (without BOM).
I wonder why redirecting the output of my console program into a new file does not create a Unicode file likewise, and how I could change that.
For now I just use Windows PowerShell (which produces a Unicode file with a correct BOM), but I'd still like to know how to do it with cmd.
Thanks!
The /U switch, as the documentation says, affects whether internal commands generate Unicode output. Your program is not one of cmd.exe's internal commands, so the /U option does not affect it.
To create a Unicode text file, you need to make sure your program is generating Unicode text.
Even that may not be enough, though. I came across this blog from Junfeng Zhang describing how to write Unicode text in a console program. It checks the file type of the standard output handle. For character files (a console or LPT port), it calls WriteConsoleW. For all other types of handles (including disk files and pipes), it converts the output string to the console's current code page before writing it. I'm afraid I don't know how that translates into .NET terms, though.
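The decision described in that blog post can be sketched in a few lines. This Python version only models the choice of bytes and calls no Windows API; the function name, the is_character_device flag, and the cp437 default are assumptions made for illustration.

```python
def encode_for_stdout(text: str, is_character_device: bool,
                      console_codepage: str = "cp437") -> bytes:
    """Model of the blog's logic: UTF-16 for a console, codepage bytes otherwise."""
    if is_character_device:
        # A wide-character console write consumes UTF-16LE code units.
        return text.encode("utf-16-le")
    # Files and pipes receive text converted to the console's output codepage;
    # characters missing from that codepage are lost (shown here as '?').
    return text.encode(console_codepage, errors="replace")


print(encode_for_stdout("foo", is_character_device=True))
print(encode_for_stdout("foo", is_character_device=False))
print(encode_for_stdout("\u2026", is_character_device=False))
```

Under this model, a program that wants UTF-16 or UTF-8 in a redirected file has to write those bytes itself instead of relying on the console codepage.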
I had a look at how mscorlib implements Console.WriteLine, and it seems to decide which text output encoding to use based on a call to GetConsoleOutputCP. So I'm guessing (but have not yet confirmed) that the codepage returned is a different one for a PS console than for a cmd console, so that my program indeed only outputs ANSI when running from cmd.

Is there a Windows command shell that will display Unicode characters?

Assuming I have fonts installed which have the appropriate glyphs in them, is there a command shell for Windows XP that will display Unicode characters? At a minimum, two things that should display Unicode correctly:
Directory listings. I don't care what I have to type (dir, ls, get-childitem, etc.), so long as files with Unicode characters in their names appear with the right glyphs, not the unprintable character box.
Text file content listings. Again, doesn't matter to me if it's 'less', 'more', 'cat', 'dog', etc., so long as the characters are printed. I recognize that this is more complicated because of character encoding of the file, so if I have to specify that on the command line that's fine with me.
Here's what I've tried so far:
cmd.exe
Windows PowerShell; including the multilingual version.
Cygwin bash
No luck. I even tried installing custom fonts for cmd/PowerShell. PowerShell and cmd.exe seem to be Unicode-aware in the sense that I can copy/paste the non-printable box out of there and it will paste into other apps with the correct characters. Cygwin (?) seems to convert to the ? character and that comes through in the copy/paste.
Any ideas?
To do this with cmd.exe, you'll need to use the console properties dialog to switch to a Unicode TrueType font.
Then use these commands:
CHCP 65001
DIR > UTF8.TXT
TYPE UTF8.TXT
Commands:
Switch console to UTF-8 (65001)
Redirect output of DIR to UTF8.TXT
Dump UTF-8 to console
The characters will still need to be supported by the font to display properly on the console.
I18N: Unicode at the Windows command prompt (C++; .Net; Java)
This was a major issue in PowerShell v1. Version 2 is shipping with a "graphical shell" that corrects the problem, which is ultimately not with PowerShell, but with the Windows console host (which Cmd.exe also uses). You can get the current CTP for PowerShell v2, if you want.
Actually, PowerShell v2.0 was finalized and shipped with the release of Windows 7 and Windows Server 2008 R2 in early August. In addition, the backported versions (Windows Vista/2008) reached their Release Candidate milestone just the other day; Windows XP/Windows Server 2003 should follow very shortly. Linky linky.
Setting the codepage to UTF-8 with the command "chcp 65001" should help you print file contents correctly to the shell (using cmd.exe). This won't work for directory listings though (UTF-16 encoding in NTFS file names).
Try this:
powershell.exe -NoExit /c "chcp.com 65001"
For those who use msysgit:
powershell.exe -NoExit /c "chcp.com 65001; sh --login -i"
Do not forget to change the font of the window to a TrueType font that covers the characters you need (e.g. "Lucida Console").
This is how I got Chinese output in cmd.exe running on Windows 7 Pro (English version). I also tried file names with Japanese, Russian, and Polish characters, and they all seem to display correctly. Input also seems to work, at least when I tried a dir xxx* containing non-ASCII characters.
Install console2, which is a front-end to cmd.exe (and other shells)
After installation, follow these instructions
Delete the key HKEY_CURRENT_USER\Console\Console2 command window in the registry.
Import the following data into windows registry:
Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Console\Console2 command window]
"CodePage"=dword:000003a8
"FontSize"=dword:000a0000
"FontFamily"=dword:00000036
"FontWeight"=dword:00000190
"FaceName"="細明體"
"HistoryNoDup"=dword:00000000
You may or may not have to change the font. Initially I had the font set to #NimSum, and the Chinese characters came out rotated 90 degrees. Then I switched to NimSum (without the #) and they came out correctly. Then, just out of curiosity, I switched to Consolas, and I could still see the Chinese characters. So I'm not sure whether you actually have to set the font or not.
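The dword values in the registry snippet above are hexadecimal, which hides what they mean. The Python sketch below decodes three of them. The helper names are made up, and the reading of FontSize (height in the high 16 bits, width in the low 16 bits) is my understanding of the console's layout rather than something the source states.

```python
from typing import Tuple


def parse_dword(text: str) -> int:
    """Parse a .reg value such as 'dword:000003a8' into an integer."""
    prefix = "dword:"
    if not text.startswith(prefix):
        raise ValueError(f"not a dword value: {text!r}")
    return int(text[len(prefix):], 16)


def split_font_size(value: int) -> Tuple[int, int]:
    """Assumed layout: high word is the height, low word is the width."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


code_page = parse_dword("dword:000003a8")
font_height, font_width = split_font_size(parse_dword("dword:000a0000"))
font_weight = parse_dword("dword:00000190")

print(code_page, font_height, font_width, font_weight)
```

So the snippet selects codepage 936 (Simplified Chinese), not UTF-8, together with a font height of 10 and a weight of 400.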
For a true shell, try PowerShell Plus. You can select Unicode fonts and work with other languages, not only in the editor, but in the true console.
Try Console 2. Be careful with the colors/palette configurations though. Those are a bit buggy. I have confirmed them to not work; they behave like cmd.exe.
Open an elevated command prompt (run cmd as administrator). Query your registry for available TrueType fonts to the console by:
REG query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"
You'll see an output like:
0 REG_SZ Lucida Console
00 REG_SZ Consolas
936 REG_SZ *新宋体
932 REG_SZ *MS ゴシック
Now we need to add a TrueType font that supports the characters you need, such as Courier New. We do this by adding a value whose name consists only of zeros and is one zero longer than the longest existing all-zero name, so in this case the next one would be "000":
REG ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont" /v 000 /t REG_SZ /d "Courier New"
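The naming rule for the new value can be written down explicitly. This small Python sketch only computes the name from a list you supply (for example, copied from the REG query output); it does not touch the registry, and the function name is hypothetical.

```python
from typing import Iterable


def next_zero_name(existing_names: Iterable[str]) -> str:
    """Return an all-zero value name one character longer than the longest existing one."""
    zero_runs = [len(name) for name in existing_names if name and set(name) == {"0"}]
    return "0" * (max(zero_runs, default=0) + 1)


# Value names taken from the sample query output above.
print(next_zero_name(["0", "00", "936", "932"]))
```

Names such as 936 and 932 are ignored because they are codepage-specific entries, not part of the all-zero sequence.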
Now we implement UTF-8 support:
REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 65001 /f
Set default font to "Courier New":
REG ADD HKCU\Console /v FaceName /t REG_SZ /d "Courier New" /f
Set font size to 20 :
REG ADD HKCU\Console /v FontSize /t REG_DWORD /d 20 /f
Enable quick edit if you like :
REG ADD HKCU\Console /v QuickEdit /t REG_DWORD /d 1 /f
As of November 2011, MinTTY is now Cygwin's default terminal emulator (installed by setup.exe). MinTTY is a fork of PuTTY's terminal emulator, and as such sports proper Unicode support and much-improved compatibility with other terminal emulators.
PowerShell V2 CTP3 inside Console2 seems to do that. The only downside is that the default console encoding is UCS-2 LE instead of UTF-8.
Also from
UTF-16 on cmd.exe
Open/run cmd.exe
Click on the icon at the top-left corner
Select properties
Then "Font" bar
Select "Lucida Console" and OK.
Write Chcp 10000 at the prompt
Finally dir /b
A fast and convenient way to do it is in Explorer.
1. Open an Explorer window.
2. Navigate to the top-level directory you want to search.
3. In the upper-right corner, there is a search field.