ElFinder and NTFS UTF-16 file names - encoding

I use WAMP server and ElFinder 2.x, it works fine except for filenames are encoded in UTF-8 when uploaded, so they look like Список предприятий ВРК123.xlsx in Windows Explorer. It's OK, but it would be nice to be able to copy files with unicode filename to ElFinder's folder via Windows Explorer.
As far as I know NTFS uses UTF-16. nao-pon answered here that one needs to set encoding, locale in connector options for multi-byte encodings. I've tried to set these options to 'UTF-16' and 'ru_RU.UTF-16', but ElFinder cannot load folder at all then and gives Invalid backend configuration. Readable volumes not available error.
UPD: it works fine with 'encoding' => 'CP1251' but well... it doesn't list files with names like 한자.txt.

Related

how to find the encoding type of configuration.properties files in powershell [duplicate]

This isn't really a programming question, is there a command line or Windows tool (Windows 7) to get the current encoding of a text file? Sure I can write a little C# app but I wanted to know if there is something already built in?
Open up your file using regular old vanilla Notepad that comes with Windows.
It will show you the encoding of the file when you click "Save As...".
It'll look like this:
Whatever the default-selected encoding is, that is what your current encoding is for the file.
If it is UTF-8, you can change it to ANSI and click save to change the encoding (or visa-versa).
I realize there are many different types of encoding, but this was all I needed when I was informed our export files were in UTF-8 and they required ANSI. It was a onetime export, so Notepad fit the bill for me.
FYI: From my understanding I think "Unicode" (as listed in Notepad) is a misnomer for UTF-16.
More here on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicdoe
If you have "git" or "Cygwin" on your Windows Machine, then go to the folder where your file is present and execute the command:
file *
This will give you the encoding details of all the files in that folder.
The (Linux) command-line tool 'file' is available on Windows via GnuWin32:
http://gnuwin32.sourceforge.net/packages/file.htm
If you have git installed, it's located in C:\Program Files\git\usr\bin.
Example:
C:\Users\SH\Downloads\SquareRoot>file *
_UpgradeReport_Files; directory
Debug; directory
duration.h; ASCII C++ program text, with CRLF line terminators
ipch; directory
main.cpp; ASCII C program text, with CRLF line terminators
Precision.txt; ASCII text, with CRLF line terminators
Release; directory
Speed.txt; ASCII text, with CRLF line terminators
SquareRoot.sdf; data
SquareRoot.sln; UTF-8 Unicode (with BOM) text, with CRLF line terminators
SquareRoot.sln.docstates.suo; PCX ver. 2.5 image data
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary info
SquareRoot.vcproj; XML document text
SquareRoot.vcxproj; XML document text
SquareRoot.vcxproj.filters; XML document text
SquareRoot.vcxproj.user; XML document text
squarerootmethods.h; ASCII C program text, with CRLF line terminators
UpgradeLog.XML; XML document text
C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
_UpgradeReport_Files; binary
Debug; binary
duration.h; us-ascii
ipch; binary
main.cpp; us-ascii
Precision.txt; us-ascii
Release; binary
Speed.txt; us-ascii
SquareRoot.sdf; binary
SquareRoot.sln; utf-8
SquareRoot.sln.docstates.suo; binary
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary infobinary
SquareRoot.vcproj; us-ascii
SquareRoot.vcxproj; utf-8
SquareRoot.vcxproj.filters; utf-8
SquareRoot.vcxproj.user; utf-8
squarerootmethods.h; us-ascii
UpgradeLog.XML; us-ascii
Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker
EXE can be found here
Install git ( on Windows you have to use git bash console). Type:
file --mime-encoding *
for all files in the current directory , or
file --mime-encoding */*
for the files in all subdirectories
Here's my take how to detect the Unicode family of text encodings via BOM. The accuracy of this method is low, as this method only works on text files (specifically Unicode files), and defaults to ascii when no BOM is present (like most text editors, the default would be UTF8 if you want to match the HTTP/web ecosystem).
Update 2018: I no longer recommend this method. I recommend using file.exe from GIT or *nix tools as recommended by #Sybren, and I show how to do that via PowerShell in a later answer.
# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
$bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)
if(!$bytes) { return 'utf8' }
switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
'^efbbbf' { return 'utf8' }
'^2b2f76' { return 'utf7' }
'^fffe' { return 'unicode' }
'^feff' { return 'bigendianunicode' }
'^0000feff' { return 'utf32' }
default { return 'ascii' }
}
}
dir ~\Documents\WindowsPowershell -File |
select Name,#{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
ft -AutoSize
Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem only checks known text files, and when you're only looking for "bad encodings" from a known list of tools. (i.e. SQL Management Studio defaults to UTF16, which broke GIT auto-cr-lf for Windows, which was the default for many years.)
A simple solution might be opening the file in Firefox.
Drag and drop the file into firefox
Press Ctrl+I to open the page info
and the text encoding will appear on the "Page Info" window.
Note: If the file is not in txt format, just rename it to txt and try again.
P.S. For more info see this article.
I wrote the #4 answer (at time of writing). But lately I have git installed on all my computers, so now I use #Sybren's solution. Here is a new answer that makes that solution handy from powershell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).
Add this to your profile.ps1:
$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe
And used like: file.exe --mime-encoding *. You must include .exe in the command for PS alias to work.
But if you don't customize your PowerShell profile.ps1 I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0
and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without git, but will write warnings when git is not found.
The .exe in the command is also how I use C:\WINDOWS\system32\where.exe from powershell; and many other OS CLI commands that are "hidden by default" by powershell, *shrug*.
you can simply check that by opening your git bash on the file location then running the command file -i file_name
example
user filesData
$ file -i data.csv
data.csv: text/csv; charset=utf-8
Some C code here for reliable ascii, bom's, and utf8 detection: https://unicodebook.readthedocs.io/guess_encoding.html
Only ASCII, UTF-8 and encodings using a BOM (UTF-7 with BOM, UTF-8 with BOM,
UTF-16, and UTF-32) have reliable algorithms to get the encoding of a document.
For all other encodings, you have to trust heuristics based on statistics.
EDIT:
A powershell version of a C# answer from: Effective way to find any file's Encoding. Only works with signatures (boms).
# get-encoding.ps1
param([Parameter(ValueFromPipeline=$True)] $filename)
begin {
# set .net current directoy
[Environment]::CurrentDirectory = (pwd).path
}
process {
$reader = [System.IO.StreamReader]::new($filename,
[System.Text.Encoding]::default,$true)
$peek = $reader.Peek()
$encoding = $reader.currentencoding
$reader.close()
[pscustomobject]#{Name=split-path $filename -leaf
BodyName=$encoding.BodyName
EncodingName=$encoding.EncodingName}
}
.\get-encoding chinese8.txt
Name BodyName EncodingName
---- -------- ------------
chinese8.txt utf-8 Unicode (UTF-8)
get-childitem -file | .\get-encoding
Looking for a Node.js/npm solution? Try encoding-checker:
npm install -g encoding-checker
Usage
Usage: encoding-checker [-p pattern] [-i encoding] [-v]
Options:
--help Show help [boolean]
--version Show version number [boolean]
--pattern, -p, -d [default: "*"]
--ignore-encoding, -i [default: ""]
--verbose, -v [default: false]
Examples
Get encoding of all files in current directory:
encoding-checker
Return encoding of all md files in current directory:
encoding-checker -p "*.md"
Get encoding of all files in current directory and its subfolders (will take quite some time for huge folders; seemingly unresponsive):
encoding-checker -p "**"
For more examples refer to the npm docu or the official repository.
Similar to the solution listed above with Notepad, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options..."
The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file. It has a lot more text encodings listed in there than Notepad does, so it's useful when dealing with various files from around the world and whatever else.
Just like Notepad, you can also change the encoding from the list of options there, and then saving the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).
The only way that I have found to do this is VIM or Notepad++.
EncodingChecker
File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.
File Encoding Checker requires .NET 4 or above to run.

vmoptions classpath with non-ascii characters

I'm adding the following line -classpath/p ${installer:sys.userHome}/.comput/updates/latest.jar to the vmoption file. (Tried both options: via installer 'Add VM option' action and via launcher config).
Works pretty fine with ASCII user name (with spaces as well), but fails with non-ascii user names (I'm testing with Russian). The vmoption file looks fine to me: the path is correct and has the right encoding: CP 1251 for my case:
However the path passed to JVM seems to have incorrectly decoded characters: On the attached screen you may see the actual path passed to JVM (checked via YourKit) from Install4J launcher:
and you may also compare it with the screen when the non-ascii path is passed via command prompt:
The only workaround I have found is to substitute the path with 8.3 Windows path, but converting to it on pure Java seems very error prone to me.
Appeciate your help very much!

CFdirectory with Coldfusion 11, issue with non ascii characters in filenames

I have a similar question to this:
ColdFusion, CFDirectory and the French
which was not given a satisfactory answer.
We have upgraded from Coldfusion 9 to Coldfusion 11. So far no major problems except the following:
When using CFdirectory to display file names that contain non ASCII characters in their names (eg: accents, umlauts) we get to see the file name with replacement characters � instead of the correct UTF equivalent. For example a file named L’État, c’est moi.pdf is displayed as L�����tat, c���est moi.pdf.
We are confident that this is a Coldfusion issue as nothing has changed but the Coldfusion version. With Coldfusion 9 CFdirectory worked OK when listing the same accented filenames. Our OS is Redhat 7.0 and the file names are also displayed correctly on the terminal with the ls command. I have also created a quick PHP script to see if PHP can read correctly the directory with the "readdir" command and there no problems there either, filenames are rendered correctly.
So I believe this has to be a Coldfusion 11 issue. I have added the -Dfile.encoding=UTF-8 -Dencoding=UTF-8 parameters in the JVM settings from the Coldfusion administrator server interface but it made no difference.
Any suggestions on how to rectify this would be appreciated.
example of code used follows:
<cfdirectory
action="list"
directory="#ExpandPath( './' )#/pdfs"
listinfo="name"
name="qFile"
/>
<cfdump
var="#qFile#"
label="All Files"
/>
Have you tried setting the cfprocessingdirective tag?
<cfprocessingdirective pageencoding="utf-8">
CF 11 WikiDocs
Also, In the Chrome Network Inspector, make sure the encoding is being returned correctly. Eg:
Content-Type:text/html; charset=UTF-8
If your environment is Linux, you need to have a clean UTF-8 configuration.
Please have a look here.
I had the same problem, I just add into the file ~/.bashrc these lines:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
After that, don't forget to restart your Coldfusion Server
sudo /opt/coldfusion11/cfusion/bin/coldfusion restart
Please see: Why are certain characters not being injected correctly to SQL Server from a CFQUERY?
Make sure your file is saved with encoding Unicode UTF-8.
Also make sure your JVM arguments will process that as well. Admin > Server Settings > Java and JVM. Add " -Dfile.encoding=UTF-8" to the Arguments.
I had the same problem this solved my bug
/.bashrc
LC_ALL="de_DE.UTF-8"
on linux and after change restart coldfusion application

The combination PostScript and UTF-8 is not supported

I am new to linux and antiword.
I want to convert word to PostScript.
I am able to create the file but the PS file is not openning.
And at the time of conversion it is giving error like
"The combination PostScript and UTF-8 is not supported"
This utility 'antiword', the latest version of which I found to be 0.37 (21 Oct. 2005) ships with a few Readme files.
It explains itself in its own History file as such:
The name comes from: "The antidote against people who send Microsoft(R) Word files to everybody, because they believe that everybody runs Windows(R) and therefore runs Word".
The same file says about its current version:
Beta release, for evaluation by the public.
And it also says, under the header Known Limitations:
[....]
2) Antiword doesn't show all the images included in a Word document.
[....]
7) PostScript ouput will not work in combination with UTF-8. It only works in combination with character sets ISO-8859-1, ISO-8859-2 and ISO-8859-5.
8) Antiword's error messages are not very helpful.
So here you are. The behavior you are experiencing is exactly as is documented. :-)
#Tushar: It's also recommended to read the included FAQ document.

Oracle -Character Encoding

I am facing a peculiar problem where i need to update a particular value in database to say 'Hellò'. When i run normal update statement the values are updated fine. But when i put it i a .sql script file and then run the update statement the last character gets replaced by a junk value. Can some one enlighten me on this and how oracle processes script files?
If the update statement works then this isn't an issue with the character encoding in the database so there are two main culprits to look at; your software and your file encoding.
Check that you editor is UTF8 compliant - Notepad for example is not, but Wordpad is, and so are better editors like UltraEdit. You also need to check that the file is saved as UTF8 since this in not always the default and if you edited and saved a file with Notepad, for example, it won't be UTF8 any more.
Once you have a UTF8 file then you must load it into the database via software that supports UTF8, which excludes SQLPlus in Windows 10g. SQL Developer is OK. If you want to use SQLPLus upgrade your client to 11g. You don't say how you are loading it, so check that whatever you are using supported UTF8.