How to get á, ű, ú characters instead of ?? using JRuby IRB on CentOS?

I need to use the Hungarian characters mentioned in the title, but somehow JRuby doesn't seem to accept them and shows ?? instead.
The OS is CentOS 7, but the same thing happens on 6.5.
The system language is set to hu_HU.utf8.
I also set the encoding in .jrubyrc (default_external, default_internal) first to UTF-8, then to ISO-8859-2.
The result is the same.
With Ruby 2.1.5 there are no problems at all: the characters show up as expected in IRB.
I used rbenv to install both Ruby 2.1.5 and JRuby 1.7.16.1.
Any ideas about how to make it show these Hungarian characters properly?
For example, I get ??rhaj?? instead of űrhajó (spaceship in Hungarian).
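A minimal sketch (in Python, purely for illustration, not JRuby's actual code path) of one mechanism that produces exactly this output: the UTF-8 bytes of űrhajó read by a program that assumes plain ASCII, which is what typically happens when no UTF-8 locale is in effect. That this is the cause here is an assumption, though it is consistent with the locale-based fix in the answer.

```python
# Illustration only: assumes the program reading the input treats it as plain ASCII,
# as happens when no UTF-8 locale (LANG/LC_ALL) is in effect.
word = "űrhajó"  # "spaceship", the example from the question
utf8_bytes = word.encode("utf-8")  # ű and ó are two bytes each in UTF-8

# An ASCII decoder cannot map bytes >= 0x80, so each such byte becomes U+FFFD...
decoded = utf8_bytes.decode("ascii", errors="replace")
# ...and writing U+FFFD back out to an ASCII-only terminal turns it into "?".
shown = decoded.encode("ascii", errors="replace").decode("ascii")

print(len(utf8_bytes))  # 8
print(shown)            # ??rhaj??
```

Two question marks per Hungarian letter is the telltale sign: each accented letter is two bytes in UTF-8, and each byte is replaced separately.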

Problem solved.
I needed to put this in /etc/environment:
LC_ALL=hu_HU.utf8
LANG=hu_HU.utf8
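To double-check the fix, here is a small Python sketch, assuming /etc/environment contains simple KEY=value lines as shown above. The helper names (parse_environment, is_utf8_locale, locale_ok) are hypothetical, and the check only covers the two variables this answer sets.

```python
def parse_environment(text: str) -> dict:
    """Parse KEY=value lines in the style of /etc/environment.

    Blank lines and lines starting with '#' are skipped; surrounding double quotes are removed.
    """
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def is_utf8_locale(name: str) -> bool:
    """True for names such as hu_HU.utf8 or hu_HU.UTF-8 (optionally with an @modifier)."""
    codeset = name.partition(".")[2].lower().replace("-", "")
    return codeset.split("@")[0] == "utf8"


def locale_ok(env: dict, keys=("LC_ALL", "LANG")) -> bool:
    """Check that every variable the fix sets is present and names a UTF-8 locale."""
    return all(is_utf8_locale(env.get(key, "")) for key in keys)


fixed = parse_environment("LC_ALL=hu_HU.utf8\nLANG=hu_HU.utf8\n")
broken = parse_environment("# no locale settings yet\n")
print(locale_ok(fixed))   # True
print(locale_ok(broken))  # False
```

On a real machine you would pass in the contents of /etc/environment instead of the inline strings. The file is typically read at login, so a new session is needed before JRuby sees the change.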

Related

vmoptions classpath with non-ASCII characters

I'm adding the following line -classpath/p ${installer:sys.userHome}/.comput/updates/latest.jar to the vmoptions file. (I tried both ways: via the installer's 'Add VM option' action and via the launcher configuration.)
It works fine with ASCII user names (including ones with spaces), but fails with non-ASCII user names (I'm testing with Russian). The vmoptions file looks fine to me: the path is correct and has the right encoding, CP1251 in my case.
However, the path passed to the JVM seems to contain incorrectly decoded characters. The attached screenshot shows the actual path passed to the JVM by the install4j launcher (checked with YourKit).
You can also compare it with the screenshot taken when the non-ASCII path is passed via the command prompt.
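A minimal Python sketch of how a correctly CP1251-encoded path turns into garbage when the reader assumes a different code page. The user name "Иван" is hypothetical, and cp1252 is only an illustrative guess at what the reading side might assume; the question does not say which charset the launcher actually uses.

```python
# Hypothetical Russian user name; the real one from the question is not shown.
username = "Иван"
path = "C:\\Users\\" + username + "\\.comput\\updates\\latest.jar"

raw = path.encode("cp1251")            # bytes as stored in a CP1251 vmoptions file
correct = raw.decode("cp1251")         # reader uses the same code page: round-trips
misread = raw.decode("cp1252")         # assumption: reader uses a Western code page

print(correct == path)  # True
print(misread)          # C:\Users\Èâàí\.comput\updates\latest.jar
```

The bytes in the file are identical in both cases. Only the code page used to decode them differs, which is why the file can look correct while the JVM still receives a mangled path.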
The only workaround I have found is to replace the path with its Windows 8.3 short path, but converting to it in pure Java seems very error-prone to me.
I appreciate your help very much!

Wget with a special character inside the URL

I download an HTML page and its files via Wget on Windows:
wget -m -k -p -np --html-extension
That HTML content has a lot of URLs with special characters (example: Chp1).
There are two issues:
Inside the HTML content, URLs containing special characters are turned into garbled text:
Expectation:
Chp1
Actual:
Chp1
The filenames are garbled as well.
The second issue can be solved by adding --restrict-file-names=nocontrol.
How do I solve the first one? Is the Windows version of Wget the problem?
Clearly, inside the HTML it converts URLs with special characters into something else.
Your problem comes from the fact that Windows will still treat your UTF-8 characters as Latin-1 characters, even with the --restrict-file-names=nocontrol command line argument.
GNU's site documents this bug here, and unfortunately it is still an issue for Windows users to this day. Your command should work in a Linux environment, however.
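The answer's explanation, UTF-8 bytes being read as Latin-1, can be illustrated with Python's standard urllib.parse. This shows the general mechanism, not Wget's internals, and the path segment "Chp1-é" is hypothetical because the question's real example lost its special characters.

```python
from urllib.parse import quote, unquote

# Hypothetical URL path segment; the question's real example lost its special characters.
segment = "Chp1-é"

encoded = quote(segment)                           # percent-encodes the UTF-8 bytes
as_utf8 = unquote(encoded)                         # decoded as UTF-8 (the default)
as_latin1 = unquote(encoded, encoding="latin-1")   # same bytes read as Latin-1

print(encoded)    # Chp1-%C3%A9
print(as_utf8)    # Chp1-é
print(as_latin1)  # Chp1-Ã©
```

Each non-ASCII character becomes two or more Latin-1 characters, which matches the "random words" seen in the rewritten links and filenames.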

Forcing UTF-8 over cp1252 (Python3)

I've written some code that makes use of the Biopython Entrez wrapper. The code was working fine on my previous Win10 laptop (Python 3.5.1), but I've just ported it to a new Win10 laptop with the same versions of every package and of Python installed, and I'm now getting a decode error.
The traceback leads to a function that fetches text: it's attempting to decode the text using cp1252 when it should be using UTF-8. I know that similar questions have been asked, but none have dealt with this problem happening inside a package (Biopython in my case). Copying the UTF-8 encoding file in Python/lib and renaming it to cp1252.py solves the problem, but this is obviously not a long-term solution.
File "C:\Users\arjun\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 21715: character maps to <undefined>
If you're using Python 3.x, use the io module for reading (https://docs.python.org/2/library/io.html#io.open).
By default, io.open uses the platform's preferred encoding (cp1252 on your machine, judging by the traceback), but you can specify your own encoding, for example encoding='utf-8', as explained in the docs.
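A short sketch of both halves of this answer, assuming Python 3.6 or later for the f-string. The character "ā" is chosen only because its UTF-8 encoding (C4 81) contains byte 0x81, the byte the traceback in the question complains about. decode_with and sample.txt are hypothetical names for the demo.

```python
import io
import os
import tempfile

# "ā" (U+0101) is encoded in UTF-8 as the two bytes C4 81.
# Byte 0x81 has no mapping in cp1252, the same error as in the question's traceback.
data = "ā".encode("utf-8")


def decode_with(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, returning an error description instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


print(decode_with(data, "cp1252"))  # error: character maps to <undefined>
print(decode_with(data, "utf-8"))   # ā

# Reading a file: pass the encoding explicitly instead of relying on the platform default.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write("űrhajó ā\n")
    with io.open(path, encoding="utf-8") as f:
        text = f.read()

print(text.strip())  # űrhajó ā
```

From Python 3.7 onward, UTF-8 mode (the -X utf8 option or PYTHONUTF8=1) also makes UTF-8 the default for open(), but that is not available in the Python 3.5.1 mentioned in the question.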

cfdirectory with ColdFusion 11: issue with non-ASCII characters in filenames

I have a similar question to this:
ColdFusion, CFDirectory and the French
which was not given a satisfactory answer.
We have upgraded from ColdFusion 9 to ColdFusion 11. So far there are no major problems except the following:
When using cfdirectory to display file names that contain non-ASCII characters (e.g. accents, umlauts), we see the file name with replacement characters � instead of the correct characters. For example, a file named L’État, c’est moi.pdf is displayed as L�����tat, c���est moi.pdf.
We are confident that this is a ColdFusion issue, as nothing has changed except the ColdFusion version. With ColdFusion 9, cfdirectory listed the same accented filenames correctly. Our OS is Red Hat 7.0, and the file names are also displayed correctly in the terminal with the ls command. I also wrote a quick PHP script to check whether PHP can read the directory correctly with readdir, and there were no problems there either: the filenames are rendered correctly.
So I believe this has to be a ColdFusion 11 issue. I added the -Dfile.encoding=UTF-8 -Dencoding=UTF-8 parameters to the JVM settings in the ColdFusion Administrator, but it made no difference.
Any suggestions on how to rectify this would be appreciated.
Example of the code used:
<cfdirectory
action="list"
directory="#ExpandPath( './' )#/pdfs"
listinfo="name"
name="qFile"
/>
<cfdump
var="#qFile#"
label="All Files"
/>
Have you tried setting the cfprocessingdirective tag?
<cfprocessingdirective pageencoding="utf-8">
CF 11 WikiDocs
Also, in Chrome's network inspector, make sure the encoding is being returned correctly, e.g.:
Content-Type:text/html; charset=UTF-8
If your environment is Linux, you need to have a clean UTF-8 configuration.
Please have a look here.
I had the same problem; I just added these lines to ~/.bashrc:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
After that, don't forget to restart your ColdFusion server:
sudo /opt/coldfusion11/cfusion/bin/coldfusion restart
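The garbling pattern in the question fits this locale explanation. In the Python sketch below (for illustration only; ColdFusion itself runs on the JVM), the UTF-8 bytes of the filename are decoded as ASCII with replacement, which is roughly what a process running without a UTF-8 locale sees. The result has exactly five � for ’É and three for ’, as in the question.

```python
name = "L’État, c’est moi.pdf"
raw = name.encode("utf-8")  # the bytes a UTF-8 Linux filesystem stores for this name

# ’ (U+2019) is 3 bytes and É (U+00C9) is 2 bytes in UTF-8.
as_utf8 = raw.decode("utf-8")
# Decoding as ASCII replaces every byte >= 0x80 with its own U+FFFD.
as_ascii = raw.decode("ascii", errors="replace")

print(as_utf8)   # L’État, c’est moi.pdf
print(as_ascii)  # L�����tat, c���est moi.pdf
```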
Please see: Why are certain characters not being injected correctly to SQL Server from a CFQUERY?
Make sure your file is saved with UTF-8 encoding.
Also make sure the JVM arguments handle it as well: in the ColdFusion Administrator, go to Server Settings > Java and JVM and add -Dfile.encoding=UTF-8 to the JVM arguments.
I had the same problem, and this solved it. On Linux, add this to /.bashrc:
LC_ALL="de_DE.UTF-8"
and restart the ColdFusion application after the change.

Change language of system and error messages in PostgreSQL

Is it possible to change the language of system messages from PostgreSQL?
In MSSQL, for instance, this is possible with the SQL statement SET LANGUAGE.
SET lc_messages TO 'en_US.UTF-8';
More info on requirements and limitations here.
For me, neither Milen A. Radev's nor user1's answer worked: editing PostgreSQL\11\data\postgresql.conf had absolutely no effect. Even after setting lc_messages = 'random value', PostgreSQL would still start.
What helped was deleting PostgreSQL\11\share\locale\*\LC_MESSAGES; after that I finally got English messages.
Milen's answer didn't work for me.
I got it working by modifying the postgresql.conf file. If you're on Linux, locate it with:
sudo find / -iname postgresql.conf
I had mine in /var/lib/pgsql/data.
Then edit the file, find the lc_messages variable and change it to your preferred language, e.g. 'en_US.UTF-8'.
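If you prefer to script this edit, here is a minimal Python sketch that works on the file's text. set_lc_messages is a hypothetical helper: it replaces the first lc_messages line (active or commented out) and drops any later ones. Note that it also drops an inline comment on that line. Keep a backup of postgresql.conf and restart PostgreSQL afterwards.

```python
import re

# Matches an active or commented-out lc_messages line, e.g. "#lc_messages = 'C'  # ...".
LC_MESSAGES_LINE = re.compile(r"^\s*#?\s*lc_messages\s*=")


def set_lc_messages(conf_text: str, value: str) -> str:
    """Return conf_text with exactly one active lc_messages setting."""
    new_line = "lc_messages = '{}'".format(value)
    out = []
    replaced = False
    for line in conf_text.splitlines():
        if LC_MESSAGES_LINE.match(line):
            if not replaced:
                out.append(new_line)  # first occurrence is replaced in place
                replaced = True
            continue  # any later (duplicate or commented) lc_messages lines are dropped
        out.append(line)
    if not replaced:
        out.append(new_line)  # no existing entry: append one
    return "\n".join(out) + "\n"


sample = "max_connections = 100\n#lc_messages = 'de_DE.UTF-8'\t# locale for system error message\n"
print(set_lc_messages(sample, "en_US.UTF-8"), end="")
```

The value is written in single quotes because a setting such as en_US.UTF-8 is not a plain identifier in postgresql.conf syntax.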
If PostgreSQL then stops working and its log shows an error like this:
invalid value for parameter "lc_messages": "en_US.UTF-8"
You have to edit /etc/locale.gen and uncomment the line with the locale from the error message (e.g. en_US.UTF-8). Then run locale-gen (as root) to update the locales. Finally, to check whether the locale is available, run locale -a.
Or, if you want the language to be English, you can just set lc_messages = 'C'.
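When checking the output of locale -a, note that glibc usually lists codesets in a normalized form such as en_US.utf8 rather than en_US.UTF-8. A minimal Python sketch that compares the two spellings. The helper names are hypothetical, and sample_output is made-up example output rather than a real system's.

```python
import subprocess


def normalize_locale(name: str) -> str:
    """Map spellings such as en_US.UTF-8 and en_US.utf8 to one key."""
    base, dot, codeset = name.strip().partition(".")
    return base + dot + codeset.lower().replace("-", "")


def locale_available(locale_a_output: str, wanted: str) -> bool:
    """Check whether `wanted` appears in the output of `locale -a`."""
    available = {normalize_locale(line) for line in locale_a_output.splitlines() if line.strip()}
    return normalize_locale(wanted) in available


def run_locale_a() -> str:
    """Return the output of `locale -a` (Linux only; not called in the example below)."""
    return subprocess.run(["locale", "-a"], capture_output=True, text=True, check=True).stdout


# Example output in the style glibc prints (codesets are listed as "utf8").
sample_output = "C\nC.utf8\nPOSIX\nen_US.utf8\n"
print(locale_available(sample_output, "en_US.UTF-8"))  # True
print(locale_available(sample_output, "hu_HU.UTF-8"))  # False
```

On a real server you would pass run_locale_a() instead of sample_output. run_locale_a needs Python 3.7 or later because of capture_output.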
In my case (on Windows Server 2019), I managed to change the language by creating a system environment variable "LC_MESSAGES" with the value "English":
setx LC_MESSAGES English /m
(Solution taken from here)
I reproduced the same issue with localized PostgreSQL error messages, which were displayed in IntelliJ IDEA.
The only solution for me was renaming the C:\Program Files\PostgreSQL\13\share\locale folder to a different name.
After that, the messages were displayed untranslated, in English.
Note that it wasn't related to the IntelliJ IDEA configuration at all, because I tested various other answers (including ones unrelated to the IDE), such as:
Help | Edit custom VM options
Setting environment variables
Using specific commands
Changing only postgresql.conf did not work for me on Windows 10. The following method is very simple, but it works:
set lc_messages = 'en_US.UTF-8' in postgresql.conf;
delete everything in the \share\locale folder except the es folder or the folder of the language you want to keep;
restart the PostgreSQL service, and the messages should then be in the language you want.
I simply deleted the folder
C:\Program Files\PostgreSQL\14\share\locale\<LANGUAGE YOU WANT TO GET RID OF>\LC_MESSAGES
and logged in to psql again.
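The Windows workarounds above all amount to hiding the translation catalogs under share\locale. Here is a minimal Python sketch that renames each <lang>\LC_MESSAGES folder instead of deleting it, so the change can be undone. disable_translations, the keep parameter and the LC_MESSAGES.disabled name are hypothetical choices, and the demo runs on a throwaway directory rather than a real installation.

```python
import tempfile
from pathlib import Path


def disable_translations(locale_dir, keep=()):
    """Rename <lang>/LC_MESSAGES to LC_MESSAGES.disabled for every language not in `keep`.

    Renaming instead of deleting makes the change easy to undo.
    Returns the language codes that were disabled, in sorted order.
    """
    disabled = []
    for lang_dir in sorted(Path(locale_dir).iterdir()):
        messages = lang_dir / "LC_MESSAGES"
        if lang_dir.name in keep or not messages.is_dir():
            continue
        messages.rename(lang_dir / "LC_MESSAGES.disabled")
        disabled.append(lang_dir.name)
    return disabled


# Demo on a throwaway directory tree instead of the real installation.
with tempfile.TemporaryDirectory() as tmp:
    for lang in ("de", "es", "ru"):
        (Path(tmp) / lang / "LC_MESSAGES").mkdir(parents=True)
    result = disable_translations(tmp, keep={"es"})
    es_kept = (Path(tmp) / "es" / "LC_MESSAGES").is_dir()
    de_disabled = (Path(tmp) / "de" / "LC_MESSAGES.disabled").is_dir()

print(result)  # ['de', 'ru']
```

Against a real server you would point it at a path such as C:\Program Files\PostgreSQL\14\share\locale, run it from an elevated prompt, and then restart the PostgreSQL service.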