I am encoding every py script in my project to utf-8, as we are definitely migrating our application from Jython 2.2.1 to Jython 2.5.2. For that reason, I have added a 'magic comment' at the first line of every py file (#encoding=utf-8) and I have started testing whether everything is OK by debugging the application in Eclipse.
The problem appears in a script that contains the string straße, because it is automatically converted to straÃŸe.
I am not sure whether this change is caused by PyDev or whether it happens because utf-8 doesn't cover this kind of character.
What can I do to automatically avoid this issue with other 'strange' strings I haven't detected yet?
Are you sure your .py files use UTF-8 encoding? Try opening one in a web browser (as text) and switching between encodings. Since you see straÃŸe, it seems that ß is stored as two bytes (most probably UTF-8), but make sure it really is UTF-8.
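To check every file rather than opening them one by one, the bytes can be tested for valid UTF-8. This is a minimal sketch for a current Python 3 interpreter (not Jython 2.5); the helper names is_valid_utf8 and find_non_utf8_files are made up for illustration.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_files(root: str, pattern: str = "*.py") -> list:
    """List files under root whose bytes are not valid UTF-8."""
    return [str(p) for p in sorted(Path(root).rglob(pattern))
            if not is_valid_utf8(p.read_bytes())]
```

Note that a file which already contains garbled text such as straÃŸe is still valid UTF-8, so this only catches files saved in another encoding, not text that was garbled earlier.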
Also check the Eclipse settings under Project/Properties. The Resource panel has a "Text file encoding" setting (I use Eclipse only for Java projects and do not know whether PyDev uses this setting).
Try the following code in PyDev and check whether the resulting file contains UTF-8 text:
# -*- coding: utf8 -*-
import codecs
f = codecs.open('strasse.txt', 'wb', 'UTF-8')
f.write('straße'.decode('UTF-8'))
f.close()
My guess is that the file was in a different encoding (say cp1252, which is the default Windows encoding), and when you declared utf-8 it became garbled (so it wasn't really PyDev that garbled it, but the fact that the file was previously in another encoding).
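The kind of mismatch described here can be reproduced in a few lines. The sketch below (Python 3, not Jython 2.5) encodes the string as UTF-8 and then reads those bytes as cp1252, which is one way to arrive at the garbled form; it only illustrates the mechanism and does not show what PyDev does internally.

```python
original = "straße"

# UTF-8 stores ß as two bytes: 0xC3 0x9F.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as cp1252 turns each byte into its own character.
garbled = utf8_bytes.decode("cp1252")

# Reversing the two steps recovers the original text.
recovered = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(recovered)
```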
While you're at it, also make sure you set the default encoding for Eclipse to utf-8 (by default it is usually the platform encoding) -- you can do this at preferences > general > workspace.
As a note, I believe the most common way of putting that comment is #coding: utf-8, followed by #-*- coding: utf-8 -*- (i.e.: not #encoding:utf-8) -- although all those formats work (see pep: https://www.python.org/dev/peps/pep-0263/)
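To see why all of these spellings are accepted, here is a small sketch using a pattern adapted from PEP 263 (Python 3; the function name declared_encoding is made up for illustration). Per the PEP, the comment has to appear on the first or second line of the file.

```python
import re

# Pattern adapted from PEP 263: a comment containing "coding:" or "coding=".
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line: str):
    """Return the encoding named in a magic comment, or None."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


for candidate in ("#encoding=utf-8", "#coding: utf-8", "#-*- coding: utf-8 -*-"):
    print(candidate, "->", declared_encoding(candidate))
```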
I downloaded a multi-module Scala project from GitHub (https://github.com/henrikengstrom/roygbiv), and one of the modules is a Play 2.0 module. I can run the whole application using SBT's run command on each module, and everything works fine. But when I add non-English characters to the Play 2.0 template (index.scala.html) and press F5 in the browser, I get a compilation error:
IO error while decoding
C:\Users...\web\target\scala-2.9.1\src_managed\main\views\html\index.template.scala
with UTF-8 Please try specifying another one using the -encoding
option
I also run the Play 2.0 module using SBT's run command, not the Play console.
I checked the source file encoding: it is UTF-8. I also tried UTF-8 without BOM.
Where could the problem be?
You could try starting SBT with the encoding forced to UTF-8. I read in this post that for some people it helped to start SBT with the following option:
JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF8'
Then one of the first lines of SBT should display:
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
Your problem seems to be this: your intermediate scala files are not encoded correctly.
Here is the process:
Play takes your template file (foo.scala.html) and translates this into Scala: target/scala-2.10/src_managed/main/views/html/foo.template.scala. This then gets compiled by sbt to .class files and run by play.
When sbt creates these intermediate files, it creates them with the default encoding (in my case, on a Windows machine, UTF-8 without BOM; your machine may differ). Importantly, this encoding sticks around, so even if I change the encoding of the original template file (foo.scala.html) to UTF-16, the encoding of the .scala file stays the same (UTF-8 without BOM in my case). However, the file no longer compiles, because the Scala compiler expects UTF-8 and cannot read it.
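The general failure mode (one tool writes a file in one encoding, another reads it assuming UTF-8) can be reproduced outside sbt. The following is only an analogy in Python 3, not what sbt or scalac actually run; the file name generated.txt and the helper can_read_as are made up for illustration.

```python
import os
import tempfile


def can_read_as(path: str, encoding: str) -> bool:
    """Return True if the whole file decodes with the given encoding."""
    try:
        with open(path, encoding=encoding) as f:
            f.read()
        return True
    except UnicodeDecodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "generated.txt")
    # The writer uses a platform encoding (cp1252 assumed here).
    with open(path, "w", encoding="cp1252") as f:
        f.write("<div>ã</div>")
    # The reader insists on UTF-8 and fails on the single byte 0xE3.
    readable_as_utf8 = can_read_as(path, "utf-8")
    readable_as_cp1252 = can_read_as(path, "cp1252")

print(readable_as_utf8, readable_as_cp1252)
```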
The 'correct' solution is to always use UTF-8, and in fact this was the solution recommended for play 1.x see Play documentation Internationalization. Here is the equivalent for play 2. You can also use normal internationalization messages files.
So, if you specify
JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF8' sbt
as suggested by Bjorn, then this will tell sbt that all files that it reads and writes will be in UTF8. You can also specify the file encoding for the scala compiler in your Build.scala:
val main = play.Project(appName, appVersion, appDependencies).settings(
scalacOptions ++= Seq("-encoding", "UTF-8")
// Add your own project settings here
)
This tells the scala compiler that all files that it reads (i.e. the foo.template.scala) are encoded in UTF-8. If you set this to your default encoding, this may work as well.
Your best bet is to do an sbt clean, make sure that the offending files have disappeared, and restart with JAVA_TOOL_OPTIONS set as suggested above. However, you'll have to ensure that all of your builds take this into account (Jenkins, other developers, etc.).
The following works fine for me. The file is encoded in UTF-8, which is the default in my Eclipse (Scala IDE):
@(message: String)
@main("Welcome to Play 2.1") {
<div>Ελληνικά</div>
<div>
@message
</div>
<br />
<ul>
@for(p<-message) {
<li>
@p
</li>
}
</ul>
}
What editor are you using to save these files? It is possible that your characters are double-encoded and thus stored incorrectly as UTF-8, e.g. characters that were read as iso-8859-1 and then encoded again as UTF-8.
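Double encoding can be illustrated, and in simple cases undone, as follows. This is a sketch in Python 3 that assumes the intermediate encoding was iso-8859-1; the function names are made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Encode as UTF-8, misread the bytes as iso-8859-1, encode again."""
    return text.encode("utf-8").decode("iso-8859-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> str:
    """Reverse double_encode; raises an error if the data was not double-encoded
    in a way that can be mapped back."""
    return data.decode("utf-8").encode("iso-8859-1").decode("utf-8")


broken = double_encode("ã")
print(broken)                        # four bytes instead of two
print(undo_double_encoding(broken))
```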
I was having this problem and figured out that it was being caused by some characters of my native language I had in the comments (ã). I removed those and the error disappeared.
The current encoding is UTF-8 and I want to add a BOM to all the files.
Context: the Windows 8 app certification toolkit throws the following error if the BOM is not added:
File C:\x\y\z.js is not properly UTF-8 encoded. Re-save the file as UTF-8 (including Byte Order Mark).
If you are using an IDE like Eclipse or NetBeans, you can select all the files and set the encoding.
The other option is to open each file in a text editor and change it there.
I know this is an old question but here is how I did it:
Create a PHP file named addBOMtoFile.php with the following content:
<?php
file_put_contents("some_new_file_name.js", "\xEF\xBB\xBF" . mb_convert_encoding(file_get_contents("some_file_name.js"), "UTF-8", "UTF-8"));
Then run it from the command line:
C:\php>php addBOMtoFile.php
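For many files at once, the same idea can be applied in a loop. The sketch below uses Python 3 rather than PHP; the function names and the *.js pattern are assumptions for illustration. Unlike the one-liner above, it skips files that already start with a BOM, and it overwrites files in place, so try it on a copy first.

```python
import codecs
from pathlib import Path


def add_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless the data already starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_files(root: str, pattern: str = "*.js") -> int:
    """Add a BOM to every matching file under root; return how many changed."""
    changed = 0
    for path in Path(root).rglob(pattern):
        data = path.read_bytes()
        updated = add_bom(data)
        if updated != data:
            path.write_bytes(updated)
            changed += 1
    return changed
```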
We have an HSQL .script file in source control. Some of our developers use Linux, some use Windows. Each time there is a commit we have to deal with conflicts (every line in the file has one) due to platform-specific newline characters in the script.
Is there a way to specify the newline format for the HSQL script file?
You cannot specify the end of line (eol) format for the HSQLDB script. HSQLDB can read the .script file regardless of the eol format used when the file was saved.
Source control systems usually allow you to specify the eol format to use for text files. For example, Subversion has an svn:eol-style property which can be set to "native" for all or individual files.
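If the source control setting is not available, line endings could also be normalized before committing. This is not part of HSQLDB or Subversion; it is a sketch in Python 3 with a made-up function name, and it assumes the file contains no carriage returns that are meant to be data.

```python
def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(normalize_eol(b"CREATE SCHEMA PUBLIC\r\nSET SCHEMA PUBLIC\r\n"))
```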
Is it possible to change a file's encoding from UTF-8 to windows-1251 without losing the Cyrillic text? When I explicitly change the encoding, all Cyrillic symbols become unreadable.
UPDATE: new IDE versions can convert encodings:
http://blogs.jetbrains.com/idea/2013/03/use-the-utf-8-luke-file-encodings-in-intellij-idea/
The problem is that IntelliJ IDEA doesn't actually convert your file from UTF-8 to windows-1251. Instead, you are telling IntelliJ IDEA to treat a UTF-8 file as if it were encoded in windows-1251, so you see garbage in the editor. The actual file on disk remains in UTF-8.
You have to use some external tool to perform the conversion, such as iconv:
iconv.exe -f utf-8 -t windows-1251 <input file> > <output file>
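Where iconv is not at hand, the same conversion can be sketched in Python 3. The function name convert_encoding and the file names are made up for illustration; cp1251 is Python's codec name for windows-1251. Characters that have no windows-1251 equivalent raise a UnicodeEncodeError instead of being silently lost.

```python
import os
import tempfile


def convert_encoding(src, dst, from_enc="utf-8", to_enc="cp1251"):
    """Decode src with from_enc and write the same text to dst in to_enc."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, encoding=from_enc, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=to_enc, newline="") as f:
        f.write(text)


# Small self-check using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    dst = os.path.join(tmp, "output.txt")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("Привет\r\n")
    convert_encoding(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```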
Newer versions of IntelliJ will ask if you would like to "Reload" or "Convert" the file to the new encoding.
I had a file that was displayed as UTF-8 but was actually written in x-macRoman. I selected x-macRoman and chose "Reload" so that this encoding would be used to interpret the file. I then chose UTF-8 and selected "Convert". Now my file is properly encoded as UTF-8.
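In terms of bytes, "Reload" roughly corresponds to decoding the same bytes with a different encoding, and "Convert" to re-encoding the decoded text. The sketch below shows that reading in Python 3; it assumes Python's mac_roman codec matches Java's x-macRoman and says nothing about how IntelliJ implements either action.

```python
# Bytes on disk, actually written in Mac Roman.
raw = "café".encode("mac_roman")

# "Reload": interpret the same bytes with the correct encoding.
reloaded = raw.decode("mac_roman")

# "Convert": write the now-correct text back out as UTF-8.
converted = reloaded.encode("utf-8")

print(raw, reloaded, converted)
```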
Tested With: version 12.1.3
I'm using Zend Studio for Eclipse on Mac, and it seems to keep setting all files to have an encoding of 'Mac Roman'. This becomes problematic when I save the files, as they all need to be UTF-8.
I know how to change the encoding to UTF-8 on a file by file basis, but I was wondering if I could set this project wide?
Eclipse-Wide: Window->Preferences->General->Workspace
Project-Wide: Right-click on Project->Properties
Filewide: Right-click on File->Properties
On my Eclipse for PHP Helios SR 2 for Mac:
Eclipse-Wide: Eclipse->Preferences->General->Workspace
The others are the same as @SkaveRat
On Zend Studio 8.x for Mac OS X 10.5.8, I changed it like this:
From the top menu, choose Edit->Set encoding->Other: UTF-8. By default it is set to Mac Roman.
Then apply.
Just remember, PHP does not actually support UTF-8 encoded source files.
When you create strings in a UTF-8 encoded file, PHP just sees the raw bytes (two per character for characters such as ä, ü and ö).
Try running the following with the file saved as either UTF-8 or ISO-8859-1.
strlen() will report different lengths depending on the encoding.
<?php
$string = "äüö";
echo (strlen($string));
?>
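The numbers behind this can be checked without PHP. The sketch below uses Python 3 to count the bytes that each encoding produces for the same string, which is what a byte-oriented length function such as strlen() sees.

```python
text = "äüö"

char_count = len(text)                          # characters
utf8_length = len(text.encode("utf-8"))         # bytes when saved as UTF-8
latin1_length = len(text.encode("iso-8859-1"))  # bytes when saved as ISO-8859-1

print(char_count, utf8_length, latin1_length)
```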