I have been using LaTeX with Emacs and AUCTeX for two years without running into any problems. However, yesterday, when I tried to compile the master file, LaTeX gave me a strange error:
ERROR: Package inputenc Error: Unicode char \u8:\303\lst#FillFixed# not set up for use
--- TeX said ---
with LaTeX.
See the inputenc package documentation for explanation.
Type H for immediate help.
...
l.963 magicamente, è
ancora intatto. È una struttura solidamente costruita
I think the problem is the character encoding (è, È), but I don't understand why it happens only for one subordinate (included) file and not for the others.
The encoding I use is UTF-8; it is set both in my .emacs file and in the LaTeX master file. Moreover, the mode line shows U, which indicates that UTF-8 is the buffer's encoding.
Do you have any suggestions for resolving my problem?
If that 303 is supposed to be the Unicode code point for è or È, then I think something is wrong. It should be 232 (0xE8) and 200 (0xC8), respectively. Have you tried re-entering those characters?
Have you tried
\usepackage[utf8]{inputenc}
That has some limitations; see Icelandic, utf8 and utf8x in LaTeX.
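For reference, here is a minimal sketch of such a UTF-8 set-up under pdflatex (the body text is illustrative, borrowed from the error context above):
\documentclass{article}
\usepackage[utf8]{inputenc} % interpret source bytes as UTF-8
\usepackage[T1]{fontenc}    % font encoding with proper accented glyphs
\begin{document}
magicamente, è ancora intatto. È una struttura solidamente costruita.
\end{document}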
I'm working with Visual Studio Code under Lubuntu 18.04. The file encoding in VS Code is configured to be UTF-8, and the Python scripts have the encoding set to utf-8:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
The Python files contain some non-ASCII characters like in this example docstring:
"""
'Al final pudimos reparar el problema de registro de datos y se pudieron montar los
equipos para recoger algún dato más. ...'
"""
When executing the scripts, I get the following error:
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xfa in position 141: invalid start byte
Here is the traceback:
Traceback (most recent call last):
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
cli.main()
File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 267, in run_file
runpy.run_path(options.target, run_name=compat.force_str("__main__"))
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 261, in run_path
code, fname = _get_code_from_file(run_name, path_name)
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 236, in _get_code_from_file
code = compile(f.read(), fname, 'exec')
File "/home/USERNAME/Desktop/Python/Scripts/General/Import_export/import_EXCEL_spreadsheet_data_write_to_CSV.py", line 338
None of the numerous proposals worked for me, since this error is thrown when executing any Python script that contains non-ASCII characters, even in comments or docstrings.
Finally, I found the cause of the entire problem:
It lay in the settings.json file, where encoding auto-guessing was set to true:
"files.autoGuessEncoding": true
This option can override "files.encoding": "utf8": even if you have defined a preferred encoding, VS Code may still guess a different one.
Thanks to Brett Cannon's valuable hint, I noticed that the file encoding shown in the bottom-right corner of VS Code was indeed sometimes set (automatically) to Windows 1252. This unfortunate guess by "files.autoGuessEncoding": true led to the errors mentioned in my initial question whenever I inserted umlauts ("äöü") or diacritics ("éúá") somewhere in my script:
Getting the error message in pylint right after insertion: "error while code parsing: Wrong or no encoding specified for script.py."
Next, running the script produced the aforementioned SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xfa in position 141: invalid start byte.
As stated in the discussion linked above, VS Code is still somewhat unreliable when it comes to detecting the correct file encoding, which I can confirm.
To resolve the problem once and for all, avoid autodetection by putting the following two lines in your settings.json (or set the corresponding options in the VS Code settings GUI):
{
    ...,
    "files.encoding": "utf8",
    "files.autoGuessEncoding": false,
    ...
}
Now it is possible to place any character you like in a text file or script, such as umlauts ("äöü") and diacritics ("éúá").
Finally, note that the above settings won't change the encoding of files created and saved earlier.
For that, left-click the encoding indicator at the bottom right of the VS Code window, then either reopen or save the file with your desired encoding, most likely UTF-8.
As an aside, you can also change these settings via the GUI under File -> Preferences -> Settings instead of editing the settings.json file (opened via Ctrl + Shift + P and then "Preferences: Open Settings (JSON)").
EDIT after finding the cause of the issue
The actual answer can be found above.
Nevertheless, since some of the information below may still be useful, I am not deleting the original workaround and the related comments.
Original workaround
Until someone comes up with a more elegant and to-the-point approach, here is the workaround that resolved the problem for me:
Identify which languages you actually need. In my case I code in English, but sometimes I need to type something in German (umlauts like "äöü") or Spanish (accents like "á, ú"), which involves Latin non-ASCII characters.
Since UTF-8 did not work on my system, for the unknown reasons outlined in the comments on my question, replace # -*- coding: utf-8 -*- with # -*- coding: latin-1 -*- in all Python scripts where it is needed, or even by default.
There are several ways to accomplish step 2:
Search across all files in the directory within VS Code via the magnifying-glass icon, or via a shortcut. Here it can be useful to exclude the **/.history/** folder pattern.
Use sed on the command line for the search-and-replace task; a sketch follows below.
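For example, a command-line sketch (untested; adjust the pattern and paths to your project layout) that rewrites the coding cookie in every Python file below the current directory:
find . -name '*.py' -exec sed -i 's/coding: utf-8/coding: latin-1/' {} +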
By and large, this made it work for me: I can now type "äöü" and "á, ú, ..." in my scripts and save them without getting any errors in VS Code.
I originally compiled my .Rnw file with the LaTeX option:
\usepackage[utf8]{inputenc}
It produced an error:
"! Package inputenc Error: Unicode char \u8: not set up for use with LaTeX."
I switched to [utf8x], which generated a somewhat more helpful error message:
"! Package ucs Error: Unknown Unicode character 150 = U+0096,
(ucs) possibly declared in uni-0.def."
I tried replacing the 0096 character (http://www.charbase.com/0096-unicode-start-of-guarded-area) with \DeclareUnicodeCharacter{0096}{\"o} in order to detect easily where the problem was. But with [utf8x] the error message remained the same, and with [utf8] there was an additional error: "! Package inputenc Error: Cannot define Unicode char value < 00A0"
Thanks for any help!
I had the same issue with my bibliography. In my editor (TeXstudio), the character U+0096 is rendered as whitespace. For some unknown reason, the line pdflatex reports as containing the offending character is inaccurate.
I solved the problem by running a regular-expression search for \x0096, which found the offending character immediately. Deleting the character and replacing it with a true space fixed the issue.
Incidentally, I tried the \DeclareUnicodeCharacter{0096}{ } fix and it did nothing for me. This could be because the offending character was in the .bib file rather than the .tex file where I placed the command.
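If your editor lacks regex search, a small Python sketch can locate the invisible character; the file name refs.bib is a placeholder:
# Print every line of a UTF-8 file that contains the invisible U+0096 character.
with open("refs.bib", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        if "\u0096" in line:
            print(f"line {lineno}: {line!r}")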
I do not think that switching to [utf8x] is a workable approach.
Instead, carefully check your code, particularly any part you copied from somewhere rather than typed yourself.
I ran into the same problem recently, and here is how I solved it.
I removed code from the R Markdown file part by part to find out which part caused the problem. Eventually I found the part below, which produced the error:
### Platform:Affymetrix A-AFFY-2-Affymetrix GeneChip Arabidopsis Genome [ATH1-121501].
I remembered that I had copied this information from a web page, so I deleted it and retyped the part by hand. After that, the file compiled and generated the PDF without any error.
To be clear, the copied version and my retyped version look identical on screen; the difference lies in invisible characters that came along with the copy.
This is just one example, of course. The broader point is that copying something from an unknown source into your code is always a potential problem.
I hope this helps you and other people who have been frustrated by this problem.
The warning is as follows:
"failed to translate characters from US-ASCII to UTF-8: check INPUT_ENCODING"
I am running Doxygen over a C++ project.
I am new to Doxygen and do not know how to proceed. Thanks for any help.
Somewhere in your code you're using a special character that is not converting correctly. Are there any non-English words or other likely sources of special characters?
http://www.doxygen.nl/manual/config.html#cfg_input_encoding
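The INPUT_ENCODING option tells Doxygen how your source files are actually encoded. A Doxyfile sketch, assuming the sources are saved as UTF-8 (use e.g. ISO-8859-1 instead if they are Latin-1):
# Doxyfile: declare the encoding of the input source files
INPUT_ENCODING = UTF-8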
Problem
VerbatimOut from the “fancyvrb” package doesn’t play nicely with UTF-8 characters.
Minimal working example:
\documentclass{minimal}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{fancyvrb}
\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}
\input{\jobname.test}
\end{document}
Error message
When compiled with pdflatex, this gives the error
File ended while scanning use of \UTFviii#three#octets.
A different error occurs when the sole occurrence of é above is replaced by something else, e.g. é */:
Package inputenc Error: Unicode char \u8:### not set up for use with LaTeX.
– indicating that in this case LaTeX succeeds in reading a multi-byte UTF-8 character but doesn't know what to do with it (i.e. it's the wrong character).
In fact, when I open the produced .test file manually, it contains the character é, but in Latin-1 encoding!
Proof: when I open the files in a hex editor, I get the following:
Original file: C3 A9 (corresponds to LATIN SMALL LETTER E WITH ACUTE in UTF-8)
Written file: E9 (corresponds to é in Latin-1)
Question
How to set VerbatimOut up correctly?
filecontents* (from the "filecontents" package) shows that it can work. Unfortunately, I don't understand either package's code, so I cannot fix fancyvrb by replicating the logic from filecontents manually.
I also cannot use filecontents* instead of VerbatimOut because the former doesn’t work within a \newenvironment, while the latter does.
(Oh, by the way: vanilla Verbatim instead of VerbatimOut also works as expected. The error seems to occur when writing the file, not when reading the verbatim input.)
Is your end goal to write symbols and accents in Verbatim? Because you can do that like this:
\documentclass{article}
\usepackage{fancyvrb}
\begin{document}
\begin{Verbatim}[commandchars=\\\{\}]
\'{e} \~{e} \`{e} \^{e}
\end{Verbatim}
\end{document}
The commandchars option allows the \ { } characters to work as they normally would.
Source: http://ctan.mirror.garr.it/mirrors/CTAN/macros/latex/contrib/fancyvrb/fancyvrb.pdf
This is still unfixed? I'll take another look. What exactly do you want: for your package to use VerbatimOut, or for VerbatimOut not to interfere with it?
Tests
TeX Live 2009's xelatex compiles it fine. With pdflatex, version
This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009)
I get an error message that is rather more useful than the one you got:
! Argument of \UTFviii#three#octets has an extra }.
\par
l.8 é
? i \makeatletter\show\UTFviii#three#octets
! Undefined control sequence.
\GenericError ...
#4 \errhelp \#err# ...
l.8 é
If I were to make a wild guess, I'd say that inputenc with pdftex uses the pdftex primitives to do some hairy storing and restoring of character tables, and some table somewhere has a rare mistake in it.
Possibly related
I saw a post by Vladimir Volovich in the pdf-tex mailing list archives, all the way back from 2003, that discusses a conflict between inputenc & fancyvrb, and posts a patch to "solve the problem". Who knows, maybe he faced the same problem? It might be worth emailing him.
XeTeX has much better Unicode support. The following run through xelatex produces “é” both in \jobname.test and the output PDF.
\documentclass{minimal}
\usepackage{fontspec}
\tracingonline=1
\usepackage{fancyvrb}
\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}
\input{\jobname.test}
\end{document}
fontspec loads the Latin Modern fonts, which have Unicode support. The standard TeX Computer Modern fonts don’t have the right tables for Unicode support.
If you use a character that does not have a glyph in the current font, by default XeTeX writes a blank space to the PDF and prints a warning in the log but not on the terminal. \tracingonline=1 prints the warning to the terminal.
On http://wiki.portal.chalmers.se/agda/pmwiki.php?n=Main.LiterateAgda, they suggest that you should use
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
in the preamble. I successfully used this to insert Unicode into a verbatim environment.
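Applied to the minimal example from the question, that suggestion would look as follows (a sketch; I have not tested this exact combination):
\documentclass{minimal}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{fancyvrb}
\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}
\input{\jobname.test}
\end{document}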
\documentclass{article}
\usepackage{fancyvrb}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\newenvironment{MonVerbatim}{%
  % Give every byte in the range 128-255 catcode 11 ("letter") so that
  % the bytes of multi-byte UTF-8 characters pass through verbatim intact:
  \count0=128\relax %
  \loop
    \catcode\count0=11\relax
    \advance\count0 by 1\relax
  \ifnum\count0<256
  \repeat
  \VerbatimOut[commandchars=\\\{\}]{VerbatimText.tex}%
}{\endVerbatimOut}
\newcommand\test{A command producing accented characters éà}
\begin{document}
\begin{MonVerbatim}
A little bit text in verbatim mode éà_].
\test
\end{MonVerbatim}
Followed by some accented character éà.
\end{document}
This code works for me with TeX Live 2018 and pdflatex. You should
probably avoid changing catcodes if you are using a Unicode-native engine (lualatex or xelatex).
You can use the "iftex" package to check which TeX engine is in use.
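A sketch of such an engine check with iftex, assuming the catcode loop should only run under pdfTeX:
\usepackage{iftex}
\ifPDFTeX
  % pdfTeX is an 8-bit engine: apply the catcode workaround above
\else
  % LuaTeX/XeTeX read Unicode natively: leave the catcodes alone
\fi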
I have a URL with Cyrillic characters:
http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование
When I compile the document, I get the following as the URL:
http://www.pravoslavie.bg/%5CT2A%5CCYRV%20%5CT2A%5Ccyrhrdsn%20%5CT2A%5Ccyrz%20%5CT2A%5Ccyrp%20%5CT2A%5Ccyri%20%5CT2A%5Ccyrt%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyri%20%5CT2A%5Ccyre%20/%5CT2A%5CCYRD%20%5CT2A%5Ccyru%20%5CT2A%5Ccyrh%20%5CT2A%5Ccyro%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyro%20-%5CT2A%5Ccyri%20-%5CT2A%5Ccyrs%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyre%20%5CT2A%5Ccyrt%20%5CT2A%5Ccyrs%20%5CT2A%5Ccyrk%20%5CT2A%5Ccyro%20-%5CT2A%5Ccyro%20%5CT2A%5Ccyrb%20%5CT2A%5Ccyrr%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrz%20%5CT2A%5Ccyro%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyri%20%5CT2A%5Ccyre
and that is not the same. Can I set the encoding to UTF-8 for hyperref? Or how else can I solve the problem?
If you're happy not to use the \url command (i.e., you'll need to break lines manually) you can do the following in regular LaTeX:
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\begin{document}
\texttt{http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование}
\end{document}
If you need to get the hyperlinks working, my only suggestion for now is to use either XeTeX or LuaTeX, which support proper Unicode input/output. Something like the following produces at least correct-looking output in XeTeX, although the hyperlink itself is broken for some reason :(
\documentclass{article}
\usepackage{fontspec,hyperref}
\setmonofont{Arial Unicode MS}
\begin{document}
\url{http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование}
\end{document}
I had a similar problem with the pdftitle field.
Splitting the package loading and the setup made it work correctly:
\usepackage{hyperref}
\hypersetup{
pdftitle=Priorità
}
Assuming your LaTeX source is UTF-8 encoded, try adding \usepackage[utf8]{inputenc} to your document. If utf8 doesn't work, try utf8x.
If it is, as the other posters seem to assume, a charset issue, make sure the character encodings of the BibTeX source and the TeX document match. Cf. Q#1635788: Different encoding of latex and bibtex files. You don't need to make both character encodings utf8; I should think that latin-5 or KOI8-R would both work, but utf8 is the best supported.
If it isn't, then, as per my comment above: look at the software chain you are using (editor, makefiles, etc.) to see whether something is doing unwanted URL escaping for you. Then deal ruthlessly with the offending software.
@Mike Weller:
I already have \usepackage[utf8]{inputenc} in my document; with utf8x I get the following as the URL:
http://www.pravoslavie.bg/\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ð}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{з}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{п}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ð ̧}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{а}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ð1⁄2}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ð ̧}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ðμ}intopreamble]/\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð2}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð1⁄2}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]-\begingroup\let\
relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð ̧}intopreamble]-\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð2}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ðμ}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ðo}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]-\
begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{б}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{а}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{з}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð2}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{а}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð1⁄2}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð ̧}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ðμ}intopreamble]D
Edit: the problem is solved. I used URL encoding to convert the Cyrillic characters. :)
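For reference, one way to produce the percent-encoded form of such a URL (a sketch using Python's standard library; this was not part of the original answer):
from urllib.parse import quote

# Percent-encode the UTF-8 bytes of the Cyrillic path segments,
# leaving "/" and ":" unescaped.
url = "http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование"
print(quote(url, safe="/:"))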
\usepackage[unicode]{hyperref}
worked for me (since at least June 2010) with the TeX Live distribution
(not sure whether the distribution is relevant).
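Putting the pieces together, a minimal pdflatex sketch (untested; whether the generated link resolves correctly may depend on the PDF viewer):
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[unicode]{hyperref}
\begin{document}
\url{http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование}
\end{document}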