Using Unicode in fancyvrb’s VerbatimOut - unicode

Problem
VerbatimOut from the “fancyvrb” package doesn’t play nicely with UTF-8 characters.
Minimal working example:
\documentclass{minimal}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{fancyvrb}
\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}
\input{\jobname.test}
\end{document}
Error message
When compiled using pdflatex mini, this gives the error
File ended while scanning use of \UTFviii#three#octets.
A different error occurs when the sole occurrence of é above is replaced by something else, e.g. é */:
Package inputenc Error: Unicode char \u8:### not set up for use with LaTeX.
– indicating that in this case, LaTeX succeeds in reading a multi-byte UTF-8 character, but not knowing what to do with it (i.e. it’s the wrong character).
In fact, when I open the produced .test file manually, it contains the character é, but in Latin-1 encoding!
Proof: when I open the files in a hex editor, I get the following:
Original file: C3 A9 (corresponds to LATIN SMALL LETTER E WITH ACUTE in UTF-8)
Written file: E9 (corresponds to é in Latin-1)
Question
How to set VerbatimOut up correctly?
filecontents* (from “filecontents”) shows that it can work. Unfortunately, I don’t understand either code so I cannot fix fancyvrb’s code by replicating the logic from filecontents manually.
I also cannot use filecontents* instead of VerbatimOut because the former doesn’t work within a \newenvironment, while the latter does.
(Oh, by the way: vanilla Verbatim instead of VerbatimOut also works as expected. The error seems to occur when writing the file, not when reading the verbatim input)

Is your end goal to write symbols and accents in Verbatim? Because you can do that like this:
\documentclass{article}
\usepackage{fancyvrb}
\begin{document}
\begin{Verbatim}[commandchars=\\\{\}]
\'{e} \~{e} \`{e} \^{e}
\end{Verbatim}
\end{document}
The commandchars option allows the \ { } characters to work as they normally would.
Source: http://ctan.mirror.garr.it/mirrors/CTAN/macros/latex/contrib/fancyvrb/fancyvrb.pdf

This is still unfixed? I'll take another look. What exactly do you want: your package to use VerbatimOut, or for it not to interfere with it?
Tests
TexLive 2009's Xelatex compiles fine. With pdflatex, version
This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009)
I get an error message that is rather more useful error message than you got:
! Argument of \UTFviii#three#octets has an extra }.
\par
l.8 é
? i \makeatletter\show\UTFviii#three#octets
! Undefined control sequence.
\GenericError ...
#4 \errhelp \#err# ...
l.8 é
If I were to make a wild guess, I'd say that inputenc with pdftex uses the pdftex primitives to do some hairy storing and restoring of character tables, and some table somewhere has got a rarely mistake in it.
Possibly related
I saw a post by Vladimir Volovich in the pdf-tex mailing list archives, all the way back from 2003, that discusses a conflict between inputenc & fancyvrb, and posts a patch to "solve the problem". Who knows, maybe he faced the same problem? It might be worth emailing him.

XeTeX has much better Unicode support. The following run through xelatex produces “é” both in \jobname.test and the output PDF.
\documentclass{minimal}
\usepackage{fontspec}
\tracingonline=1
\usepackage{fancyvrb}
\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}
\input{\jobname.test}
\end{document}
fontspec loads the Latin Modern fonts, which have Unicode support. The standard TeX Computer Modern fonts don’t have the right tables for Unicode support.
If you use a character that does not have a glyph in the current font, by default XeTeX writes a blank space to the PDF and prints a warning in the log but not on the terminal. \tracingonline=1 prints the warning to the terminal.

On http://wiki.portal.chalmers.se/agda/pmwiki.php?n=Main.LiterateAgda, they suggest that you should use
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
in the preabmle. I successfully used this in order to insert unicode into a verbatim environment.

\documentclass{article}
\usepackage{fancyvrb}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\newenvironment{MonVerbatim}{%
\count0=128\relax %
\loop
\catcode\count0=11\relax
\advance\count0 by 1\relax
\ifnum\count0<256
\repeat
\VerbatimOut[commandchars=\\\{\}]{VerbatimText.tex}%
}{\endVerbatimOut}
\newcommand\test{A command producing accented characters éà}
\begin{document}
\begin{MonVerbatim}
A little bit text in verbatim mode éà_].
\test
\end{MonVerbatim}
Followed by some accented character éà.
\end{document}
This code is working for me with TeXLive 2018 and pdflatex. Yous should
probably avoid changing catcode if you are using a 16 bits TeX (lualatex or xelatex).
You can use the package "iftex" to check the tex engine used.

Related

Package inputenc Error: Unicode char \u8:β not set up for use with LaTeX

One of my references in Bibdesk contains some latin/Greek character e.g. 'β'. I am getting the error while using the reference in TEXMAKER:
"! Package inputenc Error: Unicode char \u8:β not set up for use with LaTeX."
How can I set it up to work?
Though with inputenc TeX can read all the unicode characters, it doesn't know what to do with most of them, except those in the usual ascii range. I once also had a problem with that, when I wanted to copy some unicode text verbatim into one of my TeX documents, and that text contained symbols like alpha, or other math symbols.
The solution to that is the command \DeclareUnicodeCharacter{#1}{#2} where in #1 you have to put the unicode value of the character and in #2 a tex expression, that gets inserted, when character code #1 is encountered. E.g. for the beta you could use \DeclareUnicodeCharacter{03B2}{\ensuremath{\beta}}, because 03B2 is the unicode character value for the symbol "beta" (you have to look those things up in a Unicode table).
I've also written a tex package for that, if you're interested. It can be found on github at https://github.com/ezander/utf8math. See especially this file here: https://github.com/ezander/utf8math/blob/master/utf8math.sty
Another solution is to use XeTeX, which is more suitable than most other TeX engines for unicode : replace the lines
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
with
\usepackage{fontspec}

Emacs: AUCTeX-LaTeX compilation error

I have been using LaTeX with Emacs and AUCTeX for 2 years and I haven't run into any problem. However, yesterday when I tried to compile the master file LaTeX gave me a strange error:
ERROR: Package inputenc Error: Unicode char \u8:\303\lst#FillFixed# not set up for use
--- TeX said ---
with LaTeX.
See the inputenc package documentation for explanation.
Type H for immediate help.
...
l.963 magicamente, è
ancora intatto. È una struttura solidamente costruita
I think the problem is the encoding format (è È) but I don't understand why this problem happens only for one "subordinated" file and not for the others.
The encoding I use is UTF-8 and is set both in my .emacs file and in the LaTeX master file. Moreover, the mode line says U, which indicates UTF-8 is used as an encoding for the buffer.
Do you have any suggestions for resolving my problem?
If that 303 is supposed to be the unicode point for è or È, then I think something is wrong. It should be 232 (0xE8) resp. 200 (0xC8). Have you tried re-entering those characters?
Have you tried
\usepackage[utf8]{inputenc}
That has some limitations - see Icelandic, utf8 and utf8x in LaTeX

Problem with LaTeX hyperref

i have an url with cyrilic characters:
http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование
when i compile the document, i get following as url:
http://www.pravoslavie.bg/%5CT2A%5CCYRV%20%5CT2A%5Ccyrhrdsn%20%5CT2A%5Ccyrz%20%5CT2A%5Ccyrp%20%5CT2A%5Ccyri%20%5CT2A%5Ccyrt%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyri%20%5CT2A%5Ccyre%20/%5CT2A%5CCYRD%20%5CT2A%5Ccyru%20%5CT2A%5Ccyrh%20%5CT2A%5Ccyro%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyro%20-%5CT2A%5Ccyri%20-%5CT2A%5Ccyrs%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyre%20%5CT2A%5Ccyrt%20%5CT2A%5Ccyrs%20%5CT2A%5Ccyrk%20%5CT2A%5Ccyro%20-%5CT2A%5Ccyro%20%5CT2A%5Ccyrb%20%5CT2A%5Ccyrr%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrz%20%5CT2A%5Ccyro%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyri%20%5CT2A%5Ccyre
and that ist not the same. Can I set the encoding to utf8 for hyperref? Or how can i solve the problem?
If you're happy not to use the \url command (i.e., you'll need to break lines manually) you can do the following in regular LaTeX:
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\begin{document}
\texttt{http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование}
\end{document}
If you need to get the hyperlinks working, my only suggestion for now is to use either XeTeX or LuaTeX to be able to use proper unicode input/output. Something like the following produces at least the correct-looking output in XeTeX, although the hyperlink itself is broken for some reason :(
\documentclass{article}
\usepackage{fontspec,hyperref}
\setmonofont{Arial Unicode MS}
\begin{document}
\url{http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование}
\end{document}
I had a similar problem with the pdftitle field.
splitting use declaration and setup made it work correctly
\usepackage{hyperref}
\hypersetup{
pdftitle=Priorità
}
Assuming your LaTeX source is utf8 encoded, try adding \usepackage[utf8]{inputenc} to your document. If utf8 doesn't work try utf8x. See here
If it is, as the other posters seem to assume, a charset issue, make sure the character encoding for the bibtex source and the tex document match. Cf. Q#1635788: Different encoding of latex and bibtex files. You don't need to make the character encodings both be utf8; is should think that latin-5 or KOI8-R would both work, but it is the best supported.
If it isn't, than as per my comment above: look at the software chain that you are using: editor, makefiles, &c, to see if something is doing unwanted URL escaping for you. Then deal ruthlessly with the offending software.
#Mike Weller:
i have already \usepackage[utf8]{inputenc} in my document, with utf8x i get following as url:
http://www.pravoslavie.bg/\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ð}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{з}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{п}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ð ̧}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{а}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ð1⁄2}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ð ̧}intopreamble]\begingroup\let\relax\relax\
endgroup[Pleaseinsert\PrerenderUnicode{Ðμ}intopreamble]/\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð2}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð1⁄2}intopreamble]\begingroup\let\relax\
relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]-\begingroup\let\
relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð ̧}intopreamble]-\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð2}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ðμ}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ðo}intopreamble]\begingroup\
let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]-\
begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{б}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{а}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{з}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð3⁄4}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð2}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{а}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð1⁄2}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð ̧}intopreamble]
\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ðμ}intopreamble]D
edit: the problem is solved - i've used URL Encoding to convert the cyrilic chars :)
\usepackage[unicode]{hyperref}
worked for me (since at least June 2010) using texlive distribution
(not sure if it is relevant).

Displaying Unicode characters above U+FFFF on Windows

the application I'm developing with EVC++ 4 runs on Windows CE 5 and should support unicode (AFAIK wchar_t uses UTF-16 on windows, so I'm using that), so I want to be able to test it with "more exotic" characters. Especially with characters that use 4 Byte in UTF-16 and not just 2. Therefore I'm trying to display such characters in a texteditor (atm on my desktop PC with Windows XP, not on the embedded device).
But I haven't managed it to do so yet. As an example I've chosen this character.
Like mentioned here "MPH 2B Damase" should support this character. So I downloaded the font and put it into Windows\Fonts. I created a textfile using a hexeditor (just to be sure) with following content:
FFFE D802 DC00
When I open it with notepad (which should be unicode-capable, right?) and use the downloaded font it doesn't display 1 char, as intended, but this 2:
˘Ü
What am I doing wrong? :)
Thanks!
hrniels
Edit:
Flipping the BOM, as suggested, doesn't work. Notepad (and all other editors I tried, too) displays two squares in this case. Interesting is that if I copy the two squares here (with firefox) I see the right character:
I've also tried it with Komodo Edit with the same result.
Using UTF-8 doesn't help notepad either.
What happens if you put the byte order mark the other way around?
FEFF D802 DC00
(At the moment the byte sequence is being interpreted as the two characters U+02D8 U+00DC, so hopefully flipping the BOM will cause the bytes to be read in the intended order)
Probably you forgot to read the _wfopen() documentation. There they specify the encoding parameter. BTW, I assumed you are already using Unicode (wchars).
I would recommend you to use UTF-8 in files with or without BOM but forcing your fopen to use UTF-8 flag. It looks _wfopen("newfile.txt", "r, ccs=UTF-8"); will work with UTF-8 with or without BOM and also with UTF-16. Do not make the mistake of using the ccs=Unicode, it is a common thing to have UTF-8 files without BOM.
You should really read a little bit about Unicode before trying to work. This about this as a very good investment - it will save you time if you understand how Unicode works.
Here is a start http://blog.i18n.ro/newbie-guide-to-unicode/ and do not forget to read the links from the end of the article.
If you really need a simple text editor that allows you to play with Unicode encodings, use Notepad++ and forget about Notepad.
Your text editor might not like UTF-16. It probably assumes ANSI or UTF-8.
Try typing in the UTF-8 equivalent instead:
0xF0 0x90 0xA0 0x80
This won't help your testing, but will make sure your font isn't at fault. A text editor that does support UTF-16 is Komodo Edit.

Which packages for many unicode-characters?

I'm trying to create a LaTeX document with as many as Unicode characters as possible. My header is as follows:
\documentclass[12pt,a4paper]{article}
\usepackage[utf8x]{inputenc}
\usepackage{ucs}
\pagestyle{empty}
\usepackage{textcomp}
\usepackage[T1]{fontenc}
\begin{document}
The Unicode characters which follow in the document-body are in the form of
\unichar{xyz}
where xyz stands for an integer, e.g. 97 (for "a").
The problem is: for many integers the script's not compiling with an error message as:
! Package ucs Error: Unknown Unicode character 79297 = U+135C1,
(ucs) possibly declared in uni-309.def.
(ucs) Type H to see if it is available with options.
Which packages should I add to the header to make the file compile with as many characters as possible?
As far as I remember the utf8x package is merely a hack to allow for some Unicode "support". Basically it was a giant lookup table translating individual character sequences to LaTeX's expectations.
You should really use Xe(La)?TeX for such things which was designed with Unicode in mind. The old TeX still suffers from it's 1970's heritage in this respect.
ETA: From the package's documentation:
This bundle provides the ucs package, and utf8x.def, together with a large number of support files.
The utf8x.def definition file for use with inputenc covers a wider range of Unicode characters than does utf8.def in the LaTeX distribution. The ucs package provides facilities for efficient use of large sets of Unicode characters.
Since you're already using both packages I guess you're out of luck then with plain LaTeX.
You can manually edit .def files similarly to this https://bugzilla.redhat.com/show_bug.cgi?id=418981
For example this is only way how to enable LaTex for Latvian language as far as I know.
Just find you symbol (on Mac use "locate uni-1.def" to find where the package is located - to enable locate command http://osxdaily.com/2011/11/02/enable-and-use-the-locate-command-in-the-mac-os-x-terminal/ )