Tectonic doesn't show accents

Frequel · April 11, 2018, 1:50pm

I’m writing a PDF for university. I need to write in Italian, which uses a lot of accents and apostrophes.
While texlive-full compiles the file correctly and shows the accents and apostrophes in the PDF, Tectonic doesn’t show them: it drops the corresponding character, so, for example, “doesn’t” appears as “doesnt”.

pkgw · April 12, 2018, 1:42am

Can you post an example document, preferably reduced down to the smallest thing that demonstrates the problem? Tectonic should be able to handle accents very well.

garro · May 8, 2018, 9:31am

In fact, Tectonic has problems handling many UTF-8 characters:

\documentclass[11pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\author{Garro}
\title{Test utf8 characters}
\begin{document}
	\section{Lorem ipsum doλor}
	Text containing ƙ and èòà@ç
\end{document}

In this text, just the @ is rendered correctly.

pkgw · May 8, 2018, 1:50pm

I think this is a manifestation of GitHub issue #173 — the contents of the final PDF will depend significantly on which font you are using. In your example, the default LaTeX font doesn’t cover the needed characters.

(One side note, the inputenc package is not needed for Tectonic since it is XeTeX-derived.)

The following variant of your example does better, but still misses the λ and the ƙ.

\documentclass[11pt,a4paper]{article}
\usepackage{fontspec} % <= the Tectonic/XeTeX font management package
\author{Garro}
\title{Test utf8 characters}
\begin{document}
	\section{Lorem ipsum doλor}
	Text containing ƙ and èòà@ç
\end{document}

Here, the fontspec package loads Latin Modern, a font that has better character coverage than the LaTeX default.

This next version covers all of the characters by using Linux Libertine O, a font that happens to be installed on my machine:

\documentclass[11pt,a4paper]{article}
\usepackage{fontspec}
\setmainfont{Linux Libertine O} % <= new line here
\author{Garro}
\title{Test utf8 characters}
\begin{document}
	\section{Lorem ipsum doλor}
	Text containing ƙ and èòà@ç
\end{document}

So in a certain sense, the problem is that many people are not aware that the fontspec package needs to be loaded to get decent Unicode coverage. The user experience here is a pain because different TeX engines (pdftex, luatex, xetex, Tectonic) require different magic commands. As a practical solution, I think Tectonic should issue a warning when it needs to output a character not supported by the current font; that’s issue #173.
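
For anyone who wants one source file that builds under both pdflatex and Tectonic, here is a minimal sketch that branches on the engine. It assumes the iftex package is available in your distribution, which nobody in this thread has confirmed for Tectonic's bundle. The λ and ƙ are left out on purpose, since pdflatex with inputenc would typically stop with an error on characters it has no definition for.

```latex
\documentclass[11pt,a4paper]{article}
\usepackage{iftex} % assumed available; provides \ifPDFTeX
\ifPDFTeX
  % 8-bit engine: decode UTF-8 input and pick a font encoding
  % that contains precomposed accented letters
  \usepackage[utf8]{inputenc}
  \usepackage[T1]{fontenc}
\else
  % Unicode engines (XeTeX, LuaTeX, Tectonic): use OpenType fonts
  \usepackage{fontspec}
\fi
\author{Garro}
\title{Test utf8 characters}
\begin{document}
	Text containing èòà@ç
\end{document}
```

The branch only chooses the preamble; which characters actually show up in the Unicode case still depends on the font that fontspec ends up selecting.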

garro · May 8, 2018, 8:29pm

Yes, it told me, but I left it for compatibility with latex and pdflatex.

Then, why does pdflatex correctly handle at least accents and cedillas?

pkgw · May 9, 2018, 1:21am

I believe that the basic answer is given here — at least, it squares with my coder’s intuition for what’s going on underneath the hood.

In a TeX engine that’s not Unicode-aware, like pdftex, a “character” is exactly one byte in the input TeX file. When saved in the UTF-8 encoding, something that appears in your editor as “ç” is represented as a pair of bytes in the underlying file (0xC3 0xA7 for ç; other characters can take up to four bytes). So when the TeX engine reads your file, it interprets each byte of the ç as a separate character; the first of these has a value outside the plain ASCII range. The inputenc package basically adds some magic to detect the relevant multi-byte sequences and translate them into TeX constructs like \c{c}.
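
To make that translation concrete, here is a small sketch meant for pdflatex only. It assumes the T1 font encoding is loaded alongside inputenc so that the accented letters exist as glyphs. Each literal character is paired with the accent command that inputenc is described above as substituting for it.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % maps UTF-8 byte sequences to LaTeX commands
\usepackage[T1]{fontenc}    % font encoding with precomposed accented glyphs
\begin{document}
% Each pair should typeset identically under pdflatex:
ç and \c{c}; è and \`e; ò and \`o; à and \`a
\end{document}
```

If the two halves of each pair look the same in the PDF, the byte sequences are being translated as expected.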

But in a Unicode-aware engine like XeTeX or Tectonic, the definition of a “character” is different: a character is no longer exactly one byte, but rather a Unicode code point, which may be stored as several bytes and which represents a particular concept that Unicode defines in a very specific way, because human writing systems are immensely varied. So inputenc never sees the magic byte pairs it expects, and its explicit translation doesn’t happen. Instead, the translation should happen a bit farther downstream, when the engine determines how to represent the Unicode values in the current font. But if the current font doesn’t have glyphs for the underlying text, you get nothing.
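
As a rough way to see that last step for yourself, the sketch below asks the engine whether the current font has a glyph for a given character. The \checkglyph macro is a hypothetical helper written for this post, not something provided by fontspec or Tectonic; it relies on the e-TeX primitive \iffontchar, which I am assuming behaves in Tectonic as it does in XeTeX.

```latex
\documentclass{article}
\usepackage{fontspec}
% \checkglyph is a hypothetical helper, not part of any package.
% \iffontchar\font<number> is true when the current font has a glyph
% for that code point; `#1 converts the character to its code point.
\newcommand{\checkglyph}[1]{%
  \iffontchar\font`#1\relax
    #1 is available%
  \else
    [missing glyph]%
  \fi}
\begin{document}
\checkglyph{ç}, \checkglyph{λ}, \checkglyph{ƙ}
\end{document}
```

Going by the results earlier in this thread, I would expect the default fontspec font to report the λ and the ƙ as missing, and a font with wider coverage selected via \setmainfont to report more of them as available.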

pkgw · May 17, 2018, 2:03pm

I’ve gone ahead and merged a pull request to master that tries to provide some user feedback in this situation. We’ll have to wait and see how well it works out in real-world usage.