Profiling tectonic

rekka · June 15, 2017, 4:40am

I want to share my experience with profiling tectonic on Ubuntu 17.04 using perf. While it is probably too early to optimize the code, and tectonic is definitely not slow, I was curious how much overhead the Rust IO bridge adds on top of XeTeX.

Setup

First, we need to compile tectonic with debug info and frame pointers to be able to produce call graphs. You can use my branch profile.

In Cargo.toml, add

[profile.release]
debug = true

and pass the -fno-omit-frame-pointer flag to gcc in build.rs by adding the lines

    ccfg
        ...
        .flag("-fno-omit-frame-pointer")

    cpcfg
        ...
        .flag("-fno-omit-frame-pointer")

Profiling

Compile with cargo build --release and use perf to record the profiling data:

perf record -F 9997 -g -- target/release/tectonic tests/xenia/paper.tex

You might have to use sudo. Warning: Sometimes perf completely freezes my system. Not even the Linux consoles work and a hardware reset is needed, so save your work before running it.

Option -g asks perf to record call graphs, and -- separates perf's own options from the command being profiled. -F 9997 is the sampling frequency (samples per second), which seems to work well in my case.

To see the data, run

perf report

Again, sudo might be needed.

Unfortunately, the report is not super readable.

Flame graph

FlameGraph can be used to produce a much nicer, interactive graph.

Just download stackcollapse-perf.pl and flamegraph.pl, put them on your $PATH and run

perf script | stackcollapse-perf.pl | flamegraph.pl > graph.svg

(sudo might be needed again.)
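To make the middle step of that pipeline less opaque, here is a simplified Rust sketch of what stack folding does. It is not the real stackcollapse-perf.pl, which also parses the perf script text. fold_stacks is a hypothetical helper that assumes each sample is already a list of frame names in leaf-first order, the order in which perf script prints them.

```rust
use std::collections::BTreeMap;

/// Folds sampled call stacks into "root;...;leaf count" lines,
/// the one-line-per-stack format that flamegraph.pl reads.
/// Each input stack is leaf-first (callee before caller).
/// Hypothetical helper for illustration only.
pub fn fold_stacks(samples: &[Vec<&str>]) -> Vec<String> {
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    for stack in samples {
        // Reverse to root-first order and join the frames with ';'.
        let key = stack.iter().rev().copied().collect::<Vec<_>>().join(";");
        // Identical stacks are merged by incrementing their sample count.
        *counts.entry(key).or_insert(0) += 1;
    }
    // BTreeMap iteration gives the lines sorted by stack string.
    counts
        .into_iter()
        .map(|(stack, n)| format!("{} {}", stack, n))
        .collect()
}
```

Two identical samples of input_read called from load_fmt_file called from main fold into the single line "main;load_fmt_file;input_read 2". The width of each box in the flame graph is proportional to that count.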

Open graph.svg in your browser (Firefox works well) for some interactive goodness.

Performance

The not-so-good news is that a lot of time is indeed spent in engines::input_read, about 35% of the total run time. load_fmt_file itself takes 40% of the total run time. The good news is that there is a lot of low-hanging fruit for a speedup. For instance, the engine state after loading a format file could be cached for consecutive runs, input IO could be buffered on the C side, etc.
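The buffering idea can be illustrated with a small Rust sketch. This is not tectonic's code, and it buffers on the Rust side rather than in C. CountingReader, unbuffered_calls, and buffered_calls are hypothetical names, and the call counter merely stands in for a costly trip across the C/Rust bridge. It only shows how a buffer reduces the number of calls that reach the underlying reader.

```rust
use std::io::{self, BufReader, Read};

/// Wraps a reader and counts how often `read` is called on it.
/// Hypothetical stand-in for an expensive call across the IO bridge.
pub struct CountingReader<R> {
    inner: R,
    pub calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads the input one byte per call, like an unbuffered getc-style loop,
/// and returns how many calls reached the underlying reader.
pub fn unbuffered_calls(data: &[u8]) -> io::Result<usize> {
    let mut reader = CountingReader { inner: data, calls: 0 };
    let mut byte = [0u8; 1];
    while reader.read(&mut byte)? == 1 {}
    Ok(reader.calls)
}

/// Runs the same byte-at-a-time loop through a buffer of `cap` bytes.
/// Only buffer refills reach the underlying reader.
pub fn buffered_calls(data: &[u8], cap: usize) -> io::Result<usize> {
    let counting = CountingReader { inner: data, calls: 0 };
    let mut reader = BufReader::with_capacity(cap, counting);
    let mut byte = [0u8; 1];
    while reader.read(&mut byte)? == 1 {}
    Ok(reader.get_ref().calls)
}
```

For a 10,000-byte input, the unbuffered loop makes one call per byte plus a final call that reports end of input, while the version with a 4096-byte buffer only makes a handful of calls. Whether the same trick pays off inside the engine would have to be measured.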

pkgw · June 20, 2017, 2:52am

Thanks for posting the detailed instructions!

It is good to know that the I/O indirection is indeed adding overhead, although subjectively I have been very happy with the performance so far. I have to admit that I’m not planning to prioritize work on the performance front unless things somehow become much slower than they are now, but it would be awesome if someone decides this is a fun problem and puts some work in.

Have you compared at all to plain xelatex? I have to admit that I’m curious how we compare in speed … given what you’ve said, I’d guess that Tectonic is going to be slower, but you never know until you measure.

rekka · June 20, 2017, 3:37am

Yep, this is more of a sanity check that we are not adding too much overhead. I was also curious about the internals of TeX and where it spends most of its time. And now I finally have a simple way to compile TeX from source.

Just a quick test on a 44-page note that I am working on at the moment, using time:

xelatex lecture.tex  1.51s user 0.04s system 107% cpu 1.450 total
tectonic lecture.tex -r0  1.94s user 0.03s system 99% cpu 1.969 total

(clean directory, 1 pass only, load_fmt_file takes only about 10% of the runtime in this case)
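To put a number on the gap, the timings above can be turned into a relative slowdown. This is just the arithmetic, written as a tiny Rust sketch. slowdown_percent is a hypothetical helper, and the inputs are the two measurements quoted above from a single run each.

```rust
/// Relative slowdown of `candidate_secs` versus `baseline_secs`, in percent.
/// Hypothetical helper: positive means the candidate is slower.
pub fn slowdown_percent(baseline_secs: f64, candidate_secs: f64) -> f64 {
    (candidate_secs / baseline_secs - 1.0) * 100.0
}

/// Wall-clock slowdown of tectonic (1.969 s) over xelatex (1.450 s).
pub fn wall_clock_slowdown() -> f64 {
    slowdown_percent(1.450, 1.969)
}

/// User-time slowdown of tectonic (1.94 s) over xelatex (1.51 s).
pub fn user_time_slowdown() -> f64 {
    slowdown_percent(1.51, 1.94)
}
```

That works out to roughly 36% in wall-clock time and roughly 28% in user time, which is about half a second on this document.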

They are basically comparable, so not a big deal at the moment. This is definitely a low priority.

I’ve been using tectonic as my default ‘edit -> compile -> edit’ engine, and I’ve been happy with the performance.

pkgw · June 22, 2017, 12:30pm

Yeah, that’s consistent with my experience. I’ve been building a 100-page book with a lot of images and the compilations have been quite snappy.