I want to share my experience with profiling tectonic on Ubuntu 17.04 using perf. While it is probably too early to optimize the code, and tectonic is definitely not slow, I was curious how much overhead the Rust IO bridge adds on top of XeTeX.
Setup
First, we need to compile tectonic with debug info and frame pointers so that perf can produce call graphs. You can use my profile branch.
In Cargo.toml, add
[profile.release]
debug = true
and add the -fno-omit-frame-pointer flag to gcc in build.rs by adding the lines
ccfg
...
.flag("-fno-omit-frame-pointer")
cpcfg
...
.flag("-fno-omit-frame-pointer")
Profiling
Compile with cargo build --release and use perf to record the profiling data:
perf record -F 9997 -g -- target/release/tectonic tests/xenia/paper.tex
You might have to use sudo. Warning: perf sometimes freezes my system completely; not even the Linux consoles respond and a hardware reset is needed, so save your work before running it.
The -g option asks perf to record call graphs, and -- separates perf's own options from the command being profiled. -F 9997 sets the sampling frequency in Hz; this value seems to work well in my case.
To see the data, run
perf report
Again, sudo might be needed.
Unfortunately, the report is not super readable.
Flame graph
FlameGraph can be used to turn the same data into a nice interactive graph.
Just download stackcollapse-perf.pl and flamegraph.pl, put them on your $PATH and run
perf script | stackcollapse-perf.pl | flamegraph.pl > graph.svg
(sudo might be needed again.)
Open graph.svg in your browser (Firefox works well) for some interactive goodness.
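If you want numbers rather than a picture, the folded output of stackcollapse-perf.pl is easy to summarize yourself. Each line is a semicolon-separated stack followed by a sample count. Here is a minimal sketch, assuming that format; the function names inclusive_counts and share are my own, and the sample data is made up for illustration, not measured.

```rust
use std::collections::HashMap;

/// Parse folded-stack lines such as "main;load;read 42" and return the
/// total sample count plus the inclusive sample count per frame.
fn inclusive_counts(folded: &str) -> (u64, HashMap<String, u64>) {
    let mut total = 0u64;
    let mut per_frame: HashMap<String, u64> = HashMap::new();
    for line in folded.lines() {
        // The count is the last space-separated field; frame names may contain spaces.
        let (stack, count) = match line.trim().rsplit_once(' ') {
            Some(parts) => parts,
            None => continue, // skip empty or malformed lines
        };
        let count: u64 = match count.parse() {
            Ok(n) => n,
            Err(_) => continue,
        };
        total += count;
        // Count every frame once per stack, even if it appears recursively.
        let mut seen: Vec<&str> = Vec::new();
        for frame in stack.split(';') {
            if !seen.contains(&frame) {
                seen.push(frame);
                *per_frame.entry(frame.to_string()).or_insert(0) += count;
            }
        }
    }
    (total, per_frame)
}

/// Percentage of all samples whose stack contains `frame`.
fn share(folded: &str, frame: &str) -> f64 {
    let (total, per_frame) = inclusive_counts(folded);
    if total == 0 {
        return 0.0;
    }
    100.0 * per_frame.get(frame).copied().unwrap_or(0) as f64 / total as f64
}

fn main() {
    // Made-up data with hypothetical frame names, not real measurements.
    let folded = "main;load;read 3\nmain;load 1\nmain;run;read 2\nmain;run 4\n";
    for frame in &["load", "read", "run"] {
        println!("{}: {:.1}% of samples", frame, share(folded, frame));
    }
}
```

To use it on real data, save the output of perf script | stackcollapse-perf.pl to a file and feed its contents to share instead of the hard-coded string.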
Performance
The not-so-good news is that a lot of time is indeed spent in engines::input_read, about 35% of the total run time. load_fmt_file itself takes 40% of the total run time. The good news is that there is a lot of low-hanging fruit for a speedup. For instance, the engine state after loading a format file could be cached for consecutive runs, input IO could be buffered on the C side, etc.
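To illustrate why buffering should help, here is a small sketch that counts how many reads reach the underlying source when data is consumed one byte at a time, with and without a buffer in between. It is written in Rust with std::io::BufReader only to show the effect; the suggestion above is to buffer on the C side, and CountingReader and count_calls are hypothetical names that do not exist in tectonic.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Wraps a reader and counts how often `read` is called on it.
/// Each call stands in for one (comparatively costly) trip through the IO bridge.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Consume `data` one byte at a time and return
/// (bytes consumed, number of reads that reached the underlying source).
fn count_calls(data: &[u8], buffered: bool) -> io::Result<(usize, usize)> {
    let counting = CountingReader { inner: Cursor::new(data), calls: 0 };
    let mut byte = [0u8; 1];
    let mut consumed = 0;
    if buffered {
        // The 4096-byte capacity is an arbitrary choice for this sketch.
        let mut reader = BufReader::with_capacity(4096, counting);
        while reader.read(&mut byte)? == 1 {
            consumed += 1;
        }
        Ok((consumed, reader.get_ref().calls))
    } else {
        let mut reader = counting;
        while reader.read(&mut byte)? == 1 {
            consumed += 1;
        }
        Ok((consumed, reader.calls))
    }
}

fn main() -> io::Result<()> {
    let data = vec![b'x'; 10_000];
    let (n_plain, calls_plain) = count_calls(&data, false)?;
    let (n_buf, calls_buf) = count_calls(&data, true)?;
    println!("unbuffered: {} bytes, {} underlying reads", n_plain, calls_plain);
    println!("buffered:   {} bytes, {} underlying reads", n_buf, calls_buf);
    Ok(())
}
```

The unbuffered loop makes one underlying read per byte, while the buffered one only makes one per buffer refill. Whether this translates into a similar saving for tectonic would have to be confirmed by profiling again after the change.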

