This website is created using the Zola static website engine, combining HTML templates (with Tera) and Markdown files.
Zola highlights code blocks such as
async fn test() {
    todo!()
}
using the syntect Rust crate, based on the syntaxes from Sublime Text.
Issues with syntect and possible solutions
Unfortunately, syntect uses relatively old Sublime syntaxes and an update is not trivial. This results, for example, in:
- async and await keywords not being highlighted correctly in Rust.
- Useful syntaxes missing, such as a console mode.
This problem is discussed at length in this Zola issue, with two options being to migrate from syntect to:
- tree-sitter; the GitHub issue contains a prototype branch using the tree-painter crate.
- a Rust port of the pygments Python library, which works quite well despite performing simpler parsing than tree-sitter.
Syntax highlighting with Pygments
In the meantime, we can do a really dirty hack and call pygments from Rust, not even via pyo3 but via the pygmentize CLI, paying for the initialization of the Python runtime at each call.
Processing an average-sized code block with pygmentize takes roughly 100 ms. Given that Zola is a static website generator and that pages are built in parallel, increased compilation time is a problem mostly for live previewing (zola serve).
To speed things up a bit, we can implement caching based on the code block contents, so that each code block needs to be processed only once. Overall, the change is focused on the CodeBlock::highlight method:
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::Write;
use std::process::{Command, Stdio};

impl CodeBlock {
    pub fn highlight(&mut self, content: &str) -> errors::Result<String> {
        // Setup cache
        let cache = dirs::cache_dir().unwrap();
        let cache = cache.join("zola");
        std::fs::create_dir_all(&cache)?;
        // Check cache: the key depends on both the contents and the language
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        self.language.hash(&mut hasher);
        let hash = hasher.finish();
        let filename = cache.join(hash.to_string());
        if filename.is_file() {
            return Ok(std::fs::read_to_string(filename)?);
        }
        // This is critical to not mess up the recursive parsing in markdown_to_html
        let content = content
            .lines()
            .map(|l| if l.is_empty() { " " } else { l })
            .collect::<Vec<_>>()
            .join("\n");
        // Run through pygmentize if necessary
        let mut child = Command::new("pygmentize")
            .args(["-l", &self.language, "-P", "style=nord", "-P", "nowrap",
                   "-f", "html"])
            .stdout(Stdio::piped())
            .stdin(Stdio::piped())
            .spawn()?;
        let mut stdin = child.stdin.take().unwrap();
        // write_all sends the whole block; a plain write may stop after a partial write
        stdin.write_all(content.as_bytes())?;
        // Close stdin so that pygmentize sees the end of its input
        drop(stdin);
        let output = child.wait_with_output()?;
        let output = String::from_utf8(output.stdout)?;
        // Write cache
        std::fs::write(filename, &output)?;
        Ok(output)
    }
}
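The caching part can also be looked at on its own, independently of pygmentize. The following is a minimal sketch rather than the code used for this site: cache_path and cached_highlight are hypothetical helper names, the cache directory is passed in explicitly, and the highlighter is an arbitrary closure so that the logic can be tried without Python installed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the cache file path from the code block
/// contents and its language, as in the hashing step of the method above.
fn cache_path(cache_dir: &Path, content: &str, language: &str) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    language.hash(&mut hasher);
    cache_dir.join(hasher.finish().to_string())
}

/// Hypothetical helper: return the cached output if there is one, otherwise
/// run `highlight`, store its result in the cache, and return it.
fn cached_highlight<F>(
    cache_dir: &Path,
    content: &str,
    language: &str,
    highlight: F,
) -> io::Result<String>
where
    F: FnOnce(&str, &str) -> io::Result<String>,
{
    fs::create_dir_all(cache_dir)?;
    let path = cache_path(cache_dir, content, language);
    if path.is_file() {
        // Cache hit: the highlighter is not called at all
        return fs::read_to_string(path);
    }
    let output = highlight(content, language)?;
    fs::write(&path, &output)?;
    Ok(output)
}
```

Calling cached_highlight twice with the same contents and language runs the closure only the first time. One caveat: the output of DefaultHasher is not guaranteed to be stable across Rust releases, so a toolchain update may simply invalidate the cache.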
The CSS must be included on the page separately after being generated with:
$ pygmentize -S nord -f html > highlighting.css
Timings
This results in the following global generation timings for this website:
Zola highlighting | Generation time [ms] |
---|---|
syntect | 190 |
pygments | 950 |
pygments (cached) | 90 |
Assuming that only one block of code changes at a time and that such changes are infrequent compared to text edits, this is reasonable.
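To put a number on that assumption, here is a back-of-the-envelope sketch. The function estimated_rebuild_ms is hypothetical, and the model is deliberately crude: it adds one pygmentize call per changed block to the fully cached build time and ignores parallelism and the cost of writing the cache.

```rust
/// Hypothetical estimate: a fully cached build plus one pygmentize call
/// per code block whose contents changed since the last build.
fn estimated_rebuild_ms(cached_build_ms: u32, per_block_ms: u32, changed_blocks: u32) -> u32 {
    cached_build_ms + per_block_ms * changed_blocks
}
```

With the figures from this page (90 ms for a cached build, roughly 100 ms per pygmentize call), a rebuild after editing a single code block comes out at about 190 ms under this model, in the same range as the syntect build.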
Follow-ups
With a bit more care, we could:
- Read the style from the configuration rather than hard-coding it.
- Support the line annotations from Zola (numbering, selecting, highlighting, hiding).
- Avoid repeated loadings of Python and the pygments module, although that might conflict with GIL locks.
- Implement caching for syntect too.
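As an illustration of the first follow-up, the pygmentize arguments could be built from a setting instead of a literal. This is only a sketch: HighlightConfig and its style field are hypothetical and do not correspond to an existing Zola configuration option, and the arguments are the same ones used in the method above.

```rust
use std::process::{Command, Stdio};

/// Hypothetical settings struct; not part of Zola's configuration.
struct HighlightConfig {
    /// Name of a Pygments style, e.g. "nord"
    style: String,
}

/// Build the pygmentize arguments for a language, taking the style
/// from the configuration instead of hard-coding it.
fn pygmentize_args(language: &str, config: &HighlightConfig) -> Vec<String> {
    vec![
        "-l".to_string(),
        language.to_string(),
        "-P".to_string(),
        format!("style={}", config.style),
        "-P".to_string(),
        "nowrap".to_string(),
        "-f".to_string(),
        "html".to_string(),
    ]
}

/// Prepare (but do not spawn) the pygmentize process with piped stdin/stdout.
fn pygmentize_command(language: &str, config: &HighlightConfig) -> Command {
    let mut command = Command::new("pygmentize");
    command
        .args(pygmentize_args(language, config))
        .stdin(Stdio::piped())
        .stdout(Stdio::piped());
    command
}
```

If the style becomes configurable, it should probably be hashed into the cache key as well, and the CSS file would have to be regenerated for the chosen style.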