This website is created using the Zola static website engine, combining HTML templates (with Tera) and Markdown files.
Zola highlights code blocks such as
async fn test() {
    todo!()
}
using the syntect Rust crate, which is based on the syntaxes from Sublime Text.
Issues with syntect and possible solutions
Unfortunately, syntect uses relatively old Sublime syntaxes and an update is not trivial. This results for example in:
- async and await keywords not being highlighted correctly in Rust.
- Useful syntaxes missing, such as a console mode.
This problem is discussed at length in this Zola issue, with two options being to migrate from syntect to:
- tree-sitter; the GitHub issue contains a prototype branch using the tree-painter crate.
- a Rust port of the pygments Python library, which works quite well despite performing simpler parsing than tree-sitter.
Syntax highlighting with Pygments
In the meantime, we can do a really dirty hack and call pygments from Rust, not even via pyo3 but via the pygmentize CLI, paying for the initialization of the Python runtime at each call.
Processing an average-sized code block with pygmentize takes roughly 100 ms. Given that Zola is a static website generator and that pages are built in parallel, increased compilation time is a problem mostly for live previewing (zola serve).
To speed things up a bit, we can implement caching based on the code block contents, so that each code block needs to be processed only once. Overall, the change is focused on the CodeBlock::highlight method:
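use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::Write;
use std::process::{Command, Stdio};

impl CodeBlock {
    pub fn highlight(&mut self, content: &str) -> errors::Result<String> {
        // Set up the cache directory (requires the `dirs` crate)
        let cache = dirs::cache_dir().unwrap();
        let cache = cache.join("zola");
        std::fs::create_dir_all(&cache)?;

        // Check the cache: the key is a hash of the block contents and its language
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        self.language.hash(&mut hasher);
        let hash = hasher.finish();
        let filename = cache.join(hash.to_string());
        if filename.is_file() {
            return Ok(std::fs::read_to_string(filename)?);
        }

        // This is critical to not mess up the recursive parsing in markdown_to_html:
        // empty lines are replaced with a single space
        let content = content
            .lines()
            .map(|l| if l.is_empty() { " " } else { l })
            .collect::<Vec<_>>()
            .join("\n");

        // Run through pygmentize if necessary
        let mut child = Command::new("pygmentize")
            .args(["-l", &self.language, "-P", "style=nord", "-P", "nowrap", "-f", "html"])
            .stdout(Stdio::piped())
            .stdin(Stdio::piped())
            .spawn()?;
        let mut stdin = child.stdin.take().unwrap();
        // write_all, unlike write, guarantees that the whole buffer is sent
        stdin.write_all(content.as_bytes())?;
        // Close stdin so that pygmentize sees the end of its input
        drop(stdin);
        let output = child.wait_with_output()?;
        let succeeded = output.status.success();
        let output = String::from_utf8(output.stdout)?;

        // Write the cache, but only after a successful run so that failures are retried
        if succeeded {
            std::fs::write(filename, &output)?;
        }
        Ok(output)
    }
}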
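The caching step can also be isolated from the pygmentize call. The following is a minimal sketch using only the standard library; the cached_highlight helper, its signature, and the render closure are hypothetical names introduced for illustration and are not part of Zola. Note that the output of DefaultHasher is not guaranteed to stay the same across Rust releases, so a toolchain upgrade may simply invalidate the cache.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

/// Hypothetical helper (not part of Zola): returns the cached output for
/// `(language, content)` if it exists, otherwise calls `render`, stores its
/// result in `cache_dir`, and returns it.
fn cached_highlight<F>(
    cache_dir: &Path,
    language: &str,
    content: &str,
    render: F,
) -> io::Result<String>
where
    F: FnOnce(&str, &str) -> io::Result<String>,
{
    fs::create_dir_all(cache_dir)?;

    // The key must cover everything that influences the rendered output.
    let mut hasher = DefaultHasher::new();
    language.hash(&mut hasher);
    content.hash(&mut hasher);
    let path = cache_dir.join(format!("{:016x}", hasher.finish()));

    // Cache hit: the expensive renderer is never invoked.
    if path.is_file() {
        return fs::read_to_string(path);
    }

    // Cache miss: render once, then persist the result for the next build.
    let output = render(language, content)?;
    fs::write(&path, &output)?;
    Ok(output)
}
```

Passing the renderer as a closure keeps the cache logic testable without Python installed: a second call with the same language and content should return the stored string without calling the closure again.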
The CSS must be included on the page separately after being generated with:
$ pygmentize -S nord -f html > highlighting.css
Timings
This results in the following global generation timings for this website:
| Zola highlighting | Generation time [ms] |
|---|---|
| syntect | 190 |
| pygments | 950 |
| pygments (cached) | 90 |
Assuming that only one block of code changes at a time and that such changes are infrequent compared to text edits, this is reasonable: only the first build after a code change pays the full pygments cost.
Follow-ups
With a bit more care, we could:
- Read the style from the configuration rather than hard-coding it.
- Support the line annotations from Zola (numbering, selecting, highlighting, hiding).
- Avoid repeated loadings of Python and the pygments module, although that might conflict with the GIL.
- Implement caching for syntect too.
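As an illustration of the first follow-up, the pygmentize arguments could be built from a settings value instead of a hard-coded style. This is only a sketch: HighlightSettings and pygmentize_args are hypothetical names, and how the style would actually be exposed in Zola's configuration is an assumption.

```rust
/// Hypothetical settings type; Zola's real configuration struct may differ.
struct HighlightSettings {
    /// Name of a Pygments style, e.g. "nord".
    style: String,
}

/// Builds the same pygmentize arguments as the hack above, with the style
/// taken from the settings instead of being hard-coded.
fn pygmentize_args(language: &str, settings: &HighlightSettings) -> Vec<String> {
    vec![
        "-l".to_string(),
        language.to_string(),
        "-P".to_string(),
        format!("style={}", settings.style),
        "-P".to_string(),
        "nowrap".to_string(),
        "-f".to_string(),
        "html".to_string(),
    ]
}
```

The resulting vector can be passed directly to Command::args. Any option that ends up affecting the generated HTML should then also be hashed into the cache key, and the CSS file would need to be regenerated for the chosen style.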