cpg

Zola syntax highlighting with Pygments

2023-07-29 #rust

Replacing syntect with pygments as Zola’s syntax highlighter.

This website is created using the Zola static website engine, combining HTML templates (with Tera) and Markdown files.

Zola highlights code blocks such as

async fn test() {
    todo!()
}

using the syntect Rust crate, which is based on the syntax definitions from Sublime Text.

Issues with syntect and possible solutions

Unfortunately, syntect uses relatively old Sublime syntaxes and an update is not trivial. This results for example in:

This problem is discussed at length in this Zola issue, with two options being migrating from syntect to

Syntax highlighting with Pygments

In the meantime, we can do a really dirty hack and call pygments from Rust, not even via pyo3 but via the pygmentize CLI, paying for the initialization of the Python runtime at each call.

Processing an average-sized code block with pygmentize takes roughly 100 ms. Given that Zola is a static website generator and that pages are built in parallel, increased compilation time is a problem mostly for live previewing (zola serve).

To speed things up a bit, we can implement caching based on the code block contents, so that each code block needs to be processed only once. Overall, the change is focused on the CodeBlock::highlight method:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::Write;
use std::process::{Command, Stdio};

use itertools::Itertools; // provides `join` on iterators

impl CodeBlock {
    pub fn highlight(&mut self, content: &str) -> errors::Result<String> {
        // Set up the cache directory (e.g. ~/.cache/zola on Linux)
        let cache = dirs::cache_dir().unwrap();
        let cache = cache.join("zola");
        std::fs::create_dir_all(&cache)?;

        // Check the cache: the key is a hash of the block contents and its language
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        self.language.hash(&mut hasher);
        let hash = hasher.finish();
        let filename = cache.join(hash.to_string());
        if filename.is_file() {
            return Ok(std::fs::read_to_string(&filename)?);
        }

        // This is critical to not mess up the recursive parsing in markdown_to_html:
        // empty lines are replaced with a single space
        let content = content.lines().map(|l| if l.is_empty() { " " } else { l })
                             .join("\n");

        // Cache miss: run the block through pygmentize
        let mut child = Command::new("pygmentize")
            .args(["-l", &self.language, "-P", "style=nord", "-P", "nowrap",
                   "-f", "html"])
            .stdout(Stdio::piped())
            .stdin(Stdio::piped())
            .spawn()?;

        // `write_all` (unlike `write`) guarantees the whole buffer is sent;
        // dropping stdin closes the pipe so that pygmentize sees EOF
        let mut stdin = child.stdin.take().unwrap();
        stdin.write_all(content.as_bytes())?;
        drop(stdin);

        let output = child.wait_with_output()?;
        let output = String::from_utf8(output.stdout)?;

        // Write the result to the cache
        std::fs::write(&filename, &output)?;
        Ok(output)
    }
}
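The two pure steps of this method, computing the cache key and padding empty lines, can be tried out in isolation without pygmentize installed. The following is only a minimal sketch: the helper names cache_key and pad_empty_lines are hypothetical and do not exist in Zola, and the iterator is collected into a Vec so that the standard library's join can be used instead of itertools.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: cache key derived from the block contents and its language.
fn cache_key(content: &str, language: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    language.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: replace empty lines with a single space,
/// as done before handing the block to pygmentize.
fn pad_empty_lines(content: &str) -> String {
    content
        .lines()
        .map(|l| if l.is_empty() { " " } else { l })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let code = "fn main() {\n\n}";

    // Same contents and language give the same key, hence a cache hit.
    assert_eq!(cache_key(code, "rust"), cache_key(code, "rust"));
    // Changing the language changes the key, hence a cache miss.
    assert_ne!(cache_key(code, "rust"), cache_key(code, "python"));

    // The empty middle line becomes a line holding a single space.
    assert_eq!(pad_empty_lines(code), "fn main() {\n \n}");

    println!("cache file name: {}", cache_key(code, "rust"));
}
```

Note that the algorithm behind DefaultHasher is not specified and may change between Rust releases, so the cache may be silently invalidated after a toolchain upgrade. For a cache this only costs one slow rebuild, but a fixed hash function would be needed if the keys had to stay stable.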

The CSS must be included on the page separately after being generated with:

$ pygmentize -S nord -f html > highlighting.css

Timings

This results in the following global generation timings for this website:

Zola highlighting    Generation time [ms]
syntect              190
pygments             950
pygments (cached)    90

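With a cold cache, generation is about five times slower than with syntect (950 ms vs 190 ms), while with a warm cache it is about twice as fast (90 ms). Assuming that only one code block changes at a time, and that such changes are infrequent compared to text edits, this is reasonable.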

Follow-ups

With a bit more care, we could: