September 9, 2022

Speeding up incremental Rust compilation with dylibs

TL;DR

Creating a dynamic version of a Rust library can improve incremental compilation speed during development quite substantially. cargo add-dynamic automates the setup process.

One of Rust’s biggest downsides is that it tends to be rather slow when compiling programs. A lot of work has been put into improving the situation over the years, with noticeable impact. In addition, there are a number of good guides out there that explain the impact of various optimization options and tooling choices:

One tip that I have not seen widely mentioned is the use of dynamic libraries, which can decrease incremental compilation time quite dramatically. Bevy provides this as a feature flag (see last link above), but this approach can be used for most other dependencies as well.

Incremental compilation matters so much because it directly affects development feedback. This is not just re-running the application but also the wait time for feedback from the compiler and tools such as rust-analyzer. When recompilation is slow, it is not rare to wait seconds for compiler errors to appear — infuriating!

Changelog

  • 2022-09-14: Mentioning diamond-dependency problem
  • 2022-09-10: Added breakdown about time spent, link to reddit, env var for feature

Example with polars

Here is an easy-to-reproduce example with polars. Polars is a dataframe library, a Rust (and Python) alternative to the popular Python pandas library. I have used it a lot lately for data analysis and exploration, during which fast feedback cycles help a lot.

So to get started, here is the baseline with a simple polars hello world:

# create a new project
cargo new polars-test
cd polars-test

# add polars as dependency
cargo add polars --features csv-file,lazy,list,describe,rows,fmt,strings,temporal

# modify main.rs
cat > src/main.rs << EOM
use polars::prelude::*;

fn main() {
    let df = df!("col 1" => ["a"]).unwrap();
    dbg!(df);
}
EOM

# first full build, this should take a while
cargo build

We create a new project, add polars as a dependency, and create a dataframe in the main function. Polars offers a ton of (Rust) features, so much so that using it without a number of those does not make much sense. The selection above is typical of what I would use for working on CSV data. The initial build is expected to take a while (on my computer [1] about 25 seconds).

But now on to incremental changes. We make a small change to the main.rs file and rebuild.

sed -i '' 's/"a"/"b"/' src/main.rs
time cargo build
# real	0m2.896s
# user	0m2.080s
# sys	0m0.237s

Almost 3 seconds, not great. Let’s try the same thing with polars as a dynamic library:

cargo rm polars
cargo install cargo-add-dynamic
cargo add-dynamic polars --features csv-file,lazy,list,describe,rows,fmt,strings,temporal
cargo build

We use the cargo subcommand cargo-add-dynamic to re-add polars, this time wrapped as a dynamic library. Using this command is optional; the setup can also be done manually (see below).

And now building incrementally:

sed -i '' 's/"b"/"a"/' src/main.rs
time cargo build
# real	0m0.485s
# user	0m0.391s
# sys	0m0.110s

This is roughly a 6x improvement over the static version! Moreover, half a second is barely noticeable, while 3 seconds gets annoying quickly. So how does it work?
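As a quick check of the arithmetic, the speedup is simply the ratio of the two `real` timings measured above. A minimal sketch (the function name `speedup` is made up for illustration):

```rust
/// Ratio of the build time before the change to the build time after it.
/// A value of 2.0 means the build now takes half as long.
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}
```

With the numbers from the two runs, `speedup(2.896, 0.485)` comes out just under 6.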

Wrapping dependencies as dylibs

By default, Rust dependencies are linked statically into the resulting binary file being built. For production use, this is typically very desirable, as it enables a number of optimizations like inlining and makes the resulting binary (mostly) “standalone”, i.e. no (or only some) libraries will need to be found and loaded at runtime.

During development, the Rust compiler needs to do this work as well. This means that for “heavy” dependencies that export a lot of symbols, recompilation and linking can be expensive. When a dependency is forced to be dynamic, however, the compiler does not need to do the same amount of work.

We can capture and then compare build times using the nightly-only -Z time-passes flag, which outputs time and memory usage for the various compilation steps:

RUSTFLAGS="-Z time-passes" time cargo +nightly build &> time-passes.log

This results in many lines of output, so redirecting this into a file is recommended. The output of both a dynamic and a static recompilation can be diffed but the differences are quite visible directly when looking at the last few lines:

With a static build:

...
time:   2.352; rss:  105MB ->  105MB (   +0MB)	run_linker
time:   2.353; rss:  104MB ->  105MB (   +1MB)	link_binary
time:   2.353; rss:  104MB ->  105MB (   +1MB)	link_crate
time:   2.356; rss:  104MB ->  105MB (   +1MB)	link
time:   2.511; rss:   20MB ->   59MB (  +39MB)	total
    Finished dev [unoptimized + debuginfo] target(s) in 2.64s

With a dynamic build:

...
time:   0.202; rss:  103MB ->  103MB (   +0MB)	run_linker
time:   0.203; rss:  102MB ->  103MB (   +1MB)	link_binary
time:   0.203; rss:  102MB ->  103MB (   +1MB)	link_crate
time:   0.206; rss:  102MB ->  103MB (   +1MB)	link
time:   0.346; rss:   20MB ->   56MB (  +36MB)	total
    Finished dev [unoptimized + debuginfo] target(s) in 0.47s
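Instead of eyeballing a diff, the two logs can also be compared programmatically. The following is a rough sketch, not an official tool: it assumes each relevant line starts with `time:` followed by the seconds and a `;`, and that the pass name is the last whitespace-separated token, as in the output shown here. The function names are made up for this example. If a pass name occurs several times in one log, only its last occurrence is kept.

```rust
use std::collections::BTreeMap;

/// Parse one `-Z time-passes` line into (pass name, seconds).
/// Returns None for lines that do not look like a timing line.
fn parse_pass(line: &str) -> Option<(String, f64)> {
    let rest = line.trim_start().strip_prefix("time:")?;
    let (secs, _) = rest.split_once(';')?;
    let secs: f64 = secs.trim().parse().ok()?;
    // Assumption: the pass name is the last token on the line.
    let name = line.split_whitespace().last()?;
    Some((name.to_string(), secs))
}

/// Collect pass name -> seconds for a whole log (last occurrence wins).
fn pass_times(log: &str) -> BTreeMap<String, f64> {
    log.lines().filter_map(parse_pass).collect()
}

/// For every pass present in both logs, compute static minus dynamic time,
/// sorted so that the pass with the largest saving comes first.
fn slowest_deltas(static_log: &str, dynamic_log: &str) -> Vec<(String, f64)> {
    let stat = pass_times(static_log);
    let dynm = pass_times(dynamic_log);
    let mut deltas: Vec<(String, f64)> = stat
        .iter()
        .filter_map(|(name, s)| dynm.get(name).map(|d| (name.clone(), s - d)))
        .collect();
    deltas.sort_by(|a, b| b.1.total_cmp(&a.1));
    deltas
}
```

To use it, read both log files with `std::fs::read_to_string` and print the first few entries of `slowest_deltas(&static_log, &dynamic_log)`.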

So the main differences really are just in the linking phase. As was mentioned by u/valarauca14 on Reddit:

Static linking is relatively complex while run-time dynamic linking is super straight forward.

With Static Linking you need to index the object, respect the “weakness”/“strength” of symbols, resolve conflicts, and potentially rewrite offsets within functions/assembly when gluing things together. Potentially repair debug/data that now conflicts with one-another. Constants maybe merged/dropped. These constants are not necessarily defined by the developer, they can be ABI specific stuff (like dynamic linking points/stack unwrapping heap reservations, etc.).

This work is rather expensive computationally. Fast linkers like mold don’t support some of these things (or they’ve received the bug report and are working on a solution). While faster linkers like gold have the luxury of newer algorithms/fresh eyes on old problems.

With dynamic-linking. The object file has a literal index to help you determine symbol -> start/end byte offset. This file will be pre-loaded into memory at start. It is a very quick operation, normally just a memcp (to put the function into executable memory) + mov to update the offset so the original program can link to. This also occurs lazily at runtime, so loading & linking is amortized into runtime.

And indeed, as all of the guides linked above recommend, switching to a faster linker such as mold (on Linux) or zld (on macOS) does make a difference. Switching to zld on my machine brings down the static compilation time to around 2 seconds while the dynamic compilation time stays at around 0.5 seconds. So even though a faster linker improves the situation, there is still a notable difference.

crate-type = ["dylib"]

How can a dependency be made dynamic? The Rust compiler supports various methods to link crates together, both statically and dynamically. A dependency itself can specify its output type in its Cargo.toml using the crate-type setting:

[lib]
crate-type = ["dylib"]

When a crate is only available as dylib, the compiler will be forced to link to it dynamically.

polars-dynamic

Going back to the example above: the cargo add-dynamic polars command automated the creation of a sub-package, polars-dynamic, that wraps the original library. It has the following content:

polars-dynamic/Cargo.toml

[package]
name = "polars-dynamic"
version = "0.1.0"
edition = "2021"

[dependencies]
polars = { version = "0.23.2", features = ["csv-file", "lazy", "list", "describe", "rows", "fmt", "strings", "temporal"] }

[lib]
crate-type = ["dylib"]

polars-dynamic/src/lib.rs

pub use polars::*;

Cargo.toml

And then, instead of depending directly on polars, the main project can use the local wrapped sub-package:

# ...

[dependencies]
polars = { path = "polars-dynamic", package = "polars-dynamic" }
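The reason the application code does not have to change is that the wrapper re-exports everything with a glob `pub use`, so all paths resolve through it as before. The following self-contained sketch imitates this with plain modules in a single file; `heavy_dep`, `heavy_dep_dynamic` and `describe` are stand-ins invented for the illustration, playing the roles of polars, polars-dynamic and some item from the prelude.

```rust
// Stand-in for the real dependency (plays the role of `polars`).
mod heavy_dep {
    pub mod prelude {
        pub fn describe() -> &'static str {
            "from heavy_dep::prelude"
        }
    }
}

// Stand-in for the wrapper crate: its whole content is one glob re-export,
// just like `pub use polars::*;` in polars-dynamic/src/lib.rs.
mod heavy_dep_dynamic {
    pub use super::heavy_dep::*;
}

// The consumer only names the wrapper; nested paths such as `prelude`
// are still reachable because the glob re-exported them.
use heavy_dep_dynamic::prelude::*;

fn call_through_wrapper() -> &'static str {
    describe()
}
```

In the real setup the wrapper is a separate crate rather than a module, and the dependency key `polars` in Cargo.toml points at it, which is why `use polars::prelude::*;` keeps compiling unchanged.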

Limitation: The diamond dependency problem

As was brought up in a GitHub issue, there is an important limitation to this approach. Because of how Rust deals with transitive dependencies during linking, if two dylibs (B and C) both depend on a static dependency (A) and a binary (D) uses B and C, then you will see a compile error along the lines of error: cannot satisfy dependencies so A only shows up once.

In particular this comes up when trying to dynamically link against both serde and serde_json. If both of these are wrapped as dylibs as shown above, we end up with a dependency graph such as:

graph TD;
  bin["binary"]-.->serde_dylib["serde (dylib)"];
  bin-.->serde_json_dylib["serde_json (dylib)"];
  serde_json_dylib-->serde_json;
  serde_json-->serde1["serde"];
  serde_dylib-->serde2["serde"];

The problem is that the serde (dylib) dependency provides a copy of serde to the binary and serde_json pulls in another copy of serde as well. Even though the versions are the same, Rust cannot deal with this because of global state mentioned in the linked issue above.
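To make the failure mode concrete, here is a toy model of it. This is not how rustc resolves crates; it only encodes the rule described here: every dylib bundles the static crates it reaches without crossing another dylib, and a static crate bundled by more than one dylib is a conflict. The names `Crate`, `Graph`, `bundled` and `duplicated` are invented for this sketch.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A crate in a toy dependency graph.
struct Crate {
    is_dylib: bool,
    deps: Vec<&'static str>,
}

type Graph = BTreeMap<&'static str, Crate>;

/// Collect the static crates that end up inside `root`,
/// stopping at dylib boundaries. Panics if a dependency is not in the graph.
fn bundled(graph: &Graph, root: &str, out: &mut BTreeSet<&'static str>) {
    for &dep in &graph[root].deps {
        if !graph[dep].is_dylib && out.insert(dep) {
            bundled(graph, dep, out);
        }
    }
}

/// Static crates that are bundled into more than one dylib.
fn duplicated(graph: &Graph) -> BTreeSet<&'static str> {
    let mut owners: BTreeMap<&'static str, usize> = BTreeMap::new();
    for (name, krate) in graph {
        if krate.is_dylib {
            let mut set = BTreeSet::new();
            bundled(graph, name, &mut set);
            for c in set {
                *owners.entry(c).or_default() += 1;
            }
        }
    }
    owners
        .into_iter()
        .filter(|(_, count)| *count > 1)
        .map(|(name, _)| name)
        .collect()
}
```

Fed with the graph above (both wrappers reach a static serde), `duplicated` reports serde. If serde_json instead depends on the serde dylib, the result is empty.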

The solution is to make serde_json depend on serde dynamically as well, so that both the binary and serde_json link against the same serde dylib.

graph TD;
  bin["binary"]-.->serde_dylib["serde (dylib)"];
  bin-.->serde_json_dylib["serde_json (dylib)"];
  serde_json_dylib-->serde_json;
  serde_json-.->serde_dylib;
  serde_dylib-->serde;

However, this unfortunately is not straightforward. If serde specified a crate-type of dylib in addition to rlib, we could use RUSTFLAGS="-C prefer-dynamic" to build it as a dylib and have both our binary and serde_json depend on it dynamically. Unfortunately, serde does not specify that crate-type, so the only solution that currently works is to clone https://github.com/serde-rs/json and use a modified version of it that points to a dynamic serde lib. This is certainly undesirable.

For this specific scenario, a pull request against serde to allow for opt-in dylib builds might be one solution. But generally, this seems like a shortcoming of how dylib builds can be specified: the crate that is to be consumed needs to do it via crate-type, and dependent crates cannot override this. In my opinion this is something that could be added to Rust/Cargo, as it looks like a backwards-compatible addition, though there might be complications that I have not considered.

Switching between static and dynamic dependencies

And this is how re-compilation can be improved when depending on “heavy” libraries. You will probably not want to use this for production as this would mean having to ship additional libraries and making sure they can be found at runtime. In addition, a number of optimizations won’t be available.

So switching between a development and production version is advisable. You can do this quite easily with a feature flag as just loading the dynamic library will force dynamic linking:

#![allow(unused_imports)]

#[cfg(feature = "dynamic")]
use polars_dynamic;

use polars::prelude::*;

And Cargo.toml would then specify:

[dependencies]
polars = { version = "0.23.2", features = [...]}
polars-dynamic = { path = "polars-dynamic", optional = true }

[features]
default = []
dynamic = ["dep:polars-dynamic"]

Now cargo build --features dynamic will make use of the dynamic polars dependency.
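When switching back and forth, it can be handy to confirm which variant a given binary was built with. One possible way, sketched here with a made-up helper name, is to report the state of the `dynamic` feature via the `cfg!` macro:

```rust
/// Reports which variant this build was compiled as.
/// `cfg!` is evaluated at compile time, so the answer is fixed per build.
fn linking_mode() -> &'static str {
    if cfg!(feature = "dynamic") {
        "dynamic"
    } else {
        "static"
    }
}
```

Printing `linking_mode()` at startup (for example with `eprintln!`) shows `dynamic` only for builds where the feature was enabled.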

Controlling --features dynamic with an environment variable

To make the feature selection automated based on environment variables, a suggestion from Matthias Beyer (thanks!) is to employ a build.rs script such as

fn main() {
    println!("cargo:rerun-if-env-changed=DYNAMIC");
    if std::env::var("DYNAMIC").is_ok() {
        println!("cargo:rustc-cfg=feature=\"dynamic\"");
    }
}

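which can then be set like DYNAMIC=1 cargo run or in a .env file or similar. Note that a cfg emitted from build.rs does not by itself enable Cargo's optional dependencies, so for this variant polars-dynamic presumably needs to be declared as a regular (non-optional) dependency.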
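One detail of the script above is that any value, including DYNAMIC=0, switches the feature on, because only the presence of the variable is checked. If that is not wanted, the decision can be factored into a small function. This is a variation on the suggestion, not the original script; the helper names and the set of values treated as "off" are choices made for this sketch.

```rust
/// Decide whether the dynamic build was requested.
/// Unset, empty, "0" and "false" (any case) count as "off".
fn dynamic_requested(value: Option<&str>) -> bool {
    match value.map(str::trim) {
        None | Some("") | Some("0") => false,
        Some(v) => !v.eq_ignore_ascii_case("false"),
    }
}

/// The lines a build script would print for a given value of DYNAMIC.
fn cargo_directives(value: Option<&str>) -> Vec<String> {
    let mut out = vec!["cargo:rerun-if-env-changed=DYNAMIC".to_string()];
    if dynamic_requested(value) {
        out.push("cargo:rustc-cfg=feature=\"dynamic\"".to_string());
    }
    out
}
```

In build.rs, `main` would then print each line of `cargo_directives(std::env::var("DYNAMIC").ok().as_deref())` with `println!`.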

Using it with hot-reloading

This can be used in combination with hot-reloading Rust code. In particular, having the ability to replace behavior on-the-fly but having to wait for a long time on the compilation process makes little sense. And indeed, several of the hot-reload examples make use of this approach.


  1. MacBook Pro M1 (MacBookPro18,3), 32GB RAM ↩︎

© Robert Krahn 2009-2022