Garage receives NLNet grant to work on reliability and performance


We are happy to announce that Garage has just received a new NLNet grant to fund work on reliability and performance. In this post, we explain our motivations for making this a priority and present the project roadmap for 2026.


If we look back a few years, Garage was designed in the context of the Deuxfleurs non-profit and tailored for its specific use case: deploying storage on low-power repurposed hardware over consumer-grade networks. This meant focusing on geo-distributed setups and optimizing for latency, while only assuming moderate amounts of storage, on the order of a few terabytes at most.

Nowadays, Garage is seeing increased adoption by users around the world in a wider variety of deployment scenarios. This accelerated recently when MinIO decided to make changes to their offerings, and people started looking for a replacement for local S3 storage. These new deployment scenarios are putting Garage outside of its original design space, somewhat stretching its current abilities, for instance in terms of the amount of data or objects stored. In many cases, Garage behaves relatively well in these new configurations; still, they have revealed issues around performance, reliability and operability.

To close that gap, last year we submitted a grant proposal via NLNet to the NGI Zero Commons Fund, and we are happy to announce that it was accepted!

At a high level, this grant will fund a bit more than 1 year of full-time work, dedicated to stress-testing and reliability engineering, internal database engine performance, proper queuing and backpressure mechanisms, end-to-end performance, documentation and observability.

2026 Project Roadmap

More specifically, our roadmap for this project is to work on the following milestones:

Stress testing and fuzzing of the metadata store. The goal here is to extend the existing test suite with fuzzing/property-testing techniques on the different layers of Garage's metadata store (K/V store, CRDT layer, S3 metadata operations...), checking consistency against a functional model. This should increase the reliability of this part of Garage by hunting for rarely occurring bugs that may only arise under high load.
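To give an idea of what model-based property testing looks like, here is a minimal, purely illustrative sketch in Python. It uses a toy last-write-wins (LWW) register standing in for the CRDTs in Garage's metadata layers (the class and function names are our own, not Garage's), merges random replicas in random orders, and checks the result against a trivial functional model:

```python
import random

# Illustrative toy CRDT: a last-write-wins register. Not Garage's actual
# data structure; names here are hypothetical.
class LwwRegister:
    def __init__(self, ts, value):
        self.ts, self.value = ts, value

    def merge(self, other):
        # The highest (timestamp, value) pair wins, so merges are deterministic.
        if (other.ts, other.value) > (self.ts, self.value):
            return LwwRegister(other.ts, other.value)
        return LwwRegister(self.ts, self.value)

    def __eq__(self, other):
        return (self.ts, self.value) == (other.ts, other.value)

def functional_model(regs):
    # Reference model: the merged state is simply the maximal (ts, value) pair.
    ts, value = max((r.ts, r.value) for r in regs)
    return LwwRegister(ts, value)

def fuzz_merge(trials=1000, seed=42):
    rng = random.Random(seed)
    for _ in range(trials):
        regs = [LwwRegister(rng.randint(0, 5), rng.randint(0, 5))
                for _ in range(rng.randint(1, 6))]
        # Merge replicas in a random order, then compare against the model:
        # any merge order must converge to the same state.
        shuffled = regs[:]
        rng.shuffle(shuffled)
        merged = shuffled[0]
        for r in shuffled[1:]:
            merged = merged.merge(r)
        assert merged == functional_model(regs)
    return True
```

The same shape of test, applied to the real K/V store and CRDT layers with a richer model and input generator, is what this milestone is about.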

Packaging of existing Jepsen testbenches in Nix. Garage's current testing infrastructure includes a whole-system testing harness based on Jepsen. We want to make it easier for people to run Jepsen tests by automating the provisioning of VMs and Jepsen jobs using Nix.

Electing a replacement for LMDB as metadata K/V store. LMDB is the K/V store used for metadata storage in the majority of Garage deployments, including the one at Deuxfleurs. However, LMDB suffers from a number of problems and limitations, the main one being its fragility in the face of machine crashes, which can lead to metadata corruption. We plan to study alternatives and elect a replacement K/V store with better properties. The options we are considering at this stage are SQLite, Fjall and RocksDB.

Fix handling of backpressure for writes. Backpressure mechanisms let the components of a system signal that they are overloaded and should not receive more work, so that the whole system behaves gracefully under pressure. Currently, Garage has a backpressure mechanism to handle high write workloads for data blocks, but it does not work well: it will block uploads even when a quorum of nodes could accept the upload right away. This is often reported in unusual deployments using an “asynchronous replication” mode (3, 3-dangerous and 2-dangerous), but it is a problem that applies to all deployments. There is no backpressure on metadata at all, which becomes a problem when metadata disks are slow or overloaded by I/O: queues (themselves persisted in the metadata store) fill up without bound, causing further I/O and making the problem worse. The goal, then, is to implement well-behaved backpressure mechanisms for both data blocks and metadata.
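The difference between the current behaviour and the intended one can be sketched in a few lines. This is an illustrative toy model, not Garage's actual code; the constants and function names are made up for the example. A strict policy blocks a write if any replica's queue is full, while a quorum-aware policy admits it as long as enough replicas have headroom:

```python
# Toy model of write admission under backpressure (illustrative only).
REPLICATION_FACTOR = 3   # hypothetical: each block has 3 replicas
WRITE_QUORUM = 2         # hypothetical: 2 acks are enough for a write
MAX_QUEUE_DEPTH = 4      # hypothetical per-node queue capacity

def strict_admit(queue_depths):
    # Current behaviour described above: block unless *all* replicas
    # have queue headroom, even when a healthy quorum exists.
    return all(d < MAX_QUEUE_DEPTH for d in queue_depths)

def quorum_admit(queue_depths):
    # Intended behaviour: admit the write as soon as a quorum of
    # replicas can accept it right away.
    return sum(d < MAX_QUEUE_DEPTH for d in queue_depths) >= WRITE_QUORUM

# One overloaded replica with a full queue, two idle ones:
depths = [MAX_QUEUE_DEPTH, 0, 0]
```

With these depths, `strict_admit(depths)` returns `False` while `quorum_admit(depths)` returns `True`: a single slow node no longer stalls uploads that a quorum could serve immediately.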

Garage performance with many small objects. The Garage deployment at Deuxfleurs mostly hosts medium-sized objects (images and webpages). However, users have also been using Garage to store large numbers of small objects, a use case which was not originally anticipated or optimized for. We plan to benchmark and optimize Garage’s performance for these use cases.

Garage performance with large objects. At Deuxfleurs, Garage is not used to store large amounts of data: the entire Deuxfleurs cluster stores less than 10TB. However, other users have reported storing very large amounts of data on their Garage deployments (>100TB, up to a few PB), made of large objects, and hitting subpar performance in these settings. Again, we plan to benchmark and optimize Garage’s performance with large objects.

Reliability of Garage deployments with “replication factor = 1”. Garage’s well-studied deployment scenario is to geo-distribute Garage across nodes with a replication factor of 3 (each piece of data is replicated in 3 places). However, users have also been deploying Garage with several nodes — sometimes using RAID on the nodes — but with a replication factor of 1, trading reliability for storage efficiency. This was not an originally intended use case, and we have little information on the reliability guarantees and limits of this kind of deployment. The goal of this task is to test those limits and document them.

Document recommended deployment scenarios. Generally speaking, Garage’s documentation does not currently contain much information on recommended deployment scenarios. The only officially recommended scenario so far is to geo-distribute Garage with replication factor = 3; but in practice many users deploy Garage on single nodes, or two nodes, or use replication factor = 1 — in those cases, users have little information on recommended settings, hardware configurations, etc. There are also few recommendations regarding metadata K/V stores and filesystems, or discussions of the tradeoffs involved. We plan to document a wider variety of deployment scenarios that match what people use in practice.

Better observability for resync and layout change operations. Garage needs to send data blocks between nodes, both as part of normal operations and in the context of a change in cluster layout. Such blocks are tracked in each node's “resync queue”, which currently does not prevent blocks from being duplicated in the queue. This causes two problems: 1) the queue is unnecessarily large, which causes write amplification in the metadata storage and thus unnecessary I/O; 2) when operating a Garage cluster, monitoring the size of the queue does not helpfully reflect the amount of remaining resync work. Additionally, when removing a node from the layout, blocks need to be moved between nodes (using the resync queue), but there is nothing to indicate when this process is done and the node can be safely removed from the cluster. Our plan is to improve the handling of the resync queue and better expose its status to cluster operators.
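As a minimal sketch of the deduplication idea (not Garage's actual implementation; the class and its API are hypothetical), a side set of pending block hashes can reject re-enqueues, so that the queue's length directly reflects the remaining work:

```python
from collections import deque

# Hypothetical duplicate-free resync queue (illustrative only).
# A side set of pending hashes rejects re-enqueues, so len(queue)
# tracks the actual amount of remaining resync work.
class ResyncQueue:
    def __init__(self):
        self._order = deque()   # FIFO order of blocks to resync
        self._pending = set()   # hashes currently in the queue

    def push(self, block_hash):
        if block_hash in self._pending:
            return False  # already queued: skip the duplicate
        self._pending.add(block_hash)
        self._order.append(block_hash)
        return True

    def pop(self):
        block_hash = self._order.popleft()
        self._pending.remove(block_hash)
        return block_hash

    def __len__(self):
        return len(self._order)
```

Pushing the same block hash twice leaves the queue with a single entry, so an operator watching the queue length sees the real backlog instead of an inflated one.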

Explore S3 bucket policies for future implementation. Bucket policies are a feature of the S3 API for access control. These are not currently supported by Garage, but have been requested in past surveys, and are useful in multi-tenant deployment scenarios more complex than the one at Deuxfleurs. We plan to do a preliminary exploration of the design space for bucket policies, with the goal of producing a design document and a roadmap for implementing this in future work.
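To make the feature concrete, here is an exploratory sketch of how S3-style policy evaluation works, written in Python; it is not a committed Garage design. The statement fields mirror the AWS policy document shape, and the evaluation rules (implicit deny by default, explicit Deny overriding Allow) follow the standard S3 semantics:

```python
import fnmatch

# Exploratory sketch of S3-style bucket policy evaluation (not Garage code).
# Standard S3 semantics: default is an implicit deny; a matching Allow
# grants access; a matching explicit Deny always wins.
def evaluate(statements, principal, action, resource):
    decision = "Deny"  # implicit deny unless a statement allows
    for st in statements:
        if principal not in st["Principal"]:
            continue
        if not any(fnmatch.fnmatch(action, p) for p in st["Action"]):
            continue
        if not any(fnmatch.fnmatch(resource, p) for p in st["Resource"]):
            continue
        if st["Effect"] == "Deny":
            return "Deny"  # explicit deny short-circuits everything
        decision = "Allow"
    return decision

# Example policy: alice may read the "photos" bucket, except its
# private/ prefix.
policy = [
    {"Effect": "Allow", "Principal": ["alice"],
     "Action": ["s3:GetObject"], "Resource": ["arn:aws:s3:::photos/*"]},
    {"Effect": "Deny", "Principal": ["alice"],
     "Action": ["s3:*"], "Resource": ["arn:aws:s3:::photos/private/*"]},
]
```

The design work funded here will have to answer the harder questions this sketch glosses over: how policies interact with Garage's existing key-based permissions, which condition keys to support, and how to store and replicate the policy documents themselves.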

Closing thoughts

Work on these roadmap items will start in the coming months. Stay tuned for future developments as the team picks up steam!

We know that there are still many features missing in Garage, but we cannot get there without solid foundations, and we feel that spending this year focusing on making Garage scale is our best bet to move forward.

As always, if you have questions regarding Garage, feel free to email us at garagehq@deuxfleurs.fr or join the Garage Matrix channel!