Figma's In-House Redis Proxy: Achieving Six Nines Uptime (2026)

Figma's in-house Redis proxy service, FigCache, has reached six nines (99.9999%) uptime, a significant milestone for the company's data caching platform. The result followed a detailed rearchitecture and reflects Figma's focus on reliability and scalability. The story behind FigCache illustrates both the challenges of running Redis at scale and the kind of solutions those challenges can produce.
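To make "six nines" concrete, the short Python calculation below converts an availability target into a downtime budget. It assumes a 365.25-day year and treats availability as a simple fraction of wall-clock time; the source does not say how Figma actually measures uptime.

```python
# Downtime budget implied by an availability target of N nines.
# Assumption: availability is measured as a fraction of wall-clock time.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds


def downtime_budget_seconds(nines: int, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the allowed downtime over `period_seconds` for an N-nines target."""
    unavailability = 10.0 ** -nines  # six nines -> 0.000001
    return period_seconds * unavailability


for n in (3, 4, 5, 6):
    print(f"{n} nines: {downtime_budget_seconds(n):,.1f} s/year")
```

At six nines the budget is roughly 31.6 seconds of downtime per year, or about 2.6 seconds per 30-day month.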

A Growing Threat to Site Availability

Figma's Redis platform had scalability and reliability gaps that threatened site availability. Rapid scale-ups of client services produced thundering herds of connection requests that saturated Redis I/O. A sprawl of independent client libraries, each with its own observability behavior, made incidents harder to diagnose. Early workarounds, such as custom client-side connection pooling, only masked the underlying structural problems.
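A back-of-the-envelope Python sketch shows why client scale-ups turn into connection storms. The fleet and pool sizes are hypothetical, and the proxied case assumes a proxy that multiplexes client requests over a fixed-size upstream pool; the source does not describe FigCache's pooling in these terms.

```python
from dataclasses import dataclass


@dataclass
class Fleet:
    instances: int           # replicas of one client service (hypothetical)
    conns_per_instance: int  # Redis connections each replica opens directly


def direct_redis_connections(fleets: list[Fleet]) -> int:
    """Without a proxy, every client connection terminates on Redis."""
    return sum(f.instances * f.conns_per_instance for f in fleets)


def proxied_redis_connections(proxies: int, upstream_pool_size: int) -> int:
    """Assumed pooling proxy: Redis sees only the proxies' upstream pools,
    however many clients connect to the proxies."""
    return proxies * upstream_pool_size


# Hypothetical numbers: one service scales from 200 to 1,000 replicas.
before = [Fleet(instances=200, conns_per_instance=16)]
after = [Fleet(instances=1000, conns_per_instance=16)]
print("direct, before:", direct_redis_connections(before))              # 3200
print("direct, after: ", direct_redis_connections(after))               # 16000
print("proxied (20 proxies x 32):", proxied_redis_connections(20, 32))  # 640
```

In the direct case the scale-up adds 12,800 connections that all try to open at roughly the same time; in the proxied case the new replicas connect to the proxies, and the number of connections Redis sees stays the same.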

Building a Proxy for Control and Flexibility

Figma built FigCache in-house because existing open-source proxies fell short: they lacked the semantic awareness needed to enforce runtime guardrails and to serve a fragmented client base. Owning the proxy layer gave Figma the flexibility to support different client variants transparently, including by emulating Redis Cluster.
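Emulating Redis Cluster means a proxy has to map keys to hash slots exactly as Redis Cluster does. The Python sketch below implements the documented Redis Cluster mapping (CRC16-XMODEM of the key, modulo 16384, honoring `{hash tags}`). It illustrates the standard algorithm, not FigCache's code.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0): the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag counts; with "{}" the whole key is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


print(key_slot(b"foo"))  # 12182
print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))  # True
```

Keys that share a hash tag land in the same slot, which is how clients keep multi-key commands from spanning slots. A proxy that emulates Cluster for its clients would need this same mapping to answer slot-related questions consistently.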

Extending the Backend with Composable Logic

FigCache's backend is configured by a Starlark program that is evaluated at runtime and renders a Protobuf-structured configuration. Operators can therefore change routing logic, key-prefix-based rejection rules, and command-type splitting without redeploying server binaries. The backend is structured as a composable tree of engines, a design that lets the system absorb future requirements without disrupting existing functionality.
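The Starlark-to-Protobuf pipeline is specific to Figma, but the idea of a composable engine tree can be sketched in plain Python. In this sketch a nested dict stands in for the rendered Protobuf configuration, and the engine types (`Backend`, `RejectPrefix`, `SplitByCommand`), their fields, and the read/write interpretation of "command-type splitting" are all hypothetical, not FigCache's schema.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Command:
    """Hypothetical command representation: a name plus a single key."""
    name: str
    key: str


class Engine(Protocol):
    def handle(self, cmd: Command) -> str: ...


@dataclass
class Backend:
    """Leaf engine standing in for a Redis cluster; it just reports where the command went."""
    cluster: str

    def handle(self, cmd: Command) -> str:
        return f"{self.cluster}:{cmd.name} {cmd.key}"


@dataclass
class RejectPrefix:
    """Rejects commands whose key starts with a blocked prefix; otherwise delegates."""
    prefixes: tuple[str, ...]
    child: Engine

    def handle(self, cmd: Command) -> str:
        if cmd.key.startswith(self.prefixes):
            return "ERR key prefix rejected by proxy"
        return self.child.handle(cmd)


@dataclass
class SplitByCommand:
    """Sends listed (read) commands to one subtree and everything else to another."""
    reads: frozenset[str]
    read_child: Engine
    write_child: Engine

    def handle(self, cmd: Command) -> str:
        target = self.read_child if cmd.name.upper() in self.reads else self.write_child
        return target.handle(cmd)


def build(node: dict) -> Engine:
    """Recursively turn a nested config dict into an engine tree."""
    kind = node["type"]
    if kind == "backend":
        return Backend(node["cluster"])
    if kind == "reject_prefix":
        return RejectPrefix(tuple(node["prefixes"]), build(node["child"]))
    if kind == "split_by_command":
        return SplitByCommand(frozenset(node["reads"]), build(node["read"]), build(node["write"]))
    raise ValueError(f"unknown engine type: {kind}")


# Hypothetical rendered config: reject a legacy prefix, then split reads from writes.
config = {
    "type": "reject_prefix",
    "prefixes": ["legacy:"],
    "child": {
        "type": "split_by_command",
        "reads": ["GET", "MGET"],
        "read": {"type": "backend", "cluster": "replica-pool"},
        "write": {"type": "backend", "cluster": "primary"},
    },
}
tree = build(config)
print(tree.handle(Command("GET", "user:42")))   # replica-pool:GET user:42
print(tree.handle(Command("SET", "user:42")))   # primary:SET user:42
print(tree.handle(Command("GET", "legacy:x")))  # ERR key prefix rejected by proxy
```

Because each engine only knows its children, a new behavior becomes a new node type in `build`, and changing the tree is a configuration change rather than a code change to existing engines.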

Handling Complex Scenarios with Fanout Filters

FigCache's fanout filter engine intercepts pipelines whose keys span multiple shards and executes them internally as a parallel scatter-gather. As a result, errors such as CROSSSLOT, which Redis Cluster returns when a single multi-key command references keys in different hash slots, are handled inside the proxy and never reach the application. That degree of control over error handling is crucial for maintaining high availability.
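A minimal Python sketch of the scatter-gather idea: group a pipeline's commands by owning shard, run each shard's sub-pipeline in parallel, then reassemble the replies in the caller's original order. `FakeShard`, `shard_for`, and the CRC32-based routing are hypothetical stand-ins (real Redis Cluster routing uses CRC16 hash slots), and a production proxy would also have to handle partial shard failures and split multi-key commands.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

NUM_SHARDS = 3  # hypothetical shard count

# A pipeline entry is (command, key, value); value is None for GET.
Cmd = tuple[str, str, Optional[str]]


def shard_for(key: str) -> int:
    """Hypothetical routing stand-in; real Redis Cluster uses CRC16 hash slots."""
    return zlib.crc32(key.encode()) % NUM_SHARDS


class FakeShard:
    """In-memory stand-in for one shard that executes a sub-pipeline in order."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def run_pipeline(self, cmds: list[Cmd]) -> list[Optional[str]]:
        replies: list[Optional[str]] = []
        for op, key, value in cmds:
            if op == "SET" and value is not None:
                self.data[key] = value
                replies.append("OK")
            elif op == "GET":
                replies.append(self.data.get(key))
            else:
                replies.append(f"ERR unsupported command {op}")
        return replies


def fanout_pipeline(shards: list[FakeShard], pipeline: list[Cmd]) -> list[Optional[str]]:
    # Scatter: group command positions by owning shard, preserving order within each shard.
    groups: dict[int, list[int]] = {}
    for i, (_, key, _) in enumerate(pipeline):
        groups.setdefault(shard_for(key), []).append(i)
    replies: list[Optional[str]] = [None] * len(pipeline)
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {
            shard_id: pool.submit(shards[shard_id].run_pipeline, [pipeline[i] for i in idxs])
            for shard_id, idxs in groups.items()
        }
        # Gather: put each shard's replies back at the caller's original positions.
        for shard_id, future in futures.items():
            for i, reply in zip(groups[shard_id], future.result()):
                replies[i] = reply
    return replies


shards = [FakeShard() for _ in range(NUM_SHARDS)]
pipeline: list[Cmd] = [
    ("SET", "a", "1"), ("SET", "b", "2"),
    ("GET", "a", None), ("GET", "b", None), ("GET", "missing", None),
]
print(fanout_pipeline(shards, pipeline))  # ['OK', 'OK', '1', '2', None]
```

Ordering within a shard is preserved, so a `SET` followed by a `GET` on the same key still sees the write, while the caller receives one reply list in its own order as if a single server had answered.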

A Reversible Migration Strategy

The migration to FigCache was designed to be reversible. Traffic moved one service at a time, and feature flags allowed instant rollback without code changes or binary deployments. Large workloads were shifted incrementally by domain to keep the transition smooth. Extensive benchmarking, including weekly distributed stress tests, verified that the system could handle peak load.
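The source does not describe how Figma's flags are implemented. The Python sketch below shows one common way to get per-service, incremental, instantly reversible shifts: a percentage flag per service combined with deterministic hashing of a routing key (here a domain string, standing in for the domain-based shifts). `ROLLOUT_PERCENT`, `bucket`, and `use_proxy` are hypothetical; in practice the percentages would live in a dynamic flag service rather than a module-level dict.

```python
import hashlib

# Hypothetical flag store: percent of each service's traffic routed through the proxy.
ROLLOUT_PERCENT: dict[str, int] = {"search": 100, "billing": 25, "realtime": 0}


def bucket(routing_key: str) -> int:
    """Deterministically map a routing key (e.g. a domain) to a bucket in [0, 100)."""
    digest = hashlib.sha256(routing_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_proxy(service: str, routing_key: str) -> bool:
    """True if this key's bucket falls under the service's rollout percentage."""
    percent = ROLLOUT_PERCENT.get(service, 0)  # unknown services stay on direct Redis
    return bucket(routing_key) < percent


print(use_proxy("search", "example.com"))   # True: search is fully shifted
ROLLOUT_PERCENT["billing"] = 0              # instant revert: a flag change, no deploy
print(use_proxy("billing", "example.com"))  # False
```

Because buckets are deterministic, raising a percentage only adds keys to the proxied set, and setting it back to 0 returns all of that service's traffic to the direct path.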

Parallels in the Industry

Figma's approach to Redis caching has parallels at other companies. Lastminute.com rearchitected a search aggregation system to use Redis as an intermediary result store, decoupling supplier search drivers from the aggregation service via RabbitMQ. The goals were similar: reduce coupling, improve scalability, and isolate components from one another's failure modes.
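The Lastminute.com pattern can be illustrated with in-process stand-ins: `queue.Queue` instances play the role of RabbitMQ queues bound to a fanout exchange, and a dict of lists plays the role of Redis lists keyed by search ID. The supplier names, key layout, and one-message-per-driver loop are hypothetical simplifications of the architecture described, not Lastminute.com's code.

```python
import queue
import threading
from collections import defaultdict

SUPPLIERS = ["airline-a", "airline-b", "hotel-c"]  # hypothetical supplier search drivers

# One queue per driver, mimicking RabbitMQ queues bound to a fanout exchange.
supplier_queues: dict[str, queue.Queue] = {s: queue.Queue() for s in SUPPLIERS}
# Stand-in for Redis lists keyed by search ID (hypothetically, RPUSH search:<id> <result>).
result_store: dict[str, list[str]] = defaultdict(list)
store_lock = threading.Lock()


def publish_search(search_id: str, query: str) -> None:
    """Aggregation side: announce a search without knowing which drivers exist."""
    for q in supplier_queues.values():
        q.put((search_id, query))


def supplier_driver(supplier: str) -> None:
    """Driver side: handle one request and write its result to the shared store."""
    search_id, query = supplier_queues[supplier].get()
    with store_lock:  # RPUSH is atomic in Redis; the lock plays that role here
        result_store[search_id].append(f"{supplier} offer for {query}")


def collect_results(search_id: str) -> list[str]:
    """Aggregation side: read whatever results have landed, never calling drivers directly."""
    with store_lock:
        return sorted(result_store[search_id])


publish_search("s1", "LON-NYC")
workers = [threading.Thread(target=supplier_driver, args=(s,)) for s in SUPPLIERS]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(collect_results("s1"))
```

The aggregator only ever touches the queues and the result store, so a slow or failed driver simply contributes fewer results instead of blocking or crashing the aggregation service.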

The Wider Redis Ecosystem and Future Options

The Redis ecosystem has also changed: Redis returned to open-source licensing under AGPLv3 after a year of controversy over its switch to source-available licenses. Redis 8.0, released alongside the licensing change, brings performance improvements. In that context, Figma's decision to build an abstraction layer that can swap out backend storage systems looks prudent, since FigCache is designed to support alternative backends, including AWS MemoryDB and Figma's own Postgres stack.
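One way to keep backends swappable is to have the proxy depend only on a narrow storage interface. The Python sketch below assumes a hypothetical two-method `CacheBackend` protocol and a `PrefixRouter` that assigns keyspaces to backends; adapters for Redis, MemoryDB, or Postgres would implement the same methods but are not shown, and nothing here reflects FigCache's actual interfaces.

```python
from typing import Optional, Protocol


class CacheBackend(Protocol):
    """Hypothetical storage interface the proxy's leaf engines would call."""
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...


class InMemoryBackend:
    """Dict-backed stand-in; real adapters would implement the same two methods."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class PrefixRouter:
    """Sends each key to a backend by longest matching prefix, so moving a
    keyspace to a different store is a routing change, not a client change."""

    def __init__(self, default: CacheBackend) -> None:
        self._default = default
        self._routes: dict[str, CacheBackend] = {}

    def route(self, prefix: str, backend: CacheBackend) -> None:
        self._routes[prefix] = backend

    def _pick(self, key: str) -> CacheBackend:
        # Longest matching prefix wins, so specific routes override broader ones.
        matches = [p for p in self._routes if key.startswith(p)]
        return self._routes[max(matches, key=len)] if matches else self._default

    def get(self, key: str) -> Optional[bytes]:
        return self._pick(key).get(key)

    def set(self, key: str, value: bytes) -> None:
        self._pick(key).set(key, value)


redis_like = InMemoryBackend()  # stand-in for the current Redis backend
durable = InMemoryBackend()     # stand-in for an alternative store
router = PrefixRouter(default=redis_like)
router.route("session:", durable)  # move one keyspace; callers are unchanged
router.set("session:1", b"x")
router.set("user:1", b"y")
```

Callers only ever see `router.get` and `router.set`, which is what makes a backend swap invisible to them.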

The Choice Between Building and Buying

Whether to build or buy infrastructure is a common dilemma for engineering teams. Sneha Wasankar argues that the choice among cache-aside, write-through, and write-behind patterns often matters less than the reliability of the underlying infrastructure. Figma's post makes a related point: at sufficient scale, the infrastructure itself becomes the product, as FigCache's record of eliminating high-severity incidents and improving observability shows.
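For readers unfamiliar with the three patterns Wasankar compares, here is a minimal Python sketch that uses a dict as the cache and a hypothetical `Store` class as the system of record. It deliberately omits TTLs, concurrency, and failure handling, which is exactly where the infrastructure reliability discussed above comes into play.

```python
from collections import deque
from typing import Optional


class Store:
    """Hypothetical system of record (e.g. a database)."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        return self.rows.get(key)

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value


def cache_aside_get(cache: dict, store: Store, key: str) -> Optional[str]:
    """Cache-aside read: check the cache; on a miss, load from the store and populate the cache."""
    if key in cache:
        return cache[key]
    value = store.read(key)
    if value is not None:
        cache[key] = value
    return value


def cache_aside_put(cache: dict, store: Store, key: str, value: str) -> None:
    """Cache-aside write: update the store, then invalidate the cached entry."""
    store.write(key, value)
    cache.pop(key, None)


def write_through(cache: dict, store: Store, key: str, value: str) -> None:
    """Write-through: update the store and the cache together on every write."""
    store.write(key, value)
    cache[key] = value


class WriteBehind:
    """Write-behind: update the cache immediately and queue store writes for a later flush."""

    def __init__(self, cache: dict, store: Store) -> None:
        self.cache, self.store = cache, store
        self.pending: deque = deque()

    def write(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self) -> None:
        while self.pending:
            key, value = self.pending.popleft()
            self.store.write(key, value)
```

The trade-off shows up directly in the code: write-behind acknowledges writes before the store has them, so anything still in `pending` is lost if the cache node fails before `flush` runs.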

A Hard-Won Success Story

FigCache's success rests on careful planning, a composable design, and a reversible migration strategy. The system handles complex scenarios such as multi-shard pipelines, maintains high availability, and has improved observability. Whether the approach generalizes beyond Figma's specific context remains an open question, but FigCache is a useful example of how the challenges of running a large-scale data caching platform can lead to well-engineered solutions.

Author: Sen. Ignacio Ratke
