Forging Resilience at 250K RPS: Go Context Lessons from Debugging a Cascading Failure

Extreme scale engineering

Discover the latest trends and best practices impacting data-intensive applications. Register for access to all 60+ sessions available on demand.

Paweł Obrępalski, Prajakta Mundale | 17 minutes

In this Monster Scale Summit Presentation

At ShareChat's scale, a seemingly simple incident produced a 100% client-side error rate while the server side reported only 5% failures, exposing dangerous gaps in observability and failure handling. This talk walks through the debugging journey that uncovered misaligned timeouts, inconsistent circuit breaking, and service mesh interactions, then covers the fixes applied and practical techniques for improving resilience in distributed systems.

Paweł Obrępalski, Staff Engineer, ShareChat

Paweł Obrępalski is a Staff Engineer at ShareChat, passionate about performance and simplicity.

Prajakta Mundale, Software Engineer 2, ShareChat

Hi, I’m Prajakta, SDE-2 at ShareChat. I work mainly on performance optimization and engineering improvements that make microservices resilient and cost-efficient.