Monday, March 17, 2014

Apache ZooKeeper Resilience at Pinterest

Apache ZooKeeper is an open source distributed coordination service that’s popular for use cases like service discovery, dynamic configuration management and distributed locking. While it’s versatile and useful, it has failure modes that can be hard to prepare for and recover from, and when it backs site-critical functionality, those failures can have a significant impact on site availability.

It’s important to structure the usage of ZooKeeper in a way that prevents outages and data loss, so it doesn’t become a single point of failure (SPoF). Here, you’ll learn how Pinterest uses ZooKeeper, the problems we’ve dealt with, and a creative solution that lets us benefit from ZooKeeper in a fault-tolerant and highly resilient manner.

Read more here

