Monday, October 5, 2015

Lessons learned writing highly available code

After more than two years at Imgur, I’ve had to learn a lot about the principles behind writing highly-available (but not AP) fault-resilient systems. While occasionally some systems go down, it’s the times that I wake up in the morning and come in to work only to realize that overnight, a safeguard we put in place automatically triggered, or the system caught an error and successfully recovered, that I am thankful for some good design principles. Here are a few of those things I’ve noticed in particular:

  1. Put limits on everything 
  2. Retry, but with exponential back-off 
  3. Use supervisors and watchdog processes 
  4. Add health checks, and use them to re-route requests or automate rollbacks 
  5. Redundancy is more than just nice-to-have, it’s a requirement 
  6. Prefer battle-tested tools over the “new hotness”

read more here

Leave a Reply

All Tech News IN © 2011 & Main Blogger .