Friday, April 10, 2015

Who moved my 99th percentile latency?

Long-tail latencies affect members every day, and improving the response times of our systems, even at the 99th percentile, is critical to the member experience. Long-tail latency has many possible causes, including slow applications, slow disk accesses, and network errors. One root cause we encountered was microbursting traffic, which cannot easily be solved by the "hedge your bets" strategy, i.e., sending the same request to multiple servers in the hope that at least one of them will not be hit by long-tail latency. In the following post we share our methodology for root-causing long-tail latencies, along with our experiences and lessons learned.
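To make the hedging idea concrete, here is a minimal Python sketch. It is not LinkedIn's implementation. `fetch_from` is a hypothetical stand-in for a network call that sleeps for a simulated delay, and `hedged_request` is an illustrative helper. It sends the same request to several servers at once and returns whichever response arrives first.

```python
import concurrent.futures
import time


def fetch_from(server, delay):
    """Hypothetical stand-in for a network call: sleep `delay` seconds, then answer."""
    time.sleep(delay)
    return f"response from {server}"


def hedged_request(servers):
    """Send the same request to every server and return the first response to arrive.

    `servers` maps a (hypothetical) server name to its simulated delay in seconds.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(servers))
    try:
        futures = [pool.submit(fetch_from, name, delay) for name, delay in servers.items()]
        # Block only until the fastest duplicate finishes.
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED
        )
        return next(iter(done)).result()
    finally:
        # Don't wait for the slower duplicates; cancel any that have not started yet.
        pool.shutdown(wait=False, cancel_futures=True)


if __name__ == "__main__":
    # Server "b" answers fastest, so its response wins even though "a" is slow.
    print(hedged_request({"a": 0.5, "b": 0.05, "c": 0.3}))
```

The strategy only pays off when at least one copy of the request avoids the slowdown. If whatever delays one request also delays every duplicate, taking the first response buys nothing. That is the case the post says microbursting traffic can create.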

