US East 1 endpoint API disruption -- details
Feb 20 at 07:01pm CST
Yesterday, between 7:45pm and 8:15pm UTC, we observed increased API latency and elevated error rates in the identification service running in the US East 1 AWS region.
Our engineers investigated the infrastructure and discovered a bug in the rate-limiting library we use: it allowed spikes of traffic through, which overloaded one of the production clusters.
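For illustration only: the report does not name the library or the exact bug, but a common failure in this class is a fixed-window rate limiter that resets its counter abruptly at window boundaries. Two bursts straddling a boundary can then pass up to twice the configured limit, exactly the kind of spike that can overload a cluster. The class and numbers below are hypothetical:

```python
class FixedWindowLimiter:
    """Naive fixed-window rate limiter (hypothetical example).

    Because the counter resets at each clock-aligned window boundary,
    a burst late in one window plus a burst early in the next can
    pass up to 2x the intended limit in a very short span.
    """

    def __init__(self, limit: int, window_seconds: int = 1):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window)
        if window != self.current_window:
            # Counter resets abruptly at the boundary -- the flaw.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowLimiter(limit=100)
# 100 requests at t=0.99s and 100 more at t=1.01s: all 200 pass,
# even though the intended ceiling is 100 requests per second.
allowed = sum(limiter.allow(0.99) for _ in range(100))
allowed += sum(limiter.allow(1.01) for _ in range(100))
```

Sliding-window or token-bucket limiters avoid this boundary effect by spreading the budget continuously over time rather than resetting it all at once.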
A large portion of the requests to this cluster failed or took longer to complete during this period of degraded performance.
Our engineering team identified the root cause and has deployed changes to prevent this situation from happening again.
We apologize for the inconvenience and will improve our load-testing process to cover this scenario.