Hello,
Beginning around 11:20 UTC today, Cloudflare experienced widespread degradation of traffic across our network. We deeply apologize for this disruption. We have published a blog post detailing exactly what happened. The majority of the impact was resolved by 14:30 UTC, and all remaining impacted downstream services were fully operational by 17:06 UTC.
In short, this incident was caused by a latent bug in how our bot management service handled one of its configuration files. There is no evidence of an attack or malicious activity causing the issue.
Bot management detection sits in the critical path for traffic flow: the bot management module loads its configuration file for every request that hits our network. Due to the bug, that configuration file grew beyond its expected number of entries, which triggered a crash in the software system that handles traffic for a number of Cloudflare's services. As a result, customers experienced HTTP 500 errors as many of those services suffered an outage.
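To illustrate the failure mode in general terms, here is a minimal, hypothetical sketch in Rust of how a hard cap on configuration entries can take down the very process that enforces it. The names, the cap value, and the file format are all assumptions chosen for illustration; this is not Cloudflare's actual code.

```rust
// Hypothetical sketch of the failure mode: a loader with a hard cap on
// configuration entries, combined with code that treats the over-limit
// error as impossible and unwraps it, crashing the request-handling
// process. All names and the cap value are illustrative assumptions.

const MAX_FEATURES: usize = 200; // assumed preallocated limit

#[derive(Debug)]
struct BotConfig {
    features: Vec<String>,
}

fn load_config(lines: &[&str]) -> Result<BotConfig, String> {
    if lines.len() > MAX_FEATURES {
        return Err(format!(
            "config has {} entries, exceeding the limit of {}",
            lines.len(),
            MAX_FEATURES
        ));
    }
    Ok(BotConfig {
        features: lines.iter().map(|s| s.to_string()).collect(),
    })
}

fn main() {
    // Simulate a configuration file that has grown past the limit.
    let oversized: Vec<String> = (0..201).map(|i| format!("feature_{i}")).collect();
    let refs: Vec<&str> = oversized.iter().map(String::as_str).collect();

    // The latent bug: the error path was never expected to be hit,
    // so the result is unwrapped and the whole process panics here.
    let config = load_config(&refs).unwrap();
    println!("loaded {} features", config.features.len());
}
```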
The Cloudflare team diagnosed the issue and reverted to an earlier version of the bot detection module's configuration that was below the limit. We also modified the core proxy service so that it will not fail in this way if the limit is reached again.
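The shape of that mitigation can be sketched in the same hypothetical terms: rather than treating an oversized file as impossible, the proxy rejects it and keeps serving traffic with its last known-good configuration. This reuses the illustrative `load_config` and `BotConfig` definitions from the sketch above and is an assumption about the general approach, not the actual implementation.

```rust
// Hypothetical mitigation: fall back to the last known-good
// configuration instead of panicking when a new file exceeds the limit.
fn load_or_keep_previous(new_lines: &[&str], previous: BotConfig) -> BotConfig {
    match load_config(new_lines) {
        Ok(cfg) => cfg,
        Err(e) => {
            // Log the rejection and continue serving traffic with the
            // previous configuration rather than crashing the proxy.
            eprintln!("rejecting oversized config: {e}");
            previous
        }
    }
}
```

Failing over to the previous configuration keeps traffic flowing while the oversized file is investigated, which is the property the modified proxy behavior is meant to guarantee.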
For a more detailed description and timeline of the incident, and the steps we took to address it, please review our blog post.
Sincerely,
Cloudflare Team