Dec 4, 2025, Taobao, Alipay, and Xianyu all cratered. Users got charged while orders still showed “unpaid,” a carbon copy of the 2024 Double-11 fiasco.
A ClickHouse permission tweak doubled a feature file, tripped a Rust hard limit, and froze Cloudflare’s core traffic for six hours—their worst outage since 2019. Here’s the full translation plus commentary.
us-east-1’s DNS control plane faceplanted for 15 hours and dragged 142 AWS services—and a good chunk of the public internet—down with it. Here’s the forensic tour.
Seven days after Singapore Zone C failure, availability not even reaching 8, let alone multiple 9s. But compared to data loss, availability is just a minor issue
NetEase Cloud Music experienced a two-and-a-half-hour outage this afternoon. Based on circulating online clues, we can deduce that the real cause behind this incident was…
Due to an “unprecedented configuration error,” Google Cloud mistakenly deleted trillion-RMB fund giant UniSuper’s entire cloud account, cloud environment and all off-site backups, setting a new record in cloud computing history!
Tencent Cloud’s epic global outage after Double 11 set industry records. How should we evaluate and view this failure, and what lessons can we learn from it?
Alibaba-Cloud and Didi had major outages one after another. This article discusses how to move from cost-reduction jokes to real cost reduction and efficiency — what costs should we really reduce, what efficiency should we improve?
Alibaba-Cloud’s epic global outage after Double 11 set an industry record. How should we evaluate this incident, and what lessons can we learn from it?