Cloud

Posts in 2024
  • OpenAI Global Outage Postmortem: K8S Circular Dependencies

    December 14, 2024 in Cloud

    Even trillion-dollar unicorns can be a house of cards when operating outside their core expertise.

  • WordPress Community Civil War: On Community Boundary Demarcation

    October 17, 2024 in Cloud

    When open source ideals collide with commercial interests, what can this conflict between open source software communities and cloud vendors teach us? On the importance of drawing community boundaries.

  • Cloud Database: Michelin Prices for Cafeteria Pre-made Meals

    October 06, 2024 in Cloud

    The paradigm shift brought by RDS, and whether cloud databases are overpriced cafeteria meals: an analysis of quality, security, efficiency, and cost, plus how to exit the cloud and self-host databases in practice.

  • Alibaba-Cloud: High Availability Disaster Recovery Myth Shattered

    September 17, 2024 in Cloud

    Seven days after the Singapore Zone C failure, availability has not even reached a single 8, let alone multiple 9s. But compared with data loss, availability is only a minor issue.

  • Amateur Hour Opera: Alibaba-Cloud PostgreSQL Disaster Chronicle

    August 19, 2024 in Cloud

    Last week a customer experienced an outrageous cascade of failures on a cloud database: after a simple memory expansion, a high-availability PG RDS cluster went down completely, primary and replica alike, and troubleshooting ran until dawn. Poor recommendations abounded during the incident, and the postmortem was equally perfunctory. I share this case study here for reference and review.

  • What Can We Learn from NetEase Cloud Music's Outage?

    August 18, 2024 in Cloud

    NetEase Cloud Music experienced a two-and-a-half-hour outage this afternoon. Based on circulating online clues, we can deduce that the real cause behind this incident was...

  • Blue Screen Friday: Amateur Hour on Both Sides

    July 23, 2024 in Cloud

    Both the client and the vendor failed to control the blast radius, leading to an epic global security incident that will greatly benefit the local-first software philosophy.

  • How Ahrefs Saved US$400M by NOT Going to the Cloud

    May 22, 2024 in Cloud

    After Alibaba-Cloud's epic global outage on Double 11, setting industry records, how should we evaluate this incident and what lessons can we learn from it?

  • Database Deletion Supreme - Google Cloud Nuked a Major Fund's Entire Cloud Account

    May 11, 2024 in Cloud

    Due to an "unprecedented configuration error," Google Cloud mistakenly deleted trillion-RMB fund giant **UniSuper**'s entire cloud account, cloud environment and all off-site backups, setting a new record in cloud computing history!

  • Cloud Dark Forest: Exploding Cloud Bills with Just S3 Bucket Names

    April 30, 2024 in Cloud

    The dark forest law has emerged on public cloud: **Anyone who knows your S3 object storage bucket name can explode your cloud bill.**
