
Core dna Chronicles: How Core dna Stayed Online During the $650M AWS Outage

AWS OUTAGE DISASTER RECOVERY PLAN
Dmitry Kruglov
November 01, 2025 - (8 min read)

Platform Strategies

The real cost of the infrastructure failure wasn’t just the estimated $650 million in losses; it was the trust that evaporated in minutes. The biggest Amazon Web Services (AWS) outage of the decade took much of the internet down with it.

One of AWS’s most repeated best practices for high availability has always been to distribute workloads across multiple availability zones within the same region. The October 2025 outage proved how fragile that assumption really is.

The 14-hour failure in AWS’s US-EAST-1 region crippled thousands of websites and services worldwide: Shopify stores froze mid-transaction, and Snapchat, Fortnite, and Reddit all ground to a halt.

So what happened? What began as a DNS configuration error affecting DynamoDB, one of AWS’s core database services, quickly spiraled into one of the most significant cloud outages in history. More than 150 interconnected AWS services were affected.

For most platforms, recovery took more than half a day. Core dna clients were back online in under 30 minutes thanks to a platform architecture designed and tested for moments exactly like this.

Key Takeaways

  • Don't settle for "multiple availability zones": require your platform to run production environments across different geographic regions.
  • Ask your provider how long recovery takes during AWS outages: 30 minutes vs. 14 hours is the difference between minimal and catastrophic revenue loss.
  • If your platform hasn't actually executed its disaster recovery procedures, you're at risk when real outages hit.
  • Choose providers that detect issues before cloud providers acknowledge them; early detection means faster response.
  • Your checkout, payments, and login must stay operational even when background systems go down; confirm your platform separates these concerns.


    The Cascading Failure That Broke the Internet

    The AWS outage revealed a fundamental vulnerability in modern cloud infrastructure: internal dependencies. When DynamoDB's DNS configuration failed, it didn't just take down one service; it created a domino effect across AWS's ecosystem.

    Simple Queue Service (SQS), Lambda functions, EC2 instance launches, and eventually more than 150 services in total joined the casualty list. The problem?

    Most companies had followed AWS's own best practices, architecting their systems across multiple availability zones within the same region. They believed this provided redundancy. The outage proved otherwise: when core regional services fail, availability zones become irrelevant.

    How Core dna Detected and Responded in Real-Time

    At Core dna, the story unfolded differently. Here's the timeline from our CTO, Dmitry:

    3:55 AM ET: One of our clients had a big event under way.

    Early Morning: The client called to report website issues, and the team quickly realized this wasn't an isolated incident; it was systemic. AWS's status page initially reported only DynamoDB issues, but Core dna's logs revealed the full scope: SQS endpoints were completely down, returning internal server errors.

    Decision Point: Unlike companies forced to wait for AWS to resolve the issue, Core dna had options. Our disaster recovery plan includes multiple production environments across different regions with continuous cross-regional backups.

    The Critical Advantage: Because SQS is not latency-sensitive the way database operations are, we could switch our SQS endpoints from US-EAST-1 to Canada Central without moving the entire infrastructure. This surgical approach restored full functionality within 30 minutes.
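    A per-service switch like this can be sketched as a routing table in which each internal service resolves its own region, so one service can be repointed while everything else stays put. This is a minimal illustration under stated assumptions: `ServiceRouter`, `fail_over`, and the service names `queue` and `database` are hypothetical, not Core dna's actual code. The only AWS-specific detail is the standard SQS regional endpoint format, `https://sqs.<region>.amazonaws.com`, with `ca-central-1` as the region code for Canada Central.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRouter:
    """Hypothetical per-service region routing table (illustrative only)."""

    default_region: str
    # Per-service overrides, e.g. {"queue": "ca-central-1"}.
    overrides: dict[str, str] = field(default_factory=dict)

    def region_for(self, service: str) -> str:
        """Region a service should call: its override if set, else the default."""
        return self.overrides.get(service, self.default_region)

    def fail_over(self, service: str, region: str) -> None:
        """Move a single service to another region; all others are unaffected."""
        self.overrides[service] = region

    def sqs_endpoint(self) -> str:
        # Standard SQS regional endpoint format: https://sqs.<region>.amazonaws.com
        return f"https://sqs.{self.region_for('queue')}.amazonaws.com"


router = ServiceRouter(default_region="us-east-1")
router.fail_over("queue", "ca-central-1")  # surgical switch for the queue only
print(router.sqs_endpoint())          # https://sqs.ca-central-1.amazonaws.com
print(router.region_for("database"))  # us-east-1
```

    In a real Python client, the resolved region would typically be passed to the SDK, for example `boto3.client("sqs", region_name=router.region_for("queue"))`. SQS queues are regional resources, so the queue must already exist in the fallback region for a switch like this to work.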

    The Architecture That Made the Difference

    Core dna's resilience during this outage wasn't accidental. It reflects years of architectural decisions prioritizing business continuity over convenience:

    1. Multi-Region Production Environments

    We maintain fully operational production environments in multiple AWS regions, not just availability zones. This means:

    • Real infrastructure running 24/7, not cold backups waiting to spin up
    • Continuous cross-regional data replication
    • Ability to shift specific services without complete migration

    2. Service-Level Failover Strategy

    Instead of an all-or-nothing disaster recovery approach, we can selectively route specific services based on their latency requirements:

    • Latency-sensitive services (databases): Must stay regional, because network delay compounds across the hundreds of queries issued per page load
    • Latency-tolerant services (message queues, search): Can be routed cross-region without a noticeable performance impact
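    The split between the two categories comes down to simple arithmetic: round trips that happen one after another add up. The sketch below uses assumed round-trip times (1 ms within a region, 20 ms across regions) and an assumed page that runs 300 sequential queries; none of these numbers are measurements from the outage, and the helper names are hypothetical.

```python
# Assumed round-trip times, for illustration only (not measured values).
SAME_REGION_RTT_MS = 1.0
CROSS_REGION_RTT_MS = 20.0


def network_time_ms(sequential_calls: int, rtt_ms: float) -> float:
    """Total network wait for calls that must run one after another."""
    return sequential_calls * rtt_ms


def can_route_cross_region(calls_per_request: int, budget_ms: float) -> bool:
    """True if moving this dependency to another region stays within the
    extra latency we are willing to add to a single request."""
    extra = calls_per_request * (CROSS_REGION_RTT_MS - SAME_REGION_RTT_MS)
    return extra <= budget_ms


# A page load issuing 300 sequential database queries:
print(network_time_ms(300, SAME_REGION_RTT_MS))   # 300.0
print(network_time_ms(300, CROSS_REGION_RTT_MS))  # 6000.0

# With a 100 ms budget, the database must stay local,
# while a single queue publish per request can move.
print(can_route_cross_region(300, budget_ms=100))  # False
print(can_route_cross_region(1, budget_ms=100))    # True
```

    At these assumed figures, the same page goes from about 0.3 seconds of network wait to about 6 seconds if its database moves away, while one queue message adds roughly 19 ms, often off the customer-facing path entirely.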

    3. Proactive Monitoring Beyond AWS Status

    Our systems monitor actual service health, not just what cloud providers report. During this outage, AWS took hours to update its status page with the full list of affected services. We had already identified the issue and implemented our solution.
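    Monitoring actual service health usually means running synthetic probes that exercise the real dependency (for example, sending and receiving a test message) and alerting on consecutive failures instead of waiting for a status page. A minimal sketch, assuming each probe is a plain callable that raises on failure; `HealthMonitor`, the threshold of three, and the probe functions are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class HealthMonitor:
    """Flags a service as unhealthy after `threshold` consecutive failed probes."""

    def __init__(self, probes: dict[str, Callable[[], None]], threshold: int = 3):
        self.probes = probes
        self.threshold = threshold
        self.failures = defaultdict(int)  # service name -> consecutive failures

    def run_once(self) -> set[str]:
        """Run every probe once; return the services currently considered down."""
        for name, probe in self.probes.items():
            try:
                probe()
                self.failures[name] = 0  # any success resets the streak
            except Exception:
                self.failures[name] += 1
        return {name for name, n in self.failures.items() if n >= self.threshold}


def dynamodb_probe() -> None:
    pass  # stands in for a successful read/write check


def sqs_probe() -> None:
    raise ConnectionError("InternalError")  # simulates the failing endpoint


monitor = HealthMonitor({"dynamodb": dynamodb_probe, "sqs": sqs_probe}, threshold=3)
for _ in range(3):
    unhealthy = monitor.run_once()
print(unhealthy)  # {'sqs'}
```

    Requiring several consecutive failures trades a little detection time for fewer false alarms from a single dropped request.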

    4. Designed for Partial Failures

    Core dna's architecture assumes that individual services will fail. When SQS went down, here's what happened:

    • User registrations continued: Accounts were created and logins worked; users only saw errors in follow-up notifications
    • eCommerce transactions processed: Orders completed, payments went through, inventory updated
    • Admin operations persisted: While some audit logging and search indexing were delayed, critical business functions remained operational

    The key insight: our orchestration layer failed gracefully. Events that couldn't be pushed to the queue didn't crash entire workflows.
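    One common way to get this behavior is to treat the queue publish as best-effort: the customer-facing work completes first, and any event the queue rejects is buffered for later instead of raising. The sketch below assumes a `send` callable that wraps the real queue client; `EventPublisher`, `place_order`, and the in-memory `retry_buffer` are hypothetical, and a production system would keep failed events in a durable store (such as an outbox table) rather than a Python list.

```python
import logging
from typing import Callable

log = logging.getLogger("orchestration")


class EventPublisher:
    """Best-effort publisher: queue failures are buffered, never raised."""

    def __init__(self, send: Callable[[dict], None]):
        self.send = send
        self.retry_buffer: list[dict] = []

    def publish(self, event: dict) -> bool:
        """Try to enqueue; on failure, buffer the event and report False."""
        try:
            self.send(event)
            return True
        except Exception as exc:  # queue errors must not abort the workflow
            log.warning("queue unavailable, buffering %s: %s", event.get("id"), exc)
            self.retry_buffer.append(event)
            return False


def place_order(order_id: str, publisher: EventPublisher) -> str:
    # Synchronous, customer-facing steps (payment, inventory) would run here.
    status = "confirmed"
    # Background notification is best-effort: a failure is buffered, not raised.
    publisher.publish({"id": f"evt-{order_id}", "type": "order.confirmed"})
    return status


def failing_send(event: dict) -> None:
    raise ConnectionError("SQS returned 500")  # simulates the outage


publisher = EventPublisher(failing_send)
print(place_order("1001", publisher))  # confirmed
print(len(publisher.retry_buffer))     # 1
```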

    What This Means for Your Business

    The October 2025 AWS outage should fundamentally change how eCommerce leaders think about infrastructure:

    For CMOs: Revenue Protection is Infrastructure Insurance

    Most of the losses came from eCommerce operations. Core dna clients avoided extended downtime, not through luck, but through infrastructure investment that directly protects revenue.

    Consider what 30 minutes of downtime costs your business during peak season. Now multiply that by the 12+ hours some companies experienced. The architecture investment pays for itself in a single incident.
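    As a rough lower bound, the direct cost is simply revenue per hour multiplied by hours offline. The revenue figure below is a placeholder, not client data; substitute your own peak-season number. The estimate ignores abandoned carts, support load, and lost trust, so real losses are likely higher.

```python
def downtime_cost(revenue_per_hour: float, hours_offline: float) -> float:
    """Direct revenue lost while the storefront is unavailable (lower bound)."""
    return revenue_per_hour * hours_offline


revenue_per_hour = 10_000.0  # hypothetical peak-season revenue

quick_recovery = downtime_cost(revenue_per_hour, 0.5)  # 30 minutes
long_outage = downtime_cost(revenue_per_hour, 12)      # 12 hours

print(quick_recovery)                # 5000.0
print(long_outage)                   # 120000.0
print(long_outage / quick_recovery)  # 24.0
```

    Whatever the hourly figure, a 12-hour outage costs at least 24 times as much in direct revenue as a 30-minute one.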

    For CTOs: Multi-AZ ≠ Multi-Region

    Many technical leaders believe they've solved for redundancy by using multiple AWS availability zones. The October outage proved this insufficient. True resilience requires:

    • Geographic distribution beyond a single cloud region
    • Service-level understanding of latency requirements
    • Automated failover procedures that don't require manual intervention
    • Regular testing of disaster recovery plans 

    For eCommerce Managers: Operational Continuity During Crisis

    While competitors explained downtime to customers, Core dna clients maintained normal operations. The only intervention required was manual reprocessing of orchestration events for the 30-minute window before failover, a minor backend task invisible to customers.
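    Reprocessing events for a known window is safest when it is idempotent, so an event that did get through is not delivered twice. A minimal sketch, assuming each event carries a unique `id` and a timestamp `ts`, and that the set of already-processed ids is available; the function names and the example window are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable


def events_in_window(events: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Events whose timestamp falls in the half-open window [start, end)."""
    return [e for e in events if start <= e["ts"] < end]


def replay(events: list[dict], send: Callable[[dict], None], processed: set[str]) -> int:
    """Re-send each event once, skipping ids already processed. Returns count sent."""
    sent = 0
    for event in events:
        if event["id"] in processed:
            continue  # already delivered: skipping keeps replay idempotent
        send(event)
        processed.add(event["id"])
        sent += 1
    return sent


start = datetime(2025, 10, 20, 8, 0)  # hypothetical start of the failed window
end = start + timedelta(minutes=30)
events = [
    {"id": "a", "ts": start + timedelta(minutes=5)},
    {"id": "b", "ts": start + timedelta(minutes=12)},
    {"id": "c", "ts": end + timedelta(minutes=1)},  # after failover
]
processed = {"b"}
delivered = []
count = replay(events_in_window(events, start, end), delivered.append, processed)
print(count, [e["id"] for e in delivered])  # 1 ['a']
```

    Running the same replay a second time sends nothing, which makes it safe to rerun if the first attempt is interrupted.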

    This operational continuity matters beyond revenue. Your brand reputation, customer trust, and competitive position all hang in the balance when systems fail.

    Building for the Next Outage (Because There Will Be One)

    This wasn't the first major cloud outage, and it won't be the last. The US-EAST-1 region has experienced three major incidents in five years. Microsoft Azure suffered a similar outage just days after AWS recovered. The pattern is clear: as we consolidate more infrastructure with fewer providers, the impact radius of each failure grows.

    Forward-thinking eCommerce platforms are already adopting Core dna's architectural principles:

    1. Assume Failure at Every Layer

    Design systems where any individual component can fail without cascading. This means:

    • Graceful degradation paths for non-critical features
    • Message queues that can redirect across regions
    • Separation of synchronous (customer-facing) and asynchronous (background) operations

    2. Invest in Operational Intelligence

    The ability to detect and respond before official acknowledgment gave Core dna crucial extra time. Real-time monitoring of actual service health, not just status pages, is essential.

    3. Test Disaster Recovery Regularly

    Core dna's 30-minute response time came from having executed these procedures before. If your DR plan lives in a document that's never been tested, it's not a plan; it's wishful thinking.

    4. Understand Service Latency Requirements

    Not everything needs to run in the same data center. Core dna's insight that SQS could route cross-region while databases couldn't represents sophisticated architectural thinking that most platforms lack.
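    A DR drill can be as simple as deliberately failing the primary region in a test environment and checking that requests are still served, within a time budget, from the secondary. The sketch below uses in-process stand-ins so it runs anywhere; `RegionStub`, `call_with_failover`, `failover_drill`, and the one-second budget are hypothetical, and a real drill would target actual deployments.

```python
import time


class RegionStub:
    """Stand-in for a regional deployment; the drill flips `healthy`."""

    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        return f"{request} served by {self.name}"


def call_with_failover(regions: list[RegionStub], request: str) -> str:
    """Try each region in priority order and return the first success."""
    last_error = None
    for region in regions:
        try:
            return region.handle(request)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("all regions failed") from last_error


def failover_drill(regions: list[RegionStub], budget_s: float) -> str:
    """Fail the primary on purpose and check requests still succeed in budget."""
    regions[0].healthy = False
    start = time.monotonic()
    try:
        result = call_with_failover(regions, "GET /checkout")
    finally:
        regions[0].healthy = True  # always restore the primary after the drill
    elapsed = time.monotonic() - start
    assert elapsed <= budget_s, f"failover took {elapsed:.2f}s"
    return result


regions = [RegionStub("us-east-1"), RegionStub("ca-central-1")]
print(failover_drill(regions, budget_s=1.0))  # GET /checkout served by ca-central-1
```

    Scheduling a drill like this regularly, rather than once, is what turns a written DR plan into a procedure the team has actually executed.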

    The New Standard for Enterprise Ecommerce

    The AWS outage didn't just disrupt services; it revealed which platforms were built for enterprise resilience and which were hoping problems would never happen.

    Core dna's response during this crisis demonstrates what modern eCommerce infrastructure should look like:

    • Multiple production environments, not backup dreams
    • Service-specific failover strategies, not one-size-fits-all disaster recovery
    • Proactive monitoring and response, not reactive scrambling
    • Architecture that assumes failure, not hopes for perfection

    For CMOs, CTOs, and eCommerce managers evaluating platforms, the question isn't whether your provider uses AWS, Google Cloud, or Azure. The question is: What happens when that cloud fails?

    With Core dna, the answer is simple: your business keeps running.

    About the Technical Response

    Core dna's approach to this outage reflects years of architectural investment in distributed systems. Our disaster recovery plan includes:

    • Cross-regional backups running continuously across US and Canadian regions
    • Service-specific routing strategies based on latency sensitivity
    • Automated monitoring that detects issues before cloud providers acknowledge them
    • Manual fallback procedures for full regional migration (tested regularly but unnecessary in this case)

    The October 2025 outage proved these investments weren't over-engineering; they were essential infrastructure for any platform handling critical eCommerce operations.

    Dmitry Kruglov

    Dmitry has over 23 years of experience developing complex web solutions. Before Core dna, Dmitry worked in the FinTech and Education industries.
