Incident Report on Trading Service Disruption – January 29, 2021

On January 29, 2021, OKX experienced a temporary disruption in its trading services between 17:37 HKT and 18:18 HKT. During this period, users encountered intermittent issues accessing the platform and executing trades across web, mobile app, and API interfaces. This report provides a detailed overview of the incident, the root cause, response actions taken, and the long-term measures implemented to enhance platform stability and reliability.

Overview of the Service Disruption

Between 17:37 HKT and 18:18 HKT on January 29, 2021, OKX users experienced intermittent service outages. The primary symptoms included:

The incident was triggered by a sudden spike in user traffic that overwhelmed the system’s caching layer, causing bandwidth saturation. This led to cascading failures in internal service communications, resulting in timeouts and degraded performance across multiple components.

Timeline of Events and Response Actions

The following timeline outlines the key moments during the incident and the immediate response from the engineering team:

The entire resolution process took approximately 41 minutes from initial detection to full recovery, with critical trading functions restored within 21 minutes.
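The durations quoted above can be checked with a few lines of date arithmetic. This is only a sketch: the start and end times come from this report, while the 17:58 restoration time is derived under the assumption that the 21-minute figure is counted from the same 17:37 start.

```python
from datetime import datetime, timedelta

# Times reported in this incident summary (HKT, January 29, 2021).
start = datetime(2021, 1, 29, 17, 37)
end = datetime(2021, 1, 29, 18, 18)

# Total disruption window.
total = end - start
print(total)

# Assumption: "within 21 minutes" is measured from the 17:37 start.
critical_restored = start + timedelta(minutes=21)
print(critical_restored.strftime("%H:%M"))
```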

Root Cause Analysis

The primary cause of the outage was a sudden surge in user activity that exceeded the designed throughput of the caching infrastructure. While the system was built to handle high loads, this particular spike created bottlenecks in the internal communication layer between microservices.

Specifically:

This scenario highlighted a previously untested edge case under extreme load conditions—an important learning point for future scalability planning.
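To illustrate the failure mode described in this section, the sketch below models a link with fixed capacity in front of a cache. It is a toy simulation, not a reconstruction of the actual systems: the function name, the tick-based model, and every number in it are assumptions chosen for illustration.

```python
from collections import deque
from typing import Sequence, Tuple


def simulate_backlog(
    arrivals_per_tick: Sequence[int],
    capacity_per_tick: int,
    timeout_ticks: int,
) -> Tuple[int, int]:
    """Return (timed_out, served) for a FIFO queue drained at a fixed rate.

    All inputs are illustrative; none are taken from the incident itself.
    """
    queue = deque()  # each entry is the tick at which a request arrived
    timed_out = served = 0
    for tick, arrivals in enumerate(arrivals_per_tick):
        queue.extend([tick] * arrivals)
        # Requests that have waited too long fail before they can be served.
        while queue and tick - queue[0] >= timeout_ticks:
            queue.popleft()
            timed_out += 1
        # The link can move at most `capacity_per_tick` requests per tick.
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()
            served += 1
    return timed_out, served


steady = simulate_backlog([80] * 10, capacity_per_tick=100, timeout_ticks=2)
spike = simulate_backlog(
    [80, 80, 300, 300, 300, 80, 80, 80], capacity_per_tick=100, timeout_ticks=2
)
print(steady, spike)
```

In this toy model the steady load never times out, while the three-tick spike builds a backlog that keeps producing timeouts for two ticks after traffic returns to normal. That lag is why saturation in a shared layer tends to show up as failures in the services that depend on it.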

Measures to Enhance Platform Stability

At OKX, we are committed to delivering a resilient, high-performance trading environment. While no complex system can guarantee 100% uptime, we continuously invest in improving reliability through structural and operational enhancements.

1. Strengthening Engineering Quality and Testing Processes

We have reinforced our software development lifecycle by:

These practices ensure that updates are stable and production-ready, minimizing the risk of introducing new vulnerabilities.

2. Architectural Upgrades for High Availability

To reduce dependency on single points of failure, we are actively transitioning to a multi-node, multi-region architecture. This includes:

This upgrade significantly reduces downtime risks caused by hardware failures or regional network issues.
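The idea of removing single points of failure can be sketched as a caller that tries several regions in order. This is a minimal illustration only: the region names, the `send` callable, and the `AllRegionsDown` exception are hypothetical and do not describe OKX's routing.

```python
from typing import Callable, List, Sequence


class AllRegionsDown(Exception):
    """Raised when every configured region fails."""


def call_with_failover(regions: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each region in order and return the first successful response."""
    errors: List[str] = []
    for region in regions:
        try:
            return send(region)
        except ConnectionError as exc:
            # Record the failure and fall through to the next region.
            errors.append(f"{region}: {exc}")
    raise AllRegionsDown("; ".join(errors))


def fake_send(region: str) -> str:
    """Hypothetical transport: 'region-a' is down, every other region answers."""
    if region == "region-a":
        raise ConnectionError("unreachable")
    return f"ok from {region}"


print(call_with_failover(["region-a", "region-b"], fake_send))
```

With a single region configured, the same call raises `AllRegionsDown`, which is the single-point-of-failure case a multi-region layout is meant to avoid.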

3. Implementing Hot Updates and Stateless Design

We are progressively adopting hot update capabilities for non-critical logic layers. This allows us to deploy patches and improvements without requiring service restarts or maintenance windows. By designing services to be stateless where possible, we minimize disruption during upgrades—ensuring seamless trading experiences even during system maintenance.
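The link between stateless design and hot updates can be shown in a small sketch. It assumes a shared external store (represented here by a plain dictionary) and a hypothetical `HotSwappableHandler` class. Because no per-user state lives inside the handler, its logic can be replaced while requests continue.

```python
import threading
from typing import Callable, Dict

# Stand-in for an external store such as a shared cache or database (assumption).
SESSION_STORE: Dict[str, int] = {}


class HotSwappableHandler:
    """Holds the current request logic and lets it be replaced at runtime."""

    def __init__(self, logic: Callable[[int], int]) -> None:
        self._logic = logic
        self._lock = threading.Lock()

    def swap(self, new_logic: Callable[[int], int]) -> None:
        """Install new logic without restarting the process."""
        with self._lock:
            self._logic = new_logic

    def handle(self, user_id: str) -> int:
        with self._lock:
            logic = self._logic
        # State is read from and written back to the shared store,
        # never kept on the handler instance.
        SESSION_STORE[user_id] = logic(SESSION_STORE.get(user_id, 0))
        return SESSION_STORE[user_id]


handler = HotSwappableHandler(lambda count: count + 1)
first = handler.handle("alice")
handler.swap(lambda count: count + 10)  # "patch" applied without a restart
second = handler.handle("alice")
print(first, second)
```

The user's counter survives the swap because it never lived in the handler, which is the property that lets an upgrade proceed without interrupting sessions.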

How Users Stay Informed About System Status

Transparency is central to our user communication strategy. To keep traders informed about platform health and upcoming changes, we provide real-time updates through multiple channels:

These tools empower users to make informed decisions during periods of volatility or technical adjustments.

Frequently Asked Questions (FAQ)

Q: What does error code “30012 – Invalid Authority” mean?
A: This error typically indicates an authentication or session validation failure. During high-load events, temporary delays in token verification may trigger this response. Retrying the request after a short delay usually resolves it.
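A client-side retry along these lines might look like the sketch below. The response shape (a dictionary with a `code` field), the function name, and the delay values are assumptions for illustration. Consult the API documentation for the real response format before relying on this.

```python
import random
import time
from typing import Callable, Dict, List

RETRYABLE_CODE = "30012"  # the error code discussed in this answer


def retry_on_invalid_authority(
    request: Callable[[], Dict[str, object]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, object]:
    """Call `request` until it stops returning code 30012 or attempts run out."""
    response: Dict[str, object] = {}
    for attempt in range(max_attempts):
        response = request()
        # Assumption: the error code is exposed under a "code" key.
        if str(response.get("code")) != RETRYABLE_CODE:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return response


# Usage with a fake request that fails twice and then succeeds.
responses = iter([{"code": "30012"}, {"code": "30012"}, {"code": "0"}])
delays: List[float] = []
result = retry_on_invalid_authority(lambda: next(responses), sleep=delays.append)
print(result, len(delays))
```

Growing the delay and adding jitter matters during a load spike: immediate, synchronized retries add traffic to a system that is already saturated.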

Q: Were any user funds affected during the outage?
A: No. All account balances and transaction records remained secure and intact. The issue was related to service availability, not data integrity or fund safety.

Q: Can OKX prevent similar outages in the future?
A: While no system is immune to extreme conditions, we’ve since upgraded our infrastructure to handle higher traffic volumes and implemented predictive scaling to automatically adjust resources during demand spikes.
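As a rough illustration of what predictive scaling means, the sketch below extrapolates the recent traffic trend and sizes capacity ahead of demand. The linear forecast, the function names, the headroom factor, and the sample numbers are all assumptions. The answer above does not specify the forecasting method actually used.

```python
import math
from typing import Sequence


def forecast_next(samples: Sequence[float]) -> float:
    """Extrapolate one step ahead from the last two samples (linear trend)."""
    if len(samples) < 2:
        return samples[-1]
    return max(0.0, samples[-1] + (samples[-1] - samples[-2]))


def replicas_needed(
    samples: Sequence[float],
    per_replica_capacity: float,
    headroom: float = 1.2,
    min_replicas: int = 2,
) -> int:
    """Size the fleet for the predicted load plus a safety margin."""
    predicted = forecast_next(samples)
    return max(min_replicas, math.ceil(predicted * headroom / per_replica_capacity))


# Requests per second over the last three intervals (illustrative values).
rising = replicas_needed([1000, 1400, 2000], per_replica_capacity=500)
flat = replicas_needed([100, 100, 100], per_replica_capacity=500)
print(rising, flat)
```

Scaling on the predicted value rather than the current one is what lets capacity arrive before the spike instead of after it.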

Q: How can I monitor OKX system status in real time?
A: Visit our public status page at status.okx.com or subscribe to system alerts via the system/status channel in our API.

Q: Does OKX offer compensation for losses due to service disruptions?
A: We evaluate each incident individually. While general market risk is inherent in trading, we may provide goodwill gestures in cases of prolonged or impactful outages.

Q: Are mobile and web platforms equally reliable?
A: Yes. Both platforms connect to the same backend infrastructure and receive equal priority in terms of performance optimization and uptime assurance.


OKX remains dedicated to building one of the most reliable digital asset trading platforms in the industry. The January 29, 2021 incident served as a critical milestone in our journey toward greater resilience. Through continuous innovation, transparent communication, and user-centric design, we aim to set new standards in exchange stability and trust.
