On August 7, 2025, at approximately 21:04 UTC, the Storj US1 satellite experienced a performance degradation due to an unusually high volume of concurrent uploads to the same objects. The incident affected only the US1 satellite, while the AP1 and EU1 satellites remained fully operational throughout the event.
The root cause was an exceptionally high volume of simultaneous uploads targeting the same objects, which created transaction contention in the satellite's database. This triggered a cascade of issues: database locking, request timeouts, and connection pool exhaustion. Database transactions hold locks on the data they touch, preventing other operations from reading or modifying that data until the transaction commits or rolls back. As long-running transactions accumulated, requests queued behind their locks eventually timed out.
The incident affected customers and applications relying on the Storj US1 satellite for storage and retrieval operations. The AP1 and EU1 satellites were not affected by this incident and continued to operate normally.
21:04 UTC: Database transaction locks started to increase.
21:14 UTC: The on-call team received a page and started investigating the issue.
21:55 UTC: A fix was implemented and the on-call team started monitoring the results.
22:31 UTC: The on-call team started investigating another increase in error rates.
22:45 UTC: Error rates trended back down to normal and the on-call team continued to monitor for further issues.
23:05 UTC: Operations were fully restored.
To mitigate the issue during the incident, we periodically reset the state of database connections, preventing a cascading growth of contention-related errors. We have also implemented measures to prevent similar incidents from occurring in the future.
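The mitigation of periodically resetting connection state can be sketched as a pool that enforces a maximum connection lifetime. This is a minimal hypothetical sketch, not Storj's actual implementation: any connection older than `max_lifetime` is closed and replaced on checkout, bounding how long a wedged or contended connection can linger in the pool.

```python
import time


class RecyclingPool:
    """Toy connection pool that discards connections past max_lifetime."""

    def __init__(self, factory, size=4, max_lifetime=1.0):
        self.factory = factory
        self.max_lifetime = max_lifetime
        # Each entry is (connection, creation_time).
        self.pool = [(factory(), time.monotonic()) for _ in range(size)]

    def acquire(self):
        conn, born = self.pool.pop()
        if time.monotonic() - born > self.max_lifetime:
            conn.close()  # reset stale state
            conn, born = self.factory(), time.monotonic()
        return conn, born

    def release(self, conn, born):
        self.pool.append((conn, born))


class FakeConn:
    """Stand-in connection that counts how many were ever opened."""

    opened = 0

    def __init__(self):
        FakeConn.opened += 1
        self.closed = False

    def close(self):
        self.closed = True


pool = RecyclingPool(FakeConn, size=2, max_lifetime=0.05)
conn, born = pool.acquire()          # fresh connection, reused as-is
pool.release(conn, born)
time.sleep(0.1)                      # connections age past max_lifetime
conn2, born2 = pool.acquire()        # stale connection is closed and replaced
print(FakeConn.opened)
```

Production pools expose the same idea directly, for example Go's `database/sql` `SetConnMaxLifetime`, so a periodic reset does not require custom pooling code.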
We apologize for any impact this service disruption may have caused our customers and users. We are committed to learning from this incident and continuing to improve the reliability and resilience of our platform.