Date of Incident: 2025-05-21
Time of Incident (UTC): 16:06:40 - 16:48:10
Service(s) Affected: US/Global 1Password.com website, Sign in, Sign up, Admin console, SSO (Single Sign-On), browser extension, Command Line Interface (CLI).
Impact Duration: 41 minutes
Summary
On May 21st, 1Password's web interface, APIs, browser extension, and CLI tools experienced significant latency and errors. These problems stemmed from a code change that triggered a spike in requests to our session store, leading to increased memory usage and system load. As a result, customers were unable to access their vaults or sign in via SSO.
This was not the result of a security incident, and customer data was not affected.
Impact on Customers
During the incident:
- Web interface, Administration: Customers experienced significant delays when accessing the 1Password web interface. Administrators could not access or use any administration tools.
- Single Sign-On (SSO), Multi-Factor Authentication (MFA): Users with SSO or MFA enabled could not sign in and received the error message "An unexpected error occurred." Customers may also have been required to re-authenticate to access 1Password once the issue was mitigated.
- Command Line Interface (CLI): CLI users faced increased latency and timeouts when attempting to access our web APIs.
- Browser Extension: Users whose extension required authentication through the web interface were unable to unlock their vaults.
- Number of Affected Users (approximate): All users accessing the service in the US/Global (1password.com) region were affected.
- Geographic Regions Affected (if applicable): 1password.com (US/Global)
What Happened?
We deployed code changes that increased the number of queries to our Redis clusters. The increase in queries caused a spike in memory usage, which in turn caused latency and errors across all endpoints.
Timeline of Events (UTC):
- 2025-05-21 15:52 UTC: Deployment started
- 2025-05-21 15:57 UTC: Deployment complete
- 2025-05-21 16:00 UTC: Automated monitoring detects increased errors and latency
- 2025-05-21 16:01 UTC: Automation pages the incident response team
- 2025-05-21 16:06 UTC: The team activates our incident protocol and begins investigation
- 2025-05-21 16:21 UTC: The team initiates a rollback to a previous version
- 2025-05-21 16:23 UTC: The team identifies the code change causing the issue
- 2025-05-21 16:48 UTC: Incident mitigated. The rollback completed, and we see a significant improvement in error rates and latency. The team continues to monitor the system.
- 2025-05-21 17:23:11 UTC: Incident resolved
Root Cause Analysis:
We released a code change that caused a significant increase in data writes to our session store cluster.
All operations, even those with a pre-established session, depend on the session store for authenticating requests.
The resulting resource contention led to increased latency and timeouts.
The unexpectedly high volume of writes to this specific datastore also caused a portion of sessions to be prematurely evicted, requiring customers to re-authenticate earlier than anticipated.
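The eviction effect described above can be illustrated with a minimal Python sketch. It assumes a store with a fixed capacity and least-recently-used (LRU) eviction, loosely modeled on a Redis instance with a memory limit; this report does not state the actual eviction policy, and the class name, key names, and counts are hypothetical. Capacity is counted in keys rather than bytes to keep the example short.

```python
from __future__ import annotations

from collections import OrderedDict


class BoundedSessionStore:
    """Hypothetical session store with fixed capacity and LRU eviction.

    Illustrative only: not 1Password's implementation. Capacity is a key
    count standing in for a memory limit.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict[str, str] = OrderedDict()
        self.evicted: list[str] = []

    def set(self, key: str, value: str) -> None:
        # A write marks the key as most recently used.
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        # Over capacity: evict least recently used keys until we fit again.
        while len(self._data) > self.capacity:
            oldest, _ = self._data.popitem(last=False)
            self.evicted.append(oldest)

    def get(self, key: str) -> str | None:
        # Every authenticated request would perform a lookup like this.
        if key not in self._data:
            return None  # session gone: the client must re-authenticate
        self._data.move_to_end(key)
        return self._data[key]


store = BoundedSessionStore(capacity=1000)

# Normal load: 800 active sessions fit comfortably.
for i in range(800):
    store.set(f"session:{i}", "user-token")

# Hypothetical extra writes introduced by a code change.
for i in range(500):
    store.set(f"extra:{i}", "auxiliary-data")

lost = [k for k in store.evicted if k.startswith("session:")]
print(len(lost))  # prints 300
```

With 800 sessions in a 1,000-key store, the first 200 extra writes fill the remaining space, and each of the last 300 evicts the oldest key, which here are the first 300 sessions. Those users would be asked to sign in again even though their sessions had not expired.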
How Was It Resolved?
Our monitoring systems detected the issue and alerted the response team within minutes of the release. The team quickly identified the problem and initiated a rollback.
- Resolution Steps: The team identified the problematic code change and reverted to a previous version. As the rollback was deployed, server functionality returned to normal.
- Verification of Resolution: Our monitoring systems were closely observed for 2 hours after the rollback to ensure latency and errors were fully resolved.
What We Are Doing to Prevent Future Incidents
- Our team will implement longer testing periods in lower-traffic environments to improve monitoring and issue detection for similarly high-risk changes.
- Our team is working to improve our incremental deployment process, which will allow us to detect system issues earlier and limit their impact.
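An incremental (canary) rollout gate of the kind described above could look roughly like the following Python sketch: each stage sends a larger share of traffic to the new version, and the rollout stops as soon as error rate or latency exceeds a multiple of the pre-deployment baseline. This report does not describe 1Password's deployment tooling; the stage percentages, the `StageMetrics` fields, the `evaluate_rollout` function, and the threshold ratios are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StageMetrics:
    """Observed metrics for one rollout stage (hypothetical fields)."""
    traffic_percent: int
    error_rate: float  # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float


def evaluate_rollout(
    stages: list[StageMetrics],
    baseline_error_rate: float,
    baseline_p99_ms: float,
    max_error_ratio: float = 2.0,
    max_latency_ratio: float = 1.5,
) -> tuple[str, int]:
    """Check rollout stages in order against baseline-relative thresholds.

    Returns ("rollback", pct) for the first stage that breaches a threshold,
    otherwise ("promote", pct) for the last stage that passed.
    """
    reached = 0
    for stage in stages:
        error_ok = stage.error_rate <= baseline_error_rate * max_error_ratio
        latency_ok = stage.p99_latency_ms <= baseline_p99_ms * max_latency_ratio
        if not (error_ok and latency_ok):
            return ("rollback", stage.traffic_percent)
        reached = stage.traffic_percent
    return ("promote", reached)


observed = [
    StageMetrics(traffic_percent=1, error_rate=0.002, p99_latency_ms=210.0),
    StageMetrics(traffic_percent=5, error_rate=0.003, p99_latency_ms=260.0),
    StageMetrics(traffic_percent=25, error_rate=0.030, p99_latency_ms=900.0),
]
print(evaluate_rollout(observed, baseline_error_rate=0.002, baseline_p99_ms=200.0))
# prints ('rollback', 25)
```

Here the 25% stage breaches the error-rate threshold, so the rollout stops with three-quarters of traffic still on the previous version. Comparing each stage against a baseline rather than a fixed absolute limit is one way to keep such a gate meaningful as normal traffic levels change.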
Next Steps and Communication
- Some customers may need to re-authenticate in order to access 1Password.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.
Sincerely,
The 1Password Team