Cannot sign into 1Password accounts

Incident Report for 1Password

Postmortem

Incident Postmortem - Sign-in Service Degradation

Date of Incident: 2025-08-05
Time of Incident (UTC): 20:20 - 21:20
Service(s) Affected: Sign-in, Web Application, Command Line Interface (CLI), Single Sign-On (SSO), APIs
Impact Duration: 1 hour

Summary

On August 5, 2025, 1Password experienced a service degradation that impacted customers' ability to sign in and access the web application. The incident was triggered during a planned architectural improvement, when a misconfigured rollback attempt sent a high volume of traffic to a new code path and caused a database connection bottleneck. The issue was resolved by correcting the misconfiguration and restarting the web application servers, which fully restored service.

Impact on Customers

During the service disruption, some customers experienced degraded performance when accessing their 1Password vaults and signing in.

Sign-in Issues: Customers may have experienced sign-in slowness or timeouts.
Error Messages: Customers may have seen error messages when attempting to sign in, such as "Can't sign in", "Failed to determine sign in methods for email", or "Upstream connect error".
Vault Access: Some customers experienced degraded performance when accessing their 1Password vaults.
Geographic Regions Affected: USA/Global

What Happened?

The incident was part of ongoing improvements to the 1Password infrastructure and was not the result of a security incident. Customer data was not affected.

Timeline of Events (UTC):

17:12: A planned, phased rollout of an architectural improvement to authentication systems begins.
18:20: Engineers monitoring the rollout begin to observe higher latency during sign-in for a small subset of accounts.
20:17: A rollback of the change is initiated. An error in the rollback configuration sends a high volume of traffic to the new code path, causing a database connection bottleneck. Engineers monitoring the deployment immediately see service impact.
20:21: A corrective action is deployed to revert the system to its previous state before the misconfigured rollback.
20:39: With impact ongoing, a failover from the primary database to a secondary database is initiated. This action does not reduce the impact.
20:58: A restart of the service that manages incoming traffic is initiated to reset connections.
21:13: A rolling restart of the web application servers is initiated.
21:20: Service is fully restored for all customers.

Root Cause Analysis: The root cause was an error in the configuration of an attempted rollback. This misconfiguration incorrectly routed a high volume of sign-in traffic through a new, slower code path, which created a bottleneck of connections to our primary database and made the web application unresponsive.
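One common way to reason about why a slower code path can exhaust database connections is Little's Law: the average number of connections in use equals the request arrival rate multiplied by how long each request holds a connection. The sketch below is illustrative only. The pool size, sign-in rate, and per-request hold times are hypothetical numbers, not measurements from this incident, and the function names are assumptions.

```python
# Illustrative only: all numbers below are hypothetical, not figures from the incident.
# Little's Law: average concurrent connections = arrival rate * average hold time.


def connections_in_use(requests_per_second: float, hold_time_seconds: float) -> float:
    """Average number of database connections held at once (Little's Law)."""
    return requests_per_second * hold_time_seconds


def is_saturated(requests_per_second: float, hold_time_seconds: float, pool_size: int) -> bool:
    """True when steady-state demand exceeds the pool, so new requests must queue."""
    return connections_in_use(requests_per_second, hold_time_seconds) > pool_size


POOL_SIZE = 200          # hypothetical connection pool size
SIGN_INS_PER_SEC = 1000  # hypothetical sign-in rate

FAST_HOLD = 0.05  # hypothetical 50 ms per sign-in on the existing path
SLOW_HOLD = 0.40  # hypothetical 400 ms per sign-in on a slower path

for label, hold in (("existing path", FAST_HOLD), ("slower path", SLOW_HOLD)):
    in_use = connections_in_use(SIGN_INS_PER_SEC, hold)
    saturated = is_saturated(SIGN_INS_PER_SEC, hold, POOL_SIZE)
    print(f"{label}: ~{in_use:.0f} connections needed, saturated={saturated}")
```

With these assumed numbers, the same request rate needs about 50 connections on the faster path but about 400 on the slower one, which is more than a 200-connection pool can supply, so requests wait for connections and eventually time out.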

How Was It Resolved?

Resolution Steps: The issue was fully resolved through two key actions:

The rollback misconfiguration was identified and corrected, which stopped traffic from flowing to the problematic new code.
A rolling restart of the web application servers was performed to clear the backlog of stuck database connections.
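A rolling restart replaces servers a few at a time, so the remaining servers keep handling requests while each restarted server drops the connections it was holding. The following is a generic sketch of that pattern under stated assumptions, not 1Password's tooling; the function name, the batch size, and the `restart` hook are hypothetical.

```python
from typing import Callable, List


def rolling_restart(
    servers: List[str], batch_size: int, restart: Callable[[str], None]
) -> List[List[str]]:
    """Restart servers in batches of at most batch_size so the others keep serving traffic."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Split the fleet into consecutive batches, e.g. 5 servers with batch_size=2 -> 2, 2, 1.
    batches = [servers[i:i + batch_size] for i in range(0, len(servers), batch_size)]
    for batch in batches:
        for server in batch:
            # Hypothetical hook: restarting a server closes the database connections it holds.
            restart(server)
        # A real rollout would wait here until the batch passes health checks.
    return batches
```

For example, `rolling_restart(["web-1", "web-2", "web-3", "web-4", "web-5"], 2, restarted.append)` restarts the servers in the batches `["web-1", "web-2"]`, `["web-3", "web-4"]`, and `["web-5"]`. The server names are placeholders.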

Verification of Resolution: Monitoring systems were closely observed for 30 minutes to ensure error rates returned to normal.

What We Are Doing to Prevent Future Incidents

We are working to implement the following improvements:

Improve configuration testing: We will improve our testing procedures for configuration updates and their rollbacks before they are pushed to production.
Improve our deployment tooling: We will add additional validation to our traffic management tools to prevent similar configuration errors.
Review our incident response procedures: We have updated the runbook used to respond to this type of incident with guidance that will enable faster recovery.
Enhance our monitors: We will add more specific alerts that help us quickly distinguish between different application tiers, allowing for faster diagnosis and a shorter time to resolution.
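The traffic management validation mentioned in these improvements could take many forms. As a minimal sketch, assuming a hypothetical traffic-split config that assigns a percentage of sign-in traffic to each code path, a pre-deployment check might reject any rollback that still routes traffic to the code path being rolled back. The `RoutingConfig` type, its fields, and `validate` are illustrative assumptions, not 1Password's actual tooling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RoutingConfig:
    """Hypothetical traffic split: percentage of sign-in traffic sent to each code path."""
    weights: Dict[str, int]
    is_rollback: bool = False
    rolled_back_path: str = ""  # the code path a rollback is meant to drain


def validate(config: RoutingConfig) -> List[str]:
    """Return a list of problems; an empty list means the config passes these checks."""
    errors: List[str] = []
    if any(weight < 0 for weight in config.weights.values()):
        errors.append("weights must be non-negative")
    total = sum(config.weights.values())
    if total != 100:
        errors.append(f"weights sum to {total}, expected 100")
    if config.is_rollback:
        if not config.rolled_back_path:
            errors.append("rollback must name the code path it is rolling back")
        else:
            # Catches a rollback that still sends traffic to the path it should drain.
            leaked = config.weights.get(config.rolled_back_path, 0)
            if leaked > 0:
                errors.append(
                    f"rollback still routes {leaked}% of traffic to '{config.rolled_back_path}'"
                )
    return errors
```

Under these assumptions, a rollback config of `{"legacy": 0, "new": 100}` with `rolled_back_path="new"` fails with `rollback still routes 100% of traffic to 'new'`, while `{"legacy": 100, "new": 0}` passes.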

Next Steps and Communication

No action is required from our customers. 1Password applications are designed to be resilient, with local copies of vault data always available on customer devices, even without a connection to the 1Password service.

If you are still experiencing issues, please contact our support team.

We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.

Sincerely, The 1Password Team

Posted Aug 11, 2025 - 13:53 EDT

Resolved

This incident has been resolved.

Posted Aug 05, 2025 - 17:59 EDT

Monitoring

We've rolled out changes to mitigate this issue. We're seeing recovery and will continue to monitor the situation.

Posted Aug 05, 2025 - 17:30 EDT

Identified

The issue has been identified and we are working to resolve it. In the meantime, if you already have the 1Password app installed on your device, you can still access your saved items in offline mode in the app, unless your admin has disabled this feature. Please note that new changes won't sync until service is restored.

Posted Aug 05, 2025 - 17:22 EDT

Investigating

We are investigating issues affecting sign-ins. Users may see errors such as "Can’t sign in. The request took too long." Our team is working to resolve this as quickly as possible.

Posted Aug 05, 2025 - 16:46 EDT

This incident affected: USA/Global (Sign in).