Operational errors are a category of errors that represent transient or expected failure conditions in distributed systems. Unlike bugs or unexpected failures, operational errors are conditions that can occur during normal operation and should be handled gracefully by the system.
When to use
Operational errors should be used to categorize and handle:
- Transient failures: Temporary conditions that may resolve on retry (e.g., database timeouts, lock contention)
- Expected race conditions: Situations where concurrent operations conflict in predictable ways
- External service issues: Problems with third-party integrations that are outside our control
- Resource contention: When system resources are temporarily unavailable
How to use
Identifying operational errors
To add a new operational error type:
- Create a matcher function in server/polar/operational_errors.py.
- Register the matcher in the _operation_error_matchers dictionary.
Handling operational errors
The system automatically handles operational errors through middleware:
- API requests: OperationalErrorMiddleware in server/polar/middlewares.py
- Background jobs: OperationalErrorMiddleware in server/polar/worker/_broker.py
When an operational error is detected:
- It’s logged as a warning (not an error)
- A Prometheus counter is incremented for observability
- Sentry events are marked as operational (and filtered out)
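A rough sketch of the first two effects listed above, under stated assumptions: the signature and return value of handle_operational_error() are guesses, collections.Counter stands in for the Prometheus counter, the matcher is a hypothetical example, and Sentry tagging is left out because this page does not describe how it is done.

```python
import logging
from collections import Counter
from typing import Callable, Dict, Optional

logger = logging.getLogger("operational_errors")

# Stand-in for a Prometheus counter labelled by error type (assumption).
operational_error_counts: Counter = Counter()


def example_timeout_error(exc: BaseException) -> bool:
    """Hypothetical matcher: treat built-in TimeoutError as operational."""
    return isinstance(exc, TimeoutError)


_operation_error_matchers: Dict[str, Callable[[BaseException], bool]] = {
    "example_timeout_error": example_timeout_error,
}


def handle_operational_error(exc: BaseException) -> Optional[str]:
    """Return the matched error type, or None if exc is not operational."""
    for error_type, matcher in _operation_error_matchers.items():
        if matcher(exc):
            # Logged as a warning, not an error, so it does not page anyone.
            logger.warning("operational error type=%s detail=%r", error_type, exc)
            # Counted per error type for observability.
            operational_error_counts[error_type] += 1
            return error_type
    return None
```

Exceptions that match no registered matcher fall through with None, so they can still be reported as ordinary errors.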
How it works
Key components
- Matcher functions: Each operational error type has a matcher function that identifies specific exception patterns
- Registry: The _operation_error_matchers dictionary maps error types to their matchers
- Handler: handle_operational_error() checks exceptions against all registered matchers
- Middleware: Automatically intercepts exceptions in both API and worker contexts
- Observability: Prometheus metrics and structured logging provide visibility
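To show how these components could fit together, here is a hedged sketch of the interception step. It is not the real OperationalErrorMiddleware: the wrapper name, the simplified handle_operational_error() stand-in, and the choice to re-raise after classification are assumptions made for illustration, and the classified list exists only to make the behaviour visible.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, TypeVar

T = TypeVar("T")

# Records how each intercepted exception was classified (illustration only).
classified: List[str] = []


def handle_operational_error(exc: BaseException) -> Optional[str]:
    """Minimal stand-in: treats TimeoutError as the only operational error."""
    return "example_timeout_error" if isinstance(exc, TimeoutError) else None


async def with_operational_error_handling(call: Callable[[], Awaitable[T]]) -> T:
    """Run call and classify any exception before letting it propagate."""
    try:
        return await call()
    except Exception as exc:
        error_type = handle_operational_error(exc)
        classified.append(error_type or "unexpected")
        # Assumption: the exception still propagates; only reporting differs.
        raise


async def flaky_job() -> None:
    """Hypothetical background job that fails with a transient condition."""
    raise TimeoutError("lock not acquired in time")


async def healthy_job() -> int:
    """Hypothetical job that succeeds and passes through untouched."""
    return 42
```

Because the same wrapper shape works around an API request handler and a worker task, both contexts can share one classification path.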
Current operational error types
- sql_timeout_error: Database query timeouts from asyncpg
- sql_lock_not_available_error: Database lock contention errors
- timeout_lock_error: Distributed lock acquisition timeouts
- external_event_already_handled: Idempotency conflicts in event processing
- loops_client_operational_error: Issues with the Loops email service integration
Benefits
- Reduced noise: Operational errors don’t trigger error alerts
- Better observability: Specific metrics for each error type
- Improved debugging: Clear distinction between bugs and expected conditions
- Consistent handling: Uniform approach across API and worker contexts

