When to use
Operational errors should be used to categorize and handle:
- Transient failures: Temporary conditions that may resolve on retry (e.g., database timeouts, lock contention)
- Expected race conditions: Situations where concurrent operations conflict in predictable ways
- External service issues: Problems with third-party integrations that are outside our control
- Resource contention: When system resources are temporarily unavailable
How to use
Identifying operational errors
To add a new operational error type:
- Create a matcher function in server/polar/operational_errors.py
- Register the matcher in the _operation_error_matchers dictionary
Handling operational errors
The system automatically handles operational errors through middleware:
- API requests: OperationalErrorMiddleware in server/polar/middlewares.py
- Background jobs: OperationalErrorMiddleware in server/polar/worker/_broker.py
When an exception is identified as an operational error:
- It’s logged as a warning (not an error)
- A Prometheus counter is incremented for observability
- Sentry events are marked as operational (and filtered out)
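The effects listed above might look something like the sketch below. It is not the real middleware: the function name report_operational_error is hypothetical, a collections.Counter stands in for the Prometheus counter, and the returned tag dictionary only illustrates the kind of marker an error tracker such as Sentry could filter on.

```python
import logging
from collections import Counter

logger = logging.getLogger("operational_errors")

# Stand-in for a Prometheus counter labelled by error type (assumption).
operational_error_count: Counter[str] = Counter()


def report_operational_error(error_type: str, exc: Exception) -> dict[str, str]:
    # Logged as a warning rather than an error, so it does not look like a bug.
    logger.warning("Operational error %s: %r", error_type, exc)
    # One increment per occurrence, keyed by error type.
    operational_error_count[error_type] += 1
    # Hypothetical tag that event filtering could key on.
    return {"operational_error": error_type}
```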
How it works
Key components
- Matcher functions: Each operational error type has a matcher function that identifies specific exception patterns
- Registry: The _operation_error_matchers dictionary maps error types to their matchers
- Handler: handle_operational_error() checks exceptions against all registered matchers
- Middleware: Automatically intercepts exceptions in both API and worker contexts
- Observability: Prometheus metrics and structured logging provide visibility
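How the registry and handler fit together can be sketched as below. The names _operation_error_matchers and handle_operational_error come from this page, but the signature and return value are assumptions (here the handler returns the matched error type name, or None when nothing matches), and example_timeout_error is a hypothetical error type.

```python
from collections.abc import Callable
from typing import Optional


def example_timeout_error(exc: Exception) -> bool:
    # Hypothetical matcher: treats the built-in TimeoutError as operational.
    return isinstance(exc, TimeoutError)


# Assumed registry shape: error type name -> matcher function.
_operation_error_matchers: dict[str, Callable[[Exception], bool]] = {
    "example_timeout_error": example_timeout_error,
}


def handle_operational_error(exc: Exception) -> Optional[str]:
    # Check the exception against every registered matcher, in order.
    for error_type, matcher in _operation_error_matchers.items():
        if matcher(exc):
            return error_type
    # No matcher recognised it: treat it as a regular error.
    return None
```

A middleware would call a handler like this from its exception path and fall back to normal error reporting when the result is None.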
Current operational error types
- sql_timeout_error: Database query timeouts from asyncpg
- sql_lock_not_available_error: Database lock contention errors
- timeout_lock_error: Distributed lock acquisition timeouts
- external_event_already_handled: Idempotency conflicts in event processing
- loops_client_operational_error: Issues with the Loops email service integration
Benefits
- Reduced noise: Operational errors don’t trigger error alerts
- Better observability: Specific metrics for each error type
- Improved debugging: Clear distinction between bugs and expected conditions
- Consistent handling: Uniform approach across API and worker contexts

