Status: Active
Created: August 2025
Last Updated: August 2025
Context
We are using Stripe Subscriptions, meaning that Stripe is responsible for creating the subscriptions and managing the retries of the failing methods. At the beginning we want to mimic the Stripe retry logic to have similar recovery results.Solution
Approach 1: New Dunning concept
This approach involved creating a newDunningAttempt
model that keeps track of the failed payment attempts for subscriptions. A dedicated worker would manage the state transitions.
Overview:
- New
DunningAttempt
model: Introduce a new class and table to process subscription failed payments. This new class will have:id
: UUIDorder_id
: foreign key to Order (unique - we should have only 1 row per order)status
: retrying, succeeded, failedattempt_number
: the current retry count (from 1 through 4)next_payment_attempt_at
: date when the next retry is scheduledlast_failure_reason
: the error message that failedstarted_at
: when the dunning process began
- Renewal Job: A scheduled job (
subscription.cycle
) runs regularly to find subscriptions due for renewal. - Payment Attempt: For each due order subscription, it calls
PaymentService
to attempt a charge. - State Transition:
- On Failure: The subscription status is changed to
past_due
, and a new DunningAttempt record is created withattempt_number
set to 1, statusretrying
, and anext_payment_attempt_at
. The benefits will be revoked. - On Success: The subscription is renewed, and the status remains
active
.
- On Failure: The subscription status is changed to
- Dunning Worker: A new periodic worker (
subscription.dunning
) runs hourly.- It queries for
dunning_attempts
withstatus = "retrying"
, andnext_payment_attempt_at
it’s in the past. - It re-attempts payment. If it fails again, it updates
next_payment_attempt_at
for the next retry. - After a configured number of retries, it moves the DunningAttempt to failed and the subscription to
unpaid
orcanceled
.
- It queries for
- Recovery: If a payment succeeds during the dunning process, the subscription status is set back to
active
and the DunningAttempt is marked tosucceeded
Sequence Diagram
First payment Dunning retries TBD: User updated payment method TODOApproach 2: Store the retries in the Subscription model
I discarded this solution as it increases the complexity of the Subscription model.(Recommended) Approach 3: Using Order and Payment Models
This approach relies on the existingOrder
and Payment
models, making it a lightweight and integrated solution. It avoids a new table only needing minimal changes to the Order
model.
Overview:
- Change
Order
Model: Add a single, nullable timestamp field to theOrder
model:next_payment_attempt_at
: Schedules the next retry. IfNULL
, no retry is pending.
- Use Existing
Payment
Model: ThePayment
model, which already has a foreign key toOrder
, will serve as the log for all payment attempts.attempt_number
can be derived by counting thePayment
records associated with the order.last_failure_reason
can be retrieved from the most recent failedPayment
record.
- Renewal and Initial Payment: The process starts as before. A scheduled job creates an
Order
and aPayment
is attempted. - State Transition on Failure:
- When a payment fails, the
Payment
record is marked asfailed
. - The
Order
is updated by settingnext_payment_attempt_at
to schedule the first retry (e.g., 3 days from now). - The
Subscription
status is set topast_due
. Benefits are revoked.
- When a payment fails, the
- Dunning Worker: A periodic worker (
order.process_dunning
) runs hourly.- It queries for
Order
s wherenext_payment_attempt_at
is notNULL
and is in the past. - For each due order, it creates a new
Payment
record and attempts a charge. - On Failure: It reschedules
next_payment_attempt_at
for the next attempt. - On Final Failure: After the last attempt,
next_payment_attempt_at
is set toNULL
. TheSubscription
is moved tounpaid
orcanceled
.
- It queries for
- Recovery:
- If a payment succeeds, the
Payment
is marked assucceeded
, theOrder
is marked as paid, andnext_payment_attempt_at
is set toNULL
. - The
Subscription
status is set back toactive
. Benefits are re-granted.
- If a payment succeeds, the
Comparison with Approach 1
Pros:- Maintainability: Avoids a new table and only adds one nullable column to
Order
. - Leverages Existing Models: Reuses the
Payment
model for its intended purpose, keeping the design clean and logical. - Clear and Explicit: The
Order
model is only concerned with when the next payment is due, while thePayment
model correctly stores the history of what happened.
- Complex Queries: Retrieving the full dunning history for an order (like the attempt count or last error) requires querying the associated
Payment
records. We only expect 4 payments per order. Mitigation: it can be cached or denormalized if it becomes expensive. - Race conditions: As before, we can have a Race condition when the retry mechanism is working and the user plans to manually pay. Mitigation can be prevented by locking the Order.
Sequence Diagrams
First Subscription Payment Failure This diagram shows the flow when a recurring subscription payment fails for the first time, initiating the dunning process. Dunning Retries This diagram illustrates how the periodic worker handles scheduled payment retries. User Updates Payment Method TODOQuestions
How does the retry mechanism look in our Stripe account In Stripe we have:- 4 retries at most.
- These 4 retries should happen before 3 weeks.
- If all retries failed, we set the status of the subscription to canceled.
- 2682 with at least 1 failed payment
- 1108 recovered orders
- Then 41% of recovery rate.
- 1253 total
- 541 recovered
- Then 43% of recovery rate.
invoice
and payment_intent
objects. Each payment intent can have multiple transactions associated with it. From what I see, we don’t expose these payment intents in our API. I will keep this as a separate task if we want to do it.
What happens when the customer updates the payment method?
The customer can trigger a manual payment action. If the payment succeeds the order will be marked as paid; if it fails, it will follow the original dunning schedule.
What is the backoff strategy of Stripe?
Stripe has different backoffs, but based on (t1, t2, t3) we have the following ranges:
- (D+0) Subscription payment fails
- (D+2/3) First attempt
- (D+7/12) Second attempt
- (D+14/17) Third attempt
- (D+21) Fourth attempt.
unpaid
or canceled
?
In Stripe, we are moving it to canceled
.