Secrets encryption - Polar Handbook

Status: Draft
Created: 2026-06-23
Last Updated: 2026-06-25
Author: Petru Rares Sincraian

Summary

We store secrets in plain text in the database: OAuth access and refresh tokens, OAuth client secrets, and Slack bot and signing secrets. Anyone who reads a copy of the database can use them right away. This RFC encrypts these secrets before we store them. We recommend AWS KMS. Secrets are stored as ciphertext, wrapped in an EncryptedString type so loading a row makes no KMS call, and decryption is an explicit await only at the few places that actually need the plaintext. Local development and CI do not call AWS: they use a static key instead, so tests need no cloud access. The full mechanics and worked examples are in the appendices.

Goals

A leaked database dump, a backup, a read replica, or read-only access must not expose the secrets.
It is cheap even with many secrets (one per row).
It allows key rotation.
Local development does not call external services.

What we protect against

KMS keeps the master key separate from the database and the app environment, so the common, database-level leaks become useless to an attacker.

We protect against	We do not protect against
A stolen backup or snapshot	An attacker running our code
A read replica or analytics copy	A leak of the AWS credentials and the database together
Read-only or SQL-injection read access
An accidental `SELECT *` in logs
A teammate with database access reading secrets

Current state

Secrets are stored without encryption today:

Model	Column	File
`OAuthAccount`	`access_token`, `refresh_token`	`server/polar/models/user.py:68`
`SlackApp`	`client_secret`, `signing_secret`, `bot_token`	`server/polar/models/slack_app.py:30`
`OAuth2Client`	`client_secret`, `registration_access_token`	`server/polar/models/oauth2_client.py:20`

Checkout secrets are out of scope: we do not treat them as secrets.

Options

Option	Key kept out of the app	Rotation	Audit log	Cost at our scale	Verdict
`sqlalchemy-utils` `EncryptedType`	No	None built in	No	Flat	Reject
In-house key	No	Manual	No	Flat	Local/dev only
AWS KMS envelope	Yes	Automatic	CloudTrail	~$1/month + $0.03/10k calls	Recommended
AWS Secrets Manager	Yes	Yes	CloudTrail	Thousands/month	Reject (cost)

The main reason to reject the in-house options is that the key is kept in the same place as the app and the database. The AWS options have rate limits, guards, and audit logs built in, which make it easier to protect against a bulk decryption of every secret.

Decision

Store each secret as ciphertext, wrapped in an EncryptedString type, and encrypt or decrypt it explicitly through a small secret service, backed by a KeyProvider chosen by config:

Production and sandbox: a KMS provider with one key per environment.
Local and CI: a static-key provider that runs locally.

We map the column with a SQLAlchemy TypeDecorator, but a wrap-only one: it does no crypto and no I/O. On load it only boxes the stored ciphertext into an EncryptedString object; on save it unboxes it. The KMS call is a separate, explicit await secret.decrypt(). So loading or listing rows still fires zero KMS calls and never blocks the event loop, while we keep type safety and good ergonomics. The distinction that matters is between a decrypting type (eager, synchronous, the thing to avoid) and a wrap-only type (lazy, cheap). Design and worked example in Appendix A.

How key rotation works

Two things can rotate, and only the second ever rewrites existing rows:

Master key (in KMS). Turn on automatic rotation. KMS creates new key material on a schedule and keeps the old material. New secrets are wrapped with the new material; old secrets still decrypt with the old one.
Full re-encryption (only after a suspected leak). A background job reads each secret and writes it back. Because every write generates a fresh data key, this also gives every secret a new one.

There is no separate “data key rotation” step: each secret already gets its own data key when it is written. Full walk-through in Appendix B.

Rollout

We migrate one column at a time, without losing data, in this order:

Add the encrypted column and write both (plain and encrypted) while reading plain.
Backfill existing rows with a background job (see server/polar/meter/tasks.py for the pattern).
Switch reads to the encrypted column, then drop the plain column in a later migration.

The full migration plan is in Appendix C.

Appendix A: How envelope encryption works

“Envelope encryption” means we do not encrypt the secret directly with the master key. Instead:

The master key (called a CMK, customer master key) lives inside AWS KMS and never leaves it.
For each secret we use a data key: a fresh, short-lived key that does the actual encryption on our server.
We store the data key next to the secret in wrapped form, meaning encrypted by the master key.

KMS only ever handles the tiny data key, never the secret itself.

Writing a secret (worked example)

Say we store a Slack bot token xoxb-1234.

Ask KMS for a data key: GenerateDataKey(KeyId, EncryptionContext={"table": "slack_apps", "column": "bot_token", "id": "<row uuid>"}). KMS returns two things: the plaintext data key (32 random bytes) and the wrapped data key (the same key, encrypted by the master key).
Encrypt the token on our server with AES-256-GCM, using the plaintext data key and a random nonce.

Store one string in the column:

v1.<wrapped data key>.<nonce>.<ciphertext>

Throw away the plaintext data key.

The v1 prefix is a version marker, so we can change the format or algorithm later without guessing how old rows were written.
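
The stored format can be sketched with the standard library alone. This is a minimal illustration, not the actual implementation: the function names `pack` and `unpack` are hypothetical, and base64url is an assumed encoding for the three binary parts (the RFC does not specify one). It is chosen here because its alphabet never contains a dot, so splitting on `.` is unambiguous.

```python
import base64

VERSION = "v1"  # version marker from the stored format


def _b64(data: bytes) -> str:
    # base64url is an assumption; its alphabet has no "." so the separator is safe
    return base64.urlsafe_b64encode(data).decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text.encode("ascii"))


def pack(wrapped_key: bytes, nonce: bytes, ciphertext: bytes) -> str:
    """Build v1.<wrapped data key>.<nonce>.<ciphertext> for the column."""
    return ".".join([VERSION, _b64(wrapped_key), _b64(nonce), _b64(ciphertext)])


def unpack(stored: str) -> tuple[bytes, bytes, bytes]:
    """Split a stored value into (wrapped_key, nonce, ciphertext)."""
    parts = stored.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated parts")
    version, wrapped_key, nonce, ciphertext = parts
    if version != VERSION:
        # An unknown version is rejected rather than guessed at
        raise ValueError(f"unsupported format version: {version!r}")
    return _unb64(wrapped_key), _unb64(nonce), _unb64(ciphertext)
```

Checking the version before decoding anything is what lets a later `v2` format coexist with old rows: each reader dispatches on the prefix instead of inferring the layout.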

Reading a secret

Read the column and split it into its four parts.
Ask KMS to unwrap the data key: Decrypt(wrapped data key, EncryptionContext={"table": "slack_apps", "column": "bot_token", "id": "<row uuid>"}).
Decrypt the token on our server with AES-256-GCM.

Each read makes one KMS Decrypt call, which is fast and cheap. We do not cache by default. If a hot path ever reads the same secret in a loop, we can decrypt once per batch at the call site, or add a small bounded cache then.

Encryption context (audit + safety)

The EncryptionContext is a set of plain (non-secret) labels, for example {"table": "slack_apps", "column": "bot_token", "id": "<row uuid>"}. KMS ties it to the wrapped data key: decryption fails unless we pass the same labels. It also appears in CloudTrail, so every decryption is logged with the exact row it was for. The wrap-only type only sees the column value on load, not the row, so it carries the static {table, column} part and the caller supplies the row id at encrypt and decrypt. This binds each ciphertext to its row: a ciphertext copied into another row fails to decrypt because the id no longer matches, and a missing id fails closed.
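
The split between the static labels and the caller-supplied row id might look like the sketch below. The helper name `bind_row` is hypothetical; it only shows the fail-closed rule, and it assumes the real check against the stored ciphertext is done by KMS when the context is passed to Decrypt.

```python
from typing import Optional


def bind_row(static_context: dict[str, str], row_id: Optional[str]) -> dict[str, str]:
    """Merge the static {table, column} labels with the row id.

    Fails closed: a missing or empty id raises instead of producing
    a context that is not bound to any row.
    """
    if not row_id:
        raise ValueError("row id is required for the encryption context")
    if "id" in static_context:
        raise ValueError("static context must not already carry an id")
    return {**static_context, "id": row_id}


CTX = {"table": "slack_apps", "column": "bot_token"}

written_for = bind_row(CTX, "row-a")
read_from = bind_row(CTX, "row-b")  # the same ciphertext copied into another row

# The contexts differ, so KMS would refuse to unwrap the data key for row-b.
contexts_match = written_for == read_from
```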

The provider abstraction

Encryption and decryption live in a small secret service, which talks to a KeyProvider chosen by config:

from typing import Protocol

class KeyProvider(Protocol):
    # returns (plaintext_data_key, wrapped_data_key)
    def generate_data_key(self, context: dict[str, str]) -> tuple[bytes, bytes]: ...
    # returns the plaintext data key for a wrapped key and its context
    def decrypt_data_key(self, wrapped: bytes, context: dict[str, str]) -> bytes: ...

KMSKeyProvider (prod/sandbox): calls KMS GenerateDataKey / Decrypt.
LocalKeyProvider (local/CI): wraps the data key with a static key from an env var.
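
For unit tests that should not depend on any key at all, a test double with the same two methods is enough. The sketch below is an assumption, not one of the two providers listed above: `InMemoryKeyProvider` is a hypothetical name, and it does no real wrapping. It hands out an opaque handle and remembers which data key and context belong to it, which is sufficient to exercise the context-mismatch path.

```python
import secrets


class InMemoryKeyProvider:
    """Hypothetical test double. It does NOT encrypt anything."""

    def __init__(self) -> None:
        # handle -> (plaintext data key, canonical context)
        self._issued: dict[bytes, tuple[bytes, tuple[tuple[str, str], ...]]] = {}

    @staticmethod
    def _canonical(context: dict[str, str]) -> tuple[tuple[str, str], ...]:
        return tuple(sorted(context.items()))

    def generate_data_key(self, context: dict[str, str]) -> tuple[bytes, bytes]:
        plaintext_key = secrets.token_bytes(32)  # 32 random bytes, as in the write path
        handle = secrets.token_bytes(16)         # stands in for the wrapped data key
        self._issued[handle] = (plaintext_key, self._canonical(context))
        return plaintext_key, handle

    def decrypt_data_key(self, wrapped: bytes, context: dict[str, str]) -> bytes:
        entry = self._issued.get(wrapped)
        if entry is None or entry[1] != self._canonical(context):
            # Mirrors the KMS behaviour described above: wrong labels, no key
            raise PermissionError("unknown wrapped key or context mismatch")
        return entry[0]
```

Because it keeps state in memory, this double only works within one process, so it suits unit tests but not local development against a persistent database.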

Where it runs (the EncryptedString type)

The column is mapped with a wrap-only TypeDecorator. It never calls KMS: on load it boxes the ciphertext into an EncryptedString; on save it unboxes it.

from sqlalchemy import Text
from sqlalchemy.types import TypeDecorator


class EncryptedStringType(TypeDecorator):
    impl = Text
    cache_ok = True  # safe only because context is stored hashable below

    def __init__(self, context: dict[str, str]) -> None:
        super().__init__()
        self.context = tuple(sorted(context.items()))  # hashable -> part of the statement cache key

    def process_bind_param(self, value, dialect):     # write: unwrap to ciphertext, reject raw str
        if value is None:
            return None
        if isinstance(value, EncryptedString):
            return value.encrypted_value
        raise ValueError("encrypt the value before assigning it")

    def process_result_value(self, value, dialect):    # read: wrap, no KMS
        return None if value is None else EncryptedString(value, dict(self.context))


CTX = {"table": "slack_apps", "column": "bot_token"}


class SlackApp(RecordModel):
    bot_token: Mapped[EncryptedString] = mapped_column(EncryptedStringType(CTX))

The EncryptedString wrapper holds the ciphertext and owns the only paths that touch KMS, both explicit and async. The synchronous boto3 call runs through asyncio.to_thread, the pattern Polar already uses for S3 and SQS. The type holds the static CTX ({table, column}); the caller adds the row id at encrypt and decrypt so the ciphertext is bound to its row. The row id is available before insert (Polar ids are app-generated), and a missing or wrong id fails closed, so an inconsistent call site errors immediately:

# write (encryption is async, so it happens before assignment; the row id is bound in)
slack_app.bot_token = await EncryptedString.encrypt(token, context={**CTX, "id": str(slack_app.id)})

# use (one KMS Decrypt, only here; the same row id is required)
token = await slack_app.bot_token.decrypt(id=str(slack_app.id))
await slack_client.conversations_list(token)

Three details that are easy to get wrong:

cache_ok = True requires the per-column context stored as a hashable value (a sorted tuple). With a raw dict, SQLAlchemy disables statement caching and can collide cache keys across columns.
EncryptedString is immutable and always reassigned, so ORM change tracking works without sqlalchemy.ext.mutable.
process_bind_param rejects a raw str, because encryption is async and cannot run inside the synchronous processor.

A list of 100 Slack apps loads with zero KMS calls; we pay one Decrypt only for the app we actually message.
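
The shape of the wrapper can be sketched without KMS. Everything below is an assumption for illustration: `SealedValue` is a hypothetical stand-in for EncryptedString, and the blocking decrypt step (KMS Decrypt plus AES-256-GCM) is injected as a plain callable so the example runs anywhere. It shows three properties from this section: the object is immutable, its repr never reveals content, and the blocking work is pushed off the event loop with asyncio.to_thread.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical blocking step: (stored ciphertext, full context) -> plaintext
BlockingDecrypt = Callable[[str, Mapping[str, str]], str]


@dataclass(frozen=True, repr=False)
class SealedValue:
    encrypted_value: str
    static_context: tuple[tuple[str, str], ...]  # hashable {table, column} labels

    def __repr__(self) -> str:
        # Never print ciphertext or context in logs or tracebacks
        return "<encrypted>"

    async def decrypt(self, *, id: str, blocking_decrypt: BlockingDecrypt) -> str:
        if not id:
            raise ValueError("row id is required")  # fail closed
        context = {**dict(self.static_context), "id": id}
        # Run the synchronous call in a worker thread so the event loop stays free
        return await asyncio.to_thread(blocking_decrypt, self.encrypted_value, context)


def fake_decrypt(ciphertext: str, context: Mapping[str, str]) -> str:
    # Stand-in for the real KMS + AES-GCM call
    return f"plaintext-for-{context['id']}"


value = SealedValue("v1.aaa.bbb.ccc", (("column", "bot_token"), ("table", "slack_apps")))
token = asyncio.run(value.decrypt(id="42", blocking_decrypt=fake_decrypt))
```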

Exposing secrets in the API

Pydantic cannot serialize asynchronously, so a field cannot decrypt itself during serialization. The endpoint decrypts and passes a plain str to the schema:

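from uuid import UUID

from pydantic import BaseModel


class OAuth2ClientRead(BaseModel):
    id: UUID
    client_secret: str  # decrypted plaintext


# inside the endpoint, after loading `client`:
return OAuth2ClientRead(
    id=client.id,
    client_secret=await client.client_secret.decrypt(id=str(client.id)),  # the one KMS call; the row id is required
)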

We do not plan a separate endpoint to reveal secrets. Instead, exposing a secret is opt-in per schema: schemas that do not need the secret omit it, so listing stays at zero KMS calls. As a safety net, EncryptedString serializes to "<encrypted>" if it ever lands in a schema, so revealing the plaintext is always a deliberate decrypt() call. One caveat: a schema that exposes the secret cannot be built with model_validate(orm, from_attributes=True), because the attribute is an EncryptedString, not a str; it must be constructed explicitly with the decrypted value.

IAM and config

The Render service role gets only two actions on the one key, optionally pinned to an encryption context:

{
  "Effect": "Allow",
  "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
  "Resource": "arn:aws:kms:...:key/<key-id>"
}

Config adds AWS_KMS_KEY_ID (the key id) and ENCRYPTION_LOCAL_KEY (the static key for local/CI) to server/polar/config.py. We provision one key per environment in Terraform and pass POLAR_AWS_KMS_KEY_ID through secret_environment_variables, the same way POLAR_SECRET is wired today.
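
The optional pinning to an encryption context could be expressed with the `kms:EncryptionContext:<key>` condition key. The sketch below builds such a statement as a Python dict and prints it as JSON. The helper name `kms_statement` is hypothetical, the key ARN is left elided as in the statement above, and pinning on the `table` label is one possible choice rather than a decision of this RFC.

```python
import json


def kms_statement(key_arn: str, table: str) -> dict:
    """Allow the two actions only when the request carries the given table label."""
    return {
        "Effect": "Allow",
        "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
        "Resource": key_arn,
        "Condition": {
            # Requests whose encryption context has a different table are denied
            "StringEquals": {"kms:EncryptionContext:table": table}
        },
    }


statement = kms_statement("arn:aws:kms:...:key/<key-id>", "slack_apps")
print(json.dumps(statement, indent=2))
```

With this shape, each table that holds secrets needs its own statement (or a list of allowed values), so the policy grows with the number of encrypted tables.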

Cost and limits

One KMS key can back an unlimited number of data keys. KMS does not store data keys: it mints one, returns it, and forgets it. The wrapped data keys live in our database. Pricing has two parts:

~$1/month per key (a small, capped surcharge applies when automatic rotation is on).
$0.03 per 10,000 GenerateDataKey / Decrypt calls, with the first 20,000 per month free.

At our read and write volume for secrets, this stays at or near the free tier, so the bill is essentially the flat key fee.
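
As a worked example of the arithmetic, the sketch below applies the two prices quoted above. The function name and the call volumes are illustrative assumptions, and the rotation surcharge is left out.

```python
def kms_monthly_cost_usd(calls: int, keys: int = 1) -> float:
    """Estimate the monthly KMS bill from the prices quoted in this section."""
    key_fee = 1.00          # ~$1/month per key
    price_per_10k = 0.03    # $0.03 per 10,000 GenerateDataKey / Decrypt calls
    free_calls = 20_000     # first 20,000 calls per month are free
    billable = max(0, calls - free_calls)
    return round(keys * key_fee + billable / 10_000 * price_per_10k, 2)


within_free_tier = kms_monthly_cost_usd(15_000)    # only the flat key fee
heavy_month = kms_monthly_cost_usd(500_000)        # 480,000 billable calls
two_environments = kms_monthly_cost_usd(15_000, keys=2)
```

Even at 500,000 calls in a month, the request charge is 48 × $0.03 = $1.44 on top of the key fee, so call volume would have to grow by orders of magnitude before it dominates the bill.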

Appendix B: How key rotation works

1. Master key rotation (automatic, no work for us)

A KMS key has key material: the secret bytes KMS uses to wrap and unwrap our data keys. When we enable automatic rotation, once a year (configurable between 90 and 2560 days) KMS creates new key material for the same key. Crucially:

The key keeps the same ID and ARN.
KMS keeps every old version of the material for as long as the key exists.
The wrapped data key records which version wrapped it, so KMS always picks the right one to unwrap.

So existing data still decrypts after rotation, with no re-encryption and no code change. Example: written in 2026, read in 2027.

June 2026: we encrypt a GitHub token. KMS wraps its data key with material version A. We store v1.<wrapped-with-A>.<nonce>.<ciphertext>.
January 2027: KMS rotates the key automatically. It creates material version B and keeps version A.
March 2027: we read that 2026 token. We send the wrapped data key to KMS. KMS sees it was wrapped with version A, uses it, and returns the data key. We decrypt locally, having rewritten nothing.
Any token written after January 2027 is wrapped with version B.

Automatic rotation limits how much data any single version of the material covers. It is a hygiene control; the real protection is that the material never leaves KMS.
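
The timeline above can be simulated with a toy model. This is an illustration only, not how KMS works internally and not real cryptography: `ToyKms` is a hypothetical class that "wraps" by XOR-ing with the current material and prefixes one byte recording which version was used. It shows why old rows keep decrypting after rotation without being rewritten.

```python
import secrets


class ToyKms:
    """Toy model of one key with versioned material. NOT real cryptography."""

    def __init__(self) -> None:
        self._materials = [secrets.token_bytes(32)]  # version 0 ("A")

    def rotate(self) -> None:
        # New material is added; every old version is kept
        self._materials.append(secrets.token_bytes(32))

    def wrap(self, data_key: bytes) -> bytes:
        version = len(self._materials) - 1  # always the newest material
        masked = bytes(a ^ b for a, b in zip(data_key, self._materials[version]))
        return bytes([version]) + masked    # the wrapped blob records its version

    def unwrap(self, wrapped: bytes) -> bytes:
        version, masked = wrapped[0], wrapped[1:]
        return bytes(a ^ b for a, b in zip(masked, self._materials[version]))


kms = ToyKms()
key_2026 = secrets.token_bytes(32)
wrapped_2026 = kms.wrap(key_2026)      # June 2026: wrapped with version 0

kms.rotate()                           # January 2027: version 1 is created

recovered = kms.unwrap(wrapped_2026)   # March 2027: still unwraps with version 0
wrapped_2027 = kms.wrap(secrets.token_bytes(32))  # new writes use version 1
```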

2. Data keys (one per secret, nothing to schedule)

Every encrypt generates a fresh data key for that value (see the write path in Appendix A), so each row already has its own unique data key. There is no shared, long-lived data key, so there is nothing to rotate on a schedule.

3. Full re-encryption (only after a suspected leak)

If we believe a data key or the stored data was exposed, we run a background job that reads each secret and writes it back under a fresh key. This is the only case that touches every row.

Appendix C: Migration plan

We migrate each secret column on its own, so a problem with one never blocks the others. For a column X:

Add X_encrypted (a new column holding ciphertext). Keep the plain X for now.
Dual-write. On every write, set the plain X and also X_encrypted = await EncryptedString.encrypt(value, context=...). On read, prefer await obj.X_encrypted.decrypt(id=str(obj.id)) when X_encrypted is set, and fall back to X otherwise. The app behaves exactly as before.
Backfill. A batched script reads each row, encrypts X, and fills X_encrypted (one KMS GenerateDataKey call per row, so pace the batches within the KMS request-rate quota). The pattern to copy is meter.backfill_events in server/polar/meter/tasks.py. Migrations stay thin (add the column only); the data work happens in the task.
Cut over. Once the backfill is done, switch reads to X_encrypted only.
Drop the plain column in a later migration, after we are confident.

Every step before the final drop (step 5) is reversible: if anything looks wrong, we go back to reading the plain column. Order of columns, highest value first:

Step	Columns	Why first
1	`OAuthAccount.access_token`, `refresh_token`	Account takeover; one per user
2	`SlackApp.client_secret`, `signing_secret`, `bot_token`	Workspace takeover
3	`OAuth2Client.client_secret`, `registration_access_token`	Low cardinality, lower urgency
