Architecting FIVUCSAS: a multi-tenant biometric authentication platform

FIVUCSAS — Face and Identity Verification Using Cloud-based SaaS — started as my Marmara senior engineering project and now ships under RollingCat Software, the umbrella name I publish some of my work under. It is a production multi-tenant biometric authentication platform: Spring Boot 3 / Java 21 on PostgreSQL 16 + pgvector, a FastAPI ML sidecar, a React 18 web app, and Kotlin Multiplatform desktop and mobile clients. WebAuthn-first, KVKK-compliant, self-hosted on a Hetzner CX43 box behind Traefik, observed with Loki + Promtail + Grafana, backed up with pgBackRest WAL archiving for PITR. The repo lives at github.com/Rollingcat-Software/FIVUCSAS, and the embeddable widget at verify.fivucsas.com.

This piece is not a feature tour. It is what I would tell another engineer about to build something at this scope: the three production incidents the team learned the most from, written up the way I wish someone had written them up before we started.

The shape of the system

The platform splits into three runtime components:

Identity Core API — Spring Boot 3 / Java 21, the authoritative source of truth for tenants, users, sessions, audit logs, MFA factors (TOTP, WebAuthn, NFC, biometric).
Biometric Processor — FastAPI sidecar in Python. Owns the ML stack (face mesh, embedding extraction, active-liveness puzzle scoring). Talks to the API over a private Docker network — never exposed publicly.
Clients — React web app + Kotlin Multiplatform mobile + a Desktop / Admin client. Each one is a thin shell over the API; the embeddable widget is the smallest possible surface a tenant integrates against.

The core insight that shaped almost every decision: biometric data must never sit unencrypted at rest, and the embedding extraction process must never be reachable from the internet. Everything else fell out of that.

Incident 1 — The day a test wiped a real user

In the early schema, rows in the users table could be hard-deleted. The cascade graph (which we had not fully drawn) reached 13 tables, including webauthn_credentials, nfc_keys, totp_secrets, and the biometric reference table. A test cleanup script, harmless in isolation, ran against a row that turned out to be a production account. The cascade did exactly what we told it to.

What was lost, in seconds: TOTP, WebAuthn passkey, NFC binding, biometric reference. What was not lost: the audit log, because audit lives in a separate cascade-isolated table on purpose.

The fix was small and obvious in retrospect:

Move users to soft-delete (deleted_at IS NULL everywhere we used to filter on existence).
Patch every findByEmail / findById to add the soft-delete predicate.
CI guardrail: a check that greps the codebase for DELETE FROM users.
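
The first two items can be sketched in a few lines. This is a minimal illustration in Python against an in-memory SQLite database, not the platform's Spring Data code; the column list and the function names open_db, soft_delete_user, and find_by_email are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone


def open_db() -> sqlite3.Connection:
    """Create a throwaway in-memory users table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL,"
        " deleted_at TEXT)"  # NULL means the account is live
    )
    return conn


def soft_delete_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Mark the row as deleted instead of removing it, so nothing cascades."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE users SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now, user_id),
    )


def find_by_email(conn: sqlite3.Connection, email: str):
    """Existence lookups carry the soft-delete predicate explicitly."""
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ? AND deleted_at IS NULL",
        (email,),
    ).fetchone()
```

Because the row is never physically removed, dependent rows in other tables are left untouched, and undoing a mistaken delete is a single UPDATE that sets deleted_at back to NULL.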

The lesson we took: draw the FK graph first, before the first migration. A 13-table cascade is not a bug — it is an architecture decision you made without realizing it.
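
Drawing the graph does not need special tooling. As a rough sketch, assuming the ON DELETE CASCADE edges have already been collected into a parent-to-children mapping (in PostgreSQL they could be read from the pg_constraint catalog, where confdeltype = 'c' marks a cascading foreign key), a breadth-first walk lists every table a delete on one root can reach. The function name cascade_reach and the edge data in the usage note are hypothetical.

```python
from collections import deque


def cascade_reach(cascade_edges: dict[str, list[str]], root: str) -> list[str]:
    """Return every table reachable from root through cascading foreign keys.

    cascade_edges maps a parent table to the child tables whose rows are
    deleted when a parent row is deleted.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in cascade_edges.get(parent, []):
            if child not in seen:  # guards against cycles in the FK graph
                seen.add(child)
                queue.append(child)
    seen.discard(root)
    return sorted(seen)
```

Called with a made-up mapping such as {"users": ["webauthn_credentials", "totp_secrets"]}, it returns the two child tables. Running something like this in CI and failing when the reach of users grows would turn the cascade into a reviewed decision instead of an accidental one.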

Incident 2 — Embedding encryption, the third time

The first version stored face embeddings as raw float[] columns. The second version base64-encoded them. Neither is encryption. The third version finally did it right: every embedding is encrypted with Fernet (AES-128-CBC + HMAC-SHA256) using a per-environment key, stored encrypted in pgvector, and decrypted in-process only at the moment the cosine similarity calculation runs.

The migration to v3 had to be done online, against production data, so the rollout was:

Add the encrypted column alongside the plaintext one.
Dual-write for one release.
Backfill existing rows in batches off-hours.
Switch reads to the encrypted column.
Drop the plaintext column in a follow-up migration.
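
The backfill step is the one most worth sketching. The version below is a minimal illustration in Python against SQLite, not the production job: the table name face_embeddings, the column names embedding_plain and embedding_enc, and the batch size are all assumptions, and encrypt is any bytes-to-bytes callable (in a real run it would be something like Fernet(key).encrypt from the cryptography package).

```python
import sqlite3
from typing import Callable


def backfill_batch(
    conn: sqlite3.Connection,
    encrypt: Callable[[bytes], bytes],
    batch_size: int = 500,
) -> int:
    """Encrypt up to batch_size rows that still lack an encrypted value."""
    rows = conn.execute(
        "SELECT id, embedding_plain FROM face_embeddings "
        "WHERE embedding_enc IS NULL AND embedding_plain IS NOT NULL "
        "ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, plain in rows:
        # The IS NULL guard keeps a concurrent dual-write from being overwritten.
        conn.execute(
            "UPDATE face_embeddings SET embedding_enc = ? "
            "WHERE id = ? AND embedding_enc IS NULL",
            (encrypt(plain), row_id),
        )
    conn.commit()  # one short transaction per batch
    return len(rows)


def backfill_all(
    conn: sqlite3.Connection,
    encrypt: Callable[[bytes], bytes],
    batch_size: int = 500,
) -> int:
    """Repeat batches until no unencrypted rows remain; return rows processed."""
    total = 0
    while (done := backfill_batch(conn, encrypt, batch_size)) > 0:
        total += done
    return total
```

Keeping each batch in its own short transaction is what makes this safe to run online: it can be stopped and resumed at any point, and a second run over already-migrated data does nothing.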

The bit that almost broke it: the operator must set FIVUCSAS_EMBEDDING_KEY at boot — the application fails fast if the key is missing, which is intentional. Letting it default to a random key would silently invalidate every existing embedding.
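
A fail-fast check of this kind might look like the sketch below. It is an illustration, not the application's startup code: only the variable name FIVUCSAS_EMBEDDING_KEY comes from the deployment, while load_embedding_key and MissingEmbeddingKeyError are hypothetical names. The format check relies on the fact that a Fernet key is 32 bytes encoded as URL-safe base64.

```python
import base64
import binascii
import os
from typing import Mapping, Optional


class MissingEmbeddingKeyError(RuntimeError):
    """Raised at boot when the embedding key is absent or malformed."""


def load_embedding_key(env: Optional[Mapping[str, str]] = None) -> bytes:
    """Return the configured key, or raise. Never generate a fallback key."""
    env = os.environ if env is None else env
    raw = env.get("FIVUCSAS_EMBEDDING_KEY", "").strip()
    if not raw:
        raise MissingEmbeddingKeyError(
            "FIVUCSAS_EMBEDDING_KEY is not set; refusing to start"
        )
    try:
        decoded = base64.urlsafe_b64decode(raw)
    except (binascii.Error, ValueError) as exc:
        raise MissingEmbeddingKeyError(
            "FIVUCSAS_EMBEDDING_KEY is not valid URL-safe base64"
        ) from exc
    if len(decoded) != 32:
        raise MissingEmbeddingKeyError(
            "FIVUCSAS_EMBEDDING_KEY must decode to exactly 32 bytes"
        )
    return raw.encode("ascii")
```

Calling this once during startup, before any request is served, means a misconfigured deployment crashes visibly instead of writing embeddings that no later process can decrypt.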

The lesson: for irreversible data transformations, fail-fast on configuration is better than fail-soft on behavior.

Incident 3 — Refresh tokens, family revocation, and the rollback

Refresh-token rotation is one of those things that looks simple in a sequence diagram and breaks in seven non-obvious ways at scale. The version I shipped first was correct in isolation but wrong under concurrent requests: a mobile client briefly online over flaky 3G could submit a refresh, get a new pair, lose the response, and retry — and my server would treat the second submission as token reuse and revoke the entire family.

The user-visible symptom: a single failed network round-trip would log them out across all devices. From the inside it looked correct — a “stolen token” detection — but from the outside it was a heisenbug that mostly hit mobile users.

The fix was a hashing-based family-revoke design (V55 in the migration log):

Store hashed refresh tokens, not plaintext.
A short reuse-grace window (a few seconds) where the same client can resubmit without triggering family-revoke.
Distinguish “different token in same family used after rotation” (real attack signal — revoke) from “same token submitted twice in 5 seconds” (network retry — accept).
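
The decision logic can be sketched as a small pure function. This is a simplified illustration under stated assumptions, not the V55 implementation: the in-memory dict stands in for the token table, the 5-second grace value is taken from the example above, SHA-256 is an assumed choice of hash, and the names Decision, StoredToken, and classify_refresh are hypothetical. It also omits the "same client" check, which a real version would need.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional

REUSE_GRACE_SECONDS = 5.0  # assumed value for the reuse-grace window


class Decision(Enum):
    ROTATE = "rotate"                    # normal refresh: issue a new pair
    REPLAY_ACCEPTED = "replay_accepted"  # network retry inside the grace window
    REVOKE_FAMILY = "revoke_family"      # rotated token reused later: attack signal
    REJECT = "reject"                    # token hash not known at all


@dataclass
class StoredToken:
    token_hash: str
    family_id: str
    rotated_at: Optional[float] = None  # None until this token has been rotated


def hash_token(token: str) -> str:
    """Only the hash is stored, so a database leak does not expose live tokens."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def classify_refresh(
    store: dict[str, StoredToken], presented: str, now: float
) -> Decision:
    """Decide what to do with a presented refresh token at time `now`."""
    record = store.get(hash_token(presented))
    if record is None:
        return Decision.REJECT
    if record.rotated_at is None:
        return Decision.ROTATE
    if now - record.rotated_at <= REUSE_GRACE_SECONDS:
        return Decision.REPLAY_ACCEPTED
    return Decision.REVOKE_FAMILY
```

On REPLAY_ACCEPTED the server would presumably hand back the pair it already issued for that rotation, so the retrying client and the server converge on the same current token instead of forking the family.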

The lesson: the network is not a sequence diagram. Every protocol that distinguishes “legitimate retry” from “attack” needs to model retries explicitly, not as an afterthought.

What the architecture is good at

Clean separation of concerns. The ML sidecar can be replaced or upgraded without touching the API; the API can move identity providers without touching the ML stack.
Schema-driven multi-tenancy. Tenant isolation is a database-level concern, not an application-level filter. There is no WHERE tenant_id = ? clause that someone can forget to add.
Operational primitives by default. Loki + Promtail + Grafana for logs and metrics, pgBackRest for WAL archiving, fail-loud backups with restore-verify, gitleaks CI on every commit.

What we would do differently

Draw the FK graph on day zero. Soft-delete by default; add hard-delete only where there is a real legal requirement.
Treat the ML sidecar as a separate deployable from the start. Bundling it in early was a shortcut; splitting it later was three weeks of yak-shaving.
Pick one MFA factor as the canonical primary. WebAuthn is the right answer; everything else (TOTP, NFC, biometric) is a fallback.

Reading list

Designing Data-Intensive Applications, ch. 5 (Replication) and ch. 9 (Consistency and Consensus).
The WebAuthn Level 3 spec — particularly user verification and client-extension handling.
The Fernet specification + the cryptography.io implementation notes on key rotation.

The full source is private until a third-party security review completes. Source access is available on request — email me at ahmetabdullahgultekin@gmail.com.

architecture
biometric
postgres
webauthn