What the product needed to do
A merchant-onboarding SDK has three non-obvious demands: idempotent writes under retry (webhooks fire twice, often), eventually consistent reads at high cardinality (partner dashboards query the full merchant book), and deterministic KYC callbacks (banks require an audit trail). The existing system handled each of these with ad-hoc retries and cron sweepers; the new architecture had to bake the guarantees in at the SDK boundary.
The technical bet
The biggest bet was putting Kafka between the SDK gateway and the settlement service. Every existing team in the org was still on the synchronous write path and nervous about event-sourced architectures. We mitigated the risk by running a dual-write for four weeks: every onboarding request landed in both the old Postgres table and the new Kafka topic, and we only cut over once the Kafka pipeline had shown ten days of zero-drift reconciliation.
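To make "zero-drift reconciliation" concrete, here is a minimal sketch of the kind of check a dual-write period implies. It is not the production job: the record shape, the `reconcile` function, and the `DriftReport` fields are all hypothetical, and both inputs are assumed to have already been loaded into memory from the Postgres table and the Kafka topic.

```typescript
// Hypothetical record shape; the real schema is not described in this write-up.
interface OnboardingRecord {
  partnerReferenceId: string;
  merchantName: string;
  status: "pending" | "enrolled" | "rejected";
}

// Hypothetical report: IDs that differ between the two write paths.
interface DriftReport {
  missingInKafka: string[];
  missingInPostgres: string[];
  mismatched: string[];
}

// Index rows by partner reference ID. If the same ID appears more than once
// (possible with at-least-once delivery), the last occurrence wins.
const indexById = (rows: OnboardingRecord[]): Map<string, OnboardingRecord> =>
  new Map(rows.map((r): [string, OnboardingRecord] => [r.partnerReferenceId, r]));

function reconcile(
  postgresRows: OnboardingRecord[],
  kafkaEvents: OnboardingRecord[],
): DriftReport {
  const pg = indexById(postgresRows);
  const kafka = indexById(kafkaEvents);
  const report: DriftReport = { missingInKafka: [], missingInPostgres: [], mismatched: [] };

  pg.forEach((row, id) => {
    const event = kafka.get(id);
    if (event === undefined) {
      report.missingInKafka.push(id);
    } else if (event.merchantName !== row.merchantName || event.status !== row.status) {
      report.mismatched.push(id);
    }
  });

  kafka.forEach((_event, id) => {
    if (!pg.has(id)) report.missingInPostgres.push(id);
  });

  return report;
}

// "Zero drift" here means all three lists are empty.
function hasZeroDrift(report: DriftReport): boolean {
  return (
    report.missingInKafka.length === 0 &&
    report.missingInPostgres.length === 0 &&
    report.mismatched.length === 0
  );
}
```

Under these assumptions, a cutover gate would amount to `hasZeroDrift` holding for every reconciliation run across the ten-day window.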
What shipped
- A typed TypeScript SDK with four primary endpoints: enrol, submitKyc, attachSettlement, invalidate. Full codegen from a single OpenAPI schema.
- A Kafka-backed write path with at-least-once delivery and idempotency keys sourced from the partner's own reference ID.
- A Datadog-instrumented trace that stitched partner SDK → gateway → KYC → settlement into a single flame graph.
- A partner sandbox with golden-path and failure-mode scenarios that new integrators could hit from a curl in 30 seconds.
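The pairing of at-least-once delivery with idempotency keys from the partner's reference ID can be sketched as follows. This is an illustrative in-memory model, not the shipped consumer: `EnrolRequest`, `EnrolResult`, and `IdempotentEnrolHandler` are hypothetical names, and a real deployment would presumably keep the key-to-result mapping in a durable store rather than a `Map`.

```typescript
// Hypothetical request/response shapes for the enrol endpoint.
interface EnrolRequest {
  partnerReferenceId: string; // doubles as the idempotency key
  merchantName: string;
}

interface EnrolResult {
  merchantId: string;
  replayed: boolean; // true when the request was a duplicate delivery
}

class IdempotentEnrolHandler {
  // Maps idempotency key -> merchant ID created on first delivery.
  private readonly seen = new Map<string, string>();
  private nextId = 1;

  handle(req: EnrolRequest): EnrolResult {
    const key = req.partnerReferenceId;
    const existing = this.seen.get(key);
    if (existing !== undefined) {
      // Duplicate delivery (retry or redelivered message): return the
      // original result instead of creating a second merchant.
      return { merchantId: existing, replayed: true };
    }
    const merchantId = `m-${this.nextId++}`;
    this.seen.set(key, merchantId);
    return { merchantId, replayed: false };
  }

  get merchantCount(): number {
    return this.seen.size;
  }
}

// Usage: the same message delivered twice creates exactly one merchant.
const handler = new IdempotentEnrolHandler();
const first = handler.handle({ partnerReferenceId: "partner-42", merchantName: "Chai Stall" });
const second = handler.handle({ partnerReferenceId: "partner-42", merchantName: "Chai Stall" });
console.log(first.merchantId === second.merchantId, second.replayed, handler.merchantCount);
```

Because the key comes from the partner's own reference ID, a retry from the partner and a redelivery from the broker collapse onto the same entry without the SDK having to mint and track a separate token.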
Where it is now
The SDK is live at 9M+ merchants across ten bank partners. The platform has survived two press cycles that drove 40× traffic spikes with zero incidents, and the SDK is now used internally for every new merchant-adjacent product BharatPe launches. The team that was worried about needing a second platform squad hasn't had to hire one yet.
