9. System Design / Backend Design
Tuan Nguyen


This section focuses on designing backend systems with real production concerns, including authentication, transactions, scalability, reliability, consistency, retries, rate limiting, pagination, idempotency, and distributed system patterns.

1. How would you design a user registration system?

A user registration system should allow a new user to create an account safely and reliably.

The basic flow is:

Client
→ POST /api/register
→ Validate request
→ Check if email already exists
→ Hash password
→ Save user
→ Send verification email

A good design should not store the password as plain text. The password should be hashed using BCrypt or another secure password hashing algorithm.

Example table:

users (
    id BIGINT PRIMARY KEY,
    email VARCHAR UNIQUE NOT NULL,
    password_hash VARCHAR NOT NULL,
    status VARCHAR NOT NULL,
    created_at TIMESTAMP
)

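The system should check for duplicate emails before creating the account. In production, however, a query-based check alone is not enough: two requests with the same email can both pass the check before either one inserts. The database must also enforce a unique constraint on email, and the application should handle the resulting constraint violation as a normal "email already registered" case.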

A good registration design also includes email verification. After registration, the user status can be PENDING_VERIFICATION. The system sends a verification token by email. After the user clicks the verification link, the account becomes ACTIVE.

For security, error responses should not reveal too much information. When registration fails, return a generic message and log the details on the server instead of exposing internal errors such as stack traces or SQL messages.


2. How would you design login with JWT?

Login with JWT usually starts with username/email and password.

Flow:

Client sends email + password
→ Backend validates credentials
→ Backend creates access token and refresh token
→ Client stores both tokens
→ Client sends access token in future requests

Example request:

POST /api/login

Example response:

{
  "accessToken": "...",
  "refreshToken": "..."
}

The access token should be short-lived, for example 15 minutes. The refresh token can live longer, for example several days or weeks.

For each protected request, the client sends:

Authorization: Bearer <access_token>

The backend validates:

Token signature
Token expiration
User identity
User roles

In Spring Boot, this is usually handled by a JWT filter inside Spring Security.

Important design point: JWT is stateless, so the server does not need to store session data for every request. However, refresh tokens usually should be stored or tracked so they can be revoked.


3. How would you design password reset?

Password reset should be secure because attackers often target this flow.

Basic flow:

User requests password reset
→ Backend generates reset token
→ Token is stored with expiration time
→ Email is sent to user
→ User clicks link
→ User submits new password
→ Backend validates token
→ Password is hashed and updated
→ Token is invalidated

A reset token should be random, long, and temporary.

Example table:

password_reset_tokens (
    id BIGINT PRIMARY KEY,
    user_id BIGINT,
    token_hash VARCHAR,
    expires_at TIMESTAMP,
    used BOOLEAN
)

Do not store the raw reset token directly. Store only a hash of the token. If the database leaks, attackers see hashes, which cannot be turned back into valid reset links.
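
A minimal sketch of generating and checking such a token, using only the Java standard library. The class and method names (ResetTokens, newRawToken, hash, matches) are hypothetical. SHA-256 is an assumption that is reasonable here because the token is long and random; it would not be a suitable choice for hashing passwords.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper; names are illustrative only.
final class ResetTokens {
    private static final SecureRandom RANDOM = new SecureRandom();

    // 32 random bytes, URL-safe Base64 so the value can be placed in the reset link.
    static String newRawToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // SHA-256 digest of the raw token, Base64-encoded; this is the value for token_hash.
    static String hash(String rawToken) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(rawToken.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Hashes the submitted token and compares it to the stored hash in constant time.
    static boolean matches(String submittedRawToken, String storedHash) {
        return MessageDigest.isEqual(
                hash(submittedRawToken).getBytes(StandardCharsets.UTF_8),
                storedHash.getBytes(StandardCharsets.UTF_8));
    }
}
```

The raw token goes only into the email. The database row keeps hash(rawToken), expires_at, and the used flag.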

The reset link can look like:

https://app.com/reset-password?token=abc123

The system should also avoid revealing whether an email exists. A safe response is:

If the email exists, a reset link has been sent.

This prevents user enumeration attacks.


4. How would you design an order management system?

An order management system handles the lifecycle of an order.

Typical order states:

CREATED
PAID
PACKED
SHIPPED
DELIVERED
CANCELLED
REFUNDED

Basic design:

Order Service
→ Order database
→ Inventory Service
→ Payment Service
→ Notification Service

Important tables:

orders (
    id BIGINT PRIMARY KEY,
    user_id BIGINT,
    status VARCHAR,
    total_amount DECIMAL,
    created_at TIMESTAMP
)

order_items (
    id BIGINT PRIMARY KEY,
    order_id BIGINT,
    product_id BIGINT,
    quantity INT,
    price DECIMAL
)

A strong design should treat order status transitions carefully. For example, an order should not move from CANCELLED back to SHIPPED.
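
One way to enforce this is an explicit transition table. The sketch below uses the states listed above. The allowed transitions themselves are an assumption for illustration, and OrderStateMachine is a hypothetical name; real business rules may differ.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum OrderStatus { CREATED, PAID, PACKED, SHIPPED, DELIVERED, CANCELLED, REFUNDED }

// Hypothetical transition table; the allowed moves are assumed, not prescribed.
final class OrderStateMachine {
    private static final Map<OrderStatus, Set<OrderStatus>> ALLOWED =
            new EnumMap<>(OrderStatus.class);

    static {
        ALLOWED.put(OrderStatus.CREATED, EnumSet.of(OrderStatus.PAID, OrderStatus.CANCELLED));
        ALLOWED.put(OrderStatus.PAID,
                EnumSet.of(OrderStatus.PACKED, OrderStatus.CANCELLED, OrderStatus.REFUNDED));
        ALLOWED.put(OrderStatus.PACKED, EnumSet.of(OrderStatus.SHIPPED, OrderStatus.CANCELLED));
        ALLOWED.put(OrderStatus.SHIPPED, EnumSet.of(OrderStatus.DELIVERED));
        ALLOWED.put(OrderStatus.DELIVERED, EnumSet.of(OrderStatus.REFUNDED));
        // Terminal states: nothing may follow them.
        ALLOWED.put(OrderStatus.CANCELLED, EnumSet.noneOf(OrderStatus.class));
        ALLOWED.put(OrderStatus.REFUNDED, EnumSet.noneOf(OrderStatus.class));
    }

    static boolean canTransition(OrderStatus from, OrderStatus to) {
        return ALLOWED.get(from).contains(to);
    }

    // Returns the new status, or throws if the move is not in the table.
    static OrderStatus transition(OrderStatus from, OrderStatus to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition: " + from + " -> " + to);
        }
        return to;
    }
}
```

Keeping the rules in one table means every code path that changes order status goes through the same check.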

For large systems, order creation may publish events:

OrderCreatedEvent
OrderPaidEvent
OrderShippedEvent

Other services can react asynchronously.

This makes the system more scalable and loosely coupled.


5. How would you design inventory management?

Inventory management tracks product stock.

Basic table:

inventory (
    product_id BIGINT PRIMARY KEY,
    available_quantity INT,
    reserved_quantity INT,
    updated_at TIMESTAMP
)

When an order is created, the system usually should not deduct the final stock right away, because the payment may still fail. A common approach is to reserve the stock first.

Flow:

Order created
→ Reserve inventory
→ Payment succeeds
→ Confirm inventory deduction

Reserving stock up front prevents overselling while payment is still pending.

For concurrency, database locking or optimistic locking can be used.

Example optimistic locking:

inventory (
    product_id BIGINT,
    quantity INT,
    version INT
)

When updating stock, the system checks the version. If another transaction changed the row, the update fails and the system retries.
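
The version check and retry can be sketched in memory as follows. InventoryStore is a hypothetical stand-in for the inventory table, and updateIfVersionMatches emulates a versioned UPDATE statement; a real system would run that statement against the database and look at the affected row count.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the inventory table (illustrative only).
final class InventoryStore {
    // Immutable snapshot of one row.
    static final class Row {
        final int quantity;
        final int version;
        Row(int quantity, int version) { this.quantity = quantity; this.version = version; }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long productId, int quantity) {
        rows.put(productId, new Row(quantity, 0));
    }

    synchronized Row read(long productId) {
        return rows.get(productId);
    }

    // Emulates: UPDATE inventory SET quantity = ?, version = version + 1
    //           WHERE product_id = ? AND version = ?
    // Returns true if the row was updated, false if the version was stale.
    synchronized boolean updateIfVersionMatches(long productId, int newQuantity, int expectedVersion) {
        Row current = rows.get(productId);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        rows.put(productId, new Row(newQuantity, expectedVersion + 1));
        return true;
    }

    // Read, check stock, attempt the versioned write; retry on conflict.
    boolean decrement(long productId, int amount, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Row row = read(productId);
            if (row == null || row.quantity < amount) {
                return false; // not enough stock
            }
            if (updateIfVersionMatches(productId, row.quantity - amount, row.version)) {
                return true;
            }
        }
        return false; // gave up after repeated conflicts
    }
}
```

A writer holding a stale version never overwrites newer data; it has to re-read the row and decide again.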

Inventory design must handle race conditions carefully because many users may buy the same product at the same time.


6. How would you design payment processing?

Payment processing should be reliable, secure, and idempotent.

Basic flow:

Client creates order
→ Backend creates payment request
→ Payment provider processes payment
→ Provider sends webhook
→ Backend updates order status

Payment should usually be handled through external providers such as Stripe, Adyen, PayPal, or bank APIs.

Important design points:

Never trust only frontend payment confirmation.
Always verify payment status from provider.
Use webhook events.
Use idempotency keys.
Store payment status history.

Example table:

payments (
    id BIGINT PRIMARY KEY,
    order_id BIGINT,
    provider_payment_id VARCHAR,
    amount DECIMAL,
    status VARCHAR,
    idempotency_key VARCHAR UNIQUE,
    created_at TIMESTAMP
)

Payment status examples:

PENDING
AUTHORIZED
CAPTURED
FAILED
REFUNDED

If the client retries the same payment request, the idempotency key prevents duplicate charges.


7. How would you design a notification service?

A notification service sends emails, SMS, push notifications, or in-app messages.

Basic design:

Main service publishes event
→ Kafka / Queue
→ Notification service consumes event
→ Sends notification

Example event:

{
  "type": "ORDER_CONFIRMED",
  "userId": 10,
  "orderId": 1001
}

The notification service should be asynchronous. The order system should not wait for email sending to finish.

A good design includes:

Retry mechanism
Dead letter topic
Template management
Notification status table
Rate limiting
Provider fallback

Example table:

notifications (
    id BIGINT PRIMARY KEY,
    user_id BIGINT,
    type VARCHAR,
    status VARCHAR,
    sent_at TIMESTAMP
)

This allows tracking whether notification delivery succeeded or failed.


8. How would you design a file upload service?

A file upload service should not usually store large files directly in the application database.

Better design:

Client uploads file
→ Backend validates metadata
→ File stored in object storage
→ Metadata stored in database

Object storage can be:

AWS S3
MinIO
Azure Blob Storage
Google Cloud Storage

Database stores only metadata:

files (
    id BIGINT PRIMARY KEY,
    owner_id BIGINT,
    file_name VARCHAR,
    content_type VARCHAR,
    size BIGINT,
    storage_path VARCHAR,
    created_at TIMESTAMP
)

Important validations:

File size limit
File type validation
Virus scanning
Access control
Unique file name
Private/public visibility

For large files, use pre-signed URLs. The backend generates an upload URL, and the client uploads directly to object storage. This reduces backend load.


9. How would you design audit logging?

Audit logging records important actions in the system.

Examples:

User logged in
Password changed
Order cancelled
Admin changed user role
Payment refunded

Audit logs should answer:

Who did what?
When?
From where?
What changed?

Example table:

audit_logs (
    id BIGINT PRIMARY KEY,
    actor_id BIGINT,
    action VARCHAR,
    entity_type VARCHAR,
    entity_id VARCHAR,
    old_value JSON,
    new_value JSON,
    created_at TIMESTAMP,
    ip_address VARCHAR
)

Audit logs should usually be append-only. Users should not be able to modify old audit records.

In high-scale systems, audit logs may be sent to Kafka and stored asynchronously in a separate database or log storage system.


10. How would you design rate limiting?

Rate limiting controls how many requests a user or client can send in a period.

Example:

100 requests per minute per user

Common strategies:

Fixed window
Sliding window
Token bucket
Leaky bucket

For a single application instance, in-memory rate limiting can work.
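
As an illustration of the in-memory case, here is a small token bucket. It is a sketch, not a production limiter: TokenBucket and its parameters are hypothetical names, it keeps one bucket per instance of the class, and the clock is injected so the behavior can be tested without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical single-instance token bucket using integer arithmetic.
final class TokenBucket {
    private final long capacity;
    private final long nanosPerToken;
    private final LongSupplier nanoClock;
    private long tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long nanosPerToken, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.nanosPerToken = nanosPerToken;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Takes one token if available; returns false when the caller should get a 429.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        long earned = (now - lastRefillNanos) / nanosPerToken;
        if (earned > 0) {
            tokens = Math.min(capacity, tokens + earned);
            // Advance only by whole tokens so leftover time still counts later.
            lastRefillNanos += earned * nanosPerToken;
        }
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }
}
```

For the "100 requests per minute per user" example, one bucket per user could be created as new TokenBucket(100, 600_000_000L, System::nanoTime), which refills one token every 600 ms.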

For multiple instances, use Redis.

Example Redis key:

rate_limit:user:123:/api/login

Rate limiting is important for:

Login protection
API abuse prevention
DDoS reduction
Cost control
Fair usage

If the limit is exceeded, return:

429 Too Many Requests

11. How would you design search functionality?

Search can be simple or advanced.

For simple search, database queries may be enough:

SELECT * FROM products WHERE name LIKE '%phone%';

But for large-scale search, use a search engine:

Elasticsearch
OpenSearch
Solr

Design:

Main database stores source of truth
Search engine stores indexed searchable data
Application queries search engine

When data changes, publish an event:

ProductUpdatedEvent
→ Search Indexer
→ Update Elasticsearch index

This makes search faster and more flexible.

Search systems often support:

Full-text search
Filtering
Sorting
Ranking
Autocomplete
Typo tolerance

12. How would you design pagination for millions of records?

For small datasets, offset pagination is common:

LIMIT 20 OFFSET 1000

But for millions of records, large offsets become slow because the database still has to read and discard every skipped row before it returns the page.

Better approach: cursor-based pagination.

Example:

GET /orders?limit=20&cursor=last_seen_id

SQL:

SELECT *
FROM orders
WHERE id > :lastSeenId
ORDER BY id
LIMIT 20;

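Cursor pagination is faster because the database can seek directly to the cursor position through the index on id and read only the next 20 rows, instead of scanning and discarding the skipped ones.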
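
The same idea can be sketched in memory. In this hypothetical OrderPager, a TreeMap stands in for the index on id, and tailMap(lastSeenId, false) plays the role of WHERE id > :lastSeenId ORDER BY id. The Page type and the use of a null nextCursor to signal the last page are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory illustration of cursor (keyset) pagination.
final class OrderPager {
    static final class Page {
        final List<String> items;
        final Long nextCursor; // null when there are no more rows
        Page(List<String> items, Long nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    private final NavigableMap<Long, String> ordersById = new TreeMap<>();

    void add(long id, String order) {
        ordersById.put(id, order);
    }

    // Returns up to `limit` orders with id greater than lastSeenId, in id order.
    Page fetch(long lastSeenId, int limit) {
        List<String> items = new ArrayList<>();
        Long lastId = null;
        boolean hasMore = false;
        for (Map.Entry<Long, String> entry : ordersById.tailMap(lastSeenId, false).entrySet()) {
            if (items.size() == limit) {
                hasMore = true; // at least one more row exists after this page
                break;
            }
            items.add(entry.getValue());
            lastId = entry.getKey();
        }
        return new Page(items, hasMore ? lastId : null);
    }
}
```

The client passes nextCursor back as the cursor parameter of the next request and stops when it is null.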

Recommended for:

Large tables
Infinite scrolling
High-traffic APIs
Event feeds
Transaction history

Offset pagination is easier for page numbers, but cursor pagination scales better.


13. How would you design soft delete?

Soft delete means records are not physically removed from the database.

Instead, mark them as deleted.

Example:

users (
    id BIGINT PRIMARY KEY,
    email VARCHAR,
    deleted_at TIMESTAMP NULL
)

When deleting:

UPDATE users SET deleted_at = NOW() WHERE id = 1;

Queries should filter:

WHERE deleted_at IS NULL

Benefits:

Recovery possible
Audit history preserved
Safer delete operation
Compliance support

Problems:

Queries become more complex
Unique constraints become tricky
Old data grows
Performance may degrade

Soft delete is common in enterprise systems but must be designed carefully.


14. How would you design a multi-tenant application?

Multi-tenancy means one application serves multiple customers or organizations.

Example:

Tenant A
Tenant B
Tenant C

Common approaches:

Shared database, shared tables

users (
    id BIGINT,
    tenant_id BIGINT,
    email VARCHAR
)

Every table includes tenant_id.

This is cheaper and easier to operate, but tenant isolation is weaker.

Shared database, separate schemas

Each tenant has its own schema.

Better isolation, more complex operations.

Separate database per tenant

Strongest isolation, but highest operational complexity.

Many SaaS systems use shared tables with a tenant_id column.

Important rule:

Every query must filter by tenant_id.

Missing tenant filtering can cause serious data leakage.


15. How would you design role-based access control?

RBAC controls access using users, roles, and permissions.

Example:

User → Role → Permission

Tables:

users
roles
permissions
user_roles
role_permissions

Example:

User: John
Role: ADMIN
Permissions: USER_READ, USER_DELETE

In Spring Security:

@PreAuthorize("hasRole('ADMIN')")

or:

@PreAuthorize("hasAuthority('USER_DELETE')")

A good design separates roles from permissions. Roles are groups. Permissions are specific actions.

This gives more flexibility in large systems.


16. How would you design API versioning?

API versioning allows APIs to evolve without breaking existing clients.

Common approaches:

URL versioning

/api/v1/users
/api/v2/users

Easy to understand and commonly used.

Header versioning

Accept: application/vnd.company.v2+json

Cleaner URLs but more complex.

Query parameter versioning

/api/users?version=2

Simple but less preferred for public APIs.

For most backend interviews, URL versioning is acceptable and practical.

Important design point:

Do not break existing clients suddenly.
Support old versions for a planned deprecation period.

17. How would you design a retry mechanism?

A retry mechanism handles temporary failures.

Examples:

Network timeout
External API temporary failure
Database deadlock
Kafka processing error

A good retry design uses:

Max retry count
Exponential backoff
Dead letter queue/topic
Idempotency
Monitoring

Example:

Retry after 1s
Retry after 5s
Retry after 30s
Move to DLT

Retries should not continue forever. Infinite retry can overload systems.

Also, retry should be used only for transient failures. Do not retry validation errors.
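
A minimal sketch of these rules: a bounded number of attempts, a growing delay, and retries only for failures marked as transient. Retry, TransientException, and the parameter names are hypothetical; how failures are classified as transient is an assumption left to the caller.

```java
import java.util.function.Supplier;

// Hypothetical retry helper; names are illustrative only.
final class Retry {
    // Marker for failures worth retrying (timeouts, deadlocks, and similar).
    static final class TransientException extends RuntimeException {
        TransientException(String message) { super(message); }
    }

    // Runs the task, retrying only on TransientException, with a growing delay.
    static <T> T withBackoff(Supplier<T> task, int maxAttempts,
                             long initialDelayMillis, double multiplier) {
        long delayMillis = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.get();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted; caller can route the work to a DLQ/DLT
                }
                sleep(delayMillis);
                delayMillis = (long) (delayMillis * multiplier);
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

Any other exception, such as a validation error, propagates on the first attempt without a retry. The retried operation should be idempotent, since a timed-out call may have succeeded on the remote side.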


18. How would you design a distributed transaction?

A distributed transaction occurs when one business operation touches multiple services or databases.

Example:

Create order
Reserve inventory
Charge payment
Send notification

Using one database transaction across all services is difficult and usually not recommended in microservices.

Better design:

Saga pattern
Outbox pattern
Idempotency
Compensating actions
Event-driven communication

Instead of trying to make everything immediately consistent, the system coordinates steps and handles failures carefully.

Example:

Order created
→ Inventory reserved
→ Payment failed
→ Inventory released
→ Order cancelled

This is more realistic for distributed systems.


19. What is the Saga pattern?

The Saga pattern manages a distributed transaction as a sequence of local transactions.

Each service performs its own transaction and publishes an event.

Example:

Order Service creates order
Inventory Service reserves stock
Payment Service charges payment
Shipping Service prepares shipment

If one step fails, compensating actions are executed.

Example failure:

Payment failed
→ Release inventory
→ Cancel order

Saga can be implemented in two ways:

Choreography: services react to events
Orchestration: central coordinator controls flow

Saga is common in microservices because it avoids global database transactions.
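
The orchestration variant can be sketched as a coordinator that runs steps in order and, on failure, runs the compensations of the completed steps in reverse. SagaOrchestrator and Step are hypothetical names, and the steps here are plain in-process callbacks; in a real system each one would call another service or publish a message.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-process saga coordinator (illustrative only).
final class SagaOrchestrator {
    private static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;
        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    SagaOrchestrator step(String name, Runnable action, Runnable compensation) {
        steps.add(new Step(name, action, compensation));
        return this;
    }

    // Runs steps in order. If one fails, undoes the completed steps in reverse order.
    boolean execute() {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

With steps for creating the order, reserving stock, and charging payment, a payment failure triggers "release inventory" and then "cancel order", matching the failure example above. This sketch does not persist saga state, so it would not survive a crash of the coordinator.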


20. What is the Outbox pattern?

The Outbox pattern ensures that database changes and event publishing stay consistent.

Problem:

Save order to database succeeds
Kafka event publishing fails

Now the system has data but no event.

Outbox solves this by saving the event in the same database transaction.

Example:

orders
outbox_events

Transaction:

Save order
Save OrderCreatedEvent into outbox table
Commit transaction

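Then a background worker reads unpublished outbox events and publishes them to Kafka. The worker can crash after publishing but before marking an event as sent, so the same event may be published more than once. Consumers should therefore be idempotent.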

This guarantees that if the order is saved, the event is also recorded.

The Outbox pattern is a key building block for reliable event-driven systems.


21. What is CQRS?

CQRS stands for Command Query Responsibility Segregation.

It separates:

Commands = write operations
Queries = read operations

Example:

Command side:
Create order
Update order status

Query side:
Get order details
Search orders
Generate reports

In simple systems, read and write use the same model.

In CQRS, read and write models may be different.

Benefits:

Optimized reads
Optimized writes
Better scalability
Clearer responsibility

CQRS is useful when read requirements and write requirements are very different.


22. What is Event Sourcing?

Event Sourcing stores state changes as events instead of storing only the latest state.

Traditional approach:

orders table contains current status = PAID

Event sourcing approach:

OrderCreated
PaymentReceived
OrderShipped

Current state is rebuilt by replaying events.
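
A minimal sketch of such a replay, using the three events above. OrderEvent and OrderProjection are hypothetical names, and the mapping from events to status values is an assumption for illustration.

```java
import java.util.List;

enum OrderEvent { ORDER_CREATED, PAYMENT_RECEIVED, ORDER_SHIPPED }

// Hypothetical projection that folds an event stream into the current status.
final class OrderProjection {
    static String replay(List<OrderEvent> events) {
        String status = "NONE"; // no events yet, so the order does not exist
        for (OrderEvent event : events) {
            switch (event) {
                case ORDER_CREATED:
                    status = "CREATED";
                    break;
                case PAYMENT_RECEIVED:
                    status = "PAID";
                    break;
                case ORDER_SHIPPED:
                    status = "SHIPPED";
                    break;
            }
        }
        return status;
    }
}
```

Replaying only a prefix of the stream gives the state at an earlier point in time.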

Benefits:

Full history
Auditability
Replay capability
Debugging
Temporal reconstruction

Challenges:

More complexity
Event versioning
Query performance
Data model difficulty

Event sourcing is powerful but should not be used everywhere.


23. What is eventual consistency?

Eventual consistency means data may not be immediately consistent across all services, but it becomes consistent after some time.

Example:

Order created
Payment processed
Notification sent later

At one moment:

Order Service says PAID
Notification Service has not sent email yet

This is acceptable if the system eventually reaches the correct state.

Eventual consistency is common in:

Microservices
Kafka systems
Distributed databases
Asynchronous processing

It improves scalability but requires careful failure handling.


24. What is strong consistency?

Strong consistency means every read sees the latest committed write.

Example:

User updates balance
Next read must show new balance immediately

This is important for:

Bank balances
Payments
Inventory reservation
Critical financial operations

Strong consistency is easier in a single database transaction.

In distributed systems, strong consistency is harder and may reduce availability or performance.


25. How would you design a high-traffic REST API?

A high-traffic REST API should be designed for scalability, reliability, and performance.

Important areas:

Stateless application servers
Load balancer
Caching
Database indexing
Rate limiting
Pagination
Async processing
Monitoring
Horizontal scaling

Basic architecture:

Client
→ Load Balancer
→ Multiple Spring Boot instances
→ Cache
→ Database
→ Message queue

Use Redis for caching frequently accessed data.

Use database indexes for common queries.

Use Kafka or queues for heavy background work.

Avoid returning huge responses. Paginate every endpoint that returns a list.

Monitor:

Latency
Error rate
CPU
Memory
Database query time
Consumer lag

26. How would you scale a Spring Boot application?

Spring Boot applications are usually scaled horizontally.

Instead of making one server bigger, run multiple instances.

Example:

Spring Boot instance 1
Spring Boot instance 2
Spring Boot instance 3

Put a load balancer in front.

Important requirement:

Application should be stateless.

Do not store session data in local memory. Use JWT or external session storage like Redis.

Also scale supporting systems:

Database
Cache
Kafka consumers
External API clients
Background workers

Application scaling alone is not enough if the database becomes the bottleneck.


27. How would you handle database bottleneck?

First, identify the bottleneck using monitoring and query analysis.

Common solutions:

Add indexes
Optimize queries
Avoid N+1 queries
Use pagination
Use caching
Use read replicas
Archive old data
Partition large tables
Move heavy work to asynchronous processing

For read-heavy systems, use Redis cache or database read replicas.

For write-heavy systems, optimize transactions and reduce unnecessary writes.

For very large tables, consider partitioning.

Important interview point:

Do not scale blindly. Measure first.

Use slow query logs, execution plans, and metrics.


28. How would you handle external API failure?

External APIs are unreliable, so the system must protect itself.

Use:

Timeout
Retry with backoff
Circuit breaker
Fallback
Queue-based async processing
Monitoring

Example:

Payment provider is down
→ Do not block forever
→ Mark payment as PENDING
→ Retry later
→ Notify user if needed

A circuit breaker prevents repeated calls to a failing service: after several failures it stops calling the service for a while and fails fast instead.
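
The following is a simplified sketch of the idea, not a replacement for a library such as Resilience4j. The CircuitBreaker class, its thresholds, and the injected millisecond clock are all hypothetical. It counts consecutive failures, opens after a threshold, and allows one trial call after a cool-down period.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical minimal circuit breaker (illustrative only).
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clockMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    // Calls the remote service unless the breaker is open; uses the fallback otherwise.
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clockMillis.getAsLong() - openedAt < openMillis) {
                return fallback.get(); // fail fast, do not touch the remote service
            }
            state = State.HALF_OPEN; // cool-down elapsed, allow one trial call
        }
        try {
            T result = remoteCall.get();
            state = State.CLOSED;
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clockMillis.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

Holding the lock during the remote call keeps the sketch short but serializes callers; a production breaker would not do that.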

If the external API is not critical, return a fallback response.

If it is critical, store the request and process it asynchronously later.


29. How would you handle duplicate requests?

Duplicate requests happen because of:

Client retry
Network timeout
User double-click
Payment retry
Message redelivery

The solution is idempotency.

Client sends an idempotency key:

Idempotency-Key: abc-123

The backend stores the key together with the result of the request.

If the same key appears again, backend returns the previous result instead of executing again.

Example table:

idempotency_keys (
    idempotency_key VARCHAR PRIMARY KEY,
    request_hash VARCHAR,
    response_body JSON,
    status VARCHAR,
    created_at TIMESTAMP
)

This is especially important for:

Payments
Order creation
Booking systems
Money transfer

30. How would you design an idempotency key?

An idempotency key is a unique value generated by the client for a request.

Example:

POST /payments
Idempotency-Key: payment-abc-123

Backend flow:

Check if key exists
If not exists → process request → store result
If exists → return previous result

Important design rules:

Key must be unique per operation
Store request hash to prevent key reuse with different payload
Use database unique constraint
Set expiration time
Return same response for repeated request

Example:

idempotency_records (
    idempotency_key VARCHAR UNIQUE,
    request_hash VARCHAR,
    response_status INT,
    response_body JSON,
    created_at TIMESTAMP,
    expires_at TIMESTAMP
)

If the same key arrives with a different request body, return an error such as:

409 Conflict

This protects the system from accidental duplicate operations and is essential for reliable payment and order systems.
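
The backend flow can be sketched in memory as follows. IdempotencyStore, Response, and handle are hypothetical names. A ConcurrentHashMap stands in for the table with its unique constraint, and expiration is left out to keep the sketch short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical in-memory idempotency store (illustrative only).
final class IdempotencyStore {
    static final class Response {
        final int status;
        final String body;
        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    private static final class Record {
        final String requestHash;
        final Response response;
        Record(String requestHash, Response response) {
            this.requestHash = requestHash;
            this.response = response;
        }
    }

    private final Map<String, Record> records = new ConcurrentHashMap<>();

    // Runs the operation at most once per key and replays the stored response afterwards.
    Response handle(String key, String requestHash, Supplier<Response> operation) {
        // computeIfAbsent is atomic per key, similar to an insert guarded by a unique constraint.
        Record record = records.computeIfAbsent(
                key, k -> new Record(requestHash, operation.get()));
        if (!record.requestHash.equals(requestHash)) {
            return new Response(409, "Idempotency key reused with a different payload");
        }
        return record.response;
    }
}
```

A database-backed version would typically insert the key first, treat a unique-constraint violation as "already seen", and also handle requests that are still in progress.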
