System Design Interview Questions: Microservices Architecture

Reviewed by Mark Dickie · Last updated 29 June 2026

Microservices architecture is an approach to building software where a single application is split into small, independently deployable services, each owning its own data and running its own process. Interviews on this topic test whether you can make deliberate trade-offs, not just recite definitions — expect questions on service boundaries, inter-service communication, failure isolation, and how data consistency works without a shared database. You should be comfortable explaining when microservices are the wrong call, since interviewers specifically probe that judgment. Knowing the CAP theorem and the two-phase commit problem cold will carry you a long way.

What interviewers actually care about

Microservices questions at interview are rarely about naming patterns. They are about showing you understand the cost of distribution — network latency, partial failure, eventual consistency — and can pick the right tool for a given constraint.

The most common weak spots candidates show:

Treating service decomposition as a purely technical exercise instead of aligning boundaries to team ownership and business capabilities (the "team topology" argument for Domain-Driven Design).
Defaulting to synchronous REST everywhere, without considering where async messaging reduces coupling and absorbs traffic spikes.
Ignoring the data layer — splitting services while keeping a shared database negates most of the independence you were aiming for.
Skipping observability: a distributed system with no distributed tracing is nearly impossible to debug under pressure.
Forgetting that coordination between services reintroduces the complexity you split apart — design for failure first, convenience second.

Core concepts and how they connect

| Concept | What it means in practice | Common interview angle |
|---|---|---|
| Service decomposition | Splitting by bounded context or business capability, not by technical layer | "How would you break up a monolith?" |
| Synchronous vs async communication | REST/gRPC for request-response; message queues (Kafka, RabbitMQ) for event-driven flows | "How do services talk without tight coupling?" |
| Data isolation | Each service owns its schema; cross-service queries use APIs or events | "How do you handle joins across services?" |
| Saga pattern | Sequence of local transactions with compensating actions on failure (choreography or orchestration) | "How do you manage a distributed transaction?" |
| Circuit breaker | Stops calling a failing downstream service to prevent cascading failures | "What happens when Service B is down and A depends on it?" |
| Service discovery | Services register themselves; clients look up addresses dynamically (e.g. Consul, Kubernetes DNS) | "How does Service A find Service B's address?" |
| API gateway | Single entry point that handles routing, auth, rate-limiting, and protocol translation | "How do you expose 40 services to a mobile client?" |
| Observability | Structured logs, metrics, and distributed traces (e.g. OpenTelemetry + Jaeger) | "How do you debug a slow request across five services?" |
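
To make the circuit-breaker row concrete, here is a minimal sketch in Python. The class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are assumptions for illustration, not the API of any particular library; production breakers usually add per-endpoint state, metrics, and a limit on concurrent trial calls.

```python
import time


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures, then fails fast."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: do not touch the failing dependency at all.
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Service A would wrap each call to Service B in `breaker.call(...)` and treat the fail-fast error as a signal to serve a fallback or a degraded response.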

Difficulty range on this quiz

Questions here span the full difficulty scale:

Level 1–2 — Definitions and basic trade-offs (monolith vs microservices, when to use each).
Level 3 — Pattern application: design a checkout service, pick a communication style, explain data ownership.
Level 4 — Failure scenarios: circuit breakers, saga rollbacks, idempotency across retries.
Level 5 — Open-ended architecture: design a ride-sharing dispatch system or a multi-region event-driven platform at scale, justifying every boundary.

Work through the lower levels first if service decomposition or the CAP theorem feels uncertain — the harder questions build directly on those foundations.

At a glance

Questions	15
Difficulty	2–5 of 5
Formats	Multiple choice, Ordering, Multiple answer, True / false

What you'll review

microservices
system design architecture
horizontal vs vertical scaling
caching strategies
CAP and consistency
sharding
message queues
idempotency keys
SQL vs NoSQL
availability
CDN and edge caching
replication

Practice questions

System Design/sd-architecture/microservices

A 6-person startup with one product and frequent cross-cutting changes wants to split its monolith into ~15 microservices 'to scale like the big companies.' What is the strongest argument to stay a (modular) monolith for now?

Options

Monoliths are always faster than microservices because every in-process function call beats a network hop, so raw latency arithmetic alone settles the architecture decision regardless of team size, scaling needs, organizational structure, or deployment cadence
Microservices cannot use a relational database, and the team's data model depends on one, so the split is technically impossible to carry out
Microservices eliminate the need for integration testing, a safety net this small team cannot afford to give up at its current stage of growth
Microservices mainly pay off at team and scaling boundaries this team doesn't have yet, while adding network failure modes, deploy coordination, and distributed transactions — a tax a modular monolith avoids while keeping clean module boundaries

Show answer

Stay a modular monolith: microservices mainly pay off at team and scaling boundaries this team doesn't have yet, while adding network failure modes, deploy coordination, and distributed transactions — a tax a modular monolith avoids while keeping clean module boundaries. A 6-person single-product team with frequent cross-cutting changes would pay that tax immediately and gain none of the independent-deployment benefit, with a clear path to extract services later.

Why:

Microservices buy independent deployability and scaling per team/domain — value that materialises when many teams own separate domains with divergent scaling needs. A 6-person single-product team has none of that yet, but would immediately pay the tax: network failure modes, distributed transactions/saga complexity, deploy and versioning coordination, and cross-cutting changes (their stated norm) now touching 15 repos at once. A modular monolith gives clean module boundaries and a path to extract services later, without that tax. The distractors are false absolutes: in-process speed isn't a blanket win that decides architecture (a), microservices use relational databases routinely (b), and they add rather than remove integration-testing surface (c).

System Design/sd-architecture

You are designing a globally distributed, multi-region active-active database (e.g., a shopping-cart service). Each region accepts both reads and writes and replicates asynchronously to the others. A user updates their cart from two different regions nearly simultaneously before either update has replicated. Which mechanism best detects and resolves this write conflict while preserving availability across all regions?

Options

Last-Write-Wins (LWW) using each node's local wall-clock timestamp
Two-phase commit (2PC) coordinated across all regions before acknowledging each write
Vector clocks (or version vectors) attached to each write, with application-level or CRDT-based merge on conflict detection
Read repair triggered at query time to reconcile diverged replicas

Show answer

Vector clocks (or version vectors) with application-level or CRDT-based merging are the best fit. They capture causal relationships between writes across regions, allowing the system to detect true conflicts (concurrent writes with no causal ordering) and apply deterministic merge logic — without requiring cross-region coordination on the write path. LWW risks silent data loss due to clock skew, 2PC sacrifices availability, and read repair only acts on the read path.

Why:

In a multi-region active-active architecture, the core challenge with eventual consistency is handling concurrent writes to the same record from different regions. Vector clocks (or version vectors) track causality across nodes so that conflicts can be detected rather than silently overwritten; once a conflict is detected, application-level logic or a CRDT merges the divergent versions deterministically. Last-Write-Wins (LWW) based solely on wall-clock time is unsafe because clocks can skew, making it possible to overwrite a newer write with an older one. Two-phase commit provides strong consistency but adds cross-region coordination latency and blocks writes when a region is unreachable, the opposite of what active-active targets. Read repair is a read-path technique: it reconciles replicas after they have diverged and still needs a conflict-resolution rule to decide which version wins.
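
As a rough illustration of the detection step, the sketch below compares two vector clocks represented as plain dictionaries mapping a region name to a counter. The function names and the cart merge rule (union of items, keeping the larger quantity) are assumptions chosen for brevity, not a prescribed design.

```python
def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for two vector clocks."""
    regions = set(a) | set(b)
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in regions)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in regions)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b, so b supersedes a
    if b_le_a:
        return "after"
    return "concurrent"      # neither dominates: a true conflict


def merge_clocks(a, b):
    """Element-wise maximum: the clock of the merged version."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


def merge_carts(cart_a, cart_b):
    """Hypothetical merge rule: union of items, keeping the larger quantity."""
    items = set(cart_a) | set(cart_b)
    return {item: max(cart_a.get(item, 0), cart_b.get(item, 0)) for item in items}


# Two regions update the same cart before either write has replicated.
us_clock, us_cart = {"us": 2}, {"book": 1}
eu_clock, eu_cart = {"us": 1, "eu": 1}, {"book": 2, "pen": 1}

if compare(us_clock, eu_clock) == "concurrent":
    merged_cart = merge_carts(us_cart, eu_cart)
    merged_clock = merge_clocks(us_clock, eu_clock)
```

A union-style merge never loses an addition, but it can resurrect an item that one region deleted; handling removals correctly needs tombstones or a CRDT designed for that.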

System Design/sd-architecture/microservices

A checkout flow spans three microservices: inventory, payment, and fulfilment. You need all three to succeed or the whole operation to roll back. Using a distributed two-phase commit (2PC) across the services is rejected. What is the standard microservices alternative and what does it trade away?

Options

The Saga pattern: break the operation into a sequence of local transactions with compensating transactions for rollback, trading strong consistency (ACID across services) for eventual consistency and more complex failure handling
Long polling: the orchestrator polls each service until all three confirm success or any confirms failure, providing ACID guarantees without the coordinator overhead of 2PC
Moving all three services back into a single monolith so an in-process database transaction can wrap all three operations atomically
Event sourcing: store the intent in an append-only log and replay it until all services catch up, providing linearizable consistency across the distributed system

Show answer

The standard alternative is the Saga pattern: break the operation into a sequence of local transactions with compensating transactions for rollback. It trades strong consistency (ACID across services) for eventual consistency and more complex failure handling. If a step fails, completed steps are undone — release the inventory, void the payment — so the design needs careful idempotency and retry handling. This is the accepted trade for service autonomy and fault isolation.

Why:

The Saga pattern decomposes a cross-service transaction into a series of local service transactions, each either publishing an event that triggers the next step (choreography) or running on a command from a central orchestrator (orchestration). If a step fails, previously completed steps are rolled back via compensating transactions (e.g. release the reserved inventory, void the payment). What you give up is ACID atomicity across services: the system is eventually consistent, partial state is visible during execution, and compensations can themselves fail, which requires careful idempotency and retry design. This is the accepted trade for autonomy and fault isolation. Long polling (b) is a transport mechanism that provides no consistency semantics. Re-merging into a monolith (c) defeats the whole point of the microservices architecture. Event sourcing (d) is a data-modelling pattern that offers durability and audit, not cross-service atomicity, and it is orthogonal to, not a replacement for, saga coordination.
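
A minimal orchestration-style sketch of the idea follows. `run_saga` and the step names are hypothetical, each service call is replaced by a plain function, and the sketch assumes compensations always succeed, which a real implementation cannot assume.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure.

    Returns True if every step succeeded, False if the saga was rolled back.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order. The failed step made no change, so only
            # the steps that actually completed are compensated.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log = []


def charge_card():
    raise RuntimeError("card declined")   # simulate the payment step failing


succeeded = run_saga([
    (lambda: log.append("reserve inventory"), lambda: log.append("release inventory")),
    (charge_card, lambda: log.append("void payment")),
    (lambda: log.append("schedule fulfilment"), lambda: log.append("cancel fulfilment")),
])
```

Between the reservation and its release, other readers can observe the reserved stock; that visible intermediate state is the consistency being traded away.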

System Design/sd-architecture/microservices

A platform team proposes adding a service mesh (e.g. Istio) on top of the existing API gateway. A developer argues they are redundant. Which statement correctly distinguishes their purposes?

Options

The API gateway handles north-south traffic (external clients → services) with concerns like auth, rate limiting, and routing; the service mesh handles east-west traffic (service-to-service inside the cluster) with mTLS, retries, and telemetry injected via sidecar proxies — they are complementary, not redundant
They are genuinely redundant: the API gateway is the newer technology that subsumes all service-mesh features, so adding a mesh once a gateway exists provides no additional value
The service mesh replaces the API gateway for external traffic because sidecar proxies terminate TLS at the pod level, making an ingress gateway unnecessary
The API gateway is an application-layer concern deployed once; the service mesh is an infrastructure tool that replaces the database connection pool so services don't need to manage their own DB connections

Show answer

They are complementary, not redundant. The API gateway handles north-south traffic (external clients to services) with auth, rate limiting, and routing; the service mesh handles east-west traffic (service-to-service inside the cluster) with mTLS, retries, and telemetry injected via sidecar proxies. The gateway cannot see internal pod-to-pod calls, and the mesh is not designed as an external ingress, so each covers a boundary the other does not.

Why:

They operate at different traffic boundaries. The API gateway is the north-south edge: it handles the public API surface — authentication, TLS termination, routing, rate limiting, and request aggregation for external callers. The service mesh is east-west: it intercepts every inter-service call via sidecar proxies (e.g. Envoy) to provide mTLS between pods, circuit breaking, retries, load balancing, and per-call telemetry without any changes to application code. These two layers complement each other — the gateway cannot see internal pod-to-pod calls, and the mesh is not designed as an external ingress. A gateway does not subsume mesh features (b); they have different scopes. Sidecars handle inter-service calls, not external ingress (c). A service mesh has nothing to do with database connection pooling (d) — that is managed by the application or a separate proxy like PgBouncer.

System Design/sd-architecture

A distributed key-value store experiences a network partition that splits its nodes into a majority group and a minority group. Arrange the following architectural decision steps in the correct order a system architect should reason through when handling this scenario, from first principle to final action:

Put these in order

Detect that a network partition has occurred (e.g., via heartbeat timeouts or gossip failure detection)
Identify which nodes form the majority partition and which form the minority partition (e.g., by quorum count)
Determine whether the system's SLA prioritises Consistency (CP) or Availability (AP) under partition
If AP: allow writes on both partition sides and track divergence (e.g., with vector clocks); if CP: reject writes on the minority side entirely to preserve a single source of truth
After the partition heals, perform reconciliation — applying conflict-resolution rules (e.g., LWW, vector clocks, CRDTs) for AP systems; for CP systems, minority nodes re-sync from the majority log (no minority writes were accepted, so there is nothing to replay)

Show answer

The correct order is: (A) Detect the partition → (B) Identify majority vs. minority nodes → (C) Determine CP vs. AP SLA → (D) Enforce the policy (CP: reject minority writes outright; AP: accept all writes and track divergence) → (E) Reconcile after healing (AP: conflict resolution; CP: minority re-syncs from the majority log). Identifying the majority/minority split must precede enforcement, and true CP systems reject — not queue — minority-side writes.

Why:

The question tests understanding of the CAP theorem applied to a concrete split-brain scenario. First, the system must detect the partition (A); without detection, no decision can be made. Second, nodes must identify majority vs. minority (B); this quorum assessment is a prerequisite to knowing which side to suppress or allow. Third, the architect must consult the system's SLA to decide between CP and AP behaviour (C). Fourth, the system enforces that decision (D): an AP system (e.g., Cassandra, DynamoDB) accepts writes on both sides and logs divergence; a true CP system (e.g., etcd, which uses Raft, or ZooKeeper, which uses the Paxos-like Zab protocol) outright rejects writes that cannot achieve quorum on the minority side. Queuing those writes would defer the consistency decision and is an AP-leaning strategy, not CP. Finally, after healing (E), AP systems perform conflict resolution (LWW, vector clocks, CRDTs); CP systems simply re-sync minority nodes from the authoritative majority log, because those minority nodes accepted no writes and have nothing to replay.
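
Steps (B) to (D) can be sketched as a small decision function. The function names and return strings are illustrative assumptions; real systems make this decision inside their consensus or replication protocol, not in a standalone helper.

```python
def has_quorum(reachable_nodes, cluster_size):
    """A strict majority of the full cluster must be reachable."""
    return reachable_nodes > cluster_size // 2


def handle_write(reachable_nodes, cluster_size, mode):
    """Decide what one side of a partition does with an incoming write."""
    if mode == "CP":
        # The minority side rejects outright rather than queuing the write.
        return "accept" if has_quorum(reachable_nodes, cluster_size) else "reject"
    if mode == "AP":
        # Both sides accept; divergence is reconciled after the partition heals.
        return "accept-and-track-divergence"
    raise ValueError(f"unknown mode: {mode}")
```

In a five-node cluster split three to two, the CP policy accepts writes on the three-node side and rejects them on the two-node side. In an even split, such as two and two out of four, neither side holds a strict majority, so a CP system rejects writes on both.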

System Design/sd-fundamentals/horizontal-vertical

Your API runs on a fleet behind a round-robin load balancer, but users intermittently get logged out as requests land on different nodes. The team stores session state in each node's local memory. What is the single most important change to enable safe horizontal scaling?

Options

Move session state out of the app process into a shared store (e.g. Redis) so any node can serve any request
Enable sticky sessions on the load balancer and keep state in local memory
Scale each node vertically with more RAM so sessions never get evicted
Add a CDN in front of the API to cache the session responses

Show answer

Move session state out of each node's local memory into a shared store such as Redis, so any node can serve any request. The nodes are stateful today — each holds session data only it can see — which is why round-robin routing logs users out. Externalising state makes nodes interchangeable, the precondition for safe horizontal scaling, rolling deploys, and failover.

Why:

The root problem is that the app nodes are stateful: each holds session data only it can see. Externalising state to a shared store makes the nodes interchangeable, which is the precondition for horizontal scaling and also makes rolling deploys and failover safe, since any surviving node can pick up any session. Sticky sessions are a workaround, not a fix: they pin users to a node, so a deploy or crash still drops their sessions, and the pinning skews load distribution across the fleet. Vertical scaling (more RAM) raises the ceiling but keeps the state trapped on one box, so the cross-node inconsistency remains. A CDN caches public, cacheable responses; per-user session data is neither, so it does nothing here.
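
The difference shows up in a few lines. In this sketch a plain dictionary stands in for the session store (an assumption made to keep the example self-contained); in production the shared store would be something like Redis, accessed over the network.

```python
class AppNode:
    """An app server that keeps sessions in whatever store it is given."""

    def __init__(self, session_store):
        self.sessions = session_store

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)   # None means "logged out"


# Stateful nodes: each has its own private dictionary.
node_a, node_b = AppNode({}), AppNode({})
node_a.login("s1", "alice")
local_result = node_b.whoami("s1")     # the round-robin hop loses the session

# Stateless nodes: both point at the same external store.
shared_store = {}
node_c, node_d = AppNode(shared_store), AppNode(shared_store)
node_c.login("s1", "alice")
shared_result = node_d.whoami("s1")    # any node can serve the request
```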

System Design/sd-fundamentals/caching-strategies

A read-heavy product catalog uses a cache in front of the database. The team wants fresh reads immediately after a write with the lowest steady-state read latency, accepting slightly slower writes. Which caching strategy best fits?

Options

Cache-aside (lazy loading) with no write-side update
Write-through: write to the cache and the database synchronously on every write
Write-behind (write-back): acknowledge the write from cache and flush to the DB asynchronously
TTL-only caching with a 60-second expiry and no write path

Show answer

Use a write-through cache: write to the cache and the database synchronously on every write. The just-written value is already in cache for the next read, giving fresh reads immediately at the cost of a slightly slower write — exactly the trade described. Cache-aside only populates on a read miss, and TTL or write-behind strategies both permit a window of staleness.

Why:

Write-through updates the cache and the database in the same write path, so the just-written value is already in cache for the next read. You get fresh reads immediately at the cost of a slower write, exactly the trade the team accepted. Plain cache-aside only populates the cache on a read miss, so without a delete-on-write an already-cached entry keeps serving the old value until it is evicted or expires; with a delete-on-write, the first read after each write is a miss that pays a database round trip. Write-behind acknowledges before the DB is durable, trading consistency and durability for write speed, the opposite of the stated priority. TTL-only caching permits staleness for up to the TTL window, which violates the "fresh immediately" requirement.
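
A minimal write-through sketch, assuming a dictionary stands in for the database and ignoring eviction, concurrency, and partial failure between the two writes:

```python
class WriteThroughCache:
    """Illustrative write-through cache in front of a dict-like database."""

    def __init__(self, db):
        self.db = db        # stand-in for the system of record
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value       # persist to the database first
        self.cache[key] = value    # then update the cache in the same write path

    def read(self, key):
        if key in self.cache:
            return self.cache[key]     # hit: no database round trip
        value = self.db[key]           # miss: load and populate
        self.cache[key] = value
        return value


db = {"sku-1": 10}
catalog = WriteThroughCache(db)
before = catalog.read("sku-1")     # populates the cache
catalog.write("sku-1", 12)         # slower write: two stores updated
after = catalog.read("sku-1")      # served from cache, already fresh
```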

System Design/sd-data/cap-consistency

Per the CAP theorem, during an active network partition a distributed datastore must choose between two properties. Which pair describes the real choice it faces?

Options

Consistency or Availability — partition tolerance is a given for any networked system
Consistency or Partition tolerance — availability is always preserved
Availability or Partition tolerance — consistency is always preserved
Latency or Durability — CAP is purely about write performance

Show answer

The real choice is Consistency or Availability — partition tolerance is a given for any networked system. CAP says that when a partition occurs, a system can preserve either linearizable consistency or availability, not both. You cannot opt out of partitions in a real distributed system, so P is assumed and the live decision is C-versus-A.

Why:

CAP says that when a partition occurs, a system can preserve either linearizable Consistency or Availability, not both. Partition tolerance is not optional in a real distributed system, because networks drop packets and nodes fail, so P is assumed and the live decision is C-vs-A. The other framings misstate this: you cannot "choose" to forgo partition tolerance and still be distributed, and consistency is precisely the property at risk, not the one that is guaranteed. Latency belongs to PACELC's else-clause, which trades latency against consistency when there is no partition, and durability is a write-tuning concern; both are separate axes from the partition-time CAP choice.

System Design/sd-data/sharding

You are sharding a high-write events table across many partitions. Using created_at (a monotonically increasing timestamp) as the shard key causes one shard to absorb nearly all writes. What is the best remedy?

Options

Shard on a high-cardinality key such as a hash of the tenant/entity id so writes spread evenly
Add more replicas to the busy shard so it can keep up
Increase the shard count but keep created_at as the key
Switch the busy shard to a larger instance type (vertical scale)

Show answer

Shard on a high-cardinality key, such as a hash of the tenant or entity id, so writes spread evenly across partitions. A monotonic timestamp routes every current write to whichever shard owns the latest range, creating a hot partition no matter how many shards exist. Adding replicas or vertically scaling the busy shard only raises one box's ceiling; neither balances the write distribution.

Why:

A monotonic timestamp routes all current writes to whichever shard owns the latest range, creating a hot partition no matter how many shards exist. Choosing a high-cardinality key (a hash of tenant or entity id) distributes writes uniformly, which is the actual fix. Adding replicas helps read load but not the write hotspot — replicas still funnel writes through one leader. Raising the shard count without changing the key leaves the newest range concentrated on a single shard. Vertically scaling the hot shard only raises its ceiling and reintroduces a single point of contention; it does not balance the distribution.
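
The contrast can be sketched by routing the same batch of writes both ways. The function names, the use of SHA-256, and the range boundaries are assumptions for illustration; real systems often use consistent hashing so that changing the shard count does not remap most keys.

```python
import hashlib
from collections import Counter


def hash_shard_for(entity_id, shard_count):
    """Hash sharding: a stable hash of the id spreads writes across shards."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def range_shard_for(created_at, boundaries):
    """Range sharding: shard i holds timestamps below boundaries[i];
    the last shard takes everything at or above the final boundary."""
    for i, upper in enumerate(boundaries):
        if created_at < upper:
            return i
    return len(boundaries)


# 1,000 new events with increasing timestamps, all newer than every boundary.
events = [(f"tenant-{i}", 1000 + i) for i in range(1000)]

by_time = Counter(range_shard_for(ts, [100, 500, 900]) for _, ts in events)
by_hash = Counter(hash_shard_for(tenant, 4) for tenant, _ in events)
```

Here `by_time` puts every write on the newest shard, while `by_hash` spreads them across all four. The cost is that time-range queries now fan out to every shard, which is the trade to mention in an interview.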

System Design/sd-patterns/message-queues

Your checkout flow synchronously calls an email service, an analytics pipeline, and a fraud-scoring job, and a slow dependency now blocks orders. You introduce a message queue so checkout publishes events and workers consume them. Which benefits does this asynchronous decoupling genuinely provide?

Options

Checkout no longer blocks on slow consumers — it returns once the event is enqueued
The queue absorbs traffic spikes, buffering work so consumers can drain at their own pace
A temporarily down consumer can recover and process backlog without losing events
It guarantees end-to-end latency is lower than the synchronous version for every request
It removes the need for consumers to handle duplicate deliveries

Show answer

The real benefits are that checkout no longer blocks on slow consumers (it returns once the event is enqueued), the queue absorbs traffic spikes and buffers work so consumers drain at their own pace, and a temporarily down consumer can recover and process the backlog without losing events. It does not guarantee lower per-request end-to-end latency, and it does not remove the need to handle duplicate deliveries.

Why:

A queue decouples producers from consumers: checkout returns as soon as the event is enqueued (a), the queue acts as a buffer that smooths spikes so consumers process at a sustainable rate (b), and durable queues retain messages so a consumer that was down can drain the backlog on recovery (c). The two wrong options reflect common misconceptions. Async processing does not lower per-request end-to-end latency (d) — the downstream work still happens, just later; you trade latency-to-completion for responsiveness and resilience. And most queues offer at-least-once delivery, so consumers must be idempotent to tolerate duplicates (e); the queue does not remove that obligation.
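
A small in-memory sketch of points (a) and (e): checkout returns as soon as the event is enqueued, and the consumer deduplicates by event id because delivery is at-least-once. The `deque` stands in for a durable broker, and the names `checkout`, `EmailWorker`, and `event_id` are hypothetical.

```python
from collections import deque


def checkout(queue, order_id):
    """Publish an event and return immediately; no downstream call is awaited."""
    queue.append({"event_id": f"order-placed-{order_id}", "order_id": order_id})
    return "accepted"


class EmailWorker:
    """Consumer that tolerates duplicate deliveries by remembering event ids."""

    def __init__(self):
        self.seen = set()
        self.sent = []

    def handle(self, event):
        if event["event_id"] in self.seen:
            return                      # duplicate delivery: skip the side effect
        self.seen.add(event["event_id"])
        self.sent.append(event["order_id"])

    def drain(self, queue):
        while queue:
            self.handle(queue.popleft())


queue = deque()
status = checkout(queue, 42)
queue.append(dict(queue[0]))            # simulate the broker redelivering the event

worker = EmailWorker()
worker.drain(queue)                     # the worker catches up at its own pace
```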

System Design/sd-patterns/idempotency-keys

A mobile client retries a POST /payments when the network drops, risking a double charge. You add idempotency keys. Which statements about implementing them correctly are true?

Options

The client generates a unique key per logical operation and resends the same key on retries
The server persists the key with the operation's result and replays the stored result on a repeated key
Idempotency keys are most valuable for non-idempotent verbs like POST, where natural retries are unsafe
Generating a fresh key on each retry attempt is the recommended approach
Idempotency keys are unnecessary because TCP already guarantees exactly-once delivery

Show answer

Three statements are correct: the client generates a unique key per logical operation and resends the same key on retries, the server persists the key with the operation's result and replays the stored result on a repeat, and idempotency keys are most valuable for non-idempotent verbs like POST. Generating a fresh key per retry defeats the mechanism, and TCP guarantees reliable bytes, not application-level exactly-once.

Why:

Idempotency keys work because the client mints one key per logical operation and reuses it across retries (a), and the server records that key alongside the result so a repeat returns the stored outcome instead of executing twice (b). They matter most for non-idempotent methods like POST (c); GET, PUT, and DELETE are defined as idempotent by the HTTP specification, so a blind retry of those is safe provided the server honours those semantics. The wrong options break the mechanism: minting a new key per attempt (d) defeats the whole point, because the server sees each retry as a distinct operation and double-charges. And TCP guarantees reliable, ordered bytes within a connection, not application-level exactly-once semantics across reconnects and timeouts (e); that is exactly the gap idempotency keys fill.
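
A server-side sketch of (a) and (b), using in-memory dictionaries. `PaymentService` and its fields are hypothetical; a real service would persist the key and result in the same transaction as the charge and enforce uniqueness with a database constraint so that two concurrent retries cannot both execute.

```python
import uuid


class PaymentService:
    """Illustrative handler for POST /payments with idempotency keys."""

    def __init__(self):
        self.results = {}    # idempotency key -> stored result
        self.charges = []    # every charge actually executed

    def create_payment(self, idempotency_key, amount_cents):
        if idempotency_key in self.results:
            # Repeated key: replay the stored result, do not charge again.
            return self.results[idempotency_key]
        self.charges.append(amount_cents)
        result = {
            "payment_id": len(self.charges),
            "amount_cents": amount_cents,
            "status": "charged",
        }
        self.results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())                       # minted once per logical operation
first = service.create_payment(key, 500)
retry = service.create_payment(key, 500)      # network dropped; client retries
```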

System Design/sd-data/sql-vs-nosql

You are choosing between a relational database and a document/wide-column NoSQL store for a new service. Which considerations correctly favour reaching for NoSQL over a single relational primary?

Options

Access patterns are known and key-based, and you need predictable single-digit-ms reads at very high scale
The schema is flexible/evolving and writes are massive, with horizontal partitioning built in
You can model the workload to avoid cross-entity joins and tolerate eventual consistency
The core requirement is multi-row ACID transactions spanning several related tables
You rely heavily on ad-hoc analytical queries with complex joins and aggregations

Show answer

NoSQL is favoured when access patterns are known and key-based with a need for predictable single-digit-ms reads at very high scale, when the schema is flexible and writes are massive with horizontal partitioning built in, and when you can model the workload to avoid cross-entity joins and tolerate eventual consistency. Multi-row ACID transactions across related tables and ad-hoc analytical queries with complex joins are signals to stay relational.

Why:

NoSQL shines when access is key-based and predictable at scale (a), when the schema is fluid and write volume is huge with native horizontal partitioning (b), and when you can denormalise to avoid joins and accept eventual consistency (c). The remaining two are signals to stay relational. Multi-row ACID transactions across related tables (d) are exactly what a relational engine guarantees cleanly, whereas many NoSQL stores limit transactions to a single partition or item. Ad-hoc analytical queries with complex joins and aggregations (e) are the relational/SQL sweet spot; forcing them onto a key-value or document model leads to expensive scatter-gather or duplicated denormalised data. The choice is driven by access patterns and consistency needs, not by one store being universally "better."

System Design/sd-reliability/availability

You are hardening a service toward higher availability and want to eliminate single points of failure. Which design choices meaningfully improve availability through redundancy and failover?

Options

Run stateless app instances across multiple availability zones behind a health-checking load balancer
Use a replicated database with automated leader failover (promote a follower on primary loss)
Add health checks so the load balancer stops routing to unhealthy instances automatically
Deploy every component into a single availability zone to minimize cross-zone latency
Run a single oversized, highly reliable instance instead of several smaller ones

Show answer

Availability comes from redundancy with automatic failover: run stateless app instances across multiple availability zones behind a health-checking load balancer, use a replicated database with automated leader failover, and add health checks so the balancer stops routing to unhealthy instances. Deploying everything into a single AZ, or running one oversized instance, each reintroduces a single point of failure that takes the whole service down when it fails.

Why:

Availability comes from redundancy with automatic failover and removing single points of failure. Spreading stateless instances across multiple AZs behind a health-checking balancer (a) survives an instance or whole-zone outage; a replicated DB with automated leader promotion (b) removes the database as a single point of failure; and health checks that eject unhealthy nodes (c) keep traffic flowing only to working capacity. The two wrong options reduce availability. Putting everything in one AZ (d) trades resilience for latency — a single zone outage takes the whole service down. A single oversized instance (e) is the textbook single point of failure no matter how reliable the box: when it dies, you have zero capacity, which is why N+1 redundancy beats one big node.
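
The arithmetic behind "N+1 beats one big node" can be sketched as follows. The formulas assume instance failures are independent, which is optimistic in practice: instances in the same zone tend to fail together, which is why the multi-AZ spread matters.

```python
def redundant_availability(instance_availability, n):
    """Availability of n interchangeable instances where any one suffices,
    ASSUMING independent failures."""
    return 1 - (1 - instance_availability) ** n


def serial_availability(*component_availabilities):
    """Availability of a chain where every component must be up."""
    total = 1.0
    for a in component_availabilities:
        total *= a
    return total


single = redundant_availability(0.99, 1)      # one instance at 99%
pair = redundant_availability(0.99, 2)        # two instances behind a balancer
chain = serial_availability(0.999, pair, 0.999)   # balancer -> app tier -> database
```

Under these assumptions two 99% instances reach roughly 99.99%, but the chain is still bounded by its least available single component, so every tier needs its own redundancy.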

System Design/sd-fundamentals/cdn-edge

Your global app serves large static assets (images, JS bundles, video) and origin egress plus latency are hurting. You put a CDN / edge cache in front. Which outcomes are correct expectations of this change?

Options

Cached assets are served from edge PoPs close to users, cutting round-trip latency
Origin load and egress drop because the CDN absorbs repeat requests for cacheable content
Cache-Control / TTL headers govern how long edges serve content before revalidating with origin
Highly personalized, per-user dynamic API responses become trivially cacheable at the edge
A bad deploy is harmless because edges instantly reflect every origin change with no invalidation needed

Show answer

The correct expectations are that cached assets are served from edge PoPs close to users (cutting round-trip latency), origin load and egress drop because the CDN absorbs repeat requests for cacheable content, and Cache-Control/TTL headers govern how long edges serve content before revalidating. Highly personalized per-user responses are not trivially cacheable at a shared edge, and edges do not instantly reflect origin changes — stale objects live until TTL or an explicit purge.

Why:

A CDN caches content at edge points of presence near users (a), so it lowers latency and offloads repeat requests from the origin, cutting load and egress (b), with freshness controlled by Cache-Control/TTL and revalidation (c). The wrong options are classic edge-caching traps. Personalized, per-user dynamic responses are generally not cacheable at a shared edge (d) — caching them risks leaking one user's data to another, so they need private/no-store handling or edge-compute personalization, not naive caching. And edges do not instantly reflect origin changes (e): cached objects live until their TTL or an explicit purge/invalidation, which is precisely why a bad asset deploy can keep serving stale content until you invalidate the cache or bust the URL.
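
Point (e) is easy to demonstrate with a toy edge cache. `EdgeCache`, its TTL handling, and the dictionary standing in for the origin are assumptions for illustration; real CDNs honour the full Cache-Control semantics and revalidate with conditional requests rather than simply refetching.

```python
class EdgeCache:
    """Toy edge cache: serves a stored copy until its TTL expires or it is purged."""

    def __init__(self, origin, ttl_seconds, clock):
        self.origin = origin          # dict-like stand-in for the origin server
        self.ttl_seconds = ttl_seconds
        self.clock = clock            # injectable time source
        self.entries = {}             # path -> (body, expires_at)

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]                       # fresh enough: origin not contacted
        body = self.origin[path]                  # miss or expired: go to origin
        self.entries[path] = (body, now + self.ttl_seconds)
        return body

    def purge(self, path):
        self.entries.pop(path, None)              # explicit invalidation


now = [0]
origin = {"/app.js": "v1"}
edge = EdgeCache(origin, ttl_seconds=60, clock=lambda: now[0])

first = edge.get("/app.js")
origin["/app.js"] = "v2"              # new deploy at the origin
stale = edge.get("/app.js")           # edge still serves the cached copy
edge.purge("/app.js")
fresh = edge.get("/app.js")           # purge forces a refetch
```

Versioned asset URLs (for example a content hash in the filename) avoid the purge entirely, because a new deploy is a new cache key.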

System Design/sd-data/replication

With asynchronous primary-replica replication, a client can write to the primary and immediately read a stale value from a replica.

Show answer

True. With asynchronous replication, a read routed to a lagging replica can miss a value the same client has just written to the primary.

Why:

Asynchronous replication acknowledges the write on the primary before the change has propagated, so a read routed to a lagging replica can miss the just-written value. This is the classic read-your-writes anomaly. Mitigations include routing a user's reads to the primary for a short window after a write (sticky reads), using a synchronous or quorum write, or tracking replication lag and reading only from replicas that have caught up.
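The "route reads to the primary for a short window after a write" mitigation can be sketched as a small router. The class name, the five-second window, and the in-memory per-user timestamp map are assumptions for illustration. A real deployment would more likely keep this state in a session cookie or shared store, and size the window from observed replication lag.

```python
import time


class ReadYourWritesRouter:
    """Pin a user's reads to the primary for a short window after they write."""

    def __init__(self, pin_seconds: float = 5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed window; tune to replication lag
        self._clock = clock             # injectable so the logic can be tested
        self._last_write = {}           # user_id -> time of that user's last write

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id: str) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"  # replica may not have the user's write yet
        return "replica"      # no recent write: a possibly stale read is acceptable
```

Only the writing user is pinned, so other users keep reading from replicas and the primary absorbs just a small slice of read traffic. The trade-off is that the window is a guess: if replication lag exceeds `pin_seconds`, the anomaly can still occur.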

