Senior Software Engineer Interview Prep: Questions Across Python, SQL, System Design & More
Reviewed by Mark Dickie · Last updated
A Senior Software Engineer interview tests your ability to own technical decisions end-to-end — from writing reliable Python and SQL to designing distributed systems, building AI-integrated services, and reasoning about HTTP APIs and Node.js runtimes under real production constraints. To prepare well, you need depth in at least two or three of the core technology areas below, plus the ability to connect them: interviewers at this level want to see why you made a choice, not just what the syntax is. System Design questions carry heavy weight at senior level, so treat them as the spine of your prep and fill in the language and data-layer specifics around them. Strong candidates can also speak to AI Engineering patterns — retrieval-augmented generation, embedding pipelines, model-serving trade-offs — which have moved from niche to expected at many teams.
Core competency areas
| Area | What interviewers probe |
|---|---|
| Databases & SQL | Query optimisation, index design, transaction isolation, schema trade-offs (relational vs. document) |
| Python | Async execution, memory model, idiomatic data structures, testing patterns, packaging |
| AI Engineering | Prompt design, RAG pipelines, vector stores, model-serving latency, evaluation strategies |
| HTTP & APIs | REST vs. GraphQL vs. gRPC, auth flows (OAuth 2, JWT), rate limiting, versioning, caching headers |
| System Design | Capacity estimation, consistency vs. availability trade-offs, message queues, service boundaries |
| Node.js | Event loop mechanics, async/await pitfalls, streaming, module system, performance profiling |
Suggested study order
Work through these in sequence. Each layer builds on the previous one, so skipping ahead tends to leave gaps that surface badly in design rounds.
- SQL and data modelling fundamentals — if your query plans and index reasoning are shaky, every system-design answer that touches persistence will wobble with them.
- Python internals and async patterns — understand the GIL, asyncio, and how Python interacts with I/O before you try to design services that use it at scale.
- HTTP, APIs, and auth flows — most distributed systems interview questions assume you can talk fluently about how services communicate and how you secure those channels.
- Node.js runtime model — study the event loop and where it breaks down under CPU-bound load; this pairs directly with API and concurrency questions.
- AI Engineering concepts — ground yourself in the practical patterns (RAG, chunking strategies, vector search, evaluation loops) rather than the theory; that is what senior-level questions target.
- System Design end-to-end — bring everything together here. Practice talking through capacity, data flow, failure modes, and operational concerns on realistic prompts like "design a job queue" or "design a semantic search service."
Senior roles attract competitive compensation, and the market for engineers who can span backend systems and AI Engineering patterns is particularly strong right now. The live data below this intro reflects current salary ranges and demand signals for this role, so check those figures before you head into any negotiation.
What to study, in order
For a senior Software Engineer interview, prioritise the role's most in-demand technologies first:
- Databases & SQL
- Python
- AI Engineering
- HTTP & APIs
- System Design
- Node.js
Software Engineer salary & demand
From live job postings, the median Software Engineer salary is £55,000, across 297 recent postings. These figures are role-wide across all seniority levels, as of June 2026.
| Percentile | Salary |
|---|---|
| 25th percentile | £37,500 |
| Median (50th percentile) | £55,000 |
| 75th percentile | £82,500 |
Practice questions
AI Engineering/evaluation-safety/prompt-injection
Your agent reads untrusted web content and has tools that can read files and call internal APIs. Which of the following are genuine, meaningful mitigations against prompt injection?
Options
- Apply least privilege to tools and require human approval (or hard policy checks) for high-impact actions
- Clearly delimit untrusted content and instruct the model to treat it as data, not instructions
- Raise the temperature so the model is less predictable to attackers
- Validate and constrain tool outputs/arguments before executing them (allow-lists, schemas, sandboxing)
- Trust the system prompt to always win because it appears first
Answer
The genuine mitigations are applying least privilege to tools with human approval for high-impact actions, clearly delimiting untrusted content and labeling it as data not instructions, and validating or sandboxing tool outputs and arguments before executing them. These are layered and architectural. Raising the temperature does nothing for safety, and trusting the system prompt to always win is false: there is no hard precedence boundary in the token stream, so crafted injections routinely override prepended instructions.
Effective defenses are layered and architectural. Least privilege plus human-in-the-loop for dangerous actions (a) limits blast radius even when an injection succeeds. Delimiting untrusted text and labeling it as data (b) helps the model resist hijacking — it is necessary but not sufficient on its own. Validating/sandboxing what tools receive and do (d) stops a hijacked call from causing real damage. Raising temperature (c) does nothing for safety; it just adds randomness and can make the system less reliable. "The system prompt always wins" (e) is false — there is no hard precedence boundary in the token stream, and crafted injections routinely override prepended instructions, which is exactly why you cannot rely on prompt ordering alone.
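To make the least-privilege and validation layers concrete, here is a minimal sketch of a policy gate that runs before a model-proposed tool call executes. The tool names, the `READABLE_PREFIX` sandbox root, and the `check_tool_call` helper are all hypothetical and not taken from any real agent framework; a production gate would also need proper path canonicalisation and schema validation.

```python
from dataclasses import dataclass

# Hypothetical tool registry: allowed argument names per tool, plus a flag
# marking tools whose effects are high-impact. Illustrative names only.
ALLOWED_TOOLS = {
    "read_file": {"args": {"path"}, "high_impact": False},
    "call_internal_api": {"args": {"endpoint", "payload"}, "high_impact": True},
}
READABLE_PREFIX = "/srv/public/"  # assumed sandbox root for read_file


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_tool_call(name: str, args: dict, human_approved: bool = False) -> Decision:
    """Policy gate applied to every tool call the model proposes."""
    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        return Decision(False, "tool not on allow-list")
    if set(args) - spec["args"]:
        return Decision(False, "unexpected arguments")
    if name == "read_file":
        path = str(args.get("path", ""))
        # Crude containment check, for illustration only.
        if not path.startswith(READABLE_PREFIX) or ".." in path:
            return Decision(False, "path outside sandbox")
    if spec["high_impact"] and not human_approved:
        return Decision(False, "needs human approval")
    return Decision(True, "ok")
```

The point of the design is that the gate sits outside the model: even if injected text convinces the model to request `/etc/passwd`, the call is refused by code the attacker cannot talk to.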
Databases & SQL/querying/joins
A LEFT JOIN between customers (left) and orders (right) returns a customer who has placed no orders. What appears in that row's orders columns?
Options
- NULL in every orders column
- The row is omitted entirely
- 0 in every orders column
- An empty string in every orders column
Answer
Every orders column holds NULL. A LEFT JOIN keeps each left-hand row even when no right-hand row matches the ON condition, filling the unmatched right-hand columns with NULL rather than omitting the row or substituting 0 or empty strings. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers who have placed no orders.
A LEFT JOIN keeps every left-hand row even when no right-hand row matches the ON condition; the unmatched right-hand columns are filled with NULL. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers with no orders.
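The behaviour can be checked with a small in-memory example. This sketch uses Python's built-in `sqlite3` module as a stand-in database; the sample rows and names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0);  -- Grace has no orders
""")

# Every customer is kept; unmatched orders columns come back as NULL (None).
rows = conn.execute("""
    SELECT c.name, o.id, o.total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# Anti-join: keep only the customers whose right-hand side did not match.
no_orders = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

print(rows)       # [('Ada', 10, 25.0), ('Grace', None, None)]
print(no_orders)  # [('Grace',)]
```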
HTTP & APIs/http-protocol/status-codes
A request carries a valid, authenticated session, but the user lacks permission for the action. Which status code is correct?
Options
- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
Answer
403 Forbidden is correct when the request is authenticated but the user lacks permission for the action. The server understood the request and knows who the client is, but refuses it, so re-authenticating will not help. 401 Unauthorized is reserved for missing or invalid credentials and must include a WWW-Authenticate header.
403 Forbidden means the server understood the request and the client is authenticated, but is not allowed to perform it — re-authenticating will not help (RFC 9110 §15.5.4). 401 Unauthorized is for missing or invalid credentials and must include a WWW-Authenticate header. (404 is sometimes used deliberately to hide a resource's existence, but the textbook answer for a known authorization failure is 403.)
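The 401 versus 403 distinction is easy to express as a decision function. The sketch below assumes a hypothetical `user` dict with a `permissions` list; it is not tied to any web framework.

```python
from http import HTTPStatus
from typing import Optional


def authorization_status(user: Optional[dict], required_permission: str) -> HTTPStatus:
    """Pick a status for an access check (illustrative helper, not a framework API)."""
    if user is None:
        # No valid credentials: the client should authenticate and retry.
        return HTTPStatus.UNAUTHORIZED
    if required_permission not in user.get("permissions", ()):
        # Identity is known but not allowed: re-authenticating will not help.
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK
```

For example, `authorization_status({"permissions": ["read"]}, "delete")` returns `HTTPStatus.FORBIDDEN`, while passing `None` as the user returns `HTTPStatus.UNAUTHORIZED`.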
Node.js/node-modules/esm-node
A package's package.json has "type": "module". What does a .js file in that package use to import a dependency?
Options
- import x from "pkg"; (static ESM syntax)
- const x = require("pkg");
- module.import("pkg")
- include "pkg";
Answer
It uses static ESM syntax like import x from "pkg";. Setting "type": "module" makes .js files ES modules, where import/export is the import mechanism and require is not defined in scope. To keep using CommonJS require in such a package, name the file .cjs instead.
"type": "module" makes .js files ES modules, so they use import/export. require is not defined in an ES module scope (you would use createRequire to get it). To keep using CommonJS in such a package, name the file .cjs.
Python/iteration/generators
Which statements are true of a generator object created by a generator expression like (x for x in range(3))? Select all that apply.
Options
- It is iterated lazily, producing one value at a time
- It is single-pass — once exhausted it yields nothing more
- It supports len() to report how many values remain
- It supports indexing like g[0]
Answer
A generator is iterated lazily, producing one value at a time, and it is single-pass, so once exhausted it yields nothing more. It does not support len() or indexing like g[0], because there is no materialized collection behind it — you would convert it to a list first to get either.
A generator computes values on demand (lazy) and can be consumed only once — after it is exhausted, further iteration yields nothing. It does not support len() or indexing, because it has no materialized collection behind it; you would convert it to a list first to get either.
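A short script shows all four properties at once. The variable names are illustrative.

```python
g = (x for x in range(3))
first_pass = list(g)   # consumes the generator: [0, 1, 2]
second_pass = list(g)  # already exhausted: []

g2 = (x for x in range(3))
try:
    len(g2)
    len_supported = True
except TypeError:
    # Generators have no __len__.
    len_supported = False

try:
    g2[0]
    index_supported = True
except TypeError:
    # Generators are not subscriptable.
    index_supported = False

# Materialise into a list to get len() and indexing.
materialized = list(g2)
print(len(materialized), materialized[0])  # 3 0
```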
System Design/sd-patterns/message-queues
Your checkout flow synchronously calls an email service, an analytics pipeline, and a fraud-scoring job, and a slow dependency now blocks orders. You introduce a message queue so checkout publishes events and workers consume them. Which benefits does this asynchronous decoupling genuinely provide?
Options
- Checkout no longer blocks on slow consumers — it returns once the event is enqueued
- The queue absorbs traffic spikes, buffering work so consumers can drain at their own pace
- A temporarily down consumer can recover and process backlog without losing events
- It guarantees end-to-end latency is lower than the synchronous version for every request
- It removes the need for consumers to handle duplicate deliveries
Answer
The real benefits are that checkout no longer blocks on slow consumers (it returns once the event is enqueued), the queue absorbs traffic spikes and buffers work so consumers drain at their own pace, and a temporarily down consumer can recover and process the backlog without losing events. It does not guarantee lower per-request end-to-end latency, and it does not remove the need to handle duplicate deliveries.
A queue decouples producers from consumers: checkout returns as soon as the event is enqueued (a), the queue acts as a buffer that smooths spikes so consumers process at a sustainable rate (b), and durable queues retain messages so a consumer that was down can drain the backlog on recovery (c). The two wrong options reflect common misconceptions. Async processing does not lower per-request end-to-end latency (d) — the downstream work still happens, just later; you trade latency-to-completion for responsiveness and resilience. And most queues offer at-least-once delivery, so consumers must be idempotent to tolerate duplicates (e); the queue does not remove that obligation.
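The sketch below illustrates the producer returning immediately and a consumer that tolerates a duplicate delivery. It uses the standard-library `queue.Queue` as an in-process stand-in for a durable broker, and the event shape and handler names are assumptions made for the example.

```python
import queue

events = queue.Queue()  # in-process stand-in for a durable message broker


def checkout(order_id: str) -> str:
    """Publish an event and return without waiting for any consumer."""
    events.put({"event_id": f"order-placed:{order_id}", "order_id": order_id})
    return "accepted"


processed_ids = set()  # a real consumer would persist this
emails_sent = []


def handle(event: dict) -> None:
    """Idempotent consumer: a redelivered event is recognised and skipped."""
    if event["event_id"] in processed_ids:
        return
    processed_ids.add(event["event_id"])
    emails_sent.append(event["order_id"])


checkout("A1")
checkout("A2")
# Simulate at-least-once delivery by enqueueing A1 a second time.
events.put({"event_id": "order-placed:A1", "order_id": "A1"})

while not events.empty():
    handle(events.get())

print(emails_sent)  # ['A1', 'A2']
```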
AI Engineering/rag/retrieval-quality
A RAG system returns confident but wrong answers because the retrieved chunks are often irrelevant. Which changes are legitimate levers to improve retrieval quality?
Options
- Add a reranker (e.g. a cross-encoder) over the top-k candidates before passing them to the model
- Tune chunk size and overlap so chunks are semantically coherent and self-contained
- Combine dense (vector) retrieval with sparse keyword search (hybrid retrieval)
- Raise the generation model's max_tokens so it can write a longer answer
- Switch to a stronger embedding model better matched to your domain
Answer
The legitimate levers are adding a reranker such as a cross-encoder over the top-k candidates, tuning chunk size and overlap so chunks are coherent and self-contained, combining dense vector retrieval with sparse keyword search (hybrid retrieval), and switching to a stronger embedding model matched to your domain. Each improves which chunks reach the context. Raising the generation model's max_tokens only changes how long the answer may be — it does nothing about which documents were retrieved.
Retrieval quality is about getting the right chunks into context. A reranker (a) reorders the initial candidate set with a more expensive, more accurate cross-encoder so the best passages float to the top. Chunking strategy (b) directly affects whether a chunk contains a complete, embeddable idea rather than a fragment. Hybrid retrieval (c) catches cases where exact terms/IDs matter that dense vectors miss, and vice versa. A better-matched embedding model (e) improves the similarity signal at the source. Raising max_tokens (d) only changes how long the answer may be — it does nothing about which documents were retrieved, so it cannot fix retrieving the wrong material.
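One common way to combine a dense and a sparse ranking is reciprocal rank fusion (RRF). The text above does not prescribe a fusion method, so treat this as one possible sketch: the document IDs are made up, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs; each hit scores 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search output
sparse_ranking = ["doc_c", "doc_a", "doc_d"]  # hypothetical keyword-search output
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers agree on rise to the top, and the fused list is a natural candidate set to hand to a reranker.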
Databases & SQL/querying/aggregation
A table has 10 rows; the manager_id column is NULL in 3 of them. How do COUNT(*) and COUNT(manager_id) differ?
Options
- COUNT(*) returns 10; COUNT(manager_id) returns 7
- Both return 10
- Both return 7
- COUNT(*) returns 7; COUNT(manager_id) returns 10
Answer
COUNT(*) returns 10 and COUNT(manager_id) returns 7. COUNT(*) counts every row regardless of nulls, while COUNT(manager_id) counts only rows where that column is non-null, skipping the 3 nulls. Every standard aggregate except COUNT(*) ignores NULL inputs.
COUNT(*) counts rows regardless of nulls, so it returns 10. COUNT(manager_id) counts only rows where that expression is non-null, skipping the 3 nulls to return 7. Every standard aggregate except COUNT(*) ignores NULL inputs.
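The scenario in the question can be reproduced directly. This sketch uses Python's built-in `sqlite3` module with a hypothetical `employees` table holding 7 non-null and 3 null `manager_id` values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, manager_id INTEGER)")
# 10 rows in total: 7 with a manager, 3 with NULL.
conn.executemany(
    "INSERT INTO employees (manager_id) VALUES (?)",
    [(1,)] * 7 + [(None,)] * 3,
)

count_all, count_manager = conn.execute(
    "SELECT COUNT(*), COUNT(manager_id) FROM employees"
).fetchone()
print(count_all, count_manager)  # 10 7
```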
HTTP & APIs/http-protocol/headers
A client sends Content-Type: application/json and Accept: application/xml on a POST. What does this tell the server?
Options
- The request body is JSON; the client would prefer the response in XML
- The request body is XML; the client would prefer the response in JSON
- Both the request and response must be JSON
- The server must reject the request because the two headers conflict
Answer
It tells the server the request body is JSON and that the client would prefer the response in XML. Content-Type describes the media type of the body being sent, while Accept advertises which media types the client is willing to receive back. They describe opposite directions, so the two headers do not conflict.
Content-Type describes the media type of the body being sent (here the JSON request payload), while Accept advertises which media types the client is willing to receive in the response, driving content negotiation (RFC 9110 §8.3, §12.5.1). They describe opposite directions, so there is no conflict.
Node.js/fs-io/path-module
What is the key difference between path.join(...) and path.resolve(...)?
Options
- path.resolve produces an absolute path (resolving against the cwd if needed); path.join just concatenates segments with the separator
- path.join always returns an absolute path; path.resolve returns a relative one
- They are aliases and behave identically
- path.resolve only works on Windows
Answer
path.resolve produces an absolute path, resolving against the current working directory when needed, while path.join simply concatenates and normalizes segments. So path.join stays relative if its inputs are relative, whereas path.resolve('a', 'b') yields <cwd>/a/b.
path.join normalizes and joins segments but stays relative if the inputs are relative. path.resolve processes segments right-to-left until it has built an absolute path, falling back to process.cwd() — so path.resolve('a', 'b') yields <cwd>/a/b.
Python/context-resources/context-managers
Which statements about the with statement and the context-manager protocol are true? Select all that apply.
Options
- An object is a context manager if it defines __enter__ and __exit__
- __exit__ runs even when the with body raises an exception
- Returning a truthy value from __exit__ suppresses the exception
- @contextlib.contextmanager lets you write one without __exit__, using a generator that yields once
Answer
All four statements are true. An object is a context manager if it defines __enter__ and __exit__; __exit__ runs even when the body raises, which is what makes with reliable for cleanup; returning a truthy value from __exit__ suppresses the exception; and @contextlib.contextmanager lets you write one with a single-yield generator instead of those methods.
The protocol is exactly __enter__/__exit__. __exit__ is guaranteed to run on the way out — normal or exceptional — which is what makes with reliable for cleanup; if it returns a truthy value the propagating exception is suppressed. @contextlib.contextmanager wraps a single-yield generator: code before the yield is the enter, code after (in a finally) is the exit.
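Both styles can be seen side by side in a small example. The class and function names here (`Suppressing`, `tracked`) are made up for illustration.

```python
from contextlib import contextmanager

log = []


class Suppressing:
    """Class-based context manager that swallows ValueError only."""

    def __enter__(self):
        log.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        log.append(f"exit:{exc_type.__name__ if exc_type else None}")
        return exc_type is ValueError  # truthy return suppresses the exception


@contextmanager
def tracked(name):
    """Generator-based equivalent: code before yield enters, finally exits."""
    log.append(f"{name}:enter")
    try:
        yield name
    finally:
        log.append(f"{name}:exit")


with Suppressing():
    raise ValueError("swallowed by __exit__")

try:
    with tracked("res"):
        raise RuntimeError("cleanup runs, then this propagates")
except RuntimeError:
    log.append("caught")

print(log)  # ['enter', 'exit:ValueError', 'res:enter', 'res:exit', 'caught']
```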
System Design/sd-patterns/idempotency-keys
A mobile client retries a POST /payments when the network drops, risking a double charge. You add idempotency keys. Which statements about implementing them correctly are true?
Options
- The client generates a unique key per logical operation and resends the same key on retries
- The server persists the key with the operation's result and replays the stored result on a repeated key
- Idempotency keys are most valuable for non-idempotent verbs like POST, where natural retries are unsafe
- Generating a fresh key on each retry attempt is the recommended approach
- Idempotency keys are unnecessary because TCP already guarantees exactly-once delivery
Answer
Three statements are correct: the client generates a unique key per logical operation and resends the same key on retries, the server persists the key with the operation's result and replays the stored result on a repeat, and idempotency keys are most valuable for non-idempotent verbs like POST. Generating a fresh key per retry defeats the mechanism, and TCP guarantees reliable bytes, not application-level exactly-once.
Idempotency keys work because the client mints one key per logical operation and reuses it across retries (a), and the server records that key alongside the result so a repeat presents the same outcome instead of executing twice (b). They matter most for non-idempotent methods like POST (c) — GET/PUT/DELETE are already idempotent by definition, so a blind retry of those is safe. The wrong options break the mechanism: minting a new key per attempt (d) defeats the whole point — the server sees each retry as a distinct operation and double-charges. And TCP guarantees reliable, ordered bytes within a connection, not application-level exactly-once semantics across reconnects and timeouts (e); that is exactly the gap idempotency keys fill.
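A minimal in-memory sketch of the server side follows. `PaymentService` and its fields are hypothetical; a real implementation would store keys durably and make the check-and-insert atomic (for example with a unique constraint), which this single-threaded example does not attempt.

```python
import uuid


class PaymentService:
    """Toy payment endpoint that replays the stored result for a repeated key."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored result
        self.charges = []   # stand-in for real charges against a card

    def create_payment(self, idempotency_key: str, amount_pence: int) -> dict:
        if idempotency_key in self._results:
            # Repeat of a known operation: return the original outcome.
            return self._results[idempotency_key]
        self.charges.append(amount_pence)
        result = {"payment_id": len(self.charges), "amount_pence": amount_pence}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())                    # one key per logical operation
first = service.create_payment(key, 1999)
retry = service.create_payment(key, 1999)  # network retry reuses the same key
print(first == retry, len(service.charges))  # True 1
```

If the client minted a fresh key for the retry instead, the second call would miss the lookup and append a second charge.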
AI Engineering/ai-production/latency-cost
A chat feature feels slow and is expensive at scale. Which techniques are valid ways to reduce latency and/or cost in a production LLM application?
Options
- Stream tokens to the client to lower perceived latency (time-to-first-token)
- Route easy requests to a smaller/cheaper model and reserve the large model for hard ones
- Cache responses (or prompt prefixes) for repeated or near-identical requests
- Always pad every prompt with extra few-shot examples to be safe
- Trim unnecessary context and cap max_tokens to what the task needs
Answer
Valid levers are streaming tokens to lower perceived latency, routing easy requests to a smaller cheaper model while reserving the large model for hard ones, caching responses or prompt prefixes for repeated requests, and trimming unnecessary context while capping max_tokens to what the task needs. Each cuts cost, latency, or both. Padding every prompt with extra few-shot examples does the opposite: it inflates input tokens on every call and beyond a point adds no accuracy.
Streaming (a) doesn't change total compute but dramatically improves perceived speed by showing the first tokens immediately. Model routing / cascading (b) sends the bulk of easy traffic to a cheaper model, cutting average cost and latency while preserving quality on the hard tail. Caching (c) avoids paying for work you've already done. Trimming context and bounding output length (e) reduces both input and output tokens, which is where the bill and the time go. Indiscriminately padding every prompt with more few-shot examples (d) does the opposite — it inflates input tokens (cost and latency) on every call, and beyond a point adds no accuracy, so it is a regression, not an optimization.
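Response caching is the simplest of these levers to sketch. The example below assumes a hypothetical `fake_model` function standing in for a real LLM call, and it normalises case and whitespace so near-identical prompts share a cache entry; whether that normalisation is safe depends on the application.

```python
import hashlib

_cache = {}
model_calls = []


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an expensive LLM call."""
    model_calls.append(prompt)
    return f"answer to: {prompt}"


def cached_completion(prompt: str) -> str:
    """Return a cached answer when a normalised prompt has been seen before."""
    normalised = " ".join(prompt.lower().split())
    key = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = fake_model(prompt)
    return _cache[key]


a = cached_completion("What is RAG?")
b = cached_completion("  what is   RAG? ")  # near-identical: served from cache
print(a == b, len(model_calls))  # True 1
```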
Databases & SQL/indexes/btree
Which predicate can a standard B-tree index on created_at accelerate with a single index range scan?
Options
- WHERE created_at >= '2024-01-01'
- WHERE EXTRACT(YEAR FROM created_at) = 2024
- WHERE created_at::text LIKE '%2024%'
- WHERE created_at <> '2024-01-01'
Answer
WHERE created_at >= '2024-01-01' is the one a B-tree can accelerate with a single range scan. A B-tree stores keys in sorted order, so range and equality predicates on the bare column map to a contiguous slice. Wrapping the column in a function such as EXTRACT or a cast means the predicate is no longer on the indexed key, a leading-wildcard LIKE has no fixed prefix to seek to, and <> matches everything on both sides of one value, so none of them reduces to a single contiguous range.
A B-tree stores keys in sorted order, so range and equality predicates on the bare column (>=, >, <, BETWEEN, =) map to a contiguous slice it can scan. A function or cast on the column (EXTRACT, ::text) hides the indexed key from the planner unless a matching expression index exists; a leading-wildcard LIKE gives no prefix to seek on; and <> selects two disjoint ranges that usually cover almost the whole table, so planners generally prefer a sequential scan.
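You can watch a planner make this choice by asking it for its plan. The sketch below uses SQLite (via Python's `sqlite3`) as a stand-in, so it swaps the PostgreSQL-style EXTRACT for SQLite's `strftime`; the table and index names are hypothetical, and the exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT);
    CREATE INDEX idx_events_created_at ON events (created_at);
""")


def plan(sql: str) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Bare-column range predicate: the planner can seek into the index.
range_plan = plan("SELECT * FROM events WHERE created_at >= '2024-01-01'")

# Function-wrapped column: the index on created_at no longer applies.
func_plan = plan("SELECT * FROM events WHERE strftime('%Y', created_at) = '2024'")

print(range_plan)
print(func_plan)
```

The first plan should mention `idx_events_created_at`, while the second should fall back to scanning the table.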
HTTP & APIs/http-protocol/http-caching
Which Cache-Control directive guarantees that a response is never written to any cache (e.g. for a page showing private banking data)?
Options
- no-cache
- no-store
- max-age=0
- must-revalidate
Answer
no-store is the directive that guarantees a response is never written to any cache, which is what you want for private banking data. no-cache actually allows storage but forces revalidation before reuse, max-age=0 only makes a stored response immediately stale, and must-revalidate merely forbids serving stale responses once they expire.
no-store forbids any cache from storing the request or response at all (RFC 9111 §5.2.2.5). no-cache does allow storage but requires revalidation with the origin before reuse; max-age=0 makes a stored response immediately stale (still revalidatable); must-revalidate only forbids serving stale responses once they expire.
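As a rough illustration of how a cache might apply the storage rule, the sketch below parses a Cache-Control value and checks only for no-store. The `may_store` helper is hypothetical and deliberately ignores every other storage condition a real cache has to evaluate.

```python
def may_store(cache_control: str) -> bool:
    """Return False only when the header carries the no-store directive."""
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    return "no-store" not in directives


print(may_store("no-store"))                    # False
print(may_store("no-cache"))                    # True: storable, must revalidate
print(may_store("max-age=0, must-revalidate"))  # True: stored but stale at once
```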