Senior Software Engineer Interview Prep: Questions Across Python, SQL, System Design & More

Reviewed by Mark Dickie · Last updated 26 June 2026

A Senior Software Engineer interview tests your ability to own technical decisions end-to-end — from writing reliable Python and SQL to designing distributed systems, building AI-integrated services, and reasoning about HTTP APIs and Node.js runtimes under real production constraints. To prepare well, you need depth in at least two or three of the core technology areas below, plus the ability to connect them: interviewers at this level want to see why you made a choice, not just what the syntax is. System Design questions carry heavy weight at senior level, so treat them as the spine of your prep and fill in the language and data-layer specifics around them. Strong candidates can also speak to AI Engineering patterns — retrieval-augmented generation, embedding pipelines, model-serving trade-offs — which have moved from niche to expected at many teams.

Core competency areas

| Area | What interviewers probe |
|---|---|
| Databases & SQL | Query optimisation, index design, transaction isolation, schema trade-offs (relational vs. document) |
| Python | Async execution, memory model, idiomatic data structures, testing patterns, packaging |
| AI Engineering | Prompt design, RAG pipelines, vector stores, model-serving latency, evaluation strategies |
| HTTP & APIs | REST vs. GraphQL vs. gRPC, auth flows (OAuth 2, JWT), rate limiting, versioning, caching headers |
| System Design | Capacity estimation, consistency vs. availability trade-offs, message queues, service boundaries |
| Node.js | Event loop mechanics, async/await pitfalls, streaming, module system, performance profiling |

Suggested study order

Work through these in sequence. Each layer builds on the previous one, so skipping ahead tends to leave gaps that surface badly in design rounds.

1. SQL and data modelling fundamentals: if your query plans and index reasoning are shaky, every system-design answer that touches persistence will wobble with them.
2. Python internals and async patterns: understand the GIL, asyncio, and how Python interacts with I/O before you try to design services that use it at scale.
3. HTTP, APIs, and auth flows: most distributed systems interview questions assume you can talk fluently about how services communicate and how you secure those channels.
4. Node.js runtime model: study the event loop and where it breaks down under CPU-bound load; this pairs directly with API and concurrency questions.
5. AI Engineering concepts: ground yourself in the practical patterns (RAG, chunking strategies, vector search, evaluation loops) rather than the theory; that is what senior-level questions target.
6. System Design end-to-end: bring everything together here. Practice talking through capacity, data flow, failure modes, and operational concerns on realistic prompts like "design a job queue" or "design a semantic search service."

Senior roles attract competitive compensation, and the market for engineers who can span backend systems and AI Engineering patterns is particularly strong right now. The live data below this intro reflects current salary ranges and demand signals for this role, so check those figures before you head into any negotiation.

What to study, in order

For a senior Software Engineer interview, prioritise the role's most in-demand technologies first:

Databases & SQL
Python
AI Engineering
HTTP & APIs
System Design
Node.js

Software Engineer salary & demand

From live job postings, the median Software Engineer salary is £55,000, across 297 recent postings. These figures are role-wide across all seniority levels, as of June 2026.

| Percentile | Salary |
|---|---|
| 25th percentile | £37,500 |
| Median (50th percentile) | £55,000 |
| 75th percentile | £82,500 |

Practice questions

AI Engineering/evaluation-safety/prompt-injection

Your agent reads untrusted web content and has tools that can read files and call internal APIs. Which of the following are genuine, meaningful mitigations against prompt injection?

Options

(a) Apply least privilege to tools and require human approval (or hard policy checks) for high-impact actions
(b) Clearly delimit untrusted content and instruct the model to treat it as data, not instructions
(c) Raise the temperature so the model is less predictable to attackers
(d) Validate and constrain tool outputs/arguments before executing them (allow-lists, schemas, sandboxing)
(e) Trust the system prompt to always win because it appears first

Show answer

The genuine mitigations are applying least privilege to tools with human approval for high-impact actions, clearly delimiting untrusted content and labeling it as data not instructions, and validating or sandboxing tool outputs and arguments before executing them. These are layered and architectural. Raising the temperature does nothing for safety, and trusting the system prompt to always win is false: there is no hard precedence boundary in the token stream, so crafted injections routinely override prepended instructions.

Why:

Effective defenses are layered and architectural. Least privilege plus human-in-the-loop for dangerous actions (a) limits blast radius even when an injection succeeds. Delimiting untrusted text and labeling it as data (b) helps the model resist hijacking — it is necessary but not sufficient on its own. Validating/sandboxing what tools receive and do (d) stops a hijacked call from causing real damage. Raising temperature (c) does nothing for safety; it just adds randomness and can make the system less reliable. "The system prompt always wins" (e) is false — there is no hard precedence boundary in the token stream, and crafted injections routinely override prepended instructions, which is exactly why you cannot rely on prompt ordering alone.
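
The allow-list, schema, and approval checks described above can be sketched as a gate that runs before any model-proposed tool call executes. This is a minimal illustration, not a complete defence: the tool names, the sandbox path, and the `check_tool_call` helper are all hypothetical.

```python
import posixpath
from dataclasses import dataclass

# Hypothetical tool registry: names, argument schemas, and risk flags are illustrative.
ALLOWED_TOOLS = {
    "read_file": {"args": {"path": str}, "high_impact": False},
    "call_internal_api": {"args": {"endpoint": str, "payload": dict}, "high_impact": True},
}
ALLOWED_PATH_PREFIX = "/srv/agent-workspace/"  # assumed sandbox directory


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_tool_call(name: str, args: dict, human_approved: bool = False) -> Decision:
    """Validate a model-proposed tool call before anything is executed."""
    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        return Decision(False, "tool not on allow-list")
    # Schema check: exactly the expected argument names, each with the expected type.
    if set(args) != set(spec["args"]):
        return Decision(False, "unexpected or missing arguments")
    for key, expected_type in spec["args"].items():
        if not isinstance(args[key], expected_type):
            return Decision(False, f"argument {key!r} has wrong type")
    # Least privilege: file reads are confined to the sandbox, even after ".." tricks.
    if name == "read_file":
        normalized = posixpath.normpath(args["path"])
        if not normalized.startswith(ALLOWED_PATH_PREFIX):
            return Decision(False, "path outside sandbox")
    # High-impact actions need an explicit human (or policy) sign-off.
    if spec["high_impact"] and not human_approved:
        return Decision(False, "human approval required")
    return Decision(True, "ok")
```

The point of the design is that the gate sits outside the model: even if injected text persuades the model to request a dangerous call, the call is rejected by code the attacker cannot rewrite.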

Databases & SQL/querying/joins

A LEFT JOIN between customers (left) and orders (right) returns a customer who has placed no orders. What appears in that row's orders columns?

Options

NULL in every orders column
The row is omitted entirely
0 in every orders column
An empty string in every orders column

Show answer

Every orders column holds NULL. A LEFT JOIN keeps each left-hand row even when no right-hand row matches the ON condition, filling the unmatched right-hand columns with NULL rather than omitting the row or substituting 0 or empty strings. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers who have placed no orders.

Why:

A LEFT JOIN keeps every left-hand row even when no right-hand row matches the ON condition; the unmatched right-hand columns are filled with NULL. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers with no orders.
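
To see the NULL fill in practice, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table contents and customer names are made up for illustration; NULL surfaces as `None` in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0);  -- only Ada has an order
""")

# Every customer is kept; unmatched orders columns come back as NULL (None).
rows = conn.execute("""
    SELECT c.name, o.id, o.total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# The anti-join idiom: keep only the rows where no order matched.
no_orders = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

print(rows)       # [('Ada', 10, 25.0), ('Grace', None, None)]
print(no_orders)  # [('Grace',)]
```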

HTTP & APIs/http-protocol/status-codes

A request carries a valid, authenticated session, but the user lacks permission for the action. Which status code is correct?

Options

400 Bad Request
401 Unauthorized
403 Forbidden
404 Not Found

Show answer

403 Forbidden is correct when the request is authenticated but the user lacks permission for the action. The server understood the request and knows who the client is, but refuses it, so re-authenticating will not help. 401 Unauthorized is reserved for missing or invalid credentials and must include a WWW-Authenticate header.

Why:

403 Forbidden means the server understood the request and the client is authenticated, but is not allowed to perform it — re-authenticating will not help (RFC 9110 §15.5.4). 401 Unauthorized is for missing or invalid credentials and must include a WWW-Authenticate header. (404 is sometimes used deliberately to hide a resource's existence, but the textbook answer for a known authorization failure is 403.)
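
The 401 versus 403 split can be sketched as a small decision function. The `authorize` helper, the shape of the `user` dict, and the `Bearer realm="api"` challenge are assumptions made for illustration; a real service would take these from its auth framework.

```python
from http import HTTPStatus
from typing import Optional


def authorize(user: Optional[dict], required_permission: str) -> tuple:
    """Return (status, extra_headers) for a request.

    `user` is None when credentials are missing or invalid (hypothetical convention).
    """
    if user is None:
        # Not authenticated: challenge the client so it knows how to authenticate.
        return HTTPStatus.UNAUTHORIZED, {"WWW-Authenticate": 'Bearer realm="api"'}
    if required_permission not in user.get("permissions", ()):
        # Authenticated but not allowed: re-authenticating will not help.
        return HTTPStatus.FORBIDDEN, {}
    return HTTPStatus.OK, {}


viewer = {"name": "sam", "permissions": ["invoice:read"]}
print(authorize(None, "invoice:delete")[0].value)    # 401
print(authorize(viewer, "invoice:delete")[0].value)  # 403
print(authorize(viewer, "invoice:read")[0].value)    # 200
```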

Node.js/node-modules/esm-node

A package's package.json has "type": "module". What does a .js file in that package use to import a dependency?

Options

import x from "pkg"; (static ESM syntax)
const x = require("pkg");
module.import("pkg")
include "pkg";

Show answer

It uses static ESM syntax like import x from "pkg";. Setting "type": "module" makes .js files ES modules, where import/export is the import mechanism and require is not defined in scope. To keep using CommonJS require in such a package, name the file .cjs instead.

Why:

"type": "module" makes .js files ES modules, so they use import/export. require is not defined in an ES module scope (you would use createRequire to get it). To keep using CommonJS in such a package, name the file .cjs.

Python/iteration/generators

Which statements are true of a generator object created by a generator expression like (x for x in range(3))? Select all that apply.

Options

It is iterated lazily, producing one value at a time
It is single-pass — once exhausted it yields nothing more
It supports len() to report how many values remain
It supports indexing like g[0]

Show answer

A generator is iterated lazily, producing one value at a time, and it is single-pass, so once exhausted it yields nothing more. It does not support len() or indexing like g[0], because there is no materialized collection behind it — you would convert it to a list first to get either.

Why:

A generator computes values on demand (lazy) and can be consumed only once — after it is exhausted, further iteration yields nothing. It does not support len() or indexing, because it has no materialized collection behind it; you would convert it to a list first to get either.
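
A short demonstration of the single-pass behaviour and the missing `len()` and indexing support, using the same generator expression as the question:

```python
g = (x for x in range(3))

first_pass = list(g)   # consumes the generator
second_pass = list(g)  # already exhausted, so nothing is left

g2 = (x for x in range(3))
errors = []
try:
    len(g2)
except TypeError:
    errors.append("len")    # generators have no __len__
try:
    g2[0]
except TypeError:
    errors.append("index")  # generators are not subscriptable

print(first_pass)   # [0, 1, 2]
print(second_pass)  # []
print(errors)       # ['len', 'index']
```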

System Design/sd-patterns/message-queues

Your checkout flow synchronously calls an email service, an analytics pipeline, and a fraud-scoring job, and a slow dependency now blocks orders. You introduce a message queue so checkout publishes events and workers consume them. Which benefits does this asynchronous decoupling genuinely provide?

Options

(a) Checkout no longer blocks on slow consumers — it returns once the event is enqueued
(b) The queue absorbs traffic spikes, buffering work so consumers can drain at their own pace
(c) A temporarily down consumer can recover and process backlog without losing events
(d) It guarantees end-to-end latency is lower than the synchronous version for every request
(e) It removes the need for consumers to handle duplicate deliveries

Show answer

The real benefits are that checkout no longer blocks on slow consumers (it returns once the event is enqueued), the queue absorbs traffic spikes and buffers work so consumers drain at their own pace, and a temporarily down consumer can recover and process the backlog without losing events. It does not guarantee lower per-request end-to-end latency, and it does not remove the need to handle duplicate deliveries.

Why:

A queue decouples producers from consumers: checkout returns as soon as the event is enqueued (a), the queue acts as a buffer that smooths spikes so consumers process at a sustainable rate (b), and durable queues retain messages so a consumer that was down can drain the backlog on recovery (c). The two wrong options reflect common misconceptions. Async processing does not lower per-request end-to-end latency (d) — the downstream work still happens, just later; you trade latency-to-completion for responsiveness and resilience. And most queues offer at-least-once delivery, so consumers must be idempotent to tolerate duplicates (e); the queue does not remove that obligation.
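
The duplicate-delivery obligation can be illustrated with a toy in-process queue. This is a sketch under stated assumptions: `queue.Queue` stands in for a real broker, the event shape is invented, and `processed_ids` would need to be a durable store in production.

```python
import queue

events = queue.Queue()
processed_ids = set()  # assumption: in production this is a durable store, not memory
emails_sent = []


def publish(event: dict) -> None:
    """Checkout only enqueues and returns; it never waits for consumers."""
    events.put(event)


def handle(event: dict) -> bool:
    """Idempotent consumer: skip events whose id has already been processed."""
    if event["id"] in processed_ids:
        return False
    emails_sent.append(event["order"])
    processed_ids.add(event["id"])
    return True


def drain() -> int:
    """Process the backlog at the consumer's own pace; return how many were handled."""
    handled = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return handled
        if handle(event):
            handled += 1


publish({"id": "evt-1", "order": "A-100"})
publish({"id": "evt-1", "order": "A-100"})  # simulated at-least-once redelivery
publish({"id": "evt-2", "order": "A-101"})
handled_count = drain()
print(handled_count, emails_sent)  # 2 ['A-100', 'A-101']
```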

AI Engineering/rag/retrieval-quality

A RAG system returns confident but wrong answers because the retrieved chunks are often irrelevant. Which changes are legitimate levers to improve retrieval quality?

Options

(a) Add a reranker (e.g. a cross-encoder) over the top-k candidates before passing them to the model
(b) Tune chunk size and overlap so chunks are semantically coherent and self-contained
(c) Combine dense (vector) retrieval with sparse keyword search (hybrid retrieval)
(d) Raise the generation model's max_tokens so it can write a longer answer
(e) Switch to a stronger embedding model better matched to your domain

Show answer

The legitimate levers are adding a reranker such as a cross-encoder over the top-k candidates, tuning chunk size and overlap so chunks are coherent and self-contained, combining dense vector retrieval with sparse keyword search (hybrid retrieval), and switching to a stronger embedding model matched to your domain. Each improves which chunks reach the context. Raising the generation model's max_tokens only changes how long the answer may be — it does nothing about which documents were retrieved.

Why:

Retrieval quality is about getting the right chunks into context. A reranker (a) reorders the initial candidate set with a more expensive, more accurate cross-encoder so the best passages float to the top. Chunking strategy (b) directly affects whether a chunk contains a complete, embeddable idea rather than a fragment. Hybrid retrieval (c) catches cases where exact terms/IDs matter that dense vectors miss, and vice versa. A better-matched embedding model (e) improves the similarity signal at the source. Raising max_tokens (d) only changes how long the answer may be — it does nothing about which documents were retrieved, so it cannot fix retrieving the wrong material.
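
One common way to combine dense and sparse result lists is reciprocal rank fusion (RRF). The source does not name a fusion method, so treat this as one possible sketch: the document IDs are invented, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids; each list contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search results
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical keyword-search results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear in both lists rise to the top, which is the behaviour hybrid retrieval is after; a reranker could then be applied to the fused top-k.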

Databases & SQL/querying/aggregation

A table has 10 rows; the manager_id column is NULL in 3 of them. How do COUNT(*) and COUNT(manager_id) differ?

Options

COUNT(*) returns 10; COUNT(manager_id) returns 7
Both return 10
Both return 7
COUNT(*) returns 7; COUNT(manager_id) returns 10

Show answer

COUNT(*) returns 10 and COUNT(manager_id) returns 7. COUNT(*) counts every row regardless of nulls, while COUNT(manager_id) counts only rows where that column is non-null, skipping the 3 nulls. The common aggregates (SUM, AVG, MIN, MAX, and COUNT of an expression) all skip NULL inputs; COUNT(*) is the exception because it counts rows rather than values.

Why:

COUNT(*) counts rows regardless of nulls, so it returns 10. COUNT(manager_id) counts only rows where that expression is non-null, skipping the 3 nulls to return 7. The common aggregates (SUM, AVG, MIN, MAX, and COUNT of an expression) all skip NULL inputs; COUNT(*) is the exception because it counts rows rather than values.
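
The scenario in the question can be reproduced with Python's built-in sqlite3 module. The `employees` table name is an assumption; the question only specifies 10 rows with `manager_id` NULL in 3 of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, manager_id INTEGER)")
# Rows 1-3 have no manager (NULL); rows 4-10 report to employee 1.
conn.executemany(
    "INSERT INTO employees (id, manager_id) VALUES (?, ?)",
    [(i, None if i <= 3 else 1) for i in range(1, 11)],
)

count_all, count_manager = conn.execute(
    "SELECT COUNT(*), COUNT(manager_id) FROM employees"
).fetchone()
print(count_all, count_manager)  # 10 7
```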

HTTP & APIs/http-protocol/headers

A client sends Content-Type: application/json and Accept: application/xml on a POST. What does this tell the server?

Options

The request body is JSON; the client would prefer the response in XML
The request body is XML; the client would prefer the response in JSON
Both the request and response must be JSON
The server must reject the request because the two headers conflict

Show answer

It tells the server the request body is JSON and that the client would prefer the response in XML. Content-Type describes the media type of the body being sent, while Accept advertises which media types the client is willing to receive back. They describe opposite directions, so the two headers do not conflict.

Why:

Content-Type describes the media type of the body being sent (here the JSON request payload), while Accept advertises which media types the client is willing to receive in the response, driving content negotiation (RFC 9110 §8.3, §12.5.1). They describe opposite directions, so there is no conflict.
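
On the server side, the two headers are read independently: Content-Type decides how to parse the body, Accept decides how to format the reply. The sketch below is deliberately simplified and is not a full RFC 9110 negotiation: the `choose_response_type` helper is hypothetical, it ignores q-values, and it handles only the `*/*` wildcard.

```python
import json


def choose_response_type(accept_header: str, supported: list):
    """Pick the first media type in Accept that the server supports (simplified)."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()  # drop parameters such as q=
        if media_type == "*/*":
            return supported[0]
        if media_type in supported:
            return media_type
    return None  # a real server would answer 406 Not Acceptable here


request_headers = {"Content-Type": "application/json", "Accept": "application/xml"}
request_body = b'{"name": "widget"}'

# Content-Type tells the server how to parse what it received.
parsed = json.loads(request_body) if request_headers["Content-Type"] == "application/json" else None

# Accept tells the server what the client would like to get back.
response_type = choose_response_type(
    request_headers["Accept"], ["application/json", "application/xml"]
)
print(parsed, response_type)  # {'name': 'widget'} application/xml
```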

Node.js/fs-io/path-module

What is the key difference between path.join(...) and path.resolve(...)?

Options

path.resolve produces an absolute path (resolving against the cwd if needed); path.join just concatenates segments with the separator
path.join always returns an absolute path; path.resolve returns a relative one
They are aliases and behave identically
path.resolve only works on Windows

Show answer

path.resolve produces an absolute path, resolving against the current working directory when needed, while path.join simply concatenates and normalizes segments. So path.join stays relative if its inputs are relative, whereas path.resolve('a', 'b') yields <cwd>/a/b.

Why:

path.join normalizes and joins segments but stays relative if the inputs are relative. path.resolve processes segments right-to-left until it has built an absolute path, falling back to process.cwd() — so path.resolve('a', 'b') yields <cwd>/a/b.

Python/context-resources/context-managers

Which statements about the with statement and the context-manager protocol are true? Select all that apply.

Options

An object is a context manager if it defines __enter__ and __exit__
__exit__ runs even when the with body raises an exception
Returning a truthy value from __exit__ suppresses the exception
@contextlib.contextmanager lets you write one without __exit__, using a generator that yields once

Show answer

All four statements are true. An object is a context manager if it defines __enter__ and __exit__; __exit__ runs even when the body raises, which is what makes with reliable for cleanup; returning a truthy value from __exit__ suppresses the exception; and @contextlib.contextmanager lets you write one with a single-yield generator instead of those methods.

Why:

The protocol is exactly __enter__/__exit__. __exit__ is guaranteed to run on the way out — normal or exceptional — which is what makes with reliable for cleanup; if it returns a truthy value the propagating exception is suppressed. @contextlib.contextmanager wraps a single-yield generator: code before the yield is the enter, code after (in a finally) is the exit.
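
Both forms of the protocol can be seen side by side in a short sketch. The `Resource` class and `resource` function are illustrative names; the class suppresses only `ValueError`, while the generator version lets its exception propagate after running cleanup.

```python
from contextlib import contextmanager

log = []


class Resource:
    def __enter__(self):
        log.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        log.append("exit")              # runs on normal and exceptional exit
        return exc_type is ValueError   # truthy return suppresses the exception


with Resource():
    raise ValueError("suppressed by __exit__")


@contextmanager
def resource():
    log.append("gen-enter")             # code before the yield acts as __enter__
    try:
        yield
    finally:
        log.append("gen-exit")          # code in finally acts as __exit__


try:
    with resource():
        raise RuntimeError("not suppressed")
except RuntimeError:
    log.append("caught")

print(log)  # ['enter', 'exit', 'gen-enter', 'gen-exit', 'caught']
```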

System Design/sd-patterns/idempotency-keys

A mobile client retries a POST /payments when the network drops, risking a double charge. You add idempotency keys. Which statements about implementing them correctly are true?

Options

(a) The client generates a unique key per logical operation and resends the same key on retries
(b) The server persists the key with the operation's result and replays the stored result on a repeated key
(c) Idempotency keys are most valuable for non-idempotent verbs like POST, where natural retries are unsafe
(d) Generating a fresh key on each retry attempt is the recommended approach
(e) Idempotency keys are unnecessary because TCP already guarantees exactly-once delivery

Show answer

Three statements are correct: the client generates a unique key per logical operation and resends the same key on retries, the server persists the key with the operation's result and replays the stored result on a repeat, and idempotency keys are most valuable for non-idempotent verbs like POST. Generating a fresh key per retry defeats the mechanism, and TCP guarantees reliable bytes, not application-level exactly-once.

Why:

Idempotency keys work because the client mints one key per logical operation and reuses it across retries (a), and the server records that key alongside the result so a repeat presents the same outcome instead of executing twice (b). They matter most for non-idempotent methods like POST (c) — GET/PUT/DELETE are already idempotent by definition, so a blind retry of those is safe. The wrong options break the mechanism: minting a new key per attempt (d) defeats the whole point — the server sees each retry as a distinct operation and double-charges. And TCP guarantees reliable, ordered bytes within a connection, not application-level exactly-once semantics across reconnects and timeouts (e); that is exactly the gap idempotency keys fill.
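
The store-and-replay behaviour can be sketched with an in-memory service. This is a toy under stated assumptions: `PaymentService`, its charge-id format, and the dict-based store are hypothetical, and a real implementation would need a durable store plus an atomic check-and-insert so two concurrent retries cannot both execute.

```python
import uuid


class PaymentService:
    def __init__(self):
        self._results = {}  # idempotency key -> stored response (durable in production)
        self.charges = []   # side effects that actually happened

    def create_payment(self, idempotency_key: str, amount_pence: int) -> dict:
        # Repeated key: replay the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, amount_pence))
        response = {"charge_id": charge_id, "amount_pence": amount_pence}
        self._results[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())                     # client mints one key per logical payment
first = service.create_payment(key, 1999)
retry = service.create_payment(key, 1999)   # network dropped, same key resent
print(first == retry, len(service.charges))  # True 1

# Anti-pattern: a fresh key per attempt looks like a new payment and charges again.
service.create_payment(str(uuid.uuid4()), 1999)
print(len(service.charges))  # 2
```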

AI Engineering/ai-production/latency-cost

A chat feature feels slow and is expensive at scale. Which techniques are valid ways to reduce latency and/or cost in a production LLM application?

Options

(a) Stream tokens to the client to lower perceived latency (time-to-first-token)
(b) Route easy requests to a smaller/cheaper model and reserve the large model for hard ones
(c) Cache responses (or prompt prefixes) for repeated or near-identical requests
(d) Always pad every prompt with extra few-shot examples to be safe
(e) Trim unnecessary context and cap max_tokens to what the task needs

Show answer

Valid levers are streaming tokens to lower perceived latency, routing easy requests to a smaller cheaper model while reserving the large model for hard ones, caching responses or prompt prefixes for repeated requests, and trimming unnecessary context while capping max_tokens to what the task needs. Each cuts cost, latency, or both. Padding every prompt with extra few-shot examples does the opposite: it inflates input tokens on every call and beyond a point adds no accuracy.

Why:

Streaming (a) doesn't change total compute but dramatically improves perceived speed by showing the first tokens immediately. Model routing / cascading (b) sends the bulk of easy traffic to a cheaper model, cutting average cost and latency while preserving quality on the hard tail. Caching (c) avoids paying for work you've already done. Trimming context and bounding output length (e) reduces both input and output tokens, which is where the bill and the time go. Indiscriminately padding every prompt with more few-shot examples (d) does the opposite — it inflates input tokens (cost and latency) on every call, and beyond a point adds no accuracy, so it is a regression, not an optimization.
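
Routing and exact-match caching can be combined in a few lines. Everything model-specific here is a placeholder: `call_model` is a stand-in for a real client call, the model names are invented, and the word-count heuristic in `pick_model` is a naive assumption (real routers typically use a classifier or confidence signal).

```python
from functools import lru_cache

calls = []  # records which model was actually invoked


def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM client call."""
    calls.append(model)
    return f"[{model}] answer to: {prompt}"


def pick_model(prompt: str) -> str:
    """Naive routing heuristic (assumption): short prompts go to the cheap model."""
    return "small-model" if len(prompt.split()) <= 20 else "large-model"


@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    """Exact-match response cache in front of the routed model call."""
    return call_model(pick_model(prompt), prompt)


answer("What is a LEFT JOIN?")
answer("What is a LEFT JOIN?")    # served from cache, no second model call
answer(" ".join(["word"] * 30))   # long prompt is routed to the larger model
print(calls)  # ['small-model', 'large-model']
```

An exact-match cache only helps with literally repeated prompts; near-identical requests would need normalisation or a semantic cache, which is outside this sketch.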

Databases & SQL/indexes/btree

Which predicate can a standard B-tree index on created_at accelerate with a single index range scan?

Options

WHERE created_at >= '2024-01-01'
WHERE EXTRACT(YEAR FROM created_at) = 2024
WHERE created_at::text LIKE '%2024%'
WHERE created_at <> '2024-01-01'

Show answer

WHERE created_at >= '2024-01-01' is the one a B-tree can accelerate with a single range scan. A B-tree stores keys in sorted order, so range and equality predicates on the bare column map to a contiguous slice. Wrapping the column in a function such as EXTRACT or a cast, or using a leading-wildcard LIKE or <>, defeats the index because the stored key order no longer lines up with the predicate.

Why:

A B-tree stores keys in sorted order, so range and equality predicates on the bare column (>=, >, <, BETWEEN, =) map to a contiguous slice it can scan. Wrapping the column in a function (EXTRACT, a cast) or using a leading-wildcard LIKE means the stored key order no longer lines up with the predicate, so a plain index on created_at cannot serve it as a range scan. <> matches everything except one key, which is two disjoint ranges rather than one contiguous slice.
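
The difference shows up in a query plan. The sketch below uses SQLite through Python's sqlite3 module as a convenient stand-in; the question's cast syntax suggests PostgreSQL, so `strftime` replaces EXTRACT here, the `events` table is invented, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT);
    CREATE INDEX idx_events_created_at ON events (created_at);
""")


def plan(where_clause: str) -> str:
    """Return the planner's description for a query with the given WHERE clause."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM events WHERE {where_clause}"
    ).fetchall()
    return " | ".join(row[3] for row in rows)  # the fourth column holds the detail text


# Bare column with a range predicate: the planner can seek into the index.
range_plan = plan("created_at >= '2024-01-01'")

# Column wrapped in a function: the index order no longer matches the predicate.
wrapped_plan = plan("strftime('%Y', created_at) = '2024'")

print(range_plan)
print(wrapped_plan)
```

In SQLite a plan beginning with SEARCH indicates an index seek and SCAN indicates a full pass, so the first query should report the index and the second should not.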

HTTP & APIs/http-protocol/http-caching

Which Cache-Control directive guarantees that a response is never written to any cache (e.g. for a page showing private banking data)?

Options

no-cache
no-store
max-age=0
must-revalidate

Show answer

no-store is the directive that guarantees a response is never written to any cache, which is what you want for private banking data. no-cache actually allows storage but forces revalidation before reuse, max-age=0 only makes a stored response immediately stale, and must-revalidate merely forbids serving stale responses once they expire.

Why:

no-store forbids any cache from storing the request or response at all (RFC 9111 §5.2.2.5). no-cache does allow storage but requires revalidation with the origin before reuse; max-age=0 makes a stored response immediately stale (still revalidatable); must-revalidate only forbids serving stale responses once they expire.
