Mid-Level Software Engineer Interview Questions & Prep Guide

Reviewed by Mark Dickie · Last updated 26 June 2026

A mid-level Software Engineer interview tests whether you can move beyond syntax recall and make sound engineering decisions — across data modeling, API design, backend runtimes, and increasingly, AI-adjacent tooling. To prepare well, focus on the gap between knowing how a tool works and knowing when to reach for it: interviewers at this level probe trade-offs, not just definitions. You should be comfortable with relational databases and query performance, writing idiomatic Python, designing HTTP APIs that behave correctly under error conditions, and sketching system designs that account for scale without over-engineering. AI Engineering is appearing more often in mid-level interviews, so a working understanding of how LLM APIs, embeddings, and retrieval patterns fit into a backend architecture is worth your time.

What the interview covers

The questions in this guide draw from six technology areas. Each one maps to a distinct set of concerns interviewers are trying to probe.

| Area | What interviewers are actually testing |
|---|---|
| Databases & SQL | Schema design, joins, indexing strategy, query optimization, transactions |
| Python | Idiomatic patterns, data structures, async basics, testing practices |
| AI Engineering | LLM API usage, prompt construction, embeddings, retrieval-augmented generation concepts |
| HTTP & APIs | REST semantics, status codes, auth patterns, error handling, versioning |
| System Design | Component responsibilities, data flow, caching, scaling trade-offs, failure modes |
| Node.js | Event loop model, async/await, middleware patterns, runtime differences vs. Python |

Questions in this guide sit at difficulty 2–4 out of 5: past the "what is a primary key" floor, and short of the "design a distributed transaction system from scratch" ceiling. That range is where most mid-level interviews actually live.

How to work through your prep

A sequenced approach beats random drilling. Start where the fundamentals are most transferable, then build outward toward the areas that require broader context.

SQL and Databases first. Almost every backend role touches a database. Locking down query writing, index selection, and normalization gives you a base that makes system design conversations easier.
Python fundamentals and idioms. Work through the language features that distinguish mid-level code from junior code: generators, context managers, comprehensions, and the basics of writing testable functions.
HTTP & APIs. Understand the full request/response cycle, what correct REST resource modeling looks like, and how auth mechanisms like OAuth 2.0 and JWT actually work under the hood.
Node.js runtime model. If you come from a Python background, the single-threaded event loop in Node.js is the concept most likely to trip you up. Get clear on how I/O concurrency works before the interview.
AI Engineering patterns. You do not need to have built a production LLM product, but you should be able to talk through how you would integrate an LLM API into a backend service, handle context windows, and think about retrieval.
System Design last. It synthesizes everything above. Practice drawing out a design for a familiar service — a URL shortener, a notification system — and narrating your trade-off decisions out loud.

The practice questions in this guide draw on all six areas. Market salary and demand data for Software Engineers is included alongside them.

What to study, in order

For a mid-level Software Engineer interview, prioritise the role's most in-demand technologies first:

Databases & SQL
Python
AI Engineering
HTTP & APIs
System Design
Node.js

Software Engineer salary & demand

From live job postings, the median Software Engineer salary is £55,000, across 297 recent postings. These figures are role-wide across all seniority levels, as of June 2026.

| Percentile | Salary |
|---|---|
| 25th percentile | £37,500 |
| Median (50th percentile) | £55,000 |
| 75th percentile | £82,500 |

Practice questions

AI Engineering/evaluation-safety/prompt-injection

Your agent reads untrusted web content and has tools that can read files and call internal APIs. Which of the following are genuine, meaningful mitigations against prompt injection?

Options

Apply least privilege to tools and require human approval (or hard policy checks) for high-impact actions
Clearly delimit untrusted content and instruct the model to treat it as data, not instructions
Raise the temperature so the model is less predictable to attackers
Validate and constrain tool outputs/arguments before executing them (allow-lists, schemas, sandboxing)
Trust the system prompt to always win because it appears first

Show answer

The genuine mitigations are applying least privilege to tools with human approval for high-impact actions, clearly delimiting untrusted content and labeling it as data not instructions, and validating or sandboxing tool outputs and arguments before executing them. These are layered and architectural. Raising the temperature does nothing for safety, and trusting the system prompt to always win is false: there is no hard precedence boundary in the token stream, so crafted injections routinely override prepended instructions.

Why:

Effective defenses are layered and architectural. Least privilege plus human-in-the-loop for dangerous actions (a) limits blast radius even when an injection succeeds. Delimiting untrusted text and labeling it as data (b) helps the model resist hijacking — it is necessary but not sufficient on its own. Validating/sandboxing what tools receive and do (d) stops a hijacked call from causing real damage. Raising temperature (c) does nothing for safety; it just adds randomness and can make the system less reliable. "The system prompt always wins" (e) is false — there is no hard precedence boundary in the token stream, and crafted injections routinely override prepended instructions, which is exactly why you cannot rely on prompt ordering alone.
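The validation and least-privilege layers described above can be sketched as a policy check that runs before any model-proposed tool call executes. This is a minimal illustration, not a complete defense: the tool names, the `ToolCall` type, `SANDBOX_ROOT`, and the `authorize` function are all hypothetical and stand in for whatever your agent framework provides.

```python
import posixpath
from dataclasses import dataclass, field

# Hypothetical allow-list: tool name -> argument names the tool may receive.
ALLOWED_TOOLS = {"read_file": {"path"}, "search_docs": {"query"}}
# Hypothetical set of tools that always need explicit human sign-off.
HIGH_IMPACT_TOOLS = {"call_internal_api"}
# Hypothetical directory the agent is allowed to read from.
SANDBOX_ROOT = "/srv/agent-sandbox/"


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def authorize(call: ToolCall, human_approved: bool = False) -> bool:
    """Return True only if a model-proposed tool call passes every policy check."""
    if call.name in HIGH_IMPACT_TOOLS:
        # High-impact actions never run on the model's say-so alone.
        return human_approved
    allowed_args = ALLOWED_TOOLS.get(call.name)
    if allowed_args is None:
        return False  # unknown tool: deny by default
    if set(call.args) - allowed_args:
        return False  # unexpected argument names
    if call.name == "read_file":
        # Normalize first so "../" segments cannot escape the sandbox.
        path = posixpath.normpath(str(call.args.get("path", "")))
        if not path.startswith(SANDBOX_ROOT):
            return False
    return True


inside = authorize(ToolCall("read_file", {"path": "/srv/agent-sandbox/notes.txt"}))
escape = authorize(ToolCall("read_file", {"path": "/srv/agent-sandbox/../etc/passwd"}))
```

The point of the design is that the check lives outside the model: even if injected text persuades the model to propose a dangerous call, the call is rejected by code the attacker cannot talk to.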

Databases & SQL/querying/joins

A LEFT JOIN between customers (left) and orders (right) returns a customer who has placed no orders. What appears in that row's orders columns?

Options

NULL in every orders column
The row is omitted entirely
0 in every orders column
An empty string in every orders column

Show answer

Every orders column holds NULL. A LEFT JOIN keeps each left-hand row even when no right-hand row matches the ON condition, filling the unmatched right-hand columns with NULL rather than omitting the row or substituting 0 or empty strings. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers who have placed no orders.

Why:

A LEFT JOIN keeps every left-hand row even when no right-hand row matches the ON condition; the unmatched right-hand columns are filled with NULL. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers with no orders.
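To see the NULL fill in practice, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table contents and customer names are made up for illustration; SQL NULL surfaces in Python as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0);  -- only Ada has an order
""")

# Every customer is kept; unmatched orders columns come back as NULL (None).
rows = conn.execute("""
    SELECT c.name, o.id, o.total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# The anti-join pattern: keep only the customers with no matching order.
no_orders = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

print(rows)       # [('Ada', 10, 25.0), ('Grace', None, None)]
print(no_orders)  # [('Grace',)]
```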

HTTP & APIs/http-protocol/http-methods

Which HTTP method is intended to fully replace the resource at a known URL with the representation in the request body?

Options

POST
PUT
PATCH
GET

Show answer

PUT is the method intended to fully replace the resource at a known URL with the representation in the request body. It is idempotent, so repeating it produces the same result. PATCH only applies a partial modification, POST lets the server decide how to process the body, and GET is a read with no body semantics.

Why:

PUT replaces the target resource in full with the supplied representation and is idempotent (RFC 9110 §9.3.4). PATCH applies a partial modification, POST processes the body however the server decides (often creating a subordinate resource), and GET is a read with no body semantics.
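The difference between full replacement and partial modification is easy to show with a dictionary standing in for a resource store. This is a sketch under simplifying assumptions: the `put` and `patch` helpers and the `/users/1` resource are hypothetical, and the patch shown is a plain field merge rather than any particular patch format.

```python
def put(store: dict, url: str, representation: dict) -> None:
    """Replace the stored resource wholesale with the supplied representation."""
    store[url] = dict(representation)


def patch(store: dict, url: str, changes: dict) -> None:
    """Merge only the supplied fields into the existing resource."""
    store[url] = {**store[url], **changes}


store = {"/users/1": {"name": "Ada", "email": "ada@example.com"}}

patch(store, "/users/1", {"email": "ada@new.example.com"})
after_patch = dict(store["/users/1"])  # name is kept, email is updated

put(store, "/users/1", {"name": "Ada L."})
after_put = dict(store["/users/1"])    # email is gone: the body is the whole resource

put(store, "/users/1", {"name": "Ada L."})
after_second_put = dict(store["/users/1"])  # repeating PUT changes nothing further
```

A practical consequence: a client that sends PUT with only the fields it wants to change will silently drop the rest, which is a common source of bugs in hand-rolled API clients.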

Node.js/node-modules/commonjs

In a CommonJS module (a .js file loaded with require), which of these is a real, module-scoped variable Node injects automatically?

Options

__dirname
import.meta.url
globalThis.module
window

Show answer

__dirname is the real module-scoped variable Node injects automatically. Node wraps every CommonJS module in a function that receives exports, require, module, __filename, and __dirname as parameters, so it is available without any import. import.meta only exists in ES modules, and window is a browser global.

Why:

Node wraps every CommonJS module in a function that receives exports, require, module, __filename, and __dirname as parameters, so __dirname is available without importing anything. import.meta only exists in ES modules; window is a browser global.

Python/data-model/truthiness

Which of these values are falsy (i.e. bool(value) is False)? Select all that apply.

Options

[]
0
""
"0"
[0]

Show answer

The falsy values are the empty list [], the number zero 0, and the empty string — but not the string "0" or the list [0]. Truthiness depends on emptiness, not on the contents being zero-like, so a non-empty string "0" and a non-empty list [0] are both truthy even though they contain a zero.

Why:

Empty containers ([]), the number zero (0), and the empty string ("") are all falsy. The string "0" is a non-empty string and the list [0] is a non-empty list — both are truthy, because truthiness depends on emptiness, not on the contents being zero-like.
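The five options can be checked directly in a few lines. This snippet simply partitions the values from the question by their truthiness.

```python
values = [[], 0, "", "0", [0]]

# `not v` is True exactly when bool(v) is False.
falsy = [v for v in values if not v]
truthy = [v for v in values if v]

print(falsy)   # [[], 0, '']
print(truthy)  # ['0', [0]]
```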

System Design/sd-patterns/message-queues

Your checkout flow synchronously calls an email service, an analytics pipeline, and a fraud-scoring job, and a slow dependency now blocks orders. You introduce a message queue so checkout publishes events and workers consume them. Which benefits does this asynchronous decoupling genuinely provide?

Options

Checkout no longer blocks on slow consumers — it returns once the event is enqueued
The queue absorbs traffic spikes, buffering work so consumers can drain at their own pace
A temporarily down consumer can recover and process backlog without losing events
It guarantees end-to-end latency is lower than the synchronous version for every request
It removes the need for consumers to handle duplicate deliveries

Show answer

The real benefits are that checkout no longer blocks on slow consumers (it returns once the event is enqueued), the queue absorbs traffic spikes and buffers work so consumers drain at their own pace, and a temporarily down consumer can recover and process the backlog without losing events. It does not guarantee lower per-request end-to-end latency, and it does not remove the need to handle duplicate deliveries.

Why:

A queue decouples producers from consumers: checkout returns as soon as the event is enqueued (a), the queue acts as a buffer that smooths spikes so consumers process at a sustainable rate (b), and durable queues retain messages so a consumer that was down can drain the backlog on recovery (c). The two wrong options reflect common misconceptions. Async processing does not lower per-request end-to-end latency (d) — the downstream work still happens, just later; you trade latency-to-completion for responsiveness and resilience. And most queues offer at-least-once delivery, so consumers must be idempotent to tolerate duplicates (e); the queue does not remove that obligation.
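The decoupling and the duplicate-delivery obligation can both be illustrated with the standard library's `queue.Queue` as an in-process stand-in for a real broker. Everything here is a simplified assumption: the `checkout` and `handle` functions, the event shape, and the in-memory `processed_ids` set (a real consumer would persist that state).

```python
import queue

events = queue.Queue()  # stand-in for a durable message broker


def checkout(order_id: int) -> str:
    """Publish an event and return immediately, without waiting for consumers."""
    events.put({"event_id": f"order-{order_id}", "order_id": order_id})
    return "accepted"


processed_ids = set()  # would live in durable storage in a real system
emails_sent = []


def handle(event: dict) -> None:
    """Idempotent consumer: a redelivered event is recognized and skipped."""
    if event["event_id"] in processed_ids:
        return
    processed_ids.add(event["event_id"])
    emails_sent.append(event["order_id"])


checkout(1)
checkout(2)
# Simulate at-least-once delivery: the broker redelivers the first event.
events.put({"event_id": "order-1", "order_id": 1})

# The consumer drains the backlog at its own pace.
while not events.empty():
    handle(events.get())

print(emails_sent)  # [1, 2]
```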

AI Engineering/rag/retrieval-quality

A RAG system returns confident but wrong answers because the retrieved chunks are often irrelevant. Which changes are legitimate levers to improve retrieval quality?

Options

Add a reranker (e.g. a cross-encoder) over the top-k candidates before passing them to the model
Tune chunk size and overlap so chunks are semantically coherent and self-contained
Combine dense (vector) retrieval with sparse keyword search (hybrid retrieval)
Raise the generation model's max_tokens so it can write a longer answer
Switch to a stronger embedding model better matched to your domain

Show answer

The legitimate levers are adding a reranker such as a cross-encoder over the top-k candidates, tuning chunk size and overlap so chunks are coherent and self-contained, combining dense vector retrieval with sparse keyword search (hybrid retrieval), and switching to a stronger embedding model matched to your domain. Each improves which chunks reach the context. Raising the generation model's max_tokens only changes how long the answer may be — it does nothing about which documents were retrieved.

Why:

Retrieval quality is about getting the right chunks into context. A reranker (a) reorders the initial candidate set with a more expensive, more accurate cross-encoder so the best passages float to the top. Chunking strategy (b) directly affects whether a chunk contains a complete, embeddable idea rather than a fragment. Hybrid retrieval (c) catches cases where exact terms/IDs matter that dense vectors miss, and vice versa. A better-matched embedding model (e) improves the similarity signal at the source. Raising max_tokens (d) only changes how long the answer may be — it does nothing about which documents were retrieved, so it cannot fix retrieving the wrong material.
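As one illustration of hybrid retrieval, the sketch below merges a dense ranking and a sparse ranking with reciprocal rank fusion, a common fusion method. It is one option among several, not something the question prescribes. The document IDs and the two input rankings are invented, and `k=60` is just the conventional default constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_ranking = ["doc_b", "doc_a", "doc_c"]   # hypothetical vector-search results
sparse_ranking = ["doc_a", "doc_d", "doc_b"]  # hypothetical keyword-search results

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)  # ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```

Documents that both retrievers agree on rise to the top, while a document found by only one retriever still survives into the candidate set, where a reranker could then take a closer look.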

Databases & SQL/querying/aggregation

A table has 10 rows; the manager_id column is NULL in 3 of them. How do COUNT(*) and COUNT(manager_id) differ?

Options

COUNT(*) returns 10; COUNT(manager_id) returns 7
Both return 10
Both return 7
COUNT(*) returns 7; COUNT(manager_id) returns 10

Show answer

COUNT(*) returns 10 and COUNT(manager_id) returns 7. COUNT(*) counts every row regardless of nulls, while COUNT(manager_id) counts only rows where that column is non-null, skipping the 3 nulls. The common aggregates (SUM, AVG, MIN, MAX, and COUNT over an expression) all ignore NULL inputs; COUNT(*) is the exception because it counts rows rather than values.

Why:

COUNT(*) counts rows regardless of nulls, so it returns 10. COUNT(manager_id) counts only rows where that expression is non-null, skipping the 3 nulls to return 7. The common aggregates (SUM, AVG, MIN, MAX, and COUNT over an expression) all ignore NULL inputs; COUNT(*) is the exception because it counts rows rather than values.
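The scenario in the question can be reproduced with an in-memory SQLite database. The `employees` table name and the specific manager IDs are made up; only the shape (10 rows, 3 of them NULL) comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, manager_id INTEGER)")

# 10 rows, 3 of which have no manager (None becomes SQL NULL).
manager_ids = [1, 1, 2, 2, 3, 3, 4, None, None, None]
conn.executemany(
    "INSERT INTO employees (manager_id) VALUES (?)",
    [(m,) for m in manager_ids],
)

total_rows, rows_with_manager = conn.execute(
    "SELECT COUNT(*), COUNT(manager_id) FROM employees"
).fetchone()

print(total_rows, rows_with_manager)  # 10 7
```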

HTTP & APIs/http-protocol/status-codes

A POST /orders request successfully creates a new order. Which status code is the most appropriate response?

Options

200 OK
201 Created
202 Accepted
204 No Content

Show answer

201 Created is the most appropriate response when a POST successfully creates a new resource, and it should carry a Location header pointing at the new order. 200 OK implies a generic success with no creation, 202 Accepted means processing is still happening asynchronously, and 204 No Content means success with no body.

Why:

201 Created signals that a new resource was created as a result of the request and should carry a Location header pointing at it (RFC 9110 §15.3.2). 200 implies a generic success with no creation, 202 means the request was accepted for asynchronous processing that hasn't finished, and 204 means success with no body.
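A framework-free sketch of the handler logic makes the 201-plus-Location convention concrete. The `create_order` function, the in-memory `orders` store, and the `(status, headers, body)` return shape are assumptions for illustration, not any particular web framework's API.

```python
import itertools

_next_id = itertools.count(1)  # hypothetical ID generator
orders = {}                    # in-memory stand-in for a database


def create_order(payload: dict) -> tuple[int, dict, dict]:
    """Handle POST /orders: store the order and describe where it now lives."""
    order_id = next(_next_id)
    orders[order_id] = dict(payload)
    headers = {"Location": f"/orders/{order_id}"}
    body = {"id": order_id, **payload}
    return 201, headers, body


status, headers, body = create_order({"sku": "abc", "qty": 2})
print(status, headers["Location"])  # 201 /orders/1
```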

Node.js/node-modules/esm-node

A package's package.json has "type": "module". What does a .js file in that package use to import a dependency?

Options

import x from "pkg"; (static ESM syntax)
const x = require("pkg");
module.import("pkg")
include "pkg";

Show answer

It uses static ESM syntax like import x from "pkg";. Setting "type": "module" makes .js files ES modules, where import/export is the import mechanism and require is not defined in scope. To keep using CommonJS require in such a package, name the file .cjs instead.

Why:

"type": "module" makes .js files ES modules, so they use import/export. require is not defined in an ES module scope (you would use createRequire to get it). To keep using CommonJS in such a package, name the file .cjs.

Python/iteration/generators

Which statements are true of a generator object created by a generator expression like (x for x in range(3))? Select all that apply.

Options

It is iterated lazily, producing one value at a time
It is single-pass — once exhausted it yields nothing more
It supports len() to report how many values remain
It supports indexing like g[0]

Show answer

A generator is iterated lazily, producing one value at a time, and it is single-pass, so once exhausted it yields nothing more. It does not support len() or indexing like g[0], because there is no materialized collection behind it — you would convert it to a list first to get either.

Why:

A generator computes values on demand (lazy) and can be consumed only once — after it is exhausted, further iteration yields nothing. It does not support len() or indexing, because it has no materialized collection behind it; you would convert it to a list first to get either.
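Each of the four statements can be verified in a few lines. The snippet consumes the generator from the question twice and then attempts `len()` and indexing, both of which raise `TypeError`.

```python
g = (x for x in range(3))

first_pass = list(g)   # consumes the generator: [0, 1, 2]
second_pass = list(g)  # already exhausted: []

errors = []
try:
    len(g)
except TypeError:
    errors.append("len")
try:
    g[0]
except TypeError:
    errors.append("index")

print(first_pass, second_pass, errors)  # [0, 1, 2] [] ['len', 'index']
```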

System Design/sd-patterns/idempotency-keys

A mobile client retries a POST /payments when the network drops, risking a double charge. You add idempotency keys. Which statements about implementing them correctly are true?

Options

The client generates a unique key per logical operation and resends the same key on retries
The server persists the key with the operation's result and replays the stored result on a repeated key
Idempotency keys are most valuable for non-idempotent verbs like POST, where natural retries are unsafe
Generating a fresh key on each retry attempt is the recommended approach
Idempotency keys are unnecessary because TCP already guarantees exactly-once delivery

Show answer

Three statements are correct: the client generates a unique key per logical operation and resends the same key on retries, the server persists the key with the operation's result and replays the stored result on a repeat, and idempotency keys are most valuable for non-idempotent verbs like POST. Generating a fresh key per retry defeats the mechanism, and TCP guarantees reliable bytes, not application-level exactly-once.

Why:

Idempotency keys work because the client mints one key per logical operation and reuses it across retries (a), and the server records that key alongside the result so a repeat presents the same outcome instead of executing twice (b). They matter most for non-idempotent methods like POST (c) — GET/PUT/DELETE are already idempotent by definition, so a blind retry of those is safe. The wrong options break the mechanism: minting a new key per attempt (d) defeats the whole point — the server sees each retry as a distinct operation and double-charges. And TCP guarantees reliable, ordered bytes within a connection, not application-level exactly-once semantics across reconnects and timeouts (e); that is exactly the gap idempotency keys fill.
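The client and server halves of the mechanism can be sketched together in memory. The `post_payment` function, the `results_by_key` store, and the response shape are hypothetical; a production server would also need the key check and the write to be atomic (for example, via a unique constraint), which this sketch leaves out.

```python
import uuid

results_by_key = {}  # server side: idempotency key -> stored result
charges = []         # every charge actually executed


def post_payment(idempotency_key: str, amount: int) -> dict:
    """Execute the charge once per key; replay the stored result on repeats."""
    if idempotency_key in results_by_key:
        return results_by_key[idempotency_key]
    charges.append(amount)
    result = {"charge_id": len(charges), "amount": amount}
    results_by_key[idempotency_key] = result
    return result


# Client side: one key per logical operation, reused across retries.
key = str(uuid.uuid4())
first = post_payment(key, 500)
retry = post_payment(key, 500)  # network dropped, so the client retries

print(first == retry, len(charges))  # True 1
```

If the client instead minted `str(uuid.uuid4())` inside its retry loop, each attempt would miss the lookup and append a second charge.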

AI Engineering/ai-production/latency-cost

A chat feature feels slow and is expensive at scale. Which techniques are valid ways to reduce latency and/or cost in a production LLM application?

Options

Stream tokens to the client to lower perceived latency (time-to-first-token)
Route easy requests to a smaller/cheaper model and reserve the large model for hard ones
Cache responses (or prompt prefixes) for repeated or near-identical requests
Always pad every prompt with extra few-shot examples to be safe
Trim unnecessary context and cap max_tokens to what the task needs

Show answer

Valid levers are streaming tokens to lower perceived latency, routing easy requests to a smaller cheaper model while reserving the large model for hard ones, caching responses or prompt prefixes for repeated requests, and trimming unnecessary context while capping max_tokens to what the task needs. Each cuts cost, latency, or both. Padding every prompt with extra few-shot examples does the opposite: it inflates input tokens on every call and beyond a point adds no accuracy.

Why:

Streaming (a) doesn't change total compute but dramatically improves perceived speed by showing the first tokens immediately. Model routing / cascading (b) sends the bulk of easy traffic to a cheaper model, cutting average cost and latency while preserving quality on the hard tail. Caching (c) avoids paying for work you've already done. Trimming context and bounding output length (e) reduces both input and output tokens, which is where the bill and the time go. Indiscriminately padding every prompt with more few-shot examples (d) does the opposite — it inflates input tokens (cost and latency) on every call, and beyond a point adds no accuracy, so it is a regression, not an optimization.
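Two of these levers, routing and caching, can be combined in a short sketch. The word-count routing heuristic, the `choose_model` and `answer` names, and the fake reply string are all assumptions; a real system would call an LLM API where the placeholder return value is, and would likely use a shared cache rather than an in-process one.

```python
from functools import lru_cache

calls = {"small": 0, "large": 0}  # counts simulated model invocations


def choose_model(prompt: str) -> str:
    """Hypothetical router: short prompts go to the cheaper model."""
    return "small" if len(prompt.split()) <= 20 else "large"


@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    """Return a cached reply for repeated prompts; otherwise 'call' a model."""
    model = choose_model(prompt)
    calls[model] += 1
    return f"[{model}] reply to: {prompt}"  # placeholder for a real API call


answer("What is your refund policy?")
answer("What is your refund policy?")  # identical prompt: served from cache
answer("Please compare these two contracts clause by clause. " * 5)

print(calls)  # {'small': 1, 'large': 1}
```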

Databases & SQL/indexes/btree

Which predicate can a standard B-tree index on created_at accelerate with a single index range scan?

Options

WHERE created_at >= '2024-01-01'
WHERE EXTRACT(YEAR FROM created_at) = 2024
WHERE created_at::text LIKE '%2024%'
WHERE created_at <> '2024-01-01'

Show answer

WHERE created_at >= '2024-01-01' is the one a B-tree can accelerate with a single range scan. A B-tree stores keys in sorted order, so range and equality predicates on the bare column map to a contiguous slice. The other options do not: wrapping the column in a function such as EXTRACT or a cast hides the sorted key from the planner, a leading-wildcard LIKE has no fixed prefix to seek to, and <> matches everything on both sides of one value, so none of them reduces to a single contiguous range.

Why:

A B-tree stores keys in sorted order, so range and equality predicates on the bare column (>=, >, <, BETWEEN, =) map to a contiguous slice it can scan. Wrapping the column in a function (EXTRACT, a cast) hides the sorted key from the planner unless a matching expression index exists. A leading-wildcard LIKE has no fixed prefix to seek to, and <> matches everything on both sides of one value, so neither reduces to a single contiguous range.
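You can watch a planner make this distinction with SQLite's `EXPLAIN QUERY PLAN`. This is a sketch under stated assumptions: SQLite stands in for whatever database you use, `strftime` plays the role that `EXTRACT` plays in the question, and the `events` table, `idx_events_created_at` index, and `plan` helper are invented for the demo. Exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT);
    CREATE INDEX idx_events_created_at ON events (created_at);
""")


def plan(where_clause: str) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT payload FROM events WHERE {where_clause}"
    ).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


# Bare column with a range predicate: the index can be searched.
range_plan = plan("created_at >= '2024-01-01'")
# Column wrapped in a function: the index on created_at no longer applies.
wrapped_plan = plan("strftime('%Y', created_at) = '2024'")

print(range_plan)
print(wrapped_plan)
```

The usual fix for the wrapped form is to rewrite it as a range on the bare column, for example `created_at >= '2024-01-01' AND created_at < '2025-01-01'`.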

HTTP & APIs/http-protocol/headers

A client sends Content-Type: application/json and Accept: application/xml on a POST. What does this tell the server?

Options

The request body is JSON; the client would prefer the response in XML
The request body is XML; the client would prefer the response in JSON
Both the request and response must be JSON
The server must reject the request because the two headers conflict

Show answer

It tells the server the request body is JSON and that the client would prefer the response in XML. Content-Type describes the media type of the body being sent, while Accept advertises which media types the client is willing to receive back. They describe opposite directions, so the two headers do not conflict.

Why:

Content-Type describes the media type of the body being sent (here the JSON request payload), while Accept advertises which media types the client is willing to receive in the response, driving content negotiation (RFC 9110 §8.3, §12.5.1). They describe opposite directions, so there is no conflict.
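The request from the question can be built, without sending it, using the standard library's `urllib.request.Request`. The URL and payload are placeholders. Note that `Request` normalizes header names with `str.capitalize()`, which is why the lookup key below is `Content-type`.

```python
import json
from urllib.request import Request

request = Request(
    "https://api.example.com/orders",  # placeholder URL; nothing is sent
    data=json.dumps({"sku": "abc"}).encode("utf-8"),
    headers={
        "Content-Type": "application/json",  # describes the body we are sending
        "Accept": "application/xml",         # describes what we want back
    },
    method="POST",
)

sent_as = request.get_header("Content-type")
wants_back = request.get_header("Accept")
print(sent_as, wants_back)  # application/json application/xml
```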
