Databases & SQL Interview Questions — Practice Quiz for All Levels

Reviewed by Mark Dickie · Last updated 29 June 2026

Databases and SQL are the foundational layer of nearly every production system — a structured way to store, retrieve, and manipulate data using a declarative query language standardised across dozens of engines. To do well in a database interview, you need to be solid on four things: writing correct SQL (joins, aggregations, subqueries), understanding how engines execute and optimise queries, knowing when to reach for an index versus a schema change, and being able to reason about transactions and isolation levels out loud. Expect questions to probe real trade-offs — a candidate who can explain why a clustered index on a high-write table can hurt throughput will stand out far more than one who just lists index types.

What the quiz covers

Questions span the full range from basic SELECT syntax to multi-table schema design and ACID guarantees. The difficulty curve below maps to what interviewers typically ask at each stage:

| Career stage | Typical topics tested | |---|---| | Junior / intern | SELECT, WHERE, ORDER BY, basic JOINs, primary keys | | Mid-level | GROUP BY, subqueries, NULLs, normalisation (1NF–3NF), foreign keys | | Senior | Window functions, query plans, index strategy, locking, MVCC | | Staff / architect | Partitioning, replication lag, CAP trade-offs, schema migration at scale |

Core concepts to review before you start

Work through these in order if you are short on time:

JOIN types — know the difference between INNER, LEFT, RIGHT, and FULL OUTER joins, and be ready to write each one from scratch on a whiteboard.
Indexes — understand B-tree vs. hash indexes, covering indexes, and the cost of indexing on columns with low cardinality.
ACID properties — be able to define Atomicity, Consistency, Isolation, and Durability and give a concrete example of what breaks when each one is violated.
Normalisation vs. denormalisation — know the normal forms up to 3NF, but also know when a read-heavy workload justifies intentional denormalisation.
Window functions — ROW_NUMBER(), RANK(), LAG(), and LEAD() now appear regularly in mid-to-senior screens; at minimum, understand the OVER (PARTITION BY ... ORDER BY ...) clause.

One area candidates consistently underestimate: NULL handling. In SQL, NULL is not zero and not an empty string — it means unknown, and comparisons with NULL using = always evaluate to NULL, not TRUE or FALSE. Use IS NULL / IS NOT NULL and be aware of how NULLs propagate through COUNT, SUM, and JOIN conditions.

At a glance

Questions	15
Difficulty	1–5 of 5
Formats	Multiple choice, Multiple answer, True / false, Code output, Find the bug, Short answer, Flashcard

What you'll review

select basics
joins
aggregation
btree
foreign keys
index tradeoffs
isolation levels
constraints
composite index
acid
normalization
subqueries
n plus one
query planning
covering index

Practice questions

Databases & SQL/querying/select-basics

Which keyword removes duplicate rows from a SELECT result set?

Options

DISTINCT
UNIQUE
GROUP
DEDUPE

Show answer

DISTINCT removes duplicate rows from a SELECT result set. Writing SELECT DISTINCT collapses rows that are identical across all selected columns into a single row. UNIQUE is a column or table constraint rather than a query keyword, and GROUP BY aggregates rows instead of simply de-duplicating them.

Why:

SELECT DISTINCT collapses rows that are identical across all selected columns into one. UNIQUE is a column/table constraint, not a query keyword, and GROUP BY aggregates rather than simply de-duplicating.

Databases & SQL/querying/joins

A LEFT JOIN between customers (left) and orders (right) returns a customer who has placed no orders. What appears in that row's orders columns?

Options

NULL in every orders column
The row is omitted entirely
0 in every orders column
An empty string in every orders column

Show answer

Every orders column holds NULL. A LEFT JOIN keeps each left-hand row even when no right-hand row matches the ON condition, filling the unmatched right-hand columns with NULL rather than omitting the row or substituting 0 or empty strings. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers who have placed no orders.

Why:

A LEFT JOIN keeps every left-hand row even when no right-hand row matches the ON condition; the unmatched right-hand columns are filled with NULL. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers with no orders.

Databases & SQL/querying/aggregation

A table has 10 rows; the manager_id column is NULL in 3 of them. How do COUNT(*) and COUNT(manager_id) differ?

Options

COUNT(*) returns 10; COUNT(manager_id) returns 7
Both return 10
Both return 7
COUNT(*) returns 7; COUNT(manager_id) returns 10

Show answer

COUNT(*) returns 10 and COUNT(manager_id) returns 7. COUNT(*) counts every row regardless of nulls, while COUNT(manager_id) counts only rows where that column is non-null, skipping the 3 nulls. Every standard aggregate except COUNT(*) ignores NULL inputs.

Why:

COUNT(*) counts rows regardless of nulls, so it returns 10. COUNT(manager_id) counts only rows where that expression is non-null, skipping the 3 nulls to return 7. Every standard aggregate except COUNT(*) ignores NULL inputs.

Databases & SQL/indexes/btree

Which predicate can a standard B-tree index on created_at accelerate with a single index range scan?

Options

WHERE created_at >= '2024-01-01'
WHERE EXTRACT(YEAR FROM created_at) = 2024
WHERE created_at::text LIKE '%2024%'
WHERE created_at <> '2024-01-01'

Show answer

WHERE created_at >= '2024-01-01' is the one a B-tree can accelerate with a single range scan. A B-tree stores keys in sorted order, so range and equality predicates on the bare column map to a contiguous slice. Wrapping the column in a function such as EXTRACT or a cast, or using a leading-wildcard LIKE or <>, defeats the index because the stored key order no longer lines up with the predicate.

Why:

A B-tree stores keys in sorted order, so range and equality predicates on the bare column (>=, >, <, BETWEEN, =) map to a contiguous slice it can scan. Wrapping the column in a function (EXTRACT, a cast) or using a leading-wildcard LIKE or <> makes the index unusable because the stored key order no longer lines up with the predicate.

Databases & SQL/schema-design/foreign-keys

What does a FOREIGN KEY constraint guarantee?

Options

Every value in the referencing column matches an existing value in the referenced column (referential integrity)
The referencing column values are unique
The referencing column is automatically indexed in every database
Both tables are stored in the same physical file

Show answer

A FOREIGN KEY guarantees referential integrity: every value in the referencing column matches an existing value in the referenced column. You cannot insert a child row pointing at a non-existent parent, nor orphan children by deleting a parent. It does not impose uniqueness, and PostgreSQL does not auto-create an index on the referencing column, though one is often added to speed joins and cascade checks.

Why:

A foreign key enforces referential integrity: you cannot insert a child row that points at a non-existent parent, and (depending on the referential action) you cannot orphan children by deleting a parent. It does not impose uniqueness, and although PostgreSQL does not auto-create an index on the referencing column, many people add one to speed joins and cascade checks.

Databases & SQL/indexes/index-tradeoffs

What is the primary trade-off of adding more indexes to a heavily-written table?

Options

Each INSERT/UPDATE/DELETE must also maintain every affected index, slowing writes and consuming storage
Reads on indexed columns become slower
The table can no longer enforce a primary key
Transactions can no longer be rolled back

Show answer

The trade-off is that every INSERT, UPDATE, and DELETE must also maintain each affected index, slowing writes and consuming extra storage. Indexes speed reads, but each one is an additional structure the database keeps consistent on every write, adding disk and cache pressure. The skill is indexing the columns that selective queries actually filter or join on, not indexing everything.

Why:

Indexes speed reads but every index is an extra structure the database must keep consistent on every write, so more indexes mean slower INSERT/UPDATE/DELETE and more disk and cache pressure. The skill is indexing the columns that selective queries actually filter or join on, not indexing everything.

Databases & SQL/transactions/isolation-levels

Under the SQL standard, which is the lowest isolation level that prevents phantom reads?

Options

SERIALIZABLE
REPEATABLE READ
READ COMMITTED
READ UNCOMMITTED

Show answer

SERIALIZABLE is the lowest standard isolation level that prevents phantom reads. Under the ANSI definitions, READ UNCOMMITTED allows dirty reads, READ COMMITTED prevents those but allows non-repeatable reads, and REPEATABLE READ prevents non-repeatable reads but still permits phantoms. Only SERIALIZABLE forbids phantom reads, though PostgreSQL's stronger REPEATABLE READ happens to block them too.

Why:

By the ANSI SQL definitions, READ UNCOMMITTED allows dirty reads, READ COMMITTED prevents those but allows non-repeatable reads, REPEATABLE READ prevents non-repeatable reads but still permits phantoms, and only SERIALIZABLE forbids phantom reads. (PostgreSQL's REPEATABLE READ is stronger than the standard and happens to block phantoms too, but the standard's guarantee starts at SERIALIZABLE.)

Databases & SQL/schema-design/constraints

Which of these are column- or table-level integrity constraints in standard SQL?

Options

NOT NULL
CHECK
UNIQUE
ORDER BY

Show answer

NOT NULL, CHECK, and UNIQUE are integrity constraints, but ORDER BY is not. Those three, alongside PRIMARY KEY and FOREIGN KEY, are rules the database enforces on every write. ORDER BY is a query clause that sorts a result set and constrains nothing about the stored data.

Why:

NOT NULL, CHECK, and UNIQUE (alongside PRIMARY KEY and FOREIGN KEY) are integrity constraints the database enforces on every write. ORDER BY is a query clause that sorts a result set; it constrains nothing about the stored data.

Databases & SQL/indexes/composite-index

You have a composite B-tree index on (tenant_id, created_at) in that column order. Which queries can use the index to seek rather than scan the whole table?

Options

WHERE tenant_id = 42
WHERE tenant_id = 42 AND created_at > '2024-01-01'
WHERE created_at > '2024-01-01' (no tenant_id predicate)
ORDER BY tenant_id, created_at after WHERE tenant_id = 42

Show answer

Filtering on tenant_id alone, on tenant_id plus a created_at range, and an ORDER BY tenant_id, created_at after a tenant_id filter can all seek — but filtering on created_at alone cannot. A composite index is usable only on a left-to-right prefix of its columns, so skipping the leading tenant_id defeats it. This is the classic left-prefix rule.

Why:

A composite index is usable only on a left-to-right prefix of its columns. Filtering on the leading tenant_id (a), or on tenant_id plus a range on the next column created_at (b), seeks efficiently, and the index's sort order also satisfies the matching ORDER BY (d). Filtering on created_at alone (c) skips the leading column, so the index can't seek — this is the classic left-prefix rule.

Databases & SQL/transactions/acid

Which statements correctly describe ACID properties?

Options

Atomicity: a transaction either fully commits or fully rolls back, never partially
Durability: once committed, changes survive a crash or power loss
Consistency: a transaction moves the database from one valid state to another, preserving all constraints
Isolation: every transaction must execute on its own physical server

Show answer

The Atomicity, Durability, and Consistency descriptions are correct, but the Isolation one is wrong. Atomicity means all-or-nothing, Durability means committed changes survive crashes, and Consistency means constraints are preserved across the transaction. Isolation governs how concurrent transactions are kept from interfering via the chosen isolation level; it has nothing to do with running each transaction on its own physical server.

Why:

Atomicity (all-or-nothing), Consistency (constraints/invariants preserved across the transaction), and Durability (committed data persists through failures) are stated correctly. Isolation concerns how concurrent transactions are kept from interfering via the chosen isolation level — it has nothing to do with running on separate servers.

Databases & SQL/schema-design/normalization

Third normal form (3NF) requires that no non-key column depends on another non-key column (no transitive dependencies on the primary key).

Show answer

True. Third normal form forbids transitive dependencies, requiring every non-key attribute to depend on the key, the whole key, and nothing but the key. Storing a zip_code alongside a city that the zip_code functionally determines is the classic 3NF violation, because city depends on zip_code rather than directly on the primary key.

Why:

3NF builds on 2NF and additionally forbids transitive dependencies: every non-key attribute must depend on the key, the whole key, and nothing but the key. Storing a zip_code and a city that is functionally determined by the zip_code in the same table is the classic 3NF violation, since city depends on zip_code rather than directly on the primary key.

Databases & SQL/querying/subqueries

customers(id) has ids 1, 2, 3. orders(customer_id) has rows for customer_id 1, 1, 3. What single number does this scalar query return?

SELECT COUNT(*)
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

Show answer

Why:

EXISTS is true for a customer as soon as the correlated subquery finds at least one matching order; it never multiplies rows. Customers 1 and 3 have orders, customer 2 does not, so exactly 2 rows satisfy the predicate and COUNT(*) returns 2.

Databases & SQL/db-performance/n-plus-one

This pseudocode renders a list of posts with their author names but is slow on large lists. What is the performance bug?

posts = db.query("SELECT id, author_id, title FROM posts")
for post in posts:
    author = db.query(
        "SELECT name FROM authors WHERE id = ?", post.author_id)
    render(post.title, author.name)

Options

It is an N+1 query: 1 query for the posts plus one extra query per post to fetch its author
The posts query is missing a WHERE clause and returns too many columns
SELECT name should be SELECT * so the author loads fully
The loop should run the author query before the posts query

Show answer

It is an N+1 query: 1 query for the posts plus one extra query per post to fetch its author

Why:

This is the N+1 query problem: one query loads N posts, then the loop fires one additional query per post to fetch its author — N+1 round trips total. Replace it with a single JOIN (or a batched WHERE id IN (...) over the collected author ids) so the database returns posts and author names in one round trip.

Databases & SQL/db-performance/query-planning

What does EXPLAIN (and EXPLAIN ANALYZE) show you, and why is it useful when a query is slow?

Show answer

EXPLAIN shows the query planner's chosen execution plan — the access methods (sequential scan vs index scan), join strategies, and the planner's estimated row counts and costs — without running the query. EXPLAIN ANALYZE actually executes it and adds real timings and actual row counts, so you can spot a sequential scan that should be an index scan or a large gap between estimated and actual rows that points to bad statistics.

Why:

EXPLAIN exposes how the planner intends to run a query so you can diagnose slowness: an unexpected sequential scan, an expensive join order, or estimates that diverge wildly from reality (stale statistics). EXPLAIN ANALYZE runs the query and reports actual time and rows, which is the definitive way to confirm where the cost is going.

Databases & SQL/indexes/covering-index

What is a covering index, and why is it faster than a normal index lookup?

Show answer

A covering index contains every column a query needs (in its key or, in PostgreSQL, an INCLUDE clause), so the query is answered entirely from the index. This gives an index-only scan that skips the extra heap/table fetch a normal index lookup needs to retrieve the remaining columns.

Why:

Normally an index points back to the table row to fetch columns it doesn't store, costing an extra read per match. When the index already holds all referenced columns, the engine does an index-only scan and never touches the table — a big win for hot, selective queries.

Sources

The official documentation these questions are checked against:

PostgreSQL documentation

Job market

See databases-sql salaries and hiring demand from live job postings.

Practice this for real

CodePrep turns your target job description into an adaptive quiz from a bank of tagged questions, scores your answers, and resurfaces the topics you miss.

Start free

Databases & SQL Interview Questions — Practice Quiz for All Levels

What the quiz covers

Core concepts to review before you start

At a glance

What you'll review

Practice questions

Which keyword removes duplicate rows from a SELECT result set?

A LEFT JOIN between customers (left) and orders (right) returns a customer who has placed no orders. What appears in that row's orders columns?

A table has 10 rows; the manager_id column is NULL in 3 of them. How do COUNT(*) and COUNT(manager_id) differ?

Which predicate can a standard B-tree index on created_at accelerate with a single index range scan?

What does a FOREIGN KEY constraint guarantee?

What is the primary trade-off of adding more indexes to a heavily-written table?

Under the SQL standard, which is the lowest isolation level that prevents phantom reads?

Which of these are column- or table-level integrity constraints in standard SQL?

You have a composite B-tree index on (tenant_id, created_at) in that column order. Which queries can use the index to seek rather than scan the whole table?

Which statements correctly describe ACID properties?

Third normal form (3NF) requires that no non-key column depends on another non-key column (no transitive dependencies on the primary key).

customers(id) has ids 1, 2, 3. orders(customer_id) has rows for customer_id 1, 1, 3. What single number does this scalar query return?

This pseudocode renders a list of posts with their author names but is slow on large lists. What is the performance bug?

What does EXPLAIN (and EXPLAIN ANALYZE) show you, and why is it useful when a query is slow?

What is a covering index, and why is it faster than a normal index lookup?

Sources

Related interview questions

Job market

Practice this for real

New topics and job-market signal, in your inbox