Databases & SQL Interview Questions — Practice Quiz for All Levels
Reviewed by Mark Dickie · Last updated
Databases and SQL are the foundational layer of nearly every production system — a structured way to store, retrieve, and manipulate data using a declarative query language standardised across dozens of engines. To do well in a database interview, you need to be solid on four things: writing correct SQL (joins, aggregations, subqueries), understanding how engines execute and optimise queries, knowing when to reach for an index versus a schema change, and being able to reason about transactions and isolation levels out loud. Expect questions to probe real trade-offs — a candidate who can explain why a clustered index on a high-write table can hurt throughput will stand out far more than one who just lists index types.
What the quiz covers
Questions span the full range from basic SELECT syntax to multi-table schema design and ACID guarantees. The difficulty curve below maps to what interviewers typically ask at each stage:
| Career stage | Typical topics tested | |---|---| | Junior / intern | SELECT, WHERE, ORDER BY, basic JOINs, primary keys | | Mid-level | GROUP BY, subqueries, NULLs, normalisation (1NF–3NF), foreign keys | | Senior | Window functions, query plans, index strategy, locking, MVCC | | Staff / architect | Partitioning, replication lag, CAP trade-offs, schema migration at scale |
Core concepts to review before you start
Work through these in order if you are short on time:
- JOIN types — know the difference between INNER, LEFT, RIGHT, and FULL OUTER joins, and be ready to write each one from scratch on a whiteboard.
- Indexes — understand B-tree vs. hash indexes, covering indexes, and the cost of indexing on columns with low cardinality.
- ACID properties — be able to define Atomicity, Consistency, Isolation, and Durability and give a concrete example of what breaks when each one is violated.
- Normalisation vs. denormalisation — know the normal forms up to 3NF, but also know when a read-heavy workload justifies intentional denormalisation.
- Window functions —
ROW_NUMBER(),RANK(),LAG(), andLEAD()now appear regularly in mid-to-senior screens; at minimum, understand theOVER (PARTITION BY ... ORDER BY ...)clause.
One area candidates consistently underestimate: NULL handling. In SQL, NULL is not zero and not an empty string — it means unknown, and comparisons with NULL using = always evaluate to NULL, not TRUE or FALSE. Use IS NULL / IS NOT NULL and be aware of how NULLs propagate through COUNT, SUM, and JOIN conditions.
At a glance
| Questions | 15 |
|---|---|
| Difficulty | 1–5 of 5 |
| Formats | Multiple choice, Multiple answer, True / false, Code output, Find the bug, Short answer, Flashcard |
What you'll review
- select basics
- joins
- aggregation
- btree
- foreign keys
- index tradeoffs
- isolation levels
- constraints
- composite index
- acid
- normalization
- subqueries
- n plus one
- query planning
- covering index
Practice questions
Databases & SQL/querying/select-basics
Which keyword removes duplicate rows from a SELECT result set?
Options
DISTINCTUNIQUEGROUPDEDUPE
Show answer
DISTINCT removes duplicate rows from a SELECT result set. Writing SELECT DISTINCT collapses rows that are identical across all selected columns into a single row. UNIQUE is a column or table constraint rather than a query keyword, and GROUP BY aggregates rows instead of simply de-duplicating them.
SELECT DISTINCT collapses rows that are identical across all selected columns into one. UNIQUE is a column/table constraint, not a query keyword, and GROUP BY aggregates rather than simply de-duplicating.
Databases & SQL/querying/joins
A LEFT JOIN between customers (left) and orders (right) returns a customer who has placed no orders. What appears in that row's orders columns?
Options
NULLin everyorderscolumn- The row is omitted entirely
0in everyorderscolumn- An empty string in every
orderscolumn
Show answer
Every orders column holds NULL. A LEFT JOIN keeps each left-hand row even when no right-hand row matches the ON condition, filling the unmatched right-hand columns with NULL rather than omitting the row or substituting 0 or empty strings. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers who have placed no orders.
A LEFT JOIN keeps every left-hand row even when no right-hand row matches the ON condition; the unmatched right-hand columns are filled with NULL. This is exactly why a WHERE orders.id IS NULL filter after a LEFT JOIN finds customers with no orders.
Databases & SQL/querying/aggregation
A table has 10 rows; the manager_id column is NULL in 3 of them. How do COUNT(*) and COUNT(manager_id) differ?
Options
COUNT(*)returns 10;COUNT(manager_id)returns 7- Both return 10
- Both return 7
COUNT(*)returns 7;COUNT(manager_id)returns 10
Show answer
COUNT(*) returns 10 and COUNT(manager_id) returns 7. COUNT(*) counts every row regardless of nulls, while COUNT(manager_id) counts only rows where that column is non-null, skipping the 3 nulls. Every standard aggregate except COUNT(*) ignores NULL inputs.
COUNT(*) counts rows regardless of nulls, so it returns 10. COUNT(manager_id) counts only rows where that expression is non-null, skipping the 3 nulls to return 7. Every standard aggregate except COUNT(*) ignores NULL inputs.
Databases & SQL/indexes/btree
Which predicate can a standard B-tree index on created_at accelerate with a single index range scan?
Options
WHERE created_at >= '2024-01-01'WHERE EXTRACT(YEAR FROM created_at) = 2024WHERE created_at::text LIKE '%2024%'WHERE created_at <> '2024-01-01'
Show answer
WHERE created_at >= '2024-01-01' is the one a B-tree can accelerate with a single range scan. A B-tree stores keys in sorted order, so range and equality predicates on the bare column map to a contiguous slice. Wrapping the column in a function such as EXTRACT or a cast, or using a leading-wildcard LIKE or <>, defeats the index because the stored key order no longer lines up with the predicate.
A B-tree stores keys in sorted order, so range and equality predicates on the bare column (>=, >, <, BETWEEN, =) map to a contiguous slice it can scan. Wrapping the column in a function (EXTRACT, a cast) or using a leading-wildcard LIKE or <> makes the index unusable because the stored key order no longer lines up with the predicate.
Databases & SQL/schema-design/foreign-keys
What does a FOREIGN KEY constraint guarantee?
Options
- Every value in the referencing column matches an existing value in the referenced column (referential integrity)
- The referencing column values are unique
- The referencing column is automatically indexed in every database
- Both tables are stored in the same physical file
Show answer
A FOREIGN KEY guarantees referential integrity: every value in the referencing column matches an existing value in the referenced column. You cannot insert a child row pointing at a non-existent parent, nor orphan children by deleting a parent. It does not impose uniqueness, and PostgreSQL does not auto-create an index on the referencing column, though one is often added to speed joins and cascade checks.
A foreign key enforces referential integrity: you cannot insert a child row that points at a non-existent parent, and (depending on the referential action) you cannot orphan children by deleting a parent. It does not impose uniqueness, and although PostgreSQL does not auto-create an index on the referencing column, many people add one to speed joins and cascade checks.
Databases & SQL/indexes/index-tradeoffs
What is the primary trade-off of adding more indexes to a heavily-written table?
Options
- Each INSERT/UPDATE/DELETE must also maintain every affected index, slowing writes and consuming storage
- Reads on indexed columns become slower
- The table can no longer enforce a primary key
- Transactions can no longer be rolled back
Show answer
The trade-off is that every INSERT, UPDATE, and DELETE must also maintain each affected index, slowing writes and consuming extra storage. Indexes speed reads, but each one is an additional structure the database keeps consistent on every write, adding disk and cache pressure. The skill is indexing the columns that selective queries actually filter or join on, not indexing everything.
Indexes speed reads but every index is an extra structure the database must keep consistent on every write, so more indexes mean slower INSERT/UPDATE/DELETE and more disk and cache pressure. The skill is indexing the columns that selective queries actually filter or join on, not indexing everything.
Databases & SQL/transactions/isolation-levels
Under the SQL standard, which is the lowest isolation level that prevents phantom reads?
Options
SERIALIZABLEREPEATABLE READREAD COMMITTEDREAD UNCOMMITTED
Show answer
SERIALIZABLE is the lowest standard isolation level that prevents phantom reads. Under the ANSI definitions, READ UNCOMMITTED allows dirty reads, READ COMMITTED prevents those but allows non-repeatable reads, and REPEATABLE READ prevents non-repeatable reads but still permits phantoms. Only SERIALIZABLE forbids phantom reads, though PostgreSQL's stronger REPEATABLE READ happens to block them too.
By the ANSI SQL definitions, READ UNCOMMITTED allows dirty reads, READ COMMITTED prevents those but allows non-repeatable reads, REPEATABLE READ prevents non-repeatable reads but still permits phantoms, and only SERIALIZABLE forbids phantom reads. (PostgreSQL's REPEATABLE READ is stronger than the standard and happens to block phantoms too, but the standard's guarantee starts at SERIALIZABLE.)
Databases & SQL/schema-design/constraints
Which of these are column- or table-level integrity constraints in standard SQL?
Options
NOT NULLCHECKUNIQUEORDER BY
Show answer
NOT NULL, CHECK, and UNIQUE are integrity constraints, but ORDER BY is not. Those three, alongside PRIMARY KEY and FOREIGN KEY, are rules the database enforces on every write. ORDER BY is a query clause that sorts a result set and constrains nothing about the stored data.
NOT NULL, CHECK, and UNIQUE (alongside PRIMARY KEY and FOREIGN KEY) are integrity constraints the database enforces on every write. ORDER BY is a query clause that sorts a result set; it constrains nothing about the stored data.
Databases & SQL/indexes/composite-index
You have a composite B-tree index on (tenant_id, created_at) in that column order. Which queries can use the index to seek rather than scan the whole table?
Options
WHERE tenant_id = 42WHERE tenant_id = 42 AND created_at > '2024-01-01'WHERE created_at > '2024-01-01'(notenant_idpredicate)ORDER BY tenant_id, created_atafterWHERE tenant_id = 42
Show answer
Filtering on tenant_id alone, on tenant_id plus a created_at range, and an ORDER BY tenant_id, created_at after a tenant_id filter can all seek — but filtering on created_at alone cannot. A composite index is usable only on a left-to-right prefix of its columns, so skipping the leading tenant_id defeats it. This is the classic left-prefix rule.
A composite index is usable only on a left-to-right prefix of its columns. Filtering on the leading tenant_id (a), or on tenant_id plus a range on the next column created_at (b), seeks efficiently, and the index's sort order also satisfies the matching ORDER BY (d). Filtering on created_at alone (c) skips the leading column, so the index can't seek — this is the classic left-prefix rule.
Databases & SQL/transactions/acid
Which statements correctly describe ACID properties?
Options
- Atomicity: a transaction either fully commits or fully rolls back, never partially
- Durability: once committed, changes survive a crash or power loss
- Consistency: a transaction moves the database from one valid state to another, preserving all constraints
- Isolation: every transaction must execute on its own physical server
Show answer
The Atomicity, Durability, and Consistency descriptions are correct, but the Isolation one is wrong. Atomicity means all-or-nothing, Durability means committed changes survive crashes, and Consistency means constraints are preserved across the transaction. Isolation governs how concurrent transactions are kept from interfering via the chosen isolation level; it has nothing to do with running each transaction on its own physical server.
Atomicity (all-or-nothing), Consistency (constraints/invariants preserved across the transaction), and Durability (committed data persists through failures) are stated correctly. Isolation concerns how concurrent transactions are kept from interfering via the chosen isolation level — it has nothing to do with running on separate servers.
Databases & SQL/schema-design/normalization
Third normal form (3NF) requires that no non-key column depends on another non-key column (no transitive dependencies on the primary key).
Show answer
True. Third normal form forbids transitive dependencies, requiring every non-key attribute to depend on the key, the whole key, and nothing but the key. Storing a zip_code alongside a city that the zip_code functionally determines is the classic 3NF violation, because city depends on zip_code rather than directly on the primary key.
3NF builds on 2NF and additionally forbids transitive dependencies: every non-key attribute must depend on the key, the whole key, and nothing but the key. Storing a zip_code and a city that is functionally determined by the zip_code in the same table is the classic 3NF violation, since city depends on zip_code rather than directly on the primary key.
Databases & SQL/querying/subqueries
customers(id) has ids 1, 2, 3. orders(customer_id) has rows for customer_id 1, 1, 3. What single number does this scalar query return?
SELECT COUNT(*)
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);Show answer
2
EXISTS is true for a customer as soon as the correlated subquery finds at least one matching order; it never multiplies rows. Customers 1 and 3 have orders, customer 2 does not, so exactly 2 rows satisfy the predicate and COUNT(*) returns 2.
Databases & SQL/db-performance/n-plus-one
This pseudocode renders a list of posts with their author names but is slow on large lists. What is the performance bug?
posts = db.query("SELECT id, author_id, title FROM posts")
for post in posts:
author = db.query(
"SELECT name FROM authors WHERE id = ?", post.author_id)
render(post.title, author.name)Options
- It is an N+1 query: 1 query for the posts plus one extra query per post to fetch its author
- The posts query is missing a
WHEREclause and returns too many columns SELECT nameshould beSELECT *so the author loads fully- The loop should run the author query before the posts query
Show answer
It is an N+1 query: 1 query for the posts plus one extra query per post to fetch its author
This is the N+1 query problem: one query loads N posts, then the loop fires one additional query per post to fetch its author — N+1 round trips total. Replace it with a single JOIN (or a batched WHERE id IN (...) over the collected author ids) so the database returns posts and author names in one round trip.
Databases & SQL/db-performance/query-planning
What does EXPLAIN (and EXPLAIN ANALYZE) show you, and why is it useful when a query is slow?
Show answer
EXPLAIN shows the query planner's chosen execution plan — the access methods (sequential scan vs index scan), join strategies, and the planner's estimated row counts and costs — without running the query. EXPLAIN ANALYZE actually executes it and adds real timings and actual row counts, so you can spot a sequential scan that should be an index scan or a large gap between estimated and actual rows that points to bad statistics.
EXPLAIN exposes how the planner intends to run a query so you can diagnose slowness: an unexpected sequential scan, an expensive join order, or estimates that diverge wildly from reality (stale statistics). EXPLAIN ANALYZE runs the query and reports actual time and rows, which is the definitive way to confirm where the cost is going.
Databases & SQL/indexes/covering-index
What is a covering index, and why is it faster than a normal index lookup?
Show answer
A covering index contains every column a query needs (in its key or, in PostgreSQL, an INCLUDE clause), so the query is answered entirely from the index. This gives an index-only scan that skips the extra heap/table fetch a normal index lookup needs to retrieve the remaining columns.
Normally an index points back to the table row to fetch columns it doesn't store, costing an extra read per match. When the index already holds all referenced columns, the engine does an index-only scan and never touches the table — a big win for hot, selective queries.
Sources
The official documentation these questions are checked against:
Related interview questions
Job market
See databases-sql salaries and hiring demand from live job postings.
Practice this for real
CodePrep turns your target job description into an adaptive quiz from a bank of tagged questions, scores your answers, and resurfaces the topics you miss.