Code portability
https://frontendmasters.com/blog/code-portability/
Mon, 22 Sep 2025

Another good one from Nicholas C. Zakas, this time on code portability. Here are some choices he made for a recent project:

  • Astro, because it can be deployed on a “wide range of cloud services” and also supports a variety of front-end frameworks, so you can “start with React and later move to Vue or Solid without changing your application framework”.
  • Hono, because “it can run almost anywhere JavaScript runs: Node.js, Deno, Bun, Cloudflare Workers…”
  • Supabase because “if needed, you can export your entire database and run it anywhere PostgreSQL is supported”.
  • Cloudflare R2, because it’s “an S3-compatible service, so you can switch providers without hassle”.

Portability indeed!

Advanced PostgreSQL Indexing: Multi-Key Queries and Performance Optimization
https://frontendmasters.com/blog/advanced-postgresql-indexing/
Wed, 03 Sep 2025

Welcome to part two of our exploration of Postgres indexes. Be sure to check out part one if you haven’t already. We’ll be picking up exactly where we left off.


We have the same books table as before, containing approximately 90 million records.

Editors’ note: Need to bone up on PostgreSQL all around? Our course Complete Intro to SQL & PostgreSQL from Brian Holt will be perfect for you.

Filtering and sorting

Let’s dive right in. Imagine you work for a book distribution company. You’re responsible for publishers and need to query info on them. There are approximately 250,000 different publishers, with a wide variance in the number of books published by each, which we’ll explore.

Let’s start easy. You want to see the top 10 books, sorted alphabetically, for a single publisher.

explain analyze
select *
from books
where publisher = 157595
order by title
limit 10;

This publisher is relatively small, with only 65 books in its catalog. Nonetheless, the query is slow to run, taking almost four seconds.

If you followed the same steps from part 1 to create this same database, note that your ids will be different.

A detailed execution plan showing the performance analysis of a query in Postgres, including sorting and filtering operations.

This is hardly surprising; there are a lot of rows in our table, and finding the rows for that publisher takes a while, since Postgres has to scan the entire heap.

So we add an index on, for now, just publisher.

CREATE INDEX idx_publisher ON books(publisher);

We can think of our index in this way. It just helps us identify all the book entries by publisher. To get the rest of the info on the book, we go to the heap.

A visual representation of a tree structure showing nodes and leaves, illustrating the hierarchical arrangement of data, likely related to a database index.

And now our same query is incredibly fast.

Execution plan displayed for a SQL query, includes cost, index conditions, number of rows, and execution time details.

Nothing surprising or interesting.

But now you need to run the same query, but on a different publisher, number 210537. This is the biggest publisher in the entire database, with over 2 million books. Let’s see how our index fares.

explain analyze
select *
from books
where publisher = 210537
order by title
limit 10;
A detailed query plan output for a PostgreSQL database showing execution details, including cost, rows returned, and time taken for sorting and scanning operations.

Actually, our index wasn’t used at all. Postgres just scanned the whole table, grabbing our publisher along the way, and then sorted the results to get the top 10. We’ll discuss why a little later, as we did in the prior post, but the short of it is that the random heap accesses from reading so many entries off of an index would be expensive; Postgres decided the scan would be cheaper. These decisions are all about tradeoffs and are governed by statistics and cost estimates.

Previously, we threw the “other” field into the INCLUDE() list, so the engine wouldn’t have to leave the index to get the other field it needed. In this case, we’re selecting everything. I said previously to be diligent in avoiding unnecessary columns in the SELECT clause for just this reason, but here, we assume we actually do need all these columns.

We probably don’t want to dump every single column into the INCLUDE list of the index: we’d basically just be redefining our table into an index.

But why do we need to read so many rows in the first place? We have a limit of 10 on our query. The problem, of course, is that we’re ordering on title. And Postgres needs to see all rows for a publisher (2 million rows in this case) in order to sort them, and grab the first 10.

What if we built an index on publisher, and then title?

CREATE INDEX idx_publisher_title ON books(publisher, title);

That would look like this:

A diagram illustrating a B-tree structure for organizing books, showing nodes and leaves with book titles like 'Jane Eyre' and 'War and Peace' arranged hierarchically.

If Postgres were to search for a specific publisher, it could just seek down to the start of that publisher’s books, and then read however many it needed, right off the leaf nodes, couldn’t it? There could be 2 million book entries in the leaf nodes, but Postgres could just read the first 10, and be guaranteed that they’re the first 10, since that’s how the index is ordered.

Let’s try it.

Execution plan showing the limit, index scan using idx_publisher_title on books, and the planning and execution times.

We got the top 10 books, sorted, from a list of over two million in less than a fourth of a millisecond. Amazing!

More publishers!

Now your boss comes and tells you to query the top 10 books, sorted alphabetically, as before, but over both publishers, combined. To be clear, the requirement is to take all books both publishers have published, combine them, then get the first ten, alphabetically.

Easy, you say assuredly, fresh off the high of seeing Postgres grab you that same data for your enormous publisher in under a millisecond.

You can put both publisher ids into an IN clause. Then, Postgres can search for each, one at a time, save the starting points of both, and then start reading forward on both, and sort of merge them together, taking the smaller title from either, until you have 10 books total.

Let’s try it!

explain analyze
select *
from books
where publisher in (157595, 210537)
order by title
limit 10;

Which produces this

Query plan output from a PostgreSQL execution showing the execution strategy and performance metrics.

[Sad Trombone]

Let’s re-read my completely made-up, assumed chain of events Postgres would take, from above.

Postgres can search for each, one at a time, save the starting points of both, and then start reading forward on both, and sort of merge them together, taking the smaller title from either, until you have 10 books total.

It reads like the Charlie meme from Always Sunny in Philadelphia.

A scene from a television show featuring a man pointing at a chaotic board covered in papers, with red strings connecting different documents, emphasizing a frenzied investigation.

If your description of what the database will do sounds like something that would fit with this meme, you’re probably overthinking things.

Postgres operates on very simple operations that it chains together. Index Scan, Gather Merge, Sort, Sequential Scan, etc.

Searching multiple publishers

To be crystal clear, Postgres absolutely can search multiple keys from an index. Here’s the identical query from a moment ago, but with two small publishers for the publisher ids, each of which has just a few hundred books.
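The query itself looks like this (the second publisher id, 141129, is taken from the Index Cond in the execution plan that follows):

explain analyze
select *
from books
where publisher in (157595, 141129)
order by title
limit 10;

And here’s the execution plan: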

Query plan visualization showing execution details for a SQL command, including limit, sort key, and index scan operations with timing statistics.

It did indeed do an index scan, on that same index. It just matched two values at once.

Rather than taking one path down the B Tree, it takes multiple paths down the B Tree, based on the multiple key value matches.

Index Cond: (publisher = ANY ('{157595,141129}'::integer[]))

That gives us all rows for either publisher. Then it needs to sort them, which it does next, followed by the limit.

Why does it need to sort them? When we have a single publisher, we know all values under that publisher are ordered.

Look at the index.

A flowchart representing a tree structure of book titles, with nodes for major titles like 'Jane Eyre' and 'War and Peace,' and leaves for individual entries, including 'The Great Gatsby' and 'Moby Dick.'

Imagine we searched for publisher 8. Postgres can go directly to the beginning of that publisher, and just read:

"Animal Farm"
"Of Mice and Men"

Look what happens when we search for two publishers, 8 and also, now, 21.

A tree structure diagram representing a set of books, showing nodes with titles such as 'Jane Eyre' and 'War and Peace', along with leaf nodes containing book entries like 'The Great Gatsby', 'Animal Farm', and 'To Kill a Mockingbird'.

We can’t just start reading for those matched records. That would give us

"Animal Farm"
"Of Mice and Men"
"Lord of The Flies"
"The Catcher in The Rye"

The books under each publisher are ordered, but the overall list of matches is not. And again, Postgres operates on simple operations. Elaborate meta descriptions like "well, it'll just merge the matches from each publisher, taking the lesser of the next entry from either, until the limit is satisfied" won't show up in your execution plan, at least not directly.

Why did the publisher id change the plan?

Before we make this query fast, let’s briefly consider why our query’s plan changed so radically between searching for two small publishers compared to an enormous publisher and a small one.

As we discussed in part 1, Postgres tracks and uses statistics about your data in order to craft the best execution plan it can. Here, when you searched for the large publisher, it realized the query would yield an enormous number of rows. That led it to decide that simply scanning through the heap directly would be faster than incurring the large amount of random I/O that would come from following so many matches in the index's leaf nodes over to the corresponding locations on the heap. Random I/O is bad, and Postgres will usually try to avoid it.
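If you're curious what the planner is working from, you can peek at those statistics yourself. Here's a quick sketch against the built-in pg_stats view (the exact values you see will depend on your own data and on when ANALYZE last ran):

-- n_distinct and most_common_freqs feed the row-count estimates that
-- decide between an index scan and a sequential scan
select n_distinct, most_common_vals, most_common_freqs
from pg_stats
where tablename = 'books'
  and attname = 'publisher';

-- refresh the statistics if they look stale
analyze books;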

Crafting a better query

You can absolutely have Postgres find the top 10 books in both publishers, and then put them together, sorted, and take the first 10 from there. You just have to be explicit about it.

explain analyze
with pub1 as (
    select * from books
    where publisher = 157595
    order by title limit 10
), pub2 as (
    select * from books
    where publisher = 210537
    order by title limit 10
)
select * from pub1
union all
select * from pub2
order by title
limit 10;

The syntax below is called a common table expression, or a CTE. It’s basically a query that we define, and then query against later.

with pub1 as (
    select * from books
    where publisher = 157595
    order by title limit 10
)

Let’s run it!

The execution plan is beautiful

A screenshot displaying a query execution plan from a database, showing the steps involved in retrieving data for two publishers from a books table using indexed scans and sorting.

It’s fast! As you can see, it runs in less than a fifth of a millisecond (0.186ms, but who’s counting?).

Always read these from the bottom:

Execution plan showing the performance of an index scan in a PostgreSQL database, with details on limits, rows processed, and execution time.

It’s the same exact index scan from before, but on a single publisher, with a limit of 10, run twice. Postgres can seek to the right publisher, and just read 10 for the first publisher, and then repeat for the second publisher. Then it puts those lists together.

Remember the silly, contrived Postgres operation I made up before?

… and then start reading forward on both, and sort of merge them together, taking the smaller title from either, until you have 10 books total.

You’re not going to believe this, but that’s exactly what the Merge Append on line 2 does

->  Merge Append  (cost=1.40..74.28 rows=20 width=111) (actual time=0.086..0.115 rows=10 loops=1)

You can achieve amazing things with modern databases if you know how to structure your queries just right.

How does this scale?

You won’t want to write queries like this manually. Presumably, you’d have application code taking a list of publisher ids, and constructing something like this. How will it perform as you add more and more publishers?

I’ve explored this very idea on larger production sets of data (much larger than what we’re using here). I found that, somewhere around a thousand ids, the performance does break down. But not because there’s too much data to work with. The execution of those queries, even with a thousand ids, took only a few hundred ms. But the Planning Time started to take many, many seconds. It turns out that having Postgres parse a thousand CTEs and put a plan together takes time.

Version 2

We’re onto something, for sure. But can we take a list of ids, and force them into individual queries that match on that specific id, with a limit, and then select from the overall bucket of results? Exactly like before, but without having to manually cobble together a CTE for each id?

Where there’s a will, there’s a way.

explain analyze
with ids as (
    select * from (
      values (157595), (210537)
    ) t(id)
), results as (
    select bookInfo.*
    from ids
    cross join lateral (
      select *
      from books
      where publisher = ids.id
      order by title
      limit 10
    ) bookInfo
)
select *
from results
order by title
limit 10;

Let’s walk through this.

Our ids CTE:

with ids as (
    select * from (
      values (157595), (210537)
    ) t(id)
)

This defines a pseudo-table that has one column, with two rows. The rows have values of our publisher ids for the sole column: 157595 and 210537.

values (157595), (210537)

But if it’s a table, how do we query against the column? It needs to have a name. That’s what this syntax is.

t(id)

We gave that column a name of id.

The results CTE is where the real work happens.

results as (
    select bookInfo.*
    from ids
    cross join lateral (
      select *
      from books
      where publisher = ids.id
      order by title
      limit 10
    ) bookInfo
)

We query against our ids table, and then use the ugly cross join lateral expression as a neat trick to run our normal books query, but with access to the publisher value in the ids CTE. The value in the ids CTE is the publisher id. So we’ve achieved what we want: we’re conceptually looping through those ids, and then running our fast query on each.

The term lateral is the key. Think of (American) football, where a lateral is a sideways pass. Here, the lateral keyword allows us to “laterally” reference the ids.id value from the expression right beside it; the ids CTE laterals each id over to the results CTE.

That coaxes Postgres to run its normal index scan, followed by a read of the next 10 rows. That happens once for each id. That whole meta-list will then contain (up to) 10 rows for each publisher, and then this…

select *
from results
order by title
limit 10;

… re-sorts, and takes the first 10.

In my own experience, this scales fabulously. Even with a few thousand ids I couldn’t get this basic setup to take longer than half a second, even on a much larger table than we’ve been looking at here.

Let’s run it!

Let’s see what this version of our query looks like

A query plan execution analysis demonstrating performance improvements after creating an index on the 'books' table in Postgres.

Still a small fraction of a millisecond, but ever so slightly slower; this now runs in 0.207ms. And the execution plan is a bit longer and more complex.

->  Nested Loop  (cost=0.69..81.19 rows=20 width=111) (actual time=0.042..0.087 rows=20 loops=1)

A nested loop join is a pretty simple (and usually pretty slow) join algorithm. It just takes each value in the one list, and then applies it to each value in the second list. In this case, though, it’s taking values from a static list and applying them against an incredibly fast query.

The left side of the join is each id from that static table we built

->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)

The right side is our normal (fast) query that we’ve seen a few times now.

->  Limit  (cost=0.69..40.48 rows=10 width=111) (actual time=0.024..0.037 rows=10 loops=2)
      ->  Index Scan using idx_publisher_title on books  (cost=0.69..2288.59 rows=575 width=111) (actual time=0.023..0.034 rows=10 loops=2)
         Index Cond: (publisher = "*VALUES*".column1)

However, our nice Merge Append is gone, replaced with a normal sort. The reason is that we replaced the discrete CTEs, each of which produced a separate, identically sorted output that the planner could recognize and feed into a Merge Append, with a single join. Merge Append works on multiple, independently sorted streams of data. A regular join produces one stream of data, which therefore needs to be sorted.

But this is no tragedy. The query runs in a tiny fraction of a millisecond, and will not suffer planning time degradation like the previous CTE version would, as we add more and more publisher ids. Plus, the sort is over just N*10 records, where N is the number of publishers. It would take a catastrophically large N to wind up with enough rows where Postgres would struggle to sort them quickly, especially since the limit of 10 would allow it to do an efficient top-N heapsort, like we saw in part 1.
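One more variation worth sketching: in application code, rather than splicing each id into the VALUES list, you could pass the whole list as a single array parameter and expand it with unnest(). It should produce essentially the same nested loop plan, and it's easier to parameterize (the ids below are the same two publishers from before):

with ids as (
    -- in real application code this array would be a bound parameter
    select unnest(array[157595, 210537]) as id
), results as (
    select bookInfo.*
    from ids
    cross join lateral (
      select *
      from books
      where publisher = ids.id
      order by title
      limit 10
    ) bookInfo
)
select *
from results
order by title
limit 10;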

Stepping back

The hardest part of writing this post is knowing when to stop. I could easily write as much content again: we haven’t even gotten into joins, and how indexes can help there, or even materialized views. This is an endless topic, and one that I enjoy, but we’ll stop here for now.

The one theme throughout can be summed up as: understand how your data is stored, and craft your queries to make the best use possible of that knowledge. If you’re not sure exactly how to craft your queries to do this, then use your knowledge of how indexes work, and of what you want your queries to accomplish, to ask your favorite AI model an extremely specific question. It’s very likely to at least get you closer to your answer. Oftentimes knowing what to ask is half the battle.

And of course, if your data is not stored as you need, then change how your data is stored. Indexes are the most common way, which we’ve discussed here. Materialized views would be the next power tool to consider when needed. But that’s a topic for another day.

Parting thoughts

Hopefully, these posts have taught you a few things about querying, query tuning, and crafting the right index for the right situation. These are skills that can have a huge payoff in achieving palpable performance gains that your users will notice.


Editor’s note: our The Complete Course for Building Backend Web Apps with Go includes setting up a PostgreSQL database and running it in Docker, all from scratch.

Introduction to Postgres Indexes
https://frontendmasters.com/blog/intro-to-postgres-indexes/
Mon, 01 Sep 2025

This is Part 1 (of a 2-part series) offering a practical, hands-on approach to database indexes. We’ll cover what B Trees are, with a focus on deeply understanding and internalizing how they store data on disk, and how your database uses them to speed up queries.

This will set us up nicely for part 2, where we’ll explore some interesting, counterintuitive ways to press indexes into service to achieve great querying performance over large amounts of data.


There are other types of database indexes besides B Trees, but B Tree indexes are the most common, which is why they’ll be the exclusive focus of this post.

Everything in these posts will use Postgres, but everything is directly applicable to other relational databases (like MySQL). All the queries I’ll be running are against a simple books database which I scaffolded, and had Cursor populate with about 90 million records. The schema for the database, as well as the code to fill it are in this repo. If you’d like to follow along on your own: sql/db_create.sql has the DDL, and npx tsx insert-data/fill-database.ts will run the code to fill it.

We’ll be looking at some B Tree visualizations as we go. Those were put together with a web app I had Cursor help me build.

Editors’ note: Need to bone up on PostgreSQL all around? Our course Complete Intro to SQL & PostgreSQL from Brian Holt will be perfect for you.

Setting some baselines

Just for fun, let’s take a look at the first 10 rows in the books table. Don’t look too closely; again, this was all algorithmically generated by AI. The special characters at the beginning were my crude way of forcing the (extremely repetitive) titles to spread out.

Screen capture of a SQL query displaying the first 10 rows from a 'books' table, showing columns for ID, title, author, publisher, publication date, pages, and promotional status.

That’s the last time we’ll be looking at actual data. From here forward we’ll look at queries, the execution plans they generate, and we’ll talk about how indexes might, or might not be able to help. Rather than the psql terminal utility I’ll be running everything through DataGrip, which is an IDE for databases. The output is identical, except with nicely numbered lines which will make things easier to talk about as we go.

Let’s get started. Let’s see what the prior query looks like by putting explain analyze before it. This tells Postgres to execute the query, and return to us the execution plan it used, as well as its performance.

explain analyze
select * from books limit 10;
Database query execution plan showing a limit of 10 rows retrieved from the books table through a sequential scan, with specific cost and time metrics.

We asked for 10 rows. The database did a sequential scan on our books table, but with a limit of 10. This couldn’t be a simpler query, and it returned in less than one twentieth of one millisecond. This is hardly surprising (or interesting). Postgres reached in and grabbed the first ten rows.

Let’s grab the first 10 books, but this time sorted alphabetically.
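That query, which you can see in the screenshot below, is just the previous one with an ORDER BY added:

explain analyze
select *
from books
order by title
limit 10;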

Screenshot of a SQL query inputted in a database IDE showing a query to select the first 10 books ordered by title, along with the execution plan output detailing the performance metrics.

Catastrophically, this took 20 seconds. With 90 million rows in this table, Postgres now has to (kind of) sort the entire table, in order to know what the first 10 books are.

I say kind of since it doesn’t really have to sort the entire table; it just has to scan the entire table and keep track of the 10 rows with the lowest titles. That’s what the top-N heapsort is doing.

And it’s doing that in parallel. We can see two child workers getting spawned (in addition to the main worker already running our query) to each scan about a third of the table. Then the Gather Merge pulls from each worker until it has the top 10. In this case it only needed to pull the top 7 rows from each worker to get its 10; this is reflected in lines 3-9 of the execution plan.

Line 5 makes this especially clear

->  Sort  (cost=2839564.34..2934237.14 rows=37869122 width=112) (actual time=20080.358..20080.359 rows=7 loops=3)

Notice the loops=3, and the rows=37 million. Each worker is scanning its share of the table, and keeping the top 7 it sees.

These 3 groups of 7 are then gathered and merged together in the Gather Merge on line 2

->  Gather Merge  (cost=2840564.36..11677309.78 rows=75738244 width=112) (actual time=20093.790..20096.619 rows=10 loops=1)

Rather than just slapping an index in, and magically watching the time drop down, let’s take a quick detour and make sure we really understand how indexes work. Failing to do this can result in frustration when your database winds up not picking the index you want it to, for reasons that a good understanding could make clear.

Indexes

The best way to think about a database index is in terms of the index in a book. These list all the major terms in the book, as well as all the pages each term appears on. Imagine you have a 1,000-page book on the American Civil War, and want to know which pages Philip Sheridan is mentioned on. It would be excruciatingly slow to just look through all 1,000 pages, searching for that name. But if there’s a 30-or-so-page index, your task is considerably simpler.

Before we go further, let’s look at a very basic index over a numeric id column

A visual representation of a B Tree structure, showcasing internal nodes with key ranges and leaf nodes containing data values.

This is a B Tree, which is how (most) indexes in a database are stored.

Start at the very, very bottom. Those blue “leaf” nodes contain the actual data in your index. These are the actual id values. This is a direct analogue to a book’s index.

So what are the gold boxes above them? These help you find where the leaf node is, with the value you’re looking for.

Let’s go to the very top, to the root node of our B Tree. Each of these internal nodes will have N key values, and N+1 pointers. If the value you’re looking for is strictly less than the first value, go down that first, left-most arrow and continue your search. If the value you’re looking for is greater than or equal to that first key, but strictly less than the next key, take the second arrow. And so on. In real life the number of keys in each of these nodes will be determined by how many of them can fit into a single page on disk (and will usually be much more than 3).

So with this B Tree, if we want to find id = 33, we start at the root. The id 33 is not < 19, so we don’t take the first arrow. But 33 is >=19 and <37, so we take the middle arrow.

Now we repeat. The id 33 is not < 25, so we don’t take the left most path. The id 33 is not >= 25 AND < 31, so we don’t take the middle path. But 33 is greater than 31 (it better be, this is the last path remaining), so we take the right most path. And that takes us to the leaf node with our key value.

Notice also that these leaf nodes have pointers forward and backward. This allows us to not only find a specific key value, but also a range of values. If we wanted all ids > 33, we could do as we did, and just keep reading.

But — now what? What if we ran a query of SELECT * FROM books WHERE id = 33 – we’ve arrived at a leaf node in our index with … our key. How do we get all the data associated with that key? In other words the actual row in the database for that value?

The thing I’ve left off so far is that leaf nodes also contain pointers to the actual table where that value in question is.

Returning briefly to our analogy with a book’s index, those heap pointers correspond to the page number beside each index entry, telling you where to go in the book to see the actual content.

So the full story to find a single row in our database by an id value, via an index, would actually look more like this:

We’ll talk later about these heap reads, or lack thereof when we get into covering indexes and the Index Only Scan operation.

Bear with me a little longer. Before we look at what an index on title would look like, and create one in our database to run our query against, let’s take a slightly deeper look at B Trees. Internalizing how they work can be incredibly valuable.

B Trees run in O(log N) time

You may have been taught a fun little math fact in school: if you were given a penny on January 1st, then had your penny doubled on January 2nd, then had that new amount (2 cents) doubled on January 3rd, and so on, you’d have about $10 million before February. That’s the power of exponential operations. Anytime you’re repeatedly multiplying a value by some constant (which is all doubling is, for constant 2), it will become enormous, fast.

Now think of a more depressing, reverse scenario. If someone gave you $10 million on January 1st, but with the condition that your remaining money would be halved each day, you’d have a lowly cent remaining on Feb 1st. This is a logarithm; it’s the inverse of exponentiation. Rather than multiplying a value by some constant, we divide it by some constant. No matter how enormous, it will become small, fast.

This is exactly how B Trees work. In our example B Tree above, there were 9 leaf pages. Our internal nodes had up to 3 pointers. Notice that we were able to find out the exact leaf node we wanted by following only 2 of those gold nodes’ arrows (which is also the depth of the tree).

9 divided by 3 is 3

3 divided by 3 is 1

Or, more succinctly, log₃(9) = 2

Which reads as

Logarithm base 3 of 9 is 2

But these small values don’t really do this concept justice. Imagine if you had an index whose leaves spanned 4 billion pages, and your index nodes had only 2 pointers each (both of these assumptions are unrealistic). You’d still need only 32 page reads to find any specific value.

2³² = ~4 billion

and also

log₂(~4 billion) = 32.

They’re literally inverse operations of each other.

How deep are real indexes?

Before we move on, let’s briefly look at how deep a real Postgres index is on a somewhat large amount of data. The books table with 90 million entries already has an index defined on the primary key id field, which is a 32-bit integer. Without going into gross detail about everything stored on a B Tree node (N keys, N+1 offsets to other nodes, some metadata and headers, etc.), ChatGPT estimates that Postgres can store roughly 400–500 keys per node for an index on a 32-bit integer.

Let’s check that. There’s a Postgres extension for just this purpose.

CREATE EXTENSION IF NOT EXISTS pageinspect;

and then

SELECT * FROM bt_metap('books_pkey');

which produces

 magic  | version |  root  | level | fastroot | fastlevel | last_cleanup_num_delpages | last_cleanup_num_tuples | allequalimage
--------+---------+--------+-------+----------+-----------+---------------------------+-------------------------+---------------
 340322 |       4 | 116816 |     3 |   116816 |         3 |                         0 |                      -1 | t

Note the level 3, which is what our index’s depth is. That means it would take just 3 page reads to arrive at the correct B Tree leaf for any value (this excludes reading the root node itself, which is usually just stored in memory).

Checking the math, log₄₅₀(90,000,000) comes out to… 2.998
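If you want to dig further, the same pageinspect extension can show you what’s actually stored on an individual index page. A hedged example, since the block number here is arbitrary and your own block numbers and contents will differ:

-- peek at a handful of entries on one page of the primary key index
select itemoffset, ctid, data
from bt_page_items('books_pkey', 1)
limit 5;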

Taking an index for a spin

Let’s run a quick query by id, with the primary key index that already exists, and then look at how we can create one on title, so we can re-run our query to find the first 10 books in order.

explain analyze
select *
from books
where id = 10000;

which produces the following

We’re running an index scan. No surprises there. The Index Cond

  Index Cond: (id = 10000)

is the condition Postgres uses to navigate the internal nodes; those were the gold nodes from the visualization before. In this case, it predictably looks for id = 10000

Re-visiting our titles sort

Let’s take a fresh look at this query.

select *
from books
order by title
limit 10;

This time let’s define an index, like so

CREATE INDEX idx_title ON books(title);

This index would look something like this (conceptually at least).

Now our query runs in less than a fifth of a millisecond.

Query execution plan showing a limit on the number of rows retrieved using an index scan on the 'books' table, indicating planning and execution times.

Notice what’s missing from this execution plan, that was present on the previous query, when we looked for a single index value.

Did you spot it?

It’s the Index Cond. We’re not actually … looking for anything. We just want the first ten rows, sorted by title. The index stores all books, sorted by title. So the engine just hops right down to the start of the index, and simply reads the first ten rows from the leaf nodes (the blue nodes from the diagrams).

More fun with indexes

Let’s go deeper. Before we start, I’ll point out that the values for the pages column were filled with random values from 100 to 700. So there are 600 possible values for pages, each randomly assigned.

Let’s look at a query to read the titles of books with the 3 maximum values for pages. And let’s pull a lot more results this time; we’ll limit it to one hundred thousand entries

explain analyze
select title, pages
from books
where pages > 697
limit 100000;
Query plan output of a PostgreSQL database showing execution details for a query retrieving 100,000 rows from a books table where the number of pages is greater than 697.

As before, we see a parallel sequential scan. We read through the table, looking for the first 100,000 rows. Our condition matches relatively few rows, so the database has to scan through, and discard, over 6 million records before it finds the first 100,000 that match:

Rows Removed by Filter: 6627303

The whole operation took 833ms.

Let’s define an index on pages

CREATE INDEX idx_pages ON books(pages);

You might notice that the pages column is by no means unique, but that doesn’t affect anything: the leaf pages can easily contain duplicate entries.

Everything else works the same as it always has: the database can quickly jump down to a specific value. This allows us to query a particular value, or even grab a range of values sorted on the column we just looked up. For example, if we want all books with pages > 500, we just seek to that value, and start reading.
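As a sketch, a range query shaped like the one below could be satisfied by seeking into idx_pages at 500 and reading forward; whether Postgres actually chooses to do that depends on its cost estimates, as we’re about to see:

explain analyze
select pages
from books
where pages > 500
order by pages
limit 10;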

Let’s re-run that query from before

explain analyze
select title, pages
from books
where pages > 697
limit 100000;
A screenshot of a query plan output showing execution details and performance metrics for a Postgres database query. Various steps of the query execution, including limit, gather, and bitmap heap scan, are listed with numerical values representing costs and actual time taken for each operation.

There’s a lot going on. Our index is being used, but not like before

->  Bitmap Index Scan on idx_pages  (cost=0.00..4911.16 rows=451013 width=0) (actual time=38.057..38.057 rows=453891 loops=1)

A Bitmap Index Scan means the database is scanning our index and noting the heap locations of records matching our filter. It literally builds a bitmap of matching locations, hence the name. It then sorts those locations in order of disk access.

It then pulls those locations from the heap. This is the Bitmap Heap Scan on line 5

->  Parallel Bitmap Heap Scan on books  (cost=5023.92..1441887.67 rows=187922 width=73) (actual time=41.383..1339.997 rows=33382 loops=3)

But remember, this is the heap, and it’s not ordered on pages, so the pages it fetches may contain other records that don’t match our filter. Those are weeded back out by the Recheck Cond on line 6

Recheck Cond: (pages > 697)

which removed 1,603,155 results.

Why is Postgres doing this, rather than just walking our index, and following the pointers to the heap, as before?

Postgres keeps track of statistics on which values are contained in its various columns. In this case, it knew that relatively few rows would match this filter, so it chose to use the index.

But that still doesn’t answer why it didn’t use a regular old index scan, following the various pointers to the heap. Here, Postgres decided that, even though the filter would exclude a large percentage of the table, it would still need to read a lot of pages from the heap, and following all those random pointers from the index to the heap would be bad. Those pointers point in all manner of random directions, and Random I/O is bad. In fact, Postgres also tracks how closely (or poorly) an index’s order corresponds to the underlying order of the heap for that column, via something called correlation. So if, somehow, the book entries in the heap just happened to be stored (more or less) in increasing values of pages, there would be a high correlation on the pages column, and this index would be more likely to be used for this query.

For these reasons, Postgres thought it would be better to use the index to only keep track of which heap locations had relevant records, and then fetch those heap locations in heap order, after the Bitmap scan sorted them. This results in neighboring chunks of memory in the heap being pulled together, rather than frequently following those random pointers from the index.

Again, Random I/O tends to be expensive and can hurt query performance. This was not faulty reasoning at all.
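You can see the correlation estimate Postgres has for a column directly in the pg_stats view (the exact number you get depends entirely on how your copy of the data happened to land on disk):

-- correlation ranges from -1 to 1; values near either extreme mean the heap
-- order closely tracks the index order, which makes plain index scans cheaper
select attname, correlation
from pg_stats
where tablename = 'books'
  and attname = 'pages';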

But in this case Postgres wagered wrong. Our query now runs slower than the regular table scan from before, on the same query. It now takes 1.38 seconds, instead of 833ms. Adding an index made this query run slower.

Was I forcing the issue with the larger limit of 100,000? Of course. My goal is to show how indexes work, how they can help, and occasionally, how they can lead the query optimizer to make the wrong choice, and hurt performance. But please don’t think an index causing a worse, slower execution plan is an unhinged, unheard of eventuality which I contrived for this post; I’ve seen it happen on very normal queries on production databases.

The road not traveled

Can we force Postgres to do a regular index scan, to see what might have been? It turns out we can; we can (temporarily) turn off bitmap scans, and run the same query.

SET enable_bitmapscan = off;

explain analyze
select title, pages
from books
where pages > 697
limit 100000;

and now our query runs in just 309ms

Execution plan for a SQL query showing details of the index scan on a books table, including row counts and execution time.

Clearly Postgres’s statistics led it astray this time. They’re based on heuristics and probabilities, along with estimated costs for things like disk access. It won’t always work perfectly.
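If you try this experiment yourself, remember to turn bitmap scans back on afterward so you don’t skew your later queries:

RESET enable_bitmapscan;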

When stats get things right

Before we move on, let’s query all the books with an above-average number of pages

explain analyze
select title, pages
from books
where pages > 400
limit 100000;

In this case Postgres was smart enough to not even bother with the index.

Postgres’ statistics told it that this query would match an enormous number of rows, and just walking across the heap would get it the right results more quickly than bothering with the index. And in this case it assumed correctly. The query ran in just 37ms.

Covering Indexes

Let’s go back to this query

explain analyze
select title, pages
from books
where pages > 697
limit 100000;

It took a little over 800ms with no index, and over 1.3 seconds with an index on just pages.

The shortcoming of our index was that it did not include title, which is needed for this query; Postgres had to keep running to the heap to retrieve it. 

Your first instinct might be to just add title to the index.

CREATE INDEX idx_pages_title ON books(pages, title);

Which would look like this:

A visual representation of a B Tree structure, depicting nodes and leaf nodes containing book titles and IDs. The structure shows how data is organized for efficient querying.

This would work fine. We don’t need to filter on title, only pages, and having those titles in the gold non-leaf nodes wouldn’t hurt one bit. Postgres would just ignore them, find the starting point for all books with more than 697 pages, and start reading. There’d be no need for heap access at all, since the titles are right there.

Let’s try it.

A query execution plan showing an index-only scan using a specific index on a books database table, with details on cost, actual time, and number of rows.

Our query now runs in just 32ms! And we have a new operation in our execution plan.

->  Index Only Scan using idx_pages_title on books  (cost=0.69..30393.42 rows=451013 width=73) (actual time=0.243..83.911 rows=453891 loops=1)

Index Only Scan means that only the index is being scanned. Again, there’s no need to look anything up in the heap, since the index has all that we need. That makes this a “covering index” for this query, since the index can “cover” it all.

More or less.

Heap Fetches: 0

That’s Line 4 above, and it is not as redundant as it might seem. Postgres does have to consult something called the visibility map to make sure the values in your index are up to date, given how Postgres handles updates through its MVCC system. If those values are not up to date, it will have to hit the heap. But unless your data are changing extremely frequently, this should not be a large burden.

In this case, it turns out Postgres had to go to the heap zero times.
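The visibility map is kept current by vacuum, so on a table that sees regular writes, routine (auto)vacuuming is what keeps index-only scans this cheap. If you ever need to trigger it manually, it’s a one-liner:

VACUUM ANALYZE books;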

A variation on the theme

If you’re using Postgres or Microsoft SQL Server you can create an even nicer version of this index. Remember, we’re not actually querying on title here, at all. We just put it in the index so the title values would be in the leaf nodes, so Postgres could read them, without having to visit the heap.

Wouldn’t it be nice if we could put those titles only in the leaf nodes? This would keep our internal nodes smaller, with less content, which, in a real index, would let us cram more key values into each node, resulting in a smaller, shallower B Tree that would potentially be faster to query.

We do this with the INCLUDE clause when creating our index (in databases that support this feature).

CREATE INDEX idx_pages_include_title ON books(pages) INCLUDE(title);

This tells Postgres to create our index on the pages column, as before, but also include the title field in the leaf nodes. It would look like this, conceptually.

A visualization of a B Tree data structure depicting nodes and leaf nodes, showing book titles and their corresponding IDs.

And re-running that same query, we see that it does run a bit faster. From 32ms down to just 21ms.

To be clear, it’s quite fast either way, but a nice 31% speedup isn’t something to turn down if you’re using a database that supports this feature (MySQL does not).
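If you do adopt the INCLUDE version, you’d probably also drop the earlier composite index, since keeping both means paying the write and storage cost for two indexes that serve the same query:

DROP INDEX idx_pages_title;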

Pay attention to your SELECT clauses

There’s one corollary to the above: don’t request things you don’t need in your queries; don’t default to SELECT *

Requesting only what you need will not only reduce the amount of data that has to travel over the wire, but in extreme cases can mean the difference between an index scan, and an index-only scan. In the above query, if we’d done SELECT * instead of SELECT title, pages, none of the indexes we added would have been able to help; those heap accesses would have continued to hurt us.

Wrapping up

To say that this post only scratches the surface would be an understatement. The topics of indexing and query optimization could fill entire books, and of course they have.

Hopefully, this post has you thinking about indexes the right way. Thinking about how indexes are stored on disk, and how they’re read. And never, ever forgetting about the fact that, when scanning an index, you may still need to visit the heap for every matched entry you find, which can get expensive.

Editor’s note: our The Complete Course for Building Backend Web Apps with Go includes setting up a PostgreSQL database and running it in Docker, all from scratch.

