Index (PostgreSQL)

Indexes are a common way to enhance database performance. An index allows the PostgreSQL server to find and retrieve specific rows much faster than it could do without an index.

Selectivity

  SELECT COUNT(DISTINCT column_name) / COUNT(*)
  FROM table_name;
  • The closer this number is to 1, the more your index will behave like a primary key.
  • Values closer to 0 may indicate that the column is bad index, however one must note the following:
    • Even at low index selectivity (like 0.00001), you may still be able to derive performance boosts when querying for the minority of indexed data. The database is smart enough to take advantage of that and will ignore the index if it realizes its no better than a sequential scan.

Index Types

B-Tree

B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators: < <= = >= >

  CREATE INDEX index_name
  ON table_name USING BTREE(column);

Hash Index

  CREATE INDEX index_name
  ON table_name USING HASH(column);
  • Only makes sense to prefer this over BTREE if you are dealing with either equality or pertinence (IN clauses) checks all the time.

GIST

<< &< &> >> <<| &<| |&> |>> @> <@ ~= &&

SP-GIST

<< >> ~= <@

GIN

GIN is a type of index that instead of pointing to a single tuple points to multiple values, and to some extent, to an array of values. (Ferrari and Pirozzi 2023, 463)

  • Most likelly to be used in full text search scenarios, where we may have duplicated keys that point to different places.

BRIN

Block Range Index (BRIN) is a particular type of index that is based on the range of values in data blocks on storage. The idea is that every block has a minimal and maximal value, and the index then stores a couple of values for every data block on the storage. When a particular value is searched from a query, the index knows in which data block the values can be found, but all the tuples in the block must be evaluated. (Ferrari and Pirozzi 2023, 463)

  • Every block has a minimal and a maximal value.
  • When used effectively, it gives you smaller indexes than a B-Tree.
  • Some recomended usages for this index are:
    • Time-Series data, since every block can represent ranges of time.

Composite/Multicolumn Indexes

Currently, only the B-tree, GiST, GIN, and BRIN index types support multiple-key-column indexes. Whether there can be multiple key columns is independent of whether INCLUDE columns can be added to the index. Indexes can have up to 32 columns, including INCLUDE columns.

  CREATE INDEX index_name
  ON table_name USING <index_type>(col1, col2, ..., coln);
  • Be careful when it doing SELECT queries to respect the LEFTMOST order (col1, col2, ..., coln) of the index, PostgreSQL will try to use as much of the LEFT indexes as provided when performing certain queries.
  • Sometimes composite indexes are faster when performing WHERE/AND queries, while single-column indexes are more likelly to be chosen in WHERE/OR queries.

Covering Indexes (aka Index-Only Scams)

  • In an ordinary index scan, data can be fetched both from the index and the heap. This heap-access portion can involve a lot of random accesses and may prove to be slow.
  • When using the INCLUDE clause in a INDEX, PostgreSQL will include aditional data in it as non-key, as to avoid costly scan to the heap.
  CREATE [UNIQUE] INDEX index_name
  ON table_name USING BTREE(col1, col2, ...)
  INCLUDE(id);

The index type must support index-only scans. B-tree indexes always do. GiST and SP-GiST indexes support index-only scans for some operator classes but not others. Other index types have no support.

Partial Indexes

  • Only index a subset of the rows.
  • Useful if you plan to have fast scan through a subset of the data anyway.
  • Avoids having to maintain a massive BTREE with mostly data that you usually will never care about.
  CREATE INDEX my_index
  ON table_name USING BTREE (column_name)
  WHERE <predicate>;

Indexes and ORDER BY

In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step. Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return matching rows in an unspecified, implementation-dependent order.

  CREATE INDEX index_name
  ON table_name USING BTREE (col1 ASC, col2 DESC, ...);

which will be matched with an ORDER BY clause if the ordering also matches:

  SELECT col1, col2, ...
  FROM table_name
  (...)
  ORDER BY col1 ASC, col2 DESC, ...;

Functional Indexes

  • An index defined on the result of applying a function1 to one or more columns of a single table.
  CREATE INDEX index_name
  ON table_name (lower(col1));

References:

Ferrari, Luca, and Enrico Pirozzi. 2023. Learn Postgresql: Use, Manage, and Build Secure and Scalable Databases with Postgresql 16. 2nd ed. Packt Publishing Ltd.

Backlinks:

Footnotes: