Index (PostgreSQL)

An index is a data structure that allows faster access to the underlying table so that specific tuples can be found quickly. (…). PostgreSQL supports different types of indexes, and not all types are optimal for every scenario and workload. (…)

An index in PostgreSQL can be built on a single column or multiple columns at once; PostgreSQL supports indexes with up to 32 columns. An index can cover all the data in the underlying table, or can index specific values only - in that case, the index is known as "partial". For example, you can decide to index only those values of certain columns that you are going to use the most.

An index can also be unique, meaning that it is used to ensure the uniqueness of the values it indexes, such as, for example, the primary keys of a table. Moreover, an index can be built on top of a user-defined function, which means the index is going to index the return values of those functions.

(Ferrari and Pirozzi 2023, 462 chap.13)

Selectivity

  SELECT COUNT(DISTINCT column_name) / COUNT(*)
  FROM table_name;
  • The closer this number is to 1, the more your index will behave like a primary key.
  • Values closer to 0 may indicate that the column is bad index, however one must note the following:
    • Even at low index selectivity (like 0.00001), you may still be able to derive performance boosts when querying for the minority of indexed data. The database is smart enough to take advantage of that and will ignore the index if it realizes its no better than a sequential scan.

Index Types

B-Tree

B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators: < <= = >= >

One drawback of the B-Tree index is that it copies the whole column's values into the tree structure; therefore, if you use a B-Tree to index large values (for example, long strings), the index will rapidly grow in size and space.

(Ferrari and Pirozzi 2023, 463 chap.13)

  CREATE INDEX index_name
  ON table_name USING BTREE(column);

Hash Index

Another type of index that PostgreSQL provides is the hash index: this index is built on the result of a hash function for the value of the column(s). It is important to note that the hash index can be used only for equality operators, not for range nor disequality operators. In fact, being an index built on a hash function, the index cannot compare two hash values to understand their ordering; only the equality (which produces the very same hash value) can be evaluated.

(Ferrari and Pirozzi 2023, 463 chap.13)

  CREATE INDEX index_name
  ON table_name USING HASH(column);
  • Only makes sense to prefer this over BTREE if you are dealing with either equality or pertinence (IN clauses) checks all the time.

GIST

Then comes Generalized Index Search Tree (GIST), which is a platform on top of which new index types can be built. The idea is to provide a pluggable infrastructure where you can define operators and features that can index a data structure. An example is SP-GIST, a spatial index used in geographical applications.

(Ferrari and Pirozzi 2023, 463 chap.13)

<< &< &> >> <<| &<| |&> |>> @> <@ ~= &&

SP-GIST

<< >> ~= <@

GIN

GIN is a type of index that instead of pointing to a single tuple points to multiple values, and to some extent, to an array of values.

(Ferrari and Pirozzi 2023, 463 chap.13)

  • Most likelly to be used in full text search scenarios, where we may have duplicated keys that point to different places.

BRIN

Block Range Index (BRIN) is a particular type of index that is based on the range of values in data blocks on storage. The idea is that every block has a minimal and maximal value, and the index then stores a couple of values for every data block on the storage. When a particular value is searched from a query, the index knows in which data block the values can be found, but all the tuples in the block must be evaluated.

Therefore, this type of index is not as accurate as a B-Tree and is called lossy (to emphasize it is not exact; i.e., it can have losses), but it is much smaller in size with respect to all the other types of indexes since it only stores a couple of values for every data block.

(Ferrari and Pirozzi 2023, 463 chap.13)

  • Every block has a minimal and a maximal value.
  • When used effectively, it gives you smaller indexes than a B-Tree.
  • Some recomended usages for this index are:
    • Time-Series data, since every block can represent ranges of time.

Composite/Multicolumn Indexes

Currently, only the B-Tree, GiST, GIN, and BRIN index types support multiple-key-column indexes. Whether there can be multiple key columns is independent of whether INCLUDE columns can be added to the index. Indexes can have up to 32 columns, including INCLUDE columns.

  CREATE INDEX index_name
  ON table_name USING <index_type>(col1, col2, ..., coln);
  • Be careful when it doing SELECT queries to respect the LEFTMOST order (col1, col2, ..., coln) of the index, PostgreSQL will try to use as much of the LEFT indexes as provided when performing certain queries.
  • Sometimes composite indexes are faster when performing WHERE/AND queries, while single-column indexes are more likelly to be chosen in WHERE/OR queries.

Covering Indexes (aka Index-Only Scams)

  • In an ordinary index scan, data can be fetched both from the index and the heap. This heap-access portion can involve a lot of random accesses and may prove to be slow.
  • When using the INCLUDE clause in a INDEX, PostgreSQL will include aditional data in it as non-key, as to avoid costly scan to the heap.
  CREATE [UNIQUE] INDEX index_name
  ON table_name USING BTREE(col1, col2, ...)
  INCLUDE(id);

The index type must support index-only scans. B-tree indexes always do. GiST and SP-GiST indexes support index-only scans for some operator classes but not others. Other index types have no support.

Partial Indexes

  • Only index a subset of the rows.
  • Useful if you plan to have fast scan through a subset of the data anyway.
  • Avoids having to maintain a massive BTREE with mostly data that you usually will never care about.
  CREATE INDEX my_index
  ON table_name USING BTREE (column_name)
  WHERE <predicate>;

Indexes and ORDER BY

In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step. Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return matching rows in an unspecified, implementation-dependent order.

  CREATE INDEX index_name
  ON table_name USING BTREE (col1 ASC, col2 DESC, ...);

which will be matched with an ORDER BY clause if the ordering also matches:

  SELECT col1, col2, ...
  FROM table_name
  (...)
  ORDER BY col1 ASC, col2 DESC, ...;

Functional Indexes

  • An index defined on the result of applying a function1 to one or more columns of a single table.
  CREATE INDEX index_name
  ON table_name (lower(col1));

References:

Ferrari, Luca, and Enrico Pirozzi. 2023. Learn Postgresql: Use, Manage, and Build Secure and Scalable Databases with Postgresql 16. 2nd ed. Packt Publishing Ltd.

Backlinks:

Footnotes: