Subqueries (SQL)

Semi-Joins & Subquery Elimination

A semi-join between two tables R and S returns rows from table R for which there is at least one row from table S with matching values in the joining columns.

(Dombrovskaya, Novikov, and Bailliekova 2021, 138)

  • Perfect for use-cases that benefit from short-circuits.

Conditions

IN / NOT IN

  SELECT *
  FROM <tbl>
  WHERE <col> IN (A, B, ..., N);

  -- Some RDBMSs also support multi-columns

  SELECT *
  FROM <tbl>
  WHERE (<col1>, <col2>) IN (...);
  • Used inside a WHERE clause instead of to avoid using multiple OR conditions.
  • The IN predicate evaluates to NULL if either the expression is NULL or the subquery returns NULL.

        -- To illustrate better, imagine that
        EXPRESSION IN (VALUE [, ...])
    
        -- Can be seem as...
        EXPRESSION = VALUE1
        OR
        EXPRESSION = VALUE2
        OR
        ...
    
        -- Which returns NULL, since NULL V (...) = NULL
    
  • NOT IN doesn't play nice if the subquery returns NULL, the predicate will return FALSE in this situation.

        -- To illustrate better, imagine that
        EXPRESSION NOT IN (VALUE [, ...])
    
        -- Can be seem as...
        EXPRESSION <> VALUE1
        AND
        EXPRESSION <> VALUE2
        AND
        ...
    

EXISTS / NOT EXISTS

  • EXISTS (subquery) evaluates and determines wheter the subquery produces a result (at least one result is enough to make the predicate return TRUE, thus making it a Semi-Join).
  • Can reference variables from the outter query.
  SELECT *
  FROM <tbl>
  WHERE <col> IN (A, B, ..., N);

ANY / SOME

  EXPRESSION OPERATOR ANY (ARRAY EXPRESSION)
  EXPRESSION OPERATOR SOME (ARRAY EXPRESSION)
  • ANY / SOME compares an expression (left) with the right side (which must yield an ARRAY) using an boolean operator.
  • ANY is TRUE if some true result is obtained. It is considered FALSE otherwise (including if the ARRAY has zero elements).
  • ANY is considered FALSE if no subquery returns TRUE.

        SELECT *
        FROM <tbl>
        WHERE <col> IN (A, B, ..., N);
    

If the array expression yields a null array, the result of ANY will be null. If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.

ALL

  EXPRESSION OPERATOR ALL (ARRAY EXPRESSION)
  • ALL is TRUE if all the subquery comparisons are equal or if the subquery returns no rows.
  • ALL is FALSE if at least comparisson fails.

If the array expression yields a null array, the result of ALL will be null. If the left-hand expression yields null, the result of ALL is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no false comparison result is obtained, the result of ALL will be null, not true (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.

Subqueries as Expressions

Scalar Subquery

Results in a single row.

Multi-Row Subquery

Results in multiple rows, tipically used in the where clause with operators such as IN, NOT, EXISTS, ANY and ALL.

Correlated Subquery

A uncorrelated sub-query is a type of sub-query where the inner query doesn’t depend upon the outer query for its execution.

  SELECT
    foo,
    (SELECT
      AVG(...)
     FROM B
     WHERE B.foo = A.foo) AS bar
  FROM A;

Uncorrelated Subquery

A correlated sub-query is a type of query where the inner query depends upon the outcome of the outer query in order to perform its execution.

  SELECT
    foo,
    (SELECT
      AVG(...)
     FROM B) AS bar
  FROM A;

Subqueries as Tables

References:

Dombrovskaya, Henrietta, Boris Novikov, and Anna Bailliekova. 2021. Postgresql Query Optimization. Springer.