NOT IN subquery

View as Markdown

Overview

The fieldX NOT IN (<subquery>) predicate returns true if fieldX does not equal any value returned by the subquery. For predicates where fieldX or the <subquery> can contain NULL values, Materialize provides idiomatic SQL alternatives.

Materialize and NOT IN (<subquery>)

When evaluating a WHERE fieldX NOT IN (<subquery>) predicate involving possible NULL values for fieldX or <subquery>, Materialize performs a cross join between the outer relation and the subquery to preserve SQL NULL semantics, which can significantly increase memory usage. If possible, rewrite the query to avoid the cross join.

Idiomatic Materialize SQL

For fieldX NOT IN (<subquery>) predicates involving possible NULL values, the following rewrites are available:

NOTE:

Neither rewrite is strictly equivalent to NOT IN (<subquery>).

Both rewrites avoid the NULL propagation semantics of NOT IN; that is, they treat subquery NULL values as non-matches rather than allowing them to invalidate the comparison. In addition, the NOT EXISTS rewrite retains outer rows whose value is NULL, whereas both NOT IN and the filter-NULLs rewrite exclude them.

  • Rewrite to NOT EXISTS with a correlated subquery.
  • Retain NOT IN, but filter out NULL values from both the outer field and the subquery.
Materialize SQL

Rewrite to NOT EXISTS with a correlated subquery.

SELECT t1.*
FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.a = t1.a)
;

Filter out NULLs on both sides of the NOT IN.

SELECT t1.*
FROM t1
WHERE t1.a IS NOT NULL
  AND t1.a NOT IN (SELECT t2.a FROM t2 WHERE t2.a IS NOT NULL)
;
Anti-pattern

Avoid NOT IN (<subquery>) predicates, which force a cross join between the outer relation and the subquery.

-- Anti-pattern. Avoid. --
SELECT t1.*
FROM t1
WHERE t1.a NOT IN (SELECT t2.a FROM t2) -- Anti-pattern. Avoid.
;

If the subquery uses UNNEST() on a column whose value depends on the outer row:

Examples

NOTE: The example data can be found in the Appendix.

Find items not currently on sale

Using idiomatic Materialize SQL, the following examples find items in the items table whose item value (declared NOT NULL) does not appear in any of this week’s sales arrays in sales_items, a nullable text[]. The subquery uses UNNEST() to expand each week’s items array into individual values for comparison.

If the subquery uses UNNEST() on a column whose value depends on the outer row:

Materialize SQL

Because the subquery uses UNNEST() on a column of the outer-correlated row, factor the UNNEST() into an uncorrelated Common Table Expression (CTE) first.

Rewrite to NOT EXISTS with a CTE for the UNNEST()

WITH this_weeks_sales AS (
  SELECT unnest(items) AS sale_item
  FROM sales_items
  WHERE week_of = date_trunc('week', current_timestamp)
)
SELECT i.item, i.price
FROM items i
WHERE NOT EXISTS (
  SELECT 1 FROM this_weeks_sales s WHERE s.sale_item = i.item
)
ORDER BY i.item
;

Filter out NULLs with a CTE for the UNNEST()

WITH this_weeks_sales AS (
  SELECT unnest(items) AS sale_item
  FROM sales_items
  WHERE week_of = date_trunc('week', current_timestamp)
)
SELECT i.item, i.price
FROM items i
WHERE i.item IS NOT NULL
  AND i.item NOT IN (
    SELECT sale_item FROM this_weeks_sales WHERE sale_item IS NOT NULL
  )
ORDER BY i.item
;
Anti-pattern

Avoid NOT IN (<subquery>), which forces a cross join.

-- Anti-pattern. Avoid. --
SELECT i.item, i.price
FROM items i
WHERE i.item NOT IN (
  SELECT unnest(items) FROM sales_items
  WHERE week_of = date_trunc('week', current_timestamp)
)
ORDER BY i.item
;

See also

Back to top ↑