Full Text Search (PostgreSQL)
Parsing Documents
PostgreSQL provides the function to_tsvector
for converting a document to the
tsvector
data type. A ts_vector
is a sorted list of distinct lexemes:
<database>#= SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector; tsvector ---------------------------------------------------- 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat' <database>#= SELECT to_tsvector('a fat cat sat on a mat and ate a fat rat'); to_tsvector ------------------------------------------------------------------------------- 'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4 <database>#= SELECT 'The quick brown fox jumps over the lazy dog'::tsvector; tsvector --------------------------------------------------------------- 'The' 'brown' 'dog' 'fox' 'jumps' 'lazy' 'over' 'quick' 'the' <database>#= SELECT to_tsvector('The quick brown fox jumps over the lazy dog'); to_tsvector --------------------------------------------------------------------------- 'brown':3 'dog':9 'fox':4 'jumps':5 'lazy':8 'over':6 'quick':2 'the':1,7
A ts_query
can be used to build lexemes and apply then to certain operators.
The match operator
@@
returns true if a document satisfies the query and false otherwise. It can be used in theWHERE
clause of aSELECT
statement or anywhere else where a Boolean expression is expected.
<database>#= SELECT 'fat & rat'::tsquery; tsquery --------------- 'fat' & 'rat' <database>#= SELECT to_tsvector('a fat cat sat on a mat and ate a fat rat') @@ to_tsquery('fat | cat') as result; result -------- t <database>#= SELECT to_tsvector('a fat cat sat on a mat and ate a fat rat') @@ to_tsquery('bat | hat') as result; result -------- f
you can leverage generated columns when creating tables with searchable content.
CREATE TABLE user_comment ( id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, content TEXT NOT NULL, search_vector_en TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', content)) STORED );
The following query picks all the movies whose titles contain "star" immediately followed by either "wars" or "trek".
SELECT title FROM movies WHERE to_tsvector(title) @@ to_tsquery('star <-> (wars | trek)');
Highlights
To present search results it is ideal to show a part of each document and how it is related to the query. Usually, search engines show fragments of the document with marked search terms. PostgreSQL provides a function
ts_headline
that implements this functionality.
Websearch
SELECT plainto_tsquery('star trek') as plain, phraseto_tsquery('star trek') as phrase, websearch_to_tsquery('star trek -khan') as websearch; -- Outputs plain | phrase | websearch -----------------+-------------------+----------------- 'star' & 'trek' | 'star' <-> 'trek' | 'star' & 'trek' & !'khan'
Ranking
SELECT *, ts_rank( -- base to_tsvector(title), -- query to_tsquery('star & (wars | trek)'), -- weight (optional) -- 0 makes it so the length is not taken into account -- 1 favors shorter matches 1 ) AS rank FROM bookmarks ORDER BY rank DESC;
Setting weights
setweight