Full Text Search (PostgreSQL)
Parsing Documents
PostgreSQL provides the function to_tsvector for converting a document to the
tsvector data type. A ts_vector is a sorted list of distinct lexemes:
<database>#= SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
tsvector
----------------------------------------------------
'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'
<database>#= SELECT to_tsvector('a fat cat sat on a mat and ate a fat rat');
to_tsvector
-------------------------------------------------------------------------------
'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4
<database>#= SELECT 'The quick brown fox jumps over the lazy dog'::tsvector;
tsvector
---------------------------------------------------------------
'The' 'brown' 'dog' 'fox' 'jumps' 'lazy' 'over' 'quick' 'the'
<database>#= SELECT to_tsvector('The quick brown fox jumps over the lazy dog');
to_tsvector
---------------------------------------------------------------------------
'brown':3 'dog':9 'fox':4 'jumps':5 'lazy':8 'over':6 'quick':2 'the':1,7
A ts_query can be used to build lexemes and apply then to certain operators.
The match operator
@@returns true if a document satisfies the query and false otherwise. It can be used in theWHEREclause of aSELECTstatement or anywhere else where a Boolean expression is expected.
<database>#= SELECT 'fat & rat'::tsquery;
tsquery
---------------
'fat' & 'rat'
<database>#= SELECT to_tsvector('a fat cat sat on a mat and ate a fat rat') @@ to_tsquery('fat | cat') as result;
result
--------
t
<database>#= SELECT to_tsvector('a fat cat sat on a mat and ate a fat rat') @@ to_tsquery('bat | hat') as result;
result
--------
f
you can leverage generated columns when creating tables with searchable content.
CREATE TABLE user_comment (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
content TEXT NOT NULL,
search_vector_en TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);
The following query picks all the movies whose titles contain "star" immediately followed by either "wars" or "trek".
SELECT title
FROM movies
WHERE to_tsvector(title) @@ to_tsquery('star <-> (wars | trek)');
Highlights
To present search results it is ideal to show a part of each document and how it is related to the query. Usually, search engines show fragments of the document with marked search terms. PostgreSQL provides a function
ts_headlinethat implements this functionality.
Websearch
SELECT
plainto_tsquery('star trek') as plain,
phraseto_tsquery('star trek') as phrase,
websearch_to_tsquery('star trek -khan') as websearch;
-- Outputs
plain | phrase | websearch
-----------------+-------------------+-----------------
'star' & 'trek' | 'star' <-> 'trek' | 'star' & 'trek' & !'khan'
Ranking
SELECT
*,
ts_rank(
-- base
to_tsvector(title),
-- query
to_tsquery('star & (wars | trek)'),
-- weight (optional)
-- 0 makes it so the length is not taken into account
-- 1 favors shorter matches
1
) AS rank
FROM bookmarks
ORDER BY rank DESC;
Setting weights
setweight