98 changes: 88 additions & 10 deletions use-timescale/extensions/pg-textsearch.md
@@ -8,14 +8,18 @@ products: [cloud, self_hosted]

import EA1125 from "versionContent/_partials/_early_access_11_25.mdx";
import SINCE010 from "versionContent/_partials/_since_0_1_0.mdx";
import SINCE040 from "versionContent/_partials/_since_0_4_0.mdx";
import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";

# Optimize full text search with BM25

$PG full-text search at scale consistently hits a wall where performance degrades catastrophically.
$PG full-text search at scale consistently hits a wall where performance degrades catastrophically.
$COMPANY's [pg_textsearch][pg_textsearch-github-repo] brings modern [BM25][bm25-wiki]-based full-text search directly into $PG,
with a memtable architecture for efficient indexing and ranking. `pg_textsearch` integrates seamlessly with SQL and
provides better search quality and performance than the $PG built-in full-text search.
with a memtable architecture for efficient indexing and ranking. `pg_textsearch` integrates seamlessly with SQL and
provides better search quality and performance than the $PG built-in full-text search. With Block-Max WAND optimization,
`pg_textsearch` delivers up to **4x faster top-k queries** compared to native BM25 implementations. Advanced compression
using delta encoding and bitpacking reduces index sizes by **41%** while improving query performance by 10-20% for
shorter queries.
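
As a rough illustration of why delta encoding and bitpacking shrink a posting list, the following Python sketch stores the gaps between sorted document IDs at the minimum fixed bit width. It is a toy model under stated assumptions: the function names, the block layout, and the sample IDs are hypothetical and do not describe the actual `pg_textsearch` segment format.

```python
def delta_encode(doc_ids):
    """Split a strictly increasing doc ID list into a base ID and the gaps between neighbors."""
    base = doc_ids[0]
    gaps = [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return base, gaps


def delta_decode(base, gaps):
    """Rebuild the original doc IDs by accumulating the gaps onto the base."""
    doc_ids = [base]
    for gap in gaps:
        doc_ids.append(doc_ids[-1] + gap)
    return doc_ids


def bitpack(values):
    """Pack non-negative ints using the smallest fixed bit width that fits the largest value."""
    width = max(1, max(values, default=0).bit_length())
    packed = 0
    for i, value in enumerate(values):
        packed |= value << (i * width)
    n_bytes = (len(values) * width + 7) // 8
    return width, packed.to_bytes(n_bytes, "little")


def bitunpack(width, data, count):
    """Reverse bitpack(): read `count` values of `width` bits each."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


# Hypothetical posting list: sorted IDs of documents containing one term.
doc_ids = [1000, 1003, 1004, 1010, 1025, 1026, 1031, 1040]

base, gaps = delta_encode(doc_ids)
width, packed = bitpack(gaps)
restored = delta_decode(base, bitunpack(width, packed, len(gaps)))

print(f"gaps={gaps} width={width} bits, packed into {len(packed)} bytes")
```

Because neighboring document IDs tend to be close together, the gaps need far fewer bits than the IDs themselves, which is the general idea behind this style of compression.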

BM25 scores in `pg_textsearch` are returned as negative values, where lower (more negative) numbers indicate better
matches. `pg_textsearch` implements the following:
@@ -73,7 +77,7 @@ You have installed `pg_textsearch` on $CLOUD_LONG.

## Create BM25 indexes on your data

BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus
BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus
statistics and better algorithmic design.
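
To make "corpus statistics" concrete, here is a minimal Python sketch of the textbook BM25 formula. It assumes whitespace tokenization and the commonly used parameters `k1=1.2` and `b=0.75`; these, along with the function name and sample documents, are illustrative assumptions and not necessarily what `pg_textsearch` uses internally. Scores are negated to mirror the convention described in this page, so that an ascending sort returns the best match first.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one negated BM25 score per document (more negative = better match)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Corpus statistic: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Longer-than-average documents are penalized through length normalization.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(-score)
    return scores


docs = [
    "ergonomic keyboard for comfortable work",
    "wireless mouse",
    "ergonomic chair with lumbar support for long work sessions at a desk",
]
scores = bm25_scores("ergonomic work", docs)

# Ascending sort puts the most negative (best) score first.
ranked = sorted(zip(scores, docs))
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

In this sketch both matching documents contain each query term once, so the shorter document ranks first purely because of length normalization, a corpus-aware effect that simple term counting does not capture.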

To create a BM25 index with pg_textsearch:
@@ -109,21 +113,65 @@ To create a BM25 index with pg_textsearch:
WITH (text_config='english');
```

BM25 supports single-column indexes only.
BM25 supports single-column indexes only. For optimal performance, load your data first, then create the index.

</Procedure>

You have created a BM25 index for full-text search.

## Accelerate indexing with parallel builds
Review comment (Contributor): Let's wait on this, parallel indexing is not coming until 0.5.0


`pg_textsearch` supports parallel index builds for faster indexing of large tables. $PG automatically uses parallel workers
based on table size and the `max_parallel_maintenance_workers` configuration.

<Procedure>

1. **Configure parallel workers (optional)**

```sql
-- Set parallel workers (uses server defaults if not specified)
SET max_parallel_maintenance_workers = 4;
```

1. **Create index on a large table**

```sql
-- Parallel workers are used automatically for large tables
CREATE INDEX products_search_idx ON products
USING bm25(description)
WITH (text_config='english');
```

You see a notice when parallel build is used:

```
NOTICE: Using parallel index build with 4 workers (1000000 tuples)
```

</Procedure>

For partitioned tables, each partition builds its index independently with parallel workers if the partition is large
enough. This enables efficient indexing of very large partitioned datasets.

## Optimize search queries for performance

Use efficient query patterns to leverage BM25 ranking and optimize search performance.
Use efficient query patterns to leverage BM25 ranking and optimize search performance. The `<@>` operator provides
BM25-based ranking scores as negative values, where lower (more negative) scores indicate better matches. In `ORDER BY`
clauses, the index is automatically detected from the column. For `WHERE` clause filtering, use `to_bm25query()` with
an explicit index name.

<Procedure>

1. **Perform ranked searches using the distance operator**

```sql
-- Simplified syntax: index is automatically detected in ORDER BY
SELECT name, description, description <@> 'ergonomic work' as score
FROM products
ORDER BY score
LIMIT 3;

-- Alternative explicit syntax (works in all contexts)
SELECT name, description, description <@> to_bm25query('ergonomic work', 'products_search_idx') as score
FROM products
ORDER BY score
@@ -142,6 +190,8 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor

1. **Filter results by score threshold**

For filtering with `WHERE` clauses, specify the index explicitly with `to_bm25query()`:

Review comment (Contributor): Note to self: need to remove this restriction


```sql
SELECT name, description <@> to_bm25query('wireless', 'products_search_idx') as score
FROM products
@@ -163,7 +213,7 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor
FROM products
WHERE price < 500
AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -0.5
ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
ORDER BY score

Review comment: I do not believe you can use the computed score here unless you wrap the select inside of a CTE.

Reply (Contributor Author): Hi Greg, I tested this on the latest build today, and it seemed to work.

LIMIT 5;
```

@@ -342,17 +392,30 @@ Customize `pg_textsearch` behavior for your specific use case and data character
threshold, it automatically flushes to a segment at transaction commit.

```sql
-- Set memtable spill threshold (default 800000 posting entries, ~8MB segments)
SET pg_textsearch.memtable_spill_threshold = 1000000;
-- Set memtable spill threshold (default 32000000 posting entries, ~1M docs/segment)
Review comment (Contributor): FYI: this is going to churn again in upcoming release as I continue to perfect the Colonel's secret recipe

SET pg_textsearch.memtable_spill_threshold = 32000000;

-- Set bulk load spill threshold (default 100000 terms per transaction)
SET pg_textsearch.bulk_load_threshold = 150000;

-- Set default query limit when no LIMIT clause is present (default 1000)
SET pg_textsearch.default_limit = 5000;

-- Enable Block-Max WAND optimization for faster top-k queries (enabled by default)
SET pg_textsearch.enable_bmw = true;

-- Log block skip statistics for debugging query performance (disabled by default)
SET pg_textsearch.log_bmw_stats = false;
```
<SINCE010 />

```sql
-- Enable segment compression using delta encoding and bitpacking (enabled by default)
-- Reduces index size by ~41% with 10-20% query performance improvement for shorter queries
SET pg_textsearch.compress_segments = on;
```
<SINCE040 />

1. **Configure language-specific text processing**

You can create multiple BM25 indexes on the same column with different language configurations:
@@ -387,11 +450,26 @@ Customize `pg_textsearch` behavior for your specific use case and data character
WHERE indexrelid::regclass::text ~ 'bm25';
```

- View detailed index information
- View index summary with corpus statistics and memory usage
```sql
SELECT bm25_summarize_index('products_search_idx');
```

- View detailed index structure (output is truncated for display)
```sql
SELECT bm25_dump_index('products_search_idx');
```

- Export full index dump to a file for detailed analysis
```sql
SELECT bm25_dump_index('products_search_idx', '/tmp/index_dump.txt');
```

- Force memtable spill to disk (useful for testing or memory management)
```sql
SELECT bm25_spill_index('products_search_idx');
```

</Procedure>

You have configured `pg_textsearch` for optimal performance. For production applications, consider implementing result
5 changes: 1 addition & 4 deletions use-timescale/schema-management/about-constraints.md
@@ -38,8 +38,6 @@ CREATE TABLE conditions (
);
```

<CreateHypertablePolicyNote />

This example also references values in another `locations` table using a foreign
key constraint.

@@ -50,7 +48,6 @@ Time columns used for partitioning must not allow `NULL` values. A

</Highlight>

For more information on how to manage constraints, see the
[$PG docs][postgres-createconstraint].
For more information on how to manage constraints, see the [$PG docs][postgres-createconstraint].

[postgres-createconstraint]: https://www.postgresql.org/docs/current/ddl-constraints.html