@NikolayS

pg_stat_statements: Add rows_scanned and rows_filtered columns

Summary

This PR adds two new columns to pg_stat_statements to help identify queries that may benefit from better indexing:

  • rows_scanned - Total rows fetched from storage by scan nodes before any filtering
  • rows_filtered - Total rows removed by filter conditions (scanqual, joinqual, and other quals)

These metrics are useful for identifying inefficient queries that scan many rows but return few results.

Implementation Details

  • Enables per-node instrumentation (INSTRUMENT_ALL) in the ExecutorStart hook
  • Walks the plan tree in ExecutorEnd to collect statistics from all nodes
  • For rows_scanned: sums tuples from scan nodes (SeqScan, IndexScan, etc.) including filtered tuples
  • For rows_filtered: sums nfiltered1 (scanqual/joinqual) and nfiltered2 (other quals) from all nodes
  • Reads both ntuples and tuplecount from instrumentation to capture complete tuple counts
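
The collection step described in the list above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: MockNode, MockInstr, RowTotals, is_scan_node, and collect_row_totals are hypothetical stand-ins for PostgreSQL's plan-state and instrumentation structures, and only the counter names (ntuples, tuplecount, nfiltered1, nfiltered2) are taken from the description. It assumes rows_scanned is emitted tuples plus nfiltered1 on scan nodes, as the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT PostgreSQL's real PlanState/Instrumentation. */
typedef enum {
    NODE_SEQ_SCAN,
    NODE_INDEX_SCAN,
    NODE_HASH_JOIN,
    NODE_LIMIT
} MockNodeKind;

typedef struct MockInstr {
    double ntuples;    /* tuples emitted in completed execution cycles */
    double tuplecount; /* tuples emitted in the current, unfinished cycle */
    double nfiltered1; /* tuples removed by scanqual/joinqual */
    double nfiltered2; /* tuples removed by other quals */
} MockInstr;

typedef struct MockNode {
    MockNodeKind kind;
    MockInstr instr;
    struct MockNode *left;
    struct MockNode *right;
} MockNode;

typedef struct RowTotals {
    double rows_scanned;
    double rows_filtered;
} RowTotals;

/* Assumption: only these two mock kinds count as scan nodes here. */
static int is_scan_node(MockNodeKind kind)
{
    return kind == NODE_SEQ_SCAN || kind == NODE_INDEX_SCAN;
}

/* Recursively accumulate both totals over the whole tree. */
static void collect_row_totals(const MockNode *node, RowTotals *totals)
{
    assert(totals != NULL);
    if (node == NULL)
        return;

    /* Read both counters so an unfinished cycle is not lost. */
    double emitted = node->instr.ntuples + node->instr.tuplecount;

    /* rows_filtered: nfiltered1 + nfiltered2 from every node. */
    totals->rows_filtered += node->instr.nfiltered1 + node->instr.nfiltered2;

    /* rows_scanned: scan nodes only, emitted tuples plus scan-filtered ones. */
    if (is_scan_node(node->kind))
        totals->rows_scanned += emitted + node->instr.nfiltered1;

    collect_row_totals(node->left, totals);
    collect_row_totals(node->right, totals);
}
```

Calling collect_row_totals on the root with a zero-initialized RowTotals yields the two sums for one execution. For a sequential scan that emits 10 tuples and filters 990, the sketch gives rows_scanned = 1000 and rows_filtered = 990, matching the first row of the example output.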

Example Output

SELECT query, rows, rows_scanned, rows_filtered 
FROM pg_stat_statements;

                 query                  | rows | rows_scanned | rows_filtered
----------------------------------------+------+--------------+---------------
 SELECT * FROM test_rows WHERE id <= $1 |   10 |         1000 |           990
 SELECT * FROM test_rows WHERE id > $1  |  500 |         1000 |           500
 SELECT * FROM test_simple              |  100 |          100 |             0
 SELECT * FROM test_simple LIMIT $1     |    5 |            5 |             0

Add a new rows_scanned column to pg_stat_statements that tracks the
total number of rows scanned by scan nodes (SeqScan, IndexScan,
IndexOnlyScan, BitmapHeapScan, etc.) before filter conditions are
applied. This metric is collected by walking the plan tree and summing
up ntuples + nfiltered1 for all scan nodes.

This information is valuable for identifying queries that scan many
rows but return few, which often indicates missing indexes or
suboptimal query plans. Combining it with the existing rows column
lets users calculate the filtering efficiency of their queries.
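
The text does not define "filtering efficiency"; one possible reading, offered here only as an assumption, is the fraction of scanned rows that the query actually returns. Using the first row of the Example Output table (rows = 10, rows_scanned = 1000, rows_filtered = 990):

```latex
\text{returned fraction}
  = \frac{\texttt{rows}}{\texttt{rows\_scanned}}
  = \frac{10}{1000} = 0.01,
\qquad
\text{filtered fraction}
  = \frac{\texttt{rows\_filtered}}{\texttt{rows\_scanned}}
  = \frac{990}{1000} = 0.99
```

Under this reading, a returned fraction close to 0 flags a query that discards most of what it reads. The ratio is undefined when rows_scanned is 0, so any query computing it would need to guard against that case.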

The new column appears after rows in the view, so existing queries
that select specific columns by name will continue to work. Bump
extension version to 1.14.

Add a rows_filtered column to pg_stat_statements to track rows removed
by scan/join/other filter conditions. This metric helps identify queries
that may benefit from better indexing.

The implementation:
- Enables per-node instrumentation with INSTRUMENT_ALL before ExecutorStart
- Walks the plan tree in ExecutorEnd to sum nfiltered1 (scanqual/joinqual)
  and nfiltered2 (other quals) from all nodes
- Reads both ntuples and tuplecount to capture complete tuple counts from
  both completed and current execution cycles
- Includes the new column in the SQL function and view for version 1.15

@NikolayS force-pushed the claude/pg-stat-statements-metrics-ddwRA branch from 0f4c486 to 0e9d6f8 on December 27, 2025 03:11