
Introduction
Database performance is critical for application responsiveness, user satisfaction, and infrastructure costs. As PostgreSQL databases grow in size and complexity, identifying performance bottlenecks becomes increasingly challenging. This is where pg_stat_statements comes in – one of the most valuable tools in a PostgreSQL DBA’s toolkit.
In this guide, we’ll explore everything you need to know about pg_stat_statements, from basic setup to advanced query analysis with real-world examples. You’ll learn how to identify and fix slow queries to optimize your database performance effectively.
Table of Contents
- What is pg_stat_statements?
- Setting Up pg_stat_statements
- Basic Usage and Key Metrics
- Real-world Example: Identifying Slow Queries
- Advanced Analysis Techniques
- Monitoring pg_stat_statements Over Time
- Common Issues and Troubleshooting
- Best Practices
- Conclusion
What is pg_stat_statements?
pg_stat_statements is a PostgreSQL extension that tracks execution statistics for all SQL statements executed by a server. Unlike standard PostgreSQL logging, which captures individual query executions, pg_stat_statements aggregates statistics across multiple executions of similar queries, giving you a broader view of database performance patterns.
Key benefits include:
- Normalized query tracking: Similar queries with different parameters are grouped together (see the example after this list)
- Minimal performance impact: Significantly lower overhead than full query logging
- Comprehensive metrics: Collects execution times, rows processed, disk I/O statistics, and more
- Historical data: Maintains statistics across server restarts (when configured properly)
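For instance, two executions that differ only in a literal value are collapsed into a single normalized entry. The orders table and values below are purely illustrative:

SELECT * FROM orders WHERE customer_id = 42;
SELECT * FROM orders WHERE customer_id = 97;

-- Both executions are aggregated under one normalized entry:
SELECT query, calls FROM pg_stat_statements WHERE query LIKE '%orders%';

                     query                     | calls
-----------------------------------------------+-------
 SELECT * FROM orders WHERE customer_id = $1   |     2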
Setting Up pg_stat_statements
Getting started with pg_stat_statements is straightforward. Let’s walk through the setup process step by step:
1. Install the Extension
First, you need to add pg_stat_statements to your shared_preload_libraries in the postgresql.conf file:
shared_preload_libraries = 'pg_stat_statements'
You’ll also want to configure these additional parameters:
# Controls how many distinct statements are tracked
pg_stat_statements.max = 10000
# Controls whether utility commands (CREATE, ALTER, VACUUM, etc.) are also tracked
pg_stat_statements.track_utility = on
# Controls which statements are tracked: 'top' (top-level only) or 'all' (includes nested statements)
pg_stat_statements.track = all
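If you also want the accumulated statistics to survive a clean server shutdown (the historical-data benefit mentioned above), there is one more parameter, which defaults to on:

# Save statistics across normal server shutdowns
pg_stat_statements.save = on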
2. Restart PostgreSQL
After modifying the configuration file, restart your PostgreSQL server:
sudo systemctl restart postgresql
3. Create the Extension
Connect to your database and create the extension:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
4. Verify Installation
Check if the extension is properly installed:
SELECT * FROM pg_extension WHERE extname = 'pg_stat_statements';
You should see one row confirming the extension is installed.
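It is also worth confirming that the library was actually preloaded after the restart:

SHOW shared_preload_libraries;

If pg_stat_statements does not appear in the output, the change from step 1 did not take effect.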
Basic Usage and Key Metrics
Once installed, pg_stat_statements starts collecting statistics immediately. Let’s examine the most important columns and metrics:
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
rows,
shared_blks_hit,
shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This query returns the top 10 queries by total execution time. The key columns are:
- query: The SQL statement text
- calls: Number of times the query was executed
- total_exec_time: Total time spent executing the query (in milliseconds)
- mean_exec_time: Average execution time per call (in milliseconds)
- rows: Total number of rows retrieved or affected
- shared_blks_hit: Number of shared block cache hits
- shared_blks_read: Number of shared blocks read from disk
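A useful derived metric is the cache hit ratio per statement, computed from the last two columns. A minimal sketch (the NULLIF guards against division by zero for statements with no block activity):

SELECT
    query,
    calls,
    round(100.0 * shared_blks_hit /
          NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS hit_ratio
FROM pg_stat_statements
ORDER BY hit_ratio ASC NULLS LAST
LIMIT 10;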
Real-world Example: Identifying Slow Queries
Let’s look at a real-world scenario. Imagine you’re investigating slow performance on an e-commerce application. Here’s how you might use pg_stat_statements to identify problematic queries:
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
rows,
total_exec_time / NULLIF(rows, 0) AS time_per_row
FROM pg_stat_statements
WHERE query ILIKE '%products%'
ORDER BY total_exec_time DESC
LIMIT 5;
This query might return something like:
query | calls | total_exec_time | mean_exec_time | rows | time_per_row
---------------------------------------------------------+-------+-----------------+----------------+-------+---------------
SELECT * FROM products WHERE category_id = $1 | 15240 | 42578.95 | 2.79 | 76200 | 0.56
SELECT p.*, c.name FROM products p JOIN categories ... | 3865 | 15482.67 | 4.01 | 19325 | 0.80
UPDATE products SET stock = stock - $1 WHERE id = $2 | 5422 | 10937.92 | 2.02 | 5422 | 2.02
SELECT COUNT(*) FROM products WHERE price BETWEEN $1... | 8976 | 3587.45 | 0.40 | 8976 | 0.40
SELECT * FROM products ORDER BY created_at DESC LIMIT...| 10245 | 1854.32 | 0.18 | 51225 | 0.04
From this output, we can see:
- The most expensive query is a simple SELECT by category_id, with over 15,000 executions
- That query has a moderate mean execution time (2.79ms), but its total impact is significant
- The second query, joining products with categories, has a higher mean execution time (4.01ms)
Based on this analysis, we would prioritize optimizing these two queries, perhaps by adding indexes or rewriting them.
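For example, if products.category_id is not indexed, the first query is an obvious candidate for an index. The table and column names are taken from the sample output above; always confirm the improvement with EXPLAIN ANALYZE before and after:

-- Build the index without blocking concurrent writes
CREATE INDEX CONCURRENTLY idx_products_category_id ON products (category_id);

-- Verify the planner now uses it
EXPLAIN ANALYZE SELECT * FROM products WHERE category_id = 42;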
Advanced Analysis Techniques
Let’s explore some advanced techniques to get even more insights from pg_stat_statements.
Finding Queries with High Disk I/O
SELECT
query,
calls,
shared_blks_read,
shared_blks_hit,
shared_blks_read::float / (shared_blks_hit + shared_blks_read) * 100 AS read_percentage
FROM pg_stat_statements
WHERE shared_blks_read + shared_blks_hit > 1000
ORDER BY shared_blks_read DESC
LIMIT 10;
This query helps identify statements causing high disk I/O. Here’s what you might see:
query | calls | shared_blks_read | shared_blks_hit | read_percentage
---------------------------------------------------------+-------+------------------+----------------+------------------
SELECT * FROM large_table WHERE created_at > $1 | 125 | 56789 | 1245 | 97.86
SELECT * FROM orders WHERE customer_id = $1 | 5688 | 34562 | 23441 | 59.58
SELECT COUNT(*) FROM event_logs WHERE event_type = $1 | 356 | 23451 | 934 | 96.17
This example shows three queries with high disk read operations. The first query is reading 97.86% of its blocks from disk rather than cache, indicating a potential performance issue.
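Once you have identified a suspect like this, EXPLAIN with the BUFFERS option shows the same kind of block statistics for a single execution, which helps confirm where the reads come from (the date literal below is just a placeholder):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table WHERE created_at > '2024-01-01';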
Identifying Parameter Sniffing Problems
Parameter sniffing occurs when the query planner generates a plan optimized for specific parameter values that’s inefficient for other values.
SELECT
query,
calls,
min_exec_time,
max_exec_time,
mean_exec_time,
stddev_exec_time,
stddev_exec_time / NULLIF(mean_exec_time, 0) AS variance_ratio
FROM pg_stat_statements
WHERE calls > 100
ORDER BY variance_ratio DESC
LIMIT 10;
A high variance ratio might indicate parameter sniffing issues:
query | calls |min_exec_time|max_exec_time|mean_exec_time|stddev_exec_time|variance_ratio
--------------------------------------------------------+-------+-------------+-------------+--------------+----------------+--------------
SELECT * FROM items WHERE type_id = $1 | 2456 | 0.12 | 3542.67 | 14.5 | 245.67 | 16.94
SELECT * FROM users WHERE country_code = $1 | 7845 | 0.45 | 789.34 | 8.9 | 98.45 | 11.06
In this example, the first query has excellent performance in some cases (0.12ms) but terrible in others (3542.67ms). This suggests the query plan might be optimized for certain type_id values but not others.
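If the statement is executed through a prepared statement or a PL/pgSQL function, one way to test whether a cached generic plan is responsible (on PostgreSQL 12 and later) is to force custom planning in your session and compare timings:

-- Re-plan for every execution in the current session
SET plan_cache_mode = force_custom_plan;

-- ...re-run the prepared statement with different parameter values...

-- Restore the default behaviour
RESET plan_cache_mode;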
Monitoring pg_stat_statements Over Time
To get the most value from pg_stat_statements, you should monitor trends over time. Let’s set up a simple tracking system:
- Create a table to store historical data:
CREATE TABLE statement_stats_history (
captured_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
queryid BIGINT,
query TEXT,
calls BIGINT,
total_exec_time DOUBLE PRECISION,
rows BIGINT,
shared_blks_read BIGINT,
shared_blks_hit BIGINT
);
- Create a function to capture current statistics:
CREATE OR REPLACE FUNCTION capture_pg_stat_statements()
RETURNS VOID AS $$
BEGIN
INSERT INTO statement_stats_history (
queryid, query, calls, total_exec_time, rows,
shared_blks_read, shared_blks_hit
)
SELECT
queryid, query, calls, total_exec_time, rows,
shared_blks_read, shared_blks_hit
FROM pg_stat_statements;
-- Reset the statistics (optional)
PERFORM pg_stat_statements_reset();
END;
$$ LANGUAGE plpgsql;
- Schedule this function with a cronjob or PostgreSQL’s pg_cron extension:
-- If using pg_cron
SELECT cron.schedule('0 * * * *', 'SELECT capture_pg_stat_statements()');
This setup captures snapshots hourly, allowing you to analyze performance trends and identify issues that might only occur at specific times.
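Once a few snapshots have accumulated, you can compare consecutive captures per queryid. The sketch below assumes you drop the optional pg_stat_statements_reset() call from the capture function, so counters keep accumulating and lag() yields per-interval deltas:

SELECT
    captured_at,
    queryid,
    calls - lag(calls) OVER w                     AS calls_in_period,
    total_exec_time - lag(total_exec_time) OVER w AS exec_time_in_period
FROM statement_stats_history
WINDOW w AS (PARTITION BY queryid ORDER BY captured_at)
ORDER BY exec_time_in_period DESC NULLS LAST
LIMIT 20;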
Common Issues and Troubleshooting
Issue: Statements Not Being Tracked
If you don’t see expected queries in pg_stat_statements, check:
- Confirm the extension is properly loaded:
SELECT * FROM pg_stat_statements LIMIT 1;
If this gives an error, the extension might not be loaded correctly.
- Verify your pg_stat_statements.track setting:
SHOW pg_stat_statements.track;
If set to ‘top’, it might miss nested statements.
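To switch to tracking nested statements without editing postgresql.conf by hand, you can change the setting and reload the configuration:

ALTER SYSTEM SET pg_stat_statements.track = 'all';
SELECT pg_reload_conf();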
Issue: Memory Usage Concerns
pg_stat_statements consumes shared memory. If you’re tracking too many queries, you might encounter memory issues.
Monitor the number of tracked statements:
SELECT COUNT(*) FROM pg_stat_statements;
If you are approaching your pg_stat_statements.max limit, increase the allocation or reset the statistics more frequently.
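A quick way to compare both numbers in a single query (current_setting() reads the configured limit):

SELECT
    count(*)                                        AS tracked_statements,
    current_setting('pg_stat_statements.max')::int  AS max_statements
FROM pg_stat_statements;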
Issue: Statements Being Reset Unexpectedly
Check whether a scheduled maintenance job or monitoring tool is calling pg_stat_statements_reset():
SELECT query, calls FROM pg_stat_statements WHERE query LIKE '%pg_stat_statements_reset%';
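On PostgreSQL 14 and later, you can also check when the statistics were last reset and how many entries have been evicted because pg_stat_statements.max was reached:

SELECT dealloc, stats_reset FROM pg_stat_statements_info;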
Best Practices
- Right-size your configuration:
  - Set pg_stat_statements.max based on your application’s query diversity
  - For most applications, 5,000-10,000 is sufficient
- Regular maintenance:
  - Periodically reset statistics after collecting historical data
  - Consider scheduled resets during low-traffic periods
- Complement with other tools:
  - Use with EXPLAIN ANALYZE for deeper query analysis
  - Combine with pg_stat_activity for real-time monitoring (see the example after this list)
- Normalize your query workload:
  - Use prepared statements to improve statement normalization
  - Consider consistent coding standards for SQL to improve grouping
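As an example of combining it with pg_stat_activity: on PostgreSQL 14 and later, with compute_query_id enabled, both views expose the same query identifier, so you can look up the historical profile of whatever is running right now. A sketch:

SELECT
    a.pid,
    a.state,
    s.calls,
    s.mean_exec_time,
    s.query AS normalized_query
FROM pg_stat_activity a
JOIN pg_stat_statements s ON s.queryid = a.query_id
WHERE a.state = 'active'
  AND a.pid <> pg_backend_pid();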
Conclusion
pg_stat_statements is an invaluable tool for PostgreSQL performance monitoring and optimization. By following the steps outlined in this guide, you can effectively identify problematic queries, track performance trends over time, and make data-driven decisions to improve your database’s efficiency.
Remember that performance tuning is an iterative process. Regularly review your pg_stat_statements data, implement optimizations, and measure the results to ensure your PostgreSQL database continues to perform at its best.