pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
Hello,
This is my first patch, one I have been meaning to submit for nearly two years.
It adds a new `calls_aborted` counter to `pg_stat_statements` that
tracks queries terminated due to statement timeouts, user
cancellations, or other errors during execution. Unlike the existing
`calls` counter, which only increments on successful completion,
`calls_aborted` provides visibility into failed query attempts,
enabling better monitoring of query reliability and timeout issues
without needing to examine logs. The counter is incremented in the
`ExecutorRun` `PG_FINALLY` block to capture queries that start
execution but fail to complete.
I work mostly as a web developer in environments with sensitive data.
Exposing cancelled and timed-out queries through `pg_stat_statements`
instead of logs would help developers get a quick overview of
problematic queries without needing access to the full logs and the
more sensitive data they contain (slow query text, auto_explain
output, etc.).
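For illustration, a monitoring query against the proposed column could
look like this (a sketch, assuming the column lands as in this patch):

    SELECT query, calls, calls_aborted,
           round(100.0 * calls_aborted / nullif(calls + calls_aborted, 0), 1)
             AS aborted_pct
    FROM pg_stat_statements
    WHERE calls_aborted > 0
    ORDER BY aborted_pct DESC;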
Status: For discussion
Branch: master
Testing: Compiles successfully and passes all regression tests,
including new tests for this feature
Performance impact:
Minimal performance impact. Only adds:
- One additional int64 field to the counters structure
- A hash table lookup and counter increment in the `PG_FINALLY` block
(only for aborted queries)
- No impact on successful query execution paths
Implementation rationale:
The patch tracks aborted queries specifically in `ExecutorRun`'s
`PG_FINALLY` block because:
1. Execution focus: Queries that fail during parsing/planning are
fundamentally different from queries that start executing but get
cancelled
2. Practical monitoring: Operations teams probably care most about
queries that consume resources (start execution) but fail to complete.
This follows from my earlier wish for more "dev oriented" columns.
3. Consistent semantics: Matches the existing `calls` counter, which
tracks execution completions, even if this new counter stands somewhat
alone since it is not tied to the other incremented columns.
The approach only increments `calls_aborted` for existing entries (it
does not create new ones), since queries eligible for tracking have
already been registered during the earlier post-parse-analyze phase.
Anticipated questions and responses
(Thanks to Michael Paquier and Tomas Vondra for early discussion on
this patch 🙌)
A. Why not use PGSS_ABORT kind?
pgssStoreKind is designed for successful operations (plans,
executions), with PGSS_INVALID reserved for "setup" entries. In my
view, aborted calls have no meaningful timing/resource statistics to
store in the per-kind arrays, and a new kind would require extending
ALL of the timing/buffer/WAL arrays for incomplete operations.
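For reference, here is roughly what the existing enum looks like, with
a comment marking where a hypothetical PGSS_ABORT would go (sketched
from the current pg_stat_statements.c, not part of the patch):

    typedef enum pgssStoreKind
    {
        PGSS_INVALID = -1,      /* "not yet executed" setup entries */
        PGSS_PLAN = 0,          /* stats gathered at plan time */
        PGSS_EXEC,              /* stats gathered at execution time */
        /*
         * A hypothetical PGSS_ABORT here would widen every per-kind
         * array in Counters (total_time[], min_time[], max_time[],
         * mean_time[], sum_var_time[]) with a slot that stays zero
         * for aborted runs.
         */
        PGSS_NUMKIND            /* must be last value of this enum */
    } pgssStoreKind;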
B. Could we collect additional stats about aborted runs (buffer/WAL
usage, etc.)?
I chose not to do this because it would require modifying execMain.c
and dealing with interrupted queries to gather partial information. It
seems quite ambitious to do as a first patch. I see `calls_aborted`
more as a probe to identify problematic queries for further
investigation in logs or at the application level.
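As an example of that probe workflow (assuming log_line_prefix
contains %Q so server log lines carry the query ID):

    -- step 1: find the most frequently aborted statements
    SELECT queryid, query, calls_aborted
    FROM pg_stat_statements
    ORDER BY calls_aborted DESC
    LIMIT 5;

    -- step 2: grep the server logs for a suspect queryid to see the
    -- exact error each time (statement timeout, user cancel, ...)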
C. Why not track earlier phases?
I deliberately focused `calls_aborted` on executor-level failures
rather than earlier phases (parser/analyzer/rewriter/planner) because
they serve different operational purposes. Earlier phase failures are
typically development-time errors (syntax mistakes, missing tables,
type mismatches) that don't provide actionable operational insights.
Executor aborts represent runtime operational issues (query timeouts,
cancellations, resource exhaustion, lock conflicts, etc.) that
indicate performance degradation or capacity problems requiring
attention. This design keeps the feature focused on what matters for
production monitoring: distinguishing between queries that "worked
before but now fail operationally" versus "never worked due to code
bugs." The implementation is also cleaner, avoiding the complexity of
hooking multiple subsystems and classifying different error types but
of course I may be wrong. ;)
Previous discussions:
- /messages/by-id/20171112223906.dqtl3kk3pd7vn6yc@alap3.anarazel.de
- /messages/by-id/4671.1510537167@sss.pgh.pa.us
- https://wiki.postgresql.org/wiki/PGConf.dev_2025_Developer_Unconference
"Could we count up timeouts?"
Thanks
__________
Benoit Tigeot
benoit.tigeot@gmail.com
Attachments:
patch_pgss_calls_aborted_v4.patch (application/octet-stream)
diff --git a/contrib/pg_stat_statements/expected/dml.out b/contrib/pg_stat_statements/expected/dml.out
index 347cb8699e4..80ae1a5bc06 100644
--- a/contrib/pg_stat_statements/expected/dml.out
+++ b/contrib/pg_stat_statements/expected/dml.out
@@ -172,3 +172,24 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
(1 row)
+-- aborted calls tracking
+SELECT pg_sleep(0.5);
+ pg_sleep
+----------
+
+(1 row)
+
+SET statement_timeout = '50ms';
+SELECT pg_sleep(0.5);
+ERROR: canceling statement due to statement timeout
+SELECT pg_sleep(0.5), 'test';
+ERROR: canceling statement due to statement timeout
+SET statement_timeout = '0';
+SELECT query, calls, calls_aborted FROM pg_stat_statements
+WHERE query LIKE '%pg_sleep%' ORDER BY query COLLATE "C";
+ query | calls | calls_aborted
+-------------------------+-------+---------------
+ SELECT pg_sleep($1) | 1 | 1
+ SELECT pg_sleep($1), $2 | 0 | 1
+(2 rows)
+
diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index 726383a99d7..d5ebb59d5a1 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -425,6 +425,7 @@ AlTER EXTENSION pg_stat_statements UPDATE TO '1.13';
mean_plan_time | double precision | | |
stddev_plan_time | double precision | | |
calls | bigint | | |
+ calls_aborted | bigint | | |
total_exec_time | double precision | | |
min_exec_time | double precision | | |
max_exec_time | double precision | | |
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.12--1.13.sql b/contrib/pg_stat_statements/pg_stat_statements--1.12--1.13.sql
index 2f0eaf14ec3..fe8bcb6398b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.12--1.13.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.12--1.13.sql
@@ -25,6 +25,7 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT mean_plan_time float8,
OUT stddev_plan_time float8,
OUT calls int8,
+ OUT calls_aborted int8,
OUT total_exec_time float8,
OUT min_exec_time float8,
OUT max_exec_time float8,
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9fc9635d330..61cd1fe6bb7 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -98,7 +98,7 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
#define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */
#define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */
#define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */
-#define IS_STICKY(c) ((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
+#define IS_STICKY(c) ((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC] + c.calls_aborted) == 0)
/*
* Extension version number, for supporting older extension versions' objects
@@ -155,6 +155,9 @@ typedef struct pgssHashKey
typedef struct Counters
{
int64 calls[PGSS_NUMKIND]; /* # of times planned/executed */
+
+ int64 calls_aborted; /* # of times query was aborted */
+
double total_time[PGSS_NUMKIND]; /* total planning/execution time,
* in msec */
double min_time[PGSS_NUMKIND]; /* minimum planning/execution time in
@@ -1033,6 +1036,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
static void
pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count)
{
+ volatile bool query_completed = false;
+
nesting_level++;
PG_TRY();
{
@@ -1040,9 +1045,52 @@ pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count)
prev_ExecutorRun(queryDesc, direction, count);
else
standard_ExecutorRun(queryDesc, direction, count);
+
+ /* Mark as completed if we reach here */
+ query_completed = true;
}
PG_FINALLY();
{
+ /*
+ * Check if tracking was enabled when ExecutorRun started.
+ * We use (nesting_level - 1) because nesting_level was incremented
+ * at the start of this function, but the tracking decision should
+ * be based on the level when the query began execution.
+ * This is crucial for PGSS_TRACK_TOP mode to work correctly.
+ */
+ bool was_enabled = pgss_enabled(nesting_level - 1);
+
+ if (!query_completed && pgss && was_enabled &&
+ queryDesc->plannedstmt->queryId != UINT64CONST(0))
+ {
+ pgssHashKey key;
+ pgssEntry *entry;
+
+ /* Clear padding to ensure proper hash key comparison */
+ memset(&key, 0, sizeof(pgssHashKey));
+
+ key.userid = GetUserId();
+ key.dbid = MyDatabaseId;
+ /* nesting_level was incremented at start of ExecutorRun */
+ key.toplevel = (nesting_level == 1);
+ key.queryid = queryDesc->plannedstmt->queryId;
+
+ LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
+
+ /* Only increment calls_aborted if entry already exists.
+ * Entries are created in pgss_post_parse_analyze for queries with constants.
+ * If no entry exists, the query wouldn't normally be tracked anyway. */
+ entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
+
+ if (entry)
+ {
+ SpinLockAcquire(&entry->mutex);
+ entry->counters.calls_aborted++;
+ SpinLockRelease(&entry->mutex);
+ }
+ LWLockRelease(pgss->lock);
+ }
+
nesting_level--;
}
PG_END_TRY();
@@ -1578,8 +1626,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_10 43
#define PG_STAT_STATEMENTS_COLS_V1_11 49
#define PG_STAT_STATEMENTS_COLS_V1_12 52
-#define PG_STAT_STATEMENTS_COLS_V1_13 54
-#define PG_STAT_STATEMENTS_COLS 54 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_13 55
+#define PG_STAT_STATEMENTS_COLS 55 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1922,6 +1970,13 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (kind == PGSS_EXEC || api_version >= PGSS_V1_8)
{
values[i++] = Int64GetDatumFast(tmp.calls[kind]);
+
+ /* Add calls_aborted right after execution calls */
+ if (kind == PGSS_EXEC && api_version >= PGSS_V1_13)
+ {
+ values[i++] = Int64GetDatumFast(tmp.calls_aborted);
+ }
+
values[i++] = Float8GetDatumFast(tmp.total_time[kind]);
}
diff --git a/contrib/pg_stat_statements/sql/dml.sql b/contrib/pg_stat_statements/sql/dml.sql
index 9986b0a22d3..199f018ebc3 100644
--- a/contrib/pg_stat_statements/sql/dml.sql
+++ b/contrib/pg_stat_statements/sql/dml.sql
@@ -93,3 +93,13 @@ SELECT
FROM pg_stat_statements;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+-- aborted calls tracking
+SELECT pg_sleep(0.5);
+SET statement_timeout = '50ms';
+SELECT pg_sleep(0.5);
+SELECT pg_sleep(0.5), 'test';
+SET statement_timeout = '0';
+
+SELECT query, calls, calls_aborted FROM pg_stat_statements
+WHERE query LIKE '%pg_sleep%' ORDER BY query COLLATE "C";
On Tue, Aug 12, 2025 at 04:54:10PM +0200, Benoit Tigeot wrote:
> I deliberately focused `calls_aborted` on executor-level failures
> rather than earlier phases (parser/analyzer/rewriter/planner) because
> they serve different operational purposes. Earlier phase failures are
> typically development-time errors (syntax mistakes, missing tables,
> type mismatches) that don't provide actionable operational insights.
> Executor aborts represent runtime operational issues (query timeouts,
> cancellations, resource exhaustion, lock conflicts, etc.) that
> indicate performance degradation or capacity problems requiring
> attention. This design keeps the feature focused on what matters for
> production monitoring: distinguishing between queries that "worked
> before but now fail operationally" versus "never worked due to code
> bugs." The implementation is also cleaner, avoiding the complexity of
> hooking multiple subsystems and classifying different error types but
> of course I may be wrong. ;)
That seems kind of limited to me in scope. The executor is only one
part of the system. I would have considered using an xact callback
when a transaction is aborted if I were to do a patch like the one you
are proposing, to know how many times a transaction is failing at a
specific phase, because you should know the latest query_id in this
case to be able to put a counter update in the correct slot (right?).
> +-- aborted calls tracking
> +SELECT pg_sleep(0.5);
> + pg_sleep
> +----------
> +
> +(1 row)
Using hardcoded sleep times for deterministic tests is never a good
idea. On fast machines, they eat time for nothing. And if not
written correctly, they may not achieve their goal on slow machines
because the sleep threshold may be reached before the custom action is
taken. If you want to force a failure, you should just use a SQL that
you know would fail at execution time (based on what your
implementation expects).
--
Michael
Thanks Michael
On Thu, Aug 14, 2025 at 10:18 AM, Michael Paquier <michael@paquier.xyz> wrote:
> That seems kind of limited to me in scope. The executor is only one
> part of the system. I would have considered using an xact callback
> when a transaction is aborted if I were to do a patch like the one you
> are proposing, to know how many times a transaction is failing at a
> specific phase, because you should know the latest query_id in this
> case to be able to put a counter update in the correct slot (right?).
I will make a v5 patch with this approach. It should also address your
second comment about using `pg_sleep` in the test. I used that
approach because it seemed the closest to a real timeout issue in
production, more like an integration test. I will try to change it in
the new patch.
__________
Benoit Tigeot
On Thu, Aug 14, 2025 at 11:04:27AM +0200, Benoit Tigeot wrote:
> Thanks Michael
>
> On Thu, Aug 14, 2025 at 10:18 AM, Michael Paquier <michael@paquier.xyz> wrote:
>> That seems kind of limited to me in scope. The executor is only one
>> part of the system. I would have considered using an xact callback
>> when a transaction is aborted if I were to do a patch like the one you
>> are proposing, to know how many times a transaction is failing at a
>> specific phase, because you should know the latest query_id in this
>> case to be able to put a counter update in the correct slot (right?).
>
> I will make a v5 patch with this approach.
I'm wondering how useful that counter alone will be. Since
pg_stat_statements is already quite large, wouldn't it be better to have
another extension that tracks something like (queryid, sqlstate), with maybe
the number of errors, the oldest and the newest timestamp or something like
that? You would have way more information to diagnose various problems, and
can optionally use pg_stat_statements if you need the query text from
there (vs. e.g. the logs).
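To sketch the row shape I have in mind (all names hypothetical, this
extension does not exist):

    -- one row per (queryid, sqlstate) pair
    --   queryid      bigint       -- joins back to pg_stat_statements.queryid
    --   sqlstate     text         -- e.g. '57014' = query_canceled
    --   error_count  bigint
    --   first_seen   timestamptz  -- oldest occurrence
    --   last_seen    timestamptz  -- newest occurrence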
On 14 Aug 2025, at 10:18, Michael Paquier <michael@paquier.xyz> wrote:
> That seems kind of limited to me in scope. The executor is only one
> part of the system. I would have considered using an xact callback
> when a transaction is aborted if I were to do a patch like the one you
> are proposing, to know how many times a transaction is failing at a
> specific phase, because you should know the latest query_id in this
> case to be able to put a counter update in the correct slot (right?).
>
>> +-- aborted calls tracking
>> +SELECT pg_sleep(0.5);
>> + pg_sleep
>> +----------
>> +
>> +(1 row)
>
> Using hardcoded sleep times for deterministic tests is never a good
> idea. On fast machines, they eat time for nothing. And if not
> written correctly, they may not achieve their goal on slow machines
> because the sleep threshold may be reached before the custom action is
> taken. If you want to force a failure, you should just use a SQL that
> you know would fail at execution time (based on what your
> implementation expects).
Hello,
Thanks for the review.
On scope and callbacks
- I prototyped a version with RegisterXactCallback and RegisterSubXactCallback but I’m not very happy with the result. Same limitation as v4: we only have a stable identity (queryId) after parse/analyze. The xact/subxact callbacks fire with no statement context, so you still need to know “which statement was active” at abort time.
- To deal with that, I keep a small backend-local tracker of the current statement’s key (userid, dbid, queryId, toplevel) when we enter the executor, mark it completed on success, and bump calls_aborted in the callbacks only if active && !completed.
- This reliably captures executor-time failures and timeouts. It cannot attribute errors that happen before post-parse analysis (syntax/name resolution), because queryId isn't set yet; that's the hard boundary. Planner failures aren't attributed in this version either, because the tracker is still only set in ExecutorRun.
On the value of callbacks
- It seems callbacks alone are insufficient (no per-statement context).
- If we want to preserve pgss semantics (only report statements that reached post-parse and have a queryId), the v4 approach is simpler and safer; I could look into planner error handling in a next patch version.
- If we also want to count very-early errors, that implies raw-text keyed entries (no queryId), which changes pg_stat_statements' semantics (normalization, PII risk, cardinality). That should probably be a separate extension, as Julien Rouhaud suggested: /messages/by-id/aJ2ov4KYM2vX4uAs@jrouhaud. I am not sure I would be able to build that, and I also need something that managed PostgreSQL providers with extension restrictions will support.
Happy to adjust if you see a better way to thread “latest queryId” into the abort path without the local tracker.
On pre-creating entries earlier
- I considered creating entries from raw text to count very-early errors, but I’ve avoided it: pgss is keyed on the normalized parse tree (queryId). Pre-hashing raw text weakens normalization, risks storing sensitive literals before jumbling, and adds lock churn in hot paths.
Tests
- Agreed on sleep flakiness. I switched to deterministic executor-time failures: duplicate key (primary key violation) and a check constraint violation. Both trip calls_aborted without timing assumptions.
Also, are we okay tracking statements that never reach post-parse (i.e., no queryId)? That would effectively key on raw text, which changes pg_stat_statements’ semantics.
Benoit
Attachments:
patch_pgss_calls_aborted_v5.patch (application/octet-stream)
diff --git a/contrib/pg_stat_statements/expected/dml.out b/contrib/pg_stat_statements/expected/dml.out
index 347cb8699e4..ac5886b250f 100644
--- a/contrib/pg_stat_statements/expected/dml.out
+++ b/contrib/pg_stat_statements/expected/dml.out
@@ -172,3 +172,24 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
(1 row)
+-- aborted calls tracking
+DROP TABLE IF EXISTS t_dup, t_check;
+NOTICE: table "t_dup" does not exist, skipping
+NOTICE: table "t_check" does not exist, skipping
+CREATE TABLE t_dup(id int primary key);
+INSERT INTO t_dup VALUES (1); -- success
+CREATE TABLE t_check(x int CHECK (x > 0));
+INSERT INTO t_dup VALUES (1); -- unique violation, aborted and success reported
+ERROR: duplicate key value violates unique constraint "t_dup_pkey"
+DETAIL: Key (id)=(1) already exists.
+INSERT INTO t_check VALUES (0); -- check violation, only failure reported
+ERROR: new row for relation "t_check" violates check constraint "t_check_x_check"
+DETAIL: Failing row contains (0).
+SELECT query, calls, calls_aborted FROM pg_stat_statements
+WHERE query LIKE '%INSERT INTO t_%' ORDER BY query COLLATE "C";
+ query | calls | calls_aborted
+---------------------------------+-------+---------------
+ INSERT INTO t_check VALUES ($1) | 0 | 1
+ INSERT INTO t_dup VALUES ($1) | 1 | 1
+(2 rows)
+
diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index 726383a99d7..d5ebb59d5a1 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -425,6 +425,7 @@ AlTER EXTENSION pg_stat_statements UPDATE TO '1.13';
mean_plan_time | double precision | | |
stddev_plan_time | double precision | | |
calls | bigint | | |
+ calls_aborted | bigint | | |
total_exec_time | double precision | | |
min_exec_time | double precision | | |
max_exec_time | double precision | | |
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.12--1.13.sql b/contrib/pg_stat_statements/pg_stat_statements--1.12--1.13.sql
index 2f0eaf14ec3..fe8bcb6398b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.12--1.13.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.12--1.13.sql
@@ -25,6 +25,7 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT mean_plan_time float8,
OUT stddev_plan_time float8,
OUT calls int8,
+ OUT calls_aborted int8,
OUT total_exec_time float8,
OUT min_exec_time float8,
OUT max_exec_time float8,
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9fc9635d330..912e85c9d37 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -48,6 +48,7 @@
#include <unistd.h>
#include "access/parallel.h"
+#include "access/xact.h"
#include "catalog/pg_authid.h"
#include "common/int.h"
#include "executor/instrument.h"
@@ -98,7 +99,7 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
#define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */
#define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */
#define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */
-#define IS_STICKY(c) ((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
+#define IS_STICKY(c) ((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC] + c.calls_aborted) == 0)
/*
* Extension version number, for supporting older extension versions' objects
@@ -155,6 +156,9 @@ typedef struct pgssHashKey
typedef struct Counters
{
int64 calls[PGSS_NUMKIND]; /* # of times planned/executed */
+
+ int64 calls_aborted; /* # of times query was aborted */
+
double total_time[PGSS_NUMKIND]; /* total planning/execution time,
* in msec */
double min_time[PGSS_NUMKIND]; /* minimum planning/execution time in
@@ -274,6 +278,32 @@ static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
static ProcessUtility_hook_type prev_ProcessUtility = NULL;
+/* Transaction callback functions */
+static void pgss_xact_callback(XactEvent event, void *arg);
+static void pgss_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg);
+
+/*
+ * Backend-local tracking of the currently running statement.
+ *
+ * We record the hash key used by pg_stat_statements for the statement that
+ * has begun execution, and whether it completed successfully. If a
+ * transaction or subtransaction abort happens while a statement is marked as
+ * active but not completed, we count it as an aborted call in the callbacks.
+ */
+typedef struct pgssLocalCurrentStmt
+{
+ bool active; /* true if we've started tracking a stmt */
+ bool completed; /* true if the tracked stmt finished OK */
+ pgssHashKey key; /* identity of the tracked stmt */
+} pgssLocalCurrentStmt;
+
+/* One instance per backend. Zero-initialized at backend start. */
+static pgssLocalCurrentStmt pgss_curr_stmt = {false, false, {0}};
+
+/* helper to increment calls_aborted for the current active statement */
+static void pgss_count_current_as_aborted(void);
+
/* Links to shared memory state */
static pgssSharedState *pgss = NULL;
static HTAB *pgss_hash = NULL;
@@ -490,6 +520,10 @@ _PG_init(void)
ExecutorEnd_hook = pgss_ExecutorEnd;
prev_ProcessUtility = ProcessUtility_hook;
ProcessUtility_hook = pgss_ProcessUtility;
+
+ /* Register transaction callbacks for tracking aborted queries */
+ RegisterXactCallback(pgss_xact_callback, NULL);
+ RegisterSubXactCallback(pgss_subxact_callback, NULL);
}
/*
@@ -1033,6 +1067,21 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
static void
pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count)
{
+ volatile bool query_completed = false;
+
+ /* Start tracking this statement for abort accounting if eligible. */
+ if (pgss && pgss_hash && queryDesc->plannedstmt->queryId != INT64CONST(0)
+ && pgss_enabled(nesting_level))
+ {
+ memset(&pgss_curr_stmt.key, 0, sizeof(pgss_curr_stmt.key));
+ pgss_curr_stmt.key.userid = GetUserId();
+ pgss_curr_stmt.key.dbid = MyDatabaseId;
+ pgss_curr_stmt.key.queryid = queryDesc->plannedstmt->queryId;
+ pgss_curr_stmt.key.toplevel = (nesting_level == 0);
+ pgss_curr_stmt.active = true;
+ pgss_curr_stmt.completed = false;
+ }
+
nesting_level++;
PG_TRY();
{
@@ -1040,9 +1089,14 @@ pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count)
prev_ExecutorRun(queryDesc, direction, count);
else
standard_ExecutorRun(queryDesc, direction, count);
+
+ /* Mark as completed if we reach here */
+ query_completed = true;
}
PG_FINALLY();
{
+ if (pgss_curr_stmt.active)
+ pgss_curr_stmt.completed = query_completed;
nesting_level--;
}
PG_END_TRY();
@@ -1102,6 +1156,10 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->plannedstmt->planOrigin);
}
+ /* Clear local tracking for this statement at ExecutorEnd. */
+ pgss_curr_stmt.active = false;
+ pgss_curr_stmt.completed = false;
+
if (prev_ExecutorEnd)
prev_ExecutorEnd(queryDesc);
else
@@ -1275,6 +1333,62 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
}
}
+/*
+ * Transaction callback: report when queries are aborted due to transaction rollback
+ */
+static void
+pgss_xact_callback(XactEvent event, void *arg)
+{
+ /* We only care about transaction abort events for now */
+ if (event == XACT_EVENT_ABORT)
+ pgss_count_current_as_aborted();
+}
+
+/*
+ * Subtransaction callback: log when queries are aborted due to savepoint rollback
+ */
+static void
+pgss_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ /* We only care about subtransaction abort events for now */
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ pgss_count_current_as_aborted();
+}
+
+/*
+ * Increment calls_aborted for the currently tracked statement
+ */
+static void
+pgss_count_current_as_aborted(void)
+{
+ pgssEntry *entry;
+
+ /* Fast-path checks */
+ if (!pgss || !pgss_hash)
+ return;
+
+ /* Only if we have a statement that started but did not complete */
+ if (!pgss_curr_stmt.active || pgss_curr_stmt.completed)
+ return;
+
+ /* Acquire lock and try to find the entry. Don't create new entries. */
+ LWLockAcquire(pgss->lock, LW_SHARED);
+ entry = (pgssEntry *) hash_search(pgss_hash, &pgss_curr_stmt.key,
+ HASH_FIND, NULL);
+ if (entry)
+ {
+ SpinLockAcquire(&entry->mutex);
+ entry->counters.calls_aborted++;
+ SpinLockRelease(&entry->mutex);
+ }
+ LWLockRelease(pgss->lock);
+
+ /* Avoid double-counting on multiple abort-phase callbacks. */
+ pgss_curr_stmt.active = false;
+ pgss_curr_stmt.completed = false;
+}
+
/*
* Store some statistics for a statement.
*
@@ -1578,8 +1692,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_10 43
#define PG_STAT_STATEMENTS_COLS_V1_11 49
#define PG_STAT_STATEMENTS_COLS_V1_12 52
-#define PG_STAT_STATEMENTS_COLS_V1_13 54
-#define PG_STAT_STATEMENTS_COLS 54 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_13 55
+#define PG_STAT_STATEMENTS_COLS 55 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1922,6 +2036,13 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (kind == PGSS_EXEC || api_version >= PGSS_V1_8)
{
values[i++] = Int64GetDatumFast(tmp.calls[kind]);
+
+ /* Add calls_aborted right after execution calls */
+ if (kind == PGSS_EXEC && api_version >= PGSS_V1_13)
+ {
+ values[i++] = Int64GetDatumFast(tmp.calls_aborted);
+ }
+
values[i++] = Float8GetDatumFast(tmp.total_time[kind]);
}
diff --git a/contrib/pg_stat_statements/sql/dml.sql b/contrib/pg_stat_statements/sql/dml.sql
index 9986b0a22d3..c612eb4cae2 100644
--- a/contrib/pg_stat_statements/sql/dml.sql
+++ b/contrib/pg_stat_statements/sql/dml.sql
@@ -93,3 +93,15 @@ SELECT
FROM pg_stat_statements;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+-- aborted calls tracking
+DROP TABLE IF EXISTS t_dup, t_check;
+CREATE TABLE t_dup(id int primary key);
+INSERT INTO t_dup VALUES (1); -- success
+CREATE TABLE t_check(x int CHECK (x > 0));
+INSERT INTO t_dup VALUES (1); -- unique violation, aborted and success reported
+INSERT INTO t_check VALUES (0); -- check violation, only failure reported
+
+SELECT query, calls, calls_aborted FROM pg_stat_statements
+WHERE query LIKE '%INSERT INTO t_%' ORDER BY query COLLATE "C";
+
Thanks for starting this thread! I do think it will be valuable to have
an idea of how many calls were aborted. But I am not so sure about
pg_stat_statements being the correct vehicle for this information.
The pg_stat_statements view deals with stats that are updated when a
statement completes execution, so I think it is quite confusing to add
this metric to pg_stat_statements.
> I'm wondering how useful that counter alone will be. Since
> pg_stat_statements is already quite large,

I agree, but I think it's time to start thinking about splitting
pg_stat_statements into multiple functions. This was recently
discussed [0].
> wouldn't it be better to have
> another extension that tracks something like (queryid, sqlstate),
I think this can be a part of the pg_stat_statements extension, but as
a separate view, i.e. pg_stat_statements_aborted ( or something ) which
tracks this info.
> part of the system. I would have considered using an xact callback
> when a transaction is aborted if I were to do a patch like the one you
> are proposing, to know how many times a transaction is failing at a
> specific phase,
My 2c: for such statistics, the implementation needs to count when
execution starts and when execution ends. The difference between those
two values gives us the aborted calls count.
When we rely on the transaction callbacks, we assume we reach that
hook (which happens during normal transaction cleanup), but I am
thinking of other, quite common cases in which the application times
out and causes the connection to drop. By counting at ExecutorStart
and ExecutorEnd, as described above, we can cover those cases also.
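A minimal sketch of the idea (hypothetical counter names, not the
proposed patch):

    /* in the ExecutorStart hook: */
    entry->counters.calls_started++;    /* hypothetical counter */

    /* in the ExecutorEnd hook, which an aborted statement or a
     * dropped connection never reaches: */
    entry->counters.calls_completed++;  /* hypothetical counter */

    /* when reporting, aborted calls fall out as the difference:
     * calls_started - calls_completed */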
Unfortunately, I think this will not be very performant with the way
pg_s_s currently works, as it means we will need to take an additional
spinlock at executor start to count this.
I do think that if we move pg_s_s to using pluggable stats [1], we
have an opportunity to do something like this.
(FWIW, I also think it would be good to have calls_started and
calls_completed counters at a higher level too, as in
pg_stat_database.)
[0]: /messages/by-id/aHdAQskQCjGWOdfi@paquier.xyz
[1]: /messages/by-id/aKF0V-T8-XAxj47T@paquier.xyz
--
Sami
On Mon, Aug 18, 2025 at 03:10:42PM -0500, Sami Imseih wrote:
> I agree, but I think it's time to start thinking about splitting
> pg_stat_statements into multiple functions. This was recently
> discussed [0]
Yes. I don't know about other people, but I'm putting a stop to new
additions until we've split the SQL-facing interface a bit, removing
some bloat from the main pgss view. So says the guy who just bumped
pgss to 1.13 a couple of weeks ago, so that may sound a bit ironic.
--
Michael
Hi,
On 2025-08-19 09:20:23 +0900, Michael Paquier wrote:
> On Mon, Aug 18, 2025 at 03:10:42PM -0500, Sami Imseih wrote:
>> I agree, but I think it's time to start thinking about splitting
>> pg_stat_statements into multiple functions. This was recently
>> discussed [0]
>
> Yes. I don't know about other people, but I'm putting a stop to new
> additions until we've split the SQL-facing interface a bit, removing
> some bloat from the main pgss view. So says the guy who just bumped
> pgss to 1.13 a couple of weeks ago, so that may sound a bit ironic.
I think the problem isn't mainly the SQL view, it's that all the additions
made pg_stat_statements have so much overhead that it's practically unusable
for any busy workload. It used to be only an issue on really big servers, but
there's been so much crap stuffed into pgss that it's trivially reproducible
on a laptop.
I think it's pretty insane to do things like variance computation
while holding a spinlock, for every friggin query execution. The
spinlock'ed section is ~185 instructions for me, with plenty of
high-latency instructions like divisions.
It's so slow that it has measurable impact for single threaded readonly
pgbench. Which does friggin btree lookups.
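For readers following along, the section in question is the
per-execution update in pgss_store(), which looks roughly like this
(condensed and paraphrased from pg_stat_statements.c, not verbatim):

    SpinLockAcquire(&entry->mutex);

    entry->counters.calls[kind]++;
    entry->counters.total_time[kind] += total_time;

    /* Welford's online mean/variance update: a division and two
     * multiplications per call, all while holding the spinlock */
    old_mean = entry->counters.mean_time[kind];
    entry->counters.mean_time[kind] +=
        (total_time - old_mean) / entry->counters.calls[kind];
    entry->counters.sum_var_time[kind] +=
        (total_time - old_mean) *
        (total_time - entry->counters.mean_time[kind]);

    /* ... plus min/max, rows, buffer, WAL and JIT counters ... */

    SpinLockRelease(&entry->mutex);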
Greetings,
Andres Freund
On 19 Aug 2025, at 15:46, Andres Freund <andres@anarazel.de> wrote:
> I think it's pretty insane to do things like variance computation
> while holding a spinlock, for every friggin query execution. The
> spinlock'ed section is ~185 instructions for me, with plenty of
> high-latency instructions like divisions.
>
> It's so slow that it has measurable impact for single threaded readonly
> pgbench. Which does friggin btree lookups.
Do you think it could be an option to first add regression benchmarks,
and then consider migrating the most time-consuming columns to another
extension? I am not sure whether moving existing results out of pgss
would be acceptable.
---
Benoit