pg_stat_statements has duplicate entries for the same query & queryId
Hi Folks!
Let me know if there's a better mailing list to ask this in.
I have a statistics collector that collects data from various postgres
statistics tables, pg_stat_statements being one of them. This is done on an
entire fleet of Postgres databases. From the collected data we record the
timestamp of each collection in the query itself as extract(epoch from
now()) as ts. What I'm seeing is that for the same query *and* query id,
there are two rows with different statistics data *at the same time*. For
example one row can have 2 calls while another can have 4. Anyone else run
into this or have any idea why this can occur?
Regards,
Jevon Cowell
On 5/18/25 12:20, Jevon Cowell wrote:
Hi Folks!
Let me know if there's a better mailing list to ask this in.I have a statistics collector that collects data from various postgres
statistics tables, pg_stat_statements being one of them. This is done on
an entire fleet of Postgres databases. From the collected data we record
the timestamp of each collection in the query itself as extract(epoch
from now()) as ts. What I'm seeing is that for the same query
/and/ query id, there are two rows with different statistics data /at
the same time/. For example one row can have 2 calls while another can
have 4. Anyone else run into this or have any idea why this can occur?
From here:
https://www.postgresql.org/docs/current/pgstatstatements.html
queryid bigint
Hash code to identify identical normalized queries.
query text
Text of a representative statement
From here:
https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT
"now() is a traditional PostgreSQL equivalent to transaction_timestamp()"
and
"transaction_timestamp() is equivalent to CURRENT_TIMESTAMP, but is
named to clearly reflect what it returns."
Therefore now() is pinned to the time the transaction started.
Consequently it is conceivable that the queries actually ran at
different times but got stamped with an identical timestamp via
extract(epoch from now()) as ts.
Regards,
Jevon Cowell
--
Adrian Klaver
adrian.klaver@aklaver.com
Jevon Cowell wrote:
What I'm seeing is that for the same query *and* query id,
there are two rows with different statistics data *at the same time*. For
example one row can have 2 calls while another can have 4
That looks normal. The unicity to expect is on
(queryid, userid, dbid, toplevel).
From https://www.postgresql.org/docs/current/pgstatstatements.html :
This view contains one row for each distinct combination of database
ID, user ID, query ID and whether it's a top-level statement or not
(up to the maximum number of distinct statements that the module can
track)
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
On Mon, May 19, 2025 at 09:07:35AM +0200, Daniel Verite wrote:
That looks normal. The unicity to expect is on
(queryid, userid, dbid, toplevel).From https://www.postgresql.org/docs/current/pgstatstatements.html :
This view contains one row for each distinct combination of database
ID, user ID, query ID and whether it's a top-level statement or not
(up to the maximum number of distinct statements that the module can
track)
Yeah. The uniqueness of the entries in the PGSS hash table is
enforced by pgssHashKey in pg_stat_statements.c. Or an unknown bug
has been found, but there is no data proving that here.
--
Michael