pg_stat_statements has duplicate entries for the same query & queryId

Started by Jevon Cowell11 months ago4 messagesgeneral

jcowell@atlassian.com

11 months ago

Hi Folks!
Let me know if there's a better mailing list to ask this in.

I have a statistics collector that collects data from various postgres
statistics tables, pg_stat_statements being one of them. This is done on an
entire fleet of Postgres databases. From the collected data we record the
timestamp of each collection in the query itself as extract(epoch from
now()) as ts. What I'm seeing is that for the same query *and* query id,
there are two rows with different statistics data *at the same time*. For
example one row can have 2 calls while another can have 4. Anyone else run
into this or have any idea why this can occur?

Regards,
Jevon Cowell

Adrian Klaver

adrian.klaver@aklaver.com

11 months ago

In reply to: Jevon Cowell (#1)

Re: pg_stat_statements has duplicate entries for the same query & queryId

On 5/18/25 12:20, Jevon Cowell wrote:

Hi Folks!
Let me know if there's a better mailing list to ask this in.

I have a statistics collector that collects data from various postgres
statistics tables, pg_stat_statements being one of them. This is done on
an entire fleet of Postgres databases. From the collected data we record
the timestamp of each collection in the query itself as extract(epoch
from now()) as ts. What I'm seeing is that for the same query
/and/ query id, there are two rows with different statistics data /at
the same time/. For example one row can have 2 calls while another can
have 4. Anyone else run into this or have any idea why this can occur?

From here:

https://www.postgresql.org/docs/current/pgstatstatements.html

queryid bigint

Hash code to identify identical normalized queries.

query text

Text of a representative statement

From here:

https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT

"now() is a traditional PostgreSQL equivalent to transaction_timestamp()"

and

"transaction_timestamp() is equivalent to CURRENT_TIMESTAMP, but is
named to clearly reflect what it returns."

Therefore now() is pinned to the time the transaction started.
Consequently it is conceivable that the queries actually ran at
different times but got stamped with an identical timestamp via
extract(epoch from now()) as ts.

Regards,
Jevon Cowell

--
Adrian Klaver
adrian.klaver@aklaver.com

Daniel Verite

daniel@manitou-mail.org

11 months ago

In reply to: Jevon Cowell (#1)

Re: pg_stat_statements has duplicate entries for the same query & queryId

Jevon Cowell wrote:

What I'm seeing is that for the same query *and* query id,
there are two rows with different statistics data *at the same time*. For
example one row can have 2 calls while another can have 4

That looks normal. The unicity to expect is on
(queryid, userid, dbid, toplevel).

From https://www.postgresql.org/docs/current/pgstatstatements.html :

This view contains one row for each distinct combination of database
ID, user ID, query ID and whether it's a top-level statement or not
(up to the maximum number of distinct statements that the module can
track)

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/

Michael Paquier

michael@paquier.xyz

11 months ago

In reply to: Daniel Verite (#3)

Re: pg_stat_statements has duplicate entries for the same query & queryId

On Mon, May 19, 2025 at 09:07:35AM +0200, Daniel Verite wrote:

That looks normal. The unicity to expect is on
(queryid, userid, dbid, toplevel).

From https://www.postgresql.org/docs/current/pgstatstatements.html :

This view contains one row for each distinct combination of database
ID, user ID, query ID and whether it's a top-level statement or not
(up to the maximum number of distinct statements that the module can
track)

Yeah. The uniqueness of the entries in the PGSS hash table is
enforced by pgssHashKey in pg_stat_statements.c. Or an unknown bug
has been found, but there is no data proving that here.
--
Michael