pg_stat_statements: improve loading and saving routines for the dump file

Started by Nonameover 1 year ago8 messageshackers

m.litsarev@postgrespro.ru

over 1 year ago

Hi!

Currently in pg_stat_statements save/load routines the whole pgssEntry
entity data are written/read with its last field
slock_t mutex;
which is actually not used then.
This small patch fixes this issue. Hope, it will be useful.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Ivan Kush

ivan.kush@tantorlabs.com

over 1 year ago

In reply to: Noname (#1)

Re: pg_stat_statements: improve loading and saving routines for the dump file

Hello, Mikhail.

1) I'd add to comment a reason, why mutex should be last.

// Mutex should be last field, as this field isn't copied to dump file

+	/* protects the counters only. Should be the very last field, as this field isn't copied to dump file
+	slock_t		mutex;
  } pgssEntry;

2) You didn't take into account the upgrade. Saved in Postgres with this
byte and try to load in version without this byte.

On 1/20/25 16:49, m.litsarev@postgrespro.ru wrote:

Hi!

Currently in pg_stat_statements save/load routines the whole pgssEntry
entity data are written/read with its last field
slock_t mutex;
which is actually not used then.
This small patch fixes this issue. Hope, it will be useful.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

--
Best wishes,
Ivan Kush
Tantor Labs LLC

Noname

m.litsarev@postgrespro.ru

over 1 year ago

In reply to: Ivan Kush (#2)

Re: pg_stat_statements: improve loading and saving routines for the dump file

// Mutex should be last field, as this field isn't copied to dump file

Updated.

2) You didn't take into account the upgrade. Saved in Postgres with
this byte and try to load in version without this byte.

The PGSS_DUMP_FILE format is defined by PGSS_FILE_HEADER magic number
(the first four bytes in the dump file). Once they does not coinside
(when the file is being read) the statistics are not loaded from the
dump file and set to zero and the irrelevant dump file is skipped. This
is the actual behaviour implemented in the code.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Ivan Kush

ivan.kush@tantorlabs.com

over 1 year ago

In reply to: Noname (#3)

Re: pg_stat_statements: improve loading and saving routines for the dump file

What does this patch give on aglobal scale? Does it save much memory or
increase performance? How many times?

On 1/21/25 13:51, m.litsarev@postgrespro.ru wrote:

// Mutex should be last field, as this field isn't copied to dump file

Updated.

2) You didn't take into account the upgrade. Saved in Postgres with
this byte and try to load in version without this byte.

The PGSS_DUMP_FILE format is defined by PGSS_FILE_HEADER magic number
(the first four bytes in the dump file). Once they does not coinside
(when the file is being read) the statistics are not loaded from the
dump file and set to zero and the irrelevant dump file is skipped.
This is the actual behaviour implemented in the code.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

--
Best wishes,
Ivan Kush
Tantor Labs LLC

Sami Imseih

samimseih@gmail.com

over 1 year ago

In reply to: Ivan Kush (#4)

Re: pg_stat_statements: improve loading and saving routines for the dump file

This will only reduce the size of the
$PGDATA/pg_stat/pg_stat_statements.txt file. Even with
100k entries, the most I have seen pg_stat_statements.max
set to, that will be less than 1 MB of disk saving. The default
config of 5k entries will be much less.

Regards,

Sami

Noname

m.litsarev@postgrespro.ru

over 1 year ago

In reply to: Ivan Kush (#4)

Re: pg_stat_statements: improve loading and saving routines for the dump file

What does this patch give on aglobal scale? Does it save much memory or
increase performance? How many times?

This patch makes the code semantically more correct and we don't lose
anything. It is obviously not about performance or memory optimisation.

This will only reduce the size of the
$PGDATA/pg_stat/pg_stat_statements.txt file. Even with
100k entries, the most I have seen pg_stat_statements.max
set to, that will be less than 1 MB of disk saving. The default
config of 5k entries will be much less.

Perfectly agree. I would just add that statistics are dropped into the
pg_stat_statements.txt file at server stop which is not a very frequent
event.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Noname (#6)

Re: pg_stat_statements: improve loading and saving routines for the dump file

m.litsarev@postgrespro.ru writes:

What does this patch give on aglobal scale? Does it save much memory or
increase performance? How many times?

This patch makes the code semantically more correct and we don't lose
anything. It is obviously not about performance or memory optimisation.

Indeed not: on my machine I see sizeof(pgssEntry) = 432. It's full of
int64 fields, so the alignment requirement is 8 bytes, meaning that
the mutex field accounts for 8 bytes even though it's likely just 1
or 4 bytes wide. Still, that means what you suggest is only going
to save 8/432 = 1.8% of the on-disk size of the struct. Given that
we also store the SQL query text for each entry, the net savings
fraction is even smaller.

I don't really agree that this is adding any semantic correctness
either. If we were reading directly into a live hashtable entry,
overwriting the mutex field could indeed be bad. But the fread()
call is reading into a local variable "temp" and we only copy
selected fields out of that. So it's just some bytes in a
transient stack entry.

On the whole therefore, I'm inclined to reject this on several
grounds:

* The savings is not worth the costs of bumping the stats file format
magic number (and thereby invalidating everyone's stored statistics).

* I'd rather keep the flexibility to put the mutex wherever we want in
the struct. Right now that isn't worth much either, but depending on
what fields get added in future, we might have the opportunity to move
the mutex to someplace where it fits into padding space and hence adds
nothing to sizeof(pgssEntry).

* Adding the requirement on where the mutex is makes the code more
fragile, since somebody might ignore the comment and place things
incorrectly. I'd like to think that such an error would be spotted
immediately, but we've committed sillier mistakes. The consequences
would likely be that some fields don't get stored/reloaded correctly,
which might escape notice for awhile.

regards, tom lane

Noname

m.litsarev@postgrespro.ru

about 1 year ago

In reply to: Sami Imseih (#5)

Re: pg_stat_statements: improve loading and saving routines for the dump file

Hi!

Thank you for detailed explanations.

Respectfully,
Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

pg_stat_statements: improve loading and saving routines for the dump file

Attachments:

Attachments: