Preserve index stats during ALTER TABLE ... TYPE ...
Hi hackers,
while working on relfilenode statistics [1], I observed that index stats
are not preserved during ALTER TABLE ... TYPE ....
Indeed, for example:
postgres=# CREATE TABLE test_tab(a int primary key, b int, c int);
CREATE INDEX test_b_idx ON test_tab(b);
-- Force an index scan on test_b_idx
SELECT * FROM test_tab WHERE b = 2;
CREATE TABLE
CREATE INDEX
a | b | c
---+---+---
(0 rows)
postgres=# select indexrelname, idx_scan from pg_stat_all_indexes where indexrelname in ('test_b_idx', 'test_tab_pkey');
indexrelname | idx_scan
---------------+----------
test_tab_pkey | 0
test_b_idx | 1
(2 rows)
postgres=# select idx_scan from pg_stat_all_tables where relname = 'test_tab';
idx_scan
----------
1
(1 row)
postgres=# ALTER TABLE test_tab ALTER COLUMN b TYPE int;
ALTER TABLE
postgres=# select indexrelname, idx_scan from pg_stat_all_indexes where indexrelname in ('test_b_idx', 'test_tab_pkey');
indexrelname | idx_scan
---------------+----------
test_tab_pkey | 0
test_b_idx | 0
(2 rows)
postgres=# select idx_scan from pg_stat_all_tables where relname = 'test_tab';
idx_scan
----------
0
(1 row)
During ALTER TABLE ... TYPE ... on an indexed column, a new index is created and
the old one is dropped.
As you can see, the index stats (linked to the column that has been altered) are
not preserved. I think that they should be preserved (like a REINDEX does).
Note that the issue is the same if a rewrite is involved (ALTER TABLE test_tab
ALTER COLUMN b TYPE bigint).
PFA a patch for $SUBJECT.
A few remarks:
- We can not use pgstat_copy_relation_stats() because the old index is dropped
before the new one is created, so the patch adds a new PgStat_StatTabEntry
pointer in AlteredTableInfo.
- The stats are saved in ATPostAlterTypeParse() (before the old index is dropped)
and restored in ATExecAddIndex() once the new index is created.
- Note that pending statistics (if any) are not preserved, only the
accumulated stats from previous transactions. I think this is
acceptable since the accumulated stats represent the historical usage patterns we
want to maintain.
- The patch adds a few tests to cover multiple scenarios (with and without
rewrites, and indexes with and without associated constraints).
- I'm not familiar with this area of the code; the patch is an attempt to fix
the issue, and maybe there is a more elegant way to solve it.
- The issue exists back to v13, but I'm not sure that's serious enough for
back-patching.
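The save/restore remarks above can be illustrated with a toy model. This is a minimal sketch, not the patch: `SimStatEntry`, `sim_store` and every `sim_*` function are hypothetical stand-ins for a pgstats entry and the shared stats hash, and pending (not yet flushed) counters are not modelled. It shows why the counters must be copied by value before the drop: the old and new index never exist at the same time, so a helper that copies between two live relations has nothing to copy from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;

/* Hypothetical stand-in for PgStat_StatTabEntry: only a few index counters. */
typedef struct SimStatEntry
{
    int64_t numscans;
    int64_t tuples_returned;
    int64_t tuples_fetched;
} SimStatEntry;

/* Toy "shared stats" store keyed by relation OID (not the real pgstats hash). */
#define SIM_MAX_ENTRIES 16

typedef struct SimStatSlot
{
    bool used;
    Oid relid;
    SimStatEntry stats;
} SimStatSlot;

static SimStatSlot sim_store[SIM_MAX_ENTRIES];

static SimStatEntry *
sim_lookup(Oid relid)
{
    for (int i = 0; i < SIM_MAX_ENTRIES; i++)
        if (sim_store[i].used && sim_store[i].relid == relid)
            return &sim_store[i].stats;
    return NULL;
}

/* A newly created relation starts with zeroed counters. */
static SimStatEntry *
sim_create(Oid relid)
{
    for (int i = 0; i < SIM_MAX_ENTRIES; i++)
    {
        if (!sim_store[i].used)
        {
            sim_store[i].used = true;
            sim_store[i].relid = relid;
            memset(&sim_store[i].stats, 0, sizeof(SimStatEntry));
            return &sim_store[i].stats;
        }
    }
    assert(!"stats store full");
    return NULL;
}

/* Dropping a relation discards its entry. */
static void
sim_drop(Oid relid)
{
    for (int i = 0; i < SIM_MAX_ENTRIES; i++)
        if (sim_store[i].used && sim_store[i].relid == relid)
            sim_store[i].used = false;
}

/*
 * v1-style flow: copy the old index's counters by value while it still exists
 * (the patch does this in ATPostAlterTypeParse()), drop it, build the new
 * index, then write the saved counters into the new entry (ATExecAddIndex()).
 */
static SimStatEntry *
sim_alter_type(Oid old_idx, Oid new_idx)
{
    SimStatEntry saved = {0};
    bool have_saved = false;
    SimStatEntry *old = sim_lookup(old_idx);

    if (old != NULL)
    {
        saved = *old;
        have_saved = true;
    }

    sim_drop(old_idx);
    SimStatEntry *created = sim_create(new_idx);

    if (have_saved)
        *created = saved;
    return created;
}
```

Without the saved copy, the new entry would simply start at zero, which matches the behaviour shown in the psql session above.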
Looking forward to your feedback,
Regards,
[1]: /messages/by-id/ZlGYokUIlERemvpB@ip-10-97-1-34.eu-west-3.compute.internal
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Preserve-index-stats-during-ALTER-TABLE-.-TYPE.patch (text/x-diff, +157/-5)
Hi,
Thanks for raising this issue and for the patch!
As you can see, the index stats (linked to the column that has been altered) are
not preserved. I think that they should be preserved (like a REINDEX does).
I agree.
- We can not use pgstat_copy_relation_stats() because the old index is dropped
before the new one is created, so the patch adds a new PgStat_StatTabEntry
pointer in AlteredTableInfo.
I wonder if it will be good to have a pgstat_save_relation_stats() routine that
gets called in all code paths that will need to restore the stats. This way
pgstat_copy_relation_stats can also be used. This will be cleaner than code
paths that need this having to deal with pgstat_fetch_stat_tabentry?
Have not thought this thoroughly, but it seems like it might be a more general
approach.
- The patch adds a few tests to cover multiple scenarios (with and without
rewrites, and indexes with and without associated constraints).
The current patch does not work for partitioned tables because
the "oldId" is that of the parent index which has no stats. So we
are just copying zeros to the new entry.
```
DROP TABLE test_tab;
CREATE TABLE test_tab(a int primary key, b int, c int) partition by range (a);
CREATE TABLE test_tab_p1 PARTITION OF test_tab
FOR VALUES FROM (0) TO (100);
CREATE TABLE test_tab_p2 PARTITION OF test_tab
FOR VALUES FROM (100) TO (200);
CREATE INDEX test_b_idx ON test_tab(b);
-- Force an index scan on test_b_idx
SELECT * FROM test_tab WHERE b = 2;
test=# select indexrelname, idx_scan from pg_stat_all_indexes where
indexrelname like '%test%';
indexrelname | idx_scan
-------------------+----------
test_tab_p1_pkey | 0
test_tab_p2_pkey | 0
test_tab_p1_b_idx | 1
test_tab_p2_b_idx | 1
(4 rows)
test=# ALTER TABLE test_tab ALTER COLUMN b TYPE int;
ALTER TABLE
test=# select indexrelname, idx_scan from pg_stat_all_indexes where
indexrelname like '%test%';
indexrelname | idx_scan
-------------------+----------
test_tab_p1_pkey | 0
test_tab_p2_pkey | 0
test_tab_p1_b_idx | 0
test_tab_p2_b_idx | 0
(4 rows)
```
Regards,
--
Sami Imseih
Amazon Web Services (AWS)
Hi,
On Fri, Oct 10, 2025 at 07:37:59AM -0500, Sami Imseih wrote:
Hi,
Thanks for raising this issue and for the patch!
Thanks for looking at it!
As you can see, the index stats (linked to the column that has been altered) are
not preserved. I think that they should be preserved (like a REINDEX does).
I agree.
- We can not use pgstat_copy_relation_stats() because the old index is dropped
before the new one is created, so the patch adds a new PgStat_StatTabEntry
pointer in AlteredTableInfo.
I wonder if it will be good to have a pgstat_save_relation_stats() routine that
gets called in all code paths that will need to restore the stats. This way
pgstat_copy_relation_stats can also be used. This will be cleaner than code
paths that need this having to deal with pgstat_fetch_stat_tabentry?
pgstat_copy_relation_stats() takes two Relation arguments, so I'm not sure how a
new pgstat_save_relation_stats() would help with using pgstat_copy_relation_stats()
here.
The current patch does not work for partitioned tables because
the "oldId" is that of the parent index which has no stats. So we
are just copying zeros to the new entry.
Doh, of course. I've spent some time on it and now have something working.
The idea is to:
- store a List of savedIndexStats. The savedIndexStats struct would get the
PgStat_StatTabEntry + all the information needed to be able to use
CompareIndexInfo() when restoring the stats (so that we can restore each PgStat_StatTabEntry
in the right index).
- Iterate on all the indexes and populate this new list in AlteredTableInfo in
ATPostAlterTypeParse().
- Iterate on all the indexes and use the list above and CompareIndexInfo() to
restore the stats in ATExecAddIndex().
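The list-based idea above can be sketched as follows. This is a hedged illustration, not the patch's code: `SavedIndexStats`, `SimStatEntry` and `restore_from_list()` are hypothetical, and a single opaque `definition_key` stands in for all the data CompareIndexInfo() would compare. Each rebuilt index scans the whole saved list, so when this is called once per rebuilt partition index, every call touches every saved element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical, trimmed-down counters (not the real PgStat_StatTabEntry). */
typedef struct SimStatEntry
{
    int64_t numscans;
    int64_t tuples_fetched;
} SimStatEntry;

/*
 * Hypothetical savedIndexStats element. In the patch, the matching data is
 * whatever CompareIndexInfo() needs; here one opaque definition_key stands in
 * for all of it.
 */
typedef struct SavedIndexStats
{
    Oid heap_relid;            /* table or partition owning the old index */
    uint64_t definition_key;   /* stand-in for CompareIndexInfo() inputs */
    SimStatEntry stats;        /* copied before the old index is dropped */
    struct SavedIndexStats *next;
} SavedIndexStats;

/*
 * Restore step for one rebuilt index: scan the whole saved list for an entry
 * with the same owning table and a matching definition. Returns true and
 * fills *dst on a match.
 */
static bool
restore_from_list(const SavedIndexStats *head, Oid heap_relid,
                  uint64_t definition_key, SimStatEntry *dst)
{
    assert(dst != NULL);
    for (const SavedIndexStats *s = head; s != NULL; s = s->next)
    {
        if (s->heap_relid == heap_relid && s->definition_key == definition_key)
        {
            *dst = s->stats;
            return true;
        }
    }
    return false;
}
```

Keying on the owning table matters for partitioned tables: each partition's index must receive its own counters, not the (empty) ones of the parent partitioned index.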
Will polish and share next week.
Regards,
--
Bertrand Drouvot
Hi,
On Fri, Oct 10, 2025 at 03:52:58PM +0000, Bertrand Drouvot wrote:
The idea is to:
- store a List of savedIndexStats. The savedIndexStats struct would get the
PgStat_StatTabEntry + all the information needed to be able to use
CompareIndexInfo() when restoring the stats (so that we can restore each PgStat_StatTabEntry
in the right index).
- Iterate on all the indexes and populate this new list in AlteredTableInfo in
ATPostAlterTypeParse().
- Iterate on all the indexes and use the list above and CompareIndexInfo() to
restore the stats in ATExecAddIndex().
Will polish and share next week.
PFA v2 that handles partitioned tables/indexes.
A few words about its design:
I started by just creating a list of PgStat_StatTabEntry + all the information
needed to be able to use CompareIndexInfo() when restoring the stats.
But that led to O(P^2) work when restoring the stats (for each new partition index
(P), it was scanning through all saved ones (P)), which could be non-negligible.
For example, with 20K partitions and no rewrite:
- 89.64% 0.00% postgres postgres [.] ATController
- ATController
- 79.23% ATRewriteCatalogs
- 64.43% ATExecCmd
- 56.53% ATExecAddIndex
+ 46.34% DefineIndex
+ 10.19% ATExecAddIndex_RestoreStats
+ 5.29% ATExecAlterColumnType
+ 2.60% CommandCounterIncrement
+ 11.91% ATPostAlterTypeCleanup
+ 2.77% relation_open
+ 8.79% ATPrepCmd
+ 1.62% ATRewriteTables
We can see ATExecAddIndex_RestoreStats was not negligible at that time. That was
less of an issue when rewrite was involved:
- 89.35% 0.00% postgres postgres [.] ATController
- ATController
+ 51.24% ATRewriteTables
- 33.89% ATRewriteCatalogs
- 26.98% ATExecCmd
- 22.16% ATExecAddIndex
+ 17.44% DefineIndex
4.71% ATExecAddIndex_RestoreStats
+ 3.58% ATExecAlterColumnType
+ 1.24% CommandCounterIncrement
+ 5.53% ATPostAlterTypeCleanup
+ 1.32% relation_open
+ 4.22% ATPrepCmd
So I added a hash table keyed by partition table OID, with each entry containing
a list of saved index stats for that partition. This way, restoration is now O(P)
instead of O(P^2).
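A hash table keyed by partition OID, as described above, might look roughly like this. This is a minimal sketch under stated assumptions: all type and function names are hypothetical, the table has a fixed bucket count, the index name is the primary match with a stand-in `definition_key` as the CompareIndexInfo()-style sanity check, and memory is never freed (mirroring the reliance on a short-lived memory context). Finding a partition is an expected O(1) probe, after which only that partition's few saved indexes are scanned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Oid;

typedef struct SimStatEntry
{
    int64_t numscans;
    int64_t tuples_fetched;
} SimStatEntry;

/* One saved index of a given partition (hypothetical layout). */
typedef struct SavedIndex
{
    char idxname[64];          /* primary match: index name */
    uint64_t definition_key;   /* sanity check, stand-in for CompareIndexInfo() */
    SimStatEntry stats;
    struct SavedIndex *next;
} SavedIndex;

/* Hash entry: one per partition OID, holding only that partition's indexes. */
typedef struct PartitionEntry
{
    Oid relid;
    SavedIndex *indexes;
    struct PartitionEntry *next;   /* collision chain */
} PartitionEntry;

#define NBUCKETS 1024              /* power of two, fixed for the sketch */

typedef struct SavedStatsHash
{
    PartitionEntry *buckets[NBUCKETS];
} SavedStatsHash;

/* Multiplicative hash; the mask works because NBUCKETS is a power of two. */
static unsigned
bucket_for(Oid relid)
{
    return (unsigned) ((relid * 2654435761u) & (NBUCKETS - 1));
}

static PartitionEntry *
find_partition(SavedStatsHash *h, Oid relid, bool create)
{
    unsigned b = bucket_for(relid);

    for (PartitionEntry *e = h->buckets[b]; e != NULL; e = e->next)
        if (e->relid == relid)
            return e;
    if (!create)
        return NULL;

    PartitionEntry *created = calloc(1, sizeof(*created));
    assert(created != NULL);
    created->relid = relid;
    created->next = h->buckets[b];
    h->buckets[b] = created;
    return created;
}

/* Save step, done while the old indexes still exist. */
static void
save_index_stats(SavedStatsHash *h, Oid partrelid, const char *idxname,
                 uint64_t definition_key, SimStatEntry stats)
{
    PartitionEntry *p = find_partition(h, partrelid, true);
    SavedIndex *s = calloc(1, sizeof(*s));

    assert(s != NULL);
    /* calloc zeroed the buffer, so the copy stays NUL-terminated */
    strncpy(s->idxname, idxname, sizeof(s->idxname) - 1);
    s->definition_key = definition_key;
    s->stats = stats;
    s->next = p->indexes;
    p->indexes = s;
}

/* Restore step for one rebuilt index: hash probe, then a short local scan. */
static bool
restore_index_stats(SavedStatsHash *h, Oid partrelid, const char *idxname,
                    uint64_t definition_key, SimStatEntry *dst)
{
    PartitionEntry *p = find_partition(h, partrelid, false);

    if (p == NULL)
        return false;
    for (SavedIndex *s = p->indexes; s != NULL; s = s->next)
    {
        if (strcmp(s->idxname, idxname) != 0)
            continue;
        if (s->definition_key != definition_key)
            return false;   /* same name, different definition: do not restore */
        *dst = s->stats;
        return true;
    }
    return false;
}
```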
With the attached, the perf profile (again 20K partitions and no rewrite) is:
- 89.06% 0.00% postgres postgres [.] ATController
- ATController
- 77.65% ATRewriteCatalogs
- 61.57% ATExecCmd
- 52.63% ATExecAddIndex
+ 51.16% DefineIndex
+ 1.47% ATExecAddIndex_RestoreStats
+ 5.96% ATExecAlterColumnType
+ 2.98% CommandCounterIncrement
+ 13.26% ATPostAlterTypeCleanup
+ 2.73% relation_open
+ 9.59% ATPrepCmd
+ 1.82% ATRewriteTables
As we can see, the ATExecAddIndex_RestoreStats impact is now around 1.5%, which
I think is acceptable given the benefit of preserving historical statistics.
Additional remarks:
- I initially tried using only CompareIndexInfo() for matching, but this fails
when multiple indexes exist on the same column(s). So I added the index name as
the primary matching check with CompareIndexInfo() kept as a sanity check (I think
that it could be removed).
- The new resources are allocated in PortalContext; it's a short-lived one, so
the patch does not free them explicitly.
- Many more tests have been added compared to v1.
Regards,
--
Bertrand Drouvot
Attachments:
v2-0001-Preserve-index-stats-during-ALTER-TABLE-.-TYPE.patch (text/x-diff, +576/-5)
On Fri, Oct 10, 2025 at 07:37:59AM -0500, Sami Imseih wrote:
As you can see, the index stats (linked to the column that has been altered) are
not preserved. I think that they should be preserved (like a REINDEX does).
I agree.
Hmm. Why should it be always OK to preserve the stats of an index
when one of its attributes is changed so as a relation is rewritten?
A REINDEX (including CONCURRENTLY), while it initiates a rewrite of
the index, does not change the definition of the underlying index. A
type alteration, on the contrary, does. Hence, the planner may decide
to treat a given index differently (doesn't it? Tuple width or
whole-row references come into mind). Keeping the past stats may
actually lead to confusing conclusions when overlapping them with some
of the new number generated under the new type? Could there be more
benefits in always resetting them as we do now?
Any thoughts from others?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Fri, Oct 10, 2025 at 07:37:59AM -0500, Sami Imseih wrote:
As you can see, the index stats (linked to the column that has been altered) are
not preserved. I think that they should be preserved (like a REINDEX does).
Hmm. Why should it be always OK to preserve the stats of an index
when one of its attributes is changed so as a relation is rewritten?
Right offhand, this proposal seems utterly unsafe, to the point of
maybe introducing security-grade bugs. I see that the patch compares
opfamilies but that seems insufficient, since "same opfamily" does not
mean "binary compatible". We could easily be restoring stats whose
binary content is incompatible with the new column type.
regards, tom lane
On Thu, Oct 16, 2025 at 01:38:19AM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Hmm. Why should it be always OK to preserve the stats of an index
when one of its attributes is changed so as a relation is rewritten?
Right offhand, this proposal seems utterly unsafe, to the point of
maybe introducing security-grade bugs. I see that the patch compares
opfamilies but that seems insufficient, since "same opfamily" does not
mean "binary compatible". We could easily be restoring stats whose
binary content is incompatible with the new column type.
The point of the thread is about copying the aggregated numbers stored
in pgstats. These numbers have a fixed size, for contents in
PgStat_StatTabEntry. The point of the patch is about copying these
entries in the pgstats hash table across rewrites, so I am not sure I
follow your argument.
My point was slightly different: I am questioning whether a reset makes
more sense in most cases, as an attribute type change may cause the
planner to choose a different Path; the newly generated stats would then
reflect decisions that are inconsistent with the numbers copied across
the rewrite once both are aggregated.
--
Michael
Hi,
On Thu, Oct 16, 2025 at 02:06:01PM +0900, Michael Paquier wrote:
On Fri, Oct 10, 2025 at 07:37:59AM -0500, Sami Imseih wrote:
As you can see, the index stats (linked to the column that has been altered) are
not preserved. I think that they should be preserved (like a REINDEX does).
I agree.
Hmm. Why should it be always OK to preserve the stats of an index
when one of its attributes is changed so as a relation is rewritten?
I agree that in this case the stats (namely idx_scan, idx_tup_read and
idx_tup_fetch) would represent a mixture of two different index structures.
Hence, the planner may decide
to treat a given index differently (doesn't it? Tuple width or
whole-row references come into mind).
I do think so, yes.
Keeping the past stats may
actually lead to confusing conclusions when overlapping them with some
of the new number generated under the new type? Could there be more
benefits in always resetting them as we do now?
The issue is that these stats are also exposed at the table level
(idx_scan, last_idx_scan, idx_tup_fetch in pg_stat_all_tables).
That's valuable information for understanding table access patterns
that is currently lost.
It would make more sense to reset the index stats if table level
stats were tracked independently from the underlying index stats.
Also, users already have pg_stat_reset_single_table_counters() if
they want to reset the index stats. This patch gives users the choice to preserve
stats or reset them. Currently, they have no choice: the stats are
always lost.
Also, when the table itself is rewritten (type change), a stat like
seq_scan is preserved (because the table OID does not change, only the
relfilenode does). Why would it be OK to preserve seq_scan and not idx_scan?
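The seq_scan/idx_scan asymmetry comes down to what the stats are keyed on. The toy model below is a hedged sketch: `SimRelation` and its helpers are hypothetical, and it only assumes, as stated above, that relation stats follow the relation OID rather than the relfilenode. A table rewrite changes storage but keeps the OID; rebuilding an index for ALTER ... TYPE creates a new relation with a new OID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical model: the stats key is the OID; storage can change under it. */
typedef struct SimRelation
{
    Oid oid;           /* catalog identity, used as the stats key */
    Oid relfilenode;   /* physical storage, changes on rewrite */
} SimRelation;

/* Table rewrite: new storage, same OID, so counters like seq_scan survive. */
static void
rewrite_table(SimRelation *rel, Oid new_relfilenode)
{
    rel->relfilenode = new_relfilenode;
}

/* Index rebuild for ALTER ... TYPE: a brand-new relation with a new OID. */
static SimRelation
rebuild_index(Oid new_oid, Oid new_relfilenode)
{
    SimRelation fresh = {new_oid, new_relfilenode};
    return fresh;
}

/* Two relations share stats only if they share the stats key. */
static bool
same_stats_key(const SimRelation *a, const SimRelation *b)
{
    return a->oid == b->oid;
}
```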
Regards,
--
Bertrand Drouvot
Hi,
On Thu, Oct 16, 2025 at 03:09:24PM +0900, Michael Paquier wrote:
On Thu, Oct 16, 2025 at 01:38:19AM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Hmm. Why should it be always OK to preserve the stats of an index
when one of its attributes is changed so as a relation is rewritten?
Right offhand, this proposal seems utterly unsafe, to the point of
maybe introducing security-grade bugs. I see that the patch compares
opfamilies but that seems insufficient, since "same opfamily" does not
mean "binary compatible". We could easily be restoring stats whose
binary content is incompatible with the new column type.
The point of the thread is about copying the aggregated numbers stored
in pgstats. These numbers have a fixed size, for contents in
PgStat_StatTabEntry. The point of the patch is about copying these
entries in the pgstats hash table across rewrites, so I am not sure I
follow your argument.
Same here.
My point was slightly different: I am questioning whether a reset makes
more sense in most cases, as an attribute type change may cause the
planner to choose a different Path; the newly generated stats would then
reflect decisions that are inconsistent with the numbers copied across
the rewrite once both are aggregated.
See my reply in [1].
[1]: /messages/by-id/aPCVvWZjvvC1ZO78@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
Hence, the planner may decide
to treat a given index differently (doesn't it? Tuple width or
whole-row references come into mind).
I do think so, yes.
The planner may also treat the index differently after
ALTER INDEX ... ALTER COLUMN ... SET STATISTICS ...; ANALYZE,
but in
I am not sure the planner aspect is a good reason to not preserve
cumulative stats for an index.
In the case where the table is not rewritten, isn't that a clear case in
which stats should be preserved?
Keeping the past stats may
actually lead to confusing conclusions when overlapping them with some
of the new number generated under the new type? Could there be more
benefits in always resetting them as we do now?
The issue is that these stats are also exposed at the table level
(idx_scan, last_idx_scan, idx_tup_fetch in pg_stat_all_tables).
That's valuable information for understanding table access patterns
that is currently lost.
It would make more sense to reset the index stats if table level
stats were tracked independently from the underlying index stats.
This sounds like a good enhancement. This will also take care of the
index stats being preserved on a table in the case an index is dropped.
But that means we will need some new fields to aggregate index access
in PgStat_StatTabEntry, which may not be so good in
terms of memory and performance.
--
Sami
Hi,
On Thu, Oct 16, 2025 at 03:39:59PM -0500, Sami Imseih wrote:
The issue is that these stats are also exposed at the table level
(idx_scan, last_idx_scan, idx_tup_fetch in pg_stat_all_tables).
That's valuable information for understanding table access patterns
that is currently lost.
It would make more sense to reset the index stats if table level
stats were tracked independently from the underlying index stats.
This sounds like a good enhancement. This will also take care of the
index stats being preserved on a table in the case an index is dropped.
But that means we will need some new fields to aggregate index access
in PgStat_StatTabEntry,
Yeah, we'd need to add, say:
- total_idx_numscans
- idx_lastscan
- total_tuples_idx_fetched
to get rid of the pg_stat_get_*() calls on the indexes in the pg_stat_all_tables
view.
That way we don't need to worry about copying the statistics during the alter
command.
which may not be so good in
terms of memory and performance.
Performance:
We could populate those fields at the "table" level when we flush the
index stats (similar to what we currently do for some table stats that populate
some database stats at flush time). That would avoid double incrementing.
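The flush-time aggregation described above could work along these lines. This is a minimal sketch, assuming hypothetical trimmed structs (only the three table-level field names come from this thread; `SimIndexEntry`, `SimIndexPending`, `flush_index_pending()` and the use of `time_t` for timestamps are illustrative). The executor increments only the per-index pending counters; at flush time the same pending values feed both the index entry and its parent table's aggregate, so nothing is incremented twice, and dropping and rebuilding an index leaves the table-level totals intact.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical shared entries; only the fields needed for the sketch. */
typedef struct SimIndexEntry
{
    int64_t numscans;
    int64_t tuples_fetched;
    time_t  lastscan;
} SimIndexEntry;

typedef struct SimTableEntry
{
    int64_t total_idx_numscans;
    int64_t total_tuples_idx_fetched;
    time_t  idx_lastscan;
} SimTableEntry;

/* Per-backend pending counters for one index, bumped during execution. */
typedef struct SimIndexPending
{
    int64_t numscans;
    int64_t tuples_fetched;
} SimIndexPending;

/*
 * Flush one index's pending counters into its own entry and, from the same
 * values, into the parent table's aggregate; then clear the pending counters.
 */
static void
flush_index_pending(SimIndexPending *pending, SimIndexEntry *idx,
                    SimTableEntry *parent, time_t now)
{
    if (pending->numscans == 0 && pending->tuples_fetched == 0)
        return;

    idx->numscans += pending->numscans;
    idx->tuples_fetched += pending->tuples_fetched;
    parent->total_idx_numscans += pending->numscans;
    parent->total_tuples_idx_fetched += pending->tuples_fetched;

    if (pending->numscans > 0)
    {
        idx->lastscan = now;
        if (now > parent->idx_lastscan)
            parent->idx_lastscan = now;
    }

    pending->numscans = 0;
    pending->tuples_fetched = 0;
}
```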
Memory:
Adding those 3 extra fields to PgStat_StatTabEntry does not worry me that
much given the number of fields already in PgStat_StatTabEntry.
The thing that is not ideal is that, as PgStat_StatTabEntry is currently used
for both table and index stats, we'll add fields that would only be used
for the table case. But that's already the case for some other fields, and this
will be "solved" once we resume working on "Split index and table statistics
into different types of stats" ([1]), that is, after relfilenode stats ([2]) are
implemented (I'm currently working on it).
I prefer this approach as compared to the current proposal (copying the stats
during the alter command). Thoughts?
[1]: /messages/by-id/f572abe7-a1bb-e13b-48c7-2ca150546822@gmail.com
[2]: /messages/by-id/ZlGYokUIlERemvpB@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
On Thu, Oct 16, 2025 at 03:39:59PM -0500, Sami Imseih wrote:
This sounds like a good enhancement. This will also take care of the
index stats being preserved on a table in the case an index is dropped.
But that means we will need some new fields to aggregate index access
in PgStat_StatTabEntry, which may not be so good in
terms of memory and performance.
Putting aside the should-we-preserve-index-stats-on-relation-rewrite
problem for a minute.
FWIW, I think that aiming at less memory per entry is better in the
long term, because we are sure that it's going to be cheaper. One thing
that's been itching me quite a bit with pgstat_relation.c lately is
that PgStat_StatTabEntry is being used by both tables and indexes, but
we don't care about most of its fields for indexes. The ones I
can see as used for indexes are:
- blocks_hit
- blocks_fetched
- reset_time
- tuples_returned
- tuples_fetched
- lastscan
- numscan
This means that we don't care about the business around HOT, vacuum
(we could care about the vacuum timings for individual index
cleanups), analyze, live/dead tuples.
It may be time to do a clean split, even if the current state of
business in pgstat.h is a kind of historical thing.
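A clean split along the lines of the field list above could look roughly like this. This is a hedged sketch only: `SimTabEntry` is a hypothetical, heavily trimmed model (the real PgStat_StatTabEntry has many more fields), `SimIndexEntry` keeps just the seven fields listed above, and `index_view_of()` is an illustrative projection, not an existing API. The point is simply that an index-only entry carries none of the HOT, live/dead tuple, vacuum or analyze fields.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t SimCounter;     /* stand-in for PgStat_Counter */
typedef int64_t SimTimestamp;   /* stand-in for TimestampTz */

/* Hypothetical trimmed model of today's shared entry. */
typedef struct SimTabEntry
{
    /* used by both tables and indexes */
    SimCounter numscans;
    SimTimestamp lastscan;
    SimCounter tuples_returned;
    SimCounter tuples_fetched;
    SimCounter blocks_fetched;
    SimCounter blocks_hit;
    SimTimestamp reset_time;
    /* table-only: HOT, live/dead tuples, vacuum, analyze */
    SimCounter tuples_hot_updated;
    SimCounter live_tuples;
    SimCounter dead_tuples;
    SimCounter vacuum_count;
    SimCounter analyze_count;
} SimTabEntry;

/* Hypothetical index-only entry after a split: just the seven fields. */
typedef struct SimIndexEntry
{
    SimCounter numscans;
    SimTimestamp lastscan;
    SimCounter tuples_returned;
    SimCounter tuples_fetched;
    SimCounter blocks_fetched;
    SimCounter blocks_hit;
    SimTimestamp reset_time;
} SimIndexEntry;

static_assert(sizeof(SimIndexEntry) < sizeof(SimTabEntry),
              "index-only entry should be smaller");

/* Project a combined entry onto the index-only layout. */
static SimIndexEntry
index_view_of(const SimTabEntry *t)
{
    SimIndexEntry i = {
        .numscans = t->numscans,
        .lastscan = t->lastscan,
        .tuples_returned = t->tuples_returned,
        .tuples_fetched = t->tuples_fetched,
        .blocks_fetched = t->blocks_fetched,
        .blocks_hit = t->blocks_hit,
        .reset_time = t->reset_time,
    };
    return i;
}
```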
--
Michael
Hi,
On Mon, Oct 20, 2025 at 10:53:37AM +0900, Michael Paquier wrote:
On Thu, Oct 16, 2025 at 03:39:59PM -0500, Sami Imseih wrote:
This sounds like a good enhancement. This will also take care of the
index stats being preserved on a table in the case an index is dropped.
But that means we will need some new fields to aggregate index access
in PgStat_StatTabEntry, which may not be so good in
terms of memory and performance.
Putting aside the should-we-preserve-index-stats-on-relation-rewrite
problem for a minute.
Okay.
FWIW, I think that aiming at less memory per entry is better in the
long term, because we are that it's going to be cheaper. One thing
that's been itching me quite a bit with pgstat_relation.c lately is
that PgStat_StatTabEntry is being used by both tables and indexes, but
we don't care about the most of its fields for indexes. The ones I
can see as used for indexes are:
- blocks_hit
- blocks_fetched
- reset_time
- tuples_returned
- tuples_fetched
- lastscan
- numscan
This means that we don't care about the business around HOT, vacuum
(we could care about the vacuum timings for individual index
cleanups), analyze, live/dead tuples.
Exactly, and that's one of the reasons why the "Split index and table statistics
into different types of stats" work ([1]) started.
It may be time to do a clean split, even if the current state of
business in pgstat.h is a kind of historical thing.
Yeah, but maybe it would make more sense to look at this once the relfilenode
stats one ([2]) is done? (see [3]).
[1]: /messages/by-id/f572abe7-a1bb-e13b-48c7-2ca150546822@gmail.com
[2]: /messages/by-id/ZlGYokUIlERemvpB@ip-10-97-1-34.eu-west-3.compute.internal
[3]: /messages/by-id/20230105002733.ealhzubjaiqis6ua@awork3.anarazel.de
Regards,
--
Bertrand Drouvot
On Mon, Oct 20, 2025 at 06:22:00AM +0000, Bertrand Drouvot wrote:
On Mon, Oct 20, 2025 at 10:53:37AM +0900, Michael Paquier wrote:
It may be time to do a clean split, even if the current state of
business in pgstat.h is a kind of historical thing.
Yeah, but maybe it would make more sense to look at this once the relfilenode
stats one ([2]) is done? (see [3]).
Ah, right, that rings a bell now. So as you mention the history of
events is that the refactoring related to relfilenodes should happen
first. Maybe we should just focus on that for now, then. TBH, I
cannot get excited, for the moment, about making tablecmds.c more complex
regarding its stats handling on rewrite without knowing whether it could
actually become simpler. This is also assuming that we actually do
something about it, at the end, which is not something I am sure is
worth the extra complications in ALTER TABLE. And perhaps we could
get some nice side effects of the other discussion for what you are
proposing (first answer points to no, but it's hard to say as well if
that would be a definitive answer).
--
Michael