pg_restore handles extended statistics inconsistently with statistics data

Started by Chao Li13 days ago15 messageshackers
Jump to latest
#1Chao Li
li.evan.chao@gmail.com

Hi,

While testing “[c32fb29e9] Include extended statistics data in pg_dump”, I noticed that pg_dump and pg_restore behave asymmetrically for --statistics-only when a schema is specified.

This command dumps relation stats, attribute stats, and extended stats from the schema:
```
pg_dump --statistics-only -n s1 -f /tmp/plain-s1-stats.sql stats_src
```

However, this command will only restore extended stats for the schema:
```
pg_restore --statistics-only -n s1 -f /tmp/archive-s1-stats.sql /tmp/stats.dump
```

I tried the same test against 1ea44d7ddfb, the immediate predecessor of c32fb29e9. pg_dump dumped relation stats and attribute stats, while pg_restore restored nothing. So the asymmetric behavior for stats already existed. c32fb29e9 then added extended stats to both pg_dump and pg_restore, but the new EXTENDED STATISTICS DATA entries are handled differently from STATISTICS DATA during selective pg_restore, making the inconsistency visible.

The asymmetric behavior was not introduced by c32fb29e9, so I think we probably should not change that for v19. If it's confirmed that this needs to be fixed and nobody else plans to work on it, I would be happy to add it to my TODO list for v20.

For v19, I wonder if we should also exclude extended stats from selective pg_restore in the same way as other stats, so that EXTENDED STATISTICS DATA is treated consistently with STATISTICS DATA. That only requires a one-line change, see the attached exclude_ext_stats.diff. Actually, I suspect c32fb29e9 simply missed that change.

Also, attached test_pgdump_stats.sh is the script I ran to verify the problem.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

exclude_ext_stats.diffapplication/octet-stream; name=exclude_ext_stats.diff; x-unix-mode=0644Download+1-0
test_pgdump_stats.shapplication/octet-stream; name=test_pgdump_stats.sh; x-unix-mode=0755Download
#2Michael Paquier
michael@paquier.xyz
In reply to: Chao Li (#1)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Thu, Jun 11, 2026 at 11:31:26AM +0800, Chao Li wrote:

I tried the same test against 1ea44d7ddfb, the immediate predecessor
of c32fb29e9. pg_dump dumped relation stats and attribute stats,
while pg_restore restored nothing. So the asymmetric behavior for
stats already existed. c32fb29e9 then added extended stats to both
pg_dump and pg_restore, but the new EXTENDED STATISTICS DATA entries
are handled differently from STATISTICS DATA during selective
pg_restore, making the inconsistency visible.

The asymmetric behavior was not introduced by c32fb29e9, so I think
we probably should not change that for v19. If it's confirmed that
this needs to be fixed and nobody else plans to work on it, I would
be happy to add it to my TODO list for v20.

FWIW, I'm going to disagree with your argument, as I find the behavior
of v18 really weird. I would have assumed that the pg_restore
--statistics-only should restore all the stats in the schema without
the objects in the schema, relation and attribute stats (+extended,
only applies with v19), for all the objects in the schema. If you
want only the schema definition and not the objects, we already have
-s for the job.

In your example, the dump in custom format with --statistics looks
right to me: object definitions and stats. pg_dump -Fc
--statistics-only also looks right: only the stats, no objects. The
restore part is bumpy.

So I'd like to think that the behavior of the relation and attribute
stats is wrong in v18 and v19, and that the behavior of extended stats
is actually the right one in v19. Why should custom and plain formats
differ when filtering with a --schema and --statistics-only?

At the end, it seems to me that the right thing to do is the patch
attached, to-be-backpatched down to v18. check-world passes with this
patch, so we have never tested really this path, I guess? I could see
myself adding a scenario in 003, at least.

Jeff or Corey, could you comment please?
--
Michael

Attachments:

0001-Fix-pg_restore-with-schema-and-statistics-only.patchtext/plain; charset=us-asciiDownload+0-2
#3Chao Li
li.evan.chao@gmail.com
In reply to: Michael Paquier (#2)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Jun 12, 2026, at 14:05, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jun 11, 2026 at 11:31:26AM +0800, Chao Li wrote:

I tried the same test against 1ea44d7ddfb, the immediate predecessor
of c32fb29e9. pg_dump dumped relation stats and attribute stats,
while pg_restore restored nothing. So the asymmetric behavior for
stats already existed. c32fb29e9 then added extended stats to both
pg_dump and pg_restore, but the new EXTENDED STATISTICS DATA entries
are handled differently from STATISTICS DATA during selective
pg_restore, making the inconsistency visible.

The asymmetric behavior was not introduced by c32fb29e9, so I think
we probably should not change that for v19. If it's confirmed that
this needs to be fixed and nobody else plans to work on it, I would
be happy to add it to my TODO list for v20.

FWIW, I'm going to disagree with your argument, as I find the behavior
of v18 really weird.

Yeah, I had the same feeling.

I would have assumed that the pg_restore
--statistics-only should restore all the stats in the schema without
the objects in the schema, relation and attribute stats (+extended,
only applies with v19), for all the objects in the schema. If you
want only the schema definition and not the objects, we already have
-s for the job.

In your example, the dump in custom format with --statistics looks
right to me: object definitions and stats. pg_dump -Fc
--statistics-only also looks right: only the stats, no objects. The
restore part is bumpy.

So I'd like to think that the behavior of the relation and attribute
stats is wrong in v18 and v19, and that the behavior of extended stats
is actually the right one in v19. Why should custom and plain formats
differ when filtering with a --schema and --statistics-only?

At the end, it seems to me that the right thing to do is the patch
attached, to-be-backpatched down to v18.

Totally agreed. Making pg_dump and pg_restore behave consistently also feels like the right direction to me.

I was just not sure if we should do that now or for v20, as we are supposed to fix v19-only issues at the current stage. I didn’t verify that on v18.

check-world passes with this
patch, so we have never tested really this path, I guess? I could see
myself adding a scenario in 003, at least.

Jeff or Corey, could you comment please?
--
Michael
<0001-Fix-pg_restore-with-schema-and-statistics-only.patch>

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#4Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#3)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Jun 12, 2026, at 14:56, Chao Li <li.evan.chao@gmail.com> wrote:

On Jun 12, 2026, at 14:05, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jun 11, 2026 at 11:31:26AM +0800, Chao Li wrote:

I tried the same test against 1ea44d7ddfb, the immediate predecessor
of c32fb29e9. pg_dump dumped relation stats and attribute stats,
while pg_restore restored nothing. So the asymmetric behavior for
stats already existed. c32fb29e9 then added extended stats to both
pg_dump and pg_restore, but the new EXTENDED STATISTICS DATA entries
are handled differently from STATISTICS DATA during selective
pg_restore, making the inconsistency visible.

The asymmetric behavior was not introduced by c32fb29e9, so I think
we probably should not change that for v19. If it's confirmed that
this needs to be fixed and nobody else plans to work on it, I would
be happy to add it to my TODO list for v20.

FWIW, I'm going to disagree with your argument, as I find the behavior
of v18 really weird.

Yeah, I had the same feeling.

I would have assumed that the pg_restore
--statistics-only should restore all the stats in the schema without
the objects in the schema, relation and attribute stats (+extended,
only applies with v19), for all the objects in the schema. If you
want only the schema definition and not the objects, we already have
-s for the job.

In your example, the dump in custom format with --statistics looks
right to me: object definitions and stats. pg_dump -Fc
--statistics-only also looks right: only the stats, no objects. The
restore part is bumpy.

So I'd like to think that the behavior of the relation and attribute
stats is wrong in v18 and v19, and that the behavior of extended stats
is actually the right one in v19. Why should custom and plain formats
differ when filtering with a --schema and --statistics-only?

At the end, it seems to me that the right thing to do is the patch
attached, to-be-backpatched down to v18.

Totally agreed. Making pg_dump and pg_restore behave consistently also feels like the right direction to me.

I was just not sure if we should do that now or for v20, as we are supposed to fix v19-only issues at the current stage. I didn’t verify that on v18.

check-world passes with this
patch, so we have never tested really this path, I guess? I could see
myself adding a scenario in 003, at least.

Jeff or Corey, could you comment please?
--
Michael
<0001-Fix-pg_restore-with-schema-and-statistics-only.patch>

If the fix direction is to make pg_restore behave consistently with pg_dump, then I think Michael's change is correct.

I tried to add a test for this. Without the fix, the test fails as below:
```
# +++ tap check in src/bin/pg_dump +++
t/002_pg_dump.pl .. 9710/?
# Failed test 'statistics_only_dump_test_schema_restore: should dump relstats_on_unanalyzed_tables'
# at t/002_pg_dump.pl line 5339.
# '--
```

See the attached v2 for details.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v2-0001-Fix-pg_restore-with-schema-and-statistics-only.patchapplication/octet-stream; name=v2-0001-Fix-pg_restore-with-schema-and-statistics-only.patch; x-unix-mode=0644Download+21-2
#5Michael Paquier
michael@paquier.xyz
In reply to: Chao Li (#4)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Mon, Jun 15, 2026 at 11:51:40AM +0800, Chao Li wrote:

If the fix direction is to make pg_restore behave consistently with
pg_dump, then I think Michael's change is correct.

I have been checking for a few hours, and well.. There was more.

See the attached v2 for details.

That's nice, unfortunately incomplete. I have found a different
pattern that behaves incorrectly, reusing your script of upthread:
pg_restore --statistics --table t \
-f /tmp/archive-s1-stats.sql /tmp/stats.dump
pg_restore --statistics-only --table t \
-f /tmp/archive-s1-stats.sql /tmp/stats.dump

With "--statistics --table t", we restore the definition of table "t"
and its data (correct), miss the stats (incorrect!). With
"--statistics-only --table t", we restore no definition and no data
(correct), and still miss the stats (incorrect!). So we still have a
borked restore of the stats when selecting an individual table. We
don't care about extended stats with this case, as extstats are
objects defined at schema level and we cannot filter them, but I think
that we definitely care about attribute and relation stats here.

This case requires a second change in _tocEntryRequired() even after
the first fix so as it is possible to select a STATISTICS DATA that
depends on a table part of the authorized list close to the list that
includes the "SEQUENCE SET".

With all that in mind, I have the attached. I have expanded the tests
with the --table case, using dump and restore commands, and that seems
to work correctly now with the restores, including the cases with
extstats.

And I'm planning to apply that down to v18 tomorrow, after a second
round of lookups.
--
Michael

Attachments:

v3-0001-Fix-pg_restore-with-schema-table-and-statistics-o.patchtext/plain; charset=us-asciiDownload+42-4
#6Chao Li
li.evan.chao@gmail.com
In reply to: Michael Paquier (#5)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Jun 15, 2026, at 15:55, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jun 15, 2026 at 11:51:40AM +0800, Chao Li wrote:

If the fix direction is to make pg_restore behave consistently with
pg_dump, then I think Michael's change is correct.

I have been checking for a few hours, and well.. There was more.

See the attached v2 for details.

That's nice, unfortunately incomplete. I have found a different
pattern that behaves incorrectly, reusing your script of upthread:
pg_restore --statistics --table t \
-f /tmp/archive-s1-stats.sql /tmp/stats.dump
pg_restore --statistics-only --table t \
-f /tmp/archive-s1-stats.sql /tmp/stats.dump

With "--statistics --table t", we restore the definition of table "t"
and its data (correct), miss the stats (incorrect!). With
"--statistics-only --table t", we restore no definition and no data
(correct), and still miss the stats (incorrect!). So we still have a
borked restore of the stats when selecting an individual table. We
don't care about extended stats with this case, as extstats are
objects defined at schema level and we cannot filter them, but I think
that we definitely care about attribute and relation stats here.

This case requires a second change in _tocEntryRequired() even after
the first fix so as it is possible to select a STATISTICS DATA that
depends on a table part of the authorized list close to the list that
includes the "SEQUENCE SET".

With all that in mind, I have the attached. I have expanded the tests
with the --table case, using dump and restore commands, and that seems
to work correctly now with the restores, including the cases with
extstats.

And I'm planning to apply that down to v18 tomorrow, after a second
round of lookups.
--
Michael
<v3-0001-Fix-pg_restore-with-schema-table-and-statistics-o.patch>

Ah, I missed the --table case. Your finding made me think about the --index case as well. It looks like --index still misses stats with v3 applied. I created two additional test scripts for the table and index cases, see the attached .sh files.

With v3, the table case works:
```
% ./test_pgdump_stats_table.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
242; 1259 16609 TABLE s1 t chaol
3930; 0 16609 TABLE DATA s1 t chaol
3937; 0 0 STATISTICS DATA s1 t
3938; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics --table t:
CREATE TABLE s1.t (
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
pg_restore --statistics-only --table t:
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
```

But for the index case, stats are still missing:
```
% ./test_pgdump_stats_index.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
3937; 0 0 STATISTICS DATA s1 t
3764; 1259 16618 INDEX s1 idx_expr chaol
3938; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics --index idx_expr:
CREATE INDEX idx_expr ON s1.t USING btree (((a + 1)));
pg_restore --statistics-only --index idx_expr:
```

We cannot just add a strcmp(te->desc, "STATISTICS DATA") == 0 check to the "else if (strcmp(te->desc, "INDEX") == 0)" branch, because STATISTICS DATA would already have matched the earlier table branch. So in v4, I pulled STATISTICS DATA into its own branch before the table and index branches.

With v4, stats for the index are restored:
```
% ./test_pgdump_stats_index.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
3937; 0 0 STATISTICS DATA s1 t
3764; 1259 16624 INDEX s1 idx_expr chaol
3938; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics --index idx_expr:
CREATE INDEX idx_expr ON s1.t USING btree (((a + 1)));
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
pg_restore --statistics-only --index idx_expr:
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

test_pgdump_stats_index.shapplication/octet-stream; name=test_pgdump_stats_index.sh; x-unix-mode=0755Download
test_pgdump_stats_table.shapplication/octet-stream; name=test_pgdump_stats_table.sh; x-unix-mode=0755Download
v4-0001-Fix-pg_restore-with-schema-table-and-statistics-o.patchapplication/octet-stream; name=v4-0001-Fix-pg_restore-with-schema-table-and-statistics-o.patch; x-unix-mode=0644Download+83-11
#7Corey Huinker
corey.huinker@gmail.com
In reply to: Chao Li (#6)
Re: pg_restore handles extended statistics inconsistently with statistics data

We cannot just add a strcmp(te->desc, "STATISTICS DATA") == 0 check to the
"else if (strcmp(te->desc, "INDEX") == 0)" branch, because STATISTICS DATA
would already have matched the earlier table branch. So in v4, I pulled
STATISTICS DATA into its own branch before the table and index branches.

v4 is looking good, though I'm a bit frustrated that that `pg_dump -s -t
s1.t` will include the index creations but not not the extended stats
objects. Feels like an oversight.

#8Michael Paquier
michael@paquier.xyz
In reply to: Corey Huinker (#7)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Mon, Jun 15, 2026 at 03:25:39PM -0400, Corey Huinker wrote:

We cannot just add a strcmp(te->desc, "STATISTICS DATA") == 0 check to the
"else if (strcmp(te->desc, "INDEX") == 0)" branch, because STATISTICS DATA
would already have matched the earlier table branch. So in v4, I pulled
STATISTICS DATA into its own branch before the table and index branches.

v4 is looking good, though I'm a bit frustrated that that `pg_dump -s -t
s1.t` will include the index creations but not not the extended stats
objects. Feels like an oversight.

That's the behavior since v15. Contrary to you, the behavior of
specifying a table in pg_dump and pg_restore but not including a stats
definition makes more sense here. The reasoning is that a stats
object may live on a different schema than the table it is defined on,
so you cannot guarantee that its schema has been created if you only
create a table on a schema.
--
Michael

#9Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#8)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Tue, Jun 16, 2026 at 07:15:51AM +0900, Michael Paquier wrote:

That's the behavior since v15. Contrary to you, the behavior of
specifying a table in pg_dump and pg_restore but not including a stats
definition makes more sense here. The reasoning is that a stats
object may live on a different schema than the table it is defined on,
so you cannot guarantee that its schema has been created if you only
create a table on a schema.

Sharing the check for a STATISTICS DATA TOC entry on table and index
names was making me ticking a bit, as this is not entirely
collision-proof for the names, but it also looks like we do things the
same way with TABLE DATA and INDEX, so.. At the end, applied down to
v18 as suggested.
--
Michael

#10Corey Huinker
corey.huinker@gmail.com
In reply to: Michael Paquier (#8)
Re: pg_restore handles extended statistics inconsistently with statistics data

The reasoning is that a stats
object may live on a different schema than the table it is defined on,
so you cannot guarantee that its schema has been created if you only
create a table on a schema.

Solid reasoning, though unfortunate.

#11Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#9)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Tue, Jun 16, 2026 at 11:07:22AM +0900, Michael Paquier wrote:

Sharing the check for a STATISTICS DATA TOC entry on table and index
names was making me ticking a bit, as this is not entirely
collision-proof for the names, but it also looks like we do things the
same way with TABLE DATA and INDEX, so.. At the end, applied down to
v18 as suggested.

And I am having second thoughts on this one. Take for example this
case:
CREATE SCHEMA s1;
CREATE SCHEMA s2;
CREATE TABLE s1.foo (id int);
INSERT INTO s1.foo SELECT generate_series(1,100);
ANALYZE s1.foo;
CREATE TABLE s2.bar (id int);
CREATE INDEX foo ON s2.bar(id);
INSERT INTO s2.bar SELECT generate_series(1,100);
ANALYZE s2.bar;

And then this:
pg_dump --statistics -Fc -f stats.dump mydb
pg_restore --statistics-only --index=foo -f stats_foo.sql stats.dump

On HEAD, we get relation and attribute we should not in stats_foo.sql,
getting also some data from the table s1.foo. With the patch
attached, that strengthens the name check based on the type of the
depending TOC entries, we only get the relation stats of s2.foo,
nothing about the table s1.foo. This feels too funky to write a test
for, wasting cycles compared to the existing coverage.

pg_restore --index is as old as e8f69be054e9, so it's not like we
could just remove it, but I'd say that with the schema-level restore
this would be tempting.

Anyway, let's improve this situation with the attached, for HEAD and
v18.
--
Michael

Attachments:

0001-Use-dependency-based-matching-for-STATISTICS-DATA-in.patchtext/plain; charset=us-asciiDownload+29-12
#12Corey Huinker
corey.huinker@gmail.com
In reply to: Michael Paquier (#11)
Re: pg_restore handles extended statistics inconsistently with statistics data

pg_restore --index is as old as e8f69be054e9, so it's not like we
could just remove it, but I'd say that with the schema-level restore
this would be tempting.

That temptation tells me that there is no appetite for a similar flag for
named extended stats objects, which makes me wonder how somebody would get
all of the extended stats objects and the pg_restore_extended_stats() calls
for a given table, aside from manually filtering the output of a --no-data
--statistics dump. It's not a very common use case right now, but I suspect
the cases for it will increase in the future.

Anyway, let's improve this situation with the attached, for HEAD and
v18.

+1 to this.

#13Chao Li
li.evan.chao@gmail.com
In reply to: Michael Paquier (#11)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Jun 16, 2026, at 11:23, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jun 16, 2026 at 11:07:22AM +0900, Michael Paquier wrote:

Sharing the check for a STATISTICS DATA TOC entry on table and index
names was making me ticking a bit, as this is not entirely
collision-proof for the names, but it also looks like we do things the
same way with TABLE DATA and INDEX, so.. At the end, applied down to
v18 as suggested.

And I am having second thoughts on this one. Take for example this
case:
CREATE SCHEMA s1;
CREATE SCHEMA s2;
CREATE TABLE s1.foo (id int);
INSERT INTO s1.foo SELECT generate_series(1,100);
ANALYZE s1.foo;
CREATE TABLE s2.bar (id int);
CREATE INDEX foo ON s2.bar(id);
INSERT INTO s2.bar SELECT generate_series(1,100);
ANALYZE s2.bar;

And then this:
pg_dump --statistics -Fc -f stats.dump mydb
pg_restore --statistics-only --index=foo -f stats_foo.sql stats.dump

On HEAD, we get relation and attribute we should not in stats_foo.sql,
getting also some data from the table s1.foo. With the patch
attached, that strengthens the name check based on the type of the
depending TOC entries, we only get the relation stats of s2.foo,
nothing about the table s1.foo. This feels too funky to write a test
for, wasting cycles compared to the existing coverage.

pg_restore --index is as old as e8f69be054e9, so it's not like we
could just remove it, but I'd say that with the schema-level restore
this would be tempting.

Anyway, let's improve this situation with the attached, for HEAD and
v18.
--
Michael
<0001-Use-dependency-based-matching-for-STATISTICS-DATA-in.patch>

I think the dependency-based matching is better, but what happens if the archive is created with --statistics-only? In that case, the statistics entries still have dependencies, but the referenced parent TOC entries are not present in the archive. See the attached test script. It creates the archive with --statistics-only, with the new patch, neither table nor index stats are restored:
```
% ./test_pgdump_stats_new.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
3936; 0 0 STATISTICS DATA s1 t
3937; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics-only --table t:
pg_restore --statistics-only --index idx_expr:
```

With HEAD, both table and index stats are restored:
```
% ./test_pgdump_stats_new.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
3936; 0 0 STATISTICS DATA s1 t
3937; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics-only --table t:
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
pg_restore --statistics-only --index idx_expr:
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

test_pgdump_stats_new.shapplication/octet-stream; name=test_pgdump_stats_new.sh; x-unix-mode=0755Download
#14Michael Paquier
michael@paquier.xyz
In reply to: Chao Li (#13)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Tue, Jun 16, 2026 at 02:36:21PM +0800, Chao Li wrote:

I think the dependency-based matching is better, but what happens if
the archive is created with --statistics-only? In that case, the
statistics entries still have dependencies, but the referenced
parent TOC entries are not present in the archive. See the attached
test script. It creates the archive with --statistics-only, with the
new patch, neither table nor index stats are restored:

I can see your argument in that HEAD can improve the detection in what
gets restored if the dumps only include the stats data, but I don't
really buy that this is a use case interesting to support: one can
also dump the data with the index definition and then filter back the
contents when restoring using --index. So at the end having a tighter
check at the restore is more appealing to me, because it leads to a
more predictible result, like in the scenario I have posted upthread
when a schema has the idea to include tables and indexes with the same
names (unlikely, okay, still). Or in short, don't do that.

For the case of upgrades one or the other does not really matter,
pg_upgrade treats all the objects in full.
--
Michael

#15Chao Li
li.evan.chao@gmail.com
In reply to: Michael Paquier (#14)
Re: pg_restore handles extended statistics inconsistently with statistics data

On Jun 16, 2026, at 14:59, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jun 16, 2026 at 02:36:21PM +0800, Chao Li wrote:

I think the dependency-based matching is better, but what happens if
the archive is created with --statistics-only? In that case, the
statistics entries still have dependencies, but the referenced
parent TOC entries are not present in the archive. See the attached
test script. It creates the archive with --statistics-only, with the
new patch, neither table nor index stats are restored:

I can see your argument in that HEAD can improve the detection in what
gets restored if the dumps only include the stats data, but I don't
really buy that this is a use case interesting to support: one can
also dump the data with the index definition and then filter back the
contents when restoring using --index. So at the end having a tighter
check at the restore is more appealing to me, because it leads to a
more predictible result, like in the scenario I have posted upthread
when a schema has the idea to include tables and indexes with the same
names (unlikely, okay, still). Or in short, don't do that.

For the case of upgrades one or the other does not really matter,
pg_upgrade treats all the objects in full.
--
Michael

Make sense. Then the new patch looks good to me.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/