Allow pg_dump --statistics-only to dump foreign table statistics?

Started by Fujii Masao7 months ago10 messages
#1Fujii Masao
masao.fujii@oss.nttdata.com
1 attachment(s)

Hi,

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

Sorry if this has already been discussed. I tried searching but couldn't
find the discussion. If there is one, I'd appreciate it if you could point
me to it.

In case we decide it's reasonable to allow dumping statistics for foreign
tables, I've attached a patch that implements this behavior.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

Attachments:

v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patchtext/plain; charset=UTF-8; name=v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patchDownload
From 27992ef9e7335473f673c30c24f8d0344857518c Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 13 Jun 2025 15:35:44 +0900
Subject: [PATCH v1] pg_dump: Allow pg_dump to dump the statistics for foreign
 tables.

---
 doc/src/sgml/ref/pg_dump.sgml | 3 ++-
 src/bin/pg_dump/pg_dump.c     | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d7595a7e546..8ef624d961c 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1359,7 +1359,8 @@ PostgreSQL documentation
       <listitem>
        <para>
         Dump only the statistics, not the schema (data definitions) or data.
-        Statistics for tables, materialized views, and indexes are dumped.
+        Statistics for tables, materialized views, foreign tables,
+        and indexes are dumped.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 37432e66efd..8fb2b17cc3a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -6890,7 +6890,8 @@ getRelationStatistics(Archive *fout, DumpableObject *rel, int32 relpages,
 		(relkind == RELKIND_PARTITIONED_TABLE) ||
 		(relkind == RELKIND_INDEX) ||
 		(relkind == RELKIND_PARTITIONED_INDEX) ||
-		(relkind == RELKIND_MATVIEW))
+		(relkind == RELKIND_MATVIEW ||
+		 relkind == RELKIND_FOREIGN_TABLE))
 	{
 		RelStatsInfo *info = pg_malloc0(sizeof(RelStatsInfo));
 		DumpableObject *dobj = &info->dobj;
@@ -6929,6 +6930,7 @@ getRelationStatistics(Archive *fout, DumpableObject *rel, int32 relpages,
 			case RELKIND_RELATION:
 			case RELKIND_PARTITIONED_TABLE:
 			case RELKIND_MATVIEW:
+			case RELKIND_FOREIGN_TABLE:
 				info->section = SECTION_DATA;
 				break;
 			case RELKIND_INDEX:
-- 
2.49.0

#2Daniel Gustafsson
daniel@yesql.se
In reply to: Fujii Masao (#1)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On 13 Jun 2025, at 09:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

That indeed seems like an oversight, and regardless it's unclear to the user
what actually happens. Question is, since we don't know what's on the other
end do we need a corresponding option to --include-foreign-data to be able to
be selective with statistics?

--
Daniel Gustafsson

#3Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Daniel Gustafsson (#2)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On Fri, Jun 13, 2025 at 1:19 PM Daniel Gustafsson <daniel@yesql.se> wrote:

On 13 Jun 2025, at 09:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

That indeed seems like an oversight, and regardless it's unclear to the user
what actually happens. Question is, since we don't know what's on the other
end do we need a corresponding option to --include-foreign-data to be able to
be selective with statistics?

We include only statistics and no data for regular tables. Shouldn't
we do the same to the foreign tables?

--
Best Wishes,
Ashutosh Bapat

#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#1)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On Fri, Jun 13, 2025 at 04:19:26PM +0900, Fujii Masao wrote:

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

Isn't this applicable to --with-statistics, too?

Sorry if this has already been discussed. I tried searching but couldn't
find the discussion. If there is one, I'd appreciate it if you could point
me to it.

I skimmed through the main thread for the feature [0]/messages/by-id/CADkLM=cB0rF3p_FuWRTMSV0983ihTRpsH+OCpNyiqE7Wk0vUWA@mail.gmail.com (which seems to be so
long that it sometimes doesn't load completely), and I didn't see anything
directly related to the topic. There was some discussion about importing
foreign relation stats with the new functions instead of remote table
samples, but that's a different thing.

[0]: /messages/by-id/CADkLM=cB0rF3p_FuWRTMSV0983ihTRpsH+OCpNyiqE7Wk0vUWA@mail.gmail.com

--
nathan

#5Corey Huinker
corey.huinker@gmail.com
In reply to: Nathan Bossart (#4)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

I skimmed through the main thread for the feature [0] (which seems to be so
long that it sometimes doesn't load completely), and I didn't see anything
directly related to the topic. There was some discussion about importing
foreign relation stats with the new functions instead of remote table
samples, but that's a different thing

If we aren't exporting stats for foreign tables then that is an oversight,
the intention always was to fetch all available statistics for all
relations. I can't offhand think of where we even have the option to
exclude them.

#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Corey Huinker (#5)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On Mon, Jun 16, 2025 at 03:39:07PM -0400, Corey Huinker wrote:

If we aren't exporting stats for foreign tables then that is an oversight,
the intention always was to fetch all available statistics for all
relations. I can't offhand think of where we even have the option to
exclude them.

getRelationStatistics() enumerates the relation kinds:

if ((relkind == RELKIND_RELATION) ||
(relkind == RELKIND_PARTITIONED_TABLE) ||
(relkind == RELKIND_INDEX) ||
(relkind == RELKIND_PARTITIONED_INDEX) ||
(relkind == RELKIND_MATVIEW))

The proposed patch [0]/messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch adds RELKIND_FOREIGN_TABLE to this list. That
appears to be the only missing relation kind that ANALYZE handles.

[0]: /messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch

--
nathan

#7Corey Huinker
corey.huinker@gmail.com
In reply to: Nathan Bossart (#6)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

The proposed patch [0] adds RELKIND_FOREIGN_TABLE to this list. That
appears to be the only missing relation kind that ANALYZE handles.

[0]
/messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch

Thanks for pointing it out, a little distracted today.

+1 for the patch.

#8Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Nathan Bossart (#4)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On 2025/06/14 7:31, Nathan Bossart wrote:

On Fri, Jun 13, 2025 at 04:19:26PM +0900, Fujii Masao wrote:

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

Isn't this applicable to --with-statistics, too?

Yes, it applies to --with-statistics as well.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

#9Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Corey Huinker (#7)
1 attachment(s)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On 2025/06/17 7:44, Corey Huinker wrote:

The proposed patch [0] adds RELKIND_FOREIGN_TABLE to this list.  That
appears to be the only missing relation kind that ANALYZE handles.

[0] /messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch </messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch&gt;

Thanks for pointing it out, a little distracted today.

+1 for the patch.

Thanks for reviewing the patch! I found a few more descriptions that
needed updates, so I've revised the patch accordingly. The updated
version is attached. Barring any objections, I plan to commit it.

           </para>
           <para>
            The data section contains actual table data, large-object
-          contents, statistics for tables and materialized views and
-          sequence values.
+          contents, sequence values, and statistics for tables,
+          materialized views, and foriegn tables.
            Post-data items include definitions of indexes, triggers, rules,
            statistics for indexes, and constraints other than validated check
            constraints.

Although not directly related to foreign table statistics, I considered
clarifying that the "data" section only includes statistics *if*
--statistics is specified. The current wording might suggest that
statistics are included by default, which isn't accurate. However,
since the default behavior is still under discussion at [1]/messages/by-id/CADkLM=fXSiX4GQ7F6__z+ofGpp0QwwytGP-GNeVQyMhc29bbwQ@mail.gmail.com,
I've left that part unchanged for now.

Regards,

[1]: /messages/by-id/CADkLM=fXSiX4GQ7F6__z+ofGpp0QwwytGP-GNeVQyMhc29bbwQ@mail.gmail.com

--
Fujii Masao
NTT DATA Japan Corporation

Attachments:

v2-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patchtext/plain; charset=UTF-8; name=v2-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patchDownload
From eeb8d3bdb3e2860a5d1c40a92e802b303be51248 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Tue, 17 Jun 2025 10:37:53 +0900
Subject: [PATCH v2] pg_dump: Allow pg_dump to dump the statistics for foreign
 tables.

Commit 1fd1bd87101 introduced support for dumping statistics with
pg_dump and pg_dumpall, covering tables, materialized views, and indexes.
However, it overlooked foreign tables, even though functions like
pg_restore_relation_stats() support them.

This commit fixes that oversight by allowing pg_dump and pg_dumpall
to include statistics for foreign tables.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3772e4e4-ef39-4deb-bb76-aa8165f33fb6@oss.nttdata.com
---
 doc/src/sgml/ref/pg_dump.sgml    | 7 ++++---
 doc/src/sgml/ref/pg_dumpall.sgml | 3 ++-
 src/bin/pg_dump/pg_dump.c        | 4 +++-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d7595a7e546..1e06bd33bdc 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1277,8 +1277,8 @@ PostgreSQL documentation
          </para>
          <para>
           The data section contains actual table data, large-object
-          contents, statistics for tables and materialized views and
-          sequence values.
+          contents, sequence values, and statistics for tables,
+          materialized views, and foriegn tables.
           Post-data items include definitions of indexes, triggers, rules,
           statistics for indexes, and constraints other than validated check
           constraints.
@@ -1359,7 +1359,8 @@ PostgreSQL documentation
       <listitem>
        <para>
         Dump only the statistics, not the schema (data definitions) or data.
-        Statistics for tables, materialized views, and indexes are dumped.
+        Statistics for tables, materialized views, foreign tables,
+        and indexes are dumped.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 723a466cfaa..43f384ed16a 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -690,7 +690,8 @@ exclude database <replaceable class="parameter">PATTERN</replaceable>
       <listitem>
        <para>
         Dump only the statistics, not the schema (data definitions) or data.
-        Statistics for tables, materialized views, and indexes are dumped.
+        Statistics for tables, materialized views, foreign tables,
+        and indexes are dumped.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7bc0724cd30..a8f0309e8fc 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -6890,7 +6890,8 @@ getRelationStatistics(Archive *fout, DumpableObject *rel, int32 relpages,
 		(relkind == RELKIND_PARTITIONED_TABLE) ||
 		(relkind == RELKIND_INDEX) ||
 		(relkind == RELKIND_PARTITIONED_INDEX) ||
-		(relkind == RELKIND_MATVIEW))
+		(relkind == RELKIND_MATVIEW ||
+		 relkind == RELKIND_FOREIGN_TABLE))
 	{
 		RelStatsInfo *info = pg_malloc0(sizeof(RelStatsInfo));
 		DumpableObject *dobj = &info->dobj;
@@ -6929,6 +6930,7 @@ getRelationStatistics(Archive *fout, DumpableObject *rel, int32 relpages,
 			case RELKIND_RELATION:
 			case RELKIND_PARTITIONED_TABLE:
 			case RELKIND_MATVIEW:
+			case RELKIND_FOREIGN_TABLE:
 				info->section = SECTION_DATA;
 				break;
 			case RELKIND_INDEX:
-- 
2.49.0

#10Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Fujii Masao (#9)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On 2025/06/17 12:39, Fujii Masao wrote:

On 2025/06/17 7:44, Corey Huinker wrote:

    The proposed patch [0] adds RELKIND_FOREIGN_TABLE to this list.  That
    appears to be the only missing relation kind that ANALYZE handles.

    [0] /messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch </messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch&gt;

Thanks for pointing it out, a little distracted today.

+1 for the patch.

Thanks for reviewing the patch! I found a few more descriptions that
needed updates, so I've revised the patch accordingly. The updated
version is attached. Barring any objections, I plan to commit it.

I've pushed the patch. Thanks!

Regards,

--
Fujii Masao
NTT DATA Japan Corporation