Allow pg_dump --statistics-only to dump foreign table statistics?

Started by Fujii Masao10 months ago10 messageshackers
Jump to latest
#1Fujii Masao
masao.fujii@gmail.com

Hi,

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

Sorry if this has already been discussed. I tried searching but couldn't
find the discussion. If there is one, I'd appreciate it if you could point
me to it.

In case we decide it's reasonable to allow dumping statistics for foreign
tables, I've attached a patch that implements this behavior.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

Attachments:

v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patchtext/plain; charset=UTF-8; name=v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patchDownload+5-3
#2Daniel Gustafsson
daniel@yesql.se
In reply to: Fujii Masao (#1)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On 13 Jun 2025, at 09:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

That indeed seems like an oversight, and regardless it's unclear to the user
what actually happens. Question is, since we don't know what's on the other
end do we need a corresponding option to --include-foreign-data to be able to
be selective with statistics?

--
Daniel Gustafsson

#3Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Daniel Gustafsson (#2)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On Fri, Jun 13, 2025 at 1:19 PM Daniel Gustafsson <daniel@yesql.se> wrote:

On 13 Jun 2025, at 09:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

That indeed seems like an oversight, and regardless it's unclear to the user
what actually happens. Question is, since we don't know what's on the other
end do we need a corresponding option to --include-foreign-data to be able to
be selective with statistics?

We include only statistics and no data for regular tables. Shouldn't
we do the same to the foreign tables?

--
Best Wishes,
Ashutosh Bapat

#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#1)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On Fri, Jun 13, 2025 at 04:19:26PM +0900, Fujii Masao wrote:

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

Isn't this applicable to --with-statistics, too?

Sorry if this has already been discussed. I tried searching but couldn't
find the discussion. If there is one, I'd appreciate it if you could point
me to it.

I skimmed through the main thread for the feature [0]/messages/by-id/CADkLM=cB0rF3p_FuWRTMSV0983ihTRpsH+OCpNyiqE7Wk0vUWA@mail.gmail.com (which seems to be so
long that it sometimes doesn't load completely), and I didn't see anything
directly related to the topic. There was some discussion about importing
foreign relation stats with the new functions instead of remote table
samples, but that's a different thing.

[0]: /messages/by-id/CADkLM=cB0rF3p_FuWRTMSV0983ihTRpsH+OCpNyiqE7Wk0vUWA@mail.gmail.com

--
nathan

#5Corey Huinker
corey.huinker@gmail.com
In reply to: Nathan Bossart (#4)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

I skimmed through the main thread for the feature [0] (which seems to be so
long that it sometimes doesn't load completely), and I didn't see anything
directly related to the topic. There was some discussion about importing
foreign relation stats with the new functions instead of remote table
samples, but that's a different thing

If we aren't exporting stats for foreign tables then that is an oversight,
the intention always was to fetch all available statistics for all
relations. I can't offhand think of where we even have the option to
exclude them.

#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Corey Huinker (#5)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On Mon, Jun 16, 2025 at 03:39:07PM -0400, Corey Huinker wrote:

If we aren't exporting stats for foreign tables then that is an oversight,
the intention always was to fetch all available statistics for all
relations. I can't offhand think of where we even have the option to
exclude them.

getRelationStatistics() enumerates the relation kinds:

if ((relkind == RELKIND_RELATION) ||
(relkind == RELKIND_PARTITIONED_TABLE) ||
(relkind == RELKIND_INDEX) ||
(relkind == RELKIND_PARTITIONED_INDEX) ||
(relkind == RELKIND_MATVIEW))

The proposed patch [0]/messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch adds RELKIND_FOREIGN_TABLE to this list. That
appears to be the only missing relation kind that ANALYZE handles.

[0]: /messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch

--
nathan

#7Corey Huinker
corey.huinker@gmail.com
In reply to: Nathan Bossart (#6)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

The proposed patch [0] adds RELKIND_FOREIGN_TABLE to this list. That
appears to be the only missing relation kind that ANALYZE handles.

[0]
/messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch

Thanks for pointing it out, a little distracted today.

+1 for the patch.

#8Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#4)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On 2025/06/14 7:31, Nathan Bossart wrote:

On Fri, Jun 13, 2025 at 04:19:26PM +0900, Fujii Masao wrote:

I noticed that pg_restore_relation|attribute_stats() can restore statistics
for foreign tables, but pg_dump --statistics-only doesn't include them.
Is there a reason why pg_dump skips statistics for foreign tables?
Are there any risks or concerns around including them?

Isn't this applicable to --with-statistics, too?

Yes, it applies to --with-statistics as well.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

#9Fujii Masao
masao.fujii@gmail.com
In reply to: Corey Huinker (#7)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On 2025/06/17 7:44, Corey Huinker wrote:

The proposed patch [0] adds RELKIND_FOREIGN_TABLE to this list.  That
appears to be the only missing relation kind that ANALYZE handles.

[0] /messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch </messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch&gt;

Thanks for pointing it out, a little distracted today.

+1 for the patch.

Thanks for reviewing the patch! I found a few more descriptions that
needed updates, so I've revised the patch accordingly. The updated
version is attached. Barring any objections, I plan to commit it.

           </para>
           <para>
            The data section contains actual table data, large-object
-          contents, statistics for tables and materialized views and
-          sequence values.
+          contents, sequence values, and statistics for tables,
+          materialized views, and foriegn tables.
            Post-data items include definitions of indexes, triggers, rules,
            statistics for indexes, and constraints other than validated check
            constraints.

Although not directly related to foreign table statistics, I considered
clarifying that the "data" section only includes statistics *if*
--statistics is specified. The current wording might suggest that
statistics are included by default, which isn't accurate. However,
since the default behavior is still under discussion at [1]/messages/by-id/CADkLM=fXSiX4GQ7F6__z+ofGpp0QwwytGP-GNeVQyMhc29bbwQ@mail.gmail.com,
I've left that part unchanged for now.

Regards,

[1]: /messages/by-id/CADkLM=fXSiX4GQ7F6__z+ofGpp0QwwytGP-GNeVQyMhc29bbwQ@mail.gmail.com

--
Fujii Masao
NTT DATA Japan Corporation

Attachments:

v2-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patchtext/plain; charset=UTF-8; name=v2-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patchDownload+9-6
#10Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#9)
Re: Allow pg_dump --statistics-only to dump foreign table statistics?

On 2025/06/17 12:39, Fujii Masao wrote:

On 2025/06/17 7:44, Corey Huinker wrote:

    The proposed patch [0] adds RELKIND_FOREIGN_TABLE to this list.  That
    appears to be the only missing relation kind that ANALYZE handles.

    [0] /messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch </messages/by-id/attachment/177608/v1-0001-pg_dump-Allow-pg_dump-to-dump-the-statistics-for-.patch&gt;

Thanks for pointing it out, a little distracted today.

+1 for the patch.

Thanks for reviewing the patch! I found a few more descriptions that
needed updates, so I've revised the patch accordingly. The updated
version is attached. Barring any objections, I plan to commit it.

I've pushed the patch. Thanks!

Regards,

--
Fujii Masao
NTT DATA Japan Corporation