Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

Started by Jeevan Chalke4 months ago4 messageshackers
Jump to latest
#1Jeevan Chalke
jeevan.chalke@enterprisedb.com

Hello Hackers,

We have identified a dependency issue—most notably observed with the
PostGIS extension—where a table's column definition relies on data existing
in another table's catalog at restore time. Because pg_dump typically
separates schema and data into distinct sections, these implicit data-level
dependencies are not captured, leading to failures during pg_upgrade or
pg_restore.

Jakub Wartak previously reported a detailed example of this issue here:
/messages/by-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q@mail.gmail.com

Following a discussion with Alvaro Herrera, we have developed a patch based
on his suggestions.
The Problem

In certain extension-heavy schemas, an object's schema definition cannot be
created unless another table's data is already populated. Current pg_dump
logic handles schema-to-schema dependencies via pg_depend, but it lacks a
mechanism to:

1.

Enforce a specific order for dependencies not recorded in pg_depend.
2.

Interleave data loading with schema creation for specific tables.

Proposed Solution

This patch introduces a new option, --extra-dependencies, for pg_dump and
pg_upgrade.

*1. Dependency Hinting:* The option allows users to provide manual
dependency hints in the format table#referenced_table. Internally, pg_dump
treats these as "faked" dependencies, ensuring the referenced table is
sorted before the dependent object in the dump graph, similar to standard
pg_depend entries.

*2. Immediate Data Dumping:* To satisfy the data-level requirement, the
patch ensures that any table referenced via this option has its data dumped
immediately following its definition (utilizing the dumpTableData()
infrastructure) rather than in the general DATA section of the dump.
Use Case

While this is a specialized tool intended for unrecorded dependencies (like
those in PostGIS), it provides a necessary safety valve for migrations that
currently require manual intervention or complex workarounds.

Attached are the patches (PoC) for review. We look forward to your thoughts
and suggestions.

Regards,

--
*Jeevan Chalke*
*Principal Engineer, Engineering Manager*
*Product Development*

enterprisedb.com <https://www.enterprisedb.com&gt;

Attachments:

v1-0001-pg_dump-Add-extra-dependencies-option.patchapplication/octet-stream; name=v1-0001-pg_dump-Add-extra-dependencies-option.patchDownload+187-6
v1-0002-pg_upgrade-Add-extra-dependencies-option.patchapplication/octet-stream; name=v1-0002-pg_upgrade-Add-extra-dependencies-option.patchDownload+33-2
#2Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Jeevan Chalke (#1)
Re: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

On Wed, 24 Dec 2025 at 12:45, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Hello Hackers,

We have identified a dependency issue—most notably observed with the PostGIS extension—where a table's column definition relies on data existing in another table's catalog at restore time. Because pg_dump typically separates schema and data into distinct sections, these implicit data-level dependencies are not captured, leading to failures during pg_upgrade or pg_restore.

Jakub Wartak previously reported a detailed example of this issue here: /messages/by-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q@mail.gmail.com

Ah, yes, that does sound like an issue.

Following a discussion with Alvaro Herrera, we have developed a patch based on his suggestions.

The Problem

In certain extension-heavy schemas, an object's schema definition cannot be created unless another table's data is already populated. Current pg_dump logic handles schema-to-schema dependencies via pg_depend, but it lacks a mechanism to:

Enforce a specific order for dependencies not recorded in pg_depend.
Interleave data loading with schema creation for specific tables.

Is there something that prevents PostGIS from recording this kind of
dependency in pg_depend, and by doing so force the right order in
pg_dump? It seems to me that pg_depend's model is generic enough to
enable that kind of dependency; so is the issue that pg_dump doesn't
currently track and resolve that type of dependency in a satisfactory
manner?

I'm personally not a big fan of new pg_dump and pg_upgrade options to
solve this, as they require a user input to register a dependency that
should've been stored in the catalog; it should've been handled
natively. So, if we could make it work using pg_depend instead of
expecting user input here, then that'd be very much appreciated.

Kind regards,

Matthias van de Meent

#3Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Matthias van de Meent (#2)
Re: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

On Thu, Dec 25, 2025 at 2:22 AM Matthias van de Meent <
boekewurm+postgres@gmail.com> wrote:

On Wed, 24 Dec 2025 at 12:45, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Hello Hackers,

We have identified a dependency issue—most notably observed with the

PostGIS extension—where a table's column definition relies on data existing
in another table's catalog at restore time. Because pg_dump typically
separates schema and data into distinct sections, these implicit data-level
dependencies are not captured, leading to failures during pg_upgrade or
pg_restore.

Jakub Wartak previously reported a detailed example of this issue here:

/messages/by-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q@mail.gmail.com

Ah, yes, that does sound like an issue.

Following a discussion with Alvaro Herrera, we have developed a patch

based on his suggestions.

The Problem

In certain extension-heavy schemas, an object's schema definition cannot

be created unless another table's data is already populated. Current
pg_dump logic handles schema-to-schema dependencies via pg_depend, but it
lacks a mechanism to:

Enforce a specific order for dependencies not recorded in pg_depend.
Interleave data loading with schema creation for specific tables.

Is there something that prevents PostGIS from recording this kind of
dependency in pg_depend, and by doing so force the right order in
pg_dump? It seems to me that pg_depend's model is generic enough to
enable that kind of dependency; so is the issue that pg_dump doesn't
currently track and resolve that type of dependency in a satisfactory
manner?

I'm personally not a big fan of new pg_dump and pg_upgrade options to
solve this, as they require a user input to register a dependency that
should've been stored in the catalog; it should've been handled
natively. So, if we could make it work using pg_depend instead of
expecting user input here, then that'd be very much appreciated.

Thanks for the feedback, Matthias; I agree with your assessment. Currently,
Postgres lacks a native mechanism for tracking dependencies between a table
and the specific rows of another table. While certain extensions like
PostGIS introduce these patterns, they remain non-standard edge cases.
Implementing a fix in the core backend seems like overkill for this
scenario. Since the failure is specific to the upgrade path, targeting
pg_dump and pg_upgrade is a significantly less invasive approach. Notably,
this patch triggers an immediate dump of the referenced table data -- an
unconventional behavior that is better handled in the client binaries than
in the backend. Consequently, this approach would require new options for
these binaries to explicitly inject those dependency details.

Kind regards,

Matthias van de Meent

Regards,

--
*Jeevan Chalke*
*Principal Engineer, Engineering Manager*
*Product Development*

enterprisedb.com <https://www.enterprisedb.com&gt;

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Jeevan Chalke (#3)
Re: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

On 01.01.26 14:43, Jeevan Chalke wrote:

Thanks for the feedback, Matthias; I agree with your assessment.
Currently, Postgres lacks a native mechanism for tracking dependencies
between a table and the specific rows of another table. While certain
extensions like PostGIS introduce these patterns, they remain non-
standard edge cases.

Implementing a fix in the core backend seems like overkill for this
scenario. Since the failure is specific to the upgrade path, targeting |
pg_dump| and |pg_upgrade| is a significantly less invasive approach.
Notably, this patch triggers an immediate dump of the referenced table
data -- an unconventional behavior that is better handled in the client
binaries than in the backend. Consequently, this approach would require
new options for these binaries to explicitly inject those dependency
details.

How about this: postgis should define its table spatial_ref_sys as
user_catalog_table, and we change pg_dump to dump the contents of user
catalog tables before other DDL.

There is still some work to do here, but at least this sounds like a
more principled approach.