[PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade

Started by Jimmy Angelakos3 months ago3 messageshackers
Jump to latest
#1Jimmy Angelakos
jimmy.angelakos@pgedge.com

Hi All,

I ran into this issue when pg_upgrade-ing a DB with PostGIS. This is my
first code patch, so any feedback on the approach will be appreciated!

The problem:
============
pg_upgrade uses pg_dump --schema-only --binary-upgrade to copy the schema
from $oldcluster to $newcluster. Because this excludes all table data, it
leaves out data in extension config tables registered with
pg_extension_config_dump().

In $newcluster, binary_upgrade_create_empty_extension() creates the
extensions without populating any table data. The extensions' CREATE
EXTENSION scripts never get executed so any INSERTs are skipped. As a
consequence, if any CREATE TABLE statement in $newcluster requires
validation against these empty config tables, the upgrade fails. As an
example,
PostGIS registers config table spatial_ref_sys to hold ~8500 spatial
reference system definitions (SRIDs). When a table has, e.g. a geometry
column that specifies an SRID, this gets validated during the CREATE TABLE:

CREATE TABLE points (id int, location geometry(Point, 27700));
ERROR: Cannot find SRID (27700) in spatial_ref_sys

This will happen for any SRID-constrained column, which will prevent many
real-world PostGIS deployments from being able to pg_upgrade. To summarise
the problem, our ordering is wrong here because extension configuration
data must be present before user tables that depend on it get created, but
--schema-only strips this data.

The patch:
==========
We are adding a new dump object type DO_EXTENSION_DATA that dumps extension
config table data in SECTION_PRE_DATA during --binary-upgrade ONLY. This
restores the needed data between extension creation and user object
creation, allowing the DDL to succeed.

Four files are modified in bin/pg_dump:

pg_dump.h:
Add DO_EXTENSION_DATA to the DumpableObjectType enum, between DO_EXTENSION
and DO_TYPE

pg_dump_sort.c:
Add PRIO_EXTENSION_DATA between PRIO_EXTENSION and PRIO_TYPE

pg_dump.c:
1. Add makeExtensionDataInfo() to create a TableDataInfo with objType =
DO_EXTENSION_DATA. Called for plain tables (RELKIND_RELATION) during
--binary-upgrade ONLY. As it depends on the table def, the COPY will be
emitted after the CREATE TABLE.
2. Add dumpExtensionData() to emit the entry in SECTION_PRE_DATA with
description "EXTENSION DATA" using dumpTableData_copy(). This allows the
config table data to go into the schema-only dump.
3. In processExtensionTables(), when dopt->binary_upgrade is true, call
makeExtensionDataInfo() instead of makeTableDataInfo(). Additionally, skip
extcondition filter because we need to dump all rows here.
4. Include DO_EXTENSION_DATA in pre-data boundary in
addBoundaryDependencies()

pg_backup_archiver.c:
Add "EXTENSION DATA" to the whitelist in _tocEntryRequired() similar to
BLOB, BLOB METADATA, etc. to include extension config table data in
--schema-only dumps during --binary-upgrade ONLY.

What ends up happening:
=======================
The inserted rows are basically scaffolding to allow the upgrade, and do
not persist. The pg_upgrade sequence goes like:
1. pg_dump includes $oldcluster extension config data in schema-only dump
2. pg_restore replays the dump into $newcluster and "EXTENSION DATA"
entries populate tables like spatial_ref_sys with COPY. Subsequent CREATE
TABLEs with e.g. SRID-constrained columns pass validation.
3. pg_upgrade transfers all data files from $oldcluster to $newcluster,
making spatial_ref_sys byte-for-byte identical to its previous state.

This patch:
1. Does NOT affect normal pg_dumps (without --binary-upgrade).
DO_EXTENSION_DATA objects are not created in this case.
2. Leaves binary_upgrade_create_empty_extension() unchanged.
3. Is not PostGIS-specific, and should solve this class of problem for any
extension that registers config tables that will be needed for DDL
validation.
4. Has been tested against HEAD at 29bf4ee7496 with $oldcluster PostGIS
3.3.9 on PG14 and $newcluster PostGIS 3.7.0dev/master on PG19-devel.

Thanks in advance for your review! Please find attached the patch for HEAD.
I believe this should be easily backpatchable to (at least) PG15, and will
be happy to work on backports.

Best regards,
Jimmy

Attachments:

0001-pg_dump-Restore-extension-config-table-data-before-u.patchtext/x-patch; charset=US-ASCII; name=0001-pg_dump-Restore-extension-config-table-data-before-u.patchDownload+117-3
#2Andrew Dunstan
andrew@dunslane.net
In reply to: Jimmy Angelakos (#1)
Re: [PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade

On 2026-03-20 Fr 1:47 PM, Jimmy Angelakos wrote:

Hi All,

I ran into this issue when pg_upgrade-ing a DB with PostGIS. This is
my first code patch, so any feedback on the approach will be appreciated!

The problem:
============
pg_upgrade uses pg_dump --schema-only --binary-upgrade to copy the
schema from $oldcluster to $newcluster. Because this excludes all
table data, it leaves out data in extension config tables registered
with pg_extension_config_dump().

In $newcluster, binary_upgrade_create_empty_extension() creates the
extensions without populating any table data. The extensions' CREATE
EXTENSION scripts never get executed so any INSERTs are skipped. As a
consequence, if any CREATE TABLE statement in $newcluster requires
validation against these empty config tables, the upgrade fails. As an
example,
PostGIS registers config table spatial_ref_sys to hold ~8500 spatial
reference system definitions (SRIDs). When a table has, e.g. a
geometry column that specifies an SRID, this gets validated during the
CREATE TABLE:

CREATE TABLE points (id int, location geometry(Point, 27700));
ERROR:  Cannot find SRID (27700) in spatial_ref_sys

This will happen for any SRID-constrained column, which will prevent
many real-world PostGIS deployments from being able to pg_upgrade. To
summarise the problem, our ordering is wrong here because extension
configuration data must be present before user tables that depend on
it get created, but --schema-only strips this data.

The patch:
==========
We are adding a new dump object type DO_EXTENSION_DATA that dumps
extension config table data in SECTION_PRE_DATA during
--binary-upgrade ONLY. This restores the needed data between extension
creation and user object creation, allowing the DDL to succeed.

Four files are modified in bin/pg_dump:

pg_dump.h:
Add DO_EXTENSION_DATA to the DumpableObjectType enum, between
DO_EXTENSION and DO_TYPE

pg_dump_sort.c:
Add PRIO_EXTENSION_DATA between PRIO_EXTENSION and PRIO_TYPE

pg_dump.c:
1. Add makeExtensionDataInfo() to create a TableDataInfo with objType
= DO_EXTENSION_DATA. Called for plain tables (RELKIND_RELATION) during
--binary-upgrade ONLY. As it depends on the table def, the COPY will
be emitted after the CREATE TABLE.
2. Add dumpExtensionData() to emit the entry in SECTION_PRE_DATA with
description "EXTENSION DATA" using dumpTableData_copy(). This allows
the config table data to go into the schema-only dump.
3. In processExtensionTables(), when dopt->binary_upgrade is true,
call makeExtensionDataInfo() instead of makeTableDataInfo().
Additionally, skip extcondition filter because we need to dump all
rows here.
4. Include DO_EXTENSION_DATA in pre-data boundary in
addBoundaryDependencies()

pg_backup_archiver.c:
Add "EXTENSION DATA" to the whitelist in _tocEntryRequired() similar
to BLOB, BLOB METADATA, etc. to include extension config table data in
--schema-only dumps during --binary-upgrade ONLY.

What ends up happening:
=======================
The inserted rows are basically scaffolding to allow the upgrade, and
do not persist. The pg_upgrade sequence goes like:
1. pg_dump includes $oldcluster extension config data in schema-only dump
2. pg_restore replays the dump into $newcluster and "EXTENSION DATA"
entries populate tables like spatial_ref_sys with COPY. Subsequent
CREATE TABLEs with e.g. SRID-constrained columns pass validation.
3. pg_upgrade transfers all data files from $oldcluster to
$newcluster, making spatial_ref_sys byte-for-byte identical to its
previous state.

This patch:
1. Does NOT affect normal pg_dumps (without --binary-upgrade).
DO_EXTENSION_DATA objects are not created in this case.
2. Leaves binary_upgrade_create_empty_extension() unchanged.
3. Is not PostGIS-specific, and should solve this class of problem for
any extension that registers config tables that will be needed for DDL
validation.
4. Has been tested against HEAD at 29bf4ee7496 with $oldcluster
PostGIS 3.3.9 on PG14 and $newcluster PostGIS 3.7.0dev/master on
PG19-devel.

Thanks in advance for your review! Please find attached the patch for
HEAD. I believe this should be easily backpatchable to (at least)
PG15, and will be happy to work on backports.

Hi, Jimmy.

First, as you probably know, we don't backpatch features, and I think
this comes into that category. Unfortunately, we're about to close
release 19 for features, so this would need to wait till release 20.

The patch didn't include any tests. It will need them (probably in
src/test/modules/test_pg_dump)

There appears to be a lot of code duplication between
dumpExtensionData() and dumpTableData(). It might be better to refactor
that, perhaps by supplying an extra flag to dumpTableData().

Do make sure to add a Commitfest entry for this is you haven't already
done so.

cheers

andrew

Best regards,
Jimmy

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#3Jimmy Angelakos
jimmy.angelakos@pgedge.com
In reply to: Andrew Dunstan (#2)
Re: [PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade

Hi Andrew,

Thanks for your review!

My opinion is that this is a bugfix rather than a feature: Rather than
adding new capability it's fixing pg_upgrade's behaviour, because it
currently fails in the described scenario (SRID-constrained columns). The
new code path isn't user facing and it only fires during pg_upgrade's
internal use of pg_dump --binary-upgrade.

However, I will defer to the committers' judgement on whether this should
be included in PG19 and backpatched.

To address your feedback, please find attached v2 which:
1. Removes dumpExtensionData() and adds the handling for EXTENSION DATA
object type to dumpTableData()
2. Adds test in test_pg_dump: we insert a row into the dumpable extension
table, and we expect that the COPY appears in --binary-upgrade dumps.

I have also added a commitfest entry.

Thanks again,
Jimmy

On Sun, Mar 29, 2026 at 7:34 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2026-03-20 Fr 1:47 PM, Jimmy Angelakos wrote:

Hi All,

I ran into this issue when pg_upgrade-ing a DB with PostGIS. This is
my first code patch, so any feedback on the approach will be appreciated!

The problem:
============
pg_upgrade uses pg_dump --schema-only --binary-upgrade to copy the
schema from $oldcluster to $newcluster. Because this excludes all
table data, it leaves out data in extension config tables registered
with pg_extension_config_dump().

In $newcluster, binary_upgrade_create_empty_extension() creates the
extensions without populating any table data. The extensions' CREATE
EXTENSION scripts never get executed so any INSERTs are skipped. As a
consequence, if any CREATE TABLE statement in $newcluster requires
validation against these empty config tables, the upgrade fails. As an
example,
PostGIS registers config table spatial_ref_sys to hold ~8500 spatial
reference system definitions (SRIDs). When a table has, e.g. a
geometry column that specifies an SRID, this gets validated during the
CREATE TABLE:

CREATE TABLE points (id int, location geometry(Point, 27700));
ERROR: Cannot find SRID (27700) in spatial_ref_sys

This will happen for any SRID-constrained column, which will prevent
many real-world PostGIS deployments from being able to pg_upgrade. To
summarise the problem, our ordering is wrong here because extension
configuration data must be present before user tables that depend on
it get created, but --schema-only strips this data.

The patch:
==========
We are adding a new dump object type DO_EXTENSION_DATA that dumps
extension config table data in SECTION_PRE_DATA during
--binary-upgrade ONLY. This restores the needed data between extension
creation and user object creation, allowing the DDL to succeed.

Four files are modified in bin/pg_dump:

pg_dump.h:
Add DO_EXTENSION_DATA to the DumpableObjectType enum, between
DO_EXTENSION and DO_TYPE

pg_dump_sort.c:
Add PRIO_EXTENSION_DATA between PRIO_EXTENSION and PRIO_TYPE

pg_dump.c:
1. Add makeExtensionDataInfo() to create a TableDataInfo with objType
= DO_EXTENSION_DATA. Called for plain tables (RELKIND_RELATION) during
--binary-upgrade ONLY. As it depends on the table def, the COPY will
be emitted after the CREATE TABLE.
2. Add dumpExtensionData() to emit the entry in SECTION_PRE_DATA with
description "EXTENSION DATA" using dumpTableData_copy(). This allows
the config table data to go into the schema-only dump.
3. In processExtensionTables(), when dopt->binary_upgrade is true,
call makeExtensionDataInfo() instead of makeTableDataInfo().
Additionally, skip extcondition filter because we need to dump all
rows here.
4. Include DO_EXTENSION_DATA in pre-data boundary in
addBoundaryDependencies()

pg_backup_archiver.c:
Add "EXTENSION DATA" to the whitelist in _tocEntryRequired() similar
to BLOB, BLOB METADATA, etc. to include extension config table data in
--schema-only dumps during --binary-upgrade ONLY.

What ends up happening:
=======================
The inserted rows are basically scaffolding to allow the upgrade, and
do not persist. The pg_upgrade sequence goes like:
1. pg_dump includes $oldcluster extension config data in schema-only dump
2. pg_restore replays the dump into $newcluster and "EXTENSION DATA"
entries populate tables like spatial_ref_sys with COPY. Subsequent
CREATE TABLEs with e.g. SRID-constrained columns pass validation.
3. pg_upgrade transfers all data files from $oldcluster to
$newcluster, making spatial_ref_sys byte-for-byte identical to its
previous state.

This patch:
1. Does NOT affect normal pg_dumps (without --binary-upgrade).
DO_EXTENSION_DATA objects are not created in this case.
2. Leaves binary_upgrade_create_empty_extension() unchanged.
3. Is not PostGIS-specific, and should solve this class of problem for
any extension that registers config tables that will be needed for DDL
validation.
4. Has been tested against HEAD at 29bf4ee7496 with $oldcluster
PostGIS 3.3.9 on PG14 and $newcluster PostGIS 3.7.0dev/master on
PG19-devel.

Thanks in advance for your review! Please find attached the patch for
HEAD. I believe this should be easily backpatchable to (at least)
PG15, and will be happy to work on backports.

Hi, Jimmy.

First, as you probably know, we don't backpatch features, and I think
this comes into that category. Unfortunately, we're about to close
release 19 for features, so this would need to wait till release 20.

The patch didn't include any tests. It will need them (probably in
src/test/modules/test_pg_dump)

There appears to be a lot of code duplication between
dumpExtensionData() and dumpTableData(). It might be better to refactor
that, perhaps by supplying an extra flag to dumpTableData().

Do make sure to add a Commitfest entry for this is you haven't already
done so.

cheers

andrew

Best regards,
Jimmy

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

--

Jimmy Angelakos

Staff Software Engineer

jimmy@pgEdge.com

pgEdge.com <http://pgedge.com/&gt;

Attachments:

nonameimage/png; name=nonameDownload+3-0
v2-0001-pg_dump-Restore-extension-config-table-data-befor.patchtext/x-patch; charset=US-ASCII; name=v2-0001-pg_dump-Restore-extension-config-table-data-befor.patchDownload+89-6