[PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade

Started by Jimmy Angelakos16 days ago2 messageshackers
Jump to latest
#1Jimmy Angelakos
jimmy.angelakos@pgedge.com

Hi All,

I ran into this issue when pg_upgrade-ing a DB with PostGIS. This is my
first code patch, so any feedback on the approach will be appreciated!

The problem:
============
pg_upgrade uses pg_dump --schema-only --binary-upgrade to copy the schema
from $oldcluster to $newcluster. Because this excludes all table data, it
leaves out data in extension config tables registered with
pg_extension_config_dump().

In $newcluster, binary_upgrade_create_empty_extension() creates the
extensions without populating any table data. The extensions' CREATE
EXTENSION scripts never get executed so any INSERTs are skipped. As a
consequence, if any CREATE TABLE statement in $newcluster requires
validation against these empty config tables, the upgrade fails. As an
example,
PostGIS registers config table spatial_ref_sys to hold ~8500 spatial
reference system definitions (SRIDs). When a table has, e.g. a geometry
column that specifies an SRID, this gets validated during the CREATE TABLE:

CREATE TABLE points (id int, location geometry(Point, 27700));
ERROR: Cannot find SRID (27700) in spatial_ref_sys

This will happen for any SRID-constrained column, which will prevent many
real-world PostGIS deployments from being able to pg_upgrade. To summarise
the problem, our ordering is wrong here because extension configuration
data must be present before user tables that depend on it get created, but
--schema-only strips this data.

The patch:
==========
We are adding a new dump object type DO_EXTENSION_DATA that dumps extension
config table data in SECTION_PRE_DATA during --binary-upgrade ONLY. This
restores the needed data between extension creation and user object
creation, allowing the DDL to succeed.

Four files are modified in bin/pg_dump:

pg_dump.h:
Add DO_EXTENSION_DATA to the DumpableObjectType enum, between DO_EXTENSION
and DO_TYPE

pg_dump_sort.c:
Add PRIO_EXTENSION_DATA between PRIO_EXTENSION and PRIO_TYPE

pg_dump.c:
1. Add makeExtensionDataInfo() to create a TableDataInfo with objType =
DO_EXTENSION_DATA. Called for plain tables (RELKIND_RELATION) during
--binary-upgrade ONLY. As it depends on the table def, the COPY will be
emitted after the CREATE TABLE.
2. Add dumpExtensionData() to emit the entry in SECTION_PRE_DATA with
description "EXTENSION DATA" using dumpTableData_copy(). This allows the
config table data to go into the schema-only dump.
3. In processExtensionTables(), when dopt->binary_upgrade is true, call
makeExtensionDataInfo() instead of makeTableDataInfo(). Additionally, skip
extcondition filter because we need to dump all rows here.
4. Include DO_EXTENSION_DATA in pre-data boundary in
addBoundaryDependencies()

pg_backup_archiver.c:
Add "EXTENSION DATA" to the whitelist in _tocEntryRequired() similar to
BLOB, BLOB METADATA, etc. to include extension config table data in
--schema-only dumps during --binary-upgrade ONLY.

What ends up happening:
=======================
The inserted rows are basically scaffolding to allow the upgrade, and do
not persist. The pg_upgrade sequence goes like:
1. pg_dump includes $oldcluster extension config data in schema-only dump
2. pg_restore replays the dump into $newcluster and "EXTENSION DATA"
entries populate tables like spatial_ref_sys with COPY. Subsequent CREATE
TABLEs with e.g. SRID-constrained columns pass validation.
3. pg_upgrade transfers all data files from $oldcluster to $newcluster,
making spatial_ref_sys byte-for-byte identical to its previous state.

This patch:
1. Does NOT affect normal pg_dumps (without --binary-upgrade).
DO_EXTENSION_DATA objects are not created in this case.
2. Leaves binary_upgrade_create_empty_extension() unchanged.
3. Is not PostGIS-specific, and should solve this class of problem for any
extension that registers config tables that will be needed for DDL
validation.
4. Has been tested against HEAD at 29bf4ee7496 with $oldcluster PostGIS
3.3.9 on PG14 and $newcluster PostGIS 3.7.0dev/master on PG19-devel.

Thanks in advance for your review! Please find attached the patch for HEAD.
I believe this should be easily backpatchable to (at least) PG15, and will
be happy to work on backports.

Best regards,
Jimmy

Attachments:

0001-pg_dump-Restore-extension-config-table-data-before-u.patchtext/x-patch; charset=US-ASCII; name=0001-pg_dump-Restore-extension-config-table-data-before-u.patchDownload+117-3
#2Andrew Dunstan
andrew@dunslane.net
In reply to: Jimmy Angelakos (#1)
Re: [PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade

On 2026-03-20 Fr 1:47 PM, Jimmy Angelakos wrote:

Hi All,

I ran into this issue when pg_upgrade-ing a DB with PostGIS. This is
my first code patch, so any feedback on the approach will be appreciated!

The problem:
============
pg_upgrade uses pg_dump --schema-only --binary-upgrade to copy the
schema from $oldcluster to $newcluster. Because this excludes all
table data, it leaves out data in extension config tables registered
with pg_extension_config_dump().

In $newcluster, binary_upgrade_create_empty_extension() creates the
extensions without populating any table data. The extensions' CREATE
EXTENSION scripts never get executed so any INSERTs are skipped. As a
consequence, if any CREATE TABLE statement in $newcluster requires
validation against these empty config tables, the upgrade fails. As an
example,
PostGIS registers config table spatial_ref_sys to hold ~8500 spatial
reference system definitions (SRIDs). When a table has, e.g. a
geometry column that specifies an SRID, this gets validated during the
CREATE TABLE:

CREATE TABLE points (id int, location geometry(Point, 27700));
ERROR:  Cannot find SRID (27700) in spatial_ref_sys

This will happen for any SRID-constrained column, which will prevent
many real-world PostGIS deployments from being able to pg_upgrade. To
summarise the problem, our ordering is wrong here because extension
configuration data must be present before user tables that depend on
it get created, but --schema-only strips this data.

The patch:
==========
We are adding a new dump object type DO_EXTENSION_DATA that dumps
extension config table data in SECTION_PRE_DATA during
--binary-upgrade ONLY. This restores the needed data between extension
creation and user object creation, allowing the DDL to succeed.

Four files are modified in bin/pg_dump:

pg_dump.h:
Add DO_EXTENSION_DATA to the DumpableObjectType enum, between
DO_EXTENSION and DO_TYPE

pg_dump_sort.c:
Add PRIO_EXTENSION_DATA between PRIO_EXTENSION and PRIO_TYPE

pg_dump.c:
1. Add makeExtensionDataInfo() to create a TableDataInfo with objType
= DO_EXTENSION_DATA. Called for plain tables (RELKIND_RELATION) during
--binary-upgrade ONLY. As it depends on the table def, the COPY will
be emitted after the CREATE TABLE.
2. Add dumpExtensionData() to emit the entry in SECTION_PRE_DATA with
description "EXTENSION DATA" using dumpTableData_copy(). This allows
the config table data to go into the schema-only dump.
3. In processExtensionTables(), when dopt->binary_upgrade is true,
call makeExtensionDataInfo() instead of makeTableDataInfo().
Additionally, skip extcondition filter because we need to dump all
rows here.
4. Include DO_EXTENSION_DATA in pre-data boundary in
addBoundaryDependencies()

pg_backup_archiver.c:
Add "EXTENSION DATA" to the whitelist in _tocEntryRequired() similar
to BLOB, BLOB METADATA, etc. to include extension config table data in
--schema-only dumps during --binary-upgrade ONLY.

What ends up happening:
=======================
The inserted rows are basically scaffolding to allow the upgrade, and
do not persist. The pg_upgrade sequence goes like:
1. pg_dump includes $oldcluster extension config data in schema-only dump
2. pg_restore replays the dump into $newcluster and "EXTENSION DATA"
entries populate tables like spatial_ref_sys with COPY. Subsequent
CREATE TABLEs with e.g. SRID-constrained columns pass validation.
3. pg_upgrade transfers all data files from $oldcluster to
$newcluster, making spatial_ref_sys byte-for-byte identical to its
previous state.

This patch:
1. Does NOT affect normal pg_dumps (without --binary-upgrade).
DO_EXTENSION_DATA objects are not created in this case.
2. Leaves binary_upgrade_create_empty_extension() unchanged.
3. Is not PostGIS-specific, and should solve this class of problem for
any extension that registers config tables that will be needed for DDL
validation.
4. Has been tested against HEAD at 29bf4ee7496 with $oldcluster
PostGIS 3.3.9 on PG14 and $newcluster PostGIS 3.7.0dev/master on
PG19-devel.

Thanks in advance for your review! Please find attached the patch for
HEAD. I believe this should be easily backpatchable to (at least)
PG15, and will be happy to work on backports.

Hi, Jimmy.

First, as you probably know, we don't backpatch features, and I think
this comes into that category. Unfortunately, we're about to close
release 19 for features, so this would need to wait till release 20.

The patch didn't include any tests. It will need them (probably in
src/test/modules/test_pg_dump)

There appears to be a lot of code duplication between
dumpExtensionData() and dumpTableData(). It might be better to refactor
that, perhaps by supplying an extra flag to dumpTableData().

Do make sure to add a Commitfest entry for this is you haven't already
done so.

cheers

andrew

Best regards,
Jimmy

--
Andrew Dunstan
EDB: https://www.enterprisedb.com