Fix a server crash problem from pg_get_database_ddl
Hi,
While doing some testing, I hit a server crash:
```
2026-04-15 11:30:17.377 CST [98179] LOG: client backend (PID 41260) was terminated by signal 11: Segmentation fault: 11
2026-04-15 11:30:17.377 CST [98179] DETAIL: Failed process was running: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
2026-04-15 11:30:17.377 CST [98179] LOG: terminating any other active server processes
2026-04-15 11:30:17.380 CST [44361] FATAL: the database system is in recovery mode
```
After debugging it, I found that the crash happened because I had mistakenly deleted the tablespace entry directly from pg_tablespace, and pg_get_database_ddl_internal() calls get_tablespace_name() without checking whether the return value is NULL.
So this doesn't seem like a bug a normal user could hit. It is more like a superuser-only mistake that creates an invalid catalog state. I think that even in such an edge case, we should raise a proper error instead of crashing the backend.
BTW, I have verified that in this case, ALTER DATABASE ... SET TABLESPACE can move the database to a valid tablespace and recover from the issue.
This patch fixes that by checking for a NULL result and throwing an error.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
v1-0001-ddlutils-error-out-when-pg_get_database_ddl-sees-.patchapplication/octet-stream; name=v1-0001-ddlutils-error-out-when-pg_get_database_ddl-sees-.patch; x-unix-mode=0644Download+7-1
I have reproduced this error against the current master:
```
CREATE TABLESPACE ts1 LOCATION '/workspace/tablespaces/pg_bug_ts1';
CREATE DATABASE db1 TABLESPACE ts1;
DELETE FROM pg_tablespace WHERE spcname = 'ts1';
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```
Backend logs show:
```
[1]: LOG: terminating any other active server processes ``` After applying the patch:
[1]: LOG: terminating any other active server processes ``` After applying the patch:
[1]: LOG: terminating any other active server processes ``` After applying the patch:
```
After applying the patch:
```
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
ERROR: tablespace with OID 16393 does not exist
HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
```
and backend logs show:
```
[56]: STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase); ``` All tests pass.
[56]: STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase); ``` All tests pass.
[56]: STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase); ``` All tests pass.
```
All tests pass.
The only note I'd have on the code change is that there is no accompanying test. It seems like a TAP test would be reasonable, but I am quite new and will defer to whether you think that's the right call or even necessary.
Jack
On Wed, 15 Apr 2026 at 20:44, "Jack Bonatakis" <jack@bonatak.is> wrote:
I have reproduced this error against the current master:
```
CREATE TABLESPACE ts1 LOCATION '/workspace/tablespaces/pg_bug_ts1';
CREATE DATABASE db1 TABLESPACE ts1;
DELETE FROM pg_tablespace WHERE spcname = 'ts1';
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```
Backend logs show:```
[1] LOG: client backend (PID 15420) was terminated by signal 11: Segmentation fault
[1] DETAIL: Failed process was running: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
[1] LOG: terminating any other active server processes
```
After applying the patch:```
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
ERROR: tablespace with OID 16393 does not exist
HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
```
and backend logs show:```
[56] ERROR: tablespace with OID 16393 does not exist
[56] HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
[56] STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
```
All tests pass.The only note I'd have on the code change is that there is no accompanying test. It seems like a TAP test would be
reasonable, but I am quite new and will defer to whether you think that's the right call or even necessary.Jack
This seems similar to [1]/messages/by-id/CAJTYsWXcd324VELk=9KdsfTsua9So3Yexqv7N3B23h9zAUD40g@mail.gmail.com.. Could you please confirm?
[1]: /messages/by-id/CAJTYsWXcd324VELk=9KdsfTsua9So3Yexqv7N3B23h9zAUD40g@mail.gmail.com.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
On Apr 16, 2026, at 09:23, Japin Li <japinli@hotmail.com> wrote:
On Wed, 15 Apr 2026 at 20:44, "Jack Bonatakis" <jack@bonatak.is> wrote:
I have reproduced this error against the current master:
```
CREATE TABLESPACE ts1 LOCATION '/workspace/tablespaces/pg_bug_ts1';
CREATE DATABASE db1 TABLESPACE ts1;
DELETE FROM pg_tablespace WHERE spcname = 'ts1';
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```
Backend logs show:```
[1] LOG: client backend (PID 15420) was terminated by signal 11: Segmentation fault
[1] DETAIL: Failed process was running: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
[1] LOG: terminating any other active server processes
```
After applying the patch:```
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
ERROR: tablespace with OID 16393 does not exist
HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
```
and backend logs show:```
[56] ERROR: tablespace with OID 16393 does not exist
[56] HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
[56] STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
```
All tests pass.The only note I'd have on the code change is that there is no accompanying test. It seems like a TAP test would be
reasonable, but I am quite new and will defer to whether you think that's the right call or even necessary.Jack
This seems similar to [1]. Could you please confirm?
[1] /messages/by-id/CAJTYsWXcd324VELk=9KdsfTsua9So3Yexqv7N3B23h9zAUD40g@mail.gmail.com.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Thanks for printing out that. Yes, they are similar.
I agree with what Tom said in [2]/messages/by-id/1538113.1768921841@sss.pgh.pa.us:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```
So, let me take back this patch.
[2]: /messages/by-id/1538113.1768921841@sss.pgh.pa.us
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi,
Adding Tom to the thread explicitly to seek his opinion.
On Wed, Apr 15, 2026 at 6:36 PM Chao Li <li.evan.chao@gmail.com> wrote:
On Apr 16, 2026, at 09:23, Japin Li <japinli@hotmail.com> wrote:
On Wed, 15 Apr 2026 at 20:44, "Jack Bonatakis" <jack@bonatak.is> wrote:
I have reproduced this error against the current master:
```
CREATE TABLESPACE ts1 LOCATION '/workspace/tablespaces/pg_bug_ts1';
CREATE DATABASE db1 TABLESPACE ts1;
DELETE FROM pg_tablespace WHERE spcname = 'ts1';
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```
Backend logs show:```
[1] LOG: client backend (PID 15420) was terminated by signal 11:Segmentation fault
[1] DETAIL: Failed process was running: SELECT * FROM
pg_get_database_ddl('db1'::regdatabase);
[1] LOG: terminating any other active server processes
```
After applying the patch:```
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
ERROR: tablespace with OID 16393 does not exist
HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a validtablespace.
```
and backend logs show:```
[56] ERROR: tablespace with OID 16393 does not exist
[56] HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to avalid tablespace.
[56] STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
```
All tests pass.The only note I'd have on the code change is that there is no
accompanying test. It seems like a TAP test would be
reasonable, but I am quite new and will defer to whether you think
that's the right call or even necessary.
Jack
This seems similar to [1]. Could you please confirm?
[1]
/messages/by-id/CAJTYsWXcd324VELk=9KdsfTsua9So3Yexqv7N3B23h9zAUD40g@mail.gmail.com
.--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.Thanks for printing out that. Yes, they are similar.
I agree with what Tom said in [2]:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```So, let me take back this patch.
In this case, it is a very corner case but not something superuser
intentionally breaks.
For example, a concurrent tablespace drop + database ddl to assign a
different tablespace or default.
We aren't acquiring Access Share lock on the DB in this function
(intentional) so it is a good practice
to do the null checks. Of course, it makes more sense to add this comment
while doing a code review.
I will let Tom and others chime in with their thoughts on fixing this.
Attached an injection point test to show the race. Not intended to commit.
Thanks,
Satya
Attachments:
0002-Test-pg_get_database_ddl-race-with-concurrent-tablespace-drop.patchapplication/octet-stream; name=0002-Test-pg_get_database_ddl-race-with-concurrent-tablespace-drop.patchDownload+118-2
On 2026-04-23 Th 2:47 AM, SATYANARAYANA NARLAPURAM wrote:
Thanks for printing out that. Yes, they are similar.
I agree with what Tom said in [2]:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```So, let me take back this patch.
[2]
/messages/by-id/1538113.1768921841@sss.pgh.pa.usIn this case, it is a very corner case but not something superuser
intentionally breaks.
For example, a concurrent tablespace drop + database ddl to assign a
different tablespace or default.
We aren't acquiring Access Share lock on the DB in this function
(intentional) so it is a good practice
to do the null checks. Of course, it makes more sense to add this
comment while doing a code review.
I will let Tom and others chime in with their thoughts on fixing this.Attached an injection point test to show the race. Not intended to commit.
I agree if there's a race condition we should protect against it. I
don't much like the idea of silently ignoring it, though. Raising an
error seems more like the right thing to do.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
On Apr 26, 2026, at 22:50, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2026-04-23 Th 2:47 AM, SATYANARAYANA NARLAPURAM wrote:
Thanks for printing out that. Yes, they are similar.
I agree with what Tom said in [2]:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```So, let me take back this patch.
[2] /messages/by-id/1538113.1768921841@sss.pgh.pa.us
In this case, it is a very corner case but not something superuser intentionally breaks.
For example, a concurrent tablespace drop + database ddl to assign a different tablespace or default.
We aren't acquiring Access Share lock on the DB in this function (intentional) so it is a good practice
to do the null checks. Of course, it makes more sense to add this comment while doing a code review.
I will let Tom and others chime in with their thoughts on fixing this.Attached an injection point test to show the race. Not intended to commit.
I agree if there's a race condition we should protect against it. I don't much like the idea of silently ignoring it, though. Raising an error seems more like the right thing to do.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
The v1 patch raises an error when the tablespace name is NULL.
PFA v2: removed hint from the error message, because I now consider the hint might not be necessary.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
v2-0001-ddlutils-error-out-when-pg_get_database_ddl-sees-.patchapplication/octet-stream; name=v2-0001-ddlutils-error-out-when-pg_get_database_ddl-sees-.patch; x-unix-mode=0644Download+6-1
Hi Chao,
On Sun, Apr 26, 2026 at 7:03 PM Chao Li <li.evan.chao@gmail.com> wrote:
On Apr 26, 2026, at 22:50, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2026-04-23 Th 2:47 AM, SATYANARAYANA NARLAPURAM wrote:
Thanks for printing out that. Yes, they are similar.
I agree with what Tom said in [2]:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```So, let me take back this patch.
[2]
/messages/by-id/1538113.1768921841@sss.pgh.pa.us
In this case, it is a very corner case but not something superuser
intentionally breaks.
For example, a concurrent tablespace drop + database ddl to assign a
different tablespace or default.
We aren't acquiring Access Share lock on the DB in this function
(intentional) so it is a good practice
to do the null checks. Of course, it makes more sense to add this
comment while doing a code review.
I will let Tom and others chime in with their thoughts on fixing this.
Attached an injection point test to show the race. Not intended to
commit.
I agree if there's a race condition we should protect against it. I
don't much like the idea of silently ignoring it, though. Raising an error
seems more like the right thing to do.cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.comThe v1 patch raises an error when the tablespace name is NULL.
PFA v2: removed hint from the error message, because I now consider the
hint might not be necessary.
+ if (spcname == NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablespace with OID %u does not exist",
+ dbform->dattablespace));
+
A message with error detail that says a concurrent DDL could have dropped a
tablespace could be better?
System catalog is optional.
Something like:
errdetail("The tablespace may have been dropped concurrently, or the system
catalog is inconsistent.")));
Thanks,
Satya