Fix a server crash problem from pg_get_database_ddl

Started by Chao Li15 days ago8 messageshackers
Jump to latest
#1Chao Li
li.evan.chao@gmail.com

Hi,

While doing some testing, I hit a server crash:
```
2026-04-15 11:30:17.377 CST [98179] LOG: client backend (PID 41260) was terminated by signal 11: Segmentation fault: 11
2026-04-15 11:30:17.377 CST [98179] DETAIL: Failed process was running: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
2026-04-15 11:30:17.377 CST [98179] LOG: terminating any other active server processes
2026-04-15 11:30:17.380 CST [44361] FATAL: the database system is in recovery mode
```

After debugging it, I found that the crash happened because I had mistakenly deleted the tablespace entry directly from pg_tablespace, and pg_get_database_ddl_internal() calls get_tablespace_name() without checking whether the return value is NULL.

So this doesn't seem like a bug a normal user could hit. It is more like a superuser-only mistake that creates an invalid catalog state. I think that even in such an edge case, we should raise a proper error instead of crashing the backend.

BTW, I have verified that in this case, ALTER DATABASE ... SET TABLESPACE can move the database to a valid tablespace and recover from the issue.

This patch fixes that by checking for a NULL result and throwing an error.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v1-0001-ddlutils-error-out-when-pg_get_database_ddl-sees-.patchapplication/octet-stream; name=v1-0001-ddlutils-error-out-when-pg_get_database_ddl-sees-.patch; x-unix-mode=0644Download+7-1
#2Jack Bonatakis
jack@bonatak.is
In reply to: Chao Li (#1)
Re: Fix a server crash problem from pg_get_database_ddl

I have reproduced this error against the current master:

```
CREATE TABLESPACE ts1 LOCATION '/workspace/tablespaces/pg_bug_ts1';
CREATE DATABASE db1 TABLESPACE ts1;
DELETE FROM pg_tablespace WHERE spcname = 'ts1';
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```
Backend logs show:

```
[1]: LOG: terminating any other active server processes ``` After applying the patch:
[1]: LOG: terminating any other active server processes ``` After applying the patch:
[1]: LOG: terminating any other active server processes ``` After applying the patch:
```
After applying the patch:

```
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
ERROR: tablespace with OID 16393 does not exist
HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
```
and backend logs show:

```
[56]: STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase); ``` All tests pass.
[56]: STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase); ``` All tests pass.
[56]: STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase); ``` All tests pass.
```
All tests pass.

The only note I'd have on the code change is that there is no accompanying test. It seems like a TAP test would be reasonable, but I am quite new and will defer to whether you think that's the right call or even necessary.

Jack

#3Japin Li
japinli@hotmail.com
In reply to: Jack Bonatakis (#2)
Re: Fix a server crash problem from pg_get_database_ddl

On Wed, 15 Apr 2026 at 20:44, "Jack Bonatakis" <jack@bonatak.is> wrote:

I have reproduced this error against the current master:

```
CREATE TABLESPACE ts1 LOCATION '/workspace/tablespaces/pg_bug_ts1';
CREATE DATABASE db1 TABLESPACE ts1;
DELETE FROM pg_tablespace WHERE spcname = 'ts1';
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```
Backend logs show:

```
[1] LOG: client backend (PID 15420) was terminated by signal 11: Segmentation fault
[1] DETAIL: Failed process was running: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
[1] LOG: terminating any other active server processes
```
After applying the patch:

```
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
ERROR: tablespace with OID 16393 does not exist
HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
```
and backend logs show:

```
[56] ERROR: tablespace with OID 16393 does not exist
[56] HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
[56] STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
```
All tests pass.

The only note I'd have on the code change is that there is no accompanying test. It seems like a TAP test would be
reasonable, but I am quite new and will defer to whether you think that's the right call or even necessary.

Jack

This seems similar to [1]/messages/by-id/CAJTYsWXcd324VELk=9KdsfTsua9So3Yexqv7N3B23h9zAUD40g@mail.gmail.com.. Could you please confirm?

[1]: /messages/by-id/CAJTYsWXcd324VELk=9KdsfTsua9So3Yexqv7N3B23h9zAUD40g@mail.gmail.com.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

#4Chao Li
li.evan.chao@gmail.com
In reply to: Japin Li (#3)
Re: Fix a server crash problem from pg_get_database_ddl

On Apr 16, 2026, at 09:23, Japin Li <japinli@hotmail.com> wrote:

On Wed, 15 Apr 2026 at 20:44, "Jack Bonatakis" <jack@bonatak.is> wrote:

I have reproduced this error against the current master:

```
CREATE TABLESPACE ts1 LOCATION '/workspace/tablespaces/pg_bug_ts1';
CREATE DATABASE db1 TABLESPACE ts1;
DELETE FROM pg_tablespace WHERE spcname = 'ts1';
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```
Backend logs show:

```
[1] LOG: client backend (PID 15420) was terminated by signal 11: Segmentation fault
[1] DETAIL: Failed process was running: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
[1] LOG: terminating any other active server processes
```
After applying the patch:

```
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
ERROR: tablespace with OID 16393 does not exist
HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
```
and backend logs show:

```
[56] ERROR: tablespace with OID 16393 does not exist
[56] HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid tablespace.
[56] STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
```
All tests pass.

The only note I'd have on the code change is that there is no accompanying test. It seems like a TAP test would be
reasonable, but I am quite new and will defer to whether you think that's the right call or even necessary.

Jack

This seems similar to [1]. Could you please confirm?

[1] /messages/by-id/CAJTYsWXcd324VELk=9KdsfTsua9So3Yexqv7N3B23h9zAUD40g@mail.gmail.com.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

Thanks for printing out that. Yes, they are similar.

I agree with what Tom said in [2]/messages/by-id/1538113.1768921841@sss.pgh.pa.us:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```

So, let me take back this patch.

[2]: /messages/by-id/1538113.1768921841@sss.pgh.pa.us

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#5SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Chao Li (#4)
Re: Fix a server crash problem from pg_get_database_ddl

Hi,

Adding Tom to the thread explicitly to seek his opinion.

On Wed, Apr 15, 2026 at 6:36 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Apr 16, 2026, at 09:23, Japin Li <japinli@hotmail.com> wrote:

On Wed, 15 Apr 2026 at 20:44, "Jack Bonatakis" <jack@bonatak.is> wrote:

I have reproduced this error against the current master:

```
CREATE TABLESPACE ts1 LOCATION '/workspace/tablespaces/pg_bug_ts1';
CREATE DATABASE db1 TABLESPACE ts1;
DELETE FROM pg_tablespace WHERE spcname = 'ts1';
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```
Backend logs show:

```
[1] LOG: client backend (PID 15420) was terminated by signal 11:

Segmentation fault

[1] DETAIL: Failed process was running: SELECT * FROM

pg_get_database_ddl('db1'::regdatabase);

[1] LOG: terminating any other active server processes
```
After applying the patch:

```
SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
ERROR: tablespace with OID 16393 does not exist
HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a valid

tablespace.

```
and backend logs show:

```
[56] ERROR: tablespace with OID 16393 does not exist
[56] HINT: To recover, try ALTER DATABASE ... SET TABLESPACE ... to a

valid tablespace.

[56] STATEMENT: SELECT * FROM pg_get_database_ddl('db1'::regdatabase);
```
All tests pass.

The only note I'd have on the code change is that there is no

accompanying test. It seems like a TAP test would be

reasonable, but I am quite new and will defer to whether you think

that's the right call or even necessary.

Jack

This seems similar to [1]. Could you please confirm?

[1]

/messages/by-id/CAJTYsWXcd324VELk=9KdsfTsua9So3Yexqv7N3B23h9zAUD40g@mail.gmail.com
.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

Thanks for printing out that. Yes, they are similar.

I agree with what Tom said in [2]:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```

So, let me take back this patch.

[2] /messages/by-id/1538113.1768921841@sss.pgh.pa.us

In this case, it is a very corner case but not something superuser
intentionally breaks.
For example, a concurrent tablespace drop + database ddl to assign a
different tablespace or default.
We aren't acquiring Access Share lock on the DB in this function
(intentional) so it is a good practice
to do the null checks. Of course, it makes more sense to add this comment
while doing a code review.
I will let Tom and others chime in with their thoughts on fixing this.

Attached an injection point test to show the race. Not intended to commit.

Thanks,
Satya

Attachments:

0002-Test-pg_get_database_ddl-race-with-concurrent-tablespace-drop.patchapplication/octet-stream; name=0002-Test-pg_get_database_ddl-race-with-concurrent-tablespace-drop.patchDownload+118-2
#6Andrew Dunstan
andrew@dunslane.net
In reply to: SATYANARAYANA NARLAPURAM (#5)
Re: Fix a server crash problem from pg_get_database_ddl

On 2026-04-23 Th 2:47 AM, SATYANARAYANA NARLAPURAM wrote:

Thanks for printing out that. Yes, they are similar.

I agree with what Tom said in [2]:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```

So, let me take back this patch.

[2]
/messages/by-id/1538113.1768921841@sss.pgh.pa.us

In this case, it is a very corner case but not something superuser
intentionally breaks.
For example, a concurrent tablespace drop + database ddl to assign a
different tablespace or default.
We aren't acquiring Access Share lock on the DB in this function
(intentional) so it is a good practice
to do the null checks. Of course, it makes more sense to add this
comment while doing a code review.
I will let Tom and others chime in with their thoughts on fixing this.

Attached an injection point test to show the race. Not intended to commit.

I agree if there's a race condition we should protect against it. I
don't much like the idea of silently ignoring it, though. Raising an
error seems more like the right thing to do.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

#7Chao Li
li.evan.chao@gmail.com
In reply to: Andrew Dunstan (#6)
Re: Fix a server crash problem from pg_get_database_ddl

On Apr 26, 2026, at 22:50, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2026-04-23 Th 2:47 AM, SATYANARAYANA NARLAPURAM wrote:

Thanks for printing out that. Yes, they are similar.

I agree with what Tom said in [2]:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```

So, let me take back this patch.

[2] /messages/by-id/1538113.1768921841@sss.pgh.pa.us
In this case, it is a very corner case but not something superuser intentionally breaks.
For example, a concurrent tablespace drop + database ddl to assign a different tablespace or default.
We aren't acquiring Access Share lock on the DB in this function (intentional) so it is a good practice
to do the null checks. Of course, it makes more sense to add this comment while doing a code review.
I will let Tom and others chime in with their thoughts on fixing this.

Attached an injection point test to show the race. Not intended to commit.

I agree if there's a race condition we should protect against it. I don't much like the idea of silently ignoring it, though. Raising an error seems more like the right thing to do.

cheers

andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com

The v1 patch raises an error when the tablespace name is NULL.

PFA v2: removed hint from the error message, because I now consider the hint might not be necessary.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v2-0001-ddlutils-error-out-when-pg_get_database_ddl-sees-.patchapplication/octet-stream; name=v2-0001-ddlutils-error-out-when-pg_get_database_ddl-sees-.patch; x-unix-mode=0644Download+6-1
#8SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Chao Li (#7)
Re: Fix a server crash problem from pg_get_database_ddl

Hi Chao,

On Sun, Apr 26, 2026 at 7:03 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Apr 26, 2026, at 22:50, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2026-04-23 Th 2:47 AM, SATYANARAYANA NARLAPURAM wrote:

Thanks for printing out that. Yes, they are similar.

I agree with what Tom said in [2]:
```
This is not a bug. This is a superuser intentionally breaking
the system by corrupting the catalogs. There are any number
of ways to cause trouble with ill-advised manual updates to a
catalog table. Try, eg, "DELETE FROM pg_proc" (... but not in
a database you care about).
```

So, let me take back this patch.

[2]

/messages/by-id/1538113.1768921841@sss.pgh.pa.us

In this case, it is a very corner case but not something superuser

intentionally breaks.

For example, a concurrent tablespace drop + database ddl to assign a

different tablespace or default.

We aren't acquiring Access Share lock on the DB in this function

(intentional) so it is a good practice

to do the null checks. Of course, it makes more sense to add this

comment while doing a code review.

I will let Tom and others chime in with their thoughts on fixing this.

Attached an injection point test to show the race. Not intended to

commit.

I agree if there's a race condition we should protect against it. I

don't much like the idea of silently ignoring it, though. Raising an error
seems more like the right thing to do.

cheers

andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com

The v1 patch raises an error when the tablespace name is NULL.

PFA v2: removed hint from the error message, because I now consider the
hint might not be necessary.

+ if (spcname == NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablespace with OID %u does not exist",
+   dbform->dattablespace));
+

A message with error detail that says a concurrent DDL could have dropped a
tablespace could be better?
System catalog is optional.
Something like:

errdetail("The tablespace may have been dropped concurrently, or the system
catalog is inconsistent.")));

Thanks,
Satya