Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

Started by Mahendra Singh Thalorover 6 years ago37 messageshackersbugs
Jump to latest
#1Mahendra Singh Thalor
mahi6run@gmail.com
hackersbugs

Hi hackers,

In other thread "[HACKERS] Block level parallel vacuum"[1]/messages/by-id/CANEvxPorfG2Ck3kuDkm5tWpK+3uCzRiibOJ-Lk4ZJ6wHP4KJfA@mail.gmail.com, Prabhat Kumar
Sahu reported a random assert failure but he got only once and he was not
able to reproduce it. In that thread [2]/messages/by-id/CAA4eK1L-Y7vyo+ypH55kFHy1HS=4h1ZWQ+5fthKBgOdQzz4hOw@mail.gmail.com, Amit Kapila suggested some points
to reproduce assert. I tried to reproduce and I was able to reproduce it
consistently.

Below are the steps to reproduce assert:
*Configure sett*ing:
log_min_messages=debug1
autovacuum_naptime = 5s
autovacuum = on

postgres=# create temporary table temp1(c1 int);
CREATE TABLE
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
-----------+-------+-------+----------+-------------+---------+-------------
pg_temp_3 | temp1 | table | mahendra | temporary | 0 bytes |
(1 row)

postgres=# drop schema pg_temp_3 cascade;
NOTICE: drop cascades to table temp1
DROP SCHEMA
postgres=# \d+
Did not find any relations.
postgres=# create temporary table temp2(c1 int);
CREATE TABLE
postgres=# \d+
Did not find any relations.
postgres=# select pg_sleep(6);
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
postgres=#

*Stack Trace:*
elinux-2.5-12.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64
openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64
xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f80b2ef9277 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f80b2efa968 in __GI_abort () at abort.c:90
#2 0x0000000000ecdd4e in ExceptionalCondition (conditionName=0x11a9bcb
"strvalue != NULL", errorType=0x11a9bbb "FailedAssertion",
fileName=0x11a9bb0 "snprintf.c", lineNumber=442)
at assert.c:67
#3 0x0000000000f80122 in dopr (target=0x7ffe902e44d0, format=0x10e8fe5
".%s\"", args=0x7ffe902e45b8) at snprintf.c:442
#4 0x0000000000f7f821 in pg_vsnprintf (str=0x18cd480 "autovacuum: dropping
orphan temp table \"postgres.", '\177' <repeats 151 times>..., count=1024,
fmt=0x10e8fb8 "autovacuum: dropping orphan temp table \"%s.%s.%s\"",
args=0x7ffe902e45b8) at snprintf.c:195
#5 0x0000000000f74cb3 in pvsnprintf (buf=0x18cd480 "autovacuum: dropping
orphan temp table \"postgres.", '\177' <repeats 151 times>..., len=1024,
fmt=0x10e8fb8 "autovacuum: dropping orphan temp table \"%s.%s.%s\"",
args=0x7ffe902e45b8) at psprintf.c:110
#6 0x0000000000f7775b in appendStringInfoVA (str=0x7ffe902e45d0,
fmt=0x10e8fb8 "autovacuum: dropping orphan temp table \"%s.%s.%s\"",
args=0x7ffe902e45b8) at stringinfo.c:149
#7 0x0000000000ecf5de in errmsg (fmt=0x10e8fb8 "autovacuum: dropping
orphan temp table \"%s.%s.%s\"") at elog.c:832
#8 0x0000000000aef625 in do_autovacuum () at autovacuum.c:2253
#9 0x0000000000aedfae in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1693
#10 0x0000000000aed82f in StartAutoVacWorker () at autovacuum.c:1487
#11 0x0000000000b1773a in StartAutovacuumWorker () at postmaster.c:5562
#12 0x0000000000b16c13 in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:5279
#13 <signal handler called>
#14 0x00007f80b2fb8c53 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:81
#15 0x0000000000b0da27 in ServerLoop () at postmaster.c:1691
#16 0x0000000000b0cfa2 in PostmasterMain (argc=3, argv=0x18cb290) at
postmaster.c:1400
#17 0x000000000097868a in main (argc=3, argv=0x18cb290) at main.c:210

ereport(LOG,
(errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
get_database_name(MyDatabaseId),
get_namespace_name(classForm->relnamespace),
NameStr(classForm->relname))));

I debugged and found that "get_namespace_name(classForm->relnamespace)" was
null so it was crashing.

This bug is introduced or exposed from below mentioned commit:
*commit 246a6c8f7b237cc1943efbbb8a7417da9288f5c4*
Author: Michael Paquier <michael@paquier.xyz>
Date: Mon Aug 13 11:49:04 2018 +0200

Make autovacuum more aggressive to remove orphaned temp tables

Commit dafa084, added in 10, made the removal of temporary orphaned
tables more aggressive. This commit makes an extra step into the

Before above commit, we were not getting any assert failure but \d+ was not
showing any temp table info after "drop schema pg_temp_3 cascade" (for
those tables are created after drooping schema) .

As per my analysis, I can see that while drooping schema of temporary
table, we are not setting myTempNamespace to invalid so at the time of
creating again temporary table, we are not creating proper schema.

We can fix this problem by either one way 1) reset myTempNamespace to
invalid while drooping schema of temp table 2) should not allow to drop
temporary table schema

Please let me know your thoughts to fix this problem.

[1]: /messages/by-id/CANEvxPorfG2Ck3kuDkm5tWpK+3uCzRiibOJ-Lk4ZJ6wHP4KJfA@mail.gmail.com
/messages/by-id/CANEvxPorfG2Ck3kuDkm5tWpK+3uCzRiibOJ-Lk4ZJ6wHP4KJfA@mail.gmail.com
[2]: /messages/by-id/CAA4eK1L-Y7vyo+ypH55kFHy1HS=4h1ZWQ+5fthKBgOdQzz4hOw@mail.gmail.com
/messages/by-id/CAA4eK1L-Y7vyo+ypH55kFHy1HS=4h1ZWQ+5fthKBgOdQzz4hOw@mail.gmail.com

Thanks and Regards
Mahendra Siingh Thalor
EnterpriseDB: http://www.enterprisedb.com

#2Michael Paquier
michael@paquier.xyz
In reply to: Mahendra Singh Thalor (#1)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Tue, Dec 24, 2019 at 04:50:58PM +0530, Mahendra Singh wrote:

We can fix this problem by either one way 1) reset myTempNamespace to
invalid while drooping schema of temp table 2) should not allow to drop
temporary table schema

(Please note that it is better not to cross-post on multiple lists, so
I have removed pgsql-bugs from CC.)

There is a little bit more to that, as we would basically need to do
the work of RemoveTempRelationsCallback() once the temp schema is
dropped, callback registered when the schema is correctly created at
transaction commit (also we need to make sure that
RemoveTempRelationsCallback is not called or unregistered if we were
to authorize DROP SCHEMA on a temp schema). And then all the reset
done at the beginning of AtEOXact_Namespace() would need to happen.

Anyway, as dropping a temporary schema leads to an inconsistent
behavior when recreating new temporary objects in a session that
dropped it, that nobody has actually complained on the matter, and
that in concept a temporary schema is linked to the session that
created it, I think that we have a lot of arguments to just forbid the
operation from happening. Please note as well that it is possible to
drop temporary schemas of other sessions, still this is limited to
owners of the schema.

In short, let's tighten the logic, and we had better back-patch this
one all the way down, 9.4 being broken. Attached is a patch to do
that. The error message generated depends on the state of the session
so I have not added a test for this reason, and the check is added
before the ACL check. We could make the error message more generic,
like "cannot drop temporary namespace". Any thoughts?
--
Michael

Attachments:

drop-temp-schema-v1.patchtext/x-diff; charset=us-asciiDownload+14-0
#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#2)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

At Wed, 25 Dec 2019 11:22:03 +0900, Michael Paquier <michael@paquier.xyz> wrote in

Anyway, as dropping a temporary schema leads to an inconsistent
behavior when recreating new temporary objects in a session that
dropped it, that nobody has actually complained on the matter, and
that in concept a temporary schema is linked to the session that

Agreed.

created it, I think that we have a lot of arguments to just forbid the
operation from happening. Please note as well that it is possible to
drop temporary schemas of other sessions, still this is limited to
owners of the schema.

In short, let's tighten the logic, and we had better back-patch this
one all the way down, 9.4 being broken. Attached is a patch to do
that. The error message generated depends on the state of the session
so I have not added a test for this reason, and the check is added
before the ACL check. We could make the error message more generic,
like "cannot drop temporary namespace". Any thoughts?

Just inhibiting the action seems reasonable to me.

Still the owner can drop temporary namespace on another session or
pg_toast_temp_x of the current session.

isTempnamespace(address.objectId) doesn't work for the purpose.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#3)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Wed, Dec 25, 2019 at 12:18:26PM +0900, Kyotaro Horiguchi wrote:

Still the owner can drop temporary namespace on another session or
pg_toast_temp_x of the current session.

Arf. Yes, this had better be isAnyTempNamespace() so as we complain
about all of them.
--
Michael

#5Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Michael Paquier (#2)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Wed, 25 Dec 2019 at 07:52, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Dec 24, 2019 at 04:50:58PM +0530, Mahendra Singh wrote:

We can fix this problem by either one way 1) reset myTempNamespace to
invalid while drooping schema of temp table 2) should not allow to drop
temporary table schema

(Please note that it is better not to cross-post on multiple lists, so

Sorry. I was not aware of multiple mail ids. I will take care in future mails.

I have removed pgsql-bugs from CC.)

Thanks.

There is a little bit more to that, as we would basically need to do
the work of RemoveTempRelationsCallback() once the temp schema is
dropped, callback registered when the schema is correctly created at
transaction commit (also we need to make sure that
RemoveTempRelationsCallback is not called or unregistered if we were
to authorize DROP SCHEMA on a temp schema). And then all the reset
done at the beginning of AtEOXact_Namespace() would need to happen.

Thanks for quick detailed analysis.

Anyway, as dropping a temporary schema leads to an inconsistent
behavior when recreating new temporary objects in a session that
dropped it, that nobody has actually complained on the matter, and
that in concept a temporary schema is linked to the session that
created it, I think that we have a lot of arguments to just forbid the
operation from happening. Please note as well that it is possible to
drop temporary schemas of other sessions, still this is limited to
owners of the schema.

Yes, you are right that we can drop temporary schema of other sessions.

Even after applying your attached patch, I am getting same assert
failure because I am able to drop " temporary schema" from other
session so I think, we should not allow to drop any temporary schema
from any session.

In short, let's tighten the logic, and we had better back-patch this
one all the way down, 9.4 being broken. Attached is a patch to do

Yes, I also verified that we have to back-patch till v9.4.

that. The error message generated depends on the state of the session
so I have not added a test for this reason, and the check is added
before the ACL check. We could make the error message more generic,
like "cannot drop temporary namespace". Any thoughts?

I think, we can make error message as "cannot drop temporary schema"

While applying attached patch on HEAD, I got below warnings:

[mahendra@localhost postgres]$ git apply drop-temp-schema-v1.patch
drop-temp-schema-v1.patch:9: trailing whitespace.
/*
drop-temp-schema-v1.patch:10: trailing whitespace.
* Prevent drop of a temporary schema as this would mess up with
drop-temp-schema-v1.patch:11: trailing whitespace.
* the end-of-session callback cleaning up all temporary objects.
drop-temp-schema-v1.patch:12: trailing whitespace.
* As the in-memory state is not cleaned up either here, upon
drop-temp-schema-v1.patch:13: trailing whitespace.
* recreation of a temporary schema within the same session the
error: patch failed: src/backend/commands/dropcmds.c:101
error: src/backend/commands/dropcmds.c: patch does not apply

I think, above warnings are due to "trailing CRs" in patch.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

#6Michael Paquier
michael@paquier.xyz
In reply to: Mahendra Singh Thalor (#5)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Wed, Dec 25, 2019 at 10:07:58AM +0530, Mahendra Singh wrote:

Yes, you are right that we can drop temporary schema of other sessions.

I have mentioned that upthread, and basically we need to use
isAnyTempNamespace() here. My mistake.

While applying attached patch on HEAD, I got below warnings:

The patch applies cleanly for me.
--
Michael

#7Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#4)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Wed, Dec 25, 2019 at 12:24:10PM +0900, Michael Paquier wrote:

Arf. Yes, this had better be isAnyTempNamespace() so as we complain
about all of them.

Okay, finally coming back to that. Attached is an updated patch with
polished comments and the fixed logic.
--
Michael

Attachments:

drop-temp-schema-v2.patchtext/x-diff; charset=us-asciiDownload+15-0
#8Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Michael Paquier (#7)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Thu, 26 Dec 2019 at 19:23, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 25, 2019 at 12:24:10PM +0900, Michael Paquier wrote:

Arf. Yes, this had better be isAnyTempNamespace() so as we complain
about all of them.

Okay, finally coming back to that. Attached is an updated patch with
polished comments and the fixed logic.

Thanks Michael for patch.

Patch is fixing all the issues.

I think, we can add a regression test for this.
postgres=# create temporary table temp(c1 int);
CREATE TABLE
postgres=# drop schema pg_temp_3 cascade ;
ERROR: cannot drop temporary namespace "pg_temp_3"
postgres=#

I have one doubt. Please give me your opinion on below doubt.
Let suppose, I connected 10 sessions at a time and created 1 temporary
table to each session. Then it is creating schema from pg_temp_3 to
pg_temp_12 (one schema for each temp table session). After that, I
closed all the 10 sessions but if I connect again any session and
checking all the schema, It is still showing pg_temp_3 to pg_temp_12.
Is this expected behavior ? or we should not display any temp table
schema after closing session. I thought that auto_vacuum wlll drop all
the temp table schema but it is not drooping.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mahendra Singh Thalor (#8)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

Mahendra Singh <mahi6run@gmail.com> writes:

I think, we can add a regression test for this.
postgres=# create temporary table temp(c1 int);
CREATE TABLE
postgres=# drop schema pg_temp_3 cascade ;
ERROR: cannot drop temporary namespace "pg_temp_3"
postgres=#

No, we can't, because the particular temp namespace used by a given
session isn't stable.

I thought that auto_vacuum wlll drop all
the temp table schema but it is not drooping.

Generally speaking, once a particular pg_temp_N schema exists it's
never dropped, just recycled for use by subsequent sessions.

regards, tom lane

#10Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Tom Lane (#9)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Thu, 26 Dec 2019 at 23:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mahendra Singh <mahi6run@gmail.com> writes:

I think, we can add a regression test for this.
postgres=# create temporary table temp(c1 int);
CREATE TABLE
postgres=# drop schema pg_temp_3 cascade ;
ERROR: cannot drop temporary namespace "pg_temp_3"
postgres=#

No, we can't, because the particular temp namespace used by a given
session isn't stable.

I thought that auto_vacuum wlll drop all
the temp table schema but it is not drooping.

Generally speaking, once a particular pg_temp_N schema exists it's
never dropped, just recycled for use by subsequent sessions.

Okay. Understood. Thanks for clarification.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

#11Michael Paquier
michael@paquier.xyz
In reply to: Mahendra Singh Thalor (#10)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Fri, Dec 27, 2019 at 12:33:03AM +0530, Mahendra Singh wrote:

On Thu, 26 Dec 2019 at 23:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:

No, we can't, because the particular temp namespace used by a given
session isn't stable.

And I'd prefer keep the name of the namespace in the error message,
because the information is helpful.

I thought that auto_vacuum wlll drop all
the temp table schema but it is not drooping.

Generally speaking, once a particular pg_temp_N schema exists it's
never dropped, just recycled for use by subsequent sessions.

Okay. Understood. Thanks for clarification.

Please see RemoveTempRelations() for the details, which uses
PERFORM_DELETION_SKIP_ORIGINAL to avoid a drop of the temp schema, and
just work on all the objects the schema includes.

And committed down to 9.4. We use much more "temporary schema" in
error messages actually, so I have switched to that.
--
Michael

#12Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#11)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Fri, Dec 27, 2019 at 4:06 AM Michael Paquier <michael@paquier.xyz> wrote:

And committed down to 9.4. We use much more "temporary schema" in
error messages actually, so I have switched to that.

I think this was a bad idea and that it should be reverted. It seems
to me that the problem here is that you introduced a feature which had
a bug, namely that it couldn't tolerate concurrency, and when somebody
discovered the bug, you "fixed" it not by making the code able to
tolerate concurrent activity but by preventing concurrent activity
from happening in the first place. I think that's wrong on general
principle.

In this specific case, DROP SCHEMA on another temporary sessions's
schema is a feature which has existed for a very long time and which I
have used on multiple occasions to repair damaged databases. Suppose,
for example, there's a catalog entry that prevents the schema from
being dropped. Before this commit, you could fix it or delete the
entry and then retry the drop. Now, you can't. You can maybe wait for
autovacuum to retry it or something, assuming autovacuum is working
and you're in multi-user mode.

But even if that weren't the case, this seems like a very fragile fix.
Maybe someday we'll allow multiple autovacuum workers in the same
database, and the problem comes back. Maybe some user who can't drop
the schema because of this arbitrary prohibition will find themselves
forced to delete the pg_namespace row by hand and then crash the
server. Most server code is pretty careful that to either tolerate
missing system catalog tuples or elog(ERROR), not crash (e.g. cache
lookup failed for ...). This code shouldn't be an exception to that
rule.

Also, as a matter of procedure, 3 days from first post to commit is
not a lot, especially when the day something is posted is Christmas
Eve.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#12)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Sun, Dec 29, 2019 at 07:37:15AM -0500, Robert Haas wrote:

I think this was a bad idea and that it should be reverted. It seems
to me that the problem here is that you introduced a feature which had
a bug, namely that it couldn't tolerate concurrency, and when somebody
discovered the bug, you "fixed" it not by making the code able to
tolerate concurrent activity but by preventing concurrent activity
from happening in the first place. I think that's wrong on general
principle.

Sorry for the delay, there was a long period off here so I could not
have a serious look.

The behavior of the code in 246a6c8 has changed so as a non-existing
temporary namespace is considered as not in use, in which case
autovacuum would consider this relation to be orphaned, and it would
then try to drop it. Anyway, just a revert of the patch is not a good
idea either, because keeping around the old behavior allows any user
to create orphaned relations that autovacuum would just ignore in
9.4~10, leading to ultimately a forced shutdown of the instance as no
cleanup can happen if this goes unnoticed. This also puts pg_class
into an inconsistent state as pg_class entries would include
references to a namespace that does not exist for sessions still
holding its own references to myTempNamespace/myTempToastNamespace.

In this specific case, DROP SCHEMA on another temporary sessions's
schema is a feature which has existed for a very long time and which I
have used on multiple occasions to repair damaged databases. Suppose,
for example, there's a catalog entry that prevents the schema from
being dropped. Before this commit, you could fix it or delete the
entry and then retry the drop. Now, you can't. You can maybe wait for
autovacuum to retry it or something, assuming autovacuum is working
and you're in multi-user mode.

This behavior is broken since its introduction then per the above. If
we were to allow DROP SCHEMA to work properly on temporary schema, we
would need to do more than what we have now, and that does not involve
just mimicking DISCARD TEMP if you really wish to be able to drop the
schema entirely and not only the objects it includes. Allowing a
temporary schema to be dropped only if it is owned by the current
session would be simple enough to implement, but I think that allowing
that to work properly for a schema owned by another session would be
rather difficult to implement for little gains. Now, if you still
wish to be able to do a DROP SCHEMA on a temporary schema, I have no
objections to allow doing that, but under some conditions. So I would
recommend to restrict it so as this operation is not allowed by
default, and I think we ought to use allow_system_table_mods to
control that, because if you were to do that you are an operator and
you know what you are doing. Normally :)

But even if that weren't the case, this seems like a very fragile fix.
Maybe someday we'll allow multiple autovacuum workers in the same
database, and the problem comes back. Maybe some user who can't drop
the schema because of this arbitrary prohibition will find themselves
forced to delete the pg_namespace row by hand and then crash the
server. Most server code is pretty careful that to either tolerate
missing system catalog tuples or elog(ERROR), not crash (e.g. cache
lookup failed for ...). This code shouldn't be an exception to that
rule.

You are right here, things could be done better in 11 and newer
versions, still there are multiple ways to do that. Here are three
suggestions:
1) Issue an elog(ERROR) as that's what we do usually for lookup errors
and such when seeing an orphaned relation which refers to a
non-existing namespace. But this would prevent autovacuum to do
any kind of work and just loop over-and-over on the same error, just
bloating the database involved.
2) Ignore the relation and leave it around, though we really have been
fighting to make autovacuum more aggressive, so that would defeat the
work done lately for that purpose.
3) Still drop the orphaned relation even if it references to a
non-existing schema, generating an appropriate LOG message so as the
problem comes from an incorrect lookup at the namespace name.

Attached is a patch doing two things:
a) Control DROP SCHEMA on a temporary namespace using
allow_system_table_mods.
b) Generate a non-buggy LOG message if trying to remove a temp
relation referring to a temporary schema that does not exist, using
"(null)" as a replacement for the schema name.

My suggestion is to do a) down to 9.4 if that's thought to be helpful
to have, and at least Robert visibly thinks so, then b) in 11~ as
that's where 246a6c8 exists. Comments welcome.
--
Michael

Attachments:

drop-temp-schema-adjust-v1.patchtext/x-diff; charset=us-asciiDownload+12-6
#14Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#13)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Sun, Jan 5, 2020 at 8:42 PM Michael Paquier <michael@paquier.xyz> wrote:

The behavior of the code in 246a6c8 has changed so as a non-existing
temporary namespace is considered as not in use, in which case
autovacuum would consider this relation to be orphaned, and it would
then try to drop it. Anyway, just a revert of the patch is not a good
idea either, because keeping around the old behavior allows any user
to create orphaned relations that autovacuum would just ignore in
9.4~10, leading to ultimately a forced shutdown of the instance as no
cleanup can happen if this goes unnoticed. This also puts pg_class
into an inconsistent state as pg_class entries would include
references to a namespace that does not exist for sessions still
holding its own references to myTempNamespace/myTempToastNamespace.

I'm not arguing for a revert of 246a6c8. I think we should just change this:

ereport(LOG,
(errmsg("autovacuum: dropping orphan
temp table \"%s.%s.%s\"",
get_database_name(MyDatabaseId),

get_namespace_name(classForm->relnamespace),
NameStr(classForm->relname))));

To look more like:

char *nspname = get_namespace_name(classForm->relnamespace);
if (nspname != NULL)
ereport(..."autovacuum: dropping orphan temp table \"%s.%s.%s\"...)
else
ereport(..."autovacuum: dropping orphan temp table with OID %u"....)

If we do that, then I think we can just revert
a052f6cbb84e5630d50b68586cecc127e64be639 completely. As a side
benefit, this would also provide some insurance against other
possibly-problematic situations, like a corrupted pg_class row with a
garbage value in the relnamespace field, which is something I've seen
multiple times in the field.

I can't quite understand your comments about why we shouldn't do that,
but the reported bug is just a null pointer reference. Incredibly,
autovacuum.c seems to have been using get_namespace_name() without a
null check since 2006, so it's not really the fault of your patch as I
had originally thought. I wonder how in the world we've managed to get
away with it for as long as we have.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

Robert Haas <robertmhaas@gmail.com> writes:

I'm not arguing for a revert of 246a6c8. I think we should just change this:
...
To look more like:

char *nspname = get_namespace_name(classForm->relnamespace);
if (nspname != NULL)
ereport(..."autovacuum: dropping orphan temp table \"%s.%s.%s\"...)
else
ereport(..."autovacuum: dropping orphan temp table with OID %u"....)

If we do that, then I think we can just revert
a052f6cbb84e5630d50b68586cecc127e64be639 completely.

+1 to both of those --- although I think we could still provide the
table name in the null-nspname case.

autovacuum.c seems to have been using get_namespace_name() without a
null check since 2006, so it's not really the fault of your patch as I
had originally thought. I wonder how in the world we've managed to get
away with it for as long as we have.

Maybe we haven't. It's not clear that infrequent autovac crashes would
get reported to us, or that we'd successfully find the cause if they were.

I think what you propose above is a back-patchable bug fix.

regards, tom lane

#16Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#15)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Mon, Jan 06, 2020 at 12:33:47PM -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm not arguing for a revert of 246a6c8. I think we should just change this:
...
To look more like:

char *nspname = get_namespace_name(classForm->relnamespace);
if (nspname != NULL)
ereport(..."autovacuum: dropping orphan temp table \"%s.%s.%s\"...)
else
ereport(..."autovacuum: dropping orphan temp table with OID %u"....)

If we do that, then I think we can just revert
a052f6cbb84e5630d50b68586cecc127e64be639 completely.

+1 to both of those --- although I think we could still provide the
table name in the null-nspname case.

Okay for the first one, printing the OID sounds like a good idea.
Like Tom, I would prefer keeping the relation name with "(null)" for
the schema name. Or even better, could we just print the OID all the
time? What's preventing us from showing that information in the first
place? And that still looks good to have when debugging issues IMO
for orphaned entries.

For the second one, I would really wish that we keep the restriction
put in place by a052f6c until we actually figure out how to make the
operation safe in the ways we want it to work because this puts
the catalogs into an inconsistent state for any object type able to
use a temporary schema, like functions, domains etc. for example able
to use "pg_temp" as a synonym for the temp namespace name. And any
connected user is able to do that. On top of that, except for tables,
these could remain as orphaned entries after a crash, no? Allowing
the operation only via allow_system_table_mods gives an exit path
actually if you really wish to do so, which is fine by me as startup
controls that, aka an administrator.

In short, I don't think that it is sane to keep in place the property,
visibly accidental (?) for any user to create inconsistent catalog
entries using a static state in the session which is incorrect in
namespace.c, except if we make DROP SCHEMA on a temporary schema have
a behavior close to DISCARD TEMP. Again, for the owner of the session
that's simple, no clear idea how to do that safely when the drop is
done from another session not owning the temp schema.

Maybe we haven't. It's not clear that infrequent autovac crashes would
get reported to us, or that we'd successfully find the cause if they were.

I think what you propose above is a back-patchable bug fix.

Yeah, likely it is safer to fix the logs in the long run down to 9.4.
--
Michael

#17Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#16)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Mon, Jan 6, 2020 at 7:22 PM Michael Paquier <michael@paquier.xyz> wrote:

Okay for the first one, printing the OID sounds like a good idea.
Like Tom, I would prefer keeping the relation name with "(null)" for
the schema name. Or even better, could we just print the OID all the
time? What's preventing us from showing that information in the first
place? And that still looks good to have when debugging issues IMO
for orphaned entries.

I think we should have two different messages, rather than trying to
shoehorn things into one message using a fake schema name.

For the second one, I would really wish that we keep the restriction
put in place by a052f6c until we actually figure out how to make the
operation safe in the ways we want it to work because this puts
the catalogs into an inconsistent state for any object type able to
use a temporary schema, like functions, domains etc. for example able
to use "pg_temp" as a synonym for the temp namespace name. And any
connected user is able to do that.

So what?

On top of that, except for tables,
these could remain as orphaned entries after a crash, no?

Tables, too, although they want have storage any more. But your patch
in no way prevents that. It just makes it harder to fix when it does
happen. So I see no advantages of it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Jan 6, 2020 at 7:22 PM Michael Paquier <michael@paquier.xyz> wrote:

For the second one, I would really wish that we keep the restriction
put in place by a052f6c until we actually figure out how to make the
operation safe in the ways we want it to work because this puts
the catalogs into an inconsistent state for any object type able to
use a temporary schema, like functions, domains etc. for example able
to use "pg_temp" as a synonym for the temp namespace name. And any
connected user is able to do that.

So what?

I still agree with Robert that a052f6c is a bad idea. It's not the case
that that's blocking "any connected user" from causing an issue. The
temp schemas are always owned by the bootstrap superuser, so only a
superuser could delete them. All that that patch is doing is preventing
superusers from doing something that they could reasonably wish to do,
and that is perfectly safe when there's not concurrent usage of the
schema. We are not normally that nanny-ish, and the case for being so
here seems pretty thin.

regards, tom lane

#19Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#18)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

On Tue, Jan 07, 2020 at 01:06:08PM -0500, Tom Lane wrote:

I still agree with Robert that a052f6c is a bad idea. It's not the case
that that's blocking "any connected user" from causing an issue. The
temp schemas are always owned by the bootstrap superuser, so only a
superuser could delete them. All that that patch is doing is preventing
superusers from doing something that they could reasonably wish to do,
and that is perfectly safe when there's not concurrent usage of the
schema. We are not normally that nanny-ish, and the case for being so
here seems pretty thin.

Okay, I am running out of arguments then, so attached is a patch to
address things. I would also prefer if we keep the relation name in
the log even if the namespace is missing.
--
Michael

Attachments:

drop-temp-schema-adjust-v2.patchtext/x-diff; charset=us-asciiDownload+16-20
#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#19)
hackersbugs
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

Michael Paquier <michael@paquier.xyz> writes:

Okay, I am running out of arguments then, so attached is a patch to
address things. I would also prefer if we keep the relation name in
the log even if the namespace is missing.

A couple of thoughts:

* Please revert a052f6c as a separate commit specifically doing that,
so that when it comes time to make the release notes, it's clear that
a052f6c doesn't require documentation.

* I think the check on log_min_messages <= LOG is probably wrong, since
LOG sorts out of order for this purpose. Compare is_log_level_output()
in elog.c. I'd suggest not bothering with trying to optimize away the
get_namespace_name call here; we shouldn't be in this code path often
enough for performance to matter, and nobody ever cared about it before.

* I don't greatly like the notation
dropping orphan temp table \"%s.(null).%s\" ...
and I bet Robert won't either. Not sure offhand about a better
idea --- maybe
dropping orphan temp table \"%s\" with OID %u in database \"%s\"

regards, tom lane

#21Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#20)
hackersbugs
#22Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#21)
hackersbugs
#23Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Michael Paquier (#22)
hackersbugs
#24Michael Paquier
michael@paquier.xyz
In reply to: Mahendra Singh Thalor (#23)
hackersbugs
#25Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#24)
hackersbugs
#26Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Michael Paquier (#25)
hackersbugs
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#25)
hackersbugs
#28Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#27)
hackersbugs
#29Michael Paquier
michael@paquier.xyz
In reply to: Mahendra Singh Thalor (#26)
hackersbugs
#30Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#25)
hackersbugs
#31Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Michael Paquier (#30)
hackersbugs
#32Michael Paquier
michael@paquier.xyz
In reply to: Mahendra Singh Thalor (#31)
hackersbugs
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#30)
hackersbugs
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#33)
hackersbugs
#35Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#34)
hackersbugs
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#35)
hackersbugs
#37Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#36)
hackersbugs