removal of dangling temp tables

Started by Alvaro Herreraabout 7 years ago29 messages
#1Alvaro Herrera
alvherre@2ndquadrant.com

We recently ran into a funny situation, where autovacuum would not
remove very old temp tables. The relfrozenxid of those tables was about
to reach the max freeze age, so monitoring started to complain. It
turned out that autovacuum saw that the backendId was used by a live
backend ... but that session was in reality not using those temp tables,
and had not used any temp table at all. They were left-overs from a
crash months ago, and since the session was not using temp tables, they
had not been removed. DISCARD ALL may have run, but had no effect.

I think the best way to fix this is to call RemoveTempRelations()
unconditionally at session start (without doing the rest of the temp
table setup, just the removal.)

In versions earlier than pg11, related issues occur if you have a crash
with autovacuum off and/or workers disabled, and temp tables are leaked
in backendID 1 or 2; then start with normal values. In that case, those
backendIDs are used by the logical replication launcher and the
autovacuum launcher, so autovacuum does not remove them either. This
was fixed in PG11 inadvertently by this commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=943576bddcb52971041d9f5f806789921fa107ee
The reason the commit fixes it is that now the databaseID of the PGPROC
entry is compared to the temp table's database; and for those worker
processes, the DatabaseId is InvalidOid so it all works out.

This isn't a terribly interesting bug, as restarting with changed
worker/autovacuum options after a crash that happens to leak temp tables
should be quite rare. But anyway we can fix this second issue in prior
branches by adding a comparison to databaseId to the existing 'if' test
in autovacuum; easy enough, no compatibility concerns. This should also
cover the case that one session crashes leaking temp tables, then the
same session connects to a different database for a long time. (I
didn't verify this last point to be a real problem.)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#2Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#1)
Re: removal of dangling temp tables

On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I think the best way to fix this is to call RemoveTempRelations()
unconditionally at session start (without doing the rest of the temp
table setup, just the removal.)

That would certainly simplify things. I think I thought about that as
far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed
like a significant behavior change and I wasn't sure that everyone
would like it. In particular, it adds overhead to backend startup
that, in the case of a large temp schema, could be fairly long.

Nevertheless, I tentatively think that a change like this is a good
idea. I wouldn't back-patch it, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#2)
1 attachment(s)
Re: removal of dangling temp tables

On 2018-Dec-14, Robert Haas wrote:

On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I think the best way to fix this is to call RemoveTempRelations()
unconditionally at session start (without doing the rest of the temp
table setup, just the removal.)

That would certainly simplify things. I think I thought about that as
far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed
like a significant behavior change and I wasn't sure that everyone
would like it. In particular, it adds overhead to backend startup
that, in the case of a large temp schema, could be fairly long.

Hmm, I think in the case covered by your commit, that is a session that
crashes with a few thousands of temp tables, this new patch might cause
a failure to open a new session altogether.

Maybe it'd be better to change temp table removal to always drop
max_locks_per_transaction objects at a time (ie. commit/start a new
transaction every so many objects).

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

clean-temp-ns-startup.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 13a24631fc..003bbabf1f 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4118,6 +4118,26 @@ RemoveTempRelations(Oid tempNamespaceId)
 }
 
 /*
+ * Remove temp tables, without relying on myTempNamespace being set, and
+ * without setting it.  This is used at session start.
+ */
+void
+InitRemoveTempRelations(void)
+{
+	Oid			namespaceOid;
+	char		namespaceName[NAMEDATALEN];
+
+	Assert(MyBackendId != InvalidBackendId);
+
+	snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
+
+	namespaceOid = get_namespace_oid(namespaceName, true);
+
+	if (OidIsValid(namespaceOid))
+		RemoveTempRelations(namespaceOid);
+}
+
+/*
  * Callback to remove temp relations at backend exit.
  */
 static void
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index b636b1e262..b41b47222c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -1074,6 +1074,10 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (!bootstrap)
 		pgstat_bestart();
 
+	/* Remove any temp tables that might have leaked previously */
+	if (!bootstrap)
+		InitRemoveTempRelations();
+
 	/* close the transaction we started above */
 	if (!bootstrap)
 		CommitTransactionCommand();
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 0e202372d5..f90eeff1b7 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -144,6 +144,7 @@ extern void GetTempNamespaceState(Oid *tempNamespaceId,
 					  Oid *tempToastNamespaceId);
 extern void SetTempNamespaceState(Oid tempNamespaceId,
 					  Oid tempToastNamespaceId);
+extern void InitRemoveTempRelations(void);
 extern void ResetTempTableNamespace(void);
 
 extern OverrideSearchPath *GetOverrideSearchPath(MemoryContext context);
#4Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#3)
Re: removal of dangling temp tables

On Fri, Dec 14, 2018 at 12:27 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Hmm, I think in the case covered by your commit, that is a session that
crashes with a few thousands of temp tables, this new patch might cause
a failure to open a new session altogether.

Oh, good point. Or if the catalog is corrupted.

Maybe it'd be better to change temp table removal to always drop
max_locks_per_transaction objects at a time (ie. commit/start a new
transaction every so many objects).

We're basically just doing DROP SCHEMA ... CASCADE, so I'm not sure
how we'd implement that, but I agree it would be significantly better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: removal of dangling temp tables

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I think the best way to fix this is to call RemoveTempRelations()
unconditionally at session start (without doing the rest of the temp
table setup, just the removal.)

That would certainly simplify things. I think I thought about that as
far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed
like a significant behavior change and I wasn't sure that everyone
would like it. In particular, it adds overhead to backend startup
that, in the case of a large temp schema, could be fairly long.

Nevertheless, I tentatively think that a change like this is a good
idea. I wouldn't back-patch it, though.

I seem to recall discussions about having crash recovery go around
and clean out temp tables. That seems like a better plan than
penalizing every session start with this.

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: removal of dangling temp tables

On Fri, Dec 14, 2018 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I seem to recall discussions about having crash recovery go around
and clean out temp tables. That seems like a better plan than
penalizing every session start with this.

Well, crash recovery already removes the files, but it can't really
remove the catalog entries, can it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: removal of dangling temp tables

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Dec 14, 2018 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I seem to recall discussions about having crash recovery go around
and clean out temp tables. That seems like a better plan than
penalizing every session start with this.

Well, crash recovery already removes the files, but it can't really
remove the catalog entries, can it?

Hm. It *could*, if we wanted it to run some transactions after
finishing recovery. But I guess I wonder why bother; if the disk
space is gone then there's not that much reason to be in a hurry
to get rid of the catalog entries. The particular problem Alvaro
is complaining of might be better handled by ignoring temp tables
while computing max freeze age etc. I have some recollection that
we'd intentionally included them, but I wonder why really; it's
not like autovacuum is going to be able to do anything about an
ancient temp table.

Alternatively, maybe we could have backends flag whether they've
taken ownership of their temp schemas or not, and let autovacuum
flush old temp tables if not?

regards, tom lane

#8Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#7)
Re: removal of dangling temp tables

Hi,

On 2018-12-14 13:35:50 -0500, Tom Lane wrote:

Hm. It *could*, if we wanted it to run some transactions after
finishing recovery.

How, we don't have infrastructure for changing databases yet?

But I guess I wonder why bother; if the disk
space is gone then there's not that much reason to be in a hurry
to get rid of the catalog entries. The particular problem Alvaro
is complaining of might be better handled by ignoring temp tables
while computing max freeze age etc. I have some recollection that
we'd intentionally included them, but I wonder why really; it's
not like autovacuum is going to be able to do anything about an
ancient temp table.

We can't truncate the clog, adapt the xid horizon, etc if there's any
temp tables. Otherwise you'd get failures when reading from one, no?

Greetings,

Andres Freund

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#7)
Re: removal of dangling temp tables

On Fri, Dec 14, 2018 at 6:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. It *could*, if we wanted it to run some transactions after
finishing recovery.

It'd have to launch a separate process per database. That would be
useful infrastructure for other things, too, like automatic catalog
upgrades in minor releases, but I'm not volunteering to write that
infrastructure right now.

Alternatively, maybe we could have backends flag whether they've
taken ownership of their temp schemas or not, and let autovacuum
flush old temp tables if not?

Yes, that seems like a possibly promising approach.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#4)
Re: removal of dangling temp tables

On 2018-Dec-14, Robert Haas wrote:

On Fri, Dec 14, 2018 at 12:27 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Maybe it'd be better to change temp table removal to always drop
max_locks_per_transaction objects at a time (ie. commit/start a new
transaction every so many objects).

We're basically just doing DROP SCHEMA ... CASCADE, so I'm not sure
how we'd implement that, but I agree it would be significantly better.

(Minor nit: even currently, we don't drop the schema itself, only the
objects it contains.)

I was thinking we could scan pg_depend for objects depending on the
schema, add them to an ObjectAddresses array, and do
performMultipleDeletions once every max_locks_per_transaction objects.
But in order for this to have any useful effect we'd have to commit the
transaction and start another one; maybe that's too onerous.

Maybe we could offer such a behavior as a special case to be used only
in case the regular mechanism fails. So add a PG_TRY which, in case of
failure, sends a hint to do the cleanup. Not sure this is worthwhile.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: removal of dangling temp tables

On 2018-Dec-14, Robert Haas wrote:

On Fri, Dec 14, 2018 at 6:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. It *could*, if we wanted it to run some transactions after
finishing recovery.

It'd have to launch a separate process per database. That would be
useful infrastructure for other things, too, like automatic catalog
upgrades in minor releases, but I'm not volunteering to write that
infrastructure right now.

This looks like a major project.

Alternatively, maybe we could have backends flag whether they've
taken ownership of their temp schemas or not, and let autovacuum
flush old temp tables if not?

Yes, that seems like a possibly promising approach.

I did propose in my OP the idea of a PGPROC boolean flag that indicates
whether the temp namespace has been set up. If not, have autovac remove
those tables. I like this option better, but I fear it adds more
ProcArrayLock contention. Maybe I can just use a new LWLock to
coordinate that particular member of the ProcGlobal array ... (but then
it can no longer be a boolean.)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#11)
Re: removal of dangling temp tables

On Fri, Dec 14, 2018 at 11:06:32PM -0300, Alvaro Herrera wrote:

I did propose in my OP the idea of a PGPROC boolean flag that indicates
whether the temp namespace has been set up. If not, have autovac remove
those tables. I like this option better, but I fear it adds more
ProcArrayLock contention. Maybe I can just use a new LWLock to
coordinate that particular member of the ProcGlobal array ... (but then
it can no longer be a boolean.)

Isn't that what tempNamespaceId can be used for in PGPROC now? The flag
would be set only when a backend creates a new temporary schema so as it
can be tracked as the owner of the schema.
--
Michael

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#12)
Re: removal of dangling temp tables

On 2018-Dec-15, Michael Paquier wrote:

On Fri, Dec 14, 2018 at 11:06:32PM -0300, Alvaro Herrera wrote:

I did propose in my OP the idea of a PGPROC boolean flag that indicates
whether the temp namespace has been set up. If not, have autovac remove
those tables. I like this option better, but I fear it adds more
ProcArrayLock contention. Maybe I can just use a new LWLock to
coordinate that particular member of the ProcGlobal array ... (but then
it can no longer be a boolean.)

Isn't that what tempNamespaceId can be used for in PGPROC now? The flag
would be set only when a backend creates a new temporary schema so as it
can be tracked as the owner of the schema.

Oh, we already have it! Sorry, I overlooked it. With that, it seems
the patch is fairly simple ... I wonder about the locking implications
in autovacuum, though -- the value is set in backends without acquiring
a lock. I wonder if we could use memory barriers, so it'd incur little
cost.

I wonder how this thing works in parallel query workers.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#13)
Re: removal of dangling temp tables

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Dec-15, Michael Paquier wrote:

Isn't that what tempNamespaceId can be used for in PGPROC now? The flag
would be set only when a backend creates a new temporary schema so as it
can be tracked as the owner of the schema.

Oh, we already have it! Sorry, I overlooked it. With that, it seems
the patch is fairly simple ... I wonder about the locking implications
in autovacuum, though -- the value is set in backends without acquiring
a lock.

I was wondering about that too. But I think it's probably OK. If
autovacuum observes that (a) a table is old enough to pose a wraparound
hazard and (b) its putatively owning backend hasn't yet set
tempNamespaceId, then I think it's safe to conclude that that table is
removable, despite the theoretical race condition.

Autovacuum would need to acquire a deletion lock and then check that the
table is still there, to avoid race conditions if the backend starts to
clean out the schema immediately after it looks. But I think those race
conditions exist anyway (consider a fresh backend that starts cleaning out
its temp schema immediately), so if we have a problem with concurrent
deletion attempts then that problem exists already.

I wonder how this thing works in parallel query workers.

Surely workers are not allowed to create or delete temp tables.

regards, tom lane

#15Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#14)
Re: removal of dangling temp tables

On Sat, Dec 15, 2018 at 09:51:31AM -0500, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Oh, we already have it! Sorry, I overlooked it. With that, it seems
the patch is fairly simple ... I wonder about the locking implications
in autovacuum, though -- the value is set in backends without acquiring
a lock.

I was wondering about that too. But I think it's probably OK. If
autovacuum observes that (a) a table is old enough to pose a wraparound
hazard and (b) its putatively owning backend hasn't yet set
tempNamespaceId, then I think it's safe to conclude that that table is
removable, despite the theoretical race condition.

This relies on the fact that the flag gets set by a backend within a
transaction context, and autovacuum would not see yet temp relations
associated to it at the moment of the scan of pg_class if the backend
has not committed yet its namespace creation via the creation of the
first temp table it uses.

Autovacuum would need to acquire a deletion lock and then check that the
table is still there, to avoid race conditions if the backend starts to
clean out the schema immediately after it looks. But I think those race
conditions exist anyway (consider a fresh backend that starts cleaning out
its temp schema immediately), so if we have a problem with concurrent
deletion attempts then that problem exists already.

I wonder how this thing works in parallel query workers.

Surely workers are not allowed to create or delete temp tables.

Yes, InitTempTableNamespace prevents their creation and InitLocalBuffers
prevents their access as buffers of temp tables are local to a backend
and cannot be shared across multiple workers. Amit Kapila has been
working on this problem lately for example.
--
Michael

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#15)
1 attachment(s)
Re: removal of dangling temp tables

On 2018-Dec-16, Michael Paquier wrote:

On Sat, Dec 15, 2018 at 09:51:31AM -0500, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Oh, we already have it! Sorry, I overlooked it. With that, it seems
the patch is fairly simple ... I wonder about the locking implications
in autovacuum, though -- the value is set in backends without acquiring
a lock.

I was wondering about that too. But I think it's probably OK. If
autovacuum observes that (a) a table is old enough to pose a wraparound
hazard and (b) its putatively owning backend hasn't yet set
tempNamespaceId, then I think it's safe to conclude that that table is
removable, despite the theoretical race condition.

This relies on the fact that the flag gets set by a backend within a
transaction context, and autovacuum would not see yet temp relations
associated to it at the moment of the scan of pg_class if the backend
has not committed yet its namespace creation via the creation of the
first temp table it uses.

Makes sense, thanks.

I think there are two possible ways forward. The conservative one is to
just apply the attached patch to branches 9.4-10. That will let
autovacuum drop tables when the backend is in another database. It may
not solve the problem for the bunch of users that have only one database
that takes the majority of connections, but I think it's worth doing
nonetheless. I tested the 9.4 instance and it works fine; tables are
deleted as soon as I make the session connection to another database.

The more aggressive action is to backpatch 943576bddcb5 ("Make
autovacuum more aggressive to remove orphaned temp tables") which is
currently only in pg11. We would put the new PGPROC member at the end
of the struct, to avoid ABI incompatibilities, but it'd bring trouble
for extensions that put PGPROC in arrays. I checked the code of some
known extensions; found that pglogical uses PGPROC, but only pointers to
it, so it wouldn't be damaged by the proposed change AFAICS.

Another possibly useful change is to make DISCARD ALL and DISCARD TEMP
delete everything in what would be the backend's temp namespace, even if
it hasn't been initialized yet. This would cover the case where a
connection pooler keeps the connection open for a very long time, which
I think is a common case.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

autovac-temp-REL9_4_STABLE.patchtext/x-diff; charset=us-asciiDownload
commit 5aa717be9fe7288cbec043cdea0fb757433f3462
Author:     Alvaro Herrera <alvherre@alvh.no-ip.org>
AuthorDate: Wed Dec 26 15:12:44 2018 -0300
CommitDate: Wed Dec 26 15:12:45 2018 -0300

    Improve autovacuum logic about temp tables nearing XID wraparound

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c18285d690..62a06add56 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2108,11 +2108,18 @@ do_autovacuum(void)
 		if (classForm->relpersistence == RELPERSISTENCE_TEMP)
 		{
 			int			backendID;
+			PGPROC	   *proc;
 
 			backendID = GetTempNamespaceBackendId(classForm->relnamespace);
 
-			/* We just ignore it if the owning backend is still active */
-			if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL)
+			/*
+			 * We just ignore it if the owning backend is still active in the
+			 * same database.
+			 */
+			if (backendID != InvalidBackendId &&
+				(backendID == MyBackendId ||
+				 (proc = BackendIdGetProc(backendID)) == NULL ||
+				 proc->databaseId != MyDatabaseId))
 			{
 				/*
 				 * We found an orphan temp table (which was probably left
#17Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Alvaro Herrera (#16)
RE: removal of dangling temp tables

From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]

The more aggressive action is to backpatch 943576bddcb5 ("Make autovacuum
more aggressive to remove orphaned temp tables") which is currently only
in pg11. We would put the new PGPROC member at the end of the struct, to
avoid ABI incompatibilities, but it'd bring trouble for extensions that
put PGPROC in arrays. I checked the code of some known extensions; found
that pglogical uses PGPROC, but only pointers to it, so it wouldn't be
damaged by the proposed change AFAICS.

+1
I think this is a bug from a user's perspective that garbage is left. I want to believe that fixing bugs of PostgreSQL itself are prioritized over the ABI compatibility for extensions, if we have to choose one of them.

Another possibly useful change is to make DISCARD ALL and DISCARD TEMP delete
everything in what would be the backend's temp namespace, even if it hasn't
been initialized yet. This would cover the case where a connection pooler
keeps the connection open for a very long time, which I think is a common
case.

That sounds good.

Regards
Takayuki Tsunakawa

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tsunakawa, Takayuki (#17)
1 attachment(s)
Re: removal of dangling temp tables

On 2018-Dec-26, Tsunakawa, Takayuki wrote:

From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]

The more aggressive action is to backpatch 943576bddcb5 ("Make autovacuum
more aggressive to remove orphaned temp tables") which is currently only
in pg11. We would put the new PGPROC member at the end of the struct, to
avoid ABI incompatibilities, but it'd bring trouble for extensions that
put PGPROC in arrays. I checked the code of some known extensions; found
that pglogical uses PGPROC, but only pointers to it, so it wouldn't be
damaged by the proposed change AFAICS.

+1
I think this is a bug from a user's perspective that garbage is left.
I want to believe that fixing bugs of PostgreSQL itself are
prioritized over the ABI compatibility for extensions, if we have to
choose one of them.

Having been victim of ABI incompatibility myself, I loathe giving too
much priority to other issues that can be resolved in other ways, so I
don't necessarily support your view on bugs.
That said, I think in this case it shouldn't be a problem, so I'm going
to work on that next.

I haven't got around to creating the abidiff reporting system yet ...

Another possibly useful change is to make DISCARD ALL and DISCARD TEMP delete
everything in what would be the backend's temp namespace, even if it hasn't
been initialized yet. This would cover the case where a connection pooler
keeps the connection open for a very long time, which I think is a common
case.

That sounds good.

Thanks. I just tested that the attached patch does the intended. (I
named it "autovac" but obviously the DISCARD part is not about
autovacuum). I intend to push this patch first, and later backpatch the
other one.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

autovac-temp-REL9_4_STABLE.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 89df585b87..d6f3fd5436 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3835,12 +3835,31 @@ RemoveTempRelationsCallback(int code, Datum arg)
 
 /*
  * Remove all temp tables from the temporary namespace.
+ *
+ * If we haven't set up one yet, but one exists from a previous crashed
+ * backend, clean that one; but only do this once in a session's life.
  */
 void
 ResetTempTableNamespace(void)
 {
+	static bool	TempNamespaceCleaned = false;
+
 	if (OidIsValid(myTempNamespace))
 		RemoveTempRelations(myTempNamespace);
+	else if (MyBackendId != InvalidBackendId && !RecoveryInProgress() &&
+			 !TempNamespaceCleaned)
+	{
+		char		namespaceName[NAMEDATALEN];
+		Oid			namespaceId;
+
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
+				 MyBackendId);
+		namespaceId = get_namespace_oid(namespaceName, true);
+		if (OidIsValid(namespaceId))
+			RemoveTempRelations(namespaceId);
+	}
+
+	TempNamespaceCleaned = true;
 }
 
 
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c18285d690..62a06add56 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2108,11 +2108,18 @@ do_autovacuum(void)
 		if (classForm->relpersistence == RELPERSISTENCE_TEMP)
 		{
 			int			backendID;
+			PGPROC	   *proc;
 
 			backendID = GetTempNamespaceBackendId(classForm->relnamespace);
 
-			/* We just ignore it if the owning backend is still active */
-			if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL)
+			/*
+			 * We just ignore it if the owning backend is still active in the
+			 * same database.
+			 */
+			if (backendID != InvalidBackendId &&
+				(backendID == MyBackendId ||
+				 (proc = BackendIdGetProc(backendID)) == NULL ||
+				 proc->databaseId != MyDatabaseId))
 			{
 				/*
 				 * We found an orphan temp table (which was probably left
#19Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Alvaro Herrera (#18)
RE: removal of dangling temp tables

From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]

Having been victim of ABI incompatibility myself, I loathe giving too much
priority to other issues that can be resolved in other ways, so I don't
necessarily support your view on bugs.
That said, I think in this case it shouldn't be a problem, so I'm going
to work on that next.

I understood your sad experience...

Thanks. I just tested that the attached patch does the intended. (I named
it "autovac" but obviously the DISCARD part is not about autovacuum). I
intend to push this patch first, and later backpatch the other one.

The patch looks good. I think this can be committed.

Please don't mind replying to this mail.

Regards
Takayuki Tsunakawa

#20Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#18)
Re: removal of dangling temp tables

On Wed, Dec 26, 2018 at 08:51:56PM -0300, Alvaro Herrera wrote:

Having been victim of ABI incompatibility myself, I loathe giving too
much priority to other issues that can be resolved in other ways, so I
don't necessarily support your view on bugs.
That said, I think in this case it shouldn't be a problem, so I'm going
to work on that next.

And it is even better if bugs can be fixed, even partially without any
ABI breakages. Anyway, not breaking the ABI of PGPROC is why 246a6c8
has not been back-patched to begin with, because we have no idea how
PGPROC is being used and because its interface is public, so if the
intent is to apply 246a6c8 to v10 and down this gets a -1 from me.

Back-patching what you sent in
/messages/by-id/20181226190834.wsk2wzott5yzrjiq@alvherre.pgsql
is fine for me.

Another possibly useful change is to make DISCARD ALL and DISCARD TEMP delete
everything in what would be the backend's temp namespace, even if it hasn't
been initialized yet. This would cover the case where a connection pooler
keeps the connection open for a very long time, which I think is a common
case.

That sounds good.

Thanks. I just tested that the attached patch does the intended. (I
named it "autovac" but obviously the DISCARD part is not about
autovacuum). I intend to push this patch first, and later backpatch the
other one.

+       snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
+                MyBackendId);
+       namespaceId = get_namespace_oid(namespaceName, true);

So this is the reverse engineering of GetTempNamespaceBackendId().
Perhaps a dedicated API in namespace.c would be interesting to have?

I would recommend to keep the changes for DISCARD and autovacuum into
separate commits.
--
Michael

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#20)
Re: removal of dangling temp tables

On 2018-Dec-27, Michael Paquier wrote:

On Wed, Dec 26, 2018 at 08:51:56PM -0300, Alvaro Herrera wrote:

Having been victim of ABI incompatibility myself, I loathe giving too
much priority to other issues that can be resolved in other ways, so I
don't necessarily support your view on bugs.
That said, I think in this case it shouldn't be a problem, so I'm going
to work on that next.

And it is even better if bugs can be fixed, even partially without any
ABI breakages. Anyway, not breaking the ABI of PGPROC is why 246a6c8
has not been back-patched to begin with, because we have no idea how
PGPROC is being used and because its interface is public, so if the
intent is to apply 246a6c8 to v10 and down this gets a -1 from me.

We allow structs to receive new members at the end of the struct, since
this doesn't affect the offset of existing members; thus code already
compiled with the previous struct definition does not break. AFAICS
there is no danger in backpatching that, moving that struct member at
the end of the struct.

Back-patching what you sent in
/messages/by-id/20181226190834.wsk2wzott5yzrjiq@alvherre.pgsql
is fine for me.

Done.

+       snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
+                MyBackendId);
+       namespaceId = get_namespace_oid(namespaceName, true);

So this is the reverse engineering of GetTempNamespaceBackendId().
Perhaps a dedicated API in namespace.c would be interesting to have?

Since this code doesn't appear in branches 11 and later, I'm not sure
creating such an API has any value.

I would recommend to keep the changes for DISCARD and autovacuum into
separate commits.

Yeah, done like that.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#22Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#21)
Re: removal of dangling temp tables

On Thu, Dec 27, 2018 at 04:30:21PM -0300, Alvaro Herrera wrote:

We allow structs to receive new members at the end of the struct, since
this doesn't affect the offset of existing members; thus code already
compiled with the previous struct definition does not break. AFAICS
there is no danger in backpatching that, moving that struct member at
the end of the struct.

Sure. Now this comes to PGPROC, which I am not sure we can say is
never manipulated as an array.
--
Michael

#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#22)
Re: removal of dangling temp tables

On 2018-Dec-28, Michael Paquier wrote:

On Thu, Dec 27, 2018 at 04:30:21PM -0300, Alvaro Herrera wrote:

We allow structs to receive new members at the end of the struct, since
this doesn't affect the offset of existing members; thus code already
compiled with the previous struct definition does not break. AFAICS
there is no danger in backpatching that, moving that struct member at
the end of the struct.

Sure. Now this comes to PGPROC, which I am not sure we can say is
never manipulated as an array.

The server code allocates arrays, but that's fine because that code is
recompiled. Extensions only pass pointers around -- they don't create
any additional arrays.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#24Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#23)
Re: removal of dangling temp tables

On Fri, Dec 28, 2018 at 12:05:34AM -0300, Alvaro Herrera wrote:

The server code allocates arrays, but that's fine because that code is
recompiled. Extensions only pass pointers around -- they don't create
any additional arrays.

There are many exotic extensions which could be using sizeof(PGPROC)
as that's a popular structure, so I am glad that 246a6c8f did not find
its way down. So thanks for what you have done!
--
Michael

#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#24)
Re: removal of dangling temp tables

On 2018-Dec-28, Michael Paquier wrote:

On Fri, Dec 28, 2018 at 12:05:34AM -0300, Alvaro Herrera wrote:

The server code allocates arrays, but that's fine because that code is
recompiled. Extensions only pass pointers around -- they don't create
any additional arrays.

There are many exotic extensions which could be using sizeof(PGPROC)
as that's a popular structure,

Can you show one instance of this?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#25)
Re: removal of dangling temp tables

On 2018-Dec-28, Alvaro Herrera wrote:

On 2018-Dec-28, Michael Paquier wrote:

On Fri, Dec 28, 2018 at 12:05:34AM -0300, Alvaro Herrera wrote:

The server code allocates arrays, but that's fine because that code is
recompiled. Extensions only pass pointers around -- they don't create
any additional arrays.

There are many exotic extensions which could be using sizeof(PGPROC)
as that's a popular structure,

Can you show one instance of this?

I looked at
https://github.com/postgrespro/pg_wait_sampling/blob/master/pg_wait_sampling.c
https://github.com/citusdata/citus/search?q=pgproc&amp;unscoped_q=pgproc

and skimmed a few others can't find any instance where the full struct
is used, as opposed to just a pointer to it.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#26)
Re: removal of dangling temp tables

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Dec-28, Alvaro Herrera wrote:

On 2018-Dec-28, Michael Paquier wrote:

There are many exotic extensions which could be using sizeof(PGPROC)
as that's a popular structure,

Can you show one instance of this?

I looked at
https://github.com/postgrespro/pg_wait_sampling/blob/master/pg_wait_sampling.c
https://github.com/citusdata/citus/search?q=pgproc&amp;unscoped_q=pgproc
and skimmed a few others can't find any instance where the full struct
is used, as opposed to just a pointer to it.

No, the point Michael is making is that the array stride in the ProcArray
is part of our ABI. For example, accessing a PGPROC from its pgprocno
using the GetPGProcByNumber macro will be broken if we change the
struct size. I do not think you can assume that no extension does that.

regards, tom lane

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#27)
Re: removal of dangling temp tables

I wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I looked at
https://github.com/postgrespro/pg_wait_sampling/blob/master/pg_wait_sampling.c
https://github.com/citusdata/citus/search?q=pgproc&amp;unscoped_q=pgproc
and skimmed a few others can't find any instance where the full struct
is used, as opposed to just a pointer to it.

No, the point Michael is making is that the array stride in the ProcArray
is part of our ABI. For example, accessing a PGPROC from its pgprocno
using the GetPGProcByNumber macro will be broken if we change the
struct size. I do not think you can assume that no extension does that.

In fact, there's a counterexample right here in pg_wait_sampling:

https://github.com/postgrespro/pg_wait_sampling/blob/master/pg_wait_sampling.c#L343

regards, tom lane

#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#28)
Re: removal of dangling temp tables

On 2018-Dec-28, Tom Lane wrote:

I wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I looked at
https://github.com/postgrespro/pg_wait_sampling/blob/master/pg_wait_sampling.c
https://github.com/citusdata/citus/search?q=pgproc&amp;unscoped_q=pgproc
and skimmed a few others can't find any instance where the full struct
is used, as opposed to just a pointer to it.

No, the point Michael is making is that the array stride in the ProcArray
is part of our ABI. For example, accessing a PGPROC from its pgprocno
using the GetPGProcByNumber macro will be broken if we change the
struct size. I do not think you can assume that no extension does that.

In fact, there's a counterexample right here in pg_wait_sampling:

https://github.com/postgrespro/pg_wait_sampling/blob/master/pg_wait_sampling.c#L343

Ughh. I stand corrected.

This seems a terrible interface, from an ABI compatibility point of
view, and exposing proclist_node_get() as an inline function, doubly so.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services