performance issue in remove_from_unowned_list()

Started by Tomas Vondraalmost 7 years ago15 messages
#1Tomas Vondra
tomas.vondra@2ndquadrant.com
1 attachment(s)

Hi,

While working on benchmarks for the syscache patch (negative entries and
all of that), I've ran into a an issue in remove_from_unowned_list.

If you create a lot of relations in a single transaction (say, 100k) and
then abort the transaction (or if it fails for some reason, e.g. because
of hitting max_locks_per_transaction), smgrDoPendingDeletes ought to
clean relation files.

Unfortunately, the constructed srels array is ordered in exactly the
opposite way compared to "unowned list" (used by smgrclose). So what
happens is that we start with the first relation, walk the whole list
unowned list and remove the last item. Then we get the second rel and
again walk the whole unowned list until the last item. And so on.

For large number of objects that's pretty significant. With 100k rels
I've lost the patience after a couple of minutes, and just nuked the
whole cluster (interrupting ROLLBACK is a no go, and after a kill the
recovery just starts chewing on it again).

Of course, transactions creating 100k tables in a single transaction are
not that common, but it's not unheard of either (such applications
exist, are discussed in the syscache thread, and restoring them from a
pg_dump is likely to hit this issue). And the same issue applies to
cases that drop a bunch of tables in a single transaction (and I've seen
plenty of scripts doing that, although in smaller batches).

But it's a bit funnier, because there's also DropRelationFiles() which
does smgrclose on a batch of relations too, and it says this

/*
* Call smgrclose() in reverse order as when smgropen() is called.
* This trick enables remove_from_unowned_list() in smgrclose()
* to search the SMgrRelation from the unowned list,
* with O(1) performance.
*/
for (i = ndelrels - 1; i >= 0; i--)
...

but it's called from two places in xact.c only. And if you trigger the
issue with 100k x CREATE TABLE + ROLLBACK, and then force a recovery by
killing postmaster, you actually get the very same behavior with always
traversing the unowned list for some reason. I'm not quite sure why, but
it seems the optimization is exactly the wrong thing to do here.

Attached is a WIP patch removing the optimization from DropRelationFiles
and adding it to smgrDoPendingDeletes. This resolves the issue, at least
in the cases I've been able to reproduce. But maybe we should deal with
this issue earlier by ensuring the two lists are ordered the same way
from the very beginning, somehow.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

smgr-fix.patchtext/x-patch; name=smgr-fix.patchDownload
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 0302507e6f..8dc9050343 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -360,7 +360,13 @@ smgrDoPendingDeletes(bool isCommit)
 	{
 		smgrdounlinkall(srels, nrels, false);
 
-		for (i = 0; i < nrels; i++)
+		/*
+		 * Call smgrclose() in reverse order as when smgropen() is called.
+		 * This trick enables remove_from_unowned_list() in smgrclose()
+		 * to search the SMgrRelation from the unowned list,
+		 * with O(1) performance.
+		 */
+		for (i = nrels-1; i >= 0; i--)
 			smgrclose(srels[i]);
 
 		pfree(srels);
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2aba2dfe91..6ed68185ed 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1699,13 +1699,7 @@ DropRelationFiles(RelFileNode *delrels, int ndelrels, bool isRedo)
 
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
-	/*
-	 * Call smgrclose() in reverse order as when smgropen() is called.
-	 * This trick enables remove_from_unowned_list() in smgrclose()
-	 * to search the SMgrRelation from the unowned list,
-	 * with O(1) performance.
-	 */
-	for (i = ndelrels - 1; i >= 0; i--)
+	for (i = 0; i < ndelrels; i++)
 		smgrclose(srels[i]);
 	pfree(srels);
 }
#2Ideriha, Takeshi
ideriha.takeshi@jp.fujitsu.com
In reply to: Tomas Vondra (#1)
RE: performance issue in remove_from_unowned_list()

From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on
a batch of relations too, and it says this

/*
* Call smgrclose() in reverse order as when smgropen() is called.
* This trick enables remove_from_unowned_list() in smgrclose()
* to search the SMgrRelation from the unowned list,
* with O(1) performance.
*/
for (i = ndelrels - 1; i >= 0; i--)
...

but it's called from two places in xact.c only. And if you trigger the issue with 100k x
CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
actually get the very same behavior with always traversing the unowned list for some
reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing
to do here.

So when DropRelationFiles() is called, order of calling smgr_close() and order of unowed list is always same?

This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion.
/messages/by-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O=1NrRTpHcbtCBw@mail.gmail.com

Regards,
Takeshi Ideriha

#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ideriha, Takeshi (#2)
1 attachment(s)
Re: performance issue in remove_from_unowned_list()

On 2/8/19 12:32 PM, Ideriha, Takeshi wrote:

From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on
a batch of relations too, and it says this

/*
* Call smgrclose() in reverse order as when smgropen() is called.
* This trick enables remove_from_unowned_list() in smgrclose()
* to search the SMgrRelation from the unowned list,
* with O(1) performance.
*/
for (i = ndelrels - 1; i >= 0; i--)
...

but it's called from two places in xact.c only. And if you trigger the issue with 100k x
CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
actually get the very same behavior with always traversing the unowned list for some
reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing
to do here.

So when DropRelationFiles() is called, order of calling smgr_close()

and order of unowed list is always same?

Well, let's say you create 10k tables in a single transaction, and then
kill the server with "kill -9" right after the commit. Then on restart,
during recovery this happens

2019-02-08 14:16:21.781 CET [12817] LOG: remove_from_unowned_list 10015
2019-02-08 14:16:21.871 CET [12817] LOG: remove_from_unowned_list 10005
2019-02-08 14:16:21.967 CET [12817] LOG: remove_from_unowned_list 10004
2019-02-08 14:16:22.057 CET [12817] LOG: remove_from_unowned_list 10001
2019-02-08 14:16:22.147 CET [12817] LOG: remove_from_unowned_list 10000
2019-02-08 14:16:22.238 CET [12817] LOG: remove_from_unowned_list 9999
2019-02-08 14:16:22.327 CET [12817] LOG: remove_from_unowned_list 9998
2019-02-08 14:16:22.421 CET [12817] LOG: remove_from_unowned_list 9996
2019-02-08 14:16:22.513 CET [12817] LOG: remove_from_unowned_list 9995
2019-02-08 14:16:22.605 CET [12817] LOG: remove_from_unowned_list 9994
2019-02-08 14:16:22.696 CET [12817] LOG: remove_from_unowned_list 9993
...
2019-02-08 14:19:13.921 CET [12817] LOG: remove_from_unowned_list 8396
2019-02-08 14:19:14.025 CET [12817] LOG: remove_from_unowned_list 8395
2019-02-08 14:19:14.174 CET [12817] LOG: remove_from_unowned_list 8394
2019-02-08 14:19:14.277 CET [12817] LOG: remove_from_unowned_list 8393
2019-02-08 14:19:14.387 CET [12817] LOG: remove_from_unowned_list 8392
2019-02-08 14:19:14.508 CET [12817] LOG: remove_from_unowned_list 8391
2019-02-08 14:19:14.631 CET [12817] LOG: remove_from_unowned_list 8390
2019-02-08 14:19:14.770 CET [12817] LOG: remove_from_unowned_list 8389
...

This is with the attached debug patch, which simply prints index of the
unowned list index entry. And yes, it took ~3 minutes to get from 10k to
8k (at which point I interrupted the recovery and killed the cluster).

I see similar issue after creating a lot of tables in the same xact and
rolling it back, although in this case it's faster for some reason.

This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion.
/messages/by-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O=1NrRTpHcbtCBw@mail.gmail.com

That probably explains why we haven't seen the issue before.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

smgr-debug.patchtext/x-patch; name=smgr-debug.patchDownload
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 0c0bba4ab3..8eb724b80d 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -281,6 +281,7 @@ remove_from_unowned_list(SMgrRelation reln)
 {
 	SMgrRelation *link;
 	SMgrRelation cur;
+	int			idx = 0;
 
 	for (link = &first_unowned_reln, cur = *link;
 		 cur != NULL;
@@ -292,7 +293,12 @@ remove_from_unowned_list(SMgrRelation reln)
 			cur->next_unowned_reln = NULL;
 			break;
 		}
+
+		idx++;
 	}
+
+	if (idx > 0)
+		elog(LOG, "remove_from_unowned_list %d", idx);
 }
 
 /*
#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#3)
1 attachment(s)
Re: performance issue in remove_from_unowned_list()

On 2/8/19 2:27 PM, Tomas Vondra wrote:

On 2/8/19 12:32 PM, Ideriha, Takeshi wrote:

From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on
a batch of relations too, and it says this

/*
* Call smgrclose() in reverse order as when smgropen() is called.
* This trick enables remove_from_unowned_list() in smgrclose()
* to search the SMgrRelation from the unowned list,
* with O(1) performance.
*/
for (i = ndelrels - 1; i >= 0; i--)
...

but it's called from two places in xact.c only. And if you trigger the issue with 100k x
CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
actually get the very same behavior with always traversing the unowned list for some
reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing
to do here.

So when DropRelationFiles() is called, order of calling smgr_close()

and order of unowed list is always same?

Well, let's say you create 10k tables in a single transaction, and then
kill the server with "kill -9" right after the commit. Then on restart,
during recovery this happens

2019-02-08 14:16:21.781 CET [12817] LOG: remove_from_unowned_list 10015
2019-02-08 14:16:21.871 CET [12817] LOG: remove_from_unowned_list 10005
2019-02-08 14:16:21.967 CET [12817] LOG: remove_from_unowned_list 10004
2019-02-08 14:16:22.057 CET [12817] LOG: remove_from_unowned_list 10001
2019-02-08 14:16:22.147 CET [12817] LOG: remove_from_unowned_list 10000
2019-02-08 14:16:22.238 CET [12817] LOG: remove_from_unowned_list 9999
2019-02-08 14:16:22.327 CET [12817] LOG: remove_from_unowned_list 9998
2019-02-08 14:16:22.421 CET [12817] LOG: remove_from_unowned_list 9996
2019-02-08 14:16:22.513 CET [12817] LOG: remove_from_unowned_list 9995
2019-02-08 14:16:22.605 CET [12817] LOG: remove_from_unowned_list 9994
2019-02-08 14:16:22.696 CET [12817] LOG: remove_from_unowned_list 9993
...
2019-02-08 14:19:13.921 CET [12817] LOG: remove_from_unowned_list 8396
2019-02-08 14:19:14.025 CET [12817] LOG: remove_from_unowned_list 8395
2019-02-08 14:19:14.174 CET [12817] LOG: remove_from_unowned_list 8394
2019-02-08 14:19:14.277 CET [12817] LOG: remove_from_unowned_list 8393
2019-02-08 14:19:14.387 CET [12817] LOG: remove_from_unowned_list 8392
2019-02-08 14:19:14.508 CET [12817] LOG: remove_from_unowned_list 8391
2019-02-08 14:19:14.631 CET [12817] LOG: remove_from_unowned_list 8390
2019-02-08 14:19:14.770 CET [12817] LOG: remove_from_unowned_list 8389
...

This is with the attached debug patch, which simply prints index of the
unowned list index entry. And yes, it took ~3 minutes to get from 10k to
8k (at which point I interrupted the recovery and killed the cluster).

I see similar issue after creating a lot of tables in the same xact and
rolling it back, although in this case it's faster for some reason.

This one was inroduced at b4166911 and I'd like to hear author and reviewer's opinion.
/messages/by-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O=1NrRTpHcbtCBw@mail.gmail.com

That probably explains why we haven't seen the issue before.

FWIW I've just hit this issue (remove_from_unowned_list consuming 100%
CPU for long periods of time) in checkpointer, which calls smgrcloseall.
And that does this:

hash_seq_init(&status, SMgrRelationHash);

while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
smgrclose(reln);

which means the order of closing relations is entirely random in this
case, and there's no hope of smartly ordering the close requests.

I'm wondering if we should just get rid of all such optimizations, and
make the unowned list doubly-linked (WIP patch attached, needs fixing
the comments etc.).

The comment before remove_from_unowned_list() claims it's not worth it,
but I guess we may need to revisit the assumptions - the fact that we've
accumulated ordering optimizations on a number of places suggests they
may not be entirely accurate.

The doubly-linked list seems way less fragile than the attempts to order
the requests in a particular way, and it works all the time. It requires
an extra pointer in the struct, but per pahole size changes from 88 to
96 bytes - so the number of cachelines does not change, and the allocset
chunk size remains the same. So no problem here, IMO. But some testing
is probably needed.

I'm going to use this for the syscache patch benchmarking, because it's
rather impossible without it.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

smgr-fix.patchtext/x-patch; name=smgr-fix.patchDownload
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 0c0bba4ab3..02a5d5cdda 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -259,7 +259,12 @@ static void
 add_to_unowned_list(SMgrRelation reln)
 {
 	/* place it at head of the list (to make smgrsetowner cheap) */
+	reln->prev_unowned_reln = NULL;
 	reln->next_unowned_reln = first_unowned_reln;
+
+	if (first_unowned_reln)
+		first_unowned_reln->prev_unowned_reln = reln;
+
 	first_unowned_reln = reln;
 }
 
@@ -279,20 +284,20 @@ add_to_unowned_list(SMgrRelation reln)
 static void
 remove_from_unowned_list(SMgrRelation reln)
 {
-	SMgrRelation *link;
-	SMgrRelation cur;
+	SMgrRelation prev;
+	SMgrRelation next;
 
-	for (link = &first_unowned_reln, cur = *link;
-		 cur != NULL;
-		 link = &cur->next_unowned_reln, cur = *link)
-	{
-		if (cur == reln)
-		{
-			*link = cur->next_unowned_reln;
-			cur->next_unowned_reln = NULL;
-			break;
-		}
-	}
+	prev = reln->prev_unowned_reln;
+	next = reln->next_unowned_reln;
+
+	if (prev)
+		prev->next_unowned_reln = reln->next_unowned_reln;
+
+	if (next)
+		next->prev_unowned_reln = reln->prev_unowned_reln;
+
+	if (reln == first_unowned_reln)
+		first_unowned_reln = next;
 }
 
 /*
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 820d08ed4e..ed6c4421eb 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -72,6 +72,7 @@ typedef struct SMgrRelationData
 	struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];
 
 	/* if unowned, list link in list of all unowned SMgrRelations */
+	struct SMgrRelationData *prev_unowned_reln;
 	struct SMgrRelationData *next_unowned_reln;
 } SMgrRelationData;
 
#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#4)
Re: performance issue in remove_from_unowned_list()

On 2019-Feb-08, Tomas Vondra wrote:

I'm wondering if we should just get rid of all such optimizations, and
make the unowned list doubly-linked (WIP patch attached, needs fixing
the comments etc.).

+1 for that approach.

Did you consider using a dlist?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#5)
Re: performance issue in remove_from_unowned_list()

On Wed, Mar 6, 2019 at 1:53 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Feb-08, Tomas Vondra wrote:

I'm wondering if we should just get rid of all such optimizations, and
make the unowned list doubly-linked (WIP patch attached, needs fixing
the comments etc.).

+1 for that approach.

+1 for me, too.

Did you consider using a dlist?

Maybe that is worthwhile, but this is a smaller change, which I think
should count for quite a bit. Nothing keeps somebody from doing the
dlist change as a separate patch, if desired.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#5)
Re: performance issue in remove_from_unowned_list()

On 3/6/19 7:52 PM, Alvaro Herrera wrote:

On 2019-Feb-08, Tomas Vondra wrote:

I'm wondering if we should just get rid of all such optimizations, and
make the unowned list doubly-linked (WIP patch attached, needs fixing
the comments etc.).

+1 for that approach.

Did you consider using a dlist?

I'm not sure. I might have considered it, but decided to go with a
simpler / less invasive fix demonstrating the effect. And maybe make it
more acceptable for backpatch, if we want that. Which we probably don't,
so I agree dlist might be a better choice.

cheers

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#6)
Re: performance issue in remove_from_unowned_list()

On 3/6/19 8:04 PM, Robert Haas wrote:

On Wed, Mar 6, 2019 at 1:53 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Feb-08, Tomas Vondra wrote:

I'm wondering if we should just get rid of all such optimizations, and
make the unowned list doubly-linked (WIP patch attached, needs fixing
the comments etc.).

+1 for that approach.

+1 for me, too.

Did you consider using a dlist?

Maybe that is worthwhile, but this is a smaller change, which I think
should count for quite a bit. Nothing keeps somebody from doing the
dlist change as a separate patch, if desired.

Yeah, although now that I think about it I wouldn't expect the dlist
version to be much more complicated. We access next_unowned_reln on two
or three places, IIRC, so switching to dlist would be trivial I think.

What worries me more is that I observe the opposite behavior than what's
described in comment for b4166911 (which is from 2018, so not that old)
and 279628a0a7 (from 2013). So what changed since then? Seems fishy ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#1)
Re: performance issue in remove_from_unowned_list()

On 2019-Feb-07, Tomas Vondra wrote:

Attached is a WIP patch removing the optimization from DropRelationFiles
and adding it to smgrDoPendingDeletes. This resolves the issue, at least
in the cases I've been able to reproduce. But maybe we should deal with
this issue earlier by ensuring the two lists are ordered the same way
from the very beginning, somehow.

I noticed that this patch isn't in the commitfest. Are you planning to
push it soon?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#9)
1 attachment(s)
Re: performance issue in remove_from_unowned_list()

On 3/10/19 9:09 PM, Alvaro Herrera wrote:

On 2019-Feb-07, Tomas Vondra wrote:

Attached is a WIP patch removing the optimization from DropRelationFiles
and adding it to smgrDoPendingDeletes. This resolves the issue, at least
in the cases I've been able to reproduce. But maybe we should deal with
this issue earlier by ensuring the two lists are ordered the same way
from the very beginning, somehow.

I noticed that this patch isn't in the commitfest. Are you planning to
push it soon?

I wasn't planning to push anything particularly soon, for two reasons:
Firstly, the issue is not particularly pressing except with non-trivial
number of relations (and I only noticed that during benchmarking).
Secondly, I still have a feeling I'm missing something about b4166911
because for me that commit does not actually fix the issue - i.e. I can
create a lot of relations in a transaction, abort it, and observe that
the replica actually accesses the relations in exactly the wrong order.
So that commit does not seem to actually fix anything.

Attached is a patch adopting the dlist approach - it seems to be working
quite fine, and is a bit cleaner than just slapping another pointer into
the SMgrRelationData struct. So I'd say this is the way to go.

I see b4166911 was actually backpatched to all supported versions, on
the basis that it fixes oversight in 279628a0a7. So I suppose this fix
should also be backpatched.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

smgr-dlist-wip-fix.patchtext/x-patch; name=smgr-dlist-wip-fix.patchDownload
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 0c0bba4ab3..10e84dc571 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -18,6 +18,7 @@
 #include "postgres.h"
 
 #include "commands/tablespace.h"
+#include "lib/ilist.h"
 #include "storage/bufmgr.h"
 #include "storage/ipc.h"
 #include "storage/smgr.h"
@@ -97,7 +98,7 @@ static const int NSmgr = lengthof(smgrsw);
  */
 static HTAB *SMgrRelationHash = NULL;
 
-static SMgrRelation first_unowned_reln = NULL;
+static dlist_head	unowned_relns;
 
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
@@ -165,7 +166,7 @@ smgropen(RelFileNode rnode, BackendId backend)
 		ctl.entrysize = sizeof(SMgrRelationData);
 		SMgrRelationHash = hash_create("smgr relation table", 400,
 									   &ctl, HASH_ELEM | HASH_BLOBS);
-		first_unowned_reln = NULL;
+		dlist_init(&unowned_relns);
 	}
 
 	/* Look up or create an entry */
@@ -258,9 +259,7 @@ smgrclearowner(SMgrRelation *owner, SMgrRelation reln)
 static void
 add_to_unowned_list(SMgrRelation reln)
 {
-	/* place it at head of the list (to make smgrsetowner cheap) */
-	reln->next_unowned_reln = first_unowned_reln;
-	first_unowned_reln = reln;
+	dlist_push_tail(&unowned_relns, &reln->node);
 }
 
 /*
@@ -268,31 +267,19 @@ add_to_unowned_list(SMgrRelation reln)
  *
  * If the reln is not present in the list, nothing happens.  Typically this
  * would be caller error, but there seems no reason to throw an error.
- *
- * In the worst case this could be rather slow; but in all the cases that seem
- * likely to be performance-critical, the reln being sought will actually be
- * first in the list.  Furthermore, the number of unowned relns touched in any
- * one transaction shouldn't be all that high typically.  So it doesn't seem
- * worth expending the additional space and management logic needed for a
- * doubly-linked list.
  */
 static void
 remove_from_unowned_list(SMgrRelation reln)
 {
-	SMgrRelation *link;
-	SMgrRelation cur;
+	/* if not in the list, do nothing */
+	if ((reln->node.prev == NULL) && (reln->node.next == NULL))
+		return;
 
-	for (link = &first_unowned_reln, cur = *link;
-		 cur != NULL;
-		 link = &cur->next_unowned_reln, cur = *link)
-	{
-		if (cur == reln)
-		{
-			*link = cur->next_unowned_reln;
-			cur->next_unowned_reln = NULL;
-			break;
-		}
-	}
+	dlist_delete(&reln->node);
+
+	/* reset the dlist pointers */
+	reln->node.prev = NULL;
+	reln->node.next = NULL;
 }
 
 /*
@@ -812,13 +799,19 @@ smgrpostckpt(void)
 void
 AtEOXact_SMgr(void)
 {
+	dlist_mutable_iter	iter;
+
 	/*
 	 * Zap all unowned SMgrRelations.  We rely on smgrclose() to remove each
 	 * one from the list.
 	 */
-	while (first_unowned_reln != NULL)
+	dlist_foreach_modify(iter, &unowned_relns)
 	{
-		Assert(first_unowned_reln->smgr_owner == NULL);
-		smgrclose(first_unowned_reln);
+		SMgrRelation	rel = dlist_container(SMgrRelationData, node,
+											  iter.cur);
+
+		Assert(rel->smgr_owner == NULL);
+
+		smgrclose(rel);
 	}
 }
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 4d60b28dac..8e98273878 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -14,6 +14,7 @@
 #ifndef SMGR_H
 #define SMGR_H
 
+#include "lib/ilist.h"
 #include "storage/block.h"
 #include "storage/relfilenode.h"
 
@@ -71,7 +72,7 @@ typedef struct SMgrRelationData
 	struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];
 
 	/* if unowned, list link in list of all unowned SMgrRelations */
-	struct SMgrRelationData *next_unowned_reln;
+	dlist_node	node;
 } SMgrRelationData;
 
 typedef SMgrRelationData *SMgrRelation;
#11Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#10)
Re: performance issue in remove_from_unowned_list()

On Tue, Mar 12, 2019 at 6:54 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Attached is a patch adopting the dlist approach - it seems to be working
quite fine, and is a bit cleaner than just slapping another pointer into
the SMgrRelationData struct. So I'd say this is the way to go.

What about using a data structure that supports O(1) lookups for any element?

The current efforts all seem to revolve around correctly guessing from
which end of the list we are likely to delete stuff, but your research
suggests that we don't always make such guesses particularly well.
And it seems unnecessary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#11)
Re: performance issue in remove_from_unowned_list()

On 3/13/19 1:12 PM, Robert Haas wrote:

On Tue, Mar 12, 2019 at 6:54 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Attached is a patch adopting the dlist approach - it seems to be working
quite fine, and is a bit cleaner than just slapping another pointer into
the SMgrRelationData struct. So I'd say this is the way to go.

What about using a data structure that supports O(1) lookups for any element?

The current efforts all seem to revolve around correctly guessing from
which end of the list we are likely to delete stuff, but your research
suggests that we don't always make such guesses particularly well.
And it seems unnecessary.

Isn't the the doubly-linked list is doing exactly that?

AFAICS we already maintain a hash table of the smgr relations, and we
look them up in this table. We don't need to look them up in the list of
unowned relations - the whole problem is that with the current
single-linked list, we need to iterate the list anyway to update pointer
in the preceding entry.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#12)
Re: performance issue in remove_from_unowned_list()

On Wed, Mar 13, 2019 at 9:47 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

AFAICS we already maintain a hash table of the smgr relations, and we
look them up in this table. We don't need to look them up in the list of
unowned relations - the whole problem is that with the current
single-linked list, we need to iterate the list anyway to update pointer
in the preceding entry.

OK, I'm dumb. Sorry for the noise.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#10)
1 attachment(s)
Re: performance issue in remove_from_unowned_list()

On 3/12/19 11:54 PM, Tomas Vondra wrote:

On 3/10/19 9:09 PM, Alvaro Herrera wrote:

On 2019-Feb-07, Tomas Vondra wrote:

Attached is a WIP patch removing the optimization from DropRelationFiles
and adding it to smgrDoPendingDeletes. This resolves the issue, at least
in the cases I've been able to reproduce. But maybe we should deal with
this issue earlier by ensuring the two lists are ordered the same way
from the very beginning, somehow.

I noticed that this patch isn't in the commitfest. Are you planning to
push it soon?

I wasn't planning to push anything particularly soon, for two reasons:
Firstly, the issue is not particularly pressing except with non-trivial
number of relations (and I only noticed that during benchmarking).
Secondly, I still have a feeling I'm missing something about b4166911
because for me that commit does not actually fix the issue - i.e. I can
create a lot of relations in a transaction, abort it, and observe that
the replica actually accesses the relations in exactly the wrong order.
So that commit does not seem to actually fix anything.

Attached is a patch adopting the dlist approach - it seems to be working
quite fine, and is a bit cleaner than just slapping another pointer into
the SMgrRelationData struct. So I'd say this is the way to go.

I see b4166911 was actually backpatched to all supported versions, on
the basis that it fixes oversight in 279628a0a7. So I suppose this fix
should also be backpatched.

OK, so here is a bit more polished version of the dlist-based patch.

It's almost identical to what I posted before, except that it:

1) undoes the non-working optimization in DropRelationFiles()

2) removes add_to_unowned_list/remove_from_unowned_list entirely and
just replaces that with dlist_push_tail/dlist_delete

I've originally planned to keep those functions, mostly because the
remove_from_unowned_list comment says this:

- * If the reln is not present in the list, nothing happens.
- * Typically this would be caller error, but there seems no
- * reason to throw an error.

I don't think dlist_delete allows that. But after further inspection of
all places calling those functions, don't think that can happen.

I plan to commit this soon-ish and backpatch it as mentioned before,
because I consider it pretty much a fix for b4166911.

I'm however still mildly puzzled that b4166911 apparently improved the
behavior in some cases, at least judging by [1]/messages/by-id/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com. OTOH there's not much
detail about how it was tested, so I can't quite run it again.

[1]: /messages/by-id/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com
/messages/by-id/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

smgr-fix-20190321.patchtext/x-patch; name=smgr-fix-20190321.patchDownload
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2aba2dfe91..6ed68185ed 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1699,13 +1699,7 @@ DropRelationFiles(RelFileNode *delrels, int ndelrels, bool isRedo)
 
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
-	/*
-	 * Call smgrclose() in reverse order as when smgropen() is called.
-	 * This trick enables remove_from_unowned_list() in smgrclose()
-	 * to search the SMgrRelation from the unowned list,
-	 * with O(1) performance.
-	 */
-	for (i = ndelrels - 1; i >= 0; i--)
+	for (i = 0; i < ndelrels; i++)
 		smgrclose(srels[i]);
 	pfree(srels);
 }
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 0c0bba4ab3..f6de9df9e6 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -18,6 +18,7 @@
 #include "postgres.h"
 
 #include "commands/tablespace.h"
+#include "lib/ilist.h"
 #include "storage/bufmgr.h"
 #include "storage/ipc.h"
 #include "storage/smgr.h"
@@ -97,12 +98,10 @@ static const int NSmgr = lengthof(smgrsw);
  */
 static HTAB *SMgrRelationHash = NULL;
 
-static SMgrRelation first_unowned_reln = NULL;
+static dlist_head	unowned_relns;
 
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
-static void add_to_unowned_list(SMgrRelation reln);
-static void remove_from_unowned_list(SMgrRelation reln);
 
 
 /*
@@ -165,7 +164,7 @@ smgropen(RelFileNode rnode, BackendId backend)
 		ctl.entrysize = sizeof(SMgrRelationData);
 		SMgrRelationHash = hash_create("smgr relation table", 400,
 									   &ctl, HASH_ELEM | HASH_BLOBS);
-		first_unowned_reln = NULL;
+		dlist_init(&unowned_relns);
 	}
 
 	/* Look up or create an entry */
@@ -192,7 +191,7 @@ smgropen(RelFileNode rnode, BackendId backend)
 			reln->md_num_open_segs[forknum] = 0;
 
 		/* it has no owner yet */
-		add_to_unowned_list(reln);
+		dlist_push_tail(&unowned_relns, &reln->node);
 	}
 
 	return reln;
@@ -222,7 +221,7 @@ smgrsetowner(SMgrRelation *owner, SMgrRelation reln)
 	if (reln->smgr_owner)
 		*(reln->smgr_owner) = NULL;
 	else
-		remove_from_unowned_list(reln);
+		dlist_delete(&reln->node);
 
 	/* Now establish the ownership relationship. */
 	reln->smgr_owner = owner;
@@ -246,53 +245,8 @@ smgrclearowner(SMgrRelation *owner, SMgrRelation reln)
 	/* unset our reference to the owner */
 	reln->smgr_owner = NULL;
 
-	add_to_unowned_list(reln);
-}
-
-/*
- * add_to_unowned_list -- link an SMgrRelation onto the unowned list
- *
- * Check remove_from_unowned_list()'s comments for performance
- * considerations.
- */
-static void
-add_to_unowned_list(SMgrRelation reln)
-{
-	/* place it at head of the list (to make smgrsetowner cheap) */
-	reln->next_unowned_reln = first_unowned_reln;
-	first_unowned_reln = reln;
-}
-
-/*
- * remove_from_unowned_list -- unlink an SMgrRelation from the unowned list
- *
- * If the reln is not present in the list, nothing happens.  Typically this
- * would be caller error, but there seems no reason to throw an error.
- *
- * In the worst case this could be rather slow; but in all the cases that seem
- * likely to be performance-critical, the reln being sought will actually be
- * first in the list.  Furthermore, the number of unowned relns touched in any
- * one transaction shouldn't be all that high typically.  So it doesn't seem
- * worth expending the additional space and management logic needed for a
- * doubly-linked list.
- */
-static void
-remove_from_unowned_list(SMgrRelation reln)
-{
-	SMgrRelation *link;
-	SMgrRelation cur;
-
-	for (link = &first_unowned_reln, cur = *link;
-		 cur != NULL;
-		 link = &cur->next_unowned_reln, cur = *link)
-	{
-		if (cur == reln)
-		{
-			*link = cur->next_unowned_reln;
-			cur->next_unowned_reln = NULL;
-			break;
-		}
-	}
+	/* add to list of unowned relations */
+	dlist_push_tail(&unowned_relns, &reln->node);
 }
 
 /*
@@ -319,7 +273,7 @@ smgrclose(SMgrRelation reln)
 	owner = reln->smgr_owner;
 
 	if (!owner)
-		remove_from_unowned_list(reln);
+		dlist_delete(&reln->node);
 
 	if (hash_search(SMgrRelationHash,
 					(void *) &(reln->smgr_rnode),
@@ -812,13 +766,19 @@ smgrpostckpt(void)
 void
 AtEOXact_SMgr(void)
 {
+	dlist_mutable_iter	iter;
+
 	/*
 	 * Zap all unowned SMgrRelations.  We rely on smgrclose() to remove each
 	 * one from the list.
 	 */
-	while (first_unowned_reln != NULL)
+	dlist_foreach_modify(iter, &unowned_relns)
 	{
-		Assert(first_unowned_reln->smgr_owner == NULL);
-		smgrclose(first_unowned_reln);
+		SMgrRelation	rel = dlist_container(SMgrRelationData, node,
+											  iter.cur);
+
+		Assert(rel->smgr_owner == NULL);
+
+		smgrclose(rel);
 	}
 }
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 4d60b28dac..8e98273878 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -14,6 +14,7 @@
 #ifndef SMGR_H
 #define SMGR_H
 
+#include "lib/ilist.h"
 #include "storage/block.h"
 #include "storage/relfilenode.h"
 
@@ -71,7 +72,7 @@ typedef struct SMgrRelationData
 	struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];
 
 	/* if unowned, list link in list of all unowned SMgrRelations */
-	struct SMgrRelationData *next_unowned_reln;
+	dlist_node	node;
 } SMgrRelationData;
 
 typedef SMgrRelationData *SMgrRelation;
#15Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#14)
Re: performance issue in remove_from_unowned_list()

On Thu, Mar 21, 2019 at 02:22:40AM +0100, Tomas Vondra wrote:

...

OK, so here is a bit more polished version of the dlist-based patch.

It's almost identical to what I posted before, except that it:

1) undoes the non-working optimization in DropRelationFiles()

2) removes add_to_unowned_list/remove_from_unowned_list entirely and
just replaces that with dlist_push_tail/dlist_delete

I've originally planned to keep those functions, mostly because the
remove_from_unowned_list comment says this:

- * If the reln is not present in the list, nothing happens.
- * Typically this would be caller error, but there seems no
- * reason to throw an error.

I don't think dlist_delete allows that. But after further inspection of
all places calling those functions, don't think that can happen.

I plan to commit this soon-ish and backpatch it as mentioned before,
because I consider it pretty much a fix for b4166911.

Aaaaaand committed + backpatched.

Show quoted text

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services