Do we need a ShmList implementation?
On the Serializable Snapshot Isolation thread, Heikki pointed out a
collection of objects in an HTAB which didn't really need its key on
VirtualTransactionId, but there isn't really any other useful key,
either. One of these objects may live and die, seeing use from
multiple processes, without ever getting a TransactionId assigned;
and it needs to be in a collection in shared memory the whole time.
This suggests to me that some sort of list would be better.
SHM_QUEUE objects provide the infrastructure for maintaining a
shared memory linked list, but they don't do anything about the
allocation and release of the space for the objects. So it occurs
to me that I'm using an HTAB for this collection because it provides
the infrastructure for managing the memory for the collection,
rather than because I need hash lookup. :-( It works, but that
hardly seems optimal.
Have I missed something we already have which could meet that need?
If not, how would people feel about a ShmList implementation? A
quick first draft for the API (which can almost certainly be
improved, so don't be shy), is:
ShmList ShmInitList(const char *name,
                    Size entrySize,
                    int initialEntryAlloc,
                    int maxExtensions);
Size ShmListEstimateSize(ShmList list);
void *CreateShmListEntry(ShmList list);
void ReleaseShmListEntry(ShmList list, void *entry);
int ShmListSize(ShmList list);
void *ShmListFirst(ShmList list);
void *ShmListNext(ShmList list, void *entry);
I see this as grabbing the initial allocation, filling it with
zeros, and then creating a linked list of available entries.
Internally the entries would be a SHM_QUEUE structure followed by
space for the entrySize passed on init. A "create entry" call would
remove an entry from the available list, link it into the
collection, and return a pointer to the structure. Releasing an
entry would remove it from the collection list, zero it, and link it
to the available list. Hopefully the rest is fairly self-evident --
if not, let me know.
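To make the create/release part a bit more concrete, here is a rough,
untested sketch of what I have in mind internally. Type and field
names are purely illustrative, alignment padding is glossed over, and
locking is assumed to be handled by the caller:

#include "postgres.h"
#include "storage/shmem.h"

typedef struct ShmListEntry
{
    SHM_QUEUE   link;           /* links entry into the in-use or available list */
    /* entrySize bytes of caller-visible data follow immediately */
} ShmListEntry;

typedef struct ShmListData
{
    SHM_QUEUE   inUseList;          /* entries handed out by CreateShmListEntry */
    SHM_QUEUE   availableList;      /* zeroed entries waiting to be handed out */
    Size        entrySize;          /* caller-visible size of one entry */
    int         initialEntryAlloc;  /* entries allocated up front */
    int         maxExtensions;      /* further batches we might be allowed to grab */
} ShmListData;

typedef ShmListData *ShmList;

void *
CreateShmListEntry(ShmList list)
{
    ShmListEntry *entry;

    /* Take the first free entry, if any, and move it to the in-use list. */
    entry = (ShmListEntry *)
        SHMQueueNext(&list->availableList, &list->availableList,
                     offsetof(ShmListEntry, link));
    if (entry == NULL)
        return NULL;            /* all entries in use; caller decides what to do */
    SHMQueueDelete(&entry->link);
    SHMQueueInsertBefore(&list->inUseList, &entry->link);
    return (void *) (entry + 1);    /* caller's space starts just past the header */
}

void
ReleaseShmListEntry(ShmList list, void *callerEntry)
{
    ShmListEntry *entry = ((ShmListEntry *) callerEntry) - 1;

    SHMQueueDelete(&entry->link);
    memset(callerEntry, 0, list->entrySize);    /* hand it back clean */
    SHMQueueInsertBefore(&list->availableList, &entry->link);
}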
Thoughts?
-Kevin
On 20/09/10 18:12, Kevin Grittner wrote:
On the Serializable Snapshot Isolation thread, Heikki pointed out a
collection of objects in an HTAB which didn't really need its key on
VirtualTransactionId, but there isn't really any other useful key,
either. One of these objects may live and die, seeing use from
multiple processes, without ever getting a TransactionId assigned;
and it needs to be in a collection in shared memory the whole time.
This suggests to me that some sort of list would be better.
In the SSI patch, you'd also need a way to insert an existing struct
into a hash table. You currently work around that by using a hash
element that contains only the hash key, and a pointer to the
SERIALIZABLEXACT struct. It isn't too bad I guess, but I find it a bit
confusing.
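For reference, the workaround is roughly this (paraphrasing the structs
in the patch) -- a hash element that is nothing but the key plus a
pointer back to the struct you actually care about:

typedef struct SERIALIZABLEXIDTAG
{
    TransactionId xid;
} SERIALIZABLEXIDTAG;

typedef struct SERIALIZABLEXID
{
    /* hash key */
    SERIALIZABLEXIDTAG tag;

    /* data: the struct we really wanted to reach */
    SERIALIZABLEXACT *myXact;
} SERIALIZABLEXID;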
SHM_QUEUE objects provide the infrastructure for maintaining a
shared memory linked list, but they don't do anything about the
allocation and release of the space for the objects. So it occurs
to me that I'm using an HTAB for this collection because it provides
the infrastructure for managing the memory for the collection,
rather than because I need hash lookup. :-( It works, but that
hardly seems optimal.
Have I missed something we already have which could meet that need?
Well, we generally try to avoid dynamic structures in shared memory,
because shared memory can't be resized. So, you'd typically use an array
with a fixed number of elements. One could even argue that we
specifically *don't* want to have the kind of infrastructure you
propose, to discourage people from writing patches that need dynamic
shmem structures.
Any chance of collapsing together entries of already-committed
transactions in the SSI patch, to put an upper limit on the number of
shmem list entries needed? If you can do that, then a simple array
allocated at postmaster startup will do fine.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Kevin,
On 09/20/2010 05:12 PM, Kevin Grittner wrote:
SHM_QUEUE objects provide the infrastructure for maintaining a
shared memory linked list, but they don't do anything about the
allocation and release of the space for the objects.
Did you have a look at my dynshmem stuff? It tries to solve the problem
of dynamic allocation from shared memory. Not just for lists, but very
generally.
Regards
Markus Wanner
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
In the SSI patch, you'd also need a way to insert an existing
struct into a hash table. You currently work around that by using
a hash element that contains only the hash key, and a pointer to
the SERIALIZABLEXACT struct. It isn't too bad I guess, but I find
it a bit confusing.
Hmmm... Mucking with the hash table implementation to accommodate
that seems like it's a lot of work and risk for pretty minimal
benefit. Are you sure it's worth it? Perhaps better commenting
around the SERIALIZABLEXID structure to indicate it's effectively
used as a non-primary-key index into the other collection?
Well, we generally try to avoid dynamic structures in shared
memory, because shared memory can't be resized.
But don't HTAB structures go beyond their estimated sizes as needed?
I was trying to accommodate the situation where one collection
might not be anywhere near its limit, but some other collection has
edged past. Unless I'm misunderstanding things (which is always
possible), the current HTAB implementation takes advantage of the
"slush fund" of unused space to some degree. I was just trying to
maintain the same flexibility with the list.
I was thinking of returning a size based on the *maximum* allowed
allocations from the estimated size function, and actually limiting
it to that size. So it wasn't so much a matter of grabbing more
than expected, but leaving something for the hash table slush if
possible. Of course I was also thinking that this would allow one
to be a little bit more generous with the maximum, as it might have
benefit elsewhere...
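As a hypothetical sketch of that estimate function -- assuming the
list remembers its init parameters and that each extension would add
another initialEntryAlloc entries -- using the usual overflow-checked
add_size/mul_size helpers:

Size
ShmListEstimateSize(ShmList list)
{
    /* Report the worst case: the header plus every entry we could ever allocate. */
    long    maxEntries = (long) list->initialEntryAlloc * (list->maxExtensions + 1);
    Size    size = MAXALIGN(sizeof(ShmListData));

    size = add_size(size,
                    mul_size((Size) maxEntries,
                             MAXALIGN(sizeof(SHM_QUEUE)) + MAXALIGN(list->entrySize)));
    return size;
}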
So, you'd typically use an array with a fixed number of elements.
That's certainly a little easier, if you think it's better.
Any chance of collapsing together entries of already-committed
transactions in the SSI patch, to put an upper limit on the number
of shmem list entries needed? If you can do that, then a simple
array allocated at postmaster startup will do fine.
I suspect it can be done, but I'm quite sure that any such scheme
would increase the rate of serialization failures. Right now I'm
trying to see how much I can do to *decrease* the rate of
serialization failures, so I'm not eager to go there. :-/ If it is
necessary, the most obvious way to manage this is just to force
cancellation of the oldest running serializable transaction and
running ClearOldPredicateLocks(), perhaps iterating, until we free
an entry to service the new request.
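Very roughly, and with made-up names (nothing like
CancelOldestSerializableXact exists today; ClearOldPredicateLocks is
the existing cleanup routine, and serializableXactList is hypothetical):

    sxact = (SERIALIZABLEXACT *) CreateShmListEntry(serializableXactList);
    while (sxact == NULL)
    {
        CancelOldestSerializableXact();     /* force the oldest active one to roll back */
        ClearOldPredicateLocks();           /* reclaim whatever entries it was pinning */
        sxact = (SERIALIZABLEXACT *) CreateShmListEntry(serializableXactList);
    }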
-Kevin
Markus Wanner <markus@bluegap.ch> wrote:
On 09/20/2010 05:12 PM, Kevin Grittner wrote:
SHM_QUEUE objects provide the infrastructure for maintaining a
shared memory linked list, but they don't do anything about the
allocation and release of the space for the objects.
Did you have a look at my dynshmem stuff? It tries to solve the
problem of dynamic allocation from shared memory. Not just for
lists, but very generally.
Yeah, I mostly followed that thread. If such a feature was present,
it might well make sense to use it for this; however, I've got
enough trouble selling the SSI technology without making it
dependent on something else which was clearly quite controversial,
and which seemed to have some technical hurdles of its own left to
clear. :-/
At the point where there is an implementation which is accepted by
the community, I'll certainly take another look.
-Kevin
On Mon, 2010-09-20 at 18:37 +0300, Heikki Linnakangas wrote:
SHM_QUEUE objects provide the infrastructure for maintaining a
shared memory linked list, but they don't do anything about the
allocation and release of the space for the objects. So it occurs
to me that I'm using an HTAB for this collection because it provides
the infrastructure for managing the memory for the collection,
rather than because I need hash lookup. :-( It works, but that
hardly seems optimal.
Have I missed something we already have which could meet that need?
Well, we generally try to avoid dynamic structures in shared memory,
because shared memory can't be resized. So, you'd typically use an array
with a fixed number of elements. One could even argue that we
specifically *don't* want to have the kind of infrastructure you
propose, to discourage people from writing patches that need dynamic
shmem structures.
My understanding is that we used to have that and it was removed for the
reasons Heikki states. There are still vestigial bits in the code.
Not exactly impressed with the SHM_QUEUE stuff though, so I appreciate
the sentiment that Kevin expresses.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> wrote:
My understanding is that we used to have that and it was removed
for the reasons Heikki states. There are still vestigial bits in
the code.
Not exactly impressed with the SHM_QUEUE stuff though, so I
appreciate the sentiment that Kevin expresses.
So, if I just allocated a fixed memory space to provide an API
similar to my previous post, does that sound reasonable to you? For
the record, my intention would be to hide the SHM_QUEUE structures
in this API -- an entry would be just the structure you're
interested in working with. If practical, I would prefer for
ShmList to be a pointer to an opaque structure; users of this
shouldn't really be exposed to or depend upon the implementation.
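Roughly, the header (file name hypothetical) would expose only an
opaque handle, with the struct body living in the one .c file that
implements it:

/* shmlist.h -- callers see only an opaque handle */
typedef struct ShmListData *ShmList;

extern ShmList ShmInitList(const char *name, Size entrySize,
                           int initialEntryAlloc, int maxExtensions);
extern Size ShmListEstimateSize(ShmList list);
extern void *CreateShmListEntry(ShmList list);
extern void ReleaseShmListEntry(ShmList list, void *entry);
extern int ShmListSize(ShmList list);
extern void *ShmListFirst(ShmList list);
extern void *ShmListNext(ShmList list, void *entry);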
-Kevin
On 20/09/10 19:04, Kevin Grittner wrote:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
In the SSI patch, you'd also need a way to insert an existing
struct into a hash table. You currently work around that by using
a hash element that contains only the hash key, and a pointer to
the SERIALIZABLEXACT struct. It isn't too bad I guess, but I find
it a bit confusing.
Hmmm... Mucking with the hash table implementation to accommodate
that seems like it's a lot of work and risk for pretty minimal
benefit. Are you sure it's worth it?
No, I'm not sure at all.
Well, we generally try to avoid dynamic structures in shared
memory, because shared memory can't be resized.
But don't HTAB structures go beyond their estimated sizes as needed?
Yes, but not in a very smart way. The memory allocated for hash table
elements is never freed. So if you use up all the "slush fund" shared
memory for SIREAD locks, it can't be used for anything else anymore,
even if the SIREAD locks are later released.
Any chance of collapsing together entries of already-committed
transactions in the SSI patch, to put an upper limit on the number
of shmem list entries needed? If you can do that, then a simple
array allocated at postmaster startup will do fine.
I suspect it can be done, but I'm quite sure that any such scheme
would increase the rate of serialization failures. Right now I'm
trying to see how much I can do to *decrease* the rate of
serialization failures, so I'm not eager to go there. :-/
I see. It's worth spending some mental power on; an upper limit would
make life a lot easier. It doesn't matter much if it's 2*max_connections
or 100*max_connections, as long as it's finite.
If it is
necessary, the most obvious way to manage this is just to force
cancellation of the oldest running serializable transaction and
running ClearOldPredicateLocks(), perhaps iterating, until we free
an entry to service the new request.
Hmm, that's not very appealing either. But perhaps it's still better
than not letting any new transactions begin. We could say "snapshot
too old" in the error message :-).
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Simon Riggs <simon@2ndQuadrant.com> wrote:
My understanding is that we used to have that and it was removed
for the reasons Heikki states. There are still vestigial bits in
the code.
There's nothing vestigial about SHM_QUEUE --- it's used by the lock
manager. But it's intended to link together structs whose existence
is managed by somebody else.
Not exactly impressed with the SHM_QUEUE stuff though, so I
appreciate the sentiment that Kevin expresses.
So, if I just allocated a fixed memory space to provide an API
similar to my previous post, does that sound reasonable to you?
I'm not excited about inventing an API with just one use-case; it's
unlikely that you actually end up with anything generally useful.
(SHM_QUEUE seems like a case in point...) Especially when there are so
many other constraints on what shared memory is usable for. You might
as well just do this internally to the SERIALIZABLEXACT management code.
regards, tom lane
On Mon, 2010-09-20 at 12:35 -0400, Tom Lane wrote:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Simon Riggs <simon@2ndQuadrant.com> wrote:
My understanding is that we used to have that and it was removed
for the reasons Heikki states. There are still vestigial bits in
the code.
There's nothing vestigial about SHM_QUEUE --- it's used by the lock
manager.
Yes, I was talking about an implementation that allocated memory as
well. There are sections of IFDEF'd out code there...
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Tom Lane <tgl@sss.pgh.pa.us> wrote:
There's nothing vestigial about SHM_QUEUE --- it's used by the
lock manager. But it's intended to link together structs whose
existence is managed by somebody else.
Yep, that's exactly my problem.
I'm not excited about inventing an API with just one use-case;
it's unlikely that you actually end up with anything generally
useful. (SHM_QUEUE seems like a case in point...) Especially
when there are so many other constraints on what shared memory is
usable for. You might as well just do this internally to the
SERIALIZABLEXACT management code.
Fair enough. I'll probably abstract it within the SSI patch anyway,
just because it will keep the other code cleaner where the logic is
necessarily kinda messy anyway, and I think it'll reduce the chance
of weird memory bugs. I just won't get quite so formal about the
interface.
-Kevin
On 09/20/2010 06:09 PM, Kevin Grittner wrote:
Yeah, I mostly followed that thread. If such a feature was present,
it might well make sense to use it for this; however, I've got
enough trouble selling the SSI technology without making it
dependent on something else which was clearly quite controversial,
and which seemed to have some technical hurdles of its own left to
clear. :-/
Okay, that's understandable. I'm wondering how you want to implement the
memory allocation part, though.
At the point where there is an implementation which is accepted by
the community, I'll certainly take another look.
Fair enough, thanks.
Regards
Markus Wanner
Markus Wanner <markus@bluegap.ch> wrote:
I'm wondering how you want to implement the memory allocation part
Based on the feedback I've received, it appears that the only sane
way to do that in the current shared memory environment is to
allocate a fixed size of memory to hold these entries on postmaster
startup. To minimize the chance that we'll be forced to cancel
running transactions to deal with the limit, it will need to be
sized to some multiple of max_connections.
Obviously, if there were a dynamic way to add to the entries as
needed, there would be one less setting (hard-coded or GUC) to worry
about getting right. Too low means transactions need to be
canceled. Too high means you're wasting space which could otherwise
go to caching. And of course, the optimal number could change from
day to day or hour to hour.
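Concretely, something along these lines at postmaster startup, using
the earlier draft API just for illustration (the multiplier of 10 is a
placeholder for whatever constant or setting we end up with):

    serializableXactList = ShmInitList("Serializable Xact List",
                                       sizeof(SERIALIZABLEXACT),
                                       MaxBackends * 10,    /* some multiple of max_connections */
                                       0);                  /* no dynamic extension */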
-Kevin
On 09/20/2010 08:06 PM, Kevin Grittner wrote:
Obviously, if there were a dynamic way to add to the entries as
needed, there would be one less setting (hard-coded or GUC) to worry
about getting right. Too low means transactions need to be
canceled. Too high means you're wasting space which could otherwise
go to caching. And of course, the optimal number could change from
day to day or hour to hour.
Yeah, same problem as with lots of the other users of shared memory.
It certainly makes sense to decouple the two projects, so you'll have to
pick some number that sounds good to you now.
Regards
Markus
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
I'm not excited about inventing an API with just one use-case;
it's unlikely that you actually end up with anything generally
useful. (SHM_QUEUE seems like a case in point...) Especially
when there are so many other constraints on what shared memory is
usable for. You might as well just do this internally to the
SERIALIZABLEXACT management code.
Fair enough. I'll probably abstract it within the SSI patch
anyway, just because it will keep the other code cleaner where the
logic is necessarily kinda messy anyway, and I think it'll reduce
the chance of weird memory bugs. I just won't get quite so formal
about the interface.
OK, I'd say it's a little rough yet, but it works. Is this
reasonable?:
In particular, I'm a little squeamish about how I allocated the
shared memory for the list, but I couldn't think of anything that
seemed better.
-Kevin
Hi,
On 09/21/2010 05:48 PM, Kevin Grittner wrote:
OK, I'd say it's a little rough yet, but it works. Is this
reasonable?:
I only get a: "404 - Unknown commit object" on that link. Did you push
your work?
Regards
Markus Wanner
Markus Wanner <markus@bluegap.ch> wrote:
I only get a: "404 - Unknown commit object" on that link. Did you
push your work?
Yeah, but it has since been blown away (at my request) as part of my
attempt to get it based on the new git conversion. Sorry about
that. Attached is an mbox representation of what was there.
Hopefully I can get it out there again today some time.
-Kevin
Attachments:
xactlist.mbox (application/octet-stream)
From e29927c7966adba2443fdc4f64da9d282f95a05b Mon Sep 17 00:00:00 2001
From: Kevin Grittner <Kevin.Grittner@wicourts.gov>
Date: Sat, 18 Sep 2010 11:08:40 -0500
Subject: [PATCH 1460/1461] Rework techniques for getting to top level transaction when using
an XID from MVCC data, based on concerns from Heikki and Tom. Use
SubTransGetTopmostTransaction, checking both before and after for
a TransactionId before TransactionXmin, which would indicate a
transaction too early to overlap, and therefore not able to
generate a conflict with the current transaction.
---
src/backend/access/transam/varsup.c | 4 +
src/backend/storage/lmgr/predicate.c | 146 ++++++++++++------------------
src/include/storage/predicate.h | 1 +
src/include/storage/predicate_internal.h | 15 ++--
4 files changed, 70 insertions(+), 96 deletions(-)
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index efe11a9..8811e24 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -20,6 +20,7 @@
#include "miscadmin.h"
#include "postmaster/autovacuum.h"
#include "storage/pmsignal.h"
+#include "storage/predicate.h"
#include "storage/proc.h"
#include "utils/builtins.h"
#include "utils/syscache.h"
@@ -156,9 +157,12 @@ GetNewTransactionId(bool isSubXact)
* holds 32K or more transactions, so we don't have to do this very often.
*
* Extend pg_subtrans too.
+ * If it's top level, the predicate locking system also needs to know.
*/
ExtendCLOG(xid);
ExtendSUBTRANS(xid);
+ if (!isSubXact)
+ RegisterPredicateLockingXid(xid);
/*
* Now advance the nextXid counter. This must not happen until after we
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 76e1277..974b95c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -116,6 +116,7 @@
*
* predicate lock maintenance
* RegisterSerializableTransaction(Snapshot snapshot)
+ * RegisterPredicateLockingXid(void)
* PredicateLockRelation(Relation relation)
* PredicateLockPage(Relation relation, BlockNumber blkno)
* PredicateLockTuple(Relation relation, HeapTuple tuple)
@@ -137,6 +138,7 @@
#include "postgres.h"
+#include "access/subtrans.h"
#include "access/transam.h"
#include "access/twophase.h"
#include "access/xact.h"
@@ -293,7 +295,6 @@ static bool GetParentPredicateLockTag(const PREDICATELOCKTARGETTAG *tag,
PREDICATELOCKTARGETTAG *parent);
static void DecrementParentLocks(const PREDICATELOCKTARGETTAG *targettag);
static void PredicateLockAcquire(const PREDICATELOCKTARGETTAG *tag);
-static void EnsureMySerializableXidExists(void);
static void ClearOldPredicateLocks(void);
static bool XidIsConcurrent(TransactionId xid);
static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
@@ -612,7 +613,6 @@ RegisterSerializableTransaction(const Snapshot snapshot)
sxact->finishedBefore = InvalidTransactionId;
sxact->xmin = snapshot->xmin;
SHMQueueInit(&(sxact->predicateLocks));
- SHMQueueInit(&(sxact->xids));
SHMQueueElemInit(&(sxact->finishedLink));
sxact->rolledBack = false;
LWLockRelease(SerializableXactHashLock);
@@ -632,55 +632,50 @@ RegisterSerializableTransaction(const Snapshot snapshot)
}
/*
- * Make sure we have a SERIALIZABLEXACT reference in MySerializableXact.
- * It should be current for this process and be contained in
- * SerializableXidHash.
+ * Register the top level XID in SerializableXidHash.
+ * Also store it for easy reference in MySerializableXact.
*/
-static void
-EnsureMySerializableXidExists(void)
+void
+RegisterPredicateLockingXid(const TransactionId xid)
{
- TransactionId xid;
-
- Assert(MySerializableXact != InvalidSerializableXact);
-
- MySerializableXact->topXid = GetTopTransactionIdIfAny();
+ SERIALIZABLEXIDTAG sxidtag;
+ SERIALIZABLEXID *sxid;
+ bool found;
/*
- * If this isn't the xid we've most recently seen for this vxid, make sure
- * it's in the hash table.
+ * If we're not tracking predicate lock data for this transaction,
+ * we should ignore the request and return quickly.
*/
- xid = GetCurrentTransactionIdIfAny();
- if (MyXid != xid)
- {
- SERIALIZABLEXIDTAG sxidtag;
- SERIALIZABLEXID *sxid;
- bool found;
+ if (MySerializableXact == InvalidSerializableXact)
+ return;
- Assert(TransactionIdIsValid(xid));
+ /* This should only be done once per transaction. */
+ Assert(MySerializableXact->topXid == InvalidTransactionId);
- sxidtag.xid = xid;
- LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
- sxid = (SERIALIZABLEXID *) hash_search(SerializableXidHash,
- &sxidtag,
- HASH_ENTER, &found);
- if (!sxid)
- ereport(ERROR,
- (errcode(ERRCODE_OUT_OF_MEMORY),
- errmsg("out of shared memory"),
- errhint("You might need to increase max_predicate_locks_per_transaction.")));
+ /* We should have a valid XID and be at the top level. */
+ Assert(TransactionIdIsValid(xid));
- /* Initialize the structure. */
- if (!found)
- {
- sxid->myXact = (SERIALIZABLEXACT *) MySerializableXact;
- SHMQueueInsertBefore(&(((SERIALIZABLEXACT *) MySerializableXact)->xids),
- &(sxid->xactLink));
- }
- LWLockRelease(SerializableXactHashLock);
- MyXid = xid;
- }
+ MySerializableXact->topXid = xid;
+
+ sxidtag.xid = xid;
+ LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ sxid = (SERIALIZABLEXID *) hash_search(SerializableXidHash,
+ &sxidtag,
+ HASH_ENTER, &found);
+ if (!sxid)
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of shared memory"),
+ errhint("You might need to increase max_predicate_locks_per_transaction.")));
+
+ Assert(!found);
+
+ /* Initialize the structure. */
+ sxid->myXact = (SERIALIZABLEXACT *) MySerializableXact;
+ LWLockRelease(SerializableXactHashLock);
}
+
/*
* Check whether there are any predicate locks held by any transaction
* for the page at the given block number.
@@ -1053,8 +1048,6 @@ PredicateLockAcquire(const PREDICATELOCKTARGETTAG *targettag)
PREDICATELOCK *lock;
LOCALPREDICATELOCK *locallock;
- EnsureMySerializableXidExists();
-
/* Do we have the lock already, or a covering lock? */
if (PredicateLockExists(targettag))
return;
@@ -1201,28 +1194,22 @@ PredicateLockTuple(const Relation relation, const HeapTuple tuple)
return;
/*
- * If it's a heap tuple, return if this xact wrote it. It might be useful
- * to pass in the xmin from the tuple as another parameter.
+ * If it's a heap tuple, return if this xact wrote it.
*/
if (relation->rd_index == NULL)
{
- SERIALIZABLEXIDTAG sxidtag;
- SERIALIZABLEXID *sxid;
-
- sxidtag.xid = HeapTupleHeaderGetXmin(tuple->t_data);
- LWLockAcquire(SerializableXactHashLock, LW_SHARED);
- sxid = (SERIALIZABLEXID *)
- hash_search(SerializableXidHash, &sxidtag, HASH_FIND, NULL);
- if (sxid)
+ TransactionId xid;
+
+ xid = HeapTupleHeaderGetXmin(tuple->t_data);
+ if (TransactionIdFollowsOrEquals(xid, TransactionXmin))
{
- if (sxid->myXact == MySerializableXact)
+ xid = SubTransGetTopmostTransaction(xid);
+ if (xid == GetTopTransactionIdIfAny())
{
/* We wrote it; we already have a write lock. */
- LWLockRelease(SerializableXactHashLock);
return;
}
}
- LWLockRelease(SerializableXactHashLock);
}
tid = &(tuple->t_self);
@@ -1731,7 +1718,7 @@ static void
ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact)
{
PREDICATELOCK *predlock;
- SERIALIZABLEXID *sxid;
+ SERIALIZABLEXIDTAG sxidtag;
Assert(sxact != NULL);
Assert(sxact->rolledBack || SxactIsCommitted(sxact));
@@ -1783,31 +1770,13 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact)
}
LWLockRelease(SerializablePredicateLockListLock);
- /* Get rid of the xids and the record of the transaction itself. */
- LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
- sxid = (SERIALIZABLEXID *)
- SHMQueueNext(&(sxact->xids),
- &(sxact->xids),
- offsetof(SERIALIZABLEXID, xactLink));
- while (sxid)
- {
- SERIALIZABLEXID *nextsxid;
- SERIALIZABLEXIDTAG tag;
-
- nextsxid = (SERIALIZABLEXID *)
- SHMQueueNext(&(sxact->xids),
- &(sxid->xactLink),
- offsetof(SERIALIZABLEXID, xactLink));
- tag = sxid->tag;
- hash_search(SerializableXidHash, &tag, HASH_REMOVE, NULL);
-
- /*
- * No need to do retail removal from transaction object; it's going
- * away.
- */
- sxid = nextsxid;
- }
SHMQueueDelete(&(sxact->finishedLink));
+
+ /* Get rid of the xid and the record of the transaction itself. */
+ sxidtag.xid = sxact->topXid;
+ LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ if (sxidtag.xid != InvalidTransactionId)
+ hash_search(SerializableXidHash, &sxidtag, HASH_REMOVE, NULL);
hash_search(SerializableXactHash, &(sxact->tag), HASH_REMOVE, NULL);
LWLockRelease(SerializableXactHashLock);
}
@@ -1903,6 +1872,13 @@ CheckForSerializableConflictOut(const bool valid, const Relation relation,
xid = HeapTupleHeaderGetXmin(tuple->t_data);
}
+ /* Bail out if xid is too early to be a conflict. */
+ if (TransactionIdPrecedes(xid, TransactionXmin))
+ return;
+ xid = SubTransGetTopmostTransaction(xid);
+ if (TransactionIdPrecedes(xid, TransactionXmin))
+ return;
+
/*
* It's OK to look for conflicts with a share lock, and record them with
* an exclusive lock when found; we just have to release the shared lock
@@ -1949,12 +1925,6 @@ CheckForSerializableConflictOut(const bool valid, const Relation relation,
sxacttag = sxact->tag;
LWLockRelease(SerializableXactHashLock);
- /*
- * Make sure we have somewhere to record a conflict against this
- * transaction.
- */
- EnsureMySerializableXidExists();
-
LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
sxact = (SERIALIZABLEXACT *)
hash_search(SerializableXactHash, &sxacttag, HASH_FIND, NULL);
@@ -2184,8 +2154,6 @@ CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple,
if (SkipSerialization(relation))
return;
- EnsureMySerializableXidExists();
-
/*
* It is important that we check for locks from the finest granularity to
* the coarsest granularity, so that granularity promotion doesn't cause
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 7dcc2af..2cc20ab 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -37,6 +37,7 @@ extern bool PageIsPredicateLocked(const Relation relation, const BlockNumber blk
/* predicate lock maintenance */
extern void RegisterSerializableTransaction(const Snapshot snapshot);
+extern void RegisterPredicateLockingXid(const TransactionId xid);
extern void PredicateLockRelation(const Relation relation);
extern void PredicateLockPage(const Relation relation, const BlockNumber blkno);
extern void PredicateLockTuple(const Relation relation, const HeapTuple tuple);
diff --git a/src/include/storage/predicate_internal.h b/src/include/storage/predicate_internal.h
index 7cdb5af..d6a2b8c 100644
--- a/src/include/storage/predicate_internal.h
+++ b/src/include/storage/predicate_internal.h
@@ -74,7 +74,6 @@ typedef struct SERIALIZABLEXACT
* serializable xids are before this. */
TransactionId xmin; /* the transaction's snapshot xmin */
SHM_QUEUE predicateLocks; /* list of associated PREDICATELOCK objects */
- SHM_QUEUE xids; /* list of associated SERIALIZABLEXID objects */
SHM_QUEUE finishedLink; /* list link in
* FinishedSerializableTransactions */
bool rolledBack; /* ignore conflicts when true; allows deferred
@@ -93,8 +92,8 @@ typedef struct SERIALIZABLEXIDTAG
/*
* The SERIALIZABLEXID struct provides a link from a TransactionId for a
- * serializable transaction or subtransaction to the related SERIALIZABLEXACT
- * record.
+ * serializable transaction to the related SERIALIZABLEXACT record, even if
+ * the transaction has completed and its connection has been closed.
*
* A hash table of these objects is maintained in shared memory to provide a
* quick way to find the top level transaction information for a serializable
@@ -104,8 +103,12 @@ typedef struct SERIALIZABLEXIDTAG
* allows a fast link from MVCC transaction IDs to the related serializable
* transaction hash table entry.
*
- * These are created as the transaction and its subtransactions write data,
- * and are kept until the cleanup of the top level transaction.
+ * These are created as new top level transaction IDs are first assigned to
+ * transactions which are participating in predicate locking. They are
+ * removed with their related serializable transaction objects.
+ *
+ * The SubTransGetTopmostTransaction method is used where necessary to get
+ * from an XID which might be from a subtransaction to the top level XID.
*/
typedef struct SERIALIZABLEXID
{
@@ -114,8 +117,6 @@ typedef struct SERIALIZABLEXID
/* data */
SERIALIZABLEXACT *myXact; /* pointer to the top level transaction data */
- SHM_QUEUE xactLink; /* list link in SERIALIZABLEXACT's list of
- * xids */
} SERIALIZABLEXID;
--
1.6.3.3
From b8eca245ab63725d0fbfc3b5969f4a17fc765f2c Mon Sep 17 00:00:00 2001
From: Kevin Grittner <Kevin.Grittner@wicourts.gov>
Date: Tue, 21 Sep 2010 10:11:26 -0500
Subject: [PATCH 1461/1461] Replace the hash table for SERIALIZABLEXACT objects with a list.
The hash table was really only being used for memory allocation
purposes; it was the SHM_QUEUE objects forming lists which really
mattered. That seemed inefficient and overly hacky, so a simple
shared memory list implementation using SHM_QUEUE was created.
The use of SHM_QUEUE is hidden, and all access is through four
small functions, but no effort was made to really turn the list
into a proper "black box" or generalize it.
---
src/backend/storage/lmgr/predicate.c | 186 +++++++++++++++++++++++-------
src/include/storage/predicate_internal.h | 32 +++++
2 files changed, 177 insertions(+), 41 deletions(-)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 974b95c..44c86e0 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -257,10 +257,21 @@ TransactionId SerializableGlobalXmin = InvalidTransactionId;
int SerializableGlobalXminCount = 0;
/*
+ * This provides a list of objects in order to track transactions
+ * participating in predicate locking. Entries in the list are fixed size,
+ * and reside in shared memory. The memory address of an entry must remain
+ * fixed during its lifetime. The list will be protected from concurrent
+ * update externally; no provision is made in this code to manage that. The
+ * number of entries in the list, and the size allowed for each entry is
+ * fixed upon creation.
+ */
+static PredTranList PredLockTranList;
+
+
+/*
* The predicate locking hash tables are in shared memory.
* Each backend keeps pointers to them.
*/
-static HTAB *SerializableXactHash;
static HTAB *SerializableXidHash;
static HTAB *PredicateLockTargetHash;
static HTAB *PredicateLockHash;
@@ -275,16 +286,19 @@ static HTAB *LocalPredicateLockHash = NULL;
/*
* Keep a pointer to the currently-running serializable transaction (if any)
* for quick reference.
+ * TODO SSI: Remove volatile qualifier and the then-unnecessary casts?
*/
static volatile SERIALIZABLEXACT *MySerializableXact = InvalidSerializableXact;
-/* TODO SSI: Remove volatile qualifier and the then-unnecessary casts? */
-
-/* The most recently used xid within this transaction, for optimizations. */
-static TransactionId MyXid = InvalidTransactionId;
/* local functions */
+
+static SERIALIZABLEXACT *CreatePredTran(void);
+static void ReleasePredTran(SERIALIZABLEXACT *sxact);
+static SERIALIZABLEXACT *FirstPredTran(void);
+static SERIALIZABLEXACT *NextPredTran(SERIALIZABLEXACT *sxact);
+
static uint32 predicatelock_hash(const void *key, Size keysize);
static void ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact);
static bool PredicateLockExists(const PREDICATELOCKTARGETTAG *newtargettag);
@@ -302,6 +316,78 @@ static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
const SERIALIZABLEXACT *writer);
+
+/*
+ * These functions are a simple implementation of a list for this specific
+ * type of struct. If there is ever a generalized shared memory list, we
+ * should probably switch to that.
+ */
+static SERIALIZABLEXACT *
+CreatePredTran(void)
+{
+ PredTranListElement ptle;
+
+ ptle = (PredTranListElement)
+ SHMQueueNext(&PredLockTranList->availableList,
+ &PredLockTranList->availableList,
+ offsetof(PredTranListElementData, link));
+ SHMQueueDelete(&ptle->link);
+ SHMQueueInsertBefore(&PredLockTranList->activeList, &ptle->link);
+ return &ptle->sxact;
+}
+
+static void
+ReleasePredTran(SERIALIZABLEXACT *sxact)
+{
+ PredTranListElement ptle;
+
+ Assert(ShmemAddrIsValid(sxact));
+
+ ptle = (PredTranListElement)
+ (((char *) sxact)
+ - offsetof(PredTranListElementData, sxact)
+ + offsetof(PredTranListElementData, link));
+ SHMQueueDelete(&ptle->link);
+ SHMQueueInsertBefore(&PredLockTranList->availableList, &ptle->link);
+}
+
+static SERIALIZABLEXACT *
+FirstPredTran(void)
+{
+ PredTranListElement ptle;
+
+ ptle = (PredTranListElement)
+ SHMQueueNext(&PredLockTranList->activeList,
+ &PredLockTranList->activeList,
+ offsetof(PredTranListElementData, link));
+ if (!ptle)
+ return NULL;
+
+ return &ptle->sxact;
+}
+
+static SERIALIZABLEXACT *
+NextPredTran(SERIALIZABLEXACT *sxact)
+{
+ PredTranListElement ptle;
+
+ Assert(ShmemAddrIsValid(sxact));
+
+ ptle = (PredTranListElement)
+ (((char *) sxact)
+ - offsetof(PredTranListElementData, sxact)
+ + offsetof(PredTranListElementData, link));
+ ptle = (PredTranListElement)
+ SHMQueueNext(&PredLockTranList->activeList,
+ &ptle->link,
+ offsetof(PredTranListElementData, link));
+ if (!ptle)
+ return NULL;
+
+ return &ptle->sxact;
+}
+
+
/*
* InitPredicateLocks -- Initialize the predicate locking data structures.
*
@@ -319,6 +405,7 @@ InitPredicateLocks(void)
int hash_flags;
long init_table_size,
max_table_size;
+ Size requestSize;
bool found;
/*
@@ -375,25 +462,46 @@ InitPredicateLocks(void)
init_table_size = max_table_size / 2;
/*
- * Allocate hash table for SERIALIZABLEXACT structs. This stores per-vxid
- * information for serializable transactions which have accessed data.
+ * Allocate a list to hold information on transactions participating in
+ * predicate locking.
+ *
+ * Assume an average of 10 predicate locking transactions per backend.
+ * That may seem high, but each transaction must be kept until every
+ * overlapping predicate locking transaction has completed, so we have
+ * to tolerate the occasional long-running transaction.
*/
- MemSet(&info, 0, sizeof(info));
- info.keysize = sizeof(SERIALIZABLEXACTTAG);
- info.entrysize = sizeof(SERIALIZABLEXACT);
- info.hash = tag_hash;
- hash_flags = (HASH_ELEM | HASH_FUNCTION);
-
- SerializableXactHash = ShmemInitHash("SERIALIZABLEXACT hash",
- init_table_size,
- max_table_size,
- &info,
- hash_flags);
-
- /* Assume an average of 10 serializable xids per backend. */
max_table_size *= 10;
init_table_size *= 10;
+ PredLockTranList = ShmemInitStruct("PredTranList",
+ PredTranListDataSize,
+ &found);
+ if (!found)
+ {
+ int i;
+
+ SHMQueueInit(&PredLockTranList->availableList); // available elements
+ SHMQueueInit(&PredLockTranList->activeList); // active elements
+ requestSize = mul_size((Size) max_table_size,
+ PredTranListElementDataSize);
+ PredLockTranList->element = ShmemAlloc(requestSize);
+ if (PredLockTranList->element == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("not enough shared memory for elements of data structure"
+ " \"%s\" (%lu bytes requested)",
+ "PredTranList", (unsigned long) requestSize)));
+ PredLockTranList->entryLimit = max_table_size; // get from GUC?
+ PredLockTranList->entryCount = 0; // starts empty
+ /* Add all elements to available list, clean. */
+ memset(PredLockTranList->element, 0, requestSize);
+ for (i = 0; i < max_table_size; i++)
+ {
+ SHMQueueInsertBefore(&(PredLockTranList->availableList),
+ &(PredLockTranList->element[i].link));
+ }
+ }
+
/*
* Allocate hash table for SERIALIZABLEXID structs. This stores per-xid
* information for serializable transactions which have accessed data.
@@ -447,13 +555,14 @@ PredicateLockShmemSize(void)
*/
size = add_size(size, size / 10);
- /* serializable transaction table */
+ /* transaction list */
max_table_size = MaxBackends;
- size = add_size(size, hash_estimate_size(max_table_size,
- sizeof(SERIALIZABLEXACT)));
-
- /* serializable subtransaction table */
max_table_size *= 10;
+ size = add_size(size, PredTranListDataSize);
+ size = add_size(size, mul_size((Size) max_table_size,
+ PredTranListElementDataSize));
+
+ /* transaction xid table */
size = add_size(size, hash_estimate_size(max_table_size,
sizeof(SERIALIZABLEXID)));
@@ -561,7 +670,7 @@ GetPredicateLockStatusData(void)
/*
* Make sure we have a SERIALIZABLEXACT reference in MySerializableXact.
* It should be current for this process and be contained in
- * SerializableXactHash.
+ * PredLockTranList.
*/
void
RegisterSerializableTransaction(const Snapshot snapshot)
@@ -569,7 +678,6 @@ RegisterSerializableTransaction(const Snapshot snapshot)
PGPROC *proc;
SERIALIZABLEXACTTAG sxacttag;
SERIALIZABLEXACT *sxact;
- bool found;
HASHCTL hash_ctl;
/* We only do this for serializable transactions. Once. */
@@ -596,10 +704,8 @@ RegisterSerializableTransaction(const Snapshot snapshot)
{
Assert(TransactionIdFollows(snapshot->xmin, SerializableGlobalXmin));
}
- sxact = (SERIALIZABLEXACT *) hash_search(SerializableXactHash,
- &sxacttag,
- HASH_ENTER, &found);
- Assert(!found);
+ sxact = CreatePredTran();
+ /* TODO SSI: If null, cancel oldest tran and try again? */
if (!sxact)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
@@ -607,6 +713,7 @@ RegisterSerializableTransaction(const Snapshot snapshot)
errhint("You might need to increase max_predicate_locks_per_transaction.")));
/* Initialize the structure. */
+ sxact->tag = sxacttag;
sxact->outConflict = InvalidSerializableXact;
sxact->inConflict = InvalidSerializableXact;
sxact->topXid = GetTopTransactionIdIfAny();
@@ -1542,13 +1649,12 @@ PredicateLockPageCombine(const Relation relation, const BlockNumber oldblkno,
static void
SetNewSerializableGlobalXmin(void)
{
- HASH_SEQ_STATUS seqstat;
SERIALIZABLEXACT *sxact;
SerializableGlobalXmin = InvalidTransactionId;
SerializableGlobalXminCount = 0;
- hash_seq_init(&seqstat, SerializableXactHash);
- while ((sxact = (SERIALIZABLEXACT *) hash_seq_search(&seqstat)))
+
+ for (sxact = FirstPredTran(); sxact != NULL; sxact = NextPredTran(sxact))
{
if (!SxactIsOnFinishedList(sxact))
{
@@ -1662,7 +1768,6 @@ ReleasePredicateLocks(const bool isCommit)
ClearOldPredicateLocks();
MySerializableXact = InvalidSerializableXact;
- MyXid = InvalidTransactionId;
/* Delete per-transaction lock table */
hash_destroy(LocalPredicateLockHash);
@@ -1777,7 +1882,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact)
LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
if (sxidtag.xid != InvalidTransactionId)
hash_search(SerializableXidHash, &sxidtag, HASH_REMOVE, NULL);
- hash_search(SerializableXactHash, &(sxact->tag), HASH_REMOVE, NULL);
+ ReleasePredTran(sxact);
LWLockRelease(SerializableXactHashLock);
}
@@ -1838,7 +1943,6 @@ CheckForSerializableConflictOut(const bool valid, const Relation relation,
TransactionId xid;
SERIALIZABLEXIDTAG sxidtag;
SERIALIZABLEXID *sxid;
- SERIALIZABLEXACTTAG sxacttag;
SERIALIZABLEXACT *sxact;
if (SkipSerialization(relation))
@@ -1922,18 +2026,18 @@ CheckForSerializableConflictOut(const bool valid, const Relation relation,
return;
}
- sxacttag = sxact->tag;
LWLockRelease(SerializableXactHashLock);
LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
- sxact = (SERIALIZABLEXACT *)
- hash_search(SerializableXactHash, &sxacttag, HASH_FIND, NULL);
- if (!sxact)
+ sxid = (SERIALIZABLEXID *)
+ hash_search(SerializableXidHash, &sxidtag, HASH_FIND, NULL);
+ if (!sxid)
{
/* It must have been cleaned up, which means it wasn't useful. */
LWLockRelease(SerializableXactHashLock);
return;
}
+ Assert(sxid->myXact == sxact);
xid = sxact->topXid;
if (!XidIsConcurrent(xid))
{
diff --git a/src/include/storage/predicate_internal.h b/src/include/storage/predicate_internal.h
index d6a2b8c..222fc74 100644
--- a/src/include/storage/predicate_internal.h
+++ b/src/include/storage/predicate_internal.h
@@ -80,6 +80,38 @@ typedef struct SERIALIZABLEXACT
* cleanup */
} SERIALIZABLEXACT;
+/*
+ * The following types are used to provide an ad hoc list for holding
+ * SERIALIZABLEXACT objects. An HTAB is overkill, since there is no need to
+ * access these by key -- there are direct pointers to these objects where
+ * needed. If a shared memory list is created, these types can probably be
+ * eliminated in favor of using the general solution.
+ */
+typedef struct PredTranListElementData
+{
+ SHM_QUEUE link;
+ SERIALIZABLEXACT sxact;
+} PredTranListElementData;
+
+typedef struct PredTranListElementData * PredTranListElement;
+
+#define PredTranListElementDataSize \
+ ((Size)MAXALIGN(sizeof(PredTranListElementData)))
+
+typedef struct PredTranListData
+{
+ SHM_QUEUE availableList;
+ SHM_QUEUE activeList;
+ PredTranListElement element;
+ int entryLimit;
+ int entryCount;
+} PredTranListData;
+
+typedef struct PredTranListData *PredTranList;
+
+#define PredTranListDataSize \
+ ((Size)MAXALIGN(sizeof(PredTranListData)))
+
/*
* The SERIALIZABLEXIDTAG struct identifies an xid assigned to a serializable
--
1.6.3.3