Do XID sequences need to be contiguous?

Started by Mark Dilgerover 6 years ago3 messageshackers
Jump to latest
#1Mark Dilger
mark.dilger@enterprisedb.com

Hackers,

While working on the problem of XID wraparound within the LISTEN/NOTIFY
system, I tried to increment XIDs by more than one per transaction.
This leads to a number of test failures, many which look like:

+ERROR:  could not access status of transaction 7485
+DETAIL:  Could not read from file "pg_subtrans/0000" at offset 24576: 
read too few bytes.

I might not have read the right documentation, but....

I do not see anything in src/backend/access/transam/README nor elsewhere
documenting a design decision or assumption that transaction IDs must
be assigned contiguously. I suppose this is such a fundamental
assumption that it is completely implicit and nobody thought to document
it, but I'd like to check for two reasons:

First, I'd like a good method of burning through transaction ids in
tests designed to check for problems in XID wrap-around.

Second, I'd like to add Asserts where appropriate regarding this
assumption. It seems strange to me that I should have gotten as far
as a failing read() without having tripped an Assert somewhere along the
way.

To duplicate the errors I hit, you can either apply this simple change:

diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 33fd052156..360b7335bb 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -83,7 +83,7 @@ FullTransactionIdFromEpochAndXid(uint32 epoch, 
TransactionId xid)
  static inline void
  FullTransactionIdAdvance(FullTransactionId *dest)
  {
-       dest->value++;
+       dest->value += 2;
         while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
                 dest->value++;
  }

or apply the much larger WIP patch, attached, and then be sure to
provide the --enable-xidcheck flag to configure before building.

--
Mark Dilger

Attachments:

xidcheck.WIP.patch.1text/plain; charset=UTF-8; name=xidcheck.WIP.patch.1Download+216-0
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Dilger (#1)
Re: Do XID sequences need to be contiguous?

Mark Dilger <hornschnorter@gmail.com> writes:

While working on the problem of XID wraparound within the LISTEN/NOTIFY
system, I tried to increment XIDs by more than one per transaction.
This leads to a number of test failures, many which look like:

IIRC, the XID-creation logic is designed to initialize the next clog
page whenever it allocates an exact-multiple-of-BLCKSZ*4 transaction
number. Skipping over such numbers would create trouble.

First, I'd like a good method of burning through transaction ids in
tests designed to check for problems in XID wrap-around.

Don't "burn through them". Stop the cluster and use pg_resetwal to
set the XID counter wherever you want it. (You might need to set it
just before a page or segment boundary; I'm not sure if pg_resetwal
has any logic of its own to initialize a new CLOG page/file when you
move the counter this way. Perhaps it's worth improving that.)

Second, I'd like to add Asserts where appropriate regarding this
assumption.

I'm not excited about that, and it's *certainly* not a problem that
justifies additional configure infrastructure.

regards, tom lane

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Mark Dilger (#1)
Re: Do XID sequences need to be contiguous?

On Fri, Nov 29, 2019 at 12:22 AM Mark Dilger <hornschnorter@gmail.com> wrote:

Hackers,

While working on the problem of XID wraparound within the LISTEN/NOTIFY
system, I tried to increment XIDs by more than one per transaction.
This leads to a number of test failures, many which look like:

+ERROR:  could not access status of transaction 7485
+DETAIL:  Could not read from file "pg_subtrans/0000" at offset 24576:
read too few bytes.

I might not have read the right documentation, but....

I do not see anything in src/backend/access/transam/README nor elsewhere
documenting a design decision or assumption that transaction IDs must
be assigned contiguously. I suppose this is such a fundamental
assumption that it is completely implicit and nobody thought to document
it, but I'd like to check for two reasons:

First, I'd like a good method of burning through transaction ids in
tests designed to check for problems in XID wrap-around.

As Tom pointed out and as mentioned in the comments "If we are
allocating the first XID of a new page of the commit log, zero out
that commit-log page before returning.", we need to take care of
extending the CLOG while advancing TransactionIds. I have some old
script for burning transactionid's which I am attaching here. It
might help you. I think this is provided long back by Jeff Janes.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

transactionid_burner_v1.patchapplication/octet-stream; name=transactionid_burner_v1.patchDownload+30-11