[PATCH] Transaction traceability - txid_status(bigint)
Hi all
Following on from
bigint txids vs 'xid' type, new txid_recent(bigint) => xid
/messages/by-id/CAMsr+YFDZMN_iZ7KrRoe+j0KVLQvFVgvZxbcVxR-MLjgtoZugA@mail.gmail.com
here's a patch that implements a txid_status(bigint) function to report the
commit status of a function.
If an application is disconnected while a COMMIT request is in flight,
the backend crashes mid-commit, etc, then an application may not be
sure whether or not a commit completed successfully or was rolled
back. While two-phase commit solves this it does so at a considerable
overhead, so introduce a lighter alternative.
txid_status(bigint) lets an application determine the status of a
a commit based on an xid-with-epoch as returned by txid_current()
or similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.
Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned.
A future protocol enhancement to report txid assignment would be very
useful, but quite separate to this.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchtext/x-patch; charset=US-ASCII; name=0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchDownload+223-25
On 20 August 2016 at 21:24, Craig Ringer <craig@2ndquadrant.com> wrote:
Hi all
Following on from
bigint txids vs 'xid' type, new txid_recent(bigint) => xid
Ahem. Forgot to squash in a fixup commit. Updated patch of
txid_status(bigint) attachd.
A related patch follows, adding a new txid_current_ifassigned(bigint)
function as suggested by Jim Nasby. It's usefully connected to
txid_status() and might as well be added at the same time.
Since it builds on the same history I've also attached an updated version
of txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the
above-linked thread.
Finally, and not intended for commit, is a useful test function I wrote to
cause extremely rapid xid wraparound, bundled up into a src/test/regress
test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2
seconds if fsync is off, making it handy for testing. Posting so others
can use it for their own test needs later and because it's useful for
testing these patches that touch on the xid epoch.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchtext/x-patch; charset=US-ASCII; name=0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchDownload+240-25
0002-Add-txid_current_ifassigned.patchtext/x-patch; charset=US-ASCII; name=0002-Add-txid_current_ifassigned.patchDownload+53-1
0003-Add-txid_convert_ifrecent-to-get-the-32-bit-xid-from.patchtext/x-patch; charset=US-ASCII; name=0003-Add-txid_convert_ifrecent-to-get-the-32-bit-xid-from.patchDownload+183-8
0004-Add-txid_incinerate-test-function-for-fast-wrap-arou.patchtext/x-patch; charset=US-ASCII; name=0004-Add-txid_incinerate-test-function-for-fast-wrap-arou.patchDownload+455-1
On Sat, Aug 20, 2016 at 9:46 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
Ahem. Forgot to squash in a fixup commit. Updated patch of
txid_status(bigint) attachd.A related patch follows, adding a new txid_current_ifassigned(bigint)
function as suggested by Jim Nasby. It's usefully connected to txid_status()
and might as well be added at the same time.Since it builds on the same history I've also attached an updated version of
txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the
above-linked thread.Finally, and not intended for commit, is a useful test function I wrote to
cause extremely rapid xid wraparound, bundled up into a src/test/regress
test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2
seconds if fsync is off, making it handy for testing. Posting so others can
use it for their own test needs later and because it's useful for testing
these patches that touch on the xid epoch.
I think you should use underscores to separate all of the words
instead of only some of them.
Also, note that the corresponding internal function is
GetTopTransactionIdIfAny(), which might suggest txid_current_if_any()
rather than txid_current_if_assigned(), but you could argue that your
naming is better...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com> wrote:
I think you should use underscores to separate all of the words
instead of only some of them.
Right. Will fix.
Thanks for taking a look.
Also, note that the corresponding internal function is
GetTopTransactionIdIfAny(), which might suggest txid_current_if_any()
rather than txid_current_if_assigned(), but you could argue that your
naming is better.
Yeah, I do argue that in this case. Not a hugely strong opinion, but we
refer to txid_current() in the docs as:
"get current transaction ID, assigning a new one if the current transaction
does not have one"
so I thought it'd be worth being consistent with that. It's not like
txid_current() mirrors the name of GetTopTransactionId() after all ;) and I
think it's more important to be consistent with what the user sees than
with the code.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com> wrote:
I think you should use underscores to separate all of the words
instead of only some of them.
ifassigned => if_assigned
ifrecent=> if_recent
Updated patch series attached. As before, 0-4 intended for commit, 5 just
because it'll be handy to have around for people doing wraparound related
testing.
Again, thanks for taking a look.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchtext/x-patch; charset=US-ASCII; name=0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchDownload+239-25
0002-Add-txid_current_ifassigned.patchtext/x-patch; charset=US-ASCII; name=0002-Add-txid_current_ifassigned.patchDownload+53-1
0003-Add-txid_convert_ifrecent-to-get-the-32-bit-xid-from.patchtext/x-patch; charset=US-ASCII; name=0003-Add-txid_convert_ifrecent-to-get-the-32-bit-xid-from.patchDownload+183-8
0004-Add-txid_incinerate-test-function-for-fast-wrap-arou.patchtext/x-patch; charset=US-ASCII; name=0004-Add-txid_incinerate-test-function-for-fast-wrap-arou.patchDownload+455-1
On 23/08/16 02:55, Craig Ringer wrote:
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com
<mailto:robertmhaas@gmail.com>> wrote:I think you should use underscores to separate all of the words
instead of only some of them.ifassigned => if_assigned
ifrecent=> if_recent
Updated patch series attached. As before, 0-4 intended for commit, 5
just because it'll be handy to have around for people doing wraparound
related testing.
I guess you mean 0-3 for commit and 4 is just handy?
From the point of code this patch seems good to me.
I do wonder about the 3rd patch though. I wonder if it would not be
better to have the opposite function instead - converting xid to txid as
that will always work and does not have to have the NULL case and would
be simpler in terms of code.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 23 Aug 2016 16:02, "Petr Jelinek" <petr@2ndquadrant.com> wrote:
On 23/08/16 02:55, Craig Ringer wrote:
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com
<mailto:robertmhaas@gmail.com>> wrote:I think you should use underscores to separate all of the words
instead of only some of them.ifassigned => if_assigned
ifrecent=> if_recent
Updated patch series attached. As before, 0-4 intended for commit, 5
just because it'll be handy to have around for people doing wraparound
related testing.I guess you mean 0-3 for commit and 4 is just handy?
Er. Right. 1-3. 4 just as handy test/tool.
1 most important and useful. Then 2. Then 3.
From the point of code this patch seems good to me.
Thanks.
I do wonder about the 3rd patch though. I wonder if it would not be
better to have the opposite function instead - converting xid to txid as
that will always work and does not have to have the NULL case and would be
simpler in terms of code.
Yeah, but it wouldn't solve the need to take txid_current() output and do
stuff with it other than ordinal comparison. Like pass to commit ts
functions and others that take xid. If we extend all funcs that take xid to
take bigint instead, they just get to use the same epoch logic in them,
complete with some way to deal with wrapped xids sensibly. It has to be
done somewhere. Though it's prettier if hidden from the user.
More importantly imo, txid => bigint has to assume the current epoch. We
have no way to make sure the user doesn't try to use something already
wrapped.
I don't mind if everyone decides it's better to make xid go away and use
bigint everywhere user facing. Or even a new bigxid type. More work than I
can really afford but can manage; shouldn't block #1 and #2 though as they
already use bigint.
On 23/08/16 11:27, Craig Ringer wrote:
On 23 Aug 2016 16:02, "Petr Jelinek" <petr@2ndquadrant.com
<mailto:petr@2ndquadrant.com>> wrote:On 23/08/16 02:55, Craig Ringer wrote:
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com
<mailto:robertmhaas@gmail.com>
<mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>> wrote:
I think you should use underscores to separate all of the words
instead of only some of them.ifassigned => if_assigned
ifrecent=> if_recent
Updated patch series attached. As before, 0-4 intended for commit, 5
just because it'll be handy to have around for people doing wraparound
related testing.I guess you mean 0-3 for commit and 4 is just handy?
Er. Right. 1-3. 4 just as handy test/tool.
1 most important and useful. Then 2. Then 3.
From the point of code this patch seems good to me.
Thanks.
I do wonder about the 3rd patch though. I wonder if it would not be
better to have the opposite function instead - converting xid to txid as
that will always work and does not have to have the NULL case and would
be simpler in terms of code.Yeah, but it wouldn't solve the need to take txid_current() output and
do stuff with it other than ordinal comparison. Like pass to commit ts
functions and others that take xid. If we extend all funcs that take xid
to take bigint instead, they just get to use the same epoch logic in
them, complete with some way to deal with wrapped xids sensibly. It has
to be done somewhere. Though it's prettier if hidden from the user.More importantly imo, txid => bigint has to assume the current epoch. We
have no way to make sure the user doesn't try to use something already
wrapped.
Okay, fair points.
I don't mind if everyone decides it's better to make xid go away and use
bigint everywhere user facing. Or even a new bigxid type. More work than
I can really afford but can manage; shouldn't block #1 and #2 though as
they already use bigint.
I don't think that would be very straightforward to be honest. I guess
for what you want to achieve the approach chosen is the best one then.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Updated patch series attached. As before, 0-4 intended for commit, 5 just
because it'll be handy to have around for people doing wraparound related
testing.Again, thanks for taking a look.
/me reviews a bit more deeply.
In 0001, it seems to me that "in-progress" should be "in progress". I
don't think it's normal to hyphenate that. We have admittedly
sometimes done so, but:
[rhaas pgsql]$ git grep 'in-progress' | wc -l
63
[rhaas pgsql]$ git grep 'in progress' | wc -l
346
It may make sense to speak of an "in-progress transaction" but I would
say "the transaction is in progress" not "the transaction is
in-progress", which seems to me to argue for a space as the proper
separator here.
Also:
+CREATE TYPE txid_status AS ENUM ('committed', 'in-progress', 'aborted');
+
+CREATE FUNCTION
+ txid_status(txid bigint)
+RETURNS txid_status
+LANGUAGE sql
+VOLATILE PARALLEL SAFE
+AS $$
+SELECT CASE
+ WHEN s IS NULL THEN NULL::txid_status
+ WHEN s = -1 THEN 'aborted'::txid_status
+ WHEN s = 0 THEN 'in-progress'::txid_status
+ WHEN s = 1 THEN 'committed'::txid_status
+END
+FROM pg_catalog.txid_status_internal($1) s;
+$$;
+
+COMMENT ON FUNCTION txid_status(bigint)
+IS 'get commit status of given recent xid or null if too old';
I'm not really that keen on this approach. I don't think we need to
introduce a new data type for this, and I would rather not use SQL,
either. It would be faster and simpler to just return the appropriate
string from a C function defined directly.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Aug 23, 2016 at 10:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Updated patch series attached. As before, 0-4 intended for commit, 5 just
because it'll be handy to have around for people doing wraparound related
testing.Again, thanks for taking a look.
/me reviews a bit more deeply.
In 0001, ...
0002 looks good, so I committed it. You forgot a function prototype
for the new SQL-callable function, though, so I added that. For me,
it generates a compiler warning if that's missing; you might want to
try to achieve a similar setup.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 23 August 2016 at 22:18, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:Updated patch series attached. As before, 0-4 intended for commit, 5 just
because it'll be handy to have around for people doing wraparound related
testing.Again, thanks for taking a look.
/me reviews a bit more deeply.
In 0001, it seems to me that "in-progress" should be "in progress".
Fine by me. I was on the fence about it anyway.
+CREATE TYPE txid_status AS ENUM ('committed', 'in-progress', 'aborted');
I'm not really that keen on this approach. I don't think we need to
introduce a new data type for this, and I would rather not use SQL,
either. It would be faster and simpler to just return the appropriate
string from a C function defined directly.
Also fine by me. You're right, keep it simple. It means the potential set
of values isn't discoverable the same way, but ... meh. Using it usefully
means reading the docs anyway.
The remaining 2 patches of interest are attached - txid_status() and
txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().
Now I'd best stop pretending I'm in a sensible timezone.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Also fine by me. You're right, keep it simple. It means the potential set of
values isn't discoverable the same way, but ... meh. Using it usefully means
reading the docs anyway.The remaining 2 patches of interest are attached - txid_status() and
txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().Now I'd best stop pretending I'm in a sensible timezone.
I reviewed this version some more and found some more problems.
+ uint32 xid_epoch = (uint32)(xid_with_epoch >>32);
+ TransactionId xid = (TransactionId)(xid_with_epoch);
I think this is not project style. In particular, I think that the
first one needs a space after the cast and another space before the
32; and I think the second one has an unnecessary set of parentheses
and needs a space added.
+/*
+ * Underlying implementation of txid_status, which is mapped to an enum in
+ * system_views.sql.
+ */
Not any more...
+ errmsg("transaction ID "UINT64_FORMAT" is an invalid xid",
+ xid_with_epoch)));
Spacing.
+ if (TransactionIdIsCurrentTransactionId(xid) ||
TransactionIdIsInProgress(xid))
+ status = gettext_noop("in progress");
+ else if (TransactionIdDidCommit(xid))
+ status = gettext_noop("committed");
+ else if (TransactionIdDidAbort(xid))
+ status = gettext_noop("aborted");
+ else
+ /* shouldn't happen */
+ ereport(ERROR,
+ (errmsg_internal("unable to determine commit status of
xid "UINT64_FORMAT, xid)));
Maybe I'm all wet here, but it seems like there might be a problem
here if the XID is older than the CLOG truncation point but less than
one epoch old. get_xid_in_recent_past only guarantees that the
transaction is less than one epoch old, not that we still have CLOG
data for it. And there's nothing to keep NextXID from advancing under
us, so if somebody asks about a transaction that's just under 2^32
transactions old, then get_xid_in_recent_past could say it's valid,
then NextXID advances and we look up the XID extracted from the txid
and get the status of the new transaction instead of the old one!
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 24 August 2016 at 03:10, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Also fine by me. You're right, keep it simple. It means the potential set of
values isn't discoverable the same way, but ... meh. Using it usefully means
reading the docs anyway.The remaining 2 patches of interest are attached - txid_status() and
txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().Now I'd best stop pretending I'm in a sensible timezone.
I reviewed this version some more and found some more problems.
Thanks. It took me a few days to prep a new patch as I found another
issue in the process. Updated patch attached.
The updated series starts (0001) with a change to slru.c to release
the control lock when throwing an exception so that we don't deadlock
with ourselves when re-entering slru.c; explanation below.
Then there's the txid_status (0002) patch with fixes, then
txid_convert_if_recent(0003).
I omitted txid_incinerate() ; I have an updated version that sets xact
status to aborted for burned xacts instead of leaving them at 0
(in-progress), but haven't had time to finish it so it doesn't try to
blindly set the status of xacts on pages where it didn't hold
XidGenLock when the page was added to the clog.
+ uint32 xid_epoch = (uint32)(xid_with_epoch >>32); + TransactionId xid = (TransactionId)(xid_with_epoch);I think this is not project style. In particular, I think that the
first one needs a space after the cast and another space before the
32; and I think the second one has an unnecessary set of parentheses
and needs a space added.
OK, no problems. I didn't realise spacing around casts was specified.
+/* + * Underlying implementation of txid_status, which is mapped to an enum in + * system_views.sql. + */Not any more...
That's why I shouldn't revise a patch at 1am ;)
+ if (TransactionIdIsCurrentTransactionId(xid) || TransactionIdIsInProgress(xid)) + status = gettext_noop("in progress"); + else if (TransactionIdDidCommit(xid)) + status = gettext_noop("committed"); + else if (TransactionIdDidAbort(xid)) + status = gettext_noop("aborted"); + else + /* shouldn't happen */ + ereport(ERROR, + (errmsg_internal("unable to determine commit status of xid "UINT64_FORMAT, xid)));Maybe I'm all wet here, but it seems like there might be a problem
here if the XID is older than the CLOG truncation point but less than
one epoch old. get_xid_in_recent_past only guarantees that the
transaction is less than one epoch old, not that we still have CLOG
data for it.
Good point. The call would then fail with something like
ERROR: could not access status of transaction 778793573
DETAIL: could not open file "pg_clog/02E6": No such file or directory
This probably didn't come up in my wraparound testing because I'm
aggressively forcing wraparound by writing a lot of clog very quickly
under XidGenLock, and because I'm mostly looking at xacts that are
either recent or past the xid boundary. I've added better add coverage
for xacts around 2^30 behind the nextXid to the wraparound tests;
can't add them to txid.sql since the xid never gets that far in normal
regression testing.
What I'd really like is to be able to ask transam.c to handle the
xid_in_recent_past logic, treating an attempt to read an xid from
beyond the clog truncation threshold as a soft error indicating
unknown xact state. But that involves delving into slru.c, and I
really, really don't want to touch that for what should be a simple
and pretty noninvasive utility function.
A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
except for two issues:
* I see no accepted way to access the errcode etc from within the
PG_CATCH block, though. PG_RE_THROW() can do it because pg_re_throw()
is in elog.c. I couldn't find any existing code that seems to check
details about an error thrown in a PG_TRY block, only SPI calls. I
don't want to ignore all types of errors and potentially hide
problems, so I just used geterrcode() - which is meant for errcontext
callbacks - and changed the comment to say it can be used in PG_CATCH
too. I don't see why it shouldn't be.
We should probably have some sort of PG_CATCH_INFO(varname) that
exposes the top ErrorData, but that's not needed for this patch so I
left it alone.
* TransactionIdGetStatus() releases the CLogControlLock taken by
SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
thrown from SlruReportIOError(). It seems appropriate for
SimpleLruReadPage() to release the LWLock before calling
SlruReportIOError(), so I've done that as a separate patch (now 0001).
I also removed the TransactionIdInProgress check in txid_status and
just assume it's in progress if it isn't our xid, committed or
aborted. TransactionIdInProgress looks like it's potentially more
expensive, and most of the time we'll be looking at committed or
aborted xacts anyway. I can't sanity-check TransactionIdInProgress
after commited/aborted, because there's then a race where the xact can
commit or abort after we decide it's not committed/aborted so it's not
in progress when we go to check that.
And there's nothing to keep NextXID from advancing under
us, so if somebody asks about a transaction that's just under 2^32
transactions old, then get_xid_in_recent_past could say it's valid,
then NextXID advances and we look up the XID extracted from the txid
and get the status of the new transaction instead of the old one!
Hm, yeah. Though due to the clog truncation issue you noticed it
probably won't happen.
We could require that XidGenLock be held at least as LW_SHARED when
entering get_xid_in_recent_past(), but I'd rather not since that'd be
an otherwise-unnecessary lwlock for txid_convert_ifrecent().
Instead, I think I'll rename the wraparound flag to too_old and set it
if the xact is more than say 2^30 from the epoch struct's last_xid,
leaving a race window so ridiculously improbable that the nearly
impossible chance of failing with a clog access error is not a worry.
If the server's managing to have a proc stuck that long then it's
already on fire. We're only interested in reasonably recent xacts
since we can only work with xacts before wraparound / clog truncation.
This just moves the threshold for "reasonably recent" a bit closer.
All this certainly reinforces my view that users handling 'xid'
directly or trying to extract it from a bigint epoch-extended xid is a
bad idea that needs to go away soon.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Release-SLRU-control-lock-before-reporting-I-O-error.patchtext/x-patch; charset=US-ASCII; name=0001-Release-SLRU-control-lock-before-reporting-I-O-error.patchDownload+13-5
0002-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchtext/x-patch; charset=US-ASCII; name=0002-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchDownload+293-26
0003-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchtext/x-patch; charset=US-ASCII; name=0003-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchDownload+184-9
Hi,
On 2016-08-29 11:25:39 +0800, Craig Ringer wrote:
ERROR: could not access status of transaction 778793573
DETAIL: could not open file "pg_clog/02E6": No such file or directoryWhat I'd really like is to be able to ask transam.c to handle the
xid_in_recent_past logic, treating an attempt to read an xid from
beyond the clog truncation threshold as a soft error indicating
unknown xact state. But that involves delving into slru.c, and I
really, really don't want to touch that for what should be a simple
and pretty noninvasive utility function.
Can't you "just" check this against ShmemVariableCache->oldestXid while
holding appropriate locks?
A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
except for two issues:
It seems like a bad idea to PG_CATCH and not re-throw an error. That
generally is quite error prone. At the very least locking and such gets
a lot more complicated (as you noticed below).
* TransactionIdGetStatus() releases the CLogControlLock taken by
SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
thrown from SlruReportIOError(). It seems appropriate for
SimpleLruReadPage() to release the LWLock before calling
SlruReportIOError(), so I've done that as a separate patch (now 0001).
We normally prefer to handle this via the "bulk" releases in the error
handlers. It's otherwise hard to write code that handles these
situations reliably. It's different for spinlocks, but those normally
protect far smaller regions of code.
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Aug 28, 2016 at 11:25 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
What I'd really like is to be able to ask transam.c to handle the
xid_in_recent_past logic, treating an attempt to read an xid from
beyond the clog truncation threshold as a soft error indicating
unknown xact state. But that involves delving into slru.c, and I
really, really don't want to touch that for what should be a simple
and pretty noninvasive utility function.
I think you're going to have to bite the bullet and do that, though, because ...
A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
...I don't think this has any chance of being acceptable. You can't
catch errors and not re-throw them. That's bad news.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 29 August 2016 at 11:45, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2016-08-29 11:25:39 +0800, Craig Ringer wrote:
ERROR: could not access status of transaction 778793573
DETAIL: could not open file "pg_clog/02E6": No such file or directoryWhat I'd really like is to be able to ask transam.c to handle the
xid_in_recent_past logic, treating an attempt to read an xid from
beyond the clog truncation threshold as a soft error indicating
unknown xact state. But that involves delving into slru.c, and I
really, really don't want to touch that for what should be a simple
and pretty noninvasive utility function.Can't you "just" check this against ShmemVariableCache->oldestXid while
holding appropriate locks?
Hm. Yeah, I should've thought of that. Thank you.
A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
except for two issues:It seems like a bad idea to PG_CATCH and not re-throw an error. That
generally is quite error prone. At the very least locking and such gets
a lot more complicated (as you noticed below).
Yeah, and as I remember from the "fun" of trying to write apply errors
to tables in BDR. It wasn't my first choice.
* TransactionIdGetStatus() releases the CLogControlLock taken by
SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
thrown from SlruReportIOError(). It seems appropriate for
SimpleLruReadPage() to release the LWLock before calling
SlruReportIOError(), so I've done that as a separate patch (now 0001).We normally prefer to handle this via the "bulk" releases in the error
handlers. It's otherwise hard to write code that handles these
situations reliably. It's different for spinlocks, but those normally
protect far smaller regions of code.
Fair enough. It's not a complex path, but there are a _lot_ of
callers, and while I can't really imagine any of them relying on the
CLogControLock being held on error it's not something I was keen to
change. I thought complicating the clog with a soft-error interface
was worse and didn't come up with a better approach.
Said better approach attached in revised series. Thanks.
My only real complaint with doing this is that it's a bit more
conservative. But in practice clog truncation probably won't follow
that far behind oldestXmin so except in fairly contrived circumstances
it won't hurt. Apps that need guarantees about how old an xid they can
get status on can hold down xmin with a replication slot, a dummy
prepared xact, or whatever. If we find that becomes a common need that
should be made simpler then appropriate API to allow apps to hold down
clog truncation w/o blocking vacuuming can be added down the track.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchtext/x-patch; charset=US-ASCII; name=0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchDownload+282-24
0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchtext/x-patch; charset=US-ASCII; name=0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchDownload+185-9
On 29 August 2016 at 15:53, Craig Ringer <craig@2ndquadrant.com> wrote:
Said better approach attached in revised series. Thanks.
Here's another minor update to the txid_status() and
txid_convert_if_recent() patches. The only change is moving
get_xid_in_recent_past from src/backend/utils/adt/txid.c to
src/backend/access/transam/xlog.c to permit its use by other code.
Specifically, I think it'll be needed for logical decoding on standby.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchtext/x-patch; charset=US-ASCII; name=0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchDownload+284-24
0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchtext/x-patch; charset=US-ASCII; name=0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchDownload+185-9
On 1 September 2016 at 13:08, Craig Ringer <craig@2ndquadrant.com> wrote:
On 29 August 2016 at 15:53, Craig Ringer <craig@2ndquadrant.com> wrote:
Said better approach attached in revised series. Thanks.
Here's another minor update to the txid_status() and
txid_convert_if_recent() patches. The only change is moving
get_xid_in_recent_past from src/backend/utils/adt/txid.c to
src/backend/access/transam/xlog.c to permit its use by other code.
Specifically, I think it'll be needed for logical decoding on standby.
Ahem, wrong patches. Attached correctly now.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchtext/x-patch; charset=US-ASCII; name=0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchDownload+285-24
0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchtext/x-patch; charset=US-ASCII; name=0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchDownload+185-9
Here's another update to these patches. While working on support for
logical decoding on standbys I noticed that I needed something like
get_xid_in_recent_past(...) there too. So I've moved it to xlog.c as
TransactionIdInRecentPast too and flipped its arguments to be more
convenient. No other changes.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchtext/x-patch; charset=US-ASCII; name=0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patchDownload+295-24
0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchtext/x-patch; charset=US-ASCII; name=0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patchDownload+184-9
On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote:
So I've moved it to xlog.c...
I'm pretty sure it shouldn't live in xlog.c, but there may be some
good reason I can't see yet.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers