pgsql: snapshot scalability: cache snapshots using a xact completion counter.
snapshot scalability: cache snapshots using a xact completion counter.
Previous commits made it faster/more scalable to compute snapshots. But not
building a snapshot at all is faster still. Now that GetSnapshotData() does not
maintain RecentGlobal* anymore, that is actually not too hard:
This commit introduces xactCompletionCount, which tracks the number of
top-level transactions with xids (i.e. which may have modified the database)
that completed in some form since the start of the server.
We can avoid rebuilding the snapshot's contents whenever the current
xactCompletionCount is the same as it was when the snapshot was
originally built. Currently this check happens while holding
ProcArrayLock. While it's likely possible to perform the check without
acquiring ProcArrayLock, it seems better to do that separately /
later, as some careful analysis is required. Even with the lock this is a
significant win on its own.
On a smaller two-socket machine this gains another ~1.03x; on a larger
machine the effect is roughly double (earlier patch version tested
though). If we were able to safely avoid the lock there'd be another
significant gain on top of that.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: /messages/by-id/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/623a9ba79bbdd11c5eccb30b8bd5c446130e521c
Modified Files
--------------
src/backend/replication/logical/snapbuild.c | 1 +
src/backend/storage/ipc/procarray.c | 125 +++++++++++++++++++++++-----
src/backend/utils/time/snapmgr.c | 4 +
src/include/access/transam.h | 9 ++
src/include/utils/snapshot.h | 7 ++
5 files changed, 126 insertions(+), 20 deletions(-)
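
To make the caching scheme concrete, here is a minimal, self-contained C
sketch of what the commit message describes. It is illustrative only, not the
committed PostgreSQL code: apart from xactCompletionCount, GetSnapshotData()
and ProcArrayLock, which the commit message mentions, all names are invented
for the example, and locking is reduced to a comment.

/*
 * Illustrative sketch of snapshot caching via a transaction completion
 * counter.  Simplified stand-in types, not PostgreSQL's actual structs.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct SnapshotSketch
{
    uint64_t    xmin;
    uint64_t    xmax;
    uint64_t   *xip;                /* in-progress top-level xids */
    int         xcnt;
    uint64_t    completion_count;   /* shared counter value when built */
    bool        valid;
} SnapshotSketch;

/*
 * Shared state: bumped whenever a top-level transaction that has an xid
 * (i.e. may have modified the database) commits or aborts.  In PostgreSQL
 * the counter lives in shared memory and is protected by ProcArrayLock.
 */
static uint64_t xact_completion_count;

/* Called at commit/abort of every xid-bearing top-level transaction. */
static void
note_xact_completion(void)
{
    xact_completion_count++;
}

/*
 * Reuse check a GetSnapshotData() equivalent can perform: if no xid-bearing
 * top-level transaction completed since the cached snapshot was built, its
 * contents still describe the set of running transactions and can be
 * returned as-is, skipping the expensive rebuild.
 */
static bool
snapshot_still_current(const SnapshotSketch *snap)
{
    /* the real check is performed while ProcArrayLock is held */
    return snap->valid && snap->completion_count == xact_completion_count;
}
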
On Tue, Aug 18, 2020 at 04:30:21AM +0000, Andres Freund wrote:
snapshot scalability: cache snapshots using a xact completion counter.
Previous commits made it faster/more scalable to compute snapshots. But not
building a snapshot at all is faster still. Now that GetSnapshotData() does not
maintain RecentGlobal* anymore, that is actually not too hard:

This commit introduces xactCompletionCount, which tracks the number of
top-level transactions with xids (i.e. which may have modified the database)
that completed in some form since the start of the server.

We can avoid rebuilding the snapshot's contents whenever the current
xactCompletionCount is the same as it was when the snapshot was
originally built. Currently this check happens while holding
ProcArrayLock. While it's likely possible to perform the check without
acquiring ProcArrayLock, it seems better to do that separately /
later, as some careful analysis is required. Even with the lock this is a
significant win on its own.

On a smaller two-socket machine this gains another ~1.03x; on a larger
machine the effect is roughly double (earlier patch version tested
though). If we were able to safely avoid the lock there'd be another
significant gain on top of that.
spurfowl and more animals are telling us that this commit has broken
2PC:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2020-08-18%2004%3A31%3A11
--
Michael
Andres Freund <andres@anarazel.de> writes:
snapshot scalability: cache snapshots using a xact completion counter.
buildfarm doesn't like this a bit ...
regards, tom lane
Hi,
On 2020-08-18 00:55:22 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
snapshot scalability: cache snapshots using a xact completion counter.
buildfarm doesn't like this a bit ...
Yea, looking already. Unless that turns out to be incredibly bad luck
and only the first three animals failed (there's a few passes after), or
unless I find the issue in the next 30min or so, I'll revert.
Greetings,
Andres Freund
On 2020-08-18 13:52:46 +0900, Michael Paquier wrote:
On Tue, Aug 18, 2020 at 04:30:21AM +0000, Andres Freund wrote:
spurfowl and more animals are telling us that this commit has broken
2PC:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2020-08-18%2004%3A31%3A11
It looks like it's a bit more subtle than outright breaking 2PC. We're
now at 3 out of 18 BF members having failed. I also ran quite a few
loops of the normal regression tests locally, without finding an issue.
I'd written to Tom that I was planning to revert unless the number of
failures turned out to be lower than initially indicated. That actually
seems to have come to pass (the failures are quicker to report because
they don't run the subsequent tests, of course). I'd like to let the
failures accumulate a bit longer, say until tomorrow midday if I haven't
figured it out by then, in the hope of finding some detail to help
pinpoint the issue.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
I'd written to Tom that I was planning to revert unless the number of
failures were lower than initially indicated. But that actually seems to
have come to pass (the failures are quicker to report because they don't
run the subsequent tests, of course). I'd like to let the failures
accumulate a bit longer, say until tomorrow Midday if I haven't figured
it out by then. With the hope of finding some detail to help pinpoint
the issue.
There's certainly no obvious pattern here, so I agree with waiting for
more data.
regards, tom lane
Hi,
On 2020-08-18 01:21:17 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I'd written to Tom that I was planning to revert unless the number of
failures were lower than initially indicated. But that actually seems to
have come to pass (the failures are quicker to report because they don't
run the subsequent tests, of course). I'd like to let the failures
accumulate a bit longer, say until tomorrow Midday if I haven't figured
it out by then. With the hope of finding some detail to help pinpoint
the issue.

There's certainly no obvious pattern here, so I agree with waiting for
more data.
FWIW, I think I have found the bug, but I'm still working to reproduce
the issue reliably enough that I can verify that the fix actually works.
The issue is basically that 2PC PREPARE is weird, WRT procarray. The
last snapshot built with GetSnapshotData() before the PREPARE doesn't
include its own transaction in ->xip[], as normal. PrepareTransaction()
removes the "normal" entry with ProcArrayClearTransaction(), which so
far doesn't increase the xact completion count. Because the xact
completion count is not increased, snapshots can be reused as long as
they're taken before the 2PC transaction is finished. That's fine for
other backends, but not for the backend doing the PrepareTransaction(),
because that snapshot's ->xip does not include the backend's own xid.
It's a bit tricky to reproduce exactly the issue the BF is occasionally
hitting, because the way ->xmax is computed *limits* the
damage. Combined with the use of SERIALIZABLE (preventing recomputation
of the data snapshot) that makes it somewhat hard to hit.
Greetings,
Andres Freund
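
The sketch below continues the simplified example from earlier, reusing its
xact_completion_count. It shows the direction of a fix, not the code that was
actually pushed: counting the PREPARE-time procarray clearing as a completion
event forces cached snapshots, including the preparing backend's own, to be
rebuilt on next use.

/*
 * Sketch of the direction of a fix, reusing xact_completion_count from the
 * earlier example.  Function name and structure are invented; this is not
 * the fix that was actually committed.
 */
static void
clear_proc_entry_at_prepare(void)
{
    /* the real code runs with ProcArrayLock held exclusively */

    /* ... remove the backend's "normal" procarray entry here ... */

    /*
     * Count the PREPARE as a completion event.  Any snapshot built before
     * this point, including the preparing backend's own cached one whose
     * ->xip does not list the now-prepared xid, stops being considered
     * current and is rebuilt on next use, so the prepared transaction is
     * seen as in progress rather than as aborted.
     */
    xact_completion_count++;
}

The isolation test added later in this thread exercises exactly this
scenario: a second session commits a later xid, and the preparing session
then reads the table again after PREPARE.
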
On 2020-08-18 13:28:05 -0700, Andres Freund wrote:
Hi,
On 2020-08-18 01:21:17 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I'd written to Tom that I was planning to revert unless the number of
failures were lower than initially indicated. But that actually seems to
have come to pass (the failures are quicker to report because they don't
run the subsequent tests, of course). I'd like to let the failures
accumulate a bit longer, say until tomorrow Midday if I haven't figured
it out by then. With the hope of finding some detail to help pinpoint
the issue.

There's certainly no obvious pattern here, so I agree with waiting for
more data.

FWIW, I think I have found the bug, but I'm still working to reproduce
the issue reliably enough that I can verify that the fix actually works.

The issue is basically that 2PC PREPARE is weird, WRT procarray. The
last snapshot built with GetSnapshotData() before the PREPARE doesn't
include its own transaction in ->xip[], as normal. PrepareTransaction()
removes the "normal" entry with ProcArrayClearTransaction(), which so
far doesn't increase the xact completion count. Because the xact
completion count is not increased, snapshots can be reused as long as
they're taken before the 2PC transaction is finished. That's fine for
other backends, but not for the backend doing the PrepareTransaction(),
because that snapshot's ->xip does not include the backend's own xid.

It's a bit tricky to reproduce exactly the issue the BF is occasionally
hitting, because the way ->xmax is computed *limits* the
damage. Combined with the use of SERIALIZABLE (preventing recomputation
of the data snapshot) that makes it somewhat hard to hit.
I pushed a fix. After a while I figured out that it's not actually that
hard to test reliably. But it does require multiple sessions
interacting; in particular, another session needs to acquire and commit a
transaction id that's later than the prepared transaction's.
I think it's worth adding an isolation test. But it doesn't seem like
extending prepared-transactions.spec makes too much sense, it doesn't
fit in well. It's a lot easier to reproduce the issue without
SERIALIZABLE, for example. Generally the file seems more about
serializable than 2PC...
So unless somebody disagrees I'm gonna add a new
prepared-transactions-2.spec.
Greetings,
Andres Freund
Hi,
This thread started on committers, at
/messages/by-id/20200818234532.uiafo5br5lo6zhya@alap3.anarazel.de
In it I wanted to add an isolation test around prepared transactions:
On 2020-08-18 16:45:32 -0700, Andres Freund wrote:
I think it's worth adding an isolation test. But it doesn't seem like
extending prepared-transactions.spec makes too much sense, it doesn't
fit in well. It's a lot easier to reproduce the issue without
SERIALIZABLE, for example. Generally the file seems more about
serializable than 2PC...

So unless somebody disagrees I'm gonna add a new
prepared-transactions-2.spec.
But I noticed that the already existing prepared transactions test
wasn't in the normal schedule, since:
commit ae55d9fbe3871a5e6309d9b91629f1b0ff2b8cba
Author: Andrew Dunstan <andrew@dunslane.net>
Date: 2012-07-20 15:51:40 -0400
Remove prepared transactions from main isolation test schedule.
There is no point in running this test when prepared transactions are disabled,
which is the default. New make targets that include the test are provided. This
will save some useless waste of cycles on buildfarm machines.
Backpatch to 9.1 where these tests were introduced.
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 669c0f220c4..2184975dcb1 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -9,7 +9,6 @@ test: ri-trigger
test: partial-index
test: two-ids
test: multiple-row-versions
-test: prepared-transactions
test: fk-contention
test: fk-deadlock
test: fk-deadlock2
The commit confuses me, cause I thought we explicitly enabled prepared
transactions during tests well before that? See
commit 8d4f2ecd41312e57422901952cbad234d293060b
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2009-04-23 00:23:46 +0000
Change the default value of max_prepared_transactions to zero, and add
documentation warnings against setting it nonzero unless active use of
prepared transactions is intended and a suitable transaction manager has been
installed. This should help to prevent the type of scenario we've seen
several times now where a prepared transaction is forgotten and eventually
causes severe maintenance problems (or even anti-wraparound shutdown).
The only real reason we had the default be nonzero in the first place was to
support regression testing of the feature. To still be able to do that,
tweak pg_regress to force a nonzero value during "make check". Since we
cannot force a nonzero value in "make installcheck", add a variant regression
test "expected" file that shows the results that will be obtained when
max_prepared_transactions is zero.
Also, extend the HINT messages for transaction wraparound warnings to mention
the possibility that old prepared transactions are causing the problem.
All per today's discussion.
And indeed, including the test in the schedule works for make check, not
just an installcheck with explicitly enabled prepared xacts.
ISTM we should just add an alternative output for disabled prepared
xacts, and re-add the test?
My new test, without the alternative output for now, is attached.
Greetings,
Andres Freund
Attachments:
test.diff (text/x-diff; charset=us-ascii)
diff --git c/src/test/isolation/expected/prepared-transactions-2.out i/src/test/isolation/expected/prepared-transactions-2.out
new file mode 100644
index 00000000000..f7fd2e7f989
--- /dev/null
+++ i/src/test/isolation/expected/prepared-transactions-2.out
@@ -0,0 +1,119 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_ins_before s1_begin s1_txid s2_txid s1_update_in s1_ins_in s1_show s1_prepare s1_show s1_commit s1_show
+step s1_ins_before: INSERT INTO preptest VALUES ('before');
+step s1_begin: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
+step s1_txid: SELECT txid_current() IS NULL;
+?column?
+
+f
+step s2_txid: SELECT txid_current() IS NULL;
+?column?
+
+f
+step s1_update_in: UPDATE preptest SET data = 'in; '|| data;
+step s1_ins_in: INSERT INTO preptest VALUES ('in');
+step s1_show: SELECT * FROM preptest
+data
+
+in; before
+in
+step s1_prepare: PREPARE TRANSACTION 'prep';
+step s1_show: SELECT * FROM preptest
+data
+
+before
+step s1_commit: COMMIT PREPARED 'prep';
+step s1_show: SELECT * FROM preptest
+data
+
+in; before
+in
+
+starting permutation: s1_ins_before s1_begin s1_txid s2_txid s1_update_in s1_ins_in s1_show s1_prepare s2_show s1_commit s2_show
+step s1_ins_before: INSERT INTO preptest VALUES ('before');
+step s1_begin: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
+step s1_txid: SELECT txid_current() IS NULL;
+?column?
+
+f
+step s2_txid: SELECT txid_current() IS NULL;
+?column?
+
+f
+step s1_update_in: UPDATE preptest SET data = 'in; '|| data;
+step s1_ins_in: INSERT INTO preptest VALUES ('in');
+step s1_show: SELECT * FROM preptest
+data
+
+in; before
+in
+step s1_prepare: PREPARE TRANSACTION 'prep';
+step s2_show: SELECT * FROM preptest
+data
+
+before
+step s1_commit: COMMIT PREPARED 'prep';
+step s2_show: SELECT * FROM preptest
+data
+
+in; before
+in
+
+starting permutation: s1_ins_before s1_begin s1_txid s2_txid s1_update_in s1_ins_in s1_show s1_prepare s1_show s1_rollback s1_show
+step s1_ins_before: INSERT INTO preptest VALUES ('before');
+step s1_begin: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
+step s1_txid: SELECT txid_current() IS NULL;
+?column?
+
+f
+step s2_txid: SELECT txid_current() IS NULL;
+?column?
+
+f
+step s1_update_in: UPDATE preptest SET data = 'in; '|| data;
+step s1_ins_in: INSERT INTO preptest VALUES ('in');
+step s1_show: SELECT * FROM preptest
+data
+
+in; before
+in
+step s1_prepare: PREPARE TRANSACTION 'prep';
+step s1_show: SELECT * FROM preptest
+data
+
+before
+step s1_rollback: ROLLBACK PREPARED 'prep';
+step s1_show: SELECT * FROM preptest
+data
+
+before
+
+starting permutation: s1_ins_before s1_begin s1_txid s2_txid s1_update_in s1_ins_in s1_show s1_prepare s2_show s1_rollback s2_show
+step s1_ins_before: INSERT INTO preptest VALUES ('before');
+step s1_begin: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
+step s1_txid: SELECT txid_current() IS NULL;
+?column?
+
+f
+step s2_txid: SELECT txid_current() IS NULL;
+?column?
+
+f
+step s1_update_in: UPDATE preptest SET data = 'in; '|| data;
+step s1_ins_in: INSERT INTO preptest VALUES ('in');
+step s1_show: SELECT * FROM preptest
+data
+
+in; before
+in
+step s1_prepare: PREPARE TRANSACTION 'prep';
+step s2_show: SELECT * FROM preptest
+data
+
+before
+step s1_rollback: ROLLBACK PREPARED 'prep';
+step s2_show: SELECT * FROM preptest
+data
+
+before
diff --git c/src/test/isolation/isolation_schedule i/src/test/isolation/isolation_schedule
index 218c87b24bf..ec15ce784db 100644
--- c/src/test/isolation/isolation_schedule
+++ i/src/test/isolation/isolation_schedule
@@ -1,3 +1,5 @@
+test: prepared-transactions
+test: prepared-transactions-2
test: read-only-anomaly
test: read-only-anomaly-2
test: read-only-anomaly-3
diff --git c/src/test/isolation/specs/prepared-transactions-2.spec i/src/test/isolation/specs/prepared-transactions-2.spec
new file mode 100644
index 00000000000..d5ed58e4cfc
--- /dev/null
+++ i/src/test/isolation/specs/prepared-transactions-2.spec
@@ -0,0 +1,60 @@
+# This test checks against a bug found in the initial support for
+# caching snapshots. The bug caused the snapshot from within the
+# to-be-prepared transaction to be used after preparing, erroneously
+# not including the prepared transaction's xid, thereby effectively
+# treating it as aborted.
+#
+# See also https://postgr.es/m/E1k7tGP-0005V0-5k%40gemulon.postgresql.org
+
+setup
+{
+ CREATE TABLE preptest(data text);
+}
+
+teardown
+{
+ DROP TABLE preptest;
+}
+
+session "s1"
+step "s1_begin" { BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED; }
+step "s1_txid" { SELECT txid_current() IS NULL; }
+step "s1_ins_before" { INSERT INTO preptest VALUES ('before'); }
+step "s1_update_in" { UPDATE preptest SET data = 'in; '|| data; }
+step "s1_ins_in" { INSERT INTO preptest VALUES ('in'); }
+step "s1_prepare" { PREPARE TRANSACTION 'prep'; }
+step "s1_commit" { COMMIT PREPARED 'prep'; }
+step "s1_rollback" { ROLLBACK PREPARED 'prep'; }
+step "s1_show" { SELECT * FROM preptest }
+
+session "s2"
+step "s2_show" { SELECT * FROM preptest }
+step "s2_txid" { SELECT txid_current() IS NULL; }
+
+# This permutation shows the bug
+permutation "s1_ins_before" "s1_begin" "s1_txid"
+ # Without acquiring an xid here, s1's xid would be the snapshot's
+ # xmax, hiding the bug (as only xids < xmax are looked up in ->xip).
+ "s2_txid"
+ "s1_update_in" "s1_ins_in" "s1_show" "s1_prepare"
+ # With the bug the next select would find xmax neither committed nor in progress,
+ # and therefore mark xmax as invalid. Obviously breaking visibility.
+ "s1_show" "s1_commit" "s1_show"
+
+# The following are just variations of the theme that seem good to
+# also test, even though they weren't affected by the bug.
+
+permutation "s1_ins_before" "s1_begin" "s1_txid"
+ "s2_txid"
+ "s1_update_in" "s1_ins_in" "s1_show" "s1_prepare"
+ "s2_show" "s1_commit" "s2_show"
+
+permutation "s1_ins_before" "s1_begin" "s1_txid"
+ "s2_txid"
+ "s1_update_in" "s1_ins_in" "s1_show" "s1_prepare"
+ "s1_show" "s1_rollback" "s1_show"
+
+permutation "s1_ins_before" "s1_begin" "s1_txid"
+ "s2_txid"
+ "s1_update_in" "s1_ins_in" "s1_show" "s1_prepare"
+ "s2_show" "s1_rollback" "s2_show"
Andres Freund <andres@anarazel.de> writes:
ISTM we should just add an alternative output for disabled prepared
xacts, and re-add the test?
I believe the buildfarm runs the isolation step with "make installcheck",
so if you're hoping to get buildfarm coverage that way, you're mistaken.
Having said that, it'd probably be good if "make check" did run this test.
regards, tom lane
Hi,
On 2020-08-18 22:24:20 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
ISTM we should just add an alternative output for disabled prepared
xacts, and re-add the test?

I believe the buildfarm runs the isolation step with "make installcheck",
so if you're hoping to get buildfarm coverage that way, you're mistaken.
It seems like the buildfarm ought to configure the started server with a
bunch of prepared transactions, in that case? At least going forward?
Having said that, it'd probably be good if "make check" did run this test.
Yea. It'd at least be run when we do check-world - which at least I do
before nearly every commit.
Greetings,
Andres Freund
On Tue, Aug 18, 2020 at 07:34:00PM -0700, Andres Freund wrote:
It seems like the buildfarm ought to configure the started server with a
bunch of prepared transactions, in that case? At least going forward?
Agreed. Testing with max_prepared_transactions > 0 has much more
value than not, for sure. So I think that it could be a good thing,
particularly if we begin to add more isolation tests.
--
Michael
On 8/18/20 9:22 PM, Andres Freund wrote:
Hi,
This thread started on committers, at
/messages/by-id/20200818234532.uiafo5br5lo6zhya@alap3.anarazel.de

In it I wanted to add an isolation test around prepared transactions:

On 2020-08-18 16:45:32 -0700, Andres Freund wrote:
I think it's worth adding an isolation test. But it doesn't seem like
extending prepared-transactions.spec makes too much sense, it doesn't
fit in well. It's a lot easier to reproduce the issue without
SERIALIZABLE, for example. Generally the file seems more about
serializable than 2PC...

So unless somebody disagrees I'm gonna add a new
prepared-transactions-2.spec.

But I noticed that the already existing prepared transactions test
wasn't in the normal schedule, since:

commit ae55d9fbe3871a5e6309d9b91629f1b0ff2b8cba
Author: Andrew Dunstan <andrew@dunslane.net>
Date: 2012-07-20 15:51:40 -0400

Remove prepared transactions from main isolation test schedule.

There is no point in running this test when prepared transactions are disabled,
which is the default. New make targets that include the test are provided. This
will save some useless waste of cycles on buildfarm machines.

Backpatch to 9.1 where these tests were introduced.

diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 669c0f220c4..2184975dcb1 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -9,7 +9,6 @@ test: ri-trigger
test: partial-index
test: two-ids
test: multiple-row-versions
-test: prepared-transactions
test: fk-contention
test: fk-deadlock
test: fk-deadlock2

The commit confuses me, cause I thought we explicitly enabled prepared
transactions during tests well before that? See

commit 8d4f2ecd41312e57422901952cbad234d293060b
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2009-04-23 00:23:46 +0000

Change the default value of max_prepared_transactions to zero, and add
documentation warnings against setting it nonzero unless active use of
prepared transactions is intended and a suitable transaction manager has been
installed. This should help to prevent the type of scenario we've seen
several times now where a prepared transaction is forgotten and eventually
causes severe maintenance problems (or even anti-wraparound shutdown).

The only real reason we had the default be nonzero in the first place was to
support regression testing of the feature. To still be able to do that,
tweak pg_regress to force a nonzero value during "make check". Since we
cannot force a nonzero value in "make installcheck", add a variant regression
test "expected" file that shows the results that will be obtained when
max_prepared_transactions is zero.

Also, extend the HINT messages for transaction wraparound warnings to mention
the possibility that old prepared transactions are causing the problem.

All per today's discussion.

And indeed, including the test in the schedule works for make check, not
just an installcheck with explicitly enabled prepared xacts.

ISTM we should just add an alternative output for disabled prepared
xacts, and re-add the test?
Here's the context for the 2012 commit:
/messages/by-id/50099220.2060005@dunslane.net
So I hope any changes that are made will not result in a major slowdown
of buildfarm animals.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Aug-18, Andres Freund wrote:
So unless somebody disagrees I'm gonna add a new
prepared-transactions-2.spec.
I think keeping things separate if they're not really related is
sensible.
I think it might be a good idea to add that test to older branches too,
even if it's just 13 -- at least temporarily.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services