MVCC catalog access
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact. I decided to do some
testing to measure the impact. I was pleasantly surprised by the
results.
The attached patch is a quick hack to provide for MVCC catalog access.
It adds a GUC called "mvcc_catalog_access". When this GUC is set to
true, and heap_beginscan() or index_beginscan() is called with
SnapshotNow, those functions call GetLatestSnapshot() and use the
resulting snapshot in lieu of SnapshotNow. As a debugging double-check,
I modified HeapTupleSatisfiesNow() to elog(FATAL) if called while
mvcc_catalog_access is true. Of course, both of these are dirty
hacks. If we were actually to implement MVCC catalog access, I think
we'd probably just go through and start replacing instances of
SnapshotNow with GetLatestSnapshot(), but I wanted to make it easy to
do perf testing.
When I first made this change, I couldn't detect any real change;
indeed, it seemed that make check was running ever-so-slightly faster
than before, although that may well have been a testing artifact. I
wrote a test case that created a schema with 100,000 functions in it
and then dropped the schema (I believe it was Tom who previously
suggested this test case as a worst-case scenario for MVCC catalog
access). That didn't seem to be adversely affected either, even
though it must take ~700k additional MVCC snapshots with
mvcc_catalog_access = true.
MVCC Off: Create 8743.101 ms, Drop 9655.471 ms
MVCC On: Create 7462.882 ms, Drop 9515.537 ms
MVCC Off: Create 7519.160 ms, Drop 9380.905 ms
MVCC On: Create 7517.382 ms, Drop 9394.857 ms
The first "Create" seems to be artificially slow because of some kind
of backend startup overhead. Not sure exactly what.
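For reference, the stress test itself is attached rather than shown
inline; a minimal sketch of that sort of test driver might look like
this, with the schema name, function bodies, and connection parameters
being illustrative rather than taken from the attachment:

/*
 * Sketch of a pg_depend stress test: create a schema holding 100,000
 * functions, then drop the whole schema with CASCADE, timing both steps.
 * Build with: cc depend_stress.c -lpq -o depend_stress
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <libpq-fe.h>

static double
run_ms(PGconn *conn, const char *sql)
{
	struct timeval start, stop;
	PGresult   *res;

	gettimeofday(&start, NULL);
	res = PQexec(conn, sql);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		exit(1);
	}
	PQclear(res);
	gettimeofday(&stop, NULL);
	return (stop.tv_sec - start.tv_sec) * 1000.0 +
		(stop.tv_usec - start.tv_usec) / 1000.0;
}

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");		/* uses PG* environment vars */

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		exit(1);
	}
	printf("Create %.3f ms, ",
		   run_ms(conn,
				  "CREATE SCHEMA stress; "
				  "DO $$ BEGIN FOR i IN 1..100000 LOOP "
				  "EXECUTE format('CREATE FUNCTION stress.f%s() "
				  "RETURNS int LANGUAGE sql AS ''SELECT 1'';', i); "
				  "END LOOP; END $$;"));
	printf("Drop %.3f ms\n", run_ms(conn, "DROP SCHEMA stress CASCADE"));
	PQfinish(conn);
	return 0;
}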
After wracking my brain for a few minutes, I realized that the lack of
any apparent performance regression was probably due to the lack of
any concurrent connections, making the scans of the PGXACT array very
cheap. So I wrote a little program to open a bunch of extra
connections. My MacBook Pro grumbled when I tried to open more than
about 600, so I had to settle for that number. That was enough to
show up the cost of all those extra snapshots:
MVCC Off: Create 9065.887 ms, Drop 9599.494 ms
MVCC On: Create 8384.065 ms, Drop 10532.909 ms
MVCC Off: Create 7632.197 ms, Drop 9499.502 ms
MVCC On: Create 8215.443 ms, Drop 10033.499 ms
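The connection-opener itself is attached; in essence it is just a loop
over PQconnectdb() that then holds the connections idle, along these
lines (a sketch, not the attached source; option parsing is omitted):

/*
 * Sketch of a pg_cxn-style connection opener: open N connections and
 * hold them idle until the process is killed.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libpq-fe.h>

int
main(int argc, char **argv)
{
	int			n = (argc > 1) ? atoi(argv[1]) : 600;
	PGconn	  **conns = malloc(n * sizeof(PGconn *));
	int			i;

	for (i = 0; i < n; i++)
	{
		conns[i] = PQconnectdb("");		/* uses PG* environment vars */
		if (PQstatus(conns[i]) != CONNECTION_OK)
		{
			fprintf(stderr, "connection %d: %s", i,
					PQerrorMessage(conns[i]));
			exit(1);
		}
	}
	fprintf(stderr, "%d connections open\n", n);
	pause();					/* hold them all idle until killed */
	return 0;
}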
Now, I don't know about you, but I'm having a hard time getting
agitated about those numbers. Most people are not going to drop
100,000 objects with a single cascaded drop. And most people are not
going to have 600 connections open when they do. (The snapshot
overhead should be roughly proportional to the product of the number
of drops and the number of open connections, and the number of cases
where the product is as high as 60 million has got to be pretty
small.) But suppose that someone is in that situation. Well, then
they will take a... 10% performance penalty? That sounds plenty
tolerable to me, if it means we can start moving in the direction of
allowing some concurrent DDL.
Am I missing an important test case here? Are these results worse
than I think they are? Did I botch this testing somehow?
[MVCC catalog access patch, test program to create lots of idle
connections, and pg_depend stress test case attached.]
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
mvcc-catalog-access.patch (application/octet-stream)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9498cbb..ccce409 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1309,6 +1309,9 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
{
HeapScanDesc scan;
+ if (mvcc_catalog_access && snapshot == SnapshotNow)
+ snapshot = GetLatestSnapshot();
+
/*
* increment relation ref count while scanning relation
*
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b878155..67c0cff 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -242,6 +242,9 @@ index_beginscan(Relation heapRelation,
{
IndexScanDesc scan;
+ if (mvcc_catalog_access && snapshot == SnapshotNow)
+ snapshot = GetLatestSnapshot();
+
scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot);
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 22ba35f..fb3c295 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -76,6 +76,7 @@
#include "utils/portal.h"
#include "utils/ps_status.h"
#include "utils/snapmgr.h"
+#include "utils/tqual.h"
#include "utils/tzparser.h"
#include "utils/xml.h"
@@ -1455,6 +1456,16 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"mvcc_catalog_access", PGC_USERSET, DEVELOPER_OPTIONS,
+ gettext_noop("Use MVCC catalog access."),
+ NULL,
+ },
+ &mvcc_catalog_access,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 24384b4..7ecdbee 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -73,6 +73,9 @@ SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
+/* Use MVCC catalog access. */
+bool mvcc_catalog_access;
+
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
@@ -353,6 +356,8 @@ HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
bool
HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
+ if (mvcc_catalog_access)
+ elog(FATAL, "behold, we are dead");
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 465231c..16b67a3 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -29,6 +29,9 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
#define SnapshotAny (&SnapshotAnyData)
#define SnapshotToast (&SnapshotToastData)
+/* Use MVCC catalog access. */
+extern bool mvcc_catalog_access;
+
/*
* We don't provide a static SnapshotDirty variable because it would be
* non-reentrant. Instead, users of that snapshot type should declare a
On Tue, May 21, 2013 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
[ MVCC catalog access seems to be pretty cheap ]
In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as "SELECT 2+2" in the absence
of a relcache file.
I did this and can't measure any overhead as a result of MVCC catalog
access. I tried it with no active connections. I tried it with 600
idle connections (to make taking MVCC snapshots more
expensive). I couldn't quite believe it made no difference, so I
tried doing it in a tight loop under pgbench. I still can't measure
any difference. I haven't tested carefully enough to rule out the
possibility of an effect <1/2% at 600 connections, but there certainly
isn't anything bigger than that and I don't even think there's that
much of a difference.
Andres Freund suggested creating a couple of simple tables and having
lots of short-lived backends select data from them.
rhaas=# create table af1 (x) as select g from generate_series(1,4) g;
SELECT 4
rhaas=# create table af2 (x) as select g from generate_series(4,7) g;
SELECT 4
Test query: SELECT * FROM af1, af2 WHERE af1.x = af2.x;
pgbench command: pgbench -T 10 -c 50 -j 50 -n -f f -C
With mvcc_catalog_access=off, I get ~1553 tps; with it on, I get ~1557
tps. Hmm... that could be because of the two-line debugging hunk my
patch adds to HeapTupleSatisfiesNow(). After removing that, I get
maybe a 1% regression with mvcc_catalog_access=on on this test, but
it's not very consistent. If I restart the database server a few
times, the overhead bounces around each time, and sometimes it's zero;
the highest I saw was 1.4%. But it's not much, and this is a pretty
brutal workload for PostgreSQL, since starting up >1500 connections
per second is not a workload for which we're well-suited in the first
place.
All in all, I'm still feeling optimistic.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as "SELECT 2+2" in the absence
of a relcache file.
I did this and can't measure any overhead as a result of MVCC catalog
access. I tried it with no active connections. I tried it with 600
idle connections (to make taking MVCC snapshots more
expensive).
Did you try it with the 600 connections actually being in a transaction
and having acquired a snapshot?
All in all, I'm still feeling optimistic.
+1. I still feel like this has to be much harder, since we made it out
to be hard for such a long time ;)
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 22, 2013 at 11:02 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as "SELECT 2+2" in the absence
of a relcache file.
I did this and can't measure any overhead as a result of MVCC catalog
access. I tried it with no active connections. I tried it with 600
idle connections (to make taking MVCC snapshots more
expensive).
Did you try it with the 600 connections actually being in a transaction
and having acquired a snapshot?
No... I can hack something up for that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-05-22 23:05:40 -0400, Robert Haas wrote:
On Wed, May 22, 2013 at 11:02 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as "SELECT 2+2" in the absence
of a relcache file.
I did this and can't measure any overhead as a result of MVCC catalog
access. I tried it with no active connections. I tried it with 600
idle connections (to make taking MVCC snapshots more
expensive).
Did you try it with the 600 connections actually being in a transaction
and having acquired a snapshot?
No... I can hack something up for that.
Make that actually having acquired an xid. We skip a large part of the
work if a transaction doesn't yet have one, AFAIR. I don't think the mere
presence of 600 idle connections without an xid, in contrast to just
having max_connections at 600, should actually make a difference in the
cost of acquiring a snapshot?
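For context, the per-backend work in GetSnapshotData() looks roughly
like this (loosely paraphrased from procarray.c; vacuum-flag and
subtransaction handling elided), which is why backends holding no xid
add little to the loop:

/* Paraphrase of the GetSnapshotData() loop, details elided. */
for (index = 0; index < numProcs; index++)
{
	volatile PGXACT *pgxact = &allPgXact[pgprocnos[index]];
	TransactionId xid = pgxact->xid;

	/*
	 * Backends with no xid assigned (or with xid >= xmax) contribute
	 * nothing to the snapshot's xip[] array.
	 */
	if (!TransactionIdIsNormal(xid) ||
		!NormalTransactionIdPrecedes(xid, xmax))
		continue;

	if (NormalTransactionIdPrecedes(xid, xmin))
		xmin = xid;

	snapshot->xip[count++] = xid;
}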
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 22, 2013 at 11:11 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Make that actually having acquired an xid. We skip a large part of the
work if a transaction doesn't yet have one, AFAIR. I don't think the mere
presence of 600 idle connections without an xid, in contrast to just
having max_connections at 600, should actually make a difference in the
cost of acquiring a snapshot?
Attached is a slightly updated version of the patch I'm using for
testing, and an updated version of the pg_cxn source that I'm using to
open lotsa connections. With this version, I can do this:
./pg_cxn -n 600 -c BEGIN -c 'SELECT txid_current()'
...which I think is sufficient to make sure all those transactions
have XIDs. Then I reran the "depend" test case (create a schema with
100,000 functions and then drop the schema with CASCADE) that I
mentioned in my original posting. Here are the results:
MVCC Off: Create 8685.662 ms, Drop 9973.233 ms
MVCC On: Create 7931.039 ms, Drop 10189.189 ms
MVCC Off: Create 7810.084 ms, Drop 9594.580 ms
MVCC On: Create 8854.577 ms, Drop 10240.024 ms
OK, let's try the rebuild-the-relcache test using the same pg_cxn
scenario (600 connections that have started a transaction and
selected txid_current()).
[rhaas ~]$ time for s in `seq 1 1000`; do rm -f
pgdata/global/pg_internal.init && psql -c 'SELECT 2+2' >/dev/null;
done
MVCC catalog access on:
real 0m11.006s
user 0m2.746s
sys 0m2.664s
MVCC catalog access off:
real 0m10.583s
user 0m2.745s
sys 0m2.661s
MVCC catalog access on:
real 0m10.646s
user 0m2.750s
sys 0m2.661s
MVCC catalog access off:
real 0m10.823s
user 0m2.756s
sys 0m2.681s
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
mvcc-catalog-access-v2.patch (application/octet-stream)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9498cbb..ccce409 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1309,6 +1309,9 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
{
HeapScanDesc scan;
+ if (mvcc_catalog_access && snapshot == SnapshotNow)
+ snapshot = GetLatestSnapshot();
+
/*
* increment relation ref count while scanning relation
*
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b878155..67c0cff 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -242,6 +242,9 @@ index_beginscan(Relation heapRelation,
{
IndexScanDesc scan;
+ if (mvcc_catalog_access && snapshot == SnapshotNow)
+ snapshot = GetLatestSnapshot();
+
scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot);
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 22ba35f..fb3c295 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -76,6 +76,7 @@
#include "utils/portal.h"
#include "utils/ps_status.h"
#include "utils/snapmgr.h"
+#include "utils/tqual.h"
#include "utils/tzparser.h"
#include "utils/xml.h"
@@ -1455,6 +1456,16 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"mvcc_catalog_access", PGC_USERSET, DEVELOPER_OPTIONS,
+ gettext_noop("Use MVCC catalog access."),
+ NULL,
+ },
+ &mvcc_catalog_access,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 24384b4..b7a636b 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -73,6 +73,9 @@ SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
+/* Use MVCC catalog access. */
+bool mvcc_catalog_access;
+
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 465231c..16b67a3 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -29,6 +29,9 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
#define SnapshotAny (&SnapshotAnyData)
#define SnapshotToast (&SnapshotToastData)
+/* Use MVCC catalog access. */
+extern bool mvcc_catalog_access;
+
/*
* We don't provide a static SnapshotDirty variable because it would be
* non-reentrant. Instead, users of that snapshot type should declare a
Perhaps we see little difference in performance because PGPROC has been
separated into PGPROC and PGXACT, reducing lock contention when getting
snapshot data?
By the way, I grabbed a 32-core machine and did some more performance
tests, using pg_cxn v2 from Robert's previous mail to open connections
with XIDs assigned, so as to make sure that the snapshots get pretty
large.
First I ran the simple read test:
$ time for s in `seq 1 1000`
do
rm -f ~/bin/pgsql/master/global/pg_internal.init && psql -c 'SELECT 2+2' > /dev/null;
done
And then the create/drop test.
I have done those tests with 250, 500, 1000 and 2000 open connections:
1) 250 open connections
1-1) read test
Round 1:
mvcc_catalog_access off:
real 0m9.124s
user 0m0.200s
sys 0m0.392s
mvcc_catalog_access on:
real 0m9.297s
user 0m0.148s
sys 0m0.444s
Round 2:
mvcc_catalog_access off:
real 0m8.985s
user 0m0.160s
sys 0m0.372s
mvcc_catalog_access on:
real 0m9.244s
user 0m0.240s
sys 0m0.400s
1-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 24554.849, Drop: 29755.146
mvcc on: Create: 26904.755, Drop: 32891.556
mvcc off: Create: 23337.342, Drop: 29921.990
mvcc on: Create: 24533.708, Drop: 31670.840
2) 500 open connections
2-1) read test
Round 1:
mvcc_catalog_access off:
real 0m9.123s
user 0m0.200s
sys 0m0.396s
mvcc_catalog_access on:
real 0m9.627s
user 0m0.156s
sys 0m0.460s
Round 2:
mvcc_catalog_access off:
real 0m9.221s
user 0m0.316s
sys 0m0.392s
mvcc_catalog_access on:
real 0m9.592s
user 0m0.160s
sys 0m0.484s
2-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 25872.886, Drop: 31723.921
mvcc on: Create: 27076.429, Drop: 33674.336
mvcc off: Create: 24039.456, Drop: 30434.019
mvcc on: Create: 29105.713, Drop: 33821.170
3) 1000 open connections
3-1) read test
Round 1:
mvcc_catalog_access off:
real 0m9.240s
user 0m0.192s
sys 0m0.396s
mvcc_catalog_access on:
real 0m9.674s
user 0m0.236s
sys 0m0.440s
Round 2:
mvcc_catalog_access off:
real 0m9.302s
user 0m0.308s
sys 0m0.392s
mvcc_catalog_access on:
real 0m9.746s
user 0m0.204s
sys 0m0.436s
3-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 25563.705, Drop: 31747.451
mvcc on: Create: 33281.246, Drop: 36618.166
mvcc off: Create: 28178.210, Drop: 30550.166
mvcc on: Create: 31849.825, Drop: 36831.245
4) 2000 open connections
4-1) read test
Round 1:
mvcc_catalog_access off:
real 0m9.066s
user 0m0.128s
sys 0m0.420s
mvcc_catalog_access on:
real 0m9.978s
user 0m0.168s
sys 0m0.412s
Round 2:
mvcc_catalog_access off:
real 0m9.113s
user 0m0.152s
sys 0m0.444s
mvcc_catalog_access on:
real 0m9.974s
user 0m0.176s
sys 0m0.436s
More or less the same results as in round 1, with a ~10% performance drop.
4-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 28708.095 ms, Drop: 32510.057 ms
mvcc on: Create: 39987.815 ms, Drop: 43157.006 ms
mvcc off: Create: 28409.853 ms, Drop: 31248.163 ms
mvcc on: Create: 41669.829 ms, Drop: 44645.109 ms
For the read tests, we can see a performance drop of up to 10% at 2000
connections.
For the write tests, we can see a performance drop of 9~10% at 250
connections, and up to a 30% drop at 2000 connections.
We rarely see users drop that many objects at once with so many open
transactions; they'll switch to a connection pooler before opening
that many connections to the server. I am not sure that such a
performance drop is acceptable as-is, but perhaps it is if we consider
the functionality gain we can have thanks to MVCC catalogs.
--
Michael
On Sun, May 26, 2013 at 9:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Perhaps we see little difference in performance because PGPROC has been
separated into PGPROC and PGXACT, reducing lock contention when getting
snapshot data?
By the way, I grabbed a 32-core machine and did some more performance
tests, using pg_cxn v2 from Robert's previous mail to open connections
with XIDs assigned, so as to make sure that the snapshots get pretty
large.
Thanks for checking this on another machine. It's interesting that
you were able to measure a hit for relcache rebuild, whereas I was
not, but it doesn't look horrible.
IMHO, we should press forward with this approach. Considering that
these are pretty extreme test cases, I'm inclined to view the
performance loss as acceptable. We've never really viewed DDL as
something that needs to be micro-optimized, and there is ample
testimony to that fact in the existing code and in the treatment of
prior patches in this area. This is not to say that we want to go
around willy-nilly making it slower, but I think there will be very
few users for which the number of microseconds it takes to create or
drop an SQL object is performance-critical, especially when you
consider that (1) the effect will be quite a bit less when the objects
are tables, since in that case the snapshot cost will tend to be
drowned out by the filesystem cost and (2) people who don't habitually
keep hundreds and hundreds of connections open - which hopefully most
people don't - won't see the effect anyway. Against that, this
removes the single largest barrier to allowing more concurrent DDL, a
feature that I suspect will make a whole lot of people *very* happy.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 28, 2013 at 10:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
IMHO, we should press forward with this approach. Considering that
these are pretty extreme test cases, I'm inclined to view the
performance loss as acceptable. We've never really viewed DDL as
something that needs to be micro-optimized, and there is ample
testimony to that fact in the existing code and in the treatment of
prior patches in this area. This is not to say that we want to go
around willy-nilly making it slower, but I think there will be very
few users for which the number of microseconds it takes to create or
drop an SQL object is performance-critical, especially when you
consider that (1) the effect will be quite a bit less when the objects
are tables, since in that case the snapshot cost will tend to be
drowned out by the filesystem cost and (2) people who don't habitually
keep hundreds and hundreds of connections open - which hopefully most
people don't - won't see the effect anyway. Against that, this
removes the single largest barrier to allowing more concurrent DDL, a
feature that I suspect will make a whole lot of people *very* happy.
+1.
So, I imagine that the next step would be to add a new Snapshot validation
level in tqual.h. Something like SnapshotMVCC? Then replace SnapshotNow
with SnapshotMVCC where it is required.
I also see that SnapshotNow is used in places where we might not want
it changed. For example, the autovacuum code path where we retrieve the
database or table list should not be changed, no?
--
Michael
On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
+1.
Here's a more serious patch for MVCC catalog access. This one
involves more data copying than the last one, I think, because the
previous version did not register the snapshots it took, which I think
is not safe. So this needs to be re-tested for performance, which I
have so far made no attempt to do.
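To illustrate the calling convention the patch establishes: callers now
pass NULL for the snapshot, and systable_beginscan() registers a fresh
MVCC snapshot itself and unregisters it again in systable_endscan().
A sketch of a typical caller (classId and objectId stand in for whatever
values the caller has on hand):

/* Catalog scan under the NULL-snapshot convention from the patch below. */
Relation	depRel;
ScanKeyData key[2];
SysScanDesc scan;
HeapTuple	tup;

depRel = heap_open(DependRelationId, AccessShareLock);

ScanKeyInit(&key[0], Anum_pg_depend_classid,
			BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(classId));
ScanKeyInit(&key[1], Anum_pg_depend_objid,
			BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(objectId));

/* NULL => systable_beginscan takes and registers an MVCC snapshot */
scan = systable_beginscan(depRel, DependDependerIndexId, true,
						  NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
	/* ... examine tup ... */
}
systable_endscan(scan);			/* also unregisters the snapshot */
heap_close(depRel, AccessShareLock);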
It strikes me as rather unfortunate that the snapshot interface is
designed in such a way as to require so much data copying. It seems
we always take a snapshot by copying from PGXACT/PGPROC into
CurrentSnapshotData or SecondarySnapshotData, and then copying data a
second time from there to someplace more permanent. It would be nice
to avoid that, at least in common cases.
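Concretely, the double copy I'm complaining about looks like this (a
loose paraphrase of the current snapmgr.c behavior, not the actual code):

static void
example_catalog_snapshot_use(void)
{
	/*
	 * Copy #1: GetSnapshotData() fills in the xip[] array of a static
	 * SnapshotData (SecondarySnapshotData) from the PGXACT array.
	 */
	Snapshot	snap = GetLatestSnapshot();

	/*
	 * Copy #2: the static snapshot is clobbered by the next snapshot
	 * taken, so registering it copies it again: RegisterSnapshot() calls
	 * CopySnapshot(), which pallocs a new SnapshotData plus xip/subxip
	 * arrays in TopTransactionContext and copies the contents over.
	 */
	snap = RegisterSnapshot(snap);

	/* ... scan some catalog using snap ... */

	UnregisterSnapshot(snap);	/* releases the palloc'd copy */
}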
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
mvcc-catalog-access-v3.patch (application/octet-stream)
commit 1e6f9c79e2b2b9a2f4d5635e04310355a8c91d9c
Author: Robert Haas <rhaas@postgresql.org>
Date: Sat Jun 1 23:19:33 2013 -0400
Switch to MVCC catalog access.
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index e617f9b..1110719 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2046,7 +2046,7 @@ get_pkey_attnames(Relation rel, int16 *numatts)
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(indexRelation, IndexIndrelidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(indexTuple = systable_getnext(scan)))
{
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 17b832e..81ab972 100644
--- a/contrib/sepgsql/label.c
+++ b/contrib/sepgsql/label.c
@@ -727,7 +727,7 @@ exec_object_restorecon(struct selabel_handle * sehnd, Oid catalogId)
rel = heap_open(catalogId, AccessShareLock);
sscan = systable_beginscan(rel, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while (HeapTupleIsValid(tuple = systable_getnext(sscan)))
{
Form_pg_database datForm;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e88dd30..e2b4daa 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -80,7 +80,7 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync,
- bool is_bitmapscan);
+ bool is_bitmapscan, bool temp_snap);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1283,7 +1283,16 @@ heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false);
+ true, true, false, false);
+}
+
+HeapScanDesc
+heap_beginscan_instant(Relation relation, int nkeys, ScanKey key)
+{
+ Snapshot snapshot = RegisterSnapshot(GetInstantSnapshot());
+
+ return heap_beginscan_internal(relation, snapshot, nkeys, key,
+ true, true, false, true);
}
HeapScanDesc
@@ -1292,7 +1301,7 @@ heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- allow_strat, allow_sync, false);
+ allow_strat, allow_sync, false, false);
}
HeapScanDesc
@@ -1300,14 +1309,14 @@ heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- false, false, true);
+ false, false, true, false);
}
static HeapScanDesc
heap_beginscan_internal(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync,
- bool is_bitmapscan)
+ bool is_bitmapscan, bool temp_snap)
{
HeapScanDesc scan;
@@ -1332,6 +1341,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
scan->rs_strategy = NULL; /* set in initscan */
scan->rs_allow_strat = allow_strat;
scan->rs_allow_sync = allow_sync;
+ scan->rs_temp_snap = temp_snap;
/*
* we can use page-at-a-time mode if it's an MVCC-safe snapshot
@@ -1418,6 +1428,9 @@ heap_endscan(HeapScanDesc scan)
if (scan->rs_strategy != NULL)
FreeAccessStrategy(scan->rs_strategy);
+ if (scan->rs_temp_snap)
+ UnregisterSnapshot(scan->rs_snapshot);
+
pfree(scan);
}
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 31a419b..06c9406 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -28,6 +28,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/tqual.h"
@@ -231,7 +232,7 @@ BuildIndexValueDescription(Relation indexRelation,
* rel: catalog to scan, already opened and suitably locked
* indexId: OID of index to conditionally use
* indexOK: if false, forces a heap scan (see notes below)
- * snapshot: time qual to use (usually should be SnapshotNow)
+ * snapshot: time qual to use (NULL for an instantaneous snapshot)
* nkeys, key: scan keys
*
* The attribute numbers in the scan key should be set for the heap case.
@@ -266,6 +267,17 @@ systable_beginscan(Relation heapRelation,
sysscan->heap_rel = heapRelation;
sysscan->irel = irel;
+ if (snapshot == NULL)
+ {
+ snapshot = RegisterSnapshot(GetInstantSnapshot());
+ sysscan->snapshot = snapshot;
+ }
+ else
+ {
+ /* Caller is responsible for any snapshot. */
+ sysscan->snapshot = NULL;
+ }
+
if (irel)
{
int i;
@@ -401,6 +413,9 @@ systable_endscan(SysScanDesc sysscan)
else
heap_endscan(sysscan->scan);
+ if (sysscan->snapshot)
+ UnregisterSnapshot(sysscan->snapshot);
+
pfree(sysscan);
}
@@ -444,6 +459,17 @@ systable_beginscan_ordered(Relation heapRelation,
sysscan->heap_rel = heapRelation;
sysscan->irel = indexRelation;
+ if (snapshot == NULL)
+ {
+ snapshot = RegisterSnapshot(GetInstantSnapshot());
+ sysscan->snapshot = snapshot;
+ }
+ else
+ {
+ /* Caller is responsible for any snapshot. */
+ sysscan->snapshot = NULL;
+ }
+
/* Change attribute numbers to be index column numbers. */
for (i = 0; i < nkeys; i++)
{
@@ -494,5 +520,7 @@ systable_endscan_ordered(SysScanDesc sysscan)
{
Assert(sysscan->irel);
index_endscan(sysscan->iscan);
+ if (sysscan->snapshot)
+ UnregisterSnapshot(sysscan->snapshot);
pfree(sysscan);
}
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index fcf1a95..d7853c0 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -142,8 +142,7 @@ guarantees that VACUUM can't delete any heap tuple that an indexscanning
process might be about to visit. (This guarantee works only for simple
indexscans that visit the heap in sync with the index scan, not for bitmap
scans. We only need the guarantee when using non-MVCC snapshot rules such
-as SnapshotNow, so in practice this is only important for system catalog
-accesses.)
+as SnapshotNow.)
Because a page can be split even while someone holds a pin on it, it is
possible that an indexscan will return items that are no longer stored on
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 8905596..e647326 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -611,7 +611,7 @@ boot_openrel(char *relname)
{
/* We can now load the pg_type data */
rel = heap_open(TypeRelationId, NoLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
i = 0;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
++i;
@@ -620,7 +620,7 @@ boot_openrel(char *relname)
while (i-- > 0)
*app++ = ALLOC(struct typmap, 1);
*app = NULL;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
app = Typ;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -918,7 +918,7 @@ gettype(char *type)
}
elog(DEBUG4, "external type: %s", type);
rel = heap_open(TypeRelationId, NoLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
i = 0;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
++i;
@@ -927,7 +927,7 @@ gettype(char *type)
while (i-- > 0)
*app++ = ALLOC(struct typmap, 1);
*app = NULL;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
app = Typ;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index cb9b75a..cc17b42 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -788,7 +788,7 @@ objectsInSchemaToOids(GrantObjectType objtype, List *nspnames)
ObjectIdGetDatum(namespaceId));
rel = heap_open(ProcedureRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 1, key);
+ scan = heap_beginscan_instant(rel, 1, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -833,7 +833,7 @@ getRelationsInNamespace(Oid namespaceId, char relkind)
CharGetDatum(relkind));
rel = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 2, key);
+ scan = heap_beginscan_instant(rel, 2, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -1333,7 +1333,7 @@ RemoveRoleFromObjectACL(Oid roleid, Oid classid, Oid objid)
ObjectIdGetDatum(objid));
scan = systable_beginscan(rel, DefaultAclOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
@@ -1453,7 +1453,7 @@ RemoveDefaultACLById(Oid defaclOid)
ObjectIdGetDatum(defaclOid));
scan = systable_beginscan(rel, DefaultAclOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
@@ -2706,7 +2706,7 @@ ExecGrant_Largeobject(InternalGrant *istmt)
scan = systable_beginscan(relation,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -3469,7 +3469,7 @@ pg_aclmask(AclObjectKind objkind, Oid table_oid, AttrNumber attnum, Oid roleid,
return pg_language_aclmask(table_oid, roleid, mask, how);
case ACL_KIND_LARGEOBJECT:
return pg_largeobject_aclmask_snapshot(table_oid, roleid,
- mask, how, SnapshotNow);
+ mask, how, NULL);
case ACL_KIND_NAMESPACE:
return pg_namespace_aclmask(table_oid, roleid, mask, how);
case ACL_KIND_TABLESPACE:
@@ -3857,10 +3857,13 @@ pg_language_aclmask(Oid lang_oid, Oid roleid,
* Exported routine for examining a user's privileges for a largeobject
*
* When a large object is opened for reading, it is opened relative to the
- * caller's snapshot, but when it is opened for writing, it is always relative
- * to SnapshotNow, as documented in doc/src/sgml/lobj.sgml. This function
- * takes a snapshot argument so that the permissions check can be made relative
- * to the same snapshot that will be used to read the underlying data.
+ * caller's snapshot, but when it is opened for writing, an instantaneous
+ * MVCC snapshot will be used. See doc/src/sgml/lobj.sgml. This function
+ * takes a snapshot argument so that the permissions check can be made
+ * relative to the same snapshot that will be used to read the underlying
+ * data. The caller will actually pass NULL for an instantaneous MVCC
+ * snapshot, since all we do with the snapshot argument is pass it through
+ * to systable_beginscan().
*/
AclMode
pg_largeobject_aclmask_snapshot(Oid lobj_oid, Oid roleid,
@@ -4645,7 +4648,7 @@ pg_language_ownercheck(Oid lan_oid, Oid roleid)
* Ownership check for a largeobject (specified by OID)
*
* This is only used for operations like ALTER LARGE OBJECT that are always
- * relative to SnapshotNow.
+ * relative to an up-to-date snapshot.
*/
bool
pg_largeobject_ownercheck(Oid lobj_oid, Oid roleid)
@@ -4671,7 +4674,7 @@ pg_largeobject_ownercheck(Oid lobj_oid, Oid roleid)
scan = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -5033,7 +5036,7 @@ pg_extension_ownercheck(Oid ext_oid, Oid roleid)
scan = systable_beginscan(pg_extension,
ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 41a5da0..1378488 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -232,6 +232,10 @@ IsReservedName(const char *name)
* know if it's shared. Fortunately, the set of shared relations is
* fairly static, so a hand-maintained list of their OIDs isn't completely
* impractical.
+ *
+ * XXX: Now that we have MVCC catalog access, the reasoning above is no longer
+ * true. Are there other good reasons to hard-code this, or should we revisit
+ * that decision?
*/
bool
IsSharedRelation(Oid relationId)
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 69171f8..fe17c96 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -558,7 +558,7 @@ findDependentObjects(const ObjectAddress *object,
nkeys = 2;
scan = systable_beginscan(*depRel, DependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -733,7 +733,7 @@ findDependentObjects(const ObjectAddress *object,
nkeys = 2;
scan = systable_beginscan(*depRel, DependReferenceIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1069,7 +1069,7 @@ deleteOneObject(const ObjectAddress *object, Relation *depRel, int flags)
nkeys = 2;
scan = systable_beginscan(*depRel, DependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 45a84e4..4fd42ed 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1386,7 +1386,7 @@ RelationRemoveInheritance(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
simple_heap_delete(catalogRelation, &tuple->t_self);
@@ -1450,7 +1450,7 @@ DeleteAttributeTuples(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(attrel, AttributeRelidNumIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/* Delete all the matching tuples */
while ((atttup = systable_getnext(scan)) != NULL)
@@ -1491,7 +1491,7 @@ DeleteSystemAttributeTuples(Oid relid)
Int16GetDatum(0));
scan = systable_beginscan(attrel, AttributeRelidNumIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/* Delete all the matching tuples */
while ((atttup = systable_getnext(scan)) != NULL)
@@ -1623,7 +1623,7 @@ RemoveAttrDefault(Oid relid, AttrNumber attnum,
Int16GetDatum(attnum));
scan = systable_beginscan(attrdef_rel, AttrDefaultIndexId, true,
- SnapshotNow, 2, scankeys);
+ NULL, 2, scankeys);
/* There should be at most one matching tuple, but we loop anyway */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -1677,7 +1677,7 @@ RemoveAttrDefaultById(Oid attrdefId)
ObjectIdGetDatum(attrdefId));
scan = systable_beginscan(attrdef_rel, AttrDefaultOidIndexId, true,
- SnapshotNow, 1, scankeys);
+ NULL, 1, scankeys);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -2374,7 +2374,7 @@ MergeWithExistingConstraint(Relation rel, char *ccname, Node *expr,
ObjectIdGetDatum(RelationGetNamespace(rel)));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -2640,7 +2640,7 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
}
scan = systable_beginscan(pgstatistic, StatisticRelidAttnumInhIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
/* we must loop even when attnum != 0, in case of inherited stats */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -2885,7 +2885,7 @@ heap_truncate_find_FKs(List *relationIds)
fkeyRel = heap_open(ConstraintRelationId, AccessShareLock);
fkeyScan = systable_beginscan(fkeyRel, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while (HeapTupleIsValid(tuple = systable_getnext(fkeyScan)))
{
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..8eb0bf1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1845,7 +1845,7 @@ index_update_stats(Relation rel,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relid));
- pg_class_scan = heap_beginscan(pg_class, SnapshotNow, 1, key);
+ pg_class_scan = heap_beginscan_instant(pg_class, 1, key);
tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
tuple = heap_copytuple(tuple);
heap_endscan(pg_class_scan);
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 23943ff..4434dd6 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4013,8 +4013,8 @@ fetch_search_path_array(Oid *sarray, int sarray_len)
* a nonexistent object OID, rather than failing. This is to avoid race
* condition errors when a query that's scanning a catalog using an MVCC
* snapshot uses one of these functions. The underlying IsVisible functions
- * operate on SnapshotNow semantics and so might see the object as already
- * gone when it's still visible to the MVCC snapshot. (There is no race
+ * always use an up-to-date snapshot and so might see the object as already
+ * gone when it's still visible to the transaction snapshot. (There is no race
* condition in the current coding because we don't accept sinval messages
* between the SearchSysCacheExists test and the subsequent lookup.)
*/
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 215eaf5..4d22f3a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -1481,7 +1481,7 @@ get_catalog_object_by_oid(Relation catalog, Oid objectId)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(catalog, oidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
{
@@ -1544,7 +1544,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(castDesc, CastOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -1644,7 +1644,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
adscan = systable_beginscan(attrdefDesc, AttrDefaultOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(adscan);
@@ -1750,7 +1750,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amopDesc, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -1800,7 +1800,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amprocDesc, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -1848,7 +1848,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(ruleDesc, RewriteOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -1883,7 +1883,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
tgscan = systable_beginscan(trigDesc, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(tgscan);
@@ -2064,7 +2064,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(defaclrel, DefaultAclOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -2816,7 +2816,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
adscan = systable_beginscan(attrdefDesc, AttrDefaultOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(adscan);
@@ -2921,7 +2921,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amopDesc, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -2965,7 +2965,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amprocDesc, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -3218,7 +3218,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(defaclrel, DefaultAclOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(rcscan);
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index dd00502..99f4be5 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -166,7 +166,7 @@ RemoveCollationById(Oid collationOid)
ObjectIdGetDatum(collationOid));
scandesc = systable_beginscan(rel, CollationOidIndexId, true,
- SnapshotNow, 1, &scanKeyData);
+ NULL, 1, &scanKeyData);
tuple = systable_getnext(scandesc);
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index a8eb4cb..5021420 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -412,7 +412,7 @@ ConstraintNameIsUsed(ConstraintCategory conCat, Oid objId,
ObjectIdGetDatum(objNamespace));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -506,7 +506,7 @@ ChooseConstraintName(const char *name1, const char *name2,
ObjectIdGetDatum(namespaceid));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
found = (HeapTupleIsValid(systable_getnext(conscan)));
@@ -699,7 +699,7 @@ AlterConstraintNamespaces(Oid ownerId, Oid oldNspId,
ObjectIdGetDatum(ownerId));
scan = systable_beginscan(conRel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
}
else
{
@@ -709,7 +709,7 @@ AlterConstraintNamespaces(Oid ownerId, Oid oldNspId,
ObjectIdGetDatum(ownerId));
scan = systable_beginscan(conRel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
}
while (HeapTupleIsValid((tup = systable_getnext(scan))))
@@ -778,7 +778,7 @@ get_relation_constraint_oid(Oid relid, const char *conname, bool missing_ok)
ObjectIdGetDatum(relid));
scan = systable_beginscan(pg_constraint, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -836,7 +836,7 @@ get_domain_constraint_oid(Oid typid, const char *conname, bool missing_ok)
ObjectIdGetDatum(typid));
scan = systable_beginscan(pg_constraint, ConstraintTypidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -903,7 +903,7 @@ check_functional_grouping(Oid relid,
ObjectIdGetDatum(relid));
scan = systable_beginscan(pg_constraint, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index 45d8e62..9cdf7b2 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -166,8 +166,7 @@ RemoveConversionById(Oid conversionOid)
/* open pg_conversion */
rel = heap_open(ConversionRelationId, RowExclusiveLock);
- scan = heap_beginscan(rel, SnapshotNow,
- 1, &scanKeyData);
+ scan = heap_beginscan_instant(rel, 1, &scanKeyData);
/* search for the target tuple */
if (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
diff --git a/src/backend/catalog/pg_db_role_setting.c b/src/backend/catalog/pg_db_role_setting.c
index 4594912..733bc3b 100644
--- a/src/backend/catalog/pg_db_role_setting.c
+++ b/src/backend/catalog/pg_db_role_setting.c
@@ -43,7 +43,7 @@ AlterSetting(Oid databaseid, Oid roleid, VariableSetStmt *setstmt)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(roleid));
scan = systable_beginscan(rel, DbRoleSettingDatidRolidIndexId, true,
- SnapshotNow, 2, scankey);
+ NULL, 2, scankey);
tuple = systable_getnext(scan);
/*
@@ -205,7 +205,7 @@ DropSetting(Oid databaseid, Oid roleid)
numkeys++;
}
- scan = heap_beginscan(relsetting, SnapshotNow, numkeys, keys);
+ scan = heap_beginscan_instant(relsetting, numkeys, keys);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
simple_heap_delete(relsetting, &tup->t_self);
@@ -244,7 +244,7 @@ ApplySetting(Oid databaseid, Oid roleid, Relation relsetting, GucSource source)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(relsetting, DbRoleSettingDatidRolidIndexId, true,
- SnapshotNow, 2, keys);
+ NULL, 2, keys);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
bool isnull;
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 9535fba..bd5cd99 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -211,7 +211,7 @@ deleteDependencyRecordsFor(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -261,7 +261,7 @@ deleteDependencyRecordsForClass(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -343,7 +343,7 @@ changeDependencyFor(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -407,7 +407,7 @@ isObjectPinned(const ObjectAddress *object, Relation rel)
ObjectIdGetDatum(object->objectId));
scan = systable_beginscan(rel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/*
* Since we won't generate additional pg_depend entries for pinned
@@ -467,7 +467,7 @@ getExtensionOfObject(Oid classId, Oid objectId)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -520,7 +520,7 @@ sequenceIsOwned(Oid seqId, Oid *tableId, int32 *colId)
ObjectIdGetDatum(seqId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -580,7 +580,7 @@ getOwnedSequences(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -643,7 +643,7 @@ get_constraint_index(Oid constraintId)
Int32GetDatum(0));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -701,7 +701,7 @@ get_index_constraint(Oid indexId)
Int32GetDatum(0));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index 7e746f9..a7ef8cd 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -156,7 +156,7 @@ EnumValuesDelete(Oid enumTypeOid)
ObjectIdGetDatum(enumTypeOid));
scan = systable_beginscan(pg_enum, EnumTypIdLabelIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -483,6 +483,9 @@ restart:
* (for example, enum_in and enum_out do so). The worst that can happen
* is a transient failure to find any valid value of the row. This is
* judged acceptable in view of the infrequency of use of RenumberEnumType.
+ *
+ * XXX. Now that we have MVCC catalog scans, the above reasoning is no longer
+ * correct. Should we revisit any decisions here?
*/
static void
RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index fbfe7bc..638e535 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -81,7 +81,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
ObjectIdGetDatum(parentrelId));
scan = systable_beginscan(relation, InheritsParentIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while ((inheritsTuple = systable_getnext(scan)) != NULL)
{
@@ -325,7 +325,7 @@ typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
ObjectIdGetDatum(this_relid));
inhscan = systable_beginscan(inhrel, InheritsRelidSeqnoIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while ((inhtup = systable_getnext(inhscan)) != NULL)
{
diff --git a/src/backend/catalog/pg_largeobject.c b/src/backend/catalog/pg_largeobject.c
index d01a5a7..22d499d 100644
--- a/src/backend/catalog/pg_largeobject.c
+++ b/src/backend/catalog/pg_largeobject.c
@@ -104,7 +104,7 @@ LargeObjectDrop(Oid loid)
scan = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -126,7 +126,7 @@ LargeObjectDrop(Oid loid)
scan = systable_beginscan(pg_largeobject,
LargeObjectLOidPNIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
simple_heap_delete(pg_largeobject, &tuple->t_self);
@@ -145,11 +145,11 @@ LargeObjectDrop(Oid loid)
* We don't use the system cache for large object metadata, for fear of
* using too much local memory.
*
- * This function always scans the system catalog using SnapshotNow, so it
- * should not be used when a large object is opened in read-only mode (because
- * large objects opened in read only mode are supposed to be viewed relative
- * to the caller's snapshot, whereas in read-write mode they are relative to
- * SnapshotNow).
+ * This function always scans the system catalog using an up-to-date snapshot,
+ * so it should not be used when a large object is opened in read-only mode
+ * (because large objects opened in read only mode are supposed to be viewed
+ * relative to the caller's snapshot, whereas in read-write mode they are
+ * relative to a current snapshot).
*/
bool
LargeObjectExists(Oid loid)
@@ -170,7 +170,7 @@ LargeObjectExists(Oid loid)
sd = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(sd);
if (HeapTupleIsValid(tuple))
diff --git a/src/backend/catalog/pg_range.c b/src/backend/catalog/pg_range.c
index 639b40c..b782f90 100644
--- a/src/backend/catalog/pg_range.c
+++ b/src/backend/catalog/pg_range.c
@@ -126,7 +126,7 @@ RangeDelete(Oid rangeTypeOid)
ObjectIdGetDatum(rangeTypeOid));
scan = systable_beginscan(pg_range, RangeTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c
index 7de4420..dc21c10 100644
--- a/src/backend/catalog/pg_shdepend.c
+++ b/src/backend/catalog/pg_shdepend.c
@@ -220,7 +220,7 @@ shdepChangeDep(Relation sdepRel,
Int32GetDatum(objsubid));
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 4, key);
+ NULL, 4, key);
while ((scantup = systable_getnext(scan)) != NULL)
{
@@ -554,7 +554,7 @@ checkSharedDependencies(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -729,7 +729,7 @@ copyTemplateDependencies(Oid templateDbId, Oid newDbId)
ObjectIdGetDatum(templateDbId));
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/* Set up to copy the tuples except for inserting newDbId */
memset(values, 0, sizeof(values));
@@ -792,7 +792,7 @@ dropDatabaseDependencies(Oid databaseId)
/* We leave the other index fields unspecified */
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -936,7 +936,7 @@ shdepDropDependency(Relation sdepRel,
}
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1125,7 +1125,7 @@ isSharedObjectPinned(Oid classId, Oid objectId, Relation sdepRel)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/*
* Since we won't generate additional pg_shdepend entries for pinned
@@ -1212,7 +1212,7 @@ shdepDropOwned(List *roleids, DropBehavior behavior)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while ((tuple = systable_getnext(scan)) != NULL)
{
@@ -1319,7 +1319,7 @@ shdepReassignOwned(List *roleids, Oid newrole)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while ((tuple = systable_getnext(scan)) != NULL)
{
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..591bad5 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
* against concurrent SnapshotNow scans of pg_index. Therefore this is unsafe
* to execute with less than full exclusive lock on the parent table;
* otherwise concurrent executions of RelationGetIndexList could miss indexes.
+ *
+ * XXX: Now that we have MVCC catalog access, SnapshotNow scans of pg_index
+ * shouldn't be common enough to worry about. The above comment needs
+ * to be updated, and it may be possible to simplify the logic here in other
+ * ways also.
*/
void
mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
@@ -1583,7 +1588,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
Anum_pg_index_indisclustered,
BTEqualStrategyNumber, F_BOOLEQ,
BoolGetDatum(true));
- scan = heap_beginscan(indRelation, SnapshotNow, 1, &entry);
+ scan = heap_beginscan_instant(indRelation, 1, &entry);
while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
index = (Form_pg_index) GETSTRUCT(indexTuple);
diff --git a/src/backend/commands/comment.c b/src/backend/commands/comment.c
index 60db27c..8baf017 100644
--- a/src/backend/commands/comment.c
+++ b/src/backend/commands/comment.c
@@ -187,7 +187,7 @@ CreateComments(Oid oid, Oid classoid, int32 subid, char *comment)
description = heap_open(DescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
{
@@ -281,7 +281,7 @@ CreateSharedComments(Oid oid, Oid classoid, char *comment)
shdescription = heap_open(SharedDescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(shdescription, SharedDescriptionObjIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
{
@@ -363,7 +363,7 @@ DeleteComments(Oid oid, Oid classoid, int32 subid)
description = heap_open(DescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, nkeys, skey);
+ NULL, nkeys, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
simple_heap_delete(description, &oldtuple->t_self);
@@ -399,7 +399,7 @@ DeleteSharedComments(Oid oid, Oid classoid)
shdescription = heap_open(SharedDescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(shdescription, SharedDescriptionObjIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
simple_heap_delete(shdescription, &oldtuple->t_self);
@@ -442,7 +442,7 @@ GetComment(Oid oid, Oid classoid, int32 subid)
tupdesc = RelationGetDescr(description);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
comment = NULL;
while ((tuple = systable_getnext(sd)) != NULL)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 0e10a75..34e3071 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -133,7 +133,6 @@ createdb(const CreatedbStmt *stmt)
int notherbackends;
int npreparedxacts;
createdb_failure_params fparms;
- Snapshot snapshot;
/* Extract options from the statement node tree */
foreach(option, stmt->options)
@@ -538,29 +537,6 @@ createdb(const CreatedbStmt *stmt)
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
/*
- * Take an MVCC snapshot to use while scanning through pg_tablespace. For
- * safety, register the snapshot (this prevents it from changing if
- * something else were to request a snapshot during the loop).
- *
- * Traversing pg_tablespace with an MVCC snapshot is necessary to provide
- * us with a consistent view of the tablespaces that exist. Using
- * SnapshotNow here would risk seeing the same tablespace multiple times,
- * or worse not seeing a tablespace at all, if its tuple is moved around
- * by a concurrent update (eg an ACL change).
- *
- * Inconsistency of this sort is inherent to all SnapshotNow scans, unless
- * some lock is held to prevent concurrent updates of the rows being
- * sought. There should be a generic fix for that, but in the meantime
- * it's worth fixing this case in particular because we are doing very
- * heavyweight operations within the scan, so that the elapsed time for
- * the scan is vastly longer than for most other catalog scans. That
- * means there's a much wider window for concurrent updates to cause
- * trouble here than anywhere else. XXX this code should be changed
- * whenever a generic fix is implemented.
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
-
- /*
* Once we start copying subdirectories, we need to be able to clean 'em
* up if we fail. Use an ENSURE block to make sure this happens. (This
* is not a 100% solution, because of the possibility of failure during
@@ -577,7 +553,7 @@ createdb(const CreatedbStmt *stmt)
* each one to the new database.
*/
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid srctablespace = HeapTupleGetOid(tuple);
@@ -682,9 +658,6 @@ createdb(const CreatedbStmt *stmt)
PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
PointerGetDatum(&fparms));
- /* Free our snapshot */
- UnregisterSnapshot(snapshot);
-
return dboid;
}
@@ -1214,7 +1187,7 @@ movedb(const char *dbname, const char *tblspcname)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(dbname));
sysscan = systable_beginscan(pgdbrel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
oldtuple = systable_getnext(sysscan);
if (!HeapTupleIsValid(oldtuple)) /* shouldn't happen... */
ereport(ERROR,
@@ -1403,7 +1376,7 @@ AlterDatabase(AlterDatabaseStmt *stmt, bool isTopLevel)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(stmt->dbname));
scan = systable_beginscan(rel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
@@ -1498,7 +1471,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(dbname));
scan = systable_beginscan(rel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
@@ -1637,7 +1610,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
NameGetDatum(name));
scan = systable_beginscan(relation, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scanKey);
+ NULL, 1, &scanKey);
tuple = systable_getnext(scan);
@@ -1751,20 +1724,9 @@ remove_dbtablespaces(Oid db_id)
Relation rel;
HeapScanDesc scan;
HeapTuple tuple;
- Snapshot snapshot;
-
- /*
- * As in createdb(), we'd better use an MVCC snapshot here, since this
- * scan can run for a long time. Duplicate visits to tablespaces would be
- * harmless, but missing a tablespace could result in permanently leaked
- * files.
- *
- * XXX change this when a generic fix for SnapshotNow races is implemented
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1810,7 +1772,6 @@ remove_dbtablespaces(Oid db_id)
heap_endscan(scan);
heap_close(rel, AccessShareLock);
- UnregisterSnapshot(snapshot);
}
/*
@@ -1832,19 +1793,9 @@ check_db_file_conflict(Oid db_id)
Relation rel;
HeapScanDesc scan;
HeapTuple tuple;
- Snapshot snapshot;
-
- /*
- * As in createdb(), we'd better use an MVCC snapshot here; missing a
- * tablespace could result in falsely reporting the OID is unique, with
- * disastrous future consequences per the comment above.
- *
- * XXX change this when a generic fix for SnapshotNow races is implemented
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1870,7 +1821,6 @@ check_db_file_conflict(Oid db_id)
heap_endscan(scan);
heap_close(rel, AccessShareLock);
- UnregisterSnapshot(snapshot);
return result;
}
@@ -1927,7 +1877,7 @@ get_database_oid(const char *dbname, bool missing_ok)
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(dbname));
scan = systable_beginscan(pg_database, DatabaseNameIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
dbtuple = systable_getnext(scan);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 2d84ac8..d5ac47f 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -128,7 +128,7 @@ get_extension_oid(const char *extname, bool missing_ok)
CStringGetDatum(extname));
scandesc = systable_beginscan(rel, ExtensionNameIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -173,7 +173,7 @@ get_extension_name(Oid ext_oid)
ObjectIdGetDatum(ext_oid));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -212,7 +212,7 @@ get_extension_schema(Oid ext_oid)
ObjectIdGetDatum(ext_oid));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -1605,7 +1605,7 @@ RemoveExtensionById(Oid extId)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(extId));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -2103,7 +2103,7 @@ pg_extension_config_dump(PG_FUNCTION_ARGS)
ObjectIdGetDatum(CurrentExtensionObject));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2252,7 +2252,7 @@ extension_config_remove(Oid extensionoid, Oid tableoid)
ObjectIdGetDatum(extensionoid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2460,7 +2460,7 @@ AlterExtensionNamespace(List *names, const char *newschema)
ObjectIdGetDatum(extensionOid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2508,7 +2508,7 @@ AlterExtensionNamespace(List *names, const char *newschema)
ObjectIdGetDatum(extensionOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -2618,7 +2618,7 @@ ExecAlterExtensionStmt(AlterExtensionStmt *stmt)
CStringGetDatum(stmt->extname));
extScan = systable_beginscan(extRel, ExtensionNameIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2768,7 +2768,7 @@ ApplyExtensionUpdates(Oid extensionOid,
ObjectIdGetDatum(extensionOid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 38187a8..f3a8ddd 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -1607,7 +1607,7 @@ DropCastById(Oid castOid)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(castOid));
scan = systable_beginscan(relation, CastOidIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 7ea90d0..67bd4b5 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1358,7 +1358,7 @@ GetDefaultOpClass(Oid type_id, Oid am_id)
ObjectIdGetDatum(am_id));
scan = systable_beginscan(rel, OpclassAmNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1838,7 +1838,7 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
* indirectly by reindex_relation).
*/
relationRelation = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(relationRelation, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(relationRelation, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index f2d78ef..3140b37 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -614,7 +614,7 @@ DefineOpClass(CreateOpClassStmt *stmt)
ObjectIdGetDatum(amoid));
scan = systable_beginscan(rel, OpclassAmNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1622,7 +1622,7 @@ RemoveAmOpEntryById(Oid entryOid)
rel = heap_open(AccessMethodOperatorRelationId, RowExclusiveLock);
scan = systable_beginscan(rel, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
/* we expect exactly one match */
tup = systable_getnext(scan);
@@ -1651,7 +1651,7 @@ RemoveAmProcEntryById(Oid entryOid)
rel = heap_open(AccessMethodProcedureRelationId, RowExclusiveLock);
scan = systable_beginscan(rel, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
/* we expect exactly one match */
tup = systable_getnext(scan);
diff --git a/src/backend/commands/proclang.c b/src/backend/commands/proclang.c
index 6e4c682..b7be1f7 100644
--- a/src/backend/commands/proclang.c
+++ b/src/backend/commands/proclang.c
@@ -455,7 +455,7 @@ find_language_template(const char *languageName)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(languageName));
scan = systable_beginscan(rel, PLTemplateNameIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
tup = systable_getnext(scan);
if (HeapTupleIsValid(tup))
diff --git a/src/backend/commands/seclabel.c b/src/backend/commands/seclabel.c
index 3b27ac2..7466e66 100644
--- a/src/backend/commands/seclabel.c
+++ b/src/backend/commands/seclabel.c
@@ -167,7 +167,7 @@ GetSharedSecurityLabel(const ObjectAddress *object, const char *provider)
pg_shseclabel = heap_open(SharedSecLabelRelationId, AccessShareLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 3, keys);
+ NULL, 3, keys);
tuple = systable_getnext(scan);
if (HeapTupleIsValid(tuple))
@@ -224,7 +224,7 @@ GetSecurityLabel(const ObjectAddress *object, const char *provider)
pg_seclabel = heap_open(SecLabelRelationId, AccessShareLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, 4, keys);
+ NULL, 4, keys);
tuple = systable_getnext(scan);
if (HeapTupleIsValid(tuple))
@@ -284,7 +284,7 @@ SetSharedSecurityLabel(const ObjectAddress *object,
pg_shseclabel = heap_open(SharedSecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 3, keys);
+ NULL, 3, keys);
oldtup = systable_getnext(scan);
if (HeapTupleIsValid(oldtup))
@@ -375,7 +375,7 @@ SetSecurityLabel(const ObjectAddress *object,
pg_seclabel = heap_open(SecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, 4, keys);
+ NULL, 4, keys);
oldtup = systable_getnext(scan);
if (HeapTupleIsValid(oldtup))
@@ -434,7 +434,7 @@ DeleteSharedSecurityLabel(Oid objectId, Oid classId)
pg_shseclabel = heap_open(SharedSecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(oldtup = systable_getnext(scan)))
simple_heap_delete(pg_shseclabel, &oldtup->t_self);
systable_endscan(scan);
@@ -485,7 +485,7 @@ DeleteSecurityLabel(const ObjectAddress *object)
pg_seclabel = heap_open(SecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, nkeys, skey);
+ NULL, nkeys, skey);
while (HeapTupleIsValid(oldtup = systable_getnext(scan)))
simple_heap_delete(pg_seclabel, &oldtup->t_self);
systable_endscan(scan);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8294b29..3a4a23a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2738,7 +2738,7 @@ AlterTableGetLockLevel(List *cmds)
* multiple DDL operations occur in a stream against frequently accessed
* tables.
*
- * 1. Catalog tables are read using SnapshotNow, which has a race bug that
+ * 1. Catalog tables were read using SnapshotNow, which has a race bug that
* allows a scan to return no valid rows even when one is present in the
* case of a commit of a concurrent update of the catalog table.
* SnapshotNow also ignores transactions in progress, so takes the latest
@@ -3793,7 +3793,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
* Scan through the rows, generating a new row if needed and then
* checking all the constraints.
*/
- scan = heap_beginscan(oldrel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(oldrel, 0, NULL);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -4170,7 +4170,7 @@ find_composite_type_dependencies(Oid typeOid, Relation origRelation,
ObjectIdGetDatum(typeOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -4269,7 +4269,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(typeOid));
- scan = heap_beginscan(classRel, SnapshotNow, 1, key);
+ scan = heap_beginscan_instant(classRel, 1, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -6202,7 +6202,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -6708,7 +6708,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
slot = MakeSingleTupleTableSlot(tupdesc);
econtext->ecxt_scantuple = slot;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -6783,7 +6783,7 @@ validateForeignKeyConstraint(char *conname,
* if that tuple had just been inserted. If any of those fail, it should
* ereport(ERROR) and that's that.
*/
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -7033,7 +7033,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -7114,7 +7114,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(childrelid));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
/* scan for matching tuple - there should only be one */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -7514,7 +7514,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
Int32GetDatum((int32) attnum));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTup = systable_getnext(scan)))
{
@@ -7699,7 +7699,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
Int32GetDatum((int32) attnum));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTup = systable_getnext(scan)))
{
@@ -8376,7 +8376,7 @@ change_owner_fix_column_acls(Oid relationOid, Oid oldOwnerId, Oid newOwnerId)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relationOid));
scan = systable_beginscan(attRelation, AttributeRelidNumIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(attributeTuple = systable_getnext(scan)))
{
Form_pg_attribute att = (Form_pg_attribute) GETSTRUCT(attributeTuple);
@@ -8453,7 +8453,7 @@ change_owner_recurse_to_sequences(Oid relationOid, Oid newOwnerId, LOCKMODE lock
/* we leave refobjsubid unspecified */
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -9047,7 +9047,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(child_rel)));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
/* inhseqno sequences start at 1 */
inhseqno = 0;
@@ -9289,7 +9289,7 @@ MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(parent_rel)));
parent_scan = systable_beginscan(catalog_relation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &parent_key);
+ true, NULL, 1, &parent_key);
while (HeapTupleIsValid(parent_tuple = systable_getnext(parent_scan)))
{
@@ -9312,7 +9312,7 @@ MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(child_rel)));
child_scan = systable_beginscan(catalog_relation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &child_key);
+ true, NULL, 1, &child_key);
while (HeapTupleIsValid(child_tuple = systable_getnext(child_scan)))
{
@@ -9420,7 +9420,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(inheritsTuple = systable_getnext(scan)))
{
@@ -9454,7 +9454,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, AttributeRelidNumIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(attributeTuple = systable_getnext(scan)))
{
Form_pg_attribute att = (Form_pg_attribute) GETSTRUCT(attributeTuple);
@@ -9496,7 +9496,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(parent_rel)));
scan = systable_beginscan(catalogRelation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
connames = NIL;
@@ -9516,7 +9516,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(constraintTuple = systable_getnext(scan)))
{
@@ -9608,7 +9608,7 @@ drop_parent_dependency(Oid relid, Oid refclassid, Oid refobjid)
Int32GetDatum(0));
scan = systable_beginscan(catalogRelation, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTuple = systable_getnext(scan)))
{
@@ -9663,7 +9663,7 @@ ATExecAddOf(Relation rel, const TypeName *ofTypename, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relid));
scan = systable_beginscan(inheritsRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
if (HeapTupleIsValid(systable_getnext(scan)))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -10119,7 +10119,7 @@ AlterSeqNamespaces(Relation classRel, Relation rel,
/* we leave refobjsubid unspecified */
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 8589512..dce227c 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -400,7 +400,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_instant(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
if (!HeapTupleIsValid(tuple))
@@ -831,7 +831,7 @@ RenameTableSpace(const char *oldname, const char *newname)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(oldname));
- scan = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scan = heap_beginscan_instant(rel, 1, entry);
tup = heap_getnext(scan, ForwardScanDirection);
if (!HeapTupleIsValid(tup))
ereport(ERROR,
@@ -861,7 +861,7 @@ RenameTableSpace(const char *oldname, const char *newname)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(newname));
- scan = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scan = heap_beginscan_instant(rel, 1, entry);
tup = heap_getnext(scan, ForwardScanDirection);
if (HeapTupleIsValid(tup))
ereport(ERROR,
@@ -910,7 +910,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(stmt->tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_instant(rel, 1, entry);
tup = heap_getnext(scandesc, ForwardScanDirection);
if (!HeapTupleIsValid(tup))
ereport(ERROR,
@@ -1311,7 +1311,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_instant(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
/* We assume that there can be at most one matching tuple */
@@ -1357,7 +1357,7 @@ get_tablespace_name(Oid spc_oid)
ObjectIdAttributeNumber,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(spc_oid));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_instant(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
/* We assume that there can be at most one matching tuple */
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ed65bab..d86e9ad 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -492,7 +492,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
{
Form_pg_trigger pg_trigger = (Form_pg_trigger) GETSTRUCT(tuple);
@@ -1048,7 +1048,7 @@ RemoveTriggerById(Oid trigOid)
ObjectIdGetDatum(trigOid));
tgscan = systable_beginscan(tgrel, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(tgscan);
if (!HeapTupleIsValid(tup))
@@ -1127,7 +1127,7 @@ get_trigger_oid(Oid relid, const char *trigname, bool missing_ok)
CStringGetDatum(trigname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
tup = systable_getnext(tgscan);
@@ -1242,7 +1242,7 @@ renametrig(RenameStmt *stmt)
BTEqualStrategyNumber, F_NAMEEQ,
PointerGetDatum(stmt->newname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
@@ -1262,7 +1262,7 @@ renametrig(RenameStmt *stmt)
BTEqualStrategyNumber, F_NAMEEQ,
PointerGetDatum(stmt->subname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
{
tgoid = HeapTupleGetOid(tuple);
@@ -1359,7 +1359,7 @@ EnableDisableTrigger(Relation rel, const char *tgname,
nkeys = 1;
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, nkeys, keys);
+ NULL, nkeys, keys);
found = changed = false;
@@ -1468,7 +1468,7 @@ RelationBuildTriggers(Relation relation)
tgrel = heap_open(TriggerRelationId, AccessShareLock);
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
{
@@ -4270,7 +4270,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
ObjectIdGetDatum(namespaceId));
conscan = systable_beginscan(conrel, ConstraintNameNspIndexId,
- true, SnapshotNow, 2, skey);
+ true, NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -4333,7 +4333,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
ObjectIdGetDatum(conoid));
tgscan = systable_beginscan(tgrel, TriggerConstraintIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
{
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index 57b69f8..61ebc2e 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -921,7 +921,7 @@ makeConfigurationDependencies(HeapTuple tuple, bool removeOld,
ObjectIdGetDatum(myself.objectId));
scan = systable_beginscan(mapRel, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1059,7 +1059,7 @@ DefineTSConfiguration(List *names, List *parameters)
ObjectIdGetDatum(sourceOid));
scan = systable_beginscan(mapRel, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1138,7 +1138,7 @@ RemoveTSConfigurationById(Oid cfgId)
ObjectIdGetDatum(cfgId));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -1294,7 +1294,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
Int32GetDatum(tokens[i]));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1333,7 +1333,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
ObjectIdGetDatum(cfgId));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1450,7 +1450,7 @@ DropConfigurationMapping(AlterTSConfigurationStmt *stmt,
Int32GetDatum(tokens[i]));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 6bc16f1..d173004 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -2258,7 +2258,7 @@ AlterDomainNotNull(List *names, bool notNull)
HeapTuple tuple;
/* Scan all tuples in this relation */
- scan = heap_beginscan(testrel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(testrel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
int i;
@@ -2356,7 +2356,7 @@ AlterDomainDropConstraint(List *names, const char *constrName,
ObjectIdGetDatum(HeapTupleGetOid(tup)));
conscan = systable_beginscan(conrel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/*
* Scan over the result set, removing any matching entries.
@@ -2551,7 +2551,7 @@ AlterDomainValidateConstraint(List *names, char *constrName)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(domainoid));
scan = systable_beginscan(conrel, ConstraintTypidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -2640,7 +2640,7 @@ validateDomainConstraint(Oid domainoid, char *ccbin)
HeapTuple tuple;
/* Scan all tuples in this relation */
- scan = heap_beginscan(testrel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(testrel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
int i;
@@ -2751,7 +2751,7 @@ get_rels_with_domain(Oid domainOid, LOCKMODE lockmode)
ObjectIdGetDatum(domainOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -3066,7 +3066,7 @@ GetDomainConstraints(Oid typeOid)
ObjectIdGetDatum(typeOid));
scan = systable_beginscan(conRel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(conTup = systable_getnext(scan)))
{
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 844f25c..e101a86 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -1006,7 +1006,7 @@ DropRole(DropRoleStmt *stmt)
ObjectIdGetDatum(roleid));
sscan = systable_beginscan(pg_auth_members_rel, AuthMemRoleMemIndexId,
- true, SnapshotNow, 1, &scankey);
+ true, NULL, 1, &scankey);
while (HeapTupleIsValid(tmp_tuple = systable_getnext(sscan)))
{
@@ -1021,7 +1021,7 @@ DropRole(DropRoleStmt *stmt)
ObjectIdGetDatum(roleid));
sscan = systable_beginscan(pg_auth_members_rel, AuthMemMemRoleIndexId,
- true, SnapshotNow, 1, &scankey);
+ true, NULL, 1, &scankey);
while (HeapTupleIsValid(tmp_tuple = systable_getnext(sscan)))
{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 641c740..2e26127 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -351,7 +351,7 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
pgclass = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(pgclass, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(pgclass, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -735,7 +735,7 @@ vac_update_datfrozenxid(void)
relation = heap_open(RelationRelationId, AccessShareLock);
scan = systable_beginscan(relation, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while ((classTup = systable_getnext(scan)) != NULL)
{
@@ -852,7 +852,7 @@ vac_truncate_clog(TransactionId frozenXID, MultiXactId frozenMulti)
*/
relation = heap_open(DatabaseRelationId, AccessShareLock);
- scan = heap_beginscan(relation, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(relation, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index cd88061..dd2359a 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1855,7 +1855,7 @@ get_database_list(void)
(void) GetTransactionSnapshot();
rel = heap_open(DatabaseRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
@@ -2002,7 +2002,7 @@ do_autovacuum(void)
* wide tables there might be proportionally much more activity in the
* TOAST table than in its parent.
*/
- relScan = heap_beginscan(classRel, SnapshotNow, 0, NULL);
+ relScan = heap_beginscan_instant(classRel, 0, NULL);
/*
* On the first pass, we collect main tables to vacuum, and also the main
@@ -2120,7 +2120,7 @@ do_autovacuum(void)
BTEqualStrategyNumber, F_CHAREQ,
CharGetDatum(RELKIND_TOASTVALUE));
- relScan = heap_beginscan(classRel, SnapshotNow, 1, &key);
+ relScan = heap_beginscan_instant(classRel, 1, &key);
while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
{
Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ac20dff..4451ec5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1109,7 +1109,7 @@ pgstat_collect_oids(Oid catalogid)
HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
rel = heap_open(catalogid, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(rel, 0, NULL);
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid thisoid = HeapTupleGetOid(tup);
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index fb57621..8d1d255 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -419,7 +419,7 @@ DefineQueryRewrite(char *rulename,
{
HeapScanDesc scanDesc;
- scanDesc = heap_beginscan(event_relation, SnapshotNow, 0, NULL);
+ scanDesc = heap_beginscan_instant(event_relation, 0, NULL);
if (heap_getnext(scanDesc, ForwardScanDirection) != NULL)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 01875fc..874535f 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -2090,8 +2090,8 @@ relation_is_updatable(Oid reloid, int req_events)
/*
* If the relation doesn't exist, say "false" rather than throwing an
* error. This is helpful since scanning an information_schema view under
- * MVCC rules can result in referencing rels that were just deleted
- * according to a SnapshotNow probe.
+ * MVCC rules can result in referencing rels that have actually been
+ * deleted already.
*/
if (rel == NULL)
return false;
diff --git a/src/backend/rewrite/rewriteRemove.c b/src/backend/rewrite/rewriteRemove.c
index 75fc776..51e27cf 100644
--- a/src/backend/rewrite/rewriteRemove.c
+++ b/src/backend/rewrite/rewriteRemove.c
@@ -58,7 +58,7 @@ RemoveRewriteRuleById(Oid ruleOid)
ObjectIdGetDatum(ruleOid));
rcscan = systable_beginscan(RewriteRelation, RewriteOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(rcscan);
diff --git a/src/backend/rewrite/rewriteSupport.c b/src/backend/rewrite/rewriteSupport.c
index f481c53..e24ae44 100644
--- a/src/backend/rewrite/rewriteSupport.c
+++ b/src/backend/rewrite/rewriteSupport.c
@@ -143,7 +143,7 @@ get_rewrite_oid_without_relid(const char *rulename,
CStringGetDatum(rulename));
RewriteRelation = heap_open(RewriteRelationId, AccessShareLock);
- scanDesc = heap_beginscan(RewriteRelation, SnapshotNow, 1, &scanKeyData);
+ scanDesc = heap_beginscan_instant(RewriteRelation, 1, &scanKeyData);
htup = heap_getnext(scanDesc, ForwardScanDirection);
if (!HeapTupleIsValid(htup))
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index b98110c..fb91571 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -250,7 +250,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
if (flags & INV_WRITE)
{
- retval->snapshot = SnapshotNow;
+ retval->snapshot = NULL; /* instantaneous MVCC snapshot */
retval->flags = IFS_WRLOCK | IFS_RDLOCK;
}
else if (flags & INV_READ)
@@ -270,7 +270,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
errmsg("invalid flags for opening a large object: %d",
flags)));
- /* Can't use LargeObjectExists here because it always uses SnapshotNow */
+ /* Can't use LargeObjectExists here because we need to specify snapshot */
if (!myLargeObjectExists(lobjId, retval->snapshot))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
@@ -288,9 +288,8 @@ inv_close(LargeObjectDesc *obj_desc)
{
Assert(PointerIsValid(obj_desc));
- if (obj_desc->snapshot != SnapshotNow)
- UnregisterSnapshotFromOwner(obj_desc->snapshot,
- TopTransactionResourceOwner);
+ UnregisterSnapshotFromOwner(obj_desc->snapshot,
+ TopTransactionResourceOwner);
pfree(obj_desc);
}
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..5ddeffe 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -697,7 +697,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
* That leads to a couple of choices. We work from the pg_class row alone
* rather than actually opening each relation, for efficiency. We don't
* fail if we can't find the relation --- some rows might be visible in
- * the query's MVCC snapshot but already dead according to SnapshotNow.
+ * the query's MVCC snapshot even though the relations have been dropped.
* (Note: we could avoid using the catcache, but there's little point
* because the relation mapper also works "in the now".) We also don't
* fail if the relation doesn't have storage. In all these cases it
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 0d1ff61..fa61f5a 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -104,7 +104,7 @@ regprocin(PG_FUNCTION_ARGS)
hdesc = heap_open(ProcedureRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, ProcedureNameArgsNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
{
@@ -472,7 +472,7 @@ regoperin(PG_FUNCTION_ARGS)
hdesc = heap_open(OperatorRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, OperatorNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
{
@@ -843,7 +843,7 @@ regclassin(PG_FUNCTION_ARGS)
hdesc = heap_open(RelationRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, ClassNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
result = HeapTupleGetOid(tuple);
@@ -1007,7 +1007,7 @@ regtypein(PG_FUNCTION_ARGS)
hdesc = heap_open(TypeRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, TypeNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
result = HeapTupleGetOid(tuple);
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index a1ed781..cf9ce3f 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -704,7 +704,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
ObjectIdGetDatum(trigid));
tgscan = systable_beginscan(tgrel, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
ht_trig = systable_getnext(tgscan);
@@ -1796,7 +1796,7 @@ pg_get_serial_sequence(PG_FUNCTION_ARGS)
Int32GetDatum(attnum));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index cc91406..d12da76 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1182,7 +1182,7 @@ SearchCatCache(CatCache *cache,
scandesc = systable_beginscan(relation,
cache->cc_indexoid,
IndexScanOK(cache, cur_skey),
- SnapshotNow,
+ NULL,
cache->cc_nkeys,
cur_skey);
@@ -1461,7 +1461,7 @@ SearchCatCacheList(CatCache *cache,
scandesc = systable_beginscan(relation,
cache->cc_indexoid,
IndexScanOK(cache, cur_skey),
- SnapshotNow,
+ NULL,
nkeys,
cur_skey);
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index e0dc126..675bd94 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -9,8 +9,8 @@
* consider that it is *still valid* so long as we are in the same command,
* ie, until the next CommandCounterIncrement() or transaction commit.
* (See utils/time/tqual.c, and note that system catalogs are generally
- * scanned under SnapshotNow rules by the system, or plain user snapshots
- * for user queries.) At the command boundary, the old tuple stops
+ * scanned under the most current snapshot available, rather than the
+ * transaction snapshot.) At the command boundary, the old tuple stops
* being valid and the new version, if any, becomes valid. Therefore,
* we cannot simply flush a tuple from the system caches during heap_update()
* or heap_delete(). The tuple is still good at that point; what's more,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index f114038..5a2e755 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -266,7 +266,8 @@ static void unlink_initfile(const char *initfilename);
* tuple matching targetRelId. The caller must hold at least
* AccessShareLock on the target relid to prevent concurrent-update
* scenarios --- else our SnapshotNow scan might fail to find any
- * version that it thinks is live.
+ * version that it thinks is live. XXX: Now that we have MVCC
+ * catalog access, this hazard no longer exists.
*
* NB: the returned tuple has been copied into palloc'd storage
* and must eventually be freed with heap_freetuple.
@@ -305,7 +306,7 @@ ScanPgRelation(Oid targetRelId, bool indexOK)
pg_class_desc = heap_open(RelationRelationId, AccessShareLock);
pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
indexOK && criticalRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
pg_class_tuple = systable_getnext(pg_class_scan);
@@ -480,7 +481,7 @@ RelationBuildTupleDesc(Relation relation)
pg_attribute_scan = systable_beginscan(pg_attribute_desc,
AttributeRelidNumIndexId,
criticalRelcachesBuilt,
- SnapshotNow,
+ NULL,
2, skey);
/*
@@ -663,7 +664,7 @@ RelationBuildRuleLock(Relation relation)
rewrite_tupdesc = RelationGetDescr(rewrite_desc);
rewrite_scan = systable_beginscan(rewrite_desc,
RewriteRelRulenameIndexId,
- true, SnapshotNow,
+ true, NULL,
1, &key);
while (HeapTupleIsValid(rewrite_tuple = systable_getnext(rewrite_scan)))
@@ -1313,7 +1314,7 @@ LookupOpclassInfo(Oid operatorClassOid,
ObjectIdGetDatum(operatorClassOid));
rel = heap_open(OperatorClassRelationId, AccessShareLock);
scan = systable_beginscan(rel, OpclassOidIndexId, indexOK,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(htup = systable_getnext(scan)))
{
@@ -1348,7 +1349,7 @@ LookupOpclassInfo(Oid operatorClassOid,
ObjectIdGetDatum(opcentry->opcintype));
rel = heap_open(AccessMethodProcedureRelationId, AccessShareLock);
scan = systable_beginscan(rel, AccessMethodProcedureIndexId, indexOK,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
while (HeapTupleIsValid(htup = systable_getnext(scan)))
{
@@ -3317,7 +3318,7 @@ AttrDefaultFetch(Relation relation)
adrel = heap_open(AttrDefaultRelationId, AccessShareLock);
adscan = systable_beginscan(adrel, AttrDefaultIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
found = 0;
while (HeapTupleIsValid(htup = systable_getnext(adscan)))
@@ -3384,7 +3385,7 @@ CheckConstraintFetch(Relation relation)
conrel = heap_open(ConstraintRelationId, AccessShareLock);
conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(htup = systable_getnext(conscan)))
{
@@ -3487,7 +3488,7 @@ RelationGetIndexList(Relation relation)
indrel = heap_open(IndexRelationId, AccessShareLock);
indscan = systable_beginscan(indrel, IndexIndrelidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(indscan)))
{
@@ -3938,7 +3939,7 @@ RelationGetExclusionInfo(Relation indexRelation,
conrel = heap_open(ConstraintRelationId, AccessShareLock);
conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
found = false;
while (HeapTupleIsValid(htup = systable_getnext(conscan)))
diff --git a/src/backend/utils/cache/ts_cache.c b/src/backend/utils/cache/ts_cache.c
index 65a8ad7..4e79247 100644
--- a/src/backend/utils/cache/ts_cache.c
+++ b/src/backend/utils/cache/ts_cache.c
@@ -484,7 +484,7 @@ lookup_ts_config_cache(Oid cfgId)
maprel = heap_open(TSConfigMapRelationId, AccessShareLock);
mapidx = index_open(TSConfigMapIndexId, AccessShareLock);
mapscan = systable_beginscan_ordered(maprel, mapidx,
- SnapshotNow, 1, &mapskey);
+ NULL, 1, &mapskey);
while ((maptup = systable_getnext_ordered(mapscan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e0abff1..2a2da12 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -111,7 +111,7 @@ GetDatabaseTuple(const char *dbname)
relation = heap_open(DatabaseRelationId, AccessShareLock);
scan = systable_beginscan(relation, DatabaseNameIndexId,
criticalSharedRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
tuple = systable_getnext(scan);
@@ -154,7 +154,7 @@ GetDatabaseTupleByOid(Oid dboid)
relation = heap_open(DatabaseRelationId, AccessShareLock);
scan = systable_beginscan(relation, DatabaseOidIndexId,
criticalSharedRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
tuple = systable_getnext(scan);
@@ -1083,7 +1083,7 @@ ThereIsAtLeastOneRole(void)
pg_authid_rel = heap_open(AuthIdRelationId, AccessShareLock);
- scan = heap_beginscan(pg_authid_rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_instant(pg_authid_rel, 0, NULL);
result = (heap_getnext(scan, ForwardScanDirection) != NULL);
heap_endscan(scan);
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e739d2d..957c0c4 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -207,6 +207,19 @@ GetLatestSnapshot(void)
}
/*
+ * GetInstantSnapshot
+ * Get a snapshot that is up-to-date as of the current instant,
+ * but don't set the transaction snapshot.
+ */
+Snapshot
+GetInstantSnapshot(void)
+{
+ SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);
+
+ return SecondarySnapshot;
+}
+
+/*
* SnapshotSetCommandId
* Propagate CommandCounterIncrement into the static snapshots, if set
*/
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ec956ad..984251f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -14,13 +14,13 @@
* Note that pg_dump runs in a transaction-snapshot mode transaction,
* so it sees a consistent snapshot of the database including system
* catalogs. However, it relies in part on various specialized backend
- * functions like pg_get_indexdef(), and those things tend to run on
- * SnapshotNow time, ie they look at the currently committed state. So
- * it is possible to get 'cache lookup failed' error if someone
- * performs DDL changes while a dump is happening. The window for this
- * sort of thing is from the acquisition of the transaction snapshot to
- * getSchemaData() (when pg_dump acquires AccessShareLock on every
- * table it intends to dump). It isn't very large, but it can happen.
+ * functions like pg_get_indexdef(), and those things tend to look at
+ * the currently committed state. So it is possible to get 'cache
+ * lookup failed' error if someone performs DDL changes while a dump is
+ * happening. The window for this sort of thing is from the acquisition
+ * of the transaction snapshot to getSchemaData() (when pg_dump acquires
+ * AccessShareLock on every table it intends to dump). It isn't very large,
+ * but it can happen.
*
* http://archives.postgresql.org/pgsql-bugs/2010-02/msg00187.php
*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index baa8c50..a231fc0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -105,6 +105,8 @@ typedef struct HeapScanDescData *HeapScanDesc;
extern HeapScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
+extern HeapScanDesc heap_beginscan_instant(Relation relation, int nkeys,
+ ScanKey key);
extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 5b58028..3a86ca4 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -32,6 +32,7 @@ typedef struct HeapScanDescData
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
+ bool rs_temp_snap; /* unregister snapshot at scan end? */
/* state set up at initscan time */
BlockNumber rs_nblocks; /* number of blocks to scan */
@@ -101,6 +102,7 @@ typedef struct SysScanDescData
Relation irel; /* NULL if doing heap scan */
HeapScanDesc scan; /* only valid in heap-scan case */
IndexScanDesc iscan; /* only valid in index-scan case */
+ Snapshot snapshot; /* snapshot to unregister at end of scan */
} SysScanDescData;
#endif /* RELSCAN_H */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index bfbd8dd..7da58c8 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -26,6 +26,7 @@ extern TransactionId RecentGlobalXmin;
extern Snapshot GetTransactionSnapshot(void);
extern Snapshot GetLatestSnapshot(void);
+extern Snapshot GetInstantSnapshot(void);
extern void SnapshotSetCommandId(CommandId curcid);
extern void PushActiveSnapshot(Snapshot snapshot);
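The implementation of heap_beginscan_instant doesn't appear in the hunks above, and the converted call sites now pass NULL to systable_beginscan where they used to pass SnapshotNow. A minimal sketch of the missing piece, inferred from the new rs_temp_snap flag, the snapshot field added to SysScanDescData, and GetInstantSnapshot() (this is a guess, not an excerpt from the patch):

HeapScanDesc
heap_beginscan_instant(Relation relation, int nkeys, ScanKey key)
{
    /* take a fresh MVCC snapshot and register it for the scan's lifetime */
    Snapshot    snapshot = RegisterSnapshot(GetInstantSnapshot());
    HeapScanDesc scan = heap_beginscan(relation, snapshot, nkeys, key);

    scan->rs_temp_snap = true;  /* heap_endscan unregisters the snapshot */

    return scan;
}

Presumably systable_beginscan handles snapshot == NULL the same way: register an instant snapshot, remember it in SysScanDescData.snapshot, and unregister it again in systable_endscan.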
On Tue, Jun 4, 2013 at 3:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
+1.
Here's a more serious patch for MVCC catalog access. This one
involves more data copying than the last one, I think, because the
previous version did not register the snapshots it took, which I think
is not safe. So this needs to be re-tested for performance, which I
have so far made no attempt to do.
It strikes me as rather unfortunate that the snapshot interface is
designed in such a way as to require so much data copying. It seems
we always take a snapshot by copying from PGXACT/PGPROC into
CurrentSnapshotData or SecondarySnapshotData, and then copying data a
second time from there to someplace more permanent. It would be nice
to avoid that, at least in common cases.
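To make the two copies concrete, here is roughly the path involved today, paraphrased from the existing snapmgr.c (details such as error checks elided):

Snapshot
GetLatestSnapshot(void)
{
    /* copy #1: GetSnapshotData scans the PGXACT array and fills in the
     * static SecondarySnapshotData, including its xip[] array */
    SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);

    return SecondarySnapshot;
}

Snapshot
RegisterSnapshot(Snapshot snapshot)
{
    Snapshot    snap;

    /* copy #2: a snapshot living in static storage must be copied into
     * palloc'd memory before it can be registered, because the static
     * area is overwritten by the next GetSnapshotData call */
    snap = snapshot->copied ? snapshot : CopySnapshot(snapshot);

    ResourceOwnerEnlargeSnapshots(CurTransactionResourceOwner);
    snap->regd_count++;
    ResourceOwnerRememberSnapshot(CurTransactionResourceOwner, snap);

    return snap;
}

Skipping the second copy in common cases would presumably mean letting callers borrow the static snapshot for short-lived scans, which is more or less what the earlier, unregistered version of the patch did.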
And here are more results comparing the master branch with and without this
patch...
1) DDL CREATE/DROP test:
1-1) master:
250 connections:
Create: 24846.060, Drop: 30391.713
Create: 23771.394, Drop: 29769.396
500 connections:
Create: 24339.449, Drop: 30084.741
Create: 24152.176, Drop: 30643.471
1000 connections:
Create: 26007.960, Drop: 31019.918
Create: 25937.592, Drop: 30600.341
2000 connections:
Create: 26900.324, Drop: 30741.989
Create: 26910.660, Drop: 31577.247
1-2) mvcc catalogs:
250 connections:
Create: 25371.342, Drop: 31458.952
Create: 25685.094, Drop: 31492.377
500 connections:
Create: 28557.882, Drop: 33673.266
Create: 27901.910, Drop: 33223.006
1000 connections:
Create: 31910.130, Drop: 36770.062
Create: 32210.093, Drop: 36754.888
2000 connections:
Create: 40374.754, Drop: 43442.528
Create: 39763.691, Drop: 43234.243
2) backend startup
2-1) master branch:
250 connections:
real 0m8.993s
user 0m0.128s
sys 0m0.380s
500 connections:
real 0m9.004s
user 0m0.212s
sys 0m0.340s
1000 connections:
real 0m9.072s
user 0m0.272s
sys 0m0.332s
2000 connections:
real 0m9.257s
user 0m0.204s
sys 0m0.392s
2-2) MVCC catalogs:
250 connections:
real 0m9.067s
user 0m0.108s
sys 0m0.396s
500 connections:
real 0m9.034s
user 0m0.112s
sys 0m0.376s
1000 connections:
real 0m9.303s
user 0m0.176s
sys 0m0.328s
2000 connections
real 0m9.916s
user 0m0.160s
sys 0m0.428s
Except for the backend startup test with 500 connections, which looks to
have some noise, performance degradation reaches 6% for 2000 connections,
and less than 1% for 250 connections. This is better than last time.
For the CREATE/DROP case, performance drop reaches 40% for 2000 connections
(32% during last tests). I also noticed a lower performance drop for 250
connections now (3~4%) compared to the 1st time (9%).
I compiled the main results on tables here:
http://michael.otacoo.com/postgresql-2/postgres-9-4-devel-mvcc-catalog-access-take-2-2/
The results of last time are also available here:
http://michael.otacoo.com/postgresql-2/postgres-9-4-devel-mvcc-catalog-access-2/
Regards,
--
Michael
On Wed, May 22, 2013 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.
I was always under the impression that the problem was we weren't
quite sure what changes would be needed to make mvcc-snapshots work
for the catalog lookups. The semantics of SnapshotNow aren't terribly
clear either but we have years of experience telling us they seem to
basically work. Most of the problems we've run into we have
worked around in the catalog accesses. Nobody really knows how many of
the call sites will need different logic to behave properly with mvcc
snapshots.
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones. If you clean those all up it will probably be cleaner and
better but we don't know how many such sites will need to be modified.
I'm not even sure what "clean them up" means. You can replace checks
with things like constraints and locks but the implementation of
constraints and locks will still need to use SnapshotNow surely?
--
greg
On 06/05/2013 04:28 PM, Greg Stark wrote:
On Wed, May 22, 2013 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.
I was always under the impression that the problem was we weren't
quite sure what changes would be needed to make mvcc-snapshots work
for the catalog lookups. The semantics of SnapshotNow aren't terribly
clear either but we have years of experience telling us they seem to
basically work. Most of the problems we've run into we have
worked around in the catalog accesses. Nobody really knows how many of
the call sites will need different logic to behave properly with mvcc
snapshots.
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones. If you clean those all up it will probably be cleaner and
better but we don't know how many such sites will need to be modified.
I'm not even sure what "clean them up" means. You can replace checks
with things like constraints and locks but the implementation of
constraints and locks will still need to use SnapshotNow surely?
I guess that anything that does *not* write should be happier with an
MVCC catalogue, especially if there has been any DDL after its snapshot.
For writers, the ability to compare MVCC and SnapshotNow snapshots
would tell if they need to take extra steps.
But undoubtedly the whole thing would be a lot of work.
--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
On Wed, May 22, 2013 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones.
But SnapshotNow doesn't allow you to do actual dirty reads? It only
gives you rows back that were actually visible when we checked. The
difference to SnapshotMVCC is that during a scan the picture of which
transactions are committed can change.
I'm not even sure what "clean them up" means. You can replace checks
with things like constraints and locks but the implementation of
constraints and locks will still need to use SnapshotNow surely?
The places that require this should already use HeapTupleSatisfiesDirty
which is something different.
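For reference, that pattern looks roughly like the following
backend-style sketch (not a quote from any particular call site; real
callers normally end the scan before waiting and then retry):

#include "postgres.h"
#include "access/genam.h"
#include "storage/lmgr.h"
#include "utils/tqual.h"

/* Backend-style sketch of a race-sensitive lookup with a dirty snapshot. */
static void
dirty_lookup_sketch(Relation rel, Oid indexId, ScanKey skey)
{
	SnapshotData DirtySnapshot;
	SysScanDesc sscan;
	HeapTuple	tup;

	InitDirtySnapshot(DirtySnapshot);	/* uses HeapTupleSatisfiesDirty */

	sscan = systable_beginscan(rel, indexId, true, &DirtySnapshot, 1, skey);
	while (HeapTupleIsValid(tup = systable_getnext(sscan)))
	{
		/*
		 * Unlike an MVCC snapshot, this also returns uncommitted tuples;
		 * the snapshot reports an in-progress inserter in xmin so the
		 * caller can decide to wait for that transaction to finish.
		 */
		if (TransactionIdIsValid(DirtySnapshot.xmin))
			XactLockTableWait(DirtySnapshot.xmin);
	}
	systable_endscan(sscan);
}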
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones.
But SnapshotNow doesn't allow you to do actual dirty reads?
Yeah. I believe the issue is that we can't simply do MVCC catalog reads
with a snapshot taken at transaction start time or statement start time,
as we would do if executing an MVCC scan for a user query. Rather, the
snapshot has to be recent enough to ensure we see the current definition
of any table we've just acquired lock on, *even if that's newer than the
snapshot prevailing for the user's purposes*. Otherwise we might be
using the wrong rowtype definition or failing to enforce a just-added
constraint.
The last time we talked about this, we were batting around ideas of
keeping a "current snapshot for catalog purposes", which we'd update
or at least invalidate anytime we acquired a new lock. (In principle,
if that isn't new enough, we have a race condition that we'd better fix
by adding some more locking.) Robert's results seem to say that that
might be unnecessary optimization, and that it'd be sufficient to just
take a new snap each time we need to do a catalog scan. TBH I'm not
sure I believe that; it seems to me that this approach is surely going
to create a great deal more contention from concurrent GetSnapshotData
calls. But at the very least, this says we can experiment with the
behavioral aspects without bothering to build infrastructure for
tracking an appropriate catalog snapshot.
regards, tom lane
On 2013-06-05 11:35:58 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones.
But SnapshotNow doesn't allow you to do actual dirty reads?
Yeah. I believe the issue is that we can't simply do MVCC catalog reads
with a snapshot taken at transaction start time or statement start time,
as we would do if executing an MVCC scan for a user query. Rather, the
snapshot has to be recent enough to ensure we see the current definition
of any table we've just acquired lock on, *even if that's newer than the
snapshot prevailing for the user's purposes*. Otherwise we might be
using the wrong rowtype definition or failing to enforce a just-added
constraint.
Oh, definitely. At least Robert's previous prototype tried to do that
(although I am not sure if it went far enough). And I'd be surprised if
the current one didn't do so.
The last time we talked about this, we were batting around ideas of
keeping a "current snapshot for catalog purposes", which we'd update
or at least invalidate anytime we acquired a new lock. (In principle,
if that isn't new enough, we have a race condition that we'd better fix
by adding some more locking.) Robert's results seem to say that that
might be unnecessary optimization, and that it'd be sufficient to just
take a new snap each time we need to do a catalog scan. TBH I'm not
sure I believe that; it seems to me that this approach is surely going
to create a great deal more contention from concurrent GetSnapshotData
calls.
I still have a hard time believing those results as well, but I think we
might have underestimated the effectiveness of the syscache during
workloads which are sufficiently concurrent to make locking in
GetSnapshotData() a problem.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 5, 2013 at 10:28 AM, Greg Stark <stark@mit.edu> wrote:
On Wed, May 22, 2013 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.
I was always under the impression that the problem was we weren't
quite sure what changes would be needed to make mvcc-snapshots work
for the catalog lookups. The semantics of SnapshotNow aren't terribly
clear either but we have years of experience telling us they seem to
basically work. Most of the problems we've run into we have
worked around in the catalog accesses. Nobody really knows how many of
the call sites will need different logic to behave properly with mvcc
snapshots.
With all respect, I think this is just plain wrong. SnapshotNow is
just like an up-to-date MVCC snapshot. The only difference is that an
MVCC snapshot, once established, stays fixed for the lifetime of the
scan. On the other hand, the SnapshotNow view of the world changes
the instant another transaction commits, meaning that scans can see
multiple versions of a row, or no versions of a row, where any MVCC
scan would have seen just one. There are very few places that want
that behavior.
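To see the anomaly in miniature, here is a standalone toy (deliberately
not backend code) that checks each tuple against the commit state at
the moment it is examined, the way SnapshotNow effectively does. Assume
xid 100 replaces the old version of a row with a new one and commits
mid-scan:

#include <stdbool.h>
#include <stdio.h>

/* Toy tuple: xid 100 stands in for the updating transaction; 0 = none. */
typedef struct
{
	int			xmin;		/* xid that created this version */
	int			xmax;		/* xid that deleted it, or 0 */
} ToyTuple;

static bool updater_committed;	/* commit state consulted at check time */

/* SnapshotNow-style check: consult the current commit state per tuple. */
static bool
visible_now(const ToyTuple *t)
{
	bool		inserted = (t->xmin == 0) || updater_committed;
	bool		deleted = (t->xmax != 0) && updater_committed;

	return inserted && !deleted;
}

int
main(void)
{
	ToyTuple	old_version = {0, 100};		/* deleted by xid 100 */
	ToyTuple	new_version = {100, 0};		/* created by xid 100 */

	/* The scan visits the old version, then xid 100 commits, then the
	 * scan reaches the new version: both versions pass the check. */
	updater_committed = false;
	printf("old version visible: %d\n", visible_now(&old_version)); /* 1 */
	updater_committed = true;
	printf("new version visible: %d\n", visible_now(&new_version)); /* 1 */

	/* With the opposite visit order, the scan would see neither. */
	return 0;
}

A frozen MVCC snapshot answers both checks from the same picture of
committed transactions, so exactly one version qualifies.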
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate. Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.
But in most other parts of the code, the changes-in-mid-scan behavior
of SnapshotNow is a huge liability. The fact that it is fully
up-to-date *as of the time the scan starts* is critical for
correctness. But the fact that it can then change during the scan is
in almost every case something that we do not want. The patch
preserves the first property while ditching the second one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate. Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.
FWIW, I think if we're going to ditch SnapshotNow we should ditch
SnapshotNow, full stop, even removing the tqual.c routines for it.
Then we can require that *any* reference to SnapshotNow is replaced by
an MVCC reference prior to execution, and throw an error if we actually
try to test a tuple with that snapshot. If we don't do it like that
I think we'll have errors of omission all over the place (I had really
no confidence in your original patch because of that worry). The fact
that there are a couple of contrib modules for which there might be an
arguable advantage in doing it the old way isn't sufficient reason to
expose ourselves to bugs like that. If they really want that sort of
uncertain semantics they could use SnapshotDirty, no?
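A minimal sketch of that enforcement, assuming we keep a stub in
tqual.c at all rather than deleting the function outright:

/*
 * tqual.c sketch: any attempt to actually test a tuple against
 * SnapshotNow becomes a hard error, so references that slip through
 * the conversion are caught at runtime instead of misbehaving silently.
 */
bool
HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
	elog(ERROR, "SnapshotNow may no longer be used to test tuple visibility");
	return false;				/* keep the compiler quiet */
}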
regards, tom lane
On Wed, Jun 5, 2013 at 6:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate. Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.
FWIW, I think if we're going to ditch SnapshotNow we should ditch
SnapshotNow, full stop, even removing the tqual.c routines for it.
Then we can require that *any* reference to SnapshotNow is replaced by
an MVCC reference prior to execution, and throw an error if we actually
try to test a tuple with that snapshot. If we don't do it like that
I think we'll have errors of omission all over the place (I had really
no confidence in your original patch because of that worry). The fact
that there are a couple of contrib modules for which there might be an
arguable advantage in doing it the old way isn't sufficient reason to
expose ourselves to bugs like that. If they really want that sort of
uncertain semantics they could use SnapshotDirty, no?
I had the same thought, initially. I went through the exercise of
doing a grep for SnapshotNow and trying to eliminate as many
references as possible, but there were a few that I couldn't convince
myself to rip out. However, if you'd like to apply the patch and grep
for SnapshotNow and suggest what to do about the remaining cases (or
hack the patch up yourself) I think that would be great. I'd love to
see it completely gone if we can see our way clear to that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-06-05 18:56:28 -0400, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate. Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.
FWIW, I think if we're going to ditch SnapshotNow we should ditch
SnapshotNow, full stop, even removing the tqual.c routines for it.
Then we can require that *any* reference to SnapshotNow is replaced by
an MVCC reference prior to execution, and throw an error if we actually
try to test a tuple with that snapshot.
I suggest simply renaming it to something else. Every external project
that decides to follow the renaming either has a proper use case for it,
or the amount of sympathy for them keeping the old behaviour is rather
limited.
If they really want that sort of uncertain semantics they could use
SnapshotDirty, no?
Not that happy with that. A scan behaving inconsistently over its
course is something rather different from reading uncommitted
rows. I have the feeling that trouble is waiting that way.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Robert,
Took a quick look through the patch to understand what your current
revision is actually doing and to facilitate thinking about possible
pain points.
Here are the notes I made during my reading:
On 2013-06-03 14:57:12 -0400, Robert Haas wrote:
+++ b/src/backend/catalog/catalog.c
@@ -232,6 +232,10 @@ IsReservedName(const char *name)
 * know if it's shared. Fortunately, the set of shared relations is
 * fairly static, so a hand-maintained list of their OIDs isn't completely
 * impractical.
+ *
+ * XXX: Now that we have MVCC catalog access, the reasoning above is no longer
+ * true. Are there other good reasons to hard-code this, or should we revisit
+ * that decision?
 */
We could just implement the function by looking in the shared
relmapper. Everything that can be mapped via it is shared.
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 * against concurrent SnapshotNow scans of pg_index. Therefore this is unsafe
 * to execute with less than full exclusive lock on the parent table;
 * otherwise concurrent executions of RelationGetIndexList could miss indexes.
+ *
+ * XXX: Now that we have MVCC catalog access, SnapshotNow scans of pg_index
+ * shouldn't be common enough to worry about. The above comment needs
+ * to be updated, and it may be possible to simplify the logic here in other
+ * ways also.
 */
You're right, the comment needs to be changed, but I don't think the
effect can. A non-inplace update changes the xmin of the row, which is
relevant for indcheckxmin.
(In fact, isn't this update possibly causing problems like delaying the
use of such an index already?)
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2738,7 +2738,7 @@ AlterTableGetLockLevel(List *cmds)
 * multiple DDL operations occur in a stream against frequently accessed
 * tables.
 *
- * 1. Catalog tables are read using SnapshotNow, which has a race bug that
+ * 1. Catalog tables were read using SnapshotNow, which has a race bug that
Heh.
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -207,6 +207,19 @@ GetLatestSnapshot(void)
 /*
+ * GetInstantSnapshot
+ *		Get a snapshot that is up-to-date as of the current instant,
+ *		but don't set the transaction snapshot.
+ */
+Snapshot
+GetInstantSnapshot(void)
+{
+	SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);
+
+	return SecondarySnapshot;
+}
Hm. Looks like this should also change the description of SecondarySnapshot:
/*
* CurrentSnapshot points to the only snapshot taken in transaction-snapshot
* mode, and to the latest one taken in a read-committed transaction.
* SecondarySnapshot is a snapshot that's always up-to-date as of the current
* instant, even in transaction-snapshot mode. It should only be used for
* special-purpose code (say, RI checking.)
*
and
/*
* Checking SecondarySnapshot is probably useless here, but it seems
* better to be sure.
*/
Also looks like BuildEventTriggerCache() in evtcache.c should use
GetInstantSnapshot() now.
I actually wonder if we shouldn't just abolish GetLatestSnapshot(). None
of the callers seem to rely on its behaviour from a quick look, and it
seems rather confusing to have both.
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -14,13 +14,13 @@
 * Note that pg_dump runs in a transaction-snapshot mode transaction,
 * so it sees a consistent snapshot of the database including system
 * catalogs. However, it relies in part on various specialized backend
- * functions like pg_get_indexdef(), and those things tend to run on
- * SnapshotNow time, ie they look at the currently committed state. So
- * it is possible to get 'cache lookup failed' error if someone
- * performs DDL changes while a dump is happening. The window for this
- * sort of thing is from the acquisition of the transaction snapshot to
- * getSchemaData() (when pg_dump acquires AccessShareLock on every
- * table it intends to dump). It isn't very large, but it can happen.
+ * functions like pg_get_indexdef(), and those things tend to look at
+ * the currently committed state. So it is possible to get 'cache
+ * lookup failed' error if someone performs DDL changes while a dump is
+ * happening. The window for this sort of thing is from the acquisition
+ * of the transaction snapshot to getSchemaData() (when pg_dump acquires
+ * AccessShareLock on every table it intends to dump). It isn't very large,
+ * but it can happen.
I think we need another term for what SnapshotNow used to express
here... IMO this description got less clear with this change.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 6, 2013 at 5:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
+ * XXX: Now that we have MVCC catalog access, the reasoning above is no longer
+ * true. Are there other good reasons to hard-code this, or should we revisit
+ * that decision?
We could just implement the function by looking in the shared
relmapper. Everything that can be mapped via it is shared.
I suspect there are several possible sources for this information, but
it's hard to beat a hard-coded list for efficiency, so I wasn't sure
if we should tinker with this or not.
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 * against concurrent SnapshotNow scans of pg_index. Therefore this is unsafe
 * to execute with less than full exclusive lock on the parent table;
 * otherwise concurrent executions of RelationGetIndexList could miss indexes.
+ *
+ * XXX: Now that we have MVCC catalog access, SnapshotNow scans of pg_index
+ * shouldn't be common enough to worry about. The above comment needs
+ * to be updated, and it may be possible to simplify the logic here in other
+ * ways also.
 */
You're right, the comment needs to be changed, but I don't think the
effect can. A non-inplace update changes the xmin of the row, which is
relevant for indcheckxmin.
OK.
(In fact, isn't this update possibly causing problems like delaying the
use of such an index already?)
Well, maybe. In general, the ephemeral snapshot taken for a catalog
scan can't be any older than the primary snapshot already held. But
there could be some corner case where that's not true, if we use this
technique somewhere that such a snapshot hasn't already been acquired.
Hm. Looks like this should also change the description of SecondarySnapshot:
/*
* CurrentSnapshot points to the only snapshot taken in transaction-snapshot
* mode, and to the latest one taken in a read-committed transaction.
* SecondarySnapshot is a snapshot that's always up-to-date as of the current
* instant, even in transaction-snapshot mode. It should only be used for
* special-purpose code (say, RI checking.)
*
I think that's still more or less true, though we could add catalog
scans as another example.
and
/*
* Checking SecondarySnapshot is probably useless here, but it seems
* better to be sure.
*/
Yeah, that is not useless any more, for sure.
Also looks like BuildEventTriggerCache() in evtcache.c should use
GetInstantSnapshot() now.
OK.
I actually wonder if we shouldn't just abolish GetLatestSnapshot(). None
of the callers seem to rely on its behaviour from a quick look, and it
seems rather confusing to have both.
I assume Tom had some reason for making GetLatestSnapshot() behave the
way it does, so I refrained from doing that. I might be wrong.
I think we need another term for what SnapshotNow used to express
here... Imo this description got less clear with this change.
I thought it was OK but I'm open to suggestions.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 6/5/13 3:49 PM, Robert Haas wrote:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple.
FWIW, I've often wished for a way to make all stat access transactional,
across all the stats views. Perhaps that couldn't be done by default, but
I'd love something like a function that would make a "snapshot" of all
stats data as of one point. Even if that snapshot itself wasn't completely
atomic, at least then you could query any stats views however you wanted
and know that the info wasn't changing over time.
The reason I don't think this would work so well if done in userspace is
how long it would take. Presumably making a complete backend-local copy
of pg_proc etc. and the stats file would be orders of magnitude faster
than a bunch of CREATE TEMP TABLEs.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Thu, Jun 6, 2013 at 2:48 PM, Jim Nasby <jim@nasby.net> wrote:
On 6/5/13 3:49 PM, Robert Haas wrote:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple.
FWIW, I've often wished for a way to make all stat access transactional,
across all the stats views. Perhaps that couldn't be done by default, but
I'd love something like a function that would make a "snapshot" of all stats
data as of one point. Even if that snapshot itself wasn't completely atomic,
at least then you could query any stats views however you wanted and know
that the info wasn't changing over time.
The reason I don't think this would work so well if done in userspace is how
long it would take. Presumably making a complete backend-local copy of
pg_proc etc and the stats file would be orders of magnitude faster than a
bunch of CREATE TEMP TABLE's.
Well, maybe. But at any rate this is completely unrelated to the main
topic of this thread. :-)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-06-06 12:49:14 -0400, Robert Haas wrote:
On Thu, Jun 6, 2013 at 5:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
+ * XXX: Now that we have MVCC catalog access, the reasoning above is no longer
+ * true. Are there other good reasons to hard-code this, or should we revisit
+ * that decision?
We could just implement the function by looking in the shared
relmapper. Everything that can be mapped via it is shared.
I suspect there are several possible sources for this information, but
it's hard to beat a hard-coded list for efficiency, so I wasn't sure
if we should tinker with this or not.
I can tell from experience that it makes adding a new shared relation
more of a pain than it already is, but we're hopefully not doing that
all that often.
I just don't think that the mvcc angle has much to do with the decision.
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 * against concurrent SnapshotNow scans of pg_index. Therefore this is unsafe
 * to execute with less than full exclusive lock on the parent table;
 * otherwise concurrent executions of RelationGetIndexList could miss indexes.
+ *
+ * XXX: Now that we have MVCC catalog access, SnapshotNow scans of pg_index
+ * shouldn't be common enough to worry about. The above comment needs
+ * to be updated, and it may be possible to simplify the logic here in other
+ * ways also.
 */
You're right, the comment needs to be changed, but I don't think the
effect can. A non-inplace update changes the xmin of the row, which is
relevant for indcheckxmin.
OK.
(In fact, isn't this update possibly causing problems like delaying the
use of such an index already?)
Well, maybe. In general, the ephemeral snapshot taken for a catalog
scan can't be any older than the primary snapshot already held. But
there could be some corner case where that's not true, if we use this
technique somewhere that such a snapshot hasn't already been acquired.
I wasn't talking about catalog scans or this patch. I wonder whether the
update we do there could cause the index not to be used for concurrent
(normal) scans, since the xmin is now newer while it might previously
have been far in the past. I.e. we might need to unset indcheckxmin if
the xmin is far enough in the past.
Hm. Looks like this should also change the description of SecondarySnapshot:
/*
* CurrentSnapshot points to the only snapshot taken in transaction-snapshot
* mode, and to the latest one taken in a read-committed transaction.
* SecondarySnapshot is a snapshot that's always up-to-date as of the current
* instant, even in transaction-snapshot mode. It should only be used for
* special-purpose code (say, RI checking.)
*
I think that's still more or less true, though we could add catalog
scans as another example.
I guess my feeling is that once catalog scans use it, it's not so much
special purpose anymore ;). But I admit that the frequency of usage
doesn't say much about its specificity...
I actually wonder if we shouldn't just abolish GetLatestSnapshot(). None
of the callers seem to rely on its behaviour from a quick look, and it
seems rather confusing to have both.
I assume Tom had some reason for making GetLatestSnapshot() behave the
way it does, so I refrained from doing that. I might be wrong.
At least I don't see any on a quick look - which doesn't say very
much. I think I just dislike having *instant and *latest functions in
there; it seems confusing to me.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-06-03 14:57:12 -0400, Robert Haas wrote:
On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:+1.
Here's a more serious patch for MVCC catalog access. This one
involves more data copying than the last one, I think, because the
previous version did not register the snapshots it took, which I think
is not safe. So this needs to be re-tested for performance, which I
have so far made no attempt to do.
Ok, I am starting to take a somewhat more serious look.
Minor issues I noticed:
* index.c:index_constraint_create() - comments need to get updated
* index.c:IndexCheckExclusion() - why do we still use a SnapshotNow? I'd
rather not use *Now if it isn't necessary.
* the *CONCURRENTLY infrastructure should be simplified once this has
been applied, but I think it makes sense to keep that separate.
* index.c:reindex_index() - SnapshotNow comment should be updated
I still think that renaming SnapshotNow to something like
SnapshotPerTuple to force everyone to reevaluate their usage would be
good.
So, the biggest issue with the patch seems to be performance worries. I
tried to create a worst case scenario:
postgres (patched and HEAD) running with:
-c shared_buffers=4GB \
-c max_connections=2000 \
-c maintenance_work_mem=2GB \
-c checkpoint_segments=300 \
-c wal_buffers=64MB \
-c synchronous_commit=off \
-c autovacuum=off \
-p 5440
With one background pgbench running:
pgbench -p 5440 -h /tmp -f /tmp/readonly-busy.sql -c 1000 -j 10 -T 100 postgres
readonly-busy.sql:
BEGIN;
SELECT txid_current();
SELECT pg_sleep(0.0001);
COMMIT;
I measured the performance of one other pgbench:
pgbench -h /tmp -p 5440 postgres -T 10 -c 100 -j 100 -n -f /tmp/simplequery.sql -C
simplequery.sql:
SELECT * FROM af1, af2 WHERE af1.x = af2.x;
tables:
create table af1 (x) as select g from generate_series(1,4) g;
create table af2 (x) as select g from generate_series(4,7) g;
With that setup one can create quite a noticeable overhead for the mvcc
patch (best of 5):
master-optimize:
tps = 1261.629474 (including connections establishing)
tps = 15121.648834 (excluding connections establishing)
dev-optimize:
tps = 773.719637 (including connections establishing)
tps = 2804.239979 (excluding connections establishing)
Most of the time in both patched and unpatched is by far spent in
GetSnapshotData. I think the reason this shows a far higher overhead
than what you previously measured is that a) in your test the other
backends were idle, while in mine they actually modify PGXACT, which
causes noticeable cacheline bouncing, and b) I have a higher number of
connections & max_connections.
A quick test shows that even with max_connection=600, 400 background,
and 100 foreground pgbenches there's noticeable overhead:
master-optimize:
tps = 2221.226711 (including connections establishing)
tps = 31203.259472 (excluding connections establishing)
dev-optimize:
tps = 1629.734352 (including connections establishing)
tps = 4754.449726 (excluding connections establishing)
Now I grant that's a somewhat harsh test for postgres, but I don't
think it's entirely unreasonable and the performance impact is quite
stark.
It strikes me as rather unfortunate that the snapshot interface is
designed in such a way as to require so much data copying. It seems
we always take a snapshot by copying from PGXACT/PGPROC into
CurrentSnapshotData or SecondarySnapshotData, and then copying data a
second time from there to someplace more permanent. It would be nice
to avoid that, at least in common cases.
Sounds doable. But let's do one thing at a time ;). That copy wasn't
visible in the rather extreme workload from above btw...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 17, 2013 at 8:12 AM, Andres Freund <andres@2ndquadrant.com> wrote:
So, the biggest issue with the patch seems to be performance worries. I
tried to create a worst case scenario:
postgres (patched and HEAD) running with:
-c shared_buffers=4GB \
-c max_connections=2000 \
-c maintenance_work_mem=2GB \
-c checkpoint_segments=300 \
-c wal_buffers=64MB \
-c synchronous_commit=off \
-c autovacuum=off \
-p 5440
With one background pgbench running:
pgbench -p 5440 -h /tmp -f /tmp/readonly-busy.sql -c 1000 -j 10 -T 100 postgres
readonly-busy.sql:
BEGIN;
SELECT txid_current();
SELECT pg_sleep(0.0001);
COMMIT;
I measured the performance of one other pgbench:
pgbench -h /tmp -p 5440 postgres -T 10 -c 100 -j 100 -n -f /tmp/simplequery.sql -C
simplequery.sql:
SELECT * FROM af1, af2 WHERE af1.x = af2.x;
tables:
create table af1 (x) as select g from generate_series(1,4) g;
create table af2 (x) as select g from generate_series(4,7) g;
With that setup one can create quite a noticeable overhead for the mvcc
patch (best of 5):
master-optimize:
tps = 1261.629474 (including connections establishing)
tps = 15121.648834 (excluding connections establishing)
dev-optimize:
tps = 773.719637 (including connections establishing)
tps = 2804.239979 (excluding connections establishing)
Most of the time in both patched and unpatched is by far spent in
GetSnapshotData. I think the reason this shows a far higher overhead
than what you previously measured is that a) in your test the other
backends were idle, while in mine they actually modify PGXACT, which
causes noticeable cacheline bouncing, and b) I have a higher number of
connections & max_connections.
A quick test shows that even with max_connection=600, 400 background,
and 100 foreground pgbenches there's noticeable overhead:
master-optimize:
tps = 2221.226711 (including connections establishing)
tps = 31203.259472 (excluding connections establishing)
dev-optimize:
tps = 1629.734352 (including connections establishing)
tps = 4754.449726 (excluding connections establishing)Now I grant that's a somewhat harsh test for postgres, but I don't
think it's entirely unreasonable and the performance impact is quite
stark.
It's not entirely unreasonable, but it *is* mostly unreasonable. I
mean, nobody is going to run 1000 connections in the background that
do nothing but thrash PGXACT on a real system. I just can't get
concerned about that. What I am concerned about is that there may be
other, more realistic workloads that show similar regressions. But I
don't know how to find out whether that's actually the case. On the
IBM POWER box where I tested this, it's not even GetSnapshotData()
that kills you; it's the system CPU scheduler.
The thing about this particular test is that it's artificial -
normally, any operation that wants to modify PGXACT will spend a lot
more time fighting over WALInsertLock and maybe waiting for disk I/O
than is the case here. Of course, with Heikki's WAL scaling patch and
perhaps other optimizations we expect that other overhead to go down,
which might make the problems here more visible; and some of Heikki's
existing testing has shown significant contention around ProcArrayLock
as things stand. But I'm still on the fence about whether this is
really a valid test.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-06-20 09:45:26 -0400, Robert Haas wrote:
With that setup one can create quite a noticeable overhead for the mvcc
patch (best of 5):
master-optimize:
tps = 1261.629474 (including connections establishing)
tps = 15121.648834 (excluding connections establishing)
dev-optimize:
tps = 773.719637 (including connections establishing)
tps = 2804.239979 (excluding connections establishing)
Most of the time in both patched and unpatched is by far spent in
GetSnapshotData. I think the reason this shows a far higher overhead
than what you previously measured is that a) in your test the other
backends were idle, while in mine they actually modify PGXACT, which
causes noticeable cacheline bouncing, and b) I have a higher number of
connections & max_connections.
A quick test shows that even with max_connection=600, 400 background,
and 100 foreground pgbenches there's noticeable overhead:
master-optimize:
tps = 2221.226711 (including connections establishing)
tps = 31203.259472 (excluding connections establishing)
dev-optimize:
tps = 1629.734352 (including connections establishing)
tps = 4754.449726 (excluding connections establishing)Now I grant that's a somewhat harsh test for postgres, but I don't
think it's entirely unreasonable and the performance impact is quite
stark.
It's not entirely unreasonable, but it *is* mostly unreasonable.
Well, sure. So are the tests that you ran. But that's *completely*
fine. In contrast to evaluating whether a performance improvement is
worth its complexity, we're not trying to measure real-world
improvements. We're trying to test the worst cases we can think of, even
if they aren't really interesting, by stressing potential pain points. If
we can't find a relevant regression for those using something akin to
microbenchmarks, it's less likely that there are performance regressions.
The "not entirely unreasonable" point is just about making sure you're
not testing something entirely irrelevant. Say, performance of a 1TB
database when shared_buffers is set to 64k. Or testing DDL performance
while locking pg_class exclusively.
The test was specifically chosen to:
* do uncached syscache lookups (-C) to measure the impact of the added
GetSnapshotData() calls
* make individual GetSnapshotData() calls slower. (all processes have an
xid)
* contend on ProcArrayLock but not much else (high number of clients in
the background)
I mean, nobody is going to run 1000 connections in the background that
do nothing but thrash PGXACT on a real system. I just can't get
concerned about that.
In the original mail I did retry it with 400 and the regression is still
pretty big. And the "background" things could also be doing something
that's not that likely to be blocked by global locks. Say, operate on
temporary or unlogged tables. Or just acquire a single row level lock
and then continue to do readonly work in a read committed transaction.
I think we both can come up with scenarios where at least part of the
above scenario is present. But imo that doesn't really matter.
What I am concerned about is that there may be
other, more realistic workloads that show similar regressions. But I
don't know how to find out whether that's actually the case.
So, given the results from that test and the profile I got, where
GetSnapshotData was by far the most expensive thing, a more
representative test might be something like a readonly pgbench with a
moderately high number of short-lived connections. I wouldn't be
surprised if that still showed performance problems.
If that's not enough something like:
BEGIN;
SELECT * FROM my_client WHERE client_id = :id FOR UPDATE;
SELECT * FROM key_table WHERE key = :random
...
SELECT * FROM key_table WHERE key = :random
COMMIT;
will surely still show the problem.
On the
IBM POWER box where I tested this, it's not even GetSnapshotData()
that kills you; it's the system CPU scheduler.
I haven't tried yet, but I'd guess the above setup shows the difference
with less than 400 clients. Might make it more reasonable to run there.
But I'm still on the fence about whether this is really a valid test.
I think it shows that we need to be careful and do further performance
evaluations and/or alleviate the pain by making things cheaper (say, a
"ddl counter" in shared mem, allowing to cache snapshots for the
syscache). If that artificial test hadn't shown problems I'd have voted
for just going ahead and not worrying further.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 20, 2013 at 10:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:
But I'm still on the fence about whether this is really a valid test.
I think it shows that we need to be careful and do further performance
evaluations and/or alleviate the pain by making things cheaper (say, a
"ddl counter" in shared mem, allowing to cache snapshots for the
syscache). If that artificial test hadn't shown problems I'd have voted
for just going ahead and not worrying further.
I tried a general snapshot counter that rolls over every time any
transaction commits, but that doesn't help much. It's a small
improvement on general workloads, but it's not effective against this
kind of hammering. A DDL counter would be a bit more expensive
because we'd have to insert an additional branch into
GetSnapshotData() while ProcArrayLock is held, but it might be
tolerable. Do you have code for this (or some portion of it) already?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-06-20 10:58:59 -0400, Robert Haas wrote:
On Thu, Jun 20, 2013 at 10:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:
But I'm still on the fence about whether this is really a valid test.
I think it shows that we need to be careful and do further performance
evaluations and/or alleviate the pain by making things cheaper (say, a
"ddl counter" in shared mem, allowing to cache snapshots for the
syscache). If that artificial test hadn't shown problems I'd have voted
for just going ahead and not worrying further.
I tried a general snapshot counter that rolls over every time any
transaction commits, but that doesn't help much. It's a small
improvement on general workloads, but it's not effective against this
kind of hammering. A DDL counter would be a bit more expensive
because we'd have to insert an additional branch into
GetSnapshotData() while ProcArrayLock is held, but it might be
tolerable.
I actually wasn't thinking of adding it at that level. More like just
not fetching a new snapshot in systable_* if a) the global ddl counter
hasn't been increased and b) our transaction hasn't performed any
ddl. Instead, reuse the one we fetched the last time for that
purpose.
The global ddl counter could be increased every time we commit and the
commit has invalidations queued; the local one every time we execute
local cache invalidation messages (i.e. around CommandCounterIncrement).
I think something roughly along those lines should be doable without
adding new overhead to global paths. Except maybe some check in
snapmgr.c for that new, longer living, snapshot.
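A rough sketch of that scheme; every name here (ReadSharedDdlCount,
LocalDdlCount, CatalogSnapshotData, and so on) is hypothetical, invented
for illustration rather than taken from any patch:

#include "postgres.h"
#include "storage/procarray.h"
#include "utils/snapmgr.h"
#include "utils/tqual.h"

static SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
static Snapshot CatalogSnapshot = NULL;

/* counter values as of the moment CatalogSnapshot was taken */
static uint32 SnapSharedDdlCount = 0;
static uint32 SnapLocalDdlCount = 0;

/* hypothetical: bumped when we process our own invalidation messages,
 * i.e. around CommandCounterIncrement */
static uint32 LocalDdlCount = 0;

Snapshot
GetCatalogSnapshotSketch(void)
{
	/* hypothetical read of a shared-memory counter that any committing
	 * transaction with queued invalidation messages would bump */
	uint32		shared = ReadSharedDdlCount();

	if (CatalogSnapshot == NULL ||
		SnapSharedDdlCount != shared ||
		SnapLocalDdlCount != LocalDdlCount)
	{
		/* someone (possibly us) committed catalog changes: refresh */
		CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData);
		SnapSharedDdlCount = shared;
		SnapLocalDdlCount = LocalDdlCount;
	}

	return CatalogSnapshot;
}

The appeal of this shape is that the global paths only pay for a counter
bump at commit; the per-scan check happens outside ProcArrayLock.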
Do you have code for this (or some portion of it) already?
Nope, sorry.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 20, 2013 at 11:13 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I actually wasn't thinking of adding it at that level. More like just
not fetching a new snapshot in systable_* if a) the global ddl counter
hasn't been increased and b) our transaction hasn't performed any
ddl. Instead, reuse the one we fetched the last time for that
purpose.
All right, so here's a patch that does something along those lines.
We have to always take a new snapshot when somebody scans a catalog
that has no syscache, because there won't be any invalidation messages
to work off of in that case. The only catalog in that category that's
accessed during backend startup (which is mostly what your awful test
case is banging on) is pg_db_role_setting. We could add a syscache
for that catalog or somehow force invalidation messages to be sent
despite the lack of a syscache, but what I chose to do instead is
refactor things slightly so that we use the same snapshot for all four
scans of pg_db_role_setting, instead of taking a new one each time. I
think that's unimpeachable on correctness grounds; it's no different
than if we'd finished all four scans in the time it took us to finish
the first one, and then gotten put to sleep by the OS scheduler for as
long as it took us to scan the other three. Point being that there's
no interlock there.
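The shape of that refactoring, paraphrased rather than quoted from the
patch (the explicit snapshot parameter to ApplySetting() is my reading
of the approach, not necessarily the exact signature):

	Snapshot	snapshot;
	Relation	relsetting;

	/* one registered catalog snapshot, reused for all four scans */
	snapshot = RegisterSnapshot(GetCatalogSnapshot(DbRoleSettingRelationId));
	relsetting = heap_open(DbRoleSettingRelationId, AccessShareLock);

	ApplySetting(snapshot, databaseid, roleid, relsetting, PGC_S_DATABASE_USER);
	ApplySetting(snapshot, InvalidOid, roleid, relsetting, PGC_S_USER);
	ApplySetting(snapshot, databaseid, InvalidOid, relsetting, PGC_S_DATABASE);
	ApplySetting(snapshot, InvalidOid, InvalidOid, relsetting, PGC_S_GLOBAL);

	heap_close(relsetting, AccessShareLock);
	UnregisterSnapshot(snapshot);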
Anyway, with that change, plus the general mechanism of the patch,
each backend takes just two MVCC scans during startup. The first
catalog access takes an MVCC snapshot for obvious reasons, and then
there's one additional snapshot for the access to pg_db_role_setting
for the reasons stated above. Everything else piggybacks on those
snapshots, unless of course an invalidation intervenes.
In my testing, this did not completely eliminate the performance hit
on your test case, but it got it down to 3-4%. Best of five runs,
with max_connections=2000 and 1000 connections running
readonly-busy.sql:
(patched)
tps = 183.224651 (including connections establishing)
tps = 1091.813178 (excluding connections establishing)
(unpatched)
tps = 190.598045 (including connections establishing)
tps = 1129.422537 (excluding connections establishing)
The difference is 3-4%, which is quite a lot less than what you
measured before, although on different hardware, so results may vary.
Now, I'm not sure this fix actually helps the other test scenarios
very much; for example, it's not going to help the cases that pound on
pg_depend, because that doesn't have a system cache either. As with
pg_db_role_setting, we could optimize this mechanism by sending
invalidation messages for pg_depend changes, but I'm not sure it's
worth it.
I'm also attaching a fixed version of pg_cxn.c; the last version had a few bugs.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
mvcc-catalog-access-v4.patchapplication/octet-stream; name=mvcc-catalog-access-v4.patchDownload
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index e617f9b..1110719 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2046,7 +2046,7 @@ get_pkey_attnames(Relation rel, int16 *numatts)
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(indexRelation, IndexIndrelidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(indexTuple = systable_getnext(scan)))
{
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 17b832e..81ab972 100644
--- a/contrib/sepgsql/label.c
+++ b/contrib/sepgsql/label.c
@@ -727,7 +727,7 @@ exec_object_restorecon(struct selabel_handle * sehnd, Oid catalogId)
rel = heap_open(catalogId, AccessShareLock);
sscan = systable_beginscan(rel, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while (HeapTupleIsValid(tuple = systable_getnext(sscan)))
{
Form_pg_database datForm;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1531f3b..5bcbc92 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -80,7 +80,7 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync,
- bool is_bitmapscan);
+ bool is_bitmapscan, bool temp_snap);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1286,7 +1286,17 @@ heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false);
+ true, true, false, false);
+}
+
+HeapScanDesc
+heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
+{
+ Oid relid = RelationGetRelid(relation);
+ Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+
+ return heap_beginscan_internal(relation, snapshot, nkeys, key,
+ true, true, false, true);
}
HeapScanDesc
@@ -1295,7 +1305,7 @@ heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- allow_strat, allow_sync, false);
+ allow_strat, allow_sync, false, false);
}
HeapScanDesc
@@ -1303,14 +1313,14 @@ heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- false, false, true);
+ false, false, true, false);
}
static HeapScanDesc
heap_beginscan_internal(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync,
- bool is_bitmapscan)
+ bool is_bitmapscan, bool temp_snap)
{
HeapScanDesc scan;
@@ -1335,6 +1345,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
scan->rs_strategy = NULL; /* set in initscan */
scan->rs_allow_strat = allow_strat;
scan->rs_allow_sync = allow_sync;
+ scan->rs_temp_snap = temp_snap;
/*
* we can use page-at-a-time mode if it's an MVCC-safe snapshot
@@ -1421,6 +1432,9 @@ heap_endscan(HeapScanDesc scan)
if (scan->rs_strategy != NULL)
FreeAccessStrategy(scan->rs_strategy);
+ if (scan->rs_temp_snap)
+ UnregisterSnapshot(scan->rs_snapshot);
+
pfree(scan);
}
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 31a419b..2bfe78a 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -28,6 +28,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/tqual.h"
@@ -231,7 +232,7 @@ BuildIndexValueDescription(Relation indexRelation,
* rel: catalog to scan, already opened and suitably locked
* indexId: OID of index to conditionally use
* indexOK: if false, forces a heap scan (see notes below)
- * snapshot: time qual to use (usually should be SnapshotNow)
+ * snapshot: time qual to use (NULL for a recent catalog snapshot)
* nkeys, key: scan keys
*
* The attribute numbers in the scan key should be set for the heap case.
@@ -266,6 +267,19 @@ systable_beginscan(Relation heapRelation,
sysscan->heap_rel = heapRelation;
sysscan->irel = irel;
+ if (snapshot == NULL)
+ {
+ Oid relid = RelationGetRelid(heapRelation);
+
+ snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+ sysscan->snapshot = snapshot;
+ }
+ else
+ {
+ /* Caller is responsible for any snapshot. */
+ sysscan->snapshot = NULL;
+ }
+
if (irel)
{
int i;
@@ -401,6 +415,9 @@ systable_endscan(SysScanDesc sysscan)
else
heap_endscan(sysscan->scan);
+ if (sysscan->snapshot)
+ UnregisterSnapshot(sysscan->snapshot);
+
pfree(sysscan);
}
@@ -444,6 +461,19 @@ systable_beginscan_ordered(Relation heapRelation,
sysscan->heap_rel = heapRelation;
sysscan->irel = indexRelation;
+ if (snapshot == NULL)
+ {
+ Oid relid = RelationGetRelid(heapRelation);
+
+ snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+ sysscan->snapshot = snapshot;
+ }
+ else
+ {
+ /* Caller is responsible for any snapshot. */
+ sysscan->snapshot = NULL;
+ }
+
/* Change attribute numbers to be index column numbers. */
for (i = 0; i < nkeys; i++)
{
@@ -494,5 +524,7 @@ systable_endscan_ordered(SysScanDesc sysscan)
{
Assert(sysscan->irel);
index_endscan(sysscan->iscan);
+ if (sysscan->snapshot)
+ UnregisterSnapshot(sysscan->snapshot);
pfree(sysscan);
}
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index fcf1a95..d7853c0 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -142,8 +142,7 @@ guarantees that VACUUM can't delete any heap tuple that an indexscanning
process might be about to visit. (This guarantee works only for simple
indexscans that visit the heap in sync with the index scan, not for bitmap
scans. We only need the guarantee when using non-MVCC snapshot rules such
-as SnapshotNow, so in practice this is only important for system catalog
-accesses.)
+as SnapshotNow.)
Because a page can be split even while someone holds a pin on it, it is
possible that an indexscan will return items that are no longer stored on
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 8905596..d23dc45 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -611,7 +611,7 @@ boot_openrel(char *relname)
{
/* We can now load the pg_type data */
rel = heap_open(TypeRelationId, NoLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
i = 0;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
++i;
@@ -620,7 +620,7 @@ boot_openrel(char *relname)
while (i-- > 0)
*app++ = ALLOC(struct typmap, 1);
*app = NULL;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
app = Typ;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -918,7 +918,7 @@ gettype(char *type)
}
elog(DEBUG4, "external type: %s", type);
rel = heap_open(TypeRelationId, NoLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
i = 0;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
++i;
@@ -927,7 +927,7 @@ gettype(char *type)
while (i-- > 0)
*app++ = ALLOC(struct typmap, 1);
*app = NULL;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
app = Typ;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index ced66b1..e0dcf05 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -788,7 +788,7 @@ objectsInSchemaToOids(GrantObjectType objtype, List *nspnames)
ObjectIdGetDatum(namespaceId));
rel = heap_open(ProcedureRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 1, key);
+ scan = heap_beginscan_catalog(rel, 1, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -833,7 +833,7 @@ getRelationsInNamespace(Oid namespaceId, char relkind)
CharGetDatum(relkind));
rel = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 2, key);
+ scan = heap_beginscan_catalog(rel, 2, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -1332,7 +1332,7 @@ RemoveRoleFromObjectACL(Oid roleid, Oid classid, Oid objid)
ObjectIdGetDatum(objid));
scan = systable_beginscan(rel, DefaultAclOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
@@ -1452,7 +1452,7 @@ RemoveDefaultACLById(Oid defaclOid)
ObjectIdGetDatum(defaclOid));
scan = systable_beginscan(rel, DefaultAclOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
@@ -2705,7 +2705,7 @@ ExecGrant_Largeobject(InternalGrant *istmt)
scan = systable_beginscan(relation,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -3468,7 +3468,7 @@ pg_aclmask(AclObjectKind objkind, Oid table_oid, AttrNumber attnum, Oid roleid,
return pg_language_aclmask(table_oid, roleid, mask, how);
case ACL_KIND_LARGEOBJECT:
return pg_largeobject_aclmask_snapshot(table_oid, roleid,
- mask, how, SnapshotNow);
+ mask, how, NULL);
case ACL_KIND_NAMESPACE:
return pg_namespace_aclmask(table_oid, roleid, mask, how);
case ACL_KIND_TABLESPACE:
@@ -3856,10 +3856,13 @@ pg_language_aclmask(Oid lang_oid, Oid roleid,
* Exported routine for examining a user's privileges for a largeobject
*
* When a large object is opened for reading, it is opened relative to the
- * caller's snapshot, but when it is opened for writing, it is always relative
- * to SnapshotNow, as documented in doc/src/sgml/lobj.sgml. This function
- * takes a snapshot argument so that the permissions check can be made relative
- * to the same snapshot that will be used to read the underlying data.
+ * caller's snapshot, but when it is opened for writing, a current
+ * MVCC snapshot will be used. See doc/src/sgml/lobj.sgml. This function
+ * takes a snapshot argument so that the permissions check can be made
+ * relative to the same snapshot that will be used to read the underlying
+ * data.  A caller that wants an instantaneous MVCC snapshot simply passes
+ * NULL, since all we do with the snapshot argument is pass it through to
+ * systable_beginscan().
*/
AclMode
pg_largeobject_aclmask_snapshot(Oid lobj_oid, Oid roleid,
@@ -4644,7 +4647,7 @@ pg_language_ownercheck(Oid lan_oid, Oid roleid)
* Ownership check for a largeobject (specified by OID)
*
* This is only used for operations like ALTER LARGE OBJECT that are always
- * relative to SnapshotNow.
+ * relative to an up-to-date snapshot.
*/
bool
pg_largeobject_ownercheck(Oid lobj_oid, Oid roleid)
@@ -4670,7 +4673,7 @@ pg_largeobject_ownercheck(Oid lobj_oid, Oid roleid)
scan = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -5032,7 +5035,7 @@ pg_extension_ownercheck(Oid ext_oid, Oid roleid)
scan = systable_beginscan(pg_extension,
ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
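
The rewritten comment above boils down to a simple rule at the call sites: a
large object opened read-only has its ACL checked under the same snapshot
that will read the data, while the read-write path passes NULL to get an
instantaneous MVCC snapshot. A sketch, with lobj_oid, roleid, and
read_snapshot standing in for the caller's actual state:

    /* read-only open: judge permissions under the reader's snapshot */
    mask = pg_largeobject_aclmask_snapshot(lobj_oid, roleid,
                                           ACL_SELECT, ACLMASK_ALL,
                                           read_snapshot);

    /* read-write open: NULL yields an instantaneous MVCC snapshot */
    mask = pg_largeobject_aclmask_snapshot(lobj_oid, roleid,
                                           ACL_UPDATE, ACLMASK_ALL,
                                           NULL);
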
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 41a5da0..1378488 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -232,6 +232,10 @@ IsReservedName(const char *name)
* know if it's shared. Fortunately, the set of shared relations is
* fairly static, so a hand-maintained list of their OIDs isn't completely
* impractical.
+ *
+ * XXX: Now that we have MVCC catalog access, the reasoning above is no longer
+ * true. Are there other good reasons to hard-code this, or should we revisit
+ * that decision?
*/
bool
IsSharedRelation(Oid relationId)
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 69171f8..fe17c96 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -558,7 +558,7 @@ findDependentObjects(const ObjectAddress *object,
nkeys = 2;
scan = systable_beginscan(*depRel, DependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -733,7 +733,7 @@ findDependentObjects(const ObjectAddress *object,
nkeys = 2;
scan = systable_beginscan(*depRel, DependReferenceIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1069,7 +1069,7 @@ deleteOneObject(const ObjectAddress *object, Relation *depRel, int flags)
nkeys = 2;
scan = systable_beginscan(*depRel, DependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 45a84e4..4fd42ed 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1386,7 +1386,7 @@ RelationRemoveInheritance(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
simple_heap_delete(catalogRelation, &tuple->t_self);
@@ -1450,7 +1450,7 @@ DeleteAttributeTuples(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(attrel, AttributeRelidNumIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/* Delete all the matching tuples */
while ((atttup = systable_getnext(scan)) != NULL)
@@ -1491,7 +1491,7 @@ DeleteSystemAttributeTuples(Oid relid)
Int16GetDatum(0));
scan = systable_beginscan(attrel, AttributeRelidNumIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/* Delete all the matching tuples */
while ((atttup = systable_getnext(scan)) != NULL)
@@ -1623,7 +1623,7 @@ RemoveAttrDefault(Oid relid, AttrNumber attnum,
Int16GetDatum(attnum));
scan = systable_beginscan(attrdef_rel, AttrDefaultIndexId, true,
- SnapshotNow, 2, scankeys);
+ NULL, 2, scankeys);
/* There should be at most one matching tuple, but we loop anyway */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -1677,7 +1677,7 @@ RemoveAttrDefaultById(Oid attrdefId)
ObjectIdGetDatum(attrdefId));
scan = systable_beginscan(attrdef_rel, AttrDefaultOidIndexId, true,
- SnapshotNow, 1, scankeys);
+ NULL, 1, scankeys);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -2374,7 +2374,7 @@ MergeWithExistingConstraint(Relation rel, char *ccname, Node *expr,
ObjectIdGetDatum(RelationGetNamespace(rel)));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -2640,7 +2640,7 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
}
scan = systable_beginscan(pgstatistic, StatisticRelidAttnumInhIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
/* we must loop even when attnum != 0, in case of inherited stats */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -2885,7 +2885,7 @@ heap_truncate_find_FKs(List *relationIds)
fkeyRel = heap_open(ConstraintRelationId, AccessShareLock);
fkeyScan = systable_beginscan(fkeyRel, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while (HeapTupleIsValid(tuple = systable_getnext(fkeyScan)))
{
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..f24f093 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1845,7 +1845,7 @@ index_update_stats(Relation rel,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relid));
- pg_class_scan = heap_beginscan(pg_class, SnapshotNow, 1, key);
+ pg_class_scan = heap_beginscan_catalog(pg_class, 1, key);
tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
tuple = heap_copytuple(tuple);
heap_endscan(pg_class_scan);
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 23943ff..4434dd6 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4013,8 +4013,8 @@ fetch_search_path_array(Oid *sarray, int sarray_len)
* a nonexistent object OID, rather than failing. This is to avoid race
* condition errors when a query that's scanning a catalog using an MVCC
* snapshot uses one of these functions. The underlying IsVisible functions
- * operate on SnapshotNow semantics and so might see the object as already
- * gone when it's still visible to the MVCC snapshot. (There is no race
+ * always use an up-to-date snapshot and so might see the object as already
+ * gone when it's still visible to the transaction snapshot. (There is no race
* condition in the current coding because we don't accept sinval messages
* between the SearchSysCacheExists test and the subsequent lookup.)
*/
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 215eaf5..4d22f3a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -1481,7 +1481,7 @@ get_catalog_object_by_oid(Relation catalog, Oid objectId)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(catalog, oidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
{
@@ -1544,7 +1544,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(castDesc, CastOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -1644,7 +1644,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
adscan = systable_beginscan(attrdefDesc, AttrDefaultOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(adscan);
@@ -1750,7 +1750,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amopDesc, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -1800,7 +1800,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amprocDesc, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -1848,7 +1848,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(ruleDesc, RewriteOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -1883,7 +1883,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
tgscan = systable_beginscan(trigDesc, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(tgscan);
@@ -2064,7 +2064,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(defaclrel, DefaultAclOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -2816,7 +2816,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
adscan = systable_beginscan(attrdefDesc, AttrDefaultOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(adscan);
@@ -2921,7 +2921,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amopDesc, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -2965,7 +2965,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amprocDesc, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -3218,7 +3218,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(defaclrel, DefaultAclOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(rcscan);
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index dd00502..99f4be5 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -166,7 +166,7 @@ RemoveCollationById(Oid collationOid)
ObjectIdGetDatum(collationOid));
scandesc = systable_beginscan(rel, CollationOidIndexId, true,
- SnapshotNow, 1, &scanKeyData);
+ NULL, 1, &scanKeyData);
tuple = systable_getnext(scandesc);
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index a8eb4cb..5021420 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -412,7 +412,7 @@ ConstraintNameIsUsed(ConstraintCategory conCat, Oid objId,
ObjectIdGetDatum(objNamespace));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -506,7 +506,7 @@ ChooseConstraintName(const char *name1, const char *name2,
ObjectIdGetDatum(namespaceid));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
found = (HeapTupleIsValid(systable_getnext(conscan)));
@@ -699,7 +699,7 @@ AlterConstraintNamespaces(Oid ownerId, Oid oldNspId,
ObjectIdGetDatum(ownerId));
scan = systable_beginscan(conRel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
}
else
{
@@ -709,7 +709,7 @@ AlterConstraintNamespaces(Oid ownerId, Oid oldNspId,
ObjectIdGetDatum(ownerId));
scan = systable_beginscan(conRel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
}
while (HeapTupleIsValid((tup = systable_getnext(scan))))
@@ -778,7 +778,7 @@ get_relation_constraint_oid(Oid relid, const char *conname, bool missing_ok)
ObjectIdGetDatum(relid));
scan = systable_beginscan(pg_constraint, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -836,7 +836,7 @@ get_domain_constraint_oid(Oid typid, const char *conname, bool missing_ok)
ObjectIdGetDatum(typid));
scan = systable_beginscan(pg_constraint, ConstraintTypidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -903,7 +903,7 @@ check_functional_grouping(Oid relid,
ObjectIdGetDatum(relid));
scan = systable_beginscan(pg_constraint, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index 45d8e62..08b2a99 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -166,8 +166,7 @@ RemoveConversionById(Oid conversionOid)
/* open pg_conversion */
rel = heap_open(ConversionRelationId, RowExclusiveLock);
- scan = heap_beginscan(rel, SnapshotNow,
- 1, &scanKeyData);
+ scan = heap_beginscan_catalog(rel, 1, &scanKeyData);
/* search for the target tuple */
if (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
diff --git a/src/backend/catalog/pg_db_role_setting.c b/src/backend/catalog/pg_db_role_setting.c
index 4594912..6e19736 100644
--- a/src/backend/catalog/pg_db_role_setting.c
+++ b/src/backend/catalog/pg_db_role_setting.c
@@ -43,7 +43,7 @@ AlterSetting(Oid databaseid, Oid roleid, VariableSetStmt *setstmt)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(roleid));
scan = systable_beginscan(rel, DbRoleSettingDatidRolidIndexId, true,
- SnapshotNow, 2, scankey);
+ NULL, 2, scankey);
tuple = systable_getnext(scan);
/*
@@ -205,7 +205,7 @@ DropSetting(Oid databaseid, Oid roleid)
numkeys++;
}
- scan = heap_beginscan(relsetting, SnapshotNow, numkeys, keys);
+ scan = heap_beginscan_catalog(relsetting, numkeys, keys);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
simple_heap_delete(relsetting, &tup->t_self);
@@ -226,7 +226,8 @@ DropSetting(Oid databaseid, Oid roleid)
* databaseid/roleid.
*/
void
-ApplySetting(Oid databaseid, Oid roleid, Relation relsetting, GucSource source)
+ApplySetting(Snapshot snapshot, Oid databaseid, Oid roleid,
+ Relation relsetting, GucSource source)
{
SysScanDesc scan;
ScanKeyData keys[2];
@@ -244,7 +245,7 @@ ApplySetting(Oid databaseid, Oid roleid, Relation relsetting, GucSource source)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(relsetting, DbRoleSettingDatidRolidIndexId, true,
- SnapshotNow, 2, keys);
+ snapshot, 2, keys);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
bool isnull;
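
Since ApplySetting() can no longer assume SnapshotNow, the choice of
snapshot moves up to the caller. The call sites are in postinit.c, elsewhere
in this patch; presumably they follow the same register/unregister
discipline, something like:

    Snapshot	snapshot;

    /* relsetting is the already-opened pg_db_role_setting relation */
    snapshot = RegisterSnapshot(GetCatalogSnapshot(DbRoleSettingRelationId));

    ApplySetting(snapshot, MyDatabaseId, InvalidOid, relsetting,
                 PGC_S_DATABASE);
    ApplySetting(snapshot, MyDatabaseId, GetUserId(), relsetting,
                 PGC_S_DATABASE_USER);

    UnregisterSnapshot(snapshot);
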
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 9535fba..bd5cd99 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -211,7 +211,7 @@ deleteDependencyRecordsFor(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -261,7 +261,7 @@ deleteDependencyRecordsForClass(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -343,7 +343,7 @@ changeDependencyFor(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -407,7 +407,7 @@ isObjectPinned(const ObjectAddress *object, Relation rel)
ObjectIdGetDatum(object->objectId));
scan = systable_beginscan(rel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/*
* Since we won't generate additional pg_depend entries for pinned
@@ -467,7 +467,7 @@ getExtensionOfObject(Oid classId, Oid objectId)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -520,7 +520,7 @@ sequenceIsOwned(Oid seqId, Oid *tableId, int32 *colId)
ObjectIdGetDatum(seqId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -580,7 +580,7 @@ getOwnedSequences(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -643,7 +643,7 @@ get_constraint_index(Oid constraintId)
Int32GetDatum(0));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -701,7 +701,7 @@ get_index_constraint(Oid indexId)
Int32GetDatum(0));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index 7e746f9..a7ef8cd 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -156,7 +156,7 @@ EnumValuesDelete(Oid enumTypeOid)
ObjectIdGetDatum(enumTypeOid));
scan = systable_beginscan(pg_enum, EnumTypIdLabelIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -483,6 +483,9 @@ restart:
* (for example, enum_in and enum_out do so). The worst that can happen
* is a transient failure to find any valid value of the row. This is
* judged acceptable in view of the infrequency of use of RenumberEnumType.
+ *
+ * XXX: Now that we have MVCC catalog scans, the above reasoning is no longer
+ * correct. Should we revisit any decisions here?
*/
static void
RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index fbfe7bc..638e535 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -81,7 +81,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
ObjectIdGetDatum(parentrelId));
scan = systable_beginscan(relation, InheritsParentIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while ((inheritsTuple = systable_getnext(scan)) != NULL)
{
@@ -325,7 +325,7 @@ typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
ObjectIdGetDatum(this_relid));
inhscan = systable_beginscan(inhrel, InheritsRelidSeqnoIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while ((inhtup = systable_getnext(inhscan)) != NULL)
{
diff --git a/src/backend/catalog/pg_largeobject.c b/src/backend/catalog/pg_largeobject.c
index d01a5a7..22d499d 100644
--- a/src/backend/catalog/pg_largeobject.c
+++ b/src/backend/catalog/pg_largeobject.c
@@ -104,7 +104,7 @@ LargeObjectDrop(Oid loid)
scan = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -126,7 +126,7 @@ LargeObjectDrop(Oid loid)
scan = systable_beginscan(pg_largeobject,
LargeObjectLOidPNIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
simple_heap_delete(pg_largeobject, &tuple->t_self);
@@ -145,11 +145,11 @@ LargeObjectDrop(Oid loid)
* We don't use the system cache for large object metadata, for fear of
* using too much local memory.
*
- * This function always scans the system catalog using SnapshotNow, so it
- * should not be used when a large object is opened in read-only mode (because
- * large objects opened in read only mode are supposed to be viewed relative
- * to the caller's snapshot, whereas in read-write mode they are relative to
- * SnapshotNow).
+ * This function always scans the system catalog using an up-to-date snapshot,
+ * so it should not be used when a large object is opened in read-only mode
+ * (because large objects opened in read-only mode are supposed to be viewed
+ * relative to the caller's snapshot, whereas in read-write mode they are
+ * relative to a current snapshot).
*/
bool
LargeObjectExists(Oid loid)
@@ -170,7 +170,7 @@ LargeObjectExists(Oid loid)
sd = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(sd);
if (HeapTupleIsValid(tuple))
diff --git a/src/backend/catalog/pg_range.c b/src/backend/catalog/pg_range.c
index 639b40c..b782f90 100644
--- a/src/backend/catalog/pg_range.c
+++ b/src/backend/catalog/pg_range.c
@@ -126,7 +126,7 @@ RangeDelete(Oid rangeTypeOid)
ObjectIdGetDatum(rangeTypeOid));
scan = systable_beginscan(pg_range, RangeTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c
index 7de4420..dc21c10 100644
--- a/src/backend/catalog/pg_shdepend.c
+++ b/src/backend/catalog/pg_shdepend.c
@@ -220,7 +220,7 @@ shdepChangeDep(Relation sdepRel,
Int32GetDatum(objsubid));
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 4, key);
+ NULL, 4, key);
while ((scantup = systable_getnext(scan)) != NULL)
{
@@ -554,7 +554,7 @@ checkSharedDependencies(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -729,7 +729,7 @@ copyTemplateDependencies(Oid templateDbId, Oid newDbId)
ObjectIdGetDatum(templateDbId));
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/* Set up to copy the tuples except for inserting newDbId */
memset(values, 0, sizeof(values));
@@ -792,7 +792,7 @@ dropDatabaseDependencies(Oid databaseId)
/* We leave the other index fields unspecified */
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -936,7 +936,7 @@ shdepDropDependency(Relation sdepRel,
}
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1125,7 +1125,7 @@ isSharedObjectPinned(Oid classId, Oid objectId, Relation sdepRel)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/*
* Since we won't generate additional pg_shdepend entries for pinned
@@ -1212,7 +1212,7 @@ shdepDropOwned(List *roleids, DropBehavior behavior)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while ((tuple = systable_getnext(scan)) != NULL)
{
@@ -1319,7 +1319,7 @@ shdepReassignOwned(List *roleids, Oid newrole)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while ((tuple = systable_getnext(scan)) != NULL)
{
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..f23730c 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
* against concurrent SnapshotNow scans of pg_index. Therefore this is unsafe
* to execute with less than full exclusive lock on the parent table;
* otherwise concurrent executions of RelationGetIndexList could miss indexes.
+ *
+ * XXX: Now that we have MVCC catalog access, SnapshotNow scans of pg_index
+ * shouldn't be common enough to worry about.  The comment above needs to be
+ * updated accordingly, and it may also be possible to simplify the logic
+ * here in other ways.
*/
void
mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
@@ -1583,7 +1588,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
Anum_pg_index_indisclustered,
BTEqualStrategyNumber, F_BOOLEQ,
BoolGetDatum(true));
- scan = heap_beginscan(indRelation, SnapshotNow, 1, &entry);
+ scan = heap_beginscan_catalog(indRelation, 1, &entry);
while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
index = (Form_pg_index) GETSTRUCT(indexTuple);
diff --git a/src/backend/commands/comment.c b/src/backend/commands/comment.c
index 60db27c..8baf017 100644
--- a/src/backend/commands/comment.c
+++ b/src/backend/commands/comment.c
@@ -187,7 +187,7 @@ CreateComments(Oid oid, Oid classoid, int32 subid, char *comment)
description = heap_open(DescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
{
@@ -281,7 +281,7 @@ CreateSharedComments(Oid oid, Oid classoid, char *comment)
shdescription = heap_open(SharedDescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(shdescription, SharedDescriptionObjIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
{
@@ -363,7 +363,7 @@ DeleteComments(Oid oid, Oid classoid, int32 subid)
description = heap_open(DescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, nkeys, skey);
+ NULL, nkeys, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
simple_heap_delete(description, &oldtuple->t_self);
@@ -399,7 +399,7 @@ DeleteSharedComments(Oid oid, Oid classoid)
shdescription = heap_open(SharedDescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(shdescription, SharedDescriptionObjIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
simple_heap_delete(shdescription, &oldtuple->t_self);
@@ -442,7 +442,7 @@ GetComment(Oid oid, Oid classoid, int32 subid)
tupdesc = RelationGetDescr(description);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
comment = NULL;
while ((tuple = systable_getnext(sd)) != NULL)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 0e10a75..a3a150d 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -133,7 +133,6 @@ createdb(const CreatedbStmt *stmt)
int notherbackends;
int npreparedxacts;
createdb_failure_params fparms;
- Snapshot snapshot;
/* Extract options from the statement node tree */
foreach(option, stmt->options)
@@ -538,29 +537,6 @@ createdb(const CreatedbStmt *stmt)
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
/*
- * Take an MVCC snapshot to use while scanning through pg_tablespace. For
- * safety, register the snapshot (this prevents it from changing if
- * something else were to request a snapshot during the loop).
- *
- * Traversing pg_tablespace with an MVCC snapshot is necessary to provide
- * us with a consistent view of the tablespaces that exist. Using
- * SnapshotNow here would risk seeing the same tablespace multiple times,
- * or worse not seeing a tablespace at all, if its tuple is moved around
- * by a concurrent update (eg an ACL change).
- *
- * Inconsistency of this sort is inherent to all SnapshotNow scans, unless
- * some lock is held to prevent concurrent updates of the rows being
- * sought. There should be a generic fix for that, but in the meantime
- * it's worth fixing this case in particular because we are doing very
- * heavyweight operations within the scan, so that the elapsed time for
- * the scan is vastly longer than for most other catalog scans. That
- * means there's a much wider window for concurrent updates to cause
- * trouble here than anywhere else. XXX this code should be changed
- * whenever a generic fix is implemented.
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
-
- /*
* Once we start copying subdirectories, we need to be able to clean 'em
* up if we fail. Use an ENSURE block to make sure this happens. (This
* is not a 100% solution, because of the possibility of failure during
@@ -577,7 +553,7 @@ createdb(const CreatedbStmt *stmt)
* each one to the new database.
*/
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid srctablespace = HeapTupleGetOid(tuple);
@@ -682,9 +658,6 @@ createdb(const CreatedbStmt *stmt)
PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
PointerGetDatum(&fparms));
- /* Free our snapshot */
- UnregisterSnapshot(snapshot);
-
return dboid;
}
@@ -1214,7 +1187,7 @@ movedb(const char *dbname, const char *tblspcname)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(dbname));
sysscan = systable_beginscan(pgdbrel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
oldtuple = systable_getnext(sysscan);
if (!HeapTupleIsValid(oldtuple)) /* shouldn't happen... */
ereport(ERROR,
@@ -1403,7 +1376,7 @@ AlterDatabase(AlterDatabaseStmt *stmt, bool isTopLevel)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(stmt->dbname));
scan = systable_beginscan(rel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
@@ -1498,7 +1471,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(dbname));
scan = systable_beginscan(rel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
@@ -1637,7 +1610,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
NameGetDatum(name));
scan = systable_beginscan(relation, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scanKey);
+ NULL, 1, &scanKey);
tuple = systable_getnext(scan);
@@ -1751,20 +1724,9 @@ remove_dbtablespaces(Oid db_id)
Relation rel;
HeapScanDesc scan;
HeapTuple tuple;
- Snapshot snapshot;
-
- /*
- * As in createdb(), we'd better use an MVCC snapshot here, since this
- * scan can run for a long time. Duplicate visits to tablespaces would be
- * harmless, but missing a tablespace could result in permanently leaked
- * files.
- *
- * XXX change this when a generic fix for SnapshotNow races is implemented
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1810,7 +1772,6 @@ remove_dbtablespaces(Oid db_id)
heap_endscan(scan);
heap_close(rel, AccessShareLock);
- UnregisterSnapshot(snapshot);
}
/*
@@ -1832,19 +1793,9 @@ check_db_file_conflict(Oid db_id)
Relation rel;
HeapScanDesc scan;
HeapTuple tuple;
- Snapshot snapshot;
-
- /*
- * As in createdb(), we'd better use an MVCC snapshot here; missing a
- * tablespace could result in falsely reporting the OID is unique, with
- * disastrous future consequences per the comment above.
- *
- * XXX change this when a generic fix for SnapshotNow races is implemented
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1870,7 +1821,6 @@ check_db_file_conflict(Oid db_id)
heap_endscan(scan);
heap_close(rel, AccessShareLock);
- UnregisterSnapshot(snapshot);
return result;
}
@@ -1927,7 +1877,7 @@ get_database_oid(const char *dbname, bool missing_ok)
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(dbname));
scan = systable_beginscan(pg_database, DatabaseNameIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
dbtuple = systable_getnext(scan);
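
The long comment deleted from createdb() isn't losing information so much as
becoming the default: every catalog scan now runs under a registered MVCC
snapshot. The net change to the pg_tablespace loop is just this (the old
code used GetLatestSnapshot(), whereas the helper presumably goes through
the GetCatalogSnapshot() machinery shown in the genam.c hunk):

    /* before */
    snapshot = RegisterSnapshot(GetLatestSnapshot());
    scan = heap_beginscan(rel, snapshot, 0, NULL);
    /* ... copy each tablespace ... */
    heap_endscan(scan);
    UnregisterSnapshot(snapshot);

    /* after */
    scan = heap_beginscan_catalog(rel, 0, NULL);
    /* ... copy each tablespace ... */
    heap_endscan(scan);
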
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 08e8cad..798c92a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -128,7 +128,7 @@ get_extension_oid(const char *extname, bool missing_ok)
CStringGetDatum(extname));
scandesc = systable_beginscan(rel, ExtensionNameIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -173,7 +173,7 @@ get_extension_name(Oid ext_oid)
ObjectIdGetDatum(ext_oid));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -212,7 +212,7 @@ get_extension_schema(Oid ext_oid)
ObjectIdGetDatum(ext_oid));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -1609,7 +1609,7 @@ RemoveExtensionById(Oid extId)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(extId));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -2107,7 +2107,7 @@ pg_extension_config_dump(PG_FUNCTION_ARGS)
ObjectIdGetDatum(CurrentExtensionObject));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2256,7 +2256,7 @@ extension_config_remove(Oid extensionoid, Oid tableoid)
ObjectIdGetDatum(extensionoid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2464,7 +2464,7 @@ AlterExtensionNamespace(List *names, const char *newschema)
ObjectIdGetDatum(extensionOid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2512,7 +2512,7 @@ AlterExtensionNamespace(List *names, const char *newschema)
ObjectIdGetDatum(extensionOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -2622,7 +2622,7 @@ ExecAlterExtensionStmt(AlterExtensionStmt *stmt)
CStringGetDatum(stmt->extname));
extScan = systable_beginscan(extRel, ExtensionNameIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2772,7 +2772,7 @@ ApplyExtensionUpdates(Oid extensionOid,
ObjectIdGetDatum(extensionOid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index c776758..0a9facf 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -1607,7 +1607,7 @@ DropCastById(Oid castOid)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(castOid));
scan = systable_beginscan(relation, CastOidIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 7ea90d0..9d9745e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1358,7 +1358,7 @@ GetDefaultOpClass(Oid type_id, Oid am_id)
ObjectIdGetDatum(am_id));
scan = systable_beginscan(rel, OpclassAmNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1838,7 +1838,7 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
* indirectly by reindex_relation).
*/
relationRelation = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(relationRelation, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(relationRelation, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index f2d78ef..3140b37 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -614,7 +614,7 @@ DefineOpClass(CreateOpClassStmt *stmt)
ObjectIdGetDatum(amoid));
scan = systable_beginscan(rel, OpclassAmNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1622,7 +1622,7 @@ RemoveAmOpEntryById(Oid entryOid)
rel = heap_open(AccessMethodOperatorRelationId, RowExclusiveLock);
scan = systable_beginscan(rel, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
/* we expect exactly one match */
tup = systable_getnext(scan);
@@ -1651,7 +1651,7 @@ RemoveAmProcEntryById(Oid entryOid)
rel = heap_open(AccessMethodProcedureRelationId, RowExclusiveLock);
scan = systable_beginscan(rel, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
/* we expect exactly one match */
tup = systable_getnext(scan);
diff --git a/src/backend/commands/proclang.c b/src/backend/commands/proclang.c
index 6e4c682..b7be1f7 100644
--- a/src/backend/commands/proclang.c
+++ b/src/backend/commands/proclang.c
@@ -455,7 +455,7 @@ find_language_template(const char *languageName)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(languageName));
scan = systable_beginscan(rel, PLTemplateNameIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
tup = systable_getnext(scan);
if (HeapTupleIsValid(tup))
diff --git a/src/backend/commands/seclabel.c b/src/backend/commands/seclabel.c
index 3b27ac2..7466e66 100644
--- a/src/backend/commands/seclabel.c
+++ b/src/backend/commands/seclabel.c
@@ -167,7 +167,7 @@ GetSharedSecurityLabel(const ObjectAddress *object, const char *provider)
pg_shseclabel = heap_open(SharedSecLabelRelationId, AccessShareLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 3, keys);
+ NULL, 3, keys);
tuple = systable_getnext(scan);
if (HeapTupleIsValid(tuple))
@@ -224,7 +224,7 @@ GetSecurityLabel(const ObjectAddress *object, const char *provider)
pg_seclabel = heap_open(SecLabelRelationId, AccessShareLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, 4, keys);
+ NULL, 4, keys);
tuple = systable_getnext(scan);
if (HeapTupleIsValid(tuple))
@@ -284,7 +284,7 @@ SetSharedSecurityLabel(const ObjectAddress *object,
pg_shseclabel = heap_open(SharedSecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 3, keys);
+ NULL, 3, keys);
oldtup = systable_getnext(scan);
if (HeapTupleIsValid(oldtup))
@@ -375,7 +375,7 @@ SetSecurityLabel(const ObjectAddress *object,
pg_seclabel = heap_open(SecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, 4, keys);
+ NULL, 4, keys);
oldtup = systable_getnext(scan);
if (HeapTupleIsValid(oldtup))
@@ -434,7 +434,7 @@ DeleteSharedSecurityLabel(Oid objectId, Oid classId)
pg_shseclabel = heap_open(SharedSecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(oldtup = systable_getnext(scan)))
simple_heap_delete(pg_shseclabel, &oldtup->t_self);
systable_endscan(scan);
@@ -485,7 +485,7 @@ DeleteSecurityLabel(const ObjectAddress *object)
pg_seclabel = heap_open(SecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, nkeys, skey);
+ NULL, nkeys, skey);
while (HeapTupleIsValid(oldtup = systable_getnext(scan)))
simple_heap_delete(pg_seclabel, &oldtup->t_self);
systable_endscan(scan);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8294b29..ecf2aad 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2738,7 +2738,7 @@ AlterTableGetLockLevel(List *cmds)
* multiple DDL operations occur in a stream against frequently accessed
* tables.
*
- * 1. Catalog tables are read using SnapshotNow, which has a race bug that
+ * 1. Catalog tables were read using SnapshotNow, which has a race bug that
* allows a scan to return no valid rows even when one is present in the
* case of a commit of a concurrent update of the catalog table.
* SnapshotNow also ignores transactions in progress, so takes the latest
@@ -3741,6 +3741,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
MemoryContext oldCxt;
List *dropped_attrs = NIL;
ListCell *lc;
+ Snapshot snapshot;
if (newrel)
ereport(DEBUG1,
@@ -3793,7 +3794,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
* Scan through the rows, generating a new row if needed and then
* checking all the constraints.
*/
- scan = heap_beginscan(oldrel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(oldrel, snapshot, 0, NULL);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -3894,6 +3896,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
MemoryContextSwitchTo(oldCxt);
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
ExecDropSingleTupleTableSlot(oldslot);
ExecDropSingleTupleTableSlot(newslot);
@@ -4170,7 +4173,7 @@ find_composite_type_dependencies(Oid typeOid, Relation origRelation,
ObjectIdGetDatum(typeOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -4269,7 +4272,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(typeOid));
- scan = heap_beginscan(classRel, SnapshotNow, 1, key);
+ scan = heap_beginscan_catalog(classRel, 1, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -6202,7 +6205,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -6683,6 +6686,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
TupleTableSlot *slot;
Form_pg_constraint constrForm;
bool isnull;
+ Snapshot snapshot;
constrForm = (Form_pg_constraint) GETSTRUCT(constrtup);
@@ -6708,7 +6712,8 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
slot = MakeSingleTupleTableSlot(tupdesc);
econtext->ecxt_scantuple = slot;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(rel, snapshot, 0, NULL);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -6732,6 +6737,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
MemoryContextSwitchTo(oldcxt);
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
ExecDropSingleTupleTableSlot(slot);
FreeExecutorState(estate);
}
@@ -6752,6 +6758,7 @@ validateForeignKeyConstraint(char *conname,
HeapScanDesc scan;
HeapTuple tuple;
Trigger trig;
+ Snapshot snapshot;
ereport(DEBUG1,
(errmsg("validating foreign key constraint \"%s\"", conname)));
@@ -6783,7 +6790,8 @@ validateForeignKeyConstraint(char *conname,
* if that tuple had just been inserted. If any of those fail, it should
* ereport(ERROR) and that's that.
*/
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(rel, snapshot, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -6815,6 +6823,7 @@ validateForeignKeyConstraint(char *conname,
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
}
static void
@@ -7033,7 +7042,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -7114,7 +7123,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(childrelid));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
/* scan for matching tuple - there should only be one */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -7514,7 +7523,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
Int32GetDatum((int32) attnum));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTup = systable_getnext(scan)))
{
@@ -7699,7 +7708,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
Int32GetDatum((int32) attnum));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTup = systable_getnext(scan)))
{
@@ -8376,7 +8385,7 @@ change_owner_fix_column_acls(Oid relationOid, Oid oldOwnerId, Oid newOwnerId)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relationOid));
scan = systable_beginscan(attRelation, AttributeRelidNumIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(attributeTuple = systable_getnext(scan)))
{
Form_pg_attribute att = (Form_pg_attribute) GETSTRUCT(attributeTuple);
@@ -8453,7 +8462,7 @@ change_owner_recurse_to_sequences(Oid relationOid, Oid newOwnerId, LOCKMODE lock
/* we leave refobjsubid unspecified */
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -9047,7 +9056,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(child_rel)));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
/* inhseqno sequences start at 1 */
inhseqno = 0;
@@ -9289,7 +9298,7 @@ MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(parent_rel)));
parent_scan = systable_beginscan(catalog_relation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &parent_key);
+ true, NULL, 1, &parent_key);
while (HeapTupleIsValid(parent_tuple = systable_getnext(parent_scan)))
{
@@ -9312,7 +9321,7 @@ MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(child_rel)));
child_scan = systable_beginscan(catalog_relation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &child_key);
+ true, NULL, 1, &child_key);
while (HeapTupleIsValid(child_tuple = systable_getnext(child_scan)))
{
@@ -9420,7 +9429,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(inheritsTuple = systable_getnext(scan)))
{
@@ -9454,7 +9463,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, AttributeRelidNumIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(attributeTuple = systable_getnext(scan)))
{
Form_pg_attribute att = (Form_pg_attribute) GETSTRUCT(attributeTuple);
@@ -9496,7 +9505,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(parent_rel)));
scan = systable_beginscan(catalogRelation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
connames = NIL;
@@ -9516,7 +9525,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(constraintTuple = systable_getnext(scan)))
{
@@ -9608,7 +9617,7 @@ drop_parent_dependency(Oid relid, Oid refclassid, Oid refobjid)
Int32GetDatum(0));
scan = systable_beginscan(catalogRelation, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTuple = systable_getnext(scan)))
{
@@ -9663,7 +9672,7 @@ ATExecAddOf(Relation rel, const TypeName *ofTypename, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relid));
scan = systable_beginscan(inheritsRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
if (HeapTupleIsValid(systable_getnext(scan)))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -10119,7 +10128,7 @@ AlterSeqNamespaces(Relation classRel, Relation rel,
/* we leave refobjsubid unspecified */
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
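
One distinction worth noting in the tablecmds.c hunks above:
ATRewriteTable(), validateCheckConstraint(), and
validateForeignKeyConstraint() scan user tables, not catalogs, so
heap_beginscan_catalog() would be the wrong tool. They instead take an
explicit snapshot, always in the same pattern:

    Snapshot	snapshot;
    HeapScanDesc scan;
    HeapTuple	tuple;

    snapshot = RegisterSnapshot(GetLatestSnapshot());
    scan = heap_beginscan(rel, snapshot, 0, NULL);

    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        /* validate or rewrite the tuple */
    }

    heap_endscan(scan);
    UnregisterSnapshot(snapshot);
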
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 8589512..ba9cb1f 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -400,7 +400,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_catalog(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
if (!HeapTupleIsValid(tuple))
@@ -831,7 +831,7 @@ RenameTableSpace(const char *oldname, const char *newname)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(oldname));
- scan = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scan = heap_beginscan_catalog(rel, 1, entry);
tup = heap_getnext(scan, ForwardScanDirection);
if (!HeapTupleIsValid(tup))
ereport(ERROR,
@@ -861,7 +861,7 @@ RenameTableSpace(const char *oldname, const char *newname)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(newname));
- scan = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scan = heap_beginscan_catalog(rel, 1, entry);
tup = heap_getnext(scan, ForwardScanDirection);
if (HeapTupleIsValid(tup))
ereport(ERROR,
@@ -910,7 +910,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(stmt->tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_catalog(rel, 1, entry);
tup = heap_getnext(scandesc, ForwardScanDirection);
if (!HeapTupleIsValid(tup))
ereport(ERROR,
@@ -1311,7 +1311,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_catalog(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
/* We assume that there can be at most one matching tuple */
@@ -1357,7 +1357,7 @@ get_tablespace_name(Oid spc_oid)
ObjectIdAttributeNumber,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(spc_oid));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_catalog(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
/* We assume that there can be at most one matching tuple */
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ed65bab..d86e9ad 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -492,7 +492,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
{
Form_pg_trigger pg_trigger = (Form_pg_trigger) GETSTRUCT(tuple);
@@ -1048,7 +1048,7 @@ RemoveTriggerById(Oid trigOid)
ObjectIdGetDatum(trigOid));
tgscan = systable_beginscan(tgrel, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(tgscan);
if (!HeapTupleIsValid(tup))
@@ -1127,7 +1127,7 @@ get_trigger_oid(Oid relid, const char *trigname, bool missing_ok)
CStringGetDatum(trigname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
tup = systable_getnext(tgscan);
@@ -1242,7 +1242,7 @@ renametrig(RenameStmt *stmt)
BTEqualStrategyNumber, F_NAMEEQ,
PointerGetDatum(stmt->newname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
@@ -1262,7 +1262,7 @@ renametrig(RenameStmt *stmt)
BTEqualStrategyNumber, F_NAMEEQ,
PointerGetDatum(stmt->subname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
{
tgoid = HeapTupleGetOid(tuple);
@@ -1359,7 +1359,7 @@ EnableDisableTrigger(Relation rel, const char *tgname,
nkeys = 1;
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, nkeys, keys);
+ NULL, nkeys, keys);
found = changed = false;
@@ -1468,7 +1468,7 @@ RelationBuildTriggers(Relation relation)
tgrel = heap_open(TriggerRelationId, AccessShareLock);
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
{
@@ -4270,7 +4270,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
ObjectIdGetDatum(namespaceId));
conscan = systable_beginscan(conrel, ConstraintNameNspIndexId,
- true, SnapshotNow, 2, skey);
+ true, NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -4333,7 +4333,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
ObjectIdGetDatum(conoid));
tgscan = systable_beginscan(tgrel, TriggerConstraintIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
{
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index 57b69f8..61ebc2e 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -921,7 +921,7 @@ makeConfigurationDependencies(HeapTuple tuple, bool removeOld,
ObjectIdGetDatum(myself.objectId));
scan = systable_beginscan(mapRel, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1059,7 +1059,7 @@ DefineTSConfiguration(List *names, List *parameters)
ObjectIdGetDatum(sourceOid));
scan = systable_beginscan(mapRel, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1138,7 +1138,7 @@ RemoveTSConfigurationById(Oid cfgId)
ObjectIdGetDatum(cfgId));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -1294,7 +1294,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
Int32GetDatum(tokens[i]));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1333,7 +1333,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
ObjectIdGetDatum(cfgId));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1450,7 +1450,7 @@ DropConfigurationMapping(AlterTSConfigurationStmt *stmt,
Int32GetDatum(tokens[i]));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 6bc16f1..031433d 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -71,6 +71,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/tqual.h"
@@ -2256,9 +2257,11 @@ AlterDomainNotNull(List *names, bool notNull)
TupleDesc tupdesc = RelationGetDescr(testrel);
HeapScanDesc scan;
HeapTuple tuple;
+ Snapshot snapshot;
/* Scan all tuples in this relation */
- scan = heap_beginscan(testrel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(testrel, snapshot, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
int i;
@@ -2288,6 +2291,7 @@ AlterDomainNotNull(List *names, bool notNull)
}
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
/* Close each rel after processing, but keep lock */
heap_close(testrel, NoLock);
@@ -2356,7 +2360,7 @@ AlterDomainDropConstraint(List *names, const char *constrName,
ObjectIdGetDatum(HeapTupleGetOid(tup)));
conscan = systable_beginscan(conrel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/*
* Scan over the result set, removing any matching entries.
@@ -2551,7 +2555,7 @@ AlterDomainValidateConstraint(List *names, char *constrName)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(domainoid));
scan = systable_beginscan(conrel, ConstraintTypidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -2638,9 +2642,11 @@ validateDomainConstraint(Oid domainoid, char *ccbin)
TupleDesc tupdesc = RelationGetDescr(testrel);
HeapScanDesc scan;
HeapTuple tuple;
+ Snapshot snapshot;
/* Scan all tuples in this relation */
- scan = heap_beginscan(testrel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(testrel, snapshot, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
int i;
@@ -2684,6 +2690,7 @@ validateDomainConstraint(Oid domainoid, char *ccbin)
ResetExprContext(econtext);
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
/* Hold relation lock till commit (XXX bad for concurrency) */
heap_close(testrel, NoLock);
@@ -2751,7 +2758,7 @@ get_rels_with_domain(Oid domainOid, LOCKMODE lockmode)
ObjectIdGetDatum(domainOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -3066,7 +3073,7 @@ GetDomainConstraints(Oid typeOid)
ObjectIdGetDatum(typeOid));
scan = systable_beginscan(conRel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(conTup = systable_getnext(scan)))
{
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 844f25c..e101a86 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -1006,7 +1006,7 @@ DropRole(DropRoleStmt *stmt)
ObjectIdGetDatum(roleid));
sscan = systable_beginscan(pg_auth_members_rel, AuthMemRoleMemIndexId,
- true, SnapshotNow, 1, &scankey);
+ true, NULL, 1, &scankey);
while (HeapTupleIsValid(tmp_tuple = systable_getnext(sscan)))
{
@@ -1021,7 +1021,7 @@ DropRole(DropRoleStmt *stmt)
ObjectIdGetDatum(roleid));
sscan = systable_beginscan(pg_auth_members_rel, AuthMemMemRoleIndexId,
- true, SnapshotNow, 1, &scankey);
+ true, NULL, 1, &scankey);
while (HeapTupleIsValid(tmp_tuple = systable_getnext(sscan)))
{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 641c740..68fc9c6 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -351,7 +351,7 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
pgclass = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(pgclass, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(pgclass, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -735,7 +735,7 @@ vac_update_datfrozenxid(void)
relation = heap_open(RelationRelationId, AccessShareLock);
scan = systable_beginscan(relation, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while ((classTup = systable_getnext(scan)) != NULL)
{
@@ -852,7 +852,7 @@ vac_truncate_clog(TransactionId frozenXID, MultiXactId frozenMulti)
*/
relation = heap_open(DatabaseRelationId, AccessShareLock);
- scan = heap_beginscan(relation, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(relation, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index cd88061..5b9f348 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1855,7 +1855,7 @@ get_database_list(void)
(void) GetTransactionSnapshot();
rel = heap_open(DatabaseRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
@@ -2002,7 +2002,7 @@ do_autovacuum(void)
* wide tables there might be proportionally much more activity in the
* TOAST table than in its parent.
*/
- relScan = heap_beginscan(classRel, SnapshotNow, 0, NULL);
+ relScan = heap_beginscan_catalog(classRel, 0, NULL);
/*
* On the first pass, we collect main tables to vacuum, and also the main
@@ -2120,7 +2120,7 @@ do_autovacuum(void)
BTEqualStrategyNumber, F_CHAREQ,
CharGetDatum(RELKIND_TOASTVALUE));
- relScan = heap_beginscan(classRel, SnapshotNow, 1, &key);
+ relScan = heap_beginscan_catalog(classRel, 1, &key);
while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
{
Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ac20dff..e539bac 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -59,6 +59,7 @@
#include "utils/memutils.h"
#include "utils/ps_status.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
@@ -1097,6 +1098,7 @@ pgstat_collect_oids(Oid catalogid)
Relation rel;
HeapScanDesc scan;
HeapTuple tup;
+ Snapshot snapshot;
memset(&hash_ctl, 0, sizeof(hash_ctl));
hash_ctl.keysize = sizeof(Oid);
@@ -1109,7 +1111,8 @@ pgstat_collect_oids(Oid catalogid)
HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
rel = heap_open(catalogid, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(rel, snapshot, 0, NULL);
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid thisoid = HeapTupleGetOid(tup);
@@ -1119,6 +1122,7 @@ pgstat_collect_oids(Oid catalogid)
(void) hash_search(htab, (void *) &thisoid, HASH_ENTER, NULL);
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
heap_close(rel, AccessShareLock);
return htab;
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index fb57621..3157aba 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -38,6 +38,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/tqual.h"
@@ -418,14 +419,17 @@ DefineQueryRewrite(char *rulename,
event_relation->rd_rel->relkind != RELKIND_MATVIEW)
{
HeapScanDesc scanDesc;
+ Snapshot snapshot;
- scanDesc = heap_beginscan(event_relation, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scanDesc = heap_beginscan(event_relation, snapshot, 0, NULL);
if (heap_getnext(scanDesc, ForwardScanDirection) != NULL)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("could not convert table \"%s\" to a view because it is not empty",
RelationGetRelationName(event_relation))));
heap_endscan(scanDesc);
+ UnregisterSnapshot(snapshot);
if (event_relation->rd_rel->relhastriggers)
ereport(ERROR,
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index a467588..d4b9708 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -2092,8 +2092,8 @@ relation_is_updatable(Oid reloid, bool include_triggers)
/*
* If the relation doesn't exist, return zero rather than throwing an
* error. This is helpful since scanning an information_schema view under
- * MVCC rules can result in referencing rels that were just deleted
- * according to a SnapshotNow probe.
+ * MVCC rules can result in referencing rels that have actually been
+ * deleted already.
*/
if (rel == NULL)
return 0;
diff --git a/src/backend/rewrite/rewriteRemove.c b/src/backend/rewrite/rewriteRemove.c
index 75fc776..51e27cf 100644
--- a/src/backend/rewrite/rewriteRemove.c
+++ b/src/backend/rewrite/rewriteRemove.c
@@ -58,7 +58,7 @@ RemoveRewriteRuleById(Oid ruleOid)
ObjectIdGetDatum(ruleOid));
rcscan = systable_beginscan(RewriteRelation, RewriteOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(rcscan);
diff --git a/src/backend/rewrite/rewriteSupport.c b/src/backend/rewrite/rewriteSupport.c
index f481c53..a687342 100644
--- a/src/backend/rewrite/rewriteSupport.c
+++ b/src/backend/rewrite/rewriteSupport.c
@@ -143,7 +143,7 @@ get_rewrite_oid_without_relid(const char *rulename,
CStringGetDatum(rulename));
RewriteRelation = heap_open(RewriteRelationId, AccessShareLock);
- scanDesc = heap_beginscan(RewriteRelation, SnapshotNow, 1, &scanKeyData);
+ scanDesc = heap_beginscan_catalog(RewriteRelation, 1, &scanKeyData);
htup = heap_getnext(scanDesc, ForwardScanDirection);
if (!HeapTupleIsValid(htup))
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index b98110c..fb91571 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -250,7 +250,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
if (flags & INV_WRITE)
{
- retval->snapshot = SnapshotNow;
+ retval->snapshot = NULL; /* instantaneous MVCC snapshot */
retval->flags = IFS_WRLOCK | IFS_RDLOCK;
}
else if (flags & INV_READ)
@@ -270,7 +270,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
errmsg("invalid flags for opening a large object: %d",
flags)));
- /* Can't use LargeObjectExists here because it always uses SnapshotNow */
+ /* Can't use LargeObjectExists here because we need to specify snapshot */
if (!myLargeObjectExists(lobjId, retval->snapshot))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
@@ -288,9 +288,8 @@ inv_close(LargeObjectDesc *obj_desc)
{
Assert(PointerIsValid(obj_desc));
- if (obj_desc->snapshot != SnapshotNow)
- UnregisterSnapshotFromOwner(obj_desc->snapshot,
- TopTransactionResourceOwner);
+ UnregisterSnapshotFromOwner(obj_desc->snapshot,
+ TopTransactionResourceOwner);
pfree(obj_desc);
}
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..5ddeffe 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -697,7 +697,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
* That leads to a couple of choices. We work from the pg_class row alone
* rather than actually opening each relation, for efficiency. We don't
* fail if we can't find the relation --- some rows might be visible in
- * the query's MVCC snapshot but already dead according to SnapshotNow.
+ * the query's MVCC snapshot even though the relations have been dropped.
* (Note: we could avoid using the catcache, but there's little point
* because the relation mapper also works "in the now".) We also don't
* fail if the relation doesn't have storage. In all these cases it
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 0d1ff61..fa61f5a 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -104,7 +104,7 @@ regprocin(PG_FUNCTION_ARGS)
hdesc = heap_open(ProcedureRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, ProcedureNameArgsNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
{
@@ -472,7 +472,7 @@ regoperin(PG_FUNCTION_ARGS)
hdesc = heap_open(OperatorRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, OperatorNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
{
@@ -843,7 +843,7 @@ regclassin(PG_FUNCTION_ARGS)
hdesc = heap_open(RelationRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, ClassNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
result = HeapTupleGetOid(tuple);
@@ -1007,7 +1007,7 @@ regtypein(PG_FUNCTION_ARGS)
hdesc = heap_open(TypeRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, TypeNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
result = HeapTupleGetOid(tuple);
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index a1ed781..cf9ce3f 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -704,7 +704,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
ObjectIdGetDatum(trigid));
tgscan = systable_beginscan(tgrel, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
ht_trig = systable_getnext(tgscan);
@@ -1796,7 +1796,7 @@ pg_get_serial_sequence(PG_FUNCTION_ARGS)
Int32GetDatum(attnum));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index cc91406..d12da76 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1182,7 +1182,7 @@ SearchCatCache(CatCache *cache,
scandesc = systable_beginscan(relation,
cache->cc_indexoid,
IndexScanOK(cache, cur_skey),
- SnapshotNow,
+ NULL,
cache->cc_nkeys,
cur_skey);
@@ -1461,7 +1461,7 @@ SearchCatCacheList(CatCache *cache,
scandesc = systable_beginscan(relation,
cache->cc_indexoid,
IndexScanOK(cache, cur_skey),
- SnapshotNow,
+ NULL,
nkeys,
cur_skey);
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index e0dc126..04b5c41 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -9,8 +9,8 @@
* consider that it is *still valid* so long as we are in the same command,
* ie, until the next CommandCounterIncrement() or transaction commit.
* (See utils/time/tqual.c, and note that system catalogs are generally
- * scanned under SnapshotNow rules by the system, or plain user snapshots
- * for user queries.) At the command boundary, the old tuple stops
+ * scanned under the most current snapshot available, rather than the
+ * transaction snapshot.) At the command boundary, the old tuple stops
* being valid and the new version, if any, becomes valid. Therefore,
* we cannot simply flush a tuple from the system caches during heap_update()
* or heap_delete(). The tuple is still good at that point; what's more,
@@ -106,6 +106,7 @@
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/relmapper.h"
+#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -478,6 +479,8 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId)
static void
LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
{
+ InvalidateCatalogSnapshot();
+
if (msg->id >= 0)
{
if (msg->cc.dbId == MyDatabaseId || msg->cc.dbId == InvalidOid)
@@ -552,6 +555,7 @@ InvalidateSystemCaches(void)
{
int i;
+ InvalidateCatalogSnapshot();
ResetCatalogCaches();
RelationCacheInvalidate(); /* gets smgr and relmap too */
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index f114038..5a2e755 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -266,7 +266,8 @@ static void unlink_initfile(const char *initfilename);
* tuple matching targetRelId. The caller must hold at least
* AccessShareLock on the target relid to prevent concurrent-update
* scenarios --- else our SnapshotNow scan might fail to find any
- * version that it thinks is live.
+ * version that it thinks is live. XXX: Now that we have MVCC
+ * catalog access, this hazard no longer exists.
*
* NB: the returned tuple has been copied into palloc'd storage
* and must eventually be freed with heap_freetuple.
@@ -305,7 +306,7 @@ ScanPgRelation(Oid targetRelId, bool indexOK)
pg_class_desc = heap_open(RelationRelationId, AccessShareLock);
pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
indexOK && criticalRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
pg_class_tuple = systable_getnext(pg_class_scan);
@@ -480,7 +481,7 @@ RelationBuildTupleDesc(Relation relation)
pg_attribute_scan = systable_beginscan(pg_attribute_desc,
AttributeRelidNumIndexId,
criticalRelcachesBuilt,
- SnapshotNow,
+ NULL,
2, skey);
/*
@@ -663,7 +664,7 @@ RelationBuildRuleLock(Relation relation)
rewrite_tupdesc = RelationGetDescr(rewrite_desc);
rewrite_scan = systable_beginscan(rewrite_desc,
RewriteRelRulenameIndexId,
- true, SnapshotNow,
+ true, NULL,
1, &key);
while (HeapTupleIsValid(rewrite_tuple = systable_getnext(rewrite_scan)))
@@ -1313,7 +1314,7 @@ LookupOpclassInfo(Oid operatorClassOid,
ObjectIdGetDatum(operatorClassOid));
rel = heap_open(OperatorClassRelationId, AccessShareLock);
scan = systable_beginscan(rel, OpclassOidIndexId, indexOK,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(htup = systable_getnext(scan)))
{
@@ -1348,7 +1349,7 @@ LookupOpclassInfo(Oid operatorClassOid,
ObjectIdGetDatum(opcentry->opcintype));
rel = heap_open(AccessMethodProcedureRelationId, AccessShareLock);
scan = systable_beginscan(rel, AccessMethodProcedureIndexId, indexOK,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
while (HeapTupleIsValid(htup = systable_getnext(scan)))
{
@@ -3317,7 +3318,7 @@ AttrDefaultFetch(Relation relation)
adrel = heap_open(AttrDefaultRelationId, AccessShareLock);
adscan = systable_beginscan(adrel, AttrDefaultIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
found = 0;
while (HeapTupleIsValid(htup = systable_getnext(adscan)))
@@ -3384,7 +3385,7 @@ CheckConstraintFetch(Relation relation)
conrel = heap_open(ConstraintRelationId, AccessShareLock);
conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(htup = systable_getnext(conscan)))
{
@@ -3487,7 +3488,7 @@ RelationGetIndexList(Relation relation)
indrel = heap_open(IndexRelationId, AccessShareLock);
indscan = systable_beginscan(indrel, IndexIndrelidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(indscan)))
{
@@ -3938,7 +3939,7 @@ RelationGetExclusionInfo(Relation indexRelation,
conrel = heap_open(ConstraintRelationId, AccessShareLock);
conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
found = false;
while (HeapTupleIsValid(htup = systable_getnext(conscan)))
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index ecb0f96..7d74ca6 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -796,6 +796,10 @@ static CatCache *SysCache[
static int SysCacheSize = lengthof(cacheinfo);
static bool CacheInitialized = false;
+static Oid SysCacheRelationOid[lengthof(cacheinfo)];
+static int SysCacheRelationOidSize;
+
+static int oid_compare(const void *a, const void *b);
/*
* InitCatalogCache - initialize the caches
@@ -809,6 +813,8 @@ void
InitCatalogCache(void)
{
int cacheId;
+ int i,
+ j = 0;
Assert(!CacheInitialized);
@@ -825,11 +831,21 @@ InitCatalogCache(void)
if (!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "could not initialize cache %u (%d)",
cacheinfo[cacheId].reloid, cacheId);
+ SysCacheRelationOid[SysCacheRelationOidSize++] =
+ cacheinfo[cacheId].reloid;
}
+
+ /* Sort and dedup OIDs. */
+ pg_qsort(SysCacheRelationOid, SysCacheRelationOidSize,
+ sizeof(Oid), oid_compare);
+ for (i = 1; i < SysCacheRelationOidSize; ++i)
+ if (SysCacheRelationOid[i] != SysCacheRelationOid[j])
+ SysCacheRelationOid[++j] = SysCacheRelationOid[i];
+ SysCacheRelationOidSize = j + 1;
+
CacheInitialized = true;
}
-
/*
* InitCatalogCachePhase2 - finish initializing the caches
*
@@ -1113,3 +1129,42 @@ SearchSysCacheList(int cacheId, int nkeys,
return SearchCatCacheList(SysCache[cacheId], nkeys,
key1, key2, key3, key4);
}
+
+/*
+ * Test whether a relation has a system cache.
+ */
+bool
+RelationHasSysCache(Oid relid)
+{
+ int low = 0,
+ high = SysCacheRelationOidSize - 1;
+
+ while (low <= high)
+ {
+ int middle = low + (high - low) / 2;
+
+ if (SysCacheRelationOid[middle] == relid)
+ return true;
+ if (SysCacheRelationOid[middle] < relid)
+ low = middle + 1;
+ else
+ high = middle - 1;
+ }
+
+ return false;
+}
+
+
+/*
+ * OID comparator for pg_qsort
+ */
+static int
+oid_compare(const void *a, const void *b)
+{
+ Oid oa = *((Oid *) a);
+ Oid ob = *((Oid *) b);
+
+ if (oa == ob)
+ return 0;
+ return (oa > ob) ? 1 : -1;
+}
diff --git a/src/backend/utils/cache/ts_cache.c b/src/backend/utils/cache/ts_cache.c
index 65a8ad7..4e79247 100644
--- a/src/backend/utils/cache/ts_cache.c
+++ b/src/backend/utils/cache/ts_cache.c
@@ -484,7 +484,7 @@ lookup_ts_config_cache(Oid cfgId)
maprel = heap_open(TSConfigMapRelationId, AccessShareLock);
mapidx = index_open(TSConfigMapIndexId, AccessShareLock);
mapscan = systable_beginscan_ordered(maprel, mapidx,
- SnapshotNow, 1, &mapskey);
+ NULL, 1, &mapskey);
while ((maptup = systable_getnext_ordered(mapscan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e0ea2e9..127f927 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -111,7 +111,7 @@ GetDatabaseTuple(const char *dbname)
relation = heap_open(DatabaseRelationId, AccessShareLock);
scan = systable_beginscan(relation, DatabaseNameIndexId,
criticalSharedRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
tuple = systable_getnext(scan);
@@ -154,7 +154,7 @@ GetDatabaseTupleByOid(Oid dboid)
relation = heap_open(DatabaseRelationId, AccessShareLock);
scan = systable_beginscan(relation, DatabaseOidIndexId,
criticalSharedRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
tuple = systable_getnext(scan);
@@ -997,18 +997,23 @@ static void
process_settings(Oid databaseid, Oid roleid)
{
Relation relsetting;
+ Snapshot snapshot;
if (!IsUnderPostmaster)
return;
relsetting = heap_open(DbRoleSettingRelationId, AccessShareLock);
+ /* read all the settings under the same snapshot for efficiency */
+ snapshot = RegisterSnapshot(GetCatalogSnapshot(DbRoleSettingRelationId));
+
/* Later settings are ignored if set earlier. */
- ApplySetting(databaseid, roleid, relsetting, PGC_S_DATABASE_USER);
- ApplySetting(InvalidOid, roleid, relsetting, PGC_S_USER);
- ApplySetting(databaseid, InvalidOid, relsetting, PGC_S_DATABASE);
- ApplySetting(InvalidOid, InvalidOid, relsetting, PGC_S_GLOBAL);
+ ApplySetting(snapshot, databaseid, roleid, relsetting, PGC_S_DATABASE_USER);
+ ApplySetting(snapshot, InvalidOid, roleid, relsetting, PGC_S_USER);
+ ApplySetting(snapshot, databaseid, InvalidOid, relsetting, PGC_S_DATABASE);
+ ApplySetting(snapshot, InvalidOid, InvalidOid, relsetting, PGC_S_GLOBAL);
+ UnregisterSnapshot(snapshot);
heap_close(relsetting, AccessShareLock);
}
@@ -1078,7 +1083,7 @@ ThereIsAtLeastOneRole(void)
pg_authid_rel = heap_open(AuthIdRelationId, AccessShareLock);
- scan = heap_beginscan(pg_authid_rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(pg_authid_rel, 0, NULL);
result = (heap_getnext(scan, ForwardScanDirection) != NULL);
heap_endscan(scan);
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e739d2d..906e9dc 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -46,10 +46,12 @@
#include "storage/predicate.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "storage/sinval.h"
#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/resowner_private.h"
#include "utils/snapmgr.h"
+#include "utils/syscache.h"
#include "utils/tqual.h"
@@ -58,17 +60,29 @@
* mode, and to the latest one taken in a read-committed transaction.
* SecondarySnapshot is a snapshot that's always up-to-date as of the current
* instant, even in transaction-snapshot mode. It should only be used for
- * special-purpose code (say, RI checking.)
+ * special-purpose code (say, RI checking.) CatalogSnapshot points to an
+ * MVCC snapshot intended to be used for catalog scans; we must refresh it
+ * whenever a system catalog change occurs.
*
* These SnapshotData structs are static to simplify memory allocation
* (see the hack in GetSnapshotData to avoid repeated malloc/free).
*/
static SnapshotData CurrentSnapshotData = {HeapTupleSatisfiesMVCC};
static SnapshotData SecondarySnapshotData = {HeapTupleSatisfiesMVCC};
+static SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
/* Pointers to valid snapshots */
static Snapshot CurrentSnapshot = NULL;
static Snapshot SecondarySnapshot = NULL;
+static Snapshot CatalogSnapshot = NULL;
+
+/*
+ * Staleness detection for CatalogSnapshot. We force a refresh whenever
+ * we've processed any invalidation messages, or any time the stale flag
+ * is set.
+ */
+static bool CatalogSnapshotStale = true;
+static uint64 CatalogSnapshotInvalidCounter;
/*
* These are updated by GetSnapshotData. We initialize them this way
@@ -177,6 +191,9 @@ GetTransactionSnapshot(void)
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
FirstSnapshotSet = true;
return CurrentSnapshot;
}
@@ -184,6 +201,9 @@ GetTransactionSnapshot(void)
if (IsolationUsesXactSnapshot())
return CurrentSnapshot;
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
return CurrentSnapshot;
@@ -201,12 +221,66 @@ GetLatestSnapshot(void)
if (!FirstSnapshotSet)
return GetTransactionSnapshot();
+ /* Don't allow catalog snapshot to be older than secondary snapshot. */
+ CatalogSnapshotStale = true;
+
SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);
return SecondarySnapshot;
}
/*
+ * GetCatalogSnapshot
+ * Get a snapshot that is sufficiently up-to-date for a scan of the
+ * system catalog with the specified OID.
+ */
+Snapshot
+GetCatalogSnapshot(Oid relid)
+{
+ /*
+ * If the caller is trying to scan a relation that does not have an
+ * associated syscache, we need to refresh the snapshot regardless of
+ * whether it appears to be stale, because sinval messages aren't sent
+ * for such relations.
+ */
+ if (!CatalogSnapshotStale && !RelationHasSysCache(relid))
+ CatalogSnapshotStale = true;
+
+ if (CatalogSnapshotStale)
+ {
+ /*
+ * Remember invalidation counter first, in case the value somehow
+ * changes while we're updating the snapshot.
+ */
+ CatalogSnapshotInvalidCounter = SharedInvalidMessageCounter;
+
+ /* Get new snapshot. */
+ CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData);
+
+ /*
+ * Mark the new snapshot as valid. We must do this last, in case an
+ * ERROR occurs inside GetSnapshotData().
+ */
+ CatalogSnapshotStale = false;
+ }
+
+ return CatalogSnapshot;
+}
+
+/*
+ * Mark the current catalog snapshot as invalid. We could change this API
+ * to allow the caller to provide more fine-grained invalidation details, so
+ * that a change to relation A wouldn't prevent us from using our cached
+ * snapshot to scan relation B, but so far there's no evidence that the CPU
+ * cycles we spent tracking such fine details would be well-spent.
+ */
+void
+InvalidateCatalogSnapshot(void)
+{
+ CatalogSnapshotStale = true;
+}
+
+/*
* SnapshotSetCommandId
* Propagate CommandCounterIncrement into the static snapshots, if set
*/
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index becc82b..9ee9ea2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -14,13 +14,13 @@
* Note that pg_dump runs in a transaction-snapshot mode transaction,
* so it sees a consistent snapshot of the database including system
* catalogs. However, it relies in part on various specialized backend
- * functions like pg_get_indexdef(), and those things tend to run on
- * SnapshotNow time, ie they look at the currently committed state. So
- * it is possible to get 'cache lookup failed' error if someone
- * performs DDL changes while a dump is happening. The window for this
- * sort of thing is from the acquisition of the transaction snapshot to
- * getSchemaData() (when pg_dump acquires AccessShareLock on every
- * table it intends to dump). It isn't very large, but it can happen.
+ * functions like pg_get_indexdef(), and those things tend to look at
+ * the currently committed state. So it is possible to get 'cache
+ * lookup failed' error if someone performs DDL changes while a dump is
+ * happening. The window for this sort of thing is from the acquisition
+ * of the transaction snapshot to getSchemaData() (when pg_dump acquires
+ * AccessShareLock on every table it intends to dump). It isn't very large,
+ * but it can happen.
*
* http://archives.postgresql.org/pgsql-bugs/2010-02/msg00187.php
*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index baa8c50..0d40398 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -105,6 +105,8 @@ typedef struct HeapScanDescData *HeapScanDesc;
extern HeapScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
+extern HeapScanDesc heap_beginscan_catalog(Relation relation, int nkeys,
+ ScanKey key);
extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 5b58028..3a86ca4 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -32,6 +32,7 @@ typedef struct HeapScanDescData
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
+ bool rs_temp_snap; /* unregister snapshot at scan end? */
/* state set up at initscan time */
BlockNumber rs_nblocks; /* number of blocks to scan */
@@ -101,6 +102,7 @@ typedef struct SysScanDescData
Relation irel; /* NULL if doing heap scan */
HeapScanDesc scan; /* only valid in heap-scan case */
IndexScanDesc iscan; /* only valid in index-scan case */
+ Snapshot snapshot; /* snapshot to unregister at end of scan */
} SysScanDescData;
#endif /* RELSCAN_H */
diff --git a/src/include/catalog/pg_db_role_setting.h b/src/include/catalog/pg_db_role_setting.h
index 070cbc8..649f5c4 100644
--- a/src/include/catalog/pg_db_role_setting.h
+++ b/src/include/catalog/pg_db_role_setting.h
@@ -62,7 +62,7 @@ typedef FormData_pg_db_role_setting *Form_pg_db_role_setting;
*/
extern void AlterSetting(Oid databaseid, Oid roleid, VariableSetStmt *setstmt);
extern void DropSetting(Oid databaseid, Oid roleid);
-extern void ApplySetting(Oid databaseid, Oid roleid, Relation relsetting,
- GucSource source);
+extern void ApplySetting(Snapshot snapshot, Oid databaseid, Oid roleid,
+ Relation relsetting, GucSource source);
#endif /* PG_DB_ROLE_SETTING_H */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index bfbd8dd..81a286c 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -28,6 +28,9 @@ extern Snapshot GetTransactionSnapshot(void);
extern Snapshot GetLatestSnapshot(void);
extern void SnapshotSetCommandId(CommandId curcid);
+extern Snapshot GetCatalogSnapshot(Oid relid);
+extern void InvalidateCatalogSnapshot(void);
+
extern void PushActiveSnapshot(Snapshot snapshot);
extern void PushCopiedSnapshot(Snapshot snapshot);
extern void UpdateActiveSnapshotCommandId(void);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index d1d8abe..68f1e11 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -125,6 +125,8 @@ struct catclist;
extern struct catclist *SearchSysCacheList(int cacheId, int nkeys,
Datum key1, Datum key2, Datum key3, Datum key4);
+extern bool RelationHasSysCache(Oid);
+
/*
* The use of the macros below rather than direct calls to the corresponding
* functions is encouraged, as it insulates the caller from changes in the
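[To make the new convention concrete: distilled from the hunks above, and
not itself part of the patch, a typical catalog-scan call site ends up
looking roughly like this. Passing NULL for the snapshot argument makes
systable_beginscan() register a fresh catalog snapshot, and
systable_endscan() unregisters it; "relid" here stands in for a
caller-supplied relation OID:

    Relation    tgrel;
    SysScanDesc tgscan;
    ScanKeyData skey;
    HeapTuple   tup;

    tgrel = heap_open(TriggerRelationId, AccessShareLock);
    ScanKeyInit(&skey,
                Anum_pg_trigger_tgrelid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(relid));
    tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
                                NULL, 1, &skey); /* NULL => catalog snapshot */
    while (HeapTupleIsValid(tup = systable_getnext(tgscan)))
    {
        /* ... inspect each pg_trigger tuple ... */
    }
    systable_endscan(tgscan);   /* also unregisters the snapshot */
    heap_close(tgrel, AccessShareLock);
]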
Robert Haas wrote:
All right, so here's a patch that does something along those lines.
We have to always take a new snapshot when somebody scans a catalog
that has no syscache, because there won't be any invalidation messages
to work off of in that case. The only catalog in that category that's
accessed during backend startup (which is mostly what your awful test
case is banging on) is pg_db_role_setting. We could add a syscache
for that catalog or somehow force invalidation messages to be sent
despite the lack of a syscache, but what I chose to do instead is
refactor things slightly so that we use the same snapshot for all four
scans of pg_db_role_setting, instead of taking a new one each time. I
think that's unimpeachable on correctness grounds; it's no different
than if we'd finished all four scans in the time it took us to finish
the first one, and then gotten put to sleep by the OS scheduler for as
long as it took us to scan the other three. Point being that there's
no interlock there.
That seems perfectly acceptable to me, yeah.
The difference is 3-4%, which is quite a lot less than what you
measured before, although on different hardware, so results may vary.
3-4% on that synthetic benchmark sounds pretty acceptable to me, as
well.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 28, 2013 at 12:22 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
The difference is 3-4%, which is quite a lot less than what you
measured before, although on different hardware, so results may vary.
3-4% on that synthetic benchmark sounds pretty acceptable to me, as
well.
Here's a further update of this patch. In this version, I added some
mechanism to send a new kind of sinval message that is sent when a
catalog without catcaches is updated; it doesn't apply to all
catalogs, just to whichever ones we want to have this treatment. That
means we don't need to retake snapshots for those catalogs on every
access, so backend startup requires just one extra MVCC snapshot as
compared with current master. Assorted cleanup has been done, along
with the removal of a few more SnapshotNow references.
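[For reference, the new message kind shows up in the xactdesc.c hunk below
as SHAREDINVALSNAPSHOT_ID, carrying just the affected relation. A sketch of
its shape — the exact struct layout and id value are assumptions here, not
quoted from the patch:

    typedef struct
    {
        int8        id;         /* message type field --- must be first */
        Oid         dbId;       /* database ID, or 0 if a shared relation */
        Oid         relId;      /* relation ID */
    } SharedInvalSnapshotMsg;

On receipt, a backend need only call InvalidateCatalogSnapshot(), matching
the LocalExecuteInvalidationMessage() change in the earlier version of the
patch; no catcache or relcache entries have to be flushed.]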
It's still possible to construct test cases that perform badly by
pounding the server with 1000 clients running Andres's
readonly-busy.sql. Consider the following test case: use a DO block
to create a schema with 10,000 functions in it and then DROP ..
CASCADE. When the server is unloaded, the extra MVCC overhead is
pretty small.
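[A minimal sketch of that test case — the exact script isn't reproduced in
this message, and the schema and function names are illustrative:

    CREATE SCHEMA lots;
    DO $$
    BEGIN
        FOR i IN 1..10000 LOOP
            EXECUTE format('CREATE FUNCTION lots.f%s() RETURNS int
                            LANGUAGE sql AS $f$ SELECT 1 $f$', i);
        END LOOP;
    END;
    $$;
    DROP SCHEMA lots CASCADE;  -- takes one MVCC snapshot per object dropped

Timings below are Create (the DO block) and Drop (the cascaded drop).]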
master
Create: 1010.225 ms, Drop: 444.891 ms
Create: 1001.237 ms, Drop: 444.084 ms
Create: 979.621 ms, Drop: 446.091 ms
patched
Create: 992.366 ms, Drop: 459.334 ms
Create: 992.436 ms, Drop: 459.921 ms
Create: 990.971 ms, Drop: 459.573 ms
The create case is actually running just a hair faster with the patch,
and the drop case is about 3% slower. Now let's add 1000 clients
running Andres's readonly-busy.sql in the background and retest:
master
Create: 21554.387 ms, Drop: 2594.534 ms
Create: 32189.395 ms, Drop: 2493.213 ms
Create: 30627.964 ms, Drop: 1813.160 ms
patched
Create: 44312.107 ms, Drop: 11718.305 ms
Create: 46683.021 ms, Drop: 11732.284 ms
Create: 50766.615 ms, Drop: 9363.742 ms
Well, now the create is 52% slower and the drop is a whopping 4.7x
slower. It's worth digging into the reasons just a bit. I was able
to speed up this case quite a bit - it was 30x slower a few hours ago
- by adding a few new relations to the switch in
RelationInvalidatesSnapshotsOnly(). But the code still takes one MVCC
snapshot per object dropped, because deleteOneObject() calls
CommandCounterIncrement() and that, as it must, invalidates our
previous snapshot. We could, if we were inclined to spend the effort,
probably work out that although we need to change curcid, the rest of
the snapshot is still OK, but I'm not too convinced that it's worth
adding an even-more-complicated mechanism for this. We could probably
also optimize the delete code to increment the command counter fewer
times, but I'm not convinced that's worth doing either.
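[For context, RelationInvalidatesSnapshotsOnly() amounts to a hard-coded
switch over the OIDs of catalogs that have no catcache, so that updates to
them need only invalidate the cached catalog snapshot. Roughly the
following shape — the precise list of relations is exactly what was being
adjusted above, so treat this as a sketch rather than the patch's contents:

    bool
    RelationInvalidatesSnapshotsOnly(Oid relid)
    {
        switch (relid)
        {
            case DbRoleSettingRelationId:
            case DependRelationId:
            case SharedDependRelationId:
            case DescriptionRelationId:
            case SharedDescriptionRelationId:
            case SecLabelRelationId:
                return true;
            default:
                break;
        }
        return false;
    }
]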
I think my general feeling about this is that we're going to have to
accept that there's no such thing as a free lunch, but maybe that's
OK. The testing done to date shows that MVCC snapshots are not
terribly expensive when PGXACT isn't heavily updated, even if you take
an awful lot of them, but with enough concurrent activity on PGXACT
they do get expensive enough to care about. And that's still not a
big problem on normal workloads, but if you combine that with a
workload that requires an abnormally high number of snapshots compared
to the overall work it does (like a DROP CASCADE on a schema with many
objects but no tables) then you can make it quite slow. That's not
great, but on the other hand, if it had always been that slow, I'm not
all that sure anyone would have complained. DDL performance is not
something we've spent a lot of cycles on, and for good reason.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
mvcc-catalog-access-v5.patch (application/octet-stream)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index e617f9b..1110719 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2046,7 +2046,7 @@ get_pkey_attnames(Relation rel, int16 *numatts)
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(indexRelation, IndexIndrelidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(indexTuple = systable_getnext(scan)))
{
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 17b832e..81ab972 100644
--- a/contrib/sepgsql/label.c
+++ b/contrib/sepgsql/label.c
@@ -727,7 +727,7 @@ exec_object_restorecon(struct selabel_handle * sehnd, Oid catalogId)
rel = heap_open(catalogId, AccessShareLock);
sscan = systable_beginscan(rel, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while (HeapTupleIsValid(tuple = systable_getnext(sscan)))
{
Form_pg_database datForm;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 570ee90..fd11689 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -713,7 +713,7 @@ amrestrpos (IndexScanDesc scan);
When using an MVCC-compliant snapshot, there is no problem because
the new occupant of the slot is certain to be too new to pass the
snapshot test. However, with a non-MVCC-compliant snapshot (such as
- <literal>SnapshotNow</>), it would be possible to accept and return
+ <literal>SnapshotAny</>), it would be possible to accept and return
a row that does not in fact match the scan keys. We could defend
against this scenario by requiring the scan keys to be rechecked
against the heap row in all cases, but that is too expensive. Instead,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1531f3b..5bcbc92 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -80,7 +80,7 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync,
- bool is_bitmapscan);
+ bool is_bitmapscan, bool temp_snap);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1286,7 +1286,17 @@ heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false);
+ true, true, false, false);
+}
+
+HeapScanDesc
+heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
+{
+ Oid relid = RelationGetRelid(relation);
+ Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+
+ return heap_beginscan_internal(relation, snapshot, nkeys, key,
+ true, true, false, true);
}
HeapScanDesc
@@ -1295,7 +1305,7 @@ heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- allow_strat, allow_sync, false);
+ allow_strat, allow_sync, false, false);
}
HeapScanDesc
@@ -1303,14 +1313,14 @@ heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- false, false, true);
+ false, false, true, false);
}
static HeapScanDesc
heap_beginscan_internal(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync,
- bool is_bitmapscan)
+ bool is_bitmapscan, bool temp_snap)
{
HeapScanDesc scan;
@@ -1335,6 +1345,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
scan->rs_strategy = NULL; /* set in initscan */
scan->rs_allow_strat = allow_strat;
scan->rs_allow_sync = allow_sync;
+ scan->rs_temp_snap = temp_snap;
/*
* we can use page-at-a-time mode if it's an MVCC-safe snapshot
@@ -1421,6 +1432,9 @@ heap_endscan(HeapScanDesc scan)
if (scan->rs_strategy != NULL)
FreeAccessStrategy(scan->rs_strategy);
+ if (scan->rs_temp_snap)
+ UnregisterSnapshot(scan->rs_snapshot);
+
pfree(scan);
}
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 31a419b..2bfe78a 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -28,6 +28,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/tqual.h"
@@ -231,7 +232,7 @@ BuildIndexValueDescription(Relation indexRelation,
* rel: catalog to scan, already opened and suitably locked
* indexId: OID of index to conditionally use
* indexOK: if false, forces a heap scan (see notes below)
- * snapshot: time qual to use (usually should be SnapshotNow)
+ * snapshot: time qual to use (NULL for a recent catalog snapshot)
* nkeys, key: scan keys
*
* The attribute numbers in the scan key should be set for the heap case.
@@ -266,6 +267,19 @@ systable_beginscan(Relation heapRelation,
sysscan->heap_rel = heapRelation;
sysscan->irel = irel;
+ if (snapshot == NULL)
+ {
+ Oid relid = RelationGetRelid(heapRelation);
+
+ snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+ sysscan->snapshot = snapshot;
+ }
+ else
+ {
+ /* Caller is responsible for any snapshot. */
+ sysscan->snapshot = NULL;
+ }
+
if (irel)
{
int i;
@@ -401,6 +415,9 @@ systable_endscan(SysScanDesc sysscan)
else
heap_endscan(sysscan->scan);
+ if (sysscan->snapshot)
+ UnregisterSnapshot(sysscan->snapshot);
+
pfree(sysscan);
}
@@ -444,6 +461,19 @@ systable_beginscan_ordered(Relation heapRelation,
sysscan->heap_rel = heapRelation;
sysscan->irel = indexRelation;
+ if (snapshot == NULL)
+ {
+ Oid relid = RelationGetRelid(heapRelation);
+
+ snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+ sysscan->snapshot = snapshot;
+ }
+ else
+ {
+ /* Caller is responsible for any snapshot. */
+ sysscan->snapshot = NULL;
+ }
+
/* Change attribute numbers to be index column numbers. */
for (i = 0; i < nkeys; i++)
{
@@ -494,5 +524,7 @@ systable_endscan_ordered(SysScanDesc sysscan)
{
Assert(sysscan->irel);
index_endscan(sysscan->iscan);
+ if (sysscan->snapshot)
+ UnregisterSnapshot(sysscan->snapshot);
pfree(sysscan);
}
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index fcf1a95..40f09e3 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -141,9 +141,10 @@ deletes index entries before deleting tuples, the super-exclusive lock
guarantees that VACUUM can't delete any heap tuple that an indexscanning
process might be about to visit. (This guarantee works only for simple
indexscans that visit the heap in sync with the index scan, not for bitmap
-scans. We only need the guarantee when using non-MVCC snapshot rules such
-as SnapshotNow, so in practice this is only important for system catalog
-accesses.)
+scans. We only need the guarantee when using non-MVCC snapshot rules; in
+an MVCC snapshot, it wouldn't matter if the heap tuple were replaced with
+an unrelated tuple at the same TID, because the new tuple wouldn't be
+visible to our scan anyway.)
Because a page can be split even while someone holds a pin on it, it is
possible that an indexscan will return items that are no longer stored on
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index c9c7b4a..7670b60 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -69,11 +69,14 @@ xact_desc_commit(StringInfo buf, xl_xact_commit *xlrec)
appendStringInfo(buf, " catalog %u", msg->cat.catId);
else if (msg->id == SHAREDINVALRELCACHE_ID)
appendStringInfo(buf, " relcache %u", msg->rc.relId);
- /* remaining cases not expected, but print something anyway */
+ /* not expected, but print something anyway */
else if (msg->id == SHAREDINVALSMGR_ID)
appendStringInfo(buf, " smgr");
+ /* not expected, but print something anyway */
else if (msg->id == SHAREDINVALRELMAP_ID)
appendStringInfo(buf, " relmap");
+ else if (msg->id == SHAREDINVALSNAPSHOT_ID)
+ appendStringInfo(buf, " snapshot %u", msg->sn.relId);
else
appendStringInfo(buf, " unknown id %d", msg->id);
}
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 8905596..d23dc45 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -611,7 +611,7 @@ boot_openrel(char *relname)
{
/* We can now load the pg_type data */
rel = heap_open(TypeRelationId, NoLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
i = 0;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
++i;
@@ -620,7 +620,7 @@ boot_openrel(char *relname)
while (i-- > 0)
*app++ = ALLOC(struct typmap, 1);
*app = NULL;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
app = Typ;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -918,7 +918,7 @@ gettype(char *type)
}
elog(DEBUG4, "external type: %s", type);
rel = heap_open(TypeRelationId, NoLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
i = 0;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
++i;
@@ -927,7 +927,7 @@ gettype(char *type)
while (i-- > 0)
*app++ = ALLOC(struct typmap, 1);
*app = NULL;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
app = Typ;
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index ced66b1..e0dcf05 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -788,7 +788,7 @@ objectsInSchemaToOids(GrantObjectType objtype, List *nspnames)
ObjectIdGetDatum(namespaceId));
rel = heap_open(ProcedureRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 1, key);
+ scan = heap_beginscan_catalog(rel, 1, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -833,7 +833,7 @@ getRelationsInNamespace(Oid namespaceId, char relkind)
CharGetDatum(relkind));
rel = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 2, key);
+ scan = heap_beginscan_catalog(rel, 2, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -1332,7 +1332,7 @@ RemoveRoleFromObjectACL(Oid roleid, Oid classid, Oid objid)
ObjectIdGetDatum(objid));
scan = systable_beginscan(rel, DefaultAclOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
@@ -1452,7 +1452,7 @@ RemoveDefaultACLById(Oid defaclOid)
ObjectIdGetDatum(defaclOid));
scan = systable_beginscan(rel, DefaultAclOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
@@ -2705,7 +2705,7 @@ ExecGrant_Largeobject(InternalGrant *istmt)
scan = systable_beginscan(relation,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -3468,7 +3468,7 @@ pg_aclmask(AclObjectKind objkind, Oid table_oid, AttrNumber attnum, Oid roleid,
return pg_language_aclmask(table_oid, roleid, mask, how);
case ACL_KIND_LARGEOBJECT:
return pg_largeobject_aclmask_snapshot(table_oid, roleid,
- mask, how, SnapshotNow);
+ mask, how, NULL);
case ACL_KIND_NAMESPACE:
return pg_namespace_aclmask(table_oid, roleid, mask, how);
case ACL_KIND_TABLESPACE:
@@ -3856,10 +3856,13 @@ pg_language_aclmask(Oid lang_oid, Oid roleid,
* Exported routine for examining a user's privileges for a largeobject
*
* When a large object is opened for reading, it is opened relative to the
- * caller's snapshot, but when it is opened for writing, it is always relative
- * to SnapshotNow, as documented in doc/src/sgml/lobj.sgml. This function
- * takes a snapshot argument so that the permissions check can be made relative
- * to the same snapshot that will be used to read the underlying data.
+ * caller's snapshot, but when it is opened for writing, a current
+ * MVCC snapshot will be used. See doc/src/sgml/lobj.sgml. This function
+ * takes a snapshot argument so that the permissions check can be made
+ * relative to the same snapshot that will be used to read the underlying
+ * data. The caller will actually pass NULL for an instantaneous MVCC
+ * snapshot, since all we do with the snapshot argument is pass it through
+ * to systable_beginscan().
*/
AclMode
pg_largeobject_aclmask_snapshot(Oid lobj_oid, Oid roleid,
@@ -4644,7 +4647,7 @@ pg_language_ownercheck(Oid lan_oid, Oid roleid)
* Ownership check for a largeobject (specified by OID)
*
* This is only used for operations like ALTER LARGE OBJECT that are always
- * relative to SnapshotNow.
+ * relative to an up-to-date snapshot.
*/
bool
pg_largeobject_ownercheck(Oid lobj_oid, Oid roleid)
@@ -4670,7 +4673,7 @@ pg_largeobject_ownercheck(Oid lobj_oid, Oid roleid)
scan = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -5032,7 +5035,7 @@ pg_extension_ownercheck(Oid ext_oid, Oid roleid)
scan = systable_beginscan(pg_extension,
ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 41a5da0..1378488 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -232,6 +232,10 @@ IsReservedName(const char *name)
* know if it's shared. Fortunately, the set of shared relations is
* fairly static, so a hand-maintained list of their OIDs isn't completely
* impractical.
+ *
+ * XXX: Now that we have MVCC catalog access, the reasoning above is no longer
+ * true. Are there other good reasons to hard-code this, or should we revisit
+ * that decision?
*/
bool
IsSharedRelation(Oid relationId)
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 69171f8..fe17c96 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -558,7 +558,7 @@ findDependentObjects(const ObjectAddress *object,
nkeys = 2;
scan = systable_beginscan(*depRel, DependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -733,7 +733,7 @@ findDependentObjects(const ObjectAddress *object,
nkeys = 2;
scan = systable_beginscan(*depRel, DependReferenceIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1069,7 +1069,7 @@ deleteOneObject(const ObjectAddress *object, Relation *depRel, int flags)
nkeys = 2;
scan = systable_beginscan(*depRel, DependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 45a84e4..4fd42ed 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1386,7 +1386,7 @@ RelationRemoveInheritance(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
simple_heap_delete(catalogRelation, &tuple->t_self);
@@ -1450,7 +1450,7 @@ DeleteAttributeTuples(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(attrel, AttributeRelidNumIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/* Delete all the matching tuples */
while ((atttup = systable_getnext(scan)) != NULL)
@@ -1491,7 +1491,7 @@ DeleteSystemAttributeTuples(Oid relid)
Int16GetDatum(0));
scan = systable_beginscan(attrel, AttributeRelidNumIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/* Delete all the matching tuples */
while ((atttup = systable_getnext(scan)) != NULL)
@@ -1623,7 +1623,7 @@ RemoveAttrDefault(Oid relid, AttrNumber attnum,
Int16GetDatum(attnum));
scan = systable_beginscan(attrdef_rel, AttrDefaultIndexId, true,
- SnapshotNow, 2, scankeys);
+ NULL, 2, scankeys);
/* There should be at most one matching tuple, but we loop anyway */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -1677,7 +1677,7 @@ RemoveAttrDefaultById(Oid attrdefId)
ObjectIdGetDatum(attrdefId));
scan = systable_beginscan(attrdef_rel, AttrDefaultOidIndexId, true,
- SnapshotNow, 1, scankeys);
+ NULL, 1, scankeys);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -2374,7 +2374,7 @@ MergeWithExistingConstraint(Relation rel, char *ccname, Node *expr,
ObjectIdGetDatum(RelationGetNamespace(rel)));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -2640,7 +2640,7 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
}
scan = systable_beginscan(pgstatistic, StatisticRelidAttnumInhIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
/* we must loop even when attnum != 0, in case of inherited stats */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -2885,7 +2885,7 @@ heap_truncate_find_FKs(List *relationIds)
fkeyRel = heap_open(ConstraintRelationId, AccessShareLock);
fkeyScan = systable_beginscan(fkeyRel, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while (HeapTupleIsValid(tuple = systable_getnext(fkeyScan)))
{
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..ca0c672 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1845,7 +1845,7 @@ index_update_stats(Relation rel,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relid));
- pg_class_scan = heap_beginscan(pg_class, SnapshotNow, 1, key);
+ pg_class_scan = heap_beginscan_catalog(pg_class, 1, key);
tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
tuple = heap_copytuple(tuple);
heap_endscan(pg_class_scan);
@@ -2181,15 +2181,10 @@ IndexBuildHeapScan(Relation heapRelation,
* Prepare for scan of the base relation. In a normal index build, we use
* SnapshotAny because we must retrieve all tuples and do our own time
* qual checks (because we have to index RECENTLY_DEAD tuples). In a
- * concurrent build, we take a regular MVCC snapshot and index whatever's
- * live according to that. During bootstrap we just use SnapshotNow.
+ * concurrent build, or during bootstrap, we take a regular MVCC snapshot
+ * and index whatever's live according to that.
*/
- if (IsBootstrapProcessingMode())
- {
- snapshot = SnapshotNow;
- OldestXmin = InvalidTransactionId; /* not used */
- }
- else if (indexInfo->ii_Concurrent)
+ if (IsBootstrapProcessingMode() || indexInfo->ii_Concurrent)
{
snapshot = RegisterSnapshot(GetTransactionSnapshot());
OldestXmin = InvalidTransactionId; /* not used */
@@ -2500,7 +2495,7 @@ IndexBuildHeapScan(Relation heapRelation,
heap_endscan(scan);
/* we can now forget our snapshot, if set */
- if (indexInfo->ii_Concurrent)
+ if (IsBootstrapProcessingMode() || indexInfo->ii_Concurrent)
UnregisterSnapshot(snapshot);
ExecDropSingleTupleTableSlot(slot);
@@ -2520,10 +2515,10 @@ IndexBuildHeapScan(Relation heapRelation,
*
* When creating an exclusion constraint, we first build the index normally
* and then rescan the heap to check for conflicts. We assume that we only
- * need to validate tuples that are live according to SnapshotNow, and that
- * these were correctly indexed even in the presence of broken HOT chains.
- * This should be OK since we are holding at least ShareLock on the table,
- * meaning there can be no uncommitted updates from other transactions.
+ * need to validate tuples that are live according to an up-to-date snapshot,
+ * and that these were correctly indexed even in the presence of broken HOT
+ * chains. This should be OK since we are holding at least ShareLock on the
+ * table, meaning there can be no uncommitted updates from other transactions.
* (Note: that wouldn't necessarily work for system catalogs, since many
* operations release write lock early on the system catalogs.)
*/
@@ -2540,6 +2535,7 @@ IndexCheckExclusion(Relation heapRelation,
TupleTableSlot *slot;
EState *estate;
ExprContext *econtext;
+ Snapshot snapshot;
/*
* If we are reindexing the target index, mark it as no longer being
@@ -2568,8 +2564,9 @@ IndexCheckExclusion(Relation heapRelation,
/*
* Scan all live tuples in the base relation.
*/
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
scan = heap_beginscan_strat(heapRelation, /* relation */
- SnapshotNow, /* snapshot */
+ snapshot, /* snapshot */
0, /* number of keys */
NULL, /* scan key */
true, /* buffer access strategy OK */
@@ -2612,6 +2609,7 @@ IndexCheckExclusion(Relation heapRelation,
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
ExecDropSingleTupleTableSlot(slot);
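IndexCheckExclusion shows the pattern this patch applies wherever SnapshotNow was previously used for an ordinary heap scan: register a fresh MVCC snapshot, scan under it, and unregister it when the scan ends. Condensed from the hunks above:

    Snapshot snapshot = RegisterSnapshot(GetLatestSnapshot());

    scan = heap_beginscan_strat(heapRelation, snapshot,
                                0, NULL,    /* no scan keys */
                                true, true);
    while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        /* ... recheck each live tuple against the exclusion constraint ... */
    }
    heap_endscan(scan);
    UnregisterSnapshot(snapshot);   /* pairs with RegisterSnapshot above */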
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 23943ff..4434dd6 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4013,8 +4013,8 @@ fetch_search_path_array(Oid *sarray, int sarray_len)
* a nonexistent object OID, rather than failing. This is to avoid race
* condition errors when a query that's scanning a catalog using an MVCC
* snapshot uses one of these functions. The underlying IsVisible functions
- * operate on SnapshotNow semantics and so might see the object as already
- * gone when it's still visible to the MVCC snapshot. (There is no race
+ * always use an up-to-date snapshot and so might see the object as already
+ * gone when it's still visible to the transaction snapshot. (There is no race
* condition in the current coding because we don't accept sinval messages
* between the SearchSysCacheExists test and the subsequent lookup.)
*/
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 215eaf5..4d22f3a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -1481,7 +1481,7 @@ get_catalog_object_by_oid(Relation catalog, Oid objectId)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(catalog, oidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
{
@@ -1544,7 +1544,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(castDesc, CastOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -1644,7 +1644,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
adscan = systable_beginscan(attrdefDesc, AttrDefaultOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(adscan);
@@ -1750,7 +1750,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amopDesc, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -1800,7 +1800,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amprocDesc, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -1848,7 +1848,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(ruleDesc, RewriteOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -1883,7 +1883,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
tgscan = systable_beginscan(trigDesc, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(tgscan);
@@ -2064,7 +2064,7 @@ getObjectDescription(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(defaclrel, DefaultAclOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(rcscan);
@@ -2816,7 +2816,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
adscan = systable_beginscan(attrdefDesc, AttrDefaultOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(adscan);
@@ -2921,7 +2921,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amopDesc, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -2965,7 +2965,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
amscan = systable_beginscan(amprocDesc, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(amscan);
@@ -3218,7 +3218,7 @@ getObjectIdentity(const ObjectAddress *object)
ObjectIdGetDatum(object->objectId));
rcscan = systable_beginscan(defaclrel, DefaultAclOidIndexId,
- true, SnapshotNow, 1, skey);
+ true, NULL, 1, skey);
tup = systable_getnext(rcscan);
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index dd00502..99f4be5 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -166,7 +166,7 @@ RemoveCollationById(Oid collationOid)
ObjectIdGetDatum(collationOid));
scandesc = systable_beginscan(rel, CollationOidIndexId, true,
- SnapshotNow, 1, &scanKeyData);
+ NULL, 1, &scanKeyData);
tuple = systable_getnext(scandesc);
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index a8eb4cb..5021420 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -412,7 +412,7 @@ ConstraintNameIsUsed(ConstraintCategory conCat, Oid objId,
ObjectIdGetDatum(objNamespace));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -506,7 +506,7 @@ ChooseConstraintName(const char *name1, const char *name2,
ObjectIdGetDatum(namespaceid));
conscan = systable_beginscan(conDesc, ConstraintNameNspIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
found = (HeapTupleIsValid(systable_getnext(conscan)));
@@ -699,7 +699,7 @@ AlterConstraintNamespaces(Oid ownerId, Oid oldNspId,
ObjectIdGetDatum(ownerId));
scan = systable_beginscan(conRel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
}
else
{
@@ -709,7 +709,7 @@ AlterConstraintNamespaces(Oid ownerId, Oid oldNspId,
ObjectIdGetDatum(ownerId));
scan = systable_beginscan(conRel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
}
while (HeapTupleIsValid((tup = systable_getnext(scan))))
@@ -778,7 +778,7 @@ get_relation_constraint_oid(Oid relid, const char *conname, bool missing_ok)
ObjectIdGetDatum(relid));
scan = systable_beginscan(pg_constraint, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -836,7 +836,7 @@ get_domain_constraint_oid(Oid typid, const char *conname, bool missing_ok)
ObjectIdGetDatum(typid));
scan = systable_beginscan(pg_constraint, ConstraintTypidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -903,7 +903,7 @@ check_functional_grouping(Oid relid,
ObjectIdGetDatum(relid));
scan = systable_beginscan(pg_constraint, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index 45d8e62..08b2a99 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -166,8 +166,7 @@ RemoveConversionById(Oid conversionOid)
/* open pg_conversion */
rel = heap_open(ConversionRelationId, RowExclusiveLock);
- scan = heap_beginscan(rel, SnapshotNow,
- 1, &scanKeyData);
+ scan = heap_beginscan_catalog(rel, 1, &scanKeyData);
/* search for the target tuple */
if (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
diff --git a/src/backend/catalog/pg_db_role_setting.c b/src/backend/catalog/pg_db_role_setting.c
index 4594912..6e19736 100644
--- a/src/backend/catalog/pg_db_role_setting.c
+++ b/src/backend/catalog/pg_db_role_setting.c
@@ -43,7 +43,7 @@ AlterSetting(Oid databaseid, Oid roleid, VariableSetStmt *setstmt)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(roleid));
scan = systable_beginscan(rel, DbRoleSettingDatidRolidIndexId, true,
- SnapshotNow, 2, scankey);
+ NULL, 2, scankey);
tuple = systable_getnext(scan);
/*
@@ -205,7 +205,7 @@ DropSetting(Oid databaseid, Oid roleid)
numkeys++;
}
- scan = heap_beginscan(relsetting, SnapshotNow, numkeys, keys);
+ scan = heap_beginscan_catalog(relsetting, numkeys, keys);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
simple_heap_delete(relsetting, &tup->t_self);
@@ -226,7 +226,8 @@ DropSetting(Oid databaseid, Oid roleid)
* databaseid/roleid.
*/
void
-ApplySetting(Oid databaseid, Oid roleid, Relation relsetting, GucSource source)
+ApplySetting(Snapshot snapshot, Oid databaseid, Oid roleid,
+ Relation relsetting, GucSource source)
{
SysScanDesc scan;
ScanKeyData keys[2];
@@ -244,7 +245,7 @@ ApplySetting(Oid databaseid, Oid roleid, Relation relsetting, GucSource source)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(relsetting, DbRoleSettingDatidRolidIndexId, true,
- SnapshotNow, 2, keys);
+ snapshot, 2, keys);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
bool isnull;
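The signature change above pushes snapshot choice out to ApplySetting's caller, presumably because backend startup wants to reuse a snapshot it is already holding rather than have this function take its own. A hypothetical call site, for illustration only:

    /* Hypothetical caller: apply per-database settings under an
     * explicitly registered snapshot. */
    Snapshot    snapshot = RegisterSnapshot(GetLatestSnapshot());
    Relation    relsetting = heap_open(DbRoleSettingRelationId,
                                       AccessShareLock);

    ApplySetting(snapshot, MyDatabaseId, InvalidOid, relsetting,
                 PGC_S_DATABASE);

    heap_close(relsetting, AccessShareLock);
    UnregisterSnapshot(snapshot);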
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 9535fba..bd5cd99 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -211,7 +211,7 @@ deleteDependencyRecordsFor(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -261,7 +261,7 @@ deleteDependencyRecordsForClass(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -343,7 +343,7 @@ changeDependencyFor(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -407,7 +407,7 @@ isObjectPinned(const ObjectAddress *object, Relation rel)
ObjectIdGetDatum(object->objectId));
scan = systable_beginscan(rel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/*
* Since we won't generate additional pg_depend entries for pinned
@@ -467,7 +467,7 @@ getExtensionOfObject(Oid classId, Oid objectId)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -520,7 +520,7 @@ sequenceIsOwned(Oid seqId, Oid *tableId, int32 *colId)
ObjectIdGetDatum(seqId));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -580,7 +580,7 @@ getOwnedSequences(Oid relid)
ObjectIdGetDatum(relid));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -643,7 +643,7 @@ get_constraint_index(Oid constraintId)
Int32GetDatum(0));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -701,7 +701,7 @@ get_index_constraint(Oid indexId)
Int32GetDatum(0));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index 7e746f9..a7ef8cd 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -156,7 +156,7 @@ EnumValuesDelete(Oid enumTypeOid)
ObjectIdGetDatum(enumTypeOid));
scan = systable_beginscan(pg_enum, EnumTypIdLabelIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -483,6 +483,9 @@ restart:
* (for example, enum_in and enum_out do so). The worst that can happen
* is a transient failure to find any valid value of the row. This is
* judged acceptable in view of the infrequency of use of RenumberEnumType.
+ *
+ * XXX: Now that we have MVCC catalog scans, the above reasoning is no longer
+ * correct. Should we revisit any decisions here?
*/
static void
RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index fbfe7bc..638e535 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -81,7 +81,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
ObjectIdGetDatum(parentrelId));
scan = systable_beginscan(relation, InheritsParentIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while ((inheritsTuple = systable_getnext(scan)) != NULL)
{
@@ -325,7 +325,7 @@ typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
ObjectIdGetDatum(this_relid));
inhscan = systable_beginscan(inhrel, InheritsRelidSeqnoIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while ((inhtup = systable_getnext(inhscan)) != NULL)
{
diff --git a/src/backend/catalog/pg_largeobject.c b/src/backend/catalog/pg_largeobject.c
index d01a5a7..22d499d 100644
--- a/src/backend/catalog/pg_largeobject.c
+++ b/src/backend/catalog/pg_largeobject.c
@@ -104,7 +104,7 @@ LargeObjectDrop(Oid loid)
scan = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
@@ -126,7 +126,7 @@ LargeObjectDrop(Oid loid)
scan = systable_beginscan(pg_largeobject,
LargeObjectLOidPNIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
simple_heap_delete(pg_largeobject, &tuple->t_self);
@@ -145,11 +145,11 @@ LargeObjectDrop(Oid loid)
* We don't use the system cache for large object metadata, for fear of
* using too much local memory.
*
- * This function always scans the system catalog using SnapshotNow, so it
- * should not be used when a large object is opened in read-only mode (because
- * large objects opened in read only mode are supposed to be viewed relative
- * to the caller's snapshot, whereas in read-write mode they are relative to
- * SnapshotNow).
+ * This function always scans the system catalog using an up-to-date snapshot,
+ * so it should not be used when a large object is opened in read-only mode
+ * (because large objects opened in read-only mode are supposed to be viewed
+ * relative to the caller's snapshot, whereas in read-write mode they are
+ * relative to an up-to-date snapshot).
*/
bool
LargeObjectExists(Oid loid)
@@ -170,7 +170,7 @@ LargeObjectExists(Oid loid)
sd = systable_beginscan(pg_lo_meta,
LargeObjectMetadataOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(sd);
if (HeapTupleIsValid(tuple))
diff --git a/src/backend/catalog/pg_range.c b/src/backend/catalog/pg_range.c
index 639b40c..b782f90 100644
--- a/src/backend/catalog/pg_range.c
+++ b/src/backend/catalog/pg_range.c
@@ -126,7 +126,7 @@ RangeDelete(Oid rangeTypeOid)
ObjectIdGetDatum(rangeTypeOid));
scan = systable_beginscan(pg_range, RangeTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c
index 7de4420..dc21c10 100644
--- a/src/backend/catalog/pg_shdepend.c
+++ b/src/backend/catalog/pg_shdepend.c
@@ -220,7 +220,7 @@ shdepChangeDep(Relation sdepRel,
Int32GetDatum(objsubid));
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 4, key);
+ NULL, 4, key);
while ((scantup = systable_getnext(scan)) != NULL)
{
@@ -554,7 +554,7 @@ checkSharedDependencies(Oid classId, Oid objectId,
ObjectIdGetDatum(objectId));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -729,7 +729,7 @@ copyTemplateDependencies(Oid templateDbId, Oid newDbId)
ObjectIdGetDatum(templateDbId));
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/* Set up to copy the tuples except for inserting newDbId */
memset(values, 0, sizeof(values));
@@ -792,7 +792,7 @@ dropDatabaseDependencies(Oid databaseId)
/* We leave the other index fields unspecified */
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -936,7 +936,7 @@ shdepDropDependency(Relation sdepRel,
}
scan = systable_beginscan(sdepRel, SharedDependDependerIndexId, true,
- SnapshotNow, nkeys, key);
+ NULL, nkeys, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1125,7 +1125,7 @@ isSharedObjectPinned(Oid classId, Oid objectId, Relation sdepRel)
ObjectIdGetDatum(objectId));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
/*
* Since we won't generate additional pg_shdepend entries for pinned
@@ -1212,7 +1212,7 @@ shdepDropOwned(List *roleids, DropBehavior behavior)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while ((tuple = systable_getnext(scan)) != NULL)
{
@@ -1319,7 +1319,7 @@ shdepReassignOwned(List *roleids, Oid newrole)
ObjectIdGetDatum(roleid));
scan = systable_beginscan(sdepRel, SharedDependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while ((tuple = systable_getnext(scan)) != NULL)
{
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..f23730c 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -480,6 +480,11 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
* against concurrent SnapshotNow scans of pg_index. Therefore this is unsafe
* to execute with less than full exclusive lock on the parent table;
* otherwise concurrent executions of RelationGetIndexList could miss indexes.
+ *
+ * XXX: Now that we have MVCC catalog access, SnapshotNow scans of pg_index
+ * shouldn't be common enough to worry about. The above comment needs
+ * to be updated, and it may be possible to simplify the logic here in other
+ * ways also.
*/
void
mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
@@ -1583,7 +1588,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
Anum_pg_index_indisclustered,
BTEqualStrategyNumber, F_BOOLEQ,
BoolGetDatum(true));
- scan = heap_beginscan(indRelation, SnapshotNow, 1, &entry);
+ scan = heap_beginscan_catalog(indRelation, 1, &entry);
while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
index = (Form_pg_index) GETSTRUCT(indexTuple);
diff --git a/src/backend/commands/comment.c b/src/backend/commands/comment.c
index 60db27c..8baf017 100644
--- a/src/backend/commands/comment.c
+++ b/src/backend/commands/comment.c
@@ -187,7 +187,7 @@ CreateComments(Oid oid, Oid classoid, int32 subid, char *comment)
description = heap_open(DescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
{
@@ -281,7 +281,7 @@ CreateSharedComments(Oid oid, Oid classoid, char *comment)
shdescription = heap_open(SharedDescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(shdescription, SharedDescriptionObjIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
{
@@ -363,7 +363,7 @@ DeleteComments(Oid oid, Oid classoid, int32 subid)
description = heap_open(DescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, nkeys, skey);
+ NULL, nkeys, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
simple_heap_delete(description, &oldtuple->t_self);
@@ -399,7 +399,7 @@ DeleteSharedComments(Oid oid, Oid classoid)
shdescription = heap_open(SharedDescriptionRelationId, RowExclusiveLock);
sd = systable_beginscan(shdescription, SharedDescriptionObjIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while ((oldtuple = systable_getnext(sd)) != NULL)
simple_heap_delete(shdescription, &oldtuple->t_self);
@@ -442,7 +442,7 @@ GetComment(Oid oid, Oid classoid, int32 subid)
tupdesc = RelationGetDescr(description);
sd = systable_beginscan(description, DescriptionObjIndexId, true,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
comment = NULL;
while ((tuple = systable_getnext(sd)) != NULL)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 0e10a75..a3a150d 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -133,7 +133,6 @@ createdb(const CreatedbStmt *stmt)
int notherbackends;
int npreparedxacts;
createdb_failure_params fparms;
- Snapshot snapshot;
/* Extract options from the statement node tree */
foreach(option, stmt->options)
@@ -538,29 +537,6 @@ createdb(const CreatedbStmt *stmt)
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
/*
- * Take an MVCC snapshot to use while scanning through pg_tablespace. For
- * safety, register the snapshot (this prevents it from changing if
- * something else were to request a snapshot during the loop).
- *
- * Traversing pg_tablespace with an MVCC snapshot is necessary to provide
- * us with a consistent view of the tablespaces that exist. Using
- * SnapshotNow here would risk seeing the same tablespace multiple times,
- * or worse not seeing a tablespace at all, if its tuple is moved around
- * by a concurrent update (eg an ACL change).
- *
- * Inconsistency of this sort is inherent to all SnapshotNow scans, unless
- * some lock is held to prevent concurrent updates of the rows being
- * sought. There should be a generic fix for that, but in the meantime
- * it's worth fixing this case in particular because we are doing very
- * heavyweight operations within the scan, so that the elapsed time for
- * the scan is vastly longer than for most other catalog scans. That
- * means there's a much wider window for concurrent updates to cause
- * trouble here than anywhere else. XXX this code should be changed
- * whenever a generic fix is implemented.
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
-
- /*
* Once we start copying subdirectories, we need to be able to clean 'em
* up if we fail. Use an ENSURE block to make sure this happens. (This
* is not a 100% solution, because of the possibility of failure during
@@ -577,7 +553,7 @@ createdb(const CreatedbStmt *stmt)
* each one to the new database.
*/
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid srctablespace = HeapTupleGetOid(tuple);
@@ -682,9 +658,6 @@ createdb(const CreatedbStmt *stmt)
PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
PointerGetDatum(&fparms));
- /* Free our snapshot */
- UnregisterSnapshot(snapshot);
-
return dboid;
}
@@ -1214,7 +1187,7 @@ movedb(const char *dbname, const char *tblspcname)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(dbname));
sysscan = systable_beginscan(pgdbrel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
oldtuple = systable_getnext(sysscan);
if (!HeapTupleIsValid(oldtuple)) /* shouldn't happen... */
ereport(ERROR,
@@ -1403,7 +1376,7 @@ AlterDatabase(AlterDatabaseStmt *stmt, bool isTopLevel)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(stmt->dbname));
scan = systable_beginscan(rel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
@@ -1498,7 +1471,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(dbname));
scan = systable_beginscan(rel, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
@@ -1637,7 +1610,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
NameGetDatum(name));
scan = systable_beginscan(relation, DatabaseNameIndexId, true,
- SnapshotNow, 1, &scanKey);
+ NULL, 1, &scanKey);
tuple = systable_getnext(scan);
@@ -1751,20 +1724,9 @@ remove_dbtablespaces(Oid db_id)
Relation rel;
HeapScanDesc scan;
HeapTuple tuple;
- Snapshot snapshot;
-
- /*
- * As in createdb(), we'd better use an MVCC snapshot here, since this
- * scan can run for a long time. Duplicate visits to tablespaces would be
- * harmless, but missing a tablespace could result in permanently leaked
- * files.
- *
- * XXX change this when a generic fix for SnapshotNow races is implemented
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1810,7 +1772,6 @@ remove_dbtablespaces(Oid db_id)
heap_endscan(scan);
heap_close(rel, AccessShareLock);
- UnregisterSnapshot(snapshot);
}
/*
@@ -1832,19 +1793,9 @@ check_db_file_conflict(Oid db_id)
Relation rel;
HeapScanDesc scan;
HeapTuple tuple;
- Snapshot snapshot;
-
- /*
- * As in createdb(), we'd better use an MVCC snapshot here; missing a
- * tablespace could result in falsely reporting the OID is unique, with
- * disastrous future consequences per the comment above.
- *
- * XXX change this when a generic fix for SnapshotNow races is implemented
- */
- snapshot = RegisterSnapshot(GetLatestSnapshot());
rel = heap_open(TableSpaceRelationId, AccessShareLock);
- scan = heap_beginscan(rel, snapshot, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1870,7 +1821,6 @@ check_db_file_conflict(Oid db_id)
heap_endscan(scan);
heap_close(rel, AccessShareLock);
- UnregisterSnapshot(snapshot);
return result;
}
@@ -1927,7 +1877,7 @@ get_database_oid(const char *dbname, bool missing_ok)
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(dbname));
scan = systable_beginscan(pg_database, DatabaseNameIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
dbtuple = systable_getnext(scan);
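createdb() and its neighbors can drop their hand-rolled snapshot management because heap_beginscan_catalog encapsulates it. A plausible minimal implementation, assuming it simply wraps heap_beginscan with a registered MVCC snapshot (the real function must additionally arrange for heap_endscan to release that snapshot):

    HeapScanDesc
    heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
    {
        /* Sketch only: take a fresh MVCC snapshot on the caller's
         * behalf; ownership details omitted. */
        Snapshot    snapshot = RegisterSnapshot(GetLatestSnapshot());

        return heap_beginscan(relation, snapshot, nkeys, key);
    }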
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 08e8cad..798c92a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -128,7 +128,7 @@ get_extension_oid(const char *extname, bool missing_ok)
CStringGetDatum(extname));
scandesc = systable_beginscan(rel, ExtensionNameIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -173,7 +173,7 @@ get_extension_name(Oid ext_oid)
ObjectIdGetDatum(ext_oid));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -212,7 +212,7 @@ get_extension_schema(Oid ext_oid)
ObjectIdGetDatum(ext_oid));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -1609,7 +1609,7 @@ RemoveExtensionById(Oid extId)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(extId));
scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
- SnapshotNow, 1, entry);
+ NULL, 1, entry);
tuple = systable_getnext(scandesc);
@@ -2107,7 +2107,7 @@ pg_extension_config_dump(PG_FUNCTION_ARGS)
ObjectIdGetDatum(CurrentExtensionObject));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2256,7 +2256,7 @@ extension_config_remove(Oid extensionoid, Oid tableoid)
ObjectIdGetDatum(extensionoid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2464,7 +2464,7 @@ AlterExtensionNamespace(List *names, const char *newschema)
ObjectIdGetDatum(extensionOid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2512,7 +2512,7 @@ AlterExtensionNamespace(List *names, const char *newschema)
ObjectIdGetDatum(extensionOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -2622,7 +2622,7 @@ ExecAlterExtensionStmt(AlterExtensionStmt *stmt)
CStringGetDatum(stmt->extname));
extScan = systable_beginscan(extRel, ExtensionNameIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
@@ -2772,7 +2772,7 @@ ApplyExtensionUpdates(Oid extensionOid,
ObjectIdGetDatum(extensionOid));
extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
extTup = systable_getnext(extScan);
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index c776758..0a9facf 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -1607,7 +1607,7 @@ DropCastById(Oid castOid)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(castOid));
scan = systable_beginscan(relation, CastOidIndexId, true,
- SnapshotNow, 1, &scankey);
+ NULL, 1, &scankey);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 7ea90d0..9d9745e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1358,7 +1358,7 @@ GetDefaultOpClass(Oid type_id, Oid am_id)
ObjectIdGetDatum(am_id));
scan = systable_beginscan(rel, OpclassAmNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1838,7 +1838,7 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
* indirectly by reindex_relation).
*/
relationRelation = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(relationRelation, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(relationRelation, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index f2d78ef..3140b37 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -614,7 +614,7 @@ DefineOpClass(CreateOpClassStmt *stmt)
ObjectIdGetDatum(amoid));
scan = systable_beginscan(rel, OpclassAmNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -1622,7 +1622,7 @@ RemoveAmOpEntryById(Oid entryOid)
rel = heap_open(AccessMethodOperatorRelationId, RowExclusiveLock);
scan = systable_beginscan(rel, AccessMethodOperatorOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
/* we expect exactly one match */
tup = systable_getnext(scan);
@@ -1651,7 +1651,7 @@ RemoveAmProcEntryById(Oid entryOid)
rel = heap_open(AccessMethodProcedureRelationId, RowExclusiveLock);
scan = systable_beginscan(rel, AccessMethodProcedureOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
/* we expect exactly one match */
tup = systable_getnext(scan);
diff --git a/src/backend/commands/proclang.c b/src/backend/commands/proclang.c
index 6e4c682..b7be1f7 100644
--- a/src/backend/commands/proclang.c
+++ b/src/backend/commands/proclang.c
@@ -455,7 +455,7 @@ find_language_template(const char *languageName)
BTEqualStrategyNumber, F_NAMEEQ,
NameGetDatum(languageName));
scan = systable_beginscan(rel, PLTemplateNameIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
tup = systable_getnext(scan);
if (HeapTupleIsValid(tup))
diff --git a/src/backend/commands/seclabel.c b/src/backend/commands/seclabel.c
index 3b27ac2..7466e66 100644
--- a/src/backend/commands/seclabel.c
+++ b/src/backend/commands/seclabel.c
@@ -167,7 +167,7 @@ GetSharedSecurityLabel(const ObjectAddress *object, const char *provider)
pg_shseclabel = heap_open(SharedSecLabelRelationId, AccessShareLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 3, keys);
+ NULL, 3, keys);
tuple = systable_getnext(scan);
if (HeapTupleIsValid(tuple))
@@ -224,7 +224,7 @@ GetSecurityLabel(const ObjectAddress *object, const char *provider)
pg_seclabel = heap_open(SecLabelRelationId, AccessShareLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, 4, keys);
+ NULL, 4, keys);
tuple = systable_getnext(scan);
if (HeapTupleIsValid(tuple))
@@ -284,7 +284,7 @@ SetSharedSecurityLabel(const ObjectAddress *object,
pg_shseclabel = heap_open(SharedSecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 3, keys);
+ NULL, 3, keys);
oldtup = systable_getnext(scan);
if (HeapTupleIsValid(oldtup))
@@ -375,7 +375,7 @@ SetSecurityLabel(const ObjectAddress *object,
pg_seclabel = heap_open(SecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, 4, keys);
+ NULL, 4, keys);
oldtup = systable_getnext(scan);
if (HeapTupleIsValid(oldtup))
@@ -434,7 +434,7 @@ DeleteSharedSecurityLabel(Oid objectId, Oid classId)
pg_shseclabel = heap_open(SharedSecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_shseclabel, SharedSecLabelObjectIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid(oldtup = systable_getnext(scan)))
simple_heap_delete(pg_shseclabel, &oldtup->t_self);
systable_endscan(scan);
@@ -485,7 +485,7 @@ DeleteSecurityLabel(const ObjectAddress *object)
pg_seclabel = heap_open(SecLabelRelationId, RowExclusiveLock);
scan = systable_beginscan(pg_seclabel, SecLabelObjectIndexId, true,
- SnapshotNow, nkeys, skey);
+ NULL, nkeys, skey);
while (HeapTupleIsValid(oldtup = systable_getnext(scan)))
simple_heap_delete(pg_seclabel, &oldtup->t_self);
systable_endscan(scan);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ea1c309..2d49646 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2741,7 +2741,7 @@ AlterTableGetLockLevel(List *cmds)
* multiple DDL operations occur in a stream against frequently accessed
* tables.
*
- * 1. Catalog tables are read using SnapshotNow, which has a race bug that
+ * 1. Catalog tables were read using SnapshotNow, which has a race bug that
* allows a scan to return no valid rows even when one is present, if a
* concurrent update of the catalog table commits mid-scan.
* SnapshotNow also ignores transactions in progress, so takes the latest
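The race in point 1 is easiest to see as a timeline; the following is illustrative commentary, not code from the patch:

    /*
     * SnapshotNow race sketch: a scan can miss both versions of a row
     * that a concurrent transaction updates while the scan is running.
     *
     *   t1: scan reaches the NEW tuple version; the updater has not yet
     *       committed, and SnapshotNow ignores in-progress transactions,
     *       so the new version is deemed invisible.
     *   t2: the updater commits.
     *   t3: scan reaches the OLD tuple version; the update has now
     *       committed, so the old version is deemed dead.
     *
     * Result: the scan returns no valid row even though one existed the
     * whole time.  An MVCC snapshot sees exactly one version instead.
     */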
@@ -3753,6 +3753,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
MemoryContext oldCxt;
List *dropped_attrs = NIL;
ListCell *lc;
+ Snapshot snapshot;
if (newrel)
ereport(DEBUG1,
@@ -3805,7 +3806,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
* Scan through the rows, generating a new row if needed and then
* checking all the constraints.
*/
- scan = heap_beginscan(oldrel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(oldrel, snapshot, 0, NULL);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -3906,6 +3908,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
MemoryContextSwitchTo(oldCxt);
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
ExecDropSingleTupleTableSlot(oldslot);
ExecDropSingleTupleTableSlot(newslot);
@@ -4182,7 +4185,7 @@ find_composite_type_dependencies(Oid typeOid, Relation origRelation,
ObjectIdGetDatum(typeOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -4281,7 +4284,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(typeOid));
- scan = heap_beginscan(classRel, SnapshotNow, 1, key);
+ scan = heap_beginscan_catalog(classRel, 1, key);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -6343,7 +6346,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -6824,6 +6827,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
TupleTableSlot *slot;
Form_pg_constraint constrForm;
bool isnull;
+ Snapshot snapshot;
constrForm = (Form_pg_constraint) GETSTRUCT(constrtup);
@@ -6849,7 +6853,8 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
slot = MakeSingleTupleTableSlot(tupdesc);
econtext->ecxt_scantuple = slot;
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(rel, snapshot, 0, NULL);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -6873,6 +6878,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
MemoryContextSwitchTo(oldcxt);
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
ExecDropSingleTupleTableSlot(slot);
FreeExecutorState(estate);
}
@@ -6893,6 +6899,7 @@ validateForeignKeyConstraint(char *conname,
HeapScanDesc scan;
HeapTuple tuple;
Trigger trig;
+ Snapshot snapshot;
ereport(DEBUG1,
(errmsg("validating foreign key constraint \"%s\"", conname)));
@@ -6924,7 +6931,8 @@ validateForeignKeyConstraint(char *conname,
* if that tuple had just been inserted. If any of those fail, it should
* ereport(ERROR) and that's that.
*/
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(rel, snapshot, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -6956,6 +6964,7 @@ validateForeignKeyConstraint(char *conname,
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
}
static void
@@ -7174,7 +7183,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -7255,7 +7264,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(childrelid));
scan = systable_beginscan(conrel, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
/* scan for matching tuple - there should only be one */
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
@@ -7655,7 +7664,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
Int32GetDatum((int32) attnum));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTup = systable_getnext(scan)))
{
@@ -7840,7 +7849,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
Int32GetDatum((int32) attnum));
scan = systable_beginscan(depRel, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTup = systable_getnext(scan)))
{
@@ -8517,7 +8526,7 @@ change_owner_fix_column_acls(Oid relationOid, Oid oldOwnerId, Oid newOwnerId)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relationOid));
scan = systable_beginscan(attRelation, AttributeRelidNumIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(attributeTuple = systable_getnext(scan)))
{
Form_pg_attribute att = (Form_pg_attribute) GETSTRUCT(attributeTuple);
@@ -8594,7 +8603,7 @@ change_owner_recurse_to_sequences(Oid relationOid, Oid newOwnerId, LOCKMODE lock
/* we leave refobjsubid unspecified */
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
@@ -9188,7 +9197,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(child_rel)));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
/* inhseqno sequences start at 1 */
inhseqno = 0;
@@ -9430,7 +9439,7 @@ MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(parent_rel)));
parent_scan = systable_beginscan(catalog_relation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &parent_key);
+ true, NULL, 1, &parent_key);
while (HeapTupleIsValid(parent_tuple = systable_getnext(parent_scan)))
{
@@ -9453,7 +9462,7 @@ MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(child_rel)));
child_scan = systable_beginscan(catalog_relation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, &child_key);
+ true, NULL, 1, &child_key);
while (HeapTupleIsValid(child_tuple = systable_getnext(child_scan)))
{
@@ -9561,7 +9570,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(inheritsTuple = systable_getnext(scan)))
{
@@ -9595,7 +9604,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, AttributeRelidNumIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(attributeTuple = systable_getnext(scan)))
{
Form_pg_attribute att = (Form_pg_attribute) GETSTRUCT(attributeTuple);
@@ -9637,7 +9646,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(parent_rel)));
scan = systable_beginscan(catalogRelation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
connames = NIL;
@@ -9657,7 +9666,7 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
scan = systable_beginscan(catalogRelation, ConstraintRelidIndexId,
- true, SnapshotNow, 1, key);
+ true, NULL, 1, key);
while (HeapTupleIsValid(constraintTuple = systable_getnext(scan)))
{
@@ -9749,7 +9758,7 @@ drop_parent_dependency(Oid relid, Oid refclassid, Oid refobjid)
Int32GetDatum(0));
scan = systable_beginscan(catalogRelation, DependDependerIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(depTuple = systable_getnext(scan)))
{
@@ -9804,7 +9813,7 @@ ATExecAddOf(Relation rel, const TypeName *ofTypename, LOCKMODE lockmode)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relid));
scan = systable_beginscan(inheritsRelation, InheritsRelidSeqnoIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
if (HeapTupleIsValid(systable_getnext(scan)))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -10260,7 +10269,7 @@ AlterSeqNamespaces(Relation classRel, Relation rel,
/* we leave refobjsubid unspecified */
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 8589512..ba9cb1f 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -400,7 +400,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_catalog(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
if (!HeapTupleIsValid(tuple))
@@ -831,7 +831,7 @@ RenameTableSpace(const char *oldname, const char *newname)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(oldname));
- scan = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scan = heap_beginscan_catalog(rel, 1, entry);
tup = heap_getnext(scan, ForwardScanDirection);
if (!HeapTupleIsValid(tup))
ereport(ERROR,
@@ -861,7 +861,7 @@ RenameTableSpace(const char *oldname, const char *newname)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(newname));
- scan = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scan = heap_beginscan_catalog(rel, 1, entry);
tup = heap_getnext(scan, ForwardScanDirection);
if (HeapTupleIsValid(tup))
ereport(ERROR,
@@ -910,7 +910,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(stmt->tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_catalog(rel, 1, entry);
tup = heap_getnext(scandesc, ForwardScanDirection);
if (!HeapTupleIsValid(tup))
ereport(ERROR,
@@ -1311,7 +1311,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok)
Anum_pg_tablespace_spcname,
BTEqualStrategyNumber, F_NAMEEQ,
CStringGetDatum(tablespacename));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_catalog(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
/* We assume that there can be at most one matching tuple */
@@ -1357,7 +1357,7 @@ get_tablespace_name(Oid spc_oid)
ObjectIdAttributeNumber,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(spc_oid));
- scandesc = heap_beginscan(rel, SnapshotNow, 1, entry);
+ scandesc = heap_beginscan_catalog(rel, 1, entry);
tuple = heap_getnext(scandesc, ForwardScanDirection);
/* We assume that there can be at most one matching tuple */
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ed65bab..d86e9ad 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -492,7 +492,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationGetRelid(rel)));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 1, &key);
+ NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
{
Form_pg_trigger pg_trigger = (Form_pg_trigger) GETSTRUCT(tuple);
@@ -1048,7 +1048,7 @@ RemoveTriggerById(Oid trigOid)
ObjectIdGetDatum(trigOid));
tgscan = systable_beginscan(tgrel, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tup = systable_getnext(tgscan);
if (!HeapTupleIsValid(tup))
@@ -1127,7 +1127,7 @@ get_trigger_oid(Oid relid, const char *trigname, bool missing_ok)
CStringGetDatum(trigname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
tup = systable_getnext(tgscan);
@@ -1242,7 +1242,7 @@ renametrig(RenameStmt *stmt)
BTEqualStrategyNumber, F_NAMEEQ,
PointerGetDatum(stmt->newname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
@@ -1262,7 +1262,7 @@ renametrig(RenameStmt *stmt)
BTEqualStrategyNumber, F_NAMEEQ,
PointerGetDatum(stmt->subname));
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
{
tgoid = HeapTupleGetOid(tuple);
@@ -1359,7 +1359,7 @@ EnableDisableTrigger(Relation rel, const char *tgname,
nkeys = 1;
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, nkeys, keys);
+ NULL, nkeys, keys);
found = changed = false;
@@ -1468,7 +1468,7 @@ RelationBuildTriggers(Relation relation)
tgrel = heap_open(TriggerRelationId, AccessShareLock);
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
{
@@ -4270,7 +4270,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
ObjectIdGetDatum(namespaceId));
conscan = systable_beginscan(conrel, ConstraintNameNspIndexId,
- true, SnapshotNow, 2, skey);
+ true, NULL, 2, skey);
while (HeapTupleIsValid(tup = systable_getnext(conscan)))
{
@@ -4333,7 +4333,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
ObjectIdGetDatum(conoid));
tgscan = systable_beginscan(tgrel, TriggerConstraintIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
{
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index 57b69f8..61ebc2e 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -921,7 +921,7 @@ makeConfigurationDependencies(HeapTuple tuple, bool removeOld,
ObjectIdGetDatum(myself.objectId));
scan = systable_beginscan(mapRel, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1059,7 +1059,7 @@ DefineTSConfiguration(List *names, List *parameters)
ObjectIdGetDatum(sourceOid));
scan = systable_beginscan(mapRel, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1138,7 +1138,7 @@ RemoveTSConfigurationById(Oid cfgId)
ObjectIdGetDatum(cfgId));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid((tup = systable_getnext(scan))))
{
@@ -1294,7 +1294,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
Int32GetDatum(tokens[i]));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1333,7 +1333,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
ObjectIdGetDatum(cfgId));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
@@ -1450,7 +1450,7 @@ DropConfigurationMapping(AlterTSConfigurationStmt *stmt,
Int32GetDatum(tokens[i]));
scan = systable_beginscan(relMap, TSConfigMapIndexId, true,
- SnapshotNow, 2, skey);
+ NULL, 2, skey);
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 6bc16f1..031433d 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -71,6 +71,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/tqual.h"
@@ -2256,9 +2257,11 @@ AlterDomainNotNull(List *names, bool notNull)
TupleDesc tupdesc = RelationGetDescr(testrel);
HeapScanDesc scan;
HeapTuple tuple;
+ Snapshot snapshot;
/* Scan all tuples in this relation */
- scan = heap_beginscan(testrel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(testrel, snapshot, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
int i;
@@ -2288,6 +2291,7 @@ AlterDomainNotNull(List *names, bool notNull)
}
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
/* Close each rel after processing, but keep lock */
heap_close(testrel, NoLock);
@@ -2356,7 +2360,7 @@ AlterDomainDropConstraint(List *names, const char *constrName,
ObjectIdGetDatum(HeapTupleGetOid(tup)));
conscan = systable_beginscan(conrel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
/*
* Scan over the result set, removing any matching entries.
@@ -2551,7 +2555,7 @@ AlterDomainValidateConstraint(List *names, char *constrName)
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(domainoid));
scan = systable_beginscan(conrel, ConstraintTypidIndexId,
- true, SnapshotNow, 1, &key);
+ true, NULL, 1, &key);
while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
@@ -2638,9 +2642,11 @@ validateDomainConstraint(Oid domainoid, char *ccbin)
TupleDesc tupdesc = RelationGetDescr(testrel);
HeapScanDesc scan;
HeapTuple tuple;
+ Snapshot snapshot;
/* Scan all tuples in this relation */
- scan = heap_beginscan(testrel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(testrel, snapshot, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
int i;
@@ -2684,6 +2690,7 @@ validateDomainConstraint(Oid domainoid, char *ccbin)
ResetExprContext(econtext);
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
/* Hold relation lock till commit (XXX bad for concurrency) */
heap_close(testrel, NoLock);
@@ -2751,7 +2758,7 @@ get_rels_with_domain(Oid domainOid, LOCKMODE lockmode)
ObjectIdGetDatum(domainOid));
depScan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 2, key);
+ NULL, 2, key);
while (HeapTupleIsValid(depTup = systable_getnext(depScan)))
{
@@ -3066,7 +3073,7 @@ GetDomainConstraints(Oid typeOid)
ObjectIdGetDatum(typeOid));
scan = systable_beginscan(conRel, ConstraintTypidIndexId, true,
- SnapshotNow, 1, key);
+ NULL, 1, key);
while (HeapTupleIsValid(conTup = systable_getnext(scan)))
{
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 844f25c..e101a86 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -1006,7 +1006,7 @@ DropRole(DropRoleStmt *stmt)
ObjectIdGetDatum(roleid));
sscan = systable_beginscan(pg_auth_members_rel, AuthMemRoleMemIndexId,
- true, SnapshotNow, 1, &scankey);
+ true, NULL, 1, &scankey);
while (HeapTupleIsValid(tmp_tuple = systable_getnext(sscan)))
{
@@ -1021,7 +1021,7 @@ DropRole(DropRoleStmt *stmt)
ObjectIdGetDatum(roleid));
sscan = systable_beginscan(pg_auth_members_rel, AuthMemMemRoleIndexId,
- true, SnapshotNow, 1, &scankey);
+ true, NULL, 1, &scankey);
while (HeapTupleIsValid(tmp_tuple = systable_getnext(sscan)))
{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 641c740..68fc9c6 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -351,7 +351,7 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
pgclass = heap_open(RelationRelationId, AccessShareLock);
- scan = heap_beginscan(pgclass, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(pgclass, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -735,7 +735,7 @@ vac_update_datfrozenxid(void)
relation = heap_open(RelationRelationId, AccessShareLock);
scan = systable_beginscan(relation, InvalidOid, false,
- SnapshotNow, 0, NULL);
+ NULL, 0, NULL);
while ((classTup = systable_getnext(scan)) != NULL)
{
@@ -852,7 +852,7 @@ vac_truncate_clog(TransactionId frozenXID, MultiXactId frozenMulti)
*/
relation = heap_open(DatabaseRelationId, AccessShareLock);
- scan = heap_beginscan(relation, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(relation, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d2b2721..bbb89e6 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -4,16 +4,16 @@
* Routines to support bitmapped scans of relations
*
* NOTE: it is critical that this plan type only be used with MVCC-compliant
- * snapshots (ie, regular snapshots, not SnapshotNow or one of the other
+ * snapshots (ie, regular snapshots, not SnapshotAny or one of the other
* special snapshots). The reason is that since index and heap scans are
* decoupled, there can be no assurance that the index tuple prompting a
* visit to a particular heap TID still exists when the visit is made.
* Therefore the tuple might not exist anymore either (which is OK because
* heap_fetch will cope) --- but worse, the tuple slot could have been
* re-used for a newer tuple. With an MVCC snapshot the newer tuple is
- * certain to fail the time qual and so it will not be mistakenly returned.
- * With SnapshotNow we might return a tuple that doesn't meet the required
- * index qual conditions.
+ * certain to fail the time qual and so it will not be mistakenly returned,
+ * but with anything else we might return a tuple that doesn't meet the
+ * required index qual conditions.
*
*
* Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index cd88061..5b9f348 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1855,7 +1855,7 @@ get_database_list(void)
(void) GetTransactionSnapshot();
rel = heap_open(DatabaseRelationId, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
@@ -2002,7 +2002,7 @@ do_autovacuum(void)
* wide tables there might be proportionally much more activity in the
* TOAST table than in its parent.
*/
- relScan = heap_beginscan(classRel, SnapshotNow, 0, NULL);
+ relScan = heap_beginscan_catalog(classRel, 0, NULL);
/*
* On the first pass, we collect main tables to vacuum, and also the main
@@ -2120,7 +2120,7 @@ do_autovacuum(void)
BTEqualStrategyNumber, F_CHAREQ,
CharGetDatum(RELKIND_TOASTVALUE));
- relScan = heap_beginscan(classRel, SnapshotNow, 1, &key);
+ relScan = heap_beginscan_catalog(classRel, 1, &key);
while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
{
Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ac20dff..e539bac 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -59,6 +59,7 @@
#include "utils/memutils.h"
#include "utils/ps_status.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
@@ -1097,6 +1098,7 @@ pgstat_collect_oids(Oid catalogid)
Relation rel;
HeapScanDesc scan;
HeapTuple tup;
+ Snapshot snapshot;
memset(&hash_ctl, 0, sizeof(hash_ctl));
hash_ctl.keysize = sizeof(Oid);
@@ -1109,7 +1111,8 @@ pgstat_collect_oids(Oid catalogid)
HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
rel = heap_open(catalogid, AccessShareLock);
- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(rel, snapshot, 0, NULL);
while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid thisoid = HeapTupleGetOid(tup);
@@ -1119,6 +1122,7 @@ pgstat_collect_oids(Oid catalogid)
(void) hash_search(htab, (void *) &thisoid, HASH_ENTER, NULL);
}
heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
heap_close(rel, AccessShareLock);
return htab;
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index fb57621..3157aba 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -38,6 +38,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/tqual.h"
@@ -418,14 +419,17 @@ DefineQueryRewrite(char *rulename,
event_relation->rd_rel->relkind != RELKIND_MATVIEW)
{
HeapScanDesc scanDesc;
+ Snapshot snapshot;
- scanDesc = heap_beginscan(event_relation, SnapshotNow, 0, NULL);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scanDesc = heap_beginscan(event_relation, snapshot, 0, NULL);
if (heap_getnext(scanDesc, ForwardScanDirection) != NULL)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("could not convert table \"%s\" to a view because it is not empty",
RelationGetRelationName(event_relation))));
heap_endscan(scanDesc);
+ UnregisterSnapshot(snapshot);
if (event_relation->rd_rel->relhastriggers)
ereport(ERROR,
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index a467588..d4b9708 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -2092,8 +2092,8 @@ relation_is_updatable(Oid reloid, bool include_triggers)
/*
* If the relation doesn't exist, return zero rather than throwing an
* error. This is helpful since scanning an information_schema view under
- * MVCC rules can result in referencing rels that were just deleted
- * according to a SnapshotNow probe.
+ * MVCC rules can result in referencing rels that have actually been
+ * deleted already.
*/
if (rel == NULL)
return 0;
diff --git a/src/backend/rewrite/rewriteRemove.c b/src/backend/rewrite/rewriteRemove.c
index 75fc776..51e27cf 100644
--- a/src/backend/rewrite/rewriteRemove.c
+++ b/src/backend/rewrite/rewriteRemove.c
@@ -58,7 +58,7 @@ RemoveRewriteRuleById(Oid ruleOid)
ObjectIdGetDatum(ruleOid));
rcscan = systable_beginscan(RewriteRelation, RewriteOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
tuple = systable_getnext(rcscan);
diff --git a/src/backend/rewrite/rewriteSupport.c b/src/backend/rewrite/rewriteSupport.c
index f481c53..a687342 100644
--- a/src/backend/rewrite/rewriteSupport.c
+++ b/src/backend/rewrite/rewriteSupport.c
@@ -143,7 +143,7 @@ get_rewrite_oid_without_relid(const char *rulename,
CStringGetDatum(rulename));
RewriteRelation = heap_open(RewriteRelationId, AccessShareLock);
- scanDesc = heap_beginscan(RewriteRelation, SnapshotNow, 1, &scanKeyData);
+ scanDesc = heap_beginscan_catalog(RewriteRelation, 1, &scanKeyData);
htup = heap_getnext(scanDesc, ForwardScanDirection);
if (!HeapTupleIsValid(htup))
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index b98110c..fb91571 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -250,7 +250,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
if (flags & INV_WRITE)
{
- retval->snapshot = SnapshotNow;
+ retval->snapshot = NULL; /* instantaneous MVCC snapshot */
retval->flags = IFS_WRLOCK | IFS_RDLOCK;
}
else if (flags & INV_READ)
@@ -270,7 +270,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
errmsg("invalid flags for opening a large object: %d",
flags)));
- /* Can't use LargeObjectExists here because it always uses SnapshotNow */
+ /* Can't use LargeObjectExists here because we need to specify snapshot */
if (!myLargeObjectExists(lobjId, retval->snapshot))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
@@ -288,9 +288,8 @@ inv_close(LargeObjectDesc *obj_desc)
{
Assert(PointerIsValid(obj_desc));
- if (obj_desc->snapshot != SnapshotNow)
- UnregisterSnapshotFromOwner(obj_desc->snapshot,
- TopTransactionResourceOwner);
+ UnregisterSnapshotFromOwner(obj_desc->snapshot,
+ TopTransactionResourceOwner);
pfree(obj_desc);
}
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..5ddeffe 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -697,7 +697,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
* That leads to a couple of choices. We work from the pg_class row alone
* rather than actually opening each relation, for efficiency. We don't
* fail if we can't find the relation --- some rows might be visible in
- * the query's MVCC snapshot but already dead according to SnapshotNow.
+ * the query's MVCC snapshot even though the relations have been dropped.
* (Note: we could avoid using the catcache, but there's little point
* because the relation mapper also works "in the now".) We also don't
* fail if the relation doesn't have storage. In all these cases it
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 0d1ff61..fa61f5a 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -104,7 +104,7 @@ regprocin(PG_FUNCTION_ARGS)
hdesc = heap_open(ProcedureRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, ProcedureNameArgsNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
{
@@ -472,7 +472,7 @@ regoperin(PG_FUNCTION_ARGS)
hdesc = heap_open(OperatorRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, OperatorNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
{
@@ -843,7 +843,7 @@ regclassin(PG_FUNCTION_ARGS)
hdesc = heap_open(RelationRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, ClassNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
result = HeapTupleGetOid(tuple);
@@ -1007,7 +1007,7 @@ regtypein(PG_FUNCTION_ARGS)
hdesc = heap_open(TypeRelationId, AccessShareLock);
sysscan = systable_beginscan(hdesc, TypeNameNspIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
result = HeapTupleGetOid(tuple);
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index a1ed781..cf9ce3f 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -704,7 +704,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
ObjectIdGetDatum(trigid));
tgscan = systable_beginscan(tgrel, TriggerOidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
ht_trig = systable_getnext(tgscan);
@@ -1796,7 +1796,7 @@ pg_get_serial_sequence(PG_FUNCTION_ARGS)
Int32GetDatum(attnum));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
- SnapshotNow, 3, key);
+ NULL, 3, key);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index cc91406..d12da76 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1182,7 +1182,7 @@ SearchCatCache(CatCache *cache,
scandesc = systable_beginscan(relation,
cache->cc_indexoid,
IndexScanOK(cache, cur_skey),
- SnapshotNow,
+ NULL,
cache->cc_nkeys,
cur_skey);
@@ -1461,7 +1461,7 @@ SearchCatCacheList(CatCache *cache,
scandesc = systable_beginscan(relation,
cache->cc_indexoid,
IndexScanOK(cache, cur_skey),
- SnapshotNow,
+ NULL,
nkeys,
cur_skey);
diff --git a/src/backend/utils/cache/evtcache.c b/src/backend/utils/cache/evtcache.c
index 2180f2a..c2242c4 100644
--- a/src/backend/utils/cache/evtcache.c
+++ b/src/backend/utils/cache/evtcache.c
@@ -129,13 +129,11 @@ BuildEventTriggerCache(void)
HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
/*
- * Prepare to scan pg_event_trigger in name order. We use an MVCC
- * snapshot to avoid getting inconsistent results if the table is being
- * concurrently updated.
+ * Prepare to scan pg_event_trigger in name order.
*/
rel = relation_open(EventTriggerRelationId, AccessShareLock);
irel = index_open(EventTriggerNameIndexId, AccessShareLock);
- scan = systable_beginscan_ordered(rel, irel, GetLatestSnapshot(), 0, NULL);
+ scan = systable_beginscan_ordered(rel, irel, NULL, 0, NULL);
/*
* Build a cache item for each pg_event_trigger tuple, and append each one
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index e0dc126..3356d0f 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -9,8 +9,8 @@
* consider that it is *still valid* so long as we are in the same command,
* ie, until the next CommandCounterIncrement() or transaction commit.
* (See utils/time/tqual.c, and note that system catalogs are generally
- * scanned under SnapshotNow rules by the system, or plain user snapshots
- * for user queries.) At the command boundary, the old tuple stops
+ * scanned under the most current snapshot available, rather than the
+ * transaction snapshot.) At the command boundary, the old tuple stops
* being valid and the new version, if any, becomes valid. Therefore,
* we cannot simply flush a tuple from the system caches during heap_update()
* or heap_delete(). The tuple is still good at that point; what's more,
@@ -106,6 +106,7 @@
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/relmapper.h"
+#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -373,6 +374,29 @@ AddRelcacheInvalidationMessage(InvalidationListHeader *hdr,
}
/*
+ * Add a snapshot inval entry
+ */
+static void
+AddSnapshotInvalidationMessage(InvalidationListHeader *hdr,
+ Oid dbId, Oid relId)
+{
+ SharedInvalidationMessage msg;
+
+ /* Don't add a duplicate item */
+ /* We assume dbId need not be checked because it will never change */
+ ProcessMessageList(hdr->rclist,
+ if (msg->sn.id == SHAREDINVALSNAPSHOT_ID &&
+ msg->sn.relId == relId)
+ return);
+
+ /* OK, add the item */
+ msg.sn.id = SHAREDINVALSNAPSHOT_ID;
+ msg.sn.dbId = dbId;
+ msg.sn.relId = relId;
+ AddInvalidationMessage(&hdr->rclist, &msg);
+}
+
+/*
* Append one list of invalidation messages to another, resetting
* the source list to empty.
*/
@@ -469,6 +493,19 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId)
}
/*
+ * RegisterSnapshotInvalidation
+ *
+ * Register an invalidation event for MVCC scans against a given catalog.
+ * Only needed for catalogs that don't have catcaches.
+ */
+static void
+RegisterSnapshotInvalidation(Oid dbId, Oid relId)
+{
+ AddSnapshotInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs,
+ dbId, relId);
+}
+
+/*
* LocalExecuteInvalidationMessage
*
* Process a single invalidation message (which could be of any type).
@@ -482,6 +519,8 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
{
if (msg->cc.dbId == MyDatabaseId || msg->cc.dbId == InvalidOid)
{
+ InvalidateCatalogSnapshot();
+
CatalogCacheIdInvalidate(msg->cc.id, msg->cc.hashValue);
CallSyscacheCallbacks(msg->cc.id, msg->cc.hashValue);
@@ -491,6 +530,8 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
{
if (msg->cat.dbId == MyDatabaseId || msg->cat.dbId == InvalidOid)
{
+ InvalidateCatalogSnapshot();
+
CatalogCacheFlushCatalog(msg->cat.catId);
/* CatalogCacheFlushCatalog calls CallSyscacheCallbacks as needed */
@@ -532,6 +573,14 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
else if (msg->rm.dbId == MyDatabaseId)
RelationMapInvalidate(false);
}
+ else if (msg->id == SHAREDINVALSNAPSHOT_ID)
+ {
+ /* We only care about our own database and shared catalogs */
+ if (msg->sn.dbId == InvalidOid)
+ InvalidateCatalogSnapshot();
+ else if (msg->sn.dbId == MyDatabaseId)
+ InvalidateCatalogSnapshot();
+ }
else
elog(FATAL, "unrecognized SI message ID: %d", msg->id);
}
@@ -552,6 +601,7 @@ InvalidateSystemCaches(void)
{
int i;
+ InvalidateCatalogSnapshot();
ResetCatalogCaches();
RelationCacheInvalidate(); /* gets smgr and relmap too */
@@ -1006,8 +1056,15 @@ CacheInvalidateHeapTuple(Relation relation,
/*
* First let the catcache do its thing
*/
- PrepareToInvalidateCacheTuple(relation, tuple, newtuple,
- RegisterCatcacheInvalidation);
+ tupleRelId = RelationGetRelid(relation);
+ if (RelationInvalidatesSnapshotsOnly(tupleRelId))
+ {
+ databaseId = IsSharedRelation(tupleRelId) ? InvalidOid : MyDatabaseId;
+ RegisterSnapshotInvalidation(databaseId, tupleRelId);
+ }
+ else
+ PrepareToInvalidateCacheTuple(relation, tuple, newtuple,
+ RegisterCatcacheInvalidation);
/*
* Now, is this tuple one of the primary definers of a relcache entry?
@@ -1015,8 +1072,6 @@ CacheInvalidateHeapTuple(Relation relation,
* Note we ignore newtuple here; we assume an update cannot move a tuple
* from being part of one relcache entry to being part of another.
*/
- tupleRelId = RelationGetRelid(relation);
-
if (tupleRelId == RelationRelationId)
{
Form_pg_class classtup = (Form_pg_class) GETSTRUCT(tuple);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index f114038..5a2e755 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -266,7 +266,8 @@ static void unlink_initfile(const char *initfilename);
* tuple matching targetRelId. The caller must hold at least
* AccessShareLock on the target relid to prevent concurrent-update
* scenarios --- else our SnapshotNow scan might fail to find any
- * version that it thinks is live.
+ * version that it thinks is live. XXX: Now that we have MVCC
+ * catalog access, this hazard no longer exists.
*
* NB: the returned tuple has been copied into palloc'd storage
* and must eventually be freed with heap_freetuple.
@@ -305,7 +306,7 @@ ScanPgRelation(Oid targetRelId, bool indexOK)
pg_class_desc = heap_open(RelationRelationId, AccessShareLock);
pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
indexOK && criticalRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
pg_class_tuple = systable_getnext(pg_class_scan);
@@ -480,7 +481,7 @@ RelationBuildTupleDesc(Relation relation)
pg_attribute_scan = systable_beginscan(pg_attribute_desc,
AttributeRelidNumIndexId,
criticalRelcachesBuilt,
- SnapshotNow,
+ NULL,
2, skey);
/*
@@ -663,7 +664,7 @@ RelationBuildRuleLock(Relation relation)
rewrite_tupdesc = RelationGetDescr(rewrite_desc);
rewrite_scan = systable_beginscan(rewrite_desc,
RewriteRelRulenameIndexId,
- true, SnapshotNow,
+ true, NULL,
1, &key);
while (HeapTupleIsValid(rewrite_tuple = systable_getnext(rewrite_scan)))
@@ -1313,7 +1314,7 @@ LookupOpclassInfo(Oid operatorClassOid,
ObjectIdGetDatum(operatorClassOid));
rel = heap_open(OperatorClassRelationId, AccessShareLock);
scan = systable_beginscan(rel, OpclassOidIndexId, indexOK,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
if (HeapTupleIsValid(htup = systable_getnext(scan)))
{
@@ -1348,7 +1349,7 @@ LookupOpclassInfo(Oid operatorClassOid,
ObjectIdGetDatum(opcentry->opcintype));
rel = heap_open(AccessMethodProcedureRelationId, AccessShareLock);
scan = systable_beginscan(rel, AccessMethodProcedureIndexId, indexOK,
- SnapshotNow, 3, skey);
+ NULL, 3, skey);
while (HeapTupleIsValid(htup = systable_getnext(scan)))
{
@@ -3317,7 +3318,7 @@ AttrDefaultFetch(Relation relation)
adrel = heap_open(AttrDefaultRelationId, AccessShareLock);
adscan = systable_beginscan(adrel, AttrDefaultIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
found = 0;
while (HeapTupleIsValid(htup = systable_getnext(adscan)))
@@ -3384,7 +3385,7 @@ CheckConstraintFetch(Relation relation)
conrel = heap_open(ConstraintRelationId, AccessShareLock);
conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
while (HeapTupleIsValid(htup = systable_getnext(conscan)))
{
@@ -3487,7 +3488,7 @@ RelationGetIndexList(Relation relation)
indrel = heap_open(IndexRelationId, AccessShareLock);
indscan = systable_beginscan(indrel, IndexIndrelidIndexId, true,
- SnapshotNow, 1, &skey);
+ NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(indscan)))
{
@@ -3938,7 +3939,7 @@ RelationGetExclusionInfo(Relation indexRelation,
conrel = heap_open(ConstraintRelationId, AccessShareLock);
conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
- SnapshotNow, 1, skey);
+ NULL, 1, skey);
found = false;
while (HeapTupleIsValid(htup = systable_getnext(conscan)))
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index ecb0f96..1ff2f2b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -33,7 +33,10 @@
#include "catalog/pg_constraint.h"
#include "catalog/pg_conversion.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_default_acl.h"
+#include "catalog/pg_depend.h"
+#include "catalog/pg_description.h"
#include "catalog/pg_enum.h"
#include "catalog/pg_event_trigger.h"
#include "catalog/pg_foreign_data_wrapper.h"
@@ -47,6 +50,10 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_range.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_seclabel.h"
+#include "catalog/pg_shdepend.h"
+#include "catalog/pg_shdescription.h"
+#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
@@ -796,6 +803,10 @@ static CatCache *SysCache[
static int SysCacheSize = lengthof(cacheinfo);
static bool CacheInitialized = false;
+static Oid SysCacheRelationOid[lengthof(cacheinfo)];
+static int SysCacheRelationOidSize;
+
+static int oid_compare(const void *a, const void *b);
/*
* InitCatalogCache - initialize the caches
@@ -809,6 +820,8 @@ void
InitCatalogCache(void)
{
int cacheId;
+ int i,
+ j = 0;
Assert(!CacheInitialized);
@@ -825,11 +838,23 @@ InitCatalogCache(void)
if (!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "could not initialize cache %u (%d)",
cacheinfo[cacheId].reloid, cacheId);
+ SysCacheRelationOid[SysCacheRelationOidSize++] =
+ cacheinfo[cacheId].reloid;
+ /* see comments for RelationInvalidatesSnapshotsOnly */
+ Assert(!RelationInvalidatesSnapshotsOnly(cacheinfo[cacheId].reloid));
}
+
+ /* Sort and dedup OIDs. */
+ pg_qsort(SysCacheRelationOid, SysCacheRelationOidSize,
+ sizeof(Oid), oid_compare);
+ for (i = 1; i < SysCacheRelationOidSize; ++i)
+ if (SysCacheRelationOid[i] != SysCacheRelationOid[j])
+ SysCacheRelationOid[++j] = SysCacheRelationOid[i];
+ SysCacheRelationOidSize = j + 1;
+
CacheInitialized = true;
}
-
/*
* InitCatalogCachePhase2 - finish initializing the caches
*
@@ -1113,3 +1138,73 @@ SearchSysCacheList(int cacheId, int nkeys,
return SearchCatCacheList(SysCache[cacheId], nkeys,
key1, key2, key3, key4);
}
+
+/*
+ * Certain relations that do not have system caches send snapshot invalidation
+ * messages in lieu of catcache messages. This is for the benefit of
+ * GetCatalogSnapshot(), which can then reuse its existing MVCC snapshot
+ * for scanning one of those catalogs, rather than taking a new one, if no
+ * invalidation has been received.
+ *
+ * Relations that have syscaches need not (and must not) be listed here. The
+ * catcache invalidation messages will also flush the snapshot. If you add a
+ * syscache for one of these relations, remove it from this list.
+ */
+bool
+RelationInvalidatesSnapshotsOnly(Oid relid)
+{
+ switch (relid)
+ {
+ case DbRoleSettingRelationId:
+ case DependRelationId:
+ case SharedDependRelationId:
+ case DescriptionRelationId:
+ case SharedDescriptionRelationId:
+ case SecLabelRelationId:
+ case SharedSecLabelRelationId:
+ return true;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Test whether a relation has a system cache.
+ */
+bool
+RelationHasSysCache(Oid relid)
+{
+ int low = 0,
+ high = SysCacheRelationOidSize - 1;
+
+ while (low <= high)
+ {
+ int middle = low + (high - low) / 2;
+
+ if (SysCacheRelationOid[middle] == relid)
+ return true;
+ if (SysCacheRelationOid[middle] < relid)
+ low = middle + 1;
+ else
+ high = middle - 1;
+ }
+
+ return false;
+}
+
+
+/*
+ * OID comparator for pg_qsort
+ */
+static int
+oid_compare(const void *a, const void *b)
+{
+ Oid oa = *((Oid *) a);
+ Oid ob = *((Oid *) b);
+
+ if (oa == ob)
+ return 0;
+ return (oa > ob) ? 1 : -1;
+}
diff --git a/src/backend/utils/cache/ts_cache.c b/src/backend/utils/cache/ts_cache.c
index 65a8ad7..4e79247 100644
--- a/src/backend/utils/cache/ts_cache.c
+++ b/src/backend/utils/cache/ts_cache.c
@@ -484,7 +484,7 @@ lookup_ts_config_cache(Oid cfgId)
maprel = heap_open(TSConfigMapRelationId, AccessShareLock);
mapidx = index_open(TSConfigMapIndexId, AccessShareLock);
mapscan = systable_beginscan_ordered(maprel, mapidx,
- SnapshotNow, 1, &mapskey);
+ NULL, 1, &mapskey);
while ((maptup = systable_getnext_ordered(mapscan, ForwardScanDirection)) != NULL)
{
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 2fa6d33..04cb74c 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -1082,12 +1082,7 @@ load_enum_cache_data(TypeCacheEntry *tcache)
items = (EnumItem *) palloc(sizeof(EnumItem) * maxitems);
numitems = 0;
- /*
- * Scan pg_enum for the members of the target enum type. We use a current
- * MVCC snapshot, *not* SnapshotNow, so that we see a consistent set of
- * rows even if someone commits a renumbering of the enum meanwhile. See
- * comments for RenumberEnumType in catalog/pg_enum.c for more info.
- */
+ /* Scan pg_enum for the members of the target enum type. */
ScanKeyInit(&skey,
Anum_pg_enum_enumtypid,
BTEqualStrategyNumber, F_OIDEQ,
@@ -1096,7 +1091,7 @@ load_enum_cache_data(TypeCacheEntry *tcache)
enum_rel = heap_open(EnumRelationId, AccessShareLock);
enum_scan = systable_beginscan(enum_rel,
EnumTypIdLabelIndexId,
- true, GetLatestSnapshot(),
+ true, NULL,
1, &skey);
while (HeapTupleIsValid(enum_tuple = systable_getnext(enum_scan)))
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e0ea2e9..127f927 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -111,7 +111,7 @@ GetDatabaseTuple(const char *dbname)
relation = heap_open(DatabaseRelationId, AccessShareLock);
scan = systable_beginscan(relation, DatabaseNameIndexId,
criticalSharedRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
tuple = systable_getnext(scan);
@@ -154,7 +154,7 @@ GetDatabaseTupleByOid(Oid dboid)
relation = heap_open(DatabaseRelationId, AccessShareLock);
scan = systable_beginscan(relation, DatabaseOidIndexId,
criticalSharedRelcachesBuilt,
- SnapshotNow,
+ NULL,
1, key);
tuple = systable_getnext(scan);
@@ -997,18 +997,23 @@ static void
process_settings(Oid databaseid, Oid roleid)
{
Relation relsetting;
+ Snapshot snapshot;
if (!IsUnderPostmaster)
return;
relsetting = heap_open(DbRoleSettingRelationId, AccessShareLock);
+ /* read all the settings under the same snapshot for efficiency */
+ snapshot = RegisterSnapshot(GetCatalogSnapshot(DbRoleSettingRelationId));
+
/* Later settings are ignored if set earlier. */
- ApplySetting(databaseid, roleid, relsetting, PGC_S_DATABASE_USER);
- ApplySetting(InvalidOid, roleid, relsetting, PGC_S_USER);
- ApplySetting(databaseid, InvalidOid, relsetting, PGC_S_DATABASE);
- ApplySetting(InvalidOid, InvalidOid, relsetting, PGC_S_GLOBAL);
+ ApplySetting(snapshot, databaseid, roleid, relsetting, PGC_S_DATABASE_USER);
+ ApplySetting(snapshot, InvalidOid, roleid, relsetting, PGC_S_USER);
+ ApplySetting(snapshot, databaseid, InvalidOid, relsetting, PGC_S_DATABASE);
+ ApplySetting(snapshot, InvalidOid, InvalidOid, relsetting, PGC_S_GLOBAL);
+ UnregisterSnapshot(snapshot);
heap_close(relsetting, AccessShareLock);
}
@@ -1078,7 +1083,7 @@ ThereIsAtLeastOneRole(void)
pg_authid_rel = heap_open(AuthIdRelationId, AccessShareLock);
- scan = heap_beginscan(pg_authid_rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan_catalog(pg_authid_rel, 0, NULL);
result = (heap_getnext(scan, ForwardScanDirection) != NULL);
heap_endscan(scan);
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e739d2d..584d70c 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -46,10 +46,12 @@
#include "storage/predicate.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "storage/sinval.h"
#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/resowner_private.h"
#include "utils/snapmgr.h"
+#include "utils/syscache.h"
#include "utils/tqual.h"
@@ -58,17 +60,26 @@
* mode, and to the latest one taken in a read-committed transaction.
* SecondarySnapshot is a snapshot that's always up-to-date as of the current
* instant, even in transaction-snapshot mode. It should only be used for
- * special-purpose code (say, RI checking.)
+ * special-purpose code (say, RI checking.) CatalogSnapshot points to an
+ * MVCC snapshot intended to be used for catalog scans; we must refresh it
+ * whenever a system catalog change occurs.
*
* These SnapshotData structs are static to simplify memory allocation
* (see the hack in GetSnapshotData to avoid repeated malloc/free).
*/
static SnapshotData CurrentSnapshotData = {HeapTupleSatisfiesMVCC};
static SnapshotData SecondarySnapshotData = {HeapTupleSatisfiesMVCC};
+static SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
/* Pointers to valid snapshots */
static Snapshot CurrentSnapshot = NULL;
static Snapshot SecondarySnapshot = NULL;
+static Snapshot CatalogSnapshot = NULL;
+
+/*
+ * Staleness detection for CatalogSnapshot.
+ */
+static bool CatalogSnapshotStale = true;
/*
* These are updated by GetSnapshotData. We initialize them this way
@@ -177,6 +188,9 @@ GetTransactionSnapshot(void)
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
FirstSnapshotSet = true;
return CurrentSnapshot;
}
@@ -184,6 +198,9 @@ GetTransactionSnapshot(void)
if (IsolationUsesXactSnapshot())
return CurrentSnapshot;
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
return CurrentSnapshot;
@@ -207,6 +224,54 @@ GetLatestSnapshot(void)
}
/*
+ * GetCatalogSnapshot
+ * Get a snapshot that is sufficiently up-to-date for scan of the
+ * system catalog with the specified OID.
+ */
+Snapshot
+GetCatalogSnapshot(Oid relid)
+{
+ /*
+ * If the caller is trying to scan a relation that has no syscache,
+ * no catcache invalidations will be sent when it is updated. For
+ * a few key relations, snapshot invalidations are sent instead. If
+ * we're trying to scan a relation for which neither catcache nor
+ * snapshot invalidations are sent, we must refresh the snapshot every
+ * time.
+ */
+ if (!CatalogSnapshotStale && !RelationInvalidatesSnapshotsOnly(relid) &&
+ !RelationHasSysCache(relid))
+ CatalogSnapshotStale = true;
+
+ if (CatalogSnapshotStale)
+ {
+ /* Get new snapshot. */
+ CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData);
+
+ /*
+ * Mark the new snapshot as valid. We must do this last, in case an
+ * ERROR occurs inside GetSnapshotData().
+ */
+ CatalogSnapshotStale = false;
+ }
+
+ return CatalogSnapshot;
+}
+
+/*
+ * Mark the current catalog snapshot as invalid. We could change this API
+ * to allow the caller to provide more fine-grained invalidation details, so
+ * that a change to relation A wouldn't prevent us from using our cached
+ * snapshot to scan relation B, but so far there's no evidence that the CPU
+ * cycles we spent tracking such fine details would be well-spent.
+ */
+void
+InvalidateCatalogSnapshot(void)
+{
+ CatalogSnapshotStale = true;
+}
+
+/*
* SnapshotSetCommandId
* Propagate CommandCounterIncrement into the static snapshots, if set
*/
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index becc82b..9ee9ea2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -14,13 +14,13 @@
* Note that pg_dump runs in a transaction-snapshot mode transaction,
* so it sees a consistent snapshot of the database including system
* catalogs. However, it relies in part on various specialized backend
- * functions like pg_get_indexdef(), and those things tend to run on
- * SnapshotNow time, ie they look at the currently committed state. So
- * it is possible to get 'cache lookup failed' error if someone
- * performs DDL changes while a dump is happening. The window for this
- * sort of thing is from the acquisition of the transaction snapshot to
- * getSchemaData() (when pg_dump acquires AccessShareLock on every
- * table it intends to dump). It isn't very large, but it can happen.
+ * functions like pg_get_indexdef(), and those things tend to look at
+ * the currently committed state. So it is possible to get 'cache
+ * lookup failed' error if someone performs DDL changes while a dump is
+ * happening. The window for this sort of thing is from the acquisition
+ * of the transaction snapshot to getSchemaData() (when pg_dump acquires
+ * AccessShareLock on every table it intends to dump). It isn't very large,
+ * but it can happen.
*
* http://archives.postgresql.org/pgsql-bugs/2010-02/msg00187.php
*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index baa8c50..0d40398 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -105,6 +105,8 @@ typedef struct HeapScanDescData *HeapScanDesc;
extern HeapScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
+extern HeapScanDesc heap_beginscan_catalog(Relation relation, int nkeys,
+ ScanKey key);
extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
bool allow_strat, bool allow_sync);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 5b58028..3a86ca4 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -32,6 +32,7 @@ typedef struct HeapScanDescData
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
+ bool rs_temp_snap; /* unregister snapshot at scan end? */
/* state set up at initscan time */
BlockNumber rs_nblocks; /* number of blocks to scan */
@@ -101,6 +102,7 @@ typedef struct SysScanDescData
Relation irel; /* NULL if doing heap scan */
HeapScanDesc scan; /* only valid in heap-scan case */
IndexScanDesc iscan; /* only valid in index-scan case */
+ Snapshot snapshot; /* snapshot to unregister at end of scan */
} SysScanDescData;
#endif /* RELSCAN_H */
diff --git a/src/include/catalog/objectaccess.h b/src/include/catalog/objectaccess.h
index 8394401..c8a95a6 100644
--- a/src/include/catalog/objectaccess.h
+++ b/src/include/catalog/objectaccess.h
@@ -24,8 +24,8 @@
*
* OAT_POST_ALTER should be invoked just after the object is altered,
* but before the command counter is incremented. An extension using the
- * hook can use SnapshotNow and SnapshotSelf to get the old and new
- * versions of the tuple.
+ * hook can use a current MVCC snapshot to get the old version of the tuple,
+ * and can use SnapshotSelf to get the new version of the tuple.
*
* OAT_NAMESPACE_SEARCH should be invoked prior to object name lookup under
* a particular namespace. This event is equivalent to usage permission
diff --git a/src/include/catalog/pg_db_role_setting.h b/src/include/catalog/pg_db_role_setting.h
index 070cbc8..649f5c4 100644
--- a/src/include/catalog/pg_db_role_setting.h
+++ b/src/include/catalog/pg_db_role_setting.h
@@ -62,7 +62,7 @@ typedef FormData_pg_db_role_setting *Form_pg_db_role_setting;
*/
extern void AlterSetting(Oid databaseid, Oid roleid, VariableSetStmt *setstmt);
extern void DropSetting(Oid databaseid, Oid roleid);
-extern void ApplySetting(Oid databaseid, Oid roleid, Relation relsetting,
- GucSource source);
+extern void ApplySetting(Snapshot snapshot, Oid databaseid, Oid roleid,
+ Relation relsetting, GucSource source);
#endif /* PG_DB_ROLE_SETTING_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 9e833ca..7e70e57 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -24,6 +24,7 @@
* * invalidate a relcache entry for a specific logical relation
* * invalidate an smgr cache entry for a specific physical relation
* * invalidate the mapped-relation mapping for a given database
+ * * invalidate any saved snapshot that might be used to scan a given relation
* More types could be added if needed. The message type is identified by
* the first "int8" field of the message struct. Zero or positive means a
* specific-catcache inval message (and also serves as the catcache ID field).
@@ -43,11 +44,11 @@
* catcache inval messages must be generated for each of its caches, since
* the hash keys will generally be different.
*
- * Catcache and relcache invalidations are transactional, and so are sent
- * to other backends upon commit. Internally to the generating backend,
- * they are also processed at CommandCounterIncrement so that later commands
- * in the same transaction see the new state. The generating backend also
- * has to process them at abort, to flush out any cache state it's loaded
+ * Catcache, relcache, and snapshot invalidations are transactional, and so
+ * are sent to other backends upon commit. Internally to the generating
+ * backend, they are also processed at CommandCounterIncrement so that later
+ * commands in the same transaction see the new state. The generating backend
+ * also has to process them at abort, to flush out any cache state it's loaded
* from no-longer-valid entries.
*
* smgr and relation mapping invalidations are non-transactional: they are
@@ -98,6 +99,15 @@ typedef struct
Oid dbId; /* database ID, or 0 for shared catalogs */
} SharedInvalRelmapMsg;
+#define SHAREDINVALSNAPSHOT_ID (-5)
+
+typedef struct
+{
+ int8 id; /* type field --- must be first */
+ Oid dbId; /* database ID, or 0 if a shared relation */
+ Oid relId; /* relation ID */
+} SharedInvalSnapshotMsg;
+
typedef union
{
int8 id; /* type field --- must be first */
@@ -106,6 +116,7 @@ typedef union
SharedInvalRelcacheMsg rc;
SharedInvalSmgrMsg sm;
SharedInvalRelmapMsg rm;
+ SharedInvalSnapshotMsg sn;
} SharedInvalidationMessage;
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index bfbd8dd..81a286c 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -28,6 +28,9 @@ extern Snapshot GetTransactionSnapshot(void);
extern Snapshot GetLatestSnapshot(void);
extern void SnapshotSetCommandId(CommandId curcid);
+extern Snapshot GetCatalogSnapshot(Oid relid);
+extern void InvalidateCatalogSnapshot(void);
+
extern void PushActiveSnapshot(Snapshot snapshot);
extern void PushCopiedSnapshot(Snapshot snapshot);
extern void UpdateActiveSnapshotCommandId(void);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index d1d8abe..e41b3d2 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -125,6 +125,9 @@ struct catclist;
extern struct catclist *SearchSysCacheList(int cacheId, int nkeys,
Datum key1, Datum key2, Datum key3, Datum key4);
+extern bool RelationInvalidatesSnapshotsOnly(Oid);
+extern bool RelationHasSysCache(Oid);
+
/*
* The use of the macros below rather than direct calls to the corresponding
* functions is encouraged, as it insulates the caller from changes in the
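The new heap_beginscan_catalog() entry point is declared in heapam.h above, but its definition lives in src/backend/access/heap/heapam.c, outside these hunks. Presumably it is a thin wrapper along these lines; a sketch only, not the patch's actual code, using the rs_temp_snap flag added to HeapScanDescData to tell heap_endscan() to unregister the snapshot:

/*
 * Sketch only: take a catalog-appropriate MVCC snapshot, register it, and
 * flag the scan so heap_endscan() knows the snapshot is scan-local.
 */
HeapScanDesc
heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
{
	Oid			relid = RelationGetRelid(relation);
	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
	HeapScanDesc scan;

	scan = heap_beginscan(relation, snapshot, nkeys, key);
	scan->rs_temp_snap = true;	/* heap_endscan() should unregister it */
	return scan;
}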
On 2013-06-28 23:14:23 -0400, Robert Haas wrote:
Here's a further update of this patch. In this version, I added some
mechanism to send a new kind of sinval message that is sent when a
catalog without catcaches is updated; it doesn't apply to all
catalogs, just to whichever ones we want to have this treatment. That
means we don't need to retake snapshots for those catalogs on every
access, so backend startup requires just one extra MVCC snapshot as
compared with current master. Assorted cleanup has been done, along
with the removal of a few more SnapshotNow references.
This is really cool stuff.
It's still possible to construct test cases that perform badly by
pounding the server with 1000 clients running Andres's
readonly-busy.sql. Consider the following test case: use a DO block
to create a schema with 10,000 functions in it and then DROP ..
CASCADE. When the server is unloaded, the extra MVCC overhead is
pretty small.
Well, now the create is 52% slower and the drop is a whopping 4.7x
slower. It's worth digging into the reasons just a bit. I was able
to speed up this case quite a bit - it was 30x slower a few hours ago
- by adding a few new relations to the switch in
RelationInvalidatesSnapshotsOnly(). But the code still takes one MVCC
snapshot per object dropped, because deleteOneObject() calls
CommandCounterIncrement() and that, as it must, invalidates our
previous snapshot.
I have to say, if the thing that primarily suffers is pretty extreme DDL
in extreme situations I am not really worried. Anybody running anything
close to the territory of such concurrency won't perform that much DDL.
We could, if we were inclined to spend the effort,
probably work out that although we need to change curcid, the rest of
the snapshot is still OK, but I'm not too convinced that it's worth
adding an even-more-complicated mechanism for this. We could probably
also optimize the delete code to increment the command counter fewer
times, but I'm not convinced that's worth doing either.
I am pretty convinced we shouldn't do either for now.
Something I picked up when quickly scanning over the last version of the
patch:
+/*
+ * Staleness detection for CatalogSnapshot.
+ */
+static bool CatalogSnapshotStale = true;
/*
* These are updated by GetSnapshotData. We initialize them this way
@@ -177,6 +188,9 @@ GetTransactionSnapshot(void)
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
FirstSnapshotSet = true;
return CurrentSnapshot;
}
@@ -184,6 +198,9 @@ GetTransactionSnapshot(void)
if (IsolationUsesXactSnapshot())
return CurrentSnapshot;
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
return CurrentSnapshot;
@@ -207,6 +224,54 @@ GetLatestSnapshot(void)
}
Do we really need to invalidate snapshots in either situation? Isn't it
implied that, if it's still valid according to a) no invalidation via local
invalidation messages and b) no invalidations from other backends, there
shouldn't be any possible differences when you only look at the catalog?
And if it needs to change, we could copy the newly generated snapshot
to the catalog snapshot if it's currently valid.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 1, 2013 at 10:04 AM, Andres Freund <andres@2ndquadrant.com> wrote:
This is really cool stuff.
Thanks.
I have to say, if the thing that primarily suffers is pretty extreme DDL
in extreme situations I am not really worried. Anybody running anything
close to the territory of such concurrency won't perform that much DDL.
/me wipes brow.
Something I picked up when quickly scanning over the last version of the
patch:
+/*
+ * Staleness detection for CatalogSnapshot.
+ */
+static bool CatalogSnapshotStale = true;
/*
* These are updated by GetSnapshotData. We initialize them this way
@@ -177,6 +188,9 @@ GetTransactionSnapshot(void)
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
FirstSnapshotSet = true;
return CurrentSnapshot;
}
@@ -184,6 +198,9 @@ GetTransactionSnapshot(void)
if (IsolationUsesXactSnapshot())
return CurrentSnapshot;
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
return CurrentSnapshot;
@@ -207,6 +224,54 @@ GetLatestSnapshot(void)
}
Do we really need to invalidate snapshots in either situation? Isn't it
implied that, if it's still valid according to a) no invalidation via local
invalidation messages and b) no invalidations from other backends, there
shouldn't be any possible differences when you only look at the catalog?
I had the same thought, removed that code, and then put it back. The
problem is that if we revive an older snapshot "from the dead", so to
speak, our backend's advertised xmin might need to go backwards, and
that seems unsafe - e.g. suppose another backend has updated a tuple
but not yet committed. We don't see any invalidation messages so
decide reuse our existing (old) snapshot and begin a scan. After
we've looked at the page containing the new tuple (and decided not to
see it), vacuum nukes the old tuple (which we then also don't see).
Bad things ensue. It might be possible to avoid the majority of
problems in this area via an appropriate set of grotty hacks, but I
don't want to go there.
And if it needs to change, we could copy the newly generated snapshot
to the catalog snapshot if it's currently valid.
Yeah, I think there's room for further fine-tuning there. But I think
it would make sense to push the patch at this point, and then if we
find cases that can be further improved, or things that it breaks, we
can fix them. This area is complicated enough that I wouldn't be
horribly surprised if we end up having to fix a few more problem cases
or even revert the whole thing, but I think we've probably reached the
point where further review has less value than getting the code out
there in front of more people and seeing where (if anywhere) the
wheels come off out in the wild.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-07-01 15:02:41 -0400, Robert Haas wrote:
* These are updated by GetSnapshotData. We initialize them this way
@@ -177,6 +188,9 @@ GetTransactionSnapshot(void)
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+ /* Don't allow catalog snapshot to be older than xact snapshot. */
+ CatalogSnapshotStale = true;
+
Do we really need to invalidate snapshots in either situation? Isn't it
implied that, if it's still valid according to a) no invalidation via local
invalidation messages and b) no invalidations from other backends, there
shouldn't be any possible differences when you only look at the catalog?
I had the same thought, removed that code, and then put it back. The
problem is that if we revive an older snapshot "from the dead", so to
speak, our backend's advertised xmin might need to go backwards, and
that seems unsafe - e.g. suppose another backend has updated a tuple
but not yet committed. We don't see any invalidation messages so
decide to reuse our existing (old) snapshot and begin a scan. After
we've looked at the page containing the new tuple (and decided not to
see it), vacuum nukes the old tuple (which we then also don't see).
Bad things ensue. It might be possible to avoid the majority of
problems in this area via an appropriate set of grotty hacks, but I
don't want to go there.
Yes, I thought about that and I think it's a problem that can be solved
without too ugly hacks. But, as you say:
Yeah, I think there's room for further fine-tuning there. But I think
it would make sense to push the patch at this point, and then if we
find cases that can be further improved, or things that it breaks, we
can fix them. This area is complicated enough that I wouldn't be
horribly surprised if we end up having to fix a few more problem cases
or even revert the whole thing, but I think we've probably reached the
point where further review has less value than getting the code out
there in front of more people and seeing where (if anywhere) the
wheels come off out in the wild.
I am pretty sure that we will have to fix more stuff, but luckily we're
in the beginning of the cycle. And while I'd prefer more eyes on the
patch before it gets applied, especially ones knowledgeable about the
implications this has, I don't really see that happening soon. So
applying is more likely to lead to more review than waiting.
So, from me: +1.
Some things that might be worth changing when committing:
* Could you add an Assert(!RelationHasSysCache(relid)) to
RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
missed by the next person adding a syscache and that seems like it
could have ugly and hard to detect consequences.
* maybe use bsearch(3) instead of open coding the binary search? We
already use it in the backend.
* possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
consistency mechanisms in GetCatalogSnapshot() only work for those, so
there doesn't seem to be a valid use case for non-catalog relations.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jul 2, 2013 at 9:02 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Some things that might be worth changing when committing:
* Could you add an Assert(!RelationHasSysCache(relid)) to
RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
missed by the next person adding a syscache and that seems like it
could have ugly and hard to detect consequences.
There's a cross-check in InitCatalogCache() for that very issue.
* maybe use bsearch(3) instead of open coding the binary search? We
already use it in the backend.
I found comments elsewhere indicating that bsearch() was slower than
open-coding it, so I copied the logic used for ScanKeywordLookup().
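For reference, the bsearch(3) variant suggested above would presumably have looked about like this, reusing the patch's SysCacheRelationOid array and oid_compare comparator (a sketch, not code from the patch):

bool
RelationHasSysCache(Oid relid)
{
	/* hypothetical bsearch(3) form; the patch open-codes the search instead */
	return bsearch(&relid, SysCacheRelationOid, SysCacheRelationOidSize,
				   sizeof(Oid), oid_compare) != NULL;
}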
* possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
consistency mechanisms in GetCatalogSnapshot() only work for those, so
there doesn't seem to be a valid use case for non-catalog relations.
It'll actually work fine as things stand; it'll just take a new
snapshot every time.
I have a few ideas for getting rid of the remaining uses of
SnapshotNow that I'd like to throw out there:
- In pgrowlocks and pgstattuple, I think it would be fine to use
SnapshotSelf instead of SnapshotNow. The only difference is that it
includes changes made by the current command that wouldn't otherwise
be visible until CommandCounterIncrement(). That, however, is not
really a problem for their usage. (A sketch of the swap follows after this list.)
- In genam.c and execUtils.c, we treat SnapshotNow as a kind of
default snapshot. That seems like a crappy idea. I propose that we
either set that pointer to NULL and let the server core dump if the
snapshot doesn't get set or (maybe better) add a new special snapshot
called SnapshotError which just errors out if you try to use it for
anything, and initialize to that.
- I'm not quite sure what to do about get_actual_variable_range().
Taking a new MVCC snapshot there seems like it might be pricey on some
workloads. However, I wonder if we could use SnapshotDirty.
Presumably, even uncommitted tuples affect the amount of
index-scanning we have to do, so that approach seems to have some
theoretical justification. But I'm worried there could be unfortunate
consequences to looking at uncommitted data, either now or in the
future. SnapshotSelf seems less likely to have that problem, but
feels wrong somehow.
- currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
argument to heap_get_latest_tid(). I don't know what these functions
are supposed to be good for, but taking a new snapshot for every
function call seems to guarantee that someone will find a way to use
these functions as a poster child for how to brutalize PGXACT, so I
don't particularly want to do that.
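For the pgrowlocks/pgstattuple idea in the first item, the change would presumably be a one-line substitution, something like this hypothetical hunk (not taken from any posted patch):

- scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
+ scan = heap_beginscan(rel, SnapshotSelf, 0, NULL);

SnapshotSelf applies the same visibility rules plus the current command's own not-yet-CCI'd changes, which is why the swap should be harmless for these modules.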
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-07-02 09:31:23 -0400, Robert Haas wrote:
On Tue, Jul 2, 2013 at 9:02 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Some things that might be worth changing when committing:
* Could you add an Assert(!RelationHasSysCache(relid)) to
RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
missed by the next person adding a syscache and that seems like it
could have ugly and hard to detect consequences.
There's a cross-check in InitCatalogCache() for that very issue.
Great.
* maybe use bsearch(3) instead of open coding the binary search? We
already use it in the backend.
I found comments elsewhere indicating that bsearch() was slower than
open-coding it, so I copied the logic used for ScanKeywordLookup().
Hm. Ok.
* possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
consistency mechanisms in GetCatalogSnapshot() only work for those, so
there doesn't seem to be a valid use case for non-catalog relations.
It'll actually work fine as things stand; it'll just take a new
snapshot every time.
Ok. Doesn't really change my opinion that it's a crappy idea to use it
otherwise ;)
- In genam.c and execUtils.c, we treat SnapshotNow as a kind of
default snapshot. That seems like a crappy idea. I propose that we
either set that pointer to NULL and let the server core dump if the
snapshot doesn't get set or (maybe better) add a new special snapshot
called SnapshotError which just errors out if you try to use it for
anything, and initialize to that.
I vote for SnapshotError.
- I'm not quite sure what to do about get_actual_variable_range().
Taking a new MVCC snapshot there seems like it might be pricey on some
workloads. However, I wonder if we could use SnapshotDirty.
Presumably, even uncommitted tuples affect the amount of
index-scanning we have to do, so that approach seems to have some
theoretical justification. But I'm worried there could be unfortunate
consequences to looking at uncommitted data, either now or in the
future. SnapshotSelf seems less likely to have that problem, but
feels wrong somehow.
I don't like using SnapshotDirty either. Can't we just use the currently
active snapshot? Unless I'm missing something, this will always be called
while we have one: when we plan, we've done an explicit
PushActiveSnapshot(), and if we need to replan stuff during execution,
PortalRunSelect() will have pushed one.
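I.e., roughly this in get_actual_variable_range() (sketch, untested):

    /* Reuse the snapshot the planner/executor has already pushed
     * rather than taking a fresh one. */
    index_scan = index_beginscan(heapRel, indexRel,
                                 GetActiveSnapshot(), 1, 0);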
- currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
argument to heap_get_latest_tid(). I don't know what these functions
are supposed to be good for, but taking a new snapshot for every
function call seems to guarantee that someone will find a way to use
these functions as a poster child for how to brutalize PGXACT, so I
don't particularly want to do that.
Heikki mentioned that at some point they were added for the ODBC
driver. I am not particularly inclined to worry about taking too many
snapshots here.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jul 2, 2013 at 9:52 AM, Andres Freund <andres@2ndquadrant.com> wrote:
* Possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
consistency mechanisms in GetCatalogSnapshot() only work for those, so
there doesn't seem to be a valid use case for non-catalog relations.

It'll actually work fine as things stand; it'll just take a new
snapshot every time.

Ok. Doesn't really change my opinion that it's a crappy idea to use it
otherwise ;)
I agree, but I don't see an easy way to write the assertion you want
using only the OID.
- In genam.c and execUtils.c, we treat SnapshotNow as a kind of
default snapshot. That seems like a crappy idea. I propose that we
either set that pointer to NULL and let the server core dump if the
snapshot doesn't get set or (maybe better) add a new special snapshot
called SnapshotError which just errors out if you try to use it for
anything, and initialize to that.

I vote for SnapshotError.
OK.
- I'm not quite sure what to do about get_actual_variable_range().
Taking a new MVCC snapshot there seems like it might be pricey on some
workloads. However, I wonder if we could use SnapshotDirty.
Presumably, even uncommitted tuples affect the amount of
index-scanning we have to do, so that approach seems to have some
theoretical justification. But I'm worried there could be unfortunate
consequences to looking at uncommitted data, either now or in the
future. SnapshotSelf seems less likely to have that problem, but
feels wrong somehow.

I don't like using SnapshotDirty either. Can't we just use the currently
active snapshot? Unless I'm missing something, this will always be called
while we have one: when we plan, we've done an explicit
PushActiveSnapshot(), and if we need to replan stuff during execution,
PortalRunSelect() will have pushed one.
We could certainly do that, but I'd be a bit reluctant to do so
without input from Tom. I imagine there might be cases where it could
cause a regression.
- currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
argument to heap_get_latest_tid(). I don't know what these functions
are supposed to be good for, but taking a new snapshot for every
function call seems to guarantee that someone will find a way to use
these functions as a poster child for how to brutalize PGXACT, so I
don't particularly want to do that.

Heikki mentioned that at some point they were added for the ODBC
driver. I am not particularly inclined to worry about taking too many
snapshots here.
Well, if it uses them with any regularity, I think that's a legitimate
concern. We have plenty of customers who use ODBC, and some of them
allow frightening numbers of concurrent server connections. Now you
may say that's a bad idea, but so is 1000 backends doing
txid_current() in a loop.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-07-02 10:38:17 -0400, Robert Haas wrote:
On Tue, Jul 2, 2013 at 9:52 AM, Andres Freund <andres@2ndquadrant.com> wrote:
* Possibly paranoid, but I'd add an Assert to heap_beginscan_catalog or
GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
consistency mechanisms in GetCatalogSnapshot() only work for those, so
there doesn't seem to be a valid use case for non-catalog relations.

It'll actually work fine as things stand; it'll just take a new
snapshot every time.

Ok. Doesn't really change my opinion that it's a crappy idea to use it
otherwise ;)

I agree, but I don't see an easy way to write the assertion you want
using only the OID.
Let's add

/*
 * IsSystemRelationId
 *    True iff the relation is a system catalog relation.
 */
bool
IsSystemRelationId(Oid relid)
{
    return relid < FirstNormalObjectId;
}

and change IsSystemRelation() to use that instead of what it does now...
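That change would look something like this (just a sketch):

bool
IsSystemRelation(Relation relation)
{
    return IsSystemRelationId(RelationGetRelid(relation));
}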
- currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
argument to heap_get_latest_tid(). I don't know what these functions
are supposed to be good for, but taking a new snapshot for every
function call seems to guarantee that someone will find a way to use
these functions as a poster child for how to brutalize PGXACT, so I
don't particularly want to do that.

Heikki mentioned that at some point they were added for the ODBC
driver. I am not particularly inclined to worry about taking too many
snapshots here.

Well, if it uses them with any regularity, I think that's a legitimate
concern. We have plenty of customers who use ODBC, and some of them
allow frightening numbers of concurrent server connections.
I've quickly verified that it indeed uses them. I wish I hadn't. Brrr. I
can't even guess what that should do from the surrounding code/function
names. Except that it looks broken under concurrency as long as
SnapshotNow is used (because the query's snapshot won't be as new as
SnapshotNow, even in read committed mode).
Heikki, do you understand the code well enough to explain it without
investing much time?
Now you may say that's a bad idea, but so is 1000 backends doing
txid_current() in a loop.
Hehe ;).
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 02.07.2013 18:24, Andres Freund wrote:
I've quickly verified that it indeed uses them. I wish I hadn't. Brrr. I
can't even guess what that should do from the surrounding code/function
names. Except that it looks broken under concurrency as long as
SnapshotNow is used (because the query's snapshot won't be as new as
SnapshotNow, even in read committed mode).

Heikki, do you understand the code well enough to explain it without
investing much time?
No, sorry. I think it has something to do with updateable cursors, but I
don't understand the details.
- Heikki
Hi Robert,
On 2013-07-02 09:31:23 -0400, Robert Haas wrote:
I have a few ideas for getting rid of the remaining uses of
SnapshotNow that I'd like to throw out there:
Is your current plan to get rid of SnapshotNow entirely? I am wondering
because changeset extraction needs to care, and how the proper fix
for dealing with CatalogSnapshotData looks depends on it...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jul 5, 2013 at 11:27 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi Robert,
On 2013-07-02 09:31:23 -0400, Robert Haas wrote:
I have a few ideas for getting rid of the remaining uses of
SnapshotNow that I'd like to throw out there:

Is your current plan to get rid of SnapshotNow entirely? I am wondering
because changeset extraction needs to care, and how the proper fix
for dealing with CatalogSnapshotData looks depends on it...
I would like to do that, but I haven't quite figured out how to get
rid of the last few instances, per discussion upthread. I do plan to
spend some more time on it, but likely not this week.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company