MVCC catalog access
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact. I decided to do some
testing to measure the impact. I was pleasantly surprised by the
results.
The attached patch is a quick hack to provide for MVCC catalog access.
It adds a GUC called "mvcc_catalog_access". When this GUC is set to
true, and heap_beginscan() or index_beginscan() is called with
SnapshotNow, they call GetLatestSnapshot() and use the resulting
snapshot in lieu of SnapshotNow. As a debugging double-check, I
modified HeapTupleSatisfiesNow to elog(FATAL) if called while
mvcc_catalog_access is true. Of course, both of these are dirty
hacks. If we were actually to implement MVCC catalog access, I think
we'd probably just go through and start replacing instances of
SnapshotNow with GetLatestSnapshot(), but I wanted to make it easy to
do perf testing.
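To make the mechanism concrete, here is a simplified, self-contained sketch of the substitution the hack performs; the SnapshotKind enum, pick_scan_snapshot(), and the bare bool GUC variable are stand-ins for the real PostgreSQL types, not code from the actual patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the real PostgreSQL types; this sketches the hack's logic. */
typedef enum { SNAPSHOT_NOW, SNAPSHOT_MVCC_LATEST } SnapshotKind;

static bool mvcc_catalog_access = true;   /* the GUC added by the patch */

/* heap_beginscan()/index_beginscan() effectively apply this mapping:
 * a SnapshotNow request becomes a fresh MVCC snapshot (GetLatestSnapshot())
 * whenever the GUC is on. */
static SnapshotKind
pick_scan_snapshot(SnapshotKind requested)
{
    if (mvcc_catalog_access && requested == SNAPSHOT_NOW)
        return SNAPSHOT_MVCC_LATEST;
    return requested;
}
```

A real conversion would likely replace SnapshotNow call sites outright, as described above; the GUC-driven switch only exists to make A/B performance testing easy.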
When I first made this change, I couldn't detect any real change;
indeed, it seemed that make check was running ever-so-slightly faster
than before, although that may well have been a testing artifact. I
wrote a test case that created a schema with 100,000 functions in it
and then dropped the schema (I believe it was Tom who previously
suggested this test case as a worst-case scenario for MVCC catalog
access). That didn't seem to be adversely affected either, even
though it must take ~700k additional MVCC snapshots with
mvcc_catalog_access = true.
MVCC Off: Create 8743.101 ms, Drop 9655.471 ms
MVCC On: Create 7462.882 ms, Drop 9515.537 ms
MVCC Off: Create 7519.160 ms, Drop 9380.905 ms
MVCC On: Create 7517.382 ms, Drop 9394.857 ms
The first "Create" seems to be artificially slow because of some kind
of backend startup overhead. Not sure exactly what.
After wracking my brain for a few minutes, I realized that the lack of
any apparent performance regression was probably due to the lack of
any concurrent connections, making the scans of the PGXACT array very
cheap. So I wrote a little program to open a bunch of extra
connections. My MacBook Pro grumbled when I tried to open more than
about 600, so I had to settle for that number. That was enough to
show up the cost of all those extra snapshots:
MVCC Off: Create 9065.887 ms, Drop 9599.494 ms
MVCC On: Create 8384.065 ms, Drop 10532.909 ms
MVCC Off: Create 7632.197 ms, Drop 9499.502 ms
MVCC On: Create 8215.443 ms, Drop 10033.499 ms
Now, I don't know about you, but I'm having a hard time getting
agitated about those numbers. Most people are not going to drop
100,000 objects with a single cascaded drop. And most people are not
going to have 600 connections open when they do. (The snapshot
overhead should be roughly proportional to the product of the number
of drops and the number of open connections, and the number of cases
where the product is as high as 60 million has got to be pretty
small.) But suppose that someone is in that situation. Well, then
they will take a... 10% performance penalty? That sounds plenty
tolerable to me, if it means we can start moving in the direction of
allowing some concurrent DDL.
Am I missing an important test case here? Are these results worse
than I think they are? Did I boot this testing somehow?
[MVCC catalog access patch, test program to create lots of idle
connections, and pg_depend stress test case attached.]
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 21, 2013 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
[ MVCC catalog access seems to be pretty cheap ]
In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as "SELECT 2+2" in the absence
of a relcache file.
I did this and can't measure any overhead as a result of MVCC catalog
access. I tried it with no active connections. I tried it with 600
idle active connections (to make taking MVCC snapshots more
expensive). I couldn't quite believe it made no difference, so I
tried doing it in a tight loop under pgbench. I still can't measure
any difference. I haven't tested carefully enough to rule out the
possibility of an effect <1/2% at 600 connections, but there certainly
isn't anything bigger than that and I don't even think there's that
much of a difference.
Andres Freund suggested creating a couple of simple tables and having
lots of short-lived backends select data from them.
rhaas=# create table af1 (x) as select g from generate_series(1,4) g;
SELECT 4
rhaas=# create table af2 (x) as select g from generate_series(4,7) g;
SELECT 4
Test query: SELECT * FROM af1, af2 WHERE af1.x = af2.x;
pgbench command: pgbench -T 10 -c 50 -j 50 -n -f f -C
With mvcc_catalog_access=off, I get ~1553 tps; with it on, I get ~1557
tps. Hmm... that could be because of the two-line debugging hunk my
patch adds to HeapTupleSatisfiesNow(). After removing that, I get
maybe a 1% regression with mvcc_catalog_access=on on this test, but
it's not very consistent. If I restart the database server a few
times, the overhead bounces around each time, and sometimes it's zero;
the highest I saw is 1.4%. But it's not much, and this is a pretty
brutal workload for PostgreSQL, since starting up >1500 connections
per second is not a workload for which we're well-suited in the first
place.
All in all, I'm still feeling optimistic.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as "SELECT 2+2" in the absence
of a relcache file.
I did this and can't measure any overhead as a result of MVCC catalog
access. I tried it with no active connections. I tried it with 600
idle active connections (to make taking MVCC snapshots more
expensive).
Did you try it with the 600 transactions actually being in a transaction
and having acquired a snapshot?
All in all, I'm still feeling optimistic.
+1. I still feel like this has to be much harder since we made it out to
be hard for a long time ;)
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 22, 2013 at 11:02 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as "SELECT 2+2" in the absence
of a relcache file.
I did this and can't measure any overhead as a result of MVCC catalog
access. I tried it with no active connections. I tried it with 600
idle active connections (to make taking MVCC snapshots more
expensive).
Did you try it with the 600 transactions actually being in a transaction
and having acquired a snapshot?
No... I can hack something up for that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-05-22 23:05:40 -0400, Robert Haas wrote:
On Wed, May 22, 2013 at 11:02 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-05-22 22:51:13 -0400, Robert Haas wrote:
In discussions today, Tom Lane suggested testing the time to start up
a backend and run a simple query such as "SELECT 2+2" in the absence
of a relcache file.
I did this and can't measure any overhead as a result of MVCC catalog
access. I tried it with no active connections. I tried it with 600
idle active connections (to make taking MVCC snapshots more
expensive).
Did you try it with the 600 transactions actually being in a transaction
and having acquired a snapshot?
No... I can hack something up for that.
Make that actually having acquired an xid. We skip a large part of the
work if a transaction doesn't yet have one afair. I don't think the mere
presence of 600 idle connections without an xid in contrast to just
having max_connections at 600 should actually make a difference in the
cost of acquiring a snapshot?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 22, 2013 at 11:11 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Make that actually having acquired an xid. We skip a large part of the
work if a transaction doesn't yet have one afair. I don't think the mere
presence of 600 idle connections without an xid in contrast to just
having max_connections at 600 should actually make a difference in the
cost of acquiring a snapshot?
Attached is a slightly updated version of the patch I'm using for
testing, and an updated version of the pg_cxn source that I'm using to
open lotsa connections. With this version, I can do this:
./pg_cxn -n 600 -c BEGIN -c 'SELECT txid_current()'
...which I think is sufficient to make sure all those transactions
have XIDs. Then I reran the "depend" test case (create a schema with
100,000 functions and then drop the schema with CASCADE) that I
mentioned in my original posting. Here are the results:
MVCC Off: Create 8685.662 ms, Drop 9973.233 ms
MVCC On: Create 7931.039 ms, Drop 10189.189 ms
MVCC Off: Create 7810.084 ms, Drop 9594.580 ms
MVCC On: Create 8854.577 ms, Drop 10240.024 ms
OK, let's try the rebuild-the-relcache test using the same pg_cxn
scenario (600 transactions that have started a transaction and
selected txid_current()).
[rhaas ~]$ time for s in `seq 1 1000`; do rm -f
pgdata/global/pg_internal.init && psql -c 'SELECT 2+2' >/dev/null;
done
MVCC catalog access on:
real 0m11.006s
user 0m2.746s
sys 0m2.664s
MVCC catalog access off:
real 0m10.583s
user 0m2.745s
sys 0m2.661s
MVCC catalog access on:
real 0m10.646s
user 0m2.750s
sys 0m2.661s
MVCC catalog access off:
real 0m10.823s
user 0m2.756s
sys 0m2.681s
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Perhaps we see little difference in performance because PGPROC has been
separated into PGPROC and PGXACT, reducing lock contention when getting
snapshot data?
By the way, I grabbed a 32-core machine and did some more performance tests
with some open connections with XIDs assigned using pg_cxn v2 given by
Robert in his previous mail to make sure that the snapshots get pretty
large.
First I ran the simple read test:
$ time for s in `seq 1 1000`
do
rm -f ~/bin/pgsql/master/global/pg_internal.init && psql -c 'SELECT 2+2' >/dev/null;
done
And then the create/drop test.
I have done those tests with 250, 500, 1000 and 2000 open connections:
1) 250 open connections
1-1) read test
Round 1:
mvcc_catalog_access off:
real 0m9.124s
user 0m0.200s
sys 0m0.392s
mvcc_catalog_access on:
real 0m9.297s
user 0m0.148s
sys 0m0.444s
Round 2:
mvcc_catalog_access off:
real 0m8.985s
user 0m0.160s
sys 0m0.372s
mvcc_catalog_access on:
real 0m9.244s
user 0m0.240s
sys 0m0.400s
1-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 24554.849, Drop: 29755.146
mvcc on: Create: 26904.755, Drop: 32891.556
mvcc off: Create: 23337.342, Drop: 29921.990
mvcc on: Create: 24533.708, Drop: 31670.840
2) 500 open connections
2-1) read test
Round 1:
mvcc_catalog_access off:
real 0m9.123s
user 0m0.200s
sys 0m0.396s
mvcc_catalog_access on:
real 0m9.627s
user 0m0.156s
sys 0m0.460s
Round 2:
mvcc_catalog_access off:
real 0m9.221s
user 0m0.316s
sys 0m0.392s
mvcc_catalog_access on:
real 0m9.592s
user 0m0.160s
sys 0m0.484s
2-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 25872.886, Drop: 31723.921
mvcc on: Create: 27076.429, Drop: 33674.336
mvcc off: Create: 24039.456, Drop: 30434.019
mvcc on: Create: 29105.713, Drop: 33821.170
3) 1000 open connections
3-1) read test
Round 1:
mvcc_catalog_access off:
real 0m9.240s
user 0m0.192s
sys 0m0.396s
mvcc_catalog_access on:
real 0m9.674s
user 0m0.236s
sys 0m0.440s
Round 2:
mvcc_catalog_access off:
real 0m9.302s
user 0m0.308s
sys 0m0.392s
mvcc_catalog_access on:
real 0m9.746s
user 0m0.204s
sys 0m0.436s
3-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 25563.705, Drop: 31747.451
mvcc on: Create: 33281.246, Drop: 36618.166
mvcc off: Create: 28178.210, Drop: 30550.166
mvcc on: Create: 31849.825, Drop: 36831.245
4) 2000 open connections
4-1) read test
Round 1:
mvcc_catalog_access off:
real 0m9.066s
user 0m0.128s
sys 0m0.420s
mvcc_catalog_access on:
real 0m9.978s
user 0m0.168s
sys 0m0.412s
Round 2:
mvcc_catalog_access off:
real 0m9.113s
user 0m0.152s
sys 0m0.444s
mvcc_catalog_access on:
real 0m9.974s
user 0m0.176s
sys 0m0.436s
More or less the same results as previously, with a 10% performance drop.
4-2) DDL test (drop and creation of 100,000 objects)
mvcc off: Create: 28708.095 ms, Drop: 32510.057 ms
mvcc on: Create: 39987.815 ms, Drop: 43157.006 ms
mvcc off: Create: 28409.853 ms, Drop: 31248.163 ms
mvcc on: Create: 41669.829 ms, Drop: 44645.109 ms
For the read tests, we can see a performance drop of up to 10% for 2000
connections.
For the write tests, we can see a performance drop of 9~10% for 250
connections, and up to a 30% drop with 2000 connections.
We rarely see users drop that many objects at once with so many open
transactions; they'll switch to a connection pooler before opening that
many connections to the server. I am not sure that such a performance
drop is acceptable as-is, but perhaps it is if we consider the
functionality gain we can get thanks to MVCC catalogs.
--
Michael
On Sun, May 26, 2013 at 9:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Perhaps we see little difference in performance because PGPROC has been
separated into PGPROC and PGXACT, reducing lock contention with getting
snapshot data?
By the way, I grabbed a 32-core machine and did some more performance tests
with some open connections with XIDs assigned using pg_cxn v2 given by
Robert in his previous mail to make sure that the snapshots get pretty
large.
Thanks for checking this on another machine. It's interesting that
you were able to measure a hit for relcache rebuild, whereas I was
not, but it doesn't look horrible.
IMHO, we should press forward with this approach. Considering that
these are pretty extreme test cases, I'm inclined to view the
performance loss as acceptable. We've never really viewed DDL as
something that needs to be micro-optimized, and there is ample
testimony to that fact in the existing code and in the treatment of
prior patches in this area. This is not to say that we want to go
around willy-nilly making it slower, but I think there will be very
few users for which the number of microseconds it takes to create or
drop an SQL object is performance-critical, especially when you
consider that (1) the effect will be quite a bit less when the objects
are tables, since in that case the snapshot cost will tend to be
drowned out by the filesystem cost and (2) people who don't habitually
keep hundreds and hundreds of connections open - which hopefully most
people don't - won't see the effect anyway. Against that, this
removes the single largest barrier to allowing more concurrent DDL, a
feature that I suspect will make a whole lot of people *very* happy.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 28, 2013 at 10:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
IMHO, we should press forward with this approach. Considering that
these are pretty extreme test cases, I'm inclined to view the
performance loss as acceptable. We've never really viewed DDL as
something that needs to be micro-optimized, and there is ample
testimony to that fact in the existing code and in the treatment of
prior patches in this area. This is not to say that we want to go
around willy-nilly making it slower, but I think there will be very
few users for which the number of microseconds it takes to create or
drop an SQL object is performance-critical, especially when you
consider that (1) the effect will be quite a bit less when the objects
are tables, since in that case the snapshot cost will tend to be
drowned out by the filesystem cost and (2) people who don't habitually
keep hundreds and hundreds of connections open - which hopefully most
people don't - won't see the effect anyway. Against that, this
removes the single largest barrier to allowing more concurrent DDL, a
feature that I suspect will make a whole lot of people *very* happy.
+1.
So, I imagine that the next step would be to add a new snapshot
validation level in tqual.h. Something like SnapshotMVCC? Then replace
SnapshotNow with SnapshotMVCC where it is required.
I am also seeing that SnapshotNow is used in places where we might not
want to have it changed. For example, the autovacuum code path where we
retrieve the database or table list should not be changed, no?
--
Michael
On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
+1.
Here's a more serious patch for MVCC catalog access. This one
involves more data copying than the last one, I think, because the
previous version did not register the snapshots it took, which I think
is not safe. So this needs to be re-tested for performance, which I
have so far made no attempt to do.
It strikes me as rather unfortunate that the snapshot interface is
designed in such a way as to require so much data copying. It seems
we always take a snapshot by copying from PGXACT/PGPROC into
CurrentSnapshotData or SecondarySnapshotData, and then copying data a
second time from there to someplace more permanent. It would be nice
to avoid that, at least in common cases.
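The double-copy pattern described above can be illustrated with a deliberately tiny model; SnapshotData, get_snapshot_data(), and register_snapshot() here are simplified stand-ins for PostgreSQL's actual structures and API, not the real thing:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for SnapshotData; xip would hold the in-progress XIDs
 * copied out of PGXACT/PGPROC. */
typedef struct { unsigned xip[64]; int xcnt; } SnapshotData;

static SnapshotData CurrentSnapshotData;   /* static area: first copy lands here */

/* First copy: scan the (simulated) proc array into the static snapshot. */
static SnapshotData *
get_snapshot_data(const unsigned *running, int n)
{
    CurrentSnapshotData.xcnt = n;
    memcpy(CurrentSnapshotData.xip, running, n * sizeof(unsigned));
    return &CurrentSnapshotData;
}

/* Second copy: a registered snapshot must survive the next
 * get_snapshot_data() call, which forces another full copy. */
static SnapshotData *
register_snapshot(const SnapshotData *snap)
{
    SnapshotData *copy = malloc(sizeof(SnapshotData));
    memcpy(copy, snap, sizeof(SnapshotData));
    return copy;
}
```

The second copy exists only because the static area is reused by the next snapshot; if snapshots could be taken directly into caller-owned storage, one of the copies would disappear.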
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
[mvcc-catalog-access-v3.patch, application/octet-stream, +305/-283]
On Tue, Jun 4, 2013 at 3:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
+1.
Here's a more serious patch for MVCC catalog access. This one
involves more data copying than the last one, I think, because the
previous version did not register the snapshots it took, which I think
is not safe. So this needs to be re-tested for performance, which I
have so far made no attempt to do.
It strikes me as rather unfortunate that the snapshot interface is
designed in such a way as to require so much data copying. It seems
we always take a snapshot by copying from PGXACT/PGPROC into
CurrentSnapshotData or SecondarySnapshotData, and then copying data a
second time from there to someplace more permanent. It would be nice
to avoid that, at least in common cases.
And here are more results comparing master branch with and without this
patch...
1) DDL CREATE/DROP test:
1-1) master:
250 connections:
Create: 24846.060, Drop: 30391.713
Create: 23771.394, Drop: 29769.396
500 connections:
Create: 24339.449, Drop: 30084.741
Create: 24152.176, Drop: 30643.471
1000 connections:
Create: 26007.960, Drop: 31019.918
Create: 25937.592, Drop: 30600.341
2000 connections:
Create: 26900.324, Drop: 30741.989
Create: 26910.660, Drop: 31577.247
1-2) mvcc catalogs:
250 connections:
Create: 25371.342, Drop: 31458.952
Create: 25685.094, Drop: 31492.377
500 connections:
Create: 28557.882, Drop: 33673.266
Create: 27901.910, Drop: 33223.006
1000 connections:
Create: 31910.130, Drop: 36770.062
Create: 32210.093, Drop: 36754.888
2000 connections:
Create: 40374.754, Drop: 43442.528
Create: 39763.691, Drop: 43234.243
2) backend startup
2-1) master branch:
250 connections:
real 0m8.993s
user 0m0.128s
sys 0m0.380s
500 connections:
real 0m9.004s
user 0m0.212s
sys 0m0.340s
1000 connections:
real 0m9.072s
user 0m0.272s
sys 0m0.332s
2000 connections:
real 0m9.257s
user 0m0.204s
sys 0m0.392s
2-2) MVCC catalogs:
250 connections:
real 0m9.067s
user 0m0.108s
sys 0m0.396s
500 connections:
real 0m9.034s
user 0m0.112s
sys 0m0.376s
1000 connections:
real 0m9.303s
user 0m0.176s
sys 0m0.328s
2000 connections:
real 0m9.916s
user 0m0.160s
sys 0m0.428s
Except for the backend startup test with 500 connections, which looks
to have some noise, performance degradation reaches 6% for 2000
connections, and less than 1% for 250 connections. This is better than
last time.
For the CREATE/DROP case, performance drop reaches 40% for 2000 connections
(32% during last tests). I also noticed a lower performance drop for 250
connections now (3~4%) compared to the 1st time (9%).
I compiled the main results on tables here:
http://michael.otacoo.com/postgresql-2/postgres-9-4-devel-mvcc-catalog-access-take-2-2/
The results of last time are also available here:
http://michael.otacoo.com/postgresql-2/postgres-9-4-devel-mvcc-catalog-access-2/
Regards,
--
Michael
On Wed, May 22, 2013 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.
I was always under the impression that the problem was we weren't
quite sure what changes would be needed to make mvcc-snapshots work
for the catalog lookups. The semantics of SnapshotNow aren't terribly
clear either but we have years of experience telling us they seem to
basically work. Most of the problems we've run into we have
worked around in the catalog accesses. Nobody really knows how many of
the call sites will need different logic to behave properly with mvcc
snapshots.
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones. If you clean those all up it will probably be cleaner and
better but we don't know how many such sites will need to be modified.
I'm not even sure what "clean them up" means. You can replace checks
with things like constraints and locks but the implementation of
constraints and locks will still need to use SnapshotNow surely?
--
greg
On 06/05/2013 04:28 PM, Greg Stark wrote:
On Wed, May 22, 2013 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.
I was always under the impression that the problem was we weren't
quite sure what changes would be needed to make mvcc-snapshots work
for the catalog lookups. The semantics of SnapshotNow aren't terribly
clear either but we have years of experience telling us they seem to
basically work. Most of the problems we've run into we have
worked around in the catalog accesses. Nobody really knows how many of
the call sites will need different logic to behave properly with mvcc
snapshots.
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones. If you clean those all up it will probably be cleaner and
better but we don't know how many such sites will need to be modified.
I'm not even sure what "clean them up" means. You can replace checks
with things like constraints and locks but the implementation of
constraints and locks will still need to use SnapshotNow surely?
I guess that anything that does *not* write should be happier with
MVCC catalogue, especially if there has been any DDL after its snapshot.
For writers the ability to compare MVCC and SnapshotNow snapshots
would tell if they need to take extra steps.
But undoubtedly the whole thing would be a lot of work.
--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
On Wed, May 22, 2013 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones.
But SnapshotNow doesn't allow you to do actual dirty reads? It only
gives you rows back that were actually visible when we checked. The
difference to SnapshotMVCC is that during a scan the picture of which
transactions are committed can change.
I'm not even sure what "clean them up" means. You can replace checks
with things like constraints and locks but the implementation of
constraints and locks will still need to use SnapshotNow surely?
The places that require this should already use HeapTupleSatisfiesDirty
which is something different.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones.
But SnapshotNow doesn't allow you to do actual dirty reads?
Yeah. I believe the issue is that we can't simply do MVCC catalog reads
with a snapshot taken at transaction start time or statement start time,
as we would do if executing an MVCC scan for a user query. Rather, the
snapshot has to be recent enough to ensure we see the current definition
of any table we've just acquired lock on, *even if that's newer than the
snapshot prevailing for the user's purposes*. Otherwise we might be
using the wrong rowtype definition or failing to enforce a just-added
constraint.
The last time we talked about this, we were batting around ideas of
keeping a "current snapshot for catalog purposes", which we'd update
or at least invalidate anytime we acquired a new lock. (In principle,
if that isn't new enough, we have a race condition that we'd better fix
by adding some more locking.) Robert's results seem to say that that
might be unnecessary optimization, and that it'd be sufficient to just
take a new snap each time we need to do a catalog scan. TBH I'm not
sure I believe that; it seems to me that this approach is surely going
to create a great deal more contention from concurrent GetSnapshotData
calls. But at the very least, this says we can experiment with the
behavioral aspects without bothering to build infrastructure for
tracking an appropriate catalog snapshot.
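Tom's freshness requirement can be shown with a deliberately tiny model; the cutoff-based Snapshot and row_visible() below are illustrative stand-ins, not the real visibility code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a snapshot is summarized by the highest XID it saw as
 * committed, and a catalog row is visible if its inserting XID had
 * committed by the time the snapshot was taken. */
typedef struct { unsigned committed_upto; } Snapshot;
typedef struct { unsigned inserted_by;   } CatalogRow;

static bool
row_visible(const CatalogRow *row, const Snapshot *snap)
{
    return row->inserted_by <= snap->committed_upto;
}
```

A snapshot taken at statement start misses a catalog row (say, a just-added constraint) committed afterward; only a snapshot taken after acquiring the table lock is guaranteed to see it, which is why the catalog snapshot must be at least that fresh.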
regards, tom lane
On 2013-06-05 11:35:58 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-06-05 15:28:09 +0100, Greg Stark wrote:
I thought there were many call sites that were specifically depending
on seeing dirty reads to avoid race conditions with other backends --
which probably just narrowed the race condition or created different
ones.
But SnapshotNow doesn't allow you to do actual dirty reads?
Yeah. I believe the issue is that we can't simply do MVCC catalog reads
with a snapshot taken at transaction start time or statement start time,
as we would do if executing an MVCC scan for a user query. Rather, the
snapshot has to be recent enough to ensure we see the current definition
of any table we've just acquired lock on, *even if that's newer than the
snapshot prevailing for the user's purposes*. Otherwise we might be
using the wrong rowtype definition or failing to enforce a just-added
constraint.
Oh, definitely. At least Robert's previous prototype tried to do that
(although I am not sure if it went far enough). And I'd be surprised if
the current one didn't.
The last time we talked about this, we were batting around ideas of
keeping a "current snapshot for catalog purposes", which we'd update
or at least invalidate anytime we acquired a new lock. (In principle,
if that isn't new enough, we have a race condition that we'd better fix
by adding some more locking.) Robert's results seem to say that that
might be unnecessary optimization, and that it'd be sufficient to just
take a new snap each time we need to do a catalog scan. TBH I'm not
sure I believe that; it seems to me that this approach is surely going
to create a great deal more contention from concurrent GetSnapshotData
calls.
I still have a hard time believing those results as well, but I think we
might have underestimated the effectiveness of the syscache during
workloads which are sufficiently concurrent to make locking in
GetSnapshotData() a problem.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 5, 2013 at 10:28 AM, Greg Stark <stark@mit.edu> wrote:
On Wed, May 22, 2013 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've had a number of discussions about the evils of SnapshotNow. As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.
I was always under the impression that the problem was we weren't
quite sure what changes would be needed to make mvcc-snapshots work
for the catalog lookups. The semantics of SnapshotNow aren't terribly
clear either but we have years of experience telling us they seem to
basically work. Most of the problems we've run into we have
worked around in the catalog accesses. Nobody really knows how many of
the call sites will need different logic to behave properly with mvcc
snapshots.
With all respect, I think this is just plain wrong. SnapshotNow is
just like an up-to-date MVCC snapshot. The only difference is that an
MVCC snapshot, once established, stays fixed for the lifetime of the
scan. On the other hand, the SnapshotNow view of the world changes
the instant another transaction commits, meaning that scans can see
multiple versions of a row, or no versions of a row, where any MVCC
scan would have seen just one. There are very few places that want
that behavior.
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate. Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.
But in most other parts of the code, the changes-in-mid-scan behavior
of SnapshotNow is a huge liability. The fact that it is fully
up-to-date *as of the time the scan starts* is critical for
correctness. But the fact that it can then change during the scan is
in almost every case something that we do not want. The patch
preserves the first property while ditching the second one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate. Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.
FWIW, I think if we're going to ditch SnapshotNow we should ditch
SnapshotNow, full stop, even removing the tqual.c routines for it.
Then we can require that *any* reference to SnapshotNow is replaced by
an MVCC reference prior to execution, and throw an error if we actually
try to test a tuple with that snapshot. If we don't do it like that
I think we'll have errors of omission all over the place (I had really
no confidence in your original patch because of that worry). The fact
that there are a couple of contrib modules for which there might be an
arguable advantage in doing it the old way isn't sufficient reason to
expose ourselves to bugs like that. If they really want that sort of
uncertain semantics they could use SnapshotDirty, no?
regards, tom lane
On Wed, Jun 5, 2013 at 6:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate. Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.

FWIW, I think if we're going to ditch SnapshotNow we should ditch
SnapshotNow, full stop, even removing the tqual.c routines for it.
Then we can require that *any* reference to SnapshotNow is replaced by
an MVCC reference prior to execution, and throw an error if we actually
try to test a tuple with that snapshot. If we don't do it like that
I think we'll have errors of omission all over the place (I had really
no confidence in your original patch because of that worry). The fact
that there are a couple of contrib modules for which there might be an
arguable advantage in doing it the old way isn't sufficient reason to
expose ourselves to bugs like that. If they really want that sort of
uncertain semantics they could use SnapshotDirty, no?
I had the same thought, initially. I went through the exercise of
doing a grep for SnapshotNow and trying to eliminate as many
references as possible, but there were a few that I couldn't convince
myself to rip out. However, if you'd like to apply the patch and grep
for SnapshotNow and suggest what to do about the remaining cases (or
hack the patch up yourself) I think that would be great. I'd love to
see it completely gone if we can see our way clear to that.
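The grep exercise described above can be approximated in one pass; a
minimal sketch, run here against a throwaway stand-in tree rather than a
real PostgreSQL checkout (the paths and file contents are invented for
illustration):

```shell
# Build a tiny stand-in source tree; a real audit would run from the
# top of a PostgreSQL checkout instead.
mkdir -p /tmp/snapnow_audit/src/backend
cat > /tmp/snapnow_audit/src/backend/scan.c <<'EOF'
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
scan = heap_beginscan(rel, GetLatestSnapshot(), 0, NULL);
EOF

# List every remaining SnapshotNow reference, with file and line number.
grep -rn "SnapshotNow" /tmp/snapnow_audit/src
```

Each hit in the output is one call site that still needs a decision:
convert to an MVCC snapshot, or justify keeping the old semantics.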
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-06-05 18:56:28 -0400, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
gathering statistical information, so there's no harm in having the
snapshot change part-way through the scan, and if the scan is long,
the user might actually regard the results under SnapshotNow as more
accurate. Whether that's the case or not, holding back xmin for those
kinds of scans does not seem wise.

FWIW, I think if we're going to ditch SnapshotNow we should ditch
SnapshotNow, full stop, even removing the tqual.c routines for it.
Then we can require that *any* reference to SnapshotNow is replaced by
an MVCC reference prior to execution, and throw an error if we actually
try to test a tuple with that snapshot.
I suggest simply renaming it to something else. Every external project
that decides to follow the renaming either has a proper use case for it,
or the amount of sympathy for them keeping the old behaviour is rather
limited.
If they really want that sort of uncertain semantics they could use
SnapshotDirty, no?
Not that happy with that. A scan behaving inconsistently over its
lifetime is something rather different from reading uncommitted
rows. I have the feeling that trouble lies that way.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services