pg_dump --snapshot

Started by Simon Riggs · almost 13 years ago · 38 messages · pgsql-hackers
#1Simon Riggs
simon@2ndQuadrant.com

Patch to allow pg_dump to use a snapshot exported with an explicit
pg_export_snapshot() for when precise timing of the snapshot is
important.

This overrides the internally generated snapshot in parallel pg_dump.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

pg_dump_snapshot.v1.patch (application/octet-stream, +28 -16)
#2Andrew Dunstan
andrew@dunslane.net
In reply to: Simon Riggs (#1)
Re: pg_dump --snapshot

On 05/06/2013 10:56 AM, Simon Riggs wrote:

Patch to allow pg_dump to use a snapshot exported with an explicit
pg_export_snapshot() for when precise timing of the snapshot is
important.

This overrides the internally generated snapshot in parallel pg_dump.

Could you be a bit more expansive about the use case, please?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Andrew Dunstan (#2)
Re: pg_dump --snapshot

On 6 May 2013 16:02, Andrew Dunstan <andrew@dunslane.net> wrote:

On 05/06/2013 10:56 AM, Simon Riggs wrote:

Patch to allow pg_dump to use a snapshot exported with an explicit
pg_export_snapshot() for when precise timing of the snapshot is
important.

This overrides the internally generated snapshot in parallel pg_dump.

Could you be a bit more expansive about the use case, please?

Sure.

Exported snapshots allow you to coordinate a number of actions
together, so they all see a common view of the database. So this patch
allows a very general approach to this, much more so than pg_dump
allows currently since the exact timing of the snapshot is not
controlled by the user.

For example, you may wish to create a sequence of materialized views
all using the same snapshot, while at the same time dumping the
database, then follow that by dumping the mat views also. Or you may
wish to dump the database at the exact moment between two tasks,
rather than wait for pg_dump to startup before commencing the next
task.
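A sketch of that coordination might look like the following (hedged: the `--snapshot` option is the one proposed by this patch, the snapshot ID and the `mv_orders`/`orders` names are illustrative, and the exporting session must keep its transaction open for as long as the others want to import the snapshot):

```shell
# Session 1 (keep this psql session open -- the exported snapshot is only
# valid while the exporting transaction remains open):
#   BEGIN;
#   SELECT pg_export_snapshot();   -- suppose it returns '000004F6-1'

# Session 2: build a materialized view against the very same snapshot
psql -d mydb <<'SQL'
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000004F6-1';
CREATE MATERIALIZED VIEW mv_orders AS SELECT * FROM orders;
COMMIT;
SQL

# Session 3: dump the database as of that same snapshot,
# using the option proposed in this patch
pg_dump --snapshot '000004F6-1' mydb > mydb.dump
```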

The original pg_snapclone code from 2008 allowed an export of the
snapshot and then an import of the snapshot explicitly into pg_dump in
much the same way shown here. This patch would be the only aspect
missing from that original feature set: explicit control. (Thanks for
following up to get it all into production code).

Logical replication would be one of the users of this also, but it is
a standalone feature which complements the existing capabilities.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#3)
Re: pg_dump --snapshot

Simon Riggs <simon@2ndQuadrant.com> writes:

On 6 May 2013 16:02, Andrew Dunstan <andrew@dunslane.net> wrote:

On 05/06/2013 10:56 AM, Simon Riggs wrote:

This overrides the internally generated snapshot in parallel pg_dump.

Could you be a bit more expansive about the use case, please?

Exported snapshots allow you to coordinate a number of actions
together, so they all see a common view of the database. So this patch
allows a very general approach to this, much more so than pg_dump
allows currently since the exact timing of the snapshot is not
controlled by the user.

I'm afraid that this is institutionalizing a design deficiency in
pg_dump; namely that it takes its snapshot before acquiring locks.
Ideally that would happen the other way around. I don't have a good
idea how we could fix that --- but a feature that allows imposition
of an outside snapshot will permanently foreclose ever fixing it.

What's more, this would greatly widen the risk window between when
the snapshot is taken and when we have all the locks and can have
some confidence that the DB isn't changing under us.

Or in short: -1 for the very concept of letting the user control
pg_dump's snapshot.

regards, tom lane


#5Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: pg_dump --snapshot

On 2013-05-06 13:07:17 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 6 May 2013 16:02, Andrew Dunstan <andrew@dunslane.net> wrote:

On 05/06/2013 10:56 AM, Simon Riggs wrote:

This overrides the internally generated snapshot in parallel pg_dump.

Could you be a bit more expansive about the use case, please?

Exported snapshots allow you to coordinate a number of actions
together, so they all see a common view of the database. So this patch
allows a very general approach to this, much more so than pg_dump
allows currently since the exact timing of the snapshot is not
controlled by the user.

I'm afraid that this is institutionalizing a design deficiency in
pg_dump; namely that it takes its snapshot before acquiring locks.
Ideally that would happen the other way around. I don't have a good
idea how we could fix that --- but a feature that allows imposition
of an outside snapshot will permanently foreclose ever fixing it.

What's more, this would greatly widen the risk window between when
the snapshot is taken and when we have all the locks and can have
some confidence that the DB isn't changing under us.

The initial transaction that exports the snapshot would need to hold
locks until pg_dump has started :/.

Or in short: -1 for the very concept of letting the user control
pg_dump's snapshot.

It's rather useful if you e.g. want to instantiate a new replica without
rebuilding pg_dump/pg_restore's capabilities wrt. ordering, parallelism,
separating the initial data load from index creation, and all that. Which
has already been incompletely reinvented by several solutions :(.

So, besides the above and the real problems you point out, this seems
worthwhile to me...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#4)
Re: pg_dump --snapshot

On 6 May 2013 18:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Or in short: -1 for the very concept of letting the user control
pg_dump's snapshot.

That API is already exposed, so I'm not sure why you say this now. It
has been in PG since early in 9.2, about 2 years ago.

In any case, "flashback database" is one of the most requested
features I know of... the ability to dump the database as it appeared
in the past *after* that point has passed. I call it by its name as
used in Oracle, but many people have taken the trouble to describe
that in detail to me, even without knowing a similar feature existed
elsewhere.

So it will always be important to be able to do SET TRANSACTION
SNAPSHOT 'a point in the past'; and if we can do that, why not in
pg_dump also?
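For reference, importing an exported snapshot with the 9.2 API looks like this (the snapshot ID is illustrative; the importing transaction must be REPEATABLE READ or SERIALIZABLE, and SET TRANSACTION SNAPSHOT must run before any query in that transaction):

```sql
-- Exporting session (must remain open while others import):
BEGIN;
SELECT pg_export_snapshot();   -- e.g. returns '000004F6-1'

-- Importing session:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000004F6-1';
-- queries here see exactly the exporter's snapshot
COMMIT;
```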

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#7Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#5)
Re: pg_dump --snapshot

* Andres Freund (andres@2ndquadrant.com) wrote:

It's rather useful if you e.g. want to instantiate a new replica without
rebuilding pg_dump/pg_restore's capabilities wrt. ordering, parallelism,
separating the initial data load from index creation, and all that. Which
has already been incompletely reinvented by several solutions :(.

Perhaps it's just a wording thing, but I wouldn't use the term "replica"
when referring to something built with pg_dump/restore; that should
really be reserved for a slave system created through replication.

So, besides the above and the real problems you point out, this seems
worthwhile to me...

It certainly sounds interesting and I like the idea of it, but perhaps
we need a different mechanism than just passing in a raw snapshot, to
address the concerns that Tom raised.

Thanks,

Stephen

#8Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#7)
Re: pg_dump --snapshot

On 2013-05-06 14:35:14 -0400, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

It's rather useful if you e.g. want to instantiate a new replica without
rebuilding pg_dump/pg_restore's capabilities wrt. ordering, parallelism,
separating the initial data load from index creation, and all that. Which
has already been incompletely reinvented by several solutions :(.

Perhaps it's just a wording thing, but I wouldn't use the term "replica"
when referring to something built with pg_dump/restore- that should
really be reserved for a slave system created through replication.

Well, part of the use case *is* using it for the cloning step in a
replication solution instead of open-coding it there. E.g. londiste and
slony both have home-grown hacks to do this.

So, besides the above and the real problems you point out, this seems
worthwhile to me...

It certainly sounds interesting and I like the idea of it, but perhaps
we need a different mechanism than just passing in a raw snapshot, to
address the concerns that Tom raised.

If there is anything which isn't magnitudes more complex, I'd be
interested. But given we couldn't even find a sensible solution for
pg_dump internally I don't have all that high hopes...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#9Greg Stark
stark@mit.edu
In reply to: Simon Riggs (#6)
Re: pg_dump --snapshot

On Mon, May 6, 2013 at 6:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

In any case, "flashback database" is one of the most requested
features I know of... the ability to dump the database as it appeared
in the past *after* that point has passed.

FWIW, that's not what flashback database does. It rolls back the whole
database to that earlier point in time. It's equivalent to running
recovery but backwards in time.

Obviously if you had the ability to dump the database as of an earlier
point in time you could do a complete dump and then a complete restore
and effectively have accomplished the same thing. But that's like
calling pg_dump an incremental backup.

--
greg


#10Simon Riggs
simon@2ndQuadrant.com
In reply to: Stephen Frost (#7)
Re: pg_dump --snapshot

On 6 May 2013 19:35, Stephen Frost <sfrost@snowman.net> wrote:

It certainly sounds interesting and I like the idea of it, but perhaps
we need a different mechanism than just passing in a raw snapshot, to
address the concerns that Tom raised.

It does *not* pass in a raw snapshot. All it does is to allow pg_dump
to use an API that is already exposed by the backend for this very
purpose, one that has been in Postgres since 9.2.
http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

Minor patch, no amazing new functionality, no drama.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Greg Stark (#9)
Re: pg_dump --snapshot

On 6 May 2013 19:48, Greg Stark <stark@mit.edu> wrote:

On Mon, May 6, 2013 at 6:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

In any case, "flashback database" is one of the most requested
features I know of... the ability to dump the database as it appeared
in the past *after* that point has passed.

Fwiw that's not what flashback database does. It rolls back the whole
database to that earlier point in time. it's equivalent to running
recovery but backwards in time.

Obviously if you had the ability to dump the database as of an earlier
point in time you could do a complete dump and then a complete restore
and effectively have accomplished the same thing.

OK, so you know Oracle. So then you also know that there are actually
a number of related features all called "flashback <something>", all
interrelated. What I meant by using their term was just a broad
reference to that capability, not an exact 1:1 match. Most people
requesting this have not asked for it by that name.

At the database level, it rolls back the whole kaboodle. Not what I
meant at all and I would expect people to start twitching at the
prospect.

The feature we have in PG9.2+ is the ability to set a transaction
snapshot to a snapshot that existed in the database at some point,
invoked by some external reference to it. The external reference is
the piece of information that must be specified by the user to allow
the database to look backwards. At the moment we can only specify a
snapshot from a currently running transaction, i.e. the recent past. I
foresee a feature that will allow us to look back further, possibly
with some restrictions, though certainly read only. There is a wide
stretch of water between current state and making that work, but the
existence of an ability to specify an external reference to a snapshot
is pivotal to that future capability and I would not wish to see that
capability removed.

This patch only allows pg_dump to use the existing API. As an example,
we would use it like this.

Session 1:
BEGIN;
SELECT pg_export_snapshot(); -- returns a textual reference to the
                             -- internally held snapshot
 pg_export_snapshot
--------------------
 000004F6-1
(1 row)

Session 2 -- some other user of the same snapshot:
pg_dump --snapshot '000004F6-1' database1

Session 3 -- yet another user of the same snapshot:
pg_dump --snapshot '000004F6-1' database2
...and so on for other programs.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#10)
Re: pg_dump --snapshot

Simon Riggs <simon@2ndQuadrant.com> writes:

It does *not* pass in a raw snapshot. All it does is to allow pg_dump
to use an API that is already exposed by the backend for this very
purpose, one that has been in Postgres since 9.2.
http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

Minor patch, no amazing new functionality, no drama.

You're ignoring the objection ...

The snapshot-transfer facility is intended for supporting, in essence,
multi-threaded closed applications. In such a context we can expect
that the leader knows enough about the followers to predict which locks
need to be acquired before the to-be-shared snapshot is acquired.

Exposing that in pg_dump, without doing a lick of other work (which is
what I take to be your proposal), would leave us with a situation
wherein an application wishing to invoke pg_dump safely would need to
know what locks pg_dump will take --- something that's rather a moving
target. If it gets that wrong, it will be at risk of obtaining
inconsistent dumps without any warning.

I think a minimum requirement before we can accept this feature is that
there be a way to obtain all the same locks that pg_dump would get when
given the same command-line arguments. This would, perhaps, be a good
test criterion for the fabled library-ization of pg_dump.

regards, tom lane


#13Stephen Frost
sfrost@snowman.net
In reply to: Simon Riggs (#10)
Re: pg_dump --snapshot

Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:

On 6 May 2013 19:35, Stephen Frost <sfrost@snowman.net> wrote:

It certainly sounds interesting and I like the idea of it, but perhaps
we need a different mechanism than just passing in a raw snapshot, to
address the concerns that Tom raised.

It does *not* pass in a raw snapshot.

It wasn't my intent to impart anything more specific than what
pg_export_snapshot() returns when I said 'raw snapshot'. What would you
call it? Snapshot identifier? All I was trying to say is that I agree
with Tom that pg_dump really needs more to happen than simply having the
results of pg_export_snapshot() passed to it; pg_dump wants all the
necessary locks taken immediately after the transaction opens, and
pg_export_snapshot() simply doesn't do that.

All it does is to allow pg_dump
to use an API that is already exposed by the backend for this very
purpose, one that has been in Postgres since 9.2.

In doing so it opens a much larger hole through which this approach can
break, namely that objects could disappear between the snapshot being
taken and appropriate locks being set up. That issue already exists in
pg_dump, but at least it's a small window through one code path, and it
all happens before any parallelization or snapshot-sharing happens, as
best as I can tell.

If I understand correctly, right now we have:

connect to DB
start a transaction
run around and grab locks
get our snapshot ID
fork, connect in, glob onto the same snapshot

Assuming I'm following along correctly, this change would be:

someone in a far off land creates the snapshot
time passes..
then:
connect to DB
set the who-knows-how-old snapshot ID
run around and try to grab locks
fork, connect in, glob onto the same snapshot

One thing that, I believe, addresses this a bit is that we should at
least bomb out with an error while we're trying to acquire the locks,
should an object be dropped between transaction start and when we go to
lock it, right? We'll still see the old version of pg_class from the
start of the transaction and therefore we'll try to lock everything from
the older viewpoint...?

For my part, I'm a bit less worried about error cases around this,
provided that they're made very clear and that they're quite obvious to
the end user, and very worried about us possibly missing some object
that we were expected to capture.

In any case, would a function along the lines of
"pg_export_and_lock_snapshot()", which basically starts a transaction,
acquires locks on every object in the DB, and then returns the snapshot
ID, address this? Perhaps even pg_dump could use that, on the belief
that it'd be faster for a backend function to acquire those locks than
for pg_dump to do it anyway? I'm not sure that we want to support every
different combination of filtering options that pg_dump supports for
this, but we could start with the "all-database" option since that's, by
definition, the largest set which could be requested by pg_dump.
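A rough sketch of what such a function might do, expressed here as plain SQL run immediately after opening the transaction (hedged: pg_export_and_lock_snapshot() is hypothetical; a real implementation would do this server-side, and even then a tiny window remains between the transaction's first catalog read, which fixes the snapshot, and the locks being taken):

```sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Take ACCESS SHARE locks (the lock level pg_dump uses)
-- on every ordinary user table
DO $$
DECLARE
    t text;
BEGIN
    FOR t IN
        SELECT format('%I.%I', n.nspname, c.relname)
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relkind = 'r'
          AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    LOOP
        EXECUTE 'LOCK TABLE ' || t || ' IN ACCESS SHARE MODE';
    END LOOP;
END
$$;

-- Only now hand out the snapshot reference
SELECT pg_export_snapshot();
```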

Or perhaps even a new lock type for this... I'd love to see the
"locking-all-objects" portion of time disappear from my nightly
backups..

Thanks,

Stephen

#14Greg Stark
stark@mit.edu
In reply to: Simon Riggs (#11)
Re: pg_dump --snapshot

On Mon, May 6, 2013 at 10:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

At the database level, it rolls back the whole kaboodle. Not what I
meant at all and I would expect people to start twitching at the
prospect.

I think it would be pretty sweet but we don't have the infrastructure
for it. We would need to retain enough information in the WAL log (or
somewhere else) to reverse the records.

The feature we have in PG9.2+ is the ability to set a transaction
snapshot to a snapshot that existed in the database at some point,
invoked by some external reference to it. The external reference is
the piece of information that must be specified by the user to allow
the database to look backwards. At the moment we can only specify a
snapshot from a currently running transaction, i.e. the recent past. I
foresee a feature that will allow us to look back further, possibly
with some restrictions, though certainly read only.

This is similar to "flashback query". And I think you're right that to
be comparable with Oracle's features we would need some option to
specify the snapshot based on time or wal position. And fwiw I think
it could still be read-write in consistent-read or serializable mode.
If you tried to update any records that had been updated since you
would get a serialization failure.

So I just did some research. It seems Oracle's equivalent of pg_dump
"expdp" does use flashback internally to guarantee consistency in some
cases which is perhaps analogous to how pg_dump uses snapshots to
synchronize multiple sessions (though it sounds like Oracle uses it
for cases that just work in Postgres).

But more interestingly expdp does in fact have a user option to
specify a timestamp or scn (analogous to wal position) and use
flashback query to dump the data at that point in time. That's a
pretty clear parallel to what you propose here.

--
greg


#15Stephen Frost
sfrost@snowman.net
In reply to: Greg Stark (#14)
Re: pg_dump --snapshot

* Greg Stark (stark@mit.edu) wrote:

On Mon, May 6, 2013 at 10:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

At the database level, it rolls back the whole kaboodle. Not what I
meant at all and I would expect people to start twitching at the
prospect.

I think it would be pretty sweet but we don't have the infrastructure
for it. We would need to retain enough information in the WAL log (or
somewhere else) to reverse the records.

Let me start off by saying that I do like the general idea. We're
rather different from Oracle, which makes me wonder if we might be in a
slightly better position to handle this kind of option.

For example, I'm not sure that we need more information in the WAL..
What we need is a way to tell VACUUM to skip over 'recently modified'
records and not mark them as dead until some time has passed. This is
essentially what we're already doing with the feedback mechanism in
replication, isn't it? Would it be possible to make that a simple timer
instead of requiring a replication system which is feeding back that
information based on queries which are running on the replica?

But more interestingly expdp does in fact have a user option to
specify a timestamp or scn (analogous to wal position) and use
flashback query to dump the data at that point in time. That's a
pretty clear parallel to what you propose here.

What happens to objects which have been dropped after the user-specified
flashback point? My guess is that they simply aren't there for the user
to go look at, but perhaps I'm wrong. Either way though, I'm not sure
that we'd accept such a poor solution to that problem. For that matter,
what do we do today wrt replication feedback when/if that happens?

Thanks,

Stephen

#16Joe Conway
mail@joeconway.com
In reply to: Stephen Frost (#15)
Re: pg_dump --snapshot


On 05/06/2013 03:00 PM, Stephen Frost wrote:

For example, I'm not sure that we need more information in the
WAL.. What we need is a way to tell VACUUM to skip over 'recently
modified' records and not mark them as dead until some time has
passed. This is essentially what we're already doing with the
feedback mechanism in replication, isn't it? Would it be possible
to make that a simple timer instead of requiring a replication
system which is feeding back that information based on queries
which are running on the replica?

This sounds a lot like having an "idle in transaction" session hanging
around. Could we use something akin to a prepared transaction to
preserve the ability to use the snapshot? Maybe with an (optional)
expiration timestamp?

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#12)
Re: pg_dump --snapshot

On 6 May 2013 22:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

It does *not* pass in a raw snapshot. All it does is to allow pg_dump
to use an API that is already exposed by the backend for this very
purpose, one that has been in Postgres since 9.2.
http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

Minor patch, no amazing new functionality, no drama.

You're ignoring the objection ...

No, I just don't see a problem at all.

Locks and snapshots have got nothing to do with each other, in
Postgres. Taking a snapshot doesn't imply that database objects are
locked; whoever takes the snapshot should lock things first, if they
are worried by that.

If anybody really wanted to fix pg_dump, they could do. If that was so
important, why block this patch, but allow parallel pg_dump to be
committed without it?

There is no risk that is larger than the one already exposed by the
existing user API.

If you do see a risk in the existing API, please deprecate it and
remove it from the docs, or mark it not-for-use-by-users. I hope you
don't, but if you do, do it now - I'll be telling lots of people about
all the useful things you can do with it over the next few years,
hopefully in pg_dump as well.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#18Stephen Frost
sfrost@snowman.net
In reply to: Simon Riggs (#17)
Re: pg_dump --snapshot

Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:

If anybody really wanted to fix pg_dump, they could do. If that was so
important, why block this patch, but allow parallel pg_dump to be
committed without it?

Because parallel pg_dump didn't make the problem any *worse*..? This
does. The problem existed before parallel pg_dump.

There is no risk that is larger than the one already exposed by the
existing user API.

The API exposes it, yes, but *pg_dump* isn't any worse than it was
before.

If you do see a risk in the existing API, please deprecate it and
remove it from the docs, or mark it not-for-use-by-users. I hope you
don't, but if you do, do it now - I'll be telling lots of people about
all the useful things you can do with it over the next few years,
hopefully in pg_dump as well.

pg_dump uses it already and uses it as best it can. Users could use it
also, provided they understand the constraints around it. However,
there really isn't a way for users to use this new option correctly-
they would need to intuit what pg_dump will want to lock, lock it
immediately after their transaction is created, and only *then* get the
snapshot ID and pass it to pg_dump, hoping against hope that pg_dump
will actually need the locks that they decided to acquire..

Thanks,

Stephen

#19Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#18)
Re: pg_dump --snapshot

On 2013-05-06 20:18:26 -0400, Stephen Frost wrote:

Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:

If anybody really wanted to fix pg_dump, they could do. If that was so
important, why block this patch, but allow parallel pg_dump to be
committed without it?

Because parallel pg_dump didn't make the problem any *worse*..? This
does. The problem existed before parallel pg_dump.

Yes, it did.

There is no risk that is larger than the one already exposed by the
existing user API.

The API exposes it, yes, but *pg_dump* isn't any worse than it was
before.

No, but it's still broken. pg_dump without the parameter being passed
isn't any worse off after the patch has been applied. With the parameter
the window gets a bit bigger, sure...

If you do see a risk in the existing API, please deprecate it and
remove it from the docs, or mark it not-for-use-by-users. I hope you
don't, but if you do, do it now - I'll be telling lots of people about
all the useful things you can do with it over the next few years,
hopefully in pg_dump as well.

pg_dump uses it already and uses it as best it can. Users could use it
also, provided they understand the constraints around it. However,
there really isn't a way for users to use this new option correctly-
they would need to intuit what pg_dump will want to lock, lock it
immediately after their transaction is created, and only *then* get the
snapshot ID and pass it to pg_dump, hoping against hope that pg_dump
will actually need the locks that they decided to acquire..

Given that we don't have all that many types of objects we can lock,
that task isn't all that complicated. But I'd guess a very common usage
is to take the snapshot and immediately fork pg_dump. In that case the
window between snapshot acquisition and reading the object list is
probably smaller than the one between reading the object list and
locking.

This all reads like a textbook case of "perfect is the enemy of good" to
me.

A rather useful feature has to fix a bug in pg_dump which a) has existed
for ages, b) has yet to be reported to the lists, and c) is rather
complicated to fix and quite possibly requires proper snapshots for its
internals?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#20Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#19)
Re: pg_dump --snapshot

On 2013-05-07 02:53:16 +0200, Andres Freund wrote:

A rather useful feature has to fix a bug in pg_dump which a) has existed
for ages, b) has yet to be reported to the lists, and c) is rather
complicated to fix and quite possibly requires proper snapshots for its
internals?

Just to clarify: I think this worth fixing, but it just seems like
something that needs to be fixed independently from this feature.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#21Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#19)
#22Craig Ringer
craig@2ndquadrant.com
In reply to: Joe Conway (#16)
#23Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#21)
#24Simon Riggs
simon@2ndQuadrant.com
In reply to: Stephen Frost (#18)
#25Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#23)
#26Stephen Frost
sfrost@snowman.net
In reply to: Simon Riggs (#24)
#27Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#25)
#28Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#25)
#30Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#29)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
#33Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#32)
#34Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#28)
#35Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#29)
#36bricklen
bricklen@gmail.com
In reply to: Dimitri Fontaine (#33)
#37Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
#38Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#34)