TRUNCATE SERIALIZABLE and frozen COPY
For 9.2 we discussed having COPY setting tuples as frozen. Various
details apply.
Earlier threads:
"RFC: Making TRUNCATE more "MVCC-safe"
"COPY with hints, rebirth"
I was unhappy with changing the behaviour of TRUNCATE, and still am.
So the proposal here is to have a specific modifier on TRUNCATE
command that makes it MVCC safe by throwing a serialization error.
That new behaviour should be requestable by adding the SERIALIZABLE
keyword.
i.e. TRUNCATE foo SERIALIZABLE;
This then allows a new style of TRUNCATE, yet without modiying
behaviour of earlier programs (ugh!).
(Once we have this, somebody that cares more than me may wish to
discuss deprecation of existing TRUNCATE behaviour in favour of this
new behaviour. I am not suggesting that here, nor do I even see a
reason for that at present.)
As soon as we have TRUNCATE SERIALIZABLE we can then enable normal
COPY to set tuples as frozen when
1) no earlier snapshots exist in the current transaction - since they
might see frozen tuples loaded in a later command and thus cause MVCC
violation
2) table has been truncated in this subtransaction by a TRUNCATE SERIALIZABLE
That then means the optimization would be available for normal data
loads/reloads.
This looks like a fairly neat way to allow MVCC-aware TRUNCATE for
those that want it, without affecting existing code and yet speeding
up large loads.
(Note that I am not suggesting any change to existing CREATE TABLE
behaviour. If you want this optimization, just add in TRUNCATE
SERIALIZABLE).
Thoughts?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
For 9.2 we discussed having COPY setting tuples as frozen. Various
details apply.
Earlier threads:
"RFC: Making TRUNCATE more "MVCC-safe"
"COPY with hints, rebirth"I was unhappy with changing the behaviour of TRUNCATE, and still am.
So the proposal here is to have a specific modifier on TRUNCATE
command that makes it MVCC safe by throwing a serialization error.
I don't think I understand the proposal. Under what circumstances
would it throw a serialization error?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 8 November 2012 17:07, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
For 9.2 we discussed having COPY setting tuples as frozen. Various
details apply.
Earlier threads:
"RFC: Making TRUNCATE more "MVCC-safe"
"COPY with hints, rebirth"I was unhappy with changing the behaviour of TRUNCATE, and still am.
So the proposal here is to have a specific modifier on TRUNCATE
command that makes it MVCC safe by throwing a serialization error.I don't think I understand the proposal. Under what circumstances
would it throw a serialization error?
If you ask for TRUNCATE SERIALIZABLE then if someone asks for data in
the table and has a snapshot that can see earlier data then it will
throw a serializable error. So its a new kind of TRUNCATE that is MVCC
safe.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/08/2012 08:51 PM, Simon Riggs wrote:
On 8 November 2012 17:07, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
For 9.2 we discussed having COPY setting tuples as frozen. Various
details apply.
Earlier threads:
"RFC: Making TRUNCATE more "MVCC-safe"
"COPY with hints, rebirth"I was unhappy with changing the behaviour of TRUNCATE, and still am.
So the proposal here is to have a specific modifier on TRUNCATE
command that makes it MVCC safe by throwing a serialization error.I don't think I understand the proposal. Under what circumstances
would it throw a serialization error?If you ask for TRUNCATE SERIALIZABLE then if someone asks for data in
the table and has a snapshot that can see earlier data then it will
throw a serializable error. So its a new kind of TRUNCATE that is MVCC
safe.
Can't we make it so that the reader with earlier snapshot sees the data
from the pre-truncation file ?
and we unlink the base file(s) only once nobody has a snapshot the can
see it ?
or are there some subtler problems (I was under impression that we
already did this as described above) ?
----------------
Hannu
On 8 November 2012 23:20, Hannu Krosing <hannu@2ndquadrant.com> wrote:
On 11/08/2012 08:51 PM, Simon Riggs wrote:
On 8 November 2012 17:07, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com>
wrote:For 9.2 we discussed having COPY setting tuples as frozen. Various
details apply.
Earlier threads:
"RFC: Making TRUNCATE more "MVCC-safe"
"COPY with hints, rebirth"I was unhappy with changing the behaviour of TRUNCATE, and still am.
So the proposal here is to have a specific modifier on TRUNCATE
command that makes it MVCC safe by throwing a serialization error.I don't think I understand the proposal. Under what circumstances
would it throw a serialization error?If you ask for TRUNCATE SERIALIZABLE then if someone asks for data in
the table and has a snapshot that can see earlier data then it will
throw a serializable error. So its a new kind of TRUNCATE that is MVCC
safe.Can't we make it so that the reader with earlier snapshot sees the data from
the pre-truncation file ?
We could... but that would require keeping a history of relfilenodes
for an object to allow for more than one TRUNCATE event. Tracking all
of that would be hard and I don't personally think its worth that
effort.
and we unlink the base file(s) only once nobody has a snapshot the can see
it ?
DELETE does that if that's the semantics you want.
or are there some subtler problems (I was under impression that we already
did this as described above) ?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/09/2012 09:34 AM, Simon Riggs wrote:
On 8 November 2012 23:20, Hannu Krosing <hannu@2ndquadrant.com> wrote:
On 11/08/2012 08:51 PM, Simon Riggs wrote:
On 8 November 2012 17:07, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com>
wrote:For 9.2 we discussed having COPY setting tuples as frozen. Various
details apply.
Earlier threads:
"RFC: Making TRUNCATE more "MVCC-safe"
"COPY with hints, rebirth"I was unhappy with changing the behaviour of TRUNCATE, and still am.
So the proposal here is to have a specific modifier on TRUNCATE
command that makes it MVCC safe by throwing a serialization error.I don't think I understand the proposal. Under what circumstances
would it throw a serialization error?If you ask for TRUNCATE SERIALIZABLE then if someone asks for data in
the table and has a snapshot that can see earlier data then it will
throw a serializable error. So its a new kind of TRUNCATE that is MVCC
safe.Can't we make it so that the reader with earlier snapshot sees the data from
the pre-truncation file ?We could... but that would require keeping a history of relfilenodes
for an object to allow for more than one TRUNCATE event.
MVCC does keep history of old relfilenodes.
I thought we were able to read old MVCC versions in pg_class for
this if the snapshot required data matching older pg_class record.
Show quoted text
Tracking all
of that would be hard and I don't personally think its worth that
effort.and we unlink the base file(s) only once nobody has a snapshot the can see
it ?DELETE does that if that's the semantics you want.
or are there some subtler problems (I was under impression that we already
did this as described above) ?
On Wed, Nov 7, 2012 at 5:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I was unhappy with changing the behaviour of TRUNCATE, and still am.
So the proposal here is to have a specific modifier on TRUNCATE
command that makes it MVCC safe by throwing a serialization error.
That new behaviour should be requestable by adding the SERIALIZABLE
keyword.
i.e. TRUNCATE foo SERIALIZABLE;
This then allows a new style of TRUNCATE, yet without modiying
behaviour of earlier programs (ugh!).
Personally I think the behavior should be dictated by the *reader*.
The one doing the truncation may not know about the consistency
requirements of particular readers. Especially when you do the
truncate via pg_restore or some other generic tool. And indeed
different readers may have different consistency requirements.
So I'd prefer if it were a GUC variable; readers that accept relaxed
consistency can set truncate_conflict=off
Granted, making this part of the TRUNCATE statement does give you
table-level granularity. But if a reader can already handle
serialization conflicts, it doesn't really matter which table they
came from.
But making TRUNCATE behave in a MVCC-safe way, like proposed by Hannu,
would be much better.
Regards,
Marti
On 9 November 2012 10:28, Marti Raudsepp <marti@juffo.org> wrote:
On Wed, Nov 7, 2012 at 5:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I was unhappy with changing the behaviour of TRUNCATE, and still am.
So the proposal here is to have a specific modifier on TRUNCATE
command that makes it MVCC safe by throwing a serialization error.
That new behaviour should be requestable by adding the SERIALIZABLE
keyword.
i.e. TRUNCATE foo SERIALIZABLE;
This then allows a new style of TRUNCATE, yet without modiying
behaviour of earlier programs (ugh!).Personally I think the behavior should be dictated by the *reader*.
The one doing the truncation may not know about the consistency
requirements of particular readers. Especially when you do the
truncate via pg_restore or some other generic tool. And indeed
different readers may have different consistency requirements.So I'd prefer if it were a GUC variable; readers that accept relaxed
consistency can set truncate_conflict=off
I proposed something similar earlier, but Robert said he didn't like
that. The way you've described it here makes more sense, whereas my
parameter name made it seem more arbitrary. So I think that looks like
the way to go.
I'm trying to use this as a way to optimize COPY, so we'd need to make
a Seq Scan return zero rows if the truncatexid is seen as running by
the snapshot, which is the current behaviour. That seems easy enough.
Granted, making this part of the TRUNCATE statement does give you
table-level granularity. But if a reader can already handle
serialization conflicts, it doesn't really matter which table they
came from.
But making TRUNCATE behave in a MVCC-safe way, like proposed by Hannu,
would be much better.
Maybe, but one of the reasons for having a separate TRUNCATE command
rather than DELETE is the immediately removal of space. Changing
TRUNCATE so it suddenly holds on to space for longer will force us to
create a new command that acts like the old TRUNCATE.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 9, 2012 at 8:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Personally I think the behavior should be dictated by the *reader*.
The one doing the truncation may not know about the consistency
requirements of particular readers. Especially when you do the
truncate via pg_restore or some other generic tool. And indeed
different readers may have different consistency requirements.So I'd prefer if it were a GUC variable; readers that accept relaxed
consistency can set truncate_conflict=offI proposed something similar earlier, but Robert said he didn't like
that. The way you've described it here makes more sense, whereas my
parameter name made it seem more arbitrary. So I think that looks like
the way to go.
Hmm, I don't remember saying I didn't like that. Maybe I disliked
something about a particular proposed implementation? Actually, I
don't really see the need for this to be customizable at all. I have
to believe that there is vanishingly little application code that
would care about this change in semantics, so why not just change the
behavior and call it good?
I think the question that hasn't really been adequately answered is:
where and how are we going to track conflicts? Your previous patch
involved storing an XID in pg_class, but I think we both found that a
bit grotty - it'd probably need special handling for wraparound, and I
think we came up with some related cases that couldn't be handled in
the same way without adding a bunch more XIDs to various places. I
don't really like the idea of having XIDs floating around in the
system catalogs - it seems like a recipe for bugs, not to mention that
storing ephemeral data in a persistent table seems like a mismatch.
What I've been wondering since this last came up is whether we could
use some variant of the SIREAD locks Kevin introduced for SSI to
handle this case - essentially have the transaction doing the TRUNCATE
make an entry in the lock table that will force a serialization
failure for any backend which accesses the table with a snapshot that
can't see the truncating transaction's XID. The lock table entry
would need some kind of deferred clean-up, so it doesn't go away until
the locker's XID precedes RecentGlobalXmin. Of course, an extra lock
table probe for every table access will be unacceptable from a
concurrency perspective, but we could probably optimize most of them
away by only checking the lock table if the pg_class row's own xmin is
new enough that the other backend's MVCC snapshot can't see it. A
recent update to pg_class doesn't imply the existing of a lock, but
the absence of any recent update to pg_class does imply that no lock
can exist.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
What I've been wondering since this last came up is whether we
could use some variant of the SIREAD locks Kevin introduced for SSI
to handle this case - essentially have the transaction doing the
TRUNCATE make an entry in the lock table that will force a
serialization failure for any backend which accesses the table with
a snapshot that can't see the truncating transaction's XID.
It seems to me that the goal would be to make this semantically
idential to the behavior users would see if an unqualified DELETE
were run against the table rather than a TRUNCATE. To wit:
(1) Any attempt to read from the truncated table would not block. The
pg_class version included in the transaction's snapshot would
determine which heap and indexes were accessed. If the reading
transaction were SERIALIZABLE, it would generate a read-write
conflict out to the truncating transaction.
(2) Any attempt to write to the truncated table would block until the
end of the transaction which is doing the truncation. If the
truncating transaction rolls back, it proceeds normally against the
old data. Otherwise: If the transaction is READ COMMITTED, follow the
pg_class update links. At more strict isolation levels, generate a
write conflict error.
I'm not sure where any new use of the predicate locking system would
come into play in that, other than properly handling read-write
conflicts when both transactions were SERIALIZABLE.
This seems like a subset of the issues which one might want to
address by making DDL statement behave in a more strictly MVCC
fashion. Does it make sense to pick those off one at a time, or
should something like this be done only in the context of an overall
plan to deal with all of it?
-Kevin
Import Notes
Resolved by subject fallback
On 9 November 2012 14:01, Robert Haas <robertmhaas@gmail.com> wrote:
I think the question that hasn't really been adequately answered is:
where and how are we going to track conflicts? Your previous patch
involved storing an XID in pg_class, but I think we both found that a
bit grotty - it'd probably need special handling for wraparound, and I
think we came up with some related cases that couldn't be handled in
the same way without adding a bunch more XIDs to various places. I
don't really like the idea of having XIDs floating around in the
system catalogs - it seems like a recipe for bugs, not to mention that
storing ephemeral data in a persistent table seems like a mismatch.
Yes, the xid only needs to be transient, not in pg_class.
What I've been wondering since this last came up is whether we could
use some variant of the SIREAD locks Kevin introduced for SSI to
handle this case - essentially have the transaction doing the TRUNCATE
make an entry in the lock table that will force a serialization
failure for any backend which accesses the table with a snapshot that
can't see the truncating transaction's XID. The lock table entry
would need some kind of deferred clean-up, so it doesn't go away until
the locker's XID precedes RecentGlobalXmin. Of course, an extra lock
table probe for every table access will be unacceptable from a
concurrency perspective, but we could probably optimize most of them
away by only checking the lock table if the pg_class row's own xmin is
new enough that the other backend's MVCC snapshot can't see it. A
recent update to pg_class doesn't imply the existing of a lock, but
the absence of any recent update to pg_class does imply that no lock
can exist.
I think the xid should still live in relcache, per the patch, but
should live in a transient place (and not pg_class).
We need a fast lookup structure that is expandable to accommodate
arbitrary numbers of truncates. Shared hash table, with some form of
overflow mechanism.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 9, 2012 at 4:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
We need a fast lookup structure that is expandable to accommodate
arbitrary numbers of truncates. Shared hash table, with some form of
overflow mechanism.
Surely you only need to remember the last completed truncate for each
relation? The latest one also invalidates any snapshots before earlier
truncates.
Regards,
Marti
On 9 November 2012 14:55, Marti Raudsepp <marti@juffo.org> wrote:
On Fri, Nov 9, 2012 at 4:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
We need a fast lookup structure that is expandable to accommodate
arbitrary numbers of truncates. Shared hash table, with some form of
overflow mechanism.Surely you only need to remember the last completed truncate for each
relation?
Yes
The latest one also invalidates any snapshots before earlier
truncates.
1 per table, arbirary number of tables
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9 November 2012 14:22, Kevin Grittner <kgrittn@mail.com> wrote:
Robert Haas wrote:
What I've been wondering since this last came up is whether we
could use some variant of the SIREAD locks Kevin introduced for SSI
to handle this case - essentially have the transaction doing the
TRUNCATE make an entry in the lock table that will force a
serialization failure for any backend which accesses the table with
a snapshot that can't see the truncating transaction's XID.It seems to me that the goal would be to make this semantically
idential to the behavior users would see if an unqualified DELETE
were run against the table rather than a TRUNCATE.
Unqualified DELETE already runs that way. TRUNCATE is a different
command for a reason. Making TRUNCATE like something we already have
seems not very useful to me, not least because it breaks existing
applications.
This seems like a subset of the issues which one might want to
address by making DDL statement behave in a more strictly MVCC
fashion. Does it make sense to pick those off one at a time, or
should something like this be done only in the context of an overall
plan to deal with all of it?
TRUNCATE is not DDL, plus I have no interest in this other than
speeding up COPY.
Scope creep just kills features.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 9, 2012 at 8:22 AM, Kevin Grittner <kgrittn@mail.com> wrote:
Robert Haas wrote:
What I've been wondering since this last came up is whether we
could use some variant of the SIREAD locks Kevin introduced for SSI
to handle this case - essentially have the transaction doing the
TRUNCATE make an entry in the lock table that will force a
serialization failure for any backend which accesses the table with
a snapshot that can't see the truncating transaction's XID.It seems to me that the goal would be to make this semantically
idential to the behavior users would see if an unqualified DELETE
were run against the table rather than a TRUNCATE. To wit:
but, triggers would not fire, right?
merlin
Merlin Moncure wrote:
Kevin Grittner <kgrittn@mail.com> wrote:
Robert Haas wrote:
It seems to me that the goal would be to make this semantically
idential to the behavior users would see if an unqualified DELETE
were run against the table rather than a TRUNCATE.but, triggers would not fire, right?
Right. Perhaps "identical" was too strong a word. I was referring to
the aspect under consideration here -- making it "serializable" in
line with other MVCC operations.
If we're not talking about making conflicts with other transactions
behave just the same as an unqualified DELETE from a user
perspective, I'm not sure what the goal is, exactly. Obviously we
would be keeping the guts of the implementation the same (swapping in
a new, empty heap).
-Kevin
Import Notes
Resolved by subject fallback
Simon Riggs wrote:
This seems like a subset of the issues which one might want to
address by making DDL statement behave in a more strictly MVCC
fashion. Does it make sense to pick those off one at a time, or
should something like this be done only in the context of an
overall plan to deal with all of it?TRUNCATE is not DDL
You're right, I should have said utility commands.
I have no interest in this other than speeding up COPY.
I would love to have that!
Scope creep just kills features.
Well, I wasn't saying it should all be *done* at the same time, but
this is not the only utility command which could benefit from such an
effort, and if each one is done with no consideration of what it
takes for them all to be done, we could wind up with something that
doesn't hang together very coherently. Per perhaps this one could
serve as a "pilot", to identify issues and help develop such a plan.
-Kevin
Import Notes
Resolved by subject fallback
On 9 November 2012 15:34, Kevin Grittner <kgrittn@mail.com> wrote:
If we're not talking about making conflicts with other transactions
behave just the same as an unqualified DELETE from a user
perspective, I'm not sure what the goal is, exactly.
Reasonable question.
My goal is to allow COPY to load frozen tuples without causing MVCC violations.
Altering TRUNCATE so it behaves perfectly from an MVCC/Serializable
perspective is a much bigger, and completely different goal, as well
as something I don't see as desirable anyway for at least 2 good
reasons, as explained. IMHO if people want MVCC/Serializable
semantics, use DELETE, possibly spending time to make unqualified
DELETE do some fancy TRUNCATE-like tricks with relfilenodes.
Forcing a tightly scoped proposal into a much wider one will just kill
this and leave it blocked.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9 November 2012 15:46, Simon Riggs <simon@2ndquadrant.com> wrote:
Altering TRUNCATE so it behaves perfectly from an MVCC/Serializable
perspective is a much bigger, and completely different goal, as well
as something I don't see as desirable anyway for at least 2 good
reasons, as explained. IMHO if people want MVCC/Serializable
semantics, use DELETE, possibly spending time to make unqualified
DELETE do some fancy TRUNCATE-like tricks with relfilenodes.
We spent a lot of time in 9.2 making TRUNCATE/reload of a table "just
work", rather than implementing a REPLACE command.
ISTM strange to throw away all that effort, changing behaviour of
TRUNCATE and thus forcing the need for a REPLACE command after all.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs wrote:
My goal is to allow COPY to load frozen tuples without causing MVCC
violations.
OK. That wasn't initially clear to me.
Forcing a tightly scoped proposal into a much wider one will just
kill this and leave it blocked.
The goal is important enough and narrow enough to merit the approach
you're talking about, IMV, at least. Sorry I initially misunderstood
what you were going for.
-Kevin
Import Notes
Resolved by subject fallback