Transactional DDL, but not Serializable

Started by Stephen Frost · about 15 years ago · 9 messages · hackers
#1 Stephen Frost
sfrost@snowman.net

Greetings,

We have a curious situation, consider this:

Process 1:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE table1 (i integer);
INSERT INTO table1 VALUES (13);

Process 2:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE table2 (i integer);
INSERT INTO table2 VALUES (123);
COMMIT;

Process 1:
SELECT * FROM pg_class WHERE relname = 'table2'; -- no rows returned
SELECT * FROM table2; -- works?! but at least no records returned
INSERT INTO table2 VALUES (456);
-- also works.. now we have a tuple in the table which appears to
-- have been added before the table existed..
COMMIT;

This happens, of course, because we use SysCache to look up table
names to Oids and that uses SnapshotNow. In my view, this violates
the basic principle of least surprise and means that while we have
transactional DDL, it's not really serializable (no, I don't
particularly care about that).

What I do worry about is that the bulk-load discussion going on could
be shot down because of this. Today we won't let the earlier transaction
see any records in the table because those tuples have a later xmin,
but if we were to insert those tuples with the frozen XID (as I
proposed in the other thread) then they'd be visible.

I don't believe fixing this would be terribly difficult and, I
believe, it would be similar to what we've done elsewhere (e.g. with
indexes): basically, add a 'createdxmin' column to pg_class and
then compare that against our transaction whenever we're doing
table lookups.
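As a conceptual illustration (not PostgreSQL source code, and not part of the original mail): the proposal amounts to filtering catalog lookups through the transaction's MVCC snapshot instead of SnapshotNow. A minimal sketch of that visibility test, with the `Snapshot` class, field names, and transaction ids all invented for illustration:

```python
# Hypothetical sketch: how an MVCC snapshot decides whether a catalog
# row (e.g. a pg_class entry carrying a 'createdxmin') would be visible
# to a transaction.  All names and xids here are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    xmin: int                      # oldest transaction still running at snapshot time
    xmax: int                      # first transaction id not yet assigned
    xip: frozenset = frozenset()   # transactions in progress at snapshot time

def created_before_snapshot(created_xmin: int, snap: Snapshot) -> bool:
    """True if the creating transaction had committed before the snapshot."""
    if created_xmin >= snap.xmax:
        return False               # creator started after the snapshot was taken
    if created_xmin in snap.xip:
        return False               # creator was still in progress at snapshot time
    return True

# Process 1 takes its snapshot before process 2 (say, xid 105) creates table2.
snap = Snapshot(xmin=100, xmax=101)
print(created_before_snapshot(105, snap))   # False: table2 would stay invisible
print(created_before_snapshot(90, snap))    # True: pre-existing tables stay visible
```

Under SnapshotNow, by contrast, the pg_class row is seen as soon as process 2 commits, which is exactly the surprise described above.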

Thoughts?

Thanks,

Stephen

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#1)
Re: Transactional DDL, but not Serializable

Stephen Frost <sfrost@snowman.net> writes:

I don't believe fixing this would be terribly difficult and, I
believe, it would be similar to what we've done elsewhere (e.g. with
indexes): basically, add a 'createdxmin' column to pg_class and
then compare that against our transaction whenever we're doing
table lookups.

Making DDL serializable is *not* simple, and half-baked hacks won't
make that situation better ...

regards, tom lane

#3 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#2)
Re: Transactional DDL, but not Serializable

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Making DDL serializable is *not* simple, and half-baked hacks won't
make that situation better ...

Sorry, that obviously didn't come across clearly (I think I've just been
talking to Kevin far too much).

I'm not interested in making them serializable. I'd like to not have
tables randomly appear during a serializable transaction.

Thanks,

Stephen

#4 Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#2)
Re: Transactional DDL, but not Serializable

Making DDL serializable is *not* simple, and half-baked hacks won't
make that situation better ...

That seemed unnecessary. Whether or not you approve of Stephen's solution, he is dealing with a real issue.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

#5 Stephen Frost
sfrost@snowman.net
In reply to: Josh Berkus (#4)
Re: Transactional DDL, but not Serializable

* Joshua Berkus (josh@agliodbs.com) wrote:

That seemed unnecessary. Whether or not you approve of Stephen's solution, he is dealing with a real issue.

The solution felt, to me at least, like it has a lot of parallels to an
index's indcheckxmin. We've dealt with this issue there and have a
precedent for how to deal with it. Based on discussions with other
folks it sounds like we may be forced to do it for constraints also, and
I think we'd want to try to deal with all of them in a similar way.

Perhaps the current solution for indexes is a hack and should be tossed
out with a wholesale replacement which solves all these problems, which
would certainly be quite a bit of work, but if that's necessary then
let's discuss it and get an idea down on a wiki somewhere about what
that should look like.

Thanks,

Stephen

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#3)
Re: Transactional DDL, but not Serializable

Stephen Frost <sfrost@snowman.net> writes:

Sorry, that obviously didn't come across clearly (I think I've just been
talking to Kevin far too much).

I'm not interested in making them serializable. I'd like to not have
tables randomly appear during a serializable transaction.

Well, basically, you can't have that. Example: you have an existing
table with primary key, and while you're in the middle of doing some
long transaction, somebody else creates a table with a foreign-key
reference to the one you're about to do a delete from. Being
serializable does not excuse you from the obligation to check for
FK violations in that "invisible" table. It might be acceptable to
fail entirely, but not to act as though the table isn't there.
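To make the hazard concrete (an editorial sketch, not part of the original mail): if the DELETE's foreign-key check were filtered through the deleting transaction's snapshot, the violation in the newly created table would be silently missed. The table names, xids, and the `fk_violations` helper below are all invented for illustration:

```python
# Hypothetical sketch of the scenario above: a "child" table created
# after our snapshot references the key we are about to delete.
def fk_violations(deleted_key, referencing_tables, snapshot_xmax, honor_snapshot):
    """referencing_tables maps name -> (created_xid, set of referenced keys).
    Return the names of tables whose rows still reference deleted_key."""
    hits = []
    for name, (created_xid, keys) in referencing_tables.items():
        if honor_snapshot and created_xid >= snapshot_xmax:
            continue               # table invisible to our snapshot: skipped!
        if deleted_key in keys:
            hits.append(name)
    return hits

# "child" (created by xid 105) references key 13; our snapshot ends at xid 101.
tables = {"child": (105, {13})}
print(fk_violations(13, tables, 101, honor_snapshot=True))    # []: violation missed
print(fk_violations(13, tables, 101, honor_snapshot=False))   # ['child']: caught
```

The snapshot-honoring variant lets the delete proceed and leaves a dangling reference behind, which is why the check must see up-to-the-minute DDL.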

regards, tom lane

#7 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#6)
Re: Transactional DDL, but not Serializable

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Well, basically, you can't have that. Example: you have an existing
table with primary key, and while you're in the middle of doing some
long transaction, somebody else creates a table with a foreign-key
reference to the one you're about to do a delete from. Being
serializable does not excuse you from the obligation to check for
FK violations in that "invisible" table. It might be acceptable to
fail entirely, but not to act as though the table isn't there.

That's an excellent example and point. Is there a 'right' answer (with
regard to the SQL spec, what other databases do, etc.)? When you go to
delete a record from the existing table you could get an FK violation due
to the invisible table, which could end up being rolled back and
removed.

It seems like the semantics around this would call for the adding-FK
transaction to be treated as if the table did already exist and then
handle this case as we would if there wasn't any DDL involved. Another
approach might be to wait till commit to check the FK, but that'd
probably be considered unkind.

If the spec doesn't dictate anything and/or we can't find anyone else's
semantics that make sense, I suppose we'll need to define our own. To
that end, perhaps we should put up something on a wiki or similar to
start capturing these and considering what the 'right' answer would be.

Apologies for my ignorance on this.

Thanks,

Stephen

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#7)
Re: Transactional DDL, but not Serializable

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Well, basically, you can't have that. Example: you have an existing
table with primary key, and while you're in the middle of doing some
long transaction, somebody else creates a table with a foreign-key
reference to the one you're about to do a delete from. Being
serializable does not excuse you from the obligation to check for
FK violations in that "invisible" table. It might be acceptable to
fail entirely, but not to act as though the table isn't there.

That's an excellent example and point. Is there a 'right' answer (with
regard to the SQL spec, what other databases do, etc)?

I'm not aware that anybody's got an amazingly satisfactory solution.
PG's answer is of course to use up-to-the-minute DDL regardless of what
the transaction might see for other purposes, which certainly has got
disadvantages if you're hoping for truly serializable behavior. But I'm
not sure there's a better answer. You could make an argument for
failing any serializable transaction that's affected by DDL changes that
happen after it started. I don't know whether that cure would be worse
than the disease.

regards, tom lane

#9 Darren Duncan
darren@darrenduncan.net
In reply to: Tom Lane (#8)
Re: Transactional DDL, but not Serializable

Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Well, basically, you can't have that. Example: you have an existing
table with primary key, and while you're in the middle of doing some
long transaction, somebody else creates a table with a foreign-key
reference to the one you're about to do a delete from. Being
serializable does not excuse you from the obligation to check for
FK violations in that "invisible" table. It might be acceptable to
fail entirely, but not to act as though the table isn't there.

That's an excellent example and point. Is there a 'right' answer (with
regard to the SQL spec, what other databases do, etc)?

I'm not aware that anybody's got an amazingly satisfactory solution.
PG's answer is of course to use up-to-the-minute DDL regardless of what
the transaction might see for other purposes, which certainly has got
disadvantages if you're hoping for truly serializable behavior. But I'm
not sure there's a better answer. You could make an argument for
failing any serializable transaction that's affected by DDL changes that
happen after it started. I don't know whether that cure would be worse
than the disease.

If transaction A commits successfully before transaction B commits, regardless
of when transaction B started, and transaction A changes/adds/etc any
constraints on the database, then I would expect transaction B to only commit
successfully if all of its data changes pass those new/changed constraints.

If B were allowed to commit without that being the case, then it would leave the
database in an inconsistent state, that is, a state where its data doesn't
conform to its constraints. A database should always be consistent on
transaction boundaries, at the very least, if not on statement boundaries.

Whether B's failure happens when it tries to commit or earlier, based on
visibility of A's changes, doesn't matter to me so much (do what works
best for you/others), but it should fail at some point if it would
otherwise cause inconsistencies.

-- Darren Duncan