incoherent view of serializable transactions

Started by Kevin Grittner · 47 messages · pgsql-hackers
#1 Kevin Grittner
Kevin.Grittner@wicourts.gov

As I understand them, the limitations of the PostgreSQL implementation of
SERIALIZABLE transactions, at least in the only example given in the
documentation, revolve around a rather unlikely situation:

Given concurrent transactions T1 and T2 and non-overlapping sets of
data A and B, T1 reads data including A and uses the data to modify B
while T2 reads data including B and uses that data to modify A, where
the modifications performed by either would affect the modifications
made by the other, if they were visible.

For reasons I'll omit here, that scenario didn't worry me for my
current uses of PostgreSQL.

I've found another form of deviation from the standard SERIALIZABLE
behavior, though, which does worry me. Although the above appears to be
the only situation where the end result after everything commits is
inconsistent with standard SERIALIZABLE behavior, the PostgreSQL
implementation allows transactions to view the data in states which
would never be possible during the application of the transactions in
series in the order they will appear to have been applied after the
commit.

Imagine, as an example, a system which involves recording receipts,
each of which must go into a daily deposit. There is a control table
with one row containing the current deposit date for receipts.
Somewhere mid-afternoon that date is updated, all subsequent receipts
fall into the new day, and a report is run listing the receipts for the
day and giving the deposit total.

Under a standard-compliant implementation of SERIALIZABLE, this is
straightforward: a transaction which is inserting a receipt selects the
deposit date to use in its transaction, and any SELECT of receipts for a
date prior to the current deposit date will see the accurate, final
data. Under the PostgreSQL implementation, although data eventually
gets to a coherent state, there can be a window of time where a SELECT
can return an incomplete list of receipts for a date which appears to be
closed, even if all transactions for modifying and viewing data are
SERIALIZABLE.

-- setup
create table ctl (k text not null primary key, deposit_date date not null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key, deposit_date date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k = 'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k = 'receipt'), 2.00);

-- connection 1
start transaction isolation level serializable ;
insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);

-- connection 2
start transaction isolation level serializable ;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;
start transaction isolation level serializable ;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;

-- connection 1
commit transaction;

-- connection 2
start transaction isolation level serializable ;
select * from receipt;
-- (All receipts for the 2008-12-22 deposit date now show.)
commit transaction;

At this point, SERIALIZABLE transactions appear to have worked, with
receipt 3 happening before the update of deposit_date; however, there
was a window of time when the update to deposit_date was visible and
receipt 3 was not.

This absolutely can't happen in a standard-compliant implementation.
At a minimum, this window where visible data lacks coherency should be
noted in the documentation. I don't know if there's any way to fix
this without killing performance.

-Kevin

#2 Martijn van Oosterhout
kleptog@svana.org
In reply to: Kevin Grittner (#1)
Re: incoherent view of serializable transactions

On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:

As I've understood limitations of the PostgreSQL implementation of
SERIALIZABLE transactions, at least the only example given in the
documentation, revolve around a rather unlikely situation:

Given concurrent transactions T1 and T2 and non-overlapping sets of
data A and B, T1 reads data including A and uses the data to modify B
while T2 reads data including B and uses that data to modify A, where
the modifications performed by either would affect the modifications
made by the other, if they were visible.

In so far as the "modifications" are just INSERTs (no UPDATEs or
DELETEs), yes. This case is covered in the documentation.

Imagine, as an example, a system which involves recording receipts,
each of which must go into a daily deposit. There is a control table
with one row containing the current deposit date for receipts.
Somewhere mid-afternoon that date is updated, all subsequent receipts
fall into the new day, and a report is run listing the receipts for the
day and giving the deposit total.

This is a variation of the above and has the same "proper" solution:
predicate locking. However, in this case the records in question are
already present so you can work around it easily. First do a SELECT FOR
UPDATE on all the records you want to update. This will serialize all
parallel transactions to either before or after you. Then do your
update.
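Applied to the receipts example from the first message, Martijn's suggestion might be sketched like this (an illustration only, not from the original messages; the key assumption is that every transaction touching the deposit date also locks the same ctl row):

```sql
-- connection inserting a receipt: lock the control row so the date
-- cannot change underneath us until we commit
start transaction isolation level serializable;
select deposit_date from ctl where k = 'receipt' for update;
insert into receipt
  select 3, deposit_date, 4.00 from ctl where k = 'receipt';
commit;

-- connection closing the day: the same FOR UPDATE blocks here until
-- all in-flight receipt inserts have committed
start transaction isolation level serializable;
select deposit_date from ctl where k = 'receipt' for update;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit;
```

Because both sides contend for the same ctl row, the date change and the inserts serialize on that row; the cost is that concurrent receipt inserts also serialize against each other.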

This absolutely can't happen in a standard-compliant implementation.
At a minimum, this window where visible data lacks coherency should be
noted in the documentation. I don't know if there's any way to fix
this without killing performance.

Predicate locking is nasty and we don't try. I'm not sure if anybody
else does.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#3 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Martijn van Oosterhout (#2)
Re: incoherent view of serializable transactions

Martijn van Oosterhout <kleptog@svana.org> wrote:

On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:

As I've understood limitations of the PostgreSQL implementation of
SERIALIZABLE transactions, at least the only example given in the
documentation, revolve around a rather unlikely situation:

Given concurrent transactions T1 and T2 and non-overlapping sets of
data A and B, T1 reads data including A and uses the data to modify B
while T2 reads data including B and uses that data to modify A, where
the modifications performed by either would affect the modifications
made by the other, if they were visible.

In so far as the "modifications" are just INSERTs (no UPDATEs or
DELETEs), yes. This case is covered in the documentation.

Let's not understate the scope of the issue.
UPDATEs and DELETEs count, too. For example:

-- connection 1
cir=> create table mytab (class int not null, value int not null);
CREATE TABLE
cir=> copy mytab from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

1 10
1 20
2 100
2 200
\.

cir=> start transaction isolation level serializable;
START TRANSACTION
cir=> update mytab set value = (select sum(value) from mytab where class = 2) where class = 1 and value = 10;
UPDATE 1

-- connection 2
cir=> start transaction isolation level serializable;
START TRANSACTION
cir=> update mytab set value = (select sum(value) from mytab where class = 1) where class = 2 and value = 100;
UPDATE 1
cir=> commit transaction;
COMMIT

-- connection 1
cir=> commit transaction;
COMMIT
cir=> select * from mytab;
class | value
-------+-------
1 | 20
2 | 200
1 | 300
2 | 30
(4 rows)

Imagine, as an example, a system which involves recording receipts,
each of which must go into a daily deposit. There is a control table
with one row containing the current deposit date for receipts.
Somewhere mid-afternoon that date is updated, all subsequent receipts
fall into the new day, and a report is run listing the receipts for
the day and giving the deposit total.

This is a variation of the above and has the same "proper" solution:
predicate locking. However, in this case the records in question are
already present so you can work around it easily. First do a SELECT FOR
UPDATE on all the records you want to update. This will serialize all
parallel transactions to either before or after you. Then do your
update.

My point isn't that there aren't workarounds; it is that people might
reasonably assume that SERIALIZABLE transactions provide sufficient
concurrency control for this, since the only example we give of a
problem is a rather contrived update anomaly. It is not obvious that,
even in cases where the data settles into good form at commit, there
are windows in which race conditions can cause occasional bad results
unless extra explicit locking is used.

This absolutely can't happen in a standard-compliant implementation.
At a minimum, this window where visible data lacks coherency should be
noted in the documentation. I don't know if there's any way to fix
this without killing performance.

Predicate locking is nasty and we don't try. I'm not sure if anybody
else does.

I know for a fact that Sybase ASE does. I've heard from reasonably
reliable sources that DB2 does. I know that Microsoft SQL Server did
for some time after the split from the Sybase code base, but I'm not
sure they've continued that; in fact there was a reference to
concurrency issues on Wikipedia which implied that they no longer do.

The implementation is not pretty -- they do it by locking accessed
pages and insertion points, and in some cases entire tables. It is so
ugly that at one point in discussing similar issues Tom said that it
couldn't really qualify as predicate locking, but in the face of the
fact that they have covered all the bases to provide true serializable
transactions, and that theory says that only predicate locking can do
that, he conceded that it was predicate locking -- but really ugly and
in a form he would never support.

Anyway, I didn't argue that we should provide truly serializable
transactions, just that we should provide a less contrived example of
where the PostgreSQL implementation can show anomalies, so that people
don't get burned through a false sense of security.

-Kevin

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#1)
Re: incoherent view of serializable transactions

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

At this point, SERIALIZABLE transactions appear to have worked, with
receipt 3 happening before the update of deposit_date; however, there
was a window of time when the update to deposit_date was visible and
receipt 3 was not.

This absolutely can't happen in a standard-compliant implementation.

I think you mean "you'd like to believe that can't happen in a
standard-compliant implementation". It doesn't include any of the
specific behaviors that are forbidden by the spec, though, so I'm less
than convinced.

An appropriate way to prevent the problem is probably for the
transaction that changes the deposit_date to take out a write-excluding
lock on the receipts table before it does so.
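In PostgreSQL terms, that advice might be sketched as follows (an illustration, not part of the original message; EXCLUSIVE mode is one plausible choice, since it allows concurrent reads while conflicting with the ROW EXCLUSIVE lock that inserts take):

```sql
start transaction isolation level serializable;
-- block until in-flight receipt inserts commit, and block new ones
-- until this transaction commits
lock table receipt in exclusive mode;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit;
```

Taking the lock as the first statement matters: it makes the transaction wait for pending inserts before its snapshot-sensitive work begins.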

regards, tom lane

#5 Emmanuel Cecchet
manu@frogthinker.org
In reply to: Kevin Grittner (#1)
Re: incoherent view of serializable transactions

Kevin,

If you want to know how to build SERIALIZABLE with a database that
provides SI (Snapshot Isolation), read
http://portal.acm.org/citation.cfm?doid=1376616.137669
Note that in practice, READ COMMITTED is the most widely used isolation
level, and its limitations are relatively well understood by the average
programmer, who can code the application accordingly. I still don't
get why people would use SERIALIZABLE, since there is no efficient
implementation of it.

My 2 cents.
Emmanuel


--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet

#6 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#4)
Re: incoherent view of serializable transactions

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

At this point, SERIALIZABLE transactions appear to have worked, with
receipt 3 happening before the update of deposit_date; however, there
was a window of time when the update to deposit_date was visible and
receipt 3 was not.

This absolutely can't happen in a standard-compliant implementation.

I think you mean "you'd like to believe that can't happen in a
standard-compliant implementation".

"The execution of concurrent SQL-transactions at isolation level
SERIALIZABLE is guaranteed to be serializable. A serializable
execution is defined to be an execution of the operations of
concurrently executing SQL-transactions that produces the same
effect as some serial execution of those same SQL-transactions. A
serial execution is one in which each SQL-transaction executes to
completion before the next SQL-transaction begins."

If you look at the serializable queries in the original post for this
thread, it's not hard to see that this standard is not met. The
insert of receipt 3 appears to happen before the update of the control
record, since it has the old deposit date. The transaction which
selects from both tables sees the update to the control record, so it
must come after that. Yet it doesn't see the results of the first
transaction. There is no sequence of serial execution which is
consistent with the behavior.

Perhaps you are suggesting that it's not possible in a practical sense
for a DBMS to meet this standard? See below.

It doesn't include any of the specific behaviors that are forbidden by
the spec, though, so I'm less than convinced.

The note following the table of specific behaviors allowed at each
level is good evidence that these phenomena don't *define*
serializable:

"NOTE 53 * The exclusion of these phenomena for SQL-transactions
executing at isolation level SERIALIZABLE is a consequence of the
requirement that such transactions be serializable."

An appropriate way to prevent the problem is probably for the
transaction that changes the deposit_date to take out a
write-excluding lock on the receipts table before it does so.

Well, a serializable transaction operating to standard would probably
take out a write-excluding lock on the control table row when it is
read. This would block the update to the control table until
completion of any pending receipts on the old date. The blocked
update would then take out a read-excluding lock before updating the
row, which would then block receipt transactions looking to check the
date. As soon as the update to the control table is committed, you
can see the new date, and you can have confidence that the old date's
receipts will all show on a SELECT.

Again, I'm not suggesting that we change the behavior of PostgreSQL to
match this, but that we document the difference so that those looking
to switch to PostgreSQL from databases which provide true serializable
transactions know that they have to explicitly lock rows to maintain a
coherent view of the data. In the queries shown in the original post,
the INSERT of the receipts could read the deposit date FOR SHARE to
provide the equivalent functionality, although they would have to be
rejiggered a bit, since that isn't allowed in subqueries.
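One way that rejiggering might look (a sketch of the idea only; the assumption is that the control-row read can be split into its own statement, with the insert then reading the share-locked row):

```sql
start transaction isolation level serializable;
-- take the share lock in its own statement, since FOR SHARE
-- is not allowed inside a subquery
select deposit_date from ctl where k = 'receipt' for share;
-- the insert reads the same, now share-locked, control row
insert into receipt
  select 3, deposit_date, 4.00 from ctl where k = 'receipt';
commit;
```

Unlike FOR UPDATE, FOR SHARE lets concurrent receipt inserts proceed against each other while still blocking the transaction that updates the deposit date.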

This isn't some hypothetical "maybe some day some product might
implement this, but it'll never catch on" sort of thing -- Microsoft
and Sybase SQL Server had this from version 1. I used it from 1990
until the conversion to PostgreSQL over the last couple years.
Serializable transactions worked as advertised. Was there more
blocking than PostgreSQL? Yes. Were there more deadlocks than
PostgreSQL? Yes. Did it impact performance? Well, PostgreSQL is
faster by enough that users commented that the applications seemed
"snappier" when we switched the database underneath them from Sybase
to PostgreSQL.

I'm going on second-hand information here, but I'm told that IBM DB2
has used similar techniques to provide true serializable transactions
for even longer.

I'm somewhat mystified at the reaction this topic gets here. :-/

-Kevin

#7 Emmanuel Cecchet
manu@frogthinker.org
In reply to: Kevin Grittner (#6)
Re: incoherent view of serializable transactions

Kevin Grittner wrote:

This isn't some hypothetical "maybe some day some product might
implement this, but it'll never catch on" sort of thing -- Microsoft
and Sybase SQL Server had this from version 1. I used it from 1990
until the conversion to PostgreSQL over the last couple years.

Have you ever used serializable transactions with Sybase? The locking
is actually based on memory pages and you end up with deadlocks if you
don't pad your data structures to prevent false sharing. Oracle also
provides SI like Postgres and I don't think they are doing that bad.

I'm going on second-hand information here, but I'm told that IBM DB2
has used similar techniques to provide true serializable transactions
for even longer.

I'm somewhat mystified at the reaction this topic gets here. :-/

I am somewhat mystified by the interest some people still have in
serializable transactions. Why don't users program the application to
deal with a lower isolation (actually I think they do)?

But I am probably missing the point which was to fix the doc?
Emmanuel

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet

#8 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Emmanuel Cecchet (#7)
Re: incoherent view of serializable transactions

Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>

Kevin Grittner wrote:

This isn't some hypothetical "maybe some day some product might
implement this, but it'll never catch on" sort of thing -- Microsoft
and Sybase SQL Server had this from version 1. I used it from 1990
until the conversion to PostgreSQL over the last couple years.

Have you ever used serializable transactions with Sybase?

Every day for over 15 years.

The locking is actually based on memory pages and you end up with
deadlocks if you don't pad your data structures to prevent false
sharing.

We only padded one table, which we were using to assign sequence
numbers. Eventually they did come out with support for row level
locking, which could be chosen on a table by table basis, which
eliminated the need for the padding columns. We didn't go to
row-level locking for all tables because the performance hit was
significant -- worse than the blocking. Blocking was occasionally an
issue. Deadlocks were initially a problem, but they can be handled
automatically with resubmit, which we eventually covered gracefully,
and they weren't an issue for us after that.

Oracle also
provides SI like Postgres and I don't think they are doing that bad.

I don't quite understand. Could you clarify?

I'm going on second-hand information here, but I'm told that IBM DB2
has used similar techniques to provide true serializable transactions
for even longer.

I'm somewhat mystified at the reaction this topic gets here. :-/

I am somewhat mystified by the interest some people still have in
serializable transactions. Why don't users program the application to
deal with a lower isolation (actually I think they do)?

There really are good reasons. I'm not up to going through that now,
but if there is genuine interest in the topic perhaps I can follow up
later.

But I am probably missing the point which was to fix the doc?

Thank you!

-Kevin

#9 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#8)
Re: incoherent view of serializable transactions

Kevin Grittner wrote:

Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>

I am somewhat mystified by the interest some people still have in
serializable transactions. Why don't users program the application to
deal with a lower isolation (actually I think they do)?

There really are good reasons. I'm not up to going through that now,
but if there is genuine interest in the topic perhaps I can follow up
later.

Well, the reason why people rely on isolation provided by database in
general is to make it easier to develop applications. One less thing to
worry about. That's why people use RDBMS, transactions, etc. to begin with.

But I am probably missing the point which was to fix the doc?

Thank you!

If you have a concrete suggestion (= patch) for the documentation, I'm
all ears.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#10 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#9)
Re: incoherent view of serializable transactions

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> 12/23/08 9:47 AM >>>

Kevin Grittner wrote:

Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>

Why don't users program the application to deal with a lower
isolation (actually I think they do)?

There really are good reasons. I'm not up to going through that now,
but if there is genuine interest in the topic perhaps I can follow up
later.

Well, the reason why people rely on isolation provided by database in
general is to make it easier to develop applications. One less thing
to worry about. That's why people use RDBMS, transactions, etc. to
begin with.

This is the heart of it. The argument is getting made that you don't
need serializable transactions because you can duplicate the
concurrency control with a combination of read_committed transactions
and explicit lock requests. I could also duplicate the functionality
of a DBMS with a bunch of disk files and custom, hard-coded access
logic, but I sure wouldn't want to do so.

If you have a concrete suggestion (= patch) for the documentation,
I'm all ears.

Well, I figured I should try to get a consensus here before submitting
a patch. Last time I tried submitting a simple patch to remove the
line about the PostgreSQL community not knowing about any other
databases which use predicate locking, I got shot down hard.

Does the phantom receipt example from the original post seem like a
reasonable thing to include in the doc patch? Does someone have a
more comprehensive summary of where the issues are than I've been able
to generate so far?

The patch should also include information on workarounds. I'd be
inclined to frame them in terms of how to duplicate the behavior of
serializable transactions on other databases from which the reader
might be converting, which would start with using SELECT FOR SHARE for
any SELECT within a serializable transaction. That doesn't cover
everything, because, unlike Sybase, et al, it doesn't block
conflicting INSERTs. I'm not sure I have a firm conceptual grasp on
when and how that renders the SELECT FOR SHARE technique incomplete,
but I could try to think it through. If someone already has
comprehensive guidelines on that, it would save time and might improve
the quality of the docs.

-Kevin

#11 Gregory Stark
stark@enterprisedb.com
In reply to: Kevin Grittner (#8)
Re: incoherent view of serializable transactions

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>

Have you ever used serializable transactions with Sybase?

Every day for over 15 years.

Afaict doing a few google searches Sybase doesn't do predicate locking either.
It would very much surprise me if they did since they've always had the most
primitive locking infrastructure of the three major databases. Locking records
or pages isn't going to provide true standards-compliant serializable
transactions in the way you're describing.

Predicate locking means being able to lock records which don't actually exist
yet. Ie, locking all records "WHERE COLUMN=0" even if there are no such
records. This has to block any other transaction from inserting such a record
or updating another record to set COLUMN to 0.
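The "WHERE COLUMN=0" point can be made concrete: a row-level lock can only attach to rows that already exist, so it cannot stop a concurrent insert of a matching row. A sketch against a hypothetical one-column table t:

```sql
-- connection 1
start transaction isolation level serializable;
select * from t where col = 0 for update;  -- locks nothing if no rows match

-- connection 2 (not blocked: there was no existing row to lock)
insert into t values (0);
commit;

-- connection 1: its snapshot still shows no col = 0 rows, even though
-- one was committed mid-transaction -- the phantom that a predicate
-- lock on "col = 0" would have prevented
commit;
```

This is exactly the gap that index-range or page locking schemes try to close by locking the places where a matching row would have to appear.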

Oracle also provides SI like Postgres and I don't think they are doing that
bad.

I don't quire understand. Could you clarify?

The point is Oracle doesn't provide this kind of true serializable isolation
and people still find it useful. In fact Sybase and DB2 also don't provide
true serializable transactions -- nobody does. It's a fantasy.

There really are good reasons. I'm not up to going through that now,
but if there is genuine interest in the topic perhaps I can follow up
later.

I suppose I'm curious whether you're mistaken and your app isn't safe on
Sybase because it's depending on truly serializable transactions and Sybase
isn't doing that, or if you have examples of transactions which Sybase
provides proper serialized semantics for but Postgres doesn't.

But I am probably missing the point which was to fix the doc?

But missing the point and having pointless arguments is so much more fun than
documentation writing :)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

#12 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Gregory Stark (#11)
Re: incoherent view of serializable transactions

Gregory Stark <stark@enterprisedb.com> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>

Have you ever used serializable transactions with Sybase?

Every day for over 15 years.

Afaict doing a few google searches Sybase doesn't do predicate
locking either. It would very much surprise me if they did since
they've always had the most primitive locking infrastructure of the
three major databases. Locking records or pages isn't going to
provide true standards-compliant serializable transactions in the way
you're describing.

Predicate locking means being able to lock records which don't
actually exist yet. Ie, locking all records "WHERE COLUMN=0" even if
there are no such records. This has to block any other transaction
from inserting such a record or updating another record to set COLUMN
to 0.

The page locking provides this because every index page or data page
the serializable transaction looks at is locked against updates until
the end of the transaction. If it can see all the COLUMN=0 rows
through an index, the index locks protect the transaction. If a table
scan is required, the entire table is locked against all
modifications. (That's right, it is not unusual to have entire tables
locked against any modification until the end of a database
transaction.)

Oracle also provides SI like Postgres and I don't think they are
doing that bad.

I don't quite understand. Could you clarify?

The point is Oracle doesn't provide this kind of true serializable
isolation and people still find it useful.

Sure, and I find PostgreSQL useful. I'm not proposing to change it.

In fact Sybase and DB2 also don't provide
true serializable transactions -- nobody does. It's a fantasy.

They do. They have for over 15 years. If people will read it, I'll
try to find the web page where they give all the details of the
strategy.

There really are good reasons. I'm not up to going through that now,
but if there is genuine interest in the topic perhaps I can follow up
later.

I suppose I'm curious whether you're mistaken and your app isn't safe
on Sybase because it's depending on truly serializable transactions
and Sybase isn't doing that, or if you have examples of transactions
which Sybase provides proper serialized semantics for but Postgres
doesn't.

All the examples provided in this thread would be handled by Sybase
with proper serializable semantics. When I proposed changing the docs
to omit the reference to our lack of knowledge about other database
products, there was a full example of code that didn't serialize
according to the mathematical definition. I cut and pasted it into
Sybase and provided the results -- a deadlock.

Can you provide any example or logical explanation of where the
technique I outline above (locking against modification for every
index and data page read during the transaction until the end of the
transaction) would NOT provide true serializable behavior? (Keep in
mind that that's the broad stroke overview -- the full details include
various lock escalation techniques, etc.)

But I am probably missing the point which was to fix the doc?

But missing the point and having pointless arguments is so much more
fun than documentation writing :)

Frankly, I prefer other sports. :-(

-Kevin

#13Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#12)
Re: incoherent view of serializable transactions

Reading the spec it does seem our initial statement "The SQL standard defines
four levels of transaction isolation in terms of three phenomena that must be
prevented between concurrent transactions" is not accurate.

The spec defines the first three modes in terms of P1,P2,P3 but serializable
is defined, as Kevin describes, as literally serializable. It is included in
the table but with the note:

NOTE 71 — The exclusion of these phenomena for SQL-transactions executing at
isolation level SERIALIZABLE is a consequence of the requirement that such
transactions be serializable.

So it's clear that that's not the definition. Excluding P1,P2,P3 is necessary
but not sufficient to meet the spec's definition of Serializable.
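A concrete illustration of that gap (example and table names mine,
not from the spec or the thread) is the classic "write skew" anomaly:
under snapshot-based SERIALIZABLE, neither transaction ever observes
the other's changes, so none of P1, P2, or P3 occurs as a visible
phenomenon, yet the combined outcome matches no serial order:

```sql
-- Application invariant: at least one doctor must stay on call.
-- Both transactions run concurrently under snapshot isolation.

-- T1                                  -- T2
BEGIN;                                 BEGIN;
SELECT count(*) FROM doctors           SELECT count(*) FROM doctors
  WHERE on_call;   -- sees 2             WHERE on_call;   -- sees 2
UPDATE doctors SET on_call = false     UPDATE doctors SET on_call = false
  WHERE name = 'alice';                  WHERE name = 'bob';
COMMIT;                                COMMIT;

-- Both commit. Each read a snapshot the other never disturbed
-- (no P1/P2/P3), but afterwards nobody is on call -- an outcome
-- impossible under either serial ordering of T1 and T2.
```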

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Gregory Stark <stark@enterprisedb.com> wrote:

Afaict doing a few google searches Sybase doesn't do predicate
locking either.

The page locking provides this because every index page or data page
the serializable transaction looks at is locked against updates until
the end of the transaction. If it can see all the COLUMN=0 rows
through an index, the index locks protect the transaction. If a table
scan is required, the entire table is locked against all
modifications. (That's right, it is not unusual to have entire tables
locked against any modification until the end of a database
transaction.)

Ah, so they don't actually use the term predicate locking which is why my
google-fu was inadequate. I'm not sure if this is technically "predicate
locking" or not. It sounds like it locks a whole lot more than just the
predicate.

All the examples provided in this thread would be handled by Sybase
with proper serializable semantics. When I proposed changing the docs
to omit the reference to our lack of knowledge about other database
products, there was a full example of code that didn't serialize
according to the mathematical definition. I cut and pasted into
Sybase and provided the results -- a deadlock.

Can you provide any example or logical explanation of where the
technique I outline above (locking against modification for every
index and data page read during the transaction until the end of the
transaction) would NOT provide true serializable behavior? (Keep in
mind that that's the broad stroke overview -- the full details include
various lock escalation techniques, etc.)

I imagine they preemptively escalate to locking the table if you're going to
do a sequential scan? Otherwise an inserter might insert on a page you haven't
read yet (and therefore haven't locked yet)?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

#14Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Kevin Grittner (#12)
Re: incoherent view of serializable transactions

On Tue, 23 Dec 2008, Kevin Grittner wrote:

The page locking provides this because every index page or data page
the serializable transaction looks at is locked against updates until
the end of the transaction. If it can see all the COLUMN=0 rows
through an index, the index locks protect the transaction. If a table
scan is required, the entire table is locked against all
modifications. (That's right, it is not unusual to have entire tables
locked against any modification until the end of a database
transaction.)

Well, predicate locking for serializable also should only lock the
appropriate conditions. Getting a deadlock between two serializable
transactions for conditions that can be serialized would seemingly also be
disallowed by the definition of serializable since there would exist no
serial ordering of the transactions that has that effect.

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#13)
Re: incoherent view of serializable transactions

Gregory Stark <stark@enterprisedb.com> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Gregory Stark <stark@enterprisedb.com> wrote:

Afaict doing a few google searches Sybase doesn't do predicate
locking either.

The page locking provides this because every index page or data page
the serializable transaction looks at is locked against updates until
the end of the transaction. If it can see all the COLUMN=0 rows
through an index, the index locks protect the transaction. If a table
scan is required, the entire table is locked against all
modifications. (That's right, it is not unusual to have entire tables
locked against any modification until the end of a database
transaction.)

Ah, so they don't actually use the term predicate locking which is
why my google-fu was inadequate. I'm not sure if this is technically
"predicate locking" or not. It sounds like it locks a whole lot more
than just the predicate.

Well, I'm not sure whether it is or not; it's a matter of definition.
If predicate locking is required for true serializable transactions,
and this provides true serializable transactions, it must be, eh?
Also, an argument could be made that if it locks every page which must
be viewed for execution based on the search predicates, it is doing
predicate locking -- if only indirectly.

All the examples provided in this thread would be handled by Sybase
with proper serializable semantics. When I proposed changing the docs
to omit the reference to our lack of knowledge about other database
products, there was a full example of code that didn't serialize
according to the mathematical definition. I cut and pasted into
Sybase and provided the results -- a deadlock.

Can you provide any example or logical explanation of where the
technique I outline above (locking against modification for every
index and data page read during the transaction until the end of the
transaction) would NOT provide true serializable behavior? (Keep in
mind that that's the broad stroke overview -- the full details
include various lock escalation techniques, etc.)

I imagine they preemptively escalate to locking the table if you're
going to do a sequential scan? Otherwise an inserter might insert on
a page you haven't read yet (and therefore haven't locked yet)?

I believe they do go straight to the table lock for a table scan, but
it isn't necessary for full semantics that the transaction lock all
pages in advance. For most purposes the serializable transaction can
proceed to lock pages as it gets to them. It will block or deadlock
if a conflict arises. The transaction may serialize behind a
transaction which started later and read some page it hadn't gotten to
yet, but that doesn't violate the spec or cause any anomalies. The
key phrase in the spec here is "produces the same effect as *some*
serial execution" [emphasis added].

It will escalate from page locks to a table lock if a (configurable)
number or percentage of a table's pages get locked.

-Kevin

#16Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Stephan Szabo (#14)
Re: incoherent view of serializable transactions

Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:

On Tue, 23 Dec 2008, Kevin Grittner wrote:

The page locking provides this because every index page or data page
the serializable transaction looks at is locked against updates until
the end of the transaction. If it can see all the COLUMN=0 rows
through an index, the index locks protect the transaction. If a table
scan is required, the entire table is locked against all
modifications. (That's right, it is not unusual to have entire tables
locked against any modification until the end of a database
transaction.)

Well, predicate locking for serializable also should only lock the
appropriate conditions. Getting a deadlock between two serializable
transactions for conditions that can be serialized would seemingly
also be disallowed by the definition of serializable since there
would exist no serial ordering of the transactions that has that
effect.

Clever. If a DBMS is capable of reporting that it was unable to
serialize a transaction in a situation where there is a logical
possibility that a different implementation might have succeeded, it
hasn't implemented true serializable transactions? That strikes me as
being on a level with Codd's assertion that any database which takes
two different statements which can be proven to be logically identical
and runs one with a different plan than the other should not be called
relational. (i.e., possibly true from a technical perspective, but
hardly useful in the real world.)

Perhaps it would be best to accept this as proof that there is no
current DBMS implementing true serializable transactions and move on
to the issue of documenting real differences between PostgreSQL and
other products which come closer to compliance, so that real people
converting from those products don't suffer real production problems.
As I see it, that's what matters.

-Kevin

#17Jeff Davis
pgsql@j-davis.com
In reply to: Emmanuel Cecchet (#5)
Re: incoherent view of serializable transactions

On Tue, 2008-12-23 at 00:42 -0500, Emmanuel Cecchet wrote:

I still don't
get why people would use SERIALIZABLE since there is no efficient
implementation of it.

If they need true serializable transactions, and don't care about
efficiency.

Regards,
Jeff Davis

#18Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#12)
Re: incoherent view of serializable transactions

The page locking provides this because every index page or data page
the serializable transaction looks at is locked against updates until
the end of the transaction. If it can see all the COLUMN=0 rows
through an index, the index locks protect the transaction. If a table
scan is required, the entire table is locked against all
modifications. (That's right, it is not unusual to have entire tables
locked against any modification until the end of a database
transaction.)

You've mentioned a couple of times that you're surprised by the
reaction of the community to your proposed documentation changes. I
have (a more muted version of) the same reaction as several previous
posters, and I think in the context of this paragraph I can explain
why.

If we were to construct a database that had one giant lock for the
entire database so that only a single query could execute at one time,
transactions would be serializable (because they'd in fact be
serialized). However, performance would suck. Therefore, database
developers and researchers have spent the last twenty (or more?) years
trying to come up with ways to allow multiple transactions to execute
in parallel while maintaining the appearance of serialization.
They've done this by introducing lower level locks: table-level,
page-level, row-level. For most users, the artifacts that have been
introduced by these fine-grained locks are outweighed by the
performance benefits of greater concurrency - but, as you point out,
not necessarily always.

I don't think I can overstate the extent to which fine-grained locking
is viewed as a good thing. It wouldn't surprise me to find out that
the locking behavior that you're relying on in Sybase is one which
some group of Sybase developers (if there still are any) are busily
laboring to eliminate. I think you can see this reflected in Greg
Stark's comment as well: "It would very much surprise me if they did
since they've always had the most primitive locking infrastructure of
the three major databases."

With respect to predicate locking, what you're describing is NOT
predicate locking. It's coarse-grained locking that largely or
completely obviates the need for predicate locking by greatly reducing
concurrency. Now, from the point of view of the application developer
who needs very strong serializability guarantees but doesn't care
about concurrency, that's six of one, half a dozen of the other, but
to me that's the opposite of the typical situation. Maybe our
documentation could say something along the lines of "PostgreSQL's
MVCC framework and row-level locking permit a greater degree of
concurrency than some other databases. Even when the transaction
isolation level is set to serializable, serialization anomalies can
occur in the following situations. When it is important to prevent
these anomalies, explicit row-level or table-level locking can be used
at the expense of reduced concurrency."
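Applied to the receipt/deposit-date example from the start of the
thread, that advice might look like the following (table and column
names invented for the sketch; FOR SHARE is the row-level lock doing
the work):

```sql
-- Inserting a receipt: share-lock the control row so the deposit
-- date cannot be advanced while this transaction is in flight.
BEGIN;
SELECT deposit_date FROM ctl FOR SHARE;
INSERT INTO receipt (deposit_date, amount)
  VALUES ('2008-12-23', 100.00);  -- uses the date just read
COMMIT;

-- Advancing the deposit date: the UPDATE waits for all FOR SHARE
-- holders to commit, so a subsequent report on the old date sees
-- every receipt that used it.
BEGIN;
UPDATE ctl SET deposit_date = deposit_date + 1;
COMMIT;
```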

...Robert

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#10)
Re: incoherent view of serializable transactions

On Tue, 2008-12-23 at 10:10 -0600, Kevin Grittner wrote:

Well, I figured I should try to get a consensus here before submitting
a patch. Last time I tried submitting a simple patch to remove the
line about the PostgreSQL community not knowing about any other
databases which use predicate locking, I got shot down hard.

The docs got changed though.

I think the current docs make too much of a deal about how hard it is
to do predicate locking in databases. Most RDBMS use predicate
locking via indexes, i.e. the locking happens in the index. One might
also argue that it is potentially a more efficient design, as TPC-C
shows, though such cases of application scalability are rare in the
extreme and MVCC is by far the best general approach in terms of ease
of use and performance.

The example in the docs is not a realistic example, so your new one is
useful.

I would want you to update it though to show how use of row level
locks can be used to enforce correct behaviour when required, so
provide a problem and its solution. It will be useful for people
moving from systems like Sybase that use locking, who often fall foul
of the *lack* of locking in MVCC and write programs that won't work
correctly as a result.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#18)
Re: incoherent view of serializable transactions

"Robert Haas" <robertmhaas@gmail.com> wrote:

For most users, the artifacts that have been
introduced by these fine-grained locks are outweighed by the
performance benefits of greater concurrency - but, as you point out,
not necessarily always.

That's what I don't understand. I never did point that out. I never
suggested it. I never requested a change to the software. I just
suggested that we document the artifacts.

Ah, well; thanks for the feedback.

With respect to predicate locking, what you're describing is NOT
predicate locking.

I never would have considered it to be such except for some people
insisting that a true serializable implementation is impossible
without predicate locking. Either that assertion is false or this is
a form of predicate locking, crude as it might be.

Maybe our
documentation could say something along the lines of "PostgreSQL's
MVCC framework and row-level locking permit a greater degree of
concurrency than some other databases. Even when the transaction
isolation level is set to serializable, serialization anomalies can
occur in the following situations. When it is important to prevent
these anomalies, explicit row-level or table-level locking can be
used at the expense of reduced concurrency."

That's responsive to my concern and nicely worded. Thanks.

-Kevin

#21Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#19)
#22Emmanuel Cecchet
manu@frogthinker.org
In reply to: Jeff Davis (#17)
#23Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Emmanuel Cecchet (#22)
#24Emmanuel Cecchet
manu@frogthinker.org
In reply to: Kevin Grittner (#23)
#25Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Robert Haas (#18)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Ron Mayer (#25)
#27Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Emmanuel Cecchet (#24)
#28Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#6)
#29Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#28)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#28)
#31Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#28)
#32Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#31)
#33Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#32)
#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#9)
#35Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#34)
#36Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#33)
#37Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#35)
#38Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#37)
#39Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#38)
#40Paul Schlie
schlie@comcast.net
In reply to: Kevin Grittner (#39)
#41Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Paul Schlie (#40)
#42Paul Schlie
schlie@comcast.net
In reply to: Kevin Grittner (#41)
#43Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Paul Schlie (#42)
#44Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#39)
#45Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#44)
#46Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#45)
#47Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#44)