Question about concurrency control and Insert

Started by Stéphane Cazeaux · over 22 years ago · 7 messages · general
#1 Stéphane Cazeaux
stephane.cazeaux@netcentrex.net

Hi

I currently use pgsql 7.2.4 (but the following has also been seen on
pgsql 7.3.3) with a transaction level set to "read committed".
I've run a lot of little tests to understand how concurrency control works.
Let's look at this scenario:

We have a table named "test_count" and a field named "count"
The table contains 1 entry with count=1

Client 1:
BEGIN;
SELECT count FROM test_count FOR UPDATE; --> returns the only entry "1"
...

Client 2 :
BEGIN;
SELECT count FROM test_count FOR UPDATE; --> this query is blocked, ok
...

We continue :

Client 1:
INSERT INTO test_count VALUES (2);
COMMIT;

Client 2: (after commit of client 1)
[The select that was blocked is now free. But the result is the
first row containing "1". I'm surprised by this result]
SELECT count FROM test_count; --> now returns the two rows, one
containing "1", the other containing "2"
COMMIT;

So my question is: why does the SELECT ... FOR UPDATE of client 2, when
unblocked, return only one row, while a following SELECT in the same
transaction returns two rows? Is there a mechanism I don't understand?

Thanks for your response.

#2 Richard Huxton
dev@archonet.com
In reply to: Stéphane Cazeaux (#1)
Re: Question about concurrency control and Insert

On Wednesday 10 September 2003 08:34, Stéphane Cazeaux wrote:

Client 1:
BEGIN;
SELECT count FROM test_count FOR UPDATE; --> returns the only entry "1"

Client 2 :
BEGIN;
SELECT count FROM test_count FOR UPDATE; --> this query is blocked, ok

We continue :

Client 1:
INSERT INTO test_count VALUES (2);
COMMIT;

Client 2: (after commit of client 1)
[The select that was blocked is now free. But the result is the
first row containing "1". I'm surprised by this result]
SELECT count FROM test_count; --> now returns the two rows, one
containing "1", the other containing "2"
COMMIT;

So my question is: why does the SELECT ... FOR UPDATE of client 2, when
unblocked, return only one row, while a following SELECT in the same
transaction returns two rows? Is there a mechanism I don't understand?

Client2's first SELECT started before you committed the INSERT, the second
SELECT started after you committed. Since you are using READ COMMITTED you can
read the results of transactions committed *before the current statement
started*.

See Ch 9.2.1 (in Concurrency Control) for details:
"Since in Read Committed mode each new query starts with a new snapshot that
includes all transactions committed up to that instant, subsequent queries in
the same transaction will see the effects of the committed concurrent
transaction in any case."

You'll be wanting "SERIALIZABLE" transaction isolation if you don't want this
to happen.
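The per-statement snapshot behaviour can be sketched with a little toy model
(plain Python, nothing PostgreSQL-specific; the names are invented for
illustration, this is not how the server is implemented):

```python
# Toy model of READ COMMITTED visibility: each statement takes a fresh
# snapshot of the rows committed at the moment it starts.

committed = [1]                 # committed contents of test_count

def run_select():
    """Every statement begins with a new snapshot of committed data."""
    return list(committed)

first = run_select()            # Client 2's first statement: sees [1]

committed.append(2)             # Client 1's INSERT commits in between

second = run_select()           # Client 2's next statement: sees [1, 2]

print(first, second)            # [1] [1, 2]
```

The point being that visibility is decided per *statement*, not per
transaction, which is exactly why the two SELECTs disagree.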
--
Richard Huxton
Archonet Ltd

#3 Stéphane Cazeaux
stephane.cazeaux@netcentrex.net
In reply to: Richard Huxton (#2)
Re: Question about concurrency control and Insert

Richard Huxton wrote:

On Wednesday 10 September 2003 08:34, Stéphane Cazeaux wrote:

Client 1:
BEGIN;
SELECT count FROM test_count FOR UPDATE; --> returns the only entry "1"

Client 2 :
BEGIN;
SELECT count FROM test_count FOR UPDATE; --> this query is blocked, ok

We continue :

Client 1:
INSERT INTO test_count VALUES (2);
COMMIT;

Client 2: (after commit of client 1)
[The select that was blocked is now free. But the result is the
first row containing "1". I'm surprised by this result]
SELECT count FROM test_count; --> now returns the two rows, one
containing "1", the other containing "2"
COMMIT;

So my question is: why does the SELECT ... FOR UPDATE of client 2, when
unblocked, return only one row, while a following SELECT in the same
transaction returns two rows? Is there a mechanism I don't understand?

Client2's first SELECT started before you committed the INSERT, the second
SELECT started after you committed. Since you are using READ COMMITTED you can
read the results of transactions committed *before the current statement
started*.

I'm OK with this, but if I try exactly the same scenario, replacing the
INSERT with a DELETE, the first SELECT of client 2 won't return any row.
The behaviour is the same with an UPDATE: if client 1 updates the row and
commits, the first SELECT of client 2 will return the updated row. Why
isn't it the same behaviour with INSERT?


#4 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Stéphane Cazeaux (#1)
Re: Question about concurrency control and Insert

On Wed, 10 Sep 2003, Stéphane Cazeaux wrote:

So my question is: why does the SELECT ... FOR UPDATE of client 2, when
unblocked, return only one row, while a following SELECT in the same
transaction returns two rows? Is there a mechanism I don't understand?

From the docs:

Read Committed is the default isolation level in PostgreSQL. When a
transaction runs on this isolation level, a SELECT query sees only data
committed before the query began; it never sees either uncommitted data or
changes committed during query execution by concurrent transactions....
Notice that two successive SELECT commands can see different data, even
though they are within a single transaction, if other transactions commit
changes during execution of the first SELECT.
UPDATE, DELETE and SELECT FOR UPDATE commands behave the same as SELECT in
terms of searching for target rows: they will only find target rows that
were committed as of the command start time.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stéphane Cazeaux (#3)
Re: Question about concurrency control and Insert

Stéphane Cazeaux <stephane.cazeaux@netcentrex.net> writes:

I'm OK with this, but if I try exactly the same scenario, replacing the
INSERT with a DELETE, the first SELECT of client 2 won't return any row.
The behaviour is the same with an UPDATE: if client 1 updates the row and
commits, the first SELECT of client 2 will return the updated row. Why
isn't it the same behaviour with INSERT?

A row inserted after your statement starts is not visible, period.

When SELECT FOR UPDATE encounters a row that would be visible to a
normal select (ie, it existed at the start of the statement), then
it tries to acquire a row lock on that row. A byproduct of that is
waiting for any other transaction that had already locked the row.
When the other transaction is gone, then what you get back is the
*latest version* of the row (or no row, if the other guy deleted it).
This is a special exception to the ordinary row visibility rules,
which is made on the grounds that if you SELECT FOR UPDATE you had
better get back the latest data, else you might update the row
incorrectly. (Consider for example that the row represents a bank
account, and the different transactions are adding or subtracting
amounts in the balance field. You'd better add to the latest balance
even if it shouldn't theoretically be visible to you.)

Another way of thinking about it is that if you SELECT FOR UPDATE
a particular row, and then look at it again later in the same
transaction, you're guaranteed to see the same data (other than any
changes you made yourself of course). Ordinary SELECTs in read
committed mode don't act that way.
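That special exception can be sketched with a toy model (plain Python,
invented names; a deliberate simplification, not PostgreSQL's actual
machinery): rows are *found* via the statement's snapshot, but once the row
lock is obtained each found row is re-read at its latest committed version.

```python
latest = {1: 1}                 # row id -> latest committed value (toy table)

def select_for_update(snapshot_ids, concurrent_commit):
    """Find rows via the snapshot, then re-read their newest versions."""
    concurrent_commit()                  # block until the lock holder commits
    rows = []
    for rid in snapshot_ids:             # only rows the snapshot found...
        value = latest.get(rid)          # ...but at their newest version
        if value is not None:            # a row deleted meanwhile vanishes
            rows.append(value)
    return rows

# Client 2's snapshot found only row 1. If Client 1 INSERTs row 2 and
# commits while Client 2 waits on the lock, the new row is still invisible:
print(select_for_update([1], lambda: latest.update({2: 2})))   # [1]

# If Client 1 instead UPDATEs row 1, Client 2 gets the *latest* value:
print(select_for_update([1], lambda: latest.update({1: 99})))  # [99]
```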

This is all documented btw: see second para in
http://www.postgresql.org/docs/7.3/static/transaction-iso.html#XACT-READ-COMMITTED

regards, tom lane

#6 Richard Huxton
dev@archonet.com
In reply to: Stéphane Cazeaux (#3)
Re: Question about concurrency control and Insert

On Wednesday 10 September 2003 12:54, Stéphane Cazeaux wrote:

Richard Huxton wrote:

Client2's first SELECT started before you committed the INSERT, the second
SELECT started after you committed. Since you are using READ COMMITTED you
can read the results of transactions committed *before the current
statement started*.

I'm OK with this, but if I try exactly the same scenario, replacing the
INSERT with a DELETE, the first SELECT of client 2 won't return any row.
The behaviour is the same with an UPDATE: if client 1 updates the row and
commits, the first SELECT of client 2 will return the updated row. Why
isn't it the same behaviour with INSERT?

Client2's first SELECT can't decide whether to block until it has built a
"candidate list" of rows it wants to return. Having built its list, it
notices that one is locked by Client1 and waits to see what happens.

If Client1 deletes the row, Client2 can't return it. If Client1 inserts an
additional row, the "candidate list" is already built so it doesn't get
included.

For Client2 to notice any "new" rows, it would have to re-run the query.
Obviously, for some queries this could be very expensive.
If that wasn't bad enough, if another client was updating the table you'd have
to wait again until it committed/rolled back its transaction to see what to
do. With a busy database you could end up running the query dozens of times
to check whether any data you were interested in had been inserted. What's
worse, all your other clients might well be doing the same, waiting for
Client2.

If you use a SERIALIZABLE transaction level, then it guarantees Client2 only
sees data that was committed before Client2's transaction started. This means
within your transaction, your view is guaranteed to be consistent.
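The difference can be sketched in the same toy style (plain Python, invented
names, not PostgreSQL internals): SERIALIZABLE takes one snapshot for the
whole transaction, instead of one per statement.

```python
committed = [1]                         # committed contents of test_count

class ReadCommittedTxn:
    def select(self):
        return list(committed)          # fresh snapshot per statement

class SerializableTxn:
    def __init__(self):
        self.snapshot = list(committed) # one snapshot for the whole txn
    def select(self):
        return list(self.snapshot)

rc, sz = ReadCommittedTxn(), SerializableTxn()
committed.append(2)                     # a concurrent INSERT commits

print(rc.select())                      # [1, 2] - sees the new row
print(sz.select())                      # [1]    - keeps its original view
```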

If you want to guarantee that Client2 sees all relevant data and that no more
can be added while Client2 is operating, then you need to get an exclusive
write lock on the table. Of course this can kill your performance with
multiple updating clients.

Phew! That was longer than I was expecting. Does it make it any clearer?
Concurrency issues can get involved - there are books and university courses
that just deal with this one topic. Hopefully I haven't got anything wrong
above. Oh - the description of "candidate list" stuff above isn't necessarily
precisely the way that PG does it, but that's how I think of it.

HTH

--
Richard Huxton
Archonet Ltd

#7 Stéphane Cazeaux
stephane.cazeaux@netcentrex.net
In reply to: Richard Huxton (#6)
Re: Question about concurrency control and Insert

It's now clearer for me. Thanks a lot for your long explanation.

--
Stéphane
