Internals question about buffers

Started by Bruce Momjian almost 24 years ago, 2 messages, hackers
#1Bruce Momjian
bruce@momjian.us

I received this question about buffer management. Can someone answer it?

---------------------------------------------------------------------------

nield@usol.com wrote:

Dear Mr. Momjian:

First let me thank you for the great work you have done on PostgreSQL.
This is a huge project, and as someone who is just starting to look at
the code I'm grateful for the effort that has gone into commenting and
clean module interfaces.

Right now, I'm working on a draft proposal involving the WAL system,
in the perhaps vain hope of eliminating VACUUM and allowing
point-in-time recovery and play-forward file recovery from a saved
log-stream. I'm not ready yet to make the proposal, since I'm still
trying to figure out everything I need to know about PostgreSQL
internals, but I hope to have a Request for Comments I can post to the
pgsql-hackers list in the near future.

The reason I'm writing you is because while looking at the buffer
manager, I had a question about locking and extending relations. There
is surely something I don't understand that explains why this scenario
is not a problem, but if you could point out what I've missed it would
help me understand PostgreSQL better, and move me closer to being able
to contribute to the project.

Please feel free to forward this to the pgsql-hackers list if you
don't have time to deal with it.

Regards,

J.R. Nield <nield@usol.com>

[see the scenario at bottom first]

Is there a race condition in ReadBufferInternal() ?
(From bufmgr.c,v 1.123 2002/04/15 23:47:12 momjian Exp)

When the blockNum argument to ReadBufferInternal() is 'P_NEW',
smgrnblocks() is called on the relation to get the number of blocks in
the relation file, which is also the number of the first block past the
current end. This is assigned to blockNum, and the function proceeds
as if ReadBufferInternal() were called with a blockNum equal to
smgrnblocks().

Two things to note:

A) The BufMgrLock is not held when smgrnblocks() is called. (unless
bufferLockHeld was true, which will not be the case when we're
called through ReadBuffer() )

B) ReadBufferInternal() then gets an exclusive lock on BufMgrLock
and calls BufferAlloc().

If between Time A and B another backend allocates a buffer with P_NEW,
then BufferAlloc() will find the buffer in the block cache.

If so, it would seem like there is a problem, because later in the
function we will zero this block.

On line 198, we check if the block was found (Yes), then if we were
expecting it (No) we return. If the buffer is local (It is not) we
do StartBufferIO().

The next thing is at 224 where isExtend == true, so we zero the
block.

*** Why is it OK to zero the block? ***

Scenario:

PROC1 and PROC2 both call ReadBuffer(reln, P_NEW) -->
ReadBufferInternal(reln, P_NEW, false)

PROC1 gets NOT FOUND from BufferAlloc, so it zeroes the buffer and
calls smgrextend.

PROC2 finds the buffer and waits in WaitIO() for PROC1 to complete.

PROC1 finishes ReadBuffer and goes on to modify the buffer.

PROC2 gets a FOUND buffer from BufferAlloc, but was expecting to
extend. It zeroes the buffer and calls smgrextend, stomping on PROC1.
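
The interleaving above can be replayed deterministically in a small toy
model. This is only a sketch and not PostgreSQL code: ToyRelation,
toy_nblocks(), toy_extend_at() and toy_replay_race() are hypothetical
names, the 16-byte page size is made up, and a single array stands in
for both the relation file and the shared buffers. It shows only why two
unsynchronized P_NEW callers can pick the same block number, so that the
second one zeroes the first one's page.

```c
#include <assert.h>
#include <string.h>

#define BLCKSZ     16   /* toy page size, not the real PostgreSQL value */
#define MAX_BLOCKS 8

/* Toy relation: a block count plus page storage (hypothetical type). */
typedef struct {
    int  nblocks;
    char pages[MAX_BLOCKS][BLCKSZ];
} ToyRelation;

/* Step A: read the relation length with no lock held
 * (the role smgrnblocks() plays in the description above). */
static int toy_nblocks(const ToyRelation *rel)
{
    return rel->nblocks;
}

/* Step B: zero the chosen block and grow the relation if needed. */
static void toy_extend_at(ToyRelation *rel, int blockNum)
{
    assert(blockNum >= 0 && blockNum < MAX_BLOCKS);
    memset(rel->pages[blockNum], 0, BLCKSZ);
    if (blockNum >= rel->nblocks)
        rel->nblocks = blockNum + 1;
}

/* Replays the PROC1/PROC2 interleaving in a single thread.
 * Returns 1 if PROC1's data survives, 0 if it was stomped. */
static int toy_replay_race(void)
{
    ToyRelation rel = { 0 };

    int b1 = toy_nblocks(&rel);          /* PROC1, step A */
    int b2 = toy_nblocks(&rel);          /* PROC2, step A: same answer */

    toy_extend_at(&rel, b1);             /* PROC1, step B */
    memcpy(rel.pages[b1], "tuple", 6);   /* PROC1 modifies its new page */

    toy_extend_at(&rel, b2);             /* PROC2, step B: zeroes that page */

    return rel.pages[b1][0] != 0;
}
```

In this model toy_replay_race() returns 0, because both callers computed
block 0 before either one extended the relation.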

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Internals question about buffers

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Is there a race condition in ReadBufferInternal() ?

No.

As the comments in bufmgr.c point out, this is not bufmgr.c's problem:

* ReadBuffer -- returns a buffer containing the requested
* block of the requested relation. If the blknum
* requested is P_NEW, extend the relation file and
* allocate a new block. (Caller is responsible for
* ensuring that only one backend tries to extend a
* relation at the same time!)

In practice, the necessary locking is done by hio.c in the case of
heap relations:

* Note that we use LockPage(rel, 0) to lock relation for extension.

and in the case of index relations the various index AMs have their own
approaches.
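
As a rough illustration of what "caller is responsible" means, here is a
toy sketch in which the caller takes a per-relation extension lock before
asking for a new block. It is not how hio.c is written: the quoted
comment only says that LockPage(rel, 0) is used, and everything below
(ToyRelation, toy_try_lock_extension(), toy_unlock_extension(),
toy_extend(), toy_replay_locked()) is a hypothetical stand-in. A real
lock would make the second backend block; the toy version just reports
failure so the replay can stay single-threaded and deterministic.

```c
#include <assert.h>
#include <string.h>

#define BLCKSZ     16   /* toy page size, not the real PostgreSQL value */
#define MAX_BLOCKS 8

/* Toy relation with a flag standing in for the extension lock. */
typedef struct {
    int  nblocks;
    int  extension_locked;
    char pages[MAX_BLOCKS][BLCKSZ];
} ToyRelation;

/* Returns 1 if the lock was acquired, 0 if another caller holds it. */
static int toy_try_lock_extension(ToyRelation *rel)
{
    if (rel->extension_locked)
        return 0;
    rel->extension_locked = 1;
    return 1;
}

static void toy_unlock_extension(ToyRelation *rel)
{
    assert(rel->extension_locked);
    rel->extension_locked = 0;
}

/* Reads the length and extends in one step; caller must hold the lock. */
static int toy_extend(ToyRelation *rel)
{
    int blockNum;

    assert(rel->extension_locked);
    blockNum = rel->nblocks;
    assert(blockNum < MAX_BLOCKS);
    memset(rel->pages[blockNum], 0, BLCKSZ);
    rel->nblocks = blockNum + 1;
    return blockNum;
}

/* Replays two callers that both want a new block.
 * Returns 1 if each gets its own block and both pages survive. */
static int toy_replay_locked(void)
{
    ToyRelation rel = { 0 };

    int got1       = toy_try_lock_extension(&rel);  /* PROC1 acquires */
    int got2_early = toy_try_lock_extension(&rel);  /* PROC2 has to wait */

    int b1 = toy_extend(&rel);                      /* PROC1 extends */
    toy_unlock_extension(&rel);
    memcpy(rel.pages[b1], "tuple1", 7);

    int got2 = toy_try_lock_extension(&rel);        /* PROC2 acquires now */
    int b2 = toy_extend(&rel);                      /* sees the new length */
    toy_unlock_extension(&rel);
    memcpy(rel.pages[b2], "tuple2", 7);

    return got1 && !got2_early && got2 && b1 != b2 &&
           rel.nblocks == 2 &&
           rel.pages[b1][0] == 't' && rel.pages[b2][0] == 't';
}
```

The point of the design is that reading the relation length and
extending it happen under the same lock, so the second caller always
sees the length left behind by the first.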

regards, tom lane