Vacuum now uses AccessShareLock for analyze

Started by Bruce Momjianover 25 years ago10 messages
#1Bruce Momjian
pgman@candle.pha.pa.us

I have changed vacuum so analyze now uses AccessShareLock. (Is this the
proper lock for analyze?)

The code will now vacuum all requested relations. It will then analyze
each relation. This way, all the exclusive vacuum work is done first,
then analyze can continue with shared locks.

The code is much clearer with that functionality split into separate
functions.

How separate do people want vacuum and analyze? Analyze currently does
not record the number of tuples and pages, because vacuum does that. Do
people want analyze as a separate command and in a separate file?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#2The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#1)
Re: Vacuum now uses AccessShareLock for analyze

On Mon, 29 May 2000, Bruce Momjian wrote:

I have changed vacuum so analyze now uses AccessShareLock. (Is this the
proper lock for analyze?)

The code will now vacuum all requested relations. It will then analyze
each relation. This way, all the exclusive vacuum work is done first,
then analyze can continue with shared locks.

hrmmm, here's a thought ... why not vacuum->analyze each relation in
order? the 'exclusive lock' will prevent anyone from reading, so do a
relation, release the lock to analyze that relation and let ppl access the
database ... then do the next ... instead of doing an exclusive lock for
the duration of the whole database ...

#3Bruce Momjian
pgman@candle.pha.pa.us
In reply to: The Hermit Hacker (#2)
Re: Vacuum now uses AccessShareLock for analyze

On Mon, 29 May 2000, Bruce Momjian wrote:

I have changed vacuum so analyze now uses AccessShareLock. (Is this the
proper lock for analyze?)

The code will now vacuum all requested relations. It will then analyze
each relation. This way, all the exclusive vacuum work is done first,
then analyze can continue with shared locks.

hrmmm, here's a thought ... why not vacuum->analyze each relation in
order? the 'exclusive lock' will prevent anyone from reading, so do a
relation, release the lock to analyze that relation and let ppl access the
database ... then do the next ... instead of doing an exclusive lock for
the duration of the whole database ...

No, each table is locked one at a time. We do all the single-table
locks first so the rest is all shared access. Does that make sense?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#4The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#3)
Re: Vacuum now uses AccessShareLock for analyze

On Mon, 29 May 2000, Bruce Momjian wrote:

On Mon, 29 May 2000, Bruce Momjian wrote:

I have changed vacuum so analyze now uses AccessShareLock. (Is this the
proper lock for analyze?)

The code will now vacuum all requested relations. It will then analyze
each relation. This way, all the exclusive vacuum work is done first,
then analyze can continue with shared locks.

hrmmm, here's a thought ... why not vacuum->analyze each relation in
order? the 'exclusive lock' will prevent anyone from reading, so do a
relation, release the lock to analyze that relation and let ppl access the
database ... then do the next ... instead of doing an exclusive lock for
the duration of the whole database ...

No, each table is locked one at a time. We do all the single-table
locks first so the rest is all shared access. Does that make sense?

its what I suspected, but my point was that if we did the ANALYZE for the
relation right after the VACUUM for it, there would be a period of time
where readers could come in and process ... think of it as a 'breather'
before the next VACUUM starts, vs just jumping into the next ...

Overall time for doing the vacuum shouldn't be any longer, but it would
give gaps where readers could get in and out ... we're a relational
database, so I imagine ppl are doing JOINs ... if RelationA is locked
while ReaderA is trying to doign a JOIN between RA and RB, ReaderA is
gonna be screwed ... if we did a quick ANALZE between RelationA and
RelationB, then ReaderA would have a chance to do its processing while the
ANALYZE is running, instead of having to wait for both RelationA and
RelationB to be finished ...

#5Bruce Momjian
pgman@candle.pha.pa.us
In reply to: The Hermit Hacker (#4)
Re: Vacuum now uses AccessShareLock for analyze

its what I suspected, but my point was that if we did the ANALYZE for the
relation right after the VACUUM for it, there would be a period of time
where readers could come in and process ... think of it as a 'breather'
before the next VACUUM starts, vs just jumping into the next ...

Overall time for doing the vacuum shouldn't be any longer, but it would
give gaps where readers could get in and out ... we're a relational
database, so I imagine ppl are doing JOINs ... if RelationA is locked
while ReaderA is trying to doign a JOIN between RA and RB, ReaderA is
gonna be screwed ... if we did a quick ANALZE between RelationA and
RelationB, then ReaderA would have a chance to do its processing while the
ANALYZE is running, instead of having to wait for both RelationA and
RelationB to be finished ...

Good point. Because we are only doing one table at a time, they could
get in and do something, but they could also get part-way in and have
another table locked, and since we are only locking one at a time, it
seemed better to get it all done first. Comments?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Vacuum now uses AccessShareLock for analyze

Bruce Momjian <pgman@candle.pha.pa.us> writes:

The code will now vacuum all requested relations. It will then analyze
each relation. This way, all the exclusive vacuum work is done first,
then analyze can continue with shared locks.

I agree with Marc: it'd make more sense to do it one table at a time,
ie,
get exclusive lock on table A
vacuum table A
commit, release lock
get shared lock on table A
gather stats for table A
commit, release lock
repeat sequence for table B
etc

The code is much clearer with that functionality split into separate
functions.

Wouldn't surprise me.

How separate do people want vacuum and analyze? Analyze currently does
not record the number of tuples and pages, because vacuum does that. Do
people want analyze as a separate command and in a separate file?

We definitely want a separate command that can invoke just the analyze
part. I'd guess something like "ANALYZE [ VERBOSE ] optional-table-name
(optional-list-of-columns)" pretty much like VACUUM.

I would be inclined to move the code out to a new file, just because
vacuum.c is so darn big, but that's purely a code-beautification issue.

On the number of tuples/pages issue, I'd suggest removing that function
from plain vacuum and make the analyze part do it instead. It's always
made me uncomfortable that vacuum needs to update system relations while
it's holding an exclusive lock on the table-being-vacuumed (which might
be another system catalog, or even pg_class itself). It works, more or
less, but that update-tuple-in-place code is awfully ugly and
fragile-looking. I'm also worried that there could be deadlock
scenarios between concurrent vacuums (eg, one guy working on pg_class,
another on pg_statistic, both need to get in and update the other guy's
table. Oops. That particular problem should be gone with your changes,
but maybe there are still problems just from the need to update
pg_class).

regards, tom lane

#7Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#6)
Re: Vacuum now uses AccessShareLock for analyze

Bruce Momjian <pgman@candle.pha.pa.us> writes:

The code will now vacuum all requested relations. It will then analyze
each relation. This way, all the exclusive vacuum work is done first,
then analyze can continue with shared locks.

I agree with Marc: it'd make more sense to do it one table at a time,
ie,
get exclusive lock on table A
vacuum table A
commit, release lock
get shared lock on table A
gather stats for table A
commit, release lock
repeat sequence for table B
etc

OK, changed.

I will work on the additional issues in the next week or so.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#8Daniel Kalchev
daniel@digsys.bg
In reply to: Tom Lane (#6)
Re: Vacuum now uses AccessShareLock for analyze

While we are at VACUUM, it would be good idea to have VACUUM flag each 'vacuumed' table (with some sort of 'clean' flag) - this will permit vaster vacuuming of mostly static databases, such that contain 'history' data in some tables.

Daniel

#9Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#6)
Re: Vacuum now uses AccessShareLock for analyze

How separate do people want vacuum and analyze? Analyze currently does
not record the number of tuples and pages, because vacuum does that. Do
people want analyze as a separate command and in a separate file?

We definitely want a separate command that can invoke just the analyze
part. I'd guess something like "ANALYZE [ VERBOSE ] optional-table-name
(optional-list-of-columns)" pretty much like VACUUM.

OK.

I would be inclined to move the code out to a new file, just because
vacuum.c is so darn big, but that's purely a code-beautification issue.

Done.

On the number of tuples/pages issue, I'd suggest removing that function
from plain vacuum and make the analyze part do it instead. It's always
made me uncomfortable that vacuum needs to update system relations while
it's holding an exclusive lock on the table-being-vacuumed (which might
be another system catalog, or even pg_class itself). It works, more or
less, but that update-tuple-in-place code is awfully ugly and
fragile-looking. I'm also worried that there could be deadlock
scenarios between concurrent vacuums (eg, one guy working on pg_class,
another on pg_statistic, both need to get in and update the other guy's
table. Oops. That particular problem should be gone with your changes,
but maybe there are still problems just from the need to update
pg_class).

How do I find the number of pages from heapscan? Can that number just
be computed from the file size. I can get the block number of the last
entry in the scan, but that doesn't show me expired rows at the end.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: Vacuum now uses AccessShareLock for analyze

Bruce Momjian <pgman@candle.pha.pa.us> writes:

How do I find the number of pages from heapscan?

IIRC, heap_beginscan updates the relcache entry with the current number
of blocks (as determined the hard way, with lseek). Should be able to
use that, even though it might be a little bit out of date by the time
you finish the scan ...

regards, tom lane