Order of operations in lazy_vacuum_rel

Started by Tom Lanealmost 16 years ago10 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I see that lazy_vacuum_rel() truncates the heap before it does vacuuming
of the free space map. Once upon a time this wouldn't have mattered,
but now it means that cancel interrupts are likely to be ignored for
the duration of FreeSpaceMapVacuum(). Is that really necessary?
Would it be okay to swap the two steps?

regards, tom lane

#2Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#1)
Re: Order of operations in lazy_vacuum_rel

Tom Lane wrote:

I see that lazy_vacuum_rel() truncates the heap before it does vacuuming
of the free space map. Once upon a time this wouldn't have mattered,
but now it means that cancel interrupts are likely to be ignored for
the duration of FreeSpaceMapVacuum(). Is that really necessary?
Would it be okay to swap the two steps?

How would it matter? Interrupts are not enabled until the transaction
is committed anyway, which must happen after both things have finished ..

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: Order of operations in lazy_vacuum_rel

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

I see that lazy_vacuum_rel() truncates the heap before it does vacuuming
of the free space map. Once upon a time this wouldn't have mattered,
but now it means that cancel interrupts are likely to be ignored for
the duration of FreeSpaceMapVacuum(). Is that really necessary?
Would it be okay to swap the two steps?

How would it matter? Interrupts are not enabled until the transaction
is committed anyway, which must happen after both things have finished ..

The point would be to not disable interrupts till after doing the FSM
vacuuming. Ignoring CANCEL for longer than we must is bad.

I'm also looking at not disabling the interrupt until lazy_truncate_heap
determines that it's actually going to do RelationTruncate. The current
coding disables interrupts without any need in a large fraction of cases.

regards, tom lane

#4Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#3)
Re: Order of operations in lazy_vacuum_rel

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

I see that lazy_vacuum_rel() truncates the heap before it does vacuuming
of the free space map. Once upon a time this wouldn't have mattered,
but now it means that cancel interrupts are likely to be ignored for
the duration of FreeSpaceMapVacuum(). Is that really necessary?
Would it be okay to swap the two steps?

How would it matter? Interrupts are not enabled until the transaction
is committed anyway, which must happen after both things have finished ..

The point would be to not disable interrupts till after doing the FSM
vacuuming. Ignoring CANCEL for longer than we must is bad.

Oh, I see. I guess the answer is that it depends on what happens if you
interrupt after vacuuming the FSM. I have no idea what that is supposed
to accomplish so I'm of no help here. FreeSpaceMapVacuum says it's
about fixing inconsistencies in the FSM, so I'm guessing that it's not
critical.

I'm also looking at not disabling the interrupt until lazy_truncate_heap
determines that it's actually going to do RelationTruncate. The current
coding disables interrupts without any need in a large fraction of cases.

Hmm, yeah ... that moves the code to the innards of lazy_truncate_heap.
Seems reasonable.

FWIW I notice that RelationTruncate contains an outdated comment at the
top about the 'fsm' function argument which is nowadays no longer an
argument.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alvaro Herrera (#4)
Re: Order of operations in lazy_vacuum_rel

Alvaro Herrera wrote:

Tom Lane wrote:

The point would be to not disable interrupts till after doing the FSM
vacuuming. Ignoring CANCEL for longer than we must is bad.

Oh, I see. I guess the answer is that it depends on what happens if you
interrupt after vacuuming the FSM. I have no idea what that is supposed
to accomplish so I'm of no help here. FreeSpaceMapVacuum says it's
about fixing inconsistencies in the FSM, so I'm guessing that it's not
critical.

Yeah, interrupting FreeSpaceMapVacuum (or right after it) is harmless.

FWIW I notice that RelationTruncate contains an outdated comment at the
top about the 'fsm' function argument which is nowadays no longer an
argument.

Thanks, fixed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: Order of operations in lazy_vacuum_rel

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

The point would be to not disable interrupts till after doing the FSM
vacuuming. Ignoring CANCEL for longer than we must is bad.

Oh, I see. I guess the answer is that it depends on what happens if you
interrupt after vacuuming the FSM. I have no idea what that is supposed
to accomplish so I'm of no help here. FreeSpaceMapVacuum says it's
about fixing inconsistencies in the FSM, so I'm guessing that it's not
critical.

Actually, after thinking about this some more, I realize that this code
has got a significantly bigger problem than just whether it will respond
to CANCEL promptly. If we truncate the table, and then get an error
sometime before commit, the relcache inval message will not be sent,
leaving other backends at significant risk of strange errors due to
having rd_targblock pointing somewhere past the end of the table.
So we should reorder these operations just to reduce the risk window,
and I've done so.

It might be a good idea to develop a nontransactional signaling method
for causing other backends to reset rd_targblock --- perhaps we could
tie it to the smgr inval signaling that already happens on a truncate?
That would probably require moving rd_targblock down to the smgr
level, but offhand I see no reason that that'd be a bad thing to do.

I'm not going to panic about it right now, since the code has been like
this for a long time and we've had few if any complaints. But it seems
like something to fix sometime.

regards, tom lane

#7Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#6)
Re: Order of operations in lazy_vacuum_rel

Tom Lane wrote:

Actually, after thinking about this some more, I realize that this code
has got a significantly bigger problem than just whether it will respond
to CANCEL promptly. If we truncate the table, and then get an error
sometime before commit, the relcache inval message will not be sent,
leaving other backends at significant risk of strange errors due to
having rd_targblock pointing somewhere past the end of the table.
So we should reorder these operations just to reduce the risk window,
and I've done so.

Err, that problem was exactly why I added the interrupt holdoff in
there, so if you've got a better/more invasive solution, it's very
welcome.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: Order of operations in lazy_vacuum_rel

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

Actually, after thinking about this some more, I realize that this code
has got a significantly bigger problem than just whether it will respond
to CANCEL promptly.

Err, that problem was exactly why I added the interrupt holdoff in
there, so if you've got a better/more invasive solution, it's very
welcome.

Well, that's a pretty incomplete solution :-(. Maybe we should do
something about this. There wasn't any obvious solution before,
but now that we have the nontransactional smgr-level sinval messages
being sent on drops and truncates, it seems like tying rd_targblock
clearing to those would fix the problem. The easiest way to do that
would involve moving rd_targblock down to the SMgrRelation struct.
Probably rd_fsm_nblocks and rd_vm_nblocks too. Comments?

regards, tom lane

#9Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#8)
Re: Order of operations in lazy_vacuum_rel

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

Actually, after thinking about this some more, I realize that this code
has got a significantly bigger problem than just whether it will respond
to CANCEL promptly.

Err, that problem was exactly why I added the interrupt holdoff in
there, so if you've got a better/more invasive solution, it's very
welcome.

Well, that's a pretty incomplete solution :-(.

Yeah, we were well aware of that :-) It solved our problem (which was
related to interrupts from autovac)

Maybe we should do
something about this. There wasn't any obvious solution before,
but now that we have the nontransactional smgr-level sinval messages
being sent on drops and truncates, it seems like tying rd_targblock
clearing to those would fix the problem.

Hmm, sounds good, though I confess not having heard about
nontransactional sinval messages before.

The easiest way to do that
would involve moving rd_targblock down to the SMgrRelation struct.
Probably rd_fsm_nblocks and rd_vm_nblocks too. Comments?

Can't say it doesn't look like a modularity violation from here --
insertion target block doesn't really belong into smgr, does it?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#9)
Re: Order of operations in lazy_vacuum_rel

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

Maybe we should do
something about this. There wasn't any obvious solution before,
but now that we have the nontransactional smgr-level sinval messages
being sent on drops and truncates, it seems like tying rd_targblock
clearing to those would fix the problem.

Hmm, sounds good, though I confess not having heard about
nontransactional sinval messages before.

Hey, they've been in there almost a week ;-)
http://archives.postgresql.org/pgsql-committers/2010-02/msg00026.php

The easiest way to do that
would involve moving rd_targblock down to the SMgrRelation struct.
Probably rd_fsm_nblocks and rd_vm_nblocks too. Comments?

Can't say it doesn't look like a modularity violation from here --
insertion target block doesn't really belong into smgr, does it?

It never belonged in relcache, either. Trying to keep dynamic state
(not backed by a catalog entry) in the relcache has always been a
pretty klugy thing. smgr at least has a reasonable excuse for trying
to keep track of physical truncation events, which is the thing we need
here.

regards, tom lane