"ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Started by Alvaro Herrera over 16 years ago | 12 messages | hackers
#1Alvaro Herrera
alvherre@2ndquadrant.com

A customer of ours recently hit a problem where after an autovacuum was
cancelled on a table, the app started getting the message in $subject:

ERROR: could not read block 6 of relation 1663/35078/1761966: read only 0 of 8192 bytes

(block numbers vary from 1 to 6). Things remained in this state until
another autovacuum came along and cleaned up the table, 4 minutes later
(this is a high traffic table; there are several inserts per second).

The log looks like this:

2009-10-20 04:02:07 PDT [27396]: [1-1] LOG: automatic vacuum of table "database.public.tabname": index scans: 1
pages: 6 removed, 1 remain
tuples: 755 removed, 2 remain
system usage: CPU 0.00s/0.00u sec elapsed 1.42 sec
2009-10-20 04:02:07 PDT [27396]: [2-1] ERROR: canceling autovacuum task
2009-10-20 04:02:07 PDT [27396]: [3-1] CONTEXT: automatic vacuum of table "database.public.tabname"

What I thought could have happened is that the table was truncated, and
then the sinval message telling that to other backends was not sent due
to the rollback. When they try to insert into the page they had
recorded as rd_targblock, they try to read the page, but it's no longer
there.

I can reproduce this by adding a sleep and CHECK_FOR_INTERRUPTS after
lazy_vacuum_rel() returns, and before CommitTransactionCommand.

So far as I can see, what we need is to make sure the sinval message is
sent regardless of transaction commit/abort. How can that be done? It
is quite ugly to have an untimely autovacuum cancel disrupt the ability
to insert into a table.

Thoughts?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Alvaro Herrera <alvherre@commandprompt.com> writes:

What I thought could have happened is that the table was truncated, and
then the sinval message telling that to other backends was not sent due
to the rollback.

Hmm.

So far as I can see, what we need is to make sure the sinval message is
sent regardless of transaction commit/abort. How can that be done?

I would argue that once we've truncated, it's too late to abort. The
interrupt facility should be disabled from just before issuing the
truncate till after commit. It would probably be relatively painless to
do that with some manipulation of the interrupt holdoff stuff.

regards, tom lane

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

So far as I can see, what we need is to make sure the sinval message is
sent regardless of transaction commit/abort. How can that be done?

I would argue that once we've truncated, it's too late to abort. The
interrupt facility should be disabled from just before issuing the
truncate till after commit. It would probably be relatively painless to
do that with some manipulation of the interrupt holdoff stuff.

That cures my (admittedly simplistic) testcase. The patch is a bit ugly
because the interrupts are held off in lazy_vacuum_rel and need to be
released by its caller. I don't see any other way around the problem
though.

The attached patch is for 8.4; back branches all need a bit of editing.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachments:

vactrunc-nointerrupt-84.patch (text/x-diff; charset=us-ascii; +31 -6)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#3)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

I would argue that once we've truncated, it's too late to abort. The
interrupt facility should be disabled from just before issuing the
truncate till after commit. It would probably be relatively painless to
do that with some manipulation of the interrupt holdoff stuff.

That cures my (admittedly simplistic) testcase. The patch is a bit ugly
because the interrupts are held off in lazy_vacuum_rel and need to be
released by its caller. I don't see any other way around the problem
though.

I wonder whether we shouldn't extend this into VACUUM FULL too, to
prevent cancel once it's done that internal commit. It would fix
the "PANIC: can't abort a committed transaction" problem V.F. has.

regards, tom lane

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

I would argue that once we've truncated, it's too late to abort. The
interrupt facility should be disabled from just before issuing the
truncate till after commit. It would probably be relatively painless to
do that with some manipulation of the interrupt holdoff stuff.

That cures my (admittedly simplistic) testcase. The patch is a bit ugly
because the interrupts are held off in lazy_vacuum_rel and need to be
released by its caller. I don't see any other way around the problem
though.

I wonder whether we shouldn't extend this into VACUUM FULL too, to
prevent cancel once it's done that internal commit. It would fix
the "PANIC: can't abort a committed transaction" problem V.F. has.

Hmm, it seems to work. The attached is for 8.1.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachments:

vactrunc-nointerrupt-2-81.patch (text/x-diff; charset=us-ascii; +58 -20)

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#5)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

I wonder whether we shouldn't extend this into VACUUM FULL too, to
prevent cancel once it's done that internal commit. It would fix
the "PANIC: can't abort a committed transaction" problem V.F. has.

Hmm, it seems to work. The attached is for 8.1.

Looks OK, but please update the comment right before the
RecordTransactionCommit, along the lines of "We prevent cancel
interrupts after this point to mitigate the problem that you
can't abort the transaction now".

regards, tom lane

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#6)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

I wonder whether we shouldn't extend this into VACUUM FULL too, to
prevent cancel once it's done that internal commit. It would fix
the "PANIC: can't abort a committed transaction" problem V.F. has.

Hmm, it seems to work. The attached is for 8.1.

Looks OK, but please update the comment right before the
RecordTransactionCommit, along the lines of "We prevent cancel
interrupts after this point to mitigate the problem that you
can't abort the transaction now".

BTW I'm thinking of backpatching this all the way back to 7.4 -- are
we agreed on that?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Alvaro Herrera <alvherre@commandprompt.com> writes:

Looks OK, but please update the comment right before the
RecordTransactionCommit, along the lines of "We prevent cancel
interrupts after this point to mitigate the problem that you
can't abort the transaction now".

BTW I'm thinking of backpatching this all the way back to 7.4 -- are
we agreed on that?

Yeah, I would think the problems can manifest all the way back.

regards, tom lane

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#8)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Looks OK, but please update the comment right before the
RecordTransactionCommit, along the lines of "We prevent cancel
interrupts after this point to mitigate the problem that you
can't abort the transaction now".

BTW I'm thinking of backpatching this all the way back to 7.4 -- are
we agreed on that?

Yeah, I would think the problems can manifest all the way back.

Done, thanks for the discussion.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alvaro Herrera (#9)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

2009/11/10 Alvaro Herrera <alvherre@commandprompt.com>:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Looks OK, but please update the comment right before the
RecordTransactionCommit, along the lines of "We prevent cancel
interrupts after this point to mitigate the problem that you
can't abort the transaction now".

BTW I'm thinking of backpatching this all the way back to 7.4 -- are
we agreed on that?

Yeah, I would think the problems can manifest all the way back.

Done, thanks for the discussion.

Hello

Do you have an idea about the lazy vacuum locking problem?

Any plans?

Regards
Pavel Stehule

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pavel Stehule (#10)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

Pavel Stehule wrote:

Hello

Do you have an idea about the lazy vacuum locking problem?

Any plans?

Well, I understand the issue and we have an idea on how to attack it,
but I have no concrete plans to fix it ATM ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alvaro Herrera (#11)
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

2009/11/10 Alvaro Herrera <alvherre@commandprompt.com>:

Pavel Stehule wrote:

Hello

Do you have an idea about the lazy vacuum locking problem?

Any plans?

Well, I understand the issue and we have an idea on how to attack it,
but I have no concrete plans to fix it ATM ...

ok
Pavel
