Vacuum Failure w/7.1beta4 on Linux/Sparc

Started by Ryan Kirkpatrick almost 25 years ago, 6 messages
#1 Ryan Kirkpatrick
pgsql@rkirkpat.net

While testing some existing database applications on 7.1beta4 on
my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
attempting to do a vacuum of a table:

NOTICE: FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
ERROR! Can't vacuum table Jobs! ERROR: VACUUM (repair_frag): FlushRelationBuffers returned -2

The first line is the error message from pgsql, while the second line is
the error message from my application (using the perl Pg module) reporting
the error message returned. It appears that this should only be a warning
(i.e. NOTICE, not FATAL or ERROR), but it caused the Pg module to throw an
error anyway. My application of course checks for errors, sees the error
thrown by Pg, and dies, assuming the error was fatal.

This error occurred after a load of about 50k records into the
referenced table, a load of 50k records total into a few other tables, and
then a few cleanup queries. The part of the application I was testing is
a database load from another (old, closed source) database. The vacuum
was at the end of the database load, as part of the final cleanup
routines.

So, is this a problem with pgsql in general, specific to
Linux/Sparc, or a bug in Pg causing it to be too paranoid? Thanks.

---------------------------------------------------------------------------
| "For to me to live is Christ, and to die is gain." |
| --- Philippians 1:21 (KJV) |
---------------------------------------------------------------------------
| Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ |
---------------------------------------------------------------------------
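
The NOTICE versus ERROR distinction raised in the message above can be sketched in a few lines. This is a hypothetical Python illustration, not the behaviour of the perl Pg module: `classify_message` and `is_fatal` are made-up names, and the only assumption is that each backend message starts with a severity word followed by a colon, as in the two lines quoted above.

```python
# Hypothetical helper names; assumes each backend message has the form
# "SEVERITY: text", as in the two lines quoted in the post.
SEVERITIES_THAT_ABORT = {"ERROR", "FATAL"}

def classify_message(line):
    """Split 'SEVERITY: text' into (severity, text)."""
    severity, sep, text = line.partition(":")
    if not sep:
        return ("UNKNOWN", line.strip())
    return (severity.strip().upper(), text.strip())

def is_fatal(lines):
    """True if any message carries a severity that aborts the command."""
    return any(classify_message(l)[0] in SEVERITIES_THAT_ABORT for l in lines)

messages = [
    "NOTICE: FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)",
    "ERROR: VACUUM (repair_frag): FlushRelationBuffers returned -2",
]
print(is_fatal(messages[:1]))  # only the NOTICE
print(is_fatal(messages))      # NOTICE followed by ERROR
```

With only the NOTICE present nothing would be treated as fatal; it is the second message, an ERROR, that makes a cautious client stop.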

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Kirkpatrick (#1)
Re: Vacuum Failure w/7.1beta4 on Linux/Sparc

Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:

> While testing some existing database applications on 7.1beta4 on
> my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> attempting to do a vacuum of a table:
>
> NOTICE: FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> ERROR! Can't vacuum table Jobs! ERROR: VACUUM (repair_frag): FlushRelationBuffers returned -2

This is undoubtedly a backend bug. Can you generate a reproducible test
case?

> So, is this a problem with pgsql in general, specific to
> Linux/Sparc, or a bug in Pg causing it to be too paranoid? Thanks.

Pg did get an ERROR from the vacuum command (note second line). Yes,
there is paranoia right up the line here, but I think that's a good
thing. Somewhere someone is failing to release a buffer refcount,
and we don't know what other consequences that bug might have. Better
to err on the side of caution.

regards, tom lane
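
The "failing to release a buffer refcount" scenario described above can be pictured with a toy model. This is only a sketch under stated assumptions, not the backend's code: `ToyBufferPool` and its methods are hypothetical names, and the -2 return value simply mirrors the number in the error message quoted in this thread.

```python
# Toy model only: class and method names are hypothetical, and the -2
# return value simply mirrors the number in the error message above.
class ToyBufferPool:
    def __init__(self):
        self.refcount = {}  # (relation, block) -> number of outstanding pins

    def pin(self, relation, block):
        key = (relation, block)
        self.refcount[key] = self.refcount.get(key, 0) + 1

    def unpin(self, relation, block):
        key = (relation, block)
        if self.refcount.get(key, 0) == 0:
            raise RuntimeError("unpin without matching pin")
        self.refcount[key] -= 1

    def flush_relation(self, relation):
        """Return 0 if no block of the relation is pinned, else -2."""
        for (rel, block), count in sorted(self.refcount.items()):
            if rel == relation and count > 0:
                print(f"NOTICE: block {block} of {rel} is referenced (count {count})")
                return -2
        return 0

pool = ToyBufferPool()
pool.pin("jobs", 953)                 # some code path pins the block...
leaked = pool.flush_relation("jobs")  # ...and has not unpinned it yet
pool.unpin("jobs", 953)               # releasing the pin clears the condition
clean = pool.flush_relation("jobs")
print(leaked, clean)
```

In this model the flush refuses to proceed while any pin is outstanding, which is why a single forgotten release is enough to make a later vacuum fail.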

#3 Ryan Kirkpatrick
pgsql@rkirkpat.net
In reply to: Tom Lane (#2)
Re: Vacuum Failure w/7.1beta4 on Linux/Sparc

On Mon, 12 Mar 2001, Tom Lane wrote:

> Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:
>
> > While testing some existing database applications on 7.1beta4 on
> > my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> > attempting to do a vacuum of a table:
> >
> > NOTICE: FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> > ERROR! Can't vacuum table Jobs! ERROR: VACUUM (repair_frag): FlushRelationBuffers returned -2
>
> This is undoubtedly a backend bug. Can you generate a reproducible test
> case?

I will work on it... The code that eventually caused it does a lot
of different things, so it will take me a little while to pare it down to
a small, self-contained test case. I should have it by this weekend.

Also, two other details I forgot to put in my first email:

a) Running 'vacuumdb -t Jobs {dbname}' about 24 hours after the error (the
backend had been completely idle during this time) completed successfully,
without error.

b) The disk space where the pgsql database is located is NFS mounted from
my Alpha (running Linux of course :). [0] Might this cause the error?

[0] Yes, I know running pgsql on an NFS mount is probably not the greatest
idea, but the system only has 1GB of local disk space (almost all used for
the system) and is running as a development server only. No valuable data is
entrusted to it. Hopefully I will have more local disk space in the near
future.

> Pg did get an ERROR from the vacuum command (note second line). Yes,
> there is paranoia right up the line here, but I think that's a good
> thing. Somewhere someone is failing to release a buffer refcount,
> and we don't know what other consequences that bug might have. Better
> to err on the side of caution.

A reasonable amount of paranoia is indeed always healthy. :) Just
wanted to know if this might have been a known and harmless warning. I
guess not. I will work on a test case and get back hopefully by the
weekend. Thanks for your help.

#4 Ryan Kirkpatrick
pgsql@rkirkpat.net
In reply to: Ryan Kirkpatrick (#1)
Re: Vacuum Failure w/7.1beta4 on Linux/Sparc -- FALSE ALARM

On Mon, 12 Mar 2001, Ryan Kirkpatrick wrote:

> While testing some existing database applications on 7.1beta4 on
> my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> attempting to do a vacuum of a table:
>
> NOTICE: FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> ERROR! Can't vacuum table Jobs! ERROR: VACUUM (repair_frag): FlushRelationBuffers returned -2

I moved the data directory to a local partition (from the NFS
mounted one it was on) and reran my application. It worked fine this time,
vacuuming tables without errors, and the above error was never seen. Looks
like pgsql is not NFS safe, or at least not with Linux's implementation.

This is good news in that it is not a serious issue, but bad news
in that now I really do have to hurry up and get more local space for this
box to do anything useful with it. :)

Thanks for everyone's help. TTYL.
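
Whether a data directory sits on an NFS mount can be checked programmatically as well as by hand. The following is a minimal sketch, assuming a Linux-style mount table in the whitespace-separated "device mountpoint fstype options" layout of /proc/mounts; `filesystem_type`, the sample table, and the paths in it are invented for illustration and are not taken from the system described above.

```python
# Hypothetical helper; assumes the "device mountpoint fstype options ..."
# layout of /proc/mounts on Linux. The sample paths are invented.
import posixpath

def filesystem_type(path, mounts_text):
    """Return the fstype of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type

SAMPLE_MOUNTS = """\
/dev/sda1 / ext2 rw 0 0
alpha:/export/pgsql /var/lib/pgsql nfs rw,hard,intr 0 0
"""

print(filesystem_type("/var/lib/pgsql/data", SAMPLE_MOUNTS))
print(filesystem_type("/usr/local", SAMPLE_MOUNTS))
```

On a real Linux machine the second argument would be the contents of /proc/mounts rather than the sample string; a result of "nfs" for the data directory indicates the setup described in this message.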

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Kirkpatrick (#4)
Re: Re: Vacuum Failure w/7.1beta4 on Linux/Sparc -- FALSE ALARM

Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:

> On Mon, 12 Mar 2001, Ryan Kirkpatrick wrote:
>
> > While testing some existing database applications on 7.1beta4 on
> > my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> > attempting to do a vacuum of a table:
> >
> > NOTICE: FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> > ERROR! Can't vacuum table Jobs! ERROR: VACUUM (repair_frag): FlushRelationBuffers returned -2

This is probably explained by the problem we found a few days ago with
BufferSync acquiring locks it shouldn't.

regards, tom lane

#6 Ryan Kirkpatrick
pgsql@rkirkpat.net
In reply to: Tom Lane (#5)
Re: Re: Vacuum Failure w/7.1beta4 on Linux/Sparc -- FALSE ALARM

On Mon, 26 Mar 2001, Tom Lane wrote:

> Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:
>
> > On Mon, 12 Mar 2001, Ryan Kirkpatrick wrote:
> >
> > > While testing some existing database applications on 7.1beta4 on
> > > my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> > > attempting to do a vacuum of a table:
> > >
> > > NOTICE: FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> > > ERROR! Can't vacuum table Jobs! ERROR: VACUUM (repair_frag): FlushRelationBuffers returned -2
>
> This is probably explained by the problem we found a few days ago with
> BufferSync acquiring locks it shouldn't.

Yes, it was. I just tried RC1 on the Sparc with my application,
with the data directory NFS mounted, and this time it ran without
errors. Thanks. :)
