ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

Started by austijc over 17 years ago | 6 messages | bugs
#1austijc
jaustin@jasononthe.net

Configuration:

Postgres 8.3.1
Solaris 10 Sparc System NFS mounting the database directory from a NetApp
2020 NAS device.

Mount options:

rw,bg,hard,rsize=32768,wsize=32768,vers=3,forcedirectio,nointr,proto=tcp,suid

Error:

ERROR: unexpected data beyond EOF in block 315378 of relation "file"
HINT: This has been seen to occur with buggy kernels; consider updating
your system.

Situation:

Occasionally under heavy insert load.

The error comes from line 225 of bufmgr.c. The kernel bug mentioned in the
comments is an lseek bug in a Linux kernel so I don't believe that is the
case here.
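For readers unfamiliar with the check, here is a rough sketch of the idea behind it. It is not the actual bufmgr.c code. As far as I understand it, Postgres raises this error when it goes to extend a relation and finds that the block it is about to add already holds data. The names check_extend_target and page_has_data are made up for illustration, and the "reported end" is passed in as a parameter so that a stale lseek(SEEK_END) result can be simulated.

```c
#define _POSIX_C_SOURCE 200809L  /* for pread() */

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192  /* PostgreSQL's default block size */

/* Hypothetical helper: returns 1 if any byte in the buffer is nonzero. */
static int page_has_data(const unsigned char *page, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (page[i] != 0)
            return 1;
    return 0;
}

/*
 * Simplified model of the sanity check (an assumption, not PostgreSQL code).
 * reported_end is the file size the OS claimed, e.g. from lseek(SEEK_END).
 * Returns 0 if nothing is stored at or beyond that offset, 1 if data is
 * already there (the "unexpected data beyond EOF" case), -1 on I/O error.
 */
int check_extend_target(int fd, off_t reported_end)
{
    unsigned char page[BLCKSZ];
    ssize_t n;

    assert(reported_end >= 0 && reported_end % BLCKSZ == 0);

    memset(page, 0, sizeof page);
    n = pread(fd, page, sizeof page, reported_end);
    if (n < 0)
        return -1;

    if (n > 0 && page_has_data(page, (size_t) n)) {
        fprintf(stderr, "unexpected data beyond EOF in block %ld\n",
                (long) (reported_end / BLCKSZ));
        return 1;
    }
    return 0;
}
```

If the OS under-reports the file size after a recent write, the block at the reported end is not empty, and that is the situation the HINT is describing.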

The question is can anyone more familiar with this tell me what's going on
here? I don't know if this is a Postgres, Sun, or NetApp issue. Could it
be a work around for an old Linux bug causing an issue with acceptable
behavior of the NetApp device?

There have been some clock differences between the Solaris system and the
NetApp device. Could Postgres be confused by file modification times being
a few seconds in the future?

--
View this message in context: http://www.nabble.com/ERROR%3A--unexpected-data-beyond-EOF-in-block-XXXXX-of-relation-%22file%22-tp19680438p19680438.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: austijc (#1)
Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

austijc <jaustin@jasononthe.net> writes:

> The question is can anyone more familiar with this tell me what's going on
> here? I don't know if this is a Postgres, Sun, or NetApp issue. Could it
> be a work around for an old Linux bug causing an issue with acceptable
> behavior of the NetApp device?

People who try to run databases over NFS usually regret it eventually ;-)

All I can say is that this error message has never before been reported
by anyone who wasn't exposed to that lseek-inconsistency kernel bug.
I am not finding it too hard to believe that NFS might be vulnerable to
similar misbehavior.

regards, tom lane

#3austijc
jaustin@jasononthe.net
In reply to: Tom Lane (#2)
Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

That's going to be a problem for the continued viability of Postgres.
Clustered systems using a NAS for data is a pretty common configuration
these days. Oracle specifically supports it and even complains if your NFS
mount options are not correct. Our Oracle DBs run great in this same
configuration and are a good 10-20 times faster than the local disk
performance along with the quick take-over capability if a system goes belly
up.

I'll try to isolate this problem with a simple C program to tell me what
software layer to look at. Hopefully it's just a configuration issue.
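A minimal sketch of what such a probe might look like follows. It is an illustration only, not the program the poster wrote. It appends fixed-size blocks to a file and, after each write, compares the size reported by lseek(SEEK_END) with the size the writes imply. The function name count_eof_mismatches and the 8192-byte block size are assumptions chosen to mirror Postgres defaults.

```c
#define _POSIX_C_SOURCE 200809L  /* for pwrite() */

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192  /* assumed block size, matching the PostgreSQL default */

/*
 * Hypothetical probe: writes nblocks blocks to path one after another and
 * counts how often lseek(SEEK_END) disagrees with the number of bytes
 * written so far. Returns the mismatch count, or -1 on an I/O error.
 */
int count_eof_mismatches(const char *path, int nblocks)
{
    unsigned char block[BLCKSZ];
    int fd, i, mismatches = 0;

    assert(path != NULL && nblocks > 0);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    memset(block, 0xAB, sizeof block);

    for (i = 0; i < nblocks; i++) {
        off_t expected_end = (off_t) (i + 1) * BLCKSZ;
        off_t reported_end;

        /* Write block i at its exact offset so a bad lseek cannot move it. */
        if (pwrite(fd, block, sizeof block, (off_t) i * BLCKSZ)
                != (ssize_t) sizeof block) {
            close(fd);
            return -1;
        }

        reported_end = lseek(fd, 0, SEEK_END);
        if (reported_end != expected_end) {
            fprintf(stderr, "block %d: expected EOF %ld, lseek reported %ld\n",
                    i, (long) expected_end, (long) reported_end);
            mismatches++;
        }
    }

    close(fd);
    return mismatches;
}
```

Calling it once with a path on the NFS mount and once with a path on local disk gives a first comparison. A single process may well not reproduce the problem, since the error was only seen under heavy insert load, so several copies running concurrently would be a closer approximation.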


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


#4David Fetter
david@fetter.org
In reply to: austijc (#3)
Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

On Sun, Sep 28, 2008 at 11:51:49AM -0700, austijc wrote:

> That's going to be a problem for the continued viability of
> Postgres.

Funny, I thought running a DBMS over a known-unreliable storage system
was a problem for the continued viability of Oracle. When, not if,
people lose enough data to this silliness, they'll be thinking hard
about how to get Oracle out and something reliable in.

> Clustered systems using a NAS for data is a pretty common
> configuration these days. Oracle specifically supports it and even
> complains if your NFS mount options are not correct. Our Oracle
> DBs run great in this same configuration and are a good 10-20 times
> faster than the local disk performance along with the quick
> take-over capability if a system goes belly up.

Oracle stores more state to the disk than PostgreSQL does, which has
significant downsides. There are more effective ways of handling
uptime requirements than jamming NFS into the picture. Maybe it's
just my failure of imagination, but I can't think of a *less*
effective one.

> I'll try to isolate this problem with a simple C program to tell me
> what software layer to look at. Hopefully it's just a configuration
> issue.

It's not. The issue is that NFS is broken garbage from a DBMS
perspective and, it's pretty easy to argue, from just about any other
perspective.

Cheers,
David.


--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#5austijc
jaustin@jasononthe.net
In reply to: David Fetter (#4)
Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

Okay, I see the maturity level is too low here. I'll take this elsewhere.
If anyone has a similar problem and would like to know the status please
email me.


#6Peter Eisentraut
peter_e@gmx.net
In reply to: David Fetter (#4)
Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

David Fetter wrote:

> On Sun, Sep 28, 2008 at 11:51:49AM -0700, austijc wrote:
>
>> That's going to be a problem for the continued viability of
>> Postgres.
>
> Funny, I thought running a DBMS over a known-unreliable storage system
> was a problem for the continued viability of Oracle. When, not if,
> people lose enough data to this silliness, they'll be thinking hard
> about how to get Oracle out and something reliable in.

NFS is not "unreliable"; it is just different in some respects from
other file systems. That, paired with some poor NFS implementations in
certain operating systems and this evident general misunderstanding, makes
it a poor fit for PostgreSQL.