Postgres "invalid page header"

Started by Carl Anderson | almost 22 years ago | 10 messages | general

carl.anderson@co.fulton.ga.us

almost 22 years ago

Hubert,

I too have noticed such an issue.

Would you be willing to try to load a large table (from a SQL file)?
For me, it reliably demonstrates the "invalid page header" behavior
after completion.

It seems to me that while WAL logs are being flushed, a certain
type of failed SQL insert will cause a page to be marked incorrectly.

Hi list,

I am working with 7.4.1 under Linux (SuSE 8.1). The server is an HP ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a RAID 5 system with ca. 500 GB disk space (xfs file system).

When doing big transactions or changes (UPDATE of several million rows in one step) on a database with ca. 50 GB disk space and ca. 15 million entries, PostGIS and lots of indexes, I get errors like

ERROR: invalid page header in block 582024 of relation ...

The error does not occur regularly. It seems to me that the error is related to heavy load and heavy I/O. CPU and memory do not seem to be the problem, according to my hotsanic tools.

I have worked through

http://archives.postgresql.org/pgsql-general/2003-11/msg01288.php
http://archives.postgresql.org/pgsql-admin/2003-09/msg00423.php
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=20030922162322.E12708%40quartz.newn.cam.ac.uk&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8

which suggest to me that I have run into hardware trouble (disk? memory?!). But I had supposed that my server was not SO bad for this database application...

How can I fix that? (I mean: not how to repair the tables or pages, but how to get a stable system.) Should I try different storage hardware?
--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fröhlich
Bezirksfinanzdirektion München
Alexandrastr. 3, D-80538 München, GERMANY
Tel. :+49 (0)89 / 2190 - 2980
Fax :+49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de

--
Carl Anderson
GIS Manager, Fulton County E&CD
404.730.8026
carl.anderson@co.fulton.ga.us

Carl Anderson

carl.anderson@vadose.org

almost 22 years ago

In reply to: Carl Anderson (#1)

Re: Postgres "invalid page header"

On 06/17/2004 06:10:50 PM, Carl Anderson wrote:

Hubert,

I too have noticed such an issue.

Would you be willing to try to load a large table (from a SQL file)?
For me, it reliably demonstrates the behavior causing an "invalid page header"
during completion.

It seems (to me) that while WAL logs are being flushed, a certain type
of failed SQL insert will cause a page to be marked incorrectly.

I have not been able to nail down the problem, but confirmation that
another user can replicate the problem with the exact dataset would
greatly improve the ability to get this fixed.

BTW, I too am using XFS, so that may also be the culprit. Any non-XFS
users care to try?




Hubert Fröhlich

hubert.froehlich@bvv.bayern.de

almost 22 years ago

In reply to: Carl Anderson (#1)

Re: Postgres "invalid page header"

Carl,

Hubert,

I too have noticed such an issue

Would you be willing to try to load a large table (from a SQL file).
It reliably demonstrates the behavior "invalid page header"
after completion (for me)

Up to now, my observations are:
a) As described in my original posting: the error occurs irregularly, mostly
when UPDATing large tables (10 million entries), especially when doing
several million UPDATEs in ONE transaction. I then reduced the
transaction size to < 300000 updates per transaction, and the error
still occurs.
b) Meanwhile I have "managed" to reproduce the error on a smaller scale
rather reliably (somewhat similar to your scenario).
First, I INSERTed ca. 3 million rows into a database in transaction
packets of 100 each. So far, all fine. But when doing the same job
simultaneously on two databases in the same cluster on the same server,
I got the error.
c) I conclude: this is not an issue of transactions being too big.
However, might it be an issue of coordinating several databases in the
same cluster?
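
The batching described in (b) can be sketched as a small SQL generator. This is only an illustration under assumptions: the table name batchtest, its single integer column id, and the function name emit_batched_inserts are made up here and are not taken from the original test.

```bash
#!/bin/bash
# Hypothetical sketch: print SQL that inserts $1 rows in transactions of $2 rows each.
# Table "batchtest" and column "id" are placeholders, not from the original test.
# $2 must be greater than zero.
emit_batched_inserts() {
    local total=$1 batch=$2 n=0
    while [ "$n" -lt "$total" ]; do
        # open a new transaction at the start of every batch
        if [ $((n % batch)) -eq 0 ]; then
            echo "BEGIN;"
        fi
        echo "INSERT INTO batchtest (id) VALUES ($n);"
        n=$((n + 1))
        # close the transaction when the batch is full or the rows run out
        if [ $((n % batch)) -eq 0 ] || [ "$n" -eq "$total" ]; then
            echo "COMMIT;"
        fi
    done
}
```

Assuming the placeholder table exists in two databases (here called db1 and db2), the scenario could be approximated by running `emit_batched_inserts 3000000 100 | psql db1 &` and the same pipeline against db2 at the same time.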

It seems to me that while there are WAL logs being flushed, a certain
type of failed SQL insert will cause a page to be marked wrong.

I have not been able to nail down the problem but confirmation that
another user can replicate the problem with the exact dataset would
greatly improve the ability to get this fixed.

I'd like to test my server(s) with your data, so please send it.

BTW I too am using XFS so that may also be the culprit. any non XFS
users care to try?

I want to test with a different storage device, but I don't have one yet.

Greetings,

Hubert


Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Carl Anderson (#2)

Re: Postgres "invalid page header"

Carl Anderson <carl.anderson@vadose.org> writes:

Would you be willing to try to load a large table (from a SQL file).
It reliably demonstrates the behavior causing an "invalid page header"
during completion (for me)

I would very much like to see a test case that reliably produces
"invalid page header" ...

regards, tom lane

Florian Pflug

fgp@phlo.org

almost 22 years ago

In reply to: Carl Anderson (#1)

Re: Postgres "invalid page header"

On Thu, Jun 17, 2004 at 06:07:05PM -0400, Carl Anderson wrote:

I am working with 7.4.1 under Linux (SuSE 8.1) The server is a HP ProLiant
DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a RAID 5
system with ca. 500 GB diskspace (xfs file system)

When doing big transactions or changes (UPDATE several million rows in one
step) on a database with ca. 50 GB diskspace and ca 15 million entries,
PostGIS and lots of indexes, I get errors like

ERROR: invalid page header in block 582024 of relation ...

The error does not occur regularly. It seems to me that the error is
related to heavy load and heavy I/O. CPU and memory do not seem to be the
problem, according to my hotsanic tools.

I believe that I had a similar problem, but since the database in question
was still running, I thought it was related to the 7.4.1 bug regarding the
wrong alignment information.

Our database is running on a 2x Xeon 2.66GHz, 2GB of RAM, and two 120GB
drives (Seagate ST3120026AS) combined into a software RAID-1 volume (using the
md driver). Postgres has its data on an XFS filesystem (74GB). We are using
kernel 2.6.6 with the "deadline" I/O scheduler.

We load about 2-3 million rows into the database daily (in one big
transaction). The first problem that appeared was crashing selects; the
"invalid page header" problem appeared when I tried to upgrade to 7.4.2,
in the "analyze" step right after fixing the wrong alignment info in the
system table.

Since we couldn't repair the database (and more and more selects started to
crash), we dumped everything that was still dumpable, and reinitialized the
database. The new database has now been running for about 1 1/2 weeks, and
there haven't been any problems so far.

So, maybe it's really an XFS problem... Which version of Linux are you
running?

greetings, Florian Pflug
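
The salvage step mentioned above ("dumped everything that was still dumpable") could be approached table by table, so that one damaged table does not abort the whole run. The following is a minimal sketch, not the procedure actually used in this thread: the function name salvage_tables, the output directory layout, and the PG_DUMP override variable are assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical sketch: dump tables one at a time so that a single damaged
# table does not abort the whole salvage run.
# PG_DUMP can be overridden (e.g. for a dry run); it defaults to pg_dump.
salvage_tables() {
    local db=$1 outdir=$2 table failed=0
    mkdir -p "$outdir" || return 1
    # table names are read from stdin, one per line
    while IFS= read -r table; do
        [ -n "$table" ] || continue
        # stdin is redirected so the dump command cannot swallow the table list
        if "${PG_DUMP:-pg_dump}" -t "$table" "$db" </dev/null \
                >"$outdir/$table.sql" 2>"$outdir/$table.err"; then
            echo "ok      $table"
            rm -f "$outdir/$table.err"
        else
            echo "FAILED  $table"
            failed=$((failed + 1))
        fi
    done
    # exit status is 0 only if every table was dumped
    [ "$failed" -eq 0 ]
}
```

The table list could come from a query against pg_tables, for example `psql -At -c "select tablename from pg_tables where schemaname = 'public'" mydb | salvage_tables mydb salvage`, where mydb and salvage are placeholder names. Tables reported as FAILED keep their error output in the corresponding .err file.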

Hubert Fröhlich

hubert.froehlich@bvv.bayern.de

almost 22 years ago

In reply to: Florian Pflug (#5)

Re: Postgres "invalid page header"

Florian,

my hardware

I am working with 7.4.1 under Linux (SuSE 8.1). The server is an HP
ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a
RAID 5 system with ca. 500 GB diskspace (xfs file system)

seems somewhat similar to yours: I have been using the 2.4.22 kernel. I
wanted to upgrade to 2.6.x, but your experience ... (see below)

When doing big transactions or changes (UPDATE several million rows
in one step) on a database with ca. 50 GB diskspace and ca. 15 million
entries, PostGIS and lots of indexes, I get errors like

ERROR: invalid page header in block 582024 of relation ...

The error does not occur regularly. It seems to me that the error
is related to heavy load and heavy I/O. CPU and memory do not seem to be
the problem, according to my hotsanic tools.

Hi

I believe that I had a similar problem - but since that database in
question was still running I thought it was related to the 7.4.1 bug
regarding the wrong alignment information.

Our database is running on a 2x Xeon 2.66GHz, 2GB of RAM, and two 120GB
drives (Seagate ST3120026AS) combined to a software-raid-1 volume
(using the md driver). Postgres has its data on an XFS-Filesystem (74GB).
We are using kernel 2.6.6 with the "deadline" io-scheduler.

We daily load about 2-3 million rows into the database (in one big
transaction). The first problem that appeared was crashing selects - the
"Invalid page header"-problem appeared when I tried to upgrade to 7.4.2,
in the "analyze" step right after fixing the wrong alignment info in the
system table.

So, can I conclude that
a) the error happened with 7.4.1 and with 7.4.2 upgraded from 7.4.1, as
described in your posting
http://archives.postgresql.org/pgsql-general/2004-06/msg00647.php ?
Are you sure that it no longer occurs in your "clean" install of 7.4.2?
b) this also happens with a 2.6 kernel, so this is not a 2.4 kernel issue?

Reproducing the error is a bit difficult, as it seems to occur only under
high load in big databases.

I'll try to isolate the problem on a smaller scale. Has anybody out there
had the problem on a smaller scale?

Since we couldn't repair the database (and more and more selects
started to crash), we dumped everything that was still dumpable, and
reinitialized the database. The new database is now running for about
1 1/2 weeks, and there haven't been any problems until now.

So - maybe it's really an XFS problem... Which version of linux are you
running?

2.4.22, see above. To isolate filesystem problems I'd like to try with a
NAS if I get one. Do you think this makes sense?

Greetings,

Hubert Fröhlich


Hubert Fröhlich

hubert.froehlich@bvv.bayern.de

over 21 years ago

In reply to: Carl Anderson (#1)

Re: Postgres "invalid page header"

Hi list,
hello Carl, Florian,

maybe you remember my posting from 18.06.2004 and your reaction(s). The
problem was reproducing an error that does not happen very often, i.e.
isolating the problem ... As Tom Lane put it: "I would very much like to
see a test case that reliably produces "invalid page header" ..."

Meanwhile, I have been able to reproduce the error (and others) with some
very basic scripts, enclosed below. If you are interested in the problem,
feel free to use them (no liability ...).

The basic procedure is:

1) prepCrashtest.sh sets up SQL files for creating & populating (via
COPY) a database and doing a big update in 1 transaction. You can scale
it by varying the numberOfRows parameter, which sets the size of the only
table (this may take a while because of random generation via openssl).
There is only one table, with just characters and integers, and some
indexes. No special datatypes, just very basic SQL.

2) multiCrashtest.sh starts working on several databases at the same
time. On each db it does the following:
a) creating, populating, dumping, vacuuming the db and performing a
"big" update operation (by singleCrashtest.sh),
b) all that in several cycles (5 to 10 seems absolutely enough).

You can scale this by varying the number of databases accessed
simultaneously.

E.g. on a client (Athlon 1400+, 512 RAM, XFS, SuSE Linux 8.1 with 2.4.19
kernel), using
numberOfRows=300000 and ndatabases=3
I regularly get crashes like

ERROR: could not access status of transaction 1497515332
ERROR: could not open segment 1 of relation "idx_vmpnfl_obnr" (...
ERROR: invalid page header in block 25190 of relation "vmpnfl"
PANIC: right sibling's left-link doesn't match
ERROR: could not access status of transaction 188219392

(even PANIC!)
The same holds for an even smaller notebook.

On my big box (described in my original posting) I'm still testing with larger values. But as
you can see, the error can be reproduced on a smaller scale with a
smaller box.
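
Since multiCrashtest.sh writes one log file per database (log_crash0, log_crash1, ...), a quick way to see which runs hit errors like the ones listed above is to count ERROR and PANIC lines per log. This is a minimal sketch; the function name summarize_crash_logs is made up for illustration.

```bash
#!/bin/bash
# Sketch: count ERROR and PANIC lines in each log file given as argument.
# The log_crash* naming follows multiCrashtest.sh; the function name is made up.
summarize_crash_logs() {
    local log errors panics
    for log in "$@"; do
        # silently skip arguments that are not regular files
        [ -f "$log" ] || continue
        # grep -c prints the number of matching lines (0 if there are none)
        errors=$(grep -c 'ERROR:' "$log")
        panics=$(grep -c 'PANIC:' "$log")
        printf '%s\terrors=%s\tpanics=%s\n' "$log" "$errors" "$panics"
    done
}
```

It would be called as `summarize_crash_logs log_crash*` in the directory where the test was started. Note that it only counts lines; it does not distinguish the page header error from the other messages.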

The postgresql.conf settings in all cases are more or less standard, and I
don't believe I can influence the behavior by using different settings.
I am not too familiar with postgres internals, let alone the C code,
but I guess that there might be some buffer overflow?!

As the test hardware was very different in each case, I no longer believe
this is a hardware issue; but this is just a "feeling", not an expert's
opinion.

The crashes seem to happen only when *** several *** databases are
accessed simultaneously on the same box - and only when there is heavy
load.

The three scripts are enclosed below in plaintext to avoid attachments
being cut off (sorry for bad line formatting). The scripts may be VERY
BASIC, but I hope they can shed some light on the issue.

<snip>
#!/bin/bash
# prepCrashtest.sh
# prepare a SQL file containing a COPY with $1 randomly generated rows
# ---------------------------------------------------------------------

THIS=$(basename "$0")
if [ $# -ne 1 ]
then
    echo "usage: $THIS numberOfRows"
    exit 1
fi
nrows=$1

# ----------------------------------------------------------------------
# create a row for COPY command randomly
make_copy_row()
{
    obnr=$(openssl rand -base64 20 | cut -c1-16)
    pkn=$(openssl rand -base64 25 | cut -c1-30)
    # tab-separated fields in column order; \N is COPY's NULL marker (for dpp)
    printf '%s\t0\t2147483647\t10\t%s\t\\N\t0\n' "$obnr" "$pkn"
}

# ----------------------------------------------------------------------
# create a COPY command for a table with $nrows randomly generated rows
make_copy_nrows()
{
    echo "COPY vmpnfl (obnr, edat, udat, vma, pkn, dpp, orient) FROM stdin;"
    n=0
    while [[ $n -lt $nrows ]]
    do
        make_copy_row
        n=$((n + 1))
    done
    # end-of-data marker for COPY ... FROM stdin
    echo '\.'
}

echo "...create SQL CREATE TABLE file"

echo "CREATE TABLE vmpnfl (obnr character(16), edat integer DEFAULT 0,
udat integer DEFAULT 2147483647, vma smallint, pkn character(30), dpp
smallint, orient double precision DEFAULT 0 );" >crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_obnr ON vmpnfl USING btree (obnr);" >>crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_udat ON vmpnfl USING btree (udat);" >>crashSetup.sql
# this index relies on the implicit oid column that tables get by default in 7.4
echo "CREATE INDEX idx_vmpnfl_oid ON vmpnfl USING btree (oid);" >>crashSetup.sql

echo "...create SQL UPDATE file"

echo "begin; update vmpnfl set dpp = 1234; commit;" >crashUpdate.sql

echo "...create SQL COPY file with $nrows rows ... this may take a little while"

make_copy_nrows >crashTable.sql

echo "Finished. Try now:"
echo "./multiCrashtest.sh ndatabases servername ncycles"
<snip>
#!/bin/bash
# singleCrashtest.sh
# testing a single database on heavy load
# ----------------------------------------------------------------------
# setting up a database with one table and some indexes
# populating the db
# dumping the db somewhere
# vacuuming the db
# performing a big update operation: update of one column in all rows

THIS=$(basename "$0")
if [ $# -ne 3 ]
then
    echo "usage: $THIS dbname servername ncycles"
    exit 1
fi

database=$1
server=$2
ncycles=$3
dumpfile=dump$database.sql

echo "=== test on database $database, server $server, $ncycles R/W cycles: $(date +%Y%m%d%H%M%S) ==="

cycle=0

while [[ $cycle -lt $ncycles ]]
do
    cycle=$((cycle + 1))
    echo "=== cycle $cycle: $(date +%Y%m%d%H%M%S) ==="

    # start every cycle from a fresh database
    # (dropdb reports an error in the first cycle, when the db does not exist yet)
    dropdb -h "$server" "$database" 2>&1
    createdb -h "$server" "$database" 2>&1
    psql -h "$server" "$database" -f crashSetup.sql 2>&1 | grep -v CREATE   # reducing debug output

    psql -h "$server" "$database" -f crashTable.sql 2>&1 | grep -v INSERT   # reducing debug output
    sleep 3
    echo "--- cycle $cycle: Setup $(date +%Y%m%d%H%M%S) ---"

    pg_dump -h "$server" "$database" >"$dumpfile" 2>&1
    sleep 3
    echo "--- cycle $cycle: Dump $(date +%Y%m%d%H%M%S) ---"
    rm "$dumpfile"

    echo -n "Tuples: "
    psql "$database" -h "$server" -P tuples_only -c "select count(*) from vmpnfl"
    psql "$database" -h "$server" -c "vacuum verbose analyze" 2>&1
    sleep 3
    echo "--- cycle $cycle: Vacuum $(date +%Y%m%d%H%M%S) ---"

    psql -h "$server" "$database" -f crashUpdate.sql 2>&1
    sleep 3
    echo "--- cycle $cycle: BIG Update $(date +%Y%m%d%H%M%S) ---"

done

<snip>
#!/bin/bash
# multiCrashtest.sh
# testing databases on heavy load:
# Reading/writing into several databases simultaneously:
# this can be looped several times
# load can be increased
# by increasing ndb = no. of databases being accessed simultaneously
# or by increasing the number of rows in prepCrashtest.sh
# ----------------------------------------------------------------------
# ndatabases = number of databases accessed simultaneously
# servername = name of database server
# ncycles    = number of R/W cycles per database

THIS=$(basename "$0")
if [ $# -ne 3 ]
then
    echo "usage: $THIS ndatabases servername ncycles"
    exit 1
fi

ndb=$1
server=$2
ncycles=$3

# ----------------------------------------------------------------------
echo "...loop over $ndb databases on server $server: $ncycles R/W cycles per database"

db=0
while [[ $db -lt $ndb ]]
do
    # putting job to background; one log file per database
    # (the script is expected in the current directory, next to the SQL files;
    # the server argument is passed on instead of a hard-coded localhost)
    ./singleCrashtest.sh crash${db} "$server" "$ncycles" >log_crash${db} &
    echo " ... process for database crash${db} started"
    sleep 5
    db=$((db + 1))
done
exit

<snip>
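
Before loading a generated crashTable.sql, it can be worth checking that every data row really has the seven tab-separated fields that the COPY column list of prepCrashtest.sh expects, so that a malformed input file is not mistaken for a database problem. The following is a sketch only; the function name count_bad_copy_rows is an assumption, and the check is not part of the original scripts.

```bash
#!/bin/bash
# Sketch: count malformed data rows in a COPY file such as crashTable.sql.
# Rows between the "COPY ..." header and the "\." terminator are expected
# to have 7 tab-separated fields (the column count of vmpnfl).
count_bad_copy_rows() {
    awk -F'\t' '
        /^COPY /  { in_data = 1; next }   # header line starts the data section
        /^\\\.$/  { in_data = 0; next }   # a line containing only \. ends it
        in_data && NF != 7 { bad++ }      # wrong field count inside the data section
        END { print bad + 0 }             # print 0 when no bad row was seen
    ' "$1"
}
```

Running `count_bad_copy_rows crashTable.sql` prints the number of rows with an unexpected field count.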


Tom Lane

tgl@sss.pgh.pa.us

over 21 years ago

In reply to: Hubert Fröhlich (#7)

Re: Postgres "invalid page header"

Hubert Fröhlich <hubert.froehlich@bvv.bayern.de> writes:

E.g. On a client (Athlon 1400+, 512 RAM, XFS, SuSE Linux 8.1 with 2.4.19
kernel), I regularly crash when using
numberOfRows=300000 and ndatabases=3

FWIW, I could not reproduce any such problem here. I ran three test
cycles with numberOfRows=300000 and ndatabases=3 under PG 7.4.3 and
another three under CVS tip (amounting to something over six hours
total runtime). I saw no errors whatsoever. I was using Red Hat 8.0
with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.

I'm leaning to the theory that you've got intermittent hardware
problems.

regards, tom lane
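
One way to look for intermittent hardware problems independently of PostgreSQL is to write random data to the suspect filesystem repeatedly, copy it, and compare the copies. The sketch below is an illustration, not a tool anyone in this thread used: the function name io_roundtrip_check and its parameters are hypothetical. Reads may be served from the page cache rather than from disk, so a clean run does not prove the hardware is healthy, while a mismatch is a strong hint that something below the database is wrong.

```bash
#!/bin/bash
# Hypothetical sketch: write random data, copy it, and compare the copies.
# Arguments: directory on the filesystem under test, number of rounds,
# file size in KB. Prints the number of rounds in which the copies differed.
io_roundtrip_check() {
    local dir=$1 rounds=$2 size_kb=$3 i=0 mismatches=0
    local src="$dir/iocheck.src.$$" dst="$dir/iocheck.dst.$$"
    while [ "$i" -lt "$rounds" ]; do
        dd if=/dev/urandom of="$src" bs=1024 count="$size_kb" 2>/dev/null || return 2
        cp "$src" "$dst" || return 2
        # ask the kernel to flush dirty buffers before comparing
        sync
        # cmp -s is silent and returns non-zero if the files differ
        cmp -s "$src" "$dst" || mismatches=$((mismatches + 1))
        i=$((i + 1))
    done
    rm -f "$src" "$dst"
    echo "$mismatches"
}
```

For example, `io_roundtrip_check /path/on/suspect/fs 100 102400` would run 100 rounds with 100 MB files; the path is a placeholder. Running it while the database load test is active would come closer to the "heavy load" condition described earlier in the thread.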

Manfred Koizar

mkoi-pg@aon.at

over 21 years ago

In reply to: Tom Lane (#8)

Re: Postgres "invalid page header"

On Fri, 23 Jul 2004 14:33:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hubert Fröhlich <hubert.froehlich@bvv.bayern.de> writes:

E.g. On a client (Athlon 1400+, 512 RAM, XFS, SuSE Linux 8.1 with 2.4.19
kernel), I regularly crash [...]

Hubert, have you ever tested with something different than XFS? I
volunteer to cross-check any test you want me to, as long as you can
reproduce the problem on an ext2 file system first.

FWIW, I could not reproduce any such problem here. [...]
I was using Red Hat 8.0
with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.

Tom, do you remember what type of FS you used?

Servus
Manfred
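
When comparing results across machines, as asked above, the filesystem that actually holds the data directory can be read off directly. A minimal sketch, assuming GNU coreutils stat is available; the function name fs_type_of is made up for illustration.

```bash
#!/bin/bash
# Sketch: print the type of the filesystem that holds a directory.
# Relies on GNU coreutils stat: -f queries the filesystem instead of the
# file, and the %T format prints the filesystem type in readable form.
fs_type_of() {
    stat -f -c %T "$1"
}
```

Calling `fs_type_of "$PGDATA"` on each test machine (with PGDATA pointing at the cluster's data directory) prints a name such as xfs, which makes it easy to record the filesystem alongside each test result.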


Tom Lane

tgl@sss.pgh.pa.us

over 21 years ago

In reply to: Manfred Koizar (#9)

Re: Postgres "invalid page header"

Manfred Koizar <mkoi-pg@aon.at> writes:

On Fri, 23 Jul 2004 14:33:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

FWIW, I could not reproduce any such problem here. [...]
I was using Red Hat 8.0
with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.

Tom, do you remember what type of FS you used?

That machine is using ext3.

regards, tom lane