Postgres "invalid page header"
Hubert,
I too have noticed such an issue
Would you be willing to try to load a large table (from a SQL file).
It reliably demonstrates the behavior "invalid page header"
after completion (for me)
It seems to me that while there are WAL logs being flushed, a certain
type of failed SQL insert will cause a page to be marked wrong.
C.
Hi list,
I am working with 7.4.1 under Linux (SuSE 8.1) The server is a HP ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a RAID 5 system with ca. 500 GB diskspace (xfs file system)
When doing big transactions or changes (UPDATE several million rows in one step) on a database with ca. 50 GB diskspace and ca 15 million entries, PostGIS and lots of indexes, I get errors like
ERROR: invalid page header in block 582024 of relation ...
The error does not occur regularly. It seems to me that the error is related with heavy load and heavy I/O . CPU and memory does not seem the problem, according to my hotsanic tools.
I have worked through
http://archives.postgresql.org/pgsql-general/2003-11/msg01288.php
http://archives.postgresql.org/pgsql-admin/2003-09/msg00423.php
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=20030922162322.E12708%40quartz.newn.cam.ac.uk&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8which suggest to me that I have run into hardware trouble (disk? memory?!) But I supposed that my server was not SO bad for this database application...
How to fix that? (I mean - not to repair the tables or pages but to get a stable one? ) Trying another sotrage hardware?!
--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fr�hlich
Bezirksfinanzdirektion M�nchen
Alexandrastr. 3, D-80538 M�nchen, GERMANY
Tel. :+49 (0)89 / 2190 - 2980
Fax :+49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de
--
Carl Anderson
GIS Manager, Fulton County E&CD
404.730.8026
carl.anderson@co.fulton.ga.us
On 06/17/2004 06:10:50 PM, Carl Anderson wrote:
Hubert,
I too have noticed such an issue
Would you be willing to try to load a large table (from a SQL file).
It reliably demonstrates the behavior causing an "invalid page header"
during completion (for me)
It seems (to me) that while WAL logs are being flushed, a certain type
of failed SQL insert will cause a page to be marked wrong.
I have not been able to nail down the problem but confirmation that
another user can replicate the problem with the exact dataset would
greatly improve the ability to get this fixed.
BTW I too am using XFS so that may also be the culprit. any non XFS
users care to try?
C.
Show quoted text
Hi list,
I am working with 7.4.1 under Linux (SuSE 8.1) The server is a HP
ProLiant
DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a RAID 5
system
with ca. 500 GB diskspace (xfs file system)When doing big transactions or changes (UPDATE several million rows
in one
step) on a database with ca. 50 GB diskspace and ca 15 million
entries,
PostGIS and lots of indexes, I get errors likeERROR: invalid page header in block 582024 of relation ...
The error does not occur regularly. It seems to me that the error is
related
with heavy load and heavy I/O . CPU and memory does not seem the
problem,
according to my hotsanic tools.I have worked through
http://archives.postgresql.org/pgsql-general/2003-11/msg01288.php
http://archives.postgresql.org/pgsql-admin/2003-09/msg00423.php
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=20030922162322.E12708%40quartz.newn.cam.ac.uk&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8which suggest to me that I have run into hardware trouble (disk?
memory?!)
But I supposed that my server was not SO bad for this database
application...How to fix that? (I mean - not to repair the tables or pages but to
get a
stable one? ) Trying another sotrage hardware?!
--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fröhlich
Bezirksfinanzdirektion München
Alexandrastr. 3, D-80538 München, GERMANY
Tel. :+49 (0)89 / 2190 - 2980
Fax :+49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de
Import Notes
Reply to msg id not found: 40D216EA.9090509@co.fulton.ga.usReference msg id not found: 40D216EA.9090509@co.fulton.ga.us | Resolved by subject fallback
Carl,
Hubert,
I too have noticed such an issue
Would you be willing to try to load a large table (from a SQL file).
It reliably demonstrates the behavior "invalid page header"
after completion (for me)
Up to now, my phenomenon is
a) described as in my posting below The error occurs irregularly, mostly
when UPDATinig large (10 mill. entries) tables, especially when doing
several million UPDATES in ONE transaction. After that, I reduced the
transaction size to < 300000 updates per transaction, and the error
still occurs.
b) Meanwhile I have "managed" to reproduce the error on a smaller scale
rather reliably. (somewhat similar to your scenario)
First, I INSERTed ca. 3 mill. rows into a database in transaction
packets of 100 each. So far, all fine. But when doing the same job
simultaneously on two databases in the same cluster on the same server,
I got the error.
c) I conclude: The issue is not an issue of too big transactions.
However, might it be an issue of coordinating several databases on the
same cluster?
It seems to me that while there are WAL logs being flushed, a certain
type of failed SQL insert will cause a page to be marked wrong.I have not been able to nail down the problem but confirmation that
another user can replicate the problem with the exact dataset would
greatly improve the ability to get this fixed.
I'd like to test my server(s) with your data, so send'em.
BTW I too am using XFS so that may also be the culprit. any non XFS
users care to try?
I want to test with a different storage device but i haven't one yet.
Greetings,
Hubert
C.
Hi list,
I am working with 7.4.1 under Linux (SuSE 8.1) The server is a HP
ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a
RAID 5 system with ca. 500 GB diskspace (xfs file system)When doing big transactions or changes (UPDATE several million rows in
one step) on a database with ca. 50 GB diskspace and ca 15 million
entries, PostGIS and lots of indexes, I get errors likeERROR: invalid page header in block 582024 of relation ...
The error does not occur regularly. It seems to me that the error is
related with heavy load and heavy I/O . CPU and memory does not seem
the problem, according to my hotsanic tools.I have worked through
http://archives.postgresql.org/pgsql-general/2003-11/msg01288.php
http://archives.postgresql.org/pgsql-admin/2003-09/msg00423.php
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=20030922162322.E12708%40quartz.newn.cam.ac.uk&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8which suggest to me that I have run into hardware trouble (disk?
memory?!) But I supposed that my server was not SO bad for this
database application...How to fix that? (I mean - not to repair the tables or pages but to
get a stable one? ) Trying another sotrage hardware?!
--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fr�hlich
Bezirksfinanzdirektion M�nchen
Alexandrastr. 3, D-80538 M�nchen, GERMANY
Tel. :+49 (0)89 / 2190 - 2980
Fax :+49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de
Carl Anderson <carl.anderson@vadose.org> writes:
Would you be willing to try to load a large table (from a SQL file).
It reliably demonstrates the behavior causing an "invalid page header"
during completion (for me)
I would very much like to see a test case that reliably produces
"invalid page header" ...
regards, tom lane
On Thu, Jun 17, 2004 at 06:07:05PM -0400, Carl Anderson wrote:
I am working with 7.4.1 under Linux (SuSE 8.1) The server is a HP ProLiant
DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a RAID 5
system with ca. 500 GB diskspace (xfs file system)When doing big transactions or changes (UPDATE several million rows in one
step) on a database with ca. 50 GB diskspace and ca 15 million entries,
PostGIS and lots of indexes, I get errors likeERROR: invalid page header in block 582024 of relation ...
The error does not occur regularly. It seems to me that the error is
related with heavy load and heavy I/O . CPU and memory does not seem the
problem, according to my hotsanic tools.
Hi
I believe that I had a similar problem - but since that database in question
was still running I thought it was releated to the 7.4.1 bug regarding the
wrong alignment information.
Our database is running on an 2x Xeon 2.66GHz, 2GB of RAM, and two 120GB
Drives (Seagate ST3120026AS) combined to a software-raid-1 volume (using the
md driver). Postgres has it's data on an XFS-Filesystem (74GB). We are using
kernel 2.6.6 with the "deadline" io-scheduler).
We daily load about 2-3 million rows into the database (in one big
transaction). The first problem
that appeared where crashing selects - the "Invalid page header"-problem
appeared when I tried to upgrade to 7.4.2, in the "analyze" step right after
fixing the wrong alignment info in the system table.
Since we couldn't repair the database (and more and more selects started to
crash), we dumped everything that was still dumpable, and reinitialized the
database. The new database is now running for about 1 1/2 weeks, and there
haven't been any problems until now.
So - maybe it's really an XFS problem... Which version of linux are you
running?
greetings, Florian Pflug
Florian,
my hardware
I am working with 7.4.1 under Linux (SuSE 8.1) The server is a HP
ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a
RAID 5 system with ca. 500 GB diskspace (xfs file system)
seems somewhat similar to yours: I have been using the 2.4.22 kernel. I
wanted do upgrade to a 2.6.X, but your experience ... (see below)
When doing big transactions or changes (UPDATE several million rows
in one step) on a database with ca. 50 GB diskspace and ca 15 million
entries, PostGIS and lots of indexes, I get errors like
ERROR: invalid page header in block 582024 of relation ...
The error does not occur regularly. It seems to me that the error
is related with heavy load and heavy I/O . CPU and memory does not seem
the problem, according to my hotsanic tools.
Hi
I believe that I had a similar problem - but since that database in
question
was still running I thought it was releated to the 7.4.1 bug
regarding the
wrong alignment information.
Our database is running on an 2x Xeon 2.66GHz, 2GB of RAM, and two 120GB
Drives (Seagate ST3120026AS) combined to a software-raid-1 volume
(using the
md driver). Postgres has it's data on an XFS-Filesystem (74GB). We
are using
kernel 2.6.6 with the "deadline" io-scheduler).
We daily load about 2-3 million rows into the database (in one big
transaction). The first problem
that appeared where crashing selects - the "Invalid page header"-problem
appeared when I tried to upgrade to 7.4.2, in the "analyze" step
right after
fixing the wrong alignment info in the system table.
So, can I conclude that
a) the error happened with 7.4.1 and with 7.4.2 upgraded from 7.4.1 as
described in your posting
http://archives.postgresql.org/pgsql-general/2004-06/msg00647.php .
Are you sure that it does no longer occur in your "clean" install of 7.4.2?
b) this also happens with a 2.6 kernel so this is not a 2.4 kernel issue?
Reproducing the error is a bit difficult as it seems to occur only on
high load in big databases.
I'll try to isolate the problem on a smaller scale. Anybody on the air
who has had the problem on a smaller issue?
Since we couldn't repair the database (and more and more selects
started to
crash), we dumped everything that was still dumpable, and
reinitialized the
database. The new database is now running for about 1 1/2 weeks, and
there
haven't been any problems until now.
So - maybe it's really an XFS problem... Which version of linux are you
running?
2.4.22, see above. To isolate filesystem problems I'd like to try with a
NAS if I get one. Do you think this makes sense?
Greetings,
Hubert Fr�hlich
--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fr�hlich
Bezirksfinanzdirektion M�nchen
Alexandrastr. 3, D-80538 M�nchen, GERMANY
Tel. :+49 (0)89 / 2190 - 2980
Fax :+49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de
Import Notes
Resolved by subject fallback
Hi list,
hello Carl, Florian,
maybe you remember my posting from 18.06.2004 and your reaction(-s) The
problem is reproducing my error that does not happen very often, i.e.
isolating the problem ... As Tom Lane put it: "I would very much like to
see a test case that reliably produces "invalid page header" ..."
Meanwhile, I was able to reproduce the error (and others) with some very
basic scripts enclosed. If you are interested in the problem, feel free
to use them (No liability ...)
The basic doing is
1) prepCrashtest.sh sets up SQL files for creating & populating (via
COPY) a database and doing a big update in 1 transaction. You can scale
it by varying the numberOfRows parameter which sets the size of the only
table (this may take a while because of random generation via openssl).
There is only one table just with characters and integers and some
indexes. No special datatypes, just very basic SQL.
2) multiCrashtest.sh starts working on several databases at the same
time on each db:
a) creating, populating, dumping, vcuuming the db and performing a
"big" update operation (by singleCrashtest.sh),
b) all that in several cycles (5 to 10 seems absolutely enough).
You can scale this by varying the number of databases accessed
simultaneously
E.g. On a client (Athlon 1400+, 512 RAM, XFS, SuSE Linux 8.1 with 2.4.19
kernel), I regularly crash when using
numberOfRows=300000 and ndatabases=3
I regularly get crashes like
ERROR: could not access status of transaction 1497515332
ERROR: could not open segment 1 of relation "idx_vmpnfl_obnr" (...
ERROR: invalid page header in block 25190 of relation "vmpnfl"
PANIC: right sibling's left-link doesn't match
ERROR: could not access status of transaction 188219392
(even PANIC!)
Same holds for an even smaller notebook .
On my big box (see below) i'm still testing with larger values. But as
you can see, the error can be reproduced on a smaller scale with a
smaller box.
The postgresql.conf settings in all cases are somewhat standard, and I
don't believe I can influence the behavior by using different settings.
I am not too familiar with postgres internals, let alone the C coding,
but I guess that there might be some buffer overflow?!
As the hardware on testing was very different, Meanwhile, I dont believe
this is a hardware issue; but this is just a "feeling", no expert's
opinion.
The crashes seem to happen only when *** several *** databases are
accessed simultaneously on the same box - and only when there is heavy
load.
The three scripts are enclosed below in plaintext to avoid cutting of
attachments (sorry for bad line formatting). The scripts may be VERY
BASIC but I hope they can shed some light on the issue
<snip>
#!/bin/bash
# prepCrashtest.sh
# prepare a SQL file containing a COPY with $1 randomly generated rows
# ---------------------------------------------------------------------
#
THIS=`basename $0`
if [ $# != 1 ]
then
echo "usage: $THIS numberOfRows"
exit 1
fi
nrows=$1
make_copy_row()
# ----------------------------------------------------------------------
# create a row for COPY command randomly
{
obnr=`openssl rand -base64 20 | cut -c1-16`
pkn=`openssl rand -base64 25 | cut -c1-30`
echo -e "$obnr\t0\t2147483647\t10\t$pkn\t\\N\t0"
}
make_copy_nrows()
{
#
----------------------------------------------------------------------------
# create a COPY command for a table with $nrows randomly generated rows
echo "COPY vmpnfl (obnr, edat, udat, vma, pkn, dpp, orient) FROM stdin;"
n=0
while [[ $n -lt $nrows ]]
do
make_copy_row
n=`expr $n + 1`
done
echo "\."
}
echo "...create SQL CREATE TABLE file"
echo "CREATE TABLE vmpnfl (obnr character(16), edat integer DEFAULT 0,
udat integer DEFAULT 2147483647, vma smallint, pkn character(30), dpp
smallint, orient double precision DEFAULT 0 );">crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_obnr ON vmpnfl USING btree
(obnr);">>crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_udat ON vmpnfl USING btree
(udat);">>crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_oid ON vmpnfl USING btree
(oid);">>crashSetup.sql
echo "...create SQL UPDATE file"
echo "begin; update vmpnfl set dpp = 1234; commit" >crashUpdate.sql
echo "...create SQL COPY file with $nrows rows ... this may take a
little while"
make_copy_nrows >crashTable.sql
echo "Finished. Try now:"
echo "multiCrashtest.sh ndatabases servername ncycles"
<snip>
#!/bin/bash
# singleCrashtest.sh
# testing a single database on heavy load
# ----------------------------------------------------------------------
# setting up a database with one table and some indexex
# populating the db
# dumping the db somewhere
# vacuuming the db
# performing a big update operation: update of all columns in one row
THIS=`basename $0`
if [ $# != 3 ]
then
echo "usage: $THIS dbname servername ncycles"
exit 1
fi
database=$1
server=$2
ncycles=$3
dumpfile=dump$database.sql
echo "=== test on database $database, server $server, $ncycles R/W
cycles: "`date +%Y%m%d%H%M%S`" ==="
cycle=0
while [[ $cycle -lt $ncycles ]]
do
cycle=`expr $cycle + 1`
echo "=== cycle $cycle: "`date +%Y%m%d%H%M%S`" ==="
dropdb -h $server $database 2>&1
createdb -h $server $database 2>&1
psql -h $server $database -f crashSetup.sql 2>&1| grep -v CREATE #
reducing debug output
psql -h $server $database -f crashTable.sql 2>&1 | grep -v INSERT #
reducing debug output
sleep 3
echo "--- cycle $cycle: Setup "`date +%Y%m%d%H%M%S`" ---"
pg_dump -h $server $database >$dumpfile 2>&1
sleep 3
echo "--- cycle $cycle: Dump "`date +%Y%m%d%H%M%S`" ---"
rm $dumpfile
echo -n "Tuples: "
psql $database -h $server -P tuples_only -c "select count(*) from vmpnfl"
psql $database -h $server -c "vacuum verbose analyze" 2>&1
sleep 3
echo "--- cycle $cycle: Vacuum "`date +%Y%m%d%H%M%S`" ---"
psql -h $server $database -f crashUpdate.sql 2>&1
sleep 3
echo "--- cycle $cycle: BIG Update "`date +%Y%m%d%H%M%S`" ---"
done
<snip>
#!/bin/bash
# multiCrashtest.sh
# testing databases on heavy load:
# Reading/writing into several databases simultaneously:
# this can be looped several times
# load can be increased
# by increasing ndb = no. of databases being accessed simultaneously
# or by increasing the number of rows in prepCrashtest.sh
# ----------------------------------------------------------------------
# ndatabases
# servername = name of database server
# ncycles =
THIS=`basename $0`
if [ $# != 3 ]
then
echo "usage: $THIS ndatabases servername ncycles"
exit 1
fi
ndb=$1
server=$2
ncycles=$3
# ----------------------------------------------------------------------
echo "...loop over $ndb databases on server $servername: $ncycles R/W
cycles per database"
db=0
while [[ $db -lt $ndb ]]
do
# putting job to background
singleCrashtest.sh crash${db} localhost $ncycles >log_crash${db} &
echo " ... process for database crash${db} started"
sleep 5
db=`expr $db + 1`
done
exit
<snip>
Hubert,
I too have noticed such an issue
Would you be willing to try to load a large table (from a SQL file).
It reliably demonstrates the behavior "invalid page header"
after completion (for me)It seems to me that while there are WAL logs being flushed, a certain
type of failed SQL insert will cause a page to be marked wrong.C.
Hi list,
I am working with 7.4.1 under Linux (SuSE 8.1) The server is a HP
ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a
RAID 5 system with ca. 500 GB diskspace (xfs file system)When doing big transactions or changes (UPDATE several million rows in
one step) on a database with ca. 50 GB diskspace and ca 15 million
entries, PostGIS and lots of indexes, I get errors likeERROR: invalid page header in block 582024 of relation ...
The error does not occur regularly. It seems to me that the error is
related with heavy load and heavy I/O . CPU and memory does not seem
the problem, according to my hotsanic tools.I have worked through
http://archives.postgresql.org/pgsql-general/2003-11/msg01288.php
http://archives.postgresql.org/pgsql-admin/2003-09/msg00423.php
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=20030922162322.E12708%40quartz.newn.cam.ac.uk&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8which suggest to me that I have run into hardware trouble (disk?
memory?!) But I supposed that my server was not SO bad for this
database application...How to fix that? (I mean - not to repair the tables or pages but to
get a stable one? ) Trying another sotrage hardware?!
--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fr�hlich
Bezirksfinanzdirektion M�nchen
Alexandrastr. 3, D-80538 M�nchen, GERMANY
Tel. :+49 (0)89 / 2190 - 2980
Fax :+49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de
=?ISO-8859-1?Q?Hubert_Fr=F6hlich?= <hubert.froehlich@bvv.bayern.de> writes:
E.g. On a client (Athlon 1400+, 512 RAM, XFS, SuSE Linux 8.1 with 2.4.19
kernel), I regularly crash when using
numberOfRows=300000 and ndatabases=3
FWIW, I could not reproduce any such problem here. I ran three test
cycles with numberOfRows=300000 and ndatabases=3 under PG 7.4.3 and
another three under CVS tip (amounting to something over six hours
total runtime). I saw no errors whatsoever. I was using Red Hat 8.0
with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.
I'm leaning to the theory that you've got intermittent hardware
problems.
regards, tom lane
On Fri, 23 Jul 2004 14:33:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
=?ISO-8859-1?Q?Hubert_Fr=F6hlich?= <hubert.froehlich@bvv.bayern.de> writes:
E.g. On a client (Athlon 1400+, 512 RAM, XFS, SuSE Linux 8.1 with 2.4.19
kernel), I regularly crash [...]
Hubert, have you ever tested with something different than XFS? I
volunteer to cross-check any test you want me to, as long as you can
reproduce the problem on an ext2 file system first.
FWIW, I could not reproduce any such problem here. [...]
I was using Red Hat 8.0
with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.
Tom, do you remember what type of FS you used?
Servus
Manfred
Manfred Koizar <mkoi-pg@aon.at> writes:
On Fri, 23 Jul 2004 14:33:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
FWIW, I could not reproduce any such problem here. [...]
I was using Red Hat 8.0
with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.
Tom, do you remember what type of FS you used?
That machine is using ext3.
regards, tom lane