Corrupt data pages...

Started by Kevin Brown · over 19 years ago · 6 messages

#1 Kevin Brown
kevin@sysexperts.com

Apologies if my previous attempts to post this to the mailing list
have actually succeeded, but I've seen no evidence of that...

While doing some bugzilla testing, I ran into a data page corruption
issue.

The symptom was the usual "could not access status of transaction
<bignum>". I tracked it down via the usual pg_filedump method and
found what appears to my untrained eye to be data tuples from a
different table residing in the (now corrupt) table in question.

First, the relevant environmental details:

System:
- ASUS A8V motherboard
- Athlon 64 3200
- 2G ECC memory (which is enabled. The kernel will panic in the
face of an uncorrectable ECC error).
- 2 250G Seagate ST3250823AS rev 3.03 in a mirrored configuration
via Linux software RAID.

OS: Debian Linux x86_64, 'unstable' branch. 'uname -a' yields:
Linux kbrown-desktop 2.6.15-1-amd64-generic #2 Wed Jan 4 06:22:42
CET 2006 x86_64 GNU/Linux

Filesystem: XFS with the default mount options.

PostgreSQL:
- version: 8.1.3.
- The package is postgresql-8.1 version 8.1.3-2.
- Autovacuum is enabled. I'm attaching my postgresql.conf.

The context:

I'm performing testing in preparation for migrating from a different
bug tracking system to bugzilla. Most of this testing revolves around
the migration tool, which reads directly from the other bug tracking
system's database and inserts into the bugzilla database. The
bugzilla database is hosted on PostgreSQL and has a lot of additional
foreign keys that bugzilla does not have by default. All of the
constraints are DEFERRABLE and INITIALLY DEFERRED.

A test run consists of dropping all the constraints and all the
tables, recreating all the tables and constraints, and importing a
base set of data. The commands for doing all that were created via
pg_dump. They are all executed in a single transaction and then
committed.

The migration run itself also executes in a single transaction and
involves a savepoint prior to every insertion group (the name of the
savepoint is based on the name of the operation, so the total number
of outstanding savepoints at any one time is perhaps 4 or 5). Each
migration run ends up executing perhaps 300,000 inserts.

The meat of the problem:

After the last run, a later select from one of the tables yielded the
error. I narrowed it down to a single table (series_data) and tracked
down the offending row via a binary search. Here are the end results:

stagezilla=# select * from series_data;
ERROR: could not access status of transaction 1685201011
DETAIL: could not open file "pg_clog/0647": No such file or directory
stagezilla=# select * from series_data offset 9927 limit 1;
 series_id |     series_date     | series_value
-----------+---------------------+--------------
        46 | 2006-01-06 00:00:00 |            0
(1 row)

stagezilla=# select * from series_data offset 9928 limit 1;
ERROR: could not access status of transaction 1685201011
DETAIL: could not open file "pg_clog/0647": No such file or directory
stagezilla=# select ctid from series_data offset 9927 limit 1;
ctid
----------
(72,136)
(1 row)

stagezilla=# select oid from pg_database where datname = 'stagezilla';
oid
-------
20518
(1 row)

stagezilla=# select oid from pg_class where relname = 'series_data';
oid
-------
53294
(1 row)
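
The two oids above, plus the block number from the ctid, are enough to find the suspect page on disk. A minimal sketch of that arithmetic, assuming the default 8 KB block size and that relfilenode equals the pg_class oid (true for a freshly created table, but worth confirming against pg_class.relfilenode):

```python
# Locate the on-disk block holding the bad tuple (default 8 KB block size assumed).
BLCKSZ = 8192

db_oid = 20518            # pg_database.oid for 'stagezilla'
relfilenode = 53294       # assumed equal to the pg_class oid for series_data
block, line_ptr = 72, 136 # from ctid (72,136): block number, line pointer

path = f"base/{db_oid}/{relfilenode}"  # relative to $PGDATA
offset = block * BLCKSZ                # byte offset of the page within that file

print(path, offset)
```

With that in hand, something like `pg_filedump -i -f -R 72 $PGDATA/base/20518/53294` should dump just the suspect page.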

stagezilla=# \d series_data
Table "public.series_data"
Column | Type | Modifiers
--------------+--------------------------------+-----------
series_id | integer | not null
series_date | timestamp(0) without time zone | not null
series_value | integer | not null
Indexes:
"series_data_series_id_idx" UNIQUE, btree (series_id, series_date)
Foreign-key constraints:
"series_data_series_id_series_series_id_fk" FOREIGN KEY
(series_id) REFERENCES series(series_id) ON UPDATE
CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
DEFERRED

The only file in pg_clog is 0000.
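
The segment name in the error follows directly from the xid, assuming 8.1's clog layout (2 status bits per transaction, so 4 transactions per byte; 8 KB pages; 32 pages per SLRU segment; segment files named with %04X). A quick sketch:

```python
# Which pg_clog segment would hold transaction 1685201011?
XACTS_PER_PAGE = 8192 * 4                               # 2 status bits each -> 32768 per 8 KB page
XACTS_PER_SEGMENT = XACTS_PER_PAGE * 32                 # 32 pages per segment -> 1,048,576

xid = 1685201011
segment = f"{xid // XACTS_PER_SEGMENT:04X}"             # clog files are named in hex

print(segment)  # '0647' -- far beyond the only real segment, 0000
```

The xid is also suspicious in itself: 1685201011 is 0x64722073, four printable ASCII bytes, which fits the theory that text data was being read where a tuple header should be.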

After examining the output of pg_filedump, it became obvious this was
a corrupt page issue. What bothers me is the way in which it's
corrupt. The corrupt data looks suspiciously like data from a
different table, or perhaps from an index. In this case, from the
short_desc field of the bugs table, which has a datatype of 'text' (I
presume this means it's TOASTed?). The text data in the corrupt page
exactly matches entries in that field of the bugs table.

So: because the hardware is relatively new and is using components
that are as reliable as I can make them given my somewhat limited
budget (ECC memory, mirrored SATA disks), I have to consider the
possibility that there's a bug here.

Note that on the amd64 architecture, gcc (or some versions thereof)
seems to have a code generation bug that can be eliminated by using
the -fno-strict-aliasing switch. The version of gcc that seems to be
in current use is 4.0.3. It's possible that this is causing problems,
but to be honest this is the first time I've ever seen this happen.

If you guys have any suggestions on what I might do to help track this
one down, please let me know. I tarred up the entire 'stagezilla'
database after shutting down PostgreSQL so that I can do additional
examination of the datafiles if that proves useful.

Additionally, if you'd like to see the output of pg_filedump, I'll be
happy to post it here (or email it separately). I hesitate to attach
it here because it's not clear the mailing list's spam filtering would
let it pass.

--
Kevin Brown kevin@sysexperts.com

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Brown (#1)
Re: Corrupt data pages...

Kevin Brown <kevin@sysexperts.com> writes:

After examining the output of pg_filedump, it became obvious this was
a corrupt page issue. What bothers me is the way in which it's
corrupt. The corrupt data looks suspiciously like data from a
different table, or perhaps from an index. In this case, from the
short_desc field of the bugs table, which has a datatype of 'text' (I
presume this means it's TOASTed?). The text data in the corrupt page
exactly matches entries in that field of the bugs table.

Substitution of a page from a different table seems unlikely to result
in this error, as the tuple header layout is the same for all tables.
A page from an index, though, maybe. Does the suspicious looking page
have any pd_special space?
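
Checking pd_special is mechanical given the 8.1-era page header layout (assumed here: pd_lsn, 8 bytes; then pd_tli, pd_lower, pd_upper, pd_special, pd_pagesize_version, 2 bytes each). A heap page has no special space, so its pd_special equals BLCKSZ; a btree page reserves opaque space at the end, so pd_special is smaller. A sketch against synthetic headers (not real pages):

```python
import struct

BLCKSZ = 8192

def pd_special(page: bytes) -> int:
    # Unpack the assumed 8.1 header: pd_lsn (Q), then five 2-byte fields (5H).
    _lsn, _tli, _lower, _upper, special, _ver = struct.unpack_from("<Q5H", page, 0)
    return special

# Synthetic headers for illustration; 8195 = 8192 | layout version 3.
heap_hdr = struct.pack("<Q5H", 0, 1, 24, 8000, 8192, 8195) + b"\x00" * (BLCKSZ - 18)
btree_hdr = struct.pack("<Q5H", 0, 1, 24, 8000, 8176, 8195) + b"\x00" * (BLCKSZ - 18)

print(pd_special(heap_hdr))   # 8192: no special space, heap-like
print(pd_special(btree_hdr))  # 8176: special space present, index-like
```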

We have in the past seen data-corruption errors that were clearly due to
substitution of a chunk of entirely non-Postgres data for a table page.
I suppose that those are symptoms of either kernel or device misfeasance
... it's hard to see how Postgres could cause that to happen. We've not
seen enough of them to detect any pattern though. I rather wonder if
what you've seen is the same thing but the substituted data happens to
be from another Postgres file.

Additionally, if you'd like to see the output of pg_filedump, I'll be
happy to post it here (or email it separately). I hesitate to attach
it here because it's not clear the mailing list's spam filtering would
let it pass.

You should at least show the page you think is corrupt.

regards, tom lane

#3 Kevin Brown
kevin@sysexperts.com
In reply to: Tom Lane (#2)
Re: Corrupt data pages...

Tom Lane wrote:

You should at least show the page you think is corrupt.

I attempted to send this additional info to the list but I think the
message got dropped on the floor by the mailing list software or by
the spam filter.

I'll put the files on a web server and post links to them here.

--
Kevin Brown kevin@sysexperts.com

#4 Kevin Brown
kevin@sysexperts.com
In reply to: Kevin Brown (#3)
Re: Corrupt data pages...

I wrote:

I attempted to send this additional info to the list but I think the
message got dropped on the floor by the mailing list software or by
the spam filter.

I'll put the files on a web server and post links to them here.

You can find them here:

https://gazebo.sysexperts.com/~kevin/postgresql

The files are bad-page-info.txt and bad-page.txt.

--
Kevin Brown kevin@sysexperts.com

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Brown (#4)
Re: Corrupt data pages...

Kevin Brown <kevin@sysexperts.com> writes:

I'll put the files on a web server and post links to them here.

You can find them here:
https://gazebo.sysexperts.com/~kevin/postgresql

AFAICT, the first half of page 73 is OK, but the second half clearly is
trashed. In the raw-format dump it does look a whole lot like the
second half has been replaced by the first half of a page of some index.

It's fairly hard to see how that could happen inside Postgres. One can
readily imagine bugs that might replace one whole page with another,
but there aren't any operations that manipulate half-a-page. On the
other hand, if your kernel uses 4K blocksize, this could be explained
as substitution of one buffer for another at the kernel level. So my
money is on a kernel bug. As I mentioned, we've once or twice before
seen reports that looked like similar substitutions of single pages by
the kernel.

regards, tom lane

#6 Kevin Brown
kevin@sysexperts.com
In reply to: Tom Lane (#5)
Re: Corrupt data pages...

Tom Lane wrote:

It's fairly hard to see how that could happen inside Postgres. One can
readily imagine bugs that might replace one whole page with another,
but there aren't any operations that manipulate half-a-page. On the
other hand, if your kernel uses 4K blocksize, this could be explained
as substitution of one buffer for another at the kernel level. So my
money is on a kernel bug. As I mentioned, we've once or twice before
seen reports that looked like similar substitutions of single pages by
the kernel.

Yeah, I agree that's what's likely to have happened here. The
kernel's page size is 4k, as is the default block size used by XFS.
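
The half-page pattern is exactly what a kernel-level buffer mixup would produce: an 8 KB Postgres page spans two 4 KB kernel/filesystem blocks, so swapping one buffer replaces exactly half a page, with the damage starting on a 4 KB boundary. A simulated sketch (stand-in byte patterns, not real pages):

```python
PAGE, KBLK = 8192, 4096  # Postgres page size; kernel/XFS block size

heap_page = bytes([0xAA]) * PAGE   # stand-in for the real heap page
index_page = bytes([0xBB]) * PAGE  # stand-in for some index page

# Simulated corruption: the second kernel block of the heap page is
# replaced by the first kernel block of the index page.
corrupt = heap_page[:KBLK] + index_page[:KBLK]

# Find the first byte where the corrupt page diverges from the original.
diverge = next(i for i in range(PAGE) if corrupt[i] != heap_page[i])
print(diverge)  # 4096: the damage starts exactly on a kernel block boundary
```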

Tracking this one down any further is going to be nigh unto
impossible, I think.

There have been a pile of fixes to XFS in 2.6.16, so I'll start using
that, at least...

Thanks for looking at this.

--
Kevin Brown kevin@sysexperts.com