Multicolumn index corruption on 8.4 beta 2

Started by Floris Bos / Maxnetalmost 17 years ago21 messageshackers
Jump to latest
#1Floris Bos / Maxnet
bos@je-eigen-domein.nl

Hi,

I pgdump'ed a 8.3.7 database and loaded the dump to a different server
running PostgreSQL 8.4 beta 2 (compiled from source) under Opensolaris.

One of the tables has about 6 million records, and a Btree index that
spans 3 columns.

I am having the problem that some queries are unable to find rows when
using the index.
When I force a sequential scan, by doing "set enable_indexscan=false;
set enable_bitmapscan=false;", the same queries work fine.

In addition, while running "vacuum full analyze" I got the following
error a couple times:

==
ERROR: failed to re-find parent key in index "pgb_idx" for deletion
target page 25470
===

Doing "reindex" or dropping and creating the index, makes the error go
away for a while.
However it does not solve the problem of the missing rows, making me
believe the index Postgresql generates is still corrupt.

According to memtest the memory of the server is fine, and according to
"zpool status" there are no disk or ZFS checksum errors.

Any idea how to solve or debug this issue?

Yours sincerely,

Floris Bos

#2Richard Huxton
dev@archonet.com
In reply to: Floris Bos / Maxnet (#1)
Re: Multicolumn index corruption on 8.4 beta 2

Floris Bos / Maxnet wrote:

I am having the problem that some queries are unable to find rows when
using the index.
When I force a sequential scan, by doing "set enable_indexscan=false;
set enable_bitmapscan=false;", the same queries work fine.

Not a hacker myself, but I can tell you that the first question you'll
be asked is "can you produce a test case"? If you can generate the
problem from a test table+generated data that will let people figure out
the problem for you.

If not, details of the table schema will be needed, and is there any
pattern to the missed rows? Also - compile settings, character set and
locale details might be relevant too.

--
Richard Huxton
Archonet Ltd

#3Bruce Momjian
bruce@momjian.us
In reply to: Richard Huxton (#2)
Re: Multicolumn index corruption on 8.4 beta 2

And can you post an explain plan for the incorrect scan? In particular
is it using a bitmap index scan or a regular index scan? Or does it
happen with either?

--
Greg

On 9 Jun 2009, at 09:43, Richard Huxton <dev@archonet.com> wrote:

Show quoted text

Floris Bos / Maxnet wrote:

I am having the problem that some queries are unable to find rows
when using the index.
When I force a sequential scan, by doing "set
enable_indexscan=false; set enable_bitmapscan=false;", the same
queries work fine.

Not a hacker myself, but I can tell you that the first question
you'll be asked is "can you produce a test case"? If you can
generate the problem from a test table+generated data that will let
people figure out the problem for you.

If not, details of the table schema will be needed, and is there any
pattern to the missed rows? Also - compile settings, character set
and locale details might be relevant too.

--
Richard Huxton
Archonet Ltd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Floris Bos / Maxnet
bos@je-eigen-domein.nl
In reply to: Richard Huxton (#2)
Re: Multicolumn index corruption on 8.4 beta 2

Hi,

Richard Huxton wrote:

Not a hacker myself, but I can tell you that the first question you'll
be asked is "can you produce a test case"? If you can generate the
problem from a test table+generated data that will let people figure out
the problem for you.

Unfortunately, I have not been able to produce a test case (yet) on a
small data set.
While the data in the database is public information, the whole database
is about 100 GB, and therefore kinda hard to share.

If not, details of the table schema will be needed, and is there any
pattern to the missed rows? Also - compile settings, character set and
locale details might be relevant too.

==
Compile settings
==

No fancy settings.
- Clean Opensolaris 2009.06 installation
- Installed gcc and gmake packages.
- Downloaded source and did a ./configure --disable-readline ; gmake ;
gmake install

==
Postgresql settings
==

The following settings differ from the defaults:

--
shared_buffers=3500MB
maintenance_work_mem = 128MB
fsync = off
synchronous_commit = off
checkpoint_segments = 25
--

The locale used when creating the database is SQL_ASCII

==
Hardware
==

Tyan barebone
2x Opteron 2376 quadcore
32 GB reg ecc memory
1x Intel X25-E 32 GB SSD for OS and pg_xlog directory
2x Intel X25-E 64 GB SSD (ZFS striping) for the database

==
Table layout
==

--
Table "public.posts_index"
Column | Type |
Modifiers
------------+------------------------+-----------------------------------------------------------
cid | integer | not null default
nextval('posts_index_cid
_seq'::regclass)
groupid | integer | not null
startdate | integer | not null
poster | character varying(64) | not null
basefile | character varying(64) | not null
subject | character varying(255) | not null
size | real |
nfo | boolean |
c | boolean |
parts | integer |
totalparts | integer |
imdb | integer |
ng1 | boolean | default false
g2 | integer | default 0
g3 | integer | default 0
data | bytea |
Indexes:
"posts_index5_pkey" PRIMARY KEY, btree (cid) CLUSTER
"gr_idx" btree (groupid, (- cid))
"pgb_idx" btree (poster, groupid, basefile)
--

Only noticed problems with the pgb_idx index so far.

The problem only occurs on a subset of the rows, at a time.
After adding/updating rows and doing a reindex, the rows that were
missing before sometimes suddenly do work, but then different ones do not.

And can you post an explain plan for the incorrect scan? In particular is it using a bitmap index scan or a regular index scan? Or does it happen with either?

Happens with both.

Index scan:

===
=> explain SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11.25..11.26 rows=1 width=0)
-> Index Scan using pgb_idx on posts_index (cost=0.00..11.25
rows=1 width=0)
Index Cond: (((poster)::text = 'Yenc@power-post.org
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text =
'NIB8124849'::text))

=> SELECT count(*) FROM posts_index WHERE poster='Yenc@power-post.org
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849';
count
-------
0
===

When I disable index scan, it uses bitmap without luck:

==
=> set enable_indexscan=false;
SET
=> explain SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11.26..11.27 rows=1 width=0)
-> Bitmap Heap Scan on posts_index (cost=7.24..11.26 rows=1 width=0)
Recheck Cond: (((poster)::text = 'Yenc@power-post.org
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text =
'NIB8124849'::text))
-> Bitmap Index Scan on pgb_idx (cost=0.00..7.24 rows=1 width=0)
Index Cond: (((poster)::text = 'Yenc@power-post.org
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text =
'NIB8124849'::text))

=> SELECT count(*) FROM posts_index WHERE poster='Yenc@power-post.org
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849';
count
-------
0
==

Sequential scan does find the row:

==
=> set enable_indexscan=false;
SET
=> set enable_bitmapscan=false;
SET
=> explain SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=288153.28..288153.29 rows=1 width=0)
-> Seq Scan on posts_index (cost=0.00..288153.28 rows=1 width=0)
Filter: (((poster)::text = 'Yenc@power-post.org
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text =
'NIB8124849'::text))
(3 rows)

=> SELECT count(*) FROM posts_index WHERE poster='Yenc@power-post.org
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849';
count
-------
1
==

Yours sincerely,

Floris Bos

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Floris Bos / Maxnet (#4)
Re: Multicolumn index corruption on 8.4 beta 2

Floris Bos / Maxnet <bos@je-eigen-domein.nl> writes:

Richard Huxton wrote:

Not a hacker myself, but I can tell you that the first question you'll
be asked is "can you produce a test case"? If you can generate the
problem from a test table+generated data that will let people figure out
the problem for you.

Unfortunately, I have not been able to produce a test case (yet) on a
small data set.
While the data in the database is public information, the whole database
is about 100 GB, and therefore kinda hard to share.

Seems like we'd only need a dump of the one problem table, not the
entire database.

regards, tom lane

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Floris Bos / Maxnet (#4)
Re: Multicolumn index corruption on 8.4 beta 2

On Tue, 2009-06-09 at 13:40 +0200, Floris Bos / Maxnet wrote:

fsync = off

That's a bad plan if you care about your database.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#7Florian Weimer
fweimer@bfk.de
In reply to: Simon Riggs (#6)
Re: Multicolumn index corruption on 8.4 beta 2

* Simon Riggs:

On Tue, 2009-06-09 at 13:40 +0200, Floris Bos / Maxnet wrote:

fsync = off

That's a bad plan if you care about your database.

It shouldn't introduce this type of corruption, though.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

#8Floris Bos / Maxnet
bos@je-eigen-domein.nl
In reply to: Simon Riggs (#6)
Re: Multicolumn index corruption on 8.4 beta 2

Hi,

Simon Riggs wrote:

On Tue, 2009-06-09 at 13:40 +0200, Floris Bos / Maxnet wrote:

fsync = off

That's a bad plan if you care about your database.

I am aware of the risk of dataloss in case of power failure, etc.

However fsync=on is simply too slow for my purpose, and it concerns data
that can be regenerated from its source.

The website this setup is for has been running various previous versions
of PostgreSQL with fsync=off since 2005.
So I still expect it to work.

Yours sincerely,

Floris Bos

#9Bruce Momjian
bruce@momjian.us
In reply to: Florian Weimer (#7)
Re: Multicolumn index corruption on 8.4 beta 2

Well sure it could -- once. It wouldn't be reproducible in a freshly
rebuilt index unless he's crashing his machine every time.

--
Greg

On 9 Jun 2009, at 17:12, Florian Weimer <fweimer@bfk.de> wrote:

Show quoted text

* Simon Riggs:

On Tue, 2009-06-09 at 13:40 +0200, Floris Bos / Maxnet wrote:

fsync = off

That's a bad plan if you care about your database.

It shouldn't introduce this type of corruption, though.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Josh Berkus
josh@agliodbs.com
In reply to: Floris Bos / Maxnet (#8)
Re: Multicolumn index corruption on 8.4 beta 2

Floris,

The website this setup is for has been running various previous versions
of PostgreSQL with fsync=off since 2005.
So I still expect it to work.

You've been lucky, that's all.

Our documentation has been clear, back to version 7.0, that turning
fsync=off carries the risk that you will have to recreate your entire
database in the event of unexpected shutdown. That's not new.

So, the operative question is: was 8.4 shut down with -immediate or
otherwise unexpectedly? If so, then we don't have a bug. If 8.4 was
never shut down, then we have some strange behavior which bears looking
into. And you've found a wierd corner case, which is what we count on
our users for.

Thanks for testing.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#11Florian Weimer
fweimer@bfk.de
In reply to: Josh Berkus (#10)
Re: Multicolumn index corruption on 8.4 beta 2

* Josh Berkus:

Our documentation has been clear, back to version 7.0, that turning
fsync=off carries the risk that you will have to recreate your entire
database in the event of unexpected shutdown. That's not new.

The documentation does not say this. Instead, there's the following
rather explicit explanation that only OS crashes matter:

| (Crashes of the database software itself are not a risk factor
| here. Only an operating-system-level crash creates a risk of
| corruption.)

If it really matters how PostgreSQL is shut down in "fsync = off" mode
(while the operating system keeps running), the documentation is
seriously wrong here.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Florian Weimer (#11)
Re: Multicolumn index corruption on 8.4 beta 2

Florian Weimer wrote:

* Josh Berkus:

Our documentation has been clear, back to version 7.0, that turning
fsync=off carries the risk that you will have to recreate your entire
database in the event of unexpected shutdown. That's not new.

The documentation does not say this. Instead, there's the following
rather explicit explanation that only OS crashes matter:

| (Crashes of the database software itself are not a risk factor
| here. Only an operating-system-level crash creates a risk of
| corruption.)

If it really matters how PostgreSQL is shut down in "fsync = off" mode
(while the operating system keeps running), the documentation is
seriously wrong here.

Yeah, AFAICT the writes are handed off to the operating system (just not
synced), so if it flushes its caches sanely at all there shouldn't be a
problem.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#13Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Alvaro Herrera (#12)
Re: Multicolumn index corruption on 8.4 beta 2

Alvaro Herrera <alvherre@commandprompt.com> wrote:

Yeah, AFAICT the writes are handed off to the operating system (just
not synced), so if it flushes its caches sanely at all there
shouldn't be a problem.

I would certainly *hope* that's the case. We sometimes use fsync=off
for conversions, where we plan to just start over if the conversion
crashes, and set it to on when the conversion is done. It would be
disturbing to discover that fsync=off also means "don't bother to
write dirty buffers to the OS before shutdown."

-Kevin

#14Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#13)
Re: Multicolumn index corruption on 8.4 beta 2

Alvaro, Kevin,

Yeah, AFAICT the writes are handed off to the operating system (just
not synced), so if it flushes its caches sanely at all there
shouldn't be a problem.

I would certainly *hope* that's the case. We sometimes use fsync=off
for conversions, where we plan to just start over if the conversion
crashes, and set it to on when the conversion is done. It would be
disturbing to discover that fsync=off also means "don't bother to
write dirty buffers to the OS before shutdown."

It doesn't. But what I don't trust, and the *first* place I'd look for
problems, is whether the OS flushes *all* dirty buffers to disk in the
event the application gets killed.

That's why I want more information on Floris' case. Was 8.4 killed or
shut down with -m immediate? Or the os rebooted with 8.4 running?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#14)
Re: Multicolumn index corruption on 8.4 beta 2

Josh Berkus <josh@agliodbs.com> writes:

It doesn't. But what I don't trust, and the *first* place I'd look for
problems, is whether the OS flushes *all* dirty buffers to disk in the
event the application gets killed.

Why wouldn't you trust it? The sort of thing you seem to be thinking
about would require tracking which process(es) wrote each dirty buffer
and then going back and dropping selected dirty buffers when a process
exits abnormally. I can hardly imagine any OS wishing to do that.

regards, tom lane

#16Floris Bos / Maxnet
bos@je-eigen-domein.nl
In reply to: Josh Berkus (#14)
Re: Multicolumn index corruption on 8.4 beta 2

Hi,

Josh Berkus wrote:

It doesn't. But what I don't trust, and the *first* place I'd look for
problems, is whether the OS flushes *all* dirty buffers to disk in the
event the application gets killed.

That's why I want more information on Floris' case. Was 8.4 killed or
shut down with -m immediate? Or the os rebooted with 8.4 running?

The only reboots I have done on that server were with the "reboot"
system command, which should send a SIGTERM to all processes first
including PostgreSQL, before pulling the plug.

I do recall that during the execution of "vacuum full" the psql client
program once did report that it lost connection with the server, but was
able to reconnect. Maybe the server processes handling the connection
died then, but I am not sure of that, and it only happened once.

Anyway, the problem also occurs when there is no reboot or unexpected
event between the reindex and the query.

After a REINDEX it is able to find the row it was missing first, but
then other rows become missing.

All in the same psql session:

===
usenet=> SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849';
count
-------
0
(1 row)

usenet=> reindex index pgb_idx;
REINDEX
usenet=> SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849';
count
-------
1
(1 row)

usenet=> SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND basefile='frx-fffe' AND
groupid=757;
count
-------
0
(1 row)

usenet=> set enable_indexscan=false;
SET
usenet=> set enable_bitmapscan=false;
SET
usenet=> SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND basefile='frx-fffe' AND
groupid=757;
count
-------
1
(1 row)

===

Yours sincerely,

Floris Bos

#17Andy Colson
andy@squeakycode.net
In reply to: Floris Bos / Maxnet (#4)
Re: Multicolumn index corruption on 8.4 beta 2

Floris Bos / Maxnet wrote:

The following settings differ from the defaults:

--
shared_buffers=3500MB
maintenance_work_mem = 128MB
fsync = off
synchronous_commit = off
checkpoint_segments = 25
--

==
Table layout
==

--
Table "public.posts_index"
Column | Type | Modifiers
------------+------------------------+-----------------------------------------------------------

cid | integer | not null default
nextval('posts_index_cid
_seq'::regclass)
groupid | integer | not null
startdate | integer | not null
poster | character varying(64) | not null
basefile | character varying(64) | not null
subject | character varying(255) | not null
size | real |
nfo | boolean |
c | boolean |
parts | integer |
totalparts | integer |
imdb | integer |
ng1 | boolean | default false
g2 | integer | default 0
g3 | integer | default 0
data | bytea |
Indexes:
"posts_index5_pkey" PRIMARY KEY, btree (cid) CLUSTER
"gr_idx" btree (groupid, (- cid))
"pgb_idx" btree (poster, groupid, basefile)
--

Only noticed problems with the pgb_idx index so far.

I have been trying to reproduce the problem but no success so far. I
made myself a table that matches yours, then I wrote a little perl
script to fill it with random data. (The script also writes out a text
file I can use to re-query things).

I fill the db, then add the indexes. Then I test lookup every record I
added, and find them all.

So, a few questions:

1) did you dump/restore into 8.4beta1 first and then upgrade the
program? Or did you dump/restore into 8.4beta2?

2) did you use any of the concurrent restore options?

3) do you do any updates or deletes to the table after you restore it?

4) do you do any other operations on the table (vacuum, cluster, etc..)?

5) got any triggers or stored procs?

6) To the -hackers: I write the records and then refind them in the
exact same order, would it be a better test to search for records in a
more random order? would it make a difference? Would searching for
some but not all make a difference?

-Andy

#18Josh Berkus
josh@agliodbs.com
In reply to: Andy Colson (#17)
Re: Multicolumn index corruption on 8.4 beta 2

Andy,

6) To the -hackers: I write the records and then refind them in the
exact same order, would it be a better test to search for records in a
more random order? would it make a difference? Would searching for some
but not all make a difference?

Are you on OpenSolaris? Can you give your script to Zdenek & Jignesh to
test in their enviroments?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#19Andy Colson
andy@squeakycode.net
In reply to: Josh Berkus (#18)
Re: Multicolumn index corruption on 8.4 beta 2

Josh Berkus wrote:

Andy,

6) To the -hackers: I write the records and then refind them in the
exact same order, would it be a better test to search for records in a
more random order? would it make a difference? Would searching for some
but not all make a difference?

Are you on OpenSolaris? Can you give your script to Zdenek & Jignesh to
test in their enviroments?

I am not, and yes I can.

Hopefully I can attach a .tar.bz2

The bigtest.sh is the one to run (it runs all the parts). You'll need
to edit fill.pl and test.pl and set the dbname and maybe give a
username/password.

In the fill.pl there is a $max variable that's used to set the number of
records to insert. (its set to 10 million right now)

Oh, this .tar wont create a subdirectory

-Andy

Attachments:

test.tar.bz2application/octet-stream; name=test.tar.bz2Download
#20Josh Berkus
josh@agliodbs.com
In reply to: Andy Colson (#19)
Re: Multicolumn index corruption on 8.4 beta 2

Floris,

One more question set: what version of OpenSolaris, and what filesystem
are you using? Does the OS have any non-default tuning settings? How
did you install or compile PostgreSQL?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Floris Bos / Maxnet (#1)