Multiple table entries?

Started by Jeff Rossover 16 years ago26 messagesgeneral
Jump to latest
#1Jeff Ross
jross@wykids.org

Hi,

I recently upgraded to 8.4 and everything went great. All databases are
working as they are supposed to, no problems seen.

Today, however, I did a \d on a database and was surprised to see sets
of 5 identical table entries for each one that is supposed to be there.
Here's the databases in question (used to store apache web logs).
Several other of my databases (but not all) show the same 5 entries for
each real table symptom.

_postgresql@heinlein:/backup/postgresql $ psql pglogd
psql (8.4.0)
Type "help" for help.

pglogd=# \d++
List of relations
Schema | Name | Type | Owner | Size | Description
--------+--------------+-------+-------------+---------+-------------
public | full_entries | table | _postgresql | 528 kB |
public | full_entries | table | _postgresql | 528 kB |
public | full_entries | table | _postgresql | 528 kB |
public | full_entries | table | _postgresql | 528 kB |
public | full_entries | table | _postgresql | 528 kB |
public | full_temp | table | jross | 1904 kB |
public | full_temp | table | jross | 1904 kB |
public | full_temp | table | jross | 1904 kB |
public | full_temp | table | jross | 1904 kB |
public | full_temp | table | jross | 1904 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | total_hits | table | _postgresql | 16 kB |
public | total_hits | table | _postgresql | 16 kB |
public | total_hits | table | _postgresql | 16 kB |
public | total_hits | table | _postgresql | 16 kB |
public | total_hits | table | _postgresql | 16 kB |
(25 rows)

\l correctly lists the individual databases, and when I took a look at
yesterday's dump file I only see one create table statement, not 5.

I browsed through the system catalogs but haven't found anything yet
that can shine some light on this.

System is running OpenBSD -current.

Thanks,

Jeff Ross

#2Bruce Momjian
bruce@momjian.us
In reply to: Jeff Ross (#1)
Re: Multiple table entries?

On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote:

Hi,

I recently upgraded to 8.4 and everything went great.  All databases are
working as they are supposed to, no problems seen.

Today, however, I did a \d on a database and was surprised to see sets of 5
identical table entries for each one that is supposed to be there.

Ugh.

What does

select xmin,xmax,ctid,oid,* from pg_class

return?

--
greg
http://mit.edu/~gsstark/resume.pdf

#3Bruce Momjian
bruce@momjian.us
In reply to: Jeff Ross (#1)
Re: Multiple table entries?

On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote:

I browsed through the system catalogs but haven't found anything yet that
can shine some light on this.

Actually, I wonder if this isn't more likely to show the problem -- it
would explain why *all* your tables are showing up with duplicates
rather than just one.

select xmin,xmax,ctid,oid,* from pg_namespace

--
greg
http://mit.edu/~gsstark/resume.pdf

#4Jeff Ross
jross@wykids.org
In reply to: Bruce Momjian (#2)
Re: Multiple table entries?

Greg Stark wrote:

On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote:

Hi,

I recently upgraded to 8.4 and everything went great. �All databases are
working as they are supposed to, no problems seen.

Today, however, I did a \d on a database and was surprised to see sets of 5
identical table entries for each one that is supposed to be there.

Ugh.

What does

select xmin,xmax,ctid,oid,* from pg_class

return?

216 rows worth, but I only see one entry per table, not 5.

For easier reading, I put the output at

http://openvistas.net/pg_class_query.html

Thanks!

Jeff

#5Jeff Ross
jross@wykids.org
In reply to: Bruce Momjian (#3)
Re: Multiple table entries?

Greg Stark wrote:

On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote:

I browsed through the system catalogs but haven't found anything yet that
can shine some light on this.

Actually, I wonder if this isn't more likely to show the problem -- it
would explain why *all* your tables are showing up with duplicates
rather than just one.

select xmin,xmax,ctid,oid,* from pg_namespace

http://openvistas.net/pg_namespace_query.html

Jeff

#6Bruce Momjian
bruce@momjian.us
In reply to: Jeff Ross (#5)
Re: Multiple table entries?

On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote:

Greg Stark wrote:

Actually, I wonder if this isn't more likely to show the problem -- it
would explain why *all* your tables are showing up with duplicates
rather than just one.

select xmin,xmax,ctid,oid,* from pg_namespace

http://openvistas.net/pg_namespace_query.html

Yeah, that's a problem. Would you be able to load the pageinspect
contrib module and run a query?

select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x;

--
greg
http://mit.edu/~gsstark/resume.pdf

#7Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#6)
Re: Multiple table entries?

On Sun, Aug 23, 2009 at 4:40 AM, Greg Stark<gsstark@mit.edu> wrote:

On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote:

Greg Stark wrote:

Yeah, that's a problem. Would you be able to load the pageinspect
contrib module and run a query?

select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x;

Also, do you have the WAL log files going back to the database
creation? If so *please* preserve them! You're the second person to
report similar symptoms so it's starting to look like there may be a
serious bug here. And the nature of the problem is such that I think
it may require someone sifting through the xlog WAL files to see what
sequence of events happened.

The wal files are in a subdirection of the database root called
pg_xlog. If you still have one named 000000010000000000000000 then you
have all of them. Please, if possible, copy them to a backup
directory.

--
greg
http://mit.edu/~gsstark/resume.pdf

#8Jeff Ross
jross@wykids.org
In reply to: Bruce Momjian (#6)
Re: Multiple table entries?

Greg Stark wrote:

On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote:

Greg Stark wrote:

Actually, I wonder if this isn't more likely to show the problem -- it
would explain why *all* your tables are showing up with duplicates
rather than just one.

select xmin,xmax,ctid,oid,* from pg_namespace

http://openvistas.net/pg_namespace_query.html

Yeah, that's a problem. Would you be able to load the pageinspect
contrib module and run a query?

select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x;

I had to modify your query slightly to make it run--hope I got what you
are after!

select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h) as x;

http://www.openvistas.net/pageinspect.html

Thanks,

Jeff

#9Bruce Momjian
bruce@momjian.us
In reply to: Jeff Ross (#8)
Re: Multiple table entries?

On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote:

I had to modify your query slightly to make it run--hope I got what you are
after!

select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h) as x;

http://www.openvistas.net/pageinspect.html

Incidentally, may as well ask the usual questions:

Did you upgrade this via pg_dump/pg_restore or via pg_migrator?
Are you running with fsync=off?
Are you running with synchronous_commit=off?
Have you had any system crashes?
Have you had any Postgres crashes?

Of the above only fsync=off combined with a system crash ought to be
unsafe, but it could be good to know which other events have occurred
to know which code paths have been exercised.

Do you still have pg_clog/0000 (that's kept longer than the xlog so I
would expect you would), how large is it? could you send that file? It
has a record of which transactions have committed or aborted, no
private data.

--
greg
http://mit.edu/~gsstark/resume.pdf

#10Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#9)
Re: Multiple table entries?

On Sun, Aug 23, 2009 at 5:02 PM, Greg Stark<gsstark@mit.edu> wrote:

On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote:
Incidentally, may as well ask the usual questions:

And just for reference, what does pg_controldata print?

--
greg
http://mit.edu/~gsstark/resume.pdf

#11Jeff Ross
jross@wykids.org
In reply to: Bruce Momjian (#9)
Re: Multiple table entries?

Greg Stark wrote:

On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote:

I had to modify your query slightly to make it run--hope I got what you are
after!

select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h) as x;

http://www.openvistas.net/pageinspect.html

Incidentally, may as well ask the usual questions:

Did you upgrade this via pg_dump/pg_restore or via pg_migrator?

pg_dump

Are you running with fsync=off?

No.

Are you running with synchronous_commit=off?

No.

Have you had any system crashes?

No.

Have you had any Postgres crashes?

No.

Of the above only fsync=off combined with a system crash ought to be
unsafe, but it could be good to know which other events have occurred
to know which code paths have been exercised.

Do you still have pg_clog/0000 (that's kept longer than the xlog so I
would expect you would), how large is it? could you send that file? It
has a record of which transactions have committed or aborted, no
private data.

pg_clog is 32K. I've put it at http://www.openvistas.net/pg_clog

Jeff

#12Jeff Ross
jross@wykids.org
In reply to: Bruce Momjian (#10)
Re: Multiple table entries?

Greg Stark wrote:

On Sun, Aug 23, 2009 at 5:02 PM, Greg Stark<gsstark@mit.edu> wrote:

On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote:
Incidentally, may as well ask the usual questions:

And just for reference, what does pg_controldata print?

_postgresql@heinlein:/var/postgresql $
pg_controldata
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5368117850282091454
Database cluster state: in production
pg_control last modified: Sun Aug 23 10:37:39 2009
Latest checkpoint location: 0/4203B43C
Prior checkpoint location: 0/4202F978
Latest checkpoint's REDO location: 0/4203B43C
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/116884
Latest checkpoint's NextOID: 866525
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint: Sun Aug 23 10:37:38 2009
Minimum recovery ending location: 0/0
Maximum data alignment: 4
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 2000
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by reference

#13Bruce Momjian
bruce@momjian.us
In reply to: Jeff Ross (#11)
Re: Multiple table entries?

On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote:

pg_clog is 32K.  I've put it at http://www.openvistas.net/pg_clog

Sorry, I'm getting a 404

--
greg
http://mit.edu/~gsstark/resume.pdf

#14Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#13)
Re: Multiple table entries?

On Sun, Aug 23, 2009 at 6:23 PM, Greg Stark<gsstark@mit.edu> wrote:

On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote:

pg_clog is 32K.  I've put it at http://www.openvistas.net/pg_clog

Sorry, I'm getting a 404

For what it's worth this is what the heap dump shows. I don't
understand why you're seeing these tuples show up if they're all
properly marked as updated and XMAX_COMMITTED.

The last tuple is marked strangely I think. I don't think it's
supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also,
I don't understand why it's marked as UPDATED if ctid and xmax aren't
set.

Could you also send
select (h).* from (select page_header(get_raw_page('pg_namespace',0))
as h) as x;

I'm wondering if the page allvisible flag is set. The visibility map
is one of the few 8.4 patches which impact transaction visibility.

lp lp_off flags lp_len t_xmin t_xmax field3 t_ctid t_info2 t_info
3 7616 1 141 2 897 0 (0,6) 16387 9482
HOT_UPDATED, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED

6 7340 1 129 897 898 0 (0,7) -16381 9482
HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED

7 7220 1 117 898 899 0 (0,9) -16381 9482
HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED

9 7088 1 129 899 900 0 (0,10) -16381 9482
HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED

10 6944 1 141 900 0 0 (0,10) -32765 10506
UPDATED, XMAX_INVALID, XMIN_COMMITTED

--
greg
http://mit.edu/~gsstark/resume.pdf

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#14)
Re: Multiple table entries?

Greg Stark <gsstark@mit.edu> writes:

The last tuple is marked strangely I think. I don't think it's
supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also,
I don't understand why it's marked as UPDATED if ctid and xmax aren't
set.

No, that all looks right to me. UPDATE sets HEAP_UPDATED on the newer
version of the row, not the older one. What looks interesting to me is
that the last update isn't marked HEAP_ONLY_TUPLE, ie, it's not in the
same HOT chain. Why is that I wonder ...

I'm wondering if the page allvisible flag is set. The visibility map
is one of the few 8.4 patches which impact transaction visibility.

But we don't use that while examining individual tuples, do we?

regards, tom lane

#16Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#15)
Re: Multiple table entries?

On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

The last tuple is marked strangely I think. I don't think it's
supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also,
I don't understand why it's marked as UPDATED if ctid and xmax aren't
set.

No, that all looks right to me.  UPDATE sets HEAP_UPDATED on the newer
version of the row, not the older one.  What looks interesting to me is
that the last update isn't marked HEAP_ONLY_TUPLE, ie, it's not in the
same HOT chain.  Why is that I wonder ...

I'm wondering if the page allvisible flag is set. The visibility map
is one of the few 8.4 patches which impact transaction visibility.

But we don't use that while examining individual tuples, do we?

We don't use the visibility map itself but we *do* use the page
header's all visible bit. On a sequential scan we skip the visibility
check for tuples on the page if the page header bit is set.

--
greg
http://mit.edu/~gsstark/resume.pdf

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
Re: Multiple table entries?

Greg Stark <gsstark@mit.edu> writes:

On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

But we don't use that while examining individual tuples, do we?

We don't use the visibility map itself but we *do* use the page
header's all visible bit. On a sequential scan we skip the visibility
check for tuples on the page if the page header bit is set.

Urgh. I bet that's where the problem is then. Some path is failing
to clear that bit, or maybe there's a race condition that allows it
to become set incorrectly (ie right after somebody else adds a tuple).

regards, tom lane

#18Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#17)
Re: Multiple table entries?

On Sun, Aug 23, 2009 at 7:34 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

But we don't use that while examining individual tuples, do we?

We don't use the visibility map itself but we *do* use the page
header's all visible bit. On a sequential scan we skip the visibility
check for tuples on the page if the page header bit is set.

Urgh.  I bet that's where the problem is then.  Some path is failing
to clear that bit, or maybe there's a race condition that allows it
to become set incorrectly (ie right after somebody else adds a tuple).

Well it only gets set in vacuum when we have the vacuum cleanout lock
so I can't see it getting set incorrectly unless there's a simple
logic error in vacuum. I just scanned it and it looks right to me.

I can imagine it failing to be unset. Or being unset but allowing a
race condition window before it's unset when it's wrong. That doesn't
explain why it the HEAP_ONLY_TUPLE flag would be missing though.

Also, vacuum logs a warning if the flag is found to be set and
shouldn't be. I suppose that could easily be missed though if it's
autovacuum that ran it. Also, the flag might fail to be cleared but
then be cleared later by another operation. I wonder how much damage
it can do while it's wrong.

--
greg
http://mit.edu/~gsstark/resume.pdf

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#18)
Re: Multiple table entries?

Greg Stark <gsstark@mit.edu> writes:

On Sun, Aug 23, 2009 at 7:34 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Urgh. I bet that's where the problem is then. Some path is failing
to clear that bit, or maybe there's a race condition that allows it
to become set incorrectly (ie right after somebody else adds a tuple).

Well it only gets set in vacuum when we have the vacuum cleanout lock
so I can't see it getting set incorrectly unless there's a simple
logic error in vacuum. I just scanned it and it looks right to me.

heap_update is broken. Details left as an exercise for the reader
(hint: Heikki has blatantly ignored the WAL-logging rules in transam/README)

regards, tom lane

#20Jeff Ross
jross@wykids.org
In reply to: Bruce Momjian (#13)
Re: Multiple table entries?

Greg Stark wrote:

On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote:

pg_clog is 32K. �I've put it at http://www.openvistas.net/pg_clog

Sorry, I'm getting a 404

Oops--fixed now.

Jeff

#21Jeff Ross
jross@wykids.org
In reply to: Tom Lane (#19)
#22Bruce Momjian
bruce@momjian.us
In reply to: Jeff Ross (#1)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Ross (#21)
#24Jeff Ross
jross@wykids.org
In reply to: Tom Lane (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Ross (#24)
#26Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#23)