Multiple table entries?
Hi,
I recently upgraded to 8.4 and everything went great. All databases are
working as they are supposed to, no problems seen.
Today, however, I did a \d on a database and was surprised to see sets
of 5 identical table entries for each one that is supposed to be there.
Here's the databases in question (used to store apache web logs).
Several other of my databases (but not all) show the same 5 entries for
each real table symptom.
_postgresql@heinlein:/backup/postgresql $ psql pglogd
psql (8.4.0)
Type "help" for help.
pglogd=# \d++
List of relations
Schema | Name | Type | Owner | Size | Description
--------+--------------+-------+-------------+---------+-------------
public | full_entries | table | _postgresql | 528 kB |
public | full_entries | table | _postgresql | 528 kB |
public | full_entries | table | _postgresql | 528 kB |
public | full_entries | table | _postgresql | 528 kB |
public | full_entries | table | _postgresql | 528 kB |
public | full_temp | table | jross | 1904 kB |
public | full_temp | table | jross | 1904 kB |
public | full_temp | table | jross | 1904 kB |
public | full_temp | table | jross | 1904 kB |
public | full_temp | table | jross | 1904 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | log_entries | table | _postgresql | 1648 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | page_hits | table | _postgresql | 5296 kB |
public | total_hits | table | _postgresql | 16 kB |
public | total_hits | table | _postgresql | 16 kB |
public | total_hits | table | _postgresql | 16 kB |
public | total_hits | table | _postgresql | 16 kB |
public | total_hits | table | _postgresql | 16 kB |
(25 rows)
\l correctly lists the individual databases, and when I took a look at
yesterday's dump file I only see one create table statement, not 5.
I browsed through the system catalogs but haven't found anything yet
that can shine some light on this.
System is running OpenBSD -current.
Thanks,
Jeff Ross
On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote:
Hi,
I recently upgraded to 8.4 and everything went great. All databases are
working as they are supposed to, no problems seen.Today, however, I did a \d on a database and was surprised to see sets of 5
identical table entries for each one that is supposed to be there.
Ugh.
What does
select xmin,xmax,ctid,oid,* from pg_class
return?
On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote:
I browsed through the system catalogs but haven't found anything yet that
can shine some light on this.
Actually, I wonder if this isn't more likely to show the problem -- it
would explain why *all* your tables are showing up with duplicates
rather than just one.
select xmin,xmax,ctid,oid,* from pg_namespace
Greg Stark wrote:
On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote:
Hi,
I recently upgraded to 8.4 and everything went great. �All databases are
working as they are supposed to, no problems seen.Today, however, I did a \d on a database and was surprised to see sets of 5
identical table entries for each one that is supposed to be there.Ugh.
What does
select xmin,xmax,ctid,oid,* from pg_class
return?
216 rows worth, but I only see one entry per table, not 5.
For easier reading, I put the output at
http://openvistas.net/pg_class_query.html
Thanks!
Jeff
Greg Stark wrote:
On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote:
I browsed through the system catalogs but haven't found anything yet that
can shine some light on this.Actually, I wonder if this isn't more likely to show the problem -- it
would explain why *all* your tables are showing up with duplicates
rather than just one.select xmin,xmax,ctid,oid,* from pg_namespace
http://openvistas.net/pg_namespace_query.html
Jeff
On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote:
Greg Stark wrote:
Actually, I wonder if this isn't more likely to show the problem -- it
would explain why *all* your tables are showing up with duplicates
rather than just one.select xmin,xmax,ctid,oid,* from pg_namespace
Yeah, that's a problem. Would you be able to load the pageinspect
contrib module and run a query?
select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x;
On Sun, Aug 23, 2009 at 4:40 AM, Greg Stark<gsstark@mit.edu> wrote:
On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote:
Greg Stark wrote:
Yeah, that's a problem. Would you be able to load the pageinspect
contrib module and run a query?select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x;
Also, do you have the WAL log files going back to the database
creation? If so *please* preserve them! You're the second person to
report similar symptoms so it's starting to look like there may be a
serious bug here. And the nature of the problem is such that I think
it may require someone sifting through the xlog WAL files to see what
sequence of events happened.
The wal files are in a subdirection of the database root called
pg_xlog. If you still have one named 000000010000000000000000 then you
have all of them. Please, if possible, copy them to a backup
directory.
Greg Stark wrote:
On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote:
Greg Stark wrote:
Actually, I wonder if this isn't more likely to show the problem -- it
would explain why *all* your tables are showing up with duplicates
rather than just one.select xmin,xmax,ctid,oid,* from pg_namespace
Yeah, that's a problem. Would you be able to load the pageinspect
contrib module and run a query?select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x;
I had to modify your query slightly to make it run--hope I got what you
are after!
select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h) as x;
http://www.openvistas.net/pageinspect.html
Thanks,
Jeff
On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote:
I had to modify your query slightly to make it run--hope I got what you are
after!select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h) as x;
Incidentally, may as well ask the usual questions:
Did you upgrade this via pg_dump/pg_restore or via pg_migrator?
Are you running with fsync=off?
Are you running with synchronous_commit=off?
Have you had any system crashes?
Have you had any Postgres crashes?
Of the above only fsync=off combined with a system crash ought to be
unsafe, but it could be good to know which other events have occurred
to know which code paths have been exercised.
Do you still have pg_clog/0000 (that's kept longer than the xlog so I
would expect you would), how large is it? could you send that file? It
has a record of which transactions have committed or aborted, no
private data.
On Sun, Aug 23, 2009 at 5:02 PM, Greg Stark<gsstark@mit.edu> wrote:
On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote:
Incidentally, may as well ask the usual questions:
And just for reference, what does pg_controldata print?
Greg Stark wrote:
On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote:
I had to modify your query slightly to make it run--hope I got what you are
after!select (h).* from (select
heap_page_items(get_raw_page('pg_namespace',0)) as h) as x;Incidentally, may as well ask the usual questions:
Did you upgrade this via pg_dump/pg_restore or via pg_migrator?
pg_dump
Are you running with fsync=off?
No.
Are you running with synchronous_commit=off?
No.
Have you had any system crashes?
No.
Have you had any Postgres crashes?
No.
Of the above only fsync=off combined with a system crash ought to be
unsafe, but it could be good to know which other events have occurred
to know which code paths have been exercised.Do you still have pg_clog/0000 (that's kept longer than the xlog so I
would expect you would), how large is it? could you send that file? It
has a record of which transactions have committed or aborted, no
private data.
pg_clog is 32K. I've put it at http://www.openvistas.net/pg_clog
Jeff
Greg Stark wrote:
On Sun, Aug 23, 2009 at 5:02 PM, Greg Stark<gsstark@mit.edu> wrote:
On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote:
Incidentally, may as well ask the usual questions:And just for reference, what does pg_controldata print?
_postgresql@heinlein:/var/postgresql $
pg_controldata
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5368117850282091454
Database cluster state: in production
pg_control last modified: Sun Aug 23 10:37:39 2009
Latest checkpoint location: 0/4203B43C
Prior checkpoint location: 0/4202F978
Latest checkpoint's REDO location: 0/4203B43C
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/116884
Latest checkpoint's NextOID: 866525
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint: Sun Aug 23 10:37:38 2009
Minimum recovery ending location: 0/0
Maximum data alignment: 4
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 2000
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by reference
On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote:
pg_clog is 32K. I've put it at http://www.openvistas.net/pg_clog
Sorry, I'm getting a 404
On Sun, Aug 23, 2009 at 6:23 PM, Greg Stark<gsstark@mit.edu> wrote:
On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote:
pg_clog is 32K. I've put it at http://www.openvistas.net/pg_clog
Sorry, I'm getting a 404
For what it's worth this is what the heap dump shows. I don't
understand why you're seeing these tuples show up if they're all
properly marked as updated and XMAX_COMMITTED.
The last tuple is marked strangely I think. I don't think it's
supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also,
I don't understand why it's marked as UPDATED if ctid and xmax aren't
set.
Could you also send
select (h).* from (select page_header(get_raw_page('pg_namespace',0))
as h) as x;
I'm wondering if the page allvisible flag is set. The visibility map
is one of the few 8.4 patches which impact transaction visibility.
lp lp_off flags lp_len t_xmin t_xmax field3 t_ctid t_info2 t_info
3 7616 1 141 2 897 0 (0,6) 16387 9482
HOT_UPDATED, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED
6 7340 1 129 897 898 0 (0,7) -16381 9482
HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED
7 7220 1 117 898 899 0 (0,9) -16381 9482
HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED
9 7088 1 129 899 900 0 (0,10) -16381 9482
HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED
10 6944 1 141 900 0 0 (0,10) -32765 10506
UPDATED, XMAX_INVALID, XMIN_COMMITTED
Greg Stark <gsstark@mit.edu> writes:
The last tuple is marked strangely I think. I don't think it's
supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also,
I don't understand why it's marked as UPDATED if ctid and xmax aren't
set.
No, that all looks right to me. UPDATE sets HEAP_UPDATED on the newer
version of the row, not the older one. What looks interesting to me is
that the last update isn't marked HEAP_ONLY_TUPLE, ie, it's not in the
same HOT chain. Why is that I wonder ...
I'm wondering if the page allvisible flag is set. The visibility map
is one of the few 8.4 patches which impact transaction visibility.
But we don't use that while examining individual tuples, do we?
regards, tom lane
On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
Greg Stark <gsstark@mit.edu> writes:
The last tuple is marked strangely I think. I don't think it's
supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also,
I don't understand why it's marked as UPDATED if ctid and xmax aren't
set.No, that all looks right to me. UPDATE sets HEAP_UPDATED on the newer
version of the row, not the older one. What looks interesting to me is
that the last update isn't marked HEAP_ONLY_TUPLE, ie, it's not in the
same HOT chain. Why is that I wonder ...I'm wondering if the page allvisible flag is set. The visibility map
is one of the few 8.4 patches which impact transaction visibility.But we don't use that while examining individual tuples, do we?
We don't use the visibility map itself but we *do* use the page
header's all visible bit. On a sequential scan we skip the visibility
check for tuples on the page if the page header bit is set.
Greg Stark <gsstark@mit.edu> writes:
On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
But we don't use that while examining individual tuples, do we?
We don't use the visibility map itself but we *do* use the page
header's all visible bit. On a sequential scan we skip the visibility
check for tuples on the page if the page header bit is set.
Urgh. I bet that's where the problem is then. Some path is failing
to clear that bit, or maybe there's a race condition that allows it
to become set incorrectly (ie right after somebody else adds a tuple).
regards, tom lane
On Sun, Aug 23, 2009 at 7:34 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
Greg Stark <gsstark@mit.edu> writes:
On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
But we don't use that while examining individual tuples, do we?
We don't use the visibility map itself but we *do* use the page
header's all visible bit. On a sequential scan we skip the visibility
check for tuples on the page if the page header bit is set.Urgh. I bet that's where the problem is then. Some path is failing
to clear that bit, or maybe there's a race condition that allows it
to become set incorrectly (ie right after somebody else adds a tuple).
Well it only gets set in vacuum when we have the vacuum cleanout lock
so I can't see it getting set incorrectly unless there's a simple
logic error in vacuum. I just scanned it and it looks right to me.
I can imagine it failing to be unset. Or being unset but allowing a
race condition window before it's unset when it's wrong. That doesn't
explain why it the HEAP_ONLY_TUPLE flag would be missing though.
Also, vacuum logs a warning if the flag is found to be set and
shouldn't be. I suppose that could easily be missed though if it's
autovacuum that ran it. Also, the flag might fail to be cleared but
then be cleared later by another operation. I wonder how much damage
it can do while it's wrong.
Greg Stark <gsstark@mit.edu> writes:
On Sun, Aug 23, 2009 at 7:34 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
Urgh. I bet that's where the problem is then. Some path is failing
to clear that bit, or maybe there's a race condition that allows it
to become set incorrectly (ie right after somebody else adds a tuple).
Well it only gets set in vacuum when we have the vacuum cleanout lock
so I can't see it getting set incorrectly unless there's a simple
logic error in vacuum. I just scanned it and it looks right to me.
heap_update is broken. Details left as an exercise for the reader
(hint: Heikki has blatantly ignored the WAL-logging rules in transam/README)
regards, tom lane
Greg Stark wrote:
On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote:
pg_clog is 32K. �I've put it at http://www.openvistas.net/pg_clog
Sorry, I'm getting a 404
Oops--fixed now.
Jeff