Duplicate primary keys/rows

Started by CSNover 20 years ago10 messagesgeneral

cool_screen_name90001@yahoo.com

over 20 years ago

This is weird. I set up a table with a serial id field
and created a primary key on it. Then I imported data.
Running an app against it, I got periodic errors
stating "duplicate key violates unique constraint
"pkey_table1." Looking through the table (with
phppgadmin), there are duplicate rows:

id|f1|f2|f3|f4
585|c|a|e|f
586|a|b|c|d
586|a|b|c|d
587|g|e|r|z

However:

select * from table1 where id=586;
586|a|b|c|d

Yet:
select * from table1 where id>=585 and id<=587;
585|c|a|e|f
586|a|b|c|d
586|a|b|c|d
587|g|e|r|z

Wow, how is this possible? I'm using PG 8.0.3 on
Windows XP. This computer has been crashing repeatedly
lately, if that could be blamed (bad memory? hard
disk? I haven't quite figured out why.) Using
phppgadmin, I was able to delete one of the duplicate
rows (there are several) - don't know how it does that
- maybe using OIDs?

CSN

__________________________________
Yahoo! Music Unlimited
Access over 1 million songs. Try it free.
http://music.yahoo.com/unlimited/

Michael Fuhr

mike@fuhr.org

over 20 years ago

In reply to: CSN (#1)

Re: Duplicate primary keys/rows

On Sun, Oct 09, 2005 at 12:46:51PM -0700, CSN wrote:

select * from table1 where id=586;
586|a|b|c|d

Do you get different results from the following queries?

SET enable_seqscan TO on;
SET enable_indexscan TO off;
SELECT * FROM table1 WHERE id = 586;

SET enable_seqscan TO off;
SET enable_indexscan TO on;
SELECT * FROM table1 WHERE id = 586;

Yet:
select * from table1 where id>=585 and id<=587;
585|c|a|e|f
586|a|b|c|d
586|a|b|c|d
587|g|e|r|z

What's the output of the following query?

RESET enable_seqscan;
RESET enable_indexscan;

SELECT oid, ctid, xmin, cmin, xmax, cmax, *
FROM table1
WHERE id >= 585 AND id <= 587;

If you get the error 'column "oid" does not exist' then you've
created the table without oids, so just omit oid from the select
list:

SELECT ctid, xmin, cmin, xmax, cmax, *
FROM table1
WHERE id >= 585 AND id <= 587;

Wow, how is this possible? I'm using PG 8.0.3 on
Windows XP. This computer has been crashing repeatedly
lately, if that could be blamed (bad memory? hard
disk? I haven't quite figured out why.)

Faulty hardware is one possibile explanation.

--
Michael Fuhr

CSN

cool_screen_name90001@yahoo.com

over 20 years ago

In reply to: Michael Fuhr (#2)

Re: Duplicate primary keys/rows

--- Michael Fuhr <mike@fuhr.org> wrote:

On Sun, Oct 09, 2005 at 12:46:51PM -0700, CSN wrote:

select * from table1 where id=586;
586|a|b|c|d

Do you get different results from the following
queries?

SET enable_seqscan TO on;
SET enable_indexscan TO off;
SELECT * FROM table1 WHERE id = 586;

This returns 2 rows.

SET enable_seqscan TO off;
SET enable_indexscan TO on;
SELECT * FROM table1 WHERE id = 586;

This returns 1 row.

Yet:
select * from table1 where id>=585 and id<=587;
585|c|a|e|f
586|a|b|c|d
586|a|b|c|d
587|g|e|r|z

What's the output of the following query?

RESET enable_seqscan;
RESET enable_indexscan;

SELECT oid, ctid, xmin, cmin, xmax, cmax, *
FROM table1
WHERE id >= 585 AND id <= 587;

oid | ctid | xmin | cmin | xmax | cmax
| id
--------+-----------+---------+------+---------+------+-----
125465 | (3143,78) | 1664385 | 0 | 1664386 | 2
| 984
125466 | (2745,50) | 1481020 | 0 | 1682425 | 2
| 985
125466 | (2672,11) | 1445346 | 0 | 1481020 | 0
| 985
125467 | (3159,28) | 1671875 | 0 | 1671876 | 2
| 986

(I'm using a different duplicate row - 985. I deleted
586's duplicate.)

Is this a problem with the index? Would rebuilding
them fix this problem? I'm still curious why this
happened, and somewhat troubled that something like
this can happen.

Thanks for your help,
CSN

If you get the error 'column "oid" does not exist'
then you've
created the table without oids, so just omit oid
from the select
list:

SELECT ctid, xmin, cmin, xmax, cmax, *
FROM table1
WHERE id >= 585 AND id <= 587;

Wow, how is this possible? I'm using PG 8.0.3 on
Windows XP. This computer has been crashing

repeatedly

lately, if that could be blamed (bad memory? hard
disk? I haven't quite figured out why.)

Faulty hardware is one possibile explanation.

--
Michael Fuhr

__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: CSN (#3)

Re: Duplicate primary keys/rows

CSN <cool_screen_name90001@yahoo.com> writes:

oid | ctid | xmin | cmin | xmax | cmax | id
--------+-----------+---------+------+---------+------+-----
125466 | (2672,11) | 1445346 | 0 | 1481020 | 0 | 985
125466 | (2745,50) | 1481020 | 0 | 1682425 | 2 | 985

Hmm. The fact that the dup rows have the same OID indicates pretty
strongly that they are actually two versions of the same row, and
not two independently inserted rows. Furthermore we can see that xact
1481020 deleted the first version and inserted the second (note I took
the liberty of rearranging your output to make the rows appear in
chronological order).

So the index hasn't screwed up, exactly; the problem is that both rows
appear as good at the same time. But why?

It's really highly annoying that we can't see the contents of the
infomasks for the rows. Would you be willing to grab a copy of
pg_filedump and dump out these two data pages so we can see the
complete tuple headers?

(If you don't have a compiler then you'd need to find a precompiled
copy of pg_filedump for Windows. I don't know if anyone's made one
available.)

Given that you say the machine has been crashing, my bet is that a crash
caused the loss of pg_clog status for xid 1481020 at a time when
2745,50's xmin had been marked committed good, but 2672,11's xmax had
not been similarly marked. We have sufficient defenses against this
sort of thing *if the disk drive does not lie about write complete*.
(Unfortunately the vast majority of el-cheapo PCs are configured to lie
with abandon, which means that we can't guarantee data consistency
across power failures on such hardware.) It'd be nice to get direct
confirmation of that theory though.

regards, tom lane

CSN

cool_screen_name90001@yahoo.com

over 20 years ago

In reply to: Tom Lane (#4)

Re: Duplicate primary keys/rows

I don't have a compiler on this machine. If somebody
can point me to a copy of pg_filedump for Windows (I
didn't see any using Google) I'd be happy to use it.
Or perhaps I could compile it under cygwin.

The hard drive is a Western Digital 200GB JD (SATA),
if that can be used to determine how badly it lies. ;)

Thanks,
CSN

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

CSN <cool_screen_name90001@yahoo.com> writes:

oid | ctid | xmin | cmin | xmax |

cmax | id

--------+-----------+---------+------+---------+------+-----

125466 | (2672,11) | 1445346 | 0 | 1481020 |

0 | 985

125466 | (2745,50) | 1481020 | 0 | 1682425 |

2 | 985

Hmm. The fact that the dup rows have the same OID
indicates pretty
strongly that they are actually two versions of the
same row, and
not two independently inserted rows. Furthermore we
can see that xact
1481020 deleted the first version and inserted the
second (note I took
the liberty of rearranging your output to make the
rows appear in
chronological order).

So the index hasn't screwed up, exactly; the problem
is that both rows
appear as good at the same time. But why?

It's really highly annoying that we can't see the
contents of the
infomasks for the rows. Would you be willing to
grab a copy of
pg_filedump and dump out these two data pages so we
can see the
complete tuple headers?

(If you don't have a compiler then you'd need to
find a precompiled
copy of pg_filedump for Windows. I don't know if
anyone's made one
available.)

Given that you say the machine has been crashing, my
bet is that a crash
caused the loss of pg_clog status for xid 1481020 at
a time when
2745,50's xmin had been marked committed good, but
2672,11's xmax had
not been similarly marked. We have sufficient
defenses against this
sort of thing *if the disk drive does not lie about
write complete*.
(Unfortunately the vast majority of el-cheapo PCs
are configured to lie
with abandon, which means that we can't guarantee
data consistency
across power failures on such hardware.) It'd be
nice to get direct
confirmation of that theory though.

regards, tom lane

__________________________________
Yahoo! Music Unlimited
Access over 1 million songs. Try it free.
http://music.yahoo.com/unlimited/

Michael Fuhr

mike@fuhr.org

over 20 years ago

In reply to: Tom Lane (#4)

Re: Duplicate primary keys/rows

On Mon, Oct 10, 2005 at 04:28:57PM -0400, Tom Lane wrote:

It's really highly annoying that we can't see the contents of the
infomasks for the rows.

Any particular reason there isn't an infomask system column?

--
Michael Fuhr

Scott Marlowe

smarlowe@g2switchworks.com

over 20 years ago

In reply to: CSN (#5)

Re: Duplicate primary keys/rows

On Mon, 2005-10-10 at 15:51, CSN wrote:

I don't have a compiler on this machine. If somebody
can point me to a copy of pg_filedump for Windows (I
didn't see any using Google) I'd be happy to use it.
Or perhaps I could compile it under cygwin.

The hard drive is a Western Digital 200GB JD (SATA),
if that can be used to determine how badly it lies. ;)

The general rule is that ATA (parallel or serial) drives lie. You can
turn off the write cache, which will make it slower, but then it should
be reliable during a power loss.

Or, put it all on a big UPS and hope the power supply never goes out.

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Michael Fuhr (#6)

Re: Duplicate primary keys/rows

Michael Fuhr <mike@fuhr.org> writes:

On Mon, Oct 10, 2005 at 04:28:57PM -0400, Tom Lane wrote:

It's really highly annoying that we can't see the contents of the
infomasks for the rows.

Any particular reason there isn't an infomask system column?

(a) inertia

(b) lack of desire to add an additional pg_attribute row per table.

I recall having proposed that we stop storing explicit pg_attribute
entries for system columns, which would make this sort of change easier
to make, and would save a pretty considerable amount of space in
pg_attribute too. (In the present regression database, about 45% of the
rows in pg_attribute are for system columns; that might be overly high
for real-world DBs though.) But people were a bit worried about what
might break.

regards, tom lane

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: Tom Lane (#8)

Re: Duplicate primary keys/rows

On Mon, Oct 10, 2005 at 05:28:17PM -0400, Tom Lane wrote:

I recall having proposed that we stop storing explicit pg_attribute
entries for system columns, which would make this sort of change easier
to make, and would save a pretty considerable amount of space in
pg_attribute too. (In the present regression database, about 45% of the
rows in pg_attribute are for system columns; that might be overly high
for real-world DBs though.) But people were a bit worried about what
might break.

In catalog/heap.c there already is a SystemAttributeByName() to do the
legwork. Seems to me all you'd need to do is check just as you're about
to fail on "attribute not found".

If you actually look at scanRTEForColumn() in parser/parse_relation.c
it actually checks to see if a column name could be a system column
name, *before* looking it up in the catalog. Remove the catalog test
(except for OID obviously) and it'd sail right through. It'd be
interesting to see what happened...

Given that internally, they're referred to by number, it might not be
so bad. As usual, external clients might get confused if they're not
there...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#10

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Martijn van Oosterhout (#9)

Re: Duplicate primary keys/rows

Martijn van Oosterhout <kleptog@svana.org> writes:

On Mon, Oct 10, 2005 at 05:28:17PM -0400, Tom Lane wrote:

But people were a bit worried about what
might break.

In catalog/heap.c there already is a SystemAttributeByName() to do the
legwork. Seems to me all you'd need to do is check just as you're about
to fail on "attribute not found".

We could certainly make it work as far as the backend is concerned.
The issue is whether there is any client code out there that will fail
if these entries are no longer present in pg_attribute.

regards, tom lane