Duplicate primary keys/rows
This is weird. I set up a table with a serial id field
and created a primary key on it. Then I imported data.
Running an app against it, I got periodic errors
stating "duplicate key violates unique constraint
"pkey_table1." Looking through the table (with
phppgadmin), there are duplicate rows:
id|f1|f2|f3|f4
585|c|a|e|f
586|a|b|c|d
586|a|b|c|d
587|g|e|r|z
However:
select * from table1 where id=586;
586|a|b|c|d
Yet:
select * from table1 where id>=585 and id<=587;
585|c|a|e|f
586|a|b|c|d
586|a|b|c|d
587|g|e|r|z
Wow, how is this possible? I'm using PG 8.0.3 on
Windows XP. This computer has been crashing repeatedly
lately, if that could be blamed (bad memory? hard
disk? I haven't quite figured out why.) Using
phppgadmin, I was able to delete one of the duplicate
rows (there are several) - don't know how it does that
- maybe using OIDs?
CSN
__________________________________
Yahoo! Music Unlimited
Access over 1 million songs. Try it free.
http://music.yahoo.com/unlimited/
On Sun, Oct 09, 2005 at 12:46:51PM -0700, CSN wrote:
select * from table1 where id=586;
586|a|b|c|d
Do you get different results from the following queries?
SET enable_seqscan TO on;
SET enable_indexscan TO off;
SELECT * FROM table1 WHERE id = 586;
SET enable_seqscan TO off;
SET enable_indexscan TO on;
SELECT * FROM table1 WHERE id = 586;
Yet:
select * from table1 where id>=585 and id<=587;
585|c|a|e|f
586|a|b|c|d
586|a|b|c|d
587|g|e|r|z
What's the output of the following query?
RESET enable_seqscan;
RESET enable_indexscan;
SELECT oid, ctid, xmin, cmin, xmax, cmax, *
FROM table1
WHERE id >= 585 AND id <= 587;
If you get the error 'column "oid" does not exist' then you've
created the table without oids, so just omit oid from the select
list:
SELECT ctid, xmin, cmin, xmax, cmax, *
FROM table1
WHERE id >= 585 AND id <= 587;
Wow, how is this possible? I'm using PG 8.0.3 on
Windows XP. This computer has been crashing repeatedly
lately, if that could be blamed (bad memory? hard
disk? I haven't quite figured out why.)
Faulty hardware is one possibile explanation.
--
Michael Fuhr
--- Michael Fuhr <mike@fuhr.org> wrote:
On Sun, Oct 09, 2005 at 12:46:51PM -0700, CSN wrote:
select * from table1 where id=586;
586|a|b|c|dDo you get different results from the following
queries?SET enable_seqscan TO on;
SET enable_indexscan TO off;
SELECT * FROM table1 WHERE id = 586;
This returns 2 rows.
SET enable_seqscan TO off;
SET enable_indexscan TO on;
SELECT * FROM table1 WHERE id = 586;
This returns 1 row.
Yet:
select * from table1 where id>=585 and id<=587;
585|c|a|e|f
586|a|b|c|d
586|a|b|c|d
587|g|e|r|zWhat's the output of the following query?
RESET enable_seqscan;
RESET enable_indexscan;SELECT oid, ctid, xmin, cmin, xmax, cmax, *
FROM table1
WHERE id >= 585 AND id <= 587;
oid | ctid | xmin | cmin | xmax | cmax
| id
--------+-----------+---------+------+---------+------+-----
125465 | (3143,78) | 1664385 | 0 | 1664386 | 2
| 984
125466 | (2745,50) | 1481020 | 0 | 1682425 | 2
| 985
125466 | (2672,11) | 1445346 | 0 | 1481020 | 0
| 985
125467 | (3159,28) | 1671875 | 0 | 1671876 | 2
| 986
(I'm using a different duplicate row - 985. I deleted
586's duplicate.)
Is this a problem with the index? Would rebuilding
them fix this problem? I'm still curious why this
happened, and somewhat troubled that something like
this can happen.
Thanks for your help,
CSN
If you get the error 'column "oid" does not exist'
then you've
created the table without oids, so just omit oid
from the select
list:SELECT ctid, xmin, cmin, xmax, cmax, *
FROM table1
WHERE id >= 585 AND id <= 587;Wow, how is this possible? I'm using PG 8.0.3 on
Windows XP. This computer has been crashingrepeatedly
lately, if that could be blamed (bad memory? hard
disk? I haven't quite figured out why.)Faulty hardware is one possibile explanation.
--
Michael Fuhr
__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
CSN <cool_screen_name90001@yahoo.com> writes:
oid | ctid | xmin | cmin | xmax | cmax | id
--------+-----------+---------+------+---------+------+-----
125466 | (2672,11) | 1445346 | 0 | 1481020 | 0 | 985
125466 | (2745,50) | 1481020 | 0 | 1682425 | 2 | 985
Hmm. The fact that the dup rows have the same OID indicates pretty
strongly that they are actually two versions of the same row, and
not two independently inserted rows. Furthermore we can see that xact
1481020 deleted the first version and inserted the second (note I took
the liberty of rearranging your output to make the rows appear in
chronological order).
So the index hasn't screwed up, exactly; the problem is that both rows
appear as good at the same time. But why?
It's really highly annoying that we can't see the contents of the
infomasks for the rows. Would you be willing to grab a copy of
pg_filedump and dump out these two data pages so we can see the
complete tuple headers?
(If you don't have a compiler then you'd need to find a precompiled
copy of pg_filedump for Windows. I don't know if anyone's made one
available.)
Given that you say the machine has been crashing, my bet is that a crash
caused the loss of pg_clog status for xid 1481020 at a time when
2745,50's xmin had been marked committed good, but 2672,11's xmax had
not been similarly marked. We have sufficient defenses against this
sort of thing *if the disk drive does not lie about write complete*.
(Unfortunately the vast majority of el-cheapo PCs are configured to lie
with abandon, which means that we can't guarantee data consistency
across power failures on such hardware.) It'd be nice to get direct
confirmation of that theory though.
regards, tom lane
I don't have a compiler on this machine. If somebody
can point me to a copy of pg_filedump for Windows (I
didn't see any using Google) I'd be happy to use it.
Or perhaps I could compile it under cygwin.
The hard drive is a Western Digital 200GB JD (SATA),
if that can be used to determine how badly it lies. ;)
Thanks,
CSN
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
CSN <cool_screen_name90001@yahoo.com> writes:
oid | ctid | xmin | cmin | xmax |
cmax | id
--------+-----------+---------+------+---------+------+-----
125466 | (2672,11) | 1445346 | 0 | 1481020 |
0 | 985
125466 | (2745,50) | 1481020 | 0 | 1682425 |
2 | 985
Hmm. The fact that the dup rows have the same OID
indicates pretty
strongly that they are actually two versions of the
same row, and
not two independently inserted rows. Furthermore we
can see that xact
1481020 deleted the first version and inserted the
second (note I took
the liberty of rearranging your output to make the
rows appear in
chronological order).So the index hasn't screwed up, exactly; the problem
is that both rows
appear as good at the same time. But why?It's really highly annoying that we can't see the
contents of the
infomasks for the rows. Would you be willing to
grab a copy of
pg_filedump and dump out these two data pages so we
can see the
complete tuple headers?(If you don't have a compiler then you'd need to
find a precompiled
copy of pg_filedump for Windows. I don't know if
anyone's made one
available.)Given that you say the machine has been crashing, my
bet is that a crash
caused the loss of pg_clog status for xid 1481020 at
a time when
2745,50's xmin had been marked committed good, but
2672,11's xmax had
not been similarly marked. We have sufficient
defenses against this
sort of thing *if the disk drive does not lie about
write complete*.
(Unfortunately the vast majority of el-cheapo PCs
are configured to lie
with abandon, which means that we can't guarantee
data consistency
across power failures on such hardware.) It'd be
nice to get direct
confirmation of that theory though.regards, tom lane
__________________________________
Yahoo! Music Unlimited
Access over 1 million songs. Try it free.
http://music.yahoo.com/unlimited/
On Mon, Oct 10, 2005 at 04:28:57PM -0400, Tom Lane wrote:
It's really highly annoying that we can't see the contents of the
infomasks for the rows.
Any particular reason there isn't an infomask system column?
--
Michael Fuhr
On Mon, 2005-10-10 at 15:51, CSN wrote:
I don't have a compiler on this machine. If somebody
can point me to a copy of pg_filedump for Windows (I
didn't see any using Google) I'd be happy to use it.
Or perhaps I could compile it under cygwin.The hard drive is a Western Digital 200GB JD (SATA),
if that can be used to determine how badly it lies. ;)
The general rule is that ATA (parallel or serial) drives lie. You can
turn off the write cache, which will make it slower, but then it should
be reliable during a power loss.
Or, put it all on a big UPS and hope the power supply never goes out.
Michael Fuhr <mike@fuhr.org> writes:
On Mon, Oct 10, 2005 at 04:28:57PM -0400, Tom Lane wrote:
It's really highly annoying that we can't see the contents of the
infomasks for the rows.
Any particular reason there isn't an infomask system column?
(a) inertia
(b) lack of desire to add an additional pg_attribute row per table.
I recall having proposed that we stop storing explicit pg_attribute
entries for system columns, which would make this sort of change easier
to make, and would save a pretty considerable amount of space in
pg_attribute too. (In the present regression database, about 45% of the
rows in pg_attribute are for system columns; that might be overly high
for real-world DBs though.) But people were a bit worried about what
might break.
regards, tom lane
On Mon, Oct 10, 2005 at 05:28:17PM -0400, Tom Lane wrote:
I recall having proposed that we stop storing explicit pg_attribute
entries for system columns, which would make this sort of change easier
to make, and would save a pretty considerable amount of space in
pg_attribute too. (In the present regression database, about 45% of the
rows in pg_attribute are for system columns; that might be overly high
for real-world DBs though.) But people were a bit worried about what
might break.
In catalog/heap.c there already is a SystemAttributeByName() to do the
legwork. Seems to me all you'd need to do is check just as you're about
to fail on "attribute not found".
If you actually look at scanRTEForColumn() in parser/parse_relation.c
it actually checks to see if a column name could be a system column
name, *before* looking it up in the catalog. Remove the catalog test
(except for OID obviously) and it'd sail right through. It'd be
interesting to see what happened...
Given that internally, they're referred to by number, it might not be
so bad. As usual, external clients might get confused if they're not
there...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes:
On Mon, Oct 10, 2005 at 05:28:17PM -0400, Tom Lane wrote:
But people were a bit worried about what
might break.
In catalog/heap.c there already is a SystemAttributeByName() to do the
legwork. Seems to me all you'd need to do is check just as you're about
to fail on "attribute not found".
We could certainly make it work as far as the backend is concerned.
The issue is whether there is any client code out there that will fail
if these entries are no longer present in pg_attribute.
regards, tom lane