how could duplicate pkey exist in psql?

Started by Yan Chunlu, over 14 years ago · 12 messages · general
#1 Yan Chunlu
springrider@gmail.com

recently I have found that several tables have rows with exactly the same
pkey. here is the definition:
"diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)

the data is like this:

159292 | funnypics_link_point | 41 | num
159292 | funnypics_link_point | 40 | num

I could not even update this record.

really confused about how this could happen... thanks!

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Yan Chunlu (#1)
Re: how could duplicate pkey exist in psql?


It could be a software bug or something else, but did you ever
restore the database after a storage problem?

This has happened to me once:
http://archives.postgresql.org/pgsql-general/2010-02/msg00971.php

Yours,
Laurenz Albe

#3 Szymon Guz
mabewlun@gmail.com
In reply to: Yan Chunlu (#1)
Re: how could duplicate pkey exist in psql?


Hi,
could you send us the result of the query:
select thing_id, '|'||key||'|' from table?
Maybe there are some extra spaces in the key column which were hidden by
the table alignment in the client?
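A sketch of one way to make hidden whitespace or non-printable characters visible, not from the thread (table name and id taken from the original report):

```sql
-- length() and a hex dump of the raw bytes expose trailing spaces
-- or invisible characters that table alignment would hide
SELECT thing_id,
       '|' || key || '|'                      AS delimited_key,
       length(key)                            AS key_len,
       encode(convert_to(key, 'UTF8'), 'hex') AS key_hex
FROM diggcontent_data_account
WHERE thing_id = 159292;
```

If both rows return identical key_hex values, the duplication is real and not a display artifact.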

regards
Szymon

#4 Edson Carlos Ericksson Richter
richter@simkorp.com.br
In reply to: Yan Chunlu (#1)
Re: how could duplicate pkey exist in psql?


I know one scenario where this can happen on Linux. In my case, it was caused
by an "rsync"... instead of copying to a different location, the script was
copying pg_xlog over its own pg_xlog.

I did this stupidity once, and learned for a lifetime. Lost two hours
of work recovering everything (from backup; at least I had one).

Be careful with rsync and cp, since Linux does not prevent files from
being overwritten even while they are in use.

Regards,

Edson.

#5 Yan Chunlu
springrider@gmail.com
In reply to: Szymon Guz (#3)
Re: how could duplicate pkey exist in psql?

seems they are identical:
159292 | |funnypicscn_link_karma|
159292 | |funnypicscn_link_karma|


#6 Yan Chunlu
springrider@gmail.com
In reply to: Edson Carlos Ericksson Richter (#4)
Re: how could duplicate pkey exist in psql?

I am using pgpool's replication feature; it does copy pg_xlog from one
server to another. could that have been the cause of the problem?

thanks for the help!


#7 depesz
In reply to: Yan Chunlu (#1)
Re: how could duplicate pkey exist in psql?

On Thu, Nov 17, 2011 at 01:19:30PM +0800, Yan Chunlu wrote:


please check:

select thing_id, key, count(*)
  from diggcontent_data_account
 group by 1, 2
having count(*) > 1;

this will show whether you really have duplicated values.

if you do - the index (diggcontent_data_account_pkey) is broken.
The exact reason can vary; any chance this database was restored from a
hot backup of a different system?
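If the duplicates turn out to be real, one commonly used cleanup sketch (not from this thread; back up the table first) is to delete the extra physical copies by ctid and rebuild the index:

```sql
-- read the heap rather than the (possibly corrupt) index in this session
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- keep one physical copy of each duplicated (thing_id, key) pair
DELETE FROM diggcontent_data_account a
USING diggcontent_data_account b
WHERE a.ctid < b.ctid
  AND a.thing_id = b.thing_id
  AND a.key = b.key;

-- rebuild the primary key index from the cleaned heap
REINDEX INDEX diggcontent_data_account_pkey;
```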

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

#8 Edson Carlos Ericksson Richter
richter@simkorp.com.br
In reply to: Yan Chunlu (#6)
Re: how could duplicate pkey exist in psql?

On 2011-11-17 09:21, Yan Chunlu wrote:

I am using pgpool's replication feature, it does copy pg_xlog from one
server to another, was that possible cause of the problem?

I did not mean that this IS your problem; I just gave you a tip about
a problem I had in the past that eventually showed the same symptom.

This scenario only happens when your script copies data over its own
data... like in "rsync -ar root@127.0.0.1:/var/lib/pgsql/9.0/data/*
/var/lib/pgsql/9.0/data/"

the command above is highly dangerous because it copies data over the
network link onto its own data... if you have transactions running during
the command above, you will get a crash (and, in my case, I had
duplicate primary keys).

It would be better to check whether this could be happening to you... some
script overwriting data using rsync, cp, etc... I have seen no other
situation where PostgreSQL allowed duplicate keys.
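A sketch of the difference (hypothetical backup path; for a consistent copy of a running cluster, pair the copy with pg_start_backup()/pg_stop_backup(), or stop the server first):

```shell
# dangerous: the destination is the live data directory itself
# rsync -ar root@127.0.0.1:/var/lib/pgsql/9.0/data/* /var/lib/pgsql/9.0/data/

# safer: copy the cluster to a separate location, never over itself
rsync -ar /var/lib/pgsql/9.0/data/ /backups/pgdata-copy/
```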

Hope this helps,

Edson.


#9 Yan Chunlu
springrider@gmail.com
In reply to: Edson Carlos Ericksson Richter (#8)
Re: how could duplicate pkey exist in psql?

got it. thank you very much for your help. I found out about this problem
too late, and there is no backup.

luckily there was not too much data involved, and my app keeps running
without error.

I am not sure if they are related, but I could not use pg_restore to import
data dumped by "pg_dump -Fc";

pg_restore prints some error message about a "duplicate primary key", and
the table is empty. no data has been imported.

pg_restore is supposed to import the data and ignore the errors. has
anyone had a similar problem?


#10 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Yan Chunlu (#9)
Re: how could duplicate pkey exist in psql?

On Monday, November 21, 2011 6:39:55 am Yan Chunlu wrote:


pg_restore may ignore the error and keep going, but the database will not.
In other words, when pg_restore receives the error it will continue on to
the next item (unless you have the -e switch on). As far as the server
(database) is concerned, a duplicate primary key is still an error and the
data will not be loaded.

--
Adrian Klaver
adrian.klaver@gmail.com

#11 Yan Chunlu
springrider@gmail.com
In reply to: Adrian Klaver (#10)
Re: how could duplicate pkey exist in psql?

so the database will stop accepting the remaining data after it detects an
error? that means that while using pg_restore, no error is allowed to
happen, otherwise the database will stop accepting data and the import
will fail.

I found only one record in psql's log:

duplicate key value violates unique constraint "account_pkey"

does that mean one duplicate record will prevent all the other records
from being imported?


#12 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Yan Chunlu (#11)
Re: how could duplicate pkey exist in psql?

On Monday, November 21, 2011 4:53:21 pm Yan Chunlu wrote:


For that table, yes. And if that table is the parent in FK relationships
with other tables, those tables will fail to import as well, because the
keys they refer to do not exist.

To get around this you have several options:
1) Find the duplicate entry(s) in the original table and eliminate them
before dumping.
2) Dump the table by itself to a plain text format and eliminate the
duplicate(s) in the plain text file before restoring.
3) By default pg_dump uses COPY to load data into tables. As you have found
out, that runs as a single transaction and rolls back if there is an error.
You can specify --inserts on the pg_dump command line to get it to output
an INSERT for each row. The upside is that each INSERT is a separate
transaction. The downside is that if there is a lot of data it will take a
long time to load, because each INSERT is a separate transaction.
4) Use pgloader (http://pgfoundry.org/projects/pgloader/). It is a Python
program that 'manages' COPY. It will kick out bad rows and keep loading data.
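A sketch of that per-row INSERT approach (hypothetical database name mydb):

```shell
# dump one table with an INSERT per row instead of a single COPY
pg_dump --inserts -t diggcontent_data_account mydb > table.sql

# restore: each failing INSERT is reported, the remaining rows still load
psql mydb -f table.sql
```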

--
Adrian Klaver
adrian.klaver@gmail.com