BUG #14874: Dublicate values in primary key

Started by Henri KYover 8 years ago8 messagesbugs
Jump to latest
#1Henri KY
sideuxb-ky.consultant@dgfip.finances.gouv.fr

The following bug has been logged on the website:

Bug reference: 14874
Logged by: Henri Ky
Email address: sideuxb-ky.consultant@dgfip.finances.gouv.fr
PostgreSQL version: 9.4.4
Operating system: CentOS release 6.5
Description:

We have serious issue with duplicate values in primary key with our database
running in PostgreSQL 9.4.4.
This only occured during about 19 hours and in on table. There were no
restart of the base before and after the issue, and no error found in
PostgreSQL logs.

We use sequence to generate automatically the ID of the PK for each data
row.
Only the ID of the PK are duplicated, whereas the data of their rows are
different.

We have tried to reindex the PK, but failed the following error:

reindex index anomalie_pk;
ERROR: could not create unique index "anomalie_pk"
DÉTAIL : Key (id_anomalie)=(xxxxxxx) is duplicated.

And we have about 260000 data records with duplicated key.

Could you help us to fix the issue?

We create the table and the sequence with the following commandes:

CREATE TABLE anomalie
(
id_anomalie BIGINT NOT NULL,
id_fichier_collecte BIGINT NOT NULL,
id_anomalie_reference BIGINT NOT NULL,
nombre INTEGER
)
TABLESPACE ges_dat;

CREATE SEQUENCE anomalie_id_seq
START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1
NO CYCLE
;

ALTER SEQUENCE anomalie_id_seq OWNED BY anomalie.id_anomalie;
ALTER TABLE ONLY anomalie ALTER COLUMN id_anomalie SET DEFAULT
nextval('anomalie_id_seq');
ALTER TABLE anomalie ADD CONSTRAINT anomalie_pk PRIMARY KEY (id_anomalie)
USING INDEX TABLESPACE ges_idx;

Regards

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Henri KY (#1)
Re: BUG #14874: Dublicate values in primary key

On Thu, Oct 26, 2017 at 8:16 AM,
<sideuxb-ky.consultant@dgfip.finances.gouv.fr> wrote:

The following bug has been logged on the website:

Bug reference: 14874
Logged by: Henri Ky
Email address: sideuxb-ky.consultant@dgfip.finances.gouv.fr
PostgreSQL version: 9.4.4
Operating system: CentOS release 6.5
Description:

We have serious issue with duplicate values in primary key with our database
running in PostgreSQL 9.4.4.
This only occured during about 19 hours and in on table. There were no
restart of the base before and after the issue, and no error found in
PostgreSQL logs.

We use sequence to generate automatically the ID of the PK for each data
row.
Only the ID of the PK are duplicated, whereas the data of their rows are
different.

That's a very old point release. Many 9.4 bugs were fixed after the
9.4.4 point release, including ones that could account for this. You
ought to be trying to stay on the latest point release.

You'll clearly need to fix the problem by hand, resolving which rows
to keep. Rows can be deleted by using the hidden ctid column.

I also recommend using this tool to isolate the corruption as you fix it:

https://github.com/petergeoghegan/amcheck

There are CentOS packages available from the PGDG yum repository.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3CONCHY Elisabeth
elisabeth.conchy@dgfip.finances.gouv.fr
In reply to: Peter Geoghegan (#2)
Re: BUG #14874: Dublicate values in primary key

Hello,

Thank you for your answer on PostgreSQL mailing list "pgsql-bugs".
I work with Henri Ky who is out of office today, and I ditn't find
the way to answer on the web site.
So, I write to you directly, but feel free to add this text on the site.
_______________________________________________________________________

"You'll clearly need to fix the problem by hand, resolving which rows

to keep. Rows can be deleted by using the hidden ctid column."

We can't just delete the rows, because the data are different. The
id only is duplicate.
It looks as if the sequence was requested only once, and used twice
for inserting new rows.

The id has for default value :

nextval('<sequence>')

and is defined as primary key by :
ALTER TABLE <table> ADD CONSTRAINT <table>_pk PRIMARY KEY
(id) USING INDEX TABLESPACE schema_idx;

Yesterday night, we found out that the problem affects not only one
table, but 5.

We would like to understand how this is possible, whithout PosgreSQL
detecting any error.

Can we be sure that couldn't happen again, if we upgrade the version (to
9.4.12 for example) ?

--
Best Regards,
_____________________________________
Elisabeth Conchy
Inspectrice des Finances Publiques
DGFiP/SSI/SI-2B/NESSIE/SoLiPOD

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: CONCHY Elisabeth (#3)
Re: BUG #14874: Dublicate values in primary key

CONCHY Elisabeth <elisabeth.conchy@dgfip.finances.gouv.fr> writes:

Yesterday night, we found out that the problem affects not only one
table, but 5.
We would like to understand how this is possible, whithout PosgreSQL
detecting any error.
Can we be sure that couldn't happen again, if we upgrade the version (to
9.4.12 for example) ?

Nobody's ever going to promise you that Postgres contains no bugs :-(.
What we *can* say confidently is that 9.4.4 contains several known
data-corruption-causing bugs that are fixed in the latest 9.4 minor
release. So you really ought to upgrade.

Having said that ... the symptoms you are describing don't sound much
like a Postgres-induced bug. I could believe a bug possibly having
caused the sequence generator to go backwards. And I could believe
a bug having corrupted an index so that the index failed to notice
duplicate key insertions. But it's harder to believe both of those
things happening at once, and even harder to believe that a heretofore
unnoticed bug would cause five different indexes to be corrupted at
the same time that a sequence generator went backwards.

What *does* fit that set of facts really well is an OS crash (either
a power failure or kernel crash), if Postgres's fsync calls were not
being honored properly so that some data file changes got to disk
ahead of the associated WAL log entries. In that case, even after
WAL replay you would have an inconsistent database: there could be
rows in the tables with IDs past where the sequence generator thought
it was, and the indexes could be corrupt enough so that you didn't
get told about duplicate entries as the sequence generator got advanced
again. (The "index corruption" might well simply amount to the indexes
not having been updated to match what was in the tables proper.)

So ... have there been any server crashes lately?

In any case, my advice is to take a hard look at whether your storage
stack honors fsync all the way down. That won't get you out of the
immediate problem --- as Peter said, manual cleanup of the data and
then reindexing the indexes seems like your only way out of that.
But it might save you from a recurrence.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Tom Lane (#4)
Re: BUG #14874: Dublicate values in primary key

On Fri, Oct 27, 2017 at 4:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So ... have there been any server crashes lately?

In any case, my advice is to take a hard look at whether your storage
stack honors fsync all the way down. That won't get you out of the
immediate problem --- as Peter said, manual cleanup of the data and
then reindexing the indexes seems like your only way out of that.
But it might save you from a recurrence.

I agree with your conclusion, but I still think that it's slightly
ambiguous. There didn't seem to be an issue with sequences going
backwards -- the actual complaint was about there being multiple
versions of the same logical row visible simultaneously (sequences
were mentioned in passing). This is probably a case of tuples within a
HOT chain experiencing some kind of "resurrection". Though that in
itself doesn't prove much of anything.

The latest example of a bug that made "tuple resurrection" possible is
the recent "freeze-the-dead" bug, but I can recall a couple of other
bugs in the history of 9.4 with the same symptom. Of course, storage
problems can also cause this symptom, and if you're using foreign
keys, chances are good that storage infelicities will result in
corruption with symptoms similar to the corruption caused by various
Multixact bugs (e.g., resurrected rows, VACUUM reporting apparent
MultiXactId wraparound, etc). That's why these symptoms don't tell us
much.

I don't want to make too much of this. It's clear that being on such
an old point release is needlessly risky, and it's also clear that the
OP has work to do to validate the reliability of storage as part of a
disaster recovery effort.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#5)
Re: BUG #14874: Dublicate values in primary key

Peter Geoghegan <pg@bowt.ie> writes:

I agree with your conclusion, but I still think that it's slightly
ambiguous. There didn't seem to be an issue with sequences going
backwards -- the actual complaint was about there being multiple
versions of the same logical row visible simultaneously (sequences
were mentioned in passing). This is probably a case of tuples within a
HOT chain experiencing some kind of "resurrection". Though that in
itself doesn't prove much of anything.

Yeah, I wondered about that, particularly in view of the multiple
tuple-freezing bugs we've found. If the damage were in just one table
I'd believe it for sure ... but I'm not sure I believe it across 5
tables.

But the bottom line is the same: update to 9.4.latest (which is *not*
12 btw), and double-check storage reliability.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Henri KY
sideuxb-ky.consultant@dgfip.finances.gouv.fr
In reply to: Tom Lane (#6)
Re: BUG #14874: Dublicate values in primary key

Peter Geoghegan <pg@bowt.ie> writes:

I agree with your conclusion, but I still think that it's slightly
ambiguous. There didn't seem to be an issue with sequences going
backwards -- the actual complaint was about there being multiple
versions of the same logical row visible simultaneously (sequences
were mentioned in passing). This is probably a case of tuples within a
HOT chain experiencing some kind of "resurrection". Though that in
itself doesn't prove much of anything.

Yeah, I wondered about that, particularly in view of the multiple
tuple-freezing bugs we've found. If the damage were in just one table
I'd believe it for sure ... but I'm not sure I believe it across 5
tables.

But the bottom line is the same: update to 9.4.latest (which is *not*
12 btw), and double-check storage reliability.

regards, tom lane

Before the issue, there was a data recovery operation (the data cluster
was restored), but it seems that data in tablespaces were not cleaned up
before the operation, do you think this could be the cause of the
duplicate keys issue?

It seems that it is not for us to fix the issue by deleting manually
duplicated data. Is it possible to restore the data cluster before the
issue and re-apply WAL?

Concerning the storage, we use data disk bay, it seems that our hardware
is quite reliable. But it seems that during the issue, no system error
was found, thus we will reconfirm that the system engineer.

Concerning the upgrade, is it possible to upgrade directly from the
minor version 9.4.4 to the latest minor version 9.4.14 without applying
all the minor versions in between? I guess that to upgrade one minor
version it only needs to stop PostgreSQL, then apply the binaries and
start PostgreSQL. Could you also confirm this?

Best regards
Henri Ky

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Henri KY
sideuxb-ky.consultant@dgfip.finances.gouv.fr
In reply to: CONCHY Elisabeth (#3)
Re: BUG #14874: Dublicate values in primary key

Hello again,

With the uncleaned files left in the tablespaces after database restore
operation, this let us perform few futher tests.
We tried to duplicate manually data files used by one table and restart
the cluster, the reindex of the table has failed with duplicate key error.
As a conclusion, PostgreSQL does not check data integrity during cluster
restart, and if data files are altered, the database is corrupted
without any notice.
It would be better that PostgreSQL would report error related to data
integrity issue during database lifetime in the futur versions (this is
not implemented in version 9.6).

Here below the tests we have perfomed in detail:

[postgres] $ ls -lh
/u05/pgsql/9.4/data/mydb001/tbs_ges_dat/PG_9.4_201409291/17273
total 1,1G
-rw------- 1 postgres postgres 1,0G 31 oct.  13:48 17274
-rw------- 1 postgres postgres  11M 31 oct.  13:48 17274.1
-rw------- 1 postgres postgres 280K 31 oct.  13:46 17274_fsm

[postgres] mydb=# select count(*) from anomalie;
  count
----------
 18000000

[postgres] mydb=# \d+
                               Liste des relations
 Schéma |        Nom         |   Type   | Propriétaire | Taille   |
Description
------------+--------------------+----------+--------------+------------+-------------
 public | anomalie           | table    | postgres     | 1034 MB    |
 public | anomalie_id_seq    | séquence | postgres     | 8192 bytes |
 public | pg_buffercache     | vue      | postgres     | 0 bytes    |
 public | pg_stat_statements | vue      | postgres     | 0 bytes    |

[postgres] mydb=# reindex table anomalie;
REINDEX

[postgres] $ pg_ctl stop ...

[postgres] $ cd
/u05/pgsql/9.4/data/mydb001/tbs_ges_dat/PG_9.4_201409291/17273
[postgres] $ cp 17274.1 17274.2
[postgres] $ cp 17274 17274.1

[postgres] $ pg_ctl start ...

[postgres] $ ls -lh
/u05/pgsql/9.4/data/mydb001/tbs_ges_dat/PG_9.4_201409291/17273
total 2,1G
-rw------- 1 postgres postgres 1,0G 31 oct.  13:58 17274
-rw------- 1 postgres postgres 1,0G 31 oct.  13:59 17274.1
-rw------- 1 postgres postgres  11M 31 oct.  13:59 17274.2
-rw------- 1 postgres postgres 280K 31 oct.  13:46 17274_fsm

[postgres] mydb=# select count(*) from anomalie;
  count
----------
 35825792

[postgres] mydb=# \d+
                               Liste des relations
 Schéma |        Nom         |   Type   | Propriétaire | Taille   |
Description
--------+--------------------+----------+--------------+------------+-------------
 public | anomalie           | table    | postgres     | 2058 MB    |
 public | anomalie_id_seq    | séquence | postgres     | 8192 bytes |
 public | pg_buffercache     | vue      | postgres     | 0 bytes    |
 public | pg_stat_statements | vue      | postgres     | 0 bytes    |

[postgres] mydb=# reindex table anomalie;
ERROR:  could not create unique index "anomalie_pk"
DÉTAIL : Key (id_anomalie)=(1) is duplicated.

Regards
Henri

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs