Duplicate Key Values

Started by mark bradleyabout 1 year ago40 messagesgeneral
Jump to latest
#1mark bradley
markbradyju@outlook.com

Although I did not enter them, somehow duplicate primary key values have appeared in one of my tables. I can't remove the offending rows because the key is a foreign key in another table.

What to do?

Mark Brady,
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;

#2Ron
ronljohnsonjr@gmail.com
In reply to: mark bradley (#1)
Re: Duplicate Key Values

On Wed, Mar 5, 2025 at 12:36 PM mark bradley <markbradyju@outlook.com>
wrote:

Although I did not enter them, somehow duplicate primary key values have
appeared in one of my tables.

Is it a text/varchar column? Has the distro been upgraded "recently", or
maybe streamed from an older Linux system to a newer Linux system?

I can't remove the offending rows because the key is a foreign key in
another table.

What to do?

Are the records completely (i.e., all columns) identical?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#3Ron
ronljohnsonjr@gmail.com
In reply to: mark bradley (#1)
Re: Duplicate Key Values

What does "and unique identifier" mean?

(Creating a Primary Key constraint automatically creates a unique
no-nulls-allowed index on the column(s), so no other index is needed.)

Can you show the table definition?

Anyway, you *should* be able to delete one of the duplicate node_id records
by specifying another column with a different value;

On Wed, Mar 5, 2025 at 6:14 PM mark bradley <markbradyju@outlook.com> wrote:

Hi Ron,

The key is an integer. I'm using pGAdmin4 and recently updated to the
latest version.

The records are not all identical, some have NULL values of the non-key
for some unknown reason. Here is a screenshot. The key is node_id, which
I have specified as a key and unique identifier.

Mark Brady,
*amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;*
------------------------------
*From:* Ron Johnson <ronljohnsonjr@gmail.com>
*Sent:* Wednesday, March 5, 2025 12:50 PM
*To:* pgsql-general@lists.postgresql.org <
pgsql-general@lists.postgresql.org>
*Subject:* Re: Duplicate Key Values

On Wed, Mar 5, 2025 at 12:36 PM mark bradley <markbradyju@outlook.com>
wrote:

Although I did not enter them, somehow duplicate primary key values have
appeared in one of my tables.

Is it a text/varchar column? Has the distro been upgraded "recently", or
maybe streamed from an older Linux system to a newer Linux system?

I can't remove the offending rows because the key is a foreign key in
another table.

What to do?

Are the records completely (i.e., all columns) identical?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Attachments:

image.pngimage/png; name=image.pngDownload
#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ron (#3)
Re: Duplicate Key Values

On 3/5/25 15:37, Ron Johnson wrote:

What does "and unique identifier" mean?

(Creating a Primary Key constraint automatically creates a unique
no-nulls-allowed index on the column(s), so no other index is needed.)

Can you show the table definition?

Anyway, you /should/ be able to delete one of the duplicate node_id
records by specifying another column with a different value;

On Wed, Mar 5, 2025 at 6:14 PM mark bradley <markbradyju@outlook.com
<mailto:markbradyju@outlook.com>> wrote:

Hi Ron,

The key is an integer.  I'm using pGAdmin4 and recently updated to
the latest version.

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

The records are not all identical, some have NULL values of the
non-key for some unknown reason.  Here is a screenshot.  The key is
node_id, which I have specified as a key and unique identifier.

--
Adrian Klaver
adrian.klaver@aklaver.com

#5mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#4)
Re: Duplicate Key Values

Here is the table definition:

[cid:ecac8e92-826b-45c8-95a8-aaf0e55c4f9c]

And here is the error message I get when I try to delete a duplicate:

[cid:a0f5f298-984d-4f89-abd2-475c02e65b9d]

Mark Brady,
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Wednesday, March 5, 2025 6:49 PM
To: Ron Johnson <ronljohnsonjr@gmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

On 3/5/25 15:37, Ron Johnson wrote:

What does "and unique identifier" mean?

(Creating a Primary Key constraint automatically creates a unique
no-nulls-allowed index on the column(s), so no other index is needed.)

Can you show the table definition?

Anyway, you /should/ be able to delete one of the duplicate node_id
records by specifying another column with a different value;

On Wed, Mar 5, 2025 at 6:14 PM mark bradley <markbradyju@outlook.com
<mailto:markbradyju@outlook.com>> wrote:

Hi Ron,

The key is an integer. I'm using pGAdmin4 and recently updated to
the latest version.

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

The records are not all identical, some have NULL values of the
non-key for some unknown reason. Here is a screenshot. The key is
node_id, which I have specified as a key and unique identifier.

--
Adrian Klaver
adrian.klaver@aklaver.com

Attachments:

image.pngimage/png; name=image.pngDownload
image.pngimage/png; name=image.pngDownload
#6mark bradley
markbradyju@outlook.com
In reply to: mark bradley (#5)
Re: Duplicate Key Values

I tried adding a dummy column with different values and then deleting one, as you suggested. Got the same message.

Mark Brady, Ph.D.
Deputy Chief Data Officer, TRMC
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: mark bradley <markbradyju@outlook.com>
Sent: Thursday, March 6, 2025 1:11 PM
To: Adrian Klaver <adrian.klaver@aklaver.com>; Ron Johnson <ronljohnsonjr@gmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

Here is the table definition:

[cid:ecac8e92-826b-45c8-95a8-aaf0e55c4f9c]

And here is the error message I get when I try to delete a duplicate:

[cid:a0f5f298-984d-4f89-abd2-475c02e65b9d]

Mark Brady,
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Wednesday, March 5, 2025 6:49 PM
To: Ron Johnson <ronljohnsonjr@gmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

On 3/5/25 15:37, Ron Johnson wrote:

What does "and unique identifier" mean?

(Creating a Primary Key constraint automatically creates a unique
no-nulls-allowed index on the column(s), so no other index is needed.)

Can you show the table definition?

Anyway, you /should/ be able to delete one of the duplicate node_id
records by specifying another column with a different value;

On Wed, Mar 5, 2025 at 6:14 PM mark bradley <markbradyju@outlook.com
<mailto:markbradyju@outlook.com>> wrote:

Hi Ron,

The key is an integer. I'm using pGAdmin4 and recently updated to
the latest version.

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

The records are not all identical, some have NULL values of the
non-key for some unknown reason. Here is a screenshot. The key is
node_id, which I have specified as a key and unique identifier.

--
Adrian Klaver
adrian.klaver@aklaver.com

Attachments:

image.pngimage/png; name=image.pngDownload
image.pngimage/png; name=image.pngDownload
#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#5)
Re: Duplicate Key Values

On 3/6/25 10:11, mark bradley wrote:

Here is the table definition:

And here is the error message I get when I try to delete a duplicate:

Please answer the following:

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

Mark Brady,
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_
------------------------------------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#1)
Re: Duplicate Key Values

On 3/6/25 10:36, mark bradley wrote:
Reply to list also.
Ccing list.

1) Did you not see duplicates with the old version of pgAdmin4?
I did see it in my last update but have done a couple, so it should have
happened then.

2) What do you see if you use psql?
Here is the PSQL output:

Supply information as copy and paste text.

3) Did you upgrade/move the Postgres server or the underlying OS?
No

4) Have you tried reindexing the node_id field?
No, but I'll try that.

Mark Brady,
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Thursday, March 6, 2025 1:22 PM
*To:* mark bradley <markbradyju@outlook.com>; Ron Johnson
<ronljohnsonjr@gmail.com>; pgsql-general <pgsql-general@postgresql.org>
*Subject:* Re: Duplicate Key Values
On 3/6/25 10:11, mark bradley wrote:

Here is the table definition:

And here is the error message I get when I try to delete a duplicate:

Please answer the following:

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

Mark Brady,
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady&gt;&gt;_

------------------------------------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

#9Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#1)
Re: Duplicate Key Values

On 3/6/25 12:06, mark bradley wrote:

My mistake I forgot to Cc list on my previous post, which was:

That would be an issue and also would mean it is not a PK.

In psql do:

\d dataset

and show the results as text in your reply.

Ccing list

The below shows there is an index("dataset_pkey") on node_id.

Note, reindexing will take a lock on the table that prevents changing
data while the operation is running. See the below for more information:

https://www.postgresql.org/docs/current/sql-reindex.html

If the table is not to big and you can interrupt access to it then the
simplest command to run would be:

REINDEX TABLE dataset;

Universal Metadata Schema=# \d dataset
                                 Table "public.dataset"
          Column           |           Type            | Collation |
Nullable | Defau
lt
---------------------------+---------------------------+-----------+----------+------
---
 node_id                   | integer                   |           |
not null |
 dataset_name              | character varying(25)     |           |
not null |
 notes                     | text                      |           |
       |
 dataset_type              | database_type             |           |
not null |
 dataset_maturity          | database_maturity_type    |           |
not null |
 disposition               | disposition_type          |           |
not null |
 start_date                | date                      |           |
       |
 end_date                  | date                      |           |
       |
 most_recent_update        | date                      |           |
       |
 update_periodicity        | interval                  |           |
       |
 system_of_record          | text                      |           |
       |
 point_of_contact          | integer                   |           |
not null |
 dataset_url               | text                      |           |
       |
 classification_level      | classification_level_type |           |
not null |
 physical_location         | text                      |           |
       |
 quality_control           | yes_no_type               |           |
not null |
 dataset_documentation_url | text                      |           |
not null |
 description               | text                      |           |
       |
 node_type                 | node_type                 |           |
       |
 dummy                     | integer                   |           |
       |
Indexes:
    "dataset_pkey" PRIMARY KEY, btree (node_id)
Foreign-key constraints:
    "node_id" FOREIGN KEY (node_id) REFERENCES node(node_id) NOT VALID
    "poc" FOREIGN KEY (point_of_contact) REFERENCES poc(poc_id) NOT VALID
Referenced by:
    TABLE "dataset_table" CONSTRAINT "dataset" FOREIGN KEY (node_id)
REFERENCES datas
et(node_id) NOT VALID
    TABLE "dataset_subject" CONSTRAINT "dataset_subject_node_id_fkey"
FOREIGN KEY (no
de_id) REFERENCES dataset(node_id)
    TABLE "system_dataset" CONSTRAINT "system_dataset_node_id_fkey"
FOREIGN KEY (node
_id) REFERENCES dataset(node_id) NOT VALID
Inherits: node

Best regards,
Mark Brady
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Thursday, March 6, 2025 3:03 PM
*To:* mark bradley <markbradyju@outlook.com>
*Subject:* Re: Duplicate Key Values
On 3/6/25 10:51, mark bradley wrote:
Reply to list alos.
Ccing list.

Looks like there is no index on node_id at the moment

That would be an issue and also would mean it is not a PK.

In psql  do:

\d dataset

and show the results as text in your reply.

Mark Brady, Ph.D.
Deputy Chief Data Officer, TRMC
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady&gt;&gt;_

------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Thursday, March 6, 2025 1:22 PM
*To:* mark bradley <markbradyju@outlook.com>; Ron Johnson
<ronljohnsonjr@gmail.com>; pgsql-general <pgsql-general@postgresql.org>
*Subject:* Re: Duplicate Key Values
On 3/6/25 10:11, mark bradley wrote:

Here is the table definition:

And here is the error message I get when I try to delete a duplicate:

Please answer the following:

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

Mark Brady,
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady&gt;&gt;&gt;_

------------------------------------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

#10mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#9)
Re: Duplicate Key Values

This is what MS Copilot has to say about this apparent bug where Postgres inserts extra rows violating a primary keys uniqueness constraint:

Yes, this issue has been encountered by others. There are a few potential reasons why this might happen:

1.
Sequence Out of Sync: Sometimes, the sequence that generates unique values for the primary key can become out of sync, especially after a bulk import or a database restore. You can check if the sequence is out of sync and reset it if necessary.
2.
Index Corruption: Index corruption can occur due to various reasons, such as hardware failures or bugs in earlier versions of PostgreSQL. This can lead to duplicate primary keys being inserted.
3.
Table Inheritance: If you are using table inheritance, primary keys are not enforced among inherited tables. This can lead to duplicates if not handled correctly.
4.
Application Logic: Sometimes, the application logic might inadvertently insert duplicate records. Reviewing the application code and insert statements can help identify and resolve such issues.

To resolve the issue, you can:

*
Check and reset the sequence if it's out of sync.
*
Rebuild the index if it's corrupted.

Any of the first 3 could be involved. There isn't an application involved other than pgAdmin.

1.
Originally, the key in the node table was a sequence, but I changed it to a non-sequence.
2.
There is no index on the primary key node_id, and I understand there should be one.
3.
I didn't explicitly use Postgres inheritance but there are two tables that are subclasses of node. There are dataset nodes and processing_node [s] tables. Each is a type of node and have primary keys that are foreign keys from the node table. This key is node_id.

What to do? I hesitate to just delete my tables and start over because this error will reoccur.

Best regards,
Mark Brady
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Thursday, March 6, 2025 3:34 PM
To: mark bradley <markbradyju@outlook.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

On 3/6/25 12:06, mark bradley wrote:

My mistake I forgot to Cc list on my previous post, which was:

That would be an issue and also would mean it is not a PK.

In psql do:

\d dataset

and show the results as text in your reply.

Ccing list

The below shows there is an index("dataset_pkey") on node_id.

Note, reindexing will take a lock on the table that prevents changing
data while the operation is running. See the below for more information:

https://www.postgresql.org/docs/current/sql-reindex.html

If the table is not to big and you can interrupt access to it then the
simplest command to run would be:

REINDEX TABLE dataset;

Universal Metadata Schema=# \d dataset
Table "public.dataset"
Column | Type | Collation |
Nullable | Defau
lt
---------------------------+---------------------------+-----------+----------+------
---
node_id | integer | |
not null |
dataset_name | character varying(25) | |
not null |
notes | text | |
|
dataset_type | database_type | |
not null |
dataset_maturity | database_maturity_type | |
not null |
disposition | disposition_type | |
not null |
start_date | date | |
|
end_date | date | |
|
most_recent_update | date | |
|
update_periodicity | interval | |
|
system_of_record | text | |
|
point_of_contact | integer | |
not null |
dataset_url | text | |
|
classification_level | classification_level_type | |
not null |
physical_location | text | |
|
quality_control | yes_no_type | |
not null |
dataset_documentation_url | text | |
not null |
description | text | |
|
node_type | node_type | |
|
dummy | integer | |
|
Indexes:
"dataset_pkey" PRIMARY KEY, btree (node_id)
Foreign-key constraints:
"node_id" FOREIGN KEY (node_id) REFERENCES node(node_id) NOT VALID
"poc" FOREIGN KEY (point_of_contact) REFERENCES poc(poc_id) NOT VALID
Referenced by:
TABLE "dataset_table" CONSTRAINT "dataset" FOREIGN KEY (node_id)
REFERENCES datas
et(node_id) NOT VALID
TABLE "dataset_subject" CONSTRAINT "dataset_subject_node_id_fkey"
FOREIGN KEY (no
de_id) REFERENCES dataset(node_id)
TABLE "system_dataset" CONSTRAINT "system_dataset_node_id_fkey"
FOREIGN KEY (node
_id) REFERENCES dataset(node_id) NOT VALID
Inherits: node

Best regards,
Mark Brady
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Thursday, March 6, 2025 3:03 PM
*To:* mark bradley <markbradyju@outlook.com>
*Subject:* Re: Duplicate Key Values
On 3/6/25 10:51, mark bradley wrote:
Reply to list alos.
Ccing list.

Looks like there is no index on node_id at the moment

That would be an issue and also would mean it is not a PK.

In psql do:

\d dataset

and show the results as text in your reply.

Mark Brady, Ph.D.
Deputy Chief Data Officer, TRMC
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady&gt;&gt;_

------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Thursday, March 6, 2025 1:22 PM
*To:* mark bradley <markbradyju@outlook.com>; Ron Johnson
<ronljohnsonjr@gmail.com>; pgsql-general <pgsql-general@postgresql.org>
*Subject:* Re: Duplicate Key Values
On 3/6/25 10:11, mark bradley wrote:

Here is the table definition:

And here is the error message I get when I try to delete a duplicate:

Please answer the following:

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

Mark Brady,
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady&gt;&gt;&gt;_

------------------------------------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

#11mark bradley
markbradyju@outlook.com
In reply to: mark bradley (#10)
Re: Duplicate Key Values

Here are some of the references from Copilot

https://dba.stackexchange.com/questions/62675/why-does-my-table-hold-duplicate-primary-keys

https://stackoverflow.com/questions/55551461/having-duplicate-rows-on-a-primary-key-and-unique-constraints-in-postgres

Best regards,
Mark Brady
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: mark bradley <markbradyju@outlook.com>
Sent: Friday, March 7, 2025 9:34 AM
To: Adrian Klaver <adrian.klaver@aklaver.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

This is what MS Copilot has to say about this apparent bug where Postgres inserts extra rows violating a primary keys uniqueness constraint:

Yes, this issue has been encountered by others. There are a few potential reasons why this might happen:

1.
Sequence Out of Sync: Sometimes, the sequence that generates unique values for the primary key can become out of sync, especially after a bulk import or a database restore. You can check if the sequence is out of sync and reset it if necessary.
2.
Index Corruption: Index corruption can occur due to various reasons, such as hardware failures or bugs in earlier versions of PostgreSQL. This can lead to duplicate primary keys being inserted.
3.
Table Inheritance: If you are using table inheritance, primary keys are not enforced among inherited tables. This can lead to duplicates if not handled correctly.
4.
Application Logic: Sometimes, the application logic might inadvertently insert duplicate records. Reviewing the application code and insert statements can help identify and resolve such issues.

To resolve the issue, you can:

*
Check and reset the sequence if it's out of sync.
*
Rebuild the index if it's corrupted.

Any of the first 3 could be involved. There isn't an application involved other than pgAdmin.

1.
Originally, the key in the node table was a sequence, but I changed it to a non-sequence.
2.
There is no index on the primary key node_id, and I understand there should be one.
3.
I didn't explicitly use Postgres inheritance but there are two tables that are subclasses of node. There are dataset nodes and processing_node [s] tables. Each is a type of node and have primary keys that are foreign keys from the node table. This key is node_id.

What to do? I hesitate to just delete my tables and start over because this error will reoccur.

Best regards,
Mark Brady
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Thursday, March 6, 2025 3:34 PM
To: mark bradley <markbradyju@outlook.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

On 3/6/25 12:06, mark bradley wrote:

My mistake I forgot to Cc list on my previous post, which was:

That would be an issue and also would mean it is not a PK.

In psql do:

\d dataset

and show the results as text in your reply.

Ccing list

The below shows there is an index("dataset_pkey") on node_id.

Note, reindexing will take a lock on the table that prevents changing
data while the operation is running. See the below for more information:

https://www.postgresql.org/docs/current/sql-reindex.html

If the table is not to big and you can interrupt access to it then the
simplest command to run would be:

REINDEX TABLE dataset;

Universal Metadata Schema=# \d dataset
Table "public.dataset"
Column | Type | Collation |
Nullable | Defau
lt
---------------------------+---------------------------+-----------+----------+------
---
node_id | integer | |
not null |
dataset_name | character varying(25) | |
not null |
notes | text | |
|
dataset_type | database_type | |
not null |
dataset_maturity | database_maturity_type | |
not null |
disposition | disposition_type | |
not null |
start_date | date | |
|
end_date | date | |
|
most_recent_update | date | |
|
update_periodicity | interval | |
|
system_of_record | text | |
|
point_of_contact | integer | |
not null |
dataset_url | text | |
|
classification_level | classification_level_type | |
not null |
physical_location | text | |
|
quality_control | yes_no_type | |
not null |
dataset_documentation_url | text | |
not null |
description | text | |
|
node_type | node_type | |
|
dummy | integer | |
|
Indexes:
"dataset_pkey" PRIMARY KEY, btree (node_id)
Foreign-key constraints:
"node_id" FOREIGN KEY (node_id) REFERENCES node(node_id) NOT VALID
"poc" FOREIGN KEY (point_of_contact) REFERENCES poc(poc_id) NOT VALID
Referenced by:
TABLE "dataset_table" CONSTRAINT "dataset" FOREIGN KEY (node_id)
REFERENCES datas
et(node_id) NOT VALID
TABLE "dataset_subject" CONSTRAINT "dataset_subject_node_id_fkey"
FOREIGN KEY (no
de_id) REFERENCES dataset(node_id)
TABLE "system_dataset" CONSTRAINT "system_dataset_node_id_fkey"
FOREIGN KEY (node
_id) REFERENCES dataset(node_id) NOT VALID
Inherits: node

Best regards,
Mark Brady
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Thursday, March 6, 2025 3:03 PM
*To:* mark bradley <markbradyju@outlook.com>
*Subject:* Re: Duplicate Key Values
On 3/6/25 10:51, mark bradley wrote:
Reply to list alos.
Ccing list.

Looks like there is no index on node_id at the moment

That would be an issue and also would mean it is not a PK.

In psql do:

\d dataset

and show the results as text in your reply.

Mark Brady, Ph.D.
Deputy Chief Data Officer, TRMC
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady&gt;&gt;_

------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Thursday, March 6, 2025 1:22 PM
*To:* mark bradley <markbradyju@outlook.com>; Ron Johnson
<ronljohnsonjr@gmail.com>; pgsql-general <pgsql-general@postgresql.org>
*Subject:* Re: Duplicate Key Values
On 3/6/25 10:11, mark bradley wrote:

Here is the table definition:

And here is the error message I get when I try to delete a duplicate:

Please answer the following:

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

Mark Brady,
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady

<https://amazon.com/author/markjbrady&gt;&gt;&gt;_

------------------------------------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

#12Greg Sabino Mullane
greg@turnstep.com
In reply to: mark bradley (#10)
Re: Duplicate Key Values

On Fri, Mar 7, 2025 at 9:35 AM mark bradley <markbradyju@outlook.com> wrote:

This is what MS Copilot has to say about this apparent bug where Postgres
inserts extra rows violating a primary keys uniqueness constraint:

Yes, this issue has been encountered by others. There are a few potential
reasons why this might happen:

1. *Sequence Out of Sync*: Sometimes, the sequence that generates
unique values for the primary key can become out of sync, especially after
a bulk import or a database restore. You can check if the sequence is out
of sync and reset it if necessary.
2. *Index Corruption*: Index corruption can occur due to various
reasons, such as hardware failures or bugs in earlier versions of
PostgreSQL. This can lead to duplicate primary keys being inserted.
3. *Table Inheritance*: If you are using table inheritance, primary
keys are not enforced among inherited tables. This can lead to duplicates
if not handled correctly.
4. *Application Logic*: Sometimes, the application logic might
inadvertently insert duplicate records. Reviewing the application code and
insert statements can help identify and resolve such issues.

This is AI gobbledygook, and can be ignored. The only real option

is number 2 (index corruption).

1. There is no index on the primary key *node_id*, and I understand
there should be one.

There is an index, as your table definition showed.

What to do? I hesitate to just delete my tables and start over because

this error will reoccur.

The error should not reoccur. At least, a normal Postgres system will
prevent this from happening in the first place. To clean it up, carefully
run the below. If an error appears, or something does not look right,
rollback and stop.

-- Encourage not using indexes:
set enable_indexscan = 0;
set enable_bitmapscan = 0;
set enable_indexonlyscan = 0;

-- Sanity check. This should return a number greater than 1. If not, stop.
set search_path = public;
select count(*) from dataset where node_id = 26;

-- Make a backup:
create table dataset_backup as select * from dataset;

-- Test out the process on a subset of the data:
create table test_dataset as select * from dataset where node_id < 30;
create table test_dataset_duperows_20250307 (like dataset);
begin;
set local session_replication_role = 'replica';
with goodctids as (select min(ctid) from TEST_dataset group by node_id)
, mydelete as (delete from TEST_dataset where not exists (select 1 from
goodctids where min=ctid)
returning *)
insert into test_dataset_duperows_20250307 select * from mydelete;
reset session_replication_role;
commit;

-- STOP HERE and examine the test_dataset and
test_dataset_duperows_20250307 tables
--
-- If ZERO rows were deleted, then you should no go further,
-- as some of the underlying assumptions must be wrong.

-- Do the real table:
create table dataset_duperows_20250307 (like dataset);
begin;
set local session_replication_role = 'replica';
with goodctids as (select min(ctid) from dataset group by node_id)
, mydelete as (delete from dataset where not exists (select 1 from
goodctids where min=ctid)
returning *)
insert into dataset_duperows_20250307 select * from mydelete;
reset session_replication_role;
commit;

-- Rebuild the index
reindex index concurrently dataset_pkey;

-- Put things back from good measure:
reset enable_indexscan;
reset enable_bitmapscan;
reset enable_indexonlyscan;

drop table test_dataset;
drop table test_dataset_duperows;

Given the issues, I would keep the dataset_backup table around for a while.
And make sure your backups are running and up to date.

You might also want to reindex all the tables in your database, to see if
any other issues are lurking.

Check your Postgres logs closely to see if anything else unusual has
appeared.

Look over your OS logs to see if there are clues as to how the corruption
happened. Maybe you recently upgraded your OS?

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#13Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#10)
Re: Duplicate Key Values

On 3/7/25 06:34, mark bradley wrote:

This is what MS Copilot has to say about this apparent bug where
Postgres inserts extra rows violating a primary keys uniqueness constraint:

What to do?  I hesitate to just delete my tables and start over because
this error will reoccur.

Let the AI solve it.

Best regards,
Mark Brady

--
Adrian Klaver
adrian.klaver@aklaver.com

#14mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#13)
Re: Duplicate Key Values

Wouldn't that be nice 🙂

Best regards,
Mark Brady
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Friday, March 7, 2025 10:55 AM
To: mark bradley <markbradyju@outlook.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

On 3/7/25 06:34, mark bradley wrote:

This is what MS Copilot has to say about this apparent bug where
Postgres inserts extra rows violating a primary keys uniqueness constraint:

What to do? I hesitate to just delete my tables and start over because
this error will reoccur.

Let the AI solve it.

Best regards,
Mark Brady

--
Adrian Klaver
adrian.klaver@aklaver.com

#15Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#14)
Re: Duplicate Key Values

On 3/7/25 11:47, mark bradley wrote:

Wouldn't that be nice 🙂

No, because you would end up with a mess.

My AI rant:

AI is neither artificial or intelligent. It is human code that pattern
matches and then throws the matches against the wall and hope something
sticks. It is left to the human to clean up.

At any rate, in this post:

/messages/by-id/75b33741-ee99-4524-b63a-edad21c1266d@aklaver.com

You where provided an answer, which is further extended here:

/messages/by-id/CAKAnmm+BBBaXGN2xPHhXywkwb72UWzinWu2wQ5WadcMw3_57rQ@mail.gmail.com

Best regards,
Mark Brady
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Friday, March 7, 2025 10:55 AM
*To:* mark bradley <markbradyju@outlook.com>
*Cc:* pgsql-general <pgsql-general@postgresql.org>
*Subject:* Re: Duplicate Key Values
On 3/7/25 06:34, mark bradley wrote:

This is what MS Copilot has to say about this apparent bug where
Postgres inserts extra rows violating a primary keys uniqueness constraint:

What to do?  I hesitate to just delete my tables and start over because
this error will reoccur.

Let the AI solve it.

Best regards,
Mark Brady

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

#16mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#15)
Re: Duplicate Key Values

An "interesting" effect of reindexing is that all the records that were dups in the nodes table were deleted, both copies.

Also, all rows having node_id as a foreign key in other tables were deleted, which means all rows in these tables were deleted.

Fortunately these are not huge tables. I will reenter the data, make a backup, and then try your further extended suggestions.

Best regards,
Mark Brady
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Friday, March 7, 2025 3:25 PM
To: mark bradley <markbradyju@outlook.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

On 3/7/25 11:47, mark bradley wrote:

Wouldn't that be nice 🙂

No, because you would end up with a mess.

My AI rant:

AI is neither artificial or intelligent. It is human code that pattern
matches and then throws the matches against the wall and hope something
sticks. It is left to the human to clean up.

At any rate, in this post:

/messages/by-id/75b33741-ee99-4524-b63a-edad21c1266d@aklaver.com

You where provided an answer, which is further extended here:

/messages/by-id/CAKAnmm+BBBaXGN2xPHhXywkwb72UWzinWu2wQ5WadcMw3_57rQ@mail.gmail.com

Best regards,
Mark Brady
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Friday, March 7, 2025 10:55 AM
*To:* mark bradley <markbradyju@outlook.com>
*Cc:* pgsql-general <pgsql-general@postgresql.org>
*Subject:* Re: Duplicate Key Values
On 3/7/25 06:34, mark bradley wrote:

This is what MS Copilot has to say about this apparent bug where
Postgres inserts extra rows violating a primary keys uniqueness constraint:

What to do? I hesitate to just delete my tables and start over because
this error will reoccur.

Let the AI solve it.

Best regards,
Mark Brady

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

#17Greg Sabino Mullane
greg@turnstep.com
In reply to: mark bradley (#16)
Re: Duplicate Key Values

On Tue, Mar 11, 2025 at 10:29 AM mark bradley <markbradyju@outlook.com>
wrote:

An "interesting" effect of reindexing is that all the records that were
dups in the nodes table were deleted, both copies.

Er...that's not just interesting, but alarming - if true. Can you show the
steps you took?

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#18Ron
ronljohnsonjr@gmail.com
In reply to: mark bradley (#16)
Re: Duplicate Key Values

Do you *currently* take regular backups?

We'd be glad to show you how to take regular logical backups.

On Tue, Mar 11, 2025 at 10:29 AM mark bradley <markbradyju@outlook.com>
wrote:

An "interesting" effect of reindexing is that all the records that were
dups in the nodes table were deleted, both copies.

Also, all rows having node_id as a foreign key in other tables were
deleted, which means all rows in these tables were deleted.

Fortunately these are not huge tables. I will reenter the data, make a
backup, and then try your further extended suggestions.

Best regards,
Mark Brady
*amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;*
------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Friday, March 7, 2025 3:25 PM
*To:* mark bradley <markbradyju@outlook.com>
*Cc:* pgsql-general <pgsql-general@postgresql.org>
*Subject:* Re: Duplicate Key Values

On 3/7/25 11:47, mark bradley wrote:

Wouldn't that be nice 🙂

No, because you would end up with a mess.

My AI rant:

AI is neither artificial or intelligent. It is human code that pattern
matches and then throws the matches against the wall and hope something
sticks. It is left to the human to clean up.

At any rate, in this post:

/messages/by-id/75b33741-ee99-4524-b63a-edad21c1266d@aklaver.com

You where provided an answer, which is further extended here:

/messages/by-id/CAKAnmm+BBBaXGN2xPHhXywkwb72UWzinWu2wQ5WadcMw3_57rQ@mail.gmail.com

Best regards,
Mark Brady
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@aklaver.com>
*Sent:* Friday, March 7, 2025 10:55 AM
*To:* mark bradley <markbradyju@outlook.com>
*Cc:* pgsql-general <pgsql-general@postgresql.org>
*Subject:* Re: Duplicate Key Values
On 3/7/25 06:34, mark bradley wrote:

This is what MS Copilot has to say about this apparent bug where
Postgres inserts extra rows violating a primary keys uniqueness

constraint:

What to do? I hesitate to just delete my tables and start over because
this error will reoccur.

Let the AI solve it.

Best regards,
Mark Brady

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#19Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#16)
Re: Duplicate Key Values

On 3/11/25 07:28, mark bradley wrote:

An "interesting" effect of reindexing is that all the records that were
dups in the nodes table were deleted, both copies.

I am trying to understand above.

Was there at least one row of each node_id left?

Also, all rows having node_id as a foreign key in other tables were
deleted, which means all rows in these tables were deleted.

Fortunately these are not huge tables.  I will reenter the data, make a
backup, and then try your further extended suggestions.

Best regards,
Mark Brady
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_

--
Adrian Klaver
adrian.klaver@aklaver.com

#20mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#19)
Re: Duplicate Key Values

The rows that were preserved in the nodes table were the ones that were not dups originally.

Best regards,
Mark Brady
amazon.com/author/markjbrady<https://amazon.com/author/markjbrady&gt;
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Tuesday, March 11, 2025 10:56 AM
To: mark bradley <markbradyju@outlook.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Duplicate Key Values

On 3/11/25 07:28, mark bradley wrote:

An "interesting" effect of reindexing is that all the records that were
dups in the nodes table were deleted, both copies.

I am trying to understand above.

Was there at least one row of each node_id left?

Also, all rows having node_id as a foreign key in other tables were
deleted, which means all rows in these tables were deleted.

Fortunately these are not huge tables. I will reenter the data, make a
backup, and then try your further extended suggestions.

Best regards,
Mark Brady
_amazon.com/author/markjbrady <https://amazon.com/author/markjbrady&gt;_

--
Adrian Klaver
adrian.klaver@aklaver.com

#21Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#20)
#22mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#21)
#23Greg Sabino Mullane
greg@turnstep.com
In reply to: mark bradley (#22)
#24Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#22)
#25mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#24)
#26Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#25)
#27mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#26)
#28Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#27)
#29Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Adrian Klaver (#28)
#30Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Adrian Klaver (#29)
#31mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#30)
#32Ron
ronljohnsonjr@gmail.com
In reply to: mark bradley (#31)
#33Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#31)
#34Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#31)
#35mark bradley
markbradyju@outlook.com
In reply to: Ron (#32)
#36Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#35)
#37mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#36)
#38mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#36)
#39Adrian Klaver
adrian.klaver@aklaver.com
In reply to: mark bradley (#37)
#40mark bradley
markbradyju@outlook.com
In reply to: Adrian Klaver (#39)