running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

Started by Torsten Krahover 2 years ago8 messagesgeneral

krah.tm@gmail.com

over 2 years ago

Hi,

I am running that one (official docker image)

PostgreSQL 13.11 (Debian 13.11-1.pgdg110+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

and one of my nightly jobs reported that error yesterday when running
an "ANALYZE":

FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »pg_statistic_relid_att_inh_index«
Detail: Schlüssel »(starelid, staattnum, stainherit)=(2609, 4, f)« existiert bereits.

which should translate to something like:

ERROR: duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

DETAIL: Key (starelid, staattnum, stainherit)=(2609, 4, f) already exists.

Anyone an idea what's wrong?

Maybe (not?) related but sometimes the analyze does fail with:

ERROR: attempted to delete invisible tuple

Both errors are only happening here and there - so I don't have a
reproducer, but still I am curious what is wrong here with me running
an "ANALYZE" after my data import.

thanks for insights :)

kind regards

Torsten

Laurenz Albe

laurenz.albe@cybertec.at

over 2 years ago

In reply to: Torsten Krah (#1)

Re: running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

On Wed, 2023-09-06 at 09:46 +0200, Torsten Krah wrote:

I am running that one (official docker image)

PostgreSQL 13.11 (Debian 13.11-1.pgdg110+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

and one of my nightly jobs reported that error yesterday when running
an "ANALYZE":

FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »pg_statistic_relid_att_inh_index«
Detail: Schlüssel »(starelid, staattnum, stainherit)=(2609, 4, f)« existiert bereits.

which should translate to something like:

ERROR: duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

DETAIL: Key (starelid, staattnum, stainherit)=(2609, 4, f) already exists.

Anyone an idea what's wrong?

Yes: the metadata table pg_statistic has data corruption.

Maybe (not?) related but sometimes the analyze does fail with:

ERROR: attempted to delete invisible tuple

That also looks like data corrupton, albeit different one.

Both errors are only happening here and there - so I don't have a
reproducer, but still I am curious what is wrong here with me running
an "ANALYZE" after my data import.

To fix the "pg_statistic" error:

- take down time

- set "allow_system_mods = on"

- TRUNCATE pg_statistic;

- ANALYZE;

You are lucky that the corrupted table is one that holds data that can be rebuilt.

Yours,
Laurenz Albe

Torsten Krah

krah.tm@gmail.com

over 2 years ago

In reply to: Laurenz Albe (#2)

Re: running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

Am Mittwoch, dem 06.09.2023 um 10:21 +0200 schrieb Laurenz Albe:

You are lucky that the corrupted table is one that holds data that
can be rebuilt.

It is a test instance / container anyway which is deleted afterwards
and can be setup again as often as I want.

But how is that corruption happening - I mean it is a docker image,
freshly fetched from the registry.

After that I am starting a container from that image, (re)importing
data (different tests => different data so the cycle of delete data /
import data / analyze the data happens quite often) and running my
tests.
The OS does not report anything which would relate nor does any other
tool / system fail nor does postgresl itself fail on any other table
here - it always fails only on that analyze part.

That happens all in about 8-10 minutes for the whole process - what is
causing that corruption in that short timeframe here?

regards

Torsten

Laurenz Albe

laurenz.albe@cybertec.at

over 2 years ago

In reply to: Torsten Krah (#3)

Re: running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

On Wed, 2023-09-06 at 10:33 +0200, Torsten Krah wrote:

Am Mittwoch, dem 06.09.2023 um 10:21 +0200 schrieb Laurenz Albe:

You are lucky that the corrupted table is one that holds data that
can be rebuilt.

It is a test instance / container anyway which is deleted afterwards
and can be setup again as often as I want.

But how is that corruption happening - I mean it is a docker image,
freshly fetched from the registry.

After that I am starting a container from that image, (re)importing
data (different tests => different data so the cycle of delete data /
import data / analyze the data happens quite often) and running my
tests.
The OS does not report anything which would relate nor does any other
tool / system fail nor does postgresl itself fail on any other table
here - it always fails only on that analyze part.

That happens all in about 8-10 minutes for the whole process - what is
causing that corruption in that short timeframe here?

If you have a reproducible way to create the data corruption, that would
be very interesting. It micht be a software bug.

Yours,
Laurenz Albe

Erik Wienhold

ewie@ewie.name

over 2 years ago

In reply to: Torsten Krah (#1)

Re: running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

On 06/09/2023 09:46 CEST Torsten Krah <krah.tm@gmail.com> wrote:

I am running that one (official docker image)

PostgreSQL 13.11 (Debian 13.11-1.pgdg110+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

Have you also tried with 13.12?

and one of my nightly jobs reported that error yesterday when running
an "ANALYZE":

FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »pg_statistic_relid_att_inh_index«
Detail: Schlüssel »(starelid, staattnum, stainherit)=(2609, 4, f)« existiert bereits.

which should translate to something like:

ERROR: duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

DETAIL: Key (starelid, staattnum, stainherit)=(2609, 4, f) already exists.

Anyone an idea what's wrong?

Maybe (not?) related but sometimes the analyze does fail with:

ERROR: attempted to delete invisible tuple

Both errors are only happening here and there - so I don't have a
reproducer, but still I am curious what is wrong here with me running
an "ANALYZE" after my data import.

Does the unique constraint violation always occur for the same row? OID 2609
is pg_description.

--
Erik

Torsten Krah

krah.tm@gmail.com

over 2 years ago

In reply to: Erik Wienhold (#5)

Re: running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

Am Mittwoch, dem 06.09.2023 um 12:04 +0200 schrieb Erik Wienhold:

I am running that one (official docker image)

PostgreSQL 13.11 (Debian 13.11-1.pgdg110+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

Have you also tried with 13.12?

Yes, but it did also happen on previous versions before 13.11 / 13.12
sometimes (I just ignored it until now because it happens so rarely).

Does the unique constraint violation always occur for the same row?
OID 2609
is pg_description.

As I don't have a reproducer yet (I did not track stats but lets say it
runs fine for 100 / 200 times and 1 or 2 of those are failing with
those mentioned analyze errors - it may even be less than that) I can't
tell you if it fails always over that OID - I need to wait for it to
happen again, I will report here if it is the same - may take some time
;).

Torsten

Imre Samu

pella.samu@gmail.com

over 2 years ago

In reply to: Torsten Krah (#3)

Re: running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

But how is that corruption happening - I mean it is a docker image,
freshly fetched from the registry.

Hi Torsten,

Maybe you have to increase the "*--stop-timeout" value ; ( or
"*stop_grace_period"
in docker-compose )
https://github.com/docker-library/postgres/issues/544#issuecomment-455738848

*docker run: **" --stop-timeout Timeout (in seconds) to stop a container "*
https://docs.docker.com/engine/reference/commandline/run/
or
https://docs.docker.com/compose/compose-file/compose-file-v3/#stop_grace_period

*STOPSIGNAL SIGINT## An additional setting that is recommended for all
users regardless of this# value is the runtime "--stop-timeout" (or your
orchestrator/runtime's# equivalent) for controlling how long to wait
between sending the defined# STOPSIGNAL and sending SIGKILL (which is
likely to cause data corruption).## The default in most runtimes (such as
Docker) is 10 seconds, and the# documentation at
https://www.postgresql.org/docs/12/server-start.html
<https://www.postgresql.org/docs/12/server-start.html> notes# that even 90
seconds may not be long enough in many instances.*

regards,
Imre

Torsten Krah <krah.tm@gmail.com> ezt írta (időpont: 2023. szept. 6., Sze,
14:45):

Show quoted text

Am Mittwoch, dem 06.09.2023 um 10:21 +0200 schrieb Laurenz Albe:

You are lucky that the corrupted table is one that holds data that
can be rebuilt.

It is a test instance / container anyway which is deleted afterwards
and can be setup again as often as I want.

But how is that corruption happening - I mean it is a docker image,
freshly fetched from the registry.

After that I am starting a container from that image, (re)importing
data (different tests => different data so the cycle of delete data /
import data / analyze the data happens quite often) and running my
tests.
The OS does not report anything which would relate nor does any other
tool / system fail nor does postgresl itself fail on any other table
here - it always fails only on that analyze part.

That happens all in about 8-10 minutes for the whole process - what is
causing that corruption in that short timeframe here?

regards

Torsten

Torsten Krah

krah.tm@gmail.com

over 2 years ago

In reply to: Imre Samu (#7)

Re: running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"

Am Mittwoch, dem 06.09.2023 um 20:42 +0200 schrieb Imre Samu:

Maybe you have to increase the "*--stop-timeout" value

That is totally unrelated in my case, it is an anonymous volume anyway
which gets created on start and deleted afterwards.

Torsten