running ANALYZE results in => duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"
Hi,
I am running this version (official docker image):
PostgreSQL 13.11 (Debian 13.11-1.pgdg110+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
and one of my nightly jobs reported the following error yesterday when
running an "ANALYZE":
FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »pg_statistic_relid_att_inh_index«
Detail: Schlüssel »(starelid, staattnum, stainherit)=(2609, 4, f)« existiert bereits.
which should translate to something like:
ERROR: duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"
DETAIL: Key (starelid, staattnum, stainherit)=(2609, 4, f) already exists.
Does anyone have an idea what's wrong?
Maybe related (or maybe not?), but sometimes the ANALYZE fails with:
ERROR: attempted to delete invisible tuple
Both errors only happen now and then, so I don't have a reproducer, but
I am still curious what goes wrong when I run an "ANALYZE" after my
data import.
thanks for insights :)
kind regards
Torsten
On Wed, 2023-09-06 at 09:46 +0200, Torsten Krah wrote:
> I am running that one (official docker image)
>
> PostgreSQL 13.11 (Debian 13.11-1.pgdg110+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
>
> and one of my nightly jobs reported that error yesterday when running
> an "ANALYZE":
>
> FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »pg_statistic_relid_att_inh_index«
> Detail: Schlüssel »(starelid, staattnum, stainherit)=(2609, 4, f)« existiert bereits.
>
> which should translate to something like:
>
> ERROR: duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"
> DETAIL: Key (starelid, staattnum, stainherit)=(2609, 4, f) already exists.
>
> Anyone an idea what's wrong?
Yes: the system catalog pg_statistic has data corruption.
> Maybe (not?) related but sometimes the analyze does fail with:
>
> ERROR: attempted to delete invisible tuple

That also looks like data corruption, albeit a different kind.
> Both errors are only happening here and there - so I don't have a
> reproducer, but still I am curious what is wrong here with me running
> an "ANALYZE" after my data import.
To fix the "pg_statistic" error:
- schedule some downtime
- set "allow_system_table_mods = on"
- TRUNCATE pg_statistic;
- ANALYZE;
You are lucky that the corrupted table is one that holds data that can be rebuilt.
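As a sketch (not verbatim from the mail), the steps above would look roughly like this in a superuser psql session; allow_system_table_mods is settable per session by a superuser on this PostgreSQL version, and truncating a system catalog is a last-resort operation:

```sql
-- Run as a superuser during a maintenance window.
SET allow_system_table_mods = on;  -- permit modifying system catalogs
TRUNCATE pg_statistic;             -- discard all (possibly corrupted) statistics
ANALYZE;                           -- rebuild statistics for every table
```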
Yours,
Laurenz Albe
On Wednesday, 2023-09-06 at 10:21 +0200, Laurenz Albe wrote:
> You are lucky that the corrupted table is one that holds data that
> can be rebuilt.
It is a test instance / container anyway, which is deleted afterwards
and can be set up again as often as I want.
But how is that corruption happening? I mean, it is a docker image,
freshly fetched from the registry.
After that I start a container from that image, (re)import the data
(different tests mean different data, so the cycle of delete data /
import data / analyze the data happens quite often) and run my tests.
The OS does not report anything that would be related, no other tool or
system fails, and PostgreSQL itself does not fail on any other table
here; it always fails only on that ANALYZE step.
The whole process takes about 8-10 minutes; what could cause that
corruption in such a short timeframe?
regards
Torsten
On Wed, 2023-09-06 at 10:33 +0200, Torsten Krah wrote:
> On Wednesday, 2023-09-06 at 10:21 +0200, Laurenz Albe wrote:
> > You are lucky that the corrupted table is one that holds data that
> > can be rebuilt.
>
> It is a test instance / container anyway which is deleted afterwards
> and can be setup again as often as I want.
>
> But how is that corruption happening - I mean it is a docker image,
> freshly fetched from the registry.
>
> After that I am starting a container from that image, (re)importing
> data (different tests => different data so the cycle of delete data /
> import data / analyze the data happens quite often) and running my
> tests.
>
> The OS does not report anything which would relate nor does any other
> tool / system fail nor does PostgreSQL itself fail on any other table
> here - it always fails only on that analyze part.
>
> That happens all in about 8-10 minutes for the whole process - what is
> causing that corruption in that short timeframe here?
If you have a reproducible way to create the data corruption, that would
be very interesting. It might be a software bug.
Yours,
Laurenz Albe
On 06/09/2023 09:46 CEST Torsten Krah <krah.tm@gmail.com> wrote:
> I am running that one (official docker image)
>
> PostgreSQL 13.11 (Debian 13.11-1.pgdg110+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
Have you also tried with 13.12?
> and one of my nightly jobs reported that error yesterday when running
> an "ANALYZE":
>
> FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »pg_statistic_relid_att_inh_index«
> Detail: Schlüssel »(starelid, staattnum, stainherit)=(2609, 4, f)« existiert bereits.
>
> which should translate to something like:
>
> ERROR: duplicate key value violates unique constraint "pg_statistic_relid_att_inh_index"
> DETAIL: Key (starelid, staattnum, stainherit)=(2609, 4, f) already exists.
>
> Anyone an idea what's wrong?
>
> Maybe (not?) related but sometimes the analyze does fail with:
>
> ERROR: attempted to delete invisible tuple
>
> Both errors are only happening here and there - so I don't have a
> reproducer, but still I am curious what is wrong here with me running
> an "ANALYZE" after my data import.
Does the unique constraint violation always occur for the same row? OID 2609
is pg_description.
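As a side note (not from the original mail, just a diagnostic sketch): if it happens again, the conflicting catalog rows can be inspected directly. The key values below are taken from the error message, and ctid/xmin/xmax are PostgreSQL's standard system columns:

```sql
-- Show physical location and transaction visibility of the pg_statistic
-- rows behind the reported duplicate key (starelid 2609, staattnum 4):
SELECT ctid, xmin, xmax, starelid, staattnum, stainherit
FROM pg_statistic
WHERE starelid = 2609 AND staattnum = 4;
```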
--
Erik
On Wednesday, 2023-09-06 at 12:04 +0200, Erik Wienhold wrote:
> > I am running that one (official docker image)
> > PostgreSQL 13.11 (Debian 13.11-1.pgdg110+1) on x86_64-pc-linux-gnu,
> > compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
>
> Have you also tried with 13.12?
Yes, but it also happened sometimes on versions before 13.11 / 13.12
(I just ignored it until now because it happens so rarely).
> Does the unique constraint violation always occur for the same row?
> OID 2609 is pg_description.
As I don't have a reproducer yet (I did not track exact stats, but let's
say it runs fine 100-200 times and 1 or 2 of those runs fail with the
mentioned ANALYZE errors; it may even be less than that), I can't tell
you whether it always fails on that OID. I need to wait for it to happen
again and will report back here if it is the same; it may take some
time ;).
Torsten
> But how is that corruption happening - I mean it is a docker image,
> freshly fetched from the registry.
Hi Torsten,
Maybe you have to increase the "--stop-timeout" value (or
"stop_grace_period" in docker-compose):
https://github.com/docker-library/postgres/issues/544#issuecomment-455738848
docker run: "--stop-timeout  Timeout (in seconds) to stop a container"
https://docs.docker.com/engine/reference/commandline/run/
or
https://docs.docker.com/compose/compose-file/compose-file-v3/#stop_grace_period
And recommended in the Dockerfile:
https://github.com/docker-library/postgres/blob/master/Dockerfile-debian.template#L208
STOPSIGNAL SIGINT
# An additional setting that is recommended for all users regardless of
# this value is the runtime "--stop-timeout" (or your
# orchestrator/runtime's equivalent) for controlling how long to wait
# between sending the defined STOPSIGNAL and sending SIGKILL (which is
# likely to cause data corruption).
#
# The default in most runtimes (such as Docker) is 10 seconds, and the
# documentation at https://www.postgresql.org/docs/12/server-start.html
# notes that even 90 seconds may not be long enough in many instances.
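For illustration only (the 120-second value and the container options are example values, not from the thread), increasing the timeout on plain docker run might look like:

```shell
# Wait up to 120 seconds after sending STOPSIGNAL (SIGINT) before
# Docker falls back to SIGKILL, giving PostgreSQL time to shut down cleanly.
docker run --stop-timeout 120 -e POSTGRES_PASSWORD=secret -d postgres:13
```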
regards,
Imre
Torsten Krah <krah.tm@gmail.com> wrote (on Wed, 6 Sept 2023 at 14:45):
> On Wednesday, 2023-09-06 at 10:21 +0200, Laurenz Albe wrote:
> > You are lucky that the corrupted table is one that holds data that
> > can be rebuilt.
>
> It is a test instance / container anyway which is deleted afterwards
> and can be setup again as often as I want.
>
> But how is that corruption happening - I mean it is a docker image,
> freshly fetched from the registry.
>
> After that I am starting a container from that image, (re)importing
> data (different tests => different data so the cycle of delete data /
> import data / analyze the data happens quite often) and running my
> tests.
>
> The OS does not report anything which would relate nor does any other
> tool / system fail nor does PostgreSQL itself fail on any other table
> here - it always fails only on that analyze part.
>
> That happens all in about 8-10 minutes for the whole process - what is
> causing that corruption in that short timeframe here?
>
> regards
> Torsten
On Wednesday, 2023-09-06 at 20:42 +0200, Imre Samu wrote:
> Maybe you have to increase the "--stop-timeout" value
That is totally unrelated in my case; it is an anonymous volume anyway,
which gets created on start and deleted afterwards.
Torsten