ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

Started by Tushar Takate · 7 months ago · 6 messages · bugs
#1 Tushar Takate
tushar11.takate@gmail.com

Hi Team,

I reviewed a couple of community threads in pgsql-bugs and
pgsql-general, but I was unable to determine whether this is a bug or
actual corruption.

*Details below:*

PostgreSQL version: 15.12
Disk type: RAID5
OS: RHEL 8.10

*Error/Issue:*
vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:
found xmin 4133102167 from before relfrozenxid 4151440783

2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11
02:28:49 UTC,122/46371006,0,ERROR,XX001,"found xmin 4133102167 from before
relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of
relation ""public.order""
2025-09-11 02:40:50.361 UTC,"prod_user_ap","live_order_us_db",2375672,"
127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-11 02:30:03
UTC,169/38875732,0,ERROR,XX001,"found xmin 4133102167 from before
relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of
relation ""public.order""","VACUUM (VERBOSE, ANALYZE)
public.order;",,,"vacuumdb","client backend",,-5528190995457849841

*One more thing/observation we saw in the PostgreSQL logs :*

The following message consistently appeared once a day during the past week

2025-09-10 23:33:14.469 UTC,,,157915,,68c21a46.268fb,3,,2025-09-10 23:31:18
UTC,45/49119328,0,WARNING,01000,"page is not marked all-visible but
visibility map bit is set in relation ""order"" page 5815453",,,,,"while
scanning block 5815453 of relation ""public.order""",,,,"","autovacuum
worker",,0

What specific condition or scenario is triggering this PostgreSQL
error? Can it be classified as a bug? If not, what’s a safe
and efficient way to resolve it without relying on a dump and restore,
particularly for large, mission-critical tables over 200GB?

-
Thanks & Regards,

Tushar.
LinkedIn : Tushar <https://www.linkedin.com/in/tushar-t-93660867/>
My-Blogs : Tushar Blogspot <http://tushar-postgresql.blogspot.in/>

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tushar Takate (#1)
Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

On Fri, 2025-09-12 at 22:09 +0530, Tushar Takate wrote:

> PostgreSQL version: 15.12
> Disk type: RAID5
> OS: RHEL 8.10
>
> Error/Issue :
>
> vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before relfrozenxid 4151440783
>
> 2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49 UTC,122/46371006,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""
> 2025-09-11 02:40:50.361 UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-11 02:30:03 UTC,169/38875732,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client backend",,-5528190995457849841

That is probably caused by a PostgreSQL bug; you can get rid of the message
by creating the "pg_surgery" extension and running

SELECT heap_force_freeze('public.order'::regclass, '{(5821149,5)}'::tid[]);

> One more thing/observation we saw in the PostgreSQL logs :
>
> The following message consistently appeared once a day during the past week
>
> 2025-09-10 23:33:14.469 UTC,,,157915,,68c21a46.268fb,3,,2025-09-10 23:31:18 UTC,45/49119328,0,WARNING,01000,"page is not marked all-visible but visibility map bit is set in relation ""order"" page 5815453",,,,,"while scanning block 5815453 of relation ""public.order""",,,,"","autovacuum worker",,0
>
> What specific condition or scenario is triggering this PostgreSQL error? Can it be classified
> as a bug? If not, what’s a safe and efficient way to resolve it without relying on a dump
> and restore, particularly for large, mission-critical tables over 200GB?

That is some kind of data corruption, perhaps caused by a bug, perhaps by
something else. The autovacuum run should fix that problem.

Yours,
Laurenz Albe

#3 Tushar Takate
tushar11.takate@gmail.com
In reply to: Laurenz Albe (#2)
Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

On Sat, Sep 13, 2025 at 2:40 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

> On Fri, 2025-09-12 at 22:09 +0530, Tushar Takate wrote:
> > PostgreSQL version: 15.12
> > Disk type: RAID5
> > OS: RHEL 8.10
> >
> > Error/Issue :
> >
> > vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before relfrozenxid 4151440783
> >
> > 2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49 UTC,122/46371006,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""
> > 2025-09-11 02:40:50.361 UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-11 02:30:03 UTC,169/38875732,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client backend",,-5528190995457849841

> That is probably caused by a PostgreSQL bug; you can get rid of the message

In which version can we expect a fix for this? Also, can you please help us
understand which specific condition or scenario triggers this
PostgreSQL error and skips freezing the xmin?

> by creating the "pg_surgery" extension and running
>
>   SELECT heap_force_freeze('public.order'::regclass, '{(5821149,5)}'::tid[]);

I agree we can run pg_surgery, but the question is how safe it is to run
for large and mission-critical tables over 200GB.
From the pg_surgery doc <https://www.postgresql.org/docs/current/pgsurgery.html>:
*These functions are unsafe by design and using them may corrupt (or
further corrupt) your database*

> > One more thing/observation we saw in the PostgreSQL logs :
> >
> > The following message consistently appeared once a day during the past week
> >
> > 2025-09-10 23:33:14.469 UTC,,,157915,,68c21a46.268fb,3,,2025-09-10 23:31:18 UTC,45/49119328,0,WARNING,01000,"page is not marked all-visible but visibility map bit is set in relation ""order"" page 5815453",,,,,"while scanning block 5815453 of relation ""public.order""",,,,"","autovacuum worker",,0
> >
> > What specific condition or scenario is triggering this PostgreSQL error? Can it be classified
> > as a bug? If not, what’s a safe and efficient way to resolve it without relying on a dump
> > and restore, particularly for large, mission-critical tables over 200GB?
>
> That is some kind of data corruption, perhaps caused by a bug, perhaps by
> something else. The autovacuum run should fix that problem.

This is something supporting data I have provided, before the issue, the
above WARNING was seen in db-logs for the same table.


#4 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tushar Takate (#3)
Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

On Sat, 2025-09-13 at 06:40 +0530, Tushar Takate wrote:

> On Sat, Sep 13, 2025 at 2:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Fri, 2025-09-12 at 22:09 +0530, Tushar Takate wrote:

> > > PostgreSQL version: 15.12
> > > Disk type: RAID5
> > > OS: RHEL 8.10
> > >
> > > Error/Issue :
> > >
> > > vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before relfrozenxid 4151440783
> > >
> > > 2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49 UTC,122/46371006,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""
> > > 2025-09-11 02:40:50.361 UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-11 02:30:03 UTC,169/38875732,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client backend",,-5528190995457849841

> > That is probably caused by a PostgreSQL bug; you can get rid of the message

> In which version can we expect a fix for this? Also, can you please help us understand
> which specific condition or scenario triggers this PostgreSQL error and skips
> freezing the xmin?

I *believe* there must be a bug that causes that problem, because I have seen that error
reported often enough that I don't think it can be attributed to hardware errors.
Unfortunately, I think that nobody knows how it happens, so we cannot fix it.
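For reference, the "from before" in the error message is decided with
PostgreSQL's circular 32-bit XID arithmetic (essentially what
TransactionIdPrecedes() does in the server). A rough Python sketch of that
comparison, not the actual server code, applied to the values from the log:

```python
# Sketch of PostgreSQL's circular XID comparison (TransactionIdPrecedes),
# applied to the values from the error message above. Plain Python for
# illustration only.

def xid_precedes(xid_a: int, xid_b: int) -> bool:
    """True if xid_a logically precedes xid_b under modulo-2^32 arithmetic."""
    diff = (xid_a - xid_b) & 0xFFFFFFFF
    if diff >= 0x80000000:          # interpret the difference as signed 32-bit
        diff -= 0x100000000
    return diff < 0                 # negative difference => xid_a is older

xmin = 4133102167
relfrozenxid = 4151440783

print(xid_precedes(xmin, relfrozenxid))   # True: xmin is "from before" relfrozenxid
print(relfrozenxid - xmin)                # 18338616
```

Under that arithmetic the stored xmin is about 18.3 million transactions
older than relfrozenxid, which is exactly the state VACUUM refuses to accept,
since every xmin older than relfrozenxid should already have been frozen.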

> > by creating the "pg_surgery" extension and running
> >
> >   SELECT heap_force_freeze('public.order'::regclass, '{(5821149,5)}'::tid[]);

> I agree we can run pg_surgery, but the question is how safe it is to run for large and mission-critical tables over 200GB.
> From the pg_surgery doc: These functions are unsafe by design and using them may corrupt (or further corrupt) your database

It is dangerous, and that has nothing to do with the size of the table.
If you do the wrong thing with that knife, you can cause more problems
than you fix.

> > > One more thing/observation we saw in the PostgreSQL logs :
> > >
> > > The following message consistently appeared once a day during the past week
> > >
> > > 2025-09-10 23:33:14.469 UTC,,,157915,,68c21a46.268fb,3,,2025-09-10 23:31:18 UTC,45/49119328,0,WARNING,01000,"page is not marked all-visible but visibility map bit is set in relation ""order"" page 5815453",,,,,"while scanning block 5815453 of relation ""public.order""",,,,"","autovacuum worker",,0
> > >
> > > What specific condition or scenario is triggering this PostgreSQL error? Can it be classified
> > > as a bug? If not, what’s a safe and efficient way to resolve it without relying on a dump
> > > and restore, particularly for large, mission-critical tables over 200GB?

> > That is some kind of data corruption, perhaps caused by a bug, perhaps by
> > something else.  The autovacuum run should fix that problem.

> This is something supporting data I have provided, before the issue, the above WARNING was seen in db-logs for the same table.

Sorry, I cannot parse that sentence.

Yours,
Laurenz Albe

#5 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Laurenz Albe (#4)
Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

Hi

On Sat, Sep 13, 2025 at 3:24 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

> On Sat, 2025-09-13 at 06:40 +0530, Tushar Takate wrote:
> > On Sat, Sep 13, 2025 at 2:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > > On Fri, 2025-09-12 at 22:09 +0530, Tushar Takate wrote:
> > > > PostgreSQL version: 15.12
> > > > Disk type: RAID5
> > > > OS: RHEL 8.10
> > > >
> > > > Error/Issue :
> > > >
> > > > vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before relfrozenxid 4151440783
> > > >
> > > > 2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49 UTC,122/46371006,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""
> > > > 2025-09-11 02:40:50.361 UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-11 02:30:03 UTC,169/38875732,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client backend",,-5528190995457849841
> > >
> > > That is probably caused by a PostgreSQL bug; you can get rid of the message
> >
> > In which version can we expect a fix for this? Also, can you please help us understand
> > which specific condition or scenario triggers this PostgreSQL error and skips
> > freezing the xmin?
>
> I *believe* there must be a bug that causes that problem, because I have seen that error
> reported often enough that I don't think it can be attributed to hardware errors.
> Unfortunately, I think that nobody knows how it happens, so we cannot fix it.
>
> > > by creating the "pg_surgery" extension and running
> > >
> > >   SELECT heap_force_freeze('public.order'::regclass, '{(5821149,5)}'::tid[]);
> >
> > I agree we can run pg_surgery, but the question is how safe it is to run
> > for large and mission-critical tables over 200GB.
> > From the pg_surgery doc: These functions are unsafe by design and using
> > them may corrupt (or further corrupt) your database
>
> It is dangerous, and that has nothing to do with the size of the table.
> If you do the wrong thing with that knife, you can cause more problems
> than you fix.

I got this error after using pg_repack and also following an upgrade. So
maybe there can be more factors.

Regards

Pavel

#6 Surya Poondla
suryapoondla4@gmail.com
In reply to: Tushar Takate (#1)
Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

Hi All,

Tushar and I work in the same org.

I was able to look at page 5821149 and gathered the following information:

live_order_us_db=# select * from pg_visibility('public.order', 5821149);
 all_visible | all_frozen | pd_all_visible
-------------+------------+----------------
 t           | t          | f
(1 row)

live_order_us_db=# SELECT t_ctid, t_xmin, t_xmax, t_infomask, t_infomask2
FROM heap_page_items(get_raw_page('public.order', 5821149))
WHERE lp = 5;
   t_ctid    |   t_xmin   | t_xmax | t_infomask | t_infomask2
-------------+------------+--------+------------+-------------
 (5821149,5) | 4133102167 |      0 |       2306 |           8
(1 row)

live_order_us_db=# SELECT t_ctid, raw_flags, combined_flags
FROM heap_page_items(get_raw_page('order', 5821149)),
LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2)
WHERE t_infomask IS NOT NULL OR t_infomask2 IS NOT NULL;
    t_ctid    |                                        raw_flags                                         |   combined_flags
--------------+------------------------------------------------------------------------------------------+--------------------
 (5821149,1)  | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}               | {HEAP_XMIN_FROZEN}
 (5821149,2)  | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}               | {HEAP_XMIN_FROZEN}
 (5821149,3)  | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}               | {HEAP_XMIN_FROZEN}
 (5821149,4)  | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,5)  | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}                                 | {}
 (5821149,7)  | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}               | {HEAP_XMIN_FROZEN}
 (5821149,9)  | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}               | {HEAP_XMIN_FROZEN}
 (5821149,10) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,11) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}               | {HEAP_XMIN_FROZEN}
 (5821149,13) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,14) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}               | {HEAP_XMIN_FROZEN}
 (5821149,15) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,16) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,17) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,18) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,19) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,20) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,21) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED}  | {HEAP_XMIN_FROZEN}
 (5821149,22) | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}               | {HEAP_XMIN_FROZEN}
(19 rows)

As we can see, for t_ctid (5821149,5) the t_infomask decimal value is 2306
(0x902 in hex), which corresponds to the raw flags
{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}. This particular
tuple has somehow missed the freezing action, while the other tuples on the
same page at different offsets (lp 3, 4, 14, 15, ...) are frozen.
Also, the all_visible and all_frozen flags for page 5821149 show as true.
We are wondering what caused the (5821149,5) tuple not to get frozen and,
more importantly, how relfrozenxid (4151440783) got a value bigger than the
t_xmin (4133102167).
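To make the flag decoding above reproducible, the t_infomask value can also
be decomposed outside the server. A small Python sketch using the bit values
from PostgreSQL's src/include/access/htup_details.h (only the bits relevant
here are listed; this is an illustration, not the pageinspect code):

```python
# Decode a heap tuple's t_infomask using bit definitions copied from
# PostgreSQL's src/include/access/htup_details.h (subset).

INFOMASK_FLAGS = {
    0x0001: "HEAP_HASNULL",
    0x0002: "HEAP_HASVARWIDTH",
    0x0004: "HEAP_HASEXTERNAL",
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0200: "HEAP_XMIN_INVALID",
    0x0400: "HEAP_XMAX_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
    0x1000: "HEAP_XMAX_IS_MULTI",
    0x2000: "HEAP_UPDATED",
}

# "Frozen" is not its own bit: it is both xmin hint bits set together.
HEAP_XMIN_FROZEN = 0x0100 | 0x0200  # 0x0300

def decode(infomask):
    """Return the names of all infomask bits set, in ascending bit order."""
    return [name for bit, name in sorted(INFOMASK_FLAGS.items()) if infomask & bit]

def xmin_frozen(infomask):
    """True only when both HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID are set."""
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN

# Tuple at offset 5: t_infomask = 2306 (0x902)
print(decode(2306))       # ['HEAP_HASVARWIDTH', 'HEAP_XMIN_COMMITTED', 'HEAP_XMAX_INVALID']
print(xmin_frozen(2306))  # False
```

This matches the pageinspect output: tuple (5821149,5) has only
HEAP_XMIN_COMMITTED set, so its combined_flags are empty, while the frozen
tuples on the page (infomask 0x0B02 and similar) carry both xmin hint bits.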

- Surya
