reporting TID/table with corruption error

Started by Alvaro Herreraover 4 years ago12 messages
#1Alvaro Herrera
alvherre@alvh.no-ip.org
1 attachment(s)

A customer recently hit this error message:

ERROR: t_xmin is uncommitted in tuple to be updated

This was not very informative, so without any clues, we had to let it
go. But it did occur to me that we can improve this message so that we
know details such as the TID and the relation that caused the issue, so
that if it ever occurs again we can at least look at the WAL stream for
anything affecting the tuple, maybe it'd help to understand the problem.

So I propose the following patch.

--
Álvaro Herrera

Attachments:

report-tuple.patchtext/x-diff; charset=utf-8Download
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index beb8f20708..9befe012a9 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -424,7 +424,11 @@ tuple_lock_retry:
 					if (TransactionIdIsValid(SnapshotDirty.xmin))
 						ereport(ERROR,
 								(errcode(ERRCODE_DATA_CORRUPTED),
-								 errmsg_internal("t_xmin is uncommitted in tuple to be updated")));
+								 errmsg_internal("t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table \"%s\"",
+												 SnapshotDirty.xmin,
+												 ItemPointerGetBlockNumber(&tuple->t_self),
+												 ItemPointerGetOffsetNumber(&tuple->t_self),
+												 RelationGetRelationName(relation))));
 
 					/*
 					 * If tuple is being updated by other transaction then we
In reply to: Alvaro Herrera (#1)
Re: reporting TID/table with corruption error

On Thu, Aug 19, 2021 at 9:38 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

A customer recently hit this error message:

ERROR: t_xmin is uncommitted in tuple to be updated

This was not very informative, so without any clues, we had to let it
go. But it did occur to me that we can improve this message so that we
know details such as the TID and the relation that caused the issue, so
that if it ever occurs again we can at least look at the WAL stream for
anything affecting the tuple, maybe it'd help to understand the problem.

I think that this is a very good idea. Ideally this stuff would be
more standardized.

--
Peter Geoghegan

#3Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#2)
Re: reporting TID/table with corruption error

19 авг. 2021 г., в 21:44, Peter Geoghegan <pg@bowt.ie> написал(а):

I think that this is a very good idea. Ideally this stuff would be
more standardized.

+1
It would be great to see relation, block, offset, xmin\xmax and, probably, flags whenever ERRCODE_DATA_CORRUPTED\ERRCODE_INDEX_CORRUPTED is used. Iif it's possible to extract this information, of cause. This is needed especially in amcheck functions.

Best regards, Andrey Borodin.

#4Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#1)
Re: reporting TID/table with corruption error

19 авг. 2021 г., в 21:37, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):

A customer recently hit this error message:

ERROR: t_xmin is uncommitted in tuple to be updated

Sorry for one more message, it occurred to me only after sending previous one that I know this exact message.

We encountered this error twice:
1. When dealing with broken WAL-delta backups in WAL-G (a kind of fast incremental backup, the bug was fixed back in 2019) The page simply was of an old version.
2. As a result of broken page cache in patched linux kernel (clog was fsynced succesfully, but the page fsync was lost before restart)

Best regards, Andrey Borodin.

In reply to: Peter Geoghegan (#2)
Re: reporting TID/table with corruption error

Peter Geoghegan <pg@bowt.ie> writes:

On Thu, Aug 19, 2021 at 9:38 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

A customer recently hit this error message:

ERROR: t_xmin is uncommitted in tuple to be updated

This was not very informative, so without any clues, we had to let it
go. But it did occur to me that we can improve this message so that we
know details such as the TID and the relation that caused the issue, so
that if it ever occurs again we can at least look at the WAL stream for
anything affecting the tuple, maybe it'd help to understand the problem.

I think that this is a very good idea. Ideally this stuff would be
more standardized.

There are various functions for adding auxilliary fields to the error
report, e.g. errtable() for the schema and table names, but that only
seems to be reported to the client, not logged on the server.

Should we be logging auxiliary error fields? If not by default, maybe
by a higher log_error_verbosity level (possibly a new one between
default and verbose).

- ilmari

#6Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andrey Borodin (#3)
Re: reporting TID/table with corruption error

On Aug 19, 2021, at 10:17 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

It would be great to see relation, block, offset, xmin\xmax and, probably, flags whenever ERRCODE_DATA_CORRUPTED\ERRCODE_INDEX_CORRUPTED is used. Iif it's possible to extract this information, of cause. This is needed especially in amcheck functions.

blockno, offnum and attnum are already included in every result for amcheck functions over heap relations, though attnum may be null if the corruption is not specific to any particular column.

xmin, xmax and various flags may occur in the corruption message if they are relevant, but they are not always present.

There was some disagreement during the development of verify_heapam on this point. We went with the idea that the user could find and inspect the corrupt data with another tool if they had the (blockno, offnum, attnum) information. As such, it wasn't necessary to include all the data in the output.

It shouldn't be too complicated to have a second interface that returns all of the 23 byte main table tuple header information and also the 23 byte toast tuple header (when relevant) along with the corruption message. The guts of the corruption check would be shared between the two interfaces. I haven't tried writing a patch yet, but it seems the patch shouldn't be terribly complicated.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In reply to: Mark Dilger (#6)
Re: reporting TID/table with corruption error

On Thu, Aug 19, 2021 at 10:49 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

There was some disagreement during the development of verify_heapam on this point. We went with the idea that the user could find and inspect the corrupt data with another tool if they had the (blockno, offnum, attnum) information. As such, it wasn't necessary to include all the data in the output.

I find that kind of dubious. By definition you're never supposed to
see any verify_heapam errors. If you do end up seeing one, it
should be exceptional -- something that you'll learn from. High
verbosity makes a lot of sense here.

--
Peter Geoghegan

#8Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Peter Geoghegan (#7)
Re: reporting TID/table with corruption error

On Aug 19, 2021, at 10:57 AM, Peter Geoghegan <pg@bowt.ie> wrote:

High
verbosity makes a lot of sense here.

Works for me. We could create another function, "verify_heapam_full" perhaps, that returns additional columns matching those from pageinspect's heap_page_items():

OUT lp smallint,
OUT lp_off smallint,
OUT lp_flags smallint,
OUT lp_len smallint,
OUT t_xmin xid,
OUT t_xmax xid,
OUT t_field3 int4,
OUT t_ctid tid,
OUT t_infomask2 integer,
OUT t_infomask integer,
OUT t_hoff smallint,
OUT t_bits text,
OUT t_oid oid,
OUT t_data bytea

Should it also return the full page? That would be quite verbose (an extra 8k per row), but it could be fed into any of pageinspect's functions for further analysis.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Mark Dilger (#8)
Re: reporting TID/table with corruption error

19 авг. 2021 г., в 23:19, Mark Dilger <mark.dilger@enterprisedb.com> написал(а):

On Aug 19, 2021, at 10:57 AM, Peter Geoghegan <pg@bowt.ie> wrote:

High
verbosity makes a lot of sense here.

Works for me. We could create another function, "verify_heapam_full" perhaps, that returns additional columns matching those from pageinspect's heap_page_items():

Currently I'm mostly interested in index functions to investigate CIC bug.
I see 4 different cases for corruption checks:
1. Developer tackling a bug
2. Backup smoke test
3. DBA recovering corrupted data
4. Running system detects anomaly

In case 1 you want to find corruption and trace back events that lead to it. You need all the bits that can connect current state with events in the past.

In case 2 you want succinct check, that in case of fire provides initial information for case 3. Ideally you want a check that is symbiosis of "all indexed" check and heap check. Meanwhile, it's preferred that you can share heap scan between many index checks.

In case 3 you want to collect all corrupted data (find everything with same xmin\xmax, or on the same page, or with near xmin\xmax). In this case returning heap page right away would be quite useful.
Sometimes you want to create backup copy of the page to try some surgery. (create table backup_pages as select from verify_heapam_full())

In case 4 you want to alarm DBA and provide all the necessary information to start 3. Adding standardised corruption info to all ERRCODE_DATA_CORRUPTED\ERRCODE_INDEX_CORRUPTED would suffice. Also, when monitoring wakes you at night you want to know:
- How many tuples are corrupted?
- How long ago data was corrupted? Is corrupted data within PITR window yet?
- Where to seek a manual for recovery?
But I don't think we can have this logged in case of "ERROR: t_xmin is uncommitted in tuple to be updated"

Thanks!

Best regards, Andrey Borodin.

#10Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#1)
Re: reporting TID/table with corruption error

Pushed to all branches, thanks.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/

#11Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#1)
Re: reporting TID/table with corruption error

19 авг. 2021 г., в 21:37, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):

A customer recently hit this error message:

ERROR: t_xmin is uncommitted in tuple to be updated

Hi!

Currently I'm observing this on one of our production clusters. The problem occurs at random points in time, seems to be covered by retries on client's side and so far did not inflict any harm (except woken engineers).

Few facts:
0. PostgreSQL 12.9 (with some unrelated patches)
1. amcheck\heapcheck\pg_visibility never suspected the cluster and remain silent
2. I observe the problem ~once a day
3. The tuple seems to be updated in a high-contention concurrency trigger function, autovacuum keeks in ~20-30 seconds after the message in logs

[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:ERROR: t_xmin 696079792 is uncommitted in tuple (1419011,109) to be updated in table "????s_statistics"
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:CONTEXT: SQL statement "UPDATE ????_????s.????s_statistics os
SET ????_????_found_ts = COALESCE(os.????_????_found_ts, NOW()),
last_????_found_ts = NOW(),
num_????s = os.num_????s + 1
WHERE ????_id = NEW.????_id"
PL/pgSQL function statistics__update_from_new_????() line 3 at SQL statement
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:STATEMENT:
INSERT INTO ????_????s.????s_????s AS ????s

4. t_xmin is relatevely new, not ancient
5. pageinspect shows dead tuple after some time
6. no suspicious activity in logs nearby
7. vacuum (disable_page_skipping) and repack of indexes did not change anything

I suspect this can be relatively new concurrency stuff. At least I never saw this before on clusters with clean amcheck and heapcheck results.

Alvaro, did you observe this on binaries from August 13 minor release or older?

Thanks!

Best regards, Andrey Borodin.

#12Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andrey Borodin (#11)
Re: reporting TID/table with corruption error

On 2022-Jan-10, Andrey Borodin wrote:

3. The tuple seems to be updated in a high-contention concurrency
trigger function, autovacuum keeks in ~20-30 seconds after the message
in logs

Hmm, I bet this is related.

[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:ERROR: t_xmin 696079792 is uncommitted in tuple (1419011,109) to be updated in table "????s_statistics"
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:CONTEXT: SQL statement "UPDATE ????_????s.????s_statistics os
SET ????_????_found_ts = COALESCE(os.????_????_found_ts, NOW()),
last_????_found_ts = NOW(),
num_????s = os.num_????s + 1
WHERE ????_id = NEW.????_id"
PL/pgSQL function statistics__update_from_new_????() line 3 at SQL statement
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:STATEMENT:
INSERT INTO ????_????s.????s_????s AS ????s

4. t_xmin is relatevely new, not ancient
5. pageinspect shows dead tuple after some time
6. no suspicious activity in logs nearby
7. vacuum (disable_page_skipping) and repack of indexes did not change anything

I suspect this can be relatively new concurrency stuff. At least I never saw this before on clusters with clean amcheck and heapcheck results.

Ah. I've been thinking that it'd be some very old tuple that is in
trouble, but that seems to be proven false. I think we should examine
the affected tuples more closely while they're in the state that causes
the problem. Can you set up things so that pageinspect's
heap_page_items() is run on the broken page, before the problem
disappears? Maybe extract the page number from the log line, have
get_raw_page() store the page in a separate table, so that we can run
heap_page_items() at leisure later. I would also grep through
pg_waldump output to see what changes have been done to that tuple.

Maybe the reason the problem fixes itself is that something else deletes
the tuple.

Alvaro, did you observe this on binaries from August 13 minor release or older?

Well, the only reports I have of this problem are with the original
error message that didn't give any clues at to what the problem was or
where to look for it, so I don't know if the xmin was recent or not.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/