PL/pgSQL question about EXCEPTION clause & corrupt records

Started by Nick Rendersabout 6 years ago8 messagesgeneral
Jump to latest
#1Nick Renders
postgres@arcict.com

Hello,

We recently suffered a database crash which resulted in some corrupt
records.

I thought I would write a little PL script that would loop through all
the data and report any inconsistencies. However, I can't get it to work
properly.

For instance, if I run the following statement in pgAdmin:

SELECT * FROM f_gsxws_transaction WHERE gwta_number = 762513

I get the following message:

ERROR: missing chunk number 0 for toast value 8289525 in
pg_toast_5572299

So, as a test, I created a function that would just retrieve that one
record:

DECLARE
rcontent f_gsxws_transaction%ROWTYPE;
BEGIN
SELECT * INTO rcontent FROM f_gsxws_transaction where gwta_number =
762513;
RETURN rcontent;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Record 762513 is corrupt';
END;

Now, when I run this function, I have noticed two things:

1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.

2) The function never enters the EXCEPTION clause. Instead, when it hits
the RETURN command, it breaks and shows the same message as in pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.

Is it possible to check for these kind of errors with a PL script? Or is
there perhaps a better way to check for corrupt records in a database?

Best regards,

Nick Renders

#2Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: Nick Renders (#1)
Re: PL/pgSQL question about EXCEPTION clause & corrupt records

On 14/2/20 2:39 μ.μ., Nick Renders wrote:

Hello,

We recently suffered a database crash which resulted in some corrupt records.

I thought I would write a little PL script that would loop through all the data and report any inconsistencies. However, I can't get it to work properly.

For instance, if I run the following statement in pgAdmin:

    SELECT * FROM f_gsxws_transaction WHERE gwta_number = 762513

I get the following message:

    ERROR:  missing chunk number 0 for toast value 8289525 in pg_toast_5572299

So, as a test, I created a function that would just retrieve that one record:

    DECLARE
        rcontent f_gsxws_transaction%ROWTYPE;
    BEGIN
        SELECT * INTO rcontent FROM f_gsxws_transaction where gwta_number = 762513;
        RETURN rcontent;
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'Record 762513 is corrupt';
    END;

Now, when I run this function, I have noticed two things:

1) The function has no problem executing the SELECT statement. It is only when "rcontents" is returned, that the function fails. This is a problem, because the ultimate goal is to loop through all
records and only return/alert something in case of an error.

2) The function never enters the EXCEPTION clause. Instead, when it hits the RETURN command, it breaks and shows the same message as in pgAdmin: missing chunk number 0 for toast value 8289525 in
pg_toast_5572299.

Does the table have any PKs or UKs?
do something like

FOR vid IN SELECT <somepkid> FROM f_gsxws_transaction where gwta_number = 762513 ORDER BY <somepkid> LOOP
    RAISE NOTICE 'examining row with <somepkid>= %',vid;
    select * into rcontent FROM f_gsxws_transaction where <somepkid> = vid;
    RAISE NOTICE 'content of row <somepkid>= % , is % ',vid,rcontent;
END LOOP;

Is it possible to check for these kind of errors with a PL script? Or is there perhaps a better way to check for corrupt records in a database?

Best regards,

Nick Renders

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nick Renders (#1)
Re: PL/pgSQL question about EXCEPTION clause & corrupt records

"Nick Renders" <postgres@arcict.com> writes:

I thought I would write a little PL script that would loop through all
the data and report any inconsistencies. However, I can't get it to work
properly.
...
1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.
2) The function never enters the EXCEPTION clause. Instead, when it hits
the RETURN command, it breaks and shows the same message as in pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.

I think what's happening there is that the function doesn't try to
dereference the value's TOAST pointer during SELECT INTO. It just stores
that pointer into a variable, and only sometime later when the actual
content of the value is demanded, do you see the error raised.

The solution to that is to do something that uses the contents of the
busted column right away while still inside the EXCEPTION block, perhaps
along the lines of "select md5(mycolumn) into local_variable from..."

A close reading of

https://www.postgresql.org/docs/current/storage-toast.html

would probably help you understand what's happening here.

regards, tom lane

#4Jeremy Schneider
schneider@ardentperf.com
In reply to: Nick Renders (#1)
Re: PL/pgSQL question about EXCEPTION clause & corrupt records

On Feb 14, 2020, at 04:39, Nick Renders <postgres@arcict.com> wrote:

I get the following message:

ERROR: missing chunk number 0 for toast value 8289525 in pg_toast_5572299

What version of PostgreSQL are you running? I’ve seen this a number of times the past couple years; curious if the lurking bug is still observed in latest versions.

-Jeremy

Sent from my TI-83

#5Nick Renders
postgres@arcict.com
In reply to: Tom Lane (#3)
Re: PL/pgSQL question about EXCEPTION clause & corrupt records

The problem is that I don't know which column is corrupt. But I found a
solution: by simply copying the record into another variable, the values
are parsed and the TOAST errors are thrown.

In case anyone's interested, here's my code, based on an example from
http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html

DO $f$
DECLARE
rContent1 record;
rContent2 record;
iCounter integer DEFAULT 1;
iValue integer;
pTableName varchar := 'f_gsxws_transaction';
pFieldName varchar := 'gwta_number';
BEGIN
FOR iValue IN EXECUTE 'SELECT ' || pFieldName || ' FROM ' ||
pTableName::regclass || ' ORDER BY ' || pFieldName LOOP
BEGIN
EXECUTE 'SELECT * FROM ' || pTableName::regclass || ' WHERE ' ||
pFieldName || ' = $1'
INTO rContent1
USING iValue;
rContent2 := rContent1;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'data for %.% % is corrupt', pTableName, pFieldName,
iValue;
END;
IF iCounter % 100000 = 0 THEN
RAISE NOTICE '% % records checked', iCounter, pTableName;
END IF;
iCounter := iCounter+1;
END LOOP;
END;
$f$;

Cheers,

Nick

On 14 Feb 2020, at 16:14, Tom Lane wrote:

Show quoted text

"Nick Renders" <postgres@arcict.com> writes:

I thought I would write a little PL script that would loop through
all
the data and report any inconsistencies. However, I can't get it to
work
properly.
...
1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.
2) The function never enters the EXCEPTION clause. Instead, when it
hits
the RETURN command, it breaks and shows the same message as in
pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.

I think what's happening there is that the function doesn't try to
dereference the value's TOAST pointer during SELECT INTO. It just
stores
that pointer into a variable, and only sometime later when the actual
content of the value is demanded, do you see the error raised.

The solution to that is to do something that uses the contents of the
busted column right away while still inside the EXCEPTION block,
perhaps
along the lines of "select md5(mycolumn) into local_variable from..."

A close reading of

https://www.postgresql.org/docs/current/storage-toast.html

would probably help you understand what's happening here.

regards, tom lane

#6Nick Renders
postgres@arcict.com
In reply to: Jeremy Schneider (#4)
Re: PL/pgSQL question about EXCEPTION clause & corrupt records

Hi Jeremy,

This happend on PostgreSQL v9.6 which crashed 2 weeks ago.
Since then we have upgraded and restored our server, but my example is
from the older, corrupt database.

Nick

On 15 Feb 2020, at 5:30, Jeremy Schneider wrote:

Show quoted text

On Feb 14, 2020, at 04:39, Nick Renders <postgres@arcict.com> wrote:

I get the following message:

ERROR: missing chunk number 0 for toast value 8289525 in
pg_toast_5572299

What version of PostgreSQL are you running? I’ve seen this a number
of times the past couple years; curious if the lurking bug is still
observed in latest versions.

-Jeremy

Sent from my TI-83

#7Jeremy Schneider
schneider@ardentperf.com
In reply to: Nick Renders (#5)
Re: PL/pgSQL question about EXCEPTION clause & corrupt records

FWIW, Bertrand blogged an even faster way to do this about a month ago - using pageinspect and processing blocks instead of rows

https://bdrouvot.wordpress.com/2020/01/18/retrieve-postgresql-variable-length-storage-information-thanks-to-pageinspect/

-J

Sent from my TI-83

Show quoted text

On Feb 17, 2020, at 03:32, Nick Renders <postgres@arcict.com> wrote:

The problem is that I don't know which column is corrupt. But I found a solution: by simply copying the record into another variable, the values are parsed and the TOAST errors are thrown.

In case anyone's interested, here's my code, based on an example from http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html

DO $f$
DECLARE
rContent1 record;
rContent2 record;
iCounter integer DEFAULT 1;
iValue integer;
pTableName varchar := 'f_gsxws_transaction';
pFieldName varchar := 'gwta_number';
BEGIN
FOR iValue IN EXECUTE 'SELECT ' || pFieldName || ' FROM ' || pTableName::regclass || ' ORDER BY ' || pFieldName LOOP
BEGIN
EXECUTE 'SELECT * FROM ' || pTableName::regclass || ' WHERE ' || pFieldName || ' = $1'
INTO rContent1
USING iValue;
rContent2 := rContent1;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'data for %.% % is corrupt', pTableName, pFieldName, iValue;
END;
IF iCounter % 100000 = 0 THEN
RAISE NOTICE '% % records checked', iCounter, pTableName;
END IF;
iCounter := iCounter+1;
END LOOP;
END;
$f$;

Cheers,

Nick

On 14 Feb 2020, at 16:14, Tom Lane wrote:

"Nick Renders" <postgres@arcict.com> writes:

I thought I would write a little PL script that would loop through all
the data and report any inconsistencies. However, I can't get it to work
properly.
...
1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.
2) The function never enters the EXCEPTION clause. Instead, when it hits
the RETURN command, it breaks and shows the same message as in pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.

I think what's happening there is that the function doesn't try to
dereference the value's TOAST pointer during SELECT INTO. It just stores
that pointer into a variable, and only sometime later when the actual
content of the value is demanded, do you see the error raised.

The solution to that is to do something that uses the contents of the
busted column right away while still inside the EXCEPTION block, perhaps
along the lines of "select md5(mycolumn) into local_variable from..."

A close reading of

https://www.postgresql.org/docs/current/storage-toast.html

would probably help you understand what's happening here.

regards, tom lane

#8Jeremy Schneider
schneider@ardentperf.com
In reply to: Jeremy Schneider (#7)
Re: PL/pgSQL question about EXCEPTION clause & corrupt records

On 2/17/20 08:23, Jeremy Schneider wrote:

FWIW, Bertrand blogged an even faster way to do this about a month ago -
using pageinspect and processing blocks instead of rows

https://bdrouvot.wordpress.com/2020/01/18/retrieve-postgresql-variable-length-storage-information-thanks-to-pageinspect/

On Feb 17, 2020, at 03:32, Nick Renders <postgres@arcict.com> wrote:

The problem is that I don't know which column is corrupt. But I found
a solution: by simply copying the record into another variable, the
values are parsed and the TOAST errors are thrown.

In case anyone's interested, here's my code, based on an example from
http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html

Apologies - sent that slightly rushed email while I was taking the bus
to the office this morning and linked the wrong blog post :D

Here's the right one, showing a much faster way to identify which tuple
links to a bad toast row:

https://bdrouvot.wordpress.com/2020/01/04/get-toast-chunk_id-from-the-user-table-tuples-or-from-the-toast-index-thanks-to-pageinspect/

-Jeremy

--
http://about.me/jeremy_schneider