PL/pgSQL question about EXCEPTION clause & corrupt records
Hello,
We recently suffered a database crash which resulted in some corrupt
records.
I thought I would write a little PL script that would loop through all
the data and report any inconsistencies. However, I can't get it to work
properly.
For instance, if I run the following statement in pgAdmin:
SELECT * FROM f_gsxws_transaction WHERE gwta_number = 762513
I get the following message:
ERROR: missing chunk number 0 for toast value 8289525 in
pg_toast_5572299
So, as a test, I created a function that would just retrieve that one
record:
DECLARE
rcontent f_gsxws_transaction%ROWTYPE;
BEGIN
SELECT * INTO rcontent FROM f_gsxws_transaction where gwta_number =
762513;
RETURN rcontent;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Record 762513 is corrupt';
END;
Now, when I run this function, I have noticed two things:
1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.
2) The function never enters the EXCEPTION clause. Instead, when it hits
the RETURN command, it breaks and shows the same message as in pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.
Is it possible to check for these kind of errors with a PL script? Or is
there perhaps a better way to check for corrupt records in a database?
Best regards,
Nick Renders
On 14/2/20 2:39 μ.μ., Nick Renders wrote:
Hello,
We recently suffered a database crash which resulted in some corrupt records.
I thought I would write a little PL script that would loop through all the data and report any inconsistencies. However, I can't get it to work properly.
For instance, if I run the following statement in pgAdmin:
SELECT * FROM f_gsxws_transaction WHERE gwta_number = 762513
I get the following message:
ERROR: missing chunk number 0 for toast value 8289525 in pg_toast_5572299
So, as a test, I created a function that would just retrieve that one record:
DECLARE
rcontent f_gsxws_transaction%ROWTYPE;
BEGIN
SELECT * INTO rcontent FROM f_gsxws_transaction where gwta_number = 762513;
RETURN rcontent;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Record 762513 is corrupt';
END;Now, when I run this function, I have noticed two things:
1) The function has no problem executing the SELECT statement. It is only when "rcontents" is returned, that the function fails. This is a problem, because the ultimate goal is to loop through all
records and only return/alert something in case of an error.2) The function never enters the EXCEPTION clause. Instead, when it hits the RETURN command, it breaks and shows the same message as in pgAdmin: missing chunk number 0 for toast value 8289525 in
pg_toast_5572299.
Does the table have any PKs or UKs?
do something like
FOR vid IN SELECT <somepkid> FROM f_gsxws_transaction where gwta_number = 762513 ORDER BY <somepkid> LOOP
RAISE NOTICE 'examining row with <somepkid>= %',vid;
select * into rcontent FROM f_gsxws_transaction where <somepkid> = vid;
RAISE NOTICE 'content of row <somepkid>= % , is % ',vid,rcontent;
END LOOP;
Is it possible to check for these kind of errors with a PL script? Or is there perhaps a better way to check for corrupt records in a database?
Best regards,
Nick Renders
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
"Nick Renders" <postgres@arcict.com> writes:
I thought I would write a little PL script that would loop through all
the data and report any inconsistencies. However, I can't get it to work
properly.
...
1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.
2) The function never enters the EXCEPTION clause. Instead, when it hits
the RETURN command, it breaks and shows the same message as in pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.
I think what's happening there is that the function doesn't try to
dereference the value's TOAST pointer during SELECT INTO. It just stores
that pointer into a variable, and only sometime later when the actual
content of the value is demanded, do you see the error raised.
The solution to that is to do something that uses the contents of the
busted column right away while still inside the EXCEPTION block, perhaps
along the lines of "select md5(mycolumn) into local_variable from..."
A close reading of
https://www.postgresql.org/docs/current/storage-toast.html
would probably help you understand what's happening here.
regards, tom lane
On Feb 14, 2020, at 04:39, Nick Renders <postgres@arcict.com> wrote:
I get the following message:
ERROR: missing chunk number 0 for toast value 8289525 in pg_toast_5572299
What version of PostgreSQL are you running? I’ve seen this a number of times the past couple years; curious if the lurking bug is still observed in latest versions.
-Jeremy
Sent from my TI-83
The problem is that I don't know which column is corrupt. But I found a
solution: by simply copying the record into another variable, the values
are parsed and the TOAST errors are thrown.
In case anyone's interested, here's my code, based on an example from
http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html
DO $f$
DECLARE
rContent1 record;
rContent2 record;
iCounter integer DEFAULT 1;
iValue integer;
pTableName varchar := 'f_gsxws_transaction';
pFieldName varchar := 'gwta_number';
BEGIN
FOR iValue IN EXECUTE 'SELECT ' || pFieldName || ' FROM ' ||
pTableName::regclass || ' ORDER BY ' || pFieldName LOOP
BEGIN
EXECUTE 'SELECT * FROM ' || pTableName::regclass || ' WHERE ' ||
pFieldName || ' = $1'
INTO rContent1
USING iValue;
rContent2 := rContent1;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'data for %.% % is corrupt', pTableName, pFieldName,
iValue;
END;
IF iCounter % 100000 = 0 THEN
RAISE NOTICE '% % records checked', iCounter, pTableName;
END IF;
iCounter := iCounter+1;
END LOOP;
END;
$f$;
Cheers,
Nick
On 14 Feb 2020, at 16:14, Tom Lane wrote:
Show quoted text
"Nick Renders" <postgres@arcict.com> writes:
I thought I would write a little PL script that would loop through
all
the data and report any inconsistencies. However, I can't get it to
work
properly.
...
1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.
2) The function never enters the EXCEPTION clause. Instead, when it
hits
the RETURN command, it breaks and shows the same message as in
pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.I think what's happening there is that the function doesn't try to
dereference the value's TOAST pointer during SELECT INTO. It just
stores
that pointer into a variable, and only sometime later when the actual
content of the value is demanded, do you see the error raised.The solution to that is to do something that uses the contents of the
busted column right away while still inside the EXCEPTION block,
perhaps
along the lines of "select md5(mycolumn) into local_variable from..."A close reading of
https://www.postgresql.org/docs/current/storage-toast.html
would probably help you understand what's happening here.
regards, tom lane
Hi Jeremy,
This happend on PostgreSQL v9.6 which crashed 2 weeks ago.
Since then we have upgraded and restored our server, but my example is
from the older, corrupt database.
Nick
On 15 Feb 2020, at 5:30, Jeremy Schneider wrote:
Show quoted text
On Feb 14, 2020, at 04:39, Nick Renders <postgres@arcict.com> wrote:
I get the following message:
ERROR: missing chunk number 0 for toast value 8289525 in
pg_toast_5572299What version of PostgreSQL are you running? I’ve seen this a number
of times the past couple years; curious if the lurking bug is still
observed in latest versions.-Jeremy
Sent from my TI-83
FWIW, Bertrand blogged an even faster way to do this about a month ago - using pageinspect and processing blocks instead of rows
-J
Sent from my TI-83
Show quoted text
On Feb 17, 2020, at 03:32, Nick Renders <postgres@arcict.com> wrote:
The problem is that I don't know which column is corrupt. But I found a solution: by simply copying the record into another variable, the values are parsed and the TOAST errors are thrown.
In case anyone's interested, here's my code, based on an example from http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html
DO $f$
DECLARE
rContent1 record;
rContent2 record;
iCounter integer DEFAULT 1;
iValue integer;
pTableName varchar := 'f_gsxws_transaction';
pFieldName varchar := 'gwta_number';
BEGIN
FOR iValue IN EXECUTE 'SELECT ' || pFieldName || ' FROM ' || pTableName::regclass || ' ORDER BY ' || pFieldName LOOP
BEGIN
EXECUTE 'SELECT * FROM ' || pTableName::regclass || ' WHERE ' || pFieldName || ' = $1'
INTO rContent1
USING iValue;
rContent2 := rContent1;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'data for %.% % is corrupt', pTableName, pFieldName, iValue;
END;
IF iCounter % 100000 = 0 THEN
RAISE NOTICE '% % records checked', iCounter, pTableName;
END IF;
iCounter := iCounter+1;
END LOOP;
END;
$f$;Cheers,
Nick
On 14 Feb 2020, at 16:14, Tom Lane wrote:
"Nick Renders" <postgres@arcict.com> writes:
I thought I would write a little PL script that would loop through all
the data and report any inconsistencies. However, I can't get it to work
properly.
...
1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.
2) The function never enters the EXCEPTION clause. Instead, when it hits
the RETURN command, it breaks and shows the same message as in pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.I think what's happening there is that the function doesn't try to
dereference the value's TOAST pointer during SELECT INTO. It just stores
that pointer into a variable, and only sometime later when the actual
content of the value is demanded, do you see the error raised.The solution to that is to do something that uses the contents of the
busted column right away while still inside the EXCEPTION block, perhaps
along the lines of "select md5(mycolumn) into local_variable from..."A close reading of
https://www.postgresql.org/docs/current/storage-toast.html
would probably help you understand what's happening here.
regards, tom lane
On 2/17/20 08:23, Jeremy Schneider wrote:
FWIW, Bertrand blogged an even faster way to do this about a month ago -
using pageinspect and processing blocks instead of rowsOn Feb 17, 2020, at 03:32, Nick Renders <postgres@arcict.com> wrote:
The problem is that I don't know which column is corrupt. But I found
a solution: by simply copying the record into another variable, the
values are parsed and the TOAST errors are thrown.In case anyone's interested, here's my code, based on an example from
http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html
Apologies - sent that slightly rushed email while I was taking the bus
to the office this morning and linked the wrong blog post :D
Here's the right one, showing a much faster way to identify which tuple
links to a bad toast row:
-Jeremy