ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Started by ADSJ (Adam Sjøgren)almost 9 years ago58 messagesgeneral
Jump to latest
#1ADSJ (Adam Sjøgren)
adsj@novozymes.com

Our database has started reporting errors like this:

2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
...
2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630

(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.

These went away, but the next day we got similar errors from another
table:

2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100
...
2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100

(Only 4 this time) pg_toast_10920100 corresponds to a table with holds
around 320 million rows (these are our two large tables).

The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:

2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100

and today the logs have 65 lines, ending with these:

2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100
2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100

The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).

We are updating rows in the database a lot/continuously.

There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.

We are running PostgreSQL 9.3.14 currently.

The only thing I could see in the release notes since 9.3.14 that might
be related is this:

"* Avoid very-low-probability data corruption due to testing tuple
visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
Tom Lane)"

Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?

We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.

We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.

Best regards,

Adam

--
"Lägg ditt liv i min hand Adam Sjøgren
Sälj din själ till ett band" adsj@novozymes.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: ADSJ (Adam Sjøgren) (#1)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:

Our database has started reporting errors like this:

2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
...
2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630

(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.

These went away, but the next day we got similar errors from another
table:

2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100
...
2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100

(Only 4 this time) pg_toast_10920100 corresponds to a table with holds
around 320 million rows (these are our two large tables).

The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:

2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100

and today the logs have 65 lines, ending with these:

2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100
2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100

First try to find which tables those toast relations refer to :
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?

Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;

also VACUUM the above tables.

You might want to write a function which iterates over the damaged table's rows in order to identify the damaged row(s). And then do some good update to create a new version.

The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).

We are updating rows in the database a lot/continuously.

There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.

We are running PostgreSQL 9.3.14 currently.

The only thing I could see in the release notes since 9.3.14 that might
be related is this:

"* Avoid very-low-probability data corruption due to testing tuple
visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
Tom Lane)"

Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?

We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.

We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.

Best regards,

Adam

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Harry Ambrose
harry.ambrose@gmail.com
In reply to: Achilleas Mantzios (#2)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Hi,

I too have been experiencing this with a busy PostgreSQL instance.

I have been following the updates to the 9.4 branch hoping a fix will appear, but sadly no luck yet. I have manually replicated the issue on 9.4.4, 9.4.10 and 9.4.12. My replication steps are:

BEGIN;
CREATE TABLE x (id BIGSERIAL PRIMARY KEY, payload1 VARCHAR, payload2 VARCHAR, payload3 VARCHAR, payload4 BIGINT, payload5 BIGINT);
/* Repeat until 2,000,000 rows are inserted */
INSERT INTO x (id, payload1, payload2, payload3, payload4, payload5) VALUES (random values of varying length/size to force random toast usage);
COMMIT;

VACUUM (ANALYZE, FULL);

BEGIN;
/* Repeat until all 2,000,000 rows are updated */
UPDATE x SET payload1 = , payload2 = , payload3 = , payload4 = , payload5 = ... again random values of varying length/size to force random toast usage
COMMIT;

VACCUM (ANALYZE, FULL);

The second vacuum causes an ERROR identical to that you are reporting below (unexpected chunk number n (expected n) for toast value...). However it may take up to ten attempts to replicate it.

Out of interest, are you using any tablespaces other than pg_default? I can only replicate the issue when using separately mounted tablespaces.

I have been investigating this quite extensively and everything I can find on the web suggests data corruption. However running the the following DO reports no errors and I can dump the database without issue.

DO $$
DECLARE

curid INT := 0;
vcontent RECORD;
badid BIGINT;

var1_sub VARCHAR;
var2_sub VARCHAR;
var3_sub VARCHAR;
var4_sub VARCHAR;
var5_sub VARCHAR;

BEGIN
FOR badid IN SELECT id FROM x
LOOP
curid = curid + 1;

IF curid % 100000 = 0
THEN
RAISE NOTICE '% rows inspected', curid;
END IF;

BEGIN
SELECT *
INTO vcontent
FROM x
WHERE rowid = badid;

var1_sub := SUBSTR(vcontent.var1,2000,5000);
var2_sub := SUBSTR(vcontent.var2,2000,5000);
var3_sub := SUBSTR(vcontent.var3,2000,5000);
var4_sub := SUBSTR(vcontent.var4::VARCHAR,2000,5000);
var5_sub := SUBSTR(vcontent.var5::VARCHAR,2000,5000);

EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Data for rowid % is corrupt', badid;
CONTINUE;
END;

END LOOP;
END;
$$;

I also found the following has been reported: /messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org

Best wishes,
Harry

Show quoted text

On 7 Jun 2017, at 15:22, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:

Our database has started reporting errors like this:

2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
...
2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630

(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.

These went away, but the next day we got similar errors from another
table:

2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100
...
2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100

(Only 4 this time) pg_toast_10920100 corresponds to a table with holds
around 320 million rows (these are our two large tables).

The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:

2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100

and today the logs have 65 lines, ending with these:

2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100
2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100

First try to find which tables those toast relations refer to :
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?

Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;

also VACUUM the above tables.

You might want to write a function which iterates over the damaged table's rows in order to identify the damaged row(s). And then do some good update to create a new version.

The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).

We are updating rows in the database a lot/continuously.

There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.

We are running PostgreSQL 9.3.14 currently.

The only thing I could see in the release notes since 9.3.14 that might
be related is this:

"* Avoid very-low-probability data corruption due to testing tuple
visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
Tom Lane)"

Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?

We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.

We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.

Best regards,

Adam

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org <mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general <http://www.postgresql.org/mailpref/pgsql-general&gt;

#4Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: Harry Ambrose (#3)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

On 07/06/2017 17:49, Harry Ambrose wrote:

Hi,
Out of interest, are you using any tablespaces other than pg_default? I can only replicate the issue when using separately mounted tablespaces.

One lesson I learned from the BSD camp when dealing with random freezes and panics : when all else fails to give an answer it is time to start blaming my hardware. Are those tablespaces on any cheap
SSD's ?

I have been investigating this quite extensively and everything I can find on the web suggests data corruption. However running the the following DO reports no errors and I can dump the database
without issue.

You don't use index when pg_dump . If only the index is corrupted you can get away with dump/reload (but for big DBs this is unrealistic)

I also found the following has been reported: /messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org

Best wishes,
Harry

On 7 Jun 2017, at 15:22, Achilleas Mantzios <achill@matrix.gatewaynet.com <mailto:achill@matrix.gatewaynet.com>> wrote:

On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:

Our database has started reporting errors like this:

2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
...
2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630

(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.

These went away, but the next day we got similar errors from another
table:

2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100
...
2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100

(Only 4 this time) pg_toast_10920100 corresponds to a table with holds
around 320 million rows (these are our two large tables).

The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:

2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100

and today the logs have 65 lines, ending with these:

2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100
2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100

First try to find which tables those toast relations refer to :
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?

Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;

also VACUUM the above tables.

You might want to write a function which iterates over the damaged table's rows in order to identify the damaged row(s). And then do some good update to create a new version.

The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).

We are updating rows in the database a lot/continuously.

There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.

We are running PostgreSQL 9.3.14 currently.

The only thing I could see in the release notes since 9.3.14 that might
be related is this:

"* Avoid very-low-probability data corruption due to testing tuple
visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
Tom Lane)"

Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?

We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.

We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.

Best regards,

Adam

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org <mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Harry Ambrose (#3)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Harry Ambrose <harry.ambrose@gmail.com> writes:

I have been following the updates to the 9.4 branch hoping a fix will appear, but sadly no luck yet. I have manually replicated the issue on 9.4.4, 9.4.10 and 9.4.12. My replication steps are:

This is a very interesting report, but you didn't actually provide a
reproducer, just a handwavy outline. If you submit a script that
makes this happen, we will most definitely look into it. But
people aren't going to be excited about trying to reverse-engineer
a test case out of a vague description.

I also found the following has been reported:
/messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org

That person never came back with a self-contained test case, either.

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Harry Ambrose
harry.ambrose@gmail.com
In reply to: Tom Lane (#5)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Hi,

Thanks for the responses.

"One lesson I learned from the BSD camp when dealing with random freezes and panics : when all else fails to give an answer it is time to start blaming my hardware. Are those tablespaces on any cheap SSD's ?”

The tablespaces are not sat on SSD’s. Something I had also considered.

Tom - I can provide a jar that I have been using to replicate the issue. Whats the best transport method to send it over?

Best wishes,
Harry

On 7 Jun 2017, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Harry Ambrose <harry.ambrose@gmail.com> writes:

I have been following the updates to the 9.4 branch hoping a fix will appear, but sadly no luck yet. I have manually replicated the issue on 9.4.4, 9.4.10 and 9.4.12. My replication steps are:

This is a very interesting report, but you didn't actually provide a
reproducer, just a handwavy outline. If you submit a script that
makes this happen, we will most definitely look into it. But
people aren't going to be excited about trying to reverse-engineer
a test case out of a vague description.

I also found the following has been reported:
/messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org

That person never came back with a self-contained test case, either.

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Harry Ambrose (#6)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Harry Ambrose <harry.ambrose@gmail.com> writes:

Tom - I can provide a jar that I have been using to replicate the issue. Whats the best transport method to send it over?

If it's not enormous, just send it as an email attachment.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Harry Ambrose
harry.ambrose@gmail.com
In reply to: Tom Lane (#7)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Hi,

Please find the jar attached (renamed with a .txt extension as I know some
email services deem jars a security issue).

The jar accepts the following arguments:

$1 = host
$2 = database
$3 = username
$4 = password
$5 = port

It returns its logging to STDOUT. Please let me know if you require further
info.

Best wishes,
Harry

On 7 June 2017 at 17:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Harry Ambrose <harry.ambrose@gmail.com> writes:

Tom - I can provide a jar that I have been using to replicate the issue.

Whats the best transport method to send it over?

If it's not enormous, just send it as an email attachment.

regards, tom lane

Attachments:

Unsupported File Types Alert.txtapplication/octet-stream; name="Unsupported File Types Alert.txt"Download
#9Harry Ambrose
harry.ambrose@gmail.com
In reply to: Tom Lane (#7)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Hi,

Please find the jar attached (renamed with a .txt extension as I know some
email services deem jars a security issue).

The jar accepts the following arguments:

$1 = host
$2 = database
$3 = username
$4 = password
$5 = port

It returns its logging to STDOUT. Please let me know if you require further
info.

Best wishes,
Harry

On 7 June 2017 at 17:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Harry Ambrose <harry.ambrose@gmail.com> writes:

Tom - I can provide a jar that I have been using to replicate the issue.

Whats the best transport method to send it over?

If it's not enormous, just send it as an email attachment.

regards, tom lane

Attachments:

toast-corrupter-aio.txttext/plain; charset=x-binaryenc; name=toast-corrupter-aio.txtDownload+17-11
#10Harry Ambrose
harry.ambrose@gmail.com
In reply to: Tom Lane (#7)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Hi,

Please find the jar attached (renamed with a .txt extension as I know some
email services deem jars a security issue).

The jar accepts the following arguments:

$1 = host
$2 = database
$3 = username
$4 = password
$5 = port

It returns its logging to STDOUT. Please let me know if you require further
info.

Best wishes,
Harry

On 7 June 2017 at 17:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Harry Ambrose <harry.ambrose@gmail.com> writes:

Tom - I can provide a jar that I have been using to replicate the issue.

Whats the best transport method to send it over?

If it's not enormous, just send it as an email attachment.

regards, tom lane

Attachments:

Unsupported File Types Alert.txtapplication/octet-stream; name="Unsupported File Types Alert.txt"Download
#11ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: Harry Ambrose (#10)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Achilleas writes:

First try to find which tables those toast relations refer to :
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?

They are our two big tables, containing the bulk of our data (one with
168M rows, the other with 320M rows).

They are constantly being updated, but if I can identify the affected
rows, I can restore a backup on another machine and cherry pick them
from there.

Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;

also VACUUM the above tables.

Yes, but I'd like to know find out why it happens, because cleaning up
and having the corruption reoccur is not so fun.

You might want to write a function which iterates over the damaged
table's rows in order to identify the damaged row(s). And then do some
good update to create a new version.

Yes - we started by doing a quick pg_dump, but I guess we should switch
to something that can tell us exactly what rows hit the problem.

Anyone has a handy little script lying around?

Thanks for the response!

Adam

--
"Lägg ditt liv i min hand Adam Sjøgren
Sälj din själ till ett band" adsj@novozymes.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#11)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Harry writes:

The second vacuum causes an ERROR identical to that you are reporting
below (unexpected chunk number n (expected n) for toast value...).
However it may take up to ten attempts to replicate it.

Interesting.

Out of interest, are you using any tablespaces other than pg_default?
I can only replicate the issue when using separately mounted
tablespaces.

No, we are using pg_default only.

I hope your finding can be reproduced, it would be really interesting to
see.

Best regards,

Adam

--
"Lägg ditt liv i min hand Adam Sjøgren
Sälj din själ till ett band" adsj@novozymes.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13Adrian Klaver
adrian.klaver@aklaver.com
In reply to: ADSJ (Adam Sjøgren) (#11)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

On 06/08/2017 08:13 AM, ADSJ (Adam Sjøgren) wrote:

Achilleas writes:

Anyone has a handy little script lying around?

http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html

Thanks for the response!

Adam

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Harry Ambrose (#9)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Harry Ambrose <harry.ambrose@gmail.com> writes:

Please find the jar attached (renamed with a .txt extension as I know some
email services deem jars a security issue).

Hmm, the output from this script reminds me quite a lot of one I was
sent in connection with bug #14444 awhile back:
/messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org
Was that a colleague of yours?

Anyway, the bad news is I couldn't reproduce the problem then and I can't
now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15Harry Ambrose
harry.ambrose@gmail.com
In reply to: Tom Lane (#14)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Hi Tom,

Thanks for attempting to replicate the issue.

Anyway, the bad news is I couldn't reproduce the problem then and I can't

now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?

Further details about the environment that I can replicate on below:

- Non default postgresql.conf settings:
checkpoint_segments = 192
checkpoint_completion_target = 0.9
checkpoint_timeout = 5min
wal_keep_segments = 256
wal_writer_delay = 200ms
archive_mode = on
archive_command = 'rsync -e ssh -arv /wal/pg_xlog/%f postgres@<removed
hostname>:/wal/pg_xlog'
archive_timeout = 60
syslog_facility = 'LOCAL0'
log_statement = 'mod'
syslog_ident = 'postgres'
log_line_prefix = '%h %m %p %c %u %a %e '
log_timezone = 'GB'
track_activities = on
track_counts = on
datestyle = 'iso, mdy'
timezone = 'GB'
default_text_search_config = 'pg_catalog.english'
array_nulls = on
sql_inheritance = on
standard_conforming_strings = on
synchronize_seqscans = on
transform_null_equals = off
- Two node master/slave setup using streaming replication (without slots).
- CentOS 6.9 (2.6.32-696.el6.x86_64).
- PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit.
- 64GiB RAM.
- AMD Opteron(TM) Processor 6238.
- pg_default sat on 2 disk RAID-1 conifugration (ext3 filesystem).
- Custom tablespaces (where the errors occur) sat on 4 disk RAID-10 (ext3
filesystem).
- All disks are HP 600G SAS 6.0Gbps with P420(i) controllers and battery
backed cache enabled.

Please let me know if you require further info.

Best wishes,
Harry

#16Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: Harry Ambrose (#15)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

On 09/06/2017 14:44, Harry Ambrose wrote:

Hi Tom,

Thanks for attempting to replicate the issue.

Anyway, the bad news is I couldn't reproduce the problem then and I can't
now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?

Further details about the environment that I can replicate on below:

- Non default postgresql.conf settings:
checkpoint_segments = 192
checkpoint_completion_target = 0.9
checkpoint_timeout = 5min
wal_keep_segments = 256
wal_writer_delay = 200ms
archive_mode = on
archive_command = 'rsync -e ssh -arv /wal/pg_xlog/%f postgres@<removed hostname>:/wal/pg_xlog'
archive_timeout = 60
syslog_facility = 'LOCAL0'
log_statement = 'mod'
syslog_ident = 'postgres'
log_line_prefix = '%h %m %p %c %u %a %e '
log_timezone = 'GB'
track_activities = on
track_counts = on
datestyle = 'iso, mdy'
timezone = 'GB'
default_text_search_config = 'pg_catalog.english'
array_nulls = on
sql_inheritance = on
standard_conforming_strings = on
synchronize_seqscans = on
transform_null_equals = off
- Two node master/slave setup using streaming replication (without slots).
- CentOS 6.9 (2.6.32-696.el6.x86_64).
- PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit.
- 64GiB RAM.
- AMD Opteron(TM) Processor 6238.
- pg_default sat on 2 disk RAID-1 conifugration (ext3 filesystem).
- Custom tablespaces (where the errors occur) sat on 4 disk RAID-10 (ext3 filesystem).
- All disks are HP 600G SAS 6.0Gbps with P420(i) controllers and battery backed cache enabled.

Maybe you could give some info on :
- your ext3 mkfs and mount options (journal, barriers, etc)
- your controller setup (battery should be working good and cache mode set to write back)
- your disks setup (write cache should be disabled)
- you should check your syslogs/messages for any errors related to storage
- is your RAM ECC? Did you run any memtest?
- is your CPU overheating ?
- have you experienced any crashes/freezes ?

Please let me know if you require further info.

Best wishes,
Harry

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

#17Harry Ambrose
harry.ambrose@gmail.com
In reply to: Achilleas Mantzios (#16)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Hi,

Maybe you could give some info on :

- your ext3 mkfs and mount options (journal, barriers, etc)

/etc/fstab details below:
LABEL=/var/lib/pgsql /var/lib/pgsql ext3 defaults
1 2
LABEL=/tablespace1 /tablespace1 ext3 defaults
1 2
LABEL=/tablespace2 /tablespace2 ext3 defaults
1 2
LABEL=/tablespace3 /tablespace3 ext3 defaults
1 2
pg_default:

Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file

Filesystem flags: signed_directory_hash

Default mount options: (none)

Filesystem state: clean

Errors behavior: Continue

Filesystem OS type: Linux

Inode count: 36634624

Block count: 146506767

Reserved block count: 7325338

Free blocks: 143785740

Free inodes: 36627866

First block: 0

Block size: 4096

Fragment size: 4096

Reserved GDT blocks: 989

Blocks per group: 32768

Fragments per group: 32768

Inodes per group: 8192

Inode blocks per group: 512

RAID stride: 64

RAID stripe width: 64

Filesystem created: Fri Aug 9 16:11:53 2013

Last mount time: Fri Apr 21 22:37:02 2017

Last write time: Fri Apr 21 22:37:02 2017

Mount count: 2

Maximum mount count: 100

Last checked: Thu Sep 15 18:52:43 2016

Check interval: 31536000 (12 months, 5 days)

Next check after: Fri Sep 15 18:52:43 2017

Reserved blocks uid: 0 (user root)

Reserved blocks gid: 0 (group root)

First inode: 11

Inode size: 256

Required extra isize: 28

Desired extra isize: 28

Journal inode: 8

Default directory hash: half_md4

Journal backup: inode blocks
tablespaces

Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file

Filesystem flags: signed_directory_hash

Default mount options: (none)

Filesystem state: clean

Errors behavior: Continue

Filesystem OS type: Linux

Inode count: 73261056

Block count: 293013543

Reserved block count: 14650677

Free blocks: 286208439

Free inodes: 73174728

First block: 0

Block size: 4096

Fragment size: 4096

Reserved GDT blocks: 954

Blocks per group: 32768

Fragments per group: 32768

Inodes per group: 8192

Inode blocks per group: 512

RAID stride: 64

RAID stripe width: 128

Filesystem created: Fri Aug 9 16:11:53 2013

Last mount time: Fri Apr 21 22:37:02 2017

Last write time: Fri Apr 21 22:37:02 2017

Mount count: 2

Maximum mount count: 100

Last checked: Thu Sep 15 18:52:43 2016

Check interval: 31536000 (12 months, 5 days)

Next check after: Fri Sep 15 18:52:43 2017

Reserved blocks uid: 0 (user root)

Reserved blocks gid: 0 (group root)

First inode: 11

Inode size: 256

Required extra isize: 28

Desired extra isize: 28

Journal inode: 8

Default directory hash: half_md4

Journal backup: inode blocks

- your controller setup (battery should be working good and cache mode set
to write back)

Cache Board Present: True
Cache Status: OK
Cache Ratio: 10% Read / 90% Write
Drive Write Cache: Disabled
Total Cache Size: 2.0 GB
Total Cache Memory Available: 1.8 GB
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: False
SSD Caching Version: 1
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK

- your disks setup (write cache should be disabled)

Write cache is disabled, see above.

- you should check your syslogs/messages for any errors related to storage

No error messages found.

- is your RAM ECC? Did you run any memtest?

Yes, memory is ECC. No error messages found.

- is your CPU overheating ?

No overheating issues.

- have you experienced any crashes/freezes ?

No crashes/freezes experienced.

Best wishes,
Harry

#18Robert Lakes
robertl@propaas.com
In reply to: Tom Lane (#14)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Ha guys,
I am new to postgress and I am trying to write my first function to insert,
update or delete and trap errors as a result of the table not existing ,
the columns not exist or if any other error simply pass back the sqlstate
here's my code can you help
CREATE OR REPLACE FUNCTION listings_audit() RETURNS TRIGGER AS
$listings_audit$
BEGIN
IF (TG_OP = 'DELETE') THEN
IF (EXISTS (
SELECT 1
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'schema_name'
AND c.relname = 'table_name'
AND c.relkind = 'r' -- only tables
)) THEN
INSERT INTO listings_changes
SELECT now(), 'DELETE', OLD.*;
RETURN OLD;
ELSE RAISE EXCEPTION 'Table does not exists';
END IF;
ELSIF (TG_OP = 'UPDATE') THEN
IF (EXISTS (
SELECT 1
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'schema_name'
AND c.relname = 'table_name'
AND c.relkind = 'r' -- only tables
)) THEN
INSERT INTO listings_changes
SELECT now(), 'UPDATE', NEW.*;
RETURN NEW;
ELSE RAISE EXCEPTION 'Table does not exists';
END IF;
ELSEIF (TG_OP = 'INSERT') THEN

INSERT INTO listings_changes
SELECT now(), 'INSERT', NEW.*;
RETURN NEW;

END IF;
EXCEPTION
WHEN SQLSTATE '42611' THEN
RAISE EXCEPTION 'Columns do not match audit file does not match user
file';
WHEN SQLSTATE '42P16' THEN
RAISE EXCEPTION 'Table does not exists';
WHEN OTHERS THEN
RAISE EXCEPTION 'PostgresSQL error code that has occurred';
RETURN SQLSTATE;
END;
$listings_audit$ LANGUAGE plpgsql;

On Thu, Jun 8, 2017 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Harry Ambrose <harry.ambrose@gmail.com> writes:

Please find the jar attached (renamed with a .txt extension as I know

some

email services deem jars a security issue).

Hmm, the output from this script reminds me quite a lot of one I was
sent in connection with bug #14444 awhile back:
/messages/by-id/20161201165505.
4360.28203%40wrigleys.postgresql.org
Was that a colleague of yours?

Anyway, the bad news is I couldn't reproduce the problem then and I can't
now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Lakes (#18)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

Robert Lakes <robertl@propaas.com> writes:

I am new to postgress and I am trying to write my first function to insert,
update or delete and trap errors as a result of the table not existing ,
the columns not exist or if any other error simply pass back the sqlstate

Please do not hijack an existing thread to ask an unrelated question.
Start a new thread (ie "compose" don't "reply") and use an appropriate
subject line.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: ADSJ (Adam Sjøgren) (#1)
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

ADSJ (Adam Sj�gren) wrote:

Our database has started reporting errors like this:

2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630

Does the problem still reproduce if you revert commit
6c243f90ab6904f27fa990f1f3261e1d09a11853?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Harry Ambrose (#9)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#21)
#23Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom Lane (#22)
#24Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: Harry Ambrose (#17)
#25ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: Alvaro Herrera (#20)
#26Harry Ambrose
harry.ambrose@gmail.com
In reply to: ADSJ (Adam Sjøgren) (#25)
#27Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: Harry Ambrose (#26)
#28Harry Ambrose
harry.ambrose@gmail.com
In reply to: ADSJ (Adam Sjøgren) (#1)
#29Harry Ambrose
harry.ambrose@gmail.com
In reply to: Achilleas Mantzios (#27)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Achilleas Mantzios (#27)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Harry Ambrose (#15)
#32Harry Ambrose
harry.ambrose@gmail.com
In reply to: Tom Lane (#31)
#33Harry Ambrose
harry.ambrose@gmail.com
In reply to: Harry Ambrose (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Harry Ambrose (#33)
#35ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#25)
#36Harry Ambrose
harry.ambrose@gmail.com
In reply to: ADSJ (Adam Sjøgren) (#35)
#37ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#35)
#38Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: ADSJ (Adam Sjøgren) (#37)
#39Harry Ambrose
harry.ambrose@gmail.com
In reply to: Achilleas Mantzios (#38)
#40Scott Marlowe
scott.marlowe@gmail.com
In reply to: ADSJ (Adam Sjøgren) (#37)
#41Harry Ambrose
harry.ambrose@gmail.com
In reply to: Scott Marlowe (#40)
#42Christoph Berg
myon@debian.org
In reply to: Harry Ambrose (#41)
#43ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#1)
#44Jorge Daniel
elgaita@hotmail.com
In reply to: ADSJ (Adam Sjøgren) (#43)
#45ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#1)
#46Michael Paquier
michael@paquier.xyz
In reply to: ADSJ (Adam Sjøgren) (#45)
#47ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#1)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: ADSJ (Adam Sjøgren) (#47)
#49Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#48)
#50ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#1)
#51ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#1)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: ADSJ (Adam Sjøgren) (#50)
#53ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#1)
#54ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#1)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: ADSJ (Adam Sjøgren) (#54)
#56Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#55)
#57Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Pavan Deolasee (#56)
#58ADSJ (Adam Sjøgren)
adsj@novozymes.com
In reply to: ADSJ (Adam Sjøgren) (#1)