missing chunk number 0 for toast value 25693266 in pg_toast_25497233

Started by Bryan Murphy · almost 16 years ago · 3 messages · pgsql-general
#1Bryan Murphy
bmurphy1976@gmail.com

I'm running into this issue again:

psql --version
psql (PostgreSQL) 8.3.7

COPY items_extended TO '/dev/null';
ERROR: missing chunk number 0 for toast value 25693266 in pg_toast_25497233

Unfortunately, I do not know where these are coming from and I cannot
replicate the data in at least one of my tables (which has 20 million
records) because of this. I've already found 10 bad records. There
are more.

I have four tables with 20 million records (and a fifth which has even
more) that may have this problem. Right now, checking every record
one by one is going to take *DAYS* to complete, and I'm in the middle
of adding additional nodes to our cluster because we are already
stretched to the limit.

I have a few options, such as trying to check batches of records and
spinning up multiple checkers in parallel looking at different subsets
of the data on wal shipped spares (assuming the wal shipped spares
would suffer the same problem, which is a big assumption), but this is
a lot of effort to get going.

Help!

Thanks,
Bryan

#2Magnus Hagander
magnus@hagander.net
In reply to: Bryan Murphy (#1)
Re: missing chunk number 0 for toast value 25693266 in pg_toast_25497233

On Fri, May 7, 2010 at 01:24, Bryan Murphy <bmurphy1976@gmail.com> wrote:

> I'm running into this issue again:
>
> psql --version
> psql (PostgreSQL) 8.3.7
>
> COPY items_extended TO '/dev/null';
> ERROR:  missing chunk number 0 for toast value 25693266 in pg_toast_25497233
>
> Unfortunately, I do not know where these are coming from and I cannot
> replicate the data in at least one of my tables (which has 20 million
> records) because of this.  I've already found 10 bad records.  There
> are more.
>
> I have four tables with 20 million records (and a fifth which has even
> more) that may have this problem.  Right now, checking every record
> one by one is going to take *DAYS* to complete, and I'm in the middle
> of adding additional nodes to our cluster because we are already
> stretched to the limit.
>
> I have a few options, such as trying to check batches of records and
> spinning up multiple checkers in parallel looking at different subsets
> of the data on wal shipped spares (assuming the wal shipped spares
> would suffer the same problem, which is a big assumption), but this is
> a lot of effort to get going.

Try doing a binary search with LIMIT. E.g., if you have 20M records,
do a SELECT * FROM ... LIMIT 10M (throw away the results). If that
broke, check the upper half; if not, check the lower one (with
OFFSET).

If you have a serial primary key or something, you can use WHERE on it
which will likely be a lot faster than using LIMIT, but the same idea
applies - do a binary search. Should take a lot less than days, and is
reasonably easy to script.
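A minimal sketch of that bisection, assuming a serial primary key. The names here (`find_bad_id`, the `probe` callback, an `id` column) are illustrative, not from the thread; in practice `probe(lo, hi)` would run a query over that id range that forces detoasting (e.g. selecting `length()` of the large column) and report whether it raised the toast error. The sketch also assumes exactly one bad row in the searched range; with several, you would re-run it on the remaining sub-ranges.

```python
def find_bad_id(probe, lo, hi):
    """Narrow the half-open range [lo, hi) down to the single id whose
    row fails to read.  probe(a, b) returns True when scanning rows
    with a <= id < b raises the 'missing chunk' error."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(lo, mid):      # error reproduces in the lower half
            hi = mid
        else:                   # otherwise the bad row is in the upper half
            lo = mid
    return lo
```

With 20M rows this takes about 25 probes instead of 20M single-row checks, which is why it scripts up in minutes rather than days.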

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#3Bryan Murphy
bmurphy1976@gmail.com
In reply to: Magnus Hagander (#2)
Re: missing chunk number 0 for toast value 25693266 in pg_toast_25497233

On Fri, May 7, 2010 at 9:02 AM, Magnus Hagander <magnus@hagander.net> wrote:

> Try doing a binary search with LIMIT. E.g., if you have 20M records,
> do a SELECT * FROM ... LIMIT 10M (throw away the results). If that
> broke, check the upper half; if not, check the lower one (with
> OFFSET).
>
> If you have a serial primary key or something, you can use WHERE on it
> which will likely be a lot faster than using LIMIT, but the same idea
> applies - do a binary search. Should take a lot less than days, and is
> reasonably easy to script.

That's my next step if I can't find a quicker/simpler method. My math
tells me that my current script is going to take 24 days to test every
record. Obviously, there are ways I can speed that up if I have no
choice but I'm hoping for a simpler solution.

I'd prefer to run a COPY TABLE-like command and have it skip the bad records.

Thanks,
Bryan
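There is no built-in COPY option that skips unreadable rows, but the wish can be approximated by dumping in batches and bisecting only the batches that fail, so just the genuinely unreadable rows are dropped. This is a hedged sketch with hypothetical callbacks: `fetch_range(a, b)` would read rows with a <= id < b (for example via a ranged SELECT, raising on the toast error) and `write` would append them to the dump; neither name comes from the thread.

```python
def salvage(fetch_range, write, lo, hi, batch=10000):
    """Dump ids in [lo, hi) batch by batch.  A batch that raises is
    split in half and retried, down to single ids, so only the rows
    that really cannot be read are skipped.  Returns the skipped ids."""
    bad = []
    # Seed the work list with fixed-size batches; the common, clean
    # case is then one read per batch instead of one read per row.
    stack = [(i, min(i + batch, hi)) for i in range(lo, hi, batch)]
    while stack:
        a, b = stack.pop()
        try:
            write(fetch_range(a, b))
        except Exception:           # toast error somewhere in [a, b)
            if b - a == 1:
                bad.append(a)       # single unreadable row: skip it
            else:
                m = (a + b) // 2    # bisect and retry both halves
                stack.append((m, b))
                stack.append((a, m))
    return bad
```

When corruption is rare, as here (tens of bad rows out of 20M), almost every batch succeeds on the first try, so the run time stays close to a plain sequential dump.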