What could cause CREATE TEMP... "could not read block" error?

Started by Chris Richardsover 10 years ago9 messagesgeneral
Jump to latest
#1Chris Richards
chris@infiniteio.com

Howdy. I have two tables that I'm joining together and using the result to
create a temporary table. Performing the join operation works fine; it
produces 0 rows (as expected). If I prepend "CREATE TEMP tmp_policyqueue
AS" to the SELECT then it generates this error:

ERROR: could not read block 39 in file "base/16391/11937": read only 0 of
8192 bytes

$ psql
psql (9.3.9)
mdb=> CREATE TEMP TABLE tmp_policyqueue AS
mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;
ERROR: could not read block 40 in file "base/16391/11937": read only 0 of
8192 bytes

You'll also observe that the block number is changing each time I execute
the command. I know very little about postgres internal structure so it may
be irrelevant. I've left my database in this state should extra information
be needed.

Thanks for any assist.

Chris

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Chris Richards (#1)
Re: What could cause CREATE TEMP... "could not read block" error?

Chris Richards wrote:

Howdy. I have two tables that I'm joining together and using the result to create a temporary table.
Performing the join operation works fine; it produces 0 rows (as expected). If I prepend "CREATE TEMP
tmp_policyqueue AS" to the SELECT then it generates this error:

ERROR: could not read block 39 in file "base/16391/11937": read only 0 of 8192 bytes

$ psql
psql (9.3.9)

mdb=> CREATE TEMP TABLE tmp_policyqueue AS

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;
ERROR: could not read block 40 in file "base/16391/11937": read only 0 of 8192 bytes

You'll also observe that the block number is changing each time I execute the command. I know very
little about postgres internal structure so it may be irrelevant. I've left my database in this state
should extra information be needed.

It would be interesting to know what object is affected:

SELECT s.nspname AS schemaname, t.relname AS objectname, t.relkind
FROM pg_class t JOIN
pg_namespace s ON t.relnamespace = s.oid
WHERE t.relfilenode = 11937;

If it is an index, REINDEX should help.

What is the statement that performs the join operation and works just fine?

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Laurenz Albe (#2)
Re: What could cause CREATE TEMP... "could not read block" error?

On 11/20/2015 06:18 AM, Albe Laurenz wrote:

Chris Richards wrote:

Howdy. I have two tables that I'm joining together and using the result to create a temporary table.
Performing the join operation works fine; it produces 0 rows (as expected). If I prepend "CREATE TEMP
tmp_policyqueue AS" to the SELECT then it generates this error:

ERROR: could not read block 39 in file "base/16391/11937": read only 0 of 8192 bytes

$ psql
psql (9.3.9)

mdb=> CREATE TEMP TABLE tmp_policyqueue AS

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;
ERROR: could not read block 40 in file "base/16391/11937": read only 0 of 8192 bytes

You'll also observe that the block number is changing each time I execute the command. I know very
little about postgres internal structure so it may be irrelevant. I've left my database in this state
should extra information be needed.

It would be interesting to know what object is affected:

SELECT s.nspname AS schemaname, t.relname AS objectname, t.relkind
FROM pg_class t JOIN
pg_namespace s ON t.relnamespace = s.oid
WHERE t.relfilenode = 11937;

If it is an index, REINDEX should help.

What is the statement that performs the join operation and works just fine?

If I am following correctly it is:

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;

Yours,
Laurenz Albe

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Chris Richards (#1)
Re: What could cause CREATE TEMP... "could not read block" error?

On 11/18/2015 04:31 PM, Chris Richards wrote:

Howdy. I have two tables that I'm joining together and using the result
to create a temporary table. Performing the join operation works fine;
it produces 0 rows (as expected). If I prepend "CREATE TEMP
tmp_policyqueue AS" to the SELECT then it generates this error:

ERROR: could not read block 39 in file "base/16391/11937": read only 0
of 8192 bytes

$ psql
psql (9.3.9)
mdb=> CREATE TEMP TABLE tmp_policyqueue AS
mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id&gt;=sp.polidx;
ERROR: could not read block 40 in file "base/16391/11937": read only 0
of 8192 bytes

You'll also observe that the block number is changing each time I
execute the command. I know very little about postgres internal
structure so it may be irrelevant. I've left my database in this state
should extra information be needed.

Thanks for any assist.

What happens if you do?:

SELECT pq.* INTO TEMP TABLE tmp_policyqueue FROM policyqueue AS pq
JOIN seed_progress AS sp ON pq.id=sp.polidx;

What does \d show for:

policyqueue

seed_progress

Chris

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Chris Richards
chris@infinite.io
In reply to: Adrian Klaver (#4)
Re: What could cause CREATE TEMP... "could not read block" error?

Adrian is correct. This worked by itself whereas using it in the creation
of the temporary table failed.

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;

I checked the query Albe suggested; there were two `relfilenode`s (11936
and 11937) that exhibited the error. Respectively, they were
pg_depend_depender_index
and pg_depend_reference_index.

Unfortunately, I didn't disable the nightly processes and something must
have(?) fixed the glitch; at midnight GMT the query ran successfully. Ugh.

If it crops up again, I have some tools to try and capture data
immediately, and the suggested REINDEX since both appear to be indices.

Thanks for the help. It's appreciated.

Chris

On Fri, Nov 20, 2015 at 8:28 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 11/20/2015 06:18 AM, Albe Laurenz wrote:

Chris Richards wrote:

Howdy. I have two tables that I'm joining together and using the result
to create a temporary table.
Performing the join operation works fine; it produces 0 rows (as
expected). If I prepend "CREATE TEMP
tmp_policyqueue AS" to the SELECT then it generates this error:

ERROR: could not read block 39 in file "base/16391/11937": read only 0
of 8192 bytes

$ psql
psql (9.3.9)

mdb=> CREATE TEMP TABLE tmp_policyqueue AS

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;
ERROR: could not read block 40 in file "base/16391/11937": read only 0
of 8192 bytes

You'll also observe that the block number is changing each time I
execute the command. I know very
little about postgres internal structure so it may be irrelevant. I've
left my database in this state
should extra information be needed.

It would be interesting to know what object is affected:

SELECT s.nspname AS schemaname, t.relname AS objectname, t.relkind
FROM pg_class t JOIN
pg_namespace s ON t.relnamespace = s.oid
WHERE t.relfilenode = 11937;

If it is an index, REINDEX should help.

What is the statement that performs the join operation and works just
fine?

If I am following correctly it is:

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;

Yours,
Laurenz Albe

--
Adrian Klaver
adrian.klaver@aklaver.com

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Chris Richards (#5)
Re: What could cause CREATE TEMP... "could not read block" error?

On 11/20/2015 07:26 AM, Chris Richards wrote:

Adrian is correct. This worked by itself whereas using it in the
creation of the temporary table failed.

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id/&gt;=sp.polidx;

I checked the query Albe suggested; there were two `relfilenode`s (11936
and 11937) that exhibited the error. Respectively, they were
pg_depend_depender_index and pg_depend_reference_index.

Unfortunately, I didn't disable the nightly processes and something must
have(?) fixed the glitch; at midnight GMT the query ran successfully. Ugh.

What are the nightly processes, or at least those that might have
touched the tables in the join?

If this pops up again, could be another clue.

If it crops up again, I have some tools to try and capture data
immediately, and the suggested REINDEX since both appear to be indices.

Thanks for the help. It's appreciated.

Chris

On Fri, Nov 20, 2015 at 8:28 AM, Adrian Klaver
<adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:

On 11/20/2015 06:18 AM, Albe Laurenz wrote:

Chris Richards wrote:

Howdy. I have two tables that I'm joining together and using
the result to create a temporary table.
Performing the join operation works fine; it produces 0 rows
(as expected). If I prepend "CREATE TEMP
tmp_policyqueue AS" to the SELECT then it generates this error:

ERROR: could not read block 39 in file "base/16391/11937":
read only 0 of 8192 bytes

$ psql
psql (9.3.9)

mdb=> CREATE TEMP TABLE tmp_policyqueue AS

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id
<http://pq.id&gt;=sp.polidx;
ERROR: could not read block 40 in file "base/16391/11937":
read only 0 of 8192 bytes

You'll also observe that the block number is changing each
time I execute the command. I know very
little about postgres internal structure so it may be
irrelevant. I've left my database in this state
should extra information be needed.

It would be interesting to know what object is affected:

SELECT s.nspname AS schemaname, t.relname AS objectname, t.relkind
FROM pg_class t JOIN
pg_namespace s ON t.relnamespace = s.oid
WHERE t.relfilenode = 11937;

If it is an index, REINDEX should help.

What is the statement that performs the join operation and works
just fine?

If I am following correctly it is:

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id&gt;=sp.polidx;

Yours,
Laurenz Albe

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Chris Richards (#5)
Re: What could cause CREATE TEMP... "could not read block" error?

Chris Richards wrote:

Adrian is correct. This worked by itself whereas using it in the creation of the temporary table
failed.
mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id/&gt; =sp.polidx;

I checked the query Albe suggested; there were two `relfilenode`s (11936 and 11937) that exhibited the
error. Respectively, they were pg_depend_depender_index and pg_depend_reference_index.

Unfortunately, I didn't disable the nightly processes and something must have(?) fixed the glitch; at
midnight GMT the query ran successfully. Ugh.

If it crops up again, I have some tools to try and capture data immediately, and the suggested REINDEX
since both appear to be indices.

These indexes are both on the system table "pg_depend".

That explains why the query ran without problems but the CREATE TABLE didn't:
Creating an object creates dependencies, and you have data corruption in the
system table that tracks dependencies.

I would be pretty worried in your place if I had a corrupted catalog, even if
it seems to have "fixed itself". There might be other tables with corruption.

I would do two things:
1) Wait for a moment when there is little database traffic and
run "REINDEX TABLE pg_depend;" to rebuild these indexes.

2) As soon as you can schedule some downtime, pg_dumpall the database
cluster, stop the server, throw away the database cluster, create a new
one with "initdb" and restore the dump into that.
That will get rid of any lurking data corruption.
Watch out for error messages during the pg_dumpall!

The main question is of course how you got the corruption in the first place.
Are you running the latest minor release for your PostgreSQL version?
Were there any crashes recently?
Do you have "fsync = on"?
Did you make sure that you have a reliable storage system?

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Chris Richards
chris@infinite.io
In reply to: Laurenz Albe (#7)
Re: What could cause CREATE TEMP... "could not read block" error?

Are you running the latest minor release for your PostgreSQL version?

No. 9.3.9 whereas the lastest is 9.3.10

Were there any crashes recently?

No[*]. See comments below about LVM snapshots.

Do you have "fsync = on"?

Yes.

Did you make sure that you have a reliable storage system?

RAID-6 SSDs on ext4

I've observed the problem on other than the "pg_depend" table. A normal
table (called "blocks") had the problem. A "REINDEX TABLE blocks" did not
fix the problem, however a "VACUUM FULL" did "fix" it.

We had thought we figured out what might have caused the problem. We were
taking LVM snapshots while the database was running and then at sometime
later we reverted the snapshot (while postgres was stopped). Our theory was
that since postgres was running and therefore we captured its backing store
in an indeterminate state. We changed our snapshot process to shutdown
postgres, take the LVM snapshot, and then start postgres.

Unfortunately, the problem cropped up again.

We had an empty database except for our created tables, took the snapshot,
and then populated the tables using normal INSERT/UPDATE/DELETEs while the
system ran. Then, we reboot and revert the LVM snapshot--back to the empty
database. We then populated the tables by using a series "COPY <table> FROM
<file>". The only things special about this is we dropped one foreign key
constraint and that with our data files is that we may issue back-to-back
COPY's to the same table (each COPY pulling in different data as we
pre-process the data before COPY'ing it). The amount of data is relatively
small from our COPY's:

dirtable - 1 row
dirpath - 1 row
cloud - 940 rows
blocks - 176 rows (before it died)

Indexes were not disabled during the bulk import.

Chris

On Mon, Nov 23, 2015 at 2:59 AM, Albe Laurenz <laurenz.albe@wien.gv.at>
wrote:

Show quoted text

Chris Richards wrote:

Adrian is correct. This worked by itself whereas using it in the

creation of the temporary table

failed.
mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id/&gt; =sp.polidx;

I checked the query Albe suggested; there were two `relfilenode`s (11936

and 11937) that exhibited the

error. Respectively, they were pg_depend_depender_index and

pg_depend_reference_index.

Unfortunately, I didn't disable the nightly processes and something

must have(?) fixed the glitch; at

midnight GMT the query ran successfully. Ugh.

If it crops up again, I have some tools to try and capture data

immediately, and the suggested REINDEX

since both appear to be indices.

These indexes are both on the system table "pg_depend".

That explains why the query ran without problems but the CREATE TABLE
didn't:
Creating an object creates dependencies, and you have data corruption in
the
system table that tracks dependencies.

I would be pretty worried in your place if I had a corrupted catalog, even
if
it seems to have "fixed itself". There might be other tables with
corruption.

I would do two things:
1) Wait for a moment when there is little database traffic and
run "REINDEX TABLE pg_depend;" to rebuild these indexes.

2) As soon as you can schedule some downtime, pg_dumpall the database
cluster, stop the server, throw away the database cluster, create a new
one with "initdb" and restore the dump into that.
That will get rid of any lurking data corruption.
Watch out for error messages during the pg_dumpall!

The main question is of course how you got the corruption in the first
place.
Are you running the latest minor release for your PostgreSQL version?
Were there any crashes recently?
Do you have "fsync = on"?
Did you make sure that you have a reliable storage system?

Yours,
Laurenz Albe

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Chris Richards (#8)
Re: What could cause CREATE TEMP... "could not read block" error?

Chris Richards wrote:
[had corruption in pg_depend indexes]

I've observed the problem on other than the "pg_depend" table. A normal table (called "blocks") had
the problem. A "REINDEX TABLE blocks" did not fix the problem, however a "VACUUM FULL" did "fix" it.

Did you dump/remove cluster/initdb/restore?
That's the only reliable way to get rid of corruption.

Are you running the latest minor release for your PostgreSQL version?

No. 9.3.9 whereas the lastest is 9.3.10

I think no data corruption bugs have been discovered since 9.3.9.

Did you make sure that you have a reliable storage system?

RAID-6 SSDs on ext4

Did you test if it is reliable?
Pulled the plug on the server during a busy checkpoint?
Or read http://brad.livejournal.com/2116715.html

But that shouldn't apply to your case if there was no crash.

Were there any crashes recently?

No[*]. See comments below about LVM snapshots.

We had thought we figured out what might have caused the problem. We were taking LVM snapshots while
the database was running and then at sometime later we reverted the snapshot (while postgres was
stopped). Our theory was that since postgres was running and therefore we captured its backing store
in an indeterminate state. We changed our snapshot process to shutdown postgres, take the LVM
snapshot, and then start postgres.

If the snapshot is truly atomic, it should work - that would be just like restarting
PostgreSQL after a server crash. PostgreSQL is designed to handle such a case well,
provided that the storage does what it promises, i.e. make sure that data are really on disk
when an fsync() is issued.

Is the whole database, including pg_xlog, on one logical volume?

Unfortunately, the problem cropped up again.

We had an empty database except for our created tables, took the snapshot, and then populated the
tables using normal INSERT/UPDATE/DELETEs while the system ran. Then, we reboot and revert the LVM
snapshot--back to the empty database. We then populated the tables by using a series "COPY <table>
FROM <file>". The only things special about this is we dropped one foreign key constraint and that
with our data files is that we may issue back-to-back COPY's to the same table (each COPY pulling in
different data as we pre-process the data before COPY'ing it). The amount of data is relatively small
from our COPY's:

dirtable - 1 row
dirpath - 1 row
cloud - 940 rows
blocks - 176 rows (before it died)

Indexes were not disabled during the bulk import.

It "died" because of data curruption?
And you had PostgreSQL shut down when you took the snapshot?
Was this a new database cluster (not just a new database) created
with initdb just before the test?

Now that should work even if your storage lies to you.
Unless there is something quite wrong with your storage - RAID checksums
should prevent the effects of a failing disk from affecting you.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general