Getting "could not read block" error when creating an index on a function.

Started by Demitri Muna · 8 messages · general
#1Demitri Muna
postgresql@demitri.com

Hello,

I’m stuck on a problem I was hoping I could get some help with. I’m trying to create an index on the results of a function that touches two tables, and I get the following error:

CREATE INDEX my_idx ON mytable (first_time(id));
ERROR: could not read block 0 in file "base/16386/42868": read only 0 of 8192 bytes
CONTEXT: SQL function "first_time" during startup

Every time I run this, the last number in the file path increments by one, e.g.

ERROR: could not read block 0 in file "base/16386/42869": read only 0 of 8192 bytes
ERROR: could not read block 0 in file "base/16386/42870": read only 0 of 8192 bytes
ERROR: could not read block 0 in file "base/16386/42871": read only 0 of 8192 bytes

The database is sitting on two tablespaces (pg_default and ‘data2’). When I try to use the second, I get:

CREATE INDEX my_idx ON mytable (first_time(id)) TABLESPACE data2;
ERROR: could not read block 0 in file "pg_tblspc/17007/PG_13_202007201/16386/42870": read only 0 of 8192 bytes
CONTEXT: SQL function "first_time" during startup

with the last number similarly incrementing upon repeated attempts.

Relevant details:

* PostgreSQL version 13.1 running on Ubuntu 20.04.1 on an AWS instance using 2 x 8TB EBS storage.
* The database is ~15TB in size.
* I am not worried about data loss; the database can be considered read-only and I have all of the files needed to recreate any table.
* I am hoping not to recreate the whole database from scratch, since doing so and creating the required indices would take more than a week.
* I used these settings while importing the files to speed up the process, since I was not worried about data loss (all turned back on after the import):

autovacuum = off
synchronous_commit=off
fsync = off
full_page_writes = off

* I will not do the above again. :)
* The postmaster crashed at least twice during the import due to running out of disk space.
* Creating any number of new indices on bare columns is no problem.
* I DROPped and recreated the functions with no change.
* This statement returns no results (but maybe I am looking at the wrong thing):

SELECT n.nspname AS schema, c.relname AS relation
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE c.relfilenode = '16386';
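(For reference, a filenode can also be mapped back to a relation with the built-in pg_filenode_relation() function; this is a sketch where 42868 is the filenode taken from the first error message above, and 0 denotes the default tablespace:)

```sql
-- Returns the relation (as regclass) backed by the given filenode,
-- or NULL if no relation currently uses it.
-- Arguments: tablespace OID (0 = pg_default), filenode.
SELECT pg_filenode_relation(0, 42868);
```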

From what I’ve read, this error (missing block files) suggests some database corruption, which is fine since I can easily delete anything problematic and recreate it. I’ve deleted the indices related to the function and recreated them, but the same error remains. Accessing the related tables seems OK and I don’t get any errors, though with that much data I can’t guarantee everything is intact.

Any help would be appreciated!

Cheers,
Demitri

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Demitri Muna (#1)
Re: Getting "could not read block" error when creating an index on a function.

Demitri Muna <postgresql@demitri.com> writes:

I’m getting stuck on a problem I was hoping I could get some help with. I’m trying to create an index for the results of a function that touches two tables like this and get the following error:

CREATE INDEX my_idx ON mytable (first_time(id));
ERROR: could not read block 0 in file "base/16386/42868": read only 0 of 8192 bytes
CONTEXT: SQL function "first_time" during startup

What's apparently happening is that some query in the function is trying
to examine the newly-created index before it's been populated.

I would call this a bug if it were a supported case, but really you are
doing something you are not allowed to. Functions in indexed expressions
are required to be immutable, and a function that looks at the contents of
a table --- particularly the very table that the index is on --- is simply
not going to be that. Marking such a function immutable to try to end-run
around the restriction is unsafe.

regards, tom lane

#3Demitri Muna
postgresql@demitri.com
In reply to: Tom Lane (#2)
Re: Getting "could not read block" error when creating an index on a function.

Hi Tom,

On Dec 30, 2020, at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would call this a bug if it were a supported case, but really you are
doing something you are not allowed to. Functions in indexed expressions
are required to be immutable, and a function that looks at the contents of
a table --- particularly the very table that the index is on --- is simply
not going to be that. Marking such a function immutable to try to end-run
around the restriction is unsafe.

Thank you, that makes perfect sense. In my mind it was immutable since the database is read-only, but I can see that to PG it’s not. Can you suggest an alternative for what I’m trying to do? Given this schema (a “person” has a number of “events”):

CREATE TABLE person (
    id SERIAL,
    ...
);

CREATE TABLE event (
    id SERIAL,
    patient_id INTEGER,
    event_timestamp TIMESTAMP,
    ...
);

I have a function (the one I was trying to index) that returns the earliest event for a person. I’m scanning another table with ~10B rows several times using a few of these “constant” values:

* first_event_timestamp(person_id) + ‘1 month’
* first_event_timestamp(person_id) + ‘13 months’
* etc.

I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism would be best to do this? Create additional columns? Create another table?
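(For context, the function in question looks roughly like this; a hypothetical reconstruction from the description above, not the actual body:)

```sql
-- Sketch of the kind of function being indexed: it reads the event
-- table, so it cannot honestly be declared IMMUTABLE.
CREATE FUNCTION first_event_timestamp(p_person_id INTEGER)
RETURNS TIMESTAMP AS $$
    SELECT min(event_timestamp)
    FROM event
    WHERE patient_id = p_person_id;
$$ LANGUAGE SQL STABLE;
```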

Thanks again,
Demitri

#4Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Demitri Muna (#3)
Re: Getting "could not read block" error when creating an index on a function.

On Wed, Dec 30, 2020 at 02:37:59PM -0500, Demitri Muna wrote:

I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism would be best to do this? Create additional columns? Create another table?

A materialized view ?
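For example (a sketch based on the schema upthread; names are assumed):

```sql
-- Precompute each person's earliest event once; the result can be
-- indexed like an ordinary table.
CREATE MATERIALIZED VIEW first_events AS
SELECT patient_id,
       min(event_timestamp) AS first_event_timestamp
FROM event
GROUP BY patient_id;

CREATE INDEX ON first_events (first_event_timestamp);
-- If the underlying data ever changes:
-- REFRESH MATERIALIZED VIEW first_events;
```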

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#5Christophe Pettus
xof@thebuild.com
In reply to: Demitri Muna (#3)
Re: Getting "could not read block" error when creating an index on a function.

On Dec 30, 2020, at 11:37, Demitri Muna <postgresql@demitri.com> wrote:
I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism would be best to do this? Create additional columns? Create another table?

This might be a good use for a generated column.

https://www.postgresql.org/docs/current/ddl-generated-columns.html
--
-- Christophe Pettus
xof@thebuild.com

#6Christophe Pettus
xof@thebuild.com
In reply to: Christophe Pettus (#5)
Re: Getting "could not read block" error when creating an index on a function.

On Dec 30, 2020, at 11:48, Christophe Pettus <xof@thebuild.com> wrote:

This might be a good use for a generated column.

https://www.postgresql.org/docs/current/ddl-generated-columns.html

I take that back; the generation formula has to be immutable as well. Perhaps a column populated by a trigger?
--
-- Christophe Pettus
xof@thebuild.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Karsten Hilbert (#4)
Re: Getting "could not read block" error when creating an index on a function.

Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:

Am Wed, Dec 30, 2020 at 02:37:59PM -0500 schrieb Demitri Muna:

I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism would be best to do this? Create additional columns? Create another table?

A materialized view ?

Yeah, or you might be able to do something with a before-insert-or-update
trigger that computes whatever desired value you want and fills it into a
derived column. Indexing that column then gives the same results as
indexing the derived expression; but it sidesteps the semantic problems
because the time of computation of the expression is well-defined, even
if it's not immutable.
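Something along these lines (a sketch; the column, trigger, and function names are invented, and first_time() is the function from upthread):

```sql
-- Add a derived column and keep it filled by a trigger.
ALTER TABLE mytable ADD COLUMN first_time_cached TIMESTAMP;

CREATE FUNCTION fill_first_time() RETURNS trigger AS $$
BEGIN
    NEW.first_time_cached := first_time(NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_fill_first_time
    BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE FUNCTION fill_first_time();

-- Indexing the plain column is then unproblematic:
CREATE INDEX my_idx ON mytable (first_time_cached);
```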

You might try to avoid a handwritten trigger by defining a generated
column instead, but we insist that generation expressions be immutable
so it won't really work. (Of course, you could still lie about the
mutability of the expression, but I can't recommend that. Per Henry
Spencer's well-known dictum, "If you lie to the compiler, it will get its
revenge". He was speaking of C compilers, I suppose, but the principle
applies to database query optimizers too.)

regards, tom lane

#8Demitri Muna
postgresql@demitri.com
In reply to: Tom Lane (#7)
Re: Getting "could not read block" error when creating an index on a function.

Thank you for the responses! I was going to go with a materialized view, but then realized that since the dataset is static it’s really no different from just creating a new table and indexing that. The suggestions provide useful advice for the future though.

Cheers,
Demitri
