per-sesson errors after interrupting CLUSTER pg_attrdef

Started by Justin Pryzbyover 8 years ago4 messageshackers

pryzby@telsasoft.com

over 8 years ago

This was briefly scary but seems to have been limited to my psql session (no
other errors logged). Issue with catcache (?)

I realized that the backup job I'd kicked off was precluding the CLUSTER from
running, but that CLUSTER was still holding lock and stalling everything else
under the sun.

psql (10.0, server 9.6.5)
...

ts=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index ; ^CCancel request sent ERROR: canceling statement due to user request

ts=# \df qci_add*
ERROR: could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
ts=# \dt+ pg_att

ts=# \dt+ pg_attrdef
ERROR: could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
ts=# ^C
ts=# \q

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Justin Pryzby (#1)

Re: per-sesson errors after interrupting CLUSTER pg_attrdef

On Fri, Oct 20, 2017 at 9:01 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:

This was briefly scary but seems to have been limited to my psql session (no
other errors logged). Issue with catcache (?)

I realized that the backup job I'd kicked off was precluding the CLUSTER from
running, but that CLUSTER was still holding lock and stalling everything else
under the sun.

ts=# \df qci_add*
ERROR: could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
ts=# \dt+ pg_att

ts=# \dt+ pg_attrdef
ERROR: could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes

Perhaps that's too late, but to which relation is this relfilenode
actually referring to?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Justin Pryzby

pryzby@telsasoft.com

about 8 years ago

In reply to: Michael Paquier (#2)

Re: [HACKERS] per-sesson errors after interrupting CLUSTER pg_attribute (not attrdef)

On Tue, Oct 24, 2017 at 04:56:27PM -0700, Michael Paquier wrote:

On Fri, Oct 20, 2017 at 9:01 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:

This was briefly scary but seems to have been limited to my psql session (no
other errors logged). Issue with catcache (?)

I realized that the backup job I'd kicked off was precluding the CLUSTER from
running, but that CLUSTER was still holding lock and stalling everything else
under the sun.

ts=# \df qci_add*
ERROR: could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
ts=# \dt+ pg_att

ts=# \dt+ pg_attrdef
ERROR: could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes

Perhaps that's too late, but to which relation is this relfilenode
actually referring to?

Sorry for the late response :)

..but I've had mixed success reproducign this, and wasn't sure if it even an
issue under current postgres (the original issue was a 9.6 server+10.0client).

Now I know, this is still an issue under PG10.2:

[pryzbyj@united-telsasoft ~]$ psql ts
psql (10.2)
Type "help" for help.

ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
INFO: clustering "pg_catalog.pg_attribute" using sequential scan and sort
INFO: "pg_attribute": found 18074 removable, 2488511 nonremovable row versions in 80769 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 9.66 s, system: 1.92 s, elapsed: 22.29 s.
^CCancel request sent
ERROR: canceling statement due to user request
ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
INFO: clustering "pg_catalog.pg_attribute" using sequential scan and sort
INFO: "pg_attribute": found 18102 removable, 2488511 nonremovable row versions in 80769 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 10.00 s, system: 1.59 s, elapsed: 27.60 s.
^CCancel request sent
ERROR: canceling statement due to user request
ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
ERROR: could not open file "base/16400/948150297": No such file or directory
ts=# SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150297;
ERROR: column "pg_relation_filenode" does not exist
LINE 1: SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150...
^
ts=# SELECT * FROM pg_attribute WHERE pg_relation_filenode()=948150297;
ERROR: function pg_relation_filenode() does not exist
LINE 1: SELECT * FROM pg_attribute WHERE pg_relation_filenode()=9481...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
ts=# SELECT * FROM pg_class WHERE pg_relation_filenode(oid)=948150297;
ERROR: could not open file "base/16400/948150297": No such file or directory
ts=#

postgres=# SELECT session_line, message, query FROM postgres_log_2018_03_20_1200 WHERE pid=29855 AND session_id='5ab147d9.749f' ORDER BY 1;
session_line|message|query
1|statement: SELECT pg_catalog.quote_ident(c2.relname) FROM pg_catalog.pg_class c1, pg_catalog.pg_class c2, pg_catalog.pg_index i WHERE c1.oid=i.indrelid and i.indexrelid=c2.oid and (0 = pg_catalog.length('')) and pg_catalog.quote_ident(c1.relname)='pg_attribute' and pg_catalog.pg_table_is_visible(c2.oid)
LIMIT 1000|
2|statement: SELECT pg_catalog.quote_ident(c2.relname) FROM pg_catalog.pg_class c1, pg_catalog.pg_class c2, pg_catalog.pg_index i WHERE c1.oid=i.indrelid and i.indexrelid=c2.oid and (24 = pg_catalog.length('pg_attribute_relid_attnu')) and pg_catalog.quote_ident(c1.relname)='pg_attribute' and pg_catalog.pg_table_is_visible(c2.oid)
LIMIT 1000|
3|statement: CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;|
4|clustering "pg_catalog.pg_attribute" using sequential scan and sort|
5|temporary file: path "base/pgsql_tmp/pgsql_tmp29855.0", size 374013952|CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
6|"pg_attribute": found 18074 removable, 2488511 nonremovable row versions in 80769 pages|
7|canceling statement due to user request|CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
8|statement: CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;|
9|clustering "pg_catalog.pg_attribute" using sequential scan and sort|
10|temporary file: path "base/pgsql_tmp/pgsql_tmp29855.1", size 374013952|CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
11|"pg_attribute": found 18102 removable, 2488511 nonremovable row versions in 80769 pages|
12|canceling statement due to user request|CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
13|statement: CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;|
14|could not open file "base/16400/948150297": No such file or directory|CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
15|statement: SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150297;|
16|column "pg_relation_filenode" does not exist|SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150297;
17|statement: SELECT * FROM pg_attribute WHERE pg_relation_filenode()=948150297;|
18|function pg_relation_filenode() does not exist|SELECT * FROM pg_attribute WHERE pg_relation_filenode()=948150297;
19|statement: SELECT * FROM pg_class WHERE pg_relation_filenode(oid)=948150297;|
20|could not open file "base/16400/948150297": No such file or directory|SELECT * FROM pg_class WHERE pg_relation_filenode(oid)=948150297;
(20 rows)

I believe lines 1-2 were tab completion, which was also present in my original report.

In my original report, I believe \df wasn't working (or maybe tab completion
for a function?). This is different, as the CLUSTER isn't working.

Also note the subject of my original report was wrong (pg_attrdef not pg_attribute).

I'll leave the broken session opened for awhile in cases someone has questions
about it other than relfilenode, which apparently isn't working..

I don't know much but..is this related to pg_filenode.map?

Justin

Justin Pryzby

pryzby@telsasoft.com

about 8 years ago

In reply to: Justin Pryzby (#3)

Re: [HACKERS] per-sesson errors after interrupting CLUSTER pg_attribute (not attrdef)

On Tue, Mar 20, 2018 at 01:03:57PM -0500, Justin Pryzby wrote:

On Tue, Oct 24, 2017 at 04:56:27PM -0700, Michael Paquier wrote:

On Fri, Oct 20, 2017 at 9:01 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:

This was briefly scary but seems to have been limited to my psql session (no
other errors logged). Issue with catcache (?)

Now I know, this is still an issue under PG10.2:

[pryzbyj@united-telsasoft ~]$ psql ts
psql (10.2)
ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
INFO: clustering "pg_catalog.pg_attribute" using sequential scan and sort
INFO: "pg_attribute": found 18102 removable, 2488511 nonremovable row versions in 80769 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 10.00 s, system: 1.59 s, elapsed: 27.60 s.
^CCancel request sent
ERROR: canceling statement due to user request
ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
ERROR: could not open file "base/16400/948150297": No such file or directory

I don't know much but..is this related to pg_filenode.map?

I gather this is the "prefer that it not happen" case here:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/cache/relmapper.c;hb=HEAD#l449

So I think this is a known, accepted/expected behavior and no issue.

Justin