Unable to Connect to DB Instance

Started by Boblitz Johnalmost 8 years ago9 messagesgeneral
Jump to latest
#1Boblitz John
john.boblitz@bertschi.com

Good Morning,

Beginning yesterday morning, users have been unable to fully connect to our DB Instance.

1. At the time of the initial report - I was connected to the DB via pgAdmin and could perform queries without problem.

2. Users reported messages similar to "could not open file "global/11801": No such file or directory"

3. At that time, connection logging was turned off and there were no messages in the log files.

4. As this is a development environment, I turned logging on in the config and restarted the DB

5. After restart, neither I, nor the Users could fully reconnect.

6. I have performed a SYS Level backup (tar of the whole postgres directory tree)

7. I cannot perform a DB level Backup (same errors occur)

System Details
Linux Debian 7.11
Postgres 9.1 (9.1.24lts-0+d)

It appears that we can connect to the DB Server itself as I get "connection received" and "connection authorized" - but when trying to access the DB itself, several errors are raised (see below).
I am assuming that some internals are no longer consistent - the file "global/11801" for instance really does not exist on the system.

Questions:

1. Is there any way to recover from this (backup is unfortunately rather old)

2. What are possible causes? I'd like to prevent this from happening on my production servers.

** I am aware that we are on older releases, and yes, we plan to migrate to more current releases "soon" (tm) ...

Thanks in advance.

John Boblitz

Exceprt from Log:
2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG: connection received: host=192.168.250.50 port=28559
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG: connection authorized: user=dbadmin database=postgres
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR: could not open file "global/11801": No such file or directory
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT: SELECT usecreatedb, usesuper, CASE WHEN usesuper THEN pg_postmaster_start_time() ELSE NULL END as upsince, CASE WHEN usesuper THEN pg_conf_load_time() ELSE NULL END as confloadedsince, CASE WHEN usesuper THEN pg_is_in_recovery() ELSE NULL END as inrecovery, CASE WHEN usesuper THEN pg_last_xlog_receive_location() ELSE NULL END as receiveloc, CASE WHEN usesuper THEN pg_last_xlog_replay_location() ELSE NULL END as replayloc, CASE WHEN usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused
FROM pg_user WHERE usename=current_user
2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR: could not open file "global/11801": No such file or directory
2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT: SELECT rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname = current_user;
2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG: connection received: host=192.168.250.50 port=28561
2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG: connection authorized: user=dbadmin database=g11Base
2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR: could not open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No such file or directory
2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT: SELECT CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1
WHEN (nspname LIKE E'pg\\_%') THEN 0
ELSE 3 END AS nsptyp,
nsp.nspname, nsp.oid, pg_get_userbyid(nspowner) AS namespaceowner, nspacl, description, has_schema_privilege(nsp.oid, 'CREATE') as cancreate,
(SELECT array_agg(label) FROM pg_seclabels sl1 WHERE sl1.objoid=nsp.oid) AS labels,
(SELECT array_agg(provider) FROM pg_seclabels sl2 WHERE sl2.objoid=nsp.oid) AS providers
FROM pg_namespace nsp
LEFT OUTER JOIN pg_description des ON (des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass)
WHERE NOT ((nspname = 'pg_catalog' AND EXISTS (SELECT 1 FROM pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid LIMIT 1)) OR
(nspname = 'pgagent' AND EXISTS (SELECT 1 FROM pg_class WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT 1)) OR
(nspname = 'information_schema' AND EXISTS (SELECT 1 FROM pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid LIMIT 1)) OR
(nspname LIKE '_%' AND EXISTS (SELECT 1 FROM pg_proc WHERE proname='slonyversion' AND pronamespace = nsp.oid LIMIT 1))
) AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname NOT LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname

#2Murtuza Zabuawala
murtuza.zabuawala@enterprisedb.com
In reply to: Boblitz John (#1)
Re: Unable to Connect to DB Instance

Hi,

pgAdmin4 supports Postgres 9.2 & onwards.

--
Regards,
Murtuza Zabuawala
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On Wed, Jul 4, 2018 at 3:19 PM, Boblitz John <john.boblitz@bertschi.com>
wrote:

Show quoted text

Good Morning,

Beginning yesterday morning, users have been unable to fully connect to
our DB Instance.

1. At the time of the initial report – I was connected to the DB
via pgAdmin and could perform queries without problem.

2. Users reported messages similar to “could not open file
"global/11801": No such file or directory”

3. At that time, connection logging was turned off and there were
no messages in the log files.

4. As this is a development environment, I turned logging on in the
config and restarted the DB

5. After restart, neither I, nor the Users could fully reconnect.

6. I have performed a SYS Level backup (tar of the whole postgres
directory tree)

7. I cannot perform a DB level Backup (same errors occur)

System Details

Linux Debian 7.11

Postgres 9.1 (9.1.24lts-0+d)

It appears that we can connect to the DB Server itself as I get
“connection received” and “connection authorized” – but when trying to
access the DB itself, several errors are raised (see below).

I am assuming that some internals are no longer consistent – the file
“global/11801” for instance really does not exist on the system.

Questions:

1. Is there any way to recover from this (backup is unfortunately
rather old)

2. What are possible causes? I’d like to prevent this from
happening on my production servers.

** I am aware that we are on older releases, and yes, we plan to migrate
to more current releases “soon” ™ …

Thanks in advance.

John Boblitz

Exceprt from Log:

2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG: connection
received: host=192.168.250.50 port=28559

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG: connection
authorized: user=dbadmin database=postgres

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR: could not
open file "global/11801": No such file or directory

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT: SELECT
usecreatedb, usesuper, CASE WHEN usesuper THEN pg_postmaster_start_time()
ELSE NULL END as upsince, CASE WHEN usesuper THEN pg_conf_load_time() ELSE
NULL END as confloadedsince, CASE WHEN usesuper THEN pg_is_in_recovery()
ELSE NULL END as inrecovery, CASE WHEN usesuper THEN
pg_last_xlog_receive_location() ELSE NULL END as receiveloc, CASE WHEN
usesuper THEN pg_last_xlog_replay_location() ELSE NULL END as replayloc,
CASE WHEN usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as
replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN
pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused

FROM pg_user WHERE usename=current_user

2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR: could not
open file "global/11801": No such file or directory

2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT: SELECT
rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname = current_user;

2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG: connection
received: host=192.168.250.50 port=28561

2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG: connection
authorized: user=dbadmin database=g11Base

2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR: could not
open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No such file or
directory

2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT: SELECT
CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1

WHEN (nspname LIKE E'pg\\_%') THEN 0

ELSE 3 END AS nsptyp,

nsp.nspname, nsp.oid, pg_get_userbyid(nspowner) AS
namespaceowner, nspacl, description, has_schema_privilege(nsp.oid,
'CREATE') as cancreate,

(SELECT array_agg(label) FROM pg_seclabels sl1 WHERE
sl1.objoid=nsp.oid) AS labels,

(SELECT array_agg(provider) FROM pg_seclabels sl2 WHERE
sl2.objoid=nsp.oid) AS providers

FROM pg_namespace nsp

LEFT OUTER JOIN pg_description des ON
(des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass)

WHERE NOT ((nspname = 'pg_catalog' AND EXISTS (SELECT 1
FROM pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid LIMIT
1)) OR

(nspname = 'pgagent' AND EXISTS (SELECT 1 FROM pg_class
WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT 1)) OR

(nspname = 'information_schema' AND EXISTS (SELECT 1 FROM
pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid LIMIT 1)) OR

(nspname LIKE '_%' AND EXISTS (SELECT 1 FROM pg_proc WHERE
proname='slonyversion' AND pronamespace = nsp.oid LIMIT 1))

) AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname NOT
LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname

#3Ashesh Vashi
ashesh.vashi@enterprisedb.com
In reply to: Boblitz John (#1)
Re: Unable to Connect to DB Instance

On Wed, Jul 4, 2018, 15:19 Boblitz John <john.boblitz@bertschi.com> wrote:

Good Morning,

Beginning yesterday morning, users have been unable to fully connect to
our DB Instance.

1. At the time of the initial report – I was connected to the DB
via pgAdmin and could perform queries without problem.

2. Users reported messages similar to “could not open file
"global/11801": No such file or directory”

3. At that time, connection logging was turned off and there were
no messages in the log files.

4. As this is a development environment, I turned logging on in the
config and restarted the DB

5. After restart, neither I, nor the Users could fully reconnect.

6. I have performed a SYS Level backup (tar of the whole postgres
directory tree)

7. I cannot perform a DB level Backup (same errors occur)

System Details

Linux Debian 7.11

Postgres 9.1 (9.1.24lts-0+d)

Please send your queries to pgsql-general@postgresql.org for database
server issues.
This is a pgAdmin support list.

-- Thanks, Ashesh

Show quoted text

It appears that we can connect to the DB Server itself as I get
“connection received” and “connection authorized” – but when trying to
access the DB itself, several errors are raised (see below).

I am assuming that some internals are no longer consistent – the file
“global/11801” for instance really does not exist on the system.

Questions:

1. Is there any way to recover from this (backup is unfortunately
rather old)

2. What are possible causes? I’d like to prevent this from
happening on my production servers.

** I am aware that we are on older releases, and yes, we plan to migrate
to more current releases “soon” ™ …

Thanks in advance.

John Boblitz

Exceprt from Log:

2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG: connection
received: host=192.168.250.50 port=28559

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG: connection
authorized: user=dbadmin database=postgres

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR: could not
open file "global/11801": No such file or directory

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT: SELECT
usecreatedb, usesuper, CASE WHEN usesuper THEN pg_postmaster_start_time()
ELSE NULL END as upsince, CASE WHEN usesuper THEN pg_conf_load_time() ELSE
NULL END as confloadedsince, CASE WHEN usesuper THEN pg_is_in_recovery()
ELSE NULL END as inrecovery, CASE WHEN usesuper THEN
pg_last_xlog_receive_location() ELSE NULL END as receiveloc, CASE WHEN
usesuper THEN pg_last_xlog_replay_location() ELSE NULL END as replayloc,
CASE WHEN usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as
replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN
pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused

FROM pg_user WHERE usename=current_user

2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR: could not
open file "global/11801": No such file or directory

2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT: SELECT
rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname = current_user;

2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG: connection
received: host=192.168.250.50 port=28561

2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG: connection
authorized: user=dbadmin database=g11Base

2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR: could not
open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No such file or
directory

2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT: SELECT
CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1

WHEN (nspname LIKE E'pg\\_%') THEN 0

ELSE 3 END AS nsptyp,

nsp.nspname, nsp.oid, pg_get_userbyid(nspowner) AS
namespaceowner, nspacl, description, has_schema_privilege(nsp.oid,
'CREATE') as cancreate,

(SELECT array_agg(label) FROM pg_seclabels sl1 WHERE
sl1.objoid=nsp.oid) AS labels,

(SELECT array_agg(provider) FROM pg_seclabels sl2 WHERE
sl2.objoid=nsp.oid) AS providers

FROM pg_namespace nsp

LEFT OUTER JOIN pg_description des ON
(des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass)

WHERE NOT ((nspname = 'pg_catalog' AND EXISTS (SELECT 1
FROM pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid LIMIT
1)) OR

(nspname = 'pgagent' AND EXISTS (SELECT 1 FROM pg_class
WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT 1)) OR

(nspname = 'information_schema' AND EXISTS (SELECT 1 FROM
pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid LIMIT 1)) OR

(nspname LIKE '_%' AND EXISTS (SELECT 1 FROM pg_proc WHERE
proname='slonyversion' AND pronamespace = nsp.oid LIMIT 1))

) AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname NOT
LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname

#4Boblitz John
john.boblitz@bertschi.com
In reply to: Ashesh Vashi (#3)

Forwarded from pgadmin-support – wrong list

From: Ashesh Vashi [mailto:ashesh.vashi@enterprisedb.com]
Sent: Mittwoch, 4. Juli 2018 12:14
To: Boblitz John <john.boblitz@bertschi.com>
Cc: pgadmin-support <pgadmin-support@postgresql.org>
Subject: Re: Unable to Connect to DB Instance

On Wed, Jul 4, 2018, 15:19 Boblitz John <john.boblitz@bertschi.com<mailto:john.boblitz@bertschi.com>> wrote:
Good Morning,

Beginning yesterday morning, users have been unable to fully connect to our DB Instance.

1. At the time of the initial report – I was connected to the DB via pgAdmin and could perform queries without problem.

2. Users reported messages similar to “could not open file "global/11801": No such file or directory”

3. At that time, connection logging was turned off and there were no messages in the log files.

4. As this is a development environment, I turned logging on in the config and restarted the DB

5. After restart, neither I, nor the Users could fully reconnect.

6. I have performed a SYS Level backup (tar of the whole postgres directory tree)

7. I cannot perform a DB level Backup (same errors occur)

System Details
Linux Debian 7.11
Postgres 9.1 (9.1.24lts-0+d)

Please send your queries to pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org> for database server issues.
This is a pgAdmin support list.

-- Thanks, Ashesh
It appears that we can connect to the DB Server itself as I get “connection received” and “connection authorized” – but when trying to access the DB itself, several errors are raised (see below).
I am assuming that some internals are no longer consistent – the file “global/11801” for instance really does not exist on the system.

Questions:

1. Is there any way to recover from this (backup is unfortunately rather old)

2. What are possible causes? I’d like to prevent this from happening on my production servers.

** I am aware that we are on older releases, and yes, we plan to migrate to more current releases “soon” ™ …

Thanks in advance.

John Boblitz

Exceprt from Log:
2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG: connection received: host=192.168.250.50 port=28559
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG: connection authorized: user=dbadmin database=postgres
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR: could not open file "global/11801": No such file or directory
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT: SELECT usecreatedb, usesuper, CASE WHEN usesuper THEN pg_postmaster_start_time() ELSE NULL END as upsince, CASE WHEN usesuper THEN pg_conf_load_time() ELSE NULL END as confloadedsince, CASE WHEN usesuper THEN pg_is_in_recovery() ELSE NULL END as inrecovery, CASE WHEN usesuper THEN pg_last_xlog_receive_location() ELSE NULL END as receiveloc, CASE WHEN usesuper THEN pg_last_xlog_replay_location() ELSE NULL END as replayloc, CASE WHEN usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused
FROM pg_user WHERE usename=current_user
2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR: could not open file "global/11801": No such file or directory
2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT: SELECT rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname = current_user;
2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG: connection received: host=192.168.250.50 port=28561
2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG: connection authorized: user=dbadmin database=g11Base
2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR: could not open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No such file or directory
2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT: SELECT CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1
WHEN (nspname LIKE E'pg\\_%') THEN 0
ELSE 3 END AS nsptyp,
nsp.nspname, nsp.oid, pg_get_userbyid(nspowner) AS namespaceowner, nspacl, description, has_schema_privilege(nsp.oid, 'CREATE') as cancreate,
(SELECT array_agg(label) FROM pg_seclabels sl1 WHERE sl1.objoid=nsp.oid) AS labels,
(SELECT array_agg(provider) FROM pg_seclabels sl2 WHERE sl2.objoid=nsp.oid) AS providers
FROM pg_namespace nsp
LEFT OUTER JOIN pg_description des ON (des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass)
WHERE NOT ((nspname = 'pg_catalog' AND EXISTS (SELECT 1 FROM pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid LIMIT 1)) OR
(nspname = 'pgagent' AND EXISTS (SELECT 1 FROM pg_class WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT 1)) OR
(nspname = 'information_schema' AND EXISTS (SELECT 1 FROM pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid LIMIT 1)) OR
(nspname LIKE '_%' AND EXISTS (SELECT 1 FROM pg_proc WHERE proname='slonyversion' AND pronamespace = nsp.oid LIMIT 1))
) AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname NOT LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Boblitz John (#4)
Re: Unable to Connect to DB Instance

Boblitz John <john.boblitz@bertschi.com> writes:

2. Users reported messages similar to "could not open file "global/11801": No such file or directory"

I'd try "select relname from pg_class where pg_relation_filenode(oid) =
11801" to see if you can identify the problematic relation that way.

If you're lucky, this is just loss of some system catalog index in which
case reindexing will fix it. The fact that you're able to get through
connecting, and the errors only show up with queries, is somewhat
promising given that the problem looks like it's related to pg_authid
or pg_db_role_setting.

regards, tom lane

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Boblitz John (#4)
Re: Unable to Connect to DB Instance

On 07/04/2018 04:08 AM, Boblitz John wrote:

Good Morning,

Beginning yesterday morning, users have been unable to fully connect
to our DB Instance.

1.At the time of the initial report – I was connected to the DB via
pgAdmin and could perform queries without problem.

2.Users reported messages similar to “could not open file
"global/11801": No such file or directory”

3.At that time, connection logging was turned off and there were no
messages in the log files.

4.As this is a development environment, I turned logging on in the
config and restarted the DB

5.After restart, neither I, nor the Users could fully reconnect.

6.I have performed a SYS Level backup (tar of the whole postgres
directory tree)

7.I cannot perform a DB level Backup (same errors occur)

System Details

                Linux Debian      7.11

                Postgres              9.1 (9.1.24lts-0+d)

It appears that we can connect to the DB Server itself as I get
“connection received” and “connection authorized” – but when trying
to access the DB itself, several errors are raised (see below).

I am assuming that some internals are no longer consistent – the
file “global/11801” for instance really does not exist on the system.

Questions:

1.Is there any way to recover from this (backup is unfortunately
rather old) >
2.What are possible causes?  I’d like to prevent this from happening
on my production servers.

Looks like something/someone deleted files from portions of the $DATA
directory. In particular from ~/global and ~/pg_tblspc. Without those
files you can't really proceed.

Can you recover by creating a new instance and restoring from a dump of
the production server?

As to exact cause the only thing I can think of is to to look at the
system logs at the time of the initial failure and see if there is
anything there that would shed light.

** I am aware that we are on older releases, and yes, we plan to
migrate to more current releases “soon” ™ …

Thanks in advance.

John Boblitz

Exceprt from Log:

2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG:
connection received: host=192.168.250.50 port=28559

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG:
connection authorized: user=dbadmin database=postgres

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR:  could
not open file "global/11801": No such file or directory

2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT:
SELECT usecreatedb, usesuper, CASE WHEN usesuper THEN
pg_postmaster_start_time() ELSE NULL END as upsince, CASE WHEN
usesuper THEN pg_conf_load_time() ELSE NULL END as confloadedsince,
CASE WHEN usesuper THEN pg_is_in_recovery() ELSE NULL END as
inrecovery, CASE WHEN usesuper THEN pg_last_xlog_receive_location()
ELSE NULL END as receiveloc, CASE WHEN usesuper THEN
pg_last_xlog_replay_location() ELSE NULL END as replayloc, CASE WHEN
usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as
replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN
pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused

                  FROM pg_user WHERE usename=current_user

2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR:  could
not open file "global/11801": No such file or directory

2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT:
SELECT rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname =
current_user;

2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG:
connection received: host=192.168.250.50 port=28561

2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG:
connection authorized: user=dbadmin database=g11Base

2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR:  could
not open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No
such file or directory

2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT:
SELECT CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1

                            WHEN (nspname LIKE E'pg\\_%') THEN 0

                            ELSE 3 END AS nsptyp,

                       nsp.nspname, nsp.oid,
pg_get_userbyid(nspowner) AS namespaceowner, nspacl,
description,       has_schema_privilege(nsp.oid, 'CREATE') as cancreate,

                (SELECT array_agg(label) FROM pg_seclabels sl1
WHERE sl1.objoid=nsp.oid) AS labels,

                (SELECT array_agg(provider) FROM pg_seclabels sl2
WHERE sl2.objoid=nsp.oid) AS providers

                  FROM pg_namespace nsp

                  LEFT OUTER JOIN pg_description des ON
(des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass)

                WHERE NOT ((nspname = 'pg_catalog' AND EXISTS
(SELECT 1 FROM pg_class WHERE relname = 'pg_class' AND relnamespace
= nsp.oid LIMIT 1)) OR

                (nspname = 'pgagent' AND EXISTS (SELECT 1 FROM
pg_class WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT
1)) OR

                (nspname = 'information_schema' AND EXISTS (SELECT
1 FROM pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid
LIMIT 1)) OR

                (nspname LIKE '_%' AND EXISTS (SELECT 1 FROM
pg_proc WHERE proname='slonyversion' AND pronamespace = nsp.oid
LIMIT 1))

                )  AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname
NOT LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname

--
Adrian Klaver
adrian.klaver@aklaver.com

#7Boblitz John
john.boblitz@bertschi.com
In reply to: Tom Lane (#5)
RE: Unable to Connect to DB Instance

Hello Tom,

Thanks - I get "pg_db_role_setting" as a response.

I have already attempted to reindex system but get:

NOTICE: table "pg_catalog.pg_class" was reindexed
NOTICE: table "pg_catalog.pg_statistic" was reindexed
NOTICE: table "pg_catalog.pg_type" was reindexed
NOTICE: table "pg_catalog.pg_attribute" was reindexed
NOTICE: table "pg_catalog.pg_authid" was reindexed
ERROR: could not open file "base/11919/11680": No such file or directory

John

Show quoted text

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Mittwoch, 4. Juli 2018 17:50
To: Boblitz John <john.boblitz@bertschi.com>
Cc: pgsql-general@postgresql.org
Subject: Re: Unable to Connect to DB Instance

Boblitz John <john.boblitz@bertschi.com> writes:

2. Users reported messages similar to "could not open file

"global/11801": No such file or directory"

I'd try "select relname from pg_class where pg_relation_filenode(oid) =
11801" to see if you can identify the problematic relation that way.

If you're lucky, this is just loss of some system catalog index in which case
reindexing will fix it. The fact that you're able to get through connecting,
and the errors only show up with queries, is somewhat promising given that
the problem looks like it's related to pg_authid or pg_db_role_setting.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Boblitz John (#7)
Re: Unable to Connect to DB Instance

Boblitz John <john.boblitz@bertschi.com> writes:

Thanks - I get "pg_db_role_setting" as a response.

Hm ... not its index? If the table itself is gone, it's surprising
that you can get through session startup.

I have already attempted to reindex system but get:

I had in mind just reindexing the specific table you're having trouble
with ... but this:

ERROR: could not open file "base/11919/11680": No such file or directory

shows that there's another table that also has a problem, and there
may be more :-(. I don't know what the odds are that you can get out
of this completely. I would NOT recommend "reindex system" as a
blunderbuss solution. You do not know how much is corrupted and there's
a significant chance of making things worse by tromping over the whole
database using catalogs of uncertain reliability.

Did you identify which table 11680 is?

In the case of pg_db_role_setting, a possible solution is to "touch" the
missing file so it exists; it'll be empty, which means that you'll have
lost any ALTER DATABASE/ROLE SET settings, but that's better than not
being able to dump at all. (You might then need to REINDEX
pg_db_role_setting to get its indexes in sync with it being empty.)

Whether an equally drastic answer is tolerable for your other missing
table(s) depends on what they are...

regards, tom lane

#9Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom Lane (#8)
Re: Unable to Connect to DB Instance

On 07/04/2018 12:36 PM, Tom Lane wrote:

Boblitz John <john.boblitz@bertschi.com> writes:

Thanks - I get "pg_db_role_setting" as a response.

Hm ... not its index? If the table itself is gone, it's surprising
that you can get through session startup.

I have already attempted to reindex system but get:

I had in mind just reindexing the specific table you're having trouble
with ... but this:

ERROR: could not open file "base/11919/11680": No such file or directory

shows that there's another table that also has a problem, and there
may be more :-(. I don't know what the odds are that you can get out
of this completely. I would NOT recommend "reindex system" as a
blunderbuss solution. You do not know how much is corrupted and there's
a significant chance of making things worse by tromping over the whole
database using catalogs of uncertain reliability.

Did you identify which table 11680 is?

There is also this from the OP:

2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR: could not
open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No such file
or directory

Which failed on a query that references pgAgent and Slony.

So are you using either or both of those programs?

In the case of pg_db_role_setting, a possible solution is to "touch" the
missing file so it exists; it'll be empty, which means that you'll have
lost any ALTER DATABASE/ROLE SET settings, but that's better than not
being able to dump at all. (You might then need to REINDEX
pg_db_role_setting to get its indexes in sync with it being empty.)

Whether an equally drastic answer is tolerable for your other missing
table(s) depends on what they are...

regards, tom lane

--
Adrian Klaver
adrian.klaver@aklaver.com