Re: Issue with pg_dump due to Schema OID Error
I forgot to attache the script.
On Thu, Dec 19, 2024 at 10:41 AM Renzo Dani <renzo.dani@gmail.com> wrote:
Show quoted text
Hi,
Recently, I encountered a problem during a database export using pg_dump.
Here is the error message:
pg_dump: last built-in OID is 16383
pg_dump: reading extensions
pg_dump: identifying extension members
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading user-defined functions
pg_dump: error: schema with OID 41960442 does not exist
To investigate the issue, I ran the following query:
SELECT * FROM pg_proc WHERE pronamespace = 41960442;
The result:
oid;proname;pronamespace;proowner;prolang;procost;prorows;provariadic;
prosupport;prokind;prosecdef;proleakproof;proisstrict;proretset;
provolatile;proparallel;pronargs;pronargdefaults;prorettype;proargtypes;
proallargtypes;proargmodes;proargnames;proargdefaults;protrftypes;prosrc;
probin;prosqlbody;proconfig;proacl41966618;remapprotocoltypeids
;41960442;19214494;13547;100;0;0;-;f;f;f;f;f;v;u;1;0;25;25;;;{
pprotocoltypeids};;;I resolved the issue by removing the problematic record (admin privileges
required):DELETE FROM pg_proc WHERE oid = 41966618;
This situation seems inconsistent and likely should not occur under normal
conditions.While I’m unsure exactly when this issue originated in our environment, I
was able to reproduce it by performing concurrent modifications on the
schema.To demonstrate, I wrote a bash script (test_bug.sh) that starts two
threads running in parallel.Each thread drops the schema with CASCADE and recreates it using the SQL
script search_bug.sql.To use the script, you’ll need to adapt two variables at the beginning of
the script: PGPASSWORD and URL.Using this script, I reproduced the problem on PostgreSQL versions 16.1
and 17.1.It typically takes less than a minute to trigger the issue.
The script terminates automatically as soon as the problem is detected.
Here are additional references that might be related to this issue:
/messages/by-id/20110209003823.GA93840@mr-paradox.net
/messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com
Let me know if you need additional information.
Best regards
Renzo
Import Notes
Reply to msg id not found: CA+XOKQCwLv_oJ+bFO7hbhzM_fZU7Kvst7uEZfYNNrSnzhT=kxQ@mail.gmail.comReference msg id not found: CA+XOKQCwLv_oJ+bFO7hbhzM_fZU7Kvst7uEZfYNNrSnzhT=kxQ@mail.gmail.com
On 12/19/24 01:43, arons wrote:
I forgot to attache the script.
In MyTestBugSchema01.baseProc() you meant to have:
select MyTestBugSchema.afunction( u.username )
instead of
select MyTestBugSchema2.afunction( u.username )
In other words there is a predefined MyTestBugSchema?
On Thu, Dec 19, 2024 at 10:41 AM Renzo Dani <renzo.dani@gmail.com
<mailto:renzo.dani@gmail.com>> wrote:Hi,
Recently, I encountered a problem during a database export using
pg_dump.Here is the error message:
pg_dump: last built-in OID is 16383
pg_dump: reading extensions
pg_dump: identifying extension members
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading user-defined functions
pg_dump: error: schema with OID 41960442 does not exist
To investigate the issue, I ran the following query:
SELECT * FROM pg_proc WHERE pronamespace = 41960442;
The result:
oid;proname;pronamespace;proowner;prolang;procost;prorows;provariadic;prosupport;prokind;prosecdef;proleakproof;proisstrict;proretset;provolatile;proparallel;pronargs;pronargdefaults;prorettype;proargtypes;proallargtypes;proargmodes;proargnames;proargdefaults;protrftypes;prosrc;probin;prosqlbody;proconfig;proacl
41966618;remapprotocoltypeids;41960442;19214494;13547;100;0;0;-;f;f;f;f;f;v;u;1;0;25;25;;;{pprotocoltypeids};;;
I resolved the issue by removing the problematic record (admin
privileges required):DELETE FROM pg_proc WHERE oid = 41966618;
This situation seems inconsistent and likely should not occur under
normal conditions.While I’m unsure exactly when this issue originated in our
environment, I was able to reproduce it by performing concurrent
modifications on the schema.To demonstrate, I wrote a bash script (test_bug.sh) that starts two
threads running in parallel.Each thread drops the schema with CASCADE and recreates it using the
SQL script search_bug.sql.To use the script, you’ll need to adapt two variables at the
beginning of the script: PGPASSWORD and URL.Using this script, I reproduced the problem on PostgreSQL versions
16.1 and 17.1.It typically takes less than a minute to trigger the issue.
The script terminates automatically as soon as the problem is detected.
Here are additional references that might be related to this issue:
/messages/by-id/20110209003823.GA93840@mr-paradox.net </messages/by-id/20110209003823.GA93840@mr-paradox.net>
/messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com>
Let me know if you need additional information.
Best regards
Renzo
--
Adrian Klaver
adrian.klaver@aklaver.com
On 12/19/24 08:21, Renzo Dani wrote:
Reply to list also.
Ccing list.
HI Adrian,
you are right, there is a typo, the correct would be to have:
MyTestBugSchema2.afunction( u.username )In any case the problem appears in my tests also with that script.
To be clear the test script did not use MyTestBugSchema2.afunction(
u.username ) but instead MyTestBugSchema.afunction( u.username ).
If that is the case where did MyTestBugSchema come from?
I think the execution is not really important, I suppose what create the
problem is the drop cascade of the schema.
It is important if someone wants to replicate the test case.
BR
RenzoOn Thu, Dec 19, 2024 at 5:15 PM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:On 12/19/24 01:43, arons wrote:
I forgot to attache the script.
In MyTestBugSchema01.baseProc() you meant to have:
select MyTestBugSchema.afunction( u.username )
instead of
select MyTestBugSchema2.afunction( u.username )
In other words there is a predefined MyTestBugSchema?
On Thu, Dec 19, 2024 at 10:41 AM Renzo Dani <renzo.dani@gmail.com
<mailto:renzo.dani@gmail.com>
<mailto:renzo.dani@gmail.com <mailto:renzo.dani@gmail.com>>> wrote:
Hi,
Recently, I encountered a problem during a database export using
pg_dump.Here is the error message:
pg_dump: last built-in OID is 16383
pg_dump: reading extensions
pg_dump: identifying extension members
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading user-defined functions
pg_dump: error: schema with OID 41960442 does not exist
To investigate the issue, I ran the following query:
SELECT * FROM pg_proc WHERE pronamespace = 41960442;
The result:
oid;proname;pronamespace;proowner;prolang;procost;prorows;provariadic;prosupport;prokind;prosecdef;proleakproof;proisstrict;proretset;provolatile;proparallel;pronargs;pronargdefaults;prorettype;proargtypes;proallargtypes;proargmodes;proargnames;proargdefaults;protrftypes;prosrc;probin;prosqlbody;proconfig;proacl
41966618;remapprotocoltypeids;41960442;19214494;13547;100;0;0;-;f;f;f;f;f;v;u;1;0;25;25;;;{pprotocoltypeids};;;
I resolved the issue by removing the problematic record (admin
privileges required):DELETE FROM pg_proc WHERE oid = 41966618;
This situation seems inconsistent and likely should not occur
under
normal conditions.
While I’m unsure exactly when this issue originated in our
environment, I was able to reproduce it by performing concurrent
modifications on the schema.To demonstrate, I wrote a bash script (test_bug.sh) that
starts two
threads running in parallel.
Each thread drops the schema with CASCADE and recreates it
using the
SQL script search_bug.sql.
To use the script, you’ll need to adapt two variables at the
beginning of the script: PGPASSWORD and URL.Using this script, I reproduced the problem on PostgreSQL
versions
16.1 and 17.1.
It typically takes less than a minute to trigger the issue.
The script terminates automatically as soon as the problem is
detected.
Here are additional references that might be related to this
issue:
/messages/by-id/20110209003823.GA93840@mr-paradox.net </messages/by-id/20110209003823.GA93840@mr-paradox.net> </messages/by-id/20110209003823.GA93840@mr-paradox.net </messages/by-id/20110209003823.GA93840@mr-paradox.net>>
/messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com> </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com>>
Let me know if you need additional information.
Best regards
Renzo
--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
--
Adrian Klaver
adrian.klaver@aklaver.com
Import Notes
Reply to msg id not found: CA+XOKQBXogzXP1bduZ=ppt2XscWhwc69F-19ua8bpcvebLAQOw@mail.gmail.comReference msg id not found: CA+XOKQCwLv_oJ+bFO7hbhzM_fZU7Kvst7uEZfYNNrSnzhT=kxQ@mail.gmail.com
On 12/19/24 08:46, Renzo Dani wrote:
Again.
Reply to list also using Reply All.
Ccing list.
Hi Adrian,
here a new version of the script that I just tested produce the same
problem.
The script do not rely now on any additional relation than the ones
defined into the script.BR
RenzoOn Thu, Dec 19, 2024 at 5:28 PM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:On 12/19/24 08:21, Renzo Dani wrote:
Reply to list also.
Ccing list.HI Adrian,
you are right, there is a typo, the correct would be to have:
MyTestBugSchema2.afunction( u.username )In any case the problem appears in my tests also with that script.
To be clear the test script did not use MyTestBugSchema2.afunction(
u.username ) but instead MyTestBugSchema.afunction( u.username ).If that is the case where did MyTestBugSchema come from?
I think the execution is not really important, I suppose what
create the
problem is the drop cascade of the schema.
It is important if someone wants to replicate the test case.
BR
RenzoOn Thu, Dec 19, 2024 at 5:15 PM Adrian Klaver
<adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
<mailto:adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>> wrote:
On 12/19/24 01:43, arons wrote:
> I forgot to attache the script.In MyTestBugSchema01.baseProc() you meant to have:
select MyTestBugSchema.afunction( u.username )
instead of
select MyTestBugSchema2.afunction( u.username )
In other words there is a predefined MyTestBugSchema?
>
> On Thu, Dec 19, 2024 at 10:41 AM Renzo Dani<renzo.dani@gmail.com <mailto:renzo.dani@gmail.com>
<mailto:renzo.dani@gmail.com <mailto:renzo.dani@gmail.com>>
> <mailto:renzo.dani@gmail.com <mailto:renzo.dani@gmail.com><mailto:renzo.dani@gmail.com <mailto:renzo.dani@gmail.com>>>> wrote:
>
> Hi,
>
>
> Recently, I encountered a problem during a databaseexport using
> pg_dump.
>
>
> Here is the error message:
>
>
> pg_dump: last built-in OID is 16383
>
> pg_dump: reading extensions
>
> pg_dump: identifying extension members
>
> pg_dump: reading schemas
>
> pg_dump: reading user-defined tables
>
> pg_dump: reading user-defined functions
>
> pg_dump: error: schema with OID 41960442 does not exist
>
>
> To investigate the issue, I ran the following query:
>
>
> SELECT * FROM pg_proc WHERE pronamespace = 41960442;
>
>
> The result:
>
>
>oid;proname;pronamespace;proowner;prolang;procost;prorows;provariadic;prosupport;prokind;prosecdef;proleakproof;proisstrict;proretset;provolatile;proparallel;pronargs;pronargdefaults;prorettype;proargtypes;proallargtypes;proargmodes;proargnames;proargdefaults;protrftypes;prosrc;probin;prosqlbody;proconfig;proacl
>
>41966618;remapprotocoltypeids;41960442;19214494;13547;100;0;0;-;f;f;f;f;f;v;u;1;0;25;25;;;{pprotocoltypeids};;;
>
>
> I resolved the issue by removing the problematicrecord (admin
> privileges required):
>
>
> DELETE FROM pg_proc WHERE oid = 41966618;
>
>
> This situation seems inconsistent and likely shouldnot occur
under
> normal conditions.
>
>
> While I’m unsure exactly when this issue originated in our
> environment, I was able to reproduce it by performingconcurrent
> modifications on the schema.
>
>
> To demonstrate, I wrote a bash script (test_bug.sh) that
starts two
> threads running in parallel.
>
> Each thread drops the schema with CASCADE and recreates it
using the
> SQL script search_bug.sql.
>
>
> To use the script, you’ll need to adapt two variablesat the
> beginning of the script: PGPASSWORD and URL.
>
>
> Using this script, I reproduced the problem on PostgreSQL
versions
> 16.1 and 17.1.
>
> It typically takes less than a minute to trigger theissue.
>
> The script terminates automatically as soon as theproblem is
detected.
>
>
> Here are additional references that might be relatedto this
issue:
>
>
>/messages/by-id/20110209003823.GA93840@mr-paradox.net </messages/by-id/20110209003823.GA93840@mr-paradox.net> </messages/by-id/20110209003823.GA93840@mr-paradox.net </messages/by-id/20110209003823.GA93840@mr-paradox.net>> </messages/by-id/20110209003823.GA93840@mr-paradox.net </messages/by-id/20110209003823.GA93840@mr-paradox.net> </messages/by-id/20110209003823.GA93840@mr-paradox.net </messages/by-id/20110209003823.GA93840@mr-paradox.net>>>
>
>/messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com> </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com>> </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com> </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com </messages/by-id/BB8AF37F-E3D9-4DE0-B398-AF89748704F5@bandwidth.com>>>
>
>
>
> Let me know if you need additional information.
>
>
> Best regards
>
> Renzo
>--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com><mailto:adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>
--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
--
Adrian Klaver
adrian.klaver@aklaver.com
Attachments:
Import Notes
Reply to msg id not found: CA+XOKQDaHqA0eGU15fXVg904Kt-u3TwRrHDpthOEs6JdykQVkQ@mail.gmail.comReference msg id not found: CA+XOKQCwLv_oJ+bFO7hbhzM_fZU7Kvst7uEZfYNNrSnzhT=kxQ@mail.gmail.com
On 12/19/24 08:50, Adrian Klaver wrote:
On 12/19/24 08:46, Renzo Dani wrote:
Again.
Reply to list also using Reply All.
Ccing list.Hi Adrian,
here a new version of the script that I just tested produce the same
problem.
The script do not rely now on any additional relation than the ones
defined into the script.
I could not replicate using Ubuntu 22.04 and Postgres 16.6.
From your original post:
"I reproduced the problem on PostgreSQL versions 16.1 and 17.1."
The latest bug fixes of each are 16.6 and 17.2. I would upgrade to those
and then try again.
BR
Renzo
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi Adrian,
Thanks for the test, I installed also version 17.2 and test with a fresh
newly created db.
At the beginning I cannot reproduce the problem as well.
Than I start adding the extension we use, one after one, and testing in
between.
I found out that the problem appear immediately as soon as I add the
extension pg_stat_statements:
create extension pg_stat_statements WITH SCHEMA public;
alter system set shared_preload_libraries = 'pg_stat_statements';
Can you maybe test as well with this extension?
the other that I installed before that are: pgcrypto and pg_prewarm.
here the current installed one:
dbtest=# \dx
List of installed extensions
Name | Version | Schema |
Description
--------------------+---------+------------+------------------------------------------------------------------------
pg_prewarm | 1.2 | public | prewarm relation data
pg_stat_statements | 1.11 | public | track planning and execution
statistics of all SQL statements executed
pgcrypto | 1.3 | public | cryptographic functions
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(4 rows)
dbtest=# select version();
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.2 reda build on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit
(1 row)
I'll do some additional tests.
BR
Renzo
On Thu, Dec 19, 2024 at 6:21 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 12/19/24 08:50, Adrian Klaver wrote:
On 12/19/24 08:46, Renzo Dani wrote:
Again.
Reply to list also using Reply All.
Ccing list.Hi Adrian,
here a new version of the script that I just tested produce the same
problem.
The script do not rely now on any additional relation than the ones
defined into the script.I could not replicate using Ubuntu 22.04 and Postgres 16.6.
From your original post:
"I reproduced the problem on PostgreSQL versions 16.1 and 17.1."
The latest bug fixes of each are 16.6 and 17.2. I would upgrade to those
and then try again.BR
Renzo--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes:
The latest bug fixes of each are 16.6 and 17.2. I would upgrade to those
and then try again.
Highly unlikely to make any difference. What's evidently going on
here is that the test script attempts to do DROP SCHEMA concurrently
with another session that's creating an object inside that schema.
(Here, that's a function, but the particular type of object doesn't
really matter.) There are three possible outcomes of that:
1. The object creation commits soon enough that DROP SCHEMA sees it,
and drops the object along with the schema.
2. The object creation begins after DROP SCHEMA commits, and fails
because the schema is not to be found.
3. The object creation goes through, leaving a now-dangling schema
OID reference in the object's catalog entry. The object is useless
because it's unnamable, but it won't really cause any trouble except
for applications that scan the system catalogs (like pg_dump).
Exactly none of these outcomes result in a usable object, so
one wonders why your application is doing this sort of thing
often enough to hit the race condition.
We could prevent case 3 by locking the schema during object creation,
converting it to one of the other cases. We actually do that for
tables, but not for any other object types, reasoning that the greatly
increased cost of locking would outweigh the problems that dangling
objects create. (Note that to eliminate the issue fully, we'd have to
lock every referenced object not only schemas; for example, also the
data types of the function's arguments and result.) Also, adding such
locking might well lead to deadlocks in concurrent add/drop scenarios,
not just performance costs.
tl;dr: it's been like this a long time, and I don't really foresee
us accepting the costs of making it not act like that. I seem to
recall someone submitting a patch recently that would add such
locking, but I doubt it'll get accepted.
regards, tom lane
HI Adrian,
I did additional tests and I can reproduce the problem also without the
extensions.
I did the test multiple time with a script that I put as attachment
(wholetest.sh).
I added also the file log_different_run.txt that contains the results of
some test and the times it takes to reproduce.
BR
Renzo
On Thu, Dec 19, 2024 at 9:42 PM Renzo Dani <arons7@gmail.com> wrote:
Show quoted text
Hi Adrian,
Thanks for the test, I installed also version 17.2 and test with a fresh
newly created db.
At the beginning I cannot reproduce the problem as well.
Than I start adding the extension we use, one after one, and testing in
between.I found out that the problem appear immediately as soon as I add the
extension pg_stat_statements:create extension pg_stat_statements WITH SCHEMA public;
alter system set shared_preload_libraries = 'pg_stat_statements';
Can you maybe test as well with this extension?
the other that I installed before that are: pgcrypto and pg_prewarm.here the current installed one:
dbtest=# \dx
List of installed extensions
Name | Version | Schema |
Description--------------------+---------+------------+------------------------------------------------------------------------
pg_prewarm | 1.2 | public | prewarm relation data
pg_stat_statements | 1.11 | public | track planning and execution
statistics of all SQL statements executed
pgcrypto | 1.3 | public | cryptographic functions
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(4 rows)dbtest=# select version();
version------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.2 reda build on x86_64-pc-linux-gnu, compiled by gcc
(Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit
(1 row)I'll do some additional tests.
BR
RenzoOn Thu, Dec 19, 2024 at 6:21 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:On 12/19/24 08:50, Adrian Klaver wrote:
On 12/19/24 08:46, Renzo Dani wrote:
Again.
Reply to list also using Reply All.
Ccing list.Hi Adrian,
here a new version of the script that I just tested produce the same
problem.
The script do not rely now on any additional relation than the ones
defined into the script.I could not replicate using Ubuntu 22.04 and Postgres 16.6.
From your original post:
"I reproduced the problem on PostgreSQL versions 16.1 and 17.1."
The latest bug fixes of each are 16.6 and 17.2. I would upgrade to those
and then try again.BR
Renzo--
Adrian Klaver
adrian.klaver@aklaver.com