Avoid orphaned objects dependencies, take 3
Hi,
This new thread is a follow-up of [1]/messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net and [2]/messages/by-id/8369ff70-0e31-f194-2954-787f4d9e21dd@amazon.com.
Problem description:
We have occasionally observed objects having an orphaned dependency, the
most common case we have seen is functions not linked to any namespaces.
Examples to produce such orphaned dependencies:
Scenario 1:
session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;
With the above, the function created in session 2 would be linked to a non
existing schema.
Scenario 2:
session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;
With the above, the function created in session 1 would be linked to a non
existing schema.
A patch has been initially proposed to fix this particular
(function-to-namespace) dependency (see [1]/messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net), but there could be much
more scenarios (like the function-to-datatype one highlighted by Gilles
in [1]/messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net that could lead to a function having an invalid parameter datatype).
As Tom said there are dozens more cases that would need to be
considered, and a global approach to avoid those race conditions should
be considered instead.
A first global approach attempt has been proposed in [2]/messages/by-id/8369ff70-0e31-f194-2954-787f4d9e21dd@amazon.com making use of a dirty
snapshot when recording the dependency. But this approach appeared to be "scary"
and it was still failing to close some race conditions (see [2]/messages/by-id/8369ff70-0e31-f194-2954-787f4d9e21dd@amazon.com for details).
Then, Tom proposed another approach in [2]/messages/by-id/8369ff70-0e31-f194-2954-787f4d9e21dd@amazon.com which is that "creation DDL will have
to take a lock on each referenced object that'd conflict with a lock taken by
DROP".
This is what the attached patch is trying to achieve.
It does the following:
1) A new lock (that conflicts with a lock taken by DROP) has been put in place
when the dependencies are being recorded.
Thanks to it, the drop schema in scenario 2 would be locked (resulting in an
error should session 1 committs).
2) After locking the object while recording the dependency, the patch checks
that the object still exists.
Thanks to it, session 2 in scenario 1 would be locked and would report an error
once session 1 committs (that would not be the case should session 1 abort the
transaction).
The patch also adds a few tests for some dependency cases (that would currently
produce orphaned objects):
- schema and function (as the above scenarios)
- function and type
- table and type (which is I think problematic enough, as involving a table into
the game, to fix this stuff as a whole).
[1]: /messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net
[2]: /messages/by-id/8369ff70-0e31-f194-2954-787f4d9e21dd@amazon.com
Please note that I'm not used to with this area of the code so that the patch
might not take the option proposed by Tom the "right" way.
Adding the patch to the July CF.
Looking forward to your feedback,
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Avoid-orphaned-objects-dependencies.patchtext/x-diff; charset=us-asciiDownload+226-4
Hi Bertrand,
22.04.2024 11:45, Bertrand Drouvot wrote:
Hi,
This new thread is a follow-up of [1] and [2].
Problem description:
We have occasionally observed objects having an orphaned dependency, the
most common case we have seen is functions not linked to any namespaces....
Looking forward to your feedback,
This have reminded me of bug #17182 [1]/messages/by-id/17182-a6baa001dd1784be@postgresql.org.
Unfortunately, with the patch applied, the following script:
for ((i=1;i<=100;i++)); do
( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 &
echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
" | psql >psql2.log 2>&1 &
wait
psql -c "DROP SCHEMA s CASCADE" >psql3.log
done
echo "
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql
still ends with:
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG: server process (PID 1388378) was terminated by signal 11:
Segmentation fault
2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL: Failed process was running: SELECT
pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL
[1]: /messages/by-id/17182-a6baa001dd1784be@postgresql.org
Best regards,
Alexander
Hi,
On Mon, Apr 22, 2024 at 01:00:00PM +0300, Alexander Lakhin wrote:
Hi Bertrand,
22.04.2024 11:45, Bertrand Drouvot wrote:
Hi,
This new thread is a follow-up of [1] and [2].
Problem description:
We have occasionally observed objects having an orphaned dependency, the
most common case we have seen is functions not linked to any namespaces....
Looking forward to your feedback,
This have reminded me of bug #17182 [1].
Thanks for the link to the bug!
Unfortunately, with the patch applied, the following script:
for ((i=1;i<=100;i++)); do
� ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 &
� echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
� "� | psql >psql2.log 2>&1 &
� wait
� psql -c "DROP SCHEMA s CASCADE" >psql3.log
done
echo "
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
� LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psqlstill ends with:
server closed the connection unexpectedly
������� This probably means the server terminated abnormally
������� before or while processing the request.2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG:� server process (PID
1388378) was terminated by signal 11: Segmentation fault
2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL:� Failed process was
running: SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid
FROM pg_proc pp
����� LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL
Thanks for sharing the script.
That's weird, I just launched it several times with the patch applied and I'm not
able to see the seg fault (while I can see it constently failing on the master
branch).
Are you 100% sure you tested it against a binary with the patch applied?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
22.04.2024 13:52, Bertrand Drouvot wrote:
That's weird, I just launched it several times with the patch applied and I'm not
able to see the seg fault (while I can see it constently failing on the master
branch).Are you 100% sure you tested it against a binary with the patch applied?
Yes, at least I can't see what I'm doing wrong. Please try my
self-contained script attached.
Best regards,
Alexander
Attachments:
repro-17182.sh.txttext/plain; charset=UTF-8; name=repro-17182.sh.txtDownload
Hi,
On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
22.04.2024 13:52, Bertrand Drouvot wrote:
That's weird, I just launched it several times with the patch applied and I'm not
able to see the seg fault (while I can see it constently failing on the master
branch).Are you 100% sure you tested it against a binary with the patch applied?
Yes, at least I can't see what I'm doing wrong. Please try my
self-contained script attached.
Thanks for sharing your script!
Yeah your script ensures the patch is applied before the repro is executed.
I do confirm that I can also see the issue with the patch applied (but I had to
launch multiple attempts, while on master one attempt is enough).
I'll have a look.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Tue, Apr 23, 2024 at 04:59:09AM +0000, Bertrand Drouvot wrote:
Hi,
On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
22.04.2024 13:52, Bertrand Drouvot wrote:
That's weird, I just launched it several times with the patch applied and I'm not
able to see the seg fault (while I can see it constently failing on the master
branch).Are you 100% sure you tested it against a binary with the patch applied?
Yes, at least I can't see what I'm doing wrong. Please try my
self-contained script attached.Thanks for sharing your script!
Yeah your script ensures the patch is applied before the repro is executed.
I do confirm that I can also see the issue with the patch applied (but I had to
launch multiple attempts, while on master one attempt is enough).I'll have a look.
Please find attached v2 that should not produce the issue anymore (I launched a
lot of attempts without any issues). v1 was not strong enough as it was not
always checking for the dependent object existence. v2 now always checks if the
object still exists after the additional lock acquisition attempt while recording
the dependency.
I still need to think about v2 but in the meantime could you please also give
v2 a try on you side?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-Avoid-orphaned-objects-dependencies.patchtext/x-diff; charset=us-asciiDownload+245-4
Hi,
On Tue, Apr 23, 2024 at 04:20:46PM +0000, Bertrand Drouvot wrote:
Please find attached v2 that should not produce the issue anymore (I launched a
lot of attempts without any issues). v1 was not strong enough as it was not
always checking for the dependent object existence. v2 now always checks if the
object still exists after the additional lock acquisition attempt while recording
the dependency.I still need to think about v2 but in the meantime could you please also give
v2 a try on you side?
I gave more thought to v2 and the approach seems reasonable to me. Basically what
it does is that in case the object is already dropped before we take the new lock
(while recording the dependency) then the error message is a "generic" one (means
it does not provide the description of the "already" dropped object). I think it
makes sense to write the patch that way by abandoning the patch's ambition to
tell the description of the dropped object in all the cases.
Of course, I would be happy to hear others thought about it.
Please find v3 attached (which is v2 with more comments).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Avoid-orphaned-objects-dependencies.patchtext/x-diff; charset=us-asciiDownload+251-4
Hi Bertrand,
24.04.2024 11:38, Bertrand Drouvot wrote:
Please find attached v2 that should not produce the issue anymore (I launched a
lot of attempts without any issues). v1 was not strong enough as it was not
always checking for the dependent object existence. v2 now always checks if the
object still exists after the additional lock acquisition attempt while recording
the dependency.I still need to think about v2 but in the meantime could you please also give
v2 a try on you side?I gave more thought to v2 and the approach seems reasonable to me. Basically what
it does is that in case the object is already dropped before we take the new lock
(while recording the dependency) then the error message is a "generic" one (means
it does not provide the description of the "already" dropped object). I think it
makes sense to write the patch that way by abandoning the patch's ambition to
tell the description of the dropped object in all the cases.Of course, I would be happy to hear others thought about it.
Please find v3 attached (which is v2 with more comments).
Thank you for the improved version!
I can confirm that it fixes that case.
I've also tested other cases that fail on master (most of them also fail
with v1), please try/look at the attached script. (There could be other
broken-dependency cases, of course, but I think I've covered all the
representative ones.)
All tested cases work correctly with v3 applied — I couldn't get broken
dependencies, though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which assumed to be not easily triggered by users.
Best regards,
Alexander
Attachments:
test-broken-deps.sh.txttext/plain; charset=UTF-8; name=test-broken-deps.sh.txtDownload
Hi,
On Wed, Apr 24, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
24.04.2024 11:38, Bertrand Drouvot wrote:
I gave more thought to v2 and the approach seems reasonable to me. Basically what
it does is that in case the object is already dropped before we take the new lock
(while recording the dependency) then the error message is a "generic" one (means
it does not provide the description of the "already" dropped object). I think it
makes sense to write the patch that way by abandoning the patch's ambition to
tell the description of the dropped object in all the cases.Of course, I would be happy to hear others thought about it.
Please find v3 attached (which is v2 with more comments).
Thank you for the improved version!
I can confirm that it fixes that case.
Great, thanks for the test!
I've also tested other cases that fail on master (most of them also fail
with v1), please try/look at the attached script.
Thanks for all those tests!
(There could be other broken-dependency cases, of course, but I think I've
covered all the representative ones.)
Agree. Furthermore the way the patch is written should be agnostic to the
object's kind that are part of the dependency. Having said that, that does not
hurt to add more tests in this patch, so v4 attached adds some of your tests (
that would fail on the master branch without the patch applied).
The way the tests are written in the patch are less "racy" that when triggered
with your script. Indeed, I think that in the patch the dependent object can not
be removed before the new lock is taken when recording the dependency. We may
want to add injection points in the game if we feel the need.
All tested cases work correctly with v3 applied —
Yeah, same on my side, I did run them too and did not find any issues.
I couldn't get broken
dependencies,
Same here.
though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which assumed to be not easily triggered by users.
I did not see any of those "cache lookup failed" during my testing with/without
your script. During which test(s) did you see those with v3 applied?
Attached v4, simply adding more tests to v3.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0001-Avoid-orphaned-objects-dependencies.patchtext/x-diff; charset=us-asciiDownload+354-8
Hi,
25.04.2024 08:00, Bertrand Drouvot wrote:
though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which assumed to be not easily triggered by users.I did not see any of those "cache lookup failed" during my testing with/without
your script. During which test(s) did you see those with v3 applied?
You can try, for example, table-trigger, or other tests that check for
"cache lookup failed" psql output only (maybe you'll need to increase the
iteration count). For instance, I've got (with v4 applied):
2024-04-25 05:48:08.102 UTC [3638763] ERROR: cache lookup failed for function 20893
2024-04-25 05:48:08.102 UTC [3638763] STATEMENT: CREATE TRIGGER modified_c1 BEFORE UPDATE OF c ON t
FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE trigger_func('modified_c');
Or with function-function:
2024-04-25 05:52:31.390 UTC [3711897] ERROR: cache lookup failed for function 32190 at character 54
2024-04-25 05:52:31.390 UTC [3711897] STATEMENT: CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
--
2024-04-25 05:52:37.639 UTC [3720011] ERROR: cache lookup failed for function 34465 at character 54
2024-04-25 05:52:37.639 UTC [3720011] STATEMENT: CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
Attached v4, simply adding more tests to v3.
Thank you for working on this!
Best regards,
Alexander
Hi,
On Thu, Apr 25, 2024 at 09:00:00AM +0300, Alexander Lakhin wrote:
Hi,
25.04.2024 08:00, Bertrand Drouvot wrote:
though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which assumed to be not easily triggered by users.I did not see any of those "cache lookup failed" during my testing with/without
your script. During which test(s) did you see those with v3 applied?You can try, for example, table-trigger, or other tests that check for
"cache lookup failed" psql output only (maybe you'll need to increase the
iteration count). For instance, I've got (with v4 applied):
2024-04-25 05:48:08.102 UTC [3638763] ERROR:� cache lookup failed for function 20893
2024-04-25 05:48:08.102 UTC [3638763] STATEMENT:� CREATE TRIGGER modified_c1 BEFORE UPDATE OF c ON t
������� FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE trigger_func('modified_c');Or with function-function:
2024-04-25 05:52:31.390 UTC [3711897] ERROR:� cache lookup failed for function 32190 at character 54
2024-04-25 05:52:31.390 UTC [3711897] STATEMENT:� CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
--
2024-04-25 05:52:37.639 UTC [3720011] ERROR:� cache lookup failed for function 34465 at character 54
2024-04-25 05:52:37.639 UTC [3720011] STATEMENT:� CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
I see, so this is during object creation.
It's easy to reproduce this kind of issue with gdb. For example set a breakpoint
on SearchSysCache1() and during the create function f1() once it breaks on:
#0 SearchSysCache1 (cacheId=45, key1=16400) at syscache.c:221
#1 0x00005ad305beacd6 in func_get_detail (funcname=0x5ad308204d50, fargs=0x0, fargnames=0x0, nargs=0, argtypes=0x7ffff2ff9cc0, expand_variadic=true, expand_defaults=true, include_out_arguments=false, funcid=0x7ffff2ff9ba0, rettype=0x7ffff2ff9b9c, retset=0x7ffff2ff9b94, nvargs=0x7ffff2ff9ba4,
vatype=0x7ffff2ff9ba8, true_typeids=0x7ffff2ff9bd8, argdefaults=0x7ffff2ff9be0) at parse_func.c:1622
#2 0x00005ad305be7dd0 in ParseFuncOrColumn (pstate=0x5ad30823be98, funcname=0x5ad308204d50, fargs=0x0, last_srf=0x0, fn=0x5ad308204da0, proc_call=false, location=55) at parse_func.c:266
#3 0x00005ad305bdffb0 in transformFuncCall (pstate=0x5ad30823be98, fn=0x5ad308204da0) at parse_expr.c:1474
#4 0x00005ad305bdd2ee in transformExprRecurse (pstate=0x5ad30823be98, expr=0x5ad308204da0) at parse_expr.c:230
#5 0x00005ad305bdec34 in transformAExprOp (pstate=0x5ad30823be98, a=0x5ad308204e20) at parse_expr.c:990
#6 0x00005ad305bdd1a0 in transformExprRecurse (pstate=0x5ad30823be98, expr=0x5ad308204e20) at parse_expr.c:187
#7 0x00005ad305bdd00b in transformExpr (pstate=0x5ad30823be98, expr=0x5ad308204e20, exprKind=EXPR_KIND_SELECT_TARGET) at parse_expr.c:131
#8 0x00005ad305b96b7e in transformReturnStmt (pstate=0x5ad30823be98, stmt=0x5ad308204ee0) at analyze.c:2395
#9 0x00005ad305b92213 in transformStmt (pstate=0x5ad30823be98, parseTree=0x5ad308204ee0) at analyze.c:375
#10 0x00005ad305c6321a in interpret_AS_clause (languageOid=14, languageName=0x5ad308204c40 "sql", funcname=0x5ad308204ad8 "f100", as=0x0, sql_body_in=0x5ad308204ee0, parameterTypes=0x0, inParameterNames=0x0, prosrc_str_p=0x7ffff2ffa208, probin_str_p=0x7ffff2ffa200, sql_body_out=0x7ffff2ffa210,
queryString=0x5ad3082040b0 "CREATE FUNCTION f100() RETURNS int LANGUAGE SQL RETURN f() + 1;") at functioncmds.c:953
#11 0x00005ad305c63c93 in CreateFunction (pstate=0x5ad308186310, stmt=0x5ad308204f00) at functioncmds.c:1221
then drop function f() in another session. Then the create function f1() would:
postgres=# CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN f() + 1;
ERROR: cache lookup failed for function 16400
This stuff does appear before we get a chance to call the new depLockAndCheckObject()
function.
I think this is what Tom was referring to in [1]/messages/by-id/2872252.1630851337@sss.pgh.pa.us:
"
So the only real fix for this would be to make every object lookup in the entire
system do the sort of dance that's done in RangeVarGetRelidExtended.
"
The fact that those kind of errors appear also somehow ensure that no orphaned
dependencies can be created.
The patch does ensure that no orphaned depencies can occur after those "initial"
look up are done (should the dependent object be dropped).
I'm tempted to not add extra complexity to avoid those kind of errors and keep the
patch as it is. All of those servicing the same goal: no orphaned depencies are
created.
[1]: /messages/by-id/2872252.1630851337@sss.pgh.pa.us
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi Bertrand,
25.04.2024 10:20, Bertrand Drouvot wrote:
postgres=# CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN f() + 1;
ERROR: cache lookup failed for function 16400This stuff does appear before we get a chance to call the new depLockAndCheckObject()
function.I think this is what Tom was referring to in [1]:
"
So the only real fix for this would be to make every object lookup in the entire
system do the sort of dance that's done in RangeVarGetRelidExtended.
"The fact that those kind of errors appear also somehow ensure that no orphaned
dependencies can be created.
I agree; the only thing that I'd change here, is the error code.
But I've discovered yet another possibility to get a broken dependency.
Please try this script:
res=0
numclients=20
for ((i=1;i<=100;i++)); do
for ((c=1;c<=numclients;c++)); do
echo "
CREATE SCHEMA s_$c;
CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
ALTER CONVERSION myconv_$c SET SCHEMA s_$c;
" | psql >psql1-$c.log 2>&1 &
echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 &
done
wait
pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; }
for ((c=1;c<=numclients;c++)); do
echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1
done
done
psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM pg_namespace);"
It fails for me (with the v4 patch applied) as follows:
pg_dump: error: schema with OID 16392 does not exist
on iteration 1
oid | conname | connamespace | conowner | conforencoding | contoencoding | conproc | condefault
-------+----------+--------------+----------+----------------+---------------+-------------------+------------
16396 | myconv_6 | 16392 | 10 | 8 | 6 | iso8859_1_to_utf8 | f
Best regards,
Alexander
Hi,
On Tue, Apr 30, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote:
Hi Bertrand,
But I've discovered yet another possibility to get a broken dependency.
Thanks for the testing and the report!
Please try this script:
res=0
numclients=20
for ((i=1;i<=100;i++)); do
for ((c=1;c<=numclients;c++)); do
� echo "
CREATE SCHEMA s_$c;
CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
ALTER CONVERSION myconv_$c SET SCHEMA s_$c;
� " | psql >psql1-$c.log 2>&1 &
� echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 &
done
wait
pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; }
for ((c=1;c<=numclients;c++)); do
� echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1
done
done
psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM pg_namespace);"It fails for me (with the v4 patch applied) as follows:
pg_dump: error: schema with OID 16392 does not exist
on iteration 1
� oid� | conname� | connamespace | conowner | conforencoding | contoencoding |����� conproc����� | condefault
-------+----------+--------------+----------+----------------+---------------+-------------------+------------
�16396 | myconv_6 |������� 16392 |������ 10 |������������� 8 |������������ 6 | iso8859_1_to_utf8 | f
Thanks for sharing the test, I'm able to reproduce the issue with v4.
Oh I see, your test updates an existing dependency. v4 took care about brand new
dependencies creation (recordMultipleDependencies()) but forgot to take care
about changing an existing dependency (which is done in another code path:
changeDependencyFor()).
Please find attached v5 that adds:
- a call to the new depLockAndCheckObject() function in changeDependencyFor().
- a test when altering an existing dependency.
With v5 applied, I don't see the issue anymore.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v5-0001-Avoid-orphaned-objects-dependencies.patchtext/x-diff; charset=us-asciiDownload+387-9
Hi Bertrand,
09.05.2024 15:20, Bertrand Drouvot wrote:
Oh I see, your test updates an existing dependency. v4 took care about brand new
dependencies creation (recordMultipleDependencies()) but forgot to take care
about changing an existing dependency (which is done in another code path:
changeDependencyFor()).Please find attached v5 that adds:
- a call to the new depLockAndCheckObject() function in changeDependencyFor().
- a test when altering an existing dependency.With v5 applied, I don't see the issue anymore.
Me too. Thank you for the improved version!
I will test the patch in the background, but for now I see no other
issues with it.
Best regards,
Alexander
On Thu, May 09, 2024 at 12:20:51PM +0000, Bertrand Drouvot wrote:
Oh I see, your test updates an existing dependency. v4 took care about brand new
dependencies creation (recordMultipleDependencies()) but forgot to take care
about changing an existing dependency (which is done in another code path:
changeDependencyFor()).Please find attached v5 that adds:
- a call to the new depLockAndCheckObject() function in changeDependencyFor().
- a test when altering an existing dependency.With v5 applied, I don't see the issue anymore.
+ if (object_description)
+ ereport(ERROR, errmsg("%s does not exist", object_description));
+ else
+ ereport(ERROR, errmsg("a dependent object does not ex
This generates an internal error code. Is that intended?
--- /dev/null
+++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec
This introduces a module with only one single spec. I could get
behind an extra module if splitting the tests into more specs makes
sense or if there is a restriction on installcheck. However, for
one spec file filed with a bunch of objects, and note that I'm OK to
let this single spec grow more for this range of tests, it seems to me
that this would be better under src/test/isolation/.
+ if (use_dirty_snapshot)
+ {
+ InitDirtySnapshot(DirtySnapshot);
+ snapshot = &DirtySnapshot;
+ }
+ else
+ snapshot = NULL;
I'm wondering if Robert has a comment about that. It looks backwards
in a world where we try to encourage MVCC snapshots for catalog
scans (aka 568d4138c646), especially for the part related to
dependency.c and ObjectAddresses.
--
Michael
Hi,
On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
On Thu, May 09, 2024 at 12:20:51PM +0000, Bertrand Drouvot wrote:
Oh I see, your test updates an existing dependency. v4 took care about brand new
dependencies creation (recordMultipleDependencies()) but forgot to take care
about changing an existing dependency (which is done in another code path:
changeDependencyFor()).Please find attached v5 that adds:
- a call to the new depLockAndCheckObject() function in changeDependencyFor().
- a test when altering an existing dependency.With v5 applied, I don't see the issue anymore.
+ if (object_description) + ereport(ERROR, errmsg("%s does not exist", object_description)); + else + ereport(ERROR, errmsg("a dependent object does not exThis generates an internal error code. Is that intended?
Thanks for looking at it!
Yes, it's like when say you want to create an object in a schema that does not
exist (see get_namespace_oid()).
--- /dev/null +++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.specThis introduces a module with only one single spec. I could get
behind an extra module if splitting the tests into more specs makes
sense or if there is a restriction on installcheck. However, for
one spec file filed with a bunch of objects, and note that I'm OK to
let this single spec grow more for this range of tests, it seems to me
that this would be better under src/test/isolation/.
Yeah, I was not sure about this one (the location is from take 2 mentioned
up-thread). I'll look at moving the tests to src/test/isolation/.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Tue, May 14, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
Hi Bertrand,
09.05.2024 15:20, Bertrand Drouvot wrote:
Oh I see, your test updates an existing dependency. v4 took care about brand new
dependencies creation (recordMultipleDependencies()) but forgot to take care
about changing an existing dependency (which is done in another code path:
changeDependencyFor()).Please find attached v5 that adds:
- a call to the new depLockAndCheckObject() function in changeDependencyFor().
- a test when altering an existing dependency.With v5 applied, I don't see the issue anymore.
Me too. Thank you for the improved version!
I will test the patch in the background, but for now I see no other
issues with it.
Thanks for confirming!
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Wed, May 15, 2024 at 08:31:43AM +0000, Bertrand Drouvot wrote:
Hi,
On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
+++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.specThis introduces a module with only one single spec. I could get
behind an extra module if splitting the tests into more specs makes
sense or if there is a restriction on installcheck. However, for
one spec file filed with a bunch of objects, and note that I'm OK to
let this single spec grow more for this range of tests, it seems to me
that this would be better under src/test/isolation/.Yeah, I was not sure about this one (the location is from take 2 mentioned
up-thread). I'll look at moving the tests to src/test/isolation/.
Please find attached v6 (only diff with v5 is moving the tests as suggested
above).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v6-0001-Avoid-orphaned-objects-dependencies.patchtext/x-diff; charset=us-asciiDownload+357-9
Hello Bertrand,
15.05.2024 11:31, Bertrand Drouvot wrote:
On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
+ if (object_description) + ereport(ERROR, errmsg("%s does not exist", object_description)); + else + ereport(ERROR, errmsg("a dependent object does not exThis generates an internal error code. Is that intended?
Yes, it's like when say you want to create an object in a schema that does not
exist (see get_namespace_oid()).
AFAICS, get_namespace_oid() throws not ERRCODE_INTERNAL_ERROR,
but ERRCODE_UNDEFINED_SCHEMA:
# SELECT regtype('unknown_schema.type');
ERROR: schema "unknown_schema" does not exist
LINE 1: SELECT regtype('unknown_schema.type');
^
# \echo :LAST_ERROR_SQLSTATE
3F000
Probably, it's worth to avoid ERRCODE_INTERNAL_ERROR here in light of [1]/messages/by-id/Zic_GNgos5sMxKoa@paquier.xyz
and [2]https://commitfest.postgresql.org/48/4735/, as these errors are not that abnormal (not Assert-like).
[1]: /messages/by-id/Zic_GNgos5sMxKoa@paquier.xyz
[2]: https://commitfest.postgresql.org/48/4735/
Best regards,
Alexander
Hi,
On Sun, May 19, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:
Hello Bertrand,
Probably, it's worth to avoid ERRCODE_INTERNAL_ERROR here in light of [1]
and [2], as these errors are not that abnormal (not Assert-like).[1] /messages/by-id/Zic_GNgos5sMxKoa@paquier.xyz
[2] https://commitfest.postgresql.org/48/4735/
Thanks for mentioning the above examples, I agree that it's worth to avoid
ERRCODE_INTERNAL_ERROR here: please find attached v7 that makes use of a new
ERRCODE: ERRCODE_DEPENDENT_OBJECTS_DOES_NOT_EXIST
I thought about this name as it is close enough to the already existing
"ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST" but I'm open to suggestion too.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com