Patch to avoid orphaned dependencies

Started by Bertrand Drouvotalmost 5 years ago14 messageshackers
Jump to latest
#1Bertrand Drouvot
bertranddrouvot.pg@gmail.com

Hi,

This new thread is a follow-up of [1]/messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net.

Problem description:

We have occasionally observed objects having an orphaned dependency, the
most common case we have seen (if not the only one) is functions not
linked to any namespaces.
A patch has been initially proposed to fix this particular
(function-to-namespace) dependency (see [1]/messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net), but there could be much
more scenarios (like the function-to-datatype one highlighted by Gilles
in [1]/messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net that could lead to a function having an invalid parameter datatype).
As Tom said there are dozens more cases that would need to be
considered, and a global approach to avoid those race conditions should
be considered instead.

The attached patch is avoiding those race conditions globally by
changing the dependency mechanism: we are using a dirty snapshot any
time we’re about to create a pg_depend or pg_shdepend entry.
That way we can check if there is in-flight transactions that are
affecting the dependency: if that’s the case, an error is being reported.

This approach has been chosen over another one that would have make use
of the locking machinery.
The reason for this choice is to avoid possible slow down of typical DDL
command, risk of deadlock, number of locks taken by transaction...

Implementation overview:

* A new catalog snapshot is added: DirtyCatalogSnapshot.
* This catalog snapshot is a dirty one to be able to look for
in-flight dependencies.
* Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
* Any time this variable is being set to true, then the next call to
GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
* This snapshot is being used to check for in-flight dependencies and
also to get the objects description to generate the error messages.

*Testing:*

Test 1

Session1:

|postgres=# create schema tobeorph; CREATE SCHEMA postgres=# create
table tobeorph.bdt (a int); CREATE TABLE postgres=# begin; BEGIN
postgres=*# CREATE OR REPLACE FUNCTION tobeorph.bdttime() RETURNS
TIMESTAMP AS $$DECLARE outTS TIMESTAMP; BEGIN perform pg_sleep(10);
RETURN CURRENT_TIMESTAMP; END; $$ LANGUAGE plpgsql volatile; CREATE
FUNCTION |

Session 1 does not commit, then session 2:

|postgres=# drop schema tobeorph; ERROR: cannot drop schema tobeorph
because other objects depend on it DETAIL: table tobeorph.bdt depends on
schema tobeorph function tobeorph.bdttime() (not yet committed) depends
on schema tobeorph HINT: DROP and DROP CASCADE won't work when there are
uncommitted dependencies. |

Test 2

Session 1:

|postgres=# create schema toinsert; CREATE SCHEMA postgres=# begin;
BEGIN postgres=*# drop schema toinsert; DROP SCHEMA |

Session 1 does not commit, then session 2:

|postgres=# CREATE OR REPLACE FUNCTION toinsert.bdttime() RETURNS
TIMESTAMP AS $$DECLARE outTS TIMESTAMP; BEGIN perform pg_sleep(10);
RETURN CURRENT_TIMESTAMP; END; $$ LANGUAGE plpgsql volatile; ERROR:
cannot create function toinsert.bdttime() because it depends of other
objects uncommitted dependencies DETAIL: function toinsert.bdttime()
depends on schema toinsert (dependency not yet committed) HINT: CREATE
won't work as long as there is uncommitted dependencies. |

Test3

|Session1: psql -U toorph postgres psql (14devel) Type "help" for help.
postgres=> begin; BEGIN postgres=*> CREATE OR REPLACE FUNCTION bdttime()
RETURNS TIMESTAMP AS $$DECLARE outTS TIMESTAMP; BEGIN perform
pg_sleep(10); RETURN CURRENT_TIMESTAMP; END; $$ LANGUAGE plpgsql
volatile; CREATE FUNCTION |

Session 1 does not commit, then session 2:

|postgres=# drop owned by toorph; ERROR: cannot drop objects owned by
role toorph because other uncommitted objects depend on it DETAIL:
function public.bdttime() (not yet committed) depends on role toorph
HINT: Commit or rollback function public.bdttime() creation. |

I'm creating a new commitfest entry for this patch.

Thanks

Bertrand

||

[1]: /messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net
/messages/by-id/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net

Attachments:

v1-0001-orphaned-dependencies.patchtext/plain; charset=UTF-8; name=v1-0001-orphaned-dependencies.patch; x-mac-creator=0; x-mac-type=0Download+365-25
#2Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Bertrand Drouvot (#1)
Re: Patch to avoid orphaned dependencies

Hello Bertrand,

Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit :

Implementation overview:

* A new catalog snapshot is added: DirtyCatalogSnapshot.
* This catalog snapshot is a dirty one to be able to look for
in-flight dependencies.
* Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
* Any time this variable is being set to true, then the next call to
GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
* This snapshot is being used to check for in-flight dependencies and
also to get the objects description to generate the error messages.

I quickly tested the patch, it behaves as advertised, and passes tests.

Isolation tests should be added to demonstrate the issues it is solving.

However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot
global variable which is then reset after each snapshot acquisition: I'm
having trouble understanding all the implications of that, if it would be
possible to introduce an unforeseen snapshot before the one we actually want
to be dirty.

I don't want to derail this thread, but couldn't predicate locks on the
pg_depend index pages corresponding to the dropped object / referenced objects
work as a different approach ? I'm not familiar enough with them so maybe there
is some fundamental misunderstanding on my end.

--
Ronan Dunklau

#3Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ronan Dunklau (#2)
Re: Patch to avoid orphaned dependencies

Hi Ronan,

On 9/17/21 10:09 AM, Ronan Dunklau wrote:

Hello Bertrand,

Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit :

Implementation overview:

* A new catalog snapshot is added: DirtyCatalogSnapshot.
* This catalog snapshot is a dirty one to be able to look for
in-flight dependencies.
* Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
* Any time this variable is being set to true, then the next call to
GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
* This snapshot is being used to check for in-flight dependencies and
also to get the objects description to generate the error messages.

I quickly tested the patch, it behaves as advertised, and passes tests.

Thanks for looking at it!

Isolation tests should be added to demonstrate the issues it is solving.

Good point. I'll have a look.

However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot
global variable which is then reset after each snapshot acquisition: I'm
having trouble understanding all the implications of that, if it would be
possible to introduce an unforeseen snapshot before the one we actually want
to be dirty.

I don't think that could be possible as long as:

- this is a per backend variable

- we pay attention where we set it to true

But i might be missing something.

Do you have any corner cases in mind?

I don't want to derail this thread, but couldn't predicate locks on the
pg_depend index pages corresponding to the dropped object / referenced objects
work as a different approach ?

I'm fine to have a look at another approach if needed, but does it mean
we are not happy with the current approach proposal?

Thanks

Bertrand

#4Daniel Gustafsson
daniel@yesql.se
In reply to: Bertrand Drouvot (#3)
Re: Patch to avoid orphaned dependencies

On 20 Sep 2021, at 12:50, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi Ronan,

On 9/17/21 10:09 AM, Ronan Dunklau wrote:

Hello Bertrand,

Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit :

Implementation overview:

* A new catalog snapshot is added: DirtyCatalogSnapshot.
* This catalog snapshot is a dirty one to be able to look for
in-flight dependencies.
* Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
* Any time this variable is being set to true, then the next call to
GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
* This snapshot is being used to check for in-flight dependencies and
also to get the objects description to generate the error messages.

I quickly tested the patch, it behaves as advertised, and passes tests.

Thanks for looking at it!

Isolation tests should be added to demonstrate the issues it is solving.

Good point. I'll have a look.

However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot
global variable which is then reset after each snapshot acquisition: I'm
having trouble understanding all the implications of that, if it would be
possible to introduce an unforeseen snapshot before the one we actually want
to be dirty.

I don't think that could be possible as long as:

- this is a per backend variable

- we pay attention where we set it to true

But i might be missing something.

Do you have any corner cases in mind?

I don't want to derail this thread, but couldn't predicate locks on the
pg_depend index pages corresponding to the dropped object / referenced objects
work as a different approach ?

I'm fine to have a look at another approach if needed, but does it mean we are not happy with the current approach proposal?

This patch fails to apply as a whole, with the parts applying showing quite
large offsets. Have you had the chance to look at the isolation test asked for
above?

--
Daniel Gustafsson https://vmware.com/

#5Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Daniel Gustafsson (#4)
Re: Patch to avoid orphaned dependencies

Hi,

On 11/17/21 2:25 PM, Daniel Gustafsson wrote:

This patch fails to apply as a whole, with the parts applying showing quite
large offsets.

Thanks for the warning, please find attached a rebase of it.

Have you had the chance to look at the isolation test asked for
above?

Not yet, but I'll look at it for sure.

Thanks

Bertrand

Attachments:

v1-0002-orphaned-dependencies.patchtext/plain; charset=UTF-8; name=v1-0002-orphaned-dependencies.patchDownload+363-23
#6Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#5)
Re: Patch to avoid orphaned dependencies

Hi,

On 11/23/21 4:22 PM, Drouvot, Bertrand wrote:

Hi,

On 11/17/21 2:25 PM, Daniel Gustafsson wrote:

This patch fails to apply as a whole, with the parts applying showing
quite
large offsets.

Thanks for the warning, please find attached a rebase of it.

Have you had the chance to look at the isolation test asked for
above?

Not yet, but I'll look at it for sure.

Please find enclosed v1-0003-orphaned-dependencies.patch, that contains:

- a mandatory rebase

- a few isolation tests added in src/test/modules/test_dependencies (but
I'm not sure at all that's the right place to add them, is it?)

Thanks

Bertrand

Attachments:

v1-0003-orphaned-dependencies.patchtext/plain; charset=UTF-8; name=v1-0003-orphaned-dependencies.patchDownload+473-23
#7Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#6)
Re: Patch to avoid orphaned dependencies

Hi,

On 2021-12-17 14:19:18 +0100, Drouvot, Bertrand wrote:

Please find enclosed v1-0003-orphaned-dependencies.patch, that contains:

- a mandatory rebase

- a few isolation tests added in src/test/modules/test_dependencies (but I'm
not sure at all that's the right place to add them, is it?)

This fails on windows w/ msvc:

https://cirrus-ci.com/task/5368174125252608?logs=configure#L102
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157904#L12

Greetings,

Andres Freund

#8Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#7)
Re: Patch to avoid orphaned dependencies

Hi,

On 12/31/21 7:03 AM, Andres Freund wrote:

Hi,

On 2021-12-17 14:19:18 +0100, Drouvot, Bertrand wrote:

Please find enclosed v1-0003-orphaned-dependencies.patch, that contains:

- a mandatory rebase

- a few isolation tests added in src/test/modules/test_dependencies (but I'm
not sure at all that's the right place to add them, is it?)

This fails on windows w/ msvc:

https://cirrus-ci.com/task/5368174125252608?logs=configure#L102
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157904#L12

Thanks Andres for the warning.

Please find enclosed v1-0004-orphaned-dependencies.patch that addresses
the issue.

Thanks

Bertrand

Attachments:

v1-0004-orphaned-dependencies.patchtext/plain; charset=UTF-8; name=v1-0004-orphaned-dependencies.patchDownload+474-24
#9Zhihong Yu
zyu@yugabyte.com
In reply to: Bertrand Drouvot (#8)
Re: Patch to avoid orphaned dependencies

Hi,

For genam.c:

+   UseDirtyCatalogSnapshot = dirtysnap;
+
Does the old value of UseDirtyCatalogSnapshot need to be restored at the
end of the func ?

+systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup, bool dirtysnap)

Considering that parameter dirtysnap is a bool, I think it should be named
isdirtysnap so that its meaning can be distinguished from:

+ Snapshot dirtySnapshot;

+   UseDirtyCatalogSnapshot = true;
+
+   dirtySnapshot = GetCatalogSnapshot(RelationGetRelid(*depRel));

I tend to think that passing usedirtysnap (bool parameter)
to GetCatalogSnapshot() would be more flexible than setting global variable.

Cheers

#10Nathan Bossart
nathandbossart@gmail.com
In reply to: Bertrand Drouvot (#8)
Re: Patch to avoid orphaned dependencies

Bertand, do you think this has any chance of making it into v15? If not,
are you alright with adjusting the commitfest entry to v16 and moving it to
the next commitfest?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#10)
Re: Patch to avoid orphaned dependencies

Nathan Bossart <nathandbossart@gmail.com> writes:

Bertand, do you think this has any chance of making it into v15? If not,
are you alright with adjusting the commitfest entry to v16 and moving it to
the next commitfest?

I looked this over briefly, and IMO it should have no chance of being
committed in anything like this form.

The lesser problem is that (as already noted) the reliance on a global
variable that changes catalog lookup semantics is just unbelievably
scary. An example of the possible consequences here is that a syscache
entry could get made while that's set, containing a row that we should
not be able to see yet, and indeed might never get committed at all.
Perhaps that could be addressed by abandoning the patch's ambition to tell
you the details of an uncommitted object (which would have race conditions
anyway), so that *only* reads of pg_[sh]depend itself need be dirty.

The bigger problem is that it fails to close the race condition that
it's intending to solve. This patch will catch a race like this:

Session doing DROP Session doing CREATE

DROP begins

CREATE makes a dependent catalog entry

DROP scans for dependent objects

CREATE commits

DROP removes catalog entry

DROP commits

However, it will not catch this slightly different timing:

Session doing DROP Session doing CREATE

DROP begins

DROP scans for dependent objects

CREATE makes a dependent catalog entry

CREATE commits

DROP removes catalog entry

DROP commits

So I don't see that we've moved the goalposts very far at all.

Realistically, if we want to prevent this type of problem, then all
creation DDL will have to take a lock on each referenced object that'd
conflict with a lock taken by DROP. This might not be out of reach:
I think we do already take such locks while dropping objects. The
reference-side lock could be taken by the recordDependency mechanism
itself, ensuring that we don't miss anything; and that would also
allow us to not bother taking such a lock on pinned objects, which'd
greatly cut the cost (though not to zero).

regards, tom lane

#12Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#11)
Re: Patch to avoid orphaned dependencies

On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

Bertand, do you think this has any chance of making it into v15? If not,
are you alright with adjusting the commitfest entry to v16 and moving it to
the next commitfest?

I looked this over briefly, and IMO it should have no chance of being
committed in anything like this form.

I marked the commitfest entry as waiting-on-author, set the target version
to v16, and moved it to the next commitfest.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#13Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Nathan Bossart (#12)
Re: Patch to avoid orphaned dependencies

This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

https://commitfest.postgresql.org/38/3106/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob

#14Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Tom Lane (#11)
Re: Patch to avoid orphaned dependencies

Hi,

On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote:

Realistically, if we want to prevent this type of problem, then all
creation DDL will have to take a lock on each referenced object that'd
conflict with a lock taken by DROP. This might not be out of reach:
I think we do already take such locks while dropping objects. The
reference-side lock could be taken by the recordDependency mechanism
itself, ensuring that we don't miss anything; and that would also
allow us to not bother taking such a lock on pinned objects, which'd
greatly cut the cost (though not to zero).

Thanks for the idea (and sorry for the delay replying to it)! I had a look at it
and just created a new thread [1]/messages/by-id/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal based on your proposal.

[1]: /messages/by-id/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com