How about using dirty snapshots to locate dependent objects?

Started by Ashutosh Sharma over 1 year ago · 8 messages
#1Ashutosh Sharma
ashu.coek88@gmail.com
1 attachment(s)

Hello everyone,

At present, we use MVCC snapshots to identify dependent objects. This
implies that if a new dependent object is inserted within a transaction
that is still ongoing, our search for dependent objects won't include this
recently added one. Consequently, if someone attempts to drop the
referenced object, it will be dropped, and when the ongoing transaction
completes, we will end up having an entry for a referenced object that has
already been dropped. This situation can lead to an inconsistent state.
Below is an example illustrating this scenario:

Session 1:
- create table t1(a int);
- insert into t1 select i from generate_series(1, 10000000) i;
- create extension btree_gist;
- create index i1 on t1 using gist( a );

Session 2: (While the index creation in session 1 is in progress, drop the
btree_gist extension)
- drop extension btree_gist;

The above command succeeds, and so does the create index command running in
session 1. After this, if we try running anything on table t1 or index i1,
it fails with an error: "cache lookup failed for opclass ..."

Attached is the patch that I have tried, which seems to be working for me.
It's not polished or thoroughly tested, but I'm sharing it here to clarify
what I am trying to suggest. Please have a look and let me know your
thoughts.

--
With Regards,
Ashutosh Sharma.

Attachments:

use-dirty-snapshots-to-find-dependent-objects.patch (application/octet-stream)
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d4b5b2ade1..78a73c2e9a 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -449,6 +449,7 @@ findDependentObjects(const ObjectAddress *object,
 	int			maxDependentObjects;
 	ObjectAddressStack mystack;
 	ObjectAddressExtra extra;
+	SnapshotData SnapshotDirty;
 
 	/*
 	 * If the target object is already being visited in an outer recursion
@@ -823,8 +824,14 @@ findDependentObjects(const ObjectAddress *object,
 	else
 		nkeys = 2;
 
+	/*
+	 * We use a "dirty snapshot" to read rows modified by transactions still
+	 * in-progress, which wouldn't be visible with normal snapshots.
+	 */
+	InitDirtySnapshot(SnapshotDirty);
+
 	scan = systable_beginscan(*depRel, DependReferenceIndexId, true,
-							  NULL, nkeys, key);
+							  &SnapshotDirty, nkeys, key);
 
 	while (HeapTupleIsValid(tup = systable_getnext(scan)))
 	{
#2Dilip Kumar
dilipbalaut@gmail.com
In reply to: Ashutosh Sharma (#1)
Re: How about using dirty snapshots to locate dependent objects?

On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Hello everyone,

At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:

I don't think it's correct to allow the referenced object to be dropped
while a transaction is creating an index that uses it. Instead, the right
solution should be for
the create index operation to protect the object it is using from
being dropped. Specifically, the create index operation should acquire
a shared lock on the Access Method (AM) to ensure it doesn't get
dropped concurrently while the transaction is still in progress.
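
For illustration, a minimal sketch of what that could look like, assuming it
were called from CREATE INDEX once the access method OID has been resolved.
The helper name and call site are hypothetical; LockDatabaseObject() and
AccessMethodRelationId are the existing primitives, and the choice of
AccessShareLock is an assumption, not a tested change:

/* Hypothetical helper, not part of the posted patch. */
#include "postgres.h"
#include "catalog/pg_am.h"      /* AccessMethodRelationId */
#include "storage/lmgr.h"       /* LockDatabaseObject() */

static void
lock_index_access_method(Oid accessMethodId)
{
	/*
	 * Take a transaction-lifetime shared lock on the pg_am entry.  A
	 * concurrent drop takes AccessExclusiveLock on the same object, so it
	 * should block behind this lock until CREATE INDEX commits or aborts.
	 */
	LockDatabaseObject(AccessMethodRelationId, accessMethodId, 0,
					   AccessShareLock);
}

In the btree_gist example the objects actually being dropped are the
extension's operator classes rather than the AM itself, so the same kind of
lock would presumably be needed on each referenced object, not just the
access method.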

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#3Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Sharma (#1)
Re: How about using dirty snapshots to locate dependent objects?

On Thu, Jun 06, 2024 at 05:59:00PM +0530, Ashutosh Sharma wrote:

Hello everyone,

At present, we use MVCC snapshots to identify dependent objects. This
implies that if a new dependent object is inserted within a transaction
that is still ongoing, our search for dependent objects won't include this
recently added one. Consequently, if someone attempts to drop the
referenced object, it will be dropped, and when the ongoing transaction
completes, we will end up having an entry for a referenced object that has
already been dropped. This situation can lead to an inconsistent state.
Below is an example illustrating this scenario:

Session 1:
- create table t1(a int);
- insert into t1 select i from generate_series(1, 10000000) i;
- create extension btree_gist;
- create index i1 on t1 using gist( a );

Session 2: (While the index creation in session 1 is in progress, drop the
btree_gist extension)
- drop extension btree_gist;

Above command succeeds and so does the create index command running in
session 1, post this, if we try running anything on table t1, i1, it fails
with an error: "cache lookup failed for opclass ..."

Attached is the patch that I have tried, which seems to be working for me.
It's not polished and thoroughly tested, but just sharing here to clarify
what I am trying to suggest. Please have a look and let me know your
thoughts.

Thanks for the patch proposal!

The patch does not fix the reverse case:

- session 1: BEGIN; DROP extension btree_gist;
- session 2: create index i1 on t1 using gist( a );
- session 1: commits while session 2 is creating the index

and it does not address all of the possible orphaned-dependency cases.

There is an ongoing thread (see [1]) to fix the orphaned dependencies issue.

v9 attached in [1] fixes the case you describe here.

[1]: /messages/by-id/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#4Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Dilip Kumar (#2)
Re: How about using dirty snapshots to locate dependent objects?

On Thu, Jun 6, 2024 at 6:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com>
wrote:

Hello everyone,

At present, we use MVCC snapshots to identify dependent objects. This

implies that if a new dependent object is inserted within a transaction
that is still ongoing, our search for dependent objects won't include this
recently added one. Consequently, if someone attempts to drop the
referenced object, it will be dropped, and when the ongoing transaction
completes, we will end up having an entry for a referenced object that has
already been dropped. This situation can lead to an inconsistent state.
Below is an example illustrating this scenario:

I don't think it's correct to allow the index to be dropped while a
transaction is creating it. Instead, the right solution should be for
the create index operation to protect the object it is using from
being dropped. Specifically, the create index operation should acquire
a shared lock on the Access Method (AM) to ensure it doesn't get
dropped concurrently while the transaction is still in progress.

If I'm following you correctly, that's exactly what the patch is trying to
do. While the index creation is in progress, if someone tries to drop the
object referenced by the index under creation, the session doing the drop
can see the dependent object (in this case, the index being created)
through the dirty snapshot. It is therefore unable to acquire the lock on
the dependent object, and as a result it is unable to drop the referenced
object.

--
With Regards,
Ashutosh Sharma.

#5Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Bertrand Drouvot (#3)
Re: How about using dirty snapshots to locate dependent objects?

On Thu, Jun 6, 2024 at 6:57 PM Bertrand Drouvot <
bertranddrouvot.pg@gmail.com> wrote:

On Thu, Jun 06, 2024 at 05:59:00PM +0530, Ashutosh Sharma wrote:

Hello everyone,

At present, we use MVCC snapshots to identify dependent objects. This
implies that if a new dependent object is inserted within a transaction
that is still ongoing, our search for dependent objects won't include this
recently added one. Consequently, if someone attempts to drop the
referenced object, it will be dropped, and when the ongoing transaction
completes, we will end up having an entry for a referenced object that has
already been dropped. This situation can lead to an inconsistent state.
Below is an example illustrating this scenario:

Session 1:
- create table t1(a int);
- insert into t1 select i from generate_series(1, 10000000) i;
- create extension btree_gist;
- create index i1 on t1 using gist( a );

Session 2: (While the index creation in session 1 is in progress, drop the
btree_gist extension)
- drop extension btree_gist;

Above command succeeds and so does the create index command running in
session 1, post this, if we try running anything on table t1, i1, it fails
with an error: "cache lookup failed for opclass ..."

Attached is the patch that I have tried, which seems to be working for me.
It's not polished and thoroughly tested, but just sharing here to clarify
what I am trying to suggest. Please have a look and let me know your
thoughts.

Thanks for the patch proposal!

The patch does not fix the other way around:

- session 1: BEGIN; DROP extension btree_gist;
- session 2: create index i1 on t1 using gist( a );
- session 1: commits while session 2 is creating the index

and does not address all the possible orphaned dependencies cases.

There is an ongoing thread (see [1]) to fix the orphaned dependencies
issue.

v9 attached in [1] fixes the case you describe here.

[1]:
/messages/by-id/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal

I see. Thanks for sharing this. I can take a look at this and help in
whatever way I can.

With Regards,
Ashutosh Sharma.

#6Dilip Kumar
dilipbalaut@gmail.com
In reply to: Ashutosh Sharma (#4)
Re: How about using dirty snapshots to locate dependent objects?

On Thu, Jun 6, 2024 at 7:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Thu, Jun 6, 2024 at 6:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Hello everyone,

At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:

I don't think it's correct to allow the index to be dropped while a
transaction is creating it. Instead, the right solution should be for
the create index operation to protect the object it is using from
being dropped. Specifically, the create index operation should acquire
a shared lock on the Access Method (AM) to ensure it doesn't get
dropped concurrently while the transaction is still in progress.

If I'm following you correctly, that's exactly what the patch is trying to do; while the index creation is in progress, if someone tries to drop the object referenced by the index under creation, the referenced object being dropped is able to know about the dependent object (in this case the index being created) using dirty snapshot and hence, it is unable to acquire the lock on the dependent object, and as a result of that, it is unable to drop it.

You are aiming for the same outcome, but not in the conventional way.
In my opinion, the correct approach is not to find objects being
created using a dirty snapshot. Instead, when creating an object, you
should acquire a proper lock on any dependent objects to prevent them
from being dropped during the creation process. For instance, when
creating an index that depends on the btree_gist access method, the
create index operation should protect btree_gist from being dropped by
acquiring the appropriate lock. It is not the responsibility of DROP
EXTENSION to identify in-progress index creations.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#7Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Dilip Kumar (#6)
Re: How about using dirty snapshots to locate dependent objects?

On Fri, Jun 7, 2024 at 10:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 6, 2024 at 7:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Thu, Jun 6, 2024 at 6:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Hello everyone,

At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:

I don't think it's correct to allow the index to be dropped while a
transaction is creating it. Instead, the right solution should be for
the create index operation to protect the object it is using from
being dropped. Specifically, the create index operation should acquire
a shared lock on the Access Method (AM) to ensure it doesn't get
dropped concurrently while the transaction is still in progress.

If I'm following you correctly, that's exactly what the patch is trying to do; while the index creation is in progress, if someone tries to drop the object referenced by the index under creation, the referenced object being dropped is able to know about the dependent object (in this case the index being created) using dirty snapshot and hence, it is unable to acquire the lock on the dependent object, and as a result of that, it is unable to drop it.

You are aiming for the same outcome, but not in the conventional way.
In my opinion, the correct approach is not to find objects being
created using a dirty snapshot. Instead, when creating an object, you
should acquire a proper lock on any dependent objects to prevent them
from being dropped during the creation process. For instance, when
creating an index that depends on the btree_gist access method, the
create index operation should protect btree_gist from being dropped by
acquiring the appropriate lock. It is not the responsibility of the
drop extension to identify in-progress index creations.

Thanks for sharing your thoughts. I appreciate your inputs and completely
understand your perspective, but I wonder if that is feasible. For
example, if an object (an index in this case) depends on, let's say, 'n'
objects, and those 'n' objects belong to 'n' different catalog tables,
should we acquire locks on each of them until the create index command
succeeds, or should we just check for the presence of the referenced
objects and record the dependencies in the pg_depend table? In this
particular case, we are trying to create a gist index that depends on the
gist_int4 opclass, which is one tuple inside the pg_opclass catalog table.
Should we acquire a lock on this tuple/table until the create index
command succeeds, and is that what should be done for every object the
index depends on?
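
For concreteness, "acquiring locks on each of them" could look roughly like
the following loop over the referenced objects' addresses. The helper and
its call site are hypothetical; ObjectAddress, LockRelationOid() and
LockDatabaseObject() are the existing primitives:

/* Hypothetical helper: lock every object the new index references. */
#include "postgres.h"
#include "catalog/objectaddress.h"   /* ObjectAddress */
#include "catalog/pg_class.h"        /* RelationRelationId */
#include "storage/lmgr.h"            /* LockRelationOid(), LockDatabaseObject() */

static void
lock_referenced_objects(const ObjectAddress *refs, int nrefs)
{
	for (int i = 0; i < nrefs; i++)
	{
		/*
		 * AccessShareLock conflicts with the AccessExclusiveLock a drop
		 * takes on the same object, so each referenced object (opclass,
		 * collation, function, ...) should stay around until we commit.
		 */
		if (refs[i].classId == RelationRelationId)
			LockRelationOid(refs[i].objectId, AccessShareLock);
		else
			LockDatabaseObject(refs[i].classId, refs[i].objectId, 0,
							   AccessShareLock);
	}
}

Whether holding that many object locks for the lifetime of the creating
transaction is acceptable is exactly the cost question above.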

--
With Regards,
Ashutosh Sharma.

#8Dilip Kumar
dilipbalaut@gmail.com
In reply to: Ashutosh Sharma (#7)
Re: How about using dirty snapshots to locate dependent objects?

On Fri, Jun 7, 2024 at 11:53 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Jun 7, 2024 at 10:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 6, 2024 at 7:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Thu, Jun 6, 2024 at 6:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Hello everyone,

At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:

I don't think it's correct to allow the index to be dropped while a
transaction is creating it. Instead, the right solution should be for
the create index operation to protect the object it is using from
being dropped. Specifically, the create index operation should acquire
a shared lock on the Access Method (AM) to ensure it doesn't get
dropped concurrently while the transaction is still in progress.

If I'm following you correctly, that's exactly what the patch is trying to do; while the index creation is in progress, if someone tries to drop the object referenced by the index under creation, the referenced object being dropped is able to know about the dependent object (in this case the index being created) using dirty snapshot and hence, it is unable to acquire the lock on the dependent object, and as a result of that, it is unable to drop it.

You are aiming for the same outcome, but not in the conventional way.
In my opinion, the correct approach is not to find objects being
created using a dirty snapshot. Instead, when creating an object, you
should acquire a proper lock on any dependent objects to prevent them
from being dropped during the creation process. For instance, when
creating an index that depends on the btree_gist access method, the
create index operation should protect btree_gist from being dropped by
acquiring the appropriate lock. It is not the responsibility of the
drop extension to identify in-progress index creations.

Thanks for sharing your thoughts, I appreciate your inputs and
completely understand your perspective, but I wonder if that is
feasible? For example, if an object (index in this case) has
dependency on lets say 'n' number of objects, and those 'n' number of
objects belong to say 'n' different catalog tables, so should we
acquire locks on each of them until the create index command succeeds,
or, should we just check for the presence of dependent objects and
record their dependency inside the pg_depend table. Talking about this
particular case, we are trying to create gist index that has
dependency on gist_int4 opclass, it is one of the tuple inside
pg_opclass catalog table, so should acquire lock in this tuple/table
until the create index command succeeds and is that the thing to be
done for all the dependent objects?

I am not sure what the best way to do it is, but if you are creating an
object that depends on other objects, then you need to check the existence
of those objects, record dependencies on them, and also lock them so that
they don't get dropped while you are creating your object. I haven't
looked into the patch, but something similar is being achieved in the
thread Bertrand has pointed out, by locking the database objects while
recording the dependencies on them.
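
A rough sketch of that pattern, using the opclass case from this thread for
illustration (the helper is hypothetical and this is not code from the
referenced thread; LockDatabaseObject(), ObjectAddressSet() and
recordDependencyOn() are the existing primitives):

/* Hypothetical sketch: lock a referenced opclass, then record the dependency. */
#include "postgres.h"
#include "catalog/dependency.h"      /* recordDependencyOn(), DEPENDENCY_NORMAL */
#include "catalog/objectaddress.h"   /* ObjectAddress, ObjectAddressSet() */
#include "catalog/pg_opclass.h"      /* OperatorClassRelationId */
#include "storage/lmgr.h"            /* LockDatabaseObject() */

static void
depend_on_opclass(const ObjectAddress *myself, Oid opclassOid)
{
	ObjectAddress referenced;

	/*
	 * Lock first, so a concurrent drop (which takes AccessExclusiveLock on
	 * the opclass object) cannot remove it before the pg_depend row we are
	 * about to insert becomes visible.
	 */
	LockDatabaseObject(OperatorClassRelationId, opclassOid, 0,
					   AccessShareLock);

	ObjectAddressSet(referenced, OperatorClassRelationId, opclassOid);
	recordDependencyOn(myself, &referenced, DEPENDENCY_NORMAL);
}

After taking the lock, the code would also need to re-verify that the
opclass still exists, since a drop could have committed just before the
lock was acquired; handling that race is the kind of thing the thread
Bertrand mentioned is concerned with.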

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com