Why to index a "Recently DEAD" tuple when creating index

Started by Andy Fanabout 7 years ago9 messageshackers

zhihui.fan1213@gmail.com

about 7 years ago

HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */

It is a tuple which has been deleted AND committed but before the delete
there is a transaction started but not committed. Let call this transaction
as Transaction A.

if we create index on this time, Let's call this index as Index A， it still
index this record. my question is why need this.

The only reason I can think out (maybe also not reasonable enough) is:
If we index like this and the isolate level of transaction A is
serializable， it is possible that the query in transaction A can use Index
A since it contains the snapshot data when the transaction A was began.
this reason may be not reasonable enough is because the transaction A may
be should not see the index A at all.

Kuntal Ghosh

kuntalghosh.2007@gmail.com

about 7 years ago

In reply to: Andy Fan (#1)

Re: Why to index a "Recently DEAD" tuple when creating index

On Mon, Jun 10, 2019 at 12:15 PM Alex <zhihui.fan1213@gmail.com> wrote:

HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */

It is a tuple which has been deleted AND committed but before the delete
there is a transaction started but not committed. Let call this transaction
as Transaction A.

if we create index on this time, Let's call this index as Index A， it
still index this record. my question is why need this.

In this case, the changes of the tuple is not visible yet. Now suppose,

your transaction A is serializable and you've another serializable
transaction B which can see the index A. It generates a plan that requires
to fetch the deleted tuple through an index scan. If the tuple is not
present in the index, how are you going to create a conflict edge between
transaction A and transaction B?

Basically, you need to identify the following clause to detect serializable
conflicts:
Transaction A precedes transaction B. (Because, transaction A has deleted a
tuple and it's not visible to transaction B)

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Andy Fan

zhihui.fan1213@gmail.com

about 7 years ago

In reply to: Kuntal Ghosh (#2)

Re: Why to index a "Recently DEAD" tuple when creating index

On Mon, Jun 10, 2019 at 3:28 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com>
wrote:

On Mon, Jun 10, 2019 at 12:15 PM Alex <zhihui.fan1213@gmail.com> wrote:

HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */

It is a tuple which has been deleted AND committed but before the delete
there is a transaction started but not committed. Let call this transaction
as Transaction A.

if we create index on this time, Let's call this index as Index A， it
still index this record. my question is why need this.

In this case, the changes of the tuple is not visible yet. Now suppose,

your transaction A is serializable and you've another serializable
transaction B which can see the index A. It generates a plan that requires
to fetch the deleted tuple through an index scan. If the tuple is not
present in the index, how are you going to create a conflict edge between
transaction A and transaction B?

Basically, you need to identify the following clause to detect
serializable conflicts:
Transaction A precedes transaction B. (Because, transaction A has deleted
a tuple and it's not visible to transaction B)

thanks Ghosh. Looks your answer is similar with my previous point
(transaction is serializable). actually if the transaction B can't see
the “deleted" which has been committed, should it see the index A which is
created after the "delete" transaction?

Show quoted text

Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Kuntal Ghosh

kuntalghosh.2007@gmail.com

about 7 years ago

In reply to: Andy Fan (#3)

Re: Why to index a "Recently DEAD" tuple when creating index

On Mon, Jun 10, 2019 at 1:30 PM Alex <zhihui.fan1213@gmail.com> wrote:

On Mon, Jun 10, 2019 at 3:28 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:

On Mon, Jun 10, 2019 at 12:15 PM Alex <zhihui.fan1213@gmail.com> wrote:

HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */

It is a tuple which has been deleted AND committed but before the delete there is a transaction started but not committed. Let call this transaction as Transaction A.

if we create index on this time, Let's call this index as Index A， it still index this record. my question is why need this.

In this case, the changes of the tuple is not visible yet. Now suppose, your transaction A is serializable and you've another serializable transaction B which can see the index A. It generates a plan that requires to fetch the deleted tuple through an index scan. If the tuple is not present in the index, how are you going to create a conflict edge between transaction A and transaction B?

Basically, you need to identify the following clause to detect serializable conflicts:
Transaction A precedes transaction B. (Because, transaction A has deleted a tuple and it's not visible to transaction B)

thanks Ghosh. Looks your answer is similar with my previous point (transaction is serializable). actually if the transaction B can't see the “deleted" which has been committed, should it see the index A which is created after the "delete" transaction?

I think what I'm trying to say is different.

For my case, the sequence is as following:
1. Transaction A has deleted a tuple, say t1 and got committed.
2. Index A has been created successfully.
3. Now, transaction B starts and use the index A to fetch the tuple
t1. While doing visibility check, transaction B gets to know that t1
has been deleted by a committed transaction A, so it can't see the
tuple. But, it creates a dependency edge that transaction A precedes
transaction B. This edge is required to detect a serializable conflict
failure.

If you don't create the index entry, it'll not be able to create that edge.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Andy Fan

zhihui.fan1213@gmail.com

about 7 years ago

In reply to: Kuntal Ghosh (#4)

Re: Why to index a "Recently DEAD" tuple when creating index

On Mon, Jun 10, 2019 at 4:10 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com>
wrote:

On Mon, Jun 10, 2019 at 1:30 PM Alex <zhihui.fan1213@gmail.com> wrote:

On Mon, Jun 10, 2019 at 3:28 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com>

wrote:

On Mon, Jun 10, 2019 at 12:15 PM Alex <zhihui.fan1213@gmail.com> wrote:

HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */

It is a tuple which has been deleted AND committed but before the

delete there is a transaction started but not committed. Let call this
transaction as Transaction A.

if we create index on this time, Let's call this index as Index A， it

still index this record. my question is why need this.

In this case, the changes of the tuple is not visible yet. Now suppose,

your transaction A is serializable and you've another serializable
transaction B which can see the index A. It generates a plan that requires
to fetch the deleted tuple through an index scan. If the tuple is not
present in the index, how are you going to create a conflict edge between
transaction A and transaction B?

Basically, you need to identify the following clause to detect

serializable conflicts:

Transaction A precedes transaction B. (Because, transaction A has

deleted a tuple and it's not visible to transaction B)

thanks Ghosh. Looks your answer is similar with my previous point

(transaction is serializable). actually if the transaction B can't see
the “deleted" which has been committed, should it see the index A which is
created after the "delete" transaction?

I think what I'm trying to say is different.

For my case, the sequence is as following:
1. Transaction A has deleted a tuple, say t1 and got committed.
2. Index A has been created successfully.
3. Now, transaction B starts and use the index A to fetch the tuple
t1. While doing visibility check, transaction B gets to know that t1
has been deleted by a committed transaction A, so it can't see the
tuple. But, it creates a dependency edge that transaction A precedes
transaction B. This edge is required to detect a serializable conflict
failure.

If you don't create the index entry, it'll not be able to create that edge.

Thanks, I got the difference now, but still not get the necessity of it.
1. Assume we don't index it, in which situation we can get a wrong
result?
2. If we only support "Read Committed" isolation level, is there a safe
way to not index such data?

Show quoted text

Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Kuntal Ghosh

kuntalghosh.2007@gmail.com

about 7 years ago

In reply to: Andy Fan (#5)

Re: Why to index a "Recently DEAD" tuple when creating index

On Mon, Jun 10, 2019 at 2:12 PM Alex <zhihui.fan1213@gmail.com> wrote:

On Mon, Jun 10, 2019 at 4:10 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:

I think what I'm trying to say is different.

For my case, the sequence is as following:
1. Transaction A has deleted a tuple, say t1 and got committed.
2. Index A has been created successfully.
3. Now, transaction B starts and use the index A to fetch the tuple
t1. While doing visibility check, transaction B gets to know that t1
has been deleted by a committed transaction A, so it can't see the
tuple. But, it creates a dependency edge that transaction A precedes
transaction B. This edge is required to detect a serializable conflict
failure.

If you don't create the index entry, it'll not be able to create that edge.

Thanks, I got the difference now, but still not get the necessity of it.
1. Assume we don't index it, in which situation we can get a wrong result?

Consider the following sequence of three different transactions X,A and B:

1. Transaction X reads a tuple t2.
2. Transaction A updates the tuple t2, deletes a tuple t1 and gets
committed. So, there transaction X precedes transaction A, i.e., X <-
A.
3. Index A is created successfully.
4. Transaction B starts and use the index A to fetch tuple t1. But,
it's already deleted by the committed transaction A. So, transaction A
precedes transaction B, i.e., A<-B.
5. At this point you've a dangerous structure X<-A<-B (definition of
dangerous structure src/backend/storage/lmgr/README-SSI) in the graph
which can produce an anomaly. For example now, if X tries to update
another tuple previously read by B, you'll have a dependency B<-X.
But, you already have X<-B which leads to serializable conflict.
Postgres tries to resolve this anomaly by rolling back one of the
transaction.

In your case, it'll be difficult to detect.

2. If we only support "Read Committed" isolation level, is there a safe way to not index such data?

I can't think of a case where the RECENTLY_DELETED tuple needs to be
indexed in "Read Committed" case. So, your suggestion likely to work
logically in "Read committed" isolation level. But, I'm not sure
whether you'll encounter any assertion failures in vacuum path or
concurrent index paths.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Andy Fan

zhihui.fan1213@gmail.com

about 7 years ago

In reply to: Kuntal Ghosh (#6)

Re: Why to index a "Recently DEAD" tuple when creating index

Thanks! Appreciate it for your time!

On Mon, Jun 10, 2019 at 5:34 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com>
wrote:

Show quoted text

On Mon, Jun 10, 2019 at 2:12 PM Alex <zhihui.fan1213@gmail.com> wrote:

On Mon, Jun 10, 2019 at 4:10 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com>

wrote:

I think what I'm trying to say is different.

For my case, the sequence is as following:
1. Transaction A has deleted a tuple, say t1 and got committed.
2. Index A has been created successfully.
3. Now, transaction B starts and use the index A to fetch the tuple
t1. While doing visibility check, transaction B gets to know that t1
has been deleted by a committed transaction A, so it can't see the
tuple. But, it creates a dependency edge that transaction A precedes
transaction B. This edge is required to detect a serializable conflict
failure.

If you don't create the index entry, it'll not be able to create that

edge.

Thanks, I got the difference now, but still not get the necessity of it.
1. Assume we don't index it, in which situation we can get a wrong

result?

Consider the following sequence of three different transactions X,A and B:

1. Transaction X reads a tuple t2.
2. Transaction A updates the tuple t2, deletes a tuple t1 and gets
committed. So, there transaction X precedes transaction A, i.e., X <-
A.
3. Index A is created successfully.
4. Transaction B starts and use the index A to fetch tuple t1. But,
it's already deleted by the committed transaction A. So, transaction A
precedes transaction B, i.e., A<-B.
5. At this point you've a dangerous structure X<-A<-B (definition of
dangerous structure src/backend/storage/lmgr/README-SSI) in the graph
which can produce an anomaly. For example now, if X tries to update
another tuple previously read by B, you'll have a dependency B<-X.
But, you already have X<-B which leads to serializable conflict.
Postgres tries to resolve this anomaly by rolling back one of the
transaction.

In your case, it'll be difficult to detect.

2. If we only support "Read Committed" isolation level, is there a

safe way to not index such data?

I can't think of a case where the RECENTLY_DELETED tuple needs to be
indexed in "Read Committed" case. So, your suggestion likely to work
logically in "Read committed" isolation level. But, I'm not sure
whether you'll encounter any assertion failures in vacuum path or
concurrent index paths.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: Kuntal Ghosh (#6)

Re: Why to index a "Recently DEAD" tuple when creating index

Kuntal Ghosh <kuntalghosh.2007@gmail.com> writes:

2. If we only support "Read Committed" isolation level, is there a safe way to not index such data?

I can't think of a case where the RECENTLY_DELETED tuple needs to be
indexed in "Read Committed" case.

I think you're making dangerously optimistic assumptions about how
long a query snapshot might survive in READ COMMITTED mode.

In particular, I suspect you're reasoning that the new index couldn't
be used except by a freshly-created query plan, which is possibly
true, and that such a plan must be used with a freshly-created
snapshot, which is simply wrong. A counterexample could be built
using a SQL or PL function that's marked STABLE, because such a
function is defined to be executed using the calling query's
snapshot. But it'll make query plans using current reality.

regards, tom lane

Kuntal Ghosh

kuntalghosh.2007@gmail.com

about 7 years ago

In reply to: Tom Lane (#8)

Re: Why to index a "Recently DEAD" tuple when creating index

On Mon, Jun 10, 2019 at 5:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kuntal Ghosh <kuntalghosh.2007@gmail.com> writes:

2. If we only support "Read Committed" isolation level, is there a safe way to not index such data?

I can't think of a case where the RECENTLY_DELETED tuple needs to be
indexed in "Read Committed" case.

I think you're making dangerously optimistic assumptions about how
long a query snapshot might survive in READ COMMITTED mode.

In particular, I suspect you're reasoning that the new index couldn't
be used except by a freshly-created query plan, which is possibly
true, and that such a plan must be used with a freshly-created
snapshot, which is simply wrong. A counterexample could be built
using a SQL or PL function that's marked STABLE, because such a
function is defined to be executed using the calling query's
snapshot. But it'll make query plans using current reality.

Wow. I've not thought of this scenario. Also, I'm not aware about this
different snapshot usage as well. I'll debug the same. Thank you Tom.

So, the READ COMMITTED mode will also cause this kind of issues.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com