Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

Started by Andy Fanabout 4 years ago15 messages
#1Andy Fan
zhihui.fan1213@gmail.com

Hi,

During my recent work, I need some new stuff attached to Relation. Rather
than adding
some new data structures, I added it to Relation directly. Like
relation->balabala. Then
I initialize it during ExecutorRun, like table_tuple_insert.. and
destroy it at ExecutorEnd.

The above solution works based on 2 assumptions at least:
1. During the ExecutorRun & ExecutorEnd, the relcache will never by
invalidated, if not
the old relation->balabala will be lost. I assume this is correct since I
didn't see any places
where we handle such changes in Executor code.
2. We need to consider the ExuecotRun raised error, we need to destroy
the balabala resource
as well. so I added it to the RelationClose function.

So the overall design works like this:

xxx_table_tuple_insert(Relation rel, ...)
{
if (rel->balabala == NULL)
rel->balabala = allocate_bala_resource(rel); // Allocate the
memory xCtx which is under TopTransactionContext.
do_task_with(rel->balabala);
}

at the end of the executor, I run

release_bala_resource(Relation rel)
{
if (rel->balabala == NULL)
return;
do_the_real_task();
MemoryContextDelete(rel->bala->memctx);
rel->balabala = NULL
}

For the failed cases:

RelationClose(..)
{
if (RelationHasReferenceCountZero(relation))
release_bala_resource(relation);
}

Will my suluation work?

--
Best Regards
Andy Fan

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Fan (#1)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

Andy Fan <zhihui.fan1213@gmail.com> writes:

During my recent work, I need some new stuff attached to Relation. Rather
than adding
some new data structures, I added it to Relation directly. Like
relation->balabala. Then
I initialize it during ExecutorRun, like table_tuple_insert.. and
destroy it at ExecutorEnd.

This is not going to work, at least not if you expect that a relcache
reset would not preserve the data. Also, what happens in the case of
nested executor runs touching the same relation?

Why do you think this ought to be in the relcache, and not in the
executor's rangetable-associated data structures?

regards, tom lane

#3Robert Haas
robertmhaas@gmail.com
In reply to: Andy Fan (#1)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Mon, Nov 29, 2021 at 2:10 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

1. During the ExecutorRun & ExecutorEnd, the relcache will never by invalidated, if not
the old relation->balabala will be lost. I assume this is correct since I didn't see any places
where we handle such changes in Executor code.

It's not correct. We accept invalidation messages in a number of
different code paths, including whenever we acquire a lock on a
relation. That doesn't typically happen in the middle of a query, but
that's just because most queries don't happen to do anything that
would make it happen. They can, though. For example, the query can
call a user-defined function that accesses a table not previously
touched by the transaction. Or a built-in function that does the same
thing, like table_to_xml().

--
Robert Haas
EDB: http://www.enterprisedb.com

#4Andy Fan
zhihui.fan1213@gmail.com
In reply to: Tom Lane (#2)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Mon, Nov 29, 2021 at 10:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Fan <zhihui.fan1213@gmail.com> writes:

During my recent work, I need some new stuff attached to Relation.

Rather

than adding
some new data structures, I added it to Relation directly. Like
relation->balabala. Then
I initialize it during ExecutorRun, like table_tuple_insert.. and
destroy it at ExecutorEnd.

at least not if you expect that a relcache reset would not preserve the
data.

Thanks for looking into this question.

I am not clear about this sentence . IMO, if the relcache is reset, my
data
would be lost, That's why I expect there is no relcache reset happening in
Executor Stage, are you talking about something different?

Also, what happens in the case of nested executor runs touching the same
relation?

If you are talking about the RelationClose would be called many times, then
my solution is my resource is only destroyed when the refcount == 0; If
you are talking
about the different situation needs different resource types, then that's
something
I didn't describe it clearly. At the current time, we can assume "For 1
SQL statement, there
are only 1 resource needed per relation, even the executor or nest executor
touch the
same relation many times". I'd like to describe this clearly as well for
review purposes,
but I want to focus on one topic only first. So let's first assume my
assumption is correct.

Why do you think this ought to be in the relcache, and not in the
executor's rangetable-associated data structures?

I just see the ExecutorEnd code is not called after the exceptions case.
and I hope my resource
can be released totally even if the statement raises errors. for example:

CREATE TABLE t(A INT PRIMARY KEY);

INSERT INTO t VALUES(1);
INSERT INTO t VALUES(1);

In the above case, the ExecutorEnd is not called, but the RelationClose
will be absolutely called
during the ResourceOwnerRelease call.

--
Best Regards
Andy Fan

#5Andy Fan
zhihui.fan1213@gmail.com
In reply to: Robert Haas (#3)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Mon, Nov 29, 2021 at 10:56 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Nov 29, 2021 at 2:10 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

1. During the ExecutorRun & ExecutorEnd, the relcache will never by

invalidated, if not

the old relation->balabala will be lost. I assume this is correct since

I didn't see any places

where we handle such changes in Executor code.

It's not correct. We accept invalidation messages in a number of
different code paths, including whenever we acquire a lock on a
relation. That doesn't typically happen in the middle of a query, but
that's just because most queries don't happen to do anything that
would make it happen. They can, though.

Thanks for looking into this question.

Actually I think this would not happen. My reasons are:

1. I think the case which would cause the relcache reset needs a lock, after
the executor start, the executor lock is acquired so on one can really
have chances
to send an invalidation message until the lock is released. (let me first
ignore the 2
examples you talked about, and I will talk about it later).

2. _If_ the relation can be reset after we open it during Executor code,
then would the
relation (RelationData *) pointed memory still validated after the relcache
reset? For example

CREATE TABLE t(a int);
INSERT INTO t VALUES(1), (2);

UPDATE t set a = 100;

We need to update 2 tuples in the update statement, if the relcache is
reset, can we still use the previous
(RelationData *) to do the following update? If not, what code is used to
change the relation for the old relcache
address to the new relcache. I assumed (RelationData *) pointer. to the
relcache directly, hope this is correct..

For example, the query can

call a user-defined function that accesses a table not previously
touched by the transaction. Or a built-in function that does the same
thing, like table_to_xml().

OK, this is something I missed before. but looks it would not caused
different _in my case_;

IIUC, you are describing the thing like this:

CREATE FUNCTION udf()
...
SELECT * FROM t2;
$$;

SELECT udf() FROM t1;

then the relation t2 will not be opened at the beginning of ExecutorRun and
it will be only opened
when we fetch the first tuple from t1; so we can have cache invalidation
between ExecutorRun and
the first call of udf.

But in my case, my exception should be that the relcache should not be
invalidated _after the first relation open_
in the executor (not the beginning of executorRun), this is something I
didn't describe well when I post
my message since I didn't find out this situation at that time.

--
Best Regards
Andy Fan

#6Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#4)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

Why do you think this ought to be in the relcache, and not in the
executor's rangetable-associated data structures?

I re-think about this, I guess I didn't mention something clear enough.
That's true that I bound my bala struct to Relation struct, and the memory
relation used is allocated in relcache. but the memory of bala is
allocated in
TopTransactionMemory context.

xxx_table_tuple_insert(Relation rel, ...)
{
if (rel->balabala == NULL)
rel->balabala = allocate_bala_resource(rel); //
*TopTransactionContext*.
do_task_with(rel->balabala);
}

not sure if this should be called as putting my data in relcache.

and I rechecked the RelationData struct, and it looks like some
Executor-bind struct also
resides in it. for example: RelationData.rd_lookInfo. If the relcache can
be reset, the
fields like this are unsafe to access as well. Am I missing something?

--
Best Regards
Andy Fan

#7Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andy Fan (#6)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Tue, Nov 30, 2021 at 6:44 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Why do you think this ought to be in the relcache, and not in the
executor's rangetable-associated data structures?

I re-think about this, I guess I didn't mention something clear enough.
That's true that I bound my bala struct to Relation struct, and the memory
relation used is allocated in relcache. but the memory of bala is allocated in
TopTransactionMemory context.

xxx_table_tuple_insert(Relation rel, ...)
{
if (rel->balabala == NULL)
rel->balabala = allocate_bala_resource(rel); // TopTransactionContext.
do_task_with(rel->balabala);
}

not sure if this should be called as putting my data in relcache.

and I rechecked the RelationData struct, and it looks like some Executor-bind struct also
resides in it. for example: RelationData.rd_lookInfo. If the relcache can be reset, the
fields like this are unsafe to access as well. Am I missing something?

I think you are talking about rd_lockInfo? first, think this is not a
pointer so even if you get a new recache entry this memory will be
allocated along with the new relation descriptor and it will be
initialized whenever a new relation descriptor is created check
RelationInitLockInfo(). You will see there are many pointers also in
RelationData but we ensure before we access them they are initialized,
and most of them are allocated while building the relation descriptor
see RelationBuildDesc().

So if you keep some random pointer in RelationData without ensuring
that is getting reinitialized while building the relation cache then I
think that's not the correct way to do it.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#8Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dilip Kumar (#7)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

You will see there are many pointers also in
RelationData but we ensure before we access them they are initialized,

The initialized values are not much helpful in the cases I provided here.

What do you think about this question?

2. _If_ the relation can be reset after we open it during Executor code,
then would the
relation (RelationData *) pointed memory still validated after the relcache
reset? For example

CREATE TABLE t(a int);
INSERT INTO t VALUES(1), (2);

UPDATE t set a = 100;

We need to update 2 tuples in the update statement, if the relcache is
reset after the first tuple is
updated, can we still use the previous (RelationData *) to do the 2nd
update? This is just
a common example. If you would say, in this case the relcache can't be
reset, then the question
come back to what situation the relcache can be reset between the first
time I open it during execution
code and the end of execution code. I think we have some talks about this
at [1]/messages/by-id/CAKU4AWonyG2_5V+vKaD6GETNNDKbKFL=otwWSEA3UXOJ=rS_wQ@mail.gmail.com.

Right now, I am not pretty sure I am doing something well. That's why I
post the question here. but still
I didn't find a better solution right now. [2]/messages/by-id/CAKU4AWotGB7PH+SJk41cgvqxOfeEEvJ1MV+6b21_5DMDE8SLXg@mail.gmail.com

[1]: /messages/by-id/CAKU4AWonyG2_5V+vKaD6GETNNDKbKFL=otwWSEA3UXOJ=rS_wQ@mail.gmail.com
/messages/by-id/CAKU4AWonyG2_5V+vKaD6GETNNDKbKFL=otwWSEA3UXOJ=rS_wQ@mail.gmail.com

[2]: /messages/by-id/CAKU4AWotGB7PH+SJk41cgvqxOfeEEvJ1MV+6b21_5DMDE8SLXg@mail.gmail.com
/messages/by-id/CAKU4AWotGB7PH+SJk41cgvqxOfeEEvJ1MV+6b21_5DMDE8SLXg@mail.gmail.com

--
Best Regards
Andy Fan

#9Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andy Fan (#8)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Tue, Nov 30, 2021 at 12:12 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

You will see there are many pointers also in
RelationData but we ensure before we access them they are initialized,

The initialized values are not much helpful in the cases I provided here.

What do you think about this question?

2. _If_ the relation can be reset after we open it during Executor code, then would the
relation (RelationData *) pointed memory still validated after the relcache reset? For example

CREATE TABLE t(a int);
INSERT INTO t VALUES(1), (2);

UPDATE t set a = 100;

We need to update 2 tuples in the update statement, if the relcache is reset after the first tuple is
updated, can we still use the previous (RelationData *) to do the 2nd update? This is just
a common example. If you would say, in this case the relcache can't be reset, then the question
come back to what situation the relcache can be reset between the first time I open it during execution
code and the end of execution code. I think we have some talks about this at [1].

IMHO, if you are doing an update then you must be already holding the
relation lock, so even if you call some UDF (e.g. UPDATE t set a = 100
WHERE x= UDF()) and it accepts the invalidation it wont affect your
RelationData because you are already holding the lock so parallelly
there could not be any DDL on this particular relation so this recache
entry should not be invalidated at least in the example you have
given.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#5)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

Thanks for everyone's insight so far!

my exception should be that the relcache should not be invalidated _after

the first relation open_
in the executor (not the beginning of executorRun)。

s/exception/expectation.

To be more accurate, my expectation is for a single sql statement, after
the first time I write data into
the relation, until the statement is completed or "aborted and
RelationClose is called in ResourceOwnerRelease",
the relcache reset should never happen.

Since there are many places the write table access methods can be called
like COPY, CAST, REFRESH MATVIEW,
VACUUM and ModifyNode, and it is possible that we can get error in each
place, so I think RelationClose
should be a great places for exceptional case(at the same time, we should
remember to destroy properly for non
exceptional case).

This might be not very professional since I bind some executor related data
into a relcache related struct.
But it should be workable in my modified user case. The most professional
method I can think out is adding
another resource type in ResourceOwner and let ResourceOwnerRelease to
handle the exceptional cases.

--
Best Regards
Andy Fan

#11Robert Haas
robertmhaas@gmail.com
In reply to: Andy Fan (#10)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Tue, Nov 30, 2021 at 4:47 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

my exception should be that the relcache should not be invalidated _after the first relation open_
in the executor (not the beginning of executorRun)。

s/exception/expectation.

To be more accurate, my expectation is for a single sql statement, after the first time I write data into
the relation, until the statement is completed or "aborted and RelationClose is called in ResourceOwnerRelease",
the relcache reset should never happen.

Well .... I'm not sure why you asked the question and then argued with
the answer you got. I mean, you can certainly decide how you think it
works, but that's not how I think it works.

--
Robert Haas
EDB: http://www.enterprisedb.com

#12Andy Fan
zhihui.fan1213@gmail.com
In reply to: Robert Haas (#11)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Wed, Dec 1, 2021 at 3:33 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 30, 2021 at 4:47 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

my exception should be that the relcache should not be invalidated

_after the first relation open_

in the executor (not the beginning of executorRun)。

s/exception/expectation.

To be more accurate, my expectation is for a single sql statement,

after the first time I write data into

the relation, until the statement is completed or "aborted and

RelationClose is called in ResourceOwnerRelease",

the relcache reset should never happen.

Well .... I'm not sure why you asked the question and then argued with
the answer you got.

I think you misunderstand me, I argued with the answer because after I got
the
answer and I rethink my problem, I found my question description is not
accurate
enough, so I improved the question and willing discussion again. My
exception was
things will continue with something like this:
1. In your new described situation, your solution still does not work
because ...
2. In your new described situation, the solution would work for sure
3. your situation is still not cleared enough.

--
Best Regards
Andy Fan

#13Robert Haas
robertmhaas@gmail.com
In reply to: Andy Fan (#12)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Tue, Nov 30, 2021 at 7:50 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

I think you misunderstand me, I argued with the answer because after I got the
answer and I rethink my problem, I found my question description is not accurate
enough, so I improved the question and willing discussion again. My exception was
things will continue with something like this:
1. In your new described situation, your solution still does not work because ...
2. In your new described situation, the solution would work for sure
3. your situation is still not cleared enough.

I mean, it's clear to me that you can make a new table get opened at
any time in the execution of a query. Just use CASE or an IF statement
inside of a stored procedure to do nothing for the first 9999 calls
and then on call 10000 access a new relation. And as soon as that
happens, you can AcceptInvalidationMessages(), which can cause
relcache entries to be destroyed or rebuilt. If the relation is open,
the relcache entry can't be destroyed altogether, but it can be
rebuilt: see RelationClearRelation(). Whether that's a problem for
what you are doing I don't know. But the overall point is that access
to a new relation can happen at any point in the query -- and as soon
as it does, we will accept ALL pending invalidation messages for ALL
relations regardless of what locks anyone holds on anything.

So it's generally a mistake to assume that relcache entries are
"stable" across large stretches of code. They are in fact stable in a
certain sense - if we have the relation open, we hold a reference
count on it, and so the Relation pointer itself will remain valid. But
the data it points to can change in various ways, and different
members of the RelationData struct are handled differently. Separately
from the reference count, the heavyweight lock that we also hold on
the relation as a condition of opening it prevents certain kinds of
changes, so that even if the relation cache entry is rebuilt, certain
particular fields will be unaffected. Which fields are protected in
this way will depend on what kind of lock is held. It's hard to speak
in general terms. The best advice I can give you is (1) look exactly
what RelationClearRelation() is going to do to the fields you care
about if a rebuild happens, (2) err on the side of assuming that
things can change under you, and (3) try running your code under
debug_discard_caches = 1. It will be slow that way, but it's pretty
effective in finding places where you've made unsafe assumptions.

--
Robert Haas
EDB: http://www.enterprisedb.com

#14Andy Fan
zhihui.fan1213@gmail.com
In reply to: Robert Haas (#13)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

If the relation is open,
the relcache entry can't be destroyed altogether, but it can be
rebuilt: see RelationClearRelation().

Thanks! This is a new amazing knowledge for me!

They are in fact stable in a certain sense - if we have the relation open

we hold a reference count on it, and so the Relation pointer itself will

remain valid.

This sounds amazing as well.

But
the data it points to can change in various ways, and different
members of the RelationData struct are handled differently. Separately
from the reference count, the heavyweight lock that we also hold on
the relation as a condition of opening it prevents certain kinds of
changes, so that even if the relation cache entry is rebuilt, certain
particular fields will be unaffected. Which fields are protected in
this way will depend on what kind of lock is held. It's hard to speak
in general terms.

Amazing++;

The best advice I can give you is (1) look exactly
what RelationClearRelation() is going to do to the fields you care
about if a rebuild happens, (2) err on the side of assuming that
things can change under you, and (3) try running your code under
debug_discard_caches = 1. It will be slow that way, but it's pretty
effective in finding places where you've made unsafe assumptions.

Thanks! I clearly understand what's wrong in my previous knowledge.
That is, after a relation is open with some lock, then the content of the
relation
will never change until the RelationClose. It would take time to fill the
gap, but I'd like to say "thank you!" first.

--
Best Regards
Andy Fan

#15Robert Haas
robertmhaas@gmail.com
In reply to: Andy Fan (#14)
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

On Wed, Dec 1, 2021 at 10:58 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Thanks! I clearly understand what's wrong in my previous knowledge.

Cool, glad it helped.

--
Robert Haas
EDB: http://www.enterprisedb.com