Use of ActiveSnapshot

Started by Jan Wieckalmost 19 years ago8 messageshackers
Jump to latest
#1Jan Wieck
JanWieck@Yahoo.com

The use of ActiveSnapshot throughout the code appears rather dangerous
to me. It is a global pointer, assumed not to be set yet in some places,
assumed to be saved and restored by the caller in others. The actual
(context) memory it points to is sometimes explicitly freed, sometimes
just left in the context and thrown away by MemoryContextDelete()
without resetting ActiveSnapshot to NULL.

The comment for the call of pg_plan_queries in util/cache/plancache.c
line 469 for example is fatally wrong. Not only should the snapshot be
set by all callers at this point, but if the call actually does replan
the queries, the existing ActiveSnapshot is replaced with one allocated
on the current memory context. If this happens to be inside of a nested
SPI call sequence, the innermost SPI stack frame will free the snapshot
data without restoring ActiveSnapshot to the one from the caller.

Either calling pg_plan_queries() with needSnapshot=false or saving and
restoring ActiveSnapshot will prevent the backend from dumping core in
the mentioned example, but I am not entirely sure as to which one is the
right solution.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#2Jan Wieck
JanWieck@Yahoo.com
In reply to: Jan Wieck (#1)
Re: Use of ActiveSnapshot

On 5/12/2007 4:53 PM, Jan Wieck wrote:

Either calling pg_plan_queries() with needSnapshot=false or saving and
restoring ActiveSnapshot will prevent the backend from dumping core in
the mentioned example, but I am not entirely sure as to which one is the
right solution.

Attached is a self contained example that crashes the current backend.
It took me a moment to figure out exactly how to reproduce it. The
problem occurs when the query that needs replanning is actually a

FOR record IN SELECT ...

that is inside of a nested function call. In that case, the revalidation
of the saved plan actually happens inside of SPI_cursor_open(), which
does not save and restore the ActiveSnapshot. Shouldn't it?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

Attachments:

t1.sqltext/plain; name=t1.sqlDownload
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#1)
Re: Use of ActiveSnapshot

Jan Wieck <JanWieck@Yahoo.com> writes:

The comment for the call of pg_plan_queries in util/cache/plancache.c
line 469 for example is fatally wrong. Not only should the snapshot be
set by all callers at this point, but if the call actually does replan
the queries, the existing ActiveSnapshot is replaced with one allocated
on the current memory context. If this happens to be inside of a nested
SPI call sequence, the innermost SPI stack frame will free the snapshot
data without restoring ActiveSnapshot to the one from the caller.

Yeah, I'd been meaning to go back and recheck that point after the code
settled down, but forgot :-(.

It is possible for RevalidateCachedPlan to be called with no snapshot
yet set --- at least the protocol Describe messages can do that. I
don't want Describe to force a snapshot because that would be bad for
cases like LOCK TABLE at the start of a serializable transaction, so
RevalidateCachedPlan had better be able to cope with this case.

Since the "typical" case in which no replan is necessary won't touch
the snapshot, I think we'd better adopt the rule that
RevalidateCachedPlan never causes any caller-visible change in
ActiveSnapshot, else we'll be risking very-hard-to-reproduce bugs.
So my proposal is that RevalidateCachedPlan should set a snapshot for
itself if it needs to replan and ActiveSnapshot is NULL (else it might
as well just use the existing snap); and that it should save and restore
ActiveSnapshot when it does this.

regards, tom lane

#4Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#3)
Re: Use of ActiveSnapshot

On 5/14/2007 1:29 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

The comment for the call of pg_plan_queries in util/cache/plancache.c
line 469 for example is fatally wrong. Not only should the snapshot be
set by all callers at this point, but if the call actually does replan
the queries, the existing ActiveSnapshot is replaced with one allocated
on the current memory context. If this happens to be inside of a nested
SPI call sequence, the innermost SPI stack frame will free the snapshot
data without restoring ActiveSnapshot to the one from the caller.

Yeah, I'd been meaning to go back and recheck that point after the code
settled down, but forgot :-(.

It is possible for RevalidateCachedPlan to be called with no snapshot
yet set --- at least the protocol Describe messages can do that. I
don't want Describe to force a snapshot because that would be bad for
cases like LOCK TABLE at the start of a serializable transaction, so
RevalidateCachedPlan had better be able to cope with this case.

Since the "typical" case in which no replan is necessary won't touch
the snapshot, I think we'd better adopt the rule that
RevalidateCachedPlan never causes any caller-visible change in
ActiveSnapshot, else we'll be risking very-hard-to-reproduce bugs.
So my proposal is that RevalidateCachedPlan should set a snapshot for
itself if it needs to replan and ActiveSnapshot is NULL (else it might
as well just use the existing snap); and that it should save and restore
ActiveSnapshot when it does this.

The only problem with that is that there are code paths that set
ActiveSnapshot to palloc()'d memory that is released due to a
MemoryContextDelete() without resetting ActiveSnapshot to NULL. So it
might be possible for RevalidateCachedPlan to go ahead with an
ActiveSnapshot pointing to garbage.

I think it would be cleaner if RevalidateCachedPlan()'s API would have a
Snapshot argument. If it needs a snapshot and the argument is NULL, it
can create (and free) one itself, otherwise it'd use the one given.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#4)
Re: Use of ActiveSnapshot

Jan Wieck <JanWieck@Yahoo.com> writes:

The only problem with that is that there are code paths that set
ActiveSnapshot to palloc()'d memory that is released due to a
MemoryContextDelete() without resetting ActiveSnapshot to NULL.

Only at the very end of a transaction (where ActiveSnapshot *is* reset
to null, in FreeXactSnapshot); otherwise we'd have bugs unrelated to
RevalidateCachedPlan. Eventually I would like to have reference-counted
snapshots managed by a centralized module, as was discussed a month or
two back; but right at the moment I don't think it's broken and I don't
want to spend time on intermediate solutions.

I think it would be cleaner if RevalidateCachedPlan()'s API would have a
Snapshot argument.

How does that improve anything? AFAICS the only thing that would ever
get passed is ActiveSnapshot, so this is just more notation to do
exactly the same thing.

regards, tom lane

#6Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#5)
Re: Use of ActiveSnapshot

On 5/14/2007 3:35 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

The only problem with that is that there are code paths that set
ActiveSnapshot to palloc()'d memory that is released due to a
MemoryContextDelete() without resetting ActiveSnapshot to NULL.

Only at the very end of a transaction (where ActiveSnapshot *is* reset
to null, in FreeXactSnapshot); otherwise we'd have bugs unrelated to
RevalidateCachedPlan. Eventually I would like to have reference-counted
snapshots managed by a centralized module, as was discussed a month or
two back; but right at the moment I don't think it's broken and I don't
want to spend time on intermediate solutions.

Which means that the 8.3 fix for the reproducible backend crash, I
posted earlier, is to have SPI_cursor_open() save and restore
ActiveSnapshot while calling RevalidateCachedPlan(). I'll cross check
that this fixes this symptom and commit later today.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#6)
Re: Use of ActiveSnapshot

Jan Wieck <JanWieck@Yahoo.com> writes:

Which means that the 8.3 fix for the reproducible backend crash, I
posted earlier, is to have SPI_cursor_open() save and restore
ActiveSnapshot while calling RevalidateCachedPlan(). I'll cross check
that this fixes this symptom and commit later today.

No, the correct fix is to do that inside RevalidateCachedPlan ... and I
already did it.

regards, tom lane

#8Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#7)
Re: Use of ActiveSnapshot

On 5/14/2007 4:26 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

Which means that the 8.3 fix for the reproducible backend crash, I
posted earlier, is to have SPI_cursor_open() save and restore
ActiveSnapshot while calling RevalidateCachedPlan(). I'll cross check
that this fixes this symptom and commit later today.

No, the correct fix is to do that inside RevalidateCachedPlan ... and I
already did it.

Works for me. It fixed the Slony test that actually tripped over the
bug. Thanks.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #