tuptoaster.c must *not* use SnapshotAny

Started by Tom Laneabout 24 years ago34 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row. Including
dead rows resulting from an aborted (or, in 7.2, still-in-progress)
VACUUM.

I believe this is the explanation for a problem report I'm currently
chasing from Joshua Drake:

digivision2=# select * from change_log;
ERROR: chunk 1 for toast value 3388774 appears multiple times

(there are some thousands of rows with similar problems)

I suggest SnapshotSelf instead. Comments anyone?

regards, tom lane

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: tuptoaster.c must *not* use SnapshotAny

"Command Prompt, Inc." <pgsql-hackers@commandprompt.com> writes:

Josh Drake here: Although I know that development has pretty much ceased
on the 7.1 series. This bug, is potentially very serious for those who
have the problem.

Well, based on our experiments last night, recovery is pretty simple:
vacuum the affected table. So it doesn't qualify as a critical problem
in my eyes, although surely it's a "must fix" for 7.2.

The last time this came up (that I can find) is in August of 2001, so the
bug is not frequent that we are aware of. However, as it appears the fix
is simple should we apply a 7.1.3.1 or 7.1.4 that fixes this issue?

I don't think anyone has the energy to put out a 7.1.4. We seem to have
enough trouble getting 7.2 out the door.

regards, tom lane

#3Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#2)
Re: tuptoaster.c must *not* use SnapshotAny

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row.

Well, doesn't it need to find *any* row, since the accesses to heap and toast
are not "atomic" (not interlocked) ?

Including dead rows resulting from an aborted

But then also the heap tuple is aborted, and thus a normal select will not
select that toast anyway.

(or, in 7.2, still-in-progress)
VACUUM.

I thought new vacuum did not move tuples ? Why should it then produce
two rows ? And VACUUM FULL locks the table so ...

I think SnapshotSelf could have concurrency problems (not see a toast
that someone else committed between our heap read and toast read)
IIRC, at the time at least SnapshotAny was chosen on purpose.

Might it be a problem, that VACUUM sets xids to FrozenXid, and we always
consider that one Valid, even if that vacuum is still in progress ?

I can only see a problem with a partway through vacuum full, that aborted.

Andreas

PS: short desc between Any and Self, anybody ?

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#3)
Re: tuptoaster.c must *not* use SnapshotAny

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row.

Well, doesn't it need to find *any* row, since the accesses to heap and toast
are not "atomic" (not interlocked) ?

No; it needs to *not* find dead rows resulting from a failed vacuum.
You are right that for the most part TOAST relies on time qualification
of the main-table row, and no doubt that was why Jan thought he could
code it like this; but vacuuming the TOAST table is not an operation
that affects the main table.

Including dead rows resulting from an aborted

But then also the heap tuple is aborted, and thus a normal select will not
select that toast anyway.

Wrong; see above. We are talking about reshuffling rows in the TOAST
table that belong (presumably) to a live row in the main table. During
the VACUUM there will be more than one copy of such rows.

(or, in 7.2, still-in-progress)
VACUUM.

I thought new vacuum did not move tuples ? Why should it then produce
two rows ? And VACUUM FULL locks the table so ...

You're right, lazy VACUUM doesn't create this issue. My mistake.
But VACUUM FULL does.

I can only see a problem with a partway through vacuum full, that aborted.

Exactly.

regards, tom lane

#5Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#4)
Re: tuptoaster.c must *not* use SnapshotAny

You're right, lazy VACUUM doesn't create this issue. My mistake.
But VACUUM FULL does.

I can only see a problem with a partway through vacuum

full, that aborted.

Exactly.

Can you try give me a hint, why an aborted VACUUM FULL will not allways be
a problem (also for other operations) until you run another VACUUM FULL
that succeeds ?

How do we know, that a (newly) FrozenXid tuple does not still have
a (visible) duplicate ?

Andreas

#6Command Prompt, Inc.
pgsql-hackers@commandprompt.com
In reply to: Tom Lane (#1)
Re: tuptoaster.c must *not* use SnapshotAny

Josh Drake here: Although I know that development has pretty much ceased
on the 7.1 series. This bug, is potentially very serious for those who
have the problem.

The last time this came up (that I can find) is in August of 2001, so the
bug is not frequent that we are aware of. However, as it appears the fix
is simple should we apply a 7.1.3.1 or 7.1.4 that fixes this issue?

J

On Tue, 15 Jan 2002, Tom Lane wrote:

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row. Including
dead rows resulting from an aborted (or, in 7.2, still-in-progress)
VACUUM.

I believe this is the explanation for a problem report I'm currently
chasing from Joshua Drake:

digivision2=# select * from change_log;
ERROR: chunk 1 for toast value 3388774 appears multiple times

(there are some thousands of rows with similar problems)

I suggest SnapshotSelf instead. Comments anyone?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
by way of pgsql-hackers@commandprompt.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#5)
Re: tuptoaster.c must *not* use SnapshotAny

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

Can you try give me a hint, why an aborted VACUUM FULL will not allways be
a problem (also for other operations) until you run another VACUUM FULL
that succeeds ?

Because noplace else uses SnapshotAny, basically. The only really
legitimate use I can see for it is in Tatsuo's contrib/pgstattuple
hack: that wants to count both visible and invisible tuples.

How do we know, that a (newly) FrozenXid tuple does not still have
a (visible) duplicate ?

It's *not* visible, if you are applying any visibility checks whatever.
But SnapshotAny bypasses all visibility checking.

I forgot to answer your question about what these mean. See
src/backend/utils/time/tqual.c for the gory details, but if you
are willing to accept the summary comments in that file:

Any No visibility check whatever

Self Other committed xacts + effects of this xact up to
*and including* current command

Now Other committed xacts + effects of this xact up to
*but not including* current command

Dirty like Self, but can see effects of other in-progress
xacts too

Snapshot like Now, but specified other xacts are considered
uncommitted even if they've actually committed by now

There are also special visibility rules for Update and Vacuum, which
are not really different from the normal rules, they just return more
information than "visible" or "not visible". (Dirty returns some
side information too.)

Hmm. Now that I look at it, Self probably isn't quite right either;
if we are reading a main-table tuple that's committed dead but is
still visible according to our snapshot, we need to be able to see
its toast tuples too; but they're committed dead as well. Sigh.
I think we need a special visibility rule for TOAST, that only
checks for HEAP_MOVED_IN/HEAP_MOVED_OFF (the bits that take care
of VACUUM moves).

Comments?

regards, tom lane

#8Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#1)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row. Including
dead rows resulting from an aborted (or, in 7.2, still-in-progress)
VACUUM.

Really? So what you're saying means that while vacuum is
running there's a window where one tuple can exist multiple
times and just the fact that other backends can see the
vacuum transaction alive prevents them from showing up? That
doesn't sound really good to me.

Jan

I believe this is the explanation for a problem report I'm currently
chasing from Joshua Drake:

digivision2=# select * from change_log;
ERROR: chunk 1 for toast value 3388774 appears multiple times

(there are some thousands of rows with similar problems)

I suggest SnapshotSelf instead. Comments anyone?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#8)
Re: tuptoaster.c must *not* use SnapshotAny

Jan Wieck <janwieck@yahoo.com> writes:

Really? So what you're saying means that while vacuum is
running there's a window where one tuple can exist multiple
times

Moving a tuple in VACUUM isn't fundamentally different from an UPDATE.
You need a visibility rule to tell you which version of the tuple to
pay attention to. Right now, TOAST hasn't got one.

Per later discussion, Andreas is right that SnapshotSelf won't do.
I think we need a new tqual.c routine just for TOAST. (The alternative
would be for TOAST to know which snapshot was used to find the main-
table tuple, but that doesn't seem workable.)

regards, tom lane

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
Re: tuptoaster.c must *not* use SnapshotAny

I have added more comments to tqual.c to the stuff Tom just added. I
added visibility items to the top of each function comment.

---------------------------------------------------------------------------

Tom Lane wrote:

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

Can you try give me a hint, why an aborted VACUUM FULL will not allways be
a problem (also for other operations) until you run another VACUUM FULL
that succeeds ?

Because noplace else uses SnapshotAny, basically. The only really
legitimate use I can see for it is in Tatsuo's contrib/pgstattuple
hack: that wants to count both visible and invisible tuples.

How do we know, that a (newly) FrozenXid tuple does not still have
a (visible) duplicate ?

It's *not* visible, if you are applying any visibility checks whatever.
But SnapshotAny bypasses all visibility checking.

I forgot to answer your question about what these mean. See
src/backend/utils/time/tqual.c for the gory details, but if you
are willing to accept the summary comments in that file:

Any No visibility check whatever

Self Other committed xacts + effects of this xact up to
*and including* current command

Now Other committed xacts + effects of this xact up to
*but not including* current command

Dirty like Self, but can see effects of other in-progress
xacts too

Snapshot like Now, but specified other xacts are considered
uncommitted even if they've actually committed by now

There are also special visibility rules for Update and Vacuum, which
are not really different from the normal rules, they just return more
information than "visible" or "not visible". (Dirty returns some
side information too.)

Hmm. Now that I look at it, Self probably isn't quite right either;
if we are reading a main-table tuple that's committed dead but is
still visible according to our snapshot, we need to be able to see
its toast tuples too; but they're committed dead as well. Sigh.
I think we need a special visibility rule for TOAST, that only
checks for HEAP_MOVED_IN/HEAP_MOVED_OFF (the bits that take care
of VACUUM moves).

Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#11Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#10)
Re: tuptoaster.c must *not* use SnapshotAny

How do we know, that a (newly) FrozenXid tuple does not still have
a (visible) duplicate ?

It's *not* visible, if you are applying any visibility checks whatever.
But SnapshotAny bypasses all visibility checking.

I am concerned about the case where VACUUM FULL:
1. inserts heap tuple to new location using FrozenXid
2. updates original heap tuples's xmax

What if we crash/abort between step 1 and 2 but we used FrozenXid for 1.
Don't know if we actually do this, but imho we are only allowed
to use FrozenXid for an inplace vacuum operation.

Andreas

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#11)
Re: tuptoaster.c must *not* use SnapshotAny

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

I am concerned about the case where VACUUM FULL:
1. inserts heap tuple to new location using FrozenXid
2. updates original heap tuples's xmax

It doesn't matter whether it's FrozenXid or not. The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

regards, tom lane

#13Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#12)
Re: tuptoaster.c must *not* use SnapshotAny

I am concerned about the case where VACUUM FULL:
1. inserts heap tuple to new location using FrozenXid
2. updates original heap tuples's xmax

It doesn't matter whether it's FrozenXid or not. The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

But the FrozenXid tuple has HEAP_MOVED_IN and the original has
not yet been altered to HEAP_MOVED_OFF because of abort.
Is the HEAP_MOVED_IN tuple not visible ?

Andreas

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#13)
Re: tuptoaster.c must *not* use SnapshotAny

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

It doesn't matter whether it's FrozenXid or not. The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

But the FrozenXid tuple has HEAP_MOVED_IN and the original has
not yet been altered to HEAP_MOVED_OFF because of abort.
Is the HEAP_MOVED_IN tuple not visible ?

Right. Actually it doesn't matter whether the old tuple has
HEAP_MOVED_OFF or not; it's still visible *until* the VACUUM commits.
The commit atomically switches us from the state where the unmoved
tuples are good to the state where the moved ones are good.

This is all exactly the same whether FrozenXid is involved or not.

regards, tom lane

#15Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#14)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

It doesn't matter whether it's FrozenXid or not. The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

But the FrozenXid tuple has HEAP_MOVED_IN and the original has
not yet been altered to HEAP_MOVED_OFF because of abort.
Is the HEAP_MOVED_IN tuple not visible ?

Right. Actually it doesn't matter whether the old tuple has
HEAP_MOVED_OFF or not; it's still visible *until* the VACUUM commits.
The commit atomically switches us from the state where the unmoved
tuples are good to the state where the moved ones are good.

This is all exactly the same whether FrozenXid is involved or not.

Originally I added SnapshotAny only to solve the problem that
the lookup of the toast table happens independant from the
access to the main tuple, and that in fact the main tuple
could have been deleted by the own transaction in between.

Would changing SnapshotAny to honor HEAP_MOVED_* fix the
problem? If nothing else by now uses this snapshot, it could
be done in place.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#15)
Re: tuptoaster.c must *not* use SnapshotAny

Jan Wieck <janwieck@yahoo.com> writes:

Originally I added SnapshotAny only to solve the problem that
the lookup of the toast table happens independant from the
access to the main tuple, and that in fact the main tuple
could have been deleted by the own transaction in between.

Would changing SnapshotAny to honor HEAP_MOVED_* fix the
problem? If nothing else by now uses this snapshot, it could
be done in place.

What I've done is add an additional snapshot class SnapshotToast
that behaves that way. There are a couple other uses of SnapshotAny
that I didn't think I wanted to mess with.

regards, tom lane

#17Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Jan Wieck (#15)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

Jan Wieck <janwieck@yahoo.com> writes:

Originally I added SnapshotAny only to solve the problem that
the lookup of the toast table happens independant from the
access to the main tuple, and that in fact the main tuple
could have been deleted by the own transaction in between.

Would changing SnapshotAny to honor HEAP_MOVED_* fix the
problem? If nothing else by now uses this snapshot, it could
be done in place.

What I've done is add an additional snapshot class SnapshotToast
that behaves that way. There are a couple other uses of SnapshotAny
that I didn't think I wanted to mess with.

AFAIR SnapshotAny was introduced for Foreign Key stuff.
SnapshotToast doesn't seem a proper solution to me but
7.2 should be released ASAP anyway.

regards,
Hiroshi Inoue

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#17)
Re: tuptoaster.c must *not* use SnapshotAny

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

SnapshotToast doesn't seem a proper solution to me

Do you have a better idea in mind?

regards, tom lane

#19Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Jan Wieck (#15)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

SnapshotToast doesn't seem a proper solution to me

Do you have a better idea in mind?

The same snapshot for both the main and the toast table
seems better though I don't know how it's difficult to
change. For example is it possible to update a toast
chunk partially using SnapshotToast ?

regards,
Hiroshi Inoue

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#19)
Re: tuptoaster.c must *not* use SnapshotAny

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The same snapshot for both the main and the toast table
seems better though I don't know how it's difficult to
change.

That would be cleaner but I don't see any way to do it. The toasted
value gets passed all over the system, potentially, before someone asks
to detoast it. At that point you really don't have a good way to know
which snapshot was used to fetch the main-table row.

For example is it possible to update a toast
chunk partially using SnapshotToast ?

As things stand (with either SnapshotToast or the old SnapshotAny way)
it is never possible to update an individual toast value, either
partially or wholly. All you can do is lay down a new toast value (with
a new identifying OID) and then delete the old one.

But I'm not sure that this is wrong, or fixable. Trying to update part
of a toasted value is very much like wanting to update part of an
existing row in-place, which we cannot possibly do. We have to lay down
a whole new row whenever any part of it is updated.

regards, tom lane

#21Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#20)
#22Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Jan Wieck (#15)
#23Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#22)
#24Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#24)
#26Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#23)
#27Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#25)
#29Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#25)
#30Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#29)
#31Jan Wieck
JanWieck@Yahoo.com
In reply to: Hiroshi Inoue (#29)
#32Joe Conway
mail@joeconway.com
In reply to: Bruce Momjian (#25)
#33Joe Conway
mail@joeconway.com
In reply to: Jan Wieck (#31)
#34Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#25)