tuptoaster.c must *not* use SnapshotAny

Started by Tom Lanealmost 24 years ago34 messages
#1Tom Lane
tgl@sss.pgh.pa.us

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row. Including
dead rows resulting from an aborted (or, in 7.2, still-in-progress)
VACUUM.

I believe this is the explanation for a problem report I'm currently
chasing from Joshua Drake:

digivision2=# select * from change_log;
ERROR: chunk 1 for toast value 3388774 appears multiple times

(there are some thousands of rows with similar problems)

I suggest SnapshotSelf instead. Comments anyone?

regards, tom lane

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: tuptoaster.c must *not* use SnapshotAny

"Command Prompt, Inc." <pgsql-hackers@commandprompt.com> writes:

Josh Drake here: Although I know that development has pretty much ceased
on the 7.1 series. This bug, is potentially very serious for those who
have the problem.

Well, based on our experiments last night, recovery is pretty simple:
vacuum the affected table. So it doesn't qualify as a critical problem
in my eyes, although surely it's a "must fix" for 7.2.

The last time this came up (that I can find) is in August of 2001, so the
bug is not frequent that we are aware of. However, as it appears the fix
is simple should we apply a 7.1.3.1 or 7.1.4 that fixes this issue?

I don't think anyone has the energy to put out a 7.1.4. We seem to have
enough trouble getting 7.2 out the door.

regards, tom lane

#3Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#2)
Re: tuptoaster.c must *not* use SnapshotAny

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row.

Well, doesn't it need to find *any* row, since the accesses to heap and toast
are not "atomic" (not interlocked) ?

Including dead rows resulting from an aborted

But then also the heap tuple is aborted, and thus a normal select will not
select that toast anyway.

(or, in 7.2, still-in-progress)
VACUUM.

I thought new vacuum did not move tuples ? Why should it then produce
two rows ? And VACUUM FULL locks the table so ...

I think SnapshotSelf could have concurrency problems (not see a toast
that someone else committed between our heap read and toast read)
IIRC, at the time at least SnapshotAny was chosen on purpose.

Might it be a problem, that VACUUM sets xids to FrozenXid, and we always
consider that one Valid, even if that vacuum is still in progress ?

I can only see a problem with a partway through vacuum full, that aborted.

Andreas

PS: short desc between Any and Self, anybody ?

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#3)
Re: tuptoaster.c must *not* use SnapshotAny

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row.

Well, doesn't it need to find *any* row, since the accesses to heap and toast
are not "atomic" (not interlocked) ?

No; it needs to *not* find dead rows resulting from a failed vacuum.
You are right that for the most part TOAST relies on time qualification
of the main-table row, and no doubt that was why Jan thought he could
code it like this; but vacuuming the TOAST table is not an operation
that affects the main table.

Including dead rows resulting from an aborted

But then also the heap tuple is aborted, and thus a normal select will not
select that toast anyway.

Wrong; see above. We are talking about reshuffling rows in the TOAST
table that belong (presumably) to a live row in the main table. During
the VACUUM there will be more than one copy of such rows.

(or, in 7.2, still-in-progress)
VACUUM.

I thought new vacuum did not move tuples ? Why should it then produce
two rows ? And VACUUM FULL locks the table so ...

You're right, lazy VACUUM doesn't create this issue. My mistake.
But VACUUM FULL does.

I can only see a problem with a partway through vacuum full, that aborted.

Exactly.

regards, tom lane

#5Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#4)
Re: tuptoaster.c must *not* use SnapshotAny

You're right, lazy VACUUM doesn't create this issue. My mistake.
But VACUUM FULL does.

I can only see a problem with a partway through vacuum

full, that aborted.

Exactly.

Can you try give me a hint, why an aborted VACUUM FULL will not allways be
a problem (also for other operations) until you run another VACUUM FULL
that succeeds ?

How do we know, that a (newly) FrozenXid tuple does not still have
a (visible) duplicate ?

Andreas

#6Command Prompt, Inc.
pgsql-hackers@commandprompt.com
In reply to: Tom Lane (#1)
Re: tuptoaster.c must *not* use SnapshotAny

Josh Drake here: Although I know that development has pretty much ceased
on the 7.1 series. This bug, is potentially very serious for those who
have the problem.

The last time this came up (that I can find) is in August of 2001, so the
bug is not frequent that we are aware of. However, as it appears the fix
is simple should we apply a 7.1.3.1 or 7.1.4 that fixes this issue?

J

On Tue, 15 Jan 2002, Tom Lane wrote:

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row. Including
dead rows resulting from an aborted (or, in 7.2, still-in-progress)
VACUUM.

I believe this is the explanation for a problem report I'm currently
chasing from Joshua Drake:

digivision2=# select * from change_log;
ERROR: chunk 1 for toast value 3388774 appears multiple times

(there are some thousands of rows with similar problems)

I suggest SnapshotSelf instead. Comments anyone?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
by way of pgsql-hackers@commandprompt.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#5)
Re: tuptoaster.c must *not* use SnapshotAny

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

Can you try give me a hint, why an aborted VACUUM FULL will not allways be
a problem (also for other operations) until you run another VACUUM FULL
that succeeds ?

Because noplace else uses SnapshotAny, basically. The only really
legitimate use I can see for it is in Tatsuo's contrib/pgstattuple
hack: that wants to count both visible and invisible tuples.

How do we know, that a (newly) FrozenXid tuple does not still have
a (visible) duplicate ?

It's *not* visible, if you are applying any visibility checks whatever.
But SnapshotAny bypasses all visibility checking.

I forgot to answer your question about what these mean. See
src/backend/utils/time/tqual.c for the gory details, but if you
are willing to accept the summary comments in that file:

Any No visibility check whatever

Self Other committed xacts + effects of this xact up to
*and including* current command

Now Other committed xacts + effects of this xact up to
*but not including* current command

Dirty like Self, but can see effects of other in-progress
xacts too

Snapshot like Now, but specified other xacts are considered
uncommitted even if they've actually committed by now

There are also special visibility rules for Update and Vacuum, which
are not really different from the normal rules, they just return more
information than "visible" or "not visible". (Dirty returns some
side information too.)

Hmm. Now that I look at it, Self probably isn't quite right either;
if we are reading a main-table tuple that's committed dead but is
still visible according to our snapshot, we need to be able to see
its toast tuples too; but they're committed dead as well. Sigh.
I think we need a special visibility rule for TOAST, that only
checks for HEAP_MOVED_IN/HEAP_MOVED_OFF (the bits that take care
of VACUUM moves).

Comments?

regards, tom lane

#8Jan Wieck
janwieck@yahoo.com
In reply to: Tom Lane (#1)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row. Including
dead rows resulting from an aborted (or, in 7.2, still-in-progress)
VACUUM.

Really? So what you're saying means that while vacuum is
running there's a window where one tuple can exist multiple
times and just the fact that other backends can see the
vacuum transaction alive prevents them from showing up? That
doesn't sound really good to me.

Jan

I believe this is the explanation for a problem report I'm currently
chasing from Joshua Drake:

digivision2=# select * from change_log;
ERROR: chunk 1 for toast value 3388774 appears multiple times

(there are some thousands of rows with similar problems)

I suggest SnapshotSelf instead. Comments anyone?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#8)
Re: tuptoaster.c must *not* use SnapshotAny

Jan Wieck <janwieck@yahoo.com> writes:

Really? So what you're saying means that while vacuum is
running there's a window where one tuple can exist multiple
times

Moving a tuple in VACUUM isn't fundamentally different from an UPDATE.
You need a visibility rule to tell you which version of the tuple to
pay attention to. Right now, TOAST hasn't got one.

Per later discussion, Andreas is right that SnapshotSelf won't do.
I think we need a new tqual.c routine just for TOAST. (The alternative
would be for TOAST to know which snapshot was used to find the main-
table tuple, but that doesn't seem workable.)

regards, tom lane

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#7)
Re: tuptoaster.c must *not* use SnapshotAny

I have added more comments to tqual.c to the stuff Tom just added. I
added visibility items to the top of each function comment.

---------------------------------------------------------------------------

Tom Lane wrote:

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

Can you try give me a hint, why an aborted VACUUM FULL will not allways be
a problem (also for other operations) until you run another VACUUM FULL
that succeeds ?

Because noplace else uses SnapshotAny, basically. The only really
legitimate use I can see for it is in Tatsuo's contrib/pgstattuple
hack: that wants to count both visible and invisible tuples.

How do we know, that a (newly) FrozenXid tuple does not still have
a (visible) duplicate ?

It's *not* visible, if you are applying any visibility checks whatever.
But SnapshotAny bypasses all visibility checking.

I forgot to answer your question about what these mean. See
src/backend/utils/time/tqual.c for the gory details, but if you
are willing to accept the summary comments in that file:

Any No visibility check whatever

Self Other committed xacts + effects of this xact up to
*and including* current command

Now Other committed xacts + effects of this xact up to
*but not including* current command

Dirty like Self, but can see effects of other in-progress
xacts too

Snapshot like Now, but specified other xacts are considered
uncommitted even if they've actually committed by now

There are also special visibility rules for Update and Vacuum, which
are not really different from the normal rules, they just return more
information than "visible" or "not visible". (Dirty returns some
side information too.)

Hmm. Now that I look at it, Self probably isn't quite right either;
if we are reading a main-table tuple that's committed dead but is
still visible according to our snapshot, we need to be able to see
its toast tuples too; but they're committed dead as well. Sigh.
I think we need a special visibility rule for TOAST, that only
checks for HEAP_MOVED_IN/HEAP_MOVED_OFF (the bits that take care
of VACUUM moves).

Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#11Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#10)
Re: tuptoaster.c must *not* use SnapshotAny

How do we know, that a (newly) FrozenXid tuple does not still have
a (visible) duplicate ?

It's *not* visible, if you are applying any visibility checks whatever.
But SnapshotAny bypasses all visibility checking.

I am concerned about the case where VACUUM FULL:
1. inserts heap tuple to new location using FrozenXid
2. updates original heap tuples's xmax

What if we crash/abort between step 1 and 2 but we used FrozenXid for 1.
Don't know if we actually do this, but imho we are only allowed
to use FrozenXid for an inplace vacuum operation.

Andreas

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#11)
Re: tuptoaster.c must *not* use SnapshotAny

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

I am concerned about the case where VACUUM FULL:
1. inserts heap tuple to new location using FrozenXid
2. updates original heap tuples's xmax

It doesn't matter whether it's FrozenXid or not. The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

regards, tom lane

#13Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#12)
Re: tuptoaster.c must *not* use SnapshotAny

I am concerned about the case where VACUUM FULL:
1. inserts heap tuple to new location using FrozenXid
2. updates original heap tuples's xmax

It doesn't matter whether it's FrozenXid or not. The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

But the FrozenXid tuple has HEAP_MOVED_IN and the original has
not yet been altered to HEAP_MOVED_OFF because of abort.
Is the HEAP_MOVED_IN tuple not visible ?

Andreas

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#13)
Re: tuptoaster.c must *not* use SnapshotAny

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

It doesn't matter whether it's FrozenXid or not. The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

But the FrozenXid tuple has HEAP_MOVED_IN and the original has
not yet been altered to HEAP_MOVED_OFF because of abort.
Is the HEAP_MOVED_IN tuple not visible ?

Right. Actually it doesn't matter whether the old tuple has
HEAP_MOVED_OFF or not; it's still visible *until* the VACUUM commits.
The commit atomically switches us from the state where the unmoved
tuples are good to the state where the moved ones are good.

This is all exactly the same whether FrozenXid is involved or not.

regards, tom lane

#15Jan Wieck
janwieck@yahoo.com
In reply to: Tom Lane (#14)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

It doesn't matter whether it's FrozenXid or not. The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

But the FrozenXid tuple has HEAP_MOVED_IN and the original has
not yet been altered to HEAP_MOVED_OFF because of abort.
Is the HEAP_MOVED_IN tuple not visible ?

Right. Actually it doesn't matter whether the old tuple has
HEAP_MOVED_OFF or not; it's still visible *until* the VACUUM commits.
The commit atomically switches us from the state where the unmoved
tuples are good to the state where the moved ones are good.

This is all exactly the same whether FrozenXid is involved or not.

Originally I added SnapshotAny only to solve the problem that
the lookup of the toast table happens independant from the
access to the main tuple, and that in fact the main tuple
could have been deleted by the own transaction in between.

Would changing SnapshotAny to honor HEAP_MOVED_* fix the
problem? If nothing else by now uses this snapshot, it could
be done in place.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#15)
Re: tuptoaster.c must *not* use SnapshotAny

Jan Wieck <janwieck@yahoo.com> writes:

Originally I added SnapshotAny only to solve the problem that
the lookup of the toast table happens independant from the
access to the main tuple, and that in fact the main tuple
could have been deleted by the own transaction in between.

Would changing SnapshotAny to honor HEAP_MOVED_* fix the
problem? If nothing else by now uses this snapshot, it could
be done in place.

What I've done is add an additional snapshot class SnapshotToast
that behaves that way. There are a couple other uses of SnapshotAny
that I didn't think I wanted to mess with.

regards, tom lane

#17Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Jan Wieck (#15)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

Jan Wieck <janwieck@yahoo.com> writes:

Originally I added SnapshotAny only to solve the problem that
the lookup of the toast table happens independant from the
access to the main tuple, and that in fact the main tuple
could have been deleted by the own transaction in between.

Would changing SnapshotAny to honor HEAP_MOVED_* fix the
problem? If nothing else by now uses this snapshot, it could
be done in place.

What I've done is add an additional snapshot class SnapshotToast
that behaves that way. There are a couple other uses of SnapshotAny
that I didn't think I wanted to mess with.

AFAIR SnapshotAny was introduced for Foreign Key stuff.
SnapshotToast doesn't seem a proper solution to me but
7.2 should be released ASAP anyway.

regards,
Hiroshi Inoue

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#17)
Re: tuptoaster.c must *not* use SnapshotAny

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

SnapshotToast doesn't seem a proper solution to me

Do you have a better idea in mind?

regards, tom lane

#19Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Jan Wieck (#15)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

SnapshotToast doesn't seem a proper solution to me

Do you have a better idea in mind?

The same snapshot for both the main and the toast table
seems better though I don't know how it's difficult to
change. For example is it possible to update a toast
chunk partially using SnapshotToast ?

regards,
Hiroshi Inoue

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#19)
Re: tuptoaster.c must *not* use SnapshotAny

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The same snapshot for both the main and the toast table
seems better though I don't know how it's difficult to
change.

That would be cleaner but I don't see any way to do it. The toasted
value gets passed all over the system, potentially, before someone asks
to detoast it. At that point you really don't have a good way to know
which snapshot was used to fetch the main-table row.

For example is it possible to update a toast
chunk partially using SnapshotToast ?

As things stand (with either SnapshotToast or the old SnapshotAny way)
it is never possible to update an individual toast value, either
partially or wholly. All you can do is lay down a new toast value (with
a new identifying OID) and then delete the old one.

But I'm not sure that this is wrong, or fixable. Trying to update part
of a toasted value is very much like wanting to update part of an
existing row in-place, which we cannot possibly do. We have to lay down
a whole new row whenever any part of it is updated.

regards, tom lane

#21Jan Wieck
janwieck@yahoo.com
In reply to: Tom Lane (#20)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The same snapshot for both the main and the toast table
seems better though I don't know how it's difficult to
change.

That would be cleaner but I don't see any way to do it. The toasted
value gets passed all over the system, potentially, before someone asks
to detoast it. At that point you really don't have a good way to know
which snapshot was used to fetch the main-table row.

That's IMHO not the point. The design of toast is that if a
main heap tuple points to a toast id, and the heap tuple at
scan time was visible, then no matter what, the value is
readable. Regardless of any isolation level or command
counter or whatnot. That's achieved by allways assigning a
new toast ID on change of a toasted value. So if the columns
value changed, the old and new rows point to different toast
values not by visibility but by toast id. If it did not
change, they point to the same.

The problem arises only because SnapshotAny is *too* blind
and ignores the special vacuum HEAP_MOVED_* states, so that a
scan after a crashed vacuum full, that left yet unborn moved-
in tuples in the toast table, can see them.

Since the design of toast is NO NEED to care about snapshots
during toast value fetching, I don't see why doing so now
would be a cleaner solution.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#22Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Jan Wieck (#15)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

For example is it possible to update a toast
chunk partially using SnapshotToast ?

As things stand (with either SnapshotToast or the old SnapshotAny way)
it is never possible to update an individual toast value, either
partially or wholly. All you can do is lay down a new toast value (with
a new identifying OID) and then delete the old one.

But I'm not sure that this is wrong, or fixable. Trying to update part
of a toasted value is very much like wanting to update part of an
existing row in-place, which we cannot possibly do.

Bytea seems to be considered as a candidate for BLOB
though I think the entirely new type is preferable.
It seems impossible to implement a functionality like
inv_write using bytea which the current large object
stuff has.

regards,
Hiroshi Inoue

#23Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hiroshi Inoue (#22)
Re: tuptoaster.c must *not* use SnapshotAny

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

For example is it possible to update a toast
chunk partially using SnapshotToast ?

As things stand (with either SnapshotToast or the old SnapshotAny way)
it is never possible to update an individual toast value, either
partially or wholly. All you can do is lay down a new toast value (with
a new identifying OID) and then delete the old one.

But I'm not sure that this is wrong, or fixable. Trying to update part
of a toasted value is very much like wanting to update part of an
existing row in-place, which we cannot possibly do.

Bytea seems to be considered as a candidate for BLOB
though I think the entirely new type is preferable.
It seems impossible to implement a functionality like
inv_write using bytea which the current large object
stuff has.

Agreed. I think that was the reason we kept TOAST and large objects,
because large objects were designed for random read-write. If we can
get large objects to auto-delete, probably with pg_depend, we can then
use them seamlessly with BLOB I/O routines.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#24Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#23)
Re: tuptoaster.c must *not* use SnapshotAny

Bruce Momjian wrote:

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

For example is it possible to update a toast
chunk partially using SnapshotToast ?

As things stand (with either SnapshotToast or the old SnapshotAny way)
it is never possible to update an individual toast value, either
partially or wholly. All you can do is lay down a new toast value (with
a new identifying OID) and then delete the old one.

But I'm not sure that this is wrong, or fixable. Trying to update part
of a toasted value is very much like wanting to update part of an
existing row in-place, which we cannot possibly do.

Bytea seems to be considered as a candidate for BLOB
though I think the entirely new type is preferable.
It seems impossible to implement a functionality like
inv_write using bytea which the current large object
stuff has.

Agreed. I think that was the reason we kept TOAST and large objects,
because large objects were designed for random read-write. If we can
get large objects to auto-delete, probably with pg_depend, we can then
use them seamlessly with BLOB I/O routines.

Oops I seem to have missed the discussion about excluding
bytea from the candidate from BLOB. Yes now we seem to have
a good reason to exclude existent type from the candidate
of BLOB.

regards,
Hiroshi Inoue

#25Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hiroshi Inoue (#24)
Re: tuptoaster.c must *not* use SnapshotAny

Agreed. I think that was the reason we kept TOAST and large objects,
because large objects were designed for random read-write. If we can
get large objects to auto-delete, probably with pg_depend, we can then
use them seamlessly with BLOB I/O routines.

Oops I seem to have missed the discussion about excluding
bytea from the candidate from BLOB. Yes now we seem to have
a good reason to exclude existent type from the candidate
of BLOB.

Well, we had the discussion when Jan was adding TOAST, and Jan was
saying we still need large objects for I/O purposes and for very large
items.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#26Jan Wieck
janwieck@yahoo.com
In reply to: Bruce Momjian (#23)
Re: tuptoaster.c must *not* use SnapshotAny

Bruce Momjian wrote:

Agreed. I think that was the reason we kept TOAST and large objects,
because large objects were designed for random read-write. If we can
get large objects to auto-delete, probably with pg_depend, we can then
use them seamlessly with BLOB I/O routines.

Not entirely. BLOB's should have copy semantics, so that
doing

INSERT INTO tab1 SELECT id, blob_column FROM tab2 ...

and subsequently changing the blob in either tab1 or tab2
does NOT change the blob in the other table. Currently, even
if you make the pg_depend stuff smart enough to let the blob
live until all references are gone, the two tables would
share the same blob. And that's not only IMHO wrong, it's
also incompatible to Oracle :-)

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#27Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jan Wieck (#26)
Re: tuptoaster.c must *not* use SnapshotAny

Jan Wieck wrote:

Bruce Momjian wrote:

Agreed. I think that was the reason we kept TOAST and large objects,
because large objects were designed for random read-write. If we can
get large objects to auto-delete, probably with pg_depend, we can then
use them seamlessly with BLOB I/O routines.

Not entirely. BLOB's should have copy semantics, so that
doing

INSERT INTO tab1 SELECT id, blob_column FROM tab2 ...

and subsequently changing the blob in either tab1 or tab2
does NOT change the blob in the other table. Currently, even
if you make the pg_depend stuff smart enough to let the blob
live until all references are gone, the two tables would
share the same blob. And that's not only IMHO wrong, it's
also incompatible to Oracle :-)

Yes, we have to add duplication large object stuff hard-wired to that new data
type.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#25)
Re: tuptoaster.c must *not* use SnapshotAny

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Well, we had the discussion when Jan was adding TOAST, and Jan was
saying we still need large objects for I/O purposes and for very large
items.

Yeah. I hadn't realized until this conversation that TOAST couldn't
support partial updates of huge values. That is a definite problem for
supporting lo_read/lo_write-type operations.

regards, tom lane

#29Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#25)
Re: tuptoaster.c must *not* use SnapshotAny

Bruce Momjian wrote:

Agreed. I think that was the reason we kept TOAST and large objects,
because large objects were designed for random read-write. If we can
get large objects to auto-delete, probably with pg_depend, we can then
use them seamlessly with BLOB I/O routines.

Oops I seem to have missed the discussion about excluding
bytea from the candidate from BLOB. Yes now we seem to have
a good reason to exclude existent type from the candidate
of BLOB.

Well, we had the discussion when Jan was adding TOAST, and Jan was
saying we still need large objects for I/O purposes and for very large
items.

Though I've often seen the reference to bytea BLOB
I remember no clear negation. Don't we have to negate
it clearly from the first ?

regards,
Hiroshi Inoue

#30Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hiroshi Inoue (#29)
Re: tuptoaster.c must *not* use SnapshotAny

Well, we had the discussion when Jan was adding TOAST, and Jan was
saying we still need large objects for I/O purposes and for very large
items.

Though I've often seen the reference to bytea BLOB
I remember no clear negation. Don't we have to negate
it clearly from the first ?

As I remember, we only mention BLOB when people ask. We don't refer to
either as BLOB, only as TOAST/bytea and large objects.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#31Jan Wieck
janwieck@yahoo.com
In reply to: Hiroshi Inoue (#29)
Re: tuptoaster.c must *not* use SnapshotAny

Hiroshi Inoue wrote:
[Charset iso-2022-jp unsupported, skipping...]

Though I've often seen the reference to bytea BLOB
I remember no clear negation. Don't we have to negate
it clearly from the first ?

Any datatype that has the potential to be loaded into memory
entirely by the backend is dangerous. It leads to problems
like "out of swapspace", which aren't really funny in
production.

We need something that is restricted to streaming access
only. Yet, it needs to have copy semantics.

That means to me, that this will not only be a new datatype,
but some more infrastructure too. We need INSERT ...
RETURNING ... where the values for BLOBs are given as
NEW_BLOB() and the returned identifier is subsequently used
to open the created BLOBs for writing and pump the data in.

We need heap access level tuple support to duplicate those
values on INSERT ... SELECT, to remove them on DELETE and to
manipulate them on UPDATE.

We need new functionality in procedural languages to support
streaming in and out of these guy's, where the actual data is
buffered in temp files and only blob identifiers passed
around.

I see the final storage of blob data to happen in the toast
table, sliced, diced and handy (most probably not
compressed). But I don't see an easy way to abuse the bytea
datatype for real blob support.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#32Joe Conway
jconway@cox.net
In reply to: Bruce Momjian (#25)
Re: tuptoaster.c must *not* use SnapshotAny

Hiroshi Inoue wrote:

Though I've often seen the reference to bytea BLOB
I remember no clear negation. Don't we have to negate
it clearly from the first ?

regards,
Hiroshi Inoue

With the changes in 7.2, bytea is actually very close to matching the
SQL99 definition of binary strings, which it also refers to as BLOB.
BLOBS should include support for (section 4.3):

<comparison predicate> - yes
<blob concatenation> - yes
<blob substring function> - yes
<blob overlay function> - no
<trim function> - yes
<length expression> - yes
<position expression> - yes
<like predicate> - yes

Other than the overlay function, the primary thing missing is the use of
hex as the I/O representation.

See table 3-9 at
http://candle.pha.pa.us/main/writings/pgsql/sgml/datatype-binary.html
for more info.

Joe

#33Joe Conway
jconway@cox.net
In reply to: Jan Wieck (#31)
Re: tuptoaster.c must *not* use SnapshotAny

Jan Wieck wrote:

Any datatype that has the potential to be loaded into memory
entirely by the backend is dangerous. It leads to problems
like "out of swapspace", which aren't really funny in
production.

We need something that is restricted to streaming access
only. Yet, it needs to have copy semantics.

That means to me, that this will not only be a new datatype,
but some more infrastructure too. We need INSERT ...
RETURNING ... where the values for BLOBs are given as
NEW_BLOB() and the returned identifier is subsequently used
to open the created BLOBs for writing and pump the data in.

We need heap access level tuple support to duplicate those
values on INSERT ... SELECT, to remove them on DELETE and to
manipulate them on UPDATE.

We need new functionality in procedural languages to support
streaming in and out of these guy's, where the actual data is
buffered in temp files and only blob identifiers passed
around.

I see the final storage of blob data to happen in the toast
table, sliced, diced and handy (most probably not
compressed). But I don't see an easy way to abuse the bytea
datatype for real blob support.

All this may be needed for BLOBs that are *huge*, but there are at least
two classes of BLOB data that are not typically huge in practice. One is
encrypted or hashed data, the other is image data for web pages. Either
of these work very well with the current bytea support.

Joe

#34Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#25)
Re: tuptoaster.c must *not* use SnapshotAny

Joe Conway wrote:

Hiroshi Inoue wrote:

Though I've often seen the reference to bytea BLOB
I remember no clear negation. Don't we have to negate
it clearly from the first ?

regards,
Hiroshi Inoue

With the changes in 7.2, bytea is actually very close to matching the
SQL99 definition of binary strings, which it also refers to as BLOB.
BLOBS should include support for (section 4.3):

<comparison predicate> - yes
<blob concatenation> - yes
<blob substring function> - yes
<blob overlay function> - no
<trim function> - yes
<length expression> - yes
<position expression> - yes
<like predicate> - yes

Other than the overlay function,

It seems good as long as they are not large(huge ?).
It seems a significant flaw for bytea to have no
possibility of partial update.

the primary thing missing is the use of
hex as the I/O representation.

I don't love the current I/O representation of bytea.
It seems worth adding a new data type only to change the
binary I/O representation.

regards,
Hiroshi Inoue