Pass-by-reference UDTs and volatility

Started by Stephen Scheck almost 13 years ago, 4 messages, general
#1 Stephen Scheck
singularsyntax@gmail.com

Hello,

I am working on an extension which defines a number of user-defined
functions which will operate on a common, custom data type to perform a
pipeline of transformations (the data type is the IN/OUT parameter for all
of the functions), eventually being supplied to a sink function which takes
the data type as input and produces tuple(s) as output. A source function
will produce the initial instance of the data type and feed it to the head
of the pipeline. As such, the data type only ever exists in memory and is
never intended to be used as a column of a table or persisted to disk.

What I would really like is something like the "internal" pseudo-type that
can be used to define the functions, but is not allowed to be used as a
column type in a table. However, since these functions must be callable
from a user context, "internal" will not work. I'm not that concerned with
users trying to use the data type in table DDL as they can simply be
instructed not to in documentation (and suffer the consequences of their
actions if they don't read the fine manual). However, in chapter 35.9 of
the Postgres docs, there is this warning:

"Never modify the contents of a pass-by-reference input value. If you do so
you are likely to corrupt on-disk data, since the pointer you are given
might point directly into a disk buffer. The sole exception to this rule is
explained in Section 35.10."

If the UDFs the extension defines are the sole producer/consumer of the
data type and are consistent in the way they manipulate the in-memory data
structure for the type, can the above rule be safely ignored? Or could the
backend do something like try to persist intermediate return values from
functions to temporary hard storage as it proceeds with execution of a
query plan?

Thanks.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Scheck (#1)
Re: Pass-by-reference UDTs and volatility

Stephen Scheck <singularsyntax@gmail.com> writes:

> "Never modify the contents of a pass-by-reference input value. If you do so
> you are likely to corrupt on-disk data, since the pointer you are given
> might point directly into a disk buffer. The sole exception to this rule is
> explained in Section 35.10."
>
> If the UDFs the extension defines are the sole producer/consumer of the
> data type and are consistent in the way they manipulate the in-memory data
> structure for the type, can the above rule be safely ignored?

No.

> Or could the
> backend do something like try to persist intermediate return values from
> functions to temporary hard storage as it proceeds with execution of a
> query plan?

It might well do that; you really do not have the option to create Datum
values that can't be copied by datumCopy(). Even more directly, if you
do something like

select foo('...'::pass_by_ref_type)

and foo elects to scribble on its input, it will be corrupting a Const
node in the query plan. You'd probably not notice any bad effects from
that in the case of a one-shot plan, but it would definitely break
cached plans.
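
As a minimal illustration of this point, and not code from the thread: the sketch below is plain standalone C with no PostgreSQL headers, and every name in it (pipeline_value, step_in_place, step_copy) is hypothetical. It only contrasts a step that scribbles on its input with one that copies the input first, assuming the value is a simple fixed-size struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pass-by-reference value. */
typedef struct pipeline_value {
    int counter;
} pipeline_value;

/* Unsafe: scribbles on the caller's value, which may be shared
 * (for example, a constant kept in a cached plan). */
pipeline_value *step_in_place(pipeline_value *in)
{
    assert(in != NULL);
    in->counter += 1;
    return in;
}

/* Safe: copies the input, then modifies and returns only the copy.
 * The caller owns the result and releases it with free(). */
pipeline_value *step_copy(const pipeline_value *in)
{
    assert(in != NULL);
    pipeline_value *out = malloc(sizeof *out);
    if (out == NULL)
        abort();
    memcpy(out, in, sizeof *out);
    out->counter += 1;
    return out;
}
```

Calling step_copy twice on a value whose counter is 10 gives 11 both times and leaves the original at 10. Calling step_in_place twice on the same value gives 11 and then 12, which is the cached-plan problem in miniature: the second execution no longer sees the input the query was written with.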

Just brainstorming here, but: you might consider keeping the actual
value(s) in private storage, perhaps a hashtable, and making the Datums
that Postgres passes around be just tokens referencing hashtable
entries. This would for one thing give you greatly more security
against user query-structure errors than what you're sketching.
The main thing that might be hard to deal with is figuring out when it's
safe to reclaim a no-longer-referenced value. You could certainly do so
at top-level transaction end, but depending on what your app is doing,
that might not be enough.
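
The token idea can be sketched roughly as follows. This is a standalone C model under stated assumptions, not PostgreSQL code: the chained hash table, the 64-bit integer token, and the names registry_put, registry_get and registry_reset are all hypothetical, and a real extension would more likely build on the backend's own hash table and memory context facilities.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define REGISTRY_BUCKETS 64

/* Hypothetical stand-in for the extension's in-memory value. */
typedef struct pipeline_value {
    int counter;
} pipeline_value;

typedef struct registry_entry {
    uint64_t token;
    pipeline_value value;
    struct registry_entry *next;
} registry_entry;

static registry_entry *buckets[REGISTRY_BUCKETS];
static uint64_t next_token = 1;     /* 0 is reserved as "no token" */

/* Store a private copy of *v and return a token that refers to it. */
uint64_t registry_put(const pipeline_value *v)
{
    assert(v != NULL);
    registry_entry *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->token = next_token++;
    e->value = *v;
    e->next = buckets[e->token % REGISTRY_BUCKETS];
    buckets[e->token % REGISTRY_BUCKETS] = e;
    return e->token;
}

/* Look up a token; returns NULL for unknown or already reclaimed tokens. */
pipeline_value *registry_get(uint64_t token)
{
    registry_entry *e;
    for (e = buckets[token % REGISTRY_BUCKETS]; e != NULL; e = e->next)
        if (e->token == token)
            return &e->value;
    return NULL;
}

/* Reclaim every entry, e.g. at the end of a top-level transaction. */
void registry_reset(void)
{
    for (size_t i = 0; i < REGISTRY_BUCKETS; i++) {
        registry_entry *e = buckets[i];
        while (e != NULL) {
            registry_entry *next = e->next;
            free(e);
            e = next;
        }
        buckets[i] = NULL;
    }
}
```

In this model the value passed between functions is only the integer token, so copying it is harmless, and a pipeline step that mutates the entry behind the token never touches memory it was handed as input. Tokens are not reused after registry_reset, so a stale token simply fails the lookup instead of pointing at someone else's value.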

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Stephen Scheck
singularsyntax@gmail.com
In reply to: Tom Lane (#2)
Re: Pass-by-reference UDTs and volatility

Hmm, that might work - so allocate the values in a transaction-scoped
memory context? But how would the hash table keys themselves be deleted?
Is there some callback API to hook transaction completion?

On Wed, Jun 12, 2013 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> [...]

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Scheck (#3)
Re: Pass-by-reference UDTs and volatility

Stephen Scheck <singularsyntax@gmail.com> writes:

> But how would the hash table keys themselves be deleted? Is there some
> callback API to hook transaction completion?

See RegisterXactCallback and RegisterSubXactCallback.
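
The shape of such a cleanup hook can be modelled in standalone C. The sketch below does not include the PostgreSQL headers and does not call RegisterXactCallback itself; the model_* names, the event codes and private_store are all hypothetical. It assumes only the general pattern of a callback that receives an event code plus the opaque pointer supplied at registration, and it ignores subtransactions entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes standing in for commit/abort notifications. */
typedef enum { MODEL_XACT_COMMIT, MODEL_XACT_ABORT } model_xact_event;

typedef void (*model_xact_callback)(model_xact_event event, void *arg);

#define MAX_CALLBACKS 8
static struct { model_xact_callback fn; void *arg; } callbacks[MAX_CALLBACKS];
static size_t n_callbacks;

/* Remember a callback and the argument to hand back to it. */
void model_register_xact_callback(model_xact_callback fn, void *arg)
{
    assert(fn != NULL && n_callbacks < MAX_CALLBACKS);
    callbacks[n_callbacks].fn = fn;
    callbacks[n_callbacks].arg = arg;
    n_callbacks++;
}

/* What the host would do when a top-level transaction ends. */
void model_end_transaction(model_xact_event event)
{
    for (size_t i = 0; i < n_callbacks; i++)
        callbacks[i].fn(event, callbacks[i].arg);
}

/* Extension side: a count of live private values, cleared at transaction end. */
typedef struct private_store {
    int live_values;
} private_store;

void store_cleanup(model_xact_event event, void *arg)
{
    private_store *store = arg;
    (void) event;               /* reclaim on both commit and abort */
    store->live_values = 0;     /* a real store would free its entries here */
}
```

The extension would register store_cleanup once, typically when the library is loaded, and the host then invokes it at the end of every transaction. Cleaning up on abort as well as commit matters here, because values created by a failed query would otherwise linger until the session ends.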

regards, tom lane
