function side effects

Started by Tatsuo Ishii, over 16 years ago, 32 messages, hackers

t-ishii@sra.co.jp

over 16 years ago

Hi,

I'm wondering if we could detect whether a function has a side effect,
i.e. does a write to the database. This is necessary for pgpool to decide
whether a query should be sent to all of the databases or not. If a query
includes functions which write to the database, pgpool should send the
query to all of the databases; otherwise the contents of the databases
become inconsistent.

Currently we have three properties of functions: IMMUTABLE, STABLE and
VOLATILE. According to the docs, IMMUTABLE or STABLE functions do not write
to the database. VOLATILE functions *may* write to the database. Maybe I
could assume that VOLATILE functions always write, but the problem is that
VOLATILE functions such as random() and timeofday() apparently do not
write, and sending queries that include such functions to all databases is
overkill.

Can we divide the VOLATILE property into two categories, say, VOLATILE
without write and VOLATILE with write?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

ITAGAKI Takahiro

itagaki.takahiro@oss.ntt.co.jp

over 16 years ago

In reply to: Tatsuo Ishii (#1)

Re: function side effects

"Tatsuo Ishii" <ishii@postgresql.org> wrote:

VOLATILE functions such as random() and timeofday() apparently do not
write and sending those queries that include such functions is
overkill.

Can we divide the VOLATILE property into two categories, say, VOLATILE
without write and VOLATILE with write?

I think it's possible. We might borrow terms and semantics from
functional programming languages for functions with side effects.
How do they handle the issue?

BTW, random() *writes* the random seed, though no one will mind it.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Tatsuo Ishii (#1)

Re: function side effects

Tatsuo Ishii <ishii@postgresql.org> writes:

I'm wondering if we could detect whether a function has a side effect,
i.e. does a write to the database.

Currently we have three properties of functions: IMMUTABLE, STABLE and
VOLATILE. According to docs IMMUTABLE or STABLE functions do not write
to database.

Those classifications are meant as planner directives; they are NOT
meant to be bulletproof. Hanging database integrity guarantees on
whether a "non volatile" function changes anything is entirely unsafe.
To give just one illustration of the problems, a nonvolatile function
is allowed to call a volatile one.

regards, tom lane

Alvaro Herrera

alvherre@2ndquadrant.com

over 16 years ago

In reply to: Tatsuo Ishii (#1)

Re: function side effects

Tatsuo Ishii wrote:

Hi,

I'm wondering if we could detect whether a function has a side effect,
i.e. does a write to the database. This is necessary for pgpool to decide
whether a query should be sent to all of the databases or not. If a query
includes functions which write to the database, pgpool should send the
query to all of the databases; otherwise the contents of the databases
become inconsistent.

I was talking about this to someone in Cuba and one conclusion we
reached was that this was a fairly difficult task -- consider that
someone may choose to define an innocent-looking operator using a
volatile function. If you only examine things that look like functions
in the query you will miss those. The only way to figure out whether a
query has a write effect is to ask the server about the whole query.
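
A minimal sketch of the blind spot described above, assuming a hypothetical proxy that spots function calls by scanning the query text. The function name `log_and_compare` and the operator `===` are invented for illustration; neither comes from PostgreSQL or pgpool.

```python
import re

# Hypothetical set of functions the proxy believes perform writes.
WRITING_FUNCTIONS = {"log_and_compare"}

# Matches an identifier immediately followed by "(", i.e. something that
# looks like a function call in the query text.
CALL_PATTERN = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def looks_like_write(query: str) -> bool:
    """Return True if the query text visibly calls a known writing function."""
    called = {name.lower() for name in CALL_PATTERN.findall(query)}
    return bool(called & WRITING_FUNCTIONS)


# Direct call: the textual scan catches it.
direct = "SELECT log_and_compare(a, b) FROM t"

# The same function hidden behind a user-defined operator (assumed to have
# been created with CREATE OPERATOR === (PROCEDURE = log_and_compare, ...)).
via_operator = "SELECT a === b FROM t"

print(looks_like_write(direct))        # True
print(looks_like_write(via_operator))  # False: the write goes unnoticed
```

Nothing in the text of the second query names the function, so any check that works only on the query string has nothing to match against.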

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Tatsuo Ishii

t-ishii@sra.co.jp

over 16 years ago

In reply to: Alvaro Herrera (#4)

Re: function side effects

I'm wondering if we could detect whether a function has a side effect,
i.e. does a write to the database. This is necessary for pgpool to decide
whether a query should be sent to all of the databases or not. If a query
includes functions which write to the database, pgpool should send the
query to all of the databases; otherwise the contents of the databases
become inconsistent.

I was talking about this to someone in Cuba and one conclusion we
reached was that this was a fairly difficult task -- consider that
someone may choose to define an innocent-looking operator using a
volatile function. If you only examine things that look like functions
in the query you will miss those. The only way to figure out whether a
query has a write effect is to ask the server about the whole query.

In general you are right. However, in most database application
systems it is possible that all functions are properly designed and
implemented (at least that is the intent). In that world, PostgreSQL
functions are more or less just a part of the application. If they
trust their client-side applications, why can't they trust their
PostgreSQL custom functions as well?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 16 years ago

In reply to: Tom Lane (#3)

Re: function side effects

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Those classifications are meant as planner directives; they are
NOT meant to be bulletproof. Hanging database integrity
guarantees on whether a "non volatile" function changes anything
is entirely unsafe. To give just one illustration of the
problems, a nonvolatile function is allowed to call a volatile
one.

Could it work to store a flag in each process to indicate when it is
executing a non-volatile function, and throw an error on any attempt
to call a volatile function or modify the database?

-Kevin

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Kevin Grittner (#6)

Re: function side effects

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Those classifications are meant as planner directives; they are
NOT meant to be bulletproof. Hanging database integrity
guarantees on whether a "non volatile" function changes anything
is entirely unsafe. To give just one illustration of the
problems, a nonvolatile function is allowed to call a volatile
one.

Could it work to store a flag in each process to indicate when it is
executing a non-volatile function, and throw an error on any attempt
to call a volatile function or modify the database?

It's *not an error* for a nonvolatile function to call a volatile one.
At least it's never been in the past, and I'm sure you'd break some
applications if you made it so in the future.

regards, tom lane

Tatsuo Ishii

t-ishii@sra.co.jp

over 16 years ago

In reply to: Tatsuo Ishii (#5)

Re: function side effects

I was talking about this to someone in Cuba and one conclusion we
reached was that this was a fairly difficult task -- consider that
someone may choose to define an innocent-looking operator using a
volatile function. If you only examine things that look like functions
in the query you will miss those. The only way to figure out whether a
query has a write effect is to ask the server about the whole query.

In general you are right. However, in most database application
systems it is possible that all functions are properly designed and
implemented (at least that is the intent). In that world, PostgreSQL
functions are more or less just a part of the application. If they
trust their client-side applications, why can't they trust their
PostgreSQL custom functions as well?

Still, there could be "dishonest" functions, such as nonvolatile
functions that call volatile ones, within the PostgreSQL system itself
(I have not investigated, but it's possible). Vendor-provided
functions (for example, those embedded in closed-source packages) might
also fall into this category. It's probably enough for pgpool to keep a
"black list" of such functions. Maintaining such a list is a tedious task,
but I cannot think of a better way at this point.
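
A rough sketch of how such a black list could feed the routing decision. It assumes the proxy's parser has already extracted the function names from the query; the blacklist entries and the `must_replicate` helper are hypothetical and not taken from pgpool.

```python
# Hypothetical blacklist of functions known to write even though they are
# declared IMMUTABLE or STABLE. These names are invented for the sketch.
BLACKLIST = {"vendor_audit_lookup", "cached_rate"}


def must_replicate(called_functions, declared_volatility):
    """Decide whether a query must be sent to every database node.

    called_functions: function names the proxy's parser found in the query.
    declared_volatility: name -> 'immutable' | 'stable' | 'volatile', as
        declared on the server. Unknown names are treated as volatile,
        which is the conservative choice.
    """
    for name in called_functions:
        name = name.lower()
        if name in BLACKLIST:
            return True  # declared harmless, but known not to be
        if declared_volatility.get(name, "volatile") == "volatile":
            return True  # may write, so replicate to be safe
    return False


declared = {"lower": "immutable", "cached_rate": "stable", "nextval": "volatile"}

print(must_replicate(["lower"], declared))        # False
print(must_replicate(["cached_rate"], declared))  # True (blacklisted)
print(must_replicate(["nextval"], declared))      # True (volatile)
```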
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Jaime Casanova

jcasanov@systemguards.com.ec

over 16 years ago

In reply to: Tom Lane (#7)

Re: function side effects

On Tue, Feb 23, 2010 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's *not an error* for a nonvolatile function to call a volatile one.

it should be considered an error i think; someone may think there is a
use case for calling volatile functions
inside stable ones, but i can't see what that reason could be...

At least it's never been in the past, and I'm sure you'd break some
applications if you made it so in the future.

i'm sure of that too, but in this case it seems reasonable to do so

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#10

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 16 years ago

In reply to: Tom Lane (#7)

Re: function side effects

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

throw an error on any attempt to call a volatile function or
modify the database?

It's *not an error* for a nonvolatile function to call a volatile
one.

Right, we all know it currently doesn't throw an error, but I can't
think of anywhere I'd like to have someone do that in a database for
which I have any responsibility. Does anyone have a sane use case
for a non-volatile function to call a volatile one or to update the
database?

-Kevin

#11

Bruce Momjian

bruce@momjian.us

over 16 years ago

In reply to: Kevin Grittner (#10)

Re: function side effects

On Tue, Feb 23, 2010 at 4:52 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Right, we all know it currently doesn't throw an error, but I can't
think of anywhere I'd like to have someone do that in a database for
which I have any responsibility. Does anyone have a sane use case
for a non-volatile function to call a volatile one or to update the
database?

So consider for example a function which explicitly sets the timezone
and then uses timestamp without timezone functions (which are volatile
only because the GUC variable might change between calls).

Or somebody who uses the tsearch functions because they're planning to
not change their dictionaries.

Or builds a hash function by calling random after setting the seed to
a specific value -- this is actually a fairly popular strategy for
building good hash functions.

--
greg

#12

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 16 years ago

In reply to: Bruce Momjian (#11)

Re: function side effects

Greg Stark <gsstark@mit.edu> wrote:

Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Does anyone have a sane use case for a non-volatile function to
call a volatile one or to update the database?

So consider for example a function which explicitly sets the
timezone and then uses timestamp without timezone functions (which
are volatile only because the GUC variable might change between
calls).

OK, I can see where that would be sane, but it seems more fragile
than using timestamp with time zone. But, OK, something sane and
functional could break on that.

Or somebody who uses the tsearch functions because they're
planning to not change their dictionaries.

I didn't realize tsearch functions were volatile. Should they
really be so?

Or builds a hash function by calling random after setting the seed
to a specific value -- this is actually a fairly popular strategy
for building good hash functions.

I'd never seen that. I'm not sure I understand where that comes in
useful, but if you've seen it enough to call it "fairly popular" I
guess I have to accept it.

Thanks for the examples. They did make me consider a real-life type
of process which isn't currently implemented as a PostgreSQL
function, but conceivably could be -- randomizing a pool of jurors
to facilitate jury selection. My eyes are opened. :-)

-Kevin

#13

Bruce Momjian

bruce@momjian.us

over 16 years ago

In reply to: Kevin Grittner (#12)

Re: function side effects

On Tue, Feb 23, 2010 at 6:39 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Or somebody who uses the tsearch functions because they're
planning to not change their dictionaries.

I didn't realize tsearch functions were volatile. Should they
really be so?

Uhm, my mistake. They're stable. Ok, for that one I'll substitute a
function which uses pg_read_file knowing that the file in question
won't be changed. Perhaps it's a per-machine key or something like
that.

Or builds a hash function by calling random after setting the seed
to a specific value -- this is actually a fairly popular strategy
for building good hash functions.

I'd never seen that. I'm not sure I understand where that comes in
useful, but if you've seen it enough to call it "fairly popular" I
guess I have to accept it.

http://en.wikipedia.org/wiki/Universal_hashing

They have the useful property that it's hard for an attacker to
contrive data which has poor collision behaviour.
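
As an illustration only, here is a small sketch of one classic universal family, h(x) = ((a*x + b) mod p) mod m, with a and b drawn from a seeded generator. Python's `random.Random` stands in for setting the seed and then calling random(); the prime and the `make_hash` helper are choices made for this sketch, not anything from PostgreSQL.

```python
import random

P = 2**61 - 1  # a Mersenne prime, assumed larger than any key we hash


def make_hash(seed: int, buckets: int):
    """Build one member of the family h(x) = ((a*x + b) mod P) mod buckets.

    a and b come from a generator seeded with `seed`, so the same seed
    always yields the same hash function.
    """
    rng = random.Random(seed)
    a = rng.randrange(1, P)  # 1 <= a < P
    b = rng.randrange(0, P)  # 0 <= b < P

    def h(x: int) -> int:
        return ((a * x + b) % P) % buckets

    return h


h1 = make_hash(seed=42, buckets=16)
h2 = make_hash(seed=42, buckets=16)

# Same seed, same function: the result depends only on the input.
print(all(h1(x) == h2(x) for x in range(1000)))  # True
```

The generator is the volatile ingredient, yet once the seed is pinned the resulting function returns the same output for the same input every time.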

Thanks for the examples. They did make me consider a real-life type
of process which isn't currently implemented as a PostgreSQL
function, but conceivably could be -- randomizing a pool of jurors
to facilitate jury selection. My eyes are opened. :-)

I'm not actually sure I follow what you're picturing.

--
greg

#14

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Bruce Momjian (#13)

Re: function side effects

Greg Stark <gsstark@mit.edu> writes:

On Tue, Feb 23, 2010 at 6:39 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

I didn't realize tsearch functions were volatile. Should they
really be so?

Uhm, my mistake. They're stable.

IMMUTABLE/STABLE/VOLATILE is not really about side effects, it is about
how long the function value can be expected to hold still for.

There are quite a lot of cases of functions that are marked
conservatively as stable (or even volatile) but could be considered
immutable in particular queries, because the application developer is
prepared to assume that values such as GUCs won't change in his usage.
The traditional way to deal with that is to wrap them in an immutable
wrapper function. There's actually code in the planner to make that
work --- we have to suppress inlining to avoid exposing the not-immutable
guts, else the planner will not do what's wanted.

There may be some value in inventing a "has no side effects" marker, but
that should not be confused with IMMUTABLE/STABLE.

regards, tom lane

#15

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#14)

Re: function side effects

On Tue, Feb 23, 2010 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There may be some value in inventing a "has no side effects" marker, but
that should not be confused with IMMUTABLE/STABLE.

Yeah, that's what I was thinking, too....

...Robert

#16

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 16 years ago

In reply to: Bruce Momjian (#13)

Re: function side effects

Greg Stark <gsstark@mit.edu> wrote:

Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Thanks for the examples. They did make me consider a real-life
type of process which isn't currently implemented as a PostgreSQL
function, but conceivably could be -- randomizing a pool of
jurors to facilitate jury selection. My eyes are opened. :-)

I'm not actually sure I follow what you're picturing.

Well, to facilitate people's rights to a jury of their peers, we
obtain lists of people in each county based on having a driver's
license or state ID, being registered to vote, etc., then (after
eliminating duplicates and those who have served on juries in recent
years) we randomly select a subset, who get questionnaires, from
which (at a later date) we randomly pick people to summon for a
juror panel, from which (on each day they appear) we randomly select
people for particular juries.

Any flaw in the randomness of selection could constitute grounds for
an appeal of the outcome of a case, so we have to be careful about
process. (Randomness being defined as the properties that nobody
with an interest in the case can control or predict who will be
selected from one group into the next, and there is no bias on
anything related to demographics, like age or last name [which could
correlate with ethnicity]). Sounds like fun, eh?

-Kevin

#17

Jaime Casanova

jcasanov@systemguards.com.ec

over 16 years ago

In reply to: Tom Lane (#14)

Re: function side effects

On Tue, Feb 23, 2010 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There may be some value in inventing a "has no side effects" marker, but
that should not be confused with IMMUTABLE/STABLE.

a READONLY function?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#18

Simon Riggs

simon@2ndQuadrant.com

over 16 years ago

In reply to: Tom Lane (#3)

Re: function side effects

On Mon, 2010-02-22 at 23:49 -0500, Tom Lane wrote:

Tatsuo Ishii <ishii@postgresql.org> writes:

I'm wondering if we could detect whether a function has a side effect,
i.e. does a write to the database.

Currently we have three properties of functions: IMMUTABLE, STABLE and
VOLATILE. According to docs IMMUTABLE or STABLE functions do not write
to database.

Those classifications are meant as planner directives; they are NOT
meant to be bulletproof.

You make them sound like "hints". (I thought we frowned on those?)

That isn't true, they don't just change the optimal plan in the way the
enable_* parameters do. Immutable functions are reduced in ways that
would give the wrong answer if the function is actually volatile.
Referring to function properties as "planner directives" hides their
critical importance to the output of a query that calls such functions.
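
A toy model of that point, not PostgreSQL's planner: the hypothetical `run_query` below folds a function call into a constant when the function is marked immutable, and calls it once per row otherwise. `next_ticket` is an invented function that is really volatile.

```python
import itertools

_counter = itertools.count(1)


def next_ticket() -> int:
    """Actually volatile: returns a different value on every call."""
    return next(_counter)


def run_query(rows, func, marked_immutable: bool):
    """Toy executor for 'SELECT row, func() FROM rows'.

    If the function is marked immutable, fold the call into a constant
    once, before execution; otherwise call it for every row.
    """
    if marked_immutable:
        constant = func()  # evaluated once, at "plan time"
        return [(row, constant) for row in rows]
    return [(row, func()) for row in rows]


print(run_query(["a", "b", "c"], next_ticket, marked_immutable=False))
# [('a', 1), ('b', 2), ('c', 3)]
print(run_query(["a", "b", "c"], next_ticket, marked_immutable=True))
# [('a', 4), ('b', 4), ('c', 4)]
```

The mislabelled run is not merely slower or faster; it returns different rows, which is why the marking affects query output and not just plan choice.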

Hanging database integrity guarantees on
whether a "non volatile" function changes anything is entirely unsafe.
To give just one illustration of the problems, a nonvolatile function
is allowed to call a volatile one.

So wrongly marking a function as something other than volatile *is* a
data integrity issue. Why is that OK? ISTM that this should work the way
Tatsuo wants it to work. Immutability should be passed down through the
call stack to ensure we can't get this wrong.

If people have been advising clients to set things immutable when they
are not, that seems fairly questionable. We shouldn't avoid fixing an
integrity loophole simply to preserve a planner backdoor,
especially since other backdoors are specifically avoided.

--
Simon Riggs www.2ndQuadrant.com

#19

Simon Riggs

simon@2ndQuadrant.com

over 16 years ago

In reply to: Tatsuo Ishii (#1)

Re: function side effects

On Tue, 2010-02-23 at 12:51 +0900, Tatsuo Ishii wrote:

I'm wondering if we could detect whether a function has a side effect,
i.e. does a write to the database. This is necessary for pgpool to decide
whether a query should be sent to all of the databases or not. If a query
includes functions which write to the database, pgpool should send the
query to all of the databases; otherwise the contents of the databases
become inconsistent.

Currently we have three properties of functions: IMMUTABLE, STABLE and
VOLATILE. According to the docs, IMMUTABLE or STABLE functions do not write
to the database. VOLATILE functions *may* write to the database. Maybe I
could assume that VOLATILE functions always write, but the problem is that
VOLATILE functions such as random() and timeofday() apparently do not
write, and sending queries that include such functions to all databases is
overkill.

Can we divide the VOLATILE property into two categories, say, VOLATILE
without write and VOLATILE with write?

pgpool parses the query before deciding how to route it, yes?

Why not mark random() and timeofday() as stable in the pgpool catalog,
yet leave them as volatile on the database servers? It will "just work"
then.
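
One way to picture this suggestion, as a sketch under assumptions: the proxy keeps its own override table on top of the volatility the servers declare. The table layout and both helper functions are hypothetical; only the single-letter codes follow pg_proc.provolatile.

```python
# Volatility as declared on the database servers, using the single-letter
# codes from pg_proc.provolatile: 'i' immutable, 's' stable, 'v' volatile.
SERVER_VOLATILITY = {"random": "v", "timeofday": "v", "nextval": "v", "lower": "i"}

# Hypothetical proxy-side overrides: volatile on the servers, but treated
# as stable by the proxy because they are believed not to write.
PROXY_OVERRIDES = {"random": "s", "timeofday": "s"}


def effective_volatility(name):
    """Return the proxy's view of a function; unknown names count as volatile."""
    name = name.lower()
    if name in PROXY_OVERRIDES:
        return PROXY_OVERRIDES[name]
    return SERVER_VOLATILITY.get(name, "v")


def send_to_all_nodes(called_functions):
    """Replicate the query only if some called function is still volatile."""
    return any(effective_volatility(f) == "v" for f in called_functions)


print(send_to_all_nodes(["random"]))             # False
print(send_to_all_nodes(["random", "nextval"]))  # True
```

The servers' own catalogs stay untouched, so the planner keeps treating random() and timeofday() as volatile.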

--
Simon Riggs www.2ndQuadrant.com

#20

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Simon Riggs (#18)

Re: function side effects

Simon Riggs <simon@2ndQuadrant.com> writes:

So wrongly marking a function as something other than volatile *is* a
data integrity issue. Why is that OK? ISTM that this should work the way
Tatsuo wants it to work.

Please read the rest of the thread.

regards, tom lane

#21

Tatsuo Ishii

t-ishii@sra.co.jp

over 16 years ago

In reply to: Simon Riggs (#19)

#22

Tatsuo Ishii

t-ishii@sra.co.jp

over 16 years ago

In reply to: Tom Lane (#14)