Two different methods of sneaking non-immutable data into an index

Started by Merlin Moncureover 15 years ago10 messages
#1Merlin Moncure
mmoncure@gmail.com

While chatting with Haas off-list regarding how the new array/string
functions should work (see the thread in its glory here:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg148865.html)
the debate morphed into the relative pros and cons about the proposed
concat() being marked stable vs immutable. I did some checking into
how things work now, and found some surprising cases.

*) No dependency check on user definable casts:
postgres=# create function hackdate(date) returns text as $$ select
'casted!'::text; $$ language sql;
CREATE FUNCTION
postgres=# create cast (date as text) with function hackdate(date);
CREATE CAST
postgres=# select now()::date || 'abc'::text; -- you're right!
?column?
------------
casted!abc
postgres=# create table vtest(a date, b text);
CREATE TABLE
postgres=# create unique index vtest_idx on vtest((a || b));
CREATE INDEX
postgres=# insert into vtest values (now(), 'test');
INSERT 0 1
postgres=# insert into vtest values (now(), 'test'); -- should fail
ERROR: duplicate key value violates unique constraint "vtest_idx"
DETAIL: Key ((a || b))=(casted!test) already exists.
postgres=# drop cast (date as text);
DROP CAST
postgres=# insert into vtest values (now(), 'test');
INSERT 0 1
postgres=# select * from vtest;
a | b
------------+------
2010-08-01 | test
2010-08-01 | test
(2 rows)

*) textanycat is defined immutable and shouldn't be:
create table vtest(a date, b text);
create unique index vtest_idx on vtest((a::text || b)); -- fails on
immutable check
create unique index vtest_idx on vtest((a || b)); -- works??
insert into vtest values (now(), 'test');
set datestyle to 'SQL, DMY';
insert into vtest values (now(), 'test');
postgres=# select * from vtest;date
a | b
------------+------
31/07/2010 | test
31/07/2010 | test
(2 rows)
postgres=# select * from vtest where a|| b = now()::date || 'test';
a | b
------------+------
31/07/2010 | test
(1 row)

*) also, isn't it possible to change text cast influencing GUCs 'n'
times per statement considering any query can call a function and any
function can say, change datestyle? Shouldn't the related functions
be marked 'volatile', not stable?

merlin

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#1)
Re: Two different methods of sneaking non-immutable data into an index

Merlin Moncure <mmoncure@gmail.com> writes:

While chatting with Haas off-list regarding how the new array/string
functions should work (see the thread in its glory here:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg148865.html)
the debate morphed into the relative pros and cons about the proposed
concat() being marked stable vs immutable. I did some checking into
how things work now, and found some surprising cases.

Er ... "now" being defined as what? I can't replicate your results in
HEAD. In particular, textanycat isn't immutable anymore.

The DROP CAST case is a bit interesting. We don't record a dependency
on the cast as such, but on the underlying function --- if you'd tried
to drop the function you'd not have been allowed to. It is a bit
peculiar that dropping the cast causes the meaning of a::text to change,
but I'm not sure there's much we can do about that. In any case, it
seems like that's not nearly as much of a hazard as doing CREATE OR
REPLACE FUNCTION and changing the computation done by the function.
We could disallow that maybe, but that cure seems worse than the
disease.

regards, tom lane

#3Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#2)
Re: Two different methods of sneaking non-immutable data into an index

On Wed, Aug 4, 2010 at 7:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

While chatting with Haas off-list regarding how the new array/string
functions should work (see the thread in its glory here:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg148865.html)
the debate  morphed into the relative pros and cons about the proposed
concat() being marked stable vs immutable.  I did some checking into
how things work now, and found some surprising cases.

Er ... "now" being defined as what?  I can't replicate your results in
HEAD.  In particular, textanycat isn't immutable anymore.

ah, my mistake. I'm using a fairly old build of 9.0.

The DROP CAST case is a bit interesting.  We don't record a dependency
on the cast as such, but on the underlying function --- if you'd tried
to drop the function you'd not have been allowed to.  It is a bit
peculiar that dropping the cast causes the meaning of a::text to change,
but I'm not sure there's much we can do about that.  In any case, it
seems like that's not nearly as much of a hazard as doing CREATE OR
REPLACE FUNCTION and changing the computation done by the function.
We could disallow that maybe, but that cure seems worse than the
disease.

yep (the textanycat was much more interesting to me)

merlin

#4Robert Haas
robertmhaas@gmail.com
In reply to: Merlin Moncure (#1)
Re: Two different methods of sneaking non-immutable data into an index

On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

*) also, isn't it possible to change text cast influencing GUCs 'n'
times per statement considering any query can call a function and any
function can say, change datestyle?  Shouldn't the related functions
be marked 'volatile', not stable?

This is just evil. It seems to me that we might want to instead
prevent functions from changing things for their callers, or
postponing any such changes until the end of the statement, or, uh,
something. We can't afford to put ourselves in a situation of having
to make everything volatile; at least, not if "performance" is
anywhere in our top 50 goals.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Robert Haas (#4)
Re: Two different methods of sneaking non-immutable data into an index

On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

*) also, isn't it possible to change text cast influencing GUCs 'n'
times per statement considering any query can call a function and any
function can say, change datestyle?  Shouldn't the related functions
be marked 'volatile', not stable?

This is just evil.  It seems to me that we might want to instead
prevent functions from changing things for their callers, or
postponing any such changes until the end of the statement, or, uh,
something.  We can't afford to put ourselves in a situation of having
to make everything volatile; at least, not if "performance" is
anywhere in our top 50 goals.

yeah -- perhaps you shouldn't be allowed set things like datestyle in
functions then. I realize this is a corner (of the universe) case,
but I can't recall any other case of volatility being relaxed on
performance grounds... :-). Maybe a documentation warning would
suffice?

merlin

#6Chris Browne
cbbrowne@acm.org
In reply to: Merlin Moncure (#1)
Re: Two different methods of sneaking non-immutable data into an index

mmoncure@gmail.com (Merlin Moncure) writes:

On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

*) also, isn't it possible to change text cast influencing GUCs 'n'
times per statement considering any query can call a function and any
function can say, change datestyle?  Shouldn't the related functions
be marked 'volatile', not stable?

This is just evil.  It seems to me that we might want to instead
prevent functions from changing things for their callers, or
postponing any such changes until the end of the statement, or, uh,
something.  We can't afford to put ourselves in a situation of having
to make everything volatile; at least, not if "performance" is
anywhere in our top 50 goals.

yeah -- perhaps you shouldn't be allowed set things like datestyle in
functions then. I realize this is a corner (of the universe) case,
but I can't recall any other case of volatility being relaxed on
performance grounds... :-). Maybe a documentation warning would
suffice?

That would cause grief for Slony-I, methinks, and probably other things
that behave somewhat similar.

The "logtrigger()" function coerces datestyle to ISO, so that when dates
get stored, they are stored in a canonical form, irrespective of an
individual connection's decisions on datestyle, so we don't have to
include datestyle information as part of the replicated data.
--
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/postgresql.html
Chaotic Evil means never having to say you're sorry.

#7Merlin Moncure
mmoncure@gmail.com
In reply to: Chris Browne (#6)
Re: Two different methods of sneaking non-immutable data into an index

On Thu, Aug 5, 2010 at 12:59 PM, Chris Browne <cbbrowne@acm.org> wrote:

mmoncure@gmail.com (Merlin Moncure) writes:

On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

*) also, isn't it possible to change text cast influencing GUCs 'n'
times per statement considering any query can call a function and any
function can say, change datestyle?  Shouldn't the related functions
be marked 'volatile', not stable?

This is just evil.  It seems to me that we might want to instead
prevent functions from changing things for their callers, or
postponing any such changes until the end of the statement, or, uh,
something.  We can't afford to put ourselves in a situation of having
to make everything volatile; at least, not if "performance" is
anywhere in our top 50 goals.

yeah -- perhaps you shouldn't be allowed set things like datestyle in
functions then.  I realize this is a corner (of the universe) case,
but I can't recall any other case of volatility being relaxed on
performance grounds... :-).  Maybe a documentation warning would
suffice?

That would cause grief for Slony-I, methinks, and probably other things
that behave somewhat similar.

The "logtrigger()" function coerces datestyle to ISO, so that when dates
get stored, they are stored in a canonical form, irrespective of an
individual connection's decisions on datestyle, so we don't have to
include datestyle information as part of the replicated data.

hm -- interesting -- couldn't that cause exactly the sort of situation
though where stability of statement is violated?

merlin

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Browne (#6)
Re: Two different methods of sneaking non-immutable data into an index

Chris Browne <cbbrowne@acm.org> writes:

mmoncure@gmail.com (Merlin Moncure) writes:

yeah -- perhaps you shouldn't be allowed set things like datestyle in
functions then.

That would cause grief for Slony-I, methinks, and probably other things
that behave somewhat similar.

Yeah, it's not really practical (or useful IMO) to try to lock this down
completely.

regards, tom lane

#9Chris Browne
cbbrowne@acm.org
In reply to: Merlin Moncure (#1)
Re: Two different methods of sneaking non-immutable data into an index

mmoncure@gmail.com (Merlin Moncure) writes:

On Thu, Aug 5, 2010 at 12:59 PM, Chris Browne <cbbrowne@acm.org> wrote:

mmoncure@gmail.com (Merlin Moncure) writes:

On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

*) also, isn't it possible to change text cast influencing GUCs 'n'
times per statement considering any query can call a function and any
function can say, change datestyle?  Shouldn't the related functions
be marked 'volatile', not stable?

This is just evil.  It seems to me that we might want to instead
prevent functions from changing things for their callers, or
postponing any such changes until the end of the statement, or, uh,
something.  We can't afford to put ourselves in a situation of having
to make everything volatile; at least, not if "performance" is
anywhere in our top 50 goals.

yeah -- perhaps you shouldn't be allowed set things like datestyle in
functions then.  I realize this is a corner (of the universe) case,
but I can't recall any other case of volatility being relaxed on
performance grounds... :-).  Maybe a documentation warning would
suffice?

That would cause grief for Slony-I, methinks, and probably other things
that behave somewhat similar.

The "logtrigger()" function coerces datestyle to ISO, so that when dates
get stored, they are stored in a canonical form, irrespective of an
individual connection's decisions on datestyle, so we don't have to
include datestyle information as part of the replicated data.

hm -- interesting -- couldn't that cause exactly the sort of situation
though where stability of statement is violated?

It shouldn't...

The data gets stored physically, on disk, in a canonical form.

Why should it be "unstable" to capture data in a canonical form, when
that's what gets stored on disk?
--
(format nil "~S@~S" "cbbrowne" "gmail.com")
The statistics on sanity are that one out of every four Americans is
suffering from some form of mental illness. Think of your three best
friends. If they're okay, then it's you. -- Rita Mae Brown

#10Robert Haas
robertmhaas@gmail.com
In reply to: Chris Browne (#6)
Re: Two different methods of sneaking non-immutable data into an index

On Thu, Aug 5, 2010 at 12:59 PM, Chris Browne <cbbrowne@acm.org> wrote:

mmoncure@gmail.com (Merlin Moncure) writes:

On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

*) also, isn't it possible to change text cast influencing GUCs 'n'
times per statement considering any query can call a function and any
function can say, change datestyle?  Shouldn't the related functions
be marked 'volatile', not stable?

This is just evil.  It seems to me that we might want to instead
prevent functions from changing things for their callers, or
postponing any such changes until the end of the statement, or, uh,
something.  We can't afford to put ourselves in a situation of having
to make everything volatile; at least, not if "performance" is
anywhere in our top 50 goals.

yeah -- perhaps you shouldn't be allowed set things like datestyle in
functions then.  I realize this is a corner (of the universe) case,
but I can't recall any other case of volatility being relaxed on
performance grounds... :-).  Maybe a documentation warning would
suffice?

That would cause grief for Slony-I, methinks, and probably other things
that behave somewhat similar.

The "logtrigger()" function coerces datestyle to ISO, so that when dates
get stored, they are stored in a canonical form, irrespective of an
individual connection's decisions on datestyle, so we don't have to
include datestyle information as part of the replicated data.

I think functions should be allowed to change GUCs internally, but
maybe not for the context from which they were called.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company