2 questions about volatile attribute of pg_proc.

Started by Andy Fanover 4 years ago13 messages
#1Andy Fan
zhihui.fan1213@gmail.com

Hi:

We know volatile is very harmful for optimizers and it is the default
value (and safest value) if the user doesn't provide that. Asking user
to set the value is not a good experience, is it possible to auto-generate
the value for it rather than use the volatile directly for user defined
function. I
think it should be possible, we just need to scan the PlpgSQL_stmt to see
if there
is a volatile function?

The second question "It is v for “volatile” functions, whose results might
change at any time.
(Use v also for functions with side-effects, so that calls to them cannot
get optimized away.)"
I think they are different semantics. One of the results is volatile
functions can't be removed
by remove_unused_subquery_output even if it doesn't have side effects. for
example:
select b from (select an_expensive_random(), b from t); Is it by design
on purpose?

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#1)
Re: 2 questions about volatile attribute of pg_proc.

ne 18. 4. 2021 v 17:06 odesílatel Andy Fan <zhihui.fan1213@gmail.com>
napsal:

Hi:

We know volatile is very harmful for optimizers and it is the default
value (and safest value) if the user doesn't provide that. Asking user
to set the value is not a good experience, is it possible to auto-generate
the value for it rather than use the volatile directly for user defined
function. I
think it should be possible, we just need to scan the PlpgSQL_stmt to see
if there
is a volatile function?

plpgsql_check does this check - the performance check check if function can
be marked as stable

https://github.com/okbob/plpgsql_check

I don't think so this can be done automatically - plpgsql does not check
objects inside in registration time. You can use objects and functions that
don't exist in CREATE FUNCTION time. And you need to know this info before
optimization time. So if we implement this check automatically, then
planning time can be increased a lot.

Regards

Pavel

Show quoted text

The second question "It is v for “volatile” functions, whose results might
change at any time.
(Use v also for functions with side-effects, so that calls to them cannot
get optimized away.)"
I think they are different semantics. One of the results is volatile
functions can't be removed
by remove_unused_subquery_output even if it doesn't have side effects. for
example:
select b from (select an_expensive_random(), b from t); Is it by design
on purpose?

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Fan (#1)
Re: 2 questions about volatile attribute of pg_proc.

Andy Fan <zhihui.fan1213@gmail.com> writes:

We know volatile is very harmful for optimizers and it is the default
value (and safest value) if the user doesn't provide that. Asking user
to set the value is not a good experience, is it possible to auto-generate
the value for it rather than use the volatile directly for user defined
function. I
think it should be possible, we just need to scan the PlpgSQL_stmt to see
if there
is a volatile function?

Are you familiar with the halting problem? I don't see any meaningful
difference here.

regards, tom lane

#4Isaac Morland
isaac.morland@gmail.com
In reply to: Tom Lane (#3)
Re: 2 questions about volatile attribute of pg_proc.

On Sun, 18 Apr 2021 at 11:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Fan <zhihui.fan1213@gmail.com> writes:

We know volatile is very harmful for optimizers and it is the default
value (and safest value) if the user doesn't provide that. Asking user
to set the value is not a good experience, is it possible to

auto-generate

the value for it rather than use the volatile directly for user defined
function. I
think it should be possible, we just need to scan the PlpgSQL_stmt to see
if there
is a volatile function?

Are you familiar with the halting problem? I don't see any meaningful
difference here.

I think what is being suggested is akin to type checking, not solving the
halting problem. Parse the function text, identify all functions it might
call (without solving the equivalent of the halting problem to see if it
actually does or could), and apply the most volatile value of called
functions to the calling function.

That being said, there are significant difficulties, including but almost
certainly not limited to:

- what happens if one modifies a called function after creating the calling
function?
- EXECUTE
- a PL/PGSQL function's meaning depends on the search path in effect when
it is called, unless it has a SET search_path clause or it fully qualifies
all object references, so it isn't actually possible in general to
determine what a function calls at definition time

If the Haskell compiler is possible then what is being requested here is
conceptually possible even if there are major issues with actually doing it
in the Postgres context. The halting problem is not the problem here.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Isaac Morland (#4)
Re: 2 questions about volatile attribute of pg_proc.

Isaac Morland <isaac.morland@gmail.com> writes:

On Sun, 18 Apr 2021 at 11:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Are you familiar with the halting problem? I don't see any meaningful
difference here.

I think what is being suggested is akin to type checking, not solving the
halting problem.

Yeah, on further thought we'd be satisfied with a conservative
approximation, so that removes the theoretical-impossibility objection.
Still, there are a lot of remaining problems, as you note.

regards, tom lane

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#5)
Re: 2 questions about volatile attribute of pg_proc.

On Sun, Apr 18, 2021 at 9:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Isaac Morland <isaac.morland@gmail.com> writes:

On Sun, 18 Apr 2021 at 11:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Are you familiar with the halting problem? I don't see any meaningful
difference here.

I think what is being suggested is akin to type checking, not solving the
halting problem.

Yeah, on further thought we'd be satisfied with a conservative
approximation, so that removes the theoretical-impossibility objection.
Still, there are a lot of remaining problems, as you note.

Yeah, the type checking approach seems blocked by the "black box" nature of
functions. A possibly more promising approach is for the top-level call to
declare its expectations (which are set by the user) and during execution
if that expectation is violated directly, or is reported as violated deeper
in the call stack, the execution of the function fails with some kind of
invalid state error. However, as with other suggestions of this nature,
the fundamental blocker here is that to be particularly useful this kind of
validation needs to happen by default (as opposed to opt-in) which risks
breaking existing code. And so I foresee this request falling into the
same category as those others - an interesting idea that could probably be
made to work but by itself isn't worthwhile enough to go and introduce a
fundamental change to the amount of "parental oversight" PostgreSQL tries
to provide.

David J.

#7Andy Fan
zhihui.fan1213@gmail.com
In reply to: Isaac Morland (#4)
Re: 2 questions about volatile attribute of pg_proc.

- a PL/PGSQL function's meaning depends on the search path in effect when

it is called, unless it has a SET search_path clause or it fully qualifies
all object references, so it isn't actually possible in general to
determine what a function calls at definition time

I'd think this one as a blocker issue at the beginning since I have to
insist on
any new features should not cause semantic changes for existing ones. Later
I
found the new definition. As for this feature request, I think we can
define the
features like this:

1. We define a new attribute named VOLATILE_AUTO; The semantic is PG will
auto
detect the volatile info based on current search_path / existing
function. If any embedded function can't be found, we can raise an error
if
VOLATILE_AUTO is used. If people change the volatile attribute later, we
can:
a). do nothing. This can be the documented feature. or. b). Maintain the
dependency tree between functions and if anyone is changed, other
functions
should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when
people
requires it.

Then what we can get from this? Thinking a user is migrating lots of UDF
from
other databases. Asking them to check/set each function's attribute might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#8Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#7)
Re: 2 questions about volatile attribute of pg_proc.

I'm listening to any obvious reason to reject it.

Any obvious reason to reject it because of it would be a lose battle for

sure,
so I would not waste time on it. Or vote up if you think it is possible and
useful.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#7)
Re: 2 questions about volatile attribute of pg_proc.

út 20. 4. 2021 v 4:47 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

- a PL/PGSQL function's meaning depends on the search path in effect

when it is called, unless it has a SET search_path clause or it fully
qualifies all object references, so it isn't actually possible in general
to determine what a function calls at definition time

I'd think this one as a blocker issue at the beginning since I have to
insist on
any new features should not cause semantic changes for existing ones.
Later I
found the new definition. As for this feature request, I think we can
define the
features like this:

1. We define a new attribute named VOLATILE_AUTO; The semantic is PG will
auto
detect the volatile info based on current search_path / existing
function. If any embedded function can't be found, we can raise an
error if
VOLATILE_AUTO is used. If people change the volatile attribute later,
we can:
a). do nothing. This can be the documented feature. or. b). Maintain the
dependency tree between functions and if anyone is changed, other
functions
should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when
people
requires it.

Then what we can get from this? Thinking a user is migrating lots of UDF
from
other databases. Asking them to check/set each function's attribute might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

a) This analyses can be very slow - PLpgSQL does lazy planning - query
plans are planned only when are required - and this feature requires
complete planning current function and all nested VOLATILE_AUTO functions -
so start of function can be significantly slower

b) When you migrate from Oracle,then you can use the STABLE flag, and it
will be mostly correct.

Regards

Pavel

Show quoted text

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#10Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#9)
Re: 2 questions about volatile attribute of pg_proc.

On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

út 20. 4. 2021 v 4:47 odesílatel Andy Fan <zhihui.fan1213@gmail.com>
napsal:

- a PL/PGSQL function's meaning depends on the search path in effect

when it is called, unless it has a SET search_path clause or it fully
qualifies all object references, so it isn't actually possible in general
to determine what a function calls at definition time

I'd think this one as a blocker issue at the beginning since I have to
insist on
any new features should not cause semantic changes for existing ones.
Later I
found the new definition. As for this feature request, I think we can
define the
features like this:

1. We define a new attribute named VOLATILE_AUTO; The semantic is PG
will auto
detect the volatile info based on current search_path / existing
function. If any embedded function can't be found, we can raise an
error if
VOLATILE_AUTO is used. If people change the volatile attribute later,
we can:
a). do nothing. This can be the documented feature. or. b). Maintain
the
dependency tree between functions and if anyone is changed, other
functions
should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when
people
requires it.

Then what we can get from this? Thinking a user is migrating lots of UDF
from
other databases. Asking them to check/set each function's attribute might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

a) This analyses can be very slow - PLpgSQL does lazy planning - query
plans are planned only when are required - and this feature requires
complete planning current function and all nested VOLATILE_AUTO functions -
so start of function can be significantly slower

Actually I am thinking we can do this when we compile the function, which
means that would
happen on the "CREATE FUNCTION " stage. this would need some hacks for
sure. Does
this remove your concern?

b) When you migrate from Oracle,then you can use the STABLE flag, and it
will be mostly correct.

This was suggested in our team as well, but I don't think it is very
strict. For example:
SELECT materialize_bills_for(userId) from users; Any more proof to say
"STABLE" flag
is acceptable?

--

Best Regards
Andy Fan (https://www.aliyun.com/)

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#11Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#10)
Re: 2 questions about volatile attribute of pg_proc.

út 20. 4. 2021 v 5:16 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

út 20. 4. 2021 v 4:47 odesílatel Andy Fan <zhihui.fan1213@gmail.com>
napsal:

- a PL/PGSQL function's meaning depends on the search path in effect

when it is called, unless it has a SET search_path clause or it fully
qualifies all object references, so it isn't actually possible in general
to determine what a function calls at definition time

I'd think this one as a blocker issue at the beginning since I have to
insist on
any new features should not cause semantic changes for existing ones.
Later I
found the new definition. As for this feature request, I think we can
define the
features like this:

1. We define a new attribute named VOLATILE_AUTO; The semantic is PG
will auto
detect the volatile info based on current search_path / existing
function. If any embedded function can't be found, we can raise an
error if
VOLATILE_AUTO is used. If people change the volatile attribute later,
we can:
a). do nothing. This can be the documented feature. or. b). Maintain
the
dependency tree between functions and if anyone is changed, other
functions
should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when
people
requires it.

Then what we can get from this? Thinking a user is migrating lots of
UDF from
other databases. Asking them to check/set each function's attribute
might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

a) This analyses can be very slow - PLpgSQL does lazy planning - query
plans are planned only when are required - and this feature requires
complete planning current function and all nested VOLATILE_AUTO functions -
so start of function can be significantly slower

Actually I am thinking we can do this when we compile the function, which
means that would
happen on the "CREATE FUNCTION " stage. this would need some hacks for
sure. Does
this remove your concern?

you cannot do it - with this you introduce strong dependency on nested
objects - and that means a lot of problems - necessity of rechecks when any
nested object is changed. There will be new problems with dependency, when
you create functions, and until we have global temp tables, then it is
blocker for usage of temporary tables. The current behavior is not perfect,
but in this direction is very practical, and I would not change it. Can be
nice if some functionality of plpgsql_check can be in core, because I think
so it is necessary for development, but the structure and integration of
SQL in PLpgSQL is very good (and very practical).

b) When you migrate from Oracle,then you can use the STABLE flag, and it
will be mostly correct.

This was suggested in our team as well, but I don't think it is very
strict. For example:
SELECT materialize_bills_for(userId) from users; Any more proof to say
"STABLE" flag
is acceptable?

Oracle doesn't allow write operations in functions. Or didn't allow it - I
am not sure what is possible now. So when you migrate data from Oracle, and
if the function is not marked as DETERMINISTIC, you can safely mark it as
STABLE. Ora2pg does it. Elsewhere - it works 99% well. In special cases,
there is some black magic - with fresh snapshots, and with using autonomous
transactions, and these cases should be solved manually. Sometimes is good
enough just removing autonomous transactions, sometimes the complete
rewrite is necessary - or redesign functionality.

Show quoted text

--

Best Regards
Andy Fan (https://www.aliyun.com/)

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#12Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#11)
Re: 2 questions about volatile attribute of pg_proc.

On Tue, Apr 20, 2021 at 11:32 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

út 20. 4. 2021 v 5:16 odesílatel Andy Fan <zhihui.fan1213@gmail.com>
napsal:

On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

út 20. 4. 2021 v 4:47 odesílatel Andy Fan <zhihui.fan1213@gmail.com>
napsal:

- a PL/PGSQL function's meaning depends on the search path in effect

when it is called, unless it has a SET search_path clause or it fully
qualifies all object references, so it isn't actually possible in general
to determine what a function calls at definition time

I'd think this one as a blocker issue at the beginning since I have to
insist on
any new features should not cause semantic changes for existing ones.
Later I
found the new definition. As for this feature request, I think we can
define the
features like this:

1. We define a new attribute named VOLATILE_AUTO; The semantic is PG
will auto
detect the volatile info based on current search_path / existing
function. If any embedded function can't be found, we can raise an
error if
VOLATILE_AUTO is used. If people change the volatile attribute
later, we can:
a). do nothing. This can be the documented feature. or. b). Maintain
the
dependency tree between functions and if anyone is changed, other
functions
should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when
people
requires it.

Then what we can get from this? Thinking a user is migrating lots of
UDF from
other databases. Asking them to check/set each function's attribute
might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would
be
pretty productive.

I'm listening to any obvious reason to reject it.

a) This analyses can be very slow - PLpgSQL does lazy planning - query
plans are planned only when are required - and this feature requires
complete planning current function and all nested VOLATILE_AUTO functions -
so start of function can be significantly slower

Actually I am thinking we can do this when we compile the function,
which means that would
happen on the "CREATE FUNCTION " stage. this would need some hacks for
sure. Does
this remove your concern?

you cannot do it - with this you introduce strong dependency on nested
objects

What does the plpgsql_check do in this area? I checked the README[1]https://github.com/okbob/plpgsql_check/blob/master/README.md#features, but
can't find
anything about it.

until we have global temp tables, then it is blocker for usage of
temporary tables.

Can you explain more about this?

Can be nice if some functionality of plpgsql_check can be in core,
because I think so it is necessary for development, but the structure and
integration of SQL in PLpgSQL is very good (and very practical).

I'm interested in plpgsql_check. However I am still confused why we can do
it in this way, but
can't do it in the VOLATILE_AUTO way.

b) When you migrate from Oracle,then you can use the STABLE flag, and it
will be mostly correct.

This was suggested in our team as well, but I don't think it is very
strict. For example:
SELECT materialize_bills_for(userId) from users; Any more proof to say
"STABLE" flag
is acceptable?

Oracle doesn't allow write operations in functions. Or didn't allow it - I
am not sure what is possible now. So when you migrate data from Oracle, and
if the function is not marked as DETERMINISTIC, you can safely mark it as
STABLE.

You are correct. Good to know the above points.

Elsewhere - it works 99% well. In special cases, there is some black
magic - with fresh snapshots, and with using autonomous transactions, and
these cases should be solved manually. Sometimes is good enough just
removing autonomous transactions, sometimes the complete rewrite is
necessary - or redesign functionality.

is the 1% == "special cases" ? Do you mind sharing more information about
these cases,
either document or code?

[1]: https://github.com/okbob/plpgsql_check/blob/master/README.md#features

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#13Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#12)
Re: 2 questions about volatile attribute of pg_proc.

you cannot do it - with this you introduce strong dependency on nested
objects

What does the plpgsql_check do in this area? I checked the README[1], but
can't find
anything about it.

When you run plpgsql_check with performance warning (disabled by default),
then it does check if all called functions are on the same or lower level
than checked functions have. So when all called operations are stable
(read only), then the function can be marked as stable - and if the
function is marked as volatile, then plpgsql_check raises an warning.

until we have global temp tables, then it is blocker for usage of
temporary tables.

All plpgsql expressions are SQL expressions - and anybody can call a
function against a temporary table. But local temporary tables don't exist
in typical CREATE FUNCTION time (registration time). Typically doesn't
exist in plpgsql compile time too. Usually temporary tables are created
inside executed plpgsql functions. So you cannot do any semantical (deeper)
check in registration, or compile time. Just because one kind of object
(temporary tables) doesn't exist. This is a difficult issue for
plpgsql_check too.

Can you explain more about this?

Can be nice if some functionality of plpgsql_check can be in core,
because I think so it is necessary for development, but the structure and
integration of SQL in PLpgSQL is very good (and very practical).

I'm interested in plpgsql_check. However I am still confused why we can
do it in this way, but
can't do it in the VOLATILE_AUTO way.

You can do it. But you solve one issue, and introduce new kinds of more
terrible issues (related to dependencies between database's objects). The
design of plpgsql is pretty different from the design of Oracle's PL/SQL.
So proposed change means total conceptual change, and you need to write a
lot of new code that will provide necessary infrastructure. I am not sure
if the benefit is higher than the cost. It can be usable, if plpgsql can be
really compiled to some machine code - but it means ten thousands codes
without significant benefits - the bottleneck inside stored procedures is
SQL, and the compilation doesn't help with it.

b) When you migrate from Oracle,then you can use the STABLE flag, and
it will be mostly correct.

This was suggested in our team as well, but I don't think it is very
strict. For example:
SELECT materialize_bills_for(userId) from users; Any more proof to say
"STABLE" flag
is acceptable?

Oracle doesn't allow write operations in functions. Or didn't allow it -
I am not sure what is possible now. So when you migrate data from Oracle,
and if the function is not marked as DETERMINISTIC, you can safely mark it
as STABLE.

You are correct. Good to know the above points.

And DETERMINISTIC functions are IMMUTABLE on Postgres's side

Elsewhere - it works 99% well. In special cases, there is some black
magic - with fresh snapshots, and with using autonomous transactions, and
these cases should be solved manually. Sometimes is good enough just
removing autonomous transactions, sometimes the complete rewrite is
necessary - or redesign functionality.

is the 1% == "special cases" ? Do you mind sharing more information about
these cases,
either document or code?

Unfortunately not. I have not well structured notes from work on ports from
Oracle to Postgres. And these 1% cases are very very variable. People are
very creative. But usually this code is almost very dirty, and not
critical. In Postgres we can use LISTEN, NOTIFY, or possibility to set
app_name or we can use RAISE NOTICE.

Show quoted text

[1] https://github.com/okbob/plpgsql_check/blob/master/README.md#features

--
Best Regards
Andy Fan (https://www.aliyun.com/)