[bug?] Missed parallel safety checks, and wrong parallel safety
Hello,
I think we've found a few existing problems with handling the parallel safety of functions while doing an experiment. Could I hear your opinions on what we should do? I'd be willing to create and submit a patch to fix them.
The experiment is to add a parallel safety check in FunctionCallInvoke() and run the regression test with force_parallel_mode=regress. The added check errors out with ereport(ERROR) when the about-to-be-called function is parallel unsafe and the process is currently in parallel mode. 6 test cases failed because the following parallel-unsafe functions were called:
dsnowball_init
balkifnull
int44out
text_w_default_out
widget_out
The first function is created in src/backend/snowball/snowball_create.sql for full text search. The remaining functions are created during the regression test run.
The relevant issues follow.
(1)
All the above functions are actually parallel safe looking at their implementations. It seems that their CREATE FUNCTION statements are just missing PARALLEL SAFE specifications, so I think I'll add them. dsnowball_lexize() may also be parallel safe.
(2)
I'm afraid the above phenomenon reveals that postgres overlooks parallel safety checks in some places. Specifically, we noticed the following:
* User-defined aggregate
CREATE AGGREGATE allows to specify parallel safety of the aggregate itself and the planner checks it, but the support function of the aggregate is not checked. OTOH, the document clearly says:
https://www.postgresql.org/docs/devel/xaggr.html
"Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL SAFE. The parallel-safety markings on its support functions are not consulted."
https://www.postgresql.org/docs/devel/sql-createaggregate.html
"An aggregate will not be considered for parallelization if it is marked PARALLEL UNSAFE (which is the default!) or PARALLEL RESTRICTED. Note that the parallel-safety markings of the aggregate's support functions are not consulted by the planner, only the marking of the aggregate itself."
Can we check the parallel safety of aggregate support functions during statement execution and error out? Is there any reason not to do so?
* User-defined data type
The input, output, send,receive, and other functions of a UDT are not checked for parallel safety. Is there any good reason to not check them other than the concern about performance?
* Functions for full text search
Should CREATE TEXT SEARCH TEMPLATE ensure that the functions are parallel safe? (Those functions could be changed to parallel unsafe later with ALTER FUNCTION, though.)
(3) Built-in UDFs are not checked for parallel safety
The functions defined in fmgr_builtins[], which are derived from pg_proc.dat, are not checked. Most of them are marked parallel safe, but some are paralel unsaferestricted.
Besides, changing their parallel safety with ALTER FUNCTION PARALLEL does not affect the selection of query plan. This is because fmgr_builtins[] does not have a member for parallel safety.
Should we add a member for parallel safety in fmgr_builtins[], and disallow ALTER FUNCTION to change the parallel safety of builtin UDFs?
Regards
Takayuki Tsunakawa
On Tue, Apr 20, 2021 at 2:23 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
(2)
I'm afraid the above phenomenon reveals that postgres overlooks parallel safety checks in some places. Specifically, we noticed the following:* User-defined aggregate
CREATE AGGREGATE allows to specify parallel safety of the aggregate itself and the planner checks it, but the support function of the aggregate is not checked. OTOH, the document clearly says:https://www.postgresql.org/docs/devel/xaggr.html
"Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL SAFE. The parallel-safety markings on its support functions are not consulted."
https://www.postgresql.org/docs/devel/sql-createaggregate.html
"An aggregate will not be considered for parallelization if it is marked PARALLEL UNSAFE (which is the default!) or PARALLEL RESTRICTED. Note that the parallel-safety markings of the aggregate's support functions are not consulted by the planner, only the marking of the aggregate itself."
IMO, the reason for not checking the parallel safety of the support
functions is that the functions themselves can have whole lot of other
functions (can be nested as well) which might be quite hard to check
at the planning time. That is why the job of marking an aggregate as
parallel safe is best left to the user. They have to mark the aggreage
parallel unsafe if at least one support function is parallel unsafe,
otherwise parallel safe.
Can we check the parallel safety of aggregate support functions during statement execution and error out? Is there any reason not to do so?
And if we were to do above, within the function execution API, we need
to know where the function got called from(?). It is best left to the
user to decide whether a function/aggregate is parallel safe or not.
This is the main reason we have declarative constructs like parallel
safe/unsafe/restricted.
For core functions, we definitely should properly mark parallel
safe/restricted/unsafe tags wherever possible.
Please correct me If I miss something.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
On Tue, Apr 20, 2021 at 2:23 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:https://www.postgresql.org/docs/devel/xaggr.html
"Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL SAFE. The parallel-safety markings on its support functions are not consulted."
IMO, the reason for not checking the parallel safety of the support
functions is that the functions themselves can have whole lot of other
functions (can be nested as well) which might be quite hard to check
at the planning time. That is why the job of marking an aggregate as
parallel safe is best left to the user.
Yes. I think the documentation is perfectly clear that this is
intentional; I don't see a need to change it.
Should we add a member for parallel safety in fmgr_builtins[], and disallow ALTER FUNCTION to change the parallel safety of builtin UDFs?
No. You'd have to be superuser anyway to do that, and we're not in the
habit of trying to put training wheels on superusers.
Don't have an opinion about the other points yet.
regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
IMO, the reason for not checking the parallel safety of the support
functions is that the functions themselves can have whole lot of other
functions (can be nested as well) which might be quite hard to check
at the planning time. That is why the job of marking an aggregate as
parallel safe is best left to the user.Yes. I think the documentation is perfectly clear that this is
intentional; I don't see a need to change it.
OK, that's what I expected. I understood from this that the Postgres's stance toward parallel safety is that Postgres does its best effort to check parallel safety (as far as it doesn't hurt performance much, and perhaps the core code doesn't get very complex), and the user should be responsible for the actual parallel safety of ancillary objects (in this case, support functions for an aggregate) of the target object that he/she marked as parallel safe.
Should we add a member for parallel safety in fmgr_builtins[], and disallow
ALTER FUNCTION to change the parallel safety of builtin UDFs?
No. You'd have to be superuser anyway to do that, and we're not in the
habit of trying to put training wheels on superusers.
Understood. However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel INSERT SELECT. I'd appreciate your comment on this if you see any concern.
Don't have an opinion about the other points yet.
I'd like to have your comments on them, too. But I understand you must be so busy at least until the beta release of PG 14.
Regards
Takayuki Tsunakawa
"tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> writes:
From: Tom Lane <tgl@sss.pgh.pa.us>
No. You'd have to be superuser anyway to do that, and we're not in the
habit of trying to put training wheels on superusers.
Understood. However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel INSERT SELECT. I'd appreciate your comment on this if you see any concern.
[ raised eyebrow... ] I find it very hard to understand why that would
be necessary, or even a good idea. Not least because there's no spare
room there; you'd have to incur a substantial enlargement of the
array to add another flag. But also, that would indeed lock down
the value of the parallel-safety flag, and that seems like a fairly
bad idea.
regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
[ raised eyebrow... ] I find it very hard to understand why that would
be necessary, or even a good idea. Not least because there's no spare
room there; you'd have to incur a substantial enlargement of the
array to add another flag. But also, that would indeed lock down
the value of the parallel-safety flag, and that seems like a fairly
bad idea.
You're right, FmgrBuiltins is already fully packed (24 bytes on 64-bit machines). Enlarging the frequently accessed fmgr_builtins array may wreak unexpectedly large adverse effect on performance.
I wanted to check the parallel safety of functions, which various objects (data type, index, trigger, etc.) come down to, in FunctionCallInvoke() and other few places. But maybe we skip the check for built-in functions. That's a matter of where we draw a line between where we check and where we don't.
Regards
Takayuki Tsunakawa
I think we've found a few existing problems with handling the parallel safety of
functions while doing an experiment. Could I hear your opinions on what we
should do? I'd be willing to create and submit a patch to fix them.The experiment is to add a parallel safety check in FunctionCallInvoke() and run
the regression test with force_parallel_mode=regress. The added check
errors out with ereport(ERROR) when the about-to-be-called function is
parallel unsafe and the process is currently in parallel mode. 6 test cases failed
because the following parallel-unsafe functions were called:dsnowball_init
balkifnull
int44out
text_w_default_out
widget_outThe first function is created in src/backend/snowball/snowball_create.sql for
full text search. The remaining functions are created during the regression
test run.(1)
All the above functions are actually parallel safe looking at their
implementations. It seems that their CREATE FUNCTION statements are just
missing PARALLEL SAFE specifications, so I think I'll add them.
dsnowball_lexize() may also be parallel safe.
I agree that it's better to mark the function with correct parallel safety lable.
Especially for the above functions which will be executed in parallel mode.
It will be friendly to developer and user who is working on something related to parallel test.
So, I attached the patch to mark the above functions parallel safe.
Best regards,
houzj
Attachments:
0001-fix-testcase-with-wrong-parallel-safety-flag.patchapplication/octet-stream; name=0001-fix-testcase-with-wrong-parallel-safety-flag.patchDownload+14-13
On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
From: Tom Lane <tgl@sss.pgh.pa.us>
[ raised eyebrow... ] I find it very hard to understand why that would
be necessary, or even a good idea. Not least because there's no spare
room there; you'd have to incur a substantial enlargement of the
array to add another flag. But also, that would indeed lock down
the value of the parallel-safety flag, and that seems like a fairly
bad idea.You're right, FmgrBuiltins is already fully packed (24 bytes on 64-bit machines). Enlarging the frequently accessed fmgr_builtins array may wreak unexpectedly large adverse effect on performance.
I wanted to check the parallel safety of functions, which various objects (data type, index, trigger, etc.) come down to, in FunctionCallInvoke() and other few places. But maybe we skip the check for built-in functions. That's a matter of where we draw a line between where we check and where we don't.
IIUC, the idea here is to check for parallel safety of functions at
someplace in the code during function invocation so that if we execute
any parallel unsafe/restricted function via parallel worker then we
error out. If so, isn't it possible to deal with built-in and
non-built-in functions in the same way?
I think we want to have some safety checks for functions as we have
for transaction id in AssignTransactionId(), command id in
CommandCounterIncrement(), for write operations in
heap_prepare_insert(), etc. Is that correct?
--
With Regards,
Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes:
On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:From: Tom Lane <tgl@sss.pgh.pa.us>
[ raised eyebrow... ] I find it very hard to understand why that would
be necessary, or even a good idea.
IIUC, the idea here is to check for parallel safety of functions at
someplace in the code during function invocation so that if we execute
any parallel unsafe/restricted function via parallel worker then we
error out. If so, isn't it possible to deal with built-in and
non-built-in functions in the same way?
Yeah, one of the reasons I doubt this is a great idea is that you'd
still have to fetch the pg_proc row for non-built-in functions.
The obvious place to install such a check is fmgr_info(), which is
fetching said row anyway for other purposes, so it's really hard to
see how adding anything to FmgrBuiltin is going to help.
regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
Amit Kapila <amit.kapila16@gmail.com> writes:
IIUC, the idea here is to check for parallel safety of functions at
someplace in the code during function invocation so that if we execute
any parallel unsafe/restricted function via parallel worker then we
error out. If so, isn't it possible to deal with built-in and
non-built-in functions in the same way?Yeah, one of the reasons I doubt this is a great idea is that you'd
still have to fetch the pg_proc row for non-built-in functions.The obvious place to install such a check is fmgr_info(), which is
fetching said row anyway for other purposes, so it's really hard to
see how adding anything to FmgrBuiltin is going to help.
Thank you, fmgr_info() looks like the best place to do the parallel safety check. Having a quick look at its callers, I didn't find any concerning place (of course, we can't be relieved until the regression test succeeds.) Also, with fmgr_info(), we don't have to find other places to add the check to deal with functions calls in execExpr.c and execExprInterp.c. This is beautiful.
But the current fmgr_info() does not check the parallel safety of builtin functions. It does not have information to do that. There are two options. Which do you think is better? I think 2.
1) fmgr_info() reads pg_proc like for non-builtin functions
This ruins the effort for the fast path for builtin functions. I can't imagine how large the adverse impact on performance would be, but I'm worried.
The benefit is that ALTER FUNCTION on builtin functions takes effect. But such operations are nonsensical, so I don't think we want to gain such a benefit.
2) Gen_fmgrtab.pl adds a member for proparallel in FmgrBuiltin
But we don't want to enlarge FmgrBuiltin struct. So, change the existing bool members strict and and retset into one member of type char, and represent the original values with some bit flags. Then we use that member for proparallel as well. (As a result, one byte is left for future use.)
I think we'll try 2). I'd be grateful if you could point out anything I need to be careful to.
Regards
Takayuki Tsunakawa
From: Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
I agree that it's better to mark the function with correct parallel safety lable.
Especially for the above functions which will be executed in parallel mode.
It will be friendly to developer and user who is working on something related to
parallel test.So, I attached the patch to mark the above functions parallel safe.
Thank you, the patch looks good. Please register it with the next CF if not yet.
Regards
Takayuki Tsunakawa
Thank you, fmgr_info() looks like the best place to do the parallel safety check.
Having a quick look at its callers, I didn't find any concerning place (of course,
we can't be relieved until the regression test succeeds.) Also, with fmgr_info(),
we don't have to find other places to add the check to deal with functions calls
in execExpr.c and execExprInterp.c. This is beautiful.But the current fmgr_info() does not check the parallel safety of builtin
functions. It does not have information to do that. There are two options.
Which do you think is better? I think 2.1) fmgr_info() reads pg_proc like for non-builtin functions This ruins the effort
for the fast path for builtin functions. I can't imagine how large the adverse
impact on performance would be, but I'm worried.
For approach 1): I think it could result in infinite recursion.
For example:
If we first access one built-in function A which have not been cached,
it need access the pg_proc, When accessing the pg_proc, it internally still need some built-in function B to scan.
At this time, if B is not cached , it still need to fetch function B's parallel flag by accessing the pg_proc.proparallel.
Then it could result in infinite recursion.
So, I think we can consider the approach 2)
Best regards,
houzj
From: Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
For approach 1): I think it could result in infinite recursion.
For example:
If we first access one built-in function A which have not been cached,
it need access the pg_proc, When accessing the pg_proc, it internally still need
some built-in function B to scan.
At this time, if B is not cached , it still need to fetch function B's parallel flag by
accessing the pg_proc.proparallel.
Then it could result in infinite recursion.So, I think we can consider the approach 2)
Hmm, that makes sense. That's a problem structure similar to that of relcache. Only one choice is left already, unless there's another better idea.
Regards
Takayuki Tsunakawa
On Wed, Apr 21, 2021 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> writes:
From: Tom Lane <tgl@sss.pgh.pa.us>
No. You'd have to be superuser anyway to do that, and we're not in the
habit of trying to put training wheels on superusers.Understood. However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel INSERT SELECT. I'd appreciate your comment on this if you see any concern.
[ raised eyebrow... ] I find it very hard to understand why that would
be necessary, or even a good idea. Not least because there's no spare
room there; you'd have to incur a substantial enlargement of the
array to add another flag. But also, that would indeed lock down
the value of the parallel-safety flag, and that seems like a fairly
bad idea.
I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
that would "lock down the value" of the strict flag, wouldn't it?
Regards,
Greg Nancarrow
Fujitsu Australia
On Wed, Apr 21, 2021 at 7:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:From: Tom Lane <tgl@sss.pgh.pa.us>
[ raised eyebrow... ] I find it very hard to understand why that would
be necessary, or even a good idea.IIUC, the idea here is to check for parallel safety of functions at
someplace in the code during function invocation so that if we execute
any parallel unsafe/restricted function via parallel worker then we
error out. If so, isn't it possible to deal with built-in and
non-built-in functions in the same way?Yeah, one of the reasons I doubt this is a great idea is that you'd
still have to fetch the pg_proc row for non-built-in functions.
So, are you suggesting that we should fetch the pg_proc row for
built-in functions as well for this purpose? If not, then how to
identify parallel safety of built-in functions in fmgr_info()?
Another idea could be that we check parallel safety of built-in
functions based on some static information. As we know the func_ids of
non-parallel-safe built-in functions, we can have a function
fmgr_builtin_parallel_safe() which check if the func_id is not one
among the predefined func_ids of non-parallel-safe built-in functions,
it returns true, otherwise, false. Then, we can call this new function
in fmgr_info for built-in functions.
Thoughts?
--
With Regards,
Amit Kapila.
Greg Nancarrow <gregn4422@gmail.com> writes:
I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
that would "lock down the value" of the strict flag, wouldn't it?
It does, but that's much more directly a property of the function's
C code than parallel-safety is.
regards, tom lane
On Fri, Apr 23, 2021 at 9:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Nancarrow <gregn4422@gmail.com> writes:
I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
that would "lock down the value" of the strict flag, wouldn't it?It does, but that's much more directly a property of the function's
C code than parallel-safety is.
I'm not sure I agree with that, but I think having the "strict" flag
in FmgrBuiltin isn't that nice either.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Apr 23, 2021 at 9:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Nancarrow <gregn4422@gmail.com> writes:
I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
that would "lock down the value" of the strict flag, wouldn't it?
It does, but that's much more directly a property of the function's
C code than parallel-safety is.
I'm not sure I agree with that, but I think having the "strict" flag
in FmgrBuiltin isn't that nice either.
Yeah, if we could readily do without it, we probably would. But the
function call mechanism itself is responsible for implementing strictness,
so it *has* to have that flag available.
regards, tom lane
On Fri, Apr 23, 2021 at 6:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Nancarrow <gregn4422@gmail.com> writes:
I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
that would "lock down the value" of the strict flag, wouldn't it?It does, but that's much more directly a property of the function's
C code than parallel-safety is.
Isn't parallel safety also the C code property? I mean unless someone
changes the built-in function code, changing that property would be
dangerous. The other thing is even if a user is allowed to change one
function's property, how will they know which other functions are
called by that function and whether they are parallel-safe or not. For
example, say if the user wants to change the parallel safe property of
a built-in function brin_summarize_new_values, unless she changes its
code and the functions called by it like brin_summarize_range, it
would be dangerous. So, isn't it better to disallow changing parallel
safety for built-in functions?
Also, if the strict property of built-in functions is fixed
internally, why we allow users to change it and is that of any help?
--
With Regards,
Amit Kapila.
On Sat, Apr 24, 2021 at 12:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 23, 2021 at 6:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Nancarrow <gregn4422@gmail.com> writes:
I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
that would "lock down the value" of the strict flag, wouldn't it?It does, but that's much more directly a property of the function's
C code than parallel-safety is.Isn't parallel safety also the C code property? I mean unless someone
changes the built-in function code, changing that property would be
dangerous. The other thing is even if a user is allowed to change one
function's property, how will they know which other functions are
called by that function and whether they are parallel-safe or not. For
example, say if the user wants to change the parallel safe property of
a built-in function brin_summarize_new_values, unless she changes its
code and the functions called by it like brin_summarize_range, it
would be dangerous. So, isn't it better to disallow changing parallel
safety for built-in functions?Also, if the strict property of built-in functions is fixed
internally, why we allow users to change it and is that of any help?
Yes, I'd like to know too.
I think it would make more sense to disallow changing properties like
strict/parallel-safety on built-in functions.
Also, with sufficient privileges, a built-in function can be
redefined, yet the original function (whose info is cached in
FmgrBuiltins[], from build-time) is always invoked, not the
newly-defined version.
Regards,
Greg Nancarrow
Fujitsu Australia