BUG #15324: Non-deterministic behaviour from parallelised sub-query
The following bug has been logged on the website:
Bug reference: 15324
Logged by: Andrew Fletcher
Email address: andy@prestigedigital.com
PostgreSQL version: 10.5
Operating system: macOS High Sierra 10.13.6
Description:
Reproductions -
Make sure postgresql.conf sets max_parallel_workers_per_gather to 2 or
more
max_parallel_workers_per_gather = 2
Repro 1:
Create table from the following sql file -
https://www.dropbox.com/s/3cm643vmugcgkxh/events.sql.zip?dl=0
Execute this query (multiple times!)
select * from events where account in (select account from events where
data->>'page' = 'success.html' limit 3);
Incorrect output -
account | type | data
---------+----------+--------------------------
304873 | pageview | {"page": "success.html"}
304875 | pageview | {"page": "c.html"}
304875 | pageview | {"page": "success.html"}
304885 | pageview | {"page": "a.html"}
304885 | pageview | {"page": "success.html"}
(5 rows)
Correct output -
account | type | data
---------+----------+--------------------------
304873 | pageview | {"page": "success.html"}
304875 | pageview | {"page": "c.html"}
304875 | pageview | {"page": "success.html"}
304885 | pageview | {"page": "a.html"}
304885 | pageview | {"page": "success.html"}
304873 | pageview | {"page": "b.html"}
(6 rows)
Repro 2 -
Create table from the following sql file -
https://www.dropbox.com/s/mzglgm4a5x1mqno/repro1.sql.zip?dl=0
Execute this query (multiple times!)
select * from repro1 where account in (select account from repro1 where page
= 'success.html' limit 3);
Incorrect Output -
account | page
---------+--------------
14 | a.html
14 | success.html
65 | b.html
65 | success.html
80 | b.html
80 | success.html
24084 | a.html
24084 | success.html
24085 | c.html
24085 | success.html
24095 | a.html
24095 | success.html
(12 rows)
Correct output -
account | page
---------+--------------
14 | a.html
14 | success.html
65 | b.html
65 | success.html
80 | b.html
80 | success.html
(6 rows)
Full version string -
PostgreSQL 10.5 on x86_64-apple-darwin17.7.0, compiled by Apple LLVM version
9.1.0 (clang-902.0.39.2), 64-bit
Also reproduced (with slightly different non-determinism) on -
PostgreSQL 9.6.3, compiled by Visual C++ build 1800, 32-bit on Windows 10
Pro 1709, build 16299.547
Known workarounds -
1. max_parallel_workers_per_gather = 0
2. Add order by account asc to the subquery (works for both repros)
Hi,
On 2018-08-13 16:14:03 +0000, PG Bug reporting form wrote:
Execute this query (multiple times!)
select * from events where account in (select account from events where
data->>'page' = 'success.html' limit 3);
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
Greetings,
Andres Freund
Sorry, the bug report was unclear.
It's not just that it's returning different accounts from the subquery.
In the first repro you can see account 304873 appears twice in the correct
result but only once in the incorrect one, even though it's the same data,
same query, etc. If account 304873 is selected within the limit of the
subquery, then all the results for it should be returned in the outer query.
In the second repro you can see more than 3 accounts in the outer query,
even though the inner one is limited to 3.
Hope that makes it clearer.
Cheers,
Andy
On Mon, Aug 13, 2018 at 5:35 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-08-13 16:14:03 +0000, PG Bug reporting form wrote:
Execute this query (multiple times!)
select * from events where account in (select account from events where
data->>'page' = 'success.html' limit 3);
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
Greetings,
Andres Freund
On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-08-13 16:14:03 +0000, PG Bug reporting form wrote:
Execute this query (multiple times!)
select * from events where account in (select account from events where
data->>'page' = 'success.html' limit 3);
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
While this is true, that's missing the point. This output, for example:
account | page
---------+--------------
14 | a.html
14 | success.html
65 | b.html
65 | success.html
80 | b.html
80 | success.html
24084 | a.html
24084 | success.html
24085 | c.html
24085 | success.html
24095 | a.html
24095 | success.html
(12 rows)
contains data from six different accounts, which should surely be
impossible regardless of which three accounts the subquery returns.
The output in the first repro is also problematic, because it shows that
304873, 304875 and 304885 were all selected, but not all rows for those
accounts were returned.
.m
Marko Tiikkaja <marko@joh.to> writes:
On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres@anarazel.de> wrote:
Well, the subselect with thelimit going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
While this is true, that's missing the point.
Yeah, I agree. I think probably what's happening is that the sub-select
is getting pushed down to the parallel workers and they are not all
computing the same set of sub-select results, leading to inconsistent
answers at the top level.
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
regards, tom lane
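Tom's diagnosis can be illustrated outside the database. The toy sketch below (Python, purely illustrative, not PostgreSQL code) models two workers that each evaluate the "LIMIT 3, no ORDER BY" subquery independently over the whole table, but in different scan orders; their hash-side sets disagree, and the combined output references six accounts, matching the 12-row anomaly from repro 2. The table contents and the two scan orders are assumptions chosen only to make the effect deterministic.

```python
# Toy model of the bug: each "worker" re-runs the LIMIT 3 subquery over
# the full table, but in its own (equally valid) scan order, so the two
# inner result sets disagree.
rows = [(a, p) for a in (14, 65, 80, 24084, 24085, 24095)
        for p in ("a.html", "success.html")]

def limit3_accounts(scan):
    """LIMIT 3 with no ORDER BY: keep the first three matching accounts seen."""
    out = []
    for a, p in scan:
        if p == "success.html":
            out.append(a)
            if len(out) == 3:
                break
    return set(out)

# Worker 1 scans forward, worker 2 scans backward -- both are legitimate
# orders for an unordered scan, so nothing forces them to agree.
set1 = limit3_accounts(rows)            # {14, 65, 80}
set2 = limit3_accounts(reversed(rows))  # {24084, 24085, 24095}

# Each worker semi-joins only its own chunk of the outer scan against
# its own inner set:
mid = len(rows) // 2
result = [r for r in rows[:mid] if r[0] in set1] + \
         [r for r in rows[mid:] if r[0] in set2]

# Six distinct accounts survive even though each inner set held only
# three, reproducing the "12 rows" output from repro 2.
print(sorted({a for a, _ in result}))  # [14, 65, 80, 24084, 24085, 24095]
```

Forcing both workers onto the same ordered prefix (the ORDER BY workaround from the bug report) makes set1 and set2 identical and the anomaly disappears.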
2018-08-13 19:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Marko Tiikkaja <marko@joh.to> writes:
On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres@anarazel.de>
wrote:
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
While this is true, that's missing the point.
Yeah, I agree. I think probably what's happening is that the sub-select
is getting pushed down to the parallel workers and they are not all
computing the same set of sub-select results, leading to inconsistent
answers at the top level.
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Isn't that the default behavior of LIMIT/OFFSET without an ORDER BY clause?
If we don't need to guarantee the order of rows, then marking it
parallel-unsafe shouldn't be necessary.
Regards
Pavel
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
2018-08-13 19:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Isn't that the default behavior of LIMIT/OFFSET without an ORDER BY clause?
In principle, the planner could prove in some cases that the results
were deterministic even with LIMIT/OFFSET. But I doubt it's worth
the trouble. I certainly wouldn't advocate for such logic to be
part of a back-patched bug fix.
regards, tom lane
On Monday, August 13, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
2018-08-13 19:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Isn't that the default behavior of LIMIT/OFFSET without an ORDER BY clause?
In principle, the planner could prove in some cases that the results
were deterministic even with LIMIT/OFFSET. But I doubt it's worth
the trouble. I certainly wouldn't advocate for such logic to be
part of a back-patched bug fix.
Could the planner stick a materialize node there that feeds the same set of
originally selected records to any parallel executors that end up pulling
from it?
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
Could the planner stick a materialize node there that feeds the same set of
originally selected records to any parallel executors that end up pulling
from it?
No. At least, Materialize as it stands doesn't help. There's no
provision for pushing rowsets to parallel workers, only pulling
from them, and it would be an extremely nontrivial thing to add
AFAICS (for one thing, it'd make the leader process even more of
a bottleneck than it is already).
Maybe you could do something with dumping a rowset into a tuplestore
and forcing that out to temp files on-disk before starting any of
the parallel workers, then letting them read it in from the temp
files. But that still looks like a lot of new work and completely
not fit for back-patching, even if we had it.
regards, tom lane
On Mon, Aug 13, 2018 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres@anarazel.de> wrote:
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
While this is true, that's missing the point.
Yeah, I agree. I think probably what's happening is that the sub-select
is getting pushed down to the parallel workers and they are not all
computing the same set of sub-select results, leading to inconsistent
answers at the top level.
Your analysis is correct. The plan for one of the reported queries is as follows:
postgres=# explain select * from repro1 where account in (select
account from repro1 where page
postgres(# = 'success.html' limit 3);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Gather  (cost=1000.71..12727.24 rows=3 width=11)
   Workers Planned: 2
   ->  Hash Semi Join  (cost=0.71..11726.94 rows=1 width=11)
         Hash Cond: (repro1.account = repro1_1.account)
         ->  Parallel Seq Scan on repro1  (cost=0.00..10532.50 rows=454750 width=11)
         ->  Hash  (cost=0.67..0.67 rows=3 width=4)
               ->  Limit  (cost=0.00..0.64 rows=3 width=4)
                     ->  Seq Scan on repro1 repro1_1  (cost=0.00..19627.50 rows=91823 width=4)
                           Filter: ((page)::text = 'success.html'::text)
(9 rows)
As Tom said, it is evident from the plan that the Limit clause is
pushed into the inner side of the parallel plan, and not all the workers
compute the same result set for the inner side.
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
the query tree. Basically, in the SubLink node check, we can detect
whether the subselect has a Limit/Offset clause and, if so, we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification are required to see if it can happen in any
other related cases. BTW, will there be any problem if we allow
sub-selects which have a sort clause even if Limit/Offset is present?
Let me know if you have already started working on it, otherwise, I
will prepare an initial patch.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
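The check Amit outlines can be sketched as a recursive tree walker. The toy below is Python over a dict-based "query tree" for illustration only; the field names limitCount/limitOffset loosely mirror PostgreSQL's Query struct, and the real implementation would be C code in max_parallel_hazard_walker (src/backend/optimizer/util/clauses.c).

```python
# Toy model of the proposed check: walk a query tree and flag any
# SubLink whose subquery carries a LIMIT or OFFSET as parallel-unsafe.
# (Illustrative Python; the real check would be C code in clauses.c.)

def max_parallel_hazard_walker(node):
    """Return True if the tree rooted at `node` is parallel-unsafe."""
    if node is None:
        return False
    if node.get("type") == "SubLink":
        sub = node.get("subselect", {})
        # A LIMIT/OFFSET without a fully deterministic ordering can yield
        # a different row set in every worker, so refuse to parallelize.
        if sub.get("limitCount") is not None or sub.get("limitOffset") is not None:
            return True
    # Recurse into any child nodes.
    return any(max_parallel_hazard_walker(c)
               for c in node.get("children", []))

# The subquery from the bug report: LIMIT 3, no ORDER BY.
query = {
    "type": "Query",
    "children": [
        {"type": "SubLink",
         "subselect": {"limitCount": 3, "limitOffset": None}},
    ],
}
print(max_parallel_hazard_walker(query))  # True: the plan stays serial
```

The open question from the thread maps directly onto this sketch: whether the presence of a sort clause alongside the LIMIT should suppress the hazard, which would be an extra condition in the SubLink branch.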
On Tue, Aug 14, 2018 at 9:40 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 13, 2018 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
As Tom said, it is evident from the plan that the Limit clause is
pushed into the inner side of the parallel plan, and not all the workers
compute the same result set for the inner side.
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
the query tree.
I have written a patch along those lines. This is still a WIP patch,
mainly to demonstrate what I have in mind. One test in
select_parallel.sql fails after this patch, but I think that is
expected and we need to adjust that test. Let me know if you see a
flaw in this approach.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
prohibit_parallel_limit_subselect_v1.patch (application/octet-stream, +36 −1)
On Tue, Aug 14, 2018 at 7:10 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
query-tree. Basically, in the SubLink node check, we can detect
whether the subselect has Limit/Offset clause and if so, then we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification is required to see if it can happen in any
other related cases.
This seems broken as well:
create table qwr(a int not null, b int not null, c text not null);
insert into qwr select i, i, (select prosrc from pg_proc where
oid=11734) from generate_series(1, 128000) i;
set parallel_setup_cost to 0;
analyze qwr;
select count(*) from qwr where (a, b) in (select a, row_number() over()
from qwr);
.m
On Tue, Aug 14, 2018 at 3:52 PM, Marko Tiikkaja <marko@joh.to> wrote:
On Tue, Aug 14, 2018 at 7:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
the query tree. Basically, in the SubLink node check, we can detect
whether the subselect has a Limit/Offset clause and, if so, we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification are required to see if it can happen in any
other related cases.
This seems broken as well:
create table qwr(a int not null, b int not null, c text not null);
insert into qwr select i, i, (select prosrc from pg_proc where
oid=11734) from generate_series(1, 128000) i;
set parallel_setup_cost to 0;
analyze qwr;
select count(*) from qwr where (a, b) in (select a, row_number() over()
from qwr);
I am getting the below error in the above steps:
postgres=# insert into qwr select i, i, (select prosrc from
pg_proc where oid=11734) from generate_series(1, 128000) i;
ERROR:  null value in column "c" violates not-null constraint
DETAIL:  Failing row contains (1, 1, null).
If I remove the 'not null' constraint from column c, then the above
statement works fine, but I get the below plan, which is a serial
plan:
postgres=# Explain select count(*) from qwr where (a, b) in (select a,
row_number() over() from qwr)
                                QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=12360.00..12360.01 rows=1 width=8)
   ->  Hash Semi Join  (cost=7272.00..12200.00 rows=64000 width=0)
         Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
         ->  Seq Scan on qwr  (cost=0.00..1847.00 rows=128000 width=8)
         ->  Hash  (cost=4727.00..4727.00 rows=128000 width=12)
               ->  WindowAgg  (cost=0.00..3447.00 rows=128000 width=12)
                     ->  Seq Scan on qwr qwr_1  (cost=0.00..1847.00 rows=128000 width=4)
(7 rows)
I am not sure why I am not seeing the same problem as you.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi Amit,
On Tue, Aug 14, 2018 at 2:09 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Aug 14, 2018 at 3:52 PM, Marko Tiikkaja <marko@joh.to> wrote:
On Tue, Aug 14, 2018 at 7:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
the query tree. Basically, in the SubLink node check, we can detect
whether the subselect has a Limit/Offset clause and, if so, we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification are required to see if it can happen in any
other related cases.
This seems broken as well:
create table qwr(a int not null, b int not null, c text not null);
insert into qwr select i, i, (select prosrc from pg_proc where
oid=11734) from generate_series(1, 128000) i;
set parallel_setup_cost to 0;
analyze qwr;
select count(*) from qwr where (a, b) in (select a, row_number() over()
from qwr);
I am getting the below error in the above steps:
postgres=# insert into qwr select i, i, (select prosrc from
pg_proc where oid=11734) from generate_series(1, 128000) i;
ERROR:  null value in column "c" violates not-null constraint
DETAIL:  Failing row contains (1, 1, null).
Sorry, try this instead:
insert into qwr select i, i, (select prosrc from pg_proc where
oid='ts_debug(regconfig,text)'::regprocedure) from generate_series(1,
128000) i;
.m
On Tue, Aug 14, 2018 at 4:42 PM, Marko Tiikkaja <marko@joh.to> wrote:
Hi Amit,
This seems broken as well:
create table qwr(a int not null, b int not null, c text not null);
insert into qwr select i, i, (select prosrc from pg_proc where
oid=11734) from generate_series(1, 128000) i;
set parallel_setup_cost to 0;
analyze qwr;
select count(*) from qwr where (a, b) in (select a, row_number() over()
from qwr);
I am getting the below error in the above steps:
postgres=# insert into qwr select i, i, (select prosrc from
pg_proc where oid=11734) from generate_series(1, 128000) i;
ERROR:  null value in column "c" violates not-null constraint
DETAIL:  Failing row contains (1, 1, null).
Sorry, try this instead:
insert into qwr select i, i, (select prosrc from pg_proc where
oid='ts_debug(regconfig,text)'::regprocedure) from generate_series(1,
128000) i;
This looks related, but I think this is a different issue. The real
reason for this case is that row_number is marked as parallel_safe,
which seems to be wrong. I think it should be marked as
parallel_unsafe. This needs some more analysis w.r.t. which other
window functions have a similar problem.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 14, 2018 at 2:50 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
This looks related, but I think this is a different issue.
Sure.
The real
reason for this case is that row_number is marked as parallel_safe
which seems to be wrong. I think it should be marked as
parallel_unsafe.
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different order
in different workers. And I don't really think that to be the function's
fault.
.m
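Marko's point, that the non-determinism comes from the input order rather than from the function itself, can be seen with a toy sketch (Python, illustrative only): row_number-style numbering is a pure function of the sequence it is fed, so identical row sets presented in different orders yield different (number, row) pairs.

```python
def number_rows(rows):
    """Toy row_number(): attach 1-based positions in arrival order."""
    return [(i + 1, r) for i, r in enumerate(rows)]

# Two "workers" see the same set of rows, but in different orders:
a = number_rows(["x", "y", "z"])
b = number_rows(["z", "x", "y"])

print(a)  # [(1, 'x'), (2, 'y'), (3, 'z')]
print(b)  # [(1, 'z'), (2, 'x'), (3, 'y')]
# The function is perfectly deterministic; the disagreement comes
# entirely from an input ordering the plan never pinned down.
```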
Marko Tiikkaja <marko@joh.to> writes:
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different order
in different workers. And I don't really think that to be the function's
fault.
So that basically opens the question of whether *any* window function
calculation can safely be pushed down to parallel workers.
Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
expect to do this safely if the row ordering induced by the WINDOW clause
can be proven to be fully deterministic. The planner has no such smarts
at the moment AFAIR. In principle you could do it if there were
partitioning/ordering by a primary key, but I'm not excited about the
prospects of that being true often enough in practice to justify making
the check.
regards, tom lane
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different
order in different workers. And I don't really think that to be the
function's fault.
Tom> So that basically opens the question of whether *any* window
Tom> function calculation can safely be pushed down to parallel
Tom> workers.
Grepping the spec for the phrase "possibly non-deterministic" is quite
enlightening. Leaving out non-determinisms caused by timezone or actual
volatility, leaving out cases of non-determinism that we'd call
"stable", and leaving out features like multisets that we don't support
at all, here's the list of interesting cases (comments after each quoted
paragraph are mine):
6.28 <value expression>
d) An <array value constructor by query>.
i.e. ARRAY(select)
o) An <aggregate function> that specifies MIN or MAX and that simply
contains a <value expression> whose declared type is based on a
character string type, user-defined type, or datetime with time
zone type.
i.e. MIN(x) is non-deterministic if "x" can have distinguishable values
that compare equal. PG doesn't have that for text or timestamptz, unlike
the spec, but it does for citext or other user-defined types.
q) An <array aggregate function>.
i.e. array_agg()
u) A <window function> that specifies ROW_NUMBER, FIRST_VALUE,
LAST_VALUE, NTH_VALUE, NTILE, LEAD, or LAG, or whose associated
<window specification> specifies ROWS.
This covers those cases where window functions don't treat peer rows
together.
7.6 <table reference>
27) A <table reference> is possibly non-deterministic if the simply
contained <table primary> or <joined table> is possibly
non-deterministic or if <sample clause> is specified.
i.e. TABLESAMPLE is non-deterministic
7.16 <query specification>
a) The <set quantifier> DISTINCT is specified and one of the columns
of T has a data type of character string, user-defined type, TIME
WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE.
c) The <select list>, <having clause>, or <window clause> contains a
reference to a column C of T that has a data type of character
string, user-defined type, TIME WITH TIME ZONE, or TIMESTAMP WITH
TIME ZONE, and the functional dependency G → C, where G is the set
consisting of the grouping columns of T, holds in T.
For both the above two cases, if distinguishable values of a type
compare equal, it's non-deterministic which gets into the result.
7.17 <query expression>
a) The <query expression> contains a <result offset clause>.
b) The <query expression> contains a <fetch first clause>.
f) Both of the following are true:
i) T contains a set operator UNION and ALL is not specified, or T
contains either of the set operators EXCEPT or INTERSECT.
ii) At least one of the following is true:
1) The first or second operand contains a column that has a
declared type of character string.
2) The first or second operand contains a column that has a
declared type of datetime with time zone.
3) The first or second operand contains a column that has a
declared type that is a user-defined type.
(I've left out the many clauses which just amount to "if $thing contains
something which is possibly non-deterministic then it is possibly
non-deterministic")
--
Andrew (irc:RhodiumToad)
On Tue, Aug 14, 2018 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different order
in different workers. And I don't really think that to be the function's
fault.
So that basically opens the question of whether *any* window function
calculation can safely be pushed down to parallel workers.
I think we can consider it as a parallel-restricted operation. For
the purpose of testing, I have marked row_number as
parallel-restricted in pg_proc and I get the below plan:
postgres=# Explain select count(*) from qwr where (a, b) in (select a,
row_number() over() from qwr);
                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46522.12..46522.13 rows=1 width=8)
   ->  Hash Semi Join  (cost=24352.08..46362.12 rows=64001 width=0)
         Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
         ->  Gather  (cost=0.00..18926.01 rows=128002 width=8)
               Workers Planned: 2
               ->  Parallel Seq Scan on qwr  (cost=0.00..18926.01 rows=64001 width=8)
         ->  Hash  (cost=21806.06..21806.06 rows=128002 width=12)
               ->  WindowAgg  (cost=0.00..20526.04 rows=128002 width=12)
                     ->  Gather  (cost=0.00..18926.01 rows=128002 width=4)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on qwr qwr_1  (cost=0.00..18926.01 rows=64001 width=4)
(11 rows)
This seems okay, though the results of the above parallel execution
are not the same as the serial execution's. I think the reason for it is
that we don't get rows in a predictable order from the workers.
Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
expect to do this safely if the row ordering induced by the WINDOW clause
can be proven to be fully deterministic. The planner has no such smarts
at the moment AFAIR. In principle you could do it if there were
partitioning/ordering by a primary key, but I'm not excited about the
prospects of that being true often enough in practice to justify making
the check.
Yeah, I am also not sure if it is worth adding the additional checks.
So, for now, we can treat any window function calculation as
parallel-restricted and if later anybody has a reason strong enough to
relax the restriction for some particular case, we will consider it.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Greetings,
* Amit Kapila (amit.kapila16@gmail.com) wrote:
On Tue, Aug 14, 2018 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different order
in different workers. And I don't really think that to be the function's
fault.
So that basically opens the question of whether *any* window function
calculation can safely be pushed down to parallel workers.
I think we can consider it as a parallel-restricted operation. For
the purpose of testing, I have marked row_number as
parallel-restricted in pg_proc and I get the below plan:
postgres=# Explain select count(*) from qwr where (a, b) in (select a,
row_number() over() from qwr);
                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46522.12..46522.13 rows=1 width=8)
   ->  Hash Semi Join  (cost=24352.08..46362.12 rows=64001 width=0)
         Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
         ->  Gather  (cost=0.00..18926.01 rows=128002 width=8)
               Workers Planned: 2
               ->  Parallel Seq Scan on qwr  (cost=0.00..18926.01 rows=64001 width=8)
         ->  Hash  (cost=21806.06..21806.06 rows=128002 width=12)
               ->  WindowAgg  (cost=0.00..20526.04 rows=128002 width=12)
                     ->  Gather  (cost=0.00..18926.01 rows=128002 width=4)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on qwr qwr_1  (cost=0.00..18926.01 rows=64001 width=4)
(11 rows)
This seems okay, though the results of the above parallel execution
are not the same as the serial execution's. I think the reason for it is
that we don't get rows in a predictable order from the workers.
You wouldn't get them in a predictable order even without
parallelization due to the lack of an ordering, so this hardly seems
like an issue.
Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
expect to do this safely if the row ordering induced by the WINDOW clause
can be proven to be fully deterministic. The planner has no such smarts
at the moment AFAIR. In principle you could do it if there were
partitioning/ordering by a primary key, but I'm not excited about the
prospects of that being true often enough in practice to justify making
the check.
Yeah, I am also not sure if it is worth adding the additional checks.
So, for now, we can treat any window function calculation as
parallel-restricted and if later anybody has a reason strong enough to
relax the restriction for some particular case, we will consider it.
Seems likely that we'll want this at some point, but certainly seems
like new work and not a small bit of it.
Thanks!
Stephen