BUG #15324: Non-deterministic behaviour from parallelised sub-query
The following bug has been logged on the website:
Bug reference: 15324
Logged by: Andrew Fletcher
Email address: andy@prestigedigital.com
PostgreSQL version: 10.5
Operating system: macOS High Sierra 10.13.6
Description:
Reproductions -
Make sure postgresql.conf sets max_parallel_workers_per_gather to 2 or
more
max_parallel_workers_per_gather = 2
Repro 1:
Create table from the following sql file -
https://www.dropbox.com/s/3cm643vmugcgkxh/events.sql.zip?dl=0
Execute this query (multiple times!)
select * from events where account in (select account from events where
data->>'page' = 'success.html' limit 3);
Incorrect output -
account | type | data
---------+----------+--------------------------
304873 | pageview | {"page": "success.html"}
304875 | pageview | {"page": "c.html"}
304875 | pageview | {"page": "success.html"}
304885 | pageview | {"page": "a.html"}
304885 | pageview | {"page": "success.html"}
(5 rows)
Correct output -
account | type | data
---------+----------+--------------------------
304873 | pageview | {"page": "success.html"}
304875 | pageview | {"page": "c.html"}
304875 | pageview | {"page": "success.html"}
304885 | pageview | {"page": "a.html"}
304885 | pageview | {"page": "success.html"}
304873 | pageview | {"page": "b.html"}
(6 rows)
Repro 2 -
Create table from the following sql file -
https://www.dropbox.com/s/mzglgm4a5x1mqno/repro1.sql.zip?dl=0
Execute this query (multiple times!)
select * from repro1 where account in (select account from repro1 where page
= 'success.html' limit 3);
Incorrect Output -
account | page
---------+--------------
14 | a.html
14 | success.html
65 | b.html
65 | success.html
80 | b.html
80 | success.html
24084 | a.html
24084 | success.html
24085 | c.html
24085 | success.html
24095 | a.html
24095 | success.html
(12 rows)
Correct output -
account | page
---------+--------------
14 | a.html
14 | success.html
65 | b.html
65 | success.html
80 | b.html
80 | success.html
(6 rows)
Full version string -
PostgreSQL 10.5 on x86_64-apple-darwin17.7.0, compiled by Apple LLVM version
9.1.0 (clang-902.0.39.2), 64-bit
Also reproduced (with slightly different non-determinism) on -
PostgreSQL 9.6.3, compiled by Visual C++ build 1800, 32-bit on Windows 10
Pro 1709, build 16299.547
Known workarounds -
1. max_parallel_workers_per_gather = 0
2. Add order by account asc to the subquery (works for both repros)
Hi,
On 2018-08-13 16:14:03 +0000, PG Bug reporting form wrote:
Execute this query (multiple times!)
select * from events where account in (select account from events where
data->>'page' = 'success.html' limit 3);
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
Greetings,
Andres Freund
Sorry, the bug report was unclear.
It's not just that it's returning different accounts from the subquery.
In the first repro you can see account 304873 appears twice in the correct
result but only once in the incorrect one, even though it's the same data,
same query, etc. If account 304873 is selected within the limit of the
subquery, then all the results for it should be returned in the outer query.
In the second repro you can see more than 3 accounts in the outer query,
even though the inner one is limited to 3.
Hope that makes it clearer.
Cheers,
Andy
On Mon, Aug 13, 2018 at 5:35 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-08-13 16:14:03 +0000, PG Bug reporting form wrote:
Execute this query (multiple times!)
select * from events where account in (select account from events where
data->>'page' = 'success.html' limit 3);
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
Greetings,
Andres Freund
On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-08-13 16:14:03 +0000, PG Bug reporting form wrote:
Execute this query (multiple times!)
select * from events where account in (select account from events where
data->>'page' = 'success.html' limit 3);
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
While this is true, that's missing the point. This output, for example:
account | page
---------+--------------
14 | a.html
14 | success.html
65 | b.html
65 | success.html
80 | b.html
80 | success.html
24084 | a.html
24084 | success.html
24085 | c.html
24085 | success.html
24095 | a.html
24095 | success.html
(12 rows)
contains data from six different accounts, which should surely be
impossible regardless of which three accounts the subquery returns.
The output in the first repro is also problematic, because it shows that
304873, 304875 and 304885 were all selected, but not all rows for those
accounts were returned.
.m
Marko Tiikkaja <marko@joh.to> writes:
On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres@anarazel.de> wrote:
Well, the subselect with thelimit going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
While this is true, that's missing the point.
Yeah, I agree. I think probably what's happening is that the sub-select
is getting pushed down to the parallel workers and they are not all
computing the same set of sub-select results, leading to inconsistent
answers at the top level.
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
regards, tom lane
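Tom's diagnosis can be illustrated outside the database. The toy sketch below (Python, purely illustrative, not PostgreSQL code) models two workers that each evaluate the "LIMIT 3, no ORDER BY" subquery independently over the whole table, but in different scan orders; their hash-side sets disagree, and the combined output references six accounts, matching the 12-row anomaly from repro 2. The table contents and the two scan orders are assumptions chosen only to make the effect deterministic.

```python
# Toy model of the bug: each "worker" re-runs the LIMIT 3 subquery over
# the full table, but in its own (equally valid) scan order, so the two
# inner result sets disagree.
rows = [(a, p) for a in (14, 65, 80, 24084, 24085, 24095)
        for p in ("a.html", "success.html")]

def limit3_accounts(scan):
    """LIMIT 3 with no ORDER BY: keep the first three matching accounts seen."""
    out = []
    for a, p in scan:
        if p == "success.html":
            out.append(a)
            if len(out) == 3:
                break
    return set(out)

# Worker 1 scans forward, worker 2 scans backward -- both are legitimate
# orders for an unordered scan, so nothing forces them to agree.
set1 = limit3_accounts(rows)            # {14, 65, 80}
set2 = limit3_accounts(reversed(rows))  # {24084, 24085, 24095}

# Each worker semi-joins only its own chunk of the outer scan against
# its own inner set:
mid = len(rows) // 2
result = [r for r in rows[:mid] if r[0] in set1] + \
         [r for r in rows[mid:] if r[0] in set2]

# Six distinct accounts survive even though each inner set held only
# three, reproducing the "12 rows" output from repro 2.
print(sorted({a for a, _ in result}))  # [14, 65, 80, 24084, 24085, 24095]
```

Forcing both workers onto the same ordered prefix (the ORDER BY workaround from the bug report) makes set1 and set2 identical and the anomaly disappears.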
2018-08-13 19:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Marko Tiikkaja <marko@joh.to> writes:
On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres@anarazel.de>
wrote:
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
While this is true, that's missing the point.
Yeah, I agree. I think probably what's happening is that the sub-select
is getting pushed down to the parallel workers and they are not all
computing the same set of sub-select results, leading to inconsistent
answers at the top level.
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Isn't that the default behavior of LIMIT/OFFSET without an ORDER BY clause?
If we don't need to guarantee the order of rows, then marking it
parallel-unsafe shouldn't be necessary.
Regards
Pavel
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
2018-08-13 19:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Isn't that the default behavior of LIMIT/OFFSET without an ORDER BY clause?
In principle, the planner could prove in some cases that the results
were deterministic even with LIMIT/OFFSET. But I doubt it's worth
the trouble. I certainly wouldn't advocate for such logic to be
part of a back-patched bug fix.
regards, tom lane
On Monday, August 13, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
2018-08-13 19:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Isn't that the default behavior of LIMIT/OFFSET without an ORDER BY clause?
In principle, the planner could prove in some cases that the results
were deterministic even with LIMIT/OFFSET. But I doubt it's worth
the trouble. I certainly wouldn't advocate for such logic to be
part of a back-patched bug fix.
Could the planner stick a materialize node there that feeds the same set of
originally selected records to any parallel executors that end up pulling
from it?
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
Could the planner stick a materialize node there that feeds the same set of
originally selected records to any parallel executors that end up pulling
from it?
No. At least, Materialize as it stands doesn't help. There's no
provision for pushing rowsets to parallel workers, only pulling
from them, and it would be an extremely nontrivial thing to add
AFAICS (for one thing, it'd make the leader process even more of
a bottleneck than it is already).
Maybe you could do something with dumping a rowset into a tuplestore
and forcing that out to temp files on-disk before starting any of
the parallel workers, then letting them read it in from the temp
files. But that still looks like a lot of new work and completely
not fit for back-patching, even if we had it.
regards, tom lane
On Mon, Aug 13, 2018 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres@anarazel.de> wrote:
Well, the subselect with the limit is going to return different results from
run to run. Unless you add an ORDER BY there's no guaranteed order in
which tuples are returned. So I don't think it's surprising that you're
getting results that differ between runs.
While this is true, that's missing the point.
Yeah, I agree. I think probably what's happening is that the sub-select
is getting pushed down to the parallel workers and they are not all
computing the same set of sub-select results, leading to inconsistent
answers at the top level.
Your analysis is correct. The plan for one of the reported queries is as follows:
postgres=# explain select * from repro1 where account in (select
account from repro1 where page
postgres(# = 'success.html' limit 3);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Gather  (cost=1000.71..12727.24 rows=3 width=11)
   Workers Planned: 2
   ->  Hash Semi Join  (cost=0.71..11726.94 rows=1 width=11)
         Hash Cond: (repro1.account = repro1_1.account)
         ->  Parallel Seq Scan on repro1  (cost=0.00..10532.50 rows=454750 width=11)
         ->  Hash  (cost=0.67..0.67 rows=3 width=4)
               ->  Limit  (cost=0.00..0.64 rows=3 width=4)
                     ->  Seq Scan on repro1 repro1_1  (cost=0.00..19627.50 rows=91823 width=4)
                           Filter: ((page)::text = 'success.html'::text)
(9 rows)
As Tom said, it is evident from the plan that the Limit clause is
pushed into the inner side of the parallel plan, and not all the workers
compute the same result set for the inner side.
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
the query tree. Basically, in the SubLink node check, we can detect
whether the subselect has a Limit/Offset clause and, if so, we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification are required to see if it can happen in any
other related cases. BTW, will there be any problem if we allow
sub-selects which have a sort clause even if Limit/Offset is present?
Let me know if you have already started working on it, otherwise, I
will prepare an initial patch.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
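The check Amit outlines can be sketched as a recursive tree walker. The toy below is Python over a dict-based "query tree" for illustration only; the field names limitCount/limitOffset loosely mirror PostgreSQL's Query struct, and the real implementation would be C code in max_parallel_hazard_walker (src/backend/optimizer/util/clauses.c).

```python
# Toy model of the proposed check: walk a query tree and flag any
# SubLink whose subquery carries a LIMIT or OFFSET as parallel-unsafe.
# (Illustrative Python; the real check would be C code in clauses.c.)

def max_parallel_hazard_walker(node):
    """Return True if the tree rooted at `node` is parallel-unsafe."""
    if node is None:
        return False
    if node.get("type") == "SubLink":
        sub = node.get("subselect", {})
        # A LIMIT/OFFSET without a fully deterministic ordering can yield
        # a different row set in every worker, so refuse to parallelize.
        if sub.get("limitCount") is not None or sub.get("limitOffset") is not None:
            return True
    # Recurse into any child nodes.
    return any(max_parallel_hazard_walker(c)
               for c in node.get("children", []))

# The subquery from the bug report: LIMIT 3, no ORDER BY.
query = {
    "type": "Query",
    "children": [
        {"type": "SubLink",
         "subselect": {"limitCount": 3, "limitOffset": None}},
    ],
}
print(max_parallel_hazard_walker(query))  # True: the plan stays serial
```

The open question from the thread maps directly onto this sketch: whether the presence of a sort clause alongside the LIMIT should suppress the hazard, which would be an extra condition in the SubLink branch.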
On Tue, Aug 14, 2018 at 9:40 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 13, 2018 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
As Tom said, it is evident from the plan that the Limit clause is
pushed into the inner side of the parallel plan, and not all the workers
compute the same result set for the inner side.
Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
as making it parallel-unsafe, for exactly the reason that that makes
its results non-deterministic.
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
the query tree.
I have written a patch along those lines. This is still a WIP patch,
mainly to demonstrate what I have in mind. One test in
select_parallel.sql fails after this patch, but I think that is
expected and we need to adjust that test. Let me know if you see a
flaw in this approach.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
prohibit_parallel_limit_subselect_v1.patch (application/octet-stream, +36 −1)
On Tue, Aug 14, 2018 at 7:10 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
query-tree. Basically, in the SubLink node check, we can detect
whether the subselect has Limit/Offset clause and if so, then we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification is required to see if it can happen in any
other related cases.
This seems broken as well:
create table qwr(a int not null, b int not null, c text not null);
insert into qwr select i, i, (select prosrc from pg_proc where
oid=11734) from generate_series(1, 128000) i;
set parallel_setup_cost to 0;
analyze qwr;
select count(*) from qwr where (a, b) in (select a, row_number() over()
from qwr);
.m
On Tue, Aug 14, 2018 at 3:52 PM, Marko Tiikkaja <marko@joh.to> wrote:
On Tue, Aug 14, 2018 at 7:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
the query tree. Basically, in the SubLink node check, we can detect
whether the subselect has a Limit/Offset clause and, if so, we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification are required to see if it can happen in any
other related cases.
This seems broken as well:
create table qwr(a int not null, b int not null, c text not null);
insert into qwr select i, i, (select prosrc from pg_proc where
oid=11734) from generate_series(1, 128000) i;
set parallel_setup_cost to 0;
analyze qwr;
select count(*) from qwr where (a, b) in (select a, row_number() over()
from qwr);
I am getting the below error in the above steps:
postgres=# insert into qwr select i, i, (select prosrc from
pg_proc where oid=11734) from generate_series(1, 128000) i;
ERROR:  null value in column "c" violates not-null constraint
DETAIL:  Failing row contains (1, 1, null).
If I remove the 'not null' constraint from column c, then the above
statement works fine, but I get the below plan, which is a serial
plan:
postgres=# Explain select count(*) from qwr where (a, b) in (select a,
row_number() over() from qwr)
                                QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=12360.00..12360.01 rows=1 width=8)
   ->  Hash Semi Join  (cost=7272.00..12200.00 rows=64000 width=0)
         Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
         ->  Seq Scan on qwr  (cost=0.00..1847.00 rows=128000 width=8)
         ->  Hash  (cost=4727.00..4727.00 rows=128000 width=12)
               ->  WindowAgg  (cost=0.00..3447.00 rows=128000 width=12)
                     ->  Seq Scan on qwr qwr_1  (cost=0.00..1847.00 rows=128000 width=4)
(7 rows)
I am not sure why I am not seeing the same problem as you.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi Amit,
On Tue, Aug 14, 2018 at 2:09 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Aug 14, 2018 at 3:52 PM, Marko Tiikkaja <marko@joh.to> wrote:
On Tue, Aug 14, 2018 at 7:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
the query tree. Basically, in the SubLink node check, we can detect
whether the subselect has a Limit/Offset clause and, if so, we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification are required to see if it can happen in any
other related cases.
This seems broken as well:
create table qwr(a int not null, b int not null, c text not null);
insert into qwr select i, i, (select prosrc from pg_proc where
oid=11734) from generate_series(1, 128000) i;
set parallel_setup_cost to 0;
analyze qwr;
select count(*) from qwr where (a, b) in (select a, row_number() over()
from qwr);
I am getting the below error in the above steps:
postgres=# insert into qwr select i, i, (select prosrc from
pg_proc where oid=11734) from generate_series(1, 128000) i;
ERROR:  null value in column "c" violates not-null constraint
DETAIL:  Failing row contains (1, 1, null).
Sorry, try this instead:
insert into qwr select i, i, (select prosrc from pg_proc where
oid='ts_debug(regconfig,text)'::regprocedure) from generate_series(1,
128000) i;
.m
On Tue, Aug 14, 2018 at 4:42 PM, Marko Tiikkaja <marko@joh.to> wrote:
Hi Amit,
This seems broken as well:
create table qwr(a int not null, b int not null, c text not null);
insert into qwr select i, i, (select prosrc from pg_proc where
oid=11734) from generate_series(1, 128000) i;
set parallel_setup_cost to 0;
analyze qwr;
select count(*) from qwr where (a, b) in (select a, row_number() over()
from qwr);
I am getting the below error in the above steps:
postgres=# insert into qwr select i, i, (select prosrc from
pg_proc where oid=11734) from generate_series(1, 128000) i;
ERROR:  null value in column "c" violates not-null constraint
DETAIL:  Failing row contains (1, 1, null).
Sorry, try this instead:
insert into qwr select i, i, (select prosrc from pg_proc where
oid='ts_debug(regconfig,text)'::regprocedure) from generate_series(1,
128000) i;
This looks related, but I think this is a different issue. The real
reason for this case is that row_number is marked as parallel_safe,
which seems to be wrong. I think it should be marked as
parallel_unsafe. This needs some more analysis w.r.t. which other
window functions have a similar problem.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 14, 2018 at 2:50 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
This looks related, but I think this is a different issue.
Sure.
The real
reason for this case is that row_number is marked as parallel_safe
which seems to be wrong. I think it should be marked as
parallel_unsafe.
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different order
in different workers. And I don't really think that to be the function's
fault.
.m
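Marko's point, that the non-determinism comes from the input order rather than from the function itself, can be seen with a toy sketch (Python, illustrative only): row_number-style numbering is a pure function of the sequence it is fed, so identical row sets presented in different orders yield different (number, row) pairs.

```python
def number_rows(rows):
    """Toy row_number(): attach 1-based positions in arrival order."""
    return [(i + 1, r) for i, r in enumerate(rows)]

# Two "workers" see the same set of rows, but in different orders:
a = number_rows(["x", "y", "z"])
b = number_rows(["z", "x", "y"])

print(a)  # [(1, 'x'), (2, 'y'), (3, 'z')]
print(b)  # [(1, 'z'), (2, 'x'), (3, 'y')]
# The function is perfectly deterministic; the disagreement comes
# entirely from an input ordering the plan never pinned down.
```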
Marko Tiikkaja <marko@joh.to> writes:
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different order
in different workers. And I don't really think that to be the function's
fault.
So that basically opens the question of whether *any* window function
calculation can safely be pushed down to parallel workers.
Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
expect to do this safely if the row ordering induced by the WINDOW clause
can be proven to be fully deterministic. The planner has no such smarts
at the moment AFAIR. In principle you could do it if there were
partitioning/ordering by a primary key, but I'm not excited about the
prospects of that being true often enough in practice to justify making
the check.
regards, tom lane
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different
order in different workers. And I don't really think that to be the
function's fault.
Tom> So that basically opens the question of whether *any* window
Tom> function calculation can safely be pushed down to parallel
Tom> workers.
Grepping the spec for the phrase "possibly non-deterministic" is quite
enlightening. Leaving out non-determinisms caused by timezone or actual
volatility, leaving out cases of non-determinism that we'd call
"stable", and leaving out features like multisets that we don't support
at all, here's the list of interesting cases (comments after each quoted
paragraph are mine):
6.28 <value expression>
d) An <array value constructor by query>.
i.e. ARRAY(select)
o) An <aggregate function> that specifies MIN or MAX and that simply
contains a <value expression> whose declared type is based on a
character string type, user-defined type, or datetime with time
zone type.
i.e. MIN(x) is non-deterministic if "x" can have distinguishable values
that compare equal. PG doesn't have that for text or timestamptz, unlike
the spec, but it does for citext or other user-defined types.
q) An <array aggregate function>.
i.e. array_agg()
u) A <window function> that specifies ROW_NUMBER, FIRST_VALUE,
LAST_VALUE, NTH_VALUE, NTILE, LEAD, or LAG, or whose associated
<window specification> specifies ROWS.
This covers those cases where window functions don't treat peer rows
together.
7.6 <table reference>
27) A <table reference> is possibly non-deterministic if the simply
contained <table primary> or <joined table> is possibly
non-deterministic or if <sample clause> is specified.
i.e. TABLESAMPLE is non-deterministic
7.16 <query specification>
a) The <set quantifier> DISTINCT is specified and one of the columns
of T has a data type of character string, user-defined type, TIME
WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE.
c) The <select list>, <having clause>, or <window clause> contains a
reference to a column C of T that has a data type of character
string, user-defined type, TIME WITH TIME ZONE, or TIMESTAMP WITH
TIME ZONE, and the functional dependency G → C, where G is the set
consisting of the grouping columns of T, holds in T.
For both the above two cases, if distinguishable values of a type
compare equal, it's non-deterministic which gets into the result.
7.17 <query expression>
a) The <query expression> contains a <result offset clause>.
b) The <query expression> contains a <fetch first clause>.
f) Both of the following are true:
i) T contains a set operator UNION and ALL is not specified, or T
contains either of the set operators EXCEPT or INTERSECT.
ii) At least one of the following is true:
1) The first or second operand contains a column that has a
declared type of character string.
2) The first or second operand contains a column that has a
declared type of datetime with time zone.
3) The first or second operand contains a column that has a
declared type that is a user-defined type.
(I've left out the many clauses which just amount to "if $thing contains
something which is possibly non-deterministic then it is possibly
non-deterministic")
--
Andrew (irc:RhodiumToad)
On Tue, Aug 14, 2018 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different order
in different workers. And I don't really think that to be the function's
fault.
So that basically opens the question of whether *any* window function
calculation can safely be pushed down to parallel workers.
I think we can consider it as a parallel-restricted operation. For
the purpose of testing, I have marked row_number as
parallel-restricted in pg_proc and I get the below plan:
postgres=# Explain select count(*) from qwr where (a, b) in (select a,
row_number() over() from qwr);
                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46522.12..46522.13 rows=1 width=8)
   ->  Hash Semi Join  (cost=24352.08..46362.12 rows=64001 width=0)
         Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
         ->  Gather  (cost=0.00..18926.01 rows=128002 width=8)
               Workers Planned: 2
               ->  Parallel Seq Scan on qwr  (cost=0.00..18926.01 rows=64001 width=8)
         ->  Hash  (cost=21806.06..21806.06 rows=128002 width=12)
               ->  WindowAgg  (cost=0.00..20526.04 rows=128002 width=12)
                     ->  Gather  (cost=0.00..18926.01 rows=128002 width=4)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on qwr qwr_1  (cost=0.00..18926.01 rows=64001 width=4)
(11 rows)
This seems okay, though the results of the above parallel execution
are not the same as the serial execution's. I think the reason for it is
that we don't get rows in a predictable order from the workers.
Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
expect to do this safely if the row ordering induced by the WINDOW clause
can be proven to be fully deterministic. The planner has no such smarts
at the moment AFAIR. In principle you could do it if there were
partitioning/ordering by a primary key, but I'm not excited about the
prospects of that being true often enough in practice to justify making
the check.
Yeah, I am also not sure if it is worth adding the additional checks.
So, for now, we can treat any window function calculation as
parallel-restricted and if later anybody has a reason strong enough to
relax the restriction for some particular case, we will consider it.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Greetings,
* Amit Kapila (amit.kapila16@gmail.com) wrote:
On Tue, Aug 14, 2018 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
Marking the function parallel safe doesn't seem wrong to me. The
non-parallel-safe part is that the input gets fed to it in different order
in different workers. And I don't really think that to be the function's
fault.
So that basically opens the question of whether *any* window function
calculation can safely be pushed down to parallel workers.
I think we can consider it as a parallel-restricted operation. For
the purpose of testing, I have marked row_number as
parallel-restricted in pg_proc and I get the below plan:
postgres=# Explain select count(*) from qwr where (a, b) in (select a,
row_number() over() from qwr);
                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46522.12..46522.13 rows=1 width=8)
   ->  Hash Semi Join  (cost=24352.08..46362.12 rows=64001 width=0)
         Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
         ->  Gather  (cost=0.00..18926.01 rows=128002 width=8)
               Workers Planned: 2
               ->  Parallel Seq Scan on qwr  (cost=0.00..18926.01 rows=64001 width=8)
         ->  Hash  (cost=21806.06..21806.06 rows=128002 width=12)
               ->  WindowAgg  (cost=0.00..20526.04 rows=128002 width=12)
                     ->  Gather  (cost=0.00..18926.01 rows=128002 width=4)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on qwr qwr_1  (cost=0.00..18926.01 rows=64001 width=4)
(11 rows)
This seems okay, though the results of the above parallel execution
are not the same as the serial execution's. I think the reason for it is
that we don't get rows in a predictable order from the workers.
You wouldn't get them in a predictable order even without
parallelization due to the lack of an ordering, so this hardly seems
like an issue.
Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
expect to do this safely if the row ordering induced by the WINDOW clause
can be proven to be fully deterministic. The planner has no such smarts
at the moment AFAIR. In principle you could do it if there were
partitioning/ordering by a primary key, but I'm not excited about the
prospects of that being true often enough in practice to justify making
the check.
Yeah, I am also not sure if it is worth adding the additional checks.
So, for now, we can treat any window function calculation as
parallel-restricted and if later anybody has a reason strong enough to
relax the restriction for some particular case, we will consider it.
Seems likely that we'll want this at some point, but certainly seems
like new work and not a small bit of it.
Thanks!
Stephen