BUG #15577: Query returns different results when executed multiple times
The following bug has been logged on the website:
Bug reference: 15577
Logged by: Bartosz Polnik
Email address: bartoszpolnik@gmail.com
PostgreSQL version: 11.1
Operating system: Windows 10
Description:
Hello,
A couple of days ago I stumbled upon a query that doesn't return all rows it
should. When we execute it multiple times, we get only a subset of data. The
query is:
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
We always expect 31 rows in return:
table_a_id | table_c_id | table_b_id
------------+------------+------------
16116185 | 328860 | 2936924
16115788 | 348539 | 2913754
16115788 | 348539 | 3039173
16115788 | 348539 | 2913874
16116256 | 293541 | 2997160
16116256 | 293541 | 2901938
16116256 | 293541 | 2901933
16114789 | 292051 | 3038539
16114813 | 342353 | 3052371
16116069 | 351585 | 3025941
16114814 | 331329 | 2946332
16115861 | 350487 | 2933633
16115106 | 350047 | 2902075
16116066 | 351434 | 3010909
16114811 | 298605 | 2998909
16114811 | 298605 | 2893809
16114811 | 298605 | 2987038
16114811 | 298605 | 3038877
16114811 | 298605 | 3010694
16114811 | 298605 | 2893188
16114811 | 298605 | 2893391
16114811 | 298605 | 2983360
16114811 | 298605 | 3038221
16114811 | 298605 | 3026078
16114811 | 298605 | 2998966
16116260 | 290583 | 2955483
16116260 | 290583 | 2921135
16116260 | 290583 | 2947914
16116260 | 290583 | 2901669
16116241 | 295971 | 3038921
16116249 | 296708 | 3038888
But when we run it using psql, we get different results (not all rows are
returned). Here's the query I used to get the execution plan and example
output from two consecutive plan executions:
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
Nested Loop Left Join (actual time=157.436..329.538 rows=21 loops=1)
-> Gather (actual time=157.407..349.743 rows=21 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=167.978..281.540 rows=11 loops=2)
-> Hash Join (actual time=0.300..18.206 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.107..17.326 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.065..0.065 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.043..0.049 rows=8 loops=2)
Filter: (group = 'A')
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b (actual
time=0.273..0.278 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.019..0.019 rows=1 loops=21)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 1.218 ms
Execution Time: 350.283 ms
Above we only got 21 rows, instead of 31.
Nested Loop Left Join (actual time=164.189..358.068 rows=28 loops=1)
-> Gather (actual time=164.140..370.034 rows=28 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=188.718..306.289 rows=14 loops=2)
-> Hash Join (actual time=0.321..21.860 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.113..20.784 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.063..0.063 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.040..0.049 rows=8 loops=2)
Filter: (group = 'A')
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b (actual
time=0.294..0.300 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.031..0.031 rows=1 loops=28)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 0.971 ms
Execution Time: 371.202 ms
Here we only got 28 rows, instead of 31.
Used indices:
create index table_b_idx_target_id
on test.table_b (target_id)
where (t_active IS TRUE);
create unique index table_a_uq_001
on table_a (table_c_id, date);
And views:
create view test.table_b_active as
SELECT *
FROM test.table_b
WHERE (table_b.t_active IS TRUE);
I ran a couple of tests and couldn't reproduce this problem with
max_parallel_workers=0. With this parameter set, we get the execution
plan:
Nested Loop Left Join (actual time=212.010..509.274 rows=31 loops=1)
-> Gather (actual time=211.976..508.855 rows=31 loops=1)
Workers Planned: 1
Workers Launched: 0
-> Nested Loop (actual time=211.593..508.425 rows=31 loops=1)
-> Hash Join (actual time=0.063..31.628 rows=1890 loops=1)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.042..30.000 rows=3501 loops=1)
Filter: (table_e_id = 4)
Rows Removed by Filter: 98233
-> Hash (actual time=0.016..0.016 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.008..0.012 rows=8 loops=1)
Filter: (group = 'A')
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b (actual
time=0.248..0.251 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.014..0.014 rows=1 loops=31)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 1.135 ms
Execution Time: 509.431 ms
The number of rows returned is correct: 31.
I tried to create a clean schema with test data, but couldn't get the same
execution plan, so I can't include that.
The data above comes from a db snapshot with table, index and view names
changed by hand, so there might be some typos. Sorry about them.
I'm using PostgreSQL 11.1, compiled by Visual C++ build 1914, 64-bit.
Best regards,
Bartosz Polnik
On Mon, 7 Jan 2019 at 05:46, PG Bug reporting form
<noreply@postgresql.org> wrote:
A couple of days ago I stumbled upon a query that doesn't return all rows it
should.
-> Nested Loop (actual time=167.978..281.540 rows=11 loops=2)
-> Nested Loop (actual time=188.718..306.289 rows=14 loops=2)
Above is the first place where the actual row counts diverge. This
makes it appear that it's the join to table_b that's causing the
variation. The parallel worker is meant to be running with the same
snapshot as the main process so that it properly sees the same tuples
as visible. If for some reason that was not working correctly then
that might explain why you get differing results.
Can you confirm if there are concurrent changes being made to table_b
perhaps while the query is running?
Do you still get the variation with an otherwise idle database with no
open transactions that's just received a complete VACUUM?
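Concretely, a check along these lines would rule out concurrent activity (a sketch only; pg_stat_activity is the standard system view, and VACUUM (ANALYZE) is ordinary PostgreSQL 11 syntax):

```sql
-- Look for any other sessions or open transactions in this database:
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE datname = current_database()
  AND pid <> pg_backend_pid();

-- Then vacuum and re-analyze the tables involved before re-running the query:
VACUUM (ANALYZE) test.table_b;
VACUUM (ANALYZE) test.table_c;
VACUUM (ANALYZE) test.table_d;
VACUUM (ANALYZE) test.table_a;
```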
I'm able to produce the same plan as you're getting with the attached
script, but unable to see any sort of row variation. My row counts
don't match yours exactly. If I adjust the rows in table_b too much I
get a completely different plan.
I tested with PostgreSQL 11.1, compiled by Visual C++ build 1916, 64-bit.
I tried to create a clean schema with test data, but couldn't get the same
execution plan, so I can't include that.
If you're removing columns to simplify the test script then you may
need to add some columns back to pad the tuples out a bit in order to
keep the tables around the same size. Variations in the table size
will have an effect on the join costs and could change the join order.
Also, a smaller table_c may not receive a parallel seq scan. In my
mockup, I made table_b and table_a fairly large so as to keep the
joins as parameterized nested loops. I also left out indexes on the
"id" columns to reduce the chances of a Merge Join.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
mockup_bug15577.sql (text/plain; charset=US-ASCII)
Can you confirm if there are concurrent changes being made to table_b
perhaps while the query is running?
I confirm. There are no concurrent changes being made and I'm the only
client connecting to that db.
Do you still get the variation with an otherwise idle database with no
open transactions that's just received a complete VACUUM?
I executed VACUUM (FULL, ANALYZE) on all tables from the query, but
I'm still getting different results.
With your script, I'm always getting the same output, so it must be
something else.
I thought about sharing db dump, but I can't do that due to the company
policy. What I could share though, is anonymized db dump. Do you know if
there is any good tool for performing anonymization?
On Mon, Jan 7, 2019 at 2:57 AM David Rowley <david.rowley@2ndquadrant.com>
wrote:
On Mon, Jan 7, 2019 at 10:29 PM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
Can you confirm if there are concurrent changes being made to table_b
perhaps while the query is running?
I confirm. There are no concurrent changes being made and I'm the only
client connecting to that db.
Do you still get the variation with an otherwise idle database with no
open transactions that's just received a complete VACUUM?
I executed VACUUM (FULL, ANALYZE) on all tables from the query, but I'm
still getting different results.
With your script, I'm always getting the same output, so it must be
something else.
What do you get if you run it with SET force_parallel_mode = on?
--
Thomas Munro
http://www.enterprisedb.com
I've modified the script:
1. I added SET force_parallel_mode = on:
SET force_parallel_mode = on;
explain (costs off, analyze)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
2. I've changed group and group_type to be enums (we have them as enums in
the db):
drop table if exists table_c;
drop table if exists table_d;
drop type if exists "group";
drop type if exists "group_type";
create type group_type AS ENUM ('A', 'B');
create type "group" AS ENUM ('A', 'B');
Updated version of the script is in attachment.
When I execute it, I get:
c:\Program Files\PostgreSQL\11\bin>psql -U postgres -f
"C:\mockup_bug15577.sql" test
DROP VIEW
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TYPE
DROP TYPE
CREATE TABLE
INSERT 0 20000
CREATE TYPE
CREATE TYPE
CREATE TABLE
INSERT 0 200000
CREATE TABLE
INSERT 0 101736
CREATE TABLE
INSERT 0 8
INSERT 0 55
CREATE INDEX
CREATE INDEX
CREATE VIEW
ANALYZE
SET
SET
ALTER TABLE
DROP VIEW
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TYPE
DROP TYPE
CREATE TYPE
CREATE TYPE
CREATE TABLE
INSERT 0 20000
CREATE TABLE
INSERT 0 200000
CREATE TABLE
INSERT 0 101736
CREATE TABLE
INSERT 0 8
INSERT 0 55
CREATE INDEX
CREATE INDEX
CREATE VIEW
ANALYZE
SET
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=1.185..81.170 rows=112 loops=1)
-> Gather (actual time=1.179..95.573 rows=112 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=0.076..7.182 rows=56 loops=2)
-> Hash Join (actual time=0.050..4.592 rows=936 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.004..4.358 rows=1755 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49114
-> Hash (actual time=0.019..0.020 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.011..0.014 rows=8 loops=1)
Filter: ("group" = 'A'::"group")
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.003..0.003 rows=0 loops=1871)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 1
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.003..0.003 rows=0 loops=112)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 0.773 ms
Execution Time: 96.037 ms
(23 rows)
The number of rows doesn't change from execution to execution.
On Mon, Jan 7, 2019 at 1:20 PM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:
Attachments:
On Tue, Jan 8, 2019 at 1:56 AM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
I've modified the script:
1. I Added SET force_parallel_mode = on;:
What if you try that with your original query in the database that
shows the problem? Or alternatively SET parallel_leader_participation
= on. What I'm wondering is whether the parallel worker is
consistently getting the same result, but it's a different result from
the leader, and the variation comes from the race between leader and
worker, or whether the instability comes from somewhere else.
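For reference, the three settings used in this thread isolate which process executes the plan (all are stock PostgreSQL 11 GUCs; a sketch only):

```sql
-- Force a Gather node even where the planner would not normally add one:
SET force_parallel_mode = on;

-- Keep the leader from executing plan nodes itself, so only workers
-- produce rows (with the setting above this yields "Single Copy: true"):
SET parallel_leader_participation = off;

-- Conversely, forbid launching any workers, so only the leader runs the
-- plan ("Workers Launched: 0"):
SET max_parallel_workers = 0;
```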
--
Thomas Munro
http://www.enterprisedb.com
For the following SQL:
SET force_parallel_mode = on;
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
I got:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=145.151..284.612 rows=26 loops=1)
-> Gather (actual time=145.092..304.166 rows=26 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=153.289..247.768 rows=13 loops=2)
-> Hash Join (actual time=0.221..15.028 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.085..14.442 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.044..0.045 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.028..0.033 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.242..0.246 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.016..0.016 rows=1 loops=26)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 5.024 ms
Execution Time: 304.858 ms
On Second execution:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=228.405..279.383 rows=19 loops=1)
-> Gather (actual time=228.376..288.807 rows=19 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=150.785..242.440 rows=10 loops=2)
-> Hash Join (actual time=0.212..14.353 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.075..13.794 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.050..0.050 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.031..0.037 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.237..0.241 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.021..0.021 rows=1 loops=19)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 4.816 ms
Execution Time: 289.503 ms
I executed it a couple of times and saw in the output 26 rows (as above), 21
rows (also above), 28 rows, 28 rows, 29 rows, etc.
The parts that changed (the number of rows) were the Nested Loop Left Join and the Nested Loop.
For the following SQL:
SET parallel_leader_participation = on;
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
I received:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=143.805..279.354 rows=29 loops=1)
-> Gather (actual time=143.735..299.066 rows=29 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=149.334..242.108 rows=15 loops=2)
-> Hash Join (actual time=0.203..14.536 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.072..13.971 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.043..0.043 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.028..0.034 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.237..0.240 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.018..0.018 rows=1 loops=29)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 5.070 ms
Execution Time: 299.849 ms
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=231.714..276.770 rows=26 loops=1)
-> Gather (actual time=231.654..295.971 rows=26 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=151.209..239.965 rows=13 loops=2)
-> Hash Join (actual time=0.214..14.467 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.070..13.923 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.045..0.045 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.028..0.033 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.235..0.238 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.017..0.017 rows=1 loops=26)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 4.976 ms
Execution Time: 296.729 ms
I also ran a test with:
SET parallel_leader_participation = off;
SET force_parallel_mode = on;
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
The output is correct - 31 rows, but the plan is different:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (actual time=263.880..507.668 rows=31 loops=1)
Workers Planned: 1
Workers Launched: 1
Single Copy: true
-> Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
-> Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
-> Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
Hash Cond: (tc.table_d_id = td.id)
-> Seq Scan on table_c tc (actual time=0.128..23.044
rows=3501 loops=1)
Filter: (table_e_id = 4)
Rows Removed by Filter: 98233
-> Hash (actual time=0.064..0.069 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.045..0.052 rows=8 loops=1)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.203..0.206 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.021..0.021 rows=1 loops=31)
Index Cond: ((table_c_id = tc.id) AND (date =
'2018-08-31'::date))
Planning Time: 5.051 ms
Execution Time: 507.858 ms
On Mon, Jan 7, 2019 at 8:59 PM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:
On Tue, Jan 8, 2019 at 10:39 AM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
Gather (actual time=263.880..507.668 rows=31 loops=1)
Workers Planned: 1
Workers Launched: 1
Single Copy: true
-> Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
-> Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
-> Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
Hash Cond: (tc.table_d_id = td.id)
-> Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
Filter: (table_e_id = 4)
Rows Removed by Filter: 98233
-> Hash (actual time=0.064..0.069 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Hmm. So if the plan runs entirely in a worker, we get 31 rows. If
the plan runs entirely in the leader, we get 31 rows. But if it runs
in both, we get fewer. In all cases the Hash Join produces 1890 rows,
and in all cases the Nested Loop probes table_b_idx_target_id for all
of them: "loops=1890". But somehow the output of the Nested Loop is
not 31 rows (or 31 / 2) when divided over two processes.
As a sanity check, can you please see if the run-only-in-leader case
(max_parallel_workers = 0, so that we see "Workers Launched: 0")
produces the *same* 31 rows as the run-only-in-worker case
(force_parallel_mode = on, so that we see "Single Copy: true")? That
is, the actual values of those 31 rows, in particular the columns
coming from table_b.
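One way to do that comparison directly in SQL would be along these lines (a sketch only; /* query */ stands for the full SELECT from the report):

```sql
-- Capture the leader-only run, then the worker-only run, then diff them.
SET max_parallel_workers = 0;                  -- leader-only
CREATE TEMP TABLE leader_rows AS /* query */;

RESET max_parallel_workers;
SET force_parallel_mode = on;                  -- worker-only
SET parallel_leader_participation = off;
CREATE TEMP TABLE worker_rows AS /* query */;

-- EXCEPT shows rows present in one result set but missing from the other:
(TABLE leader_rows EXCEPT TABLE worker_rows)
UNION ALL
(TABLE worker_rows EXCEPT TABLE leader_rows);
```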
--
Thomas Munro
http://www.enterprisedb.com
For:
SET max_parallel_workers = 0;
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
I got:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=196.969..433.340 rows=31 loops=1)
-> Gather (actual time=196.897..432.757 rows=31 loops=1)
Workers Planned: 1
Workers Launched: 0
-> Nested Loop (actual time=196.384..432.197 rows=31 loops=1)
-> Hash Join (actual time=0.104..24.477 rows=1890 loops=1)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.060..23.639 rows=3501 loops=1)
Filter: (table_e_id = 4)
Rows Removed by Filter: 98233
-> Hash (actual time=0.022..0.023 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.009..0.013 rows=8 loops=1)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.212..0.215 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.019..0.019 rows=1 loops=31)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 5.084 ms
Execution Time: 433.630 ms
table_a_id | table_c_id | table_b_id
------------+------------+------------
16116185 | 328860 | 2936924
16115788 | 348539 | 3039173
16115788 | 348539 | 2913874
16115788 | 348539 | 2913754
16116256 | 293541 | 2901938
16116256 | 293541 | 2901933
16116256 | 293541 | 2997160
16114789 | 292051 | 3038539
16114813 | 342353 | 3052371
16116069 | 351585 | 3025941
16114814 | 331329 | 2946332
16115861 | 350487 | 2933633
16115106 | 350047 | 2902075
16116066 | 351434 | 3010909
16114811 | 298605 | 2893809
16114811 | 298605 | 2987038
16114811 | 298605 | 3038877
16114811 | 298605 | 3010694
16114811 | 298605 | 2893188
16114811 | 298605 | 2893391
16114811 | 298605 | 2983360
16114811 | 298605 | 3038221
16114811 | 298605 | 3026078
16114811 | 298605 | 2998966
16114811 | 298605 | 2998909
16116260 | 290583 | 2955483
16116260 | 290583 | 2921135
16116260 | 290583 | 2947914
16116260 | 290583 | 2901669
16116241 | 295971 | 3038921
16116249 | 296708 | 3038888
(31 rows)
When I used force_parallel_mode = on, parallel_leader_participation = off,
and removed the max_parallel_workers line from the script (so that the
default applies), I got:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (actual time=271.255..530.713 rows=31 loops=1)
Workers Planned: 1
Workers Launched: 1
Single Copy: true
-> Nested Loop Left Join (actual time=200.434..437.321 rows=31 loops=1)
-> Nested Loop (actual time=200.126..436.547 rows=31 loops=1)
-> Hash Join (actual time=0.322..25.510 rows=1890 loops=1)
Hash Cond: (tc.table_d_id = td.id)
-> Seq Scan on table_c tc (actual time=0.141..24.501
rows=3501 loops=1)
Filter: (table_e_id = 4)
Rows Removed by Filter: 98233
-> Hash (actual time=0.082..0.084 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.049..0.066 rows=8 loops=1)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.214..0.217 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.024..0.024 rows=1 loops=31)
Index Cond: ((table_c_id = tc.id) AND (date =
'2018-08-31'::date))
Planning Time: 5.641 ms
Execution Time: 530.926 ms
table_a_id | table_c_id | table_b_id
------------+------------+------------
16116185 | 328860 | 2936924
16115788 | 348539 | 3039173
16115788 | 348539 | 2913874
16115788 | 348539 | 2913754
16116256 | 293541 | 2901938
16116256 | 293541 | 2901933
16116256 | 293541 | 2997160
16114789 | 292051 | 3038539
16114813 | 342353 | 3052371
16116069 | 351585 | 3025941
16114814 | 331329 | 2946332
16115861 | 350487 | 2933633
16115106 | 350047 | 2902075
16116066 | 351434 | 3010909
16114811 | 298605 | 2893809
16114811 | 298605 | 2987038
16114811 | 298605 | 3038877
16114811 | 298605 | 3010694
16114811 | 298605 | 2893188
16114811 | 298605 | 2893391
16114811 | 298605 | 2983360
16114811 | 298605 | 3038221
16114811 | 298605 | 3026078
16114811 | 298605 | 2998966
16114811 | 298605 | 2998909
16116260 | 290583 | 2955483
16116260 | 290583 | 2921135
16116260 | 290583 | 2947914
16116260 | 290583 | 2901669
16116241 | 295971 | 3038921
16116249 | 296708 | 3038888
(31 rows)
The rows returned by the two queries are the same.
On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:
On Tue, Jan 8, 2019 at 10:39 AM Bartosz Polnik <bartoszpolnik@gmail.com>
wrote:
Gather (actual time=263.880..507.668 rows=31 loops=1)
Workers Planned: 1
Workers Launched: 1
Single Copy: true
-> Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
-> Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
-> Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
Hash Cond: (tc.table_d_id = td.id)
-> Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
Filter: (table_e_id = 4)
Rows Removed by Filter: 98233
-> Hash (actual time=0.064..0.069 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Hmm. So if the plan runs entirely in a worker, we get 31 rows. If
the plan runs entirely in the leader, we get 31 rows. But if it runs
in both, we get fewer. In all cases the Hash Join produces 1890 rows,
and in all cases the Nested Loop probes table_b_idx_target_id for all
of them: "loops=1890". But somehow the output of the Nested Loop is
not 31 rows (or 31 / 2) when divided over two processes.

As a sanity check, can you please see if the run-only-in-leader case
(max_parallel_workers = 0, so that we see "Workers Launched: 0")
produces the *same* 31 rows as the run-only-in-worker case
(force_parallel_mode = on, so that we see "Single Copy: true")? That
is, the actual values of those 31 rows, in particular the columns
coming from table_b.

--
Thomas Munro
http://www.enterprisedb.com
On Tue, Jan 8, 2019 at 12:04 PM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
As a sanity check, can you please see if the run-only-in-leader case
(max_parallel_workers = 0, so that we see "Workers Launched: 0")
produces the *same* 31 rows as the run-only-in-worker case
(force_parallel_mode = on, so that we see "Single Copy: true")? That
is, the actual values of those 31 rows, in particular the columns
coming from table_b.

The rows returned by the two queries are the same.
Can we please also see the actual output in the broken case, where it
runs in two processes and produces fewer than 31 rows?
--
Thomas Munro
http://www.enterprisedb.com
Query:
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
Output:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=236.640..281.711 rows=26 loops=1)
-> Gather (actual time=236.578..304.194 rows=26 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=154.947..244.097 rows=13 loops=2)
-> Hash Join (actual time=0.223..14.649 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.083..14.102 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.044..0.044 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.028..0.033 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.239..0.242 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.020..0.020 rows=1 loops=26)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 5.047 ms
Execution Time: 304.960 ms
table_a_id | table_c_id | table_b_id
------------+------------+------------
16116185 | 328860 | 2936924
16116256 | 293541 | 2901938
16115788 | 348539 | 3039173
16115788 | 348539 | 2913874
16115788 | 348539 | 2913754
16114813 | 342353 | 3052371
16114789 | 292051 | 3038539
16116069 | 351585 | 3025941
16115861 | 350487 | 2933633
16114814 | 331329 | 2946332
16115106 | 350047 | 2902075
16116260 | 290583 | 2955483
16116066 | 351434 | 3010909
16114811 | 298605 | 2893809
16114811 | 298605 | 2987038
16114811 | 298605 | 3038877
16114811 | 298605 | 3010694
16114811 | 298605 | 2893188
16114811 | 298605 | 2893391
16114811 | 298605 | 2983360
16114811 | 298605 | 3038221
16114811 | 298605 | 3026078
16114811 | 298605 | 2998966
16116249 | 296708 | 3038888
16114811 | 298605 | 2998909
16116241 | 295971 | 3038921
(26 rows)
Here's an example with only 21 rows:
table_a_id | table_c_id | table_b_id
------------+------------+------------
16116185 | 328860 | 2936924
16115788 | 348539 | 3039173
16115788 | 348539 | 2913874
16115788 | 348539 | 2913754
16116256 | 293541 | 2901938
16116256 | 293541 | 2901933
16116256 | 293541 | 2997160
16116069 | 351585 | 3025941
16114789 | 292051 | 3038539
16114813 | 342353 | 3052371
16114814 | 331329 | 2946332
16115861 | 350487 | 2933633
16116066 | 351434 | 3010909
16114811 | 298605 | 2893809
16115106 | 350047 | 2902075
16116241 | 295971 | 3038921
16116260 | 290583 | 2955483
16116249 | 296708 | 3038888
16116260 | 290583 | 2921135
16116260 | 290583 | 2947914
16116260 | 290583 | 2901669
(21 rows)
On Tue, Jan 8, 2019 at 12:39 AM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:
On Tue, 8 Jan 2019 at 13:09, Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
(26 rows)
Here's an example with only 21 rows:
Interestingly between those two results, all your 31 distinct rows
exist. I threw them into a table and did:
# select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;
a | b | c | a | b | c
----------+--------+---------+----------+--------+---------
16114789 | 292051 | 3038539 | 16114789 | 292051 | 3038539
16114811 | 298605 | 2893188 | | |
16114811 | 298605 | 2893391 | | |
16114811 | 298605 | 2893809 | 16114811 | 298605 | 2893809
16114811 | 298605 | 2983360 | | |
16114811 | 298605 | 2987038 | | |
16114811 | 298605 | 2998909 | | |
16114811 | 298605 | 2998966 | | |
16114811 | 298605 | 3010694 | | |
16114811 | 298605 | 3026078 | | |
16114811 | 298605 | 3038221 | | |
16114811 | 298605 | 3038877 | | |
16114813 | 342353 | 3052371 | 16114813 | 342353 | 3052371
16114814 | 331329 | 2946332 | 16114814 | 331329 | 2946332
16115106 | 350047 | 2902075 | 16115106 | 350047 | 2902075
16115788 | 348539 | 2913754 | 16115788 | 348539 | 2913754
16115788 | 348539 | 2913874 | 16115788 | 348539 | 2913874
16115788 | 348539 | 3039173 | 16115788 | 348539 | 3039173
16115861 | 350487 | 2933633 | 16115861 | 350487 | 2933633
16116066 | 351434 | 3010909 | 16116066 | 351434 | 3010909
16116069 | 351585 | 3025941 | 16116069 | 351585 | 3025941
16116185 | 328860 | 2936924 | 16116185 | 328860 | 2936924
16116241 | 295971 | 3038921 | 16116241 | 295971 | 3038921
16116249 | 296708 | 3038888 | 16116249 | 296708 | 3038888
| | | 16116256 | 293541 | 2901933
16116256 | 293541 | 2901938 | 16116256 | 293541 | 2901938
| | | 16116256 | 293541 | 2997160
| | | 16116260 | 290583 | 2901669
| | | 16116260 | 290583 | 2921135
| | | 16116260 | 290583 | 2947914
16116260 | 290583 | 2955483 | 16116260 | 290583 | 2955483
(31 rows)
IOW, it does not seem like there's a particular pattern to what rows are missing.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 8, 2019 at 1:31 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
On Tue, 8 Jan 2019 at 13:09, Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
(26 rows)
Here's an example with only 21 rows:
Interestingly between those two results, all your 31 distinct rows
exist. I threw them into a table and did:
# select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;
IOW, it does not seem like there's a particular pattern to what rows are missing.
Is the pattern something like this? When the index probe of b should
find multiple matches, it's returning only one and then we move onto
the next outer row? For example we see this for table_c_id = 298605,
of which we see 11 examples in the 31-row output (with different
table_b_id values), but only one in the 21-row output, with the first
table_b_id value.
--
Thomas Munro
http://www.enterprisedb.com
On Tue, Jan 8, 2019 at 1:40 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 8, 2019 at 1:31 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
IOW, it does not seem like there's a particular pattern to what rows are missing.

Is the pattern something like this? When the index probe of b should
find multiple matches, it's returning only one and then we move onto
the next outer row? For example we see this for table_c_id = 298605,
of which we see 11 examples in the 31-row output (with different
table_b_id values), but only one in the 21-row output, with the first
table_b_id value.
It's as if it thought it was executing a semi-join.
--
Thomas Munro
http://www.enterprisedb.com
On Tue, 8 Jan 2019 at 13:43, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
It's as if it thought it was executing a semi-join.
EXPLAIN VERBOSE would show the inner_unique flag. If that's set it
would act as a semi-join.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
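The flag David mentions can be checked directly. A minimal sketch, assuming
the same test schema as in this thread; look for "Inner Unique: true" on the
join node in the output:

```sql
-- EXPLAIN VERBOSE prints "Inner Unique: true" on a join node when the
-- planner believes at most one inner row can match each outer row; the
-- executor then stops probing after the first match, like a semi-join.
EXPLAIN (verbose true, costs false)
SELECT tba.id AS table_b_id, tc.id AS table_c_id
FROM test.table_b_active tba
JOIN test.table_c tc ON tba.target_id = tc.id;
```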
On Tue, Jan 8, 2019 at 1:48 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
On Tue, 8 Jan 2019 at 13:43, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
It's as if it thought it was executing a semi-join.
EXPLAIN VERBOSE would show the inner_unique flag. If that's set it
would act as a semi-join.
Hmm, yes, node->js.single_match == true in one (but not both!) process
would fit these results. But I don't see a mechanism for that. I
*guess* it's the worker that is skipping duplicates, because the
leader usually has time to emit a few tuples while the worker's
warming up and we see some duplicates (348539) in the first few results
(at least it usually does on my machines for eager plans, though maybe
on Windows it's different?)
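To make the suspected symptom concrete, here is a toy model (hypothetical
Python, not PostgreSQL executor code) of what a wrongly set single_match
flag would do: the inner scan stops at the first match for each outer row,
which matches the way the 11 rows for table_c_id = 298605 collapse to one.

```python
# Toy nested-loop join. "single_match" mimics the executor's semi-join
# shortcut: after one inner match, move on to the next outer row.
def nested_loop(outer, inner, single_match=False):
    out = []
    for o in outer:
        for key, payload in inner:
            if key == o:
                out.append((o, payload))
                if single_match:
                    break  # skip the remaining duplicate matches
    return out

# table_c_id 298605 has several matching table_b rows (payloads made up).
inner_rows = [(298605, "b1"), (298605, "b2"), (298605, "b3"), (292051, "b4")]

full = nested_loop([298605, 292051], inner_rows)                     # 4 rows
semi = nested_loop([298605, 292051], inner_rows, single_match=True)  # 2 rows
```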
Bartosz, can we please try with force_parallel_mode = off, but also
parallel_leader_participation = off? (I meant to write that in an
earlier email but accidentally wrote "on". Gah.) Then we should get
the same plan (the forced parallel plan is a bit different as you
noted; maybe whatever is broken isn't triggered that way). You might
need to set parallel_workers to 2 on the table for it to pick a
parallel plan without leader participation.
--
Thomas Munro
http://www.enterprisedb.com
Sure.
SET force_parallel_mode = off;
SET parallel_leader_participation = off;
alter table table_c set (parallel_workers=2);
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);
Output:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=203.910..346.758 rows=31 loops=1)
-> Gather (actual time=203.802..373.228 rows=31 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (actual time=157.358..253.888 rows=16 loops=2)
-> Hash Join (actual time=0.365..15.749 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.113..15.043 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.078..0.078 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.058..0.065 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.248..0.252 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.030..0.030 rows=1 loops=31)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 4.923 ms
Execution Time: 374.408 ms
31 rows - correct result.
On Tue, Jan 8, 2019 at 2:30 AM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:
On Tue, Jan 8, 2019 at 10:45 PM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
SET force_parallel_mode = off;
SET parallel_leader_participation = off;
alter table table_c set (parallel_workers=2);
...
31 rows - correct result.
Huh. So now we have:
1. Leader only (max_parallel_workers = 0): OK
2. Worker only (parallel_leader_participation = off, parallel_workers = 2): OK
3. Leader + worker: broken, short results
4. Worker only with slightly different plan (force_parallel_mode = on): OK
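As a recap, a sketch of the settings behind those four cases (assuming the
mockup's table names; each case starts from default settings):

```sql
-- Case 1: leader only -- no workers can be launched.
SET max_parallel_workers = 0;

-- Case 2: workers only -- the leader does not run the parallel subplan;
-- the table needs parallel_workers set for a parallel plan to be chosen.
SET parallel_leader_participation = off;
ALTER TABLE test.table_c SET (parallel_workers = 2);

-- Case 4: ship the whole plan to a single worker ("Single Copy: true").
SET force_parallel_mode = on;

-- Case 3 (the broken one) is simply the defaults: leader and worker
-- both execute the parallel subplan.
```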
Ahh, I just reproduced this. Take mockup_bug15577.sql (Bartosz's
version) and then do this:
insert into table_c select * from table_c;
insert into table_b
select generate_series(1000000, 1000010), 100112, 'A', 99888,
'2018-09-01', true,
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
Now the number of results from the query in the script is not stable,
varying between 235 and 246 results for me. Not sure what's going on
yet.
--
Thomas Munro
http://www.enterprisedb.com
On Wed, Jan 9, 2019 at 2:52 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Now the number of results from the query in the script is not stable,
varying between 235 and 246 results for me. Not sure what's going on
yet.
When there are multiple matches found by the Index Scan, sometimes
only the first one makes it through to the Nested Loop, in the leader.
It's not because of single_match being inappropriately set as I
speculated earlier (though that'd behave similarly I guess) but
because the quals are somehow broken in between invocations so that
ExecScan() filters the rest of the matches out, and then fixed again
on the next rescan. This happens whenever the leader takes a break to
read tuples from parallel workers in between executing the Nested Loop
node, which it often does after reading the first of several tuples
resulting from a single rescan.
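The failure mode described above can be sketched with a toy scan
(hypothetical Python, only to illustrate the symptom, under the assumption
that the qual's expression state is lost after the first tuple of a rescan):
everything after the first match of that rescan gets filtered out.

```python
# Toy model of a scan whose filter ("qual") state is clobbered after the
# first emitted tuple of a rescan, as when the leader pauses to read
# tuples from workers: later matches in the same rescan are rejected.
def rescan(matches, qual, clobbered_after_first):
    emitted = []
    for row in matches:
        if qual(row):
            emitted.append(row)
            if clobbered_after_first:
                qual = lambda r: False  # qual now rejects everything
    return emitted

matches = [2893188, 2893391, 2893809]             # table_b ids for one outer row
healthy = rescan(matches, lambda r: True, False)  # all three survive
broken = rescan(matches, lambda r: True, True)    # only the first survives
```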
What exactly is being corrupted and how, I don't yet know, and I need
to leave this here for today, but that's what I've got so far.
--
Thomas Munro
http://www.enterprisedb.com
Thomas, I tried to reproduce this case with the changes you listed, but
failed to do so: the number of results was the same on each run. Could you
post an updated script as an attachment?
On Wed, Jan 9, 2019 at 5:01 AM Thomas Munro <thomas.munro@enterprisedb.com>
wrote: