Wrong example in the bloom documentation

Started by Daniel Westermann (DWE)almost 6 years ago15 messageshackersdocs

daniel.westermann@dbi-services.com

almost 6 years ago

hackersdocs

Hi,

I've briefly discussed this with Bruce some time ago in [1]/messages/by-id/20191105231854.GA26542@momjian.us.
Replaying the example referenced in the documentation does not give a Bitmap Heap Scan on tbloom but a parallel seq scan with the default configuration:

-- tested on head
postgres=# CREATE TABLE tbloom AS
postgres-# SELECT
postgres-# (random() * 1000000)::int as i1,
postgres-# (random() * 1000000)::int as i2,
postgres-# (random() * 1000000)::int as i3,
postgres-# (random() * 1000000)::int as i4,
postgres-# (random() * 1000000)::int as i5,
postgres-# (random() * 1000000)::int as i6
postgres-# FROM
postgres-# generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127220.00 rows=250 width=24) (actual time=2134.851..2221.836 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=104 width=24) (actual time=1770.691..1770.692 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning Time: 0.895 ms
JIT:
Functions: 6
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 65.512 ms, Inlining 0.000 ms, Optimization 46.328 ms, Emission 40.658 ms, Total 152.499 ms
Execution Time: 2288.056 ms
(12 rows)

As bloom was introduced in 9.6 I quickly tried with 9.6.17 and indeed for this version the example is correct:
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
(1 row)
postgres=# CREATE TABLE tbloom AS
postgres-# SELECT
postgres-# (random() * 1000000)::int as i1,
postgres-# (random() * 1000000)::int as i2,
postgres-# (random() * 1000000)::int as i3,
postgres-# (random() * 1000000)::int as i4,
postgres-# (random() * 1000000)::int as i5,
postgres-# (random() * 1000000)::int as i6
postgres-# FROM
postgres-# generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbloom (cost=178436.06..179392.83 rows=250 width=24) (actual time=2279.363..2279.363 rows=0 loops=1)
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Index Recheck: 2329
Heap Blocks: exact=2288
-> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=250 width=0) (actual time=994.406..994.406 rows=2329 loops=1)
Index Cond: ((i2 = 898732) AND (i5 = 123451))
Planning time: 282.059 ms
Execution time: 2286.138 ms
(8 rows)

The reason is that parallel execution is disabled by default in 9.6, and if that is turned on the plan changes there as well:

postgres=# set max_parallel_workers_per_gather = 2;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127194.29 rows=1 width=24) (actual time=1148.047..1148.206 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbloom (cost=0.00..126194.19 rows=1 width=24) (actual time=1039.501..1039.501 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning time: 0.580 ms
Execution time: 1148.247 ms
(8 rows)

Starting with PostgreSQL 10 the example in the documentation is therefore wrong. Attached a proposal to fix this. The new example starts with 100x reduced rows (as suggested by Bruce in [1]/messages/by-id/20191105231854.GA26542@momjian.us and adds a note that the behavior changes as soon as parallel execution is cheaper than the index access.

Thoughts?

Regards
Daniel

[1]: /messages/by-id/20191105231854.GA26542@momjian.us

Daniel Westermann (DWE)

daniel.westermann@dbi-services.com

almost 6 years ago

In reply to: Daniel Westermann (DWE) (#1)

hackersdocs

Re: Wrong example in the bloom documentation

Hi,

any thoughts on this?

Regards
Daniel

________________________________
From: Daniel Westermann (DWE)
Sent: Sunday, September 27, 2020 17:58
To: Pg Docs <pgsql-docs@lists.postgresql.org>
Subject: Wrong example in the bloom documentation

Hi,

The reason is that parallel execution is disabled by default in 9.6, and if that is turned on the plan changes there as well:

Thoughts?

Regards
Daniel

[1]: /messages/by-id/20191105231854.GA26542@momjian.us

Daniel Westermann (DWE)

daniel.westermann@dbi-services.com

almost 6 years ago

In reply to: Daniel Westermann (DWE) (#1)

hackersdocs

Hi,

as this does not get any attention on the docs-list, once again here.
Any thoughts on this?

Regards
Daniel

From: Daniel Westermann (DWE)
Sent: Sunday, September 27, 2020 17:58
To: Pg Docs <pgsql-docs@lists.postgresql.org>
Subject: Wrong example in the bloom documentation

Hi,

-- tested on head
postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127220.00 rows=250 width=24) (actual time=2134.851..2221.836 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=104 width=24) (actual time=1770.691..1770.692 rows=0 loops=3)
         Filter: ((i2 = 898732) AND (i5 = 123451))
         Rows Removed by Filter: 3333333
Planning Time: 0.895 ms
JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 65.512 ms, Inlining 0.000 ms, Optimization 46.328 ms, Emission 40.658 ms, Total 152.499 ms
Execution Time: 2288.056 ms
(12 rows)

As bloom was introduced in 9.6 I quickly tried with 9.6.17 and indeed for this version the example is correct:
postgres=# select version();
                                                 version
----------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
(1 row)
postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbloom (cost=178436.06..179392.83 rows=250 width=24) (actual time=2279.363..2279.363 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2329
   Heap Blocks: exact=2288
   -> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=250 width=0) (actual time=994.406..994.406 rows=2329 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
Planning time: 282.059 ms
Execution time: 2286.138 ms
(8 rows)

The reason is that parallel execution is disabled by default in 9.6, and if that is turned on the plan changes there as well:

postgres=# set max_parallel_workers_per_gather = 2;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127194.29 rows=1 width=24) (actual time=1148.047..1148.206 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on tbloom (cost=0.00..126194.19 rows=1 width=24) (actual time=1039.501..1039.501 rows=0 loops=3)
         Filter: ((i2 = 898732) AND (i5 = 123451))
         Rows Removed by Filter: 3333333
Planning time: 0.580 ms
Execution time: 1148.247 ms
(8 rows)

Thoughts?

Regards
Daniel

[1]: /messages/by-id/20191105231854.GA26542@momjian.us

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Daniel Westermann (DWE) (#3)

hackersdocs

Re: Wrong example in the bloom documentation

On Thu, Oct 8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:

Hi,

as this does not get any attention on the docs-list, once again here.
Any thoughts on this?

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

Daniel Westermann (DWE)

daniel.westermann@dbi-services.com

almost 6 years ago

In reply to: Bruce Momjian (#4)

hackersdocs

Re: Wrong example in the bloom documentation

Hi Bruce,

On Thu, Oct 8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:

Hi,

as this does not get any attention on the docs-list, once again here.
Any thoughts on this?

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

Regards
Daniel

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Daniel Westermann (DWE) (#5)

hackersdocs

Re: Wrong example in the bloom documentation

On Thu, Oct 8, 2020 at 06:12:47PM +0000, Daniel Westermann (DWE) wrote:

Hi Bruce,

On Thu, Oct 8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:

Hi,

as this does not get any attention on the docs-list, once again here.
Any thoughts on this?

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in
10.

OK, so the patch should be applied only to PG 10 and later?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Daniel Westermann (DWE) (#5)

hackersdocs

Re: Wrong example in the bloom documentation

"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

The patch assumes that parallel query is enabled, which is not true by
default before v10, so it should certainly not be applied before v10
(at least not without significant revisions).

regards, tom lane

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Tom Lane (#7)

hackersdocs

Re: Wrong example in the bloom documentation

On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote:

"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

The patch assumes that parallel query is enabled, which is not true by
default before v10, so it should certainly not be applied before v10
(at least not without significant revisions).

I think we should just go for simple application cases for this.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

Daniel Westermann (DWE)

daniel.westermann@dbi-services.com

almost 6 years ago

In reply to: Bruce Momjian (#8)

hackersdocs

Re: Wrong example in the bloom documentation

Hi Bruce, Tom,

On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote:

"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

The patch assumes that parallel query is enabled, which is not true by
default before v10, so it should certainly not be applied before v10
(at least not without significant revisions).

Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwise the whole example needs to be rewritten.

Regards
Daniel

#10

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Daniel Westermann (DWE) (#9)

hackersdocs

Re: Wrong example in the bloom documentation

On Fri, Oct 9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:

Hi Bruce, Tom,

On Thu, Octï¿½ 8, 2020 at 03:43:32PM -0400, Tom Lane wrote:

"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

The patch assumes that parallel query is enabled, which is not true by
default before v10, so it should certainly not be applied before v10
(at least not without significant revisions).

Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwise the whole example needs to be rewritten.

Agreed.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#11

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Bruce Momjian (#10)

hackersdocs

Re: Wrong example in the bloom documentation

On Fri, Oct 9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:

On Fri, Oct 9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:

Hi Bruce, Tom,

On Thu, Octï¿½ 8, 2020 at 03:43:32PM -0400, Tom Lane wrote:

"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

The patch assumes that parallel query is enabled, which is not true by
default before v10, so it should certainly not be applied before v10
(at least not without significant revisions).

Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwise the whole example needs to be rewritten.

Agreed.

This is not applying to PG 12 or earlier because the patch mentions JIT,
which was only mentioned in the PG bloom docs in PG 13+.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#12

Daniel Westermann (DWE)

daniel.westermann@dbi-services.com

almost 6 years ago

In reply to: Bruce Momjian (#11)

hackersdocs

Re: Wrong example in the bloom documentation

On Fri, Oct 9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:

On Fri, Oct 9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:

Hi Bruce, Tom,

On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote:

"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions?
Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

The patch assumes that parallel query is enabled, which is not true by
default before v10, so it should certainly not be applied before v10
(at least not without significant revisions).

Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwise the whole example needs to be rewritten.

Agreed.

This is not applying to PG 12 or earlier because the patch mentions JIT,
which was only mentioned in the PG bloom docs in PG 13+.

Does that mean we need separate patches for each release starting with 10?
As I am not frequently writing patches, I would need some help here.

Regards
Daniel

#13

Bruce Momjian

bruce@momjian.us

over 5 years ago

In reply to: Daniel Westermann (DWE) (#12)

hackersdocs

Re: Wrong example in the bloom documentation

On Sat, Oct 17, 2020 at 01:50:26PM +0000, Daniel Westermann (DWE) wrote:

On Fri, Octï¿½ 9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:

This is not applying to PG 12 or earlier because the patch mentions JIT,
which was only mentioned in the PG bloom docs in PG 13+.

Does that mean we need separate patches for each release starting with 10?
As I am not frequently writing patches, I would need some help here.

I can regenerate the output for older versions using your patch.
However, I am confused about the parallelism you are seeing. Your patch
shows:

Without the two indexes being created, a parallel sequential scan will happen for the query below:
-------------------
<programlisting>
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on tbloom (cost=0.00..214.00 rows=1 width=24) (actual time=2.729..2.731 rows=0 loops=1)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 10000
Planning Time: 0.257 ms
Execution Time: 2.764 ms
(5 rows)

However, I don't see any parallelism in this output. Also, I don't see
any parallelism once the indexes are created. What PG version is this?
and what config settings did you use? Thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#14

Bruce Momjian

bruce@momjian.us

over 5 years ago

In reply to: Bruce Momjian (#13)

hackersdocs

Re: Wrong example in the bloom documentation

On Mon, Oct 26, 2020 at 05:04:09PM -0400, Bruce Momjian wrote:

On Sat, Oct 17, 2020 at 01:50:26PM +0000, Daniel Westermann (DWE) wrote:

On Fri, Octï¿½ 9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:

This is not applying to PG 12 or earlier because the patch mentions JIT,
which was only mentioned in the PG bloom docs in PG 13+.

Does that mean we need separate patches for each release starting with 10?
As I am not frequently writing patches, I would need some help here.

I can regenerate the output for older versions using your patch.
However, I am confused about the parallelism you are seeing. Your patch
shows:

Without the two indexes being created, a parallel sequential scan will happen for the query below:
-------------------
<programlisting>
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on tbloom (cost=0.00..214.00 rows=1 width=24) (actual time=2.729..2.731 rows=0 loops=1)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 10000
Planning Time: 0.257 ms
Execution Time: 2.764 ms
(5 rows)

However, I don't see any parallelism in this output. Also, I don't see
any parallelism once the indexes are created. What PG version is this?
and what config settings did you use? Thanks.

I figured it out --- you have to use the larger generate_series value to
get the parallel output. I have adjusted all the docs back to 9.6 to
show accurate output for that version, and simplified the query
ordering --- patch to master attached. The other releases are similar.
Daniel, please let me know if I have left out any details.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#15

Daniel Westermann (DWE)

daniel.westermann@dbi-services.com

over 5 years ago

In reply to: Bruce Momjian (#14)

hackersdocs

Re: Wrong example in the bloom documentation

I figured it out --- you have to use the larger generate_series value to
get the parallel output. I have adjusted all the docs back to 9.6 to
show accurate output for that version, and simplified the query
ordering --- patch to master attached. The other releases are similar.
Daniel, please let me know if I have left out any details.

Thanks for your support on this. Looks good to me.

Regards
Daniel

Wrong example in the bloom documentation

Attachments:

Attachments:

Attachments: