Incorrect EXPLAIN ANALYZE output in bloom index docs
I was fixing up the patch in [1] with the intention of committing it
when I noticed that there are a few outdated EXPLAIN ANALYZE examples
in the documents for the bloom contrib module.
The example outputs look like they've been created with a 100 thousand
row table, but the commands given are to insert 10 million rows into
that table. I suspect someone did a 100x on the example row count at
some point during development and forgot to update the EXPLAIN output.
The patch I want to commit adds buffer outputs to these EXPLAINs, so I
kinda need to fix these before adding that, otherwise what I want to
add does not make any sense.
Patch attached. I propose to backpatch this fix.
David
Attachments:
v1-0001-Doc-fix-incorrect-EXPLAIN-ANALYZE-output-for-bloo.patchapplication/octet-stream; name=v1-0001-Doc-fix-incorrect-EXPLAIN-ANALYZE-output-for-bloo.patchDownload
From dbb12e9880dc28fc1d89a52ebd3d8e04bee5ca98 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 9 Dec 2024 20:30:22 +1300
Subject: [PATCH v1] Doc: fix incorrect EXPLAIN ANALYZE output for bloom
indexes
It looks like the example case was once modified to increase the number
of rows but the EXPLAIN ANALYZE output wasn't updated to reflect that.
---
doc/src/sgml/bloom.sgml | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
index 19f2b172cc..b6eeaf442a 100644
--- a/doc/src/sgml/bloom.sgml
+++ b/doc/src/sgml/bloom.sgml
@@ -118,11 +118,11 @@ SELECT 10000000
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------&zwsp;-----------------------------------
- Seq Scan on tbloom (cost=0.00..2137.14 rows=3 width=24) (actual time=16.971..16.971 rows=0 loops=1)
+ Seq Scan on tbloom (cost=0.00..213744.00 rows=250 width=24) (actual time=357.059..357.059 rows=0 loops=1)
Filter: ((i2 = 898732) AND (i5 = 123451))
- Rows Removed by Filter: 100000
+ Rows Removed by Filter: 10000000
Planning Time: 0.346 ms
- Execution Time: 16.988 ms
+ Execution Time: 357.076 ms
(5 rows)
</programlisting>
</para>
@@ -136,16 +136,16 @@ CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
pg_size_pretty
----------------
- 3976 kB
+ 386 MB
(1 row)
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------&zwsp;-----------------------------------
- Seq Scan on tbloom (cost=0.00..2137.00 rows=2 width=24) (actual time=12.805..12.805 rows=0 loops=1)
+ Seq Scan on tbloom (cost=0.00..213744.00 rows=2 width=24) (actual time=351.016..351.017 rows=0 loops=1)
Filter: ((i2 = 898732) AND (i5 = 123451))
- Rows Removed by Filter: 100000
+ Rows Removed by Filter: 10000000
Planning Time: 0.138 ms
- Execution Time: 12.817 ms
+ Execution Time: 351.035 ms
(5 rows)
</programlisting>
</para>
@@ -159,19 +159,19 @@ CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
pg_size_pretty
----------------
- 1584 kB
+ 153 MB
(1 row)
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------&zwsp;--------------------------------------------------
- Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.388..0.388 rows=0 loops=1)
+ Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=22.605..22.606 rows=0 loops=1)
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
- Rows Removed by Index Recheck: 29
- Heap Blocks: exact=28
- -> Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.356..0.356 rows=29 loops=1)
+ Rows Removed by Index Recheck: 2300
+ Heap Blocks: exact=2256
+ -> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=1 width=0) (actual time=20.005..20.005 rows=2300 loops=1)
Index Cond: ((i2 = 898732) AND (i5 = 123451))
Planning Time: 0.099 ms
- Execution Time: 0.408 ms
+ Execution Time: 22.632 ms
(8 rows)
</programlisting>
</para>
@@ -197,15 +197,15 @@ CREATE INDEX
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------&zwsp;--------------------------------------------------------
- Bitmap Heap Scan on tbloom (cost=24.34..32.03 rows=2 width=24) (actual time=0.028..0.029 rows=0 loops=1)
+ Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.032..0.033 rows=0 loops=1)
Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
- -> BitmapAnd (cost=24.34..24.34 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
- -> Bitmap Index Scan on btreeidx5 (cost=0.00..12.04 rows=500 width=0) (actual time=0.026..0.026 rows=0 loops=1)
+ -> BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.047..0.047 rows=0 loops=1)
+ -> Bitmap Index Scan on btreeidx5 (cost=0.00..4.52 rows=11 width=0) (actual time=0.026..0.026 rows=7 loops=1)
Index Cond: (i5 = 123451)
- -> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed)
+ -> Bitmap Index Scan on btreeidx2 (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8 loops=1)
Index Cond: (i2 = 898732)
- Planning Time: 0.491 ms
- Execution Time: 0.055 ms
+ Planning Time: 0.264 ms
+ Execution Time: 0.047 ms
(9 rows)
</programlisting>
Although this query runs much faster than with either of the single
--
2.34.1
On Dec 9, 2024, at 15:53, David Rowley <dgrowleyml@gmail.com> wrote:
I was fixing up the patch in [1] with the intention of committing it
when I noticed that there are a few outdated EXPLAIN ANALYZE examples
in the documents for the bloom contrib module.The example outputs look like they've been created with a 100 thousand
row table, but the commands given are to insert 10 million rows into
that table. I suspect someone did a 100x on the example row count at
some point during development and forgot to update the EXPLAIN output.The patch I want to commit adds buffer outputs to these EXPLAINs, so I
kinda need to fix these before adding that, otherwise what I want to
add does not make any sense.Patch attached. I propose to backpatch this fix.
David
<v1-0001-Doc-fix-incorrect-EXPLAIN-ANALYZE-output-for-bloo.patch>
Most changes look good to me. Only two small comments here:
1. Why did the following part change from ‘never executed’ to execute. Why the previous state is `never executed`?
```
- -> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed)
+ -> Bitmap Index Scan on btreeidx2 (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8 loops=1)
```
2. There is one sentence in the old one that says, `Although this query runs much faster than with either of the single indexes, we pay a penalty in index size. Each of the single-column btree indexes occupies 2 MB, so the total space needed is 12 MB, eight times the space used by the bloom index.` I think the size also needs to be changed.
- Yan
On Tue, 10 Dec 2024 at 14:33, Yan Chengpeng <chengpeng_yan@outlook.com> wrote:
Most changes look good to me. Only two small comments here:
1. Why did the following part change from ‘never executed’ to execute. Why the previous state is `never executed`? ``` - -> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed) + -> Bitmap Index Scan on btreeidx2 (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8 loops=1) ```
This was me just aligning the output I observed with the documents. I
expect it previously stated "(never executed)" because the smaller
number of rows made it less likely that a matching row was found.
Since I ran the queries with 100x more rows than what the previous
output had shown, it's much more likely that one of the random numbers
generated during the INSERT matched the WHERE clause.
As for why the short-circuit was previously hit and execution was
skipped for that node, that's because of the following nodeBitmapAnd.c
code:
/*
* If at any stage we have a completely empty bitmap, we can fall out
* without evaluating the remaining subplans, since ANDing them can no
* longer change the result. (Note: the fact that indxpath.c orders
* the subplans by selectivity should make this case more likely to
* occur.)
*/
if (tbm_is_empty(result))
break;
i.e, if we find a BitmapAnd child with an empty bitmap result, there's
no point in executing any more children as (logically) it's guaranteed
that a bitwise-AND with the empty set results in an empty set.
2. There is one sentence in the old one that says, `Although this query runs much faster than with either of the single indexes, we pay a penalty in index size. Each of the single-column btree indexes occupies 2 MB, so the total space needed is 12 MB, eight times the space used by the bloom index.` I think the size also needs to be changed.
Thanks for checking. I adjusted this earlier and pushed the result
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=36d0229b8ff5907
David
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=36d0229b8ff5907