[PATCH] Enable min/max optimization for bool_and/bool_or/every
Hi list,
As discussed on the pgsql-general list, the bool_and() and bool_or()
aggregate functions behave exactly like min() and max() would over
booleans. While it's not likely that people would have an appropriate
index on a boolean column, it seems it wouldn't cost us anything to
take advantage of this optimization, as it requires no code changes at
all, simply value changes in the pg_aggregate catalog.
Before:
db=# explain analyze select bool_and(b) from bools;
Aggregate (cost=1693.01..1693.02 rows=1 width=1)
-> Seq Scan on bools (cost=0.00..1443.01 rows=100001 width=1)
Total runtime: 29.736 ms
After:
db=# explain analyze select bool_and(b) from bools;
Result (cost=0.03..0.04 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.03 rows=1 width=1)
-> Index Scan using bools_b_idx on bools
(cost=0.00..3300.28 rows=100001 width=1)
Index Cond: (b IS NOT NULL)
Total runtime: 0.109 ms
Original discussion here:
http://archives.postgresql.org/message-id/CABRT9RAGwQEP+EFhVpZ6=B4cJEcUE2-QCpb_ZdrNPgQNa8xKuA@mail.gmail.com
PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.
Regards,
Marti
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote:
PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.
I don't see a patch attached to this email, so either you forgot to
attach it, or the list ate it somehow.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote:
PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.I don't see a patch attached to this email, so either you forgot to
attach it, or the list ate it somehow.
I forgot to attach it, sorry. Here it is.
Regards,
Marti
Attachments:
bool-minmax-optimization.patchtext/x-patch; charset=US-ASCII; name=bool-minmax-optimization.patchDownload
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 26966d2..29e262f 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -203,9 +203,9 @@ DATA(insert ( 2828 float8_regr_accum float8_covar_samp 0 1022 "{0,0,0,0,0,0}" )
DATA(insert ( 2829 float8_regr_accum float8_corr 0 1022 "{0,0,0,0,0,0}" ));
/* boolean-and and boolean-or */
-DATA(insert ( 2517 booland_statefunc - 0 16 _null_ ));
-DATA(insert ( 2518 boolor_statefunc - 0 16 _null_ ));
-DATA(insert ( 2519 booland_statefunc - 0 16 _null_ ));
+DATA(insert ( 2517 booland_statefunc - 58 16 _null_ ));
+DATA(insert ( 2518 boolor_statefunc - 59 16 _null_ ));
+DATA(insert ( 2519 booland_statefunc - 58 16 _null_ ));
/* bitwise integer */
DATA(insert ( 2236 int2and - 0 21 _null_ ));
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 51ab6e5..7d245d2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -774,16 +774,19 @@ WHERE a.aggfnoid = p.oid AND
(0 rows)
-- Cross-check aggsortop (if present) against pg_operator.
--- We expect to find only "<" for "min" and ">" for "max".
+-- We expect to find "<" for min/bool_and/every and ">" for max/bool_or
SELECT DISTINCT proname, oprname
FROM pg_operator AS o, pg_aggregate AS a, pg_proc AS p
WHERE a.aggfnoid = p.oid AND a.aggsortop = o.oid
ORDER BY 1;
- proname | oprname
----------+---------
- max | >
- min | <
-(2 rows)
+ proname | oprname
+----------+---------
+ bool_and | <
+ bool_or | >
+ every | <
+ max | >
+ min | <
+(5 rows)
-- Check datatypes match
SELECT a.aggfnoid::oid, o.oid
@@ -816,11 +819,14 @@ WHERE a.aggfnoid = p.oid AND a.aggsortop = o.oid AND
amopopr = o.oid AND
amopmethod = (SELECT oid FROM pg_am WHERE amname = 'btree')
ORDER BY 1, 2;
- proname | oprname | amopstrategy
----------+---------+--------------
- max | > | 5
- min | < | 1
-(2 rows)
+ proname | oprname | amopstrategy
+----------+---------+--------------
+ bool_and | < | 1
+ bool_or | > | 5
+ every | < | 1
+ max | > | 5
+ min | < | 1
+(5 rows)
-- Check that there are not aggregates with the same name and different
-- numbers of arguments. While not technically wrong, we have a project policy
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index e29148f..bcd8544 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -626,7 +626,7 @@ WHERE a.aggfnoid = p.oid AND
NOT binary_coercible(p.proargtypes[0], a.aggtranstype);
-- Cross-check aggsortop (if present) against pg_operator.
--- We expect to find only "<" for "min" and ">" for "max".
+-- We expect to find "<" for min/bool_and/every and ">" for max/bool_or
SELECT DISTINCT proname, oprname
FROM pg_operator AS o, pg_aggregate AS a, pg_proc AS p
On Thu, Dec 22, 2011 at 11:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote:
PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.I don't see a patch attached to this email, so either you forgot to
attach it, or the list ate it somehow.I forgot to attach it, sorry. Here it is.
Nice. It doesn't get much simpler than that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Marti Raudsepp <marti@juffo.org> writes:
On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote:
PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.
I don't see a patch attached to this email, so either you forgot to
attach it, or the list ate it somehow.
I forgot to attach it, sorry. Here it is.
I applied this patch, since I was busy applying catalog changes from you
anyway ;-).
I did think of a possible reason to reject the patch: with this change,
the planner will take longer to plan queries involving bool_and() et al,
since planagg.c will spend time looking (usually fruitlessly) for an
index-based plan. I tried this simple test case:
create table t (f1 bool);
\timing
explain select bool_and(f1) from t;
Best-case timings for the EXPLAIN were about 0.480 ms without the patch
and 0.500 ms with, so about a 4% penalty. On more complicated queries
I think the fractional cost would be less. This seemed acceptable to
me, so I went ahead and applied the change, but if anyone wants to
argue about it now's the time.
regards, tom lane
On Wed, Feb 8, 2012 at 19:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I applied this patch, since I was busy applying catalog changes from you
anyway ;-).
Thanks :)
I did think of a possible reason to reject the patch: with this change,
the planner will take longer to plan queries involving bool_and() et al,
since planagg.c will spend time looking (usually fruitlessly) for an
index-based plan.
Good point, I should have done those measurements up front. Anyway,
since I've often noticed \timing to be unreliable for short queries, I
decided to retry your test with pgbench.
Long story short, I measured 27% overhead in the un-indexed column
case and 33% overhead for an indexed column. That's a lot more than I
expected. I even rebuilt and retried a few times to make sure I hadn't
botched something. The benchmark script is attached.
UNPATCHED
select bool_and(b) from unindexed;
tps = 13787.023113 (excluding connections establishing)
tps = 13880.484788 (excluding connections establishing)
tps = 13784.654542 (excluding connections establishing)
select bool_and(b) from indexed;
tps = 12536.650703 (excluding connections establishing)
tps = 12647.767993 (excluding connections establishing)
tps = 12500.956407 (excluding connections establishing)
PATCHED
select bool_and(b) from unindexed;
tps = 10096.834678 (excluding connections establishing)
tps = 10110.182425 (excluding connections establishing)
tps = 10103.904500 (excluding connections establishing)
select bool_and(b) from indexed;
tps = 8373.631407 (excluding connections establishing)
tps = 8659.917173 (excluding connections establishing)
tps = 8473.899896 (excluding connections establishing)
Regards,
Marti