empty string causes planner to avoid index. Makes me sad.

Started by Jeff Amielover 16 years ago9 messagesgeneral
Jump to latest
#1Jeff Amiel
becauseimjeff@yahoo.com

PostgreSQL 8.2.12 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)

CREATE TABLE items
(
field1 character(9) NOT NULL,
field2 character varying(17) NOT NULL
};

CREATE INDEX "field1-field2"
ON items
USING btree
(field1, field2);

About 15 million rows in the items table.

explain select count(*) from items where field1 = '102100400' and field2 = '';

Aggregate (cost=231884.57..231884.57 rows=1 width=0)
-> Bitmap Heap Scan on items (cost=4286.53..231841.95 rows=170468 width=0)
Recheck Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ''::text))
-> Bitmap Index Scan on "field1-field2-check" (cost=0.00..4282.27 rows=170468 width=0)
Index Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ''::text))

explain select count(*) from items where field1 = '102100400' and field2 = ' '; /*17 spaces*/

Aggregate (cost=34.83..34.83 rows=1 width=0)
-> Index Scan using "field1-field2" on items (cost=0.00..34.82 rows=18 width=0)
Index Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ' '::text))

If I have any value in field2 other than an empty string '' (like '1' or 'space'), it will use the index.
It appears that somehow the empty string is causing the planner to abandon the index.

Can I get any insights into this?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Amiel (#1)
Re: empty string causes planner to avoid index. Makes me sad.

Jeff Amiel <becauseimjeff@yahoo.com> writes:

It appears that somehow the empty string is causing the planner to abandon the index.

You didn't actually show us such a case...

regards, tom lane

#3Jeff Amiel
becauseimjeff@yahoo.com
In reply to: Tom Lane (#2)
Re: empty string causes planner to avoid index. Makes me sad.

hmm...ok...planner is not using the index effectively (as effectively as when a non-empty value is passed in)

--- On Fri, 11/27/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text

From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: [GENERAL] empty string causes planner to avoid index. Makes me sad.
To: "Jeff Amiel" <becauseimjeff@yahoo.com>
Cc: pgsql-general@postgresql.org
Date: Friday, November 27, 2009, 3:14 PM
Jeff Amiel <becauseimjeff@yahoo.com>
writes:

It appears that somehow the empty string is causing

the planner to abandon the index.

You didn't actually show us such a case...

           
regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Amiel (#3)
Re: empty string causes planner to avoid index. Makes me sad.

Jeff Amiel <becauseimjeff@yahoo.com> writes:

hmm...ok...planner is not using the index effectively (as effectively as when a non-empty value is passed in)

You didn't show us any evidence of that, either. Both of your test
cases are using the index.

regards, tom lane

#5Jeff Amiel
becauseimjeff@yahoo.com
In reply to: Tom Lane (#4)
Re: empty string causes planner to avoid index. Makes me sad.
--- On Fri, 11/27/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You didn't show us any evidence of that, either.  Both
of your test
cases are using the index.

Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY higher than when not. Wouldn't seem that the planner is using the index effectively.

Aggregate (cost=231884.57..231884.57 rows=1 width=0)

versus

Aggregate (cost=34.83..34.83 rows=1 width=0)

By

#6Guillaume Lelarge
guillaume@lelarge.info
In reply to: Jeff Amiel (#5)
Re: empty string causes planner to avoid index. Makes me sad.

Le vendredi 27 novembre 2009 à 23:32:14, Jeff Amiel a écrit :

--- On Fri, 11/27/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You didn't show us any evidence of that, either. Both
of your test
cases are using the index.

Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY
higher than when not. Wouldn't seem that the planner is using the index
effectively.

Aggregate (cost=231884.57..231884.57 rows=1 width=0)

versus

Aggregate (cost=34.83..34.83 rows=1 width=0)

But in the first example (the empty string one), it fetched 170468 rows from
the index, and in the second one (the 17-spaces string), it fetched only 18
rows. It seems quite normal that the first one is costier then the second one.

--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com

#7Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Jeff Amiel (#5)
Re: empty string causes planner to avoid index. Makes me sad.

On Fri, 27 Nov 2009, Jeff Amiel wrote:

--- On Fri, 11/27/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You didn't show us any evidence of that, either.� Both
of your test
cases are using the index.

Ok...third try. The cost when passing in an empty string is
SIGNIFICANTLY higher than when not. Wouldn't seem that the planner is
using the index effectively.

But it's also estimating that it's aggregating over around 10000 times as
many rows presumably because it thinks empty string is alot more common.
That might not be the case in the actual data, but the estimated
difference is the likely cause of the plan differences. What are the
actual runtimes and rowcounts for the queries with different values you're
trying? Explain analyze output would be useful for that.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Amiel (#5)
Re: empty string causes planner to avoid index. Makes me sad.

Jeff Amiel <becauseimjeff@yahoo.com> writes:

Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY higher than when not.

That just reflects the fact that it's expecting a lot more rows matching
that query. I suppose this is because the statistics show you've got a
lot more rows containing the empty string than other values.

regards, tom lane

#9Steve Crawford
scrawford@pinpointresearch.com
In reply to: Tom Lane (#8)
Re: empty string causes planner to avoid index. Makes me sad.

Tom Lane wrote:

Jeff Amiel <becauseimjeff@yahoo.com> writes:

Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY higher than when not.

That just reflects the fact that it's expecting a lot more rows matching
that query. I suppose this is because the statistics show you've got a
lot more rows containing the empty string than other values.

If you believe the statistics the planner is using are not a useful
approximation of your data, you can try raising the
default_statistics_target. IIRC, it was 10 in that version of PG but has
been raised to 100 in the latest version as the improvement due to
additional data available to the planner seems to typically outweigh the
overhead of collecting/storing/processing the additional stats.

Also, are you sure that the table is being analyzed either by autovacuum
or manually (if you analyze your table, does the explain change
significantly?).

Cheers,
Steve