Re: ANALYZE versus expression indexes with nondefault opckeytype

Started by Kevin Grittnerover 15 years ago8 messages
#1Kevin Grittner
Kevin.Grittner@wicourts.gov

Robert Haas 07/31/10 12:33 PM >>>

Tom Lane wrote:

Robert Haas writes:

I think this whole discussion is starting with the wrong premise.
This is not a bug fix; therefore, it's 9.1 material.

Failing to store stats isn't a bug?

Well, it kind of sounds more like you're removing a known
limitation than fixing a bug.

It's operating as designed and documented. There is room for
enhancement, but the only thing which could possibly justify this as
9.0 material is if there was a demonstrated performance regression in
9.0 for which this was the safest cure.

-Kevin

#2Stephen Frost
sfrost@snowman.net
In reply to: Kevin Grittner (#1)
Re: ANALYZE versus expression indexes with nondefault opckeytype

* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:

Robert Haas 07/31/10 12:33 PM >>>

Tom Lane wrote:

Failing to store stats isn't a bug?

Well, it kind of sounds more like you're removing a known
limitation than fixing a bug.

It's operating as designed and documented. There is room for
enhancement, but the only thing which could possibly justify this as
9.0 material is if there was a demonstrated performance regression in
9.0 for which this was the safest cure.

I have to disagree with this, to be honest. The fact that we've
documented what is completely unexpected and frustrating behaviour
doesn't mean we get to say it's not a bug. Not collecting stats, at
all, is a pretty bad bug, in my view. Stats are an important part of
the system which needs to work at least decently. Perhaps before it was
pretty rare that we'd have the situation described (before we brought in
tsearch2), but it's not any longer and we need to support it as we would
the other types. The only reason I'm against backpatching it to the
beginning is that it's either an ABI change or some rather grotty code,
and even then it wouldn't be hard to push me to accepting the grotty
code if we make the cleaner change for 9.0 and going forward, especially
as we have people in the wild being affected by it.

Certain other databases have done a very good job of documenting their
bugs and in some cases even calling them features. I'd rather we not go
down that path. I don't see the lack of stats collecting to be a simple
'limitation'.

Thanks,

Stephen

#3Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#2)
Re: ANALYZE versus expression indexes with nondefault opckeytype

On Sat, Jul 31, 2010 at 9:16 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:

Robert Haas  07/31/10 12:33 PM >>>

Tom Lane  wrote:

Failing to store stats isn't a bug?

Well, it kind of sounds more like you're removing a known
limitation than fixing a bug.

It's operating as designed and documented.  There is room for
enhancement, but the only thing which could possibly justify this as
9.0 material is if there was a demonstrated performance regression in
9.0 for which this was the safest cure.

I have to disagree with this, to be honest.  The fact that we've
documented what is completely unexpected and frustrating behaviour
doesn't mean we get to say it's not a bug.  Not collecting stats, at
all, is a pretty bad bug, in my view.

I guess I'd appreciate it if someone could explain in more detail in
what cases we fail to collect stats. Do we have a typanalyze function
here that can't possibly work for anything, ever? Or is it just some
subset of the cases?

(Apologies if this has been discussed on the original thread; I was
unable to find it in the archives.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: ANALYZE versus expression indexes with nondefault opckeytype

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Jul 31, 2010 at 9:16 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:

Robert Haas �07/31/10 12:33 PM >>>

Tom Lane �wrote:

Failing to store stats isn't a bug?

Well, it kind of sounds more like you're removing a known
limitation than fixing a bug.

It's operating as designed and documented.

I have to disagree with this, to be honest. �The fact that we've
documented what is completely unexpected and frustrating behaviour
doesn't mean we get to say it's not a bug. �Not collecting stats, at
all, is a pretty bad bug, in my view.

I'm a bit bemused by the claim that this behavior is "documented". One
comment buried deep in the bowels of the source is not user-visible
documentation in my book.

I guess I'd appreciate it if someone could explain in more detail in
what cases we fail to collect stats. Do we have a typanalyze function
here that can't possibly work for anything, ever? Or is it just some
subset of the cases?

ANALYZE normally collects stats for any expression that there is an
expression index for. However, it will punt and fail to collect stats
if the expression index uses an opclass whose opckeytype (ie, storage
datatype) is different from the actual expression datatype. A quick
look into the system catalogs shows that that applies to these opclasses:

amname | opcname | opcintype | opckeytype
--------+------------------+-------------------------------+-----------------------------
btree | name_ops | name | cstring
gist | point_ops | point | box
gist | poly_ops | polygon | box
gist | circle_ops | circle | box
gin | _int4_ops | integer[] | integer
gin | _text_ops | text[] | text
gin | _abstime_ops | abstime[] | abstime
gin | _bit_ops | bit[] | bit
gin | _bool_ops | boolean[] | boolean
gin | _bpchar_ops | character[] | character
gin | _bytea_ops | bytea[] | bytea
gin | _char_ops | "char"[] | "char"
gin | _cidr_ops | cidr[] | cidr
gin | _date_ops | date[] | date
gin | _float4_ops | real[] | real
gin | _float8_ops | double precision[] | double precision
gin | _inet_ops | inet[] | inet
gin | _int2_ops | smallint[] | smallint
gin | _int8_ops | bigint[] | bigint
gin | _interval_ops | interval[] | interval
gin | _macaddr_ops | macaddr[] | macaddr
gin | _name_ops | name[] | name
gin | _numeric_ops | numeric[] | numeric
gin | _oid_ops | oid[] | oid
gin | _oidvector_ops | oidvector[] | oidvector
gin | _time_ops | time without time zone[] | time without time zone
gin | _timestamptz_ops | timestamp with time zone[] | timestamp with time zone
gin | _timetz_ops | time with time zone[] | time with time zone
gin | _varbit_ops | bit varying[] | bit varying
gin | _varchar_ops | character varying[] | character varying
gin | _timestamp_ops | timestamp without time zone[] | timestamp without time zone
gin | _money_ops | money[] | money
gin | _reltime_ops | reltime[] | reltime
gin | _tinterval_ops | tinterval[] | tinterval
gist | tsvector_ops | tsvector | gtsvector
gin | tsvector_ops | tsvector | text
gist | tsquery_ops | tsquery | bigint
(37 rows)

Now, of the above the only cases where we'd be likely to be able to do
anything very useful with stats on the expression value are the name
case, which isn't that exciting in practice, and the tsvector cases.
For tsvector it was only with 8.4 that we had non-toy stats code, so
while the limitation is ancient it's only recently that it started to be
meaningful.

I don't think this can be claimed to be a corner case. If you set up
an FTS index according to the first alternative offered in

http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

you will find that the system fails to collect stats for it and so you
get stupid default estimates for your FTS queries. If this were a
"documented" limitation I'd expect to see a big red warning there to
*not* do it that way. The only way that you actually get usable
tsvector stats at the moment is to explicitly store the tsvector as an
ordinary column, as in the second approach offered in the above
documentation section.

regards, tom lane

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: ANALYZE versus expression indexes with nondefault opckeytype

On Sat, Jul 31, 2010 at 11:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Now, of the above the only cases where we'd be likely to be able to do
anything very useful with stats on the expression value are the name
case, which isn't that exciting in practice, and the tsvector cases.
For tsvector it was only with 8.4 that we had non-toy stats code, so
while the limitation is ancient it's only recently that it started to be
meaningful.

I don't think this can be claimed to be a corner case.  If you set up
an FTS index according to the first alternative offered in

http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

you will find that the system fails to collect stats for it and so you
get stupid default estimates for your FTS queries.  If this were a
"documented" limitation I'd expect to see a big red warning there to
*not* do it that way.  The only way that you actually get usable
tsvector stats at the moment is to explicitly store the tsvector as an
ordinary column, as in the second approach offered in the above
documentation section.

Yeah, maybe you're right. But I'd still prefer to see us break the
ABI and do this just in 9.0 rather than changing 8.4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#6Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#1)

Tom Lane wrote:
Robert Haas writes:

I guess I'd appreciate it if someone could explain in more detail
in what cases we fail to collect stats.

[detailed description]

I don't think this can be claimed to be a corner case. If you set
up an FTS index according to the first alternative offered in

http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

you will find that the system fails to collect stats for it and so
you get stupid default estimates for your FTS queries.

Objection to a fix in 9.0 withdrawn. No opinion on backpatching.

-Kevin

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: ANALYZE versus expression indexes with nondefault opckeytype

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Jul 31, 2010 at 11:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't think this can be claimed to be a corner case. �If you set up
an FTS index according to the first alternative offered in

http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

you will find that the system fails to collect stats for it and so you
get stupid default estimates for your FTS queries.

Yeah, maybe you're right. But I'd still prefer to see us break the
ABI and do this just in 9.0 rather than changing 8.4.

OK, I can live with that. I'll take a look at it shortly.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: ANALYZE versus expression indexes with nondefault opckeytype

I wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Yeah, maybe you're right. But I'd still prefer to see us break the
ABI and do this just in 9.0 rather than changing 8.4.

OK, I can live with that. I'll take a look at it shortly.

Proposed patch attached (compiles, untested as yet).

regards, tom lane