Timestamp conversion can't use index

Started by Bruce Momjianabout 24 years ago10 messages
#1Bruce Momjian
pgman@candle.pha.pa.us

Someone reported to me that they can't get their queries to use indexes.
It turns out this is because timestamp() has pg_proc.proiscachable set
to false in many cases. Date() also has this in some cases.

I realized timestamp() can be called with 'CURRENT_TIMESTAMP', which of
course is not cachable, but when called with a real date, it seems it
would be cachable. However, I seem to remember that the timezone
setting can effect the output, and therefore it isn't cachable, or
something like that.

While the actual conversion call it very minor, there is code in
backend/optimizer/utils/clauses::simplify_op_or_func() that has:

if (!proiscachable)
return NULL;

This prevents index usage for non-cachable functions, as shown below.

The first only does only a date() conversion, the second adds an
interval, which results in a timestamp() conversion. Notice this uses a
sequential scan. The final one avoids timestamp but just adding '1' to
the date value:

test=> EXPLAIN SELECT * FROM test WHERE x = DATE('2001-01-01');
NOTICE: QUERY PLAN:

Index Scan USING i_test ON test (cost=0.00..3.01 ROWS=1 width=208)

EXPLAIN
test=> EXPLAIN SELECT * FROM test WHERE x = DATE('2001-01-01') +
INTERVAL '1 DAY';
NOTICE: QUERY PLAN:

Seq Scan ON test (cost=0.00..26.00 ROWS=5 width=208)

EXPLAIN
test=> EXPLAIN SELECT * FROM test WHERE x = DATE('2001-01-01') + 1;
NOTICE: QUERY PLAN:

Index Scan USING i_test ON test (cost=0.00..3.01 ROWS=1 width=208)

EXPLAIN

Can someone explain the rational between which timestamp/date calls are
cachable and which are not, and whether the cachablility really relates
to index usage or is this just a problem with our having only one
cachable setting for each function? I would like to understand this so
I can formulate a TODO item to document it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#1)
Re: Timestamp conversion can't use index

Someone reported to me that they can't get their queries to use indexes.
It turns out this is because timestamp() has pg_proc.proiscachable set
to false in many cases. Date() also has this in some cases.

Please let me add a reference to this email from Tom Lane:

http://fts.postgresql.org/db/mw/msg.html?mid=1041918

It specifically states:

[More complete] reasonable [cachable] definitions would be:

1. noncachable: must be called every time; not guaranteed to return same
result for same parameters even within a query. random(), timeofday(),
nextval() are examples.

2. fully cachable: function guarantees same result for same parameters
no matter when invoked. This setting allows a call with constant
parameters to be constant-folded on sight.

3. query cachable: function guarantees same result for same parameters
within a single query, or more precisely within a single
CommandCounterIncrement interval. This corresponds to the actual
behavior of functions that execute SELECTs, and it's sufficiently strong
to allow the function result to be used in an indexscan, which is what
we really care about.

Item #2 clearly mentions constant folding, I assume by the optimizer.
What has me confused is why constant folding is needed to perform index
lookups. Can't the executor call the function and then do the index
lookup? Is this just a failing in our executor? Is there a reason
#1-type noncachable functions can't use indexes? Is the timezone
related here?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Thomas Lockhart
lockhart@fourpalms.org
In reply to: Bruce Momjian (#2)
Re: Timestamp conversion can't use index

Someone reported to me that they can't get their queries to use indexes.
It turns out this is because timestamp() has pg_proc.proiscachable set
to false in many cases. Date() also has this in some cases.

Please let me add a reference to this email from Tom Lane:

The functions marked as non-cachable are those that are converting from
data types (such as text for which the input may need to be evaluated
for (at least) that transaction.

What kind of queries against constants are they doing that can't use
SQL-standard syntax to avoid a conversion from another data type?

timestamp('stringy time')

may not be good, but I would think that

timestamp 'timey time'

should let the optimizer use indices just fine. It *could* do some more
constant folding if we had a distinction between functions with
indeterminate side effects (e.g. random()) as opposed to those who just
need to be evaluated once per transaction (say, date/time conversion
functions needing the time zone evaluated).

- Thomas

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: Timestamp conversion can't use index

Bruce Momjian <pgman@candle.pha.pa.us> writes:

What has me confused is why constant folding is needed to perform index
lookups.

You are confused because those aren't related.

The entire notion of an indexscan is predicated on the assumption that
you are comparing all elements of the index to the same comparison
value. Thus for example "x = random()" is not indexable. To use an
indexscan the query planner must be able to determine that the right
hand side will not change over the course of the scan.

Constant-folding requires a stronger assumption: that the result the
function gives when evaluated by the query planner will be the same
result we'd get later (perhaps much later) at execution time.

Since we only have one kind of noncachable function at the moment,
these two restrictions are conflated ... but there should be more than
one kind of noncachable function.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#3)
Re: Timestamp conversion can't use index

Thomas Lockhart <lockhart@fourpalms.org> writes:

timestamp('stringy time')
may not be good, but I would think that
timestamp 'timey time'
should let the optimizer use indices just fine.

Yup. Possibly this should be noted in the FAQ?

Actually,
timestamp('stringy time')
doesn't work at all anymore in 7.2, unless you doublequote the name:

regression=# select timestamp('now');
ERROR: parser: parse error at or near "'"
regression=# select "timestamp"('now');
timestamp
----------------------------
2001-12-26 12:18:07.008337
(1 row)

Another interesting factoid is that "timestamp"('now') does indeed
produce a constant in 7.2, not a runtime evaluation of text_timestamp.
text_timestamp is still considered noncachable, but the expression is
considered to represent timestamp 'now' and not a call of text_timestamp,
presumably because of this change:

2001-10-04 18:06 tgl

* doc/src/sgml/typeconv.sgml, src/backend/commands/indexcmds.c,
src/backend/parser/parse_func.c, src/include/parser/parse_func.h:
Consider interpreting a function call as a trivial
(binary-compatible) type coercion after failing to find an exact
match in pg_proc, but before considering interpretations that
involve a function call with one or more argument type coercions.
This avoids surprises wherein what looks like a type coercion is
interpreted as coercing to some third type and then to the
destination type, as in Dave Blasby's bug report of 3-Oct-01. See
subsequent discussion in pghackers.

So there's more here than meets the eye, but the syntax change from
7.1 to 7.2 is definitely going to warrant a FAQ entry, IMHO.

regards, tom lane

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#5)
Re: Timestamp conversion can't use index

Based on Tom's comments and this email, I am adding this to the TODO
list:

* Add new pg_proc cachable settings to specify whether function can be
evaluated only once or once per query

---------------------------------------------------------------------------

Someone reported to me that they can't get their queries to use indexes.
It turns out this is because timestamp() has pg_proc.proiscachable set
to false in many cases. Date() also has this in some cases.

Please let me add a reference to this email from Tom Lane:

http://fts.postgresql.org/db/mw/msg.html?mid=1041918

It specifically states:

[More complete] reasonable [cachable] definitions would be:

1. noncachable: must be called every time; not guaranteed to return same
result for same parameters even within a query. random(), timeofday(),
nextval() are examples.

2. fully cachable: function guarantees same result for same parameters
no matter when invoked. This setting allows a call with constant
parameters to be constant-folded on sight.

3. query cachable: function guarantees same result for same parameters
within a single query, or more precisely within a single
CommandCounterIncrement interval. This corresponds to the actual
behavior of functions that execute SELECTs, and it's sufficiently strong
to allow the function result to be used in an indexscan, which is what
we really care about.

Item #2 clearly mentions constant folding, I assume by the optimizer.
What has me confused is why constant folding is needed to perform index
lookups. Can't the executor call the function and then do the index
lookup? Is this just a failing in our executor? Is there a reason
#1-type noncachable functions can't use indexes? Is the timezone
related here?

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#7Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Thomas Lockhart (#3)
Re: Timestamp conversion can't use index

Someone reported to me that they can't get their queries to use indexes.
It turns out this is because timestamp() has pg_proc.proiscachable set
to false in many cases. Date() also has this in some cases.

Please let me add a reference to this email from Tom Lane:

The functions marked as non-cachable are those that are converting from
data types (such as text for which the input may need to be evaluated
for (at least) that transaction.

What kind of queries against constants are they doing that can't use
SQL-standard syntax to avoid a conversion from another data type?

They are doing trying to add one day to a date field:

test=> EXPLAIN SELECT * FROM test WHERE x = DATE('2001-01-01');
NOTICE: QUERY PLAN:

Index Scan USING i_test ON test (cost=0.00..3.01 ROWS=1 width=208)

EXPLAIN
test=> EXPLAIN SELECT * FROM test WHERE x = DATE('2001-01-01') +
INTERVAL '1 DAY';
NOTICE: QUERY PLAN:

Seq Scan ON test (cost=0.00..26.00 ROWS=5 width=208)

^^^^^^^^

EXPLAIN
test=> EXPLAIN SELECT * FROM test WHERE x = DATE('2001-01-01') + 1;
NOTICE: QUERY PLAN:

Index Scan USING i_test ON test (cost=0.00..3.01 ROWS=1 width=208)

EXPLAIN

Seems it is an operator that returns a timestamp.

timestamp('stringy time')

may not be good, but I would think that

timestamp 'timey time'

should let the optimizer use indices just fine. It *could* do some more
constant folding if we had a distinction between functions with
indeterminate side effects (e.g. random()) as opposed to those who just
need to be evaluated once per transaction (say, date/time conversion
functions needing the time zone evaluated).

I have added this to the TODO list.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#8Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#5)
Re: Timestamp conversion can't use index

Thomas Lockhart <lockhart@fourpalms.org> writes:

timestamp('stringy time')
may not be good, but I would think that
timestamp 'timey time'
should let the optimizer use indices just fine.

Yup. Possibly this should be noted in the FAQ?

Actually,
timestamp('stringy time')
doesn't work at all anymore in 7.2, unless you doublequote the name:

regression=# select timestamp('now');
ERROR: parser: parse error at or near "'"
regression=# select "timestamp"('now');
timestamp
----------------------------
2001-12-26 12:18:07.008337
(1 row)

I have updated HISTORY and release.sgml Migration sections:

* The timestamp() function is no longer available. Use timestamp
"string" instead, or CAST.

Another interesting factoid is that "timestamp"('now') does indeed
produce a constant in 7.2, not a runtime evaluation of text_timestamp.
text_timestamp is still considered noncachable, but the expression is
considered to represent timestamp 'now' and not a call of text_timestamp,
presumably because of this change:

2001-10-04 18:06 tgl

* doc/src/sgml/typeconv.sgml, src/backend/commands/indexcmds.c,
src/backend/parser/parse_func.c, src/include/parser/parse_func.h:
Consider interpreting a function call as a trivial
(binary-compatible) type coercion after failing to find an exact
match in pg_proc, but before considering interpretations that
involve a function call with one or more argument type coercions.
This avoids surprises wherein what looks like a type coercion is
interpreted as coercing to some third type and then to the
destination type, as in Dave Blasby's bug report of 3-Oct-01. See
subsequent discussion in pghackers.

So there's more here than meets the eye, but the syntax change from
7.1 to 7.2 is definitely going to warrant a FAQ entry, IMHO.

Added to same files:

datatype(const,...) function calls now evaluated earlier

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#8)
Re: Timestamp conversion can't use index

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Added to same files:
datatype(const,...) function calls now evaluated earlier

This is quite wrong, since (a) the change only applies to single-
argument function calls (so, no "..."), (b) the call is not
evaluated "earlier", but "differently", and (c) it doesn't only
apply to constant arguments.

Not sure that I can come up with a one-liner definition of this change,
but the above definitely doesn't do the job.

We already have

Modify type coersion logic to attempt binary-compatible functions first (Tom)

and I'm not sure there is a better one-liner for it.

regards, tom lane

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#9)
Re: Timestamp conversion can't use index

OK, new text:

Some datatype() function calls now evaluated differently

At least it is a warning.

---------------------------------------------------------------------------

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Added to same files:
datatype(const,...) function calls now evaluated earlier

This is quite wrong, since (a) the change only applies to single-
argument function calls (so, no "..."), (b) the call is not
evaluated "earlier", but "differently", and (c) it doesn't only
apply to constant arguments.

Not sure that I can come up with a one-liner definition of this change,
but the above definitely doesn't do the job.

We already have

Modify type coersion logic to attempt binary-compatible functions first (Tom)

and I'm not sure there is a better one-liner for it.

regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026