indexes and floats

Started by Vince Vielhaberover 27 years ago32 messageshackers

vev@michvhf.com

over 27 years ago

Is this a bug that the index doesn't work on floats or is it a datatype
mismatch thing?

-----

template1=> create table foo (x float4, y float4);
CREATE
template1=> create index foox on foo(x);
CREATE
template1=> create index fooy on foo(y);
CREATE
template1=> insert into foo values(123.45,678.90);
INSERT 239376 1
template1=> insert into foo values(123.25,678.95);
INSERT 239377 1
template1=> explain select * from foo where x = 123.4;
NOTICE: QUERY PLAN:

Seq Scan on foo (cost=0.00 size=0 width=8)

EXPLAIN
template1=>

-----

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com flame-mail: /dev/null
# include <std/disclaimers.h> TEAM-OS2
Online Searchable Campground Listings http://www.camping-usa.com
"There is no outfit less entitled to lecture me about bloat
than the federal government" -- Tony Snow
==========================================================================

Thomas Lockhart

lockhart@alumni.caltech.edu

over 27 years ago

In reply to: Vince Vielhaber (#1)

Re: [HACKERS] indexes and floats

Is this a bug that the index doesn't work on floats or is it a
datatype mismatch thing?
Seq Scan on foo (cost=0.00 size=0 width=8)

Likely neither. Notice the cost of a sequential scan; it is zero. Why
bother doing an index scan?

You need at least two more things to happen to provoke Postgres into
using your index:

1) Have more entries in the table; rule of thumb is O(100) before an
index will be used.

2) Run vacuum after getting that many entries to make sure the optimizer
will know about the configuration of the tables and indices.

Good luck.

- Tom

Vince Vielhaber

vev@michvhf.com

over 27 years ago

In reply to: Thomas Lockhart (#2)

Re: [HACKERS] indexes and floats

On 03-Aug-98 Thomas G. Lockhart wrote:

Is this a bug that the index doesn't work on floats or is it a
datatype mismatch thing?
Seq Scan on foo (cost=0.00 size=0 width=8)

Likely neither. Notice the cost of a sequential scan; it is zero. Why
bother doing an index scan?

You need at least two more things to happen to provoke Postgres into
using your index:

1) Have more entries in the table; rule of thumb is O(100) before an
index will be used.

2) Run vacuum after getting that many entries to make sure the optimizer
will know about the configuration of the tables and indices.

Here's why I'm bringing it up. I ran vacuum on the database yesterday.
The table consists of city, state (both text), lon and lat (both float4).
The text index is working but not the floats. I have another table with
12,000 - 14,000 rows and I'm getting the same thing.

-----

campsites=> explain select lon from locations where lon = 83.5;
NOTICE: QUERY PLAN:

Seq Scan on locations (cost=7263.30 size=2 width=4)

EXPLAIN
campsites=> select count(*) from locations;
count
------
169797
(1 row)

campsites=>

-----

Thomas Lockhart

lockhart@alumni.caltech.edu

over 27 years ago

In reply to: Vince Vielhaber (#3)

Re: [HACKERS] indexes and floats

Is this a bug that the index doesn't work on floats or is it a
datatype mismatch thing?
Seq Scan on foo (cost=0.00 size=0 width=8)

The table consists of city, state (both text), lon and lat (both
float4). The text index is working but not the floats. I have another
table with 12,000 - 14,000 rows and I'm getting the same thing.

Hmm. You are right. I'm glad you insisted on pursuing this, but I might
have been able to see the problem right away if you had described it in
detail (e.g. "floats" is not clearly float4) and if your initial example
had actually been with float4 data (it showed a width of 8 which is
likely to be a float8) :/

Anyway, the problem seems to be specific to float4. The workaround is to
use float8 for now, and we'll look into fixing the problem for the next
release...

- Tom

Test cases from the regression database follow:

regression=> create table test (f4 float4);
CREATE
regression=> insert into test select seqno from bt_f8_heap;
INSERT 0 10000
regression=> create index ti on test using btree (f4);
CREATE
regression=> vacuum;
VACUUM
regression=> explain select f4 from test where f4 = 500.0;
NOTICE: QUERY PLAN:

Seq Scan on test (cost=394.00 size=1 width=4)

EXPLAIN

regression=> create table test8 as select seqno from bt_f8_heap;
SELECT
regression=> create index t8 on test8 using btree (seqno);
CREATE
regression=> vacuum;
VACUUM
regression=> explain select seqno from test8 where seqno = 500.0;
NOTICE: QUERY PLAN:

Index Scan using t8 on test8 (cost=2.05 size=1 width=8)

EXPLAIN

Tom Lane

tgl@sss.pgh.pa.us

over 27 years ago

In reply to: Thomas Lockhart (#4)

Re: [HACKERS] indexes and floats

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

The table consists of city, state (both text), lon and lat (both
float4). The text index is working but not the floats. I have another
table with 12,000 - 14,000 rows and I'm getting the same thing.

Anyway, the problem seems to be specific to float4. The workaround is to
use float8 for now, and we'll look into fixing the problem for the next
release...

Ah-hah, all of a sudden this looks *real* familiar. I bet it's because
pgsql is not noticing that "500.0" can be interpreted as a float4.
Let's try it.

play=> create table f8 (x float8);
CREATE
play=> \copy f8 from countup; // 0..999 in a text file
Successfully copied.
play=> create index f8_i on f8 using btree (x);
CREATE
play=> create table f4 (x float4);
CREATE
play=> \copy f4 from countup;
Successfully copied.
play=> create index f4_i on f4 using btree (x);
CREATE
play=> vacuum verbose analyze;
.... blah blah
play=> explain select x from f8 where x = 500;
NOTICE: QUERY PLAN:

Seq Scan on f8 (cost=40.00 size=100 width=8)

EXPLAIN
play=> explain select x from f8 where x = 500.0;
NOTICE: QUERY PLAN:

Index Scan using f8_i on f8 (cost=2.05 size=1 width=8)

EXPLAIN
play=> explain select x from f4 where x = 500;
NOTICE: QUERY PLAN:

Seq Scan on f4 (cost=40.00 size=100 width=4)

EXPLAIN
play=> explain select x from f4 where x = 500.0;
NOTICE: QUERY PLAN:

Seq Scan on f4 (cost=40.00 size=1 width=4)

EXPLAIN
play=> explain select x from f4 where x = 500 :: float4;
NOTICE: QUERY PLAN:

Index Scan using f4_i on f4 (cost=2.05 size=1 width=4)

EXPLAIN
play=> explain select x from f4 where x = 500.0 :: float4;
ERROR: parser_typecast: cannot cast this expression to type 'float4'
play=>

(This is with cvs sources of a few days ago.)

I see two different bugs here:

1. select ... where x = constant; is optimized to an index scan
only if the constant is of the exact type of the field x.
Apparently, the coercion of the constant to match the type of x
happens only after the optimizer decides it doesn't know what to do.
The coercion ought to happen *before* the optimizer runs.
(I have no idea whether this is a new bug caused by the recent
type-system changes, or whether it existed in 6.3.2 and earlier.)

2. Type coercion fails for "500.0 :: float4" (ditto for "500.0 :: float8"
btw). Presumably this is a simple localized bug in the parser or the type
coercion logic for floats.

I had previously complained of bug #1 in connection with OIDs;
with the present sources, "where oid = 123456" will not use
an index on OID, while "where oid = 123456::oid" will.

I will bet lunch (at the nearest McD's, I'm not rich ;-)) that
Vince Vielhaber's recent gripe about
select city from locations where lower(city) = lower('st. ignace');
failing to use an index
create index locations_city on locations(lower(city) text_ops);
is an artifact of the same sort of type-mismatch problem.

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofMon03Aug1998142838+000035C5C916.55BE0381@alumni.caltech.edu | Resolved by subject fallback

Bruce Momjian

bruce@momjian.us

over 27 years ago

In reply to: Tom Lane (#5)

Re: [HACKERS] indexes and floats

1. select ... where x = constant; is optimized to an index scan
only if the constant is of the exact type of the field x.
Apparently, the coercion of the constant to match the type of x
happens only after the optimizer decides it doesn't know what to do.
The coercion ought to happen *before* the optimizer runs.
(I have no idea whether this is a new bug caused by the recent
type-system changes, or whether it existed in 6.3.2 and earlier.)

I think the current sources Thomas has modified work better, perhaps?

2. Type coercion fails for "500.0 :: float4" (ditto for "500.0 :: float8"
btw). Presumably this is a simple localized bug in the parser or the type
coercion logic for floats.

I had previously complained of bug #1 in connection with OIDs;
with the present sources, "where oid = 123456" will not use
an index on OID, while "where oid = 123456::oid" will.

I will bet lunch (at the nearest McD's, I'm not rich ;-)) that
Vince Vielhaber's recent gripe about
select city from locations where lower(city) = lower('st. ignace');
failing to use an index
create index locations_city on locations(lower(city) text_ops);
is an artifact of the same sort of type-mismatch problem.

regards, tom lane

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Vadim Mikheev

vadim@krs.ru

over 27 years ago

In reply to: Tom Lane (#5)

Re: [HACKERS] indexes and floats

Tom Lane wrote:

I will bet lunch (at the nearest McD's, I'm not rich ;-)) that
Vince Vielhaber's recent gripe about
select city from locations where lower(city) = lower('st. ignace');
failing to use an index
create index locations_city on locations(lower(city) text_ops);
is an artifact of the same sort of type-mismatch problem.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
No. This is the result of using lower('st. ignace') - function:
optimizer considers clause as usable for index only for
constants and parameters!
We discussed this ~ month ago. lower('st. ignace') could be
replaced by parameter of PARAM_EXEC type (implemented
for subqueries) to be 1. considered by optimizer as index key
value, 2. evaluated _ONCE_ by executor.
As I mentioned before, lower('st. ignace') will be evaluated
FOR EACH tuple in SeqScan!!!

PARAM_EXEC was implemented to handle queries like this:

select * from A where A.x = (select max(B.y) from B)

- subquery will be executed once and index on A (x) will be
used (if exists).

Optimizer just rewrites this query as
select * from A where A.x = _parameter_
and stores information about _parameter_ in InitPlan of
execution plan.

Look:

vac=> explain select * from test where x = lower('a');
NOTICE: QUERY PLAN:

Seq Scan on test (cost=40.00 size=100 width=12)
^^^^^^^^
EXPLAIN
vac=> explain select * from test where x = (select lower('a'));
NOTICE: QUERY PLAN:

Index Scan using itest2 on test (cost=2.05 size=1 width=12)
^^^^^^^^^^
InitPlan
-> Result (cost=0.00 size=0 width=0)

Nevertheless,

vac=> explain select * from test where lower(x) = (select lower('a'));
NOTICE: IndexSelectivity: no key -1 in index 20305
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NOTICE: QUERY PLAN:

Seq Scan on test (cost=40.00 size=100 width=12)
^^^^^^^^
InitPlan
-> Result (cost=0.00 size=0 width=0)

- something is broken for functional indices...

Vadim

Bruce Momjian

bruce@momjian.us

over 27 years ago

In reply to: Vadim Mikheev (#7)

Re: [HACKERS] indexes and floats

Ah-hah, all of a sudden this looks *real* familiar. I bet it's
because pgsql is not noticing that "500.0" can be interpreted as a
float4. Let's try it.

Oh, you have nailed it! This is interesting because (probably) a query
like

select f4 from t4 where f4 = 500.0;

is being automatically "upgraded" in the parser backend to

select f4 from t4 where float8(f4) = 500.0;

So, since there is no functional index float8(f4) on the table we cannot
use an existing index on f4 to advantage.

This may be a result of my recent enhancements to the automatic type
coersion code. But I'm a little suprised that v6.3.x doesn't just
complain about a type mismatch but instead actually works. It may be
that the old code which converted constants using intermediate strings
worked (sort of) for this case. In general, the pre-enhancement code
_only_ tried to convert constants, and complained about type mismatches
when non-constants were involved.

Yes, it did numeric constants, I think.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Import Notes

Reply to msg id not found: 35C69298.4EE5669@alumni.caltech.edu | Resolved by subject fallback

Thomas Lockhart

lockhart@alumni.caltech.edu

over 27 years ago

In reply to: Tom Lane (#5)

Re: [HACKERS] indexes and floats

Ah-hah, all of a sudden this looks *real* familiar. I bet it's
because pgsql is not noticing that "500.0" can be interpreted as a
float4. Let's try it.

Oh, you have nailed it! This is interesting because (probably) a query
like

select f4 from t4 where f4 = 500.0;

is being automatically "upgraded" in the parser backend to

select f4 from t4 where float8(f4) = 500.0;

So, since there is no functional index float8(f4) on the table we cannot
use an existing index on f4 to advantage.

This may be a result of my recent enhancements to the automatic type
coersion code. But I'm a little suprised that v6.3.x doesn't just
complain about a type mismatch but instead actually works. It may be
that the old code which converted constants using intermediate strings
worked (sort of) for this case. In general, the pre-enhancement code
_only_ tried to convert constants, and complained about type mismatches
when non-constants were involved.

So far in the examples, imho this (new?) feature isn't a _fatal_
problem, because you want to handle the following query correctly (I'll
switch to an int column to make it clearer):

select i4 from t4 where i4 < 500.1;

Now, if we do the "optimizable thing" blindly, then we would transform
this to

select i4 from t4 where i4 < int4(500.1);

But of course this would produce the wrong result if the table contains
a value of 500. Perhaps something a bit different could be implemented,
but it probably wouldn't generalize very well with the extensible type
system.

So, is there a problem to fix, or just documentation to write? I've
already written a new "Type Conversion" chapter for the User's Guide,
but haven't mentioned this specific issue.

- Tom

#10

Tom Lane

tgl@sss.pgh.pa.us

over 27 years ago

In reply to: Thomas Lockhart (#9)

Re: [HACKERS] indexes and floats

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

Oh, you have nailed it! This is interesting because (probably) a query
like
select f4 from t4 where f4 = 500.0;
is being automatically "upgraded" in the parser backend to
select f4 from t4 where float8(f4) = 500.0;
So, since there is no functional index float8(f4) on the table we cannot
use an existing index on f4 to advantage.

OK, that sounds plausible. But in my examples,

play=> explain select x from f8 where x = 500;
NOTICE: QUERY PLAN:
Seq Scan on f8 (cost=40.00 size=100 width=8)

Your explanation implies that here, the parser is converting to
select x from f8 where int4(x) = 500;
which is wrong for the same accuracy-loss reasons you cite later.
(And if that isn't what it's doing, what then?)

I think it would be a good idea if someone actually dug into this
and verified what's going on. I have found some other cases that
lead me to think there's more to this than we understand just yet.
With an index on an int4 field, I get

tree=> explain select * from marketorderhistory where sequenceNo = 140000;
NOTICE: QUERY PLAN:
Index Scan using marketorderhistory_sequenceno_i on marketorderhistory
(cost=2.05 size=1 width=100)

tree=> explain select * from marketorderhistory where sequenceNo > 140000;
NOTICE: QUERY PLAN:
Seq Scan on marketorderhistory (cost=63.38 size=449 width=100)

which doesn't look like it could be explained by parser type coercions.
Perhaps this one just indicates an omission from the list of
type-specific routines that can be used for index comparisons? If so,
maybe there are other omissions affecting the results for other types.

... you want to handle the following query correctly (I'll
switch to an int column to make it clearer):
select i4 from t4 where i4 < 500.1;
Now, if we do the "optimizable thing" blindly, then we would transform
this to
select i4 from t4 where i4 < int4(500.1);
But of course this would produce the wrong result if the table contains
a value of 500. Perhaps something a bit different could be implemented,
but it probably wouldn't generalize very well with the extensible type
system.

That's a good point. Still, it would be nice if the system had some
reasonable amount of smarts about the "primitive" types that the parser
has constant syntax for. In particular I think an automatic coercion of
an int constant to float where needed would be a reasonable thing to
expect. That's not happening now, see my example above.

So, is there a problem to fix, or just documentation to write?

This one is most certainly a bug:

play=> select x from f4 where x = 500.0 :: float4;
ERROR: parser_typecast: cannot cast this expression to type 'float4'

Beyond that, if I can force the right thing to happen by casting
the constant to the type of the field, then I can live with it.
I have seen a number of cases where the system wouldn't use an index
even with a cast, however, so I'm not a happy camper yet.

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofTue04Aug1998044824+000035C69298.4EE5669@alumni.caltech.edu | Resolved by subject fallback

#11

Vince Vielhaber

vev@michvhf.com

over 27 years ago

In reply to: Tom Lane (#10)

Re: [HACKERS] indexes and floats

On 04-Aug-98 Tom Lane wrote:

OK, that sounds plausible. But in my examples,

play=> explain select x from f8 where x = 500;
NOTICE: QUERY PLAN:
Seq Scan on f8 (cost=40.00 size=100 width=8)

Your explanation implies that here, the parser is converting to
select x from f8 where int4(x) = 500;
which is wrong for the same accuracy-loss reasons you cite later.
(And if that isn't what it's doing, what then?)

I think it would be a good idea if someone actually dug into this
and verified what's going on. I have found some other cases that
lead me to think there's more to this than we understand just yet.
With an index on an int4 field, I get

What file(s) are these decisions actually made in?

#12

Thomas Lockhart

lockhart@alumni.caltech.edu

over 27 years ago

In reply to: Tom Lane (#10)

Re: [HACKERS] indexes and floats

Your explanation implies that here, the parser is converting to
select x from f8 where int4(x) = 500;
which is wrong for the same accuracy-loss reasons you cite later.
(And if that isn't what it's doing, what then?)

No, that isn't what it would do. The parser does know about a heirarchy
of built-in data types, and would not downgrade a float8 to an int4.
I've sent some e-mail about the new parser features, and have written
them up in doc/src/sgml/typeconv.sgml (but haven't generated a new html
version yet).

I think it would be a good idea if someone actually dug into this
and verified what's going on. I have found some other cases that
lead me to think there's more to this than we understand just yet.

I'm pretty sure that there is another thing happening, which is getting
in the way of using indices with your example using

where lowercase(col) = lowercase('const')

That's a good point. Still, it would be nice if the system had some
reasonable amount of smarts about the "primitive" types that the
parser has constant syntax for. In particular I think an automatic
coercion of an int constant to float where needed would be a
reasonable thing to expect. That's not happening now, see my example
above.

I agree with your points, but it already does exactly what you would
want. I don't see an example above illustrating this problem.

So, is there a problem to fix, or just documentation to write?

This one is most certainly a bug:

play=> select x from f4 where x = 500.0 :: float4;
ERROR: parser_typecast: cannot cast this expression to type 'float4'

Might be easy. I'll look at it... Oh, the current workaround is to
specify it as

where x = '500.0'::float4;

where x = float4(500.0);

Beyond that, if I can force the right thing to happen by casting
the constant to the type of the field, then I can live with it.
I have seen a number of cases where the system wouldn't use an index
even with a cast, however, so I'm not a happy camper yet.

Yeah, that was Vadim's example and he sez we need to add a bit of code
to get things working the way you'd expect.

- Tom

#13

Tom Lane

tgl@sss.pgh.pa.us

over 27 years ago

In reply to: Thomas Lockhart (#12)

Re: [HACKERS] indexes and floats

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

No, that isn't what it would do. The parser does know about a heirarchy
of built-in data types, and would not downgrade a float8 to an int4.

OK, that's good. But if it knows that much, I'd expect it to be able
to *upgrade* an int4 to a float8 where appropriate. That's not
happening, as in my prior example:

play=> explain select x from f8 where x = 500.0;
NOTICE: QUERY PLAN:
Index Scan using f8_i on f8 (cost=2.05 size=1 width=8)

play=> explain select x from f8 where x = 500;
NOTICE: QUERY PLAN:
Seq Scan on f8 (cost=40.00 size=100 width=8)

The thing that I find peculiar about all this is that the system doesn't
reject the query completely, as you'd expect it to if there were really
a type mismatch problem. It takes it and just does it a lot slower than
it should. It seems the type coercion will happen, but too late for the
optimizer to notice.

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofTue04Aug1998163754+000035C738E2.9C77CD5@alumni.caltech.edu | Resolved by subject fallback

#14

Thomas Lockhart

lockhart@alumni.caltech.edu

over 27 years ago

In reply to: Tom Lane (#13)

Re: [HACKERS] indexes and floats

The parser does know about a heirarchy
of built-in data types, and would not downgrade a float8 to an int4.

OK, that's good. But if it knows that much, I'd expect it to be able
to *upgrade* an int4 to a float8 where appropriate. That's not
happening

Ah, but it _is_ happening. The problem lies later. Read on...

, as in my prior example:

play=> explain select x from f8 where x = 500;
NOTICE: QUERY PLAN:
Seq Scan on f8 (cost=40.00 size=100 width=8)

It seems the type coercion will happen, but too late for the
optimizer to notice.

This is the specific case discussed by Vadim, and is, I believe,
directly related to the

select city from locations where lower(city) = lower('st. ignace');

example given by Vince.

The parser is converting this query to become

select x from f8 where x = float8(500);

The problem appears to be that the optimizer/executor does not know how
to evaluate the constant string once-only, and insists on doing a
sequential scan for some reason. Not really in the parser's hands.
Remember, the types are all set and the functions are all defined by the
time the actual optimizer/executor sees anything. There may be a slight
bit of "optimization" which happens in the parser, but only to shoehorn
SQL92 into the Postgres backend. Oh, there are some heuristics in the
parser to fudge the distinctions between char/varchar/text types to
allow reuse of some of the support code, but that is a Bad Thing in
principle.

The downside to putting more heuristics into the parser (as opposed to
upgrading the optimizer/executor to handle the case) is that it embeds
more assumptions about the _behavior_ of, for example, int4->float8
conversions and would reduce the flexibility of the extensible type
system.

istm that we should be focusing on Vadim's hints on what it would take
to use indices with function calls on constants...

- Tom

#15

Thomas Lockhart

lockhart@alumni.caltech.edu

over 27 years ago

In reply to: Tom Lane (#13)

Re: [HACKERS] indexes and floats

istm that we should be focusing on Vadim's hints on what it would take
to use indices with function calls on constants...

Looking at Vadim's note again, maybe it will be the parser's duty to
insert the PARAM_EXEC node; will need more details or some time to look
at it...

- Tom

#16

Tom Lane

tgl@sss.pgh.pa.us

over 27 years ago

In reply to: Thomas Lockhart (#15)

Re: [HACKERS] indexes and floats

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

The parser is converting this query to become
select x from f8 where x = float8(500);

The problem appears to be that the optimizer/executor does not know how
to evaluate the constant string once-only, and insists on doing a
sequential scan for some reason.

Ah, it's finally starting to make some sense to me. What you're saying
is that this is a failure to do constant-folding.

Doing a sequential scan would be appropriate if the righthand side of
the comparison needed to be evaluated afresh at each row. If the
optimizer thinks that, then that explains a lot.

The question then is why the righthand side doesn't look like a
constant. I'd have expected that any expression not involving a table
attribute would be evaluated once (folded into a constant) before any
decisions are made on how to perform the scan. Is that reasonable, or
is there some aspect of SQL semantics that makes it wrong?

If it is supposed to be happening, could it be that float8() is for
some reason not marked as a safely foldable function?

While I'm asking dumb questions: are "float8(500)" and "500::float8"
treated differently? Actually, I can see that they are:

play=> explain select x from f8 where x = 500::float8;
NOTICE: QUERY PLAN:
Index Scan using f8_i on f8 (cost=2.05 size=1 width=8)

play=> explain select x from f8 where x = float8(500);
NOTICE: QUERY PLAN:
Seq Scan on f8 (cost=40.00 size=100 width=8)

But why? Is there a difference in semantics?

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofWed05Aug1998152938+000035C87A62.C11A66EF@alumni.caltech.edu | Resolved by subject fallback

#17

Bruce Momjian

bruce@momjian.us

over 27 years ago

In reply to: Thomas Lockhart (#14)

Re: [HACKERS] indexes and floats

The problem appears to be that the optimizer/executor does not know how
to evaluate the constant string once-only, and insists on doing a
sequential scan for some reason. Not really in the parser's hands.
Remember, the types are all set and the functions are all defined by the
time the actual optimizer/executor sees anything. There may be a slight
bit of "optimization" which happens in the parser, but only to shoehorn
SQL92 into the Postgres backend. Oh, there are some heuristics in the
parser to fudge the distinctions between char/varchar/text types to
allow reuse of some of the support code, but that is a Bad Thing in
principle.

The downside to putting more heuristics into the parser (as opposed to
upgrading the optimizer/executor to handle the case) is that it embeds
more assumptions about the _behavior_ of, for example, int4->float8
conversions and would reduce the flexibility of the extensible type
system.

istm that we should be focusing on Vadim's hints on what it would take
to use indices with function calls on constants...

I understand the problem now. There is code in the optimizer to handle
functions on constants, but it does not appear to be in a place where
index matching can use it.

I will be looking over the code in the next few weeks, but not sure if I
can figure it all out by then. I have added it to my personal TODO
list.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

#18

Bruce Momjian

bruce@momjian.us

over 27 years ago

In reply to: Tom Lane (#16)

Re: [HACKERS] indexes and floats

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

The parser is converting this query to become
select x from f8 where x = float8(500);

The problem appears to be that the optimizer/executor does not know how
to evaluate the constant string once-only, and insists on doing a
sequential scan for some reason.

Ah, it's finally starting to make some sense to me. What you're saying
is that this is a failure to do constant-folding.

Yep. I believe it happens in the executor, but doesn't appear to happen
in the optimizer at a time when it would be useful. You can create
functional indexes, and I think they are matched by the function, just
not constants.

Doing a sequential scan would be appropriate if the righthand side of
the comparison needed to be evaluated afresh at each row. If the
optimizer thinks that, then that explains a lot.

Yep. I think that is the issue, and index matching does not
pre-evaluate a function on a constant.

The question then is why the righthand side doesn't look like a
constant. I'd have expected that any expression not involving a table
attribute would be evaluated once (folded into a constant) before any
decisions are made on how to perform the scan. Is that reasonable, or
is there some aspect of SQL semantics that makes it wrong?

If it is supposed to be happening, could it be that float8() is for
some reason not marked as a safely foldable function?

While I'm asking dumb questions: are "float8(500)" and "500::float8"
treated differently? Actually, I can see that they are:

play=> explain select x from f8 where x = 500::float8;
NOTICE: QUERY PLAN:
Index Scan using f8_i on f8 (cost=2.05 size=1 width=8)

play=> explain select x from f8 where x = float8(500);
NOTICE: QUERY PLAN:
Seq Scan on f8 (cost=40.00 size=100 width=8)

But why? Is there a difference in semantics?

Sure. In the :: case (or CAST (const AS type)), the parser actually
converts the type INSIDE the parser to the proper type. In the float8()
case, the value conversion is delayed until the executor.

I may be wrong in some of this, but that is what I think is happening.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

#19

Jackson, DeJuan

djackson@cpsgroup.com

over 27 years ago

In reply to: Bruce Momjian (#18)

RE: [HACKERS] indexes and floats

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

The parser is converting this query to become
select x from f8 where x = float8(500);

The problem appears to be that the optimizer/executor does not

know how

to evaluate the constant string once-only, and insists on doing a
sequential scan for some reason.

Ah, it's finally starting to make some sense to me. What you're

saying

is that this is a failure to do constant-folding.

Yep. I believe it happens in the executor, but doesn't appear to
happen
in the optimizer at a time when it would be useful. You can create
functional indexes, and I think they are matched by the function, just
not constants.

Doing a sequential scan would be appropriate if the righthand side

of

the comparison needed to be evaluated afresh at each row. If the
optimizer thinks that, then that explains a lot.

Yep. I think that is the issue, and index matching does not
pre-evaluate a function on a constant.

The question then is why the righthand side doesn't look like a
constant. I'd have expected that any expression not involving a

table

attribute would be evaluated once (folded into a constant) before

any

decisions are made on how to perform the scan. Is that reasonable,

or

is there some aspect of SQL semantics that makes it wrong?

If it is supposed to be happening, could it be that float8() is for
some reason not marked as a safely foldable function?

While I'm asking dumb questions: are "float8(500)" and "500::float8"
treated differently? Actually, I can see that they are:

play=> explain select x from f8 where x = 500::float8;
NOTICE: QUERY PLAN:
Index Scan using f8_i on f8 (cost=2.05 size=1 width=8)

play=> explain select x from f8 where x = float8(500);
NOTICE: QUERY PLAN:
Seq Scan on f8 (cost=40.00 size=100 width=8)

But why? Is there a difference in semantics?

Sure. In the :: case (or CAST (const AS type)), the parser actually
converts the type INSIDE the parser to the proper type. In the
float8()
case, the value conversion is delayed until the executor.

I may be wrong in some of this, but that is what I think is happening.

This would be logical to me because we allow function creation.
Consider, if someone creates an SQL function on a table being updated
that returns a count of all rows who's index matches an int. Which
would mean that an insert using that function to select rows could need
to be evaluated for each row.
Well, I guess what I'm trying to say is that the current behavior could
be desired/being used by someone. It could also be used as a substitute
for a select trigger.
Of course as always I could be completely wrong.
-DEJ

Show quoted text

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania
19026
+  If your life is a hard drive,     |  (610) 353-9879(w)
+  Christ can be your backup.        |  (610) 853-3000(h)

Import Notes

Resolved by subject fallback

#20

Thomas Lockhart

lockhart@alumni.caltech.edu

over 27 years ago

In reply to: Jackson, DeJuan (#19)

Re: [HACKERS] indexes and floats

This would be logical to me because we allow function creation.
Consider, if someone creates an SQL function on a table being updated
that returns a count of all rows who's index matches an int. Which
would mean that an insert using that function to select rows could
need to be evaluated for each row.
Well, I guess what I'm trying to say is that the current behavior
could be desired/being used by someone.

As you point out, the current behavior allows functions with
side-effects to work, where if we change it then functions must be
side-effect-free. Most functions do not have side effects, though one
using the SPI interface for example might be prone to them.

We _could_ add a column attribute for each function which declares a
function as being "safe" or not.

- Tom

#21

Vadim Mikheev

vadim@krs.ru

over 27 years ago

In reply to: Tom Lane (#13)

#22

Bruce Momjian

bruce@momjian.us

over 27 years ago

In reply to: Vadim Mikheev (#21)