type coersion

Started by Bruce Momjianover 27 years ago32 messages
#1Bruce Momjian
maillist@candle.pha.pa.us

Let me ask about type coersion.

When you have an int2 column called x, how do you place the conversion
functions when it is being compared to an in4 constant?

x = int2(500)

int4(x) = 500

The first is good for indexing, the second is not. If they are both
variables or both constants, the handling does not matter.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#2Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#1)
Re: type coersion (was OR clause status)

<I changed "x" to "i2" in the example for clarity>

... have an int2 column called i2, how do you place the conversion
functions when it is being compared to an in4 constant?
i2 = int2(500)
int4(i2) = 500
The first is good for indexing, the second is not. If they are both
variables or both constants, the handling does not matter.

Yes the handling does matter *in general*, since
while i4 = 500.1
cannot be evaluated as
while i4 = int4(500.1)
and get the right result.
Even for the types in your example,
while i2 = 4000000
should execute correctly, even though we both know that *for this case*
it doesn't make a lot of sense since the constant exceeds the range of
the column.

Actually, there are a lot of mixed-type comparison functions in pg_proc.
For example, there is a int24eq function which would be used for your
query outside of a "where clause". But, read on...

*slaps forehead*

It dawned on me while I was waking up this morning that we'd forgotten a
major bit of info about indexing. The pg_proc routines which are
declared for =, <, >, etc. are _not_ directly used to access indices! It
is of course the pg_am, pg_amop, and pg_amproc tables which are used for
this.

Here are examples from v6.3.2:

regression=> explain select * from tenk1 where unique1 = 3000;
Index Scan on tenk1 (cost=2.05 size=1 width=100)
regression=> explain select * from tenk1 where unique1 = 3000+1;
Seq Scan on tenk1 (cost=512.00 size=1000 width=100)
regression=> explain select * from tenk1 where unique1 = int4(3000.1);
Seq Scan on tenk1 (cost=512.00 size=1000 width=100)

The *only case* I've noticed so far which does better in v6.3.2 than
"v6.4today" (not yet v6.4alpha :) is the one involving OIDs:
regression=> explain select * from tenk1 where oid = 3000;
Index Scan on tenk1 (cost=2.05 size=1 width=100)

And remember that there are a lot of cases which do better in v6.4today
than in earlier versions.

Let's try to figure out how to get constant expressions using indices,
rather than just patching to get mismatched constant values using them.
And even for that I should be able to fix up the "binary compatible"
cases such as OID and int4 without too much trouble. Just need to find
the right spot in the index handling.

However, I should be able to find that by looking for references to the
pg_am* tables in the source code now that my brain is partly unwedged :)
Will do that, and we might still want to figure out how to use Vadim's
new PARAM_EXEC nodes for function calls on constants and constant
expressions (e.g. func1(func2(const1+const2))+const3) in the parser.

- Tom

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas G. Lockhart (#2)
Re: [HACKERS] Re: type coersion (was OR clause status)

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

The *only case* I've noticed so far which does better in v6.3.2 than
"v6.4today" (not yet v6.4alpha :) is the one involving OIDs:
regression=> explain select * from tenk1 where oid = 3000;
Index Scan on tenk1 (cost=2.05 size=1 width=100)

Actually, I'm aware of another one: comparisons using "> constant"
or "< constant" seem more likely to use an index in 6.3.2 than they
do in the sources I have. I have examples involving both datetime
and int4 columns where "where x = constant" will be implemented by
index scan, but "where x > constant" will not. Explicit casts of
the righthand side make no difference. And it works fine in 6.3.2.

I'm a couple of weeks behind the CVS tree, so I was going to wait
until I'd confirmed it with up-to-the-minute sources before complaining.
But if you're proceeding on the assumption that the "oid = integer" case
is the only thing that's broken, you may be misled.

The fact that casting doesn't affect this makes me think it is a
different problem than the must-cast-to-get-an-index-scan cases
we've discussed so far.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: [HACKERS] Re: type coersion (was OR clause status)

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

... have an int2 column called i2, how do you place the conversion
functions when it is being compared to an in4 constant?
i2 = int2(500)
int4(i2) = 500
The first is good for indexing, the second is not. If they are both
variables or both constants, the handling does not matter.

Yes the handling does matter *in general*, since
while i4 = 500.1
cannot be evaluated as
while i4 = int4(500.1)
and get the right result.
Even for the types in your example,
while i2 = 4000000
should execute correctly, even though we both know that *for this case*
it doesn't make a lot of sense since the constant exceeds the range of
the column.

This is all a good point. I wonder whether it wouldn't help to make the
parser's initial assignment of types to constants depend on how big the
constants are. In other words, if I write "400" the parser would
implicitly mark this as "int2", whereas if I write "400000" it would be
implicitly marked "int4". Then, subsequent implicit upward promotion
would cast 400::int2 to 400::int4 if int4 was actually the most
appropriate thing to use *in context*.

It seems to me that this handles all the cases cited correctly:
where i2 = 400
will get implemented directly as int2 eq int2,
where i4 = 400
will promote 400::int2 to 400::int4 and then use an int4 eq int4
comparison op (of course we have to address the current failure to
reduce int4(constant) to a constant, but IMHO that has to happen
anyway),
where i2 = 400000
will get promoted to where int4(i2) = 400000::int4 and executed
correctly (in this case returning no rows, but that's not a reason
not to do it right --- "where i2 < 32768" might be a more compelling
example). Likewise
where i4 = 500.1
will be executed with correct semantics as float8(i4) = 500.1::float8
since the parser will know that float8 is a "wider" type than int4.

It'd be nice if the same answer would work for
where f4 = 500.1
but I'm not sure that I care to see the parser deciding that "float4
is good enough for this constant". We could too easily find that in
where f8 = 500.1
the parser might cast the constant down to float4 and back up to float8
with catastrophic loss of precision.

In the float case, is there some way that a constant might be marked
as "float of unspecified width" (with the actual value held as a float8)
until the type resolution pass is done? This would need to be
considered "equivalent" to both float4 and float8, so that in
where f4 = 500.1
the type resolver doesn't decide it must rewrite f4 as float8(f4).
Then at some late stage of the game we resolve the constant to float4
rather than float8 if the context is float4. I'm not sure how the
details should work, however. If we *could* make this work it might
be best to handle int2 vs. int4 constants the same way.

regards, tom lane

#5Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Tom Lane (#4)
Re: [HACKERS] Re: type coersion (was OR clause status)

Yes the handling does matter *in general*...

This is all a good point. I wonder whether it wouldn't help to make
the parser's initial assignment of types to constants depend on how
big the constants are.

I agree that there is something to fix, including both of the indexing
example cases you are following. I've been seeing this as an opportunity
to fix sub-optimal utilization of indices, and my recent changes to
enhance the type conversion capabilities in the parser just put the
index handling in a harsher light.

If we come near release time, and no one has been able to penetrate the
index handling (to fix cases as simple as "where x = 1 + 1" which have
never worked) the we could fairly easily go back into the parser and
brute-force some non-general workarounds to get the few "worse than
before" cases hacked around.

If we _can_ get the index handling to work more generally, then we have
substantially enhanced the overall capabilities of Postgres.

So far, I haven't seen cases where the parser has tried to do "the wrong
thing", only cases where "the right thing" causes the index handling to
miss the boat...

- Tom

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas G. Lockhart (#5)
Re: [HACKERS] Re: type coersion (was OR clause status)

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

So far, I haven't seen cases where the parser has tried to do "the wrong
thing", only cases where "the right thing" causes the index handling to
miss the boat...

I disagree, actually. In the example

select ... where i2 = 400;

I claim the parser is doing the wrong thing by representing this as
"where int4(i2) = 400::int4" rather than "where i2 = 400::int2".
It is not really reasonable to expect the optimizer to clean up
after that initial mistake.

regards, tom lane

#7Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#5)
Re: [HACKERS] Re: type coersion (was OR clause status)

Yes the handling does matter *in general*...

This is all a good point. I wonder whether it wouldn't help to make
the parser's initial assignment of types to constants depend on how
big the constants are.

I agree that there is something to fix, including both of the indexing
example cases you are following. I've been seeing this as an opportunity
to fix sub-optimal utilization of indices, and my recent changes to
enhance the type conversion capabilities in the parser just put the
index handling in a harsher light.

If we come near release time, and no one has been able to penetrate the
index handling (to fix cases as simple as "where x = 1 + 1" which have
never worked) the we could fairly easily go back into the parser and
brute-force some non-general workarounds to get the few "worse than
before" cases hacked around.

If we _can_ get the index handling to work more generally, then we have
substantially enhanced the overall capabilities of Postgres.

So far, I haven't seen cases where the parser has tried to do "the wrong
thing", only cases where "the right thing" causes the index handling to
miss the boat...

I have found what is happening in the optimizer, and will explain soon.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#5)
Re: [HACKERS] Re: type coersion (was OR clause status)

If we come near release time, and no one has been able to penetrate the
index handling (to fix cases as simple as "where x = 1 + 1" which have
never worked) the we could fairly easily go back into the parser and
brute-force some non-general workarounds to get the few "worse than
before" cases hacked around.

If we _can_ get the index handling to work more generally, then we have
substantially enhanced the overall capabilities of Postgres.

So far, I haven't seen cases where the parser has tried to do "the wrong
thing", only cases where "the right thing" causes the index handling to
miss the boat...

OK, here is what I have. I have a table called test, with one
attribute. I execute:

select * from test where oid = 3;

and the output plan is:

(
{ EXPR
:typeOid 0
:opType op
:oper
{ OPER
:opno 1137
:opid 0
:opresulttype 16
}

:args (
{ VAR
:varno 1
:varattno -2
:vartype 26
:vartypmod -1
:varlevelsup 0
:varnoold 1
:varoattno -2
}

{ CONST
:consttype 23
:constlen 4
:constisnull false
:constvalue 4 [ 3 0 0 0 ]
:constbyval true
}
)
}
)

Why does the Var have a type 26(int), and the constant a type of
23(oid)? Where's the conversion function?

Now, the existance of the function doesn't help either, but that is a
different problem:

test=> explain select * from test where oid = oid(3);
NOTICE: QUERY PLAN:

Seq Scan on test (cost=1936.05 size=4916 width=8)

Is it because there is a int4eqoid() function? How to use an index on that?

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#9Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#8)
Re: [HACKERS] Re: type coersion (was OR clause status)

and the output plan is:
(
{ EXPR
:typeOid 0
:opType op
:oper
{ OPER
:opno 1137
:opid 0
:opresulttype 16
}

:args (
{ VAR
:varno 1
:varattno -2
:vartype 26
:vartypmod -1
:varlevelsup 0
:varnoold 1
:varoattno -2
}

{ CONST
:consttype 23
:constlen 4
:constisnull false
:constvalue 4 [ 3 0 0 0 ]
:constbyval true
}
)
}
)

Why does the Var have a type 26(int), and the constant a type of
23(oid)? Where's the conversion function?

A conversion function is not necessary; the operator in the "opno" field
(1137) corresponds to the oid of the entry in pg_operator for "=" with
the correct arguments.

Now, the existance of the function doesn't help either, but that is a
different problem:
test=> explain select * from test where oid = oid(3);
NOTICE: QUERY PLAN:
Seq Scan on test (cost=1936.05 size=4916 width=8)
Is it because there is a int4eqoid() function?

Yes. The function is called int4eqoid() (good guess :). And there is
also a function oideqint4().

There is a chance that this case would actually work if we just removed
those functions, since (in my test code only) I've made int4 and oid
"binary compatible" so the int4eq or oideq routines would be used
instead. The index support code might actually behave properly then.

How to use an index on that?

So that is the problem for this case; there is actually a function which
matches the arguments exactly, so it is specified. *And* the same
function is (probably) not mentioned in the index configuration tables
pg_am*.

However, if we allowed the index support code to look for possible
matches on indices for the non-constant terms, and then look for the
best possible match for conversion routines on other terms, and then did
an "optimizer substitution", we might get better behavior. We would want
code to do the same kind of analysis for constant terms with function
calls and constant expressions too.

I'd be happy to work on the actual substitution code, but still don't
know what the planner does with indices. I'm starting to poke through it
like you are, but am farther behind.

I thought a good start would be to try addressing a case like this, and
allow the planner/indexer/optimizer to substitute "binary compatible"
indices. If we can succeed at that, then we would know what places need
to be touched to do more, like handling function calls and expressions
with constants.

- Tom

#10Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#6)
Re: [HACKERS] Re: type coersion (was OR clause status)

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

So far, I haven't seen cases where the parser has tried to do "the wrong
thing", only cases where "the right thing" causes the index handling to
miss the boat...

I disagree, actually. In the example

select ... where i2 = 400;

I claim the parser is doing the wrong thing by representing this as
"where int4(i2) = 400::int4" rather than "where i2 = 400::int2".
It is not really reasonable to expect the optimizer to clean up
after that initial mistake.

I don't see that. If I do:

select * from test2 where i2 = 3;

I don't see any constants being converted. The Var is still i2, and the
Const is i4. This is as it is seen by the optimizer. It is using
int24eq(opno = 532). This is valid when doing comparisons in the
executor.

The problem is how do we use indexes for this? I am still researching
this.

---------------------------------------------------------------------------

(
{ EXPR
:typeOid 0
:opType op
:oper
{ OPER
:opno 532
:opid 0
:opresulttype 16
}

:args (
{ VAR
:varno 1
:varattno 1
:vartype 21
:vartypmod -1
:varlevelsup 0
:varnoold 1
:varoattno 1
}

{ CONST
:consttype 23
:constlen 4
:constisnull false
:constvalue 4 [ 3 0 0 0 ]
:constbyval true
}
)
}
)

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#11Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#10)
Re: [HACKERS] Re: type coersion (was OR clause status)

I claim the parser is doing the wrong thing by representing this as
"where int4(i2) = 400::int4" rather than "where i2 = 400::int2".
It is not really reasonable to expect the optimizer to clean up
after that initial mistake.

I don't see that.

Yup. The parser is behaving as Bruce describes. The new type conversion
stuff isn't the fundamental problem. It's the original features in the
planner when trying to use indices.

The problem is how do we use indexes for this? I am still researching
this.

OK, let me know if I can help look into anything. In the meantime, I'll
keep poking at it a bit...

- Tom

#12Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#11)
Re: [HACKERS] Re: type coersion (was OR clause status)

I claim the parser is doing the wrong thing by representing this as
"where int4(i2) = 400::int4" rather than "where i2 = 400::int2".
It is not really reasonable to expect the optimizer to clean up
after that initial mistake.

I don't see that.

Yup. The parser is behaving as Bruce describes. The new type conversion
stuff isn't the fundamental problem. It's the original features in the
planner when trying to use indices.

The problem is how do we use indexes for this? I am still researching
this.

OK, let me know if I can help look into anything. In the meantime, I'll
keep poking at it a bit...

The optimizer does a loop for each index on every relation:

In match_clause_to_indexkey(), there is code that takes the
operator, in the case of "oid = 3", value 1137, oideqint4:

if ((rightop && IsA(rightop, Const)) ||
(rightop && IsA(rightop, Param)))
{
restrict_op = ((Oper *) ((Expr *) clause)->oper)->opno;
isIndexable = (op_class(restrict_op, xclass, index->relam) &&
IndexScanableOperand(leftop,
indexkey,
rel,
index));
}

and calls opclass(), which does a lookup in the pg_amop cache, passing
the operator oid (1137), the access method class oid, and the index
access method:

#0 op_class (opno=1137, opclass=427, amopid=403) at lsyscache.c:58
oideqint4 oid_ops btree_am_oid

and it returns false because there is no access operator for oid_ops and
btree_am_oid that matches oideqint4.

The fundamental problem is that index scans are made to compare columns
all of the same type. That is how indexes are built and traversed.
Now, we want to bring in a constant of another type, and have it use an
index.

Sounds like if we add the proper pg_am functions for binary
compatability, the optimizer should then use the proper indexes.

Thomas?

[I am going to bed now.]

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#13Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#11)
Re: [HACKERS] Re: type coersion (was OR clause status)

I claim the parser is doing the wrong thing by representing this as
"where int4(i2) = 400::int4" rather than "where i2 = 400::int2".
It is not really reasonable to expect the optimizer to clean up
after that initial mistake.

I don't see that.

Yup. The parser is behaving as Bruce describes. The new type conversion
stuff isn't the fundamental problem. It's the original features in the
planner when trying to use indices.

The problem is how do we use indexes for this? I am still researching
this.

OK, let me know if I can help look into anything. In the meantime, I'll
keep poking at it a bit...

I have also looked at x = oid(3) to see why functions are not being
used. That same function I mentioned match_clause_to_indexkey clearly
shows it does not support this:

/*
* If this is not a join clause, check for clauses of the form:
* (operator var/func constant) and (operator constant var/func)
*/
if (!join)
{
/*
* Check for standard s-argable clause
*/
if ((rightop && IsA(rightop, Const)) ||
(rightop && IsA(rightop, Param)))
{
restrict_op = ((Oper *) ((Expr *) clause)->oper)->opno;
isIndexable = (op_class(restrict_op, xclass, index->relam) &&

They want to use a functional index for oid(), but we are computing it
on a constant. Some code will have to be added somewhere to handle
this. If we want it done in the executor, perhaps we can look inside
the function to see if they are all consts, and handle it somehow.

Of course, if a function index does match, we should use that first.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#14Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#12)
Re: [HACKERS] Re: type coersion (was OR clause status)

The fundamental problem is that index scans are made to compare columns
all of the same type. That is how indexes are built and traversed.
Now, we want to bring in a constant of another type, and have it use an
index.

Sounds like if we add the proper pg_am functions for binary
compatability, the optimizer should then use the proper indexes.

FOLLOWUP:

This may fix our int4/oid problem, but I don't think it addresses the
more common problems like int2/int4. I can't see how that would work
with just additions to pg_amop.

Yes, we have the int2eqint4 function, but how does that work in an
index. Actually, it was my understanding that the new conversion code
was going to get rid of the int2eqint4 style functions, and replace them
with conversions.

Thomas, perhaps a quick summary of the logic behind your parser
conversion changes would help.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#15Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#12)
Re: [HACKERS] Re: type coersion (was OR clause status)

Sounds like if we add the proper pg_am functions for binary
compatability, the optimizer should then use the proper indexes.

I just tried this and it failed. Not sure why yet:

test=> explain select * from test where oid = 3;
NOTICE: QUERY PLAN:

Seq Scan on test (cost=1936.05 size=1 width=8)

EXPLAIN
test=> create index i_test3 on test(oid oidint4ops);
ERROR: DefineIndex: oidint4ops class not found
test=> create index i_test3 on test(oid oidint4_ops);
vacuum ;
CREATE
test=> vacuum ;
VACUUM
test=> explain select * from test where oid = 3;
NOTICE: QUERY PLAN:

Seq Scan on test (cost=1936.05 size=1 width=8)

EXPLAIN

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#16Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#12)
Re: [HACKERS] Re: type coersion (was OR clause status)

#0 op_class (opno=1137, opclass=427, amopid=403) at lsyscache.c:58
oideqint4 oid_ops btree_am_oid

and it returns false because there is no access operator for oid_ops and
btree_am_oid that matches oideqint4.

The fundamental problem is that index scans are made to compare columns
all of the same type. That is how indexes are built and traversed.
Now, we want to bring in a constant of another type, and have it use an
index.

Sounds like if we add the proper pg_am functions for binary
compatability, the optimizer should then use the proper indexes.

I think I have found part of the cause. We have duplicate type
conversion functions, and the parser is choosing the one that is not in
the access method tables.

---------------------------------------------------------------------------

test=> select * from pg_operator where oid = 1137;
oprname|oprowner|oprprec|oprkind|oprisleft|oprcanhash|oprleft|oprright|oprresult|oprcom|oprnegate|oprlsortop|oprrsortop|oprcode
|oprrest|oprjoin
-------+--------+-------+-------+---------+----------+-------+--------+---------+------+---------+----------+----------+---------+-------+---------
= | 139| 0|b |t |t | 26| 23|
16| 1136| 0| 0| 0|oideqint4|eqsel
|eqjoinsel (1 row)

test=> select * from pg_operator where oid = 932;
oprname|oprowner|oprprec|oprkind|oprisleft|oprcanhash|oprleft|oprright|oprresult|oprcom|oprnegate|oprlsortop|oprrsortop|oprcode
|oprrest |oprjoin
-------+--------+-------+-------+---------+----------+-------+--------+---------+------+---------+----------+----------+---------+--------+------------
= | 139| 0|b |t |f | 910| 910|
16| 932| 935| 0| 0|oidint4eq|intltsel|intltjoinsel (1 row)

t

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#17Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#16)
Re: [HACKERS] Re: type coersion (was OR clause status)

test=> select * from pg_operator where oid = 1137;
oprname|oprowner|oprprec|oprkind|oprisleft|oprcanhash|oprleft|oprright|oprresult|oprcom|oprnegate|oprlsortop|oprrsortop|oprcode
|oprrest|oprjoin
-------+--------+-------+-------+---------+----------+-------+--------+---------+------+---------+----------+----------+---------+-------+---------
= | 139| 0|b |t |t | 26| 23|
16| 1136| 0| 0| 0|oideqint4|eqsel
|eqjoinsel (1 row)

test=> select * from pg_operator where oid = 932;
oprname|oprowner|oprprec|oprkind|oprisleft|oprcanhash|oprleft|oprright|oprresult|oprcom|oprnegate|oprlsortop|oprrsortop|oprcode
|oprrest |oprjoin
-------+--------+-------+-------+---------+----------+-------+--------+---------+------+---------+----------+----------+---------+--------+------------
= | 139| 0|b |t |f | 910| 910|
16| 932| 935| 0| 0|oidint4eq|intltsel|intltjoinsel (1 row)

I was wrong about the above entries. oidint4eq is for a custom type
that is made up of and oid and an int4. Strange but true. Probably
should be removed. Can't imagine why someone would use this. Probably
used when we didn't have multi-key indexes or something.

Anyway, here is a query that shows the operators defined for access
methods.

SELECT proname
FROM pg_operator, pg_amop,pg_proc
WHERE amopopr = pg_operator.oid AND
RegprocToOid(pg_operator.oprcode) = pg_proc.oid;

Perhaps if i add oid am functions to use the int4 functions, it would
work. int4int2eq assumes promotion of int2 to int4. Should work.

proname
----------------
chareq
chareq
nameeq
nameeq
int2eq
int2eq
int2lt
int4eq
int4eq
int4lt
texteq
texteq
poly_left
poly_overleft
poly_overright
poly_right
poly_contained
poly_contain
poly_same
poly_overlap
box_left
box_left
box_overleft
box_overleft
box_overright
box_overright
box_right
box_right
box_contained
box_contained
box_contain
box_contain
box_same
box_same
box_overlap
box_overlap
int2gt
int4gt
int2le
int4le
int2ge
int4ge
int24eq
int42eq
int24lt
int42lt
int24gt
int42gt
int24le
int42le
int24ge
int42ge
abstimeeq
abstimelt
abstimegt
abstimele
abstimege
oideq
oideq
int4lt
int4gt
int4le
int4ge
float4eq
float4eq
float4lt
float4gt
float4le
float4ge
charlt
charle
chargt
charge
namelt
namele
namegt
namege
text_lt
text_le
text_gt
text_ge
float8eq
float8eq
float8lt
float8le
float8gt
float8ge
oidnamelt
oidnamele
oidnameeq
oidnamege
oidnamegt
oidint2lt
oidint2le
oidint2eq
oidint2ge
oidint2gt
oidint4lt
oidint4le
oidint4eq
oidint4ge
oidint4gt
bpchareq
bpchareq
bpcharlt
bpcharle
bpchargt
bpcharge
varchareq
varchareq
varcharlt
varcharle
varchargt
varcharge
date_eq
date_eq
date_lt
date_le
date_gt
date_ge
time_eq
time_eq
time_lt
time_le
time_gt
time_ge
datetime_eq
datetime_eq
datetime_lt
datetime_le
datetime_gt
datetime_ge
timespan_eq
timespan_eq
timespan_lt
timespan_le
timespan_gt
timespan_ge
circle_left
circle_overleft
circle_overright
circle_right
circle_contained
circle_contain
circle_same
circle_overlap
(146 rows)

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#18Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#16)
Re: [HACKERS] Re: type coersion (was OR clause status)

I think I have found part of the cause. We have duplicate type
conversion functions, and the parser is choosing the one that is not
in the access method tables.

I don't think so for this case; I got stuck on this for awhile too. It
seems that "oidint4" is an actual data type, so has an "oidint4eq"
comparison function. The parser/planner/optimizer is finding the correct
functions, which are "oideqint4" and "int4eqoid".

Don't know what "oidint4" actually does or how it's used. Confusing...

- Tom

---------------------------------------------------------------------------

Show quoted text

test=> select * from pg_operator where oid = 1137;
oprname|oprowner|oprprec|oprkind|oprisleft|oprcanhash|oprleft|oprright|oprresult|oprcom|oprnegate|oprlsortop|oprrsortop|oprcode
|oprrest|oprjoin
-------+--------+-------+-------+---------+----------+-------+--------+---------+------+---------+----------+----------+---------+-------+---------
= | 139| 0|b |t |t | 26| 23|
16| 1136| 0| 0| 0|oideqint4|eqsel
|eqjoinsel (1 row)

test=> select * from pg_operator where oid = 932;
oprname|oprowner|oprprec|oprkind|oprisleft|oprcanhash|oprleft|oprright|oprresult|oprcom|oprnegate|oprlsortop|oprrsortop|oprcode
|oprrest |oprjoin
-------+--------+-------+-------+---------+----------+-------+--------+---------+------+---------+----------+----------+---------+--------+------------
= | 139| 0|b |t |f | 910| 910|
16| 932| 935| 0| 0|oidint4eq|intltsel|intltjoinsel (1 row)

#19Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#18)
Re: [HACKERS] Re: type coersion (was OR clause status)

I think I have found part of the cause. We have duplicate type
conversion functions, and the parser is choosing the one that is not
in the access method tables.

I don't think so for this case; I got stuck on this for awhile too. It
seems that "oidint4" is an actual data type, so has an "oidint4eq"
comparison function. The parser/planner/optimizer is finding the correct
functions, which are "oideqint4" and "int4eqoid".

Don't know what "oidint4" actually does or how it's used. Confusing...

OK, new information. The following query shows the pg_proc names for
oid.*int4:

select pg_operator.oid as pg_operator_oid, pg_proc.proname from pg_proc,
pg_operator where proname ~ 'oid.*int4' and pg_proc.oid =
RegprocToOid(pg_operator.oprcode)

pg_operator_oid|proname
---------------+---------
1137|oideqint4
930|oidint4lt
931|oidint4le
932|oidint4eq
933|oidint4ge
934|oidint4gt
935|oidint4ne
(7 rows)

The pg_operator_oid is the value stored in pg_amop.amopopr field. The
optimizer is finding a function that works for the query, in our case
oideqint4, with a pg_operator oid of 1137.

The problem is that there is no pg_am.amopopr for 1137. What is defined
for oid is this:

/*
* nbtree oid_ops
*/

DATA(insert OID = 0 ( 403 427 609 1 btreesel btreenpage ));
DATA(insert OID = 0 ( 403 427 611 2 btreesel btreenpage ));
DATA(insert OID = 0 ( 403 427 607 3 btreesel btreenpage ));
DATA(insert OID = 0 ( 403 427 612 4 btreesel btreenpage ));
DATA(insert OID = 0 ( 403 427 610 5 btreesel btreenpage ));

For example, the third number in the third line is 607, which is the
amopopr, and is oideq. We know we can use this for the query because
int4 and oid are identical, but the parser has already chosen the more
appopriate 1137/oideqint4. If we could add an extra line to this file,
perhaps:

DATA(insert OID = 0 ( 403 427 607 3 btreesel btreenpage ));

However, I don't think the access methods allow more than one line in
this way.

The other problem is that there is not a <, >, etc for int4/oid, so a
separate op type can not be created. In fact, I am not sure it would
work anyway. I think the operator oid chosen by the backend must
EXACTLY match the oid stored in pg_amop.amopopr for the index to be
used.

That is what match_clause_to_indexkey is doing by calling op_class().

[Update you source trees. The op_class parameter names were wrong,
called opid instead of opno, so it was very confusing. Fixed now.]

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#20Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#19)
Re: [HACKERS] Re: type coersion (was OR clause status)

For example, the third number in the third line is 607, which is the
amopopr, and is oideq. We know we can use this for the query because
int4 and oid are identical, but the parser has already chosen the more
appopriate 1137/oideqint4. If we could add an extra line to this file,
perhaps:

DATA(insert OID = 0 ( 403 427 607 3 btreesel btreenpage ));

However, I don't think the access methods allow more than one line in
this way.

I just tried adding the extra line, and initdb failed. I wonder if we
remove the oideqint4, if the parser will go for oideq?

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#21Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#20)
Re: [HACKERS] Re: type coersion (was OR clause status)

I just tried adding the extra line, and initdb failed. I wonder if we
remove the oideqint4, if the parser will go for oideq?

If you want to try this as an experiment, so we can tell if this is a
possible solution, then great. I still have hopes that we can substitute
other parse trees and strategies within the index utilization routines,
so how about not committing things until we've tried a few options.

match_clause_to_indexkey() seems to be a starting point for what I want
to do. Will let you know how it goes...

- Tom

#22Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#21)
Re: [HACKERS] Re: type coersion (was OR clause status)

I just tried adding the extra line, and initdb failed. I wonder if we
remove the oideqint4, if the parser will go for oideq?

If you want to try this as an experiment, so we can tell if this is a
possible solution, then great. I still have hopes that we can substitute
other parse trees and strategies within the index utilization routines,
so how about not committing things until we've tried a few options.

match_clause_to_indexkey() seems to be a starting point for what I want
to do. Will let you know how it goes...

OK. Check out my other posts on how op_class/pg_amop.amopopr is used to
look up if the current expression operator is a member of the
class(btree), and index opclass(oid_ops). I could probably code
something so you could put multiple entries in pg_amop. My issue is
that there are not that many interchangable operators to make that
useful.

Now, if you are thinking of doing some type of constant conversion
there, that may be a bigger win. The issue is at this point in the
code, there is no guarentee the index will be used. It is just checking
the index usability.

I think I see where you are going with this. For queries not involving
indexes(many joins don't), it may be better to leave things for the
executor.

Also, if you can, do a fresh cvs update if you can, because the code is
a little cleaner now in that area. Particularly, the op_class()
parameters were badly named, causing me confusion.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#23Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#20)
Re: [HACKERS] Re: type coersion (was OR clause status)

However, I don't think the access methods allow more than one line in
this way.

I just tried adding the extra line, and initdb failed. I wonder if we
remove the oideqint4, if the parser will go for oideq?

One other item. oid indexes use int4lt(), etc. for their internal
indexing, not oidlt() because they do not exist.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#24Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#22)
Re: [HACKERS] Re: type coersion (was OR clause status)

match_clause_to_indexkey() seems to be a starting point for what I
want to do. Will let you know how it goes...

Check out my other posts on how op_class/pg_amop.amopopr is used to
look up if the current expression operator is a member of the
class(btree), and index opclass(oid_ops). I could probably code
something so you could put multiple entries in pg_amop. My issue is
that there are not that many interchangable operators to make that
useful.
Now, if you are thinking of doing some type of constant conversion
there, that may be a bigger win. The issue is at this point in the
code, there is no guarentee the index will be used. It is just
checking the index usability.
I think I see where you are going with this. For queries not
involving indexes(many joins don't), it may be better to leave things
for the executor.

The first thing I want to try is to substitute the operator for types
which are known to be binary-compatible and do not have their own index
defined. The next step would be to substitute an operator _and_ insert
an explicit type conversion using a function call (which may not work
yet for other reasons). Don't know where Vadim's PARAM_EXEC node comes
in, but if it does it might be around here.

... if you can, do a fresh cvs update if you can, because the code is
a little cleaner now in that area. Particularly, the op_class()
parameters were badly named, causing me confusion.

OK. Does the current source tree compile now? Haven't had much time to
look at it the last few days, but will start poking at it this evening a
bit.

- Tom

#25Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#24)
Re: [HACKERS] Re: type coersion (was OR clause status)

The first thing I want to try is to substitute the operator for types
which are known to be binary-compatible and do not have their own index
defined. The next step would be to substitute an operator _and_ insert
an explicit type conversion using a function call (which may not work
yet for other reasons). Don't know where Vadim's PARAM_EXEC node comes
in, but if it does it might be around here.

... if you can, do a fresh cvs update if you can, because the code is
a little cleaner now in that area. Particularly, the op_class()
parameters were badly named, causing me confusion.

OK. Does the current source tree compile now? Haven't had much time to
look at it the last few days, but will start poking at it this evening a
bit.

Yep. Only been down for a few hours in the past month, at least for me.
I am working on another huge patch, but it will be tested before
install, of course.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#26Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#25)
Re: [HACKERS] Re: type coersion (was OR clause status)

The first thing I want to try is to substitute the operator for
types which are known to be binary-compatible and do not have their
own index defined.

Got a start on it :)

regression=> explain select * from tenk1 where oid = 3000;
NOTICE: QUERY PLAN:

Index Scan using tenk1_oid on tenk1 (cost=2.05 size=1 width=148)

EXPLAIN

So far, I've just done the right-hand form (the one with the constant on
the rhs of the expression). The left-hand form should be easy now.

So, just curious: if we stop here, and only match up binary-compatible
built-in types with available indices, then what v6.3.2 features/good
behaviors are still missing?

- Tom

The example is from the regression test database with an extra btree
index built on the tenk1 table...

regression=> \d tenk1

Table    = tenk1
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                |
Length|
+----------------------------------+----------------------------------+-------+
| unique1                          | int4                            
|     4 |
| unique2                          | int4                            
|     4 |
| two                              | int4                            
|     4 |
| four                             | int4                            
|     4 |
| ten                              | int4                            
|     4 |
| twenty                           | int4                            
|     4 |
| hundred                          | int4                            
|     4 |
| thousand                         | int4                            
|     4 |
| twothousand                      | int4                            
|     4 |
| fivethous                        | int4                            
|     4 |
| tenthous                         | int4                            
|     4 |
| odd                              | int4                            
|     4 |
| even                             | int4                            
|     4 |
| stringu1                         | name                            
|    32 |
| stringu2                         | name                            
|    32 |
| string4                          | name                            
|    32 |
+----------------------------------+----------------------------------+-------+
Indices:  tenk1_hundred
          tenk1_oid
          tenk1_unique1
          tenk1_unique2
regression=>
#27Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#26)
Re: [HACKERS] Re: type coersion (was OR clause status)

The first thing I want to try is to substitute the operator for
types which are known to be binary-compatible and do not have their
own index defined.

Got a start on it :)

regression=> explain select * from tenk1 where oid = 3000;
NOTICE: QUERY PLAN:

Index Scan using tenk1_oid on tenk1 (cost=2.05 size=1 width=148)

EXPLAIN

So far, I've just done the right-hand form (the one with the constant on
the rhs of the expression). The left-hand form should be easy now.

So, just curious: if we stop here, and only match up binary-compatible
built-in types with available indices, then what v6.3.2 features/good
behaviors are still missing?

Did you put it in the optimizer?

I think binary compatable types converted is going to be the easiest
thing to do for index use. Not sure how you could try and break it.
How about character string comparisons using indexes?

| 32 |
+----------------------------------+----------------------------------+-------+
Indices: tenk1_hundred
tenk1_oid
tenk1_unique1
tenk1_unique2

^^^^^^^^^^^^^^^^^^^^^^^^^

How about that new display?

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#28Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#27)
Re: [HACKERS] Re: type coersion (was OR clause status)

The first thing I want to try is to substitute the operator for
types which are known to be binary-compatible and do not have
their own index defined.

Got a start on it :)
regression=> explain select * from tenk1 where oid = 3000;
Index Scan using tenk1_oid on tenk1 (cost=2.05 size=1 width=148)
So, just curious: if we stop here, and only match up
binary-compatible built-in types with available indices, then what
v6.3.2 features/good behaviors are still missing?

Did you put it in the optimizer?

It is inside backend/optimizer/path/indxpath.c. I'm using a bit of the
parser support code to help out. It has to be where the backend actually
is checking to see if an index is usable, which is in the optimization
step. Any earlier and we would have to look-ahead at the indices which
seems inappropriate.

I think binary compatable types converted is going to be the easiest
thing to do for index use. Not sure how you could try and break it.
How about character string comparisons using indexes?

Will try some more tests, but it seems like it will be hard to break
since it only comes into effect with built-in datatypes which are
supposed to be binary compatible.

It would be interesting to try to do constant expressions and function
calls on constants next, though I'm thinking that it isn't required for
v6.4.

Vadim, will the executor know how to use a PARAM_EXEC node in any
context, or will we have to do some coding to get it recognized outside
of subselects? I'll need to figure out how to build one too, I
suppose...

- Tom

#29Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#27)
Re: [HACKERS] Re: type coersion (was OR clause status)

Thomas G. Lockhart wrote:

The first thing I want to try is to substitute the operator for
types which are known to be binary-compatible and do not have
their own index defined.

Got a start on it :)
regression=> explain select * from tenk1 where oid = 3000;
Index Scan using tenk1_oid on tenk1 (cost=2.05 size=1 width=148)
So, just curious: if we stop here, and only match up
binary-compatible built-in types with available indices, then what
v6.3.2 features/good behaviors are still missing?

Did you put it in the optimizer?

It is inside backend/optimizer/path/indxpath.c. I'm using a bit of the
parser support code to help out. It has to be where the backend actually
is checking to see if an index is usable, which is in the optimization
step. Any earlier and we would have to look-ahead at the indices which
seems inappropriate.

I think binary compatable types converted is going to be the easiest
thing to do for index use. Not sure how you could try and break it.
How about character string comparisons using indexes?

Will try some more tests, but it seems like it will be hard to break
since it only comes into effect with built-in datatypes which are
supposed to be binary compatible.

OK, I've just committed to the source tree changes for looking for
binary-compatible indices when either the left- or right-hand side of
the restriction clause is a constant.

I've #ifndef'd them so we can disable it if necessary. But, all of the
regression tests pass, except for the select_view test, which has been
core dumping for weeks. Anyone else seeing that, or is it just me? :(

- Tom

#30Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#27)
Re: [HACKERS] Re: type coersion (was OR clause status)

Thomas G. Lockhart wrote:

It is inside backend/optimizer/path/indxpath.c. I'm using a bit of the
parser support code to help out. It has to be where the backend actually
is checking to see if an index is usable, which is in the optimization
step. Any earlier and we would have to look-ahead at the indices which
seems inappropriate.

Just let me note that function calls on constants is problem not
only for indices using. Call lower() for each tuple in
WHERE a = lower('bbb') is always bad - lower() eats memory...

I think binary compatable types converted is going to be the easiest
thing to do for index use. Not sure how you could try and break it.
How about character string comparisons using indexes?

Parser could use type_in()/type_out() funcs to do type
coersion...

Will try some more tests, but it seems like it will be hard to break
since it only comes into effect with built-in datatypes which are
supposed to be binary compatible.

It would be interesting to try to do constant expressions and function
calls on constants next, though I'm thinking that it isn't required for
v6.4.

Vadim, will the executor know how to use a PARAM_EXEC node in any
context, or will we have to do some coding to get it recognized outside
of subselects? I'll need to figure out how to build one too, I
suppose...

I'm not sure... But imho, PARAM_EXEC could be usefull for
now() etc funcs - for non-variant funcs I would suggest
just evaluate them in parser...

Vadim

#31Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#27)
Re: [HACKERS] Re: type coersion (was OR clause status)

Just let me note that function calls on constants is problem not
only for indices using. Call lower() for each tuple in
WHERE a = lower('bbb') is always bad - lower() eats memory...

Sure, so we want to evaluate as a constant. I've found a place in the
code where a function gets evaluated immediately; don't remember where
but it looked like an interesting possibility.

How about character string comparisons using indexes?

Parser could use type_in()/type_out() funcs to do type
coersion...

That is what it used to try to do. But I don't think that generalizes
very well. For example, at the moment floating point numbers without
fractional parts are printed without a decimal point or trailing digits,
but we could decide to format them with a ".0" at the end. Then they
couldn't be converted to an integer...

Vadim, will the executor know how to use a PARAM_EXEC node in any
context, or will we have to do some coding to get it recognized
outside of subselects? I'll need to figure out how to build one too,
I suppose...

I'm not sure... But imho, PARAM_EXEC could be usefull for
now() etc funcs - for non-variant funcs I would suggest
just evaluate them in parser...

But it could be used for every function called with constants, right? If
it works for everything, why bother with other special cases?

- Tom

#32Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#29)
Re: [HACKERS] Re: type coersion (was OR clause status)

OK, I've just committed to the source tree changes for looking for
binary-compatible indices when either the left- or right-hand side of
the restriction clause is a constant.

I've #ifndef'd them so we can disable it if necessary. But, all of the
regression tests pass, except for the select_view test, which has been
core dumping for weeks. Anyone else seeing that, or is it just me? :(

Is this fixed? Are there any open problems with the regress tests?

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)