Index optimization ?

Started by Bo Lorentsenabout 21 years ago37 messagesgeneral

bl@netgroup.dk

about 21 years ago

Hi ...

In my quest to get rid of the oid dependency, i have made some new low
level code with the help from many nice people from this community
(thanks for that), but I still have one somewhat big problem.

I am running PG 7.4.6, btw.

I have a "sale" table that have a BIGSERIAL as primary key, but explain
keeps telling me (and so does the performance) that it will perform a
"seq scan" of my table when performing this statement (not using the
pkey index) :

select * from sale where id = currval( 'sale_id_seq' );

I tried this too :

select * from sale where id = currval( 'sale_id_seq' )::bigint;

But this still did not work (still using seq scan) :-(

At last I did a :

explain select * from sale where id = 42::bigint;

Just to make sure, and this made the proper optimizations (used the pkey
index).

What have I done wrong, or do PG still have some casting problems in the
optimizer ?

/BL

tgl@sss.pgh.pa.us

about 21 years ago

In reply to: Bo Lorentsen (#1)

Re: Index optimization ?

Bo Lorentsen <bl@netgroup.dk> writes:

select * from sale where id = currval( 'sale_id_seq' );

This is not legally optimizable into an indexscan, because currval() is
a volatile function. (It's easy to construct cases where its value
actually does change from row to row --- just use a nextval() as well.)

You can fake it out in a couple of ways --- the recommended method is to
wrap currval in a user-defined function that is misleadingly marked
stable. I think it still works to just put the call in a sub-select:
select * from sale where id = (select currval( 'sale_id_seq' ));
but I take no responsibility if future improvements in the planner break
that trick.

regards, tom lane

mike@fuhr.org

about 21 years ago

In reply to: Bo Lorentsen (#1)

Re: Index optimization ?

On Sat, Jan 15, 2005 at 07:03:43PM +0100, Bo Lorentsen wrote:

select * from sale where id = currval( 'sale_id_seq' )::bigint;

But this still did not work (still using seq scan) :-(

currval() is volatile. See "Function Volatility Categories" in the
"Extending SQL" chapter of the documentation and search the list
archives for past discussion of currval()'s volatility.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Jim.Nasby@BlueTreble.com

about 21 years ago

In reply to: Tom Lane (#2)

Re: Index optimization ?

On Sat, Jan 15, 2005 at 01:27:49PM -0500, Tom Lane wrote:

Bo Lorentsen <bl@netgroup.dk> writes:

select * from sale where id = currval( 'sale_id_seq' );

This is not legally optimizable into an indexscan, because currval() is
a volatile function. (It's easy to construct cases where its value
actually does change from row to row --- just use a nextval() as well.)

You can fake it out in a couple of ways --- the recommended method is to
wrap currval in a user-defined function that is misleadingly marked
stable. I think it still works to just put the call in a sub-select:
select * from sale where id = (select currval( 'sale_id_seq' ));
but I take no responsibility if future improvements in the planner break
that trick.

Would it make sense to have a version of currval that will only return
one value in a statement/transaction? So the first time it's called it
remembers what currval for that sequence is and always returns the same
value?
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

alvherre@dcc.uchile.cl

about 21 years ago

In reply to: Jim Nasby (#4)

Re: Index optimization ?

On Sat, Jan 15, 2005 at 03:11:22PM -0600, Jim C. Nasby wrote:

Would it make sense to have a version of currval that will only return
one value in a statement/transaction? So the first time it's called it
remembers what currval for that sequence is and always returns the same
value?

What would nextval() do in that case? Return the nextval on the first
call, and act like currval() from then until the end of the transaction?

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... ��Qui��n es el machito que tendr��a carnet?" (Mafalda)

bl@netgroup.dk

about 21 years ago

In reply to: Tom Lane (#2)

Re: Index optimization ?

Tom Lane wrote:

This is not legally optimizable into an indexscan, because currval() is
a volatile function. (It's easy to construct cases where its value
actually does change from row to row --- just use a nextval() as well.)

I am not sure what you mean by a "volatile function", and how this
affect the returned types, I guess this demands some more low level PG
knowledge to understand.

You can fake it out in a couple of ways --- the recommended method is to
wrap currval in a user-defined function that is misleadingly marked
stable. I think it still works to just put the call in a sub-select:
select * from sale where id = (select currval( 'sale_id_seq' ));

but I take no responsibility if future improvements in the planner break
that trick.

The select trick works just fine, but I don't understand why :-(

Do you have any idea to how I may learn more about function types, or is
this a "read the source, luke" thing (I am not sure I have time for that
right now) ?

Well, thanks anyway ... this just work so nice, and I a looking forward
to fill up this database with plenty of data, and still being able to
sleep :-)

/BL

bl@netgroup.dk

about 21 years ago

In reply to: Michael Fuhr (#3)

Re: Index optimization ?

Michael Fuhr wrote:

currval() is volatile. See "Function Volatility Categories" in the
"Extending SQL" chapter of the documentation and search the list
archives for past discussion of currval()'s volatility.

Hmm, I can't find that chapter in the 7.4 manual, or am I looking the
wrong place ? I really like to understand this !

/BL

tgl@sss.pgh.pa.us

about 21 years ago

In reply to: Bo Lorentsen (#6)

Re: Index optimization ?

Bo Lorentsen <bl@netgroup.dk> writes:

Do you have any idea to how I may learn more about function types, or is
this a "read the source, luke" thing (I am not sure I have time for that
right now) ?

http://developer.postgresql.org/docs/postgres/xfunc-volatility.html

regards, tom lane

mike@fuhr.org

about 21 years ago

In reply to: Bo Lorentsen (#7)

Re: Index optimization ?

On Sat, Jan 15, 2005 at 11:28:08PM +0100, Bo Lorentsen wrote:

Michael Fuhr wrote:

currval() is volatile. See "Function Volatility Categories" in the
"Extending SQL" chapter of the documentation and search the list
archives for past discussion of currval()'s volatility.

Hmm, I can't find that chapter in the 7.4 manual, or am I looking the
wrong place ? I really like to understand this !

Sorry...apparently that section is in the development documentation
but not in the 7.x doc. Tom Lane posted the link but here it is
again:

http://developer.postgresql.org/docs/postgres/xfunc-volatility.html

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Jim.Nasby@BlueTreble.com

about 21 years ago

In reply to: Alvaro Herrera (#5)

Re: Index optimization ?

On Sat, Jan 15, 2005 at 06:34:11PM -0300, Alvaro Herrera wrote:

On Sat, Jan 15, 2005 at 03:11:22PM -0600, Jim C. Nasby wrote:

Would it make sense to have a version of currval that will only return
one value in a statement/transaction? So the first time it's called it
remembers what currval for that sequence is and always returns the same
value?

What would nextval() do in that case? Return the nextval on the first
call, and act like currval() from then until the end of the transaction?

I'm not sure which would be best. It could either do what you suggest,
or it could operate as normal, or it could possibly throw an error if
run a second time.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

bl@netgroup.dk

about 21 years ago

In reply to: Tom Lane (#8)

Re: Index optimization ?

Tom Lane wrote:

http://developer.postgresql.org/docs/postgres/xfunc-volatility.html

Ok, thanks I see why there is these three differant function types, but
I don't quite understand why the value from a volatile function, can't
be used as a index key. Is this because there is no return type garanti,
for the voilatile function too ?

Will the only possible way to fix this be to make a volatile function
with a return type (I know this is not possible now, but in theory) ?

/BL

Michael Glaesemann

grzm@seespotcode.net

about 21 years ago

In reply to: Bo Lorentsen (#11)

Re: Index optimization ?

On Jan 17, 2005, at 0:25, Bo Lorentsen wrote:

Tom Lane wrote:

http://developer.postgresql.org/docs/postgres/xfunc-volatility.html

Ok, thanks I see why there is these three differant function types,
but I don't quite understand why the value from a volatile function,
can't be used as a index key. Is this because there is no return type
garanti, for the voilatile function too ?

I don't believe it has necessarily anything to do with the return type,
but rather the return value. An index only works if you know what the
value is, and the return value for a volatile function is not
guaranteed to be the same for given parameters. Here's a contrived (and
untestsd) example, but one I think makes it clear:

CREATE FUNCTION plus_random ( INTEGER )
RETURNS INTEGER
LANGUAGE SQL AS '
SELECT round( $1 + random() * 100 );
';

One could conceivably attempt to make a functional index using
plus_random(), but the result it gives every time is indeterminant. How
would you be able to usefully search for values in an index that is
based on this function? Would it make sense do to do so?

Does this help? (And if I'm completely off base, someone please let me
know :)

Michael Glaesemann
grzm myrealbox com

bl@netgroup.dk

about 21 years ago

In reply to: Michael Glaesemann (#12)

Re: Index optimization ?

Michael Glaesemann wrote:

I don't believe it has necessarily anything to do with the return
type, but rather the return value. An index only works if you know
what the value is, and the return value for a volatile function is not
guaranteed to be the same for given parameters. Here's a contrived
(and untestsd) example, but one I think makes it clear:

CREATE FUNCTION plus_random ( INTEGER )
RETURNS INTEGER
LANGUAGE SQL AS '
SELECT round( $1 + random() * 100 );
';

One could conceivably attempt to make a functional index using
plus_random(), but the result it gives every time is indeterminant.
How would you be able to usefully search for values in an index that
is based on this function? Would it make sense do to do so?

What you say is that PG can't see the difference between this
"plus_random" and the "currval", right.

But if I have a select (a quite strange one), like this :

SELECT * FROM test_table WHERE id = plus_random( test_col );

I don't understand the problem. The function always return an integer as
specified in the function decl. so why not use the PK index for search,
instead of using seq scan ? The value is totally unpredictable but it is
still an integer and the pk index is still useful regarding performance !

I know there is something I don't understand, so I just have to ask :-)

Does this help? (And if I'm completely off base, someone please let me
know :)

No this time I think missed the point, I understand the volatility of
functions, so the planer know what to expect from the function,
regarding side effect, but I still don't understand why this influences
the choice of valid indexes.

/BL

Ragnar Hafstað

gnari@simnet.is

about 21 years ago

In reply to: Bo Lorentsen (#11)

Re: Index optimization ?

On Sun, 2005-01-16 at 16:25 +0100, Bo Lorentsen wrote:
[about a volatile function in a where clause not generating index scan]

Will the only possible way to fix this be to make a volatile function
with a return type (I know this is not possible now, but in theory) ?

this has nothing to do with the return type. a volatile function is a
function that is not garanteed to return the same value given same
input parameters, (such as currval()).

when a volatile function is used thus:
SELECT * FROM mytable WHERE col=myvolatilefunc();
the planner must call the function once per table row, and assume
possibly different return values each time, so an indexscan will
not improve timings.

on the other hand, if the function is labeled STABLE, the planner
can assume that the same value will alway be returned, so only
one call to it can be made, and an indexscan might be found the
most effective.

hope this helps

gnari

bl@netgroup.dk

about 21 years ago

In reply to: Ragnar Hafstað (#14)

Re: Index optimization ?

Ragnar Hafstaï¿½ wrote:

this has nothing to do with the return type. a volatile function is a
function that is not garanteed to return the same value given same
input parameters, (such as currval()).

when a volatile function is used thus:
SELECT * FROM mytable WHERE col=myvolatilefunc();
the planner must call the function once per table row, and assume
possibly different return values each time, so an indexscan will
not improve timings.

Why not use the index scan for every row, is this a "limit" in the
planner ? I think there is something in the planner I don't understand :-)

on the other hand, if the function is labeled STABLE, the planner
can assume that the same value will alway be returned, so only
one call to it can be made, and an indexscan might be found the
most effective.

The two other function types are not interesting, but I don't understand
the planners use of index optimization.

/BL

Martijn van Oosterhout

kleptog@svana.org

about 21 years ago

In reply to: Bo Lorentsen (#13)

Re: Index optimization ?

On Sun, Jan 16, 2005 at 05:30:22PM +0100, Bo Lorentsen wrote:

One could conceivably attempt to make a functional index using
plus_random(), but the result it gives every time is indeterminant.
How would you be able to usefully search for values in an index that
is based on this function? Would it make sense do to do so?

What you say is that PG can't see the difference between this
"plus_random" and the "currval", right.

But if I have a select (a quite strange one), like this :

SELECT * FROM test_table WHERE id = plus_random( test_col );

I don't understand the problem. The function always return an integer as
specified in the function decl. so why not use the PK index for search,
instead of using seq scan ? The value is totally unpredictable but it is
still an integer and the pk index is still useful regarding performance !

No, it depends on your interpretation of the query. Note, I'm not up
with the SQL standard so maybe it doesn't work like this, but this is
what I think the problem is.

The above query can be interpreted as: for each row in test_table,
compare id against plus_random( test_col ). Now, in theory the
plus_random function needs to be evaluated for every row, each time
giving a different value, thus it may or may not match id.

You can see that with that interpretation an index on id doesn't help.
If you interpret the query so plus_random is evaluted only once, then
an index will help. If test_col is a column of the table then there is
no way an index can help you.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

Ragnar Hafstað

gnari@simnet.is

about 21 years ago

In reply to: Bo Lorentsen (#15)

Re: Index optimization ?

On Sun, 2005-01-16 at 17:45 +0100, Bo Lorentsen wrote:

Ragnar Hafstaï¿½ wrote:

when a volatile function is used thus:
SELECT * FROM mytable WHERE col=myvolatilefunc();
the planner must call the function once per table row, and assume
possibly different return values each time, so an indexscan will
not improve timings.

Why not use the index scan for every row, is this a "limit" in the
planner ? I think there is something in the planner I don't understand :-)

the planner will just use the plan it estimates will be fastest.
because of how indexscans work in postgresql, in this case it would be
slower than a tablescan (assuming the function really is volatile)

gnari

Roman Neuhauser

neuhauser@chello.cz

about 21 years ago

In reply to: Martijn van Oosterhout (#16)

Re: Index optimization ?

# kleptog@svana.org / 2005-01-16 17:48:08 +0100:

On Sun, Jan 16, 2005 at 05:30:22PM +0100, Bo Lorentsen wrote:

One could conceivably attempt to make a functional index using
plus_random(), but the result it gives every time is indeterminant.
How would you be able to usefully search for values in an index that
is based on this function? Would it make sense do to do so?

What you say is that PG can't see the difference between this
"plus_random" and the "currval", right.

But if I have a select (a quite strange one), like this :

SELECT * FROM test_table WHERE id = plus_random( test_col );

I don't understand the problem. The function always return an integer as
specified in the function decl. so why not use the PK index for search,
instead of using seq scan ? The value is totally unpredictable but it is
still an integer and the pk index is still useful regarding performance !

No, it depends on your interpretation of the query. Note, I'm not up
with the SQL standard so maybe it doesn't work like this, but this is
what I think the problem is.

The above query can be interpreted as: for each row in test_table,
compare id against plus_random( test_col ).

That's what happens if you declare the function VOLATILE.
Make it STABLE, and the function call will be evaluated only once
for the whole table scan. That's just what Tom Lane suggested in
his post.

--
If you cc me or remove the list(s) completely I'll most likely ignore
your message. see http://www.eyrie.org./~eagle/faqs/questions.html

tgl@sss.pgh.pa.us

about 21 years ago

In reply to: Ragnar Hafstað (#17)

Re: Index optimization ?

Ragnar =?ISO-8859-1?Q?Hafsta=F0?= <gnari@simnet.is> writes:

On Sun, 2005-01-16 at 17:45 +0100, Bo Lorentsen wrote:

Why not use the index scan for every row, is this a "limit" in the
planner ? I think there is something in the planner I don't understand :-)

the planner will just use the plan it estimates will be fastest.
because of how indexscans work in postgresql, in this case it would be
slower than a tablescan (assuming the function really is volatile)

It has nothing to do with speed, it has to do with giving the correct
answer. We define "correct answer" as being the result you would get
from a naive interpretation of the SQL semantics --- that is, for every
row in the FROM table, actually execute the WHERE clause, and return the
rows where it produces TRUE.

As an example, a query like
SELECT * FROM mytable WHERE random() < 0.1;
should produce a random sampling of about one-tenth of the rows in mytable.
If we evaluated random() only once in this query, we would get either
all or none of the rows, clearly not the right answer.

An indexscan is a legal optimization only if the function(s) in the
WHERE clause are all STABLE or better. This is because the index access
code will only evaluate the righthand side of the "indexcol = something"
clause once, and then will use that value to descend the btree and
select matching index entries. We must be certain that this gives the
same result we would get from a seqscan.

The definition of STABLE that PostgreSQL uses was crafted specifically
to capture the property that a function is safe to use in an indexscan
qualification ...

regards, tom lane

bl@netgroup.dk

about 21 years ago

In reply to: Martijn van Oosterhout (#16)

Re: Index optimization ?

Martijn van Oosterhout wrote:

No, it depends on your interpretation of the query. Note, I'm not up
with the SQL standard so maybe it doesn't work like this, but this is
what I think the problem is.

I just try to learn, so that is ok :-) Tom gave me a solution that
works, so now I struggle to understand why.

The above query can be interpreted as: for each row in test_table,
compare id against plus_random( test_col ). Now, in theory the
plus_random function needs to be evaluated for every row, each time
giving a different value, thus it may or may not match id.

But if you take a look at a function, it has a return type. So "currval"
always returns a BIGINT no matter what kind of parameters are given,
that is a part of the declaration, as far as I can see. Why are this
type info not used to match an index, as the type is the same no matter
what row we are in, or no matter its parameter value (or context). The
value change, but not the type. The type is used to find a matching
index is it not ?

Am I misunderstanding you ?

You can see that with that interpretation an index on id doesn't help.

No, I think this is the problem, I don't see :-) The function promise to
return a certain type, and type can be used to find the prober index (if
any).

If you interpret the query so plus_random is evaluted only once, then
an index will help. If test_col is a column of the table then there is
no way an index can help you.

If and only if the function returns a different value TYPE, otherwise it
can use the same index but with different values, of the same type alias
use index scan.

But again, I am sure there is something I have misunderstud :-)

Thanks for trying :-)

/BL

bl@netgroup.dk

about 21 years ago

In reply to: Tom Lane (#19)

Ragnar Hafstað

gnari@simnet.is

about 21 years ago

In reply to: Tom Lane (#19)

fgp@phlo.org

about 21 years ago

In reply to: Bo Lorentsen (#21)

tgl@sss.pgh.pa.us

about 21 years ago

In reply to: Bo Lorentsen (#21)

bruce@momjian.us

about 21 years ago

In reply to: Florian Pflug (#23)

bl@netgroup.dk

about 21 years ago

In reply to: Tom Lane (#24)

bl@netgroup.dk

about 21 years ago

In reply to: Florian Pflug (#23)

bl@netgroup.dk

about 21 years ago

In reply to: Bruce Momjian (#25)

Frank D. Engel, Jr.

fde101@fjrhome.net

about 21 years ago

In reply to: Bo Lorentsen (#26)

tgl@sss.pgh.pa.us

about 21 years ago

In reply to: Bo Lorentsen (#26)

bruce@momjian.us

about 21 years ago

In reply to: Bo Lorentsen (#27)

fgp@phlo.org

about 21 years ago

In reply to: Bo Lorentsen (#28)

Jim.Nasby@BlueTreble.com

about 21 years ago

In reply to: Florian Pflug (#32)

alvherre@dcc.uchile.cl

about 21 years ago

In reply to: Jim Nasby (#33)

Jim.Nasby@BlueTreble.com

about 21 years ago

In reply to: Alvaro Herrera (#34)

bl@netgroup.dk

about 21 years ago

In reply to: Florian Pflug (#32)

bl@netgroup.dk

about 21 years ago

In reply to: Bruce Momjian (#31)