Mechanics of Select

Started by Willem Buitendykabout 18 years ago10 messagesgeneral
Jump to latest
#1Willem Buitendyk
willem@pcfish.ca

I have the following function that returns the first day of the next
month from whatever date is inserted. If I use this as part of a select
statement then it takes almost twice as long to perform. Is this
because for each scanned record this function is being called? If so
any ideas how I could make this only occur once?

For instance:

select * from track where datetime >= '2007-04-01' and datetime <
'2007-05-01'; takes about 30 ms to return 650K rows.

select * from track where datetime >= '2007-04-01' and datetime <
first_day_next_month'2007-04-01'; takes about 55 ms to return 650K rows

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
RETURNS date AS
$BODY$
declare
inputmonth1 integer;
inputyear1 integer;
inputmonth2 integer;
inputyear2 integer;
resultdate date;
BEGIN
inputmonth1 = extract(month from inputdate)::integer;
inputyear1 = extract(year from inputdate)::integer;

if inputmonth1 = 12 then
inputyear2 = inputyear1 + 1;
else
inputyear2 = inputyear1;
end if;

if inputmonth1 = 12 then
inputmonth2 = 1;
else
inputmonth2 = inputmonth1 + 1;
end if;

resultdate = (inputyear2)::text || '-' || (inputmonth2)::text || '-' ||
'01';
resultdate = to_date(resultdate::text,'yyyy-MM-DD');

RETURN resultdate;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;

#2Greg Smith
gsmith@gregsmith.com
In reply to: Willem Buitendyk (#1)
Re: Mechanics of Select

On Sun, 10 Feb 2008, Willem Buitendyk wrote:

I have the following function that returns the first day of the next month
from whatever date is inserted.

See if you can do this with date_trunc instead to avoid calling a
function, which avoids the whole thing. The first day of next month is:

select date_trunc('month',now())+interval '1 month';

I'd be curious how the runtime using that compares with the plpgsql
version you've done.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#3Bill Moran
wmoran@potentialtech.com
In reply to: Willem Buitendyk (#1)
Re: Mechanics of Select

Willem Buitendyk <willem@pcfish.ca> wrote:

I have the following function that returns the first day of the next
month from whatever date is inserted. If I use this as part of a select
statement then it takes almost twice as long to perform. Is this
because for each scanned record this function is being called?

An explain of the query would help you answer that question.

If so
any ideas how I could make this only occur once?

Don't mark it as VOLITILE. Sounds like an IMMUTABLE function to me.
PostgreSQL is doing exactly what you told it to do.

For instance:

select * from track where datetime >= '2007-04-01' and datetime <
'2007-05-01'; takes about 30 ms to return 650K rows.

select * from track where datetime >= '2007-04-01' and datetime <
first_day_next_month'2007-04-01'; takes about 55 ms to return 650K rows

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
RETURNS date AS
$BODY$
declare
inputmonth1 integer;
inputyear1 integer;
inputmonth2 integer;
inputyear2 integer;
resultdate date;
BEGIN
inputmonth1 = extract(month from inputdate)::integer;
inputyear1 = extract(year from inputdate)::integer;

if inputmonth1 = 12 then
inputyear2 = inputyear1 + 1;
else
inputyear2 = inputyear1;
end if;

if inputmonth1 = 12 then
inputmonth2 = 1;
else
inputmonth2 = inputmonth1 + 1;
end if;

resultdate = (inputyear2)::text || '-' || (inputmonth2)::text || '-' ||
'01';
resultdate = to_date(resultdate::text,'yyyy-MM-DD');

RETURN resultdate;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
Bill Moran
http://www.potentialtech.com

#4brian
brian@zijn-digital.com
In reply to: Willem Buitendyk (#1)
Re: Mechanics of Select

Willem Buitendyk wrote:

I have the following function that returns the first day of the next
month from whatever date is inserted. If I use this as part of a select
statement then it takes almost twice as long to perform. Is this
because for each scanned record this function is being called? If so
any ideas how I could make this only occur once?

For instance:

select * from track where datetime >= '2007-04-01' and datetime <
'2007-05-01'; takes about 30 ms to return 650K rows.

select * from track where datetime >= '2007-04-01' and datetime <
first_day_next_month'2007-04-01'; takes about 55 ms to return 650K rows

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
RETURNS date AS
$BODY$
declare
inputmonth1 integer;
inputyear1 integer;
inputmonth2 integer;
inputyear2 integer;
resultdate date;
BEGIN
inputmonth1 = extract(month from inputdate)::integer; inputyear1 =
extract(year from inputdate)::integer;

if inputmonth1 = 12 then
inputyear2 = inputyear1 + 1;
else
inputyear2 = inputyear1;
end if;

if inputmonth1 = 12 then
inputmonth2 = 1;
else
inputmonth2 = inputmonth1 + 1;
end if;

resultdate = (inputyear2)::text || '-' || (inputmonth2)::text || '-' ||
'01';
resultdate = to_date(resultdate::text,'yyyy-MM-DD');

RETURN resultdate;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;

Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
RETURNS date AS
$BODY$
DECLARE
resultdate date;
BEGIN
SELECT INTO resultdate to_date(to_char((inputdate + interval \
'1 month'), 'yyyy-MM') || '-01', 'yyyy-mm-dd');
RETURN resultdate;
END;
$BODY$
LANGUAGE 'plpgsql';

Mind the wrap.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Willem Buitendyk (#1)
Re: Mechanics of Select

Willem Buitendyk <willem@pcfish.ca> writes:

I have the following function that returns the first day of the next
month from whatever date is inserted. If I use this as part of a select
statement then it takes almost twice as long to perform. Is this
because for each scanned record this function is being called? If so
any ideas how I could make this only occur once?

Don't declare it VOLATILE, when (AFAICS) you have no reason to.
It would be evaluated only once if it were marked IMMUTABLE.

BTW, I think you're doing it the hard way --- this could almost
certainly be a one-liner if you were using the available date
arithmetic facilities.

regards, tom lane

#6Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: brian (#4)
Re: Mechanics of Select

On Feb 11, 2008, at 12:43 AM, brian wrote:

Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
RETURNS date AS
$BODY$
DECLARE
resultdate date;
BEGIN
SELECT INTO resultdate to_date(to_char((inputdate + interval \
'1 month'), 'yyyy-MM') || '-01', 'yyyy-mm-dd');
RETURN resultdate;
END;
$BODY$
LANGUAGE 'plpgsql';

No need for the variable or the SELECT, and it's an immutable
function, so better define that. Besides that it's probably better to
use the date_trunc function here.

Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
RETURNS date AS
$BODY$
BEGIN
RETURN date_trunc('month', inputdate + interval '1 month');
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

And with that I wonder why you'd even need a function :)

Another thing I've taught myself is to prefix local variables and
parameters in functions, so that they can NEVER accidentally match a
column name that you use in a query (I usually use '_'). Otherwise
you can get silly queries like "SELECT * FROM table WHERE x = x" that
look perfectly fine while you're writing them down, being perfectly
able to make the distinction between *variable x* and *column x* in
your mind.

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,47af8f8e167321323610058!

#7Willem Buitendyk
willem@pcfish.ca
In reply to: Greg Smith (#2)
Re: Mechanics of Select

As others have suggested my big problem with the function I wrote was
that I had made it Volatile instead of Immutable (it is no doubt
suffering from code bloat as well). That made all the difference.
Curiously though - I tried it just with the date_trunc function and it
was just as slow as my old Volatile function.

select * from track where datetime >= '2007-04-01' and datetime <
date_trunc('month','2007-04-01'::timestamp)+interval '1 month'; was
about 55s
select * from track where datetime >= '2007-04-01' and datetime <
first_day_next_month('2007-04-01'); was about 36s

cheers

Greg Smith wrote:

Show quoted text

On Sun, 10 Feb 2008, Willem Buitendyk wrote:

I have the following function that returns the first day of the next
month from whatever date is inserted.

See if you can do this with date_trunc instead to avoid calling a
function, which avoids the whole thing. The first day of next month is:

select date_trunc('month',now())+interval '1 month';

I'd be curious how the runtime using that compares with the plpgsql
version you've done.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#8Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Willem Buitendyk (#7)
Re: Mechanics of Select

On Feb 11, 2008, at 2:23 AM, Willem Buitendyk wrote:

As others have suggested my big problem with the function I wrote
was that I had made it Volatile instead of Immutable (it is no
doubt suffering from code bloat as well). That made all the
difference. Curiously though - I tried it just with the date_trunc
function and it was just as slow as my old Volatile function.

select * from track where datetime >= '2007-04-01' and datetime <
date_trunc('month','2007-04-01'::timestamp)+interval '1 month'; was
about 55s

That's probably because '2007-04-01'::timestamp can be at different
time zones depending on client configuration and hence is volatile.

If you need a timestamp you probably want to use the servers TZ,
which you can specify using: timestamp at <your timezone>

Are you always entering the first day of a month for start date? In
that case you can leave out the entire date_trunc as the interval
already calculates the correct length internally:

template1=> select '2007-04-01'::date + interval '1 month';
?column?
---------------------
2007-05-01 00:00:00

select * from track where datetime >= '2007-04-01' and datetime <
first_day_next_month('2007-04-01'); was about 36s

Also, specifying dates like this without specifying their format is a
bad habit in my book. You're in trouble as soon as the date format
for your database changes (different system, for example). I suggest
you use to_date('2007-04-01', 'YYYY-MM-DD') instead.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,47b015f9167323996417255!

#9Bill Moran
wmoran@potentialtech.com
In reply to: Alban Hertroys (#6)
Re: Mechanics of Select

In response to Alban Hertroys <dalroi@solfertje.student.utwente.nl>:

On Feb 11, 2008, at 12:43 AM, brian wrote:

Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
RETURNS date AS
$BODY$
DECLARE
resultdate date;
BEGIN
SELECT INTO resultdate to_date(to_char((inputdate + interval \
'1 month'), 'yyyy-MM') || '-01', 'yyyy-mm-dd');
RETURN resultdate;
END;
$BODY$
LANGUAGE 'plpgsql';

No need for the variable or the SELECT, and it's an immutable
function, so better define that. Besides that it's probably better to
use the date_trunc function here.

Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
RETURNS date AS
$BODY$
BEGIN
RETURN date_trunc('month', inputdate + interval '1 month');
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

And with that I wonder why you'd even need a function :)

Because it's clear what the function does by the name. It becomes
self-documenting, and ginormous queries will be easier to grok with
a function called first_day_next_month().

--
Bill Moran
http://www.potentialtech.com

#10Scott Marlowe
scott.marlowe@gmail.com
In reply to: Alban Hertroys (#8)
Re: Mechanics of Select

On Feb 11, 2008 3:56 AM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:

On Feb 11, 2008, at 2:23 AM, Willem Buitendyk wrote:

As others have suggested my big problem with the function I wrote
was that I had made it Volatile instead of Immutable (it is no
doubt suffering from code bloat as well). That made all the
difference. Curiously though - I tried it just with the date_trunc
function and it was just as slow as my old Volatile function.

select * from track where datetime >= '2007-04-01' and datetime <
date_trunc('month','2007-04-01'::timestamp)+interval '1 month'; was
about 55s

That's probably because '2007-04-01'::timestamp can be at different
time zones depending on client configuration and hence is volatile.

If you need a timestamp you probably want to use the servers TZ,
which you can specify using: timestamp at <your timezone>

No, straight up timestamps shouldn't have this problem, only timestamptz.

I'd suggest trying an index on the date_trunc function here and see if
that helped.