Adventures in PostgreSQL

Started by Josh Berkusover 23 years ago20 messages
#1Josh Berkus
josh@agliodbs.com

Folks,

I've recently started a new series of articles at TechDocs (
http://techdocs.postgresql.org ). This will be a biweekly (or
triweekly if I'm busy) column, with articles covering an entire range
of issues around my professional support of PostgreSQL.

Currently there are two articles up, both oriented toward
beginning-intermediate PostgreSQL users:

1. Restoring a corrupted Template 1 Database
2. Part I of "The Joy Of Index"

There will be a few articles oriented toward advanced users, but most
will keep the focus of the first two. Please take a look, and check
back regularly for new columns.

-Josh Berkus
Aglio Database Solutions
San Francisco

P.S. I'm posting this because the articles were inspired by questions I
was asked on these lists.

P.P.S. I still need book review submissions for the PostgreSQL Book
Review page at TechDocs!

#2D. Duccini
duccini@backpack.com
In reply to: Josh Berkus (#1)
date_trunc'd timestamp index possible?

I'm trying to create a index from a timestamp+tz field and want the index
to be date_trunc'd down to just the date

when i try to do a

create idxfoo on foo (date(footime));

i get a

ERROR: DefineIndex: index function must be marked IMMUTABLE

and it chokes on when i try to use the date_trunc() function as well

create idxfoo on foo (date_trunc('day',footime));

ERROR: parser: parse error at or near "'day'" at character 53

Any suggestions/workarounds (other than creating additional date-only
columns in the schema and indexing those???)

-d

#3Bruno Wolff III
bruno@wolff.to
In reply to: D. Duccini (#2)
Re: date_trunc'd timestamp index possible?

On Mon, Sep 27, 2004 at 19:14:09 -0500,
"D. Duccini" <duccini@backpack.com> wrote:

I'm trying to create a index from a timestamp+tz field and want the index
to be date_trunc'd down to just the date

when i try to do a

create idxfoo on foo (date(footime));

i get a

ERROR: DefineIndex: index function must be marked IMMUTABLE

and it chokes on when i try to use the date_trunc() function as well

create idxfoo on foo (date_trunc('day',footime));

ERROR: parser: parse error at or near "'day'" at character 53

Any suggestions/workarounds (other than creating additional date-only
columns in the schema and indexing those???)

The reason this doesn't work is that the timestamp to date conversion
depends on the time zone setting. In theory you should be able to avoid
this by specifying the time zone to check the date in. I tried something
like the following which I think should work, but doesn't:
create idxfoo on foo (date(timezone('UTC',footime)));

The conversion of the timestamp stored in footime should be immutable
and then taking the date should work. I did find that date of a timestamp
without time zone is treated as immutable.

I am not sure how to check if the supplied function for converting
a timestamp with time zone to a timestamp without timezone using a
specified time zone is immutable. I think this function should be
immutable, but that it probably isn't.

#4D. Duccini
duccini@backpack.com
In reply to: Bruno Wolff III (#3)
Re: date_trunc'd timestamp index possible?

The reason this doesn't work is that the timestamp to date conversion
depends on the time zone setting. In theory you should be able to avoid
this by specifying the time zone to check the date in. I tried something
like the following which I think should work, but doesn't:
create idxfoo on foo (date(timezone('UTC',footime)));

The conversion of the timestamp stored in footime should be immutable
and then taking the date should work. I did find that date of a timestamp
without time zone is treated as immutable.

I am not sure how to check if the supplied function for converting
a timestamp with time zone to a timestamp without timezone using a
specified time zone is immutable. I think this function should be
immutable, but that it probably isn't.

I think we found a way around it!

CREATE OR REPLACE FUNCTION date_immutable( timestamptz ) RETURNS date AS
'SELECT date( $1 ) ;' LANGUAGE 'sql' IMMUTABLE ;

CREATE INDEX "new_event_dt" ON "the_events" USING btree (
date_immutable( "event_dt_tm" ) ) ;

-----------------------------------------------------------------------------
david@backpack.com            BackPack Software, Inc.        www.backpack.com
+1 651.645.7550 voice       "Life is an Adventure.    
+1 651.645.9798 fax            Don't forget your BackPack!"   
-----------------------------------------------------------------------------
#5Bruno Wolff III
bruno@wolff.to
In reply to: Bruno Wolff III (#3)
Re: date_trunc'd timestamp index possible?

On Fri, Oct 01, 2004 at 13:28:30 -0500,
Bruno Wolff III <bruno@wolff.to> wrote:

I am not sure how to check if the supplied function for converting
a timestamp with time zone to a timestamp without timezone using a
specified time zone is immutable. I think this function should be
immutable, but that it probably isn't.

I found that most of the various timezone functions are marked as stable
instead of immutable. I think at least a couple of these should be
marked as immutable and I will try reporting this as a bug.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#3)
Re: [SQL] date_trunc'd timestamp index possible?

Bruno Wolff III <bruno@wolff.to> writes:

I am not sure how to check if the supplied function for converting
a timestamp with time zone to a timestamp without timezone using a
specified time zone is immutable. I think this function should be
immutable, but that it probably isn't.

Yup. In 7.4:

regression=# select provolatile from pg_proc where oid = 'timezone(text,timestamptz)'::regprocedure;
provolatile
-------------
s
(1 row)

regression=#

This is a thinko that's already been corrected for 8.0:

regression=# select provolatile from pg_proc where oid = 'timezone(text,timestamptz)'::regprocedure;
provolatile
-------------
i
(1 row)

regression=#

If you wanted you could just UPDATE pg_proc to correct this mistake.
Another possibility is to create a function that's an IMMUTABLE
wrapper around the standard function.

Looking at this, I realize that date_trunc() is mismarked: the
timestamptz variant is strongly dependent on the timezone setting
and so should be STABLE not IMMUTABLE. Ooops.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: D. Duccini (#4)
Re: date_trunc'd timestamp index possible?

"D. Duccini" <duccini@backpack.com> writes:

I think we found a way around it!

CREATE OR REPLACE FUNCTION date_immutable( timestamptz ) RETURNS date AS
'SELECT date( $1 ) ;' LANGUAGE 'sql' IMMUTABLE ;

No, you just found a way to corrupt your index. Pretending that
date(timestamptz) is immutable does not make it so. The above
*will* break the first time someone uses the table with a different
timezone setting.

What you can do safely is date(footime AT TIME ZONE 'something'),
since this nails down the zone in which the date is interpreted.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

I wrote:

Looking at this, I realize that date_trunc() is mismarked: the
timestamptz variant is strongly dependent on the timezone setting
and so should be STABLE not IMMUTABLE. Ooops.

On looking more closely, I think that all of these functions are
mislabeled:

oid | prorettype | prosrc | provolatile

should be stable not immutable:
date_trunc(text,timestamptz) | timestamptz | timestamptz_trunc | i
interval_pl_timestamptz(interval,timestamptz) | timestamptz | select $2 + $1 | i
timestamptz_pl_interval(timestamptz,interval) | timestamptz | timestamptz_pl_interval | i
timestamptz_mi_interval(timestamptz,interval) | timestamptz | timestamptz_mi_interval | i
"overlaps"(timestamptz,timestamptz,timestamptz,interval) | boolean | select ($1, $2) overlaps ($3, ($3 + $4)) | i
"overlaps"(timestamptz,interval,timestamptz,timestamptz) | boolean | select ($1, ($1 + $2)) overlaps ($3, $4) | i
"overlaps"(timestamptz,interval,timestamptz,interval) | boolean | select ($1, ($1 + $2)) overlaps ($3, ($3 + $4)) | i

should be immutable not stable:
to_char(timestamp,text) | text | timestamp_to_char | s
timestamptz(abstime) | timestamptz | abstime_timestamptz | s
abstime(timestamptz) | abstime | timestamptz_abstime | s

It's easy to demonstrate that timestamptz+interval is dependent on the
timezone setting:

regression=# set timezone = 'EST5EDT';
SET
regression=# select '2004-03-31 00:00-05'::timestamptz + '1 month'::interval;
?column?
------------------------
2004-04-30 00:00:00-04
(1 row)

regression=# set timezone = 'GMT';
SET
regression=# select '2004-03-31 00:00-05'::timestamptz + '1 month'::interval;
?column?
------------------------
2004-04-30 05:00:00+00
(1 row)

and then the overlaps variants have to follow along.

On the other side of the coin, I don't think that to_char has any
dependency on timezone when it is dealing with a timestamp without time
zone. (If you ask it for TZ you always get an empty string.) Likewise
there's no such dependency in abstime/timestamptz conversions.

Do you see any other mislabelings?

What I'm inclined to do with these is change pg_proc.h but not force an
initdb. Does anyone want to argue for an initdb to force it to be fixed
in 8.0? We've lived with the wrong labelings for some time now without
noticing, so it doesn't seem like a serious enough bug to force a
post-beta initdb ... to me anyway.

regards, tom lane

#9Bruno Wolff III
bruno@wolff.to
In reply to: Tom Lane (#8)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

On Fri, Oct 01, 2004 at 18:53:03 -0400,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

What I'm inclined to do with these is change pg_proc.h but not force an
initdb. Does anyone want to argue for an initdb to force it to be fixed
in 8.0? We've lived with the wrong labelings for some time now without
noticing, so it doesn't seem like a serious enough bug to force a
post-beta initdb ... to me anyway.

As long as it is mentioned in the release notes, it doesn't seem worth
forcing an initdb.

#10Gaetano Mendola
mendola@bigfoot.com
In reply to: Tom Lane (#8)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd

Tom Lane wrote:

I wrote:
Do you see any other mislabelings?

I don't but I think that the concept of immutable shall be expanded.
I mean I can use safely a date_trunc immutable in a query ( I think this
is a sort of "immutable per statement" ) but not in a index definition
( the index mantainance is affected by the current timezonesettings ).
So may be another modifier shall be introduced that reflect the "immutable
per statement"

What I'm inclined to do with these is change pg_proc.h but not force an
initdb. Does anyone want to argue for an initdb to force it to be fixed
in 8.0? We've lived with the wrong labelings for some time now without
noticing, so it doesn't seem like a serious enough bug to force a
post-beta initdb ... to me anyway.

I think that an initdb is not required but at least a script, released
only with the 8.0, that will update the catalogs could be usefull.

Regards
Gaetano Mendola

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#8)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

Tom Lane wrote:

What I'm inclined to do with these is change pg_proc.h but not force
an initdb. Does anyone want to argue for an initdb to force it to be
fixed in 8.0? We've lived with the wrong labelings for some time now
without noticing, so it doesn't seem like a serious enough bug to
force a post-beta initdb ... to me anyway.

I'd prefer if all users of 8.0 were guaranteed to have the same catalog.
I don't want to ask users, "what version, and when did you last
initdb". We're still in beta; no one purchased any stability
guarantees.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

Peter Eisentraut <peter_e@gmx.net> writes:

Tom Lane wrote:

What I'm inclined to do with these is change pg_proc.h but not force
an initdb. Does anyone want to argue for an initdb to force it to be
fixed in 8.0? We've lived with the wrong labelings for some time now
without noticing, so it doesn't seem like a serious enough bug to
force a post-beta initdb ... to me anyway.

I'd prefer if all users of 8.0 were guaranteed to have the same catalog.

Well, there's something to be said for that viewpoint too. Anyone else
feel the same?

If we do go for an initdb, I'd like at the same time to do something
I had intended to do but forgotten, which is to yank the functions
and operators for basic arithmetic on type "char", and instead put in
(explicit) conversions between "char" and integer. See for instance
http://archives.postgresql.org/pgsql-sql/2002-11/msg00116.php
http://archives.postgresql.org/pgsql-general/2004-08/msg01562.php
http://archives.postgresql.org/pgsql-general/2004-09/msg01209.php

Specifically I want to remove these operators:

oid | oid | oprresult
-----+-------------------+-----------
635 | +("char","char") | "char"
636 | -("char","char") | "char"
637 | *("char","char") | "char"
638 | /("char","char") | "char"

and their underlying functions:

oid | oid | prorettype | prosrc
------+--------------------------+------------+-------------
1248 | charpl("char","char") | "char" | charpl
1250 | charmi("char","char") | "char" | charmi
77 | charmul("char","char") | "char" | charmul
78 | chardiv("char","char") | "char" | chardiv

The following operators on "char" will remain:

oid | oid | oprresult
-----+-------------------+-----------
92 | =("char","char") | boolean
630 | <>("char","char") | boolean
631 | <("char","char") | boolean
632 | <=("char","char") | boolean
633 | >("char","char") | boolean
634 | >=("char","char") | boolean

These are not as dangerous as the arithmetic operators, because in a
situation where the parser is having difficulty resolving types, it
will prefer the "text" comparison operators over these. The reason
the "char" arithmetic operators are dangerous is that they are the only
ones of those names in the STRING type category.

regards, tom lane

#13Dave Page
dpage@vale-housing.co.uk
In reply to: Tom Lane (#12)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: 02 October 2004 19:23
To: Peter Eisentraut
Cc: Bruno Wolff III; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Mislabeled timestamp functions (was
Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

I'd prefer if all users of 8.0 were guaranteed to have the

same catalog.

Well, there's something to be said for that viewpoint too.
Anyone else feel the same?

It makes sense to me. Especially with hordes of win32 newbies gathering
at the door it'll be one less variable to think about.

Regards, Dave.

#14Bruno Wolff III
bruno@wolff.to
In reply to: Gaetano Mendola (#10)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd

On Sat, Oct 02, 2004 at 10:43:01 +0200,
Gaetano Mendola <mendola@bigfoot.com> wrote:

Tom Lane wrote:

I wrote:
Do you see any other mislabelings?

I don't but I think that the concept of immutable shall be expanded.
I mean I can use safely a date_trunc immutable in a query ( I think this
is a sort of "immutable per statement" ) but not in a index definition
( the index mantainance is affected by the current timezonesettings ).
So may be another modifier shall be introduced that reflect the "immutable
per statement"

There has been such a distinction for a major release or two. "Stable"
is how you mark a function that will return the same value within a
single transaction.

#15Bruno Wolff III
bruno@wolff.to
In reply to: Bruno Wolff III (#14)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd

On Sat, Oct 02, 2004 at 15:04:51 -0500,
Bruno Wolff III <bruno@wolff.to> wrote:

On Sat, Oct 02, 2004 at 10:43:01 +0200,

There has been such a distinction for a major release or two. "Stable"
is how you mark a function that will return the same value within a
single transaction.

I should have said within a single statement instead of within a single
transaction.

#16Gaetano Mendola
mendola@bigfoot.com
In reply to: Bruno Wolff III (#15)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd

Bruno Wolff III wrote:

On Sat, Oct 02, 2004 at 15:04:51 -0500,
Bruno Wolff III <bruno@wolff.to> wrote:

On Sat, Oct 02, 2004 at 10:43:01 +0200,

There has been such a distinction for a major release or two. "Stable"
is how you mark a function that will return the same value within a
single transaction.

I should have said within a single statement instead of within a single
transaction.

I know that but a stable function is not called once inside the same query,
instead an immutable is:

sp_immutable() is a simple immutable function
sp_stable() is a simple stable function
sp_foo() is a simple function

test is a table with two rows in it.

regression=# select sp_stable(), sp_immutable(), sp_foo() from test;
NOTICE: sp_immutable called
NOTICE: sp_stable called
NOTICE: sp_foo called
NOTICE: sp_stable called
NOTICE: sp_foo called
sp_stable | sp_immutable | sp_foo
-----------+--------------+--------
0 | 0 | 0
0 | 0 | 0
(2 rows)

so now do you see what do I mean ?

The stable function is threated "stable" only if inserted inside a filter:

regression=# select * from test where sp_stable() = 3;
NOTICE: sp_stable called
a
---
(0 rows)

and from this point of view immutable is not immutable enough:

regression=# select sp_immutable() from test where sp_immutable() = 3;
NOTICE: sp_immutable called
NOTICE: sp_immutable called
sp_immutable
--------------
(0 rows)

Regards
Gaetano Mendola

#17Mike Rylander
mrylander@gmail.com
In reply to: Tom Lane (#12)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

Not that my 2c is worth 1c, but I second this. I'd rather initdb now
than get bitten by some catalog difference when I move my DB into
production. :)

--miker

On Sat, 02 Oct 2004 14:22:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[...]

I'd prefer if all users of 8.0 were guaranteed to have the same catalog.

Well, there's something to be said for that viewpoint too. Anyone else
feel the same?

[...]

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#1)
Re: Stable function semantics (was Re: Mislabeled timestamp functions)

Oliver Jowett <oliver@opencloud.com> writes:

Bruno Wolff III wrote:

I should have said within a single statement instead of within a single
transaction.

As I understand Tom's earlier explanation of this, the definition is
even more narrow: stable functions only need to return the same value
across a single tablescan.

It might be useful to have some variant of stable (or perhaps just a
change in semantics) such that the function returns the same value for
identical parameters until the next CommandCounterIncrement.

In practice I think these are equivalent definitions.

regards, tom lane

#19Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#12)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

Tom Lane <tgl@sss.pgh.pa.us> writes:

Peter Eisentraut <peter_e@gmx.net> writes:

Tom Lane wrote:

What I'm inclined to do with these is change pg_proc.h but not force
an initdb. Does anyone want to argue for an initdb to force it to be
fixed in 8.0? We've lived with the wrong labelings for some time now
without noticing, so it doesn't seem like a serious enough bug to
force a post-beta initdb ... to me anyway.

I'd prefer if all users of 8.0 were guaranteed to have the same catalog.

Well, there's something to be said for that viewpoint too. Anyone else
feel the same?

I would wonder about any users that are happily using beta3 with test data and
upgrade to 8.0 without any problems but at some point later have trouble
restoring from a pg_dump.

Specifically I want to remove these operators:

oid | oid | oprresult
-----+-------------------+-----------
635 | +("char","char") | "char"
636 | -("char","char") | "char"
637 | *("char","char") | "char"
638 | /("char","char") | "char"
...
The reason the "char" arithmetic operators are dangerous is that they are
the only ones of those names in the STRING type category.

What would happen if "char" were just removed from the STRING type category?
Or alternatively if it were broken out into two data types, "char" which
didn't have these operators, and int1 which only had these operators and not
all the string operators?

It does seem like having a fixed size 1 byte integer data type would be
something appealing. Personally I find a lot of demand in my database models
for status flags that have very few possible states (often only two but I
don't want to limit myself with a boolean and booleans don't behave nicely
with any other application language since they return 't' and 'f'). I could
easily see some very large table where someone wants to store lots of small
integers that need some arithmetic capabilities.

--
greg

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#19)
Re: Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

Greg Stark <gsstark@mit.edu> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

The reason the "char" arithmetic operators are dangerous is that they are
the only ones of those names in the STRING type category.

What would happen if "char" were just removed from the STRING type category?

What other category would you put it in? The I/O behavior of "char"
is certainly not very conducive to thinking of it as storing integral
values, anyway.

Or alternatively if it were broken out into two data types, "char" which
didn't have these operators, and int1 which only had these operators and not
all the string operators?

I don't have an objection in principle to an int1 datatype, but there
are a couple of practical objections; primarily that that looks way too
much like a new feature for this point in the 8.0 cycle. (I seem to
recall having once had concerns about unexpected side effects from
adding another set of overloaded operators to the NUMERIC category, too;
but I'm not sure if that's still relevant given the resolution-rule
changes we've made in the last couple releases.)

Even with an int1 datatype, I'm not sure it makes sense to provide
arithmetic operators specifically for the type, as opposed to providing
implicit coercions to "integer" and letting the actual arithmetic
happen at integer width.

regards, tom lane