The Contrib Roundup (long)

Started by Josh Berkusalmost 21 years ago66 messageshackers
Jump to latest
#1Josh Berkus
josh@agliodbs.com

Folks,

I had a lot of time to kill on airplanes recently so I've gone
digging through /contrib in an effort to sort out what's in
there and try to apply some consistent rules to it. Before
people read further, please understand that this is just an
initial discussion on what will and won't be in contrib for
8.1; nobody has made any decisions yet.

What Should Be In Contrib?
-------------------------------
Looking over what's in there most of the reasonable contrib
options fall into 3 groups: extra data types, extra functions
and backend utilities. These all seem reasonable things to put
into contrib, with the addition of other code being
tested for inclusion in the core. These categories also
pretty much cover things that need to be inside the PostgreSQL
source to build.

What Shouldn't Be In Contrib?
-------------------------------
The things I think we should exclude from contrib are rather
more varied. Based on examples:

a) Code with major external dependencies other than a
programming language. Partly this is because this means they
are useful to less users; more importantly, this is because the
external dependencies mean that the release cycle for these
tools is likely to be determined by the external dependency and
not by PostgreSQL's release cycle. Further, the external
dependencies mean that it's less likely that the postgresql
core programmers can maintain them in the event that the
original developer goes away. The Mysql conversion scripts are
a good example of this; I don't believe that my2pg even works
with MySQL 4.

b) Alpha-quality code and unfinished projects. Shipping
something with the PostgreSQL source code implies a certain
level of stability, completeness and quality. We shouldn't be
including scripts which took 2 hours to write and have only
been tested on one platform. This stuff can get developed on
pgFoundry and moved to contrib when it's close to mature.

c) Differently licensed code. I'm not an attorney: I won't
pretend to know which licenses it's legal to bundle in our
tarballs and which are not. But I do know that most users and
redistributors aren't going to grep contrib looking for other
licenses, and putting differently licensed stuff in there is
bad pr at best, and a legal booby trap at worst.
(Particularly, there are 3 contrib modules by Massimo del Zotto,
which are GPL licensed. According to the FSF's licensing admin,
installing any these contrib modules will instantly make that
copy of PostgreSQL GPL.)

d) Application code and example code. Contrib is *not* a good
place for "here's how you do this in an application" kind of
code. It's not visible enough to be documentation, and such
examples aren't generally useful to the majority of users as
code.

Moving to PgFoundry is NOT "Demotion"
----------------------------------------
I know that I'm going to get a lot of resistance for the idea
of moving some project to pgFoundry, because authors feel that
it's a "demotion" for their code not to be shipped with the
PostgreSQL source. However, being on pgFoundry increases the
visibility of your code and allows a wider array of people to
contribute to it -- and even find it. And for items of
particularly broad utility, stuff can always go from pgFoundry
into the core when mature or when utility is demonstrated.

Contrib Subdirectories?
-------------------------------------
I think it would also be helpful to users if we could create
subdirectories to organize contrib into categories. This would
help users and packagers find what they want. These
directories would be:
data_types/
functions/
utilities/
I've noted below which contrib code I think should go in those
subdirs.

Contrib Build Options?
---------------------------
I'll point out that several people (including one of our
RPM builders) spoke up in favor of the idea of adding ./contrib
command line options for individual contrib items. Discussion
was dropped without a decision being reached. That would work
like:
./configure --with-perl --prefix=/usr/pgsql --with-tsearch2
--with-fuzzystrmatch

Documentation
--------------------------
As previously mentioned, all contrib modules need to have
documentation in the main postgreSQL docs. Probably their own
section, called "Optional Modules".

Contrib Item Listing
--------------------------------
What follows is my notes on individual contrib projects. Many
contain questions because I don't know enough about the item.
Please read through them an provide what feedback you can.
Especially, provide feedback on the items I'm suggesting
eliminating or moving out. I've noted the author contact info
where I'm thinking of moving modules, and will be attempting to
contact those authors if we decide to change status.

adddepend: is this still needed, or would a proper
dump-and-reload from 7.2 add the dependancy information anyway?

array: placeholder for old array module; contains only a
readme. Should probably be dropped for 8.2.

btree_gist: data_types/

chkpass: data_types/

cube: README needs documentation on what the module is *for*.

dbmirror: should be on pgfoundry/gborg with other replication
systems. Stephen Singer (ssinger@navtechinc.com)

dbsize: functions/

earthdistance: data_types/

findoidjoins: again, it's not clear what this module is for.
Bruce?

fulltextindex: Obsolesced by Tsearch2. Also rather a
brute-force technique for FTI possibly more useful as an
illustration of advance trigger use than as an index. Move to
pgfoundry or techdocs? Maarten Boekhold
(maartenb@dutepp0.et.tudelft.nl)

fuzzystrmatch: functions/

intagg: what does this module do which is not already available
through the built-in array functions and operators? Maybe I
don't understand what it does. Unnatributed in the README. Move
to pgfoundry?

intarray: data_types/

ipc_check: nice idea, possibly useful but works only on FreeBSD.
Needs to be vastly expanded to support multiple platforms.
Work on replacing with "Configurator" project at pgfoundry.
Author unattributed. Recommend removal.

isbn_issn: more data types. Has anyone tested this one lately?
It appears not to have been modified since 7.2. data_types/

lo: another special data type. Is its functionality required
anymore? It appears to be a workaround to some limitations of
our large object interface which may no longer exist. Author
Peter Mount ( peter@retep.org.uk ) data_types/

ltree: data_types/

msql_interface: does anyone use mSQL anymore? In any case,
conversion and foriegn-database-connection tools definitely
belong on pgFoundry. Author Aldrin Leal (
aldrin@americasnet.com ).

mac: A special purpose script which I doubt works on all
platforms. Belongs on pgFoundry so that maybe someone will
take an interest in expanding it.

misc_utils: I believe that all of these utils are obsolesced by
builtin system commands or easily written userspace functions
(like max(x,y)). Also, is under the GPL (see above). Author
Massimo Dal Zotto (dz@cs.unitn.it)

mysql: these utilities have been moved to project sites (such as
GBorg), and I believe that my2pg is broken with current versions
of MySQL. Can we remove this from contrib?

noupdate: this is a cool example of a simple C trigger and would
be lovely to have in a doc somewhere. However, its
functionality is easily replicated through a simple PL/pgSQL
trigger so it seems unnecessary as a contrib module. Author
unattributed.

oid2name: a useful backend utility which is used by a number of
external tools. What would it take to make this a builtin
binary? utilities/

oracle: again, very useful and I wish to move it to pgFoundry
and take over maintenance of it. Author Gilles Darold
(gilles@darold.net).

pg_autovaccuum: moving into the backend.

pg_buffercache: another useful backend utility. Seems perfect
for contrib. utilities/

pg_dumplo: is this still required for pg large objects? If
so, can't we integrate it into the core? utilities/

pg_trgm: data_types/

pg_upgrade: what's the status of this, Bruce? Does it work at
all? Shouldn't this be moved to the pgfoundry project of the
same name until it's stable?

pgbench: I see repeated complaints on -performance about how
pgbench results are misleading. Why are we shipping it with
PostgreSQL then? Shouldn't this be on pgFoundry, maybe in the
testperf project? Shouldn't all performance tests be on
pgFoundry instead of in the code, unless they're part of
regression tests?

pgcrypto: more for /functions. And a good reason to keep the
main PostgreSQL ftp servers outside the US :-b

pgstattuple: utilities/

reindexdb: now obsolete per the REINDEX {database} command.
Remove from contrib.

rtree_gist: data_types/

seg: data_types/

spi: contains TimeTravel functions. Do these actually still
work? The spi stuff is good for documentation purposes anyway
... but if the functions aren't working, should be in the docs
and not /contrib.

start-scripts: utilities/. Needs to be expanded and
checked against more oses.

string: data_types/ Same problem as Massimo's
other library; it's GPL. Also, is it really needed at this
point? Massimo (dz@cs.unitn.it).

tablefunc: functions/

tips: this is a proto-apache-log-slurping project, in *alpha*.
As such, it really needs to be on pgFoundry. Author Terry
Mackintosh (terry@terrym.com)

tools: Two of these are emacs scripts, and would be better
on pgFoundry if not on Savannah. The find-sources shell
script is again GPL and should probably be removed, and moreover
appears to have nothing to do with PostgreSQL.

tsearch: obsolesced by tsearch2. Should be moved to pgfoundry
where it can be maintained by users needing backwards
compatibility.

userlocks: another GPL script, with the problems that entails.
Also problematic as it relies heavily on per-record OIDs,
something we tell users not to do. Overall, should be removed.
Author: Massimo.

vacuumlo: is this still required? If utilities/.

xml and xml2: both by John Gray (jgray@azuli.co.uk). John, why
do we have two of these? Otherwise, data_types/.
--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#2elein
elein@varlena.com
In reply to: Josh Berkus (#1)
Re: The Contrib Roundup (long)

a few comments scattered inline...

On Tue, Jun 07, 2005 at 02:53:32PM -0300, Josh Berkus wrote:

Folks,

I had a lot of time to kill on airplanes recently so I've gone
digging through /contrib in an effort to sort out what's in
there and try to apply some consistent rules to it. Before
people read further, please understand that this is just an
initial discussion on what will and won't be in contrib for
8.1; nobody has made any decisions yet.

What Should Be In Contrib?
-------------------------------
Looking over what's in there most of the reasonable contrib
options fall into 3 groups: extra data types, extra functions
and backend utilities. These all seem reasonable things to put
into contrib, with the addition of other code being
tested for inclusion in the core. These categories also
pretty much cover things that need to be inside the PostgreSQL
source to build.

What Shouldn't Be In Contrib?
-------------------------------
The things I think we should exclude from contrib are rather
more varied. Based on examples:

a) Code with major external dependencies other than a
programming language. Partly this is because this means they
are useful to less users; more importantly, this is because the
external dependencies mean that the release cycle for these
tools is likely to be determined by the external dependency and
not by PostgreSQL's release cycle. Further, the external
dependencies mean that it's less likely that the postgresql
core programmers can maintain them in the event that the
original developer goes away. The Mysql conversion scripts are
a good example of this; I don't believe that my2pg even works
with MySQL 4.

b) Alpha-quality code and unfinished projects. Shipping
something with the PostgreSQL source code implies a certain
level of stability, completeness and quality. We shouldn't be
including scripts which took 2 hours to write and have only
been tested on one platform. This stuff can get developed on
pgFoundry and moved to contrib when it's close to mature.

c) Differently licensed code. I'm not an attorney: I won't
pretend to know which licenses it's legal to bundle in our
tarballs and which are not. But I do know that most users and
redistributors aren't going to grep contrib looking for other
licenses, and putting differently licensed stuff in there is
bad pr at best, and a legal booby trap at worst.
(Particularly, there are 3 contrib modules by Massimo del Zotto,
which are GPL licensed. According to the FSF's licensing admin,
installing any these contrib modules will instantly make that
copy of PostgreSQL GPL.)

I agree that anything that is not BSD licensed should not go
into contrib.

d) Application code and example code. Contrib is *not* a good
place for "here's how you do this in an application" kind of
code. It's not visible enough to be documentation, and such
examples aren't generally useful to the majority of users as
code.

Moving to PgFoundry is NOT "Demotion"
----------------------------------------
I know that I'm going to get a lot of resistance for the idea
of moving some project to pgFoundry, because authors feel that
it's a "demotion" for their code not to be shipped with the
PostgreSQL source. However, being on pgFoundry increases the
visibility of your code and allows a wider array of people to
contribute to it -- and even find it. And for items of
particularly broad utility, stuff can always go from pgFoundry
into the core when mature or when utility is demonstrated.

Contrib Subdirectories?
-------------------------------------
I think it would also be helpful to users if we could create
subdirectories to organize contrib into categories. This would
help users and packagers find what they want. These
directories would be:
data_types/
functions/
utilities/
I've noted below which contrib code I think should go in those
subdirs.

These directories are misleading since all data types include functions.
If we are paring down contrib, I see no reason to reorganize them.

Contrib Build Options?
---------------------------
I'll point out that several people (including one of our
RPM builders) spoke up in favor of the idea of adding ./contrib
command line options for individual contrib items. Discussion
was dropped without a decision being reached. That would work
like:
./configure --with-perl --prefix=/usr/pgsql --with-tsearch2
--with-fuzzystrmatch

Documentation
--------------------------
As previously mentioned, all contrib modules need to have
documentation in the main postgreSQL docs. Probably their own
section, called "Optional Modules".

Contrib Item Listing
--------------------------------
What follows is my notes on individual contrib projects. Many
contain questions because I don't know enough about the item.
Please read through them an provide what feedback you can.
Especially, provide feedback on the items I'm suggesting
eliminating or moving out. I've noted the author contact info
where I'm thinking of moving modules, and will be attempting to
contact those authors if we decide to change status.

adddepend: is this still needed, or would a proper
dump-and-reload from 7.2 add the dependancy information anyway?

array: placeholder for old array module; contains only a
readme. Should probably be dropped for 8.2.

btree_gist: data_types/

Actually this is an index, not a datatype

chkpass: data_types/

cube: README needs documentation on what the module is *for*.

dbmirror: should be on pgfoundry/gborg with other replication
systems. Stephen Singer (ssinger@navtechinc.com)

dbsize: functions/

earthdistance: data_types/

Isn't this just a function?

findoidjoins: again, it's not clear what this module is for.
Bruce?

fulltextindex: Obsolesced by Tsearch2. Also rather a
brute-force technique for FTI possibly more useful as an
illustration of advance trigger use than as an index. Move to
pgfoundry or techdocs? Maarten Boekhold
(maartenb@dutepp0.et.tudelft.nl)

fuzzystrmatch: functions/

intagg: what does this module do which is not already available
through the built-in array functions and operators? Maybe I
don't understand what it does. Unnatributed in the README. Move
to pgfoundry?

intarray: data_types/

what does this do that arrays do not?

Show quoted text

ipc_check: nice idea, possibly useful but works only on FreeBSD.
Needs to be vastly expanded to support multiple platforms.
Work on replacing with "Configurator" project at pgfoundry.
Author unattributed. Recommend removal.

isbn_issn: more data types. Has anyone tested this one lately?
It appears not to have been modified since 7.2. data_types/

lo: another special data type. Is its functionality required
anymore? It appears to be a workaround to some limitations of
our large object interface which may no longer exist. Author
Peter Mount ( peter@retep.org.uk ) data_types/

ltree: data_types/

msql_interface: does anyone use mSQL anymore? In any case,
conversion and foriegn-database-connection tools definitely
belong on pgFoundry. Author Aldrin Leal (
aldrin@americasnet.com ).

mac: A special purpose script which I doubt works on all
platforms. Belongs on pgFoundry so that maybe someone will
take an interest in expanding it.

misc_utils: I believe that all of these utils are obsolesced by
builtin system commands or easily written userspace functions
(like max(x,y)). Also, is under the GPL (see above). Author
Massimo Dal Zotto (dz@cs.unitn.it)

mysql: these utilities have been moved to project sites (such as
GBorg), and I believe that my2pg is broken with current versions
of MySQL. Can we remove this from contrib?

noupdate: this is a cool example of a simple C trigger and would
be lovely to have in a doc somewhere. However, its
functionality is easily replicated through a simple PL/pgSQL
trigger so it seems unnecessary as a contrib module. Author
unattributed.

oid2name: a useful backend utility which is used by a number of
external tools. What would it take to make this a builtin
binary? utilities/

oracle: again, very useful and I wish to move it to pgFoundry
and take over maintenance of it. Author Gilles Darold
(gilles@darold.net).

pg_autovaccuum: moving into the backend.

pg_buffercache: another useful backend utility. Seems perfect
for contrib. utilities/

pg_dumplo: is this still required for pg large objects? If
so, can't we integrate it into the core? utilities/

pg_trgm: data_types/

pg_upgrade: what's the status of this, Bruce? Does it work at
all? Shouldn't this be moved to the pgfoundry project of the
same name until it's stable?

pgbench: I see repeated complaints on -performance about how
pgbench results are misleading. Why are we shipping it with
PostgreSQL then? Shouldn't this be on pgFoundry, maybe in the
testperf project? Shouldn't all performance tests be on
pgFoundry instead of in the code, unless they're part of
regression tests?

pgcrypto: more for /functions. And a good reason to keep the
main PostgreSQL ftp servers outside the US :-b

pgstattuple: utilities/

reindexdb: now obsolete per the REINDEX {database} command.
Remove from contrib.

rtree_gist: data_types/

seg: data_types/

spi: contains TimeTravel functions. Do these actually still
work? The spi stuff is good for documentation purposes anyway
... but if the functions aren't working, should be in the docs
and not /contrib.

start-scripts: utilities/. Needs to be expanded and
checked against more oses.

string: data_types/ Same problem as Massimo's
other library; it's GPL. Also, is it really needed at this
point? Massimo (dz@cs.unitn.it).

tablefunc: functions/

tips: this is a proto-apache-log-slurping project, in *alpha*.
As such, it really needs to be on pgFoundry. Author Terry
Mackintosh (terry@terrym.com)

tools: Two of these are emacs scripts, and would be better
on pgFoundry if not on Savannah. The find-sources shell
script is again GPL and should probably be removed, and moreover
appears to have nothing to do with PostgreSQL.

tsearch: obsolesced by tsearch2. Should be moved to pgfoundry
where it can be maintained by users needing backwards
compatibility.

userlocks: another GPL script, with the problems that entails.
Also problematic as it relies heavily on per-record OIDs,
something we tell users not to do. Overall, should be removed.
Author: Massimo.

vacuumlo: is this still required? If utilities/.

xml and xml2: both by John Gray (jgray@azuli.co.uk). John, why
do we have two of these? Otherwise, data_types/.
--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#3Robert Treat
xzilla@users.sourceforge.net
In reply to: Josh Berkus (#1)
Re: The Contrib Roundup (long)

On Tue, 2005-06-07 at 13:53, Josh Berkus wrote:

mysql: these utilities have been moved to project sites (such as
GBorg), and I believe that my2pg is broken with current versions
of MySQL. Can we remove this from contrib?

I believe this version now lives at
http://gborg.postgresql.org/project/mysql2psql/projdisplay.php, although
there are other versions. I agree it should be removed.

reindexdb: now obsolete per the REINDEX {database} command.
Remove from contrib.

actually I think part of the point of this was to give a command line
version of the reindex command, like we have for vaccum. If that still
matters, then it should probably stay. Actually it should probably be
converted to C and moved to /src/bin.

xml and xml2: both by John Gray (jgray@azuli.co.uk). John, why
do we have two of these? Otherwise, data_types/.

istr that xml2 had some expanded capabilties at the expense of
additional security issues, but we should wait for the author to jump
in.

Josh, was this comprehensive? I don't see dblink, and was thinking there
was some others missing... soundex ?.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#4Andrew - Supernews
andrew+nonews@supernews.com
In reply to: Josh Berkus (#1)
Re: The Contrib Roundup (long)

On 2005-06-07, Josh Berkus <josh@agliodbs.com> wrote:

userlocks: another GPL script, with the problems that entails.
Also problematic as it relies heavily on per-record OIDs,
something we tell users not to do. Overall, should be removed.
Author: Massimo.

userlocks is just a very thin interface to functionality that's really in
the backend. What's left in contrib/userlock probably isn't even
copyrightable in any case. The best bet is probably to re-implement it in
the backend directly.

Removing it certainly isn't a good idea; the functionality is important.
(It doesn't "rely on per-record OIDs" either.)

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

#5Alvaro Herrera
alvherre@surnet.cl
In reply to: Josh Berkus (#1)
Re: The Contrib Roundup (long)

On Tue, Jun 07, 2005 at 02:53:32PM -0300, Josh Berkus wrote:

Moving to PgFoundry is NOT "Demotion"
----------------------------------------

Yeah, I agree. Lots of people understand "search in pgfoundry.org" much
easily than "see contrib/adddepend". (I agree with most of the rest of
your comments as well.)

adddepend: is this still needed, or would a proper
dump-and-reload from 7.2 add the dependancy information anyway?

Yes, it's still needed: a normal dump/reload doesn't fix the problem.

findoidjoins: again, it's not clear what this module is for.
Bruce?

I don't think this should be a contrib at all. It's more like a
developer tool.

lo: another special data type. Is its functionality required
anymore? It appears to be a workaround to some limitations of
our large object interface which may no longer exist.

No, it's still needed I think. It's somewhat redundant with vacuumlo
apparently? The functionality of both should be incorporated into the
backend somehow, I'd think.

pg_dumplo: is this still required for pg large objects? If
so, can't we integrate it into the core? utilities/

I believe pg_dump has this functionality, with -O.

reindexdb: now obsolete per the REINDEX {database} command.
Remove from contrib.

No, this is different than REINDEX DATABASE.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
Oh, oh, las chicas galacianas, lo har�n por las perlas,
�Y las de Arrakis por el agua! Pero si buscas damas
Que se consuman como llamas, �Prueba una hija de Caladan! (Gurney Halleck)

#6Joshua D. Drake
jd@commandprompt.com
In reply to: Alvaro Herrera (#5)
Re: The Contrib Roundup (long)

lo: another special data type. Is its functionality required
anymore? It appears to be a workaround to some limitations of
our large object interface which may no longer exist.

I **think** the lo datatype is for ODBC binary access.

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

#7Rod Taylor
rbt@rbt.ca
In reply to: Josh Berkus (#1)
Re: The Contrib Roundup (long)

adddepend: is this still needed, or would a proper
dump-and-reload from 7.2 add the dependancy information anyway?

No, a 7.2 to 7.3 or later upgrade will not have full dependency
information using pg_dump.

That said, I would abandon the module anyway. I don't recall testing it
for a 7.2 to 8.0 upgrade, let alone to 8.1. It's probably been broken in
some way by now (table spaces?)

--

#8Josh Berkus
josh@agliodbs.com
In reply to: Andrew - Supernews (#4)
Re: The Contrib Roundup (long)

Andrew,

userlocks is just a very thin interface to functionality that's really in
the backend. What's left in contrib/userlock probably isn't even
copyrightable in any case. The best bet is probably to re-implement it in
the backend directly.

Removing it certainly isn't a good idea; the functionality is important.

Hmm. It needs to be re-written from scratch then so that we can remove the
GPL, or if you can get an attorney to say it's not copyrightable ...

(It doesn't "rely on per-record OIDs" either.)

Ah, I misread the code then. It still seems like application code to me, but
I'll happily admit to not really understanding it.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#9Doug McNaught
doug@mcnaught.org
In reply to: Joshua D. Drake (#6)
Re: The Contrib Roundup (long)

"Joshua D. Drake" <jd@commandprompt.com> writes:

lo: another special data type. Is its functionality required
anymore? It appears to be a workaround to some limitations of
our large object interface which may no longer exist.

I **think** the lo datatype is for ODBC binary access.

Yes, ISTR needing to install it to use ODBC BLOBs. I wonder if it
should be packaged with the ODBC driver instead of being in contrib/?

-Doug

#10Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Josh Berkus (#1)
Re: The Contrib Roundup (long)

Josh Berkus wrote:

intagg: what does this module do which is not already available
through the built-in array functions and operators? Maybe I
don't understand what it does. Unnatributed in the README. Move
to pgfoundry?

Short summary:

Is there an equivalent of "int_array_enum()" built in?

I use it for substantial (9X) performance improvements
for doing joins similar to those described in its README.

I think it can be used to do somewhat similar things with
integer arrays that the SQL2003 UNNEST operator does
on MULTISETs (but yeah, they're quite different too).

Long and boring, but with examples:

I find that it can speed up certain kinds of joins (like
those described in it's readme) drastically. I have a
pretty big application that has a lot of joins that use
int_array_enum() to expand an array stored in one
column into something that looks like a table instead
of having a third join table connecting two tables.

Note that this is often much faster than the
array IN/ANY/SOME/NOT IN comparisons because when
planning the join it can all the various join plans
like hash joins; while the array operators seem to
just do linear searches of the arrays.

This trick is especially useful in conjunction with an aggregate
based on the "_int_union" function from the intarray/ contrib
module (similar to the FUSION operator for MULTISETS) when you
only want distinct values for that type of join. Sample queries
from an actual application showing a factor-of-9 performance
improvement(7 seconds to 800ms) are shown below.

-- similar to the standard FUSION operator for MULTISETS.
create aggregate intarray_union_agg (
sfunc = _int_union,
basetype = int[],
stype = int[],
initcond = '{-1}'
);
explain analyze select fac_nam from userfeatures.point_features join entity_facets using (entity_id) where featureid=115 group by fac_nam;
-- Total runtime: 7125.322 ms
explain analyze select fac_nam from (select distinct int_array_enum(fac_ids) as fac_id from (select distinct fac_ids from entity_facids natural join point_features where featureid=115) as a) as a join facet_lookup using (fac_id);
-- Total runtime: 1297.558 ms
explain analyze select fac_nam from (select distinct int_array_enum(fac_ids) as fac_id from (select intarray_union_agg(fac_ids) as fac_ids from entity_facids natural join point_features where featureid=115) as a) as a join facet_lookup using (fac_id);
-- Total runtime: 803.187 ms

I don't have access to the system right now, so I don't have the
full table definitions - but the basic problem is that there are
many "facets" for each row in the "point_features" table and there
are many "features" with featureid=115. The queries are trying
to find the names of each facet available from that set of point_features.

intarray: data_types/

Well, the array of int's data type is built in, so I think
this module is more about the functions, operators, and
indexes that it provides that operate on arrays of ints.
Would that make it fit better under functions/ in your new
directory tree?

If I had a vote, I'd think it nice if the intagg module got
merged with the intarray module (wherever it ends up) because
they really are quite complementary in providing useful
tools for manipulating arrays of ints.

#11Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: elein (#2)
Re: The Contrib Roundup (long)

elein wrote:

intarray: data_types/

what does this do that arrays do not?

It provides lossy indexes that work well on big arrays;
as well as some quite useful convenience functions that
work on arrays of ints.

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Josh Berkus (#1)
Re: The Contrib Roundup (long)

Am Dienstag, 7. Juni 2005 19:53 schrieb Josh Berkus:

I think it would also be helpful to users if we could create
subdirectories to organize contrib into categories. This would
help users and packagers find what they want. These
directories would be:
data_types/
functions/
utilities/

I think this is out of the question both because these categories are fuzzy
and it would destroy the CVS history. It might be equally effective to
organize the README file along these lines.

I'll point out that several people (including one of our
RPM builders) spoke up in favor of the idea of adding ./contrib
command line options for individual contrib items.

Packagers should simply build all contrib items. No extra options are needed.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

In reply to: Robert Treat (#3)
Re: The Contrib Roundup (long)

Hi Robert,

reindexdb: now obsolete per the REINDEX {database} command.
Remove from contrib.

actually I think part of the point of this was to give a command line
version of the reindex command, like we have for vaccum. If that
still
matters, then it should probably stay. Actually it should probably
be
converted to C and moved to /src/bin.

I'm thinking of converting it so Windows users can benefit from it. Do
we have to move it to /src/bin/scripts? It's similar to the other
scripts and can benefit from some code shared by the other scripts.
I'll submit a patch ASAP.

Comments?

Euler Taveira de Oliveira
euler[at]yahoo_com_br

__________________________________________________
Converse com seus amigos em tempo real com o Yahoo! Messenger
http://br.download.yahoo.com/messenger/

#14Michael Fuhr
mike@fuhr.org
In reply to: Josh Berkus (#1)
Re: The Contrib Roundup (long)

On Tue, Jun 07, 2005 at 02:53:32PM -0300, Josh Berkus wrote:

noupdate: this is a cool example of a simple C trigger and would
be lovely to have in a doc somewhere. However, its
functionality is easily replicated through a simple PL/pgSQL
trigger so it seems unnecessary as a contrib module. Author
unattributed.

Does noupdate even work correctly? The README is pretty thin so
maybe I've misunderstood something. First of all, the example fails
due to a case problem:

CREATE TABLE TEST ( COL1 INT, COL2 INT, COL3 INT );

CREATE TRIGGER BT BEFORE UPDATE ON TEST FOR EACH ROW
EXECUTE PROCEDURE
noup ('COL1');

INSERT INTO TEST VALUES (10,20,30);
UPDATE TEST SET COL1 = 5;
ERROR: noup: there is no attribute COL1 in relation test

If we fix the case problem then this particular example works:

DROP TRIGGER BT ON TEST;

CREATE TRIGGER BT BEFORE UPDATE ON TEST FOR EACH ROW
EXECUTE PROCEDURE
noup ('col1');

UPDATE TEST SET COL1 = 5;
WARNING: col1: update not allowed
UPDATE 0

But the trigger won't allow updates on other columns either:

UPDATE TEST SET COL2 = 15;
WARNING: col1: update not allowed
UPDATE 0

...unless we *do* change COL1 to NULL:

UPDATE TEST SET COL1 = NULL, COL2 = 15;
UPDATE 1

The code rejects the update if the new value for the designated
column (col1 in this case) is not NULL, rather than checking if
its value has changed. Is that the intended behavior?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#15Josh Berkus
josh@agliodbs.com
In reply to: Peter Eisentraut (#12)
Re: The Contrib Roundup (long)

Peter,

Packagers should simply build all contrib items. No extra options are
needed.

No, they shoudn't. 3 of the packages currently in /contrib are GPL.
Building them makes all of PostgreSQL GPL.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#16Josh Berkus
josh@agliodbs.com
In reply to: Peter Eisentraut (#12)
Re: The Contrib Roundup (long)

Peter,

I think this is out of the question both because these categories are fuzzy
and it would destroy the CVS history. It might be equally effective to
organize the README file along these lines.

Ach, I forgot about this lovely property of CVS. Well, scratch that proposal.
SVN is looking better and better ...

Packagers should simply build all contrib items. No extra options are
needed.

Hmmm, when an RPM builds a contrib item, where does the .sql file go? How
does an RPM user actually add the functions/datatypes/etc to their database?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#17Alvaro Herrera
alvherre@surnet.cl
In reply to: Josh Berkus (#15)
Re: The Contrib Roundup (long)

On Wed, Jun 08, 2005 at 08:45:42AM -0700, Josh Berkus wrote:

Peter,

Packagers should simply build all contrib items. No extra options are
needed.

No, they shoudn't. 3 of the packages currently in /contrib are GPL.
Building them makes all of PostgreSQL GPL.

No, it means the distributors are illegally distributing software they
don't have permission to distribute. The GPL doesn't make everything
else GPL right away, that's a myth.

The only entity that can change PostgreSQL's license is the copyright
owner. Since it's a rather big and unidentified entity, that's
difficult. So the only lawful (legal?) way to distribute a binary
PostgreSQL distribution is to refrain from distributing GPL-licensed
contrib modules.

Or we could remove them from contrib.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"Hoy es el primer d�a del resto de mi vida"

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#15)
Re: The Contrib Roundup (long)

Josh Berkus <josh@agliodbs.com> writes:

Packagers should simply build all contrib items. No extra options are
needed.

No, they shoudn't. 3 of the packages currently in /contrib are GPL.
Building them makes all of PostgreSQL GPL.

The fix for that is to remove or relicense those packages, not to
complicate the build process.

regards, tom lane

#19Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#18)
Re: The Contrib Roundup (long)

Tom,

The fix for that is to remove or relicense those packages, not to
complicate the build process.

OK. Then we'll make BSD licensing an absolute requirement for /contrib?

Also, we'll add --build-all-contrib to ./configure?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#19)
Re: The Contrib Roundup (long)

Josh Berkus <josh@agliodbs.com> writes:

Tom,

The fix for that is to remove or relicense those packages, not to
complicate the build process.

OK. Then we'll make BSD licensing an absolute requirement for /contrib?

That's been the intention for a very long time: everything in the core
tarball should be under the same license. Someone's got to do the
legwork of contacting the module authors involved to see if they're
willing to relicense ... and so far it just hasn't gotten to the top
of the to-do queue.

regards, tom lane

#21Alvaro Herrera
alvherre@surnet.cl
In reply to: Josh Berkus (#16)
#22Robert Treat
xzilla@users.sourceforge.net
In reply to: Alvaro Herrera (#17)
#23Josh Berkus
josh@agliodbs.com
In reply to: Robert Treat (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#23)
#25Alvaro Herrera
alvherre@surnet.cl
In reply to: Josh Berkus (#23)
#26The Hermit Hacker
scrappy@hub.org
In reply to: Josh Berkus (#15)
#27The Hermit Hacker
scrappy@hub.org
In reply to: Peter Eisentraut (#12)
#28Alvaro Herrera
alvherre@surnet.cl
In reply to: The Hermit Hacker (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#27)
#30The Hermit Hacker
scrappy@hub.org
In reply to: Alvaro Herrera (#28)
#31John Gray
jgray@azuli.co.uk
In reply to: Josh Berkus (#1)
#32Matthew D. Fuller
fullermd@over-yonder.net
In reply to: The Hermit Hacker (#30)
#33Alvaro Herrera
alvherre@surnet.cl
In reply to: Matthew D. Fuller (#32)
#34The Hermit Hacker
scrappy@hub.org
In reply to: Matthew D. Fuller (#32)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#34)
#36Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#20)
#37Josh Berkus
josh@agliodbs.com
In reply to: Neil Conway (#36)
#38The Hermit Hacker
scrappy@hub.org
In reply to: Josh Berkus (#37)
#39Josh Berkus
josh@agliodbs.com
In reply to: The Hermit Hacker (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#39)
#41The Hermit Hacker
scrappy@hub.org
In reply to: Josh Berkus (#39)
#42Kaare Rasmussen
kar@kakidata.dk
In reply to: Robert Treat (#3)
#43Steve Crawford
scrawford@pinpointresearch.com
In reply to: Kaare Rasmussen (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#1)
#45Kaare Rasmussen
kar@kakidata.dk
In reply to: Steve Crawford (#43)
#46Andrew - Supernews
andrew+nonews@supernews.com
In reply to: Josh Berkus (#1)
#47Kaare Rasmussen
kar@kakidata.dk
In reply to: Andrew - Supernews (#46)
#48Bruce Momjian
bruce@momjian.us
In reply to: Kaare Rasmussen (#47)
#49Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#44)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#48)
#51Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#50)
#52Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#44)
#53The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#50)
#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#53)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#52)
#56The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#54)
#57Bruce Momjian
bruce@momjian.us
In reply to: The Hermit Hacker (#56)
#58The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#57)
#59Jan Wieck
JanWieck@Yahoo.com
In reply to: The Hermit Hacker (#56)
#60Tino Wildenhain
tino@wildenhain.de
In reply to: Jan Wieck (#59)
#61The Hermit Hacker
scrappy@hub.org
In reply to: Jan Wieck (#59)
#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#61)
#63Andrew Dunstan
andrew@dunslane.net
In reply to: The Hermit Hacker (#61)
#64The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#62)
#65The Hermit Hacker
scrappy@hub.org
In reply to: Andrew Dunstan (#63)
#66Jan Wieck
JanWieck@Yahoo.com
In reply to: The Hermit Hacker (#65)