TODO list comments

Started by Tom Lanealmost 21 years ago20 messageshackers

tgl@sss.pgh.pa.us

almost 21 years ago

I made a pass over the TODO list to see what was out of date.

* Allow administrators to safely terminate individual sessions either
via an SQL function or SIGTERM

Currently SIGTERM of a backend can lead to lock table corruption.

This comment may be out of date. Suggest

Lock table corruption following SIGTERM of an individual backend
has been reported in 8.0. A possible cause is fixed in 8.1, but
it is unknown whether other trouble spots exist. This item is
mainly a matter of doing adequate testing rather than of writing
any new code.

o Allow postgresql.conf values to be set so they can not be changed
by the user

Is that really a good idea? The ones that are unsafe are restricted already.

* %Remove Money type, add money formatting for decimal type

There's a fair-size contingent that doesn't want Money removed
completely, but just reimplemented as an I/O wrapper around type
numeric. Maybe that's even what you mean by the TODO item, but
it's not clear. Please at least mention the alternative.

o %Allow MIN()/MAX() on arrays

This is done.

o Modify array literal representation to handle array index lower bound
of other than one

This too.

o Add security checking for large objects

Currently large objects entries do not have owners. Permissions can
only be set at the pg_largeobject table level.

This comment is wrong: trying to set the permissions on pg_largeobject
would have no effect whatsoever on the lo_xxx functions, so there is not
even a partial solution available now.

o Auto-delete large objects when referencing row is deleted

This should note that contrib/lo already offers a solution.

* %Have views on temporary tables exist in the temporary namespace
* Allow temporary views on non-temporary tables

Both of these are done in 8.1.

* %Allow RULE recompilation

Eh? Perhaps you meant "automatically regenerate cached plans when
needed", in which case it's redundant with the Dependency Checking
entries. Whatever it means, this doesn't seem a particularly simple
item.

* %Allow TRUNCATE ... CASCADE/RESTRICT

Huh? What would that do?

* Make row-wise comparisons work per SQL spec

This could probably be marked as a % item.

o Currently the system uses the operating system COPY command to
create a new database. Add ON COMMIT capability to CREATE TABLE AS
SELECT

This seems a bit garbled, and anyway the first part is done.

o %Add ALTER DOMAIN TYPE

To do what, exactly? This is unclear.

o -Allow objects to be moved to different schemas

This is only partly done --- the 8.1 patch didn't cover all object types.

o %Disallow dropping of an inherited constraint
...
o %Prevent child tables from altering constraints like CHECK that were
inherited from the parent table

These seem to be duplicates, or at least in need of merging.

o Handle references to temporary tables that are created, destroyed,
then recreated during a session, and EXECUTE is not used

This requires the cached PL/PgSQL byte code to be invalidated when
an object referenced in the function is changed.

This is redundant with the Dependency Checking item about regenerating
cached plans.

o Add table function support to pltcl, plperl, plpython?

Isn't this done for plperl?

o Allow PL/pgSQL to name columns by ordinal position, e.g. rec.(3)

This doesn't seem like an amazingly good idea; would prefer to see a way
to get the column name list and use names dynamically. Numbers have all
the same problems as "SELECT *" ...

o Add MOVE to PL/pgSQL

This should be generalized: upgrade plpgsql cursor support to have all
the FETCH and MOVE options of the main language.

o Add support for polymorphic arguments and return types to plperl

I think all the PLs except plpgsql need this.

Also, all the PLs except plpgsql are well behind the curve on supporting
parameter names and OUT parameters. Please add TODO item(s) for these.

* Allow libpq to access SQLSTATE so pg_ctl can test for connection failure

This would be used for checking if the server is up.

Huh? What has SQLSTATE got to do with connection failure checking?

* Have initdb set DateStyle based on locale?

Is this really a good idea? Being standardized on ISO format seems like
a good thing to me, and encouraging people to adopt ambiguous formats as
default a very bad thing. They can do it if they like, certainly, but
having initdb do it for them just seems like not the direction we want.

* Add a schema option to createlang

This is superseded by events: createlang now puts the functions in
pg_catalog, and there doesn't seem any particularly good reason to
want to put them elsewhere.

o Improve psql's handling of multi-line queries

Uh, what's wrong with it? This item seems far too vague.

o Add pg_dumpall custom format dumps.

This is probably best done by combining pg_dump and pg_dumpall
into a single binary.

This is probably obsoleted by events, too. Now that we can dump blobs
in text mode, I see no reason that we ever need to do this.
pg_restore's only real reason to live is to support selective restore
(ie, pulling out just a few objects from an existing dump) and I do not
see that you need that for pg_dumpall dumps.

o Remove unnecessary abstractions in pg_dump source code

Like which?

* %Remove CREATE CONSTRAINT TRIGGER

This was used in older releases to dump referential integrity
constraints.

Do we really want to remove it, and thereby guarantee we can't load
dumps from those old releases?

* Fetch heap pages matching index entries in sequential order

Rather than randomly accessing heap pages based on index entries, mark
heap pages needing access in a bitmap and do the lookups in sequential
order. Another method would be to sort heap ctids matching the index
before accessing the heap rows.

This is done (see bitmap index scans).

* Hash

Why doesn't the hash index section mention the need for WAL support?
Multicolumn hash indexes might be interesting too.

o Pack hash index buckets onto disk pages more efficiently

Currently no only one hash bucket can be stored on a page. Ideally
several hash buckets could be stored on a single page and greater
granularity used for the hash algorithm.

I think that should read "Currently only one ..."

* Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

This item should probably point out that the optimal settings are
certainly platform-dependent.

* Improve the background writer

Allow the background writer to more efficiently write dirty buffers
from the end of the LRU cache and use a clock sweep algorithm to
write other dirty buffers to reduced checkpoint I/O

This is done.

* Improve speed with indexes

For large table adjustements during vacuum, it is faster to reindex
rather than update the index.

This applies only to VACUUM FULL, so it probably needs to be reworded.

* Reduce lock time by moving tuples with read lock, then write
lock and truncate table

Ditto.

* Auto-vacuum

o %Suggest VACUUM FULL if a table is nearly empty

It seems like a fairly bad idea for auto-vacuum to do a VACUUM FULL
ever, given the locking effects. And how is a background daemon going
to "suggest" anything? It could write to the postmaster log but it's
entirely likely the user would never notice.

* -Improve SMP performance on i386 machines

i386-based SMP machines can generate excessive context switching
caused by lock failure in high concurrency situations. This may be
caused by CPU cache line invalidation inefficiencies.

This isn't really done, I don't think, we just ameliorated what was the
largest cause of it. It'll be back...

* Fix priority ordering of read and write light-weight locks (Neil)

I think we concluded that was a bad idea.

* Add WAL index reliability improvement to non-btree indexes

Oh, here it is. I'd be inclined to put this item in the index section
instead, so that you can have separate entries for each index type.
Besides, GIST is done.

* -Use CHECK constraints to influence optimizer decisions

CHECK constraints contain information about the distribution of values
within the table. This is also useful for implementing subtables where
a tables content is distributed across several subtables.

This isn't completely done by any means.

* ANALYZE should record a pg_statistic entry for an all-NULL column

This is done.

* Remove memory/file descriptor freeing before ereport(ERROR)
* Promote debug_query_string into a server-side function current_query()
* Allow the identifier length to be increased via a configure option

All of those could probably be marked %.

* Allow cross-compiling by generating the zic database on the target system
* Fix cross-compiling of time zone database via 'zic'

These look like duplicates to me ...

o Improve dlerror() reporting string

I think this got done.

o Add support for Unicode

This is done too, though not tested enough.

regards, tom lane

Kris Jurka

books@ejurka.com

almost 21 years ago

In reply to: Tom Lane (#1)

Re: TODO list comments

On Wed, 24 Aug 2005, Tom Lane wrote:

* Fetch heap pages matching index entries in sequential order

Rather than randomly accessing heap pages based on index entries, mark
heap pages needing access in a bitmap and do the lookups in sequential
order. Another method would be to sort heap ctids matching the index
before accessing the heap rows.

This is done (see bitmap index scans).

Will the optimizer ever choose this plan when dealing with only one index?

Kris Jurka

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Kris Jurka (#2)

Re: TODO list comments

Kris Jurka <books@ejurka.com> writes:

On Wed, 24 Aug 2005, Tom Lane wrote:

This is done (see bitmap index scans).

Will the optimizer ever choose this plan when dealing with only one index?

Certainly. It's actually likely to prefer a bitmap scan whenever the
query is estimated to fetch more than one percent or so of the table
(although if you are demanding ORDER BY the index order, the crossover
point is higher, since a bitmap scan doesn't deliver sorted output).

Something that probably ought to be on the Open Items list for 8.1
is whether the cost estimation for bitmap vs plain indexscan is OK.
It's entirely likely that we need to do some tweaking to get the
planner to make the right choice.

regards, tom lane

Michael Glaesemann

grzm@seespotcode.net

almost 21 years ago

In reply to: Tom Lane (#1)

Re: TODO list comments

On Aug 25, 2005, at 10:58 AM, Tom Lane wrote:

* %Remove CREATE CONSTRAINT TRIGGER

This was used in older releases to dump referential integrity
constraints.

Do we really want to remove it, and thereby guarantee we can't load
dumps from those old releases?

Also, I believe CONSTRAINT TRIGGERS are the only way to provide
transaction level (rather than statement level) referential
integrity. I've used this in the past. The SQL command reference page
mentions that it's not for general use, but it'd be a shame to remove
it before there's an alternative way to provide transaction level
referential integrity.

Michael Glaesemann
grzm myrealbox com

Hannu Krosing

hannu@tm.ee

almost 21 years ago

In reply to: Tom Lane (#1)

Re: TODO list comments

On K, 2005-08-24 at 21:58 -0400, Tom Lane wrote:

* %Allow TRUNCATE ... CASCADE/RESTRICT

Huh? What would that do?

Maybe this was meant truncating of tables with dependent foreign keys ?

AFAIR this was solved by allowing truncating several tables in one
command even if they have FK relationships between themselves.

This is only partly done --- the 8.1 patch didn't cover all object types.

o %Disallow dropping of an inherited constraint
...
o %Prevent child tables from altering constraints like CHECK that were
inherited from the parent table

These seem to be duplicates, or at least in need of merging.

It should probably mention about weird inheritance behaviour of "CREATE
CONSTRAINT ON ONLY tablename" - it is not propagated to existing child
tables, but is inherited when creating new ones.

Also, I don't think this should be done at all, at least not before we
have proper partitioned table support ready. I could live with it
creating a warning about not being future-compatible.

o Handle references to temporary tables that are created, destroyed,
then recreated during a session, and EXECUTE is not used

This requires the cached PL/PgSQL byte code to be invalidated when
an object referenced in the function is changed.

This is redundant with the Dependency Checking item about regenerating
cached plans.

Or maybe not completely, depending on how you do it.

If temp table itself is created inside the same pl/pgsql function, then
there could still be a way to do the planning/optimising only once and
then substitute temp table oids when running the function.

The table structure in this case is quaranteed to be the same during
each run of the function, it's just that the temp table and index oids
should be treated as local variables.

Done this way, it gives real benefits in terms of cached query plans,
instead of just preventing newcomers from shooting themselves in foot by
not using EXECUTE.

* Improve speed with indexes

For large table adjustements during vacuum, it is faster to reindex
rather than update the index.

This applies only to VACUUM FULL, so it probably needs to be reworded.

In case we implement concurrent/non-blocking CREATE INDEX at some point,
this might be a good idea for lazy VACUUM as well.

And it may make more sense to do CLUSTER instead of VACUUM FULL in at
least some of these cases.

(btw. CLUSTER seems to be another function which my concurrent vacuuming
patch should be extended to cover, at least on "client" side, like
CREATE INDEX)

* Auto-vacuum

o %Suggest VACUUM FULL if a table is nearly empty

It seems like a fairly bad idea for auto-vacuum to do a VACUUM FULL
ever, given the locking effects. And how is a background daemon going
to "suggest" anything? It could write to the postmaster log but it's
entirely likely the user would never notice.

With current implementations of commands, doing (some equivalent of)
CLUSTER here seems a better idea than VACUUM FULL, as it also un-bloats
indexes. Not sure of of transactional behaviour though.

--
Hannu Krosing <hannu@skype.net>

Robert Treat

xzilla@users.sourceforge.net

almost 21 years ago

In reply to: Tom Lane (#1)

Re: TODO list comments

On Wed, 2005-08-24 at 21:58, Tom Lane wrote:

o Add pg_dumpall custom format dumps.

This is probably best done by combining pg_dump and pg_dumpall
into a single binary.

This is probably obsoleted by events, too. Now that we can dump blobs
in text mode, I see no reason that we ever need to do this.
pg_restore's only real reason to live is to support selective restore
(ie, pulling out just a few objects from an existing dump) and I do not
see that you need that for pg_dumpall dumps.

Being able to restore just the database users without restoring all
databases? (There are other ways that could be accomplished, like
adding user information to pg_dump, but it's one scenario anyway)

Actually the argument that you would have to do both a pg_dumpall of the
cluster and a pg_dump of each database in order to obtain this
functionality seems so user unfriendly it seems like something to persue
on those grounds alone (imho).

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Greg Sabino Mullane

greg@turnstep.com

almost 21 years ago

In reply to: Tom Lane (#1)

Re: TODO list comments

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Lane asked:

o Improve psql's handling of multi-line queries

Uh, what's wrong with it? This item seems far too vague.

I think perhaps this means adding multi-line support to
the tab-completion? Only thing I can think of, cause other
than that, multi-line queries work just fine.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200508250952
https://www.biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkMNzTAACgkQvJuQZxSWSshB8gCgvOU3rZi1uwFnwXO2zVz6KjUG
TUwAn3VoHGbqGkP1bRItMgVFE3vPQkkf
=rA0w
-----END PGP SIGNATURE-----

Matt Miller

mattm@epx.com

almost 21 years ago

In reply to: Michael Glaesemann (#4)

Re: TODO list comments

On Thu, 2005-08-25 at 15:50 +0900, Michael Glaesemann wrote:

* %Remove CREATE CONSTRAINT TRIGGER

Do we really want to remove it,

Also, I believe CONSTRAINT TRIGGERS are the only way to provide
transaction level (rather than statement level) referential
integrity.

Don't deferrable foreign keys give you transaction-level referential
integrity? From the SET CONSTRAINTS doc:

Synopsis
SET CONSTRAINTS { ALL | name [, ...] } { DEFERRED | IMMEDIATE }
Description
SET CONSTRAINTS sets the behavior of constraint checking within the
current transaction. IMMEDIATE constraints are checked at the end of
each statement. DEFERRED constraints are not checked until transaction
commit.

Alvaro Herrera

alvherre@2ndquadrant.com

almost 21 years ago

In reply to: Greg Sabino Mullane (#7)

Re: TODO list comments

On Thu, Aug 25, 2005 at 01:53:32PM -0000, Greg Sabino Mullane wrote:

Tom Lane asked:

o Improve psql's handling of multi-line queries

Uh, what's wrong with it? This item seems far too vague.

I think perhaps this means adding multi-line support to
the tab-completion? Only thing I can think of, cause other
than that, multi-line queries work just fine.

The saved history is also not cool about multiline queries. If you
enter them interactively (or by pasting), they are entered as several
entries. If you edit them with \e, they are entered as a single unit.

It would be also nice to have M-# to work well -- currently it inserts a
#, which works in bash but is obviously wrong in psql.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Los romï¿½nticos son seres que mueren de deseos de vida"

#10

Oliver Elphick

olly@lfix.co.uk

almost 21 years ago

In reply to: Greg Sabino Mullane (#7)

Re: TODO list comments

On Thu, 2005-08-25 at 13:53 +0000, Greg Sabino Mullane wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Lane asked:

o Improve psql's handling of multi-line queries

Uh, what's wrong with it? This item seems far too vague.

If you enter a multi-line query one line at a time, a subsequent
up-arrow will recover one line at a time; on the other hand, if you use
\e to edit a multi-line query, a subsequent up-arrow will recover the
whole query in one go. The latter behaviour would be nice in all cases.

An item not in the TODO list yet -- would anyone support including this
feature in psql?:
It would be nice if multi-line items lined up with their proper column
on output. This is what happens at the moment:

junk=# insert into xyz (name,address) values ('Joe Bloggs','1 Hindhead Villas,
junk'# Newport,
junk'# Gwent');
INSERT 230412518 1
junk=# select * from xyz;
id | name | address
----+------------+-----------------------------------
1 | Joe Bloggs | 1 Hindhead Villas,
Newport,
Gwent
(1 row)

If there is more than one potential source column, things are even
worse:

It would be better to show the columns aligned (perhaps without showing
separators for other columns so as not to give the impression that the
other columns contain null or empty strings):

\a would turn this behaviour off.

Oliver Elphick

#11

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Oliver Elphick (#10)

Re: TODO list comments

Oliver Elphick <olly@lfix.co.uk> writes:

It would be better to show the columns aligned (perhaps without showing
separators for other columns so as not to give the impression that the
other columns contain null or empty strings):

junk=# select * from xyz;
id | name | address | del_addr
----+------------+-----------------------------------+----------------------------------
1 | Joe Bloggs | 1 Hindhead Villas, | 2 The Laurels,
| Newport, | Swinkley,
| Gwent | XX3 5CX
(1 row)

I think the above is unacceptable because it looks indistinguishable
from a valid but quite different dataset. (No, the "1 row" doesn't make
it better; as soon as there's more than one row you can't tell what you
have. And leaving out the first | doesn't help if all the columns are
multiline.)

It might be OK without any separators on the added lines, though:

Or perhaps use a different separator:

junk=# select * from xyz;
 id |    name    |              address              |             del_addr 
----+------------+-----------------------------------+----------------------------------
  1 | Joe Bloggs | 1 Hindhead Villas,                | 2 The Laurels,
    +            + Newport,                          + Swinkley,
    +            + Gwent                             + XX3 5CX
(1 row)

Not sure how hard this would be to program, or what sort of overhead it
might impose to check for the case. My recollection is that psql's
table-layout code is pretty slow and ugly already ...

regards, tom lane

#12

Andrew Dunstan

andrew@dunslane.net

almost 21 years ago

In reply to: Tom Lane (#11)

Re: TODO list comments

Tom Lane wrote:

Or perhaps use a different separator:

junk=# select * from xyz;
id |    name    |              address              |             del_addr 
----+------------+-----------------------------------+----------------------------------
1 | Joe Bloggs | 1 Hindhead Villas,                | 2 The Laurels,
+            + Newport,                          + Swinkley,
+            + Gwent                             + XX3 5CX
(1 row)

That's a terrific idea, and, incidentally, just the sort of project that
might well suit a beginning hacker, since the code is pretty isolated.

Not sure how hard this would be to program, or what sort of overhead it
might impose to check for the case. My recollection is that psql's
table-layout code is pretty slow and ugly already ...

If people want speed they shouldn't use psql as a client anyway. I don't
see this as much of an obstacle.

cheers

andrew

#13

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 21 years ago

In reply to: Tom Lane (#1)

Re: TODO list comments

On Wed, Aug 24, 2005 at 09:58:04PM -0400, Tom Lane wrote:

* %Allow RULE recompilation

Eh? Perhaps you meant "automatically regenerate cached plans when
needed", in which case it's redundant with the Dependency Checking
entries. Whatever it means, this doesn't seem a particularly simple
item.

Hrm... I read that as allowing CREATE OR REPLACE on rules, but of course
that already exists.

http://lnk.nu/search.postgresql.org/3mt.search

* %Allow TRUNCATE ... CASCADE/RESTRICT

Huh? What would that do?

http://archives.postgresql.org/pgsql-hackers/2003-08/msg01045.php

o %Add ALTER DOMAIN TYPE

To do what, exactly? This is unclear.

http://archives.postgresql.org/pgsql-hackers/2004-05/msg00985.php

o Remove unnecessary abstractions in pg_dump source code

Like which?

I *think* this is reffering to how pg_dump makes some assumptions about
what things are system objects.

http://archives.postgresql.org/pgsql-committers/2005-08/msg00203.php
doesn't help a heck of a lot...

Can we add an interface to the TODO list that contains search links back
to the mailing lists?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com 512-569-9461

#14

Michael Glaesemann

grzm@seespotcode.net

almost 21 years ago

In reply to: Matt Miller (#8)

Re: TODO list comments

On Aug 25, 2005, at 11:29 PM, Matt Miller wrote:

On Thu, 2005-08-25 at 15:50 +0900, Michael Glaesemann wrote:

* %Remove CREATE CONSTRAINT TRIGGER

Do we really want to remove it,

Also, I believe CONSTRAINT TRIGGERS are the only way to provide
transaction level (rather than statement level) referential
integrity.

Don't deferrable foreign keys give you transaction-level referential
integrity? From the SET CONSTRAINTS doc:

Sorry, I misspoke. What I'm thinking of is not referential integrity
in the sense of foreign keys, but assertions, which PostgreSQL does
not yet support.

Say for example you have a table that contains time periods marked by
a start_date and an end_date and you want there to be no gaps between
the different time periods in the table for a given key. When doing
updates, deletes, or inserts on this table, you'll need to check to
make sure there are no gaps when the transaction is finished.
However, there may indeed be gaps during the transaction as
start_dates and end_dates are updated. Triggers can be written to
enforce this kind of integrity, but they'll only work if they're
deferrable.

Michael Glaesemann
grzm myrealbox com

#15

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Tom Lane (#1)

Re: TODO list comments

Great updates! Let me comment on each one.

I made a pass over the TODO list to see what was out of date.

* Allow administrators to safely terminate individual sessions either
via an SQL function or SIGTERM

Currently SIGTERM of a backend can lead to lock table corruption.

This comment may be out of date. Suggest

Lock table corruption following SIGTERM of an individual backend
has been reported in 8.0. A possible cause is fixed in 8.1, but
it is unknown whether other trouble spots exist. This item is
mainly a matter of doing adequate testing rather than of writing
any new code.

Done.

o Allow postgresql.conf values to be set so they can not be changed
by the user

Is that really a good idea? The ones that are unsafe are restricted already.

Well, a typical case would be log_statement, but I see that is
super-user now. I guess we are OK, removed. If we get more problems,
we can re-add something later.

* %Remove Money type, add money formatting for decimal type

There's a fair-size contingent that doesn't want Money removed
completely, but just reimplemented as an I/O wrapper around type
numeric. Maybe that's even what you mean by the TODO item, but
it's not clear. Please at least mention the alternative.

Updated:

* Improve the MONEY data type

Change the MONEY data type to use DECIMAL internally, with special
locale-aware output formatting.

o %Allow MIN()/MAX() on arrays

This is done.

OK.

o Modify array literal representation to handle array index lower bound
of other than one

This too.

OK.

o Add security checking for large objects

Currently large objects entries do not have owners. Permissions can
only be set at the pg_largeobject table level.

This comment is wrong: trying to set the permissions on pg_largeobject
would have no effect whatsoever on the lo_xxx functions, so there is not
even a partial solution available now.

Oh, comment removed.

o Auto-delete large objects when referencing row is deleted

This should note that contrib/lo already offers a solution.

Done.

* %Have views on temporary tables exist in the temporary namespace
* Allow temporary views on non-temporary tables

Both of these are done in 8.1.

OK.

* %Allow RULE recompilation

Eh? Perhaps you meant "automatically regenerate cached plans when
needed", in which case it's redundant with the Dependency Checking
entries. Whatever it means, this doesn't seem a particularly simple
item.

Agreed, updated to:

* Allow VIEW/RULE recompilation when the underlying tables change

* %Allow TRUNCATE ... CASCADE/RESTRICT

Huh? What would that do?

I assume it is just like DELETE CASCADE, but it TRUNCATES rather than
DELETE. Description added.

* Make row-wise comparisons work per SQL spec

This could probably be marked as a % item.

Done.

o Currently the system uses the operating system COPY command to
create a new database. Add ON COMMIT capability to CREATE TABLE AS
SELECT

This seems a bit garbled, and anyway the first part is done.

Yep, garbled. I have removed the first part.

o %Add ALTER DOMAIN TYPE

To do what, exactly? This is unclear.

I assume it would allow the underlying data type to be changed. Updated
text:

o Add ALTER DOMAIN to modify the underlying data type

o -Allow objects to be moved to different schemas

This is only partly done --- the 8.1 patch didn't cover all object types.

Updated to:

o Add missing object types for ALTER ... SET SCHEMA

o %Disallow dropping of an inherited constraint
...
o %Prevent child tables from altering constraints like CHECK that were
inherited from the parent table

These seem to be duplicates, or at least in need of merging.

Merged and updated:

o %Prevent child tables from altering or dropping constraints
like CHECK that were inherited from the parent table

o Handle references to temporary tables that are created, destroyed,
then recreated during a session, and EXECUTE is not used

This requires the cached PL/PgSQL byte code to be invalidated when
an object referenced in the function is changed.

This is redundant with the Dependency Checking item about regenerating
cached plans.

Removed and description added to dependency item:

* Track dependencies in function bodies and recompile/invalidate

This is particularly important for references to temporary tables
in PL/PgSQL because PL/PgSQL caches query plans. The only workaround
in PL/PgSQL is to use EXECUTE.

o Add table function support to pltcl, plperl, plpython?

Isn't this done for plperl?

Right, plperl removed.

o Allow PL/pgSQL to name columns by ordinal position, e.g. rec.(3)

This doesn't seem like an amazingly good idea; would prefer to see a way
to get the column name list and use names dynamically. Numbers have all
the same problems as "SELECT *" ...

Yep, updated:

o Allow function argument names to be queries from PL/PgSQL

o Add MOVE to PL/pgSQL

This should be generalized: upgrade plpgsql cursor support to have all
the FETCH and MOVE options of the main language.

o Add support for polymorphic arguments and return types to plperl

I think all the PLs except plpgsql need this.

Updated to:

o Add support for polymorphic arguments and return types to
languages other than PL/PgSQL

Also, all the PLs except plpgsql are well behind the curve on supporting
parameter names and OUT parameters. Please add TODO item(s) for these.

OK:

o Add support for OUT and INOUT parameters to languages other
than PL/PgSQL

* Allow libpq to access SQLSTATE so pg_ctl can test for connection failure

This would be used for checking if the server is up.

Huh? What has SQLSTATE got to do with connection failure checking?

I am confused too. What we have now is this check in pg_ctl.c:

(PQstatus(conn) == CONNECTION_OK ||
(strcmp(PQerrorMessage(conn),
PQnoPasswordSupplied) == 0)))

I assume it is to allow better checking of the password request, but I
am unsure. I will remove the item unless someone else understands it.
Or is it this from libpq, PG_DIAG_SQLSTATE? I think the underlying
problem is that pg_ctl doesn't have a good way to determine if the
server is running based on a failure to connect. Anyway, item removed
unless we get more reports of problems.

* Have initdb set DateStyle based on locale?

Is this really a good idea? Being standardized on ISO format seems like
a good thing to me, and encouraging people to adopt ambiguous formats as
default a very bad thing. They can do it if they like, certainly, but
having initdb do it for them just seems like not the direction we want.

True, but we still have to have a default of what do with with input
like 01/02/03. TODO updated:

* Have initdb set the input DateStyle (MDY or DMY) based on locale?

* Add a schema option to createlang

This is superseded by events: createlang now puts the functions in
pg_catalog, and there doesn't seem any particularly good reason to
want to put them elsewhere.

OK, removed.

o Improve psql's handling of multi-line queries

Uh, what's wrong with it? This item seems far too vague.

Later emails in this thread clarify this and I will address it there.

o Add pg_dumpall custom format dumps.

This is probably best done by combining pg_dump and pg_dumpall
into a single binary.

This is probably obsoleted by events, too. Now that we can dump blobs
in text mode, I see no reason that we ever need to do this.
pg_restore's only real reason to live is to support selective restore
(ie, pulling out just a few objects from an existing dump) and I do not
see that you need that for pg_dumpall dumps.

Hmm, good points. I added a question mark and removed the description.
If we add CSV output format to pg_dump, I can imagine pg_dumpall perhaps
optionally using that too.

o Remove unnecessary abstractions in pg_dump source code

Like which?

Uh, we have talked about it. It is the complex function pointer usage
in the code that needs cleaning. Updated:

o Remove unnecessary function pointer abstractions in pg_dump source
code

* %Remove CREATE CONSTRAINT TRIGGER

This was used in older releases to dump referential integrity
constraints.

Do we really want to remove it, and thereby guarantee we can't load
dumps from those old releases?

OK, TODO item removed, and I saw later discussion that is still useful.

* Fetch heap pages matching index entries in sequential order

Rather than randomly accessing heap pages based on index entries, mark
heap pages needing access in a bitmap and do the lookups in sequential
order. Another method would be to sort heap ctids matching the index
before accessing the heap rows.

This is done (see bitmap index scans).

OK.

* Hash

Why doesn't the hash index section mention the need for WAL support?
Multicolumn hash indexes might be interesting too.

Added:

o Add WAL logging for crash recovery
o Allow multi-column hash indexes

o Pack hash index buckets onto disk pages more efficiently

Currently no only one hash bucket can be stored on a page. Ideally
several hash buckets could be stored on a single page and greater
granularity used for the hash algorithm.

I think that should read "Currently only one ..."

Thanks, fixed.

* Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

This item should probably point out that the optimal settings are
certainly platform-dependent.

Added, though they are not optimizer per-platform. Rather we use the
best one we find, and assume the _best_ is the same on all platforms.

Updated:

* Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

Ideally this requires a separate test program that can be run
at initdb time or optionally later.

* Improve the background writer

Allow the background writer to more efficiently write dirty buffers
from the end of the LRU cache and use a clock sweep algorithm to
write other dirty buffers to reduced checkpoint I/O

This is done.

OK.

* Improve speed with indexes

For large table adjustements during vacuum, it is faster to reindex
rather than update the index.

This applies only to VACUUM FULL, so it probably needs to be reworded.

Updated.

* Reduce lock time by moving tuples with read lock, then write
lock and truncate table

Ditto.

Updated.

* Auto-vacuum

o %Suggest VACUUM FULL if a table is nearly empty

It seems like a fairly bad idea for auto-vacuum to do a VACUUM FULL
ever, given the locking effects. And how is a background daemon going
to "suggest" anything? It could write to the postmaster log but it's
entirely likely the user would never notice.

Well, something in the logs is better than nothing. We already warn
about other server issues like too frequent checkpoints in the server
logs.

New TODO:

o %Issue log message to suggest VACUUM FULL if a table is nearly
empty?

Added a question mark because this clearly needs more thought.

* -Improve SMP performance on i386 machines

i386-based SMP machines can generate excessive context switching
caused by lock failure in high concurrency situations. This may be
caused by CPU cache line invalidation inefficiencies.

This isn't really done, I don't think, we just ameliorated what was the
largest cause of it. It'll be back...

Yep, we can re-add it later once we get a report.

* Fix priority ordering of read and write light-weight locks (Neil)

I think we concluded that was a bad idea.

* Add WAL index reliability improvement to non-btree indexes

Oh, here it is. I'd be inclined to put this item in the index section
instead, so that you can have separate entries for each index type.
Besides, GIST is done.

Right, I think Hash is the only one left, and you suggested adding that
above. Item removed.

* -Use CHECK constraints to influence optimizer decisions

CHECK constraints contain information about the distribution of values
within the table. This is also useful for implementing subtables where
a tables content is distributed across several subtables.

This isn't completely done by any means.

Uh, isn't this constraint_elimination? The only problem is that it
isn't automatic. The remaining part seems useless now that the
optimizer statistics are pretty good. OK?

Added:

* Allow constraint_elimination to be automatically performed

This requires additional code to reduce the performance loss caused by
constraint elimination.

* ANALYZE should record a pg_statistic entry for an all-NULL column

This is done.

* Remove memory/file descriptor freeing before ereport(ERROR)
* Promote debug_query_string into a server-side function current_query()
* Allow the identifier length to be increased via a configure option

All of those could probably be marked %.

Done.

* Allow cross-compiling by generating the zic database on the target system
* Fix cross-compiling of time zone database via 'zic'

These look like duplicates to me ...

Yep, second one removed.

o Improve dlerror() reporting string

I think this got done.

OK.

o Add support for Unicode

This is done too, though not tested enough.

OK.

Great changes. All applied.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#16

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Hannu Krosing (#5)

Re: TODO list comments

Hannu Krosing wrote:

On K, 2005-08-24 at 21:58 -0400, Tom Lane wrote:

* %Allow TRUNCATE ... CASCADE/RESTRICT

Huh? What would that do?

Maybe this was meant truncating of tables with dependent foreign keys ?

AFAIR this was solved by allowing truncating several tables in one
command even if they have FK relationships between themselves.

Yes, but I can imagine allowing a CASCADE behavior as well.

This is only partly done --- the 8.1 patch didn't cover all object types.

o %Disallow dropping of an inherited constraint
...
o %Prevent child tables from altering constraints like CHECK that were
inherited from the parent table

These seem to be duplicates, or at least in need of merging.

It should probably mention about weird inheritance behaviour of "CREATE
CONSTRAINT ON ONLY tablename" - it is not propagated to existing child
tables, but is inherited when creating new ones.

I am not sure on that one because the table does have the constraint at
the time the child is created. Comments?

Also, I don't think this should be done at all, at least not before we
have proper partitioned table support ready. I could live with it
creating a warning about not being future-compatible.

Right, TODO item removed.

o Handle references to temporary tables that are created, destroyed,
then recreated during a session, and EXECUTE is not used

This requires the cached PL/PgSQL byte code to be invalidated when
an object referenced in the function is changed.

This is redundant with the Dependency Checking item about regenerating
cached plans.

Or maybe not completely, depending on how you do it.

Well, I beefed up the item:

* Track dependencies in function bodies and recompile/invalidate

This is particularly important for references to temporary tables
in PL/PgSQL because PL/PgSQL caches query plans. The only workaround
in PL/PgSQL is to use EXECUTE.

If temp table itself is created inside the same pl/pgsql function, then
there could still be a way to do the planning/optimising only once and
then substitute temp table oids when running the function.

The table structure in this case is quaranteed to be the same during
each run of the function, it's just that the temp table and index oids
should be treated as local variables.

Interesting approach but is it worth the added complexity? One issue
this does bring up is that functions themselves might invalidate their
own cached query plan by dropping a table and receating it. In those
cases, your solution would be the only valid one, or throw an error.

I added some more text:

* Track dependencies in function bodies and recompile/invalidate

This is particularly important for references to temporary tables
in PL/PgSQL because PL/PgSQL caches query plans. The only workaround
in PL/PgSQL is to use EXECUTE. One complexity is that a function
might itself drop and recreate dependent tables, causing it to
invalidate its own query plan.

Done this way, it gives real benefits in terms of cached query plans,
instead of just preventing newcomers from shooting themselves in foot by
not using EXECUTE.

* Improve speed with indexes

For large table adjustements during vacuum, it is faster to reindex
rather than update the index.

This applies only to VACUUM FULL, so it probably needs to be reworded.

In case we implement concurrent/non-blocking CREATE INDEX at some point,
this might be a good idea for lazy VACUUM as well.

Perhaps.

And it may make more sense to do CLUSTER instead of VACUUM FULL in at
least some of these cases.

Cluster modifies the heap while reindex does not. This makes cluster a
much heavier operation.

(btw. CLUSTER seems to be another function which my concurrent vacuuming
patch should be extended to cover, at least on "client" side, like
CREATE INDEX)

Not sure.

* Auto-vacuum

o %Suggest VACUUM FULL if a table is nearly empty

It seems like a fairly bad idea for auto-vacuum to do a VACUUM FULL
ever, given the locking effects. And how is a background daemon going
to "suggest" anything? It could write to the postmaster log but it's
entirely likely the user would never notice.

With current implementations of commands, doing (some equivalent of)
CLUSTER here seems a better idea than VACUUM FULL, as it also un-bloats
indexes. Not sure of of transactional behaviour though.

Not sure, CLUSTER is still heavier. That doesn't mean it shouldn't be
used, but the administrator should automatically consider CLUSTER in
place of VACUUM FULL for large updates.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#17

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Alvaro Herrera (#9)

Re: TODO list comments

Alvaro Herrera wrote:

On Thu, Aug 25, 2005 at 01:53:32PM -0000, Greg Sabino Mullane wrote:

Tom Lane asked:

o Improve psql's handling of multi-line queries

Uh, what's wrong with it? This item seems far too vague.

I think perhaps this means adding multi-line support to
the tab-completion? Only thing I can think of, cause other
than that, multi-line queries work just fine.

The saved history is also not cool about multiline queries. If you
enter them interactively (or by pasting), they are entered as several
entries. If you edit them with \e, they are entered as a single unit.

TODO updated:

o Improve psql's handling of multi-line queries

Currently, while \e saves a single query as one entry, interactive
queries are saved one line at a time. Ideally all queries
whould be saved like \e does.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#18

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Andrew Dunstan (#12)

Re: TODO list comments

Andrew Dunstan wrote:

Tom Lane wrote:
Or perhaps use a different separator:
junk=# select * from xyz;
id |    name    |              address              |             del_addr 
----+------------+-----------------------------------+----------------------------------
1 | Joe Bloggs | 1 Hindhead Villas,                | 2 The Laurels,
+            + Newport,                          + Swinkley,
+            + Gwent                             + XX3 5CX
(1 row)
That's a terrific idea, and, incidentally, just the sort of project that
might well suit a beginning hacker, since the code is pretty isolated.

Not sure how hard this would be to program, or what sort of overhead it
might impose to check for the case. My recollection is that psql's
table-layout code is pretty slow and ugly already ...

If people want speed they shouldn't use psql as a client anyway. I don't
see this as much of an obstacle.

Added to TODO:

o Allow multi-line column values to align in the proper columns

If the second output column value is 'a\nb', the 'b' should appear
in the second display column, rather than the first column as it
does now.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#19

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Jim Nasby (#13)

Re: TODO list comments

Jim C. Nasby wrote:

I *think* this is reffering to how pg_dump makes some assumptions about
what things are system objects.

http://archives.postgresql.org/pgsql-committers/2005-08/msg00203.php
doesn't help a heck of a lot...

Can we add an interface to the TODO list that contains search links back
to the mailing lists?

Yes, that would be nice, though some times the threads are pretty long
and I try to digest the agreed-upon solution. Where would we put the
URLs? In the TODO file?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#20

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 21 years ago

In reply to: Bruce Momjian (#19)

Re: TODO list comments

On Fri, Aug 26, 2005 at 03:44:18PM -0400, Bruce Momjian wrote:

Jim C. Nasby wrote:

I *think* this is reffering to how pg_dump makes some assumptions about
what things are system objects.

http://archives.postgresql.org/pgsql-committers/2005-08/msg00203.php
doesn't help a heck of a lot...

Can we add an interface to the TODO list that contains search links back
to the mailing lists?

Yes, that would be nice, though some times the threads are pretty long
and I try to digest the agreed-upon solution. Where would we put the
URLs? In the TODO file?

Yeah, the digestification is good, and I hope it continues. But it's
also good to be able to refer back to the original thread in it's
entirety. My thought was to make the TODO item itself a link to the
search (or ideally the thread itself). The advantage of just linking to
the search is that would allow a clever CGI to just parse through the
TODO and linkify the TODO items.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com 512-569-9461