7.4 Wishlist
Hi guys,
Just out of interest, if someone was going to pay you to hack on Postgres
for 6 months, what would you like to code for 7.4?
My ones are:
* Compliant ADD COLUMN
* Integrated full text indexes
* pg_dump dependency ordering
What would you guys do? Even if it isn't feasible right now...
Chris
Christopher Kings-Lynne kirjutas R, 29.11.2002 kell 23:51:
Hi guys,
Just out of interest, if someone was going to pay you to hack on Postgres
for 6 months, what would you like to code for 7.4?My ones are:
* Compliant ADD COLUMN
* Integrated full text indexes
* pg_dump dependency orderingWhat would you guys do? Even if it isn't feasible right now...
As I don't have a permanent job starting next year (my main employer
went bust), I'm planning to do more on postgreSQL anyway (at least until
I run out of money ;)
I have done some (or sometimes a lot of) brain-twitching on items in the
following list, but very little actual useful coding on most.
My personal todo list is:
"Application server support"
----------------------------
* better XML integration
- XML(*) aggregate function returning XML representation of subquery
- XML input/output to/from tables
- XML searchable/indexable in fields)
* Overhaul of OO features (moving closer to SQL99)
- type/table inheritance,
table inheritance would be done using SQL99's UNDER and would be
single inheritance, stored in single logical table, possibly
subdivided in physical tables reusing our current huge table 1GB
split mechanisms
type inheritance would be done using SQL99's LIKE and would be
multiple inheritance and would reuse as much as possible the
current code for ADD/DROP/RENAME column
- check constraints would apply to both type and table inheritance
- pk/fk constraints would apply only to table inheritance
- types as base of tables,
- study feasibility of reference types,
- dynamic invocation of table function on queries over hierarchies
* WITH (as part of query/view)
* WITH RECURSIVE for recursive queries
* better NOTIFY (with optional argument, using shared memory
instead of tables)
General stuff
-------------
* making array types btree-indexable in a general way
* study feasibility of using SQL99's ARRAY syntax for arrays
Data warehousing
----------------
* bitmap indexes,
- using bitmap "indexes" internally for star joins
- real bitmap indexes
- clustered multiple bitmap indexes especially
clustering on group of bitmap indexes
* clustering in general - specifying pages to be filled only to a
certain percentage in clustered tables so that updated tuples can
be placed near original ones if needed and parallel vacuum can
then reclaim the space and keep table clustered with less shuffling.
* OLAP features
- WINDOW clause, PARTITION BY
- GROUPING SETS, ROLLUP, CUBE, ()
WAL-based master-slave replication
----------------------------------
* if someone is not doing it (which I hope is not true ;)
UNICODE / Localization
----------------------
* UTEXT, UCHAR, UVARCHAR types using IBM's ICU, stored in UTF-16 or SCSU
* fast LIKE, ILIKE, REGEX code for UTF-16, possibly lifted from python2
* field-level localization, again using ICU
FE/BE protocol
--------------
all can be worked on independently
* try to find a better wire protocol from existing ones (X-window
system seems simple enough, perhaps DB2's DRDA) or fix the existing
one for high performance (mainly make sure that as big chunks as
possible have preceeding length), make it easy to send
out-of-band/optional data (Notifications, info on actual query
performance (so one can visualize it for user), ...)
* standardize a fire-level binary protocol for field types (currently
whatever is stored is sent)
* work on making python use this protocol and port some postgres
datatypes (initially timestamp/date/time and varbit)to python
Really Dark Arts
------------------
* making backend internals available to a scripting language (for me it
means python ;) for making more parts (especially planner/optimizer)
more easily hackable
* using the stuff from previous point ;)
And that's all ;)
----------------
Hannu Krosing
pg_dump, our upgrade process is painful enough having to do a dump, reload.
I think we should be able to guarantee (or at least let much closer to it)
that the process works in all cases.
Personally pg_upgrade would be even nicer.
----- Original Message -----
From: "Christopher Kings-Lynne" <chriskl@familyhealth.com.au>
To: <pgsql-hackers@postgresql.org>
Sent: Friday, November 29, 2002 1:51 PM
Subject: [HACKERS] 7.4 Wishlist
Show quoted text
Hi guys,
Just out of interest, if someone was going to pay you to hack on Postgres
for 6 months, what would you like to code for 7.4?My ones are:
* Compliant ADD COLUMN
* Integrated full text indexes
* pg_dump dependency orderingWhat would you guys do? Even if it isn't feasible right now...
Chris
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
Christopher Kings-Lynne wrote:
Hi guys,
Just out of interest, if someone was going to pay you to hack on Postgres
for 6 months, what would you like to code for 7.4?
Replication. Replication. Replication. Replication. Replication.
Replication. Replication. Replication. Replication. Replication.
Replication. Replication. Replication. Replication. Replication.
Well, jokes apart, I think this is one of the most needed features to
me. Currently I'm using strange voodoo to replicate some tables on other
machines in order to spread load and resilency. Compared to what I am
doing now a good master to slave replication would be heaven.
I understand that a good replication is painful but in my experience, if
you start by integrating some rude, experimental implementation in the
mainstream PostgreSQL the rest will come by itself.
For example, RI was something I wouldn't consider "production level" in
7.2, but was a start, now in 7.3 is much much better, probably complete
in the most important parts.
Other wishes (not as important as the replication issue) are:
- Better granularity of security and access control, like in mysql.
- Ability to reset the state of an open backend, including aborting open
transaction to allow for better connection pooling and reusing, maybe
giving the client the ability to switch between users...
Bye!
--
Daniele Orlandi
Planet Srl
Wow Hannu - your list puts mine to shame!
"Application server support"
----------------------------
* better XML integration- XML(*) aggregate function returning XML representation of subquery
- XML input/output to/from tables
- XML searchable/indexable in fields)
I've had thoughts about XML too. Since XML is hierachical, imagine being
able to index xml using contrib/ltree or something!
ie. We create a new 'xml' column type.
We create a new indexing scheme for it based on ltree & gist.
You index the xml column.
Then you can do sort of XPath queries:
SELECT * FROM requests WHERE xml_xpath('/request/owner/name', datafield) =
'Bob';
And it would be indexed. Imaging being able to pull up all XML documents
that had certain properties, etc.
MS-SQL has a SELECT ... FOR XML clause, but we could always just create
function called xml_select() or something now that we can return recordsets.
Chris
At 10:51 AM 29/11/2002 -0800, Christopher Kings-Lynne wrote:
* pg_dump dependency ordering
I've actually started working on pg_dump in the background, but if you want
to do it let me know.
In terms of things I would like to see:
- background/integrated vacuum (not just an overwriting storage manager)
- insert/update...returning
- function result caches -- assuming I can demonstrate that they are a Good
Thing.
- COPY TO/FROM with a list of columns (maybe we have it?) - it's useful for
making metadata changes then reloading data (Inserts are much slower).
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
I've actually started working on pg_dump in the background, but if you
want
to do it let me know.
Nah - you can do it! I don't have much time to code Postgres as it is.
I'll stick to working on making ADD COLUMN compliant.
In terms of things I would like to see:
- background/integrated vacuum (not just an overwriting storage manager)
Work has started on this already, which is neat.
- insert/update...returning
That would be so useful for our web app where we're always needing to get
the primary key as opposed to oid. We keep needing to requery.
- function result caches -- assuming I can demonstrate that they are a
Good
Thing.
- COPY TO/FROM with a list of columns (maybe we have it?) - it's useful
for
making metadata changes then reloading data (Inserts are much slower).
We do already have it in 7.3:
http://developer.postgresql.org/docs/postgres/sql-copy.html
Chris
At 04:56 PM 29/11/2002 -0800, Christopher Kings-Lynne wrote:
- COPY TO/FROM with a list of columns (maybe we have it?) - it's useful
for
making metadata changes then reloading data (Inserts are much slower).
We do already have it in 7.3:
Excellent. Then I just need to add support in pg_dump.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
- COPY TO/FROM with a list of columns (maybe we have it?) - it's
useful
for
making metadata changes then reloading data (Inserts are much slower).
We do already have it in 7.3:
Excellent. Then I just need to add support in pg_dump.
Hmmm. I could have sworn that someone (Neil?) already did that?
Chris
Comment in {identifier} section in src/backend/parser/scan.l states:
[...]
* Note: here we use a locale-dependent case conversion,
* which seems appropriate under SQL99 rules, whereas
* the keyword comparison was NOT locale-dependent.
*/
And in ScanKeywordLookup() in src/backend/parser/keywords.c:
/*
* Apply an ASCII-only downcasing. We must not use tolower()
since it
* may produce the wrong translation in some locales (eg, Turkish),
* and we don't trust isupper() very much either. In an ASCII-based
* encoding the tests against A and Z are sufficient, but we also
* check isupper() so that we will work correctly under EBCDIC. The
* actual case conversion step should work for either ASCII or
EBCDIC.
*/
And I happen to have bad luck to use PostgreSQL with Turkish locale. And, as
you
may know our "I" is not your "I":
pgsql=# create table a(x char(1));
CREATE TABLE
pgsql=# grant SELECT ON a to PUBLIC;
ERROR: user "public" does not exist
pgsql=#
Oracle, the second best database I have does seem to convert relation names
in
locale-dependent fassion:
SQL> alter session set NLS_LANGUAGE='TURKISH';
Session altered.
SQL> create table a(x char(1));
Table created.
SQL> grant select on a to PUBLIC;
Grant succeeded.
Further, if I try to create a table in oracle using Turkish-specific
characters,
it is creating it alright, without trying to make them upper-case as it
usually does.
So I have changed lower-case conversion code in scan.l to make it purely
ASCII-based
as in keywords.c. Mini-patch is given below. Please bear in mind that it is
my first
attempt at hacking PostgreSQL code, so there can be some mistakes.
Regards,
Nick
diff -Nur src/backend/parser/scan.l.orig src/backend/parser/scan.l
--- src/backend/parser/scan.l.orig Sat Nov 30 02:54:06 2002
+++ src/backend/parser/scan.l Sat Nov 30 02:57:45 2002
@@ -551,9 +551,12 @@
ident = pstrdup(yytext);
for (i = 0; ident[i]; i++)
{
- if (isupper((unsigned char)
ident[i]))
- ident[i] =
tolower((unsigned char) ident[i]);
+ char ch =
ident[i];
+ if (ch >= 'A' && ch <= 'Z'
&& isupper((unsigned char) ch))
+ ch += 'a' - 'A';
+ ident[i] = ch;
}
+ ident[i] = '\0';
if (i >= NAMEDATALEN)
{
int len;
At 05:33 PM 29/11/2002 -0800, Christopher Kings-Lynne wrote:
Hmmm. I could have sworn that someone (Neil?) already did that?
Not AFAICT - at least based on looking at the manual. I'll check the code.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
"Nicolai Tufar" <ntufar@apb.com.tr> writes:
So I have changed lower-case conversion code in scan.l to make it purely
ASCII-based.
as in keywords.c. Mini-patch is given below.
Rather than offering a patch, you need to convince us why our reading of
the SQL standard is wrong. ("Oracle does it that way" is not an
argument that will carry a lot of weight.)
SQL99 states that identifier case conversions are done on the basis of
the Unicode upper/lower case equivalences, so it seems clear that they
intend more than ASCII-only conversion for identifiers. Locale-based
conversion might not be an exact implementation of the spec, but it's
surely closer than ASCII-only.
regards, tom lane
On Fri, Nov 29, 2002 at 10:51:26AM -0800, Christopher Kings-Lynne wrote:
Just out of interest, if someone was going to pay you to hack on Postgres
for 6 months, what would you like to code for 7.4?
Well, nobody is paying me, but I want to
- fix the btree problem leaking unused pages (I think I'm getting near,
I just haven't had free time during the last month). This one is a
must to me.
- try different regexp algorithms, compare efficiency. Both Henry
Spencer's new code for Tcl, and Baeza-Navarro shift-or approach (can
be much faster than "traditional" regex engines)
(do people care for allowing "search with errors", similar to what
agrep and nrgrep do?)
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Nunca confiar� en un traidor. Ni siquiera si el traidor lo he creado yo"
(Bar�n Vladimir Harkonnen)
By no means I would try to convince that your reading of
the SQL standards is wrong. What I am trying to tell is
that Turkish alphabet is broken beyond repair. And since
there is absolutely no way to change our alphabet, we
may can code a workaround in the code.
So i do not claim that your code is wrong. It is
behaviang according to specification. But unfortunately
folks at SQL99 probably were not aware of the woes
of Turkish "I".
The very special case of letter "I" in Turkish is not
only PostgreSQL's problem. Many java programs have
failed miserably trying to open files with "I"s in
pathnames.
So basically, there are two letters "I" in Trukish.
The wone is with dot on top and another is without.
The with dot on top walways has the dot and the one
without never has it. Simple. The problem is
with the standard Latin "I". So why small "i" does
have a dot and capital "I" does not?
Standard conversion is
Lower: "I" -> "y'" and "Y'" -> "i".
Upper: "y'" -> "I" and "i" -> "Y'".
(font may not be displayed correctly in your mail reader)
Historically programs that operate in Turkish locale have
chosen to hardcode the capitalisation of "i" in system
messages and identifier names like this:
Lower: "I" -> "i" and "Y'" -> "i".
Upper: "y'" -> "I" and "i" -> "I".
With this, no matter what kind of "I" you used in names,
it is always going to end up a valid ASCII character.
Would it be acceptable if I submit a path that applies this
special logic in src/backend/parser/scan.l if the locale is "tr_TR"?
Because for many folks setting locale to Turkish would
render their database unusable. For, god forbid, if your
sql has a column name written in capitlas including "I".
It is not working. So I deeply believe that PostgreSQL community
have to provide a workaround for this problem.
So what should I do?
Best regards,
Nick
Tom Lane wrote:
Show quoted text
"Nicolai Tufar" <ntufar@apb.com.tr> writes:
So I have changed lower-case conversion code in scan.l to make it purely
ASCII-based.
as in keywords.c. Mini-patch is given below.Rather than offering a patch, you need to convince us why our reading of
the SQL standard is wrong. ("Oracle does it that way" is not an
argument that will carry a lot of weight.)SQL99 states that identifier case conversions are done on the basis of
the Unicode upper/lower case equivalences, so it seems clear that they
intend more than ASCII-only conversion for identifiers. Locale-based
conversion might not be an exact implementation of the spec, but it's
surely closer than ASCII-only.regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
On Fri, 2002-11-29 at 18:06, Daniele Orlandi wrote:
- Better granularity of security and access control, like in mysql.
Can you be more specific on exactly what features you'd like to see?
- Ability to reset the state of an open backend, including aborting open
transaction to allow for better connection pooling and reusing
IIRC, it's been suggested that we can implement this by passing back the
transaction state as part of the FE/BE protocol -- if we're doing a
protocol change for 7.4, this could be part of it.
Cheers,
Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
On Fri, 2002-11-29 at 13:51, Christopher Kings-Lynne wrote:
Just out of interest, if someone was going to pay you to hack on Postgres
for 6 months, what would you like to code for 7.4?
Here's some of my current TODO list:
* FOR EACH STATEMENT triggers (already done)
* column lists for UPDATE triggers (will be done soon)
* Improve the buffer manager's replacement algorithm (LRU-K, perhaps?)
* Implement support for hugetlb pages on linux 2.5
* Finish off PITR (if no one else does ...)
* Asynchronous notification improvements (optional message, store
notifications in shared memory)
* Rowtype assignment in PL/PgSQL
Cheers,
Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
What I'd like to have in future versions of PostgreSQL:
- replication, replication, ... (you have seen that before). i guess
most people would like to see that.
- a dblink like system for connecting to remote database systems
(not just PostgreSQL???)
something like CREATE REMOTE VIEW would be damn good.
it would solve many problem when it comes to migration
- tablespaces (the directory based stuff which has been discussed on
the list)
- somebody has mentioned XML before. the XPath stuff would be really
cool
- PL/Sh should be in contrib. i know that the core team has decided
not to put it in the core but contrib would be fine (I keep forgetting
the URL of Peters website :( ...)
- packages: is there a way to define a set of functions as a package
so that they can be removed using just one DROP PACKAGE or so? would be
nice for huge projects
- urgent: being able to use PL/Perl in combination with SPI (There
is a Pg-SPI but it is 0.01 - see
http://search.cpan.org/author/APILOS/DBD-PgSPI-0.01/PgSPI.pm). a full
and reliable implementation would be fine.
- preforking for faster startup
- declare MySQL as evil *g*.
Thanks a lot :)
Hans
Oops, there is something I have forgotten:
- "nicing" backends: this would be nice for administration tasks
- CREATE DATABASE ... WITH MAXSIZE (many providers would like to see
that; quotas are painful in this case - especially when porting the
database to a different or a second server)
Hans
My list is;
Point to Point and Broadcast replication
----------------------------------------
With point to point you specify multiple endpoints, with broadcast you can
specify a subnet address and the updates are broadcast over that subnet.
The difference being that point to point works well for cross network
replication, or where you have a few replicants. I have multiple database
servers which could have a deadicated class C network that they are all on,
by broadcasting updates you can cutdown the amount of traffic on that net by
a factor of n minus 1 (where n is the number of servers involved).
Ability to use raw partitions
----------------------------
I've not seen an install of PostgreSQL yet that didn't put the database
files onto a filesystem, so I'm assuming it's the only way of doing it. By
using the filesystem the files are at the mercy of filesystem handler code
as to where they end up on the disk, and thus the speed of access will
always have some dependancy on the speed of the filesystem.
With a raw partition it would be possible to use two devices (e.g. /dev/hde
and /dev/hdg on an eight channel ide linux box), and PostgreSQL could then
ensure the WALs were located on one the disk with the entries running
sequentally, and that the database files were located on the other disk in
the most appropriate location (e.g. index data starting near the center of
the disk, and user table data starting near the outside).
Win32 Port
------------
I've explained the reasons before. Apart from that it's always useful to
open PostgreSQL up to a larger audience.
----- Original Message -----
From: "Daniele Orlandi" <daniele@orlandi.com>
To: <pgsql-hackers@postgresql.org>
Sent: Friday, November 29, 2002 11:06 PM
Subject: [mail] Re: [HACKERS] 7.4 Wishlist
Christopher Kings-Lynne wrote:
Hi guys,
Just out of interest, if someone was going to pay you to hack on
Postgres
Show quoted text
for 6 months, what would you like to code for 7.4?
Replication. Replication. Replication. Replication. Replication.
Replication. Replication. Replication. Replication. Replication.
Replication. Replication. Replication. Replication. Replication.Well, jokes apart, I think this is one of the most needed features to
me. Currently I'm using strange voodoo to replicate some tables on other
machines in order to spread load and resilency. Compared to what I am
doing now a good master to slave replication would be heaven.I understand that a good replication is painful but in my experience, if
you start by integrating some rude, experimental implementation in the
mainstream PostgreSQL the rest will come by itself.For example, RI was something I wouldn't consider "production level" in
7.2, but was a start, now in 7.3 is much much better, probably complete
in the most important parts.Other wishes (not as important as the replication issue) are:
- Better granularity of security and access control, like in mysql.
- Ability to reset the state of an open backend, including aborting open
transaction to allow for better connection pooling and reusing, maybe
giving the client the ability to switch between users...Bye!
--
Daniele Orlandi
Planet Srl---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
And that's all ;)
----------------
Hannu Krosing
- and what will you do after January? ;-)
Just kidding. I hope you have a big fat bank account if you want to finish
all that!
--
Kaare Rasmussen --Linux, spil,-- Tlf: 3816 2582
Kaki Data tshirts, merchandize Fax: 3816 2501
Howitzvej 75 �ben 12.00-18.00 Email: kar@kakidata.dk
2000 Frederiksberg L�rdag 12.00-16.00 Web: www.suse.dk