Google SoC--Idea Request

Started by Jonah H. Harris, over 19 years ago, 53 messages
#1Jonah H. Harris
jonah.harris@gmail.com

Hey everyone,

I know we started a discussion a month or so ago regarding ideas for
SoC projects. However, after reading through the thread, I didn't see
us nail down any actual items.

As such, we need to quickly put together a list of oh, 15-20 midlevel
project ideas. I'm sure we can pull some off the TODO list, but we
should also look at project ideas for porting some of the most used
third-party OSS software to PostgreSQL too (portals, CMS systems,
accounting systems, etc.).

All ideas welcome!

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#2Dave Page
dpage@vale-housing.co.uk
In reply to: Jonah H. Harris (#1)
Re: Google SoC--Idea Request

-----Original Message-----
From: "Jonah H. Harris"<jonah.harris@gmail.com>
Sent: 15/04/06 20:06:27
To: "Pgsql Hackers"<pgsql-hackers@postgresql.org>
Subject: [HACKERS] Google SoC--Idea Request

As such, we need to quickly put together a list of oh, 15-20 midlevel
project ideas.

There's a couple of listen/notify todos iirc that would be nice to get done - one to allow a message to be sent with the notify, and one to move from a table based design to shared mem/disk.

Regards, Dave

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#3Neil Conway
neilc@samurai.com
In reply to: Dave Page (#2)
Re: Google SoC--Idea Request

On Sat, 2006-04-15 at 21:24 +0100, Dave Page wrote:

one to allow a message to be sent with the notify, and one to move
from a table based design to shared mem/disk.

Doing the latter is a precondition for implementing the former in a
reasonable way, I believe.
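
As a rough picture of why the queue comes first: once notifications live in a fixed-size in-memory queue rather than in table rows, carrying a message is just one more field per slot. The sketch below is purely illustrative. The type names, function names, and sizes are invented here, there is no locking or shared-memory setup, and it is not how PostgreSQL implements (or would implement) NOTIFY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* All sizes and names are assumptions made for this sketch. */
#define QUEUE_SLOTS 8
#define NAME_LEN    32
#define PAYLOAD_LEN 64

typedef struct Notification
{
    char channel[NAME_LEN];     /* what LISTEN would match on */
    char payload[PAYLOAD_LEN];  /* the extra message sent with the notify */
} Notification;

typedef struct NotifyQueue
{
    Notification slots[QUEUE_SLOTS];
    size_t       head;   /* index of the oldest entry */
    size_t       count;  /* number of entries currently queued */
} NotifyQueue;

/* Append a notification; fails if the queue is full or a string is too long. */
bool notify_push(NotifyQueue *q, const char *channel, const char *payload)
{
    assert(q != NULL);
    if (q->count == QUEUE_SLOTS)
        return false;
    if (strlen(channel) >= NAME_LEN || strlen(payload) >= PAYLOAD_LEN)
        return false;

    Notification *slot = &q->slots[(q->head + q->count) % QUEUE_SLOTS];
    strcpy(slot->channel, channel);   /* lengths were checked above */
    strcpy(slot->payload, payload);
    q->count++;
    return true;
}

/* Remove the oldest notification into *out; fails if the queue is empty. */
bool notify_pop(NotifyQueue *q, Notification *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    return true;
}
```

A real design would also have to decide what happens when the queue fills up, which is presumably where the "disk" half of "shared mem/disk" comes in.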

BTW, these two web log entries summarizing Mono and Mozilla's
experiences with SoC might make interesting reading:

http://weblogs.mozillazine.org/gerv/archives/2006/03/summer_of_code_six_months_on.html
http://tirania.org/blog/archive/2006/Apr-13.html

we should also look at project ideas for porting some of the most used
third-party OSS software to PostgreSQL too (portals, CMS systems,
accounting systems, etc.).

Given the above, I would be wary of such projects bit-rotting. If the
upstream project hasn't bothered to add PostgreSQL support, there might
be a good reason why: writing truly database-agnostic applications is
not always easy (or even desirable).

-Neil

#4Jonah H. Harris
jonah.harris@gmail.com
In reply to: Neil Conway (#3)
Re: Google SoC--Idea Request

On 4/15/06, Neil Conway <neilc@samurai.com> wrote:

Doing the latter is a precondition for implementing the former in a
reasonable way, I believe.

BTW, these two web log entries summarizing Mono and Mozilla's
experiences with SoC might make interesting reading:

Thanks for the reading material. I don't think our project is exactly
the same, but it's good information to keep in mind.

Given the above, I would be wary of such projects bit-rotting. If the
upstream project hasn't bothered to add PostgreSQL support, there might
be a good reason why: writing truly database-agnostic applications is
not always easy (or even desirable).

This isn't always the case. In a lot of cases, the developers just
wanted to take the easy route and used MySQL... they have a lot of
people asking for PostgreSQL support but they don't have the expertise
to add it themselves.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#5Robert Treat
xzilla@users.sourceforge.net
In reply to: Jonah H. Harris (#4)
Re: Google SoC--Idea Request

On Saturday 15 April 2006 19:25, Jonah H. Harris wrote:

On 4/15/06, Neil Conway <neilc@samurai.com> wrote:

Doing the latter is a precondition for implementing the former in a
reasonable way, I believe.

BTW, these two web log entries summarizing Mono and Mozilla's
experiences with SoC might make interesting reading:

Thanks for the reading material. I don't think our project is exactly
the same, but it's good information to keep in mind.

Agreed. I sent some ideas to Josh, was thinking he might be posting a list
soon. I kept it aimed at a few ideas I have had/seen that need an initial
push to get going but beyond that could be (and likely would be) community
maintained. Example? Extending the build farm code to test external pl
langs or database drivers or patches or other modules. We've talked about it,
and if someone had the time to make the push, I believe this would be
community maintained going forward.

Given the above, I would be wary of such projects bit-rotting. If the
upstream project hasn't bothered to add PostgreSQL support, there might
be a good reason why: writing truly database-agnostic applications is
not always easy (or even desirable).

This isn't always the case. In a lot of cases, the developers just
wanted to take the easy route and used MySQL... they have a lot of
people asking for PostgreSQL support but they don't have the expertise
to add it themselves.

I think more important is that the time needed to do an initial port is
often much greater than it is to maintain a port.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#6Dave Page
dpage@vale-housing.co.uk
In reply to: Robert Treat (#5)
Re: Google SoC--Idea Request

-----Original Message-----
From: "Jonah H. Harris"<jonah.harris@gmail.com>
Sent: 15/04/06 20:06:27
To: "Pgsql Hackers"<pgsql-hackers@postgresql.org>
Subject: [HACKERS] Google SoC--Idea Request

As such, we need to quickly put together a list of oh, 15-20 midlevel
project ideas.

Another thought - a nice C++ project, requiring minimal previous knowledge of existing code would be to add a query builder to pgAdmin.

Regards, Dave

#7Ned Lilly
ned@nedscape.com
In reply to: Jonah H. Harris (#1)
Re: Google SoC--Idea Request

OpenMFG has done some work on getting PostgreSQL working with the Drupal CMS and the Mantis bugtracker (and also integrating those two, btw). We're in contact with the respective projects about getting our patches worked in, but if anyone's keeping a tally, just wanted you to be aware.

Regards,
Ned

#8Stephen Frost
sfrost@snowman.net
In reply to: Jonah H. Harris (#1)
Re: Google SoC--Idea Request

* Jonah H. Harris (jonah.harris@gmail.com) wrote:

I know we started a discussion a month or so ago regarding ideas for
SoC projects. However, after reading through the thread, I didn't see
us nail down any actual items.

I got an email already for a good idea, actually, which is to work on
having pg_hba.conf modifiable from SQL. The only problem with that is
that it really needs to be done in an acceptable way which requires
probably as much design work as actual programming. Another idea along
those same lines would be having .k5login-style support for Kerberos.
We'd need a conf-flag for that for backwards compatibility (once the
.k5login-style support exists we should clean up our Kerberos
credentials matching to, for example, not accept 'sfrost/root' for
'sfrost' or 'sfrost@ABC.COM' for 'sfrost@XYZ.com').
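
To make the stricter matching concrete, here is a minimal sketch of the check described above. The function name is hypothetical and this is not PostgreSQL's actual Kerberos code; it assumes principals of the plain form user@REALM and ignores escaped characters in principal names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: accept a principal only if it is exactly
 * "<user>@<realm>". An instance component such as "sfrost/root", or a
 * different realm, is rejected. The realm comparison is case-sensitive.
 */
bool principal_matches(const char *principal, const char *user, const char *realm)
{
    assert(principal != NULL && user != NULL && realm != NULL);

    size_t user_len = strlen(user);

    /* The principal must start with the user name ... */
    if (strncmp(principal, user, user_len) != 0)
        return false;

    /* ... followed immediately by '@', so "sfrost/root@..." stops here. */
    if (principal[user_len] != '@')
        return false;

    /* Whatever follows the '@' must be exactly the expected realm. */
    return strcmp(principal + user_len + 1, realm) == 0;
}
```

With this, principal_matches("sfrost/root@XYZ.COM", "sfrost", "XYZ.COM") and principal_matches("sfrost@ABC.COM", "sfrost", "XYZ.COM") both return false, while "sfrost@XYZ.COM" is accepted.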

It'd also be nice to support SASL, and better hashes than md5.

Thanks,

Stephen

#9Jim C. Nasby
jnasby@pervasive.com
In reply to: Jonah H. Harris (#1)
Re: Google SoC--Idea Request

On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:

All ideas welcome!

I know it's not directly PostgreSQL related, but I'd love to see the
dbt* code improved. Items on my wish-list:

- make it easy to run the test framework and clients on a separate
machine from the database server
- keep results in a database
- provide a front-end to allow users to schedule tests in a queue
- add support for windows, at least for the database (theoretically
possible to run that way now, but you have to do everything by hand)

Another idea: afaik, spikesource is still offering a bounty for
improvements to OSS test suites, something that'd fit well with SoC.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#10Mark Wong
markw@osdl.org
In reply to: Jim C. Nasby (#9)
Re: Google SoC--Idea Request

Jim C. Nasby wrote:

On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:

All ideas welcome!

I know it's not directly PostgreSQL related, but I'd love to see the
dbt* code improved. Items on my wish-list:

- make it easy to run the test framework and clients on a separate
machine from the database server
- keep results in a database
- provide a front-end to allow users to schedule tests in a queue
- add support for windows, at least for the database (theoretically
possible to run that way now, but you have to do everything by hand)

Another idea: afaik, spikesource is still offering a bounty for
improvements to OSS test suites, something that'd fit well with SoC.

I second this. :) There are also the TPC-App (Java) fair-use
implementation that I've started and the TPC-E (next gen OLTP) that I
would like to start.

Mark

#11Jim C. Nasby
jnasby@pervasive.com
In reply to: Mark Wong (#10)
Re: Google SoC--Idea Request

On Tue, Apr 18, 2006 at 11:27:40AM -0700, Mark Wong wrote:

Jim C. Nasby wrote:

On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:

All ideas welcome!

I know it's not directly PostgreSQL related, but I'd love to see the
dbt* code improved. Items on my wish-list:

- make it easy to run the test framework and clients on a separate
machine from the database server
- keep results in a database
- provide a front-end to allow users to schedule tests in a queue
- add support for windows, at least for the database (theoretically
possible to run that way now, but you have to do everything by hand)

Another idea: afaik, spikesource is still offering a bounty for
improvements to OSS test suites, something that'd fit well with SoC.

I second this. :) There are also the TPC-App (Java) fair-use
implementation that I've started and the TPC-E (next gen OLTP) that I
would like to start.

Maybe before starting on TPC-E it makes sense to try and get a common
framework for all the different tests built? AFAIK most of the
benchmarks all use a fairly standard client-server infrastructure, so we
should hopefully be able to share that between the different tests...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#12Jonah H. Harris
jonah.harris@gmail.com
In reply to: Jim C. Nasby (#11)
Re: Google SoC--Idea Request

On 4/18/06, Jim C. Nasby <jnasby@pervasive.com> wrote:

On Tue, Apr 18, 2006 at 11:27:40AM -0700, Mark Wong wrote:

Jim C. Nasby wrote:

On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:

All ideas welcome!

I know it's not directly PostgreSQL related, but I'd love to see the
dbt* code improved. Items on my wish-list:

- make it easy to run the test framework and clients on a separate
machine from the database server
- keep results in a database
- provide a front-end to allow users to schedule tests in a queue
- add support for windows, at least for the database (theoretically
possible to run that way now, but you have to do everything by hand)

Another idea: afaik, spikesource is still offering a bounty for
improvements to OSS test suites, something that'd fit well with SoC.

I second this. :) There are also the TPC-App (Java) fair-use
implementation that I've started and the TPC-E (next gen OLTP) that I
would like to start.

Maybe before starting on TPC-E it makes sense to try and get a common
framework for all the different tests built? AFAIK most of the
benchmarks all use a fairly standard client-server infrastructure, so we
should hopefully be able to share that between the different tests...

I agree with Jim. A framework would really help out here. All of the
tests are basically the same and would benefit from a framework.

However, Mark, do you think Java is a reliable benchmarking platform?
At EnterpriseDB, we've tried several Java benchmarks and could never
get as repeatable or reliable of a benchmark as DBT2 gives you.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#13Mark Wong
markw@osdl.org
In reply to: Jonah H. Harris (#12)
Re: Google SoC--Idea Request

Jonah H. Harris wrote:

On 4/18/06, Jim C. Nasby <jnasby@pervasive.com> wrote:

On Tue, Apr 18, 2006 at 11:27:40AM -0700, Mark Wong wrote:

Jim C. Nasby wrote:

On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:

All ideas welcome!

I know it's not directly PostgreSQL related, but I'd love to see the
dbt* code improved. Items on my wish-list:

- make it easy to run the test framework and clients on a separate
machine from the database server
- keep results in a database
- provide a front-end to allow users to schedule tests in a queue
- add support for windows, at least for the database (theoretically
possible to run that way now, but you have to do everything by hand)

Another idea: afaik, spikesource is still offering a bounty for
improvements to OSS test suites, something that'd fit well with SoC.

I second this. :) There are also the TPC-App (Java) fair-use
implementation that I've started and the TPC-E (next gen OLTP) that I
would like to start.

Maybe before starting on TPC-E it makes sense to try and get a common
framework for all the different tests built? AFAIK most of the
benchmarks all use a fairly standard client-server infrastructure, so we
should hopefully be able to share that between the different tests...

I agree with Jim. A framework would really help out here. All of the
tests are basically the same and would benefit from a framework.

This has crossed my mind before. I haven't been able to come up with
something that I've felt good about on my own though.

However, Mark, do you think Java is a reliable benchmarking platform?
At EnterpriseDB, we've tried several Java benchmarks and could never
get as repeatable or reliable of a benchmark as DBT2 gives you.

I don't have much experience here yet. I've only got a portion of the
TPC-App implemented, although probably enough now to see how repeatable
it is thus far. Do you want to give my DBT4 kit a shot? :) I'm curious
as to what platforms you've tried Java on, as I've heard the Linux
implementations aren't as good as their Windows counterparts. I'm not
sure how true that is today though.

Mark

#14John DeSoi
desoi@pgedit.com
In reply to: Mark Wong (#10)
Re: Google SoC--Idea Request

Proposed item: Improve PL/PHP support, especially installation on
non-Linux platforms. PL/PHP does not currently work on OS X (not sure
about Windows, but I doubt it).

Alvaro indicated he would be willing to provide direction on this
with testing support from me. He also said there are several other
possible PL/PHP issues that would warrant a SoC project.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

#15Jonah H. Harris
jonah.harris@gmail.com
In reply to: John DeSoi (#14)
Re: Google SoC--Idea Request

On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:

Alvaro indicated he would be willing to provide direction on this
with testing support from me. He also said there are several other
possible PL/PHP issues that would warrant a SoC project.

Cool... let's get 'em all listed here so we can move forward.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#16Joshua D. Drake
jd@commandprompt.com
In reply to: John DeSoi (#14)
Re: Google SoC--Idea Request

John DeSoi wrote:

Proposed item: Improve PL/PHP support, especially installation on
non-Linux platforms. PL/PHP does not currently work on OS X (not sure
about Windows, but I doubt it).

It definitely does NOT work on Windows. MacOSX is just a matter of us
having some time.

Alvaro indicated he would be willing to provide direction on this with
testing support from me. He also said there are several other possible
PL/PHP issues that would warrant a SoC project.

Well my number one issue is the build process which needs to be cleaned
up but there are other more technical issues to be resolved as well.

Joshua D. Drake

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#17Alvaro Herrera
alvherre@commandprompt.com
In reply to: Joshua D. Drake (#16)
Re: Google SoC--Idea Request

Joshua D. Drake wrote:

John DeSoi wrote:

Proposed item: Improve PL/PHP support, especially installation on
non-Linux platforms. PL/PHP does not currently work on OS X (not sure
about Windows, but I doubt it).

It definitely does NOT work on Windows. MacOSX is just a matter of us
having some time.

Alvaro indicated he would be willing to provide direction on this with
testing support from me. He also said there are several other possible
PL/PHP issues that would warrant a SoC project.

Well my number one issue is the build process which needs to be cleaned
up but there are other more technical issues to be resolved as well.

Yeah, there are also a number of possible improvements documented as
tickets in the Trac site and others that currently exist only as very
vague noise in my head.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#18Robert Treat
xzilla@users.sourceforge.net
In reply to: Jonah H. Harris (#15)
Re: Google SoC--Idea Request

On Wednesday 19 April 2006 12:09, Jonah H. Harris wrote:

On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:

Alvaro indicated he would be willing to provide direction on this
with testing support from me. He also said there are several other
possible PL/PHP issues that would warrant a SoC project.

Cool... let's get 'em all listed here so we can move forward.

I think Martijn van Oosterhout's nearby email on Coverity bug reports might make a
good SoC project, but should it also be added to the TODO list?

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#19Martijn van Oosterhout
kleptog@svana.org
In reply to: Robert Treat (#18)
Re: Google SoC--Idea Request

On Thu, Apr 20, 2006 at 08:51:25AM -0400, Robert Treat wrote:

On Wednesday 19 April 2006 12:09, Jonah H. Harris wrote:

On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:

Alvaro indicated he would be willing to provide direction on this
with testing support from me. He also said there are several other
possible PL/PHP issues that would warrant a SoC project.

Cool... let's get 'em all listed here so we can move forward.

I think Martijn van Oosterhout's nearby email on Coverity bug reports might make a
good SoC project, but should it also be added to the TODO list?

Nice idea, though it would be much more useful if the reports could be
exported en-masse. There's an export function but it only exports the
user comments, not the error itself. So unless people signup there's no
easy way to get the info to people. :(

In any case, after you weed out the false-positives and exclude ECPG
you're only talking about less than 50 issues that may need to be
addressed. Hardly a project that will take any amount of time.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#19)
Re: Google SoC--Idea Request

Martijn van Oosterhout <kleptog@svana.org> writes:

On Thu, Apr 20, 2006 at 08:51:25AM -0400, Robert Treat wrote:

I think Martijn van Oosterhout's nearby email on Coverity bug reports might make a
good SoC project, but should it also be added to the TODO list?

...
In any case, after you weed out the false-positives and exclude ECPG
you're only talking about less than 50 issues that may need to be
addressed. Hardly a project that will take any amount of time.

Nor one we'd be willing to wait till the summer to address, if any of
the bugs are real.

regards, tom lane

#21Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#20)
Re: Google SoC--Idea Request

On Thu, Apr 20, 2006 at 11:04:31AM -0400, Tom Lane wrote:

Martijn van Oosterhout <kleptog@svana.org> writes:

On Thu, Apr 20, 2006 at 08:51:25AM -0400, Robert Treat wrote:

I think Martijn van Oosterhout's nearby email on Coverity bug reports might make a
good SoC project, but should it also be added to the TODO list?

...
In any case, after you weed out the false-positives and exclude ECPG
you're only talking about less than 50 issues that may need to be
addressed. Hardly a project that will take any amount of time.

Nor one we'd be willing to wait till the summer to address, if any of
the bugs are real.

Most of the stuff remaining is memory leaks in the src/bin directories,
and ECPG. The memory leaks are not important there (initdb leaks like a
sieve in many places).

About the only thing in the backend I found interesting was this:

src/backend/utils/hash/dynahash.c function hash_create

The numbers are line numbers. Somewhat squished version, hope I didn't
miss anything.

185 if( flags & HASH_SHARED_MEM) {
193 hashp->hcxt = NULL;
197 if (flags & HASH_ATTACH)
198 return hashp;
199 }
256 if (!init_htab(hashp, nelem))
257 {
258 hash_destroy(hashp);

hash_destroy dereferences hashp->hcxt. I don't see anything in
init_htab that special-cases shared memory hashes. The only way this
could be avoided is if HASH_SHARED_MEM was always combined with
HASH_ATTACH. But if so, why the test?
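
The control flow in question can be mocked up outside the backend. This is not the real dynahash code: the types, the flag values, and the init_ok parameter (standing in for the result of init_htab) are all invented for illustration, and the NULL guard in the destroy function is only one conceivable mitigation, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Flag values are made up; only the names echo dynahash. */
#define MOCK_HASH_SHARED_MEM 0x01
#define MOCK_HASH_ATTACH     0x02

typedef struct MockContext
{
    size_t allocated_bytes;
} MockContext;

typedef struct MockHTAB
{
    MockContext *hcxt;   /* private context; NULL for shared-memory tables */
} MockHTAB;

/* Guarded cleanup: returns false when there was no private context to free. */
bool mock_hash_destroy(MockHTAB *hashp)
{
    bool had_context = false;

    assert(hashp != NULL);
    if (hashp->hcxt != NULL)   /* without this test the next line would crash */
    {
        hashp->hcxt->allocated_bytes = 0;
        free(hashp->hcxt);
        had_context = true;
    }
    free(hashp);
    return had_context;
}

/* Same shape as the squished listing; returns NULL if initialisation fails. */
MockHTAB *mock_hash_create(int flags, bool init_ok)
{
    MockHTAB *hashp = calloc(1, sizeof(MockHTAB));

    if (hashp == NULL)
        return NULL;

    if (flags & MOCK_HASH_SHARED_MEM)
    {
        hashp->hcxt = NULL;
        if (flags & MOCK_HASH_ATTACH)
            return hashp;
    }
    else
    {
        hashp->hcxt = calloc(1, sizeof(MockContext));
        if (hashp->hcxt == NULL)
        {
            free(hashp);
            return NULL;
        }
    }

    if (!init_ok)
    {
        /* Reached with hcxt == NULL for SHARED_MEM without ATTACH. */
        mock_hash_destroy(hashp);
        return NULL;
    }
    return hashp;
}
```

Calling mock_hash_create(MOCK_HASH_SHARED_MEM, false) walks exactly the path flagged above: the failure branch is reached with a NULL hcxt, and only the guard keeps the cleanup from dereferencing it.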

The only other thing we could do, if we were prepared to annotate the
source, is maybe teach it about our locking stuff and have it check
that. But I don't think that's suitable for mainline, more someone's
private tree...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#21)
Re: Google SoC--Idea Request

Martijn van Oosterhout <kleptog@svana.org> writes:

About the only thing in the backend I found interesting was this:
src/backend/utils/hash/dynahash.c function hash_create

I wonder if we shouldn't just remove the hash_destroy calls in
hash_create's failure paths. hash_destroy is explicitly not gonna
work on a shared-memory hashtable, and in all other cases I'd expect
that any already-allocated table structure will be in a palloc context
that will get cleaned up during error recovery.

regards, tom lane

#23Jim C. Nasby
jnasby@pervasive.com
In reply to: Robert Treat (#18)
Re: Google SoC--Idea Request

Another idea; add the ability for buildfarm machines to do a pgbench run
to stress-test the code. Such a test would probably have found the
windows pgbench issue I reported some time ago.

This would have to be optional, as not all buildfarm machines/owners
would tolerate the benchmark.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#24Martijn van Oosterhout
kleptog@svana.org
In reply to: Jonah H. Harris (#1)
Re: Google SoC--Idea Request

On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:

Hey everyone,

I know we started a discussion a month or so ago regarding ideas for
SoC projects. However, after reading through the thread, I didn't see
us nail down any actual items.

Here's an idea: Get the ECPG test programs into a state that they can
be integrated into the regression tests.

There are programs already but you can't easily run them, no schema...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#25Christopher Kings-Lynne
chris.kings-lynne@calorieking.com
In reply to: Robert Treat (#18)
Re: Google SoC--Idea Request

I think Martijn van Oosterhout's nearby email on Coverity bug reports might make a
good SoC project, but should it also be added to the TODO list?

I may as well put up phpPgAdmin for it. We have plenty of projects
available in phpPgAdmin...

Chris

#26Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Christopher Kings-Lynne (#25)
Re: Google SoC--Idea Request

Christopher Kings-Lynne wrote:

I think Martijn van Oosterhout's nearby email on Coverity bug reports might
make a good SoC project, but should it also be added to the TODO list?

I may as well put up phpPgAdmin for it. We have plenty of projects
available in phpPgAdmin...

Same with pgAdmin3.

Regards,
Andreas

#27Jim C. Nasby
jnasby@pervasive.com
In reply to: Andreas Pflug (#26)
Re: Google SoC--Idea Request

On Fri, Apr 21, 2006 at 10:27:48AM +0200, Andreas Pflug wrote:

Christopher Kings-Lynne wrote:

I think Martijn van Oosterhout's nearby email on Coverity bug reports might
make a good SoC project, but should it also be added to the TODO list?

I may as well put up phpPgAdmin for it. We have plenty of projects
available in phpPgAdmin...

Same with pgAdmin3.

Is there a list of specific projects? I'm pretty sure we can't just say
"work on (php)PgAdmin"...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#28Jonah H. Harris
jonah.harris@gmail.com
In reply to: Jim C. Nasby (#27)
Re: Google SoC--Idea Request

Robert and I are working on updating it ASAP.

On 4/21/06, Jim C. Nasby <jnasby@pervasive.com> wrote:

On Fri, Apr 21, 2006 at 10:27:48AM +0200, Andreas Pflug wrote:

Christopher Kings-Lynne wrote:

I think Martijn van Oosterhout's nearby email on Coverity bug reports might
make a good SoC project, but should it also be added to the TODO list?

I may as well put up phpPgAdmin for it. We have plenty of projects
available in phpPgAdmin...

Same with pgAdmin3.

Is there a list of specific projects? I'm pretty sure we can't just say
"work on (php)PgAdmin"...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#29Robert Treat
xzilla@users.sourceforge.net
In reply to: Jim C. Nasby (#27)
Re: Google SoC--Idea Request

On Friday 21 April 2006 14:11, Jim C. Nasby wrote:

On Fri, Apr 21, 2006 at 10:27:48AM +0200, Andreas Pflug wrote:

Christopher Kings-Lynne wrote:

I think Martijn van Oosterhout's nearby email on Coverity bug reports might
make a good SoC project, but should it also be added to the TODO list?

I may as well put up phpPgAdmin for it. We have plenty of projects
available in phpPgAdmin...

Same with pgAdmin3.

Is there a list of specific projects? I'm pretty sure we can't just say
"work on (php)PgAdmin"...

http://www.postgresql.org/developer/summerofcode

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#30Jim C. Nasby
jnasby@pervasive.com
In reply to: Robert Treat (#29)
Re: Google SoC--Idea Request

On Fri, Apr 21, 2006 at 05:48:33PM -0400, Robert Treat wrote:

On Friday 21 April 2006 14:11, Jim C. Nasby wrote:

On Fri, Apr 21, 2006 at 10:27:48AM +0200, Andreas Pflug wrote:

Christopher Kings-Lynne wrote:

I think Martijn van Oosterhout's nearby email on Coverity bug reports might
make a good SoC project, but should it also be added to the TODO list?

I may as well put up phpPgAdmin for it. We have plenty of projects
available in phpPgAdmin...

Same with pgAdmin3.

Is there a list of specific projects? I'm pretty sure we can't just say
"work on (php)PgAdmin"...

http://www.postgresql.org/developer/summerofcode

Want to replace

<li><strong>Many TODO Items</strong>A number of the items on our TODO
list have been marked as good projects for beginners whos are new to the
PostgreSQL code. Items on this list have the advantage of already having
general community agreement that the feature is desireable. These items
should also have some general discussion available in the mailing list
archives to help get you started. You can find these items on the <a
href="http://wwwmaster.postgresql.org/docs/faqs.TODO.html">TODO</a>
list, they will be marked with a percent sign (%).
</li>

with

<li><strong>Many TODO Items</strong>: A number of the items on our TODO
list have been marked as good projects for beginners who are new to the
PostgreSQL code. Items on this list have the advantage of already having
general community agreement that the feature is desireable. These items
should also have some general discussion available in the mailing list
archives to help get you started. You can find these items on the <a
href="http://wwwmaster.postgresql.org/docs/faqs.TODO.html">TODO</a>
list, they will be marked with a percent sign (%).
</li>

?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#31Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Jim C. Nasby (#27)
Re: Google SoC--Idea Request

Jim C. Nasby wrote:

Same with pgAdmin3.

Is there a list of specific projects? I'm pretty sure we can't just say
"work on (php)PgAdmin"...

Our TODO list has some.

Regards,
Andreas

#32Alvaro Herrera
alvherre@commandprompt.com
In reply to: Jonah H. Harris (#15)
Re: Google SoC--Idea Request

I hope I'm not too late.

Jonah H. Harris wrote:

On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:

Alvaro indicated he would be willing to provide direction on this
with testing support from me. He also said there are several other
possible PL/PHP issues that would warrant a SoC project.

Cool... let's get 'em all listed here so we can move forward.

The following is all PL/php related, in no particular order:

1. Add support for IN/OUT parameters, and named parameters. This should
be easy to do; the majority of the needed infrastructure in PL/php is
already there. It only needs a bit more love.

2. Clean up memory usage. Both compilation and execution of a function
should happen on separate, maybe temporary, memory contexts; and provide
adequate cleanup for both (for example when a function is recompiled).

3. Enable it to build separate from the Apache SAPI.

4. Allow huge resultsets to be processed by providing an option to
transparently use a cursor to fetch results partially, when spi_exec()
is called.

5. Clean up the plphp_proc_desc struct. This involves making sure we
store all the info we need to know about a function; no more, no less.
(I think currently we store things we don't need, and we don't store
some things it would be useful to know).

I don't think any of these would warrant a SoC by itself. Maybe the
whole bunch could, however.
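
As an illustration of item 4 only, here is a minimal sketch of fetching a large result set in fixed-size batches through a cursor rather than materializing it all at once. It is written in Python against the standard sqlite3 module purely to show the pattern; it is not PL/php code and does not use spi_exec(). The names fetch_in_batches and demo_row_count are hypothetical.

```python
import sqlite3
from typing import Iterator, Tuple


def fetch_in_batches(conn: sqlite3.Connection, query: str,
                     batch_size: int = 1000) -> Iterator[Tuple]:
    """Yield rows one by one while pulling at most batch_size rows per fetch."""
    cursor = conn.cursor()
    cursor.execute(query)
    try:
        while True:
            rows = cursor.fetchmany(batch_size)  # partial fetch from the cursor
            if not rows:
                break
            yield from rows
    finally:
        cursor.close()


def demo_row_count(total_rows: int = 2500, batch_size: int = 1000) -> int:
    """Build a throwaway in-memory table and count rows via batched fetches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)",
                     ((i,) for i in range(total_rows)))
    count = sum(1 for _ in fetch_in_batches(conn, "SELECT n FROM t", batch_size))
    conn.close()
    return count
```

The caller iterates over rows as if the whole result were available, which is the "transparent" part of the idea; how this would map onto PL/php internals is left open here.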

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#33Jonah H. Harris
jonah.harris@gmail.com
In reply to: Alvaro Herrera (#32)
Re: Google SoC--Idea Request

Cool... will get them added.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#34Jim C. Nasby
jnasby@pervasive.com
In reply to: Jonah H. Harris (#33)
Re: Google SoC--Idea Request

Where do we stand with getting much more reasonable default values in
postgresql.conf? Maybe that should be a SoC project, or is it too small?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim C. Nasby (#34)
Re: Google SoC--Idea Request

"Jim C. Nasby" <jnasby@pervasive.com> writes:

Where do we stand with getting much more reasonable default values in
postgresql.conf? Maybe that should be a SoC project, or is it too small?

Define "much more reasonable".

I doubt this is SoC material, simply because the issues have little to
do with coding and a lot to do with persuading people to drop default
support for old platforms. Which is not something a student is likely
to succeed at.

regards, tom lane

#36Jonah H. Harris
jonah.harris@gmail.com
In reply to: Tom Lane (#35)
Re: Google SoC--Idea Request

On 4/24/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I doubt this is SoC material, simply because the issues have little to
do with coding and a lot to do with persuading people to drop default
support for old platforms. Which is not something a student is likely
to succeed at.

While the student could do some benchmarking on relatively new
hardware and make suggestions, I agree with Tom. Having to keep
support for older platforms doesn't leave much flexibility to change
the defaults.

I just don't see enough work here to warrant a SoC project.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonah H. Harris (#36)
Re: Google SoC--Idea Request

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

While the student could do some benchmarking on relatively new
hardware and make suggestions, I agree with Tom. Having to keep
support for older platforms doesn't leave much flexibility to change
the defaults.

Another point here is that the defaults *are* reasonable for development
and for small installations; the people who are complaining are the ones
who expect to run terabyte databases without any tuning. (I exaggerate
perhaps, but the point is valid.)

We've talked more than once about offering multiple alternative
starting-point postgresql.conf files to give people an idea of what to
do for small/medium/large installations. MySQL have done that for years
and it doesn't seem that users are unable to cope with the concept.
But doing this is (a) mostly a matter of testing and documenting, not
coding and (b) probably too small for a SoC project anyway.

regards, tom lane

#38Jonah H. Harris
jonah.harris@gmail.com
In reply to: Tom Lane (#37)
Re: Google SoC--Idea Request

On 4/24/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

We've talked more than once about offering multiple alternative
starting-point postgresql.conf files to give people an idea of what to
do for small/medium/large installations. MySQL have done that for years
and it doesn't seem that users are unable to cope with the concept.
But doing this is (a) mostly a matter of testing and documenting, not
coding and (b) probably too small for a SoC project anyway.

Yeah, it would be nice to offer a small/med/large config file, but
there are also other considerations that affect PostgreSQL and not
MySQL. An example is the system-wide shared memory maximum... RedHat
defaults to 32M, SuSE to 32M?, and OSX to 4M (or something crazy like
that). So even if we give out a med/large config file, they won't
work for most people who have default Linux installs. Tuning
PostgreSQL isn't all that hard, but it may be nice to give people a
starting point.

I don't know, I'm not averse to adding something like the following to
the SoC ideas:

Benchmark PostgreSQL and analyze results to build optimal default
configuration files for medium and large-scale systems.

Of course, the definition of medium and large vary, as does the
application (OLTP, DSS, etc.); so we'd have to define them.

Thoughts?

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#39Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#37)
Re: Google SoC--Idea Request

On Mon, Apr 24, 2006 at 11:05:18PM -0400, Tom Lane wrote:

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

While the student could do some benchmarking on relatively new
hardware and make suggestions, I agree with Tom. Having to keep
support for older platforms doesn't leave much flexibility to change
the defaults.

Another point here is that the defaults *are* reasonable for development
and for small installations; the people who are complaining are the ones
who expect to run terabyte databases without any tuning. (I exaggerate
perhaps, but the point is valid.)

We've talked more than once about offering multiple alternative
starting-point postgresql.conf files to give people an idea of what to
do for small/medium/large installations. MySQL have done that for years
and it doesn't seem that users are unable to cope with the concept.
But doing this is (a) mostly a matter of testing and documenting, not
coding and (b) probably too small for a SoC project anyway.

My recollection was that there was opposition to offering multiple
config files, but that there was a proposal to make initdb smarter about
picking configuration values.

Personally, I agree that multiple config files would be fine. Or a
really fancy solution would be feeding a config option to initdb and
have it generate an appropriate postgresql.conf.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#40Andrew Dunstan
andrew@dunslane.net
In reply to: Jim C. Nasby (#39)
Re: Google SoC--Idea Request

Jim C. Nasby wrote:

On Mon, Apr 24, 2006 at 11:05:18PM -0400, Tom Lane wrote:

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

While the student could do some benchmarking on relatively new
hardware and make suggestions, I agree with Tom. Having to keep
support for older platforms doesn't leave much flexibility to change
the defaults.

Another point here is that the defaults *are* reasonable for development
and for small installations; the people who are complaining are the ones
who expect to run terabyte databases without any tuning. (I exaggerate
perhaps, but the point is valid.)

We've talked more than once about offering multiple alternative
starting-point postgresql.conf files to give people an idea of what to
do for small/medium/large installations. MySQL have done that for years
and it doesn't seem that users are unable to cope with the concept.
But doing this is (a) mostly a matter of testing and documenting, not
coding and (b) probably too small for a SoC project anyway.

My recollection was that there was opposition to offering multiple
config files, but that there was a proposal to make initdb smarter about
picking configuration values.

Personally, I agree that multiple config files would be fine. Or a
really fancy solution would be feeding a config option to initdb and
have it generate an appropriate postgresql.conf.

We have already done some initdb tuning improvements for 8.2 - shared
buffers now tops out at 4000 instead of 1000 and initdb now sets
max_fsm_pages at a more realistic level. (top is 200,000 instead of
previously hardcoded 20,000).

I would have liked to increase max_connections too, but that would have
caused problems on OSX, apparently. See previous discussion.

Personally I would much rather see a tuning advisor tool in more general
use than just provide small/medium/large config setting files.

cheers

andrew

#41Jonah H. Harris
jonah.harris@gmail.com
In reply to: Andrew Dunstan (#40)
Re: Google SoC--Idea Request

On 4/25/06, Andrew Dunstan <andrew@dunslane.net> wrote:

We have already done some initdb tuning improvements for 8.2

Cool, I hadn't looked at this.

I would have liked to increase max_connections too, but that would have
caused problems on OSX, apparently. See previous discussion.

Yeah, their defaults really suck.

Personally I would much rather see a tuning advisor tool in more general
use than just provide small/medium/large config setting files.

True dat.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonah H. Harris (#41)
Re: Google SoC--Idea Request

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

On 4/25/06, Andrew Dunstan <andrew@dunslane.net> wrote:

Personally I would much rather see a tuning advisor tool in more general
use than just provide small/medium/large config setting files.

True dat.

One thing that has to be figured out before we can go far with this
is the whole question of how much smarts initdb really ought to have.
Since a lot of packagers think that initdb should be run
non-interactively behind the scenes, the obvious solution of "give
initdb a --small/--medium/--large parameter" does not work all that
nicely. But on the other hand we can't just tell people to drop in
replacement config files when the one in place contains initdb-created
specifics, such as locale settings.

Now that there's a provision for "include" directives in
postgresql.conf, one way to address this would be to split the
config info into multiple physical files, some containing purely
performance-related settings while others consider functionality.
But that seems more like a wart than a solution to me. I feel that
we've pushed performance-tuning logic into initdb that probably ought
not be there, and we ought to factor it out again.
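
To make the "include" idea concrete, here is a rough Python sketch that writes the performance-related settings into a separate file and returns the include line that postgresql.conf would carry. The file name performance.conf, the profile names, and the function names are assumptions made for illustration. The numbers are the old and new initdb ceilings for shared_buffers and max_fsm_pages mentioned earlier in the thread; mapping them to "small" and "large" is hypothetical and not a tuning recommendation.

```python
from pathlib import Path

# Ceilings quoted earlier in the thread; the profile names are hypothetical.
PROFILES = {
    "small": {"shared_buffers": 1000, "max_fsm_pages": 20000},
    "large": {"shared_buffers": 4000, "max_fsm_pages": 200000},
}


def render_performance_conf(profile: str) -> str:
    """Return the text of a performance-only settings file for one profile."""
    settings = PROFILES[profile]
    lines = [f"{name} = {value}" for name, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


def write_split_config(data_dir: Path, profile: str) -> str:
    """Write the performance file and return the include line for postgresql.conf."""
    perf_file = data_dir / "performance.conf"  # hypothetical file name
    perf_file.write_text(render_performance_conf(profile))
    return f"include '{perf_file.name}'"
```

With this split, swapping profiles only rewrites the performance file, while initdb-created specifics such as locale settings stay in the main file.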

regards, tom lane

#43ipig
ipig@ercist.iscas.ac.cn
In reply to: Jim C. Nasby (#9)
Re: Google SoC--Idea Request

Maybe you could develop a graphical interface, like the Fedora Core setup interface for choosing which packages to install, in which the user picks a config file and then makes small changes to the parameters.

#44Bort, Paul
pbort@tmwsystems.com
In reply to: ipig (#43)
Re: Google SoC--Idea Request

Personally I would much rather see a tuning advisor tool in more general
use than just provide small/medium/large config setting files.

True dat.

Maybe the SoC project here is just such a tuning advisor tool? Something
that can run pgbench repeatedly, try different settings, and compare
results.
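
A minimal sketch of the core loop such a tool might have, in Python. It only covers the "try different settings and compare results" part: run_benchmark is a hypothetical callback that, in a real tool, would apply the settings, restart the server, run pgbench, and return a throughput figure. None of that is implemented here, and the setting names used in any grid are whatever the caller supplies.

```python
import itertools
import statistics
from typing import Callable, Dict, Iterable, List, Tuple

Settings = Dict[str, int]


def candidate_settings(grid: Dict[str, Iterable[int]]) -> List[Settings]:
    """Expand a grid of per-setting values into every combination."""
    names = sorted(grid)
    combos = itertools.product(*(list(grid[name]) for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def best_settings(grid: Dict[str, Iterable[int]],
                  run_benchmark: Callable[[Settings], float],
                  repeats: int = 3) -> Tuple[Settings, float]:
    """Benchmark each combination several times and keep the best median score."""
    best = None
    for settings in candidate_settings(grid):
        scores = [run_benchmark(settings) for _ in range(repeats)]
        score = statistics.median(scores)  # median damps run-to-run noise
        if best is None or score > best[1]:
            best = (settings, score)
    if best is None:
        raise ValueError("no candidate settings to try")
    return best
```

An exhaustive grid grows quickly with the number of parameters, so a real tool would probably need a smarter search; this sketch only shows the compare-and-keep-best structure.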

#45Jonah H. Harris
jonah.harris@gmail.com
In reply to: Bort, Paul (#44)
Re: Google SoC--Idea Request

On 4/25/06, Bort, Paul <pbort@tmwsystems.com> wrote:

Maybe the SoC project here is just such a tuning advisor tool? Something
that can run pgbench repeatedly, try different settings, and compare
results.

IIRC, that already exists. I think it was called pg_autotune or
something similar.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#46Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#42)
Re: Google SoC--Idea Request

Tom Lane wrote:

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

On 4/25/06, Andrew Dunstan <andrew@dunslane.net> wrote:

Personally I would much rather see a tuning advisor tool in more general
use than just provide small/medium/large config setting files.

True dat.

One thing that has to be figured out before we can go far with this
is the whole question of how much smarts initdb really ought to have.
Since a lot of packagers think that initdb should be run
non-interactively behind the scenes, the obvious solution of "give
initdb a --small/--medium/--large parameter" does not work all that
nicely. But on the other hand we can't just tell people to drop in
replacement config files when the one in place contains initdb-created
specifics, such as locale settings.

Now that there's a provision for "include" directives in
postgresql.conf, one way to address this would be to split the
config info into multiple physical files, some containing purely
performance-related settings while others consider functionality.
But that seems more like a wart than a solution to me. I feel that
we've pushed performance-tuning logic into initdb that probably ought
not be there, and we ought to factor it out again.

Sounds good. I don't care what we do for 8.2, but we should do
something.

Or am I going to have to bring out my dancing elephant again? :-)

http://www.janetskiles.com/ART/greeting/greet-ani/dancing-elephant.jpg

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#47Jim C. Nasby
jnasby@pervasive.com
In reply to: Jonah H. Harris (#45)
Re: Google SoC--Idea Request

On Tue, Apr 25, 2006 at 08:39:57AM -0400, Jonah H. Harris wrote:

On 4/25/06, Bort, Paul <pbort@tmwsystems.com> wrote:

Maybe the SoC project here is just such a tuning advisor tool? Something
that can run pgbench repeatedly, try different settings, and compare
results.

IIRC, that already exists. I think it was called pg_autotune or
something similar.

Last time I tried autotune I couldn't get it to work on FreeBSD, and it
tuned a minimum of parameters. For example, it didn't touch
checkpoint_segments, which is pretty essential to tune on a higher-end
server.

Not saying it wouldn't be a good place to start, but I also don't think
it's a replacement for a built-in tuning tool.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#48Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Jonah H. Harris (#1)
Re: Google SoC--Idea Request

Proposal: XMLType for PostgreSQL.

*** Minimum: ***
To have special type support for storing XML data and working with it.
This means the following:
- ability to define any column of a table as XMLType; internally,
all data is stored as VARCHAR;
- automatic validation of documents against an XML schema, if one was
specified in the column definition or in the XML documents themselves
(DTD, XSD, or at least one of them) /*contrib/xml2 has such a feature,
but it uses libxml, which means a DOM interface. Maybe it's better to
use some SAX parser to solve this task*/;
- XPath indexes for queries with path expressions in the WHERE clause
/*I suppose this kind of index would be the most frequently used. I
propose using a good labeling scheme and GIST and/or Gin here*/;
- some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) has
more than 400 pages now and contains some established constructs
that are used in other DBMSes. There is a patch already
written by Pavel Stehule:
http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW, what is its
status? It was held for 8.2, so what is the result?) I tested it
several months ago, and the basic SQL/XML functions worked fine. It
changes the grammar, but there is no other way... So, using this patch
as a part of this project means that this project cannot be a contrib
module, unfortunately. Nevertheless, the current draft of the SQL/XML
standard seems to be mature - so, compared with the existing
implementation, it would be a nice 'landmark';
- XML domain support: ability to define a domain based on XMLType and
an XML schema definition (e.g., an external DTD file or something
similar). I'd consider the XML schema definition as a restriction of
the entire XMLType (similar to restrictions for plain types, which are
defined as CHECK constraints in the domain definition)

*** Maximum: ***
- all things from the 'minimum' list :-)
- rich index system:
* structure index (labeling scheme; prefix schemes seem to be best
for this and I suppose GIST would help here). Actually, it would be
full shredding, like the primary index for XML in MS SQL Server, but
I'm aware of better labeling algorithms than simple prefix labeling
(as in SQL Server). Surely, GIST/Gin support would be a great
foundation for these
* flexible support of path indexes, value indexes and so on (something
like secondary XML indexes in SQL Server...) - as a continuation of
the work on path indexes from the 'minimum' list;
- full-text search abilities (tsearch2 / GIST);
- various encoding issues (auto conversion to the column's encoding,
etc.);
- ability to choose the storage type: VARCHAR or 'native' (trees - like
in native XML DBMSes and DB2 Viper [if their articles don't lie ;-)])
mode. Actually, this is a very, very huge task (almost like creating a
DBMS from scratch) and I clearly understand that I won't solve it
using only my own abilities. But the work on the 'minimum' list
(especially if it is a part of SoC) would be a good starting point
and may attract other developers who could help to implement it. Maybe
at the initial stage, it's worth integrating with some other DBMS and
working with it using two-phase commit (surely, this is not a solution
to all problems, as it means two different execution plans, etc.);
- XQuery and its integration with SQL (according to the SQL/XML
standard). In other words, implementation of the XQuery Data Model -
this would be a great target (version 1.0 of the entire project);
- XML views / updatable XML views (actually, it's a crazy idea, but
it's my dream ;-) )
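
For readers unfamiliar with prefix labeling, here is a small Python sketch of the general idea: each element gets a label made of its ancestors' positions, so ancestor/descendant tests become prefix comparisons on labels. This is a simplified Dewey-style scheme chosen for illustration, not ORDPATH as implemented in SQL Server and not a proposed on-disk format; all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Label = Tuple[int, ...]


def label_elements(xml_text: str) -> Dict[Label, str]:
    """Assign each element a prefix label such as (1, 2, 1) and map it to its tag."""
    root = ET.fromstring(xml_text)
    labels: Dict[Label, str] = {}

    def visit(element: ET.Element, label: Label) -> None:
        labels[label] = element.tag
        # A child's label is the parent's label plus the child's 1-based position.
        for position, child in enumerate(element, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


def is_ancestor(a: Label, b: Label) -> bool:
    """True if the element labeled a is a proper ancestor of the one labeled b."""
    return len(a) < len(b) and b[:len(a)] == a


def descendants(labels: Dict[Label, str], ancestor: Label) -> List[Label]:
    """All labels strictly below the given one, in document order."""
    return sorted(label for label in labels if is_ancestor(ancestor, label))
```

Because sorted labels follow document order, a range scan over an ordered index could answer descendant queries, which is roughly why such labels pair naturally with index structures.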

As a part of SoC I would concentrate on tasks from the 'minimum' list. It
would be a good starting point.

Some articles:
Fresh draft of SQL:200n: http://www.wiscorp.com/sql_2003_standard.zip
Other SQL/XML papers: http://www.wiscorp.com/SQLStandards.html#xsqlstandards
XISS system (Li, Moon - advanced interval indexes):
http://www.cs.arizona.edu/xiss/
MASS (prefix indexes):
http://davis.wpi.edu/dsrg/vamana/WebPages/Publication.html
Staircase joins (accelerating XPath Evaluation):
http://www.inf.uni-konstanz.de/dbis/publications/download/injection.pdf
Oleg's TODO list: http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo
XML in DB2 Viper: http://www.vldb2005.org/program/paper/thu/p1164-nicola.pdf
XQuery in SQL Server: http://www.vldb2005.org/program/paper/thu/p1175-pal.pdf
Labeling schema in SQL Server (ORDPATHs):
http://portal.acm.org/ft_gateway.cfm?id=1007686&type=pdf&coll=GUIDE&dl=GUIDE&CFID=74920272&CFTOKEN=73736781

One more comment: I'm a PhD student at MIPT, Russia. I plan to write
an overview of the XMLType implementations in the latest versions of
three major commercial DBMSes (ORA, MS, DB2), comparing them to the
standard and to each other. The first article of this comparison is
planned for the end of May. This work will help to understand where the
major commercial DBMS vendors are going and why they are going there :-)
Moreover, I intend to create a technique for testing XMLType support in
(O)RDBMSes. In spite of the fact that SoC assumes all work is done by
only one person, I expect some support/help from the following people:
- Dr. Sergey Kuznetsov (my scientific mentor)
- Oleg Bartunov and Teodor Sigaev (as major developers of PostgreSQL
and GIST and Gin, they definitely can help me to be successful);
- Ivan Zolotukhin (together we plan to create the overview mentioned above)
- PostgreSQL community (actually, as I've already mentioned, I intend
to use code by Pavel Stehule, and I'm pretty sure that I'll need a lot
of other help from the community)

--
Best regards,
Nikolay

#49Jonah H. Harris
jonah.harris@gmail.com
In reply to: Nikolay Samokhvalov (#48)
Re: Google SoC--Idea Request

You need to submit this through Google.

Student FAQ:
http://code.google.com/soc/studentfaq.html

Student Sign-up:
http://code.google.com/soc/student_step1.html

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#50Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#22)
Re: Google SoC--Idea Request

On Thu, Apr 20, 2006 at 11:56:32AM -0400, Tom Lane wrote:

Martijn van Oosterhout <kleptog@svana.org> writes:

About the only thing in the backend I found interesting was this:
src/backend/utils/hash/dynahash.c function hash_create

I wonder if we shouldn't just remove the hash_destroy calls in
hash_create's failure paths. hash_destroy is explicitly not gonna
work on a shared-memory hashtable, and in all other cases I'd expect
that any already-allocated table structure will be in a palloc context
that will get cleaned up during error recovery.

[re: failure to create hash in shared memory causes crash]

Any thoughts on this? Make it a TODO item, document it, or simply
ignore it?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#50)
Re: Google SoC--Idea Request

Martijn van Oosterhout <kleptog@svana.org> writes:

On Thu, Apr 20, 2006 at 11:56:32AM -0400, Tom Lane wrote:

I wonder if we shouldn't just remove the hash_destroy calls in
hash_create's failure paths. hash_destroy is explicitly not gonna
work on a shared-memory hashtable, and in all other cases I'd expect
that any already-allocated table structure will be in a palloc context
that will get cleaned up during error recovery.

Any thoughts on this? Make it a TODO item, document it, or simply
ignore it?

It's like a two-line patch, so hardly worth putting in TODO ... might
as well just do it. IIRC the motivation is mostly to silence a
Coverity warning?

regards, tom lane

#52Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#51)
Re: Google SoC--Idea Request

On Mon, Aug 14, 2006 at 08:09:36AM -0400, Tom Lane wrote:

Any thoughts on this? Make it a TODO item, document it, or simply
ignore it?

It's like a two-line patch, so hardly worth putting in TODO ... might
as well just do it. IIRC the motivation is mostly to silence a
Coverity warning?

Well sort of. I can also just tick a box and the warning goes away too.
It just seemed from the discussion that it was something people were
going to fix...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#52)
Re: Google SoC--Idea Request

Martijn van Oosterhout <kleptog@svana.org> writes:

On Mon, Aug 14, 2006 at 08:09:36AM -0400, Tom Lane wrote:

It's like a two-line patch, so hardly worth putting in TODO ... might
as well just do it. IIRC the motivation is mostly to silence a
Coverity warning?

Well sort of. I can also just tick a box and the warning goes away too.
It just seemed from the discussion that it was something people were
going to fix...

Done now --- I have to admit I'd forgotten about it.

regards, tom lane