Call for Google Summer of Code (GSoC) 2012: Project ideas?
Hi
I have a student who is interested in participating in the Google
Summer of Code (GSoC) 2012.
Now I have the "burden" of looking for a cool project... Any ideas?
-Stefan
On 03/08/2012 01:40 PM, Stefan Keller wrote:
Hi
I do have a student who is interested in participating at the Google
Summer of Code (GSoC) 2012
Now I have the "burden" to look for a cool project... Any ideas?
-Stefan
How about one of:
1) On-disk page-level compression (maybe with LZF or Snappy) (maybe not page level; any level, really)
I know TOAST compresses, but I believe it's only one row at a time. Page level would compress better because there is more data, and it would also decrease the amount of I/O, so it might speed up disk access.
2) Better partitioning support. Something much more automatic.
3) Take a nice big table, have it inserted/updated a few times a second. Then make "select * from bigtable where indexed_field = 'somevalue';" work 10 times faster than it does today.
I think there is also a wish list on the wiki somewhere.
-Andy
Ability to dynamically resize the shared-memory segment without taking
PostgreSQL down :)
On Thu, Mar 8, 2012 at 8:01 PM, Andy Colson <andy@squeakycode.net> wrote:
Nice ideas.
Those aren't projects we should be giving to summer students. I don't
suppose many people could do those things in two months, let alone
people with the least experience in both their career and our
codebase.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson <andy@squeakycode.net> wrote:
I know TOAST compresses, but I believe it's only one row at a time. Page level would compress better because there is more data, and it would also decrease the amount of I/O, so it might speed up disk access.
Er, but when data is toasted it's spanning pages. Page-level
compression is a super complicated problem.
Something that is maybe more attainable on the compression side of
things is a userland API for compression -- like pgcrypto is for
encryption. Even if it didn't make it into core, it could live on
reasonably as a pgfoundry project.
merlin
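[Editor's note: to make the pgcrypto analogy concrete, here is a rough Python sketch of the surface such a userland compression API might expose. This is not real Postgres code; the `CODECS` registry and function names are invented for illustration, with zlib and bz2 standing in for codecs like LZF or Snappy that a real extension could register.]

```python
import bz2
import zlib

# Hypothetical sketch of a pluggable compression surface, by analogy
# with pgcrypto's encrypt(data, key, 'algo'): the algorithm is a
# caller-supplied parameter rather than baked into the storage layer.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}

def compress(data: bytes, algo: str) -> bytes:
    pack, _ = CODECS[algo]
    return pack(data)

def decompress(blob: bytes, algo: str) -> bytes:
    _, unpack = CODECS[algo]
    return unpack(blob)
```

Round-tripping a value (`decompress(compress(b"...", "zlib"), "zlib")`) returns the original bytes; the point is only the shape of the API, not the codecs themselves.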
On 3/9/2012 9:47 AM, Merlin Moncure wrote:
On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson <andy@squeakycode.net> wrote:
I know TOAST compresses, but I believe it's only one row at a time. Page level would compress better because there is more data, and it would also decrease the amount of I/O, so it might speed up disk access.
Er, but when data is toasted it's spanning pages. Page-level compression is a super complicated problem.
Something that is maybe more attainable on the compression side of things is a userland API for compression -- like pgcrypto is for encryption. Even if it didn't make it into core, it could live on reasonably as a pgfoundry project.
merlin
Agreed, it's probably too difficult for a GSoC project. But a userland API
would still be row level, which, in my opinion, is useless. Consider
rows from my apache log that I'm dumping to the database:
date, url, status
2012-3-9 10:15:00, '/index.php?id=4', 202
2012-3-9 10:15:01, '/index.php?id=5', 202
2012-3-9 10:15:02, '/index.php?id=6', 202
That won't compress at all on a row level. But it'll compress 99% on a
"larger" (page/multirow/whatever?) level.
-Andy
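[Editor's note: Andy's intuition is easy to check with a quick simulation. The Python sketch below (zlib standing in for LZF/Snappy, synthetic rows modeled on the sample above) compresses rows one at a time versus as one page-sized block.]

```python
import zlib

# Synthetic rows modeled on the apache-log sample in the thread.
rows = [f"2012-3-9 10:15:{i % 60:02d}, '/index.php?id={i}', 202\n".encode()
        for i in range(1000)]

raw = b"".join(rows)
row_level = sum(len(zlib.compress(r)) for r in rows)  # compress each row alone
page_level = len(zlib.compress(raw))                  # compress one big block

print(f"raw={len(raw)} row_level={row_level} page_level={page_level}")
```

Each tiny row pays the codec's fixed header overhead, so the per-row total can even exceed the raw size, while the single block shrinks to a small fraction of it, which is the effect Andy describes.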
On Fri, Mar 9, 2012 at 10:19 AM, Andy Colson <andy@squeakycode.net> wrote:
Agreed, it's probably too difficult for a GSoC project. But a userland API
would still be row level, which, in my opinion, is useless. Consider rows
from my apache log that I'm dumping to the database:
It's useless for what you're trying to do, but it would be useful to
people trying to compress large datums (data, I know) before storage
using algorithms that Postgres can't support, like LZO.
date, url, status
2012-3-9 10:15:00, '/index.php?id=4', 202
2012-3-9 10:15:01, '/index.php?id=5', 202
2012-3-9 10:15:02, '/index.php?id=6', 202
That won't compress at all on a row level. But it'll compress 99% on a
"larger" (page/multirow/whatever?) level.
Sure, but you can only get those rates by giving up the segmented view
of the data that Postgres requires. Your tuples are very small, and I
only see compression happening on the userland side by employing
tricks specific to your dataset (like using "char" to map
the status, URL mapping, etc.).
merlin
Hi!
On Thursday, March 8, 2012 at 11:40 AM, Stefan Keller wrote:
Hi
I do have a student who is interested in participating at the Google
Summer of Code (GSoC) 2012
Now I have the "burden" to look for a cool project... Any ideas?
Also, for those who are on this thread: we are collecting ideas on the wiki:
http://wiki.postgresql.org/wiki/GSoC_2012
And we have the TODO list:
http://wiki.postgresql.org/wiki/TODO
-selena
--
http://chesnok.com
@selenamarie
Selena Deckelmann wrote:
On Thursday, March 8, 2012 at 11:40 AM, Stefan Keller wrote:
I do have a student who is interested in participating at the Google
Summer of Code (GSoC) 2012
Now I have the "burden" to look for a cool project... Any ideas?
Also, for those who are on this thread: we are collecting ideas on the wiki:
I have added Foreign Data Wrappers.
I think that would be a good idea for anybody who wants a clearly
defined project: the API is documented (though currently changing),
it's a good opportunity to learn to hack on PostgreSQL server code,
and you can leverage your knowledge of other software.
Yours,
Laurenz Albe
Excuse me if what I say below is nonsensical, for I haven't read much about
compression techniques, and these ramblings are just out of common sense.
I think the debate about the level (row, page, file) of compression arises when
we strictly stick to the axiom of compression that all the information needed
for decompression must be present in the same compressed unit.
Can't we relax this rule a bit and separate out the compression hints into a
separate file, the way we have table data in one file and the
positional references [indexes] in another? Would that not eliminate this
dilemma about the boundaries of compression?
Perhaps a periodic, autovacuum-like compressor daemon could take up the job
of recompression, keeping the compression hints updated to match the latest
data present in the file/page at that instant.
Regards,
Samba
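[Editor's note: Samba's separate-file idea has a rough analogue in existing codecs: a preset dictionary kept outside the compressed unit. A hedged Python sketch using zlib's `zdict` feature follows; the hint string and helper names are invented for illustration.]

```python
import zlib

# "Compression hints" kept outside the compressed unit: a preset
# dictionary shared by many rows. A daemon could periodically rebuild
# it from current data and recompress, as Samba suggests.
HINTS = b"2012-3-9 10:15:00, '/index.php?id=0', 202\n"

def compress_row(row: bytes) -> bytes:
    c = zlib.compressobj(zdict=HINTS)
    return c.compress(row) + c.flush()

def decompress_row(blob: bytes) -> bytes:
    d = zlib.decompressobj(zdict=HINTS)
    return d.decompress(blob) + d.flush()
```

The catch, which the axiom in the thread points at: the hints file becomes essential for decompression, so unlike an index it cannot be rebuilt from the table data if it is lost.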
On 03/08/12 12:01 PM, Andy Colson wrote:
2) better partitioning support. Something much more automatic.
That would be really high on our list, as would something that can handle
adding/dropping partitions while there are concurrent transactions
involving the partitioned table.
Also, a planner that can cope with optimizing prepared statements where
the partitioning variable is a passed parameter.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
+1 to seamless partitioning.
The idea of having a student work on this seems a bit scary, but what seems scary to me may be a piece of cake for a talented kid :-)
Kiriakos
http://www.mockbites.com
On Mar 13, 2012, at 3:07 PM, John R Pierce wrote:
Stefan Keller, 08.03.2012 20:40:
Hi
I do have a student who is interested in participating at the Google
Summer of Code (GSoC) 2012
Now I have the "burden" to look for a cool project... Any ideas?
-Stefan
What about an extension to the CREATE TRIGGER syntax that combines trigger definition and function definition in a single statement?
Something like:
CREATE TRIGGER my_trg BEFORE UPDATE ON some_table
FOR EACH ROW EXECUTE
DO
$body$
BEGIN
... here goes the function code ...
END;
$body$
LANGUAGE plpgsql;
which would create both objects (trigger and trigger function) at the same time in the background.
The CASCADE option of DROP TRIGGER could be enhanced to include the corresponding function in the DROP as well.
This would make the syntax a bit easier to handle for those cases where a 1:1 relationship exists between triggers and functions but would still allow the flexibility to re-use trigger functions in more than one trigger.
Regards
Thomas
Hi all,
2012/3/14 Thomas Kellerer <spam_eater@gmx.net>:
Thanks to all who responded here.
There are now two students here at our university and it seems that
they prefer another open source project (which I support too).
Let's take some of these good ideas to the Postgres wiki (if there is an
idea page there :->)
-Stefan