alternative back-end block formats

Started by Christian Convey, almost 12 years ago, 10 messages
#1 Christian Convey
christian.convey@gmail.com

Hi all,

I'm playing around with Postgres, and I thought it might be fun to
experiment with alternative formats for relation blocks, to see if I can
get smaller files and/or faster server performance.

Does anyone know if this has been done before with Postgres? I would have
assumed yes, but I'm not finding anything in Google about people having
done this.

Thanks,
Christian

#2 Bruce Momjian
bruce@momjian.us
In reply to: Christian Convey (#1)
Re: alternative back-end block formats

On Tue, Jan 21, 2014 at 06:43:54AM -0500, Christian Convey wrote:

> Hi all,
>
> I'm playing around with Postgres, and I thought it might be fun to experiment
> with alternative formats for relation blocks, to see if I can get smaller files
> and/or faster server performance.
>
> Does anyone know if this has been done before with Postgres? I would have
> assumed yes, but I'm not finding anything in Google about people having done
> this.

Not that I know of.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Craig Ringer
craig@2ndquadrant.com
In reply to: Christian Convey (#1)
Re: alternative back-end block formats

On 01/21/2014 07:43 PM, Christian Convey wrote:

> Hi all,
>
> I'm playing around with Postgres, and I thought it might be fun to
> experiment with alternative formats for relation blocks, to see if I can
> get smaller files and/or faster server performance.

It's not clear how you'd do this without massively rewriting the guts of Pg.

Per the docs on internal structure, Pg has a block header, then tuples
within the blocks, each with a tuple header and list of Datum values for
the tuple. Each Datum has a generic Datum header (handling varlena vs
fixed length values etc) then a type-specific on-disk representation
controlled by the type output function for that type.

At least, that's my understanding - I haven't had cause to delve into
the on-disk format yet.
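
A rough sketch of the block layout described above, for illustration only. The `Toy*` names and the two-field header are assumptions made up for this example; PostgreSQL's real definitions (`PageHeaderData`, `ItemIdData`) live in `src/include/storage/bufpage.h` and `src/include/storage/itemid.h` and carry more fields. The sketch models only a page header, an array of line pointers growing up from the front, and tuple bytes growing down from the end of the page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192

/* Hypothetical page header: only the two free-space pointers are modeled. */
typedef struct {
    uint16_t lower;   /* end of the line-pointer array */
    uint16_t upper;   /* start of tuple data, which grows downward */
} ToyPageHeader;

/* Hypothetical line pointer: where one tuple lives inside the page. */
typedef struct {
    uint16_t off;
    uint16_t len;
} ToyItemId;

typedef struct {
    uint8_t bytes[TOY_PAGE_SIZE];
} ToyPage;

/* Copy the header out of the raw page bytes. */
static ToyPageHeader toy_get_header(const ToyPage *p)
{
    ToyPageHeader h;
    memcpy(&h, p->bytes, sizeof h);
    return h;
}

/* Empty page: no line pointers, all space after the header is free. */
void toy_page_init(ToyPage *p)
{
    ToyPageHeader h = { (uint16_t) sizeof(ToyPageHeader), TOY_PAGE_SIZE };
    memset(p->bytes, 0, sizeof p->bytes);
    memcpy(p->bytes, &h, sizeof h);
}

/* Number of line pointers currently on the page. */
uint16_t toy_page_count(const ToyPage *p)
{
    ToyPageHeader h = toy_get_header(p);
    return (uint16_t) ((h.lower - sizeof(ToyPageHeader)) / sizeof(ToyItemId));
}

/* Store one tuple; returns its 1-based slot number, or 0 if it does not fit. */
uint16_t toy_page_add(ToyPage *p, const void *data, uint16_t len)
{
    ToyPageHeader h = toy_get_header(p);
    ToyItemId id;

    assert(h.lower <= h.upper);
    if ((size_t) (h.upper - h.lower) < sizeof id + len)
        return 0;
    h.upper = (uint16_t) (h.upper - len);        /* tuple goes at the end */
    id.off = h.upper;
    id.len = len;
    memcpy(p->bytes + id.off, data, len);
    memcpy(p->bytes + h.lower, &id, sizeof id);  /* pointer goes at the front */
    h.lower = (uint16_t) (h.lower + sizeof id);
    memcpy(p->bytes, &h, sizeof h);
    return toy_page_count(p);
}

/* Look up a tuple by slot number; returns NULL for an invalid slot. */
const uint8_t *toy_page_get(const ToyPage *p, uint16_t slot, uint16_t *len)
{
    ToyItemId id;

    if (slot == 0 || slot > toy_page_count(p))
        return NULL;
    memcpy(&id, p->bytes + sizeof(ToyPageHeader)
                + (size_t) (slot - 1) * sizeof id, sizeof id);
    *len = id.len;
    return p->bytes + id.off;
}
```

Callers address a tuple by slot number rather than by byte offset, so tuples could be moved within the page without invalidating references to them.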

What concrete problem do you mean to tackle? What idea do you want to
explore or implement?

> Does anyone know if this has been done before with Postgres? I would
> have assumed yes, but I'm not finding anything in Google about people
> having done this.

AFAIK (and I don't know much in this area) the storage manager isn't
very pluggable compared to the rest of Pg.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4 Christian Convey
christian.convey@gmail.com
In reply to: Craig Ringer (#3)
Re: alternative back-end block formats

Hi Craig,

On Sun, Jan 26, 2014 at 5:47 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

> On 01/21/2014 07:43 PM, Christian Convey wrote:
>
>> Hi all,
>>
>> I'm playing around with Postgres, and I thought it might be fun to
>> experiment with alternative formats for relation blocks, to see if I can
>> get smaller files and/or faster server performance.
>
> It's not clear how you'd do this without massively rewriting the guts of
> Pg.
>
> Per the docs on internal structure, Pg has a block header, then tuples
> within the blocks, each with a tuple header and list of Datum values for
> the tuple. Each Datum has a generic Datum header (handling varlena vs
> fixed length values etc) then a type-specific on-disk representation
> controlled by the type output function for that type.

I'm still in the process of getting familiar with the pg backend code, so I
don't have a concrete plan yet. However, I'm working on the assumption
that some set of macros and functions encapsulates the page layout.

If/when I tackle this, I expect to add a layer of indirection somewhere
around that boundary, so that some non-catalog tables, whose schemas meet
certain simplifying assumptions, are read and modified using specialized
code.
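
One way such a layer of indirection could look, as a minimal sketch and not a description of any existing PostgreSQL interface: a table of function pointers selected per table, with a generic page format and a specialized fixed-width one behind it. Every name here (`ToyPageOps`, `toy_ops_for_table`, the two toy formats, `TOY_FIXED_WIDTH`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FIXED_WIDTH 4   /* assumed row width for the specialized format */

/* Hypothetical table of page operations; one instance per page format. */
typedef struct {
    const char *name;
    uint32_t (*row_count)(const uint8_t *page);
    const uint8_t *(*get_row)(const uint8_t *page, uint32_t idx, size_t *len);
} ToyPageOps;

/* Both toy formats keep the row count in the first byte of the page. */
static uint32_t toy_row_count(const uint8_t *page)
{
    assert(page != NULL);
    return page[0];
}

/* Generic format: each row is [length byte][data...], so lookup must walk. */
static const uint8_t *generic_get_row(const uint8_t *page, uint32_t idx,
                                      size_t *len)
{
    const uint8_t *p = page + 1;

    if (idx >= page[0])
        return NULL;
    for (uint32_t i = 0; i < idx; i++)
        p += 1 + p[0];
    *len = p[0];
    return p + 1;
}

/* Specialized format: every row is TOY_FIXED_WIDTH bytes with no per-row
 * length, so lookup is a single multiplication. */
static const uint8_t *fixed_get_row(const uint8_t *page, uint32_t idx,
                                    size_t *len)
{
    if (idx >= page[0])
        return NULL;
    *len = TOY_FIXED_WIDTH;
    return page + 1 + (size_t) idx * TOY_FIXED_WIDTH;
}

static const ToyPageOps toy_generic_ops = { "generic", toy_row_count, generic_get_row };
static const ToyPageOps toy_fixed_ops   = { "fixed",   toy_row_count, fixed_get_row };

/* Pick the operations for a table; the flag stands in for "the schema meets
 * the simplifying assumptions". */
const ToyPageOps *toy_ops_for_table(int all_columns_fixed_width)
{
    return all_columns_fixed_width ? &toy_fixed_ops : &toy_generic_ops;
}
```

Code above the boundary only ever calls through the `ToyPageOps` pointer, so it does not need to know which format a given page uses.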

I don't want to get into the specific optimizations I'd like to try, only
because I haven't fully studied the code yet, so I don't want to put my
foot in my mouth.

> What concrete problem do you mean to tackle? What idea do you want to
> explore or implement?

My real motivation is that I'd like to get more familiar with the pg
backend codebase, and tilting at this windmill seemed like an interesting
way to accomplish that.

If I were focused on solving a real-world problem, I'd say that this
lays the groundwork for table-schema-specific storage optimizations and
optimized record-filtering code. But I'd only make that argument if I
planned to (a) perform a careful study with statistically significant
benchmarks, and/or (b) produce a merge-worthy patch. At this point I have
no intentions of doing so. My main goal really is just to have fun with
the code.

>> Does anyone know if this has been done before with Postgres? I would
>> have assumed yes, but I'm not finding anything in Google about people
>> having done this.
>
> AFAIK (and I don't know much in this area) the storage manager isn't
> very pluggable compared to the rest of Pg.

Thanks for the warning. Duly noted.

Kind regards,
Christian

#5 Cédric Villemain
cedric@2ndquadrant.com
In reply to: Christian Convey (#4)
Re: alternative back-end block formats

On Monday, 27 January 2014 13:42:29, Christian Convey wrote:

> On Sun, Jan 26, 2014 at 5:47 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>> On 01/21/2014 07:43 PM, Christian Convey wrote:
>>
>>> Does anyone know if this has been done before with Postgres? I would
>>> have assumed yes, but I'm not finding anything in Google about people
>>> having done this.
>>
>> AFAIK (and I don't know much in this area) the storage manager isn't
>> very pluggable compared to the rest of Pg.
>
> Thanks for the warning. Duly noted.

As written in the meeting notes, Tom Lane revealed an interest in
pluggable storage. So it might be interesting to check that.

https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training

#6 Christian Convey
christian.convey@gmail.com
In reply to: Cédric Villemain (#5)
Re: alternative back-end block formats

On Tue, Jan 28, 2014 at 5:42 AM, Cédric Villemain <cedric@2ndquadrant.com> wrote:
...

> As written in the meeting notes, Tom Lane revealed an interest in
> pluggable storage. So it might be interesting to check that.
>
> https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting

Thanks. I just read those meeting notes, and also Josh Berkus' blog on the
topic:
http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html

I was only thinking to enable pluggable operations on a single, specified
heap page, probably as a function of which table owned the page. Josh's
blog seems to describe something a little broader in scope, although I
can't tell from that post exactly what functionality that entails.

Either way, this sounds like something I'd enjoy pitching in on, to
whatever extent I could be useful. Has anyone started work on this yet?

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christian Convey (#6)
Re: alternative back-end block formats

Christian Convey <christian.convey@gmail.com> writes:

> On Tue, Jan 28, 2014 at 5:42 AM, Cédric Villemain <cedric@2ndquadrant.com> wrote:
>
>> As written in the meeting notes, Tom Lane revealed an interest in
>> pluggable storage. So it might be interesting to check that.
>> https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting
>
> Thanks. I just read those meeting notes, and also Josh Berkus' blog on the
> topic:
> http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html
>
> I was only thinking to enable pluggable operations on a single, specified
> heap page, probably as a function of which table owned the page. Josh's
> blog seems to describe something a little broader in scope, although I
> can't tell from that post exactly what functionality that entails.
>
> Either way, this sounds like something I'd enjoy pitching in on, to
> whatever extent I could be useful. Has anyone started work on this yet?

Nope, but it's still on the radar screen.

There are a couple of really huge issues that would have to be argued out
before any progress could be made.

One is that tuple layout (particularly tuple header format) is something
known in detail throughout large parts of the system. This is a PITA if
the storage layer would like to use some other tuple format, but is it
worthwhile or even possible to abstract it?
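
To make the coupling concrete, here is a toy sketch under stated assumptions: `ToyTuple` and its helpers are invented for this example and are far simpler than PostgreSQL's `HeapTupleHeaderData` (declared in `src/include/access/htup_details.h`). Each helper reads header fields directly, which is the kind of knowledge that ends up spread across many callers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ATTS 8   /* one bitmap byte is enough for this toy */

/* Hypothetical, heavily simplified tuple: header fields then user data. */
typedef struct {
    uint32_t xmin;         /* inserting transaction id */
    uint32_t xmax;         /* deleting transaction id, 0 if none */
    uint16_t natts;        /* number of attributes */
    uint8_t  hoff;         /* offset from tuple start to user data */
    uint8_t  bits;         /* bitmap: in this toy, set bit = value present */
    uint8_t  payload[16];  /* user data */
} ToyTuple;

/* Start a tuple with every attribute null. */
void toy_tuple_init(ToyTuple *t, uint32_t xmin, uint16_t natts)
{
    assert(natts <= TOY_MAX_ATTS);
    memset(t, 0, sizeof *t);
    t->xmin = xmin;
    t->natts = natts;
    t->hoff = (uint8_t) offsetof(ToyTuple, payload);
}

/* Mark one attribute as having a value. */
void toy_att_set_present(ToyTuple *t, unsigned att)
{
    assert(att < t->natts);
    t->bits |= (uint8_t) (1u << att);
}

/* Knows the bitmap's position and its bit convention. */
int toy_att_isnull(const ToyTuple *t, unsigned att)
{
    assert(att < t->natts);
    return (t->bits & (1u << att)) == 0;
}

/* Knows that user data starts hoff bytes into the tuple. */
const uint8_t *toy_tuple_data(const ToyTuple *t)
{
    return (const uint8_t *) t + t->hoff;
}

/* Crude stand-in for a visibility check; it knows about xmax. */
int toy_tuple_is_live(const ToyTuple *t)
{
    return t->xmax == 0;
}
```

A storage layer with a different tuple format would have to either supply its own version of every such helper or convert its tuples into this shape before handing them to the rest of the system.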

Another is that we've got whole *classes* of utility commands that are
specifically targeted to the storage engine we've got. VACUUM, CLUSTER,
ALTER TABLE SET TABLESPACE for example. Not to mention autovacuum.
It's not clear where these would fit if we tried to define a storage
engine API layer.

regards, tom lane

#8 Christian Convey
christian.convey@gmail.com
In reply to: Tom Lane (#7)
Re: alternative back-end block formats

> There are a couple of really huge issues that would have to be argued out
> before any progress could be made.

Is this something that people want to spend time on right now? As I
mentioned earlier, I'm game. But it doesn't sound like I'll get very far
without adult supervision.

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christian Convey (#8)
Re: alternative back-end block formats

Christian Convey <christian.convey@gmail.com> writes:

>> There are a couple of really huge issues that would have to be argued out
>> before any progress could be made.
>
> Is this something that people want to spend time on right now? As I
> mentioned earlier, I'm game. But it doesn't sound like I'll get very far
> without adult supervision.

TBH, I'd rather we waited till the commitfest is over. This is certainly
material for 9.5, if not even further out, so there's no pressing need for
a debate right now; and we have plenty of stuff we do need to deal with
right now.

regards, tom lane

#10 Christian Convey
christian.convey@gmail.com
In reply to: Tom Lane (#9)
Re: alternative back-end block formats

On Tue, Jan 28, 2014 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> TBH, I'd rather we waited till the commitfest is over. This is certainly
> material for 9.5, if not even further out, so there's no pressing need for
> a debate right now; and we have plenty of stuff we do need to deal with
> right now.

Works for me. I'll just lurk in the meantime, and see what I can figure
out. Thanks.

- Christian