question about postgres storage management

Started by Will Childs-Kleinover 12 years ago5 messages

willck93@gmail.com

over 12 years ago

Hello All,
I'm writing today to inquire about finding the exact point in the source
where postgres writes to disk. I'm trying to implement some compression in
postgres. The idea is to compress the data right when its written to disk,
to reduce the amount of data written to disk, reducing the amount of time
of disk read/write. I'm hoping that this reduction in disk IO latency is
greater than the CPU cost incurred by compression, resulting in a speedup.
I will be testing various compression libraries to see which (if any) work
well for various query types. I've been looking through the source code, in
src/backend/storage specifically. I'm thinking something in smgr is where i
want to make my de/compress calls. Specifically in
src/backend/storage/smgr/md.c in the functions mdwrite(...) and
mdread(...). Am I in the right place? If not where should I look?

Sincerely,
Will Childs-Klein

Stephen Frost

sfrost@snowman.net

over 12 years ago

In reply to: Will Childs-Klein (#1)

Re: question about postgres storage management

Will,

* Will Childs-Klein (willck93@gmail.com) wrote:

I'm writing today to inquire about finding the exact point in the source
where postgres writes to disk. I'm trying to implement some compression in
postgres.

PostgreSQL already does compression in most cases where you'd want it
done..

Thanks,

Stephen

Merlin Moncure

mmoncure@gmail.com

over 12 years ago

In reply to: Will Childs-Klein (#1)

Re: question about postgres storage management

On Fri, Apr 19, 2013 at 10:17 AM, Will Childs-Klein <willck93@gmail.com> wrote:

Hello All,
I'm writing today to inquire about finding the exact point in the source
where postgres writes to disk. I'm trying to implement some compression in
postgres. The idea is to compress the data right when its written to disk,
to reduce the amount of data written to disk, reducing the amount of time of
disk read/write. I'm hoping that this reduction in disk IO latency is
greater than the CPU cost incurred by compression, resulting in a speedup. I
will be testing various compression libraries to see which (if any) work
well for various query types. I've been looking through the source code, in
src/backend/storage specifically. I'm thinking something in smgr is where i
want to make my de/compress calls. Specifically in
src/backend/storage/smgr/md.c in the functions mdwrite(...) and mdread(...).
Am I in the right place? If not where should I look?

this is not going to work. postgres tables are page organized -- if
you compress pages as they are written out they become variable
length. this in turn would cause the entire file to have to shift up
if you wrote a page back and the size grew.

as noted, postgres already compresses the most interesting case --
when a single tuple spans pages (aka toast).

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Christopher Browne

cbbrowne@gmail.com

over 12 years ago

In reply to: Will Childs-Klein (#1)

Re: question about postgres storage management

I would expect the strategy you have in mind to be more useful to apply at
the filesystem level, so that it's not in Postgres altogether. (Ala
"Stacker", remember DR-DOS?)

But, to speak arguable heresy, the demerits of this sort of thing are
described nicely in Another Database's Documentation: <
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html

The relevant bit that seems to describe fairly aptly why what you are
suggesting is unlikely to turn out well:

"Some operating systems implement compression at the file system level.
Files are typically divided into fixed-size blocks that are compressed into
variable-size blocks, which easily leads into fragmentation. Every time
something inside a block is modified, the whole block is recompressed
before it is written to disk. These properties make this compression
technique unsuitable for use in an update-intensive database system."

The principle described is as applicable to Postgres as it is to InnoDB,
and is as applicable to attempting to compress disk blocks from within the
database as it is to apply it at the filesystem layer.

Postgres *does* make use of data compression, where applicable; see the
documentation for TOAST: <
http://www.postgresql.org/docs/9.2/static/storage-toast.html>

You should avail yourself of the code on TOAST:
./src/backend/catalog/toasting.c
./src/backend/access/heap/tuptoaster.c
./src/include/catalog/toasting.h
./src/include/access/tuptoaster.h

Merlin Moncure

mmoncure@gmail.com

over 12 years ago

In reply to: Christopher Browne (#4)

Re: question about postgres storage management

On Fri, Apr 19, 2013 at 11:17 AM, Christopher Browne <cbbrowne@gmail.com> wrote:

I would expect the strategy you have in mind to be more useful to apply at
the filesystem level, so that it's not in Postgres altogether. (Ala
"Stacker", remember DR-DOS?)

But, to speak arguable heresy, the demerits of this sort of thing are
described nicely in Another Database's Documentation:
<http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html>

The relevant bit that seems to describe fairly aptly why what you are
suggesting is unlikely to turn out well:

"Some operating systems implement compression at the file system level.
Files are typically divided into fixed-size blocks that are compressed into
variable-size blocks, which easily leads into fragmentation. Every time
something inside a block is modified, the whole block is recompressed before
it is written to disk. These properties make this compression technique
unsuitable for use in an update-intensive database system."

The principle described is as applicable to Postgres as it is to InnoDB, and
is as applicable to attempting to compress disk blocks from within the
database as it is to apply it at the filesystem layer.

Postgres *does* make use of data compression, where applicable; see the
documentation for TOAST:
<http://www.postgresql.org/docs/9.2/static/storage-toast.html>

...which I turn off in cases where I'm concerned about performance :-D.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers