Dept of ugly hacks: eliminating padding space in system indexes

Started by Tom Laneover 17 years ago19 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I was thinking a bit about how we pad columns of type NAME to
fixed-width, even though they're semantically equivalent to C strings.
The reason for wasting that space is that it makes it possible to
overlay a C struct onto the leading columns of most system catalogs.
I don't wish to propose changing that (at least not today), but it
struck me that there is no reason to overlay a C struct onto index
entries, and that getting rid of the padding space would be even more
useful in an index than in the catalog itself. It turns out to be
dead easy to implement this: effectively, we just decree that the
index column storage type for NAME is always CSTRING. Because the
two types are effectively binary-compatible as long as you don't
look at the padding, the attached ugly-but-impressively-short patch
seems to accomplish this. It passes the regression tests anyway.
Here are some numbers about the space savings in a virgin database:

CVS HEAD w/patch savings

pg_database_size('postgres') 4439752 4071112 8.3%
pg_relation_size('pg_class_relname_nsp_index') 57344 40960 28%
pg_relation_size('pg_proc_proname_args_nsp_index') 319488 204800 35%

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

regards, tom lane

#2Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: Dept of ugly hacks: eliminating padding space in system indexes

I would mention in the C comment that we are doing this for space
savings, but other than that, it seems fine.

---------------------------------------------------------------------------

Tom Lane wrote:

I was thinking a bit about how we pad columns of type NAME to
fixed-width, even though they're semantically equivalent to C strings.
The reason for wasting that space is that it makes it possible to
overlay a C struct onto the leading columns of most system catalogs.
I don't wish to propose changing that (at least not today), but it
struck me that there is no reason to overlay a C struct onto index
entries, and that getting rid of the padding space would be even more
useful in an index than in the catalog itself. It turns out to be
dead easy to implement this: effectively, we just decree that the
index column storage type for NAME is always CSTRING. Because the
two types are effectively binary-compatible as long as you don't
look at the padding, the attached ugly-but-impressively-short patch
seems to accomplish this. It passes the regression tests anyway.
Here are some numbers about the space savings in a virgin database:

CVS HEAD w/patch savings

pg_database_size('postgres') 4439752 4071112 8.3%
pg_relation_size('pg_class_relname_nsp_index') 57344 40960 28%
pg_relation_size('pg_proc_proname_args_nsp_index') 319488 204800 35%

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

regards, tom lane

Content-Description: index-name-as-cstring.patch

Index: src/backend/catalog/index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.300
diff -c -r1.300 index.c
*** src/backend/catalog/index.c	19 Jun 2008 00:46:04 -0000	1.300
--- src/backend/catalog/index.c	23 Jun 2008 19:34:54 -0000
***************
*** 262,267 ****
--- 262,278 ----
ReleaseSysCache(tuple);
}
+ 
+ 		/*
+ 		 * For an index on NAME, force the index storage to be CSTRING,
+ 		 * rather than padded to fixed length.
+ 		 */
+ 		if (to->atttypid == NAMEOID)
+ 		{
+ 			to->atttypid = CSTRINGOID;
+ 			to->attlen = -2;
+ 			to->attalign = 'c';
+ 		}
}

return indexTupDesc;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Tom Lane wrote:

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

I think so.

cheers

andrew

#4Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#1)
Re: Dept of ugly hacks: eliminating padding space in system indexes

On Mon, 2008-06-23 at 15:52 -0400, Tom Lane wrote:

CVS HEAD w/patch savings

pg_database_size('postgres') 4439752 4071112 8.3%
pg_relation_size('pg_class_relname_nsp_index') 57344 40960 28%
pg_relation_size('pg_proc_proname_args_nsp_index') 319488 204800 35%

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

Not doing it would be more ugly, unless there is some negative
side-effect?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#5Mark Mielke
mark@mark.mielke.cc
In reply to: Andrew Dunstan (#3)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Andrew Dunstan wrote:

Tom Lane wrote:

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

I think so.

Were you able to time any speedup? Is this something that would benefit
installations with a lot of metadata? I presume most of this information
normally easily fits in cache most of the time?

I am trying to understand what exactly it is worth... :-)

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Mielke (#5)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Mark Mielke <mark@mark.mielke.cc> writes:

Tom Lane wrote:

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

Were you able to time any speedup?

I didn't try; can you suggest any suitable benchmark?

The performance impact is probably going to be limited by our extensive
use of catalog caches --- once a desired row is in a backend's catcache,
it doesn't take a btree search to fetch it again. Still, the system
indexes are probably "hot" enough to stay in shared buffers most of the
time, and the smaller they are the more space will be left for other
stuff, so I think there should be a distributed benefit.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#4)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Simon Riggs <simon@2ndquadrant.com> writes:

On Mon, 2008-06-23 at 15:52 -0400, Tom Lane wrote:

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

Not doing it would be more ugly, unless there is some negative
side-effect?

I thought some more about why this seems ugly to me, and realized that a
lot of it has to do with the change in typalign. Currently, a compiler
is entitled to assume that a pointer to Name is 4-byte aligned; thus
for instance it could generate word-wide instructions for copying a Name
from one place to another. A "Name" that is stored as just CSTRING
might break that. We are already at risk of this, really, because of
all the places where we gaily pass plain old C strings to syscache and
index searches on Name columns. I think the only reason we've not been
burnt is that it's hard to optimize strcmp() into word-wide operations.

However the solution to that seems fairly obvious: let's downgrade Name
to typalign 1 instead of 4.

regards, tom lane

#8Mark Mielke
mark@mark.mielke.cc
In reply to: Tom Lane (#6)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Tom Lane wrote:

Were you able to time any speedup?

I didn't try; can you suggest any suitable benchmark?

Unfortunately - no. I kind of think it won't benefit any of my databases
in any noticeable way. My numbers are similar to yours:

pccyber=# select pg_database_size('postgres');
4468332

pccyber=# select pg_relation_size('pg_class_relname_nsp_index');
90112

pccyber=# select pg_relation_size('pg_proc_proname_args_nsp_index');
294912

Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of RAM
for my main PostgreSQL system that I manage seems like a drop in the
bucket. Even if 40% of pg_class_relname and pg_proc_proname indices was
saved - we're talking about 154 Kbytes saved on both those indices
combined. Minor? Major? I bet I wouldn't notice unless my database
requirements used up all RAM, and even then I'm suspecting it wouldn't
matter except for border line cases (like all pages required for
everything else happened to equal 4 Gbytes near exactly).

The performance impact is probably going to be limited by our extensive
use of catalog caches --- once a desired row is in a backend's catcache,
it doesn't take a btree search to fetch it again. Still, the system
indexes are probably "hot" enough to stay in shared buffers most of the
time, and the smaller they are the more space will be left for other
stuff, so I think there should be a distributed benefit.

In my opinion it is 'do the right thing', rather than a performance
question. It seems to me that an index keeping tracking of space
characters at the end of a name, char, varchar, or text does not make
sense, and the right thing may be to do a more generic version of your
patch? In the few cases that space at the end matters, couldn't that be
determined by re-checking the table row after querying it?

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

#9Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#1)
Re: Dept of ugly hacks: eliminating padding space in system indexes

dead easy to implement this: effectively, we just decree that the
index column storage type for NAME is always CSTRING. Because the

Isn't it a reason to add STORAGE option of CREATE OPERATOR CLASS to BTree? as
it's done for GiST and GIN indexes.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#10Shane Ambler
pgsql@Sheeky.Biz
In reply to: Mark Mielke (#8)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Mark Mielke wrote:

Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of RAM
for my main PostgreSQL system that I manage seems like a drop in the
bucket. Even if 40% of pg_class_relname and pg_proc_proname indices was
saved - we're talking about 154 Kbytes saved on both those indices
combined. Minor? Major? I bet I wouldn't notice unless my database
requirements used up all RAM, and even then I'm suspecting it wouldn't
matter except for border line cases (like all pages required for
everything else happened to equal 4 Gbytes near exactly).

Guess the mileage will vary depending on the complexity of the db
structure. Shorter names will also benefit more than longer ones.

The performance impact is probably going to be limited by our extensive
use of catalog caches --- once a desired row is in a backend's catcache,
it doesn't take a btree search to fetch it again. Still, the system
indexes are probably "hot" enough to stay in shared buffers most of the
time, and the smaller they are the more space will be left for other
stuff, so I think there should be a distributed benefit.

My question is whether this is limited to system catalogs? or will this
benefit char() index used on any table? The second would make it more
worthwhile.

--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

#11Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Shane Ambler (#10)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Shane Ambler wrote:

My question is whether this is limited to system catalogs? or will this
benefit char() index used on any table? The second would make it more
worthwhile.

char(n) fields are already stored as variable-length on disk. This isn't
applicable to them.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#9)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Teodor Sigaev <teodor@sigaev.ru> writes:

dead easy to implement this: effectively, we just decree that the
index column storage type for NAME is always CSTRING. Because the

Isn't it a reason to add STORAGE option of CREATE OPERATOR CLASS to BTree? as
it's done for GiST and GIN indexes.

Hmm ... I don't see a point in exposing that as a user-level facility,
unless you can point to other use-cases besides NAME. But it would be
cute to implement the hack by changing the initial contents of
pg_opclass instead of inserting code in the backend. I'll give that
a try.

regards, tom lane

#13Josh Berkus
josh@agliodbs.com
In reply to: Shane Ambler (#10)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Shane Ambler wrote:

Mark Mielke wrote:

Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of
RAM for my main PostgreSQL system that I manage seems like a drop in
the bucket. Even if 40% of pg_class_relname and pg_proc_proname
indices was saved - we're talking about 154 Kbytes saved on both those
indices combined. Minor? Major? I bet I wouldn't notice unless my
database requirements used up all RAM, and even then I'm suspecting it
wouldn't matter except for border line cases (like all pages required
for everything else happened to equal 4 Gbytes near exactly).

Guess the mileage will vary depending on the complexity of the db
structure. Shorter names will also benefit more than longer ones.

There are PostgreSQL users out there with more than 100,000 tables per
server instance. This will make more of a difference to them.

--Josh

#14Joshua D. Drake
jd@commandprompt.com
In reply to: Josh Berkus (#13)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Josh Berkus wrote:

Shane Ambler wrote:

Mark Mielke wrote:

Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of
RAM for my main PostgreSQL system that I manage seems like a drop in
the bucket. Even if 40% of pg_class_relname and pg_proc_proname
indices was saved - we're talking about 154 Kbytes saved on both
those indices combined. Minor? Major? I bet I wouldn't notice unless
my database requirements used up all RAM, and even then I'm
suspecting it wouldn't matter except for border line cases (like all
pages required for everything else happened to equal 4 Gbytes near
exactly).

Guess the mileage will vary depending on the complexity of the db
structure. Shorter names will also benefit more than longer ones.

There are PostgreSQL users out there with more than 100,000 tables per
server instance. This will make more of a difference to them.

More than I think people realize.

Joshua D. Drake

Show quoted text

--Josh

In reply to: Mark Mielke (#8)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Mark Mielke wrote:

saved - we're talking about 154 Kbytes saved on both those indices
combined. Minor? Major? I bet I wouldn't notice unless my database
requirements used up all RAM, and even then I'm suspecting it wouldn't
matter except for border line cases (like all pages required for
everything else happened to equal 4 Gbytes near exactly).

There is always only so much of 1st level and 2nd level cache; for those
the savings might well make a difference, even on multigigabyte
databases.
--
Sincerely,
Stephen R. van den Berg.

Life is that brief interlude between nothingness and eternity.

#16Josh Berkus
josh@agliodbs.com
In reply to: Mark Mielke (#8)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Mark,

Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of RAM
for my main PostgreSQL system that I manage seems like a drop in the
bucket. Even if 40% of pg_class_relname and pg_proc_proname indices was
saved - we're talking about 154 Kbytes saved on both those indices
combined. Minor? Major? I bet I wouldn't notice unless my database
requirements used up all RAM, and even then I'm suspecting it wouldn't
matter except for border line cases (like all pages required for
everything else happened to equal 4 Gbytes near exactly).

Again, I think the best way to test this would be to create an
installation with more than 100,000 tables & views. That's not
hypothetical; I've encountered it already twice in production users.

--Josh Berkus

#17Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#1)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Tom Lane napsal(a):

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

The problem what I see there is how to fit with in-place-upgrade. Catalog should
be generate from scratch, but if somebody uses name in regular table it invokes
request for reindex.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zdenek Kotala (#17)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Tom Lane napsal(a):

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

The problem what I see there is how to fit with in-place-upgrade. Catalog should
be generate from scratch, but if somebody uses name in regular table it invokes
request for reindex.

Actually, an existing index stored as "name" would continue to work
fine, so I think this could be worked around. But in any case name is
deprecated for user use, so anyone who suffers a reindex has only
themselves to blame.

regards, tom lane

#19Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#18)
Re: Dept of ugly hacks: eliminating padding space in system indexes

Tom Lane napsal(a):

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Tom Lane napsal(a):

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

The problem what I see there is how to fit with in-place-upgrade. Catalog should
be generate from scratch, but if somebody uses name in regular table it invokes
request for reindex.

Actually, an existing index stored as "name" would continue to work
fine, so I think this could be worked around. But in any case name is
deprecated for user use, so anyone who suffers a reindex has only
themselves to blame.

Yes, it is deprecated by you know a user. Give him a loaded shot-gun and he
start play a golf :-).

However, reindex is acceptable penalty for user who uses deprecated things.

Zdenek