Significantly larger toast tables on 8.4?
I just did a fresh import of my alpha database into 8.4 and noticed
that the size of the database had increased significantly:
8.4: 6.3G
8.3: 4.9G
Tracking it down, the main difference seems to be a toast table, namely this one:
ls -ltrh 8.3/base/16516/430156
-rw------- 1 postgres postgres 145M 2009-01-01 00:12 8.3/base/16516/430156
and the same toast table in 8.4:
ls -ltrh 8.4/base/16513/347706
-rw------- 1 postgres postgres 967M 2009-01-01 20:56 8.4/base/16513/347706
This table consists mainly of perl Storable binary blobs in bytea
columns. The schema looks like:
    Column     |           Type           |                       Modifiers
---------------+--------------------------+-------------------------------------------------------
 action        | text                     |
 die_id        | integer                  | not null default nextval('dies_die_id_seq'::regclass)
 cparam        | bytea                    |
 date_created  | timestamp with time zone | not null default now()
 db_profile    | bytea                    |
 debug         | bytea                    |
 defunct       | smallint                 | not null default 0
 env           | bytea                    |
 login         | bytea                    |
 msg           | text                     |
 open_user_id  | integer                  |
 page_load_id  | integer                  |
 session_id    | integer                  |
 state         | bytea                    |
 state_action  | bytea                    |
 user_id       | integer                  |
 whoops        | bytea                    |
Indexes:
    "dies_pkey" PRIMARY KEY, btree (die_id)
My hunch is it's related to
http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cdbits
or for the CVS inclined
http://archives.postgresql.org/pgsql-committers/2008-03/msg00121.php
But if anything that looks like it should help reduce size...
Ideas?
On Thu, Jan 1, 2009 at 21:30, Alex Hunsaker <badalex@gmail.com> wrote:
http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cdbits
... typoed that; it should be
http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cd
On Thu, Jan 1, 2009 at 21:30, Alex Hunsaker <badalex@gmail.com> wrote:
My hunch is it's related to
http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cd
or for the CVS inclined
http://archives.postgresql.org/pgsql-committers/2008-03/msg00121.php
But if anything that looks like it should help reduce size...
Looking at the patch, we don't compress things > 1M anymore, so I thought
maybe I was hitting that. But no luck: there are only 39 rows where
the row size is > 1M... with those 39 being about 22M each. Unless I
calculated something wrong. Oh, and CLUSTER and VACUUM don't seem to
help.
select count(1) from dies;
count
-------
52010
select count(1) from
( select
coalesce(pg_column_size(action), 0) +
coalesce(pg_column_size(die_id), 0) +
coalesce(pg_column_size(cparam), 0) +
coalesce(pg_column_size(date_created), 0) +
coalesce(pg_column_size(db_profile), 0) +
coalesce(pg_column_size(debug), 0) +
coalesce(pg_column_size(defunct), 0) +
coalesce(pg_column_size(env), 0) +
coalesce(pg_column_size(login), 0) +
coalesce(pg_column_size(msg), 0) +
coalesce(pg_column_size(open_user_id), 0) +
coalesce(pg_column_size(page_load_id), 0) +
coalesce(pg_column_size(session_id), 0) +
coalesce(pg_column_size(state), 0) +
coalesce(pg_column_size(state_action), 0) +
coalesce(pg_column_size(user_id), 0) +
coalesce(pg_column_size(whoops), 0) as row_size
from dies ) as foo where foo.row_size > 1024*1024;
count
-------
39
BTW is there a "cooler" way to do this?
On Thu, Jan 1, 2009 at 22:44, Alex Hunsaker <badalex@gmail.com> wrote:
Looking at the patch, we don't compress things > 1M anymore, so I thought
maybe I was hitting that. But no luck: there are only 39 rows where
the row size is > 1M... with those 39 being about 22M each.
Oh my... 25 * 40 = 1000M
So I guess my question is are we going to recommend to people that
they manually compress their data just for 8.4?
Alex Hunsaker wrote:
On Thu, Jan 1, 2009 at 22:44, Alex Hunsaker <badalex@gmail.com> wrote:
Looking at the patch, we don't compress things > 1M anymore, so I thought
maybe I was hitting that. But no luck: there are only 39 rows where
the row size is > 1M... with those 39 being about 22M each.
Oh my... 25 * 40 = 1000M
So I guess my question is are we going to recommend to people that
they manually compress their data just for 8.4?
What seems to be hurting the most is the 1MB upper limit. What is the
rationale behind that limit?
What would be the downside to requiring compressibility instead?
--
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
"Stephen R. van den Berg" <srb@cuci.nl> writes:
What seems to be hurting the most is the 1MB upper limit. What is the
rationale behind that limit?
The argument was that compressing/decompressing such large chunks would
require a lot of CPU effort; also it would defeat attempts to fetch
subsections of a large string. In the past we've required people to
explicitly "ALTER TABLE SET STORAGE external" if they wanted to make
use of the substring-fetch optimization, but it was argued that this
would make that more likely to work automatically.
I'm not entirely convinced by Alex' analysis anyway; the only way
those 39 large values explain the size difference is if they are
*tremendously* compressible, like almost all zeroes. The toast
compressor isn't so bright that it's likely to get 10X compression
on typical data.
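For illustration, the size limit under discussion acts as one knob in a small filter. Here is a sketch of how such a strategy might look (invented names and assumed field meanings, not the actual pg_lzcompress.c code):

```python
# Hypothetical sketch of a pglz-style compression decision: a datum is
# stored compressed only if it falls inside a size window and the
# compressor saves at least min_comp_rate percent.
def should_store_compressed(raw_len, compressed_len,
                            min_input=32,           # tiny data: not worth it
                            max_input=1024 * 1024,  # the 1MB cap in question
                            min_comp_rate=25):      # minimum % savings
    if raw_len < min_input or raw_len > max_input:
        return False  # stored uncompressed (toasted externally if large)
    savings_pct = 100 * (raw_len - compressed_len) / raw_len
    return savings_pct >= min_comp_rate
```

On this sketch, a 22MB value fails the max_input test outright, no matter how compressible it is.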
regards, tom lane
On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
What seems to be hurting the most is the 1MB upper limit. What is the
rationale behind that limit?
The argument was that compressing/decompressing such large chunks would
require a lot of CPU effort; also it would defeat attempts to fetch
subsections of a large string. In the past we've required people to
explicitly "ALTER TABLE SET STORAGE external" if they wanted to make
use of the substring-fetch optimization, but it was argued that this
would make that more likely to work automatically.
I'm not entirely convinced by Alex' analysis anyway; the only way
those 39 large values explain the size difference is if they are
*tremendously* compressible, like almost all zeroes. The toast
compressor isn't so bright that it's likely to get 10X compression
on typical data.
I've seen gzip approach 10X on what was basically a large
tab-separated values file, but I agree that some more experimentation
to determine the real cause of the problem would be useful.
I am a little mystified by the apparent double standard regarding
compressibility. My suggestion that we disable compression for
pg_statistic columns was perfunctorily shot down even though I
provided detailed performance results demonstrating that it greatly
sped up query planning on toasted statistics and even though the space
savings from compression in that case are bound to be tiny.
Here, we have a case where the space savings are potentially much
larger, and the only argument against it is that someone might be
disappointed in the performance of substring operations, if they
happen to do any. What if they know that they don't want to do any
and want to get compression? Even if the benefit is only 1.5X on
their data rather than 10X, that seems like a pretty sane and useful
thing to want to do. It's easy to shut off compression if you don't
want it; if the system makes an arbitrary decision to disable it, how
do you get it back?
...Robert
"Robert Haas" <robertmhaas@gmail.com> writes:
On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm not entirely convinced by Alex' analysis anyway; the only way
those 39 large values explain the size difference is if they are
*tremendously* compressible, like almost all zeroes. The toast
compressor isn't so bright that it's likely to get 10X compression
on typical data.
I've seen gzip approach 10X on what was basically a large
tab-separated values file, but I agree that some more experimentation
to determine the real cause of the problem would be useful.
If I'm counting on my fingers correctly, you'd need to assume about
23X compression to get the reported size change...
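For what it's worth, the finger-counting can be reconstructed from the numbers upthread (all figures rounded, so this is only approximate):

```python
# Sizes reported earlier in the thread (approximate).
old_toast_mb = 145   # 8.3 toast file
new_toast_mb = 967   # 8.4 toast file
big_rows = 39        # rows with row_size > 1MB
big_row_mb = 22      # each, stored uncompressed on 8.4

growth = new_toast_mb - old_toast_mb   # 822 MB of new bloat
uncompressed = big_rows * big_row_mb   # 858 MB stored raw on 8.4
old_size = uncompressed - growth       # ~36 MB those rows took on 8.3
print(round(uncompressed / old_size))  # implied compression ratio, ~24x
```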
An easy way to prove or disprove the point would be to go into
src/backend/utils/adt/pg_lzcompress.c, and change the second entry
in strategy_default_data from "1024 * 1024" to "INT_MAX", then
re-load the table with the modified backend (or you could even
just do "CREATE TABLE foo AS SELECT * FROM existing_table" and
then compare sizes). If that puts the toast table size back where
it had been, then Alex correctly diagnosed the cause.
regards, tom lane
On Fri, Jan 2, 2009 at 09:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The argument was that compressing/decompressing such large chunks would
require a lot of CPU effort; also it would defeat attempts to fetch
subsections of a large string. In the past we've required people to
explicitly "ALTER TABLE SET STORAGE external" if they wanted to make
use of the substring-fetch optimization, but it was argued that this
would make that more likely to work automatically.
It is *way* faster. Here are some rough numbers:
$ cat q.sql
select length(substring(debug, 1024, 1024)) from dies where die_id = 295;
$ ./pgbench -T60 -n -f q.sql
8.4: 1532.327546
8.3: 21.295657
That's with 8.4 as an --enable-debug --enable-cassert build! (8.3 just
has --enable-debug.)
I'm not entirely convinced by Alex' analysis anyway; the only way
those 39 large values explain the size difference is if they are
*tremendously* compressible, like almost all zeroes. The toast
compressor isn't so bright that it's likely to get 10X compression
on typical data.
Well, it's certainly not all zeros, but those big rows are the same
message repeated (~5k times) until perl detected the endless recursion
and died... So it's not exactly typical or *useful* data either. The
other 51,971 rows (rows smaller than 1024*1024 bytes) account for 174MB
while those 39 account for the other 828MB.
Mainly I was just alarmed by the size increase. But I think the
pgbench numbers make me happy even if I never do a substring on that
data...
------------------
# 8.3
select pg_size_pretty(pg_column_size(debug)) from dies where die_id = 295;
pg_size_pretty
----------------
1256 kB
# 8.4
select pg_size_pretty(pg_column_size(debug)) from dies where die_id = 295;
pg_size_pretty
----------------
22 MB
select pg_size_pretty(sum(row_size)) from
( select
coalesce(pg_column_size(action), 0) +
coalesce(pg_column_size(die_id), 0) +
coalesce(pg_column_size(cparam), 0) +
coalesce(pg_column_size(date_created), 0) +
coalesce(pg_column_size(db_profile), 0) +
coalesce(pg_column_size(debug), 0) +
coalesce(pg_column_size(defunct), 0) +
coalesce(pg_column_size(env), 0) +
coalesce(pg_column_size(login), 0) +
coalesce(pg_column_size(msg), 0) +
coalesce(pg_column_size(open_user_id), 0) +
coalesce(pg_column_size(page_load_id), 0) +
coalesce(pg_column_size(session_id), 0) +
coalesce(pg_column_size(state), 0) +
coalesce(pg_column_size(state_action), 0) +
coalesce(pg_column_size(user_id), 0) +
coalesce(pg_column_size(whoops), 0) as row_size
from dies ) as foo where foo.row_size < 1024*1024;
pg_size_pretty
----------------
174 MB
select pg_size_pretty(sum(row_size)) from
( select
coalesce(pg_column_size(action), 0) +
coalesce(pg_column_size(die_id), 0) +
coalesce(pg_column_size(cparam), 0) +
coalesce(pg_column_size(date_created), 0) +
coalesce(pg_column_size(db_profile), 0) +
coalesce(pg_column_size(debug), 0) +
coalesce(pg_column_size(defunct), 0) +
coalesce(pg_column_size(env), 0) +
coalesce(pg_column_size(login), 0) +
coalesce(pg_column_size(msg), 0) +
coalesce(pg_column_size(open_user_id), 0) +
coalesce(pg_column_size(page_load_id), 0) +
coalesce(pg_column_size(session_id), 0) +
coalesce(pg_column_size(state), 0) +
coalesce(pg_column_size(state_action), 0) +
coalesce(pg_column_size(user_id), 0) +
coalesce(pg_column_size(whoops), 0) as row_size
from dies ) as foo where foo.row_size > 1024*1024;
pg_size_pretty
----------------
828 MB
On Fri, Jan 2, 2009 at 10:44, Robert Haas <robertmhaas@gmail.com> wrote:
Here, we have a case where the space savings are potentially much
larger, and the only argument against it is that someone might be
disappointed in the performance of substring operations, if they
happen to do any. What if they know that they don't want to do any
and want to get compression? Even if the benefit is only 1.5X on
their data rather than 10X, that seems like a pretty sane and useful
thing to want to do. It's easy to shut off compression if you don't
want it; if the system makes an arbitrary decision to disable it, how
do you get it back?
I think we could just add another toast storage type: alter table
alter column set storage compress; ? It seems overkill to expose
PGLZ_Strategy knobs per column...
On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Robert Haas" <robertmhaas@gmail.com> writes:
I've seen gzip approach 10X on what was basically a large
tab-separated values file, but I agree that some more experimentation
to determine the real cause of the problem would be useful.
If I'm counting on my fingers correctly, you'd need to assume about
23X compression to get the reported size change...
An easy way to prove or disprove the point would be to go into
src/backend/utils/adt/pg_lzcompress.c, and change the second entry
in strategy_default_data from "1024 * 1024" to "INT_MAX", then
re-load the table with the modified backend (or you could even
just do "CREATE TABLE foo AS SELECT * FROM existing_table" and
then compare sizes). If that puts the toast table size back where
it had been, then Alex correctly diagnosed the cause.
And the toast file size is *drum roll* 167M.
Alex Hunsaker wrote:
I think we could just add another toast storage type: alter table
alter column set storage compress; ? It seems overkill to expose
PGLZ_Strategy knobs per column...
Three things:
a. Shouldn't it in theory be possible to have a decompression algorithm
which is IO-bound because it decompresses faster than the disk can
supply the data? (On common current hardware).
b. Has the current algorithm been carefully benchmarked and/or optimised
and/or chosen to fit the IO-bound target as close as possible?
c. Are there any well-known pitfalls/objections which would prevent me from
changing the algorithm to something more efficient (read: IO-bound)?
--
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
Alex Hunsaker wrote:
I think we could just add another toast storage type: alter table
alter column set storage compress; ? It seems overkill to expose
PGLZ_Strategy knobs per column...
Three things:
a. Shouldn't it in theory be possible to have a decompression algorithm
which is IO-bound because it decompresses faster than the disk can
supply the data? (On common current hardware).
b. Has the current algorithm been carefully benchmarked and/or optimised
and/or chosen to fit the IO-bound target as close as possible?
c. Are there any well-known pitfalls/objections which would prevent me from
changing the algorithm to something more efficient (read: IO-bound)?
Any compression algorithm is going to require you to decompress the
entire string before extracting a substring at a given offset. When
the data is uncompressed, you can jump directly to the offset you want
to read. Even if the compression algorithm requires no overhead at
all, it's going to make the location of the data nondeterministic, and
therefore force additional disk reads.
...Robert
"Stephen R. van den Berg" <srb@cuci.nl> writes:
Alex Hunsaker wrote:
I think we could just add another toast storage type: alter table
alter column set storage compress; ? It seems overkill to expose
PGLZ_Strategy knobs per column...
Three things:
a. Shouldn't it in theory be possible to have a decompression algorithm
which is IO-bound because it decompresses faster than the disk can
supply the data? (On common current hardware).
We don't stream the data from disk through the decompressor. So whether it's
i/o bound or not, the time spent decompressing is in addition to the time
spent doing the i/o. The only way it wins is if the time saved on i/o due to
the smaller data size is greater than the time spent decompressing.
I think the right value for this setting is going to depend on the
environment. If the system is starved for cpu cycles then you won't want to
compress large data. If it's starved for i/o bandwidth but has spare cpu
cycles then you will.
If that's true then we really have to expose this parameter to users. There
won't be a single value that is appropriate for everyone.
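As a toy model of that trade-off (the throughput numbers below are made up purely for illustration):

```python
def compression_wins(raw_mb, ratio, disk_mb_s, decomp_mb_s):
    # Reading compressed data means fewer bytes off disk, plus
    # decompression time over the full uncompressed payload; compare
    # that against simply reading the raw bytes.
    t_raw = raw_mb / disk_mb_s
    t_compressed = (raw_mb / ratio) / disk_mb_s + raw_mb / decomp_mb_s
    return t_compressed < t_raw

# CPU-rich but I/O-starved box: compression wins.
print(compression_wins(100, 2.0, 100, 400))   # True
# Same CPU, much faster storage: compression loses.
print(compression_wins(100, 2.0, 1000, 400))  # False
```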
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Robert Haas wrote:
On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
Three things:
a. Shouldn't it in theory be possible to have a decompression algorithm
which is IO-bound because it decompresses faster than the disk can
supply the data? (On common current hardware).
b. Has the current algorithm been carefully benchmarked and/or optimised
and/or chosen to fit the IO-bound target as close as possible?
c. Are there any well-known pitfalls/objections which would prevent me from
changing the algorithm to something more efficient (read: IO-bound)?
Any compression algorithm is going to require you to decompress the
entire string before extracting a substring at a given offset. When
the data is uncompressed, you can jump directly to the offset you want
to read. Even if the compression algorithm requires no overhead at
all, it's going to make the location of the data nondeterministic, and
therefore force additional disk reads.
That shouldn't be insurmountable:
- I currently have difficulty imagining applications that actually do
lots of substring extractions from large compressible fields.
The most likely operation would be a table which contains tsearch
indexed large textfields, but those are unlikely to participate in
a lot of substring extractions.
- Even if substring operations were likely, I could envision a compressed
format which compresses in chunks of, say, 64KB that can then be
addressed randomly and decompressed independently.
--
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
On Fri, Jan 2, 2009 at 4:19 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
Robert Haas wrote:
On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
Three things:
a. Shouldn't it in theory be possible to have a decompression algorithm
which is IO-bound because it decompresses faster than the disk can
supply the data? (On common current hardware).
b. Has the current algorithm been carefully benchmarked and/or optimised
and/or chosen to fit the IO-bound target as close as possible?
c. Are there any well-known pitfalls/objections which would prevent me from
changing the algorithm to something more efficient (read: IO-bound)?
Any compression algorithm is going to require you to decompress the
entire string before extracting a substring at a given offset. When
the data is uncompressed, you can jump directly to the offset you want
to read. Even if the compression algorithm requires no overhead at
all, it's going to make the location of the data nondeterministic, and
therefore force additional disk reads.
That shouldn't be insurmountable:
- I currently have difficulty imagining applications that actually do
lots of substring extractions from large compressible fields.
The most likely operation would be a table which contains tsearch
indexed large textfields, but those are unlikely to participate in
a lot of substring extractions.
I completely agree. If your large text field has interior structure
with certain data items at certain positions, you'd presumably break
it into multiple fixed-width fields. If it doesn't, what's the use
case?
- Even if substring operations were likely, I could envision a compressed
format which compresses in chunks of, say, 64KB that can then be
addressed randomly and decompressed independently.
I think this would require some sort of indexing so that you could
find the page that contains the first bit of any particular chunk you
want to find, so it might be a bit complex to implement, and I expect
it would reduce compression ratios as well. I'm sure it could be
done, but I doubt it's worth the bother. If you're more concerned
about the speed with which you can access your data than the size of
it, you can and should turn compression off altogether.
...Robert
On Fri, Jan 02, 2009 at 03:35:18PM -0500, Robert Haas wrote:
Any compression algorithm is going to require you to decompress the
entire string before extracting a substring at a given offset. When
the data is uncompressed, you can jump directly to the offset you want
to read. Even if the compression algorithm requires no overhead at
all, it's going to make the location of the data nondeterministic, and
therefore force additional disk reads.
So you compromise. You split the data into, say, 1MB blobs and compress
each individually. Then if someone does a substring at offset 3MB you
can find it quickly. This mostly costs you next to nothing in
compression ratio.
Implementation, though, is harder. The size of the blobs is also
tunable; I imagine the optimal value will probably be around 100KB (12
blocks uncompressed).
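A minimal sketch of that scheme, with zlib standing in for pglz (chunk size and all names invented here):

```python
import zlib

def compress_chunked(data, chunk_size=1024 * 1024):
    # Compress each fixed-size slice of the input independently, so a
    # later substring only has to touch the chunks it overlaps.
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def substring(chunks, offset, length, chunk_size=1024 * 1024):
    # Decompress only the chunks covering [offset, offset + length).
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    piece = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    return piece[offset - first * chunk_size:][:length]
```

Random access then costs at most one extra chunk's worth of decompression, while the per-chunk ratio stays close to whole-blob compression for repetitive data.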
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
"Alex Hunsaker" <badalex@gmail.com> writes:
On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
An easy way to prove or disprove the point would be to go into
src/backend/utils/adt/pg_lzcompress.c, and change the second entry
in strategy_default_data from "1024 * 1024" to "INT_MAX",
And the toast file size is *drum roll* 167M.
Hmmm ... so that's a lot closer to the original 145M, but it still
seems like there's something else going on. It looks like the other
thing we changed that might result in not compressing things was to
increase the third entry (minimum compression rate) from 20% to 25%.
Could you try it with that value also changed back?
regards, tom lane
"Stephen R. van den Berg" <srb@cuci.nl> writes:
- I currently have difficulty imagining applications that actually do
lots of substring extractions from large compressible fields.
The code that's in there to make this happen was written by people who
needed the feature. They're going to be upset with you if you propose
disabling it.
regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes:
I think the right value for this setting is going to depend on the
environment. If the system is starved for cpu cycles then you won't want to
compress large data. If it's starved for i/o bandwidth but has spare cpu
cycles then you will.
If that's true then we really have to expose this parameter to users. There
won't be a single value that is appropriate for everyone.
Yeah. The commit message for these changes commented:
There was some discussion in the earlier threads of exposing some
of the compression knobs to users, perhaps even on a per-column
basis. I have not done anything about that here. It seems to me
that if we are changing around the parameters, we'd better get some
experience and be sure we are happy with the design before we set
things in stone by providing user-visible knobs.
and I'm still pretty worried about the longevity of any knob we put in
here. But we might not have a lot of choice.
It would be fairly easy, I think, to add some reloption fields that
would let these parameters be controlled on a per-table level.
Per-column would be much more painful; do we really need that?
regards, tom lane