TOAST usage setting

Started by Bruce Momjian · almost 19 years ago · 44 messages · hackers
#1Bruce Momjian
bruce@momjian.us

bruce wrote:

* Re: [HACKERS] Modifying TOAST thresholds /Tom Lane/

At this point it seems nothing will be done about this issue for 8.3.

I'm not sure anyone has an idea how to test it. TPCC isn't really useful
because it has a fixed size (500 byte) string buffer. Perhaps if we modified
it to have a random string length uniformly distributed between 0-2k? But
even then it never does any scans based on that buffer. But the problem with
going with something more natural is that it'll be harder to tell exactly what
it's testing.

My idea on this was to create two backends, one with the default TOAST
value, and a second with a value of 50 bytes. Create a table with one
TEXT field, and several other columns, each column < 50 bytes.

Then, fill the table with random data (script attached that might help),
and then try 2000, 1500, 1000, etc., bytes in the TEXT column for each row
(use random data so the compression code doesn't shrink it). Then run a
test with both backends accessing the TEXT column and non-TEXT column and
measure the difference between the two backends, i.e. the backend with a
TOAST value of 50 should show faster access on the non-TEXT field, but
slower access on the TEXT field.

Then, figure out where the gains on the non-TEXT field seem to diminish
in usefulness. Basically, with a lower TOAST value, we are going to
spend more time accessing the TEXT field, but the speedup for the
non-TEXT field should be a large enough win that we don't care. As the
TEXT column becomes shorter, it has less effect on the non-TEXT access.

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#2Chris Browne
cbbrowne@acm.org
In reply to: Bruce Momjian (#1)
Re: TOAST usage setting

bruce@momjian.us (Bruce Momjian) writes:

The results are here:

http://momjian.us/expire/TOAST/

I'll take a look and see if there's anything further it makes sense
for me to try testing. Thanks for following up so quickly; what with
the cold I have had, I haven't yet gotten back to the office.
--
"cbbrowne","@","acm.org"
http://cbbrowne.com/info/languages.html
I've had a perfectly wonderful evening. But this wasn't it.
-- Groucho Marx

#3Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#1)
Re: TOAST usage setting

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's
spelled)? And what size long values were you actually storing? How did you
generate them?

I wonder if what's happening is that you have large chunks which when stored
inline are leaving lots of dead space in the table. Ie, if you're generating
values with size near 2k and the default chunk size you would expect to find
an average of 1k dead space per page, or a 12.5% drain on performance. As you
lower the chunk size you decrease that margin.

However I agree that it's hard to believe that the costs of random access
wouldn't swamp that 12.5% overhead pretty quickly.

One query I used when measuring the impact of the variable varlena stuff was
this which gives the distribution of tuples/page over a table:

SELECT count(*), n
  FROM (SELECT count(*) AS n
          FROM foo
         GROUP BY (point_in(tidout(ctid)))[0]
       ) as x
 GROUP BY n;

Which might help you peek at what's going on. You could also combine
pg_column_size(foo.*) to measure the size of the tuple. I think that will
measure the size of the tuple as is before the columns are detoasted.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#4Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#3)
Re: TOAST usage setting

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's
spelled)? And what size long values were you actually storing? How did you
generate them?

Please look at the script sqltest.sh at that URL. I did not modify
TOAST_MAX_CHUNK_SIZE, but it changes based on TOAST_TUPLES_PER_PAGE,
which I did change.

I wonder if what's happening is that you have large chunks which when stored
inline are leaving lots of dead space in the table. Ie, if you're generating
values with size near 2k and the default chunk size you would expect to find
an average of 1k dead space per page, or a 12.5% drain on performance. As you
lower the chunk size you decrease that margin.

Well, that could be it, but effectively that is what would happen in the
real world too.

However I agree that it's hard to believe that the costs of random access
wouldn't swamp that 12.5% overhead pretty quickly.

One query I used when measuring the impact of the variable varlena stuff was
this which gives the distribution of tuples/page over a table:

SELECT count(*), n
  FROM (SELECT count(*) AS n
          FROM foo
         GROUP BY (point_in(tidout(ctid)))[0]
       ) as x
 GROUP BY n;

Which might help you peek at what's going on. You could also combine
pg_column_size(foo.*) to measure the size of the tuple. I think that will
measure the size of the tuple as is before the columns are detoasted.

Please use my test script and see what you find.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#5Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#4)
Re: TOAST usage setting

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Also, I think you have to run this for small datasets like you have as well
as large data sets where the random access seek time of TOAST will really hurt.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#6Gregory Stark
stark@enterprisedb.com
In reply to: Gregory Stark (#5)
Re: TOAST usage setting

"Gregory Stark" <stark@enterprisedb.com> writes:

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Of course I got that wrong. The question is "is your database in a single-byte
encoding" which isn't the same as being in C locale.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#7Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#5)
Re: TOAST usage setting

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

Also, I think you have to run this for small datasets like you have as well
as large data sets where the random access seek time of TOAST will really hurt.

Well, if everything doesn't fit in the cache, then the smaller heap will
be a bigger win for non-TOAST access, so some of that overhead balances
out. Let me get in-cache numbers and then I can look at larger data sets.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#7)
Re: TOAST usage setting

Bruce Momjian wrote:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

Be careful about the compression! It might be a good idea to run the
test once with the column set to uncompressible (SET STORAGE EXTERNAL?),
and again with it as compressible.

--
Alvaro Herrera Valdivia, Chile ICBM: S 39° 49' 18.1", W 73° 13' 56.4"
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.

#9Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#8)
Re: TOAST usage setting

Alvaro Herrera wrote:

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

Be careful about the compression! It might be a good idea to run the
test once with the column set to uncompressible (SET STORAGE EXTERNAL?),
and again with it as compressible.

My test uses random data, which I figured was as close to real-world as I
could get, and I have a test that makes sure the data was pushed to the
TOAST table. Should I still try EXTERNAL?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#9)
Re: TOAST usage setting

Bruce Momjian wrote:

My test uses random data, which I figured was as close to real-world as I
could get, and I have a test that makes sure the data was pushed to the
TOAST table. Should I still try EXTERNAL?

My point is that you probably want to measure separately the effect of
compression from that of the external storage.

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"La tristeza es un muro entre dos jardines" (Khalil Gibran)

#11Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#7)
Re: TOAST usage setting

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

No, we did substring() too :)

You could try hashtext()

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#12Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#10)
Re: TOAST usage setting

Alvaro Herrera wrote:

Bruce Momjian wrote:

My test uses random data, which I figured was as close to real-world as I
could get, and I have a test that makes sure the data was pushed to the
TOAST table. Should I still try EXTERNAL?

My point is that you probably want to measure separately the effect of
compression from that of the external storage.

OK, I will get to that if we need it later.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#11)
Re: TOAST usage setting

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#14Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#13)
Re: TOAST usage setting

"Bruce Momjian" <bruce@momjian.us> writes:

No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

Sure but it'll only access the first chunk. There are two chunks in your test.
It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
2 chunks (hashtext()).

Or if you're concerned with the cpu cost of hashtext you could calculate the
precise two bytes you need to access with substr to force it to load both
chunks. But I think the real cost of unnecessary toasting is the random disk
i/o so the cpu cost is of secondary interest.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#15Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#14)
Re: TOAST usage setting

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

Sure but it'll only access the first chunk. There are two chunks in your test.
It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
2 chunks (hashtext()).

Or if you're concerned with the cpu cost of hashtext you could calculate the
precise two bytes you need to access with substr to force it to load both
chunks. But I think the real cost of unnecessary toasting is the random disk
i/o so the cpu cost is of secondary interest.

OK, will run a test with hashtext(). What I am seeing now is a 10-20x
slowdown to access the TOAST data, and a 0-1x speedup for accessing the
non-TOAST data when the rows are long:

http://momjian.us/expire/TOAST/

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#15)
Re: TOAST usage setting

Bruce Momjian wrote:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

Sure but it'll only access the first chunk. There are two chunks in your test.
It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
2 chunks (hashtext()).

Or if you're concerned with the cpu cost of hashtext you could calculate the
precise two bytes you need to access with substr to force it to load both
chunks. But I think the real cost of unnecessary toasting is the random disk
i/o so the cpu cost is of secondary interest.

OK, will run a test with hashtext(). What I am seeing now is a 10-20x
slowdown to access the TOAST data, and a 0-1x speedup for accessing the
non-TOAST data when the rows are long:

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:

http://momjian.us/expire/TOAST/

What you will see is that pushing TEXT to a TOAST column allows quick
access to non-TOAST values and single-row TOAST values, but accessing
all TOAST columns is slower than accessing them in the heap, by a factor
of 3-18x.

Looking at the chart, it seems 512 is the proper breakpoint for TOAST
because 512 gives us a 2x change in accessing non-TOAST values and
single-row TOAST values, and it is only 2x slower to access all TOAST
values than we have now.

Of course, this has all the data in the cache, but if the cache is
limited, pushing more to TOAST is going to be a bigger win. In general,
I would guess that the number of times all >512 byte rows are accessed
is much less than the number of times that pushing those >512 byte
values to TOAST will give a speedup.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
Re: TOAST usage setting

Bruce Momjian <bruce@momjian.us> writes:

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
http://momjian.us/expire/TOAST/

I don't understand what the numbers in this chart are?

regards, tom lane

#18Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#17)
Re: TOAST usage setting

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
http://momjian.us/expire/TOAST/

I don't understand what the numbers in this chart are?

They are taken from the test script and output files that are also in
that directory. They are in milliseconds, so higher is slower.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#19Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#16)
Re: TOAST usage setting

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:

http://momjian.us/expire/TOAST/

What you will see is that pushing TEXT to a TOAST column
allows quick access to non-TOAST values and single-row TOAST
values, but accessing all TOAST columns is slower than
accessing them in the heap, by a factor of 3-18x.

Um, my understanding for this measurement would be to tweak
TOAST_TUPLE_THRESHOLD and use a fixed TOAST_MAX_CHUNK_SIZE?
Why would you want to increase the number of toast tuples required for
one column value?

My expectation would be that we want to allow a toast tuple to fill a
whole page (TOAST_TUPLES_PER_PAGE = 1), but keep or reduce the
threshold. Maybe we need other logic to find and use free space for
that, though (8.4 material).

Andreas

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#19)
Re: TOAST usage setting

"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:

My expectation would be, that we want to allow a toast tuple to fill a
whole page (TOAST_TUPLES_PER_PAGE = 1),

I've been wondering about that too. It certainly needs to be
experimented with, independently of TOAST_TUPLE_THRESHOLD, now that
we can vary the two numbers independently.

regards, tom lane

#21Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#22)
#24Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#22)
#25Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#22)
#26Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#25)
#27Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Bruce Momjian (#16)
#28Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#26)
#29Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#29)
#31Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#28)
#32Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#30)
#33Bruce Momjian
bruce@momjian.us
In reply to: Zeugswetter Andreas SB SD (#31)
#34Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#34)
#36Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Jim Nasby (#34)
#37Bruce Momjian
bruce@momjian.us
In reply to: Zeugswetter Andreas SB SD (#36)
#38Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#37)
#39Bruce Momjian
bruce@momjian.us
In reply to: Zeugswetter Andreas SB SD (#38)
#40Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#39)
#41Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#39)
#42Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#40)
#43Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#42)
#44Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Zeugswetter Andreas SB SD (#36)