TOAST usage setting

Started by Bruce Momjian · almost 19 years ago · 44 messages · hackers
#1Bruce Momjian
bruce@momjian.us

bruce wrote:

* Re: [HACKERS] Modifying TOAST thresholds /Tom Lane/

At this point it seems nothing will be done about this issue for 8.3.

I'm not sure anyone has an idea how to test it. TPCC isn't really useful
because it has a fixed size (500 byte) string buffer. Perhaps if we modified
it to have a random string length uniformly distributed between 0-2k? But
even then it never does any scans based on that buffer. But the problem with
going with something more natural is that it'll be harder to tell exactly what
it's testing.

My idea on this was to create two backends, one with the default TOAST
value, and a second with a value of 50 bytes. Create a table with one
TEXT field, and several other columns, each column < 50 bytes.

Then, fill the table with random data (script attached that might help),
and then try 2000, 1500, 1000, etc., bytes in the TEXT column for each row
(use random data so the compression code doesn't shrink it). Then run a
test with both backends accessing the TEXT column and non-TEXT column and
measure the difference between the two backends, i.e. the backend with a
TOAST value of 50 should show faster access on the non-TEXT field, but
slower access on the TEXT field.

Then, figure out where the gains on the non-TEXT field seem to diminish
in usefulness. Basically, with a lower TOAST value, we are going to
spend more time accessing the TEXT field, but the speedup for the
non-TEXT field should be a large enough win that we don't care. As the
TEXT column becomes shorter, it has less effect on the non-TEXT access.

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#2Chris Browne
cbbrowne@acm.org
In reply to: Bruce Momjian (#1)
Re: TOAST usage setting

bruce@momjian.us (Bruce Momjian) writes:

The results are here:

http://momjian.us/expire/TOAST/

I'll take a look and see if there's anything further it makes sense
for me to try testing. Thanks for following up so quickly; what with
the cold I have had, I haven't yet gotten back to the office.
--
"cbbrowne","@","acm.org"
http://cbbrowne.com/info/languages.html
I've had a perfectly wonderful evening. But this wasn't it.
-- Groucho Marx

#3Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#1)
Re: TOAST usage setting

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's
spelled)? And what size long values were you actually storing? How did you
generate them?

I wonder if what's happening is that you have large chunks which when stored
inline are leaving lots of dead space in the table. Ie, if you're generating
values with size near 2k and the default chunk size you would expect to find
an average of 1k dead space per page, or a 12.5% drain on performance. As you
lower the chunk size you decrease that margin.

However I agree that it's hard to believe that the costs of random access
wouldn't swamp that 12.5% overhead pretty quickly.

One query I used when measuring the impact of the variable varlena stuff was
this which gives the distribution of tuples/page over a table:

SELECT count(*), n
  FROM (SELECT count(*) AS n
          FROM foo
         GROUP BY (point_in(tidout(ctid)))[0]
       ) as x
 GROUP BY n;

Which might help you peek at what's going on. You could also combine
pg_column_size(foo.*) to measure the size of the tuple. I think that will
measure the size of the tuple as is before the columns are detoasted.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#4Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#3)
Re: TOAST usage setting

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's
spelled)? And what size long values were you actually storing? How did you
generate them?

Please look at the script sqltest.sh at that URL. I did not modify
TOAST_MAX_CHUNK_SIZE, but it changes based on TOAST_TUPLES_PER_PAGE,
which I did change.

I wonder if what's happening is that you have large chunks which when stored
inline are leaving lots of dead space in the table. Ie, if you're generating
values with size near 2k and the default chunk size you would expect to find
an average of 1k dead space per page, or a 12.5% drain on performance. As you
lower the chunk size you decrease that margin.

Well, that could be it, but effectively that is what would happen in the
real world too.

However I agree that it's hard to believe that the costs of random access
wouldn't swamp that 12.5% overhead pretty quickly.

One query I used when measuring the impact of the variable varlena stuff was
this which gives the distribution of tuples/page over a table:

SELECT count(*), n
  FROM (SELECT count(*) AS n
          FROM foo
         GROUP BY (point_in(tidout(ctid)))[0]
       ) as x
 GROUP BY n;

Which might help you peek at what's going on. You could also combine
pg_column_size(foo.*) to measure the size of the tuple. I think that will
measure the size of the tuple as is before the columns are detoasted.

Please use my test script and see what you find.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#5Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#4)
Re: TOAST usage setting

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Also, I think you have to run this for small datasets like you have as well
as large data sets where the random access seek time of TOAST will really hurt.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#6Gregory Stark
stark@enterprisedb.com
In reply to: Gregory Stark (#5)
Re: TOAST usage setting

"Gregory Stark" <stark@enterprisedb.com> writes:

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Of course I got that wrong. The question is "is your database in a single-byte
encoding" which isn't the same as being in C locale.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#7Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#5)
Re: TOAST usage setting

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

Also, I think you have to run this for small datasets like you have as well
as large data sets where the random access seek time of TOAST will really hurt.

Well, if everything doesn't fit in the cache, then the smaller heap will
be a bigger win for non-TOAST access, so some of that overhead balances
out. Let me get in-cache numbers and then I can look at larger data sets.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#7)
Re: TOAST usage setting

Bruce Momjian wrote:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of 2k (default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

Be careful about the compression! It might be a good idea to run the
test once with the column set to uncompressible (SET STORAGE EXTERNAL?),
and again with it as compressible.

--
Alvaro Herrera Valdivia, Chile ICBM: S 39° 49' 18.1", W 73° 13' 56.4"
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.

#9Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#8)
Re: TOAST usage setting

Alvaro Herrera wrote:

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

Be careful about the compression! It might be a good idea to run the
test once with the column set to uncompressible (SET STORAGE EXTERNAL?),
and again with it as compressible.

My test uses random data, which I figured was as close to real-world as I
could get, and I have a test that makes sure the data was pushed to the
TOAST table. Should I still try EXTERNAL?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#9)
Re: TOAST usage setting

Bruce Momjian wrote:

My test uses random data, which I figured was as close to real-world as I
could get, and I have a test that makes sure the data was pushed to the
TOAST table. Should I still try EXTERNAL?

My point is that you probably want to measure separately the effect of
compression from that of the external storage.

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"La tristeza es un muro entre dos jardines" (Khalil Gibran)

#11Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#7)
Re: TOAST usage setting

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

No, we did substring() too :)

You could try hashtext()

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#12Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#10)
Re: TOAST usage setting

Alvaro Herrera wrote:

Bruce Momjian wrote:

My test uses random data, which I figured was as close to real-world as I
could get, and I have a test that makes sure the data was pushed to the
TOAST table. Should I still try EXTERNAL?

My point is that you probably want to measure separately the effect of
compression from that of the external storage.

OK, I will get to that if we need it later.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#11)
Re: TOAST usage setting

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:

if (pg_database_encoding_max_length() == 1)
    PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length(). OK, will run the test with
substring(t,1,1).

No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#14Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#13)
Re: TOAST usage setting

"Bruce Momjian" <bruce@momjian.us> writes:

No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

Sure but it'll only access the first chunk. There are two chunks in your test.
It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
2 chunks (hashtext()).

Or if you're concerned with the cpu cost of hashtext you could calculate the
precise two bytes you need to access with substr to force it to load both
chunks. But I think the real cost of unnecessary toasting is the random disk
i/o so the cpu cost is of secondary interest.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#15Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#14)
Re: TOAST usage setting

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

Sure but it'll only access the first chunk. There are two chunks in your test.
It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
2 chunks (hashtext()).

Or if you're concerned with the cpu cost of hashtext you could calculate the
precise two bytes you need to access with substr to force it to load both
chunks. But I think the real cost of unnecessary toasting is the random disk
i/o so the cpu cost is of secondary interest.

OK, will run a test with hashtext(). What I am seeing now is a 10-20x
slowdown to access the TOAST data, and a 0-1x speedup for accessing the
non-TOAST data when the rows are long:

http://momjian.us/expire/TOAST/

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#15)
Re: TOAST usage setting

Bruce Momjian wrote:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

Sure but it'll only access the first chunk. There are two chunks in your test.
It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
2 chunks (hashtext()).

Or if you're concerned with the cpu cost of hashtext you could calculate the
precise two bytes you need to access with substr to force it to load both
chunks. But I think the real cost of unnecessary toasting is the random disk
i/o so the cpu cost is of secondary interest.

OK, will run a test with hashtext(). What I am seeing now is a 10-20x
slowdown to access the TOAST data, and a 0-1x speedup for accessing the
non-TOAST data when the rows are long:

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:

http://momjian.us/expire/TOAST/

What you will see is that pushing TEXT to a TOAST column allows quick
access to non-TOAST values and single-row TOAST values, but accessing
all TOAST columns is slower than accessing them in the heap, by a factor
of 3-18x.

Looking at the chart, it seems 512 is the proper breakpoint for TOAST
because 512 gives us a 2x change in accessing non-TOAST values and
single-row TOAST values, and it is only 2x slower to access all TOAST
values than we have now.

Of course, this has all the data in the cache, but if the cache is
limited, pushing more to TOAST is going to be a bigger win. In general,
I would guess that the number of times all >512 byte rows are accessed
is much less than the number of times that pushing those >512 byte
values to TOAST will give a speedup.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
Re: TOAST usage setting

Bruce Momjian <bruce@momjian.us> writes:

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
http://momjian.us/expire/TOAST/

I don't understand what the numbers in this chart are?

regards, tom lane

#18Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#17)
Re: TOAST usage setting

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
http://momjian.us/expire/TOAST/

I don't understand what the numbers in this chart are?

They are taken from the test script and output files that are also in
that directory. They are in milliseconds, so higher is slower.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#19Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#16)
Re: TOAST usage setting

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:

http://momjian.us/expire/TOAST/

What you will see is that pushing TEXT to a TOAST column
allows quick access to non-TOAST values and single-row TOAST
values, but accessing all TOAST columns is slower than
accessing them in the heap, by a factor of 3-18x.

Um, my understanding for this measurement would be to tweak
TOAST_TUPLE_THRESHOLD and use a fixed TOAST_MAX_CHUNK_SIZE?
Why would you want to increase the number of toast tuples required for
one column value?

My expectation would be that we want to allow a toast tuple to fill a
whole page (TOAST_TUPLES_PER_PAGE = 1), but keep or reduce the
threshold. Maybe we need other logic to find and use free space for
that, though (8.4 material).

Andreas

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#19)
Re: TOAST usage setting

"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:

My expectation would be, that we want to allow a toast tuple to fill a
whole page (TOAST_TUPLES_PER_PAGE = 1),

I've been wondering about that too. It certainly needs to be
experimented with, independently of TOAST_TUPLE_THRESHOLD, now that
we can vary the two numbers independently.

regards, tom lane

#21Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#22)
#24Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#22)
#25Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#22)
#26Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#25)
#27Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Bruce Momjian (#16)
#28Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#26)
#29Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#29)
#31Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#28)
#32Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#30)
#33Bruce Momjian
bruce@momjian.us
In reply to: Zeugswetter Andreas SB SD (#31)
#34Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#34)
#36Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Jim Nasby (#34)
#37Bruce Momjian
bruce@momjian.us
In reply to: Zeugswetter Andreas SB SD (#36)
#38Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#37)
#39Bruce Momjian
bruce@momjian.us
In reply to: Zeugswetter Andreas SB SD (#38)
#40Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#39)
#41Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#39)
#42Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#40)
#43Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#42)
#44Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Zeugswetter Andreas SB SD (#36)