Optimize partial TOAST decompression

Started by Binguo Bao, almost 7 years ago. 30 messages. pgsql-hackers.
#1 Binguo Bao <djydewang@gmail.com>

Hi, hackers!
I'm a student participating in GSoC 2019 and my project is related to TOAST
slices.
When I'm getting familiar with the postgresql codebase, I find that
PG_DETOAST_DATUM_SLICE, when run on a compressed TOAST entry, will fetch
all compressed data chunks and then extract the relevant slice. Obviously this
is unnecessary; we only need to fetch the data chunks we need.

The patch optimizes partial TOAST decompression.
For an example of the possible improvement, consider this trivial case:
---------------------------------------------------------------------
create table slicingtest (
id serial primary key,
a text
);

insert into slicingtest (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000) as a from
generate_series(1,100);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;
---------------------------------------------------------------------
environment: Linux 4.15.0-33-generic #36~16.04.1-Ubuntu x86_64 GNU/Linux
On master, I get
Time: 28.123 ms (average of ten runs)
With the patch, I get
Time: 2.306 ms (average of ten runs)

That is roughly a 10x improvement. The more TOAST data chunks an entry has,
the bigger the gain should be; in this example there are about 200 TOAST
data chunks per entry.

Related discussion:
/messages/by-id/CACowWR07EDm7Y4m2kbhN_jnys=BBf9A6768RyQdKm_=NpkcaWg@mail.gmail.com

Best regards, Binguo Bao.

Attachments:

0001-Optimize-partial-TOAST-decompression.patch (text/x-patch, +14 -8)
#2 Andrey Borodin <amborodin@acm.org>
In reply to: Binguo Bao (#1)
Re: Optimize partial TOAST decompression

Hi, Binguo!

On 2 Jun 2019, at 19:48, Binguo Bao <djydewang@gmail.com> wrote:

Hi, hackers!

....

This seems to have a 10x improvement. If the number of toast data chunks is more, I believe that patch can play a greater role, there are about 200 related TOAST data chunks for each entry in the case.

That's really cool that you could produce a meaningful patch long before the end of GSoC!

I'll describe what is going on a little:
1. We have a compressed value, which resides in the TOAST table.
2. We want only some fraction of this value: a prefix of length L.
3. Previously, Paul Ramsey submitted a patch that skips decompression of the value beyond the desired L bytes.
4. Binguo's patch additionally tries not to fetch compressed data that the decompressor will not need. In fact it fetches L bytes from the TOAST table.

This is not correct: L bytes of compressed data cannot always be decoded into at least L bytes of output. At worst we have one control byte per 8 literal bytes, which means we may need up to (L*9 + 8) / 8 compressed bytes with the current pglz format.
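That bound can be sketched as a small C helper (illustrative only; the function name and shape here are mine, not code from the patch):

```c
#include <stdint.h>

/*
 * Worst case for pglz: one control byte per 8 literal bytes, so to be
 * able to decompress raw_slice_size bytes of output we may need up to
 * (raw_slice_size * 9 + 8) / 8 compressed bytes.  64-bit math avoids
 * int32 overflow, and we never need more than the whole compressed datum.
 */
static int32_t
max_compressed_size(int32_t raw_slice_size, int32_t total_compressed_size)
{
    int64_t bound = ((int64_t) raw_slice_size * 9 + 8) / 8;

    return (bound < total_compressed_size) ? (int32_t) bound
                                           : total_compressed_size;
}
```

For example, a 20-byte prefix needs at most (20*9 + 8) / 8 = 23 compressed bytes.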

Also, I'm not sure you use SET_VARSIZE_COMPRESSED correctly...

Best regards, Andrey Borodin.

#3 Binguo Bao <djydewang@gmail.com>
In reply to: Andrey Borodin (#2)
Re: Optimize partial TOAST decompression

This is not correct: L bytes of compressed data cannot always be decoded into at least L bytes of output. At worst we have one control byte per 8 literal bytes, which means we may need up to (L*9 + 8) / 8 compressed bytes with the current pglz format.

Good catch! I've corrected the related code in the patch.

Also, I'm not sure you use SET_VARSIZE_COMPRESSED correctly...

I followed the code in the toast_fetch_datum function [1] and didn't see
anything wrong with it.

Best regards, Binguo Bao

[1]: https://github.com/postgres/postgres/blob/master/src/backend/access/heap/tuptoaster.c#L1898


Attachments:

0001-Optimize-partial-TOAST-decompression-2.patchtext/x-patch; charset=US-ASCII; name=0001-Optimize-partial-TOAST-decompression-2.patchDownload+27-8
#4 Andrey Borodin <amborodin@acm.org>
In reply to: Binguo Bao (#3)
Re: Optimize partial TOAST decompression

Hi!
Please do not use top-posting, i.e. the reply style where you quote the whole message below your response. It makes reading the archives tedious.

On 24 Jun 2019, at 7:53, Binguo Bao <djydewang@gmail.com> wrote:

This is not correct: L bytes of compressed data cannot always be decoded into at least L bytes of output. At worst we have one control byte per 8 literal bytes, which means we may need up to (L*9 + 8) / 8 compressed bytes with the current pglz format.

Good catch! I've corrected the related code in the patch.
...
<0001-Optimize-partial-TOAST-decompression-2.patch>

I've taken a look at the code.
I think we should extract a function that computes max_compressed_size and put it alongside the pglz code, so that anyone who later changes pglz will not forget about this assumption of the compression format.

Also, I suggest using 64-bit arithmetic to avoid overflows. And I think it is worth checking whether max_compressed_size covers the whole datum, and using the minimum of (max_compressed_size, uncompressed_data_size).

Also, you declared needsize and max_compressed_size too far from where they are used. But that will be solved by the function extraction anyway.

Thanks!

Best regards, Andrey Borodin.

#5 Binguo Bao <djydewang@gmail.com>
In reply to: Andrey Borodin (#4)
Re: Optimize partial TOAST decompression

Hi!


Thanks for the suggestion.
I've extracted a function that computes max_compressed_size and put it into
pg_lzcompress.c.

Best regards, Binguo Bao.

Attachments:

0001-Optimize-partial-TOAST-decompression-3.patch (text/x-patch, +40 -8)
#6 Paul Ramsey <pramsey@cleverelephant.ca>
In reply to: Binguo Bao (#5)
Re: Optimize partial TOAST decompression

On Mon, Jul 1, 2019 at 6:46 AM Binguo Bao <djydewang@gmail.com> wrote:


Thanks for the suggestion.
I've extracted a function that computes max_compressed_size and put it into pg_lzcompress.c.

This looks good to me. A little commentary around why
pglz_maximum_compressed_size() returns a universally correct answer
(there's no way the compressed size can ever be larger than this
because...) would be nice for peasants like myself.

If you're looking to continue down this code line in your next patch,
the next TODO item is a little more involved: a user-land (a la
PG_DETOAST_DATUM) iterator API for accessing TOAST datums would allow
optimizing searches of large objects like JSONB types, where the thing
you are looking for is not at a known location in the object: looking
for a particular substring in a string, say, or for a particular key in
a JSONB. "Iterate until you find the thing" would allow optimizing code
paths that currently require full decompression of the objects.
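The "iterate until you find the thing" pattern can be sketched in miniature (a toy model, not the PostgreSQL TOAST API; the chunk size and all names here are invented): the consumer pulls chunks on demand and stops as soon as it finds its target, so trailing chunks are never fetched.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CHUNK_SIZE 4        /* real TOAST chunks are ~2000 bytes */

typedef struct
{
    const char *data;           /* full value; stands in for the TOAST table */
    size_t      len;
    size_t      pos;            /* next offset to hand out */
    int         chunks_fetched; /* how many chunks were actually pulled */
} ToyIterator;

/* Return a pointer to the next chunk and its length, or NULL when done. */
static const char *
toy_iter_next(ToyIterator *it, size_t *chunk_len)
{
    const char *p;

    if (it->pos >= it->len)
        return NULL;
    *chunk_len = (it->len - it->pos < TOY_CHUNK_SIZE) ? it->len - it->pos
                                                      : TOY_CHUNK_SIZE;
    p = it->data + it->pos;
    it->pos += *chunk_len;
    it->chunks_fetched++;
    return p;
}

/* "Iterate until you find the thing": return the offset of byte c,
 * or -1, fetching only as many chunks as needed. */
static long
toy_find_byte(ToyIterator *it, char c)
{
    const char *chunk;
    size_t      n;
    size_t      base = 0;

    while ((chunk = toy_iter_next(it, &n)) != NULL)
    {
        const char *hit = memchr(chunk, c, n);

        if (hit != NULL)
            return (long) (base + (size_t) (hit - chunk));
        base += n;
    }
    return -1;
}
```

Searching a 10-byte value split into 4-byte chunks for a byte that sits in the first chunk touches only one chunk instead of three; a substring or JSONB key search could stop early the same way.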

P.

#7 Binguo Bao <djydewang@gmail.com>
In reply to: Paul Ramsey (#6)
Re: Optimize partial TOAST decompression


Thanks for your comment. I've updated the patch.
As for the iterator API, I've actually already implemented a de-TOAST
iterator [0], and I'm looking for more application scenarios for it while
polishing it. Any comments would be much appreciated.

Best Regards, Binguo Bao.

[0]: /messages/by-id/CAL-OGks_onzpc9M9bXPCztMofWULcFkyeCeKiAgXzwRL8kXiag@mail.gmail.com

Attachments:

0001-Optimize-partial-TOAST-decompression-4.patch (text/x-patch, +44 -8)
#8 Andrey Borodin <amborodin@acm.org>
In reply to: Binguo Bao (#7)
Re: Optimize partial TOAST decompression

On 3 Jul 2019, at 18:06, Binguo Bao <djydewang@gmail.com> wrote:


Thanks for your comment. I've updated the patch.

Thanks, Binguo and Paul! From my POV the patch looks ready for committer, so I've switched its state in the CF app.

Best regards, Andrey Borodin.

#9 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Andrey Borodin (#8)
Re: Optimize partial TOAST decompression

On Thu, Jul 04, 2019 at 11:10:24AM +0200, Andrey Borodin wrote:


Thanks, Binguo and Paul! From my POV the patch looks ready for committer, so I've switched its state in the CF app.

I've done a bit of testing and benchmarking on this patch today, and
there's a bug somewhere that makes it look like the data is corrupted.

What I'm seeing is this:

CREATE TABLE t (a text);

-- attached is data for one row
COPY t FROM '/tmp/t.data';

SELECT length(substr(a,1000)) from t;
psql: ERROR: compressed data is corrupted

SELECT length(substr(a,0,1000)) from t;
length
--------
999
(1 row)

SELECT length(substr(a,1000)) from t;
psql: ERROR: invalid memory alloc request size 2018785106

That's quite bizarre behavior: it works with a prefix, but not with a
suffix, and the exact ERROR changes after the prefix query. (Of course,
on master it works in all cases.)

The backtrace (with the patch applied) looks like this:

#0 toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2291
#1 toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2277
#2 0x00000000004c3b08 in heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0, slicelength=-1) at tuptoaster.c:315
#3 0x000000000085c1e5 in pg_detoast_datum_slice (datum=<optimized out>, first=<optimized out>, count=<optimized out>) at fmgr.c:1767
#4 0x0000000000833b7a in text_substring (str=133761519127512, start=0, length=<optimized out>, length_not_specified=<optimized out>) at varlena.c:956
...

I've only observed this with a very small number of rows (the data is
generated randomly with different compressibility etc.), so I'm only
attaching one row that exhibits this issue.

My guess is that toast_fetch_datum_slice() gets confused by the headers
or something like that. FWIW the new code added to this function does
not adhere to our code style, and deserves some additional explanation
of what it's doing and why. Same for heap_tuple_untoast_attr_slice, BTW.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Tomas Vondra (#9)
Re: Optimize partial TOAST decompression

Of course, I forgot to attach the files, so here they are.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

t.data.gz (application/gzip)
bt.txt (text/plain)
#11 Binguo Bao <djydewang@gmail.com>
In reply to: Tomas Vondra (#9)
Re: Optimize partial TOAST decompression


Hi, Tomas!
Thanks for your testing and the suggestion.

That's quite bizarre behavior - it does work with a prefix, but not with suffix. And the exact ERROR changes after the prefix query.

I think the bug was caused by frame "#2 0x00000000004c3b08 in
heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
slicelength=-1) at tuptoaster.c:315",
since I had ignored the case where slicelength is negative. I've added
some comments to heap_tuple_untoast_attr_slice covering that case.
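In other words, a negative slicelength means "to the end of the value", so the compressed-prefix bound cannot be applied in that case. A rough sketch of the decision (a hypothetical helper, not the patch's actual code):

```c
#include <stdint.h>

/*
 * How many compressed bytes must be fetched to satisfy a slice request?
 * pglz decompression always starts at offset 0, so for a bounded slice
 * we only need enough compressed input to produce the first
 * sliceoffset + slicelength bytes of output.  A negative slicelength
 * means "to the end of the value", in which case the whole compressed
 * datum is needed.
 */
static int32_t
compressed_bytes_to_fetch(int32_t sliceoffset, int32_t slicelength,
                          int32_t total_compressed_size)
{
    if (slicelength >= 0)
    {
        int64_t raw_end = (int64_t) sliceoffset + slicelength;
        int64_t bound = (raw_end * 9 + 8) / 8;   /* pglz worst case */

        if (bound < total_compressed_size)
            return (int32_t) bound;
    }
    /* open-ended slice: fetch all compressed data */
    return total_compressed_size;
}
```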

FWIW the new code added to this function does not adhere to our code style, and would deserve some additional explanation of what it's doing/why. Same for the heap_tuple_untoast_attr_slice, BTW.

I've added more comments to explain the code's behavior.
Besides, I also renamed the macro "TOAST_COMPRESS_RAWDATA" to
"TOAST_COMPRESS_DATA", since it is used to get the TOAST compressed data
rather than the raw data.

Best Regards, Binguo Bao.

Attachments:

0001-Optimize-partial-TOAST-decompression-5.patch (text/x-patch, +72 -13)
#12 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Binguo Bao (#11)
Re: Optimize partial TOAST decompression


Thanks, this seems to address the issue - I can no longer reproduce the
crashes, which allowed the benchmark to complete. I'm attaching the script
I used and a spreadsheet with a summary of results.

For the cases I've tested (100k - 10M values, different compressibility,
different prefix/length values), the results are kinda mixed - many
cases got much faster (~2x), but other cases got slower too. We're
however talking about queries taking a couple of milliseconds, so in
absolute numbers the differences are small.

That does not mean the optimization is useless - but the example shared
at the beginning of this thread is quite extreme, as the values are
extremely compressible. Each value is ~38MB (in plaintext), but a table
with 100 such values takes only ~40MB. That's a 100:1 compression ratio,
which I think is not typical for real-world data sets.

The data I've used are less extreme, depending on the fraction of random
data in values.

I went through the code too. I've reworded a couple of comments and fixed
some code style issues, but there are a couple of more serious issues.

1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?

This seems unnecessary, and it discards the clear hint that it's about
accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
IMHO we should keep the original naming.

2) pglz_maximum_compressed_size signatures are confusing

There are two places with the pglz_maximum_compressed_size signature, and
they are somewhat out of sync when it comes to parameter names:

int32
pglz_maximum_compressed_size(int32 raw_slice_size,
                             int32 total_compressed_size)

extern
int32 pglz_maximum_compressed_size(int32 raw_slice_size,
                                   int32 raw_size);

Also, pg_lzcompress.c has no concept of a "slice" because it only deals
with simple compression, slicing is responsibility of the tuptoaster. So
we should not mix those two, not even in comments.

I propose tweaks per the attached patch - I think it makes the code
clearer, and it's mostly cosmetic stuff. But I haven't tested the
changes beyond "it compiles".

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Optimize-partial-TOAST-decompression.patch (text/plain, +72 -13)
0002-review-reworks.patch (text/plain, +37 -32)
random-bench2.sh (application/x-sh)
comparison.ods (application/vnd.oasis.opendocument.spreadsheet)
#13 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Tomas Vondra (#12)
Re: Optimize partial TOAST decompression


FWIW I've done another round of tests with larger values (up to ~10MB),
and the larger the values, the better the speedup (more or less as
expected). Attached is the script I used and a spreadsheet with a summary.

We still need a patch addressing the review comments, though.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

random-bench.sh (application/x-sh)
results-large.ods (application/vnd.oasis.opendocument.spreadsheet)
#14 Binguo Bao <djydewang@gmail.com>
In reply to: Tomas Vondra (#13)
Re: Optimize partial TOAST decompression


Hi, Tomas!
Sorry for the late reply.
Thank you for the further testing. I am trying to reproduce your first
test summary, since I believe the patch should not cause a performance
drop in almost all cases.

Besides, if a value is composed of random characters, it will be hard to
compress, because by default pglz requires at least a 25% space saving,
otherwise compression is considered not worth it and the value is stored
uncompressed. This means that querying such a value will not exercise the
patch at all. But the first test results show the patch as slower than
master when the value is composed of random characters, which is confusing.

From the second test result, we can infer that the first test result
was indeed affected by random noise, given the small absolute run times.

We still need a patch addressing the review comments, though.

done:)

Attachments:

0001-Optimize-partial-TOAST-decompression-6.patch (text/x-patch, +74 -9)
#15 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Binguo Bao (#14)
Re: Optimize partial TOAST decompression

On Wed, Jul 10, 2019 at 01:35:25PM +0800, Binguo Bao wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> 于2019年7月10日周三 上午5:12写道:

On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote:

On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote:

Hi, Tomas!
Thanks for your testing and the suggestion.

That's quite bizarre behavior - it does work with a prefix, but not with

suffix. And the exact ERROR changes after the prefix query.

I think bug is caused by "#2 0x00000000004c3b08 in
heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
slicelength=-1) at tuptoaster.c:315",
since I ignore the case where slicelength is negative, and I've appended
some comments for heap_tuple_untoast_attr_slice for the case.

FWIW the new code added to this

function does not adhere to our code style, and would deserve some
additional explanation of what it's doing/why. Same for the
heap_tuple_untoast_attr_slice, BTW.

I've added more comments to explain the code's behavior.
Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to
"TOAST_COMPRESS_DATA" since
it is used to get toast compressed data rather than raw data.

Thanks, this seems to address the issue - I can no longer reproduce the
crashes, allowing the benchmark to complete. I'm attaching the script I
used and spreadsheet with a summary of results.

For the cases I've tested (100k - 10M values, different compressibility,
different prefix/length values), the results are kinda mixed - many
cases got much faster (~2x), but other cases got slower too. We're
however talking about queries taking a couple of milliseconds, so in
absolute numbers the differences are small.

That does not mean the optimization is useless - but the example shared
at the beginning of this thread is quite extreme, as the values are
extremely compressible. Each value is ~38MB (in plaintext), but a table
with 100 such values has only ~40MB. That's a 100:1 compression ratio,
which I think is not typical for real-world data sets.

The data I've used are less extreme, depending on the fraction of random
data in values.

I went through the code too. I've reworded a couple of comments and fixed
some code style issues, but there are a couple of more serious issues.

1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?

This seems unnecessary, and it discards the clear hint that it's about
accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
IMHO we should keep the original naming.

2) pglz_maximum_compressed_size signatures are confusing

There are two places with pglz_maximum_compressed_size signature, and
those places are kinda out of sync when it comes to parameter names:

int32
pglz_maximum_compressed_size(int32 raw_slice_size,
int32 total_compressed_size)

extern
int32 pglz_maximum_compressed_size(int32 raw_slice_size,
int32 raw_size);

Also, pg_lzcompress.c has no concept of a "slice" because it only deals
with simple compression; slicing is the responsibility of the tuptoaster. So
we should not mix the two, not even in comments.

I propose tweaks per the attached patch - I think it makes the code
clearer, and it's mostly cosmetic stuff. But I haven't tested the
changes beyond "it compiles".

regards

FWIW I've done another round of tests with larger values (up to ~10MB)
and the larger the values the better the speedup (a bit as expected).
Attached is the script I used and a spreadsheet with a summary.

We still need a patch addressing the review comments, though.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Hi, Tomas! Sorry for the late reply. Thank you for the further testing; I
am trying to reproduce your first test summary, since I believe the
patch should not cause a performance drop in most cases.

I've made some changes to the test script since the first benchmark,
aiming to make the results more stable:

1) uses larger amount of data (10x more)

2) the data set recreated for each run (to rule out random differences in
the random data affecting the runs differently)

3) minor configuration changes (more shared buffers etc.)

I don't think we need to worry about small differences (within ~5%) which
can easily be due to changes to binary layout. And otherwise results from
the second benchmark round seem much more stable.

Besides, if a value is composed of random characters, it will be hard
to compress, because by default pglz requires at least a 25% compression
rate, otherwise the value is left uncompressed.
This means that querying such a value will not exercise the patch. But the
first test results show that the patch
is slower than master when the value is composed of random characters,
which is confusing.

Yes, I know. But the values have a compressible and an incompressible (random)
part, so in most cases the value should be compressible, although with
varying compression ratios. I have not tracked the size of the loaded data
so I don't know which cases happened to be compressed or not. I'll rerun
the test and I'll include this information.

From the second test result, we can infer that the first test result
was indeed affected by random noise, since the query times were so
small.

Yes, I agree.

We still need a patch addressing the review comments, though.

done:)

Thanks.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#15)
Re: Optimize partial TOAST decompression

Hello, can you please update this patch?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#16)
Re: Optimize partial TOAST decompression

On Wed, Sep 25, 2019 at 05:38:34PM -0300, Alvaro Herrera wrote:

Hello, can you please update this patch?

I'm not the patch author, but I've been looking at the patch recently
and I have a rebased version at hand - so attached.

FWIW I believe the patch is solid and in good shape, and it got stuck
after I reported some benchmarks showing somewhat flaky performance. I
know Binguo Bao was trying to reproduce the benchmark, and I assume the
silence means he was not successful :-(

On the larger data set the patch however performed very nicely, so maybe
I just did something stupid while running the smaller one, or maybe it's
just noise (the queries were just a couple of ms in that test). I do
plan to rerun the benchmarks and investigate a bit - if I find the patch
is fine, I'd like to commit it shortly.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Optimize-partial-TOAST-decompression-7.patch (text/plain; +74 -9)
#18Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#17)
Re: Optimize partial TOAST decompression

On Fri, Sep 27, 2019 at 01:00:36AM +0200, Tomas Vondra wrote:

On Wed, Sep 25, 2019 at 05:38:34PM -0300, Alvaro Herrera wrote:

Hello, can you please update this patch?

I'm not the patch author, but I've been looking at the patch recently
and I have a rebased version at hand - so attached.

FWIW I believe the patch is solid and in good shape, and it got stuck
after I reported some benchmarks showing somewhat flaky performance. I
know Binguo Bao was trying to reproduce the benchmark, and I assume the
silence means he was not successful :-(

On the larger data set the patch however performed very nicely, so maybe
I just did something stupid while running the smaller one, or maybe it's
just noise (the queries were just a couple of ms in that test). I do
plan to rerun the benchmarks and investigate a bit - if I find the patch
is fine, I'd like to commit it shortly.

OK, I was just about to push this after polishing it a bit, but then I
noticed the patch does not address one of the points from Paul's review,
asking for comment explaining the pglz_maximum_compressed_size formula.

I mean this:

/*
* Use int64 to prevent overflow during calculation.
*/
compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

#define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?

Regarding benchmarks - as I mentioned, I've repeated the tests and
everything seems fine. The results from the two usual machines are
available in [1]. There are only a few noise-level regressions and
many very significant speedups.

I have a theory about what went wrong in the first run that showed some
regressions - it's possible the machine had CPU power management
enabled. I can't check this retroactively, but it'd explain variability
for short queries, and smooth behavior on longer queries.

[1]: https://github.com/tvondra/toast-optimize

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19Andrey Borodin
amborodin@acm.org
In reply to: Tomas Vondra (#18)
Re: Optimize partial TOAST decompression

On 30 Sep 2019, at 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I mean this:

/*
* Use int64 to prevent overflow during calculation.
*/
compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

#define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?

compressed_size accounts for the possible increase in size during compression: in the worst case, pglz consumes one extra control byte for each 8 bytes of data.
Even if the data as a whole compresses well, its prefix may be compressed extremely ineffectively. Thus, if you are going to decompress rawsize bytes, you need at most compressed_size bytes of compressed input.
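Andrey's worst-case bound can be sketched as a standalone function mirroring the calculation quoted above (a hypothetical re-implementation for illustration, not the committed PostgreSQL code):

```c
#include <stdint.h>

/*
 * Upper bound on how many compressed bytes must be fetched to be able to
 * decompress rawsize bytes of output.  The pglz worst case is a run of
 * literals: one control bit per output byte, i.e. one extra control byte
 * per 8 data bytes.  The int64 cast guards against int32 overflow in the
 * intermediate multiplication.
 */
static int32_t
max_compressed_size(int32_t rawsize)
{
	return (int32_t) (((int64_t) rawsize * 9 + 8) / 8);
}
```

For example, for rawsize = 8 the bound is 10 bytes: 8 literal bytes, 1 control byte, plus a one-byte margin from the rounding term.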

#20Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andrey Borodin (#19)
Re: Optimize partial TOAST decompression

On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:

On 30 Sep 2019, at 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I mean this:

/*
* Use int64 to prevent overflow during calculation.
*/
compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

#define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?

compressed_size accounts for the possible increase in size during
compression: in the worst case, pglz consumes one extra control byte for
each 8 bytes of data.

OK, but does that actually translate into the formula? We essentially
need to count 8-byte chunks in raw data, and multiply that by 9. Which
gives us something like

nchunks = ((rawsize + 7) / 8) * 9;

which is not quite what the patch does.

Even if the data as a whole compresses well, its prefix may be compressed
extremely ineffectively. Thus, if you are going to decompress rawsize
bytes, you need at most compressed_size bytes of compressed input.

OK, that explains why we can't use the PGLZ_MAX_OUTPUT macro.
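For illustration (my own sketch, not code from the patch), both formulas discussed above can be checked against the exact worst case: n literal bytes plus one control byte per 8 output bytes.

```c
#include <stdint.h>

/* Exact pglz worst case: every output byte is a literal costing one
 * control bit, and control bits come in groups of 8 per control byte. */
static int64_t
worst_case(int64_t n)
{
	return n + (n + 7) / 8;
}

/* Bound the patch uses: (rawsize * 9 + 8) / 8. */
static int64_t
patch_bound(int64_t n)
{
	return (n * 9 + 8) / 8;
}

/* Chunk-counting bound from the message above: 8-byte chunks times 9. */
static int64_t
chunk_bound(int64_t n)
{
	return ((n + 7) / 8) * 9;
}
```

Both expressions are valid upper bounds on worst_case(n); the patch's version is the tighter of the two when n is not a multiple of 8 (e.g. for n = 9 the worst case is 11 bytes, patch_bound gives 11, chunk_bound gives 18).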

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21Andrey Borodin
amborodin@acm.org
In reply to: Tomas Vondra (#20)
#22Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andrey Borodin (#21)
#23Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#22)
#24Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#24)
#26Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#26)
#28Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Tom Lane (#27)
#29Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Rushabh Lathia (#28)
#30Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Tomas Vondra (#29)