[PATCH] pg_stat_toast

Started by Gunnar "Nick" Bluth · 4 years ago · 40 messages · pgsql-hackers
#1 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de

Hello -hackers!

Please have a look at the attached patch, which implements some
statistics for TOAST.

The idea (and patch) have been lurking here for quite a while now, so I
decided to dust it off, rebase it to HEAD and send it out for review today.

A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
hacking, some very useful hints, and valuable feedback early this year.

I'd like to get some feedback about the general idea, approach, naming
etc. before refining this further.

I'm not a C person and I s**k at git, so please be kind with me! ;-)
Also, I'm not subscribed here, so a CC would be much appreciated!

Why gather TOAST statistics?
============================
TOAST is transparent and opaque at the same time.
Whilst we know that it's there and we know _that_ it works, we cannot
generally tell _how well_ it works.

What we can't (easily) answer are questions like:
- how many datums have been externalized?
- how many datums have been compressed?
- how often has a compression failed (resulted in no space saving)?
- how effective is the compression algorithm used on a column?
- how much time did the DB spend compressing/decompressing TOAST values?

The patch adds some functionality that will eventually be able to answer
these (and probably more) questions.

Currently, #1 - #4 can be answered based on the view contained in
"pg_stats_toast.sql":

postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test SELECT
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
generate_series(0,100000) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]--------+----------
schemaname | public
reloid | 16829
attnum | 2
relname | test
attname | lz4
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 6299710
originalsizesum | 320403204
-[ RECORD 2 ]--------+----------
schemaname | public
reloid | 16829
attnum | 3
relname | test
attname | std
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 8198819
originalsizesum | 320403204

Implementation
==============
I added some callbacks in backend/access/table/toast_helper.c to
"pgstat_report_toast_activity" in backend/postmaster/pgstat.c.

The latter (and the other additions there) are essentially 1:1 copies of
the function statistics.

Those were the perfect template, as IMHO the TOAST activities (well,
what we're interested in at least) are very much comparable to function
calls:
a) It doesn't really matter if the TOASTed data was committed, as "the
damage is done" (i.e. CPU cycles were used) anyway
b) The information can (thus/best) be stored on DB level, no need to
touch the relation or attribute statistics

I didn't find anything that could have been used as a hash key, so the
PgStat_StatToastEntry uses the shiny new PgStat_BackendAttrIdentifier
(containing relid Oid, attr int).

For persisting in the statsfile, I chose the identifier 'O' (as 'T' was
taken).

What's working?
===============
- Gathering of TOAST externalization and compression events
- collecting the sizes before and after compression
- persisting in statsfile
- not breaking "make check"
- not crashing anything (afaict)

What's missing (yet)?
=====================
- proper definition of the "pgstat_track_toast" GUC
- Gathering of times (for compression [and decompression?])
- improve "pg_stat_toast" view and include it in the catalog
- documentation (obviously)
- proper naming (of e.g. the hash key type, functions, view columns etc.)
- would it be necessary to implement overflow protection for the size &
time sums?

Thanks in advance & best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachments:

0001-initial-patch-of-pg_stat_toast-for-hackers.patch (text/x-patch; +533 −7)
#2 Andres Freund
andres@anarazel.de
In reply to: Gunnar "Nick" Bluth (#1)
Re: [PATCH] pg_stat_toast

Hi,

On 2021-12-12 17:20:58 +0100, Gunnar "Nick" Bluth wrote:

Please have a look at the attached patch, which implements some statistics
for TOAST.

The idea (and patch) have been lurking here for quite a while now, so I
decided to dust it off, rebase it to HEAD and send it out for review today.

A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
hacking, some very useful hints, and valuable feedback early this year.

I'd like to get some feedback about the general idea, approach, naming etc.
before refining this further.

I'm worried about the additional overhead this might impose. For some workload
it'll substantially increase the amount of stats traffic. Have you tried to
qualify the overheads? Both in stats size and in stats management overhead?

Greetings,

Andres Freund

#3 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Andres Freund (#2)
Re: [PATCH] pg_stat_toast

On 12.12.21 at 22:52, Andres Freund wrote:

Hi,

On 2021-12-12 17:20:58 +0100, Gunnar "Nick" Bluth wrote:

Please have a look at the attached patch, which implements some statistics
for TOAST.

The idea (and patch) have been lurking here for quite a while now, so I
decided to dust it off, rebase it to HEAD and send it out for review today.

A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
hacking, some very useful hints, and valuable feedback early this year.

I'd like to get some feedback about the general idea, approach, naming etc.
before refining this further.

I'm worried about the additional overhead this might impose. For some workload
it'll substantially increase the amount of stats traffic. Have you tried to
qualify the overheads? Both in stats size and in stats management overhead?

I'd be lying if I claimed so...

Regarding stats size: it adds one PgStat_BackendToastEntry
(PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64
bytes or something in that ballpark) per TOASTable attribute; I can't
see that making any system break a sweat ;-)

A quick run comparing 1,000,000 INSERTs (2 TOASTable columns each) with
and without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557
ms. So at least the call overhead seems to be negligible.

Obviously, this was really a quick run and doesn't reflect real life.
I'll have the machine run some reasonable tests asap, also looking at
stat size, of course!

Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

#4 Andres Freund
andres@anarazel.de
In reply to: Gunnar "Nick" Bluth (#3)
Re: [PATCH] pg_stat_toast

Hi,

On 2021-12-13 00:00:23 +0100, Gunnar "Nick" Bluth wrote:

Regarding stats size: it adds one PgStat_BackendToastEntry
(PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64 bytes or
something in that ballpark) per TOASTable attribute; I can't see that making
any system break a sweat ;-)

That's actually a lot. The problem is that all the stats data for a database
is loaded into private memory for each connection to that database, and that
the stats collector regularly writes out all the stats data for a database.

A quick run comparing 1,000,000 INSERTs (2 TOASTable columns each) with and
without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557 ms. So
at least the call overhead seems to be negligible.

Yea, you'd probably need a few more tables and a few more connections for it
to have a chance of mattering meaningfully.

Greetings,

Andres Freund

#5 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Andres Freund (#4)
Re: [PATCH] pg_stat_toast

On 13.12.21 at 00:41, Andres Freund wrote:

Hi,

On 2021-12-13 00:00:23 +0100, Gunnar "Nick" Bluth wrote:

Regarding stats size: it adds one PgStat_BackendToastEntry
(PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64 bytes or
something in that ballpark) per TOASTable attribute; I can't see that making
any system break a sweat ;-)

That's actually a lot. The problem is that all the stats data for a database
is loaded into private memory for each connection to that database, and that
the stats collector regularly writes out all the stats data for a database.

My understanding is that the stats file is only pulled into the backend
when the SQL functions (for the view) are used (see
"pgstat_fetch_stat_toastentry()").

Otherwise, a backend just initializes an empty hash, right?

For the attached v0.2 patch, I reduced that hash's initial size from 512
to 32 for the tests below (I guess the "truth" lies somewhere in
between), made the GUC parameter an actual GUC parameter, and disabled
the elog() calls I had scattered all over the place ;-)

A quick run comparing 1,000,000 INSERTs (2 TOASTable columns each) with and
without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557 ms. So
at least the call overhead seems to be negligible.

Yea, you'd probably need a few more tables and a few more connections for it
to have a chance of mattering meaningfully.

So, I went ahead and
* set up 2 clusters with "track_toast" off and on resp.
* created 100 DBs
* each with 100 tables
* with one TOASTable column in each table
* filling those with 32000 bytes of md5 garbage

These clusters sum up to ~ 2GB each, so differences should _start to_
show up, I reckon.

$ du -s testdb*
2161208 testdb
2163240 testdb_tracking

$ du -s testdb*/pg_stat
4448 testdb/pg_stat
4856 testdb_tracking/pg_stat

The db_*.stat files are 42839 vs. 48767 bytes each (so confirmed, the
differences do show).

No idea if this is telling us anything, tbh, but the
/proc/<pid>/smaps_rollup for a backend serving one of these DBs looks
like this ("0 kB" lines omitted):

track_toast OFF
===============
Rss: 12428 kB
Pss: 5122 kB
Pss_Anon: 1310 kB
Pss_File: 2014 kB
Pss_Shmem: 1797 kB
Shared_Clean: 5864 kB
Shared_Dirty: 3500 kB
Private_Clean: 1088 kB
Private_Dirty: 1976 kB
Referenced: 11696 kB
Anonymous: 2120 kB

track_toast ON (view not called yet):
=====================================
Rss: 12300 kB
Pss: 4883 kB
Pss_Anon: 1309 kB
Pss_File: 1888 kB
Pss_Shmem: 1685 kB
Shared_Clean: 6040 kB
Shared_Dirty: 3468 kB
Private_Clean: 896 kB
Private_Dirty: 1896 kB
Referenced: 11572 kB
Anonymous: 2116 kB

track_toast ON (view called):
=============================
Rss: 15408 kB
Pss: 7482 kB
Pss_Anon: 2083 kB
Pss_File: 2572 kB
Pss_Shmem: 2826 kB
Shared_Clean: 6616 kB
Shared_Dirty: 3532 kB
Private_Clean: 1472 kB
Private_Dirty: 3788 kB
Referenced: 14704 kB
Anonymous: 2884 kB

That backend used some memory for displaying the result too, of course...

A backend with just two TOAST columns in one table (filled with
1,000,001 rows) looks like this before and after calling the
"pg_stat_toast" view:
Rss: 146208 kB
Pss: 116181 kB
Pss_Anon: 2050 kB
Pss_File: 2787 kB
Pss_Shmem: 111342 kB
Shared_Clean: 6636 kB
Shared_Dirty: 45928 kB
Private_Clean: 1664 kB
Private_Dirty: 91980 kB
Referenced: 145532 kB
Anonymous: 2844 kB

Rss: 147736 kB
Pss: 103296 kB
Pss_Anon: 2430 kB
Pss_File: 3147 kB
Pss_Shmem: 97718 kB
Shared_Clean: 6992 kB
Shared_Dirty: 74056 kB
Private_Clean: 1984 kB
Private_Dirty: 64704 kB
Referenced: 147092 kB
Anonymous: 3224 kB

After creating 10,000 more tables (the view shows 10,007 rows now),
before and after calling "TABLE pg_stat_toast":
Rss: 13816 kB
Pss: 4898 kB
Pss_Anon: 1314 kB
Pss_File: 1755 kB
Pss_Shmem: 1829 kB
Shared_Clean: 5972 kB
Shared_Dirty: 5760 kB
Private_Clean: 832 kB
Private_Dirty: 1252 kB
Referenced: 13088 kB
Anonymous: 2124 kB

Rss: 126816 kB
Pss: 55213 kB
Pss_Anon: 5383 kB
Pss_File: 2615 kB
Pss_Shmem: 47215 kB
Shared_Clean: 6460 kB
Shared_Dirty: 113028 kB
Private_Clean: 1600 kB
Private_Dirty: 5728 kB
Referenced: 126112 kB
Anonymous: 6184 kB

That DB's stat-file is now 4,119,254 bytes (3,547,439 without track_toast).

After VACUUM ANALYZE, the size goes up to 5,919,812 (5,348,768).
The "100 tables" DBs' files go to 97,910 (91,868) bytes.

In total:
$ du -s testdb*/pg_stat
14508 testdb/pg_stat
15472 testdb_tracking/pg_stat

IMHO, this would be ok to at least enable temporarily (e.g. to find out
if MAIN or EXTERNAL storage/LZ4 compression would be ok/better for some
columns).

All the best,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachments:

pgstat_toast_v0.2.patch (text/x-patch; +542 −7)
#6 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Gunnar "Nick" Bluth (#1)
RE: [PATCH] pg_stat_toast

Dear Gunnar,

postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test SELECT
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
generate_series(0,100000) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]--------+----------
schemaname | public
reloid | 16829
attnum | 2
relname | test
attname | lz4
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 6299710
originalsizesum | 320403204
-[ RECORD 2 ]--------+----------
schemaname | public
reloid | 16829
attnum | 3
relname | test
attname | std
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 8198819
originalsizesum | 320403204

I'm not sure about the TOAST internals, but compression methods are now configurable:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294

How about adding a new attribute "method" to pg_stat_toast?
ToastAttrInfo's attr->tai_compression records how the data is compressed,
so I think it's easy to add.
Or is it not needed because pg_attribute already has that information?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#7 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Hayato Kuroda (Fujitsu) (#6)
Re: [PATCH] pg_stat_toast

On 20.12.2021 at 04:20, kuroda.hayato@fujitsu.com wrote:

Dear Gunnar,

Hi Kuroda-San!

postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test SELECT
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
generate_series(0,100000) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]--------+----------
schemaname | public
reloid | 16829
attnum | 2
relname | test
attname | lz4
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 6299710
originalsizesum | 320403204
-[ RECORD 2 ]--------+----------
schemaname | public
reloid | 16829
attnum | 3
relname | test
attname | std
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 8198819
originalsizesum | 320403204

I'm not sure about the TOAST internals, but compression methods are now configurable:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294

How about adding a new attribute "method" to pg_stat_toast?
ToastAttrInfo's attr->tai_compression records how the data is compressed,
so I think it's easy to add.
Or is it not needed because pg_attribute already has that information?

That information could certainly be included in the view, grabbing the
information from pg_attribute.attcompression. It probably should!

I guess the next step will be to include that view in the catalog
anyway, so I'll do that next.

Thx for the feedback!
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

#8 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Gunnar "Nick" Bluth (#1)
[PATCH] pg_stat_toast v0.3

On 12.12.21 at 17:20, Gunnar "Nick" Bluth wrote:

Hello -hackers!

Please have a look at the attached patch, which implements some
statistics for TOAST.

The attached v0.3 includes
* a proper GUC "track_toast" incl. postgresql.conf.sample line
* gathering timing information
* the system view "pg_stat_toast"
* naming improvements more than welcome!
* columns "storagemethod" and "compressmethod" added as per Hayato
Kuroda's suggestion
* documentation (pointing out the potential impacts as per Andres
Freund's reservations)

Any hints on how to write meaningful tests would be much appreciated!
I.e., will a test

INSERTing some long text to a column raises the view counts and
"compressedsize" is smaller than "originalsize"

suffice?!?

Cheers & best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachments:

pgstat_toast_v0.3.patch (text/x-patch; +778 −8)
#9 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Gunnar "Nick" Bluth (#8)
RE: [PATCH] pg_stat_toast v0.3

Dear Gunnar,

The attached v0.3 includes
* columns "storagemethod" and "compressmethod" added as per Hayato
Kuroda's suggestion

I prefer your implementation of referring to the other system view.

* gathering timing information

Here is a minor comment:
I'm not sure when we should start measuring time.
If you want to record the time spent compressing, toast_compress_datum() should be
sandwiched by INSTR_TIME_SET_CURRENT() calls and the duration calculated from those.
Currently, time_spent is calculated in pgstat_report_toast_activity(),
but this has a systematic error.
If you want to record the time spent inserting/updating, however,
I think we should start measuring at the upper layer, maybe heap_toast_insert_or_update().

Any hints on how to write meaningful tests would be much appreciated!
I.e., will a test

I searched the PG sources for hints, and I think the src/test/modules/ subdirectory can be used.
Following is the example:
https://github.com/postgres/postgres/tree/master/src/test/modules/commit_ts

I attached a patch that creates a new test. Could you rewrite the SQL and the output file?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

adding_new_test.patch (application/octet-stream; +31 −0)
#10 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Hayato Kuroda (Fujitsu) (#9)
Re: [PATCH] pg_stat_toast v0.4

On 21.12.21 at 13:51, kuroda.hayato@fujitsu.com wrote:

Dear Gunnar,

Hayato-san, all,

thanks for the feedback!

* gathering timing information

Here is a minor comment:
I'm not sure when we should start measuring time.
If you want to record the time spent compressing, toast_compress_datum() should be
sandwiched by INSTR_TIME_SET_CURRENT() calls and the duration calculated from those.
Currently, time_spent is calculated in pgstat_report_toast_activity(),
but this has a systematic error.
If you want to record the time spent inserting/updating, however,
I think we should start measuring at the upper layer, maybe heap_toast_insert_or_update().

Yes, both toast_compress_datum() and toast_save_datum() are sandwiched
the way you mentioned, as that's exactly what we want to measure (time
spent doing compression and/or externalizing data).

Implementation-wise, I (again) took "track_functions" as a template
there, assuming that jumping into pgstat_report_toast_activity() and
only then checking if "track_toast = on" isn't too costly (we call
pgstat_init_function_usage() and pgstat_end_function_usage() a _lot_).

I did miss though that
INSTR_TIME_SET_CURRENT(time_spent);
should be called right after entering pgstat_report_toast_activity(), as
that might need additional clock ticks for setting up the hash etc.
That's fixed now.

What I can't assess is the cost of the unconditional call to
INSTR_TIME_SET_CURRENT(start_time) in both toast_tuple_try_compression()
and toast_tuple_externalize().

Would it be wise (cheaper) to add a check like
if (pgstat_track_toast)
before querying the system clock?

Any hints on how to write meaningful tests would be much appreciated!
I.e., will a test

I searched the PG sources for hints, and I think the src/test/modules/ subdirectory can be used.
Following is the example:
https://github.com/postgres/postgres/tree/master/src/test/modules/commit_ts

I attached a patch that creates a new test. Could you rewrite the SQL and the output file?

Thanks for the kickstart ;-)

Some tests (as meaningful as they may get, I'm afraid) are now in
src/test/modules/track_toast/.
"make check-world" executes them successfully, although only after I
introduced a "SELECT pg_sleep(1);" to them.

pg_stat_toast_v0.4.patch attached.

Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachments:

pgstat_toast_v0.4.patch (text/x-patch; +878 −7)
#11 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Gunnar "Nick" Bluth (#10)
Re: [PATCH] pg_stat_toast v0.4

On 03.01.22 at 16:52, Gunnar "Nick" Bluth wrote:

pg_stat_toast_v0.4.patch attached.

Aaaand I attached a former version of the patch file... sorry, I'm kind
of struggling with all the squashing/rebasing...
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachments:

pg_stat_toast_v0.4.patch (text/x-patch; +874 −8)
#12 Justin Pryzby
pryzby@telsasoft.com
In reply to: Gunnar "Nick" Bluth (#11)
Re: [PATCH] pg_stat_toast v0.4

On Mon, Jan 03, 2022 at 05:00:45PM +0100, Gunnar "Nick" Bluth wrote:

On 03.01.22 at 16:52, Gunnar "Nick" Bluth wrote:

pg_stat_toast_v0.4.patch attached.

Note that the cfbot says this fails under windows

http://cfbot.cputube.org/gunnar-quotnickquot-bluth.html
...
[16:47:05.347] Could not determine contrib module type for track_toast
[16:47:05.347] at src/tools/msvc/mkvcbuild.pl line 31.

Aaaand I attached a former version of the patch file... sorry, I'm kind of
struggling with all the squashing/rebasing...

Soon you will think this is fun :)

--
Justin

#13 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Justin Pryzby (#12)
Re: [PATCH] pg_stat_toast v0.4

On 03.01.22 at 17:50, Justin Pryzby wrote:

On Mon, Jan 03, 2022 at 05:00:45PM +0100, Gunnar "Nick" Bluth wrote:

On 03.01.22 at 16:52, Gunnar "Nick" Bluth wrote:

pg_stat_toast_v0.4.patch attached.

Note that the cfbot says this fails under windows

Thanks for the heads up!

http://cfbot.cputube.org/gunnar-quotnickquot-bluth.html
...
[16:47:05.347] Could not determine contrib module type for track_toast
[16:47:05.347] at src/tools/msvc/mkvcbuild.pl line 31.

Not only Window$... as it turns out, one of the checks was pretty bogus.
Kicked that one and instead wrote two (hopefully) meaningful ones.

Also, I moved the tests to regress/, as they're not really for a module
anyway.

Let's see how this fares!

Aaaand I attached a former version of the patch file... sorry, I'm kind of
struggling with all the squashing/rebasing...

Soon you will think this is fun :)

As long as you're happy with plain patches like the attached one, I may ;-)

All the best,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachments:

pg_stat_toast_v0.5.patch (text/x-patch)
#14 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Gunnar "Nick" Bluth (#13)
Re: [PATCH] pg_stat_toast v0.4

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:

On 03.01.22 at 17:50, Justin Pryzby wrote:

Soon you will think this is fun :)

As long as you're happy with plain patches like the attached one, I may ;-)

Well, with a zero-byte patch, not very much ...

BTW why do you number with a "0." prefix? It could just be "4" and "5"
and so on. There's no value in two-part version numbers for patches.
Also, may I suggest that "git format-patch -vN" with varying N is an
easier way to generate patches to attach?

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)

#15 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Alvaro Herrera (#14)
Re: [PATCH] pg_stat_toast v0.4

On 03.01.22 at 19:30, Alvaro Herrera wrote:

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:

On 03.01.22 at 17:50, Justin Pryzby wrote:

Soon you will think this is fun :)

As long as you're happy with plain patches like the attached one, I may ;-)

Well, with a zero-byte patch, not very much ...

D'oh!!!!

BTW why do you number with a "0." prefix? It could just be "4" and "5"
and so on. There's no value in two-part version numbers for patches.
Also, may I suggest that "git format-patch -vN" with varying N is an
easier way to generate patches to attach?

Not when you have a metric ton of commits in the history... I'll
hopefully find a way to start over soon :/

9:38 $ git format-patch PGDG/master -v5 -o ..
../v5-0001-ping-pong-of-thougths.patch
../v5-0002-ping-pong-of-thougths.patch
../v5-0003-adds-some-debugging-messages-in-toast_helper.c.patch
../v5-0004-adds-some-groundwork-for-pg_stat_toast-to-pgstat..patch
../v5-0005-fixes-wrong-type-for-pgstat_track_toast-GUC.patch
../v5-0006-introduces-PgStat_BackendAttrIdentifier-OID-attr-.patch
../v5-0007-implements-and-calls-pgstat_report_toast_activity.patch
../v5-0008-Revert-adds-some-debugging-messages-in-toast_help.patch
../v5-0009-adds-more-detail-to-logging.patch
../v5-0010-adds-toastactivity-to-table-stats-and-many-helper.patch
../v5-0011-fixes-missed-replacement-in-comment.patch
../v5-0012-adds-SQL-support-functions.patch
../v5-0013-Add-SQL-functions.patch
../v5-0014-reset-to-HEAD.patch
../v5-0015-makes-DEBUG2-messages-more-precise.patch
../v5-0016-adds-timing-information-to-stats-and-view.patch
../v5-0017-adds-a-basic-set-of-tests.patch
../v5-0018-adds-a-basic-set-of-tests.patch
../v5-0019-chooses-a-PGSTAT_TOAST_HASH_SIZE-of-64-changes-ha.patch
../v5-0020-removes-whitespace-trash.patch
../v5-0021-returns-to-PGDG-master-.gitignore.patch
../v5-0022-pg_stat_toast_v0.5.patch
../v5-0023-moves-tests-to-regress.patch

But alas! INT versioning it is from now on!
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachments:

pg_stat_toast_v5.patch (text/x-patch; +868 −8)
#16 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Gunnar "Nick" Bluth (#15)
Re: [PATCH] pg_stat_toast v0.4

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:

9:38 $ git format-patch PGDG/master -v5 -o ..
../v5-0001-ping-pong-of-thougths.patch
../v5-0002-ping-pong-of-thougths.patch
../v5-0003-adds-some-debugging-messages-in-toast_helper.c.patch
...

Hmm, in such cases I would suggest to create a separate branch and then
"git merge --squash" for submission. You can keep your development
branch separate, with other merges if you want.

I've found this to be easier to manage, though I don't always follow
that workflow myself.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)

#17 Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Alvaro Herrera (#16)
Re: [PATCH] pg_stat_toast v6

On 03.01.22 at 20:11, Alvaro Herrera wrote:

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:

9:38 $ git format-patch PGDG/master -v5 -o ..
../v5-0001-ping-pong-of-thougths.patch
../v5-0002-ping-pong-of-thougths.patch
../v5-0003-adds-some-debugging-messages-in-toast_helper.c.patch
...

Hmm, in such cases I would suggest to create a separate branch and then
"git merge --squash" for submission. You can keep your development
branch separate, with other merges if you want.

I've found this to be easier to manage, though I don't always follow
that workflow myself.

Using --stdout does help ;-)

I wonder why "track_toast.sql" test fails on Windows (with "ERROR:
compression method lz4 not supported"), but "compression.sql" doesn't.
Any hints?

Anyway, I shamelessly copied "wait_for_stats()" from the "stats.sql"
file and the tests _should_ now work at least on the platforms with lz4.

v6 attached!
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachments:

pg_stat_toast_v6.patch (text/x-patch; +946 −9)
#18 Justin Pryzby
pryzby@telsasoft.com
In reply to: Gunnar "Nick" Bluth (#17)
Re: [PATCH] pg_stat_toast v6

On Mon, Jan 03, 2022 at 08:40:50PM +0100, Gunnar "Nick" Bluth wrote:

I wonder why "track_toast.sql" test fails on Windows (with "ERROR:
compression method lz4 not supported"), but "compression.sql" doesn't.
Any hints?

The Windows CI doesn't have LZ4, so the SQL command fails, but there's an
"alternate" expected/compression_1.out so that's accepted. (The regression
tests exercise many commands which fail, as expected, like creating an index on
an index).

If you're going to have an alternate file for the --without-lz4 case, then I think
you should put it into compression.sql. (But not if you need an alternate
for something else, since we'd need 4 alternates, which is halfway to 8...).

--
Justin

#19 Justin Pryzby
pryzby@telsasoft.com
In reply to: Gunnar "Nick" Bluth (#17)
Re: [PATCH] pg_stat_toast v6
+pgstat_report_toast_activity(Oid relid, int attr,
+							bool externalized,
+							bool compressed,
+							int32 old_size,
+							int32 new_size,

...

+		if (new_size)
+		{
+			htabent->t_counts.t_size_orig+=old_size;
+			if (new_size)
+			{

I guess one of these is supposed to say old_size?

+		&pgstat_track_toast,
+		false,
+		NULL, NULL, NULL
+	},
{

+CREATE TABLE toast_test (cola TEXT, colb TEXT COMPRESSION lz4, colc TEXT , cold TEXT, cole TEXT);

Is there a reason this uses lz4?
If that's needed for stable results, I think you should use pglz, since that's
what's guaranteed to exist. I imagine LZ4 won't be required any time soon,
seeing as zlib has never been required.

+        Be aware that this feature, depending on the amount of TOASTable columns in
+        your databases, may significantly increase the size of the statistics files
+        and the workload of the statistics collector. It is recommended to only
+        temporarily activate this to assess the right compression and storage method
+        for (a) column(s).

saying "a column" is fine

+       <structfield>schemaname</structfield> <type>name</type>
+       Attribute (column) number in the relation
+       <structfield>relname</structfield> <type>name</type>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>compressmethod</structfield> <type>char</type>
+      </para>
+      <para>
+       Compression method of the attribute (empty means default)

One thing to keep in mind is that the current compression method is only used
for *new* data - old data can still use the old compression method. It
probably doesn't need to be said here, but maybe you can refer to the docs
about that in alter_table.

+ Number of times the compression was successful (gained a size reduction)

It's more clear to say "was reduced in size"

+	/* we assume this inits to all zeroes: */
+	static const PgStat_ToastCounts all_zeroes;

You don't have to assume; static/global allocations are always zero unless
otherwise specified.
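For illustration, the C standard (C99 6.7.8p10) guarantees that objects with static storage duration are zero-initialized, which is what the usual "compare against an all-zero static" idiom relies on. The struct below is a hypothetical miniature of `PgStat_ToastCounts`, not the real definition:

```c
#include <string.h>

/* Hypothetical miniature of PgStat_ToastCounts. */
typedef struct Counts
{
	long		numcompressed;
	long		sizesum;
} Counts;

/* Zero-initialized by the standard; no assumption needed. */
static const Counts all_zeroes;

/*
 * The usual idiom: compare a counter struct against the all-zero
 * static to detect whether any counter has advanced.
 */
static int
counts_changed(const Counts *c)
{
	return memcmp(c, &all_zeroes, sizeof(Counts)) != 0;
}
```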

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Gunnar "Nick" Bluth (#17)
Re: [PATCH] pg_stat_toast v6

Overall I think this is a good feature to have; assessing the need for
compression is important for tuning, so +1 for the goal of the patch.

I didn't look into the patch carefully, but here are some minor
comments:

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:

@@ -229,7 +230,9 @@ toast_tuple_try_compression(ToastTupleContext *ttc, int attribute)
Datum *value = &ttc->ttc_values[attribute];
Datum new_value;
ToastAttrInfo *attr = &ttc->ttc_attr[attribute];
+ instr_time start_time;

+ INSTR_TIME_SET_CURRENT(start_time);
new_value = toast_compress_datum(*value, attr->tai_compression);

if (DatumGetPointer(new_value) != NULL)

Don't INSTR_TIME_SET_CURRENT unconditionally; in some systems it's an
expensive syscall. Find a way to only do it if the feature is enabled.
This also suggests that perhaps it'd be a good idea to allow this to be
enabled for specific tables only, rather than system-wide. (Maybe in
order for stats to be collected, the user should have to both set the
GUC option *and* set a per-table option? Not sure.)

@@ -82,10 +82,12 @@ typedef enum StatMsgType
PGSTAT_MTYPE_DEADLOCK,
PGSTAT_MTYPE_CHECKSUMFAILURE,
PGSTAT_MTYPE_REPLSLOT,
+ PGSTAT_MTYPE_CONNECTION,

I think this new enum value doesn't belong in this patch.

+/* ----------
+ * PgStat_ToastEntry			Per-TOAST-column info in a MsgFuncstat
+ * ----------

Not in "a MsgFuncstat", right?

+-- function to wait for counters to advance
+create function wait_for_stats() returns void as $$

I don't think we want a separate copy of wait_for_stats; see commit
fe60b67250a3 and the discussion leading to it. Maybe you'll want to
move the test to stats.sql. I'm not sure what to say about relying on
LZ4; maybe you'll want to leave that part out, and just verify in an
LZ4-enabled build that some 'l' entry exists. BTW, don't we have any
decent way to turn that 'l' into a more reasonable, descriptive string?

Also, perhaps make the view-defining query turn an empty compression
method into whatever the default is. Or even better, stats collection
should store the real compression method used rather than empty string,
to avoid confusing things when some stats are collected, then the
default is changed, then some more stats are collected.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Para tener más hay que desear menos" ("To have more, desire less")

#21Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Justin Pryzby (#19)
#22Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Alvaro Herrera (#20)
#23Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Gunnar "Nick" Bluth (#22)
#24Andres Freund
andres@anarazel.de
In reply to: Gunnar "Nick" Bluth (#23)
#25Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Andres Freund (#24)
#26Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Gunnar "Nick" Bluth (#25)
#27Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Gunnar "Nick" Bluth (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Gunnar "Nick" Bluth (#27)
#29Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Robert Haas (#28)
#30Andres Freund
andres@anarazel.de
In reply to: Gunnar "Nick" Bluth (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#30)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#31)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Gunnar "Nick" Bluth (#29)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#32)
#35Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Robert Haas (#31)
#36Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Robert Haas (#33)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Gunnar "Nick" Bluth (#35)
#38Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#37)
#39Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Andres Freund (#38)
#40Gunnar "Nick" Bluth
gunnar.bluth@pro-open.de
In reply to: Alvaro Herrera (#32)