Reducing output size of nodeToString

Started by Matthias van de Meentover 2 years ago29 messageshackers

boekewurm+postgres@gmail.com

over 2 years ago

Hi,

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size signficantly.

The basic techniques used are
- Don't emit scalar fields when they contain a default value, and
make the reading code aware of this.
- Reasonable defaults are set for most datatypes, and overrides can
be added with new pg_node_attr() attributes. No introspection into
non-null Node/Array/etc. is being done though.
- Reset more fields to their default values before storing the values.
- Don't write trailing 0s in outDatum calls for by-ref types. This
saves many bytes for Name fields, but also some other pre-existing
entry points.

Future work will probably have to be on a significantly different
storage format, as the textual format is about to hit its entropy
limits.

Kind regards,

Matthias van de Meent

[0]: /messages/by-id/CAEze2WgGexDM63dOvndLdAWwA6uSmSsc97jmrCuNmrF1JEDK7w@mail.gmail.com
[1]: /messages/by-id/CACxu=vL_SD=WJiFSJyyBuZAp_2v_XBqb1x9JBiqz52a_g9z3jA@mail.gmail.com
[2]: /messages/by-id/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Matthias van de Meent (#1)

Re: Reducing output size of nodeToString

On 06.12.23 22:08, Matthias van de Meent wrote:

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size signficantly.

The basic techniques used are
- Don't emit scalar fields when they contain a default value, and
make the reading code aware of this.
- Reasonable defaults are set for most datatypes, and overrides can
be added with new pg_node_attr() attributes. No introspection into
non-null Node/Array/etc. is being done though.
- Reset more fields to their default values before storing the values.
- Don't write trailing 0s in outDatum calls for by-ref types. This
saves many bytes for Name fields, but also some other pre-existing
entry points.

Future work will probably have to be on a significantly different
storage format, as the textual format is about to hit its entropy
limits.

One thing that was mentioned repeatedly is that we might want different
formats for human consumption and for machine storage.

For human consumption, I would like some format like what you propose,
because it generally omits the "unset" or "uninteresting" fields.

But since you also talk about the size of pg_rewrite, I wonder whether
it would be smaller if we just didn't write the field names at all but
instead all the field values. (This should be pretty easy to test,
since the read functions currently ignore the field names anyway; you
could just write out all field names as "x" and see what happens.)

I don't much like the way your patch uses the term "default". Most of
these default values are not defaults at all, but perhaps "most common
values". In theory, I would expect a default value to be initialized by
makeNode(). (That could be an interesting feature, but let's stay
focused here.) But even then most of these "defaults" wouldn't be
appropriate for a real default value. This part seems quite
controversial to me, and I would like to see some more details about how
much this specifically really saves.

I don't quite understand why in your patch you have some fields as
optional and some not. Or is that what WRITE_NODE_FIELD() vs.
WRITE_NODE_FIELD_OPT() means? How is it decided which one to use?

The part that clears out the location fields in pg_rewrite entries might
be worth considering as a separate patch. Could you explain it more?
Does it affect location pointers when using views at all?

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#2)

Re: Reducing output size of nodeToString

On Thu, 7 Dec 2023 at 11:26, Peter Eisentraut <peter@eisentraut.org> wrote:

On 06.12.23 22:08, Matthias van de Meent wrote:

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size signficantly.

The basic techniques used are
- Don't emit scalar fields when they contain a default value, and
make the reading code aware of this.
- Reasonable defaults are set for most datatypes, and overrides can
be added with new pg_node_attr() attributes. No introspection into
non-null Node/Array/etc. is being done though.
- Reset more fields to their default values before storing the values.
- Don't write trailing 0s in outDatum calls for by-ref types. This
saves many bytes for Name fields, but also some other pre-existing
entry points.

Future work will probably have to be on a significantly different
storage format, as the textual format is about to hit its entropy
limits.

One thing that was mentioned repeatedly is that we might want different
formats for human consumption and for machine storage.
For human consumption, I would like some format like what you propose,
because it generally omits the "unset" or "uninteresting" fields.

But since you also talk about the size of pg_rewrite, I wonder whether
it would be smaller if we just didn't write the field names at all but
instead all the field values. (This should be pretty easy to test,
since the read functions currently ignore the field names anyway; you
could just write out all field names as "x" and see what happens.)

I've been thinking about using a more binary storage format similar to
protobuf (but with system knowledge baked in, instead of PB's
defaults), but that would be dependent on functions that change the
output functions of pg_node_tree too, which Michel mentioned he would
work on a year ago (iiuc).

I think it would be a logical next step after this, but this patch is
just on building infrastructure that reduces the stored size without
getting in the way of Michel's work, if there was any result.

I don't much like the way your patch uses the term "default". Most of
these default values are not defaults at all, but perhaps "most common
values".

Yes, some 'defaults' are curated, but they have sound logic behind
them: *typmod is essentially always copied from an attypmod, which
defaults to -1. *isnull for any constant is generally unset. Many of
those other fields (once initialized by the relevant code) default to
those values I used.

In theory, I would expect a default value to be initialized by
makeNode(). (That could be an interesting feature, but let's stay
focused here.) But even then most of these "defaults" wouldn't be
appropriate for a real default value. This part seems quite
controversial to me, and I would like to see some more details about how
much this specifically really saves.

The tuning of these "defaults" got the savings from 20-30% to this
50%+ reduction in raw size.

I don't quite understand why in your patch you have some fields as
optional and some not. Or is that what WRITE_NODE_FIELD() vs.
WRITE_NODE_FIELD_OPT() means? How is it decided which one to use?

I use _OPT when I know the value is likely to be its defualt value,
and don't change over to _OPT when I know with great certainty the
value is going to be dynamic, such as relation ID in RTEs, but this is
only relevant for manual code as generated code essentially always
uses the _OPT paths.

The part that clears out the location fields in pg_rewrite entries might
be worth considering as a separate patch. Could you explain it more?
Does it affect location pointers when using views at all?

Views don't store the original query string, so the location pointers
in views point to locations in a now non-existent query string.
Additionally, unless WRITE_READ_PARSE_PLAN_TREES is defined,
READ_LOCATION_FIELD does not actually read the stored value but
instead stores -1 in the indicated field, so in most cases there won't
be any difference between the deserialized data before and after this
part of the patch; the only difference is the amount of debugable
information stored in the view's internal data.
Note that resetting them to 'invalid' value thus makes sense, and
improves compressibility and allows removal from the serialized format
when serialization omits fields with default values.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

David Rowley

dgrowleyml@gmail.com

over 2 years ago

In reply to: Matthias van de Meent (#1)

Re: Reducing output size of nodeToString

On Thu, 7 Dec 2023 at 10:09, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size significantly.

It would be very cool to have the technology proposed by Andres back
in 2019 [1]/messages/by-id/flat/20190828234136.fk2ndqtld3onfrrp@alap3.anarazel.de. With that, we could easily write various output
functions. One could be compact and easily machine-readable and
another designed to be better for humans for debugging purposes.

We could also easily serialize plans to binary format for copying to
parallel workers rather than converting them to a text-based
serialized format. It would also allow us to do things like serialize
PREPAREd plans into a nicely compact single allocation that we could
just pfree in a single pfree call on DEALLOCATE.

Likely we could just use the existing Perl scripts to form the
metadata arrays rather than the clang parsing stuff Andres used in his
patch.

Anyway, just wanted to ensure you knew about this idea.

David

[1]: /messages/by-id/flat/20190828234136.fk2ndqtld3onfrrp@alap3.anarazel.de

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: David Rowley (#4)

Re: Reducing output size of nodeToString

On Thu, 7 Dec 2023 at 13:09, David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 7 Dec 2023 at 10:09, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size significantly.

It would be very cool to have the technology proposed by Andres back
in 2019 [1]. With that, we could easily write various output
functions. One could be compact and easily machine-readable and
another designed to be better for humans for debugging purposes.

We could also easily serialize plans to binary format for copying to
parallel workers rather than converting them to a text-based
serialized format. It would also allow us to do things like serialize
PREPAREd plans into a nicely compact single allocation that we could
just pfree in a single pfree call on DEALLOCATE.

I'm not sure what benefit you're refering to. If you mean "it's more
compact than the current format" then sure; but the other points can
already be covered by either the current nodeToString format, or by
nodeCopy-ing the prepared plan into its own MemoryContext, which would
allow us to do essentially the same thing.

Likely we could just use the existing Perl scripts to form the
metadata arrays rather than the clang parsing stuff Andres used in his
patch.

Anyway, just wanted to ensure you knew about this idea.

I knew about that thread thread, but didn't notice the metadata arrays
part of it, which indeed looks interesting for this patch. Thanks for
pointing it out. I'll see if I can incorporate parts of that into this
patchset.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Matthias van de Meent (#1)

Re: Reducing output size of nodeToString

On 06.12.23 22:08, Matthias van de Meent wrote:

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size signficantly.

The basic techniques used are
- Don't emit scalar fields when they contain a default value, and
make the reading code aware of this.
- Reasonable defaults are set for most datatypes, and overrides can
be added with new pg_node_attr() attributes. No introspection into
non-null Node/Array/etc. is being done though.
- Reset more fields to their default values before storing the values.
- Don't write trailing 0s in outDatum calls for by-ref types. This
saves many bytes for Name fields, but also some other pre-existing
entry points.

Based on our discussions, my understanding is that you wanted to produce
an updated patch set that is split up a bit.

My suggestion is to make incremental patches along these lines:

- Omit from output all fields that have value zero.

- Omit location fields that have value -1.

- Omit trailing zeroes for scalar values.

- Recent location fields before storing in pg_rewrite (or possibly
catalogs in general?)

- And then whatever is left, including the "default" value system that
you have proposed.

The last one I have some doubts about, as previously expressed, but the
first few seem sensible to me. By splitting it up we can consider these
incrementally.

David Rowley

dgrowleyml@gmail.com

over 2 years ago

In reply to: Matthias van de Meent (#5)

Re: Reducing output size of nodeToString

On Thu, 14 Dec 2023 at 19:21, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Thu, 7 Dec 2023 at 13:09, David Rowley <dgrowleyml@gmail.com> wrote:

We could also easily serialize plans to binary format for copying to
parallel workers rather than converting them to a text-based
serialized format. It would also allow us to do things like serialize
PREPAREd plans into a nicely compact single allocation that we could
just pfree in a single pfree call on DEALLOCATE.

I'm not sure what benefit you're refering to. If you mean "it's more
compact than the current format" then sure; but the other points can
already be covered by either the current nodeToString format, or by
nodeCopy-ing the prepared plan into its own MemoryContext, which would
allow us to do essentially the same thing.

There's significantly less memory involved in just having a plan
serialised into a single chunk of memory vs a plan stored in its own
MemoryContext. With the serialised plan, you don't have any power of
2 rounding up wastage that aset.c does and don't need extra space for
all the MemoryChunks that would exist for every single palloc'd chunk
in the MemoryContext version.

I think it would be nice if one day in the future if a PREPAREd plan
could have multiple different plans cached. We could then select which
one to use by looking at statistics for the given parameters and
choose the plan that's most suitable for the given parameters. Of
course, this is a whole entirely different project. I mention it just
because being able to serialise a plan would make the memory
management and overhead for such a feature much more manageable.
There'd likely need to be some eviction logic in such a feature as the
number of possible plans for some complex query is quite likely to be
much more than we'd care to cache.

David

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: David Rowley (#7)

Re: Reducing output size of nodeToString

On Wed, 3 Jan 2024 at 03:02, David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 14 Dec 2023 at 19:21, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Thu, 7 Dec 2023 at 13:09, David Rowley <dgrowleyml@gmail.com> wrote:

We could also easily serialize plans to binary format for copying to
parallel workers rather than converting them to a text-based
serialized format. It would also allow us to do things like serialize
PREPAREd plans into a nicely compact single allocation that we could
just pfree in a single pfree call on DEALLOCATE.

I'm not sure what benefit you're refering to. If you mean "it's more
compact than the current format" then sure; but the other points can
already be covered by either the current nodeToString format, or by
nodeCopy-ing the prepared plan into its own MemoryContext, which would
allow us to do essentially the same thing.

There's significantly less memory involved in just having a plan
serialised into a single chunk of memory vs a plan stored in its own
MemoryContext. With the serialised plan, you don't have any power of
2 rounding up wastage that aset.c does and don't need extra space for
all the MemoryChunks that would exist for every single palloc'd chunk
in the MemoryContext version.

I was envisioning this to use the Bump memory context you proposed
over in [0]/messages/by-id/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com, as to the best of my knowledge prepared plans are not
modified, so nodeCopy-ing a prepared plan into bump context could be a
good use case for those contexts. This should remove the issue of
rounding and memorychunk wastage in aset.

I think it would be nice if one day in the future if a PREPAREd plan
could have multiple different plans cached. We could then select which
one to use by looking at statistics for the given parameters and
choose the plan that's most suitable for the given parameters. Of
course, this is a whole entirely different project. I mention it just
because being able to serialise a plan would make the memory
management and overhead for such a feature much more manageable.
There'd likely need to be some eviction logic in such a feature as the
number of possible plans for some complex query is quite likely to be
much more than we'd care to cache.

Yeah, that'd be nice, but is also definitely future work.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0]: /messages/by-id/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#6)

Re: Reducing output size of nodeToString

On Tue, 2 Jan 2024 at 11:30, Peter Eisentraut <peter@eisentraut.org> wrote:

On 06.12.23 22:08, Matthias van de Meent wrote:

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size signficantly.

The basic techniques used are
- Don't emit scalar fields when they contain a default value, and
make the reading code aware of this.
- Reasonable defaults are set for most datatypes, and overrides can
be added with new pg_node_attr() attributes. No introspection into
non-null Node/Array/etc. is being done though.
- Reset more fields to their default values before storing the values.
- Don't write trailing 0s in outDatum calls for by-ref types. This
saves many bytes for Name fields, but also some other pre-existing
entry points.

Based on our discussions, my understanding is that you wanted to produce
an updated patch set that is split up a bit.

I mentioned that I've been working on implementing (but have not yet
completed) a binary serialization format, with an implementation based
on Andres' generated metadata idea. However, that requires more
elaborate infrastructure than is currently available, so while I said
I'd expected it to be complete before the Christmas weekend, it'll
take some more time - I'm not sure it'll be ready for PG17.

In the meantime here's an updated version of the v0 patch, formally
keeping the textual format alive, while reducing the size
significantly (nearing 2/3 reduction), taking your comments into
account. I think the gains are worth the consideration without taking
into account the as-of-yet unimplemented binary format.

My suggestion is to make incremental patches along these lines:
[...]

Something like the attached? It splits out into the following
0001: basic 'omit default values'
0002: reset location and other querystring-related node fields for all
catalogs of type pg_node_tree.
0003: add default marking on typmod fields.
0004 & 0006: various node fields marked with default() based on
observed common or initial values of those fields
0005: truncate trailing 0s from outDatum
0007 (new): do run-length + gap coding for bitmapset and the various
integer list types. This saves a surprising amount of bytes.

The last one I have some doubts about, as previously expressed, but the
first few seem sensible to me. By splitting it up we can consider these
incrementally.

That makes a lot of sense. The numbers for the full patchset do seem
quite positive though: The metrics of the query below show a 40%
decrease in size of a fresh pg_rewrite (standard toast compression)
and a 5% decrease in size of the template0 database. The uncompressed
data of pg_rewrite.ev_action is also 60% smaller.

select pg_database_size('template0') as "template0"
, pg_total_relation_size('pg_rewrite') as "pg_rewrite"
, sum(pg_column_size(ev_action)) as "compressed"
, sum(octet_length(ev_action)) as "raw"
from pg_rewrite;

version | template0 | pg_rewrite | compressed | raw
---------|-----------+------------+------------+---------
master | 7545359 | 761856 | 573307 | 2998712
0001 | 7365135 | 622592 | 438224 | 1943772
0002 | 7258639 | 573440 | 401660 | 1835803
0003 | 7258639 | 565248 | 386211 | 1672539
0004 | 7176719 | 483328 | 317099 | 1316552
0005 | 7176719 | 483328 | 315556 | 1300420
0006 | 7160335 | 466944 | 302806 | 1208621
0007 | 7143951 | 450560 | 287659 | 1187237

While looking through the data, I noticed the larger views now consist
for a significant portion out of range table entries, specifically the
Alias and Var nodes (which are mostly repeated and/or repetative
values, but split across Nodes). I think column-major storage would be
more efficient to write, but I'm not sure it's worth the effort in
planner code.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#10

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Matthias van de Meent (#9)

Re: Reducing output size of nodeToString

On 04.01.24 00:23, Matthias van de Meent wrote:

Something like the attached? It splits out into the following
0001: basic 'omit default values'

  /* Write an integer field (anything written as ":fldname %d") */
-#define WRITE_INT_FIELD(fldname) \
+#define WRITE_INT_FIELD_DIRECT(fldname) \
         appendStringInfo(str, " :" CppAsString(fldname) " %d", 
node->fldname)
+#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
+       ((node->fldname == default) ? (0) : WRITE_INT_FIELD_DIRECT(fldname))
+#define WRITE_INT_FIELD(fldname) \
+       WRITE_INT_FIELD_DEFAULT(fldname, 0)

Do we need the _DIRECT macros at all? Could we not combine that into
the _DEFAULT ones?

I think the way the conditional operator (?:) is written is not
technically correct C, because one side has an integer result (0) and
the other a void result (from appendStringInfo()). Also, this could
break accidentally even more if the result type of appendStringInfo()
was changed for some reason. I think it would be better to write this
in a more straightforward way like

#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
do { \
if (node->fldname == default) \
appendStringInfo(str, " :" CppAsString(fldname) " %d",
node->fldname); \
while (0)

Relatedly, this

+/* a scaffold function to read an optionally-omitted field */
+#define READ_OPT_SCAFFOLD(fldname, read_field_code, default_value) \
+       if (pg_strtoken_next(":" CppAsString(fldname))) \
+       { \
+               read_field_code; \
+       } \
+       else \
+               local_node->fldname = default_value

would need to be written with a do { } while (0) wrapper around it.

0002: reset location and other querystring-related node fields for all
catalogs of type pg_node_tree.

This goal makes sense, but I think it can be done in a better way. If
you look into the area of stringToNode(), stringToNodeWithLocations(),
and stringToNodeInternal(), there already is support for selectively
resetting or omitting location fields. Notably, this works with the
existing automated knowledge of where the location fields are and
doesn't require a new hand-maintained table. I think the way forward
here would be to apply a similar toggle to nodeToString() (the reverse).

0003: add default marking on typmod fields.
0004 & 0006: various node fields marked with default() based on
observed common or initial values of those fields

I think we could get about half the benefit here more automatically, by
creating a separate type for typmods, like

typedef int32 TypMod;

and then having the node support automatically generate the
serialization support with a -1 default.

(A similar thing could be applied to the location fields, which would
allow us to get rid of the current hack of parsing out the name.)

Most of the other defaults I'm doubtful about. First, we are colliding
here between the goals of minimizing the storage size and making the
debug output more readable. If a Query dump would now omit the
commandType field if it is CMD_SELECT, I think that would be widely
confusing, and one would need to check the source code to identify the
reason. Also, what if we later decide to change a "default" for a
field. Then output between version would differ. Of course, node
output does change between versions in general, but these kinds of
differences would be confusing. Second, this relies on hand-maintained
annotations that were created by you presumably through a combination of
intuition and testing, based on what is in the template database. Do we
know whether this matches real-world queries created by users later?
Also, my experience dealing with the node support over the last little
while is that these manually maintained exceptions get ossified and
outdated and create a maintenance headache for the future.

0005: truncate trailing 0s from outDatum

Does this significantly affect anything other than the "name" type?
User views don't usually use the "name" type, so this would have limited
impact outside of system views.

0007 (new): do run-length + gap coding for bitmapset and the various
integer list types. This saves a surprising amount of bytes.

Can you show examples of this? How would this affects the ability to
manually interpret the output?

#11

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#10)

Re: Reducing output size of nodeToString

On Tue, 9 Jan 2024, 09:23 Peter Eisentraut, <peter@eisentraut.org> wrote:

On 04.01.24 00:23, Matthias van de Meent wrote:

Something like the attached? It splits out into the following
0001: basic 'omit default values'
/* Write an integer field (anything written as ":fldname %d") */
-#define WRITE_INT_FIELD(fldname) \
+#define WRITE_INT_FIELD_DIRECT(fldname) \
appendStringInfo(str, " :" CppAsString(fldname) " %d",
node->fldname)
+#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
+       ((node->fldname == default) ? (0) : WRITE_INT_FIELD_DIRECT(fldname))
+#define WRITE_INT_FIELD(fldname) \
+       WRITE_INT_FIELD_DEFAULT(fldname, 0)
Do we need the _DIRECT macros at all? Could we not combine that into
the _DEFAULT ones?

I was planning on using them to reduce the size of generated code for
select fields that we know we will always serialize, but then later
decided against doing that in this patch as it'd add even more
arbitrary annotations to nodes. This is a leftover from that.

I think the way the conditional operator (?:) is written is not
technically correct C,
[...]
I think it would be better to write this
in a more straightforward way like

#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
do { \
[...]
while (0)

Relatedly, this

+/* a scaffold function to read an optionally-omitted field */
[...]
would need to be written with a do { } while (0) wrapper around it.

I'll fix that.

0002: reset location and other querystring-related node fields for all
catalogs of type pg_node_tree.

This goal makes sense, but I think it can be done in a better way. If
you look into the area of stringToNode(), stringToNodeWithLocations(),
and stringToNodeInternal(), there already is support for selectively
resetting or omitting location fields. Notably, this works with the
existing automated knowledge of where the location fields are and
doesn't require a new hand-maintained table. I think the way forward
here would be to apply a similar toggle to nodeToString() (the reverse).

I'll try to work something out for that.

0003: add default marking on typmod fields.
0004 & 0006: various node fields marked with default() based on
observed common or initial values of those fields

I think we could get about half the benefit here more automatically, by
creating a separate type for typmods, like

typedef int32 TypMod;

and then having the node support automatically generate the
serialization support with a -1 default.

Hm, I suspect that the code churn for that would be significant. I'd
also be confused when the type in storage (pg_attribute, pg_type's
typtypmod) is still int32 when it would be TypMod only in nodes.

(A similar thing could be applied to the location fields, which would
allow us to get rid of the current hack of parsing out the name.)

I suppose so.

Most of the other defaults I'm doubtful about. First, we are colliding
here between the goals of minimizing the storage size and making the
debug output more readable.

I've never really wanted to make the output "more readable". The
current one is too verbose, yes.

If a Query dump would now omit the
commandType field if it is CMD_SELECT, I think that would be widely
confusing, and one would need to check the source code to identify the
reason.

AFAIK, SELECT is the only command type you can possibly store in a
view (insert/delete/update/utility are all invalid there, and while
I'm not fully certain about MERGE, I'd say it's certainly a niche).

Also, what if we later decide to change a "default" for a
field. Then output between version would differ. Of course, node
output does change between versions in general, but these kinds of
differences would be confusing.

I've not heard of anyone trying to read and compare the contents of
pg_node_tree manually where they're not trying to debug some
deep-nested issue. Note

Second, this relies on hand-maintained
annotations that were created by you presumably through a combination of
intuition and testing, based on what is in the template database. Do we
know whether this matches real-world queries created by users later?

No, or at least I don't know this for certain. But I think it's a good start.

Also, my experience dealing with the node support over the last little
while is that these manually maintained exceptions get ossified and
outdated and create a maintenance headache for the future.

I'm not sure what headache this would become. nodeToString is a fairly
straightforward API with (AFAIK) no external dependencies, where only
nodes go in and out. The metadata on top of that will indeed require
some maintenance, but AFAIK only in the areas that read and utilize
said metadata. While it certainly wouldn't be great if we didn't have
this metadata, it'd be no worse than not having compression.

0005: truncate trailing 0s from outDatum

Does this significantly affect anything other than the "name" type?
User views don't usually use the "name" type, so this would have limited
impact outside of system views.

It saves a few bytes each on byval types like bool, oid, and int on
little-endian systems, as they don't utilize the latter bytes of the
4- or 8-byte Datum. At least in the default catalog this shaves some
bytes off.

0007 (new): do run-length + gap coding for bitmapset and the various
integer list types. This saves a surprising amount of bytes.

Can you show examples of this? How would this affects the ability to
manually interpret the output?

The ability to interpret the results manually is somewhat reduced for
complex cases (bitmaps), but things like RangeTableEntries are
significantly reduced in size because of this. A good amount of
IntegerLists is reduced to (i 1 +10) instead of (i 1 2 3 4 5 ... 11).
Specifically notable are the joinleftcols/joinrightcols fields, as
they will often contain large lists of joined columns when many tables
are joined together. While bitmaps are less prevalent/large, they also
benefit from this optimization.
As for bitmapsets, the use of differential coding saves bytes when the
set is large or otherwise has structure: the bitmapset of uneven
numbers (b 1 3 5 7 ... 23 25 27 ... 101 103 ...) takes up more space
(and is less compressible than) the equivalent differential coded (b 1
2 2 2 2 ...). This is at the cost of direct readability, but I think
that's worth it.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#12

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Matthias van de Meent (#11)

Re: Reducing output size of nodeToString

On 30.01.24 12:26, Matthias van de Meent wrote:

Most of the other defaults I'm doubtful about. First, we are colliding
here between the goals of minimizing the storage size and making the
debug output more readable.

I've never really wanted to make the output "more readable". The
current one is too verbose, yes.

My motivations at the moment to work in this area are (1) to make the
output more readable, and (2) to reduce maintenance burden of node
support functions.

There can clearly be some overlap with your goals. For example, a less
verbose and less redundant output can ease readability. But it can also
go the opposite direction; a very minimalized output can be less readable.

I would like to understand your target more. You have shown some
figures how these various changes reduce storage size in pg_rewrite.
But it's a few hundred kilobytes, if I read this correctly, maybe some
megabytes if you add a lot of user views. Does this translate into any
other tangible benefits, like you can store more views, or processing
views is faster, or something like that?

#13

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#12)

Re: Reducing output size of nodeToString

On Wed, 31 Jan 2024, 09:16 Peter Eisentraut, <peter@eisentraut.org> wrote:

On 30.01.24 12:26, Matthias van de Meent wrote:

Most of the other defaults I'm doubtful about. First, we are colliding
here between the goals of minimizing the storage size and making the
debug output more readable.

I've never really wanted to make the output "more readable". The
current one is too verbose, yes.

My motivations at the moment to work in this area are (1) to make the
output more readable, and (2) to reduce maintenance burden of node
support functions.

There can clearly be some overlap with your goals. For example, a less
verbose and less redundant output can ease readability. But it can also
go the opposite direction; a very minimalized output can be less readable.

I would like to understand your target more. You have shown some
figures how these various changes reduce storage size in pg_rewrite.
But it's a few hundred kilobytes, if I read this correctly, maybe some
megabytes if you add a lot of user views. Does this translate into any
other tangible benefits, like you can store more views, or processing
views is faster, or something like that?

I was also thinking about smaller per-attribute expression storage, for
index attribute expressions, table default expressions, and functions.
Other than that, less memory overhead for the serialized form of these
constructs also helps for catalog cache sizes, etc.
People complained about the size of a fresh initdb, and I agreed with them,
so I started looking at low-hanging fruits, and this is one.

I've not done any tests yet on whether it's more performant in general. I'd
expect the new code to do a bit better given the extremely verbose nature
of the data and the rather complex byte-at-a-time token read method used,
but this is currently hypothesis.
I do think that serialization itself may be slightly slower, but given that
this generally happens only in DDL, and that we have to grow the output
buffer less often, this too may still be a net win (but, again, this is an
untested hypothesis).

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#14

Robert Haas

robertmhaas@gmail.com

over 2 years ago

In reply to: Matthias van de Meent (#13)

Re: Reducing output size of nodeToString

On Wed, Jan 31, 2024 at 11:17 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I was also thinking about smaller per-attribute expression storage, for index attribute expressions, table default expressions, and functions. Other than that, less memory overhead for the serialized form of these constructs also helps for catalog cache sizes, etc.
People complained about the size of a fresh initdb, and I agreed with them, so I started looking at low-hanging fruits, and this is one.

I've not done any tests yet on whether it's more performant in general. I'd expect the new code to do a bit better given the extremely verbose nature of the data and the rather complex byte-at-a-time token read method used, but this is currently hypothesis.
I do think that serialization itself may be slightly slower, but given that this generally happens only in DDL, and that we have to grow the output buffer less often, this too may still be a net win (but, again, this is an untested hypothesis).

I think we're going to have to have separate formats for debugging and
storage if we want to get very far here. The current format sucks for
readability because it's so verbose, and tightening that up where we
can makes sense to me. For me, that can include things like emitting
unset location fields for sure, but delta-encoding of bitmap sets is
more questionable. Turning 1 2 3 4 5 6 7 8 9 10 into 1-10 would be
fine with me because that is both shorter and more readable, but
turning 2 4 6 8 10 into 2 2 2 2 2 is way worse for a human reader.
Such optimizations might make sense in a format that is designed for
computer processing only but not one that has to serve multiple
purposes.

--
Robert Haas
EDB: http://www.enterprisedb.com

#15

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Robert Haas (#14)

Re: Reducing output size of nodeToString

On Wed, 31 Jan 2024 at 18:47, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 31, 2024 at 11:17 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I was also thinking about smaller per-attribute expression storage, for index attribute expressions, table default expressions, and functions. Other than that, less memory overhead for the serialized form of these constructs also helps for catalog cache sizes, etc.
People complained about the size of a fresh initdb, and I agreed with them, so I started looking at low-hanging fruits, and this is one.

I've not done any tests yet on whether it's more performant in general. I'd expect the new code to do a bit better given the extremely verbose nature of the data and the rather complex byte-at-a-time token read method used, but this is currently hypothesis.
I do think that serialization itself may be slightly slower, but given that this generally happens only in DDL, and that we have to grow the output buffer less often, this too may still be a net win (but, again, this is an untested hypothesis).

I think we're going to have to have separate formats for debugging and
storage if we want to get very far here. The current format sucks for
readability because it's so verbose, and tightening that up where we
can makes sense to me. For me, that can include things like emitting
unset location fields for sure, but delta-encoding of bitmap sets is
more questionable. Turning 1 2 3 4 5 6 7 8 9 10 into 1-10 would be
fine with me because that is both shorter and more readable, but
turning 2 4 6 8 10 into 2 2 2 2 2 is way worse for a human reader.
Such optimizations might make sense in a format that is designed for
computer processing only but not one that has to serve multiple
purposes.

I suppose so, yes. I've removed the delta-encoding from the
serialization of bitmapsets in the attached patchset.

Peter E. and I spoke about this patchset at FOSDEM PGDay, too. I said
to him that I wouldn't mind if this patchset was only partly applied:
The gains for most of the changes are definitely worth it even if some
others don't get in.

I think it'd be a nice QoL and storage improvement if even only (say)
the first two patches were committed, though the typmod default
markings (or alternatively, using a typedef-ed TypMod and one more
type-specific serialization handler) would also be a good improvement
without introducing too many "common value = default = omitted"
considerations that would reduce debugability.

Attached is patchset v2, which contains the improvements from these patches:

0001 has the "omit defaults" for the current types.
-23.5%pt / -35.1%pt (toasted / raw)
0002+0003 has new #defined type "Location" for those fields in Nodes
that point into (or have sizes of) query texts, and adds
infrastructure to conditionally omit them at all (see previous
discussions)
-3.5%pt / -6.3%pt
0004 has new #defined type TypeMod as alias for int32, that uses a
default value of -1 for (de)serialization purposes.
-3.0%pt / -6.1%pt
0005 updates Const node serialization to omit `:constvalue` if the
value is null.
+0.1%pt / -0.1%pt [^0]
0006 does run-length encoding for bitmaps and the various typed
integer lists, using "+int" as indicators of a run of a certain
length, excluding the start value.
Bitmaps, IntLists and XidLists are based on runs with increments
of 1 (so, a notation (i 1 +3) means (i 1 2 3 4), while OidLists are
based on runs with no increments (so, (o 1 +3) means (o 1 1 1 1).
-2.5%pt / -0.6%pt
0007 does add some select custom 'default' values, in that the
varnosyn and varattnosyn fields now treat the value of varno and
varattno as their default values.
This reduces the size of lists of Vars significantly and has a
very meaningful impact on the size of the compressed data (the default
pg_rewrite dataset contains some 10.8k Var nodes).
-10.4%pt / 9.7%pt

Total for the full applied patchset:
55.5% smaller data in pg_rewrite.ev_action before TOAST
45.7% smaller data in pg_rewrite.ev_action after applying TOAST

Toast relation size, as fraction of the main pg_rewrite table:
select pg_relation_size(2838) *1.0 / pg_relation_size('pg_rewrite');
master: 4.7
0007: 1.3

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[^0]: The small difference in size for patch 0005 is presumably due to
low occurrance of NULL-valued Const nodes. Additionally, the inline vs
out-of-line TOASTed data and data (not) fitting on the last blocks of
each relation are likely to cause the change in total compression
ratio. If we had more null-valued Const nodes in pg_rewrite, the ratio
would presumably have been better than this.

PS: query I used for my data collection, + combined data:

select 'master' as "version"
, pg_database_size('template0') as "template0"
, pg_total_relation_size('pg_rewrite') as "pg_rewrite"
, sum(pg_column_size(ev_action)) as "toasted"
, sum(octet_length(ev_action)) as "raw";

version | template0 | pg_rewrite | toasted | raw
---------+-----------+------------+---------+---------
master | 7537167 | 770048 | 574003 | 3002556
0001 | 7348751 | 630784 | 438852 | 1946364
0002 | 7242255 | 573440 | 403160 | 1840404
0003 | 7242255 | 573440 | 402325 | 1838367
0004 | 7225871 | 557056 | 384888 | 1652287
0005 | 7234063 | 565248 | 385678 | 1648717
0006 | 7217679 | 548864 | 371256 | 1627733
0007 | 7143951 | 475136 | 311255 | 1337496

#16

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Matthias van de Meent (#15)

Re: Reducing output size of nodeToString

On Mon, 12 Feb 2024 at 19:03, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Attached is patchset v2, which contains the improvements from these patches:

Attached v3, which fixes an out-of-bounds read in pg_strtoken_next,
detected by asan, that was a likely cause of the problems in CFBot's
FreeBSD regression tests.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#17

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Matthias van de Meent (#16)

Re: Reducing output size of nodeToString

On Mon, 12 Feb 2024 at 20:32, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Mon, 12 Feb 2024 at 19:03, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Attached is patchset v2, which contains the improvements from these patches:

Attached v3, which fixes an out-of-bounds read in pg_strtoken_next,
detected by asan, that was a likely cause of the problems in CFBot's
FreeBSD regression tests.

Apparently that was caused by issues in my updated bitmapset
serializer; where I used bms_next_member(..., x=0) as first iteration
thus skipping the first bit. This didn't show up earlier because that
bit is not exercised in PG's builtin views, but is exercised when
WRITE_READ_PARSE_PLAN_TREES is defined (as on the FreeBSD CI job).

Trivial fix in the attached v4 of the patchset, with some fixes for
other assertions that'd get some exercise in non-pg_node_tree paths in
the WRITE_READ configuration.

#18

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Matthias van de Meent (#17)

Re: Reducing output size of nodeToString

Thanks, this patch set is a good way to incrementally work through these
changes.

I have looked at
v4-0001-pg_node_tree-Omit-serialization-of-fields-with-de.patch today.
Here are my thoughts:

I believe we had discussed offline to not omit enum fields with value 0
(WRITE_ENUM_FIELD). This is because the values of enum fields are
implementation artifacts, and this could be confusing for readers.
(This could be added as a squeeze-out-every-byte change later, but if
we're going to keep the format fit for human reading, I think we should
skip this.)

I have some concerns about the round-trippability of float values. If
we do, effectively, if (node->fldname != 0.0), then I think this would
also match negative zero, but when we read it back it, it would get
assigned positive zero. Maybe there are other edge cases like this.
Might be safer to not mess with this.

On the reading side, the macro nesting has gotten a bit out of hand. :)
We had talked earlier in the thread about the _DIRECT macros and you
said there were left over from something else you want to try, but I see
nothing else in this patch set uses this. I think this could all be
much simpler, like (omitting required punctuation)

#define READ_INT_FIELD(fldname, default)
if ((token = next_field(fldname, &length)))
local_node->fldname = atoi(token);
else
local_node->fldname = default;

where next_field() would

1. read the next token
2. if it is ":fldname", continue;
else rewind the read pointer and return NULL
3. read the next token and return that

Not only is this simpler, but it might also have better performance,
because we don't have separate pg_strtok_next() and pg_strtok() calls in
sequence.

#19

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#18)

Re: Reducing output size of nodeToString

On Thu, 15 Feb 2024 at 13:59, Peter Eisentraut <peter@eisentraut.org> wrote:

Thanks, this patch set is a good way to incrementally work through these
changes.

I have looked at
v4-0001-pg_node_tree-Omit-serialization-of-fields-with-de.patch today.
Here are my thoughts:

I believe we had discussed offline to not omit enum fields with value 0
(WRITE_ENUM_FIELD). This is because the values of enum fields are
implementation artifacts, and this could be confusing for readers.

Thanks for reminding me, I didn't remember this when I worked on
updating the patchset. I'll update this soon.

I have some concerns about the round-trippability of float values. If
we do, effectively, if (node->fldname != 0.0), then I think this would
also match negative zero, but when we read it back it, it would get
assigned positive zero. Maybe there are other edge cases like this.
Might be safer to not mess with this.

That's a good point. Would an additional check that the sign of the
field equals the default's sign be enough for this? As for other
cases, I'm not sure we currently want to support non-normal floats,
even if it is technically possible to do the round-trip in the current
format.

On the reading side, the macro nesting has gotten a bit out of hand. :)
We had talked earlier in the thread about the _DIRECT macros and you
said there were left over from something else you want to try, but I see
nothing else in this patch set uses this. I think this could all be
much simpler, like (omitting required punctuation)

[...]

Not only is this simpler, but it might also have better performance,
because we don't have separate pg_strtok_next() and pg_strtok() calls in
sequence.

Good points. I'll see what I can do here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#20

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Matthias van de Meent (#19)

Re: Reducing output size of nodeToString

On Thu, 15 Feb 2024 at 15:37, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Thu, 15 Feb 2024 at 13:59, Peter Eisentraut <peter@eisentraut.org> wrote:

Thanks, this patch set is a good way to incrementally work through these
changes.

I have looked at
v4-0001-pg_node_tree-Omit-serialization-of-fields-with-de.patch today.
Here are my thoughts:

I believe we had discussed offline to not omit enum fields with value 0
(WRITE_ENUM_FIELD). This is because the values of enum fields are
implementation artifacts, and this could be confusing for readers.

Thanks for reminding me, I didn't remember this when I worked on
updating the patchset. I'll update this soon.

This has been split into patch 0008 in the set. A query on ev_action
shows that enum default-0-omission is effective on 1994 fields:

On the reading side, the macro nesting has gotten a bit out of hand. :)
We had talked earlier in the thread about the _DIRECT macros and you
said there were left over from something else you want to try, but I see
nothing else in this patch set uses this. I think this could all be
much simpler, like (omitting required punctuation)

[...]

Not only is this simpler, but it might also have better performance,
because we don't have separate pg_strtok_next() and pg_strtok() calls in
sequence.

Good points. I'll see what I can do here.

Attached the updated version of the patch on top of 5497daf3, which
incorporates this last round of feedback. It moves the
default-0-omission for Enums to newly added 0008, and checks the sign
to deal with +0/-0 issues in float default checks.
See below for updated numbers.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

New numbers:

select 'master' as "version"
, pg_database_size('template0') as "template0"
, pg_total_relation_size('pg_rewrite') as "rel_total"
, pg_relation_size('pg_rewrite', 'main') as "rel_main"
, sum(pg_column_size(ev_action)) as "toasted"
, sum(octet_length(ev_action)) as "raw"
from pg_rewrite;

version | template0 | rel_total | rel_main | toasted | raw
---------+-----------+-----------+----------+---------+---------
master | 7528975 | 770048 | 114688 | 574051 | 3002981
0001 | 7348751 | 630784 | 131072 | 448495 | 1972854
0002 | 7250447 | 589824 | 131072 | 412261 | 1866880
0003 | 7242255 | 581632 | 131072 | 410476 | 1864843
0004 | 7225871 | 565248 | 139264 | 393801 | 1678735
0005 | 7225871 | 565248 | 139264 | 393556 | 1675165
0006 | 7217679 | 557056 | 139264 | 379062 | 1654178
0007 | 7160335 | 491520 | 155648 | 322145 | 1363885
0008 | 7135759 | 475136 | 155648 | 311294 | 1337649