Refactoring pg_dump's getTables()

Started by Tom Laneover 4 years ago6 messageshackers

tgl@sss.pgh.pa.us

over 4 years ago

Attached is a proposed patch that refactors getTables() along the
same lines as some previous work (eg 047329624, ed2c7f65b, daa9fe8a5)
to avoid having multiple partially-redundant copies of the SQL query.
This gets rid of nearly 300 lines of duplicative spaghetti code,
creates a uniform style for dealing with cross-version changes
(replacing at least three different methods currently being used
for that in this same stretch of code), and allows moving some
comments to be closer to the code they describe.

There's a lot I still want to change here, but this part seems like it
should be fairly uncontroversial. I've tested it against servers back
to 8.0 (which is what led me to trip over the bug fixed in 40dfac4fc).
Any objections to just pushing it?

regards, tom lane

Alvaro Herrera

alvherre@2ndquadrant.com

over 4 years ago

In reply to: Tom Lane (#1)

Re: Refactoring pg_dump's getTables()

On 2021-Oct-16, Tom Lane wrote:

Attached is a proposed patch that refactors getTables() along the
same lines as some previous work (eg 047329624, ed2c7f65b, daa9fe8a5)
to avoid having multiple partially-redundant copies of the SQL query.
This gets rid of nearly 300 lines of duplicative spaghetti code,
creates a uniform style for dealing with cross-version changes
(replacing at least three different methods currently being used
for that in this same stretch of code), and allows moving some
comments to be closer to the code they describe.

Yeah, this seems a lot better than the original coding. Maybe I would
group together the changes that all require the same version test,
rather than keeping the output columns in the same order. This reduces
the number of branches. Because the follow-on code uses column names
rather than numbers, there is no reason to keep related columns
together. But it's a clear improvement even without that.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Alvaro Herrera (#2)

Re: Refactoring pg_dump's getTables()

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Yeah, this seems a lot better than the original coding. Maybe I would
group together the changes that all require the same version test,
rather than keeping the output columns in the same order. This reduces
the number of branches. Because the follow-on code uses column names
rather than numbers, there is no reason to keep related columns
together. But it's a clear improvement even without that.

Yeah, I thought about rearranging the code order some more, but
desisted since it'd make the patch footprint a bit bigger (I'd
want to make all the related stanzas list things in a uniform
order). But maybe we should just do that.

regards, tom lane

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Alvaro Herrera (#2)

Re: Refactoring pg_dump's getTables()

On 17 Oct 2021, at 22:05, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Oct-16, Tom Lane wrote:

Attached is a proposed patch that refactors getTables() along the
same lines as some previous work (eg 047329624, ed2c7f65b, daa9fe8a5)
to avoid having multiple partially-redundant copies of the SQL query.
This gets rid of nearly 300 lines of duplicative spaghetti code,
creates a uniform style for dealing with cross-version changes
(replacing at least three different methods currently being used
for that in this same stretch of code), and allows moving some
comments to be closer to the code they describe.

Yeah, this seems a lot better than the original coding.

Maybe I would group together the changes that all require the same version
test, rather than keeping the output columns in the same order.

I agree with that, if we're doing all this we might as well go all the way for
readability.

--
Daniel Gustafsson https://vmware.com/

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Daniel Gustafsson (#4)

Re: Refactoring pg_dump's getTables()

Daniel Gustafsson <daniel@yesql.se> writes:

On 17 Oct 2021, at 22:05, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Maybe I would group together the changes that all require the same version
test, rather than keeping the output columns in the same order.

I agree with that, if we're doing all this we might as well go all the way for
readability.

OK, I'll make it so.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Daniel Gustafsson (#4)

Re: Refactoring pg_dump's getTables()

Daniel Gustafsson <daniel@yesql.se> writes:

On 17 Oct 2021, at 22:05, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Maybe I would group together the changes that all require the same version
test, rather than keeping the output columns in the same order.

I agree with that, if we're doing all this we might as well go all the way for
readability.

I had a go at doing that, but soon decided that it wasn't as great an
idea as it first seemed. There are two problems:

1. It's not clear what to do with fields where there are three or more
variants, such as reloptions and checkoptions.

2. Any time we modify the behavior for a particular field, we'd
have to merge or un-merge it from the stanza for the
previously-most-recently-relevant version. This seems like it'd
be a maintenance headache and make patch footprints bigger than they
needed to be.

So I ended up not doing very much merging. I did make an effort
to group the fields in perhaps a slightly more logical order
than before.

regards, tom lane

Refactoring pg_dump's getTables()

Attachments: