small pg_dump code cleanup

Started by Nathan Bossartalmost 2 years ago8 messageshackers
Jump to latest
#1Nathan Bossart
nathandbossart@gmail.com

While reviewing Daniel's pg_dump patch [0]/messages/by-id/8F1F1E1D-D17B-4B33-B014-EDBCD15F3F0B@yesql.se, I was initially confused
because the return value of getTypes() isn't saved anywhere. Once I found
commit 92316a4, I realized that data was actually stored in a separate hash
table. In fact, many of the functions in this area don't actually need to
return anything, so we can trim some code and hopefully reduce confusion a
bit. Patch attached.

[0]: /messages/by-id/8F1F1E1D-D17B-4B33-B014-EDBCD15F3F0B@yesql.se

--
nathan

Attachments:

v1-0001-Trim-some-unnecessary-pg_dump-code.patchtext/plain; charset=us-asciiDownload+113-288
#2Neil Conway
neilc@samurai.com
In reply to: Nathan Bossart (#1)
Re: small pg_dump code cleanup

On Wed, Jun 5, 2024 at 11:14 AM Nathan Bossart <nathandbossart@gmail.com>
wrote:

In fact, many of the functions in this area don't actually need to

return anything, so we can trim some code and hopefully reduce confusion a

bit. Patch attached.

Nice cleanup! Two minor comments:

(1) Names like `getXXX` for these functions suggest to me that they return
a value, rather than side-effecting. I realize some variants continue to
return a value, but the majority no longer do. Perhaps a name like
lookupXXX() or readXXX() would be clearer?

(2) These functions malloc() a single ntups * sizeof(struct) allocation and
then index into it to fill-in each struct before entering it into the hash
table. It might be more straightforward to just malloc each individual
struct.

Neil

#3Nathan Bossart
nathandbossart@gmail.com
In reply to: Neil Conway (#2)
Re: small pg_dump code cleanup

On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote:

Nice cleanup! Two minor comments:

Thanks for taking a look.

(1) Names like `getXXX` for these functions suggest to me that they return
a value, rather than side-effecting. I realize some variants continue to
return a value, but the majority no longer do. Perhaps a name like
lookupXXX() or readXXX() would be clearer?

What about collectXXX() to match similar functions in pg_dump.c (e.g.,
collectRoleNames(), collectComments(), collectSecLabels())?

(2) These functions malloc() a single ntups * sizeof(struct) allocation and
then index into it to fill-in each struct before entering it into the hash
table. It might be more straightforward to just malloc each individual
struct.

That'd increase the number of allocations quite significantly, but I'd be
surprised if that was noticeable outside of extreme scenarios. At the
moment, I'm inclined to leave these as-is for this reason and because I
doubt it'd result in much cleanup, but I'll yield to the majority opinion
here.

--
nathan

#4Neil Conway
neilc@samurai.com
In reply to: Nathan Bossart (#3)
Re: small pg_dump code cleanup

On Wed, Jun 5, 2024 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

What about collectXXX() to match similar functions in pg_dump.c (e.g.,
collectRoleNames(), collectComments(), collectSecLabels())?

sgtm.

(2) These functions malloc() a single ntups * sizeof(struct) allocation

and

then index into it to fill-in each struct before entering it into the

hash

table. It might be more straightforward to just malloc each individual
struct.

That'd increase the number of allocations quite significantly, but I'd be
surprised if that was noticeable outside of extreme scenarios. At the
moment, I'm inclined to leave these as-is for this reason and because I
doubt it'd result in much cleanup, but I'll yield to the majority opinion
here.

As you say, I'd be surprised if the performance difference is noticeable.
Personally I don't think the marginal performance win justifies the hit to
readability, but I don't feel strongly about it.

Neil

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#3)
Re: small pg_dump code cleanup

Nathan Bossart <nathandbossart@gmail.com> writes:

On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote:

(1) Names like `getXXX` for these functions suggest to me that they return
a value, rather than side-effecting. I realize some variants continue to
return a value, but the majority no longer do. Perhaps a name like
lookupXXX() or readXXX() would be clearer?

What about collectXXX() to match similar functions in pg_dump.c (e.g.,
collectRoleNames(), collectComments(), collectSecLabels())?

Personally I see nothing much wrong with leaving them as getXXX.

(2) These functions malloc() a single ntups * sizeof(struct) allocation and
then index into it to fill-in each struct before entering it into the hash
table. It might be more straightforward to just malloc each individual
struct.

That'd increase the number of allocations quite significantly, but I'd be
surprised if that was noticeable outside of extreme scenarios. At the
moment, I'm inclined to leave these as-is for this reason and because I
doubt it'd result in much cleanup, but I'll yield to the majority opinion
here.

I think that would be quite an invasive change; it would require
many hundreds of edits like

-		finfo[i].dobj.objType = DO_FUNC;
+		finfo->dobj.objType = DO_FUNC;

which aside from being tedious would create a back-patching hazard.
So I'm kind of -0.1 or so.

Another angle to this is that Coverity and possibly other tools tend
to report that these functions leak these allocations, apparently
because they don't notice that pointers into the allocations get
stored in hash tables by a subroutine. I'm not sure if making this
change would make that worse or better. If we really want to change
it, that might be worth checking somehow before we jump.

regards, tom lane

#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#5)
Re: small pg_dump code cleanup

On Wed, Jun 05, 2024 at 01:58:54PM -0400, Tom Lane wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote:

(2) These functions malloc() a single ntups * sizeof(struct) allocation and
then index into it to fill-in each struct before entering it into the hash
table. It might be more straightforward to just malloc each individual
struct.

That'd increase the number of allocations quite significantly, but I'd be
surprised if that was noticeable outside of extreme scenarios. At the
moment, I'm inclined to leave these as-is for this reason and because I
doubt it'd result in much cleanup, but I'll yield to the majority opinion
here.

I think that would be quite an invasive change; it would require
many hundreds of edits like

-		finfo[i].dobj.objType = DO_FUNC;
+		finfo->dobj.objType = DO_FUNC;

which aside from being tedious would create a back-patching hazard.
So I'm kind of -0.1 or so.

Another angle to this is that Coverity and possibly other tools tend
to report that these functions leak these allocations, apparently
because they don't notice that pointers into the allocations get
stored in hash tables by a subroutine. I'm not sure if making this
change would make that worse or better. If we really want to change
it, that might be worth checking somehow before we jump.

At the moment, I'm inclined to commit v1 once v18 development opens up. We
can consider any additional adjustments separately.

--
nathan

#7Daniel Gustafsson
daniel@yesql.se
In reply to: Nathan Bossart (#6)
Re: small pg_dump code cleanup

On 11 Jun 2024, at 22:30, Nathan Bossart <nathandbossart@gmail.com> wrote:

At the moment, I'm inclined to commit v1 once v18 development opens up. We
can consider any additional adjustments separately.

Patch LGTM and the tests pass, +1 on pushing this version.

--
Daniel Gustafsson

#8Nathan Bossart
nathandbossart@gmail.com
In reply to: Daniel Gustafsson (#7)
Re: small pg_dump code cleanup

Committed.

--
nathan