Speeding up ruleutils' name de-duplication code, redux

Started by Tom Lane, over 1 year ago, 5 messages, pgsql-hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

When deparsing queries or expressions, ruleutils.c has to generate
unique names for RTEs and columns of RTEs. (Often, they're unique
already, but this isn't reliably true.) The original logic for that
involved just strcmp'ing a proposed name against all the ones already
assigned, which obviously is O(N^2) in the number of names being
considered. Back in commit 8004953b5, we fixed that problem for
generation of unique RTE names, by using a hash table to remember the
already-assigned names. However, that commit did not touch the logic
for de-duplicating the column names within each RTE, explaining

In principle the same problem applies to the column-name-de-duplication
code; but in practice that seems to be less of a problem, first because
N is limited since we don't support extremely wide tables, and second
because duplicate column names within an RTE are fairly rare, so that in
practice the cost is more like O(N^2) not O(N^3). It would be very much
messier to fix the column-name code, so for now I've left that alone.

But I think the time has come to do something about it. In [1]
I presented this Perl script to generate a database that gives
pg_upgrade indigestion:

-----
for (my $i = 0; $i < 100; $i++)
{
    print "CREATE TABLE test_inh_check$i (\n";
    for (my $j = 0; $j < 1000; $j++)
    {
        print "a$j float check (a$j > 10.2),\n";
    }
    print "b float);\n";
    print "CREATE TABLE test_inh_check_child$i() INHERITS(test_inh_check$i);\n";
}
-----

On my development machine, it takes over 14 minutes to pg_upgrade
this, and it turns out that that time is largely spent in column
name de-duplication while deparsing the CHECK constraints. The
attached patch reduces that to about 3m45s.

(I think that we ought to reconsider MergeConstraintsIntoExisting's
use of deparsing to compare check constraints: it'd be faster and
probably more reliable to apply attnum translation to one parsetree
and then use equal(). But that's a matter for a different patch, and
this patch would still be useful for the pg_dump side of the problem.)

I was able to avoid a lot of the complexity I'd feared before by not
attempting to use hashing during set_using_names(), which only has to
consider columns merged by USING clauses, so it shouldn't have enough
of a performance problem to be worth touching. The hashing code needs
to be optional anyway because it's unlikely to be a win for narrow
tables, so we can simply ignore it until we reach the potentially
expensive steps. Also, things are already factored in such a way that
we only need to have one hashtable at a time, so this shouldn't cause
any large memory bloat.

I'll park this in the next CF.

regards, tom lane

[1]: /messages/by-id/2422717.1722201869@sss.pgh.pa.us

Attachments:

v1-speed-up-column-name-deduplication.patch (text/x-diff; charset=us-ascii) [+165/-23]
#2 David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#1)
Re: Speeding up ruleutils' name de-duplication code, redux

On Tue, 30 Jul 2024 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> -----
> for (my $i = 0; $i < 100; $i++)
> {
>     print "CREATE TABLE test_inh_check$i (\n";
>     for (my $j = 0; $j < 1000; $j++)
>     {
>         print "a$j float check (a$j > 10.2),\n";
>     }
>     print "b float);\n";
>     print "CREATE TABLE test_inh_check_child$i() INHERITS(test_inh_check$i);\n";
> }
> -----
>
> On my development machine, it takes over 14 minutes to pg_upgrade
> this, and it turns out that that time is largely spent in column
> name de-duplication while deparsing the CHECK constraints. The
> attached patch reduces that to about 3m45s.

I think this is worth doing. Reducing the --schema-only time in
pg_dump is a worthy goal to reduce downtime during upgrades.

I looked at the patch and tried it out. I wondered about the choice of
32 as the cut-off point so decided to benchmark using the attached
script. Here's an extract from the attached results:

Patched with 10 child tables
pg_dump for 16 columns real 0m0.068s
pg_dump for 31 columns real 0m0.080s
pg_dump for 32 columns real 0m0.083s

This gives me what I'd expect to see. I wanted to ensure the point
where you're switching to the hashing method was about the right
place. It seems to be, at least for my test.

The performance looks good too:

10 tables:
master: pg_dump for 1024 columns real 0m23.053s
patched: pg_dump for 1024 columns real 0m1.573s

100 tables:
master: pg_dump for 1024 columns real 3m29.857s
patched: pg_dump for 1024 columns real 0m23.053s

Perhaps you don't think it's worth the additional complexity, but I
see that in both locations you're calling build_colinfo_names_hash(),
it's done just after a call to expand_colnames_array_to(). I wondered
if it was worthwhile unifying both of those functions maybe with a new
name so that you don't need to loop over the always NULL element of
the colnames[] array when building the hash table. This is likely
quite a small overhead compared to the quadratic search you've
removed, so it might not move the needle any. I just wanted to point
it out as I've little else I can find to comment on.

David

Attachments:

pgdump_bench.sh.txt (text/plain; charset=US-ASCII)
pgdump_bench_results.txt (text/plain; charset=US-ASCII)
#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#2)
Re: Speeding up ruleutils' name de-duplication code, redux

David Rowley <dgrowleyml@gmail.com> writes:

> On Tue, 30 Jul 2024 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On my development machine, it takes over 14 minutes to pg_upgrade
>> this, and it turns out that that time is largely spent in column
>> name de-duplication while deparsing the CHECK constraints. The
>> attached patch reduces that to about 3m45s.
>
> I looked at the patch and tried it out.

Thanks for looking!

> This gives me what I'd expect to see. I wanted to ensure the point
> where you're switching to the hashing method was about the right
> place. It seems to be, at least for my test.

Yeah, I was just going by gut feel there. It's good to have some
numbers showing it's not a totally silly choice.

> Perhaps you don't think it's worth the additional complexity, but I
> see that in both locations you're calling build_colinfo_names_hash(),
> it's done just after a call to expand_colnames_array_to(). I wondered
> if it was worthwhile unifying both of those functions maybe with a new
> name so that you don't need to loop over the always NULL element of
> the colnames[] array when building the hash table. This is likely
> quite a small overhead compared to the quadratic search you've
> removed, so it might not move the needle any. I just wanted to point
> it out as I've little else I can find to comment on.

Hmm, but there are quite a few expand_colnames_array_to calls that
are not associated with build_colinfo_names_hash. On the whole it
feels like those are separate concerns that are better kept separate.

We could accomplish what you suggest by re-ordering the calls so that
we build the hash table before enlarging the array. 0001 attached
is the same as before (modulo line number changes from being rebased
up to HEAD) and then 0002 implements this idea on top. On the whole
though I find 0002 fairly ugly and would prefer to stick to 0001.
I really doubt that scanning any newly-created column positions is
going to take long enough to justify intertwining things like this.

regards, tom lane

Attachments:

v2-0001-speed-up-column-name-deduplication.patch (text/x-diff; charset=us-ascii) [+165/-23]
v2-0002-avoid-useless-scanning.patch (text/x-diff; charset=us-ascii) [+13/-16]
#4 David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#3)
Re: Speeding up ruleutils' name de-duplication code, redux

On Wed, 11 Sept 2024 at 03:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> We could accomplish what you suggest by re-ordering the calls so that
> we build the hash table before enlarging the array. 0001 attached
> is the same as before (modulo line number changes from being rebased
> up to HEAD) and then 0002 implements this idea on top. On the whole
> though I find 0002 fairly ugly and would prefer to stick to 0001.
> I really doubt that scanning any newly-created column positions is
> going to take long enough to justify intertwining things like this.

I'm fine with that. I did test the performance with and without
v2-0002 and the performance is just a little too noisy to tell. Both
runs I did with v2-0002, it was slower, so I agree it's not worth
making the code uglier for.

I've no more comments. Looks good.

David

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#4)
Re: Speeding up ruleutils' name de-duplication code, redux

David Rowley <dgrowleyml@gmail.com> writes:

> On Wed, 11 Sept 2024 at 03:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We could accomplish what you suggest by re-ordering the calls so that
>> we build the hash table before enlarging the array. 0001 attached
>> is the same as before (modulo line number changes from being rebased
>> up to HEAD) and then 0002 implements this idea on top. On the whole
>> though I find 0002 fairly ugly and would prefer to stick to 0001.
>> I really doubt that scanning any newly-created column positions is
>> going to take long enough to justify intertwining things like this.
>
> I'm fine with that. I did test the performance with and without
> v2-0002 and the performance is just a little too noisy to tell. Both
> runs I did with v2-0002, it was slower, so I agree it's not worth
> making the code uglier for.
>
> I've no more comments. Looks good.

Thanks for the review! I'll go push just 0001.

regards, tom lane