generic plans and "initial" pruning
Executing generic plans involving partitions is known to become slower
as partition count grows due to a number of bottlenecks, with
AcquireExecutorLocks() showing at the top in profiles.
A previous attempt at solving that problem was made by David Rowley [1],
where he proposed delaying locking of *all* partitions appearing under
an Append/MergeAppend until "initial" pruning is done during the
executor initialization phase. A problem with that approach that he
has described in [2] is that leaving partitions unlocked can lead to
race conditions where the Plan node belonging to a partition can be
invalidated when a concurrent session successfully alters the
partition between AcquireExecutorLocks() saying the plan is okay to
execute and then actually executing it.
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
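To make the idea concrete, here is a toy model (Python for brevity; Scan, Append, and collect_lock_relids are illustrative stand-ins, not PostgreSQL structures): relations under subplans discarded by the "initial" pruning predicate never make it into the set of relations to lock.

```python
from dataclasses import dataclass

@dataclass
class Scan:
    relid: int                  # relation to lock if this node survives

@dataclass
class Append:
    subplans: list              # child plan nodes
    initial_prune: object = None  # callable(index) -> bool, or None

def collect_lock_relids(node):
    """Collect relations to lock, honoring initial pruning steps."""
    if isinstance(node, Scan):
        return {node.relid}
    relids = set()
    for i, sub in enumerate(node.subplans):
        if node.initial_prune is not None and not node.initial_prune(i):
            continue            # pruned: skip locking this whole subtree
        relids |= collect_lock_relids(sub)
    return relids

# Append over 4 partitions; pruning keeps only subplan 2 (say, key = $1
# routes to the third partition).
plan = Append([Scan(100 + i) for i in range(4)],
              initial_prune=lambda i: i == 2)
```

With pruning, only relation 102 needs a lock; without the pruning step all four partitions would be locked, which is what AcquireExecutorLocks() does today.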
Note that "initial" pruning steps are now performed twice when
executing generic plans: once in AcquireExecutorLocks() to find
partitions to be locked, and a 2nd time in ExecInit[Merge]Append() to
determine the set of partition sub-nodes to be initialized for
execution, though I wasn't able to come up with a good idea to avoid
this duplication.
Using the following benchmark setup:
pgbench testdb -i --partitions=$nparts > /dev/null 2>&1
pgbench -n testdb -S -T 30 -Mprepared
and plan_cache_mode = force_generic_plan, I get the following numbers:
HEAD:
32 tps = 20561.776403 (without initial connection time)
64 tps = 12553.131423 (without initial connection time)
128 tps = 13330.365696 (without initial connection time)
256 tps = 8605.723120 (without initial connection time)
512 tps = 4435.951139 (without initial connection time)
1024 tps = 2346.902973 (without initial connection time)
2048 tps = 1334.680971 (without initial connection time)
Patched:
32 tps = 27554.156077 (without initial connection time)
64 tps = 27531.161310 (without initial connection time)
128 tps = 27138.305677 (without initial connection time)
256 tps = 25825.467724 (without initial connection time)
512 tps = 19864.386305 (without initial connection time)
1024 tps = 18742.668944 (without initial connection time)
2048 tps = 16312.412704 (without initial connection time)
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CAKJS1f_kfRQ3ZpjQyHC7=PK9vrhxiHBQFZ+hc0JCwwnRKkF3hg@mail.gmail.com
[2]: /messages/by-id/CAKJS1f99JNe+sw5E3qWmS+HeLMFaAhehKO67J1Ym3pXv0XBsxw@mail.gmail.com
Attachments:
v1-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch
On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09@gmail.com> wrote:
Executing generic plans involving partitions is known to become slower
as partition count grows due to a number of bottlenecks, with
AcquireExecutorLocks() showing at the top in profiles.

Previous attempt at solving that problem was by David Rowley [1],
where he proposed delaying locking of *all* partitions appearing under
an Append/MergeAppend until "initial" pruning is done during the
executor initialization phase. A problem with that approach that he
has described in [2] is that leaving partitions unlocked can lead to
race conditions where the Plan node belonging to a partition can be
invalidated when a concurrent session successfully alters the
partition between AcquireExecutorLocks() saying the plan is okay to
execute and then actually executing it.

However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
In which cases will we have "pre-execution" pruning instructions that
can be used to skip locking partitions? Can you please give a few
examples where this approach will be useful?
The benchmark is showing good results, indeed.
--
Best Wishes,
Ashutosh Bapat
On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:
On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09@gmail.com>
wrote:

Executing generic plans involving partitions is known to become slower
as partition count grows due to a number of bottlenecks, with
AcquireExecutorLocks() showing at the top in profiles.

Previous attempt at solving that problem was by David Rowley [1],
where he proposed delaying locking of *all* partitions appearing under
an Append/MergeAppend until "initial" pruning is done during the
executor initialization phase. A problem with that approach that he
has described in [2] is that leaving partitions unlocked can lead to
race conditions where the Plan node belonging to a partition can be
invalidated when a concurrent session successfully alters the
partition between AcquireExecutorLocks() saying the plan is okay to
execute and then actually executing it.

However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.

In which cases, we will have "pre-execution" pruning instructions that
can be used to skip locking partitions? Can you please give a few
examples where this approach will be useful?
This is mainly to be useful for prepared queries, so something like:
prepare q as select * from partitioned_table where key = $1;
And that too when execute q(…) uses a generic plan. Generic plans are
problematic because they must contain nodes for all partitions (without any
plan time pruning), which means CheckCachedPlan() has to spend time
proportional to the number of partitions to determine that the plan is
still usable / has not been invalidated; most of that is
AcquireExecutorLocks().
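For example (hypothetical session against a pgbench database with 32 partitions; plan output abbreviated and approximate):

```sql
SET plan_cache_mode = force_generic_plan;
PREPARE q AS SELECT * FROM pgbench_accounts WHERE aid = $1;

-- The generic plan contains an Append subplan for every partition;
-- "initial" pruning at executor startup discards all but the match:
EXPLAIN (COSTS OFF) EXECUTE q(1);
--  Append
--    Subplans Removed: 31
--    ->  Index Scan using pgbench_accounts_1_pkey on pgbench_accounts_1
--          Index Cond: (aid = $1)
```

It is exactly those "Subplans Removed" partitions whose locks the patch proposes to skip in AcquireExecutorLocks().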
Other bottlenecks, not addressed in this patch, pertain to some executor
startup/shutdown subroutines that process the range table of a PlannedStmt
in its entirety, whose length is also proportional to the number of
partitions when the plan is generic.
The benchmark is showing good results, indeed.
Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Dec 31, 2021 at 7:56 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09@gmail.com> wrote:
Executing generic plans involving partitions is known to become slower
as partition count grows due to a number of bottlenecks, with
AcquireExecutorLocks() showing at the top in profiles.

Previous attempt at solving that problem was by David Rowley [1],
where he proposed delaying locking of *all* partitions appearing under
an Append/MergeAppend until "initial" pruning is done during the
executor initialization phase. A problem with that approach that he
has described in [2] is that leaving partitions unlocked can lead to
race conditions where the Plan node belonging to a partition can be
invalidated when a concurrent session successfully alters the
partition between AcquireExecutorLocks() saying the plan is okay to
execute and then actually executing it.

However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.

In which cases, we will have "pre-execution" pruning instructions that
can be used to skip locking partitions? Can you please give a few
examples where this approach will be useful?

This is mainly to be useful for prepared queries, so something like:
prepare q as select * from partitioned_table where key = $1;
And that too when execute q(…) uses a generic plan. Generic plans are problematic because it must contain nodes for all partitions (without any plan time pruning), which means CheckCachedPlan() has to spend time proportional to the number of partitions to determine that the plan is still usable / has not been invalidated; most of that is AcquireExecutorLocks().
Other bottlenecks, not addressed in this patch, pertain to some executor startup/shutdown subroutines that process the range table of a PlannedStmt in its entirety, whose length is also proportional to the number of partitions when the plan is generic.
The benchmark is showing good results, indeed.
Indeed.
Here are few comments for v1 patch:
+ /* Caller error if we get here without contains_init_steps */
+ Assert(pruneinfo->contains_init_steps);
- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];
- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);
+ if (parentrelids)
+ *parentrelids = NULL;
You got two blank lines after Assert.
--
+ /* Set up EState if not in the executor proper. */
+ if (estate == NULL)
+ {
+ estate = CreateExecutorState();
+ estate->es_param_list_info = params;
+ free_estate = true;
}
... [Skip]
+ if (free_estate)
+ {
+ FreeExecutorState(estate);
+ estate = NULL;
}
I think this work should be left to the caller.
--
/*
* Stuff that follows matches exactly what ExecCreatePartitionPruneState()
* does, except we don't need a PartitionPruneState here, so don't call
* that function.
*
* XXX some refactoring might be good.
*/
+1, while doing it would be nice if foreach_current_index() is used
instead of the i & j sequence in the respective foreach() block, IMO.
--
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ context->relations =
+ bms_add_members(context->relations,
+ get_plan_scanrelids(subplan));
+ }
I think instead of get_plan_scanrelids() the
GetLockableRelations_worker() can be used; if so, then no need to add
get_plan_scanrelids() function.
--
/* Nodes containing prunable subnodes. */
+ case T_MergeAppend:
+ {
+ PlannedStmt *plannedstmt = context->plannedstmt;
+ List *rtable = plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ PartitionPruneInfo *pruneinfo;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;
...
if (pruneinfo && pruneinfo->contains_init_steps)
{
int i;
...
return false;
}
}
break;
Most of the declarations need to be moved inside the if-block.
Also, initially, I was a bit concerned regarding this code block
inside GetLockableRelations_worker(), what if (pruneinfo &&
pruneinfo->contains_init_steps) evaluated to false? After debugging I
realized that plan_tree_walker() will do the needful -- a bit of
comment would have helped.
--
+ case T_CustomScan:
+ foreach(lc, ((CustomScan *) plan)->custom_plans)
+ {
+ if (walker((Plan *) lfirst(lc), context))
+ return true;
+ }
+ break;
Why not plan_walk_members() call like other nodes?
Regards,
Amul
On Fri, Dec 24, 2021 at 10:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
Hmm. The first question that occurs to me is whether this is fully safe.
Currently, AcquireExecutorLocks calls LockRelationOid for every
relation involved in the query. That means we will probably lock at
least one relation on which we previously had no lock and thus
AcceptInvalidationMessages(). That will end up marking the query as no
longer valid and CheckCachedPlan() will realize this and tell the
caller to replan. In the corner case where we already hold all the
required locks, we will not accept invalidation messages at this
point, but must have done so after acquiring the last of the locks
required, and if that didn't mark the plan invalid, it can't be
invalid now either. Either way, everything is fine.
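The invariant described here can be modeled in miniature (illustrative Python; Session and its methods are made-up stand-ins for the lock manager and sinval behavior): processing invalidations whenever a new lock is taken is safe as long as every plan relation gets locked, but skipping locks can leave a pending invalidation unseen.

```python
class Session:
    def __init__(self, held_locks=(), pending_inval=()):
        self.held_locks = set(held_locks)
        self.pending_inval = list(pending_inval)  # queued DDL messages
        self.plan_valid = True

    def accept_invalidation_messages(self):
        # Any pending message invalidates the cached plan (toy rule).
        if self.pending_inval:
            self.pending_inval.clear()
            self.plan_valid = False

    def lock_relation(self, oid):
        # Like LockRelationOid(): a *new* lock processes invalidations.
        if oid not in self.held_locks:
            self.held_locks.add(oid)
            self.accept_invalidation_messages()

    def acquire_executor_locks(self, plan_relids, skip=frozenset()):
        for oid in sorted(plan_relids):
            if oid not in skip:
                self.lock_relation(oid)
        return self.plan_valid

# Concurrent DDL on partition 42 queued an invalidation; locking any
# relation afresh notices it and the caller replans:
s = Session(pending_inval=[42])
assert s.acquire_executor_locks({41, 42}) is False
```

The hazard case appears when 41 is already locked and 42 is skipped as pruned: no new lock is taken, the message is never processed, and the stale plan is reported as still valid.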
With the proposed patch, we might never lock some of the relations
involved in the query. Therefore, if one of those relations has been
modified in some way that would invalidate the plan, we will
potentially fail to discover this, and will use the plan anyway. For
instance, suppose there's one particular partition that has an extra
index and the plan involves an Index Scan using that index. Now
suppose that the scan of the partition in question is pruned, but
meanwhile, the index has been dropped. Now we're running a plan that
scans a nonexistent index. Admittedly, we're not running that part of
the plan. But is that enough for this to be safe? There are things
(like EXPLAIN or auto_explain) that we might try to do even on a part
of the plan tree that we don't try to run. Those things might break,
because for example we won't be able to look up the name of an index
in the catalogs for EXPLAIN output if the index is gone.
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example. The point is that to my knowledge we have no existing
case where we try to use a plan that might be only partly valid, so if
we introduce one, there's some risk there. I thought for a while, too,
about whether changes to some object in a part of the plan that we're
not executing could break things for the rest of the plan even if we
never do anything with the plan but execute it. I can't quite see any
actual hazard. For example, I thought about whether we might try to
get the tuple descriptor for the pruned-away object and get a
different tuple descriptor than we were expecting. I think we can't,
because (1) the pruned object has to be a partition, and tuple
descriptors have to match throughout the partitioning hierarchy,
except for column ordering, which currently can't be changed
after-the-fact and (2) IIRC, the tuple descriptor is stored in the
plan and not reconstructed at runtime and (3) if we don't end up
opening the relation because it's pruned, then we certainly can't do
anything with its tuple descriptor. But it might be worth giving more
thought to the question of whether there's any other way we could be
depending on the details of an object that ended up getting pruned.
Note that "initial" pruning steps are now performed twice when
executing generic plans: once in AcquireExecutorLocks() to find
partitions to be locked, and a 2nd time in ExecInit[Merge]Append() to
determine the set of partition sub-nodes to be initialized for
execution, though I wasn't able to come up with a good idea to avoid
this duplication.
I think this is something that will need to be fixed somehow. Apart
from the CPU cost, it's scary to imagine that the set of nodes on
which we acquired locks might be different from the set of nodes that
we initialize. If we do the same computation twice, there must be some
non-zero probability of getting a different answer the second time,
even if the circumstances under which it would actually happen are
remote. Consider, for example, a function that is labeled IMMUTABLE
but is really VOLATILE. Now maybe you can get the system to lock one
set of partitions and then initialize a different set of partitions. I
don't think we want to try to reason about what consequences that
might have and prove that somehow it's going to be OK; I think we want
to nail the door shut very tightly to make sure that it can't.
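The hazard can be sketched as a toy model (illustrative Python; all names are made up): a predicate wrongly labeled IMMUTABLE gives different answers in the locking pass and the initialization pass, leaving an initialized subplan whose relation was never locked.

```python
import itertools

# "Immutable"-labeled predicate that is secretly volatile: its answer
# for partition 0 flips between evaluations.
answers_for_part0 = itertools.cycle([False, True])

def mislabeled_immutable_keep(i):
    return next(answers_for_part0) if i == 0 else i == 2

def initial_prune(nparts, keep):
    # Stand-in for performing the "initial" pruning steps.
    return {i for i in range(nparts) if keep(i)}

locked = initial_prune(4, mislabeled_immutable_keep)       # pass 1: locks
initialized = initial_prune(4, mislabeled_immutable_keep)  # pass 2: ExecInit
unlocked_but_initialized = initialized - locked            # the bad set
```

Here pass 1 locks only partition 2, but pass 2 initializes partitions 0 and 2, so partition 0 is executed without ever having been locked.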
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks for taking the time to look at this.
On Wed, Jan 12, 2022 at 1:22 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 24, 2021 at 10:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.

Hmm. The first question that occurs to me is whether this is fully safe.
Currently, AcquireExecutorLocks calls LockRelationOid for every
relation involved in the query. That means we will probably lock at
least one relation on which we previously had no lock and thus
AcceptInvalidationMessages(). That will end up marking the query as no
longer valid and CheckCachedPlan() will realize this and tell the
caller to replan. In the corner case where we already hold all the
required locks, we will not accept invalidation messages at this
point, but must have done so after acquiring the last of the locks
required, and if that didn't mark the plan invalid, it can't be
invalid now either. Either way, everything is fine.

With the proposed patch, we might never lock some of the relations
involved in the query. Therefore, if one of those relations has been
modified in some way that would invalidate the plan, we will
potentially fail to discover this, and will use the plan anyway. For
instance, suppose there's one particular partition that has an extra
index and the plan involves an Index Scan using that index. Now
suppose that the scan of the partition in question is pruned, but
meanwhile, the index has been dropped. Now we're running a plan that
scans a nonexistent index. Admittedly, we're not running that part of
the plan. But is that enough for this to be safe? There are things
(like EXPLAIN or auto_explain) that we might try to do even on a part
of the plan tree that we don't try to run. Those things might break,
because for example we won't be able to look up the name of an index
in the catalogs for EXPLAIN output if the index is gone.

This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example. The point is that to my knowledge we have no existing
case where we try to use a plan that might be only partly valid, so if
we introduce one, there's some risk there. I thought for a while, too,
about whether changes to some object in a part of the plan that we're
not executing could break things for the rest of the plan even if we
never do anything with the plan but execute it. I can't quite see any
actual hazard. For example, I thought about whether we might try to
get the tuple descriptor for the pruned-away object and get a
different tuple descriptor than we were expecting. I think we can't,
because (1) the pruned object has to be a partition, and tuple
descriptors have to match throughout the partitioning hierarchy,
except for column ordering, which currently can't be changed
after-the-fact and (2) IIRC, the tuple descriptor is stored in the
plan and not reconstructed at runtime and (3) if we don't end up
opening the relation because it's pruned, then we certainly can't do
anything with its tuple descriptor. But it might be worth giving more
thought to the question of whether there's any other way we could be
depending on the details of an object that ended up getting pruned.
I have pondered on the possible hazards before writing the patch,
mainly because the concerns about a previously discussed proposal were
along similar lines [1].
IIUC, you're saying the plan tree is subject to inspection by non-core
code before ExecutorStart() has initialized a PlanState tree, which
must have discarded pruned portions of the plan tree. I wouldn't
claim to have scanned *all* of the core code that could possibly
access the invalidated portions of the plan tree, but from what I have
seen, I couldn't find any site that does. An ExecutorStart_hook()
gets to do that, but from what I can see it is expected to call
standard_ExecutorStart() before doing its thing and supposedly only
looks at the PlanState tree, which must be valid. Actually, EXPLAIN
also does ExecutorStart() before starting to look at the plan (the
PlanState tree), so must not run into pruned plan tree nodes. All
that said, it does sound like wishful thinking to say that no problems
can possibly occur.
At first, I had tried to implement this such that the
Append/MergeAppend nodes are edited to record the result of initial
pruning, but it felt wrong to be munging the plan tree in plancache.c.
Or, maybe this won't be a concern if performing ExecutorStart() is
made a part of CheckCachedPlan() somehow, which would then take locks
on the relation as the PlanState tree is built capturing any plan
invalidations, instead of AcquireExecutorLocks(). That does sound like
an ambitious undertaking though.
Note that "initial" pruning steps are now performed twice when
executing generic plans: once in AcquireExecutorLocks() to find
partitions to be locked, and a 2nd time in ExecInit[Merge]Append() to
determine the set of partition sub-nodes to be initialized for
execution, though I wasn't able to come up with a good idea to avoid
this duplication.

I think this is something that will need to be fixed somehow. Apart
from the CPU cost, it's scary to imagine that the set of nodes on
which we acquired locks might be different from the set of nodes that
we initialize. If we do the same computation twice, there must be some
non-zero probability of getting a different answer the second time,
even if the circumstances under which it would actually happen are
remote. Consider, for example, a function that is labeled IMMUTABLE
but is really VOLATILE. Now maybe you can get the system to lock one
set of partitions and then initialize a different set of partitions. I
don't think we want to try to reason about what consequences that
might have and prove that somehow it's going to be OK; I think we want
to nail the door shut very tightly to make sure that it can't.
Yeah, the premise of the patch is that "initial" pruning steps produce
the same result both times. I assumed that would be true because the
pruning steps are not allowed to contain any VOLATILE expressions.
Regarding the possibility that IMMUTABLE labeling of functions may be
incorrect, I haven't considered if the runtime pruning code can cope
or whether it should try to. If such a case does occur in practice,
the bad outcome would be an Assert failure in
ExecGetRangeTableRelation() or using a partition unlocked in the
non-assert builds, the latter of which feels especially bad.
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+TgmoZN-80143F8OhN8Cn5-uDae5miLYVwMapAuc+7+Z7pyNg@mail.gmail.com
On Wed, Jan 12, 2022 at 9:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
I have pondered on the possible hazards before writing the patch,
mainly because the concerns about a previously discussed proposal were
along similar lines [1].
True. I think that the hazards are narrower with this proposal,
because if you *delay* locking a partition that you eventually need,
then you might end up trying to actually execute a portion of the plan
that's no longer valid. That seems like hopelessly bad news. On the
other hand, with this proposal, you skip locking altogether, but only
for parts of the plan that you don't plan to execute. That's still
kind of scary, but not to nearly the same degree.
IIUC, you're saying the plan tree is subject to inspection by non-core
code before ExecutorStart() has initialized a PlanState tree, which
must have discarded pruned portions of the plan tree. I wouldn't
claim to have scanned *all* of the core code that could possibly
access the invalidated portions of the plan tree, but from what I have
seen, I couldn't find any site that does. An ExecutorStart_hook()
gets to do that, but from what I can see it is expected to call
standard_ExecutorStart() before doing its thing and supposedly only
looks at the PlanState tree, which must be valid. Actually, EXPLAIN
also does ExecutorStart() before starting to look at the plan (the
PlanState tree), so must not run into pruned plan tree nodes. All
that said, it does sound like wishful thinking to say that no problems
can possibly occur.
Yeah. I don't think it's only non-core code we need to worry about
either. What if I just do EXPLAIN ANALYZE on a prepared query that
ends up pruning away some stuff? IIRC, the pruned subplans are not
shown, so we might escape disaster here, but FWIW if I'd committed
that code I would have pushed hard for showing those and saying "(not
executed)" .... so it's not too crazy to imagine a world in which
things work that way.
At first, I had tried to implement this such that the
Append/MergeAppend nodes are edited to record the result of initial
pruning, but it felt wrong to be munging the plan tree in plancache.c.
It is. You can't munge the plan tree: it's required to be strictly
read-only once generated. It can be serialized and deserialized for
transmission to workers, and it can be shared across executions.
Or, maybe this won't be a concern if performing ExecutorStart() is
made a part of CheckCachedPlan() somehow, which would then take locks
on the relation as the PlanState tree is built capturing any plan
invalidations, instead of AcquireExecutorLocks(). That does sound like
an ambitious undertaking though.
On the surface that would seem to involve abstraction violations, but
maybe that could be finessed somehow. The plancache shouldn't know too
much about what the executor is going to do with the plan, but it
could ask the executor to perform a step that has been designed for
use by the plancache. I guess the core problem here is how to pass
around information that is node-specific before we've stood up the
executor state tree. Maybe the executor could have a function that
does the pruning and returns some kind of array of results that can be
used both to decide what to lock and also what to consider as pruned
at the start of execution. (I'm hand-waving about the details because
I don't know.)
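The sketch above might look something like the following toy model (illustrative Python, not a proposed API): pruning runs exactly once, and the saved result feeds both lock collection and Append initialization, so the two sets cannot diverge even under a mislabeled predicate.

```python
def perform_initial_pruning(nsubplans, keep):
    # Hypothetical plancache-facing step: run the initial pruning
    # instructions once and save the surviving subplan indexes.
    return frozenset(i for i in range(nsubplans) if keep(i))

def relids_to_lock(subplan_relids, surviving):
    # AcquireExecutorLocks() consumes the saved result...
    return {subplan_relids[i] for i in surviving}

def exec_init_append(subplan_relids, surviving):
    # ...and ExecInit[Merge]Append() consumes the very same result,
    # instead of re-running the pruning steps.
    return [subplan_relids[i] for i in sorted(surviving)]

relids = [101, 102, 103, 104]
surviving = perform_initial_pruning(len(relids), lambda i: i in (1, 3))
```

Locking and initialization now agree by construction: both derive from `surviving`.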
Yeah, the premise of the patch is that "initial" pruning steps produce
the same result both times. I assumed that would be true because the
pruning steps are not allowed to contain any VOLATILE expressions.
Regarding the possibility that IMMUTABLE labeling of functions may be
incorrect, I haven't considered if the runtime pruning code can cope
or whether it should try to. If such a case does occur in practice,
the bad outcome would be an Assert failure in
ExecGetRangeTableRelation() or using a partition unlocked in the
non-assert builds, the latter of which feels especially bad.
Right. I think it's OK for a query to produce wrong answers under
those kinds of conditions - the user has broken everything and gets to
keep all the pieces - but doing stuff that might violate fundamental
assumptions of the system like "relations can only be accessed when
holding a lock on them" feels quite bad. It's not a stretch to imagine
that failing to follow those invariants could take the whole system
down, which is clearly too severe a consequence for the user's failure
to label things properly.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jan 6, 2022 at 3:45 PM Amul Sul <sulamul@gmail.com> wrote:
Here are few comments for v1 patch:
Thanks Amul. I'm thinking about Robert's latest comments addressing
which may need some rethinking of this whole design, but I decided to
post a v2 taking care of your comments.
+ /* Caller error if we get here without contains_init_steps */
+ Assert(pruneinfo->contains_init_steps);

- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];

- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);

+ if (parentrelids)
+ *parentrelids = NULL;

You got two blank lines after Assert.
Fixed.
--
+ /* Set up EState if not in the executor proper. */
+ if (estate == NULL)
+ {
+ estate = CreateExecutorState();
+ estate->es_param_list_info = params;
+ free_estate = true;
}

... [Skip]

+ if (free_estate)
+ {
+ FreeExecutorState(estate);
+ estate = NULL;
}

I think this work should be left to the caller.
Done. Also see below...
/*
* Stuff that follows matches exactly what ExecCreatePartitionPruneState()
* does, except we don't need a PartitionPruneState here, so don't call
* that function.
*
* XXX some refactoring might be good.
*/

+1, while doing it would be nice if foreach_current_index() is used
instead of the i & j sequence in the respective foreach() block, IMO.
Actually, I rewrote this part quite significantly so that most of the
code remains in its existing place. I decided to let
GetLockableRelations_walker() create a PartitionPruneState and pass
that to ExecFindInitialMatchingSubPlans() that is now left more or
less as is. Instead, ExecCreatePartitionPruneState() is changed to be
callable from outside the executor.
The temporary EState is no longer necessary. ExprContext,
PartitionDirectory, etc. are now managed in the caller,
GetLockableRelations_walker().
--
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ context->relations =
+ bms_add_members(context->relations,
+ get_plan_scanrelids(subplan));
+ }

I think instead of get_plan_scanrelids() the
GetLockableRelations_worker() can be used; if so, then no need to add
get_plan_scanrelids() function.
You're right, done.
--
/* Nodes containing prunable subnodes. */
+ case T_MergeAppend:
+ {
+ PlannedStmt *plannedstmt = context->plannedstmt;
+ List *rtable = plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ PartitionPruneInfo *pruneinfo;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;
...
if (pruneinfo && pruneinfo->contains_init_steps)
{
int i;
...
return false;
}
}
break;

Most of the declarations need to be moved inside the if-block.
Done.
Also, initially, I was a bit concerned regarding this code block
inside GetLockableRelations_worker(), what if (pruneinfo &&
pruneinfo->contains_init_steps) evaluated to false? After debugging I
realized that plan_tree_walker() will do the needful -- a bit of
comment would have helped.
You're right. Added a dummy else {} block with just the comment saying so.
+ case T_CustomScan:
+ foreach(lc, ((CustomScan *) plan)->custom_plans)
+ {
+ if (walker((Plan *) lfirst(lc), context))
+ return true;
+ }
+ break;

Why not plan_walk_members() call like other nodes?
Makes sense, done.
Again, most/all of this patch might need to be thrown away, but here
it is anyway.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v2-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch (+563/-74)
On Fri, Jan 14, 2022 at 11:10 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Jan 6, 2022 at 3:45 PM Amul Sul <sulamul@gmail.com> wrote:
Here are few comments for v1 patch:
Thanks Amul. I'm still thinking about Robert's latest comments,
addressing which may require rethinking this whole design, but I
decided to post a v2 taking care of your comments.
cfbot tells me there is an unused variable warning, which is fixed in
the attached v3.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v3-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch (+561/-74)
On Tue, 11 Jan 2022 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example.
I haven't followed this in any detail, but this patch and its goal of
reducing the O(N) drag effect on partition execution time is very
important. Locking a long list of objects that then get pruned is very
wasteful, as the results show.
Ideally, we want an O(1) algorithm for single partition access and DDL
is rare. So perhaps that is the starting point for a safe design -
invent a single lock or cache that allows us to check if the partition
hierarchy has changed in any way, and if so, replan, if not, skip
locks.
Please excuse me if this idea falls short, if so, please just note my
comment about how important this is. Thanks.
--
Simon Riggs http://www.EnterpriseDB.com/
Hi Simon,
On Tue, Jan 18, 2022 at 4:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
On Tue, 11 Jan 2022 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example.
I haven't followed this in any detail, but this patch and its goal of
reducing the O(N) drag effect on partition execution time is very
important. Locking a long list of objects that then get pruned is very
wasteful, as the results show.
Ideally, we want an O(1) algorithm for single partition access and DDL
is rare. So perhaps that is the starting point for a safe design -
invent a single lock or cache that allows us to check if the partition
hierarchy has changed in any way, and if so, replan, if not, skip
locks.
Rearchitecting partition locking to be O(1) seems like a project of
non-trivial complexity as Robert mentioned in a related email thread
a couple of years ago:
/messages/by-id/CA+TgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ@mail.gmail.com
Pursuing that kind of a project would perhaps have been more
worthwhile if the locking issue had affected more than just this
particular case, that is, the case of running prepared statements over
partitioned tables using generic plans. Addressing this by
rearchitecting run-time pruning (and plancache to some degree) seemed
like it might lead to this getting fixed in a bounded timeframe. I
admit that the concerns that Robert has raised about the patch make me
want to reconsider that position, though maybe it's too soon to
conclude.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, 18 Jan 2022 at 08:10, Amit Langote <amitlangote09@gmail.com> wrote:
Hi Simon,
On Tue, Jan 18, 2022 at 4:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
On Tue, 11 Jan 2022 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example.
I haven't followed this in any detail, but this patch and its goal of
reducing the O(N) drag effect on partition execution time is very
important. Locking a long list of objects that then get pruned is very
wasteful, as the results show.
Ideally, we want an O(1) algorithm for single partition access and DDL
is rare. So perhaps that is the starting point for a safe design -
invent a single lock or cache that allows us to check if the partition
hierarchy has changed in any way, and if so, replan, if not, skip
locks.
Rearchitecting partition locking to be O(1) seems like a project of
non-trivial complexity as Robert mentioned in a related email thread
a couple of years ago:
/messages/by-id/CA+TgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ@mail.gmail.com
I agree, completely redesigning locking is a major project. But that
isn't what I suggested, which was to find an O(1) algorithm to solve
the safety issue. I'm sure there is an easy way to check one lock,
maybe a new one/new kind, rather than N.
Why does the safety issue exist? Why is it important to be able to
concurrently access parts of the hierarchy with DDL? Those are not
critical points.
If we asked them, most users would trade a 10x performance gain for
some restrictions on DDL. If anyone cares, make it an option, but most
people will use it.
Maybe force all DDL, or just DDL that would cause safety issues, to
update a hierarchy version number, so queries can tell whether they
need to replan. Don't know, just looking for an O(1) solution.
--
Simon Riggs http://www.EnterpriseDB.com/
On Tue, Jan 18, 2022 at 3:10 AM Amit Langote <amitlangote09@gmail.com> wrote:
Pursuing that kind of a project would perhaps have been more
worthwhile if the locking issue had affected more than just this
particular case, that is, the case of running prepared statements over
partitioned tables using generic plans. Addressing this by
rearchitecting run-time pruning (and plancache to some degree) seemed
like it might lead to this getting fixed in a bounded timeframe. I
admit that the concerns that Robert has raised about the patch make me
want to reconsider that position, though maybe it's too soon to
conclude.
I wasn't trying to say that your approach was dead in the water. It
does create a situation that can't happen today, and such things are
scary and need careful thought. But redesigning the locking mechanism
would need careful thought, too ... maybe even more of it than sorting
this out.
I do also agree with Simon that this is an important problem to which
we need to find some solution.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 18, 2022 at 7:28 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
On Tue, 18 Jan 2022 at 08:10, Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Jan 18, 2022 at 4:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
I haven't followed this in any detail, but this patch and its goal of
reducing the O(N) drag effect on partition execution time is very
important. Locking a long list of objects that then get pruned is very
wasteful, as the results show.
Ideally, we want an O(1) algorithm for single partition access and DDL
is rare. So perhaps that is the starting point for a safe design -
invent a single lock or cache that allows us to check if the partition
hierarchy has changed in any way, and if so, replan, if not, skip
locks.
Rearchitecting partition locking to be O(1) seems like a project of
non-trivial complexity as Robert mentioned in a related email thread
a couple of years ago:
/messages/by-id/CA+TgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ@mail.gmail.com
I agree, completely redesigning locking is a major project. But that
isn't what I suggested, which was to find an O(1) algorithm to solve
the safety issue. I'm sure there is an easy way to check one lock,
maybe a new one/new kind, rather than N.
I misread your email then, sorry.
Why does the safety issue exist? Why is it important to be able to
concurrently access parts of the hierarchy with DDL? Those are not
critical points.
If we asked them, most users would trade a 10x performance gain for
some restrictions on DDL. If anyone cares, make it an option, but most
people will use it.
Maybe force all DDL, or just DDL that would cause safety issues, to
update a hierarchy version number, so queries can tell whether they
need to replan. Don't know, just looking for an O(1) solution.
Yeah, it would be great if it would suffice to take a single lock on
the partitioned table mentioned in the query, rather than on all
elements of the partition tree added to the plan. AFAICS, ways to get
that are 1) Prevent modifying non-root partition tree elements, 2)
Make it so that locking a partitioned table becomes a proxy for having
locked all of its descendants, 3) Invent a Plan representation for
scanning partitioned tables such that adding the descendant tables
that survive plan-time pruning to the plan doesn't require locking
them too. IIUC, you've mentioned 1 and 2. I think I've seen 3
mentioned in the past discussions on this topic, but I guess the
research on whether that's doable has never been done.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Jan 18, 2022 at 11:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 18, 2022 at 3:10 AM Amit Langote <amitlangote09@gmail.com> wrote:
Pursuing that kind of a project would perhaps have been more
worthwhile if the locking issue had affected more than just this
particular case, that is, the case of running prepared statements over
partitioned tables using generic plans. Addressing this by
rearchitecting run-time pruning (and plancache to some degree) seemed
like it might lead to this getting fixed in a bounded timeframe. I
admit that the concerns that Robert has raised about the patch make me
want to reconsider that position, though maybe it's too soon to
conclude.
I wasn't trying to say that your approach was dead in the water. It
does create a situation that can't happen today, and such things are
scary and need careful thought. But redesigning the locking mechanism
would need careful thought, too ... maybe even more of it than sorting
this out.
Yes, agreed.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Wed, 19 Jan 2022 at 08:31, Amit Langote <amitlangote09@gmail.com> wrote:
Maybe force all DDL, or just DDL that would cause safety issues, to
update a hierarchy version number, so queries can tell whether they
need to replan. Don't know, just looking for an O(1) solution.
Yeah, it would be great if it would suffice to take a single lock on
the partitioned table mentioned in the query, rather than on all
elements of the partition tree added to the plan. AFAICS, ways to get
that are 1) Prevent modifying non-root partition tree elements,
Can we reuse the concept of Strong/Weak locking here?
When a DDL request is in progress (for that partitioned table), take
all required locks for safety. When a DDL request is not in progress,
take minimal locks knowing it is safe.
We can take a single PartitionTreeModificationLock, nowait to prove
that we do not need all locks. DDL would request the lock in exclusive
mode. (Other mechanisms possible).
--
Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jan 13, 2022 at 3:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 12, 2022 at 9:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
Or, maybe this won't be a concern if performing ExecutorStart() is
made a part of CheckCachedPlan() somehow, which would then take locks
on the relation as the PlanState tree is built capturing any plan
invalidations, instead of AcquireExecutorLocks(). That does sound like
an ambitious undertaking though.
On the surface that would seem to involve abstraction violations, but
maybe that could be finessed somehow. The plancache shouldn't know too
much about what the executor is going to do with the plan, but it
could ask the executor to perform a step that has been designed for
use by the plancache. I guess the core problem here is how to pass
around information that is node-specific before we've stood up the
executor state tree. Maybe the executor could have a function that
does the pruning and returns some kind of array of results that can be
used both to decide what to lock and also what to consider as pruned
at the start of execution. (I'm hand-waving about the details because
I don't know.)
The attached patch implements this idea. Sorry for the delay in
getting this out and thanks to Robert for the off-list discussions on
this.
So the new executor "step" you mention is the function ExecutorPrep in
the patch, which calls a recursive function ExecPrepNode on the plan
tree's top node, much as ExecutorStart calls (via InitPlan)
ExecInitNode to construct a PlanState tree for actual execution
paralleling the plan tree.
For now, ExecutorPrep() / ExecPrepNode() does mainly two things if and
as it walks the plan tree: 1) Extract the RT indexes of RTE_RELATION
entries and add them to a bitmapset in the result struct, 2) If the
node contains a PartitionPruneInfo, perform its "initial pruning
steps" and store the result of doing so in a per-plan-node node called
PlanPrepOutput. The bitmapset and the array containing per-plan-node
PlanPrepOutput nodes are returned in a node called ExecPrepOutput,
which is the result of ExecutorPrep, to its calling module (say,
plancache.c), which, after it's done using that information, must pass
it forward to subsequent execution steps. That is done by passing it,
via the module's callers, to CreateQueryDesc() which remembers the
ExecPrepOutput in QueryDesc that is eventually passed to
ExecutorStart().
A bunch of other details are mentioned in the patch's commit message,
which I'm pasting below for anyone reading to spot any obvious flaws
(no-go's) of this approach:
Invent a new executor "prep" phase
The new phase, implemented by execMain.c:ExecutorPrep() and its
recursive underling execProcnode.c:ExecPrepNode(), takes a query's
PlannedStmt and processes the plan tree contained in it to produce
an ExecPrepOutput node as result.
As the plan tree is walked, each node must add the RT index(es) of
any relation(s) that it directly manipulates to a bitmapset member of
ExecPrepOutput (for example, an IndexScan node must add the Scan's
scanrelid). Also, each node may want to make a PlanPrepOutput node
containing additional information that may be of interest to the
calling module or to the later execution phases, if the node can
provide one (for example, an Append node may perform initial pruning
and add a set of "initially valid subplans" to the PlanPrepOutput).
The PlanPrepOutput nodes of all the plan nodes are added to an array
in the ExecPrepOutput, which is indexed using the individual nodes'
plan_node_id; a NULL is stored in the array slots of nodes that
don't have anything interesting to add to the PlanPrepOutput.
The ExecPrepOutput thus produced is passed to CreateQueryDesc()
and subsequently to ExecutorStart() via QueryDesc, which then makes
it available to the executor routines via the query's EState.
The main goal of adding this new phase is, for now, to allow cached
generic plans containing scans of partitioned tables using
Append/MergeAppend to be executed more efficiently by the prep phase
doing any initial pruning, instead of deferring that to
ExecutorStart(). That may allow AcquireExecutorLocks() on the plan
to lock only the minimal set of relations/partitions, that is
those whose subplans survive the initial pruning.
Implementation notes:
* To allow initial pruning to be done as part of the pre-execution
prep phase as opposed to as part of ExecutorStart(), this refactors
ExecCreatePartitionPruneState() and ExecFindInitialMatchingSubPlans()
to pass the information needed to do initial pruning directly as
parameters instead of getting that from the EState and the PlanState
of the parent Append/MergeAppend, both of which would not be
available in ExecutorPrep(). Another, sort of non-essential-to-this-
goal, refactoring this does is moving the partition pruning
initialization stanzas in ExecInitAppend() and ExecInitMergeAppend()
both of which contain the same code, into a new function
ExecInitPartitionPruning().
* To pass the ExecPrepOutput(s) created by the plancache module's
invocation of ExecutorPrep() to the callers of the module, which in
turn would pass them down to ExecutorStart(), CachedPlan gets a new
List field that stores those ExecPrepOutputs, containing one element
for each PlannedStmt also contained in the CachedPlan. The new list
is stored in a child context of the context containing the
PlannedStmts, though unlike the latter, it is reset on every
invocation of CheckCachedPlan(), which in turn calls ExecutorPrep()
with a new set of bound Params.
* AcquireExecutorLocks() is now made to loop over a bitmapset of RT
indexes, those of relations returned in ExecPrepOutput, instead of
over the whole range table. With initial pruning that is also done
as part of ExecutorPrep(), only relations from non-pruned nodes of
the plan tree would get locked as a result of this new arrangement.
* PlannedStmt gets a new field usesPrepExecPruning that indicates
whether any of the nodes of the plan tree contain "initial" (or
"pre-execution") pruning steps, which saves ExecutorPrep() the
trouble of walking the plan tree only to find out whether that's
the case.
* PartitionPruneInfo nodes now explicitly store whether the steps
contained in any of the individual PartitionedRelPruneInfos embedded
in it contain initial pruning steps (those that can be performed
during ExecutorPrep) and execution pruning steps (those that can only
be performed during ExecutorRun), as flags contains_initial_steps and
contains_exec_steps, respectively. In fact, the aforementioned
PlannedStmt field's value is a logical OR of the values of the former
across all PartitionPruneInfo nodes embedded in the plan tree.
* PlannedStmt also gets a bitmapset field to store the RT indexes of
all relation RTEs referenced in the query that is populated when
constructing the flat range table in setrefs.c, which effectively
contains all the relations that the planner must have locked. In the
case of a cached plan, AcquireExecutorLocks() must lock all of those
relations, except those whose subnodes get pruned as a result of
ExecutorPrep().
* PlannedStmt gets yet another field numPlanNodes that records the
highest plan_node_id assigned to any of the nodes contained in the
tree, which serves as the size to use when allocating the
PlanPrepOutput array.
Maybe this should be more than one patch? Say:
0001 to add ExecutorPrep and the boilerplate,
0002 to teach plancache.c to use the new facility
Thoughts?
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v4-0001-Invent-a-new-executor-prep-phase.patch (+1866/-286)
On Thu, Feb 10, 2022 at 3:14 AM Amit Langote <amitlangote09@gmail.com> wrote:
Maybe this should be more than one patch? Say:
0001 to add ExecutorPrep and the boilerplate,
0002 to teach plancache.c to use the new facility
Could be, not sure. I agree that if it's possible to split this in a
meaningful way, it would facilitate review. I notice that there is
some straight code movement e.g. the creation of
ExecPartitionPruneFixSubPlanIndexes. It would be best, I think, to do
pure code movement in a preparatory patch so that the main patch is
just adding the new stuff we need and not moving stuff around.
David Rowley recently proposed a patch for some parallel-safety
debugging cross checks which added a plan tree walker. I'm not sure
whether he's going to press that patch forward to commit, but I think
we should get something like that into the tree and start using it,
rather than adding more bespoke code. Maybe you/we should steal that
part of his patch and commit it separately. What I'm imagining is that
plan_tree_walker() would know which nodes have subnodes and how to
recurse over the tree structure, and you'd have a walker function to
use with it that would know which executor nodes have ExecPrep
functions and call them, and just do nothing for the others. That
would spare you adding stub functions for nodes that don't need to do
anything, or don't need to do anything other than recurse. Admittedly
it would look a bit different from the existing executor phases, but
I'd argue that it's a better coding model.
Actually, you might've had this in the patch at some point, because
you have a declaration for plan_tree_walker but no implementation. I
guess one thing that's a bit awkward about this idea is that in some
cases you want to recurse to some subnodes but not other subnodes. But
maybe it would work to put the recursion in the walker function in
that case, and then just return true; but if you want to walk all
children, return false.
+ bool contains_init_steps;
+ bool contains_exec_steps;
s/steps/pruning/? maybe with contains -> needs or performs or requires as well?
+ * Returned information includes the set of RT indexes of relations referenced
+ * in the plan, and a PlanPrepOutput node for each node in the planTree if the
+ * node type supports producing one.
Aren't all RT indexes referenced in the plan?
+ * This may lock relations whose information may be used to produce the
+ * PlanPrepOutput nodes. For example, a partitioned table before perusing its
+ * PartitionPruneInfo contained in an Append node to do the pruning the result
+ * of which is used to populate the Append node's PlanPrepOutput.
"may lock" feels awfully fuzzy to me. How am I supposed to rely on
something that "may" happen? And don't we need to have tight logic
around locking, with specific guarantees about what is locked at which
points in the code and what is not?
+ * At least one of 'planstate' or 'econtext' must be passed to be able to
+ * successfully evaluate any non-Const expressions contained in the
+ * steps.
This also seems fuzzy. If I'm thinking of calling this function, I
don't know how I'd know whether this criterion is met.
I don't love PlanPrepOutput the way you have it. I think one of the
basic design issues for this patch is: should we think of the prep
phase as specifically pruning, or is it general prep and pruning is
the first thing for which we're going to use it? If it's really a
pre-pruning phase, we could name it that way instead of calling it
"prep". If it's really a general prep phase, then why does
PlanPrepOutput contain initially_valid_subnodes as a field? One could
imagine letting each prep function decide what kind of prep node it
would like to return, with partition pruning being just one of the
options. But is that a useful generalization of the basic concept, or
just pretending that a special-purpose mechanism is more general than
it really is?
+ return CreateQueryDesc(pstmt, NULL, /* XXX pass ExecPrepOutput too? */
It seems to me that we should do what the XXX suggests. It doesn't
seem nice if the parallel workers could theoretically decide to prune
a different set of nodes than the leader.
+ * known at executor startup (excludeing expressions containing
Extra e.
+ * into subplan indexes, is also returned for use during subsquent
Missing e.
Somewhere, we're going to need to document the idea that this may
permit us to execute a plan that isn't actually fully valid, but that
we expect to survive because we'll never do anything with the parts of
it that aren't. Maybe that should be added to the executor README, or
maybe there's some better place, but I don't think that should remain
something that's just implicit.
This is not a full review, just some initial thoughts looking through this.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2022-02-10 17:13:52 +0900, Amit Langote wrote:
The attached patch implements this idea. Sorry for the delay in
getting this out and thanks to Robert for the off-list discussions on
this.
I did not follow this thread at all. And I only skimmed the patch. So I'm
probably wrong.
I'm wary of this increasing executor overhead even in cases it won't
help. Without this patch, for simple queries, I see small allocations
noticeably in profiles. This adds a bunch more, even if
!context->stmt->usesPreExecPruning:
- makeNode(ExecPrepContext)
- makeNode(ExecPrepOutput)
- palloc0(sizeof(PlanPrepOutput *) * result->numPlanNodes)
- stmt_execprep_list = lappend(stmt_execprep_list, execprep);
- AllocSetContextCreate(CurrentMemoryContext,
"CachedPlan execprep list", ...
- ...
That's a lot of extra for something that's already a bottleneck.
Greetings,
Andres Freund
(just catching up on this thread)
On Thu, 13 Jan 2022 at 07:20, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah. I don't think it's only non-core code we need to worry about
either. What if I just do EXPLAIN ANALYZE on a prepared query that
ends up pruning away some stuff? IIRC, the pruned subplans are not
shown, so we might escape disaster here, but FWIW if I'd committed
that code I would have pushed hard for showing those and saying "(not
executed)" .... so it's not too crazy to imagine a world in which
things work that way.
FWIW, that would remove the whole point in init run-time pruning. The
reason I made two phases of run-time pruning was so that we could get
away from having the init plan overhead of nodes we'll never need to
scan. If we wanted to show the (never executed) scans in EXPLAIN then
we'd need to do the init plan part and allocate all that memory
needlessly.
Imagine a hash partitioned table on "id" with 1000 partitions. The user does:
PREPARE q1 (INT) AS SELECT * FROM parttab WHERE id = $1;
EXECUTE q1(123);
Assuming a generic plan, if we didn't have init pruning then we have
to build a plan containing the scans for all 1000 partitions. There's
significant overhead to that compared to just locking the partitions,
and initialising 1 scan.
If it worked this way then we'd be even further from Amit's goal of
reducing the overhead of starting plan with run-time pruning nodes.
I understood at the time it was just the EXPLAIN output that you had
concerns with. I thought that was just around the lack of any display
of the condition we used for pruning.
David