pgbench client-side performance issue on large scripts
Hi,
On large scripts, pgbench happens to consume a lot of CPU time.
For instance, with a script consisting of 50000 "SELECT 1;"
I see "pgbench -f 50k-select.sql" taking about 5.8 secs of CPU time,
out of a total time of 6.7 secs. When run with perf, this profile shows up:
81,10% pgbench pgbench [.] expr_scanner_get_lineno
0,36% pgbench [unknown] [k] 0xffffffffac90410b
0,33% pgbench [unknown] [k] 0xffffffffac904109
0,23% pgbench libpq.so.5.18 [.] pqParseInput3
0,21% pgbench [unknown] [k] 0xffffffffac800080
0,17% pgbench pgbench [.] advanceConnectionState
0,15% pgbench [unknown] [k] 0xffffffffac904104
0,14% pgbench libpq.so.5.18 [.] PQmakeEmptyPGresult
In ParseScript(), expr_scanner_get_lineno() is called for each line
with its current offset, and it scans the script from the beginning
up to the current line. I think that on the whole, parsing this script
ends up looking at (N*(N+1))/2 lines, which is about 1.25 billion lines
if N=50000.
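To make the arithmetic concrete, here is a minimal C sketch (not pgbench's actual code; the function names are made up) of a line-number lookup that rescans from the start of the buffer on every call, which is effectively what calling expr_scanner_get_lineno() once per statement does:

```c
#include <stddef.h>

/*
 * Toy model of a rescanning line-number lookup: count newlines from the
 * start of the script up to 'offset'.  Each call costs O(offset).
 */
static int
lineno_by_rescan(const char *script, size_t offset, size_t *chars_scanned)
{
    int     lineno = 1;
    size_t  i;

    for (i = 0; i < offset; i++)
    {
        if (script[i] == '\n')
            lineno++;
        (*chars_scanned)++;
    }
    return lineno;
}

/*
 * Ask for the line number at the start of each of 'nlines' fixed-width
 * lines, once per statement.  The total number of characters scanned is
 * linelen * nlines*(nlines-1)/2, i.e. quadratic in the script length.
 */
static size_t
total_chars_scanned(const char *script, int nlines, size_t linelen)
{
    size_t  scanned = 0;
    int     k;

    for (k = 0; k < nlines; k++)
        (void) lineno_by_rescan(script, (size_t) k * linelen, &scanned);
    return scanned;
}
```

With N = 50000 ten-character lines of "SELECT 1;\n", this works out to 10 * N*(N-1)/2, roughly 12.5 billion character comparisons for a single parse.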
Since it only needs the current line number for certain errors
with \gset, I've made the trivial attached fix that calls
expr_scanner_get_lineno() only in those cases.
This brings the CPU consumption down to a more reasonable 0.1s in the
above test case (with the drawback that the reported line number points
one line too far).
However, there is another caller, process_backslash_command()
which is not amenable to the same kind of easy fix. A large script
having backslash commands near the end is also penalized, and it
would be nice to fix that as well.
I wonder whether pgbench should materialize the current line number
in a variable, as psql does in pset.lineno. But given that there are
two different parsers in pgbench, maybe it's not the simplest.
Flex has yylineno, but neither pgbench nor psql makes use of it.
I thought I would ask before going further into this, as there might
be a better method that I don't see, being unfamiliar with that code
and with flex/bison.
WDYT?
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Attachments:
pgbench-lineno-perf-fix.patch (text/plain, +6 -4)
"Daniel Verite" <daniel@manitou-mail.org> writes:
> On large scripts, pgbench happens to consume a lot of CPU time.
> For instance, with a script consisting of 50000 "SELECT 1;"
> I see "pgbench -f 50k-select.sql" taking about 5.8 secs of CPU time,
> out of a total time of 6.7 secs. When run with perf, this profile shows up:
You ran only a single execution of a 50K-line script? This test
case feels a little bit artificial. Having said that ...
> In ParseScript(), expr_scanner_get_lineno() is called for each line
> with its current offset, and it scans the script from the beginning
> up to the current line. I think that on the whole, parsing this script
> ends up looking at (N*(N+1))/2 lines, which is about 1.25 billion lines
> if N=50000.
... yes, O(N^2) is not nice. It has to be possible to do better.
> I wonder whether pgbench should materialize the current line number
> in a variable, as psql does in pset.lineno. But given that there are
> two different parsers in pgbench, maybe it's not the simplest.
> Flex has yylineno, but neither pgbench nor psql makes use of it.
Yeah, we do rely on yylineno in bootscanner.l and ecpg, but not
elsewhere; not sure if there's a performance reason for that.
I see that plpgsql has a hand-rolled version (look for cur_line_num)
that perhaps could be stolen.
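The idea behind plpgsql's cur_line_num can be sketched like this (illustrative names, not the actual PostgreSQL code): cache how far we have already counted, so that lookups at non-decreasing offsets scan each byte of the script at most once overall. The sketch assumes offsets only move forward, which holds for a scanner advancing through the script; plpgsql's real version also copes with restarts.

```c
#include <stddef.h>

/*
 * Illustrative cached line counter, after plpgsql's cur_line_num idea.
 * Offsets passed to tracker_get_lineno() are assumed non-decreasing.
 */
typedef struct LineTracker
{
    const char *script;      /* whole script buffer */
    size_t      seen_offset; /* how far we have already counted */
    int         seen_lineno; /* 1-based line number at seen_offset */
} LineTracker;

static void
tracker_init(LineTracker *t, const char *script)
{
    t->script = script;
    t->seen_offset = 0;
    t->seen_lineno = 1;
}

static int
tracker_get_lineno(LineTracker *t, size_t offset)
{
    /* Resume counting from where the previous call stopped. */
    for (; t->seen_offset < offset; t->seen_offset++)
    {
        if (t->script[t->seen_offset] == '\n')
            t->seen_lineno++;
    }
    return t->seen_lineno;
}
```

Across all calls, each byte of the script is examined at most once, so a whole parse becomes O(script length) instead of O(N^2).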
regards, tom lane
I wrote:
> Yeah, we do rely on yylineno in bootscanner.l and ecpg, but not
> elsewhere; not sure if there's a performance reason for that.
Ah: the flex manual specifically calls out "%option yylineno"
as something that has a moderate performance cost. So that's
why we don't use it except in non-performance-critical scanners.
Now, it could be argued that pgbench's script scanner doesn't
rise to that level of performance-criticalness, especially not
if we're paying the cost of counting newlines some other way.
I'm not excited about doing a lot of performance analysis here
to decide that. I think we could steal plpgsql's idea to
keep the code structure basically the same while avoiding the
O(N^2) repeat scans, and that should be enough.
regards, tom lane
Tom Lane wrote:
> > I see "pgbench -f 50k-select.sql" taking about 5.8 secs of CPU time,
> > out of a total time of 6.7 secs. When run with perf, this profile shows up:
> You ran only a single execution of a 50K-line script? This test
> case feels a little bit artificial. Having said that ...
I guess my use case is unusual, otherwise the O(N^2) parse
time would have been noticed sooner, but it's genuine.
I want to see how much a long sequence of statements inside a
pipeline differs between pgbench and psql with
the proposed patch at [1] that implements pipelining in psql.
psql would not actually send statements until \endpipeline is
reached, whereas pgbench does.
In fact I'd be inclined to push many more statements into the pipeline
than 50k, but then the parse time issue really kicks in.
For the moment I'll stay with my quick fix, then I'll try
to come up with something to replace expr_scanner_get_lineno().
[1]: https://commitfest.postgresql.org/patch/5407/
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
"Daniel Verite" <daniel@manitou-mail.org> writes:
> For the moment I'll stay with my quick fix, then I'll try
> to come up with something to replace expr_scanner_get_lineno().
I got nerd-sniped by this question and spent some time looking into
it. ParseScript has got worse problems than just being slow: it's
actively buggy. Notice that start_offset is set only once before
entering the loop, and doesn't change thereafter. How is it that
we're getting sane line numbers at all? The reason is that (1) if
we've not called yylex() at all yet, expr_scanner_offset() gives the
distance to the end of the string, since the yytext-ending NUL it's
looking for isn't there yet; and (2) expr_scanner_get_lineno() treats
the given start_offset as an upper bound, and won't complain if it
finds the NUL earlier than that. So it gave the desired
line-number-of-the-current-token on all iterations after the first,
but on the first time through we get the line number of the script
end. You can only see that in the case of \gset as the first command,
and I guess nobody noticed it yet.
Furthermore, it's not only ParseScript that's got O(N^2) problems;
so does process_backslash_command. Your test case didn't show that
up, but a test with 50K backslash commands would. We were actually
doing a strlen() of the whole string for each word of a backslash
command. strlen() is likely faster than expr_scanner_get_lineno(),
but it's not so fast that O(N^2) effects don't matter.
The attached patch gets rid of both expr_scanner_offset() and
expr_scanner_get_lineno() altogether, in favor of using a new
function I added to psqlscan.l. That uses the idea from plpgsql
of tracking the last-detected line end so that we don't have to
rescan prior lines over and over. On my machine, parsing 50K-line
scripts goes from more than 10 seconds to perhaps 50 ms.
regards, tom lane
Attachments:
v1-0001-Get-rid-of-O-N-2-script-parsing-overhead-in-pgben.patch (text/x-diff, +93 -61)
I wrote:
> I got nerd-sniped by this question and spent some time looking into
> it.
On second look, I'd failed to absorb your point about how the main
loop of ParseScript doesn't need the line number at all; only if
it's a backslash command are we going to use that. So we can
move the calculation to be done only after we see a backslash.
I'd spent a little time worrying about how the calculation was
really giving a wrong line number: typically, it'd return the
line number of the previous semicolon, since we haven't lexed
any further than that. That could be fixed with more code,
but it's pretty pointless if we don't need the value in the
first place.
Also, I did a tiny bit of micro-optimization in the first
patch to remove the hazard that it'd still be O(N^2) with
very long input lines.
regards, tom lane
Attachments:
v2-0001-Get-rid-of-O-N-2-script-parsing-overhead-in-pgben.patch (text/x-diff, +93 -61)
v2-0002-Avoid-unnecessary-computation-of-pgbench-s-script.patch (text/x-diff, +11 -13)
Tom Lane wrote:
> I got nerd-sniped by this question and spent some time looking into
> it.
Thank you for the patch! LGTM.
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
"Daniel Verite" <daniel@manitou-mail.org> writes:
> Tom Lane wrote:
> > I got nerd-sniped by this question and spent some time looking into
> > it.
> Thank you for the patch! LGTM.
Thanks for reviewing! Pushed after a tiny bit more polishing.
regards, tom lane