computing completion tag is expensive for pgbench -S -M prepared

Started by Andres Freundover 7 years ago13 messages
#1Andres Freund
andres@anarazel.de
1 attachment(s)

Hi,

While looking at a profile I randomly noticed that we spend a surprising
amount of time in snprintf() and its subsidiary functions. That turns
out to be
if (strcmp(portal->commandTag, "SELECT") == 0)
snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
"SELECT " UINT64_FORMAT, nprocessed);

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

yields nearly a ~2% increase in TPS. Larger than I expected. The code
is obviously less pretty, but it's also not actually that bad.

Attached is the patch I used for benchmarking. I wonder if I just hit
some specific version of glibc that regressed snprintf performance, or
whether others can reproduce this.

If it actually reproducible, I think we should go for it. But update the
rest of the completionTag writes in the same file too.

Greetings,

Andres Freund

Attachments:

completion-tag.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 66cc5c35c68..88179e5754a 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -24,6 +24,7 @@
 #include "pg_trace.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -780,8 +781,15 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
 				if (completionTag && portal->commandTag)
 				{
 					if (strcmp(portal->commandTag, "SELECT") == 0)
+					{
+#if 0
 						snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 								 "SELECT " UINT64_FORMAT, nprocessed);
+#else
+						memcpy(completionTag, "SELECT ", sizeof("SELECT "));
+						pg_lltoa(nprocessed, completionTag + 7);
+#endif
+					}
 					else
 						strcpy(completionTag, portal->commandTag);
 				}
#2David Rowley
david.rowley@2ndquadrant.com
In reply to: Andres Freund (#1)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#2)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 06:01, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Sounds better

Do we actually need the completion tag at all? In most cases??

Perhaps we should add a parameter to make it optional and turn it off
by default, except for psql.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Simon Riggs (#3)
Re: computing completion tag is expensive for pgbench -S -M prepared

2018-06-07 12:01 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:

On 7 June 2018 at 06:01, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT

"));

pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Sounds better

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

Regards

Pavel

Show quoted text

Perhaps we should add a parameter to make it optional and turn it off
by default, except for psql.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Simon Riggs
simon@2ndquadrant.com
In reply to: Pavel Stehule (#4)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: computing completion tag is expensive for pgbench -S -M prepared

Andres Freund <andres@anarazel.de> writes:

... That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

While I don't have any objection to this change if the speedup is
reproducible, I do object to spelling the same constant as
'sizeof("SELECT ")' and '7' on adjacent lines ...

regards, tom lane

#7Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#5)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 2018-06-07 11:40:48 +0100, Simon Riggs wrote:

On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

How do you mean it then? We can't really easily change how we return
that value on the protocol level, and the command tag is basically just
returned as a string in the protocol. If we were to design the protocol
today I'm sure we just would declare the rowcount and affected rowcounts
separate fields or something, but ...

Greetings,

Andres Freund

#8Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#6)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 2018-06-07 10:30:14 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

... That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

While I don't have any objection to this change if the speedup is
reproducible, I do object to spelling the same constant as
'sizeof("SELECT ")' and '7' on adjacent lines ...

Hah, yes. Nor would I want to keep the #if 0 around it ;). I mostly
wanted to know whether others can reproduce the effect - the actual
patch would need to be bigger and affect more places.

Greetings,

Andres Freund

#9Andres Freund
andres@anarazel.de
In reply to: David Rowley (#2)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 2018-06-07 17:01:47 +1200, David Rowley wrote:

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I'm kinda surprised that I never noticed this until recently. I wonder
if there's a glibc or compiler regression causing this. There's some
string optimization passes, it could be that it now does less.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Generally sounds reasonable, I'm not sure it's worth to do the surgery
to avoid the strcmp(). That's a larger change that's somewhat
independent...

Greetings,

Andres Freund

#10Simon Riggs
simon@2ndquadrant.com
In reply to: Andres Freund (#7)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 19:20, Andres Freund <andres@anarazel.de> wrote:

On 2018-06-07 11:40:48 +0100, Simon Riggs wrote:

On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

How do you mean it then? We can't really easily change how we return
that value on the protocol level, and the command tag is basically just
returned as a string in the protocol. If we were to design the protocol
today I'm sure we just would declare the rowcount and affected rowcounts
separate fields or something, but ...

I meant remove the pointless noise word at the start of the tag that
few clients care about.

I was thinking of just returning "SQL" instead, but that wasn't after
much thought.

But now I think about it more returning any fixed string, "SQL" or
"SELECT", in the protocol seems like a waste of bandwidth and a
potential route in to decrypt the stream.

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#10)
Re: computing completion tag is expensive for pgbench -S -M prepared

Simon Riggs <simon@2ndquadrant.com> writes:

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

Breaking the wire protocol was nowhere in this thread.

regards, tom lane

#12Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#11)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 20:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

Breaking the wire protocol was nowhere in this thread.

No, it wasn't. But there is another thread on the subject of
compressing libpq, which is what I was referring to.

Andres' patch is clearly very efficient at adding the SELECT tag. I am
questioning if we can remove that need altogether.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#12)
Re: computing completion tag is expensive for pgbench -S -M prepared

On 2018-06-07 20:34:39 +0100, Simon Riggs wrote:

On 7 June 2018 at 20:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

Breaking the wire protocol was nowhere in this thread.

No, it wasn't. But there is another thread on the subject of
compressing libpq, which is what I was referring to.

Andres' patch is clearly very efficient at adding the SELECT tag. I am
questioning if we can remove that need altogether.

That'd be a wire protocol break. We'd have to add compatibilities for
both things in the client, wait a couple years, and then change. Or we
could make it an optional thing based on a client option passed at
connect. Which'd also take a good while. Those seem extremely
disproportionate complicated solutions for the problem. Nor can I
believe that a "SELECT " in the resultset is a meaningful space issue,
making it even worthwhile to break compat in the first place.

Greetings,

Andres Freund