computing completion tag is expensive for pgbench -S -M prepared

Started by Andres Freundover 7 years ago13 messages

andres@anarazel.de

over 7 years ago

1 attachment(s)

Hi,

While looking at a profile I randomly noticed that we spend a surprising
amount of time in snprintf() and its subsidiary functions. That turns
out to be
if (strcmp(portal->commandTag, "SELECT") == 0)
snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
"SELECT " UINT64_FORMAT, nprocessed);

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

yields nearly a ~2% increase in TPS. Larger than I expected. The code
is obviously less pretty, but it's also not actually that bad.

Attached is the patch I used for benchmarking. I wonder if I just hit
some specific version of glibc that regressed snprintf performance, or
whether others can reproduce this.

If it actually reproducible, I think we should go for it. But update the
rest of the completionTag writes in the same file too.

Greetings,

Andres Freund

Attachments:

completion-tag.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 66cc5c35c68..88179e5754a 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -24,6 +24,7 @@
 #include "pg_trace.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -780,8 +781,15 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
 				if (completionTag && portal->commandTag)
 				{
 					if (strcmp(portal->commandTag, "SELECT") == 0)
+					{
+#if 0
 						snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 								 "SELECT " UINT64_FORMAT, nprocessed);
+#else
+						memcpy(completionTag, "SELECT ", sizeof("SELECT "));
+						pg_lltoa(nprocessed, completionTag + 7);
+#endif
+					}
 					else
 						strcpy(completionTag, portal->commandTag);
 				}

David Rowley

david.rowley@2ndquadrant.com

over 7 years ago

In reply to: Andres Freund (#1)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Simon Riggs

simon@2ndquadrant.com

over 7 years ago

In reply to: David Rowley (#2)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 06:01, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Sounds better

Do we actually need the completion tag at all? In most cases??

Perhaps we should add a parameter to make it optional and turn it off
by default, except for psql.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Pavel Stehule

pavel.stehule@gmail.com

over 7 years ago

In reply to: Simon Riggs (#3)

Re: computing completion tag is expensive for pgbench -S -M prepared

2018-06-07 12:01 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:

On 7 June 2018 at 06:01, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT

"));

pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Sounds better

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

Regards

Pavel

Show quoted text

Perhaps we should add a parameter to make it optional and turn it off
by default, except for psql.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Simon Riggs

simon@2ndquadrant.com

over 7 years ago

In reply to: Pavel Stehule (#4)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#1)

Re: computing completion tag is expensive for pgbench -S -M prepared

Andres Freund <andres@anarazel.de> writes:

... That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

While I don't have any objection to this change if the speedup is
reproducible, I do object to spelling the same constant as
'sizeof("SELECT ")' and '7' on adjacent lines ...

regards, tom lane

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Simon Riggs (#5)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 2018-06-07 11:40:48 +0100, Simon Riggs wrote:

On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

How do you mean it then? We can't really easily change how we return
that value on the protocol level, and the command tag is basically just
returned as a string in the protocol. If we were to design the protocol
today I'm sure we just would declare the rowcount and affected rowcounts
separate fields or something, but ...

Greetings,

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#6)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 2018-06-07 10:30:14 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

... That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

While I don't have any objection to this change if the speedup is
reproducible, I do object to spelling the same constant as
'sizeof("SELECT ")' and '7' on adjacent lines ...

Hah, yes. Nor would I want to keep the #if 0 around it ;). I mostly
wanted to know whether others can reproduce the effect - the actual
patch would need to be bigger and affect more places.

Greetings,

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: David Rowley (#2)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 2018-06-07 17:01:47 +1200, David Rowley wrote:

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:

in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here. A quick benchmark
replacing it with:

memcpy(completionTag, "SELECT ", sizeof("SELECT "));
pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I'm kinda surprised that I never noticed this until recently. I wonder
if there's a glibc or compiler regression causing this. There's some
string optimization passes, it could be that it now does less.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Generally sounds reasonable, I'm not sure it's worth to do the surgery
to avoid the strcmp(). That's a larger change that's somewhat
independent...

Greetings,

Andres Freund

#10

Simon Riggs

simon@2ndquadrant.com

over 7 years ago

In reply to: Andres Freund (#7)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 19:20, Andres Freund <andres@anarazel.de> wrote:

On 2018-06-07 11:40:48 +0100, Simon Riggs wrote:

On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

How do you mean it then? We can't really easily change how we return
that value on the protocol level, and the command tag is basically just
returned as a string in the protocol. If we were to design the protocol
today I'm sure we just would declare the rowcount and affected rowcounts
separate fields or something, but ...

I meant remove the pointless noise word at the start of the tag that
few clients care about.

I was thinking of just returning "SQL" instead, but that wasn't after
much thought.

But now I think about it more returning any fixed string, "SQL" or
"SELECT", in the protocol seems like a waste of bandwidth and a
potential route in to decrypt the stream.

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Simon Riggs (#10)

Re: computing completion tag is expensive for pgbench -S -M prepared

Simon Riggs <simon@2ndquadrant.com> writes:

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

Breaking the wire protocol was nowhere in this thread.

regards, tom lane

#12

Simon Riggs

simon@2ndquadrant.com

over 7 years ago

In reply to: Tom Lane (#11)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 7 June 2018 at 20:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

Breaking the wire protocol was nowhere in this thread.

No, it wasn't. But there is another thread on the subject of
compressing libpq, which is what I was referring to.

Andres' patch is clearly very efficient at adding the SELECT tag. I am
questioning if we can remove that need altogether.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Simon Riggs (#12)

Re: computing completion tag is expensive for pgbench -S -M prepared

On 2018-06-07 20:34:39 +0100, Simon Riggs wrote:

On 7 June 2018 at 20:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

Breaking the wire protocol was nowhere in this thread.

No, it wasn't. But there is another thread on the subject of
compressing libpq, which is what I was referring to.

Andres' patch is clearly very efficient at adding the SELECT tag. I am
questioning if we can remove that need altogether.

That'd be a wire protocol break. We'd have to add compatibilities for
both things in the client, wait a couple years, and then change. Or we
could make it an optional thing based on a client option passed at
connect. Which'd also take a good while. Those seem extremely
disproportionate complicated solutions for the problem. Nor can I
believe that a "SELECT " in the resultset is a meaningful space issue,
making it even worthwhile to break compat in the first place.

Greetings,

Andres Freund