libpq: PQgetCopyData() and allocation overhead

Started by Jeroen Vermeulenabout 3 years ago20 messageshackers

jtvjtv@gmail.com

about 3 years ago

Would there be interest in a variant of PQgetCopyData() that re-uses the
same buffer for a new row, rather than allocating a new buffer on each
iteration?

I tried it on a toy benchmark, and it reduced client-side CPU time by about
12%. (Less of a difference in wall-clock time of course; the client was
only using the CPU for a bit over half the time.)

Jeroen

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 3 years ago

In reply to: Jeroen Vermeulen (#1)

Re: libpq: PQgetCopyData() and allocation overhead

On Fri, Feb 10, 2023 at 3:43 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

Would there be interest in a variant of PQgetCopyData() that re-uses the same buffer for a new row, rather than allocating a new buffer on each iteration?

I tried it on a toy benchmark, and it reduced client-side CPU time by about 12%. (Less of a difference in wall-clock time of course; the client was only using the CPU for a bit over half the time.)

Interesting. It might improve logical replication performance too as
it uses COPY protocol.

Do you mind sharing a patch, test case that you used and steps to
verify the benefit?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Bharath Rupireddy (#2)

Re: libpq: PQgetCopyData() and allocation overhead

Here's the patch (as a PR just to make it easy to read):
https://github.com/jtv/postgres/pull/1

I don't have an easily readable benchmark yet, since I've been timing the
potential impact on libpqxx. But can do that next.

Jeroen

On Fri, Feb 10, 2023, 11:26 Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Show quoted text

On Fri, Feb 10, 2023 at 3:43 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

Would there be interest in a variant of PQgetCopyData() that re-uses the

same buffer for a new row, rather than allocating a new buffer on each
iteration?

I tried it on a toy benchmark, and it reduced client-side CPU time by

about 12%. (Less of a difference in wall-clock time of course; the client
was only using the CPU for a bit over half the time.)

Interesting. It might improve logical replication performance too as
it uses COPY protocol.

Do you mind sharing a patch, test case that you used and steps to
verify the benefit?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Jeroen Vermeulen (#3)

Re: libpq: PQgetCopyData() and allocation overhead

OK, I've updated the PR with a benchmark (in the main directory).

On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and
a 8% reduction in "system" CPU time. (Almost no reduction in wall-clock
time.)

Jeroen

On Fri, 10 Feb 2023 at 11:32, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

Show quoted text

Here's the patch (as a PR just to make it easy to read):
https://github.com/jtv/postgres/pull/1

I don't have an easily readable benchmark yet, since I've been timing the
potential impact on libpqxx. But can do that next.

Jeroen

On Fri, Feb 10, 2023, 11:26 Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Feb 10, 2023 at 3:43 PM Jeroen Vermeulen <jtvjtv@gmail.com>
wrote:

Would there be interest in a variant of PQgetCopyData() that re-uses

the same buffer for a new row, rather than allocating a new buffer on each
iteration?

I tried it on a toy benchmark, and it reduced client-side CPU time by

about 12%. (Less of a difference in wall-clock time of course; the client
was only using the CPU for a bit over half the time.)

Interesting. It might improve logical replication performance too as
it uses COPY protocol.

Do you mind sharing a patch, test case that you used and steps to
verify the benefit?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 3 years ago

In reply to: Jeroen Vermeulen (#4)

Re: libpq: PQgetCopyData() and allocation overhead

On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

OK, I've updated the PR with a benchmark (in the main directory).

On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and a 8% reduction in "system" CPU time. (Almost no reduction in wall-clock time.)

I can help run some logical replication performance benchmarks
tomorrow. Would you mind cleaning the PR and providing the changes
(there are multiple commits in the PR) as a single patch here for the
sake of ease of review and test?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Jelte Fennema-Nio

postgres@jeltef.nl

about 3 years ago

In reply to: Bharath Rupireddy (#5)

Re: libpq: PQgetCopyData() and allocation overhead

Instead of implementing new growable buffer logic in this patch. It
seems much nicer to reuse the already existing PQExpBuffer type for
this patch.

On Mon, 27 Feb 2023 at 14:48, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Show quoted text

On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

OK, I've updated the PR with a benchmark (in the main directory).

On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and a 8% reduction in "system" CPU time. (Almost no reduction in wall-clock time.)

I can help run some logical replication performance benchmarks
tomorrow. Would you mind cleaning the PR and providing the changes
(there are multiple commits in the PR) as a single patch here for the
sake of ease of review and test?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Bharath Rupireddy (#5)

Re: libpq: PQgetCopyData() and allocation overhead

Done. Thanks for looking!

Jelte Fennema pointed out that I should probably be using PQExpBuffer for
this. I'll look into that later; this is a proof of concept, not a
production-ready API proposal.

Jeroen

On Mon, 27 Feb 2023 at 14:48, Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Show quoted text

On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

OK, I've updated the PR with a benchmark (in the main directory).

On this benchmark I'm seeing about a 24% reduction in "user" CPU time,

and a 8% reduction in "system" CPU time. (Almost no reduction in
wall-clock time.)

I can help run some logical replication performance benchmarks
tomorrow. Would you mind cleaning the PR and providing the changes
(there are multiple commits in the PR) as a single patch here for the
sake of ease of review and test?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Jeroen Vermeulen (#7)

Re: libpq: PQgetCopyData() and allocation overhead

Update: in the latest iteration, I have an alternative that reduces CPU
time by more than half, compared to PQgetCopyData().

And the code is simpler, too, both in the client and in libpq itself. The
one downside is that it breaks with libpq's existing API style.

PR for easy discussion: https://github.com/jtv/postgres/pull/1

Jeroen

On Mon, 27 Feb 2023 at 17:08, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

Show quoted text

Done. Thanks for looking!

Jelte Fennema pointed out that I should probably be using PQExpBuffer for
this. I'll look into that later; this is a proof of concept, not a
production-ready API proposal.

Jeroen

On Mon, 27 Feb 2023 at 14:48, Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen <jtvjtv@gmail.com>
wrote:

OK, I've updated the PR with a benchmark (in the main directory).

On this benchmark I'm seeing about a 24% reduction in "user" CPU time,

and a 8% reduction in "system" CPU time. (Almost no reduction in
wall-clock time.)

I can help run some logical replication performance benchmarks
tomorrow. Would you mind cleaning the PR and providing the changes
(there are multiple commits in the PR) as a single patch here for the
sake of ease of review and test?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Daniel Gustafsson

daniel@yesql.se

about 3 years ago

In reply to: Jeroen Vermeulen (#8)

Re: libpq: PQgetCopyData() and allocation overhead

On 1 Mar 2023, at 15:23, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

PR for easy discussion: https://github.com/jtv/postgres/pull/1

The process for discussing work on pgsql-hackers is to attach the patch to the
email and discuss it inline in the thread. That way all versions of the patch
as well as the discussion is archived and searchable.

--
Daniel Gustafsson

#10

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Daniel Gustafsson (#9)

Re: libpq: PQgetCopyData() and allocation overhead

My apologies. The wiki said to discuss early, even before writing the code
if possible, but I added a link to the PR for those who really wanted to
see the details.

I'm attaching a diff now. This is not a patch, it's just a discussion
piece.

The problem was that PQgetCopyData loops use a lot of CPU time, and this
alternative reduces that by a lot.

Jeroen

On Thu, 2 Mar 2023 at 13:38, Daniel Gustafsson <daniel@yesql.se> wrote:

Show quoted text

On 1 Mar 2023, at 15:23, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

PR for easy discussion: https://github.com/jtv/postgres/pull/1

The process for discussing work on pgsql-hackers is to attach the patch to
the
email and discuss it inline in the thread. That way all versions of the
patch
as well as the discussion is archived and searchable.

--
Daniel Gustafsson

#11

Jelte Fennema-Nio

postgres@jeltef.nl

about 3 years ago

In reply to: Jeroen Vermeulen (#10)

Re: libpq: PQgetCopyData() and allocation overhead

On Thu, 2 Mar 2023 at 20:45, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

I'm attaching a diff now. This is not a patch, it's just a discussion piece.

Did you try with PQExpBuffer? I still think that probably fits better
in the API design of libpq. Obviously if it's significantly slower
than the callback approach in this patch then it's worth considering
using the callback approach. Overall it definitely seems reasonable to
me to have an API that doesn't do an allocation per row.

#12

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Jelte Fennema-Nio (#11)

Re: libpq: PQgetCopyData() and allocation overhead

Thank you.

I meant to try PQExpBuffer — as you say, it fits much better with existing
libpq style. But then I hit on the callback idea which was even more
efficient, by a fair margin. It was also considerably simpler both inside
libpq and in the client code, eliminating all sorts of awkward questions
about who deallocates the buffer in which situations. So I ditched the
"dynamic buffer" concept and went with the callback.

One other possible alternative suggests itself: instead of taking a
callback and a context pointer, the function could probably just return a
struct: status/size, and buffer. Then the caller would have to figure out
whether there's a line in the buffer, and if so, process it. It seems like
more work for the client code, but it may make the compiler's optimisation
work easier.

Jeroen

On Fri, 3 Mar 2023 at 16:52, Jelte Fennema <postgres@jeltef.nl> wrote:

Show quoted text

On Thu, 2 Mar 2023 at 20:45, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

I'm attaching a diff now. This is not a patch, it's just a discussion

piece.

Did you try with PQExpBuffer? I still think that probably fits better
in the API design of libpq. Obviously if it's significantly slower
than the callback approach in this patch then it's worth considering
using the callback approach. Overall it definitely seems reasonable to
me to have an API that doesn't do an allocation per row.

#13

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Jelte Fennema-Nio (#11)

Re: libpq: PQgetCopyData() and allocation overhead

Jelte Fennema <postgres@jeltef.nl> writes:

Did you try with PQExpBuffer? I still think that probably fits better
in the API design of libpq.

If you mean exposing PQExpBuffer to users of libpq-fe.h, I'd be very
seriously against that. I realize that libpq exposes it at an ABI
level, but that doesn't mean we want non-Postgres code to use it.
I also don't see what it'd add for this particular use-case.

One thing I don't care for at all in the proposed API spec is the bit
about how the handler function can scribble on the passed buffer.
Let's not do that. Declare it const char *, or maybe better const void *.

Rather than duplicating most of pqGetCopyData3, I'd suggest revising
it to take a callback, where the callback is either user-supplied
or is supplied by PQgetCopyData to emulate the existing behavior.
This would both avoid duplicate coding and provide a simple check that
you've made a usable callback API (in particular, one that can do
something sane for error cases).

regards, tom lane

#14

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Tom Lane (#13)

Re: libpq: PQgetCopyData() and allocation overhead

On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you mean exposing PQExpBuffer to users of libpq-fe.h, I'd be very
seriously against that. I realize that libpq exposes it at an ABI
level, but that doesn't mean we want non-Postgres code to use it.
I also don't see what it'd add for this particular use-case.

Fair enough. Never even got around to checking whether it was in the API
already.

One thing I don't care for at all in the proposed API spec is the bit
about how the handler function can scribble on the passed buffer.
Let's not do that. Declare it const char *, or maybe better const void *.

Personally I would much prefer "char" over "void" here:
* It really is a char buffer, containing text.
* If there is to be any type punning, best have it explicit.
* Reduces risk of getting the two pointer arguments the wrong way around.

As for const, I would definitely have preferred that. But if the caller
needs a zero-terminated string, forcing them into a memcpy() would kind of
defeat the purpose.

I even tried poking a terminating zero in there from inside the function,
but that made the code significantly less efficient. Optimiser
assumptions, I suppose.

Rather than duplicating most of pqGetCopyData3, I'd suggest revising

it to take a callback, where the callback is either user-supplied
or is supplied by PQgetCopyData to emulate the existing behavior.
This would both avoid duplicate coding and provide a simple check that
you've made a usable callback API (in particular, one that can do
something sane for error cases).

Can do that, sure. I'll also try benchmarking a variant that doesn't take
a callback at all, but gives you the buffer pointer in addition to the
size/status return. I don't generally like callbacks.

Jeroen

#15

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Jeroen Vermeulen (#14)

Re: libpq: PQgetCopyData() and allocation overhead

Jeroen Vermeulen <jtvjtv@gmail.com> writes:

On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Let's not do that. Declare it const char *, or maybe better const void *.

Personally I would much prefer "char" over "void" here:
* It really is a char buffer, containing text.

Not in binary-mode COPY.

As for const, I would definitely have preferred that. But if the caller
needs a zero-terminated string, forcing them into a memcpy() would kind of
defeat the purpose.

I'm willing to grant that avoiding malloc-and-free is worth the trouble.
I'm not willing to allow applications to scribble on libpq buffers to
avoid memcpy. Even your not-a-patch patch fails to make the case that
this is essential, because you could have used fwrite() instead of
printf() (which would be significantly faster yet btw, printf formatting
ain't cheap).

Can do that, sure. I'll also try benchmarking a variant that doesn't take
a callback at all, but gives you the buffer pointer in addition to the
size/status return. I don't generally like callbacks.

Um ... that would require an assumption that libpq neither changes nor
moves that buffer before returning to the caller. I don't much like
that either.

regards, tom lane

#16

Jelte Fennema-Nio

postgres@jeltef.nl

about 3 years ago

In reply to: Tom Lane (#13)

Re: libpq: PQgetCopyData() and allocation overhead

On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you mean exposing PQExpBuffer to users of libpq-fe.h, I'd be very
seriously against that. I realize that libpq exposes it at an ABI
level, but that doesn't mean we want non-Postgres code to use it.

The code comment in the pqexpbuffer.h header suggests that client
applications are fine too use the API to:

* This module is essentially the same as the backend's StringInfo data type,
* but it is intended for use in frontend libpq and client applications.

I know both pg_auto_failover and pgcopydb use it quite a lot.

#17

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Jelte Fennema-Nio (#16)

Re: libpq: PQgetCopyData() and allocation overhead

Jelte Fennema <postgres@jeltef.nl> writes:

On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you mean exposing PQExpBuffer to users of libpq-fe.h, I'd be very
seriously against that. I realize that libpq exposes it at an ABI
level, but that doesn't mean we want non-Postgres code to use it.

The code comment in the pqexpbuffer.h header suggests that client
applications are fine too use the API to:

Our own client apps, sure. But you have to buy into the whole Postgres
compilation environment to use PQExpBuffer. (If you don't believe me,
just try including pqexpbuffer.h by itself.) That's a non-starter
for most clients.

regards, tom lane

#18

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Tom Lane (#15)

Re: libpq: PQgetCopyData() and allocation overhead

On Fri, 3 Mar 2023 at 18:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeroen Vermeulen <jtvjtv@gmail.com> writes:

On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Let's not do that. Declare it const char *, or maybe better const void

*.

Personally I would much prefer "char" over "void" here:
* It really is a char buffer, containing text.

Not in binary-mode COPY.

True. And in that case zero-termination doesn't matter much either. But
overall libpq's existing choice seems reasonable.

As for const, I would definitely have preferred that. But if the caller

needs a zero-terminated string, forcing them into a memcpy() would kind

of

defeat the purpose.

I'm willing to grant that avoiding malloc-and-free is worth the trouble.
I'm not willing to allow applications to scribble on libpq buffers to
avoid memcpy. Even your not-a-patch patch fails to make the case that
this is essential, because you could have used fwrite() instead of
printf() (which would be significantly faster yet btw, printf formatting
ain't cheap).

Your house, your rules. For my own use-case "const" is just peachy.

The printf() is just the simplest example that sprang to mind though.
There may be other use-cases out there involving libraries that require
zero-terminated strings, and I figured an ability to set a sentinel could
help those.

Can do that, sure. I'll also try benchmarking a variant that doesn't take

a callback at all, but gives you the buffer pointer in addition to the
size/status return. I don't generally like callbacks.

Um ... that would require an assumption that libpq neither changes nor
moves that buffer before returning to the caller. I don't much like
that either.

Not an assumption about _before returning to the caller_ I guess, because
the function would be on top of that anyway. The concern would be libpq
changing or moving the buffer _before the caller is done with the line._
Which would require some kind of clear rule about what invalidates the
buffer. Yes, that is easier with the callback.

Jeroen

#19

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Jeroen Vermeulen (#18)

Re: libpq: PQgetCopyData() and allocation overhead

Jeroen Vermeulen <jtvjtv@gmail.com> writes:

The printf() is just the simplest example that sprang to mind though.
There may be other use-cases out there involving libraries that require
zero-terminated strings, and I figured an ability to set a sentinel could
help those.

Well, since it won't help for binary COPY, I'm skeptical that this is
something we should cater to. Anybody who's sufficiently worried about
performance to be trying to remove malloc/free cycles ought to be
interested in binary COPY as well.

regards, tom lane

#20

Jeroen Vermeulen

jtvjtv@gmail.com

about 3 years ago

In reply to: Tom Lane (#19)

Re: libpq: PQgetCopyData() and allocation overhead

Interested, yes. But there may be reasons not to do that. Last time I
looked the binary format wasn't documented.

Anyway, I tried re-implementing pqGetCopyData3() using the callback.
Wasn't hard, but I did have to add a way for the callback to return an
error. Kept it pretty dumb for now, hoping a sensible rule will become
obvious later.

Saw no obvious performance impact, so that's good.

Jeroen

On Fri, 3 Mar 2023 at 19:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Jeroen Vermeulen <jtvjtv@gmail.com> writes:

The printf() is just the simplest example that sprang to mind though.
There may be other use-cases out there involving libraries that require
zero-terminated strings, and I figured an ability to set a sentinel could
help those.

Well, since it won't help for binary COPY, I'm skeptical that this is
something we should cater to. Anybody who's sufficiently worried about
performance to be trying to remove malloc/free cycles ought to be
interested in binary COPY as well.

regards, tom lane

libpq: PQgetCopyData() and allocation overhead

Attachments:

Attachments: