libpq pipelining
Hello,
Using the asynchronous interface of libpq, is it possible to pipeline
multiple queries?
i.e.
PQsendQuery(query1)
PQsendQuery(query2)
followed by
query1_results = PQgetResult(...)
query2_results = PQgetResult(...)
I tried it but got "another command is already in progress" error.
So, maybe it's not supported, or maybe I'm doing something wrong.
Thanks
Samuel
On Friday, June 26, 2020, Samuel Williams <space.ship.traveller@gmail.com>
wrote:
Hello,
Using the asynchronous interface of libpq, is it possible to pipeline
multiple queries?
i.e.
PQsendQuery(query1)
PQsendQuery(query2)
followed by
query1_results = PQgetResult(...)
query2_results = PQgetResult(...)
I tried it but got "another command is already in progress" error.
The documentation seems to leave zero ambiguity:
After successfully calling PQsendQuery, call PQgetResult one or more times
to obtain the results. PQsendQuery cannot be called again (on the same
connection) until PQgetResult has returned a null pointer, indicating that
the command is done.
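In C, that documented pattern looks roughly like this (a minimal sketch; it assumes libpq is available and that connection parameters come from the standard PG* environment variables, and it simply skips if no server is reachable):

```c
/* Minimal sketch of the documented pattern: drain PQgetResult until it
   returns NULL before issuing the next PQsendQuery. Connection details
   are assumed to come from the standard PG* environment variables. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "skipping: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }

    const char *queries[] = {"SELECT 1", "SELECT 2"};
    for (int i = 0; i < 2; i++)
    {
        if (!PQsendQuery(conn, queries[i]))
        {
            fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
            break;
        }
        /* The connection is busy until PQgetResult returns NULL. */
        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                printf("%s -> %s\n", queries[i], PQgetvalue(res, 0, 0));
            PQclear(res);
        }
        /* Only now may PQsendQuery be called again on this connection. */
    }
    PQfinish(conn);
    return 0;
}
```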
David J.
Thanks David,
You are correct.
I was giving an example of what I was hoping to achieve, not what I
expected to work with the current interface.
I found some discussion in the past relating to batch processing which
appears to support some kind of pipelining:
https://2ndquadrant.github.io/postgres/libpq-batch-mode.html
However, it seems to have been abandoned.
Kind regards,
Samuel
On Sat, 27 Jun 2020 at 16:15, David G. Johnston
<david.g.johnston@gmail.com> wrote:
On Friday, June 26, 2020, Samuel Williams <space.ship.traveller@gmail.com>
wrote:
Thanks David,
You are correct.
I was giving an example of what I was hoping to achieve, not what I
expected to work with the current interface.
What about, as it says, sending multiple statements in a single sendQuery
and then polling for multiple results?
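Roughly, in C (a sketch assuming a server reachable via the standard PG* environment variables; it skips otherwise), several statements sent in one PQsendQuery travel in one round trip, and PQgetResult then yields one PGresult per statement:

```c
/* Sketch: multiple statements in a single PQsendQuery (simple query
   protocol); PQgetResult returns one PGresult per statement. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "skipping: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }

    if (PQsendQuery(conn, "SELECT 1; SELECT 2; SELECT 3"))
    {
        PGresult *res;
        int n = 0;
        while ((res = PQgetResult(conn)) != NULL) /* one result per statement */
        {
            printf("result %d: status %s\n", ++n,
                   PQresStatus(PQresultStatus(res)));
            PQclear(res);
        }
    }
    PQfinish(conn);
    return 0;
}
```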
David J.
What about, as it says, sending multiple statements in a single sendQuery and then polling for multiple results?
I tried this, and even in single row streaming mode, I found that
there are cases where the results would not be streamed until all the
queries were sent.
From the user's point of view, they may run a loop sending multiple
queries and not care about the results, so pipeline/batch processing
is ideal to avoid an RTT per loop iteration; if database access is
slow, this can be a significant source of latency.
Kind regards,
Samuel
On Friday, June 26, 2020, Samuel Williams <space.ship.traveller@gmail.com>
wrote:
What about, as it says, sending multiple statements in a single
sendQuery and then polling for multiple results?
I tried this, and even in single row streaming mode, I found that
there are cases where the results would not be streamed until all the
queries were sent.
From the user's point of view, they may run a loop sending multiple
queries and not care about the results, so pipeline/batch processing
is ideal to avoid an RTT per loop iteration; if database access is
slow, this can be a significant source of latency.
I don't have any insight into the bigger picture, but I'd concur that
no other option is documented, so what you desire is not possible.
David J.
Here is a short example:
https://gist.github.com/ioquatix/2f08f78699418f65971035785c80cf18
It makes 10 queries in one "PQsendQuery" and sets single row mode. But
all the results come back at once as shown by the timestamps.
Next I'm planning to investigate streaming large recordsets to see if
it works better/incrementally.
Samuel Williams <space.ship.traveller@gmail.com> writes:
Here is a short example:
https://gist.github.com/ioquatix/2f08f78699418f65971035785c80cf18
It makes 10 queries in one "PQsendQuery" and sets single row mode. But
all the results come back at once as shown by the timestamps.
That looks to be less about what libpq will do than what the Ruby
interface code will do.
The volume of return data may also be an issue. I don't think the
backend will flush data out to the client except when it (a) reaches
an idle state or (b) fills the output buffer. Ten occurrences of
a short query result aren't gonna be enough for (b) --- from memory,
that buffer is probably 8KB.
regards, tom lane
I think libmariadb has a nicer interface for this.
Essentially what you do is send your query, and then read a result set
(one result set per query), and then you stream individual rows using:
mysql_fetch_row_start
mysql_fetch_row_cont
Those methods don't seem to have an equivalent in libpq - you can use
PQgetResult but it buffers all the rows. Using single row mode results
in many results for each query (seems like a big overhead). The
difference between this and MariaDB is that mysql_fetch_row still
operates within one logical set of results, but single row mode breaks
the single logical set of results into lots of individual results.
Maybe the statement about efficiency is incorrect, but it would be
nice if you could incrementally stream a single result set more
easily.
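For reference, single row mode in libpq looks roughly like this (a sketch assuming a server reachable via the standard PG* environment variables; it skips otherwise). Each row arrives as its own PGRES_SINGLE_TUPLE result, which is exactly the per-row PGresult overhead described above:

```c
/* Sketch of single-row mode: after PQsendQuery, PQsetSingleRowMode makes
   PQgetResult return one PGRES_SINGLE_TUPLE result per row, followed by
   a final zero-row PGRES_TUPLES_OK. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "skipping: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }

    if (PQsendQuery(conn, "SELECT generate_series(1, 5)") &&
        PQsetSingleRowMode(conn))
    {
        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("row: %s\n", PQgetvalue(res, 0, 0)); /* one row per result */
            PQclear(res);
        }
    }
    PQfinish(conn);
    return 0;
}
```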
On Sun, 28 Jun 2020 at 02:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greetings,
* Samuel Williams (space.ship.traveller@gmail.com) wrote:
Here is a short example:
https://gist.github.com/ioquatix/2f08f78699418f65971035785c80cf18
It makes 10 queries in one "PQsendQuery" and sets single row mode. But
all the results come back at once as shown by the timestamps.
If you have 10 queries that you want to make in a given transaction and
you care about the latency then really the best option is to wrap that
all in a single pl/pgsql function on the server side and make one call.
Next I'm planning to investigate streaming large recordsets to see if
it works better/incrementally.
If you want to stream large data sets to/from PG, you should consider
using COPY.
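A COPY-based sketch (assuming a server reachable via the standard PG* environment variables; the query is illustrative): COPY ... TO STDOUT delivers data row by row through PQgetCopyData, without building a PGresult per row:

```c
/* Sketch of streaming a result set out of the server with COPY:
   PQgetCopyData hands back one data row per call. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "skipping: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }

    PGresult *res = PQexec(conn,
        "COPY (SELECT generate_series(1, 5)) TO STDOUT");
    if (PQresultStatus(res) == PGRES_COPY_OUT)
    {
        char *buf;
        int len;
        while ((len = PQgetCopyData(conn, &buf, 0)) > 0) /* one row per call */
        {
            fwrite(buf, 1, (size_t) len, stdout);
            PQfreemem(buf);
        }
        PQclear(res);
        res = PQgetResult(conn); /* final command status */
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}
```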
Thanks,
Stephen
Samuel Williams <space.ship.traveller@gmail.com> writes:
Those methods don't seem to have an equivalent in libpq - you can use
PQgetResult but it buffers all the rows. Using single row mode results
in many results for each query (seems like a big overhead).
Have you got any actual evidence for that? Sure, the overhead is
more than zero, but does it mean anything in comparison to the other
costs of data transmission?
Maybe the statement about efficiency is incorrect, but it would be
nice if you could incrementally stream a single result set more
easily.
More easily than what? If we did not construct a PGresult then we would
need some other abstraction for access to the returned row, dealing with
error cases, etc etc. That would mean a lot of very duplicative API code
in libpq, and a painful bunch of adjustments in client code.
regards, tom lane
Tom, I'm implementing a small abstraction layer for event-driven
result streaming on PostgreSQL and MariaDB for Ruby, and I'll endeavor
to report back with some numbers once I have enough of it working to
benchmark something meaningful.
Thanks for your patience and help.
Kind regards,
Samuel
On Tue, 30 Jun 2020 at 02:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hi,
Here are some initial numbers.
DB::Client
Warming up --------------------------------------
db-postgres 281.000 i/100ms
db-mariadb 399.000 i/100ms
mysql2 533.000 i/100ms
pg 591.000 i/100ms
Calculating -------------------------------------
db-postgres 2.725k (± 1.8%) i/s - 13.769k in 5.053750s
db-mariadb 3.990k (± 2.4%) i/s - 19.950k in 5.002453s
mysql2 5.153k (± 4.7%) i/s - 26.117k in 5.079570s
pg 5.772k (± 4.4%) i/s - 28.959k in 5.027423s
Comparison:
pg: 5771.7 i/s
mysql2: 5152.8 i/s - 1.12x (± 0.00) slower
db-mariadb: 3990.3 i/s - 1.45x (± 0.00) slower
db-postgres: 2725.5 i/s - 2.12x (± 0.00) slower
The db-* gems are event driven. However, that is of less interest right now.
This benchmark creates a table, inserts 1000 rows (or some other fixed
number), and then selects them all back out.
What I noticed is that `PQgetvalue` and `PQgetisnull` are not
particularly efficient, at least via FFI: reading a result requires
rows * columns * 2 FFI calls.
libmariadb provides `mysql_fetch_row`, which returns a `char **` per
row, requiring only one FFI call per row.
Does a similar method exist for libpq? e.g. `PGgetrow(index) ->
char**` (array of strings, one for each column, may be nil to indicate
null).
Kind regards,
Samuel
On Tue, 30 Jun 2020 at 12:50, Samuel Williams
<space.ship.traveller@gmail.com> wrote:
Hello Samuel,
On 2020-Jun-27, Samuel Williams wrote:
I found some discussion in the past relating to batch processing which
appears to support some kind of pipelining:
https://2ndquadrant.github.io/postgres/libpq-batch-mode.html
I just noticed this old thread of yours. I've been working on getting
the work you linked to, polished a little bit and completed for
PostgreSQL 14. If you'd like to give it a try, your input would be
very useful to me. You can find the patch here (applies on the current
master branch): /messages/by-id/20210306003559.GA1375@alvherre.pgsql
(If you need help building, please ping me on private email).
The patched libpq version is compatible with older servers.
Thanks
--
Álvaro Herrera       Valdivia, Chile