Performance penalty when requesting text values in binary format
I'm the creator of the PostgreSQL driver pgx (https://github.com/jackc/pgx)
for the Go language. I have found significant performance advantages to
using the extended protocol and binary format values -- in particular for
types such as timestamptz.
However, I was recently very surprised to find that it is significantly
slower to select a text type value in the binary format. For example, when
selecting 1,000 rows, each with 5 text columns of 16 bytes each, the
application time from sending the query to having received the entire
response is approximately 16% longer. Here is a link to the test benchmark:
https://github.com/jackc/pg_text_binary_bench
Given that the text and binary formats for the text type are identical, I
would not have expected any performance difference.
My C is rusty and my knowledge of the PG server internals is minimal, but
the performance difference appears to come from the function textsend
creating an extra copy, whereas textout simply returns a pointer to the
existing data. That extra copy seems to be superfluous.
I can work around this by specifying the format per result column instead
of specifying binary for all columns, but this performance bug / anomaly
seemed worth reporting.
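The per-column workaround can be sketched as follows. This is a minimal illustration, not pgx's actual implementation: the helper names resultFormatCodes and appendResultFormats are hypothetical, and the example result shape (one timestamptz column followed by two text columns) is an assumption. It relies only on the protocol's format codes (0 for text, 1 for binary), the pg_type OIDs for text, varchar and timestamptz, and the layout of the result-format section at the end of a Bind message (an Int16 count followed by one Int16 code per column).

```go
package main

import "fmt"

// Format codes defined by the PostgreSQL frontend/backend protocol.
const (
	textFormat   int16 = 0
	binaryFormat int16 = 1
)

// OIDs of built-in types in pg_type.
const (
	textOID        uint32 = 25
	varcharOID     uint32 = 1043
	timestamptzOID uint32 = 1184
)

// resultFormatCodes is a hypothetical helper: it requests the text format
// for text-like columns and the binary format for every other column.
func resultFormatCodes(columnOIDs []uint32) []int16 {
	codes := make([]int16, len(columnOIDs))
	for i, oid := range columnOIDs {
		switch oid {
		case textOID, varcharOID:
			codes[i] = textFormat
		default:
			codes[i] = binaryFormat
		}
	}
	return codes
}

// appendResultFormats appends the trailing section of a Bind message:
// a big-endian Int16 count, then one big-endian Int16 code per column.
func appendResultFormats(buf []byte, codes []int16) []byte {
	n := uint16(len(codes))
	buf = append(buf, byte(n>>8), byte(n))
	for _, c := range codes {
		v := uint16(c)
		buf = append(buf, byte(v>>8), byte(v))
	}
	return buf
}

func main() {
	// Assumed result shape: timestamptz, text, text.
	codes := resultFormatCodes([]uint32{timestamptzOID, textOID, textOID})
	fmt.Println(codes)                                    // prints [1 0 0]
	fmt.Printf("% x\n", appendResultFormats(nil, codes)) // prints 00 03 00 01 00 00 00 00
}
```

Sending a single code of 1 instead would request binary for every column, which is the case that showed the slowdown for text values.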
Jack
On Sat, 2020-05-16 at 20:12 -0500, Jack Christensen wrote:
I'm the creator of the PostgreSQL driver pgx (https://github.com/jackc/pgx) for the Go language.
I have found significant performance advantages to using the extended protocol and binary format
values -- in particular for types such as timestamptz.
However, I was recently very surprised to find that it is significantly slower to select a text
type value in the binary format. For an example case of selecting 1,000 rows each with 5 text
columns of 16 bytes each the application time from sending the query to having received the
entire response is approximately 16% slower. Here is a link to the test benchmark:
https://github.com/jackc/pg_text_binary_bench
Given that the text and binary formats for the text type are identical I would not have
expected any performance differences.
My C is rusty and my knowledge of the PG server internals is minimal but the performance
difference appears to be that function textsend creates an extra copy where textout
simply returns a pointer to the existing data. This seems to be superfluous.
I can work around this by specifying the format per result column instead of specifying
binary for all but this performance bug / anomaly seemed worth reporting.
Did you profile your benchmark?
It would be interesting to know where the time is spent.
Yours,
Laurenz Albe
On Mon, May 18, 2020 at 7:07 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:
Did you profile your benchmark?
It would be interesting to know where the time is spent.
Unfortunately, I have not. Fortunately, it appears that Tom Lane recognized
this as part of another issue and has prepared a patch:
/messages/by-id/6648.1589819885@sss.pgh.pa.us
Thanks,
Jack