libpq: PQfnumber overload for not null-terminated strings
Hi!
This one comes from C++'s `std::string_view`: being just a `const char*`
plus the `size`, it's a very convenient type to use in interfaces which
don't need an ownership of the data passed in.
Unfortunately, `PQfnumber` expects a null-terminated string, which
`std::string_view` can not guarantee, and this limitations affects the
interfaces built on top of libpq.
Would you be willing to review a patch that adds an `PQfnumber` overload
that takes a `field_name` size as well?
Ivan Trofimov <i.trofimow@yandex.ru> writes:
Would you be willing to review a patch that adds an `PQfnumber` overload
that takes a `field_name` size as well?
I'm a little skeptical of this idea. If you need counted strings
for PQfnumber, wouldn't you need them for every single other
string-based API in libpq as well? That's not a lift that I think
we'd want to undertake.
regards, tom lane
Thanks for the quick reply.
If you need counted strings
for PQfnumber, wouldn't you need them for every single other
string-based API in libpq as well?
No, not really.
Thing is, out of all the functions listed in "34.3.2. Retrieving Query
Result Information" and "34.3.3. Retrieving Other Result Information"
PQfnumber is the only (well, technically also PQprint) one that takes a
string as an input.
As far as I know PQfnumber is the only portable way to implement "give
me the the value at this row with this 'column_name'", which is an
essential feature for any kind of client-side parsing.
Right now as a library writer in a higher-level language I'm forced to
either
* Sacrifice performance to ensure 'column_name' is null-terminated
(that's what some bindings in Rust do)
* Sacrifice interface quality by requiring a null-terminated string,
which is not necessary idiomatic (that's what we do)
* Sacrifice usability by requiring a user to guarantee that the
'string_view' provided is null-terminated (that's what libpqxx does, for
example)
I don't think it's _that_ big of a deal, but could it be QoL improvement
nice enough to be worth of a tiny addition into libpq interface?
Ivan Trofimov <i.trofimow@yandex.ru> writes:
If you need counted strings
for PQfnumber, wouldn't you need them for every single other
string-based API in libpq as well?
No, not really.
Thing is, out of all the functions listed in "34.3.2. Retrieving Query
Result Information" and "34.3.3. Retrieving Other Result Information"
PQfnumber is the only (well, technically also PQprint) one that takes a
string as an input.
I think that's a mighty myopic definition of which APIs would need
counted-string support if we were to make that a thing in libpq.
Just for starters, why are you only concerned with processing a
query result, and not with the functions needed to send the query?
Right now as a library writer in a higher-level language I'm forced to
either
* Sacrifice performance to ensure 'column_name' is null-terminated
(that's what some bindings in Rust do)
I'd go with that. You would have a very hard time convincing me that
the per-query overhead that building a null-terminated string adds
is even measurable compared to the time needed to send, process, and
receive the query.
regards, tom lane
Right now as a library writer in a higher-level language I'm forced to
either
* Sacrifice performance to ensure 'column_name' is null-terminated
(that's what some bindings in Rust do)I'd go with that. You would have a very hard time convincing me that
the per-query overhead
I see now that I failed to express myself clearly: it's not a per-query
overhead, but rather a per-result-field one.
Given a code like this (in pseudo-code)
result = ExecuteQuery(some_query)
for (row in result):
a = row["some_column_name"]
b = row["some_other_column_name"]
...
a field-name string should be null-terminated for every field accessed.
There absolutely are ways to write the same in a more performant way and
avoid repeatedly calling PQfnumber altogether, but that I as a library
writer can't control.
In my quickly-hacked-together test just null-terminating a user-provided
string takes ~14% of total CPU time (and PQfnumber itself takes ~30%,
but oh well), please see the code and flamegraph attached.
On Tue, 27 Feb 2024 at 00:31, Ivan Trofimov <i.trofimow@yandex.ru> wrote:
I see now that I failed to express myself clearly: it's not a per-query
overhead, but rather a per-result-field one.
I'm fairly sympathetic to decreasing the overhead of any per-row
operation. And looking at the code, it doesn't surprise me that
PQfnumber shows up so big in your profile. I think it would probably
make sense to introduce a PQfnumber variant that does not do the
downcasing/quote handling (called e.g. PQfnumberRaw).
However, I do think you could convert this per-row overhead in your
case to per-query overhead by caching the result of PQfnumber for each
unique C++ string_view. Afaict that should solve your performance
problem.
However, I do think you could convert this per-row overhead in your
case to per-query overhead by caching the result of PQfnumber for each
unique C++ string_view. Afaict that should solve your performance
problem.
Absolutely, you're right.
The problem here is not that it's impossible to write it in a performant
way, but rather that it's impossible to do so in a performant _and_
clean way given the convenient abstractions wrapper-libraries provide:
here's a `Result`, which consists of `Row`s, which in turn consist of
`Field`s.
The most natural and straightforward way to iterate over a `Result`
would be in the lines of that loop, and people do write code like that
because it's what they expect to just work given the abstractions (and
it does, it's just slow).
Caching the result of PQfnumber could be done, but would result in
somewhat of a mess on a call-site.
I like your idea of 'PQfnumberRaw': initially i was only concerned about
a null-terminated string requirement affecting my interfaces (because
users complained about that to me,
https://github.com/userver-framework/userver/issues/494), but I think
PQfnumberRaw could solve both problems (PQfnumber being a bottleneck
when called repeatedly and a null-terminated string requirement)
simultaneously.
On Tue, 27 Feb 2024 at 15:49, Ivan Trofimov <i.trofimow@yandex.ru> wrote:
Caching the result of PQfnumber could be done, but would result in
somewhat of a mess on a call-site.
To be clear I meant your wrapper around libpq could internally cache
this, then the call sites of users of your wrapper would not need to
be changed. i.e. your Result could contain a cache of
columnname->columnumber mapping that you know because of previous
calls to PQfnumber on the same Result.
I like your idea of 'PQfnumberRaw': initially i was only concerned about
a null-terminated string requirement affecting my interfaces (because
users complained about that to me,
https://github.com/userver-framework/userver/issues/494), but I think
PQfnumberRaw could solve both problems (PQfnumber being a bottleneck
when called repeatedly and a null-terminated string requirement)
simultaneously.
Feel free to send a patch for this.