Use ereport() instead of elog() for invalid weights in setweight()

Started by Ewan Young20 days ago4 messageshackers
Jump to latest
#1Ewan Young
kdbase.hack@gmail.com

Hi hackers,

I noticed that setweight() reports an internal error (SQLSTATE XX000)
when the weight argument is not one of A/a, B/b, C/c, D/d, even though
the weight comes directly from user input. The two-argument variant
also prints the weight as a raw ASCII code, which is a bit unfriendly:

=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: unrecognized weight: 112

ts_filter() in the same file (tsvector_op.c) already handles the
equivalent case with ereport() and ERRCODE_INVALID_PARAMETER_VALUE,
so the attached patch simply makes tsvector_setweight() and
tsvector_setweight_by_filter() do the same, and adds regression tests
covering the three error paths (none of which were covered before).

This seems to be in the same spirit as the earlier cleanup of
user-reachable internal error codes [1]/messages/by-id/Zic_GNgos5sMxKoa@paquier.xyz; these two sites appear to
have been missed there.

The patch is against the master and passes make check. Please let me know
if I've missed anything -- I'd be happy to revise.

[1]: /messages/by-id/Zic_GNgos5sMxKoa@paquier.xyz

Best regards,
Ewan Young

Attachments:

v1-0001-Use-ereport-not-elog-for-invalid-weights-in-setweight.patchapplication/octet-stream; name=v1-0001-Use-ereport-not-elog-for-invalid-weights-in-setweight.patchDownload+17-5
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ewan Young (#1)
Re: Use ereport() instead of elog() for invalid weights in setweight()

Ewan Young <kdbase.hack@gmail.com> writes:

I noticed that setweight() reports an internal error (SQLSTATE XX000)
when the weight argument is not one of A/a, B/b, C/c, D/d, even though
the weight comes directly from user input. The two-argument variant
also prints the weight as a raw ASCII code, which is a bit unfriendly:

=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: unrecognized weight: 112

I agree that these ought to be ereport()s. However, I suspect that
the reason for printing bogus weights numerically was to avoid the
risk of generating encoding-incorrect strings if the given char
value has its high bit set. The existing code in tsvector_filter
is failing to consider that hazard.

I experimented with making the error messages print non-ASCII
characters differently, and soon decided that that added enough
complexity that we shouldn't have three copies of it. So the
attached proposed v2 also factors the code out into a new
function parse_weight (maybe a different name would be better?).

I'm unconvinced that we really need a regression test case for
this ...

regards, tom lane

Attachments:

v2-0001-Use-ereport-not-elog-for-invalid-weights-in-setweight.patchtext/x-diff; charset=us-ascii; name*0=v2-0001-Use-ereport-not-elog-for-invalid-weights-in-setweig; name*1=ht.patchDownload+30-57
#3Ewan Young
kdbase.hack@gmail.com
In reply to: Tom Lane (#2)
Re: Use ereport() instead of elog() for invalid weights in setweight()

On Thu, Jun 4, 2026 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ewan Young <kdbase.hack@gmail.com> writes:

I noticed that setweight() reports an internal error (SQLSTATE XX000)
when the weight argument is not one of A/a, B/b, C/c, D/d, even though
the weight comes directly from user input. The two-argument variant
also prints the weight as a raw ASCII code, which is a bit unfriendly:

=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: unrecognized weight: 112

I agree that these ought to be ereport()s. However, I suspect that
the reason for printing bogus weights numerically was to avoid the
risk of generating encoding-incorrect strings if the given char
value has its high bit set. The existing code in tsvector_filter
is failing to consider that hazard.

Ah, I hadn't considered that. You're right: in a multibyte encoding
the bogus byte could well be a fragment of a multibyte character, so
printing it with %c would inject an invalidly-encoded byte into the
error message. The style used in v2 (matching charout()) keeps
the message pure ASCII, which seems clearly safer.

I experimented with making the error messages print non-ASCII
characters differently, and soon decided that that added enough
complexity that we shouldn't have three copies of it. So the
attached proposed v2 also factors the code out into a new
function parse_weight (maybe a different name would be better?).

Factoring it out looks like a clear improvement. parse_weight reads
fine to me; I don't think it's worth bikeshedding.
I tested v2 on top of current master:
- applies cleanly, builds without new warnings
- core regression suite passes
- manually exercised the error paths, and it works:

=# \set VERBOSITY verbose
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: 22023: unrecognized weight: "p"
LOCATION: parse_weight, tsvector_op.c:236
=# SELECT setweight('cat:1'::tsvector, '\200');
ERROR: 22023: unrecognized weight: "\200"
LOCATION: parse_weight, tsvector_op.c:240

I'm unconvinced that we really need a regression test case for
this ...

Agreed, no objection to dropping it. The behavior worth checking is
the message formatting, which is easy enough to verify by hand.

regards, tom lane

So v2 looks good to me. Thanks for improving the patch!

Best regards,
Ewan Young

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ewan Young (#3)
Re: Use ereport() instead of elog() for invalid weights in setweight()

Ewan Young <kdbase.hack@gmail.com> writes:

So v2 looks good to me. Thanks for improving the patch!

Sounds good, pushed.

regards, tom lane