Dead encoding conversion functions

Started by Tom Laneabout 7 years ago5 messageshackers

tgl@sss.pgh.pa.us

about 7 years ago

Pursuant to today's discussion at PGCon about code coverage, I went
nosing into some of the particularly under-covered subdirectories
in our tree, and immediately tripped over an interesting factoid:
the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
untested ... not because the regression tests don't try, but because
those conversions are unreachable. pg_do_encoding_conversion() and
its sister functions have hard-wired fast paths for any conversion
in which the source or target encoding is SQL_ASCII, so that an
encoding conversion function declared for such a case will never
be used.

(The coverage results do show ascii_to_utf8 as being covered, but
that's just because alter_table.sql randomly chose to test
ALTER CONVERSION using a user-defined conversion from SQL_ASCII
to UTF8, rather than any other case. CreateConversionCommand()
will invoke the specified function on an empty string just to see
if it works, so that's where that "coverage" comes from.)

This situation seems kinda silly. My inclination is to delete
these functions as useless, but I suppose another approach is
to suppress the fast paths if there's a declared conversion function.
(Doing so would likely require added catalog lookups in places we
might not want them...)

If we do delete them as useless, it might also be advisable to change
CreateConversionCommand() to refuse creation of conversions to/from
SQL_ASCII, to prevent future confusion.

Thoughts?

regards, tom lane

Daniel Gustafsson

daniel@yesql.se

about 7 years ago

In reply to: Tom Lane (#1)

Re: Dead encoding conversion functions

On 29 May 2019, at 15:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pursuant to today's discussion at PGCon about code coverage, I went
nosing into some of the particularly under-covered subdirectories
in our tree,

On a similar, but much less important/interesting note. I fat-fingered when
compiling isolationtester on the plane over here and happened to compile
src/test/examples, and in there testlo.c and testlo64.c has two dead functions
for which the callsites have been commented out since the Postgres95 import
(and now cause a warning). Is there any (historic?) reason to keep that code?
It also seems kind of broken as it doesn’t really handle the open() call
failure very well.

cheers ./daniel

Noah Misch

noah@leadboat.com

almost 7 years ago

In reply to: Tom Lane (#1)

Re: Dead encoding conversion functions

On Wed, May 29, 2019 at 03:03:13PM -0400, Tom Lane wrote:

Pursuant to today's discussion at PGCon about code coverage, I went
nosing into some of the particularly under-covered subdirectories
in our tree, and immediately tripped over an interesting factoid:
the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
untested ... not because the regression tests don't try, but because
those conversions are unreachable. pg_do_encoding_conversion() and
its sister functions have hard-wired fast paths for any conversion
in which the source or target encoding is SQL_ASCII, so that an
encoding conversion function declared for such a case will never
be used.

This situation seems kinda silly. My inclination is to delete
these functions as useless, but I suppose another approach is
to suppress the fast paths if there's a declared conversion function.
(Doing so would likely require added catalog lookups in places we
might not want them...)

Removing the fast paths to make ascii_to_utf8() reachable would cause ERROR
when server_encoding=SQL_ASCII, client_encoding=UTF8, and a query would
otherwise send the client any character outside 7-bit ASCII. That's fairly
defensible, but doing it for only UTF8 and MULE_INTERNAL is not. So if we
like the ascii_to_utf8() behavior, I think the action would be to replace the
fast path with an encoding-independent verification that all bytes are 7-bit
ASCII. (The check would not apply when both server_encoding and
client_encoding are SQL_ASCII, of course.) Alternately, one might prefer to
replace the fast path with an encoding verification; in the SQL_ASCII-to-UTF8
case, we'd allow byte sequences that are valid UTF8, even though the validity
may be a coincidence and mojibake may ensue. SQL_ASCII is for being casual
about encoding, so it's not clear to me whether or not either prospective
behavior change would be an improvement. However, I do find it clear to
delete ascii_to_utf8() and ascii_to_mic().

If we do delete them as useless, it might also be advisable to change
CreateConversionCommand() to refuse creation of conversions to/from
SQL_ASCII, to prevent future confusion.

Sounds good.

Peter Eisentraut

peter_e@gmx.net

almost 7 years ago

In reply to: Tom Lane (#1)

Re: Dead encoding conversion functions

On 2019-05-29 21:03, Tom Lane wrote:

If we do delete them as useless, it might also be advisable to change
CreateConversionCommand() to refuse creation of conversions to/from
SQL_ASCII, to prevent future confusion.

It seems nonsensical by definition to allow that.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Peter Eisentraut (#4)

Re: Dead encoding conversion functions

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 2019-05-29 21:03, Tom Lane wrote:

If we do delete them as useless, it might also be advisable to change
CreateConversionCommand() to refuse creation of conversions to/from
SQL_ASCII, to prevent future confusion.

It seems nonsensical by definition to allow that.

Here's a completed patch for that. Obviously this is a bit late
for v12, but if there aren't objections I'll push this soon after
v13 opens.

regards, tom lane

Dead encoding conversion functions

Attachments: