Unicode normalization test broken output

Started by Peter Eisentraut over 6 years ago, 5 messages, list: hackers
#1 Peter Eisentraut
peter_e@gmx.net

I was playing with the Unicode normalization test in
src/common/unicode/. I think there is something wrong with how the test
program reports failures. For example, if I manually edit the
norm_test_table.h to make a failure, like

-    { 74, { 0x00A8, 0 }, { 0x0020, 0x0308, 0 } },
+    { 74, { 0x00A8, 0 }, { 0x0020, 0x0309, 0 } },

then the output from the test is

FAILURE (NormalizationTest.txt line 74):
input: 00
expected: 0003
got 0003

which doesn't make sense.

There appear to be several off-by-more-than-one errors in norm_test.c
print_wchar_str(). Attached is a patch to fix this (and make the output
a bit prettier). Result afterwards:

FAILURE (NormalizationTest.txt line 74):
input: U+00A8
expected: U+0020 U+0309
got: U+0020 U+0308

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Fix-output-of-Unicode-normalization-test.patch (text/plain; charset=UTF-8), +6 -7
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: Unicode normalization test broken output

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

> There appear to be several off-by-more-than-one errors in norm_test.c
> print_wchar_str(). Attached is a patch to fix this (and make the output
> a bit prettier). Result afterwards:

I concur that this looks broken and your patch improves it.
But I'm not very happy about the remaining assumption that
we don't have to worry about characters above U+FFFF. I'd
rather see it allocate 11 bytes per allowed pg_wchar, and
manage the string contents with something like

p += sprintf(p, "U+%04X ", *s);

An alternative fix would be to start using a PQExpBuffer, but
it's probably not quite worth that.

regards, tom lane
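
The buffer sizing suggested above can be sketched as follows. This is only an illustration under stated assumptions, not the committed patch: the type name `pg_wchar_sketch`, the function name `format_wchar_str`, and the limit `MAX_CODEPOINTS` are hypothetical stand-ins. The 11 bytes per code point come from "U+" (2 bytes), up to 8 hex digits for a 32-bit value, and one separating space.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for pg_wchar (assumed here to be a 32-bit unsigned). */
typedef uint32_t pg_wchar_sketch;

/* Hypothetical cap on the number of code points formatted per string. */
#define MAX_CODEPOINTS 64

/* "U+" (2) + up to 8 hex digits + 1 separating space = 11 bytes. */
#define BYTES_PER_CODEPOINT 11

/*
 * Format a zero-terminated code point array as "U+XXXX U+XXXX ...".
 * Returns a pointer to a static buffer that is overwritten on each call.
 */
static char *
format_wchar_str(const pg_wchar_sketch *s)
{
    static char buf[MAX_CODEPOINTS * BYTES_PER_CODEPOINT + 1];
    char       *p = buf;
    int         n = 0;

    *p = '\0';
    while (*s != 0 && n < MAX_CODEPOINTS)
    {
        /* %04X pads to at least 4 digits but widens for values above U+FFFF */
        p += sprintf(p, "U+%04X ", (unsigned int) *s);
        s++;
        n++;
    }

    /* Drop the trailing space, if anything was written. */
    if (p > buf)
        p[-1] = '\0';

    return buf;
}
```

With this sketch, the array `{ 0x0020, 0x0309, 0 }` from the edited test table would format as `U+0020 U+0309`, and a code point above U+FFFF such as 0x1F600 would format as `U+1F600` without overrunning the buffer.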

#3 Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#2)
Re: Unicode normalization test broken output

On 2019-12-09 23:22, Tom Lane wrote:

> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>
>> There appear to be several off-by-more-than-one errors in norm_test.c
>> print_wchar_str(). Attached is a patch to fix this (and make the output
>> a bit prettier). Result afterwards:
>
> I concur that this looks broken and your patch improves it.
> But I'm not very happy about the remaining assumption that
> we don't have to worry about characters above U+FFFF. I'd
> rather see it allocate 11 bytes per allowed pg_wchar, and
> manage the string contents with something like
>
> p += sprintf(p, "U+%04X ", *s);

Good point. Fixed in attached patch.


Attachments:

v2-0001-Fix-output-of-Unicode-normalization-test.patch (text/plain; charset=UTF-8), +9 -7
#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#3)
Re: Unicode normalization test broken output

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

> Good point. Fixed in attached patch.

This one LGTM.

regards, tom lane

#5 Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#4)
Re: Unicode normalization test broken output

On 2019-12-10 17:16, Tom Lane wrote:

> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>
>> Good point. Fixed in attached patch.
>
> This one LGTM.

done, thanks
