how to run encoding-dependent tests by default
There is a fair amount of collation-related functionality that is only
being tested by sql/collate.icu.utf8.sql and sql/collate.linux.utf8.sql,
which are not run by default. There is more functionality planned in
this area, so making the testing more straightforward would be useful.
The reason these tests cannot be run by default (other than that they
don't apply to each build, which is easy to figure out) is that
a) They contain UTF8 non-ASCII characters that might not convert to
every server-side encoding, and
b) The error messages mention the encoding name ('ERROR: collation
"foo" for encoding "UTF8" does not exist')
The server encoding can be set more-or-less arbitrarily for each test
run, and moreover it is computed from the locale, so it's not easy to
determine ahead of time from a makefile, say.
What would be a good way to sort this out? None of these problems are
terribly difficult on their own, but I'm struggling to come up with a
coherent solution.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> There is a fair amount of collation-related functionality that is only
> being tested by sql/collate.icu.utf8.sql and sql/collate.linux.utf8.sql,
> which are not run by default. There is more functionality planned in
> this area, so making the testing more straightforward would be useful.
> The reason these tests cannot be run by default (other than that they
> don't apply to each build, which is easy to figure out) is that
> a) They contain UTF8 non-ASCII characters that might not convert to
> every server-side encoding, and
> b) The error messages mention the encoding name ('ERROR: collation
> "foo" for encoding "UTF8" does not exist')
> The server encoding can be set more-or-less arbitrarily for each test
> run, and moreover it is computed from the locale, so it's not easy to
> determine ahead of time from a makefile, say.
> What would be a good way to sort this out? None of these problems are
> terribly difficult on their own, but I'm struggling to come up with a
> coherent solution.
Perhaps set up a separate test run (not part of the core tests) in which
the database is forced to have UTF8 encoding? That could be expanded
to other encodings too if anyone cares.
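For instance, pg_regress can already force the database encoding via make variables, so a run along these lines (the exact variable spellings follow the regression-test documentation; the locale must be one installed on the machine) would do it:

```shell
# Sketch: run the optional collation tests against a forced-UTF8 database
make check EXTRA_TESTS='collate.icu.utf8 collate.linux.utf8' LANG=en_US.utf8
```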
regards, tom lane
On 6/17/19 11:32 AM, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> There is a fair amount of collation-related functionality that is only
>> being tested by sql/collate.icu.utf8.sql and sql/collate.linux.utf8.sql,
>> which are not run by default. There is more functionality planned in
>> this area, so making the testing more straightforward would be useful.
>> The reason these tests cannot be run by default (other than that they
>> don't apply to each build, which is easy to figure out) is that
>> a) They contain UTF8 non-ASCII characters that might not convert to
>> every server-side encoding, and
>> b) The error messages mention the encoding name ('ERROR: collation
>> "foo" for encoding "UTF8" does not exist')
>> The server encoding can be set more-or-less arbitrarily for each test
>> run, and moreover it is computed from the locale, so it's not easy to
>> determine ahead of time from a makefile, say.
>> What would be a good way to sort this out? None of these problems are
>> terribly difficult on their own, but I'm struggling to come up with a
>> coherent solution.
> Perhaps set up a separate test run (not part of the core tests) in which
> the database is forced to have UTF8 encoding? That could be expanded
> to other encodings too if anyone cares.
I should point out that the buildfarm does run these tests for every
utf8 locale it's configured for if the TestICU module is enabled. At the
moment the only animal actually running those tests is prion, for
en_US.utf8.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2019-06-17 16:56:00 +0200, Peter Eisentraut wrote:
> There is a fair amount of collation-related functionality that is only
> being tested by sql/collate.icu.utf8.sql and sql/collate.linux.utf8.sql,
> which are not run by default. There is more functionality planned in
> this area, so making the testing more straightforward would be useful.
> The reason these tests cannot be run by default (other than that they
> don't apply to each build, which is easy to figure out) is that
> a) They contain UTF8 non-ASCII characters that might not convert to
> every server-side encoding, and
> b) The error messages mention the encoding name ('ERROR: collation
> "foo" for encoding "UTF8" does not exist')
> The server encoding can be set more-or-less arbitrarily for each test
> run, and moreover it is computed from the locale, so it's not easy to
> determine ahead of time from a makefile, say.
> What would be a good way to sort this out? None of these problems are
> terribly difficult on their own, but I'm struggling to come up with a
> coherent solution.
I wonder if using alternative output files and psql's \if could be good
enough here. It's not that hard to maintain an alternative output file
if it's nearly empty.
Basically something like:
SELECT my_encodings_are_compatible() AS compatible \gset
\if :compatible
test;
contents;
\endif
That won't get rid of b) in its entirety, but even just running the test
automatically on platforms it works without problems would be an
improvement.
We probably also could just have a wrapper function in those tests that
catch the exception and print a more anodyne message.
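A sketch of such a wrapper (the function name and the substitute message are invented here, not taken from any patch): it traps undefined_object, the SQLSTATE behind the 'collation ... does not exist' error, and prints encoding-independent text instead.

```sql
-- Hypothetical wrapper: evaluate a value under a named collation, but
-- report a fixed message rather than the encoding-specific error text.
CREATE FUNCTION try_collate(val text, coll text) RETURNS text
LANGUAGE plpgsql AS $$
DECLARE
    result text;
BEGIN
    EXECUTE format('SELECT %L COLLATE %I', val, coll) INTO result;
    RETURN result;
EXCEPTION WHEN undefined_object THEN
    -- Don't echo the server encoding name into the expected output.
    RETURN '<collation not available>';
END;
$$;
```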
Greetings,
Andres Freund
On 2019-06-17 18:39, Andres Freund wrote:
> Basically something like:
> SELECT my_encodings_are_compatible() AS compatible \gset
> \if :compatible
> test;
> contents;
> \endif
Cool, that works out quite well. See attached patch. I flipped the
logic around to make it \quit if not compatible. That way the
alternative expected file is shorter and doesn't need to be updated all
the time. But it gets the job done either way.
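The flipped arrangement amounts to a short guard at the top of the test file, something like this sketch (getdatabaseencoding() is the stock function for reporting the server encoding; the exact check in the patch may differ):

```sql
SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset
\if :skip_test
\quit
\endif
```

The alternative expected file then only needs to cover the output up to the \quit.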
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Run-UTF8-requiring-collation-tests-by-default.patch (text/plain, +43/-9)
On 2019-06-23 21:44, Peter Eisentraut wrote:
> On 2019-06-17 18:39, Andres Freund wrote:
>> Basically something like:
>> SELECT my_encodings_are_compatible() AS compatible \gset
>> \if :compatible
>> test;
>> contents;
>> \endif
> Cool, that works out quite well. See attached patch. I flipped the
> logic around to make it \quit if not compatible. That way the
> alternative expected file is shorter and doesn't need to be updated all
> the time. But it gets the job done either way.
Small patch update: The collate.linux.utf8 test also needs to check in a
similar manner that all the locales it is using are installed. This
should get the cfbot run passing.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v2-0001-Run-UTF8-requiring-collation-tests-by-default.patch (text/plain, +49/-9)
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Cool, that works out quite well. See attached patch. I flipped the
> logic around to make it \quit if not compatible. That way the
> alternative expected file is shorter and doesn't need to be updated all
> the time. But it gets the job done either way.
I took a look at this and did some light testing. It seems to work
as advertised, but I do have one gripe, which is the dependency on
the EXTRA_TESTS mechanism. There are a few things not to like about
doing it that way:
* need additional hacking for Windows (admittedly, moot for
collate.linux.utf8, but I hope it's not for collate.icu.utf8);
* can't put these tests into a parallel group, they run by themselves;
* if user specifies EXTRA_TESTS on make command line, that overrides
the Makefile so these tests aren't run.
So I wish we could get rid of the Makefile changes, have the test
scripts be completely responsible for whether to run themselves or
not, and put them into the schedule files normally.
It's pretty obvious how we might do this for collate.icu.utf8:
make it look to see if there are any ICU-supplied collations in
pg_collation.
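A guard to that effect, assuming the pg_collation.collprovider convention ('i' marks ICU-supplied collations):

```sql
SELECT NOT EXISTS (SELECT 1 FROM pg_collation WHERE collprovider = 'i')
       AS skip_test \gset
\if :skip_test
\quit
\endif
```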
I'm less clear on a reasonable way to detect a glibc platform
from SQL. The best I can think of is to see if the string
"linux" appears in the output of version(), and that's probably
none too robust. Can we do anything based on the content of
pg_collation? Probably not :-(.
Still, even if you only fixed collate.icu.utf8 this way, that
would be a step forward since it would solve the Windows aspect.
regards, tom lane
I wrote:
> I'm less clear on a reasonable way to detect a glibc platform
> from SQL. The best I can think of is to see if the string
> "linux" appears in the output of version(), and that's probably
> none too robust. Can we do anything based on the content of
> pg_collation? Probably not :-(.
Actually, scraping the buildfarm database suggests that checking
version() for "linux" or even "linux-gnu" would work very well.
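So the glibc check could shrink to a one-line regex on version(), matching the host triple embedded in its output, e.g.:

```sql
SELECT version() !~ 'linux-gnu' AS skip_test \gset
\if :skip_test
\quit
\endif
```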
regards, tom lane
Attachments:
current-buildfarm-version-strings.txt (text/plain)
Oh ... one other thought, based on forcing the collate.linux.utf8
test to run on platforms where it can be expected to fail: I think
you'd be well advised to make that test verify that the required
collations are present, the same as you did in the collate.icu.utf8
test. I noticed for instance that it fails if en_US.utf8 is not
present (or not spelled exactly like that), but I doubt that that
locale is necessarily present on every Linux platform. tr_TR is
even more likely to be subject to packagers' whims.
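A check of that shape, assuming the test depends on libc collations named en_US and tr_TR having been imported into pg_collation (the names and the count here are illustrative, not from the patch):

```sql
SELECT (SELECT count(*) FROM pg_collation
        WHERE collname IN ('en_US', 'tr_TR')) <> 2 AS skip_test \gset
\if :skip_test
\quit
\endif
```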
regards, tom lane
On 2019-07-28 20:12, Tom Lane wrote:
> So I wish we could get rid of the Makefile changes, have the test
> scripts be completely responsible for whether to run themselves or
> not, and put them into the schedule files normally.
> It's pretty obvious how we might do this for collate.icu.utf8:
> make it look to see if there are any ICU-supplied collations in
> pg_collation.
> I'm less clear on a reasonable way to detect a glibc platform
> from SQL. The best I can think of is to see if the string
> "linux" appears in the output of version(), and that's probably
> none too robust. Can we do anything based on the content of
> pg_collation? Probably not :-(.
> Still, even if you only fixed collate.icu.utf8 this way, that
> would be a step forward since it would solve the Windows aspect.
Good points. Updated patch attached.
(The two tests create the same schema name, so they cannot be run in
parallel. I opted against changing that here, since it would blow up
the patch and increase the diff between the two tests.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v3-0001-Run-UTF8-requiring-collation-tests-by-default.patch (text/plain, +55/-11)
On 2019-07-28 21:42, Tom Lane wrote:
> Oh ... one other thought, based on forcing the collate.linux.utf8
> test to run on platforms where it can be expected to fail: I think
> you'd be well advised to make that test verify that the required
> collations are present, the same as you did in the collate.icu.utf8
> test. I noticed for instance that it fails if en_US.utf8 is not
> present (or not spelled exactly like that), but I doubt that that
> locale is necessarily present on every Linux platform. tr_TR is
> even more likely to be subject to packagers' whims.
This was already done in my v2 test posted in this thread.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-07-28 20:12, Tom Lane wrote:
>> So I wish we could get rid of the Makefile changes, have the test
>> scripts be completely responsible for whether to run themselves or
>> not, and put them into the schedule files normally.
> Good points. Updated patch attached.
v3 looks good and passes local testing. I've marked it RFC.
> (The two tests create the same schema name, so they cannot be run in
> parallel. I opted against changing that here, since it would blow up
> the patch and increase the diff between the two tests.)
This does create one tiny nit, which is that the order of the
parallel and serial schedule files doesn't match. Possibly I'm
overly anal-retentive about that, but I think it's confusing
when they don't.
regards, tom lane
On 2019-07-29 16:47, Tom Lane wrote:
>> (The two tests create the same schema name, so they cannot be run in
>> parallel. I opted against changing that here, since it would blow up
>> the patch and increase the diff between the two tests.)
> This does create one tiny nit, which is that the order of the
> parallel and serial schedule files doesn't match. Possibly I'm
> overly anal-retentive about that, but I think it's confusing
> when they don't.
Right. Committed with adjustment to keep these consistent.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services