Getting rid of regression test input/ and output/ files

Started by Tom Laneover 4 years ago11 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

I did some desultory investigation of the idea proposed at [1]/messages/by-id/20211217182518.GA2529654@rfd.leadboat.com
that we should refactor the regression test scripts to try to
reduce their interdependencies. I soon realized that one of
the stumbling blocks to this is that we've tended to concentrate
data-loading COPY commands, as well as C function creation
commands, into a few files to reduce the notational cruft of
substituting path names and the like into the test scripts.
That is, we don't want to have even more scripts that have to be
translated from input/foo.source and output/foo.source into
runnable scripts and test results.

This led me to wonder why we couldn't get rid of that entire
mechanism in favor of some less-painful way of getting that
information into the scripts. If we had the desired values in
psql variables, we could do what we need easily, for example
instead of

CREATE FUNCTION check_primary_key ()
RETURNS trigger
AS '@libdir@/refint@DLSUFFIX@'
LANGUAGE C;

something like

CREATE FUNCTION check_primary_key ()
RETURNS trigger
AS :'LIBDIR'
'/refint'
:'DLSUFFIX'
LANGUAGE C;

(The extra line breaks are needed to convince SQL that the
adjacent string literals should be concatenated. We couldn't
have done this so easily before psql had the :'variable'
notation, but that came in in 9.0.)

I see two ways we could get the info from pg_regress into psql
variables:

1. Add "-v VARIABLE=VALUE" switches to the psql invocations.
This requires no new psql capability, but it does introduce
the problem of getting correct shell quoting of the values.
I think we'd need to either duplicate appendShellString in
pg_regress.c, or start linking both libpq and libpgfeutils.a
into pg_regress to be able to use appendShellString itself.
In the past we've not wanted to link libpq into pg_regress
(though I admit I've forgotten the argument for not doing so).

2. Export the values from pg_regress as environment variables,
and then add a way for the test scripts to read those variables.
I was a bit surprised to realize that we didn't have any way
to do that already --- psql has \setenv, so why did we never
invent \getenv?

On the whole I prefer #2, as it seems cleaner and it adds some
actually useful-to-end-users psql functionality.

Attached is a really incomplete, quick-n-dirty POC showing that
this can be made to work. If there aren't objections or better
ideas, I'll see about fleshing this out.

regards, tom lane

[1]: /messages/by-id/20211217182518.GA2529654@rfd.leadboat.com

Attachments:

pass-paths-to-tests-as-envvars-wip.patchtext/x-diff; charset=us-ascii; name=pass-paths-to-tests-as-envvars-wip.patchDownload+75-6
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: Getting rid of regression test input/ and output/ files

I wrote:

This led me to wonder why we couldn't get rid of that entire
mechanism in favor of some less-painful way of getting that
information into the scripts. If we had the desired values in
psql variables, we could do what we need easily, for example ...

Here's some fleshed-out patches for this.

0001 adds the \getenv command to psql; now with documentation
and a simple regression test.

0002 tweaks pg_regress to export the needed values as environment
variables, and modifies the test scripts to use those variables.
(For ease of review, this patch modifies the scripts in-place,
and then 0003 will move them.) A few comments on this:

* I didn't see any value in exporting @testtablespace@ as a separate
variable; we might as well let the test script know how to construct
that path name.

* I concluded that the right way to handle the concatenation issue
is *not* to rely on SQL literal concatenation, but to use psql's
\set command to concatenate parts of a string. In particular this
gives us a clean way to handle quoting/escaping rules in the places
where a pathname has to be embedded in some larger string, such as
a function body. The golden rule for that seems to be "use one \set
per level of quoting". I believe this code is now fairly proof
against situations that would completely break the existing way of
doing things, such as pathnames with quotes or backslashes in them.
(It's hard to test the embedded-quote case, because that breaks the
Makefiles too, but I did get through the regression tests with a
path including a backslash.)

* There are a couple of places where the existing tests involve
substituting a path name into expected query output or error messages.
This technique cannot handle that, but we have plenty of prior art for
dealing with such cases. I changed file_fdw to use a filter function
to hide the pathnames in EXPLAIN output, and tweaked create_function_0
to show only an edited version of an error message (this is based on a
similar case in infinite_recurse.sql).

0003 simply "git mv"'s the scripts and output files into place as
normal not-requiring-editing files. Be careful to "make clean"
before applying this, else you may have conflicts with the target
files already being present. Also, while you can run the tests
between 0003 and 0004, don't do "make clean" in this state or the
hacky EXTRA_CLEAN rules in dblink and file_fdw will remove files
you want.

0004 finally removes the no-longer-needed infrastructure in
pg_regress and the makefiles. (BTW, as far as I can find, the
MSVC scripts have no provisions for cleaning these generated files?)

There's some refactoring that could be done afterwards, for example
there seems little reason for dblink's paths.sql to continue to exist
as a separate script. But it seemed best for this patch series to
convert the scripts as mechanically as possible.

I'm fairly pleased with how this came out. I think these scripts
will be *much* easier to maintain in this form. Updating the
output/*.source files was always a major pain in the rear, since
you couldn't just copy results/ files to them.

Comments?

regards, tom lane

Attachments:

v1-0001-add-getenv-command.patchtext/x-diff; charset=us-ascii; name=v1-0001-add-getenv-command.patchDownload+87-0
v1-0002-use-env-vars-in-tests.patchtext/x-diff; charset=us-ascii; name=v1-0002-use-env-vars-in-tests.patchDownload+477-261
v1-0003-move-scripts-to-normal-dirs.patchtext/x-diff; charset=us-ascii; name=v1-0003-move-scripts-to-normal-dirs.patchDownload
v1-0004-remove-useless-infrastructure.patchtext/x-diff; charset=us-ascii; name=v1-0004-remove-useless-infrastructure.patchDownload+8-197
#3Corey Huinker
corey.huinker@gmail.com
In reply to: Tom Lane (#2)
Re: Getting rid of regression test input/ and output/ files

0001 adds the \getenv command to psql; now with documentation
and a simple regression test.

+1. Wish I had added this years ago when I had a need for it.

0002 tweaks pg_regress to export the needed values as environment
variables, and modifies the test scripts to use those variables.
(For ease of review, this patch modifies the scripts in-place,
and then 0003 will move them.) A few comments on this:

* I didn't see any value in exporting @testtablespace@ as a separate
variable; we might as well let the test script know how to construct
that path name.

* I concluded that the right way to handle the concatenation issue
is *not* to rely on SQL literal concatenation, but to use psql's
\set command to concatenate parts of a string. In particular this

+1 to that, much better than the multi-line thing.

I have a nitpick about the \getenv FOO FOO lines.
It's a new function to everyone, and to anyone who hasn't seen the
documentation it won't be immediately obvious which one is the ENV var and
which one is the local var. Lowercasing the local var would be a way to
reinforce which is which to the reader. It would also be consistent with
var naming in the rest of the script.

0004 finally removes the no-longer-needed infrastructure in

+1
Deleted code is debugged code.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Corey Huinker (#3)
Re: Getting rid of regression test input/ and output/ files

Corey Huinker <corey.huinker@gmail.com> writes:

I have a nitpick about the \getenv FOO FOO lines.
It's a new function to everyone, and to anyone who hasn't seen the
documentation it won't be immediately obvious which one is the ENV var and
which one is the local var. Lowercasing the local var would be a way to
reinforce which is which to the reader. It would also be consistent with
var naming in the rest of the script.

Reasonable idea. Another thing I was wondering about was whether
to attach PG_ prefixes to the environment variable names, since
those are in a more-or-less global namespace. If we do that,
then a different method for distinguishing the psql variables
is to not prefix them.

regards, tom lane

#5Corey Huinker
corey.huinker@gmail.com
In reply to: Tom Lane (#4)
Re: Getting rid of regression test input/ and output/ files

On Sun, Dec 19, 2021 at 5:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Corey Huinker <corey.huinker@gmail.com> writes:

I have a nitpick about the \getenv FOO FOO lines.
It's a new function to everyone, and to anyone who hasn't seen the
documentation it won't be immediately obvious which one is the ENV var

and

which one is the local var. Lowercasing the local var would be a way to
reinforce which is which to the reader. It would also be consistent with
var naming in the rest of the script.

Reasonable idea. Another thing I was wondering about was whether
to attach PG_ prefixes to the environment variable names, since
those are in a more-or-less global namespace. If we do that,
then a different method for distinguishing the psql variables
is to not prefix them.

+1 to that as well.

Which brings up a tangential question, is there value in having something
that brings in one or more env vars as psql vars directly. I'm thinking
something like:

\importenv pattern [prefix]

(alternate names: \getenv_multi \getenv_pattern, \getenvs, etc)

which could be used like

\importenv PG* env_

which would import PGFOO and PGBAR as env_PGFOO and env_PGBAR, awkward
names but leaving no doubt about where a previously unreferenced variable
came from.

I don't *think* we need it for this specific case, but since the subject of
env vars has come up I thought I'd throw it out there.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Corey Huinker (#5)
Re: Getting rid of regression test input/ and output/ files

Corey Huinker <corey.huinker@gmail.com> writes:

Which brings up a tangential question, is there value in having something
that brings in one or more env vars as psql vars directly. I'm thinking
something like:

\importenv pattern [prefix]

Meh ... considering we've gone this long without any getenv capability
in psql at all, that seems pretty premature.

regards, tom lane

#7Corey Huinker
corey.huinker@gmail.com
In reply to: Tom Lane (#6)
Re: Getting rid of regression test input/ and output/ files

On Sun, Dec 19, 2021 at 7:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Corey Huinker <corey.huinker@gmail.com> writes:

Which brings up a tangential question, is there value in having something
that brings in one or more env vars as psql vars directly. I'm thinking
something like:

\importenv pattern [prefix]

Meh ... considering we've gone this long without any getenv capability
in psql at all, that seems pretty premature.

regards, tom lane

Fair enough.

Patches didn't apply with `git apply` but did fine with `patch -p1`, from
there it passes make check-world.

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#1)
Re: Getting rid of regression test input/ and output/ files

On 19.12.21 00:53, Tom Lane wrote:

2. Export the values from pg_regress as environment variables,
and then add a way for the test scripts to read those variables.
I was a bit surprised to realize that we didn't have any way
to do that already --- psql has \setenv, so why did we never
invent \getenv?

You can do

\set foo `echo $ENVVAR`

but that's probably not portable enough for your purpose.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#8)
Re: Getting rid of regression test input/ and output/ files

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 19.12.21 00:53, Tom Lane wrote:

I was a bit surprised to realize that we didn't have any way
to do that already --- psql has \setenv, so why did we never
invent \getenv?

You can do
\set foo `echo $ENVVAR`
but that's probably not portable enough for your purpose.

I suppose that wouldn't work on Windows, so no.

regards, tom lane

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: Getting rid of regression test input/ and output/ files

On 12/18/21 18:53, Tom Lane wrote:

2. Export the values from pg_regress as environment variables,
and then add a way for the test scripts to read those variables.
I was a bit surprised to realize that we didn't have any way
to do that already --- psql has \setenv, so why did we never
invent \getenv?

I don't recall anyone expressing a need for it at the time we added
\setenv.

+1 for adding it now.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#11Bruce Momjian
bruce@momjian.us
In reply to: Corey Huinker (#5)
Re: Getting rid of regression test input/ and output/ files

On Sun, 19 Dec 2021 at 18:41, Corey Huinker <corey.huinker@gmail.com> wrote:

Which brings up a tangential question, is there value in having something that brings in one or more env vars as psql vars directly. I'm thinking something like:

\importenv pattern [prefix]

Oof. That gives me the security heebie jeebies. Off the top of my head
PHP, CGI, SSH have all dealt with vulnerabilities caused by
accidentally importing variables they didn't intend to.

--
greg