A test for replay of regression tests
Hi,
We have automated tests for many specific replication and recovery
scenarios, but nothing that tests replay of a wide range of records.
People working on recovery code often use installcheck (presumably
along with other custom tests) to exercise it, sometimes with
wal_consistency_checking enabled. So, why don't we automate that? Aside
from exercising the WAL decoding machinery (which brought me here),
that'd hopefully provide some decent improvements in coverage of the
various redo routines, many of which are not currently exercised at
all.
I'm not quite sure where it belongs, though. The attached initial
sketch patch puts it under src/test/recovery near other similar things,
but I'm not sure if it's really OK to invoke make -C ../regress from
here. I copied pg_upgrade/test.sh's trick of using a different
outputdir to avoid clobbering a concurrent run under src/test/regress,
and I also needed to invent a way to stop it from running the cursed
tablespace test (deferring startup of the standby also works but eats
way too much space, which I learned after blowing out a smallish
development VM's disk). Alternatively, we could put it under
src/test/regress, which makes some kind of logical sense, but it would
make a quick "make check" take more than twice as long. Perhaps it
could use a different target, one that check-world somehow reaches but
not check, and that also doesn't seem great. Ideas on how to
structure this or improve the Perl are welcome.
Attachments:
0001-Test-replay-of-regression-tests.patch (text/x-patch, +97/-1)
From: Thomas Munro <thomas.munro@gmail.com>
We have automated tests for many specific replication and recovery scenarios,
but nothing that tests replay of a wide range of records.
People working on recovery code often use installcheck (presumably along
with other custom tests) to exercise it, sometimes with
wal_consistency_checking enabled. So, why don't we automate that? Aside
from exercising the WAL decoding machinery (which brought me here), that'd
hopefully provide some decent improvements in coverage of the various redo
routines, many of which are not currently exercised at all.
I'm not quite sure where it belongs, though. The attached initial sketch patch
I think that's a good attempt. It'd be better to confirm that the database contents are identical on the primary and standby. But... I remember that when I ran make installcheck with a standby connected, then ran pg_dumpall on both the primary and standby and compared the two output files, about 40 lines of difference were observed. Those differences were all about sequence values. I didn't think about whether I should/can remove the differences.
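Filtering those sequence-value lines before diffing would make such a comparison stable. A minimal shell sketch (filter_dump is a hypothetical helper; I'm assuming the differences show up as pg_catalog.setval statements in the dump output):

```shell
# Hypothetical helper: strip sequence-value lines from pg_dumpall output
# so that a primary/standby comparison ignores them.  Assumes the noisy
# lines are "SELECT pg_catalog.setval(...)" statements.
filter_dump() {
    grep -v "SELECT pg_catalog.setval" "$1"
}

# Usage sketch (ports and file names are placeholders; <(...) needs bash):
#   pg_dumpall -p 5432 > primary.out
#   pg_dumpall -p 5433 > standby.out
#   diff <(filter_dump primary.out) <(filter_dump standby.out)
```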
+# XXX The tablespace tests don't currently work when the standby shares a
+# filesystem with the primary due to colliding absolute paths. We'll skip
+# that for now.
Maybe we had better have a hidden feature that creates tablespace contents in
/tablespace_location/PG_..._<some_name>/
<some_name> is the value of cluster_name or application_name.
Or, we may provide a visible feature that allows users to store tablespace contents in a specified directory regardless of the LOCATION value in CREATE TABLESPACE. For instance, add a GUC like
table_space_directory = '/some_dir'
Then, the tablespace contents are stored in /some_dir/<tablespace_name>/. This may be useful if a DBaaS provider wants to offer some tablespace-based feature, say tablespace transparent data encryption, but doesn't want users to specify filesystem directories.
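As a postgresql.conf sketch of that proposal (table_space_directory is the hypothetical GUC suggested above, not an existing PostgreSQL setting):

```
# Hypothetical GUC from this proposal -- not an existing setting.
# All tablespace contents would land under this directory,
# regardless of the LOCATION given in CREATE TABLESPACE.
table_space_directory = '/some_dir'
# CREATE TABLESPACE ts1 LOCATION '/ignored' would then store its
# files under /some_dir/ts1/.
```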
Regards
Takayuki Tsunakawa
On Fri, Apr 23, 2021 at 6:27 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
From: Thomas Munro <thomas.munro@gmail.com>
I'm not quite sure where it belongs, though. The attached initial sketch patch
I think that's a good attempt. It'd be better to confirm that the database contents are identical on the primary and standby. But... I remember that when I ran make installcheck with a standby connected, then ran pg_dumpall on both the primary and standby and compared the two output files, about 40 lines of difference were observed. Those differences were all about sequence values. I didn't think about whether I should/can remove the differences.
Interesting idea. I hadn't noticed the thing with sequences before.
+# XXX The tablespace tests don't currently work when the standby shares a
+# filesystem with the primary due to colliding absolute paths. We'll skip
+# that for now.
Maybe we had better have a hidden feature that creates tablespace contents in
/tablespace_location/PG_..._<some_name>/
<some_name> is the value of cluster_name or application_name.
Or, we may provide a visible feature that allows users to store tablespace contents in a specified directory regardless of the LOCATION value in CREATE TABLESPACE. For instance, add a GUC like
table_space_directory = '/some_dir'
Then, the tablespace contents are stored in /some_dir/<tablespace_name>/. This may be useful if a DBaaS provider wants to offer some tablespace-based feature, say tablespace transparent data encryption, but doesn't want users to specify filesystem directories.
Yeah, a few similar ideas have been put forward before, for example in
this thread:
/messages/by-id/CALfoeisEF92F5nJ-aAcuWTvF_Aogxq_1bHLem_kVfM_tHc2mfg@mail.gmail.com
... but also others. I would definitely like to fix that problem too
(having worked on several things that interact with tablespaces, I
definitely want to be able to extend those tests in future patches,
and get coverage on the build farm and CI), but with --skip-tests that
could be done independently.
Apparently the invocation of make failed for some reason on CI (not
sure why, works for me), but in any case, that feels a bit clunky and
might not ever work on Windows, so perhaps we should figure out how to
invoke the pg_regress[.exe] program directly.
Hi,
On 2021-04-23 17:37:48 +1200, Thomas Munro wrote:
We have automated tests for many specific replication and recovery
scenarios, but nothing that tests replay of a wide range of records.
People working on recovery code often use installcheck (presumably
along with other custom tests) to exercise it, sometimes with
wal_consistency_checking enabled. So, why don't we automate that? Aside
from exercising the WAL decoding machinery (which brought me here),
that'd hopefully provide some decent improvements in coverage of the
various redo routines, many of which are not currently exercised at
all.
Yay.
I'm not quite sure where it belongs, though. The attached initial
sketch patch puts it under src/test/recovery near other similar things,
but I'm not sure if it's really OK to invoke make -C ../regress from
here.
I'd say it's not ok, and we should just invoke pg_regress without make.
Add a new TAP test under src/test/recovery that runs the regression
tests with wal_consistency_checking=all.
Hm. I wonder if running with wal_consistency_checking=all doesn't also
reduce coverage of some aspects of recovery, by increasing record sizes
etc.
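For reference, wal_consistency_checking accepts either 'all' or a comma-separated list of resource manager names, so the record-size overhead could be confined to particular rmgrs if that turns out to matter:

```
# postgresql.conf sketch: full checking (expensive) ...
wal_consistency_checking = 'all'
# ... or restricted to particular resource managers, keeping record
# sizes closer to normal for everything else:
#wal_consistency_checking = 'heap,heap2,btree'
```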
I copied pg_upgrade/test.sh's trick of using a different
outputdir to avoid clobbering a concurrent run under src/test/regress,
and I also needed to invent a way to stop it from running the cursed
tablespace test (deferring startup of the standby also works but eats
way too much space, which I learned after blowing out a smallish
development VM's disk).
That's because you are using wal_consistency_checking=all, right?
Because IIRC we don't generate that much WAL otherwise?
+# Create some content on primary and check its presence in standby 1
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
+
+# Wait for standby to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+my $result =
+	$node_standby_1->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+print "standby 1: $result\n";
+is($result, qq(1002), 'check streamed content on standby 1');
Why is this needed?
Greetings,
Andres Freund
On 4/23/21 1:37 AM, Thomas Munro wrote:
Hi,
We have automated tests for many specific replication and recovery
scenarios, but nothing that tests replay of a wide range of records.
People working on recovery code often use installcheck (presumably
along with other custom tests) to exercise it, sometimes with
wal_consistency_checking enabled. So, why don't we automate that? Aside
from exercising the WAL decoding machinery (which brought me here),
that'd hopefully provide some decent improvements in coverage of the
various redo routines, many of which are not currently exercised at
all.
I'm not quite sure where it belongs, though. The attached initial
sketch patch puts it under src/test/recovery near other similar things,
but I'm not sure if it's really OK to invoke make -C ../regress from
here. I copied pg_upgrade/test.sh's trick of using a different
outputdir to avoid clobbering a concurrent run under src/test/regress,
and I also needed to invent a way to stop it from running the cursed
tablespace test (deferring startup of the standby also works but eats
way too much space, which I learned after blowing out a smallish
development VM's disk). Alternatively, we could put it under
src/test/regress, which makes some kind of logical sense, but it would
make a quick "make check" take more than twice as long. Perhaps it
could use a different target, one that check-world somehow reaches but
not check, and that also doesn't seem great. Ideas on how to
structure this or improve the Perl are welcome.
Nice, I like adding a skip-tests option to pg_regress.
The perl looks ok - I assume those
print "standby 1: $result\n";
lines are there for debugging. Normally you would just process $result
using the Test::More functions.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andres Freund <andres@anarazel.de> writes:
On 2021-04-23 17:37:48 +1200, Thomas Munro wrote:
We have automated tests for many specific replication and recovery
scenarios, but nothing that tests replay of a wide range of records.
Yay.
+1
Add a new TAP test under src/test/recovery that runs the regression
tests with wal_consistency_checking=all.
Hm. I wonder if running with wal_consistency_checking=all doesn't also
reduce coverage of some aspects of recovery, by increasing record sizes
etc.
Yeah. I found out earlier that wal_consistency_checking=all is an
absolute PIG. Running the regression tests that way requires tens of
gigabytes of disk space, and a significant amount of time if your
disk is not very speedy. If we put this into the buildfarm at all,
it would have to be opt-in, not run-by-default, because a lot of BF
animals simply don't have the horsepower. I think I'd vote against
adding it to check-world, too; the cost/benefit ratio is not good
unless you are specifically working on replay functions.
And as you say, it alters the behavior enough to make it a little
questionable whether we're exercising the "normal" code paths.
The two things I'd say about this are (1) Whether to use
wal_consistency_checking, and with what value, needs to be
easily adjustable. (2) People will want to run other test suites
than the core regression tests, eg contrib modules.
regards, tom lane
Hi,
On 2021-04-23 11:53:35 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Hm. I wonder if running with wal_consistency_checking=all doesn't also
reduce coverage of some aspects of recovery, by increasing record sizes
etc.
Yeah. I found out earlier that wal_consistency_checking=all is an
absolute PIG. Running the regression tests that way requires tens of
gigabytes of disk space, and a significant amount of time if your
disk is not very speedy. If we put this into the buildfarm at all,
it would have to be opt-in, not run-by-default, because a lot of BF
animals simply don't have the horsepower. I think I'd vote against
adding it to check-world, too; the cost/benefit ratio is not good
unless you are specifically working on replay functions.
I think it'd be a huge improvement to test recovery during check-world
by default - it's a huge swath of crucial code that practically has no
test coverage. I agree that testing by default with
wal_consistency_checking=all isn't feasible due to the time & space
overhead, so I think we should not do that.
The two things I'd say about this are (1) Whether to use
wal_consistency_checking, and with what value, needs to be
easily adjustable. (2) People will want to run other test suites
than the core regression tests, eg contrib modules.
I'm not really excited about tackling 2) initially. I think it's a
significant issue that we don't have test coverage for most redo
routines and that we should change that with some urgency - even if we
back out these WAL prefetch related changes, there've been sufficiently
many changes affecting WAL that I think it's worth doing the minimal
thing soon.
I don't think there's actually that much need to test contrib modules
through recovery - most of them don't seem like they'd add meaningful
coverage? The exception is contrib/bloom, but perhaps that'd be better
tackled with a dedicated test?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-04-23 11:53:35 -0400, Tom Lane wrote:
Yeah. I found out earlier that wal_consistency_checking=all is an
absolute PIG. Running the regression tests that way requires tens of
gigabytes of disk space, and a significant amount of time if your
disk is not very speedy. If we put this into the buildfarm at all,
it would have to be opt-in, not run-by-default, because a lot of BF
animals simply don't have the horsepower. I think I'd vote against
adding it to check-world, too; the cost/benefit ratio is not good
unless you are specifically working on replay functions.
I think it'd be a huge improvement to test recovery during check-world
by default - it's a huge swath of crucial code that practically has no
test coverage. I agree that testing by default with
wal_consistency_checking=all isn't feasible due to the time & space
overhead, so I think we should not do that.
I was mainly objecting to enabling wal_consistency_checking by default.
I agree it's bad that we have so little routine test coverage on WAL
replay code.
The two things I'd say about this are (1) Whether to use
wal_consistency_checking, and with what value, needs to be
easily adjustable. (2) People will want to run other test suites
than the core regression tests, eg contrib modules.
I don't think there's actually that much need to test contrib modules
through recovery - most of them don't seem like they'd add meaningful
coverage? The exception is contrib/bloom, but perhaps that'd be better
tackled with a dedicated test?
contrib/bloom is indeed the only(?) case within contrib, but in my mind
that's a proxy for what people will be needing to test out-of-core AMs.
It might not be a test to run by default, but there needs to be a way
to do it.
Also, I suspect that there are bits of GIST/GIN/SPGIST that are not
well exercised if you don't run the contrib modules that add opclasses
for those.
regards, tom lane
Hi,
On 2021-04-23 13:13:15 -0400, Tom Lane wrote:
contrib/bloom is indeed the only(?) case within contrib, but in my mind
that's a proxy for what people will be needing to test out-of-core AMs.
It might not be a test to run by default, but there needs to be a way
to do it.
Hm. My experience in the past was that the best way to test an external
AM is to run the core regression tests with a different
default_table_access_method. That does require some work of ensuring the
AM is installed and the relevant extension created, which in turn
requires a different test schedule, or hacking template1. So likely a
different test script anyway?
Also, I suspect that there are bits of GIST/GIN/SPGIST that are not
well exercised if you don't run the contrib modules that add opclasses
for those.
Possible - still think it'd be best to get the minimal thing in asap,
and then try to extend further afterwards... Perfect being the enemy of
good and all that.
Greetings,
Andres Freund
Ok, here's a new version incorporating feedback so far.
1. Invoke pg_regress directly (no make).
2. Use PG_TEST_EXTRA="wal_consistency_checking" as a way to opt in to
the more expensive test.
3. Use parallel schedule rather than serial. It's faster but also
the non-determinism might discover more things. This required
changing the TAP test max_connections setting from 10 to 25.
4. Remove some extraneous print statements and
check-if-data-is-replicated-using-SELECT tests that are technically
not needed (I had copied those from 001_stream_rep.pl).
Attachments:
v2-0001-Test-replay-of-regression-tests.patch (text/x-patch, +101/-2)
вт, 8 июн. 2021 г. в 02:25, Thomas Munro <thomas.munro@gmail.com>:
Ok, here's a new version incorporating feedback so far.
1. Invoke pg_regress directly (no make).
2. Use PG_TEST_EXTRA="wal_consistency_checking" as a way to opt in to
the more expensive test.
3. Use parallel schedule rather than serial. It's faster but also
the non-determinism might discover more things. This required
changing the TAP test max_connections setting from 10 to 25.
4. Remove some extraneous print statements and
check-if-data-is-replicated-using-SELECT tests that are technically
not needed (I had copied those from 001_stream_rep.pl).
Thank you for working on this test set!
I was especially glad to see the skip-tests option for pg_regress. I think
it will become a very handy tool for hackers.
To try the patch I had to resolve a few merge conflicts, see a rebased
version in attachments.
auth_extra => [ '--create-role', 'repl_role' ]);
This line and the comment above it look like some copy-paste artifacts. Did
I get it right? If so, I suggest removing them.
Other than that, the patch looks good to me.
--
Best regards,
Lubennikova Anastasia
Attachments:
v3-0001-Test-replay-of-regression-tests.patch (text/x-patch, +101/-1)
вт, 8 июн. 2021 г. в 02:44, Anastasia Lubennikova <lubennikovaav@gmail.com>:
вт, 8 июн. 2021 г. в 02:25, Thomas Munro <thomas.munro@gmail.com>:
Ok, here's a new version incorporating feedback so far.
1. Invoke pg_regress directly (no make).
2. Use PG_TEST_EXTRA="wal_consistency_checking" as a way to opt in to
the more expensive test.
3. Use parallel schedule rather than serial. It's faster but also
the non-determinism might discover more things. This required
changing the TAP test max_connections setting from 10 to 25.
4. Remove some extraneous print statements and
check-if-data-is-replicated-using-SELECT tests that are technically
not needed (I had copied those from 001_stream_rep.pl).
Thank you for working on this test set!
I was especially glad to see the skip-tests option for pg_regress. I think
it will become a very handy tool for hackers.
To try the patch I had to resolve a few merge conflicts, see a rebased
version in attachments.
auth_extra => [ '--create-role', 'repl_role' ]);
This line and the comment above it look like some copy-paste artifacts.
Did I get it right? If so, I suggest removing them.
Other than that, the patch looks good to me.
For some reason, it failed on cfbot, so I'll switch it back to 'Waiting on
author'.
BTW, do I get it right, that cfbot CI will need some adjustments to print
regression.out for this test?
See one more revision of the patch attached. It contains the following
changes:
- rebase to recent main
- removed 'auth_extra' piece, that I mentioned above.
- added lacking make clean and gitignore changes.
--
Best regards,
Lubennikova Anastasia
Attachments:
v4-0001-Test-replay-of-regression-tests.patch (text/x-patch, +111/-2)
On Thu, Jun 10, 2021 at 7:37 PM Anastasia Lubennikova
<lubennikovaav@gmail.com> wrote:
вт, 8 июн. 2021 г. в 02:44, Anastasia Lubennikova <lubennikovaav@gmail.com>:
Thank you for working on this test set!
I was especially glad to see the skip-tests option for pg_regress. I think it will become a very handy tool for hackers.
To try the patch I had to resolve a few merge conflicts, see a rebased version in attachments.
auth_extra => [ '--create-role', 'repl_role' ]);
This line and the comment above it look like some copy-paste artifacts. Did I get it right? If so, I suggest removing them.
Other than that, the patch looks good to me.
For some reason, it failed on cfbot, so I'll switch it back to 'Waiting on author'.
BTW, do I get it right, that cfbot CI will need some adjustments to print regression.out for this test?
See one more revision of the patch attached. It contains the following changes:
- rebase to recent main
- removed 'auth_extra' piece, that I mentioned above.
- added lacking make clean and gitignore changes.
Thanks! Yeah, there does seem to be a mysterious CI failure there,
not reproducible locally for me. You're right that it's not dumping
enough information to diagnose the problem... I will look into it
tomorrow.
On Thu, Jun 10, 2021 at 7:47 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Jun 10, 2021 at 7:37 PM Anastasia Lubennikova
<lubennikovaav@gmail.com> wrote:
For some reason, it failed on cfbot, so I'll switch it back to 'Waiting on author'.
Sorry for the delay. I got stuck in a CI loop trying to make a new
improved version work on Windows, so far without success. If anyone
can tell me what's wrong on that OS I'd be very grateful to hear it.
I don't know if it's something I haven't understood about reparse
points, or something to do with "restricted tokens" and accounts
privileges. The symptoms on Cirrus are:
DROP TABLESPACE regress_tblspacewith;
+WARNING: could not open directory
"pg_tblspc/16386/PG_15_202109101": No such file or directory
+WARNING: could not stat file "pg_tblspc/16386": No such file or directory
The short explanation of this new version is that the tablespace tests
now work on a pair of local nodes because you can do this sort of
thing:
postgres=# create tablespace xxx location 'pg_user_files/xxx';
ERROR: directory "pg_user_files/xxx" does not exist
postgres=# create tablespace xxx new location 'pg_user_files/xxx';
CREATE TABLESPACE
Patches:
0001: Allow restricted relative tablespace paths.
Rationale: I really want to be able to run the tablespace test with a
local replica, instead of just skipping it (including but not only
from this new TAP test). After re-reading a bunch of threads about
how to achieve that, none of which went anywhere, and considering various
ideas already presented, I wondered if we could agree on allowing
relative paths under one specific directory "pg_user_files" (a
directory that PostgreSQL itself will completely ignore). Better
names welcome.
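The restriction could be sketched roughly like this (shell pseudocode of the idea only; the function name and exact rules are my assumptions, not the patch's C implementation):

```shell
# Sketch: accept only relative tablespace paths confined to
# pg_user_files/.  Rejects absolute paths, anything outside that
# directory, and any ".." component that could escape it.
is_allowed_tablespace_path() {
    case "$1" in
        /*)              return 1 ;;   # absolute paths not allowed
        pg_user_files/*) ;;            # must live under pg_user_files
        *)               return 1 ;;
    esac
    case "/$1/" in
        */../*) return 1 ;;            # no escaping via ".."
    esac
    return 0
}
```

For example, pg_user_files/xxx would be accepted, while /tmp/xxx, other/xxx, and pg_user_files/../etc would all be rejected.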
I wonder if for Windows we'd want to switch to real symlinks, since,
as far as I know from some light reading, reparse points are converted
to absolute paths on creation, so the pgdata directory would be less
portable than it would be on POSIX systems.
0002: CREATE TABLESPACE ... NEW LOCATION.
The new syntax "NEW" says that it's OK if the directory doesn't exist
yet, we'll just create it.
Rationale: With relative paths, it's tricky for pg_regress to find
the data directory of the primary server + any streaming replicas that
may be downstream from it (and possibly remote) to create the
directory, but the server can do it easily. Better syntax welcome.
(I originally wanted to use WITH (<something>) but that syntax is
tangled up with persistent relopts.)
0003: Use relative paths for tablespace regression test.
Remove the pg_regress logic for creating the directory, and switch to
relative paths using the above.
0004: Test replay of regression tests.
Same as before, this adds a replicated run of the regression tests in
src/test/recovery/t/027_stream_regress.pl, with an optional expensive
mode that you can enable with
PG_TEST_EXTRA="wal_consistency_checking".
I removed the useless --create-role as pointed out by Anastasia.
I added a step to compare the contents of the primary and replica
servers with pg_dump, as suggested by Tsunakawa-san.
I think the way I pass in the psql source directory to --bindir is not
good, but I've reached my daily limit of Perl; how should I be
specifying the tmp_install bin directory here? This is so pg_regress
can find psql.
system_or_bail("../regress/pg_regress",
"--bindir=../../bin/psql",
"--port=" . $node_primary->port,
"--schedule=../regress/parallel_schedule",
"--dlpath=../regress",
"--inputdir=../regress");
Attachments:
v5-0001-Allow-restricted-relative-tablespace-paths.patch (text/x-patch, +31/-4)
v5-0002-CREATE-TABLESPACE-.-NEW-LOCATION.patch (text/x-patch, +47/-19)
v5-0003-Use-relative-paths-for-tablespace-regression-test.patch (text/x-patch, +7/-51)
v5-0004-Test-replay-of-regression-tests.patch (text/x-patch, +80/-2)
On Wed, Oct 6, 2021 at 7:10 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I wonder if for Windows we'd want to switch to real symlinks, since,
as far as I know from some light reading, reparse points are converted
to absolute paths on creation, so the pgdata directory would be less
portable than it would be on POSIX systems.
I looked into that a bit, and it seems that the newer "real" symlinks
can't be created without admin privileges, unlike junction points, so
that wouldn't help. I still need to figure out what this
junction-based patch set is doing wrong on Windows, through trial-by-CI.
In the meantime, here is a rebase over recent changes to the Perl
testing APIs.
Attachments:
v6-0001-Allow-restricted-relative-tablespace-paths.patch (text/x-patch, +31/-4)
v6-0002-CREATE-TABLESPACE-.-NEW-LOCATION.patch (text/x-patch, +47/-19)
v6-0003-Use-relative-paths-for-tablespace-regression-test.patch (text/x-patch, +7/-51)
v6-0004-Test-replay-of-regression-tests.patch (text/x-patch, +80/-2)
On 11/23/21 04:07, Thomas Munro wrote:
On Wed, Oct 6, 2021 at 7:10 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I wonder if for Windows we'd want to switch to real symlinks, since,
as far as I know from some light reading, reparse points are converted
to absolute paths on creation, so the pgdata directory would be less
portable than it would be on POSIX systems.
I looked into that a bit, and it seems that the newer "real" symlinks
can't be created without admin privileges, unlike junction points, so
that wouldn't help. I still need to figure out what this
junction-based patch set is doing wrong on Windows, through trial-by-CI.
In the meantime, here is a rebase over recent changes to the Perl
testing APIs.
More exactly, you need the "Create Symbolic Links" privilege. See
<https://github.com/git-for-windows/git/wiki/Symbolic-Links>
I haven't yet worked out a way of granting that from the command line,
but if it's working the buildfarm client (as of git tip) will use it on
windows for the workdirs space saving feature.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 11/23/21 10:47, Andrew Dunstan wrote:
On 11/23/21 04:07, Thomas Munro wrote:
On Wed, Oct 6, 2021 at 7:10 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I wonder if for Windows we'd want to switch to real symlinks, since,
as far as I know from some light reading, reparse points are converted
to absolute paths on creation, so the pgdata directory would be less
portable than it would be on POSIX systems.
I looked into that a bit, and it seems that the newer "real" symlinks
can't be created without admin privileges, unlike junction points, so
that wouldn't help. I still need to figure out what this
junction-based patch set is doing wrong on Windows, through trial-by-CI.
In the meantime, here is a rebase over recent changes to the Perl
testing APIs.
More exactly, you need the "Create Symbolic Links" privilege. See
<https://github.com/git-for-windows/git/wiki/Symbolic-Links>
I haven't yet worked out a way of granting that from the command line,
but if it's working the buildfarm client (as of git tip) will use it on
windows for the workdirs space saving feature.
Update:
There is a PowerShell module called Carbon which provides this and a
whole lot more. It can be installed in numerous ways, including via
Chocolatey. Here's what I am using:
choco install -y Carbon
Import-Module Carbon
Grant-CPrivilege -Identity myuser -Privilege SeCreateSymbolicLinkPrivilege
See <https://get-carbon.org/Grant-Privilege.html> The command name I
used above is now the preferred spelling, although that's not reflected
on the manual page.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Tue, Nov 30, 2021 at 3:14 AM Andrew Dunstan <andrew@dunslane.net> wrote:
choco install -y Carbon
Import-Module Carbon
Grant-CPrivilege -Identity myuser -Privilege SeCreateSymbolicLinkPrivilege
Interesting. Well, I found the problem with my last patch (to wit:
junction points must be absolute, unlike real symlinks, which I'd
considered already but I missed that tmp_check's DataDir had a stray
internal \.\), and now I'm wondering whether these newer real symlinks
could help. The constraints are pretty hard to work with... I thought
about a couple of options:
1. We could try to use real symlinks, and fall back to junction
points if that fails. That means that these new tests I'm proposing
would fail unless you grant that privilege or run in developer mode as
you were saying. It bothers me a bit that developers and the BF would
be testing a different code path than production databases run...
unless you're thinking we should switch to symlinks with no fallback,
and require that privilege to be granted by end users to production
servers at least if they want to use tablespaces, and also drop
pre-Win10 support in the same release? That's bigger than I was
thinking.
2. We could convert relative paths to absolute paths at junction
point creation time, which I tried, and "check" now passes. Problems:
(1) now you can't move pgdata around, (2) the is-the-path-too-long
check performed on a primary is not sufficient to check if the
transformed absolute path will be too long on a replica.
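Option 2 amounts to the classic relative-to-absolute conversion at junction creation time; a shell sketch of the idea (the real change would live in the server's junction-creation code, and the names here are mine):

```shell
# Sketch: convert a relative tablespace path to an absolute one,
# anchored at the data directory, before creating the junction.
# $1 stands in for the server's data directory, $2 for the path
# given in CREATE TABLESPACE.
to_junction_target() {
    case "$2" in
        /*) printf '%s\n' "$2" ;;          # already absolute: keep as-is
        *)  printf '%s/%s\n' "$1" "$2" ;;  # anchor at the data directory
    esac
}
```

This is exactly where the downsides show up: the absolute path is baked in at creation, so pgdata can no longer be moved, and a replica with a longer data-directory path may exceed the length check done on the primary.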
The most conservative simple idea I have so far is: go with option 2,
but make this whole thing an undocumented developer-only mode, and
turn it on in the regression tests. Here's a patch set like that.
Thoughts?
Another option would be to stop using operating system symlinks, and
build the target paths ourselves; I didn't investigate that as it
seemed like a bigger change than this warrants.
Next problem: The below is clearly not the right way to find the
pg_regress binary and bindir, and doesn't work on Windows or VPATH.
Any suggestions for how to do this? I feel like something like
$node->installed_command() or similar logic is needed...
# Run the regression tests against the primary.
# XXX How should we find the pg_regress binary and bindir?
system_or_bail("../regress/pg_regress",
"--bindir=../../bin/psql",
"--port=" . $node_primary->port,
"--schedule=../regress/parallel_schedule",
"--dlpath=../regress",
"--inputdir=../regress");
BTW 0002 is one of those renaming patches from git that GNU patch
doesn't seem to apply correctly, sorry cfbot...
Attachments:
v7-0001-Allow-restricted-relative-tablespace-paths.patch (text/x-patch, +77/-8)
v7-0002-Use-relative-paths-for-tablespace-regression-test.patch (text/x-patch, +32/-51)
v7-0003-Test-replay-of-regression-tests.patch (text/x-patch, +81/-2)
On 12/3/21 23:21, Thomas Munro wrote:
Next problem: The below is clearly not the right way to find the
pg_regress binary and bindir, and doesn't work on Windows or VPATH.
Any suggestions for how to do this? I feel like something like
$node->installed_command() or similar logic is needed...
# Run the regression tests against the primary.
# XXX How should we find the pg_regress binary and bindir?
system_or_bail("../regress/pg_regress",
"--bindir=../../bin/psql",
"--port=" . $node_primary->port,
"--schedule=../regress/parallel_schedule",
"--dlpath=../regress",
"--inputdir=../regress");
TAP tests are passed a path to pg_regress as $ENV{PG_REGRESS}. You
should be using that. On non-MSVC, the path to a non-installed psql is
in this case "$TESTDIR/../../bin/psql" - this should work for VPATH
builds as well as non-VPATH. On MSVC it's a bit harder - it's
"$top_builddir/$releasetype/psql" but we don't expose that. Perhaps we
should. c.f. commit f4ce6c4d3a
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Sun, Dec 5, 2021 at 4:16 AM Andrew Dunstan <andrew@dunslane.net> wrote:
TAP tests are passed a path to pg_regress as $ENV{PG_REGRESS}. You
should be using that. On non-MSVC, the path to a non-installed psql is
in this case "$TESTDIR/../../bin/psql" - this should work for VPATH
builds as well as non-VPATH. On MSVC it's a bit harder - it's
"$top_builddir/$releasetype/psql" but we don't expose that. Perhaps we
should. c.f. commit f4ce6c4d3a
Thanks, that helped. Here's a new version that passes on Windows,
Unix and Unix with VPATH. I also had to figure out where the DLLs
are, and make sure that various output files go to the build
directory, not source directory, if different, which I did by passing
down another similar environment variable. Better ideas? (It
confused me for some time that make follows the symlink and runs the
perl code from inside the source directory.)
This adds 2 whole minutes to the recovery check, when running with the
Windows serial-only scripts on Cirrus CI (using Andres's CI patches).
For Linux it adds ~20 seconds to the total of -j8 check-world.
Hopefully that's time well spent, because it adds test coverage for
all the redo routines, and hopefully soon we won't have to run 'em in
series on Windows.
Does anyone want to object to the concept of the "pg_user_files"
directory or the developer-only GUC "allow_relative_tablespaces"?
There's room for discussion about names; maybe initdb shouldn't create
the directory unless you ask it to, or something.