pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i
When VACUUM or ANALYZE skips a concurrently dropped table, log it.
Hopefully, the additional logging will help avoid confusion that
could otherwise result.
Nathan Bossart, reviewed by Michael Paquier, Fabrízio Mello, and me
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/ab6eaee88420db58a948849d5a735997728d73a9
Modified Files
--------------
doc/src/sgml/config.sgml | 4 +-
src/backend/commands/analyze.c | 46 +++++++++++--
src/backend/commands/vacuum.c | 49 ++++++++++++--
.../isolation/expected/vacuum-concurrent-drop.out | 76 ++++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/vacuum-concurrent-drop.spec | 45 +++++++++++++
6 files changed, 208 insertions(+), 13 deletions(-)
Robert Haas <rhaas@postgresql.org> writes:
When VACUUM or ANALYZE skips a concurrently dropped table, log it.
When this went in, I was pretty skeptical of the value of an isolation
test for it, but said nothing. However, I now observe that the isolation
test is falling over on buildfarm machines with -DCLOBBER_CACHE_ALWAYS.
The buildfarm reports are a bit hard to interpret, but it's easy to
reproduce locally, and what I get is
$ more output_iso/regression.diffs
*** /home/postgres/pgsql/src/test/isolation/expected/vacuum-concurrent-drop.out
Mon Dec 4 17:02:55 2017
--- /home/postgres/pgsql/src/test/isolation/output_iso/results/vacuum-concurrent-drop.out Wed Dec 6 12:07:37 2017
***************
*** 49,54 ****
--- 49,55 ----
COMMIT;
step analyze_all: <... completed>
+ error in steps drop_and_commit analyze_all: ERROR: canceling statement due to user request
starting permutation: lock vac_analyze_specified drop_and_commit
step lock:
======================================================================
What appears to be happening is that a database-wide ANALYZE takes more
than a minute under CLOBBER_CACHE_ALWAYS, causing isolationtester.c's
hardwired one-minute timeout to trigger.
While you could imagine doing something to get around that, I do not
believe that this test is worth memorializing in perpetuity to begin
with. I'd recommend just taking it out again.
regards, tom lane
On 12/6/17, 11:57 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Robert Haas <rhaas@postgresql.org> writes:
When VACUUM or ANALYZE skips a concurrently dropped table, log it.
When this went in, I was pretty skeptical of the value of an isolation
test for it, but said nothing. However, I now observe that the isolation
test is falling over on buildfarm machines with -DCLOBBER_CACHE_ALWAYS.
The buildfarm reports are a bit hard to interpret, but it's easy to
reproduce locally, and what I get is$ more output_iso/regression.diffs *** /home/postgres/pgsql/src/test/isolation/expected/vacuum-concurrent-drop.out Mon Dec 4 17:02:55 2017 --- /home/postgres/pgsql/src/test/isolation/output_iso/results/vacuum-concurrent-drop.out Wed Dec 6 12:07:37 2017 *************** *** 49,54 **** --- 49,55 ---- COMMIT;step analyze_all: <... completed>
+ error in steps drop_and_commit analyze_all: ERROR: canceling statement due to user requeststarting permutation: lock vac_analyze_specified drop_and_commit
step lock:======================================================================
What appears to be happening is that a database-wide ANALYZE takes more
than a minute under CLOBBER_CACHE_ALWAYS, causing isolationtester.c's
hardwired one-minute timeout to trigger.
Thanks for digging into this.
While you could imagine doing something to get around that, I do not
believe that this test is worth memorializing in perpetuity to begin
with. I'd recommend just taking it out again.
While the current version of the test is clearly broken, I thought
Robert made a pretty strong argument regarding the value of the test
[0]: /messages/by-id/CA+TgmobH17W=WdduhXJhxdwHAeTazNp7MDP=k0p=2w1nuSSruw@mail.gmail.com
lines of code is not worth the extra work needed to maintain the
isolation test. I’m not strongly opinionated either way, but I lean
towards wanting to keep the test around.
Perhaps this could be fixed by modifying the database-wide cases to
use partitioned tables instead. The individual partitions will not
have RangeVars specified, so it would cover the case when logging
should be skipped.
Nathan
[0]: /messages/by-id/CA+TgmobH17W=WdduhXJhxdwHAeTazNp7MDP=k0p=2w1nuSSruw@mail.gmail.com
On Wed, Dec 6, 2017 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What appears to be happening is that a database-wide ANALYZE takes more
than a minute under CLOBBER_CACHE_ALWAYS, causing isolationtester.c's
hardwired one-minute timeout to trigger.While you could imagine doing something to get around that, I do not
believe that this test is worth memorializing in perpetuity to begin
with. I'd recommend just taking it out again.
Mumble. I don't really mind that, but I'll bet $0.05 that this will
get broken at some point and we won't notice right away without the
isolation test.
Is it really our policy that no isolation test can take more than a
minute on the slowest buildfarm critter? If somebody decides to start
running CLOBBER_CACHE_ALWAYS on an even-slower critter, will we just
nuke isolation tests from orbit until the tests pass there? I have
difficulty seeing that as a sound approach.
Another thought is that it might not be necessary to have a
database-wide ANALYZE to trigger this. I managed to reproduce it
locally by doing VACUUM a, b while alternately locking a and b, so
that I let the name lookups complete, but then blocked trying to
vacuum a, and then at that point dropped b, then released the VACUUM.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/6/17, 1:23 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
Perhaps this could be fixed by modifying the database-wide cases to
use partitioned tables instead. The individual partitions will not
have RangeVars specified, so it would cover the case when logging
should be skipped.
This fixed the problem for me. I've attached a patch.
Nathan
Attachments:
fix_vacuum_concurrent_drop_test_v1.patchapplication/octet-stream; name=fix_vacuum_concurrent_drop_test_v1.patchDownload
diff --git a/src/test/isolation/expected/vacuum-concurrent-drop.out b/src/test/isolation/expected/vacuum-concurrent-drop.out
index 72d80a1..bb92e80 100644
--- a/src/test/isolation/expected/vacuum-concurrent-drop.out
+++ b/src/test/isolation/expected/vacuum-concurrent-drop.out
@@ -3,74 +3,74 @@ Parsed test spec with 2 sessions
starting permutation: lock vac_specified drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step vac_specified: VACUUM test1, test2; <waiting ...>
+step vac_specified: VACUUM part1, part2; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-WARNING: skipping vacuum of "test2" --- relation no longer exists
+WARNING: skipping vacuum of "part2" --- relation no longer exists
step vac_specified: <... completed>
-starting permutation: lock vac_all drop_and_commit
+starting permutation: lock vac_all_parts drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step vac_all: VACUUM; <waiting ...>
+step vac_all_parts: VACUUM parted; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-step vac_all: <... completed>
+step vac_all_parts: <... completed>
starting permutation: lock analyze_specified drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step analyze_specified: ANALYZE test1, test2; <waiting ...>
+step analyze_specified: ANALYZE part1, part2; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-WARNING: skipping analyze of "test2" --- relation no longer exists
+WARNING: skipping analyze of "part2" --- relation no longer exists
step analyze_specified: <... completed>
-starting permutation: lock analyze_all drop_and_commit
+starting permutation: lock analyze_all_parts drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step analyze_all: ANALYZE; <waiting ...>
+step analyze_all_parts: ANALYZE parted; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-step analyze_all: <... completed>
+step analyze_all_parts: <... completed>
starting permutation: lock vac_analyze_specified drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step vac_analyze_specified: VACUUM ANALYZE test1, test2; <waiting ...>
+step vac_analyze_specified: VACUUM ANALYZE part1, part2; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-WARNING: skipping vacuum of "test2" --- relation no longer exists
+WARNING: skipping vacuum of "part2" --- relation no longer exists
step vac_analyze_specified: <... completed>
-starting permutation: lock vac_analyze_all drop_and_commit
+starting permutation: lock vac_analyze_all_parts drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step vac_analyze_all: VACUUM ANALYZE; <waiting ...>
+step vac_analyze_all_parts: VACUUM ANALYZE parted; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-step vac_analyze_all: <... completed>
+step vac_analyze_all_parts: <... completed>
diff --git a/src/test/isolation/specs/vacuum-concurrent-drop.spec b/src/test/isolation/specs/vacuum-concurrent-drop.spec
index 31fc161..cae4092 100644
--- a/src/test/isolation/specs/vacuum-concurrent-drop.spec
+++ b/src/test/isolation/specs/vacuum-concurrent-drop.spec
@@ -7,39 +7,39 @@
setup
{
- CREATE TABLE test1 (a INT);
- CREATE TABLE test2 (a INT);
+ CREATE TABLE parted (a INT) PARTITION BY LIST (a);
+ CREATE TABLE part1 PARTITION OF parted FOR VALUES IN (1);
+ CREATE TABLE part2 PARTITION OF parted FOR VALUES IN (2);
}
teardown
{
- DROP TABLE IF EXISTS test1;
- DROP TABLE IF EXISTS test2;
+ DROP TABLE IF EXISTS parted;
}
session "s1"
step "lock"
{
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
}
step "drop_and_commit"
{
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
}
session "s2"
-step "vac_specified" { VACUUM test1, test2; }
-step "vac_all" { VACUUM; }
-step "analyze_specified" { ANALYZE test1, test2; }
-step "analyze_all" { ANALYZE; }
-step "vac_analyze_specified" { VACUUM ANALYZE test1, test2; }
-step "vac_analyze_all" { VACUUM ANALYZE; }
+step "vac_specified" { VACUUM part1, part2; }
+step "vac_all_parts" { VACUUM parted; }
+step "analyze_specified" { ANALYZE part1, part2; }
+step "analyze_all_parts" { ANALYZE parted; }
+step "vac_analyze_specified" { VACUUM ANALYZE part1, part2; }
+step "vac_analyze_all_parts" { VACUUM ANALYZE parted; }
permutation "lock" "vac_specified" "drop_and_commit"
-permutation "lock" "vac_all" "drop_and_commit"
+permutation "lock" "vac_all_parts" "drop_and_commit"
permutation "lock" "analyze_specified" "drop_and_commit"
-permutation "lock" "analyze_all" "drop_and_commit"
+permutation "lock" "analyze_all_parts" "drop_and_commit"
permutation "lock" "vac_analyze_specified" "drop_and_commit"
-permutation "lock" "vac_analyze_all" "drop_and_commit"
+permutation "lock" "vac_analyze_all_parts" "drop_and_commit"
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Dec 6, 2017 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What appears to be happening is that a database-wide ANALYZE takes more
than a minute under CLOBBER_CACHE_ALWAYS, causing isolationtester.c's
hardwired one-minute timeout to trigger.
Is it really our policy that no isolation test can take more than a
minute on the slowest buildfarm critter?
Well, I think it's a minute per query not per whole test script. But in
any case, if it's taking a longer time than any other isolation test on
the CLOBBER_CACHE_ALWAYS critters, then it's also taking a proportionately
longer time than any other test on every other platform, and is therefore
costing every developer precious time today and indefinitely far into the
future. I continue to say that this test ain't worth it.
It's possible that we could compromise on dropping the steps that test
whole-database VACUUM/ANALYZE; the incremental gain from testing those
scenarios is certainly even less worth its cost than the basic cases.
regards, tom lane
On Wed, Dec 6, 2017 at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Well, I think it's a minute per query not per whole test script. But in
any case, if it's taking a longer time than any other isolation test on
the CLOBBER_CACHE_ALWAYS critters, then it's also taking a proportionately
longer time than any other test on every other platform, and is therefore
costing every developer precious time today and indefinitely far into the
future. I continue to say that this test ain't worth it.
Sure. But, you also continue to not respond to my arguments about why
it IS worth it. I don't want to spend a lot of time fighting about
this, but it looks to me like your preferences here are purely
arbitrary. Yesterday, you added - without discussion - a test that I
had "obviously" left out by accident. Today, you want a test removed
that I added on purpose but which you assert has insufficient value.
So, sometimes you think it's worth adding tests that make the test
suite longer, and other times you think it isn't. That's fair enough
-- everyone comes down in different places on this at different times
-- but the only actual reason you've offered is that the script
contains a command that runs for over a minute on very slow machines
that have been artificially slowed down 100x. That's a silly reason:
it means that on real machines we're talking less than a second of
runtime even without modifying the test case, and if we do modify the
test, it can probably be made much less.
Please give me a little time to see if I can speed up this test enough
to fix this problem. If that doesn't work out, then we can rip this
out.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/6/17, 8:25 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
Please give me a little time to see if I can speed up this test enough
to fix this problem. If that doesn't work out, then we can rip this
out.
Just in case it got missed earlier, here’s a patch that speeds it
up enough to pass with CLOBBER_CACHE_ALWAYS enabled. Instead of
doing database-wide operations, it just uses a partitioned table.
Nathan
Attachments:
fix_vacuum_concurrent_drop_test_v1.patchapplication/octet-stream; name=fix_vacuum_concurrent_drop_test_v1.patchDownload
diff --git a/src/test/isolation/expected/vacuum-concurrent-drop.out b/src/test/isolation/expected/vacuum-concurrent-drop.out
index 72d80a1..bb92e80 100644
--- a/src/test/isolation/expected/vacuum-concurrent-drop.out
+++ b/src/test/isolation/expected/vacuum-concurrent-drop.out
@@ -3,74 +3,74 @@ Parsed test spec with 2 sessions
starting permutation: lock vac_specified drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step vac_specified: VACUUM test1, test2; <waiting ...>
+step vac_specified: VACUUM part1, part2; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-WARNING: skipping vacuum of "test2" --- relation no longer exists
+WARNING: skipping vacuum of "part2" --- relation no longer exists
step vac_specified: <... completed>
-starting permutation: lock vac_all drop_and_commit
+starting permutation: lock vac_all_parts drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step vac_all: VACUUM; <waiting ...>
+step vac_all_parts: VACUUM parted; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-step vac_all: <... completed>
+step vac_all_parts: <... completed>
starting permutation: lock analyze_specified drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step analyze_specified: ANALYZE test1, test2; <waiting ...>
+step analyze_specified: ANALYZE part1, part2; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-WARNING: skipping analyze of "test2" --- relation no longer exists
+WARNING: skipping analyze of "part2" --- relation no longer exists
step analyze_specified: <... completed>
-starting permutation: lock analyze_all drop_and_commit
+starting permutation: lock analyze_all_parts drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step analyze_all: ANALYZE; <waiting ...>
+step analyze_all_parts: ANALYZE parted; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-step analyze_all: <... completed>
+step analyze_all_parts: <... completed>
starting permutation: lock vac_analyze_specified drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step vac_analyze_specified: VACUUM ANALYZE test1, test2; <waiting ...>
+step vac_analyze_specified: VACUUM ANALYZE part1, part2; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-WARNING: skipping vacuum of "test2" --- relation no longer exists
+WARNING: skipping vacuum of "part2" --- relation no longer exists
step vac_analyze_specified: <... completed>
-starting permutation: lock vac_analyze_all drop_and_commit
+starting permutation: lock vac_analyze_all_parts drop_and_commit
step lock:
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
-step vac_analyze_all: VACUUM ANALYZE; <waiting ...>
+step vac_analyze_all_parts: VACUUM ANALYZE parted; <waiting ...>
step drop_and_commit:
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
-step vac_analyze_all: <... completed>
+step vac_analyze_all_parts: <... completed>
diff --git a/src/test/isolation/specs/vacuum-concurrent-drop.spec b/src/test/isolation/specs/vacuum-concurrent-drop.spec
index 31fc161..cae4092 100644
--- a/src/test/isolation/specs/vacuum-concurrent-drop.spec
+++ b/src/test/isolation/specs/vacuum-concurrent-drop.spec
@@ -7,39 +7,39 @@
setup
{
- CREATE TABLE test1 (a INT);
- CREATE TABLE test2 (a INT);
+ CREATE TABLE parted (a INT) PARTITION BY LIST (a);
+ CREATE TABLE part1 PARTITION OF parted FOR VALUES IN (1);
+ CREATE TABLE part2 PARTITION OF parted FOR VALUES IN (2);
}
teardown
{
- DROP TABLE IF EXISTS test1;
- DROP TABLE IF EXISTS test2;
+ DROP TABLE IF EXISTS parted;
}
session "s1"
step "lock"
{
BEGIN;
- LOCK test1 IN SHARE MODE;
+ LOCK part1 IN SHARE MODE;
}
step "drop_and_commit"
{
- DROP TABLE test2;
+ DROP TABLE part2;
COMMIT;
}
session "s2"
-step "vac_specified" { VACUUM test1, test2; }
-step "vac_all" { VACUUM; }
-step "analyze_specified" { ANALYZE test1, test2; }
-step "analyze_all" { ANALYZE; }
-step "vac_analyze_specified" { VACUUM ANALYZE test1, test2; }
-step "vac_analyze_all" { VACUUM ANALYZE; }
+step "vac_specified" { VACUUM part1, part2; }
+step "vac_all_parts" { VACUUM parted; }
+step "analyze_specified" { ANALYZE part1, part2; }
+step "analyze_all_parts" { ANALYZE parted; }
+step "vac_analyze_specified" { VACUUM ANALYZE part1, part2; }
+step "vac_analyze_all_parts" { VACUUM ANALYZE parted; }
permutation "lock" "vac_specified" "drop_and_commit"
-permutation "lock" "vac_all" "drop_and_commit"
+permutation "lock" "vac_all_parts" "drop_and_commit"
permutation "lock" "analyze_specified" "drop_and_commit"
-permutation "lock" "analyze_all" "drop_and_commit"
+permutation "lock" "analyze_all_parts" "drop_and_commit"
permutation "lock" "vac_analyze_specified" "drop_and_commit"
-permutation "lock" "vac_analyze_all" "drop_and_commit"
+permutation "lock" "vac_analyze_all_parts" "drop_and_commit"
On Wed, Dec 6, 2017 at 9:42 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
On 12/6/17, 8:25 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
Please give me a little time to see if I can speed up this test enough
to fix this problem. If that doesn't work out, then we can rip this
out.Just in case it got missed earlier, here’s a patch that speeds it
up enough to pass with CLOBBER_CACHE_ALWAYS enabled. Instead of
doing database-wide operations, it just uses a partitioned table.
Yeah, that looks like a reasonable approach to try. Committed, thanks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company