Flaky test in t/100_vacuumdb.pl: ordering assumption not stable under plan changes

Started by Andrei Lepikhov, 12 days ago, 2 messages, hackers
#1 Andrei Lepikhov
lepihov@gmail.com

Hi,

I found that t/100_vacuumdb.pl has a fragile ordering check that fails
if the query plan for vacuumdb's catalogue query changes. I sometimes
see this test fail when writing an optimisation-related extension.

The test checks that vacuumdb processes "Foo".bar before "Bar".baz:

qr/VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo".bar
.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar".baz
/sx,

Both tables being tested, "Foo".bar and "Bar".baz, are created empty.
This means pg_class.relpages is 0 for both, so a sort on relpages
cannot distinguish them and their relative order is unspecified. The
output order then depends entirely on which query plan is chosen. Any
change in the planner that affects the plan for this query, such as a
new join path type or a cost model change, may flip the order and cause
the test to fail.
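
To illustrate the failure mode, here is a minimal standalone Perl sketch. The two log strings are hypothetical stand-ins for what the server would log in each case (the exact log prefix is an assumption); only the pattern is taken from the test.

```perl
use strict;
use warnings;

# Hypothetical log excerpts; only the order of the two statements differs.
my $foo_first = qq{statement: VACUUM (SKIP_DATABASE_STATS) "Foo".bar;\n}
              . qq{statement: VACUUM (SKIP_DATABASE_STATS) "Bar".baz;\n};
my $bar_first = qq{statement: VACUUM (SKIP_DATABASE_STATS) "Bar".baz;\n}
              . qq{statement: VACUUM (SKIP_DATABASE_STATS) "Foo".bar;\n};

# The sequential pattern from the test: "Foo".bar must come before "Bar".baz.
my $sequential = qr/VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo".bar
    .*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar".baz
    /sx;

printf "Foo first: %s\n", ($foo_first =~ $sequential ? 'match' : 'no match');
printf "Bar first: %s\n", ($bar_first =~ $sequential ? 'match' : 'no match');
```

The first string matches and the second does not, even though both tables were vacuumed in both cases.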

AFAICS, the fix is quite trivial: change the test regex (in
100_vacuumdb.pl) to use order-independent lookaheads instead of a
sequential match:

qr/(?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo"\.bar)
(?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar"\.baz)
/sx,

This makes the test robust regardless of the order in which the server
returns results.
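
As a rough check of that claim, the following standalone Perl sketch runs the lookahead pattern against hypothetical log text in both orders, plus a case where one statement is missing. The log lines and the %logs hash are illustrative assumptions, not part of the test file.

```perl
use strict;
use warnings;

# Hypothetical log lines for the two VACUUM statements.
my $foo = qq{statement: VACUUM (SKIP_DATABASE_STATS) "Foo".bar;\n};
my $bar = qq{statement: VACUUM (SKIP_DATABASE_STATS) "Bar".baz;\n};

# The proposed pattern: two independent lookaheads, each scanning the
# whole text from the same position, so neither constrains the other.
my $unordered = qr/(?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo"\.bar)
    (?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar"\.baz)
    /sx;

my %logs = (
    'Foo first' => $foo . $bar,
    'Bar first' => $bar . $foo,
    'Foo only'  => $foo,
);

for my $name (sort keys %logs) {
    printf "%s: %s\n", $name, ($logs{$name} =~ $unordered ? 'match' : 'no match');
}
```

Both orderings match, while the log containing only one of the statements does not, so the pattern still verifies that each table was vacuumed. It no longer asserts anything about their order.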

Since this doesn't change anything important in what the test covers, I
think it deserves to be back-patched down to v16 (where commit
2143d96dc7b introduced this test) so that extensions can run the
check-world tests reliably.

Regards,
Andrei Lepikhov
pgEdge

Attachments:

0001-Fix-flaky-ordering-assertion-in-t-100_vacuumdb.pl.patch (text/plain; charset=UTF-8) +2 -3
#2 Daniel Gustafsson
daniel@yesql.se
In reply to: Andrei Lepikhov (#1)
Re: Flaky test in t/100_vacuumdb.pl: ordering assumption not stable under plan changes

On 3 Apr 2026, at 08:42, Andrei Lepikhov <lepihov@gmail.com> wrote:

> Hi,
>
> I found that t/100_vacuumdb.pl has a fragile ordering check that fails if the query plan for vacuumdb's catalogue query changes. I sometimes see this test fail when writing an optimisation-related extension.
>
> The test checks that vacuumdb processes "Foo".bar before "Bar".baz:
>
> qr/VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo".bar
> .*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar".baz
> /sx,
>
> Both tables being tested, "Foo".bar and "Bar".baz, are created empty. This means pg_class.relpages is 0 for both, so a sort on relpages cannot distinguish them and their relative order is unspecified. The output order then depends entirely on which query plan is chosen. Any change in the planner that affects the plan for this query, such as a new join path type or a cost model change, may flip the order and cause the test to fail.
>
> AFAICS, the fix is quite trivial: change the test regex (in 100_vacuumdb.pl) to use order-independent lookaheads instead of a sequential match:
>
> qr/(?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo"\.bar)
> (?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar"\.baz)
> /sx,
>
> This makes the test robust regardless of the order in which the server returns results.
>
> Since this doesn't change anything important in what the test covers, I think it deserves to be back-patched down to v16 (where commit 2143d96dc7b introduced this test) so that extensions can run the check-world tests reliably.

Thanks for the report, I'll have a look.

--
Daniel Gustafsson