Flaky test in t/100_vacuumdb.pl: ordering assumption not stable under plan changes

Started by Andrei Lepikhovabout 2 months ago2 messageshackers

lepihov@gmail.com

about 2 months ago

Hi,

I found that t/100_vacuumdb.pl has a fragile ordering check that fails
if the query plan for vacuumdb's catalogue query changes. I sometimes
see how this test fails when writing an optimisation-related extension.

The test checks that vacuumdb processes "Foo".bar before "Bar".baz:

qr/VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo".bar
.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar".baz
/sx,

Both tables being tested, "Foo".bar and "Bar".baz, are created empty.
This means pg_class.relpages is 0 for both and the sort order is
completely unstable. The output order depends entirely on which query
plan will be chosen. Any change in the planner that affects the plan for
this query, such as a new join path type or a cost model change, may
flip the order and cause the test to fail.

AFAICS, The fix is quite trivial. Change the test regex (in
100_vacuumdb.pl) to use order-independent lookaheads instead of a
sequential match:

qr/(?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo"\.bar)
(?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar"\.baz)
/sx,

This makes the test robust regardless of the order in which the server
returns results.

Hence, it doesn’t change anything important. I think it deserves to be
back-patched down to v.16 (like the commit 2143d96dc7b introduced this
test) so other extensions can be stable with check-world tests.

Regards,
Andrei Lepikhov
pgEdge

Daniel Gustafsson

daniel@yesql.se

about 2 months ago

In reply to: Andrei Lepikhov (#1)

Re: Flaky test in t/100_vacuumdb.pl: ordering assumption not stable under plan changes

On 3 Apr 2026, at 08:42, Andrei Lepikhov <lepihov@gmail.com> wrote:

Hi,

I found that t/100_vacuumdb.pl has a fragile ordering check that fails if the query plan for vacuumdb's catalogue query changes. I sometimes see how this test fails when writing an optimisation-related extension.

The test checks that vacuumdb processes "Foo".bar before "Bar".baz:

qr/VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo".bar
.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar".baz
/sx,

Both tables being tested, "Foo".bar and "Bar".baz, are created empty. This means pg_class.relpages is 0 for both and the sort order is completely unstable. The output order depends entirely on which query plan will be chosen. Any change in the planner that affects the plan for this query, such as a new join path type or a cost model change, may flip the order and cause the test to fail.

AFAICS, The fix is quite trivial. Change the test regex (in 100_vacuumdb.pl) to use order-independent lookaheads instead of a sequential match:

qr/(?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Foo"\.bar)
(?=.*VACUUM\ \(SKIP_DATABASE_STATS\)\ "Bar"\.baz)
/sx,

This makes the test robust regardless of the order in which the server returns results.

Hence, it doesn’t change anything important. I think it deserves to be back-patched down to v.16 (like the commit 2143d96dc7b introduced this test) so other extensions can be stable with check-world tests.

Thanks for the report, I'll have a look.

--
Daniel Gustafsson

Flaky test in t/100_vacuumdb.pl: ordering assumption not stable under plan changes

Attachments: