create_index test fails when synchronous_commit = off @ master

Started by Aleksander Alekseevalmost 4 years ago7 messages
#1Aleksander Alekseev
aleksander@timescale.com

Hi hackers,

I noticed that create_index test (make installcheck) fails on my laptop
because different query plans are chosen:

-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

...

-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY
('{1001,3000}'::integer[])))
+(4 rows)

Investigation showed that it happens only with `synchronous_commit = off`.
Only the master branch is affected, starting from cc50080a82. This is a
debug build. I'm running MacOS Monterey 12.2.1 @ x64.

I didn't investigate further. Do we assume that `make installcheck` suppose
to pass with a different postgresql.conf options?

--
Best regards,
Aleksander Alekseev

#2Andres Freund
andres@anarazel.de
In reply to: Aleksander Alekseev (#1)
Re: create_index test fails when synchronous_commit = off @ master

Hi,

On 2022-02-24 16:47:25 +0300, Aleksander Alekseev wrote:

-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY
('{1001,3000}'::integer[])))
+(4 rows)

Heh. We've been having a lot of fights with exactly this plan change in the
AIO branch, before cc50080a82, and without synchronous_commit =
off. Interestingly near-exclusively with the regression run within
pg_upgrade's tests.

For aio we (David did a lot of that IIRC) finally hunted it down to be due
vacuum skipping pages due to inability to get a cleanup lock. If that happens
enough, pg_class.relallvisible changes enough to lead to the different plan.

I first was going to suggest that we should just use VACUUM FREEZE to prevent
the issue.

But in this instance the cause isn't cleanup locks, probably that we can't yet
set hint bits due to synchronous_commit=off? But I don't *fully* understand
how it leads to this.

I added the SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
just after the
VACUUM ANALYZE tenk1;

synchronous_commit=on
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |           345
+(1 row)
synchronous_commit=off
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |             0
+(1 row)

So it clearly is the explanation for the issue.

Obviously we can locally work around it by adding a
SET LOCAL synchronous_commit = local;
to the COPY. But I'd like to fully understand what's going on.

I didn't investigate further. Do we assume that `make installcheck` suppose
to pass with a different postgresql.conf options?

Depends on the option, I think... There's some where it's interesting to run
tests with different options and where the effort required is reasonable. And
some cases where it's not... synchronous_commit=off worked until recently, and
I think we should keep it working.

Greetings,

Andres Freund

#3Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#2)
Re: create_index test fails when synchronous_commit = off @ master

Hi,

On 2022-02-24 07:33:39 -0800, Andres Freund wrote:

I added the SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
just after the
VACUUM ANALYZE tenk1;

synchronous_commit=on
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |           345
+(1 row)
synchronous_commit=off
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |             0
+(1 row)

So it clearly is the explanation for the issue.

Obviously we can locally work around it by adding a
SET LOCAL synchronous_commit = local;
to the COPY. But I'd like to fully understand what's going on.

It is the hint bit sets delayed by asynchronous commit. I traced execution and
we do end up not setting all visible due to reaching the
!HeapTupleHeaderXminCommitted() path in lazy_scan_prune()

case HEAPTUPLE_LIVE:
...
/*
* Is the tuple definitely visible to all transactions?
*
* NB: Like with per-tuple hint bits, we can't set the
* PD_ALL_VISIBLE flag if the inserter committed
* asynchronously. See SetHintBits for more info. Check that
* the tuple is hinted xmin-committed because of that.
*/
if (prunestate->all_visible)
{
TransactionId xmin;

if (!HeapTupleHeaderXminCommitted(tuple.t_data))

So it might be reasonable to use synchronous_commit=on in test_setup.sql?

It's not super satisfying, but i don't immediately see what else could prevent
all-visible to be set in this case? There's no dead rows, so concurrent
snapshots shouldn't prevent cleanup.

I guess we could alternatively try doing something like flushing pending async
commits at the start of vacuum. But that probably isn't warranted.

Greetings,

Andres Freund

#4Aleksander Alekseev
aleksander@timescale.com
In reply to: Andres Freund (#3)
1 attachment(s)
Re: create_index test fails when synchronous_commit = off @ master

Hi Andres,

So it might be reasonable to use synchronous_commit=on in test_setup.sql?

It's not super satisfying, but I don't immediately see what else could prevent
all-visible to be set in this case?

I don't see what else can be done either. Here is the corresponding patch.

--
Best regards,
Aleksander Alekseev

Attachments:

fix-flacky-tests.patchapplication/octet-stream; name=fix-flacky-tests.patchDownload
diff --git a/src/test/regress/expected/test_setup.out b/src/test/regress/expected/test_setup.out
index 98e08cb6eb..e667412fd4 100644
--- a/src/test/regress/expected/test_setup.out
+++ b/src/test/regress/expected/test_setup.out
@@ -7,6 +7,13 @@
 \getenv dlsuffix PG_DLSUFFIX
 \set regresslib :libdir '/regress' :dlsuffix
 --
+-- synchronous_commit setting affects the time when hint bits are set which
+-- in its turn has an effect on the query plans. This means that in the
+-- general case we can't guarantee that all the tests will pass regardless of
+-- synchronous_commit value in postgresql.conf.
+--
+SET synchronous_commit = on;
+--
 -- Postgres formerly made the public schema read/write by default,
 -- and most of the core regression tests still expect that.
 --
diff --git a/src/test/regress/sql/test_setup.sql b/src/test/regress/sql/test_setup.sql
index d0a73c473d..ea2ac22b52 100644
--- a/src/test/regress/sql/test_setup.sql
+++ b/src/test/regress/sql/test_setup.sql
@@ -9,6 +9,14 @@
 
 \set regresslib :libdir '/regress' :dlsuffix
 
+--
+-- synchronous_commit setting affects the time when hint bits are set which
+-- in its turn has an effect on the query plans. This means that in the
+-- general case we can't guarantee that all the tests will pass regardless of
+-- synchronous_commit value in postgresql.conf.
+--
+SET synchronous_commit = on;
+
 --
 -- Postgres formerly made the public schema read/write by default,
 -- and most of the core regression tests still expect that.
#5Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#4)
1 attachment(s)
Re: create_index test fails when synchronous_commit = off @ master

Hi hackers,

I don't see what else can be done either. Here is the corresponding patch.

Here is an updated patch that includes the commit message.

--
Best regards,
Aleksander Alekseev

Attachments:

v2-0001-Use-synchronous_commit-on-in-test_setup.sql.patchapplication/octet-stream; name=v2-0001-Use-synchronous_commit-on-in-test_setup.sql.patchDownload
From bf99a048a77b312e7cae16bc5e940f9255241dba Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 2 Mar 2022 13:00:51 +0300
Subject: [PATCH v2] Use synchronous_commit=on in test_setup.sql

Starting from cc50080a82 create_index test started to fail when there is a
synchronous_commit=off setting in postgresql.conf. As pointed out by Andres,
this is because synchronous_commit affects the time when the hint bits are
set. This in its turn affects the choice of the query plans.

This patch sets synchronous_commit to `on` in test_setup.sql. It's not very
satisfying but we don't immediately see what else can be done.

Authors: Andres Freund, Aleksander Alekseev
Discussion: https://www.postgresql.org/message-id/flat/CAJ7c6TPJNof1Q+vJsy3QebgbPgXdu2ErPvYkBdhD6_Ckv5EZRg@mail.gmail.com
---
 src/test/regress/expected/test_setup.out | 7 +++++++
 src/test/regress/sql/test_setup.sql      | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/src/test/regress/expected/test_setup.out b/src/test/regress/expected/test_setup.out
index 98e08cb6eb..e667412fd4 100644
--- a/src/test/regress/expected/test_setup.out
+++ b/src/test/regress/expected/test_setup.out
@@ -7,6 +7,13 @@
 \getenv dlsuffix PG_DLSUFFIX
 \set regresslib :libdir '/regress' :dlsuffix
 --
+-- synchronous_commit setting affects the time when hint bits are set which
+-- in its turn has an effect on the query plans. This means that in the
+-- general case we can't guarantee that all the tests will pass regardless of
+-- synchronous_commit value in postgresql.conf.
+--
+SET synchronous_commit = on;
+--
 -- Postgres formerly made the public schema read/write by default,
 -- and most of the core regression tests still expect that.
 --
diff --git a/src/test/regress/sql/test_setup.sql b/src/test/regress/sql/test_setup.sql
index d0a73c473d..ea2ac22b52 100644
--- a/src/test/regress/sql/test_setup.sql
+++ b/src/test/regress/sql/test_setup.sql
@@ -9,6 +9,14 @@
 
 \set regresslib :libdir '/regress' :dlsuffix
 
+--
+-- synchronous_commit setting affects the time when hint bits are set which
+-- in its turn has an effect on the query plans. This means that in the
+-- general case we can't guarantee that all the tests will pass regardless of
+-- synchronous_commit value in postgresql.conf.
+--
+SET synchronous_commit = on;
+
 --
 -- Postgres formerly made the public schema read/write by default,
 -- and most of the core regression tests still expect that.
-- 
2.35.1

#6Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#5)
Re: create_index test fails when synchronous_commit = off @ master

Hi hackers,

Here is an updated patch that includes the commit message.

Since this is a trivial change and no one seems to object to it so far, I'm
going to re-check the patch against the recent `master` and change its
status to "Ready for Committer" somewhere next week.

--
Best regards,
Aleksander Alekseev

#7Andres Freund
andres@anarazel.de
In reply to: Aleksander Alekseev (#6)
Re: create_index test fails when synchronous_commit = off @ master

On 2022-03-11 10:38:22 +0300, Aleksander Alekseev wrote:

Here is an updated patch that includes the commit message.

Since this is a trivial change and no one seems to object to it so far, I'm
going to re-check the patch against the recent `master` and change its
status to "Ready for Committer" somewhere next week.

Pushed it now.

- Andres