[PATCH] Full support for index LP_DEAD hint bits on standby
Hello, hackers.
[ABSTRACT]
Execution of queries on a hot standby is one of the most popular ways to
scale application workload. Most modern Postgres installations
have two standby nodes for high-availability support, so utilization
of the replicas' CPU seems to be a reasonable idea.
At the same time, some queries (index scans) can be much slower on a
hot standby than on the primary. This happens because the
LP_DEAD index hint bit mechanism is ignored in index scans during
recovery. This is done for good reasons, of course [1]:
* We do this because the xmin on the primary node could easily be
* later than the xmin on the standby node, so that what the primary
* thinks is killed is supposed to be visible on standby. So for correct
* MVCC for queries during recovery we must ignore these hints and check
* all tuples.
Also, according to [2] and cases like [3], it seems to be a good idea
to support "ignore_killed_tuples" on standby.
The goal of this patch is to provide full support for index hint bits
on hot standby. The mechanism should be based on well-tested
functionality and not cause a lot of recovery conflicts.
This thread is a continuation (and partly a copy-paste) of an old
previous one [4].
[PROBLEM]
The standby itself can set and read hint bits during recovery. Such
bits are even correct according to standby visibility rules. But the
problem is the full-page-write (FPW) WAL records coming from the primary.
Such WAL records could bring hint bits that are invalid according to the
standby's xmin.
So, if we can be sure the scan doesn't see any invalid hint bit from the
primary, the problem is solved. And we will even be able to allow the
standby to set its LP_DEAD bits itself.
The idea is simple: let us WAL-log the hint bits' horizon before the FPW
somehow. It could cause a lot of additional WAL records, however...
But there are ways to avoid it:
1) Send only one `latestRemovedXid` for all tuples marked as dead
during a page scan.
2) Remember the latest sent `latestRemovedXid` in shared memory, and
optimistically skip WAL records with older xid values [5].
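Optimization (2) can be sketched as a minimal self-contained model (all names here are invented for illustration; the real patch would track this per database in shared memory, with proper locking and xid wrap-around handling):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for Postgres' 32-bit transaction id. */
typedef uint32_t TransactionId;

/* Latest horizon already WAL-logged (per database in the real patch). */
static TransactionId latest_logged_horizon = 0;

/*
 * Returns true if a new XLOG_INDEX_HINT_BITS_HORIZON record must be
 * emitted, i.e. the xid of the just-killed tuples is newer than anything
 * logged so far.  Standbys already conflict on the previously logged
 * (newer-or-equal) horizon, so older values can be skipped.
 */
static bool
horizon_record_needed(TransactionId latest_removed_xid)
{
    if (latest_removed_xid <= latest_logged_horizon)
        return false;           /* already covered by an earlier record */
    latest_logged_horizon = latest_removed_xid;
    return true;
}
```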
Such WAL records would cause a lot of recovery conflicts on standbys.
But we can be tricky here: use hint bits only if
hot_standby_feedback is enabled and effective on the standby. If HSF is
effective, conflicts are not possible. If HSF is off, the
standby ignores both the hint bits and the additional conflict resolution. The
major point here is that HSF is just an optimization and has nothing to do
with MVCC correctness.
[DETAILS]
The patch introduces a new WAL record (named
XLOG_INDEX_HINT_BITS_HORIZON) to define a horizon of xmin required for
standbys snapshot to use LP_DEAD bits for an index scan.
`table_index_fetch_tuple` now returns a `latest_removed_xid` value
in addition to `all_dead`. This value is used to advance
`killedLatestRemovedXid` at the time of updating `killedItems` (see
`IndexHintBitAdvanceLatestRemovedXid`).
The primary sends the value of `killedLatestRemovedXid` in
XLOG_INDEX_HINT_BITS_HORIZON before it marks the page dirty after setting
LP_DEAD bits on the index page (by calling
`MarkBufferDirtyIndexHint`).
The new WAL record is always sent before a possible FPW. Such a
record needs to be sent only if its `latestRemovedXid` is newer than the
one previously sent for the current database (see
`LogIndexHintBitsHorizonIfNeeded`).
There is a new flag in the PGPROC structure -
`indexIgnoreKilledTuples`. If the flag is set to true, standby
queries are going to use LP_DEAD bits in index scans. In such a case the
snapshot is required to satisfy the new horizon pushed by
XLOG_INDEX_HINT_BITS_HORIZON records.
It is safe to set `indexIgnoreKilledTuples` to any value from the
perspective of correctness. But a `true` value could cause recovery
conflicts. It is a kind of compromise: use LP_DEAD bits but be
prepared for XLOG_INDEX_HINT_BITS_HORIZON conflicts, or vice versa.
What is the right way to decide on this compromise? It
is pretty simple: if `hot_standby_feedback` is on and the primary has
confirmed that the feedback is received, then set
`indexIgnoreKilledTuples` (see `GetSnapshotIndexIgnoreKilledTuples`).
While the feedback works as expected, the query will never be
canceled by XLOG_INDEX_HINT_BITS_HORIZON.
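The snapshot-time decision above can be sketched as a tiny model (function name and inputs are invented for illustration; the real check would inspect the walreceiver/feedback state):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Use LP_DEAD bits in index scans (and accept possible
 * XLOG_INDEX_HINT_BITS_HORIZON conflicts) only when hot_standby_feedback
 * is enabled AND the primary has confirmed that our xmin is honored.
 * Otherwise ignore both the hint bits and the conflict resolution.
 */
static bool
snapshot_index_ignore_killed_tuples(bool hot_standby_feedback,
                                    bool feedback_confirmed_by_primary)
{
    return hot_standby_feedback && feedback_confirmed_by_primary;
}
```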
To support cascading standby setups (with a possible break of feedback
chain in the middle) – an additional byte was added to the keep-alive
message of the feedback protocol. This byte is used to make sure our
xmin is honored by primary (see
`sender_propagates_feedback_to_primary`). Also, the WAL sender now
always sends a keep-alive after receiving a feedback message.
So, this way, it is safe to use LP_DEAD bits received from the primary
when we want to.
And, as a result, it is safe to set LP_DEAD bits on standby.
Even if:
* the primary changes vacuum_defer_cleanup_age
* standby restarted
* standby promoted to the primary
* base backup taken from standby
* standby is serving queries during recovery
– nothing could go wrong here.
Because `HeapTupleIsSurelyDead` (and, as a result, index LP_DEAD) requires
the *heap* hint bits to be already set on the standby. So, the same code that
decides to set hint bits on the heap (which standby has already been doing
for a long time) also decides for the index.
[EVALUATION]
It is not possible to find an ideal performance test for this kind of
optimization.
But there is a possible example in the attachment. It is a standard
pgbench schema with an additional index on balance and random balance
values.
On the primary, the test does the following:
1) transfer some money from one random account among the top 100 richest to
one random account among the top 100 poorest.
2) calculate the amount of money in the top 10 richest and top 10 poorest
accounts (and include an additional field to avoid an index-only scan).
On the standby, only step 2 is used.
The patched version is about 9x faster for standby queries: roughly 4192
TPS versus 455 TPS unpatched on my system. There is no visible difference
on the primary.
To estimate the additional amount of WAL logs, I have checked records
in WAL-segments during different conditions:
(pg_waldump pgdata/pg_wal/XXX | grep INDEX_HINT_BITS_HORIZON | wc -l)
- hot_standby_feedback=off - 5181 of 226274 records ~2%
- hot_standby_feedback=on (without load on standby) - 70 of 202594
records ~ 0.03%
- hot_standby_feedback=on (with load on standby) - 17 of 70504 records ~ 0.02%
So, with HSF=on (which is the default value) the WAL increase is not
significant. Also, for HSF=off it should be possible to radically
reduce the number of additional WAL records by using `latestRemovedXid`
from other records (like Heap2/CLEAN) in the "send only newer xid"
optimization (I have skipped it for now for simplicity).
[CONCLUSION]
The only price we pay is a few additional WAL records and some
moderate additional code complexity. But support for hint bits on
standby is a huge advantage for many workloads. I was able to get more
than a 900% performance boost (and it is not surprising: index hint
bits are just a great optimization). And it works for almost all index
types out of the box.
Another major thing here – everything is based on old, well-tested
mechanics: query cancelation because of snapshot conflicts, setting
heap hint bits on standby, hot standby feedback.
[REFERENCES]
[1]: /messages/by-id/7067.1529246768@sss.pgh.pa.us
[2]: /messages/by-id/12843.1529331619@sss.pgh.pa.us
[3]: /messages/by-id/20170428133818.24368.33533@wrigleys.postgresql.org
[4]: /messages/by-id/CANtu0ohOvgteBYmCMc2KERFiJUvpWGB0bRTbK_WseQH-L1jkrQ@mail.gmail.com
[5]: /messages/by-id/CANtu0oigC0+H0UkxktyovdLLU67ikM0+Dw3J4EQqiDDeGhcwsQ@mail.gmail.com
Hello, everyone.
Oh, I just realized that it seems like I was too naive to allow
standby to set LP_DEAD bits this way.
There is a possible consistency problem in the case of low
minRecoveryPoint value (because hint bits do not move PageLSN
forward).
Something like this:
LSN=10 STANDBY INSERTS NEW ROW IN INDEX (index_lsn=10)
<-----------minRecoveryPoint will go here
LSN=20 STANDBY DELETES ROW FROM HEAP, INDEX UNTOUCHED (index_lsn=10)
REPLICA SCANS INDEX AND SET hint bits (index_lsn=10)
INDEX IS FLUSHED (minRecoveryPoint=index_lsn=10)
CRASH
On crash recovery, a standby will be able to handle queries after
LSN=10. But the index page contains hints bits from the future
(LSN=20).
So, need to think here.
Thanks,
Michail.
Hello, hackers.
I think I was able to fix the issue related to minRecoveryPoint and crash
recovery. To make sure standby will be consistent after crash recovery, we
need to take the current value of minRecoveryPoint into account while
setting LP_DEAD hints (almost the same way as it is done for *heap* hint
bits already).
I have introduced new structure IndexHintBitsData:
-------
/* guaranteed not visible for all backends */
bool all_dead;
/* latest removed xid if known */
TransactionId latest_removed_xid;
/* lsn of page where dead tuple located */
XLogRecPtr page_lsn;
-------
This structure is filled by the `heap_hot_search_buffer` function. Afterwards,
we decide whether to set `kill_prior_tuple` depending on its content
(calling `IsMarkBufferDirtyIndexHintAllowed`).
For primary - it is always safe to set LP_DEAD in index if `all_dead` ==
true.
In the case of standby, we need to check `latest_removed_xid` (if
available) first. If the commit LSN of the latest removed xid is already
lower than minRecoveryPoint (`XLogNeedsFlush`), it is safe to set
`kill_prior_tuple`.
Sometimes we are not sure about the latest removed xid - the heap record could
have been marked dead by an XLOG_HEAP2_CLEAN record, for example. In such a case
we check the LSN of the *heap* page containing the tuple (the LSN could have
been updated by other transactions already - but it does not matter in that
situation). If the page LSN is lower than minRecoveryPoint, it is safe to set
LP_DEAD in the index too. Otherwise, just leave the index tuple alive.
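That decision can be modeled with a small self-contained sketch (names and the raw LSN comparison are invented for illustration; the real code would use XLogNeedsFlush() against the commit record's LSN instead):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for Postgres types. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * On standby, setting LP_DEAD is safe only if the relevant LSN is already
 * covered by minRecoveryPoint, so the hint cannot "come from the future"
 * after crash recovery.  Prefer the commit LSN of the removing transaction;
 * if it is unknown (e.g. the tuple was pruned by XLOG_HEAP2_CLEAN), fall
 * back to the heap page LSN.
 */
static bool
index_lp_dead_allowed(XLogRecPtr commit_lsn_of_remover,  /* may be invalid */
                      XLogRecPtr heap_page_lsn,
                      XLogRecPtr min_recovery_point)
{
    XLogRecPtr lsn = (commit_lsn_of_remover != InvalidXLogRecPtr)
        ? commit_lsn_of_remover
        : heap_page_lsn;

    return lsn <= min_recovery_point;
}
```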
So, to bring it all together:
* Normal operation, proc->indexIgnoreKilledTuples is true:
It is safe for standby to use hint bits from the primary FPI because
of XLOG_INDEX_HINT_BITS_HORIZON conflict resolution.
It is safe for standby to set its index hint bits because
`ComputeXidHorizons` honors other read-only procs xmin and lowest xid on
primary (`KnownAssignedXidsGetOldestXmin`).
* Normal operation, proc->indexIgnoreKilledTuples is false:
Index hint bits are never set or taken into account.
* Crash recovery, proc->indexIgnoreKilledTuples is true:
It is safe for standby to use hint bits from the primary FPW because
XLOG_INDEX_HINT_BITS_HORIZON is always logged before the FPI, and the commit
record of the transaction that removed the tuple is logged before
XLOG_INDEX_HINT_BITS_HORIZON. So, if an FPI with hints was flushed (and taken
into account by minRecoveryPoint), both the transaction-remover and the
horizon records are replayed before the reading queries.
It is safe for standby to use its own hint bits because they can be set
only if the commit record of the transaction-remover is lower than
minRecoveryPoint or the LSN of the heap page with the removed tuples is lower
than minRecoveryPoint.
* Crash recovery, proc->indexIgnoreKilledTuples is false:
Index hint bits are never set or taken into account.
So, now it seems correct to me.
Another interesting point here - the position of minRecoveryPoint now affects
performance a lot. It happens already (because of *heap* hint bits),
but after the patch it is even more noticeable. Is there any reason to keep
minRecoveryPoint at a low value?
Rebased and updated patch in attachment.
Will be happy if someone could recheck my ideas or even the code :)
Thanks a lot,
Michail.
Hello, everyone.
After some correspondence with Peter Geoghegan (1) and his ideas, I
have reworked the patch a lot and now it is much more simple with even
better performance (no new WAL or conflict resolution, hot standby
feedback is unrelated).
The idea is pretty simple now: let's mark pages whose LP_DEAD hints are
"standby-safe" with a bit in btpo_flags (BTP_LP_SAFE_ON_STANDBY, and
similar flags for gist and hash).
If the standby wants to set LP_DEAD, it checks BTP_LP_SAFE_ON_STANDBY on
the page first; if it is not set, all "primary" hints are removed
first, and then the flag is set (with a memory barrier to avoid memory
ordering issues in concurrent scans).
Also, the standby checks BTP_LP_SAFE_ON_STANDBY before trusting
tuples marked LP_DEAD during a scan.
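The flag protocol can be sketched as a minimal self-contained model (the flag value, ModelPage, and function names are invented for illustration; the real code would operate on actual index pages and issue pg_memory_barrier()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; the real patch picks a free btpo_flags bit. */
#define BTP_LP_SAFE_ON_STANDBY 0x0400

typedef struct
{
    uint16_t btpo_flags;
    uint16_t dead_items;        /* bitmask standing in for per-item LP_DEAD */
} ModelPage;

/*
 * Standby wants to kill items: if the page still carries untrusted
 * "primary" hints, wipe them first and claim the page as standby-safe.
 */
static void
standby_mark_items_dead(ModelPage *page, uint16_t new_dead)
{
    if (!(page->btpo_flags & BTP_LP_SAFE_ON_STANDBY))
    {
        page->dead_items = 0;   /* discard primary LP_DEAD bits */
        page->btpo_flags |= BTP_LP_SAFE_ON_STANDBY;
    }
    page->dead_items |= new_dead;
}

/* Standby scan: LP_DEAD bits may be trusted only on standby-safe pages. */
static bool
standby_may_ignore_killed(const ModelPage *page)
{
    return (page->btpo_flags & BTP_LP_SAFE_ON_STANDBY) != 0;
}
```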
Of course, it is not so easy. If the standby was promoted (or the primary was
restored from a standby backup), it is still possible to receive an FPI
with such a flag set in the WAL. So, the main problem is still there.
But we can just clear this flag while applying the FPI, because the page
remains dirty after that anyway! It should not cause any checksum,
consistency, or pg_rewind issues, as explained in (2).
Semantically it is the same as setting the hint bit one millisecond after the
FPI was applied (while the page still remains dirty after FPI replay) - and
standby already does it with *heap* hint bits.
Also, the TAP test attached to (2) shows how easy it already is to flush a
hint bit set by the standby and end up with a checksum different from
the primary's.
If standby was promoted (or restored from standby backup) it is safe
to use LP_DEAD with or without BTP_LP_SAFE_ON_STANDBY on a page. But
for accuracy BTP_LP_SAFE_ON_STANDBY is cleared by primary if found.
Also, we should take into account minRecoveryPoint as described in (3)
to avoid consistency issues during crash recovery (see
IsIndexLpDeadAllowed).
Also, as far as I know, there is no practical reason to keep
minRecoveryPoint at a low value. So, there is an optional patch that
moves minRecoveryPoint forward on each xl_running_xacts record (to allow the
standby to set hint bits and LP_DEADs more aggressively). That is about
every 15 s.
There are some graphics showing performance testing results on my PC
in the attachment (test is taken from (4)). Each test was running for
10 minutes.
Additional primary performance is probably just measurement error. But
standby performance gain is huge.
Feel free to ask if you need more proof about correctness.
Thanks,
Michail.
[1]: /messages/by-id/CAH2-Wz=-BoaKgkN-MnKj6hFwO1BOJSA+yLMMO+LRZK932fNUXA@mail.gmail.com
[2]: /messages/by-id/CANtu0oiAtteJ+MpPonBg6WfEsJCKrxuLK15P6GsaGDcYGjefVQ@mail.gmail.com
[3]: /messages/by-id/CANtu0oh28mX5gy5jburH+n1mcczK5_dCQnhbBnCM=Pfqh-A26Q@mail.gmail.com
[4]: /messages/by-id/CANtu0oiP18H31dSaEzn0B0rW6tA_q1G7=9Y92+US_WHGOoQevg@mail.gmail.com
I'm trying to review the patch, but not sure if I understand this problem,
please see my comment below.
Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
Oh, I just realized that it seems like I was too naive to allow
standby to set LP_DEAD bits this way.
There is a possible consistency problem in the case of low
minRecoveryPoint value (because hint bits do not move PageLSN
forward). Something like this:
LSN=10 STANDBY INSERTS NEW ROW IN INDEX (index_lsn=10)
<-----------minRecoveryPoint will go here
LSN=20 STANDBY DELETES ROW FROM HEAP, INDEX UNTOUCHED (index_lsn=10)
Why doesn't minRecoveryPoint get updated to 20? IMO that should happen by
replaying the commit record. And if the standby happens to crash before the
commit record could be replayed, no query should see the deletion and thus no
hint bit should be set in the index.
REPLICA SCANS INDEX AND SET hint bits (index_lsn=10)
INDEX IS FLUSHED (minRecoveryPoint=index_lsn=10)
CRASH
On crash recovery, a standby will be able to handle queries after
LSN=10. But the index page contains hint bits from the future
(LSN=20).
So, need to think here.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello, Antonin.
I'm trying to review the patch, but not sure if I understand this problem,
please see my comment below.
Thanks a lot for your attention. It is strongly recommended to look at
version N3 (1) because it is a much more elegant, easy, and reliable
solution :) But the minRecoveryPoint-related issue affects it anyway.
Why doesn't minRecoveryPoint get updated to 20? IMO that should happen by
replaying the commit record. And if the standby happens to crash before the
commit record could be replayed, no query should see the deletion and thus no
hint bit should be set in the index.
minRecoveryPoint is not affected by replaying the commit record in
most cases. It is updated in a lazy way, something like this:
minRecoveryPoint = max LSN of flushed pages. Version 3 of the patch
contains a code_optional.patch to move minRecoveryPoint more
aggressively to get additional performance on standby (based on
Peter's answer in (2)).
So, “minRecoveryPoint will go here” is not because of “STANDBY INSERTS
NEW ROW IN INDEX”; it is just a random event.
Thanks,
Michail.
[1]: /messages/by-id/CANtu0ohHu1r1xQfTzEJuxeaOMYncG7xRxUQWdH=cMXZSf+nzvg@mail.gmail.com
[2]: /messages/by-id/CAH2-WzkSUcuFukhJdSxHFgtL6zEQgNhgOzNBiTbP_4u=k6igAg@mail.gmail.com (“Also, btw, do you know any reason to keep minRecoveryPoint at a low value?”)
(“Also, btw, do you know any reason to keep minRecoveryPoint at a low
value?”)
Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
Hello, Antonin.
I'm trying to review the patch, but not sure if I understand this problem,
please see my comment below.
Thanks a lot for your attention. It is strongly recommended to look at
version N3 (1) because it is a much more elegant, easy, and reliable
solution :) But the minRecoveryPoint-related issue affects it anyway.
Indeed I'm reviewing (1), but I wanted to discuss this particular question in
context, so I replied here.
Why doesn't minRecoveryPoint get updated to 20? IMO that should happen by
replaying the commit record. And if the standby happens to crash before the
commit record could be replayed, no query should see the deletion and thus no
hint bit should be set in the index.
minRecoveryPoint is not affected by replaying the commit record in
most cases. It is updated in a lazy way, something like this:
minRecoveryPoint = max LSN of flushed pages. Version 3 of the patch
contains a code_optional.patch to move minRecoveryPoint more
aggressively to get additional performance on standby (based on
Peter's answer in (2)).
So, “minRecoveryPoint will go here” is not because of “STANDBY INSERTS
NEW ROW IN INDEX” it is just a random event.
Michail.
Sorry, I missed the fact that your example can be executed inside BEGIN - END
block, in which case minRecoveryPoint won't advance after each command.
I'll continue my review by replying to (1)
[1]: /messages/by-id/CANtu0ohHu1r1xQfTzEJuxeaOMYncG7xRxUQWdH=cMXZSf+nzvg@mail.gmail.com
[2]: /messages/by-id/CAH2-WzkSUcuFukhJdSxHFgtL6zEQgNhgOzNBiTbP_4u=k6igAg@mail.gmail.com
(“Also, btw, do you know any reason to keep minRecoveryPoint at a low
value?”)
I'm not an expert in this area (I'm reviewing this patch also to learn more
about recovery and replication), but after a brief research I think that
postgres tries not to update the control file too frequently, see comments in
UpdateMinRecoveryPoint(). I don't know if what you do in code_optional.patch
would be a problem. Actually I think that a commit record should be replayed
more often than XLOG_RUNNING_XACTS, shouldn't it?
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello,
Antonin.
Sorry, I missed the fact that your example can be executed inside BEGIN - END
block, in which case minRecoveryPoint won't advance after each command.
No, the block is not executed as a single transaction; all commands
are separate transactions (see below)
Actually I think that a commit record should be replayed
more often than XLOG_RUNNING_XACTS, shouldn't it?
Yes, but replaying commit records DOES NOT affect minRecoveryPoint in
almost all cases.
UpdateMinRecoveryPoint is called by XLogFlush, but xact_redo_commit
calls XLogFlush only in two cases:
* DropRelationFiles is called (some relations are dropped)
* ForceSyncCommit was used on the primary - a few “heavy” commands, like
DropTableSpace, CreateTableSpace, movedb, etc.
But a “regular” commit record is replayed without XLogFlush and, as a
result, without UpdateMinRecoveryPoint.
So, in practice, minRecoveryPoint is updated in an “async” way
by the checkpointer. This is why it makes sense to update it on
XLOG_RUNNING_XACTS.
Thanks,
Michail.
Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
Sorry, I missed the fact that your example can be executed inside BEGIN - END
block, in which case minRecoveryPoint won't advance after each command.
No, the block is not executed as a single transaction, all commands
are separated transactions (see below)
Actually I think that a commit record should be replayed
more often than XLOG_RUNNING_XACTS, shouldn't it?
Yes, but replaying commit records DOES NOT affect minRecoveryPoint in
almost all cases.
UpdateMinRecoveryPoint is called by XLogFlush, but xact_redo_commit
calls XLogFlush only in two cases:
* DropRelationFiles is called (some relation are dropped)
* If ForceSyncCommit was used on primary - few “heavy” commands, like
DropTableSpace, CreateTableSpace, movedb, etc.
But “regular” commit record is replayed without XLogFlush and, as
result, without UpdateMinRecoveryPoint.
result, without UpdateMinRecoveryPoint.
ok, I missed this. Thanks for the explanation.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
After some correspondence with Peter Geoghegan (1) and his ideas, I
have reworked the patch a lot and now it is much more simple with even
better performance (no new WAL or conflict resolution, hot standby
feedback is unrelated).
My review that started in [1] continues here.
(Please note that code.patch does not apply to the current master branch.)
I think I understand your approach now and couldn't find a problem by reading
the code. What I consider worth improving is documentation, both code comments
and nbtree/README. Especially for the problem discussed in [1] it should be
explained what would happen if kill_prior_tuple_min_lsn was not checked.
Also, in IsIndexLpDeadAllowed() you say that invalid
deadness->latest_removed_xid means the following:
/*
* Looks like it is tuple cleared by heap_page_prune_execute,
* we must be sure if LSN of XLOG_HEAP2_CLEAN (or any subsequent
* updates) less than minRecoveryPoint to avoid MVCC failure
* after crash recovery.
*/
However I think there's one more case: if heap_hot_search_buffer() considers
all tuples in the chain to be "surely dead", but
HeapTupleHeaderAdvanceLatestRemovedXid() skips them all for this reason:
/*
* Ignore tuples inserted by an aborted transaction or if the tuple was
* updated/deleted by the inserting transaction.
*
* Look for a committed hint bit, or if no xmin bit is set, check clog.
*/
I think that the dead tuples produced this way should never be visible on the
standby (and even if they were, they would change the page LSN so your
algorithm would treat them correctly) so I see no correctness problem. But it
might be worth explaining better the meaning of invalid "latest_removed_xid"
in comments.
In the nbtree/README, you say
"... if the commit record of latestRemovedXid is more ..."
but it's not clear to me what "latestRemovedXid" is. If you mean the
scan->kill_prior_tuple_min_lsn field, you probably need more words to explain
it.
* IsIndexLpDeadAllowed()
/* It all always allowed on primary if *all_dead. */
should probably be
/* It is always allowed on primary if *all_dead. */
* gistkillitems()
As the function is only called if (so->numKilled > 0), I think both
"killedsomething" and "dirty" variables should always have the same value, so
one variable should be enough. Assert(so->numKilled) would be appropriate in
that case.
The situation is similar for btree and hash indexes.
doc.patch:
"+applying the fill page write."
[1]: /messages/by-id/61470.1620647290@antos
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello,
Antonin.
My review that started in [1] continues here.
Thanks a lot for the review.
(Please note that code.patch does not apply to the current master branch.)
Rebased.
Especially for the problem discussed in [1] it should be
explained what would happen if kill_prior_tuple_min_lsn was not checked.
Updated the README, hope it is better now. Also, added a few details related
to the flushing of hint bits.
However I think there's one more case: if heap_hot_search_buffer() considers
all tuples in the chain to be "surely dead", but
HeapTupleHeaderAdvanceLatestRemovedXid() skips them all for this reason:
Yes, good catch, missed it.
I think that the dead tuples produced this way should never be visible on the
standby (and even if they were, they would change the page LSN so your
algorithm would treat them correctly) so I see no correctness problem. But it
might be worth explaining better the meaning of invalid "latest_removed_xid"
in comments.
Added additional comment.
but it's not clear to me what "latestRemovedXid" is. If you mean the
scan->kill_prior_tuple_min_lsn field, you probably need more words to explain
it.
Hope it is better now.
should probably be
/* It is always allowed on primary if *all_dead. */
Fixed.
As the function is only called if (so->numKilled > 0), I think both
"killedsomething" and "dirty" variables should always have the same value, so
one variable should be enough. Assert(so->numKilled) would be appropriate in
that case.
Fixed, but only partly, because I have added additional checks for a
long transaction in the case of a promoted server.
"+applying the fill page write."
Fixed.
Updated version in attach.
Thanks a lot,
Michail.
Attachments:
v2-0001-code.patch
v2-0002-code-optional.patch
v2-0003-test.patch
v2-0004-doc.patch
Hello.
Added a check for standby promotion with the long transaction to the
test (code and docs are unchanged).
Thanks,
Michail.
Attachments:
v3-0001-code.patch
v3-0002-code-optional.patch
v3-0003-test.patch
v3-0004-docs.patch
Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
Hello.
Added a check for standby promotion with the long transaction to the
test (code and docs are unchanged).
I'm trying to continue the review, sorry for the delay. Following are a few
questions about the code:
* Does the masking need to happen in the AM code, e.g. _bt_killitems()? I'd
expect that the RmgrData.rm_fpi_mask can do all the work.
Maybe you're concerned about clearing the "LP-safe-on-standby" bits after
promotion, but I wouldn't consider this a problem: once the standby is
allowed to set the hint bits (i.e. minRecoveryPoint is high enough, see
IsIndexLpDeadAllowed() -> XLogNeedsFlush()), promotion shouldn't break
anything because it should not allow minRecoveryPoint to go backwards.
* How about modifying rm_mask() instead of introducing rm_fpi_mask()? Perhaps
a boolean argument can be added to distinguish the purpose of the masking.
* Are you sure it's o.k. to use mask_lp_flags() here? It sets the item flags
to LP_UNUSED unconditionally, which IMO should only be done by VACUUM. I
think you only need to revert the effect of prior ItemIdMarkDead(), so you
only need to change the status LP_DEAD to LP_NORMAL if the tuple still has
storage. (And maybe add an assertion to ItemIdMarkDead() confirming that
it's only used for LP_NORMAL items?)
As far as I understand, the current code only uses mask_lp_flags() during
WAL consistency check on copies of pages which don't eventually get written
to disk.
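The suggestion above can be sketched as a minimal self-contained model (ModelItemId is an invented stand-in for Postgres' ItemIdData; the LP_* state values match itemid.h):

```c
#include <assert.h>

/* Line-pointer states, as in Postgres' itemid.h. */
#define LP_UNUSED   0           /* unused, should have lp_len = 0 */
#define LP_NORMAL   1           /* used, has storage */
#define LP_REDIRECT 2           /* HOT redirect */
#define LP_DEAD     3           /* dead, may or may not have storage */

/* Simplified stand-in for ItemIdData (lp_off omitted). */
typedef struct
{
    unsigned lp_flags : 2;
    unsigned lp_len   : 15;
} ModelItemId;

/*
 * Masking should only undo a prior ItemIdMarkDead(): LP_DEAD with storage
 * becomes LP_NORMAL again.  It must never produce LP_UNUSED - reclaiming
 * the slot is VACUUM's job.
 */
static void
mask_lp_dead(ModelItemId *itemid)
{
    if (itemid->lp_flags == LP_DEAD && itemid->lp_len != 0)
        itemid->lp_flags = LP_NORMAL;
}
```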
* IsIndexLpDeadAllowed()
** is bufmgr.c the best location for this function?
** the header comment should explain the minLsn argument.
** comment
/* It is always allowed on primary if *all_dead. */
should probably be
/* It is always allowed on primary if ->all_dead. */
* comment: XLOG_HEAP2_CLEAN has been renamed to XLOG_HEAP2_PRUNE in PG14.
On regression tests:
* Is the purpose of the repeatable read (RR) snapshot to test that
heap_hot_search_buffer() does not set deadness->all_dead if some transaction
can still see a tuple of the chain? If so, I think the RR snapshot does not
have to be used in the tests because this patch does not really affect the
logic: heap_hot_search_buffer() only sets deadness->all_dead to false, just
like it sets *all_dead in the current code. Besides that,
IsIndexLpDeadAllowed() too can avoid setting of the LP_DEAD flag on an index
tuple (at least until the commit record of the deleting/updating transaction
gets flushed to disk), so it can hide the behaviour of
heap_hot_search_buffer().
* Unless I miss something, the tests check that the hint bits are not
propagated from primary (or they are propagated but marked non-safe),
however there's no test to check that standby does set the hint bits itself.
* I'm also not sure if promotion needs to be tested. What's specific about the
promoted cluster from the point of view of this feature? The only thing I
can think of is clearing of the "LP-safe-on-standby" bits, but, as I said
above, I'm not sure if the tests ever let standby to set those bits before
the promotion.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello, Antonin.
I'm trying to continue the review, sorry for the delay. Following are a few
question about the code:
Thanks for the review :) And sorry for the delay too :)
* Does the masking need to happen in the AM code, e.g. _bt_killitems()?
I'd expect that the RmgrData.rm_fpi_mask can do all the work.
RmgrData.rm_fpi_mask clears only the single BTP_LP_SAFE_ON_STANDBY bit
to indicate that the hint bits are not safe to be used on standby.
Why not clear the LP_DEAD bits in rm_fpi_mask? Because we could
get such bits in multiple ways:
* the standby was created from a base backup of the primary
* some pages were changed by pg_rewind
* the standby was upgraded to a version having this feature (so old
pages still contain LP_DEAD)
So, the AM code needs to know when and why to clear the LP_DEAD bits if
BTP_LP_SAFE_ON_STANDBY is not set.
Also, the important moment here is pg_memory_barrier() usage.
* How about modifying rm_mask() instead of introducing rm_fpi_mask()? Perhaps
a boolean argument can be added to distinguish the purpose of the masking.
I have tried this way, but the code looked dirty and complicated.
Also, the separate rm_fpi_mask provides some semantics to the function.
* Are you sure it's o.k. to use mask_lp_flags() here? It sets the item flags
to LP_UNUSED unconditionally, which IMO should only be done by VACUUM.
Oh, good catch. I made mask_lp_dead for this. Also, added such a
situation to the test.
** is bufmgr.c the best location for this function?
Moved to indexam.c and made static (is_index_lp_dead_allowed).
should probably be
/* It is always allowed on primary if ->all_dead. */
Fixed.
* comment: XLOG_HEAP2_CLEAN has been renamed to XLOG_HEAP2_PRUNE in PG14.
Fixed.
* Is the purpose of the repeatable read (RR) snapshot to test that
heap_hot_search_buffer() does not set deadness->all_dead if some transaction
can still see a tuple of the chain?
The main purpose is to test the xactStartedInRecovery logic after the promotion.
For example:
if (scan->xactStartedInRecovery && !RecoveryInProgress())
* Unless I miss something, the tests check that the hint bits are not
propagated from primary (or they are propagated but marked non-safe),
however there's no test to check that standby does set the hint bits itself.
It is tested on a different standby, see
is(hints_num($node_standby_2), qq(10), 'index hint bits already
set on second standby 2');
Also, I added checks for BTP_LP_SAFE_ON_STANDBY to make sure
everything in the test goes by scenario.
Thanks a lot,
Michail.
Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
* Is the purpose of the repeatable read (RR) snapshot to test that
heap_hot_search_buffer() does not set deadness->all_dead if some transaction
can still see a tuple of the chain?
The main purpose is to test xactStartedInRecovery logic after the promotion.
For example -
if (scan->xactStartedInRecovery && !RecoveryInProgress())
I understand that the RR snapshot is used to check the MVCC behaviour, however
this comment seems to indicate that the RR snapshot should also prevent the
standby from setting the hint bits.
# Make sure previous queries not set the hints on standby because
# of RR snapshot
I can imagine that on the primary, but I don't think that the backend that
checks visibility on standby also checks other snapshots/backends. And it
didn't work when I ran the test manually, although I could have missed
something.
A few more notes regarding the tests:
* 026_standby_index_lp_dead.pl should probably be renamed to
027_standby_index_lp_dead.pl (026_* was created in the master branch
recently)
* The test fails, although I have configured the build with
--enable-tap-tests.
BEGIN failed--compilation aborted at t/026_standby_index_lp_dead.pl line 5.
t/026_standby_index_lp_dead.pl .. Dubious, test returned 2 (wstat 512, 0x200)
I suspect the testing infrastructure changed recently.
* The messages like this
is(hints_num($node_standby_1), qq(10),
'hints are set on standby1 because FPI but marked as non-safe');
say that the hints are "marked as non-safe", but the hints_num() function
does not seem to check that.
* wording:
is(hints_num($node_standby_2), qq(10), 'index hint bits already set on second standby 2');
->
is(hints_num($node_standby_2), qq(10), 'index hint bits already set on standby 2');
And a few more notes on the code:
* There's an extra semicolon in mask_lp_dead():
bufmask.c:148:38: warning: for loop has empty body [-Wempty-body]
offnum = OffsetNumberNext(offnum));
^
bufmask.c:148:38: note: put the semicolon on a separate line to silence this warning
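For readers unfamiliar with this warning: a stray semicolon after a for-loop header silently turns the intended body into a no-op. A minimal standalone reproduction of the bug class (not the actual bufmask.c code):

```c
#include <assert.h>

/* BUG: the trailing ';' on the for header is the loop's (empty) body;
 * the block below it executes exactly once, after the loop finishes. */
int
count_with_bug(int n)
{
    int         count = 0;
    int         i;

    for (i = 0; i < n; i++);
    {
        count++;
    }
    return count;               /* always 1, regardless of n */
}

/* Fixed: the increment is the actual loop body. */
int
count_fixed(int n)
{
    int         count = 0;

    for (int i = 0; i < n; i++)
        count++;
    return count;
}
```

This is exactly what -Wempty-body catches, which is why the compiler suggests putting the semicolon on a separate line when an empty body is intentional.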
* the header comment of heap_hot_search_buffer() still says "*all_dead"
whereas I'd expect "->all_dead".
The same for "*page_lsn".
* I can see no test for the INDEX_LP_DEAD_OK_MIN_LSN value of the
IndexLpDeadAllowedResult enumeration. Shouldn't there be only two values,
e.g. INDEX_LP_DEAD_OK and INDEX_LP_DEAD_MAYBE_OK ? Or a boolean variable (in
index_fetch_heap()) of the appropriate name, e.g. kill_maybe_allowed, and
rename the function is_index_lp_dead_allowed() to
is_index_lp_dead_maybe_allowed()?
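The suggested refactoring can be sketched as follows; this is only an illustration of the proposed boolean shape, built around the promotion check quoted earlier in the thread, not the patch's actual code (the parameter names are hypothetical).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the suggested two-outcome design: instead of a
 * three-value enum, the function answers "may we possibly set LP_DEAD?",
 * and the caller keeps a kill_maybe_allowed flag for the LSN-dependent
 * case that still has to be resolved against the page LSN.
 */
static bool
is_index_lp_dead_maybe_allowed(bool xact_started_in_recovery,
                               bool recovery_in_progress)
{
    /*
     * The promotion case from the thread: a transaction that started
     * during recovery cannot trust hint-bit rules once recovery has
     * ended, so setting LP_DEAD is not allowed for it.
     */
    if (xact_started_in_recovery && !recovery_in_progress)
        return false;

    /* Otherwise it is at least "maybe" allowed. */
    return true;
}
```

The point of the rename is that a "maybe" answer honestly signals that the caller still owes a page-LSN check before killing items.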
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello, Antonin.
Thanks for pushing it forward.
I understand that the RR snapshot is used to check the MVCC behaviour, however
this comment seems to indicate that the RR snapshot should also prevent the
standby from setting the hint bits.
# Make sure previous queries not set the hints on standby because
# of RR snapshot
I can imagine that on the primary, but I don't think that the backend that
checks visibility on standby also checks other snapshots/backends. And it
didn't work when I ran the test manually, although I could have missed
something.
Yes, it does check - see ComputeXidHorizons for details. It is
the main part of the correctness of the whole feature. I added some
details about it to the test.
* 026_standby_index_lp_dead.pl should probably be renamed to
027_standby_index_lp_dead.pl (026_* was created in the master branch
recently)
Done.
BEGIN failed--compilation aborted at t/026_standby_index_lp_dead.pl line 5.
t/026_standby_index_lp_dead.pl .. Dubious, test returned 2 (wstat 512, 0x200)
Fixed.
* The messages like this
Fixed.
* There's an extra semicolon in mask_lp_dead():
Oh, it really was a serious error (the loop body was empty) :) Fixed.
* the header comment of heap_hot_search_buffer() still says "*all_dead"
whereas I'd expect "->all_dead".
The same for "*page_lsn".
I was trying to mimic the existing comment style (it has said “*tid”
since 2007). So, I think it is better to keep the whole function
comment in the same style.
* I can see no test for the INDEX_LP_DEAD_OK_MIN_LSN value of the
IndexLpDeadAllowedResult enumeration. Shouldn't there be only two values,
e.g. INDEX_LP_DEAD_OK and INDEX_LP_DEAD_MAYBE_OK ? Or a boolean variable (in
index_fetch_heap()) of the appropriate name, e.g. kill_maybe_allowed, and
rename the function is_index_lp_dead_allowed() to
is_index_lp_dead_maybe_allowed()?
Yes, it looks better this way. Done. Also, I have added some checks
for “maybe” LSN-related logic to the test.
Thanks a lot,
Michail.
Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
I understand that the RR snapshot is used to check the MVCC behaviour, however
this comment seems to indicate that the RR snapshot should also prevent the
standb from setting the hint bits.
# Make sure previous queries not set the hints on standby because
# of RR snapshot
I can imagine that on the primary, but I don't think that the backend that
checks visibility on standby does checks other snapshots/backends. And it
didn't work when I ran the test manually, although I could have missed
something.
Yes, it does check - see ComputeXidHorizons for details. It is
the main part of the correctness of the whole feature. I added some
details about it to the test.
Ah, ok. I thought that only KnownAssignedXids is used on standby, but that
would ignore the RR snapshot. It wasn't clear to me when the xmin of the
hot-standby backends is set, now I think it's done by GetSnapshotData().
* I can see no test for the INDEX_LP_DEAD_OK_MIN_LSN value of the
IndexLpDeadAllowedResult enumeration. Shouldn't there be only two values,
e.g. INDEX_LP_DEAD_OK and INDEX_LP_DEAD_MAYBE_OK ? Or a boolean variable (in
index_fetch_heap()) of the appropriate name, e.g. kill_maybe_allowed, and
rename the function is_index_lp_dead_allowed() to
is_index_lp_dead_maybe_allowed()?Yes, this way it is looks better. Done. Also, I have added some checks
for “maybe” LSN-related logic to the test.
Attached is a proposal for a minor addition that would make sense to me, add
it if you think it's appropriate.
I think I've said enough, changing the status to "ready for committer" :-)
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Attachments:
test_proposal.patch (text/x-diff, +2/-1)
I have changed approach, so it is better to start from this email:
/messages/by-id/CANtu0ohHu1r1xQfTzEJuxeaOMYncG7xRxUQWdH=cMXZSf+nzvg@mail.gmail.com
Woo-hoo :)
Attached is a proposal for a minor addition that would make sense to me, add
it if you think it's appropriate.
Yes, I'll add to the patch.
I think I've said enough, changing the status to "ready for committer" :-)
Thanks a lot for your help and attention!
Best regards,
Michail.
Hello.
Attached is a proposal for a minor addition that would make sense to me, add
it if you think it's appropriate.
Added. Also, I updated the documentation a little.
I have changed approach, so it is better to start from this email:
Oops, I was thinking the comments feature in the commitfest app works
in a different way :)
Best regards,
Michail.