Re: Can we change pg_rewind so it can be used without wal_log_hints and data_checksums?

Started by lchch1990@sina.cn 3 months ago, 4 messages, hackers
#1 lchch1990@sina.cn
lchch1990@sina.cn

> See /messages/by-id/CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com
> In more detail:
> 1. there is a transaction open on the primary server (server A)
> 2. the transaction inserts a row
> 3. a checkpoint happens
> 4. the transaction commits
> 5. the session reads the row it just inserted, which sets hint bits on the row
> that mark it as generally visible
> Now suppose the standby (server B) was promoted between steps 3 and 4, which means that on server B
> (the new primary), the transaction didn't commit and the row is invisible.
> Now if we run pg_rewind on server A, it examines the local WAL to find all the blocks
> that were modified after the last common checkpoint (which happened in step 3 above).
> If neither wal_log_hints = on nor checksums are enabled (which effectively forces
> WAL-logging hint bit changes), there is no trace of step 5 in the WAL, and pg_rewind
> fails to copy that block from server B. The consequence is that after pg_rewind, the
> row is *still* visible on server A because of the hint bits. That is data corruption.
> Therefore, the requirement cannot be relaxed.
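
A toy model of the scenario above, as a sketch under stated assumptions: pg_rewind is assumed to choose the blocks to copy solely from WAL records at or after the last common checkpoint, and each record is reduced to an (LSN, block) pair. ToyWalRecord, toy_block_is_copied(), the LSN values and block 7 are all hypothetical illustrations, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Marks a record that modifies no data page, such as a commit record. */
#define TOY_NO_BLOCK UINT32_MAX

/* Hypothetical toy WAL record: where it was written, which block it touches. */
typedef struct
{
    uint64_t lsn;
    uint32_t block;
} ToyWalRecord;

/* Server A's WAL for steps 1-5, with the inserted row in block 7.
 * Without wal_log_hints or checksums, step 5 writes no record at all. */
static const ToyWalRecord wal_without_hints[] = {
    {100, 7},            /* step 2: heap insert into block 7 */
    {200, TOY_NO_BLOCK}, /* step 3: checkpoint, the last common one */
    {300, TOY_NO_BLOCK}, /* step 4: commit record */
};

/* Same history with wal_log_hints = on: the first hint-bit change to block 7
 * after the checkpoint emits a full-page image (XLOG_FPI_FOR_HINT). */
static const ToyWalRecord wal_with_hints[] = {
    {100, 7},
    {200, TOY_NO_BLOCK},
    {300, TOY_NO_BLOCK},
    {310, 7},            /* step 5: hint bits, now WAL-logged */
};

/* True if a scan of the WAL from start_lsn onward sees a record touching
 * `block`, i.e. the toy pg_rewind would copy that block from server B. */
static bool
toy_block_is_copied(const ToyWalRecord *wal, size_t nrecords,
                    uint64_t start_lsn, uint32_t block)
{
    assert(wal != NULL || nrecords == 0);

    for (size_t i = 0; i < nrecords; i++)
    {
        if (wal[i].lsn >= start_lsn && wal[i].block == block)
            return true;
    }
    return false;
}
```

Scanning from LSN 200 (the step 3 checkpoint), toy_block_is_copied() returns false for wal_without_hints and true for wal_with_hints, so only the second history gets block 7 replaced with server B's copy. Starting the scan at LSN 100, before the transaction's first change, returns true even without the hint-bit record, which is the intuition behind scanning further back.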

Yes, I know these steps, and I have checked the linked mail. As described in the original mail, we can find a way to solve the problem so that pg_rewind can run without wal_log_hints and data_checksums.

Currently pg_rewind scans the WAL starting at the checkpoint LSN or the redo LSN. My idea is to scan further back in the WAL so that it covers the related transactions entirely; then every related page is copied, and we no longer need to worry about the hint bits issue.

Anyway, I hope this mail is in the right format; my last reply to Michael arrived out of order and went missing from this thread.
---Movead Li

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: lchch1990@sina.cn (#1)

On Thu, 2026-01-15 at 14:14 +0800, lchch1990@sina.cn wrote:

>> See /messages/by-id/CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com

> Yes, I know these steps, and I have checked the linked mail.
> As described in the original mail, we can find a way to solve the problem so that
> pg_rewind can run without wal_log_hints and data_checksums.
>
> Currently pg_rewind scans the WAL starting at the checkpoint LSN or the redo LSN. My idea is
> to scan further back in the WAL so that it covers the related transactions entirely; then
> every related page is copied, and we no longer need to worry about the hint bits issue.

I apologize for my misunderstanding.

I had a brief look at the patch, and the gratuitous use of static variables didn't
appeal to me. Can you briefly describe the algorithm? You look at all commit
records *after* the fork, right? Then how can you identify how far back you have to
rewind? How can you identify when a transaction started?

Yours,
Laurenz Albe

#3 lchch1990@sina.cn
lchch1990@sina.cn
In reply to: lchch1990@sina.cn (#1)

On Fri, 2026-01-16 at 03:08 +0800, laurenz.albe@cybertec.at wrote:

> I had a brief look at the patch, and the gratuitous use of static variables didn't
> appeal to me.

Sorry about that; I can find a cleaner approach if the design itself is acceptable.

> Can you briefly describe the algorithm?

The algorithm is described in the mail that introduced the patch; here is a summary:

During a forward WAL walk, the system collects the smallest transaction ID among the
commit records it encounters. Once it finds where this transaction ID was assigned, a safe
rewind can be performed.
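
The two steps of the summary above can be sketched roughly as follows. This is not the patch's code but a simplified model under several assumptions: the WAL is an in-memory array ordered by LSN, records are reduced to three hypothetical kinds, XID wraparound is ignored, and all names (ToyRecord, toy_rewind_start_lsn() and so on) are made up for illustration. TOY_INVALID_LSN stands for "no XLOG_RUNNING_XACTS-like record shows where the oldest XID was assigned", in which case the scan would have to dig further back or give up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Like PostgreSQL's InvalidXLogRecPtr: LSN 0 means "no answer". */
#define TOY_INVALID_LSN ((uint64_t) 0)

/* Hypothetical record kinds; only what the sketch needs. */
typedef enum
{
    TOY_REC_OTHER,          /* any other record */
    TOY_REC_COMMIT,         /* commit record; xid is the committing XID */
    TOY_REC_RUNNING_XACTS   /* modeled on XLOG_RUNNING_XACTS; next_xid set */
} ToyRecKind;

typedef struct
{
    uint64_t   lsn;
    ToyRecKind kind;
    uint32_t   xid;         /* meaningful for TOY_REC_COMMIT */
    uint32_t   next_xid;    /* meaningful for TOY_REC_RUNNING_XACTS */
} ToyRecord;

/*
 * wal[] is ordered by LSN.  fork_lsn is where the histories diverged and
 * checkpoint_lsn is the usual start (the last common checkpoint).
 * Returns the LSN from which modified blocks should be collected.
 */
static uint64_t
toy_rewind_start_lsn(const ToyRecord *wal, size_t n,
                     uint64_t fork_lsn, uint64_t checkpoint_lsn)
{
    uint32_t min_xid = 0;
    bool     have_commit = false;
    uint64_t assigned_after = TOY_INVALID_LSN;

    assert(wal != NULL || n == 0);
    assert(checkpoint_lsn <= fork_lsn);

    /* Pass 1: forward walk after the fork, keeping the smallest committed XID. */
    for (size_t i = 0; i < n; i++)
    {
        if (wal[i].lsn > fork_lsn && wal[i].kind == TOY_REC_COMMIT &&
            (!have_commit || wal[i].xid < min_xid))
        {
            min_xid = wal[i].xid;
            have_commit = true;
        }
    }
    if (!have_commit)
        return checkpoint_lsn;          /* nothing to cover beyond the usual start */

    /*
     * Pass 2: find the latest running-xacts record up to the fork whose
     * next_xid <= min_xid.  Assumption: min_xid, and every larger XID, was
     * assigned after that record, so none of their changes precede it.
     */
    for (size_t i = 0; i < n && wal[i].lsn <= fork_lsn; i++)
    {
        if (wal[i].kind == TOY_REC_RUNNING_XACTS && wal[i].next_xid <= min_xid)
            assigned_after = wal[i].lsn;
    }
    if (assigned_after == TOY_INVALID_LSN)
        return TOY_INVALID_LSN;         /* would need to dig further back */

    /* Never start later than the usual starting point. */
    return assigned_after < checkpoint_lsn ? assigned_after : checkpoint_lsn;
}
```

For example, with a running-xacts record at LSN 90 carrying nextXid 731, the last common checkpoint at LSN 200, the fork at LSN 250 and XID 731 committing at LSN 300, the sketch returns 90, so blocks are collected from LSN 90 instead of 200.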

> You look at all commit records *after* the fork, right?

It looks at all records after the fork, not only commit records; I did not change that
part of the code logic. Maybe commit records alone would be enough.

On the other hand, my patch also collects all records *before* the fork, which may cause
many unnecessary block copies; I can fix that.

> Then how can you identify how far back you have to rewind?

That is the key point: the stopping point can only be found by walking the WAL.
This may take a long time, which is why I added a '-d, --deep-dig' option.

> How can you identify when a transaction started?

We find it through the XLOG_RUNNING_XACTS WAL record. This record contains a *nextXid*
field, which is the next transaction ID to be assigned at the time the record is written.
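
Deciding whether an XID was assigned after such a record comes down to comparing it with the record's nextXid, and XIDs wrap around at 2^32. A minimal sketch, with the comparison mirroring the logic of PostgreSQL's TransactionIdPrecedes(); the toy_ function names and the TOY_FIRST_NORMAL_XID constant (standing in for FirstNormalTransactionId) are illustrative, not PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for FirstNormalTransactionId: XIDs 0-2 are special. */
#define TOY_FIRST_NORMAL_XID ((uint32_t) 3)

/* "id1 is older than id2" under modulo-2^32 XID arithmetic,
 * mirroring the logic of PostgreSQL's TransactionIdPrecedes(). */
static bool
toy_xid_precedes(uint32_t id1, uint32_t id2)
{
    /* Special XIDs compare as plain unsigned integers. */
    if (id1 < TOY_FIRST_NORMAL_XID || id2 < TOY_FIRST_NORMAL_XID)
        return id1 < id2;

    /*
     * Normal XIDs: id1 precedes id2 when (id1 - id2) mod 2^32, read as a
     * signed 32-bit value, is negative, i.e. id1 is 1 to 2^31 XIDs behind.
     * The conversion to int32_t is implementation-defined in C but wraps on
     * mainstream compilers.
     */
    int32_t diff = (int32_t) (id1 - id2);
    return diff < 0;
}

/*
 * True if `xid` was assigned after a running-xacts record that carries
 * `next_xid`, the next XID to be handed out when the record was written.
 * next_xid itself counts as "after", since it was not yet assigned.
 */
static bool
toy_xid_assigned_after(uint32_t xid, uint32_t next_xid)
{
    assert(xid >= TOY_FIRST_NORMAL_XID && next_xid >= TOY_FIRST_NORMAL_XID);
    return !toy_xid_precedes(xid, next_xid);
}
```

For instance, toy_xid_assigned_after(10, 4294967000u) is true because XID 10 comes after the wraparound, while toy_xid_assigned_after(730, 731) is false. This relies on the assumption that no XID is handed out between reading nextXid and inserting the record.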

 

----
Best Regards,
Movead Li

 

#4 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: lchch1990@sina.cn (#3)

On Fri, 2026-01-16 at 08:04 +0800, Movead wrote:

>> How can you identify when a transaction started?

> We find it through the XLOG_RUNNING_XACTS WAL record. This record contains a *nextXid*
> field, which is the next transaction ID to be assigned at the time the record is written.

I see; thanks for the explanation.

Yours,
Laurenz Albe