checkpoint clarifications needed

Started by Tom DalPozzoover 9 years ago12 messagesgeneral
Jump to latest
#1Tom DalPozzo
t.dalpozzo@gmail.com

Hi, I need some clarifications about checkpoints.
Below here a log from my standby server when started and then some parts of
the interested WAL in the master's cluster obtained by pg_xlogdump.
Just to have an example to talk on.

1) I see: "LOG: redo starts at 1/F00A7448" . I was expecting a checkpoint
record around 1/F00A7448 but the related checkpoint record is at lsn:
1/FCBD7510 instead.
I expected that because I read in the doc :"Checkpoints are points in the
sequence of transactions at which it is guaranteed that the heap and index
data files have been updated with ALL information written BEFORE that
checkpoint".
And I interpreted that as "All information written before that checkpoint
RECORD" but now I guess that one thing is a checkpoint point and one thing
is a checkpoint record. Right?

2) I see that a checkpoint position can be right in the middle of a group
of records related to a transaction (in the example, transaction id 10684).
So a checkpoint position is NOT a consistency state point, right?

3) According to doc at 29.5, in pg_control the position of the last
checkpoint record (not the checkpoint position itself) is saved right?

4) If I'm right at 2) then, between the checkpoint position (1/F00A7448 )
and the checkpoint record position (1/FCBD7510) there must be a point where
the DB is in a consistency state. If not, in case of crash just after
writing the checkpoint record to the WAL and its position to pg_control,
the system would replay from the checkpoint position (known by last
checkpoint record) without finding a consistency state. Right?

5) How can we define, in terms of log records, a consistency state position?

Best regards
Pupillo

STANDBY SERVER LOG
LOG: redo starts at 1/F00A7448
....
LOG: consistent recovery state reached at 2/426DF28
LOG: invalid record length at 2/426DF28: wanted 24, got 0
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 2/4000000 on timeline 1

FROM PG_XLOGDUMP OF MASTER
........
rmgr: Heap len (rec/tot): 14/ 1186, tx: 10684, lsn:
1/F009EEE0, prev 1/F009EEA8, desc: UPDATE off 1 xmax 10684 ; new off 3 xmax
0, blkref #0: rel 1663/16384/16422 blk 20054, blkref #1: rel
1663/16384/16422 blk 19774
rmgr: Btree len (rec/tot): 2/ 64, tx: 10684, lsn:
1/F009F388, prev 1/F009EEE0, desc: INSERT_LEAF off 2, blkref #0: rel
1663/16384/16428 blk 711
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F009F3C8, prev 1/F009F388, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9663 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A13E8, prev 1/F009F3C8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9664 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A3408, prev 1/F00A13E8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9665 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A5428, prev 1/F00A3408, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9666 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A7448, prev 1/F00A5428, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9667 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A9468, prev 1/F00A7448, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9668 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00AB488, prev 1/F00A9468, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9669 FPW
rmgr: Heap len (rec/tot): 8/ 8063, tx: 10682, lsn:
1/F00AD4A8, prev 1/F00AB488, desc: LOCK off 5: xid 10682: flags 0 LOCK_ONLY
EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19787 FPW
rmgr: Heap len (rec/tot): 8/ 8063, tx: 10684, lsn:
1/F00AF440, prev 1/F00AD4A8, desc: LOCK off 2: xid 10684: flags 0 LOCK_ONLY
EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19774 FPW
rmgr: Heap len (rec/tot): 14/ 4657, tx: 10682, lsn:
1/F00B13D8, prev 1/F00AF440, desc: UPDATE off 5 xmax 10682 ; new off 4 xmax
0, blkref #0: rel 1663/16384/16422 blk 20075 FPW, blkref #1: rel
1663/16384/16422 blk 19787
rmgr: Heap len (rec/tot): 14/ 4657, tx: 10684, lsn:
1/F00B2628, prev 1/F00B13D8, desc: UPDATE off 2 xmax 10684 ; new off 4 xmax
0, blkref #0: rel 1663/16384/16422 blk 20054 FPW, blkref #1: rel
1663/16384/16422 blk 19774
rmgr: Btr
.......
rmgr: Heap2 len (rec/tot): 8/ 68, tx: 0, lsn:
1/FCBD7448, prev 1/FCBD7400, desc: CLEAN remxid 10903, blkref #0: rel
1663/16384/16422 blk 2001
rmgr: Heap2 len (rec/tot): 8/ 66, tx: 0, lsn:
1/FCBD7490, prev 1/FCBD7448, desc: CLEAN remxid 10903, blkref #0: rel
1663/16384/16422 blk 2003
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn:
1/FCBD74D8, prev 1/FCBD7490, desc: RUNNING_XACTS nextXid 10907
latestCompletedXid 10906 oldestRunningXid 10907
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn:
1/FCBD7510, prev 1/FCBD74D8, desc: CHECKPOINT_ONLINE redo 1/F00A7448; tli
1; prev tli 1; fpw true; xid 0:10685; oid 24665; multi 1; offset 0; oldest
xid 584 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp
xid: 0/0; oldest running xid 10682; online
rmgr: Heap len (rec/tot): 3/ 164, tx: 10907, lsn:
1/FCBD7580, prev 1/FCBD7510, desc: INSERT off 25, blkref #0: rel
1663/16384/16484 blk 16398
rmgr: Btree len (rec/tot): 2/ 64, tx: 10907, lsn:
1/FCBD7628, prev 1/FCBD7580, desc: INSERT_LEAF off 60, blkref #0: rel
1663/16384/16490 blk 2722
........

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom DalPozzo (#1)
Re: checkpoint clarifications needed

On 01/09/2017 06:14 AM, Tom DalPozzo wrote:

Hi, I need some clarifications about checkpoints.
Below here a log from my standby server when started and then some parts
of the interested WAL in the master's cluster obtained by pg_xlogdump.
Just to have an example to talk on.

1) I see: "LOG: redo starts at 1/F00A7448" . I was expecting a
checkpoint record around 1/F00A7448 but the related checkpoint record is
at lsn: 1/FCBD7510 instead.
I expected that because I read in the doc :"Checkpoints are points in
the sequence of transactions at which it is guaranteed that the heap and
index data files have been updated with ALL information written BEFORE
that checkpoint".
And I interpreted that as "All information written before that
checkpoint RECORD" but now I guess that one thing is a checkpoint point
and one thing is a checkpoint record. Right?

2) I see that a checkpoint position can be right in the middle of a
group of records related to a transaction (in the example, transaction
id 10684). So a checkpoint position is NOT a consistency state point, right?

3) According to doc at 29.5, in pg_control the position of the last
checkpoint record (not the checkpoint position itself) is saved right?

4) If I'm right at 2) then, between the checkpoint position (1/F00A7448
) and the checkpoint record position (1/FCBD7510) there must be a point
where the DB is in a consistency state. If not, in case of crash just
after writing the checkpoint record to the WAL and its position to
pg_control, the system would replay from the checkpoint position (known
by last checkpoint record) without finding a consistency state. Right?

5) How can we define, in terms of log records, a consistency state position?

https://www.postgresql.org/docs/9.5/static/wal-internals.html

"After a checkpoint has been made and the log flushed, the checkpoint's
position is saved in the file pg_control. Therefore, at the start of
recovery, the server first reads pg_control and then the checkpoint
record; then it performs the REDO operation by scanning forward from the
log position indicated in the checkpoint record. Because the entire
content of data pages is saved in the log on the first page modification
after a checkpoint (assuming full_page_writes is not disabled), all
pages changed since the checkpoint will be restored to a consistent state."

Best regards
Pupillo

STANDBY SERVER LOG
LOG: redo starts at 1/F00A7448
....
LOG: consistent recovery state reached at 2/426DF28
LOG: invalid record length at 2/426DF28: wanted 24, got 0
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 2/4000000 on timeline 1

FROM PG_XLOGDUMP OF MASTER
........
rmgr: Heap len (rec/tot): 14/ 1186, tx: 10684, lsn:
1/F009EEE0, prev 1/F009EEA8, desc: UPDATE off 1 xmax 10684 ; new off 3
xmax 0, blkref #0: rel 1663/16384/16422 blk 20054, blkref #1: rel
1663/16384/16422 blk 19774
rmgr: Btree len (rec/tot): 2/ 64, tx: 10684, lsn:
1/F009F388, prev 1/F009EEE0, desc: INSERT_LEAF off 2, blkref #0: rel
1663/16384/16428 blk 711
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F009F3C8, prev 1/F009F388, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9663 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A13E8, prev 1/F009F3C8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9664 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A3408, prev 1/F00A13E8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9665 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A5428, prev 1/F00A3408, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9666 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A7448, prev 1/F00A5428, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9667 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00A9468, prev 1/F00A7448, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9668 FPW
rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
1/F00AB488, prev 1/F00A9468, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9669 FPW
rmgr: Heap len (rec/tot): 8/ 8063, tx: 10682, lsn:
1/F00AD4A8, prev 1/F00AB488, desc: LOCK off 5: xid 10682: flags 0
LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19787 FPW
rmgr: Heap len (rec/tot): 8/ 8063, tx: 10684, lsn:
1/F00AF440, prev 1/F00AD4A8, desc: LOCK off 2: xid 10684: flags 0
LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19774 FPW
rmgr: Heap len (rec/tot): 14/ 4657, tx: 10682, lsn:
1/F00B13D8, prev 1/F00AF440, desc: UPDATE off 5 xmax 10682 ; new off 4
xmax 0, blkref #0: rel 1663/16384/16422 blk 20075 FPW, blkref #1: rel
1663/16384/16422 blk 19787
rmgr: Heap len (rec/tot): 14/ 4657, tx: 10684, lsn:
1/F00B2628, prev 1/F00B13D8, desc: UPDATE off 2 xmax 10684 ; new off 4
xmax 0, blkref #0: rel 1663/16384/16422 blk 20054 FPW, blkref #1: rel
1663/16384/16422 blk 19774
rmgr: Btr
.......
rmgr: Heap2 len (rec/tot): 8/ 68, tx: 0, lsn:
1/FCBD7448, prev 1/FCBD7400, desc: CLEAN remxid 10903, blkref #0: rel
1663/16384/16422 blk 2001
rmgr: Heap2 len (rec/tot): 8/ 66, tx: 0, lsn:
1/FCBD7490, prev 1/FCBD7448, desc: CLEAN remxid 10903, blkref #0: rel
1663/16384/16422 blk 2003
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn:
1/FCBD74D8, prev 1/FCBD7490, desc: RUNNING_XACTS nextXid 10907
latestCompletedXid 10906 oldestRunningXid 10907
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn:
1/FCBD7510, prev 1/FCBD74D8, desc: CHECKPOINT_ONLINE redo 1/F00A7448;
tli 1; prev tli 1; fpw true; xid 0:10685; oid 24665; multi 1; offset 0;
oldest xid 584 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
timestamp xid: 0/0; oldest running xid 10682; online
rmgr: Heap len (rec/tot): 3/ 164, tx: 10907, lsn:
1/FCBD7580, prev 1/FCBD7510, desc: INSERT off 25, blkref #0: rel
1663/16384/16484 blk 16398
rmgr: Btree len (rec/tot): 2/ 64, tx: 10907, lsn:
1/FCBD7628, prev 1/FCBD7580, desc: INSERT_LEAF off 60, blkref #0: rel
1663/16384/16490 blk 2722
........

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Adrian Klaver (#2)
Re: checkpoint clarifications needed

https://www.postgresql.org/docs/9.5/static/wal-internals.html

"After a checkpoint has been made and the log flushed, the checkpoint's
position is saved in the file pg_control. Therefore, at the start of
recovery, the server first reads pg_control and then the checkpoint record;
then it performs the REDO operation by scanning forward from the log
position indicated in the checkpoint record. Because the entire content of
data pages is saved in the log on the first page modification after a
checkpoint (assuming full_page_writes is not disabled), all pages changed
since the checkpoint will be restored to a consistent state."

Hi, yes I know that, it's what I meant in my point 3). As it says first

"the checkpoint's position (NOT the record)is saved in the file
pg_control" then I had some doubt.
Regards
Pupillo

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom DalPozzo (#3)
Re: checkpoint clarifications needed

On 01/09/2017 06:47 AM, Tom DalPozzo wrote:

https://www.postgresql.org/docs/9.5/static/wal-internals.html
<https://www.postgresql.org/docs/9.5/static/wal-internals.html&gt;

"After a checkpoint has been made and the log flushed, the
checkpoint's position is saved in the file pg_control. Therefore, at
the start of recovery, the server first reads pg_control and then
the checkpoint record; then it performs the REDO operation by
scanning forward from the log position indicated in the checkpoint
record. Because the entire content of data pages is saved in the log
on the first page modification after a checkpoint (assuming
full_page_writes is not disabled), all pages changed since the
checkpoint will be restored to a consistent state."

Hi, yes I know that, it's what I meant in my point 3). As it says first
"the checkpoint's position (NOT the record)is saved in the file
pg_control" then I had some doubt.

Yes it is just one piece of information stored in the file.

To see what else is stored there do:

pg_controldata -D your_cluster_data_directory

Regards
Pupillo

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom DalPozzo (#3)
Re: checkpoint clarifications needed

On 01/09/2017 06:47 AM, Tom DalPozzo wrote:

https://www.postgresql.org/docs/9.5/static/wal-internals.html
<https://www.postgresql.org/docs/9.5/static/wal-internals.html&gt;

"After a checkpoint has been made and the log flushed, the
checkpoint's position is saved in the file pg_control. Therefore, at
the start of recovery, the server first reads pg_control and then
the checkpoint record; then it performs the REDO operation by
scanning forward from the log position indicated in the checkpoint
record. Because the entire content of data pages is saved in the log
on the first page modification after a checkpoint (assuming
full_page_writes is not disabled), all pages changed since the
checkpoint will be restored to a consistent state."

Hi, yes I know that, it's what I meant in my point 3). As it says first
"the checkpoint's position (NOT the record)is saved in the file
pg_control" then I had some doubt.

Reread your original post and realized you where also asking about
transaction consistency and WALs. The thumbnail version is that Postgres
writes transactions to the WALs before they are written to the data
files on disk. A checkpoint represents a point in the sequence when is
is known that the changes recorded in the WAL have been also recorded in
the disk data files. So Postgres then knows that in a recovery scenario
it needs to only redo/replay the WAL changes that are past the last
checkpoint. So the transactions are there it is just a matter of if they
need to be replayed or not. This is subject to caveats:

https://www.postgresql.org/docs/9.5/static/wal-reliability.html

Regards
Pupillo

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Adrian Klaver (#5)
Re: checkpoint clarifications needed

Reread your original post and realized you where also asking about

transaction consistency and WALs. The thumbnail version is that Postgres
writes transactions to the WALs before they are written to the data files
on disk. A checkpoint represents a point in the sequence when is is known
that the changes recorded in the WAL have been also recorded in the disk
data files. So Postgres then knows that in a recovery scenario it needs to
only redo/replay the WAL changes that are past the last checkpoint. So the
transactions are there it is just a matter of if they need to be replayed
or not. This is subject to caveats:

https://www.postgresql.org/docs/9.5/static/wal-reliability.html

Hi, I had already read that doc but I can't answer clearly to my

questions 2,4 and 5.
Regards
Pupillo

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom DalPozzo (#6)
Re: checkpoint clarifications needed

On 01/09/2017 01:10 PM, Tom DalPozzo wrote:

Reread your original post and realized you where also asking
about transaction consistency and WALs. The thumbnail version is
that Postgres writes transactions to the WALs before they are
written to the data files on disk. A checkpoint represents a
point in the sequence when is is known that the changes recorded
in the WAL have been also recorded in the disk data files. So
Postgres then knows that in a recovery scenario it needs to only
redo/replay the WAL changes that are past the last checkpoint.
So the transactions are there it is just a matter of if they
need to be replayed or not. This is subject to caveats:

https://www.postgresql.org/docs/9.5/static/wal-reliability.html
<https://www.postgresql.org/docs/9.5/static/wal-reliability.html&gt;

Hi, I had already read that doc but I can't answer clearly to my
questions 2,4 and 5.

The answer would seem to depend on what you consider 'a consistency
state position'. Is it possible to be more explicit about what you mean?

Regards
Pupillo

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom DalPozzo (#1)
Re: checkpoint clarifications needed

Tom DalPozzo wrote:

2) I see that a checkpoint position can be right in the middle of a group
of records related to a transaction (in the example, transaction id 10684).
So a checkpoint position is NOT a consistency state point, right?

4) If I'm right at 2) then, between the checkpoint position (1/F00A7448 )
and the checkpoint record position (1/FCBD7510) there must be a point where
the DB is in a consistency state. If not, in case of crash just after
writing the checkpoint record to the WAL and its position to pg_control,
the system would replay from the checkpoint position (known by last
checkpoint record) without finding a consistency state. Right?

5) How can we define, in terms of log records, a consistency state position?

Whether any individual tuple in the data files is visible or not depends
not only on the data itself, but also on the commit status of the
transactions that created it (and deleted it, if any). Replaying WAL
also updates the commit status of transactions, so if you're in the
middle of replaying WAL, you may be adding tuples to the data files, but
those tuples will not become visible until their commit records are also
updated.

You can stop replaying WAL at any point, and data will always be in a
consistent state. Some data tuples might be "from the future" and those
will not be visible, which is what makes it all consistent.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Alvaro Herrera (#8)
Re: checkpoint clarifications needed

Whether any individual tuple in the data files is visible or not depends
not only on the data itself, but also on the commit status of the
transactions that created it (and deleted it, if any). Replaying WAL
also updates the commit status of transactions, so if you're in the
middle of replaying WAL, you may be adding tuples to the data files, but
those tuples will not become visible until their commit records are also
updated.

You can stop replaying WAL at any point, and data will always be in a
consistent state. Some data tuples might be "from the future" and those
will not be visible, which is what makes it all consistent.

Hi,

so let's suppose that the WAL is:
LSN 10: start transaction 123
LSN 11: update tuple 100
checkpoint position here (not a record but just for understanding)
LSN 12: update tuple 100
LSN 13: update tuple 100
LSN 14: checkpoint record ( postion=11)
LSN 15: update tuple 100
and that the system crashes now, before ending to write all the
transaction's recs to the WAL (other updates and commit record missing).

At the replay, starting from LSN 12, the entire page we had at LSN 11 is
written to the disk, though carrying inconsistent data.
Then we can even replay up to the end of WAL but always getting
inconsistent data.
BUT, you say, as the tuple is not commited in the WAL, only the old version
of the tuple will be visible? Right?

Regards
Pupillo

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom DalPozzo (#9)
Re: checkpoint clarifications needed

Tom DalPozzo wrote:

Hi,
so let's suppose that the WAL is:
LSN 10: start transaction 123
LSN 11: update tuple 100
checkpoint position here (not a record but just for understanding)
LSN 12: update tuple 100
LSN 13: update tuple 100
LSN 14: checkpoint record ( postion=11)
LSN 15: update tuple 100
and that the system crashes now, before ending to write all the
transaction's recs to the WAL (other updates and commit record missing).

At the replay, starting from LSN 12, the entire page we had at LSN 11 is
written to the disk, though carrying inconsistent data.
Then we can even replay up to the end of WAL but always getting
inconsistent data.
BUT, you say, as the tuple is not commited in the WAL, only the old version
of the tuple will be visible? Right?

Yes -- all the updated tuples are invisible because the commit record
for transaction 123 does not appear in wal. A future VACUUM will remove
all those tuples. Note that precisely for this reason, the original
version of the tuple had not been removed yet.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Alvaro Herrera (#10)
Re: checkpoint clarifications needed

Hi,
so let's suppose that the WAL is:
LSN 10: start transaction 123
LSN 11: update tuple 100
checkpoint position here (not a record but just for understanding)
LSN 12: update tuple 100
LSN 13: update tuple 100
LSN 14: checkpoint record ( postion=11)
LSN 15: update tuple 100
and that the system crashes now, before ending to write all the
transaction's recs to the WAL (other updates and commit record missing).

At the replay, starting from LSN 12, the entire page we had at LSN 11 is
written to the disk, though carrying inconsistent data.
Then we can even replay up to the end of WAL but always getting
inconsistent data.
BUT, you say, as the tuple is not commited in the WAL, only the old

version

of the tuple will be visible? Right?

Yes -- all the updated tuples are invisible because the commit record
for transaction 123 does not appear in wal. A future VACUUM will remove
all those tuples. Note that precisely for this reason, the original
version of the tuple had not been removed yet.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Ok! Now many things are clear to me....
Thank you very much
Pupillo

#12Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Adrian Klaver (#7)
Re: checkpoint clarifications needed

Hi, I had already read that doc but I can't answer clearly to my

questions 2,4 and 5.

The answer would seem to depend on what you consider 'a consistency state
position'. Is it possible to be more explicit about what you mean?

Hi, I meant a position such that, if you replay up to it, then the DB is

in a consistent state (transactions done entirely or not a t all...).
But, as Alvaro wrote, any position is ok
Thank you very much
Pupillo