Issues in Replication Progress Tracking

Started by Amit Kapilaover 10 years ago7 messages
#1Amit Kapila
amit.kapila16@gmail.com

While reading the commit- 5aa23504 for Replication Progress
Tracking, I came across few issues which I would like to share.

1. catalogs.sgml
+     <row>
+      <entry><structfield>local_lsn</structfield></entry>
+
<entry><type>pg_lsn</type></entry>
+      <entry></entry>
+      <entry>This node's LSN that at
+      which
<literal>remote_lsn</literal> has been replicated. Used to
+      flush commit records before persisting data
to disk when using
+      asynchronous commits.</entry>

I think part of the sentence is not very clear.
"This node's LSN *that at which* remote_lsn has been
replicated."

Reference link in docs
http://www.postgresql.org/docs/devel/static/catalog-pg-replication-origin-status.html

2.
In the page,
http://www.postgresql.org/docs/devel/static/replication-origins.html
closing ')' is missing.

Check below line:

Using the replication origin infrastructure a session can be marked
as replaying from a remote node (using
the pg_replication_origin_session_setup() function.

3.
+      <row id="pg-replication-origin-session-progress">
+       <entry>
+        <indexterm>
+
<primary>pg_replication_origin_session_progress</primary>
+        </indexterm>
+
<literal><function>pg_replication_origin_progress(<parameter>flush</parameter>
<type>bool</type>)
</function></literal>
+       </entry>

Specified function name is wrong
/pg_replication_origin_progress/pg_replication_origin_session_progress

Refer below page in docs:
http://www.postgresql.org/docs/devel/static/functions-admin.html#PG-REPLICATION-ORIGIN-CREATE

4.
xloginsert.c
+/* Should te in-progress insertion log the origin */
+static bool include_origin = false;
+

typo /te

5.
origin.c

* * To create and drop replication origins an exclusive lock on
* pg_replication_slot is required for the
duration. That allows us to
* safely and conflict free assign new origins using a dirty snapshot.

a. Isn't in above comment, we need to use pg_replication_origin
instead of pg_replication_slot?
b. "..safely and conflict free assign..", I understand this part
of comment, but not sure if this is the best way to write it.

6.
origin.c
* setting up replorigin_sesssion_origin_lsn and replorigin_sesssion_origin
that ensures we

line length greater than 80 and looks slightly odd.

7.
origin.c
replorigin_create()
{
..
if (tuple == NULL)
ereport(ERROR,
(errcode
(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("no free replication oid could be
found")));
}

I think here error message should be
"no free replication origin oid could be found"

8.
origin.c
replorigin_drop()
{
..
tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
simple_heap_delete(rel, &tuple-

t_self);

..
}

Isn't it better to have check for a valid tuple after SearchSysCache1()?
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed

9.
origin.c
ReplicationOriginShmemSize(void)
{
..
* XXX: max_replication_slots is arguablethe wrong
..
}

a. *arguablethe*, space is required and it better to use arguably
b. One thing that in favour of using a separate/new guc for
ReplicationState is that even if the user has configured
max_replication_slots for some other usage (other than
tracking Replication Origin) of ReplicationSlots, even then we
will end up allocating shared memory which will never be used,
OTOH as the memory will not be huge, so we can even ignore it.

10.
origin.c
CheckPointReplicationOrigin()
{
..
if (unlink(tmppath) < 0 && errno != ENOENT)
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not remove file \"%s\": %m",
path)));
}

In ereport, tmppath should be used instead of path.

11.
origin.c
* local_commit needs to be a local LSN of the commit so that we can make
sure
* uppon a checkpoint that
enough WAL has been persisted to disk.

/uppon/upon

12.
In funcions replorigin_advance() and replorigin_session_setup(),
different ways (free_state and free_slot) are used. Isn't it better
to use same way?

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

14.
pg_replication_origin_session_reset()
{
..
/* FIXME */
replorigin_sesssion_origin = InvalidRepOriginId;
..
}

What needs be fixed in FIXME is not clear?

15.
Below changes in pg_resetxlog.c seems redundant.
--------------------- src/bin/pg_resetxlog/pg_resetxlog.c
---------------------
index a0805d8..4a22575 100644
@@ -56,6 +56,8 @@
 #include "common/restricted_token.h"
 #include "storage/large_object.h"
 #include "pg_getopt.h"
+#include "replication/logical.h"
+#include "replication/origin.h"

static ControlFileData ControlFile; /* pg_control values */
@@ -1091,6 +1093,7 @@ WriteEmptyXLOG(void)
record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort +
sizeof(CheckPoint);
record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
record->xl_rmid = RM_XLOG_ID;
+
recptr += SizeOfXLogRecord;
*(recptr++) = XLR_BLOCK_ID_DATA_SHORT;
*(recptr++) = sizeof(CheckPoint);

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#2Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#1)
Re: Issues in Replication Progress Tracking

Hi,

Thanks for looking through this!

On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:

5.
origin.c

* * To create and drop replication origins an exclusive lock on
* pg_replication_slot is required for the
duration. That allows us to
* safely and conflict free assign new origins using a dirty snapshot.

b. "..safely and conflict free assign..", I understand this part
of comment, but not sure if this is the best way to write it.

Hm, don't see a problem with that part.

8.
origin.c
replorigin_drop()
{
..
tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
simple_heap_delete(rel, &tuple-

t_self);

..
}

Isn't it better to have check for a valid tuple after SearchSysCache1()?
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed

Sounds good to me.

9.
origin.c
ReplicationOriginShmemSize(void)
{
..
* XXX: max_replication_slots is arguablethe wrong
..
}

b. One thing that in favour of using a separate/new guc for
ReplicationState is that even if the user has configured
max_replication_slots for some other usage (other than
tracking Replication Origin) of ReplicationSlots, even then we
will end up allocating shared memory which will never be used,
OTOH as the memory will not be huge, so we can even ignore it.

I don't think it matters much for now, as you say it's only a small
amount of memory.

12.
In funcions replorigin_advance() and replorigin_session_setup(),
different ways (free_state and free_slot) are used. Isn't it better
to use same way?

Phew, I don't really care.

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

No, the point is that the replication progress is persisted via an extra
data block in the commit record. That's important for both performance
and correctness, because otherwise it gets hard to tie a transaction
made during replay with the update to the progress. Unless you use 2PC
which isn't really an alternative.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#2)
Re: Issues in Replication Progress Tracking

On Thu, May 21, 2015 at 12:42 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

No, the point is that the replication progress is persisted via an extra
data block in the commit record. That's important for both performance
and correctness, because otherwise it gets hard to tie a transaction
made during replay with the update to the progress. Unless you use 2PC
which isn't really an alternative.

Okay, but what triggered this question was the difference of those functions
as compare to when user call function pg_replication_origin_advance().
pg_replication_origin_advance() will WAL log the information during that
function call itself (via replorigin_advance()). So even if the transaction
issuing pg_replication_origin_advance() function will abort, it will still
update
the Replication State, why so?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#4Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#3)
Re: Issues in Replication Progress Tracking

On 2015-05-21 09:40:58 +0530, Amit Kapila wrote:

On Thu, May 21, 2015 at 12:42 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

No, the point is that the replication progress is persisted via an extra
data block in the commit record. That's important for both performance
and correctness, because otherwise it gets hard to tie a transaction
made during replay with the update to the progress. Unless you use 2PC
which isn't really an alternative.

Okay, but what triggered this question was the difference of those functions
as compare to when user call function pg_replication_origin_advance().
pg_replication_origin_advance() will WAL log the information during that
function call itself (via replorigin_advance()). So even if the transaction
issuing pg_replication_origin_advance() function will abort, it will still
update
the Replication State, why so?

I don't see a problem here. pg_replication_origin_advance() is for
setting up the initial position/update the position upon configuration
changes. It'd be a fair amount of infrastructure to make it tie into
transactions - without a point to it afaics?

(Just to be clear, I plan to address all the points I've not commented
upon)

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#4)
Re: Issues in Replication Progress Tracking

On Fri, May 22, 2015 at 11:20 PM, Andres Freund <andres@anarazel.de> wrote:

On 2015-05-21 09:40:58 +0530, Amit Kapila wrote:

On Thu, May 21, 2015 at 12:42 AM, Andres Freund <andres@anarazel.de>

wrote:

On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

No, the point is that the replication progress is persisted via an

extra

data block in the commit record. That's important for both performance
and correctness, because otherwise it gets hard to tie a transaction
made during replay with the update to the progress. Unless you use 2PC
which isn't really an alternative.

Okay, but what triggered this question was the difference of those

functions

as compare to when user call function pg_replication_origin_advance().
pg_replication_origin_advance() will WAL log the information during that
function call itself (via replorigin_advance()). So even if the

transaction

issuing pg_replication_origin_advance() function will abort, it will

still

update
the Replication State, why so?

I don't see a problem here. pg_replication_origin_advance() is for
setting up the initial position/update the position upon configuration
changes.

Okay, I am not aware how exactly these API's will be used for replication
but let me try to clarify what I have in mind related to this API usage.

Can we use pg_replication_origin_advance() for node where Replay has
to happen, if Yes, then Let us say user of pg_replication_origin_advance()
API set the lsn position to X for the node N1 on which replay has to
happen, so now replay will proceed from X + 1 even though the
information related to X is not persisted, so now it could so happen
X will get written after the replay of X + 1 which might lead to problem
after crash recovery?

It'd be a fair amount of infrastructure to make it tie into
transactions - without a point to it afaics?

Agreed, that if there is no valid use case then we should keep it
as it is.

(Just to be clear, I plan to address all the points I've not commented
upon)

Thanks.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#6Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#5)
Re: Issues in Replication Progress Tracking

On May 22, 2015 10:23:06 PM PDT, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, May 22, 2015 at 11:20 PM, Andres Freund <andres@anarazel.de>
wrote:

On 2015-05-21 09:40:58 +0530, Amit Kapila wrote:

On Thu, May 21, 2015 at 12:42 AM, Andres Freund

<andres@anarazel.de>
wrote:

On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

No, the point is that the replication progress is persisted via

an
extra

data block in the commit record. That's important for both

performance

and correctness, because otherwise it gets hard to tie a

transaction

made during replay with the update to the progress. Unless you

use 2PC

which isn't really an alternative.

Okay, but what triggered this question was the difference of those

functions

as compare to when user call function

pg_replication_origin_advance().

pg_replication_origin_advance() will WAL log the information during

that

function call itself (via replorigin_advance()). So even if the

transaction

issuing pg_replication_origin_advance() function will abort, it

will
still

update
the Replication State, why so?

I don't see a problem here. pg_replication_origin_advance() is for
setting up the initial position/update the position upon

configuration

changes.

Okay, I am not aware how exactly these API's will be used for
replication
but let me try to clarify what I have in mind related to this API
usage.

Can we use pg_replication_origin_advance() for node where Replay has
to happen, if Yes, then Let us say user of
pg_replication_origin_advance()
API set the lsn position to X for the node N1 on which replay has to
happen, so now replay will proceed from X + 1 even though the
information related to X is not persisted, so now it could so happen
X will get written after the replay of X + 1 which might lead to
problem
after crash recovery?

Then you'd use the session variant - which does tie into transactions. Is there a reason that's be unsuitable for you?

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#6)
Re: Issues in Replication Progress Tracking

On Sat, May 23, 2015 at 11:19 AM, Andres Freund <andres@anarazel.de> wrote:

On May 22, 2015 10:23:06 PM PDT, Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Fri, May 22, 2015 at 11:20 PM, Andres Freund <andres@anarazel.de>
wrote:

On 2015-05-21 09:40:58 +0530, Amit Kapila wrote:

On Thu, May 21, 2015 at 12:42 AM, Andres Freund

<andres@anarazel.de>
wrote:

On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

No, the point is that the replication progress is persisted via

an
extra

data block in the commit record. That's important for both

performance

and correctness, because otherwise it gets hard to tie a

transaction

made during replay with the update to the progress. Unless you

use 2PC

which isn't really an alternative.

Okay, but what triggered this question was the difference of those

functions

as compare to when user call function

pg_replication_origin_advance().

pg_replication_origin_advance() will WAL log the information during

that

function call itself (via replorigin_advance()). So even if the

transaction

issuing pg_replication_origin_advance() function will abort, it

will
still

update
the Replication State, why so?

I don't see a problem here. pg_replication_origin_advance() is for
setting up the initial position/update the position upon

configuration

changes.

Okay, I am not aware how exactly these API's will be used for
replication
but let me try to clarify what I have in mind related to this API
usage.

Can we use pg_replication_origin_advance() for node where Replay has
to happen, if Yes, then Let us say user of
pg_replication_origin_advance()
API set the lsn position to X for the node N1 on which replay has to
happen, so now replay will proceed from X + 1 even though the
information related to X is not persisted, so now it could so happen
X will get written after the replay of X + 1 which might lead to
problem
after crash recovery?

Then you'd use the session variant - which does tie into transactions. Is

there a reason that's be unsuitable for you?

I don't know because I have not thought about how one can
use these API's in various ways in design of the Replication
system, but I think ideally we should prohibit the use of API
in circumstances where it could lead to problem or if that is
difficult or not possible or not worth, then at least we can mention
the same in docs.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com