pg_rewind vs superuser

Started by Magnus Hagander, almost 7 years ago, 22 messages

magnus@hagander.net

almost 7 years ago

As pointed out by Michael Banck as a comment on my blogpost, the pg_rewind
documentation says it requires superuser permissions on the remote server.

Is that really so, though? I haven't tested it, but from a quick look at
the code it looks like it needs pg_ls_dir(), pg_stat_file() and
pg_read_binary_file(), all of which are independently grantable.

Or am I missing something?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Magnus Hagander (#1)

1 attachment(s)

Re: pg_rewind vs superuser

On Wed, Apr 03, 2019 at 11:28:50AM +0200, Magnus Hagander wrote:

As pointed out by Michael Banck as a comment on my blogpost, the pg_rewind
documentation says it requires superuser permissions on the remote server.

Is that really so, though? I haven't tested it, but from a quick look at
the code it looks like it needs pg_ls_dir(), pg_stat_file() and
pg_read_binary_file(), all of which are independently grantable.

Or am I missing something?

Somebody I heard of has mentioned that stuff on his blog some time
ago:
https://paquier.xyz/postgresql-2/postgres-11-superuser-rewind/

And what you need to do is just that:
CREATE USER rewind_user LOGIN;
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;

I think that we should document that and back-patch, as now the docs
only say that a superuser should be used, but that is wrong.

At the same time, let's also document that we need to use a checkpoint
on the promoted standby so as the control file gets a refresh and
pg_rewind is able to work properly. I promised that some time ago and
got reminded of that issue after seeing this thread...

What do you think about the attached?
--
Michael

Attachments:

rewind-docs.patch (text/x-diff; charset=us-ascii)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index d98406c420..45425141fb 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -156,8 +156,10 @@ PostgreSQL documentation
        <para>
         Specifies a libpq connection string to connect to the source
         <productname>PostgreSQL</productname> server to synchronize the target with.
-        The connection must be a normal (non-replication) connection
-        with superuser access. This option requires the source
+        The connection must be a normal (non-replication) connection with a
+        superuser role or a role having sufficient permissions to execute the
+        functions used by <application>pg_rewind</application> on the source
+        server (see Notes section for details).  This option requires the source
         server to be running and not in recovery mode.
        </para>
       </listitem>
@@ -246,6 +248,30 @@ PostgreSQL documentation
  <refsect1>
   <title>Notes</title>
 
+  <para>
+   When executing <application>pg_rewind</application> using an online
+   cluster as source, a role having sufficient permissions to execute the
+   functions used by <application>pg_rewind</application> on the source
+   cluster can be used instead of a superuser.  Here is how to create such
+   a role, named <literal>rewind_user</literal> here:
+<programlisting>
+CREATE USER rewind_user LOGIN;
+GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO rewind_user;
+GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO rewind_user;
+GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO rewind_user;
+GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;
+</programlisting>  
+  </para>
+
+  <para>
+   When executing <application>pg_rewind</application> using an online
+   cluster as source which has been recently promoted, it is necessary
+   to execute a <command>CHECKPOINT</command> after promotion so as its
+   control file reflects up-to-date timeline information, which is used by
+   <application>pg_rewind</application> to check if the target cluster
+   can be rewound using the designated source cluster.
+  </para>
+
   <refsect2>
    <title>How it works</title>

Magnus Hagander

magnus@hagander.net

almost 7 years ago

In reply to: Michael Paquier (#2)

Re: pg_rewind vs superuser

On Thu, Apr 4, 2019 at 6:11 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Apr 03, 2019 at 11:28:50AM +0200, Magnus Hagander wrote:

As pointed out by Michael Banck as a comment on my blogpost, the pg_rewind
documentation says it requires superuser permissions on the remote server.

Is that really so, though? I haven't tested it, but from a quick look at
the code it looks like it needs pg_ls_dir(), pg_stat_file() and
pg_read_binary_file(), all of which are independently grantable.

Or am I missing something?

Somebody I heard of has mentioned that stuff on his blog some time
ago:
https://paquier.xyz/postgresql-2/postgres-11-superuser-rewind/

Hah. I usually read your blog, but I had forgotten about that one :)

And what you need to do is just that:

CREATE USER rewind_user LOGIN;
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;

I think that we should document that and back-patch, as now the docs
only say that a superuser should be used, but that is wrong.

At the same time, let's also document that we need to use a checkpoint
on the promoted standby so as the control file gets a refresh and
pg_rewind is able to work properly. I promised that some time ago and
got reminded of that issue after seeing this thread...

What do you think about the attached?

Looks good. Maybe we should list the "role having sufficient permissions"
before superuser, "just because", but not something I feel strongly about.

The part about CHECKPOINT also looks pretty good, but that's entirely
unrelated, right? :)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Magnus Hagander (#3)

Re: pg_rewind vs superuser

On Thu, Apr 04, 2019 at 10:18:45AM +0200, Magnus Hagander wrote:

Looks good. Maybe we should list the "role having sufficient permissions"
before superuser, "just because", but not something I feel strongly about.

Listing the superuser after sounds fine to me.

The part about CHECKPOINT also looks pretty good, but that's entirely
unrelated, right? :)

Completely unrelated, but as we are on this part of the documentation
now, and as we discussed that stuff face-to-face last September where
I actually promised to write a patch without doing it for seven
months, I see no problem with tackling this issue now as well. Better
late than never :)

I would like to apply this down to 9.5 for the checkpoint part and
down to 11 for the role part, so if anybody has any comments, please
feel free.
--
Michael

Magnus Hagander

magnus@hagander.net

almost 7 years ago

In reply to: Michael Paquier (#4)

Re: pg_rewind vs superuser

On Thu, Apr 4, 2019 at 12:43 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Apr 04, 2019 at 10:18:45AM +0200, Magnus Hagander wrote:

Looks good. Maybe we should list the "role having sufficient permissions"
before superuser, "just because", but not something I feel strongly about.

Listing the superuser after sounds fine to me.

The part about CHECKPOINT also looks pretty good, but that's entirely
unrelated, right? :)

Completely unrelated, but as we are on this part of the documentation
now, and as we discussed that stuff face-to-face last September where
I actually promised to write a patch without doing it for seven
months, I see no problem with tackling this issue now as well. Better
late than never :)

:) Nope, I definitely think we need to include that.

I would like to apply this down to 9.5 for the checkpoint part and
down to 11 for the role part, so if anybody has any comments, please
feel free.

All of it, or just the checkpoint part? I assume just the checkpoint part?
AFAIK it does require superuser in those earlier versions?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Magnus Hagander (#5)

Re: pg_rewind vs superuser

On Thu, Apr 04, 2019 at 01:19:44PM +0200, Magnus Hagander wrote:

All of it, or just the checkpoint part? I assume just the checkpoint part?
AFAIK it does require superuser in those earlier versions?

I meant of course the checkpoint part down to 9.5, and the rest down
to 11, so done this way.
--
Michael

Michael Banck

mbanck@gmx.net

almost 7 years ago

In reply to: Michael Paquier (#2)

Re: pg_rewind vs superuser

Hi,

On Thu, Apr 04, 2019 at 01:11:29PM +0900, Michael Paquier wrote:

At the same time, let's also document that we need to use a checkpoint
on the promoted standby so as the control file gets a refresh and
pg_rewind is able to work properly. I promised that some time ago and
got reminded of that issue after seeing this thread...

Is there a good reason why Postgres doesn't just issue a CHECKPOINT
after promote itself? After all, this seems to be about making the
control file having the proper content, which sounds like a good thing
to have in general.

Could this be a problem for anything else besides pg_rewind?

This looks like a needless footgun waiting to happen, and just
documenting it in pg_rewind's notes section looks a bit too hidden to me
(but is certainly an improvement).

Michael

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Michael Banck (#7)

Re: pg_rewind vs superuser

On Fri, Apr 05, 2019 at 09:41:58AM +0200, Michael Banck wrote:

Is there a good reason why Postgres doesn't just issue a CHECKPOINT
after promote itself? After all, this seems to be about making the
control file having the proper content, which sounds like a good thing
to have in general.

The startup process requests a checkpoint since 9.3, and before that
it was doing the checkpoint by itself (grep for fast_promoted and
RequestCheckpoint() around line 7579 in xlog.c). This allows the
recovery to finish much faster.

Could this be a problem for anything else besides pg_rewind?

Not that I know of, at least not in the tree.

This looks like a needless footgun waiting to happen, and just
documenting it in pg_rewind's notes section looks a bit too hidden to me
(but is certainly an improvement).

We had a couple of reports on the matter over the past years. Perhaps
we could use a big fat warning but that feels a bit overdoing it.
--
Michael

Magnus Hagander

magnus@hagander.net

almost 7 years ago

In reply to: Michael Paquier (#8)

Re: pg_rewind vs superuser

On Fri, Apr 5, 2019 at 9:56 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 05, 2019 at 09:41:58AM +0200, Michael Banck wrote:

Is there a good reason why Postgres doesn't just issue a CHECKPOINT
after promote itself? After all, this seems to be about making the
control file having the proper content, which sounds like a good thing
to have in general.

The startup process requests a checkpoint since 9.3, and before that
it was doing the checkpoint by itself (grep for fast_promoted and
RequestCheckpoint() around line 7579 in xlog.c). This allows the
recovery to finish much faster.

Could this be a problem for anything else besides pg_rewind?

Not that I know of, at least not in the tree.

This looks like a needless footgun waiting to happen, and just
documenting it in pg_rewind's notes section looks a bit too hidden to me
(but is certainly an improvement).

We had a couple of reports on the matter over the past years. Perhaps
we could use a big fat warning but that feels a bit overdoing it.

A related question is, could we (for 12+) actually make the problem go
away? As in, can we detect the state and just have pg_rewind issue the
checkpoint as needed?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#10

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Magnus Hagander (#9)

Re: pg_rewind vs superuser

On Fri, Apr 05, 2019 at 09:59:29AM +0200, Magnus Hagander wrote:

A related question is, could we (for 12+) actually make the problem go
away? As in, can we detect the state and just have pg_rewind issue the
checkpoint as needed?

I am not sure as you can still bump into the legit case where one is
trying to rewind an instance which is on the same timeline as the
source, and nothing should happen in this case.
--
Michael

#11

Magnus Hagander

magnus@hagander.net

almost 7 years ago

In reply to: Michael Paquier (#10)

Re: pg_rewind vs superuser

On Fri, Apr 5, 2019 at 10:06 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 05, 2019 at 09:59:29AM +0200, Magnus Hagander wrote:

A related question is, could we (for 12+) actually make the problem go
away? As in, can we detect the state and just have pg_rewind issue the
checkpoint as needed?

I am not sure as you can still bump into the legit case where one is
trying to rewind an instance which is on the same timeline as the
source, and nothing should happen in this case.

If that is the case, would running a CHECKPOINT actually cause a problem?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#12

Michael Banck

mbanck@gmx.net

almost 7 years ago

In reply to: Michael Paquier (#8)

Re: pg_rewind vs superuser

Hi,

On Fri, Apr 05, 2019 at 04:56:32PM +0900, Michael Paquier wrote:

On Fri, Apr 05, 2019 at 09:41:58AM +0200, Michael Banck wrote:

Is there a good reason why Postgres doesn't just issue a CHECKPOINT
after promote itself? After all, this seems to be about making the
control file having the proper content, which sounds like a good thing
to have in general.

The startup process requests a checkpoint since 9.3, and before that
it was doing the checkpoint by itself (grep for fast_promoted and
RequestCheckpoint() around line 7579 in xlog.c). This allows the
recovery to finish much faster.

Ok, so the problem is that that checkpoint might be still ongoing when
you quickly issue a pg_rewind from the other side? If not, can you
explain once more the actual problem?

In any case, the updated documentation says:

|When executing pg_rewind using an online cluster as source which has
|been recently promoted, it is necessary to execute a CHECKPOINT after
|promotion so as its control file reflects up-to-date timeline
|information

I think it might be useful to specify more exactly which of the two
servers (the remote one AIUI) needs a CHECKPOINT in the above. Also, if
it is the case that a CHECKPOINT is done automatically (see above), that
paragraph could be rewritten to say something like "pg_rewind needs to
wait for the checkpoint on the remote server to finish. This can be
ensured by issuing an explicit checkpoint on the remote server prior to
running pg_rewind."

Finally, (and still, if I got the above correctly), to the suggestion of
Magnus of pg_rewind running the checkpoint itself on the remote: would
that again mean that pg_rewind needs SUPERUSER rights or is there
a(nother) GRANTable function that could be added to the list in this
case?

Sorry for being a bit dense here.

Michael

#13

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Magnus Hagander (#11)

Re: pg_rewind vs superuser

On Fri, Apr 05, 2019 at 10:11:22AM +0200, Magnus Hagander wrote:

If that is the case, would running a CHECKPOINT actually cause a problem?

If you exclude the point that it may not be necessary and the
potential extra I/O, no. However we would come back to the point of
pg_rewind requiring a superuser :(
--
Michael

#14

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Michael Banck (#12)

Re: pg_rewind vs superuser

On Fri, Apr 05, 2019 at 10:39:26AM +0200, Michael Banck wrote:

Ok, so the problem is that that checkpoint might be still ongoing when
you quickly issue a pg_rewind from the other side?

The end-of-recovery checkpoint may not have even begun.

I think it might be useful to specify more exactly which of the two
servers (the remote one AIUI) needs a CHECKPOINT in the above. Also, if
it is the case that a CHECKPOINT is done automatically (see above), that
paragraph could be rewritten to say something like "pg_rewind needs to
wait for the checkpoint on the remote server to finish. This can be
ensured by issuing an explicit checkpoint on the remote server prior to
running pg_rewind."

Well, the target server needs to be cleanly shut down, so it seems
pretty clear to me which one needs to have a checkpoint :)

Finally, (and still, if I got the above correctly), to the suggestion of
Magnus of pg_rewind running the checkpoint itself on the remote: would
that again mean that pg_rewind needs SUPERUSER rights or is there
a(nother) GRANTable function that could be added to the list in this
case?

pg_rewind would require again a superuser. So this could be
optional. In one HA workflow I maintain, what I actually do is to
enforce directly a checkpoint immediately after the promotion is done
to make sure that the data is up-to-date, and I don't meddle with
pg_rewind workflow.
--
Michael

#15

Magnus Hagander

magnus@hagander.net

almost 7 years ago

In reply to: Michael Paquier (#14)

Re: pg_rewind vs superuser

On Fri, Apr 5, 2019 at 1:05 PM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 05, 2019 at 10:39:26AM +0200, Michael Banck wrote:

Ok, so the problem is that that checkpoint might be still ongoing when
you quickly issue a pg_rewind from the other side?

The end-of-recovery checkpoint may not have even begun.

So can we *detect* that this is the case? Because if so, we could perhaps
just wait for it to be done? Because there will always be one?

The main point is -- we know from experience that it's pretty fragile to
assume the user read the documentation :) So if we can find *any* way to
handle this in code rather than docs, that'd be great. We would still
absolutely want the docs change for back branches of course.

I think it might be useful to specify more exactly which of the two
servers (the remote one AIUI) needs a CHECKPOINT in the above. Also, if
it is the case that a CHECKPOINT is done automatically (see above), that
paragraph could be rewritten to say something like "pg_rewind needs to
wait for the checkpoint on the remote server to finish. This can be
ensured by issuing an explicit checkpoint on the remote server prior to
running pg_rewind."

Well, the target server needs to be cleanly shut down, so it seems
pretty clear to me which one needs to have a checkpoint :)

Clear to you and us of course, but quite possibly not to everybody. I'm
sure there are a *lot* of users out there who do not realize that "cleanly
shut down" means "ran a checkpoint just before it shut down".

Finally, (and still, if I got the above correctly), to the suggestion of
Magnus of pg_rewind running the checkpoint itself on the remote: would
that again mean that pg_rewind needs SUPERUSER rights or is there
a(nother) GRANTable function that could be added to the list in this
case?

pg_rewind would require again a superuser. So this could be

Ugh, you are right of course.

optional. In one HA workflow I maintain, what I actually do is to
enforce directly a checkpoint immediately after the promotion is done
to make sure that the data is up-to-date, and I don't meddle with
pg_rewind workflow.

Sure. And every other HA setup also has to take care of it. That's why it
would make sense to centralize it into the tool itself when it's
*mandatory* to deal with it somehow.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#16

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Magnus Hagander (#15)

Re: pg_rewind vs superuser

On Sun, Apr 07, 2019 at 03:06:56PM +0200, Magnus Hagander wrote:

So can we *detect* that this is the case? Because if so, we could perhaps
just wait for it to be done? Because there will always be one?

Yes, this one is technically possible. We could add a timeout option
which checks each N seconds the control file of the online source and
sees if its timeline differs or not with the target, waiting for the
change to happen. If we do that, we may want to revisit the behavior
of not issuing an error if the source and the target are detected as
being on the same timeline, and consider it as a failure.

The main point is -- we know from experience that it's pretty fragile to
assume the user read the documentation :) So if we can find *any* way to
handle this in code rather than docs, that'd be great. We would still
absolutely want the docs change for back branches of course.

Any veeeeery recent experience on the matter perhaps? :)
--
Michael

#17

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

almost 7 years ago

In reply to: Michael Paquier (#16)

Re: pg_rewind vs superuser

At Mon, 8 Apr 2019 15:17:25 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190408061725.GF2712@paquier.xyz>

On Sun, Apr 07, 2019 at 03:06:56PM +0200, Magnus Hagander wrote:

So can we *detect* that this is the case? Because if so, we could perhaps
just wait for it to be done? Because there will always be one?

Yes, this one is technically possible. We could add a timeout option
which checks each N seconds the control file of the online source and
sees if its timeline differs or not with the target, waiting for the
change to happen. If we do that, we may want to revisit the behavior
of not issuing an error if the source and the target are detected as
being on the same timeline, and consider it as a failure.

The main point is -- we know from experience that it's pretty fragile to
assume the user read the documentation :) So if we can find *any* way to
handle this in code rather than docs, that'd be great. We would still
absolutely want the docs change for back branches of course.

Any veeeeery recent experience on the matter perhaps? :)

I (not Magnus) saw a similar but slightly different case. Just
after the master's promotion, the standby was killed in immediate mode
after catching up to the master's latest TLI but before the restartpoint
finished. The two were on different TLIs according to their control
data, so *the tool* decided to try pg_rewind. A restart->shutdown (*1)
sequence for cleanup made the standby catch up to the master's TLI, but
their histories had diverged from each other within that latest TLI. Of
course, pg_rewind said "no need to rewind since they're on the
same TLI". The subsequent replication started from the beginning of
the segment and overwrote WAL records already applied on the
standby. The result was a broken database. I suspect that this was
the result of some kind of misoperation and that sane operation would
not cause the situation, but such a situation could be "cleaned up" if
pg_rewind did its work for a replication set on the same TLI.

I haven't found out exactly what happened in that case yet.

*1: It is somewhat strange that recovery reached the next TLI,
given that I heard the restart was in non-standby,
non-recovery mode. Something must be wrong.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#18

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 7 years ago

In reply to: Michael Paquier (#4)

Re: pg_rewind vs superuser

On 2019-04-04 12:43, Michael Paquier wrote:

I would like to apply this down to 9.5 for the checkpoint part and
down to 11 for the role part, so if anybody has any comments, please
feel free.

How about some tests to show that this is actually true?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Peter Eisentraut (#18)

1 attachment(s)

Re: pg_rewind vs superuser

On Mon, Apr 08, 2019 at 10:03:48AM +0200, Peter Eisentraut wrote:

How about some tests to show that this is actually true?

Sure. With something like the attached? I don't think that there is
much point to complicate the test code with multiple roles if the
default is a superuser.
--
Michael

Attachments:

rewind-user-tap.patch (text/x-diff; charset=us-ascii)

diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index 900d452d8b..618de85161 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -144,6 +144,20 @@ sub start_master
 {
 	$node_master->start;
 
+	# Create a custom role which will be used to run pg_rewind.  This has
+	# minimal permissions to make pg_rewind able to work with an online
+	# source.
+	$node_master->psql('postgres', "
+		CREATE ROLE rewind_user LOGIN;
+		GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean)
+		  TO rewind_user;
+		GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean)
+		  TO rewind_user;
+		GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text)
+		  TO rewind_user;
+		GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean)
+		  TO rewind_user;");
+
 	#### Now run the test-specific parts to initialize the master before setting
 	# up standby
 
@@ -207,6 +221,9 @@ sub run_pg_rewind
 	my $standby_connstr = $node_standby->connstr('postgres');
 	my $tmp_folder      = TestLib::tempdir;
 
+	# Append the rewind role to the connection string.
+	$standby_connstr = "$standby_connstr user=rewind_user";
+
 	# Stop the master and be ready to perform the rewind
 	$node_master->stop;

#20

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Michael Paquier (#19)

Re: pg_rewind vs superuser

On Tue, Apr 09, 2019 at 10:38:19AM +0900, Michael Paquier wrote:

Sure. With something like the attached? I don't think that there is
much point to complicate the test code with multiple roles if the
default is a superuser.

As this topic differs from the original thread, I have started a new
thread, so let's discuss the proposed patch there:
/messages/by-id/20190411041336.GM2728@paquier.xyz
--
Michael

#21

Magnus Hagander

magnus@hagander.net

almost 7 years ago

In reply to: Michael Paquier (#16)

Re: pg_rewind vs superuser

On Mon, Apr 8, 2019 at 8:17 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Apr 07, 2019 at 03:06:56PM +0200, Magnus Hagander wrote:

So can we *detect* that this is the case? Because if so, we could perhaps
just wait for it to be done? Because there will always be one?

Yes, this one is technically possible. We could add a timeout option
which checks each N seconds the control file of the online source and
sees if its timeline differs or not with the target, waiting for the
change to happen. If we do that, we may want to revisit the behavior
of not issuing an error if the source and the target are detected as
being on the same timeline, and consider it as a failure.

I think doing something like that would be a good idea.

I mean, we should *always* detect it if we can, since it's a condition
where things don't work properly.

And I think it would make sense to wait by default, but we could then also
have a commandline parameter that says "don't wait, instead error out in
case the checkpoint isn't done".

Or something like that?

The main point is -- we know from experience that it's pretty fragile to
assume the user read the documentation :) So if we can find *any* way to
handle this in code rather than docs, that'd be great. We would still
absolutely want the docs change for back branches of course.

Any veeeeery recent experience on the matter perhaps? :)

Actually no, I've been considering it for some time due to the number of
questions we get on it that get exactly the same answer. And then you doing
the docs patch reminded me of it :)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#22

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Magnus Hagander (#21)

Re: pg_rewind vs superuser

On Thu, Apr 11, 2019 at 10:33:13AM +0200, Magnus Hagander wrote:

And I think it would make sense to wait by default, but we could then also
have a commandline parameter that says "don't wait, instead error out in
case the checkpoint isn't done".

Or something like that?

Yes, that would be the idea. You still need to cover the case where
both instances are on the same timeline, in which case you could wait
for a checkpoint forever, so we'd need to change the current behavior
a bit by making sure that we always throw an error if both nodes are
still on the same timeline after the timeout (see TAP test
005_same_timeline.pl). I am not sure that you need a separate option
to control the case where you don't want to wait though. Perhaps we
could have a separate switch, but a user could also just set
--timeout=0 to match that behavior.
--
Michael
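
Editorial sketch of the wait-with-timeout idea discussed in the last few messages. This is only an illustration in Python, not pg_rewind code, and nothing here is an existing pg_rewind option. The names wait_for_timeline_change, read_source_tli and SameTimelineError are hypothetical, and the polling interval is an assumed parameter. The caller is assumed to supply a callback that returns the timeline ID currently recorded in the source's control file; how that value is fetched is left out.

```python
import time
from typing import Callable


class SameTimelineError(Exception):
    """Raised when source and target are still on the same timeline."""


def wait_for_timeline_change(
    read_source_tli: Callable[[], int],
    target_tli: int,
    timeout: float,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll the source's timeline until it differs from the target's.

    read_source_tli is a hypothetical callback returning the timeline ID
    found in the source's control file. The source is always checked at
    least once, so timeout=0 means "do not wait, fail immediately".
    """
    deadline = clock() + timeout
    while True:
        source_tli = read_source_tli()
        if source_tli != target_tli:
            # The post-promotion checkpoint has updated the control file.
            return source_tli
        remaining = deadline - clock()
        if remaining <= 0:
            # Still on the same timeline after the timeout: treat it as an
            # error instead of silently reporting "no rewind required".
            raise SameTimelineError(
                f"source and target are both on timeline {target_tli} "
                f"after waiting {timeout} seconds"
            )
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

With timeout=0 the function checks once and raises if the timelines match, which corresponds to the "--timeout=0" behavior suggested above. A positive timeout gives the wait-by-default behavior, with a hard failure once the deadline passes.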