when the startup process doesn't

Started by Robert Haasalmost 5 years ago143 messageshackers
Jump to latest
#1Robert Haas
robertmhaas@gmail.com

Hi,

I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (1) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.

It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, e.g. perhaps add messages something like this:

LOG: still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG: data directory sync complete after %ld.%03d ms
LOG: still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG: unlogged relations reset after %ld.%03d ms
LOG: still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X

We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and responding by printing a message and clearing the flag.

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kinds of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#1)
Re: when the startup process doesn't

On 2021-Apr-19, Robert Haas wrote:

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant).

Hmm. We already have pg_isready, which is pretty simplistic -- it tries
to connect to the server and derive a status in a very simplistic way.
Can we perhaps improve on that? I think your idea of using the
non-database-connected replication mode would let the server return a
tuple with some status information with a new command. And then
pg_isready could interpret that, or just print it.

--
�lvaro Herrera 39�49'30"S 73�17'W
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)

#3Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#1)
Re: when the startup process doesn't

On Mon, Apr 19, 2021 at 01:55:13PM -0400, Robert Haas wrote:

I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (1) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.

It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, e.g. perhaps add messages something like this:

Yes, this certainly needs improvement.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#1)
Re: when the startup process doesn't

On Tue, Apr 20, 2021 at 5:55 AM Robert Haas <robertmhaas@gmail.com> wrote:

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

+1 for both ideas. I've heard multiple requests for something like
that. A couple of users with update_process_title=off told me they
regretted that choice when they found themselves running a long crash
recovery with the only indicator of progress disabled.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#4)
Re: when the startup process doesn't

Thomas Munro <thomas.munro@gmail.com> writes:

On Tue, Apr 20, 2021 at 5:55 AM Robert Haas <robertmhaas@gmail.com> wrote:

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

+1 for both ideas. I've heard multiple requests for something like
that. A couple of users with update_process_title=off told me they
regretted that choice when they found themselves running a long crash
recovery with the only indicator of progress disabled.

Hmm ... +1 for progress messages in the log, but I'm suspicious about
the complexity-and-fragility-versus-value tradeoff for the other thing.

regards, tom lane

#6Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#1)
Re: when the startup process doesn't

On Mon, Apr 19, 2021 at 7:55 PM Robert Haas <robertmhaas@gmail.com> wrote:

Hi,

I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (1) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.

It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, e.g. perhaps add messages something like this:

LOG: still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG: data directory sync complete after %ld.%03d ms
LOG: still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG: unlogged relations reset after %ld.%03d ms
LOG: still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X

We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and responding by printing a message and clearing the flag.

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kinds of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

Thoughts?

(Ugh. Did reply instead of reply-all. Surely I should know that by
now... Here's a re-send!)

+1 for the log based one.

In general I'm usually against the log based one, but something over
the replication protocol is really not going to help a lot of people
who are in this situation. They may not even have permissions to log
in, and any kind of monitoring system would fail to work as well. And
can we even log users in at this point? We can't get the list of
roles... If we could, I would say it's probably better to allow the
login in a regular connection, but then immediately throw an error and
give this error a more detailed message if the user has monitoring
permissions.

But against either of those, the log based method is certainly a lot
easier to build :)

And FWIW, I believe most -- probably all -- cloud environments do give
an interface to view the log at least, so the log based solution would
work there as well. Maybe not as convenient, but it would work.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#7Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: when the startup process doesn't

On Mon, Apr 19, 2021 at 8:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... +1 for progress messages in the log, but I'm suspicious about
the complexity-and-fragility-versus-value tradeoff for the other thing.

All right, so it's nice to hear that nobody so far is opposed to the
log-based solution, and I think it's sensible to think about building
that one first and doing anything else later.

But, if we did want to invent something to allow monitoring via libpq
even at this early stage, how would we make it work? Magnus pointed
out that we can hardly read pg_authid during crash recovery, which
means that accepting logins in the usual sense at that stage is not
feasible. But, what if we picked a fixed, hard-coded role name for
this? I would suggest pg_monitor, but that's already taken for
something else, so maybe pg_console or some better thing someone else
can suggest. Without a pg_authid row, you couldn't use password, md5,
or scram authentication, unless we provided some other place to store
the verifier, like a flatfile. I'm not sure we want to go there, but
that still leaves a lot of workable authentication methods.

I think Álvaro is right to see this kind of work as an extension of
pg_isready, but the problem with pg_isready is that we don't want to
expose a lot of information to the whole Internet, or however much of
it can reach the postgres port. But with this approach, you can lock
down access via pg_hba.conf, which means that it's OK to expose
information that we don't want to make available to everyone. I think
we're still limited to exposing what can be observed from shared
memory here, because the whole idea is to have something that can be
used even before consistency is reached, so we shouldn't really be
doing anything that would look at the contents of data files. But that
still leaves a bunch of things that we could show here, the progress
of the startup process being one of them.

--
Robert Haas
EDB: http://www.enterprisedb.com

#8Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#7)
Re: when the startup process doesn't

On Tue, Apr 20, 2021 at 2:43 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 19, 2021 at 8:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... +1 for progress messages in the log, but I'm suspicious about
the complexity-and-fragility-versus-value tradeoff for the other thing.

All right, so it's nice to hear that nobody so far is opposed to the
log-based solution, and I think it's sensible to think about building
that one first and doing anything else later.

But, if we did want to invent something to allow monitoring via libpq
even at this early stage, how would we make it work? Magnus pointed
out that we can hardly read pg_authid during crash recovery, which
means that accepting logins in the usual sense at that stage is not
feasible. But, what if we picked a fixed, hard-coded role name for
this? I would suggest pg_monitor, but that's already taken for
something else, so maybe pg_console or some better thing someone else
can suggest. Without a pg_authid row, you couldn't use password, md5,
or scram authentication, unless we provided some other place to store
the verifier, like a flatfile. I'm not sure we want to go there, but
that still leaves a lot of workable authentication methods.

Another option would be to keep this check entirely outside the scope
of normal roles, and just listen on a port (or unix socket) during
startup which basically just replies with the current status if you
connect to it. On Unix this could also make use of peer authentication
requiring you to be the same user as postgres for example.

I think Álvaro is right to see this kind of work as an extension of
pg_isready, but the problem with pg_isready is that we don't want to
expose a lot of information to the whole Internet, or however much of
it can reach the postgres port. But with this approach, you can lock
down access via pg_hba.conf, which means that it's OK to expose
information that we don't want to make available to everyone. I think
we're still limited to exposing what can be observed from shared
memory here, because the whole idea is to have something that can be
used even before consistency is reached, so we shouldn't really be
doing anything that would look at the contents of data files. But that
still leaves a bunch of things that we could show here, the progress
of the startup process being one of them.

Yeah, I think we should definitely limit this to local access, one way
or another. Realistically using pg_hba is going to require catalog
access, isn't it? And we can't just go ignore those rows in pg_hba
that for example references role membership (as well as all those auth
methods you can't use).

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

In reply to: Magnus Hagander (#8)
Re: when the startup process doesn't

On Tue, 20 Apr 2021 15:04:28 +0200
Magnus Hagander <magnus@hagander.net> wrote:
[...]

Yeah, I think we should definitely limit this to local access, one way
or another. Realistically using pg_hba is going to require catalog
access, isn't it? And we can't just go ignore those rows in pg_hba
that for example references role membership (as well as all those auth
methods you can't use).

Two another options:

1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

2. maybe the startup process could ignore update_process_title? As far
as I understand the doc correctly, this GUC is mostly useful for backends on
Windows.

Regards,

#10Magnus Hagander
magnus@hagander.net
In reply to: Jehan-Guillaume de Rorthais (#9)
Re: when the startup process doesn't

On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:

On Tue, 20 Apr 2021 15:04:28 +0200
Magnus Hagander <magnus@hagander.net> wrote:
[...]

Yeah, I think we should definitely limit this to local access, one way
or another. Realistically using pg_hba is going to require catalog
access, isn't it? And we can't just go ignore those rows in pg_hba
that for example references role membership (as well as all those auth
methods you can't use).

Two another options:

1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

I think doing so in controldata would definitely make things
complicated for no real reason. Plus controldata has a fixed size (and
has to have), whereas something like this would probably want more
variation than that makes easy.

There could be a "startup.status" file I guess which would basically
contain the last line of what would otherwise be in the log. But if it
remains a textfile, I'm not sure what the gain is -- you'll just have
to have the dba look in more places than one to find it? It's not like
there's likely to be much other data written to the log during these
times?

2. maybe the startup process could ignore update_process_title? As far
as I understand the doc correctly, this GUC is mostly useful for backends on
Windows.

You mention Windows -- that would be one excellent reason not to go
for this particular method. Viewing the process title is much harder
on Windows, as there is actually no such thing and we fake it.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#10)
Re: when the startup process doesn't

Magnus Hagander <magnus@hagander.net> writes:

On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:

Two another options:
1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

I think doing so in controldata would definitely make things
complicated for no real reason. Plus controldata has a fixed size (and
has to have), whereas something like this would probably want more
variation than that makes easy.

Also, given that pg_control is as critical a bit of data as we have,
we really don't want to be writing it more often than we absolutely
have to.

There could be a "startup.status" file I guess which would basically
contain the last line of what would otherwise be in the log. But if it
remains a textfile, I'm not sure what the gain is -- you'll just have
to have the dba look in more places than one to find it? It's not like
there's likely to be much other data written to the log during these
times?

Yeah, once you are talking about dumping stuff in a file, it's not
clear how that's better than progress-messages-in-the-log. People
already have a lot of tooling for looking at the postmaster log.

I think the point of Robert's other proposal is to allow remote
checks of the restart's progress, so local files aren't much of
a substitute anyway.

regards, tom lane

#12Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#11)
Re: when the startup process doesn't

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:

Two another options:
1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

I think doing so in controldata would definitely make things
complicated for no real reason. Plus controldata has a fixed size (and
has to have), whereas something like this would probably want more
variation than that makes easy.

Also, given that pg_control is as critical a bit of data as we have,
we really don't want to be writing it more often than we absolutely
have to.

Yeah, don't think pg_control fiddling is what we want. I do agree with
improving the logging situation around here, certainly.

There could be a "startup.status" file I guess which would basically
contain the last line of what would otherwise be in the log. But if it
remains a textfile, I'm not sure what the gain is -- you'll just have
to have the dba look in more places than one to find it? It's not like
there's likely to be much other data written to the log during these
times?

Yeah, once you are talking about dumping stuff in a file, it's not
clear how that's better than progress-messages-in-the-log. People
already have a lot of tooling for looking at the postmaster log.

Agreed.

I think the point of Robert's other proposal is to allow remote
checks of the restart's progress, so local files aren't much of
a substitute anyway.

Yeah, being able to pick up on this remotely seems like it'd be quite
nice. I'm not really thrilled with the idea, but the best I've got
offhand for this would be a new role that's "pg_recovery_login" where an
admin can GRANT that role to the roles they'd like to be able to use to
login during the recovery process and then, for those roles, we write
out flat files to allow authentication without access to pg_authid,
whenever their password or such changes. It's certainly a bit grotty
but I do think it'd work. I definitely don't want to go back to having
all of pg_authid written as a flat file and I'd rather that existing
tools and libraries work with this (meaning using the same port and
speaking the PG protocol and such) rather than inventing some new thing
that listens on some other port, etc.

On the fence about tying this to 'pg_monitor' instead of using a new
predefined role. Either way, I would definitely prefer to see the admin
have to create a role and then GRANT the predefined role to that role.
I really dislike the idea of having predefined roles that can be used to
directly log into the database.

Thanks,

Stephen

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#12)
Re: when the startup process doesn't

Stephen Frost <sfrost@snowman.net> writes:

Yeah, being able to pick up on this remotely seems like it'd be quite
nice. I'm not really thrilled with the idea, but the best I've got
offhand for this would be a new role that's "pg_recovery_login" where an
admin can GRANT that role to the roles they'd like to be able to use to
login during the recovery process and then, for those roles, we write
out flat files to allow authentication without access to pg_authid,

We got rid of those flat files for good and sufficient reasons. I really
really don't want to go back to having such.

I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?

regards, tom lane

#14Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#13)
Re: when the startup process doesn't

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

Yeah, being able to pick up on this remotely seems like it'd be quite
nice. I'm not really thrilled with the idea, but the best I've got
offhand for this would be a new role that's "pg_recovery_login" where an
admin can GRANT that role to the roles they'd like to be able to use to
login during the recovery process and then, for those roles, we write
out flat files to allow authentication without access to pg_authid,

We got rid of those flat files for good and sufficient reasons. I really
really don't want to go back to having such.

Yeah, certainly is part of the reason that I didn't really like that
idea either.

I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?

Ultimately it seems like it would depend on exactly what we are thinking
of returning there. A simple percentage of recovery which has been
completed doesn't seem like it'd really be revealing too much
information though.

Thanks,

Stephen

#15Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#1)
Re: when the startup process doesn't

Hi,

On 2021-04-19 13:55:13 -0400, Robert Haas wrote:

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kinds of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.

The hard part about this seems to be how to perform authentication -
obviously we can't do catalog lookups for users at that time.

If that weren't the issue, we could easily do much better than now, by
just providing an errdetail() with recovery progress information. But we
presumably don't want to spray such information to unauthenticated
connection attempts.

I've vaguely wondered before whether it'd be worth having something like
an "admin" socket somewhere in the data directory. Which explicitly
wouldn't require authentication, have the cluster owner as the user,
etc. That'd not just be useful for monitoring during recovery, but also
make some interactions with the server easier for admin tools I think.

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

However, leaving aside the implementation effort, the crazy idea
above would not easily address the issue of only being accessible with
local access...

Greetings,

Andres Freund

#16Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: when the startup process doesn't

Hi,

On 2021-04-20 14:56:58 -0400, Tom Lane wrote:

I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?

Unfortunately I think something like a percentage is hard to calculate
right now. Even just looking at crash recovery (vs replication or
PITR), we don't currently know where the WAL ends without reading all
the WAL. The easiest thing to return would be something in LSNs or
bytes and I suspect that we don't want to expose either unauthenticated?

I wonder if we ought to occasionally update something like
ControlFileData->minRecoveryPoint on primaries, similar to what we do on
standbys? Then we could actually calculate a percentage, and it'd have
the added advantage of allowing to detect more cases where the end of
the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
adding a lot of fsyncs, but that seems doable?

Greetings,

Andres Freund

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#15)
Re: when the startup process doesn't

On 2021-Apr-20, Andres Freund wrote:

On 2021-04-19 13:55:13 -0400, Robert Haas wrote:

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kinds of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.

The hard part about this seems to be how to perform authentication -
obviously we can't do catalog lookups for users at that time.

Maybe a way to do this would involve some sort of monitoring cookie
that's obtained ahead of time (maybe at initdb time?) and is supplied to
the frontend by some OOB means. Then frontend can present that during
startup to the server, which ascertains its legitimacy without having to
access catalogs. Perhaps it even requires a specific pg_hba.conf rule.

--
�lvaro Herrera Valdivia, Chile
"La verdad no siempre es bonita, pero el hambre de ella s�"

#18SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Robert Haas (#1)
Re: when the startup process doesn't

+1 for both log messages and allowing connections. I believe these two
complement each other.

In the cloud world, we oftentimes want to monitor the progress of the
recovery without connecting to the server as the operators don't
necessarily have the required permissions to connect and query. Secondly,
having this information in the log helps going back in time and understand
where Postgres spent time during recovery.

The ability to query the server provides real time information and come
handy.

Thanks,
Satya

On Mon, Apr 19, 2021 at 10:55 AM Robert Haas <robertmhaas@gmail.com> wrote:

Show quoted text

Hi,

I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (1) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.

It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, e.g. perhaps add messages something like this:

LOG: still syncing data directory, elapsed time %ld.%03d ms, current path
%s
LOG: data directory sync complete after %ld.%03d ms
LOG: still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG: unlogged relations reset after %ld.%03d ms
LOG: still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X

We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and responding by printing a message and clearing the flag.

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kinds of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#16)
Re: when the startup process doesn't

Greetings,

* Andres Freund (andres@anarazel.de) wrote:

On 2021-04-20 14:56:58 -0400, Tom Lane wrote:

I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?

Unfortunately I think something like a percentage is hard to calculate
right now. Even just looking at crash recovery (vs replication or
PITR), we don't currently know where the WAL ends without reading all
the WAL. The easiest thing to return would be something in LSNs or
bytes and I suspect that we don't want to expose either unauthenticated?

While it obviously wouldn't be exactly accurate, I wonder if we couldn't
just look at the WAL files we have to reply and then guess that we'll go
through about half of them before we reach the end..? I mean, wouldn't
exactly be the first time that a percentage progress report wasn't
completely accurate. :)

I wonder if we ought to occasionally update something like
ControlFileData->minRecoveryPoint on primaries, similar to what we do on
standbys? Then we could actually calculate a percentage, and it'd have
the added advantage of allowing to detect more cases where the end of
the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
adding a lot of fsyncs, but that seems doable?

This seems to go against Tom's concerns wrt rewriting pg_control.
Perhaps we could work through a solution to that, which would be nice,
but I'm not sure that we need the percentage to be super accurate
anyway, though, ideally, we'd work it out so that it's always increasing
and doesn't look "stuck" as long as we're actually moving forward
through the WAL.

Maybe a heuristic of 'look at the end of the WAL files, assume we'll go
through 50% of it, but only consider that to be 90%, with the last 10%
going from half-way through the WAL to the actual end of the WAL
available."

Yes, such heuristics are terrible, but they're also relatively simple
and wouldn't require tracking anything additional and would, maybe,
avoid the concern about needing to authenticate the user..

Thanks,

Stephen

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#19)
Re: when the startup process doesn't

Stephen Frost <sfrost@snowman.net> writes:

* Andres Freund (andres@anarazel.de) wrote:

On 2021-04-20 14:56:58 -0400, Tom Lane wrote:

I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?

Unfortunately I think something like a percentage is hard to calculate
right now.

While it obviously wouldn't be exactly accurate, I wonder if we couldn't
just look at the WAL files we have to reply and then guess that we'll go
through about half of them before we reach the end..? I mean, wouldn't
exactly be the first time that a percentage progress report wasn't
completely accurate. :)

Or we could skip all the guessing and just print something like what
the startup process exposes in ps status, ie "currently processing
WAL file so-and-so".

regards, tom lane

#21Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#19)
#22Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#21)
#23Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#22)
#24Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#24)
#26Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#24)
#27Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#25)
#28Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#27)
In reply to: Andres Freund (#21)
#30Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Jehan-Guillaume de Rorthais (#29)
#31Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#30)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#32)
#34Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Tom Lane (#33)
#35Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#34)
#36Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Justin Pryzby (#35)
#37Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#36)
#38Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Justin Pryzby (#37)
#39Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#38)
#40Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Justin Pryzby (#39)
#41Amul Sul
sulamul@gmail.com
In reply to: Nitin Jadhav (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Amul Sul (#41)
#43Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nitin Jadhav (#40)
#44Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Bharath Rupireddy (#43)
#45Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nitin Jadhav (#44)
#46Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#44)
#47Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Justin Pryzby (#46)
#48Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#47)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#48)
#50Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#49)
#51Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#50)
#52Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#51)
#53Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#52)
#54Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#53)
#55Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#54)
#56Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#55)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#56)
#58Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#57)
#59Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#58)
#60Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#59)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#59)
#62Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#61)
#63Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#62)
#64Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#63)
#65Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#64)
#66Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#65)
#67Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#66)
#68Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#67)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#68)
#70Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#50)
#71Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#70)
#72Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#71)
#73Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#72)
#74Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#73)
#75Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#74)
#76Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Michael Paquier (#75)
#77Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Michael Paquier (#75)
#78Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#77)
#79Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Justin Pryzby (#78)
#80Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#79)
#81Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#80)
#82Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#81)
#83Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#82)
#84Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#82)
#85Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#83)
#86Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nitin Jadhav (#82)
#87Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#85)
#88Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#87)
#89Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#88)
#90Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nitin Jadhav (#89)
#91Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#89)
#92Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#90)
#93Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#90)
#94Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#92)
#95Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#93)
#96Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#94)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#95)
#98Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#96)
#99Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#98)
#100Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#99)
#101Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#100)
#102Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#101)
#103Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#102)
#104Amul Sul
sulamul@gmail.com
In reply to: Robert Haas (#91)
#105Robert Haas
robertmhaas@gmail.com
In reply to: Amul Sul (#104)
#106Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#102)
#107Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#106)
#108Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#107)
#109Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#108)
#110Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#109)
#111Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#108)
#112Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#111)
#113Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#112)
#114Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Robert Haas (#113)
#115Robert Haas
robertmhaas@gmail.com
In reply to: Nitin Jadhav (#114)
#116Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#115)
#117Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Thomas Munro (#116)
#118Simon Riggs
simon@2ndQuadrant.com
In reply to: Bharath Rupireddy (#117)
#119Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#118)
#120Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#119)
#121Simon Riggs
simon@2ndQuadrant.com
In reply to: Bharath Rupireddy (#120)
#122Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#120)
#123Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#122)
#124Simon Riggs
simon@2ndQuadrant.com
In reply to: Bharath Rupireddy (#123)
#125Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Simon Riggs (#124)
#126Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#123)
#127Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#126)
#128Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#126)
#129Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#128)
#130Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#129)
#131Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#130)
#132Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#131)
#133Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#132)
#134Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#133)
#135Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#134)
#136Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#135)
#137Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#136)
#138Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#137)
#139Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#138)
#140Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#139)
#141Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#138)
#142Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bharath Rupireddy (#141)
#143Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tom Lane (#142)