Time-Delayed Standbys

Started by Fabrízio de Royes Mello over 12 years ago (63 messages, list: hackers)
#1Fabrízio de Royes Mello
fabriziomello@gmail.com

Hi all,

The attached patch is a continuation of Robert's work [1].

I made some changes:
- use of Latches instead of pg_usleep, so we don't have to wake up regularly.
- call HandleStartupProcInterrupts() before CheckForStandbyTrigger()
because it might change the trigger file's location
- compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks
- don't care about clock drift because it's an admin problem.
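
As an illustration only (this is not code from the patch), the delay computation described in the list above can be sketched in standalone C. The names delay_until and remaining_wait_us, and the use of plain microsecond integers in place of PostgreSQL's TimestampTz, are assumptions made for this sketch; the patch itself waits on a latch rather than returning a duration.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds since an arbitrary epoch; stands in for TimestampTz (assumption). */
typedef int64_t ts_us;

/* Time at which a commit record becomes eligible for replay:
 * the commit timestamp written by the master plus the configured delay (ms). */
ts_us
delay_until(ts_us commit_time, int64_t delay_ms)
{
    assert(delay_ms >= 0);
    return commit_time + delay_ms * 1000;
}

/* How long the startup process would still have to wait, in microseconds.
 * Returns 0 once the target time has passed, so the caller can apply the
 * record immediately instead of sleeping. */
int64_t
remaining_wait_us(ts_us now, ts_us until)
{
    int64_t wait = until - now;

    return wait > 0 ? wait : 0;
}
```

A caller would pass the result of remaining_wait_us as the timeout of whatever wait primitive it uses, and stop waiting as soon as the result reaches zero.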

Regards,

[1]: /messages/by-id/BANLkTi==TTzHDqWzwJDjmOf__8YuA7L1jw@mail.gmail.com

--
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching

Timbira: http://www.timbira.com.br
IT blog: http://fabriziomello.blogspot.com
LinkedIn profile: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello

Attachments:

time-delayed-standby-v1.patch (application/octet-stream; +106 -1)
#2KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Fabrízio de Royes Mello (#1)
Re: Time-Delayed Standbys

Hi Royes,

I'm sorry for my late review...

I see potential in your patch for PG replication, and it could be useful
for everyone. I checked your patch and have some comments for improvement. I
haven't tested unexpected situations in detail yet. But I think Problem2 below
is an important functional problem, so I ask you to solve it first.

* Regress test
No problem.

* Problem1
Your patch does not add recovery_time_delay to recovery.conf.sample.
Please add it.

* Problem2
When I set up a time-delayed standby and start the standby server, I cannot
access the standby server with psql. This is because PG is still in its initial
startup recovery, during which psql cannot connect. I think a time-delayed
standby should only delay the recovery position; it must not affect other
functionality.

I didn't test recovery on the master server with recovery_time_delay. If you
have detailed test results for these cases, please send them to me.

That is all for my first quick review of your patch.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: KONDO Mitsumasa (#2)
Re: Time-Delayed Standbys

On Fri, Nov 29, 2013 at 5:49 AM, KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:

Hi Royes,

I'm sorry for my late review...

No problem...

I see potential in your patch for PG replication, and it could be useful
for everyone. I checked your patch and have some comments for improvement. I
haven't tested unexpected situations in detail yet. But I think Problem2 below
is an important functional problem, so I ask you to solve it first.

* Regress test
No problem.

* Problem1
Your patch does not add recovery_time_delay to recovery.conf.sample.
Please add it.

Fixed.

* Problem2
When I set up a time-delayed standby and start the standby server, I cannot
access the standby server with psql. This is because PG is still in its initial
startup recovery, during which psql cannot connect. I think a time-delayed
standby should only delay the recovery position; it must not affect other
functionality.

I didn't test recovery on the master server with recovery_time_delay. If you
have detailed test results for these cases, please send them to me.

Well, I could not reproduce the problem that you described.

I ran the following test:

1) Clusters
- build master
- build slave and attach it to the master using SR, with
recovery_time_delay set to 1min.

2) Stop the slave

3) Run some transactions on the master using pgbench to generate a lot of
archives

4) Start the slave and connect to it using psql; in another session I
can see all of the archive recovery log

That is all for my first quick review of your patch.

Thanks.

Regards,

--
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching

Timbira: http://www.timbira.com.br
IT blog: http://fabriziomello.blogspot.com
LinkedIn profile: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello

Attachments:

time-delayed-standby-v2.patch (text/x-diff; charset=US-ASCII; +117 -1)
#4Christian Kruse
christian@2ndquadrant.com
In reply to: Fabrízio de Royes Mello (#3)
Re: Time-Delayed Standbys

Hi Fabrizio,

looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:

CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));

The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.

Greetings,
CK

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Christian Kruse (#4)
Re: Time-Delayed Standbys

On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse
<christian@2ndquadrant.com> wrote:

Hi Fabrizio,

looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:

CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));

The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.

Thanks for your review Christian...

Regards,

--
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching

Timbira: http://www.timbira.com.br
IT blog: http://fabriziomello.blogspot.com
LinkedIn profile: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello

#6Robert Haas
robertmhaas@gmail.com
In reply to: Fabrízio de Royes Mello (#5)
Re: Time-Delayed Standbys

On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:

On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com>
wrote:

Hi Fabrizio,

looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:

CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));

The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.

Thanks for your review Christian...

So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed. Unless some of those people have
changed their minds, I don't think this patch has much future here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#7Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#6)
Re: Time-Delayed Standbys

On 12/03/2013 10:46 AM, Robert Haas wrote:

On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:

On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com>
wrote:

Hi Fabrizio,

looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:

CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));

The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.

Thanks for your review Christian...

So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed. Unless some of those people have
changed their minds, I don't think this patch has much future here.

I would agree that it is a good idea.

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats


#8Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#6)
Re: Time-Delayed Standbys

On 2013-12-03 13:46:28 -0500, Robert Haas wrote:

On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:

On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com>
wrote:

Hi Fabrizio,

looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:

CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));

The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.

Thanks for your review Christian...

So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed.

I really fail to see why clock drift should be this patch's
responsibility. It's not like the world would go under^W data corruption
would ensue if the clocks drift. Your standby would get delayed
imprecisely. Big deal. From what I know of potential users of this
feature, they would set it to at the very least 30min - that's WAY above
the range for acceptable clock-drift on servers.
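
To make the arithmetic concrete: the commit timestamp comes from the master's clock, while the standby compares it against its own clock, so an offset between the two shifts the observed delay by that offset and nothing more. A small C sketch follows; the function name effective_delay_ms and the sign convention for the offset are assumptions made for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Delay actually observed in real time (ms) when the standby's clock is
 * skew_ms ahead of (positive) or behind (negative) the master's clock.
 * The standby replays once its own clock reaches commit_time + delay;
 * since its clock reads true_time + skew, it waits delay - skew of real
 * time. A result of zero means the record is applied without waiting. */
int64_t
effective_delay_ms(int64_t configured_delay_ms, int64_t skew_ms)
{
    int64_t observed;

    assert(configured_delay_ms >= 0);
    observed = configured_delay_ms - skew_ms;
    return observed > 0 ? observed : 0;
}
```

With a 30-minute delay (1800000 ms) and a standby clock 2 seconds fast, the observed delay is 1798000 ms, an error of roughly 0.1%.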

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Fabrízio de Royes Mello (#1)
Re: Time-Delayed Standbys

On 18 October 2013 19:03, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:

The attached patch is a continuation of Robert's work [1].

Reviewing v2...

I made some changes:
- use of Latches instead of pg_usleep, so we don't have to wakeup regularly.

OK

- call HandleStartupProcInterrupts() before CheckForStandbyTrigger() because
might change the trigger file's location

OK

- compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks

Why just those? Why not aborts and restore points also?

- don't care about clockdrift because it's an admin problem.

A few minor points:

* The code with comment "Clear any previous recovery delay time" is in
wrong place, move down or remove completely. Setting the delay to zero
doesn't prevent calling recoveryDelay(), so that logic looks wrong
anyway.

* The loop exit in recoveryDelay() is inelegant, should break if <= 0

* There's a spelling mistake in sample

* The patch has whitespace in one place

and one important point...

* The delay loop happens AFTER we check for a pause. This means that if
the user notices a problem on a commit and then hits the pause button on the
standby, the pause will have no effect and the next commit will be
applied anyway. Maybe just one commit, but it's an off-by-one error
that removes the benefit of the patch. So I think we need to test this
again after we finish delaying:

    if (xlogctl->recoveryPause)
        recoveryPausesHere();
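
The ordering problem can be modelled in a few lines of standalone C. This is a simplified simulation, not PostgreSQL code: RecoveryState, request_pause and replay_commit are hypothetical names, and the "delay" is represented by a callback that stands for whatever happens while the startup process is waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared recovery state. */
typedef struct
{
    bool pause_requested;   /* set when the user pauses replay */
    int  applied;           /* number of commit records applied so far */
} RecoveryState;

/* Simulates an event that occurs while the delay is being waited out. */
typedef void (*during_delay_fn)(RecoveryState *);

void
request_pause(RecoveryState *st)
{
    st->pause_requested = true;
}

/* Replays one commit record. Returns true if the record was applied and
 * false if replay stopped at a pause. When recheck_after_delay is false,
 * a pause requested during the delay goes unnoticed for this record. */
bool
replay_commit(RecoveryState *st, during_delay_fn during_delay,
              bool recheck_after_delay)
{
    assert(st != NULL);

    if (st->pause_requested)
        return false;               /* pause check before the delay */

    if (during_delay != NULL)
        during_delay(st);           /* the delay elapses here */

    if (recheck_after_delay && st->pause_requested)
        return false;               /* second check after the delay */

    st->applied++;
    return true;
}
```

Calling replay_commit with request_pause as the callback shows the difference: without the second check the record is still applied, with it the record is held back.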

We need to explain in the docs that this is intended only for use in a
live streaming deployment. It will have little or no meaning in a
PITR.

I think recovery_time_delay should be called
<something>_apply_delay
to highlight the point that it is the apply of records that is
delayed, not the receipt. And hence the need to document that sync rep
is NOT slowed down by setting this value.

And to make the name consistent with other parameters, I suggest
min_standby_apply_delay

We also need to document caveats about the patch, in that it only
delays on timestamped WAL records and other records may be applied
sooner than the delay in some circumstances, so it is not a way to
avoid all cancellations.

We also need to document that the behaviour of the patch is to apply all
data received as quickly as possible once triggered, so the specified
delay does not slow down promoting the server to a master. That might
also be seen as a negative behaviour, since promotion
effectively sets recovery_time_delay to zero.

I will handle the additional documentation, if you can update the
patch with the main review comments. Thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#10KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Fabrízio de Royes Mello (#3)
Re: Time-Delayed Standbys

(2013/11/30 5:34), Fabrízio de Royes Mello wrote:

On Fri, Nov 29, 2013 at 5:49 AM, KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:

* Problem1
Your patch does not add recovery_time_delay to recovery.conf.sample.
Please add it.

Fixed.

OK. That looks fine.

* Problem2
When I set up a time-delayed standby and start the standby server, I cannot
access the standby server with psql. This is because PG is still in its initial
startup recovery, during which psql cannot connect. I think a time-delayed
standby should only delay the recovery position; it must not affect other
functionality.

I didn't test recovery on the master server with recovery_time_delay. If you
have detailed test results for these cases, please send them to me.

Well, I could not reproduce the problem that you described.

I ran the following test:

1) Clusters
- build master
- build slave and attach it to the master using SR, with
recovery_time_delay set to 1min.

2) Stop the slave

3) Run some transactions on the master using pgbench to generate a lot of
archives

4) Start the slave and connect to it using psql; in another session I
can see all of the archive recovery log

Hmm... I thought I had misread your email, but I reproduced it again.
When I set a small recovery_time_delay (=30000), it seemed to work correctly.
However, when I set a long recovery_time_delay (=3000000), it didn't work.

My reproduction log follows.

[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 68704
latency average: 3.493 ms
tps = 2289.196747 (including connections establishing)
tps = 2290.175129 (excluding connections establishing)
[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/5C4D8668
LOG: redo starts at 0/5C4000D8
[mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up

I attached my postgresql.conf and recovery.conf. They should reproduce the problem.

I think your patch needs to check recovery flags such as
ArchiveRecoveryRequested and InArchiveRecovery, because a time-delayed
standby only makes sense in a replication situation. I also hope it does no
harm when starting up a standby server or during archive recovery. Does this
differ from what you have in mind? I think this patch has a lot of potential;
however, I think standby functionality is more important than this feature.
We might need to discuss which behavior is best for this patch.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

Attachments:

conf.tar.gz (application/x-gzip)
#11KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Andres Freund (#8)
Re: Time-Delayed Standbys

(2013/12/04 4:00), Andres Freund wrote:

On 2013-12-03 13:46:28 -0500, Robert Haas wrote:

On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:

On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com>
wrote:

Hi Fabrizio,

looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:

CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));

The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.

Thanks for your review Christian...

So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed.

I really fail to see why clock drift should be this patch's
responsibility. It's not like the world would go under^W data corruption
would ensue if the clocks drift. Your standby would get delayed
imprecisely. Big deal. From what I know of potential users of this
feature, they would set it to at the very least 30min - that's WAY above
the range for acceptable clock-drift on servers.

Yes. I think the purpose of this patch is a long delay on the standby server,
not finely controlled timing.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


#12Andres Freund
andres@anarazel.de
In reply to: KONDO Mitsumasa (#10)
Re: Time-Delayed Standbys

On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote:

4) Start the slave and connect to it using psql; in another session I
can see all of the archive recovery log

Hmm... I thought I had misread your email, but I reproduced it again.
When I set a small recovery_time_delay (=30000), it seemed to work correctly.
However, when I set a long recovery_time_delay (=3000000), it didn't work.

My reproduction log follows.

[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 68704
latency average: 3.493 ms
tps = 2289.196747 (including connections establishing)
tps = 2290.175129 (excluding connections establishing)
[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/5C4D8668
LOG: redo starts at 0/5C4000D8
[mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up

I attached my postgresql.conf and recovery.conf. It will be reproduced.

So, you brought up a standby and it took more time to become consistent
because it waited on commits? That's the problem? If so, I don't think
that's a bug?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13Christian Kruse
christian@2ndquadrant.com
In reply to: KONDO Mitsumasa (#10)
Re: Time-Delayed Standbys

Hi,

On 04/12/13 11:13, KONDO Mitsumasa wrote:

1) Clusters
- build master
- build slave and attach it to the master using SR, with
recovery_time_delay set to 1min.

2) Stop the slave

3) Run some transactions on the master using pgbench to generate a lot of
archives

4) Start the slave and connect to it using psql; in another session I
can see all of the archive recovery log

Hmm... I thought I had misread your email, but I reproduced it again.
When I set a small recovery_time_delay (=30000), it seemed to work correctly.
However, when I set a long recovery_time_delay (=3000000), it didn't work.
[…]

I'm not sure I understand your problem correctly. I'll try to
summarize, please correct me if I'm wrong:

You created a master node and a hot standby with 3000000 delay. Then
you stopped the standby, ran pgbench and started the hot standby
again. It did not get in line with the master. Is this correct?

I don't see a problem here… the standby should not be in sync with the
master, it should be delayed. I did step by step what you did and
after 50 minutes (3000000ms) the standby was at the same level the
master was.

Did I misunderstand you?

Regards,
Christian Kruse

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14Mitsumasa KONDO
kondo.mitsumasa@gmail.com
In reply to: Andres Freund (#12)
Re: Time-Delayed Standbys

2013/12/4 Andres Freund <andres@2ndquadrant.com>

On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote:

4) Start the slave and connect to it using psql; in another session I
can see all of the archive recovery log

Hmm... I thought I had misread your email, but I reproduced it again.
When I set a small recovery_time_delay (=30000), it seemed to work correctly.
However, when I set a long recovery_time_delay (=3000000), it didn't work.

My reproduction log follows.

[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 68704
latency average: 3.493 ms
tps = 2289.196747 (including connections establishing)
tps = 2290.175129 (excluding connections establishing)
[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/5C4D8668
LOG: redo starts at 0/5C4000D8
[mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up

FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up

I attached my postgresql.conf and recovery.conf. It will be reproduced.

So, you brought up a standby and it took more time to become consistent
because it waited on commits? That's the problem? If so, I don't think
that's a bug?

When this happens, psql cannot connect to the standby server at all. I think
this behavior is not good.
The feature should only delay the recovery position, so that old (delayed)
table data can still be seen. Being unable to connect to the server is not the
desired behavior.
If you think this behavior is best, I will mark the patch ready for committer,
and the committer can improve it.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

#15Andres Freund
andres@anarazel.de
In reply to: Mitsumasa KONDO (#14)
Re: Time-Delayed Standbys

On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote:

2013/12/4 Andres Freund <andres@2ndquadrant.com>
When this happens, psql cannot connect to the standby server at all. I think
this behavior is not good.
The feature should only delay the recovery position, so that old (delayed)
table data can still be seen.

That doesn't sound like a good plan - even if the clients cannot connect
yet, you can still promote the server. Just not taking delay into
consideration at that point seems like it would possibly surprise users
rather badly in situations they really cannot use such surprises.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#16Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#9)
Re: Time-Delayed Standbys

Hi,

On 2013-12-03 19:33:16 +0000, Simon Riggs wrote:

- compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks

Why just those? Why not aborts and restore points also?

What would the advantage of waiting on anything but commits be? If it's
not a commit, the action won't change the state of the database (yesyes,
there are exceptions, but those don't have a timestamp)...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#17Mitsumasa KONDO
kondo.mitsumasa@gmail.com
In reply to: Christian Kruse (#13)
Re: Time-Delayed Standbys

2013/12/4 Christian Kruse <christian@2ndquadrant.com>

You created a master node and a hot standby with 3000000 delay. Then
you stopped the standby, ran pgbench and started the hot standby
again. It did not get in line with the master. Is this correct?

No. First, I started the master and ran pgbench. Second, I started the standby
with a 3000000ms (50min) delay.
Then I could not connect to the standby server with psql at all. I'm not sure
why the standby did not start.
It might be because the delay feature interferes with the REDO loop during the
standby's first start-up.

I don't see a problem here… the standby should not be in sync with the
master, it should be delayed. I did step by step what you did and
after 50 minutes (3000000ms) the standby was at the same level the
master was.

I think we should be able to connect to the standby server at any time, even
with the delay option.

Did I misunderstand you?

I'm not sure... You might be right, or there might be another, better way.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

#18Mitsumasa KONDO
kondo.mitsumasa@gmail.com
In reply to: Andres Freund (#15)
Re: Time-Delayed Standbys

2013/12/4 Andres Freund <andres@2ndquadrant.com>

On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote:

2013/12/4 Andres Freund <andres@2ndquadrant.com>
When this happens, psql cannot connect to the standby server at all. I think
this behavior is not good.
The feature should only delay the recovery position, so that old (delayed)
table data can still be seen.

That doesn't sound like a good plan - even if the clients cannot connect
yet, you can still promote the server.

I'm not sure I follow your argument, but doesn't that miss the purpose of this patch?

Just not taking delay into
consideration at that point seems like it would possibly surprise users
rather badly in situations they really cannot use such surprises.

Hmm... I think users will be surprised...

I think it is easy to fix the behavior using a recovery flag.
So we had better wait for other comments.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

#19Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#6)
Re: Time-Delayed Standbys

Robert Haas <robertmhaas@gmail.com> wrote:

So, I proposed this patch previously and I still think it's a
good idea, but it got voted down on the grounds that it didn't
deal with clock drift.  I view that as insufficient reason to
reject the feature, but others disagreed.  Unless some of those
people have changed their minds, I don't think this patch has
much future here.

There are many things that a system admin can get wrong.  Failing
to supply this feature because the sysadmin might not be running
ntpd (or equivalent) correctly seems to me to be like not having
the software do fsync because the sysadmin might not have turned
off write-back buffering on drives without persistent storage.
Either way, poor system management can defeat the feature.  Either
way, I see no reason to withhold the feature from those who manage
their systems in a sane fashion.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#20Christian Kruse
christian@2ndquadrant.com
In reply to: Kevin Grittner (#19)
Re: Time-Delayed Standbys

Hi,

On 04/12/13 07:22, Kevin Grittner wrote:

There are many things that a system admin can get wrong.  Failing
to supply this feature because the sysadmin might not be running
ntpd (or equivalent) correctly seems to me to be like not having
the software do fsync because the sysadmin might not have turned
off write-back buffering on drives without persistent storage.
Either way, poor system management can defeat the feature.  Either
way, I see no reason to withhold the feature from those who manage
their systems in a sane fashion.

I agree. But maybe we should add a warning in the documentation about
time syncing?

Greetings,
CK

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
