testing hot standby
Hi,
I'm starting to try Hot Standby & Streaming Replication, so I set up replication:
1) Install master server with regression database
2) Start WAL archive (archive_mode=on, archive_command='cp %p
/usr/local/pgsql/wal_archive/%f')
3) select pg_start_backup('standby test');
4) cp -R /usr/local/pgsql/9.0/data /usr/local/pgsql/9.0slave/data
5) select pg_stop_backup();
at this point I checked the wal_archive directory:
"""
postgres@casanova14:/usr/local/pgsql/9.0$ ls ../wal_archive/
000000010000000000000003 000000010000000000000004
000000010000000000000004.00000020.backup
"""
6) started standby recovery (archive_mode=off, standby_mode=on, primary_conninfo = 'host=127.0.0.1 port=5432 user=postgres')
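For reference, the standby's recovery.conf was roughly the following (a sketch; the restore_command is assumed to mirror the archive_command above, and it is what produces the cp messages in the logs below):
"""
# /usr/local/pgsql/9.0slave/data/recovery.conf (sketch)
standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5432 user=postgres'
# assumed counterpart of the primary's archive_command
restore_command = 'cp /usr/local/pgsql/wal_archive/%f %p'
"""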
waited a little and checked the logs:
"""
LOG: database system was interrupted; last known up at 2010-04-09 14:48:16 ECT
LOG: entering standby mode
LOG: restored log file "000000010000000000000004" from archive
LOG: redo starts at 0/4000020
LOG: consistent recovery state reached at 0/5000000
LOG: database system is ready to accept read only connections
LOG: restored log file "000000010000000000000005" from archive
cp: cannot stat `/usr/local/pgsql/wal_archive/000000010000000000000006': No such file or directory
LOG: unexpected pageaddr 0/2000000 in log file 0, segment 6, offset 0
cp: cannot stat `/usr/local/pgsql/wal_archive/000000010000000000000006': No such file or directory
LOG: streaming replication successfully connected to primary
"""
mmm... are we waiting for a WAL file that doesn't exist?
7) I then restarted the standby server
"""
LOG: received smart shutdown request
FATAL: terminating walreceiver process due to administrator command
LOG: shutting down
LOG: database system is shut down
LOG: database system was interrupted while in recovery at log time
2010-04-09 15:06:23 ECT
HINT: If this has occurred more than once some data might be
corrupted and you might need to choose an earlier recovery target.
LOG: entering standby mode
cp: cannot stat `/usr/local/pgsql/wal_archive/000000010000000000000006': No such file or directory
LOG: invalid record length at 0/6000080
cp: cannot stat `/usr/local/pgsql/wal_archive/000000010000000000000006': No such file or directory
"""
8) I initialized the pgbench tables (bin/pgbench -i), which created the missing WAL files;
the standby could then connect to the primary, and some minutes later it
could accept connections
"""
LOG: streaming replication successfully connected to primary
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
LOG: redo starts at 0/6000080
LOG: consistent recovery state reached at 0/60000A0
FATAL: the database system is starting up
LOG: database system is ready to accept read only connections
"""
But my main concern is: why was it asking for "000000010000000000000006"? Is this normal? Is this the standby's way of saying "I'm working but I have nothing to do"?
When that happens after a standby restart, is it normal that I have to wait until the file is created before it can accept connections?
Sorry if these questions sound very simple, but I haven't been following all the design details :)
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Fri, Apr 9, 2010 at 3:39 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
Hi,
I'm starting to try Hot Standby & Streaming Replication, so I set up replication:
i think "make standbycheck" needs a little more work, why it isn't
accesible from top of source dir?
For now, to exercise it I have to do the following (on the standby server):
- cd src/test/regress
- psql -h ip_primary -f sql/hs_primary_setup.sql
- make standbycheck (2 tests fail due to differences in the expected messages; a patch to fix them is attached)
Just an idea: can't we use the info in primary_conninfo to know how to execute the hs_primary_setup.sql script on the primary?
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
Attachment: standbycheck.patch (text/x-patch, charset=US-ASCII)
Index: src/test/regress/expected/hs_standby_disallowed.out
===================================================================
RCS file: /home/postgres/pg_repo/pgsql/src/test/regress/expected/hs_standby_disallowed.out,v
retrieving revision 1.1
diff -c -r1.1 hs_standby_disallowed.out
*** src/test/regress/expected/hs_standby_disallowed.out 19 Dec 2009 01:32:45 -0000 1.1
--- src/test/regress/expected/hs_standby_disallowed.out 10 Apr 2010 04:26:48 -0000
***************
*** 11,50 ****
WARNING: there is no transaction in progress
-- SELECT
select * from hs1 FOR SHARE;
! ERROR: transaction is read-only
select * from hs1 FOR UPDATE;
! ERROR: transaction is read-only
-- DML
BEGIN;
insert into hs1 values (37);
! ERROR: transaction is read-only
ROLLBACK;
BEGIN;
delete from hs1 where col1 = 1;
! ERROR: transaction is read-only
ROLLBACK;
BEGIN;
update hs1 set col1 = NULL where col1 > 0;
! ERROR: transaction is read-only
ROLLBACK;
BEGIN;
truncate hs3;
! ERROR: transaction is read-only
ROLLBACK;
-- DDL
create temporary table hstemp1 (col1 integer);
! ERROR: transaction is read-only
BEGIN;
drop table hs2;
! ERROR: transaction is read-only
ROLLBACK;
BEGIN;
create table hs4 (col1 integer);
! ERROR: transaction is read-only
ROLLBACK;
-- Sequences
SELECT nextval('hsseq');
! ERROR: cannot be executed during recovery
-- Two-phase commit transaction stuff
BEGIN;
SELECT count(*) FROM hs1;
--- 11,50 ----
WARNING: there is no transaction in progress
-- SELECT
select * from hs1 FOR SHARE;
! ERROR: cannot execute SELECT FOR SHARE in a read-only transaction
select * from hs1 FOR UPDATE;
! ERROR: cannot execute SELECT FOR UPDATE in a read-only transaction
-- DML
BEGIN;
insert into hs1 values (37);
! ERROR: cannot execute INSERT in a read-only transaction
ROLLBACK;
BEGIN;
delete from hs1 where col1 = 1;
! ERROR: cannot execute DELETE in a read-only transaction
ROLLBACK;
BEGIN;
update hs1 set col1 = NULL where col1 > 0;
! ERROR: cannot execute UPDATE in a read-only transaction
ROLLBACK;
BEGIN;
truncate hs3;
! ERROR: cannot execute TRUNCATE TABLE in a read-only transaction
ROLLBACK;
-- DDL
create temporary table hstemp1 (col1 integer);
! ERROR: cannot execute CREATE TABLE in a read-only transaction
BEGIN;
drop table hs2;
! ERROR: cannot execute DROP TABLE in a read-only transaction
ROLLBACK;
BEGIN;
create table hs4 (col1 integer);
! ERROR: cannot execute CREATE TABLE in a read-only transaction
ROLLBACK;
-- Sequences
SELECT nextval('hsseq');
! ERROR: cannot execute nextval() in a read-only transaction
-- Two-phase commit transaction stuff
BEGIN;
SELECT count(*) FROM hs1;
***************
*** 54,60 ****
(1 row)
PREPARE TRANSACTION 'foobar';
! ERROR: cannot be executed during recovery
ROLLBACK;
BEGIN;
SELECT count(*) FROM hs1;
--- 54,60 ----
(1 row)
PREPARE TRANSACTION 'foobar';
! ERROR: cannot execute PREPARE TRANSACTION during recovery
ROLLBACK;
BEGIN;
SELECT count(*) FROM hs1;
***************
*** 64,70 ****
(1 row)
COMMIT PREPARED 'foobar';
! ERROR: cannot be executed during recovery
ROLLBACK;
BEGIN;
SELECT count(*) FROM hs1;
--- 64,70 ----
(1 row)
COMMIT PREPARED 'foobar';
! ERROR: COMMIT PREPARED cannot run inside a transaction block
ROLLBACK;
BEGIN;
SELECT count(*) FROM hs1;
***************
*** 74,80 ****
(1 row)
PREPARE TRANSACTION 'foobar';
! ERROR: cannot be executed during recovery
ROLLBACK PREPARED 'foobar';
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
--- 74,80 ----
(1 row)
PREPARE TRANSACTION 'foobar';
! ERROR: cannot execute PREPARE TRANSACTION during recovery
ROLLBACK PREPARED 'foobar';
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
***************
*** 86,137 ****
(1 row)
ROLLBACK PREPARED 'foobar';
! ERROR: cannot be executed during recovery
ROLLBACK;
-- Locks
BEGIN;
LOCK hs1;
! ERROR: cannot be executed during recovery
COMMIT;
BEGIN;
LOCK hs1 IN SHARE UPDATE EXCLUSIVE MODE;
! ERROR: cannot be executed during recovery
COMMIT;
BEGIN;
LOCK hs1 IN SHARE MODE;
! ERROR: cannot be executed during recovery
COMMIT;
BEGIN;
LOCK hs1 IN SHARE ROW EXCLUSIVE MODE;
! ERROR: cannot be executed during recovery
COMMIT;
BEGIN;
LOCK hs1 IN EXCLUSIVE MODE;
! ERROR: cannot be executed during recovery
COMMIT;
BEGIN;
LOCK hs1 IN ACCESS EXCLUSIVE MODE;
! ERROR: cannot be executed during recovery
COMMIT;
-- Listen
listen a;
! ERROR: cannot be executed during recovery
notify a;
! ERROR: cannot be executed during recovery
unlisten a;
! ERROR: cannot be executed during recovery
unlisten *;
! ERROR: cannot be executed during recovery
-- disallowed commands
ANALYZE hs1;
! ERROR: cannot be executed during recovery
VACUUM hs2;
! ERROR: cannot be executed during recovery
CLUSTER hs2 using hs1_pkey;
! ERROR: cannot be executed during recovery
REINDEX TABLE hs2;
! ERROR: cannot be executed during recovery
REVOKE SELECT ON hs1 FROM PUBLIC;
! ERROR: transaction is read-only
GRANT SELECT ON hs1 TO PUBLIC;
! ERROR: transaction is read-only
--- 86,137 ----
(1 row)
ROLLBACK PREPARED 'foobar';
! ERROR: ROLLBACK PREPARED cannot run inside a transaction block
ROLLBACK;
-- Locks
BEGIN;
LOCK hs1;
! ERROR: cannot execute LOCK TABLE during recovery
COMMIT;
BEGIN;
LOCK hs1 IN SHARE UPDATE EXCLUSIVE MODE;
! ERROR: cannot execute LOCK TABLE during recovery
COMMIT;
BEGIN;
LOCK hs1 IN SHARE MODE;
! ERROR: cannot execute LOCK TABLE during recovery
COMMIT;
BEGIN;
LOCK hs1 IN SHARE ROW EXCLUSIVE MODE;
! ERROR: cannot execute LOCK TABLE during recovery
COMMIT;
BEGIN;
LOCK hs1 IN EXCLUSIVE MODE;
! ERROR: cannot execute LOCK TABLE during recovery
COMMIT;
BEGIN;
LOCK hs1 IN ACCESS EXCLUSIVE MODE;
! ERROR: cannot execute LOCK TABLE during recovery
COMMIT;
-- Listen
listen a;
! ERROR: cannot execute LISTEN during recovery
notify a;
! ERROR: cannot execute NOTIFY during recovery
unlisten a;
! ERROR: cannot execute UNLISTEN during recovery
unlisten *;
! ERROR: cannot execute UNLISTEN during recovery
-- disallowed commands
ANALYZE hs1;
! ERROR: cannot execute VACUUM during recovery
VACUUM hs2;
! ERROR: cannot execute VACUUM during recovery
CLUSTER hs2 using hs1_pkey;
! ERROR: cannot execute CLUSTER during recovery
REINDEX TABLE hs2;
! ERROR: cannot execute REINDEX during recovery
REVOKE SELECT ON hs1 FROM PUBLIC;
! ERROR: cannot execute REVOKE in a read-only transaction
GRANT SELECT ON hs1 TO PUBLIC;
! ERROR: cannot execute GRANT in a read-only transaction
Index: src/test/regress/expected/hs_standby_functions.out
===================================================================
RCS file: /home/postgres/pg_repo/pgsql/src/test/regress/expected/hs_standby_functions.out,v
retrieving revision 1.1
diff -c -r1.1 hs_standby_functions.out
*** src/test/regress/expected/hs_standby_functions.out 19 Dec 2009 01:32:45 -0000 1.1
--- src/test/regress/expected/hs_standby_functions.out 10 Apr 2010 04:28:25 -0000
***************
*** 5,11 ****
--
-- should fail
select txid_current();
! ERROR: cannot be executed during recovery
select length(txid_current_snapshot()::text) >= 4;
?column?
----------
--- 5,11 ----
--
-- should fail
select txid_current();
! ERROR: cannot execute txid_current() during recovery
select length(txid_current_snapshot()::text) >= 4;
?column?
----------
On Sat, Apr 10, 2010 at 5:39 AM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
I'm starting to try Hot Standby & Streaming Replication, so I set up replication:
Great!
But my main concern is: why was it asking for "000000010000000000000006"? Is this normal?
The standby server tries to replay all of the available WAL files in the
archive and pg_xlog directory at first. Then, when an invalid record is
found or no more WAL files are available, it starts streaming replication
and tries to read the missing WAL files from the primary.
In your testing, an invalid record was found in 000000010000000000000006,
then the standby was waiting for it to be shipped from the primary.
Is this the standby's way of saying "I'm working but I have nothing to do"?
Since SR is record-based log-shipping, the standby can receive WAL records from the primary before the WAL file has been filled up. So, in your case, I guess the standby was receiving the WAL records that belong to 000000010000000000000006 from the primary.
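You can watch that from SQL, for example (a sketch; 5433 is just a placeholder for whatever port your standby listens on):
"""
# on the standby: WAL received vs. replayed so far
psql -p 5433 -c "select pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"

# on the primary: current WAL write position, for comparison
psql -p 5432 -c "select pg_current_xlog_location();"
"""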
When that happens after a standby restart, is it normal that I have to wait until the file is created before it can accept connections?
No. All the WAL records needed for the standby to accept connections should be
shipped before any additional WAL records are written. Didn't the standby
accept connections before executing pgbench?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Apr 9, 2010 at 3:39 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
But my main concern is: why was it asking for "000000010000000000000006"? Is this normal? Is this the standby's way of saying "I'm working but I have nothing to do"?
When that happens after a standby restart, is it normal that I have to wait until the file is created before it can accept connections?
OK, I see this again in a new environment. It seems like this happens when I shut down the standby and then the primary (in that order) after making some WAL-logged action on the primary, and then start the primary and then the standby again (in that order)... It doesn't always occur, but it does occur too often, and I'm still not sure what the key factor is that triggers it.
The standby waits for a file that doesn't exist in order to reach a consistent state (last time I waited for an hour before forcing a WAL-logged action); here is an extract from the standby's log:
"""
postgres@casanova1:/usr/local/pgsql/9.0slave$ cat
data/pg_log/postgresql-2010-04-12_000947.log
LOG: database system was interrupted while in recovery at log time
2010-04-11 20:44:09 GMT
HINT: If this has occurred more than once some data might be
corrupted and you might need to choose an earlier recovery target.
LOG: entering standby mode
LOG: restored log file "000000010000000E00000014" from archive
LOG: redo starts at E/14000088
LOG: consistent recovery state reached at E/15000000
cp: cannot stat `/usr/local/pgsql/wal_archive/000000010000000E00000015': No such file or directory
LOG: unexpected pageaddr D/EE000000 in log file 14, segment 21, offset 0
cp: cannot stat `/usr/local/pgsql/wal_archive/000000010000000E00000015': No such file or directory
LOG: streaming replication successfully connected to primary
"""
another point, what happened with this:
http://archives.postgresql.org/message-id/1229549172.4793.105.camel@ebony.2ndQuadrant?
Obviously we still have the problem with hash indexes. In that thread Tom's advice was just to document the issue, and while that could be fine, at least we should be emitting better messages. Consider this one that I got on the standby server (where 4658650 is the oid of a hash index):
"""
mic=# explain analyze select * from tt1 where col1 = 5000;
ERROR: could not read block 0 in file "base/21958/4658650": read only
0 of 8192 bytes
"""
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Mon, Apr 12, 2010 at 1:21 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Didn't the standby
accept connections before executing pgbench?
Nope, and the last time I tried, it was in that state for an hour (without accepting connections)... After that I executed on the primary: CREATE TABLE tt2 AS SELECT generate_series(1, 100) as i
After that, the standby started accepting connections.
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Mon, Apr 12, 2010 at 3:29 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
On Mon, Apr 12, 2010 at 1:21 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Didn't the standby accept connections before executing pgbench?
Nope, and the last time I tried, it was in that state for an hour (without accepting connections)... After that I executed on the primary: CREATE TABLE tt2 AS SELECT generate_series(1, 100) as i
After that, the standby started accepting connections.
OK. Is your reproduction scenario the following?
If not, could you show me the complete scenario?
1. start the primary
2. pg_start_backup()
3. copy $PGDATA from the primary to the standby
4. pg_stop_backup();
5. create the recovery.conf and start the standby
6. shutdown (smart mode) the standby
7. start the standby again
--> cannot accept connections until new WAL has been created
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Jaime Casanova wrote:
On Fri, Apr 9, 2010 at 3:39 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
But my main concern is: why was it asking for "000000010000000000000006"? Is this normal? Is this the standby's way of saying "I'm working but I have nothing to do"?
Yes.
When that happens after a standby restart, is it normal that I have to wait until the file is created before it can accept connections?
OK, I see this again in a new environment. It seems like this happens when I shut down the standby and then the primary (in that order) after making some WAL-logged action on the primary, and then start the primary and then the standby again (in that order)... It doesn't always occur, but it does occur too often, and I'm still not sure what the key factor is that triggers it.
Perhaps you're being bitten by the "can't start hot standby from a shutdown checkpoint" issue I've complained about for a long time. There's a pending patch for that, see
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00222.php.
If that's what's hitting you, the way to reproduce is:
1. shut down primary
2. shut down standby
3. start standby
4. start primary.
The standby will hang until it receives an online checkpoint record from the primary. Connecting to the primary and issuing a manual CHECKPOINT helps.
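That is, something like this (using the connection settings from your primary_conninfo):
"""
# force an online checkpoint on the primary to unblock the standby
psql -h 127.0.0.1 -p 5432 -U postgres -c "CHECKPOINT;"
"""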
That's not the order of shutdowns you described, though...
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Apr 12, 2010 at 1:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Apr 12, 2010 at 3:29 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
On Mon, Apr 12, 2010 at 1:21 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Didn't the standby accept connections before executing pgbench?
Nope, and the last time I tried, it was in that state for an hour (without accepting connections)... After that I executed on the primary: CREATE TABLE tt2 AS SELECT generate_series(1, 100) as i
After that, the standby started accepting connections.
OK. Is your reproduction scenario the following?
If not, could you show me the complete scenario?
1. start the primary
2. pg_start_backup()
3. copy $PGDATA from the primary to the standby
4. pg_stop_backup();
5. create the recovery.conf and start the standby
Execute some WAL-logged action (I've seen this happen even with no WAL-logged action if I wait for a while before shutting down the servers)
6. shutdown (smart mode) the standby
shutdown (smart) the primary
start the primary again
7. start the standby again
--> cannot accept connections until new WAL has been created
a manual CHECKPOINT on the primary does help
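For the record, the whole sequence as a shell sketch (paths are the ones from my first mail; the pg_ctl invocations and the standby port are illustrative):
"""
# 1-5: base backup and standby setup, as in my first mail
psql -p 5432 -c "select pg_start_backup('standby test');"
cp -R /usr/local/pgsql/9.0/data /usr/local/pgsql/9.0slave/data
psql -p 5432 -c "select pg_stop_backup();"
# put recovery.conf in 9.0slave/data (standby_mode, primary_conninfo,
# restore_command) and give the standby its own port, e.g. 5433
pg_ctl -D /usr/local/pgsql/9.0slave/data start

# some WAL-logged activity on the primary
psql -p 5432 -c "create table tt2 as select generate_series(1, 100) as i;"

# 6-7: stop the standby, then the primary, then bring both back up
pg_ctl -D /usr/local/pgsql/9.0slave/data -m smart stop
pg_ctl -D /usr/local/pgsql/9.0/data -m smart stop
pg_ctl -D /usr/local/pgsql/9.0/data start
pg_ctl -D /usr/local/pgsql/9.0slave/data start
# --> the standby does not accept connections until new WAL is written;
#     a manual CHECKPOINT on the primary unblocks it:
psql -p 5432 -c "CHECKPOINT;"
"""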
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Mon, Apr 12, 2010 at 9:27 AM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
On Mon, Apr 12, 2010 at 1:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Apr 12, 2010 at 3:29 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
On Mon, Apr 12, 2010 at 1:21 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Didn't the standby accept connections before executing pgbench?
Nope, and the last time I tried, it was in that state for an hour (without accepting connections)... After that I executed on the primary: CREATE TABLE tt2 AS SELECT generate_series(1, 100) as i
After that, the standby started accepting connections.
OK. Is your reproduction scenario the following?
If not, could you show me the complete scenario?
1. start the primary
2. pg_start_backup()
3. copy $PGDATA from the primary to the standby
4. pg_stop_backup();
5. create the recovery.conf and start the standby
Execute some WAL-logged action (I've seen this happen even with no WAL-logged action if I wait for a while before shutting down the servers)
6. shutdown (smart mode) the standby
shutdown (smart) the primary
start the primary again
7. start the standby again
I guess this is because the primary is in recovery when the standby tries to connect to it, and the standby should wait until the primary is ready; but it seems like the primary is failing to advertise itself and the standby doesn't recheck the condition... could that be it?
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Mon, Apr 12, 2010 at 11:27 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
1. start the primary
2. pg_start_backup()
3. copy $PGDATA from the primary to the standby
4. pg_stop_backup();
5. create the recovery.conf and start the standby
Execute some WAL-logged action (I've seen this happen even with no WAL-logged action if I wait for a while before shutting down the servers)
6. shutdown (smart mode) the standby
shutdown (smart) the primary
start the primary again
7. start the standby again
--> cannot accept connections until new WAL has been created
I was not able to reproduce the problem using the above scenario.
But when I did one more restart of the primary and standby, I was
able to observe the problem. If this is the same as you encountered,
it would be the "can't start hot standby from a shutdown checkpoint"
issue that Heikki pointed out. So it's very helpful to check whether
the posted patch fixes your problem or not.
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00407.php
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Apr 12, 2010 at 8:32 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Apr 12, 2010 at 11:27 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
1. start the primary
2. pg_start_backup()
3. copy $PGDATA from the primary to the standby
4. pg_stop_backup();
5. create the recovery.conf and start the standby
Execute some WAL-logged action (I've seen this happen even with no WAL-logged action if I wait for a while before shutting down the servers)
6. shutdown (smart mode) the standby
shutdown (smart) the primary
start the primary again
7. start the standby again
--> cannot accept connections until new WAL has been created
I was not able to reproduce the problem using the above scenario.
But when I did one more restart of the primary and standby, I was
able to observe the problem. If this is the same as you encountered,
it would be the "can't start hot standby from a shutdown checkpoint"
issue that Heikki pointed out. So it's very helpful to check whether
the posted patch fixes your problem or not.
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00407.php
This patch seems to fix the problem for me...
I will read it, and the thread it's in, in the morning. Something that seems strange to me is that the patch touches twophase.c and twophase.h; why is that?
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
Jaime Casanova wrote:
On Mon, Apr 12, 2010 at 8:32 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
But when I did one more restart of the primary and standby, I was
able to observe the problem. If this is the same as you encountered,
it would be the "can't start hot standby from a shutdown checkpoint"
issue that Heikki pointed out. So it's very helpful to check whether
the posted patch fixes your problem or not.
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00407.php
This patch seems to fix the problem for me...
Committed, thanks for the testing.
I will read it, and the thread it's in, in the morning. Something that seems strange to me is that the patch touches twophase.c and twophase.h; why is that?
When you start hot standby from an online checkpoint, the XIDs of any
two-phase transactions are included in the running-xacts record, just
like any other in-progress transactions. At a shutdown checkpoint, we
know that no regular transactions are in-progress, but there can be
transactions in prepared state, which need to be considered as
in-progress in the standby, but there's no information about them in the
shutdown record. So we scan pg_twophase to discover them.
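If you want to see what that scan would find, a quick sketch (needs max_prepared_transactions > 0 on the primary; the table is from the HS regression setup and the gid is made up):
"""
# leave a transaction in prepared state on the primary
psql -p 5432 <<'SQL'
begin;
insert into hs1 values (99);
prepare transaction 'demo_gid';
SQL

# the two-phase state file that ends up in pg_twophase is what the scan finds
ls /usr/local/pgsql/9.0/data/pg_twophase

# it is also visible as a prepared, still in-progress, transaction
psql -p 5432 -c "select gid, prepared, owner from pg_prepared_xacts;"
"""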
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Tue, Apr 13, 2010 at 11:06 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Jaime Casanova wrote:
I will read it, and the thread it's in, in the morning. Something that seems strange to me is that the patch touches twophase.c and twophase.h; why is that?
When you start hot standby from an online checkpoint, the XIDs of any
two-phase transactions are included in the running-xacts record, just
like any other in-progress transactions. At a shutdown checkpoint, we
know that no regular transactions are in-progress, but there can be
transactions in prepared state, which need to be considered as
in-progress in the standby, but there's no information about them in the
shutdown record. So we scan pg_twophase to discover them.
ah! that makes sense... thanks
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157