Streaming replication in docs

Started by Thom Brownalmost 16 years ago7 messages
#1Thom Brown
thombrown@gmail.com

The post about the dev docs needing more hot standby mentions prompted
me to have a look at how streaming replication is documented. Ignore
this if this has already been discussed (I couldn't find any posts),
but I couldn't find any mention of streaming replication except in the
write ahead log configuration. There doesn't seem to be anything in
the index linked to a description of this major feature, why you would
use it, or how it works. There's also no mention in the release
overview.

This is probably due to it being a work in progress and hasn't been
written yet, but thought I'd raise it just in case.

Thanks

Thom

#2Thom Brown
thombrown@gmail.com
In reply to: Thom Brown (#1)
Re: Streaming replication in docs

On 7 February 2010 10:20, Thom Brown <thombrown@gmail.com> wrote:

The post about the dev docs needing more hot standby mentions prompted
me to have a look at how streaming replication is documented.  Ignore
this if this has already been discussed (I couldn't find any posts),
but I couldn't find any mention of streaming replication except in the
write ahead log configuration.  There doesn't seem to be anything in
the index linked to a description of this major feature, why you would
use it, or how it works.  There's also no mention in the release
overview.

This is probably due to it being a work in progress and hasn't been
written yet, but thought I'd raise it just in case.

Thanks

Thom

Okay, after some searching, I found some sections on streaming
replication not referenced in the index:

http://developer.postgresql.org/pgdocs/postgres/warm-standby.html#STREAMING-REPLICATION

Streaming Replication Protocol:
http://developer.postgresql.org/pgdocs/postgres/protocol-replication.html

Main section about Streaming Replication: Maybe these should get
indexed under streaming replication?

Thanks

Thom

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Thom Brown (#2)
Re: Streaming replication in docs

Thom Brown wrote:

On 7 February 2010 10:20, Thom Brown <thombrown@gmail.com> wrote:

The post about the dev docs needing more hot standby mentions prompted
me to have a look at how streaming replication is documented. Ignore
this if this has already been discussed (I couldn't find any posts),
but I couldn't find any mention of streaming replication except in the
write ahead log configuration. There doesn't seem to be anything in
the index linked to a description of this major feature, why you would
use it, or how it works. There's also no mention in the release
overview.

This is probably due to it being a work in progress and hasn't been
written yet, but thought I'd raise it just in case.

Okay, after some searching, I found some sections on streaming
replication not referenced in the index:

http://developer.postgresql.org/pgdocs/postgres/warm-standby.html#STREAMING-REPLICATION

Streaming Replication Protocol:
http://developer.postgresql.org/pgdocs/postgres/protocol-replication.html

Main section about Streaming Replication: Maybe these should get
indexed under streaming replication?

Yeah, as you noticed, the documentation for streaming replication is
currently non-existent. That obviously needs to be fixed. I think we
need to move the existing sections around, and of course add a lot of
documentation for the new streaming replication and hot standby specific
stuff. I proposed a new layout here:

http://archives.postgresql.org/message-id/4B50A600.3090309@enterprisedb.com

which people seemed to agree with. If you want to help with that,
perhaps the best starting point would be to start writing text for the
new chapters of that new layout.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Thom Brown
thombrown@gmail.com
In reply to: Heikki Linnakangas (#3)
Re: Streaming replication in docs

On 7 February 2010 10:49, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Thom Brown wrote:

On 7 February 2010 10:20, Thom Brown <thombrown@gmail.com> wrote:

The post about the dev docs needing more hot standby mentions prompted
me to have a look at how streaming replication is documented.  Ignore
this if this has already been discussed (I couldn't find any posts),
but I couldn't find any mention of streaming replication except in the
write ahead log configuration.  There doesn't seem to be anything in
the index linked to a description of this major feature, why you would
use it, or how it works.  There's also no mention in the release
overview.

This is probably due to it being a work in progress and hasn't been
written yet, but thought I'd raise it just in case.

Okay, after some searching, I found some sections on streaming
replication not referenced in the index:

http://developer.postgresql.org/pgdocs/postgres/warm-standby.html#STREAMING-REPLICATION

Streaming Replication Protocol:
http://developer.postgresql.org/pgdocs/postgres/protocol-replication.html

Main section about Streaming Replication: Maybe these should get
indexed under streaming replication?

Yeah, as you noticed, the documentation for streaming replication is
currently non-existent. That obviously needs to be fixed. I think we
need to move the existing sections around, and of course add a lot of
documentation for the new streaming replication and hot standby specific
stuff. I proposed a new layout here:

http://archives.postgresql.org/message-id/4B50A600.3090309@enterprisedb.com

which people seemed to agree with. If you want to help with that,
perhaps the best starting point would be to start writing text for the
new chapters of that new layout.

I'm wondering if I "gave it a go", I'd be more of a hindrance than a
help, as I'm not really familiar enough with the workings of hot
standby to provide this missing information. In fact that's why I
looked for the documentation, because I wanted to learn how it works
and how to use it. I'd be happy to review any doc changes to ensure
enough information is available and that it's clear enough to
end-users though.

And I do prefer the proposed layout.

Thom

#5Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#3)
1 attachment(s)
Re: Streaming replication in docs

On Sun, Feb 7, 2010 at 7:49 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Yeah, as you noticed, the documentation for streaming replication is
currently non-existent. That obviously needs to be fixed. I think we
need to move the existing sections around, and of course add a lot of
documentation for the new streaming replication and hot standby specific
stuff. I proposed a new layout here:

http://archives.postgresql.org/message-id/4B50A600.3090309@enterprisedb.com

The attached patch changes the layout according to your suggestion
(but with a small modification). Here is the new layout:

-------------------------------
Chapter 24. Backup and Restore
24.1. SQL Dump
24.1.1. Restoring the dump
24.1.2. Using pg_dumpall
24.1.3. Handling large databases
24.2. File System Level Backup
24.3. Continuous Archiving and Point-In-Time Recovery (PITR)
24.3.1. Setting up WAL archiving
24.3.2. Making a Base Backup
24.3.3. Recovering using a Continuous Archive Backup
24.3.4. Timelines
24.3.5. Tips and Examples
24.3.6. Caveats
24.4. Migration Between Releases

Chapter 25. High Availability, Load Balancing, and Replication
25.1. Comparison of different solutions
25.2. File-based Log Shipping
25.2.1. Planning
25.2.2. Implementation
25.2.3. Record-based Log Shipping
25.3. Streaming Replication
25.3.1. Setup
25.3.2. Authentication
25.4. Failover
25.5. Hot Standby
25.5.1. User's Overview
25.5.2. Handling query conflicts
25.5.3. Administrator's Overview
25.5.4. Hot Standby Parameter Reference
25.5.5. Caveats
25.6. Incrementally Updated Backups
-------------------------------

I didn't change the content except adjusting the "sect" and
the "indexterm" tags to new chapter.

If this layout change will have been committed, I'll improve
the SR section.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

replication-doc-layout-change.patchtext/x-patch; charset=US-ASCII; name=replication-doc-layout-change.patchDownload
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1492,2879 **** archive_command = 'local_backup_script.sh'
    </sect2>
   </sect1>
  
-  <sect1 id="warm-standby">
-   <title>Warm Standby Servers for High Availability</title>
- 
-   <indexterm zone="backup">
-    <primary>warm standby</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>PITR standby</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>standby server</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>log shipping</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>witness server</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>STONITH</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>high availability</primary>
-   </indexterm>
- 
-   <para>
-    Continuous archiving can be used to create a <firstterm>high
-    availability</> (HA) cluster configuration with one or more
-    <firstterm>standby servers</> ready to take over operations if the
-    primary server fails. This capability is widely referred to as
-    <firstterm>warm standby</> or <firstterm>log shipping</>.
-   </para>
- 
-   <para>
-    The primary and standby server work together to provide this capability,
-    though the servers are only loosely coupled. The primary server operates
-    in continuous archiving mode, while each standby server operates in
-    continuous recovery mode, reading the WAL files from the primary. No
-    changes to the database tables are required to enable this capability,
-    so it offers low administration overhead compared to some other
-    replication approaches. This configuration also has relatively low
-    performance impact on the primary server.
-   </para>
- 
-   <para>
-    Directly moving WAL records from one database server to another
-    is typically described as log shipping. <productname>PostgreSQL</>
-    implements file-based log shipping, which means that WAL records are
-    transferred one file (WAL segment) at a time. WAL files (16MB) can be
-    shipped easily and cheaply over any distance, whether it be to an
-    adjacent system, another system at the same site, or another system on
-    the far side of the globe. The bandwidth required for this technique
-    varies according to the transaction rate of the primary server.
-    Record-based log shipping is also possible with custom-developed
-    procedures, as discussed in <xref linkend="warm-standby-record">.
-   </para>
- 
-   <para>
-    It should be noted that the log shipping is asynchronous, i.e., the WAL
-    records are shipped after transaction commit. As a result there is a
-    window for data loss should the primary server suffer a catastrophic
-    failure: transactions not yet shipped will be lost.  The length of the
-    window of data loss can be limited by use of the
-    <varname>archive_timeout</varname> parameter, which can be set as low
-    as a few seconds if required.  However such a low setting will
-    substantially increase the bandwidth required for file shipping.
-    If you need a window of less than a minute or so, it's probably better
-    to consider record-based log shipping.
-   </para>
- 
-   <para>
-    The standby server is not available for access, since it is continually
-    performing recovery processing. Recovery performance is sufficiently
-    good that the standby will typically be only moments away from full
-    availability once it has been activated. As a result, we refer to this
-    capability as a warm standby configuration that offers high
-    availability. Restoring a server from an archived base backup and
-    rollforward will take considerably longer, so that technique only
-    offers a solution for disaster recovery, not high availability.
-   </para>
- 
-   <sect2 id="warm-standby-planning">
-    <title>Planning</title>
- 
-    <para>
-     It is usually wise to create the primary and standby servers
-     so that they are as similar as possible, at least from the
-     perspective of the database server.  In particular, the path names
-     associated with tablespaces will be passed across unmodified, so both
-     primary and standby servers must have the same mount paths for
-     tablespaces if that feature is used.  Keep in mind that if
-     <xref linkend="sql-createtablespace" endterm="sql-createtablespace-title">
-     is executed on the primary, any new mount point needed for it must
-     be created on the primary and all standby servers before the command
-     is executed. Hardware need not be exactly the same, but experience shows
-     that maintaining two identical systems is easier than maintaining two
-     dissimilar ones over the lifetime of the application and system.
-     In any case the hardware architecture must be the same &mdash; shipping
-     from, say, a 32-bit to a 64-bit system will not work.
-    </para>
- 
-    <para>
-     In general, log shipping between servers running different major
-     <productname>PostgreSQL</> release
-     levels is not possible. It is the policy of the PostgreSQL Global
-     Development Group not to make changes to disk formats during minor release
-     upgrades, so it is likely that running different minor release levels
-     on primary and standby servers will work successfully. However, no
-     formal support for that is offered and you are advised to keep primary
-     and standby servers at the same release level as much as possible.
-     When updating to a new minor release, the safest policy is to update
-     the standby servers first &mdash; a new minor release is more likely
-     to be able to read WAL files from a previous minor release than vice
-     versa.
-    </para>
- 
-    <para>
-     There is no special mode required to enable a standby server. The
-     operations that occur on both primary and standby servers are
-     normal continuous archiving and recovery tasks. The only point of
-     contact between the two database servers is the archive of WAL files
-     that both share: primary writing to the archive, standby reading from
-     the archive. Care must be taken to ensure that WAL archives from separate
-     primary servers do not become mixed together or confused. The archive
-     need not be large if it is only required for standby operation.
-    </para>
- 
-    <para>
-     The magic that makes the two loosely coupled servers work together is
-     simply a <varname>restore_command</> used on the standby that,
-     when asked for the next WAL file, waits for it to become available from
-     the primary. The <varname>restore_command</> is specified in the
-     <filename>recovery.conf</> file on the standby server. Normal recovery
-     processing would request a file from the WAL archive, reporting failure
-     if the file was unavailable.  For standby processing it is normal for
-     the next WAL file to be unavailable, so we must be patient and wait for
-     it to appear. For files ending in <literal>.backup</> or
-     <literal>.history</> there is no need to wait, and a non-zero return
-     code must be returned. A waiting <varname>restore_command</> can be
-     written as a custom script that loops after polling for the existence of
-     the next WAL file. There must also be some way to trigger failover, which
-     should interrupt the <varname>restore_command</>, break the loop and
-     return a file-not-found error to the standby server. This ends recovery
-     and the standby will then come up as a normal server.
-    </para>
- 
-    <para>
-     Pseudocode for a suitable <varname>restore_command</> is:
- <programlisting>
- triggered = false;
- while (!NextWALFileReady() &amp;&amp; !triggered)
- {
-     sleep(100000L);         /* wait for ~0.1 sec */
-     if (CheckForExternalTrigger())
-         triggered = true;
- }
- if (!triggered)
-         CopyWALFileForRecovery();
- </programlisting>
-    </para>
- 
-    <para>
-     A working example of a waiting <varname>restore_command</> is provided
-     as a <filename>contrib</> module named <application>pg_standby</>. It
-     should be used as a reference on how to correctly implement the logic
-     described above. It can also be extended as needed to support specific
-     configurations and environments.
-    </para>
- 
-    <para>
-     <productname>PostgreSQL</productname> does not provide the system
-     software required to identify a failure on the primary and notify
-     the standby database server.  Many such tools exist and are well
-     integrated with the operating system facilities required for
-     successful failover, such as IP address migration.
-    </para>
- 
-    <para>
-     The method for triggering failover is an important part of planning
-     and design. One potential option is the <varname>restore_command</>
-     command.  It is executed once for each WAL file, but the process
-     running the <varname>restore_command</> is created and dies for
-     each file, so there is no daemon or server process, and we cannot
-     use signals or a signal handler. Therefore, the
-     <varname>restore_command</> is not suitable to trigger failover.
-     It is possible to use a simple timeout facility, especially if
-     used in conjunction with a known <varname>archive_timeout</>
-     setting on the primary. However, this is somewhat error prone
-     since a network problem or busy primary server might be sufficient
-     to initiate failover. A notification mechanism such as the explicit
-     creation of a trigger file is ideal, if this can be arranged.
-    </para>
- 
-    <para>
-     The size of the WAL archive can be minimized by using the <literal>%r</>
-     option of the <varname>restore_command</>. This option specifies the
-     last archive file name that needs to be kept to allow the recovery to
-     restart correctly. This can be used to truncate the archive once
-     files are no longer required, assuming the archive is writable from the
-     standby server.
-    </para>
-   </sect2>
- 
-   <sect2 id="warm-standby-config">
-    <title>Implementation</title>
- 
-    <para>
-     The short procedure for configuring a standby server is as follows. For
-     full details of each step, refer to previous sections as noted.
-     <orderedlist>
-      <listitem>
-       <para>
-        Set up primary and standby systems as nearly identical as
-        possible, including two identical copies of
-        <productname>PostgreSQL</> at the same release level.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Set up continuous archiving from the primary to a WAL archive
-        directory on the standby server. Ensure that
-        <xref linkend="guc-archive-mode">,
-        <xref linkend="guc-archive-command"> and
-        <xref linkend="guc-archive-timeout">
-        are set appropriately on the primary
-        (see <xref linkend="backup-archiving-wal">).
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Make a base backup of the primary server (see <xref
-        linkend="backup-base-backup">), and load this data onto the standby.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Begin recovery on the standby server from the local WAL
-        archive, using a <filename>recovery.conf</> that specifies a
-        <varname>restore_command</> that waits as described
-        previously (see <xref linkend="backup-pitr-recovery">).
-       </para>
-      </listitem>
-     </orderedlist>
-    </para>
- 
-    <para>
-     Recovery treats the WAL archive as read-only, so once a WAL file has
-     been copied to the standby system it can be copied to tape at the same
-     time as it is being read by the standby database server.
-     Thus, running a standby server for high availability can be performed at
-     the same time as files are stored for longer term disaster recovery
-     purposes.
-    </para>
- 
-    <para>
-     For testing purposes, it is possible to run both primary and standby
-     servers on the same system. This does not provide any worthwhile
-     improvement in server robustness, nor would it be described as HA.
-    </para>
-   </sect2>
- 
-   <sect2 id="warm-standby-failover">
-    <title>Failover</title>
- 
-    <para>
-     If the primary server fails then the standby server should begin
-     failover procedures.
-    </para>
- 
-    <para>
-     If the standby server fails then no failover need take place. If the
-     standby server can be restarted, even some time later, then the recovery
-     process can also be immediately restarted, taking advantage of
-     restartable recovery. If the standby server cannot be restarted, then a
-     full new standby server instance should be created.
-    </para>
- 
-    <para>
-     If the primary server fails and the standby server becomes the
-     new primary, and then the old primary restarts, you must have
-     a mechanism for informing old primary that it is no longer the primary. This is
-     sometimes known as STONITH (Shoot The Other Node In The Head), which is
-     necessary to avoid situations where both systems think they are the
-     primary, which will lead to confusion and ultimately data loss.
-    </para>
- 
-    <para>
-     Many failover systems use just two systems, the primary and the standby,
-     connected by some kind of heartbeat mechanism to continually verify the
-     connectivity between the two and the viability of the primary. It is
-     also possible to use a third system (called a witness server) to prevent
-     some cases of inappropriate failover, but the additional complexity
-     might not be worthwhile unless it is set up with sufficient care and
-     rigorous testing.
-    </para>
- 
-    <para>
-     Once failover to the standby occurs, we have only a
-     single server in operation. This is known as a degenerate state.
-     The former standby is now the primary, but the former primary is down
-     and might stay down.  To return to normal operation we must
-     fully recreate a standby server,
-     either on the former primary system when it comes up, or on a third,
-     possibly new, system. Once complete the primary and standby can be
-     considered to have switched roles. Some people choose to use a third
-     server to provide backup for the new primary until the new standby
-     server is recreated,
-     though clearly this complicates the system configuration and
-     operational processes.
-    </para>
- 
-    <para>
-     So, switching from primary to standby server can be fast but requires
-     some time to re-prepare the failover cluster. Regular switching from
-     primary to standby is useful, since it allows regular downtime on
-     each system for maintenance. This also serves as a test of the
-     failover mechanism to ensure that it will really work when you need it.
-     Written administration procedures are advised.
-    </para>
-   </sect2>
- 
-   <sect2 id="warm-standby-record">
-    <title>Record-based Log Shipping</title>
- 
-    <para>
-     <productname>PostgreSQL</productname> directly supports file-based
-     log shipping as described above. It is also possible to implement
-     record-based log shipping, though this requires custom development.
-    </para>
- 
-    <para>
-     An external program can call the <function>pg_xlogfile_name_offset()</>
-     function (see <xref linkend="functions-admin">)
-     to find out the file name and the exact byte offset within it of
-     the current end of WAL.  It can then access the WAL file directly
-     and copy the data from the last known end of WAL through the current end
-     over to the standby servers.  With this approach, the window for data
-     loss is the polling cycle time of the copying program, which can be very
-     small, and there is no wasted bandwidth from forcing partially-used
-     segment files to be archived.  Note that the standby servers'
-     <varname>restore_command</> scripts can only deal with whole WAL files,
-     so the incrementally copied data is not ordinarily made available to
-     the standby servers.  It is of use only when the primary dies &mdash;
-     then the last partial WAL file is fed to the standby before allowing
-     it to come up.  The correct implementation of this process requires
-     cooperation of the <varname>restore_command</> script with the data
-     copying program.
-    </para>
- 
-    <para>
-     Starting with <productname>PostgreSQL</> version 8.5, you can use
-     streaming replication (see <xref linkend="streaming-replication">) to
-     achieve the same with less effort.
-    </para>
-   </sect2>
- 
-   <sect2 id="streaming-replication">
-    <title>Streaming Replication</title>
- 
-    <para>
-     <productname>PostgreSQL</> includes a simple streaming replication
-     mechanism, which lets the standby server to stay more up-to-date than
-     file-based replication allows. The standby connects to the primary
-     and the primary starts streaming WAL records from where the standby
-     left off, and continues streaming them as they are generated, without
-     waiting for the WAL file to be filled. So with streaming replication,
-     <varname>archive_timeout</> does not need to be configured.
-    </para>
- 
-    <para>
-     Streaming replication relies on file-based continuous archiving for
-     making the base backup and for allowing a standby to catch up if it's
-     disconnected from the primary for long enough for the primary to
-     delete old WAL files still required by the standby.
-    </para>
- 
-    <sect3 id="streaming-replication-setup">
-     <title>Setup</title>
-     <para>
-      The short procedure for configuring streaming replication is as follows.
-      For full details of each step, refer to other sections as noted.
-      <orderedlist>
-       <listitem>
-        <para>
-         Set up primary and standby systems as near identically as possible,
-         including two identical copies of <productname>PostgreSQL</> at the
-         same release level.
-        </para>
-       </listitem>
-      <listitem>
-       <para>
-        Set up continuous archiving from the primary to a WAL archive located
-        in a directory on the standby server. Ensure that
-        <xref linkend="guc-archive-mode">,
-        <xref linkend="guc-archive-command"> and
-        <xref linkend="guc-archive-timeout">
-        are set appropriately on the primary
-        (see <xref linkend="backup-archiving-wal">).
-       </para>
-      </listitem>
- 
-      <listitem>
-       <para>
-        Set up connections and authentication so that the standby server can
-        successfully connect to the pseudo <literal>replication</> database of
-        the primary server (see
-        <xref linkend="streaming-replication-authentication">). Ensure that
-        <xref linkend="guc-listen-addresses"> and <filename>pg_hba.conf</> are
-        configured appropriately on the primary.
-       </para>
-       <para>
-        On systems that support the keepalive socket option, setting
-        <xref linkend="guc-tcp-keepalives-idle">,
-        <xref linkend="guc-tcp-keepalives-interval"> and
-        <xref linkend="guc-tcp-keepalives-count"> helps you to find the
-        troubles with replication (e.g., the network outage or the failure of
-        the standby server) as soon as possible.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Set the maximum number of concurrent connections from the standby servers
-        (see <xref linkend="guc-max-wal-senders"> for details).
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Enable WAL archiving in the primary server because we need to make a base
-        backup of it later (see <xref linkend="guc-archive-mode"> and
-        <xref linkend="guc-archive-command"> for details).
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Start the <productname>PostgreSQL</> server on the primary.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Make a base backup of the primary server (see
-        <xref linkend="backup-base-backup">), and load this data onto the
-        standby. Note that all files present in <filename>pg_xlog</>
-        and <filename>pg_xlog/archive_status</> on the <emphasis>standby</>
-        server should be removed because they might be obsolete.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Set up WAL archiving, connections and authentication like the primary
-        server, because the standby server might work as a primary server after
-        failover. Ensure that your settings are consistent with the
-        <emphasis>future</> environment after the primary and the standby
-        server are interchanged by failover. If you're setting up the standby
-        server for e.g reporting purposes, with no plans to fail over to it,
-        configure the standby accordingly.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Create a recovery command file <filename>recovery.conf</> in the data
-        directory on the standby server.
-       </para>
- 
-       <variablelist id="replication-config-settings" xreflabel="Replication Settings">
-        <varlistentry id="standby-mode" xreflabel="standby_mode">
-         <term><varname>standby_mode</varname> (<type>boolean</type>)</term>
-         <listitem>
-          <para>
-           Specifies whether to start the <productname>PostgreSQL</> server as
-           a standby. If this parameter is <literal>on</>, the streaming
-           replication is enabled and the standby server will try to connect
-           to the primary to receive and apply WAL records continuously. The
-           default is <literal>off</>, which allows only an archive recovery
-           without replication. So, streaming replication requires this
-           parameter to be explicitly set to <literal>on</>.
-          </para>
-         </listitem>
-        </varlistentry>
-        <varlistentry id="primary-conninfo" xreflabel="primary_conninfo">
-         <term><varname>primary_conninfo</varname> (<type>string</type>)</term>
-         <listitem>
-          <para>
-           Specifies a connection string which is used for the standby server
-           to connect with the primary. This string is in the same format as
-           described in <xref linkend="libpq-connect">. If any option is
-           unspecified in this string, then the corresponding environment
-           variable (see <xref linkend="libpq-envars">) is checked. If the
-           environment variable is not set either, then the indicated built-in
-           defaults are used.
-          </para>
-          <para>
-           The built-in replication requires that a host name (or host address)
-           or port number which the primary server listens on should be
-           specified in this string, respectively. Also ensure that a role with
-           the <literal>SUPERUSER</> and <literal>LOGIN</> privileges on the
-           primary is set (see
-           <xref linkend="streaming-replication-authentication">). Note that
-           the password needs to be set if the primary demands password
-           authentication.
-          </para>
-         </listitem>
-        </varlistentry>
-        <varlistentry id="trigger-file" xreflabel="trigger_file">
-         <term><varname>trigger_file</varname> (<type>string</type>)</term>
-         <listitem>
-          <para>
-           Specifies a trigger file whose presence activates the standby.
-           If no trigger file is specified, the standby never exits
-           recovery.
-          </para>
-         </listitem>
-        </varlistentry>
-       </variablelist>
-      </listitem>
-      <listitem>
-       <para>
-        Start the <productname>PostgreSQL</> server on the standby. The standby
-        server will go into recovery mode and proceeds to receive WAL records
-        from the primary and apply them continuously.
-       </para>
-      </listitem>
-      </orderedlist>
-     </para>
-    </sect3>
-    <sect3 id="streaming-replication-authentication">
-     <title>Authentication</title>
-     <para>
-      It's very important that the access privilege for replication are set
-      properly so that only trusted users can read the WAL stream, because it's
-      easy to extract serious information from it.
-     </para>
-     <para>
-      Only superuser is allowed to connect to the primary as the replication
-      standby. So a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
-      privileges needs to be created in the primary.
-     </para>
-     <para>
-      Client authentication for replication is controlled by the
-      <filename>pg_hba.conf</> record specifying <literal>replication</> in the
-      <replaceable>database</> field. For example, if the standby is running on
-      host IP <literal>192.168.1.100</> and the superuser's name for replication
-      is <literal>foo</>, the administrator can add the following line to the
-      <filename>pg_hba.conf</> file on the primary.
- 
- <programlisting>
- # Allow the user "foo" from host 192.168.1.100 to connect to the primary
- # as a replication standby if the user's password is correctly supplied.
- #
- # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
- host    replication     foo             192.168.1.100/32        md5
- </programlisting>
-     </para>
-     <para>
-      The host name and port number of the primary, user name to connect as,
-      and password are specified in the <filename>recovery.conf</> file or
-      the corresponding environment variable on the standby.
-      For example, if the primary is running on host IP <literal>192.168.1.50</>,
-      port <literal>5432</literal>, the superuser's name for replication is
-      <literal>foo</>, and the password is <literal>foopass</>, the administrator
-      can add the following line to the <filename>recovery.conf</> file on the
-      standby.
- 
- <programlisting>
- # The standby connects to the primary that is running on host 192.168.1.50
- # and port 5432 as the user "foo" whose password is "foopass".
- primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
- </programlisting>
-     </para>
-    </sect3>
-   </sect2>
- 
-   <sect2 id="backup-incremental-updated">
-    <title>Incrementally Updated Backups</title>
- 
-   <indexterm zone="backup">
-    <primary>incrementally updated backups</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>change accumulation</primary>
-   </indexterm>
- 
-    <para>
-     In a warm standby configuration, it is possible to offload the expense of
-     taking periodic base backups from the primary server; instead base backups
-     can be made by backing
-     up a standby server's files.  This concept is generally known as
-     incrementally updated backups, log change accumulation, or more simply,
-     change accumulation.
-    </para>
- 
-    <para>
-     If we take a file system backup of the standby server's data
-     directory while it is processing
-     logs shipped from the primary, we will be able to reload that backup and
-     restart the standby's recovery process from the last restart point.
-     We no longer need to keep WAL files from before the standby's restart point.
-     If we need to recover, it will be faster to recover from the incrementally
-     updated backup than from the original base backup.
-    </para>
- 
-    <para>
-     Since the standby server is not <quote>live</>, it is not possible to
-     use <function>pg_start_backup()</> and <function>pg_stop_backup()</>
-     to manage the backup process; it will be up to you to determine how
-     far back you need to keep WAL segment files to have a recoverable
-     backup.  You can do this by running <application>pg_controldata</>
-     on the standby server to inspect the control file and determine the
-     current checkpoint WAL location, or by using the
-     <varname>log_checkpoints</> option to print values to the standby's
-     server log.
-    </para>
-   </sect2>
-  </sect1>
- 
-  <sect1 id="hot-standby">
-   <title>Hot Standby</title>
- 
-   <indexterm zone="backup">
-    <primary>Hot Standby</primary>
-   </indexterm>
- 
-    <para>
-     Hot Standby is the term used to describe the ability to connect to
-     the server and run queries while the server is in archive recovery. This
-     is useful for both log shipping replication and for restoring a backup
-     to an exact state with great precision.
-     The term Hot Standby also refers to the ability of the server to move
-     from recovery through to normal running while users continue running
-     queries and/or continue their connections.
-    </para>
- 
-    <para>
-     Running queries in recovery is in many ways the same as normal running
-     though there are a large number of usage and administrative points
-     to note.
-    </para>
- 
-   <sect2 id="hot-standby-users">
-    <title>User's Overview</title>
- 
-    <para>
-     Users can connect to the database while the server is in recovery
-     and perform read-only queries. Read-only access to catalogs and views
-     will also occur as normal.
-    </para>
- 
-    <para>
-     The data on the standby takes some time to arrive from the primary server
-     so there will be a measurable delay between primary and standby. Running the
-     same query nearly simultaneously on both primary and standby might therefore
-     return differing results. We say that data on the standby is eventually
-     consistent with the primary.
-     Queries executed on the standby will be correct with regard to the transactions
-     that had been recovered at the start of the query, or start of first statement,
-     in the case of serializable transactions. In comparison with the primary,
-     the standby returns query results that could have been obtained on the primary
-     at some exact moment in the past.
-    </para>
- 
-    <para>
-     When a transaction is started in recovery, the parameter
-     <varname>transaction_read_only</> will be forced to be true, regardless of the
-     <varname>default_transaction_read_only</> setting in <filename>postgresql.conf</>.
-     It can't be manually set to false either. As a result, all transactions
-     started during recovery will be limited to read-only actions only. In all
-     other ways, connected sessions will appear identical to sessions
-     initiated during normal processing mode. There are no special commands
-     required to initiate a connection at this time, so all interfaces
-     work normally without change. After recovery finishes, the session
-     will allow normal read-write transactions at the start of the next
-     transaction, if these are requested.
-    </para>
- 
-    <para>
-     Read-only here means "no writes to the permanent database tables".
-     There are no problems with queries that make use of transient sort and
-     work files.
-    </para>
- 
-    <para>
-     The following actions are allowed
- 
-     <itemizedlist>
-      <listitem>
-       <para>
-        Query access - SELECT, COPY TO including views and SELECT RULEs
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Cursor commands - DECLARE, FETCH, CLOSE,
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Parameters - SHOW, SET, RESET
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Transaction management commands
-         <itemizedlist>
-          <listitem>
-           <para>
-            BEGIN, END, ABORT, START TRANSACTION
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-            SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-            EXCEPTION blocks and other internal subtransactions
-           </para>
-          </listitem>
-         </itemizedlist>
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        LOCK TABLE, though only when explicitly in one of these modes:
-        ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Plugins and extensions - LOAD
-       </para>
-      </listitem>
-     </itemizedlist>
-    </para>
- 
-    <para>
-     These actions produce error messages
- 
-     <itemizedlist>
-      <listitem>
-       <para>
-        Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
-        Note that there are no allowed actions that result in a trigger
-        being executed during recovery.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
-        This also applies to temporary tables currently because currently their
-        definition causes writes to catalog tables.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        SELECT ... FOR SHARE | UPDATE which cause row locks to be written
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        RULEs on SELECT statements that generate DML commands.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
-        LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Transaction management commands that explicitly set non-read only state
-         <itemizedlist>
-          <listitem>
-           <para>
-             BEGIN READ WRITE,
-             START TRANSACTION READ WRITE
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-             SET TRANSACTION READ WRITE,
-             SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-            SET transaction_read_only = off
-           </para>
-          </listitem>
-         </itemizedlist>
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
-        ROLLBACK PREPARED because even read-only transactions need to write
-        WAL in the prepare phase (the first phase of two phase commit).
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        sequence update - nextval()
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
-       </para>
-      </listitem>
-     </itemizedlist>
-    </para>
- 
-    <para>
-     Note that current behaviour of read only transactions when not in
-     recovery is to allow the last two actions, so there are small and
-     subtle differences in behaviour between read-only transactions
-     run on standby and during normal running.
-     It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
-     temporary tables may be lifted in a future release, if their internal
-     implementation is altered to make this possible.
-    </para>
- 
-    <para>
-     If failover or switchover occurs the database will switch to normal
-     processing mode. Sessions will remain connected while the server
-     changes mode. Current transactions will continue, though will remain
-     read-only. After recovery is complete, it will be possible to initiate
-     read-write transactions.
-    </para>
- 
-    <para>
-     Users will be able to tell whether their session is read-only by
-     issuing SHOW transaction_read_only.  In addition a set of
-     functions <xref linkend="functions-recovery-info-table"> allow users to
-     access information about Hot Standby. These allow you to write
-     functions that are aware of the current state of the database. These
-     can be used to monitor the progress of recovery, or to allow you to
-     write complex programs that restore the database to particular states.
-    </para>
- 
-    <para>
-     In recovery, transactions will not be permitted to take any table lock
-     higher than RowExclusiveLock. In addition, transactions may never assign
-     a TransactionId and may never write WAL.
-     Any <command>LOCK TABLE</> command that runs on the standby and requests
-     a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
-    </para>
- 
-    <para>
-     In general queries will not experience lock conflicts with the database
-     changes made by recovery. This is becase recovery follows normal
-     concurrency control mechanisms, known as <acronym>MVCC</>. There are
-     some types of change that will cause conflicts, covered in the following
-     section.
-    </para>
-   </sect2>
- 
-   <sect2 id="hot-standby-conflict">
-    <title>Handling query conflicts</title>
- 
-    <para>
-     The primary and standby nodes are in many ways loosely connected. Actions
-     on the primary will have an effect on the standby. As a result, there is
-     potential for negative interactions or conflicts between them. The easiest
-     conflict to understand is performance: if a huge data load is taking place
-     on the primary then this will generate a similar stream of WAL records on the
-     standby, so standby queries may contend for system resources, such as I/O.
-    </para>
- 
-    <para>
-     There are also additional types of conflict that can occur with Hot Standby.
-     These conflicts are <emphasis>hard conflicts</> in the sense that we may
-     need to cancel queries and in some cases disconnect sessions to resolve them.
-     The user is provided with a number of optional ways to handle these
-     conflicts, though we must first understand the possible reasons behind a conflict.
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          Access Exclusive Locks from primary node, including both explicit
-          LOCK commands and various kinds of DDL action
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Dropping tablespaces on the primary while standby queries are using
-          those tablespaces for temporary work files (work_mem overflow)
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Dropping databases on the primary while users are connected to that
-          database on the standby.
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Waiting to acquire buffer cleanup locks
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Early cleanup of data still visible to the current query's snapshot
-         </para>
-        </listitem>
-       </itemizedlist>
-    </para>
- 
-    <para>
-     Some WAL redo actions will be for DDL actions. These DDL actions are
-     repeating actions that have already committed on the primary node, so
-     they must not fail on the standby node. These DDL locks take priority
-     and will automatically *cancel* any read-only transactions that get in
-     their way, after a grace period. This is similar to the possibility of
-     being canceled by the deadlock detector, but in this case the standby
-     process always wins, since the replayed actions must not fail. This
-     also ensures that replication doesn't fall behind while we wait for a
-     query to complete. Again, we assume that the standby is there for high
-     availability purposes primarily.
-    </para>
- 
-    <para>
-     An example of the above would be an Administrator on Primary server
-     runs a <command>DROP TABLE</> on a table that's currently being queried
-     in the standby server.
-     Clearly the query cannot continue if we let the <command>DROP TABLE</>
-     proceed. If this situation occurred on the primary, the <command>DROP TABLE</>
-     would wait until the query has finished. When the query is on the standby
-     and the <command>DROP TABLE</> is on the primary, the primary doesn't have
-     information about which queries are running on the standby and so the query
-     does not wait on the primary. The WAL change records come through to the
-     standby while the standby query is still running, causing a conflict.
-    </para>
- 
-    <para>
-     The most common reason for conflict between standby queries and WAL redo is
-     "early cleanup". Normally, <productname>PostgreSQL</> allows cleanup of old
-     row versions when there are no users who may need to see them to ensure correct
-     visibility of data (the heart of MVCC). If there is a standby query that has
-     been running for longer than any query on the primary then it is possible
-     for old row versions to be removed by either a vacuum or HOT. This will
-     then generate WAL records that, if applied, would remove data on the
-     standby that might *potentially* be required by the standby query.
-     In more technical language, the primary's xmin horizon is later than
-     the standby's xmin horizon, allowing dead rows to be removed.
-    </para>
- 
-    <para>
-     Experienced users should note that both row version cleanup and row version
-     freezing will potentially conflict with recovery queries. Running a
-     manual <command>VACUUM FREEZE</> is likely to cause conflicts even on tables
-     with no updated or deleted rows.
-    </para>
- 
-    <para>
-     We have a number of choices for resolving query conflicts.  The default
-     is that we wait and hope the query completes. The server will wait
-     automatically until the lag between primary and standby is at most
-     <varname>max_standby_delay</> seconds. Once that grace period expires,
-     we take one of the following actions:
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          If the conflict is caused by a lock, we cancel the conflicting standby
-          transaction immediately. If the transaction is idle-in-transaction
-          then currently we abort the session instead, though this may change
-          in the future.
-         </para>
-        </listitem>
- 
-        <listitem>
-         <para>
-          If the conflict is caused by cleanup records we tell the standby query
-          that a conflict has occurred and that it must cancel itself to avoid the
-          risk that it silently fails to read relevant data because
-          that data has been removed. (This is regrettably very similar to the
-          much feared and iconic error message "snapshot too old"). Some cleanup
-          records only cause conflict with older queries, though some types of
-          cleanup record affect all queries.
-         </para>
- 
-         <para>
-          If cancellation does occur, the query and/or transaction can always
-          be re-executed. The error is dynamic and will not necessarily occur
-          the same way if the query is executed again.
-         </para>
-        </listitem>
-       </itemizedlist>
-    </para>
- 
-    <para>
-     <varname>max_standby_delay</> is set in <filename>postgresql.conf</>.
-     The parameter applies to the server as a whole so if the delay is all used
-     up by a single query then there may be little or no waiting for queries that
-     follow immediately, though they will have benefited equally from the initial
-     waiting period. The server may take time to catch up again before the grace
-     period is available again, though if there is a heavy and constant stream
-     of conflicts it may seldom catch up fully.
-    </para>
- 
-    <para>
-     Users should be clear that tables that are regularly and heavily updated on
-     primary server will quickly cause cancellation of longer running queries on
-     the standby. In those cases <varname>max_standby_delay</> can be
-     considered somewhat but not exactly the same as setting
-     <varname>statement_timeout</>.
-     </para>
- 
-    <para>
-     Other remedial actions exist if the number of cancellations is unacceptable.
-     The first option is to connect to primary server and keep a query active
-     for as long as we need to run queries on the standby. This guarantees that
-     a WAL cleanup record is never generated and we don't ever get query
-     conflicts as described above. This could be done using contrib/dblink
-     and pg_sleep(), or via other mechanisms. If you do this, you should note
-     that this will delay cleanup of dead rows by vacuum or HOT and many
-     people may find this undesirable. However, we should remember that
-     primary and standby nodes are linked via the WAL, so this situation is no
-     different to the case where we ran the query on the primary node itself
-     except we have the benefit of off-loading the execution onto the standby.
-    </para>
- 
-    <para>
-     It is also possible to set <varname>vacuum_defer_cleanup_age</> on the primary
-     to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
-     more time for queries to execute before they are cancelled on the standby,
-     without the need for setting a high <varname>max_standby_delay</>.
-    </para>
- 
-    <para>
-     Three-way deadlocks are possible between AccessExclusiveLocks arriving from
-     the primary, cleanup WAL records that require buffer cleanup locks and
-     user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
-     are resolved by time-out when we exceed <varname>max_standby_delay</>.
-    </para>
- 
-    <para>
-     Dropping tablespaces or databases is discussed in the administrator's
-     section since they are not typical user situations.
-    </para>
-   </sect2>
- 
-   <sect2 id="hot-standby-admin">
-    <title>Administrator's Overview</title>
- 
-    <para>
-     If there is a <filename>recovery.conf</> file present the server will start
-     in Hot Standby mode by default, though <varname>recovery_connections</> can
-     be disabled via <filename>postgresql.conf</>, if required. The server may take
-     some time to enable recovery connections since the server must first complete
-     sufficient recovery to provide a consistent state against which queries
-     can run before enabling read only connections. Look for these messages
-     in the server logs
- 
- <programlisting>
- LOG:  initializing recovery connections
- 
- ... then some time later ...
- 
- LOG:  consistent recovery state reached
- LOG:  database system is ready to accept read only connections
- </programlisting>
- 
-     Consistency information is recorded once per checkpoint on the primary, as long
-     as <varname>recovery_connections</> is enabled (on the primary). If this parameter
-     is disabled, it will not be possible to enable recovery connections on the standby.
-     The consistent state can also be delayed in the presence of both of these conditions
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          a write transaction has more than 64 subtransactions
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          very long-lived write transactions
-         </para>
-        </listitem>
-       </itemizedlist>
- 
-     If you are running file-based log shipping ("warm standby"), you may need
-     to wait until the next WAL file arrives, which could be as long as the
-     <varname>archive_timeout</> setting on the primary.
-    </para>
- 
-    <para>
-     The setting of some parameters on the standby will need reconfiguration
-     if they have been changed on the primary. The value on the standby must
-     be equal to or greater than the value on the primary. If these parameters
-     are not set high enough then the standby will not be able to track work
-     correctly from recovering transactions. If these values are set too low the
-     the server will halt. Higher values can then be supplied and the server
-     restarted to begin recovery again.
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          <varname>max_connections</>
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          <varname>max_prepared_transactions</>
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          <varname>max_locks_per_transaction</>
-         </para>
-        </listitem>
-       </itemizedlist>
-    </para>
- 
-    <para>
-     It is important that the administrator consider the appropriate setting
-     of <varname>max_standby_delay</>, set in <filename>postgresql.conf</>.
-     There is no optimal setting and should be set according to business
-     priorities. For example if the server is primarily tasked as a High
-     Availability server, then you may wish to lower
-     <varname>max_standby_delay</> or even set it to zero, though that is a
-     very aggressive setting. If the standby server is tasked as an additional
-     server for decision support queries then it may be acceptable to set this
-     to a value of many hours (in seconds).
-    </para>
- 
-    <para>
-     Transaction status "hint bits" written on primary are not WAL-logged,
-     so data on standby will likely re-write the hints again on the standby.
-     Thus the main database blocks will produce write I/Os even though
-     all users are read-only; no changes have occurred to the data values
-     themselves.  Users will be able to write large sort temp files and
-     re-generate relcache info files, so there is no part of the database
-     that is truly read-only during hot standby mode. There is no restriction
-     on the use of set returning functions, or other users of tuplestore/tuplesort
-     code. Note also that writes to remote databases will still be possible,
-     even though the transaction is read-only locally.
-    </para>
- 
-    <para>
-     The following types of administrator command are not accepted
-     during recovery mode
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          Data Definition Language (DDL) - e.g. CREATE INDEX
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Privilege and Ownership - GRANT, REVOKE, REASSIGN
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
-         </para>
-        </listitem>
-       </itemizedlist>
-    </para>
- 
-    <para>
-     Note again that some of these commands are actually allowed during
-     "read only" mode transactions on the primary.
-    </para>
- 
-    <para>
-     As a result, you cannot create additional indexes that exist solely
-     on the standby, nor can statistics that exist solely on the standby.
-     If these administrator commands are needed they should be executed
-     on the primary so that the changes will propagate through to the
-     standby.
-    </para>
- 
-    <para>
-     <function>pg_cancel_backend()</> will work on user backends, but not the
-     Startup process, which performs recovery. pg_stat_activity does not
-     show an entry for the Startup process, nor do recovering transactions
-     show as active. As a result, pg_prepared_xacts is always empty during
-     recovery. If you wish to resolve in-doubt prepared transactions
-     then look at pg_prepared_xacts on the primary and issue commands to
-     resolve those transactions there.
-    </para>
- 
-    <para>
-     pg_locks will show locks held by backends as normal. pg_locks also shows
-     a virtual transaction managed by the Startup process that owns all
-     AccessExclusiveLocks held by transactions being replayed by recovery.
-     Note that Startup process does not acquire locks to
-     make database changes and thus locks other than AccessExclusiveLocks
-     do not show in pg_locks for the Startup process, they are just presumed
-     to exist.
-    </para>
- 
-    <para>
-     <productname>check_pgsql</> will work, but it is very simple.
-     <productname>check_postgres</> will also work, though many some actions
-     could give different or confusing results.
-     e.g. last vacuum time will not be maintained for example, since no
-     vacuum occurs on the standby (though vacuums running on the primary do
-     send their changes to the standby).
-    </para>
- 
-    <para>
-     WAL file control commands will not work during recovery
-     e.g. <function>pg_start_backup</>, <function>pg_switch_xlog</> etc..
-    </para>
- 
-    <para>
-     Dynamically loadable modules work, including pg_stat_statements.
-    </para>
- 
-    <para>
-     Advisory locks work normally in recovery, including deadlock detection.
-     Note that advisory locks are never WAL logged, so it is not possible for
-     an advisory lock on either the primary or the standby to conflict with WAL
-     replay. Nor is it possible to acquire an advisory lock on the primary
-     and have it initiate a similar advisory lock on the standby. Advisory
-     locks relate only to a single server on which they are acquired.
-    </para>
- 
-    <para>
-     Trigger-based replication systems such as <productname>Slony</>,
-     <productname>Londiste</> and <productname>Bucardo</> won't run on the
-     standby at all, though they will run happily on the primary server as
-     long as the changes are not sent to standby servers to be applied.
-     WAL replay is not trigger-based so you cannot relay from the
-     standby to any system that requires additional database writes or
-     relies on the use of triggers.
-    </para>
- 
-    <para>
-     New oids cannot be assigned, though some <acronym>UUID</> generators may still
-     work as long as they do not rely on writing new status to the database.
-    </para>
- 
-    <para>
-     Currently, temp table creation is not allowed during read only
-     transactions, so in some cases existing scripts will not run correctly.
-     It is possible we may relax that restriction in a later release. This is
-     both a SQL Standard compliance issue and a technical issue.
-    </para>
- 
-    <para>
-     <command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
-     Some standby users may be actively using the tablespace via their
-     <varname>temp_tablespaces</> parameter. If there are temp files in the
-     tablespace we currently cancel all active queries to ensure that temp
-     files are removed, so that we can remove the tablespace and continue with
-     WAL replay.
-    </para>
- 
-    <para>
-     Running <command>DROP DATABASE</>, <command>ALTER DATABASE ... SET TABLESPACE</>,
-     or <command>ALTER DATABASE ... RENAME</> on primary will generate a log message
-     that will cause all users connected to that database on the standby to be
-     forcibly disconnected. This action occurs immediately, whatever the setting of
-     <varname>max_standby_delay</>.
-    </para>
- 
-    <para>
-     In normal running, if you issue <command>DROP USER</> or <command>DROP ROLE</>
-     for a role with login capability while that user is still connected then
-     nothing happens to the connected user - they remain connected. The user cannot
-     reconnect however. This behaviour applies in recovery also, so a
-     <command>DROP USER</> on the primary does not disconnect that user on the standby.
-    </para>
- 
-    <para>
-     Stats collector is active during recovery. All scans, reads, blocks,
-     index usage etc will all be recorded normally on the standby. Replayed
-     actions will not duplicate their effects on primary, so replaying an
-     insert will not increment the Inserts column of pg_stat_user_tables.
-     The stats file is deleted at start of recovery, so stats from primary
-     and standby will differ; this is considered a feature not a bug.
-    </para>
- 
-    <para>
-     Autovacuum is not active during recovery, though will start normally
-     at the end of recovery.
-    </para>
- 
-    <para>
-     Background writer is active during recovery and will perform
-     restartpoints (similar to checkpoints on primary) and normal block
-     cleaning activities. The <command>CHECKPOINT</> command is accepted during recovery,
-     though performs a restartpoint rather than a new checkpoint.
-    </para>
-   </sect2>
- 
-   <sect2 id="hot-standby-parameters">
-    <title>Hot Standby Parameter Reference</title>
- 
-    <para>
-     Various parameters have been mentioned above in the <xref linkend="hot-standby-admin">
-     and <xref linkend="hot-standby-conflict"> sections.
-    </para>
- 
-    <para>
-     On the primary, parameters <varname>recovery_connections</> and
-     <varname>vacuum_defer_cleanup_age</> can be used to enable and control the
-     primary server to assist the successful configuration of Hot Standby servers.
-     <varname>max_standby_delay</> has no effect if set on the primary.
-    </para>
- 
-    <para>
-     On the standby, parameters <varname>recovery_connections</> and
-     <varname>max_standby_delay</> can be used to enable and control Hot Standby.
-     standby server to assist the successful configuration of Hot Standby servers.
-     <varname>vacuum_defer_cleanup_age</> has no effect during recovery.
-    </para>
-   </sect2>
- 
-   <sect2 id="hot-standby-caveats">
-    <title>Caveats</title>
- 
-    <para>
-     At this writing, there are several limitations of Hot Standby.
-     These can and probably will be fixed in future releases:
- 
-   <itemizedlist>
-    <listitem>
-     <para>
-      Operations on hash indexes are not presently WAL-logged, so
-      replay will not update these indexes.  Hash indexes will not be
-      used for query plans during recovery.
-     </para>
-    </listitem>
-    <listitem>
-     <para>
-      Full knowledge of running transactions is required before snapshots
-      may be taken. Transactions that take use large numbers of subtransactions
-      (currently greater than 64) will delay the start of read only
-      connections until the completion of the longest running write transaction.
-      If this situation occurs explanatory messages will be sent to server log.
-     </para>
-    </listitem>
-    <listitem>
-     <para>
-      Valid starting points for recovery connections are generated at each
-      checkpoint on the master. If the standby is shutdown while the master
-      is in a shutdown state it may not be possible to re-enter Hot Standby
-      until the primary is started up so that it generates further starting
-      points in the WAL logs. This is not considered a serious issue
-      because the standby is usually switched into the primary role while
-      the first node is taken down.
-     </para>
-    </listitem>
-    <listitem>
-     <para>
-      At the end of recovery, AccessExclusiveLocks held by prepared transactions
-      will require twice the normal number of lock table entries. If you plan
-      on running either a large number of concurrent prepared transactions
-      that normally take AccessExclusiveLocks, or you plan on having one
-      large transaction that takes many AccessExclusiveLocks then you are
-      advised to select a larger value of <varname>max_locks_per_transaction</>,
-      up to, but never more than twice the value of the parameter setting on
-      the primary server in rare extremes. You need not consider this at all if
-      your setting of <varname>max_prepared_transactions</> is <literal>0</>.
-     </para>
-    </listitem>
-   </itemizedlist>
- 
-    </para>
-   </sect2>
- 
-  </sect1>
- 
   <sect1 id="migration">
    <title>Migration Between Releases</title>
  
--- 1492,1497 ----
*** a/doc/src/sgml/high-availability.sgml
--- b/doc/src/sgml/high-availability.sgml
***************
*** 79,84 ****
--- 79,87 ----
    also available.
   </para>
  
+  <sect1 id="different-replication-solutions">
+  <title>Comparison of different solutions</title>
+ 
   <variablelist>
  
    <varlistentry>
***************
*** 450,453 **** protocol to make nodes agree on a serializable transactional order.
--- 453,1840 ----
  
   </variablelist>
  
+  </sect1>
+ 
+  <sect1 id="warm-standby">
+   <title>File-based Log Shipping</title>
+ 
+   <indexterm zone="high-availability">
+    <primary>warm standby</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>PITR standby</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>standby server</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>log shipping</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>witness server</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>STONITH</primary>
+   </indexterm>
+ 
+   <para>
+    Continuous archiving can be used to create a <firstterm>high
+    availability</> (HA) cluster configuration with one or more
+    <firstterm>standby servers</> ready to take over operations if the
+    primary server fails. This capability is widely referred to as
+    <firstterm>warm standby</> or <firstterm>log shipping</>.
+   </para>
+ 
+   <para>
+    The primary and standby server work together to provide this capability,
+    though the servers are only loosely coupled. The primary server operates
+    in continuous archiving mode, while each standby server operates in
+    continuous recovery mode, reading the WAL files from the primary. No
+    changes to the database tables are required to enable this capability,
+    so it offers low administration overhead compared to some other
+    replication approaches. This configuration also has relatively low
+    performance impact on the primary server.
+   </para>
+ 
+   <para>
+    Directly moving WAL records from one database server to another
+    is typically described as log shipping. <productname>PostgreSQL</>
+    implements file-based log shipping, which means that WAL records are
+    transferred one file (WAL segment) at a time. WAL files (16MB) can be
+    shipped easily and cheaply over any distance, whether it be to an
+    adjacent system, another system at the same site, or another system on
+    the far side of the globe. The bandwidth required for this technique
+    varies according to the transaction rate of the primary server.
+    Record-based log shipping is also possible with custom-developed
+    procedures, as discussed in <xref linkend="warm-standby-record">.
+   </para>
+ 
+   <para>
+    It should be noted that the log shipping is asynchronous, i.e., the WAL
+    records are shipped after transaction commit. As a result there is a
+    window for data loss should the primary server suffer a catastrophic
+    failure: transactions not yet shipped will be lost.  The length of the
+    window of data loss can be limited by use of the
+    <varname>archive_timeout</varname> parameter, which can be set as low
+    as a few seconds if required.  However such a low setting will
+    substantially increase the bandwidth required for file shipping.
+    If you need a window of less than a minute or so, it's probably better
+    to consider record-based log shipping.
+   </para>
+ 
+   <para>
+    The standby server is not available for access, since it is continually
+    performing recovery processing. Recovery performance is sufficiently
+    good that the standby will typically be only moments away from full
+    availability once it has been activated. As a result, we refer to this
+    capability as a warm standby configuration that offers high
+    availability. Restoring a server from an archived base backup and
+    rollforward will take considerably longer, so that technique only
+    offers a solution for disaster recovery, not high availability.
+   </para>
+ 
+   <sect2 id="warm-standby-planning">
+    <title>Planning</title>
+ 
+    <para>
+     It is usually wise to create the primary and standby servers
+     so that they are as similar as possible, at least from the
+     perspective of the database server.  In particular, the path names
+     associated with tablespaces will be passed across unmodified, so both
+     primary and standby servers must have the same mount paths for
+     tablespaces if that feature is used.  Keep in mind that if
+     <xref linkend="sql-createtablespace" endterm="sql-createtablespace-title">
+     is executed on the primary, any new mount point needed for it must
+     be created on the primary and all standby servers before the command
+     is executed. Hardware need not be exactly the same, but experience shows
+     that maintaining two identical systems is easier than maintaining two
+     dissimilar ones over the lifetime of the application and system.
+     In any case the hardware architecture must be the same &mdash; shipping
+     from, say, a 32-bit to a 64-bit system will not work.
+    </para>
+ 
+    <para>
+     In general, log shipping between servers running different major
+     <productname>PostgreSQL</> release
+     levels is not possible. It is the policy of the PostgreSQL Global
+     Development Group not to make changes to disk formats during minor release
+     upgrades, so it is likely that running different minor release levels
+     on primary and standby servers will work successfully. However, no
+     formal support for that is offered and you are advised to keep primary
+     and standby servers at the same release level as much as possible.
+     When updating to a new minor release, the safest policy is to update
+     the standby servers first &mdash; a new minor release is more likely
+     to be able to read WAL files from a previous minor release than vice
+     versa.
+    </para>
+ 
+    <para>
+     There is no special mode required to enable a standby server. The
+     operations that occur on both primary and standby servers are
+     normal continuous archiving and recovery tasks. The only point of
+     contact between the two database servers is the archive of WAL files
+     that both share: primary writing to the archive, standby reading from
+     the archive. Care must be taken to ensure that WAL archives from separate
+     primary servers do not become mixed together or confused. The archive
+     need not be large if it is only required for standby operation.
+    </para>
+ 
+    <para>
+     The magic that makes the two loosely coupled servers work together is
+     simply a <varname>restore_command</> used on the standby that,
+     when asked for the next WAL file, waits for it to become available from
+     the primary. The <varname>restore_command</> is specified in the
+     <filename>recovery.conf</> file on the standby server. Normal recovery
+     processing would request a file from the WAL archive, reporting failure
+     if the file was unavailable.  For standby processing it is normal for
+     the next WAL file to be unavailable, so we must be patient and wait for
+     it to appear. For files ending in <literal>.backup</> or
+     <literal>.history</> there is no need to wait, and a non-zero return
+     code must be returned. A waiting <varname>restore_command</> can be
+     written as a custom script that loops after polling for the existence of
+     the next WAL file. There must also be some way to trigger failover, which
+     should interrupt the <varname>restore_command</>, break the loop and
+     return a file-not-found error to the standby server. This ends recovery
+     and the standby will then come up as a normal server.
+    </para>
+ 
+    <para>
+     Pseudocode for a suitable <varname>restore_command</> is:
+ <programlisting>
+ triggered = false;
+ while (!NextWALFileReady() &amp;&amp; !triggered)
+ {
+     sleep(100000L);         /* wait for ~0.1 sec */
+     if (CheckForExternalTrigger())
+         triggered = true;
+ }
+ if (!triggered)
+         CopyWALFileForRecovery();
+ </programlisting>
+    </para>
+ 
+    <para>
+     A working example of a waiting <varname>restore_command</> is provided
+     as a <filename>contrib</> module named <application>pg_standby</>. It
+     should be used as a reference on how to correctly implement the logic
+     described above. It can also be extended as needed to support specific
+     configurations and environments.
+    </para>
+ 
+    <para>
+     <productname>PostgreSQL</productname> does not provide the system
+     software required to identify a failure on the primary and notify
+     the standby database server.  Many such tools exist and are well
+     integrated with the operating system facilities required for
+     successful failover, such as IP address migration.
+    </para>
+ 
+    <para>
+     The method for triggering failover is an important part of planning
+     and design. One potential option is the <varname>restore_command</>
+     command.  It is executed once for each WAL file, but the process
+     running the <varname>restore_command</> is created and dies for
+     each file, so there is no daemon or server process, and we cannot
+     use signals or a signal handler. Therefore, the
+     <varname>restore_command</> is not suitable to trigger failover.
+     It is possible to use a simple timeout facility, especially if
+     used in conjunction with a known <varname>archive_timeout</>
+     setting on the primary. However, this is somewhat error prone
+     since a network problem or busy primary server might be sufficient
+     to initiate failover. A notification mechanism such as the explicit
+     creation of a trigger file is ideal, if this can be arranged.
+    </para>
+ 
+    <para>
+     The size of the WAL archive can be minimized by using the <literal>%r</>
+     option of the <varname>restore_command</>. This option specifies the
+     last archive file name that needs to be kept to allow the recovery to
+     restart correctly. This can be used to truncate the archive once
+     files are no longer required, assuming the archive is writable from the
+     standby server.
+    </para>
+   </sect2>
+ 
+   <sect2 id="warm-standby-config">
+    <title>Implementation</title>
+ 
+    <para>
+     The short procedure for configuring a standby server is as follows. For
+     full details of each step, refer to previous sections as noted.
+     <orderedlist>
+      <listitem>
+       <para>
+        Set up primary and standby systems as nearly identical as
+        possible, including two identical copies of
+        <productname>PostgreSQL</> at the same release level.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Set up continuous archiving from the primary to a WAL archive
+        directory on the standby server. Ensure that
+        <xref linkend="guc-archive-mode">,
+        <xref linkend="guc-archive-command"> and
+        <xref linkend="guc-archive-timeout">
+        are set appropriately on the primary
+        (see <xref linkend="backup-archiving-wal">).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Make a base backup of the primary server (see <xref
+        linkend="backup-base-backup">), and load this data onto the standby.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Begin recovery on the standby server from the local WAL
+        archive, using a <filename>recovery.conf</> that specifies a
+        <varname>restore_command</> that waits as described
+        previously (see <xref linkend="backup-pitr-recovery">).
+       </para>
+      </listitem>
+     </orderedlist>
+    </para>
+ 
+    <para>
+     Recovery treats the WAL archive as read-only, so once a WAL file has
+     been copied to the standby system it can be copied to tape at the same
+     time as it is being read by the standby database server.
+     Thus, running a standby server for high availability can be performed at
+     the same time as files are stored for longer term disaster recovery
+     purposes.
+    </para>
+ 
+    <para>
+     For testing purposes, it is possible to run both primary and standby
+     servers on the same system. This does not provide any worthwhile
+     improvement in server robustness, nor would it be described as HA.
+    </para>
+   </sect2>
+ 
+   <sect2 id="warm-standby-record">
+    <title>Record-based Log Shipping</title>
+ 
+    <para>
+     <productname>PostgreSQL</productname> directly supports file-based
+     log shipping as described above. It is also possible to implement
+     record-based log shipping, though this requires custom development.
+    </para>
+ 
+    <para>
+     An external program can call the <function>pg_xlogfile_name_offset()</>
+     function (see <xref linkend="functions-admin">)
+     to find out the file name and the exact byte offset within it of
+     the current end of WAL.  It can then access the WAL file directly
+     and copy the data from the last known end of WAL through the current end
+     over to the standby servers.  With this approach, the window for data
+     loss is the polling cycle time of the copying program, which can be very
+     small, and there is no wasted bandwidth from forcing partially-used
+     segment files to be archived.  Note that the standby servers'
+     <varname>restore_command</> scripts can only deal with whole WAL files,
+     so the incrementally copied data is not ordinarily made available to
+     the standby servers.  It is of use only when the primary dies &mdash;
+     then the last partial WAL file is fed to the standby before allowing
+     it to come up.  The correct implementation of this process requires
+     cooperation of the <varname>restore_command</> script with the data
+     copying program.
+    </para>
+ 
+    <para>
+     Starting with <productname>PostgreSQL</> version 8.5, you can use
+     streaming replication (see <xref linkend="streaming-replication">) to
+     achieve the same with less effort.
+    </para>
+   </sect2>
+  </sect1>
+ 
+   <sect1 id="streaming-replication">
+    <title>Streaming Replication</title>
+ 
+    <indexterm zone="high-availability">
+     <primary>Streaming Replication</primary>
+    </indexterm>
+ 
+    <para>
+     <productname>PostgreSQL</> includes a simple streaming replication
+     mechanism, which lets the standby server to stay more up-to-date than
+     file-based replication allows. The standby connects to the primary
+     and the primary starts streaming WAL records from where the standby
+     left off, and continues streaming them as they are generated, without
+     waiting for the WAL file to be filled. So with streaming replication,
+     <varname>archive_timeout</> does not need to be configured.
+    </para>
+ 
+    <para>
+     Streaming replication relies on file-based continuous archiving for
+     making the base backup and for allowing a standby to catch up if it's
+     disconnected from the primary for long enough for the primary to
+     delete old WAL files still required by the standby.
+    </para>
+ 
+    <sect2 id="streaming-replication-setup">
+     <title>Setup</title>
+     <para>
+      The short procedure for configuring streaming replication is as follows.
+      For full details of each step, refer to other sections as noted.
+      <orderedlist>
+       <listitem>
+        <para>
+         Set up primary and standby systems as near identically as possible,
+         including two identical copies of <productname>PostgreSQL</> at the
+         same release level.
+        </para>
+       </listitem>
+      <listitem>
+       <para>
+        Set up continuous archiving from the primary to a WAL archive located
+        in a directory on the standby server. Ensure that
+        <xref linkend="guc-archive-mode">,
+        <xref linkend="guc-archive-command"> and
+        <xref linkend="guc-archive-timeout">
+        are set appropriately on the primary
+        (see <xref linkend="backup-archiving-wal">).
+       </para>
+      </listitem>
+ 
+      <listitem>
+       <para>
+        Set up connections and authentication so that the standby server can
+        successfully connect to the pseudo <literal>replication</> database of
+        the primary server (see
+        <xref linkend="streaming-replication-authentication">). Ensure that
+        <xref linkend="guc-listen-addresses"> and <filename>pg_hba.conf</> are
+        configured appropriately on the primary.
+       </para>
+       <para>
+        On systems that support the keepalive socket option, setting
+        <xref linkend="guc-tcp-keepalives-idle">,
+        <xref linkend="guc-tcp-keepalives-interval"> and
+        <xref linkend="guc-tcp-keepalives-count"> helps you to find the
+        troubles with replication (e.g., the network outage or the failure of
+        the standby server) as soon as possible.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Set the maximum number of concurrent connections from the standby servers
+        (see <xref linkend="guc-max-wal-senders"> for details).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Enable WAL archiving in the primary server because we need to make a base
+        backup of it later (see <xref linkend="guc-archive-mode"> and
+        <xref linkend="guc-archive-command"> for details).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Start the <productname>PostgreSQL</> server on the primary.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Make a base backup of the primary server (see
+        <xref linkend="backup-base-backup">), and load this data onto the
+        standby. Note that all files present in <filename>pg_xlog</>
+        and <filename>pg_xlog/archive_status</> on the <emphasis>standby</>
+        server should be removed because they might be obsolete.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Set up WAL archiving, connections and authentication like the primary
+        server, because the standby server might work as a primary server after
+        failover. Ensure that your settings are consistent with the
+        <emphasis>future</> environment after the primary and the standby
+        server are interchanged by failover. If you're setting up the standby
+        server for e.g reporting purposes, with no plans to fail over to it,
+        configure the standby accordingly.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Create a recovery command file <filename>recovery.conf</> in the data
+        directory on the standby server.
+       </para>
+ 
+       <variablelist id="replication-config-settings" xreflabel="Replication Settings">
+        <varlistentry id="standby-mode" xreflabel="standby_mode">
+         <term><varname>standby_mode</varname> (<type>boolean</type>)</term>
+         <listitem>
+          <para>
+           Specifies whether to start the <productname>PostgreSQL</> server as
+           a standby. If this parameter is <literal>on</>, the streaming
+           replication is enabled and the standby server will try to connect
+           to the primary to receive and apply WAL records continuously. The
+           default is <literal>off</>, which allows only an archive recovery
+           without replication. So, streaming replication requires this
+           parameter to be explicitly set to <literal>on</>.
+          </para>
+         </listitem>
+        </varlistentry>
+        <varlistentry id="primary-conninfo" xreflabel="primary_conninfo">
+         <term><varname>primary_conninfo</varname> (<type>string</type>)</term>
+         <listitem>
+          <para>
+           Specifies a connection string which is used for the standby server
+           to connect with the primary. This string is in the same format as
+           described in <xref linkend="libpq-connect">. If any option is
+           unspecified in this string, then the corresponding environment
+           variable (see <xref linkend="libpq-envars">) is checked. If the
+           environment variable is not set either, then the indicated built-in
+           defaults are used.
+          </para>
+          <para>
+           The built-in replication requires that a host name (or host address)
+           or port number which the primary server listens on should be
+           specified in this string, respectively. Also ensure that a role with
+           the <literal>SUPERUSER</> and <literal>LOGIN</> privileges on the
+           primary is set (see
+           <xref linkend="streaming-replication-authentication">). Note that
+           the password needs to be set if the primary demands password
+           authentication.
+          </para>
+         </listitem>
+        </varlistentry>
+        <varlistentry id="trigger-file" xreflabel="trigger_file">
+         <term><varname>trigger_file</varname> (<type>string</type>)</term>
+         <listitem>
+          <para>
+           Specifies a trigger file whose presence activates the standby.
+           If no trigger file is specified, the standby never exits
+           recovery.
+          </para>
+         </listitem>
+        </varlistentry>
+       </variablelist>
+      </listitem>
+      <listitem>
+       <para>
+        Start the <productname>PostgreSQL</> server on the standby. The standby
+        server will go into recovery mode and proceeds to receive WAL records
+        from the primary and apply them continuously.
+       </para>
+      </listitem>
+      </orderedlist>
+     </para>
+    </sect2>
+    <sect2 id="streaming-replication-authentication">
+     <title>Authentication</title>
+     <para>
+      It's very important that the access privilege for replication are set
+      properly so that only trusted users can read the WAL stream, because it's
+      easy to extract serious information from it.
+     </para>
+     <para>
+      Only superuser is allowed to connect to the primary as the replication
+      standby. So a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
+      privileges needs to be created in the primary.
+     </para>
+     <para>
+      Client authentication for replication is controlled by the
+      <filename>pg_hba.conf</> record specifying <literal>replication</> in the
+      <replaceable>database</> field. For example, if the standby is running on
+      host IP <literal>192.168.1.100</> and the superuser's name for replication
+      is <literal>foo</>, the administrator can add the following line to the
+      <filename>pg_hba.conf</> file on the primary.
+ 
+ <programlisting>
+ # Allow the user "foo" from host 192.168.1.100 to connect to the primary
+ # as a replication standby if the user's password is correctly supplied.
+ #
+ # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
+ host    replication     foo             192.168.1.100/32        md5
+ </programlisting>
+     </para>
+     <para>
+      The host name and port number of the primary, user name to connect as,
+      and password are specified in the <filename>recovery.conf</> file or
+      the corresponding environment variable on the standby.
+      For example, if the primary is running on host IP <literal>192.168.1.50</>,
+      port <literal>5432</literal>, the superuser's name for replication is
+      <literal>foo</>, and the password is <literal>foopass</>, the administrator
+      can add the following line to the <filename>recovery.conf</> file on the
+      standby.
+ 
+ <programlisting>
+ # The standby connects to the primary that is running on host 192.168.1.50
+ # and port 5432 as the user "foo" whose password is "foopass".
+ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+ </programlisting>
+     </para>
+    </sect2>
+   </sect1>
+ 
+   <sect1 id="warm-standby-failover">
+    <title>Failover</title>
+ 
+    <para>
+     If the primary server fails then the standby server should begin
+     failover procedures.
+    </para>
+ 
+    <para>
+     If the standby server fails then no failover need take place. If the
+     standby server can be restarted, even some time later, then the recovery
+     process can also be immediately restarted, taking advantage of
+     restartable recovery. If the standby server cannot be restarted, then a
+     full new standby server instance should be created.
+    </para>
+ 
+    <para>
+     If the primary server fails and the standby server becomes the
+     new primary, and then the old primary restarts, you must have
+     a mechanism for informing old primary that it is no longer the primary. This is
+     sometimes known as STONITH (Shoot The Other Node In The Head), which is
+     necessary to avoid situations where both systems think they are the
+     primary, which will lead to confusion and ultimately data loss.
+    </para>
+ 
+    <para>
+     Many failover systems use just two systems, the primary and the standby,
+     connected by some kind of heartbeat mechanism to continually verify the
+     connectivity between the two and the viability of the primary. It is
+     also possible to use a third system (called a witness server) to prevent
+     some cases of inappropriate failover, but the additional complexity
+     might not be worthwhile unless it is set up with sufficient care and
+     rigorous testing.
+    </para>
+ 
+    <para>
+     Once failover to the standby occurs, we have only a
+     single server in operation. This is known as a degenerate state.
+     The former standby is now the primary, but the former primary is down
+     and might stay down.  To return to normal operation we must
+     fully recreate a standby server,
+     either on the former primary system when it comes up, or on a third,
+     possibly new, system. Once complete the primary and standby can be
+     considered to have switched roles. Some people choose to use a third
+     server to provide backup for the new primary until the new standby
+     server is recreated,
+     though clearly this complicates the system configuration and
+     operational processes.
+    </para>
+ 
+    <para>
+     So, switching from primary to standby server can be fast but requires
+     some time to re-prepare the failover cluster. Regular switching from
+     primary to standby is useful, since it allows regular downtime on
+     each system for maintenance. This also serves as a test of the
+     failover mechanism to ensure that it will really work when you need it.
+     Written administration procedures are advised.
+    </para>
+   </sect1>
+ 
+  <sect1 id="hot-standby">
+   <title>Hot Standby</title>
+ 
+   <indexterm zone="high-availability">
+    <primary>Hot Standby</primary>
+   </indexterm>
+ 
+    <para>
+     Hot Standby is the term used to describe the ability to connect to
+     the server and run queries while the server is in archive recovery. This
+     is useful for both log shipping replication and for restoring a backup
+     to an exact state with great precision.
+     The term Hot Standby also refers to the ability of the server to move
+     from recovery through to normal running while users continue running
+     queries and/or continue their connections.
+    </para>
+ 
+    <para>
+     Running queries in recovery is in many ways the same as normal running
+     though there are a large number of usage and administrative points
+     to note.
+    </para>
+ 
+   <sect2 id="hot-standby-users">
+    <title>User's Overview</title>
+ 
+    <para>
+     Users can connect to the database while the server is in recovery
+     and perform read-only queries. Read-only access to catalogs and views
+     will also occur as normal.
+    </para>
+ 
+    <para>
+     The data on the standby takes some time to arrive from the primary server
+     so there will be a measurable delay between primary and standby. Running the
+     same query nearly simultaneously on both primary and standby might therefore
+     return differing results. We say that data on the standby is eventually
+     consistent with the primary.
+     Queries executed on the standby will be correct with regard to the transactions
+     that had been recovered at the start of the query, or start of first statement,
+     in the case of serializable transactions. In comparison with the primary,
+     the standby returns query results that could have been obtained on the primary
+     at some exact moment in the past.
+    </para>
+ 
+    <para>
+     When a transaction is started in recovery, the parameter
+     <varname>transaction_read_only</> will be forced to be true, regardless of the
+     <varname>default_transaction_read_only</> setting in <filename>postgresql.conf</>.
+     It can't be manually set to false either. As a result, all transactions
+     started during recovery will be limited to read-only actions only. In all
+     other ways, connected sessions will appear identical to sessions
+     initiated during normal processing mode. There are no special commands
+     required to initiate a connection at this time, so all interfaces
+     work normally without change. After recovery finishes, the session
+     will allow normal read-write transactions at the start of the next
+     transaction, if these are requested.
+    </para>
+ 
+    <para>
+     Read-only here means "no writes to the permanent database tables".
+     There are no problems with queries that make use of transient sort and
+     work files.
+    </para>
+ 
+    <para>
+     The following actions are allowed
+ 
+     <itemizedlist>
+      <listitem>
+       <para>
+        Query access - SELECT, COPY TO including views and SELECT RULEs
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Cursor commands - DECLARE, FETCH, CLOSE,
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Parameters - SHOW, SET, RESET
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Transaction management commands
+         <itemizedlist>
+          <listitem>
+           <para>
+            BEGIN, END, ABORT, START TRANSACTION
+           </para>
+          </listitem>
+          <listitem>
+           <para>
+            SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
+           </para>
+          </listitem>
+          <listitem>
+           <para>
+            EXCEPTION blocks and other internal subtransactions
+           </para>
+          </listitem>
+         </itemizedlist>
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        LOCK TABLE, though only when explicitly in one of these modes:
+        ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Plugins and extensions - LOAD
+       </para>
+      </listitem>
+     </itemizedlist>
+    </para>
+ 
+    <para>
+     These actions produce error messages
+ 
+     <itemizedlist>
+      <listitem>
+       <para>
+        Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
+        Note that there are no allowed actions that result in a trigger
+        being executed during recovery.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
+        This also applies to temporary tables currently because currently their
+        definition causes writes to catalog tables.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        SELECT ... FOR SHARE | UPDATE which cause row locks to be written
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        RULEs on SELECT statements that generate DML commands.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
+        LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Transaction management commands that explicitly set non-read only state
+         <itemizedlist>
+          <listitem>
+           <para>
+             BEGIN READ WRITE,
+             START TRANSACTION READ WRITE
+           </para>
+          </listitem>
+          <listitem>
+           <para>
+             SET TRANSACTION READ WRITE,
+             SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
+           </para>
+          </listitem>
+          <listitem>
+           <para>
+            SET transaction_read_only = off
+           </para>
+          </listitem>
+         </itemizedlist>
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
+        ROLLBACK PREPARED because even read-only transactions need to write
+        WAL in the prepare phase (the first phase of two phase commit).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        sequence update - nextval()
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
+       </para>
+      </listitem>
+     </itemizedlist>
+    </para>
+ 
+    <para>
+     Note that current behaviour of read only transactions when not in
+     recovery is to allow the last two actions, so there are small and
+     subtle differences in behaviour between read-only transactions
+     run on standby and during normal running.
+     It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
+     temporary tables may be lifted in a future release, if their internal
+     implementation is altered to make this possible.
+    </para>
+ 
+    <para>
+     If failover or switchover occurs the database will switch to normal
+     processing mode. Sessions will remain connected while the server
+     changes mode. Current transactions will continue, though will remain
+     read-only. After recovery is complete, it will be possible to initiate
+     read-write transactions.
+    </para>
+ 
+    <para>
+     Users will be able to tell whether their session is read-only by
+     issuing SHOW transaction_read_only.  In addition a set of
+     functions <xref linkend="functions-recovery-info-table"> allow users to
+     access information about Hot Standby. These allow you to write
+     functions that are aware of the current state of the database. These
+     can be used to monitor the progress of recovery, or to allow you to
+     write complex programs that restore the database to particular states.
+    </para>
+ 
+    <para>
+     In recovery, transactions will not be permitted to take any table lock
+     higher than RowExclusiveLock. In addition, transactions may never assign
+     a TransactionId and may never write WAL.
+     Any <command>LOCK TABLE</> command that runs on the standby and requests
+     a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
+    </para>
+ 
+    <para>
+     In general queries will not experience lock conflicts with the database
+     changes made by recovery. This is becase recovery follows normal
+     concurrency control mechanisms, known as <acronym>MVCC</>. There are
+     some types of change that will cause conflicts, covered in the following
+     section.
+    </para>
+   </sect2>
+ 
+   <sect2 id="hot-standby-conflict">
+    <title>Handling query conflicts</title>
+ 
+    <para>
+     The primary and standby nodes are in many ways loosely connected. Actions
+     on the primary will have an effect on the standby. As a result, there is
+     potential for negative interactions or conflicts between them. The easiest
+     conflict to understand is performance: if a huge data load is taking place
+     on the primary then this will generate a similar stream of WAL records on the
+     standby, so standby queries may contend for system resources, such as I/O.
+    </para>
+ 
+    <para>
+     There are also additional types of conflict that can occur with Hot Standby.
+     These conflicts are <emphasis>hard conflicts</> in the sense that we may
+     need to cancel queries and in some cases disconnect sessions to resolve them.
+     The user is provided with a number of optional ways to handle these
+     conflicts, though we must first understand the possible reasons behind a conflict.
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          Access Exclusive Locks from primary node, including both explicit
+          LOCK commands and various kinds of DDL action
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Dropping tablespaces on the primary while standby queries are using
+          those tablespaces for temporary work files (work_mem overflow)
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Dropping databases on the primary while users are connected to that
+          database on the standby.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Waiting to acquire buffer cleanup locks
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Early cleanup of data still visible to the current query's snapshot
+         </para>
+        </listitem>
+       </itemizedlist>
+    </para>
+ 
+    <para>
+     Some WAL redo actions will be for DDL actions. These DDL actions are
+     repeating actions that have already committed on the primary node, so
+     they must not fail on the standby node. These DDL locks take priority
+     and will automatically *cancel* any read-only transactions that get in
+     their way, after a grace period. This is similar to the possibility of
+     being canceled by the deadlock detector, but in this case the standby
+     process always wins, since the replayed actions must not fail. This
+     also ensures that replication doesn't fall behind while we wait for a
+     query to complete. Again, we assume that the standby is there for high
+     availability purposes primarily.
+    </para>
+ 
+    <para>
+     An example of the above would be an Administrator on Primary server
+     runs a <command>DROP TABLE</> on a table that's currently being queried
+     in the standby server.
+     Clearly the query cannot continue if we let the <command>DROP TABLE</>
+     proceed. If this situation occurred on the primary, the <command>DROP TABLE</>
+     would wait until the query has finished. When the query is on the standby
+     and the <command>DROP TABLE</> is on the primary, the primary doesn't have
+     information about which queries are running on the standby and so the query
+     does not wait on the primary. The WAL change records come through to the
+     standby while the standby query is still running, causing a conflict.
+    </para>
+ 
+    <para>
+     The most common reason for conflict between standby queries and WAL redo is
+     "early cleanup". Normally, <productname>PostgreSQL</> allows cleanup of old
+     row versions when there are no users who may need to see them to ensure correct
+     visibility of data (the heart of MVCC). If there is a standby query that has
+     been running for longer than any query on the primary then it is possible
+     for old row versions to be removed by either a vacuum or HOT. This will
+     then generate WAL records that, if applied, would remove data on the
+     standby that might *potentially* be required by the standby query.
+     In more technical language, the primary's xmin horizon is later than
+     the standby's xmin horizon, allowing dead rows to be removed.
+    </para>
+ 
+    <para>
+     Experienced users should note that both row version cleanup and row version
+     freezing will potentially conflict with recovery queries. Running a
+     manual <command>VACUUM FREEZE</> is likely to cause conflicts even on tables
+     with no updated or deleted rows.
+    </para>
+ 
+    <para>
+     We have a number of choices for resolving query conflicts.  The default
+     is that we wait and hope the query completes. The server will wait
+     automatically until the lag between primary and standby is at most
+     <varname>max_standby_delay</> seconds. Once that grace period expires,
+     we take one of the following actions:
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          If the conflict is caused by a lock, we cancel the conflicting standby
+          transaction immediately. If the transaction is idle-in-transaction
+          then currently we abort the session instead, though this may change
+          in the future.
+         </para>
+        </listitem>
+ 
+        <listitem>
+         <para>
+          If the conflict is caused by cleanup records we tell the standby query
+          that a conflict has occurred and that it must cancel itself to avoid the
+          risk that it silently fails to read relevant data because
+          that data has been removed. (This is regrettably very similar to the
+          much feared and iconic error message "snapshot too old"). Some cleanup
+          records only cause conflict with older queries, though some types of
+          cleanup record affect all queries.
+         </para>
+ 
+         <para>
+          If cancellation does occur, the query and/or transaction can always
+          be re-executed. The error is dynamic and will not necessarily occur
+          the same way if the query is executed again.
+         </para>
+        </listitem>
+       </itemizedlist>
+    </para>
+ 
+    <para>
+     <varname>max_standby_delay</> is set in <filename>postgresql.conf</>.
+     The parameter applies to the server as a whole so if the delay is all used
+     up by a single query then there may be little or no waiting for queries that
+     follow immediately, though they will have benefited equally from the initial
+     waiting period. The server may take time to catch up again before the grace
+     period is available again, though if there is a heavy and constant stream
+     of conflicts it may seldom catch up fully.
+    </para>
+ 
+    <para>
+     Users should be clear that tables that are regularly and heavily updated on
+     primary server will quickly cause cancellation of longer running queries on
+     the standby. In those cases <varname>max_standby_delay</> can be
+     considered somewhat but not exactly the same as setting
+     <varname>statement_timeout</>.
+     </para>
+ 
+    <para>
+     Other remedial actions exist if the number of cancellations is unacceptable.
+     The first option is to connect to primary server and keep a query active
+     for as long as we need to run queries on the standby. This guarantees that
+     a WAL cleanup record is never generated and we don't ever get query
+     conflicts as described above. This could be done using contrib/dblink
+     and pg_sleep(), or via other mechanisms. If you do this, you should note
+     that this will delay cleanup of dead rows by vacuum or HOT and many
+     people may find this undesirable. However, we should remember that
+     primary and standby nodes are linked via the WAL, so this situation is no
+     different to the case where we ran the query on the primary node itself
+     except we have the benefit of off-loading the execution onto the standby.
+    </para>
+ 
+    <para>
+     It is also possible to set <varname>vacuum_defer_cleanup_age</> on the primary
+     to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
+     more time for queries to execute before they are cancelled on the standby,
+     without the need for setting a high <varname>max_standby_delay</>.
+    </para>
+ 
+    <para>
+     Three-way deadlocks are possible between AccessExclusiveLocks arriving from
+     the primary, cleanup WAL records that require buffer cleanup locks and
+     user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
+     are resolved by time-out when we exceed <varname>max_standby_delay</>.
+    </para>
+ 
+    <para>
+     Dropping tablespaces or databases is discussed in the administrator's
+     section since they are not typical user situations.
+    </para>
+   </sect2>
+ 
+   <sect2 id="hot-standby-admin">
+    <title>Administrator's Overview</title>
+ 
+    <para>
+     If there is a <filename>recovery.conf</> file present the server will start
+     in Hot Standby mode by default, though <varname>recovery_connections</> can
+     be disabled via <filename>postgresql.conf</>, if required. The server may take
+     some time to enable recovery connections since the server must first complete
+     sufficient recovery to provide a consistent state against which queries
+     can run before enabling read only connections. Look for these messages
+     in the server logs
+ 
+ <programlisting>
+ LOG:  initializing recovery connections
+ 
+ ... then some time later ...
+ 
+ LOG:  consistent recovery state reached
+ LOG:  database system is ready to accept read only connections
+ </programlisting>
+ 
+     Consistency information is recorded once per checkpoint on the primary, as long
+     as <varname>recovery_connections</> is enabled (on the primary). If this parameter
+     is disabled, it will not be possible to enable recovery connections on the standby.
+     The consistent state can also be delayed in the presence of both of these conditions
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          a write transaction has more than 64 subtransactions
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          very long-lived write transactions
+         </para>
+        </listitem>
+       </itemizedlist>
+ 
+     If you are running file-based log shipping ("warm standby"), you may need
+     to wait until the next WAL file arrives, which could be as long as the
+     <varname>archive_timeout</> setting on the primary.
+    </para>
+ 
+    <para>
+     The setting of some parameters on the standby will need reconfiguration
+     if they have been changed on the primary. The value on the standby must
+     be equal to or greater than the value on the primary. If these parameters
+     are not set high enough then the standby will not be able to track work
+     correctly from recovering transactions. If these values are set too low the
+     the server will halt. Higher values can then be supplied and the server
+     restarted to begin recovery again.
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          <varname>max_connections</>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <varname>max_prepared_transactions</>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <varname>max_locks_per_transaction</>
+         </para>
+        </listitem>
+       </itemizedlist>
+    </para>
+ 
+    <para>
+     It is important that the administrator consider the appropriate setting
+     of <varname>max_standby_delay</>, set in <filename>postgresql.conf</>.
+     There is no optimal setting and should be set according to business
+     priorities. For example if the server is primarily tasked as a High
+     Availability server, then you may wish to lower
+     <varname>max_standby_delay</> or even set it to zero, though that is a
+     very aggressive setting. If the standby server is tasked as an additional
+     server for decision support queries then it may be acceptable to set this
+     to a value of many hours (in seconds).
+    </para>
+ 
+    <para>
+     Transaction status "hint bits" written on primary are not WAL-logged,
+     so data on standby will likely re-write the hints again on the standby.
+     Thus the main database blocks will produce write I/Os even though
+     all users are read-only; no changes have occurred to the data values
+     themselves.  Users will be able to write large sort temp files and
+     re-generate relcache info files, so there is no part of the database
+     that is truly read-only during hot standby mode. There is no restriction
+     on the use of set returning functions, or other users of tuplestore/tuplesort
+     code. Note also that writes to remote databases will still be possible,
+     even though the transaction is read-only locally.
+    </para>
+ 
+    <para>
+     The following types of administrator command are not accepted
+     during recovery mode
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          Data Definition Language (DDL) - e.g. CREATE INDEX
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Privilege and Ownership - GRANT, REVOKE, REASSIGN
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
+         </para>
+        </listitem>
+       </itemizedlist>
+    </para>
+ 
+    <para>
+     Note again that some of these commands are actually allowed during
+     "read only" mode transactions on the primary.
+    </para>
+ 
+    <para>
+     As a result, you cannot create additional indexes that exist solely
+     on the standby, nor can statistics that exist solely on the standby.
+     If these administrator commands are needed they should be executed
+     on the primary so that the changes will propagate through to the
+     standby.
+    </para>
+ 
+    <para>
+     <function>pg_cancel_backend()</> will work on user backends, but not the
+     Startup process, which performs recovery. pg_stat_activity does not
+     show an entry for the Startup process, nor do recovering transactions
+     show as active. As a result, pg_prepared_xacts is always empty during
+     recovery. If you wish to resolve in-doubt prepared transactions
+     then look at pg_prepared_xacts on the primary and issue commands to
+     resolve those transactions there.
+    </para>
+ 
+    <para>
+     pg_locks will show locks held by backends as normal. pg_locks also shows
+     a virtual transaction managed by the Startup process that owns all
+     AccessExclusiveLocks held by transactions being replayed by recovery.
+     Note that Startup process does not acquire locks to
+     make database changes and thus locks other than AccessExclusiveLocks
+     do not show in pg_locks for the Startup process, they are just presumed
+     to exist.
+    </para>
+ 
+    <para>
+     <productname>check_pgsql</> will work, but it is very simple.
+     <productname>check_postgres</> will also work, though many some actions
+     could give different or confusing results.
+     e.g. last vacuum time will not be maintained for example, since no
+     vacuum occurs on the standby (though vacuums running on the primary do
+     send their changes to the standby).
+    </para>
+ 
+    <para>
+     WAL file control commands will not work during recovery
+     e.g. <function>pg_start_backup</>, <function>pg_switch_xlog</> etc..
+    </para>
+ 
+    <para>
+     Dynamically loadable modules work, including pg_stat_statements.
+    </para>
+ 
+    <para>
+     Advisory locks work normally in recovery, including deadlock detection.
+     Note that advisory locks are never WAL logged, so it is not possible for
+     an advisory lock on either the primary or the standby to conflict with WAL
+     replay. Nor is it possible to acquire an advisory lock on the primary
+     and have it initiate a similar advisory lock on the standby. Advisory
+     locks relate only to a single server on which they are acquired.
+    </para>
+ 
+    <para>
+     Trigger-based replication systems such as <productname>Slony</>,
+     <productname>Londiste</> and <productname>Bucardo</> won't run on the
+     standby at all, though they will run happily on the primary server as
+     long as the changes are not sent to standby servers to be applied.
+     WAL replay is not trigger-based so you cannot relay from the
+     standby to any system that requires additional database writes or
+     relies on the use of triggers.
+    </para>
+ 
+    <para>
+     New oids cannot be assigned, though some <acronym>UUID</> generators may still
+     work as long as they do not rely on writing new status to the database.
+    </para>
+ 
+    <para>
+     Currently, temp table creation is not allowed during read only
+     transactions, so in some cases existing scripts will not run correctly.
+     It is possible we may relax that restriction in a later release. This is
+     both a SQL Standard compliance issue and a technical issue.
+    </para>
+ 
+    <para>
+     <command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
+     Some standby users may be actively using the tablespace via their
+     <varname>temp_tablespaces</> parameter. If there are temp files in the
+     tablespace we currently cancel all active queries to ensure that temp
+     files are removed, so that we can remove the tablespace and continue with
+     WAL replay.
+    </para>
+ 
+    <para>
+     Running <command>DROP DATABASE</>, <command>ALTER DATABASE ... SET TABLESPACE</>,
+     or <command>ALTER DATABASE ... RENAME</> on primary will generate a log message
+     that will cause all users connected to that database on the standby to be
+     forcibly disconnected. This action occurs immediately, whatever the setting of
+     <varname>max_standby_delay</>.
+    </para>
+ 
+    <para>
+     In normal running, if you issue <command>DROP USER</> or <command>DROP ROLE</>
+     for a role with login capability while that user is still connected then
+     nothing happens to the connected user - they remain connected. The user cannot
+     reconnect however. This behaviour applies in recovery also, so a
+     <command>DROP USER</> on the primary does not disconnect that user on the standby.
+    </para>
+ 
+    <para>
+     Stats collector is active during recovery. All scans, reads, blocks,
+     index usage etc will all be recorded normally on the standby. Replayed
+     actions will not duplicate their effects on primary, so replaying an
+     insert will not increment the Inserts column of pg_stat_user_tables.
+     The stats file is deleted at start of recovery, so stats from primary
+     and standby will differ; this is considered a feature not a bug.
+    </para>
+ 
+    <para>
+     Autovacuum is not active during recovery, though will start normally
+     at the end of recovery.
+    </para>
+ 
+    <para>
+     Background writer is active during recovery and will perform
+     restartpoints (similar to checkpoints on primary) and normal block
+     cleaning activities. The <command>CHECKPOINT</> command is accepted during recovery,
+     though performs a restartpoint rather than a new checkpoint.
+    </para>
+   </sect2>
+ 
+   <sect2 id="hot-standby-parameters">
+    <title>Hot Standby Parameter Reference</title>
+ 
+    <para>
+     Various parameters have been mentioned above in the <xref linkend="hot-standby-admin">
+     and <xref linkend="hot-standby-conflict"> sections.
+    </para>
+ 
+    <para>
+     On the primary, parameters <varname>recovery_connections</> and
+     <varname>vacuum_defer_cleanup_age</> can be used to enable and control the
+     primary server to assist the successful configuration of Hot Standby servers.
+     <varname>max_standby_delay</> has no effect if set on the primary.
+    </para>
+ 
+    <para>
+     On the standby, parameters <varname>recovery_connections</> and
+     <varname>max_standby_delay</> can be used to enable and control Hot Standby.
+     standby server to assist the successful configuration of Hot Standby servers.
+     <varname>vacuum_defer_cleanup_age</> has no effect during recovery.
+    </para>
+   </sect2>
+ 
+   <sect2 id="hot-standby-caveats">
+    <title>Caveats</title>
+ 
+    <para>
+     At this writing, there are several limitations of Hot Standby.
+     These can and probably will be fixed in future releases:
+ 
+   <itemizedlist>
+    <listitem>
+     <para>
+      Operations on hash indexes are not presently WAL-logged, so
+      replay will not update these indexes.  Hash indexes will not be
+      used for query plans during recovery.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Full knowledge of running transactions is required before snapshots
+      may be taken. Transactions that take use large numbers of subtransactions
+      (currently greater than 64) will delay the start of read only
+      connections until the completion of the longest running write transaction.
+      If this situation occurs explanatory messages will be sent to server log.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Valid starting points for recovery connections are generated at each
+      checkpoint on the master. If the standby is shutdown while the master
+      is in a shutdown state it may not be possible to re-enter Hot Standby
+      until the primary is started up so that it generates further starting
+      points in the WAL logs. This is not considered a serious issue
+      because the standby is usually switched into the primary role while
+      the first node is taken down.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      At the end of recovery, AccessExclusiveLocks held by prepared transactions
+      will require twice the normal number of lock table entries. If you plan
+      on running either a large number of concurrent prepared transactions
+      that normally take AccessExclusiveLocks, or you plan on having one
+      large transaction that takes many AccessExclusiveLocks then you are
+      advised to select a larger value of <varname>max_locks_per_transaction</>,
+      up to, but never more than twice the value of the parameter setting on
+      the primary server in rare extremes. You need not consider this at all if
+      your setting of <varname>max_prepared_transactions</> is <literal>0</>.
+     </para>
+    </listitem>
+   </itemizedlist>
+ 
+    </para>
+   </sect2>
+ 
+  </sect1>
+ 
+   <sect1 id="backup-incremental-updated">
+    <title>Incrementally Updated Backups</title>
+ 
+   <indexterm zone="high-availability">
+    <primary>incrementally updated backups</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>change accumulation</primary>
+   </indexterm>
+ 
+    <para>
+     In a warm standby configuration, it is possible to offload the expense of
+     taking periodic base backups from the primary server; instead base backups
+     can be made by backing
+     up a standby server's files.  This concept is generally known as
+     incrementally updated backups, log change accumulation, or more simply,
+     change accumulation.
+    </para>
+ 
+    <para>
+     If we take a file system backup of the standby server's data
+     directory while it is processing
+     logs shipped from the primary, we will be able to reload that backup and
+     restart the standby's recovery process from the last restart point.
+     We no longer need to keep WAL files from before the standby's restart point.
+     If we need to recover, it will be faster to recover from the incrementally
+     updated backup than from the original base backup.
+    </para>
+ 
+    <para>
+     Since the standby server is not <quote>live</>, it is not possible to
+     use <function>pg_start_backup()</> and <function>pg_stop_backup()</>
+     to manage the backup process; it will be up to you to determine how
+     far back you need to keep WAL segment files to have a recoverable
+     backup.  You can do this by running <application>pg_controldata</>
+     on the standby server to inspect the control file and determine the
+     current checkpoint WAL location, or by using the
+     <varname>log_checkpoints</> option to print values to the standby's
+     server log.
+    </para>
+   </sect1>
+ 
  </chapter>
#6Thom Brown
thombrown@gmail.com
In reply to: Fujii Masao (#5)
1 attachment(s)
Re: Streaming replication in docs

On 9 February 2010 01:57, Fujii Masao <masao.fujii@gmail.com> wrote:

On Sun, Feb 7, 2010 at 7:49 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Yeah, as you noticed, the documentation for streaming replication is
currently non-existent. That obviously needs to be fixed. I think we
need to move the existing sections around, and of course add a lot of
documentation for the new streaming replication and hot standby specific
stuff. I proposed a new layout here:

http://archives.postgresql.org/message-id/4B50A600.3090309@enterprisedb.com

The attached patch changes the layout according to your suggestion
(but with a small modification). Here is the new layout:

-------------------------------
Chapter 24. Backup and Restore
24.1. SQL Dump
   24.1.1. Restoring the dump
   24.1.2. Using pg_dumpall
   24.1.3. Handling large databases
24.2. File System Level Backup
24.3. Continuous Archiving and Point-In-Time Recovery (PITR)
   24.3.1. Setting up WAL archiving
   24.3.2. Making a Base Backup
   24.3.3. Recovering using a Continuous Archive Backup
   24.3.4. Timelines
   24.3.5. Tips and Examples
   24.3.6. Caveats
24.4. Migration Between Releases

Chapter 25. High Availability, Load Balancing, and Replication
25.1. Comparison of different solutions
25.2. File-based Log Shipping
   25.2.1. Planning
   25.2.2. Implementation
   25.2.3. Record-based Log Shipping
25.3. Streaming Replication
   25.3.1. Setup
   25.3.2. Authentication
25.4. Failover
25.5. Hot Standby
   25.5.1. User's Overview
   25.5.2. Handling query conflicts
   25.5.3. Administrator's Overview
   25.5.4. Hot Standby Parameter Reference
   25.5.5. Caveats
25.6. Incrementally Updated Backups
-------------------------------

I didn't change the content except adjusting the "sect" and
the "indexterm" tags to new chapter.

If this layout change will have been committed, I'll improve
the SR section.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

And can I just correct a couple typo's while we're changing it too? (attached)

Thanks

Thom

Attachments:

replication-doc-layout-changev2.patchapplication/octet-stream; name=replication-doc-layout-changev2.patchDownload
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1492,2879 **** archive_command = 'local_backup_script.sh'
    </sect2>
   </sect1>
  
-  <sect1 id="warm-standby">
-   <title>Warm Standby Servers for High Availability</title>
- 
-   <indexterm zone="backup">
-    <primary>warm standby</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>PITR standby</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>standby server</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>log shipping</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>witness server</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>STONITH</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>high availability</primary>
-   </indexterm>
- 
-   <para>
-    Continuous archiving can be used to create a <firstterm>high
-    availability</> (HA) cluster configuration with one or more
-    <firstterm>standby servers</> ready to take over operations if the
-    primary server fails. This capability is widely referred to as
-    <firstterm>warm standby</> or <firstterm>log shipping</>.
-   </para>
- 
-   <para>
-    The primary and standby server work together to provide this capability,
-    though the servers are only loosely coupled. The primary server operates
-    in continuous archiving mode, while each standby server operates in
-    continuous recovery mode, reading the WAL files from the primary. No
-    changes to the database tables are required to enable this capability,
-    so it offers low administration overhead compared to some other
-    replication approaches. This configuration also has relatively low
-    performance impact on the primary server.
-   </para>
- 
-   <para>
-    Directly moving WAL records from one database server to another
-    is typically described as log shipping. <productname>PostgreSQL</>
-    implements file-based log shipping, which means that WAL records are
-    transferred one file (WAL segment) at a time. WAL files (16MB) can be
-    shipped easily and cheaply over any distance, whether it be to an
-    adjacent system, another system at the same site, or another system on
-    the far side of the globe. The bandwidth required for this technique
-    varies according to the transaction rate of the primary server.
-    Record-based log shipping is also possible with custom-developed
-    procedures, as discussed in <xref linkend="warm-standby-record">.
-   </para>
- 
-   <para>
-    It should be noted that the log shipping is asynchronous, i.e., the WAL
-    records are shipped after transaction commit. As a result there is a
-    window for data loss should the primary server suffer a catastrophic
-    failure: transactions not yet shipped will be lost.  The length of the
-    window of data loss can be limited by use of the
-    <varname>archive_timeout</varname> parameter, which can be set as low
-    as a few seconds if required.  However such a low setting will
-    substantially increase the bandwidth required for file shipping.
-    If you need a window of less than a minute or so, it's probably better
-    to consider record-based log shipping.
-   </para>
- 
-   <para>
-    The standby server is not available for access, since it is continually
-    performing recovery processing. Recovery performance is sufficiently
-    good that the standby will typically be only moments away from full
-    availability once it has been activated. As a result, we refer to this
-    capability as a warm standby configuration that offers high
-    availability. Restoring a server from an archived base backup and
-    roll-forward will take considerably longer, so that technique only
-    offers a solution for disaster recovery, not high availability.
-   </para>
- 
-   <sect2 id="warm-standby-planning">
-    <title>Planning</title>
- 
-    <para>
-     It is usually wise to create the primary and standby servers
-     so that they are as similar as possible, at least from the
-     perspective of the database server.  In particular, the path names
-     associated with tablespaces will be passed across unmodified, so both
-     primary and standby servers must have the same mount paths for
-     tablespaces if that feature is used.  Keep in mind that if
-     <xref linkend="sql-createtablespace" endterm="sql-createtablespace-title">
-     is executed on the primary, any new mount point needed for it must
-     be created on the primary and all standby servers before the command
-     is executed. Hardware need not be exactly the same, but experience shows
-     that maintaining two identical systems is easier than maintaining two
-     dissimilar ones over the lifetime of the application and system.
-     In any case the hardware architecture must be the same &mdash; shipping
-     from, say, a 32-bit to a 64-bit system will not work.
-    </para>
- 
-    <para>
-     In general, log shipping between servers running different major
-     <productname>PostgreSQL</> release
-     levels is not possible. It is the policy of the PostgreSQL Global
-     Development Group not to make changes to disk formats during minor release
-     upgrades, so it is likely that running different minor release levels
-     on primary and standby servers will work successfully. However, no
-     formal support for that is offered and you are advised to keep primary
-     and standby servers at the same release level as much as possible.
-     When updating to a new minor release, the safest policy is to update
-     the standby servers first &mdash; a new minor release is more likely
-     to be able to read WAL files from a previous minor release than vice
-     versa.
-    </para>
- 
-    <para>
-     There is no special mode required to enable a standby server. The
-     operations that occur on both primary and standby servers are
-     normal continuous archiving and recovery tasks. The only point of
-     contact between the two database servers is the archive of WAL files
-     that both share: primary writing to the archive, standby reading from
-     the archive. Care must be taken to ensure that WAL archives from separate
-     primary servers do not become mixed together or confused. The archive
-     need not be large if it is only required for standby operation.
-    </para>
- 
-    <para>
-     The magic that makes the two loosely coupled servers work together is
-     simply a <varname>restore_command</> used on the standby that,
-     when asked for the next WAL file, waits for it to become available from
-     the primary. The <varname>restore_command</> is specified in the
-     <filename>recovery.conf</> file on the standby server. Normal recovery
-     processing would request a file from the WAL archive, reporting failure
-     if the file was unavailable.  For standby processing it is normal for
-     the next WAL file to be unavailable, so we must be patient and wait for
-     it to appear. For files ending in <literal>.backup</> or
-     <literal>.history</> there is no need to wait, and a non-zero return
-     code must be returned. A waiting <varname>restore_command</> can be
-     written as a custom script that loops after polling for the existence of
-     the next WAL file. There must also be some way to trigger failover, which
-     should interrupt the <varname>restore_command</>, break the loop and
-     return a file-not-found error to the standby server. This ends recovery
-     and the standby will then come up as a normal server.
-    </para>
- 
-    <para>
-     Pseudocode for a suitable <varname>restore_command</> is:
- <programlisting>
- triggered = false;
- while (!NextWALFileReady() &amp;&amp; !triggered)
- {
-     sleep(100000L);         /* wait for ~0.1 sec */
-     if (CheckForExternalTrigger())
-         triggered = true;
- }
- if (!triggered)
-         CopyWALFileForRecovery();
- </programlisting>
-    </para>
- 
-    <para>
-     A working example of a waiting <varname>restore_command</> is provided
-     as a <filename>contrib</> module named <application>pg_standby</>. It
-     should be used as a reference on how to correctly implement the logic
-     described above. It can also be extended as needed to support specific
-     configurations and environments.
-    </para>
- 
-    <para>
-     <productname>PostgreSQL</productname> does not provide the system
-     software required to identify a failure on the primary and notify
-     the standby database server.  Many such tools exist and are well
-     integrated with the operating system facilities required for
-     successful failover, such as IP address migration.
-    </para>
- 
-    <para>
-     The method for triggering failover is an important part of planning
-     and design. One potential option is the <varname>restore_command</>
-     command.  It is executed once for each WAL file, but the process
-     running the <varname>restore_command</> is created and dies for
-     each file, so there is no daemon or server process, and we cannot
-     use signals or a signal handler. Therefore, the
-     <varname>restore_command</> is not suitable to trigger failover.
-     It is possible to use a simple timeout facility, especially if
-     used in conjunction with a known <varname>archive_timeout</>
-     setting on the primary. However, this is somewhat error prone
-     since a network problem or busy primary server might be sufficient
-     to initiate failover. A notification mechanism such as the explicit
-     creation of a trigger file is ideal, if this can be arranged.
-    </para>
- 
-    <para>
-     The size of the WAL archive can be minimized by using the <literal>%r</>
-     option of the <varname>restore_command</>. This option specifies the
-     last archive file name that needs to be kept to allow the recovery to
-     restart correctly. This can be used to truncate the archive once
-     files are no longer required, assuming the archive is writable from the
-     standby server.
-    </para>
-   </sect2>
- 
-   <sect2 id="warm-standby-config">
-    <title>Implementation</title>
- 
-    <para>
-     The short procedure for configuring a standby server is as follows. For
-     full details of each step, refer to previous sections as noted.
-     <orderedlist>
-      <listitem>
-       <para>
-        Set up primary and standby systems as nearly identical as
-        possible, including two identical copies of
-        <productname>PostgreSQL</> at the same release level.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Set up continuous archiving from the primary to a WAL archive
-        directory on the standby server. Ensure that
-        <xref linkend="guc-archive-mode">,
-        <xref linkend="guc-archive-command"> and
-        <xref linkend="guc-archive-timeout">
-        are set appropriately on the primary
-        (see <xref linkend="backup-archiving-wal">).
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Make a base backup of the primary server (see <xref
-        linkend="backup-base-backup">), and load this data onto the standby.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Begin recovery on the standby server from the local WAL
-        archive, using a <filename>recovery.conf</> that specifies a
-        <varname>restore_command</> that waits as described
-        previously (see <xref linkend="backup-pitr-recovery">).
-       </para>
-      </listitem>
-     </orderedlist>
-    </para>
- 
-    <para>
-     Recovery treats the WAL archive as read-only, so once a WAL file has
-     been copied to the standby system it can be copied to tape at the same
-     time as it is being read by the standby database server.
-     Thus, running a standby server for high availability can be performed at
-     the same time as files are stored for longer term disaster recovery
-     purposes.
-    </para>
- 
-    <para>
-     For testing purposes, it is possible to run both primary and standby
-     servers on the same system. This does not provide any worthwhile
-     improvement in server robustness, nor would it be described as HA.
-    </para>
-   </sect2>
- 
-   <sect2 id="warm-standby-failover">
-    <title>Failover</title>
- 
-    <para>
-     If the primary server fails then the standby server should begin
-     failover procedures.
-    </para>
- 
-    <para>
-     If the standby server fails then no failover need take place. If the
-     standby server can be restarted, even some time later, then the recovery
-     process can also be immediately restarted, taking advantage of
-     restartable recovery. If the standby server cannot be restarted, then a
-     full new standby server instance should be created.
-    </para>
- 
-    <para>
-     If the primary server fails and the standby server becomes the
-     new primary, and then the old primary restarts, you must have
-     a mechanism for informing old primary that it is no longer the primary. This is
-     sometimes known as STONITH (Shoot The Other Node In The Head), which is
-     necessary to avoid situations where both systems think they are the
-     primary, which will lead to confusion and ultimately data loss.
-    </para>
- 
-    <para>
-     Many failover systems use just two systems, the primary and the standby,
-     connected by some kind of heartbeat mechanism to continually verify the
-     connectivity between the two and the viability of the primary. It is
-     also possible to use a third system (called a witness server) to prevent
-     some cases of inappropriate failover, but the additional complexity
-     might not be worthwhile unless it is set up with sufficient care and
-     rigorous testing.
-    </para>
- 
-    <para>
-     Once failover to the standby occurs, we have only a
-     single server in operation. This is known as a degenerate state.
-     The former standby is now the primary, but the former primary is down
-     and might stay down.  To return to normal operation we must
-     fully recreate a standby server,
-     either on the former primary system when it comes up, or on a third,
-     possibly new, system. Once complete the primary and standby can be
-     considered to have switched roles. Some people choose to use a third
-     server to provide backup for the new primary until the new standby
-     server is recreated,
-     though clearly this complicates the system configuration and
-     operational processes.
-    </para>
- 
-    <para>
-     So, switching from primary to standby server can be fast but requires
-     some time to re-prepare the failover cluster. Regular switching from
-     primary to standby is useful, since it allows regular downtime on
-     each system for maintenance. This also serves as a test of the
-     failover mechanism to ensure that it will really work when you need it.
-     Written administration procedures are advised.
-    </para>
-   </sect2>
- 
-   <sect2 id="warm-standby-record">
-    <title>Record-based Log Shipping</title>
- 
-    <para>
-     <productname>PostgreSQL</productname> directly supports file-based
-     log shipping as described above. It is also possible to implement
-     record-based log shipping, though this requires custom development.
-    </para>
- 
-    <para>
-     An external program can call the <function>pg_xlogfile_name_offset()</>
-     function (see <xref linkend="functions-admin">)
-     to find out the file name and the exact byte offset within it of
-     the current end of WAL.  It can then access the WAL file directly
-     and copy the data from the last known end of WAL through the current end
-     over to the standby servers.  With this approach, the window for data
-     loss is the polling cycle time of the copying program, which can be very
-     small, and there is no wasted bandwidth from forcing partially-used
-     segment files to be archived.  Note that the standby servers'
-     <varname>restore_command</> scripts can only deal with whole WAL files,
-     so the incrementally copied data is not ordinarily made available to
-     the standby servers.  It is of use only when the primary dies &mdash;
-     then the last partial WAL file is fed to the standby before allowing
-     it to come up.  The correct implementation of this process requires
-     cooperation of the <varname>restore_command</> script with the data
-     copying program.
-    </para>
- 
-    <para>
-     Starting with <productname>PostgreSQL</> version 8.5, you can use
-     streaming replication (see <xref linkend="streaming-replication">) to
-     achieve the same with less effort.
-    </para>
-   </sect2>
- 
-   <sect2 id="streaming-replication">
-    <title>Streaming Replication</title>
- 
-    <para>
-     <productname>PostgreSQL</> includes a simple streaming replication
-     mechanism, which lets the standby server to stay more up-to-date than
-     file-based replication allows. The standby connects to the primary
-     and the primary starts streaming WAL records from where the standby
-     left off, and continues streaming them as they are generated, without
-     waiting for the WAL file to be filled. So with streaming replication,
-     <varname>archive_timeout</> does not need to be configured.
-    </para>
- 
-    <para>
-     Streaming replication relies on file-based continuous archiving for
-     making the base backup and for allowing a standby to catch up if it's
-     disconnected from the primary for long enough for the primary to
-     delete old WAL files still required by the standby.
-    </para>
- 
-    <sect3 id="streaming-replication-setup">
-     <title>Setup</title>
-     <para>
-      The short procedure for configuring streaming replication is as follows.
-      For full details of each step, refer to other sections as noted.
-      <orderedlist>
-       <listitem>
-        <para>
-         Set up primary and standby systems as near identically as possible,
-         including two identical copies of <productname>PostgreSQL</> at the
-         same release level.
-        </para>
-       </listitem>
-      <listitem>
-       <para>
-        Set up continuous archiving from the primary to a WAL archive located
-        in a directory on the standby server. Ensure that
-        <xref linkend="guc-archive-mode">,
-        <xref linkend="guc-archive-command"> and
-        <xref linkend="guc-archive-timeout">
-        are set appropriately on the primary
-        (see <xref linkend="backup-archiving-wal">).
-       </para>
-      </listitem>
- 
-      <listitem>
-       <para>
-        Set up connections and authentication so that the standby server can
-        successfully connect to the pseudo <literal>replication</> database of
-        the primary server (see
-        <xref linkend="streaming-replication-authentication">). Ensure that
-        <xref linkend="guc-listen-addresses"> and <filename>pg_hba.conf</> are
-        configured appropriately on the primary.
-       </para>
-       <para>
-        On systems that support the keepalive socket option, setting
-        <xref linkend="guc-tcp-keepalives-idle">,
-        <xref linkend="guc-tcp-keepalives-interval"> and
-        <xref linkend="guc-tcp-keepalives-count"> helps you to find the
-        troubles with replication (e.g., the network outage or the failure of
-        the standby server) as soon as possible.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Set the maximum number of concurrent connections from the standby servers
-        (see <xref linkend="guc-max-wal-senders"> for details).
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Enable WAL archiving in the primary server because we need to make a base
-        backup of it later (see <xref linkend="guc-archive-mode"> and
-        <xref linkend="guc-archive-command"> for details).
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Start the <productname>PostgreSQL</> server on the primary.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Make a base backup of the primary server (see
-        <xref linkend="backup-base-backup">), and load this data onto the
-        standby. Note that all files present in <filename>pg_xlog</>
-        and <filename>pg_xlog/archive_status</> on the <emphasis>standby</>
-        server should be removed because they might be obsolete.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Set up WAL archiving, connections and authentication like the primary
-        server, because the standby server might work as a primary server after
-        failover. Ensure that your settings are consistent with the
-        <emphasis>future</> environment after the primary and the standby
-        server are interchanged by failover. If you're setting up the standby
-        server for e.g reporting purposes, with no plans to fail over to it,
-        configure the standby accordingly.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Create a recovery command file <filename>recovery.conf</> in the data
-        directory on the standby server.
-       </para>
- 
-       <variablelist id="replication-config-settings" xreflabel="Replication Settings">
-        <varlistentry id="standby-mode" xreflabel="standby_mode">
-         <term><varname>standby_mode</varname> (<type>boolean</type>)</term>
-         <listitem>
-          <para>
-           Specifies whether to start the <productname>PostgreSQL</> server as
-           a standby. If this parameter is <literal>on</>, the streaming
-           replication is enabled and the standby server will try to connect
-           to the primary to receive and apply WAL records continuously. The
-           default is <literal>off</>, which allows only an archive recovery
-           without replication. So, streaming replication requires this
-           parameter to be explicitly set to <literal>on</>.
-          </para>
-         </listitem>
-        </varlistentry>
-        <varlistentry id="primary-conninfo" xreflabel="primary_conninfo">
-         <term><varname>primary_conninfo</varname> (<type>string</type>)</term>
-         <listitem>
-          <para>
-           Specifies a connection string which is used for the standby server
-           to connect with the primary. This string is in the same format as
-           described in <xref linkend="libpq-connect">. If any option is
-           unspecified in this string, then the corresponding environment
-           variable (see <xref linkend="libpq-envars">) is checked. If the
-           environment variable is not set either, then the indicated built-in
-           defaults are used.
-          </para>
-          <para>
-           The built-in replication requires that a host name (or host address)
-           or port number which the primary server listens on should be
-           specified in this string, respectively. Also ensure that a role with
-           the <literal>SUPERUSER</> and <literal>LOGIN</> privileges on the
-           primary is set (see
-           <xref linkend="streaming-replication-authentication">). Note that
-           the password needs to be set if the primary demands password
-           authentication.
-          </para>
-         </listitem>
-        </varlistentry>
-        <varlistentry id="trigger-file" xreflabel="trigger_file">
-         <term><varname>trigger_file</varname> (<type>string</type>)</term>
-         <listitem>
-          <para>
-           Specifies a trigger file whose presence activates the standby.
-           If no trigger file is specified, the standby never exits
-           recovery.
-          </para>
-         </listitem>
-        </varlistentry>
-       </variablelist>
-      </listitem>
-      <listitem>
-       <para>
-        Start the <productname>PostgreSQL</> server on the standby. The standby
-        server will go into recovery mode and proceeds to receive WAL records
-        from the primary and apply them continuously.
-       </para>
-      </listitem>
-      </orderedlist>
-     </para>
-    </sect3>
-    <sect3 id="streaming-replication-authentication">
-     <title>Authentication</title>
-     <para>
-      It's very important that the access privilege for replication are set
-      properly so that only trusted users can read the WAL stream, because it's
-      easy to extract serious information from it.
-     </para>
-     <para>
-      Only superuser is allowed to connect to the primary as the replication
-      standby. So a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
-      privileges needs to be created in the primary.
-     </para>
-     <para>
-      Client authentication for replication is controlled by the
-      <filename>pg_hba.conf</> record specifying <literal>replication</> in the
-      <replaceable>database</> field. For example, if the standby is running on
-      host IP <literal>192.168.1.100</> and the superuser's name for replication
-      is <literal>foo</>, the administrator can add the following line to the
-      <filename>pg_hba.conf</> file on the primary.
- 
- <programlisting>
- # Allow the user "foo" from host 192.168.1.100 to connect to the primary
- # as a replication standby if the user's password is correctly supplied.
- #
- # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
- host    replication     foo             192.168.1.100/32        md5
- </programlisting>
-     </para>
-     <para>
-      The host name and port number of the primary, user name to connect as,
-      and password are specified in the <filename>recovery.conf</> file or
-      the corresponding environment variable on the standby.
-      For example, if the primary is running on host IP <literal>192.168.1.50</>,
-      port <literal>5432</literal>, the superuser's name for replication is
-      <literal>foo</>, and the password is <literal>foopass</>, the administrator
-      can add the following line to the <filename>recovery.conf</> file on the
-      standby.
- 
- <programlisting>
- # The standby connects to the primary that is running on host 192.168.1.50
- # and port 5432 as the user "foo" whose password is "foopass".
- primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
- </programlisting>
-     </para>
-    </sect3>
-   </sect2>
- 
-   <sect2 id="backup-incremental-updated">
-    <title>Incrementally Updated Backups</title>
- 
-   <indexterm zone="backup">
-    <primary>incrementally updated backups</primary>
-   </indexterm>
- 
-   <indexterm zone="backup">
-    <primary>change accumulation</primary>
-   </indexterm>
- 
-    <para>
-     In a warm standby configuration, it is possible to offload the expense of
-     taking periodic base backups from the primary server; instead base backups
-     can be made by backing
-     up a standby server's files.  This concept is generally known as
-     incrementally updated backups, log change accumulation, or more simply,
-     change accumulation.
-    </para>
- 
-    <para>
-     If we take a file system backup of the standby server's data
-     directory while it is processing
-     logs shipped from the primary, we will be able to reload that backup and
-     restart the standby's recovery process from the last restart point.
-     We no longer need to keep WAL files from before the standby's restart point.
-     If we need to recover, it will be faster to recover from the incrementally
-     updated backup than from the original base backup.
-    </para>
- 
-    <para>
-     Since the standby server is not <quote>live</>, it is not possible to
-     use <function>pg_start_backup()</> and <function>pg_stop_backup()</>
-     to manage the backup process; it will be up to you to determine how
-     far back you need to keep WAL segment files to have a recoverable
-     backup.  You can do this by running <application>pg_controldata</>
-     on the standby server to inspect the control file and determine the
-     current checkpoint WAL location, or by using the
-     <varname>log_checkpoints</> option to print values to the standby's
-     server log.
-    </para>
-   </sect2>
-  </sect1>
- 
-  <sect1 id="hot-standby">
-   <title>Hot Standby</title>
- 
-   <indexterm zone="backup">
-    <primary>Hot Standby</primary>
-   </indexterm>
- 
-    <para>
-     Hot Standby is the term used to describe the ability to connect to
-     the server and run queries while the server is in archive recovery. This
-     is useful for both log shipping replication and for restoring a backup
-     to an exact state with great precision.
-     The term Hot Standby also refers to the ability of the server to move
-     from recovery through to normal running while users continue running
-     queries and/or continue their connections.
-    </para>
- 
-    <para>
-     Running queries in recovery is in many ways the same as normal running
-     though there are a large number of usage and administrative points
-     to note.
-    </para>
- 
-   <sect2 id="hot-standby-users">
-    <title>User's Overview</title>
- 
-    <para>
-     Users can connect to the database while the server is in recovery
-     and perform read-only queries. Read-only access to catalogs and views
-     will also occur as normal.
-    </para>
- 
-    <para>
-     The data on the standby takes some time to arrive from the primary server
-     so there will be a measurable delay between primary and standby. Running the
-     same query nearly simultaneously on both primary and standby might therefore
-     return differing results. We say that data on the standby is eventually
-     consistent with the primary.
-     Queries executed on the standby will be correct with regard to the transactions
-     that had been recovered at the start of the query, or start of first statement,
-     in the case of serializable transactions. In comparison with the primary,
-     the standby returns query results that could have been obtained on the primary
-     at some exact moment in the past.
-    </para>
- 
-    <para>
-     When a transaction is started in recovery, the parameter
-     <varname>transaction_read_only</> will be forced to be true, regardless of the
-     <varname>default_transaction_read_only</> setting in <filename>postgresql.conf</>.
-     It can't be manually set to false either. As a result, all transactions
-     started during recovery will be limited to read-only actions only. In all
-     other ways, connected sessions will appear identical to sessions
-     initiated during normal processing mode. There are no special commands
-     required to initiate a connection at this time, so all interfaces
-     work normally without change. After recovery finishes, the session
-     will allow normal read-write transactions at the start of the next
-     transaction, if these are requested.
-    </para>
- 
-    <para>
-     Read-only here means "no writes to the permanent database tables".
-     There are no problems with queries that make use of transient sort and
-     work files.
-    </para>
- 
-    <para>
-     The following actions are allowed
- 
-     <itemizedlist>
-      <listitem>
-       <para>
-        Query access - SELECT, COPY TO including views and SELECT RULEs
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Cursor commands - DECLARE, FETCH, CLOSE,
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Parameters - SHOW, SET, RESET
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Transaction management commands
-         <itemizedlist>
-          <listitem>
-           <para>
-            BEGIN, END, ABORT, START TRANSACTION
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-            SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-            EXCEPTION blocks and other internal subtransactions
-           </para>
-          </listitem>
-         </itemizedlist>
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        LOCK TABLE, though only when explicitly in one of these modes:
-        ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Plugins and extensions - LOAD
-       </para>
-      </listitem>
-     </itemizedlist>
-    </para>
- 
-    <para>
-     These actions produce error messages
- 
-     <itemizedlist>
-      <listitem>
-       <para>
-        Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
-        Note that there are no allowed actions that result in a trigger
-        being executed during recovery.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
-        This also applies to temporary tables currently because currently their
-        definition causes writes to catalog tables.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        SELECT ... FOR SHARE | UPDATE which cause row locks to be written
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        RULEs on SELECT statements that generate DML commands.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
-        LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Transaction management commands that explicitly set non-read only state
-         <itemizedlist>
-          <listitem>
-           <para>
-             BEGIN READ WRITE,
-             START TRANSACTION READ WRITE
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-             SET TRANSACTION READ WRITE,
-             SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-            SET transaction_read_only = off
-           </para>
-          </listitem>
-         </itemizedlist>
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
-        ROLLBACK PREPARED because even read-only transactions need to write
-        WAL in the prepare phase (the first phase of two phase commit).
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        sequence update - nextval()
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-        LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
-       </para>
-      </listitem>
-     </itemizedlist>
-    </para>
- 
-    <para>
-     Note that current behaviour of read only transactions when not in
-     recovery is to allow the last two actions, so there are small and
-     subtle differences in behaviour between read-only transactions
-     run on standby and during normal running.
-     It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
-     temporary tables may be lifted in a future release, if their internal
-     implementation is altered to make this possible.
-    </para>
- 
-    <para>
-     If failover or switchover occurs the database will switch to normal
-     processing mode. Sessions will remain connected while the server
-     changes mode. Current transactions will continue, though will remain
-     read-only. After recovery is complete, it will be possible to initiate
-     read-write transactions.
-    </para>
- 
-    <para>
-     Users will be able to tell whether their session is read-only by
-     issuing SHOW transaction_read_only.  In addition a set of
-     functions <xref linkend="functions-recovery-info-table"> allow users to
-     access information about Hot Standby. These allow you to write
-     functions that are aware of the current state of the database. These
-     can be used to monitor the progress of recovery, or to allow you to
-     write complex programs that restore the database to particular states.
-    </para>
- 
-    <para>
-     In recovery, transactions will not be permitted to take any table lock
-     higher than RowExclusiveLock. In addition, transactions may never assign
-     a TransactionId and may never write WAL.
-     Any <command>LOCK TABLE</> command that runs on the standby and requests
-     a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
-    </para>
- 
-    <para>
-     In general queries will not experience lock conflicts with the database
-     changes made by recovery. This is because recovery follows normal
-     concurrency control mechanisms, known as <acronym>MVCC</>. There are
-     some types of change that will cause conflicts, covered in the following
-     section.
-    </para>
-   </sect2>
- 
-   <sect2 id="hot-standby-conflict">
-    <title>Handling query conflicts</title>
- 
-    <para>
-     The primary and standby nodes are in many ways loosely connected. Actions
-     on the primary will have an effect on the standby. As a result, there is
-     potential for negative interactions or conflicts between them. The easiest
-     conflict to understand is performance: if a huge data load is taking place
-     on the primary then this will generate a similar stream of WAL records on the
-     standby, so standby queries may contend for system resources, such as I/O.
-    </para>
- 
-    <para>
-     There are also additional types of conflict that can occur with Hot Standby.
-     These conflicts are <emphasis>hard conflicts</> in the sense that we may
-     need to cancel queries and in some cases disconnect sessions to resolve them.
-     The user is provided with a number of optional ways to handle these
-     conflicts, though we must first understand the possible reasons behind a conflict.
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          Access Exclusive Locks from primary node, including both explicit
-          LOCK commands and various kinds of DDL action
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Dropping tablespaces on the primary while standby queries are using
-          those tablespaces for temporary work files (work_mem overflow)
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Dropping databases on the primary while users are connected to that
-          database on the standby.
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Waiting to acquire buffer cleanup locks
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Early cleanup of data still visible to the current query's snapshot
-         </para>
-        </listitem>
-       </itemizedlist>
-    </para>
- 
-    <para>
-     Some WAL redo actions will be for DDL actions. These DDL actions are
-     repeating actions that have already committed on the primary node, so
-     they must not fail on the standby node. These DDL locks take priority
-     and will automatically *cancel* any read-only transactions that get in
-     their way, after a grace period. This is similar to the possibility of
-     being canceled by the deadlock detector, but in this case the standby
-     process always wins, since the replayed actions must not fail. This
-     also ensures that replication doesn't fall behind while we wait for a
-     query to complete. Again, we assume that the standby is there for high
-     availability purposes primarily.
-    </para>
- 
-    <para>
-     An example of the above would be an Administrator on Primary server
-     runs a <command>DROP TABLE</> on a table that's currently being queried
-     in the standby server.
-     Clearly the query cannot continue if we let the <command>DROP TABLE</>
-     proceed. If this situation occurred on the primary, the <command>DROP TABLE</>
-     would wait until the query has finished. When the query is on the standby
-     and the <command>DROP TABLE</> is on the primary, the primary doesn't have
-     information about which queries are running on the standby and so the query
-     does not wait on the primary. The WAL change records come through to the
-     standby while the standby query is still running, causing a conflict.
-    </para>
- 
-    <para>
-     The most common reason for conflict between standby queries and WAL redo is
-     "early cleanup". Normally, <productname>PostgreSQL</> allows cleanup of old
-     row versions when there are no users who may need to see them to ensure correct
-     visibility of data (the heart of MVCC). If there is a standby query that has
-     been running for longer than any query on the primary then it is possible
-     for old row versions to be removed by either a vacuum or HOT. This will
-     then generate WAL records that, if applied, would remove data on the
-     standby that might *potentially* be required by the standby query.
-     In more technical language, the primary's xmin horizon is later than
-     the standby's xmin horizon, allowing dead rows to be removed.
-    </para>
- 
-    <para>
-     Experienced users should note that both row version cleanup and row version
-     freezing will potentially conflict with recovery queries. Running a
-     manual <command>VACUUM FREEZE</> is likely to cause conflicts even on tables
-     with no updated or deleted rows.
-    </para>
- 
-    <para>
-     We have a number of choices for resolving query conflicts.  The default
-     is that we wait and hope the query completes. The server will wait
-     automatically until the lag between primary and standby is at most
-     <varname>max_standby_delay</> seconds. Once that grace period expires,
-     we take one of the following actions:
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          If the conflict is caused by a lock, we cancel the conflicting standby
-          transaction immediately. If the transaction is idle-in-transaction
-          then currently we abort the session instead, though this may change
-          in the future.
-         </para>
-        </listitem>
- 
-        <listitem>
-         <para>
-          If the conflict is caused by cleanup records we tell the standby query
-          that a conflict has occurred and that it must cancel itself to avoid the
-          risk that it silently fails to read relevant data because
-          that data has been removed. (This is regrettably very similar to the
-          much feared and iconic error message "snapshot too old"). Some cleanup
-          records only cause conflict with older queries, though some types of
-          cleanup record affect all queries.
-         </para>
- 
-         <para>
-          If cancellation does occur, the query and/or transaction can always
-          be re-executed. The error is dynamic and will not necessarily occur
-          the same way if the query is executed again.
-         </para>
-        </listitem>
-       </itemizedlist>
-    </para>
- 
-    <para>
-     <varname>max_standby_delay</> is set in <filename>postgresql.conf</>.
-     The parameter applies to the server as a whole so if the delay is all used
-     up by a single query then there may be little or no waiting for queries that
-     follow immediately, though they will have benefited equally from the initial
-     waiting period. The server may take time to catch up again before the grace
-     period is available again, though if there is a heavy and constant stream
-     of conflicts it may seldom catch up fully.
-    </para>
- 
-    <para>
-     Users should be clear that tables that are regularly and heavily updated on
-     primary server will quickly cause cancellation of longer running queries on
-     the standby. In those cases <varname>max_standby_delay</> can be
-     considered somewhat but not exactly the same as setting
-     <varname>statement_timeout</>.
-     </para>
- 
-    <para>
-     Other remedial actions exist if the number of cancellations is unacceptable.
-     The first option is to connect to primary server and keep a query active
-     for as long as we need to run queries on the standby. This guarantees that
-     a WAL cleanup record is never generated and we don't ever get query
-     conflicts as described above. This could be done using contrib/dblink
-     and pg_sleep(), or via other mechanisms. If you do this, you should note
-     that this will delay cleanup of dead rows by vacuum or HOT and many
-     people may find this undesirable. However, we should remember that
-     primary and standby nodes are linked via the WAL, so this situation is no
-     different to the case where we ran the query on the primary node itself
-     except we have the benefit of off-loading the execution onto the standby.
-    </para>
- 
-    <para>
-     It is also possible to set <varname>vacuum_defer_cleanup_age</> on the primary
-     to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
-     more time for queries to execute before they are cancelled on the standby,
-     without the need for setting a high <varname>max_standby_delay</>.
-    </para>
- 
-    <para>
-     Three-way deadlocks are possible between AccessExclusiveLocks arriving from
-     the primary, cleanup WAL records that require buffer cleanup locks and
-     user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
-     are resolved by time-out when we exceed <varname>max_standby_delay</>.
-    </para>
- 
-    <para>
-     Dropping tablespaces or databases is discussed in the administrator's
-     section since they are not typical user situations.
-    </para>
-   </sect2>
- 
-   <sect2 id="hot-standby-admin">
-    <title>Administrator's Overview</title>
- 
-    <para>
-     If there is a <filename>recovery.conf</> file present the server will start
-     in Hot Standby mode by default, though <varname>recovery_connections</> can
-     be disabled via <filename>postgresql.conf</>, if required. The server may take
-     some time to enable recovery connections since the server must first complete
-     sufficient recovery to provide a consistent state against which queries
-     can run before enabling read only connections. Look for these messages
-     in the server logs
- 
- <programlisting>
- LOG:  initializing recovery connections
- 
- ... then some time later ...
- 
- LOG:  consistent recovery state reached
- LOG:  database system is ready to accept read only connections
- </programlisting>
- 
-     Consistency information is recorded once per checkpoint on the primary, as long
-     as <varname>recovery_connections</> is enabled (on the primary). If this parameter
-     is disabled, it will not be possible to enable recovery connections on the standby.
-     The consistent state can also be delayed in the presence of both of these conditions
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          a write transaction has more than 64 subtransactions
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          very long-lived write transactions
-         </para>
-        </listitem>
-       </itemizedlist>
- 
-     If you are running file-based log shipping ("warm standby"), you may need
-     to wait until the next WAL file arrives, which could be as long as the
-     <varname>archive_timeout</> setting on the primary.
-    </para>
- 
-    <para>
-     The setting of some parameters on the standby will need reconfiguration
-     if they have been changed on the primary. The value on the standby must
-     be equal to or greater than the value on the primary. If these parameters
-     are not set high enough then the standby will not be able to track work
-     correctly from recovering transactions. If these values are set too low the
-     the server will halt. Higher values can then be supplied and the server
-     restarted to begin recovery again.
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          <varname>max_connections</>
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          <varname>max_prepared_transactions</>
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          <varname>max_locks_per_transaction</>
-         </para>
-        </listitem>
-       </itemizedlist>
-    </para>
- 
-    <para>
-     It is important that the administrator consider the appropriate setting
-     of <varname>max_standby_delay</>, set in <filename>postgresql.conf</>.
-     There is no optimal setting and should be set according to business
-     priorities. For example if the server is primarily tasked as a High
-     Availability server, then you may wish to lower
-     <varname>max_standby_delay</> or even set it to zero, though that is a
-     very aggressive setting. If the standby server is tasked as an additional
-     server for decision support queries then it may be acceptable to set this
-     to a value of many hours (in seconds).
-    </para>
- 
-    <para>
-     Transaction status "hint bits" written on primary are not WAL-logged,
-     so data on standby will likely re-write the hints again on the standby.
-     Thus the main database blocks will produce write I/Os even though
-     all users are read-only; no changes have occurred to the data values
-     themselves.  Users will be able to write large sort temp files and
-     re-generate relcache info files, so there is no part of the database
-     that is truly read-only during hot standby mode. There is no restriction
-     on the use of set returning functions, or other users of tuplestore/tuplesort
-     code. Note also that writes to remote databases will still be possible,
-     even though the transaction is read-only locally.
-    </para>
- 
-    <para>
-     The following types of administrator command are not accepted
-     during recovery mode
- 
-       <itemizedlist>
-        <listitem>
-         <para>
-          Data Definition Language (DDL) - e.g. CREATE INDEX
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Privilege and Ownership - GRANT, REVOKE, REASSIGN
-         </para>
-        </listitem>
-        <listitem>
-         <para>
-          Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
-         </para>
-        </listitem>
-       </itemizedlist>
-    </para>
- 
-    <para>
-     Note again that some of these commands are actually allowed during
-     "read only" mode transactions on the primary.
-    </para>
- 
-    <para>
-     As a result, you cannot create additional indexes that exist solely
-     on the standby, nor can statistics that exist solely on the standby.
-     If these administrator commands are needed they should be executed
-     on the primary so that the changes will propagate through to the
-     standby.
-    </para>
- 
-    <para>
-     <function>pg_cancel_backend()</> will work on user backends, but not the
-     Startup process, which performs recovery. pg_stat_activity does not
-     show an entry for the Startup process, nor do recovering transactions
-     show as active. As a result, pg_prepared_xacts is always empty during
-     recovery. If you wish to resolve in-doubt prepared transactions
-     then look at pg_prepared_xacts on the primary and issue commands to
-     resolve those transactions there.
-    </para>
- 
-    <para>
-     pg_locks will show locks held by backends as normal. pg_locks also shows
-     a virtual transaction managed by the Startup process that owns all
-     AccessExclusiveLocks held by transactions being replayed by recovery.
-     Note that Startup process does not acquire locks to
-     make database changes and thus locks other than AccessExclusiveLocks
-     do not show in pg_locks for the Startup process, they are just presumed
-     to exist.
-    </para>
- 
-    <para>
-     <productname>check_pgsql</> will work, but it is very simple.
-     <productname>check_postgres</> will also work, though many some actions
-     could give different or confusing results.
-     e.g. last vacuum time will not be maintained for example, since no
-     vacuum occurs on the standby (though vacuums running on the primary do
-     send their changes to the standby).
-    </para>
- 
-    <para>
-     WAL file control commands will not work during recovery
-     e.g. <function>pg_start_backup</>, <function>pg_switch_xlog</> etc..
-    </para>
- 
-    <para>
-     Dynamically loadable modules work, including pg_stat_statements.
-    </para>
- 
-    <para>
-     Advisory locks work normally in recovery, including deadlock detection.
-     Note that advisory locks are never WAL logged, so it is not possible for
-     an advisory lock on either the primary or the standby to conflict with WAL
-     replay. Nor is it possible to acquire an advisory lock on the primary
-     and have it initiate a similar advisory lock on the standby. Advisory
-     locks relate only to a single server on which they are acquired.
-    </para>
- 
-    <para>
-     Trigger-based replication systems such as <productname>Slony</>,
-     <productname>Londiste</> and <productname>Bucardo</> won't run on the
-     standby at all, though they will run happily on the primary server as
-     long as the changes are not sent to standby servers to be applied.
-     WAL replay is not trigger-based so you cannot relay from the
-     standby to any system that requires additional database writes or
-     relies on the use of triggers.
-    </para>
- 
-    <para>
-     New oids cannot be assigned, though some <acronym>UUID</> generators may still
-     work as long as they do not rely on writing new status to the database.
-    </para>
- 
-    <para>
-     Currently, temp table creation is not allowed during read only
-     transactions, so in some cases existing scripts will not run correctly.
-     It is possible we may relax that restriction in a later release. This is
-     both a SQL Standard compliance issue and a technical issue.
-    </para>
- 
-    <para>
-     <command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
-     Some standby users may be actively using the tablespace via their
-     <varname>temp_tablespaces</> parameter. If there are temp files in the
-     tablespace we currently cancel all active queries to ensure that temp
-     files are removed, so that we can remove the tablespace and continue with
-     WAL replay.
-    </para>
- 
-    <para>
-     Running <command>DROP DATABASE</>, <command>ALTER DATABASE ... SET TABLESPACE</>,
-     or <command>ALTER DATABASE ... RENAME</> on primary will generate a log message
-     that will cause all users connected to that database on the standby to be
-     forcibly disconnected. This action occurs immediately, whatever the setting of
-     <varname>max_standby_delay</>.
-    </para>
- 
-    <para>
-     In normal running, if you issue <command>DROP USER</> or <command>DROP ROLE</>
-     for a role with login capability while that user is still connected then
-     nothing happens to the connected user - they remain connected. The user cannot
-     reconnect however. This behaviour applies in recovery also, so a
-     <command>DROP USER</> on the primary does not disconnect that user on the standby.
-    </para>
- 
-    <para>
-     Stats collector is active during recovery. All scans, reads, blocks,
-     index usage etc will all be recorded normally on the standby. Replayed
-     actions will not duplicate their effects on primary, so replaying an
-     insert will not increment the Inserts column of pg_stat_user_tables.
-     The stats file is deleted at start of recovery, so stats from primary
-     and standby will differ; this is considered a feature not a bug.
-    </para>
- 
-    <para>
-     Autovacuum is not active during recovery, though will start normally
-     at the end of recovery.
-    </para>
- 
-    <para>
-     Background writer is active during recovery and will perform
-     restartpoints (similar to checkpoints on primary) and normal block
-     cleaning activities. The <command>CHECKPOINT</> command is accepted during recovery,
-     though performs a restartpoint rather than a new checkpoint.
-    </para>
-   </sect2>
- 
-   <sect2 id="hot-standby-parameters">
-    <title>Hot Standby Parameter Reference</title>
- 
-    <para>
-     Various parameters have been mentioned above in the <xref linkend="hot-standby-admin">
-     and <xref linkend="hot-standby-conflict"> sections.
-    </para>
- 
-    <para>
-     On the primary, parameters <varname>recovery_connections</> and
-     <varname>vacuum_defer_cleanup_age</> can be used to enable and control the
-     primary server to assist the successful configuration of Hot Standby servers.
-     <varname>max_standby_delay</> has no effect if set on the primary.
-    </para>
- 
-    <para>
-     On the standby, parameters <varname>recovery_connections</> and
-     <varname>max_standby_delay</> can be used to enable and control Hot Standby.
-     standby server to assist the successful configuration of Hot Standby servers.
-     <varname>vacuum_defer_cleanup_age</> has no effect during recovery.
-    </para>
-   </sect2>
- 
-   <sect2 id="hot-standby-caveats">
-    <title>Caveats</title>
- 
-    <para>
-     At this writing, there are several limitations of Hot Standby.
-     These can and probably will be fixed in future releases:
- 
-   <itemizedlist>
-    <listitem>
-     <para>
-      Operations on hash indexes are not presently WAL-logged, so
-      replay will not update these indexes.  Hash indexes will not be
-      used for query plans during recovery.
-     </para>
-    </listitem>
-    <listitem>
-     <para>
-      Full knowledge of running transactions is required before snapshots
-      may be taken. Transactions that take use large numbers of subtransactions
-      (currently greater than 64) will delay the start of read only
-      connections until the completion of the longest running write transaction.
-      If this situation occurs explanatory messages will be sent to server log.
-     </para>
-    </listitem>
-    <listitem>
-     <para>
-      Valid starting points for recovery connections are generated at each
-      checkpoint on the master. If the standby is shutdown while the master
-      is in a shutdown state it may not be possible to re-enter Hot Standby
-      until the primary is started up so that it generates further starting
-      points in the WAL logs. This is not considered a serious issue
-      because the standby is usually switched into the primary role while
-      the first node is taken down.
-     </para>
-    </listitem>
-    <listitem>
-     <para>
-      At the end of recovery, AccessExclusiveLocks held by prepared transactions
-      will require twice the normal number of lock table entries. If you plan
-      on running either a large number of concurrent prepared transactions
-      that normally take AccessExclusiveLocks, or you plan on having one
-      large transaction that takes many AccessExclusiveLocks then you are
-      advised to select a larger value of <varname>max_locks_per_transaction</>,
-      up to, but never more than twice the value of the parameter setting on
-      the primary server in rare extremes. You need not consider this at all if
-      your setting of <varname>max_prepared_transactions</> is <literal>0</>.
-     </para>
-    </listitem>
-   </itemizedlist>
- 
-    </para>
-   </sect2>
- 
-  </sect1>
- 
   <sect1 id="migration">
    <title>Migration Between Releases</title>
  
--- 1492,1497 ----
*** a/doc/src/sgml/high-availability.sgml
--- b/doc/src/sgml/high-availability.sgml
***************
*** 79,84 ****
--- 79,87 ----
    also available.
   </para>
  
+  <sect1 id="different-replication-solutions">
+  <title>Comparison of different solutions</title>
+ 
   <variablelist>
  
    <varlistentry>
***************
*** 450,453 **** protocol to make nodes agree on a serializable transactional order.
--- 453,1840 ----
  
   </variablelist>
  
+  </sect1>
+ 
+  <sect1 id="warm-standby">
+   <title>File-based Log Shipping</title>
+ 
+   <indexterm zone="high-availability">
+    <primary>warm standby</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>PITR standby</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>standby server</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>log shipping</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>witness server</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>STONITH</primary>
+   </indexterm>
+ 
+   <para>
+    Continuous archiving can be used to create a <firstterm>high
+    availability</> (HA) cluster configuration with one or more
+    <firstterm>standby servers</> ready to take over operations if the
+    primary server fails. This capability is widely referred to as
+    <firstterm>warm standby</> or <firstterm>log shipping</>.
+   </para>
+ 
+   <para>
+    The primary and standby server work together to provide this capability,
+    though the servers are only loosely coupled. The primary server operates
+    in continuous archiving mode, while each standby server operates in
+    continuous recovery mode, reading the WAL files from the primary. No
+    changes to the database tables are required to enable this capability,
+    so it offers low administration overhead compared to some other
+    replication approaches. This configuration also has relatively low
+    performance impact on the primary server.
+   </para>
+ 
+   <para>
+    Directly moving WAL records from one database server to another
+    is typically described as log shipping. <productname>PostgreSQL</>
+    implements file-based log shipping, which means that WAL records are
+    transferred one file (WAL segment) at a time. WAL files (16MB) can be
+    shipped easily and cheaply over any distance, whether it be to an
+    adjacent system, another system at the same site, or another system on
+    the far side of the globe. The bandwidth required for this technique
+    varies according to the transaction rate of the primary server.
+    Record-based log shipping is also possible with custom-developed
+    procedures, as discussed in <xref linkend="warm-standby-record">.
+   </para>
+ 
+   <para>
+    It should be noted that the log shipping is asynchronous, i.e., the WAL
+    records are shipped after transaction commit. As a result there is a
+    window for data loss should the primary server suffer a catastrophic
+    failure: transactions not yet shipped will be lost.  The length of the
+    window of data loss can be limited by use of the
+    <varname>archive_timeout</varname> parameter, which can be set as low
+    as a few seconds if required.  However such a low setting will
+    substantially increase the bandwidth required for file shipping.
+    If you need a window of less than a minute or so, it's probably better
+    to consider record-based log shipping.
+   </para>
+ 
+   <para>
+    The standby server is not available for access, since it is continually
+    performing recovery processing. Recovery performance is sufficiently
+    good that the standby will typically be only moments away from full
+    availability once it has been activated. As a result, we refer to this
+    capability as a warm standby configuration that offers high
+    availability. Restoring a server from an archived base backup and
+    rollforward will take considerably longer, so that technique only
+    offers a solution for disaster recovery, not high availability.
+   </para>
+ 
+   <sect2 id="warm-standby-planning">
+    <title>Planning</title>
+ 
+    <para>
+     It is usually wise to create the primary and standby servers
+     so that they are as similar as possible, at least from the
+     perspective of the database server.  In particular, the path names
+     associated with tablespaces will be passed across unmodified, so both
+     primary and standby servers must have the same mount paths for
+     tablespaces if that feature is used.  Keep in mind that if
+     <xref linkend="sql-createtablespace" endterm="sql-createtablespace-title">
+     is executed on the primary, any new mount point needed for it must
+     be created on the primary and all standby servers before the command
+     is executed. Hardware need not be exactly the same, but experience shows
+     that maintaining two identical systems is easier than maintaining two
+     dissimilar ones over the lifetime of the application and system.
+     In any case the hardware architecture must be the same &mdash; shipping
+     from, say, a 32-bit to a 64-bit system will not work.
+    </para>
+ 
+    <para>
+     In general, log shipping between servers running different major
+     <productname>PostgreSQL</> release
+     levels is not possible. It is the policy of the PostgreSQL Global
+     Development Group not to make changes to disk formats during minor release
+     upgrades, so it is likely that running different minor release levels
+     on primary and standby servers will work successfully. However, no
+     formal support for that is offered and you are advised to keep primary
+     and standby servers at the same release level as much as possible.
+     When updating to a new minor release, the safest policy is to update
+     the standby servers first &mdash; a new minor release is more likely
+     to be able to read WAL files from a previous minor release than vice
+     versa.
+    </para>
+ 
+    <para>
+     There is no special mode required to enable a standby server. The
+     operations that occur on both primary and standby servers are
+     normal continuous archiving and recovery tasks. The only point of
+     contact between the two database servers is the archive of WAL files
+     that both share: primary writing to the archive, standby reading from
+     the archive. Care must be taken to ensure that WAL archives from separate
+     primary servers do not become mixed together or confused. The archive
+     need not be large if it is only required for standby operation.
+    </para>
+ 
+    <para>
+     The magic that makes the two loosely coupled servers work together is
+     simply a <varname>restore_command</> used on the standby that,
+     when asked for the next WAL file, waits for it to become available from
+     the primary. The <varname>restore_command</> is specified in the
+     <filename>recovery.conf</> file on the standby server. Normal recovery
+     processing would request a file from the WAL archive, reporting failure
+     if the file was unavailable.  For standby processing it is normal for
+     the next WAL file to be unavailable, so we must be patient and wait for
+     it to appear. For files ending in <literal>.backup</> or
+     <literal>.history</> there is no need to wait, and a non-zero return
+     code must be returned. A waiting <varname>restore_command</> can be
+     written as a custom script that loops after polling for the existence of
+     the next WAL file. There must also be some way to trigger failover, which
+     should interrupt the <varname>restore_command</>, break the loop and
+     return a file-not-found error to the standby server. This ends recovery
+     and the standby will then come up as a normal server.
+    </para>
+ 
+    <para>
+     Pseudocode for a suitable <varname>restore_command</> is:
+ <programlisting>
+ triggered = false;
+ while (!NextWALFileReady() &amp;&amp; !triggered)
+ {
+     sleep(100000L);         /* wait for ~0.1 sec */
+     if (CheckForExternalTrigger())
+         triggered = true;
+ }
+ if (!triggered)
+         CopyWALFileForRecovery();
+ </programlisting>
+    </para>
+ 
+    <para>
+     A working example of a waiting <varname>restore_command</> is provided
+     as a <filename>contrib</> module named <application>pg_standby</>. It
+     should be used as a reference on how to correctly implement the logic
+     described above. It can also be extended as needed to support specific
+     configurations and environments.
+    </para>
+ 
+    <para>
+     <productname>PostgreSQL</productname> does not provide the system
+     software required to identify a failure on the primary and notify
+     the standby database server.  Many such tools exist and are well
+     integrated with the operating system facilities required for
+     successful failover, such as IP address migration.
+    </para>
+ 
+    <para>
+     The method for triggering failover is an important part of planning
+     and design. One potential option is the <varname>restore_command</>
+     command.  It is executed once for each WAL file, but the process
+     running the <varname>restore_command</> is created and dies for
+     each file, so there is no daemon or server process, and we cannot
+     use signals or a signal handler. Therefore, the
+     <varname>restore_command</> is not suitable to trigger failover.
+     It is possible to use a simple timeout facility, especially if
+     used in conjunction with a known <varname>archive_timeout</>
+     setting on the primary. However, this is somewhat error prone
+     since a network problem or busy primary server might be sufficient
+     to initiate failover. A notification mechanism such as the explicit
+     creation of a trigger file is ideal, if this can be arranged.
+    </para>
+ 
+    <para>
+     The size of the WAL archive can be minimized by using the <literal>%r</>
+     option of the <varname>restore_command</>. This option specifies the
+     last archive file name that needs to be kept to allow the recovery to
+     restart correctly. This can be used to truncate the archive once
+     files are no longer required, assuming the archive is writable from the
+     standby server.
+    </para>
+   </sect2>
+ 
+   <sect2 id="warm-standby-config">
+    <title>Implementation</title>
+ 
+    <para>
+     The short procedure for configuring a standby server is as follows. For
+     full details of each step, refer to previous sections as noted.
+     <orderedlist>
+      <listitem>
+       <para>
+        Set up primary and standby systems as nearly identical as
+        possible, including two identical copies of
+        <productname>PostgreSQL</> at the same release level.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Set up continuous archiving from the primary to a WAL archive
+        directory on the standby server. Ensure that
+        <xref linkend="guc-archive-mode">,
+        <xref linkend="guc-archive-command"> and
+        <xref linkend="guc-archive-timeout">
+        are set appropriately on the primary
+        (see <xref linkend="backup-archiving-wal">).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Make a base backup of the primary server (see <xref
+        linkend="backup-base-backup">), and load this data onto the standby.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Begin recovery on the standby server from the local WAL
+        archive, using a <filename>recovery.conf</> that specifies a
+        <varname>restore_command</> that waits as described
+        previously (see <xref linkend="backup-pitr-recovery">).
+       </para>
+      </listitem>
+     </orderedlist>
+    </para>
+ 
+    <para>
+     Recovery treats the WAL archive as read-only, so once a WAL file has
+     been copied to the standby system it can be copied to tape at the same
+     time as it is being read by the standby database server.
+     Thus, running a standby server for high availability can be performed at
+     the same time as files are stored for longer term disaster recovery
+     purposes.
+    </para>
+ 
+    <para>
+     For testing purposes, it is possible to run both primary and standby
+     servers on the same system. This does not provide any worthwhile
+     improvement in server robustness, nor would it be described as HA.
+    </para>
+   </sect2>
+ 
+   <sect2 id="warm-standby-record">
+    <title>Record-based Log Shipping</title>
+ 
+    <para>
+     <productname>PostgreSQL</productname> directly supports file-based
+     log shipping as described above. It is also possible to implement
+     record-based log shipping, though this requires custom development.
+    </para>
+ 
+    <para>
+     An external program can call the <function>pg_xlogfile_name_offset()</>
+     function (see <xref linkend="functions-admin">)
+     to find out the file name and the exact byte offset within it of
+     the current end of WAL.  It can then access the WAL file directly
+     and copy the data from the last known end of WAL through the current end
+     over to the standby servers.  With this approach, the window for data
+     loss is the polling cycle time of the copying program, which can be very
+     small, and there is no wasted bandwidth from forcing partially-used
+     segment files to be archived.  Note that the standby servers'
+     <varname>restore_command</> scripts can only deal with whole WAL files,
+     so the incrementally copied data is not ordinarily made available to
+     the standby servers.  It is of use only when the primary dies &mdash;
+     then the last partial WAL file is fed to the standby before allowing
+     it to come up.  The correct implementation of this process requires
+     cooperation of the <varname>restore_command</> script with the data
+     copying program.
+    </para>
+ 
+    <para>
+     Starting with <productname>PostgreSQL</> version 8.5, you can use
+     streaming replication (see <xref linkend="streaming-replication">) to
+     achieve the same with less effort.
+    </para>
+   </sect2>
+  </sect1>
+ 
+   <sect1 id="streaming-replication">
+    <title>Streaming Replication</title>
+ 
+    <indexterm zone="high-availability">
+     <primary>Streaming Replication</primary>
+    </indexterm>
+ 
+    <para>
+     <productname>PostgreSQL</> includes a simple streaming replication
+     mechanism, which lets the standby server to stay more up-to-date than
+     file-based replication allows. The standby connects to the primary
+     and the primary starts streaming WAL records from where the standby
+     left off, and continues streaming them as they are generated, without
+     waiting for the WAL file to be filled. So with streaming replication,
+     <varname>archive_timeout</> does not need to be configured.
+    </para>
+ 
+    <para>
+     Streaming replication relies on file-based continuous archiving for
+     making the base backup and for allowing a standby to catch up if it's
+     disconnected from the primary for long enough for the primary to
+     delete old WAL files still required by the standby.
+    </para>
+ 
+    <sect2 id="streaming-replication-setup">
+     <title>Setup</title>
+     <para>
+      The short procedure for configuring streaming replication is as follows.
+      For full details of each step, refer to other sections as noted.
+      <orderedlist>
+       <listitem>
+        <para>
+         Set up primary and standby systems as near identically as possible,
+         including two identical copies of <productname>PostgreSQL</> at the
+         same release level.
+        </para>
+       </listitem>
+      <listitem>
+       <para>
+        Set up continuous archiving from the primary to a WAL archive located
+        in a directory on the standby server. Ensure that
+        <xref linkend="guc-archive-mode">,
+        <xref linkend="guc-archive-command"> and
+        <xref linkend="guc-archive-timeout">
+        are set appropriately on the primary
+        (see <xref linkend="backup-archiving-wal">).
+       </para>
+      </listitem>
+ 
+      <listitem>
+       <para>
+        Set up connections and authentication so that the standby server can
+        successfully connect to the pseudo <literal>replication</> database of
+        the primary server (see
+        <xref linkend="streaming-replication-authentication">). Ensure that
+        <xref linkend="guc-listen-addresses"> and <filename>pg_hba.conf</> are
+        configured appropriately on the primary.
+       </para>
+       <para>
+        On systems that support the keepalive socket option, setting
+        <xref linkend="guc-tcp-keepalives-idle">,
+        <xref linkend="guc-tcp-keepalives-interval"> and
+        <xref linkend="guc-tcp-keepalives-count"> helps you to find the
+        troubles with replication (e.g., the network outage or the failure of
+        the standby server) as soon as possible.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Set the maximum number of concurrent connections from the standby servers
+        (see <xref linkend="guc-max-wal-senders"> for details).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Enable WAL archiving in the primary server because we need to make a base
+        backup of it later (see <xref linkend="guc-archive-mode"> and
+        <xref linkend="guc-archive-command"> for details).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Start the <productname>PostgreSQL</> server on the primary.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Make a base backup of the primary server (see
+        <xref linkend="backup-base-backup">), and load this data onto the
+        standby. Note that all files present in <filename>pg_xlog</>
+        and <filename>pg_xlog/archive_status</> on the <emphasis>standby</>
+        server should be removed because they might be obsolete.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Set up WAL archiving, connections and authentication like the primary
+        server, because the standby server might work as a primary server after
+        failover. Ensure that your settings are consistent with the
+        <emphasis>future</> environment after the primary and the standby
+        server are interchanged by failover. If you're setting up the standby
+        server for e.g reporting purposes, with no plans to fail over to it,
+        configure the standby accordingly.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Create a recovery command file <filename>recovery.conf</> in the data
+        directory on the standby server.
+       </para>
+ 
+       <variablelist id="replication-config-settings" xreflabel="Replication Settings">
+        <varlistentry id="standby-mode" xreflabel="standby_mode">
+         <term><varname>standby_mode</varname> (<type>boolean</type>)</term>
+         <listitem>
+          <para>
+           Specifies whether to start the <productname>PostgreSQL</> server as
+           a standby. If this parameter is <literal>on</>, the streaming
+           replication is enabled and the standby server will try to connect
+           to the primary to receive and apply WAL records continuously. The
+           default is <literal>off</>, which allows only an archive recovery
+           without replication. So, streaming replication requires this
+           parameter to be explicitly set to <literal>on</>.
+          </para>
+         </listitem>
+        </varlistentry>
+        <varlistentry id="primary-conninfo" xreflabel="primary_conninfo">
+         <term><varname>primary_conninfo</varname> (<type>string</type>)</term>
+         <listitem>
+          <para>
+           Specifies a connection string which is used for the standby server
+           to connect with the primary. This string is in the same format as
+           described in <xref linkend="libpq-connect">. If any option is
+           unspecified in this string, then the corresponding environment
+           variable (see <xref linkend="libpq-envars">) is checked. If the
+           environment variable is not set either, then the indicated built-in
+           defaults are used.
+          </para>
+          <para>
+           The built-in replication requires that a host name (or host address)
+           or port number which the primary server listens on should be
+           specified in this string, respectively. Also ensure that a role with
+           the <literal>SUPERUSER</> and <literal>LOGIN</> privileges on the
+           primary is set (see
+           <xref linkend="streaming-replication-authentication">). Note that
+           the password needs to be set if the primary demands password
+           authentication.
+          </para>
+         </listitem>
+        </varlistentry>
+        <varlistentry id="trigger-file" xreflabel="trigger_file">
+         <term><varname>trigger_file</varname> (<type>string</type>)</term>
+         <listitem>
+          <para>
+           Specifies a trigger file whose presence activates the standby.
+           If no trigger file is specified, the standby never exits
+           recovery.
+          </para>
+         </listitem>
+        </varlistentry>
+       </variablelist>
+      </listitem>
+      <listitem>
+       <para>
+        Start the <productname>PostgreSQL</> server on the standby. The standby
+        server will go into recovery mode and proceeds to receive WAL records
+        from the primary and apply them continuously.
+       </para>
+      </listitem>
+      </orderedlist>
+     </para>
+    </sect2>
+    <sect2 id="streaming-replication-authentication">
+     <title>Authentication</title>
+     <para>
+      It's very important that the access privilege for replication are set
+      properly so that only trusted users can read the WAL stream, because it's
+      easy to extract serious information from it.
+     </para>
+     <para>
+      Only superuser is allowed to connect to the primary as the replication
+      standby. So a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
+      privileges needs to be created in the primary.
+     </para>
+     <para>
+      Client authentication for replication is controlled by the
+      <filename>pg_hba.conf</> record specifying <literal>replication</> in the
+      <replaceable>database</> field. For example, if the standby is running on
+      host IP <literal>192.168.1.100</> and the superuser's name for replication
+      is <literal>foo</>, the administrator can add the following line to the
+      <filename>pg_hba.conf</> file on the primary.
+ 
+ <programlisting>
+ # Allow the user "foo" from host 192.168.1.100 to connect to the primary
+ # as a replication standby if the user's password is correctly supplied.
+ #
+ # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
+ host    replication     foo             192.168.1.100/32        md5
+ </programlisting>
+     </para>
+     <para>
+      The host name and port number of the primary, user name to connect as,
+      and password are specified in the <filename>recovery.conf</> file or
+      the corresponding environment variable on the standby.
+      For example, if the primary is running on host IP <literal>192.168.1.50</>,
+      port <literal>5432</literal>, the superuser's name for replication is
+      <literal>foo</>, and the password is <literal>foopass</>, the administrator
+      can add the following line to the <filename>recovery.conf</> file on the
+      standby.
+ 
+ <programlisting>
+ # The standby connects to the primary that is running on host 192.168.1.50
+ # and port 5432 as the user "foo" whose password is "foopass".
+ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+ </programlisting>
+     </para>
+    </sect2>
+   </sect1>
+ 
+   <sect1 id="warm-standby-failover">
+    <title>Failover</title>
+ 
+    <para>
+     If the primary server fails then the standby server should begin
+     failover procedures.
+    </para>
+ 
+    <para>
+     If the standby server fails then no failover need take place. If the
+     standby server can be restarted, even some time later, then the recovery
+     process can also be immediately restarted, taking advantage of
+     restartable recovery. If the standby server cannot be restarted, then a
+     full new standby server instance should be created.
+    </para>
+ 
+    <para>
+     If the primary server fails and the standby server becomes the
+     new primary, and then the old primary restarts, you must have
+     a mechanism for informing old primary that it is no longer the primary. This is
+     sometimes known as STONITH (Shoot The Other Node In The Head), which is
+     necessary to avoid situations where both systems think they are the
+     primary, which will lead to confusion and ultimately data loss.
+    </para>
+ 
+    <para>
+     Many failover systems use just two systems, the primary and the standby,
+     connected by some kind of heartbeat mechanism to continually verify the
+     connectivity between the two and the viability of the primary. It is
+     also possible to use a third system (called a witness server) to prevent
+     some cases of inappropriate failover, but the additional complexity
+     might not be worthwhile unless it is set up with sufficient care and
+     rigorous testing.
+    </para>
+ 
+    <para>
+     Once failover to the standby occurs, we have only a
+     single server in operation. This is known as a degenerate state.
+     The former standby is now the primary, but the former primary is down
+     and might stay down.  To return to normal operation we must
+     fully recreate a standby server,
+     either on the former primary system when it comes up, or on a third,
+     possibly new, system. Once complete the primary and standby can be
+     considered to have switched roles. Some people choose to use a third
+     server to provide backup for the new primary until the new standby
+     server is recreated,
+     though clearly this complicates the system configuration and
+     operational processes.
+    </para>
+ 
+    <para>
+     So, switching from primary to standby server can be fast but requires
+     some time to re-prepare the failover cluster. Regular switching from
+     primary to standby is useful, since it allows regular downtime on
+     each system for maintenance. This also serves as a test of the
+     failover mechanism to ensure that it will really work when you need it.
+     Written administration procedures are advised.
+    </para>
+   </sect1>
+ 
+  <sect1 id="hot-standby">
+   <title>Hot Standby</title>
+ 
+   <indexterm zone="high-availability">
+    <primary>Hot Standby</primary>
+   </indexterm>
+ 
+    <para>
+     Hot Standby is the term used to describe the ability to connect to
+     the server and run queries while the server is in archive recovery. This
+     is useful for both log shipping replication and for restoring a backup
+     to an exact state with great precision.
+     The term Hot Standby also refers to the ability of the server to move
+     from recovery through to normal running while users continue running
+     queries and/or continue their connections.
+    </para>
+ 
+    <para>
+     Running queries in recovery is in many ways the same as normal running
+     though there are a large number of usage and administrative points
+     to note.
+    </para>
+ 
+   <sect2 id="hot-standby-users">
+    <title>User's Overview</title>
+ 
+    <para>
+     Users can connect to the database while the server is in recovery
+     and perform read-only queries. Read-only access to catalogs and views
+     will also occur as normal.
+    </para>
+ 
+    <para>
+     The data on the standby takes some time to arrive from the primary server
+     so there will be a measurable delay between primary and standby. Running the
+     same query nearly simultaneously on both primary and standby might therefore
+     return differing results. We say that data on the standby is eventually
+     consistent with the primary.
+     Queries executed on the standby will be correct with regard to the transactions
+     that had been recovered at the start of the query, or start of first statement,
+     in the case of serializable transactions. In comparison with the primary,
+     the standby returns query results that could have been obtained on the primary
+     at some exact moment in the past.
+    </para>
+ 
+    <para>
+     When a transaction is started in recovery, the parameter
+     <varname>transaction_read_only</> will be forced to be true, regardless of the
+     <varname>default_transaction_read_only</> setting in <filename>postgresql.conf</>.
+     It can't be manually set to false either. As a result, all transactions
+     started during recovery will be limited to read-only actions only. In all
+     other ways, connected sessions will appear identical to sessions
+     initiated during normal processing mode. There are no special commands
+     required to initiate a connection at this time, so all interfaces
+     work normally without change. After recovery finishes, the session
+     will allow normal read-write transactions at the start of the next
+     transaction, if these are requested.
+    </para>
+ 
+    <para>
+     Read-only here means "no writes to the permanent database tables".
+     There are no problems with queries that make use of transient sort and
+     work files.
+    </para>
+ 
+    <para>
+     The following actions are allowed
+ 
+     <itemizedlist>
+      <listitem>
+       <para>
+        Query access - SELECT, COPY TO including views and SELECT RULEs
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Cursor commands - DECLARE, FETCH, CLOSE,
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Parameters - SHOW, SET, RESET
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Transaction management commands
+         <itemizedlist>
+          <listitem>
+           <para>
+            BEGIN, END, ABORT, START TRANSACTION
+           </para>
+          </listitem>
+          <listitem>
+           <para>
+            SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
+           </para>
+          </listitem>
+          <listitem>
+           <para>
+            EXCEPTION blocks and other internal subtransactions
+           </para>
+          </listitem>
+         </itemizedlist>
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        LOCK TABLE, though only when explicitly in one of these modes:
+        ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Plugins and extensions - LOAD
+       </para>
+      </listitem>
+     </itemizedlist>
+    </para>
+ 
+    <para>
+     These actions produce error messages
+ 
+     <itemizedlist>
+      <listitem>
+       <para>
+        Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
+        Note that there are no allowed actions that result in a trigger
+        being executed during recovery.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
+        This also applies to temporary tables currently because currently their
+        definition causes writes to catalog tables.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        SELECT ... FOR SHARE | UPDATE which cause row locks to be written
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        RULEs on SELECT statements that generate DML commands.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
+        LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Transaction management commands that explicitly set non-read only state
+         <itemizedlist>
+          <listitem>
+           <para>
+             BEGIN READ WRITE,
+             START TRANSACTION READ WRITE
+           </para>
+          </listitem>
+          <listitem>
+           <para>
+             SET TRANSACTION READ WRITE,
+             SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
+           </para>
+          </listitem>
+          <listitem>
+           <para>
+            SET transaction_read_only = off
+           </para>
+          </listitem>
+         </itemizedlist>
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
+        ROLLBACK PREPARED because even read-only transactions need to write
+        WAL in the prepare phase (the first phase of two phase commit).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        sequence update - nextval()
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
+       </para>
+      </listitem>
+     </itemizedlist>
+    </para>
+ 
+    <para>
+     Note that current behaviour of read only transactions when not in
+     recovery is to allow the last two actions, so there are small and
+     subtle differences in behaviour between read-only transactions
+     run on standby and during normal running.
+     It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
+     temporary tables may be lifted in a future release, if their internal
+     implementation is altered to make this possible.
+    </para>
+ 
+    <para>
+     If failover or switchover occurs the database will switch to normal
+     processing mode. Sessions will remain connected while the server
+     changes mode. Current transactions will continue, though will remain
+     read-only. After recovery is complete, it will be possible to initiate
+     read-write transactions.
+    </para>
+ 
+    <para>
+     Users will be able to tell whether their session is read-only by
+     issuing SHOW transaction_read_only.  In addition a set of
+     functions <xref linkend="functions-recovery-info-table"> allow users to
+     access information about Hot Standby. These allow you to write
+     functions that are aware of the current state of the database. These
+     can be used to monitor the progress of recovery, or to allow you to
+     write complex programs that restore the database to particular states.
+    </para>
+ 
+    <para>
+     In recovery, transactions will not be permitted to take any table lock
+     higher than RowExclusiveLock. In addition, transactions may never assign
+     a TransactionId and may never write WAL.
+     Any <command>LOCK TABLE</> command that runs on the standby and requests
+     a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
+    </para>
+ 
+    <para>
+     In general queries will not experience lock conflicts with the database
+     changes made by recovery. This is becase recovery follows normal
+     concurrency control mechanisms, known as <acronym>MVCC</>. There are
+     some types of change that will cause conflicts, covered in the following
+     section.
+    </para>
+   </sect2>
+ 
+   <sect2 id="hot-standby-conflict">
+    <title>Handling query conflicts</title>
+ 
+    <para>
+     The primary and standby nodes are in many ways loosely connected. Actions
+     on the primary will have an effect on the standby. As a result, there is
+     potential for negative interactions or conflicts between them. The easiest
+     conflict to understand is performance: if a huge data load is taking place
+     on the primary then this will generate a similar stream of WAL records on the
+     standby, so standby queries may contend for system resources, such as I/O.
+    </para>
+ 
+    <para>
+     There are also additional types of conflict that can occur with Hot Standby.
+     These conflicts are <emphasis>hard conflicts</> in the sense that we may
+     need to cancel queries and in some cases disconnect sessions to resolve them.
+     The user is provided with a number of optional ways to handle these
+     conflicts, though we must first understand the possible reasons behind a conflict.
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          Access Exclusive Locks from primary node, including both explicit
+          LOCK commands and various kinds of DDL action
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Dropping tablespaces on the primary while standby queries are using
+          those tablespaces for temporary work files (work_mem overflow)
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Dropping databases on the primary while users are connected to that
+          database on the standby.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Waiting to acquire buffer cleanup locks
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Early cleanup of data still visible to the current query's snapshot
+         </para>
+        </listitem>
+       </itemizedlist>
+    </para>
+ 
+    <para>
+     Some WAL redo actions will be for DDL actions. These DDL actions are
+     repeating actions that have already committed on the primary node, so
+     they must not fail on the standby node. These DDL locks take priority
+     and will automatically *cancel* any read-only transactions that get in
+     their way, after a grace period. This is similar to the possibility of
+     being canceled by the deadlock detector, but in this case the standby
+     process always wins, since the replayed actions must not fail. This
+     also ensures that replication doesn't fall behind while we wait for a
+     query to complete. Again, we assume that the standby is there for high
+     availability purposes primarily.
+    </para>
+ 
+    <para>
+     An example of the above would be an Administrator on Primary server
+     runs a <command>DROP TABLE</> on a table that's currently being queried
+     in the standby server.
+     Clearly the query cannot continue if we let the <command>DROP TABLE</>
+     proceed. If this situation occurred on the primary, the <command>DROP TABLE</>
+     would wait until the query has finished. When the query is on the standby
+     and the <command>DROP TABLE</> is on the primary, the primary doesn't have
+     information about which queries are running on the standby and so the query
+     does not wait on the primary. The WAL change records come through to the
+     standby while the standby query is still running, causing a conflict.
+    </para>
+ 
+    <para>
+     The most common reason for conflict between standby queries and WAL redo is
+     "early cleanup". Normally, <productname>PostgreSQL</> allows cleanup of old
+     row versions when there are no users who may need to see them to ensure correct
+     visibility of data (the heart of MVCC). If there is a standby query that has
+     been running for longer than any query on the primary then it is possible
+     for old row versions to be removed by either a vacuum or HOT. This will
+     then generate WAL records that, if applied, would remove data on the
+     standby that might *potentially* be required by the standby query.
+     In more technical language, the primary's xmin horizon is later than
+     the standby's xmin horizon, allowing dead rows to be removed.
+    </para>
+ 
+    <para>
+     Experienced users should note that both row version cleanup and row version
+     freezing will potentially conflict with recovery queries. Running a
+     manual <command>VACUUM FREEZE</> is likely to cause conflicts even on tables
+     with no updated or deleted rows.
+    </para>
+ 
+    <para>
+     We have a number of choices for resolving query conflicts.  The default
+     is that we wait and hope the query completes. The server will wait
+     automatically until the lag between primary and standby is at most
+     <varname>max_standby_delay</> seconds. Once that grace period expires,
+     we take one of the following actions:
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          If the conflict is caused by a lock, we cancel the conflicting standby
+          transaction immediately. If the transaction is idle-in-transaction
+          then currently we abort the session instead, though this may change
+          in the future.
+         </para>
+        </listitem>
+ 
+        <listitem>
+         <para>
+          If the conflict is caused by cleanup records we tell the standby query
+          that a conflict has occurred and that it must cancel itself to avoid the
+          risk that it silently fails to read relevant data because
+          that data has been removed. (This is regrettably very similar to the
+          much feared and iconic error message "snapshot too old"). Some cleanup
+          records only cause conflict with older queries, though some types of
+          cleanup record affect all queries.
+         </para>
+ 
+         <para>
+          If cancellation does occur, the query and/or transaction can always
+          be re-executed. The error is dynamic and will not necessarily occur
+          the same way if the query is executed again.
+         </para>
+        </listitem>
+       </itemizedlist>
+    </para>
+ 
+    <para>
+     <varname>max_standby_delay</> is set in <filename>postgresql.conf</>.
+     The parameter applies to the server as a whole so if the delay is all used
+     up by a single query then there may be little or no waiting for queries that
+     follow immediately, though they will have benefited equally from the initial
+     waiting period. The server may take time to catch up again before the grace
+     period is available again, though if there is a heavy and constant stream
+     of conflicts it may seldom catch up fully.
+    </para>
+ 
+    <para>
+     Users should be clear that tables that are regularly and heavily updated on
+     primary server will quickly cause cancellation of longer running queries on
+     the standby. In those cases <varname>max_standby_delay</> can be
+     considered somewhat but not exactly the same as setting
+     <varname>statement_timeout</>.
+     </para>
+ 
+    <para>
+     Other remedial actions exist if the number of cancellations is unacceptable.
+     The first option is to connect to primary server and keep a query active
+     for as long as we need to run queries on the standby. This guarantees that
+     a WAL cleanup record is never generated and we don't ever get query
+     conflicts as described above. This could be done using contrib/dblink
+     and pg_sleep(), or via other mechanisms. If you do this, you should note
+     that this will delay cleanup of dead rows by vacuum or HOT and many
+     people may find this undesirable. However, we should remember that
+     primary and standby nodes are linked via the WAL, so this situation is no
+     different to the case where we ran the query on the primary node itself
+     except we have the benefit of off-loading the execution onto the standby.
+    </para>
+ 
+    <para>
+     It is also possible to set <varname>vacuum_defer_cleanup_age</> on the primary
+     to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
+     more time for queries to execute before they are cancelled on the standby,
+     without the need for setting a high <varname>max_standby_delay</>.
+    </para>
+ 
+    <para>
+     Three-way deadlocks are possible between AccessExclusiveLocks arriving from
+     the primary, cleanup WAL records that require buffer cleanup locks and
+     user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
+     are resolved by time-out when we exceed <varname>max_standby_delay</>.
+    </para>
+ 
+    <para>
+     Dropping tablespaces or databases is discussed in the administrator's
+     section since they are not typical user situations.
+    </para>
+   </sect2>
+ 
+   <sect2 id="hot-standby-admin">
+    <title>Administrator's Overview</title>
+ 
+    <para>
+     If there is a <filename>recovery.conf</> file present the server will start
+     in Hot Standby mode by default, though <varname>recovery_connections</> can
+     be disabled via <filename>postgresql.conf</>, if required. The server may take
+     some time to enable recovery connections since the server must first complete
+     sufficient recovery to provide a consistent state against which queries
+     can run before enabling read only connections. Look for these messages
+     in the server logs
+ 
+ <programlisting>
+ LOG:  initializing recovery connections
+ 
+ ... then some time later ...
+ 
+ LOG:  consistent recovery state reached
+ LOG:  database system is ready to accept read only connections
+ </programlisting>
+ 
+     Consistency information is recorded once per checkpoint on the primary, as long
+     as <varname>recovery_connections</> is enabled (on the primary). If this parameter
+     is disabled, it will not be possible to enable recovery connections on the standby.
+     The consistent state can also be delayed in the presence of both of these conditions
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          a write transaction has more than 64 subtransactions
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          very long-lived write transactions
+         </para>
+        </listitem>
+       </itemizedlist>
+ 
+     If you are running file-based log shipping ("warm standby"), you may need
+     to wait until the next WAL file arrives, which could be as long as the
+     <varname>archive_timeout</> setting on the primary.
+    </para>
+ 
+    <para>
+     The setting of some parameters on the standby will need reconfiguration
+     if they have been changed on the primary. The value on the standby must
+     be equal to or greater than the value on the primary. If these parameters
+     are not set high enough then the standby will not be able to track work
+     correctly from recovering transactions. If these values are set too low the
+     the server will halt. Higher values can then be supplied and the server
+     restarted to begin recovery again.
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          <varname>max_connections</>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <varname>max_prepared_transactions</>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <varname>max_locks_per_transaction</>
+         </para>
+        </listitem>
+       </itemizedlist>
+    </para>
+ 
+    <para>
+     It is important that the administrator consider the appropriate setting
+     of <varname>max_standby_delay</>, set in <filename>postgresql.conf</>.
+     There is no optimal setting and should be set according to business
+     priorities. For example if the server is primarily tasked as a High
+     Availability server, then you may wish to lower
+     <varname>max_standby_delay</> or even set it to zero, though that is a
+     very aggressive setting. If the standby server is tasked as an additional
+     server for decision support queries then it may be acceptable to set this
+     to a value of many hours (in seconds).
+    </para>
+ 
+    <para>
+     Transaction status "hint bits" written on primary are not WAL-logged,
+     so data on standby will likely re-write the hints again on the standby.
+     Thus the main database blocks will produce write I/Os even though
+     all users are read-only; no changes have occurred to the data values
+     themselves.  Users will be able to write large sort temp files and
+     re-generate relcache info files, so there is no part of the database
+     that is truly read-only during hot standby mode. There is no restriction
+     on the use of set returning functions, or other users of tuplestore/tuplesort
+     code. Note also that writes to remote databases will still be possible,
+     even though the transaction is read-only locally.
+    </para>
+ 
+    <para>
+     The following types of administrator command are not accepted
+     during recovery mode
+ 
+       <itemizedlist>
+        <listitem>
+         <para>
+          Data Definition Language (DDL) - e.g. CREATE INDEX
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Privilege and Ownership - GRANT, REVOKE, REASSIGN
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
+         </para>
+        </listitem>
+       </itemizedlist>
+    </para>
+ 
+    <para>
+     Note again that some of these commands are actually allowed during
+     "read only" mode transactions on the primary.
+    </para>
+ 
+    <para>
+     As a result, you cannot create additional indexes that exist solely
+     on the standby, nor can statistics that exist solely on the standby.
+     If these administrator commands are needed they should be executed
+     on the primary so that the changes will propagate through to the
+     standby.
+    </para>
+ 
+    <para>
+     <function>pg_cancel_backend()</> will work on user backends, but not the
+     Startup process, which performs recovery. pg_stat_activity does not
+     show an entry for the Startup process, nor do recovering transactions
+     show as active. As a result, pg_prepared_xacts is always empty during
+     recovery. If you wish to resolve in-doubt prepared transactions
+     then look at pg_prepared_xacts on the primary and issue commands to
+     resolve those transactions there.
+    </para>
+ 
+    <para>
+     pg_locks will show locks held by backends as normal. pg_locks also shows
+     a virtual transaction managed by the Startup process that owns all
+     AccessExclusiveLocks held by transactions being replayed by recovery.
+     Note that Startup process does not acquire locks to
+     make database changes and thus locks other than AccessExclusiveLocks
+     do not show in pg_locks for the Startup process, they are just presumed
+     to exist.
+    </para>
+ 
+    <para>
+     <productname>check_pgsql</> will work, but it is very simple.
+     <productname>check_postgres</> will also work, though many some actions
+     could give different or confusing results.
+     e.g. last vacuum time will not be maintained for example, since no
+     vacuum occurs on the standby (though vacuums running on the primary do
+     send their changes to the standby).
+    </para>
+ 
+    <para>
+     WAL file control commands will not work during recovery
+     e.g. <function>pg_start_backup</>, <function>pg_switch_xlog</> etc..
+    </para>
+ 
+    <para>
+     Dynamically loadable modules work, including pg_stat_statements.
+    </para>
+ 
+    <para>
+     Advisory locks work normally in recovery, including deadlock detection.
+     Note that advisory locks are never WAL logged, so it is not possible for
+     an advisory lock on either the primary or the standby to conflict with WAL
+     replay. Nor is it possible to acquire an advisory lock on the primary
+     and have it initiate a similar advisory lock on the standby. Advisory
+     locks relate only to a single server on which they are acquired.
+    </para>
+ 
+    <para>
+     Trigger-based replication systems such as <productname>Slony</>,
+     <productname>Londiste</> and <productname>Bucardo</> won't run on the
+     standby at all, though they will run happily on the primary server as
+     long as the changes are not sent to standby servers to be applied.
+     WAL replay is not trigger-based so you cannot relay from the
+     standby to any system that requires additional database writes or
+     relies on the use of triggers.
+    </para>
+ 
+    <para>
+     New oids cannot be assigned, though some <acronym>UUID</> generators may still
+     work as long as they do not rely on writing new status to the database.
+    </para>
+ 
+    <para>
+     Currently, temp table creation is not allowed during read only
+     transactions, so in some cases existing scripts will not run correctly.
+     It is possible we may relax that restriction in a later release. This is
+     both a SQL Standard compliance issue and a technical issue.
+    </para>
+ 
+    <para>
+     <command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
+     Some standby users may be actively using the tablespace via their
+     <varname>temp_tablespaces</> parameter. If there are temp files in the
+     tablespace we currently cancel all active queries to ensure that temp
+     files are removed, so that we can remove the tablespace and continue with
+     WAL replay.
+    </para>
+ 
+    <para>
+     Running <command>DROP DATABASE</>, <command>ALTER DATABASE ... SET TABLESPACE</>,
+     or <command>ALTER DATABASE ... RENAME</> on primary will generate a log message
+     that will cause all users connected to that database on the standby to be
+     forcibly disconnected. This action occurs immediately, whatever the setting of
+     <varname>max_standby_delay</>.
+    </para>
+ 
+    <para>
+     In normal running, if you issue <command>DROP USER</> or <command>DROP ROLE</>
+     for a role with login capability while that user is still connected then
+     nothing happens to the connected user - they remain connected. The user cannot
+     reconnect however. This behaviour applies in recovery also, so a
+     <command>DROP USER</> on the primary does not disconnect that user on the standby.
+    </para>
+ 
+    <para>
+     Stats collector is active during recovery. All scans, reads, blocks,
+     index usage etc will all be recorded normally on the standby. Replayed
+     actions will not duplicate their effects on primary, so replaying an
+     insert will not increment the Inserts column of pg_stat_user_tables.
+     The stats file is deleted at start of recovery, so stats from primary
+     and standby will differ; this is considered a feature not a bug.
+    </para>
+ 
+    <para>
+     Autovacuum is not active during recovery, though will start normally
+     at the end of recovery.
+    </para>
+ 
+    <para>
+     Background writer is active during recovery and will perform
+     restartpoints (similar to checkpoints on primary) and normal block
+     cleaning activities. The <command>CHECKPOINT</> command is accepted during recovery,
+     though performs a restartpoint rather than a new checkpoint.
+    </para>
+   </sect2>
+ 
+   <sect2 id="hot-standby-parameters">
+    <title>Hot Standby Parameter Reference</title>
+ 
+    <para>
+     Various parameters have been mentioned above in the <xref linkend="hot-standby-admin">
+     and <xref linkend="hot-standby-conflict"> sections.
+    </para>
+ 
+    <para>
+     On the primary, parameters <varname>recovery_connections</> and
+     <varname>vacuum_defer_cleanup_age</> can be used to enable and control the
+     primary server to assist the successful configuration of Hot Standby servers.
+     <varname>max_standby_delay</> has no effect if set on the primary.
+    </para>
+ 
+    <para>
+     On the standby, parameters <varname>recovery_connections</> and
+     <varname>max_standby_delay</> can be used to enable and control Hot Standby.
+     standby server to assist the successful configuration of Hot Standby servers.
+     <varname>vacuum_defer_cleanup_age</> has no effect during recovery.
+    </para>
+   </sect2>
+ 
+   <sect2 id="hot-standby-caveats">
+    <title>Caveats</title>
+ 
+    <para>
+     At this writing, there are several limitations of Hot Standby.
+     These can and probably will be fixed in future releases:
+ 
+   <itemizedlist>
+    <listitem>
+     <para>
+      Operations on hash indexes are not presently WAL-logged, so
+      replay will not update these indexes.  Hash indexes will not be
+      used for query plans during recovery.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Full knowledge of running transactions is required before snapshots
+      may be taken. Transactions that take use large numbers of subtransactions
+      (currently greater than 64) will delay the start of read only
+      connections until the completion of the longest running write transaction.
+      If this situation occurs explanatory messages will be sent to server log.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Valid starting points for recovery connections are generated at each
+      checkpoint on the master. If the standby is shutdown while the master
+      is in a shutdown state it may not be possible to re-enter Hot Standby
+      until the primary is started up so that it generates further starting
+      points in the WAL logs. This is not considered a serious issue
+      because the standby is usually switched into the primary role while
+      the first node is taken down.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      At the end of recovery, AccessExclusiveLocks held by prepared transactions
+      will require twice the normal number of lock table entries. If you plan
+      on running either a large number of concurrent prepared transactions
+      that normally take AccessExclusiveLocks, or you plan on having one
+      large transaction that takes many AccessExclusiveLocks then you are
+      advised to select a larger value of <varname>max_locks_per_transaction</>,
+      up to, but never more than twice the value of the parameter setting on
+      the primary server in rare extremes. You need not consider this at all if
+      your setting of <varname>max_prepared_transactions</> is <literal>0</>.
+     </para>
+    </listitem>
+   </itemizedlist>
+ 
+    </para>
+   </sect2>
+ 
+  </sect1>
+ 
+   <sect1 id="backup-incremental-updated">
+    <title>Incrementally Updated Backups</title>
+ 
+   <indexterm zone="high-availability">
+    <primary>incrementally updated backups</primary>
+   </indexterm>
+ 
+   <indexterm zone="high-availability">
+    <primary>change accumulation</primary>
+   </indexterm>
+ 
+    <para>
+     In a warm standby configuration, it is possible to offload the expense of
+     taking periodic base backups from the primary server; instead base backups
+     can be made by backing
+     up a standby server's files.  This concept is generally known as
+     incrementally updated backups, log change accumulation, or more simply,
+     change accumulation.
+    </para>
+ 
+    <para>
+     If we take a file system backup of the standby server's data
+     directory while it is processing
+     logs shipped from the primary, we will be able to reload that backup and
+     restart the standby's recovery process from the last restart point.
+     We no longer need to keep WAL files from before the standby's restart point.
+     If we need to recover, it will be faster to recover from the incrementally
+     updated backup than from the original base backup.
+    </para>
+ 
+    <para>
+     Since the standby server is not <quote>live</>, it is not possible to
+     use <function>pg_start_backup()</> and <function>pg_stop_backup()</>
+     to manage the backup process; it will be up to you to determine how
+     far back you need to keep WAL segment files to have a recoverable
+     backup.  You can do this by running <application>pg_controldata</>
+     on the standby server to inspect the control file and determine the
+     current checkpoint WAL location, or by using the
+     <varname>log_checkpoints</> option to print values to the standby's
+     server log.
+    </para>
+   </sect1>
+ 
  </chapter>
#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Thom Brown (#6)
Re: Streaming replication in docs

Thom Brown wrote:

On 9 February 2010 01:57, Fujii Masao <masao.fujii@gmail.com> wrote:

The attached patch changes the layout according to your suggestion
(but with a small modification). Here is the new layout:

-------------------------------
Chapter 24. Backup and Restore
24.1. SQL Dump
24.1.1. Restoring the dump
24.1.2. Using pg_dumpall
24.1.3. Handling large databases
24.2. File System Level Backup
24.3. Continuous Archiving and Point-In-Time Recovery (PITR)
24.3.1. Setting up WAL archiving
24.3.2. Making a Base Backup
24.3.3. Recovering using a Continuous Archive Backup
24.3.4. Timelines
24.3.5. Tips and Examples
24.3.6. Caveats
24.4. Migration Between Releases

Chapter 25. High Availability, Load Balancing, and Replication
25.1. Comparison of different solutions
25.2. File-based Log Shipping
25.2.1. Planning
25.2.2. Implementation
25.2.3. Record-based Log Shipping
25.3. Streaming Replication
25.3.1. Setup
25.3.2. Authentication
25.4. Failover
25.5. Hot Standby
25.5.1. User's Overview
25.5.2. Handling query conflicts
25.5.3. Administrator's Overview
25.5.4. Hot Standby Parameter Reference
25.5.5. Caveats
25.6. Incrementally Updated Backups
-------------------------------

I didn't change the content except adjusting the "sect" and
the "indexterm" tags to new chapter.

If this layout change will have been committed, I'll improve
the SR section.

And can I just correct a couple typo's while we're changing it too? (attached)

Thanks, committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com