Streaming replication document improvements

Started by Fujii Masao · almost 16 years ago · 34 messages
#1Fujii Masao
masao.fujii@gmail.com
1 attachment(s)

Hi,

The attached patch improves the documentation about SR as follows:

* Improve the description of the SR setup procedure.
o Add links to the recovery.conf parameters where needed.
o Add an example recovery.conf setting.
* Document what should be monitored to avoid a disk-full
failure on the primary (see the sketch after this list).
* Clarify how to safely take incrementally updated backups.
* Define the recovery.conf parameters as index terms.
* Describe how to use recovery.conf.
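
As a concrete illustration of the lag check described in the new
monitoring section (a sketch only; the host names are hypothetical):

    psql -h primary -c "SELECT pg_current_xlog_location();"
    psql -h standby -c "SELECT pg_last_xlog_receive_location();"

Comparing the two locations shows how far the standby lags behind
the primary.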

Any comments are welcome.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

replication_doc_improvement_v1.patch (application/octet-stream)
*** a/doc/src/sgml/high-availability.sgml
--- b/doc/src/sgml/high-availability.sgml
***************
*** 801,815 **** if (!triggered)
        </listitem>
       <listitem>
        <para>
!        Set up continuous archiving from the primary to a WAL archive located
!        in a directory on the standby server. In particular, set
!        <xref linkend="guc-archive-mode"> and
!        <xref linkend="guc-archive-command">
!        to archive WAL files in a location accessible from the standby
         (see <xref linkend="backup-archiving-wal">).
        </para>
       </listitem>
- 
       <listitem>
        <para>
         Set <xref linkend="guc-listen-addresses"> and authentication options
--- 801,812 ----
        </listitem>
       <listitem>
        <para>
!        Set up continuous archiving from the primary to a WAL archive accessible
!        from the standby. Ensure that <xref linkend="guc-archive-mode"> and
!        <xref linkend="guc-archive-command"> are set appropriately on the primary
         (see <xref linkend="backup-archiving-wal">).
        </para>
       </listitem>
       <listitem>
        <para>
         Set <xref linkend="guc-listen-addresses"> and authentication options
***************
*** 821,827 **** if (!triggered)
         On systems that support the keepalive socket option, setting
         <xref linkend="guc-tcp-keepalives-idle">,
         <xref linkend="guc-tcp-keepalives-interval"> and
!        <xref linkend="guc-tcp-keepalives-count"> helps the master promptly
         notice a broken connection.
        </para>
       </listitem>
--- 818,824 ----
         On systems that support the keepalive socket option, setting
         <xref linkend="guc-tcp-keepalives-idle">,
         <xref linkend="guc-tcp-keepalives-interval"> and
!        <xref linkend="guc-tcp-keepalives-count"> helps the primary promptly
         notice a broken connection.
        </para>
       </listitem>
***************
*** 829,834 **** if (!triggered)
--- 826,834 ----
        <para>
         Set the maximum number of concurrent connections from the standby servers
         (see <xref linkend="guc-max-wal-senders"> for details).
+        Since those connections are for superusers,
+        <xref linkend="guc-superuser-reserved-connections"> should be
+        set accordingly.
        </para>
       </listitem>
       <listitem>
***************
*** 858,872 **** if (!triggered)
       <listitem>
        <para>
         Create a recovery command file <filename>recovery.conf</> in the data
!        directory on the standby server. Set <varname>restore_command</varname>
!        as you would in normal recovery from a continuous archiving backup
!        (see <xref linkend="backup-pitr-recovery">). <literal>pg_standby</> or
!        similar tools that wait for the next WAL file to arrive cannot be used
!        with streaming replication, as the server handles retries and waiting
!        itself. Enable <varname>standby_mode</varname>. Set
!        <varname>primary_conninfo</varname> to point to the primary server.
        </para>
- 
       </listitem>
       <listitem>
        <para>
--- 858,908 ----
       <listitem>
        <para>
         Create a recovery command file <filename>recovery.conf</> in the data
!        directory on the standby server.
!        <itemizedlist>
!         <listitem>
!          <para>
!           Enable <xref linkend="standby-mode">.
!          </para>
!         </listitem>
!         <listitem>
!          <para>
!           Set <xref linkend="primary-conninfo"> to point to the primary server.
!           The host name (or host address) and port number on which the primary
!           server listens should be specified in this string. Also ensure that
!           a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
!           privileges exists on the primary (see
!           <xref linkend="streaming-replication-authentication">). Note that
!           a password needs to be supplied if the primary demands password
!           authentication.
!          </para>
!         </listitem>
!         <listitem>
!          <para>
!           Set <xref linkend="restore-command"> as you would in normal recovery
!           from a continuous archiving backup
!           (see <xref linkend="backup-pitr-recovery">).
!           <application>pg_standby</> or similar tools that wait for the next
!           WAL file to arrive cannot be used with streaming replication,
!           as the server handles retries and waiting itself.
!          </para>
!         </listitem>
!         <listitem>
!          <para>
!           Set <xref linkend="trigger-file"> to specify the trigger file
!           whose presence ends the standby's recovery. If you have no
!           plan to perform a failover, this parameter is not required.
!          </para>
!         </listitem>
!        </itemizedlist>
!        A simple example of <filename>recovery.conf</> is:
! <programlisting>
! standby_mode = 'on'
! primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
! restore_command = 'cp /path/to/archive/%f %p'
! trigger_file = '/path/to/trigger_file'
! </programlisting>
        </para>
       </listitem>
       <listitem>
        <para>
***************
*** 929,934 **** primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
--- 965,991 ----
        <varname>primary_conninfo</varname> then a FATAL error will be raised.
      </para>
     </sect2>
+ 
+    <sect2 id="streaming-replication-monitor">
+     <title>Monitoring</title>
+     <para>
+      The WAL files required for the standby's recovery are not deleted from
+      the <filename>pg_xlog</> directory on the primary while the standby is
+      connected to the primary. If the standby lags far behind the primary,
+      many WAL files accumulate there, and might cause a disk-full
+      failure. So it is very important to check periodically how far the
+      standby lags behind and to eliminate the cause of the delay as soon as
+      possible. You can check this by comparing the current WAL write
+      location on the primary with the last WAL location received by the
+      standby. The former and the latter can be obtained from the
+      <function>pg_current_xlog_location</> function on the primary and the
+      <function>pg_last_xlog_receive_location</> function on the standby,
+      respectively (see <xref linkend="functions-admin-backup-table"> and
+      <xref linkend="functions-recovery-info-table"> for details).
+      The latter can also be obtained from the <command>ps</> display of the
+      WAL receiver process (see <xref linkend="monitoring-ps"> for details).
+     </para>
+    </sect2>
    </sect1>
  
    <sect1 id="warm-standby-failover">
***************
*** 1798,1831 **** LOG:  database system is ready to accept read only connections
    </indexterm>
  
     <para>
!     In a warm standby configuration, it is possible to offload the expense of
!     taking periodic base backups from the primary server; instead base backups
!     can be made by backing
      up a standby server's files.  This concept is generally known as
      incrementally updated backups, log change accumulation, or more simply,
      change accumulation.
-    </para>
- 
-    <para>
-     If we take a file system backup of the standby server's data
-     directory while it is processing
-     logs shipped from the primary, we will be able to reload that backup and
-     restart the standby's recovery process from the last restart point.
-     We no longer need to keep WAL files from before the standby's restart point.
      If recovery is needed, it will be faster to recover from the incrementally
      updated backup than from the original base backup.
     </para>
  
     <para>
!     Since the standby server is not <quote>live</>, it is not possible to
!     use <function>pg_start_backup()</> and <function>pg_stop_backup()</>
!     to manage the backup process; it will be up to you to determine how
!     far back you need to keep WAL segment files to have a recoverable
!     backup.  You can do this by running <application>pg_controldata</>
!     on the standby server to inspect the control file and determine the
!     current checkpoint WAL location, or by using the
!     <varname>log_checkpoints</> option to print values to the standby's
!     server log.
     </para>
    </sect1>
  
--- 1855,1918 ----
    </indexterm>
  
     <para>
!     In file-based log shipping or streaming replication,
!     it is possible to offload the expense of taking periodic base backups
!     from the primary server; instead base backups can be made by backing
      up a standby server's files.  This concept is generally known as
      incrementally updated backups, log change accumulation, or more simply,
      change accumulation.
      If recovery is needed, it will be faster to recover from the incrementally
      updated backup than from the original base backup.
     </para>
  
     <para>
!     The procedure for taking a file system backup of the standby server's
!     data directory while it is processing logs shipped from the primary is:
!    <orderedlist>
!     <listitem>
!      <para>
!       Perform the backup without using <function>pg_start_backup</> and
!       <function>pg_stop_backup</>. Note that the <filename>pg_control</>
!       file must be backed up first, like the following:
! <programlisting>
! cp /var/lib/pgsql/data/global/pg_control /tmp
! cp -r /var/lib/pgsql/data /path/to/backup
! mv /tmp/pg_control /path/to/backup/data/global
! </programlisting>
!       This avoids the problem that we might fail to restore a consistent
!       database state because recovery would start from a restart point
!       later than the start of the backup.
!      </para>
!     </listitem>
!     <listitem>
!      <para>
!       Record the backup's ending WAL location by calling the <function>
!       pg_last_xlog_replay_location</> function at the end of the backup,
!       and keep the record with the backup:
! <programlisting>
! psql -c "select pg_last_xlog_replay_location();" > /path/to/backup/end_location
! </programlisting>
!       When recovering from the incrementally updated backup, the server
!       can begin accepting connections and complete the recovery successfully
!       before the database has become consistent. To avoid this problem,
!       you must check whether the database has become consistent before
!       your users try to connect to the server and before archive recovery ends.
!       You can do this by comparing the progress of the recovery with
!       the recorded backup ending WAL location. The progress of the recovery
!       is also obtained from the <function>pg_last_xlog_replay_location</> function.
!      </para>
!      <para>
!      </para>
!     </listitem>
!    </orderedlist>
!    </para>
! 
!    <para>
!     You no longer need to keep WAL files from before the backup's restart point.
!     That restart point can be obtained by running <application>pg_controldata</>
!     on the backup to inspect the <filename>pg_control</> file and determine the
!     latest checkpoint's REDO location, or by using the <varname>log_checkpoints</>
!     option to print values to the standby's server log.
     </para>
    </sect1>
  
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***************
*** 85,90 **** postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
--- 85,107 ----
    </para>
  
    <para>
+    In streaming replication (see <xref linkend="streaming-replication"> for details),
+    the WAL sender process is listed in the <command>ps</> display on the primary server.
+    The form of its command line display is:
+ <screen>
+ postgres: wal sender process <replaceable>user</> <replaceable>host</> streaming <replaceable>location</>
+ </screen>
+    The user and host items remain the same for the life of the replication connection.
+    The location indicates how far the WAL sender has sent WAL.
+    Correspondingly, the WAL receiver process is listed in the <command>ps</> display
+    on the standby server. The form of its command line display is:
+ <screen>
+ postgres: wal receiver process streaming <replaceable>location</>
+ </screen>
+    The location indicates how far the WAL receiver has received WAL.
+   </para>
+ 
+   <para>
     If you have turned off <xref linkend="guc-update-process-title"> then the
     activity indicator is not updated; the process title is set only once
     when a new process is launched.  On some platforms this saves a measurable
*** a/doc/src/sgml/recovery-config.sgml
--- b/doc/src/sgml/recovery-config.sgml
***************
*** 11,30 ****
  
     <para>
        This chapter describes the settings available in
!       <filename>recovery.conf</> file. They apply only for the duration of
        the recovery.  (A sample file, <filename>share/recovery.conf.sample</>,
        exists in the installation's <filename>share/</> directory.)  They must
        be reset for any subsequent recovery you wish to perform. They cannot
        be changed once recovery has begun.
     </para>
  
    <sect1 id="archive-recovery-settings">
  
!     <title>Archive recovery settings</title>
       <variablelist>
  
       <varlistentry id="restore-command" xreflabel="restore_command">
        <term><varname>restore_command</varname> (<type>string</type>)</term>
        <listitem>
         <para>
          The shell command to execute to retrieve an archived segment of
--- 11,56 ----
  
     <para>
        This chapter describes the settings available in
!       <filename>recovery.conf</><indexterm><primary>recovery.conf</></> file.
!       They apply only for the duration of
        the recovery.  (A sample file, <filename>share/recovery.conf.sample</>,
        exists in the installation's <filename>share/</> directory.)  They must
        be reset for any subsequent recovery you wish to perform. They cannot
        be changed once recovery has begun.
+       Upon completion of the recovery, <filename>recovery.conf</> is renamed
+       to <filename>recovery.done</> (to prevent accidentally re-entering
+       recovery mode later).
+    </para>
+ 
+    <para>
+     All parameter names are case-sensitive. Every parameter takes a
+     value of one of four types: Boolean, integer, string or timestamp.
+     Boolean values can be written as <literal>on</literal>,
+     <literal>off</literal>, <literal>true</literal>,
+     <literal>false</literal>, <literal>yes</literal>,
+     <literal>no</literal>, <literal>1</literal>,
+     <literal>0</literal> (all case-insensitive) or any unambiguous prefix
+     of these.
+    </para>
+ 
+    <para>
+     One parameter is specified per line. The equal sign between name and
+     value is optional. Whitespace is insignificant and blank lines are
+     ignored. Hash marks (<literal>#</literal>) designate the rest of the
+     line as a comment.  A parameter value must be single-quoted. A single
+     quote must not be embedded in a parameter value.
     </para>
  
    <sect1 id="archive-recovery-settings">
  
!     <title>Archive Recovery</title>
       <variablelist>
  
       <varlistentry id="restore-command" xreflabel="restore_command">
        <term><varname>restore_command</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>restore_command</> configuration parameter</primary>
+       </indexterm>
        <listitem>
         <para>
          The shell command to execute to retrieve an archived segment of
***************
*** 61,66 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
--- 87,95 ----
  
       <varlistentry id="restartpoint-command" xreflabel="restartpoint_command">
        <term><varname>restartpoint_command</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>restartpoint_command</> configuration parameter</primary>
+       </indexterm>
        <listitem>
         <para>
          This parameter specifies a shell command that will be executed at
***************
*** 87,92 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
--- 116,124 ----
  
       <varlistentry id="recovery-end-command" xreflabel="recovery_end_command">
        <term><varname>recovery_end_command</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>recovery_end_command</> configuration parameter</primary>
+       </indexterm>
        <listitem>
         <para>
          This parameter specifies a shell command that will be executed once only
***************
*** 111,123 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
  
    <sect1 id="recovery-target-settings">
  
!     <title>Recovery target settings</title>
       <variablelist>
  
       <varlistentry id="recovery-target-time" xreflabel="recovery_target_time">
        <term><varname>recovery_target_time</varname>
             (<type>timestamp</type>)
        </term>
        <listitem>
         <para>
          This parameter specifies the time stamp up to which recovery
--- 143,158 ----
  
    <sect1 id="recovery-target-settings">
  
!     <title>Recovery Target</title>
       <variablelist>
  
       <varlistentry id="recovery-target-time" xreflabel="recovery_target_time">
        <term><varname>recovery_target_time</varname>
             (<type>timestamp</type>)
        </term>
+       <indexterm>
+        <primary><varname>recovery_target_time</> configuration parameter</primary>
+       </indexterm>
        <listitem>
         <para>
          This parameter specifies the time stamp up to which recovery
***************
*** 132,138 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </varlistentry>
  
       <varlistentry id="recovery-target-xid" xreflabel="recovery_target_xid">
!       <term><varname>recovery_target_xid</varname> (<type>string</type>)</term>
        <listitem>
         <para>
          This parameter specifies the transaction ID up to which recovery
--- 167,176 ----
       </varlistentry>
  
       <varlistentry id="recovery-target-xid" xreflabel="recovery_target_xid">
!       <term><varname>recovery_target_xid</varname> (<type>integer</type>)</term>
!       <indexterm>
!        <primary><varname>recovery_target_xid</> configuration parameter</primary>
!       </indexterm>
        <listitem>
         <para>
          This parameter specifies the transaction ID up to which recovery
***************
*** 155,160 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
--- 193,201 ----
        <term><varname>recovery_target_inclusive</varname>
          (<type>boolean</type>)
        </term>
+       <indexterm>
+        <primary><varname>recovery_target_inclusive</> configuration parameter</primary>
+       </indexterm>
        <listitem>
         <para>
          Specifies whether we stop just after the specified recovery target
***************
*** 174,179 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
--- 215,223 ----
        <term><varname>recovery_target_timeline</varname>
          (<type>string</type>)
        </term>
+       <indexterm>
+        <primary><varname>recovery_target_timeline</> configuration parameter</primary>
+       </indexterm>
        <listitem>
         <para>
          Specifies recovering into a particular timeline.  The default is
***************
*** 191,201 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
  
    <sect1 id="standby-settings">
  
!     <title>Standby server settings</title>
       <variablelist>
  
         <varlistentry id="standby-mode" xreflabel="standby_mode">
          <term><varname>standby_mode</varname> (<type>boolean</type>)</term>
          <listitem>
           <para>
            Specifies whether to start the <productname>PostgreSQL</> server as
--- 235,248 ----
  
    <sect1 id="standby-settings">
  
!     <title>Standby Server</title>
       <variablelist>
  
         <varlistentry id="standby-mode" xreflabel="standby_mode">
          <term><varname>standby_mode</varname> (<type>boolean</type>)</term>
+       <indexterm>
+        <primary><varname>standby_mode</> configuration parameter</primary>
+       </indexterm>
          <listitem>
           <para>
            Specifies whether to start the <productname>PostgreSQL</> server as
***************
*** 209,214 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
--- 256,264 ----
         </varlistentry>
         <varlistentry id="primary-conninfo" xreflabel="primary_conninfo">
          <term><varname>primary_conninfo</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>primary_conninfo</> configuration parameter</primary>
+       </indexterm>
          <listitem>
           <para>
            Specifies a connection string to be used for the standby server
***************
*** 221,247 **** restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
            defaults are used.
           </para>
           <para>
-           The built-in replication requires that a host name (or host address)
-           or port number which the primary server listens on be
-           specified in this string. Also ensure that a role with
-           the <literal>SUPERUSER</> and <literal>LOGIN</> privileges on the
-           primary is set (see
-           <xref linkend="streaming-replication-authentication">). Note that
-           the password needs to be set if the primary demands password
-           authentication.
-          </para>
-          <para>
            This setting has no effect if <varname>standby_mode</> is <literal>off</>.
           </para>
          </listitem>
         </varlistentry>
         <varlistentry id="trigger-file" xreflabel="trigger_file">
          <term><varname>trigger_file</varname> (<type>string</type>)</term>
          <listitem>
           <para>
            Specifies a trigger file whose presence ends recovery in the
!           standby. If no trigger file is specified, the standby never exits
!           recovery.
            This setting has no effect if <varname>standby_mode</> is <literal>off</>.
           </para>
          </listitem>
--- 271,291 ----
            defaults are used.
           </para>
           <para>
            This setting has no effect if <varname>standby_mode</> is <literal>off</>.
           </para>
          </listitem>
         </varlistentry>
         <varlistentry id="trigger-file" xreflabel="trigger_file">
          <term><varname>trigger_file</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>trigger_file</> configuration parameter</primary>
+       </indexterm>
          <listitem>
           <para>
            Specifies a trigger file whose presence ends recovery in the
!           standby. This can be a path relative to the data directory or
!           an absolute path. If no trigger file is specified, the standby
!           never exits recovery.
            This setting has no effect if <varname>standby_mode</> is <literal>off</>.
           </para>
          </listitem>
*** a/doc/src/sgml/storage.sgml
--- b/doc/src/sgml/storage.sgml
***************
*** 30,36 **** managed by different server instances, can exist on the same machine.
  The <varname>PGDATA</> directory contains several subdirectories and control
  files, as shown in <xref linkend="pgdata-contents-table">.  In addition to
  these required items, the cluster configuration files
! <filename>postgresql.conf</filename>, <filename>pg_hba.conf</filename>, and
  <filename>pg_ident.conf</filename> are traditionally stored in
  <varname>PGDATA</> (although in <productname>PostgreSQL</productname> 8.0 and
  later, it is possible to keep them elsewhere).
--- 30,37 ----
  The <varname>PGDATA</> directory contains several subdirectories and control
  files, as shown in <xref linkend="pgdata-contents-table">.  In addition to
  these required items, the cluster configuration files
! <filename>postgresql.conf</filename>, <filename>recovery.conf</filename>,
! <filename>pg_hba.conf</filename>, and
  <filename>pg_ident.conf</filename> are traditionally stored in
  <varname>PGDATA</> (although in <productname>PostgreSQL</productname> 8.0 and
  later, it is possible to keep them elsewhere).
*** a/src/backend/access/transam/recovery.conf.sample
--- b/src/backend/access/transam/recovery.conf.sample
***************
*** 88,94 ****
  #recovery_target_timeline = '33'		# number or 'latest'
  #
  #---------------------------------------------------------------------------
! # LOG-STREAMING REPLICATION PARAMETERS
  #---------------------------------------------------------------------------
  #
  # When standby_mode is enabled, the PostgreSQL server will work as
--- 88,94 ----
  #recovery_target_timeline = '33'		# number or 'latest'
  #
  #---------------------------------------------------------------------------
! # STANDBY SERVER PARAMETERS
  #---------------------------------------------------------------------------
  #
  # When standby_mode is enabled, the PostgreSQL server will work as
#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#1)
Re: Streaming replication document improvements

Fujii Masao wrote:

***************
*** 829,834 **** if (!triggered)
--- 826,834 ----
<para>
Set the maximum number of concurrent connections from the standby servers
(see <xref linkend="guc-max-wal-senders"> for details).
+        Since those connections are for superusers,
+        <xref linkend="guc-superuser-reserved-connections"> should be
+        set accordingly.
</para>
</listitem>
<listitem>

That's an interesting point, do streaming replication connections
consume superuser_reserved_connections slots? How should
superuser_reserved_connections be set?

*** a/src/backend/access/transam/recovery.conf.sample
--- b/src/backend/access/transam/recovery.conf.sample
***************
*** 88,94 ****
#recovery_target_timeline = '33'		# number or 'latest'
#
#---------------------------------------------------------------------------
! # LOG-STREAMING REPLICATION PARAMETERS
#---------------------------------------------------------------------------
#
# When standby_mode is enabled, the PostgreSQL server will work as
--- 88,94 ----
#recovery_target_timeline = '33'		# number or 'latest'
#
#---------------------------------------------------------------------------
! # STANDBY SERVER PARAMETERS
#---------------------------------------------------------------------------
#
# When standby_mode is enabled, the PostgreSQL server will work as

Hmm, that makes the HOT STANDBY notice after that section look weird:

#---------------------------------------------------------------------------
# HOT STANDBY PARAMETERS
#---------------------------------------------------------------------------
#
# Hot Standby related parameters are listed in postgresql.conf
#
#---------------------------------------------------------------------------

Do we need that notice? Maybe just move that sentence to the "STANDBY
SERVER PARAMETERS" section.

I just committed a patch to move around the high-availability sections a
bit. That caused conflicts with this patch, so I picked the changes from
the patch and applied them over the new layout, and I also did a lot of
other editing. So, I committed most parts of this patch (except the
above), with a lot of changes to fix the bit-rot, and also other editing
to my liking. I hope I made it better, not worse.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: Streaming replication document improvements

On Thu, Apr 1, 2010 at 5:39 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Fujii Masao wrote:

***************
*** 829,834 **** if (!triggered)
--- 826,834 ----
        <para>
         Set the maximum number of concurrent connections from the standby servers
         (see <xref linkend="guc-max-wal-senders"> for details).
+        Since those connections are for superusers,
+        <xref linkend="guc-superuser-reserved-connections"> should be
+        set accordingly.
        </para>
       </listitem>
       <listitem>

That's an interesting point, do streaming replication connections
consume superuser_reserved_connections slots?

Yes. Since an SR connection is a superuser connection, setting
superuser_reserved_connections appropriately would be useful
to prevent non-superuser connections from blocking the connection
from the standby.

How should
superuser_reserved_connections be set?

Well... setting it to the number of superuser connections required for
maintenance plus max_wal_senders seems reasonable.
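
For instance (illustrative numbers only, not a recommendation):

    # postgresql.conf on the primary
    max_wal_senders = 5
    superuser_reserved_connections = 8   # 3 for maintenance + max_wal_senders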

*** a/src/backend/access/transam/recovery.conf.sample
--- b/src/backend/access/transam/recovery.conf.sample
***************
*** 88,94 ****
  #recovery_target_timeline = '33'            # number or 'latest'
  #
  #---------------------------------------------------------------------------
! # LOG-STREAMING REPLICATION PARAMETERS
  #---------------------------------------------------------------------------
  #
  # When standby_mode is enabled, the PostgreSQL server will work as
--- 88,94 ----
  #recovery_target_timeline = '33'            # number or 'latest'
  #
  #---------------------------------------------------------------------------
! # STANDBY SERVER PARAMETERS
  #---------------------------------------------------------------------------
  #
  # When standby_mode is enabled, the PostgreSQL server will work as

Hmm, that makes the HOT STANDBY notice after that section look weird:

#---------------------------------------------------------------------------
# HOT STANDBY PARAMETERS
#---------------------------------------------------------------------------
#
# Hot Standby related parameters are listed in postgresql.conf
#
#---------------------------------------------------------------------------

Do we need that notice? Maybe just move that sentence to the "STANDBY
SERVER PARAMETERS" section.

I don't think the notice is necessary, but I agree with the move.

I just committed a patch to move around the high-availability sections a
bit. That caused conflicts with this patch, so I picked the changes from
the patch and applied them over the new layout, and I also did a lot of
other editing. So, I committed most parts of this patch (except the
above), with a lot of changes to fix the bit-rot, and also other editing
to my liking. I hope I made it better not worse.

Thanks a lot!

doc/src/sgml/monitoring.sgml

+    In streaming replication (see <xref linkend="streaming-replication"> for details),
+    the WAL sender process is listed in the <command>ps</> display on the primary server.
+    The form of its command line display is:
+ <screen>
+ postgres: wal sender process <replaceable>user</> <replaceable>host</> streaming <replaceable>location</>
+ </screen>
+    The user and host items remain the same for the life of the replication connection.
+    The location indicates how far the WAL sender has sent WAL.
+    Correspondingly, the WAL receiver process is listed in the <command>ps</> display
+    on the standby server. The form of its command line display is:
+ <screen>
+ postgres: wal receiver process streaming <replaceable>location</>
+ </screen>
+    The location indicates how far the WAL receiver has received WAL.
+   </para>
+
+   <para>

The original patch includes the above change for monitoring.sgml, but
it was excluded from the commit. Do you think it's not worth applying
the change?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#4Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#3)
Re: [DOCS] Streaming replication document improvements

On Wed, Mar 31, 2010 at 9:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

That's an interesting point, do streaming replication connections
consume superuser_reserved_connections slots?

Yes. Since an SR connection is a superuser connection, setting
superuser_reserved_connections appropriately would be useful
to prevent non-superuser connections from blocking the connection
from the standby.

I think this is wacky, especially since we'd someday like to have
replication be a separate privilege from superuser. I think we should
change this.

...Robert

#5Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#4)
Re: [HACKERS] Streaming replication document improvements

On Thu, Apr 1, 2010 at 10:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 31, 2010 at 9:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

That's an interesting point, do streaming replication connections
consume superuser_reserved_connections slots?

Yes. Since an SR connection is a superuser connection, setting
superuser_reserved_connections appropriately would be useful
to prevent non-superuser connections from blocking the connection
from the standby.

I think this is wacky, especially since we'd someday like to have
replication be a separate privilege from superuser.  I think we should
change this.

You mean that we should change replication connection not to consume
superuser_reserved_connections slots in 9.0?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#6Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#5)
Re: [DOCS] Streaming replication document improvements

On Wed, Mar 31, 2010 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Apr 1, 2010 at 10:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 31, 2010 at 9:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

That's an interesting point, do streaming replication connections
consume superuser_reserved_connections slots?

Yes. Since an SR connection is a superuser connection, setting
superuser_reserved_connections appropriately would be useful
to prevent non-superuser connections from blocking the connection
from the standby.

I think this is wacky, especially since we'd someday like to have
replication be a separate privilege from superuser.  I think we should
change this.

You mean that we should change replication connection not to consume
superuser_reserved_connections slots in 9.0?

Yes.

...Robert

#7Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#6)
Re: [HACKERS] Streaming replication document improvements

On Thu, Apr 1, 2010 at 11:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 31, 2010 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Apr 1, 2010 at 10:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 31, 2010 at 9:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

That's an interesting point, do streaming replication connections
consume superuser_reserved_connections slots?

Yes. Since an SR connection is a superuser connection, setting
superuser_reserved_connections appropriately would be useful
to prevent non-superuser connections from blocking the connection
from the standby.

I think this is wacky, especially since we'd someday like to have
replication be a separate privilege from superuser.  I think we should
change this.

You mean that we should change replication connection not to consume
superuser_reserved_connections slots in 9.0?

Yes.

Preventing superuser connections from consuming superuser_reserved_connections
slots seems strange to me. So I'm leaning toward just removing the superuser
privilege from replication connections again. Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#7)
Re: [HACKERS] Streaming replication document improvements

Fujii Masao wrote:

On Thu, Apr 1, 2010 at 11:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 31, 2010 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

You mean that we should change replication connection not to consume
superuser_reserved_connections slots in 9.0?

Yes.

I think it's good that walsenders can use the superuser reserved slots,
that way a client that opens max_connections connections can't block out
standby servers from connecting.

Preventing superuser connections from consuming superuser_reserved_connections
slots seems strange to me. So I'm leaning toward just removing the superuser
privilege from replication connections again. Thoughts?

That would be good, but I fear it's a bigger change than we should be
doing at this point.

How about we adjust the backends math a bit:

Currently:

ReservedBackends = superuser_reserved_connections
MaxBackends = max_connections + autovacuum_max_workers + 1;

Proposal:

ReservedBackends = superuser_reserved_connections + max_wal_senders
MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1

So we implicitly reserve a slot and a superuser-reserved slot for each
walsender. Walsenders use the slots reserved for superusers, but if you
set superuser_reserved_connections=3, there are still always at least
three slots available for a superuser to log in with psql, even if the
maximum number of walsenders is connected.
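
With illustrative numbers (not taken from any patch):

    max_connections = 100, autovacuum_max_workers = 3,
    superuser_reserved_connections = 3, max_wal_senders = 5

    ReservedBackends = 3 + 5 = 8
    MaxBackends      = 100 + 3 + 5 + 1 = 109

Even with all five walsenders connected, at least three free slots
remain that only a superuser psql session can claim.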

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#8)
Re: [HACKERS] Streaming replication document improvements

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Fujii Masao wrote:

Preventing superuser connections from consuming superuser_reserved_connections
slots seems strange to me. So I'm leaning toward just removing the superuser
privilege from replication connections again. Thoughts?

That would be good, but I fear it's a bigger change than we should be
doing at this point.

Exactly. Despite Robert's unhappiness, NONE of this should get changed
right now. When and if we create a separate replication privilege,
it'll be time enough to revisit the connection limit arithmetic.

regards, tom lane

#10Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#8)
Re: [DOCS] Streaming replication document improvements

On Thu, Apr 1, 2010 at 9:09 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Fujii Masao wrote:

On Thu, Apr 1, 2010 at 11:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 31, 2010 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

You mean that we should change replication connection not to consume
superuser_reserved_connections slots in 9.0?

Yes.

I think it's good that walsenders can use the superuser reserved slots,
that way a client that opens max_connections connections can't block out
standby servers from connecting.

Preventing superuser connections from consuming superuser_reserved_connections
slots seems strange to me. So I'm leaning toward just removing the superuser
privilege from replication connections again. Thoughts?

That would be good, but I fear it's a bigger change than we should be
doing at this point.

How about we adjust the backends math a bit:

Currently:

ReservedBackends = superuser_reserved_connections
MaxBackends = max_connections + autovacuum_max_workers + 1;

Proposal:

ReservedBackends = superuser_reserved_connections + max_wal_senders
MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1

So we implicitly reserve a slot and a superuser-reserved slot for each
walsender. Walsenders use the slots reserved for superusers, but if you
set superuser_reserved_connections=3, there are still always at least
three slots available for a superuser to log in with psql, even if the
maximum number of walsenders is connected.

That seems pretty reasonable to me. I haven't checked how much code
impact there is. I know Tom doesn't think we should change it at all,
but surely pre-beta is the time to fix nasty corner cases that were
added by recently committed patches?

...Robert

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: [HACKERS] Streaming replication document improvements

Robert Haas <robertmhaas@gmail.com> writes:

That seems pretty reasonable to me. I haven't checked how much code
impact there is. I know Tom doesn't think we should change it at all,
but surely pre-beta is the time to fix nasty corner cases that were
added by recently committed patches?

What nasty corner case? Having replication connections use superuser
reserved slots seems exactly the behavior I'd expect given that they are
running as superuser. I agree it would be good to decouple that later,
but we already decided we are not going to try to separate replication
privilege from superuser in 9.0.

(Also, autovacuum workers are a quite separate concept since the DBA
doesn't set them up or deal with them directly. So I'm unimpressed by
pointing to the treatment of autovacuum_max_workers as a precedent.)

regards, tom lane

#12Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#11)
Re: [DOCS] Streaming replication document improvements

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Having replication connections use superuser reserved slots seems
exactly the behavior I'd expect given that they are running as
superuser.

It seems within the realm of possibility that not all users would
think to boost superuser_reserved_connections by the number of
replication connections, and be surprised when they are unable to
connect in an emergency.

-Kevin

#13Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#12)
Re: [HACKERS] Streaming replication document improvements

On 4/1/10 10:44 AM, Kevin Grittner wrote:

It seems within the realm of possibility that not all users would
think to boost superuser_reserved_connections by the number of
replication connections, and be surprised when they are unable to
connect in an emergency.

Well, that's easy to add to the documentation.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#14Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#13)
Re: [DOCS] Streaming replication document improvements

On Thu, Apr 1, 2010 at 1:46 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 4/1/10 10:44 AM, Kevin Grittner wrote:

It seems within the realm of possibility that not all users would
think to boost superuser_reserved_connections by the number of
replication connections, and be surprised when they are unable to
connect in an emergency.

Well, that's easy to add to the documentation.

It's probably also easy to fix so that it doesn't NEED to be documented.

The thing is, when dealing with new features, we reduce our overall
maintenance burden if we get it right the first time. Obviously it's
too late for major changes, but minor adjustments to maintain the POLA
seem like exactly what we SHOULD be doing right now.

...Robert

#15Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#14)
Re: [HACKERS] Streaming replication document improvements

The thing is, when dealing with new features, we reduce our overall
maintenance burden if we get it right the first time. Obviously it's
too late for major changes, but minor adjustments to maintain the POLA
seem like exactly what we SHOULD be doing right now.

Oh, I agree. Since we have a separate WALSender limit, it seems
counter-intuitive and difficult-to-admin to have the WALSenders also
limited by superuser_reserved_connections. They should be their own
separate connection pool, just like the other "background" processes.

However, if this was somehow infeasible, it wouldn't be hard to
document. That's all.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#16Dimitri Fontaine
dfontaine@hi-media.com
In reply to: Josh Berkus (#15)
Re: [DOCS] Streaming replication document improvements

Josh Berkus <josh@agliodbs.com> writes:

Oh, I agree. Since we have a separate WALSender limit, it seems
counter-intuitive and difficult-to-admin to have the WALSenders also
limited by superuser_reserved_connections. They should be their own
separate connection pool, just like the other "background" processes.

+1

Regards,
--
dim

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#15)
Re: [HACKERS] Streaming replication document improvements

On Thu, 2010-04-01 at 11:49 -0700, Josh Berkus wrote:

The thing is, when dealing with new features, we reduce our overall
maintenance burden if we get it right the first time. Obviously it's
too late for major changes, but minor adjustments to maintain the POLA
seem like exactly what we SHOULD be doing right now.

Oh, I agree. Since we have a separate WALSender limit, it seems
counter-intuitive and difficult-to-admin to have the WALSenders also
limited by superuser_reserved_connections. They should be their own
separate connection pool, just like the other "background" processes.

However, if this was somehow infeasible, it wouldn't be hard to
document. That's all.

+1

--
Simon Riggs www.2ndQuadrant.com

#18Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#14)
1 attachment(s)
Re: [HACKERS] Streaming replication document improvements

On Fri, Apr 2, 2010 at 2:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:

It's probably also easy to fix so that it doesn't NEED to be documented.

The thing is, when dealing with new features, we reduce our overall
maintenance burden if we get it right the first time.  Obviously it's
too late for major changes, but minor adjustments to maintain the POLA
seem like exactly what we SHOULD be doing right now.

The attached patch implements Heikki's proposal:

----------
ReservedBackends = superuser_reserved_connections + max_wal_senders
MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1
----------

This change looks like a minor adjustment rather than a major change.

I have one question:
In the patch, max_wal_senders is ignored by CheckRequiredParameterValues()
just as autovacuum_max_workers already is. Is this OK for HS?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

replication_connections_v1.patch (application/octet-stream)
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 162,175 **** char	   *ListenAddresses;
  
  /*
   * ReservedBackends is the number of backends reserved for superuser use.
!  * This number is taken out of the pool size given by MaxBackends so
!  * number of backend slots available to non-superusers is
!  * (MaxBackends - ReservedBackends).  Note what this really means is
!  * "if there are <= ReservedBackends connections available, only superusers
!  * can make new connections" --- pre-existing superuser connections don't
!  * count against the limit.
   */
! int			ReservedBackends;
  
  /* The socket(s) we're listening to. */
  #define MAXLISTEN	64
--- 162,177 ----
  
  /*
   * ReservedBackends is the number of backends reserved for superuser use.
!  * This number is ReservedConnections + max_wal_senders (it is computed by
!  * the GUC assign hooks for those variables) and taken out of the pool size
!  * given by MaxBackends so number of backend slots available to
!  * non-superusers is (MaxBackends - ReservedBackends).  Note what this
!  * really means is "if there are <= ReservedBackends connections available,
!  * only superusers can make new connections" --- pre-existing superuser
!  * connections don't count against the limit.
   */
! int			ReservedBackends = 3;
! int			ReservedConnections = 3;
  
  /* The socket(s) we're listening to. */
  #define MAXLISTEN	64
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
***************
*** 142,149 **** ProcGlobalSemas(void)
   *	  running out when trying to start another backend is a common failure.
   *	  So, now we grab enough semaphores to support the desired max number
   *	  of backends immediately at initialization --- if the sysadmin has set
!  *	  MaxConnections or autovacuum_max_workers higher than his kernel will
!  *	  support, he'll find out sooner rather than later.
   *
   *	  Another reason for creating semaphores here is that the semaphore
   *	  implementation typically requires us to create semaphores in the
--- 142,149 ----
   *	  running out when trying to start another backend is a common failure.
   *	  So, now we grab enough semaphores to support the desired max number
   *	  of backends immediately at initialization --- if the sysadmin has set
!  *	  MaxConnections, autovacuum_max_workers, or max_wal_senders higher
!  *	  than his kernel will support, he'll find out sooner rather than later.
   *
   *	  Another reason for creating semaphores here is that the semaphore
   *	  implementation typically requires us to create semaphores in the
***************
*** 158,163 **** InitProcGlobal(void)
--- 158,164 ----
  {
  	PGPROC	   *procs;
  	int			i;
+ 	int			numconn;
  	bool		found;
  
  	/* Create the ProcGlobal shared structure */
***************
*** 185,197 **** InitProcGlobal(void)
  	/*
  	 * Pre-create the PGPROC structures and create a semaphore for each.
  	 */
! 	procs = (PGPROC *) ShmemAlloc((MaxConnections) * sizeof(PGPROC));
  	if (!procs)
  		ereport(FATAL,
  				(errcode(ERRCODE_OUT_OF_MEMORY),
  				 errmsg("out of shared memory")));
! 	MemSet(procs, 0, MaxConnections * sizeof(PGPROC));
! 	for (i = 0; i < MaxConnections; i++)
  	{
  		PGSemaphoreCreate(&(procs[i].sem));
  		procs[i].links.next = (SHM_QUEUE *) ProcGlobal->freeProcs;
--- 186,199 ----
  	/*
  	 * Pre-create the PGPROC structures and create a semaphore for each.
  	 */
! 	numconn = MaxConnections + max_wal_senders;
! 	procs = (PGPROC *) ShmemAlloc((numconn) * sizeof(PGPROC));
  	if (!procs)
  		ereport(FATAL,
  				(errcode(ERRCODE_OUT_OF_MEMORY),
  				 errmsg("out of shared memory")));
! 	MemSet(procs, 0, numconn * sizeof(PGPROC));
! 	for (i = 0; i < numconn; i++)
  	{
  		PGSemaphoreCreate(&(procs[i].sem));
  		procs[i].links.next = (SHM_QUEUE *) ProcGlobal->freeProcs;
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
***************
*** 100,107 **** int			maintenance_work_mem = 16384;
  
  /*
   * Primary determinants of sizes of shared-memory structures.  MaxBackends is
!  * MaxConnections + autovacuum_max_workers + 1 (it is computed by the GUC
!  * assign hooks for those variables):
   */
  int			NBuffers = 1000;
  int			MaxBackends = 100;
--- 100,107 ----
  
  /*
   * Primary determinants of sizes of shared-memory structures.  MaxBackends is
!  * MaxConnections + autovacuum_max_workers + 1 + max_wal_senders (it is
!  * computed by the GUC assign hooks for those variables):
   */
  int			NBuffers = 1000;
  int			MaxBackends = 100;
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 172,178 **** static const char *show_tcp_keepalives_idle(void);
--- 172,180 ----
  static const char *show_tcp_keepalives_interval(void);
  static const char *show_tcp_keepalives_count(void);
  static bool assign_maxconnections(int newval, bool doit, GucSource source);
+ static bool assign_reserved_connections(int newval, bool doit, GucSource source);
  static bool assign_autovacuum_max_workers(int newval, bool doit, GucSource source);
+ static bool assign_max_wal_senders(int newval, bool doit, GucSource source);
  static bool assign_effective_io_concurrency(int newval, bool doit, GucSource source);
  static const char *assign_pgstat_temp_directory(const char *newval, bool doit, GucSource source);
  static const char *assign_application_name(const char *newval, bool doit, GucSource source);
***************
*** 1361,1367 **** static struct config_int ConfigureNamesInt[] =
  	 * Note: MaxBackends is limited to INT_MAX/4 because some places compute
  	 * 4*MaxBackends without any overflow check.  This check is made in
  	 * assign_maxconnections, since MaxBackends is computed as MaxConnections
! 	 * plus autovacuum_max_workers plus one (for the autovacuum launcher).
  	 *
  	 * Likewise we have to limit NBuffers to INT_MAX/2.
  	 *
--- 1363,1370 ----
  	 * Note: MaxBackends is limited to INT_MAX/4 because some places compute
  	 * 4*MaxBackends without any overflow check.  This check is made in
  	 * assign_maxconnections, since MaxBackends is computed as MaxConnections
! 	 * plus autovacuum_max_workers plus one (for the autovacuum launcher)
! 	 * plus max_wal_senders.
  	 *
  	 * Likewise we have to limit NBuffers to INT_MAX/2.
  	 *
***************
*** 1390,1397 **** static struct config_int ConfigureNamesInt[] =
  			gettext_noop("Sets the number of connection slots reserved for superusers."),
  			NULL
  		},
! 		&ReservedBackends,
! 		3, 0, INT_MAX / 4, NULL, NULL
  	},
  
  	{
--- 1393,1400 ----
  			gettext_noop("Sets the number of connection slots reserved for superusers."),
  			NULL
  		},
! 		&ReservedConnections,
! 		3, 0, INT_MAX / 4, assign_reserved_connections, NULL
  	},
  
  	{
***************
*** 1706,1712 **** static struct config_int ConfigureNamesInt[] =
  			NULL
  		},
  		&max_wal_senders,
! 		0, 0, INT_MAX / 4, NULL, NULL
  	},
  
  	{
--- 1709,1715 ----
  			NULL
  		},
  		&max_wal_senders,
! 		0, 0, INT_MAX / 4, assign_max_wal_senders, NULL
  	},
  
  	{
***************
*** 7819,7829 **** show_tcp_keepalives_count(void)
  static bool
  assign_maxconnections(int newval, bool doit, GucSource source)
  {
! 	if (newval + autovacuum_max_workers + 1 > INT_MAX / 4)
  		return false;
  
  	if (doit)
! 		MaxBackends = newval + autovacuum_max_workers + 1;
  
  	return true;
  }
--- 7822,7844 ----
  static bool
  assign_maxconnections(int newval, bool doit, GucSource source)
  {
! 	if (newval + autovacuum_max_workers + 1 + max_wal_senders > INT_MAX / 4)
  		return false;
  
  	if (doit)
! 		MaxBackends = newval + autovacuum_max_workers + 1 + max_wal_senders;
! 
! 	return true;
! }
! 
! static bool
! assign_reserved_connections(int newval, bool doit, GucSource source)
! {
! 	if (newval + max_wal_senders > INT_MAX / 4)
! 		return false;
! 
! 	if (doit)
! 		ReservedBackends = newval + max_wal_senders;
  
  	return true;
  }
***************
*** 7831,7841 **** assign_maxconnections(int newval, bool doit, GucSource source)
  static bool
  assign_autovacuum_max_workers(int newval, bool doit, GucSource source)
  {
! 	if (MaxConnections + newval + 1 > INT_MAX / 4)
  		return false;
  
  	if (doit)
! 		MaxBackends = MaxConnections + newval + 1;
  
  	return true;
  }
--- 7846,7872 ----
  static bool
  assign_autovacuum_max_workers(int newval, bool doit, GucSource source)
  {
! 	if (MaxConnections + newval + 1 + max_wal_senders > INT_MAX / 4)
  		return false;
  
  	if (doit)
! 		MaxBackends = MaxConnections + newval + 1 + max_wal_senders;
! 
! 	return true;
! }
! 
! static bool
! assign_max_wal_senders(int newval, bool doit, GucSource source)
! {
! 	if (MaxConnections + autovacuum_max_workers + 1 + newval > INT_MAX / 4 ||
! 		ReservedConnections + newval > INT_MAX / 4)
! 		return false;
! 
! 	if (doit)
! 	{
! 		MaxBackends = MaxConnections + autovacuum_max_workers + 1 + newval;
! 		ReservedBackends = ReservedConnections + newval;
! 	}
  
  	return true;
  }
*** a/src/include/postmaster/postmaster.h
--- b/src/include/postmaster/postmaster.h
***************
*** 17,22 ****
--- 17,23 ----
  extern bool EnableSSL;
  extern bool SilentMode;
  extern int	ReservedBackends;
+ extern int	ReservedConnections;
  extern int	PostPortNumber;
  extern int	Unix_socket_permissions;
  extern char *Unix_socket_group;
#19Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#18)
Re: [HACKERS] Streaming replication document improvements

On Fri, Apr 2, 2010 at 2:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Apr 2, 2010 at 2:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:

It's probably also easy to fix so that it doesn't NEED to be documented.

The thing is, when dealing with new features, we reduce our overall
maintenance burden if we get it right the first time.  Obviously it's
too late for major changes, but minor adjustments to maintain the POLA
seem like exactly what we SHOULD be doing right now.

The attached patch implements Heikki's proposal:

----------
ReservedBackends = superuser_reserved_connections + max_wal_senders
MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1
----------

This change looks like a minor adjustment rather than a major change.

Instead of doing this, could we just change the logic in InitPostgres?

Current logic says we hit the connection limit if:

if (!am_superuser &&
ReservedBackends > 0 &&
!HaveNFreeProcs(ReservedBackends))

Couldn't we just change this to:

if ((!am_superuser || am_walsender) &&
ReservedBackends > 0 &&
!HaveNFreeProcs(ReservedBackends))

Seems like that'd be a whole lot simpler, if it'll do the job...

...Robert

#20Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#19)
Re: [HACKERS] Streaming replication document improvements

On Tue, Apr 20, 2010 at 11:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Instead of doing this, could we just change the logic in InitPostgres?

Current logic says we hit the connection limit if:

       if (!am_superuser &&
               ReservedBackends > 0 &&
               !HaveNFreeProcs(ReservedBackends))

Couldn't we just change this to:

       if ((!am_superuser || am_walsender) &&
               ReservedBackends > 0 &&
               !HaveNFreeProcs(ReservedBackends))

Seems like that'd be a whole lot simpler, if it'll do the job...

It's very simple, but it prevents a superuser replication
connection from being established once the connection limit for
non-superusers has been exceeded. It seems strange to me that a
superuser cannot use the superuser_reserved_connections slots. If
we'd like to forbid replication connections from using those
slots, I think we should just remove the superuser privilege from
them instead.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#21Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#20)
Re: [HACKERS] Streaming replication document improvements

On Tue, Apr 20, 2010 at 5:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Apr 20, 2010 at 11:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Instead of doing this, could we just change the logic in InitPostgres?

Current logic says we hit the connection limit if:

       if (!am_superuser &&
               ReservedBackends > 0 &&
               !HaveNFreeProcs(ReservedBackends))

Couldn't we just change this to:

       if ((!am_superuser || am_walsender) &&
               ReservedBackends > 0 &&
               !HaveNFreeProcs(ReservedBackends))

Seems like that'd be a whole lot simpler, if it'll do the job...

It's very simple, but it prevents a superuser replication
connection from being established once the connection limit for
non-superusers has been exceeded. It seems strange to me that a
superuser cannot use the superuser_reserved_connections slots. If
we'd like to forbid replication connections from using those
slots, I think we should just remove the superuser privilege from
them instead.

Let's just stop for a second and think about why we have
superuser_reserved_connections in the first place. As I understand
it, the point is that we want to make sure that superusers don't get
locked out of the database, because superuser intervention might be
necessary to recover from whatever series of unfortunate events has
caused all of the connection slots to get used up. For example, if
there are several different applications that connect to the database,
the superuser might like to log in and see which application has
requested more than its usual allotment of connections, or the
superuser might like to log in and terminate those backends which, in
his judgement, ought to be terminated. In other words, the purpose of
superuser_reserved_connections is to allow the database to recover
from a bad state that it has gotten into: specifically, a state where
all the connection slots have been used up and regular users can't
connect.

If replication connections can use up superuser_reserved_connections
slots, then it's very possible that this safety valve will fail
completely. If the server is being flooded with connection attempts,
and one of the streaming replication connections dies, then a regular
backend will immediately grab that slot. When the streaming replication
slave automatically tries to reconnect, it will now grab one of the
superuser_reserved_connections slots, putting us one step closer to
the bad scenario where there's no way for the superuser to log in
interactively and troubleshoot.

In other words, I don't care whether the replication connection
is technically a superuser connection. What I think is
important is trying to preserve the ability for a superuser to log in
interactively and clean up the mess even when the regular supply of
connections is maxed out.

...Robert
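
To put rough numbers on the scenario above (illustrative only, not
from the thread): with superuser_reserved_connections = 3 and
max_wal_senders = 3, a connection flood plus three reconnecting
walsenders can leave all three reserved slots held by replication,
and an interactive superuser login then fails exactly when it is
needed most.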

#22Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#21)
Re: [DOCS] Streaming replication document improvements

Robert Haas <robertmhaas@gmail.com> wrote:

I don't care whether the replication connection is technically
a superuser connection. What I think is important
is trying to preserve the ability for a superuser to log in
interactively and clean up the mess even when the regular supply
of connections is maxed out.

+1

-Kevin

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#21)
Re: [HACKERS] Streaming replication document improvements

Robert Haas <robertmhaas@gmail.com> writes:

If replication connections can use up superuser_reserved_connections
slots, then it's very possible that this safety valve will fail
completely.

Only if replication can use up *all* the superuser_reserved_connections
slots.

regards, tom lane

#24Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#23)
Re: [HACKERS] Streaming replication document improvements

On Tue, Apr 20, 2010 at 9:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

If replication connections can use up superuser_reserved_connections
slots, then it's very possible that this safety valve will fail
completely.

Only if replication can use up *all* the superuser_reserved_connections
slots.

Sure. In many cases someone will have 3
superuser_reserved_connections and only 1 wal_sender, so it won't be
an issue. However, I still think we ought to make this (apparently
one-line) fix so that people get the number of emergency superuser
connections they expect, rather than some smaller number that might
even be zero if they have several SR slaves.

...Robert

#25Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#21)
Re: [HACKERS] Streaming replication document improvements

On Tue, Apr 20, 2010 at 7:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Let's just stop for a second and think about why we have
superuser_reserved_connections in the first place.  As I understand
it, the point is that we want to make sure that superusers don't get
locked out of the database, because superuser intervention might be
necessary to recover from whatever series of unfortunate events has
caused all of the connection slots to get used up.  For example, if
there are several different applications that connect to the database,
the superuser might like to log in and see which application has
requested more than its usual allotment of connections, or the
superuser might like to log in and terminate those backends which, in
his judgement, ought to be terminated.  In other words, the purpose of
superuser_reserved_connections is to allow the database to recover
from a bad state that it has gotten into: specifically, a state where
all the connection slots have been used up and regular users can't
connect.

If replication connections can use up superuser_reserved_connections
slots, then it's very possible that this safety valve will fail
completely.  If the server is being flooded with connection attempts,
and one of the streaming replication connections dies, then a regular
backend will immediately grab that slot.  When the streaming replication
slave automatically tries to reconnect, it will now grab one of the
superuser_reserved_connections slots, putting us one step closer to
the bad scenario where there's no way for the superuser to log in
interactively and troubleshoot.

In other words, I don't care whether the replication connection
is technically a superuser connection.  What I think is
important is trying to preserve the ability for a superuser to log in
interactively and clean up the mess even when the regular supply of
connections is maxed out.

Yeah, I agree with you, but the difference is only in how to achieve it.
ISTM that there are three choices:

1. Heikki's proposal

ReservedBackends = superuser_reserved_connections + max_wal_senders
MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1

2. My proposal
Remove superuser privilege from replication connection

3. Your proposal
Treat superuser replication connection like non-superuser one

Since 3 is confusing to me, I like 1 or 2.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#26Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#25)
Re: [HACKERS] Streaming replication document improvements

On Tue, Apr 20, 2010 at 9:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Yeah, I agree with you, but the difference is only in how to achieve it.
ISTM that there are three choices:

1. Heikki's proposal

ReservedBackends = superuser_reserved_connections + max_wal_senders
MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1

This seemed sensible to me when Heikki first described it, but now it
seems overly complex.

2. My proposal
   Remove superuser privilege from replication connection

I'm not sure this really fixes the problem. If we add a separate
replication privilege, then presumably superusers will automatically
have that privilege, in accord with our usual policy on such things.
So someone could still set up replication using a superuser account
and get bitten by this problem anyway.

3. Your proposal
   Treat superuser replication connection like non-superuser one

Well, only for this one very specific purpose. I would adjust the
docs like this:

Determines the number of connection "slots" that are reserved for
connections by PostgreSQL superusers. At most max_connections
connections can ever be active simultaneously. Whenever the number of
active concurrent connections is at least max_connections minus
superuser_reserved_connections, new connections will be accepted only
for superusers, and no new replication connections will be accepted.

I think that's pretty simple and clear. If we want to burn an extra
sentence explaining what this is all about, we could add:

(If replication connections were permitted to use the reserved
connection slots, an installation with max_wal_senders set to a value
greater than or equal to the value set for
superuser_reserved_connections might find that no reserved connections
remained for interactive access to the database.)

Since 3 is confusing to me, I like 1 or 2.

What do others think?

...Robert
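
As a quick worked check of the proposed wording (numbers invented for
the example): with max_connections = 10 and
superuser_reserved_connections = 3, once 7 connections are active only
interactive superusers are admitted, and a reconnecting walsender is
refused just like a regular client, so the reserved slots stay free
for troubleshooting.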

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#26)
Re: [HACKERS] Streaming replication document improvements

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Apr 20, 2010 at 9:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

3. Your proposal
   Treat superuser replication connection like non-superuser one

Well, only for this one very specific purpose. I would adjust the
docs like this:

Determines the number of connection "slots" that are reserved for
connections by PostgreSQL superusers. At most max_connections
connections can ever be active simultaneously. Whenever the number of
active concurrent connections is at least max_connections minus
superuser_reserved_connections, new connections will be accepted only
for superusers, and no new replication connections will be accepted.

I think that's pretty simple and clear.

+1. I'm not sure about the consequences of converting walsender to
non-superuser across the board, and would prefer not to experiment
with it at this stage of the release cycle.

regards, tom lane

#28Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#27)
Re: [HACKERS] Streaming replication document improvements

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Fujii Masao <masao.fujii@gmail.com> wrote:

3. Your proposal
Treat superuser replication connection like non-superuser one

Well, only for this one very specific purpose. I would adjust
the docs like this:

Determines the number of connection "slots" that are reserved for
connections by PostgreSQL superusers. At most max_connections
connections can ever be active simultaneously. Whenever the
number of active concurrent connections is at least
max_connections minus superuser_reserved_connections, new
connections will be accepted only for superusers, and no new
replication connections will be accepted.

I think that's pretty simple and clear.

+1. I'm not sure about the consequences of converting walsender
to non-superuser across the board, and would prefer not to
experiment with it at this stage of the release cycle.

+1

-Kevin

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#19)
Re: [HACKERS] Streaming replication document improvements

Robert Haas <robertmhaas@gmail.com> writes:

Current logic says we hit the connection limit if:

if (!am_superuser &&
ReservedBackends > 0 &&
!HaveNFreeProcs(ReservedBackends))

Couldn't we just change this to:

if ((!am_superuser || am_walsender) &&
ReservedBackends > 0 &&
!HaveNFreeProcs(ReservedBackends))

As of the patch I just committed, that code is not reached anymore by a
walsender process. However, it shouldn't be hard to put a similar test
into the walsender code path.

regards, tom lane

#30Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#26)
Re: [HACKERS] Streaming replication document improvements

On Tue, Apr 20, 2010 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:

3. Your proposal
   Treat superuser replication connection like non-superuser one

Well, only for this one very specific purpose.  I would adjust the
docs like this:

Determines the number of connection "slots" that are reserved for
connections by PostgreSQL  superusers. At most max_connections
connections can ever be active simultaneously. Whenever the number of
active concurrent connections is at least max_connections minus
superuser_reserved_connections, new connections will be accepted only
for superusers, and no new replication connections will be accepted.

I think that's pretty simple and clear.

Yeah, I'm sold.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#31Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#29)
Re: [HACKERS] Streaming replication document improvements

On Tue, Apr 20, 2010 at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Current logic says we hit the connection limit if:

        if (!am_superuser &&
                ReservedBackends > 0 &&
                !HaveNFreeProcs(ReservedBackends))

Couldn't we just change this to:

        if ((!am_superuser || am_walsender) &&
                ReservedBackends > 0 &&
                !HaveNFreeProcs(ReservedBackends))

As of the patch I just committed, that code is not reached anymore by a
walsender process.  However, it shouldn't be hard to put a similar test
into the walsender code path.

Thanks for the heads up. It doesn't look hard to put a similar test
in the walsender code path, but is there any reason to duplicate the
code? Seems like we might be able to just put this test (with the
necessary modification) right before this comment:

/*
* If walsender, we're done here --- we don't want to connect to any
* particular database.
*/

In fact, in some ways, it seems better to put it up there. If the
database is really being flooded with connection attempts, we want
each rejected connection attempt to hold a backend slot for as little
time as possible...

...Robert

#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#31)
Re: [HACKERS] Streaming replication document improvements

Robert Haas <robertmhaas@gmail.com> writes:

Thanks for the heads up. It doesn't look hard to put a similar test
in the walsender code path, but is there any reason to duplicate the
code? Seems like we might be able to just put this test (with the
necessary modification) right before this comment:

Hm, actually I think you're right: we could move both of those
connection-rejecting tests up to before the walsender exit. The
only extra state we need is ReservedBackends, which should be valid
at that point (in particular, it can't be affected by any process-local
GUC settings).

+1 for just moving the test.

regards, tom lane

#33Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#32)
Re: [HACKERS] Streaming replication document improvements

On Wed, Apr 21, 2010 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Thanks for the heads up.  It doesn't look hard to put a similar test
in the walsender code path, but is there any reason to duplicate the
code?  Seems like we might be able to just put this test (with the
necessary modification) right before this comment:

Hm, actually I think you're right: we could move both of those
connection-rejecting tests up to before the walsender exit.  The

I am guessing that by "both of those connection-rejecting tests", you
mean the one that can throw "must be superuser to connect during
database shutdown" as well as the one we were discussing, which can
throw "connection limit exceeded for non-superusers", in which case...

only extra state we need is ReservedBackends, which should be valid
at that point (in particular, it can't be affected by any process-local
GUC settings).

+1 for just moving the test.

...shouldn't we move the "tests", plural, rather than just the one?
It seems right to reject new SR connections during shutdown.

...Robert

#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#33)
Re: [HACKERS] Streaming replication document improvements

Robert Haas <robertmhaas@gmail.com> writes:

...shouldn't we move the "tests", plural, rather than just the one?
It seems right to reject new SR connections during shutdown.

Yeah; you'd also need to adjust both of them to consider am_walsender.
(IOW, we want to treat SR connections as non-superuser for both tests.)

regards, tom lane
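
For reference, here is a minimal sketch of where this subthread ends
up: both connection-rejecting tests hoisted above the early walsender
exit in InitPostgres, with am_walsender folded into each so that
replication connections are treated as non-superuser for both. This is
a sketch, not the committed patch; "in_shutdown" is a hypothetical
stand-in for however the real code detects the shutdown state, and the
error codes are assumptions, though both error messages are quoted in
#33 above.

	/*
	 * Sketch only: treat walsenders as non-superusers for both checks,
	 * so replication connections can neither connect during shutdown
	 * nor consume the reserved superuser slots.  "in_shutdown" is a
	 * placeholder for the real shutdown-state test.
	 */
	if ((!am_superuser || am_walsender) && in_shutdown)
		ereport(FATAL,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("must be superuser to connect during database shutdown")));

	if ((!am_superuser || am_walsender) &&
		ReservedBackends > 0 &&
		!HaveNFreeProcs(ReservedBackends))
		ereport(FATAL,
				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
				 errmsg("connection limit exceeded for non-superusers")));

	/*
	 * If walsender, we're done here --- we don't want to connect to any
	 * particular database.
	 */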