backup.sgml patch that adds information on custom format backups

Started by Ivan Lezhnjov IValmost 13 years ago6 messages

Ivan Lezhnjov IV

iliv@commandprompt.com

almost 13 years ago

1 attachment(s)

Hello,

I'd like to submit the following patch that extends backup.sgml with a bit of practical but important information.

Project: postgresql
Patch filename: backup.sgml-cmd-v001.patch

The patch extends backup.sgml and adds practical information on custom format backups approach. Basically, we believe that plaintext backup format is suitable for a very limited range of use cases, and that in real world people are usually better off with a custom format backup. This is what we want PostgreSQL users to be aware of and provide some hands-on examples of how to do backups using this approach.

It is meant for application, and is against master branch.

The patch does pass 'make check' and 'make html' successfully.

PS: this is my first submission ever. So, if I'm missing something or not doing it as expected, please, do let me know. Thank you.

Ivan

Attachments:

backup.sgml-cmd-v001.patchapplication/octet-stream; name=backup.sgml-cmd-v001.patchDownload

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index c4215be..4db52fa 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -102,6 +102,43 @@ pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable cl
    </para>
   </important>
 
+  <para>
+   However, the backup created from this method has limited usefulness. It
+   can be used to restore a single database in full. A more useful form of
+   <productname>PostgreSQL</> backup syntax looks like this:
+<synopsis>
+pg_dump -U <replaceable class="parameter">username</replaceable> --format=c --file=<replaceable class="parameter">mydatabase.sqlc</replaceable> <replaceable class="parameter">dbname</replaceable>
+</synopsis>
+   The options in detail are:
+<synopsis>
+ -U, --username=NAME      connect as specified database user
+ -F, --format=c|t|p       output file format (custom, tar, plain text)
+ -f, --file=FILENAME      output file name
+</synopsis>
+   The most important of which is <replaceable class="parameter">--format</replaceable>.
+   By default <application>pg_dump</> uses the plaintext format. In
+   practice, the plaintext format is useful for very small databases with
+   a minimal number of objects but other than that, it should be avoided.
+   The custom format allows for a wealth of customizability.
+  </para>
+
+  <para>
+   Using the custom format you are able to restore single objects from a
+   backup. For example to restore only a specified index from a backup
+   file:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --index=<replaceable class="parameter">indexname</replaceable>
+</synopsis>
+   If you wanted to restore only a single function:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --function=<replaceable class="parameter">functionname(args)</replaceable>
+</synopsis>
+   If you wanted to restore only a single table:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --table=<replaceable class="parameter">tablename</replaceable>
+</synopsis>
+  </para>
+
   <sect2 id="backup-dump-restore">
    <title>Restoring the Dump</title>
 
@@ -176,6 +213,35 @@ pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>h
    </important>
 
    <para>
+    When it comes to restoring a dump with a custom format the command
+    used to restore a backup file is <application>pg_restore</>. It has
+    similar options to <application>pg_dump</>. A simple restore:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">databasename</replaceable> <replaceable class="parameter">filename</replaceable>
+</synopsis>
+    Where <replaceable class="parameter">filename</replaceable> is the name of the backup file.
+   </para>
+
+   <important>
+    <para>
+     Do not confuse <option>--file</> with <replaceable
+     class="parameter">filename</replaceable>. The
+     <option>--file</> option is used to turn a custom format backup
+     into a plaintext backup. The value of <option>--file</> will be
+     used as the output file for that transformation.
+    </para>
+   </important>
+
+   <para>
+    If you make the mistake of creating a plaintext backup,
+    <application>pg_restore</> can not be used as a restoration
+    mechanism. You can use <application>psql</> to restore it:
+<programlisting>
+psql <replaceable class="parameter">dbname</replaceable> &lt; <replaceable class="parameter">backupfile</replaceable>
+</programlisting>
+   </para>
+
+   <para>
     After restoring a backup, it is wise to run <xref
     linkend="sql-analyze"> on each
     database so the query optimizer has useful statistics;
@@ -222,6 +288,22 @@ psql -f <replaceable class="parameter">infile</replaceable> postgres
     each database will be internally consistent, the snapshots of
     different databases might not be exactly in-sync.
    </para>
+
+   <para>
+    Unfortunately, <application>pg_dumpall</> can only create plaintext
+    backups and should be considered deprecated. However, it is the only way
+    to backup the globals in your cluster. A reasonable backup strategy to
+    backup your globals and produce a flexible backup of every database in
+    the cluster would look like this:
+<programlisting>
+pg_dumpall -g -Uusername --file=globals.sql;
+psql -AtU postgres -c "SELECT datname FROM pg_database \
+                          WHERE NOT datistemplate"| \
+while read f;
+   do pg_dump -Upostgres --format=c --file=$f.sqlc $f;
+done;
+</programlisting>
+   </para>
   </sect2>
 
   <sect2 id="backup-dump-large">

Robert Haas

robertmhaas@gmail.com

almost 13 years ago

In reply to: Ivan Lezhnjov IV (#1)

Re: backup.sgml patch that adds information on custom format backups

On Fri, Feb 8, 2013 at 1:56 PM, Ivan Lezhnjov IV <iliv@commandprompt.com> wrote:

Hello,

I'd like to submit the following patch that extends backup.sgml with a bit of practical but important information.

Project: postgresql
Patch filename: backup.sgml-cmd-v001.patch

The patch extends backup.sgml and adds practical information on custom format backups approach. Basically, we believe that plaintext backup format is suitable for a very limited range of use cases, and that in real world people are usually better off with a custom format backup. This is what we want PostgreSQL users to be aware of and provide some hands-on examples of how to do backups using this approach.

It is meant for application, and is against master branch.

The patch does pass 'make check' and 'make html' successfully.

PS: this is my first submission ever. So, if I'm missing something or not doing it as expected, please, do let me know. Thank you.

Generally it's a good idea to discuss whatever you'd like to change
before you actually write the patch, so as to get consensus on it.
I'm not sure what others think, but the proposed wording seems a bit
hard on plain text dumps to me. For many people, the advantage of
having a file in a format you can read may outweigh the disadvantages
of being unable to do a selective restore. Not that custom-format
dumps aren't great; they certainly are. But it isn't a bug that we
support more than one format.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kevin Grittner

kgrittn@ymail.com

almost 13 years ago

In reply to: Robert Haas (#2)

Re: backup.sgml patch that adds information on custom format backups

Robert Haas <robertmhaas@gmail.com> wrote:

I'm not sure what others think, but the proposed wording seems a
bit hard on plain text dumps to me.

Agreed. I don't know how many times I've piped the output of
pg_dump to the input of psql. Certainly that was very common
before pg_upgrade was available. And for development purposes a
text script file is often exactly what is needed -- to store it in
custom format first would just be an extra step which would waste
time. Since I got my head around PITR dumps, I've rarely had a
reason to use the pg_dump custom format, myself.

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Robert Haas (#2)

Re: backup.sgml patch that adds information on custom format backups

Robert Haas <robertmhaas@gmail.com> writes:

I'm not sure what others think, but the proposed wording seems a bit
hard on plain text dumps to me.

I wasn't terribly thrilled with labeling pg_dumpall "deprecated",
either. It might be imperfect for some use cases, but that adjective
suggests that we're trying to get rid of it, which surely isn't the
case.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Ivan Lezhnjov IV

iliv@commandprompt.com

almost 13 years ago

In reply to: Tom Lane (#4)

Re: backup.sgml patch that adds information on custom format backups

Hello everyone!

I'd like to thank you for quick replies and for the thoughtful feedback.

I'm working on improving the current wording and I'm going to follow up shortly with an updated version.

Please, stay tuned.

Ivan

On Feb 11, 2013, at 12:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm not sure what others think, but the proposed wording seems a bit
hard on plain text dumps to me.

I wasn't terribly thrilled with labeling pg_dumpall "deprecated",
either. It might be imperfect for some use cases, but that adjective
suggests that we're trying to get rid of it, which surely isn't the
case.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Ivan Lezhnjov IV

iliv@commandprompt.com

almost 13 years ago

In reply to: Ivan Lezhnjov IV (#5)

1 attachment(s)

Re: backup.sgml patch that adds information on custom format backups

Alright gentlemen here we go, take two.

backup.sgml-cmd-v002.patch addresses your feedback and offers a better worded statements that avoid implying that some features are being deprecated when it isn't the case. We also spent some more time polishing other details, like making adjustments to the tone of the text so that it sounds like a manual, not a blog post, more importantly, making it clear that superuser privileges are not always required to perform a successful backup in practice as long as the role used to make the backup has sufficient read privileges on all of the objects a user is interested in. We also mention and show examples of usage for pigz alongside with gzip, and probably something else too :P Please, see the diff file attached to this message for complete and detailed log of the changes.

Attachments:

backup.sgml-cmd-v002.patchapplication/octet-stream; name=backup.sgml-cmd-v002.patchDownload

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index c4215be..711b61e 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -38,7 +38,7 @@
 pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable class="parameter">outfile</replaceable>
 </synopsis>
    As you see, <application>pg_dump</> writes its result to the
-   standard output. We will see below how this can be useful.
+   standard output.
   </para>
 
   <para>
@@ -47,8 +47,11 @@ pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable cl
    that you can perform this backup procedure from any remote host that has
    access to the database. But remember that <application>pg_dump</>
    does not operate with special permissions. In particular, it must
-   have read access to all tables that you want to back up, so in
-   practice you almost always have to run it as a database superuser.
+   have read access to all tables that you want to back up. Whether
+   it is going to have proper access permissions is determined by
+   privileges granted to a role used to connect to a database.
+   As long as the role has read access to tables, sequences, etc.
+   it need not to be a superuser role.
   </para>
 
   <para>
@@ -95,13 +98,49 @@ pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable cl
 
   <important>
    <para>
-    If your database schema relies on OIDs (for instance, as foreign
-    keys) you must instruct <application>pg_dump</> to dump the OIDs
-    as well. To do this, use the <option>-o</option> command-line
-    option.
+    The use of OIDs on user objects is deprecated. If your database schema
+    relies on OIDs (for instance, as foreign keys) you must instruct
+    <application>pg_dump</> to dump the OIDs as well. To do this, use 
+    the <option>-o</option> command-line option.
    </para>
   </important>
 
+  <para>
+   This type of dump can be used to restore a database in full. However,
+   there is a more sophisticated form of <productname>PostgreSQL</> backup
+   that allows for far greater control when working with backups. A syntax looks like this:
+<synopsis>
+pg_dump -U <replaceable class="parameter">username</replaceable> --format=c --file=<replaceable class="parameter">mydatabase.sqlc</replaceable> <replaceable class="parameter">dbname</replaceable>
+</synopsis>
+   The options in detail are:
+<synopsis>
+ -U, --username=NAME      connect as specified database user
+ -F, --format=c|t|p       output file format (custom, tar, plain text)
+ -f, --file=FILENAME      output file name
+</synopsis>
+   The most important of which is <replaceable class="parameter">--format</replaceable>.
+   By default <application>pg_dump</> uses the plaintext format. In
+   some cases you may be better off using a custom format backup, though.
+   The custom format allows for a wealth of customizability.
+  </para>
+
+  <para>
+   Using the custom format you are able to restore single objects from a
+   backup. For example to restore only a specified index from a backup
+   file:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --index=<replaceable class="parameter">indexname</replaceable>
+</synopsis>
+   If you wanted to restore only a single function:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --function=<replaceable class="parameter">functionname(args)</replaceable>
+</synopsis>
+   If you wanted to restore only a single table:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --table=<replaceable class="parameter">tablename</replaceable>
+</synopsis>
+  </para>
+
   <sect2 id="backup-dump-restore">
    <title>Restoring the Dump</title>
 
@@ -176,6 +215,34 @@ pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>h
    </important>
 
    <para>
+    When it comes to restoring a dump with a custom format the command
+    used to restore a backup file is <application>pg_restore</>. It has
+    similar options to <application>pg_dump</>. A simple restore:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">databasename</replaceable> <replaceable class="parameter">filename</replaceable>
+</synopsis>
+    Where <replaceable class="parameter">filename</replaceable> is the name of the backup file.
+   </para>
+
+   <important>
+    <para>
+     Do not confuse <option>--file</> with <replaceable
+     class="parameter">filename</replaceable>. The
+     <option>--file</> option is used to turn a custom format backup
+     into a plaintext backup. The value of <option>--file</> will be
+     used as the output file for that transformation.
+    </para>
+   </important>
+
+   <para>
+    If what you have is a plaintext backup <application>pg_restore</> can
+    not be used as a restoration mechanism. You can use <application>psql</> to restore it:
+<programlisting>
+psql <replaceable class="parameter">dbname</replaceable> &lt; <replaceable class="parameter">backupfile</replaceable>
+</programlisting>
+   </para>
+
+   <para>
     After restoring a backup, it is wise to run <xref
     linkend="sql-analyze"> on each
     database so the query optimizer has useful statistics;
@@ -222,14 +289,30 @@ psql -f <replaceable class="parameter">infile</replaceable> postgres
     each database will be internally consistent, the snapshots of
     different databases might not be exactly in-sync.
    </para>
+
+   <para>
+    Unfortunately, <application>pg_dumpall</> can only create plaintext
+    backups. However, it is currently the only way to backup the globals in your
+    cluster. So, a reasonable backup strategy to backup your globals and
+    produce a flexible backup of every database in the cluster would
+    look like this:
+<programlisting>
+pg_dumpall -g -Uusername --file=globals.sql;
+psql -AtU postgres -c "SELECT datname FROM pg_database \
+                          WHERE NOT datistemplate"| \
+while read f;
+   do pg_dump -Upostgres --format=c --file=$f.sqlc $f;
+done;
+</programlisting>
+   </para>
   </sect2>
 
   <sect2 id="backup-dump-large">
    <title>Handling Large Databases</title>
 
    <para>
-    Some operating systems have maximum file size limits that cause
-    problems when creating large <application>pg_dump</> output files.
+    Although rare in modern operating systems, sometimes maximum file size limits may
+    cause problems when creating large <application>pg_dump</> output files.
     Fortunately, <application>pg_dump</> can write to the standard
     output, so you can use standard Unix tools to work around this
     potential problem.  There are several possible methods:
@@ -239,12 +322,20 @@ psql -f <replaceable class="parameter">infile</replaceable> postgres
     <title>Use compressed dumps.</title>
     <para>
      You can use your favorite compression program, for example
-     <application>gzip</application>:
+     <application>gzip</application> or <application>pigz</application> which 
+     is a parallel implementation of <application>gzip</application> for modern
+     multi-processor, multi-core machines:
 
 <programlisting>
 pg_dump <replaceable class="parameter">dbname</replaceable> | gzip &gt; <replaceable class="parameter">filename</replaceable>.gz
 </programlisting>
 
+     or:
+
+<programlisting>
+pg_dump <replaceable class="parameter">dbname</replaceable> | pigz &gt; <replaceable class="parameter">filename</replaceable>.gz
+</programlisting>
+
      Reload with:
 
 <programlisting>
@@ -254,7 +345,7 @@ gunzip -c <replaceable class="parameter">filename</replaceable>.gz | psql <repla
      or:
 
 <programlisting>
-cat <replaceable class="parameter">filename</replaceable>.gz | gunzip | psql <replaceable class="parameter">dbname</replaceable>
+pigz -dc <replaceable class="parameter">filename</replaceable>.gz | psql <replaceable class="parameter">dbname</replaceable>
 </programlisting>
     </para>
    </formalpara>