QSoC proposal: Rewrite pg_dump and pg_restore
Hello!
Here is the text of my proposal which I've applied to GSoC.
(and link
https://docs.google.com/document/d/1s-Q4rzEysPxo-dINsk_eKFJOBoVjNYDrQ-Oh75gtYEM/edit?usp=sharing)
Any suggestions and comments are welcome.
*
PostgreSQL GSoC 2014 proposal
Project name
Rewrite (add) pg_dump and pg_restore utilities as libraries (.so, .dll &
.dylib)
Short description
pg_dump is a utility for backing up a PostgreSQL database. It makes
consistent backups even if the database is being used concurrently.
pg_restore is a utility for restoring a PostgreSQL database from an
archive created by pg_dump in one of the non-plain-text formats. I think
it will be more comforatable to use these tools as libraries.
Name: Alexander Shvidchenko
E-mail: askellio@gmail.com <mailto:askellio@gmail.com>
Location: Rostov-on-Don, Russia (UTC +04.00)
Benefits to the PostgreSQL Community
This feature will expand opportunities to work with backups of
databases. Especially, it’s important for third-party developers. It
will be easier to use the functionality of the tools in applications
when they will be libraries.
Quantifiable results
Static and dynamic libraries with the functionality of pg_dump and
pg_restore tools (.so, .dll and .dylib files)
Project Schedule
until May 31
Make code review and solve architecture questions with help of community
1 June – 30 June
Detailed implementation of libraries.
1 July – 31 July
Finish Implementation of libraires and begin testing.
1 August -15 August
Final refactoring, testing and commit.
Academic experience
I entered the university in 2013. Before entering the university I
finished the college in 2012. My graduate work in the college was the
client-server application. It was a realization of XMPP. The client was
realized in Qt. The client worked with SQLite database and the server
worked with MySQL database.
Why is PostgreSQL?
- I’m intereted in this idea and believe this project would be useful
for the community;
- PostgreSQL is a very respected community. I would be proud to be a
part of it;
- PostgreSQL is one of the best DBMS and I would like to make it better.
Links
1) PostgreSQL 9.3.3 Documentation, pg_dump
http://www.postgresql.org/docs/9.3/static/app-pgdump.html
2) PostgreSQL 9.3.3 Documentation, pg_restore
http://www.postgresql.org/docs/9.3/static/app-pgrestore.html
3) Static library
http://en.wikipedia.org/wiki/Static_library
4) Dynamic library
http://en.wikipedia.org/wiki/Dynamic-link_library
*
With best wishes,
Alexander S.
On Tue, Mar 18, 2014 at 8:41 PM, Alexandr <askellio@gmail.com> wrote:
Rewrite (add) pg_dump and pg_restore utilities as libraries (.so, .dll &
.dylib)
This strikes me as (1) pretty vague and (2) probably too hard for a
summer project.
I mean, getting the existing binaries to build libraries that you can
call with some trivial interface that mimics the existing command-line
functionality of pg_dump might be doable, but that's not all that
interesting. What people are really going to want is a library with a
sophisticated API that lets you do interesting things
programmatically. But that's going to be hard. AFAIK, nobody's even
tried to figure out what that API should look like. Even if we had
that worked out, a non-trivial task, the pg_dump source code is a
mess, so refactoring it to provide such an API is likely to be a job
and a half.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/21/2014 09:28 AM, Robert Haas wrote:
On Tue, Mar 18, 2014 at 8:41 PM, Alexandr <askellio@gmail.com> wrote:
Rewrite (add) pg_dump and pg_restore utilities as libraries (.so, .dll &
.dylib)This strikes me as (1) pretty vague and (2) probably too hard for a
summer project.I mean, getting the existing binaries to build libraries that you can
call with some trivial interface that mimics the existing command-line
functionality of pg_dump might be doable, but that's not all that
interesting. What people are really going to want is a library with a
sophisticated API that lets you do interesting things
programmatically. But that's going to be hard. AFAIK, nobody's even
tried to figure out what that API should look like. Even if we had
that worked out, a non-trivial task, the pg_dump source code is a
mess, so refactoring it to provide such an API is likely to be a job
and a half.
... and still wouldn't solve one of the most frequently requested things
for pg_dump / pg_restore, which is the ability to use them *server-side*
over a regular PostgreSQL connection. It'd be useful progress toward
that, though.
Right now, we can't even get the PostgreSQL server to emit DDL for a
table, let alone do anything more sophisticated.
Here's how I think it needs to look:
- Design a useful API for pg_dump and pg_restore that is practical to
use for pg_dump and pg_restore's current tasks (fast database
dump/restore) and also useful for extracting specific objects
from the database. When designing, consider that we'll want to
expose this API or functions that use it over SQL later.
- Create a new "libpqdump" library.
- Implement the designed API in the new library, moving and
adjusting code from pg_dump / pg_restore where possible, writing
new code where not.
- Refactor (closer to rewrite) pg_dump and pg_restore to use libpqdump,
removing as much knowledge of the system catalogs etc as possible from
them.
- Make sure the result still performs OK
THEN, once that's settled in:
- Modify libpqdump to support compilation as a backend extension, with
use of the SPI for queries and use of syscaches or direct scans
where possible.
- Write a "pg_dump" extension that uses libpqdump in SPI mode
to expose its API over SQL, or at least uses it to provide SQL
functions to describe database objects. So you can dump a DB,
or a subset of it, over SQL.
After all, a "libpgdump" won't do much good for the large proportion of
PostgreSQL users who use Java/JDBC, who can't use a native library
(without hideous hacks with JNI). For the very large group who use libpq
via language-specific client interfaces like the Pg gem for Ruby,
psycopg2 for Python, DBD::Pg for Perl, etc, it'll require a lot of work
to wrap the API and maintain it. Wheras a server-side SQL-callable
interface would be useful and immediately usable for all of them.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Craig Ringer <craig@2ndquadrant.com> writes:
Here's how I think it needs to look:
[ move all the functionality to the backend ]
Of course, after you've done all that work, you've got something that is
of exactly zero use to its supposed principal use-case, pg_dump. pg_dump
will still have to support server versions that predate all these fancy
new dump functions, and that pretty much ensures that most of pg_dump's
core functionality will still be on the client side. Or, if you try to
finesse that problem by making sure the new server APIs correspond to
easily-identified pieces of pg_dump code, you'll probably end up with APIs
that nobody else wants to use :-(.
In any case, I quite agree with the sentiment that this is not a suitable
problem for a GSOC project.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/21/2014 11:09 AM, Tom Lane wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
Here's how I think it needs to look:
[ move all the functionality to the backend ]Of course, after you've done all that work, you've got something that is
of exactly zero use to its supposed principal use-case, pg_dump. pg_dump
will still have to support server versions that predate all these fancy
new dump functions, and that pretty much ensures that most of pg_dump's
core functionality will still be on the client side. Or, if you try to
finesse that problem by making sure the new server APIs correspond to
easily-identified pieces of pg_dump code, you'll probably end up with APIs
that nobody else wants to use :-(.
Yeah, that's why it's necessary to create a "libpqdump" that's usable
client-side even if you want server-side dump support.
So it's "allow the functionality to be used from the backend as well",
not just "move all the functionality to the backend".
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Mar 21, 2014 at 4:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
Here's how I think it needs to look:
[ move all the functionality to the backend ]Of course, after you've done all that work, you've got something that is
of exactly zero use to its supposed principal use-case, pg_dump. pg_dump
will still have to support server versions that predate all these fancy
new dump functions, and that pretty much ensures that most of pg_dump's
core functionality will still be on the client side. Or, if you try to
finesse that problem by making sure the new server APIs correspond to
easily-identified pieces of pg_dump code, you'll probably end up with APIs
that nobody else wants to use :-(.
Or you should mandate that new server versions should be able to consume
_old_ pg_dump version output. This would change the recommended "when
upgrading, dump using the new pg_dump" to ""when upgrading, dump using the
old pg_dump".
This would be necessary policy going forward anyway, if most of the pg_dump
functionality was server-side, because it would be generating dumps in the
server-version dump format, not the client-version format.
'Regards
Marcin Mańk
(goes back to lurker cave...)
On Thu, Mar 20, 2014 at 11:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
Here's how I think it needs to look:
[ move all the functionality to the backend ]Of course, after you've done all that work, you've got something that is
of exactly zero use to its supposed principal use-case, pg_dump. pg_dump
will still have to support server versions that predate all these fancy
new dump functions, and that pretty much ensures that most of pg_dump's
core functionality will still be on the client side. Or, if you try to
finesse that problem by making sure the new server APIs correspond to
easily-identified pieces of pg_dump code, you'll probably end up with APIs
that nobody else wants to use :-(.
It's worse than that. If you put all the logic in the server, then a
dump taken on an older version won't be able to quote keywords added
in the newer version. Go directly to fail.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/21/2014 09:38 AM, Robert Haas wrote:
On Thu, Mar 20, 2014 at 11:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
Here's how I think it needs to look:
[ move all the functionality to the backend ]Of course, after you've done all that work, you've got something that is
of exactly zero use to its supposed principal use-case, pg_dump. pg_dump
will still have to support server versions that predate all these fancy
new dump functions, and that pretty much ensures that most of pg_dump's
core functionality will still be on the client side. Or, if you try to
finesse that problem by making sure the new server APIs correspond to
easily-identified pieces of pg_dump code, you'll probably end up with APIs
that nobody else wants to use :-(.It's worse than that. If you put all the logic in the server, then a
dump taken on an older version won't be able to quote keywords added
in the newer version. Go directly to fail.
Yeah. This tantalizing project has been looked at several times and
found to be a viper's nest.
What would be useful for many purposes, and is a long-standing project
of mine that I still haven't found time to make progress on, is that the
server should contain functions to produce the creation SQL for all its
own objects, free of the locks that pg_dump requires for consistency.
That would be a great SoC project, incidentally. I'd even volunteer to
mentor that one.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Dunstan escribi�:
What would be useful for many purposes, and is a long-standing
project of mine that I still haven't found time to make progress on,
is that the server should contain functions to produce the creation
SQL for all its own objects, free of the locks that pg_dump requires
for consistency.
Maybe you'd like my "DDL deparse" project, then. Right now it's only of
use for event triggers during DDL execution, but I don't see any strong
reason it cannot be used to reconstruct object creation commands from
only their identifying OID.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers