pg_dump object sorting
I have been looking at refining the sorting of objects in pg_dump to
make it take advantage of buffering and synchronised scanning, and
possibly make parallel restoration simpler and more efficient.
My first thought was to sort indexes by <namespace, tablename,
indexname> instead of by <namespace, indexname>. However, that doesn't
go far enough, I think. Is there any reason we can't do all of a table's
indexes and non-FK constraints together? Would that affect anything other
than PK and UNIQUE constraints, given that NOT NULL and CHECK constraints
are included in table definitions?
cheers
andrew
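The sort-key change proposed above can be illustrated with a minimal sketch. It is not pg_dump's actual code (pg_dump is written in C); the `IndexEntry` type, the function names, and the sample index names are all hypothetical and exist only to show how the two keys order the same set of indexes.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    # Hypothetical stand-in for a dumpable index object.
    namespace: str
    table: str
    name: str


def sort_by_index_name(entries):
    """Order by <namespace, indexname>, the ordering described as current."""
    return sorted(entries, key=lambda e: (e.namespace, e.name))


def sort_by_table(entries):
    """Order by <namespace, tablename, indexname>, the proposed ordering."""
    return sorted(entries, key=lambda e: (e.namespace, e.table, e.name))


# Sample data (invented): index names chosen so that name order
# interleaves the two tables.
entries = [
    IndexEntry("public", "orders", "a_orders_date"),
    IndexEntry("public", "customers", "b_cust_email"),
    IndexEntry("public", "orders", "c_orders_cust"),
    IndexEntry("public", "customers", "d_cust_name"),
]

if __name__ == "__main__":
    for e in sort_by_index_name(entries):
        print("by name: ", e.table, e.name)
    for e in sort_by_table(entries):
        print("by table:", e.table, e.name)
```

With the name-only key the indexes of `orders` and `customers` alternate; with the table-aware key each table's indexes come out adjacent.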
On Mon, 2008-04-14 at 11:18 -0400, Andrew Dunstan wrote:
I have been looking at refining the sorting of objects in pg_dump to
make it take advantage of buffering and synchronised scanning, and
possibly make parallel restoration simpler and more efficient.
Synchronized scanning is explicitly disabled in pg_dump. That was a
last-minute change to answer Greg Stark's complaint about dumping a
clustered table:
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php
That hopefully won't be a permanent solution, because I think
synchronized scans are useful for pg_dump.
However, I'm not clear on how the pg_dump order would be able to better
take advantage of synchronized scans anyway. What did you have in mind?
Regards,
Jeff Davis
Jeff Davis wrote:
On Mon, 2008-04-14 at 11:18 -0400, Andrew Dunstan wrote:
I have been looking at refining the sorting of objects in pg_dump to
make it take advantage of buffering and synchronised scanning, and
possibly make parallel restoration simpler and more efficient.
Synchronized scanning is explicitly disabled in pg_dump. That was a
last-minute change to answer Greg Stark's complaint about dumping a
clustered table:
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php
That hopefully won't be a permanent solution, because I think
synchronized scans are useful for pg_dump.
However, I'm not clear on how the pg_dump order would be able to better
take advantage of synchronized scans anyway. What did you have in mind?
I should have expressed it better. The idea is to have pg_dump emit the
objects in an order that allows the restore to take advantage of sync
scans. So sync scans being disabled in pg_dump would not at all matter.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
I should have expressed it better. The idea is to have pg_dump emit the
objects in an order that allows the restore to take advantage of sync
scans. So sync scans being disabled in pg_dump would not at all matter.
Unless you do something to explicitly parallelize the operations,
how will a different ordering improve matters?
I thought we had a paper design for this, and it involved teaching
pg_restore how to use multiple connections. In that context it's
entirely up to pg_restore to manage the ordering and ensure dependencies
are met. So I'm not seeing how it helps to have a different sort rule
at pg_dump time --- it won't really make pg_restore's task any easier.
regards, tom lane
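The point that the restoring side can manage ordering itself, as long as dependencies are met, can be sketched as follows. This is only an illustration of dependency-driven scheduling, not the paper design mentioned above and not pg_restore's implementation; the `schedule` function and the item labels are hypothetical. Items in the same batch have no unmet dependencies and could in principle be handed to separate connections.

```python
def schedule(deps):
    """Group items into batches; each batch depends only on earlier batches.

    deps maps each item to the set of items it depends on. Every
    dependency must itself be a key of deps, otherwise it can never be
    satisfied and a ValueError is raised (the same error as for a cycle).
    """
    remaining = {item: set(d) for item, d in deps.items()}
    batches = []
    while remaining:
        # Everything whose dependencies are all done is ready to run now.
        ready = sorted(item for item, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        batches.append(ready)
        for item in ready:
            del remaining[item]
        # Mark the finished items as satisfied for everything still waiting.
        for d in remaining.values():
            d.difference_update(ready)
    return batches


# Invented example: two tables, their data, one index, one foreign key.
deps = {
    "TABLE a": set(),
    "TABLE b": set(),
    "DATA a": {"TABLE a"},
    "DATA b": {"TABLE b"},
    "INDEX a_idx": {"DATA a"},
    "FK b->a": {"DATA a", "DATA b", "INDEX a_idx"},
}

if __name__ == "__main__":
    for number, batch in enumerate(schedule(deps), start=1):
        print(number, batch)
```

Note that the result depends only on the dependency graph, not on the order in which the items were listed, which is the reason a different sort rule at dump time would not change what such a scheduler produces.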
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I should have expressed it better. The idea is to have pg_dump emit the
objects in an order that allows the restore to take advantage of sync
scans. So sync scans being disabled in pg_dump would not at all matter.
Unless you do something to explicitly parallelize the operations,
how will a different ordering improve matters?
I thought we had a paper design for this, and it involved teaching
pg_restore how to use multiple connections. In that context it's
entirely up to pg_restore to manage the ordering and ensure dependencies
are met. So I'm not seeing how it helps to have a different sort rule
at pg_dump time --- it won't really make pg_restore's task any easier.
Well, what actually got me going on this initially was that I got
annoyed by having indexes not grouped by table when I dumped out the
schema of a database, because it seemed a bit illogical. Then I started
thinking about it and it seemed to me that even without synchronised
scanning or parallel restoration, we might benefit from building all the
indexes of a given table together, especially if the whole table could
fit in either our cache or the OS cache.
cheers
andrew
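The caching argument above can be made concrete with a deliberately crude model. It assumes that only one table stays cached at a time, so every change of table between consecutive index builds costs one fresh read of that table. That assumption, the `count_table_loads` function, and the sample data are all hypothetical; nothing here is a measurement.

```python
def count_table_loads(build_order):
    """Count table reads under the one-table-in-cache assumption.

    build_order is a sequence of (table, index) pairs in restore order.
    A read is counted whenever the table differs from the previous one.
    """
    loads = 0
    previous = None
    for table, _index in build_order:
        if table != previous:
            loads += 1
            previous = table
    return loads


# Invented example: index builds interleaved across two tables,
# and the same builds grouped by table.
interleaved = [
    ("orders", "a_orders_date"),
    ("customers", "b_cust_email"),
    ("orders", "c_orders_cust"),
    ("customers", "d_cust_name"),
]
grouped = sorted(interleaved)

if __name__ == "__main__":
    print("interleaved:", count_table_loads(interleaved))
    print("grouped:    ", count_table_loads(grouped))
```

Under this model the interleaved order reads a table four times and the grouped order twice. Whether a real restore sees a comparable gain depends on table size relative to the available cache, which the sketch does not model.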