no XLOG during COPY?
Back in February, Tom said here:
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00963.php :
That defeats a couple of optimizations that
Simon put in recently. The one for no XLOG during COPY is not too
hard to see how to re-enable, but I'm not sure what else there was.
Could someone please point me at where this optimization was committed?
I'm having trouble locating it.
Thanks
andrew
On Thu, Sep 11, 2008 at 9:01 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Could someone please point me at where this optimization was committed? I'm
having trouble locating it.
I think it's this one:
http://archives.postgresql.org/pgsql-committers/2007-01/msg00296.php
--
Guillaume
Andrew Dunstan wrote:
Back in February, Tom said here:
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00963.php :
That defeats a couple of optimizations that
Simon put in recently. The one for no XLOG during COPY is not too
hard to see how to re-enable, but I'm not sure what else there was.
Could someone please point me at where this optimization was committed?
I'm having trouble locating it.
http://archives.postgresql.org/pgsql-committers/2007-01/msg00296.php
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Andrew Dunstan wrote:
Back in February, Tom said here:
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00963.php :
That defeats a couple of optimizations that
Simon put in recently. The one for no XLOG during COPY is not too
hard to see how to re-enable, but I'm not sure what else there was.
Could someone please point me at where this optimization was
committed? I'm having trouble locating it.
http://archives.postgresql.org/pgsql-committers/2007-01/msg00296.php
Great, thanks (and also to Guillaume).
It looks to me like the simple way around this issue would be to provide
an option to have pg_restore emit:
begin; truncate foo; copy foo ... commit;
The truncate will be trivial as there won't be any data or indexes at
that stage anyway.
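For concreteness, a sketch of what the emitted script might look like
(the table and data here are invented, and the WAL bypass only applies
when WAL archiving is off):

BEGIN;
-- Truncating foo in the same transaction that loads it lets the server
-- skip WAL-logging the COPY data; the relation is fsync'd at commit
-- instead.
TRUNCATE foo;
COPY foo FROM stdin;
1	one
2	two
\.
COMMIT;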
cheers
andrew
On Thu, 2008-09-11 at 15:25 -0400, Andrew Dunstan wrote:
Great, thanks (and also to Guillaume).
It looks to me like the simple way around this issue would be to provide
an option to have pg_restore emit:
begin; truncate foo; copy foo ... commit;
The truncate will be trivial as there won't be any data or indexes at
that stage anyway.
Not sure which stage you're talking about. If this is a parallel restore
and you are running a create in one session and a load in another, then
ISTM you have no way of knowing that for certain.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
On Thu, 2008-09-11 at 15:25 -0400, Andrew Dunstan wrote:
Great, thanks (and also to Guillaume).
It looks to me like the simple way around this issue would be to provide
an option to have pg_restore emit:
begin; truncate foo; copy foo ... commit;
The truncate will be trivial as there won't be any data or indexes at
that stage anyway.
Not sure which stage you're talking about. If this is a parallel restore
and you are running a create in one session and a load in another, then
ISTM you have no way of knowing that for certain.
Er, who doesn't know what for certain, exactly? pg_restore will
certainly know that it has created the table in another session and can
thus safely truncate the table in the same transaction as the data load.
cheers
andrew
Andrew Dunstan wrote:
[snip]
Er, who doesn't know what for certain, exactly? pg_restore will
certainly know that it has created the table in another session and
can thus safely truncate the table in the same transaction as the data
load.
cheers
andrew
I'm confused about why table definition and data can't be loaded in the
same backend and transaction. Can somebody explain that?
Items in a dependency chain like A -> B -> C -> D should all be loaded
in the same transaction, as they are serially dependent. I can't think
of a case where loading a table's data requires anything more than the
table itself. Foreign keys might create such a requirement, but if every
table is loaded along with its data I can't see how: the referenced
tables would simply have to be loaded before the referencing table. And
in any case, I think those constraints are added at the end anyway.
The first cut of this may not have the dependency-resolution smarts to
work out how best to group restore items to send to a backend as a
unit. My research into how the directed-graph dependency information is
stored should allow the work to be dished out to backends in the best
possible way. But currently there is no graph as such, just a serial
list of items that are safe to load. Building the graph would give a
better idea of the maximum achievable concurrency, based on what
depends on what; the graph has to be built from the dependency
information that's stored.
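Purely for illustration (pg_restore actually keeps the TOC and its
dependency information in memory, not in a table): if the (item,
depends_on) pairs were materialized, a recursive query could assign
each item a depth in the graph. Items sharing a depth never depend on
one another, so each depth level is a candidate for parallel dispatch:

CREATE TABLE toc_dep (item text, depends_on text);
INSERT INTO toc_dep VALUES
    ('B', 'A'), ('C', 'B'), ('D', 'C'),  -- the serial chain A -> B -> C -> D
    ('Y', 'X');                          -- an unrelated chain X -> Y

-- Roots are items that have no dependencies of their own; an item's
-- depth is the length of the longest dependency path leading to it.
WITH RECURSIVE lvl(item, depth) AS (
    SELECT DISTINCT depends_on, 0
      FROM toc_dep
     WHERE depends_on NOT IN (SELECT item FROM toc_dep)
    UNION ALL
    SELECT d.item, l.depth + 1
      FROM toc_dep d JOIN lvl l ON d.depends_on = l.item
)
SELECT item, max(depth) AS depth
  FROM lvl
 GROUP BY item
 ORDER BY depth, item;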
Is it also feasible to have the -1 (single transaction) option complete
the largest possible work unit inside a single transaction? That would
mean one transaction per backend work unit, e.g. (A, B, C, D) in the
above. I don't know if indexes can skip WAL when they are built in the
same transaction that created the table, but that would seem like
another win if they were added at the same time as the table. It does
play against the idea of running all of the index creation statements
in parallel to get the benefit of synchronized scans. I don't know
what's going to be the biggest win on big hardware, as I don't have
any. Just something to think about.
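For illustration, the kind of per-backend work unit I mean might look
something like this (names invented; whether the index build can skip
WAL here is exactly the open question above):

BEGIN;
CREATE TABLE a (id integer, val text);
COPY a FROM stdin;
1	one
2	two
\.
-- If the index build can also avoid WAL in this situation, the whole
-- unit generates almost no WAL traffic.
CREATE INDEX a_id_idx ON a (id);
COMMIT;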
Thanks
Russell Smith
Russell Smith wrote:
Andrew Dunstan wrote:
[snip]
Er, who doesn't know what for certain, exactly? pg_restore will
certainly know that it has created the table in another session and
can thus safely truncate the table in the same transaction as the data
load.
cheers
andrew
I'm confused about why table definition and data can't be loaded in the
same backend and transaction. Can somebody explain that?
That would delay other things that depend on the table definition (e.g.
function definitions). If we don't need to make this link (and we don't)
then I can't see why we should shackle ourselves with it.
cheers
andrew