pg_dump and LOs (another proposal)

Started by Philip Warner over 25 years ago. 18 messages.
#1 Philip Warner
pjw@rhyme.com.au

Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
I find myself thinking it may be a waste of effort for the moment. A
modified version of the system used by Pavel Janik in pg_dumplo may be
substantially more reliable than my previous proposal:

To Dump
-------

Dump all LOs by looking in pg_class for relkind='l'.

Don't bother cross-referencing with actual table entries, since we are
trying to do a backup rather than a consistency check.

The dump will consist of the LO and its original OID.

To Load
-------

Create a temporary table, lo_xref, with appropriate indexes.

Reload the LOs, storing the old & new OID of each in lo_xref.

Now, disable triggers and sequentially search through all tables that have
one or more OID columns: for each OID column, check whether the column value
is in lo_xref; if it is, update it with the new value.
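
A minimal libpq sketch of the load side, under the assumption that
pg_restore drives the whole reload over a single connection; the names
lo_xref, userTable and oidcolumn are hypothetical, and a real tool would
discover the OID columns from the catalogs:

/*
 * Sketch of the proposed reload scheme: build the old->new OID map in a
 * temp table, then rewrite references with one UPDATE per OID column.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void do_sql(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }
    PQclear(res);
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=target");
    if (PQstatus(conn) != CONNECTION_OK)
        exit(1);

    /* Old-to-new OID map, indexed for the fix-up pass. */
    do_sql(conn, "CREATE TEMP TABLE lo_xref (oldOid oid, newOid oid)");
    do_sql(conn, "CREATE INDEX lo_xref_idx ON lo_xref (oldOid)");

    /* ... reload each LO here, inserting one (oldOid, newOid) row ... */

    /* Fix-up pass, repeated for each table/column that holds LO refs. */
    do_sql(conn, "UPDATE userTable SET oidcolumn = lo_xref.newOid "
                 "FROM lo_xref WHERE userTable.oidcolumn = lo_xref.oldOid");

    PQfinish(conn);
    return 0;
}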

For large databases, this system will rely heavily on lo_xref, so my main
worries are:

1. How are temp tables stored? (e.g. if in memory, this is a problem -
Dec/Rdb stores temp tables in memory).

2. Are there any limitations on indexes on temp tables? (I seem to be able
to create them, at least - Dec/Rdb won't even let you do that).

----------------------------------------------------------------
Philip Warner, Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: (+61) 0500 83 82 81 | Fax: (+61) 0500 83 82 82
http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#1)
Re: pg_dump and LOs (another proposal)

Philip Warner <pjw@rhyme.com.au> writes:

Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
I find myself thinking it's a waste of effort for the moment. A modified
version of the system used by Pavel Janik in pg_dumplo may be substantially
more reliable than my previous proposal:

I like this a lot better than trying to restore the original OIDs. For
one thing, the restore-original-OIDs idea cannot be made to work if what
we want to do is load additional tables into an existing database.

For large databases, this system will rely heavily on lo_xref, so my main
worries are:

1. How are temp tables stored? (e.g. if in memory, this is a problem -
Dec/Rdb stores temp tables in memory).

2. Are there any limitations on indexes on temp tables? (I seem to be able
to create them, at least - Dec/Rdb won't even let you do that).

No problem. A temp table is a table, it's just got a unique name under
the hood. (So do its indices, IIRC...)

regards, tom lane

#3 Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#2)
Re: Re: pg_dump and LOs (another proposal)

At 11:09 5/07/00 -0400, Tom Lane wrote:

Philip Warner <pjw@rhyme.com.au> writes:

Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
I find myself thinking it's a waste of effort for the moment. A modified
version of the system used by Pavel Janik in pg_dumplo may be substantially
more reliable than my previous proposal:

I like this a lot better than trying to restore the original OIDs. For
one thing, the restore-original-OIDs idea cannot be made to work if what
we want to do is load additional tables into an existing database.

The thing that bugs me about this is that for 30,000 rows, I do 30,000
updates after the restore. It seems *really* inefficient, not to mention slow.

I'll also have to modify pg_restore to talk to the database directly (for
lo import). As a result I will probably send the entire script directly
from within pg_restore. Do you know if comment parsing ('--') is done in
the backend, or psql?

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#3)
Re: Re: pg_dump and LOs (another proposal)

Philip Warner <pjw@rhyme.com.au> writes:

The thing that bugs me about this is that for 30,000 rows, I do 30,000
updates after the restore. It seems *really* inefficient, not to mention slow.

Shouldn't be a problem. For one thing, I can assure you there are no
databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO
infrastructure won't support it. (I think Denis Perchine has started
to work on a replacement one-table-for-all-LOs solution, btw.) Possibly
more to the point, there's no reason for pg_restore to grovel through
the individual rows for itself. Having identified a column that
contains (or might contain) LO OIDs, you can do something like

UPDATE userTable SET oidcolumn = tmptable.newLOoid
  FROM tmptable WHERE userTable.oidcolumn = tmptable.oldLOoid;

which should be quick enough, especially given indexes.

I'll also have to modify pg_restore to talk to the database directly (for
lo import). As a result I will probably send the entire script directly
from within pg_restore. Do you know if comment parsing ('--') is done in
the backend, or psql?

Both, I believe --- psql discards comments, but so will the backend.
Not sure you really need to abandon use of psql, though.

regards, tom lane

#5 Denis Perchine
dyp@perchine.com
In reply to: Tom Lane (#4)
Re[2]: Re: pg_dump and LOs (another proposal)

Hello Tom,

Wednesday, July 05, 2000, 9:06:33 PM, you wrote:

TL> Philip Warner <pjw@rhyme.com.au> writes:

The thing that bugs me about this is that for 30,000 rows, I do 30,000
updates after the restore. It seems *really* inefficient, not to mention slow.

TL> Shouldn't be a problem. For one thing, I can assure you there are no
TL> databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO

Hmmm... I have 127865 LOs at the moment. :-))) But that is with my patch
where all LOs are ordinary files on the filesystem. I will move it to
one-table-for-all-LOs after my holidays.

TL> infrastructure won't support it. (I think Denis Perchine has started
TL> to work on a replacement one-table-for-all-LOs solution, btw.) Possibly

You can try it. I sent it to pgsql-patches some time ago.

TL> more to the point, there's no reason for pg_restore to grovel through
TL> the individual rows for itself. Having identified a column that
TL> contains (or might contain) LO OIDs, you can do something like

--
Best regards,
Denis mailto:dyp@perchine.com

#6 Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#4)
Re: Re: pg_dump and LOs (another proposal)

At 13:06 5/07/00 -0400, Tom Lane wrote:

Shouldn't be a problem. For one thing, I can assure you there are no
databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO
infrastructure won't support it. (I think Denis Perchine has started

Eeek! Not so long ago I was going to use PG for a database with far more
than that many documents (mainly because of no backup, horrible storage
etc). Glad I didn't.

I'll also have to modify pg_restore to talk to the database directly (for
lo import). As a result I will probably send the entire script directly
from within pg_restore. Do you know if comment parsing ('--') is done in
the backend, or psql?

Both, I believe --- psql discards comments, but so will the backend.
Not sure you really need to abandon use of psql, though.

Don't plan to abandon it, but I did plan to use lo_creat and lo_write to
add the LOs, and that requires no psql, I think. I want this utility to
run direct from tape, without lots of temp files.

I'll probably just have a new arg, --blobs, and another, --db, which makes
a direct DB connection; --blobs without --db will not be supported. Does
this sound OK?
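
A rough sketch of that direct-connection path, assuming a hypothetical
readFromArchive() that yields the next chunk of blob data from the
archive (tape) stream; recording the (old, new) OID pair in lo_xref is
left to the caller:

/*
 * Restore one BLOB straight from the archive stream: create a fresh LO,
 * then push the data through lo_write - no temp files.  Large-object
 * calls must run inside a transaction block.
 */
#include <libpq-fe.h>
#include <libpq/libpq-fs.h>     /* INV_READ, INV_WRITE */

extern int readFromArchive(void *archive, char *buf, int len);

static Oid restore_blob(PGconn *conn, void *archive)
{
    char buf[8192];
    int  n, fd;
    Oid  newOid;

    newOid = lo_creat(conn, INV_READ | INV_WRITE);
    if (newOid == 0)
        return 0;

    fd = lo_open(conn, newOid, INV_WRITE);
    if (fd < 0)
        return 0;

    while ((n = readFromArchive(archive, buf, sizeof(buf))) > 0)
        if (lo_write(conn, fd, buf, n) != n)
            return 0;           /* short write: caller aborts restore */

    lo_close(conn, fd);
    return newOid;              /* caller stores the old->new mapping */
}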

#7 Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#4)
Re: Re: pg_dump and LOs (another proposal)

At 13:06 5/07/00 -0400, Tom Lane wrote:

UPDATE userTable SET oidcolumn = tmptable.newLOoid
  FROM tmptable WHERE userTable.oidcolumn = tmptable.oldLOoid;

It's actually nastier than this since there could be multiple oid columns,
implying, potentially, multiple scans of the table.

I suppose

update userTable set
  oidCol1 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol1), oidCol1 ),
  oidCol2 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol2), oidCol2 ),
  ...

would work, or at least only update each row once, but it looks slow.

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#7)
Re: Re: pg_dump and LOs (another proposal)

Philip Warner <pjw@rhyme.com.au> writes:

At 13:06 5/07/00 -0400, Tom Lane wrote:

UPDATE userTable SET oidcolumn = tmptable.newLOoid
  FROM tmptable WHERE userTable.oidcolumn = tmptable.oldLOoid;

It's actually nastier than this since there could be multiple oid columns,
implying, potentially, multiple scans of the table.

So?

I suppose

update userTable set
  oidCol1 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol1), oidCol1 ),
  oidCol2 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol2), oidCol2 ),
  ...

would work, or at least only update each row once, but it looks slow.

Almost certainly slower than processing each column in a separate
UPDATE. It does not pay to try to be smarter than the planner is ;-)

regards, tom lane
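
For what it's worth, a sketch of the one-UPDATE-per-OID-column approach
suggested above, as a helper a restore tool might loop over; the table
and column names are caller-supplied, lo_xref as before:

/*
 * One bulk UPDATE per OID column, rather than a single multi-column
 * COALESCE update.  Each call is one scan of tbl with an index probe
 * into lo_xref per row.
 */
#include <stdio.h>
#include <libpq-fe.h>

static int fixup_oid_column(PGconn *conn, const char *tbl, const char *col)
{
    char      sql[512];
    PGresult *res;
    int       ok;

    snprintf(sql, sizeof(sql),
             "UPDATE %s SET %s = lo_xref.newOid "
             "FROM lo_xref WHERE %s.%s = lo_xref.oldOid",
             tbl, col, tbl, col);

    res = PQexec(conn, sql);
    ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
    if (!ok)
        fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
    PQclear(res);
    return ok;
}

The caller would invoke it once per column, e.g.
fixup_oid_column(conn, "userTable", "oidCol1") and again for oidCol2.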

#9 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Tom Lane (#8)
AW: Re: pg_dump and LOs (another proposal)

At 11:09 5/07/00 -0400, Tom Lane wrote:

Philip Warner <pjw@rhyme.com.au> writes:

Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
I find myself thinking it's a waste of effort for the moment. A modified
version of the system used by Pavel Janik in pg_dumplo may be substantially
more reliable than my previous proposal:

I like this a lot better than trying to restore the original OIDs. For
one thing, the restore-original-OIDs idea cannot be made to work if what
we want to do is load additional tables into an existing database.

The thing that bugs me about this is that for 30,000 rows, I do 30,000
updates after the restore. It seems *really* inefficient, not to mention slow.

I'll also have to modify pg_restore to talk to the database directly (for
lo import). As a result I will probably send the entire script directly
from within pg_restore. Do you know if comment parsing ('--') is done in
the backend, or psql?

Strictly speaking, you are absolutely safe if you only do one update with
the max OID from the 30,000 rows before you start creating the LOs. I
don't know if you know that beforehand, though.

If you only know it afterwards, then you have to guarantee that no other
connection to this db (actually to this postmaster, if you need the OIDs
site-unique) does anything while you insert the LOs and then update to the
max OID.

Andreas

#10 Philip Warner
pjw@rhyme.com.au
In reply to: Zeugswetter Andreas SB (#9)
Re: AW: Re: pg_dump and LOs (another proposal)

At 09:52 6/07/00 +0200, Zeugswetter Andreas SB wrote:

I'll also have to modify pg_restore to talk to the database directly (for
lo import). As a result I will probably send the entire script directly
from within pg_restore. Do you know if comment parsing ('--') is done in
the backend, or psql?

Strictly speaking, you are absolutely safe if you only do one update with
the max OID from the 30,000 rows before you start creating the LOs. I
don't know if you know that beforehand, though.

If you only know it afterwards, then you have to guarantee that no other
connection to this db (actually to this postmaster, if you need the OIDs
site-unique) does anything while you insert the LOs and then update to the
max OID.

You may be confusing the two proposed techniques; the current flavour of
the minute is to restore the BLOBs using lo_creat to get a new OID, write
an entry in a table indicating what the old & new OIDs are, and then, when
the table data is loaded, update all OID fields that refer to OIDs in the
xref table. It's pretty nasty, but it has the big advantage of being as
vanilla as possible. It's also pretty close to what pg_dumplo does.

#11 Peter Eisentraut
peter_e@gmx.net
In reply to: Philip Warner (#3)
Re: Re: pg_dump and LOs (another proposal)

Philip Warner writes:

I'll also have to modify pg_restore to talk to the database directly (for
lo import).

psql has \lo_import.

--
Peter Eisentraut   peter_e@gmx.net   http://yi.org/peter-e/
Sernanders väg 10:115, 75262 Uppsala, Sweden

#12 Philip Warner
pjw@rhyme.com.au
In reply to: Peter Eisentraut (#11)
Re: Re: pg_dump and LOs (another proposal)

At 18:12 6/07/00 +0200, Peter Eisentraut wrote:

Philip Warner writes:

I'll also have to modify pg_restore to talk to the database directly (for
lo import).

psql has \lo_import.

This is true, but if there are 30000 blobs on an archive tape, I can't
dump them into /tmp and wait for the user to run the script. At the
current time pg_restore just sends a script to a file or stdout - it has
no guarantee of when a \lo_import command will be run, so dumping blobs
into the same file between \lo_import calls would not be appropriate,
since I would in effect be requiring a psql attachment.

So the plan is, in the first pass, to make BLOB restoration dependent on
having a DB connection.

Does this make more sense?

P.S. I have only half-written the lo dumping code, so this is all quite
open...

#13 Philip Warner
pjw@rhyme.com.au
In reply to: Philip Warner (#12)
Re: Re: pg_dump and LOs (another proposal)

At 18:12 6/07/00 +0200, Peter Eisentraut wrote:

Philip Warner writes:

I'll also have to modify pg_restore to talk to the database directly (for
lo import).

psql has \lo_import.

P.S. Another, possibly minor, advantage of using a direct db connection is
that I can allow the user to stop restoring the database on the first
error, unlike with a script file fed to psql.

#14 Karel Zak
zakkr@zf.jcu.cz
In reply to: Philip Warner (#12)
Re: Re: pg_dump and LOs (another proposal)

P.S. I have only half-written the lo dumping code, so this is all quite
open...

A "blasphemy" question, is really needful LO dump if we will have TOAST and
LO will lonely past? If anyone still need dump LO (for example I) is possible
use pg_dumplo from contrib tree, that (as some users say) works very well.
Not is work on LO dump, after several years and during LO funeral loss of
time? (sorry).

Karel

#15 Philip Warner
pjw@rhyme.com.au
In reply to: Karel Zak (#14)
Re: Re: pg_dump and LOs (another proposal)

At 09:30 7/07/00 +0200, Karel Zak wrote:

P.S. I have only half-written the lo dumping code, so this is all quite
open...

A "blasphemy" question, is really needful LO dump if we will have TOAST and
LO will lonely past? If anyone still need dump LO (for example I) is

possible

use pg_dumplo from contrib tree, that (as some users say) works very well.
Not is work on LO dump, after several years and during LO funeral loss of
time? (sorry).

There are three reasons why I continue:

1. To learn

2. Because I believe that BLOBs will exist after TOAST, although the
implementation will have changed. The code to handle the current format
will be at least 70% reusable (assuming a similar set of
lo_open/read/write/close calls).

3. We will need a way of exporting old BLOBs and importing them as TOAST
BLOBs.

I could be wrong about (2), but I think binary data cannot be easily
loaded from a pure text file, and (3) could ultimately be handled by
pg_dumplo, but I like the idea of an integrated tool.

#16 Peter Eisentraut
peter_e@gmx.net
In reply to: Philip Warner (#12)
Re: Re: pg_dump and LOs (another proposal)

Philip Warner writes:

psql has \lo_import.

This is true, but if there are 30000 blobs on an archive tape, I can't
dump them into /tmp and wait for the user to run the script. At the
current time pg_restore just sends a script to a file or stdout - it has
no guarantee of when a \lo_import command will be run, so dumping blobs
into the same file between \lo_import calls would not be appropriate,
since I would in effect be requiring a psql attachment.

I don't understand. How else would you restore a large object if not using
libpq's lo_import() call?

#17 Peter Eisentraut
peter_e@gmx.net
In reply to: Philip Warner (#13)
Re: Re: pg_dump and LOs (another proposal)

Philip Warner writes:

psql has \lo_import.

P.S. Another, possibly minor, advantage of using a direct db connection is
that I can allow the user to stop restoring the database on the first
error, unlike with a script file fed to psql.

\set ON_ERROR_STOP

#18 Philip Warner
pjw@rhyme.com.au
In reply to: Peter Eisentraut (#16)
Re: Re: pg_dump and LOs (another proposal)

At 18:15 7/07/00 +0200, Peter Eisentraut wrote:

Philip Warner writes:

psql has \lo_import.

This is true, but if there are 30000 blobs on an archive tape, I can't
dump them into /tmp and wait for the user to run the script. At the
current time pg_restore just sends a script to a file or stdout - it has
no guarantee of when a \lo_import command will be run, so dumping blobs
into the same file between \lo_import calls would not be appropriate,
since I would in effect be requiring a psql attachment.

I don't understand. How else would you restore a large object if not using
libpq's lo_import() call?

Direct connection to the DB, using lo_creat, lo_open, lo_write & lo_close -
i.e. what lo_import does under the hood.
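
For reference, roughly that sequence in miniature - a sketch of an
lo_import() equivalent, not the actual libpq source:

/*
 * What lo_import() layers over: lo_creat a new object, lo_open it,
 * loop read()/lo_write(), lo_close.  Error checks mostly trimmed.
 */
#include <fcntl.h>
#include <unistd.h>
#include <libpq-fe.h>
#include <libpq/libpq-fs.h>

static Oid my_lo_import(PGconn *conn, const char *filename)
{
    char buf[8192];
    int  n, src, fd;
    Oid  loOid;

    src = open(filename, O_RDONLY);
    if (src < 0)
        return 0;

    loOid = lo_creat(conn, INV_READ | INV_WRITE);
    fd = lo_open(conn, loOid, INV_WRITE);

    while ((n = read(src, buf, sizeof(buf))) > 0)
        lo_write(conn, fd, buf, n);     /* a real tool checks the result */

    lo_close(conn, fd);
    close(src);
    return loOid;
}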
