PostgreSQL vs. MySQL

Started by Louis Bertrandover 25 years ago31 messages
#1Louis Bertrand
louis@bertrandtech.on.ca

There's a benchmark/review comparing PostgreSQL and MySQL on PHP Builder:
http://www.phpbuilder.com/columns/tim20000705.php3

Ciao
--Louis <louis@bertrandtech.on.ca>

#2Chris Bitmead
chrisb@nimrod.itg.telstra.com.au
In reply to: Louis Bertrand (#1)
Re: [GENERAL] PostgreSQL vs. MySQL

Louis Bertrand wrote:

There's a benchmark/review comparing PostgreSQL and MySQL on PHP Builder:
http://www.phpbuilder.com/columns/tim20000705.php3

I'm wondering about the comments that postgres is slower in connection
time, could this be related to that libpq always uses asynchronous
sockets to connect? It always turns off blocking and then goes through a
state machine to go through the various stages of connect, instead of
just calling connect() and waiting for the kernel to do its thing. Of
course asynchronous connecting is a benefit when you want it. Or is the
overhead elsewhere, and I'm just being paranoid?

#3Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Chris Bitmead (#2)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

Louis Bertrand wrote:

There's a benchmark/review comparing PostgreSQL and MySQL on PHP Builder:
http://www.phpbuilder.com/columns/tim20000705.php3

I'm wondering about the comments that postgres is slower in connection
time, could this be related to that libpq always uses asynchronous
sockets to connect? It always turns off blocking and then goes through a
state machine to go through the various stages of connect, instead of
just calling connect() and waiting for the kernel to do its thing. Of
course asynchronous connecting is a benefit when you want it. Or is the
overhead elsewhere, and I'm just being paranoid?

The truth is, we really don't know what it is.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Bitmead (#2)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:

I'm wondering about the comments that postgres is slower in connection
time, could this be related to that libpq always uses asynchronous
sockets to connect? It always turns off blocking and then goes through a
state machine to go through the various stages of connect, instead of
just calling connect() and waiting for the kernel to do its thing.

I think you'd be wasting your time to "improve" that. A couple of
kernel calls are not enough to explain the problem. Moreover, we
had complaints about slow startup even back when libpq had never heard
of async anything.

I believe that the problem is on the backend side: there's an awful lot
of cache-initialization and so forth that happens each time a backend
is started. It's quick enough to be hard to profile accurately,
however, so getting the info needed to speed it up is not so easy.

regards, tom lane

#5Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#4)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

At 22:59 9/07/00 -0400, Tom Lane wrote:

Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:

I'm wondering about the comments that postgres is slower in connection
time, could this be related to that libpq always uses asynchronous
sockets to connect? It always turns off blocking and then goes through a
state machine to go through the various stages of connect, instead of
just calling connect() and waiting for the kernel to do its thing.

I believe that the problem is on the backend side: there's an awful lot
of cache-initialization and so forth that happens each time a backend
is started. It's quick enough to be hard to profile accurately,
however, so getting the info needed to speed it up is not so easy.

You could pre-start servers (ala Apache), then when a connection request
comes in, the connection should be pretty fast. This would involve
defining, for each database, the number of servers to prestart (default 0),
and perhaps the minimum number of free servers to maintain (ie. when all
free servers are used up, automatically create some new ones). You would
definitely need to make this dynamic to allow for clean database shutdowns.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#6Philip Warner
pjw@rhyme.com.au
In reply to: Philip Warner (#5)
pg_backup symlink?

Does anyone have a philosophical objection to a symlink from pg_dump to
(new) pg_backup?

The reason I ask is that with the new BLOB support, to do a proper backup
of the database one has to type:

pg_dump --blob -Fc ...etc

where --blob tells it to dump BLOBs and -Fc tells it to use the custon file
format, which at the moment is the only one that supports BLOB storage.

The idea would be for pg_dump to look at it's name, and make --blob and -Fc
defaults if it is called as pg_backup. These can of course be overridden
when binary blob load direct into psql is supported (maybe 'LO_COPY from
stdin Length {len}'?)

I know someone (Tom?) objected to symlinked files drastically changing
command behaviour, but this is not a drastic change, so I live in hope.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#5)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

Philip Warner <pjw@rhyme.com.au> writes:

At 22:59 9/07/00 -0400, Tom Lane wrote:

I believe that the problem is on the backend side: there's an awful lot
of cache-initialization and so forth that happens each time a backend
is started. It's quick enough to be hard to profile accurately,
however, so getting the info needed to speed it up is not so easy.

You could pre-start servers (ala Apache), then when a connection request
comes in, the connection should be pretty fast. This would involve
defining, for each database, the number of servers to prestart (default 0),

Yeah, that's been discussed before. It seems possible if not exactly
simple --- one of the implications is that the postmaster no longer
listens for client connections, but is reduced to being a factory for
new backends. The prestarted backends themselves have to be listening
for client connections, since there's no portable way for the postmaster
to pass off a client socket to an already-existing backend.

And of course the major problem with *that* is how do you get the
connection request to arrive at a backend that's been prestarted in
the right database? If you don't commit to a database then there's
not a whole lot of prestarting that can be done.

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

regards, tom lane

#8Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#7)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

At 01:02 10/07/00 -0400, Tom Lane wrote:

You could pre-start servers (ala Apache), then when a connection request
comes in, the connection should be pretty fast. This would involve
defining, for each database, the number of servers to prestart (default 0),

since there's no portable way for the postmaster
to pass off a client socket to an already-existing backend.

That's a pain, because you probably don't want to vary the postmaster
behaviour that much.

Couldn't you modify the connection protocol to request the port of a free
db server, then redo the connect invisibly inside the front end?

The postmaster would have to manage free servers, and mark the db server as
used etc etc.

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

This has other advantages too - I'd like to be able to shutdown *one*
database, and possibly restart it in 'administrator mode' (eg. for a
restore operation). It also means one misbehaving DB doesn't mess up other
DBs. Sounds very good to me.

Does this mean there would be a postmaster-master that told you the
postmaster port to connect to for the desired DB? Or is there a nicer way
of doing this...

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#9Benjamin Adida
ben@mit.edu
In reply to: Tom Lane (#7)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

This "pre-starting" is already being done by any web application that uses
connection pooling. (I suspect this speed of connection startup is only
important for something like a web system, correct?). Even if you did
"pre-start" these back-ends, you'd end up with one of two possibilities:

- you reuse the back-end processes from one connection to the other. I
suspect this is very hard, and you'd just be recreating connection pooling
at a lower level, which I don't think is that worthwhile an investment...

- you don't reuse the back-end processes, in which case you're still
spawning one process per connection, which remains a bad idea for web
systems, so you're back to the application-layer connection pooling idea.

I admire the entire Postgres's team efforts to fix any and all issues that
come in. You guys show true humility and a real desire to make this product
the best it can be.

It seems to me, though, that this particular issue is better resolved at the
application layer.

-Ben

on 7/10/00 1:02 AM, Tom Lane at tgl@sss.pgh.pa.us wrote:

Show quoted text

You could pre-start servers (ala Apache), then when a connection request
comes in, the connection should be pretty fast. This would involve
defining, for each database, the number of servers to prestart (default 0),

Yeah, that's been discussed before. It seems possible if not exactly
simple --- one of the implications is that the postmaster no longer
listens for client connections, but is reduced to being a factory for
new backends. The prestarted backends themselves have to be listening
for client connections, since there's no portable way for the postmaster
to pass off a client socket to an already-existing backend.

And of course the major problem with *that* is how do you get the
connection request to arrive at a backend that's been prestarted in
the right database? If you don't commit to a database then there's
not a whole lot of prestarting that can be done.

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

regards, tom lane

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#7)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

And of course the major problem with *that* is how do you get the
connection request to arrive at a backend that's been prestarted in
the right database? If you don't commit to a database then there's
not a whole lot of prestarting that can be done.

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

This is interesting. You believe schema's would allow a pool of
backends to connect to any database? That would clearly be a win.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#11Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Bruce Momjian (#10)
AW: Re: [GENERAL] PostgreSQL vs. MySQL

And of course the major problem with *that* is how do you get the
connection request to arrive at a backend that's been prestarted in
the right database? If you don't commit to a database then there's
not a whole lot of prestarting that can be done.

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

This is interesting. You believe schema's would allow a pool of
backends to connect to any database? That would clearly be a win.

I do not agree. We need to keep the different databases per postmaster.
Schemas are something we need below a database. This is actually
required by SQL99. More than one database (catalog in SQL99)
is not required per SQL99, but imho a very useful feature.

We need something that allows us to access objects on another database,
but this should imho not be limited to databases on the same postmaster
(SQL99 cluster).

Andreas

#12Mike Mascari
mascarm@mascari.com
In reply to: Bruce Momjian (#10)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

Bruce Momjian wrote:

And of course the major problem with *that* is how do you get the
connection request to arrive at a backend that's been prestarted in
the right database? If you don't commit to a database then there's
not a whole lot of prestarting that can be done.

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

This is interesting. You believe schema's would allow a pool of
backends to connect to any database? That would clearly be a win.

I'm just curious, but did a consensus ever develop on schemas? It
seemed that the schemas/tablespace thread just ran out of steam.
For what its worth, I like the idea of:

1. PostgreSQL installation -> SQL cluster of catalogs
2. PostgreSQL database -> SQL catalog
3. PostgreSQL schema -> SQL schema

This correlates nicely with the current representation of
DATABASE. People can run multiple SQL clusters by running
multiple postmasters on different ports. Today, most people
achieve a logical separation of data by issuing multiple CREATE
DATABASE commands. But under the above, most sites would run with
a single PostgreSQL database (SQL catalog), since:

"Catalogs are named collections of schemas in an SQL-environment"

This would mirror the behavior of Oracle, where most people run
with a single Oracle SID. The logical separation would be
achieved with SCHEMA's a level under the current DATABASE (a.k.a.
catalog). This eliminates the problem of using softlinks and
creating various subdirectories to mirror *logical* parititioning
of data. It also alleviates the problem people currently
encounter when they've built their data model around multiple
DATABASE's but learn later that they need access to more than one
simultaneously. Instead, they'll model their design around
multiple SCHEMA's which exist within a single DATABASE instance.

It seems that the discussion of tablespaces shouldn't be mixed
with SCHEMA's except to note that a DATABASE (catalog) should
have a default TABLESPACE whose path matches the current one:

../pgsql/data/base/<mydatabase>

Later, users might be able to create a hierarchy of default
TABLESPACE's where the location of the object is found with logic
like:

1. Is there a object-specified tablespace?
(ex: CREATE TABLE payroll IN TABLESPACE...)
2. Is there a user-specified default tablespace?
(ex: CREATE USER mike DEFAULT TABLESPACE...)
2. Is there a schema-specified default tablespace?
(ex: CREATE SCHEMA accounting DEFAULT TABLESPACE..)
3. Use the catalog-default tablespace
(ex: CREATE DATABASE postgres DEFAULT LOCATION '/home/pgsql')

with the last example creating the system tablespace,
'system_tablespace', with '/home/pgsql' as the location.

Anyways, it seems a consensus should be developed on the whole
Cluster/Catalog/Schema scenario.

Mike Mascari

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#10)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

Bruce Momjian <pgman@candle.pha.pa.us> writes:

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

This is interesting. You believe schema's would allow a pool of
backends to connect to any database? That would clearly be a win.

No, I meant that we wouldn't have physically separate databases anymore
within an installation, but would provide the illusion of it via
schemas. So, only one pg_class (for example) per installation.
This would simplify life in a number of areas... but there are downsides
to it as well, of course.

regards, tom lane

#14Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#10)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

No, I meant that we wouldn't have physically separate databases anymore
within an installation, but would provide the illusion of it via
schemas. So, only one pg_class (for example) per installation.
This would simplify life in a number of areas... but there are downsides
to it as well, of course.

Oops. This seems the wrong way to go. Increasing coupling between
databases to support schemas really means that we've traded one feature
for another, not increased our feature set.

Schemas are intended to help logically partition a work area/database.
We will need to implement the SQL99 path lookup scheme for finding
resources within a schema-divided database. But imho most installations
will still want resource- and permissions-partitioning between different
databases, and schemas should figure out how to fit within a single
database.

I didn't participate in the tablespace discussion because there seems to
be several PoV's well represented, but I'm interested in the schema
issue ;)

- Thomas

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#14)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

No, I meant that we wouldn't have physically separate databases anymore
within an installation, but would provide the illusion of it via
schemas. So, only one pg_class (for example) per installation.
This would simplify life in a number of areas... but there are downsides
to it as well, of course.

Oops. This seems the wrong way to go. Increasing coupling between
databases to support schemas really means that we've traded one feature
for another, not increased our feature set.

You could argue it that way, or you could say that we're replacing a
crufty old single-purpose feature with a nice new multi-purpose feature.

I'm not by any means sold on removing the physical separation between
databases --- I can see lots of reasons not to. But I think we ought
to think hard about the choice, not have a knee-jerk reaction that we
don't want to "eliminate a feature". Physically separate databases
are an implementation choice, not a user feature.

regards, tom lane

#16Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Mike Mascari (#12)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

Bruce Momjian wrote:

And of course the major problem with *that* is how do you get the
connection request to arrive at a backend that's been prestarted in
the right database? If you don't commit to a database then there's
not a whole lot of prestarting that can be done.

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

This is interesting. You believe schema's would allow a pool of
backends to connect to any database? That would clearly be a win.

I'm just curious, but did a consensus ever develop on schemas? It
seemed that the schemas/tablespace thread just ran out of steam.
For what its worth, I like the idea of:

You can find the entire thread in the current development tree in
doc/TODO.detail/tablespaces.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#17Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#13)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

Bruce Momjian <pgman@candle.pha.pa.us> writes:

It occurs to me that this'd get a whole lot more feasible if one
postmaster == one database, which is something we *could* do if we
implemented schemas. Hiroshi's been arguing that the current hard
separation between databases in an installation should be done away
with in favor of schemas, and I'm starting to see his point...

This is interesting. You believe schema's would allow a pool of
backends to connect to any database? That would clearly be a win.

No, I meant that we wouldn't have physically separate databases anymore
within an installation, but would provide the illusion of it via
schemas. So, only one pg_class (for example) per installation.
This would simplify life in a number of areas... but there are downsides
to it as well, of course.

Wow, I can image the complications.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#18Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#15)
Re: Re: [GENERAL] PostgreSQL vs. MySQL

I'm not by any means sold on removing the physical separation between
databases --- I can see lots of reasons not to. But I think we ought
to think hard about the choice, not have a knee-jerk reaction that we
don't want to "eliminate a feature". Physically separate databases
are an implementation choice, not a user feature.

If we put tables from different database in the same tablespace
directory, and a database gets hosed, there is no way to delete the
files associated with the hosed database, unless we go around and find
all the table files used by all databases, then remove the ones not
referenced.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#19Peter Eisentraut
peter_e@gmx.net
In reply to: Philip Warner (#6)
Re: pg_backup symlink?

Philip Warner writes:

Does anyone have a philosophical objection to a symlink from pg_dump to
(new) pg_backup?

Yes. The behaviour of a program should not depend on the name used to
invoke it. You can use shell aliases or scripts for that.

--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#20Philip Warner
pjw@rhyme.com.au
In reply to: Peter Eisentraut (#19)
Re: pg_backup symlink?

At 00:24 11/07/00 +0200, Peter Eisentraut wrote:

Philip Warner writes:

Does anyone have a philosophical objection to a symlink from pg_dump to
(new) pg_backup?

Yes. The behaviour of a program should not depend on the name used to
invoke it. You can use shell aliases or scripts for that.

OK, I suppose I was thinking of the pg_dump symlink as a tool for
compatibility.

Is there a good solution? It dumps to text for compatibility with the old
pg_dump, but I will most often use 'pg_dump -Fc --blob'. Is there a
recommended 'correct' approach?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#21Lamar Owen
lamar.owen@wgcr.org
In reply to: Philip Warner (#6)
Re: pg_backup symlink?

Philip Warner wrote:

At 00:24 11/07/00 +0200, Peter Eisentraut wrote:

Philip Warner writes:

Does anyone have a philosophical objection to a symlink from pg_dump to
(new) pg_backup?

Yes. The behaviour of a program should not depend on the name used to
invoke it. You can use shell aliases or scripts for that.

OK, I suppose I was thinking of the pg_dump symlink as a tool for
compatibility.

There is already precedent -- postmaster is a symlink to postgres, but
operates differently due to its invocation name.

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

#22Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#20)
Re: pg_backup symlink?

At 00:24 11/07/00 +0200, Peter Eisentraut wrote:

Philip Warner writes:

Does anyone have a philosophical objection to a symlink from pg_dump to
(new) pg_backup?

Yes. The behaviour of a program should not depend on the name used to
invoke it. You can use shell aliases or scripts for that.

OK, I suppose I was thinking of the pg_dump symlink as a tool for
compatibility.

Is there a good solution? It dumps to text for compatibility with the old
pg_dump, but I will most often use 'pg_dump -Fc --blob'. Is there a
recommended 'correct' approach?

The BSD way is to define an environment variable that is used to supply
additional arguments to the command. For example, BLOCKSIZE controls if
blocks are reported in 512 or 1k sizes by commands like du.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#23The Hermit Hacker
scrappy@hub.org
In reply to: Peter Eisentraut (#19)
Re: pg_backup symlink?

On Tue, 11 Jul 2000, Peter Eisentraut wrote:

Philip Warner writes:

Does anyone have a philosophical objection to a symlink from pg_dump to
(new) pg_backup?

Yes. The behaviour of a program should not depend on the name used to
invoke it. You can use shell aliases or scripts for that.

tell that to *how many* Unix programs? :) sendmail, of course, being the
first to jump to mind ...

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Lamar Owen (#21)
Re: pg_backup symlink?

Lamar Owen <lamar.owen@wgcr.org> writes:

Yes. The behaviour of a program should not depend on the name used to
invoke it. You can use shell aliases or scripts for that.

There is already precedent -- postmaster is a symlink to postgres, but
operates differently due to its invocation name.

There are dozens of other examples in any standard Unix system. Just
to take one example, 'ls' has six different links to it on my Unix box,
and they all act differently (ie, supply different default switches to
the basic 'ls' behavior).

Peter is definitely swimming upstream if he hopes to get anyone to adopt
the above as received wisdom.

regards, tom lane

#25Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#24)
Re: pg_backup symlink?

At 18:58 10/07/00 -0400, Tom Lane wrote:

There are dozens of other examples in any standard Unix system. Just
to take one example, 'ls' has six different links to it on my Unix box,
and they all act differently (ie, supply different default switches to
the basic 'ls' behavior).

Peter is definitely swimming upstream if he hopes to get anyone to adopt
the above as received wisdom.

Does this mean that using a pg_backup symlink would be deemed acceptable?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#26The Hermit Hacker
scrappy@hub.org
In reply to: Philip Warner (#25)
Re: pg_backup symlink?

On Tue, 11 Jul 2000, Philip Warner wrote:

At 18:58 10/07/00 -0400, Tom Lane wrote:

There are dozens of other examples in any standard Unix system. Just
to take one example, 'ls' has six different links to it on my Unix box,
and they all act differently (ie, supply different default switches to
the basic 'ls' behavior).

Peter is definitely swimming upstream if he hopes to get anyone to adopt
the above as received wisdom.

Does this mean that using a pg_backup symlink would be deemed acceptable?

yes :)

both 'commands' should be documented in the man pages too ... right? :)

#27Philip Warner
pjw@rhyme.com.au
In reply to: The Hermit Hacker (#26)
Re: pg_backup symlink?

At 20:39 10/07/00 -0300, The Hermit Hacker wrote:

both 'commands' should be documented in the man pages too ... right? :)

Believe it ot not, I have actually started on this. The SGML sources are a
bit hard on the eyes, even for someone who used to use TeX. Is there a
simpler way than manually editing pg_dump.sgml?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#28Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#27)
Re: pg_backup symlink?

At 20:39 10/07/00 -0300, The Hermit Hacker wrote:

both 'commands' should be documented in the man pages too ... right? :)

Believe it ot not, I have actually started on this. The SGML sources are a
bit hard on the eyes, even for someone who used to use TeX. Is there a
simpler way than manually editing pg_dump.sgml?

Yes, hard on the eyes. No, no better way. The only suggestion I have
is to use an editor in HTML colorizer mode so the tags are colored.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#29Peter Eisentraut
peter_e@gmx.net
In reply to: The Hermit Hacker (#23)
Re: pg_backup symlink?

The Hermit Hacker writes:

tell that to *how many* Unix programs? :) sendmail, of course, being the
first to jump to mind ...

That doesn't mean it's a good idea. For one, it would prevent anyone to
install them as pg_dump71, etc., which I had hoped to offer sometime. But
I'm just one voice... If you make pg_dump a one-line shell script on the
other hand you don't hurt anyone.

Does Windows 98 have (sym)links? That's a supported client platform.

--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#30Peter Eisentraut
peter_e@gmx.net
In reply to: Philip Warner (#20)
Re: pg_backup symlink?

Philip Warner writes:

Is there a good solution? It dumps to text for compatibility with the old
pg_dump, but I will most often use 'pg_dump -Fc --blob'. Is there a
recommended 'correct' approach?

IMHO, it's a bad strategy to add symlinks as shortcuts to certain
options. Where would that ever lead? There are tons of options settings I
use "most often" in various programs, but for that you can use shells
aliases or scripts, or the program provides an environment variable for
default options.

The default behaviour of pg_dump (or pg_backup or whatever) should be to
write plain text to stdout. If you want to write format "foo", use the
-Ffoo option. If you want to dump blobs, use the --blob option. That makes
sense.

You're really trying to force certain usage patterns by labeling one
invocation "backup" and another "dump". I can foresee the user problems:
"No, you have to use pg_dump for that, not pg_backup!" -- "Don't they do
the same thing?" -- "Why aren't they the same program then?" We're still
battling that sympton in the createdb vs CREATE DATABASE case.

What's wrong with just having pg_dump, period? After all pg_dump isn't
something you use like `ls' or `cat' where every extra keystroke is a
pain.

--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#31Philip Warner
pjw@rhyme.com.au
In reply to: Peter Eisentraut (#30)
Re: pg_backup symlink?

At 02:23 12/07/00 +0200, Peter Eisentraut wrote:

IMHO, it's a bad strategy to add symlinks as shortcuts to certain
options. Where would that ever lead?

I suppose the glib answer is "to a more convenient and easy to use tool" 8-}.

There are tons of options settings I
use "most often" in various programs, but for that you can use shells
aliases or scripts, or the program provides an environment variable for
default options.

In this case I view pg_dump's default behaviour as an anachronism caused by
compatibility issues, not a feature. Dumping to text without blobs is like
asking ls to only list files whose names are in lower case.

The default behaviour of pg_dump (or pg_backup or whatever) should be to
write plain text to stdout. If you want to write format "foo", use the
-Ffoo option. If you want to dump blobs, use the --blob option. That makes
sense.

With a symlink, that's what you get. You will still be able to add '-Ffoo'
to pg_dump (or -Fp to pg_backup)

You're really trying to force certain usage patterns by labeling one
invocation "backup" and another "dump". I can foresee the user problems:
"No, you have to use pg_dump for that, not pg_backup!"

The actualy answer to the question is: "either use 'pg_dump -Fc --blob', or
just use pg_backup, whichever you find easiest to remember".

This works both ways: "I used pg_dump to backup my db, but it doesn't
contain the blobs" - I've certainly seen that message a few times. Both
issues are solved by documentation.

Until a scipt file can import blob data directly from stdin, a text file
can not be used to backup blobs, so the default behaviour of pg_dump is
unsuitable for backups.

We're still
battling that sympton in the createdb vs CREATE DATABASE case.

My guess is these issues were also created by legacy code.

What's wrong with just having pg_dump, period? After all pg_dump isn't
something you use like `ls' or `cat' where every extra keystroke is a
pain.

No, but for less commonly used utilities, it's probably more important to
have a simple way invoke a basic, important, function.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/